OpenAI is establishing a new team, co-led by Ilya Sutskever, the company’s chief scientist and co-founder, to develop ways to steer and control “superintelligent” AI systems. In a recent blog post, Sutskever and Jan Leike, a lead on OpenAI’s alignment team, predict that AI with intelligence exceeding that of humans could arrive within the decade. Because such systems would carry serious risks, they argue, research into ways to control and restrain them is needed now.
Current techniques for aligning AI, such as reinforcement learning from human feedback, depend on humans being able to supervise the AI's behavior. Once AI systems become smarter than the people overseeing them, that supervision will no longer be reliable. To tackle this “superintelligence alignment” problem, OpenAI is forming a new Superalignment team, headed by Sutskever and Leike, which will have access to 20% of the compute the company has secured to date. The team will draw researchers and engineers from OpenAI’s former alignment division as well as from other groups across the company, with the aim of solving the core technical challenges of controlling superintelligent AI within the next four years.
The team’s approach centers on building a “human-level automated alignment researcher.” The plan is to train AI systems using human feedback, train AI to help evaluate other AI systems, and eventually build AI that can carry out alignment research itself. OpenAI's hypothesis is that such AI can make faster and better progress on alignment research than humans can.
The team concedes that this approach has limitations and potential pitfalls, but believes that using AI for evaluation and research will be instrumental in advancing alignment. One risk is that relying on AI for evaluation could scale up any inconsistencies, biases, or vulnerabilities present in that AI. Another is that the hardest parts of the alignment problem may turn out not to be engineering problems at all.
Despite these difficulties, Sutskever and Leike believe the approach is worth pursuing. They argue that superintelligence alignment is fundamentally a machine learning problem, so the expertise of machine learning researchers, including those not currently working on alignment, will be crucial to solving it. OpenAI says it plans to share the results of this research broadly and considers contributing to the alignment and safety of models built outside OpenAI an important part of its work.