OpenAI has faced criticism in recent months for moving too quickly and recklessly to develop more powerful artificial intelligence, and the company seems keen to show it takes AI safety seriously. Today, the company unveiled research that it says could help researchers scrutinize AI models even as they become more capable and useful.
The new technique is one of several AI safety ideas the company has touted in recent weeks. It involves putting two AI models into a conversation in a way that pushes the more powerful one to make its reasoning more transparent, or “readable,” so humans can understand the AI’s intent.
“This is core to the mission of building [artificial general intelligence] that is both safe and beneficial,” Yining Chen, an OpenAI researcher working on the study, told WIRED.
So far, the approach has been tested on AI models designed to solve simple math problems. The OpenAI researchers asked a model to explain its reasoning as it answered questions or worked through problems, and trained a second model to detect whether the answers were correct. They found that having the two models go back and forth made the math-solving model more candid and transparent about its reasoning.
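To make the shape of that setup concrete, here is a minimal, purely illustrative sketch of a prover/verifier-style check loop. The `prover` and `verifier` functions below are hypothetical stand-ins: in the research itself both roles are played by trained language models, whereas here the prover spells out its steps for a toy arithmetic problem and the verifier simply recomputes the answer to accept or reject it.

```python
# Illustrative sketch only, not OpenAI's actual training setup.
# A "prover" writes out step-by-step reasoning for a toy arithmetic
# problem; a "verifier" independently checks the claimed answer.
# In the real research both roles are trained language models; here
# they are plain Python functions so the example runs on its own.

def prover(a: int, b: int, c: int) -> dict:
    """Solve a * (b + c), showing intermediate steps (the 'readable' reasoning)."""
    step1 = b + c
    answer = a * step1
    return {
        "steps": [
            f"First add {b} + {c} = {step1}.",
            f"Then multiply {a} * {step1} = {answer}.",
        ],
        "answer": answer,
    }

def verifier(a: int, b: int, c: int, claimed_answer: int) -> bool:
    """Independently recompute the result and accept or reject the claim."""
    return claimed_answer == a * (b + c)

if __name__ == "__main__":
    problems = [(2, 3, 4), (5, 1, 6), (7, 0, 9)]
    for a, b, c in problems:
        solution = prover(a, b, c)
        accepted = verifier(a, b, c, solution["answer"])
        print(f"Problem: {a} * ({b} + {c})")
        for step in solution["steps"]:
            print("  ", step)
        print("   verifier accepts:", accepted)
```

The point of the structure, per the research described here, is that the prover is rewarded not just for being right but for laying out reasoning a weaker checker can follow, which is what makes its output easier for humans to audit as well.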
OpenAI has published a paper detailing the approach. “This is part of a long-term safety research plan,” says Jan-Hendrik Kirchner, another OpenAI researcher on the study. “We hope that other researchers will follow suit and try other algorithms as well.”
Transparency and explainability are central concerns for AI researchers working to build more powerful systems. Large language models can sometimes provide reasonable explanations of how they reached a conclusion, but a key worry is that future models may become more opaque or even deceptive in the explanations they give – that is, they may pursue undesirable goals while lying about them.
The research published today is part of a broader effort to understand how the large language models at the heart of programs like ChatGPT behave, and it is one of a number of techniques that could help make more powerful AI models more transparent and therefore safer. OpenAI and other companies are also exploring more mechanistic ways of peering inside large language models.
OpenAI has been more transparent about its AI safety efforts in recent weeks following criticism of its approach. In May, WIRED learned that it had disbanded a team of researchers dedicated to studying long-term AI risks. This came on the heels of the departure of co-founder and key technical leader Ilya Sutskever, who was part of the board that temporarily fired CEO Sam Altman last November.
OpenAI was founded on the promise of making AI more transparent and safer, but with ChatGPT’s wild success and growing competition from heavily backed rivals, some have accused the company of prioritizing flashy advances and market share over safety.
Daniel Kokotajlo, a researcher who left OpenAI and signed an open letter criticizing the company’s approach to AI safety, said the new efforts are important but incremental, and don’t change the fact that companies developing the technology need more oversight. “It doesn’t change the situation we’re in,” he said. “We have a bunch of opaque, unaccountable, unregulated companies basically racing each other to build artificial superintelligence with no plan for how to control it.”
Another person familiar with OpenAI’s inner workings, who spoke on condition of anonymity because they were not authorized to speak publicly, said outside oversight of AI companies is also needed. “The question is whether they’re serious about the processes and governance mechanisms necessary to put the good of society above profits,” the person said, “not whether they let their researchers do their jobs in a safety-conscious way.”