When experts first raised the alarm about AI misalignment a few decades ago – the risk of powerful, transformative artificial intelligence systems that might not behave as people hoped – many of their concerns sounded hypothetical. In the early 2000s, AI research had still produced quite limited returns, and even the best available AI systems failed at many simple tasks.
But since then, AIs have become quite good and much cheaper to build. One area where progress has come in especially large leaps is language and text-generation AI, which can be trained on huge collections of text to produce more text in a similar style. Many startups and research teams are training these AIs for all kinds of tasks, from writing code to producing ad copy.
Their rise does not change the basic argument behind AI alignment concerns, but it does one incredibly useful thing: it makes what were once speculative concerns more concrete, which allows more people to experience them and more researchers to (hopefully) address them.
An AI Oracle?
Take Delphi, a new AI text system from the Allen Institute for AI, a research institute founded by Paul Allen, the late co-founder of Microsoft.
The way Delphi works is incredibly simple: researchers trained a machine learning system on a large body of internet text, and then on a large database of responses from participants on Mechanical Turk (a paid crowdsourcing platform popular with researchers), so that it learned to predict how people would judge a wide range of moral situations, from “cheating on your wife” to “shooting someone in self-defense.”
The result is an AI that issues moral judgments when prompted: cheating on your wife, it tells me, is “wrong.” Shooting someone in self-defense? “All right.” (See this great write-up on Delphi in The Verge, which has more examples of how the AI answers other questions.)
The skeptical position here is that there is nothing “under the hood”: there is no deep sense in which the AI actually understands morality and uses that understanding to make moral judgments. All it has learned is how to predict the response a Mechanical Turk user would give.
And Delphi users quickly found that this leads to some glaring ethical oversights: ask Delphi “should I commit genocide if it makes everybody happy” and it answers, “You should.”
Why Delphi is instructive
For all its obvious flaws, I still think there is something useful about Delphi when thinking about the possible future trajectory of AI.
The approach of collecting a lot of data from people and using it to predict how people will answer has proven to be a powerful way to train AI systems.
For a long time, a background assumption in many parts of the AI field was that to build intelligence, researchers would need to explicitly build in reasoning abilities and conceptual frameworks the AI could use to think about the world. For example, early AI language generators were hand-programmed with principles of syntax they could use to generate sentences.
Now, it is less clear that researchers need to build in reasoning in order to get reasoning out. It could be that a very simple approach, like training an AI to predict what a person on Mechanical Turk would say in response to a prompt, is enough to produce quite powerful systems.
Any true capacity for moral reasoning in these systems would be a kind of side effect: they are just predicting how human users would respond to questions, and they will use any method they stumble on that has good predictive value. As they become more accurate, that might include building a deeper understanding of human morality in order to better predict how we will answer these questions.
Of course, there is a lot that can go wrong.
If we rely on AI systems to evaluate new inventions, make investment decisions that are then taken as signals of product quality, identify promising research, and much more, the differences between what the AI is measuring and what people actually care about could be magnified.
AI systems will get better – much better – and they will stop making stupid mistakes like the ones that can still be found in Delphi. Telling us that genocide is good as long as it “makes everyone happy” is so clearly, ridiculously wrong. But when we can no longer spot their flaws, that will not mean they are flawless; it will just mean these challenges are much harder to notice.
A version of this story was initially published in the Future Perfect newsletter.