Whether machines can in fact think was, Turing believed, a question “too meaningless to deserve discussion.” Nevertheless, the “Turing test” has become a benchmark for machine intelligence. Over the decades, various computer programs have competed to pass it using cheap conversational tricks, with some success.
In recent years, wealthy technology companies, including Google, Facebook and OpenAI, have developed a new class of computer programs known as “large language models,” with conversational capabilities far beyond the rudimentary chatbots of the past. One of those models – Google’s LaMDA – convinced Google engineer Blake Lemoine that it is not only intelligent but conscious and sentient.
If Lemoine was taken in by LaMDA’s lifelike answers, it seems plausible that many other people with far less understanding of artificial intelligence (AI) could be too, which speaks to its potential as a tool for deception and manipulation in the wrong hands.
To many in the field, LaMDA’s remarkable facility with Turing’s imitation game is not an achievement to be celebrated. If anything, it shows that the test has outlived its usefulness as a lodestar for artificial intelligence.
“These tests don’t really get at intelligence,” said Gary Marcus, a cognitive scientist and co-author of the book “Rebooting AI.” “What they get at is the capacity of a software program to pass as human, at least under certain conditions. Which, come to think of it, may not be so good for society.”
“I don’t think it’s an advance in intelligence,” Marcus said of programs like LaMDA that generate humanlike prose or conversation. “It’s an advance in fooling people into thinking you have intelligence.”
Lemoine may be an outlier among his industry peers. Both Google and outside AI experts say the program does not and could not possess anything like the inner life he imagines. We don’t need to worry that LaMDA is about to become Skynet, the malevolent machine mind from the Terminator movies, anytime soon.
But there is cause for a different set of worries now that we live in the world Turing predicted: one in which computer programs are advanced enough that they can seem to people to have agency of their own, even if they don’t.
State-of-the-art artificial intelligence programs, such as OpenAI’s GPT-3 text generator and DALL-E 2 image generator, are focused on generating eerily humanlike creations using vast datasets and enormous computing power. They represent a far more powerful, sophisticated approach to software development than was possible when programmers in the 1960s gave a chatbot called ELIZA canned responses to various verbal cues in an attempt to fool human interlocutors. And they may have commercial applications in everyday tools such as search engines, autocomplete suggestions, and voice assistants like Apple’s Siri and Amazon’s Alexa.
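To give a sense of how rudimentary that 1960s approach was, the core of an ELIZA-style program can be sketched in a few lines: scan the user’s input for a keyword pattern and return a canned reply. This is an illustrative sketch only; the patterns and responses below are hypothetical stand-ins, not Joseph Weizenbaum’s original script.

```python
import re

# Hypothetical ELIZA-style rules: (verbal cue, canned response template).
# A captured group from the cue can be echoed back in the reply.
RULES = [
    (re.compile(r"\bI need (.*)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE),
     "Tell me more about your {0}."),
]
DEFAULT = "Please, go on."  # fallback when no cue matches


def respond(utterance: str) -> str:
    """Return the canned response for the first matching verbal cue."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return DEFAULT
```

Simply parroting fragments of the user’s own words back as questions was enough to convince some interlocutors they were talking to a person, which is precisely the kind of cheap trick modern language models no longer need.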
It is also worth noting that the AI field has largely moved on from using the Turing test as an explicit benchmark. Designers of large language models now aim for high scores on tests such as the General Language Understanding Evaluation (GLUE) and the Stanford Question Answering Dataset (SQuAD). And unlike ELIZA, LaMDA was not built with the specific intention of passing as human; it’s just very good at cobbling together and spitting out plausible-sounding answers to all kinds of questions.
Yet beneath this sophistication, today’s models and tests share with the Turing test the underlying goal of producing output that is as humanlike as possible. That “arms race,” as AI ethicist Margaret Mitchell called it in a Twitter Spaces conversation with Washington Post reporters on Wednesday, has come at the expense of other possible goals for language models. Those include making sure their workings are understandable and that they don’t mislead people or inadvertently amplify harmful biases. Mitchell co-authored a paper highlighting these and other risks of large language models with her former colleague Timnit Gebru; Gebru was fired by Google in 2020, and Mitchell in 2021.
While Google has distanced itself from Lemoine’s claims, it and other industry leaders have at other times celebrated their systems’ ability to fool people, as Jeremy Kahn pointed out this week in his Fortune newsletter, Eye on A.I. At a public event in 2018, for example, the company proudly played recordings of a voice assistant called Duplex, complete with verbal tics such as “umm” and “mm-hmm,” that fooled receptionists into thinking it was human when it called to book appointments. (After a backlash, Google promised the system would identify itself as automated.)
“The Turing test’s most troubling legacy is an ethical one: the test is fundamentally about deception,” Kahn wrote. “And here the test’s impact on the field has been very real and disturbing.”
Kahn reiterated a call, often made by AI critics and commentators, to retire the Turing test and move on. Of course, the industry already has, in the sense that it has replaced the imitation game with more scientific benchmarks.
But Lemoine’s story suggests that perhaps the Turing test could serve a different purpose in an era when machines are increasingly adept at sounding human. Rather than an aspirational standard, the Turing test should serve as an ethical red flag: any system capable of passing it carries the risk of deceiving people.