Turing Test

Written by

Marc Mercier

Summary

The Turing Test is a method proposed by Alan Turing to see if a machine can imitate human conversation well enough that a person can’t tell they’re chatting with a computer.
Rather than proving that a machine “thinks” like a human, the test checks if it can convincingly mimic human behavior through dialogue, sometimes using tricks like typos or casual speech.
Passing the Turing Test wouldn’t necessarily mean a machine has consciousness; it would only show it can imitate human conversation convincingly, which sparks debates about intelligence and what it means to “think.”

What is the Turing test and how does it work?

The Turing Test is an AI test to see whether, through a chat conversation, a computer can convince a human that it is human. A human is asked to judge whether the “person” they are speaking to is a human or a computer. If they judge that they are speaking to a human but they are actually speaking to a computer, the computer has passed the Turing Test.

Essentially, it is a test to assess whether a computer can imitate a human so convincingly that it can fool a human into thinking that they are speaking to a human. Of course, there are many things to unpack about this test.
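The pass condition described above can be written down as a small sketch. The `Trial` record and the `fooled_rate` helper are hypothetical names introduced here for illustration; the original test does not prescribe any particular bookkeeping or pass threshold.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One conversation judged by one human (hypothetical record type)."""
    partner_is_machine: bool  # ground truth, hidden from the judge
    judged_human: bool        # the judge's verdict after the chat


def machine_fooled_judge(trial: Trial) -> bool:
    # The machine passes a trial when the judge says "human"
    # but the conversation partner was actually a machine.
    return trial.partner_is_machine and trial.judged_human


def fooled_rate(trials: list[Trial]) -> float:
    """Fraction of machine conversations in which the judge was fooled."""
    machine_trials = [t for t in trials if t.partner_is_machine]
    if not machine_trials:
        return 0.0
    fooled = sum(machine_fooled_judge(t) for t in machine_trials)
    return fooled / len(machine_trials)
```

Reporting a rate across many judges, rather than a single verdict, is one assumed way to make the result less dependent on any one judge.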

What's the point of the Turing test?

This may seem like a strange question as the point seems obvious: to know whether a machine can convincingly imitate a human in a chat conversation. There are some deeper considerations, however.

Are we testing whether a machine can genuinely imitate a human in terms of underlying thought or intelligence or just fool a human that it is human? There is a difference.

Imitating humans in terms of underlying thought or intelligence is what people typically have in mind when they think about the Turing Test: that humans genuinely cannot tell the difference between chatting with a human and chatting with a machine. This is not how the test was initially conceived, because “tricking” the human was allowed. For example, making typos might be a way for a computer to trick a human into believing it was human, on the assumption that a machine would never make a spelling mistake.

The underlying problem is that tests have rules and are therefore inevitably flawed in some ways. For example, how long you speak to the test subject matters. It’s easier to imitate a human over 5 minutes than over one hundred hours of conversation. Tricks might work in the 5-minute version but not in the hundred-hour version.

Does it matter who performs the Turing test?

A scientist with training on how to spot machines versus humans will be much harder to fool than someone off the street with no training – not just because of the scientist's ability to evaluate answers but also because of knowing what questions to ask.

Even if the computer has a level of “thinking” and intelligence at the level of a human, that may not be enough to fool the tester. That is because the computer could be too perfect or too unemotional in its responses.

There are even philosophical considerations around the Turing Test, such as whether computers reaching a generalized human-level intelligence would mean that machines can “think” or are conscious. This was in part a question that Alan Turing was trying to bypass with this test: if a machine can accurately imitate a human, then for all intents and purposes it is “thinking”.

Of course, that does not mean it has consciousness or that it is thinking in the same way that a human thinks. In fact, it’s guaranteed that it does not think in the way that humans think. The real interest in this question lies when seen from a practical point of view. Aircraft fly, for example. That is what is important. It’s much less interesting that they don’t imitate birds in the way they fly.

The Turing Test is interested in the results, not in the way the results are achieved.

A more important point is that the Turing Test is understood generally to describe a state of affairs where machine intelligence has reached at least human level intelligence. It is a much smaller group that is interested in the question of whether a machine has technically passed a Turing Test considering all the flaws described above.

While passing a Turing Test could be an impressive technical feat, especially if the test is long-running and run by knowledgeable people, it is much less impressive than a machine that could fool all the people, all of the time. Of course the longer the period of time over which the test is run and the higher the level of expertise of the evaluators, the more likely these two scenarios converge.

Are we near a computer passing the Turing test?

Now that you understand what the test is, the next question must be "are we anywhere near a computer passing the test?" (i.e. achieving generalized human intelligence). The short answer is “No”.

While there has been tremendous progress in Natural Language Processing (NLP), in particular in a computer's ability to identify the intention behind a single spoken phrase (the technology driving voice assistants), we are very far from a generalized human-level intelligence.

It turns out that current technology is not very good at ambiguity (understanding the meaning behind ambiguous statements), memory (incorporating previously stated facts into the current conversation) or context (factoring in facts that are unstated but relevant to the current situation). In short, the current technology is nowhere near what is needed.

Part of the problem is current AI technology needs to learn using huge amounts of data. Any domain where there is a huge amount of repetitive data available is ripe for introducing AI, for example speech recognition and image processing including self-driving cars.

Success in NLP is driven by the fact that there is almost unlimited data for one-off statements and questions with no context or memory. If I say “I want to buy orange juice”, it is in most cases a simple statement needing no additional information about context or memory to understand. The intention is: “Buy Orange Juice”.

When there is context or memory involved, this creates dimensionality. If I say I want to “buy orange juice” but I have previously told you that I am a financial trader who trades in orange juice, then you need to understand that in this context I want to buy a financial instrument that will make money if the price of orange juice goes up.

So now what does our data look like? “Buy orange juice” means buying a bottle of orange juice from the shop OR, if the user has previously stated that they are a financial trader in orange juice, it means they want to buy a financial instrument linked to the price of orange juice.

What if our financial trader has just said he is thirsty? Then he means he wants to buy a bottle of orange juice from the shop. So we add another data point: OR, if the user has previously stated that they are a financial trader in orange juice but has recently stated that they are thirsty, it means they want to buy a bottle of orange juice.
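The orange juice cases above can be sketched as a small rule-based resolver. This is only an illustration under stated assumptions: the fact labels (`is_orange_juice_trader`, `is_thirsty`) and the intent names are hypothetical, and hand-written rules are a stand-in here, not a description of how any real NLP system works.

```python
def resolve_buy_orange_juice(context: set[str]) -> str:
    """Resolve the phrase "buy orange juice" using facts stated earlier.

    `context` holds hypothetical labels for facts the user has already
    mentioned in the conversation.
    """
    # Most specific rule first: a thirsty trader wants a drink, not a contract.
    if "is_thirsty" in context:
        return "buy_bottle_of_juice"
    # A trader who has not mentioned thirst means the financial instrument.
    if "is_orange_juice_trader" in context:
        return "buy_financial_instrument"
    # Default reading when no relevant context has been stated.
    return "buy_bottle_of_juice"
```

Every new fact that can change the meaning needs another rule, or more training examples, which is the growth problem the trader example points at.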

A financial enterprise would quickly run into problems if it launched a trading bot that users believed had human-level "intelligence".

Is passing the Turing test impossible?

Conversation data has many dimensions, unfortunately. Infinite dimensions. This means that the machine learning algorithms would need to have access to a dataset that had large amounts of data for every possible dimension, and that is of course impossible.
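A rough way to see why the data requirement explodes is to count contexts. The sketch below makes a simplifying assumption that is not in the argument above: each relevant fact is a yes/no item that either has or has not been stated. Real conversation is far less tidy, so this is a lower bound for intuition only, and `context_combinations` is a hypothetical helper.

```python
def context_combinations(num_facts: int) -> int:
    """Count distinct contexts when each of `num_facts` yes/no facts
    may or may not have been stated earlier in the conversation."""
    if num_facts < 0:
        raise ValueError("num_facts must be non-negative")
    return 2 ** num_facts


if __name__ == "__main__":
    for n in (2, 10, 40):
        print(n, context_combinations(n))
```

Two facts give 4 contexts and ten facts give 1,024; by forty facts the count is over a trillion, and a dataset would need many examples for each one.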

This does not, of course, mean that passing the Turing Test is impossible. We know it’s possible because we already have the technology to do it, in our brains. Just like people hundreds of years ago knew that flight was possible by observing birds flying.

The issue is that our approach to AI in this area cannot be built on big data, because big data with sufficient dimensionality does not exist. There are simply too many variables, too many dimensions. Even as we speak Google gets 800 million searches a day that it has never seen before. That gives you a clue as to how difficult the data approach would be.

Ray Kurzweil at Google is following an approach that to some extent tries to replicate the human brain. He has estimated that we will get to generalized intelligence and be able to pass a very hard Turing Test by 2029.

His forecast is based on the assumption that progress in this field will be exponential and therefore even relatively modest progress today is much more significant than it seems if you assume that we are on an exponential trajectory of progress.

Whether he is right we will have to wait and see, but what it does tell you is that it is highly unlikely that the breakthrough will happen in the next 10 years.

What would it mean for a machine to pass a credible Turing test?

The final question is what it would mean if a machine passed a credible Turing Test. If the machine passed the test using some sort of big data approach, similar to the way machines beat humans at board games, even sophisticated ones, the implications would not be as great as if the machine passed it using a brain replication approach.

The brain replication approach would mean that the machine is likely to be closer to “thinking” in the way that we define thinking as humans. It could extrapolate meaning from minimal examples in the way that humans do, rather than need hundreds of examples of the exact case to extrapolate meaning.

As mentioned above, it is more likely that a “brain replication” approach will provide the breakthrough as a big data approach is not possible. This would likely mean that machines would have achieved a general intelligence, not just in conversation, but in multiple domains.

The implication of this cannot be overstated, as this would likely lead to a complete reset of society. This is especially true if machines have the ability to improve themselves in meaningful ways, which would open up the possibility of an exponential increase in their intelligence, in a virtuous circle that will change life as we know it.

Humans' interaction with machines

Sticking to more mundane matters, it is worth bearing in mind that even if a machine were the equivalent of a human, that does not mean we would interact with it the way we do with humans. Interacting with humans is not always efficient. Trying to explain to your colleague how to do something over the phone can be tedious and inefficient in situations where it would be easier to show them how to do it. If only humans had a graphical interface available over the web!

Voice interfaces (or chat-based interfaces) clearly have limitations in terms of inputting or outputting information. There are situations where it is much more efficient to show information graphically, or click on a graphical interface, than to use a voice interface. Bot platforms are therefore designed to always try to get the user back to the happy path and not let the conversation meander.

My point is also that computers are not limited like humans in terms of the interfaces they can use to receive or provide information and therefore conversations with machines will necessarily involve using the optimal interface for the task at hand.

While passing the Turing Test would be a huge milestone in terms of human / computer interaction, the actual human / computer “conversations” will not be limited to just voice and text.

FAQs

How does the Turing Test compare to other benchmarks for AI, like the Winograd Schema Challenge or the ARC Challenge?

The Turing Test checks if AI can mimic human conversation, but newer benchmarks like the Winograd Schema Challenge and ARC Challenge focus more on reasoning, common sense, and problem-solving: things that reveal deeper intelligence rather than surface-level imitation.

Is the Turing Test still considered relevant in modern AI research, or are there better alternatives today?

The Turing Test is still a useful thought experiment and milestone, but many researchers now see it as outdated. Modern tests focus more on measuring actual understanding, logic, and generalization.

How does cultural or linguistic bias affect the results of a Turing Test?

Cultural and linguistic bias can affect the results. AI can misunderstand idioms, humor, or references tied to specific cultures or languages, which makes it easier to spot as non-human in certain contexts.

How would passing the Turing Test redefine what it means to be “human”?

If a machine passed a rigorous Turing Test, it might force us to rethink whether human-ness is about biology or behavior and what makes our way of thinking so unique after all.

What types of questions are typically most effective at exposing non-human traits in AI?

Questions that rely on context, emotional nuance, or real-world common sense, like interpreting sarcasm, vague references, or conflicting information, are usually the quickest giveaways.