A New Turing Test

This post is a part of a series on machine learning theory. It was written with the help of the logician and philosopher, D.A. Martin.

Alan Turing’s article, Computing Machinery and Intelligence, puts forth a well-known test for the intelligence of a digital computer. The test is based upon a cocktail party pasttime known as the Imitation Game. In the game, there are three participants: a man, a woman, and an interrogator. The woman and man are separated from the view of the interrogator, and the interrogator communicates with them via teletype (or some other typewritten medium). The man and the woman are identified by labels which do not reveal their gender (for example, “A” for the man and “B” for the woman). During game-play, the interrogator poses questions to A and B. The man’s task is to convince the interrogator that he is the woman while the woman’s task is the same. After some predetermined period of time (perhaps 10 minutes), the interrogator must determine the gender of the two contestants. A wins if he tricks the interrogator into believing that he is a woman. B wins otherwise.

The Turing Test is a modification to the Imitation Game where a digital computer plays the part of A and a human (of any gender) plays the part of B. Here, A’s task is to convince the interrogator that it is a human while B’s is to convince the interrogator of the truth. If A wins this game, the computer passes The Turing Test. Passing the test, Turing claims, is a meaningful, telltale sign of machine artificial intelligence. I shall use the term “machine” to mean “digital computer.” By “digital computer,” I shall mean the kind of electrical device equipped with a processing unit and memory. I shall be more explicit when I intend to refer to Turing machines, which are abstract, hypothetical rather than electrical devices. I shall also use X Machine or an “X application” to refer to a digital computer running the algorithm X. . The Turing Test has spawned an immense literature in several subfields of cognitive science. Arguably, it is also part of the reason for the pride of place given to question-answering in natural language processing (and AI in general).

Figure 1. The ganglion of intelligence. (Royalty-free. Vault Editions.)

Nevertheless, in philosophy, the importance and meaning of The Turing Test have been a matter of longstanding controversy. I shall try to argue in what follows that there is a central insight about artificial intelligence embedded in Turing’s proposal: artificially intelligent systems would be able to solve a general class of problems without being debilitated by problems of exponential explosions in time and space complexity. In this respect, they are like human minds and unlike garden-variety computer programs like word processors, search engines, and calculators, all of which are designed to solve a very specific and fixed range of problems. However, I readily admit to The Turing Test’s many idiosyncrasies and peculiarities. I further say that just as The Turing Test has a central insight, it makes a central mistake: it treats intelligence as an information-absolute notion rather than an information-relative notion. For this reason, philosophers have proposed hypothetical examples of machines which would be able to pass The Turing Test even though they are no more than digital repositories of the knowledge of their authors. In such cases, it would be implausible to attribute any sort of intrinsic intelligence to these machines.

I shall propose a new class of tests for machine intelligence which preserve the insights of The Classical Turing Test while eliminating its mistakes. These tests are tests of machine learning: ones which require a digital computer to form a correct theory about its environment from some evidence about the environment. Imposing a requirement that an intelligent machine be able to learn is to impose a constraint that it must extend the knowledge given to it by its authors.

Importantly, I shall diverge substantially from Turing’s philosophical methodology by distinguishing between two senses of the word test: the first sense of the word “test” is an operationalized certification procedure wherein a test-taker is asked to perform a task with adequate precision; another sense of the word “test” is the specification of a condition for the existence of a feature or property in the test-taker. These two senses of test are not unrelated: we often use certification procedures as operationalized means of verifying the existence of a particular condition. For example, a two-hour written and multiple-choice examination might be a procedure for verifying the existence of scholastic aptitude in the test taker. I charge that though Turing really only puts forward a certification procedure, he fails to be explicit about which necessary or sufficient condition the test is supposed to verify. I shall be more explicit about both: I will provide a sufficient condition test for artificial intelligence, and I shall also propose an operationalized procedure for verifying its existence in an agent which I call the Emile procedure.

The argumentative strategy of what follows will be to review a sequence of proposed problems with The Classical Turing Test. After discussing each problem, I shall try to resolve the problem by reformulating a new conceptualization of the test. In the end, my final formulation is what I call The New Turing Test.

The outline for the paper is then as follows. Section 1 is where I criticize a well-received interpretation of The Turing Test, which purports that it is a test principally concerning the verbal aptitude of an artificial agent. I argue that the oversized focus on verbal behavior in The Turing Test forsakes an insight embedded with the test: that an artificially intelligent agent must be a task-general agent. I also here discuss a well-known objection to The Turing Test proposed by Ned Block known as the Blockhead objection. I argue that the lesson of the Blockhead objection is that one must impose a requirement of mental architecture on intelligent agents, which I call the von Neumann-Harvard architecture. Section 2 is where I argue that a further requirement of a reasonable test of artificial intelligence is that an agent must be able to demonstrate that it can learn from a possibly changing environment and hence that it can extend the intelligence of its creator. In this section I also lay out the Emile procedure. I conclude with some ideas for further work in Section 3. The upshot of my argument will be this: the ability to exhibit a generalized scheme of learning is a better certification of intelligence than passing Turing’s original test.

The Classical Test and its Interpretation

In this section, I review some objections to Turing’s test. Given the voluminous literature Turing’s proposal has generated (for some classical discussion, see, for example, Pinsky (1951), Gunderson (1964), Purthill (1971), Millar (1973), Searle (1980), Block (1981), Dennett (1984); for more recent research in the logic and science of The Turing Test, see Bradford (1995), Sato and Ikegami (2004), Shieber (2006), and Shieber (2007)), it is here not circumspect to audit the full catalog of objections and controversies. However, the review of a reasonable smattering of these objections should be enough to conclude that there are sufficiently many problems with The Turing Test to show that it has some room for improvement. First, I shall discuss the interpretation which takes The Turing Test principally to be a test concerned with eliciting reasonable verbal behavior from a machine. I shall claim that such elicitation is really an indirect means of assessing whether a machine is task-general. Second, I argue—following an important objection by Block (1981)—that it is not sufficient for such a machine to be constructed solely as a record of its developer’s knowledge. Instead, I argue that it should abide by what I call the von Neumann-Harvard architecture.

Task Generality vs. Verbal Behavior

The first controversy is a long-enduring one spurred on by the following question: what is The Turing Test a test for? No doubt, one reason for the endurance of the controversy is because of Turing’s own laconic explanation of the purpose of The Turing Test in the first few paragraphs of Computing Machinery and Intelligence. Here, he begins by proposing that we consider the question “Can machines think?” Turing suggests that one can answer this question by giving definitions to the terms machine and think. Turing writes that it would be absurd to seek those definitions out by a statistical survey of their common usage. Instead, he writes that he wishes to pursue a less ambiguous question.

The new question is whether a machine could pass The Turing Test, and in the following passage Turing gives his description of the altered Imitation Game. Turing says nothing more about what sense of replace he means here or why he thinks the questions are closely related. One might interpret “replacement” here to mean to engage in some form of conceptual analysis. Hence, one might be led to the conclusion that Turing is making something like the following analytic claim:

() A machine is intelligent iff it exhibits reasonable conversational responses to conversational stimuli.

can be criticized on several grounds. Arguably, most significant is that the implies that passing The Turing Test is a necessary condition of machine intelligence and hence that verbal ability is a necessary condition of machine intelligence (cf., e.g. French (1990)). For example, consider the case of Ululabot, a system which communicates not in any humanoid tongue but in a series of howls, grunts, growls, and chirps. Were these ululations to be translated by an intermediary, we would discover that the system could pass The Turing Test. However, given that the new Imitation Game requires that the system communicate directly in conversation, immediately and prejudicially renders the system unintelligent. Or consider the case of ImmanuelBot: a bot so advanced, it has become capable of moral reasoning. By hypothesis, ImmanuelBot has concluded there is something right in the moral philosophy of Kant and hence refuses to engage in the chicanery which would inevitably be involved in winning the Imitation Game; hence, in The Turing Test, ImmanuelBot plays a losing strategy on purpose. These well-known examples should be enough to let us conclude that the strongest claim we can charitably grant would be that the test is a sufficient condition for artificial intelligence, i.e.:

() A machine is intelligent if it exhibits reasonable conversational responses to conversational stimuli.

, of course, leaves open the possibility that there are many other reasonable tests of machine intelligence. What makes The Turing Test worthy of note? After all, the class of functions computed by even the simplest modern pocket calculator is a demonstration of some kind of acumen or ability. What makes verbal behavior an important feature in a test of intelligence?

One argument for its importance is the following: (Premise 1) the question Turing and others have been pursuing about whether machines can be intelligent is not about any kind of intelligence but specifically whether they can exhibit the same cognitive faculties of an (average) adult human; (Premise 2) the most important cognitive faculty of adult humans which distinguishes them from non-human animals is language-use; (Conclusion) reasonable verbal behavior in the face of verbal stimuli is a demonstration that machines have (or can simulate) roughly the cognitive faculties of an adult human. Call the sequence of reasoning above the language argument.

The language argument has been the received position among philosophers on the importance of the Turing Test. Indeed, the subtitle of Stuart Shieber’s anthology on The Turing Test is “Verbal Behavior as the Hallmark of Intelligence.” Presumably, the language argument is also what has made some philosophers believe that the Turing Test is presaged in Discourse on the Method, where Descartes writes that language use is what distinguishes humans from non-human animals.

There is much to the idea of the language argument. The first premise is uncontroversial: after all, if a system is to be considered artificially intelligent, its intelligence must artificial in reference to some other kind of system. Presumably, no one thinks that the reference system is the set of cognitive capabilities of some other non-human animal. However, I think the second premise of the language argument obscures what Turing must have thought a more important feature of intelligence: intelligent systems exhibit general-purpose ability. Turing’s own example of a portion of a successful Turing Test involves an interrogation in which the questioner first asks the computer to write a sonnet, then to to solve a problem of arithmetic, then to solve an endgame problem in chess.

Turing must have realized that almost all real-world computer applications are task-specific rather than task-general: the kinds of problems they solve are explicitly and narrowly specified before interaction with users or input. Task-specific applications can receive input and generate output in natural language with relatively few problems. In contrast, human minds are not only task-general but are furthermore ostensibly immune to combinatorial explosions in complexity. They are task-general problem-solvers whose procedures are tractable.

I shall try to argue here through an example that Turing’s test is a means of assessing whether a machine can be a tractable, task-general problem-solver, and the exhibition of verbal behavior required by the test is merely a conduit for identifying these more important features of machine intelligence. Consider a hypothetical application Directions, a natural language program which receives input regarding an address and returns directions to the address from the Stanford Quad. Because the task of the program is specified in full, providing a natural language interface to the task is rather trivial. Some code for the application here follows:

  def directions(input):
    address = parse_address(input)

if address is None: print("I don't know that address.") return; paths = path_find(Constants.STANFORD, address) for path in paths: print_transition(path)
Algorithm 1.1 Directions

I leave the details of the subroutine PathFind unspecified. No doubt, this is the heart of the application, but we should be convinced from the existence of many GPS and mapping applications that there is an effective procedure and map data to compute it. The two key subroutines which provide the natural language interface for the program are ParseAddress (which parses natural language input) and PrintTransition (which produces natural language output).

I leave the details of the subroutine PathFind unspecified. No doubt, this is the heart of I note here that it is trivial to construct ParseAddress so that it responds reasonably to a variety of natural language input. For example:

  • How do I get to 115 Center St., Berkeley, CA?
  • 115 Center St., Berkeley, CA????
  • Show me how: 115 Center St.

One simple implementation uses a regular expression (for example, something like number capitalized word abbreviation ) to extract an address from the input string. This simple strategy is so flexible, it allows the subroutine to return reasonable results even when presented with malformed or ungrammatical input such as the last two examples above.

Likewise, the subroutine PrintTransition can be made reasonably “conversational.” Perhaps it might return something like:

First head down Palm Drive. When you get to the end it will turn into University Drive. Next, head down the street until you get to the 101 …

The trick here is to compile a catalog of embeddable “direction” phrases which can be selected at random based upon the index of the path transition and kind of transition involved (for example: “Then you turn [direction] when you get to [street],” “ When you hit [street], go [direction],” etc.).

Is Directions an intelligent machine? Albeit restricted, it is hard to deny that it engages in a kind of reasonable, responsive verbal behavior. However, this feature of the machine ought to be, I contend, irrelevant to our considerations about its intelligence. Whatever kind of intelligence we take the machine to be exhibiting is likely tied with how simple or complex we find the problem solved by the PathFind subroutine (which I have called the “heart” of the application). The reason is that the natural language interface of the Directions system is an essentially superfluous addition to the map problem being solved by the system. In fact, there is a good reason why most task-specific systems—such as mapping systems, calculators, etc.—do not require or use natural language input: since the problem-solving task at hand has been narrowly and formally specified, it is generally the case that there is a more efficient way to communicate that input in a regimented, formal way. For example, it seems pointless to require input to an Adder to be formulated verbally—why say “What is the sum of thirty-five and twelve”—when we can simply send a vector of numbers and a symbol for an operand? Likewise, the optimal regimentation for input in the Directions system is just the address of the destination.

The fact that input can usually be regimented efficiently is a feature of task-specific systems but not task-general systems: since a general-purpose reasoner ranges over many (perhaps even infinitely many) different kinds of tasks, the input format presumably must likewise be general-purpose. Moreover, the input format must be general enough to specify not only any sequence of input but an indication of the requested task. A simple language like the language of arithmetic is too domain-specific for such a purpose (a suitable language would have terms which refer to more than merely the numbers). Turing’s intuition must have been that if anything could be a suitable format for information exchange in a task-general setting, it must be natural language. (Turing himself writes in the opening section of Computing Machinery and Intelligence that “[t]he question and answer method seems to be suitable for introducing almost any one of the fields of human endeavour that we wish to include.”)

Figure 2. Another ganglion? (Royalty-free. Vault Editions.)

However, the fact that natural language is suitable for these purposes should not obscure the other fact that the Imitation Game is a test not for merely exhibiting verbal behavior but instead uses conversation as a means to confirm or disconfirm whether a system is task-general rather than task-specific. The ability to return conversation when stimulated with conversation can at best only be the input/output interface for evidence of deeper ability. My suggestion here is then that one moral of The Turing Test is the following sufficient condition:

() A machine is intelligent if it is a task-general machine.

A skeptic might at this point be inclined to object that this dialectical move trivializes the question of machine intelligence. In particular, we already know there are task-general machines since we can prove the existence of universal Turing machines. Yet the existence of a UTM does not show that there is a machine that has the intelligence of an adult human being. I propose two responses to this question. First, I think it remains an open question whether the class of Turing machines exhausts all effective procedures that a human being or indeed any being could execute (or put another way, the Church-Turing thesis remains an open question). If average, adult human beings turn out to be able to compute superrecursive procedures, and if furthermore a condition of an artificially intelligent machine is that it be able to simulate the computations of an average human being, then the existence of a UTM is insufficient to show the existence of an intelligent machine. Second, conversational behavior of the kind required by The Turing Test also imposes an important computational constraint on the system’s performance: the algorithms employed must be able to keep up with the normal pace of conversation. That is, the complexity of the general-purpose algorithm must be roughly constant time. Hence, the question posed by the new Imitation Game is not merely whether there are task-general machines but whether there are sufficiently “fast” task-general machines.

Computational Architecture

I have been arguing that one central insight of The Turing Test is that a machine would be intelligent if it could tractably compute most (if not all) of the cognitive and intellectual tasks performed by the average adult human being and that, furthermore, this insight has often been obscured by the overemphasis on a machine’s ability or inability to exhibit conversational behavior. I now turn to what I think is the chief problem with The Turing Test, even when it is understood as a means of affirming or denying the task-generality of a system. I shall say that many problems arise for The Turing Test because of a failure to constrain intelligent systems to those which abide by the von Neumann-Harvard architecture. Failure to specify this constraint has, I argue, been responsible for some important objections to The Turing Test.

First, I will need to say something about what I mean by a “von Neumann-Harvard architecture.” Though the term is my own neologism, I think the concept of a von Neumann-Harvard architecture will be intuitive to anyone who has shopped for a retail computer: the standard way to architect a computer is to divide labor between its capacity for memory (the size of its RAM and disk space) and its capacity to compute (i.e. its CPU power).

Figure 3. The Harvard architecture.

Figure 4. The von-Neumann architecture.

In the mid-20th century, the Harvard architecture and von Neumann architecture were competing visions for computer organization and design. Figure 3 and Figure 4 illustrate the two models. However, they share the architectural principle: that working memory unit should be distinct from its processing or control unit, and both should be distinct from external, persistent mass storage.

The distinction between memory and storage is arguably nascently present in the model of the Turing machine since those machines are composed of a finite control and an unbounded amount of tape. However, the tape here also doubles as the input and output medium; moreover, the Turing machine has no notion of persistence—no data survives in between computations.

When I say that The Turing Test should be constrained to those computer programs that abide by the von Neumann-Harvard architecture, I mean those computer programs which divide their labor roughly equally between control, memory, and persistence. In particular, computer programs should not be simply tables of precomputed answers but, in some sense, active ratiocinators executing some procedure.

I hasten to add here that I make no actual restriction on the physical architecture of a machine. What must be emphasized here is that the machines divide their labor in the required way. One might here interject to suggest that the requirement I am actually proposing is simply to impose a requirement that the program make use of recursion. Although such a requirement is basically on the right track, it is too weak: an offline memoized program might make use of a dummy recursion method (I discuss the notion of offline memoization shortly). What is important is that processing resources are applied to an input to produce the output.

The failure to specify this constraint has given rise to an important counterexample to The Turing Test: Ned Block’s Blockhead example.

The Blockhead Machine. Ned Block asks us to imagine a machine concocted to pass the Turing Test. The machine has access to a tree representing all the possible conversations that might obtain between the machine and the interrogator in the new version of the Imitation Game. Each root node represents a question or request that the interrogator might make to start the game. For example, a first request might be “Give me the square root of 144.” From those nodes emanate edges to the other nodes which represent acceptable answers to the request or question in the ancestor node. Since The Turing Test is time-limited, we are restricted to only those questions and answers that could be typed within the limit (say only those sentences that take no longer than an hour to type.) Though the tree might be very large (indeed so large, no modern computer system could contain it), it is nevertheless finite. The construction of a Blockhead computer program would inevitably be difficult: the programmers constructing it would need to compile an incredibly vast compendium of worldly knowledge. However, the practical problems in constructing a Blockhead machine are no real impediment for the Blockhead objection. All that it is required is that it remains logically possible to construct such a machine. If so, then we have constructed a counterexample to The Turing Test as a sufficient-condition test for intelligence. By hypothesis, the Blockhead will respond with reasonable sentences to verbal stimuli. However, the Blockhead machine, Block claims, is no smarter than a toaster oven.

The Blockhead machine does not abide by the division of labor suggested by what I have been calling the von Neumann-Harvard architecture. Indeed, the Blockhead program consists mostly of a large collection of data (i.e. the tree residing in either transient or persistent memory) and a simple processing step (i.e. tree traversal). The simplicity of this processing portion is presumably what makes it no more intelligent than a toaster oven.

The strategy employed by the Blockhead machine might be termed offline memoization. Online memoization (or caching) is a strategy for optimizing computer programs which should be familiar to any software engineer. When a particularly processing-intense procedure is required to compute the value of a function, the value and the function’s argument can be saved in an input-output hash table, so that the table is consulted the next time the value of the function is requested. So long as there remains enough memory to hold the computed values, memoization is a means of optimizing a computer’s performance by exchanging space (i.e. memory) for time (i.e. processing). An example of an online memoizer is the following program which computes the factorial function.

def memoized_factorial(input): output = hash[input] if output is not None: return output output = fac(input) hash[input] = output return output

def fac(n): if n == 0: return 1 return n * fac(n-1)

Algorithm 1.2 MemoizedFactorial

(I hasten to add here, for students of computer science, that though recursion of the sort in Fac is seemingly elegant, implementations like this are slow. The reason is not because of big-Omicron issues but because each recursive call requires the allocation of a stack frame is computationally expensive. In a real-world application, such a function would be implemented iteratively or employ tail recursion.)

The Fac subprocedure computes the value of a factorial by making use of recursion. The number of recursive steps taken by the program is dependent upon the value of the input: in particular, it requires steps for an argument (assuming that the multiplication operation is atomic). However, the algorithm here also makes uses of memoization; once a value has been computed, it is saved in a data structure (in hash). Subsequent calls to the factorial function then only take a single step.

I have called the preceding kind of memoization online since it requires that the memoized function values be first computed by some kind of algorithm run through a processor. This kind of memoization can be distinguished from values which are memoized offline: they are simply embedded in the control code by the engineer. For example, the following modified version of the MemoizedFactorial function has an offline memoization for the input 10.

def offline_factorial(input): if input == 10: return 3628800 output = fac(input) return output

def fac(n): if n == 0: return 1 return n * fac(n-1)

Algorithm 1.3 OfflineFactorial

As with online memoization, offline memoization has a finitary character: at any given time in the execution of the program, only a finite number of function values can be memoized. However, once a computer program is written with offline memoization, the worst-case time complexity for a memoized value is simply the time complexity of the data structure in which it is stored. In particular, if these values are stored in a hash table, the time complexity is —the execution returns in a single step.

In the offline case of memoization, the memoized output value of the function cannot be said to be computed at all by the program. Instead, all that can be said is that the computer serves as a dumb terminal or teletype for passing data between programmer and user. At least with respect to its memoized values, the digital computer for OfflineFactorial serves not as a processor but as a medium for the exchange of information between human agents.

Blockhead is only a more extreme example of a computer program which makes use of offline memoization; its primary function is to provide a response to a sequence of strings representing the history of The Turing Test conversation. Since, by hypothesis, there are only finitely many of such sequences, the entire program can be memoized offline. But just as in the case of the memoized values in OfflineFactorial, the program serves only to communicate some knowledge or information between the programmer and the user; little should be attributed to the intelligence of the program itself in virtue of its processing. As Block rightly claims, the intelligence that the program exhibits is that of its programmers and not of itself.

The moral of Blockhead and offline memoization, I contend, is that two additional constraints must be placed on any test of artificial intelligence: we must be able to say (i) the intelligence exhibited by the machine is not merely a collection of data residing in memory but rather that the machine must compute or process information (i.e. it operates on the division of labor suggested by what I am calling the von Neumann-Harvard architecture); and (ii) we must furthermore be able to say that the machine is able to know or discover something that was not antecedently known or discovered by its creators. What I will attempt to argue for the remainder of this paper is that a sufficient condition test for intelligence is to exhibit existence of a task-general learning machine. Such a test implicitly incorporates these constraints.

Figure 5. Task-generality. (Royalty-free. Vault Editions.)

The New Test

In this section, I will make my case for general-purpose machine learning as a sufficient condition test for intelligence which both incorporates the Classical Test’s main insight while eliminating its weakness. In the first subsection, I shall try to give a definition of a general-purpose machine learner and then give some arguments about how the existence of such a learner would serve better as a test of artificial intelligence. In the final subsection, I review some objections one might have to my proposal.

The Condition and its Operationalization

By “learning machine,” I shall mean machines which have one of two fundamental forms:

Conditional Learning Machines. Let be a class of environments and let be a distance metric between the elements of . M is a conditional machine for a class iff given any objective environment , at stage , (i) makes a query of (or experiment on) the environment; (ii) receives a response ; (iii) makes a guess .

M conditionally converges upon E iff additionally (iv) for all , had (actually or counterfactually) continued to be the environment, there is an , where implies that . We shall say that M conditionally converges upon if the previous condition holds for the entire class.

Adaptive Machines. Let be any sequence of environments. M adaptively converges on the sequence if (1) M conditionally converges to ; and (2) if M has converged to for , then altering by makes M conditionally converge to . We say that M is an adaptive machine if it adaptively converges on any such sequence.

The definitions I have given above are informal generalizations of definitions given by Gold (1967). I call these “informal” despite the presence of some technical notation because: (i) I have left the notion of an environment unspecified; and (ii) because I have left unanalyzed how counterfactuals should be understood.

Before I give my definition of a task-general learning machine (the main concept in The New Turing Test), it will be useful to compare and contrast the definitions given above with those given in Gold's original framework. Gold's learning framework formalizes language learning (or what Gold himself termed “language identification in the limit”). One task in formalizing language learning is to specify how languages are to be represented. Gold's own solution is to represent languages as sets of finite sequences from a finite alphabet. This alphabet, usually represented as , could be the set of alphanumeric symbols found in standard English (i.e. A-Z, a-z, 0-9, and some punctuation), or it could be something broader such as the characters found in UTF-8. Let be the set of all finite sequences from the alphabet . We will say that is a language if and only if it is a subset of . Intuitively, the idea is that the set specifies all the grammatical sentences or expressions in the language.

In the Gold positive learning framework, the idea of a language plays the role of an environment in my definitions. A Gold learner receives a sequence of stimulus examples from the language present in the environment. For example, if the language is

then a stimulus sequence might be


The framework makes no requirement that the sequence is itself computable, though it requires that the sequence eventually enumerate all of the members of the language . After each stage of stimulation, the Gold learner—just like the learning machine in my environment—makes a guess of the language present in the environment. A Gold learner identifies the language present in the environment if after some finite period the learner correctly guesses the language and continues to guess that language in perpetuity. The framework is described as positive since the stimuli examples given to the learner are examples from the language. Gold proved a series of theorems about his learning framework, including a celebrated result sometimes called Gold's Theorem, which proves that the class of language learnable by a single learner from positive evidence is significantly limited.

I make no requirement that a learning machine only learn positive evidence. Gold proposed two further extensions to the positive learning framework. One extension is that the learner receive positive and negative evidence. For example, a stimulus sequence in the positive-negative framework might be as follows:


Here, the second element of each tuple indicates whether or not the given element is present in the language. Both of the two preceding Gold frameworks might be called passive, since the sequence of stimulus input is out of the control of the learner. Gold also proposes the notion of an active learner which can make inquiries into the environment at each stage of learning. For example, it might ask at stage whether or not “abc” is present in the stimulus language. In my definitions, this activity is characterized by the learning machine performing an experiment on the environment and receiving a result.

So much for the similarities between my definitions and Gold's own framework. What is different? The first difference, as I have already mentioned, is that my definitions leave the representation of the environment unspecified. It would be prejudicial to claim all environments or those most common in human learning are describable as subsets of finite sequences from a finite alphabet. First, each environment in this latter system has at most a denumerable number of elements. One might think a physical environment might need to be described using an uncountable number of points bearing mass. Second, it is also not clear to me whether all members of environments are permutations with repetition from something like a finite elemental set. For example, consider a hypothetical learning task of facial recognition: a computer is asked to grasp some generalities of physiognomy from a series of pictures. Perhaps the picture can be represented by a sequence of pixels, but should the range of the values of the pixels (i.e. a number representing its color) themselves be finite? Perhaps it would be more accurate if they were color values along a continuum.

The second difference between Gold's framework and my suggested definitions is that I furthermore leave unspecified the nature of stimulus examples and experimental outcomes. In the Gold framework, both are members (or annotated members) of the space of possibilities, . For example, in the positive learning framework, the stimulus might be the string . In the positive-negative framework and in the active framework, the stimulus examples are pairs, like . The requirements of my framework are closest to Gold's activist learning framework, though I consider my learners to be carrying out experiments. However, by contrast, I make no requirement that experimental outcomes are merely positive/negative indicators.

Finally, the most significant departure from Gold's framework in that I abandon the older framework's assumption of a mundus stabili. Whereas in the Gold framework the stimulus language is taken to be fixed, I allow the stimulus environment to be modified over the course of learning.

I am now in a position to characterize The New Turing Test. By task-general learning machine, I shall mean an adaptive machine whose range of learnable classes is roughly that of an adult human being. In particular, such a machine would be able to complete characteristically human learning tasks such as language learning, theory formation through induction, learning of causal relationships, learning rules of social convention, etc.

Moreover, such a machine should be able to complete these learning tasks in roughly the same time that a human would require.

The New Turing Test I am proposing is the following:

() A machine is intelligent if it is a task-general learning machine acting under the von Neumann-Harvard architecture division of computational labor.

is a sufficient condition test and not by itself an operationalized certification procedure. Heuristics for certification are available for The New Turing Test, though I think they will inevitably have to be more complicated than Turing's proposed procedure. Consider a direct analogue to the modified Imitation Game, called the Neoimitation Game: an interrogator is given 10 minutes (or some other fixed interval) to provide respondents with various learning tasks. For example, we might consider the following conversational segment as part of a run of the Neoimitation Game:

Q: Please continue the following sequence: 2, 4, 8, 16, 32 ...
A: The next five numbers are: 64, 128, 256, 512, 1024.
Q: Tell me a general pattern for the following sequences: aca, abbbbbbbca, abca, abbca.
A: `a' followed by zero or more `b's, followed by a single `c,' followed by an `a'.
Q: Give me a general function characterized by the following points: (1.3, 2.2),(2.1, 5.8),(3.7, 10.2),(4.2, 11.8)

Since the Neoimitation Game is a direct analogue of Turing's test, a learning machine is rendered task-general if an interrogator cannot tell the difference between the answers of the learning machine and the human. Though the Neoimitation Game is a reasonable start at an operational test for , it also has many problems (many of them inherited from the structure of Turing's modified Imitation Game). One feature that the Neoimitation Game has that seems inevitable to operational tests of The New Turing Test is that the procedure must make some comparison of a computer's behavior with a human's to certify that they are observationally indistinguishable. The same fact applies to the the classical Turing test; since the aim of artificial intelligence is to emulate a kind of intelligence we associate with humans, a test of observational indistinguishability with a human seems to be a natural aspect of an operationalized test.

However, there are many problems with using the Neoimitation Game as a procedure for verifying the requirements of The New Turing Test. First, as in the original test, the Neoimitation Game does not provide any quantificational standards for passing the test. Is it enough to pass the test a single time with a single interrogator? Must it pass all tests? If the latter, it seems certification of intelligence is only provisional since we can only observe that a machine has passed all of the tests up to the present moment. It seems the best we can do is to obtain a degree of confidence in the artificial intelligence of a machine.

A second problem is that the time limit of the Neoimitation Game is the source of many problems. For one thing, the time limit will inevitably be highly arbitrary. For another thing, the finiteness of time—at least in the case of the Imitation Game—allowed for counterexamples like Blockhead, where all the possible passing conversations were memoized. (We have required that learning machines operate under a division of labor suggested by the von Neumann-Harvard architecture to avoid this problem.) But furthermore, the sort of time limit suggested by the Neoimitation Game (say of 10 minutes or one hour) is incompatible with many of the tests of learning a machine would be expected to pass under The New Turing Test. For example, an archetypal task a learning machine would be required to complete would be language learning. Moreover, the learning machine would have to learn the language at roughly the same rate as an infant. But the time scale of such learning tasks in infants is measured in minutes or hours rather than months and years.

Finally, the Neoimitation Game fails to specify the knowledge state of the human respondent. For the test to be meaningful, it seems obvious that the machine and the human respondent must start off at equivalent information states. For example, in the case of language learning, suppose that the human respondent has acquired a first language but the machine has not. The fact that the human respondent may acquire the language more rapidly than the machine shows little about the performance of the machine's learning algorithms. For the comparison to be valid, presumably the machine and the human respondent must either both be performing first language acquisition or second language acquisition.

This last problem with the Neoimitation Game suggests another way forward for an operationalized procedure for The New Turing Test: take a newborn infant—call him Emile—and a candidate machine M. If M, when exposed to identical stimuli, can achieve the same developmental milestones as Emile then we have a measure of the degree of M's intelligence. The longer that M can keep up with Emile, the more we should be confident that M can meet the requirements set out by The New Turing Test.

The Emile certification procedure remains a sketch, and a few comments are in order on it. First, for the details of this sketch to be filled in, we must make progress on many important subprojects in developmental cognitive science. For example, we must have a complete profile of the milestones of infant development. Moreover, we must have an understanding of what environmental stimuli an infant receives, how they are represented, and in what way they can be simulated to the machine. Second, I suggest that the certification procedure be coupled with the statistical procedures found in standard clinical trials of pharmaceuticals. For example, just as we think that a drug's success in treating a single patient is inconclusive about whether the drug is effective generally, so we should regard the success of a machine in a single case of the Emile-procedure inconclusive. Instead, we should expect the machine to be successful with a variety of infants, just as in clinical trials, the human participants must come from a variety of socioeconomic and ethnic backgrounds. Finally, I think this statistical approach suggests that the certification provided by the test will not be categorical but a matter of degree. The preceding feature has the advantage that comparative judgments can be made between machines about their relative intelligence. Hence, we would also have a metric for assessing whether research programs in artificial intelligence were moving closer to success.

The Positive Case

Why would we consider the existence of a task-general learning machine a better sufficient condition of artificial intelligence than the existence of a task-general machine? I present three arguments.

Extending the Intelligence of the Programmer. First, I have been arguing (with Block) that for any intelligence to be attributed to the machine at all, its behavior must not be a mere, indirect exhibition of its creator's knowledge. That is, it must exhibit an aptitude for acquiring knowledge that was not explicitly defined in its control code. Though being a learning machine is not necessary to meet this requirement (see e.g. Sieve below), learners are par excellence examples of machines which exhibit more than merely the knowledge of their creators. Consider the case of grammar induction. Suppose a language machine, Jurafsky, has been constructed by Stanford computational linguist Dan Jurafsky. The machine operates in the following way: it has a microphone which can listen to and transcribe sample sentences from its environment. After about 100,000 stimulus sentences, Jurafksy produces an accurate grammar for the stimulus language. Importantly, by hypothesis, this machine is not merely a memoized list of all possible sequences of stimuli to grammars. Instead, it must employ algorithms which make use of general principles of mathematics, phonology, syntax, semantics, etc. Hence, the machine Jurafsky could be said to learn a grammar for Swahili even though Jurafsky, the linguist, might not know it.

Information Relativity. Second, I contend intelligence is an information-relative rather than information-absolute feature of cognitive agents and tests of learning better appreciate this feature, whereas The Classical Test fails to do so. For example, consider the case of a five year old—call him Meno—who is asked what the distance is between the origin and (3,4) on the Cartesian plane. Of course, solving this problem requires knowledge of the Pythagorean theorem (or at least what a Cartesian plane is and how to measure between points). But Meno's failure to answer this question correctly or incorrectly should be inconclusive regarding his “geometrical” intelligence. If he is incorrect, we couldn't reasonably judge him geometrically unintelligent: we could attribute his inability to do the problem to his youth. Moreover, even if he is correct, we might not be able to attribute to him geometrical intelligence without verifying how he arrived at his answer. For example, we would have to make sure it wasn't merely a guess or the product of rote memorization.

A better test would be whether, given a little coaching, Meno could learn the Pythagorean theorem and how to apply it more generally to solve problems of determining distance. But both the Classical Turing Test and the original Imitation Game appear to be tests of knowledge the agent already has rather than his or her potential to acquire it.

Consider the original Imitation Game first and Turing's own example of a turn of gameplay, where interrogator C asks a question of a player X who is either the man, A, or the woman, B:

The interrogator is allowed to put questions to A and B thus:
C: Will X please tell me the length of his or her hair?
Now suppose X is actually A, then A must answer. It is A's object in the game to try and cause C to make the wrong identification. His answer might therefore be
`My hair is shingled, and the longest strands are about nine inches long.'
(Turing, 1950, p.433-4)

Turing's idea in the example appears to be that, to be successful at the (original, gender-based) Imitation Game, A must have a modicum of gender-specific knowledge relative to the culture, time and place, of the interlocutor: that there is a hairstyle that is shingled, that it has shorter and longer strands, that a shingled hair style might have strands nine inches in length, etc.

A similar moral applies to Turing's modified Imitation Game played between the digital computer and interrogator:

Q: Please write me a sonnet on the subject of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764
A: (Pause about 30 seconds and then give as an answer) 105621.
Q: Do you play chess?
A: Yes.
Q: I have K and K1, and no other pieces. You have only K at K6 and R at R1. What do you play?
A: (After a pause of 15 seconds) R-R8 mate.
(Turing, 1950, p.434-5)

Here the machine must have a very general knowledgebase containing facts that would be known to the average English-speaking adult: that a sonnet is a kind of poetry; the basic rules of arithmetic (and how long it takes an adult to perform arithmetical computations); that there is a game of chess whose states and moves can be described in shorthand; that a rook can move along its horizontal and vertical axes, etc.

It is in this sense that I think The Turing Test treats intelligence as an information-absolute concept rather than an information-relative one. It is merely a test for how much knowledge a digital computer has (perhaps relative to a set of imagined interlocutors), not a more general test for its aptitude for acquiring knowledge through experimentation and problem-solving. But merely having knowledge, just as in the Meno example, is inconclusive about an agent's intelligence. Just as we are able to affirm Meno's intelligence by investigating whether he can learn the Pythagorean theorem after some coaching, we should, I contend, test how much digital computers can learn in assessing their intelligence.

Adaptivity. I have required that a task-general learning machine not merely be a conditional learning machine but an adaptive learning machine. The task-general learning machine must be sensitive to changes in its environment. A learning machine without the adaptivity feature can converge to an identification of its environment and then never change its mind.

Intelligent agents must be, I contend, belief-revisers. An agent that fails to be adaptive is, in essence, a dogmatic reasoner. That dogmatic reasoning is not intelligent is a consequence of three premises: first, by definition, a dogmatic reasoner is incapable of altering its beliefs; second, intelligent agents tend toward holding beliefs which are justified in virtue of features of their surrounding environment; and third, surrounding environments are susceptible to change.

One might here object to this simple argument and contend that there could be a non-adaptive agent, which could nevertheless reflect the changing features of a surrounding environment despite having a fixed set of immutable beliefs. For example, consider a machine, ConditionalBlockhead, which, in addition to its control, transient memory, and persistent memory also has access to a clock. Like Blockhead, it contains (in memory) a memoized tree-like structure which contains responses to all possible questions. However, in ConditionalBlockhead, these responses are indexed by time. Since the original Blockhead program is “unconditional” in that its responses to questions are programmed without respect to time, it only has one answer (say, J.F.K.) to an opening question like “Who is the current president of the United States?” By hypothesis, Blockhead is preprogrammed with information from its creators to pass The Turing Test on a single run. However, it is vulnerable to a different kind of unintelligent behavior: when run in 2019, Blockhead still continues to state that Kennedy is the president of the United States. Given that its programmers have sufficient foresight, ConditionalBlockhead could be programmed to give different responses at different times. Importantly, however, the machine never changes its beliefs, but it nonetheless seems to reflect changes in the surrounding environment.

But this objection is mistaken for two reasons: first, since I am taking the view that the control code of any algorithm is finite, a memoizing machine like ConditionalBlockhead can only change its beliefs a finite number of times. If certain features of the environment were to change in perpetuity, then at some point a memoizing ConditionalBlockhead would fail to reflect those changes in the surrounding environment. Of course, one way to compute a function with an infinite domain using finite code is by using recursion (as in the Fac procedure above). But now we come to the second problem with the objection: even if a recursive version of ConditionalBlockhead were to be constructed and further be accurate for perpetual changes, it would only be accurate for one possible set of changes. Suppose, for example, a recursive version of ConditionalBlockhead had a function which returned the current president of the United States when given the current time. Since the function it computes is deterministic, this machine could only return one fixed pattern of output. For 2004, it might return “George W. Bush”, and for 2008, it might return “Barack Obama.” However, this machine would be insensitive to counterfactual changes: it would simply be incorrect in the situation (or possible world) in which John Kerry won the 2004 election and Hillary Clinton won the 2008 election.

I have been arguing that three features of The New Turing Test make it a better sufficient condition test of artificial intelligence above and beyond the Classical Turing Test: first, it requires that an intelligent machine do more than merely exhibit the knowledge of its creators; second, The New Test treats intelligence as an information-relative phenomenon rather than simply access to an extensive knowledgebase of information; and finally, The New Turing Test requires that artificially agents be sensitive to actual and counterfactual changes of environment. I now turn to a series of objections to the proposal I am making here.

Objections and Rejoinders

Objection #1a: A Learning Blockhead. According to this objection, the requirement that a task-general learning machine accord with the von Neumann architecture is feckless. Indeed, a Blockhead-type program could be implemented on a modern-day laptop with a sufficiently large store of memory. Since the architectural constraint does little to block the Blockhead, it leaves open the possibility of a LearningBlockhead; in this sort of program, guesses at an environment when given a sequence of stimuli are entirely memoized.

Rejoinder #1a. Firstly, The New Turing Test makes no requirement that a learning machine be implemented on a particular architecture. Instead, the requirement is that the machine abide by a division of labor suggested by the components of the von Neumann-Harvard architecture. That is, the machine must do some processing and not be a merely memoized collection of data stored in memory. In the Blockhead, this division of labor is off-balance: a large store of data is coupled with a simple processing step. (That Blockhead is so off-balance is indicated by the fact that it is constant in its time complexity but exponential in its space complexity.)

The author of objection might here respond that one could restore the division of labor while leaving the behavior of Blockhead unchanged. For example, one might propose the following machine, called TravelingBlockhead. This machine is just like Blockhead except that prior to giving a response to its interrogator in The Turing Test, it takes some time to compute an algorithm which is exponential in its time complexity—in particular, it computes something to solve the Traveling Salesman Problem. The division of computational labor has been restored, but the source of the machine's responses is the result of a “dumb” computational step.

First, I think that it is disputable that TravelingBlockhead does compute under the required division of computational labor. In particular, the requirement is not merely that the machine do processing along with making use of memory but that its computed output was the consequence of both processing and using memory. In this case, the output is still the consequence of a hash table lookup coupled with an unnecessary processing step.

Second, it is not readily apparent whether TravelingBlockhead would pass The Turing Test; the original Blockhead machine could easily keep up with the pace of the interrogator's questions since its computational step was constant in time complexity. Adding an additional, computationally expensive step would by no means ensure that the machine could keep up with the pace of conversation.

But third (and most importantly), I claim that there is no memoized analogue for the learning tasks required by The New Turing Test. The reason that the Blockhead machine could completely memoize its input and output is that all of the possible inputs (conversational sequences) were finite (though potentially very large). Hence, a programmer could write out individual responses for conversational sequence. The finitary nature of the original Turing Test is a consequence of the fact that the test is meant to be held for a fixed period of time (say, 10 minutes). Block therefore argues that there are only a finite number of sentences that can be typed in such an interval. The New Turing Test and its Emile certification procedure have no feature. But moreover, the nature of the learning tasks required by The New Turing Test makes the domain of input infinite rather than finite. It suffices to show just one case of a learning task whose domain of stimulus input is infinite. One such case is language learning. A machine successful at language learning would have to be able to handle all possible English language inputs (and presumably support for many other languages). In the Emile certification procedure, such a machine might be required to match the language acquisition of Baby Jeffrey born on the eve of a presidential election in 1984. Baby Jeffrey hears his first words from his father, James, who is smoking a cigarette in the hospital and watching the evening news:

James: Damn, Reagan's anti-ballistic missile program is stupid.

For a machine to completely memoize its language learning function, it would have to have responses to all possible stimuli sequences. But even the list of all possible first stimulus examples is finite, since Baby Jeffrey could have heard

James: Damn, Reagan's anti-anti-ballistic missile program is stupid.


James: Damn, Reagan's anti-anti-anti-ballistic missile program is stupid.

had the arms race with the Soviet Union transpired differently. This should show that a task-general learning machine could not pass The New Turing Test while being an analogue of the Blockhead.

Objection 1b: The Mortal Blockhead Objection. My interlocutor might here object that my claim that the domain of the stimulus input is infinite is misleading, and there are a few simple tricks to allow us to construct a memoized learning machine which could pass my proposed Emile certification procedure. To this end, my interlocutor proposes a machine called MortalBlockhead. Just as in the original Blockhead, the essential trick was to argue that the Classical Turing Test could only be held in a finite period of time. Hence, the space of possible conversations is also finite. But, here my interlocutor suggests that the mortality of humans gives us a similar trick. Since a human lives at most for, say, 120 years, the possible sequences of learning input are finite (though perhaps, just as in the original Blockhead objection, this space might be very large).

Rejoinder 1b. The first problem with this objection is that it relies on a premise that human minds and their learning capabilities are finitely limited. One might dispute whether finiteness of lifespan is a necessary rather than merely contingent feature of human minds. Perhaps more importantly, an author of a memoizing MortalBlockhead would have to choose a limit for the length of stimulus sequences based upon beliefs concerning how long humans can live. But surely this fact has to be contingent upon the state of medical science.

Perhaps this rebuttal engages in a little lightweight sophistry (or at least some suspicious armchair medical speculation). However, I think the objection makes a more significant mistake in interpreting The New Test. I have been emphasizing that there is a difference between The New Turing Test and the Emile certification procedure which is a heuristical and statistical test for examining whether a machine satisfies the sufficient condition set out by . explicitly forbids memoized machines since t requires that the machine divide its labor in the way suggested by the von Neumann-Harvard architecture.

Now this response might be extremely dissatisfying: is this requirement little more than an ad hoc modification speciously designed to avoid the insight found in the Mortal Blockhead objection? I claim that there is something more to the division of labor requirement than an ad hoc means of avoiding Blockhead-style machines. What the division of labor requirement is meant to enforce is the idea that, for a task to be considered learning, it must be in some way ampliative: a learning agent must enlarge his or her corpus of knowledge or belief through processing from evidence. This is precisely why no Blockhead may be a learning machine: it never gains more than what it had from the beginning a priori, and it never loses anything either. In essence, such a machine is not a belief-reviser at all. The division of computational labor requirement is a means of making sure that the machine in question engages in belief-revision.

Objection 1c: Finitism from Neurobiology. A third objection, related to the last, claims that human learning capabilities must be finite since brains themselves are finite physical objects made up of finite neurons and axonal connections. Therefore, the number of input stimuli a brain might represent is likewise finite. Hence, we might construct a learning machine Blockhead which responds to only this possibility space of input stimuli.

Rejoinder 1c. The fact that any physical device is finite in the space it occupies does not prevent it from implementing a recursive procedure; recursion is a means of specifying a infinite function from a finite instruction set.

Objection 2: The Lady Lovelace Objection. Turing (1950) considers an objection which he calls the Lady Lovelace Objection, due to Lady Ada Lovelace, the 19th-century English mathematician who studied and wrote about Charles Babbage's Analytical Engine. Here is Turing's own presentation of the objection and his response:

Our most detailed information of Babbage's Analytical Engine comes from a memoir by Lady Lovelace (1842). In it she states, “The Analytical Engine has no pretensions to originate anything. It can do whatever we know how to order it to perform” (her italics). … This whole question will be considered again under the heading of learning machines. … A variant of Lady Lovelace's objection states that a machine can “never do anything really new.” This may be parried for a moment with the saw, ”There is nothing new under the sun.” Who can be certain that “original work” that he has done was not simply the growth of the seed planted in him by teaching, or the effect of following well-known general principles. A better variant of the objection says that a machine can never “take us by surprise.” This statement is a more direct challenge and can be met directly. Machines take me by surprise with great frequency. (Turing, 1950, p.450)

Rejoinder #2. In some respects, the Lady Lovelace objection might pose more serious consequences to my proposal than Turing's, since I have based mine more forthrightly on the existence of learning machines. If there can be no learning machines, then it seems either something is wrong with my account, or we have a negative answer to the question about the possibility of artificial intelligence.

However, I argue that Lady Lovelace's objection here is either misguided or simply incorrect, depending on how the objection is interpreted. The first interpretation (which Turing himself considers) is that the objection claims that a digital or mechanical computer is incapable of thinking for itself. If this is how we are to interpret the objection, then it is simply irrelevant to the claims I have been making in defense of The New Turing Test. Like Turing, I make no specific claims about whether machines can entertain thoughts, if only because it would require an account of thought (which is out of the scope of my ambitions here). The other natural interpretation is that computers are incapable of generating new knowledge (or as Turing puts it, taking us by surprise). But I think this claim is subject to counterexamples. Consider, for example, a machine which implements the Sieve of Eratosthenes, a 3rd-century B.C.E. algorithm to compute primes up to some number .

  def sieve_of_eratosthenes(n):
      m = n + 1
      natural_numbers = [True for i in range(m)]
      for i in range(2, int(n ** 0.5 + 1)):
          if natural_numbers[i]:
              for j in range(i ** 2, m, i):
                  natural_numbers[j] = False
      primes = []
      for i in range(2, m):
          if natural_numbers[i]:
              print("{0} is prime".format(i))
      return primes


Algorithm 1.4 SieveOfEratosthenes

(Unlike the other programming examples in this paper, which contain many dangling terms, this Python program will properly run. Try it out. Adjust the value for more primes, but be forewarned—your laptop may heat up a little.)

The Sieve of Eratosthenes is a remarkably clever but simple algorithm: begin with a list of numbers and cross off the multiples of 2, 3, 5, 7, and so on. The trick is that the method of elimination allows us to find the next prime after 7 by simply looking for the smallest uncircled number greater than it. Should the Sieve of Eratosthenes catch us by surprise? By nature, an algorithm is a procedure where there is no indeterminacy about which step comes next in the process. However, I think there is also little doubt that, given a digital computer that is sufficiently fast, we could use the SieveOfEratosthenes to answer questions about primes which we could not reasonably compute unassisted like “What is the least prime less that ?” or “How many twin primes are there between 9 quadrillion and 10 quadrillion?”

Objection #3: The Thinness Objection. According to this objection, The New Turing Test conceives of intelligence too narrowly. Intelligence is more than the ability to learn; it also sometimes consists of the ability to use learned knowledge for problem-solving and interacting with the environment.

Rejoinder #3. I agree with the claim that intelligent behavior sometimes consists of more than learning. When I defined conditional learning machines, I said that they must follow a pattern of having an interaction with the environment, receiving a stimulus response, and then making an identification of the environment. The nature of experiments, responses, and environments was never given any formal characterization (which is why I maintained that these were informal rather than formal definitions). However, nothing about the definition of an experiment prevents it from being a simultaneous use of learned knowledge. Consider, for example, a variant of the machine Jurafsky described in the preceding section. A (task-specific) variant of this machine might conduct completely passive experiments: it may simply wait for the next utterance to be generated in its conversational environment. On the other hand, an active version of the learner might generate a candidate sentence from the grammar it hypothesizes is present in the environment. This latter sort of machine is not merely learning a language but using it.

Objection #4: The Recursion Objection. According to this objection, no computer can perform the learning tasks I have described in The New Turing Test. This is because the machine learners that The New Turing Test requires run in perpetuity. The members of the class of recursive algorithms, however, compute values by halting after a finite period of time. When a Turing machine runs in perpetuity (i.e. fails to halt), it fails to compute anything at all.

Rejoinder #4. The model of computation on which I have based the concept of an adaptive learning machine comes from Gold (1967), who describes machine language learners who identify languages “in the limit.” Their status as algorithms has remained controversial. Traditionally, many logicians and computer scientists have been reluctant to call such machines “algorithms.” Others, like Burgin (2006), have maintained that such machines compose a class of superrecursive algorithms which perform hypercomputation: they complete computational processes which cannot be achieved by Turing machines or partial recursive functions. Indeed, Burgin goes as far as to claim that these machines provide a counterexample to the Church-Turing thesis. Burgin (2006) describes the controversial history of these machines in this way:

The first genuine superrecursive algorithms were introduced in 1965. Two American mathematicians Mark Gold and Hillary Putnam brought in concepts of limiting recursive, limiting partial recursive functions, and trial-and-error predicates. Their papers were published in the same issue of the Journal of Symbolic Logic, although Gold had written about these ideas before. It is worth mentioning that the constructions of Gold and Putnam were rooted in the ideas of nonstandard analysis originated by Abraham Robinson (1966) and inductive definition of sets (Spencer, 1959). … Limiting recursive, limiting partial recursive functions and methods of inductive inference are superrecursive algorithms and as such can solve such problems that are unsolvable by Turing machines. Although, being in a descriptive and not constructive form, they were not accepted as algorithms for a long time. Even the introduction of a device that was able to compute such functions (Freyvald, 1974) did not change the situation. Consequently, this was an implicit period of the development of theory of superrecursive algorithms. (Burgin, 2006, p.116)

It is not my intent here to resolve this thorny dispute. However, I think it here suffices to say that a premise of Objection #3—that the class of effective computational procedures is limited to those computed by Turing machines—is far from settled.

Figure 6. The bluebird of intelligence. (Royalty-free. Vault Editions.)

Objection #5: The Impossibility Objection. This objection makes use of a result in formal learning theory known as Gold's Theorem (see Gold (1967)). According to the theorem, the class of all languages—where a language is understood formally as a set of finite sequences of letters from a finite alphabet—is not learnable by any machine using positive stimuli alone. Hence, we have evidence that there can be no task-general learning machine.

Rejoinder #5. This impossibility objection resembles an objection to the Classical Turing Test which Turing himself disputes in Computing Machinery and Intelligence. The latter objection makes use of Gödel's Fist Incompleteness Theorem. According to the theorem, no effectively (i.e. computably) axiomatizable and consistent extension of Peano's axioms can prove every statement in the language of arithmetic or its negation. Since there are statements which remain (effectively) unprovable about the structure of first-order arithmetic, we have proof there can be no task-general machine.

My response to the Impossibility Objection will likewise resemble Turing's. The Incompleteness Objection to The Turing Test makes use of an implicit premise that human minds are actually capable of performing the tasks which a computer cannot. In his words,

This is the mathematical result: it is argued that it proves a disability of machines to which the human intellect is not subject. The short answer to this argument is that although it is established that there are limitations to the powers of any particular machine, it has only been stated, without any sort of proof, that no such limitations apply to the human intellect. (Turing, 1950, p.445)

The same idea applies to the Impossibility Objection: there is no conclusive evidence that humans are able to learn all languages from positive evidence. But a further moral applies here: task-generality need not require that a learning machine can learn any class—only that it be able to learn the ones learnable by a human intellect.

Conclusion and Further Work

I have attempted to propose and defend a New Turing Test which preserves the insights of the Classical Turing Test while evading some of its flaws. I argued that the central insight of the Turing Test was that a machine passing it would have to be a general-purpose rather than task-specific machine, capable of completing most or all of the tasks completable by the average adult human mind. The primary flaw in The Turing Test is that it provides no constraint that a program might only be a set of memoized data from the program's author. An artificially intelligent agent, I claimed, must be able to discover knowledge which its authors did not know. I argued that computational learning was a case where this desideratum was met. My ultimate aim was to propose a test concerning the existence of a general-purpose learning machine.

Turing (1950) makes some bold, perhaps even startling positive predictions about creating a machine that could pass The Turing Test. I have not ventured a guess here as to whether it will be possible to construct a machine which will pass The New Turing Test. The first step in extending this research will be to assess in what ways the standard repertoire of machine learning techniques (Artificial Neural Networks, Support Vector Machines, Bayesian Networks, etc.) either fail or succeed in meeting the requirements of The New Turing Test. A second task is to assess how well these methods can be employed in explaining or describing other learning tasks (e.g. causal learning, language learning, etc.) regularly undertaken by human agents.