Finite Domain, Complexity, and the Turing Test

This post is currently a draft.

Blockhead and Aunt Bertha are part of a class of objections which charge that the Turing test does not constitute a sufficient condition for intelligence on the grounds that there are trivial programs which can pass it through memorization. An example such objection might begin by arguing that it is logically possible to construct a computer program in the following way:

1. Perform a Turing test with a sequence of questions
   with a human (call her Bertha) respondent

2. Record those responses on a hash structure on 

3. Repeat steps 1 and 2 for each possible Turing test
   sequence of questions.

Figure 1. A setup procedure

Once this setup procedure is completed, the objection specifies the following rule for interactions in the interrogation: when posed with a question from an interrogator, have the program look up Bertha’s responses in the hashtable. Because we have enumerated all possible Turing test sequences, the appropriate response should be somewhere in it.

What is wrong in calling the Bertha an intelligent machine? According to Block, the machine does not engage in any information processing. The mere transmission of Turing test responses processed by some other system (human or otherwise) does not seem sufficient for us to count the device doing the transmission as artificially intelligent. Block (1981) analogizes the the Bertha machine’s intelligence to a kind of walkie-talkie. He writes:

The point of the machine example may be illuminated by comparing it with a two-way radio. If one is speaking to an intelligent person over a two-way radio, the radio will normally emit sensible replies to whatever one says. But the radio does not do this in virtue of a capacity to make sensible replies that it possesses.

In the following, I would like to explore two questions. First, does the Blockhead class of examples pose a genuine threat to the Turing test as a test of intelligence? I wish to argue that in order to make a Blockhead style example computable, one must place a finite bound on the length of Turing test interrogations. I think that imposing the constraint fundamentally alters the hardness of the Turing test. The alteration of hardness is so dramatic, I believe the Blockhead style examples are really irrelevant to the Turing test. The second question I wish to explore concerns a recent suggestion to place constraints on the running time or complexity of Turing test machine candidates as a way of methodically ruling out Blockhead-style counterexamples. I wish to argue that this constraint does not succeed in fixing the problem it intends to solve. However, I begin with an elaboration of some of the philosophical issues motivating Blockhead and some preliminaries on the complexity of algorithms.

1. Processing and Complexity

The class of Blockhead style counterexamples makes problematic what at first appears to be a appealing feature of the Turing test: performing a Turing test does not require any inspection of the artificial agent’s internal features or configuration. Instead, we may look at the stimulus-response behavior of the agent in the interrogation to make a decision about its intelligence. Accordingly, this aspect of the Turing test is sometimes regarded as its implicit reliance on behaviorism. The counterexamples suggest that even if the machine passes the Turing test, what we need to do is to look into the machine’s internal configuration to make sure that some kind of processing is at work. This kind of move away from behaviorism is considered psychologistic.

Though we can modify the Turing test to include an information processing requirement, specifying what information processing amounts to seems inevitably to turn out either too general and vacuous or too narrow and chauvinistic. Consider the strategy of adding a psychological requirement for the artificial agent playing the Turing test. For example, we could add that the agent in question has to “think about” the question being asked. Though this would probably rule out the case of the walkie-talkie, we would be in the undesirable position of saying what we meant by a machine having to think. Turing (1950) writes that the purpose of the Imitation Game proposal is to avoid giving an analysis or definition of a philosophically troublesome concept like machine thought. The problem with specifying an additional, general psychological predicate as a requirement for all Turing test agents is not even in finding the right one (is it thought? is it reasoning?) but then characterizing it with enough specificity for it to rule out anything at all.

To take the other, more concrete alternative, we could say that the agent would have to use a CPU in order to compute its Turing test responses. This would presumably also rule out walkie-talkies, which are usually CPU-less devices. But this requirement is chauvinistic to our own state of digital technology: though the current generation of our most powerful computational devices are each organized using the von Neumann architecture and always contain a very powerful processing unit, it is not, I do not think, a necessary feature of all computational devices no less all candidate intelligent agents. The von Neumann architecture might very well be the best solution we have at present for reasonable computer organization and design but who is to say it won’t be superseded by a better system in the future? This point carries over from CPUs to other computational features more broadly: once we require a concrete feature (CPUs, RAM, disk drives, etc.) in a candidate computer or program, we are bound to exclude other objects which serve the same role.

Since the original Turing test allowed us to make assessments of intelligence without any regard for details of internal configuration, we might attempt to salvage the Turing test by ruling out the Blockhead computer on some principled grounds. The sort of strategy I would like to consider is to add constraints on the running time or storage capacity of the computer programs playing the Turing test. Though Turing never explicitly includes any performance constraints of this sort in his description, they are implicitly present: the computer must at least be fast enough to keep up with the rate of usual human conversation.

Computer scientists typically study the performance or speed of algorithms not in units of elapsed time as such but rather the number of steps required to complete the computation, which therefore only have an indirect relationship with time. I w ill call computers whose time performance is able to measured this way discrete machines. To further complicate matters about the performance of discrete machines, the number of time steps used in a computation is often variable dependent on its input. Consider the following algorithm Duplicates, which returns true if two arrays of length contain a duplicate:

def has_duplicate(A,B):
    for i in range(len(A)):
        for j in range(len(B)):
          if A[i] == B[j]:
              return True
     return False

Figure 2. The Duplicates program

Duplicates first walks through the first array, A, looking for duplicates in B within an inner loop. Hence, the number of steps in the process is steps for two pairwise distinct input arrays (where the outer loop has one step, incrementing the counter and the inner loop has an additional comparison step, and the return statement at the the end adds a final additional step). Typically the specific step count is not needed and the time complexity of the algorithm is written as , meaning that the number of time steps of the algorithm is bounded above by some constant multiple of (e.g. 3). Note that conventionally we always classify the algorithms step count by the worst-case scenario; no matter how large the arrays are, the function always returns in three steps if a duplicate is located at the starting indices. However, this is not characteristic of the worst case inputs.

Is Duplicates a fast program? Standardly, algorithms which an only be computed in exponential time (e.g. ) are considered intractable. This algorithm is clearly subexponential but, in practice, computer scientists aim for their algorithms to be linear (i.e.) or linearithmic to be perform on modern computing hardware on large cases. I will say something more about the question of complexity and real-time measurements of performance later on.

2. Finite Constraints and Bertha

To return to Aunt Bertha, I now consider whether the program would be too slow to pass a Turing test. For example, consider the following basic constraint on running time:

(FT) Finite Time. In order to pass the Turing test, an artificial Turing test agent must return adequate responses to verbal stimuli in a finite period.

Such a constraint seems so obvious as to probably escape mention because finitism of this kind is an essential characteristic of the orthodox Church-Turing conception of the computer (though examples of hypercomputation or superrecursion are occasionally proposed, see e.g. Burgin (2005)). But what is not at all obvious is whether the Bertha procedure above meets criterion FT. Defenders of the Bertha objection will likely hasten to add, at this stage, that the performance of the algorithm during runtime is , since the underlying data structure storing the responses is a hashtable (in random access memory). However, that value is the time complexity of the interrogation phase of the computer program. If the number of Turing test interrogation sequences is unbounded and we include the setup phase as part of the procedure in the count of steps, then the program playing the Turing test cannot return responses in finite time since it never exits the set up phase. Hence it fails to satisfy FT.

Block’s own solution is to fix the Turing test to some length of time (like an hour). Is this a reasonable modification to the original Turing test? On the one hand, we should expect that if the Turing test were operationalized, the test would be finite in length. But this fact doesn’t seem to give us any further reason to explicitly prevent the length of the test from being potentially infinite.

Introducing a finite bound on the domain of function often has a dramatic effect on the computability profile of the problem it is intended to solve (at least when computability is conceived as computability by a Turing machine). For example, limiting our attention to a finite number of models which are finite in size makes first order logic with a finite signature decidable (by some Turing machine) whereas this is not so in the general case. Likewise removing the axiom scheme of induction from Peano arithmetic to yield Presburger arithmetic produces a decidable axiomatization. Perhaps most striking, introducing a finite bound on the domain of arguments to a function obviates the need for recursion. That is, if we accept the Church-Turing conception of a computer, then its finite control program can simply code return values on a per-input basis. What recursion allows us to do is to define a procedure which can return values for an infinite number of values using a finite control program. Creating a finite bound on the length of a Turing test conversation may at first appear to be an innocuous recognition of the the fact that all scientific experiment have a finite time period but it also fundamentally changes the computational hardness of the task.

In the case of Bertha, bounding the length of the interrogation is what permits the body of the Turing test sequences to be memorized; if the length were unbounded and hence memorization impossible, then the machine would have to be programmed to manufacture responses creatively based upon contextual information from the history of the entire conversation. Perhaps the point of Turing test, at heart, is to show when artificial agents exhibit substantial intelligence beyond that which was directly endowed to them by their creators.

Blocks addresses concerns about the finite bound of the interrogation in Objections 7, 7a, and 7b of Psychologism and Behaviorism. Because the human player’s ability to participate by is bounded (by death, if we are being grim, by retirement if we are not), Block writes the ability to complete a Turing test of unbounded length could not be a necessary condition of intelligence. He writes:

You can (if you like) characterize intelligence as the capacity to pass a Turing Test of arbitrary length, but since humans do not have this capacity, your characterization will not be a necessary condition of intelligence, and even if it were a sufficient condition of intelligence (which I very much doubt—see below) a sufficient condition of intelligence that humans do not satisfy would be of little interest to proponents of views like the neo-Turing Test conception.

It is conventional to treat the Turing test as a sufficient condition test for intelligence rather than the stronger (and indeed incorrect) claim that it is a necessary condition. Block’s remarks here, even if we restrict our attention to the Turing test as a sufficient condition test, are perplexing. First it is unclear what, if anything, could be meant by a human passing or failing the Turing test. Perhaps what he means is that an ersatz human player could play the role of the computer in original Turing test. This would yield a new test whose results would be obscure at best because the form of the Turing test provides meaningful results only when one player with a special attribute competes against another player who wishes to mimic that attribute but does not have it genuinely. (The attribute of being a woman was the property of interest in the original imitation game described by Turing). Of course, one human cannot compete against another in a Turing test while pretending to simulate humanlike intelligence.

Another alternative is that a human’s passing the Turing test simply means that the human player, by virtue of his or her responses, convinces the judge of his or her true identity—i.e. the artificial agent fails the Turing test. But there is no reason to think that leaving the length of the interrogation unbounded would guarantee that the human could not, in this sense, “pass” the Turing test. If we agree that (i) the original Turing test as described by Turing did not bound the length of the interrogation and (ii) there have been many such empirical approximations of the original Turing test, then the claim that a human could never pass would amount to the claim that artificial agents have always or mostly fooled the adjudicators. Obviously, continued interest in the Turing test is due to the fact that humans do regularly pass the Turing test and so this interpretation would be absurd.

What Block might mean here is that if the Turing test were unbounded, the judge would have a foolproof strategy to identify the computer and the human independent of the computer’s properties by playing the game forever. (Admittedly, it is hard to see what this would have to do specifically with the human player’s failing the test.) In this interpretation of Block’s discussion, all the judge would have to do is wait out until the human perished thereby revealing the identities of the competitors. Of course, one premise of this line of reasoning is that the artificial agent always outlives any human competitor but this is no stipulation of the Turing test. Moreover, for the strategy to work, the human judge would have to also outlive the competitors and it’s hard to see how this could be stipulated. Furthermore, it would be presumably trivial to have a programmed artificial agent limit its responses to a period similar to the life of a human, after which it would fail to respond to the teletyped prompts from the judge.

Much of the confusion in Block’s Objection 7 has its root in the formulation of the objection as a problem concerning the Bertha program. For Block’s dialectical opponent, the main difficulty about finiteness is that the Bertha machine only operates on a finite domain. Block rightly responds that humans too only operate on a finite number of inputs over their lifetimes. But the real problem is not about Bertha as a program but rather that, in order to regard Bertha as a counterexample, we have to make an alteration to the Turing test itself by introducing an a priori bound on interrogation length. How could such a move not fundamentally alter the hardness for the agent? Consider a more familiar example: suppose you are teaching a calculus class and you have to choose between giving an exam with questions from the student’s textbook or questions which you could invent the night before. Using the textbook might make your life easier but a cunning student John might make another student Jean work out the problems with him the night before. So all John has to do is memorize the solutions. That would ultimately, then, make the test an examination of John’s memory than his mastery of calculus. To test the latter, it would be necessary to quiz John on novel problems to which he had no previous exposure. Introducing the finite bound for the Turing test is something like choosing the textbook option. And similarly, when you tell your class that some of the exam problems may be new and not found in the textbook, you expand the set of potential problems to an infinite cardinality. Note here that this modification is independent of the number of problems John and Jean ultimately end up doing; what makes the test harder is that they have to prepare for an infinite number of potential problems. Since you have made the test impossible to study for by memorization, John is forced to learn the principles of calculus himself in order to do the problems on his own. The fact that the domain of the exam is now unbounded does not imply however that John’s presumably finite lifespan jeopardizes his ability to demonstrate his knowledge of calculus to you. How many problems are needed to do this? Well, that is really a determination to be made by you the teacher just as in the Turing test it is a determination to be made by the human judge.

3. Infinitary Capacities

In Block’s objection 7a, his imagined interlocutor argues that since humans are capable of understanding an infinite number of sentences and that the Bertha machine is only capable of understanding a finite number of sentences, the Bertha machine should not be considered intelligent. Block’s response is that in order to stipulate such a condition on capacities, one would inevitably have to be led to a “kind of theorizing about cognitive mechanisms that would be unacceptable to the behaviorist.” He writes:

Consider the kind of reformulation of the neo-Turing Test conception of intelligence suggested by the idealization objection; it would be something like: “intelligence = the possession of language-processing mechanisms such that, were they provided with unlimited memory, and were they directed by motivational mechanisms that assigned at least a moderately high preference value to responding sensibly, and were they `insulated’ from `stop’ signals from emotion centers, and so forth, then the language-processing mechanisms would have the capacity to respond sensibly indefinitely.’Notice that in order to state such a doctrine, one has to distinguish among various mental components and mechanisms.

What Block means here is that if we were to introduce a requirement in the Turing test that passing test takers must be able to respond “indefinitely” then we would inevitably be led to cognitive rather than behavioral constraints. This theorizing would not be, he claims, merely about language processing but perhaps also potentially about emotions and other mental faculties. For example, it might be the case the agent would have to be emotionally adjusted to play the imitation game indefinitely. In objection 7b, his interlocutor responds that one need not directly speak of “infinite capacities” for language but rather that one add the non-cognitive idealization that the players not be subject to nonaccidental causes of death. In this case, the human agents would live forever and hence indirectly be capable of producing and understanding an infinite number of distinct sentences. Block’s response here to objection 7b is insistence that “how long we can go on in a Turing Test is not just how long we live, but the nature of our cognitive mechanisms and their interactions with other mental mechanisms.”

Both Block and his interlocutor agree, it appears, that the Bertha machine only has a finitary capacity for language but I believe the issue is much more subtle. I have said already that we are prone to be chauvinistic when specifying conditions on the sort of machine taking the Turing test. One way of being chauvinistic in this way is by requiring that the artificial agent participating in the Turing test be a discrete computing device (or Turing-complete device) like a Turing machine. The purely behavioristic constraints of the Turing test allow it to be a general purpose test for intelligence rather than a test for a specific kind of agent. Nevertheless, it is obvious that what Block has in mind is something like a discrete computing device. If so, then such machine is presumably capable of recognizing an infinite language (even if its control code is not specifically designed to do so). Indeed, even a finite state automaton, which is only capable of recognizing regular languages, can nevertheless recognize an infinite language. Of course, both finite state automata and Turing machines are abstractions where the devices can take an unbounded number of steps (and in the case of the Turing machine, may use an unbounded number of cells in memory.) In reality, of course, we assume that a device’s memory, whether it is Turing complete or not, is bounded by physical constraints (as is a human’s). The subtlety here is that whether we should regard these physical limitations as essential or accidental characteristics of these agents. If we are open to these physical characteristics being accidental rather than essential properties of agents, it is by no means obvious that the Bertha machine and indeed humans fail to have an infinitary capacity for language.

Whatever comes of this subtlety, Block’s response to both objections follows the same pattern: the first is to infer that the unboundedness of the length of interrogation, which is a negative constraint on the Turing test itself, automatically translates into a requirement that the agent taking the test have a some intrinsic positive infinitary property. Block then argues that (i) since humans themselves are bounded intelligent agents, such an infinitary feature could not be a necessary condition of intelligence in general; and (ii) notwithstanding, adding such a constraint would inevitably lead us to cognitive theorizing.

First, as in Block’s response to objection 7, I think there is yet again an unwarranted augmentation of the Turing test from a test for of artificial agents to a test of intelligence for humans as well. As I have said above, it is hard to make sense of what it means for a Turing test to be a test for humans in the first place. Even leaving the Turing test aside, to claim that the feature of being able to speak and understand sentences unbounded in length is a necessary or sufficient condition of human intelligence would be obtuse. Indeed, even if you maintained both that the capacity for human language was a necessary condition of intelligence in general and that human language has an essentially infinitary character (perhaps that such a language must always have an infinite number of sentences), it seems superfluous and irrelevant if not just wrongheaded to further claim that intelligent agents must under the appropriate ideal conditions be able to speak and understand forever. In this, I think Block agree and we may as well grant him the point that specifying with byzantine ferocity the conditions for a human incapable of “nonaccidental death” to be able to speak forever would probably involve the description of some non-linguistic mentalistic features.

But I think more telling is the move in the discussion from the properties, structure, and constraints on the imitation game test (i.e. talk of the bounded or unbounded length of the interrogation) to the properties of the agent itself. The unbounded length of the Turing test interrogation is not a test for the property that an agent can “speak forever” any more than a calculus exam without the problem set specified beforehand is a test for a student to write out a potential infinity of solutions on paper. That is not to say, however, that the unbounded length of the Turing test is unimportant; instead, I claim that the feature is only superficially and accidentally a constraint on an agent’s reliance on memory. Instead it is a behavioral constraint to ensure that the agent is reacting adaptively and creatively to novel stimuli. The same could be said of the teacher giving a calculus exam; the reason why the teacher resists giving out a problem set in advance is not because he or she thinks that the only way to demonstrate knowledge of calculus is by completing arbitrarily large numbers of problems but rather because doing new and unseen problems is part of showing that you have gained a faculty in mathematics.

4. Exponential Complexity and Bertha

This section is in draft.

Another constraint, supported by Shieber (2004), draws from a discussion of computational complexity in Block (1981). Block’s own requirement of limiting the duration of Turing test was a constraint on the interrogation itself and I have argued above that, in total, that made the Turing test easier for machine designers. Principle FT above and the following constraint are not constraints on the interrogation but rather on the machines taking the Turing test. They are both aimed at making the machine designer’s job harder. Indeed, Shieber’s constraint might be considered a refinement of FT; while the latter said that machines must finish their computations in finite time, Shieber says machines must not be able to exploit any more than exponential storage space in computing their Turing test responses. More explicitly:

(SS) Subexponential Space. In order to pass the Turing test, an artificial Turing test agent must return adequate responses to verbal stimuli in a finite number of steps without requiring space exponential in the length of the stimuli.

Just as we can assess computer programs for their speed (i.e. their time complexity), we may also study how much space they use (i.e. space complexity). Again, this is not a straightforward measurement because, just as before, the amount of space used by algorithm maybe dependent upon its input. Moreover, different compressions schemes allow the same object to represented with different storage (for example, a googol maybe represented with 101 characters or just the six spelling out “googol.”) However, we can establish the number of units of space used (where these units may be bytes of storage or cells on a Turing machine tape) based on the size of the input itself. In this case, we may again say again use big omicron notation to say, e.g., that a particular program is . This would mean that the storage use of the algorithm is bounded by some constant multiple of the size of the input .

Does the SS constraint offer us a principled way of blocking Blockhead while preserving what is desirable in the original Turing test?

The focus on exponential space complexity seems to derive not from Block himself but from his imagined dialectical opponent, who cites an a dictum from Newell and Simon (1979). They write that “the task of intelligence… is to avert the ever-present threat of the exponential explosion of search.” It is a convention of computer scientists to view exponential time complexity as an intractable one, though in reality most general purpose algorithms (sorting, searching) become infeasible to compute on modern machinery when they have quadratic (i.e. ) running time. To make things even more complicated, altering the base of the exponent of an exponential running time algorithm to draw it closer to 1 often makes the algorithms suitable for practical use for a reasonable range of inputs (this is one strategy when having to create an algorithm for an NP-complete problem).