A New Turing Test

This post is a part of a series on machine learning theory. It was written with the help of the logician and philosopher, D.A. Martin.

Alan Turing’s article, Computing Machinery and Intelligence, puts forth a well-known test for the intelligence of a digital computer. The test is based upon a cocktail party pastime known as the Imitation Game. In the game, there are three participants: a man, a woman, and an interrogator. The woman and the man are hidden from the interrogator’s view, and the interrogator communicates with them via teletype (or some other typewritten medium). The man and the woman are identified by labels which do not reveal their gender (for example, “A” for the man and “B” for the woman). During game-play, the interrogator poses questions to A and B. The man’s task is to convince the interrogator that he is the woman, while the woman’s task is to convince the interrogator that she is. After some predetermined period of time (perhaps 10 minutes), the interrogator must determine the gender of the two contestants. A wins if he tricks the interrogator into believing that he is the woman. B wins otherwise.

The Turing Test is a modification of the Imitation Game in which a digital computer plays the part of A and a human (of any gender) plays the part of B. Here, A’s task is to convince the interrogator that it is a human, while B’s is to convince the interrogator of the truth. If A wins this game, the computer passes The Turing Test. Passing the test, Turing claims, is a meaningful, telltale sign of machine artificial intelligence. I shall use the term “machine” to mean “digital computer.” By “digital computer,” I shall mean the kind of electrical device equipped with a processing unit and memory. I shall be more explicit when I intend to refer to Turing machines, which are abstract, hypothetical devices rather than electrical ones. I shall also use “X machine” or “X application” to refer to a digital computer running the algorithm X. The Turing Test has spawned an immense literature in several subfields of cognitive science. Arguably, it is also part of the reason for the pride of place given to question-answering in natural language processing (and AI in general).

Figure 1. The ganglion of intelligence. (Royalty-free. Vault Editions.)





Nevertheless, in philosophy, the importance and meaning of The Turing Test have been a matter of longstanding controversy. I shall try to argue in what follows that there is a central insight about artificial intelligence embedded in Turing’s proposal: artificially intelligent systems would be able to solve a general class of problems without being debilitated by problems of exponential explosions in time and space complexity. In this respect, they are like human minds and unlike garden-variety computer programs like word processors, search engines, and calculators, all of which are designed to solve a very specific and fixed range of problems.
Continue reading “A New Turing Test” »

Gold’s Theorem

This post is a part of a series on machine learning theory. It is for my colleague Silvia.

John Locke begins An Essay Concerning Human Understanding, published in 1690, with a bold, empiricist aspiration: to demonstrate that all knowledge obtained by the Understanding is grounded in the faculty of sensation. Locke writes that he intends to show how agents “barely by the use of their natural faculties, may attain to all the knowledge they have, without the help of any innate impressions; and may arrive at certainty, without any such original notions or principles.” A competing, established opinion, he writes, claims that the Understanding depends in part upon “primary notions, koinai ennoiai, characters, as it were stamped upon the mind of man; which the soul receives in its very first being and brings into the world with it.” Locke does not say here which philosophers he has in mind in his rebuke of this theory of innate ideas, though the view resembles a psychological and epistemological position affirmed decades earlier by Descartes and also one by Locke’s contemporary G.W. Leibniz.

Arguably, many early 20th-century Anglophone philosophers shared in a Lockean repudiation of innate ideas. Logical positivists, logical empiricists, and psychological behaviorists could embrace a rebuke of innate ideas as a point in favor of scientific rigor and against the excesses of a priori metaphysics. A notable challenge to these 20th-century views came from Noam Chomsky, who writes in an article in Synthese in 1967 that “contemporary research supports a theory of psychological a priori principles that bears a striking resemblance to the classical doctrine of innate ideas.”

Here, Chomsky puts forward several arguments in favor of this claim, including the argument that language learning in humans has an aspect of creativity: when children achieve linguistic competence, they gain an ability to produce new phrases and sentences that are distinct from “familiar” sentences that they have already seen or heard and also distinct from generalizations of familiar sentences present in their learning environments as stimuli.

Figure 1. Some fruit. (Royalty-free. Vault Editions.)


In the same year, E. Mark Gold published an article which proposes a formalized paradigm for learning from stimulus examples (now called the Gold learning paradigm) and also a general but very powerful unlearnability theorem about language acquisition in this paradigm (today called Gold’s Theorem). The unlearnability theorem states that many classes of languages (even fairly restricted classes such as the regular languages) are unlearnable in the formal paradigm from stimulus examples. For many cognitive scientists and philosophers, Gold’s Theorem provides strong (perhaps even incontrovertible) evidence for a Chomskyan innatism about language acquisition. Since Gold’s Theorem shows that classes of languages, such as the recursively enumerable languages, are not learnable from stimulus examples alone, and since there are recursively enumerable natural languages that are learned by children, there must be innate psychological faculties or ideas which aid children in learning the languages that they do from their environments. This anti-empiricist attitude is described by Demopoulos (1989):

In fact the argument we are about to give seems to have been first suggested by Gold in his original paper. Then the argument against empiricism is that, within empiricist methodology, it is not possible to motivate constraints which are specific to the domain of objects learned, and thus the empiricist cannot in this way insure that acquisition will occur. But since the success criterion and data format are not in question, the empiricist position is untenable, insofar as it implies that any recursively enumerable language is learnable.
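To make the paradigm a little more concrete, here is a minimal sketch of a learner that identifies a language “in the limit” from positive examples alone. Everything in it is an assumption made for illustration: the toy class `HYPOTHESES`, the functions `learner` and `conjectures`, and the sample text are hypothetical and do not come from Gold’s paper. The learner simply conjectures the first language in a fixed enumeration that contains every example seen so far; because the enumeration lists smaller languages before the languages that contain them, the conjectures settle on the target.

```python
# Hypothetical toy class of languages, each a finite set of strings.
# The list is ordered so that a language appears before any language
# that properly contains it.
HYPOTHESES = [
    {"a"},
    {"a", "aa"},
    {"a", "aa", "aaa"},
]


def learner(seen):
    """Return the index of the first hypothesis consistent with the data."""
    for index, language in enumerate(HYPOTHESES):
        if seen <= language:  # every example seen so far is in the language
            return index
    return None  # no hypothesis in the class fits the data


def conjectures(text):
    """Feed the examples one at a time and record each conjecture."""
    seen = set()
    guesses = []
    for example in text:
        seen.add(example)
        guesses.append(learner(seen))
    return guesses


# A presentation of the language {"a", "aa"}: positive examples only.
print(conjectures(["a", "aa", "a", "aa", "aa"]))  # [0, 1, 1, 1, 1]
```

The learner changes its mind once and then stays with the correct language, which is what success means in this paradigm. The strategy works here only because the class is finite. Gold’s negative result concerns classes that contain every finite language together with at least one infinite language; for such classes no learner of any kind converges on every presentation.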

Continue reading “Gold’s Theorem” »

The Deprojective Theory: The Basics

This post is a part of a series on the deprojective theory.

It is something of a platitude that science does or ought to provide us humans more than mere empirical data. Rather the ideal, so it is said, is for science to provide us with knowledge. There are, of course, complications of nuance with any such bromide, but philosophers nevertheless go on to ask what scientific knowledge consists in. Scientific knowledge: what is the real thing? One candidate is that science explains rather than merely reports. That is, explanation animates scientific knowledge. But, then, what is explanation?

Explanation is itself hard to explain. Many theorists have maintained that explanation is a matter of giving answers to why-questions. These differ from answers to other interrogations like what-questions, which are largely the domain of reporting and description, and how-questions, which are the domain of engineering and artistry. But there are still complications. For example, think about Jimmy McNulty, the homicide detective and anti-hero of David Simon’s series The Wire, standing next to a Baltimore coroner over the corpse of a man killed in a vacant apartment building. They might both ask:

Why did this man die?


Both, in their own way, are scientists. But they are asking different questions. And the answers which will satisfy them, therefore, will naturally be different. Probably the coroner is asking whether the man died of anoxia, whereas Jimmy asks who murdered him.

In a forthcoming work, The Foundations of Microeconomic Analysis, I characterize what I claim is the fundamental explanatory methodology of neoclassical microeconomics and, in particular, general equilibrium theory. I call the methodology analogical-projective explanation. The explanatory regime is based on a more general epistemological phenomenon which I call a deprojection. I focus here on the actual mode of explanation rather than its applications.

Both of these are terms of art, but there is really nothing new about them other than the terminology. Analogical-projective explanation belongs to a family of theories of explanation known collectively (and perhaps misleadingly) as deductivist theories. The term is misleading because these theories are meant to cover explanatory methods based on induction as well as deduction.

What do they have in common? All such regimes of explanation require that when someone is explaining to you, say, why so-and-so is the case, that someone makes an inference in which so-and-so is the conclusion. Why is the sky blue, you ask me? Here is a deduction which explains it: every time light passes through a scattering medium, light is scattered in inverse proportion to its wavelength. Every wavelength of light becomes more visible the more it is scattered. Of all visible wavelengths of the spectrum (i.e. not including x-rays, television waves, etc.), the blue-violet family has the shortest wavelength. The preceding three claims are what the philosopher C.G. Hempel called universally-quantified natural laws (or laws of nature). When we add the plain assertion that the sky is a scattering medium, these claims, along with the rules of the predicate calculus, should yield the conclusion that the sky is blue. In this case, the portion of the deduction not including the conclusion serves as an explanation for the conclusion.
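The shape of the blue-sky deduction can be set out schematically. This is only a rough sketch: the labels $L_1$, $L_2$, $L_3$, $C$, and $E$ are shorthand assumed here, not notation from the text, and the premises are paraphrased rather than fully formalized in the predicate calculus.

```latex
\[
\begin{array}{ll}
L_1: & \text{Light passing through a scattering medium is scattered in inverse proportion to its wavelength.} \\
L_2: & \text{The more a wavelength of light is scattered, the more visible it becomes.} \\
L_3: & \text{Among the visible wavelengths, blue-violet is the shortest.} \\
C:   & \text{The sky is a scattering medium.} \\
\hline
E:   & \text{The sky is blue.}
\end{array}
\]
```

In Hempel’s vocabulary, the premises above the line make up the explanans and the conclusion below it is the explanandum.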

Such explanatory inferences can be carried out in other inferential systems. Outside of predicate logic, Hempel spent much time focused on regimes of statistical inference for the purposes of induction.

Figure 1. Some deprojectionists, surely. (Royalty-free. Vault Editions.)


Continue reading “The Deprojective Theory: The Basics” »

Einstein, Podolsky, and Rosen

This post is a part of a series on physical theory.

In 1935, Einstein, Podolsky, and Rosen published a paper in the Physical Review entitled “Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?” The paper constitutes part of the foundation for Einstein’s now well-known skepticism about the entrenched probabilism of quantum mechanics, which is sometimes summarized by the quote often attributed to him that God does not play dice with the universe.

EPR (as they are now called) do not propose experimental results but rather a thought experiment and a philosophical argument. In a particularly vivid variant of their thought experiment due to Bohm (1951), now sometimes called the EPR experiment, two particles, an electron and a positron, separate at a point of radioactive decay. According to the experiment, they speed off in separate directions in such a way that their total spin is zero. Along a chosen axis, the spin of each of the particles can take one of two real values ($+\tfrac{1}{2}$ and $-\tfrac{1}{2}$). For the observable property of spin along this axis, I will write for short “$S$.”

According to the formalism of quantum mechanics, the systems describing these two particles, individually, can be formulated as Hilbert spaces (that is, complex-valued inner product spaces), and the spin observable along the chosen axis is represented in the formalism by a Hermitian (self-adjoint) operator. The observable values are then the eigenvalues of such operators (and since the operators are Hermitian, these values will be real numbers). Corresponding to each eigenvalue, there is an eigenspace. As a fact of linear algebra, there is a projection operator which takes any complex-valued vector in the space and projects it onto the eigenspace defined by the $i$th eigenvalue. For an observable $O$, I will write the corresponding operator as “$\hat{O}$” and its $i$th eigenvalue as “$o_i$.” Likewise its projection operator for this eigenvalue will be written “$P^{O}_{i}$.”

States of the positron system and the electron system are given by complex-valued vectors. Part of the mystery of quantum mechanical systems is that in general there is no certain answer to experimental questions like “Does the electron have a spin value of ½ in the chosen direction, given that the state is $\psi$?” According to the Born rule, that answer is given by a probability value which is obtained from the projection operator onto the eigenspace defined by the eigenvalue $+\tfrac{1}{2}$. So if the state is a unit vector $\psi$, then the probability of observable $O$ having as its value its $i$th eigenvalue $o_i$ is given by the following formula:

$$\Pr(O = o_i \mid \psi) = \langle \psi, P^{O}_{i}\,\psi \rangle = \lVert P^{O}_{i}\,\psi \rVert^{2}$$
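The Born rule can be checked numerically for a single spin-½ particle. In the sketch below the probability is computed as the inner product of a unit state vector with its projection onto an eigenspace. The state `psi`, the helper names, and the choice of basis (eigenvectors of spin along the chosen axis) are assumptions made for illustration, not anything fixed by the EPR paper.

```python
import math


def inner(u, v):
    """Inner product <u, v>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def apply(matrix, vector):
    """Apply a matrix (given as a list of rows) to a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def born_probability(projector, state):
    """Probability <state, P state>, assuming `state` is a unit vector."""
    return inner(state, apply(projector, state)).real


# Projectors onto the eigenspaces for the eigenvalues +1/2 and -1/2,
# written in the basis of eigenvectors of spin along the chosen axis.
P_UP = [[1, 0], [0, 0]]
P_DOWN = [[0, 0], [0, 1]]

# Hypothetical state: an equal-weight superposition with a relative phase.
psi = [1 / math.sqrt(2), 1j / math.sqrt(2)]

p_up = born_probability(P_UP, psi)
p_down = born_probability(P_DOWN, psi)
print(round(p_up, 6), round(p_down, 6))  # 0.5 0.5
```

The two probabilities sum to one because the two projectors add up to the identity. The relative phase between the components of `psi` makes no difference to either value.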



Continue reading “Einstein, Podolsky, and Rosen” »

Levels of Measurement and Cardinal Utility


A few weeks ago, I was having a chat with Todd and some others in the office, and it came up in conversation that cardinal utility had the property of preserving “intervals.” It was occasionally also mentioned that such utility representations were closed under “linear transformations.” I was confused by the discussion and at first I didn’t know why. On my walk home that day, I remembered I had heard those sorts of claims before. I typically think of a linear transformation as any mapping $T$ from a vector space $V$ to a vector space $W$, both over a field $F$, with the following properties of linearity:

  1. if $c$ is in the field $F$ then for $v \in V$, $T(cv) = c\,T(v)$;
  2. if $u, v \in V$, then $T(u + v) = T(u) + T(v)$.


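For example, the equation $f(x) = 2x$ is a linear transformation from the vector space of the set of reals back into itself. So, $f : \mathbb{R} \to \mathbb{R}$. When we speak of the algebra on $\mathbb{R}$ in one dimension, $\mathbb{R}$ is the underlying set for the vector space as well as the field.

Note that $f$ has the first property; suppose for example that $c = 3$ and $x = 4$. Then

$$f(cx) = f(12) = 24 = 3 \cdot 8 = c\,f(x).$$

It also has the second property; for example, let $x = 4$ and $y = 5$; then

$$f(x + y) = f(9) = 18 = 8 + 10 = f(x) + f(y).$$

But clearly, this linear transformation does not preserve intervals:

$$f(5) - f(4) = 10 - 8 = 2 \neq 1 = 5 - 4.$$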
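The two linearity properties, and what happens to intervals, can be checked mechanically. The sketch below compares a linear map with an affine one; the particular maps `f` and `g`, the sample points, and the helper names are hypothetical choices made for illustration. Integers and `Fraction` are used so that the comparisons are exact.

```python
from fractions import Fraction


def f(x):
    """A hypothetical linear map on the reals."""
    return 2 * x


def g(x):
    """A hypothetical affine map: the same slope plus a constant shift."""
    return 2 * x + 3


def is_homogeneous(h, c, x):
    """First property: h(c * x) == c * h(x)."""
    return h(c * x) == c * h(x)


def is_additive(h, x, y):
    """Second property: h(x + y) == h(x) + h(y)."""
    return h(x + y) == h(x) + h(y)


def interval(h, x, y):
    """Length of the image of the interval from x to y."""
    return h(y) - h(x)


def interval_ratio(h, x, y, z, w):
    """Ratio of the image of [x, y] to the image of [z, w]."""
    return Fraction(interval(h, x, y), interval(h, z, w))


for name, h in [("f", f), ("g", g)]:
    print(
        name,
        is_homogeneous(h, 3, 4),
        is_additive(h, 4, 5),
        interval(h, 4, 5),
        interval_ratio(h, 4, 5, 4, 8),
    )
# f True True 2 1/4
# g False False 2 1/4
```

On these inputs `f` satisfies both properties and `g` satisfies neither. Both maps double the length of the interval from 4 to 5, yet both leave the ratio between the two intervals at 1/4, the same value it has before any transformation. If “preserving intervals” is read as preserving such ratios (one possible reading, offered here as an assumption), then the affine map does it just as well as the linear one.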



I didn’t think I could be wrong about my understanding of the conventional use of the term “linear.” I thought maybe what people mean instead of “linear” in this context is that cardinal utility is closed under the class of affine transformations. That is, the class containing all transformations of the following form (where $a$ is a scalar value in the field of $\mathbb{R}$ and $b \in \mathbb{R}$):
Continue reading “Levels of Measurement and Cardinal Utility” »