# Gold’s Theorem

This post is a part of a series on machine learning theory. It is for my colleague Silvia.

John Locke begins An Essay Concerning Human Understanding, published in 1690, with a bold, empiricist aspiration: to demonstrate that all knowledge obtained by the Understanding is grounded in the faculty of sensation. Locke writes that he intends to show how agents “barely by the use of their natural faculties, may attain to all the knowledge they have, without the help of any innate impressions; and may arrive at certainty, without any such original notions or principles.” A competing, established opinion, he writes, claims that the Understanding depends in part upon “primary notions, koinai ennoiai, characters, as it were stamped upon the mind of man; which the soul receives in its very first being and brings into the world with it.” Locke does not say here which philosophers he has in mind in his rebuke of this theory of innate ideas, though the view resembles a psychological and epistemological position affirmed decades earlier by Descartes and also one by Locke’s contemporary G.W. Leibniz.

Arguably, many early 20th-century Anglophone philosophers shared in a Lockean repudiation of innate ideas. Logical positivists, logical empiricists, and psychological behaviorists could embrace a rebuke of innate ideas as a point in favor of scientific rigor and against the excesses of a priori metaphysics. A notable challenge to these 20th-century views came from Noam Chomsky, who writes in an article in Synthese in 1967 that “contemporary research supports a theory of psychological a priori principles that bears a striking resemblance to the classical doctrine of innate ideas.”

Here, Chomsky puts forward several arguments in favor of this claim, including the argument that language learning in humans has an aspect of creativity: when children achieve linguistic competence, they gain an ability to produce new phrases and sentences that are distinct from “familiar” sentences that they have already seen or heard and also distinct from generalizations of familiar sentences present in their learning environments as stimuli.

Figure 1. Some fruit. (Royalty-free. Vault Editions.)

In the same year, E. Mark Gold published an article which proposes a formalized paradigm for learning from stimulus examples (now called the Gold learning paradigm) and also a general but very powerful unlearnability theorem about language acquisition in this paradigm (today called Gold’s theorem). The unlearnability theorem states that many classes of languages (even fairly restricted classes such as the regular languages) are unlearnable in the formal paradigm from stimulus examples. For many cognitive scientists and philosophers, Gold’s theorem provides strong (perhaps even incontrovertible) evidence for a Chomskyan innatism about language acquisition. The reasoning runs as follows: since Gold’s theorem shows that classes of languages, such as the recursively enumerable languages, are not learnable from stimulus examples alone, and since there are recursively enumerable natural languages that are learned by children, there must be innate psychological faculties or ideas which aid children in learning the languages that they do from their environments. This anti-empiricist attitude is described by Demopoulos (1989):

In fact the argument we are about to give seems to have been first suggested by Gold in his original paper. Then the argument against empiricism is that, within empiricist methodology, it is not possible to motivate constraints which are specific to the domain of objects learned, and thus the empiricist cannot in this way insure that acquisition will occur. But since the success criterion and data format are not in question, the empiricist position is untenable, insofar as it implies that any recursively enumerable language is learnable.
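To make the paradigm a little more concrete, here is a minimal sketch of learning "in the limit" from positive examples, under the usual textbook formulation rather than anything quoted in this post. The learner, the example languages, and all names below are assumptions chosen for illustration. The learner simply conjectures the set of sentences it has seen so far: on a finite target language its guess eventually stops changing, while on an infinite language every new example forces a revision.

```python
from itertools import count, cycle, islice


def learner(seen):
    """Conjecture exactly the finite set of sentences observed so far."""
    return frozenset(seen)


def conjectures(text, steps):
    """Yield the learner's guess after each of the first `steps` examples."""
    seen = set()
    for sentence in islice(text, steps):
        seen.add(sentence)
        yield learner(seen)


# A "text" for a finite target language: every sentence shows up, with repeats.
finite_target = frozenset({"a", "ab", "abb"})
finite_text = cycle(sorted(finite_target))
finite_guesses = list(conjectures(finite_text, 10))

# A text for the infinite language {a, aa, aaa, ...}: each example is new.
infinite_text = ("a" * n for n in count(1))
infinite_guesses = list(conjectures(infinite_text, 10))

# The guess settles on the finite target, but keeps changing on the infinite one.
converged = finite_guesses[-1] == finite_target
still_changing = len(set(infinite_guesses)) == len(infinite_guesses)
print(converged, still_changing)
```

This only illustrates the success criterion (the guess must eventually stabilize on the right language); it is not a proof of the theorem, which concerns what no learner at all can do for certain classes of languages.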

# The Deprojective Theory: The Basics

This post is a part of a series on the deprojective theory.

It is something of a platitude that science does or ought to provide us humans more than mere empirical data. Rather, the ideal, so it is said, is for science to provide us with knowledge. There are, of course, complications of nuance with any such bromide, but philosophers nevertheless like to go on to ask what scientific knowledge consists in. Scientific knowledge: what is the real thing? One candidate is that science explains rather than merely reports. That is, explanation animates scientific knowledge. But, then, what is explanation?

Explanation is itself hard to explain. Many theorists have maintained that explanation is a matter of giving answers to why-questions. These differ from answers to other interrogatives like what-questions, which are largely the domain of reporting and description, and how-questions, which are the domain of engineering and artistry. But there are still complications. For example, think about Jimmy McNulty, the homicide detective and anti-hero of David Simon’s series The Wire, standing next to a Baltimore coroner over the corpse of a man killed in a vacant apartment building. They might both ask:

Why did this man die?

Both, in their own way, are scientists. But they are asking different questions, and the answers which will satisfy them will therefore naturally be different. Probably the coroner is asking whether the man died of hypoxia, whereas Jimmy is asking who murdered him.

In a forthcoming work, The Foundations of Microeconomic Analysis, I characterize what I claim is the fundamental explanatory methodology of neoclassical microeconomics and, in particular, general equilibrium theory. I call the methodology analogical-projective explanation. The explanatory regime is based on a more general epistemological phenomenon which I call a deprojection. I focus here on the actual mode of explanation rather than its applications.

Both of these are terms of art, but there is really nothing new about them other than the terminology. Analogical-projective explanation belongs to a family of theories of explanation known collectively (and perhaps misleadingly) as deductivist theories. The term is misleading because these theories are meant to cover inductive as well as deductive methods of explanation.

What do they have in common? All such regimes of explanation require that when someone is explaining to you, say, why so-and-so is the case, that person makes an inference in which so-and-so is the conclusion. Why is the sky blue, you ask me? Here is a deduction which explains it: every time light passes through a scattering medium, light is scattered in a way inversely related to its wavelength. Every wavelength of light becomes more visible the more it is scattered. Of all visible wavelengths of the spectrum (i.e. not including x-rays, television waves, etc.), the blue-violet family has the shortest wavelength. The preceding three claims are what the philosopher C.G. Hempel called universally quantified natural laws (or laws of nature). When we add the plain assertion that the sky is a scattering medium, these laws, along with the rules of the predicate calculus, yield the conclusion that the sky is blue. In this case, the portion of the deduction not including the conclusion serves as an explanation for the conclusion.
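As a compressed sketch of the shape of this deduction, the three laws can be folded into a single generalization. The predicate letters here are assumptions introduced only for illustration, and the folding glosses over the intermediate steps about wavelength and visibility:

$$
\begin{array}{ll}
\text{Law:} & \forall x\,\bigl(\mathit{ScatteringMedium}(x) \rightarrow \mathit{LooksBlue}(x)\bigr) \\
\text{Fact:} & \mathit{ScatteringMedium}(\mathit{sky}) \\
\hline
\text{Conclusion:} & \mathit{LooksBlue}(\mathit{sky})
\end{array}
$$

The conclusion follows by universal instantiation and modus ponens; the lines above the bar are what does the explaining.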

Such explanatory inferences can be carried out in other inferential systems. Outside of predicate logic, Hempel spent much time focused on regimes of statistical inference for the purposes of induction.

Figure 1. Some deprojectionists, surely. (Royalty-free. Vault Editions.)


# Einstein, Podolsky, and Rosen

This post is a part of a series on physical theory.

In 1935, Einstein, Podolsky, and Rosen published a paper in the Physical Review entitled “Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?” The paper constitutes part of the foundation for Einstein’s now well-known skepticism about the entrenched probabilism of quantum mechanics, which is sometimes summarized by the quote often attributed to him that God does not play dice with the universe.

EPR (as they are now called) do not propose experimental results but rather a thought experiment and a philosophical argument. In a particularly vivid variant of their thought experiment due to Bohm (1951), now sometimes called the EPR experiment, two particles, an electron and a positron, separate at a point of radioactive decay. According to the experiment, they speed off in separate directions in such a way that their total spin is zero. Along a chosen axis $x$, the spin of each particle can take one of two real values ($+\tfrac{1}{2}$ and $-\tfrac{1}{2}$). For the observable property of spin along this axis, I will write for short “$S_x$.”

According to the formalism of quantum mechanics, the systems describing these two particles, individually, can be formulated as Hilbert spaces (that is, complex-valued inner product spaces), and the spin observable along the chosen axis is represented in the formalism by a Hermitian (self-adjoint) operator. The observable values for the operator are then its eigenvalues (and since the operator is Hermitian, these values will be real numbers). Corresponding to each eigenvalue, there is an eigenspace. As a fact of linear algebra, there is a projection operator which takes any complex-valued vector in the space and projects it onto the eigenspace defined by the $i$th eigenvalue. For an observable $A$, I will write the corresponding operator as “$\hat{A}$” and its $i$th eigenvalue as “$a_i$.” Likewise its projection operator for this eigenvalue will be written “$P^{A}_{i}$.”

States of the positron system and the electron system are given by complex-valued vectors. Part of the mystery of quantum mechanical systems is that in general there is no certain answer to experimental questions like “Does the electron have value of spin ½ in the direction $x$ given that the state is $\psi$?” According to the Born rule, that answer is given by a probability value which is obtained from the projection operator on the eigenspace defined by the eigenvalue ½. So if the state is $\psi$, then the probability of observable $A$ having as its value its $i$th eigenvalue $a_i$ is given by the following formula:
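$$\Pr(A = a_i \mid \psi) \;=\; \langle \psi, P^{A}_{i}\,\psi \rangle \;=\; \lVert P^{A}_{i}\,\psi \rVert^{2}$$

This is the standard statement of the Born rule for a normalized state $\psi$, where $P^{A}_{i}$ is the projection onto the $i$th eigenspace of the observable $A$; the symbols are assumed here for illustration. As a rough numerical sketch in plain Python (the basis vectors, helper names, and the choice of a spin-½ system prepared "up" along one axis and measured along a perpendicular one are all assumptions), the rule can be computed directly:

```python
import math


def inner(u, v):
    """Inner product <u, v>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def projector(e):
    """Matrix |e><e| projecting onto the line spanned by the unit vector e."""
    return [[ei * ej.conjugate() for ej in e] for ei in e]


def matvec(matrix, v):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def born_probability(proj, state):
    """<state, proj state>, assuming `state` is normalized."""
    return inner(state, matvec(proj, state)).real


s = 1 / math.sqrt(2)
up_z = [1 + 0j, 0j]           # spin +1/2 along z, written in the z-basis
up_x = [s + 0j, s + 0j]       # spin +1/2 along x, written in the z-basis
down_x = [s + 0j, -s + 0j]    # spin -1/2 along x, written in the z-basis

p_up_x = born_probability(projector(up_x), up_z)
p_down_x = born_probability(projector(down_x), up_z)
p_up_z = born_probability(projector(up_z), up_z)
print(p_up_x, p_down_x, p_up_z)
```

For a state that is definitely "up" along z, the two outcomes along x each come out at probability one half (up to floating-point rounding), while the outcome "up along z" has probability one.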


# Levels of Measurement and Cardinal Utility

A few weeks ago, I was having a chat with Todd and some others in the office, and it came up in conversation that cardinal utility had the property of preserving “intervals.” It was occasionally also mentioned that such utility representations were closed under “linear transformations.” I was confused by the discussion and at first I didn’t know why. On my walk home that day, I remembered I had heard those sorts of claims before. I typically think of a linear transformation as any mapping $T$ from a vector space $V$ to a vector space $W$, both over a field $F$, with the following properties of linearity:

1. if $c$ is in the field $F$, then for $v \in V$, $T(cv) = cT(v)$;
2. if $u, v \in V$, then $T(u + v) = T(u) + T(v)$.

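For example, the equation $f(x) = 2x$ is a linear transformation from the vector space of the set of reals back into itself. So, $f: \mathbb{R} \to \mathbb{R}$. When we speak of the algebra on $\mathbb{R}$ in one dimension, $\mathbb{R}$ is the underlying set for the vector space as well as the field.

Note that $f$ has the first property; suppose for example that $c = 3$ and $v = 4$. Then

$$f(3 \cdot 4) = 24 = 3 \cdot 8 = 3 f(4).$$

It also has the second property; for example, let $u = 1$ and $v = 2$; then

$$f(1 + 2) = 6 = 2 + 4 = f(1) + f(2).$$

But clearly, this linear transformation does not preserve intervals:

$$f(2) - f(1) = 4 - 2 = 2 \neq 1 = 2 - 1.$$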
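A quick numerical check of these three points, taking $f(x) = 2x$ as an illustrative linear map on the reals. The specific map, the sample numbers, and the helper names are assumptions made for this sketch:

```python
def f(x):
    """Illustrative linear map on the reals (an assumed example): f(x) = 2x."""
    return 2 * x


def is_homogeneous(t, c, v):
    """First property, checked at one point: t(c * v) == c * t(v)."""
    return t(c * v) == c * t(v)


def is_additive(t, u, v):
    """Second property, checked at one point: t(u + v) == t(u) + t(v)."""
    return t(u + v) == t(u) + t(v)


def interval(a, b):
    """Length of the interval from a to b."""
    return b - a


homogeneous = is_homogeneous(f, 3, 4)      # f(12) = 24 = 3 * f(4)
additive = is_additive(f, 1, 2)            # f(3) = 6 = f(1) + f(2)
before = interval(1, 2)                    # 1
after = interval(f(1), f(2))               # 2
print(homogeneous, additive, before, after)
```

Checking single points does not prove linearity, of course; the point is only that a map can satisfy both linearity conditions and still stretch the distance between two values.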

I didn’t think I could be wrong about my understanding of the conventional use of the term “linear.” I thought maybe what people mean instead of “linear” in this context is that cardinal utility is closed under the class of affine transformations. That is, the class containing all transformations of the following form (where $a$ is a scalar value in the field of $\mathbb{R}$ and $b \in \mathbb{R}$):

$$x \mapsto ax + b$$

# Strong Homomorphisms and Embeddings

Homomorphisms are usually defined as structure-preserving mappings from one model to another. Representation theorems are taken to establish the existence of a homomorphism between a qualitative first-order structure endowed with some empirical relations and some sort of numerical first-order structure. The classic example is in the measurement of hedonic utility using introspection. In that case, we describe the axiomatic conditions ensuring the existence of a mapping from a structure $\langle X, \succ \rangle$ to the structure of reals with their standard ordering $\langle \mathbb{R}, > \rangle$. Here $X$ is meant to be a (usually finite) set of alternatives or choices and the relation $\succ$ is meant to encode the introspectively accessible relation that something feels better than something else.

I had been thinking that this notion of a homomorphism was exactly the same as in model theory, but it turns out there are some subtleties. In model theory, usually we say that if we have two structures $\mathcal{A}$ and $\mathcal{B}$ in the same signature $\sigma$ (which is a set of constant symbols, relation symbols, and function symbols), then a homomorphism from $A$, the domain of $\mathcal{A}$, to $B$, the domain of $\mathcal{B}$, is a function $h: A \to B$ satisfying the following conditions:

• (i) For any constant symbol $c$ in $\sigma$, $h(c^{\mathcal{A}})$ is $c^{\mathcal{B}}$.
• (ii) For any $n$-ary function symbol $f$ in $\sigma$ and $a_1, \dots, a_n \in A$, $h(f^{\mathcal{A}}(a_1, \dots, a_n)) = f^{\mathcal{B}}(h(a_1), \dots, h(a_n))$.
• (iii:a) For any $n$-ary relation symbol $R$ in $\sigma$ and $a_1, \dots, a_n \in A$, $R^{\mathcal{A}}(a_1, \dots, a_n) \Rightarrow R^{\mathcal{B}}(h(a_1), \dots, h(a_n))$.

The superscripts here indicate how the symbols are interpreted in the respective structures with objects or $n$-tuples of objects. Taken together, these conditions are less demanding than what is usually meant, it appears, in the theory of measurement. There, the third condition is replaced with

• (iii:b) For any $n$-ary relation symbol $R$ in $\sigma$ and $a_1, \dots, a_n \in A$, $R^{\mathcal{A}}(a_1, \dots, a_n) \Leftrightarrow R^{\mathcal{B}}(h(a_1), \dots, h(a_n))$.

In model theory, a mapping satisfying (i), (ii), and (iii:b) is called a strong homomorphism. The condition ensures that if two objects, for example, are not $R$-related in $\mathcal{A}$ then their images will remain unrelated by $R$ in $\mathcal{B}$.

If we add the condition that $h$, a strong homomorphism, is an injection then the map is usually called an embedding. Likewise if a strong homomorphism is a bijection then it is an isomorphism.
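The difference between the two relation conditions is easy to see on small finite structures. The following is a minimal sketch, assuming a signature with a single binary relation symbol and no constants or function symbols, so that only the third condition matters. The structures, element names, and helper functions are hypothetical.

```python
from itertools import product


def is_homomorphism(h, rel_a, rel_b):
    """(iii:a): every related pair in A is sent to a related pair in B."""
    return all((h[x], h[y]) in rel_b for (x, y) in rel_a)


def is_strong_homomorphism(h, dom_a, rel_a, rel_b):
    """(iii:b): a pair is related in A exactly when its image is related in B."""
    return all(((x, y) in rel_a) == ((h[x], h[y]) in rel_b)
               for x, y in product(dom_a, repeat=2))


def is_embedding(h, dom_a, rel_a, rel_b):
    """A strong homomorphism that is also injective."""
    injective = len(set(h.values())) == len(dom_a)
    return injective and is_strong_homomorphism(h, dom_a, rel_a, rel_b)


# Target structure B: the numbers 0 and 1, with 1 related to 0 ("1 is better").
rel_b = {(1, 0)}
dom_a = ["p", "q"]
h = {"p": 0, "q": 1}

# Case 1: nothing is related in A. h is a homomorphism (vacuously), but it
# sends the unrelated pair (q, p) to the related pair (1, 0), so it is not strong.
no_relation = set()
weak_only = (is_homomorphism(h, no_relation, rel_b),
             is_strong_homomorphism(h, dom_a, no_relation, rel_b))

# Case 2: q is related to p in A. Now h is strong and injective: an embedding.
q_better = {("q", "p")}
embeds = is_embedding(h, dom_a, q_better, rel_b)

# Case 3: both points are sent to 0. Strong, but not injective.
collapse = {"p": 0, "q": 0}
strong_not_embedding = (is_strong_homomorphism(collapse, dom_a, no_relation, rel_b),
                        is_embedding(collapse, dom_a, no_relation, rel_b))
print(weak_only, embeds, strong_not_embedding)
```

The first case is the one that matters for measurement: under the weaker condition, a numerical assignment could rank one alternative above another even though no such relation holds between them in the qualitative structure.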