Gold’s Theorem

This post is a part of a series on machine learning theory. It is for my colleague Silvia.

John Locke begins An Essay Concerning Human Understanding, published in 1690, with a bold, empiricist aspiration: to demonstrate that all knowledge obtained by the Understanding is grounded in the faculty of sensation. Locke writes that he intends to show how agents “barely by the use of their natural faculties, may attain to all the knowledge they have, without the help of any innate impressions; and may arrive at certainty, without any such original notions or principles.” A competing, established opinion, he writes, claims that the Understanding depends in part upon “primary notions, koinai ennoiai, characters, as it were stamped upon the mind of man; which the soul receives in its very first being and brings into the world with it.” Locke does not say here which philosophers he has in mind in his rebuke of this theory of innate ideas, though the view resembles a psychological and epistemological position affirmed decades earlier by Descartes and also one by Locke’s contemporary G.W. Leibniz.

Arguably, many early 20th-century Anglophone philosophers shared in a Lockean repudiation of innate ideas. Logical positivists, logical empiricists, and psychological behaviorists could embrace a rebuke of innate ideas as a point in favor of scientific rigor and against the excesses of a priori metaphysics. A notable challenge to these 20th-century views came from Noam Chomsky, who writes in an article in Synthese in 1967 that “contemporary research supports a theory of psychological a priori principles that bears a striking resemblance to the classical doctrine of innate ideas.”

Here, Chomsky puts forward several arguments in favor of this claim, including the argument that language learning in humans has an aspect of creativity: when children achieve linguistic competence, they gain an ability to produce new phrases and sentences that are distinct from “familiar” sentences that they have already seen or heard and also distinct from generalizations of familiar sentences present in their learning environments as stimuli.

Image
Figure 1. Some fruit. (Royalty-free. Vault Editions.))


In the same year, E. Mark Gold published an article which proposes a formalized paradigm for learning from stimulus examples—now called the Gold learning paradigm—and also a general but very powerful unlearnability theorem about language acquisition in this paradigm (today called Gold’s theorem). The unlearnability theorem states that many classes of languages (even fairly restricted classes such as the regular languages) are unlearnable in the formal paradigm from stimulus examples. For many cognitive scientists and philosophers, Gold’s Theorem provides strong (perhaps even incontrovertible) evidence for a Chomskyan innatism about language acquisition. Since Gold’s Theorem shows that classes of languages, such as the recursively enumerable languages, are not learnable from stimulus examples alone and since there are recursively enumerable natural languages that are learned by children, there must be innate psychological faculties or ideas which aid children in learning the languages that they do from their environments. This anti-empiricist attitude is described by Demopoulous (1989) claim that Gold’s Theorem provides an argument against rationalism. They write:

The actual relation of Gold’s results to the empiricism/rationalism controversy seems to us rather different. Gold’s paradigm looks a lot more like a formalization of so-called `rationalism’. The fixed class of candidate hypotheses (grammars) corresponds to what is given by universal grammar: the innate definition of the essential properties of language. What Gold actually shows, therefore, is not “the plausibility of rationalism” but rather the inadequacy of a huge range of rationalist theories: under a wide range of different choices of universal grammar, language acquisition appears to remain impossible.



The apparent consistency of Gold’s impossibility theorem with both a theory of rationalism and a competing theory of empiricism suggests that Gold’s Theorem might very well be irrelevant to the rationalism-empiricism controversy.

Some have sought recently to explain the source of this irrelevance by contending that Gold’s paradigm of learning is insufficiently nuanced to truly model language learning in humans. For example, Johnson (2004) identifies several intuitive differences between Gold’s paradigm and language learning in children, including the interesting and plausible claim that language learning in humans may be inexact: humans may only learn an approximation of the language (or languages) used in their stimulus environments.

I shall argue in what follows that placing the blame on the Gold paradigm of learning is misguided because results similar to Gold’s Theorem can easily be reproduced in more nuanced formal paradigms of language learning. In Section 1, I review Gold’s paradigm and Gold’s Theorem. In Section 2, I review a variety of rationalist and empiricists interpretations of Gold’s theorem and also construct the standard anti-empiricist argument using Gold’s Theorem. In Section 3, I consider a variety of empiricist objections to the standard anti-empiricist argument, including an important objection about inexact learning. I show in each case, however, that none of the objections are sufficient to diffuse the anti-empiricist argument or to show what is wrong with the Gold paradigm of learning.

Introduction to Identification in the Limit



Gold’s Paradigm



In 1967, E. Mark Gold published a highly influential and frequently-cited article in the computer science journal Information and Control which investigates the properties of a formally described paradigm for learning which he termed identification in the limit (Gold, 1967). The paradigm of learning Gold describes concerns the induction of formal languages, which are similar to the highly abstract languages considered today in standard textbooks on Automata Theory. Formal languages, in these books, are sets of finite sequences of symbols generated from some symbolic alphabet. For example, if we define an alphabet as the set of lowercase English letters (i.e. ), then the following set is a formal language for :



So is the following finite set, :



If we denote as the set of all the finite sequences of the elements of , then any subset of is a formal -language. In the following, I refer to as the set of all formal -languages (the relevant alphabet should be clear and if not I will write ). and in the preceding cases are names of languages rather than languages themselves, though this is a distinction I shall largely ignore.

The specification of formal languages lack many of the familiar syntactical, semantical, and phonological elements we might expect when describing a natural language. For example, even though the two examples above are fully specified -languages, nothing is said in their specification about the meanings of their elements or even whether the elements are phrases, sentences, or words in the language. (As we shall see later on in this paper, however, even though formal languages are not explicitly specified with features such as denotation functions, they nevertheless are capable of representing many semantic features.)

Though formal languages are seemingly threadbare in comparison to their natural language analogues, their highly abstract nature has proven useful in the investigation of their computational properties. Perhaps the most famous example of this kind of investigation was the discovery of the computational hierarchy of formal languages, which demonstrated that formal languages can differ in the strength of the computational devices needed to effectively recognize them. For example, some formal languages, like those definable by a regular expression, can be decided by a memoryless finite-state automaton, whereas others require a device which has a memory mechanism, such as a pushdown automaton or Turing machine. The resulting hierarchy of formal languages organized by computational strength (described in Hopcroft (1979), Sipser (1996), and likely any other standard textbook on the theory of computation), and as illustrated in Figure~, is now called the Chomsky hierarchy after its introduction in Chomsky (1956).

Image
Figure 2. The Chomsky Set Inclusion Hierarchy of Formal Languages

Gold (1967) proceeds in a similar spirit in order to investigate the computational learning properties of formal languages, where learning is formally understood as identification in the limit. According to this paradigm of learning (now called the Gold paradigm), a formal language for some alphabet is first selected as a target language. A learner in this paradigm receives a sequence of input signals from a learning environment for the target language.

Learning environments differ in the kind of signal that they send to the learner. The first kind of learning environment is what Gold calls a text, which presents examples from the language as its input signals. For example, if the target language is , then the first signal might be while the second example might be . However, not just any arbitrary sequence of elements from a -language is a text. Instead, the sequence must meet the following condition: each element of the target language must appear somewhere in the sequence.

Each time a learner receives a stimulus example from the text, the learner guesses the name of a language, such as “”. This pattern of input-stimulation and guessing continues ad infinitum.

At any time during training, we say a learner has observed a finite initial segment of a training sequence. Hence, formally, a learner is a function from finite sequences of elements of a training sequence into names of languages. For example, if has observed and makes the guess “,” then we shall write:



A learner conditionally T-learns (for “text-learns”) a language given a text just in case is presented and after some finite time ‘s guesses are the same and are a name of . Furthermore, unconditionally T-learns iff given any text which targets conditionally learns given .

Gold’s unlearnability theorem concerns classes of languages rather than individual languages which we elaborate with the two following definitions:

Definition. A learner T-learns a class of languages iff for each , unconditionally T-learns . A class of languages is T-learnable iff there exists a learner which T-learns .


Gold also describes another kind of learning environment which he calls an informant. Unlike a text, which contains only examples of words from the target language, an informant stimulates the learners with a sequence of boolean indication pairs where is a finite sequence of symbols from alphabet and is a boolean indicator indicating whether or not this word belongs to the target language. As expected, a sequence of such pairs is an informant for a language only if every element of occurs somewhere in the sequence in a pair with a positive boolean indication.

Image
Figure 3. Just Another Botanical (Royalty-free. Vault Editions.)

Analogously, we will say that learners conditionally I-learn or classes of languages are I-learnable when we restrict ourselves to learnability situations which utilize informants rather than texts.

Note that showing that a class of languages is not learnable is equivalent to showing that for all learners, there exists a language in the class and a stimulus environment which targets that language wherein the learner does not conditionally learn the language given that environment. Furthermore, if a class of languages is unlearnable then every one of its superclasses is likewise unlearnable. If a class of languages is learnable, then any of its subclasses is likewise learnable.

Gold describes several varieties of informants and texts as well as the kinds of methods used for naming languages, though these are out of the scope of this study. Nevertheless, one point bears emphasis: the text-learner is a means of formally describing positive learning from examples. Hence text-learners are by construction much more restricted than informant-learners, which may utilize negative information as well as positive information. I will something more about which kind of learning model more closely approximates the kind of learning exhibited by humans. First, however, I review the proof of Gold’s famous impossibility theorem.

Gold’s Unlearnability Theorem for Text-Learning



Gold (1967) catalogs the learnability strength of the various environments presented in the preceding section. The results can be summarized by the following table, similar to one in the original article, where the “dividing lines” separate classes of languages by their learnability.

Image
Table 1. Gold’s Hierarchy of Dividing Lines of Learnability

The hierarchy as presented in Gold’s original article places no special emphasis on the first such dividing line, though almost all of the subsequent exegesis by philosophers and psychologists has focused on the result establishing it. This result (Gold’s Theorem, as it is now called) demonstrates that text-learning is a comparatively weak learnability model.

Gold’s Theorem concerns the learnability of classes of formal languages which Gold calls superfinite. A superfinite class of languages is any which contains a certain sequence of finite languages and one infinite language. More precisely we will say that a class of languages is superfinite just in case it satisfies the following two properties:

  • Infinite Ascending Chain contains an infinite ascending chain of languages such that for each there exists an such that .
  • Limit Language contains a limit language such that is a word in for some iff .


If we limit our attention to merely stimulus environments which are texts, we can prove Gold’s theorem.

Theorem. (Gold’s Unlearnability Theorem.) If a class contains a superfinite class then not T-learnable.

Proof.

Assume the antecedent and suppose toward a contradiction that some learner T-learns . This means that for every language in and every text that targets , conditionally T-learns given . It is enough to show that there is a stimulus environment that targets a language in and the range of under that stimulus environment is infinite.

Since is countably infinite, there is a bijection from the natural numbers to the members of . Let denote such a bijective function so that where .

For each , there is a subsequence of which contains only and all members of .

Construct a text as follows:

Begin by enumerating repeating if necessary until guesses “.” Otherwise, this construction targets and does not T-learn it. Next start stimulating with a subsequence of S (which is a supersequence of ) containing all and only members of until guesses “.” Continue in this way for each . This constructed sequence targets , but the range of under the construction is infinite. The construction targets which is in , but does not learn . This contradicts our assumption.∎




Rationalist and Empiricist Interpretations



Ever since its introduction in the 1960s, Gold’s unlearnability theorem for text-learning has been at the center of an ongoing debate whose contestants regard the theorem as revealing something significant about the nature of language acquisition in humans and also the rationalist-empiricist debate. Since the theorem is stated in the generic framework of formal languages, many who regard formal language identification as a means of abstractly representing scientific inquiry also naturally see Gold’s Theorem as saying something important about the nature of induction in general. In fact, one textbook on formal learning theory, Jain (1999), refers to learners as scientists. In this section, I consider several varieties of arguments which employ Gold’s Theorem in order to take sides in the rationalist-empiricist debate.

Image
Figure 4. Just Another Botanical (Royalty-free. Vault Editions.)

One persistent difficulty in understanding the dialectic of the rationalist-empiricist debate since the publication of Gold’s Theorem is that not all commentators agree upon or even specify what they mean by empiricism or rationalism. To set the terms of engagement, I shall take empiricism for the remainder of this discussion to be the doctrine that the only innate faculties humans possess to facilitate the learning of a first language are sensory faculties. I take rationalism or nativism about language learning to be the doctrine that humans possess innate conceptual faculties other than sensory faculties which assist humans in some significant way in language acquisition.

One argument employing Gold’s Theorem is expressed by the quote from Scholz (2011) in the introductory section of this paper. For Scholz, Gold’s Theorem shows that many rationalist theories of language acquisition must be mistaken. If language acquisition is the selection of one language among an innately determined class of languages characterized by a Universal Grammar, and if this class has the superfiniteness property, then clearly this class fails to be learnable from text.

However, the purported challenge to rationalism presented by Gold’s Theorem here is overstated. If Gold’s Theorem shows anything at all about rationalist theories of language acquisition which appeal to the idea that learning is selection among the class specified by Universal Grammar, it is that the class of languages must be much more restricted than previously expected. Faced with the results proved in Gold’s Theorem, a rationalist might naturally be inclined to argue that the theorem is simply more proof of the existence of innate conceptual faculties which exclude features such as superfiniteness from the class of candidate grammars. In this sense, Gold’s Theorem is an argument for rather than against rationalism. Indeed, Wexler (1983) show that placing enough restrictions on grammars makes them identifiable by a text-learner.

Image
Figure 5. Just Another Botanical (Royalty-free. Vault Editions.)

A second and perhaps more prevalent conclusion is that Gold’s Theorem presents a serious challenge to empiricism about language learning. Since text-learners, the argument goes, are formal functional representations of language learners who receive positive linguistic evidence in the form of sensory input, and since empiricism maintains that human language learners are the learners who receive positive linguistic evidence via sensation, empiricism must have the purportedly absurd consequence that human language learners cannot learn classes of languages like the context-sensitive languages and context-free languages.

In light of this strange consequence, one possible alternative available to the empiricist (and one that Gold himself suggests) is to argue that Gold’s Theorem does not invalidate empiricism but merely shows that children receive negative stimulus information in addition to positive evidence through sensory faculties. Gold writes:

The child receives negative instances by being corrected in a way we don’t realize. If we can assume that the child receives both positive and negative instances, then it is being presented information by an “informant.” The child may receive the equivalent of negative information when it does not get the desired response to an utterance. It is difficult to interpret the actual training program of a child in terms of the naive model of a language assumed here.



Though Gold’s suggestion seems a reasonable one, a first major difficulty lies in providing an adequate psychological interpretation of negative evidence in language learning. One obvious candidate is corrective feedback from an interlocutor, as evident in the following example:

(1) Child: *I dranked the juice. Father: I drank the juice.



Do we regard this instance of corrective feedback as negative stimulus evidence in the Gold paradigm? One problem here is to explain why the father’s response, as perceived by the child, is a correction of the preceding statement rather than a distinct (and empathetic) indicative statement which contains a stylistic variation of the word “dranked.” One possibility is that corrective feedback is colored by tonal cues, for example:



(2) Child: *I dranked the juice.
Father (sternly): I drank the juice.



In order to perceive the father’s stern response in () as negative, we must assume that the child has associated vocal sternness with ungrammaticality. Clearly in the earliest stages of language development, no such conceptual competence will be present (at least according to any sufficiently strong theory of empiricist language acquisition), and hence we must conclude that the initial stages of language learning occur from positive examples alone.

An analogous point can be made about the following exchange:

(3) Child: *I dranked the juice.
Father : Child, “I dranked the juice” is not grammatical.



Here, we must assume that the child has mastered, to some degree, the concept of grammaticality as a metalinguistic notion and that it applies to quoted entities. Again, according to any reasonably strong empiricist theory, such concepts are not innate in human language learners and hence they must also be acquired from sensation. Again, the empiricist must concede that initial stages of language learning occur from positive examples alone.

Image
Figure 6. Just Another Botanical (Royalty-free. Vault Editions.)

A second (and probably more serious) problem in characterizing corrective feedback as negative evidence is that the responses in examples in (1), (2), and arguably also () are grammatical and hence positive rather than negative examples. The judgment that the responses are negative is made only in relation to the constructions generated by the child. However, stimuli in Gold’s paradigm do not in general contain constructions generated by the leaner in addition to those provided by the environment.

Finally, Gold’s suggestion that the unlearnability theorem demonstrates that children must receive negative evidence contradicts a long and growing list of empirical studies, which conclude that children receive relatively little corrective feedback. One of the earliest studies, Brown (1970), concludes that their “results provide no support for the notion that there is a communication pressure favoring mature constructions.” A more recent study, Marcus (1993), concludes in a similar vein about problems associated with noisy negative feedback (a kind of corrective signal sent after some errors and after some correct sentences, but in different proportions):

There are three serious problems with the position that parents provide negative evidence to help their children learn language. First, if such feedback does exist, it is too noisy to be used in practice. Second, it is not available to all children, not available at all the relevant ages, and probably not available for many types of errors. Third, reply categories have often been defined relative to the child’s preceding utterance, and thus observed correlations between parental replies and children’s utterances may simply be artifacts of the coding scheme. Any one of these problems is enough to significantly undermine the position that parents provide negative evidence to their children is significantly undermined.



With these considerations about negative evidence in mind, we can characterize the anti-empiricist argument schematically:

  • R1 If natural languages are learnable, children have access to negative data or the learning of natural languages is constrained by innate cognitive faculties restricting the hypothesis class of languages.
  • R2 Children don’t have access to negative data.
  • R3 Natural languages are learnable.
  • C There are innate constraints on language acquisition.


Empiricist Rejoinders and Responses



Does Gold’s Theorem provide incontrovertible proof against the doctrine of empiricism about language acquisition? Many cognitive scientists, philosophers, and psychologists inclined towards empiricism have disputed the relevance of Gold’s Theorem to language learning in humans. This attitude is captured nicely by Johnson (2004), who analogizes the relationship between Gold’s Theorem and “real life” language learning to the relationship between the First Incompleteness Theorem and calculators:

In general, the relation of Gold’s Theorem to normal child language acquisition is analogous to the relation between Gödel’s first incompleteness theorem and the production of calculators. Gödel’s theorem show that no accurate calculator can compute every arithmetic truth. But actual calculators don’t experience difficulties from this fact, since the unprovable statements are far enough away from normal operations that they don’t appear in real life situations.



In this section, I review a variety of empiricist rejoinders to the anti-empiricist argument in this vein. Despite their variety, we shall see that they share a common theme: all claim that Gold’s Theorem is irrelevant to the anti-empiricist argument and to language learning because Gold’s paradigm is either over-general, under-specific, or insufficiently nuanced. The general form of the argument disputes whether we can genuinely substitute the informal, “real life” notion of language learnability with the formally described notion of identifiability in the limit.

Image
Figure 7. Just Another Botanical (Royalty-free. Vault Editions.)

Rejoinder 1: Computational Theory of Mind. The first empiricist rejoinder claims that any application of Gold’s Theorem to the “real life” case of language learning must implicitly depend upon a controversial conviction about the computational theory of mind. The argument goes that before we settle whether formal notion identifiability in the limit is a substitute for the informal notion of language learnability, we must first determine whether the human mind is or exhibits characteristics sufficiently similar to a computer.

The reason this objection to the anti-empiricist argument fails is because Gold’s Theorem is a result which concerns more than simply class of partial recursive functions. If Gold’s Theorem established the fact that no computable learner could identify any superfinite class in the limit, then the anti-empricist argument would presumably require a subsequent appeal to a CTM-like principle which asserted that cognitive capacities like learning were restricted to those that could be modeled by some Turing Machine (or analogous device). However, Gold’s Theorem proves that no function can identify a superfinite class (or superclass of a superfinite class) in the limit. For this reason, the anti-empiricist argument does not depend upon any controversial claims about the computational theory of mind.

Rejoinder 2: Semantics and the E-Language/I-Language Distinction. The second empiricist rejoinder takes aim at a representational assumption in Gold’s paradigm. According to this objection, we cannot equate identifiability in Gold’s paradigm with learnability of languages since oversimple formal languages are identified in the limit rather than feature-rich natural languages, which bear semantic properties in addition to syntactic properties.

This objection also fails: though formal languages are commonly presented in standard textbooks as containing elements which bear only syntactic properties, this does not imply that formal languages cannot represent semantic properties. For example, consider an arithmetical term language generated from an alphabet containing digits, an addition symbol, and a separator symbol: . With this alphabet, we can represent a term language with syntactic elements to the left of the separator and an indication of their meaning to the right of the separator:



Here the separator simply allows us to specify a denotation function using shorthand. Presumably many other semantic and phonetic features could be specified in a similar way.

A similar reply can be made about a variant of the second rejoinder, which makes use of a distinction introduced by Chomsky (1986) between E-languages (for “externalist”) and I-languages (for “internalist”). The distinction is characterized in Barber (2010) as follows:

An I-language is an abstract state the `mind/brain’ can be in, an informational state or state of knowledge … The `E’ in `E-language’ stands for `external’. The properties of an E-language are `independent of the properties of the mind/brain’ and in particular of the language module of the E-language’s possessor, being determined at least partially by external characteristics such as overt behaviour or the social or physical environment.



According to the objection, socially shared, observable verbal and written productions like sentences are parts of an E-language rather than an I-language, but what human beings learn are I-languages, which are collections of mental representations. Again, however, the same response applies: formal languages do not specify whether the elements which they contain are socially observable productions or internal representations. Indeed, so long as the mental objects can be represented as combinations or sequences of some (potentially infinite) alphabet, then I-languages can be just as easily be represented as formal languages as E-languages.

It might be maintained, however, that the moral of the E-language/I-language distinction objection is that it is possible for different speakers of the same E-language to possess distinct I-languages. Hence, it is possible for a child to learn an I-language that is distinct from its environment. As a response to Rejoinder 5, however, I shall introduce a modified paradigm of learning where this is possible but show that a Gold-style theorem can easily be reproduced.



Rejoinder 3: Infinite Learning Time. The third empiricist rejoinder again targets a feature of the Gold learning paradigm. In Gold’s paradigm, learners can identify a language in the limit after converging upon it after some finite time. As a consequence, Gold’s paradigm permits for the stimulus period prior to successful convergence to be arbitrarily large. However, the stimulus period for language learning in normally developing humans is bounded by some fixed and comparatively small number. This period is called the critical period by some developmental psychologists. In general, the objection continues, the idealizations made by the Gold paradigm about the time required to acquire a language make it a poor substitute for modeling “real-life” language acquisition.

However, we can easily construct a Gold’s Theorem style result once we modify Gold’s paradigm to incorporate the notion of a critical period. Suppose, then, that language acquisition is construed formally as -learning, a modification to Gold’s identifiability in the limit: a learner -learns a language given a text iff is presented with and, after some finite time less than , guesses and continues to guess a name of . Let unconditional -learning and -learnability be defined in a manner analogous to the definition of -learning. What results is the following impossibility theorem:

Theorem.

If a class contains languages and such that , then is not -learnable.
Proof. Suppose towards a contradiction that -learns . This means that for every text that targets (which is a member of ), -learns . Construct a text that targets as follows: enumerate all and only the elements of times (repeating if necessary). By hypothesis, and hence -learns . Thus has converged upon the guess of at the th element of . Now cycle through all the remaining members of . targets but fails to -learn . ∎


Since the antecedent of Theorem~ is less demanding than Theorem~, what seems to follow is that identifiability by -learning is in some sense “easier” than -learning. Since we acquired an unlearnability theorem for -learning, a fortiori we should have one for learning.

Rejoinder 4: Uniformity of Stimulus Environment The next rejoinder criticizes the Gold paradigm of learning for permitting any sequence of stimulus examples to be considered a text environment just so long as each element appears somewhere in the sequence. This leads to the possibility that highly abnormal training sequences are permitted as acceptable environments. For example, suppose that the target language is (which is the language where each element is the string “” repeated one or more times). Now suppose that the training example consists of for the first 100 trillion stimulus examples followed by a subsequence which contains at least at least one occurrence of the remaining elements of . Clearly no human language learner is capable of observing 100 trillion spoken or written stimulus examples, and hence no human language learner should be expected to infer from that stimulus environment.

An adequate model of learning should seek to eliminate such abnormal environments in favor of normal ones. This attitude is expressed in Johnson (2004), who claims that an adequate model of language learning will inevitably eliminate odd or repetitive environments:

But on the other hand, identifiability quantifies universally over all environments, however odd or repetitive, whereas acquirability quantifies (almost) universally only over the normal environments. Thus, acquirability deals with fewer environments than identifiability does, so there is less opportunity for a collection of problematic texts to show up and render a class unacquirable.



What is normality of a stimulus environment? To avoid say odd repetitions, you might require that a text additionally be \texit{maximally uniform}. So suppose that texts have an an external process for generation in which they are generated by sampling a uniform distribution over a given language. Hence, we would think it unlikely to find long stretches of repetitions. We can define the idea of maximally uniform text learning or learning as an analogue of -learning with maximally uniform texts.

Alternatively, suppose that we have a metric of (grammatical) complexity on the contents of each language . Suppose, likewise the likelihood of seeing a stimulus example is based upon a normal distribution where sentences with low complexity have a higher chance of being seen. Call such a generated text a complexity-weighted text with learning defined like -learning.

These are seemingly ways of normalizing environments. But in either case, Gold’s theorem trivially follows, mutatis mutandis. That is because the proof of Gold’s theorem did not make use the order of the generated subsequences. All that is needed is that, whatever probability distribution is used, it provides full support to the language (i.e. no element has measure zero probability).

Rejoinder 5: Inexact Learning. Another objection, similar to Rejoinder 3, disputes the strictness of the Gold paradigm requirement that languages be identified exactly. The underlying moral of the suggestion is that children do not precisely acquire a single language from a targeting stimulus environment. Instead they learn something like the language from their environment.

To formalize this notion, we must stipulate that there is a determinate binary similarity relation so that bears to just in case is sufficiently similar to so that can be guessed from an environment while still learning . We shall not assume that is necessarily symmetric: even if can be guessed from an environment, it may not be that can be guessed from an environment.

Suppose further, we construe identifiability formally as -learning, a modification of Gold’s identifiability in the limit: a learner conditionally -learns a language given a text for iff is presented with , there exists a finite set ) to which is sufficiently similar (i.e. , for ), and after some finite time guesses and continues to guess a name of a member of . Call the neighborhood of .

Similarly, unconditionally O-learns iff for any text , if targets then conditionally O-learns given . Just as with T-learning, a learner learns a class of languages whenever it unconditionally O-learns every member in that class.

Finally a class of languages is O-learnable iff there exists a learner such that for every language in , and every text which targets , conditionally O-learns from .

Here, once again, there is no trouble in modifying Theorem~ appropriately so that a similar result follows. For example if is a strictly reflexive relation (i.e. iff ), then Theorem~ immediately follows for -learning since the case is exactly similar. A more general result and less restrictive result, however, can be obtained.

Theorem. If a class contains a superfinite class of languages (i.e contains languages and a limit language and for the superfinite class, then is not -learnable.
Proof.

Assume the antecedent, but suppose toward a contradiction that some learner O-learns . This means that for every language in and every text that targets , conditionally O-learns given .

If for some there does not exist an such that bears to , then a contradiction follows trivially, since is not -learnable. Hence we assume that for each in the superfinite class there is at least one such that and are -related.

Determine an enumeration of , . For each , there is a subsequence of which contains only and all members of .

We shall construct a text as follows:

Begin by enumerating repeating if necessary until guesses some such that . (Otherwise, this construction targets and does not -learn it).

Next enumerate the subsequence of (which is a supersequence of ) containing all and only members of , repeating if necessary. until will guess such that . Again, if this is not the case, then this construction targets and does not -learn it.

Continue this procedure for all subsequences to construct .

The behavior of under the proper initial segments of the constructed sequence is as follows: because is finitely bounded for the superfinite class, does not converge to any guess after any finite time. Instead, it alters its guess indefinitely. A fortiori, it does not converge upon some language that can be guessed for .

Hence, the constructed sequence targets but the range of under the construction is infinite and hence does not learn , which is in . This contradicts our assumption. ∎




Easy peasy lemon squeezy.

Note that the finite boundedness of for the superfinite class is a highly plausible and weak assumption. First, the requirement is not intended to hold in general for all languages, but only for those languages in some superfinite class. Second, the finite boundedness condition does not require that the number of languages to which any given language in the superfinite set is -related is itself finite. Indeed, it may be possible that an infinite number of “dialects” or idiolects can be guessed from a stimulus language in the superfinite set. It may also be that these “guessed” languages are themselves distinct from the set of languages in the superfinite class.\footnote{If all the “guessed” languages are distinct from the class of superfinite languages, then it is possible that each stimulus environment language in the superfinite class, bears to an infinite number of “guessed” languages and each of these languages bears to one another.}

What the finite-boundedness condition for the superfinite class asserts is that one such dialect can only be guessed from a finite (though potentially large) number of stimulus languages in the superfinite class. To speak metaphorically, in terms of accents or dialects, it may be possible to learn the Queen's English from an East Anglian or Cambridgeshire stimulus environment but the Queen's English cannot be guessed from some sufficiently distant and cofinite set of stimulus environment languages, like Yemeni Arabic, Lousiana Creole, the Pyongan dialect of Korean, and so on.

Image
Figure 8. Not Another Botanical. (Royalty-free. Vault Editions.)

Conclusion



I have demonstrated in the preceding that Gold's Theorem and the anti-empiricist argument are not easily diffused. Though the Gold paradigm is clearly a highly abstract and idealized model of learning, it remains difficult to construct an empiricist argument against its irrelevance to natural language learning. Most empiricist style objections to the anti-empiricist objection argue that Gold's paradigm for learning is a poor approximation of natural language learning. However, modifying Gold's Theorem by introducing additional learning features seems in general to make the learning criteria stricter than the criteria in the original paradigm. Since learning in these new models is not made any easier, it should ultimately not be surprising that unlearnability results can be reproduced for these more refined models of learning.