How people learn from sparse data (from Tenenbaum et al., 2011)

In concept learning, the data might correspond to several example objects (Fig. 1) and the hypotheses to possible extensions of the concept. Why, given three examples of different kinds of horses, would a child generalize the word “horse” to all and only horses (h1)? Why not h2, “all horses except Clydesdales”; h3, “all animals”; or any other rule consistent with the data? Likelihoods favor the more specific patterns, h1 and h2; it would be a highly suspicious coincidence* to draw three random examples that all fall within the smaller sets h1 or h2 if they were actually drawn from the much larger h3. The prior favors h1 and h3, because as more coherent and distinctive categories, they are more likely to be the referents of common words in language. Only h1 scores highly on both terms.
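
As a rough numerical sketch of this reasoning (not from the article; the extension sizes and prior weights below are invented for illustration), the argument amounts to Bayesian scoring with the size principle: a hypothesis covering fewer objects assigns each randomly drawn example a higher probability, so its likelihood grows relative to broader hypotheses as examples accumulate, while the prior rewards coherent, nameable categories. A short Python sketch:

# Bayesian scoring with the "size principle": under strong sampling,
# P(data | h) = (1/|h|)^n for a hypothesis h containing all n examples.
# Extension sizes and priors are made-up placeholders, not values from the article.

hypotheses = {
    # name: (size of extension, prior probability)
    "h1: all horses":                    (20,  0.45),
    "h2: all horses except Clydesdales": (18,  0.10),  # less coherent category -> lower prior
    "h3: all animals":                   (500, 0.45),
}

n_examples = 3  # three example horses, consistent with all three hypotheses

unnormalized = {name: prior * (1.0 / size) ** n_examples
                for name, (size, prior) in hypotheses.items()}
z = sum(unnormalized.values())
posterior = {name: score / z for name, score in unnormalized.items()}

for name, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{name:36s} posterior = {p:.3f}")

With these placeholder numbers, h2 edges out h1 on likelihood alone and h3 ties h1 on the prior, but only h1 ends up with a high posterior, which is the point of the example.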

Likewise, in causal learning, the data could be co-occurrences between events; the hypotheses, possible causal relations linking the events. Likelihoods favor causal links that make the co-occurrence more probable, whereas priors favor links that fit with our background knowledge of what kinds of events are likely to cause which others; for example, a disease (e.g., a cold) is more likely to cause a symptom (e.g., coughing) than the other way around.
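
A minimal sketch of the same trade-off for causal learning, again with invented counts and prior weights (and using maximum-likelihood fits as a stand-in for a fully Bayesian marginal likelihood): the likelihood term separates "some link" from "no link", because a link makes co-occurrence more probable, but purely observational co-occurrence data fit the two link directions equally well, so it is the background-knowledge prior that favors disease -> symptom over symptom -> disease.

import math

# Compare three causal hypotheses about two binary events, cold (C) and cough (S),
# from co-occurrence counts. All numbers are illustrative placeholders.

counts = {(1, 1): 30, (1, 0): 10, (0, 1): 15, (0, 0): 45}  # joint counts over (cold, cough)
n = sum(counts.values())

def log_lik(p_joint):
    """Multinomial log-likelihood of the observed counts under p_joint."""
    return sum(c * math.log(p_joint[cell]) for cell, c in counts.items())

# "No link": cold and cough independent, fitted from the marginal frequencies.
p_c = sum(c for (cold, _), c in counts.items() if cold) / n
p_s = sum(c for (_, cough), c in counts.items() if cough) / n
p_indep = {(a, b): (p_c if a else 1 - p_c) * (p_s if b else 1 - p_s)
           for a in (0, 1) for b in (0, 1)}

# "Cold -> cough" and "cough -> cold": either direction can fit the observed joint
# frequencies exactly, so their likelihoods tie; only the prior separates them.
p_link = {cell: c / n for cell, c in counts.items()}

priors = {"no link": 0.2, "cold -> cough": 0.7, "cough -> cold": 0.1}  # background knowledge
scores = {
    "no link":       math.log(priors["no link"]) + log_lik(p_indep),
    "cold -> cough": math.log(priors["cold -> cough"]) + log_lik(p_link),
    "cough -> cold": math.log(priors["cough -> cold"]) + log_lik(p_link),
}
for h, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{h:14s} log posterior score = {s:.2f}")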


*The principle of suspicious coincidence, formulated by Horace Barlow (1956; 1990), states that the co-occurrence of two events, A and B, should be deemed "suspicious" (deserving scrutiny) if its probability, P(A&B), is significantly larger than the product of the marginals, P(A)P(B).
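
For concreteness, a tiny check of this criterion with made-up probabilities (the values and the informal threshold below are placeholders; Barlow's "significantly larger" would in practice call for a statistical test rather than a simple ratio):

# Suspicious-coincidence check: is the joint probability of A and B much larger
# than independence, P(A)P(B), would predict? Values are placeholders.
p_a, p_b, p_ab = 0.1, 0.2, 0.08

ratio = p_ab / (p_a * p_b)  # ratio > 1 means A and B co-occur more often than chance
print(f"P(A&B) = {p_ab}, P(A)P(B) = {p_a * p_b:.3f}, ratio = {ratio:.1f}")
print("suspicious coincidence" if ratio > 1 else "nothing to explain")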

