How people learn from sparse data (from Tenenbaum et al., 2011)
In concept learning, the
data might correspond to several example objects
(Fig. 1) and the hypotheses to possible extensions
of the concept. Why, given three examples
of different kinds of horses, would a child generalize
the word “horse” to all and only horses
(h1)? Why not h2, “all horses except Clydesdales”;
h3, “all animals”; or any other rule consistent with
the data? Likelihoods favor the more specific
patterns, h1 and h2; it would be a highly suspicious
coincidence* to draw three random examples
that all fall within the smaller sets h1 or h2
if they were actually drawn from the much larger
h3. The prior favors h1 and h3, because as
more coherent and distinctive categories, they
are more likely to be the referents of common
words in language. Only h1 scores highly
on both terms.
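
The arithmetic behind this argument is easy to make explicit. The Python sketch below scores each hypothesis by prior times likelihood, with the likelihood given by the size principle: if the three examples are drawn uniformly at random from a hypothesis's extension of size |h|, then P(data | h) = (1/|h|)^3. The extension sizes and prior weights are made-up illustrative values, not numbers from the paper.

# Bayesian scoring of the three hypotheses from the horse example.
# Extension sizes and priors are illustrative assumptions only.

hypotheses = {
    # name: (extension size |h|, prior P(h))
    "h1: all horses":                (20,   0.40),  # coherent, nameable category
    "h2: horses except Clydesdales": (19,   0.05),  # gerrymandered, low prior
    "h3: all animals":               (1000, 0.40),  # coherent but very large
}

n_examples = 3  # three example horses observed

def likelihood(size, n):
    """Size principle: each example is drawn uniformly from the extension."""
    return (1.0 / size) ** n

# Unnormalized posterior = prior * likelihood, then normalize.
scores = {name: prior * likelihood(size, n_examples)
          for name, (size, prior) in hypotheses.items()}
total = sum(scores.values())

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:32s} posterior = {score / total:.4f}")
# h1 wins: h2 matches it on likelihood but not on the prior,
# h3 matches it on the prior but not on the likelihood.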
Likewise, in causal learning, the
data could be co-occurrences between events; the
hypotheses, possible causal relations linking
the events. Likelihoods favor causal links that
make the co-occurrence more probable, whereas
priors favor links that fit with our background
knowledge of what kinds of events are likely to
cause which others; for example, a disease (e.g.,
cold) is more likely to cause a symptom (e.g.,
coughing) than the other way around.
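
The same prior-times-likelihood bookkeeping applies to the causal case. The sketch below uses invented event probabilities, co-occurrence counts, and priors, chosen only for illustration: the likelihood rewards hypotheses under which cold and cough co-occur as often as they do in the data, while the prior encodes the background knowledge that diseases cause symptoms rather than the reverse. (The two link directions can describe the same joint distribution, so in this toy example only the prior separates them.)

from math import log

# Observed co-occurrences of (cold, cough); counts are made up.
counts = {(True, True): 30, (True, False): 5,
          (False, True): 10, (False, False): 55}

def joint_independent(cold, cough):
    """No causal link: cold and cough occur independently."""
    p_cold, p_cough = 0.35, 0.40
    return (p_cold if cold else 1 - p_cold) * (p_cough if cough else 1 - p_cough)

def joint_linked(cold, cough):
    """A causal link between cold and cough: coughing is far more probable
    when a cold is present. The same joint distribution can be factored as
    cold -> cough or cough -> cold, so the two directions have equal
    likelihood here and only the prior distinguishes them."""
    p_cold = 0.35
    p_cough = 0.85 if cold else 0.15
    return (p_cold if cold else 1 - p_cold) * (p_cough if cough else 1 - p_cough)

hypotheses = {
    "no causal link": (joint_independent, 0.30),
    "cold -> cough":  (joint_linked,      0.60),  # fits background knowledge
    "cough -> cold":  (joint_linked,      0.10),  # symptoms rarely cause diseases
}

def log_score(joint, prior):
    """Log posterior up to a constant: log prior + log likelihood of the counts."""
    return log(prior) + sum(n * log(joint(*event)) for event, n in counts.items())

for name, (joint, prior) in hypotheses.items():
    print(f"{name:15s} log score = {log_score(joint, prior):8.2f}")
# The linked hypotheses beat "no link" on likelihood (the events co-occur
# too often to be independent), and "cold -> cough" beats "cough -> cold"
# on the prior.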
*The principle of suspicious coincidence, formulated by Horace
Barlow (1956; 1990), states that the co-occurrence of two events, A and B,
should be deemed "suspicious" (deserving scrutiny) if its probability,
P(A&B), is significantly larger than the product of the marginal probabilities, P(A)P(B).
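
A quick numerical illustration of Barlow's criterion (the probabilities below are made up):

# Barlow's suspicious-coincidence test: compare the joint probability of
# A and B with what independence would predict. Numbers are illustrative.
p_a, p_b, p_ab = 0.05, 0.08, 0.03

ratio = p_ab / (p_a * p_b)  # > 1 means A and B co-occur more than chance predicts
print(f"P(A&B) = {p_ab}, P(A)P(B) = {p_a * p_b:.4f}, ratio = {ratio:.1f}")
# A ratio well above 1 (here 7.5) flags the co-occurrence as "suspicious",
# i.e., worth explaining by some dependence between A and B.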