Induction and Identity

by Joseph Rowlands

One of the famous problems of induction is how to go from some number of specific occurrences to a generalization. The classic example is someone seeing white swans and concluding that all swans are white. When the first black swan is discovered, it proves the original generalization false. It also raises the question of whether it was ever justified to believe that all swans are white. How can you go from a finite number of examples to making a claim about all swans?

The most unsatisfying of answers relies on some form of enumeration. The generalization is said to be justified by the fact that we've seen many white swans. The more white swans we see, the more justified the generalization. This is purely a numbers game. Quantity is supposedly the justification.

There are other variants that rely on numbers alone. Instead of justifying it by seeing some number of swans, the generalization may be justified by seeing a percentage of the total number of swans. The higher the percentage, the more justified the generalization is said to be.

One reason numbers alone are unsatisfying is that they give no reason to believe that a swan can't be black. Seeing a million white swans says nothing about whether a black swan is impossible. The only connection between the color and the swans is a count of how often they go together.

Instead of correlation, generalizations should be based on causation. There should be some reason for making the connection. The fact that swans and the color white happen to correlate is not enough. There must be a reason given for why swans would be white.

We can look at the question of swan color now and offer genetics as the explanation for why swans are the color they are. Even before genetics, it would have been possible to speculate that animals are limited in their variability by their nature. The mechanics might not have been known, just as most people today don't understand the mechanics of genetics. We can still expect some limits without knowing exactly how they work.

The specific claim that swans must be white rests on rather shaky grounds, because the mechanics are unknown. If someone were to produce a black swan, it shouldn't cause much surprise. Nothing known seemed to exclude the color black, even though it had never been seen.

Pointing to the genetic nature of the swan as a potential cause of only seeing white swans is an improvement over pure enumeration. There is now a causal theory which is better than a mere description of the fact that swans are always white.

When the black swan is discovered, the causal theory for why they are white doesn't change. Genetics is still viewed as the causal explanation of the swan color, but now it is seen to have enough variability to include white or black. Does this make it a flawed theory? If contradictory evidence doesn't change the theory of the underlying cause, is it really explaining anything?

It still provides a causal theory, but the details and mechanics were never understood. It offered a reason why swans are all white, but it never showed why they must be.

Going back to the problem of induction, providing a causal explanation is better than relying on correlation, but it doesn't really explain the generalization itself. Before anyone asks why all swans are white, someone has to notice that they are. Not only do you need to notice that all the swans you've seen have been white, but you also have to decide to speculate that future swans will be white too.

It is still fair to ask when you can reasonably go from seeing white swans to speculating that all swans are white. If you've only seen one swan, it wouldn't be reasonable. If you've seen two or three, it may still not be reasonable. Is there a right number of swans before you can start to speculate? Or is there some other criterion?

A number just isn't going to do it. There's nothing magical about 50 or 100 or 1000. To have any significance, the number would have to be discovered through induction. One way is to notice other animals and how frequently you find examples of different colors among them.

Another possibility is that you might recognize that animals in the same area tend to have the same attributes. This would mean you'd need to see different groups of swans, scattered in different geographical regions to have any confidence that they were all white. How many groups you need to observe and how different the geographical regions need to be would also be based on observation of other animals.

Other information might imply that you need more data. If swans are found to have different physical attributes in other geographical regions, you might hold off on making assumptions about what is possible. Only when you've seen enough examples that the patterns merely repeat would you be justified in believing that you've reached the limits of variability.

All of this additional information can be used to eventually make the claim that all swans are white. Are you justified in making that claim at that point? This opens the question of the standards of justification for induction. Even then, the generalization would knowingly be based on observations and not on a detailed causal explanation. New information would not make you ask where the flaw in your reasoning was. There was no way to know that a swan could be black until you saw one. You might ask whether you generalized too soon, but you wouldn't question whether you had misinterpreted the data.

The reply to a black swan should be "Oh! I didn't know that was possible!", instead of "That can't be! Swans are supposed to be white!". The difference is illustrative. In the former, it is recognized that the conclusion is based on the best information available, but new information could always invalidate it. There's nothing known that requires swans to not be black. There just wasn't any data showing that it ever happened.

In the latter exclamation, the speaker has twisted the conclusion in his head. It was originally a generalization from the data. Consequently, new data should simply alter the generalization. But the speaker is treating the generalization as if it were independent of the data and that new data should be distrusted because it is incompatible.

This is a critical distinction, then. A generalized statement can be of at least two types. The first type is simply a generalization from the data, and new data easily modifies it. The second type is a generalization derived from some other facts or a causal theory, so new data would contradict the rationale for the generalization. That doesn't happen with the first type: the generalization is contradicted, but not the justification for it.
