3. Knowledge with a Rationale
The concept of knowledge with a rationale is to be found in many philosophers. It will be convenient to begin with the views of Leibniz on this subject. After the publication in 1700 of a French translation of Locke’s An Essay Concerning Human Understanding, Leibniz set himself the task of writing a rebuttal entitled: New Essays on the Human Understanding. However, this interesting work was only published in 1765, nearly fifty years after Leibniz’ death. Leibniz writes (c. 1700, p. 143):
“From this arises another question, whether all truths depend on experience, that is to say on induction and on instances, or whether there are some which have another basis also.”
Of course, Leibniz, being a rationalist, will argue that not all truths depend just on induction and on instances, but some depend on reason as well. Now, it is clear that his arguments are relevant to the knowledge produced by deep neural networks, since, as was remarked above, these work by a form of computer induction. Thus, neural networks embody an empiricist approach and Leibniz’s criticisms of that approach therefore apply to the valuation of the output of these networks. Leibniz argues against exclusive reliance on induction as follows (c. 1700, p. 144):
“Now all the instances which confirm a general truth, however numerous they may be, are not sufficient to establish the universal necessity of this same truth, for it does not follow that what happened before will happen in the same way again. For example, the Greeks and the Romans, and all the other peoples of the earth known to the ancients, always observed that before the passage of twenty-four hours day changes to night and night to day. But they would have been wrong if they had believed that the same rule holds good everywhere, for since that time the contrary has been experienced during a visit to Nova Zembla.”
Leibniz then goes on to argue that animals are purely empiricist, but that men are superior because they have reason. As he says (c. 1700, p. 145):
“It is in this also that the knowledge of men differs from that of the brutes: the latter are purely empirical, and guide themselves solely by particular instances; for, as far as we can judge, they never go so far as to form necessary propositions; whereas men are capable of demonstrative sciences. This also is why the faculty the brutes have of making sequences of ideas is something inferior to the reason which is in man. The sequences of the brutes are just like those of the simple empiricists who claim that what has happened sometimes will happen again in a case where what strikes them is similar, without being capable of determining whether the same reasons hold good. It is because of this that it is so easy for men to catch animals, and so easy for pure empiricists to make mistakes. And people whom age and experience has rendered skilful are not exempt from this when they rely too much on their past experience, as some have done in civil and military affairs; they do not pay sufficient attention to the fact that the world changes, and that men become more skilful by discovering countless new contrivances, whereas the stags and hares of to-day are no more cunning than those of yesterday.”
Leibniz develops these arguments in The Monadology of 1714, where he writes (pp. 7-8):
“26. Memory provides souls with a kind of consecutiveness, which copies reason but must be distinguished from it. What I mean is this: we often see that animals, when they have a perception of something which strikes them, and of which they had a similar perception previously, are led, by the representation of their memory, to expect what was united with this perception before, and are carried away by feelings similar to those they had before. For example, when dogs are shown a stick, they remember the pain which it has caused them in the past, and howl or run away.
…
28. Men act like brutes in so far as the sequences of their perceptions arise through the principle of memory only, like those empirical physicians who have mere practice without theory. We are all merely empiricists as regards three-fourths of our actions. For example, when we expect it to be day tomorrow, we are behaving as empiricists, because until now it has always happened thus. The astronomer alone knows this by reason.”
Leibniz here clearly demarcates knowledge which is based purely on induction from observations from the kind of knowledge which the astronomer possesses and which is based on reason. It is this second kind of knowledge which we are describing as ‘knowledge with a rationale’. However, we will not pursue Leibniz’ analysis of this kind of knowledge in detail, but rather skip some centuries and consider a passage of Popper’s which uses a very similar example to that of Leibniz. Popper says (1972, p. 20):
“Thus I assert that with the corroboration of Newton’s theory, and the description of the earth as a rotating planet, the degree of corroboration of the statement s ‘The sun rises in Rome once in every twenty-four hours’ has greatly increased. For, on its own, s is not very well testable; but Newton’s theory, and the theory of the rotation of the earth are well testable. And if these are true, s will be true also.”
I prefer to use the term ‘confirmation’, meaning empirical confirmation by observations and experiments, rather than Popper’s term ‘corroboration’. However, with this alteration, Popper’s general point can be put as follows. Let us start with Leibniz’ original example of a possible law (the law of the sun’s setting and rising), namely: ‘Before the passage of twenty-four hours day changes to night and night to day.’ This law was obtained by induction from countless observations in the ancient world. These observations confirmed the law, and we shall call this type of confirmation: direct confirmation. Now the law of the sun’s setting and rising can with some significant qualifications be deduced from Newton’s theory, which I take to include the theory of rotation of the earth. The significant qualifications, which are alluded to by Leibniz, are that the law does not apply to some regions of the earth near the poles at some times of the year. If we modify the law to take account of these qualifications, then it obtains, in addition to its direct confirmation, some indirect confirmation. Newton’s theory is confirmed by observations on the planets, on the tides, on the motion of pendula, on the motion of projectiles etc. Since these observations confirm Newton’s theory and since the modified law of the sun’s setting and rising is derivable from Newton’s theory, it follows that this modified law is indirectly confirmed by these observations on the tides, on pendula, etc. which, prior to the introduction of Newton’s theory might well have seemed completely irrelevant to it.
This example is typical of what we are calling ‘knowledge with a rationale’ in the natural sciences. Knowledge obtained purely by induction from observations is not in itself knowledge with a rationale, but it can be turned into knowledge with a rationale by explaining it by some theory. Usually, the explanatory theory corrects the original empirical law in some ways, and it also provides some indirect evidence for the modified empirical law. This notion of ‘rationale’ is rather different from that of Leibniz, but they are not unconnected. Leibniz, as we have seen, speaks of ‘demonstrative sciences’ which men but not animals are capable of producing. We can also agree with Leibniz that knowledge with a rationale is superior to knowledge based solely on induction from observations. Knowledge with a rationale is based on theoretical explanations which both indicate the limitations of empirical generalisations and also provide indirect as well as direct confirmation for modified versions of these empirical generalisations. Despite these seemingly convincing arguments, knowledge with a rationale has been challenged recently by Chris Anderson in his famous and provocative 2008 article: ‘The End of Theory: The Data Deluge Makes the Scientific Method Obsolete’.
Anderson writes:
“The new availability of huge amounts of data along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.”
However, if explanatory theories are no longer produced, then knowledge with a rationale in science will also disappear. But do huge amounts of data really make knowledge with a rationale unnecessary? Leibniz’ example of the ‘law’ that before the passage of twenty-four hours day changes to night and night to day shows that this is not the case. A massive amount of data could in principle have been collected from all the civilised peoples on earth for thousands of years, and all of it would have confirmed this law, since those regions of the earth where it breaks down (such as Nova Zembla) did not have any civilised inhabitants, or, for the most part, any inhabitants at all. Yet the law was not correct, and we can see that it was worth developing astronomical theories to explain this law, because they showed its limitations.
Similar considerations apply to Anderson’s more specific claim that “correlation supersedes causation”. Anderson also says, quite correctly:
“Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y … . Instead, you must understand the underlying mechanisms connecting the two.”
This is indeed part of scientists’ training and, contrary to what Anderson claims, a valuable part, as we can show by a simple example. Heavy drinking is strongly correlated with lung cancer, but most researchers would not regard this correlation as causal in character. It almost certainly arises because heavy drinking is strongly correlated with heavy smoking and there is a causal connection between heavy smoking and lung cancer. Recognising which correlations are causal in character and which are not is important in practice. If someone gives up heavy smoking but continues to drink heavily, this will indeed reduce the risk of lung cancer, but if someone gives up heavy drinking but continues to smoke heavily, this will not reduce the risk of lung cancer to any significant extent.
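The structure of this example can be illustrated by a small simulation. The following is only a sketch under stated assumptions: the function names are hypothetical, and the probabilities are invented for illustration and are not taken from any epidemiological study. We assume that heavy drinkers are more likely to be heavy smokers, that smoking raises the risk of lung cancer, and that drinking has no effect on that risk at all.

```python
import random


def simulate_population(n=100_000, seed=0):
    """Return a list of (drinks, smokes, cancer) triples.

    Assumed causal structure (illustrative numbers only):
    drinking -> more likely to smoke; smoking -> cancer;
    drinking has NO direct effect on cancer.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        drinks = rng.random() < 0.3
        smokes = rng.random() < (0.6 if drinks else 0.1)
        cancer = rng.random() < (0.15 if smokes else 0.01)
        people.append((drinks, smokes, cancer))
    return people


def cancer_rate(people, drinks=None, smokes=None):
    """Cancer rate among people matching the given filters (None = any)."""
    selected = [
        c for d, s, c in people
        if (drinks is None or d == drinks) and (smokes is None or s == smokes)
    ]
    return sum(selected) / len(selected)


people = simulate_population()

# Ignoring smoking, drinkers show a clearly higher cancer rate.
print("drinkers:", cancer_rate(people, drinks=True))
print("non-drinkers:", cancer_rate(people, drinks=False))

# Holding smoking fixed, drinking makes almost no difference.
for s in (True, False):
    print("smokes =", s,
          cancer_rate(people, drinks=True, smokes=s),
          cancer_rate(people, drinks=False, smokes=s))
```

In the simulated population, drinking is correlated with cancer when smoking is ignored, but within the smokers, and within the non-smokers, the drinkers and non-drinkers have nearly the same cancer rate. The correlation is produced entirely by the assumed link between drinking and smoking.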
It is a subtle matter to decide whether a particular correlation is causal or not. Theoretical considerations and indirect evidence need to be brought in. So, genuine causal knowledge is always knowledge with a rationale. Anderson’s claim is that, with huge amounts of data, causal knowledge becomes unnecessary, and that correlation is all that is required. However, it is easy to give counterexamples to this thesis. The standard example of a correlation which is not causal is the following. The reading of a barometer falling in value is strongly correlated with rain occurring. However, the connection is not causal. If someone reduces the value of the reading by fiddling with the barometer, this will not produce rain. Now this example can easily be generalised to fit our era of big data. Suppose all smart phones are equipped with sensors which can detect rain, and with an app, which, in conjunction with the sensors, can act as a barometer. Suppose that the data from the app and the rain-detecting sensors is harvested from millions and millions of smart phones by some large corporation. A correlation between the reading of the barometer app falling and rain occurring is established on the basis of huge amounts of data, but this does not make the correlation causal. If a hacker breaks into the system and makes the value of the barometer app fall, this will not produce rain. So, very large amounts of data do not make knowledge with a rationale redundant in the sciences.
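The difference between observing the barometer app and interfering with it can be sketched in a few lines of code. This is a toy model under our own assumptions, not a description of any real system: the function names are hypothetical, atmospheric pressure is taken to be the common cause of both the falling reading and the rain, and all the probabilities are invented for illustration.

```python
import random


def simulate_days(n=50_000, seed=1, hacked=False):
    """Return a list of (reading_falls, rain) pairs.

    Assumed structure (illustrative numbers only): low pressure causes
    both a falling reading and rain. If hacked=True, the reading is
    forced to fall on every day, regardless of the pressure.
    """
    rng = random.Random(seed)
    days = []
    for _ in range(n):
        # The three random draws are made in a fixed order, so the
        # weather is identical in the hacked and unhacked runs.
        low_pressure = rng.random() < 0.3
        faithful = rng.random() < 0.95  # the app tracks pressure 95% of the time
        rain = rng.random() < (0.8 if low_pressure else 0.05)
        if hacked:
            reading_falls = True
        else:
            reading_falls = low_pressure if faithful else not low_pressure
        days.append((reading_falls, rain))
    return days


def rain_rate(days, reading_falls=None):
    """Fraction of rainy days among days matching the filter (None = any)."""
    selected = [r for f, r in days if reading_falls is None or f == reading_falls]
    return sum(selected) / len(selected)


observed = simulate_days()
hacked = simulate_days(hacked=True)

print("rain when reading falls (observed):", rain_rate(observed, True))
print("rain when reading steady (observed):", rain_rate(observed, False))
print("rain overall (observed):", rain_rate(observed))
print("rain when reading forced to fall (hacked):", rain_rate(hacked, True))
```

In the observed data, a falling reading is a good predictor of rain. When the reading is forced to fall, however, the frequency of rain is simply the overall frequency of rain, exactly as it was before the interference. However many days are simulated, the correlation in the observed data does not show what will happen under intervention.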
The situation regarding the relation between correlation and causation is very similar to that, analysed earlier, between the law of the sun’s setting and rising and Newton’s theory. This can be illustrated by the following example.
To investigate the relationship between diet and coronary heart disease, Ancel Keys carried out an extensive study in seven countries. The study was begun between 1958 and 1964, and in 1970 Keys published the five-year results. One finding concerned the relationship between x (the percentage of the diet calories which came from saturated fat) and y (the percentage of the cohort eating that diet who had died of coronary heart disease). The result showed a close approximation to a linear regression model with a correlation of 0.84. Keys thought that this provided strong evidence of the claim that a diet high in saturated fat causes coronary heart disease. However, he was well aware that correlation is not causation, and sought additional evidence to establish the causal claim.
One of the ways in which a correlation can fail to be a causation is because of the existence of a ‘confounder’. This is illustrated by our earlier example of the correlation between heavy drinking and lung cancer where the confounder was heavy smoking. Could there be a confounder in Keys’ diet study? The Japanese cohort had a diet which contained much less saturated fat than the American cohort, and the death rate from coronary heart disease was also much lower among the Japanese than among the Americans. Here, however, there could have been a genetic confounder. The Japanese might have a set of genes which protected them against coronary heart disease and which the Americans lacked. However, there was an easy way to test the hypothesis of a genetic confounder. Keys had only to compare Japanese living in Japan and eating a traditional Japanese diet with Japanese who had emigrated to the U.S.A. and started eating a typical American diet. It turned out that the latter category of Japanese had rates of coronary heart disease more or less the same as those of other Americans, and this ruled out a genetic confounder.
Another way of establishing causality is to try to understand the mechanisms (if any) connecting the correlated variables. In the present case this was done very successfully through experiments on rabbits fed an unusual diet and also from observations obtained from autopsies of humans. These led to the elucidation of the following mechanism. A diet high in saturated fat produces a raised level of LDL cholesterol in the blood, which leads to infiltration of some parts of the arterial wall by lipids, which are then absorbed by macrophages, and become foam cells. This brings about the full development of atherosclerotic plaques. This mechanism explains the occurrence of coronary heart disease which is produced by atherosclerotic plaques forming in the coronary arteries. The mechanism also leads to a generalisation of the proposed causal law, since a diet high in saturated fat can cause any form of atherosclerosis, and not just coronary heart disease. Strokes, for example, can be another consequence of atherosclerosis.
A third way of checking causal claims is the use of randomized controlled trials, since randomization provides a technique for dealing with unknown confounders. A double blind randomized controlled trial was carried out in the USA in the period 1959-1968, and sure enough it showed that the group which ate the diet low in saturated fat had a statistically significant decrease in the number of atherosclerotic events.
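The reason why randomization deals with unknown confounders can be illustrated by a simple simulation. The sketch below is not based on the data of the trial just mentioned. The function name, the hidden risk factor, and all the probabilities are hypothetical assumptions of ours, chosen so that the low-fat diet truly lowers the probability of an atherosclerotic event by 0.04.

```python
import random


def event_rate_gap(n=100_000, seed=2, randomized=True):
    """Return (event rate on usual diet) - (event rate on low-fat diet).

    Assumed structure (illustrative numbers only): a hidden risk factor
    raises the event probability by 0.10, and the low-fat diet lowers it
    by 0.04. Without randomization, people with the hidden risk factor
    are assumed to be less likely to choose the low-fat diet.
    """
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # low_fat -> [events, people]
    for _ in range(n):
        hidden_risk = rng.random() < 0.5  # the unknown confounder
        if randomized:
            low_fat = rng.random() < 0.5  # assignment by coin toss
        else:
            low_fat = rng.random() < (0.2 if hidden_risk else 0.8)
        p_event = 0.10 + (0.10 if hidden_risk else 0.0) - (0.04 if low_fat else 0.0)
        event = rng.random() < p_event
        counts[low_fat][0] += event
        counts[low_fat][1] += 1

    def rate(arm):
        events, total = counts[arm]
        return events / total

    return rate(False) - rate(True)


print("gap with self-selected diets:", event_rate_gap(randomized=False))
print("gap with randomized diets:", event_rate_gap(randomized=True))
```

When the simulated subjects choose their own diets, the hidden risk factor is unevenly distributed between the two groups, and the observed gap greatly overstates the assumed effect of the diet. When the diets are assigned at random, the hidden risk factor is balanced between the groups, even though the investigator never observes it, and the gap is close to the assumed value of 0.04.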
This example is very similar to that of the law relating to the sun’s setting and rising and Newton’s theory considered earlier. The starting point in the saturated fat case is an observed correlation. This is explained by postulating a causal link between the correlated variables, and an investigation proceeds as to whether this causal link is correct. In the course of this investigation, the original causal law is changed (generalised) and it is then confirmed by the results of a variety of experiments and observations, quite different from those which confirmed the original correlation. This new evidence included the testing of possible confounders, the establishment of physiological mechanisms, and the result of randomized controlled trials. At the end of the investigation, a rationale has been provided for a causal law which explains the observed correlation.
Knowledge with a rationale is also important in the law. It would obviously be unjust to send someone to prison, unless we can validly claim to know that he or she is guilty. This knowledge must be capable of being justified by producing an explicit rationale for the guilty verdict, and so should be knowledge with a rationale.
That concludes our account of the distinction between knowledge how and knowledge with a rationale. We will now briefly compare it to Sosa’s quite similar distinction.
Sosa writes (1985, pp. 241-2):
“From this standpoint we may distinguish between two general varieties of knowledge as follows:
One has animal knowledge about one’s environment, one’s past, and one’s own experience if one’s judgements and beliefs about these are direct responses to their impact – e.g. through perception or memory – with little or no benefit of reflection or understanding.
One has reflective knowledge if one’s judgment or belief manifests not only such direct response to the fact known but also understanding of its place in a wider whole that includes one’s belief and knowledge of it and how these come about.”
The passages from Leibniz quoted above cast some light on Sosa’s expression ‘animal knowledge’ since Leibniz thought that animals could only gain knowledge by induction from instances. However, the term does not seem to us quite appropriate for our investigation, since we want to consider learning such things as language, or how to play the piano or to play chess. These are all things which, apart possibly from the language of bees, are characteristic of humans rather than animals. This is why we prefer to use Ryle’s concept of knowledge how which applies to such examples. Sosa’s concept of reflective knowledge, however, seems quite close to our concept of knowledge with a rationale.
Having explained the two kinds of knowledge we want to consider, we must see how they apply to deep learning. Before doing so, however, we must draw attention in the next section to a relevant feature of deep neural networks (and indeed of nearly all neural networks), namely that the results they produce are opaque, in the sense of being unintelligible to human beings.