The Bible, History, and Bayes' Theorem (part 2)
(continued from part 1)
There are several issues you've brought up, and which I've also thought of, which remain to be addressed, so I thought I'd recap what I see as the major remaining issues here:
- How to apply Bayes' Theorem to specific problems in the Jesus Historicity debate (e.g. Paul's reference to James as 'brother of the Lord'.)
- How can we choose between different hypotheses? How can our personal assumptions (such as that it is likely that Jesus existed) bias our evaluation of alternative hypotheses? How can we test two hypotheses without succumbing to the problem of making up ad hoc 'predictions' that simply mould the hypothesis to the new evidence?
- The 'numerical' problem of Bayesian analysis. Is it appropriate to assign precise numbers where they may be unwarranted? Why not just rank probabilities, or even rely on simple subjective judgments of confidence?
- Grounding evidence and estimates of probability ratios. How do we know our probability estimates are actually good?
- Dealing with unknown probabilities. The outcomes of our analysis may crucially depend on the probabilities we input as assumptions. But if we only have a vague idea of those probabilities in the first place, how can Bayesian theory help us narrow down the likely ranges of those vague first-estimates?
- Assessing reliability of witnesses. How should we treat statements of fact from possibly (probably) unreliable witnesses? How do we incorporate ideas about what people 'believe' vs. what they actually 'know'?
- Importance of prior information. How sensitive is Bayesian analysis to our initial estimates of probabilities, and what strategies and methods can help eliminate strong personal biases?
- How the accumulation of evidence can increase the confidence in the overall-best hypothesis.
This is only a rough list, and I've probably missed stuff, etc. But it's a decent starting point.
I will say right now that I'm not the one who's going to be able to answer such specific questions as your question about James. I don't have enough knowledge of history and existing historical methods to give a competent model for that specific of a situation.
However, I can definitely come up with analogous problems and show how, given the assumptions of a problem, Bayes' theorem can be applied very flexibly, like an all-purpose tool, to work on whatever problem you can specify. I just don't have the right background knowledge to properly specify the problem of James' relationship to a historical Jesus. But if some historian can specify that problem well-enough, then Bayes' theorem will apply to it.
Given my personal limitations on the subject of history, I've decided to focus more on the general issue of how Bayes' theorem can be applied to any subject -- history included -- and, even more generally, on how learning and understanding how Bayes' theorem works in practice can help just about anyone improve their skill in rational, plausible, evidence-based reasoning.
Therefore, I will focus on making Bayesian tools more understandable to the average reader, and also on how we could apply Bayesian reasoning to more general problems that may have some analogous relationship to the specific question of Jesus' historicity.
The first, most crucial topic to explore now comes clearly to the fore:
The Importance of Prior Information
In the previous example of the birthday present from Alice, we assumed that the likelihoods of getting a watch or getting a keychain were of equal weight. Thus, they had a 1-to-1 relationship, which translates to a 50% chance of a watch and a 50% chance of a keychain.
But what if we knew, again from 100% reliable information from trusty Bob, that Alice happened to work for a watch factory? Furthermore, Bob tells us, Alice has a long-run history of getting people watch-related presents for their birthdays 90% of the time!
Clearly, if we were to ignore this prior information about the situation, we might end up with quite bad probability estimates. It surely no longer seems reasonable to start off with an assumption of equal likelihood: 50% watch, 50% keychain.
As a reminder, when we assumed the probability of a keychain, P(K), was initially 50%, we were able to correctly calculate the probability of a keychain after hearing the box rattle, P(K|R), which turned out to be 75%, because keychains tend to rattle more often than watches (60% vs. 20%).
But now, we should instead apply our prior information from trusty Bob that the initial estimate of P(K) should be more like 10%, since it is 90% likely that Alice would get us a watch-related present, as she usually does. So, we adjust our initial prior probabilities:
P(W) = 90% = 0.9
P(K) = 10% = 0.1
Now, let's look at how this change in prior probabilities affects the posterior probabilities of K or W, after hearing a rattle sound (R). We just apply Bayes' theorem as usual. After practicing this a few times, it will begin to become natural and obvious to us.
P(K|R) = P(K) x P(R|K) / [ P(K) x P(R|K) + P(W) x P(R|W) ]
= 0.1 x 0.6 / [ 0.1 x 0.6 + 0.9 x 0.2 ]
= 0.06 / ( 0.06 + 0.18 )
= 0.06 / 0.24
= 6/24 = 1/4 = 0.25 = 25%
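For readers who like to check the arithmetic, the update above can be sketched in a few lines of Python. The function name `posterior_keychain` is just my own illustration, not anything standard:

```python
def posterior_keychain(p_k, p_rattle_given_k, p_rattle_given_w):
    """Bayes' theorem for two exhaustive hypotheses: keychain (K) or watch (W).

    Returns P(K|R), the probability of a keychain given a rattle.
    """
    p_w = 1.0 - p_k  # keychain and watch are assumed to be the only options
    numerator = p_k * p_rattle_given_k
    return numerator / (numerator + p_w * p_rattle_given_w)

# Prior P(K) = 0.1 (Alice's watch habit), P(R|K) = 0.6, P(R|W) = 0.2
print(posterior_keychain(0.1, 0.6, 0.2))  # ~0.25, matching the hand calculation
```

With the original 50/50 prior, the same function returns the 75% from part 1, which is a handy sanity check.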
So, Bayes' theorem tells us that, given our assumptions, the original prior P(K) of 10% should be updated to a new P(K|R) of 25%, which incorporates the new clue of the rattle sound, which is more likely in the case of a keychain than a watch.
Notice that P(K|R) is larger than 10%, but it is not very close to 75%, which was the result we got from the first example.
So, even after rattling the box, we should still expect that there's only a 25% chance of a keychain, and it's still 75% likely that the present is a watch.
Clearly, the prior probability that we assume for each kind of present has a large, dramatic effect on the posterior probabilities. The clue of the rattling sound still gives us some information, updating 10% to 25%, but it is not enough to overwhelm the very low initial prior probability and boost it above 50%, let alone anywhere near 75%. The rattle is a good clue, but it's not that good. There's still too much of a chance that Alice got you a watch and it just happened to be one of the 20% of watches that rattle when shaken.
The Strength of Evidence
What kind of clue could overcome such a low prior probability? The strength of evidence is linked to the conditional probabilities, or likelihoods, that the hypotheses assign to various outcomes. The rattling clue (R) had a likelihood of happening 60% of the time when there's a keychain, and 20% of the time when there's a watch. Presumably, if you had put 'Pebbles' on your wish list, pebbles would have something like a 95% chance of rattling when shaken, and so a rattle would have favoured pebbles over a watch even more strongly.
Now, imagine you had a third friend, Cindy, who was pretty trustworthy, but not quite as trusty as Bob. She tells the truth 90% of the time, and only 10% of the time is she wrong (or lying).
Bob has let you know ahead of time that Cindy definitely does know what's in the box.
While Bob distracts Alice by pointing out the importance of prior probabilities in Bayesian calculations, you quickly whisper to Cindy, "What's in the box?"
Cindy whispers back, "It's a keychain!" When Alice turns back to you, you've already straightened your face, but perhaps are smirking a little bit.
The question now is:
a) If Cindy whispered her "keychain" (let's call this 'Ck') before you shook the box, what is the actual probability of a keychain, given that Cindy has said it's a keychain, P(K|Ck)?
b) If Cindy had whispered "keychain" after you shook the box, what is the actual probability of a keychain, given both R and Ck, P(K|R and Ck)?
c) If, after hearing Cindy's answer (from part a), you then shake the box and it rattles, is P(K|Ck and R) the same as P(K|R and Ck)?
Extraordinary Claims Require Extraordinary Evidence
Let's explore this by working through the three-part question above.
Part a) is a quite straightforward Bayesian calculation. Since Cindy tells the truth 90% of the time, the probability that she would say "keychain" when it's actually a keychain is 90%. Likewise, in the case when it's a watch, she would still say "keychain" 10% of the time, presumably because she's lying. So, P(Ck|K) is 90% and P(Ck|W) is 10%. Plug these in and we get our answer:
P(K|Ck) = P(K) x P(Ck|K) / [ P(K) x P(Ck|K) + P(W) x P(Ck|W) ]
= 0.1 x 0.9 / [ 0.1 x 0.9 + 0.9 x 0.1 ]
= 0.09 / ( 0.09 + 0.09 )
= 0.09 / 0.18
= 9/18 = 1/2 = 0.5 = 50%
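The same arithmetic can be checked with a short, generic update function. `bayes_update` is just my own illustrative name; the numbers are the assumptions stated above:

```python
def bayes_update(prior_k, p_e_given_k, p_e_given_w):
    """Posterior P(K|E) for evidence E, with watch (W) as the only alternative."""
    prior_w = 1.0 - prior_k
    num = prior_k * p_e_given_k
    return num / (num + prior_w * p_e_given_w)

# Prior P(K) = 0.1; Cindy says "keychain" truthfully 90% of the time,
# so P(Ck|K) = 0.9 and P(Ck|W) = 0.1.
print(bayes_update(0.1, 0.9, 0.1))  # ~0.5: Cindy's word cancels the strong prior
```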
Thus, before shaking the box, Cindy's answer of "keychain" improves our estimated probability of a keychain from a mere 10% to 50%. It's almost as if Cindy's 90% chance of telling the truth 'cancels out' the prior 90% chance that Alice had bought us a watch, rather than a keychain. In fact, the math is exactly like that. You need strong evidence in favour of something to overcome strong prior implausibility of some claim. This is reminiscent of Sagan's motto that "extraordinary claims require extraordinary evidence". Cindy's claim that the present is a keychain is an extraordinary claim, but her high degree of trustworthiness makes her report count as extraordinary evidence in favour of the keychain hypothesis. Not enough to tip the scales, but enough to bring it back as a valid contender against the watch hypothesis.
Cumulative Evidence Accumulates Cumulatively
Answering part b) requires us to take the prior information/evidence of the rattling box (R) into account. Essentially, the posterior probability from one piece of evidence (R) becomes the prior probability for the next piece of evidence (Ck).
Again, this is actually a rather straightforward application of Bayes' theorem. You basically just apply it twice: first for the evidence of R, and then for the evidence of Ck. Just take P(K|R) as your prior probability for Ck (rather than the usual P(K)).
P(K|R and Ck) = P(K|R) x P(Ck|K and R) / [ P(K|R) x P(Ck|K and R) + P(W|R) x P(Ck|W and R) ]
This equation is asking us for P(Ck|K and R) and P(Ck|W and R), which are the probabilities of Cindy saying "keychain" given that it is a keychain (or watch) and the box rattled. But Cindy's answer doesn't depend on whether or not the box rattled. Regardless of a rattle, she will answer truthfully 90% of the time, according to our assumptions. So, P(Ck|K and R) is actually exactly the same as P(Ck|K), which is 90%. Likewise, P(Ck|W and R) is P(Ck|W), which is 10%. So, we'll simplify the equation and just plug in the numbers:
P(K|R and Ck) = P(K|R) x P(Ck|K) / [ P(K|R) x P(Ck|K) + P(W|R) x P(Ck|W) ]
= 0.25 x 0.9 / [ 0.25 x 0.9 + 0.75 x 0.1 ]
= 1/4 x 9/10 / [ 1/4 x 9/10 + 3/4 x 1/10 ]
= 9/40 / [ 9/40 + 3/40 ]
= 9 / ( 9 + 3)
= 9 / 12 = 3/4 = 0.75 = 75%
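The chaining described above -- each posterior feeding in as the next prior -- is easy to sketch in code. The helper `bayes_update` is my own illustration, repeated here so the snippet stands on its own:

```python
def bayes_update(prior_k, p_e_given_k, p_e_given_w):
    """Posterior P(K|E) for evidence E, with watch (W) as the only alternative."""
    prior_w = 1.0 - prior_k
    num = prior_k * p_e_given_k
    return num / (num + prior_w * p_e_given_w)

p = 0.1                        # prior P(K) from Alice's watch habit
p = bayes_update(p, 0.6, 0.2)  # after the rattle: ~0.25
p = bayes_update(p, 0.9, 0.1)  # after Cindy's "keychain": ~0.75
print(p)
```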
So, Cindy's "keychain" statement raises our estimate from 25% (itself improved from 10% after hearing the rattling) all the way up to 75%, which is respectable. Even though we initially would expect that Alice's predilection for watch-related gifts made a keychain unlikely at 10%, the combined evidence of hearing a rattle sound, and getting Cindy to confess that it's a "keychain" has boosted our confidence that it really is a keychain, and not a watch.
It could still be the case that the rattle came from a watch, and Cindy was just lying to us, but both of these combined are so unlikely as to outweigh the initial unlikeliness of getting a non-watch-related gift in the first place.
Just to confirm that Bayesian calculations don't introduce weirdness or inconsistencies, let's check what would have happened if we heard Cindy's "keychain" first, and then rattled the box second. Would the answer have been different?
Remember that Cindy's "keychain" resulted in P(K|Ck) = 50%, and the proper way to combine evidence is to use the posterior probabilities from one piece as the prior probabilities for the next. So, the calculation uses a prior probability of 50%, which is exactly the same as the very first example in the previous post. And since the rattling evidence is independent of anything Cindy might say, we don't have to consider any dependencies between the pieces of evidence (this is not always the case in the real world, but that can be handled; it's just more complex than I want for this example).
P(K|Ck and R) = P(K|Ck) x P(R|K and Ck) / [ P(K|Ck) x P(R|K and Ck) + P(W|Ck) x P(R|W and Ck) ]
= P(K|Ck) x P(R|K) / [ P(K|Ck) x P(R|K) + P(W|Ck) x P(R|W) ]
= 0.5 x 0.6 / [ 0.5 x 0.6 + 0.5 x 0.2 ]
= 0.3 / ( 0.3 + 0.1 )
= 0.3 / 0.4
= 3/4 = 0.75 = 75%
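We can confirm this order-independence numerically with the same kind of sketch (again assuming the two pieces of evidence are independent; `bayes_update` is my own illustrative helper):

```python
def bayes_update(prior_k, p_e_given_k, p_e_given_w):
    """Posterior P(K|E) for evidence E, with watch (W) as the only alternative."""
    prior_w = 1.0 - prior_k
    num = prior_k * p_e_given_k
    return num / (num + prior_w * p_e_given_w)

# Start from P(K) = 0.1 and apply the two updates in each order.
rattle_first = bayes_update(bayes_update(0.1, 0.6, 0.2), 0.9, 0.1)
cindy_first  = bayes_update(bayes_update(0.1, 0.9, 0.1), 0.6, 0.2)
print(rattle_first, cindy_first)  # both ~0.75, regardless of order
```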
Exactly the same as P(K|R and Ck), as we should expect.
If you keep things straight in the calculations, each independent piece of evidence modifies the final posterior probability independently of the other pieces, giving the same final answer regardless of the order you examine/process the evidence. The most important thing in Bayesian probability is to include all the relevant evidence and background information in the overall model of the situation. The more thorough your evidence, the more confident you can be in the answers that come out.
I'm working on developing more examples which may be more closely relevant to Jesus' historicity, but I think I'll just post this part first, since it's very important to understand how prior information works in Bayesian probability calculations.
If there's some issue you'd like a more-direct answer to, which I haven't addressed yet, please remind me, and I'll try to give a brief answer in the meantime.