Pragmatic Rationality Within The Prisoner’s Dilemma

Chaoslord2004's picture

I apologize for the tables not coming through and for the Greek symbols not coming through. Hopefully this essay still makes sense.

********************************

In 1950, Merrill Flood and Melvin Dresher, both employees of the RAND Corporation, formulated an interesting problem for Game Theory: the Prisoner’s Dilemma (PD). While this dilemma is abstract in its very nature, it has far-reaching real-world implications. Flood is quoted in Poundstone (1993) as saying: “…I never foresaw the tremendous impact that this idea would have on science and society, although Dresher and I did think that our result was of considerable importance…” (p. 117). Since this paradox has implications for a wide array of fields, it is imperative that it be resolved. As we shall see, the problem ultimately comes down to the theory of utility one subscribes to. The Prisoner’s Dilemma boils down to two primary solutions: the egoist solution and the altruist solution. It will be shown that the egoist solution is self-defeating, while the altruist solution offers us the best possible payout.
A Common Prisoner’s Dilemma Story
Before it can be shown that it is more rational to cooperate, we need to set up the situation. Since the Prisoner’s Dilemma usually takes a story form, we will start with a story:
Suppose that you steal the Hope Diamond, and are now looking for a buyer. You stumble across a man of questionable character named Mr. Big. Furthermore, you know that he is a shrewd businessman. He offers you more than the value of the diamond (suppose a million dollars). While you agree to the deal, you have yet to specify a mode of transaction.
Mr. Big suggests that you meet in a cornfield to do the transaction. However, you know of previous cases where Mr. Big met people in a cornfield, mugged them, and then stole what he was supposed to buy. Thus, you turn down Mr. Big’s idea, but not before offering your own idea of how the transaction should take place. You both agree to hide your goods in different locations; after the goods are hidden, you would phone Mr. Big with the location of the diamond, and he would tell you the location of the $1,000,000. However, it occurs to you that you could give him directions to the wrong location. If you did this, you would be able to keep the diamond and also get the one million dollars. Shortly after you realize this, you realize that the same reasoning will occur to Mr. Big. What do you do? (Poundstone, 1993, pp. 103-105).
There appear to be four possible outcomes: (1) both of you are honest, and the transaction goes as planned, (2) you lie while Mr. Big tells the truth, (3) you tell the truth while Mr. Big lies, or (4) you both lie. The question remains, what do you do? Before we can answer this question, we need the payout matrix, which is as follows:
                  Mr. Big is honest                   Mr. Big lies
You are honest    One Million, Hope Diamond           $0, One Million and Hope Diamond
You lie           One Million and Hope Diamond, $0    Hope Diamond, One Million

The way the (PD) matrix is set up is uncontroversial. Moreover, all classical (PD)s have the same structure as the above example. No one disagrees with the basic game-theoretic and decision-theoretic outcomes. However, what is deemed most rational will be determined primarily by which utility function one subscribes to. If one chooses the principle of Maximize Self Utility (MSU), then one ought to lie; if one adopts the Maximize Global Utility principle (MGU), however, then it is most rational to be honest.
The Importance and Implications of The Prisoner’s Dilemma
Like many of the paradoxes philosophers and mathematicians analyze, the Prisoner’s Dilemma has far-reaching real-world implications. PDs arise in biology, economics and even our everyday lives. Poundstone (1993) writes,
The Prisoner’s Dilemma is apt to turn up anywhere a conflict of interests exists … Study of the prisoner’s dilemma has the great power of explaining why animal and human societies are organized the way they are (p. 9).
For example, take the arms race in which the United States and the Soviet Union engaged. The Cold War is a classic example of a Prisoner’s Dilemma. Suppose both countries agreed to disarm. Each country would then have a choice to make: betray the agreement and build up, or stick with the treaty and disarm (Clark, 2002, pp. 151-152). While it is unlikely that the abstract form of the Prisoner’s Dilemma occurs in real life, any solution to the abstract version will give us great insight into solutions for real-world cases.
Preliminary Remarks
Before one analyzes the two main arguments used to resolve the (PD), one needs to understand exactly what the Prisoner’s Dilemma says. One important aspect of the Prisoner’s Dilemma is that it asks one to make a decision under both ignorance and risk. In the abstract version, we are given no information about our opponent. In addition, we cannot even be certain that we actually have an opponent; there are no safeguards to assure us that deception has not happened.
In all classical versions of the (PD), we are given only the following information: If you cooperate and your opponent cooperates, you get payout f. If you cooperate and your opponent defects, you get payout d. If you defect and your opponent cooperates, you get payout l. If you defect and your opponent defects, you get payout c. You also know that the following relation holds between the values: l > f > c > d. This ranking holds for a given person (a, for our purposes), where l is the best possible payout and d the worst possible payout.
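The ranking l > f > c > d can be checked mechanically. The following is only a minimal sketch: the function name is_prisoners_dilemma and the numeric utilities are assumptions made for illustration (higher numbers simply stand for better outcomes), not part of the classical formulation.

```python
def is_prisoners_dilemma(l, f, c, d):
    """Return True when the four payouts satisfy l > f > c > d.

    l: payout for defecting while the opponent cooperates
    f: payout when both cooperate
    c: payout when both defect
    d: payout for cooperating while the opponent defects
    Higher numbers mean better outcomes for the agent.
    """
    return l > f > c > d


# Hypothetical utilities, chosen only to illustrate the ordering.
example_ok = is_prisoners_dilemma(l=3, f=2, c=1, d=0)
example_bad = is_prisoners_dilemma(l=3, f=1, c=2, d=0)  # mutual defection beats mutual cooperation
print(example_ok, example_bad)
```

Any assignment of values that fails this check describes a different game, and the arguments considered here would not apply to it as stated.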
Other than the above information, one is completely ignorant of anything else related to the opponent’s disposition. All that one agent, say a, knows is that if there is in fact another person a is competing against, that person is aware of the same situation. However, a cannot know which decision his opponent will take. All that a knows is that if his opponent is rational, then he ought either to cooperate or to defect (depending on whether altruism or egoism is the best option). Hence, the goal is not to demonstrate how the players will act, only how they ought to act.
The choice a ought to make will be a matter of which utility function (global utility or self utility) secures the highest payout. Before we can discuss the utility functions, a brief discussion regarding the Dominance Principle is necessary.
The Dominance Principle
In almost every decision we make, there will be a range of decisions that secure a higher payout than others. While it is possible that two decisions one might make ultimately secure the same payout, we will only consider those decisions where at least one decision secures the highest payout on either all, or almost all, possible outcomes. Within Decision Theory there is a principle known as The Dominance Principle (DP). Michael Resnik (1987) states the (DP) as follows: “we say that A dominates another act B if, in a state-by-state comparison, A yields outcomes that are at least as good as those yielded by B and in some states they are even better.” What the (DP) says is that we can rule out those decisions which pay less than others (p. 9). To illustrate this point, imagine the following payout matrix between two decisions and two outcomes:
Table 1.
              Outcome G    Outcome K
Decision F    $10          $7
Decision Y    $6           $7

We are assuming that these are the only possible choices and outcomes available. Given two decisions, F and Y, the dominance principle states that if our goal is monetary gain, then we can never do worse, and may do better, by choosing F. Whatever outcome occurs, choosing F over Y yields either the same monetary gain (Outcome K) or a higher one (Outcome G). Hence, decision F dominates decision Y. However, this holds only where one decision dominates another. If F and Y both yielded the same payout (say $10 each, for each possible outcome), then the (DP) would be inapplicable.
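Resnik’s definition translates directly into a state-by-state comparison. The sketch below applies it to the dollar amounts in Table 1; the function name dominates and the representation of an act as a list of payouts are assumptions made for illustration.

```python
def dominates(act_a, act_b):
    """Resnik-style dominance: act_a is at least as good as act_b in
    every state and strictly better in at least one state.

    Each act is a list of payouts, one per state, in the same order.
    """
    if len(act_a) != len(act_b):
        raise ValueError("acts must cover the same states")
    at_least_as_good = all(x >= y for x, y in zip(act_a, act_b))
    strictly_better_somewhere = any(x > y for x, y in zip(act_a, act_b))
    return at_least_as_good and strictly_better_somewhere


# Table 1, payouts in dollars for (Outcome G, Outcome K).
decision_f = [10, 7]
decision_y = [6, 7]

print(dominates(decision_f, decision_y))  # does F dominate Y?
print(dominates(decision_y, decision_f))  # does Y dominate F?
print(dominates([10, 10], [10, 10]))      # equal payouts everywhere
```

The last call covers the case where both decisions pay the same in every outcome: neither dominates, so the principle gives no guidance.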
Intuitively this seems like a good principle. If one decision brings one closer to one’s goals in every case, or at least fares as well as the other choices, then obviously we should choose that decision. While this principle is not disputed within the context of the Prisoner’s Dilemma, how it is applied is highly debated.

The Argument for Defection
The argument for defection is often called the egoist’s solution to the (PD). An egoist would argue that it is irrelevant what one’s opponent chooses. Hence, when developing a solution to the (PD), the egoist will argue that we should simply focus on what is best for us, and disregard what is best for the group.
As stated earlier, the egoist, like the altruist, uses the Dominance Principle. However, unlike the altruist, the egoist employs a utility function which maximizes his possible payout, regardless of what his opponent’s decision might be. It is of paramount importance that this point be emphasized; the egoist will only be thinking of himself.
Here is the payout matrix for the Prisoner’s Dilemma, with the same relation for a as before, l > f > c > d:
Table 2.
                b cooperates    b defects
a cooperates    Payout f        Payout d
a defects       Payout l        Payout c

Let the following values for a be: f = 5 years in prison, d = 10 years in prison, l = 0 years in prison, c = 7 years in prison. Using the dominance principle coupled with the principle of Maximize Self Utility (MSU), a will reason as follows:
a reasons:
P1: Either b will defect, or b will cooperate. [Law of Excluded Middle]
P2: Suppose b cooperates. [Assumption]
P3: If I defect, then I will get the maximum possible payout: l. [Follows from P2]
P4: If I were to cooperate, then I would get the second highest payout: f. [Follows from P2]
P5: Suppose b defects. [Assumption]
P6: If I defect, then I would get the third highest payout: c. [Follows from P5]
P7: If I cooperate, then I get the lowest possible payout: d. [Follows from P5]
C1: Regardless of what my opponent does, it would be most rational for me to defect [Principle Of Maximize Self Utility] (Olin, 2003, p. 146).
One should notice that Table 2 has values for a, but not for b. Since a doesn’t care what choice b will make, it is irrelevant to add in his possible values.
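The egoist’s case-by-case reasoning can be reproduced with a’s prison terms from Table 2. This is a sketch only: the names YEARS_A, best_reply and dominant_move are hypothetical, and "best" is taken to mean the fewest years in prison for a alone.

```python
# Years in prison for a, from Table 2 (fewer years is better for a).
YEARS_A = {
    ("cooperate", "cooperate"): 5,   # payout f
    ("cooperate", "defect"): 10,     # payout d
    ("defect", "cooperate"): 0,      # payout l
    ("defect", "defect"): 7,         # payout c
}
MOVES = ("cooperate", "defect")


def best_reply(opponent_move):
    """Return a's move that minimizes a's own prison term against a fixed opponent move."""
    return min(MOVES, key=lambda my_move: YEARS_A[(my_move, opponent_move)])


def dominant_move():
    """Return the move that is a best reply to every opponent move, or None."""
    replies = {best_reply(b_move) for b_move in MOVES}
    return replies.pop() if len(replies) == 1 else None


for b_move in MOVES:
    print(f"if b plays {b_move}, a's best reply is {best_reply(b_move)}")
print("dominant move for a:", dominant_move())
```

Only a’s numbers appear in the code, which mirrors the egoist’s stance that b’s payouts play no part in the calculation.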
We can easily grant that the above argument is valid; if the principle of Maximize Self Utility is a good interpretation of the Dominance Principle, then the argument is sound. However, the principle of Maximize Self Utility is a very bad application of the (DP). We can show by a reductio ad absurdum argument that this principle is an extremely poor utility function. Assume for reductio that the (MSU) is correct:
P1: The (MSU) is a good utility function [assumption for reductio]
P2: If (MSU) is a good utility function, then a and b ought to defect [follows from the definition of what a good decision-theoretic principle is]
P3: If a and b ought to defect, then by defecting, they will secure the highest possible payout [follows from the definition of what a good decision-theoretic principle is]
P4: If a and b ought to defect, then they will each serve 7 years in prison [fact of the payout matrix]
P5: Serving 7 years in prison is not the highest possible payout [fact of the payout matrix]
C1: Ergo, the (MSU) is a bad decision-theoretic principle [reductio P1-P5]
C2: Ergo, the Maximize Global Utility principle (MGU) is a good utility function [Follows from the reductio]
All we have done is assume that the principle of Maximize Self Utility is a good application of the Dominance Principle, and then follow the principle to its logical conclusion. If the (MSU) is a good application of the Dominance Principle, then the only outcomes a and b ought to consider are those situations which guarantee that they get the highest possible payout, regardless of which choice their opponent takes.
Looking at Table 2, defection always secures for a (or for b, if b had values) the best outcome no matter what the opponent does. Furthermore, if the (MSU) is a good decision-theoretic principle, then it ought to secure for the agents the best solution that is possible. If this were not the case, then there would be no reason to favor it over another principle (such as the principle Maximize Global Utility). However, as we can see from Table 2, if one ought to adopt the (MSU), then it follows that one ought to get c. However, spending 7 years in prison is not the maximum possible payout.
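The numbers behind the reductio can be laid out explicitly. The sketch below uses a’s prison terms from Table 2 and assumes, as the essay’s later two-agent table stipulates, that b faces the same values; the names YEARS and years_served are hypothetical.

```python
# One agent's prison terms from Table 2, keyed by (own move, opponent's move).
# Assumption: the opponent faces the same numbers.
YEARS = {
    ("cooperate", "cooperate"): 5,   # payout f
    ("cooperate", "defect"): 10,     # payout d
    ("defect", "cooperate"): 0,      # payout l
    ("defect", "defect"): 7,         # payout c
}


def years_served(my_move, opponent_move):
    """Years one agent serves, given its own move and the opponent's move."""
    return YEARS[(my_move, opponent_move)]


both_follow_msu = years_served("defect", "defect")
both_cooperate = years_served("cooperate", "cooperate")
shortest_term_in_table = min(YEARS.values())

print("each serves", both_follow_msu, "years if both follow (MSU)")
print("each serves", both_cooperate, "years if both cooperate")
print("shortest term anywhere in the table:", shortest_term_in_table)
```

When both agents follow the (MSU), each ends up with a longer term than under mutual cooperation, and neither comes close to the shortest term in the table.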
The reasoning the egoist employs should strike us as extremely naïve. The egoist is asking us to put blinders on by completely ignoring our opponent. What intuition could possibly justify ignoring our opponent? It cannot be a higher payout, for this is demonstrably wrong. The only intuition found in support of this position seems to be the old saying that if one is going to err, it is better to err on the side of caution. It has yet to be seen how this intuition can be justified within the context of the Prisoner’s Dilemma. Furthermore, it is not obvious that this intuition is justifiable in many cases. Until the egoist can justify this assumption, one ought to look upon this intuition with skepticism.
The only way for the egoist to rationally maintain his view, in spite of the reductio, is by convincing us that spending 7 years in prison is better than spending 5 years in prison. It remains to be seen how this is even possible. The burden of proof is on the egoist to rescue his argument from the reductio. Until the egoist can do this, we should reject his argument as myopic and self-defeating.
The Argument For Cooperation

As has been shown above, the egoist’s argument rests upon a dubious utility principle. We now turn to the explanation and defense of the argument for cooperation. Like the argument for defection, the argument for cooperation rests upon the Dominance Principle. However, the altruist will interpret the principle in light of the principle of Maximize Global Utility (MGU). The altruist argues that given a Prisoner’s Dilemma, and given any two agents a and b, a ought to make his choice considering b’s payout, and vice versa. Hence, contrary to the egoist, the altruist will argue that what is best for you is also best for the group. Once again, the payout matrix is as follows:
Table 4.
                b cooperates      b defects
a cooperates    Payout f, f’      Payout d, l’
a defects       Payout l, d’      Payout c, c’

Let the values for a be: f = 5 years in prison, d = 10 years in prison, l = 0 years in prison, and c = 7 years in prison. Let the values for your opponent be: f’ = 5 years in prison, d’ = 10 years in prison, l’ = 0 years in prison, and c’ = 7 years in prison. The argument for cooperation is as follows. Take any two agents a and b.
a reasons the following way:
P1: Either I will cooperate, or I will defect. [Law of Excluded Middle, and all a can control].
P2: If I defect, and b cooperates, then I get l, while he gets d’. [follows from the matrix of the game].
P3: If I cooperate, and b defects, then I get d, while b gets l’. [follows from the matrix of the game].
P4: I reason that my opponent can also either defect or cooperate [follows from the rules of the game].
P5: I know that both of us can either use the maximize self utility function, or the maximize global utility function. [Law of Excluded Middle].
P6: If we both reason in terms of the former principle, then I will get payout c, while b gets payout c’. [follows from the rules of the game]
P7: If both of us reason in terms of the latter, then I will get payout f, and b will get payout f’.
C1: a and b ought to cooperate, because (f & f’) > (c & c’) [Maximize Global Utility function]
To be clear, the above argument is not designed to demonstrate that a and b will cooperate, only that they ought to cooperate. a starts off by assuming that either he will defect, or he will not. Since this is a law of classical logic, this step does not need to be justified. The rest of the argument merely shows the various possible payouts a and b will get. Hence, a and b ought to reason from the global perspective, for this yields the best possible outcome.
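The comparison in C1 can be sketched with the prison terms given for both agents. The sketch reads (f & f’) > (c & c’) as "better for each agent", which is one possible reading and an assumption here; the names YEARS and better_for_both are hypothetical.

```python
# (years for a, years for b) for each pair of moves, from the values given for a and b.
YEARS = {
    ("cooperate", "cooperate"): (5, 5),    # f, f'
    ("cooperate", "defect"): (10, 0),      # d, l'
    ("defect", "cooperate"): (0, 10),      # l, d'
    ("defect", "defect"): (7, 7),          # c, c'
}


def better_for_both(outcome_x, outcome_y):
    """True when every agent serves strictly fewer years in outcome_x than in outcome_y."""
    return all(x < y for x, y in zip(outcome_x, outcome_y))


both_mgu = YEARS[("cooperate", "cooperate")]   # P7: both reason globally
both_msu = YEARS[("defect", "defect")]         # P6: both reason egoistically

print(better_for_both(both_mgu, both_msu))
print(better_for_both(both_msu, both_mgu))
```

Only the two cases named in P6 and P7 are compared, because those are the cases in which both agents reason from the same principle.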
One criticism that might be launched against this solution is that it rests upon wishful thinking. Brian Skyrms (1987) lodges this criticism: “…but their cooperation appears to be based on magical thinking, because each knows that his act cannot influence that of the other” (p. 50). Because the issue is utility, not influence, this objection misses the point. Furthermore, it is unclear exactly what Skyrms means by “influence.” If Skyrms means that the two agents cannot causally affect one another, then he is correct; however, this is wholly irrelevant.
If Skyrms uses the word “influence” to mean that one agent either cannot, or should not, take into consideration the payoffs of his opponent, then he is demonstrably wrong. If the latter is what Skyrms means, then his criticism merely collapses into an argument for defection. In addition, Skyrms would then have to find a way to respond to the self-defeating nature of the egoist argument.
In conclusion, it has been shown that the argument for defection ultimately rests upon a dubious principle. Moreover, until the egoist can resolve the difficulties presented in this paper, we should at a bare minimum tentatively reject his argument. Since the payout for cooperation is greater than the payout for defection, we should conclude the following: Given the abstract single-round Prisoner’s Dilemma, if one is rational, then one ought to cooperate. While this conclusion must be tentative, it seems unlikely that the egoist can save himself from his own self-defeating argument.

*********************************

Bibliography

Poundstone, William (1992) Prisoner’s Dilemma. New York: Doubleday.

Resnik, Michael, D (1987) Choices: An Introduction To Decision Theory. Minneapolis: University Of Minnesota Press.

Sober, Elliott and Wilson, David Sloan (1998) Unto Others: The Evolution and Psychology of Unselfish Behavior. Massachusetts: Harvard University Press.

Skyrms, Brian (1996) The Evolution Of The Social Contract. Cambridge: Cambridge University Press.

Rescher, Nicholas (2001) Paradoxes: Their Roots, Range, and Resolution. Chicago: Open Court.

Clark, Michael (2002) Paradoxes from A to Z. London: Routledge - Taylor & Francis Group.

Olin, Doris (2003) Paradox. Montreal: McGill-Queen’s University Press.

Held, Virginia (1968) “On the Meaning of Trust”, Ethics 78(2): 156-159.

Tullock, Gordon (1967) “The Prisoner’s Dilemma and Mutual Trust”, Ethics 77(3): 229-230.

Steiner, Hillel (1982) “Prisoner’s Dilemma as an Insoluble Problem”, Mind 91(362): 285-286.

Birmingham, Robert L (1969) “The Prisoner’s Dilemma and Mutual Trust: Comment”, Ethics 79(2): 156-158.

"In the high school halls, in the shopping malls, conform or be cast out" ~ Rush, from Subdivisions

American Atheist's picture

Nice!



Chaoslord2004 wrote:
I apologize for the tables not coming through and for the Greek symbols not coming through.

You can make ALL of that come through if you create a new forum topic, use the first post to explain what the thread is about, and then in the second post cut and paste the entire argument where it's formatted properly into the post.  It won't work in the first post, but it will work in any post thereafter.

 

 
