# ECON 251: Financial Theory

# Financial Theory

## ECON 251 - Lecture 14 - Quantifying Uncertainty and Risk

### Chapter 1. Expectation, Variance, and Covariance [00:00:00]

Professor John Geanakoplos: We’ve dealt so far with the case of certainty, and we’ve done almost as much as we could in certainty, and I now want to move to the case of uncertainty, which is really where things get much more interesting and things can go wrong. So I’m going to cover this. So we’re ready to start.

So far, what we’ve considered is the case of certainty. With uncertainty things get much more interesting, and I want to remind you of a few of the basics of mathematical statistics that I’m sure you know. So you know we deal with random variables which have uncertain outcomes, but with well-defined probabilities.

So another step that we’re not going to take in this course is to say people just have no idea what the chances are something’s going to happen. Shiller thinks we live in a world like that where who knows what the future’s going to be like and people, they hear a story and then everybody gets wildly optimistic, and then they hear some terrible story and then everybody gets wildly pessimistic, and that kind of mood swing can affect the whole economy.

I’m not going to deal with that. It’s hard to quantify and I’m not exactly sure it’s as important as he thinks it is. So we’re going to deal with the case where many things can happen, but you know what the chances are that they could happen, and still lots of things can go wrong in that case. So there are a couple of words that I want you to know, which we went over last time, and I’ll just do an example.

We always deal with states of the world, states of nature. That was Leibniz’s idea. So let’s take the simplest case where with probability 1 half you could get 1, and with probability 1 half you could get minus 1. So that’s a random variable. It might be how your investment does. Half the time you’re going to make a dollar. Half the time, you’re going to lose a dollar. So this is X, so we define the expectation of X, which I write as X bar, as the probability of the up state times its outcome plus the probability of the down state times its outcome, so that’s 1 half times 1, + 1 half times minus 1 which equals 0.

Then I define the variance of X to be the expectation of the squared difference from the expectation. So it’s how uncertain it is. You’re sort of on average expecting to get 0, so how uncertain it is is measured by how far from 0 you are, but we’re going to square it. So it’s 1 half times (1 - X bar) squared + 1 half times (minus 1 - X bar) squared = 1 half times 1 + 1 half times 1 which equals 1. So the variance is 1.

And then I’ll write the standard deviation of X equals the square root of the variance of X, which equals the square root of 1 which is also 1. So very often we’re going to use the expectation of X, that’s going to be how good the thing is, and the standard deviation is going to be how uncertain it is, and people aren’t going to like–soon we’re going to introduce the idea that people don’t like uncertainty and this is the measure of what they do like.

It pays off on average a big number, say, this one doesn’t but it could, and the measure of uncertainty is the standard deviation. I choose that rather than the variance for a reason you’ll see. It makes all the graphs prettier, but also if you double X you’ll double the expectation, obviously, because you just double everything inside here.

The variance, though, you’re going to end up squaring the two. If you double X you’ll double all these outcomes and the mean, so you’ll end up multiplying the variance by 4, whereas you’ll multiply the standard deviation by 2. So re-scaling just re-scales these two numbers and has a funny effect on that number. So that’s the reason why we use these two.

Now, you could take another example, by the way; let’s call this Y. With probability .9 you get 1 third, and with probability .1 you get minus 3. Now, what’s the expectation of Y? Just write it out: it’s .9 times 1 third + .1 times minus 3 which equals .3 - .3 which equals 0, so the expectation of this random variable is the same as the expectation of that random variable.

And now the variance of this, of Y, is .9 times (1 third - 0) squared + .1 times (minus 3 - 0) squared, which equals .9 times 1 ninth, right, + .1 times 9 which equals .1 + .9 which equals 1, which is the same as the other one. So here we’ve got another random variable which looks quite different from this, so clearly standard deviation and expectation don’t characterize things. This looks quite different from that one, has the same standard deviation and the same expectation.
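The arithmetic for X and Y above can be checked with a short Python sketch. The helper names `expectation` and `variance` are illustrative, not something from the lecture, and exact fractions are used so that 1 third is not rounded.

```python
from fractions import Fraction as F
from math import sqrt

def expectation(dist):
    """dist is a list of (probability, outcome) pairs."""
    return sum(p * x for p, x in dist)

def variance(dist):
    """Expected squared difference from the expectation."""
    mean = expectation(dist)
    return sum(p * (x - mean) ** 2 for p, x in dist)

# The two random variables from the lecture.
X = [(F(1, 2), F(1)), (F(1, 2), F(-1))]
Y = [(F(9, 10), F(1, 3)), (F(1, 10), F(-3))]

for name, dist in (("X", X), ("Y", Y)):
    var = variance(dist)
    print(name, "mean:", expectation(dist), "variance:", var, "std:", sqrt(var))

# Doubling every outcome of X doubles the mean and the standard deviation,
# but multiplies the variance by 4.
X_doubled = [(p, 2 * x) for p, x in X]
print("2X variance:", variance(X_doubled), "std:", sqrt(variance(X_doubled)))
```

Both variables come out with expectation 0 and variance 1 even though their outcomes look nothing alike, and the doubled variable shows why standard deviation rescales more naturally than variance.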

So we’re going to come back to what the difference is between these two variables in a second. So there’s another thing I want to introduce which is the covariance of X and Y. So we could look at the outcomes of these variables. Where am I going to write this? I’ll write it over here. We could look at the outcome of these variables in a picture like this, and so here we have X and here we have Y. So X could turn out to be 1 when Y is 1 third, and X could turn out to be 1 when Y is minus 3. So here’s an outcome, and here’s an outcome, and X could be minus 1, and we could get 1 third or minus 3. So there are four outcomes looked at here.

So if you looked at X alone it’s got a 50/50 chance you’re here or here. If you look at Y alone it’s a 90 percent chance up there and a 10 percent chance down there. So those are called the marginal distributions, but the joint distribution we would have to add a number. So if you looked at X alone, by the way, you would say X alone you would say here’s 0, here’s 1, here’s minus 1, so you could have this or this with probability 1 half and 1 half and Y you could have–so we’ll draw it this way. With Y you could have 1 third or minus 3 and here the probability is going to be .9 and .1. This is 0.

Those are the pictures that we started with. So you know where X could end up and where Y could end up, well, you don’t know where they jointly could end up. So if they end up on the long diagonal that means when X is high Y tends to be high and vice versa, and if you end up down here X is low and Y is low. So to the extent that the probability is on the long diagonal they’re correlated together. To the extent that the probability is on the off diagonal they’re negatively correlated.

So anyway, to get a sense of that, the covariance is going to be the probability of (1, 1 third) times (1 - X bar) times (1 third - Y bar) + the probability of–I’ll just go around the circle of (minus 1, 1 third) times (minus 1 - X bar) times (1 third - Y bar) + the probability of (minus 1 and 1 third), sorry what did I just do? I did minus 1 and 1 third. I’ve already done that, so I’m down here. So (minus 1 and minus 3) times (minus 1 - X bar) times (minus 3 - X bar [correction: Y bar]) + probability of the ordered pair–

Student: Should that minus be the X bar or Y bar?

Professor John Geanakoplos: Thank you. And probability, what’s the point I haven’t done yet, (1, minus 3) times (1 - X bar) times (minus 3 - Y bar). So why does that covariance pick up the idea of correlation?

Well, to the extent that the probabilities are high here and over there on the long diagonal this term is going to get a lot of weight, and what is the other term, (minus 1, minus 3), and this term is going to get a lot of weight. So to the extent that you’re on the long diagonal this term and this term are going to get a lot of weight, but you see those terms this is going to be positive because it’s 1 - 0 and 1 third - 0, so that’s a positive term. And this is negative, minus 1 - 0, minus 3 - 0, so a negative times a negative is also positive.

To the extent that you’re down here and up there you’re going to get big positive numbers in the covariance. To the extent you’re on the off diagonal you’ll get big probabilities here, but they all multiply negative terms. This is a minus and this is a minus, because one of the terms is above the mean and the other one is below the mean. That’s what it means to be in the off diagonal. So covariance is giving you a sense of whether things are moving together or moving the opposite way.
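As a rough illustration of the sign argument, the sketch below prints the product (x - X bar)(y - Y bar) for each of the four outcomes and then weights them. The joint probabilities used here are an assumption at this point in the lecture: they are one assignment consistent with the two marginal distributions (the same one the lecture builds a little later), and the function names are illustrative.

```python
from fractions import Fraction as F

X_BAR, Y_BAR = F(0), F(0)  # both expectations are 0 in this example

# Assumed joint probabilities, keyed by (x, y); they sum to 1 and
# reproduce the 50/50 marginal for X and the 90/10 marginal for Y.
joint = {
    (F(1), F(1, 3)): F(1, 2),
    (F(1), F(-3)): F(0),
    (F(-1), F(1, 3)): F(2, 5),
    (F(-1), F(-3)): F(1, 10),
}

def deviation_product(x, y):
    """The term each outcome contributes before weighting by probability."""
    return (x - X_BAR) * (y - Y_BAR)

def covariance(joint):
    return sum(p * deviation_product(x, y) for (x, y), p in joint.items())

for (x, y), p in joint.items():
    print(f"x={x}, y={y}: product={deviation_product(x, y)}, weight={p}")
print("covariance:", covariance(joint))
```

The two diagonal outcomes give positive products and the two off-diagonal outcomes give negative ones, so the sign of the covariance depends on where the probability weight sits.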

So those are the basic things you have to know. And I guess another couple things are, the covariance is linear in X, right, because if you double X every time you see the X variable over here it’s always an X outcome minus an X bar, an X outcome minus an X bar, an X outcome minus an X bar, an X outcome minus an X bar, so if you double X you’re going to double every term.

So it’s linear in X and in Y, and so one last thing to keep in mind is that the variance of X is just the covariance of X with itself. Obviously if you just plug in X equal to Y you just get the formula for covariance [correction: for variance], and similarly because they’re linear the covariance of X + Y–so the variance of X + Y, one more formula, of X + Y by linearity–first of all that’s the covariance of X + Y with itself, and therefore by linearity now, I’m just going to do linear stuff, that’s equal to the covariance of X with X + the covariance of Y with Y + 2 times the covariance of X with Y.

Since it’s linear I just do the linear parts, right? Covariance of X + Y with X + Y is covariance of X + Y with X + covariance of X + Y with Y, then I repeat the linearity thing and I get down to that. So those are basically the key formulas to know. So now I’m going to make three little observations that come out of all of this that are quite fascinating, so quite elementary. Are there any questions about this, these numbers? Yes?

Student: I don’t understand why you gave the probability of (negative 1, negative 3) weight when negative 3 has a much more probability of being hit on that 1 third.

Professor John Geanakoplos: Why did we give? Say that again.

Student: Why did you underline the probability of negative 1, negative 3?

Professor John Geanakoplos: Probability of negative 1, negative 3. That’s this outcome here. We underlined it not because it was very likely, but because this term is going to be positive. This is positive and this is positive. So the whole point is the joint distribution is not specified, not determined by the distributions of X alone and Y alone.

So even if I know the probability of what X could do, and I know what the probabilities that Y could do that doesn’t tell me anything about what numbers I should put on these four outcomes.

For example, I could have at one extreme when X is high Y is high–it can’t be exactly that because the probabilities are different. These numbers and those numbers don’t determine these four numbers. So there are many different numbers I could put in these four squares which would give me in total this probability outcome for X and in total this probability outcome for Y. So an easy way to see that is if I made them. So what are the observations I want to make?

For instance, I could say if X turns out to be 1 then I’ll always assume Y turns out to be 1 third, and then the other 40 percent of the time when Y’s high, X has to turn out to be minus 1. So here are some ways I could do this. I could put 50 percent here, .5 here, right? So I have a .5 here, then what could I do with the rest of this? This plus this has to add up to 50 percent, because X is going to turn out to be 1, 50 percent of the time. So when X is 1 I could have Y always turn out to be 1 third, and that means I must have a probability 0 here.

Now, how much of the time is Y going to turn out to be down here? .1. So suppose I put these probabilities, .4 here and .1 here. Now, so you see that 50 percent of the time X is 1, and 50 percent of the time X is minus 1. Now, how much of the time is Y 1 third? .5 + .4, so 90 percent of the time, and then 10 percent of the time Y is minus 3. So here’s one way of putting probabilities on the dots that produces this outcome, but I could have chosen another way of doing it, the way that you probably had in mind where I assume they’re totally independent.

That is, knowing the outcome of X in this way of doing it, if I know that X turned out to be 1, Y has to turn out to be a third. So they’re very dependent. X is somehow causing Y or determining Y. X has a lot of information about Y. Suppose I make them independent? I say what happens here has nothing to do with what happens over there. Then I write the probabilities, instead of these, I’d write it .45. I’d take 1 half times .9 is .45, and then the chance that you go down for X, which is .5, and up for Y, which is .9, is also .45 here, then I’d go .05 here and .05 there.

So here, knowing that Y has a good outcome tells you nothing about what X is going to do. It’s still equally likely X was good or bad. Knowing that Y had a bad outcome, X is still likely to be equally likely good or bad. And similarly knowing the outcome of X tells you nothing about the outcome of Y. This is 9 times this and this is 9 times that. So the yellow is independence, which is probability ((X equals x), and (Y equals y)), equals the product, Probability (X = x) times probability (Y = y). So that’s the case in independence.

So in the case of independence, knowing something about one variable tells you nothing about what happened to the other variable, but you could do other joint things. So knowing each of them separately doesn’t tell you how they’re jointly distributed, and the covariance is an effort to see whether they’re sort of correlated together or whether they’re correlated independently.

So independence, by the way, independence implies covariance equals 0. That’s obvious because what’s happening in the X variable’s got nothing to do with what’s happening in the Y variable. So since it’s linear in X you can hold Y fixed, and the X is just the same and you’re going to get something that adds up to 0. So for any fixed value of Y this number will just give you the expectation of X, which won’t depend on Y and it’s going to be 0 in every case. So therefore if they’re independent their covariance has to be 0. So, independence means X and Y tell you nothing. That means the covariance is 0. They could be positively distributed like up here or negatively distributed, either way you want to do it. Does that make sense? You asked me about this.

Student: Yes.
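A minimal sketch of the point above: the two joint distributions described in the lecture (the dependent one with .5, 0, .4, .1 and the independent one with .45, .05, .45, .05) share the same marginals but have different covariances. The dictionary layout and function names are illustrative choices, not anything from the lecture.

```python
from fractions import Fraction as F

XS = (F(1), F(-1))
YS = (F(1, 3), F(-3))

# Joint probabilities keyed by (x, y).
dependent = {
    (F(1), F(1, 3)): F(5, 10), (F(1), F(-3)): F(0),
    (F(-1), F(1, 3)): F(4, 10), (F(-1), F(-3)): F(1, 10),
}
independent = {
    (F(1), F(1, 3)): F(45, 100), (F(1), F(-3)): F(5, 100),
    (F(-1), F(1, 3)): F(45, 100), (F(-1), F(-3)): F(5, 100),
}

def marginals(joint):
    """Return (P(X = x) for each x, P(Y = y) for each y)."""
    px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in XS}
    py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in YS}
    return px, py

def covariance(joint):
    px, py = marginals(joint)
    x_bar = sum(x * p for x, p in px.items())
    y_bar = sum(y * p for y, p in py.items())
    return sum(p * (x - x_bar) * (y - y_bar) for (x, y), p in joint.items())

def is_independent(joint):
    """True when every joint probability is the product of its marginals."""
    px, py = marginals(joint)
    return all(joint[(x, y)] == px[x] * py[y] for x in XS for y in YS)

for name, joint in (("dependent", dependent), ("independent", independent)):
    print(name, marginals(joint), covariance(joint), is_independent(joint))
```

Under these numbers the dependent table has covariance 1 third and the independent table has covariance 0, while the marginals are identical in both.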

### Chapter 2. Diversification and Risk Exposure [00:19:06]

Professor John Geanakoplos: So what are the key simple observations here that are going to inform a lot of our behavior under uncertainty?

Well, it’s going to turn out that expectation is good and standard deviation is bad. So if we take this variable that we just found, X and Y were both here, X and Y were both there. All right, they each had standard deviation 1 and expectation 0, so this is the standard deviation. So X is here, and by the way so is Y, same thing. Well, suppose I put half my money into X and I put half my money into Y, and if I put half my money in each let’s say I get half the payoff of each. I make half a bet and get half the outcome. What happens to my expectation?

Well, the expectation of that obviously equals 1 half X bar + 1 half Y bar which also equals 0. So it’s staying the same. The expectation hasn’t moved, but what’s the variance of 1 half X + 1 half Y? Well, by that formula it’s the covariance–so I’m just going to do this formula. I’m going to put a 1 half here and a 1 half here. So it’s the same thing. So it’s the covariance of 1 half X with 1 half X + the covariance of 1 half Y with 1 half Y + 2 times the covariance of 1 half X with 1 half Y.

But the covariance of 1 half X with 1 half X is just, okay, what is that? It’s the variance of 1 half X, but we already saw from our definition of variance over here, remember, if you double X you’re going to multiply the variance by 4 because you’re squaring things. So this is going to turn out to be 1 quarter times the variance of X. And this, which is 1 half Y and 1 half Y, is going to be 1 quarter times the variance of Y.

And if the two are independent the covariance will be 0. So in this example, these two variables, if I take the orange distribution where they’re independent I can do an X outcome and have this standard deviation and this expectation, 0 expectation and that standard deviation, I can do the Y thing, get the same standard deviation or I can put half my money in each.

It seems like a total waste of time to put half my money in each. After all, they give me the same standard deviation, but no, it isn’t. If they’re independent you’re shockingly, drastically reducing your standard deviation. Because if they’re independent the covariance is 0, and so you’re left with this plus this, 1 quarter the variance of X plus 1 quarter the variance of Y. Since the variance of X equals the variance of Y, that’s just half the variance of X, which is half the variance of Y. So that’s shocking. So the standard deviation, therefore, the square root of that, is 1 over the square root of 2. So by putting half your money in each you’ve now produced this when they’re independent.

So this is the standard deviation of 1 half X + 1 half Y, (X, Y) independent. You move from this point to that point. You reduced your standard deviation without affecting your expectation. So the first lesson that we’re going to see applied, this is all mathematics so mathematicians understood this, of course, a long time ago, but to realize this has an application to economics wasn’t so obvious, although Shakespeare knew it. It’s diversification. So don’t put all your, you know, spread your investments out into different waters.

Shakespeare, you know, Antonio had a different ship on each ocean, so instead of putting all the ships on the same ocean he put them on different oceans which he assumed were independent. So he had the same expected outcome, assuming the paths were just as quick to wherever he was selling the stuff and that each of the waters was equally dangerous, but he drastically reduced his variance. And because there were a lot of oceans and a lot of ships this number went down further and further. So the key is to look for independent risks. So that’s one lesson in mathematics that has a big application in economics.
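The diversification claim can be checked directly from a joint distribution rather than through the covariance formula. The sketch below assumes the independent joint table from earlier in the lecture (.45, .05, .45, .05); `portfolio_moments` is an illustrative helper that computes the mean and variance of wx times X plus wy times Y outcome by outcome.

```python
from fractions import Fraction as F
from math import sqrt

# Independent joint distribution of (X, Y), keyed by (x, y).
independent = {
    (F(1), F(1, 3)): F(45, 100), (F(1), F(-3)): F(5, 100),
    (F(-1), F(1, 3)): F(45, 100), (F(-1), F(-3)): F(5, 100),
}

def portfolio_moments(joint, wx, wy):
    """Mean and variance of wx*X + wy*Y under the given joint distribution."""
    mean = sum(p * (wx * x + wy * y) for (x, y), p in joint.items())
    var = sum(p * (wx * x + wy * y - mean) ** 2 for (x, y), p in joint.items())
    return mean, var

all_in_x = portfolio_moments(independent, F(1), F(0))
half_half = portfolio_moments(independent, F(1, 2), F(1, 2))

print("all in X:  mean", all_in_x[0], "std", sqrt(all_in_x[1]))
print("half each: mean", half_half[0], "std", sqrt(half_half[1]))
```

With these inputs the expectation stays at 0 while the variance drops from 1 to 1 half, so the standard deviation falls to 1 over the square root of 2, roughly 0.707.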

What’s a second thing? Well, the second thing is that if you add a bunch of risks together, so I’m going to say this loosely. If you add a bunch of risks together, so by the way, what’s the generalization of this before I say this?

If you had N independent risks with identical means and variances, let’s say they all have expectation E and variance sigma squared, each of them has that, so they’re all identical. Like X and Y have the expectation 0 and the same standard deviation 1. Suppose I had 20 of those and I put 1 twentieth of my money into each of them? What would happen to my expectation?

1 over N dollars in each one implies my expectation is equal to what? Each of them had expectation E. I now split my money among all of them, all with the same expectation. That also has to have expectation E. All right, just like this thing putting half my money in Y and half my money in X, wherever the X went. Y was over here. X is there. Half my money in X and half my money in Y, is going to give me the same expectation. If I had 12 projects like that that were independent I’d still have the same expectation, but my standard deviation, what’s going to happen to my standard deviation?

Well, the variance is going to be–so what’s going to happen to the standard deviation?

Student: It would go down.

Professor John Geanakoplos: By what factor? Yeah, what’s going to happen to the variance?

Student: 1 over…

Professor John Geanakoplos: Put 1 over N dollars in each of N identical but independent investments, what will my variance be?

Student: [inaudible]

Professor John Geanakoplos: The variance is going to equal 1 over N times sigma squared. Why is that? Because each one will have 1 over N dollars in it, so its variance is going to be 1 over N squared times sigma squared, but there are N of them. So it’s going to be N times 1 over N squared, so it’s just 1 over N, which implies the standard deviation is 1 over the square root of N times sigma. So it’s just this generalization. We’ve got 1 over the square root of 2, so if I did N of them instead of 2 of them I’d have 1 over the square root of N. So those turn out to be very useful formulas which are going to come up over and over again.
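The 1 over N rule can be verified by brute force for small N. This sketch enumerates every combination of outcomes for N independent copies of a random variable, with 1 over N of the money in each; the function name and the choice of N values are illustrative, and the enumeration grows exponentially, so it is only meant for small N.

```python
from fractions import Fraction as F
from itertools import product

X = [(F(1, 2), F(1)), (F(1, 2), F(-1))]
Y = [(F(9, 10), F(1, 3)), (F(1, 10), F(-3))]

def variance_of_equal_split(dist, n):
    """Variance of (1/n) * (sum of n independent copies of dist)."""
    mean = F(0)
    second_moment = F(0)
    for combo in product(dist, repeat=n):
        prob = F(1)
        payoff = F(0)
        for p, x in combo:
            prob *= p          # independence: probabilities multiply
            payoff += x / n    # 1/n of the money in each copy
        mean += prob * payoff
        second_moment += prob * payoff ** 2
    return second_moment - mean ** 2

for n in (1, 2, 5, 10):
    print(n, variance_of_equal_split(X, n))
```

Since each copy has variance 1 here, the printed variances are 1, 1/2, 1/5 and 1/10, and the same holds if `Y` is substituted for `X`.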

And let’s just say it again so you get this straight. If I have two independent random variables, and I split my money evenly between them, and they have the same expectation, it doesn’t have to be 0, it could be a positive number, if I split my money between them I haven’t changed my expectation because each dollar, however I split it, I’m putting it into something with the same expectation.

But because they’re independent you get a lot of off diagonal things happening. The off diagonal things, remember, are canceling. One investment is turning out well, X is–sorry that’s on the diagonal. The off diagonal elements are good in a way because if one investment’s turning out well, sorry, turning out badly the other one’s turning out well. So here investment Y is turning out badly, but X is turning out well. So to the extent you’re off the diagonal you’re canceling some of your bad outcomes because one’s good and the other’s bad. So that way you leave the expectation the same, but you reduce the variance.

In fact it would be even better if you could put everything on the off diagonal, but to the extent you get at least some stuff on the off diagonal you’re reducing the risk. And how fast do you reduce it when they’re independent? You reduce it dividing it equally because the variance is a squared thing, half your money in one and half in the other means the variance of the first is 1 quarter and the variance of the second is 1 quarter, but now there are two of them so the total variance is 1 half of what it was before.

If you have 10 of them each one is 1 tenth the money so it’s got 1 one-hundredth of the variance, but there are 10 of them so it’s 10 one-hundredths, 1 over N of the variance. If you take the standard deviation it’s 1 over the square root of N. So that’s the rate at which you can reduce your uncertainty and your risk. You’ll see this gets much more concrete next lecture.

So this is just stuff that most of you know. So one more thing, if you add a bunch of independent things together, independent random variables, so I’m going to speak very loosely now, variables, you get a normally distributed random variable, normally distributed random variable with the corresponding expectation and standard deviation. So what am I saying?

I don’t want to speak too precisely about this because if you’ve seen this before and seen a proof you know everything about it, if you haven’t it’s just too many subtleties to absorb. But the normally distributed random variable’s the bell curve that looks like that. It looks like this. So there’s the bell curve with expectation 0. So it’s this bell curve.

Now, what’s special about it, it has a particular formula which has got an exponential to a minus X squared thing. Anyway, it’s got a particular formula to it which if you know you know, if you don’t it’s written down. We’re never going to use the exact formula, but it looks like that. So these are the outcomes X and this is the probability, probability of outcome, or frequency of outcome. So the bigger X is, and this is the mean–equals 0–I’ve assumed the mean is 0. If you take a really big X it’s very unlikely to happen, and a really small X it’s very unlikely to happen, and X’s nearer the mean are pretty likely to happen.

So anyway, it’s amazing that if you add this random variable to itself a bunch of times it can only produce 1 and minus 1, right? This one produces totally different outcomes, 1 third and minus 3, they’re disjoint outcomes, but if you add this together you can get 25 1s and 10 minus 1s, so that gives you 15. Over here you could have–25 will never get me there, so sorry, that was a bad example. If I had 30 things I could get 18 1s and 12 minus 1s, that’ll give me 6, you could have gotten 6 over here, but with 30 outcomes you could get, you know, all 30 of them could have turned out to be 1, and that would have gotten you pretty close to the same outcome.

So just because these outcomes are separate, once you’re adding them up you’re starting to produce numbers different from 1 and minus 1, and these added up–if you take the right combination of 1 third and minus 3–you can start reproducing things. Like to get a 1 here you could produce three tops and then you’re producing a 1. So anyway, the shocking thing is if you add a bunch of these random variables that are independent of each other you get something normally distributed that looks like that because this random variable had exactly the same mean and standard deviation.

You add the same number of these you’re going to get outcomes that are almost identically distributed. So in the limit this random variable, enough of these added together looks exactly the same as these added together. That’s the second surprising mathematical fact.
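One hedged way to see this numerically, without simulation, is to compute exactly how often the sum of n independent copies lands within one standard deviation of its mean, for both X and Y, and compare with the normal value. The choice n = 900 and the function name are assumptions made for illustration. Because the sums only take values on a grid, the exact figures will be near, but not equal to, the normal benchmark.

```python
from fractions import Fraction as F
from math import comb, erf, sqrt

def prob_within_one_sd(p_up, up, down, n):
    """Exact P(|S| <= sqrt(n)) for S = sum of n independent draws.

    Each draw is `up` with probability p_up and `down` otherwise, and is
    assumed to have mean 0 and variance 1, so S has standard deviation sqrt(n).
    """
    total = F(0)
    for k in range(n + 1):                # k = number of "up" draws
        s = k * up + (n - k) * down
        if s * s <= n:                    # same as |s| <= sqrt(n), kept exact
            total += comb(n, k) * p_up ** k * (1 - p_up) ** (n - k)
    return float(total)

n = 900  # arbitrary illustrative choice
px = prob_within_one_sd(F(1, 2), F(1), F(-1), n)
py = prob_within_one_sd(F(9, 10), F(1, 3), F(-3), n)
normal = erf(1 / sqrt(2))  # P(|Z| <= 1) for a standard normal

print("sum of X's:", px)
print("sum of Y's:", py)
print("normal    :", normal)
```

Although a single X and a single Y have completely different outcomes, the two sums give probabilities close to each other and in the neighbourhood of the normal figure.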

And the third thing that we’re going to use is that the normal distribution is characterized by the mean and standard deviation, that’s all it takes to write the formula of this down, and these numbers, these are called thin tailed. These probabilities go to 0 very fast, so you shouldn’t expect many outlying dramatic things to happen.

And in the world they do happen, and so we’re going to see that much of classical economics is built on normally distributed things and so you can’t see–you shouldn’t expect any gigantic outliers to ever happen. And it seems natural to build it on that kind of assumption because if you add things that are independent you get normal distributions all the time. And things seem independent so why shouldn’t you get normal distributions, and yet we must not get it because we have so many outliers. So that’s the basic background of mathematics.

Are there any questions about any of that? I’m just assuming you know all that and now we’re going to move to economics. I think that’s all the background you need.

### Chapter 3. Conditional Expectation [00:33:54]

I want to do one more thing, which is maybe background, but it’s used in economics all the time, and it’s called the iterated expectations.

So if I told you that these variables were correlated like these up here, like the orange things, if I told you what X turned out to be that would tell you a lot about what Y was going to be. So for example, if I told you that X was–sorry, the white ones are the correlated ones. If I tell you that X has turned out to be 1, that tells you that Y has to be a good outcome of 1 third, because if X is one this never happens.

So the only thing that can happen if X is 1 is that Y turns out to be 1 third, so knowing X is going to completely change your mind about the expectation of Y.

So conditional expectation, I should have said this before, conditional expectation simply means re-computing expectation using updated probabilities from your information. Now, you’ve probably done this in high school, so I’m just going to assume you know how to do this. So in this case if I tell you something like X has turned out to be 1 that tells you that only these two outcomes are possible. So that means that the only two outcomes in the white case have happened with probability of .5 and 0, but if I tell you X has come out to 1 the conditional probabilities have to add up to 1.

So you just scale things up. So you know that Y had to have been the good outcome up here. If I tell you that the bad outcome for Y has happened then you have probabilities of .1–so this 0 makes things too easy. Suppose I tell you the good outcome of Y has happened. What are the chances now that X has gotten the good outcome in the white probability case? If I tell you that Y turned out to be 1 third in the white probability case what’s the probability that X turned out to be 1, conditional on that?

Student: 5 ninths.

Professor John Geanakoplos: 5 ninths, so that’s it, because the probabilities are now–you’re reduced with .4 and .5 so 5 ninths of the time. So that’s an idea which I assume you all can–it’s very intuitive, and it’s way too long to explain, and I’m sure you know how to do that. So anyway, the conditional expectation, blah, so the iterated expectation is simply this. It’s an obvious idea, but it’s going to be incredibly useful to us.

It says if you ask me what are the chances that the Yankees are going to win the World Series against the Dodgers–let’s suppose that’s who’s going to play–the Yankees are going to beat the Dodgers, what’s the probability that’s going to happen? What do you expect the chances are?

If you then ask me my opinion after the first game, well, obviously if the Yankees win the first game my opinion’s going to go up, so I’m going to have a different opinion. If the Dodgers win the first game my opinion is going to go down, so I’ll have a different opinion. But you can ask now another question, what’s your expected opinion going to be? So the law of iterated expectations is, the expectation of X has to equal the expected expectation of X given some information.

So here is what I think. The Yankees are 70 percent likely to win. If I say after the first game [clarification: if the Yankees win] I’ll think it’s 80 percent, and after the first game if the Dodgers win I’ll think it’s gone down to 65 percent, it had better be that the average of my opinions after the information is the same as the number I started with.

That’s just common sense and I’m not going to bother to prove that. So that’s incredibly important. It’s not only the expectation of X, but as you learn stuff you can anticipate your opinion’s going to change, but your average opinion has to always stay the same as X was. So that’s the last of the background.
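Both ideas, conditional probability and the law of iterated expectations, can be illustrated on the dependent ("white") joint distribution from earlier in the lecture. The function names below are illustrative; the sketch recomputes the 5 ninths answer and then checks that the average of the conditional expectations of Y equals the unconditional expectation of Y.

```python
from fractions import Fraction as F

# Dependent joint distribution of (X, Y), keyed by (x, y).
joint = {
    (F(1), F(1, 3)): F(5, 10), (F(1), F(-3)): F(0),
    (F(-1), F(1, 3)): F(4, 10), (F(-1), F(-3)): F(1, 10),
}

def prob_x(x):
    return sum(p for (a, _), p in joint.items() if a == x)

def prob_y(y):
    return sum(p for (_, b), p in joint.items() if b == y)

def cond_prob_x_given_y(x, y):
    """Rescale the probabilities in the row Y = y so they add up to 1."""
    return joint[(x, y)] / prob_y(y)

def cond_expectation_y_given_x(x):
    """Expectation of Y recomputed with probabilities updated on X = x."""
    return sum(y * p / prob_x(x) for (a, y), p in joint.items() if a == x)

expected_y = sum(y * p for (_, y), p in joint.items())
iterated = sum(prob_x(x) * cond_expectation_y_given_x(x) for x in (F(1), F(-1)))

print("P(X=1 | Y=1/3):", cond_prob_x_given_y(F(1), F(1, 3)))
print("E[Y | X=1]:", cond_expectation_y_given_x(F(1)))
print("E[Y | X=-1]:", cond_expectation_y_given_x(F(-1)))
print("E[Y]:", expected_y, "E[E[Y | X]]:", iterated)
```

Learning X moves the expectation of Y up to 1 third or down to minus 1 third, but weighted by how likely each piece of news is, the opinions average back to 0.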

And now I want to do a simple application of this. So in fact, to that very question, suppose that you’re playing a World Series. The Yankees are playing the Dodgers and let’s suppose that the Yankees have a 60 percent chance of winning any game. I’ll just do it here. The Yankees have a 60 percent chance of winning any game. What’s the chance the Yankees win a 3 game world series? How do you figure that out?

Well, a naïve way, a simple way of figuring that out is to say, well, what could happen? Life can mean a Yankee win, let’s call that an up, or a Yankee loss, let’s call that a down, and this could happen with probability .6 or .4. The Yankees could win again, so that’s probability .6. We have two Yankee wins, or the Yankees could lose the second game so that’s probability .4. The Yankees could lose or could win. That’s .6 and this is .4, and we’ve only played 2 games. The Yankees could win a third–well, you don’t need to play this game because they’ve already won a three game series, but if you did it wouldn’t matter, .4, or we could go up or down.

The Yankees after winning and losing could then win probability .6, or could lose, or after losing and winning they could win again or they could lose. After losing and winning they could lose, so this is probability .4 and this is .6, and then finally we have this and we have this. So this is .6 and .4.

So this is what the tree looks like. You could imagine 8 possible paths each of length 3 where you give the whole sequence of wins and losses. So to compute the probability that the Yankees win you look at all the–so in this case the Yankees win. They would have already won here, but if you play it out it doesn’t matter. They’re going to win here and here. They’ve got two wins and one loss. Here they’ve got one win, two wins and one loss. They win. Here they’ve got loss, win, win. They win the World Series. Here they lose, win, lose. They lose the World Series. Here’s lose, win–it’s win, lose, lose, they also lose the World Series. Here it’s lose, lose, they’ve lost the World Series, loss. So these are the possible outcomes.

So you could compute the probability of every path, there are 8 of them, and then multiply that probability by the outcome, add them all up, and you’ll get the chance that the Yankees will win the World Series, right? That’s clear to everybody? But there’s a much faster way of doing it and putting it on a computer, and that’s using the law of iterated expectations.
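The path-by-path calculation described here can be sketched in a few lines of Python. This is only an illustration: the function name `series_win_prob_by_paths` is made up for this sketch, and it assumes, as in the lecture, that all games are played out and each game is an independent .6/.4 draw.

```python
from itertools import product

P_WIN = 0.6  # chance the Yankees win any single game (from the lecture)

def series_win_prob_by_paths(p, n_games):
    """Sum the probability of every win/loss path in which the Yankees win a majority."""
    total = 0.0
    for path in product([1, 0], repeat=n_games):  # 1 = Yankee win, 0 = Yankee loss
        wins = sum(path)
        path_prob = p ** wins * (1 - p) ** (n_games - wins)
        outcome = 1 if wins > n_games / 2 else 0  # 1 if the Yankees take the series
        total += path_prob * outcome
    return total

print(round(series_win_prob_by_paths(P_WIN, 3), 3))  # 0.648
```

The loop visits 2 to the power `n_games` paths, which is exactly the exponential growth the recombining tree avoids.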

So first of all–so this is called a tree. So we’re going to use trees all the time. So tree, I don’t want to formally define it. It’s just you start with something and stuff can happen. Stuff happens every period, and so you just write down all the things that can happen. And then you write down all the things that can happen after that and the thing unfolds like a tree. That’s formal enough to describe a tree and here we’ve got it.

But you notice that in the tree the number of things happening grows exponentially. It’s horrible to have to compute something growing exponentially, but they’re often recombining trees. Oh, so if I ask, by the way, in this tree whatever the opinion is here, which turns out to be .68 something [correction: .648], yeah, I should have asked you to guess.

If you write down the opinion that opinion has to be the average of the opinion here and the opinion here. So if I take the opinion here times .6 plus the opinion here times .4 that’s also going to equal .68 [correction: .648]. And that’s what’s going to be the key to computing the thing much faster rather than going through every branch which is such a pain because there are an exponentially growing number of paths, very bad to have to compute by hand.

But we notice that we can look at a recombining tree. These two nodes are essentially the same. What difference does it make if the Yankees win one and lose one, or lose one and win one? In both cases they’re at the same spot. They’re even in the World Series. And since we assume the probability of winning any game is the same, .6 and .4, independent of what’s happened before–you might think you’re learning something about, “Oh, their starter pitched here and he didn’t last the whole game,” and stuff like that. So I’m not allowing for any of that. I’m just saying it’s a (.6, .4) chance for the Yankees to win no matter what happens.

So all you care about at any point from then on is who’s won how many games. So these nodes are basically identical, and these nodes are identical, because it all ended up with the Dodgers ahead 2 to 1, and here the Yankees were ahead 2 to 1, and here the Yankees were ahead 3 to 0, and 0 to 3. So the recombining tree which has all the same information is just this, this, this, this tree.

So this tree only has 1, 2, 3, 4, 5, has far–it’s 1 node, 2 nodes, 3 nodes and 4 nodes as time goes by growing linearly instead of growing 1, to 2, to 4, to 8 which is growing exponentially. So I could have a very long World Series and write it as a finite tree and just .6 and .4 here at every stage. So how am I going to solve this now?
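As a quick illustration of the linear versus exponential growth, the two node counts can be tabulated. The helper names below are hypothetical, chosen only for this sketch.

```python
def full_tree_nodes(t):
    """Nodes at date t when every win/loss history is kept separate: 1, 2, 4, 8, ..."""
    return 2 ** t

def recombining_tree_nodes(t):
    """Nodes at date t when only the number of wins so far matters: 1, 2, 3, 4, ..."""
    return t + 1

for t in range(8):
    print(t, full_tree_nodes(t), recombining_tree_nodes(t))
```

After 7 games the full tree has 128 end nodes, while the recombining tree has 8.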

Well, over here I know the Yankees ended up winning all 3 games. Here they won 2, here they won 1, here they won none. So those are the outcomes. So instead of trying to figure out path by path, through these exponential number of paths what the chances of each path are, why it’s hard to compute here, it’s .6 times .6 times .4, a complicated calculation, I’m now going to do something simple. I’m going to say, what would I think if the Yankees had already won 2 games? Well, I know that they would win. That’s a 1. The series is already over.

What would I think after the Dodgers won the first two games? I’d know it was over. What would I think–so how did I get that? It’s .6 times 1 + .4 times 1. That’s 1, the Dodgers .6 times 0 + .4 times 0 that’s 0, so that’s my opinion if the Dodgers win 2 games. Here’s my opinion if the Yankees won 2 games. What would my opinion be if they split? Well, if they split what would my opinion be if I started here? So after game 2 they’ve each won 1 game. I don’t know who won the first one, but it was 1 to 1 after 2 games. Now what would I think?

Student: .6 times 1 + .4 times 0.

Professor John Geanakoplos: Exactly, so it’s .6. It’s .6 times 1 + .4 times 0. So the odds, I would think, the Yankees would win the World Series here with 1 game left knowing that they win 60 percent of the time it’s .6. But now what do I think if the Yankees win the first game? What’s my opinion?

Student: .6 times 1 + .4 times .6.

Professor John Geanakoplos: So it’s .6 times 1, so it’s .6 + .4 times .6, so that’s .24, so that’s .84 here, and what’s my opinion after the Yankees lose the first game and the Dodgers win? What do I think is going to happen? What will my opinion be here? It’s .6 times having an opinion of .6, so it’s .36, + .4 times knowing that it’s all over, that’s .4 times 0. So it’s equal to .36.

So I’ve now figured out–not only am I solving this thing much faster than I could over there, but I’m finding interesting numbers on the way. I’m now figuring out what would I think after the Yankees won the first game? Well, now I think it’s 84 percent. What would I think after the Dodgers won the first game? I’d think it was only a 36 percent chance of the Yankees winning. So now what’s my opinion at the very beginning? It’s .6 times .84 (it’s my chance of having this opinion plus my chance of having that opinion) + .4 times .36. Oh no, 504 (maybe) + 144 what is that?

Student: .648.

Professor John Geanakoplos: .648, 6 times 84 looks like 504 and 4 times 36 looks like 144, so it looks like .648 and that’s what you said. So that’s it. I’ve solved it now. So that’s the method of iterated expectation and we’re going to turn this into quite an interesting theory in a second, but I want to now put that on a computer to show you just how completely obvious this is, I mean, not obvious, fast this is. So you could solve for any number of–a series of any length you could instantly solve.
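The node-by-node arithmetic for the 3 game series can be written out step by step. This is a minimal sketch of the calculation just done on the board; the variable names are illustrative only.

```python
p = 0.6  # chance the Yankees win any single game

# Opinions after two games, keyed by the number of Yankee wins so far.
after_two = {
    2: 1.0,                  # Yankees up 2-0: series already won
    1: p * 1 + (1 - p) * 0,  # tied 1-1: one game decides it
    0: 0.0,                  # Dodgers up 2-0: series already lost
}

# Opinions after one game: average of the two opinions that can follow.
after_win = p * after_two[2] + (1 - p) * after_two[1]
after_loss = p * after_two[1] + (1 - p) * after_two[0]

# Opinion before the series starts.
start = p * after_win + (1 - p) * after_loss

print(round(after_win, 2), round(after_loss, 2), round(start, 3))  # 0.84 0.36 0.648
```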

Now, we’re going to price bonds that way too. So class–so what did I do? I–this is a spreadsheet you had. I simply had the probabilities of the Yankees winning which was .6, which I could change.

Student: Can you lower the screen?

Professor John Geanakoplos: Oh.

Student: Thank you.

Professor John Geanakoplos: So this is the simplest thing to do, but now suppose that–so we said the Yankees can win every game with probability .6. So then what did I do? I went down to here. I gave myself some room. I didn’t do a very long series. So now what does each of these things say? Each of these nodes, like that one, says, if I can read it, it says–so this is my opinion of winning the World Series. It says my opinion here is going to be the chance I go up. That’s the probability, that’s A 2, that’s .6, the chance I go up times what my opinion would be over here, plus the chance that I go down, which is here, the chance I go to here which is 1 minus that number .6 that’s frozen up there, times whatever I thought would be my opinion here.

So you see that’s the same–I just write that once. I wrote that once here, that thing about the probability, my opinion there is the probability of going up. That’s $A$2, dollar A dollar 2, that’s .6, it’s frozen, times what my opinion would be in the square over 1 and up 1, plus 1 minus $A$2 times my opinion over 1 and down 1. So I just copied that as many times as I wanted to down the column and then I copied it again across all the rows. So all of these entries are identical, they’re all just copies of each other. So it just says iterate your opinion from what you know it was forward.

Now, how do I take a 3 game World Series? Well, we’re starting here. This’ll be game 1, game 2, game 3, so all I have to do now is put 1s everywhere here like 1 enter, and now I’ll copy this, ctrl, copy, and go all the way down here. So that’s it. So we’ve got all the numbers. So why is that? Because my opinion here–remember the numbers we got? The series goes 1 game, 2 games, 3 games, so if you end up above the middle that means the Yankees won the majority of games. Your payoff is 1. Your probability of the Yankees winning is 1. So now what’s your opinion going to be?

If you’ve won 2 games then the Yankees have to have won. What if the Yankees win the first game? Remember the numbers we got 1, and .6, and 0, so here’s the .84. It’s the average of 1 and .6. Here’s the .36 which was the average of .6 and 0. And then we come down to the middle which is .648.

So what do I do if I want to play a 7 game World Series? I have to get rid of this, and if it’s a 7 game World Series I would just–now I want to restore what I had before, so I’m going to copy all this, ctrl, copy, ctrl. So I’m back to where I was before. So you see what I’m doing here? The game hasn’t started. This is the first game, second game, third game, fourth game, fifth game, sixth game, seventh game. Every square is just saying my opinion is my average of what my opinion will be next time.

If I want to make it a 7 game World Series I just plug in 1s here. There must be some faster way of doing this, but I plug in 1s here. So ctrl, copy and here are all the 1s down to above the thing, ctrl V, and now I’ve solved my opinion backwards and I’ve got the chances of the Yankees winning a 7 game World Series are 71 percent. So the longer the World Series goes the better the chances are the Yankees win if they’re better in each individual game, and you can do it instantly. So are there any questions about that?
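The spreadsheet logic carries over to a short program. The sketch below is not the class spreadsheet itself; it is an assumed equivalent in which `series_win_prob` (a name invented here) puts 1s at the end nodes where the Yankees have a majority, 0s elsewhere, and then averages backwards one game at a time.

```python
def series_win_prob(p, n_games):
    """Chance of winning a majority of n_games, by backward induction on the recombining tree."""
    # Terminal layer: index k is the number of Yankee wins after all games are played.
    values = [1.0 if k > n_games / 2 else 0.0 for k in range(n_games + 1)]
    # Step back one game at a time; a node with k wins leads to k + 1 wins (up) or k wins (down).
    for games_played in range(n_games - 1, -1, -1):
        values = [p * values[k + 1] + (1 - p) * values[k]
                  for k in range(games_played + 1)]
    return values[0]

print(f"{series_win_prob(0.6, 3):.3f}")  # 0.648
print(f"{series_win_prob(0.6, 7):.3f}")  # 0.710
```

Each step only touches one layer of the tree, so the work grows with the square of the series length rather than exponentially.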

So that is a trick we’re going to use over and over again to price bonds. You do it by backward induction because of the law of iterated expectations. Your opinion today of what’s going to happen way in the future when you get a lot of information has to be the average opinion you’re going to have after you get some information, but before you know what the final outcome is.

And so realizing that, you just take the pieces of information one by one and work backwards from the end and you can solve things instantly which would take in the brute force way an exponentially growing length of time to do if you did them path by path.

### Chapter 4. Uncertainty in Interest Rates [00:53:39]

I now want to turn to an application of this to one subject, which is, let’s just not do the World Series. Let’s do a more interesting problem. I hope I have time to finish this story.

So the more interesting problem is this. Let’s suppose our uncertainty’s of a different kind. Instead of not knowing the outcome of the World Series let’s say we don’t know how impatient we are.

So remember the most important idea so far that we’ve seen, because we haven’t done uncertainty yet, the most important idea we’ve seen so far is impatience. That’s the reason why you get an interest rate and the interest rate is the key to finding out the value of everything. So Irving Fisher put tremendous weight on impatience.

And now that we’re talking about uncertainty the natural thing to make uncertain is how impatient you’re going to be. So we want to talk a little bit more about impatience. So impatience by Irving Fisher is the discount. So in fact I want to talk about this in sort of realistic terms. Do we really believe that people just discount the future, 1 year they discount by delta, 2 years discount by delta squared, 3 years by delta cubed, 4 years by delta to the fourth. Is it really true that every year people think of as delta less important as the year before?

I mean, the argument for this is you might not live beyond a certain–you know, poor imagination, so imagination, poor imagination, we’ve said this before, poor imagination and mortality are the two arguments for discounting. But let me tell a story that seems to contradict that. Suppose someone asks you to clean your room and they give you a choice of doing it–I can give my son for example.

Say I–“Clean your room Constantin,” and so if I say do it today or do it tomorrow that makes a huge difference to him, I mean just a huge difference doing it today from doing it tomorrow. He’ll think doing it today is just impossible, doing it tomorrow I can almost force him into agreeing to that. So clearly there’s a big discount between today and tomorrow, but what about between a year from now and a year and a day from now? Do you think Constantin will think there’s any difference in that? The answer is no.

If I say, “Constantin, do you agree to clean it 365 days from now or 366 days from now,” to him there’s hardly any difference, there’s hardly any tradeoff. One is hardly more valuable than the other, of course, they’re both pretty unimportant, but the ratio of the two doesn’t even seem important to him. So that’s called hyperbolic discounting.

If you do any experiment with people or with animals, you make a bird do something and if he does more stuff he gets the things faster, he’ll do a lot of stuff to get it in the next minute as opposed to in 2 minutes, but the difference between what he’ll do in 10 minutes versus 11 minutes is very small.

So hyperbolic discounting is discounting much less than exponential discounting. So this has a tremendous importance for the environment.

If you thought that people exponentially discounted like they thought each year was only 95 percent–if the interest rate’s 5 percent it sounds like the discounting is .95, so if next year’s only 95 percent as important as this year, and the year after that is only 95 percent as important as the first year, and the third year is only 95 percent as important as the second year, .95 in 100 years to the hundredth is an incredibly small number.

So there’s no point in doing something today and investing a lot of resources in order to clean up the environment and help people 100 years from now, because, discounting it this much, you know, what’s the difference? The future’s so unimportant. You shouldn’t be investing resources now to do something that’s going to have such a small effect later. So in all the reports on the environment a crucial half of the report is devoted to what the discount rate should be.
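To put a rough number on how small .95 to the hundredth is, here is a one-line check. It assumes a constant annual discount factor of .95, as in the example above.

```python
delta = 0.95  # assumed constant annual discount factor
years = 100

weight = delta ** years  # weight placed today on a dollar of benefit 100 years out
print(f"{weight:.4f}")  # 0.0059
```

So under exponential discounting at this rate, a benefit 100 years away counts for well under 1 percent of the same benefit today.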

So, but they never thought of doing the most obvious thing which is to ask what would happen if the discounting was uncertain. All of these are certain discount rates. So what if you made the discounting uncertain what would you imagine doing? So suppose you discount today at 100 percent, and maybe next period you’re going to discount at 200 percent, this is the interest rate, and here it might go down to 50 percent. It could go up to 400 percent or it could go down to 100 percent again, or it could go down to 25 percent, you know, this kind of discounting I have in mind.

You don’t know–so delta = 1 over (1 + r), and this is r, r0, rup, rdown. So maybe the discount is uncertain and it goes like that. So it’s a geometric random walk. I keep multiplying or dividing by 2. I multiply or divide by 2. I multiply or divide by 2. That seems to make for a lot of discounting. These numbers are going up very fast. The higher the r, the less you care about the future.

So the question is if you ask for a dollar sometime in the future, what will people be willing to pay for it? So you know today that you think the future is only half as important as the present. Let’s say these all have probability of half. And tomorrow it might be that you think the future is only 2 thirds, the next year’s only 2 thirds as important as that current year, or you might think the future’s only 1 third as important as this year.

So you see how this is working? Two years from now you might think the future’s only 1 fifth, the third year’s only 1 fifth as important as the second year. Here you might think the third year is half as important as the second year. Here you might think it’s 4 fifths as important as the third [correction: second] year.
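The interest rate tree and the one-period discounts read off it can be generated mechanically. This sketch assumes, as described above, that the rate starts at 100 percent and doubles or halves each period; the function names are hypothetical.

```python
from fractions import Fraction

R0 = Fraction(1)  # starting interest rate of 100 percent

def rate(t, ups):
    """Interest rate at date t after `ups` up-moves (each up doubles r, each down halves it)."""
    return R0 * Fraction(2) ** (2 * ups - t)

def one_period_discount(t, ups):
    """delta = 1 / (1 + r) at that node."""
    return 1 / (1 + rate(t, ups))

for t in range(3):
    # List nodes from the highest rate to the lowest.
    row = [(str(rate(t, u)), str(one_period_discount(t, u))) for u in range(t, -1, -1)]
    print(t, row)
```

At date 2 this gives rates of 4, 1 and 1/4 (that is, 400, 100 and 25 percent) with discounts 1/5, 1/2 and 4/5, matching the numbers above.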

So you don’t know what it’s going to be, and if anything this process seems to give you a bias towards getting really high numbers, high discounts, meaning the future doesn’t matter. So, but nobody bothered to stop–so this is the most famous interest rate process in finance.

This is called the Ho-Lee interest rate model where you think today’s interest rate might be 4 percent. Maybe it’ll be 10 percent higher next year or 10 percent lower and it’ll keep going up and down like that, and that’s the uncertainty about the interest rate. So if we think interest rates are so important, and patience is so important, and we want to add uncertainty, the first place to do it is to the interest rate, and the Ho-Lee model in finance does that.

Nobody bothered to compute this out more than 30 years. Compute what out? Suppose you get 1 dollar for sure in year 1. How much would you pay for 1 dollar in year 1? Well, your discount is 100 percent. You’d pay 1 half a dollar. How much would you pay for 1 dollar in year 2?

Well, you know how much more a dollar now is worth than 1 year from now, but you don’t know 2 years from now so you have to work by backward induction.

Here 1 dollar for sure is worth 1 dollar. What would I pay for it here? I’d pay 1 third of a dollar. What would I pay for it here? Well, the discount is 2 thirds. I’d pay 2 thirds of a dollar. So what would I pay for it back here? I’d pay 1 half times 1 third + 1 half times 2 thirds discounted by 100 percent. So that’s 1 sixth + 1 third which is 1 half, times 1 half, which is 1 quarter, I guess. So I’d pay 1 quarter.

So for any time I could figure out D(t) = amount I would pay, I’m going to be done in one minute, amount I would pay today for 1 dollar for sure at time t. And that number, obviously, is going to go down as t goes up, and we know how to compute it by backward induction. You just put the 1s further and further out and then you go backwards by backward induction.
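The backward induction for D(t) can be sketched as follows. This is an illustration under the lecture's assumptions (rate starts at 100 percent, doubles or halves each period with probability 1 half); `price_of_dollar` is a name invented for this sketch, not the course spreadsheet.

```python
from fractions import Fraction

def price_of_dollar(T, r0=Fraction(1), prob_up=Fraction(1, 2)):
    """D(T): amount paid today for 1 dollar for sure at time T."""
    # At time T the dollar is worth 1 at every node (index = number of up-moves).
    values = [Fraction(1)] * (T + 1)
    for t in range(T - 1, -1, -1):
        stepped = []
        for ups in range(t + 1):
            r = r0 * Fraction(2) ** (2 * ups - t)  # interest rate at this node
            expected = prob_up * values[ups + 1] + (1 - prob_up) * values[ups]
            stepped.append(expected / (1 + r))     # discount next period's average value
        values = stepped
    return values[0]

print(price_of_dollar(1), price_of_dollar(2))  # 1/2 1/4
```

Calling `price_of_dollar` with larger T extends the same calculation to longer horizons.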

But just like for the World Series I could do that for any T however big I want to, and on a computer, and the spreadsheet which I wrote for you, you could do this instantly. And nobody bothered to do this for T bigger than 30 because bonds basically don’t last for more than 30 years, so what’s the point in doing it for T bigger than 30? So 100 years–there are virtually no financial instruments that are 100 years long because they didn’t bother to do this.

Suppose you did it for every T up to 1,000 years? Well, you could do it on a computer very easily. You could even prove a theorem of what it’s like. So in the problem set I’m going to ask you do a few of these, and what you’re going to find is that people are hyperbolic–that you get–you discount a lot. It’s pretty close to 100 percent for the first few periods, but after that you’re going to be–anyway, you’re going to find out what the numbers turn out to be when you do it on a computer. So we’re going to start with random interest rates next period, the most important variable in the economy.

Student: What’s the next problem set?

Professor John Geanakoplos: Oh, so it’s going to be a very short problem set just doing the World Series and this random interest rate thing. It’s due on Tuesday, so I’ll put it on the web right now. I wasn’t sure how far I’d get today. And your exams are available.

[end of transcript]