[Thomas Williams originally wrote this for his blog Twenty-Five Squared, but was kind enough to share it with the iRunFar audience.]
I did not enter the lottery for the 2011 Western States run, and I’m not entered otherwise, so I can claim at least some objectivity for the following critique of the lottery process.
For the 2010 run, it became clear that the “two-time loser” policy for getting into the run – not picked in the lottery two years in a row; you’re in the next year, assuming you qualify – had to go. The numbers had become impossible: before very long the entrants would have been entirely two-time losers, and even they would not all have fit under the cap imposed by the Forest Service on the number of runners who can pass through the Granite Chief Wilderness at one time. So there had to be a change, and there are no more two-time losers. Instead, for every year you did not get selected in the lottery, you would get an additional ticket the following year, assuming you qualified and wanted to enter. If you were not selected one year, you would have two tickets the next year; if you were not selected the second year, you would have three tickets the following year, and so on.
This process will not work. The odds for people with multiple tickets never improve as much as might seem intuitive. In fact, depending on how many people enter the process for the 2012 run, it could quite easily happen that the odds for the three-ticket holders are worse than their odds the prior year, when they held two tickets. The odds for multi-ticket holders will never be 50 percent, or even close, as was reported at the lottery for the 2011 run.
The Western States lottery fits a type of statistical problem that has the scary name hypergeometric distribution. It is like a binomial distribution (itself, not a very comforting name) except that there is no “replacement,” which means that if your ticket is drawn, you are removed from the process; in other words, you cannot get selected more than once.
For the 2011 run, there were 2115 tickets in the lottery: 1113 for people who had one ticket, and 1002 (501 times two) for people who entered the process for the 2010 run, did not get selected, and wanted to try their luck again. There were 213 slots available before the “had to be at the lottery” selections. If you plug those numbers into a hypergeometric distribution (Excel makes it pretty easy to do that with the HYPGEOMDIST function), the result is that people who had two tickets had a probability of being selected of just over 18 percent, while people with one ticket had a probability of just over 10 percent. These are the probabilities of being selected once over the entire process, a cumulative view, not the probability at any individual draw. Put another way, the probability of a person having two tickets not getting selected at all was about 82 percent, and for a person having one ticket about 90 percent, given 2115 total tickets and 213 spots available.
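For readers without Excel, here is a minimal Python sketch of the same kind of calculation. It rests on a simplifying assumption that is mine, not a description of the actual drawing: exactly 213 tickets are drawn without replacement from the 2115, whereas in the real lottery a selected person's remaining tickets drop out. The function names are only illustrative. Under this model, the figures quoted above match the chance that exactly one of a holder's tickets is drawn; the chance that at least one is drawn comes out slightly higher for two-ticket holders.

```python
from math import comb

TOTAL_TICKETS = 2115  # 1113 one-ticket entrants + 2 * 501 two-ticket entrants
SLOTS = 213           # assumption: treated as a fixed number of tickets drawn


def p_exactly(j, k, draws=SLOTS, total=TOTAL_TICKETS):
    """Hypergeometric probability that exactly j of a holder's k tickets
    are among `draws` tickets drawn without replacement from `total`."""
    return comb(k, j) * comb(total - k, draws - j) / comb(total, draws)


def p_at_least_one(k, draws=SLOTS, total=TOTAL_TICKETS):
    """Probability that at least one of a holder's k tickets is drawn."""
    return 1 - p_exactly(0, k, draws, total)


for k in (1, 2):
    print(f"{k} ticket(s): exactly one drawn {p_exactly(1, k):.1%}, "
          f"at least one drawn {p_at_least_one(k):.1%}")
```

With these inputs, a one-ticket holder comes out at about 10.1 percent. For a two-ticket holder, the chance of exactly one ticket being drawn is about 18.1 percent, and the chance of at least one is about 19.1 percent.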
As the special-consideration entrants continue to come into the run, it is hard to tell now how many one-ticket holders got selected, and how many two-ticket holders, during the lottery process. Of the 1113 one-ticket holders originally listed, 963 are now listed, while of the 501 two-ticket holders originally listed, 400 remain listed. That leaves 150 of the one-ticket holders and 101 of the two-ticket holders, which means that almost 14 percent of the one-ticket holders got in – by one means or another – and just over 20 percent of the two-ticket holders got in – not too far off the statistical expectation. But the total, 251, is more than the total selected at the lottery, 213. So let’s just guess that 120 of the one-ticket holders were selected in the lottery process while 93 of the two-ticket holders got in. That would put the probability very close to the statistical expectation, which might give us some confidence in the statistical analysis.
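The arithmetic in that paragraph can be checked with a few lines of Python. This is only a sketch of the bookkeeping: the counts are the ones given above, the 120/93 split is the guess made above rather than observed data, and the variable names are illustrative.

```python
# Entrants originally listed and still listed, per ticket count (from the text).
listed = {1: 1113, 2: 501}
remaining = {1: 963, 2: 400}

# People no longer on the list got in by some route (lottery or otherwise).
got_in = {k: listed[k] - remaining[k] for k in listed}
share_got_in = {k: got_in[k] / listed[k] for k in listed}

# Guessed split of the 213 lottery selections (a guess, not observed data).
guessed_lottery = {1: 120, 2: 93}
share_guessed = {k: guessed_lottery[k] / listed[k] for k in listed}

for k in listed:
    print(f"{k}-ticket holders: {got_in[k]} got in ({share_got_in[k]:.1%}); "
          f"guessed lottery share {share_guessed[k]:.1%}")
```

The guessed split gives roughly 10.8 percent for one-ticket holders and 18.6 percent for two-ticket holders.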
So, the probability for the two-ticket holders was not as good as we might have thought, but surely, next year, when the new form of “two-time loser” has a third ticket, the odds go up? Maybe not. The odds could go up, but could easily go down, and in any event remain lower than might intuitively seem to be the case. Let’s say that for the 2012 run we have 1000 one-ticket holders, 700 two-ticket holders, and 300 three-ticket holders (3300 tickets in all), assuming that about two-thirds of the one-ticket holders who did not get in for the 2011 run decide to try for the 2012 run and maybe three-quarters of this year’s two-ticket holders decide to try their luck again. How do those numbers crunch in a hypergeometric distribution? The one-ticket holders would have a probability of a little over 6 percent; the two-ticket holders, a little over 12 percent; and the three-ticket holders, just under 17 percent – worse odds for the three-ticket holders than they had as two-ticket holders. Let’s stretch the guesswork out to the 2014 run, with 200 people who hold five tickets and 250 people who hold four tickets; the odds for the five-ticket holders are about 17 percent – still less than their odds as two-ticket holders. This process does not work.
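The hypothetical 2012 field can be run through the same kind of calculation. The sketch below assumes, as my own simplification, that the number of slots stays at 213 and that a fixed number of tickets is drawn without replacement; the field sizes are the guesses from the paragraph above, and the names are illustrative.

```python
from math import comb

SLOTS = 213  # assumption: same number of lottery slots as in 2011

# Hypothetical 2012 field: tickets per person -> number of people.
field_2012 = {1: 1000, 2: 700, 3: 300}
total_tickets = sum(k * n for k, n in field_2012.items())


def p_exactly_one(k, draws, total):
    """Probability that exactly one of a holder's k tickets is drawn."""
    return k * comb(total - k, draws - 1) / comb(total, draws)


def p_at_least_one(k, draws, total):
    """Probability that at least one of a holder's k tickets is drawn."""
    return 1 - comb(total - k, draws) / comb(total, draws)


# Map each ticket count to (exactly one drawn, at least one drawn).
odds_2012 = {
    k: (p_exactly_one(k, SLOTS, total_tickets),
        p_at_least_one(k, SLOTS, total_tickets))
    for k in field_2012
}

for k, (exactly, at_least) in odds_2012.items():
    print(f"{k} ticket(s): exactly one {exactly:.1%}, at least one {at_least:.1%}")
```

For the three-ticket holders this gives about 17.0 percent for exactly one ticket drawn and about 18.1 percent for at least one. Either way the figure is lower than the matching figure for a two-ticket holder in the 2011 field of 2115 tickets (about 18.1 and 19.1 percent respectively).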
All numbers aside, I find myself more concerned about the purely human effects involved. In a statistical distribution, there are “outliers” – a few very lucky people and a few very unlucky people. I am concerned about what is obviously some pretty significant emotional devastation for some really unlucky people who work themselves to the bone year after year to have a shot at getting into the Western States run and do not get in. This problem is well known, used to be solved by the “two-time loser” policy, does not have a new solution, and is not helped when people are just as competitive about getting into the run as they are about running the event.
I don’t have any bright ideas about how to fix the problem, but I think that we need to open the discussion and, for whatever ideas make the initial cut, apply some probability theory to them and see how they shake out. Let’s not forget that people are involved here, and that the system should be as fair as the people who care about it can make it. One thing that I think would help the larger process is transparency about the selections. Each person in the run should have an annotation about the selection method, such as “got in at the lottery” or “selected by Michigan Bluff aid station” or “Montrail sponsor selection” or “won Miwok” or “the Board feels that this runner will bring a lot of good beneficial publicity to the race.” The impression, shared by many, of a “smoke-filled room” of special consideration does not serve the Western States run well. Perhaps people could apply for special consideration, make their case in writing, and be subject to a vote of the applicants, of the people.
Call for Comments
I welcome your comments, and especially your ideas on a statistically sound method for improving ‘lottery loser’ odds. The goal is odds that not only increase from one year to the next but increase significantly, so that the chance of the truly unlucky never being selected, year after year, would be very, very small, unlike under the current system.
I should mention that, as far as statistics go, I am a hobbyist at best, and I really welcome input from those who are more than that.