What is randomness?

The code project I'm currently working on requires generating numbers for simulating card games. This is a notoriously perilous task for programmers. It also brings up interesting (at least to some of us) questions about just what randomness is, a notoriously perilous task for philosophers.

It turns out (unsurprisingly) that the mathematician's perspective on randomness closely matches the gambler's perspective. “Random”, gamblers and mathematicians will tell you, is the opposite of “predictable”. That is, the extent to which we can predict anything about the outcome of some experiment ahead of time, those results are not random. Or even more toward the gambler's point of view, a random process is one on which we cannot—even in principle—make a bet in our favor.

This differs considerably from what most people informally think of as “random”. Most of us have an intuitive sense that random things are evenly distributed, which is true in the very very long run, but not true at all on the scales we generally experience things. This intuition is called the “gambler's fallacy”, because gamblers who bet expecting things to even out in the short run keep me employed. Just because that seven hasn't been rolled in a while, or we've just seen a run of black on the roulette wheel, that doesn't mean that upcoming rolls or spins will have any bias—any predictability—compared to previous rolls. Dice, it is often said, have no memory.

This leads to some unintuitive results: if we flipped a coin 10 times, and it landed heads all 10 times, we might rightly suspect that it wasn't a fair coin. But if we flipped it a million times, and there were not a single run of 10 heads in a row in the whole sequence, we would also reject that coin as non-random, because it's distribution was too even. After billions of rolls of the dice, or billions of cards dealt, or billions of spins of the wheel, we would expect all the possible outcomes to be roughly—but not exactly—even. But in the short run, lopsided results are commonplace, and in fact failing to find such streaks is evidence of non-randomness. And here's an important clue: your lifetime is the short run. If it were true that things “evened out” in the short run, that would be a statistical bias that you could bet on and make money. People do bet on those feelings, but it's the casino who makes money, because they're betting on randomness.

There is one exception to note in casino games: blackjack. Cards don't have memory either, but the shoe from which cards are dealt does. If you've been watching 40 hands of blackjack and not seen a single face card, you can be sure that the cards remaining in that shoe, until it is reshuffled, are overly rich with face cards, and change your bets accordingly. The fact that you will see more face cards in the next few hands than you would expect from a fresh shoe is statistically predictable, and you can therefore make money from that fact, even if each individual card is still randomly chosen from those remaining. Card counting works because short-run clusters of events are predictable.

Randomness and science

People who misunderstand math and science get randomness wrong all the time. You will hear a silly argument from creationists that creating life from random processes is like a tornado in a junkyard randomly assembling a 747. The mistake here (well, one among many) is that they think a process that merely contains some random element is therefore a random process. Yes, random mutation is an important element in evolution, but less important than the process of natural selection, which is not random at all. Imagine, for example, your parents, and their parents, and their parents, going back thousands of generations. That's a lot of people. Now ask: how many of those people died at birth? How many died before puberty? How many were sterile? How many just didn't have the desire or opportunity to have children? These questions are easy: zero. 100% of those people, without exception, successfully reproduced. 100%. That's the exact opposite of random. That's why natural selection is such a powerful force. Even with random variation as input, its output is remarkably complex and and sometimes gives a magnificent illusion of design—and some things that clearly aren't designed.

Even real scientists get randomness wrong. Doctors who find clusters of people with certain cancers, for example, often mistakenly jump to the conclusion that there's some environmental cause, when in fact a purely random distribution would inevitably produce such clusters. Only if the clusters are much worse than expected by randomness—as determined by a good mathematician—should we start looking for another cause. This is called the “Texas sharpshooter” fallacy, after a supposed shooter who fires shots randomly at the side of a barn, then walks over to the biggest clump of holes and draws a target there. Cancer clusters are just like our 10 heads in a row: if the numbers are big, we should expect them a certain number of times.

Even Dr. Steven Novella, host of the Skeptics Guide to the Universe podcast, has said on the air that the digits of pi are random. They are not. First of all, they are 100% predictable by calculating pi. But even outside of that, it is possible to find statistical biases in the digits of pi without actually calculating them out. For example, one can mathematically prove, say, that the 10 billionth digit of pi is a bit more likely to be a 7 than a 6 without calculating pi all the way out to 10 billion digits. Such statistical properties of the digits of pi are quite common, and it means that you could make a bet on one of those digits at fair odds and make money. If the digits were random, this would not be possible.

Programmers get randomness wrong all the time. This includes, unfortunately, the programmers who write the standard function libraries for most programming languages. When you learn to program, you will probably be given a programming exercise of some simple card or dice game, and be taught about the standard function in your language for producing random numbers. For a simple game run a few times, this is probably fine. But if you try to make real industrial-strength simulations of billions of hands or rolls, your built-in random function will probably fail in more than one way. It may be too evenly or too unevenly distributed, or it may repeat itself too soon.

So your language's built-in function is probably bad. And if you try to do it yourself, you'll probably make a worse one. So what should you do? Find an add-on library written by a serious math geek who you trust to get it right. And there's more: even if you have a perfect random number generator, you can use it the wrong way and still get bad results. Shuffling a deck of cards, for example, requires not only that you select cards at random with a good algorithm, but that you correctly rearrange the cards in such a way that each possible arrangement is equally likely, and this is easy to get wrong. There are even issues with the purpose for which the numbers are used: a good algorithm for choosing cards for a simulation might not be the best for choosing a cryptographic key, and vice versa. The well-known algorithms have fancy names: cryptographers use algorithms called RC4, Yarrow, and Fortuna. People writing simulations use algorithms called Mersenne Twister and CMWC.

The moral of this tale is short: if you think you know what randomness is, think again. Maybe consult a mathematician. If you're writing code that uses randomness, definitely consult a mathematician. It's much easier to get it wrong than you think. And test the code. I have as many lines of test code in my library to verify the random number generator as I have to generate the numbers. There's also a good program called “dieharder” that runs a suite of statistical tests on your generator to prove its quality (and which will, by the odd nature of the beast, occasionally—randomly—fail even when your code is perfect!)

Just to give you a glimpse of the level of complexity of the problem, this page shows the code from my OneJoker library that generates random numbers and shuffles cards, and does nothing else. Over 150 lines of code to ensure that the next card you get is, in fact, unpredictable.

Why Python?

I recently counseled a friend who wanted to learn about computer programming to start by learning the Python language. I also mentioned that I liked Python to an old friend who is a fellow experienced programmer, but I wasn't very clear about why. Now that I'm in the middle of a project that uses both the Python and C languages, I've come to better understand my reasons for favoring Python both for learning to program and for serious use.

Computer programming is, at its core, communication. At the lowest level, a program instructs a computer how to solve a problem. But at a more important level, a program communicates to people the thought process of the programmer in translating a vague problem into a specific solution. A programming language, then, should be expressive. That is, it should be easy for a programmer to concisely and accurately describe his thoughts, and it should be easy for someone reading the code (often that same programmer years later) to understand the original programmer's intent.

A brief history of programming

Computer hardware is reasonably simple, conceptually (though there are certainly complex details). There is a large memory that stores numbers. Programs move these numbers from one place to another in memory and do math operations on them. Attached devices use numbers to represent dots of color on a screen, letters on a keyboard or printer, the position of a mouse, the movement of a loudspeaker.

The computer also uses numbers in memory to represent instructions to itself. This is called “machine code”, and was how the very first programmers had to program. If I wanted to write the letter “A” to the screen, I had to know that the screen interprets the number 65 as that letter, and that putting it on the screen involved writing that number to a specific address in memory, and that the instruction code for “write this number to this address” is another number, and so on. Then I put the numbers 248, 65, 2, 193 in the right place in memory and press a start button. This worked, but was complex, tedious, and error-prone.

It is no accident that the first programs written by early programmers were tools to simplify programming. One such tool is called an assembler. This is a program that takes computer instructions described as more human-readable text and translates them into the raw numbers. For example, I might give the memory address of the screen the name screen. The number representing the write-to-memory instruction is named write, and so on. Now I can type write screen, "A", and the assembler will translate that into 248, 65, 2, 193 for me. Assembly instructions are an exact one-to-one match for machine instructions. They are just a more convenient—and more expressive—way to write them.

The next tools were real programming languages, which are a level of abstraction above machine code. Instead of describing machine instructions directly, the programmer used expressions like y = (x + 3) / 2, and a program called a compiler would translate that into a string of instructions to load numbers from memory, do the math, and write them back. The details of exactly how that was accomplished were delegated to the compiler program. A computer is meant to solve problems, so why not have it solve problems about how to program itself? FORTRAN was the first popular such language, although the C language overtook it and remains popular today.

In addition to compilers, there are programs called interpreters that do roughly the same job of translating a programming language into machine instructions, but do so on the fly, as the program is running. This makes using them even simpler since a programmer can try things interactively and get immediate feedback. BASIC is the canonical example of this kind of interpreted language.

Modern languages

Programming languages today add one more level of abstraction. Instead of operating only on numbers directly, or even names given to numbers, they allow you to describe “objects”, which are large collections of numbers that represent real-world things like people, places, accounts, documents, and so on. You express actions in terms of these objects, and the compiler or interpreter then decomposes these into the actions needed on the individual numbers and finally into machine code.

The measure of a modern programming language then is twofold: how well does it allow the programmer to clearly describe the problem to be solved, and how well does it translate that solution into machine code? These goals are often at odds: efficient translation to machine code is often accomplished by making special-case exceptions and back doors in the higher-level abstractions that allow the programmer to fiddle with the numbers directly at the expense of clarity and safety.

The best example of doing this badly is the C++ language. It is an extension to the C language that adds some modern abstraction tools, but it retains all of the low-level number-twiddling of C, which allows—indeed encourages—programmers to step outside of the abstractions. The resulting programs are complex, hard to understand, loaded with exceptions, and hard to debug.

The Java language does a better job, producing efficient machine code while maintaining well-defined higher-level abstractions with few exceptions. It achieves this in part by requiring the programmer to be very explicit about a lot of implementation details that don't really express programmer intent, so it tends to be verbose and hard to read and write.

The Python language manages to maintain consistent and clear use of its high-level abstractions without special cases, and yet produces machine instructions that are remarkably efficient in terms of speed. It achieves this goal mostly at the expense of using more memory at runtime than other languages. Python programs internally use dozens of big hash tables to speed up the namespace and associative-array lookups that accomplish its expressiveness. This is a good tradeoff. Today, memory is a lot cheaper than time. Time saved running a program—and time saved writing it—more than make up for the fact that a running Python program takes up twice the memory of a running Java program.

Examples

Python also adds many features that increase expressiveness without sacrificing either efficiency or high-level abstraction. Things like multiple-value function returns, default function arguments, named arguments, and tuple assignment allow a programmer to provide the information he wants to see in the code and eliminate much that isn't really expressive but merely pro forma. Here are some geeky examples for those who want (and understand) details:

Let's say I have a function of two arguments, the second of which is a value from 1 to 100, but almost always a default value the caller may not know about. In C++, I'd have two choices of how to write this. One, and the way you'd do it in C, would be to use a value like 0 to mean “use the default”:

void dosomething(int a, int b) {
    if (b == 0) { b = 53; }
    . . .
}
. . .
dosomething(5, 12);
dosomething(5, 0);

What would a programmer reading this later think on seeing the call? It looks as if 0 is just an ordinary passed argument and might be surprised that it gets changed. Maybe some change has made 0 a valid argument now, and we'll have to change the default marker. If that happens, and we run across that call in another piece of code, is that call a valid one with 0 or was it the old default?

The other option in C++ uses function overloading:

void dosomething(int a, int b) { . . . }
void dosomething(int a) { dosomething(a, 53); }
. . .
dosomething(5, 12);
dosomething(5);

Now we don't have the problem of confusing a real value with a default marker, but we have a new problem. In C++, overloaded functions are not related in any way and might do completely different things. the first call might draw a line on the screen while the second one plays music. A programmer might reasonably expect the latter call to be a default case of the former, but he can't rely on it. In Python, two function calls with the same name in the same place are guaranteed to call the same function. Omitting an argument means “use the default”, and can mean nothing else.

def dosomething(a, b = 53):
. . .
dosomething(5, 12)
dosomething(5)

Python doesn't need overloading because its arguments aren't typed (more on this later). You treat the arguments of different type differently by checking them explicitly only when necessary. In this way, the code more closely matches the programmer intent. Both languages can do what we want, but in C++ it's easy to get it wrong, while in Python it's easy to get right and is more lucid. Even the much-maligned feature of Python that code indentation is significant helps catch errors by forcing the physical appearance of the code to match its real meaning.

Many of the more expressive idioms in Python come from the world of “functional programming”, a field of study in computer science that uses functions as the overriding abstraction rather than objects—more about verbs, less about nouns. Python is not a functional language itself. It is firmly grounded in the not-so-old school of organizing your problem first by the things involved and then what they do. But its carefully-borrowed features from that world make it capable of expressing complex actions more clearly and effectively than is possible in many other languages.

Let's say you have two lists of things called lista and listb, a function f(), and you want to create another list of f() of things from lista where that thing also appears in listb. In Java, most of your work is specifying implementation (my Java is a little rusty, so forgive me if there are errors; I'm just trying to convey the flavor of the code):

List<Thing> nl = new ArrayList<Thing>();
Iterator it = lista.iterator();

while (it.hasNext()) {
    <Thing>a = it.next();
    if (listb.contains(a)) {
        nl.add(f(a));
    }
}

Here's the equivalent Python:

nl = [ f(x) for x in lista if x in listb ]

The advantage here is not just that it's easier to type, but that it clearly and concisely describes what the programmer wants, and no more. I didn't have to tell the computer exactly how to do what I wanted, just what I wanted. And this code will be easier for me and others to understand later.

“But wait”, I hear Java programmers saying about both of my examples, “Python code isn't type-safe!” That's true. My Python list here might contain non-Things, and the code will still compile. More, it will actually work as long as f(x) succeeds for whatever it finds in lista. It is believed by some that strict type safety catches programmer mistakes. I believe (and some evidence suggests) that this is a myth. Strict typing gets in the way more than it helps. And here's an important point: if I wanted to add strict type-checking to the Python code, I could, making it look more like the Java code. So the Python code is simpler for the more generally useful case, and more complex if that's what the programmer wants.

Performance and Conclusion

As a final note, I'm sure others will point out that speed of execution is critical in many applications, and that Python may not be suitable for those. That's true, but such cases are fewer than you might think. I've used Python for graphics, sound, number crunching, database access and many other things that might seem performance-hungry, and it's up to the task. It's certainly on par with Java. If you have written a program in Python and it's not fast enough, odds are it's your algorithm, not your language. Even if it is the language, it's likely that you've saved enough time in development that you could translate your slow—but working and clearly defined—code into C and still have spent less total time than it would have taken to develop in C in the first place, and have fewer bugs.

My OneJoker library is a hybrid: the core stuff that needs to run blindingly fast is in C, and Python code links to it. This is so that I can write the code for a simulation in nice readable Python, and still do millions of hands in reasonable time. Let's say I wanted to compare the odds of starting with ace-king in a Texas Hold'en game against pocket deuces. Here's the whole program, using my library:

#!/usr/bin/env python3
import onejoker as oj

h1 = oj.Sequence(7, "Ac Kh")
h2 = oj.Sequence(7, "2s 2d")

deck = oj.Sequence(52)
deck.fill()
deck.remove(h1)
deck.remove(h2)

wins1, wins2, ties = 0, 0, 0
boards = oj.Iterator(deck, 5)

for b in boards.all():
    h1.append(b)
    v1, h = oj.poker_best5(h1)
    h1.truncate(2)

    h2.append(b)
    v2, h = oj.poker_best5(h2)
    h2.truncate(2)

    if v1 < v2:
        wins1 += 1
    elif v2 < v1:
        wins2 += 1
    else:
        ties += 1

print("{0:10d} boards".format(boards.total))
print("{0:10d} wins for {1}".format(wins1, h1))
print("{0:10d} wins for {1}".format(wins2, h2))
print("{0:10d} ties".format(ties))

Run this, and it dutifully prints:

1712304 boards
 799119 wins for (Ac Kh)
 903239 wins for (2s 2d)
   9946 ties

in about a minute and a half on my old laptop. And this is actually a pretty bad case for my library: I have run other simulations that complete billions of hands in minutes. If I wanted an approximate answer faster instead of waiting for the exact one, I could replace the line for b boards.all() with for b in boards.random(10000).

So I repeat my recommendation for others out there who may be looking to get into computer programming: try Python. If you want to learn C or Java after that, go ahead. But if anyone suggests you learn C++, run as fast as you can.

Why is prostitution illegal?

Many of you may have heard by now that my friend and WSOP champion Greg Raymer was arrested after setting up a date with a prostitute that turned out to be an undercover cop. But that isn't the real news story. Frankly, I think “Overweight wealthy man hires hooker” is about as newsworthy as “Teenager caught drinking beer”. Yes, they're both illegal, but I really don't have a problem with either if they're done responsibly—and they can be. What I do have a problem with, and what really should be the story here, is that the government of North Carolina thought it was a good idea to spend taxpayer money to catch two consenting adults doing something in private that they have every right to do.

We own our bodies. A woman (or man) has a right to have sex. And a right to charge for services. But for some reason, our backward society takes issue with doing those things together. It is true that other kinds of crime often surround prostitution, but prosecuting it for those reasons makes no more sense than jailing the friends and family of a criminal instead of the criminal. San Francisco recently busted a gang caught smuggling women from Asia to work as forced prostitutes, and that's great—not because they cut down on prostitution, but because they cut down on fucking slavery. I am also happy when the police grab an abusive pimp or a hooker who robs her customers. But laws against prostitution itself actually make catching these crimes harder, because it forces prostitutes and customers to work outside the law and not report such crimes.

It is no accident that prostitutes in places like Nevada and Amsterdam do not suffer from crime, disease, or abuse. Where it is legal and regulated, it is safe and actually reduces crimes like rape and spousal abuse.

If a woman freely chooses to charge money for sex, and a man freely chooses to hire her, and they do so safely, that's their own business. The man's wife might have an issue with it, or she might not. That's her business. What I'm quite sure of is that it's none of my business, and none of my government's. What is my business is how my government spends my money, and I for one don't want them wasting it on sting operations for non-crimes.

How many poker hands are there?

I've been posting a lot of philosophical geekery lately, so I'll balance that today with some math geekery. Today's math term is equivalence class. The basic idea is simple: if you have a big set of things, you can reduce it to a smaller set of things, each of which is a subset, or "class", of those things that are “equivalent” in some well-defined way.

Here's an example: how many five-card poker hands are there? Well we can pick any card from a 52-card deck, then pick a second card from the 51 remaining, and so on five times. This gives us 52×51×50×49×48 hands, or 311,875,200. Those 300 million hands include A♥4♦A♣9♥4♣ and also 4♣9♥A♣A♥4♦, so we can immediately reduce that by a factor of 120 by noting that poker rules don't care what order the cards are in. So we collect all those together and reduce our number to 2,598,960.

But we can go further still. That 2.5 million counts our two hands above (along with other combinations like 9♥4♦A♥4♣A♣) as equivalent, but it counts separately the hand A♠A♥4♥4♦9♠, which is the “same” hand in the sense of being identically valued—it is “two pair, aces and fours, nine kicker”, just like the first two. So how many poker hands are there, only counting those that are actually of different value in the game? As it turns out, only 7,462. Number one at the top of that list of 7,462 is simply “royal flush”, which accounts for four of our 2.5 million hands, and 480 of our 300 million. Number 2 is “king high straight flush”, and so on down to number 7,462 which is “no pair, 7-5-4-3-2” (which accounts for 1,020 of our 2.5 million, or 122,400 of our 300 million).

Notice a major difference between our two reduction operations: in the first case, we reduced the big set into subsets that were all the same size. Each of our 2.5 million hands contains exactly 120 of our 300 million—that's the number of different ways you can arrange 5 cards. As a consequence of this, the probability of each of those 2.5 million hands is exactly the same, just as is the probability of each of the 300 million. The 7,462 sets, however, are of different sizes. There are hundreds of times as many ways to get 7-5-4-3-2 as there are to get a royal flush, so the probability of each is different.

One common application of equivalence classes is in computer science: sometimes you need to do something to a very large set of inputs, and you can simplify and speed up the operation by reducing them to a smaller set. If you ask Google for pages about “poker”, not only would you expect it to return pages that mention “Poker” and “POKER”, but Google would save time and disk space by indexing those only once.

This can be applied to life as well. Perhaps there is a large set of things you'd like to improve about your life in some way. If you can group them by things that might have a similar cause or similar solutions, not only will you reduce the number of things to think about, you might notice that some groups are much larger than others, giving you guidance about what to focus on.

Prisoner's Dilemma

Grade schools in the US ignore philosophy as a subject. A few high schools give it brief mention (and then it's only to cover historically important people). Even in many colleges it remains elective. The result is that many important subjects in philosophy are unknown to the general public, despite the fact that they are simple and can have a great influence on our everyday lives.

I've mentioned concepts like confirmation bias and the sunk cost fallacy before. These are common mistakes all people make in reasoning that can be avoided if we learn about them. These have aspects of psychology as well as philosophy. A more purely philosophical concept everyone should understand is the Prisoner's Dilemma. A typical example goes like this:

Two suspects are arrested for a robbery. Each is questioned separately by police and told this: Our evidence against the two of you for the robbery is thin, but we can give each of you a year in jail on a lesser weapons charge. If you confess and squeal on your buddy, he'll get five years and we'll let you walk. But if you both squeal, you each get three years.

A keeps silentA confesses
B keeps silentA: 1 year
B: 1 year
A: free
B: 5 years
B confessesA: 5 years
B: free
A: 3 years
B: 3 years

Each suspect reasons like this: I can't talk to my buddy, and I have no control over what he does. If he clams up, I get a year if I do as well, but I go free if I confess. If he squeals, then I get five years if I stay silent and three if I confess. In both cases, I'm better off confessing. Both suspects reason this way, so both confess, and each gets three years in prison. But if they had both remained silent, they both would have gotten only a year. So the essence of the Prisoner's dilemma is this: reasoning separately, both parties doing what is clearly in their best interest end up with a result that is worse that what they would have gotten if they had cooperated.

Many situations in life mirror this. Take doping in sports, for example. Whether or not your opponent is doping is out of your control. If he is, you must dope to compete. If he isn't, doping won't lessen your chance of winning, so individually you are always better off doping. But if everyone in the sport is doping, the results will be roughly the same as if no one is doping, so as a group, it would be better if everyone didn't. Other situations like the tragedy of the commons can be modeled this way.

The way out of these dilemmas is to find some means to encourage or force cooperation. For example, a criminal gang might have a prior agreement—or strong social taboo—against snitching. Sports regulators might have strong rules against doping and do regular testing. It has even been suggested that one of the primary motivations for which people create governments is to have a third party to resolve such dilemmas between citizens.

Like any simplified mathematical model of complex human interaction, there is a danger of applying it to situations that don't quite match. In a recent episode of the Philosophy Bites podcast, Jeff McMahan suggests modeling aspects of the gun control debate as a prisoner's dilemma: for example, the interaction between a burglar and homeowner. Each reasons that if the opponent is armed, he is certainly safer being armed himself, and if his opponent is unarmed, being armed doesn't hurt, so it is better to be armed in each case. But collectively, they are safer if both are unarmed.

I don't think this particular argument holds water, even ignoring all the other aspects of a very complicated issue. First, the “payoffs” (that's mathematical jargon for the relative value of the results to each of the participants) are not symmetrical. In the “both unarmed” condition, the winner of the interaction is likely to be whoever is bigger, stronger, or the more experienced fighter—probably the criminal. The “both armed” condition raises the stakes and the danger for both, but is also equalizes them, so it is likely to benefit the homeowner relative to the burglar. Second, it assumes that both the criminal and the homeowner place equal value on their own safety. This is a psychological question. Perhaps the burglar is a sociopath who values violence for its own sake. It also makes the assumption that the “both unarmed” condition is something that's possible to achieve in real life.

Another way out of the prisoner's dilemma is available when situations are repeated more than once: reciprocity. When we have multiple interactions with people, we can gain a reputation for being cooperative, making others more likely to cooperate with us. Robert Axelrod's classic experiments along these lines are explored in his book The Evolution of Cooperation. Reciprocity can also explain how our moral sense (and things like our ability to recognize faces) can evolve from the essentially selfish process of natural selection.

This concept is simple, and recognizing it in real life situations can make such a difference in life that it should be taught to everyone in grade school.