February 22, 2010

Big numbers are everywhere these days, what with lost jobs (millions), bailouts (billions), and the budget and its deficits (trillions). I think most of us don’t grasp these numbers at all, to the point where words like million, billion and trillion have become synonyms for “big,” “really big” and “really really big.” Technology has its own set of “big” words as well: mega, giga and tera are part of everyday speech, while further-out ones like peta and exa now appear in public with some regularity.

Since most of us have no intuition about big numbers and probably don’t know the data that they are based on anyway, we’re at the mercy of whoever provides them. Here are a couple of recent examples.

There was much buzz before Christmas about Amazon’s Kindle and other e-readers as potential gifts, along with speculation about a tablet device from Apple. (The iPad was announced in late January, but won’t ship until March, so that’s a topic for another time.) On Dec. 9, The Wall Street Journal said that the Nook e-book reader from Barnes & Noble has two gigabytes of memory, “enough to hold about 1,500 digital books.” On Dec. 10, The New York Times said that a zettabyte (10^21 bytes) “is equivalent to 100 billion copies of all the books in the Library of Congress.”

By good luck, I was right then in the early stages of inventing questions for the final exam in COS 109, so this confluence of technological numbers was a gift from the gods. On the exam, I asked, “Supposing that these two statements are correct, compute roughly how many books are in the Library of Congress.” This required straightforward arithmetic, albeit with big numbers, not something that most people are good at. The brain often refuses to cooperate when there are too many zeroes. Writing them all out might help, but it’s easy to slip up. Scientific notation like 10^21 is better, but units like “zetta,” completely unknown outside a tiny population, convey nothing at all to most people.

Since intuition is of no help here, let’s do some careful arithmetic. Taking the Journal at its word, 2 GB for 1,500 books means that a single book is somewhat over a million bytes. Taking the Times at its word, a hundred billion copies is 10^11; dividing 10^21 by 10^11 implies that there are about 10^10 bytes in a single copy of all the books. If each book is 10^6 bytes, then the Library of Congress must hold about 10,000 books.
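For anyone who would rather let a machine keep track of the zeroes, here is a minimal Python sketch of the same calculation. The variable names are mine, and it assumes decimal units (a gigabyte taken as 10^9 bytes) and takes both newspapers' figures at face value.

```python
# Figures as quoted in the two newspaper stories.
# Assumption: decimal units, so 1 GB = 10**9 bytes.
NOOK_BYTES = 2 * 10**9        # Journal: two gigabytes of memory
NOOK_BOOKS = 1_500            # Journal: "about 1,500 digital books"
ZETTABYTE = 10**21            # Times: one zettabyte
LOC_COPIES = 100 * 10**9      # Times: "100 billion copies"

# Bytes in one copy of all the books in the Library of Congress.
bytes_per_library = ZETTABYTE // LOC_COPIES

# Rough version: round one book down to a million bytes.
books_rounded = bytes_per_library // 10**6

# Unrounded version: books = library bytes / (Nook bytes / Nook books),
# rearranged so everything stays in exact integer arithmetic.
books_unrounded = bytes_per_library * NOOK_BOOKS // NOOK_BYTES

print(bytes_per_library)   # 10000000000
print(books_rounded)       # 10000
print(books_unrounded)     # 7500
```

Without the rounding, the answer is 7,500 books rather than 10,000, which does nothing to change the conclusion.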

Is this a reasonable estimate? One useful alternative to blind guessing is a kind of numeric triage, which led to the second part of the exam question: “Does your computed number seem much too high, much too low, or about right, and why do you say so?” Of course, if one didn’t do the arithmetic correctly, all bets are off. A fair number of people found themselves in that situation, and thus had to rationalize faulty values from hundreds to bazillions.

Those who did the arithmetic right were better off, but some still had trouble assessing plausibility. Apparently even small big numbers are hard to visualize, for a surprising number thought that 10,000 books was reasonable for a big library: “I would guess that even Firestone holds over 10,000 books” was a not-atypical response. That’s not reasonable, of course — even I have close to 500 books in my office, and I’ll bet that many humanities colleagues have thousands.

Let’s look at another example. At Christmas, my wife gave me “Googled: The End of the World As We Know It,” by Ken Auletta. It’s an interesting history and assessment of the most successful technology company of the past decade, though there were places where the fact-checking was a bit spotty. For instance, it claims that Google’s CEO, Eric Schmidt ’76, graduated from Princeton in 1979. But in exam-creation mode, what caught my eye was the very last sentence, which says that Google stores “two dozen or so tetabits (about twenty-four quadrillion bits) of data.”

Another gift! There is no such thing as a tetabit; if quadrillions is correct, then the word should have been petabits. So I asked, “How many gigabytes does Google store?” This required converting 24 petabits to 24 million gigabits, then dividing by 8 bits per byte to get 3 million gigabytes. But “tetabit” is also only one letter away from another valid unit, terabit, so the second half of the question asked, “If tetabits really should have been terabits, how many gigabytes would there be?” I’ll leave that as an easy exercise.
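The unit conversion can be sketched in a few lines of Python as well. The function name and the prefix table are hypothetical, invented for illustration, and decimal prefixes are assumed throughout.

```python
# Decimal prefixes (assumption: giga = 10**9, not 2**30).
PREFIX = {"giga": 10**9, "tera": 10**12, "peta": 10**15}

def bits_to_gigabytes(count, prefix):
    """Convert `count` <prefix>bits to gigabytes, at 8 bits per byte."""
    bits = count * PREFIX[prefix]
    n_bytes = bits // 8
    return n_bytes // PREFIX["giga"]

# "Two dozen or so" petabits, read as exactly 24.
print(bits_to_gigabytes(24, "peta"))   # 3000000
```

Calling the same function with "tera" is one way to check an answer to the second half of the question.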

What are we to do when the country’s premier newspapers and highly qualified authors of important books can’t get the numbers or the units right? Most numbers just go right by; we don’t have the time or background to pay much attention, and we act on intuition and gut feelings, however faulty. Could we do better? Eternal vigilance is a partial answer. Informed skepticism, a little knowledge and some grade-school arithmetic will also help, but only if we use them.

Millions, Billions, Zillions


By Brian Kernighan