How many books have ever been published in all of modern history? According to Google’s advanced algorithms, the answer is nearly 130 million books, or 129,864,880, to be exact.
We’ll admit it’s practically impossible to count every book that has ever been written, but for Google Books to catalog the world’s supply of printed knowledge, the company needs an estimate of the number of books it has to scan. So Google set out to produce one.
In a detailed blog post, software engineer Leonid Taycher outlined how complex counting books can become. The first step is defining exactly what a book is. The company decided to discount anything that wasn’t an idealized bound volume:
“One definition of a book we find helpful inside Google when handling book metadata is a ‘tome,’ an idealized bound volume. A tome can have millions of copies (e.g. a particular edition of Angels and Demons by Dan Brown) or can exist in just one or two copies (such as an obscure master’s thesis languishing in a university library).”
Google admits its definition is imperfect, but it’s workable and similar to what ISBNs are supposed to represent. ISBNs, or International Standard Book Numbers, are designed to be unique identifiers for books. Because they’ve only been around for 30 to 40 years and are used mostly in Western countries, they can’t be relied on by themselves. That’s why Google took data from the Library of Congress, WorldCat and others to find as many books as possible: one billion raw records by the company’s count.
Here’s where Google’s engineering talent comes into play. The company used countless algorithms to identify and discard duplicates, an effort that drew on more than 150 pieces of metadata related to the world’s books to evaluate whether each book record was unique or a duplicate of another. Analyzing this data resulted in 210 million unique books.
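To get a feel for what discarding duplicates means in practice, here is a minimal sketch in Python. It is not Google's method: the company says it weighed more than 150 pieces of metadata, while this toy version compares just three hypothetical fields (`title`, `author`, `year`) after light normalization. The function names and the sample records, including the years, are made up for illustration.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    without_punctuation = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(without_punctuation.split())


def record_key(record):
    """Build a comparison key from three hypothetical metadata fields."""
    return (normalize(record["title"]), normalize(record["author"]), record["year"])


def deduplicate(records):
    """Keep the first record seen for each distinct key."""
    groups = defaultdict(list)
    for record in records:
        groups[record_key(record)].append(record)
    return [group[0] for group in groups.values()]


# Illustrative records only; the years are placeholders, not real catalog data.
raw_records = [
    {"title": "Angels and Demons", "author": "Dan Brown", "year": 2000},
    {"title": "ANGELS AND DEMONS.", "author": "dan  brown", "year": 2000},
    {"title": "Angels and Demons", "author": "Dan Brown", "year": 2001},
]

unique_records = deduplicate(raw_records)
print(len(raw_records), "raw records ->", len(unique_records), "unique")
```

The first two records collapse into one because they differ only in capitalization, punctuation and spacing, while the third survives as a separate edition. Real catalog data is far messier, which is presumably why so many more metadata fields were needed.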
Next, Google subtracted the millions of microforms, audio recordings, maps, t-shirts, turkey probes (yes, turkey probes) and videos with ISBNs, arriving at a much more reasonable number of 146 million. Finally, the company removed 16 million government document volumes from its estimate, getting to the 129.8 million count it announced today. Of course, publishers are issuing new books even as this post is being typed, so the company is constantly recalculating the book count.
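The subtraction is easy to check. The short Python sketch below reruns it using only the rounded figures quoted in this post; the variable names are ours, and the rounded inputs explain why the result lands slightly above the exact count.

```python
# Rounded figures as quoted in this post.
unique_count = 210_000_000      # records left after removing duplicates
after_non_books = 146_000_000   # after dropping microforms, maps, videos, etc.
government_docs = 16_000_000    # government document volumes removed last
reported_total = 129_864_880    # exact count announced by Google

# How many non-book items were filtered out in the first subtraction.
non_books_removed = unique_count - after_non_books

# Final estimate from the rounded figures.
estimate = after_non_books - government_docs

print("Non-book records removed:", non_books_removed)          # 64000000
print("Estimate from rounded figures:", estimate)              # 130000000
print("Gap to the exact count:", estimate - reported_total)    # 135120
```

So roughly 64 million of the 210 million unique records were not books at all, and the rounded figures put the total within about 135,000 of the exact number Google reported.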
While I don’t have an army of software engineers and algorithms to back me up, my gut says this number is too low and the company has many more books to count from the annals of history. Still, this project to figure out the world’s book supply is fascinating and could be useful for research and historical purposes for years to come.
Google’s next challenge: to stop getting sued over Google Books.