Google Caffeine launch: web indexing delivers new content faster


The Google Caffeine launch on Wednesday delivers a wider range of web content and makes more of it available on Google almost instantly. Flickr photo.

Caffeine, Google’s new web indexing system, went live Wednesday. Announcing the global launch, Google said its evolving search engine technology makes more freshly minted web content available and delivers it faster than before. Searchers and web content developers don’t have to change the way they use Google. What does change is that links to a broader range of relevant content now appear much sooner after the content is published. The Caffeine overhaul of the indexing technology also gives Google more flexibility to keep pace with a web that is evolving at an accelerating rate.

Google Caffeine launch: speed isn’t everything

Google said the Caffeine launch delivers 50 percent fresher search results. That feature alone may be hard to translate into a benefit for the average Google user. While Caffeine was still in development, PCWorld ran a side-by-side comparison and found that results took 0.15 seconds on the regular Google search and 0.09 seconds on Caffeine. No one can repeat that test now, because Caffeine is the regular Google search. And a 0.06-second gain probably won’t make much of a difference to searchers, no matter how tight the deadline. What does make a difference, especially for content publishing, is what shows up in those results.

Real-time content publishing

The immediate benefit of the Google Caffeine launch to the average user is fresher content, and more of it. Google’s Matt Cutts told Search Engine Land that “Caffeine benefits both searchers and content owners because it means that all content (and not just content deemed ‘real time’) can be searchable within seconds after it’s crawled.” Search Engine Land reports that the old Google would crawl a set of pages, process those pages and add them to the index as a batch. Every page in the batch had to wait until the whole batch was processed before it appeared on Google. Now Google crawls and processes pages individually, and each one becomes available almost instantly.
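To illustrate why the batch approach delays publication, here is a toy model of the two strategies Search Engine Land describes. It is a hypothetical sketch, not Google’s system: the processing time, the crawl times and the example.com URLs are invented, and the incremental model assumes there is enough parallel capacity that no page ever waits in a queue.

```python
from dataclasses import dataclass

# ASSUMED, illustrative numbers only; these are not Google's figures.
PROCESS_TIME = 2.0  # seconds to process a single page


@dataclass
class Page:
    url: str
    crawled_at: float  # seconds after the crawl started


def batch_delays(pages):
    """Batch model: crawl the whole set, process it page by page,
    then publish every page at the same moment."""
    last_crawl = max(p.crawled_at for p in pages)
    published_at = last_crawl + PROCESS_TIME * len(pages)
    # Delay = time between a page being crawled and becoming searchable.
    return {p.url: published_at - p.crawled_at for p in pages}


def incremental_delays(pages):
    """Incremental model: each page is processed and published on its own.
    Assumes enough parallel capacity that no page waits in a queue."""
    return {p.url: PROCESS_TIME for p in pages}


# Hypothetical crawl: five pages found one second apart.
pages = [Page(f"https://example.com/{i}", crawled_at=float(i)) for i in range(5)]
batch = batch_delays(pages)
incremental = incremental_delays(pages)

print(max(batch.values()))        # 14.0
print(max(incremental.values()))  # 2.0
```

In the batch model the first page crawled waits the longest, because it cannot be published until every other page in the set has been processed; in the incremental model each page’s delay depends only on its own processing time.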

Caffeine: astronomical storage capacity

Eliminating the delay between finding a page and making it available requires an astronomical amount of storage. On the Official Google Blog, Carrie Grimes said Caffeine indexes web pages on an enormous scale. Every second, Caffeine processes hundreds of thousands of pages in parallel. Printed on paper, the pages it processes in a single second would make a stack three miles high. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; stacked end to end, they would stretch more than 40 miles. PCWorld adds that the bill from Apple would be $155,625,000.
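The iPod comparison is simple arithmetic, and a quick back-of-the-envelope check shows how the figures fit together. The article does not say which iPod it means; this sketch assumes the 160 GB iPod classic, priced at $249 and roughly 4.1 inches tall, because those values reproduce the quoted numbers. Treat the capacity, price and height as assumptions.

```python
# Back-of-the-envelope check of the storage comparison.
# ASSUMPTIONS (not stated in the article): the "largest iPod" is a
# 160 GB iPod classic that cost $249 and is about 4.1 inches tall.
STORAGE_GB = 100_000_000   # "nearly 100 million gigabytes"
IPOD_CAPACITY_GB = 160     # assumed
IPOD_PRICE_USD = 249       # assumed
IPOD_HEIGHT_IN = 4.1       # assumed
INCHES_PER_MILE = 63_360

ipods = STORAGE_GB // IPOD_CAPACITY_GB
cost = ipods * IPOD_PRICE_USD
stack_miles = ipods * IPOD_HEIGHT_IN / INCHES_PER_MILE

print(f"{ipods:,} iPods")          # 625,000 iPods
print(f"${cost:,}")                # $155,625,000
print(f"{stack_miles:.1f} miles")  # 40.4 miles
```

Under these assumptions the three published figures (625,000 iPods, $155,625,000 and a stack of more than 40 miles) are consistent with one another.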

Keeping up with Caffeine

The Google Caffeine launch doesn’t change how people search the web or publish content. But Resource Shelf points out an important detail: information found one day may not be there if you go back to the same location the next. That is because pages are refreshed, and the cache is updated, more frequently. If a searcher needs a page as it looked at noon on Wednesday, it’s a good idea to save a copy with a tool such as Zotero, a Firefox extension, because by 12:15 p.m. the cached version of the page may already have changed.
