Exclusive Google Caffeine — the remodeled search infrastructure rolled out across Google's worldwide data center network earlier this year — is not based on MapReduce, the distributed number-crunching platform that famously underpinned the company's previous indexing system. As the likes of Yahoo!, Facebook, and Microsoft work to duplicate MapReduce through the open source Hadoop project, Google is moving on...
But MapReduce didn't allow Google to update its index as quickly as it would like. In the age of the "real-time" web, the company is determined to refresh its index within seconds. Over the last few years, MapReduce has received ample criticism from the likes of MIT database guru Mike Stonebraker, and according to Lipkovitz, Google long ago made "similar observations." MapReduce, he says, isn't suited to calculations that need to occur in near real-time.
MapReduce is a sequence of batch operations, and generally, Lipkovitz explains, you can't start your next phase of operations until you finish the previous one. It suffers from "stragglers," he says. If you want to build a system that's based on a series of map-reduces, there's a certain probability that something will go wrong, and that probability grows as you increase the number of operations. "You can't do anything that takes a relatively short amount of time," Lipkovitz says, "so we got rid of it."
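Lipkovitz's two complaints can be put in rough numbers. The sketch below is illustrative only and is not Google's code. It assumes that each stage in a chain fails independently with the same probability, and that a batch phase ends only when its slowest task ends. The function names and the figures in the usage lines are hypothetical.

```python
def chain_failure_probability(p_stage: float, n_stages: int) -> float:
    """Chance that at least one of n_stages fails.

    Assumes every stage fails independently with probability p_stage,
    which is a simplification made for illustration.
    """
    return 1.0 - (1.0 - p_stage) ** n_stages


def phase_duration(task_seconds: list[float]) -> float:
    """A batch phase is done only when its slowest task (the straggler) is done."""
    return max(task_seconds)


def pipeline_duration(phases: list[list[float]]) -> float:
    """Phases run strictly one after another, so their durations add up."""
    return sum(phase_duration(tasks) for tasks in phases)


if __name__ == "__main__":
    # Hypothetical figures: a 1% per-stage failure chance across longer chains.
    for n in (1, 10, 100):
        print(n, round(chain_failure_probability(0.01, n), 3))

    # One 30-second straggler sets the pace for a phase of 1-second tasks.
    print(pipeline_duration([[1, 1, 1, 30], [2, 2, 2]]))
```

Under these assumptions, the failure chance climbs quickly as the chain gets longer, and a single slow task sets the finishing time of its whole phase, which helps explain why a chain of batch jobs is a poor fit for updates that need to land within seconds.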