The leading edge of innovation in the data center is now firmly in the consumer space, with eBay, Google and Amazon leading the charge. Not too long ago, data center innovation was being driven by financial firms like Morgan Stanley and Merrill Lynch. Further back in time, the defense and space sector was the engine of data center innovation. It’s interesting to see how things have changed.
This ascendancy of the consumer players in the data center innovation space is brought home by an excellent architecture talk on eBay’s data center. Dan Pritchett, a Technical Fellow at eBay, gave this talk at an SD Forum event a few weeks back. He has also posted a related and thoughtful article, “You Scaled Your What?”, on his blog. [I came across the talk via Vinnie Mirchandani’s Deal Architect blog. There is insightful commentary available here, here, here and here.]
The scale and complexity of eBay’s operations are mind-boggling. It manages 212 million registered users with over 1 billion photos and 105 million listings. It has 2 petabytes of data and handles more than 26 billion SQL executions per day. Its users trade more than $1,500 worth of goods every second and clock 1 billion page views every day. What’s more, 300+ features are added per quarter, with about 100,000 lines of code rolled out every two weeks.
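To put those figures on a common footing, here is a rough back-of-the-envelope sketch in Python. It takes the quoted numbers at face value (several are lower bounds) and the variable names are my own. The "SQL executions per page view" ratio is only a crude indicator, since it assumes, probably wrongly, that all SQL traffic comes from page views.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures quoted in the post, taken at face value.
sql_executions_per_day = 26_000_000_000  # ">26 billion", so a lower bound
page_views_per_day = 1_000_000_000       # "1 billion page views every day"
dollars_traded_per_second = 1_500        # "more than $1500", a lower bound


def per_second(per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


sql_per_second = per_second(sql_executions_per_day)
page_views_per_second = per_second(page_views_per_day)

# Crude ratio: assumes every SQL execution is attributable to a page view.
sql_per_page_view = sql_executions_per_day / page_views_per_day

dollars_traded_per_day = dollars_traded_per_second * SECONDS_PER_DAY

if __name__ == "__main__":
    print(f"SQL executions per second (avg): {sql_per_second:,.0f}")
    print(f"Page views per second (avg):     {page_views_per_second:,.0f}")
    print(f"SQL executions per page view:    {sql_per_page_view:.0f}")
    print(f"Goods traded per day (USD):      {dollars_traded_per_day:,}")
```

These are averages over a full day; peak load would be higher, and the post does not say by how much.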
For me the big story here is the ‘focus-inversion’ that has taken place in recent years. Ten years back the focus was on tuning the applications to get the best performance. Now the focus is on tuning the infrastructure to get the best performance. Commodity building blocks, with custom software to tie it all together, are the new norm in data center infrastructure. Without this new thinking, it would simply not have been practical to get the kind of scale that eBay, Google and Amazon have today. This rise of adaptable infrastructure is creating a new “lean” data center model. Nick Carr evocatively calls this “frugal computing”.
Each leading-edge has its own vendors
Every new leading-edge in data center innovation has produced its own hot segments and startups.
At the start of the 90s, when UNIX was catching on, Wal-Mart was pushing the boundaries with large-scale relational data storage optimized for reporting and analysis (instead of transaction processing). Their partner was a small company, Teradata, which pioneered the concept of data warehousing. The Wal-Mart and Teradata relationship has remained strong over the years. Wal-Mart is still the largest data warehousing user, as this 2004 NY Times article points out. It has “460 terabytes of data stored on Teradata mainframes… To put that in perspective, the Internet has less than half as much data, according to experts.”
By the late 90s, a different set of lighthouse accounts had taken center stage, and financial firms like Morgan Stanley and Merrill Lynch were setting the standards for transaction processing. VERITAS Software was born in this milieu and came to dominate high-performance UNIX storage management software.
This current leading-edge in data centers will produce its own hot startups and segments. It will also push many existing segments and startups off the list.
What won’t be “hot”?
In a blog post on scalability.org, Joseph Landman laments that commercial HPC has gone “out of style” and is “just not sexy” any more. He doesn’t say why that’s the case, but his subsequent wrap-up of SC06 (Conference on High-Performance Computing, Storage, Networking and Analysis) points to an answer. He found that real applications and use cases for high-end HPC were missing this time. The truth is that today’s leading-edge players (eBay, Google and Amazon) are not relying on commercial HPC at all. They are building their own.
We know that Google has its own file system, scheduler, distributed lock management, you name it, so that it can run on low-cost commodity PCs. The motivation is pretty clear. “We develop code at all levels of the infrastructure. This allows us to take advantage of storage optimizations and network optimizations that are much harder to do when you are running on top of something that you don’t really control,” says Jeff Dean, Research Fellow at Google, in a popular talk on “Big Table” (video is here).
What will be “hot”?
A new generation of commercial high-performance infrastructure components, mostly software, is waiting to come out. Many of these will either be spin-offs from eBay, Google and Amazon, or will count these firms as their lighthouse accounts. I would like to hear your views on what will be “hot” in the data center infrastructure vendor space.
[See also my previous posts, “Move Over Data Center”, and “Nick Carr Labels it the Era of Frugal Computing”. In a future post, I’ll look at the mad scramble that is getting underway to fix the current data centers.]
Update: Joseph Landman has politely but firmly taken me to task for this post. In the spirit of friendly and constructive conversation, I have replied here. I have argued that having a growing market doesn’t make a segment “hot”. What would make HPC hot is if it were part of the current data center leading edge. Alternatively, a “hot” startup can still happen if there is a disruptive game plan that the investors can put their arms around.
It’s interesting to see the contrast in how enterprises are handling their data center requirements.
On one side, you have companies like Google and Amazon, who build their own infrastructure. On the other side, you have enterprises who want to outsource the complete setup and management of their data center infrastructure, governed by SLAs.
Both these trends give rise to interesting new demands for data center infrastructure software.
Amazon- and Google-like setups require highly customized infrastructure capabilities, with high-end requirements in terms of performance, scalability, etc. Data center hosting shops, by contrast, will require infrastructure capabilities that can be easily customized to each client’s requirements.