To many, the cloud and big data seem closely linked. Big data is a product of the cloud and mobile era, with massive amounts of data from infrastructure and logistics, business operations, mobile applications, and social networks flowing into flexible cloud platforms to be mined, processed, and analyzed. And surely, given the massive amounts of data involved, physical infrastructure doesn’t begin to offer the elasticity required to handle the ever-growing torrent. It’s a compelling narrative, especially if you’re a cloud vendor, but the reality is that physical infrastructure, server clusters, bare metal clouds, and colocation (all those “old-fashioned” ideas) play a dominant role in enterprise big data platforms.
According to Chris Selland, VP of Business Development at HP’s Big Data Division, “most of the company’s customers aren’t using the cloud in a substantial way with big data. There are both technical challenges (like data portability and data latency) along with non-technical reasons, such as company executives being more comfortable with the data not being in the cloud.”
In the same report, Etsy Senior Database Developer CB Bohn emphasizes the limitations of the public cloud for his company’s big data efforts. Citing the technical and economic cost of lifting massive amounts of data into the cloud, he explains that Etsy relies on in-house expertise and colocation to manage and analyze the data produced by its eCommerce platform.
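To get a feel for why lifting data into the cloud is such an effort, a rough back-of-the-envelope estimate helps. The sketch below is illustrative only: the function name `transfer_days`, the 500 TB dataset, the link speeds, and the 80% link efficiency are all hypothetical assumptions, not figures from Etsy or HP.

```python
def transfer_days(data_terabytes: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate how many days it takes to move a dataset over a network link.

    All inputs are hypothetical. `efficiency` is an assumed fraction of the
    nominal link speed that is actually usable for the transfer.
    """
    if data_terabytes < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("expected non-negative data size, positive link speed, 0 < efficiency <= 1")

    total_bits = data_terabytes * 1e12 * 8            # decimal terabytes -> bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    seconds = total_bits / usable_bits_per_second
    return seconds / 86_400                           # seconds per day


if __name__ == "__main__":
    # Assumed 500 TB dataset over two assumed uplink speeds.
    for gbps in (1, 10):
        print(f"{gbps:>2} Gbps: {transfer_days(500, gbps):.1f} days")
    #  1 Gbps: 57.9 days
    # 10 Gbps: 5.8 days
```

Under these assumptions, a one-time upload alone takes days to weeks, before any bandwidth charges or the ongoing flow of new data are counted.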
As the Internet of Things becomes more prevalent, we can expect to see the amount of data companies deal with increase exponentially. As the volume of data expands, the limitations of public cloud platforms will be even more starkly emphasized. For a company that depends on big data for its revenue, the best option is to invest in stable, reliable physical infrastructure rather than relying on cloud platforms that offer limited insight into or control of the underlying infrastructure.
“The Internet of Things is coming, and drastic traffic growth is going to blow your network sky-high. Should you scale up your on-premises data center? No. Should you move to the cloud? No,” says Supernap’s Jason Mendenhall. “The best strategy is to move your servers, applications, and data into your own servers in a top-tier colocation facility.”
In fact, the cloud is particularly bad at running the sort of applications that big data analysis depends on. Big data relies on the ability to move massive amounts of data around very quickly. As I discussed in our post on the benefits of bare metal clouds earlier this year, public cloud platforms are multi-tenant environments that rely on network-attached storage. That’s a particularly bad design if the goal is reliably available compute and very fast data transport.
That’s not to say public cloud platforms don’t have their place. Many organizations maintain extensive physical infrastructure for core operations and use the cloud for “bursting” when resource requirements temporarily exceed available infrastructure. The key lesson is not that the cloud should be ignored, but that businesses should be careful to examine all the options and invest in the infrastructure portfolio that makes sense for their workloads.