I’m not one for kicking someone while they’re down, but Verizon’s recent 48-hour downtime should be instructive for companies seeking to put all of their eggs in the cloud basket. The planned outage was announced to users of Verizon’s cloud via an email, sparking a storm of angry Tweets and articles about the reliability of the cloud.
Now, it wouldn’t be entirely fair to say that downtime of this order is common in the cloud. It isn’t, but plenty of similar incidents have hit most of the major cloud platforms over the past few years. What makes outages of this sort so striking is their scale, and how at odds they are with cloud vendors’ assurances that the cloud is actually more robust than traditional infrastructure hosting.
One commenter on Hacker News put it succinctly:
If I was a hosting company and needed a 2 day outage, I’d probably just shut down my business and offer to cover some expenses for users to migrate whilst apologizing profusely.
Downtime is a fact of life for hosting companies and their clients. Servers and networks require maintenance, and sometimes that maintenance leads to downtime. No web hosting company can truly promise constant uptime (or rather, they can promise it only in the expectation that they’ll be giving credit to some proportion of their users when they fail to deliver it). But a 48-hour planned outage is egregious in an industry where even the least expensive web hosts regularly deliver uptime of 99.9%.
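To put those two numbers side by side, here is a rough back-of-the-envelope sketch. It assumes the 99.9% figure is measured over a 365-day year, which is my assumption rather than anything a particular host has stated, and the function names are purely illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; assumes a non-leap year


def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted by a given uptime percentage."""
    return period_hours * (1 - uptime_percent / 100)


def implied_uptime_percent(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Uptime percentage implied by a given amount of downtime."""
    return 100 * (1 - downtime_hours / period_hours)


if __name__ == "__main__":
    budget = allowed_downtime_hours(99.9)
    print(f"99.9% uptime allows about {budget:.2f} hours of downtime per year")
    print(f"A 48-hour outage alone implies about {implied_uptime_percent(48):.3f}% uptime")
    print(f"That is roughly {48 / budget:.1f} times the annual 99.9% budget")
```

Under those assumptions, 99.9% uptime leaves a budget of under nine hours a year, so a single 48-hour outage uses up more than five years’ worth of it.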
I don’t have any special insight into the problems that caused this particular outage, but it is worth noting that cloud platforms are fragile in ways that dedicated servers are not. In theory, virtual machines can be migrated to new servers or even new data centers to work around outages. In practice, they are not, and once you’re locked into a particular vendor, it’s difficult to migrate to an alternative without a great deal of work. With dedicated servers or bare metal clouds, migrating backups is much simpler because the complexity of the virtualization layer does not exist. A Linux server is a Linux server, and restorations on new hardware are easily managed.
This incident and others like it shine a light on the distance between the cloud’s theoretical advantages and the real-world cloud systems we are able to use. In theory, cloud platforms are multiply redundant across servers, data centers, and geographical locations. In practice, unless you have the considerable expertise to build and manage that sort of redundancy, your virtual servers are going to be sitting on one or more physical servers in the same data center — which means they’re not any more redundant than a bare metal cloud. And given the virtualization tax and the other issues with virtualized platforms we’ve discussed before, that makes bare metal clouds look pretty good in comparison: where they’re better, as with performance and reliability, they’re much better. Where they appear worse, the difference is often only theoretical because cloud platforms fail to deliver on their promises of endless flexibility and redundancy.