I encountered a cloud problem this week. A couple of years ago I would have said that I encountered a server hosting problem, but we must be fully buzz-word compliant. It seems there are two main kinds of clouds, but I am predicting one will disappear in the long run. Don’t say I didn’t warn you.
Is the night chilly and dark?
The night is chilly, but not dark.
The thin gray cloud is spread on high,
It covers but not hides the sky.
The moon is behind, and at the full;
And yet she looks both small and dull.
The night is chill, the cloud is gray:
’Tis a month before the month of May,
And the Spring comes slowly up this way
– Samuel Taylor Coleridge
Real “cloud computing” is best described by Nicholas Carr in his book “The Big Switch”. The cloud is simply a new way to deploy and access equipment. Instead of buying their own compute hardware, a company will simply sign up for compute service, and pay by the CPU cycle. The analogy is to power, and how factories used to have their own generators, and later they could just buy electricity from a utility company, and let the professionals run the generators.
Since the discussion of cloud computing began, two very distinct camps have emerged:
- The elastic cloud camp: including Google, Salesforce, Amazon, and (possibly) Microsoft Azure.
- The discrete host camp: including VMware, Rackspace, and dozens of traditional hosting companies now advertising that you can get cloud computing from them.
Both camps allow you to rent hardware instead of having to purchase it up front, so technically they both fit Carr’s definition, but there are big differences. The elastic cloud camp requires you to redesign your software somewhat, particularly with regard to persistence. The discrete host camp offers environments identical to what you would have in your own IT center, which is familiar, and does not require rewriting applications.
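To make the persistence point concrete, here is a minimal Python sketch of what that redesign might look like. Every name in it (`PageStore`, `LocalDiskStore`, `KeyValueStore`) is hypothetical, and the plain dictionary is only a stand-in for a managed data service; it is not the API of Google, Amazon, or any other provider. The idea is simply that the application codes against a small storage interface instead of assuming a local disk.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional, Protocol


class PageStore(Protocol):
    """Hypothetical storage interface the application codes against."""

    def save(self, name: str, text: str) -> None: ...

    def load(self, name: str) -> Optional[str]: ...


class LocalDiskStore:
    """Discrete-host style: pages live on the disk of one particular machine."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, name: str, text: str) -> None:
        (self.root / name).write_text(text, encoding="utf-8")

    def load(self, name: str) -> Optional[str]:
        path = self.root / name
        return path.read_text(encoding="utf-8") if path.exists() else None


class KeyValueStore:
    """Elastic style: pages live in a shared service, not on the instance.

    The dict passed in is an assumption standing in for that remote service.
    """

    def __init__(self, backend: Dict[str, str]) -> None:
        self.backend = backend

    def save(self, name: str, text: str) -> None:
        self.backend[name] = text

    def load(self, name: str) -> Optional[str]:
        return self.backend.get(name)


if __name__ == "__main__":
    # Two "instances" sharing one backend both see the same page.
    service: Dict[str, str] = {}
    KeyValueStore(service).save("Home", "Welcome")
    print(KeyValueStore(service).load("Home"))

    # Two "machines" with separate disks do not.
    with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
        LocalDiskStore(Path(old)).save("Home", "Welcome")
        print(LocalDiskStore(Path(new)).load("Home"))
```

In the first case a freshly started instance finds the page because nothing was tied to the machine that wrote it; in the second, the page stays behind on the old disk unless someone copies it over as part of the move.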
The problem I ran into is with the company that hosts the web site XPDL.org: MochaHost. This is an incredibly inexpensive hosting company; I dare say it is too inexpensive. They have some plans that will give you a complete (virtual) server for $1.95 a month! For less than $24/year you can host a modest server. But you can hardly expect good support. Taking a single support call from you might ruin their profits for the entire year! The business plan is obviously centered on getting large numbers of people signed up automatically: the servers are all self-managed, they run the systems reliably, and if they never have to talk to any of their customers, they will make some money.
The problem with discrete hosting is that eventually you need to reconfigure or move servers, and since there are real settings which are tied (at some level) to hardware, changes have to be made whenever the services are reallocated to different hardware. Of course, all the changes should be automatic, but that can never be guaranteed. We received information that the XPDL.org host would be moved to new hardware, and on Saturday at 3pm it was. In doing so, they failed to get the destination server properly configured. The key wiki server was not running, several other services were not available, and to top it all off, the “Control Panel” which allows one to configure the server was not allowing anyone to log in. For all intents and purposes, the server was “down” at that point. At the time of writing this, 5 days later, it is still down. It has proven impossible to actually get someone in support to look at the problem; the excuse being given is one of security and being sure that the requests for support are coming from an authorized person. Even though it is obvious that the server is failing, any excuse to avoid work is used. It occurred to me that part of their business model is to weed out and eliminate anyone who needs support: it is better for profits to simply lose any customer who needs support, and focus exclusively on customers that *somehow* never need help.
About 8 months ago we had a similar problem with MochaHost. Some support person was looking at the (shared) server at exactly the time that a web robot happened to be indexing the site. They noticed exceptional CPU usage, and blocked the web server. It took 5 days to get to the right person to get the site unblocked. There was never anything wrong with the software running there; they simply decided on a whim to turn it off. Like I said, too inexpensive.
The change from discrete hosts to elastic cloud is a classic disruptive technology change. Thousands of IT managers are loath to change their applications to go to elastic cloud, so they opt for the easier discrete host cloud. The discrete host cloud offers short term advantages, at a much higher long term cost. The traditional server OS was not designed to be easily moved around and rescaled on demand. To do this effectively, there are many configuration settings that all have to be updated in concert, and it is not likely that this will happen. So such hosting options will always be faced with interruption.
However, the elastic cloud was designed from the beginning precisely to allow for this kind of flexibility. Yes, you have to rewrite a bit, but the benefit is that you never have to worry about the kinds of problems we encountered this week. There simply is less to go wrong, because the platform was designed from the beginning for this kind of change.
The end of the story is that it turns out to be easier to move the data to a new account at Google’s cloud than it is to find the right support person at MochaHost. I have already moved some of the services, and will follow through with the rest soon. This whole experience is a kick in the pants to go ahead and re-code for an elastic cloud. I expect that designing for the elastic cloud will not only reduce the time spent configuring the site, but also eliminate the possibility that hardware moves in the future will interrupt service. The discrete host cloud is doomed in the long run. Of course, last week Amazon did experience a major outage, so even the elastic cloud is not perfect. Still, it is time to get over the barrier, and move to the “real” cloud platform of the future, or risk being left behind.