Balancing Low Latency, High Availability and Cloud Choice
Cloud hosting is no longer just an option — it’s now, in many cases, the default choice.
But the cloud computing market, having grown to a whopping $483.9 billion, may finally be starting to slow down in the face of companies refusing to fully abandon their in-house data centers.
Why are they refusing?
Because they’ve realized that a 100%-cloud approach is good for certain things, but not for others.
Let’s look at the top cloud computing use cases, the use cases for which cloud probably isn’t the best route available, and the use cases where a hybrid approach may be best.
When is the cloud a good idea?
Before we get into the maybes, let’s briefly examine the cases in which the cloud is ideal:
1. Low usage or over-engineered legacy systems
Until the arrival of cloud hosting, nobody knew how to size anything properly, or they did know but never got the chance to. As a result, IT teams picked hardware somewhat blindly, with a strong bias towards oversizing for the sake of using up the budget, leading to systems running at 10-15% of maximum capacity. These systems are ideal candidates for moving to the cloud: they can be moved onto smaller, cheaper virtual hardware, which frees up their expensive physical hardware for re-use or disposal.
2. Prototypes, experiments, and tests
Development and testing historically involved end-of-life or ‘spare’ hardware. While this looks efficient on paper, scaling-related issues usually surface when testing moves into production. With the cloud, your testing capacity is limited only by your budget, and you can use hardware without having to buy it upfront.
3. When you have no hardware
The third and most obvious cloud use case is when there is no alternative. The cloud allows people with very limited budgets, such as startups and students, to gain access to the kind of hardware we all dreamed about a decade ago.
When is the cloud a bad idea?
Now that we’ve examined the good cloud use cases, let’s look at the bad ones.
1. When you need more than ‘three nines’ of reliability
Volt’s heritage is in the high availability computing space, with many of our customers operating in industries where ‘five nines’ is the industry standard. But public clouds generally don’t offer this level of reliability, for entirely sound commercial reasons. For example, AWS offers ‘three nines’, but only if your application runs with triple redundancy across three separate availability zones in the same AWS region. Even then, AWS can elect to ‘move’ your server to different physical hardware without warning, a process that involves ‘only’ a few seconds of downtime. For most use cases this won’t even be noticed, but for mission-critical applications that require low latency, the effect would range from ‘noticeable’ to ‘dramatic’, which provides the perfect segue to our next section.
2. When you have hard low-latency requirements
Until we figure out a way to bypass the speed of light, mission-critical applications requiring low latency have to live relatively close to their data platforms. Given that light travels at roughly 186 miles per millisecond in a vacuum, and around 124 miles per millisecond in a fibre optic cable, it quickly becomes apparent that when you need less than 10 milliseconds of latency for a round trip, your choice of cloud bit barns may be limited.
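To make the arithmetic concrete, here is a minimal sketch of how a round-trip latency budget bounds data-center distance. It assumes the ~124 miles-per-millisecond figure for light in fiber and ignores routing, queuing, and processing delays, all of which eat further into a real budget:

```python
# Back-of-the-envelope bound on data-center distance for a given
# round-trip latency budget. Real networks add routing, queuing,
# and processing delays on top of raw propagation time.

FIBER_MILES_PER_MS = 124  # approximate signal speed in fibre optic cable

def max_one_way_distance(round_trip_budget_ms: float) -> float:
    """Upper bound on data-center distance (miles) for a latency budget."""
    one_way_ms = round_trip_budget_ms / 2
    return one_way_ms * FIBER_MILES_PER_MS

for budget_ms in (1, 5, 10):
    print(f"{budget_ms} ms round trip -> at most "
          f"{max_one_way_distance(budget_ms):.0f} miles away")
```

Even a generous 10 ms budget caps the one-way distance at roughly 620 miles before any real-world overhead is counted.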
Long-tail latency spikes would break many, if not most, time-sensitive use cases, such as IoT device control. Another latency issue we see in the public cloud is ‘noisy neighbours’, where shared hardware doesn’t deliver a perfectly smooth runtime experience. Our current rule of thumb is to assume no more than 85% of measured capacity will be available in a real-world situation if your use case is latency sensitive; running the system below maximum capacity smooths out 99th-percentile latency spikes.
3. High data transfer costs
The last big ‘gotcha’ with the cloud is transfer costs for sending data to and from a public cloud. Cloud vendors incentivize maximum usage to maximize their profits, and levying hefty tolls on data as it enters and leaves is a part of this. It’s also something IT teams tend to overlook if their dev and test program is 100% cloud but their real-world deployments aren’t.
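As a worked example of how those tolls accumulate, here is a hypothetical sketch. The $0.09/GB flat rate is an assumption for illustration only; real cloud egress pricing is tiered and varies by provider, region, and destination:

```python
# Hypothetical egress-cost estimate. EGRESS_USD_PER_GB is an assumed
# flat rate for illustration; check your provider's tiered pricing.

EGRESS_USD_PER_GB = 0.09  # assumed rate, not any specific vendor's price

def monthly_egress_cost(gb_per_day: float, days: int = 30) -> float:
    """Estimated monthly cost of moving data out of a public cloud."""
    return gb_per_day * days * EGRESS_USD_PER_GB

# A steady 1 TB/day of egress at the assumed rate:
print(f"${monthly_egress_cost(1000):,.2f} per month")
```

At that assumed rate, a seemingly modest 1 TB/day of egress compounds into thousands of dollars a month, which is exactly the kind of line item a cloud-only dev and test program never surfaces.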
When is hybrid cloud the best idea?
As with anything, there’s a middle ground, and in some instances a hybrid solution may be best: most of your applications run 100% in the cloud, while some remain in-house.
These are the main advantages of using such a ‘hybrid cloud’ approach.
1. Handling surge capacity
Volt has several customers with wildly varying workloads, and on paper, the ability to spin up instances in the public cloud on a whim seems attractive. But in reality, they usually need certainty, and the terms of service for public cloud providers make it fairly clear that there is no guarantee they will give you more hardware on demand unless you pay to reserve it first.
While ‘reserved instances’ are cheaper than ‘on demand’, reserving capacity defeats the whole point of elasticity. The alternatives are to either ‘provision and pray’ that extra capacity will always be available when you need it, or adopt a hybrid cloud strategy by buying your own servers and running them in-house. With TCO becoming increasingly visible, the latter might actually be the better path.
2. Other people’s skills — and incentives
One of the headline reasons for running ‘something as a service’ is that you can avoid having to hire and retain the skills in-house. There’s a degree of truth to this, and an awful lot of the drudgery will be taken off your hands.
But you will still need expertise. Many products, especially data platforms, require expert knowledge to use efficiently. Where does that knowledge come from if you aren’t going to hire it yourself? If you need to make things more efficient, who do you ask? Cloud vendor employees also have a strong incentive to get you to use as much ‘stuff’ as possible when your goal is to pay for as little ‘stuff’ as possible.
By adopting a ‘hybrid cloud’ strategy and keeping some of your equipment in-house, you can make sure you have the skills and incentives needed to get the best results and ask the awkward questions when needed. Note that about 90% of data platform systems are underused and can be parked in the cloud, so we’re only talking about a small percentage of the overall hardware.
3. Developer freedom
An optimist would say that the cloud allows you to rapidly create large, elaborate systems that solve big problems quickly. A pessimist would say that this encourages bloat and inefficiency. The truth lies somewhere in the middle. This is arguably a management and incentive issue as much as a technical one, as developers need to be incentivized to retain visibility into how much things are costing. One reliable way to control costs is to adopt a hybrid cloud strategy for mission-critical systems: buy the hardware and insist that the application be made to run on it. This will increase development costs but slash operating costs.
4. Potentially more control over latency and downtime
This is a controversial point, but one which bears consideration. Surely one of the big advantages of using your local public cloud behemoth’s bit barn is that they guarantee it will stay up? Well, truth be told, they don’t. The industry standard for uptime guarantees is 99.9% (i.e. ‘three nines’). Many of Volt’s customers are in spaces that require ‘five nines’ of uptime. To achieve that, they use the public cloud and combine it with Volt’s Active(N) to replicate the system across multiple data centers, boosting availability to 99.999%.
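The gap between those two targets is easy to quantify, since an availability percentage translates directly into a yearly downtime budget:

```python
# Downtime budgets implied by common availability targets over one
# (non-leap) year: 'three nines' allows roughly 8.8 hours of downtime
# per year, while 'five nines' allows only about 5.3 minutes.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of allowed downtime per year for a given availability."""
    return MINUTES_PER_YEAR * (1.0 - availability)

for label, target in (("three nines", 0.999), ("five nines", 0.99999)):
    print(f"{label}: {downtime_minutes_per_year(target):.1f} minutes/year")
```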
A second issue is that instances get moved around in the public cloud. This is done by freezing processing, rapidly copying the contents of RAM to a new server, re-mounting the file systems so they point at the new server, and then resuming processing. This may only take a second, but it’s disruptive and could be avoided if you owned the hardware in question.
Like so many other things in tech, choosing cloud, on-prem, or hybrid is a very nuanced decision involving many different factors — one that depends on your budget, resources, time, and of course the nature of your applications and what you want to achieve with them.
As discussed, cloud start-up costs are much lower, and you can quickly gain a lot of capability, but this doesn’t come without risks and compromises.
That said, if you run mission-critical applications that require low latency and high reliability, as so many Volt customers do, then the cloud probably isn’t the best choice.