Introducing Volt Active(ISU): Zero-Downtime Upgrades to Improve the Engine While It’s Running

Enterprise customers that require ‘five nines’ availability need it as a guarantee, not just a vague promise. That’s why we’ve introduced Volt In-Service Upgrade, “Active(ISU)”, which allows you to upgrade clusters without having to shut anything down. The features of the new version become active once all the nodes have been updated. Active(ISU) will become generally available with Version 13.1 as a separately licensable option.
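To put ‘five nines’ in concrete terms, the short sketch below converts an availability percentage into a yearly downtime budget. It is plain arithmetic rather than anything specific to Volt, and the function name is ours, chosen for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_budget_minutes(availability: float) -> float:
    """Maximum downtime per year, in minutes, for a given availability (0..1)."""
    return MINUTES_PER_YEAR * (1.0 - availability)


for label, availability in [
    ("three nines", 0.999),
    ("four nines", 0.9999),
    ("five nines", 0.99999),
]:
    print(f"{label}: {downtime_budget_minutes(availability):.2f} minutes per year")
```

At 99.999% the budget is a little over five minutes a year, which leaves essentially no room for an upgrade that requires shutting the cluster down.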

WHAT’S REQUIRED FOR ACTIVE(ISU) AND HOW DOES IT WORK?

In order to use Active(ISU), your system must have at least one spare copy of every data item, or be ‘k=1’ in Volt terminology. Ideally it would be ‘k=2’, with two spare copies of everything. This is because a node will be offline while Active(ISU) is running, with a consequent loss of redundancy.
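The redundancy trade-off can be shown with a deliberately simplified model. It assumes each data item has k + 1 copies in total and looks at the worst case, where every offline node held one of those copies. The function names are hypothetical and this is not Volt’s actual placement logic.

```python
def copies_online(k: int, nodes_offline: int = 0) -> int:
    """Copies of a data item still online in the worst case.

    Assumes k spare copies plus the original (k + 1 in total), and that
    every offline node held one copy of the item.
    """
    total_copies = k + 1
    return max(total_copies - nodes_offline, 0)


def spare_copies_during_upgrade(k: int) -> int:
    """Spare copies left while exactly one node is down for its upgrade."""
    return max(copies_online(k, nodes_offline=1) - 1, 0)


for k in (0, 1, 2):
    print(
        f"k={k}: copies online during upgrade = {copies_online(k, 1)}, "
        f"spares = {spare_copies_during_upgrade(k)}"
    )
```

Under this model, k=0 leaves no copy online for the data on the node being upgraded, k=1 keeps the data available but with no spare, and k=2 still tolerates an unplanned failure during the upgrade.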

Note that the node will only be offline for the time it takes to bring it down, apply the patch, bring it back online, and have it catch up and rejoin the cluster. While internally this could take a couple of minutes, clients will see only a brief (1-3 seconds) latency spike when the node leaves, after which everything continues as normal without the node in question, and then another spike when it rejoins. Processing continues while the node is being upgraded, akin to working on an engine while it’s running.

The key thing to understand is that once the node has come back, none of the new functionality is visible or used initially. It’s only when all the nodes in a cluster have been updated that they all switch versions at the same time. 
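The “install everywhere first, switch together afterwards” behavior described above can be sketched as a toy simulation. The `Node` and `Cluster` classes and the version labels are hypothetical illustrations of the idea, not Volt’s implementation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    installed_version: str
    online: bool = True


class Cluster:
    def __init__(self, names, version):
        self.nodes = [Node(name, version) for name in names]
        self.active_version = version  # the version whose features are in use

    def upgrade_node(self, node, new_version):
        node.online = False                  # node leaves; the rest keep serving
        node.installed_version = new_version
        node.online = True                   # node catches up and rejoins
        # New features activate only once every node has the new version.
        if all(n.installed_version == new_version for n in self.nodes):
            self.active_version = new_version


def rolling_upgrade(cluster, new_version):
    """Upgrade one node at a time; record the active version after each step."""
    history = []
    for node in cluster.nodes:
        cluster.upgrade_node(node, new_version)
        history.append(cluster.active_version)
    return history


cluster = Cluster(["node1", "node2", "node3"], "old")
print(rolling_upgrade(cluster, "new"))
```

In this model the active version stays at the old release after the first two nodes are patched and changes only when the last node rejoins.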

While Active(ISU) works for dot releases, and we’ll try to make it work for major releases, we can’t guarantee it will work for every future major release. Active(ISU) is also a separately licensable feature.

WHAT ELSE DOES VOLT DO TO MAKE “FIVE NINES” A REALITY?

Don’t forget that Volt also does all of the following to make five nines a reality:

  1. Async developer APIs to ensure your client thread isn’t waiting for someone else. This is a non-obvious but important feature. If your client code needs to respond within a certain SLA, synchronous APIs make it almost impossible to guarantee a timely response, because the code’s control flow is blocked waiting for a response that may never come. The classic real-world example of this problem being solved is when you try to make a phone call and get a ‘system not working’ message instead of a prolonged silence.
  2. Multiple up-to-date copies of data mean that losing a node isn’t a problem. In Volt, we provide high availability by getting more than one node to run each and every request, while remaining perfectly synchronized. This means that if a node dies, there is another perfectly up-to-date copy of its data online somewhere, ready for use.
  3. Silent rejoins so that when a node comes back you won’t see a disruption. When a node that was offline comes back, it asks other nodes in the cluster to silently send it the data it needs to work, and eventually catches up and rejoins as a first-class member. This happens without any material impact on client behavior or performance. 
  4. Elastic add and shrink so you can change the size of the cluster without users ever perceiving downtime.
  5. Double-, triple-, or even quadruple-active cross data-center replication via Active(N) so you always know what’s lost when conflicts between updates are resolved. Active(N) also allows different geographical sites to run different versions of Volt and for one site to make a schema change in advance of the others.
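To illustrate the first point in the list, here is a minimal sketch of a client that enforces its own deadline on an asynchronous call and returns a definite answer either way. It uses Python’s standard `asyncio` library rather than the Volt client API, and `call_database` is a hypothetical stand-in for a real request.

```python
import asyncio


async def call_database(delay_seconds: float) -> str:
    """Hypothetical stand-in for an asynchronous database request."""
    await asyncio.sleep(delay_seconds)
    return "ok"


async def call_with_deadline(delay_seconds: float, deadline_seconds: float) -> str:
    """Return the result, or a clear failure message once the deadline passes."""
    try:
        return await asyncio.wait_for(
            call_database(delay_seconds), timeout=deadline_seconds
        )
    except asyncio.TimeoutError:
        # Equivalent of the 'system not working' message: fail fast and clearly.
        return "system not working"


def run(delay_seconds: float, deadline_seconds: float) -> str:
    return asyncio.run(call_with_deadline(delay_seconds, deadline_seconds))


print(run(0.01, 0.5))  # the request finishes well inside the deadline
print(run(0.5, 0.01))  # the request is abandoned when the deadline expires
```

The point of the design is that the caller, not the slow dependency, decides how long to wait, so the response time stays within the SLA even when the response itself is an error.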

CONCLUSION

Active(ISU) solidifies Volt’s industry lead in providing true high availability to enterprise customers in the telco and IoT spaces, among others. If you want to talk to us about high availability, enterprise SLAs, or processing lots of data in real time, at scale, without compromising on accuracy, consistency, or resiliency, click here to get started.

David Rolfe