Jepsen Test Validates ACID Nature of Volt Streaming Capabilities

Volt Active Data (Volt) is a real-time data platform that combines high-speed data processing, in-memory storage, and ACID-compliant transactions. Volt’s outbound streaming functionality (also known as Export), a key part of its distributed database system, relies on several components to move data efficiently.

We recently collaborated with Jepsen to test the robustness and reliability of Volt’s export mechanisms. The analysis helped us validate our latest fixes for streaming problems identified by our customers and reinforced the strength of Volt’s export functionality and internal testing strategies. 

Why Jepsen Testing?

If you aren’t familiar with Jepsen testing (although we’re guessing you are if you’re reading this), Jepsen is a framework for testing the safety of distributed systems under injected faults. We had a great experience in 2016, when Jepsen tested the Volt database functionality, and we needed an update in 2023 because of an incident at a customer site.

A major customer found an atomicity bug in our export system last year. They discovered that data streamed out from Volt did not match the data within Volt, as it should have. After fixing the bug, we decided to improve our testing by building a new Jepsen-based test that validates the outbound streaming functionality in addition to the core product functionality. The Jepsen test validated the original fix, and it helped us find and fix a second, less likely bug.

Test Design

Our software development team engaged Kyle Kingsbury, the creator of Jepsen, to train the team and to consult on building Jepsen tests for streaming. We then set out to build our own test of Volt’s outbound streaming functionality to verify its guarantees.

Volt’s export capability allows for several ways to stream data, including explicit and implicit streams. Explicit export streams are defined with their own schema. Custom procedures run transactions that write atomically into a Volt table (or tables) and into an export stream (or streams). Once a transaction is committed, its data should be persisted simultaneously within Volt and within the export streams. The export stream can then asynchronously deliver the data to an export target. Implicit streams are defined at table creation to capture table state changes: the table definition itself specifies the export target where the data is sent, and when the table is updated, a record of that state change is also sent to the export stream.
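To make the atomic write described above concrete, here is a toy in-memory model in Python. It is only a sketch of the all-or-nothing behavior, not Volt’s implementation or API; the `ToyStore` class, its `run_transaction` method, and the `fail` flag are hypothetical names invented for this illustration.

```python
class ToyStore:
    """Toy stand-in for one table plus one export stream (not Volt's API)."""

    def __init__(self):
        self.table = {}    # committed table rows, keyed by record id
        self.stream = []   # committed export records, in commit order

    def run_transaction(self, writes, fail=False):
        """Apply every write to both the table and the stream, or to neither.

        `writes` is a list of (record_id, value) pairs.
        `fail=True` simulates a crash before the commit point.
        """
        # Stage the changes on copies so nothing is visible before commit.
        staged_table = dict(self.table)
        staged_stream = list(self.stream)
        for record_id, value in writes:
            staged_table[record_id] = value
            staged_stream.append((record_id, value))
        if fail:
            return False  # staged copies are discarded
        # Commit point: table and stream change together.
        self.table, self.stream = staged_table, staged_stream
        return True


store = ToyStore()
store.run_transaction([(1, "a")])
store.run_transaction([(2, "b")], fail=True)  # simulated crash
print(store.table)   # {1: 'a'}
print(store.stream)  # [(1, 'a')]
```

The failed transaction leaves no trace in either structure, which is the property the export tests look for: a record that appears in only one of the two places indicates an atomicity violation.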

The graphic below shows an example of data coming in from a data source (this can be any event source, a user application, or another streaming platform). Stored procedures can use SQL to insert records into tables and/or export streams. The exported records are then written to an external system (such as another database, CSV files, or another streaming platform).

[Figure: data flows from a data source through Volt stored procedures into tables and export streams, and from the export streams to an external system.]

The export Jepsen tests utilized both implicit and explicit export streams to validate Volt’s export functionality.

We considered the following scenarios:

1. A single node of a Volt cluster repeatedly crashing and recovering. 

The Linux “kill -9” command was used to terminate the node. In this situation, ownership of a partition replica moves from the failed node to a surviving node.

2. Crashing and recovering an entire cluster.

3. Isolating a Volt node with a network partition.

Network communications between Volt nodes are blocked with Linux’s “iptables” command. Once the coordinator on the node identifies such a situation, the Volt node is terminated to avoid having multiple replicas processing data simultaneously (a.k.a. ‘split brain’). After the network is restored, the node must be restarted and will rejoin the cluster, receiving up-to-date data from the surviving replica.
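The fault scenarios above can be sketched as small helpers that build the shell commands a test harness would run on a node. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the exact “iptables” rules used in our tests are not given here, so the rules below are just one common way to block traffic between peers.

```python
def kill_command(pid):
    """Command that terminates a process immediately with SIGKILL."""
    return ["kill", "-9", str(pid)]


def isolate_commands(peer_ips):
    """iptables rules that drop traffic to and from each peer address.

    Assumption: blocking by source and destination IP is enough to
    simulate the partition; a real test may use different rules.
    """
    commands = []
    for ip in peer_ips:
        commands.append(["iptables", "-A", "INPUT", "-s", ip, "-j", "DROP"])
        commands.append(["iptables", "-A", "OUTPUT", "-d", ip, "-j", "DROP"])
    return commands


def heal_commands():
    """Flush all rules in the filter table, which removes the DROP rules."""
    return [["iptables", "-F"]]


# Print the commands instead of running them; running them requires root.
for command in [kill_command(4242)] + isolate_commands(["10.0.0.2"]) + heal_commands():
    print(" ".join(command))
```

Building the commands as argument lists keeps them easy to inspect, and they can be passed to `subprocess.run` on the target node without shell quoting issues.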

The final step was validating that Volt operated correctly, based on the transactions recorded by Jepsen and on the data within Volt and within the external system. Note that the external store and Volt are independent; Jepsen tests cannot coordinate requests across them. Hence, even if Volt violates its safety requirements, no conclusion can be drawn from outside of Volt until the exported data is guaranteed to be present in the external store.

Note that the Jepsen framework allows various additional systems to be specified as external stores for Volt. The simplest store to configure for the export streams is a file system, so our Jepsen tests export Volt data into a set of comma-separated values (CSV) files. This choice of a simple external store does not affect the generality or effectiveness of the developed Jepsen tests.
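As an illustration of how a checker might collect what reached the external store, the following sketch reads record ids out of several CSV files. It makes assumptions that are not taken from our test code: the function name `read_exported_ids` is hypothetical, the id is assumed to be an integer in a known column (`id_column`), and the real export files may carry additional columns.

```python
import csv
import io


def read_exported_ids(csv_files, id_column=0):
    """Collect the record ids found in a set of exported CSV files.

    `csv_files` is an iterable of open text files (or file-like objects).
    Using a set is a simplification: repeated rows collapse into one id.
    """
    ids = set()
    for csv_file in csv_files:
        for row in csv.reader(csv_file):
            if row:  # skip blank lines
                ids.add(int(row[id_column]))
    return ids


# In-memory stand-ins for two export files.
file_a = io.StringIO("1,alice\n2,bob\n")
file_b = io.StringIO("3,carol\n\n2,bob\n")
print(sorted(read_exported_ids([file_a, file_b])))  # [1, 2, 3]
```

In a real run, the file-like objects would be replaced by the files in the export directory, opened only after the export streams have finished delivering their data.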

Volt Export Guarantees

Atomicity. Volt treats each transaction as a single unit, ensuring that either all operations within a transaction complete successfully or none do. This same guarantee covers writes to database tables as well as to export streams. This property of Volt leads to two important requirements for export functionality.

  • GUARANTEED DELIVERY (NO LOST RECORDS)

As soon as a transaction is completed, the record must be scheduled and eventually delivered to the export stream to be consumed by external applications. 

  • NO PHANTOM RECORDS

Export streams must not contain records that were not properly processed and committed to the Volt database.
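The two requirements above reduce to set comparisons once a test has finished and the export has fully drained. The sketch below is a simplified illustration, not our actual checker: `check_export` and its arguments are hypothetical names, and records are reduced to integer ids.

```python
def check_export(acknowledged, in_database, exported):
    """Compare three views of the same records after a test run.

    acknowledged: ids whose transactions the client saw succeed
    in_database:  ids read back from the database at the end of the test
    exported:     ids found in the external store
    """
    acknowledged = set(acknowledged)
    in_database = set(in_database)
    exported = set(exported)

    # Anything acknowledged or readable in the database counts as committed.
    committed = acknowledged | in_database

    lost = sorted(committed - exported)       # violates guaranteed delivery
    phantom = sorted(exported - in_database)  # violates no phantom records
    return {"lost": lost, "phantom": phantom, "valid": not lost and not phantom}


result = check_export(
    acknowledged={1, 2, 3},
    in_database={1, 2, 3, 4},  # 4 committed although the client never saw a reply
    exported={1, 2, 4, 5},
)
print(result)  # {'lost': [3], 'phantom': [5], 'valid': False}
```

Record 4 shows why the database read matters in addition to the client history: a transaction whose reply was lost during a fault may still have committed, so it must appear in the export even though the client never saw it succeed.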

Durability for Exported Data. Once a transaction is committed in Volt, it remains so even in the event of a system failure, ensuring the permanence of transaction changes. Similarly, an external application consuming export streams might fail, or a network might encounter a problem. Exported data must remain durable until the external faults are resolved. Hence, “guaranteed delivery” is satisfied by the durability requirement on the Volt export.

Consistency. Volt maintains consistency by ensuring all transactions transform the database from one valid state to another, adhering to data integrity and business rules. Correspondingly, a consistency-like property for the export data follows from the consistency property for the data in the database, because Volt data is sent to the export streams at the same time as transactions are committed.

Isolation. Transactions in Volt are processed in isolation. Similarly, export streams and external applications see records only after they are committed to the database without interference from any other transactions.

Results

Volt engineers ran multiple test variations. 

1. Explicit streams and injected crash faults to Volt nodes

This configuration had shown a “Phantom Records” defect at the customer’s installation and during our previous testing. In both cases, we could demonstrate that occasionally, during a node crash, a record could be written to the export stream but not persisted in the Volt database. This defect was fixed in V12.3.1. However, when run on the unpatched version, V12.3.0, the Jepsen test that most closely mimicked the customer’s scenario did not reproduce the “Phantom Records” defect. This shows that even a sophisticated testing framework can fail to uncover a significant defect in a distributed database.

2. Explicit streams and injected partition faults to a Volt node

We modified our initial Jepsen tests slightly to inject network partitions of Volt nodes during operation. Such a fault causes the partitioned node to shut down shortly afterward, while Volt continues operating on the remaining nodes with reduced k-safety. This configuration did reproduce the “Phantom Records” problem. The errors were seen only with Volt V12.3.0, which did not contain the fix. The same Jepsen test configuration run on V12.3.1 yielded no “Phantom Records.” This testing confirmed that we had fixed the “Phantom Records” issue.

3. Jepsen test with implicit export streams and injected crash faults to Volt

We discovered a “Guaranteed Delivery” defect in this configuration. In this case, Volt was set to run export via implicit streams by specifying export targets when tables were created. We found that some records might never be delivered from the export streams to the external system when Jepsen tests ran for more than 5 to 10 minutes with at least 5 to 10 cycles of fault injection and restoration. Shorter tests did not report any failures. A new product defect ticket was created, and the defect was fixed in Volt V13.1.0. Additional Jepsen testing of this version did not show any “phantom” or “missing” records in either implicit or explicit streams. This failure also showed that our previous testing might not have subjected Volt to enough fault cycles and therefore missed this defect.

Conclusion

We’re very excited by these tests because they demonstrate that Volt’s ACID guarantees now extend to our streaming capabilities. We continue to run our older Jepsen tests for the database and these newer export tests nightly, and we fix any defects or regressions those runs uncover.

David Rolfe