The Internet of Things (IOT) is rapidly expanding the number of streaming applications as people sensor tag an increasing number of objects of significance. This adds to the historical streaming applications we have observed in financial services, the military, on-line gaming and elsewhere. In this blog post, we classify such streaming applications according to Table 1.
The vertical axis represents the latency requirements of the streaming application. The bottom row represents applications without severe requirements. In other words, multi-second response time is acceptable. The notification of you being tagged in a Facebook picture is an example of an application without demanding latency requirements. The top row represents demanding temporal requirements. For example electronic trading applications require event notifications in small numbers of milliseconds, so they can take action before others can react to the event at hand. Many financial services, gaming and military applications fall into this category.
Any given application has latency requirements and falls somewhere on a temporal spectrum, which we have shown in Table 1 as two boxes for simplicity.
The horizontal axis represents a totally different dimension of streaming applications. Consider the trading applications discussed above. If an event is lost, corrupted, inadvertently discarded or processed twice, no great harm is done. If a processing node or the application crashes, then there is a lost opportunity while the application is down. However, over the course of a trading day, there will be many other possibilities. Such applications have messages that are not that important, and the messaging system does not have to be held to demanding standards.
In contrast, consider a second streaming application concerning electronic trading. Here, an example company has multiple electronic trading systems, say in Tokyo, London and New York. A risk management application wants to assemble the real time position of the whole company, for or against all stocks that are traded. The purpose is to “ring the red telephone” if risk exposure exceeds a specific threshold. This is an application where “exactly once semantics” is required. Message loss or corruption renders the whole application invalid. A crash renders the whole application similarly invalid, and High Availability (HA) must be provided. Hence, every message really matters. On the right hand side we see applications that require some-to-all of the following features:
- Exactly-once message processing
- High availability
- Ability to store significant amounts of state (e.g. trading positions) reliably
To support such features requires both replication and transactions, or some construct that is equivalent.
Of course there are intermediate cases between “unimportant” and ”important”. As a result, both axes of Table 1 are a continuum, so Table 1 should be considered a discretization of the actual real world.
One last point should be noted. It is always possible to provide the features above through application heroics. For example, it is possible to provide HA on top of a system without HA by implementing replication, failover, and failback in user-level logic. As I have observed on several occasions, this is “rocket science” code that is difficult, tedious and error-prone. Hence, in Table 2 which puts systems into buckets, we assume that such heroics are not used, and we simply report on system features that are available “out of the box”.
In the lower left hand box are batch parallel frameworks such as Hadoop and Spark Streaming. They will work fine as long as you don’t need the answer right away. Neither supports transactional guarantees on data processing, so it is possible to lose or misprocess messages.
Real time complex event processing systems (CEP) fall in the upper left hand box. Most expect to react with sub-second response time. However, transactional guarantees on message processing are typically absent, often replaced by some lesser guarantee. HA is similarly often absent or requires substantial programming.
Transactional systems which do not operate in real time include most SQL DBMSs. The legacy systems from Oracle, IBM and Microsoft fall in this category, as sub-second response time is difficult to guarantee because of their legacy architectures. All data warehouse products are also in the lower right hand corner.
In the upper right hand box are transactional systems that support HA and operate with millisecond response times. A good example of this class of product is Volt Active Data.
A final concluding remark can be made about Table 2. Consider the throughput (in messages per second) of the products mentioned. In head-to-head comparisons, Volt Active Data outperforms or is comparable to every product indicated. As a result, the following statement can be made:
You can move to the upper right hand corner with no performance loss!!!!
As a result, any sane consumer should run in the upper right hand corner, as there is no performance penalty for transactional guarantees. Even if you don’t need transactions now, you might want them later. Your successors will probably thank you for adopting a transactional architecture.