Editor’s note: This post is re-posted here by permission from the author. The original post was on LinkedIn on January 10, 2018.
The world has changed massively in the past 20 years. Back in the year 2000, a few million users connected to the web using a 56k modem attached to a PC, and Amazon only sold books. Now billions of people use their smartphones or tablets 24×7 to buy just about everything, and to interact with Facebook, Twitter and Instagram. The pace has been unstoppable.
Expectations have also changed. If a web page doesn’t refresh within seconds, we’re quickly frustrated and go elsewhere. If a website is down, we fear it’s the end of civilisation as we know it. If a major site is down, it makes global headlines.
Instant gratification takes too long! – Ladawn Clare-Panton
What’s Changed?
The above leads to a few observations:-
- Scalability: With potentially explosive traffic growth, IT systems need to grow quickly to handle exponentially increasing transaction volumes
- High Availability: IT systems must run 24×7, and be resilient to failure. (A failure at Bank of America in 2011 affected 29 million customers over six days).
- High Performance: In tandem with incremental scalability, performance must remain stable, and fast. At the extreme end, Amazon estimates it loses $1.6B a year for each additional second it takes a page to load.
- Velocity: As web-connected sensors are increasingly built into machines (your smartphone being the obvious one), transactions can arrive at rates of millions per second.
- Real Time Analytics: Nightly batch processing and Business Intelligence are no longer acceptable. The line between analytic and operational processing is becoming blurred, and increasingly there are demands for real-time decision making.
The Internet of Things is sending velocity through the roof! – Dr Stonebraker (MIT).
These demands have led to the truly awful marketing term Translytical Databases, which refers to hybrid systems that handle both high-throughput transactions and real-time analytics in a single solution.
What’s the problem?
The challenge faced by all database vendors is to provide high performance solutions while reducing costs (perhaps using commodity servers). But there are conflicting demands:-
- Performance – To minimise latency and process transactions in milliseconds.
- Availability – The ability to keep going, even if one or more nodes in the system fail, or are temporarily disconnected from the network.
- Scalability – The ability to incrementally scale to massive data volumes and transaction velocity.
- Consistency – To provide consistent, accurate results – particularly in the event of network failures.
- Durability – To ensure changes once committed are not lost.
- Flexibility – Providing a general purpose database solution to support both transactional and analytic workloads.
The only realistic way to provide massive incremental scalability is to deploy a scale-out distributed system. Typically, to maximise availability, changes applied on one node are immediately replicated to two or more others. However, once you distribute data across servers, you face trade-offs. To learn more about the trade-offs that come with distributed systems, as well as different solutions, check out my new ebook, Oracle vs. NoSQL vs. NewSQL: Comparing Database Technology.
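To make that trade-off concrete, here is a minimal, hypothetical Python sketch (the class names and behaviour are illustrative assumptions, not the design of any particular database) of synchronous replication: a write is pushed to every replica, and when one replica is unreachable the system must either reject the write (favouring consistency) or acknowledge it with fewer copies (favouring availability).

```python
# Toy model of synchronous replication across nodes.
# Illustrative only; real databases use quorums, consensus protocols, etc.

class Replica:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}

    def write(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


class ReplicatedStore:
    """Applies each write to every replica before acknowledging it."""

    def __init__(self, replicas, require_all=True):
        self.replicas = replicas
        self.require_all = require_all  # True = favour consistency over availability

    def put(self, key, value):
        acks = 0
        for replica in self.replicas:
            try:
                replica.write(key, value)
                acks += 1
            except ConnectionError:
                if self.require_all:
                    # Consistency first: refuse the write if any copy is missing.
                    raise RuntimeError("write rejected: a replica is down")
        # Availability first: accept the write with however many copies succeeded.
        return acks


if __name__ == "__main__":
    nodes = [Replica("node-a"), Replica("node-b"), Replica("node-c")]
    store = ReplicatedStore(nodes, require_all=False)
    nodes[2].alive = False  # simulate a node failure
    copies = store.put("order:42", {"total": 99.95})
    print(f"write acknowledged with {copies} of {len(nodes)} copies")
```

Real systems handle this far more gracefully with quorum writes or consensus protocols, but the underlying tension between consistency, availability and latency is exactly the one described above.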
About the Author
John Ryan is an experienced Data Warehouse architect, designer, developer and DBA. Specializing in Kimball dimensional design on multi-terabyte Oracle systems, he has over 30 years’ IT experience in a range of industries as diverse as Mobile Telephony and Investment Banking. Follow him on LinkedIn for future articles.