Table Of Contents
- Introduction
- NoSQL Misconception #1: I need it because SQL is slow
- NoSQL Misconception #2: I can just use open source and get the best of both worlds
- NoSQL Misconception #3: We want flexibility in our schema, so we need a schemaless NoSQL database
- NoSQL Misconception #4: I need a NoSQL key-value store because we don’t want our server doing application logic
- NoSQL Misconception #5: We only need a simple NoSQL data store today, so we can do that and worry about future complexity some other time
- Conclusion
Introduction
NoSQL databases became “en vogue” sometime around 2012 and sparked a database revolution that led many enterprises to replace their traditional RDBMS technology with NoSQL-based data platforms.
What’s interesting now is how many of these companies are either regretting their decision to do this or finding it extremely difficult to continue with their NoSQL solution without either spending an unreasonable amount of money or taking on an unreasonable amount of risk.
Here at Volt Active Data, we have a lot of customers who implemented systems on other platforms and then came to us after their initial solution disappointed them. When you look into how they ended up with a failed implementation, many common factors emerge, and one of the most common things we see pop up in these companies’ assessment of their failure is some kind of misconception around what NoSQL can or can’t do, often related to misconceptions about what SQL can or can’t do.
Let’s review our real-world experiences with NoSQL shattered dreams.
NoSQL Misconception #1: I need it because SQL is slow
We’ve seen this multiple times: Because legacy databases are slow, people tend to assume that SQL is what’s making them slow.
Now let’s be clear: SQL can be slow. The easiest way to botch SQL is to parse SQL statements each and every time you send them to the database, instead of using ‘prepared statements’. You can also have hours of fun with complex queries and dodgy execution plans. Or if you really want a ‘full employment act’ for your DBAs, use a cost-based optimizer.
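The per-call parsing cost is easy to see for yourself. The following is a minimal illustrative sketch using Python’s built-in sqlite3 module (the table and the loop sizes are our own invention, not from any particular system): the first loop builds a fresh SQL string every time, forcing a parse and plan per call, while the second reuses one statement text with bound parameters, which the driver can cache and reuse.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(i, 100) for i in range(2000)])

# Slow pattern: a unique SQL string per call means the engine must
# parse and plan every statement from scratch.
t0 = time.perf_counter()
for i in range(2000):
    conn.execute(f"SELECT balance FROM accounts WHERE id = {i}").fetchone()
slow = time.perf_counter() - t0

# Prepared-statement pattern: one SQL text, bound parameters; the
# driver caches the parsed statement and reuses it across calls.
t0 = time.perf_counter()
for i in range(2000):
    conn.execute("SELECT balance FROM accounts WHERE id = ?", (i,)).fetchone()
fast = time.perf_counter() - t0
```

On most machines the parameterized loop is noticeably quicker, and as a bonus it is also immune to SQL injection, which the string-building version is not.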
But there’s nothing inherently slow about SQL itself. The reason legacy SQL databases are slow is because their architecture dates from around 1984, when you had a single CPU, minimal RAM, and a disk you could probably see spinning if you took the cover off.
Volt Active Data was built to solve this problem in the context of modern hardware, and it gets around 10x the throughput of a legacy RDBMS on the same hardware because it’s written for modern multi-core CPUs. SQL is not, in itself, the problem.
The second argument we hear against SQL is that “the developers don’t like it”. But as we point out above, some form of schema and structure is needed, and it will rapidly get more complicated as more and more use cases are supported. This is the reason people are adding SQL layers to NoSQL databases. This, in turn, gets us back to the question of how to get SQL to run quickly, which is something we know a thing or two about. Fast SQL has to be baked into the architecture from the very start. Adding a SQL layer to a NoSQL store will, at best, leave you facing the same kind of performance challenges you wanted to escape when you were on legacy RDBMS.
What you should be thinking about around database speed: Transactions that complete quickly, i.e., within 1 to 2 milliseconds.
NoSQL Misconception #2: I can just use open source and get the best of both worlds
Management loves the concept of open source NoSQL technology, not because they want to contribute to the codebase, but because it’s ‘free’.
But open source software is ‘free’ in the same way that a zoo being given a giant panda to mind by the Chinese government is ‘free’. The panda itself doesn’t cost the zoo a thing, but the upkeep, care, and feeding with air-freighted fresh bamboo definitely do, and add up rapidly. The reality is that, in a commercial context, software licenses are just one part of a whole constellation of fees and costs associated with operating a system over its lifetime. So, while you might acquire the software without spending any money, you won’t be using it for free.
And once you’ve acquired the software, you need to support it. With open source, we frequently hear stories of employees volunteering to do the support themselves. While this is good for the resumes of the people involved, it isn’t necessarily in the company’s best interest. By taking support in-house, they are effectively creating a requirement that new hires have the same highly specialized skills as the original in-house volunteer. The alternative is to pay the company that champions the open source offering to provide support, in which case the whole ‘open source’ distinction starts to get a bit vague.
Another factor is the increasing complexity of NoSQL databases. As they address more functionality, they become more and more complex, especially when you start to add subsystems for things like SQL and ACID transactions.
There is also a longer-term issue with open source NoSQL databases. Virtually all the successful ones have now IPO’d, frequently at eye-watering valuations. This means that their future development efforts will inevitably focus on enterprise features and on capturing enterprise customers that have yet to sign deals with them. The rise of the SSPL is clear evidence of this.
What you should be thinking about around database and data platform costs: The long-term TCO of the technology should be both predictable and affordable when compared to the value it creates.
NoSQL Misconception #3: We want flexibility in our schema, so we need a schemaless NoSQL database
Legacy relational databases were incredibly unhelpful when it came to customization and flexibility. But thanks to JSON, it’s now a lot easier to add extra, custom data at the record level, regardless of what data platform you’re using.
That said, in their quest for flexibility, a lot of developers end up discarding the structures they need to process the data. Flexibility at the schema level is a double-edged sword that pushes complexity out from the single stored copy of the data towards multiple client applications, all of which have to agree on what the custom data means. There is also a big, big difference between having optional attributes on things you track and refusing to say what the different kinds of things are in the first place. Sooner or later, you need to nail down the key data structures.
While there are legitimate situations where a very flexible schema is incredibly helpful, such as customization, in reality there is no such thing as a truly “schemaless” database, because all data has at least a high-level structure. If it were true that data had no inherent structure, you’d only need one record: a BLOB containing everything to do with the business.
But in reality, you work with different kinds of ‘things’ that have the same set of attributes, and these things are related to other things. Hence, your data always has a structure even if its individual components have unique attributes.
The fact is that any kind of automated data processing presupposes a working structure. You can have schemaless areas inside the schema, but this needs to be in the context of a broader corporate schema.
What you should be thinking about around schema flexibility: Having the ability to extend records with arbitrary extra data (i.e., JSON) and index those records.
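That middle ground can be sketched concretely. The example below uses Python’s sqlite3 purely as an illustration (it assumes a SQLite build with the JSON1 functions, which recent Python releases bundle; the `devices` table and its fields are hypothetical): a fixed relational core carries an `extra` JSON column for optional per-record attributes, and an expression index makes one custom attribute queryable at normal indexed-column speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Fixed relational core plus a free-form JSON column for optional,
# customer-specific attributes.
conn.execute("""CREATE TABLE devices (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    extra TEXT  -- JSON blob; contents may differ per record
)""")
conn.execute("INSERT INTO devices VALUES (?, ?, ?)",
             (1, "sensor-a", '{"region": "emea", "fw": 4}'))
conn.execute("INSERT INTO devices VALUES (?, ?, ?)",
             (2, "sensor-b", '{"region": "apac"}'))

# An expression index over one JSON path lets queries on that custom
# attribute use the index instead of scanning every JSON blob.
conn.execute(
    "CREATE INDEX idx_region ON devices (json_extract(extra, '$.region'))")

rows = conn.execute(
    "SELECT name FROM devices WHERE json_extract(extra, '$.region') = ?",
    ("emea",)).fetchall()
```

The point is not SQLite specifically; it is that “extendable records” and “indexable records” are not mutually exclusive, and you can have both without abandoning schemas altogether.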
NoSQL Misconception #4: I need a NoSQL key-value store because we don’t want our server doing application logic
Of all the oddball requirements we’ve seen, this is in some ways the strangest. It appears to be a reaction against the unarguable hassle, misery, and pain of working with 3GL SQL manipulation languages like PL/SQL. It usually comes hand in hand with the requirement above that the server support totally schemaless storage.
The thinking is that because stored procedures in general, and PL/SQL in particular, were painful to deal with, all future interactions with the database should be limited to ‘Get’ and ‘Put’ operations. As a former PL/SQL developer, I can sympathize with this approach, but avoiding server-side logic entirely creates other, far worse problems.
If you’re dealing with a simple use case where you have one read of a key/value followed by writing changes, then while you might hit issues with contention and optimistic locking, things will more or less work.
The problems start when you need to read and change multiple keys, say ‘A’, ‘B’ and ‘C’. If you read and change them sequentially you face two major issues:
- Someone might change ‘A’ after you’ve read it but before you read ‘C’, leading to the wrong results.
- The time taken to sequentially read ‘A’, ‘B’ and then ‘C’ before sequentially writing them will blow your SLA.
In short, there is a subset of use cases where you will need to read and modify a set of related values in one step, or write lots and lots of code to clean things up when things go wrong.
NoSQL tends to handle this badly, either by offloading the work onto developers or implementing the same kind of clunky locking techniques that drove us mad when working with legacy RDBMS.
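To make the cleanup burden concrete, here is a toy sketch (the `KVStore` class and the `transfer` helper are hypothetical stand-ins, not any real NoSQL API) of the retry-and-rollback code a client has to write when the store offers only versioned Get and compare-and-set Put:

```python
import threading

# Toy key/value store offering only versioned Get and
# compare-and-set Put -- a stand-in for a Get/Put-only NoSQL API.
class KVStore:
    def __init__(self):
        self._data = {}          # key -> (version, value)
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def cas_put(self, key, expected_version, value):
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False     # someone changed it under us
            self._data[key] = (version + 1, value)
            return True

# Moving units from 'A' to 'C' needs both writes to land together,
# so the client must detect mid-flight changes, roll back, and retry
# -- exactly the code a server-side transaction makes unnecessary.
def transfer(store, src, dst, amount, retries=10):
    for _ in range(retries):
        sv, sval = store.get(src)
        dv, dval = store.get(dst)
        if sval is None or sval < amount:
            return False
        if not store.cas_put(src, sv, sval - amount):
            continue             # src changed under us: retry from scratch
        if store.cas_put(dst, dv, dval + amount):
            return True
        # dst changed after src was already debited: undo, then retry
        nv, nval = store.get(src)
        store.cas_put(src, nv, nval + amount)
    return False

store = KVStore()
store.cas_put("A", 0, 100)
store.cas_put("C", 0, 50)
transfer(store, "A", "C", 10)
```

Even this small sketch leaves failure windows open (the rollback itself can lose a race), which is precisely why pushing multi-key consistency onto every client application scales so poorly.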
What you should be thinking about around consistency and logic: Being able to handle complex transactions involving multiple data items without having to worry about read consistency or cleanup code.
NoSQL Misconception #5: We only need a simple NoSQL data store today, so we can do that and worry about future complexity some other time
Agile development is generally a good thing, but we’ve seen scenarios where the emphasis has shifted from focusing on adding value to customers with each rapid iteration to a weird, willful refusal to plan for the future. Just because your first release has a simple schema doesn’t mean that you won’t need a more fully featured data platform in the future, and changing data platforms can be very expensive.
Future requirements for geo-replication are a classic example of this. Credible geo-replicated data platforms, like our Active(N), are rare. Retrofitting geo-replication to a finished product at the application level is a nightmare and frequently leads to a total rewrite. It’s the difference between building a flying car and making an existing car fly.
Even if you do use a geo-replication-capable data platform, you’re still going to need a good plan for application-level conflicts, and such systems can present significant operational challenges.
Geo-replication is but one example of this phenomenon. The bottom line is that defining requirements that fixate on immediate problems while failing to consider long-term needs is a recipe for disaster.
What you should be thinking about around complexity: Picking a product that meets your longer-term needs but also fully understanding how it does so and how that influences your architectural choices and TCO.
Conclusion
Everyone’s situation is different, and here at Volt Active Data we’re not in the business of claiming that our product is a magic, universal data platform that will meet all your needs. We have no idea whether Volt Active Data is right for you until we’ve spoken to you.
But what we do see, with depressing frequency, are cases where the shift from legacy RDBMS to NoSQL went wrong because people worked from requirements that seemed logical at first sight but, on closer inspection, turned out to be both simplistic and disconnected from the long-term goals of the business.
If you’re either moving from legacy RDBMS or have a newly implemented NoSQL solution that is disappointing you because it doesn’t meet your needs for scale, speed, or transactions, please feel free to speak to us. We’re pretty sure we can at least guide you in the right direction.