I recently fielded a question from a former co-worker who works as a senior Oracle Database Administrator (DBA). I spent much of my career with Oracle, and I’ve asked myself the same question:
I am one of those DBAs who can see Oracle’s market share gradually being eroded and replaced by a whole bunch of new database vendors. I would like to transition from Oracle to “Big Data” but am struggling to identify a path to do so. Which technologies should I focus on? Which programming languages?
To answer this question, we need to take a step back and see things from a bigger, historical perspective.
How did we end up with DBAs?
First, we need a workable definition of ‘DBA’. Is a DBA a form of system administrator, a business knowledge guru, or something in between? The answer lies in how the DBA role has evolved, starting with the advent of relational database management systems (RDBMSs).
The RDBMS solved a whole series of problems that prevented people from getting value from their data. This initial success then led to the concept of the ‘Enterprise Data Model’, which in turn implied a need for someone to represent the enterprise, as opposed to the needs of one application. This is fundamentally what a DBA is—a curator of enterprise data.
Questioning the need for DBAs
Developers didn’t always see the need for DBAs, who have traditionally been perceived as ‘expensive.’ But downtime is expensive, too, and the only thing worse than being down is being up and spouting the wrong answers. Preventing either of these things from happening is the DBA’s job.
Over the last decade, the major database vendors have added more and more functionality to ‘automate’ the DBA role, but in practice this makes the database less predictable. In some cases, more people are needed to control the ‘automated’ functionality’s side effects. A good analogy here is the introduction of fly-by-wire in the aviation industry, which eliminated some hazards but replaced them with new ones.
Can we actually live without them?
Being older, I’m in a position to answer this question with a ‘yes’. I started my career before the DBA role was invented and worked with file-based systems. The only fundamental difference between Hadoop and 9-track tapes is that a human has to hang a tape, whereas you can tell a computer to process an HDFS file.
The problem we faced then was not that tapes performed poorly (at least when compared to our goals), but that there was no single, live repository of the business’s data, and that what passed for a ‘data model’ was in fact a loose constellation of mini-schemas from individual applications. Building a single application is easy—building fifty and having them use the same data without errors is pretty much impossible. This is what the NoSQL community is seeing now.
What worries me is that the RDBMS became so ubiquitously successful that two negative things happened. First, people started using RDBMSs in applications for which they weren’t very well suited, simply because ‘everything’ was in the database: XML, CLOBs and object layers tacked on top of an otherwise blameless RDBMS, for example. Second, over time, people tended to forget the value an RDBMS provides and began to focus instead on the visible deficiencies—a bit like people who object to measles vaccinations because they’ve never encountered measles themselves.
A wave of database innovation resulted, but not all of the innovators experienced life before the RDBMS, and those who didn’t may not understand the world they are entering when they abandon concepts such as ACID. While not all applications need ACID, many do, and what might be perceived as an acceptable limitation for a standalone application might be totally unacceptable in an enterprise context.
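For readers who have never worked without it, here is a minimal sketch of what the ‘A’ in ACID buys, using Python’s built-in sqlite3 module and account names invented purely for illustration: a multi-statement change either lands completely or not at all, which is exactly the guarantee that quietly disappears when related writes are spread across a store with no transactions.

```python
import sqlite3

# Illustrative schema and names only; not taken from any particular system.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE accounts (
    name    TEXT PRIMARY KEY,
    balance NUMERIC NOT NULL CHECK (balance >= 0)
);
INSERT INTO accounts VALUES ('ops', 100), ('payroll', 100);
""")

def transfer(amount, src, dst):
    """Move money between accounts; either both updates happen or neither does."""
    try:
        with db:  # a transaction: commits on success, rolls back on any exception
            db.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            db.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
    except sqlite3.IntegrityError:
        print(f"transfer of {amount} from {src} rolled back")

transfer(60, "ops", "payroll")   # succeeds
transfer(60, "ops", "payroll")   # would overdraw 'ops'; the whole transfer is undone
print(db.execute("SELECT * FROM accounts ORDER BY name").fetchall())
# -> [('ops', 40), ('payroll', 160)]: no half-completed transfer survives
```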
The real value in a database is what you prevent from happening…
This may sound perverse, but think about it: if I allow people to store anything they want in an unstructured key-value store, I am betting that every single developer who ever works with this data will write code that can successfully interpret the contents. Adding a column to a conventional database borders on the trivial. Adding an extra attribute to an unstructured JSON object creates all sorts of issues about how the new code will co-exist with old data.
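To make the contrast concrete, here is a small sketch, again using sqlite3 and table and field names of my own invention. The relational change is one statement whose default every existing row picks up automatically; on the schemaless side, every consumer of the documents must decide, forever, what a missing attribute means, and all of them must decide the same way.

```python
import json
import sqlite3

# --- Relational side: adding a column is one declarative statement. ---
db = sqlite3.connect(":memory:")  # illustrative table and column names only
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO customers (name) VALUES ('Alice'), ('Bob')")

# Existing rows pick up the declared default; no application code changes.
db.execute("ALTER TABLE customers ADD COLUMN loyalty_tier TEXT DEFAULT 'standard'")
print(db.execute("SELECT name, loyalty_tier FROM customers").fetchall())
# -> [('Alice', 'standard'), ('Bob', 'standard')]

# --- Schemaless side: old documents simply lack the new attribute. ---
old_doc = json.loads('{"name": "Alice"}')                       # written before the change
new_doc = json.loads('{"name": "Bob", "loyalty_tier": "gold"}') # written after it

# Every piece of code that reads these documents must decide what a
# missing 'loyalty_tier' means, and every reader must decide the same way.
for doc in (old_doc, new_doc):
    tier = doc.get("loyalty_tier", "standard")  # the convention lives in code, not in the data
    print(doc["name"], tier)
```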
A key but overlooked aspect of the DBA’s job description is that ‘curation’ involves forcing data to follow rules and standards so it is actually possible to process it with a computer program. It’s not about what you make possible, it’s about what you prevent. Curation also involves controlling access. Failing to control access used to be merely embarrassing, but the costs can now be measured in hundreds of millions of dollars and are getting higher every day. In some cases, data breaches represent an existential threat to the companies involved.
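As a small illustration of ‘prevention’ in practice, the following sketch (same caveats: sqlite3, invented names) declares the rules once, in the schema, and lets the database refuse anything that violates them, rather than trusting every future program to check.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity

# Illustrative schema only: the rules are stated once and enforced for every application.
db.executescript("""
CREATE TABLE currencies (code TEXT PRIMARY KEY);
CREATE TABLE invoices (
    id       INTEGER PRIMARY KEY,
    amount   NUMERIC NOT NULL CHECK (amount >= 0),
    currency TEXT    NOT NULL REFERENCES currencies(code)
);
INSERT INTO currencies VALUES ('USD'), ('EUR');
""")

# A well-formed row goes in.
db.execute("INSERT INTO invoices (amount, currency) VALUES (99.50, 'USD')")

# Bad data is rejected at the gate, no matter which program wrote it.
for bad in [(-5, 'USD'),     # negative amount
            (10, 'XXX'),     # unknown currency
            (None, 'EUR')]:  # missing amount
    try:
        db.execute("INSERT INTO invoices (amount, currency) VALUES (?, ?)", bad)
    except sqlite3.IntegrityError as exc:
        print("rejected:", bad, "->", exc)
```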
Without DBAs, history will repeat itself
Though the DBA role won’t vanish, we are in a period of chaos where people may believe it isn’t needed. Yet it is clear the future will have enterprises using multiple database platforms to manage data, and there will still be a need for ‘curators’ of that data.
Companies can’t afford the learning curve of every developer becoming expert in every new technology. They also need a long-term plan to create and manage an appropriate ‘zoo’ of database technologies, instead of allowing industry fashions and the blank spaces in developers’ resumes to dictate how and where the enterprise stores its most valuable asset—its data.