Volt Active Data is now available ‘by the hour’ on Amazon EC2. To get best results and optimal TCO from your Volt Active Data cloud instances, users need to understand how Volt Active Data handles IO, which we’ll discuss below. At the close of this blog, we’ll also review available IO options.
Why would an in-memory database like Volt Active Data need IO?
Even in a replicated, highly-available configuration, you will eventually need to restart the database, either because of planned maintenance or because of an incident at your hosting provider. This means you’ll need something to restore it from, which means disk and hence IO.
A legacy RDBMS is architected on the assumption that all data lives on disk and only a small subset is in memory. This assumption leads to design compromises that cause all sorts of side effects, among them IO that is both verbose and very tightly coupled to your transactions. Volt Active Data and other in-memory databases, on the other hand, take a different architectural approach, keeping all data in memory. Since all data is in memory, it needs to be flushed to disk every now and then as part of the database durability strategy.
Volt Active Data and IO
As we implied above, Volt Active Data’s approach to IO is different than that of a legacy RDBMS. In a conventional configuration, a Volt Active Data database will have multiple streams of serial command-log-related IO (one for each core), with changes appended to files in micro-batches. As a consequence, the number of IOPS needed does not increase linearly with the number of transactions and is instead tied to the physical space needed to store the transactions. Volt Active Data also uses serial IO to flush database snapshots to disk, and thus is fundamentally thriftier in its IO expectations than legacy databases.
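As a concrete illustration of the micro-batching knobs, the sketch below writes a VoltDB-style deployment.xml fragment. The frequency values are illustrative assumptions, not recommendations – tune them against your own latency and durability requirements.

```shell
# Sketch: a deployment.xml fragment controlling command-log micro-batching.
# The 200 ms / 10,000 transaction values are illustrative only.
cat > deployment-fragment.xml <<'EOF'
<deployment>
  <!-- Flush the command log every 200 ms or every 10,000 transactions,
       whichever comes first: the micro-batching described above. -->
  <commandlog enabled="true" synchronous="false">
    <frequency time="200" transactions="10000"/>
  </commandlog>
</deployment>
EOF
grep -c commandlog deployment-fragment.xml
```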
Traditional IO
When using your own dedicated hardware there are two kinds of IO:
- Random IO occurs when a series of writes is issued to different locations on the file system. In traditional databases, random IO is used to update data files so that they reflect changes that have happened in RAM. This kind of IO is generally measured in IOPS (input/output operations per second). A generic ‘spinning rust’ drive can do about 160 random IOPS. Many database products that rely on random IO have significant IOPS requirements that may not be easily satisfied in a cloud environment. Volt Active Data does almost no random IO.
- Serial IO is used when the database is appending to an existing file, with a database journal being a classic example. IOPS work differently here – the key word in the definition above is random. With a physical drive, the dominant cost is how long it takes to move the drive head into position, not how long it takes to transfer the data. If you are writing many blocks stored on the same cylinder of the disk, you can often see a 10-15x improvement in observed throughput, provided nobody else is using the drive and the underlying mirroring scheme doesn’t fragment access (as RAID 5 does). As a consequence, in a traditional DB a pair of mirrored drives, properly configured, can outperform a lot of entry-level SANs for serial IO, especially if every single transaction generates a physical IO. If using Volt Active Data on conventional disks, you’ll get the best results if you use separate disks for command logs and snapshots, as you can avoid ‘random’ IO.
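A quick, rough way to see serial write behaviour for yourself is to append a stream of blocks to a single file with dd and let it report throughput. The path below is an assumption for illustration – point it at the actual device you plan to use, since /tmp may be memory-backed.

```shell
# Gauge serial (append-style) write throughput, the pattern Volt's
# command log and snapshots rely on. conv=fdatasync forces the data
# to disk before dd reports, so the number is not just page-cache speed.
dd if=/dev/zero of=/tmp/serial_io_test.bin bs=1M count=16 conv=fdatasync
ls -l /tmp/serial_io_test.bin    # confirm 16 MiB were written
rm /tmp/serial_io_test.bin
```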
Deploying Volt Active Data in AWS
IO works differently in AWS than in the ‘traditional world’ described above. As database professionals, we are used to working with dedicated drives, or a SAN. In both cases, we are encouraged to think in terms of IOPS. AWS uses the Elastic Block Store (EBS) to provide disk volumes to servers. It provides four different types of volume, and for three of them the number of IOPS you get is defined by the size of the volume as well as the volume type. This has interesting implications for deployment. A further complication is the concept of ‘Burst Balance’: in EBS there is a fundamental difference between the IOPS a volume will support in a ten-minute benchmark and what it can support under sustained load.
Burst Balance
“Burst Balance” allows you to get much higher IO throughput for a finite period than your EBS device would normally support. The diagram below shows this for the ‘sc1’ volume type – a 1TiB volume will support 80MiB/sec until Burst Balance reaches zero, at which point IO will be constrained to the base throughput of 12 MiB/sec. The way to increase the sustained IO capacity is to make the volume bigger, even if you don’t need the space. Thus, if we need to sustain roughly 96 MiB/sec, we can get that by creating an 8 TiB volume (8 × 12 MiB/sec). The side effect is that we pay for space we may never use.
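The sizing rule above is simple enough to script. The sketch below assumes the 12 MiB/sec-per-TiB sc1 baseline quoted in this post and rounds up to a whole TiB:

```shell
# Back-of-envelope sc1 sizing, assuming a 12 MiB/sec-per-TiB baseline.
target_mibs=96       # sustained throughput we need, MiB/sec
base_per_tib=12      # sc1 baseline throughput per TiB (from this post)
# Integer ceiling division: round up to the next whole TiB.
size_tib=$(( (target_mibs + base_per_tib - 1) / base_per_tib ))
echo "Provision at least ${size_tib} TiB for ${target_mibs} MiB/sec sustained"
```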
Burst Balance is available as a volume-level statistic. We strongly recommend creating a CloudWatch alarm to make sure it doesn’t reach zero, as this could have calamitous effects on throughput. Burst Balance must also be measured during benchmarks to make sure it is not degrading in an unhelpful manner. To complicate matters:
- Burst Balance stats are only generated while the volume is actually being used, not merely mounted.
- Burst Balance stats take up to 10 minutes to show up in CloudWatch.
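A CloudWatch alarm on BurstBalance can be created from the CLI. The volume ID and SNS topic ARN below are placeholders – substitute your own; the 20% threshold is an assumption that gives you time to react before the balance hits zero.

```shell
# Hypothetical values -- replace with your own volume ID and alert topic.
VOLUME_ID="vol-0123456789abcdef0"
TOPIC_ARN="arn:aws:sns:us-east-1:123456789012:ops-alerts"
# Fire when the 5-minute average Burst Balance drops to 20% or below.
aws cloudwatch put-metric-alarm \
  --alarm-name "burst-balance-${VOLUME_ID}" \
  --namespace AWS/EBS --metric-name BurstBalance \
  --dimensions Name=VolumeId,Value="${VOLUME_ID}" \
  --statistic Average --period 300 --evaluation-periods 1 \
  --comparison-operator LessThanOrEqualToThreshold --threshold 20 \
  --alarm-actions "${TOPIC_ARN}"
```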
We use a script like this to measure it:
# Record the benchmark start time
STDATE=`date '+%Y-%m-%d'`
STDATEMIN=`date '+%H:%M'`
# Run the benchmark, appending its output to the log file named in $FNAME
./runCluster.sh async-benchmark | tee -a $FNAME
# Record the benchmark end time
EDDATE=`date '+%Y-%m-%d'`
EDDATEMIN=`date '+%H:%M'`
# v.txt holds the ID of the EBS volume we are monitoring
VOLUME_ID=`cat v.txt`
# Wait 15 minutes for Burst Balance stats to reach CloudWatch
sleep 900
aws cloudwatch get-metric-statistics --metric-name BurstBalance \
  --namespace AWS/EBS --period 60 \
  --start-time ${STDATE}T${STDATEMIN}:00 \
  --end-time ${EDDATE}T${EDDATEMIN}:59 \
  --statistics Average --dimensions Name=VolumeId,Value=${VOLUME_ID}
AWS Volume Types
sc1 and st1 are conventional ‘spinning rust’ disks; gp2 and io1 are solid state drives. With gp2 the number of IOPS is determined by the size of the volume, capped at 10,000; io1 allows you to pick the number of IOPS you need. To make things more complicated, AWS has a specific definition of an IOPS – changes to blocks that are ‘next’ to each other are merged, so a set of 8 contiguous 32 KiB changes will be merged into a single 256 KiB ‘IOP’. Consequently, it’s hard to make clear cost predictions for io1, as we need to benchmark our application to establish average IOPS.
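The merging rule above is why serial workloads are billed fewer IOPS than a naive count of writes suggests. A quick sketch of the arithmetic, assuming fully contiguous writes and the 256 KiB merge limit:

```shell
# Contiguous small writes are coalesced into 256 KiB operations
# before being counted, per the EBS merging rule described above.
write_kib=32       # size of each logical write
count=64           # number of contiguous writes
total_kib=$(( write_kib * count ))
counted=$(( (total_kib + 255) / 256 ))    # round up to 256 KiB units
echo "${count} contiguous ${write_kib} KiB writes -> ${counted} counted IO operations"
```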
Type | Description | Cost for a 4TiB Volume per month (2/2017) | Throughput |
sc1 | Lowest cost HDD volume | US$100 | 48 MiB/Sec sustained, 250 MiB/Sec burst |
st1 | Throughput Optimized HDD | US$180 | 160 MiB/Sec sustained, 500 MiB/Sec burst |
gp2 | Default SSD – used in EC2 root volumes | US$397 | 160 MiB/Sec sustained or burst (no burst limit), 10,000 IOPS |
io1 | Fancy SSD | US$1,265 | 320 MiB/Sec sustained or burst, provisioned IOPS |
Recommendations for deploying Volt Active Data in EBS include:
- Always put Volt Active Data data files on a separate volume, so you can easily back up and move the data if you need to in the future.
- For m4.xlarge and m4.2xlarge, a 4TiB sc1 volume ought to be enough – we’ve struggled to saturate any of these instances in tests designed to generate as much IO as possible.
- For larger instances, you’ll probably need to use st1 and pay attention to Burst Balance.
To try it for yourself, visit the AWS Marketplace.