Storage vendor 1: “Our array can deliver 1 million IOPS!”
Storage vendor 2: “Well, our array can deliver 1 billion IOPS, so take that!”
This might be the start of the latest round of storage gang violence as storage vendors fight over who can do more IOPS. Or, it could be those same vendors talking about a metric that, while important, actually takes second place when it comes to storage performance.
IOPS has been used for years by storage vendors as a way to demonstrate the sheer awesomeness of their storage wares. But did you know that this metric has a dark secret? Did you know that you, too, might fall victim to the results of this nefarious downside?
And here’s the big reveal: IOPS can be gamed and, worse, some storage vendors do, in fact, game their testing to boost the IOPS figures.
The problem with using IOPS as the primary performance metric is that you often aren’t told under what conditions the published figures were derived. The vendor could, for example, publish IOPS figures derived from a 100% read workload with 4K blocks, which is just about the cheapest kind of operation an array can service. In that instance, the IOPS figures will be, well, stunningly awesome.
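To put some numbers on that, here’s a quick back-of-the-envelope sketch. The figures and the weighted-harmonic-mean model are my own illustration, not any vendor’s methodology, but they show how quickly a headline read-only number deflates once writes enter the picture:

```python
# A simplified model of how workload mix deflates a headline IOPS number.
# All figures here are hypothetical.

def mixed_iops(read_iops: float, write_iops: float, read_pct: float) -> float:
    """Estimate IOPS for a read/write mix via a weighted harmonic mean.

    Each read costs 1/read_iops seconds and each write costs
    1/write_iops seconds, so average cost per op is the weighted sum.
    """
    avg_op_time = read_pct / read_iops + (1 - read_pct) / write_iops
    return 1 / avg_op_time

# Hypothetical array: very fast on 4K reads, much slower on writes.
print(f"{mixed_iops(1_000_000, 200_000, 1.00):,.0f}")  # 100% read: 1,000,000
print(f"{mixed_iops(1_000_000, 200_000, 0.70):,.0f}")  # 70/30 mix:   454,545
```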
In the real world, though, workloads vary: they mix reads and writes in different proportions and use a variety of block sizes. In other words, it’s rare that a real customer’s workload will actually match the characteristics of the vendor’s testing.
The real proof of how well a storage solution will meet your needs is to look at latency figures and, more importantly, to do proof-of-concept testing to determine what kind of latency you might see with your unique workloads. Latency is the outcome of every other performance factor built into the storage pathway. It’s the amount of time that an application has to wait before being told that a storage operation has completed.
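If you want a rough feel for what that wait time looks like on your own hardware, here’s a toy sketch, nothing more, that times synchronous 4K writes and reports percentiles. A real proof of concept would use a purpose-built tool like fio and replay your actual workload mix; the file name and sample count below are arbitrary:

```python
# A toy latency probe: time synchronous 4K writes, report percentiles.
import os
import statistics
import time

BLOCK = os.urandom(4096)          # one 4K block of random data
SAMPLES = 1000
latencies_ms = []

fd = os.open("latency_probe.dat", os.O_WRONLY | os.O_CREAT, 0o644)
try:
    for _ in range(SAMPLES):
        start = time.perf_counter()
        os.write(fd, BLOCK)
        os.fsync(fd)              # wait until the device acknowledges
        latencies_ms.append((time.perf_counter() - start) * 1000)
finally:
    os.close(fd)
    os.remove("latency_probe.dat")

pcts = statistics.quantiles(latencies_ms, n=100)
print(f"p50: {pcts[49]:.3f} ms   p99: {pcts[98]:.3f} ms")
```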
With that in mind, understand that latency actually does take IOPS into consideration. After all, if there are insufficient IOPS available to service an operation, that operation takes longer to complete, and the wait is reflected in the latency value. The same goes for throughput, the rate at which data traverses the storage fabric: if throughput is constrained, operations queue up, and that, too, shows up as latency.
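There’s even a tidy formula tying the two together, Little’s Law: outstanding I/Os = IOPS × latency. A quick worked example with hypothetical numbers shows why a giant IOPS figure quietly assumes a very deep queue:

```python
# Little's Law: concurrency = throughput x latency.
# To actually consume 1,000,000 IOPS at 1 ms per operation, an application
# must keep 1,000 I/Os in flight at all times; many real workloads never
# generate anywhere near that much concurrency.
iops = 1_000_000
latency_s = 0.001                 # 1 ms average latency
print(iops * latency_s)           # 1000.0 I/Os in flight required

# Flip it around: at queue depth 1, latency alone caps your IOPS.
queue_depth = 1
print(queue_depth / latency_s)    # 1000.0 IOPS, whatever the brochure says
```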
Latency also takes into consideration non-storage factors, such as the amount of time it takes the hypervisor to handle an I/O operation. Of course, that part of the latency value isn’t something the storage vendor can manage, but it’s still critically important to end users, who need to know where latency is happening so they can take appropriate action. In a VMware environment, for example, esxtop breaks guest latency into device (DAVG) and kernel (KAVG) components for exactly this reason.
Good latency depends on a number of factors, the primary one being the kind of storage you’re using. If you’re running all-disk storage, latency measured in milliseconds is common, and up to about 20 ms is generally acceptable in a VMware environment.
As you move to all-flash, though, 20 ms is a lifetime. A good all-flash array may spike to 3 or 4 milliseconds but, in general, you should see less than that, and in many cases you will see latency figures in the microseconds. Violin Memory, for example, publishes an average latency value of just 150 microseconds, which is particularly impressive. Bear in mind, though, that there may be additional latency imposed by non-storage systems.
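To put those numbers in perspective, a little arithmetic helps. Assume a hypothetical application that issues 10,000 operations one at a time (queue depth 1):

```python
# What latency means to an app issuing I/Os serially (queue depth 1).
ops = 10_000
disk_latency_s = 0.020         # 20 ms: the acceptable ceiling on disk
flash_latency_s = 0.000_150    # 150 us: Violin's published average

print(ops * disk_latency_s)    # 200.0 seconds spent waiting on disk
print(ops * flash_latency_s)   # 1.5 seconds on flash
```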
If you’re looking for storage:

- Don’t take published IOPS figures at face value; ask under what conditions they were derived.
- Treat latency, not IOPS, as your primary performance metric.
- Run a proof of concept with your own workloads before you buy.
I wrote this post while attending Storage Field Day 9, an event hosted by Gestalt IT, sitting at a table at Violin HQ watching a technical presentation. However, beyond a most awesome breakfast, no money changed hands in return for this post. In addition, Violin Memory is a client of ActualTech Media, but they did not pay for this post. This entire post – good or bad – is of my own doing.