Storage vendor 1: “Our array can deliver 1 million IOPS!”
Storage vendor 2: “Well, our array can deliver 1 billion IOPS, so take that!”
This might be the start of the latest round of storage gang violence as storage vendors fight over who can do more IOPS. Or, it could be those same vendors talking about a metric that, while important, actually takes second place when it comes to storage performance.
IOPS has been used for years by storage vendors as a way to demonstrate the sheer awesomeness of their storage wares. But did you know that this metric has a dark secret? Did you know that you, too, might fall victim to the results of this nefarious downside?
And here’s the big reveal: IOPS can be gamed and, worse, some storage vendors do, in fact, game their testing to boost the IOPS figures.
The problem with using IOPS as the primary performance metric is that you often aren’t told under what conditions the published figures were derived. The vendor could, for example, publish IOPS figures derived from a 100% read workload with 4K blocks, which is just about the cheapest kind of operation an array can service. In that instance, the IOPS figures will be, well, stunningly awesome.
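To put some numbers on that, here’s a quick back-of-the-envelope sketch. The figures and the weighted-harmonic-mean model are my own illustration, not any vendor’s methodology, but they show how quickly a headline read-only number deflates once writes enter the picture:

```python
# A simplified model of how workload mix deflates a headline IOPS number.
# All figures here are hypothetical.

def mixed_iops(read_iops: float, write_iops: float, read_pct: float) -> float:
    """Estimate IOPS for a read/write mix via a weighted harmonic mean.

    Each read costs 1/read_iops seconds and each write costs
    1/write_iops seconds, so average cost per op is the weighted sum.
    """
    avg_op_time = read_pct / read_iops + (1 - read_pct) / write_iops
    return 1 / avg_op_time

# Hypothetical array: very fast on 4K reads, much slower on writes.
print(f"{mixed_iops(1_000_000, 200_000, 1.00):,.0f}")  # 100% read: 1,000,000
print(f"{mixed_iops(1_000_000, 200_000, 0.70):,.0f}")  # 70/30 mix:   454,545
```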
In the real world, though, workloads vary: they mix reads and writes in different proportions and use a variety of block sizes. In other words, it’s rare that a real customer’s workload will actually match the characteristics of the vendor’s testing.
The real proof of how well a storage solution will meet your needs is to look at latency figures and, more importantly, to do proof-of-concept testing to determine what kind of latency you might see with your unique workloads. Latency is the outcome of every other performance factor built into the storage pathway. It’s the amount of time that an application has to wait before being told that a storage operation has completed.
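If you want a rough feel for what that wait time looks like on your own hardware, here’s a toy sketch, nothing more, that times synchronous 4K writes and reports percentiles. A real proof of concept would use a purpose-built tool like fio and replay your actual workload mix; the file name and sample count below are arbitrary:

```python
# A toy latency probe: time synchronous 4K writes, report percentiles.
import os
import statistics
import time

BLOCK = os.urandom(4096)          # one 4K block of random data
SAMPLES = 1000
latencies_ms = []

fd = os.open("latency_probe.dat", os.O_WRONLY | os.O_CREAT, 0o644)
try:
    for _ in range(SAMPLES):
        start = time.perf_counter()
        os.write(fd, BLOCK)
        os.fsync(fd)              # wait until the device acknowledges
        latencies_ms.append((time.perf_counter() - start) * 1000)
finally:
    os.close(fd)
    os.remove("latency_probe.dat")

pcts = statistics.quantiles(latencies_ms, n=100)
print(f"p50: {pcts[49]:.3f} ms   p99: {pcts[98]:.3f} ms")
```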
With that in mind, understand that latency actually does take IOPS into consideration. After all, if there are insufficient IOPS available to service an operation, that operation takes longer to complete, and the wait is reflected in the latency value. The same goes for throughput, the rate at which data traverses the storage fabric: if throughput is constrained, operations queue up, and that, too, shows up as latency.
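There’s even a tidy formula tying the two together, Little’s Law: outstanding I/Os = IOPS × latency. A quick worked example with hypothetical numbers shows why a giant IOPS figure quietly assumes a very deep queue:

```python
# Little's Law: concurrency = throughput x latency.
# To actually consume 1,000,000 IOPS at 1 ms per operation, an application
# must keep 1,000 I/Os in flight at all times; many real workloads never
# generate anywhere near that much concurrency.
iops = 1_000_000
latency_s = 0.001                 # 1 ms average latency
print(iops * latency_s)           # 1000.0 I/Os in flight required

# Flip it around: at queue depth 1, latency alone caps your IOPS.
queue_depth = 1
print(queue_depth / latency_s)    # 1000.0 IOPS, whatever the brochure says
```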
Latency also takes into consideration non-storage factors, such as the amount of time it takes the hypervisor to handle an I/O operation. Of course, that part of the latency value isn’t something the storage vendor can manage, but it’s still critically important to end users, who need to know where latency is happening so they can take appropriate action. In a VMware environment, for example, esxtop breaks guest latency into device (DAVG) and kernel (KAVG) components for exactly this reason.
Good latency depends on a number of factors, the primary one being the kind of storage you’re using. If you’re running all-disk storage, latency measured in milliseconds is common, and up to about 20 ms is generally acceptable in a VMware environment.
As you move to all-flash, though, 20 ms is a lifetime. A good all-flash array may spike to 3 or 4 milliseconds but, in general, you should see less than that, and in many cases you will see latency figures in the microseconds. Violin Memory, for example, publishes an average latency value of just 150 microseconds, which is particularly impressive. Bear in mind, though, that there may be additional latency imposed by non-storage systems.
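To put those numbers in perspective, a little arithmetic helps. Assume a hypothetical application that issues 10,000 operations one at a time (queue depth 1):

```python
# What latency means to an app issuing I/Os serially (queue depth 1).
ops = 10_000
disk_latency_s = 0.020         # 20 ms: the acceptable ceiling on disk
flash_latency_s = 0.000_150    # 150 us: Violin's published average

print(ops * disk_latency_s)    # 200.0 seconds spent waiting on disk
print(ops * flash_latency_s)   # 1.5 seconds on flash
```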
If you’re looking for storage:

- Don’t take published IOPS figures at face value; ask under what conditions they were derived.
- Treat latency, not IOPS, as your primary performance metric.
- Run a proof of concept with your own workloads before you buy.
I wrote this post while attending Storage Field Day 9, an event hosted by Gestalt IT, sitting at a table at Violin HQ watching a technical presentation. However, beyond a most awesome breakfast, no money changed hands in return for this post. In addition, Violin Memory is a client of ActualTech Media, but they did not pay for this post. This entire post – good or bad – is of my own doing.