Storage systems, like any other network or server hardware, are likely to brew up bottlenecks and performance issues, and it’s the storage administrator’s job to keep them in check and triage them. It’s a misconception that all storage bottlenecks arise from the storage disks. Other key components of the storage infrastructure, such as the storage controller, FC switches, and front-end ports, can also go off course and, in turn, impact storage performance.
In this blog, we’ll look at some important factors that cause performance bottlenecks in disk array controllers (aka RAID controllers).
A disk array controller is a device that manages the physical disk drives and presents them to the computer as logical units. It almost always implements hardware RAID, and is therefore sometimes referred to as a RAID controller. It often also provides additional disk cache.
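To make "presents them to the computer as logical units" concrete, here is a minimal sketch of the address translation a controller performs for simple RAID 0 striping. The function name and parameters are hypothetical, and real controllers support many RAID levels and layouts; this only shows how one logical block address is mapped to a physical disk and block.

```python
def raid0_map(lba: int, num_disks: int, stripe_blocks: int) -> tuple[int, int]:
    """Hypothetical helper: map a logical block address on a RAID 0 logical unit
    to (disk index, block number on that disk)."""
    stripe_unit = lba // stripe_blocks   # which stripe unit the block falls in
    offset = lba % stripe_blocks         # position of the block inside that stripe unit
    disk = stripe_unit % num_disks       # stripe units rotate across the member disks
    row = stripe_unit // num_disks       # number of full stripes before this stripe unit
    return disk, row * stripe_blocks + offset


# Example: a logical unit striped over 4 disks with 128-block stripe units.
print(raid0_map(130, num_disks=4, stripe_blocks=128))  # (1, 2)
```

Every host I/O passes through a translation like this, and parity-based levels such as RAID 5 also require a parity calculation on writes, which is part of why RAID work consumes controller resources.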
The disk array controller is made up of three important parts that play a key role in the controller’s functioning and also serve as indicators of storage I/O bottlenecks. These are:
Left unchecked, these components can degrade the performance of the storage subsystem. Third-party storage management tools can provide this visibility, but as a storage administrator you should know which metrics to look at to understand what could go wrong with the disk array controller.
A disk array controller may be configured to support more resources than it can practically handle. Especially in scenarios involving thin provisioning, automated tiering, and snapshots, the controller is pushed into capacity overload, which may impact storage I/O operations. Operations such as deduplication and compression add still more load on the controller.
Thanks to server virtualization, the disk array controller now serves far more workloads than it did when each host ran a single application. Because each connected host supports multiple workloads, every host sends the controller a steady stream of random I/O, which makes it harder for the controller to find the data each virtual machine is requesting.
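The effect described above can be illustrated with a small simulation. This is a simplified sketch under stated assumptions: each virtual machine reads its own region strictly sequentially, requests are 8 blocks long, and requests from different VMs arrive at the controller in round-robin order. All function names are hypothetical. The point is that streams which are sequential per VM look non-contiguous once they are merged at the controller.

```python
from itertools import zip_longest

BLOCK = 8  # assumed request size in blocks


def vm_stream(start_lba: int, count: int) -> list[int]:
    """Hypothetical VM workload: each request starts where the previous one ended."""
    return [start_lba + i * BLOCK for i in range(count)]


def interleave(streams: list[list[int]]) -> list[int]:
    """Assumed round-robin arrival of requests from several VMs at the controller."""
    merged = []
    for group in zip_longest(*streams):
        merged.extend(lba for lba in group if lba is not None)
    return merged


def sequential_fraction(lbas: list[int]) -> float:
    """Fraction of requests that start exactly where the previous request ended."""
    if len(lbas) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(lbas, lbas[1:]) if cur == prev + BLOCK)
    return hits / (len(lbas) - 1)


single = vm_stream(0, 100)
many = interleave([vm_stream(base, 100) for base in (0, 1_000_000, 2_000_000, 3_000_000)])
print(sequential_fraction(single))  # 1.0
print(sequential_fraction(many))    # 0.0
```

Real controllers use caching and prefetch logic that may still detect individual streams, so treat this as an illustration of the access pattern rather than a model of any particular array.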
You need to monitor the CPU utilization of the disk array controller in depth. Collect CPU utilization data during peak load times, then analyze what is causing the additional load and whether the storage controller can cope with the processing requirements.
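As a rough sketch of the peak-time analysis, the snippet below summarizes controller CPU samples that fall inside a peak window. The `Sample` type, the 09:00 to 17:59 peak window, and the 80% threshold are all assumptions chosen for illustration; in practice the samples would come from your array's or monitoring tool's own reporting.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    hour: int       # hour of day (0-23) when the sample was taken
    cpu_pct: float  # controller CPU utilization in percent


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile; values must be non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def peak_cpu_report(samples, peak_hours=range(9, 18), threshold=80.0):
    """Summarize CPU samples inside an assumed peak window against an assumed threshold."""
    peak = [s.cpu_pct for s in samples if s.hour in peak_hours]
    if not peak:
        return None
    busy = sum(1 for v in peak if v >= threshold)
    return {
        "peak_avg": sum(peak) / len(peak),
        "peak_p95": percentile(peak, 95),
        "pct_samples_over_threshold": 100 * busy / len(peak),
    }


# Hypothetical day: quiet overnight, busy during business hours.
day = ([Sample(h, 35.0) for h in range(0, 9)]
       + [Sample(h, 85.0) for h in range(9, 18)]
       + [Sample(h, 40.0) for h in range(18, 24)])
print(peak_cpu_report(day))
```

Looking at the share of peak samples above the threshold, rather than a single maximum, helps separate sustained overload from brief spikes.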
It’s also important to monitor the controller’s I/O utilization metrics in two respects:
Together, the CPU and I/O metrics let you spot when the disk array controller is running at excessive CPU utilization or when one of the I/O bandwidths is overshooting its limit. You can then determine whether the storage controller’s available resources can meet the CPU capacity and I/O bandwidth demand.
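The comparison of demand against available resources can be sketched as a simple headroom check. The metric names in the example and the 80% warning level are hypothetical; the only requirement is that each observed value and its capacity use the same units.

```python
def headroom_report(metrics: dict[str, tuple[float, float]], warn_at: float = 0.8) -> dict:
    """metrics maps a resource name to (observed, capacity) in the same units.
    The warn_at ratio is an assumed alerting level, not a vendor recommendation."""
    report = {}
    for name, (observed, capacity) in metrics.items():
        if capacity <= 0:
            raise ValueError(f"capacity for {name!r} must be positive")
        util = observed / capacity
        if util > 1.0:
            status = "over"   # demand exceeds what the resource can deliver
        elif util >= warn_at:
            status = "warn"   # little headroom left
        else:
            status = "ok"
        report[name] = {"utilization": util, "headroom": capacity - observed, "status": status}
    return report


# Hypothetical readings for a controller's CPU and two I/O paths.
readings = {
    "cpu_pct": (72.0, 100.0),
    "io_path_a_MBps": (1900.0, 2000.0),
    "io_path_b_MBps": (2100.0, 2000.0),
}
for name, row in headroom_report(readings).items():
    print(name, row["status"], round(row["utilization"], 2))
```

A resource reported as "over" is a candidate bottleneck; "warn" indicates the controller may not absorb further growth in that dimension.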
As George Crump, President of Storage Switzerland, recommends on TechTarget, you can address storage controller bottlenecks by:
Ed. Note: This post by Vinod Mohan was reprinted with permission. It first appeared on the SolarWinds Thwack community forum.