I finally had a chance to get vCenter Operations Manager set up. It has been collecting data for about a week, and I have just been clicking around to see what I can find. I noticed that it was reporting very high CPU contention across the entire vSphere infrastructure, so I started to investigate.
In vCOPs, I opened the Analysis tab, set the focus area to CPU, and clicked VM CPU Contention.
This brought up a graph of sorts, all in red, which isn't good.
The CPU contention percent ranged from 38% up to 720% for every VM. Here is a graph from one of the ESXi hosts.
You can see it's averaging around 1000ms of Ready time. According to this article I found, 1000ms is about 5%, but vCOPs reports this same host at 174%, so why the large difference?
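For reference, here is a minimal sketch of how a Ready value in milliseconds maps to a percentage. It assumes the number comes from the real-time performance chart, which samples every 20 seconds; that interval and the function name `ready_ms_to_percent` are my assumptions for illustration, not something taken from the article or from vCOPs.

```python
def ready_ms_to_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU Ready summation (ms) into a percentage of the sample interval.

    interval_s defaults to 20 seconds, the assumed real-time chart interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # The interval is converted to milliseconds so both sides use the same unit.
    return ready_ms * 100.0 / (interval_s * 1000.0)


if __name__ == "__main__":
    # 1000 ms of Ready time inside a 20 s sample works out to 5%.
    print(ready_ms_to_percent(1000))
```

Other chart rollups use longer intervals, so the same millisecond value would translate to a much smaller percentage there; pass the matching `interval_s` in that case.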
So at this point I'm not sure whether there is an actual CPU contention issue. To be sure, I SSH'd into the vMA VM and, once connected, ran the following command to point resxtop at the ESXi host and view its CPU info:
resxtop --server 192.168.1.1
Log in as root with the root password.
At this point, the resxtop display appears.
I highlighted the important CPU fields in red and here are the descriptions from VMware.
Run, %RUN: This value represents the percentage of absolute time the virtual machine was running on the system.
Wait, %WAIT: This value represents the percentage of time the virtual machine was waiting for some VMkernel activity to complete (such as I/O) before it can continue.
Ready, %RDY: This value represents the percentage of time that the virtual machine is ready to execute commands, but has not yet been scheduled for CPU time due to contention with other virtual machines.
Co-stop, %CSTP: This value represents the percentage of time that the virtual machine is ready to execute commands but that it is waiting for the availability of multiple CPUs as the virtual machine is configured to use multiple vCPUs.
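One thing to keep in mind when reading %RDY for a VM with several vCPUs: as far as I understand it, the per-VM row in resxtop adds up the ready time of all the VM's worlds, so the figure can exceed 100%. A rough sketch of normalizing it per vCPU is below. The function names are hypothetical, and the 10% threshold is only a commonly quoted rule of thumb that I am assuming here, not part of the VMware descriptions above.

```python
def ready_per_vcpu(group_rdy_percent: float, num_vcpus: int) -> float:
    """Approximate per-vCPU %RDY from the summed per-VM value shown by resxtop."""
    if num_vcpus < 1:
        raise ValueError("num_vcpus must be at least 1")
    return group_rdy_percent / num_vcpus


def looks_contended(group_rdy_percent: float, num_vcpus: int,
                    threshold_percent: float = 10.0) -> bool:
    """Flag a VM whose per-vCPU %RDY exceeds an assumed rule-of-thumb threshold."""
    return ready_per_vcpu(group_rdy_percent, num_vcpus) > threshold_percent


if __name__ == "__main__":
    # A 4-vCPU VM showing 8% RDY in total is about 2% per vCPU.
    print(ready_per_vcpu(8.0, 4), looks_contended(8.0, 4))
    # The same VM showing 48% in total is about 12% per vCPU.
    print(ready_per_vcpu(48.0, 4), looks_contended(48.0, 4))
```

Dividing by the vCPU count is only an approximation, since the VM's row also includes a few helper worlds, but it is close enough for a quick sanity check.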
At this point there does not appear to be an issue with CPU contention, but I do need to find out why vCOPs is reporting it that way.
After I opened a support ticket with VMware, it was determined that this area was mislabeled and would be fixed in a later release. The values shown are actually milliseconds of latency, not a percentage of CPU contention, and support adjusted the heatmap.