High CPU Contention reported by vCOPs

I finally had a chance to get vCenter Operations Manager set up.  It has been collecting data for about a week, and I have just been clicking around to see what I can find.  I noticed that it was reporting very high CPU contention across the entire vSphere infrastructure, so I started to investigate.

In vCOPs, I opened the Analysis tab, set the focus area to CPU, and clicked VM CPU Contention.

vCOPs-CPUContention

Once I clicked on this, it brought up a heatmap that was entirely red, which isn't good.

CPUContention-Graph

The CPU contention percent ranged from 38% up to 720% for every VM.  Here is a graph from one of the ESXi hosts.

Usage&Ready-CPUGraph

You can see it's averaging around 1000ms of Ready time.  According to this article I found, 1000ms is about 5%, but according to vCOPs this same host is coming in at 174%, so why the large difference?
http://www.vfrank.org/2011/01/31/cpu-ready-1000-ms-equals-5/
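
The "1000ms is about 5%" figure comes from dividing the ready time by the length of the sample interval. Here is a minimal sketch of that arithmetic in Python, assuming the 20-second interval used by the vSphere real-time charts (longer-range charts use longer intervals, so the same millisecond value means a smaller percentage there). The function name is my own and is not part of any VMware tool.

```python
def cpu_ready_percent(ready_ms, interval_s=20.0):
    """Convert a CPU ready summation (ms) into a percentage of the sample interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # The interval is in seconds, so multiply by 1000 to compare ms with ms.
    return ready_ms * 100.0 / (interval_s * 1000.0)


if __name__ == "__main__":
    # 1000 ms of ready time inside one 20-second real-time sample
    print(cpu_ready_percent(1000))
```

By this math, a host averaging 1000ms of ready time is nowhere near the 174% that vCOPs was showing.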

So at this point I'm not sure whether there is an actual issue with CPU contention.  To be sure, I SSH'd into the vMA VM and, from there, ran the following command to connect to the ESXi host and start resxtop to view CPU info:

resxtop --server 192.168.1.1
Log in as root with the root password when prompted.

At this point, resxtop will show up.

resxtop-ESXi2

I highlighted the important CPU fields in red and here are the descriptions from VMware.

Run, %RUN: This value represents the percentage of absolute time the virtual machine was running on the system.
Wait, %WAIT: This value represents the percentage of time the virtual machine was waiting for some VMkernel activity to complete (such as I/O) before it can continue.
Ready, %RDY: This value represents the percentage of time that the virtual machine is ready to execute commands, but has not yet been scheduled for CPU time due to contention with other virtual machines.
Co-stop, %CSTP: This value represents the percentage of time that the virtual machine is ready to execute commands but that it is waiting for the availability of multiple CPUs as the virtual machine is configured to use multiple vCPUs.

At this point there does not appear to be an issue with CPU contention, but I do need to find out why vCOPs is reporting it that way.

After I opened a support ticket with VMware, they determined that this view was mislabeled and said it would be fixed in a later release.  The values are actually milliseconds of latency, not a percentage of CPU contention, and support adjusted the heatmap accordingly.

Exchange 2010 Pages/sec Counter

I have been monitoring our Exchange servers on a daily basis for a few weeks, just making it part of my daily monitoring.  I have been using the Exchange Server Performance Monitor.  On the Mailbox server I have noticed that Pages/sec has been spiking quite a lot, up to 300 at times.  It is averaging about 50.  Normally when counters start spiking, something isn't right, so I started to look into the counter a little bit to see if I could pinpoint where the issue was.  Since I know paging deals with RAM, this narrows it down quite a bit.
  • Pages/sec—The values of this counter should range from 5 to 20. Values consistently higher than 10 are indicative of potential performance problems, whereas values consistently higher than 20 might cause noticeable and significant performance hits. The trend of these values is impacted by the amount of physical memory installed in the server.
  • Page Faults/sec—This counter, together with the Memory—Cache Faults/sec and Memory—Transition Faults/sec counters, can provide valuable information about page faults that are not committed to disk. They were not committed to disk because the memory manager allocated those pages to a standby list. Most systems today can handle a large number of page faults, but it is important to correlate these numbers with the Pages/sec counter as well to determine whether Exchange Server is configured with enough memory.
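
To make those Pages/sec thresholds concrete, here is a rough Python sketch that sorts a set of samples into the bands described above (above 10 is a potential problem, above 20 is a significant one). The function name and labels are my own, and using a plain average is a simplification of "consistently higher"; it is only meant to illustrate the guideline, not to replace Performance Monitor.

```python
from statistics import mean


def classify_pages_per_sec(samples):
    """Classify Pages/sec samples against the 10 and 20 guideline thresholds."""
    if not samples:
        raise ValueError("at least one sample is required")
    avg = mean(samples)
    if avg > 20:
        return "significant"  # likely a noticeable performance hit
    if avg > 10:
        return "potential"    # worth investigating
    return "normal"


if __name__ == "__main__":
    # Hypothetical samples that average roughly 50, with one large spike
    print(classify_pages_per_sec([20, 25, 30, 300, 15, 10]))
```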

I did some Google searching to try to find guidelines for where these counters should be, and the definitions above seem to be what I was looking for.  Based on those numbers there is clearly an issue.  Next I opened up Resource Monitor on the Mailbox server and started watching the memory.  I was actually getting some page faults as well.  Here are some screenshots.

ExchangeMailboxMemory 

ExchangeMailboxMemory2

I noticed that physical memory usage was 83%.  This seemed a little high, but was it causing the page spikes and page faults I was seeing?  I have done some reading on excessive paging in Exchange 2010 and believe the best place to start is by increasing the amount of RAM in our mailbox server from 10GB to 16GB.  I will update next week with the outcome.

UPDATE: Since adding an additional 6GB of RAM to the mailbox server, pages/sec have dropped from about 50 pages/sec on average to about 10 pages/sec average.  According to documentation the acceptable range is 5-20 so I will leave it at 16GB for now, and continue to monitor into the future.

HP MSL 4048 Library Install

Today the new HP MSL4048 Tape Library arrived.  After unpacking everything and going over the installation guide, I removed the shipping lock, which keeps the internal parts from getting knocked around during shipping.  Next was to install the rails into the square-hole rack.  There were two different sets of mounting hardware, one for round-hole racks and one for square-hole racks.

Once we got the library loaded into the rack, it was connected to power and network to start the initial configuration.  First was to let the library run its initial power-on testing.  This took about 5 minutes and showed the status changes on the LCD screen.  Once that was completed I went through the menu to change the admin password so I would be able to log into the web interface to manage it.  Next I set the IP, documented it, and went to my desk and pointed my web browser at the IP I had just set.  Next was to set up email alerts on the tape library, so if anything changes or fails, I will be notified by email.

I loaded one of the four magazines full of LTO5 tapes and inserted it back into the tape library.  It went through its process of scanning all of the tapes (at this point no labels were on them).

I checked the firmware versions that were currently installed on the library.  All appeared to be up to date compared with what I downloaded from HP’s website.

Next I updated my spreadsheet with the warranty information to include the new library, and then added the warranty info to HP Insight Online.

Next will be connecting it to the SAS HBA and configuring in Backup Exec.

I put the ESXi host in maintenance mode today and shut down the server.  I installed the SAS HBA and booted the server back up.  During the bootup process I did see it recognize the H221 SAS HBA, and that's a good sign.  I took the ESXi host out of maintenance mode and started the Backup Exec VM back up.  For some reason, Backup Exec was not able to see any of the equipment when I tried to configure storage.  The HP H221 SAS HBA was showing up in Device Manager, and the HBA could see both LTO5 drives when I booted the HBA into its BIOS.  I started reading around online but didn't see a whole lot.  After about 10 minutes, I started to wonder if I had to add hardware to the VM.  I right-clicked the BackupExec VM, went to Edit Settings, and added a SCSI device.  There were 2 HP Tape options in the drop-down menu, so I added both.  I rebooted the VM and both tape drives showed up in Device Manager and in BackupExec.

I started to set up a quick backup job, but it was failing.  I noticed that there was also no robotic library in Device Manager or BackupExec, and that is quite important since that's what controls loading and unloading the tapes into the drives.  So I did some more looking, and it appears as though VMware stopped supporting tape libraries directly connected to ESXi hosts in version 5.0… great.  At this point I submitted a ticket to HP to see if there was a configuration change I could make somewhere.  Since tape libraries are supported in ESXi 4.1, I figure they should be able to help me.  Waiting to hear back from them.

 

Dcdiag is reporting FRS Event Error

I set aside an hour a week to work with our domain controllers.  During this time I run health checks, review logs, and review Event Viewer entries.  I have been looking for a way to automate the health check with a PowerShell script, but for the time being I am sticking with the normal commands.  I ran Dcdiag.exe /v this morning to review the overall health of the domain controller (DC) and everything was normal except one thing.  There was an error in the FrsEvent test, which covers the File Replication Service.

FRS Error on Domain Controller DC

I have been hearing about group policy replication issues, and until now the domain controllers have been reporting back as healthy, but now I have something to work with.  I ran the Dcdiag.exe /v command on our second domain controller and everything came back healthy.

FRS on Domain Controller DC1

At this point I did a search for 0x800034C4.  I wasn’t able to find much specific to the error, so I went to check the service status on both domain controllers to make sure the required services were started.  I usually do this by sorting on the startup type to show the automatic services and then checking that all of those are started.  All the services looked fine on both domain controllers.

Next I started going through the possible reasons why it may be showing this error.  The first is that FRS is not able to resolve the domain controller’s DNS name.  I pinged DC1 from DC and it resolved the DNS name just fine.

The next possible reason is that FRS is not running on DC1.  I verified that the File Replication Service was indeed running on DC1, but did I need to restart the service for some reason, and would that have any impact?  After reading a little bit, it did not sound like it would affect anything, so I restarted the File Replication Service on both domain controllers.

Next I went into the File Replication Service Event Log in Event Viewer on both Domain Controllers, and DC looked fine, while DC1 had a lot of errors.

13568

I followed the instructions for creating a new DWORD Value for “Enable Journal Wrap Automatic Restore” and restarted the Ntfrs service on the problem DC1.

restore

After restarting the service, I went back into the Event Viewer to watch for any new events and this appeared.

13516

After 5 minutes the following entries showed up in the event viewer.
13560
13554

13554

I changed the registry value back from 1 to 0, and will continue to keep an eye on this for a few days.

Exchange 2010 Retry Remote Delivery Queue Length

\MSExchangeTransport Queues(_total)\Retry Remote Delivery Queue Length
-Shows the number of messages in a retry state in the remote delivery queues.
-Shouldn’t exceed 100. We recommend that you check the next hop to determine the causes for queuing.
http://technet.microsoft.com/en-us/library/ff367923.aspx

In my daily monitoring of our Exchange 2010 servers, I glance at the Exchange Server Performance Monitor on both our CAS/HUB server and the MBX server.  This lets me spot any abnormal activity pretty easily from the lines on a graph.  If there are high spikes, or sustained heightened activity, I will usually investigate.  Since I've been working with Exchange I have noticed there is always a substantial number of messages stuck in the retry remote delivery queues.  When I go into Queue Viewer and click the Messages tab, I can view all of the messages that are causing the large queue size.  Normally this is between 30 and 100 messages.  These messages are addressed to recipients that do not exist in our organization, show the error 400 4.4.7 Message Delayed, and keep retrying.

So what is causing these queued messages?

Outlook Freezing When Loading Contacts – Duplicate Contacts

I had a request come in to take a look at a user's Outlook that had over 500,000 duplicate contacts.  He has about 500 unique contacts, but for some reason they were duplicated.  When I launched Outlook it initially froze, so I went into Resource Monitor to watch what was going on with the CPU, Disk, Network, and Memory, and Outlook was hitting them hard.

ResourceMonitor

After I let Outlook sit and sync for a while (20 minutes) it finally started responding again and I was able to get into the Contacts area.  I found an article published by Microsoft that is supposed to help resolve the issue of duplicate contacts.  After reading it over, I found it still involved us doing the work of removing the duplicates by hand, which isn’t a very efficient use of our time.  The article also had a section on importing contacts and telling Outlook “Do Not Import Duplicate contacts”, so I was thinking, why can’t I export all of the contacts into a PST file, delete all the contacts in Outlook, and then import them back in from the PST file, telling it not to import the duplicates?  Even if it doesn’t get rid of all the duplicates, it should do about 90% of the work and leave us a little cleanup to take care of.
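
The idea behind the export and re-import can be sketched in a few lines of Python: keep the first copy of each contact and skip any later copy that matches it. This is only an illustration of the concept. I don't know what Outlook actually compares when it decides a contact is a duplicate, so the "name" and "email" fields and the function name are assumptions of mine.

```python
def dedupe_contacts(contacts, key_fields=("name", "email")):
    """Return the contacts with later duplicates dropped, preserving order."""
    seen = set()
    unique = []
    for contact in contacts:
        # Normalize case and surrounding whitespace so trivial differences still match.
        key = tuple(str(contact.get(f, "")).strip().lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(contact)
    return unique


if __name__ == "__main__":
    # Hypothetical sample data
    sample = [
        {"name": "Ann Lee", "email": "ann@example.com"},
        {"name": "ann lee ", "email": "ANN@example.com"},
        {"name": "Bob Ray", "email": "bob@example.com"},
    ]
    print(len(dedupe_contacts(sample)))
```

Contacts that differ in some other field (a phone number, for example) would still count as duplicates here, which is the kind of leftover cleanup I expect to be doing by hand.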

I was on the phone with the HelpDesk tech working on the issue, and told him the route I would take.  After some discussion we agreed that this method would use the least amount of our time and would be a good first step.  If there are no more updates to this, then consider it resolved.  If anything changes or doesn’t work, I will update this.

Windows 8 Enterprise RTM Installation and Impressions

I am currently in the process of downloading Windows 8 RTM and will be burning to DVD and installing in the next couple of days.

Things I like with Windows 8
Task Manager shows a lot of useful information, can see this helping troubleshooting a lot.
File Transfer status window also shows a lot more information and has a nice graph.
Resource Monitor shows all information you would need for CPU, Disk, Network, etc.
Taskbar spans across multiple monitors instead of just the primary monitor.

Things I don’t like with Windows 8
Still cannot print from the snipping tool.
Advertisement in Weather App.
First off, not a fan of the Metro UI, it slows me down in everyday tasks.
I have been using Windows 8 since Developer Preview, still don’t like Metro UI.
With that said, I am going to have to learn keyboard shortcuts to make my life easier.
For Enterprise, it comes loaded with all kinds of non-enterprise apps, removed those.
Shutting down and restarting the PC takes way too many steps, so I created a shortcut in the taskbar.

Questions
Is there a group policy (adm) addon for additional policies for Windows 8?
Will the Windows 8 group policies require Windows Server 2012 Domain Controllers?
Not sure if Windows Defender was updated in Windows 8 or not, don’t use it much.