Virtual Infrastructure Management

What does a VMware Virtual Infrastructure (VI) Administrator have to do to get their head around the dozens of host servers, subnets and virtual machines in their environment?  Stop…Take A Deep Breath…Move Forward.  VI sanity can be achieved with a reasonable amount of planned activity, equal in part to the regular due diligence Administrators apply to their physical infrastructure counterparts: host clean-up, log reviews, monitoring, etc.

Who Moved My Cheese: VI Change Management

First, get your bearings.  In a well-defined VI, a Virtual Machine (VM) resides on a dynamic cluster of host servers; those servers are part of a group, and that group is composed of hardware (e.g., servers, SAN, networks, disks).  That being said, your VMs can jump from host to host and group to group at any point during the day or night.  As shared resources (CPU, RAM, etc.) are required and contention for those resources becomes an issue, VMs can automatically be evacuated from one host to another.  If your cluster is configured with Distributed Resource Scheduler (DRS) or High Availability (HA), your VMs can reside on any host in the cluster at any time.  In this configuration, the VI dynamically makes room on a host if the intensive processing of one or more VMs becomes an issue, and then rebalances those resources later on as things settle down.

Given the dynamic nature of this architecture, chasing these moving targets is often like the old adage…“herding cats”.  So what should be considered an official “Request For Change” (RFC) in a Virtual Infrastructure environment?  Here are some common changes that make sense to account for in your formal Change Control process:

  1. Manual vMotion or movement of a VM between hosts
  2. VM configuration changes – extending the permanent allocation of virtual hardware or resource share changes
  3. Deployment / introduction of new VMs into the environment
  4. Host configuration changes – maintenance changes
  5. Patches and updates to ESX hosts, host hardware maintenance, etc.
  6. DRS – Automatic load leveling on hosts using vMotion, could occur daily or hourly
  7. Cluster change – Addition of LUNs, rescan of storage
  8. Cluster change – Removal of LUNs
  9. Cluster change – Host upgrade (major change, VM downtime not always required)
  10. VMware Tools upgrades (after a major host version upgrade; the VM restarts after the tools install)
  11. Addition of hosts to an existing cluster
  12. vCenter upgrades – no VM changes, but possible loss of access to VMs via vCenter
  13. Critical to performance and stability

Walk The Talk: The Practice of VI Maintenance

Like any administrative function, keeping an active eye on your hosts and VMs will help you keep a finger on the pulse of your environment.  Remember the three key concepts in VI management: shared storage, shared resources (CPU, RAM, Networks) and VM placement.  As things progress, disks (SAN LUNs) will fill up, VMs will need care and feeding, and performance monitoring will tell you how things are progressing.  Here are some best practices that most VI admins perform on a regular basis to keep their head in the game.  I’m sure you’ll see many similarities to your regular duties when managing a physical environment.

Daily Tasks

  1. Gather statistics and review the previous day's performance and utilization data
  2. Look for changes between current and previous day data on both VMs and hosts
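
The day-over-day comparison in the daily tasks can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the utilization figures are entered by hand as percentages (in practice they would come from whatever statistics export you already gather), and the function name compare_days and the 20-point threshold are hypothetical choices, not part of any VMware tool.

```python
from typing import Dict, List, Tuple


def compare_days(
    previous: Dict[str, float],
    current: Dict[str, float],
    threshold_points: float = 20.0,  # assumed threshold; tune for your environment
) -> Tuple[List[Tuple[str, float, float]], List[str], List[str]]:
    """Compare two days of utilization data (percent, keyed by VM or host name).

    Returns three sorted lists:
      changed - (name, previous, current) where utilization moved by at least
                threshold_points percentage points
      added   - names present today but not yesterday
      removed - names present yesterday but not today
    """
    changed = []
    for name in sorted(current):
        if name in previous:
            if abs(current[name] - previous[name]) >= threshold_points:
                changed.append((name, previous[name], current[name]))
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    return changed, added, removed


if __name__ == "__main__":
    # Hand-entered sample data (CPU utilization in percent); names are made up.
    yesterday = {"vm-web01": 35.0, "vm-db01": 60.0, "esx-host1": 50.0}
    today = {"vm-web01": 80.0, "vm-db01": 62.0, "vm-new": 10.0}

    changed, added, removed = compare_days(yesterday, today)
    for name, before, after in changed:
        print(f"{name}: {before:.0f}% -> {after:.0f}%")
    print("New since yesterday:", added)
    print("Gone since yesterday:", removed)
```

Reporting added and removed names alongside the large swings is a deliberate choice here: a VM that appears or disappears overnight is exactly the kind of change the Change Control list above is meant to catch.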

Weekly Tasks

  1. Review host logs and vSphere logs; document errors or issues to troubleshoot
  2. Review VMFS volume capacity; do not deploy VMs to LUNs with <20% available space
  3. Look for VMs with open snapshots; these can grow too big and cause performance issues or lockups
  4. Monitor host drive space
  5. Decommission Test/Dev VMs to reclaim unused space
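
Two of the weekly checks above, the 20% free-space rule for VMFS volumes and the hunt for oversized snapshots, are simple enough to sketch in Python.  Treat this as a rough illustration only: the Datastore and Snapshot records, the function names and the 10 GB snapshot limit are all assumptions made for the example, and the data is typed in by hand rather than pulled from vCenter.

```python
from dataclasses import dataclass
from typing import List

MIN_FREE_PCT = 20.0         # the 20% guideline from the weekly task list
MAX_SNAPSHOT_SIZE_GB = 10.0  # assumed limit; the article gives no figure


@dataclass
class Datastore:
    name: str
    capacity_gb: float
    free_gb: float

    @property
    def free_pct(self) -> float:
        """Free space as a percentage of capacity (0 if capacity is unknown)."""
        if self.capacity_gb <= 0:
            return 0.0
        return 100.0 * self.free_gb / self.capacity_gb


@dataclass
class Snapshot:
    vm: str
    name: str
    size_gb: float


def deployable(datastores: List[Datastore],
               min_free_pct: float = MIN_FREE_PCT) -> List[str]:
    """Names of datastores with at least min_free_pct free space."""
    return [d.name for d in datastores if d.free_pct >= min_free_pct]


def oversized_snapshots(snapshots: List[Snapshot],
                        max_size_gb: float = MAX_SNAPSHOT_SIZE_GB) -> List[Snapshot]:
    """Snapshots larger than max_size_gb, biggest first."""
    big = [s for s in snapshots if s.size_gb > max_size_gb]
    return sorted(big, key=lambda s: s.size_gb, reverse=True)


if __name__ == "__main__":
    # Hand-entered sample data; all names are made up.
    luns = [
        Datastore("lun01", 500, 150),
        Datastore("lun02", 500, 90),
        Datastore("lun03", 1000, 200),
    ]
    snaps = [
        Snapshot("vm-web01", "pre-patch", 4.0),
        Snapshot("vm-db01", "before-upgrade", 42.5),
    ]
    print("OK to deploy to:", deployable(luns))
    for s in oversized_snapshots(snaps):
        print(f"Review snapshot '{s.name}' on {s.vm}: {s.size_gb} GB")
```

A datastore sitting at exactly 20% free is treated as acceptable here, since the guideline only rules out LUNs with less than 20% available.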

Monthly Tasks

  1. Create a capacity report for IT management; there is a great tool for this from vKernel called Capacity Analyzer
  2. Update your VM templates with the latest hotfixes and patches approved for the environment