Nexus 1000v

For any vSphere’ers, this has been released via Cisco’s website http://www.cisco.com/en/US/products/ps9902/index.html and is downloadable as a 60-day demo, so go get it. It’s very easy to set up the relevant components (even I have), but a word of caution: RTFM, as this is a “real” network switch running a real Cisco network OS (NX-OS).

Nexus is a great step forward in the world of VMware networking. It means the network bods now regain some turf in the virtualised datacenter and will need to start working together with VMware peeps to design scalable networking solutions capable of delivering great network capability to Virtual Machines. This is most certainly something organisations will need as they become fully converged and virtualised with initiatives such as Cisco UCS. (God, I sound brainwashed, but hey, it rocks.)

Adoption of Nexus 1000v within datacenters will be a large hurdle to jump. Questions need to be raised in pre-planning awareness workshops and design sessions. Typical topics that may come up are whether the networking bods know about the current de facto capability of Virtual Switches, and whether they are aware of the trunking and VLAN methodologies on offer. In reverse, I would expect the networking bods to explain to the Virtual bods the technical benefits Nexus brings to virtualised worlds.

Commercially, with most large enterprise customers being Cisco based, I won’t be surprised if we see a sideways approach to adoption. If networking divisions go and buy the Nexus family of switches that the 1000v falls under, no doubt the 1000v will get pushed as the viable and suitable option for the virtualised world that the physical Nexus switches will serve.

A few great resources that might help with education and awareness are available at the following links:

Steve Chambers provides some excellent commentary on typical issues and how to resolve them http://viewyonder.com/2009/05/22/virtualization-barrier-4-the-network-engineer/

Ken Cline has a post that he could almost publish as a book! Great resource http://kensvirtualreality.wordpress.com/2009/03/29/the-great-vswitch-debate-part-1/

Siloed DRS Clusters – Would you, do you or will you have to?

My shop runs various applications which are, on the commercial front, subject to circa-1990 ISV licensing models. This most certainly becomes a big issue when you want to reap the benefits of VMware DRS and dynamic load balancing.

Getting push back when wanting to virtualise applications which are still under licensing policies that go back to the dark ages is definitely a kick in the teeth to anyone waxing lyrical about Virtualisation. It’s also very hard for someone who believes in the excellent benefits of cutting edge technology such as VMware to accept that an ISV could be so backwards and cruel. The most common barrier you experience with this licensing model is that you can’t virtualise something because you have to license all Physical CPUs, and sometimes even the Cores, on all 32 hosts in your DRS Cluster just to run it on a single VM instance. The cost just makes it impractical, and I think any VM Lover would see sense (after punching a wall) in this.

One option to get around this is to silo DRS Clusters or segregate ESX hosts. This has its pluses and minuses; some I can think of are:

Advantages

Implementing silo’d clusters allows you to sensibly afford the licenses to cover the workloads in a virtual environment that are subject to the ludicrous rules. It allows you to virtualise across, say, 2-3 hosts and still reap the benefits of dynamic load balancing and high availability with VMotion, but within a smaller cluster.
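To put rough numbers on that, here is a minimal Python sketch of the licensing arithmetic. The price per physical CPU, the socket count per host and the 3-host silo size are all hypothetical figures I have made up for illustration; plug in your own ISV's terms.

```python
def license_cost(hosts, sockets_per_host, price_per_socket):
    """Cost of licensing every physical CPU on every host the VM could land on."""
    return hosts * sockets_per_host * price_per_socket

# Hypothetical figures for illustration only, not real ISV pricing.
PRICE_PER_SOCKET = 10_000   # assumed price per physical CPU
SOCKETS_PER_HOST = 2        # assumed dual-socket ESX hosts

full_cluster = license_cost(32, SOCKETS_PER_HOST, PRICE_PER_SOCKET)
silo_cluster = license_cost(3, SOCKETS_PER_HOST, PRICE_PER_SOCKET)
saving = full_cluster - silo_cluster

print(f"32-host DRS cluster: {full_cluster}")
print(f"3-host silo cluster: {silo_cluster}")
print(f"Saving from siloing: {saving}")
```

With per-core licensing you would multiply by cores per socket as well, which only widens the gap between the two cluster sizes.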

Indirectly this may also work well for supporting the higher end, more expensively licensed database technology (hey, I’m generalising here) that is subject to different SLA coverage. You may, for example, only want to license certain underlying hosts for additional technology, such as VMware SRM for DR purposes, which is licensed per CPU. You may not want to license hosts for SRM that only run non-critical tier 3 VMs, but you do want coverage for tier 1 apps.

Disadvantages

Silo’ing clusters is going to be a pain in the arse for architects and designers who have to plan the Virtual landscape for large scale environments, and also plan the DR of it. Technology wise, this option limits your ability to achieve what VMware is designed to do: sweat your underlying Tin and Infrastructure assets.

Silo’ing doesn’t allow you to make as much use of resource or scale out as far across the hosts in a DRS cluster, where smaller app workloads can slot into the gaps left when larger apps are running within a Cluster. On the operational side it creates more day to day management overhead within the virtual environment and management console. It also means you have more things that can go wrong in your environment and more change control considerations going forward.

I’m sure it’s evident that siloing clusters is probably more hassle than it’s worth just to virtualise applications whose licensing is not virtualisation friendly. Careful investigation is needed to ensure that the business case stacks up and that you will gain cost benefits from virtualising these workloads. A saving grace for Virtualisation adopters is that VMware makes this possible: it allows you to design and build your environment in ways that facilitate this.

Summary

Another thing to consider when it comes to licenses is the latest ESX licensing changes, with the introduction of, dare I say it, vSphere Enterprise Plus. ESX licensing across the complete ESX landscape may mean we silo ESX editions in datacentres to make economical use of extended features such as PowerPath/VE. In other areas of the datacenter this is already done at application level with products such as SQL: it is not economical to put a low end 5GB DB on an Enterprise cluster, so Standard clusters get built to host workloads that do not require the high end functionality.

All of these pointers and thoughts may well be something that Virtualisation peeps will cringe at the thought of doing. It certainly is not something I like, as I prefer shared clusters and making full use of my Infrastructure. But the mindset might have to change, and siloing may have to happen where there is no option or room for negotiation with ISVs, at both the Virtual Host and the Application licensing layers, and where ISVs won’t budge to allow us to license on a per-vCPU basis.

ESX IOPs – This is NOT a HyperV bash!

Now I’m not a nasty horrible person (although some people might think that), I just like proving facts. While googling for material on virtualising Microsoft BizTalk I stumbled across this excellently written Microsoft paper on how BTS benchmarks when running on Hyper-V against conventional physical hardware. http://go.microsoft.com/fwlink/?LinkId=123100
Conceptually this document gave me some great resource material, as I have no idea how BizTalk Server works, how its architecture is configured on the box or how its components are spread across a landscape. The best bit for me was that Microsoft kindly provided the storage IOPs profile to compare and benchmark against, which gave me the following quoted figures for expected IOPs when virtualising BizTalk on Hyper-V;
Now amateur alert here: I currently only have a very basic test and development rig running ESX 3.5. It’s only got DAS and is running RAID 5 across about a zillion disks, so it is not the best system to provide me with the 400,000 IOPs touted as being possible with vSphere. The specifications of the Hyper-V VM against my sh*tpit server were as follows;

To get a quick comparative idea of how a Microsoft quoted BizTalk simulated workload would run on an ESX setup, I ran the benchmark’s IOmeter parameters within my ESX 3.5 VM to see what comparable IOPs and other readings I would achieve using VMware ESX.

The results I experienced were as follows;

I think the results speak for themselves, even with ESX running on a platform expected to perform worse. Now please don’t sue me, Microsoft, for writing about this. I really give you the utmost credit for writing this document, as it gives someone the opportunity to learn how your application stack works and is expected to perform, and I would like to see more material like it and more vendors following suit (including VMware).

Dynamic hot add features for VM’s – Friend or Foe?

With the latest feature sets on offer within vSphere focusing on enabling IT departments to make dynamic, on the fly changes to Virtual Machine resource, this post provides a simplified viewpoint on the likely impact and the required changes to existing processes in departments today. Commentary is also provided on what issues may arise within current change processes and what effect this has on typical internal financial authorisation processes. The main question of the post is;

“How will Hot Add features fit into current IT environments and processes?”

The general observation across various virtualisation experts on the newly introduced feature set is mixed. Some view it as a godsend to operations, some as a mechanism to reduce the number of people involved in implementing change outside of conventional outage periods, and some at the opposite end see it as potentially detrimental to performance when hot add occurs. The latter certainly stands quite true: adding more vCPUs increases issues with large amounts of %RDY time if you have a workload or OS which isn’t capable of dealing with multi threaded activity.

For operational change management teams, I predict the feature will require careful planning and adoption control when introduced into departments that already have business processes in place for planned maintenance. Lastly, most organisations that have approval processes for gaining cost approval on components associated with Infrastructure services and upgrades will need to change them to control dynamic growth.

The crux is that anything current IT departments are looking to buy into that has a charge associated with it is almost certainly going to need a sustainable business case as to why you should purchase it. In the current economic climate any organisation is most certainly experiencing large amounts of kickback on any potential investment above and beyond the norm. It may be a small business that has been told to use VMware Server for free rather than buying ESX, or a medium business being forced to use only ESXi rather than full blown ESX and to get on with managing hosts individually, all the way through to large corporate enterprises that are being forced to reduce spend on VMware features such as the topic of this post, Hot Add.

It is a dead cert that Hot Add will save time and money in small and medium sized operational environments. It avoids spending money on planned out of hours coverage, as changes, if approved, are technically possible during the day. The size of the business and the maturity level of its Infrastructure usually mean that the department has to be quite reactive when it comes to reported performance issues. This flexibility may not exist in larger enterprises. Most larger organisations have a defined business process with an enforced change control process, to ensure that large outages across the Infrastructure landscape are mitigated and that any proposed changes can be regressed within decent time frames if or when a change goes wrong. Change and approval processes are also in place for things such as service improvement plans: a technical design authority within a company will approve a larger upgrade, such as an increase of RAM in a host or multiple CPU upgrades, because of the overall effect on the larger picture, i.e. your Virtualisation farm and general Infrastructure.

VMotion, remember that?

In comparison, look at how a technology such as VMotion works today. It has certainly made the organisational process required to plan maintenance and upgrade tasks on physical hosts a hell of a lot faster, and hot add is no doubt going to be very similar. The premise is that you are giving IT Operations in a large number of organisations the opportunity to perform maintenance tasks on VMs that they would not usually perform unless a pre-agreed outage, approved by CAB, allowed them to shut down the VM and add resource. As with most things, however, there are some possible barriers which may mean hot add is not a possibility. The risks and caveats that I can see causing stumbling blocks in organisations are as follows;

Hot Removal

I’ve investigated the support of hot removal and it doesn’t exist as a supported function in Windows versions (only hot add/replace). vSphere may support hot removal at Virtual Hardware level, but the important factor is that the operating system doesn’t. This may shoot you in the foot when quoting that hot add/remove can be performed at any point in time without disruption. Remember that CABs like a regression path (lack of one is quite a good excuse most times for them to defer a change). Say you experience more problems than before and can’t remove a CPU or RAM unless you turn the service off: you are walking a dodgy tightrope. Also, reducing CPUs on VMs is NOT a clean-cut 1-2 process; it needs HAL changes (which I believe are not fully supported on Windows 2003).

Cost approval and control

RAM and CPU are not free. Yes, we all know that virtualising workloads lets you use underutilised hardware resources more efficiently, but each megabyte of RAM in your hosts still costs money. This starts with ensuring that lower level Virtual admins do not have access to add RAM at the click of a button the minute a support call gets raised by someone experiencing poor application performance. This is where implementing a full vCenter delegation model is very important.

Within organisations that are currently recharging per VM, and recharging back on the actual VM size, this is quite important, as there is scope to use more underlying resource for the same money if a corner is cut. Increase the amount of RAM/CPU constantly and this cost model goes out the window, and so do your bottom line figures.

The alternative of using a Chargeback model for VMs certainly works more in tandem with hot add: a billing mechanism ensures that if a business unit or a developer requests more RAM, it will end up on a PO for approval. If you are also using VMware FT, the volume of the RAM increase needs to double! (FT doesn’t support more than one vCPU.) For Chargeback enablement in your organisation, look out for vCenter Chargeback, which provides this functionality within the vCenter console http://www.vmware.com/products/vcenter-chargeback/
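As a back of the envelope illustration of that billing logic, here is a small Python sketch. It is not how vCenter Chargeback works internally; the function name, the rate per GB and the VM sizes are all hypothetical, and the only rule taken from the text above is that an FT protected VM doubles the RAM you pay for.

```python
def ram_hot_add_charge(current_gb, requested_gb, rate_per_gb, fault_tolerant=False):
    """Extra monthly charge for growing a VM's RAM from current_gb to requested_gb."""
    if requested_gb < current_gb:
        # Hot removal is not supported at the OS layer, so shrinking is not modelled.
        raise ValueError("requested RAM is smaller than current RAM")
    extra_gb = requested_gb - current_gb
    if fault_tolerant:
        extra_gb *= 2  # the FT secondary VM consumes the same RAM again
    return extra_gb * rate_per_gb

# Hypothetical recharge rate: 20 currency units per GB of RAM per month.
RATE_PER_GB = 20

standard_charge = ram_hot_add_charge(4, 8, RATE_PER_GB)
ft_charge = ram_hot_add_charge(4, 8, RATE_PER_GB, fault_tolerant=True)

print(f"4GB -> 8GB, standard VM: {standard_charge} per month")
print(f"4GB -> 8GB, FT VM:       {ft_charge} per month")
```

The figure that comes out is what would land on the PO for approval, which is the whole point: the click of a button still has a price attached.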

Summary

Ultimately, to avoid any possible issues and arguments in CAB, the key is to rightsize your Virtual Machines and match the workload correctly in the first place. Hot Add of resource for supported OSs running in VMs really is a great thing. It enables x86 workloads to act almost non stop, mainframe style, which means very little downtime for apps and services; this is exactly what you get with Mainframe systems. This strategy of dynamic growth helps VMware achieve the goal of turning x86 into the new mainframe. It will undoubtedly play at least a small role in adding a huge amount of functionality improvement and value within your organisation, so make sure you embrace it, but make sure it doesn’t get out of control.

VM Sprawl – prevention rather than cure

The subject of VM sprawl is probably becoming quite common within any organisation that has virtualised the vast majority of its server estate. With analyst reports such as this one http://www.techworld.com/virtualisation/news/index.cfm?newsID=115876&pagtype=all predicting huge growth and adoption of Server Virtualisation within datacentres over the coming years, it is now time to start looking at methodology, technology and IT governance to control this popularity and sprawl, and to ensure that you do not become a victim of your own success gained through Virtualisation. This post goes over a small number of the methods and techniques you can use to control and govern VM sprawl, through both technological solutions and governance and auditing processes.

Within the virtualised world today, many people who have deployed virtualisation have the problem that business leaders and purse string holders now know about the fantastic cost benefits they can obtain from investment in VMware and other Hypervisors. Examples of these benefits include;

  • Ability to do more with less physical infrastructure and with less upfront capital expenditure
  • Capability of deploying cheaper DR
  • Reductions in project opex resource costs
  • Introduction of improved agility for development lifecycles
  • Reductions in deployment times on production rollout of server instances
  • Reductions in real estate; they can even rent out your space if you’re part of a massive group of organisations

A large number of IT shops have enforced the ever popular “Virtualisation first” policy since the virtual boom time around 2006/7. This has been enabled by the excellent work VMware have done in ensuring that production workloads can and will be suitable for a Virtualised platform. However, this policy, and the benefits it has returned to those requesting services, is now most likely starting to mean that your Infrastructure is growing out of control at quite a rapid rate. You are also finding that, thanks to the agility benefits, project requesters are probably slipping the odd extra VM into an estate, which they would not typically do with Physical tin due to the associated cost and process to deploy. You may start to experience operational issues such as your storage array becoming full, your Network switches filling up due to host demands and, in the worst cases, that lovely space you gained back through an aggressive P2V strategy circa 2005/6 now being needed back for ESX hosts, all requiring investment again.

Reactive measures

In its simplest form, reactive resolution of VM sprawl can start with general house cleaning. This won’t require you to purchase a product, as VirtualCenter can quite easily identify and target reductions if needed. For example, some VMs might not be registered on ESX hosts, and some might have been replicated or spun off to a clone due to operational issues when the app team or ISV originally deployed the VM. You may also find that the VMDKs presented to VMs are way underfilled, so they can be shrunk to regain space.

On the consumed storage issue, vSphere 4 introduces a few added pieces of functionality which will help reduce this in future; any recommendations here are based on current releases. Main features include Thin Provisioning of VMs, which enables you to grow VM usage without having what is effectively unusable whitespace within your VMDKs.
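To get a feel for how much whitespace you are carrying, a quick sketch like the one below can help. The VM names and the provisioned/used figures are made up, and the 50% threshold is just an assumption I have picked; in practice you would feed it numbers from your own audit.

```python
# (VM name, provisioned GB, used GB): hypothetical sample figures
VMDKS = [("web01", 40, 12), ("db01", 200, 150), ("app01", 60, 15)]

def whitespace_report(vmdks, min_free_ratio=0.5):
    """Return (name, free GB) for disks at least min_free_ratio empty, worst first."""
    report = []
    for name, provisioned, used in vmdks:
        free = provisioned - used
        if free / provisioned >= min_free_ratio:
            report.append((name, free))
    return sorted(report, key=lambda item: item[1], reverse=True)

candidates = whitespace_report(VMDKS)
total_whitespace = sum(provisioned - used for _, provisioned, used in VMDKS)

for name, free in candidates:
    print(f"{name}: {free} GB of whitespace, candidate for shrinking or thin provisioning")
print(f"Total whitespace across all VMDKs: {total_whitespace} GB")
```

The disks that come out at the top of the list are the ones worth looking at first, whether you shrink them today or thin provision them later.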

Proactive planning and prevention

Every virtualised environment should have at least some kind of documented audit. If you have not got a CMDB, then in its simplest form an Excel spreadsheet provides a basic view of your Virtual Infrastructure and allocation. VirtualCenter has exportable reporting built in that helps you build even a simple spreadsheet. To see this in action within your VirtualCenter today, go to “VM and Template View”, select the highest level folder, then select “File > Export > Export List”. Some VI Admins may be quite clever with PowerShell scripts or SQL queries, but this is quick, easy and intuitive. You can also use this type of audit to help with capacity planning for your environment: it enables you to monitor how much space you have left and perform simple “What If” analysis on how much disk, RAM and CPU resource you would have left after adding a new machine that is being requested.
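If you save the exported list as CSV, the “What If” sum can be sketched in a few lines of Python. The column names, the sample rows and the cluster capacity figures below are all assumptions for illustration, so match them to whatever your own export and hardware actually look like.

```python
import csv
import io

# Stand-in for an exported VM list. The column names are assumptions,
# not the exact headings VirtualCenter produces.
EXPORT = """Name,CPUs,Memory MB,Provisioned GB
web01,2,4096,40
db01,4,16384,200
app01,2,8192,60
"""

# Assumed total capacity you are prepared to allocate (hypothetical figures).
CAPACITY = {"cpus": 32, "memory_mb": 65536, "disk_gb": 1000}

def allocated(export_text):
    """Sum the vCPU, RAM and disk allocation of every VM in the export."""
    totals = {"cpus": 0, "memory_mb": 0, "disk_gb": 0}
    for row in csv.DictReader(io.StringIO(export_text)):
        totals["cpus"] += int(row["CPUs"])
        totals["memory_mb"] += int(row["Memory MB"])
        totals["disk_gb"] += int(row["Provisioned GB"])
    return totals

def what_if(export_text, new_vm):
    """Headroom left per resource if new_vm were added; negative means over capacity."""
    used = allocated(export_text)
    return {key: CAPACITY[key] - used[key] - new_vm[key] for key in CAPACITY}

remaining = what_if(EXPORT, {"cpus": 4, "memory_mb": 8192, "disk_gb": 100})
print(remaining)
```

This treats vCPUs as a simple allocation budget, which is a simplification since CPU is normally overcommitted; the RAM and disk figures are the ones most likely to bite first.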

VMware will also at some point this year have a module called CapacityIQ which provides this functionality from within the vCenter console; for more information see http://www.vmware.com/products/vcenter-capacityiq/. I’ve seen it in action and it’s great. It provides out of the box functionality which will most certainly aid what I’ve talked about within this post.

The Rolls Royce solution

For larger enterprise sized Virtual environments, keeping track of constant demand and growth is impossible, and to succeed IT services ideally need to be self service based, with the end user or customer able to request what they want through a web mechanism. It may sound stupid to give the end user control that could add to the very sprawl problem you are experiencing. To combat this, however, the SSP (self service portal) can be provided with delegated privileges, pre-defined object creation controls and approval processes that route to higher level management or project support offices. It can also provide proactive benefits such as what-if analysis and tombstoning of Virtual Machines. All policy applied within the technology is set by IT governance and defined within the tools according to business requirements.

Two example products which provide self service portals include;

These technologies currently have rather low uptake and adoption within organisations. There may be more technologies on the market, but given the example functionality in the above products we will certainly start to see more and more of them as IT departments struggle with the demands from the business for Infrastructure. I also predict that these technologies will start to become known, as VMware has, as the killer app for reducing lost productivity within organisations and project teams.

The issue with these products today is that they have medium to large price tags, which puts off the typical bean counter when business cases are put forward. So before building any proposal, do your research on the product and see where you feel it can cut current tedious, expensive business processes, reduce VM sprawl and improve your budgeting and cost projections, so that this can be turned into a measurable, deliverable ROI after deployment.