VMworld Cannes

Hi all,


I am fortunately (for me, not you; you’re probably reading this at work during the week) off to VMworld in Cannes this week to get the latest roadmaps on VMware and partner tech, and to top up on all things virtual. I missed last year’s VMworld, but this will be the third VMworld event I have attended.

After I’ve fended off the various vendors who will no doubt be ringing me the Friday I get back, I am hoping to feed back some down-to-earth news and views on what the vendors are touting in their public announcements, what up-and-coming tech is going to arrive with ESX 4.0 or vSphere (who knows, it might not be called that by now), and what is new in the cloud arena from VMware. I also plan to do some quality networking with people sharing the same pain as me on various virtualisation projects and initiatives. If I get time I might even do some live blogging during the week…

If you’re going down, I hope to see you and touch base; it will be good to put a face to a name.

Blade Versus Rackmount…ding ding round one!

Sorry folks, I was a bit busy last week for posts, so hopefully I will now be able to create some new material.

I am sure anyone who has deployed new x86 server hardware on a large scale, especially for a virtualisation project, has most likely had the task of comparing blade versus rackmount for their physical hosts. Aaron Delp is writing some fantastic articles on how blades compare with their rackmount counterparts on scalability options and on the green agenda. Part one is at http://blog.scottlowe.org/2009/02/04/blades-and-virtualization-arent-mutually-exclusive-part-one-hp-power-sizing/ and you can follow parts two and three, which are yet to be added and are based on scalability options with HP C-Class blades.

This post will hopefully provide some ideas and general advice for any project you are doing or about to be involved in, and some rolling ideas on which areas may need operational process change or technological change to support possible blades.

Blade ROI Claims

Blades reach a return on investment at a certain number of blades; according to HP, for C-Class that figure is apparently nine. When you look at the cost savings on core infrastructure components such as Ethernet cabling, SAN fibre connections, power connections and iLO ports, it becomes evident when you do your sums why this number is relevant and where you save money compared with a typical rackmount.

You won’t get this ROI just by putting in a blade chassis and nine blades. You will need to prepare and ensure that certain design criteria are implemented optimally, and check whether the design is even supportable by the dependent infrastructure such as SAN and LAN switching.
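As a rough illustration of the sums behind a break-even figure like that, here is a minimal Python sketch. The function name and every price in it are made up for the example (they are not HP figures), and it assumes, as a simplification, that all blade connectivity cost sits in the one-off chassis price while each rackmount server carries its own cabling and port costs.

```python
def break_even_servers(chassis_cost, blade_cost, rack_server_cost, rack_connectivity_cost):
    """Return the smallest server count at which blades work out cheaper, or None.

    All inputs are prices in the same currency. rack_connectivity_cost is the
    per-server cost of the cabling and switch/SAN/iLO ports that a chassis
    would otherwise consolidate (an assumption of this sketch).
    """
    saving_per_server = rack_server_cost + rack_connectivity_cost - blade_cost
    if saving_per_server <= 0:
        return None  # blades never pay back the chassis at these prices
    # Blades win once n * saving_per_server exceeds the one-off chassis cost.
    return int(chassis_cost // saving_per_server) + 1


# Hypothetical prices, for illustration only.
print(break_even_servers(chassis_cost=18000, blade_cost=4000,
                         rack_server_cost=5000, rack_connectivity_cost=1500))
```

Plug in your own quotes; if the answer comes out well above the number of blades you actually plan to buy, the ROI argument probably does not hold for you.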

Due Diligence and design consideration

This process is a priority regardless of your views on blades; you may be completely sold, but I urge you to do it anyway. Design considerations need to cover key areas such as your organizational structure, your design strategy for growth and scalability of the running application landscape, and how agile your internal deployment process is. At the host level, consider the applications and services that will go onto candidate blades: for example, is the application stateless, like a VMware View or Citrix setup, or does the host run multiple workloads, like an ESX host or a SQL database?

Another point to consider is whether the application has failover capability into another chassis, as with MS Clustering or VMware HA, or whether it is a standalone application with just conventional DR practice. These example questions are all factors that need to be taken into account when you look at the blade/rackmount options and whether you have enough flexibility to cope with the various demands of your application landscape.

The trouble with blades is that a lot of companies are buying them without batting an eyelid, while a lot of others are afraid to buy them because of the horror stories about early-generation releases. This difference creates a big collision of opinions in the blogosphere and an array of bitchslapping between manufacturers on topics like what makes their chassis better than the other manufacturer’s, which one is easier to manage, and so on.

The key is to shut the industry arguments out. Do your risk analysis on whether blade or rackmount is right for your organization, not for a FTSE 100 company or a Joe Bloggs PLC case study (not initially, anyway), and really investigate how you will benefit, both technically and operationally, from deploying blade or rackmount within your environment.

Old Faithful

Rackmount is a completely comfortable and known entity to architects and engineers; it has been the de facto standard for x86 server computing for a great many years. Blade, on the other hand, is the new kid on the block. In concept, however, it has been present for a number of years in network switch technology in the form of line card modules, and it is probably even more recognisable to storage architects and engineers from Fibre Channel director switches.

When you look at the hardware specification of the HP C-Class blade range (which is my preferred range, by the way), it certainly has a full catalogue of solutions capable of the same CPU and RAM scalability as rackmount. I do not know about IBM or Dell as I have never used them, so I will stick with HP blades as my target architecture. This is not to say they are no good; I just won’t blag that I know them! (Don’t laugh, it’s true.) Alternative vendors use the same building-block principle across commodity servers, and would be stupid not to, so this advice is hopefully still highly relevant even if you have an alternative manufacturer as your standard for infrastructure and possible blades.

Without trying to steal some of Aaron’s subject matter, the differences between blade and rackmount start to become more of a challenge when you look at expandability and upgrade capacity for additional connectivity. As stated, the host hardware resources on higher-end blade models are comparable on RAM and CPU density: take the DL580 and the BL680 as an example, both host up to four CPUs and both can take 128GB of RAM natively.

Hopefully you agree with the above so far and find it of interest. The next sections provide some key factors to bring into your design discussions when weighing up whether blade or rackmount is right for you. Areas you may find useful to include:

Server Scalability

This is an important factor. When designing and planning a blade deployment, you need to make sure the target blades can meet the requirements of the workload you will house immediately and of any planned future growth, such as more ESX hosts in your virtual farm or a larger Exchange environment for more users, without having to completely redesign your blade infrastructure and the connectivity on the chassis backplane.

In ESX environments you will need to ensure that the initial deployment has enough I/O port capacity for your virtual machine requirements. Running a Capacity Planner evaluation of your current physical estate will establish how much network and storage I/O you need to cater for in your running VMs. Also consider whether you plan to use network storage, as this requires dedicated network ports in ESX.

Another factor is coverage for the networking requirements of future functionality. One example is VMware FT: going by the prerequisites in some teaser posts from VMworld 2008, it will need a dedicated network port for replication traffic and security, in much the same way that VMotion requires dedicated bandwidth and security for migrations.
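To make the port budgeting concrete, here is a minimal Python sketch that tallies the NIC ports one ESX host would need against what a blade can offer. The traffic list, the "one redundant pair per traffic type" rule and the eight-port figure are my assumptions for the example, not VMware or HP requirements; feed in your own design rules and Capacity Planner output.

```python
def nic_ports_needed(traffic_types, redundant=True):
    """Count physical NIC ports if every traffic type gets its own uplink(s)."""
    ports_per_type = 2 if redundant else 1
    return len(traffic_types) * ports_per_type


def port_shortfall(traffic_types, ports_available, redundant=True):
    """Return how many ports the host is short by (0 means it fits)."""
    return max(0, nic_ports_needed(traffic_types, redundant) - ports_available)


# Hypothetical traffic split for one ESX host; adjust to your own design.
today = ["service console", "VMotion", "VM traffic", "IP storage"]
future = today + ["FT logging"]

# Hypothetical blade with eight NIC ports across onboard and mezzanine cards.
print(port_shortfall(today, ports_available=8))
print(port_shortfall(future, ports_available=8))
```

The point of running it for both lists is to see whether a blade that fits today still fits once a feature like FT turns up.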

Total Blade Chassis

This is more important to people looking at deploying blades in smaller environments. Take a worst-case scenario: if you had issues with the connectivity components in the chassis, or with the external SAN and LAN connectivity to it, would a disruption to all applications and services running on the blades in that chassis be acceptable to your end users or customers? Would buying another chassis to mitigate such an outage be cost effective?

On a larger scale this is also relevant to anyone deploying into high-density enterprise environments; you will need to consider where you place the relevant blade hosts across your chassis. For example, in a VMware HA environment you may find that splitting, say, 16 ESX hosts across four chassis is more suitable than hosting them all in a single chassis.
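The 16-hosts-across-four-chassis example can be sketched in a few lines of Python. This is only an illustration of the placement arithmetic: the function names and host names are hypothetical, and it treats every host as equal capacity, which a real cluster may not.

```python
def spread_hosts(hosts, chassis_count):
    """Place hosts round-robin across chassis numbered 1..chassis_count."""
    placement = {chassis: [] for chassis in range(1, chassis_count + 1)}
    for index, host in enumerate(hosts):
        placement[index % chassis_count + 1].append(host)
    return placement


def worst_case_loss(placement):
    """Fraction of all hosts lost if the fullest chassis fails."""
    total = sum(len(hosts) for hosts in placement.values())
    return max(len(hosts) for hosts in placement.values()) / total


# Hypothetical host names for a 16-host cluster.
esx_hosts = [f"esx{n:02d}" for n in range(1, 17)]

print(worst_case_loss(spread_hosts(esx_hosts, 4)))  # four chassis
print(worst_case_loss(spread_hosts(esx_hosts, 1)))  # everything in one chassis
```

The worst-case fraction is the share of cluster capacity your HA design has to be able to absorb if a whole chassis goes dark.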

Chassis Connectivity

This is where the magic sauce of blade technology is for cost savings and all-round savings. In a conventional rackmount server environment the back of the rack becomes the rat’s nest that most engineers and administrators just hate; I won’t even go into how much of a mess cabling for 1U and 2U servers can become. Tidy cabling is something of an unfortunate prerequisite for any design by a perfectionist like me. It is also a factor to consider when you need to be agile, however it’s important to ensure your datacenter design will

For your blades you will need to understand cable trunking methodology and how to map it appropriately to external connections; this is the case for both LAN and SAN connectivity. In a consolidated blade environment, operational build-out and deployment will differ from the conventional equivalent of just plugging two HBAs into a fabric switch. In simple terms, you group multiple ports on both the blade chassis modules and the end-point connection.

Virtual Connect, and most recently Flex-10 module switching from HP, throws this topic way over the horizon and is a post in itself. I seriously recommend you check out the HP material, which is extremely good for planning and designing the backplane, at http://h71028.www7.hp.com/enterprise/cache/80316-0-0-225-121.html. I may even do a post on Flex-10 as it looks pretty cool.

To 10GbE or Not to 10GbE

Most blades today support 10GbE on both the blade mezzanine and the backplane modules. This is great if your external switching has 10GbE capability and you are using network storage. Using 10GbE will hopefully reduce the need for high-density network port usage on blade mezzanines, certainly in an ESX environment. Keep your eyes open, though, for emerging technologies such as FCoE; check out my last post on the Nexus 1000V, which explains how this works in an example ESX environment with the Nexus 1000V and the physical Nexus switch range.

The cost of 10GbE infrastructure is still quite high, so I recommend you perform a cost analysis to see whether implementing 10GbE now rather than later will save you money on port costs.
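A cost analysis like that can start as simply as the Python sketch below. Every price and port count here is a placeholder I have invented for the example; swap in the per-port figures from your own switch and mezzanine quotes.

```python
def port_cost(hosts, ports_per_host, cost_per_port):
    """Total port cost for a farm: hosts * ports per host * price per port."""
    return hosts * ports_per_host * cost_per_port


def break_even_10g_port_price(gbe_ports_per_host, gbe_cost_per_port, tengig_ports_per_host):
    """10GbE per-port price at which both designs cost the same per host."""
    return gbe_ports_per_host * gbe_cost_per_port / tengig_ports_per_host


# Hypothetical farm: 16 hosts, eight 1GbE ports each versus two 10GbE ports each.
gbe_total = port_cost(hosts=16, ports_per_host=8, cost_per_port=150)
tengig_total = port_cost(hosts=16, ports_per_host=2, cost_per_port=900)

print(gbe_total, tengig_total)
print(break_even_10g_port_price(8, 150, 2))
```

With these invented numbers 10GbE is still the dearer option, and the break-even price tells you how far the per-port cost has to fall before "now rather than later" makes sense on port costs alone.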

Future Roadmaps

And to cap it off (yes, I know you’re nearly asleep), get onto your reseller, vendor and manufacturer for roadmaps on where your chosen blade technology is heading. I seriously recommend you do this when investigating whether blade is right for you. It’s a bit like buying a new car and checking whether the new model is going to arrive in six months’ time. This becomes very important when buying ESX hosts for a DRS cluster: you need to procure enough up front so that you do not suffer from a lack of backward compatibility between CPU steppings when attempting to perform VMotion and use DRS.

It is also important to evaluate the roadmap for the hosted virtualisation technology with both VMware and the blade vendor. For example, if you are going to host your ESX environment on blades, I seriously recommend you investigate whether you need to plan for future features that require extra network connectivity, such as VMware FT, and whether you will need additional support for SAN technology such as NPIV and future storage technology within the vStorage API.

Granted, some of this may not be possible, but try. You are committing to a manufacturer by buying their blades, so hopefully they will respect that commitment and provide you with at least some roadmap.

Summary

Choosing any server technology is not an easy task. Hopefully the above has provided some assistance and ideas on what to look into when considering blade technology for your infrastructure needs.

My advice is to evaluate and investigate enough beforehand to ensure that you are not going to limit yourself in any future design and upgrade of your infrastructure. You may feel confident that this will not be an issue, but hopefully the above has helped even so.

Nexus 1000v

Sheesh, this write-up is one uber-informative and cool schematic, with an explanation of how the Cisco Nexus 1000V works with the new converged Cisco physical Nexus networking switches: http://www.internetworkexpert.org/2009/01/01/nexus-1000v-with-fcoe-cna-and-vmware-esx-40-deployment-diagram/

I like the concept behind the 1000V for virtualised environments. The physical external dependency in FCoE and convergence is something I have not had the opportunity to look at yet, so looking at this resource, and at the Nexus 1000V since it was previewed at VMworld last year, has given me a great insight into how the two work in tandem within a virtualised environment to increase capability while consolidating cabling and cutting provisioning times. Think of it this way: today a typical ESX host running, say, 15 VMs and using NFS, iSCSI and Fibre Channel can have up to 8-10 connections. This differs completely when converged; have a look at the diagram I’ve knocked up to show the conceptual view over 10Gb FCoE (it’s only rough, go easy).



Looking at the QLogic QLE8042 CNA specification sheet, it works seamlessly with standard 4/8Gb Fibre Channel connections and is not solely for use within an FCoE stack. This could mean that you can invest in a CNA today, rather than buying zillions of HBAs for, say, a whole ESX farm being built, and upgrade your external infrastructure to the converged switches quite easily at a later date.

There are some interesting comments on the post about how this technology is currently not developed with blade technology such as HP C-Class in mind. Cisco are going to be announcing blade platforms http://www.theregister.co.uk/2008/12/11/cisco_blade_servers/ so I can imagine that they are not going to release a product which doesn’t work with their next generation of switching technology. Let’s face it, they will most likely release backplane technology that reaches greater heights than competitors manage today with the likes of HP Virtual Connect, as big blue things with exceptionally quick backplanes are their core business.

Expect the market to most likely open up and be targeted for competition, probably in the same way that it did in the early days of blades, when things opened up for partnerships with blade switch vendors such as Nortel and Cisco themselves.

NFS is back…again

Be careful, though, when architecting connectivity for virtualised environments and the back-end physical stack that supports them. Many shops use iSCSI and NFS quite happily today across a 1Gb networking infrastructure, so what advantage does converged technology bring? Surely it erodes the cost benefits that have been achieved on typical Catalyst and ProCurve range networking (maybe not Nortel, hey). I’ll maybe touch on this and the whole iSCSI/FCoE war in another post when I have time.

So I guess one question being asked by architects and the people who will be investing is how long it will take for the converged initiative to take off and gain popularity; we don’t want another InfiniBand, with limited usage and adoption, now do we? In a credit crunch it is not going to be easy to justify spend on such projects, and it’s also not going to be easy to justify ripping out and replacing your full core networking infrastructure to start using this type of architecture with its limited HCL (only about two entries on the current HCL). So I guess it’s a technology which, like most, will be more heavily adopted in 18-24 months’ time, once it has been used by the banks and service providers who can invest and test (with our money anyway ;)).

Update 02.02.09 - Realised I’ve not gone into too much technical detail on how the 1000V works with VMs in a virtualised environment, so I will delve into the 1000V in a later “part deux” post.