Cloud Overbooking – Part 2

Following last week's post, which looked at where the kind of algorithms used in the airline overbooking model may need to be applied within a public cloud provider such as EC2, this second part offers some predictions on where similar software and associated modules may start to arise in the world of cloud providers.

As stated in Part 1, I work for an airline. This doesn't mean this post will include industry secrets, but I will provide a comparison of the technologies used to make the overbooking strategy work.

The cloud revenue calculation thingy 

Today the airline industry has various commercially available software products that calculate what an airline can make from different seating strategies on certain key flights. Don't ask how they do it, but the companies that designed them are certainly not short of a bob or two, which tells you it is very niche and very clever (and it works).

Comparing this to the world of public clouds, I think we may see software ecosystems arise just as they have in the reservation and booking world. A couple of thoughts I have collated on what may or may not emerge in the future state of cloud computing include:

  • Third parties selling software to public cloud providers to calculate the optimal times or prices to charge customers (or does Amazon already do this?)
  • If cloud is going to provide fluidity and flexibility like the electricity supply to your home, will we see variable seasonal or peak pricing emerge once cloud becomes more heavily adopted and resources become scarce?
  • Will larger customers obtain first-class citizenship in multi-tenant environments and receive higher weighting and priority when resources become scarce, in the same way airlines treat frequent flyers? Or will a model exist where customers are exposed and at risk, more like economy travellers in the overbooking model, paying smaller prices for services but running the risk of being bumped off the core underlying cloud service?

I am only speculating here; it is difficult to know what really goes on within public cloud business plans, but this may become more apparent as people transition from conventional outsourced models into cloud-based environments.
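
To make the frequent-flyer versus economy idea a bit more concrete, here is a minimal Python sketch of priority-based allocation when capacity is scarce. The tier names, the `Request` type and the `allocate` function are all hypothetical: it simply walks requests from the highest tier down, grants each one that still fits and "bumps" the rest. It is not how EC2 or any other provider is known to allocate resources.

```python
from dataclasses import dataclass

# Hypothetical tiers: a lower number means higher priority, much like
# frequent flyers versus discounted economy fares. Names are illustrative.
TIER_PRIORITY = {"premium": 0, "standard": 1, "discount": 2}


@dataclass
class Request:
    tenant: str
    tier: str
    units: int  # capacity units requested (e.g. vCPUs), an assumed metric


def allocate(requests, capacity):
    """Grant requests in priority order while capacity remains.

    Returns (granted, bumped). Each request that still fits is granted;
    anything that no longer fits is bumped. sorted() is stable, so within
    a tier the earlier request wins.
    """
    granted, bumped = [], []
    remaining = capacity
    for req in sorted(requests, key=lambda r: TIER_PRIORITY[r.tier]):
        if req.units <= remaining:
            granted.append(req)
            remaining -= req.units
        else:
            bumped.append(req)
    return granted, bumped


reqs = [Request("a", "discount", 4), Request("b", "premium", 6), Request("c", "standard", 4)]
granted, bumped = allocate(reqs, capacity=10)
print([r.tenant for r in granted], [r.tenant for r in bumped])
```

With 10 units available, the premium and standard tenants are served and the discount tenant is the one bumped, exactly the economy-passenger position described above.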

Screen Scrapers 

Just another crazy thought that I'll leave you with, which is completely separate from overbooking and concerns the potential role of screen scrapers in “cloud commerce”. In the reservation world, screen scrapers play havoc with travel industry websites if they are not controlled. In a nutshell, a screen scraper is a third party that scrapes, say, an airline booking site to scour for the best deal. If not controlled correctly, scrapers consume transactional capacity in the underlying e-commerce environment, and the experience of the real humans using the website directly suffers. Screen scrapers can work in an airline's favour, though: some airlines have agreements with third parties to “scrape”, and some have partnerships with third parties who provide indirect services.
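
One common way to stop scrapers playing havoc is per-client rate limiting. Below is a minimal Python sketch using the token-bucket technique. The client keys, the rates and the idea of giving partners with a scraping agreement a bigger bucket are my own assumptions for illustration, not how any particular airline or cloud portal does it.

```python
import time


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second up to
    `capacity`; each request spends one token."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ScraperGate:
    """One bucket per client key (e.g. an IP address or API key, assumed).
    Hypothetical partners with a scraping agreement get their own limits."""

    def __init__(self, default=(1.0, 5), partners=None, clock=time.monotonic):
        self.default = default            # (rate per second, burst size)
        self.partners = partners or {}    # client -> (rate, burst)
        self.clock = clock
        self.buckets = {}

    def allow(self, client):
        if client not in self.buckets:
            rate, burst = self.partners.get(client, self.default)
            self.buckets[client] = TokenBucket(rate, burst, self.clock)
        return self.buckets[client].allow()
```

An unknown scraper gets a small burst and then a trickle of requests, while an agreed partner can be given far more headroom, so human visitors are not crowded out either way.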

So within the world of cloud services, are we going to see an influx of parties screen scraping big players like EC2 and draining their e-commerce portals? Imagine hundreds of screen scrapers upon screen scrapers scouring the main portals to see whether EC2 has a good price, suitable AMIs, suitable SLAs (don't laugh) and many other characteristics. It would degrade the service for end users and potentially steer them to competitors… Just more crazy thoughts that I'll leave you with.

That's all folks, until next time.

Cloud Overbooking – Part 1

Now for something cloud related, as I haven't waffled on about cloud for a while. This two-part series (it got too long for one post) is about oversubscription, or over-allocation, strategies in the public cloud world. In this first part I will use the current airline reservation overbooking strategy as an example to see where similar algorithms may start to be needed to calculate workload allocation in a typical open public cloud provider. This post was also supercharged by this excellent post on what the blogosphere classes as the difference between capacity oversubscription and over-capacity models within the Amazon EC2 service.

So ever been bumped up or bumped off?

No, this isn't a question about your mafia status; I am talking about flight bookings. As you may have noticed from the “about me” section, I currently work for an airline, so I will use some of the (small) knowledge I have gained on how the oversubscription model works in our world. It is well known that the airline industry is one of a number of industries that “overbook” certain flights. See this definition for the full gory detail of how the process works behind the scenes, but in a nutshell it is an algorithm the travel industry uses to work towards full capacity on certain flights by taking more upfront bookings than there are seats available in the reservation system, on the expectation that some passengers will not turn up. Overbooking tends to affect the entry-level economy passenger, who is paying less for the seat and is likely to be a less regular customer. Lastly, overbooked passengers are all covered by compensation in many shapes and forms, such as being offered a seat on the next available flight or an amount of cash that keeps them happy.
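
As a rough illustration of the idea (and definitely not any commercial airline system), here is a minimal Python sketch. It assumes each booked passenger turns up independently with the same probability, and finds the largest number of bookings for which the chance of more passengers showing up than there are seats stays under a chosen risk level. The function names and parameters are hypothetical.

```python
from math import comb


def prob_overflow(bookings, seats, show_prob):
    """P(more than `seats` of `bookings` passengers turn up), assuming each
    passenger shows up independently with probability `show_prob`."""
    return sum(
        comb(bookings, k) * show_prob**k * (1 - show_prob) ** (bookings - k)
        for k in range(seats + 1, bookings + 1)
    )


def booking_limit(seats, show_prob, max_risk):
    """Largest number of bookings whose overflow probability stays within
    `max_risk`. Overflow risk only grows as bookings grow, so keep adding
    bookings until the next one would break the risk budget."""
    if not (0 < show_prob <= 1 and 0 <= max_risk < 1):
        raise ValueError("need 0 < show_prob <= 1 and 0 <= max_risk < 1")
    bookings = seats
    while prob_overflow(bookings + 1, seats, show_prob) <= max_risk:
        bookings += 1
    return bookings
```

For instance, `booking_limit(100, 0.9, 0.05)` sells a few seats beyond the 100 available, whereas a show-up probability of 1.0 allows no overbooking at all. Real systems forecast no-shows far more richly than a single probability.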

So, hopefully after reading this brief outline of how the overbooking model works, you can see why I am beginning to think an overbooking or oversubscription strategy will need to be adopted within public clouds. To justify my comparison: simplistic marketing from public cloud companies states that you can buy a workload in EC2 and assume it will provide the compute and networking resources you would get if hosting on premise. Given that, do you think the same rules could apply to the allocation of workloads in a shared, multi-tenant public cloud?

Rate of change of public cloud a problem?

Public cloud adoption is happening at a very fast rate. In future I expect public cloud providers such as EC2 to hit massive problems in accommodating large volumes of customer demand, and I also predict that public cloud is certainly not capable of concurrently serving every single customer who has ever laid eyes on a public cloud virtual machine. Therefore I believe that, to succeed, public cloud providers will seriously need to look at the level of service they can offer and design an algorithm similar to the one airlines have developed in the overbooking model. Remember, you are not always guaranteed the seat on a plane that you want, but most customers are happy to take compensation in return. Interestingly, the compensation from a public cloud provider is not likely to be high if you fail to get the workload you require…
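
Picking up on that last point about compensation, here is a hedged Python sketch of why cheap compensation matters. It assumes (my assumptions, not anything a provider has published) that every unit of capacity sold pays a fixed price whether or not it is used, that each customer turns up independently with the same probability, and that every customer who cannot be served is paid a fixed compensation. It then brute-forces the number of sales with the highest expected profit; all names are hypothetical.

```python
from math import comb


def expected_bumped(bookings, capacity, show_prob):
    """Expected number of customers who turn up but cannot be served."""
    return sum(
        (k - capacity) * comb(bookings, k) * show_prob**k * (1 - show_prob) ** (bookings - k)
        for k in range(capacity + 1, bookings + 1)
    )


def expected_profit(bookings, capacity, show_prob, price, compensation):
    """Revenue from every sale (assumed paid up front, no refunds) minus
    the expected compensation bill for bumped customers."""
    return bookings * price - compensation * expected_bumped(bookings, capacity, show_prob)


def best_bookings(capacity, show_prob, price, compensation, search_limit):
    """Brute-force the number of sales, from `capacity` up to `search_limit`,
    with the highest expected profit."""
    return max(
        range(capacity, search_limit + 1),
        key=lambda n: expected_profit(n, capacity, show_prob, price, compensation),
    )
```

Under these assumptions, if the compensation multiplied by the show-up probability is below the price, every extra sale still adds expected profit, so `best_bookings` simply runs to its search limit: low compensation gives a provider little reason to stop overselling. Raise the compensation and the optimum settles much closer to capacity.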


I admit that this comparison between cloud providers and airline reservations is quite a cynical view, but to put it into perspective, my view is that EC2, or any other public cloud provider that struggles to control who is able to buy a workload and who wants to use one, is going to hit massive PR and customer relations problems, just as an airline does when it unfortunately overbooks a flight by 20-30 economy passengers.

In Part 2 I delve into various areas and technologies that exist today in the airline reservation world and align them with how they may emerge in the world of cloud, either as potential problems or as answers to common problems.

Abstraction… love it or hate it?

So, I am an infrastructure guy surrounded by massive volumes of technology that operates above server, storage and networking environments to enable certain goals. Some of this technology provides a level of abstraction (or virtualisation) to make portability and migration between the lower-level and upper-level components easier, e.g. VMware server virtualisation or SAN virtualisation arrays. We also have lower-level abstraction components that we don't even notice, such as proprietary volume managers and file systems.

Unfortunately, despite being full of such glorious technology, the industry is still plagued by a lack of easy migration and flexible movement capability. By this I mean examples such as the following, which I hear and see about day to day in the industry:

  • Replicating/cloning from one SAN array vendor to another
  • Migrating a VMware VMDK file to an alternative hypervisor vendor's format, such as Microsoft's VHD
  • Migrating from one NAS vendor to another
  • Running a J2EE workload across two middleware stacks, e.g. WebLogic and GlassFish
  • Restoring backup jobs from one vendor's product onto another's (even within the same vendor)

Fortunately, for the above examples you do have some technical options. For the SAN replication problem, for example, you can use a SAN virtualisation appliance such as a NetApp V-Series or an IBM SVC. However, you need one of these appliances at both the source and the target, which for starters is expensive; you may also face support issues with the underlying storage array vendor, plus various other potential issues, all to achieve what is merely copying data from A to B (maybe not that simplistic, but I'm a simple guy, remember).

To address cross-hypervisor migration, the server virtualisation industry has the Open Virtualization Format (OVF). As server virtualisation is more my bag, I will use this example for the rest of this post. With OVF the industry has got together to build a standard for the portability of virtual machines. Before you run off shouting EUREKA, bear in mind this isn't live migration: OVF only works for cold migrations, so moving a VM involves downtime, which is still a pain in the rear. However, it does mean that in theory you can move from one virtualisation vendor to another using OVF export/import.
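
For anyone who hasn't looked inside one, an OVF package is essentially an XML descriptor (.ovf) that references the virtual disk files shipped alongside it, which is what lets another vendor's tooling import it. The Python sketch below parses a heavily trimmed, made-up descriptor with the standard library. Real descriptors carry much more (virtual hardware, product info and so on); the namespace shown is the OVF 1.x envelope namespace, and `summarise` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

OVF_NS = "http://schemas.dmtf.org/ovf/envelope/1"

# A heavily trimmed, hypothetical descriptor for illustration only.
SAMPLE_OVF = f"""<?xml version="1.0"?>
<Envelope xmlns="{OVF_NS}" xmlns:ovf="{OVF_NS}">
  <References>
    <File ovf:id="file1" ovf:href="app-server-disk1.vmdk"/>
  </References>
  <DiskSection>
    <Info>Virtual disks</Info>
    <Disk ovf:diskId="vmdisk1" ovf:fileRef="file1" ovf:capacity="21474836480"/>
  </DiskSection>
  <VirtualSystem ovf:id="app-server">
    <Info>A single virtual machine</Info>
  </VirtualSystem>
</Envelope>"""


def _attr(name):
    """Fully qualified name for an ovf:-prefixed attribute."""
    return f"{{{OVF_NS}}}{name}"


def summarise(descriptor):
    """Return (virtual system ids, disk file names) from an OVF descriptor."""
    root = ET.fromstring(descriptor)
    ns = {"ovf": OVF_NS}
    files = {f.get(_attr("id")): f.get(_attr("href"))
             for f in root.findall("ovf:References/ovf:File", ns)}
    disks = [files[d.get(_attr("fileRef"))]
             for d in root.findall("ovf:DiskSection/ovf:Disk", ns)]
    systems = [vs.get(_attr("id")) for vs in root.findall("ovf:VirtualSystem", ns)]
    return systems, disks


print(summarise(SAMPLE_OVF))
```

Note that the referenced disk is still a VMDK: the descriptor makes the package portable, but the target platform still has to understand or convert the disk format, which is part of the overhead discussed below.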

Unfortunately, even with functionality that addresses common interop problems in the datacentre, such as replication between array vendors and migration between hypervisor vendors, we humans are never satisfied, and the most common groan I hear is that these solutions never provide the full confidence and functionality of the original abstraction layer. With OVF, for example, you can't migrate between hypervisors “à la” vMotion. They also add a large amount of overhead and support: you need someone to operationally support VMware, someone to operationally support the SAN virtualisation arrays, and so on. Additionally, they can end up costing more if you do not do a complete TCO analysis of the solution being acquired to address the problem in the first place.

This leads me to the conclusion, even with my limited knowledge of such topics, that unless a great deal of innovation and collaboration occurs, we are unlikely to see technology or initiatives that give us, the customer or end user, portability without this overhead penalty. Being the cynic I am, I am seriously starting to think that the solutions to these problems merely add a level of abstraction that pushes the problem higher up the stack for something else to be affected by or to deal with. Additionally, more layers mean, yes you guessed it, more support and TCO costs. So, based on my theory, let's look below at how much abstraction is required to enable OVF functionality, and how this compares to the abstraction in the application environment of yesterday:

As you can see, gaining OVF adds 2-3 more layers of abstraction for it to work and be fully exploited. Let's not forget the increase in software costs/renewals for the vendors to develop and support such features, nor the fact that we have underlying components which are also required to reach the goal.

Guilt, it's a nasty thing

To be fair on the industry, and VMware in particular, this portability makes tasks that were painful in the legacy world a breeze. Before, we faced the lengthy and costly process of moving the relevant workload onto another x86 server and re-installing the OS/app components, which needed far more planning. In actual fact it very rarely happened when physical tin reached EOL; the kit just sat in the datacentre rotting, and we had no OS refresh capability either.

My concern

So what am I getting at here? One minute I am whinging about abstraction, the next I am praising it. I guess abstraction is a bit like Marmite: you either love it or hate it. To summarise, my concern is this: the more I hear about new technology that solves another problem, the more I see a level of abstraction being introduced into the stack. To me this means more software purchase costs, more ongoing OPEX for that software, more layers of operational complexity, more concerns and arguments with my lovely ISVs over support statements, and an all-round higher TCO on products that already struggle to have a good TCO.

Moving forward, and being so young and naive, would it be unreasonable to hope that the industry vendors look to reduce this overall use of abstraction and tackle problems in a more practical way that doesn't leave us with a multitude of abstraction layers? Or, alternatively, is there any technology that addresses the above problems that I am not aware of but other bloggers are? (On the latter, this is not an invitation for vendors to tender, so forget spamming me if you are a vendor!)

Additionally, I am not looking to implement such technology (yet), but I do see this as a potential snowball growing in size as businesses demand more agile and flexible datacentre environments, so I am interested to know whether you think I am talking absolute rubbish or whether you agree…