Computing Utilities
So, imagine this scenario: you have a rack full of servers that hum along at 20% of capacity, performing "mundane" tasks. Wouldn't it be nice to harness at least part of that remaining computing power for your batch calculations? Wouldn't it be nice to save some money by not having to expand that costly, fast cluster you use for your fluid-dynamic / atom-splitting / weather-forecasting computations? How about using the same programming model, job submission tools and management tools that you use for your main cluster on those lazy machines? How about having the same security and isolation?
Here's the idea, admittedly far-fetched (well, give us some time):
- Virtualize the "lazy" machines with physical-to-virtual tools, like System Center Virtual Machine Manager (https://www.microsoft.com/systemcenter/scvmm/default.mspx)
- Install a hypervisor on those machines. We are producing one for Windows Server 2008, in case you haven't noticed :-). Take a look at Arlindo's blog at https://blogs.technet.com/aralves/archive/2007/02/28/longhorn-hypervisor-demo.aspx for an idea of what is possible.
- Configure a partition (virtual machine) that does the mundane job. It has the same number of virtual processors as the real machine has physical ones (or as many as the hypervisor supports, whichever is smaller), but it is assigned only a portion of the memory and may consume up to 20% of CPU (or whatever is appropriate). System Center will help you with sizing.
- Configure a second partition with Compute Cluster Server and make it a member of your cluster. Again, this has as many virtual processors as the real machine (or up to the supported limit), but it is allowed to use up to the remaining computing power minus about 20% (so, 60% in our example). The 20% is a rule of thumb for the hypervisor overhead, not a scientific recommendation.
- If your workload requirements change, you can dynamically alter the CPU allocation to the partitions. For example, if you need 60% of capacity for the "mundane" tasks at certain times, you can schedule a script that pauses the cluster nodes, then changes the allocation. There is no need to switch machines off and on or to repurpose them.
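To make the arithmetic in the steps above concrete, here is a minimal sketch in Python. The function name `cluster_share` and its parameters are made up for illustration; they are not part of any hypervisor or Compute Cluster Server API. It only applies the rule of thumb from the list: whatever the "mundane" partition does not use, minus roughly 20% for hypervisor overhead, goes to the cluster partition.

```python
def cluster_share(mundane_pct, overhead_pct=20):
    """Return the CPU percentage left for the compute-cluster partition.

    mundane_pct  -- cap given to the "mundane" partition (hypothetical input)
    overhead_pct -- rule-of-thumb hypervisor overhead, 20% as in the text
    """
    if not 0 <= mundane_pct <= 100 or not 0 <= overhead_pct <= 100:
        raise ValueError("percentages must be between 0 and 100")
    remaining = 100 - mundane_pct - overhead_pct
    # Never report a negative share; the cluster partition simply gets nothing.
    return max(remaining, 0)


# The two situations described above: the normal 20% load and a 60% peak.
for mundane in (20, 60):
    print(f"mundane {mundane}% -> cluster {cluster_share(mundane)}%")
```

A scheduled script of the kind mentioned in the last step could compute the new cap this way before pausing the cluster node and applying the change. How the cap is actually applied depends on the hypervisor's management tools and is not shown here.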
What are the scenarios where this is suitable?
Virtualization does impose a performance penalty, even with hypervisors. Also, some hardware (e.g. InfiniBand) will not be virtualized. So, don't use this technique for real-time, massively parallel computations.
However, for parametric sweeps, especially in batch jobs, this recoups some valuable computing power with little management overhead. Having 1000 CPUs, even if they are in reality a time-share of real ones, may be a worthy addition to 100 dedicated ones.
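As a rough back-of-the-envelope check of that claim, the sketch below estimates how many "dedicated-equivalent" CPUs such a setup yields. It assumes, purely for illustration, that each time-shared CPU gives the cluster the 60% share from the earlier example and that a parametric sweep scales linearly with CPU time. The function name `effective_cpus` is hypothetical, and real results will depend on the workload.

```python
def effective_cpus(shared_cpus, cluster_share_pct, dedicated_cpus=0):
    """Estimate dedicated-equivalent CPUs for an embarrassingly parallel job.

    Assumes linear scaling and that every shared CPU offers the same share.
    """
    return dedicated_cpus + shared_cpus * cluster_share_pct / 100


# 1000 time-shared CPUs at a 60% share, added to 100 dedicated ones.
print(effective_cpus(1000, 60, dedicated_cpus=100))  # prints 700.0
```

Under these assumptions the time-shared machines contribute several times the capacity of the dedicated cluster, which is why the approach is attractive for batch sweeps even though each individual node is slower.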
So, worth a try? If you think so, drop me a note at gmarchet@microsoft.com