Thursday, April 03, 2008

Steve Loughran on 'Farms, Fabrics and Clouds'

Yesterday my colleagues at RIS Technology and I had the pleasure of attending a remote presentation given by Steve Loughran, who works as a researcher at HP Labs and is also a committer on the Ant project. I had seen Steve's slides from a presentation he gave at the University of Bristol on 'Farms, Fabrics and Clouds' back in December 2007, and I have been pestering him via email ever since, hoping he would release a screencast. After much back and forth, Steve offered to simply present directly to us via Skype for now. He did it out of the goodness of his heart, but both he and I realized that there's a nice little business opportunity in this type of presentation: you release the slides with no audio, then you get hired to present them live to interested parties, remotely, via Skype and a shared set of slides, with a Q&A session at the end. Everybody wins in this scenario. Filing it in the 'ideas worth trying' category.

To come back to Steve's presentation -- here are the slides from a previous version. I hope he will soon post the updated version we saw yesterday, but the differences are not major. The co-author of the talk is Julio Guijarro. Their area of interest within HP Labs is the deployment of large applications across distributed resources and the management of these apps/resources with an eye to maximizing their output and minimizing their cost. A familiar (and hard) problem for everybody who works in the hosting industry.

Steve talked about how infrastructure architectures have changed over the years, from a single web server talking to a single database server, to clustering, and finally to server farms and computing-on-demand. The challenge for us 'server farmers' is to figure out a way to manage thousands of servers, heaps of storage, a myriad of network infrastructure devices, and large distributed applications on top of that -- all while keeping everything purring and happy, running at its maximum potential. Sounds impossible, but Amazon seems to be doing a decent job of it. In fact, Steve spent quite some time talking about how Amazon changed the game with their S3 and EC2 offerings. Even though these services are not quite ready for prime time in terms of production deployments, Amazon will soon get there. As proof, see their recent introduction of static IP addresses in EC2, and of the possibility of running your application in different data centers.

In my opinion, the best of Steve's slides are the 'Assumptions that are now invalid' ones. They really turn the 'established facts and best practices' of infrastructure and application design on their heads. Here are some examples of assumptions that no longer hold in this day and age:
  • it is expensive to create, deploy and duplicate a new system, running a Linux image of your choice (see Instalinux as a counter-example)
  • system failure is unusual and 100% availability can be achieved
  • databases are the best form of storage
  • you need physical access to the data center
  • a single server farm needs to scale to infinity
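
To put a rough number on the second assumption, here is a back-of-the-envelope sketch in Python. It is an illustration only, not taken from Steve's slides: the function names are made up, the 99.9% per-server figure is an arbitrary example, and the model assumes that servers fail independently, which real farms only approximate.

```python
def all_up_probability(per_server_availability: float, servers: int) -> float:
    """Chance that every server is up at a given moment.

    Assumes independent failures (a simplification).
    """
    return per_server_availability ** servers


def expected_servers_down(per_server_availability: float, servers: int) -> float:
    """Average number of servers that are down at any given moment."""
    return servers * (1.0 - per_server_availability)


if __name__ == "__main__":
    # Hypothetical farm: 1000 servers, each up 99.9% of the time.
    print(all_up_probability(0.999, 1000))
    print(expected_servers_down(0.999, 1000))
```

Under these assumptions, a farm of 1000 servers is fully healthy only about 37% of the time, and on average one server is down at any given moment. At that scale, failure is the normal case that the application has to be designed around.
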
My other favorite part, which is not in the online slides yet, is the concept of 'agile infrastructure'. I haven't seen this concept applied to server hosting before, but Steve has a great point here. If you look at something like Amazon EC2, you can pay as you go, you can test your application in a smaller environment and then scale it up, and you can move your application between data centers -- this is indeed an agile environment, and one that also imposes some new demands on your application.

I really recommend that you check out Steve's slides. There's a lot to chew on, but you can't afford not to chew on it, if you have anything to do with the IT industry these days.

Here are a couple more links that might prove useful:
  • Anubis: a tuple-space implementation that uses multicast to share information between hosts within a site
  • SmartFrog: a technology from HP used to distribute and manage applications (think Puppet, but geared towards application deployment); see also Google video
Thanks again to Steve for presenting to us. Now, as a server farmer, I need to go back to my plow and try to improve it (maybe buy a tractor?)

Update: Steve has some more thoughts on the Agile Infrastructure concept. Intriguing. This is something I'll definitely keep a very close eye on and tinker with.

1 comment:

Sargon Benjamin said...

Good stuff here. Frog from HP looks very impressive and I hope that test teams put a concentrated effort into setting up an execution and provisioning framework. It seems a bit heavy but it's also got a ton of features. I guess another option would be to build a lightweight scheduler and runner that sits on top of STAF to act as an execution framework. Keep up the blog - this is really good content!
