Tuesday, May 29, 2012

A sweep through my Instapaper for May 2012

Here are some of the presentations/blog posts/articles I read this month, as saved in my Instapaper account. Maybe you'll find something useful in there too.
I also want to give a shout-out here to Gareth Rushgrove, who publishes an email newsletter called 'Devops Weekly'. If you are working in this field, I highly recommend you subscribe to it, as it is always full of interesting links and summaries to articles and tools.

Monday, May 14, 2012

The correct way of using DynamoDB BatchWriteItem with boto

In my previous post I wrote about the advantages of using the BatchWriteItem functionality in DynamoDB. As it turns out, I was overly optimistic when I wrote my initial code: I only called the batch_write_item method of the layer2 module in boto once.

The problem with this approach is that many of the batched inserts can fail, and in practice this happens quite frequently, probably because of transient network errors. The correct approach is to inspect the response object returned by batch_write_item -- here is an example of such an object:

{'Responses': {'mytable': {'ConsumedCapacityUnits': 5.0}},
 'UnprocessedItems': {'mytable': [
{'PutRequest': {'Item': {'mykey': 'key1', 'myvalue': 'value1'}}},
{'PutRequest': {'Item': {'mykey': 'key2', 'myvalue': 'value2'}}},
{'PutRequest': {'Item': {'mykey': 'key3', 'myvalue': 'value3'}}}]}}

You need to look for the value corresponding to the 'UnprocessedItems' key. This value is a dictionary keyed by the name of the table you're inserting items in. The value corresponding to that key gives you a list of other dictionaries with keys corresponding to the operations you applied to the table ('PutRequest' in my case). Going one level deeper allows you to finally obtain the attributes (keys + values) of the items that failed, which you can then try to re-insert.

So basically you need to stay in a loop and keep calling batch_write_items until UnprocessedItems corresponds to an empty list. Here is a gist containing code that reads a log file in lzop format, looks for lines containing a key + white space + a value, then inserts items based on those key/value pairs into a DynamoDB table. I've been pretty happy with this approach.

Before I finish, I'd like to reiterate the gripe I have about the static nature of determining your Read and Write Throughput when dealing with DynamoDB. I understand that it makes life easier for AWS in terms of the capacity planning they have to do on their end to scale the table across multiple instances, but it's a black art when it comes to capacity planning you need to do as a user. You almost always end up overcommitting as a DynamoDB user, and it's hard to make sense sometimes of the capacity units you're consuming, especially when doing inserts of large volumes of data.

Using AWS CloudWatch Logs and AWS ElasticSearch for log aggregation and visualization

If you run your infrastructure in AWS, then you can use CloudWatch Logs and AWS ElasticSearch + Kibana for log aggregation/searching/visuali...