A Systematic Look at EC2 I/O

At Scalyr, we’re building a large-scale storage system for time-series and log data. To make good design decisions, we need hard data about EC2 I/O performance.

Plenty of data has been published on this topic, but we couldn’t find the answers we needed. Most published data is specific to a particular application or EC2 configuration, or was collected from too few instances to be statistically reliable. (More on this below.)

Since the data we wanted wasn't readily available, we decided to collect it ourselves. For the benefit of the community, we're presenting our results here. These tests involved over 1000 EC2 instances, $1000 in AWS charges, and billions of I/O operations.

Transparency in Cloud Services

37signals recently launched public "Uptime Reports" for their applications (announcement). The reaction on Hacker News was rather tepid, but I think it's a positive development, and I applaud 37signals for stepping forward. Reliability of cloud applications is a real concern, and there's not nearly enough hard data out there. Not all products are equally reliable; even within 37signals, the new reports show a 3:1 variation in downtime across apps.