Even Stranger than Expected: a Systematic Look at EC2 I/O

At Scalyr, we’re building a large-scale storage system for timeseries and log data. To make good design decisions, we need hard data about EC2 I/O performance.

Plenty of data has been published on this topic, but we couldn’t really find the answers we needed. Most published data is specific to a particular application or EC2 configuration, or was collected from a small number of instances and hence is statistically suspect. (More on this below.)

Since the data we wanted wasn’t readily available, we decided to collect it ourselves. For the benefit of the community, we’re presenting our results here. These tests involved over 1000 EC2 instances, $1000 in AWS charges, and billions of I/O operations.


Introducing Scalyr Logs

Today we’re excited to announce a pair of new services from Scalyr:

Scalyr is a new approach to server monitoring and analysis. Traditionally, this has been treated as a series of special-case problems: timeseries/graphing, log search, external monitoring, dashboards, alerting, exception tracking, performance analysis, etc. In my career, I’ve had to juggle too many tools in an attempt to get a complete picture of a system’s behavior, and I’ve been frustrated by the disconnected, patchwork result. I’ve spent far too many hours trying to figure out which graph explains why my pager went off, or which logs might help me understand why an error graph just spiked, or taking random peeks into log files because I don’t have a tool that can analyze them in the way I need.
