Many people have asked for the source code behind our recent post on EC2 I/O performance. After some minimal cleanup, we have now posted the source code on GitHub: https://github.com/scalyr/iobench. We’ve also created a discussion group for this work: https://groups.google.com/forum/#!forum/scalyr-cloud-benchmarks.
At Scalyr, we’re building a large-scale storage system for timeseries and log data. To make good design decisions, we need hard data about EC2 I/O performance.
Plenty of data has been published on this topic, but we couldn’t really find the answers we needed. Most published data is specific to a particular application or EC2 configuration, or was collected from a small number of instances and hence is statistically suspect. (More on this below.)
Since the data we wanted wasn’t readily available, we decided to collect it ourselves. For the benefit of the community, we’re presenting our results here. These tests involved over 1000 EC2 instances, $1000 in AWS charges, and billions of I/O operations.
Today we’re excited to announce a pair of new services from Scalyr:
Scalyr is a new approach to server monitoring and analysis. Traditionally, this has been treated as a series of special-case problems: timeseries/graphing, log search, external monitoring, dashboards, alerting, exception tracking, performance analysis, etc. In my career, I’ve had to juggle too many tools in an attempt to get a complete picture of a system’s behavior, and I’ve been frustrated by the disconnected, patchwork result. I’ve spent far too many hours trying to figure out which graph explains why my pager went off, or which logs might help me understand why an error graph just spiked, or taking random peeks into log files because I don’t have a tool that can analyze them in the way I need.
On the Scalyr blog, we sometimes post on topics relating to cloud computing in general. This is such a post.
Microsoft’s Azure service suffered a widely publicized outage on February 28th / 29th. Microsoft recently published an excellent postmortem. For anyone trying to run a high-availability service, this incident can teach several important lessons.
37signals recently launched public “Uptime Reports” for their applications (announcement). The reaction on Hacker News was rather tepid, but I think it’s a positive development, and I applaud 37signals for stepping forward. Reliability of cloud applications is a real concern, and there’s not nearly enough hard data out there. Not all products are equally reliable; even within 37signals, the new reports show a 3:1 variation in downtime across apps.
Welcome to the Scalyr blog. Today we’re announcing our first service, Knobs.
What’s a Knob, you may ask? Or perhaps, what’s a Scalyr?
First, a little background. I’ve spent a good chunk of my career developing “in the cloud”. (Building Writely, for instance, aka Google Docs.) It can be an amazing experience. With the variety and sophistication of services available today, I sometimes feel like I’m programming with seven-league boots. One day, you wake up to find that thousands or millions of people are using your work.