In DevOps Incident Response, Plans Are Worthless, But Planning Is Everything

So said President Dwight D. “Ike” Eisenhower (more or less). His battles were fought in the trenches, not the technology stacks—but for DevOps teams, the principle holds. No plan survives contact with the enemy.

A plan is a set of instructions you can follow when you understand what needs to be done. That is handy when the enemy is one you know and can plan for. But the Enemy You Know isn’t what (literally) keeps DevOps engineers up at night.

Good DevOps teams respond competently to incidents, outages, and just plain weird stuff happening in their technology stack. Great teams go further: they see around corners, developing the instincts and skills to prepare for the unexpected. At Scalyr we start by reducing the risk of the dreaded 3 a.m. call as much as possible. But we’d be foolish to stop there.


So You’ve Been Paged: A Guide to Incident Response (For Those Who Hate Being Paged)

One of the inevitable joys of working in DevOps is “the page” — that dreaded notification from your alerting system that something has gone terribly wrong…and you’re the lucky person who gets to fix it.

Here at Scalyr, we’ve got a few decades of collective DevOps experience and we’ve all been on the receiving end of a page. Even though we do our best to avoid being woken up, it happens.

In this post, we’re going to put some of that experience to use and show you how to handle an incident the right way. You’ll learn not only how to fix the immediate problem, but how to grow from the experience and set your team up for smooth sailing in the future.

Ugh...Phones.
