What Does a Site Reliability Engineer Do?

Although site reliability engineering has been around for a while, it has only recently gained fame in general software circles. But there are still a lot of questions as to what a site reliability engineer (SRE) does. Much of what we know comes from the book Site Reliability Engineering from Google. And we’ll refer to that book a few times in this post.

SREs have been compared to operations groups, system admins, and more. But the comparison falls short of capturing their role in today’s software environment. They cover more responsibilities than operations. And though they usually have a background in system administration, they also bring software development skills to the role. SREs combine all these skills and ensure that complex distributed systems run smoothly.

So how do they do all this? Read further to find out how SREs accomplish this through the responsibilities they fulfill.


An In-Depth Guide to Nginx Metrics

In our guides Zen and the Art of System Monitoring and How to Monitor Nginx: The Essential Guide, we cover our monitoring philosophy. We also recommend a specific set of metrics to monitor and alerts to set for maximum Nginx happiness.

Here, we’d like to dive into the nitty-gritty of those essential Nginx metrics. We’ll discuss what exactly they mean and why they’re important. This will also serve as a primer for some of the (perhaps esoteric) terminology associated with web servers.

You can think of this as a companion to the official Nginx documentation and as an appendix to our Nginx monitoring guide.

For now, this guide covers only the metrics available via ngx_http_stub_status_module, plus those associated with the F/OSS version of Nginx. More comprehensive metrics are available via ngx_http_status_module. (This is included with the commercial version, Nginx Plus.)
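
To make those metrics concrete, here is a minimal sketch of turning the plain-text output of ngx_http_stub_status_module into named counters. The sample text follows the format shown in the module's documentation. The function name, the dictionary keys, and the derived `dropped` value are illustrative assumptions, not part of Nginx or of our agent.

```python
import re

# Sample text in the format documented for ngx_http_stub_status_module.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Return the stub_status counters as a dict of ints (hypothetical helper)."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The three cumulative counters sit alone on one line: accepts handled requests.
    totals = re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.MULTILINE)
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("unrecognized stub_status output")

    accepts, handled, requests = (int(g) for g in totals.groups())
    reading, writing, waiting = (int(g) for g in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
        # Derived value (an assumption of this sketch): accepted but not handled.
        "dropped": accepts - handled,
    }


metrics = parse_stub_status(SAMPLE)
print(metrics["active"], metrics["requests"], metrics["dropped"])
```

In practice the text would come from whatever URL you have mapped to the status handler. Note that accepts, handled, and requests are cumulative counters, so a monitor would typically track their change between two samples rather than the raw values.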

So roll up your sleeves, grab your Slanket or Snuggie, and let’s talk Nginx metrics.


Introducing Scalyr Labs

We’re excited to announce Scalyr Labs, a way you can try out new and experimental features before they are released to production. Labs lets us share new functionality with you more quickly and get feedback earlier in the development process. We want to give you the opportunity to be more actively involved in the direction of our product.

Features in Labs are far enough along that we expect them to work, but we would like your feedback to catch any missing functionality. In the unlikely event that they don’t work, they won’t cause any harm to your data or your account. You choose whether to use them and which ones to use, and you can easily turn Labs features on or off at any time.

The first features in Labs include some of the new features we mentioned in our Q4’18 product blog: Stack Trace Linking, Chart Annotations and Post Graph to Slack. We think all of these features will help improve collaboration among teams and streamline troubleshooting workflows. We’re eager to release these to production, so give them a try and let us know what you think!

Zen and the Art of System Monitoring

System monitoring is an essential but often-overlooked part of production software deployment. It’s as critical as security but rarely given the same attention.

By overlooked, we don’t necessarily mean ignored. Even novice operations folks know that monitoring is needed, and most environments do have some basic alarms in place, even if it’s just the sales department screaming “The website’s down!” That can be effective, but it’s perhaps not optimal.

There isn’t (yet) a standard methodology for monitoring, and there really ought to be.

So let’s change that.


Why Distributed Tracing Will Be So Important in 2019

As we round the bend into 2019, it’s worth thinking about where our industry is headed. There are many exciting and challenging developments ahead: blockchain scalability, functions as a service, databases as a service, and the list goes on. We’re also moving into an increasingly complex, distributed world. This means distributed tracing will become especially important.


A Guide to Container Lifecycle Management

Containers have changed the way we develop and maintain applications. One of the main promises of containers is that you’ll be able to ship software faster. But sometimes how this happens seems a bit obscure. If you want to understand the benefits of containers, you first need to know what container lifecycle management is. Once you understand that, it’s going to be easier to connect the dots; then the aha moment will come naturally.

In this guide, I’ll use the Docker container engine—that way it’s easier to understand the lifecycle management behind it. Commands might be different in other container engines, but the concept is still valid. I’ll start with the application development, and I’ll finish with how to ship application changes. A container’s lifecycle only takes minutes to complete and is a reusable process.


Introducing deeper support for Kubernetes, collaboration features and stack tracing for easier troubleshooting

As the move to the cloud and containers continues to make software delivery faster and easier, environments are getting more complex. At Scalyr, we believe that observability solutions need to help engineering and operations teams get access to all the data they need, fast. Along those lines, we are announcing new features to help teams support the latest container and orchestration technologies, improve collaboration, and streamline workflows for faster and easier issue identification and resolution.

Kubernetes Cluster-Level Logging

Our new Kubernetes cluster-level logging enables engineering teams to effectively monitor and troubleshoot Kubernetes environments by centralizing and visualizing logs by deployment. Regardless of source, Scalyr intelligently ingests, parses and organizes logs to give developers an application-level view for faster issue resolution in complex container environments.

As shown in the screenshot above, users are presented with log summaries by deployment rather than by individual container. The automatic grouping of logs pertaining to a deployment gives developers a more holistic view of each application or service that may be running on multiple containers and pods. Insight into individual pods and nodes is also available, but that level of detail is abstracted by default so developers can focus on their application right away.


Microservices Communication: How to Share Data Between Microservices

For some, the ideal picture of a modern application is a collection of microservices that stand alone. The design isolates each service with a unique set of messages and operations. They have a discrete code base, an independent release schedule, and no overlapping dependencies.

As far as I know, this type of system is rare, if it exists at all. It might seem ideal from an architectural perspective, but clients might not feel that way. There’s no guarantee that an application made up of independently developed services will share a cohesive API. Regardless of how you think about Microservices vs. SOA, services should share a standard grammar, and microservices communication is not always a design flaw.

The fact is, in most systems you need to share data to a certain degree. In an online store, billing and authentication services need user profile data. The order entry and portfolio services in an online trading system both need market data. Without some degree of sharing, you end up duplicating data and effort. This creates a risk of race conditions and data consistency issues.

At the same time, how do you share data without building a distributed monolith instead of microservices? What’s an effective and safe way to implement microservice communication? Let’s take a look at a few different mechanisms.

First, we’ll go over different sharing scenarios. Depending on how you use the data, you can share it via events, feeds, or request/response mechanisms. We’ll take a look at the implications of each scenario.

Then, we’ll cover several different mechanisms for microservice communication, along with an overview of how to use them.
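
As a rough illustration of two of those scenarios, the sketch below contrasts event-based sharing with request/response using plain in-process Python objects. Everything here is a hypothetical stand-in: the class names, the `profile.updated` topic, and the in-memory bus are assumptions made for the example, and a real system would put a message broker or a network call where the direct method calls are.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        # Deliver the event to every handler registered for the topic.
        for handler in self._subscribers[topic]:
            handler(payload)


class ProfileService:
    """Owns user profile data and announces every change as an event."""

    def __init__(self, bus):
        self._bus = bus
        self._profiles = {}

    def update_email(self, user_id, email):
        self._profiles[user_id] = {"email": email}
        self._bus.publish("profile.updated", {"user_id": user_id, "email": email})

    def get_profile(self, user_id):
        return self._profiles[user_id]


class BillingService:
    """Event consumer: keeps its own copy of the fields it needs."""

    def __init__(self, bus):
        self.emails = {}
        bus.subscribe("profile.updated", self._on_profile_updated)

    def _on_profile_updated(self, event):
        self.emails[event["user_id"]] = event["email"]


class AuthService:
    """Request/response consumer: asks the owning service on demand."""

    def __init__(self, profile_service):
        self._profiles = profile_service

    def email_for(self, user_id):
        return self._profiles.get_profile(user_id)["email"]


bus = EventBus()
profiles = ProfileService(bus)
billing = BillingService(bus)
auth = AuthService(profiles)

profiles.update_email("u1", "pat@example.com")
print(billing.emails["u1"])   # served from billing's local copy
print(auth.email_for("u1"))   # fetched from the profile service
```

The trade-off this sketch points at: the event consumer can keep working from its local copy when the owner is unavailable, but that copy may lag behind. The request/response consumer always sees the owner's current data, but depends on the owner being reachable.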


Top 4 Ways to Make Your Microservices Not Actually Microservices

Microservices have gained such a level of popularity these days that they’re often touted as “the way” to build software. That’s all well and good, except there are a lot of microservice anti-patterns flying around.

The vast majority of microservice systems I’ve seen aren’t really microservices at all. They’re separately deployable artifacts, sure. But they’re built in a way that, for developers, causes more problems than it solves.

So let’s dive a little more into this. First, we’ll talk about what a microservice was intended to be. Then we’ll get into some of these anti-patterns.


Microservices Vs. SOA: What’s the Difference?

What’s the difference between microservices and SOA? The two design paradigms have a lot in common.

That makes sense when you consider that microservices are an offshoot of the SOA movement. But there are essential differences between the two systems. Let’s take a look at the two different approaches to architecture and highlight where they differ.

Before we dive in, it’s important to note that neither architecture has a universally accepted definition. So, you could spend as much time debating the details of what microservices or SOA are as you could arguing their differences.
