Getting Started Quickly With PHP Logging

The previous articles in this series covered the basics of logging in C#, Java, Python, Ruby, Node.js, and JavaScript. In this post, I’ll show you how to use logging techniques in yet another very popular language: PHP.

I’ll open with a quick example of manual logging in PHP. Then we’ll revisit the details of why logging matters and what your logs should show. And lastly, I’ll show you how to set up and use the most popular PHP logging framework.

Let’s get started, then!


How to Create a Docker Image From a Container

In this article, I’ll provide step-by-step instructions on how to create a Docker container, modify its internal state, and then save the container as an image. This is really handy when you are working out how an image should be constructed, because you can just keep tweaking a running container until it works like you want it to. When you’re done, just save it as an image.

Okay, let’s jump right into it.


Containers: Benefits and Making a Business Case

Containers are hot stuff right now, so it’s natural that you’re here wondering what the business case and benefits of containers could be.

If this is you, and you’re looking to assess whether containers would make sense for your company, then you’re in just the right place. By the end of this article, you’ll not only have a good understanding of what containers are and what they’re good (and not so good) at, but you’ll also have some decision-making criteria to help you decide whether they’ll work for you in your unique situation.

We’ve got quite a bit of ground to cover, so let’s get to it.


Why Are Engineers Getting DevOps Fatigue?

As an engineer, you already have enough responsibilities when developing software. Adding more tasks (say, DevOps-related ones) to your workday might not sound very appealing. With DevOps, not only are you responsible for producing working software, but now you also need to automate the building, testing, and deployment phases of the software. That’s a lot to take care of! But the extra work aside, maybe you’re just tired of the DevOps movement, and all the hype surrounding it is causing DevOps fatigue.

As a former developer, I can identify with that feeling of fatigue. I’ve also seen some colleagues reach a certain level of frustration with DevOps. There are times when we make the mistake of taking on everything, even the releases. This is especially common if we’re perfectionists and don’t like to deliver software with bugs. We could even get to the point of releasing our code to production. (Although now that you’re “doing” DevOps, that might become your responsibility anyway.) After all, if we code it, we know the things that could go wrong and how to fix it if there are problems.

Even though now I’m more on the operations side of things—dealing with servers and helping companies implement DevOps—let me share some thoughts with you about why I think engineers are getting DevOps fatigue.


Orchestrating Microservices: A Guide for Architects

Well, microservices sure are getting a lot of attention these days. It’s almost uncool to like them, as they are so mainstream. When you have all these separate modules doing their own things, the question inevitably comes up: How do we stitch them together?

The answer? Very carefully. Here are some tips you can use if you find yourself in the position of needing to orchestrate your microservices.


Built for Speed: Custom Parser for Regex at Scale

At Scalyr, we’ve optimized our pipeline from data ingestion to query execution so our customers can search logs at TBs per second. In this post, we’ll discuss a specific part of that pipeline: regular expression (regex) parsing during Bloom filter creation. Read on to learn how we captured the huge query latency reduction enabled by Bloom filters with a custom-built regex parser, and how much speed we gained as a result.

A little background

Scalyr organizes incoming log data in a columnar format: each text line is split into multiple “columns” that can be queried independently. We also partition the data into multiple “epochs,” where each epoch represents a fixed time range. For example, a typical HTTP access log in our database is organized like this:

The above structure enables us to quickly locate the column and epoch to execute the query. For example, if someone searches for “ClientAddress contains ‘2.34.56.7’” we simply run the string comparison against the “ClientAddress” column on the appropriate epochs. Of course, this means we must linearly scan this column across all epochs. Our architecture is designed for scans—which allows arbitrary queries from our customers—but we also have a few optimizations that let us significantly speed up common query patterns.

Bloom filters for speed

To optimize queries like the one above, we build Bloom filters to summarize the contents of individual columns (and BTW, we don’t do indexing). Let’s consider our example above, when a customer searches for “2.34.56.7” in the ClientAddress column. By defining a Bloom filter of all IP addresses in that column, we can skip scanning the actual data if “2.34.56.7” is not in the Bloom filter.
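To make the skip logic concrete, here is a minimal Bloom filter sketch in Java. It is not Scalyr's implementation: the class name, the bit-array size, the number of hash functions, and the double-hashing scheme are all assumptions chosen for illustration. What it does show is the property the optimization relies on: a term that was added always reports a possible hit, so a "no" answer means the column data can be skipped safely.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Illustrative sketch only; sizes and hashing are assumptions, not Scalyr's code.
public class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    public BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Double hashing: the i-th bit position is (h1 + i * h2) mod size.
    private int index(String term, int i) {
        int h1 = 0x811C9DC5; // FNV-1a offset basis
        for (byte b : term.getBytes(StandardCharsets.UTF_8)) {
            h1 ^= (b & 0xff);
            h1 *= 16777619; // FNV-1a prime
        }
        int h2 = term.hashCode() | 1; // force an odd step
        return Math.floorMod(h1 + i * h2, size);
    }

    // Called while building the filter, once per matched term in the column.
    public void add(String term) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(term, i));
        }
    }

    // false means "definitely absent"; true means "possibly present".
    public boolean mightContain(String term) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(term, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1 << 16, 3);
        String[] clientAddresses = {"2.34.56.7", "10.0.0.1"};
        for (String ip : clientAddresses) {
            filter.add(ip);
        }
        // Only scan the column data when the filter reports a possible hit.
        if (filter.mightContain("2.34.56.7")) {
            System.out.println("possible hit: scan this epoch");
        }
    }
}
```

A filter can report a possible hit for a term that was never added (a false positive), which costs an unnecessary scan but never a wrong query result.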

We don’t define Bloom filters for every possible query term—instead, we reserve this optimization for query terms that our customers use frequently. While the exact query terms and patterns vary by customer, certain types of strings (IP addresses, customer IDs, account IDs) figure often in their queries, and we can use Bloom filters to help optimize those. Each customer typically has fewer than 10 common patterns. We can add those matched patterns to one or more of the Bloom filters we create when we “freeze” the epoch—meaning that we convert the temporary representation used to hold incoming log data into a much more query-efficient form that is written to disk. We freeze the epochs only after we are confident all their data has been received.

Let’s use some real-life numbers to see the time and computing resources we can save with our example query. We generate a new epoch per account per shard every 5 minutes, and a single epoch on a shard can be as large as 2 Gb (gigabits) (uncompressed). If a user queries “ClientAddress contains ‘2.34.56.7’” with a date range of 24 hours, then we need to read 288 epochs to search for that IP address. If each epoch averages 1.4 Gb (72 Mb compressed), that’s 20 Gb to be loaded from disk. Assuming a 1 Gbps read speed, that’s 20 seconds to just fetch the data, not even including the actual search time! But if we first fetch the Bloom filters for each epoch (which average 1 Mb compressed), and then fetch only the epochs that have hits, we end up reading only 2.9 Gb (288*1 Mb + 288*72 Mb / 8), assuming a one-eighth Bloom filter hit rate. That’s an 85% savings in latency, dropping from 20 seconds to 2.9. We’re focusing on the Bloom filter here, but there are many more tweaks to handle other cases.
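The arithmetic above can be reproduced with a few lines of Java. This is only a back-of-the-envelope sketch: the class and method names are made up for illustration, and the inputs are the figures quoted in the paragraph. Because the paragraph rounds 20,736 Mb down to 20 Gb, the unrounded savings come out slightly higher than the quoted 85%.

```java
// Back-of-the-envelope estimate; names are illustrative, inputs come from the text.
public class BloomSavingsEstimate {

    // Megabits read when every epoch in the range is fetched.
    static double fullScanMb(int epochs, double epochMb) {
        return epochs * epochMb;
    }

    // Megabits read when every Bloom filter is fetched first and only the
    // epochs with a filter hit are fetched afterwards.
    static double filteredScanMb(int epochs, double epochMb, double filterMb, double hitRate) {
        return epochs * filterMb + epochs * epochMb * hitRate;
    }

    public static void main(String[] args) {
        int epochs = 24 * 60 / 5;      // one epoch every 5 minutes over 24 hours
        double epochMb = 72.0;         // average compressed epoch size
        double filterMb = 1.0;         // average compressed Bloom filter size
        double hitRate = 1.0 / 8.0;    // assumed Bloom filter hit rate
        double readMbPerSecond = 1000; // 1 Gbps read speed

        double full = fullScanMb(epochs, epochMb);
        double filtered = filteredScanMb(epochs, epochMb, filterMb, hitRate);

        System.out.printf("full scan: %.0f Mb, %.1f s%n", full, full / readMbPerSecond);
        System.out.printf("with Bloom filters: %.0f Mb, %.1f s%n", filtered, filtered / readMbPerSecond);
        System.out.printf("savings: %.1f%%%n", 100 * (1 - filtered / full));
    }
}
```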

The problem

This all sounded great (at least we thought so) until we noticed that our freezing process was taking longer than it was supposed to. At least twice as long, in fact. Some of the slowness came from the creation of the Bloom filter itself. Specifically, it was from the regex matching of common patterns against each log line. (For example, we used [0-9]\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9] to match IPv4 addresses.)

This kind of high-volume operation (at least once per log line) needs to be as fast as possible; otherwise, we can’t ingest customers’ logs in real time at large scale. Even though we use our own version of the regex library, which focuses mainly on speed (we’ll discuss this in an upcoming post), it just wasn’t fast enough for our needs. Now what?

Hand-coded parsers!

Regex libraries tend to convert the given pattern into a finite state machine (FSM), which is great. But when it comes down to the absolute fastest matching for known patterns, you can’t beat a hand-coded FSM.

Here’s the code snippet of our IPv4 parser, which is essentially an optimized finite state machine.

boolean find(CharSequence text) {
  // [0-9]\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]  => the original regex
  // 0000 1 2 333 4 555 6 7 0000  => value of "state" in each position when processing string like "aaa 1.234.567.8 bbb"
  int state = 0, len = 0;
  for (int i = 0; i < text.length(); i++) {
    char c = text.charAt(i);
    if (c == '.') {
      switch (state) {
        case 1:
        case 3:
        case 5: state++; break;
        case 2:
        case 4:
        case 6: state = 0; break;
      }
    } else if (c >= '0' && c <= '9') {
      switch (state) {
        case 0:
        case 1: state = 1; break;
        case 2:
        case 4: state++; len = 1; break;
        case 3:
        case 5: if (len == 3) state = 1; else len++; break;
        case 6: return true;
      }
    } else {
      state = 0;
    }
  }
  return false;
}
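Before trusting a hand-coded FSM, it helps to check that it agrees with the regex it replaces. The harness below is a sketch of such a check, not the benchmark described in this post: the class name and sample strings are assumptions, the find() method is the same state machine made static, and the reference is the standard java.util.regex library.

```java
import java.util.regex.Pattern;

// Agreement check between the hand-coded FSM and the regex it replaces.
// Class name and sample inputs are illustrative assumptions.
public class Ipv4FsmCheck {
    // The regex quoted in the post, used here as the reference behavior.
    static final Pattern IPV4 =
        Pattern.compile("[0-9]\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]");

    // Same state machine as the snippet above, made static for the harness.
    static boolean find(CharSequence text) {
        int state = 0, len = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '.') {
                switch (state) {
                    case 1:
                    case 3:
                    case 5: state++; break;
                    case 2:
                    case 4:
                    case 6: state = 0; break;
                }
            } else if (c >= '0' && c <= '9') {
                switch (state) {
                    case 0:
                    case 1: state = 1; break;
                    case 2:
                    case 4: state++; len = 1; break;
                    case 3:
                    case 5: if (len == 3) state = 1; else len++; break;
                    case 6: return true;
                }
            } else {
                state = 0;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] samples = {
            "aaa 1.234.567.8 bbb",
            "12.34.56.78",
            "1.2.3",
            "1.2345.6.7",
            "no address here"
        };
        for (String s : samples) {
            boolean viaRegex = IPV4.matcher(s).find();
            boolean viaFsm = find(s);
            // Fail loudly if the two matchers ever disagree.
            if (viaRegex != viaFsm) {
                throw new IllegalStateException("Mismatch on: " + s);
            }
            System.out.println(s + " -> " + viaFsm);
        }
    }
}
```

A handful of samples is no proof of equivalence; feeding both matchers randomly generated strings of digits and dots would give more confidence.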

Now, it’s time for benchmarking (drumroll please). We ran three different regex matchers (the standard Java regex library, the Scalyr regex library, and the hand-coded FSM) against the same inputs and recorded the find() time in nanoseconds below.

Our own FSM proved to be 3–4 times faster! Thanks to these hand-coded FSMs, we’ve seen a substantial improvement in our ingestion pipeline, which brings more speed for our customers.

After all, that is what Scalyr is all about.

Scalyr updates: Usability on alerts, graphs, Live Tail, and more

Last month, we focused on usability improvements in our Scalyr Platform releases. These include better alerts, tweaks to the graph legend, Live Tail improvements, a scrollable search fields sidebar, and clearer times and dates in search results.

This page shows the alerts and responses set up in Scalyr

Alerts

There are lots of usability goodies in this batch of alerts improvements. Amongst other changes, we made the display faster and filterable with a tabbed display that separates alerts by state (triggered, muted, etc.). This lets you zero in on stuff that matters and ignore the rest. And for those of you with lots of alerts, we’ve added infinite scrolling so that the page comes up much faster. Read more about alerts.

Graphs

We added a master checkbox on the legend so you can click “all” or “none” to get to the view you need. We also added a mouseover so you can see on the legend which plot you’re looking at. Read more about graphs.

Live Tail

We heard your feedback on Live Tail! It now flows smoothly, giving you continuous updates with better viewability and the ability to start and stop easily.

Dates and times

We separated the date and time displays in your search results and now give you absolute times relative to UTC.

Field sidebar

We made the field sidebar scrollable instead of paginated and gave you the ability to filter fields by name. Read more about the fields sidebar.

Going forward

We are developing Scalyr with the engineering front line in mind, and with a focus on our three value pillars – fast, simple, and shareable.

In the coming months, we will focus on observability for container and serverless environments; enhance your stakeholders’ experience; and improve report creation, scheduling, and distribution.

Feedback

Your product (or any) feedback is always welcome. Please reach out to us at support@scalyr.com.

Log4j2 Configuration: A Detailed Guide to Getting Started

We covered basic logging for Java applications a while back. In that tutorial, we used log4j version 2, a logging framework from the Apache project. Let’s go one step further with Java application logging and look at log4j2 configuration.

Log4j’s capabilities have made it one of Java’s most popular logging frameworks. It can be configured for multiple logging destinations and a variety of log file formats. Log messages can be filtered and directed at the individual class level, giving developers and operations personnel granular control over application messages.

Let’s examine these mechanisms by configuring log4j with a command line Java application.


Get Started Quickly With Spring Boot Logging

Hot on the heels of the Get Started Quickly With Python Logging and Getting Started Quickly With C++ Logging articles, we’re headed across the coffee shop to look at logging in the context of the Java Spring Boot framework. While we’ve written already on the topic of Java logging in Get Started Quickly With Java Logging, the Spring Boot framework simplifies a lot of the plumbing involved in getting up and running. In this article we’ll learn how to:

  • Create a Spring Boot starter project
  • Use Gradle to build our application
  • Configure the default Spring Boot logger
  • Use Log4j2 with Spring Boot
  • Customize the logging configurations

Grab your Venti red-eye triple espresso with almond milk, and let’s get started!


Getting Started With the Rails Logger

Let’s continue our ongoing series on getting started with loggers for different languages and platforms. Back in March, we covered logging with Ruby; now it’s time to take a look at the platform most often associated with that language, Rails.

We’ll start with a simple application with scaffolding for CRUD operations on a single record. We’ll look at Rails’ default logging configuration and how to use logging in an application. Then we’ll look at how logging can be improved and why you might want to improve it.

This tutorial uses Ruby v2.5.1 and Rails 5.2.0. You’ll need to have them installed to follow along. These instructions will use the command line to create and configure the application and will not rely on a specific IDE or editor. We’ll let Rails use SQLite for the backend database.
