Scalyr updates: Tabular search results, log lines in alerts and more

Last month, we fixed a pile of bugs and added two of our most highly requested features. If you’re curious about which bugs we fixed, we list them in our weekly release notes at https://www.scalyr.com/releasenotes/.

Tabular Search Results

In addition to our standard side-scrolling log search results, we’ve added a new table view mode. If you’re searching through fairly heterogeneous logs, having just the field values shown in columns makes it much easier to see patterns and find things.

To try table mode, just bring up the Display Settings dialog and choose the table radio button under “Display log entries in…”. You can pick which columns to show from the list of parsed fields and from the system fields checkboxes.

Another nice benefit of using a table is that you can be more exact when filtering. In the screenshot above, I’ve used the mouse to select the ‘prune_rate’ value for the ‘path5’ field. If I click the FILTER FOR button, the query becomes “path5 contains ‘prune_rate’”, instead of just “prune_rate”.

Log Lines in Alert Emails

Usually it’s easier to spot problems when you can see a visual trend, especially when you’ve got obvious spikes in a timeline or dashboard graph. There are other times, though, when one glance at a terribly familiar log line can tell the tale even better.

Because raw log lines are so verbose, we’re not turning this feature on for everyone, but you can request it by dropping us an email at support@scalyr.com.

Obscured API Keys

For security or compliance purposes, you may be required to keep your API keys hidden from view in Scalyr. We’ve added the ability to show them only when you create them and keep them obfuscated on the client thereafter.

If you would like to turn this setting on, email us at support@scalyr.com.

Feedback

Got an idea, feedback or questions?  Email us at support@scalyr.com.

Containers: Benefits and Making a Business Case

Containers are hot stuff right now, so it’s natural that you’re here wondering what the business case and benefits of containers could be.

If this is you—if you’re looking to assess whether containers would make sense for your company—then you’re in just the right place, because by the end of this article, you’ll not only have a good understanding of what containers are and what they’re good (and not so good) at, but you’ll also have some decision-making criteria to help you decide whether they’ll work for you in your unique situation.

We’ve got quite a bit of ground to cover, so let’s get to it.

Read More

Where in the World is Scalyr?

As the summer ends, it’s time to get back into the swing of things for the Scalyr team. We’re going to be out and about from September through December, and we’d love to have you stop by to talk about how logs impact you, hear a few war stories (and maybe share a few of our own), and catch you up on our latest technology.

September kicks off with PagerDuty Summit in San Francisco, CA, September 10-12, 2018. PagerDuty is helping to define and drive the future of the practice of DevOps, and Scalyr is pleased to be sponsoring the Summit. We’ll be in the Colonia Room on the Mezzanine Level, and we want to meet all of you.

Scalyr will also be at DevOpsDays in Portland, OR, September 11-13, 2018. We’ve had the privilege of being part of a number of DevOpsDays around the United States and are fascinated by both the depth and breadth of knowledge the attendees share with us. Every show has been different and engaging, so please stop by and tell us about what you do.

Jenkins World kicks off on September 16, 2018, in San Francisco. Along with being on the show floor, Scalyr’s CEO and Founder, Steve Newman, will be speaking on what it takes to build operational visibility for modern design and scale. We’re in booth #201, so drop by and chat with us.

And we finish out the month with the Grace Hopper Celebration on September 26-28, 2018 in Houston, TX. GHC, produced by AnitaB.org, is the largest gathering of women technologists in the world. Please stop by booth #4951 to meet some of our skilled engineering team and learn more about our product, our technology, and our culture of diversity.

So, as you can see, September is a busy month for Scalyr. If you don’t have a chance to meet us at one of these events, keep watching this space, because more are coming up in 2018.

Why Are Engineers Getting DevOps Fatigue?

As an engineer, you already have enough responsibilities when developing software. Adding more tasks (say, DevOps-related ones) to your workday activities might not sound very appealing. With DevOps, not only are you responsible for producing working software, but now you also need to automate the building, testing, and deployment phases of the software. That’s a lot to take care of! But the extra work aside, maybe you’re just tired of the DevOps movement, and all the hype surrounding it is giving you DevOps fatigue.

As a former developer, I can identify with that feeling of fatigue. I’ve also seen some colleagues reach a certain level of frustration with DevOps. There are times when we make the mistake of taking on everything, even the releases. This is especially common if we’re perfectionists and don’t like to deliver software with bugs. We could even get to the point of releasing our code to production. (Although now that you’re “doing” DevOps, that might become your responsibility anyway.) After all, if we code it, we know the things that could go wrong and how to fix them if there are problems.

Even though now I’m more on the operations side of things—dealing with servers and helping companies implement DevOps—let me share some thoughts with you about why I think engineers are getting DevOps fatigue.

Read More

Orchestrating Microservices: A Guide for Architects

Well, microservices sure are getting a lot of attention these days. It’s almost uncool to like them, as they are so mainstream. When you have all these separate modules doing their own things, the question inevitably comes up: How do we stitch them together?

The answer? Very carefully. Here are some tips you can use if you find yourself in the position of needing to orchestrate your microservices.

Read More

Built for Speed: Custom Parser for Regex at Scale

At Scalyr, we’ve optimized our pipeline from data ingestion to query execution so our customers can search logs at terabytes per second. In this post, we’ll discuss a specific part of that pipeline: regular expression (regex) parsing during Bloom filter creation. Read on to learn how a custom-built regex parser let us keep the huge query latency reduction that Bloom filters enable, and how much speed we gained as a result.

A little background

Scalyr organizes incoming log data in a columnar format: each text line is split into multiple “columns” that can be queried independently. We also partition the data into multiple “epochs,” where each epoch represents a fixed time range. For example, a typical HTTP access log in our database is organized like this:
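As a rough sketch (the column names other than ClientAddress are invented for this illustration), a single five-minute epoch might hold something like:

Epoch 2018-09-10 10:00-10:05
  Timestamp     | ClientAddress | UriPath      | Status
  10:00:01.123  | 2.34.56.7     | /index.html  | 200
  10:00:02.456  | 9.87.65.4     | /api/query   | 204
  ...           | ...           | ...          | ...
Epoch 2018-09-10 10:05-10:10
  (same columns, next time range)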

The above structure enables us to quickly locate the column and epoch to execute the query. For example, if someone searches for “ClientAddress contains ‘2.34.56.7’” we simply run the string comparison against the “ClientAddress” column on the appropriate epochs. Of course, this means we must linearly scan this column across all epochs. Our architecture is designed for scans—which allows arbitrary queries from our customers—but we also have a few optimizations that let us significantly speed up common query patterns.

Bloom filters for speed

To optimize queries like the one above, we build Bloom filters to summarize the contents of individual columns (and BTW, we don’t do indexing). Let’s consider our example above, when a customer searches for “2.34.56.7” in the ClientAddress column. By defining a Bloom filter of all IP addresses in that column, we can skip scanning the actual data if “2.34.56.7” is not in the Bloom filter.
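As a minimal sketch of that skip-or-scan decision (not our actual implementation; it uses Guava’s BloomFilter purely for illustration, and the class name, method names, and sizing numbers are made up), the idea looks roughly like this. A miss from the filter means the value is definitely absent, so the epoch can be skipped; a hit may be a false positive, so we still fall back to the linear column scan.

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.List;

class EpochFilterSketch {
  // Built once per epoch: one filter summarizing the ClientAddress column.
  // The sizing numbers (expected insertions, false-positive rate) are illustrative.
  static BloomFilter<CharSequence> summarize(List<String> clientAddressColumn) {
    BloomFilter<CharSequence> filter =
        BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);
    clientAddressColumn.forEach(filter::put);
    return filter;
  }

  // At query time: false means "definitely not in this epoch," so skip the scan;
  // true means "possibly present," so run the normal column scan.
  static boolean mustScan(BloomFilter<CharSequence> filter, String queryTerm) {
    return filter.mightContain(queryTerm);
  }
}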

We don’t define Bloom filters for every possible query term—instead, we reserve this optimization for query terms that our customers use frequently. While the exact query terms and patterns vary by customer, certain types of strings (IP addresses, customer IDs, account IDs) figure often in their queries, and we can use Bloom filters to help optimize those. Each customer typically has fewer than 10 common patterns. We can add those matched patterns to one or more of the Bloom filters we create when we “freeze” the epoch—meaning that we convert the temporary representation used to hold incoming log data into a much more query-efficient form that is written to disk. We freeze the epochs only after we are confident all their data has been received.

Let’s use some real-life numbers to see the time and computing resources we can save with our example query. We generate a new epoch per account per shard every 5 minutes, and a single epoch on a shard can be as large as 2 Gb (gigabits), uncompressed. If a user queries “ClientAddress contains ‘2.34.56.7’” with a date range of 24 hours, then we need to read 288 epochs to search for that IP address. If each epoch averages 1.4 Gb uncompressed (72 Mb compressed), that’s 20 Gb of compressed data to be loaded from disk. Assuming a 1 Gbps read speed, that’s 20 seconds just to fetch the data, not even including the actual search time! But if we first fetch the Bloom filters for each epoch (which average 1 Mb compressed), and then fetch only the epochs that have hits, we end up reading only 2.9 Gb (288*1 Mb + 288*72 Mb / 8), assuming a one-eighth Bloom filter hit rate. That’s an 85% savings in latency, dropping from 20 seconds to 2.9 seconds. We’re focusing on the Bloom filter here, but there are many more tweaks to handle other cases.

The problem

This all sounded great (at least we thought so) until we noticed that our freezing process was taking longer than it was supposed to. At least twice as long, in fact. Some of the slowness came from the creation of the Bloom filter itself. Specifically, it was from the regex matching of common patterns against each log line. (For example, we used [0-9]\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9] to match IPv4 addresses.)
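For reference, the library-based version of that check looks roughly like the snippet below (a simplified sketch using the standard java.util.regex classes rather than our own regex library; the class and method names are illustrative):

import java.util.regex.Pattern;

class Ipv4RegexSketch {
  // The pattern quoted above, compiled once and reused for every log line.
  private static final Pattern IPV4 =
      Pattern.compile("[0-9]\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]");

  static boolean containsIpv4(CharSequence line) {
    return IPV4.matcher(line).find();
  }
}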

This kind of high-volume operation (at least once per log line) needs to be as fast as possible; otherwise, we can’t ingest customers’ logs in real time at large scale. Even though we’re using our own version of the regex library, which mainly focuses on speed (we’ll discuss this in an upcoming post), it’s just not fast enough for our needs. Now what?

Hand-coded parsers!

Regex libraries tend to convert the given pattern into a finite state machine (FSM), which is great. But when it comes down to the fastest possible matching of known patterns, you can’t beat a hand-coded FSM.

Here’s the code for our IPv4 parser, which is essentially an optimized finite state machine.

boolean find(CharSequence text) {
  // [0-9]\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]  => the original regex
  // 0000 1 2 333 4 555 6 7 0000              => value of "state" at each position when processing a string like "aaa 1.234.567.8 bbb"
  int state = 0, len = 0;
  for (int i = 0; i < text.length(); i++) {
    char c = text.charAt(i);
    if (c == '.') {
      switch (state) {
        case 1:
        case 3:
        case 5: state++; break;    // dot right after a digit run: move on to the next octet
        case 2:
        case 4:
        case 6: state = 0; break;  // dot where a digit was expected: start over
      }
    } else if (c >= '0' && c <= '9') {
      switch (state) {
        case 0:
        case 1: state = 1; break;              // digits before the first dot
        case 2:
        case 4: state++; len = 1; break;       // first digit of the second or third octet
        case 3:
        case 5: if (len == 3) state = 1; else len++; break;  // cap the middle octets at 3 digits
        case 6: return true;                   // digit after the third dot: we have a match
      }
    } else {
      state = 0;  // any other character resets the machine
    }
  }
  return false;
}
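As a quick, hypothetical usage sketch (not our production ingestion code; the sample line is made up), the matcher is simply called once per incoming log line:

// Hypothetical usage in the freeze path: check a line before doing any Bloom filter work.
String line = "GET /index.html from 2.34.56.7 -> 200";
if (find(line)) {
  // the line contains an IPv4-shaped token; extract it and add it to the epoch's Bloom filter
}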

Now, it’s time for benchmarking (drumroll please). We ran three different regex matchers (the standard Java regex library, the Scalyr regex library, and the hand-coded FSM) against the same inputs and recorded find() times in nanoseconds below.

Our own FSM proved to be 3–4 times faster! Thanks to these hand-coded FSMs, we’ve seen a substantial improvement in our ingestion pipeline, which brings more speed for our customers.

After all, that is what Scalyr is all about.

Microservices Logging Best Practices

Microservice architecture is an application structure that fosters the use of a loosely coupled system to allow you to develop, test, deploy, and release services independently of each other. These services are part of a single system, and the idea behind using microservices is to break a big problem into smaller problems. Usually, each service interacts with the others through an HTTP endpoint, hiding the details of its technology stack by exposing only a contract to its consumers. Service A will call Service B, which in turn calls Service C. Once the request chain is complete, Service A might be able to respond to the end customer that initiated the request.

Microservice architecture offers a lot of great benefits, like the ability to use different technology stacks, deploy independently, solve small problems one at a time, and more! But using microservices comes at a high cost: they are complex, not only in how they communicate but also in how you manage them. And they get even more complicated when one or more services fail. Which service failed? Why, and under what circumstances? All these questions are hard to answer if you don’t have good, meaningful logs.

And let’s be honest, we all hate those “unknown” or “something went wrong” system errors. I myself have struggled with the problems that come from a lousy logging strategy. Let me share a few best practices that have helped me when dealing with microservices.


Read More

Scalyr updates: Usability on alerts, graphs, Live Tail, and more

Last month, we focused on usability improvements in our Scalyr Platform releases. These include better alerts, tweaks to the graph legend, Live Tail improvements, scrolling in the search fields sidebar, and improved dates and times in search results.

This page shows the alerts and responses set up in Scalyr

Alerts

There are lots of usability goodies in this batch of alerts improvements. Amongst other changes, we made the display faster and filterable with a tabbed display that separates alerts by state (triggered, muted, etc.). This lets you zero in on stuff that matters and ignore the rest. And for those of you with lots of alerts, we’ve added infinite scrolling so that the page comes up much faster. Read more about alerts.

Graphs

We added a master checkbox on the legend so you can click “all” or “none” to get to the view you need. We also added a mouseover so you can see in the legend which plot you’re looking at. Read more about graphs.

Live Tail

We heard your feedback on Live Tail! It now flows smoothly, giving you continuous updates with better viewability and the ability to start and stop easily.

Dates and times

We separated the date and time displays in your search results and now give you absolute times relative to UTC.

Field sidebar

We made the field sidebar scrollable instead of paginated and gave you the ability to filter fields by name. Read more about the fields sidebar.

Going forward

We are developing Scalyr with the engineering front line in mind, and with a focus on our three value pillars: fast, simple, and shareable.

In the coming months, we will focus on observability for container and serverless environments; enhance your stakeholders’ experience; and improve report creation, scheduling, and distribution.

Feedback

Your product (or any) feedback is always welcome. Please reach out to us at support@scalyr.com.

Log4j2 Configuration: A Detailed Guide to Getting Started

We covered basic logging for Java applications a while back. In that tutorial, we used log4j version 2, a logging framework from the Apache project. Let’s go one step further with Java application logging and look at log4j2 configuration.

Log4j’s capabilities have made it one of Java’s most popular logging frameworks. It can be configured for multiple logging destinations and a variety of log file formats. Log messages can be filtered and directed at the individual class level, giving developers and operations personnel granular control over application messages.

Let’s examine these mechanisms by configuring log4j with a command-line Java application.
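As a taste of what that looks like (a minimal, hypothetical example with a made-up class name; the full walkthrough is in the article), the application side is just a logger obtained from LogManager, with levels and destinations driven by the log4j2 configuration file on the classpath:

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class HelloLog4j2 {
  // Logger named after the class; its behavior is controlled by log4j2.xml.
  private static final Logger logger = LogManager.getLogger(HelloLog4j2.class);

  public static void main(String[] args) {
    logger.info("Application started");
    logger.debug("Shown only if the configured level for this class allows DEBUG");
  }
}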


Read More

Get Started Quickly With Spring Boot Logging

Hot on the heels of the Get Started Quickly With Python Logging and Getting Started Quickly With C++ Logging articles, we’re headed across the coffee shop to look at logging in the context of the Java Spring Boot framework. While we’ve written already on the topic of Java logging in Get Started Quickly With Java Logging, the Spring Boot framework simplifies a lot of the plumbing involved in getting up and running. In this article we’ll learn how to:

  • Create a Spring Boot starter project
  • Use Gradle to build our application
  • Configure the default Spring Boot logger
  • Use Log4j2 with Spring Boot
  • Customize the logging configurations

Grab your Venti red-eye triple espresso with almond milk, and let’s get started!
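As a tiny taste of where the article heads (a minimal, hypothetical sketch; the class name is made up, and the SLF4J logger comes from the default spring-boot-starter-logging setup), logging in a freshly generated Spring Boot app looks something like this:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class DemoApplication {
  // SLF4J logger backed by Spring Boot's default logging configuration.
  private static final Logger log = LoggerFactory.getLogger(DemoApplication.class);

  public static void main(String[] args) {
    SpringApplication.run(DemoApplication.class, args);
    log.info("Spring Boot application started");
  }
}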

Read More