Top 4 Ways to Make Your Microservices Not Actually Microservices

Microservices have gained such a level of popularity these days that they’re often touted as “the way” to build software. That’s all well and good, except there are a lot of microservice anti-patterns flying around.

The vast majority of microservice systems I’ve seen aren’t really microservices at all. They’re separately deployable artifacts, sure. But they’re built in a way that causes developers more pain than it relieves.

So let’s dive a little more into this. First, we’ll talk about what a microservice was intended to be. Then we’ll get into some of these anti-patterns.


Microservices Vs. SOA: What’s the Difference?

What’s the difference between microservices and SOA? The two design paradigms have a lot in common.

That makes sense when you consider that microservices are an offshoot of the SOA movement. But there are essential differences between the two systems. Let’s take a look at the two different approaches to architecture and highlight where they differ.

Before we dive in, it’s important to note that neither architecture has a universally accepted definition. So, you could spend as much time debating the details of what microservices or SOA are as you could arguing their differences.


How to Merge Log Files

You have log files from two or more applications, and you need to see them together. Viewing the data together in proper sequence will make it easier to correlate events, and listing them side-by-side in windows or tabs isn’t cutting it.

You need to merge log files by timestamps.

But just merging them by timestamp isn’t the only thing you need. Many log files have entries with more than one line, and not all of those lines have timestamps on them.


API vs. Microservices: A Microservice Is More Than Just an API

When writing software, consider both the implementation and the architecture of the code. The software you write is most effective when written in a way that logically makes sense. In addition to being architecturally sound, software should also consider the interaction the user will have with it and the interface the user will experience.

Both the concept of an API and the concept of a microservice involve the structure and interactions of software. A microservice can be misconstrued as simply an endpoint to provide an API. But microservices have much more flexibility and capabilities than that. This article covers the differences between APIs and microservices and details some of the benefits a microservice can provide.

To get started, let’s define our terms.


Getting Started Quickly With PHP Logging

The previous articles in this series covered the basics of logging in C#, Java, Python, Ruby, Node.js, and JavaScript. In this post, I’ll show you how to use logging techniques in yet another very popular language: PHP.

I’ll open with a quick example of manual logging in PHP. Then we’ll revisit the details of why logging matters and what your logs should show. And lastly, I’ll show you how to set up and use the most popular PHP logging framework.

Let’s get started, then!


How to Create a Docker Image From a Container

In this article, I’ll provide step-by-step instructions on how to create a Docker container, modify its internal state, and then save the container as an image. This is really handy when you are working out how an image should be constructed, because you can just keep tweaking a running container until it works like you want it to. When you’re done, just save it as an image.

Okay, let’s jump right into it.


Containers: Benefits and Making a Business Case

Containers are hot stuff right now, so it’s natural that you’re here wondering what the business case and benefits of containers could be.

If this is you—if you’re looking to assess whether containers would make sense for your company—then you’re in just the right place. Because by the end of this article, you’ll not only have a good understanding of what containers are and what they’re good (and not so good) at, but you’ll also have some decision making criteria to help you decide whether they’ll work for you in your unique situation.

We’ve got quite a bit of ground to cover, so let’s get to it.


Why Are Engineers Getting DevOps Fatigue?

As an engineer, you already have enough responsibilities when developing software. Adding more tasks (say, DevOps-related ones) to your workday might not sound very appealing. With DevOps, not only are you responsible for producing working software, but now you also need to automate the building, testing, and deployment phases of the software. That’s a lot to take care of! But the extra work aside, maybe you’re just tired of the DevOps movement, and all the hype surrounding it is causing DevOps fatigue.

As a former developer, I can identify with that feeling of fatigue. I’ve also seen some colleagues reach a certain level of frustration with DevOps. There are times when we make the mistake of taking on everything, even the releases. This is especially common if we’re perfectionists and don’t like to deliver software with bugs. We could even get to the point of releasing our code to production. (Although now that you’re “doing” DevOps, that might become your responsibility anyway.) After all, if we code it, we know the things that could go wrong and how to fix them.

Even though now I’m more on the operations side of things—dealing with servers and helping companies implement DevOps—let me share some thoughts with you about why I think engineers are getting DevOps fatigue.


Orchestrating Microservices: A Guide for Architects

Well, microservices sure are getting a lot of attention these days. It’s almost uncool to like them, as they are so mainstream. When you have all these separate modules doing their own things, the question inevitably comes up: How do we stitch them together?

The answer? Very carefully. Here are some tips you can use if you find yourself in the position of needing to orchestrate your microservices.


Built for Speed: Custom Parser for Regex at Scale

At Scalyr, we’ve optimized our pipeline from data ingestion to query execution so our customers can search logs at TBs per second. In this post, we’ll discuss a specific part of that pipeline: regular expression (regex) parsing during Bloom filter creation. Read on to learn how we captured the huge query latency reduction enabled by Bloom filters with a custom-built regex parser, and how much speed we gained as a result.

A little background

Scalyr organizes incoming log data in a columnar format: each text line is split into multiple “columns” that can be queried independently. We also partition the data into multiple “epochs,” where each epoch represents a fixed time range. For example, a typical HTTP access log in our database is organized like this:

The above structure enables us to quickly locate the column and epoch to execute the query. For example, if someone searches for “ClientAddress contains ‘2.34.56.7’” we simply run the string comparison against the “ClientAddress” column on the appropriate epochs. Of course, this means we must linearly scan this column across all epochs. Our architecture is designed for scans—which allows arbitrary queries from our customers—but we also have a few optimizations that let us significantly speed up common query patterns.
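To make the linear scan concrete, here is a minimal sketch of a "contains" query over one column across several epochs. It models each epoch as a plain map from column name to values; the class name, method name, and sample data are hypothetical illustrations, not Scalyr's actual storage format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ColumnScanSketch {
    // Each epoch is modeled as: column name -> values in that column (an assumption for illustration).
    static List<String> scan(List<Map<String, List<String>>> epochs, String column, String needle) {
        List<String> hits = new ArrayList<>();
        for (Map<String, List<String>> epoch : epochs) {
            // Only the requested column is touched; other columns are never read.
            for (String value : epoch.getOrDefault(column, List.of())) {
                if (value.contains(needle)) {
                    hits.add(value);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> epochs = List.of(
            Map.of("ClientAddress", List.of("2.34.56.7", "10.0.0.1")),
            Map.of("ClientAddress", List.of("10.0.0.2")));
        System.out.println(scan(epochs, "ClientAddress", "2.34.56.7"));
    }
}
```

Every epoch in the time range is visited, which is exactly the cost the optimizations described next try to avoid.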

Bloom filters for speed

To optimize queries like the one above, we build Bloom filters to summarize the contents of individual columns (and BTW, we don’t do indexing). Let’s consider our example above, when a customer searches for “2.34.56.7” in the ClientAddress column. By defining a Bloom filter of all IP addresses in that column, we can skip scanning the actual data if “2.34.56.7” is not in the Bloom filter.
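For readers unfamiliar with the data structure, the following is a minimal Bloom filter sketch. The class name, the bit-array size, and the double-hashing scheme built on `String.hashCode()` are assumptions chosen for brevity; they are not taken from Scalyr's implementation.

```java
import java.util.BitSet;

final class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives the i-th bit index from two base hashes (double hashing; an illustrative choice).
    private int index(String term, int i) {
        int h1 = term.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so successive probes differ
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String term) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(term, i));
        }
    }

    // false means "definitely absent"; true means "possibly present" (false positives can occur).
    boolean mightContain(String term) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(term, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1 << 16, 3);
        filter.add("2.34.56.7");
        // When mightContain returns false for an epoch, that epoch's data never needs to be read.
        System.out.println(filter.mightContain("2.34.56.7"));
    }
}
```

The key property is that a Bloom filter never reports a stored term as absent, so skipping an epoch on a `false` answer cannot lose results.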

We don’t define Bloom filters for every possible query term—instead, we reserve this optimization for query terms that our customers use frequently. While the exact query terms and patterns vary by customer, certain types of strings (IP addresses, customer IDs, account IDs) figure often in their queries, and we can use Bloom filters to help optimize those. Each customer typically has fewer than 10 common patterns. We can add those matched patterns to one or more of the Bloom filters we create when we “freeze” the epoch—meaning that we convert the temporary representation used to hold incoming log data into a much more query-efficient form that is written to disk. We freeze the epochs only after we are confident all their data has been received.

Let’s use some real-life numbers to see the time and computing resources we can save with our example query. We generate a new epoch per account per shard every 5 minutes, and a single epoch on a shard can be as large as 2 Gb (gigabits, uncompressed). If a user queries “ClientAddress contains ‘2.34.56.7’” with a date range of 24 hours, then we need to read 288 epochs (12 per hour for 24 hours) to search for that IP address. If each epoch averages 1.4 Gb (72 Mb compressed), that’s roughly 20 Gb to be loaded from disk. Assuming a 1 Gbps read speed, that’s about 20 seconds just to fetch the data, not even including the actual search time! But if we first fetch the Bloom filters for each epoch (which average 1 Mb compressed), and then fetch only the epochs that have hits, we end up reading only about 2.9 Gb (288 * 1 Mb + 288 * 72 Mb / 8), assuming a one-eighth Bloom filter hit rate. That’s an 85% savings in latency, dropping from 20 seconds to 2.9 seconds. We’re focusing on the Bloom filter here, but there are many more tweaks to handle other cases.
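The arithmetic can be reproduced with a few lines of code. This is only a worked version of the estimate, using the figures quoted in the paragraph (288 epochs, 72 Mb per compressed epoch, 1 Mb per Bloom filter, a one-eighth hit rate, 1 Gbps reads); the class and method names are made up for the example.

```java
final class BloomSavingsEstimate {
    // Megabits read when every epoch in the range is fetched.
    static double fullScanMb(int epochs, double epochMb) {
        return epochs * epochMb;
    }

    // Megabits read when all Bloom filters are fetched first, then only the epochs that hit.
    static double filteredScanMb(int epochs, double epochMb, double filterMb, double hitRate) {
        return epochs * filterMb + epochs * epochMb * hitRate;
    }

    public static void main(String[] args) {
        int epochs = 24 * 60 / 5;   // one epoch every 5 minutes for 24 hours = 288
        double linkMbps = 1000;     // 1 Gbps read speed
        double full = fullScanMb(epochs, 72);
        double filtered = filteredScanMb(epochs, 72, 1, 1.0 / 8);
        System.out.printf("full: %.1f s, filtered: %.1f s, saved: %.0f%%%n",
            full / linkMbps, filtered / linkMbps, 100 * (1 - filtered / full));
    }
}
```

With unrounded inputs the totals come to 20,736 Mb and 2,880 Mb, a saving of about 86%, which agrees with the rounded figures of 20 Gb, 2.9 Gb, and 85%.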

The problem

This all sounded great (at least we thought so) until we noticed that our freezing process was taking longer than it was supposed to. At least twice as long, in fact. Some of the slowness came from the creation of the Bloom filter itself. Specifically, it was from the regex matching of common patterns against each log line. (For example, we used [0-9]\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9] to match IPv4 addresses.)
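As a point of reference, this is roughly what that per-line check looks like with the standard `java.util.regex` library. The class name and sample lines are hypothetical; only the pattern comes from the text, where the doubled backslashes are Java string escapes for a literal dot.

```java
import java.util.regex.Pattern;

final class RegexBaseline {
    // Compiled once and reused; compiling per line would add avoidable cost.
    static final Pattern IPV4_LIKE =
        Pattern.compile("[0-9]\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]");

    // Returns true when the line contains something shaped like an IPv4 address.
    static boolean find(CharSequence line) {
        return IPV4_LIKE.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(find("GET /index.html from 2.34.56.7"));
        System.out.println(find("version 1.2.3"));
    }
}
```

Note that the pattern is deliberately loose: it anchors on one digit of the first and last octets rather than validating each octet, which is enough to decide whether a line contains an address-shaped token.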

This kind of high-volume operation (at least once per log line) needs to be as fast as possible; otherwise, we can’t ingest customers’ logs in real time at large scale. Even though we were using our own version of the regex library, which mainly focuses on speed (we’ll discuss this in an upcoming post), it just wasn’t fast enough for our needs. Now what?

Hand-coded parsers!

Regex libraries tend to convert the given pattern into a finite state machine (FSM), which is great. But when it comes down to the absolute fastest matching for known patterns, you can’t beat a hand-coded FSM.

Here’s a code snippet from our IPv4 parser, which is essentially an optimized finite state machine.

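boolean find(CharSequence text) {
  // Original regex: [0-9]\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]
  // Value of "state" at each position when processing a string like "aaa 1.234.567.8 bbb":
  //   0000 1 2 333 4 555 6 7 0000
  // len counts the digits seen so far in the current middle octet (states 3 and 5).
  int state = 0, len = 0;
  for (int i = 0; i < text.length(); i++) {
    char c = text.charAt(i);
    if (c == '.') {
      switch (state) {
        case 1:
        case 3:
        case 5: state++; break;    // dot after digits: move on to the next octet
        case 2:
        case 4:
        case 6: state = 0; break;  // two dots in a row: start over
      }
    } else if (c >= '0' && c <= '9') {
      switch (state) {
        case 0:
        case 1: state = 1; break;                            // candidate last digit of the first octet
        case 2:
        case 4: state++; len = 1; break;                     // first digit of a middle octet
        case 3:
        case 5: if (len == 3) state = 1; else len++; break;  // a fourth digit restarts the match at this digit
        case 6: return true;                                 // digit after the third dot: match found
      }
    } else {
      state = 0;                   // any other character breaks the pattern
    }
  }
  return false;
}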

Now, it’s time for benchmarking (drumroll please). We ran three different regex matchers (the standard Java regex library, the Scalyr regex library, and the hand-coded FSM) against the same inputs and recorded find() time in nanoseconds below.
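A comparison of this kind can be approximated with a simple timing loop. The sketch below is not the harness used for the measurements in this post; it is a rough illustration built on `System.nanoTime()`, with a hypothetical class name and sample inputs, and a dedicated tool such as JMH would give more trustworthy numbers.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class FindTimingSketch {
    // Average nanoseconds per call of matcher.test over all inputs, after one warm-up pass.
    static double averageNanos(Predicate<CharSequence> matcher, List<String> inputs, int rounds) {
        long sink = 0; // accumulates results so the JIT cannot discard the calls
        for (String s : inputs) {
            if (matcher.test(s)) sink++; // warm-up pass, not timed
        }
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String s : inputs) {
                if (matcher.test(s)) sink++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) throw new IllegalStateException("unreachable; keeps sink live");
        return (double) elapsed / ((long) rounds * inputs.size());
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("[0-9]\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]");
        List<String> inputs = List.of("aaa 1.234.567.8 bbb", "no address here");
        // Any other matcher with the same shape, such as a hand-coded find(), can be passed the same way.
        System.out.println(averageNanos(s -> p.matcher(s).find(), inputs, 100_000));
    }
}
```

Running every matcher over identical inputs in the same process keeps the comparison fair, though absolute numbers will vary by machine and JVM.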

Our own FSM proved to be 3–4 times faster! Thanks to these hand-coded FSMs, we’ve seen a substantial improvement in our ingestion pipeline, which brings more speed for our customers.

After all, that is what Scalyr is all about.