Built for Speed: Custom Parser for Regex at Scale

At Scalyr, we’ve optimized our pipeline from data ingestion to query execution so our customers can search logs at terabytes per second. In this post, we’ll discuss one specific part of that pipeline: regular expression (regex) parsing during Bloom filter creation. Read on to learn how a custom-built regex parser let us capture the large query-latency reduction that Bloom filters make possible, and how much speed we gained as a result.

A little background

Scalyr organizes incoming log data in a columnar format: each text line is split into multiple “columns” that can be queried independently. We also partition the data into multiple “epochs,” where each epoch represents a fixed time range. For example, a typical HTTP access log in our database is organized like this:
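
As a rough illustration of the columnar, epoch-partitioned idea, here is a minimal Python sketch. It is not Scalyr's actual schema or code: the column names, the simplified access-log pattern, and the one-hour `EPOCH_SECONDS` are all assumptions made for the example.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical epoch length: one hour. The real epoch size is not stated here.
EPOCH_SECONDS = 3600

# Simplified pattern for a Common Log Format line (an assumption for this sketch).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)


def parse_line(line):
    """Split one access-log line into named fields, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the bracketed timestamp into seconds since the Unix epoch.
    parsed = datetime.strptime(fields.pop("time"), "%d/%b/%Y:%H:%M:%S %z")
    fields["timestamp"] = int(parsed.timestamp())
    return fields


def build_columns(lines):
    """Group parsed lines by epoch, storing each field in its own column list."""
    epochs = defaultdict(lambda: defaultdict(list))
    for line in lines:
        fields = parse_line(line)
        if fields is None:
            continue  # skip lines that do not look like access-log entries
        epoch_id = fields["timestamp"] // EPOCH_SECONDS
        for name, value in fields.items():
            epochs[epoch_id][name].append(value)
    return epochs
```

With this layout, a query such as "all requests with status 404 in a given hour" only needs to read the `status` column of the matching epoch rather than every full text line.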
