Surprising Use Cases for Log Visualization

People commonly say that a picture is worth a thousand words.  So I wonder if log visualization is worth a thousand log entries.  The math equivalency might be a little hard to prove, but the idea is worth exploring.

You’re recording all sorts of information in your log files, but are you visualizing that information?

Do you have dashboards and graphs that help you picture production behavior?  Or does the information sit buried within digital mountains of arcane strings?  The proverbial needle in the haystack?  Does anyone who wants to use it need to engage in laborious searches?

If you’re not visualizing your logs, you’re missing out.  But I don’t necessarily want to make the case for visualization today.  Instead, let’s take a look at some use cases for log visualization that you might not have considered.

Prerequisites for Log Visualization

First things first, though.  Before I can take you through the use cases, you need to have a setup that allows log visualization.  Specifically, you need modern log file management, which includes the following things that concern us:

  • Log aggregation (gathering and storing log files).
  • Meaningful parsing of the data contained in the log files.
  • Powerful search capabilities.
  • Log visualization tools.

Don’t get me wrong.  You could implement all of this stuff yourself.  But then again, you could also implement your own source control and text editor.  Just because you can doesn’t mean you should.

If you want to get serious about visualizing log files without burning tons of time, you need tooling and infrastructure in place to help you.  Once you have that, though, let’s take a look at some of the things you might do.
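To make "meaningful parsing" a little more concrete, here is a minimal Python sketch of the step that sits between raw log lines and a chart. The log format, the field names, and the function names are all assumptions made for illustration; a real log management tool does this for you at much larger scale.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical log format: "2017-10-02T14:03:27Z ERROR payment timeout"
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")


def parse_line(line):
    """Return a dict of structured fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    try:
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None  # first token was not a timestamp in the assumed format
    return {"ts": ts, "level": match.group("level"), "msg": match.group("msg")}


def errors_per_minute(lines):
    """Count ERROR entries per minute: the series a dashboard graph would plot."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "ERROR":
            counts[entry["ts"].replace(second=0)] += 1
    return dict(sorted(counts.items()))


sample = [
    "2017-10-02T14:03:27Z ERROR payment timeout",
    "2017-10-02T14:03:41Z INFO request served",
    "2017-10-02T14:03:59Z ERROR payment timeout",
    "2017-10-02T14:04:10Z ERROR database unreachable",
    "not a log line",
]

for minute, count in errors_per_minute(sample).items():
    print(minute.isoformat(), count)
```

Once entries are structured like this, every chart is just a different grouping of the same fields.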


Creating an Audit Trail for Your Business

No matter what you do, there will be aspects of your job that you absolutely love.  And then you’ll have the things that you tolerate out of necessity.  I’m guessing that, for almost everyone reading, “audit trail” sounds like something that fits squarely into the “tolerate” bucket.

Even if you don’t know what it is, it probably sounds equal parts intimidating and boring.  The closest word association you’ll likely have with “audit” is that it’s what the IRS does to you when it simultaneously takes a fine-toothed comb to your life and demands more money from you.  And looking to avoid angering the IRS is probably not what you dreamed of on career day as a child.

But building and maintaining an audit trail for your business doesn’t have to be onerous.  Far from it.


What Is an Audit Trail, Anyway?

I’ve thrown the term around a few times, but let’s get a little more precise to set the stage for this post.  What is an audit trail?

To get a good working definition of “audit trail,” consider the definition of “audit.”

An official examination and verification of accounts and records, especially of financial accounts.

It has official overtones to it, and it involves taking a detailed look at relevant records.  So when you commission an audit, you ask someone to come in, on the record, and take a detailed look at what you’re doing.


An audit trail, then, is what you do to facilitate this activity.  You make sure to dutifully document and capture anything that an auditor might need.  What’s the reasoning for this?  Generally speaking, you do this to demonstrate that you operate with a high degree of transparency and that your activities are all ethical, responsible, and legal.

Take the aforementioned case of the IRS mandating an audit for you.  This will tend to go much better for you if you’ve made sure to create an audit trail: saving receipts, noting business expenses, keeping careful track of all income, etc.

 


Building a Sustainable Startup

Although Scalyr has been around since 2011, it feels like we are really just getting started.

So many tech startups come roaring out of the gate, pursuing growth at all costs. Sometimes this leads to spectacular success, but much more often it leads to burnout and retrenchment, if not outright failure. Many promising startups have failed due to early over-reach.

At Scalyr, we’re taking a different approach. We spent over three years with a small team, literally above my garage, taking our time to build the right product in the right way. Only after we’d built a differentiated product that our early users loved did we set out to grow the team and the business.

We have been absolutely blown away by the results:

  • Customer devotion: since we signed our first customer in mid-2013, we have not had a customer leave us for another solution. The sheer performance of our log management service has been a pronounced and sustained differentiator.
  • Word of mouth: once adopted, Scalyr tends to spread within an organization. We have had multiple instances of customers beginning at five-figure annual revenue and growing to seven figures.
  • Scalability: through multiple orders of magnitude of growth, we’ve been able to maintain the performance and functionality that makes Scalyr special.

We are building a real business with real customers, and I’m excited to share some of our recent progress.  

 

Origin story

Rewind to 2006. Amazon EC2 and S3 were still under wraps, Facebook was only available to .edu addresses… and Google had just acquired my startup, Writely – soon to be known as Google Docs. Pretty soon, I was leading a project to build a new storage infrastructure for applications such as Docs, Sheets, Drive, and Picasa. Google has a strong culture of internal tool development, and we soon found ourselves using 17 different operational visibility tools to maintain a reliable service. Seventeen! Together, they provided a lot of functionality, but juggling that many tools was a bit of a nightmare.

It was clear that there was a lot of room for improvement. Around the industry, it’s more common to see teams using four or five visibility tools, rather than 17. But everyone suffers from too many tools, too little insight, and too much time spent investigating issues. In 2011, after leaving Google, I started Scalyr to create a better solution. Our ultimate goal is to revolutionize operational visibility, making it easy to understand the behavior of modern, complex cloud stacks.

The first step on this journey is our log management service. Logs provide the most detailed view of server and application behavior and are a critical piece of the operational puzzle. But existing log management tools were so clunky and slow that people avoided using them. Most of these tools are built on traditional keyword indexes, a technology originally designed to search books. Rethinking the problem from first principles, we built a profoundly more efficient solution, providing blazing-fast search over terabyte-scale aggregated logs.

The early response was beyond what any of us thought possible. We raised $2M in seed funding in 2015 to jumpstart the business.

 

Building a Sustainable Business

We have a long way to go to fulfill our long-term vision. To accomplish that, we have to build a sustainable business with solid fundamentals. We’ve been thoughtful about our growth to date, building the team in proportion to revenue. In fact, we’ve been hovering around breakeven – some months even profitable – for the past several quarters.

Sustainable growth requires a delicate balance: be aggressive enough to seize opportunities, but conservative enough to maintain company culture and healthy finances. Combining growth with healthy, sustainable practices requires more than simply pacing yourself. It requires efficiency. With VC cash burning a hole in your pocket, it’s tempting to throw money at all your problems; but then you’re quickly looking for your next round, and you’ve fallen off the sustainable path. So we’re always looking for ways to become more efficient.

It helps tremendously that we have a sustainable product differentiation – performance – that’s linked to a fundamental technological advantage.

 

Powering the next stage

With solid revenue, delighted customers, and a clear path forward, we’re limited only by the speed at which we can execute. So I’m excited to share that we have raised a $20M Series A led by Shasta Ventures, with participation from Heroic Ventures, Susa Ventures, and Bloomberg Beta – bringing our total amount raised to $28M. 

 

By the way, we’re hiring 🙂

This is the 6th startup I’ve [co]founded. People sometimes ask which one is my favorite. Writely was certainly the splashiest. Spectre was pretty cool (and was the last time I got to hack assembly language). But I can say without hesitation, Scalyr is the most satisfying, rewarding project I’ve been lucky enough to work on. I truly believe this is going to be one of those companies where people will look back and say “I wish I’d been there when…” – well, right now is “when”.

There couldn’t be a more exciting time to join! We’re hiring across all teams – especially engineering, sales, marketing, and customer success. Take a look at our Careers Page and become part of the journey.

Log File Too Big — What Should I Do?

You have a problem. But you don’t just have an ordinary problem. You have one of the most frustrating kinds of problems in the technical world. In the most basic terms, you’re trying to open a log file that’s too big to open. But “log file too big” doesn’t fully capture the frustration or the problem.

You need something out of your log file, so you go to open it. Then you wait. And wait. And wait.

After some amount of time, your text editor just crashes. Hoping it’s a fluke, you try again, waiting 15 minutes before another crash. So you’re half an hour in and not only have you not solved your actual problem — you haven’t even successfully taken what should be the simplest imaginable step toward solving it.

This combination of a long feedback loop with a non-deterministic outcome is what makes this so maddeningly frustrating. But fear not. Let’s take a look at how you can solve this, starting with the quickest and most superficial route and working toward root cause analysis.

Pick a Different Tool at Your Disposal

If opening this log file is crashing an editor, like, say, Notepad, then your easiest step is to use a different editor. At least that way you can know that fate will reward your waiting with an opened file rather than with a crash.

Your path of least resistance here is to use something you already have installed. So consider the following utilities for each operating system.

  • For Windows, you can use WordPad. If you have enough memory to cover the size of the file you want to edit, WordPad will load it. So these days, that’s quite likely to apply to files even topping a gig in size.
  • For Mac, use Vim. It should be able to handle as big a file as you have memory, and with good search besides.
  • There are a lot of different flavors of Linux out there, so it’s a little harder to talk about default installations. But if you have it, you can also use Vim here. If not, you can install it easily. You can also use tail -n X at the command line, where X is the number of lines you’d like to see from the end of the file.

That should at least get you started. You should be able to see your file without needing to wait for something to maybe or maybe not crash.
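If you'd rather script your way around the problem, the same idea behind tail works in a few lines of Python. This is a minimal sketch, not a replacement for a proper tool: the function names are my own, and both functions still read through the whole file once, so they save memory rather than time. Only a handful of lines is held in memory at any moment.

```python
from collections import deque


def tail(path, n=10):
    """Return the last n lines of a file while keeping only n lines in memory."""
    with open(path, "r", errors="replace") as f:
        # A bounded deque discards older lines as newer ones arrive.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


def grep(path, needle):
    """Yield (line_number, line) for each line containing needle, one line at a time."""
    with open(path, "r", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if needle in line:
                yield number, line.rstrip("\n")
```

Calling tail("app.log", 50) or looping over grep("app.log", "ERROR") (with app.log standing in for your own file) gets you to the interesting part of the log without ever opening it in an editor.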

Download and Use a Text Editor Meant for This

If you have a little more patience, you should ask yourself whether your current need is a one-time-only situation or if you’ll be viewing and editing a lot of large files. If the latter, you’ll want to get more deliberate about the tools in your toolbox. I might suggest this even if you think this is a one-time need. Familiarizing yourself with a new, powerful text editor can’t hurt anything.

The number of text editors available to you is FAR too large to enumerate here. But Wikipedia has an extensive page on them, including specific information about file size.

If the problem solved by opening your large file isn’t too pressing, you could always engage in some yak-shaving. But you should probably solve that problem first, using a tool at your disposal. Then come back and spend some time evaluating your text editor options. Find one that can open large files and that has other features you like besides. I’d say even try out a few of them.

If large files figure to be part of your life going forward, you should have a plan of attack for them.


How You Can Shorten the Defect Life Cycle

Ah, the software defect. It’s the bane of our collective existence, and it also seems unavoidable. Okay, frankly, it probably is unavoidable, for all intents and purposes. But that doesn’t mean we’re powerless to do anything about it. We can chip away at its impact by reducing its severity and by shortening the defect life cycle.

What Is the Defect Life Cycle?

Muting the impact of defects is a self-explanatory endeavor, but what do I mean by “defect life cycle”? Well, first, consider the word “cycle.” This borrows from the lean idea of cycle time, which is fancy Six-Sigma speak for “How long does it take from start to finish?”

Okay, so why not just call it “defect lifetime”? I suppose I could have used that term. But it omits a subtle yet crucial consideration. A defect in our software moves through a series of phases and steps as people work to correct it.

“Lifetime” makes it sound like fate gives birth to the defect, and then it simply exists until it naturally expires. But that’s not at all what happens. Instead, the team collaborates methodically to track down the defect, assess it, address it, and roll out a fix of some kind. So we think about a defect life cycle rather than a defect just living its life quietly out in the country somewhere.

But terminology and philosophy aside, how do you shorten the defect life cycle?

Production defects tend to generate stress and keep people up at night. From the moment a user reports it until the moment someone resolves the problem, tensions run higher. Let’s take a look at how to reduce the length of that tense time.


Zalando Engineering Team Standardizes on Scalyr for Log Management   

Overview 

Zalando, Europe’s leading online fashion platform, made the transition to the cloud two years ago. As part of the move to AWS, they were looking for a log management tool that was flexible enough to fit their agile engineering culture, powerful enough to scale, and fast enough to allow them to investigate incidents. After evaluating several solutions, they standardized on Scalyr as their log management solution across their entire engineering team.

About Zalando

Zalando is Europe’s leading online fashion platform for women, men and children. They offer their customers a one-stop, convenient shopping experience with an extensive selection of fashion articles including shoes, apparel, and accessories, with free delivery and returns. Their assortment of almost 2,000 international brands ranges from popular global brands to fast fashion and local brands, and is complemented by their private label products. Their localized offering addresses the distinct preferences of their customers in each of the 15 European markets they serve: Austria, Belgium, Denmark, Finland, France, Germany, Italy, Luxembourg, the Netherlands, Norway, Spain, Sweden, Switzerland, Poland and the United Kingdom.

Customer Challenges

Zalando transitioned to the cloud two years ago. They went from a monolith code base to microservices in the cloud, which changed their log management needs. They evaluated Scalyr along with three other solutions.

Their evaluation criteria required:

  • An agent that can collect all the logs on every service
  • A UI where engineers can search logs
  • The ability to search specific applications
  • The ability to see every single log in the UI
  • The ability to scale
  • A fit with the engineering culture of Radical Agility

After evaluating the four solutions, they narrowed it down to two to let the teams decide. They liked that with Scalyr it was easy to implement the agent and roll it out onto EC2 instances. They were able to define custom parsers for log lines.

The engineering culture at Zalando is built on Radical Agility. In order to empower their teams with autonomy, they need to automate everything around how they provision machines. This includes giving people the tools they need to do everything in a compliant way in their accounts. They found that the custom parsers were particularly important in giving each team flexibility to do things in their own way, which is a key pillar of the success of the engineering team.

Results of Using Scalyr

Scalyr is now deployed across the entire engineering team at Zalando. The main ways the team uses Scalyr are:

  • Respond to incidents and incident mitigation
  • Analysis of what’s happening on the service
  • Metrics for monitoring
  • Proactive investigations

They were able to get Scalyr up and running very fast. Once it was set up, their teams had access to their logs. They didn’t need to configure the agent and were able to instantly see their logs.

Given the number of autonomous services Zalando runs, they needed a coherent solution for how to get to the logs.

When asked how Scalyr has helped them, Tim Kröger, Head of Engineering – Visibility, and Andreas Pfeiffer, Cloud and Network Architect, responded, “It feels like asking how breathing helped you with your life.”

Before Scalyr, when an application crashed, the developer had to go to the log server, grab all the logs and find the host where the app was running. This would take at least 10 minutes. With Scalyr, developers can now deploy an application, get issues on the error, see the logs immediately, log into Scalyr, give the app ID and see all the logs from the deployment. They were able to go from 10 minutes of work to 13 seconds (which includes logging into Scalyr!).

Overall, Scalyr has helped Zalando make the transition to the cloud and mitigated the risk of increasing errors while moving to AWS.

Calculating the ROI of Log Analysis Tools

Anyone in a technology organization can relate to a certain frustration. You know that adopting a certain tool or practice would help you. So you charge forward with the initiative, looking for approval. But then someone, most likely a superior, asks you to justify it. “Give me the business case for it,” they say. And then, a little flummoxed, you feel the gears start turning in your head. Today, I’d like to talk about that very issue in the specific context of log analysis tools.

If you have significant operations of any kind in production, you’re almost certainly generating logs. If not, you should be. You’re also probably monitoring those logs, in some fashion or another. And if you’re consuming them, you’re analyzing them in some fashion or another. But maybe you’re doing this manually, and you’d rather use a tool for log analysis. How do you justify adopting that tool?  How do you justify paying for it?

ROI: The Basic Idea

To do this, you have to veer into the world of business and entrepreneurship for a moment. But don’t worry — you’re not veering too far into that world. Just far enough to acquire a skill that any technologist ought to have.

I’m talking about understanding the idea of return on investment (ROI). There’s a formal formula for it, but the idea is really dead simple. If you’re going to pay for something, will that something bring you as much or more value than what you paid? When the answer is “yes,” then you have a justifiable decision. If the answer is “no,” then you can’t make a good case for the investment.

So, for log analysis tools, the question becomes a pretty straightforward one. Will your group realize enough cost savings or additional revenue generation from the tool to justify its cost?
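For the record, the usual way to put a number on that question is ROI = (benefit - cost) / cost. Here is a minimal Python sketch of it. The dollar figures are invented purely for illustration and are not drawn from any real tool or customer.

```python
def roi(benefit, cost):
    """Return on investment as a fraction of cost: (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost


# Hypothetical annual figures, for illustration only.
annual_cost = 12_000.0      # subscription plus staff time
annual_benefit = 18_000.0   # projected savings and added revenue

print(f"ROI: {roi(annual_benefit, annual_cost):.0%}")  # prints "ROI: 50%"
```

A result of zero means the tool exactly pays for itself. Anything above zero is the return you are arguing for, and anything below it is a case you can't make.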

Employing Back-of-the-Napkin Math

When you’re asked to justify purchasing a tool, you might wonder how much rigor you must bring to bear. People working with technology tend to have an appreciation for objective, empirical data.

When making a business case, if you can back it with objective, empirical data, that’s great. You should absolutely do so. But that’s often hard because it involves making projections and generally reasoning about the future. We humans like to believe we’re good at this, but if that were true, we’d all be rich from playing the stock market.

So you need to make some assumptions and build your case on the back of those assumptions. People sometimes refer to that as “back-of-the-napkin math” and it’s a perfectly fine way to build a business case, provided you highlight the assumptions that you make.

For instance, let’s say that I wanted to spend $50 on a text editor. I might project that its feature set would save me 20 minutes per day of brainless typing. I’d highlight that assumption and say that, if true, the investment would pay off after less than a week, given my current salary. These are the sorts of arguments that bosses and business folks appreciate.
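Here's that napkin written out as a few lines of Python, with the assumptions in plain sight. The $50 price and the 20 minutes per day come from the example above. The $60 hourly rate is my own placeholder, since the example only says "my current salary," so swap in your own number.

```python
def payback_days(price, minutes_saved_per_day, hourly_rate):
    """Working days until the time saved is worth as much as the purchase price."""
    daily_saving = minutes_saved_per_day * hourly_rate / 60
    return price / daily_saving


# Assumptions, stated up front so a reviewer can challenge them:
price = 50.0            # cost of the text editor (from the example above)
minutes_saved = 20      # projected minutes saved per working day (from the example)
hourly_rate = 60.0      # hypothetical loaded hourly cost of my time

print(payback_days(price, minutes_saved, hourly_rate))  # prints 2.5
```

At that rate, the editor pays for itself in two and a half working days. Halve the hourly rate and it still pays off in five, which is the kind of sensitivity check a skeptical boss will appreciate.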

First, the Cost of Log Analysis Tools

To make a business case and a credible projection of ROI, you need two projected pieces of data: the cost (i.e., the amount of the investment you’re looking for a return on) and the savings or revenue benefit. I’ll dedicate the rest of this post to talking about how log analysis tools can save companies money or even add to their bottom line. But first, let’s take a look at their costs.

The most obvious cost is the sticker price of the tool. That might be an initial lump sum, but in this day and age, it’s usually going to be a recurring monthly subscription cost. So when making your case, be sure to take that into account.

There’s also a second, subtler cost that you should prepare yourself to address. Installing, learning, and managing the tools takes time from someone in the IT organization. You can (and should) argue that it winds up saving time in the end, but you also must acknowledge that investing employee time (and thus salary) is required.

Once you have those costs established, you can start to reason about the benefits.


Scalyr October Product Updates

The engineering team has been hard at work on new features and updates over the last few weeks. We are excited to share these changes with you and would love to hear your feedback.

Agent

  • By popular demand, we’ve raised the limit on log messages to 10,000 bytes (from 3,500). If you’re using the Scalyr Agent, please upgrade to version 2.0.30 (available by the end of next week) to take advantage of this change: https://www.scalyr.com/help/scalyr-agent#upgrades.

  • Amazon EC2 Container Service (ECS) support in the Scalyr Agent.
  • The Scalyr Agent can now rename log files before uploading them to Scalyr.
  • Improved support for Kubernetes logs. You can now configure the Scalyr Agent to parse the JSON records generated by Kubernetes and extract the original log text.
  • The Scalyr Agent can now redact sensitive data (e.g. user email addresses) by replacing it with a hash, allowing you to see patterns without revealing the raw data.
  • The Scalyr Agent’s URL monitor plugin can now generate POST and PUT requests, and can specify HTTP headers and a request body.
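To illustrate the idea behind hash-based redaction, here is a minimal Python sketch. It is not how the Scalyr Agent implements the feature, and it does not use the agent's configuration syntax. The regular expression, the redact_emails name, the "email:" prefix, and the optional salt are all assumptions made for this example.

```python
import hashlib
import re

# Deliberately simple pattern for illustration; real email matching is messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(line, salt=""):
    """Replace each email address in a log line with a short, stable hash token."""

    def _hash(match):
        raw = (salt + match.group(0)).encode("utf-8")
        return "email:" + hashlib.sha256(raw).hexdigest()[:12]

    return EMAIL_RE.sub(_hash, line)


first = redact_emails("login ok user=alice@example.com")
second = redact_emails("password reset requested for alice@example.com")
print(first)
print(second)
```

Because the same address always maps to the same token, you can still see that both events belong to one user without the address itself appearing in the logs. An unsalted hash of a guessable value can be reversed by hashing candidate addresses, which is why the sketch accepts a salt.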

 

Integrations

 

 

Improved Search and Team Settings

  • Improved bookmark and link-sharing support for multiple teams. Search URLs now record the team being viewed. When you (or another user) later open the URL, if you’re not linked to the correct team, you’ll be prompted to switch.

  • When you customize the fields shown in the log view, your settings will now persist until you change them. (If you work with multiple teams, your settings are saved separately for each team.)
  • You can now customize the default search time period (normally 4 hours). See “Set Default Search Time Span” on the Tips & Tricks page.

 

Simplified Pricing

  • We are making some changes to our pricing to provide additional flexibility and to make the full Scalyr feature set available to everyone. You can view the new options at scalyr.com/pricing. All affected customers have received an email with more details. These changes will go live November 1st.

OkCupid Falls For Scalyr

OkCupid has a long history of building things in-house. They have always been a deeply technical company that doesn’t settle for what’s already available and isn’t afraid to build something new. In fact, OkCupid doesn’t use Apache or Nginx — they built their own web server. So it isn’t surprising that when it came time to reconsider how they manage logs, they considered both existing solutions and the possibility of building something in-house. In the end, they found that Scalyr both met their log management needs for today and has the potential to grow as a solution for them to solve a number of other challenges they face. I recently had a chance to catch up with Alex Dumitriu, CIO at OkCupid, about their early experiences with Scalyr.


Wistia’s Engineering and Customer Support Teams Solve Customer Issues Faster With Scalyr

I recently caught up with Ryan Artecona, Infrastructure Tech Lead at Wistia, to learn more about how their engineering and customer support teams use Scalyr. 

Last summer, the engineering team at Wistia took a step back from their day to day to evaluate the tools and infrastructure they were using. The team realized they needed a log management solution. After evaluating several products, they decided to move forward with Scalyr. Scalyr enabled the engineering team to have more visibility into operations. They were able to identify issues faster and as a result, have found they have fewer disgruntled customers.

About Wistia

Wistia is a professional video hosting and analytics platform designed to help businesses communicate more creatively. Founded in Cambridge, Massachusetts in 2006, Wistia offers businesses the resources to host, organize, customize and measure the impact of video. In addition to video hosting, analytics, and marketing tools, Wistia has a library of educational resources to help you learn the ins-and-outs of creating great video content.

Customer Challenges

Last summer, the engineering team at Wistia took a step back from their normal feature releases to assess their infrastructure. At the time, they had no log aggregation or observability tools beyond New Relic. They had log files, but the only way engineers could use them was to ssh into a specific server and hope the log they were looking for was there. They hadn’t used a log aggregator, but they knew such tools existed and wanted to find the right one.

The team evaluated Scalyr along with Splunk, SumoLogic, PaperTrail, and LogDNA. The criteria for their evaluation included:

  • Support for live streaming
  • Speed of queries
  • Ability to add indexes to logs in a flexible way
  • Reasonably priced solution

After evaluating all the products, the team decided on Scalyr because it matched their evaluation criteria the best. Some of the others didn’t support live streaming, others were too slow and inflexible and others were too expensive. In the evaluation, the team found Scalyr was easy to use, had powerful capabilities, was fast and was the most competitively priced.

Results of Using Scalyr

It took about a week to get up and running on Scalyr. Now, the product is used pervasively across the engineering team. The product is simple to start with and can get more complex as needed, which helps everyone across the team get value out of it. For example, when the team writes a feature and wants to log when a few events happen, whatever they put into the code is what they put into Scalyr. They aren’t required to create an opaque translation layer. If they want to treat a message as a plain string, they can put it in as is. If they want something fancier or more complicated, they can format it so that it makes it through the parser.

According to Ryan Artecona, Infrastructure Engineering Lead at Wistia,

“it is easier and faster to diagnose the bugs that frustrate our customers as a result of using Scalyr.”

Beyond the engineering team, the technical support team (Support Engineers) at Wistia uses Scalyr to help track down the root causes of problems that customers encounter. They use the search functionality in Scalyr extensively to pinpoint the exact requests in the logs that correspond to the problems customers encounter in the browser or when using their APIs. Scalyr has made the customer support team more self-sufficient in resolving customers’ problems, and when they do need to escalate problems to the engineering team, they’re able to give them a great head start on fixing the issue. The handoff from support to engineering is much smoother as a result of using Scalyr.

The engineering team is able to respond to incidents much faster. Rolling out Scalyr has helped the engineering team commit to security best practices, including removing root access to all production servers. Now engineers can write code that is observable from the outside and more conducive to debugging. Using Scalyr has been the team’s first big step in the direction of making debugging software in production more collaborative, something they didn’t have before. Individuals are more empowered to own their code by using Scalyr. Before Scalyr, if things became too time-consuming, issues would just go undiagnosed and be marked as “too hard” to figure out.