Calculating the ROI of Log Analysis Tools

Anyone in a technology organization can relate to a certain frustration. You know that adopting a certain tool or practice would help you. So you charge forward with the initiative, looking for approval. But then someone — a superior, most likely — asks you to justify it. “Give me the business case for it,” they say. And then, a little flummoxed, you feel the gears start turning in your head. Today, I’d like to talk about that very issue in the specific context of log analysis tools.

If you have significant operations of any kind in production, you’re almost certainly generating logs. If not, you should be. You’re also probably monitoring those logs in some fashion or another. And if you’re consuming them, you’re analyzing them, however informally. But maybe you’re doing this manually, and you’d rather use a tool for log analysis. How do you justify adopting that tool? How do you justify paying for it?

ROI: The Basic Idea

To do this, you have to veer into the world of business and entrepreneurship for a moment. But don’t worry — you’re not veering too far into that world. Just far enough to acquire a skill that any technologist ought to have.

I’m talking about understanding the idea of return on investment (ROI). Follow the link and you’ll see a formula, but the idea is really dead simple. If you’re going to pay for something, will that something bring you at least as much value as what you paid for it? When the answer is “yes,” then you have a justifiable decision. If the answer is “no,” then you can’t make a good case for the investment.
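If you’d rather not follow the link, the standard formulation is simply:

    ROI = (value gained - cost of investment) / cost of investment

Anything above zero means the purchase more than pays for itself; the larger the number, the stronger your case.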

So, for log analysis tools, the question becomes a pretty straightforward one. Will your group realize enough cost savings or additional revenue generation from the tool to justify its cost?

Employing Back-of-the-Napkin Math

When you’re asked to justify purchasing a tool, you might wonder how much rigor you must bring to bear. People working with technology tend to have an appreciation for objective, empirical data.

When making a business case, if you can back it with objective, empirical data, that’s great. You should absolutely do so. But that’s often hard because it involves making projections and generally reasoning about the future. We humans like to believe we’re good at this, but if that were true, we’d all be rich from playing the stock market.

So you need to make some assumptions and build your case on the back of those assumptions. People sometimes refer to that as “back-of-the-napkin math” and it’s a perfectly fine way to build a business case, provided you highlight the assumptions that you make.

For instance, let’s say that I wanted to spend $50 on a text editor. I might project that its feature set would save me 20 minutes per day of brainless typing. I’d highlight that assumption and say that, if true, the investment would pay off after less than a week, given my current salary. These are the sorts of arguments that bosses and business folks appreciate.
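Here’s a minimal sketch of that back-of-the-napkin math in Python, with the hourly rate purely hypothetical (I haven’t told you my salary, after all):

    # Back-of-the-napkin payback math for the $50 text editor example above.
    # The hourly rate is a made-up stand-in; substitute your own.
    tool_cost = 50.00            # price of the editor, in dollars
    minutes_saved_per_day = 20   # assumed savings from its feature set
    hourly_rate = 50.00          # hypothetical loaded cost of an hour of my time

    daily_savings = (minutes_saved_per_day / 60) * hourly_rate
    payback_days = tool_cost / daily_savings

    print(f"Saves about ${daily_savings:.2f} per working day")
    print(f"Pays for itself in roughly {payback_days:.1f} working days")

At those invented numbers, the editor pays for itself in about three working days, which squares with the “less than a week” claim above.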

First, the Cost of Log Analysis Tools

To make a business case and a credible projection of ROI, you need two projected pieces of data: the cost (i.e., the amount of the investment you’re looking for a return on) and the savings or revenue benefit. I’ll dedicate the rest of this post to talking about how log analysis tools can save companies money or even add to their bottom line. But first, let’s take a look at their costs.

The most obvious cost is the sticker price of the tool. That might be an initial lump sum, but in this day and age, it’s usually going to be a recurring monthly subscription cost. So when making your case, be sure to take that into account.

There’s also a second, subtler cost that you should prepare yourself to address. Installing, learning, and managing the tools takes time from someone in the IT organization. You can (and should) argue that it winds up saving time in the end, but you also must acknowledge that investing employee time (and thus salary) is required.
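To make that concrete, here’s the same kind of napkin math applied to the cost side. Every figure below is illustrative, not a quote for any particular product:

    # Illustrative first-year cost of a log analysis tool:
    # the recurring subscription plus the employee time to run it.
    monthly_subscription = 200.00   # hypothetical sticker price per month
    setup_hours = 16                # one-time install, learning, and configuration
    monthly_admin_hours = 2         # ongoing care and feeding
    hourly_rate = 75.00             # hypothetical loaded cost of an engineer's hour

    yearly_subscription = monthly_subscription * 12
    yearly_people_cost = (setup_hours + monthly_admin_hours * 12) * hourly_rate
    first_year_cost = yearly_subscription + yearly_people_cost

    print(f"Subscription:     ${yearly_subscription:,.2f}")
    print(f"Employee time:    ${yearly_people_cost:,.2f}")
    print(f"First-year total: ${first_year_cost:,.2f}")

Whatever numbers you plug in, that total is the figure the projected savings or revenue needs to beat.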

Once you have those costs established, you can start to reason about the benefits.

Read More

Scalyr October Product Updates

The engineering team has been hard at work on new features and updates over the last few weeks. We are excited to share these changes with you and would love to hear your feedback.

Agent

  • By popular demand, we’ve raised the limit on log messages to 10,000 bytes (from 3500). If you’re using the Scalyr Agent, please upgrade to version 2.0.30 (available by the end of next week) to take advantage of this change: https://www.scalyr.com/help/scalyr-agent#upgrades.

  • Amazon EC2 Container Service (ECS) support in the Scalyr Agent.
  • The Scalyr Agent can now rename log files before uploading them to Scalyr.
  • Improved support for Kubernetes logs. You can now configure the Scalyr Agent to parse the JSON records generated by Kubernetes and extract the original log text (see the short illustration after this list).
  • The Scalyr Agent can now redact sensitive data (e.g. user email addresses) by replacing it with a hash, allowing you to see patterns without revealing the raw data.
  • The Scalyr Agent’s URL monitor plugin can now generate POST and PUT requests, and can specify HTTP headers and a request body.
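For context on the Kubernetes item above: each line a container writes gets wrapped in a small JSON record (Docker’s json-file format), and the agent can now unwrap it for you. Purely as an illustration of what that unwrapping amounts to, not the agent’s actual code or configuration:

    # Illustration only: a Kubernetes/Docker JSON log record and the original
    # text inside it. The Scalyr Agent performs this extraction once configured.
    import json

    record = '{"log": "GET /api/v1/items 200 12ms\\n", "stream": "stdout", "time": "2017-10-02T13:45:00.123456Z"}'
    parsed = json.loads(record)
    original_text = parsed["log"].rstrip("\n")  # the line the application actually wrote

    print(original_text)  # GET /api/v1/items 200 12ms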

 

Integrations

Improved Search and Team Settings

  • Improved bookmark and link-sharing support for multiple teams. Search URLs now record the team being viewed. When you (or another user) later open the URL, if you’re not linked to the correct team, you’ll be prompted to switch.

  • When you customize the fields shown in the log view, your settings will now persist until you change them. (If you work with multiple teams, your settings are saved separately for each team.)
  • You can now customize the default search time period (normally 4 hours). See “Set Default Search Time Span” on the Tips & Tricks page.

 

Simplified Pricing

  • We are making some changes to our pricing to provide additional flexibility and to make the full Scalyr feature set available to everyone. You can view the new options at scalyr.com/pricing. All affected customers have received an email with more details. These changes will go live November 1st.

Application Logging Practices You Should Adopt

When talking about logging practices, you could segment the topic into three main areas of concern. Those would include, in chronological order, application logging, log aggregation, and log consumption. In plain English, you set up logging in your source code, collect the results in production, and then monitor or search those results as needed.

For aggregation and consumption, you will almost certainly rely on third-party tools (an advisable course of action). But the first concern, application logging, is really up to you as an application developer. Sure, logging frameworks exist. But, ultimately, you’re writing the code that uses them, and you’re dictating what goes into the log files.

And not all logging code is created equal. Your logging code can make everyone’s lives easier, or it can really make a mess. Let’s take a look at some techniques for avoiding the mess situation.

What Do You Mean by Application Logging Practices?

Before diving into the practices themselves, let’s briefly clarify what application logging practices are. In the context of this post, I’m talking specifically about things that you do in your application’s source code.

So this means how you go about using the logging API. How do you fit it into your source code? How do you use it? And what kinds of messages do you log?
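As a tiny, hypothetical illustration of the kinds of decisions I mean (the module and function names here are invented, and I’m using Python’s standard logging module just as an example):

    # A hypothetical example of fitting a logging API into application code:
    # a named, module-scoped logger, appropriate levels, and context in messages.
    # (Handler and level configuration happens once, at application startup.)
    import logging

    logger = logging.getLogger("billing.invoices")

    def charge_customer(customer_id, amount):
        logger.info("Charging customer %s amount=%.2f", customer_id, amount)
        try:
            # ... call out to the (hypothetical) payment gateway here ...
            pass
        except Exception:
            # Record the failure with a stack trace instead of swallowing it.
            logger.exception("Charge failed for customer %s", customer_id)
            raise

Where the logger lives, which levels you use, and how much context each message carries are exactly the sorts of choices these practices are about.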

Read More

OkCupid Falls For Scalyr

OkCupid has a long history of building things in-house. They have always been a deeply technical company that doesn’t settle for what’s already available and isn’t afraid to build something new. In fact, OkCupid doesn’t use Apache or Nginx — they built their own web server. So it isn’t surprising that when it came time to reconsider how they manage logs, they considered both existing solutions and the possibility of building something in-house. In the end, they found that Scalyr both meets their log management needs today and has the potential to grow into a solution for a number of other challenges they face. I recently had a chance to catch up with Alex Dumitriu, CIO at OkCupid, about their early experiences with Scalyr.

Read More

Introduction to Continuous Integration Tools

In a sense, you could call continuous integration the lifeblood of modern software development. So it stands to reason that you’d want to avail yourself of continuous integration tools. But before we get to the continuous integration tools themselves, let me explain why I just made the claim that I did. How do I justify calling continuous integration the lifeblood of software development? Well, the practice of continuous integration has given rise to modern standards for how we collaborate around and deploy software.

What Is Continuous Integration?

If you don’t know what continuous integration is, don’t worry. You’re not alone. For plenty of people, it’s a vague industry buzzword. Even some people who think they know what it means have the definition a little muddled. The reason? Well, the way the industry defines it is a little muddled.

The industry mixes up the definitions of continuous integration itself, and the continuous integration tools that enable it. To understand this, imagine if you asked someone to define software testing, and they responded with, “oh, that’s Selenium.” “No,” you’d say, “that’s a tool that helps you with software testing — it isn’t software testing itself.” So it goes with continuous integration.

Continuous integration is conceptually simple. It’s a practice. Specifically, it’s the practice of a development team regularly syncing its code into a shared codebase. Several times per day, at least.

Merge Parties: The Continuous Integration Origin Story

To understand why this matters, let me explain something about the bad old days. Twenty years ago, teams used rudimentary source control tools, if any at all. Concepts like branching and merging didn’t really exist in any meaningful sense.

So, here’s how software development would go. First, everyone would start with the same basic codebase. Then, management would assign features to developers. From there, each developer would go back to their desk, code for months, and then declare their features done. And, finally, at the end, you’d have a merge party.

What’s a merge party? It’s what happens when a bunch of software developers all slam months worth of changes into the codebase at the same time. You’re probably wondering why anyone would call this a party when it sounds awful. Well, it got this name because it was so awful and time-consuming that teams made it into an event. They’d stay into the evenings, ordering pizza and soda, and work on this for days at a time. The “party” was partially ironic and partially a weird form of team building.

But whatever you called it, and party favors or not, it was really, really inefficient. So some forward-thinking teams started to address the problem.

“What if,” they wondered, “instead of doing this all at once in the end, with a big bang, we did it much sooner?” This line of thinking led to an important conclusion. If you integrated changes all of the time, it added a little friction each day, but it saved monumental pain at the end. And it made the whole process more efficient.

Read More

Wistia’s Engineering and Customer Support Teams Solve Customer Issues Faster With Scalyr

I recently caught up with Ryan Artecona, Infrastructure Tech Lead at Wistia, to learn more about how their engineering and customer support teams use Scalyr. 

Last summer, the engineering team at Wistia took a step back from their day-to-day to evaluate the tools and infrastructure they were using. The team realized they needed a log management solution. After evaluating several products, they decided to move forward with Scalyr. Scalyr gave the engineering team more visibility into their operations. They were able to identify issues faster and, as a result, have seen fewer disgruntled customers.

About Wistia

Wistia is a professional video hosting and analytics platform designed to help businesses communicate more creatively. Founded in Cambridge, Massachusetts in 2006, Wistia offers businesses the resources to host, organize, customize and measure the impact of video. In addition to video hosting, analytics, and marketing tools, Wistia has a library of educational resources to help you learn the ins-and-outs of creating great video content.

Customer Challenges

Last summer, the engineering team at Wistia took a step back from their normal feature releases to assess their infrastructure. At the time, they had no log aggregation or observability tools beyond New Relic. They had log files, but the only way engineers could use them was to SSH into a specific server and hope the log they were looking for was there. They hadn’t used a log aggregator before, but they knew such tools existed and wanted to find the right one.

The team evaluated Scalyr along with Splunk, SumoLogic, PaperTrail, and LogDNA. The criteria for their evaluation included:

  • Support for live streaming
  • Speed of queries
  • Ability to add indexes to logs in a flexible way
  • Reasonably priced solution

After evaluating all the products, the team decided on Scalyr because it matched their evaluation criteria the best. Some of the others didn’t support live streaming, some were too slow and inflexible, and others were too expensive. In the evaluation, the team found Scalyr was easy to use, had powerful capabilities, was fast, and was the most competitively priced.

Results of Using Scalyr

It took about a week to get up and running on Scalyr. Now, the product is used pervasively across the engineering team. Because the product starts simple but can get more sophisticated as you need it to, everyone across the team gets value out of it. For example, when writing a feature, the team can log the few events they care about, and whatever they put into the code is exactly what they see in Scalyr. They aren’t required to create an opaque translation layer. If they just want to log a plain string, they can put it in there. If they want something fancier or more complicated, they can format it so it makes it through the parser.

According to Ryan,

“It is easier and faster to diagnose the bugs that frustrate our customers as a result of using Scalyr.”

Beyond the engineering team, the technical support team (Support Engineers) at Wistia uses Scalyr to help track down the root causes of problems that customers encounter. They use the search functionality in Scalyr extensively to pinpoint the exact requests in the logs that correspond to the problems customers encounter in the browser or when using their APIs. Scalyr has made the customer support team more self-sufficient in resolving customers’ problems – and when they do need to escalate problems to the engineering team, they’re able to give them a great head start on fixing the issue. The handoff from support to engineering is much smoother as a result of using Scalyr.

The engineering team is able to respond to incidents much faster. Rolling out Scalyr has helped the engineering team commit to security best practices, including removing root access to all production servers. Now engineers can write code that is observable from the outside and more conducive to debugging. Using Scalyr has been the team’s first big step in the direction of making debugging software in production more collaborative – something they didn’t have before. Individuals are more empowered to own their code by using Scalyr. Before Scalyr, if an investigation became too time-consuming, issues would just go undiagnosed and get marked as “too hard” to figure out.

The Different Types of Server Monitoring Software

If you landed here from a Google search for “server monitoring software,” then I don’t envy you. You’re probably searching for a specific answer to a specific problem. And you’ve no doubt learned that there are roughly 8,498 companies that offer server monitoring software. But the only thing they have in common? All of them can totally monitor everything that you might ever need monitored.

E-commerce has grown steadily in the last two decades, eventually coming to completely dominate our commercial world. So a lot of people have a lot of money invested in a lot of stuff running on the internet. This, in turn, makes a lot of people interested in monitoring that stuff, seeking to defend their livelihoods and gain competitive advantages.

Thus when we talk about server monitoring software, we’re casting a pretty wide net. Too wide, in fact. Servers handle lots of things. Searching for server monitoring software will cause you to smack up against the paradox of choice. You have too many options.

You could narrow the field in a number of ways. How much does the tool cost? What server operating system? Is it open source? You get the idea.

But today, I’d like to help you focus the scope of your search by acknowledging that servers are extremely complex, resulting in many different potential things to monitor. Let’s take a look at the different types of server monitoring software as a function of what, exactly, they monitor. Get a better focus there, and you’ll know what to look for.

Read More

Flyclops Accelerates Development by Using Scalyr

We are thrilled to have Flyclops as a customer at Scalyr. I recently chatted with co-owner Dave Martorana to learn more about how they chose Scalyr and how it is helping the engineering team be more effective.

Flyclops was evaluating log management tools when they came across Scalyr in a newsletter. After signing up for a trial, the team was blown away by the speed of the product. Searches went from minutes to seconds, which saved them valuable development time. Using Scalyr has improved their ability to provide support and rapidly resolve issues, which has led to increased player happiness and lets the team sleep at night knowing they have the right tool in place to monitor activity on their servers.

About Flyclops

Flyclops is an independent mobile games studio located in Philadelphia, PA, specializing in casual multiplayer games, both asynchronous turn-based and real-time. Flyclops’s games have been played by millions across the globe.

Evaluation Process

When Flyclops was looking for a logging solution, they evaluated several tools. Other tools they were experimenting with made searching logs a painful task. Flyclops had a few things they were looking for in a log management tool, including:

  • Ease of logging
  • Speed of collection
  • Ability to pull metrics out of log data
  • Ability to parse custom log formats
  • Ability to diagnose issues quickly

Dave Martorana, co-owner of Flyclops, discovered Scalyr via a Google Go newsletter as a featured log management tool. Given that a lot of the Flyclops backend was written in Go, they decided to give it a try.

Results of Using Scalyr

When the Flyclops team started using Scalyr, they immediately took notice of the speed and performance of the tool. Searches went from tens of seconds and minutes in other tools to almost instantaneous with Scalyr. 

According to Dave,

“Scalyr has been the single best tool I’ve added to our stack in years.”

Flyclops has 500,000 unique players per month. By using Scalyr, they are able to save significant time investigating issues, which gives them more time for development. Scalyr allowed them to diagnose most problems substantially faster than with other tools they had tried. They were able to replace whole suites of monitoring tools with a single product that can answer questions they don’t yet know they’ll need to ask.

The team liked that they had the ability to write their own parsers and didn’t have to conform to a certain pattern when writing data to logs. On the client side of things, they’ve gone through a number of third-party crash reporting tools. They started logging client exceptions and wrote some custom parsers to work through the stack traces and proactively look at what’s unique to their products. They were able to turn Scalyr into the best stack trace analysis tool they had used.

The team sleeps a lot better knowing that Scalyr is watching their servers. The ability to answer questions they didn’t anticipate allows them to be more proactive. They are able to define custom variables and query them. When launching a new feature with a staged rollout, they are able to use Scalyr to validate that they are rolling out at the speed they expected with just a little bit of graphing. All of this allows them to better support their players, and be sure their customers are having a high-quality experience at all times.

Features of a Good Log File Viewer

When you think of a log file viewer, what do you think of? Come on, be honest. Vim or Emacs?  Notepad++ or Sublime? Do you just invoke “tail” from the command line? Please say you’re not just using Notepad in Windows. No judgment, but if you’re going that route, you’re putting yourself through a lot of unneeded pain.

I ask about these tools — these text editors — because that’s how we conceive of a log file viewer. Logs are text files. So when we go to view them, we use a tool meant for viewing (and editing) text. This is completely understandable, and it’s also served us well as an industry since some programmer half a century ago first had the idea to output runtime information to a file.

Or, at least, it has sufficed. Viewing log files with a text editor supplies us with the basics for troubleshooting. We can look at the information contained in the file and we can do text search, with varying degrees of sophistication. And, if necessary, we can copy and make modifications to the text.

But surviving isn’t thriving. When it comes to employing a log file viewer, we can ask for so much more. This really shouldn’t come as a surprise in 2017. Software is “eating the world” and the DevOps movement has brought us an explosion of SaaS and tools to help software shops. Should it really surprise anyone that the modern log file viewer can do some awesome stuff?

Let’s take a look at some of what you should expect when picking a tool to help you view your logs. What are features of a good log file viewer in this day and age?

Read More

What To Look for in a Logging Framework

If you’ve spent any amount of time in the software industry, you’ve probably bumped up against logging. Maybe you first encountered it as a way to debug your program as you worked, printing “you are here” messages. Go much beyond that, especially in 2017, and your logging efforts likely graduate to the use of a logging framework.

What Is a Logging Framework? Let’s Get Precise.

So what marks this distinction? When do you go from a newbie printing out “got into the Calculate() method” to a user of a logging framework? To understand that, let’s define logging framework.

A logging framework is a utility specifically designed to standardize the process of logging in your application. This can come in the form of a third party tool, such as log4j or its .NET cousin, log4net. But a lot of organizations also roll their own.

You have to go beyond just standardizing, however, to qualify as a framework. After all, you could simply “standardize” by deciding that logging means invoking the file system API to write to a file called log.txt. A logging framework has to standardize the solution by taking care of the logging for you and exposing a standard API.

To get more specific, we can conceive of a logging framework encapsulating three main concerns: recording, formatting, and appending. When you want to capture runtime information about your application, you start by signaling what to record. Then, you determine how to format that information. And, finally, you append it to something. Classically, this is a file, but you can also append to RDBMS tables, document databases, or really anywhere capable of receiving data.
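To make those three concerns concrete, here’s a deliberately tiny sketch in Python. It isn’t how log4j or log4net work internally, and it’s nothing you’d run in production; it just shows the shape of recording, formatting, and appending as separate pieces:

    # A toy logging "framework" separating the three concerns:
    # recording (the log() call), formatting, and appending to a destination.
    from datetime import datetime

    class FileAppender:
        def __init__(self, path):
            self.path = path

        def append(self, line):
            # Appending: write the finished line to the destination.
            with open(self.path, "a") as f:
                f.write(line + "\n")

    def simple_formatter(record):
        # Formatting: turn the recorded data into a line of text.
        return f"{record['time'].isoformat()} [{record['level']}] {record['message']}"

    class Logger:
        def __init__(self, formatter, appender):
            self.formatter = formatter
            self.appender = appender

        def log(self, level, message):
            # Recording: the caller signals what to capture and at what level.
            record = {"time": datetime.utcnow(), "level": level, "message": message}
            self.appender.append(self.formatter(record))

    # Usage: swap the formatter or the appender without touching calling code.
    logger = Logger(simple_formatter, FileAppender("log.txt"))
    logger.log("INFO", "Application started")

Swap FileAppender for something that writes to a database table or a network endpoint and the calling code never changes; that separation is the whole point.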

If you have a piece of code dedicated to solving these problems, you have yourself a logging framework.

Should You Use an Existing Logging Framework or Build Your Own?

Now that we have precision around the definition, let’s examine your options for making use of a logging framework. Notice that I frame the question in terms of which one you should pick, rather than whether or not you should use one. I do that because production applications should most definitely make use of logging frameworks. Without them, you’re flying blind about what your application does in the wild.

The question then becomes whether you should use an existing one or write your own. Use an existing one — full stop. It’s 2017, and this is a well-solved problem in every tech stack. You shouldn’t write a logging framework any more than you should write your own source control tool or build a bug tracker. Others have built these, and you can have them cheaply or freely. Stick with solving problems in your own domain, rather than reinventing wheels.

So you need a logging framework, and you should use an existing one. The question then becomes “which one?” Which logging framework should you use? I won’t answer that outright, since it will depend on your stack and your needs. Instead, I’ll offer guidance for how to choose.

Read More