Java Exceptions and How to Log Them Securely

As a security consultant, I perform assessments across a wide variety of applications. In the applications I’ve tested, I’ve found it’s common for them to suffer from some form of inadequate exception handling and logging. Logging and monitoring are often-overlooked areas, and due to increased threats against web applications, they’ve been added to the OWASP Top 10 as the new number ten issue, under the name “Insufficient Logging and Monitoring.”

So what’s the problem here? Well, let’s take a look.


Common Ways People Destroy Their Log Files

For this article, I’m going to set up a hypothetical scenario (but based on reality) that needs logging. We’re writing an application that automates part of a steel factory. In our application, we need to calculate the temperature to which the steel must be heated. This is the responsibility of the TemperatureCalculator class.

The class is fed a lot of parameters that come from external sensors (like current temperature of the furnace, weight of the steel, chemical composition of the steel, etc.). The sensors sometimes provide invalid values, forcing us to be creative. The engineers said that, in such a case, we should use the previous value. This isn’t something that crashes our application, but we do want to log such an event.

So the team has set up a simple logging system, and the following line is appended to the log file:

An invalid value was provided. Using previous value.

Let’s explore how this well-meant log message doesn’t actually help. In fact, combined with similar messages in our log file, the log file ends up being a giant, useless mess.
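
To make that concrete, here’s a hypothetical sketch (assuming SLF4J; the method and field names are made up for illustration) of a log call that would actually help: it records which sensor misbehaved, what it reported, and which value we fell back to.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TemperatureCalculator {
    private static final Logger logger = LoggerFactory.getLogger(TemperatureCalculator.class);

    // Hypothetical helper: validate a sensor reading, falling back to the previous value.
    private double sanitizedReading(String sensorId, double rawReading, double previousValue) {
        if (Double.isNaN(rawReading) || rawReading < 0.0) {  // assumed validity check
            // Log which sensor failed, the rejected reading, and the substituted value,
            // instead of a bare "an invalid value was provided."
            logger.warn("Invalid value from sensor '{}': read {}, using previous value {}",
                    sensorId, rawReading, previousValue);
            return previousValue;
        }
        return rawReading;
    }
}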



Real-World Applications of Increased Visibility

What can change in an organization when you increase visibility? A lot.

Previously I wrote about how providing visibility to key information is a core enabler of high-functioning, high-speed teams. When put into practice, information visibility increases can lead to transformative results. In this post I’ll use a mix of Scalyr customers and other companies I’ve worked with over my couple of decades in Silicon Valley to show you concrete examples where companies have realized these benefits.

Common to all of these use cases is the elimination of “middlemen” and dramatically decreasing latency in the information retrieval process. Giving employees direct, rapid access to the information they need to make effective decisions facilitates decentralized decision-making and chips away at organizational silos. Enhancing knowledge worker productivity using this approach is not new. Harvard Business School analyzed the implications of decentralized decision-making, and GE conceptualized its path to eliminating silos more than 25 years ago. Unsurprisingly, in both cases the benefits far outweighed the costs.

Whether we’re talking about engineers or customer service specialists (and we’ll cover both) remember that Data != Information. Simply having access to data—even if it represents every event happening everywhere in your environment—isn’t enough. Care and effort must be taken to ensure that data is processed and organized to be immediately consumable by the intended audience.

As a general rule of thumb, figure that half of the work will be in gathering, storing, and calculating the raw data. The other half of the work is around the presentation and organization of information.

Engineering and SaaS Use Cases

These next examples walk through the benefits that result from giving engineers increased visibility into production environments. Similar impacts can be seen in Dev/Test environments, CI/CD pipelines, testing status, and related areas. In short, any situation with multiple teams and a potential “black box” is a candidate to reap the benefits of increased transparency.

Shortening the Product Defect Lifecycle

This is such a common—and important—use case for increased visibility that we wrote an entire post on it. Visibility is the first step in the process: Is the Customer Support team immediately alerted to issues? Can your CS and Dev teams get direct access to logs when troubleshooting? Do all of your teams have clear visibility into the same data? Answer no to any of those and your teams are wasting valuable time because they lack the visibility required to shorten the defect lifecycle.

Our customers report that their internal latency times around bug triage, inter-team escalations, and root cause analysis typically decrease by a factor of 5-10 when using Scalyr. Interestingly, Scalyr customers have told us that this change matters less over time because increased visibility into log data doesn’t just shorten the product defect lifecycle—it actually decreases the number of product defects. They attribute this decrease to individual engineers’ high engagement with the log data, which leads them to catch a correspondingly greater percentage of issues earlier in the development process.

Next Generation Deployment Techniques

Imagine, if you will, a traditional code deployment pipeline: the engineering team hands over a release to Ops, Ops deploys it during a specific window within which QA tests, and both Ops and Customer Support stand by post-deployment to verify the health of the running system. But if your goal is to deploy continuously, with multiple releases per week (or per day!) or partial releases via feature flags, blue/green deployments, or similar incremental deployment strategies, the traditional process quickly breaks down.

Why? In traditional environments, engineers monitor releases with prebuilt dashboards and tools (like daily email reports) but cannot access individual server logs or system/application performance metrics for the full stack. As companies move to a more integrated code release pipeline, developers need a more granular and up-to-date view of their code operating in production.

The continuous delivery model can only succeed if engineers have easy access to:

  • The current state of production systems
  • The detailed state of their code (dashboards aren’t enough)
  • All relevant log files (and when in doubt, let them see the data)

Logs as Primary Data

This next use case is slightly different: not only do employees need access to logs, but they need that access fast enough to fit into their typical decision-making workflow. Once you have that in place, something magical happens… your logs become a primary information source, not one of last resort. The specific implications of this are pretty wide-ranging, but among Scalyr customers, the most common benefits are:

  • Better logging. Once developers know they can get to the logs for real debugging, they start putting more, and cleaner, logging events in their code.
  • Democratized access to logs. When engineers can freely explore how applications are running in production, more eyes are on the lookout for problems, engineers build code for “what is” vs. how things were described to them, and teams operate more asynchronously.
  • Better tools. Knowledge that the data you need is reliably in a central location allows enterprising teams to build specific tools to assist with team-specific issues. This is particularly powerful as over time teams build numerous small tools that would never make the official roadmaps but still provide tangible benefits.

The exact implications for you will depend on how your teams decide to make use of this new power. As the saying goes, “Garbage in, garbage out,” but clean and descriptive logs can transform a business, as I’ll show in the next use cases.

From Engineering and SaaS to Customer Service

Visibility is not just a high-leverage tool for teams reporting to the CIO or VP of Engineering. Any team working to decentralize decision-making or increase organizational efficiency can benefit. The next two examples highlight how non-technical customer-facing teams made transformative changes by enabling employee visibility into operational metrics and data.

Improving Customer Support

Recently Return Path, a leading provider of outbound email services, granted all of their Tier 1 customer support employees direct access to the production application logs. This simple but dramatic shift reduced ticket turnaround times from three business days to about five minutes for customer issues like the following.

Previously, when a support rep received a ticket from a customer complaining that an email wasn’t delivered, the three-day investigation process went something like this:

  1. Work with the customer to verify common email client or other end-user issues weren’t to blame.
  2. Contact Ops to verify that no known issues for the application were to blame.
  3. Create a ticket for the Ops team to pull the relevant logs.
  4. Receive the logs and review the delivery status of the email(s) in question.
  5. Get back to the customer and if required, open a second ticket with Ops or Engineering for any application issues found.

Not the best experience for the customer…

Fast-forward to today, and the same ticket is handled much differently. While on the phone or chat with the customer, the support rep:

  1. Gets the customer’s message ID.
  2. Queries the application logs for the full status of that message (or any other potentially relevant messages) to identify the issue.
  3. Gives the customer an immediate answer and if required, creates a ticket for Ops or Engineering.

Not only is the customer experience dramatically improved, but both the customer support and Ops teams can spend more time on actual work and less time passing around tickets.

Contact Center Employee Optimization

My last example veers off the standard software development and SaaS path to a very different type of organization: contact centers. For those of you not familiar with the space, contact centers consist of inbound customer support centers, inbound or outbound sales teams, and medium- to large-scale call centers. Contact centers have long had a multitude of metrics used to track their performance. These metrics serve several key purposes, most importantly measuring the contact center’s financial and employee performance.

A startup I once worked with, Merced Systems, stepped into the contact center space with a fairly simple proposition: if employees, frontline managers, and company executives had timely access to key metrics through a user interface that allowed them to understand the raw data, they could use that information to drive more efficient and successful customer engagements. In other words, they built a product that enabled employee visibility into contact center operational metrics and allowed their customers to operate more efficiently.

Customers realized these efficiency gains in several key areas:

  • Employees could self-optimize their actions to meet real-time goals.
  • Managers could evaluate employee performance based on actual vs. perceived performance.
  • Executives could analyze contact center performance along various dimensions.

Net result? Extremely happy customers like T-Mobile, Coca-Cola, EchoStar, and many others—and Merced Systems going from idea to a $170m acquisition in less than 10 years. All from the simple idea that granting everyone visibility to key information leads to more efficient operations.

These examples give you some ideas on where, and how, you can apply increased visibility to your environment. If you have a story about how visibility into the right information transformed your environment, we’d love to hear about it in the comments below!

Next time I’ll be talking about the nuts and bolts of enabling visibility in SaaS environments and where we’ve seen the biggest bang for the buck.

Visibility = Speed

Waiting … to … find … out … something … breaks … everything.

If you found yourself wanting to skip over that sentence, you’re not alone.

For engineers, and knowledge workers in general, milliseconds can mark the difference between a person’s willingness to wait for information and their need to take action. If they wait, they risk falling behind. If they act on incomplete information, they make suboptimal decisions.

As business trends—and the release cycles they drive—speed up and companies struggle to fill engineering roles, this tradeoff becomes even more important. If your teams are chronically understaffed by 10-20%, can you afford to have existing staff executing at anything less than 100% efficiency?

Rapid information flow is key to ensuring that employees have maximum visibility into the information they need, when they need it. In an ideal world, teams use that visibility to move with speed AND accuracy—even Facebook realized that a maturing company can’t just move fast and break things. But given that the faster you move, the higher the probability of breaking something, navigating the speed vs. accuracy conundrum becomes paramount. Giving employees a complete view of the environment and the results of their actions is the single biggest thing you can do to enable success. Put simply:

Maximum visibility depends on knowing four key things:

  1. What to do
  2. When to do it
  3. The starting state of the system
  4. What actually happened/is happening

Effective information flow for the first two is a core tenet of the Agile movement. Done right, Agile makes it clear to both engineers and project managers what needs to be done, and when. Engineers no longer need to wait to learn (or guess at) what a product manager was intending, and product managers no longer have to guess how far along a project is, or whether it can be built as desired. This visibility increase between product and engineering forms the basis of many of Agile’s advantages.

Numbers 3 and 4 might lack their own manifesto, but seasoned developers and ops engineers instinctively understand how critical they are. The methods and tools deployed to gain visibility into an environment fall broadly into five categories:

  • Application Performance Monitoring (APM)
  • Systems and Network Monitoring
  • Metrics Dashboards
  • Log Aggregation
  • Configuration Management

Collectively these categories represent a more than $15 billion market, and that’s not accounting for dominant open-source players in the space like Nagios, Grafana, ELK, and Ansible (among many, many others).

Why are so many resources aimed at solving this visibility issue?

The Benefits of Increased Visibility

Let’s use two fictitious organizations, Acme Corp and Nadir Corp, to explore how visibility impacts behavior and execution speed. In both companies, any employee can access any piece of information—but the method and speed of access differ greatly.

Acme Corp has built a culture of radical transparency where every employee has immediate access to every piece of company information through a lightning-fast application accessible from anywhere in the world on any device. Employees have a top-level view of key information and can do ad-hoc data exploration, for near-perfect visibility into the operation of the system at all times.

At Nadir Corp, every request for information goes through a rigorous process, occasionally with hard-copy sign-offs, before being granted. Employees must find out where the data is stored, who to request it from, justify their request, and wait for approval. Once all of that work is complete they can finally try to answer their question using the data they received.

In practice, of course, no company is as open as Acme (for very good security reasons!) and very few are as convoluted as Nadir. But from this example it’s brutally apparent which company will be able to investigate, reach decisions, and execute faster.

Employees at Nadir either 1) won’t bother trying to get data unless they absolutely have to, or 2) will look for shortcuts that allow quicker access to a slice of the data. Both of these factors lead to a continuation of the speed vs. accuracy conundrum mentioned above. Employees at Nadir are forced to either wait for key information to act, or act with limited information.

Teams or individuals who take the first option get left behind; those who take the second make more than their share of errors.

Every company has elements of Nadir Corp in it. Sometimes for good reasons (HR records), sometimes for no good reason (lack of priority/time), and sometimes for bad ones (silo building).

Companies that aspire to be more like Acme Corp and invest in finding and eliminating silos and legacy barriers to data will quickly realize the gains of increased visibility:

  • Increased visibility drives use of optimal data sources
  • Fast access to optimal data leads to more efficient work
  • More efficient work equals faster execution

In the age-old debate of good vs. fast vs. cheap, what should you do if you want good and fast but don’t have an unlimited budget? Invest in tools that allow employees to quickly get to key information, rapidly assess the results of their work, and continually refine their actions. Do that and those chronically overworked engineers and operations staff will be able to operate faster and with fewer errors. And isn’t that what we’re all building toward?

In my next posts, I’ll delve into the practical implications of increased visibility and common tools of the trade that promote visibility.


Log Appender: What Is It and Why Would You Use It?

If you’re not familiar with application logging, I can understand there being some confusion when you hear the term “log appender.”  What do you mean, “append”?  I want to write stuff to a file.  Isn’t that kinda the whole deal with log files?

So let’s demystify things a little.  A log appender is a specific part of the general logging process.  Meaning, yes, logging is often about writing to files, and a log appender can help you with that.  But it can actually help you with a whole lot more besides.

The appender is the part of a logging system that’s responsible for sending the log messages to some destination or medium.  It answers the question “where do you want to store this stuff?”


Anatomy of a Logging Operation

If all you’ve ever done with logging is dump messages to a file, this might smack of over-engineering.  “What do you mean ‘part of a logging system’?  You call the file API and write stuff to a file.”

Well, that certainly works for simple and small cases.  But as you expand your logging operation, you might need to get a little more sophisticated since simple file writes will start to create conflicts and other problems. You might even adopt a first-class logging framework. (In fact, you should.)  When you do this, logging becomes a more involved proposition, and it’s one that you can split into three main concerns:

  • Message recording and formatting.  This is where you decide what should go in the log and how to format it.
  • Log appender.  This is, as I’ve already mentioned, the part of the operation that decides where the messages go and how they get there.
  • Log consumption.  This can range from someone simply inspecting the logs to sophisticated search, intelligence, and visualization.

Even in the simplest logging implementation, all of these things actually take place.  For instance, consider this pseudocode:

public void keepTrackOfSomething() {
     _file.write("This method doesn't seem super-useful", "log.txt", File.Append);
}

Let’s see how our three concerns apply.

  • Recording and formatting is just creating the string “This method doesn’t seem super-useful.”
  • The appending is a simple file write in append mode to “log.txt”.
  • Consumption happens later when someone scans through log.txt and reads the message.
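
To see that separation with a real logging tool, here’s a minimal sketch using the JDK’s built-in java.util.logging, which calls its appenders “handlers.”  The recording call stays the same; the handlers decide where the message ends up, in this case both the console and a file.

import java.util.logging.ConsoleHandler;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class AppenderDemo {
    public static void main(String[] args) throws Exception {
        Logger logger = Logger.getLogger("demo");
        logger.setUseParentHandlers(false);       // don't also log through the root handler

        logger.addHandler(new ConsoleHandler());  // destination 1: the console

        FileHandler fileHandler = new FileHandler("log.txt", true);  // destination 2: a file, appended
        fileHandler.setFormatter(new SimpleFormatter());             // plain-text output
        logger.addHandler(fileHandler);

        // One recording call; the handlers ("appenders") fan it out to every destination.
        logger.info("This message goes to the console and to log.txt");
    }
}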


What Goes Into Log Analysis?

I’ve talked here before about log management in some detail.  And I’ve talked about log analysis in high-level terms when making the case for its ROI.  But I haven’t gone into a ton of detail about log analysis.  Let’s do that today.

At the surface level, this might seem a little indulgent.  What’s so hard?  You take a log file and you analyze it, right?

Well, sure, but what does that mean, exactly?  Do you, as a human, SSH into some server, open a gigantic server log file, and start thumbing through it like a newspaper?  If I had to guess, I’d say probably not.  It’s going to be some interleaving of tooling, human intelligence, and heuristics.  So let’s get a little more specific about what that looks like, exactly.

Log Analysis, In the Broadest Terms

In the rest of this post, I’ll explain some of the most important elements of log analysis.  But, before I do that, I want to give you a very broad working definition.

Log analysis is the process of turning your log files into data and then making intelligent decisions based on that data.

It sounds simple in principle.  But it’s pretty involved in practice.  Your production operations generate all sorts of logs: server logs, OS logs, application logs, etc.  You need to take these things, gather them up, treat them as data, and make sense of them somehow.  And it doesn’t help matters that log files contain some of the most unstructured and noisy data imaginable.

So log analysis takes you from “unstructured and noisy” to “ready to make good decisions.”  Let’s see how that happens.
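
As a tiny illustration of the first step, treating log files as data, here’s a sketch that pulls structure out of a free-form log line with a regular expression.  The log format here is hypothetical; real analysis tools do this at scale, across many formats.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // Hypothetical format: "2023-01-15 10:32:07 ERROR PaymentService - timeout after 30s"
    private static final Pattern LINE = Pattern.compile(
            "(?<ts>\\S+ \\S+) (?<level>[A-Z]+) (?<source>\\S+) - (?<message>.*)");

    public static void main(String[] args) {
        Matcher m = LINE.matcher("2023-01-15 10:32:07 ERROR PaymentService - timeout after 30s");
        if (m.matches()) {
            // The unstructured line is now data you can filter, count, and chart.
            System.out.println("level=" + m.group("level")
                    + " source=" + m.group("source")
                    + " message=" + m.group("message"));
        }
    }
}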



Search Your Files with Grep and Regex

How do you search through a file?  On the surface, this might seem like sort of a silly question.  But somewhere between the common-sense answer for many (“double click it and start reading!”) and the heavily technical (“command line text grep regex”) lies an interesting set of questions.

  • Where does this file reside?
  • What kind of file is it?
  • How big is the file?
  • What, exactly, are you looking for in the file?

Today, we’re going to look at one of the most versatile ways to search a file: using grep and regex (short for regular expression).  Using this combination of tools, you can search files of any sort and size.  You can also search with extremely limited access to your environment, and if you get creative, you can find just about anything.

But with that versatility comes a bit of a learning curve.  So let’s look at how to take the edge off of that and get you familiar with this file search technique.  To do that, I’ll walk through a hypothetical example of trying to extract some information.  First, though, let’s cover a bit of background.
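
Before we get to grep proper, here’s a rough sketch of the concept expressed in Java (the file name and pattern are made up for illustration): stream a file line by line and keep only the lines that match a regular expression.  That’s the essence of what grep does for you from the command line.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class MiniGrep {
    public static void main(String[] args) throws IOException {
        // Roughly equivalent in spirit to: grep -E "ERROR|WARN" server.log
        Pattern pattern = Pattern.compile("ERROR|WARN");

        try (Stream<String> lines = Files.lines(Paths.get("server.log"))) {
            lines.filter(line -> pattern.matcher(line).find())  // keep matching lines only
                 .forEach(System.out::println);
        }
    }
}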


Logging Levels: What They Are and How They Help You

Logging levels probably aren’t the most exciting thing in this world.  But then again, neither is banking.  And yet both things are fundamental to the people who use them as a tool.

Application logging is one of the most important things you can do in your code when it comes to facilitating production support.  Your log files serve as a sort of archaeological record of what on earth your codebase did in production.  Each entry in a log file has important information, including a time stamp, contextual information, and a message.  Oh—and generally, something called a logging level.

So what are logging levels?

Well, put as simply as possible, they’re a means of categorizing the entries in your log file.  But they categorize in a very specific way—by urgency.  At a glance, the logging level lets you separate the following kinds of information:

  • Hey, someone might find this interesting: we just got our fourth user named Bill.
  • Hmm, that’s odd: that request took three tries to succeed.  Someone should probably look into it.
  • Fire!  Everything is broken, and someone needs to fix it right now!

In other words, logging levels help you distinguish whether you need to grab a fire extinguisher or whether the message just contains information.

For the most part, this distinction helps in two ways.  First, you can filter your log files this way during search.  And second, you can control the amount of information that you log.  But we’ll get to that in a bit.

Logging Levels: Why Do We Do It?

When it comes to logging, you have two essential and opposing forces, if you will.  On the one hand, you want to capture every last detail you can because this might prove useful during troubleshooting or auditing your system.  But on the other hand, all of that logging consumes resources. You can eat up disk space, overload people reading the logs, and even start to slow down your production code if you go overboard.

So logging requires either a balance or a way to get both the proverbial signal and the noise.  And logging levels look to help with this by making your situation configurable on the fly.

Here’s a helpful metaphor, perhaps.  Imagine you have an android friend, and you ask him what he did today.

“Well, I woke up, inhaled, moved my nose three inches to the right—”

“No!  Too much information.  Go from granularity 10 down to 3.”

“I woke up, had breakfast, caught a taxi…”

“THAT’s better.”

Usually, you don’t need to know everything the android did.  But if he starts malfunctioning, you might want to go back to “granularity 10” to zoom in on the problem as narrowly as possible.

Logging levels work this way.  Often you’ll only want to know if there are problems or warning conditions.  But sometimes you’ll really want to dial up the information for troubleshooting purposes.
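
In code, that dial is just a threshold on the logger.  Here’s a minimal sketch using the JDK’s built-in java.util.logging (the messages echo our android friend):

import java.util.logging.Level;
import java.util.logging.Logger;

public class AndroidFriend {
    private static final Logger log = Logger.getLogger("android.friend");

    public static void main(String[] args) {
        log.setLevel(Level.WARNING);  // "granularity 3": only warnings and worse get through

        log.fine("Moved my nose three inches to the right");  // discarded by the logger
        log.info("Caught a taxi");                            // discarded by the logger
        log.warning("Left arm is malfunctioning");            // recorded

        log.setLevel(Level.ALL);      // "granularity 10": record everything for troubleshooting
        // (Note: to see FINE output on the console, the handler's own level
        // must be lowered too, since handlers apply a threshold of their own.)
    }
}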


Get Started Quickly With C# Logging

If you’re interested in getting started with C# logging as quickly as humanly possible, you’ve come to the right place.  Today, we’ll look at just how to do that.  Don’t worry, we’ll get into comprehensive detail too, covering a number of things.  Some of those include

  • What is logging?
  • Why would you want to log?

But for now, let’s get to the action.  How can you start logging in the absolute quickest and simplest way possible?  Let’s take a look.


The Simplest C# Logging That Could Possibly Work

Let’s start with a clean slate for the sake of easy demonstration.  First, open Visual Studio and create a new console project by selecting File->New->Project.


I’ve called my new project ConsoleProject and opted to put it into a new solution called LoggerPlaypen.  No need to congratulate me on my creativity, though.

This will open the Program.cs file containing your Program class.  Let’s now add a few lines of code to that class, as follows:

static void Main(string[] args)
{
    var displayMessage = "Hello, user!";
    Console.WriteLine(displayMessage);  // display the greeting
    Console.ReadLine();                 // wait for an enter keystroke to dismiss
}

We now have an application that successfully defines a greeting, displays it, and waits for an enter keystroke to dismiss it.  When you hit F5 to run, you’ll see the following message before pressing enter:

Hello, user!

But let’s say that we wanted not only to display this message but also to log it to a file.  How might we do that?  With something fancy, like a logging framework, inversion of control, or aspect-oriented programming?  Nah, let’s not get ahead of ourselves.  Try this instead, making sure to add “using System.IO;” at the top with the other using statements:

static void Main(string[] args)
{
    var displayMessage = "Hello, user!";
    Console.WriteLine(displayMessage);
    File.WriteAllText("log.txt", displayMessage);  // new: also write the message to a file
    Console.ReadLine();
}


Now, when you hit F5, you’ll see your greeting message once again.  But that extra line you’ve added will create a log file for you.  Go to your project directory, and then navigate to the Bin->Debug folder. You’ll see a file hanging out there in addition to your console executable.

Open it up, and you’ll see the file’s contents: “Hello, user!”

Is That Really It?

And…that’s it.  You just successfully implemented your first C# logger!

Is that all you need to know about logging?  Of course not.  Will this logging scheme be sufficient for any but the most trivial applications?  Nope.  But have you learned the barest essential backbone of C# logging?  Absolutely.

I told you it’d be a quick start.  But I strongly recommend that you read on to really get a handle on this logging thing.


Be Kind to Your Log File (And Those Reading It)

The log file has existed since programmer time immemorial. The software world moves really fast, but it’s not hard to imagine someone decades ago poring over a log file. This is, perhaps, as iconic as programming itself.

But sadly, for many shops, the approach hasn’t really evolved in all of that time. “Dump everything to a file and sort it out later, if you ever need it” was the original idea. And it persists to this day.

Put another way, we tend to view the log file as a dumping ground for any whim we have while writing code. When in doubt, log it. And, while that’s a good impulse, it tends to lead to a “write once, read never” outcome. If you’ve ever started to troubleshoot some issue you have via OS logs and then given up due to being overwhelmed, you know what I mean.

But, even if you don’t, you can picture it. A log file with gigabytes of cryptic, similar-looking text, page after page, discourages meaningful troubleshooting. Reading it is so boring that you find yourself hypnotized; before long, you blink, decide this isn’t going to help, and move on to other means of troubleshooting.

Rethinking the Log File

So, with that in mind, let’s rethink the log file. People focus an awful lot on intuitive user experiences these days, and we can draw from that. Think of the log file not as some dusty dumping ground for every errant thought your production code has, but as something meant to be consumed.

Who will read your log file? And for what purpose? And, with the answers to those questions, how can we make it easier for them?

This is an important enough consideration that products have emerged to help facilitate consuming log files, making troubleshooting and data gathering easier. People are buying these products to make their lives easier, so that should tell you something. They desperately want to filter the signal from the noise and get valuable data out of their logs.

So let’s take a look at how you can help with that when writing to log files. These tools have sophisticated parsing capabilities, but that doesn’t mean you shouldn’t do your part to help consumers of your log files. Here’s how you can do that.
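
For example, here’s a hypothetical sketch (assuming SLF4J, with made-up names) of the difference between a dumping-ground entry and one written for its readers: consistent wording, stable key=value pairs, and enough context to act on.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PaymentProcessor {
    private static final Logger logger = LoggerFactory.getLogger(PaymentProcessor.class);

    void recordFailure(long orderId, long userId, String gateway, String reason) {
        // The "write once, read never" version:
        // logger.error("Something went wrong");

        // The version written for the humans (and tools) that will consume it later:
        logger.error("payment_failed order_id={} user_id={} gateway={} reason={}",
                orderId, userId, gateway, reason);
    }
}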
