A Detailed Introduction to the Apache Access Log

What is the Apache access log?  Well, at the broadest level, it’s a source of information about who is accessing your website and how.

But as you might expect, a lot more goes into it than just that.  After all, people visiting your website aren’t like guests at your wedding, politely signing a registry to record their presence.  They’ll visit for a whole host of reasons, stay for seconds or hours, and do all sorts of interesting and improbable things.  And some of them will passively (or even actively) thwart information capture.

So, the Apache access log has a bit of nuance to it.  And it’s also a little…complicated at first glance.

But don’t worry — demystifying it is the purpose of this post.

Apache Access Log: the Why

I remember starting my first blog years and years ago.  I paid for hosting and then installed a (much younger) version of WordPress on it.

For a while, I blogged into the void with nobody really paying attention.  Then I started to get some comments: a trickle at first, and then a flood.  I was excited until I realized that they were all suspiciously vague and often non-sequiturs.  “Super pro info site you have here, oPPS, I HITTED THE CAPSLOCK KEY.”  And these comments tended to link back to what I’ll gently say weren’t the finest sites the internet had to offer.

Yep.  Comment spam.

Somewhere between manually deleting these comments and eventually installing a WordPress plugin to help, I started to wonder where these comments were all coming from.  They all seemed to magically appear in the middle of the night and they were spammy, but I was interested in patterns beyond that.

This is a perfect use case for the Apache access log.  You can use it to examine a detailed log of who has been to your website.  The information about visitors can include their IP address, their browser, the actual HTTP request itself, the response, and plenty more.
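To make that concrete, here’s a sample entry in Apache’s widely used combined log format (this example line is adapted from the Apache HTTP Server documentation):

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

Reading left to right: the client’s IP address, the identity and authenticated user (often just dashes), the timestamp, the request line, the HTTP status code, the response size in bytes, the referer, and the user agent.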

Read More

DevOps: Past, Present, and Future

While DevOps is no longer a brand-new field or movement, there continues to be rapid innovation in the space. Every day new tools are announced, new technologies are created, and teams around the world have to sort through the noise to figure out what’s actually important to their team and their environment. Some tools are transformative (VMware), some are useful but quickly supplanted (LXC), some prove to be neat technology but never find wide adoption (VRML).

In this post I explore DevOps’ past, present, and future, reflecting on 2017 and looking to 2018 and beyond. What once was revolutionary is now commonplace, and the future holds the promise of much greater efficiencies, albeit with a significant amount of retooling to get there.

DevOps Adoption is: The Past

Today, DevOps is the de facto standard for modern IT and Operations. No longer are you an early adopter if you roll out DevOps. In 2017 only ~25% of companies hadn’t started down the DevOps path. In 2018 most companies will complete the journey, leaving those without DevOps in the minority. Is DevOps for everyone? I’d argue that the core concepts of DevOps—collaboration, communication, and efficiency—are beneficial to any company, in any industry.  The benefits companies receive are material, and they compound over time.

Through 2018 and beyond, DevOps will continue to entrench itself as the new normal in how companies build and run large-scale software systems. The business world will continue its adoption of the DevOps mindset and tools, occasionally crossing over from traditional software engineering into other practices, much as Toyota’s Lean manufacturing inspired the creation of Lean Startups and forms the foundation for next-generation hospital operations. The maturation of the DevOps space will drive the development of new terminology—for good reasons (more precise language and descriptions) and bad (our product is different because we target DevSecAIFooOps). These new labels (and inevitable jargon) will muddy the waters around DevOps a bit, mostly due to vendors fighting to stand out from the crowd and avoid becoming a casualty of industry consolidation.

The next phase of DevOps evolution involves marrying Agile product development processes with DevOps-centric production release environments. This workflow evolution promises to dramatically increase knowledge worker efficiency—but while very early implementers will start this process in 2018, the practice won’t become widespread until 2019 or later.

Containers are: The Present

Containers as a concept (whether orchestrated by Kubernetes, packaged as Docker images, or built on some newer format) will start to “cross the chasm” and see much more widespread adoption in a variety of use cases. Most early adoptions will focus on enabling engineer productivity and software distribution. More evolved environments will use containers to incrementally deploy their microservices and ensure consistency among dev, test, and production environments.

Where the first wave of virtualization was all about efficient hardware utilization, this wave of containerized virtualization is about enabling consistent software environments. Expect to see more tools enabling software release orchestration using containers as the deployable code artifact. This will have benefits in terms of testability and reproducibility but will require teams to invest in new tools and modes of operation in order to make full use of the potential.

Increased adoption of containers will lead to a dilution of the core message and some industry pushback due to:

  • Fear of change
  • Valid attempts by engineers to separate fluff from reality
  • Lack of good tools and example use cases

Ultimately, containers will evolve into a core component of modern IT infrastructure much as virtualization did 15 years ago.

Unikernels are: The Future

Unikernels are a very promising bit of technology, but they’re still a couple of years away from widespread adoption. For those not familiar with them, unikernels are a newer, container-like technology that embeds the operating system into your application (i.e., the reverse of your typical OS). Most operating systems include support for multiple users and applications and, in most cases, a user interface. Unikernels ditch all of that overhead and pare the OS down to the absolute minimum. This is good from a security and reliability standpoint, but it will require engineering and operations teams to retool to support monitoring, debugging, and deployment of unikernel applications.

Unikernels have the potential to dramatically change the paradigms for production software environments (primarily by upending the security landscape). In 2018 you’ll hear a lot more about them, and vendors will start to tout their support—but expect to see only limited production adoption. My hope is that unikernel standards will start to emerge, and at least one or two will be ready for early production deployments of common app platforms (UniK? IncludeOS? Xen?).

Hybrid Clouds are: Still to be defined

“Hybrid cloud” means many different things to many different people. To some it means running your environment across your physical datacenter and AWS/Azure; to others it means stitching together SaaS offerings from multiple providers; to still others it means only a mix of IaaS providers. The hybrid cloud story has been touted since the very early days of AWS, with Microsoft, VMware, IBM/SoftLayer, Rackspace, and others all making their pitch to the market about what a hybrid cloud should look like. Often more than once.

The industry demand for hybrid cloud functionality continues to grow. Engineering and operations teams must be able to build, deploy, and run applications across multiple providers, and have those providers interoperate securely and efficiently. They need the ability to migrate selected IT workloads to cloud providers without undertaking massive retooling efforts. And yet there still doesn’t seem to be an agreed-upon set of solutions. Too many vendors are building walled gardens to keep customers in instead of building tools that allow seamless, secure, cross-platform communication. But Microsoft, VMware, Google, and others continue to invest in this area, so I’m hopeful we’ll start to see some type of consensus and standards develop over the next few years.

I’ll be tracking the success of my predictions throughout 2018…and invite you to do the same. Commend me, call me out, or leave your own predictions in the comments below.

Get Started Quickly With Java Logging

You’ve already seen how to get started with C# logging as quickly as possible.  But what if you’re more of a Java guy or gal? Well, then we’ve got your back, too: today’s post will get you up to speed with logging using C#’s older cousin.

As in the previous post in this series, we’ll not only provide a quick guide but also go into more detail about logging, diving particularly into the what and why of logging.

The Simplest Possible Java Logging

For this simple demo, I’m going to use the free community version of IntelliJ IDEA. I’m also assuming that you have the Java JDK installed on your machine.
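The full demo continues below, but as a quick preview, the simplest thing we could build might look something like this sketch, which uses the JDK’s built-in java.util.logging package (the class name and message here are just placeholders):

import java.util.logging.Logger;

public class SimplestLogging {

    private static final Logger LOGGER = Logger.getLogger(SimplestLogging.class.getName());

    public static void main(String[] args) {
        // With no other configuration, this prints a timestamped record to the console.
        LOGGER.info("Hello, logging world!");
    }
}

Run that, and you get a timestamped, level-tagged log record without writing any file-handling code yourself.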

Read More

Visibility = Speed

Waiting … to … find … out … something … breaks … everything.

If you found yourself wanting to skip over that sentence, you’re not alone.

For engineers, and knowledge workers in general, milliseconds can mark the difference between a person’s willingness to wait for information and their need to take action. If they wait, they risk falling behind. If they act on incomplete information, they make suboptimal decisions.

As business trends—and the release cycles they drive—speed up and companies struggle to fill engineering roles, this tradeoff becomes even more important. If your teams are chronically understaffed by 10-20%, can you afford to have existing staff executing at anything less than 100% efficiency?

Rapid information flow is key to ensuring that employees have maximum visibility into the information they need, when they need it. In an ideal world teams use that visibility to move with speed AND accuracy—even Facebook realized that a maturing company can’t just move fast and break things. But given that the faster you move, the higher probability you have of breaking something, navigating the speed vs. accuracy conundrum becomes paramount. Giving employees a complete view of the environment and the results of their actions is the single biggest thing you can do to enable success. Put simply:

Maximum visibility depends on knowing four key things:

  1. What to do
  2. When to do it
  3. The starting state of the system
  4. What actually happened/is happening

Effective information flow for the first two is a core tenet of the Agile movement. Done right, Agile makes it clear to both engineers and project managers what needs to be done, and when. Engineers no longer need to wait to learn (or guess at) what a product manager intended, and product managers no longer have to guess how far along a project is, or whether it can be built as desired. This increase in visibility between product and engineering forms the basis of many of Agile’s advantages.

Numbers 3 and 4 might lack their own manifesto, but seasoned developers and ops engineers instinctively understand how critical they are. The methods and tools deployed to gain visibility into an environment fall broadly into five categories:

  • Application Performance Monitoring (APM)
  • Systems and Network Monitoring
  • Metrics Dashboards
  • Log Aggregation
  • Configuration Management

Collectively, these categories represent a market of more than $15 billion, and that’s not accounting for dominant open-source players in the space like Nagios, Grafana, ELK, and Ansible (among many, many others).

Why are so many resources aimed at solving this visibility issue?

The Benefits of Increased Visibility

Let’s use two fictitious organizations, Acme Corp and Nadir Corp, to explore how visibility impacts behavior and execution speed. In both companies, any employee can access any piece of information—but the method and speed of access differ greatly.

Acme Corp has built a culture of radical transparency where every employee has immediate access to every piece of company information through a lightning-fast application accessible from anywhere in the world on any device. Employees have a top-level view of key information and can do ad-hoc data exploration, for near-perfect visibility into the operation of the system at all times.

At Nadir Corp, every request for information goes through a rigorous process, occasionally with hard-copy sign-offs, before being granted. Employees must find out where the data is stored, who to request it from, justify their request, and wait for approval. Once all of that work is complete they can finally try to answer their question using the data they received.

In practice, of course, no company is as open as Acme (for very good security reasons!) and very few are as convoluted as Nadir. But from this example it’s brutally apparent which company will be able to investigate, reach decisions, and execute faster.

Employees at Nadir either 1) won’t bother trying to get data unless they absolutely have to, or 2) will look for shortcuts that allow quicker access to a slice of the data. Both of these factors lead to a continuation of the speed vs. accuracy conundrum mentioned above. Employees at Nadir are forced to either wait for key information to act, or act with limited information.

Teams or individuals who take the first option get left behind; those who take the second make more than their share of errors.

Every company has elements of Nadir Corp in it. Sometimes for good reasons (HR records), sometimes for no good reason (lack of priority/time), and sometimes for bad ones (silo building).

Companies that aspire to be more like Acme Corp and invest in finding and eliminating silos and legacy barriers to data will quickly realize the gains of increased visibility:

  • Increased visibility drives use of optimal data sources
  • Fast access to optimal data leads to more efficient work
  • More efficient work equals faster execution

In the age-old debate of good vs. fast vs. cheap, what should you do if you want good and fast but don’t have an unlimited budget? Invest in tools that allow employees to quickly get to key information, rapidly assess the results of their work, and continually refine their actions. Do that and those chronically overworked engineers and operations staff will be able to operate faster and with fewer errors. And isn’t that what we’re all building toward?

In my next posts, I’ll delve into the practical implications of increased visibility and common tools of the trade that promote visibility.


Log Appender: What Is It and Why Would You Use It?

If you’re not familiar with application logging, I can understand there being some confusion when you hear the term “log appender.”  What do you mean, “append”?  I want to write stuff to a file.  Isn’t that kinda the whole deal with log files?

So let’s demystify things a little.  A log appender is a specific part of the general logging process.  Meaning, yes, logging is often about writing to files, and a log appender can help you with that.  But it can actually help you with a whole lot more besides.

The appender is the part of a logging system that’s responsible for sending the log messages to some destination or medium.  It answers the question “where do you want to store this stuff?”


Anatomy of a Logging Operation

If all you’ve ever done with logging is dump messages to a file, this might smack of over-engineering.  “What do you mean ‘part of a logging system’?  You call the file API and write stuff to a file.”

Well, that certainly works for simple and small cases.  But as you expand your logging operation, you might need to get a little more sophisticated since simple file writes will start to create conflicts and other problems. You might even adopt a first-class logging framework. (In fact, you should.)  When you do this, logging becomes a more involved proposition, and it’s one that you can split into three main concerns:

  • Message recording and formatting.  This is where you decide what should go in the log and how to format it.
  • Log appender.  This is, as I’ve already mentioned, the part of the operation that decides where the messages go and how they get there.
  • Log consumption.  This can range from someone simply inspecting the logs to sophisticated search, intelligence, and visualization.

Even in the simplest logging implementation, all of these things actually take place.  For instance, consider this pseudocode:

public void keepTrackOfSomething() {
     _file.write("This method doesn't seem super-useful", "log.txt", File.Append);
}

Let’s see how our three concerns apply.

  • Recording and formatting is just creating the string “This method doesn’t seem super-useful.”
  • The appending is a simple file write in append mode to “log.txt”.
  • Consumption happens later when someone scans through log.txt and reads the message.
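In a real logging framework, that second concern becomes an explicit, configurable component. As a rough sketch of the idea (using the JDK’s built-in java.util.logging, where a Handler plays the appender role; the class and message names here are illustrative):

import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class AppenderDemo {
    public static void main(String[] args) throws IOException {
        Logger logger = Logger.getLogger(AppenderDemo.class.getName());

        // The handler decides where messages go: here, appending to log.txt.
        FileHandler fileHandler = new FileHandler("log.txt", true); // true = append mode
        fileHandler.setFormatter(new SimpleFormatter());            // the formatting concern
        logger.addHandler(fileHandler);

        // The recording concern: create the message and hand it off.
        logger.info("This method doesn't seem super-useful");
    }
}

Swapping FileHandler for a different handler (console, socket, or something fancier) changes where messages end up without touching the code that records them. That separation is the whole point of the appender concept.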

Read More

What Goes Into Log Analysis?

I’ve talked here before about log management in some detail.  And I’ve talked about log analysis in high-level terms when making the case for its ROI.  But I haven’t gone into a ton of detail about log analysis.  Let’s do that today.

At the surface level, this might seem a little indulgent.  What’s so hard?  You take a log file and you analyze it, right?

Well, sure, but what does that mean, exactly?  Do you, as a human, SSH into some server, open a gigantic server log file, and start thumbing through it like a newspaper?  If I had to guess, I’d say probably not.  It’s going to be some interleaving of tooling, human intelligence, and heuristics.  So let’s get a little more specific about what that looks like, exactly.

Log Analysis, In the Broadest Terms

In the rest of this post, I’ll explain some of the most important elements of log analysis.  But, before I do that, I want to give you a very broad working definition.

Log analysis is the process of turning your log files into data and then making intelligent decisions based on that data.

It sounds simple in principle.  But it’s pretty involved in practice.  Your production operations generate all sorts of logs: server logs, OS logs, application logs, etc.  You need to take these things, gather them up, treat them as data, and make sense of them somehow.  And it doesn’t help that log files contain some of the most unstructured, noisy data imaginable.

So log analysis takes you from “unstructured and noisy” to “ready to make good decisions.”  Let’s see how that happens.
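To make “turning log files into data” a little more concrete, here’s a minimal sketch (in Java, purely for illustration) that pulls structured fields out of one unstructured access-log line. The regex and field names are assumptions for the sake of the example; real-world formats vary widely.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {

    // A simplified pattern for one common web server log layout.
    private static final Pattern ENTRY = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]+)\" (\\d{3}) (\\S+)");

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] \"GET /index.html HTTP/1.0\" 200 2326";

        Matcher matcher = ENTRY.matcher(line);
        if (matcher.find()) {
            // The unstructured line becomes named, queryable fields.
            System.out.println("ip      = " + matcher.group(1));
            System.out.println("time    = " + matcher.group(2));
            System.out.println("request = " + matcher.group(3));
            System.out.println("status  = " + matcher.group(4));
            System.out.println("bytes   = " + matcher.group(5));
        }
    }
}

Real log analysis pipelines do this at scale, across every log source, before the “make intelligent decisions” part can even begin.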


Read More

Search Your Files with Grep and Regex

How do you search through a file?  On the surface, this might seem like sort of a silly question.  But somewhere between the common-sense answer for many (“double click it and start reading!”) and the heavily technical (“command line text grep regex”) lies an interesting set of questions.

  • Where does this file reside?
  • What kind of file is it?
  • How big is the file?
  • What, exactly, are you looking for in the file?

Today, we’re going to look at one of the most versatile ways to search a file: using grep and regex (short for regular expression).  Using this combination of tools, you can search files of any sort and size.  You can also search with extremely limited access to your environment, and if you get creative, you can find just about anything.

But with that versatility comes a bit of a learning curve.  So let’s look at how to take the edge off of that and get you familiar with this file search technique.  To do that, I’ll walk through a hypothetical example of trying to extract some information.  First, though, let’s cover a bit of background.
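Just to whet your appetite (and purely as an illustrative sketch; the filename here is made up), a single grep command can already answer a pointed question:

grep -E "ERROR|WARN" server.log

That one line scans server.log and prints every line containing either ERROR or WARN. No opening the file, no scrolling, no double-clicking. We’ll build up to more interesting variations in the walkthrough that follows.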


Read More

Logging Levels: What They Are and How They Help You

Logging levels probably aren’t the most exciting thing in this world.  But then again, neither is banking.  And yet both things are fundamental to the people who use them as a tool.

Application logging is one of the most important things you can do in your code when it comes to facilitating production support.  Your log files serve as a sort of archaeological record of what on earth your codebase did in production.  Each entry in a log file has important information, including a time stamp, contextual information, and a message.  Oh—and generally, something called a logging level.

So what are logging levels?

Well, put as simply as possible, logging levels are a means of categorizing the entries in your log file.  But they categorize in a very specific way—by urgency.  At a glance, the logging level lets you separate the following kinds of information:

  • Hey, someone might find this interesting: we just got our fourth user named Bill.
  • Uh-oh, that database call failed twice before succeeding; someone should look into it.
  • RED ALERT!  The application is down and customers are staring at error pages!


For the most part, this distinction helps in two ways.  First, you can filter your log files this way during search.  And second, you can control the amount of information that you log.  But we’ll get to that in a bit.

Logging Levels: Why Do We Do It?

When it comes to logging, you have two essential and opposing forces, if you will.  On the one hand, you want to capture every last detail you can because this might prove useful during troubleshooting or auditing your system.  But on the other hand, all of that logging consumes resources. You can eat up disk space, overload people reading the logs, and even start to slow down your production code if you go overboard.

So logging requires either a balance or a way to get both the proverbial signal and the noise.  And logging levels look to help with this by making your situation configurable on the fly.

Here’s a helpful metaphor, perhaps.  Imagine you have an android friend and you ask him about what he did today.

“Well, I woke up, inhaled, moved my nose three inches to the right—”

“No!  Too much information.  Go from granularity 10 down to 3.”

“I woke up, had breakfast, caught a taxi…”

“THAT’s better.”

Usually, you don’t need to know everything the android did.  But if he starts malfunctioning, you might want to go back to “granularity 10” to zoom in on the problem as narrowly as possible.

Logging levels work this way.  Often you’ll only want to know if there are problems or warning conditions.  But sometimes you’ll really want to dial up the information for troubleshooting purposes.
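If you want to see that dial in code form, here’s a minimal sketch using the JDK’s built-in java.util.logging levels (the class name and messages are just our android friend’s, for fun):

import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class AndroidDiary {
    public static void main(String[] args) {
        Logger logger = Logger.getLogger(AndroidDiary.class.getName());
        ConsoleHandler handler = new ConsoleHandler();

        // Dial the granularity here: INFO today, FINE when the android malfunctions.
        Level granularity = Level.INFO;
        logger.setLevel(granularity);
        handler.setLevel(granularity);
        logger.addHandler(handler);
        logger.setUseParentHandlers(false); // avoid duplicate output via the root handler

        logger.fine("Moved my nose three inches to the right."); // suppressed at INFO
        logger.info("Woke up, had breakfast, caught a taxi.");   // logged
        logger.warning("The taxi seems to be going the wrong way."); // logged
        logger.severe("I appear to be malfunctioning.");          // logged
    }
}

Flip granularity to Level.FINE and the nose-movement minutiae come flooding back, without touching any of the log statements themselves.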

Read More

Get Started Quickly With C# Logging

If you’re interested in getting started with C# logging as quickly as humanly possible, you’ve come to the right place.  Today, we’ll look at just how to do that.  Don’t worry, we’ll get into comprehensive detail too, covering a number of things.  Some of those include

  • What is logging?
  • Why would you want to log?

But for now, let’s get to the action.  How can you start logging in the absolute quickest and simplest way possible?  Let’s take a look.


The Simplest C# Logging That Could Possibly Work

Let’s start with a clean slate for the sake of easy demonstration.  First, open Visual Studio and create a new console project by selecting File->New->Project, as shown here.

To get started with C# logging, create a console project, as shown here.

I’ve called my new project ConsoleProject and opted to put it into a new solution called LoggerPlaypen.  No need to congratulate me on my creativity, though.

This will open the Program.cs file containing your Program class.  Let’s now add a few lines of code to that class, as follows:

static void Main(string[] args)
{
    var displayMessage = "Hello, user!";
    Console.WriteLine(displayMessage);
    Console.ReadLine();
}
We now have an application that successfully defines a greeting, displays it, and waits for an enter keystroke to dismiss it.  When you hit F5 to run, you’ll see the following message before pressing enter:

Here is the console output, "Hello, user!"

But let’s say that we wanted not only to display this message but also to log it to a file.  How might we do that?  With something fancy, like a logging framework, inversion of control, or aspect-oriented programming?  Nah, let’s not get ahead of ourselves.  Try this instead, making sure to add “using System.IO;” at the top with the other using statements:

static void Main(string[] args)
{
    var displayMessage = "Hello, user!";
    Console.WriteLine(displayMessage);

    File.WriteAllText("log.txt", displayMessage);
    Console.ReadLine();
}


Now, when you hit F5, you’ll see your greeting message once again.  But that extra line you’ve added will create a log file for you.  Go to your project directory, and then navigate to the Bin->Debug folder. You’ll see a file hanging out there in addition to your console executable.

Here's the file in your Debug directory, plus a shot of the file's contents, "Hello, user!"

Is That Really It?

And…that’s it.  You just successfully implemented your first C# logger!

Is that all you need to know about logging?  Of course not.  Will this logging scheme be sufficient for any but the most trivial applications?  Nope.  But have you learned the barest essential backbone of C# logging?  Absolutely.

I told you it’d be a quick start.  But I strongly recommend that you read on to really get a handle on this logging thing.

Read More

Surprising Use Cases for Log Visualization

People commonly say that a picture is worth a thousand words.  So I wonder if log visualization is worth a thousand log entries.  The math equivalency might be a little hard to prove, but the idea is worth exploring.

You’re recording all sorts of information in your log files, but are you visualizing that information?

Do you have dashboards and graphs that help you picture production behavior?  Or does the information sit buried within digital mountains of arcane strings?  The proverbial needle in the haystack?  Does anyone who wants to use it need to engage in laborious searches?

If you’re not visualizing your logs, you’re missing out.  But I don’t necessarily want to make the case for visualization today.  Instead, I’d like to offer some use cases for log visualization that you perhaps hadn’t considered.

Prerequisites for Log Visualization

First things first, though.  Before I can take you through the use cases, you need to have a setup that allows log visualization.  Specifically, you need modern log file management, which includes the following things that concern us:

  • Log aggregation (gathering and storing log files).
  • Meaningful parsing of the data contained in the log files.
  • Powerful search capabilities.
  • Log visualization tools.

Don’t get me wrong.  You could implement all of this stuff yourself.  But then again, you could also implement your own source control and text editor.  Just because you can doesn’t mean you should.

If you want to get serious about visualizing log files without burning tons of time, you need tooling and infrastructure in place to help you.  With that in place, let’s take a look at some of the things you might do.

Read More