Five To-Dos When Monitoring Your Kubernetes Environment

If you’re on the DevOps front line, Kubernetes is fast becoming an essential element of your production cloud environment. Since container orchestration is critical to deploying, scaling, and managing your containerized applications, monitoring Kubernetes needs to be a big part of your monitoring strategy.

Container environments don’t operate like traditional ones. So, if you are monitoring your applications and infrastructure, you need to be thoughtful about how you monitor the container environment in which they run. Here are five best practices to inform your strategy:

  1. Centralize your logs and metrics. Orchestrating your containerized services and workloads through Kubernetes brings order to the chaos, but remember that your environment is still decentralized. You will give yourself a fighting chance if you centralize your logs and metrics.
  2. Account for ephemeral containers. The beauty of container orchestration is that it’s easy to start, stop, kill, and clean up your containers in short order. Monitoring them, however, may not be so easy. You still need to debug problems and monitor cluster activity, even when services are coming and going. The trick is to grab the logs and metrics before they’re gone. If you don’t, your metrics will look more like the graph on the left than the one on the right.
    [Figure: log file examples for transient containers]
  3. Simplify, simplify, simplify. With all of the moving pieces in your container environment (services, APIs, containers, and the orchestration tool), you need to monitor without introducing unneeded complexity. Rather than bloating your containers with various monitoring agents, each requiring updates on its own schedule, decouple your monitoring and management tools from what you’re monitoring and managing. This will also help your engineers focus on building and delivering software, not operating the delivery platform.
  4. Monitor each layer explicitly. You will need to collect logs and monitor for errors, failures, and performance issues at each layer – the pod, the container, and the controller manager – of your environment. For example, you’ll need to be able to troubleshoot pod issues, ensure the container is working, and collect runtime metrics in the controller manager.
  5. Ensure data consistency across layers. For fast, accurate debugging, you need to ensure data consistency across all the layers in your container environment. Things like accurate timestamps, consistent units of measurement (such as milliseconds vs. seconds), and collecting a common set of metrics and logs across applications and components will help you troubleshoot and debug quickly and accurately across all of your layers.
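
To make the consistency point in item 5 concrete, here is a minimal Python sketch of normalizing records from different layers into one shape before they reach a central store. The input keys (`ts`, `duration`, `unit`, `message`) and the `normalize` helper are hypothetical, chosen only for illustration; a real agent will have its own schema.

```python
from datetime import datetime, timezone

# Multipliers for converting a duration in the given unit to milliseconds.
_TO_MS = {"s": 1000.0, "ms": 1.0, "us": 0.001}


def normalize(record: dict, layer: str) -> dict:
    """Return a record in a common shape: UTC timestamp, milliseconds.

    Hypothetical input keys: "ts" (epoch seconds as a number, or an
    ISO 8601 string), "duration", "unit" ("s", "ms", or "us"), "message".
    """
    ts = record["ts"]
    if isinstance(ts, (int, float)):
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
    else:
        when = datetime.fromisoformat(ts)
        if when.tzinfo is None:
            # Assumption: timestamps without an offset are already UTC.
            when = when.replace(tzinfo=timezone.utc)
        when = when.astimezone(timezone.utc)
    return {
        "layer": layer,  # e.g. "pod", "container", "controller-manager"
        "timestamp": when.isoformat(),
        "duration_ms": record["duration"] * _TO_MS[record["unit"]],
        "message": record.get("message", ""),
    }


# Two layers reporting the same moment in different formats and units.
pod_event = normalize(
    {"ts": 1519898400, "duration": 0.25, "unit": "s", "message": "probe ok"},
    layer="pod",
)
container_event = normalize(
    {"ts": "2018-03-01T12:00:00+02:00", "duration": 250, "unit": "ms"},
    layer="container",
)
```

After normalization, both events carry the same UTC timestamp and the same `duration_ms`, so they can be lined up side by side when debugging across layers.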

One best practice for accomplishing these to-dos in a simple, straightforward manner is to monitor the containers in your Kubernetes environment without touching your application containers. Do this by introducing a DaemonSet, or alternatively a sidecar, into your Kubernetes environment(s) that sits alongside your containerized services and includes your logging and metrics collection agent. Deploying this way will ensure consistent data collection, minimize the changes required to your application containers, and most importantly, eliminate the possibility of selective blindness in your production environment.
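
As a rough illustration of the DaemonSet approach, the Python sketch below builds a bare-bones DaemonSet manifest as a dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The agent name, the image `example.com/log-agent:1.0`, and the `log_agent_daemonset` helper are placeholders rather than a real agent or vendor manifest, and the sketch assumes the `apps/v1` API and that node logs live under `/var/log`.

```python
import json


def log_agent_daemonset(name: str, image: str, namespace: str = "default") -> dict:
    """Build a minimal DaemonSet manifest for a node-level collection agent."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            # The selector must match the pod template's labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Read node logs without modifying application containers.
                            "volumeMounts": [
                                {"name": "varlog", "mountPath": "/var/log", "readOnly": True}
                            ],
                        }
                    ],
                    # Assumption: logs are on the node's /var/log directory.
                    "volumes": [{"name": "varlog", "hostPath": {"path": "/var/log"}}],
                },
            },
        },
    }


# Placeholder name and image; substitute the agent you actually run.
manifest = log_agent_daemonset("log-agent", "example.com/log-agent:1.0")

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

Because a DaemonSet schedules one pod per node, the agent sees every container on that node, including short-lived ones, without any change to the application images. A production manifest would typically add credentials, resource limits, and whatever extra mounts the chosen agent documents.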

A few ways to implement this include:

  • Introduce a DaemonSet with the Fluentd logging agent (this will give you logging but not metrics). If you already have an ELK cluster configured, this is probably the option for you. Learn more here.
  • Introduce a DaemonSet or sidecar with the Prometheus metrics agent (CoreOS has done an excellent job of integrating Prometheus and Kubernetes). Running Prometheus on your Kubernetes cluster will give you metrics instrumentation, querying, and alerting. Learn more here.
  • A variety of metrics and performance monitoring tools, including Heapster, DataDog, cAdvisor, New Relic, Weave/VMware, and several others, also offer DaemonSet or sidecar options for Kubernetes monitoring.
  • Scalyr, log management for the DevOps front line, has a preconfigured DaemonSet containing the open-source Scalyr agent available for download and use. The Scalyr DaemonSet natively supports both Kubernetes logging and metrics. You can download the YAML file for deploying the containerized Scalyr agent from GitHub here. Note that you can also download the full open-source Scalyr agent from GitHub here.


Scalyr Platform: Kubernetes Monitoring, Performance, and Usability

Our Scalyr platform releases over the past month have focused on Kubernetes monitoring, query performance, and making improvements to usability.

Kubernetes Monitoring

We have added Kubernetes monitoring to our agent. We recommend running it as a DaemonSet on your cluster for efficiency and minimal disruption. Find the new Scalyr agent on GitHub, and don’t forget to download our Kubernetes monitoring best practices document.

Query Performance Hits New Benchmark – 1.5 TB/second

We have continued to optimize for performance, leading to a new query throughput benchmark. Our streamlined database architecture, combined with the brute-force technique of applying every core in our cluster to every user query, helped us surpass the 1.5 TB/second mark – up from 1 TB/second late last year. Last month, we made a number of improvements, including to how we load data from disk, manage concurrent queries, and map data to RAM cache pools.

User and Group APIs

We have added APIs to manage granular user and group permissions. These include adding, listing, editing, revoking access, and providing permissions to users, groups, and users within groups. Learn more in our API documentation.

Billing and Usage Page

We made a number of usability improvements last month, the most notable of which is our revamped billing and usage page, which provides at-a-glance information for cost management. Find the Billing and Usage page in the top-right dropdown ( > Billing Plan).

Going Forward

We are developing Scalyr with the DevOps front line in mind, and with a focus on our three value pillars – fast, simple, and shareable. The next several releases will focus on the simple part of that equation and include such improvements as making export to Amazon S3 buckets easy and revamping our alerting capability.


Your product (or any) feedback is always welcome. Please reach out to us at