Introducing deeper support for Kubernetes, collaboration features, and stack trace linking for easier troubleshooting

As the move to the cloud and containers continues to make software delivery faster and easier, environments are getting more complex. At Scalyr, we believe that observability solutions need to help engineering and operations teams get access to all the data they need, fast. Along those lines, we are announcing new features to help teams support the latest container and orchestration technologies, improve collaboration, and streamline workflows for faster and easier issue identification and resolution.

Kubernetes Cluster-Level Logging

Our new Kubernetes cluster-level logging enables engineering teams to effectively monitor and troubleshoot Kubernetes environments by centralizing and visualizing logs by deployment. Regardless of source, Scalyr intelligently ingests, parses and organizes logs to give developers an application-level view for faster issue resolution in complex container environments.

With cluster-level logging, users are presented with log summaries by deployment rather than by each individual container. The automatic grouping of logs pertaining to a deployment gives developers a more holistic view of each application or service that may be running on multiple containers and pods. Insight into individual pods and nodes is also available, but that level of detail is abstracted by default so developers can focus on their application right away.

Chart Annotations

For many of the complex software issues faced by engineers, increased collaboration is key to finding solutions quickly. To improve collaboration between engineering and operations and between different engineering teams, our new chart annotations provide a way for users to call attention to specific points or windows of time with markers and customizable notes. Annotations can be manually added to dashboard charts to highlight potential issues and shared with any team member. This improves communication and productivity among engineering team members, giving them additional context to more quickly home in on the specific logs related to the problem at hand.

Slack Integration

We’ve extended our integration with Slack to provide more native interaction within the Scalyr UI. When viewing a chart, users can select “Share with Slack” from the Share dropdown menu and immediately send the chart to another user or channel in Slack.

Stack Trace Linking

Scalyr now makes it possible to jump directly from log events with stack traces into the referenced source code in your repositories. This streamlines your debugging workflow, making it faster and simpler to get under the hood, make tweaks, test hypotheses, and ultimately solve problems. Investigating exceptions is now as easy as a single click. Scalyr supports any web-accessible repository, such as GitHub.

CloudWatch Support

We’ve improved our support for AWS CloudWatch to provide a simpler and more reliable way to import AWS logs. By importing your AWS logs into Scalyr, you’ll get a centralized and more holistic view of all of your services, including serverless AWS Lambda functions and other AWS services.

We will be releasing these new features in the coming weeks. If you’ll be at AWS re:Invent 2018, be sure to swing by Booth 1014 to catch a demo.

Five To-Dos When Monitoring Your Kubernetes Environment

If you’re on the DevOps front line, Kubernetes is fast becoming an essential element of your production cloud environment. Since container orchestration is critical to deploying, scaling, and managing your containerized applications, monitoring Kubernetes needs to be a big part of your monitoring strategy.

Container environments don’t operate like traditional ones. So, if you are monitoring your applications and infrastructure, you need to be thoughtful about how you monitor the container environment in which they run. Here are five best practices to inform your strategy:

  1. Centralize your logs and metrics. Orchestrating your containerized services and workloads through Kubernetes brings order to the chaos, but remember that your environment is still decentralized. You will give yourself a fighting chance if you centralize your logs and metrics.
  2. Account for ephemeral containers. The beauty of container orchestration is that it’s easy to start, stop, kill, and clean up your containers in short order. However, monitoring them may not be so easy. You still need to debug problems and monitor cluster activity, even when services are coming and going. The trick is to grab the logs and metrics before they’re gone; if you don’t, your metrics and logs will be riddled with gaps wherever short-lived containers came and went.
  3. Simplify, simplify, simplify. With all of the moving pieces in your container environment (services, APIs, containers, orchestration tool), you need to monitor without introducing unneeded complexity. Rather than bloating your container with various monitoring agents, each requiring updates on unique schedules, abstract your monitoring and management tools from what you’re monitoring and managing. This will also help your engineers focus on building and delivering software, not operating the delivery platform.
  4. Monitor each layer explicitly. You will need to collect logs and monitor for errors, failures, and performance issues at each layer – the pod, the container, and the controller manager – of your environment. For example, you’ll need to be able to troubleshoot pod issues, ensure the container is working, and collect runtime metrics in the controller manager.
  5. Ensure data consistency across layers. For fast, accurate debugging, you need to ensure data consistency across all the layers in your container environment. Accurate timestamps, consistent units of measurement (milliseconds vs. seconds, for example), and a common set of metrics and logs collected across applications and components will help you troubleshoot and debug quickly and accurately at every layer; see the labeling sketch after this list.
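To make the data-consistency to-do concrete, here is a minimal sketch of one common approach: apply the same small set of labels to every workload so that the logs and metrics your agents collect can be joined on shared metadata. The names used here (checkout-api, team: payments) are hypothetical placeholders, not part of any specific product’s configuration.

    # Hypothetical Deployment illustrating a shared labeling convention.
    # The same labels appear on the Deployment and on its pod template,
    # so any agent that reads pod metadata can stamp logs and metrics
    # with identical app/team/version fields.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: checkout-api
      labels:
        app: checkout-api
        team: payments
        version: "1.4.2"
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: checkout-api
      template:
        metadata:
          labels:
            app: checkout-api
            team: payments
            version: "1.4.2"
        spec:
          containers:
            - name: checkout-api
              image: example.com/checkout-api:1.4.2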

One best practice for accomplishing these to-dos in a simple, straightforward manner is to monitor the containers in your Kubernetes environment without touching your application containers. Do this by introducing a DaemonSet (or, alternatively, a sidecar) that sits alongside your containerized services in your Kubernetes environment(s) and runs your logging and metrics collection agent. Deploying this way will ensure consistent data collection, minimize the changes required to your application containers, and, most importantly, eliminate the possibility of selective blindness in your production environment.
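As a rough illustration of the DaemonSet pattern described above, the sketch below runs one agent pod per node and mounts the node’s container log directories so the agent can collect logs without any changes to the application containers. The image name, namespace, and host paths are placeholders; adapt them to whichever agent you choose.

    # Sketch of a per-node agent DaemonSet; image, namespace, and paths are placeholders.
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: log-agent
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          app: log-agent
      template:
        metadata:
          labels:
            app: log-agent
        spec:
          containers:
            - name: log-agent
              image: example.com/log-agent:2.0   # placeholder agent image
              volumeMounts:
                - name: varlog
                  mountPath: /var/log
                  readOnly: true
                - name: docker-containers
                  mountPath: /var/lib/docker/containers
                  readOnly: true
          volumes:
            - name: varlog
              hostPath:
                path: /var/log
            - name: docker-containers
              hostPath:
                path: /var/lib/docker/containers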

A few ways to implement this include:

  • Introduce a DaemonSet with the Fluentd logging agent (this will give you logging but not metrics). If you already have an ELK cluster configured, this is probably the option for you. Learn more here.
  • Introduce a DaemonSet or sidecar with the Prometheus metrics agent (CoreOS has done an excellent job of integrating Prometheus and Kubernetes). Running Prometheus on your Kubernetes cluster will give you metrics instrumentation, querying, and alerting; a sketch of a typical scrape configuration follows this list. Learn more here.
  • A variety of metrics and performance monitoring tools, including Heapster, DataDog, cAdvisor, New Relic, Weave/VMware, and several others, also offer DaemonSet or sidecar options for Kubernetes monitoring.
  • Scalyr, log management for the DevOps front line, has a preconfigured DaemonSet containing the open source Scalyr agent available for download and use. The Scalyr DaemonSet natively supports both Kubernetes logging and metrics. You can download the YAML file for deploying the containerized Scalyr agent from GitHub here. Note that you can also download the full open-source Scalyr agent from GitHub here.
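If you take the Prometheus route mentioned above, metrics collection is typically driven by a scrape configuration that discovers targets through the Kubernetes API. The fragment below is an illustrative (not complete) prometheus.yml excerpt: it keeps only pods that opt in via the prometheus.io/scrape annotation and carries the namespace and pod name over as labels, which also supports the data-consistency goal from the to-do list.

    # Illustrative prometheus.yml fragment using Kubernetes service discovery.
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Scrape only pods annotated with prometheus.io/scrape: "true".
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
          # Preserve namespace and pod name as query-time labels.
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod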