Dataset

Observability Trends and Dogfooding Products

Maggie Liu — Tue, 30 May 2023 18:54:22 +0000

“Our dependency tree keeps getting bigger, and each dependency is emitting more logs. The extent of those logs not following a common schema radically impacts their usability”

Distinguished engineer John Hart of the Event DB team at DataSet joins Lee Atchison to talk about observability trends, machine learning, dogfooding DataSet, and more.

Listen to the full podcast episode here on Software Engineering Daily. Read on for a Q&A summary and highlights of the conversation:

Q: What does DataSet do?

JH: DataSet is a unified source for server observability, trace, logs, and metrics. There are many solutions that address each of these in isolation, but DataSet brings them all together. The ability to click from a metric-based “latency is high” alert to the tracing spans that show those operations in context, all the way to the individual line-level application logs, without having to leave the tool, is super important. We sometimes refer to this as “MTTWH” – mean time to “what the heck?” (although sometimes we use a different final character…) Having to switch tools for different levels of detail is needless friction.

You can go the other direction as well – from looking at an application log that contains a numeric value, it’s just one click to chart that over time and another click to create an alert or dashboard based on it.

Q: Sounds like a lot of data.

JH: For sure, and that’s not trending down anytime soon. Kubernetes control-plane data is verbose just by itself, and that’s before you get to the actual workload logs that you care about. DataSet’s architecture is fairly unique as far as I know – we avoid global indexes that would write-amplify our data and we separate compute from storage. Competing solutions that are based in the Solr/Lucene/Elastic document-indexing world depend on locally-attached storage, and therefore must scale their compute linearly with data. In other words, moving from 1 month to 1 year of storage would mean lighting up 12x the amount of compute (each with its own locally-attached storage).

DataSet separates compute from storage, so most of our cost is determined by daily volume rather than total-data-stored. This enables some cool features like pay-per-query for historical data, which lets customers leave their data in our system at very low cost for as long as they’d like. Because our at-rest format is columnar, we get great compression rates and it can actually be cheaper to leave data in DataSet than to keep it directly in your own storage system in standard row-major format. Plus you get the ergonomics of the entire tool, without having to manage hot/cold/glacier storage migrations.
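To make the scaling argument concrete, here is a back-of-the-envelope sketch in Python. The workload figures (1 TB per day, 10 TB of local disk per node) and the function name are illustrative assumptions, not DataSet or Elastic numbers, and a year is rounded to twelve 30-day months. It only shows how node count grows with retention when each compute node has to hold its own share of the data.

```python
import math

def coupled_nodes(daily_tb: float, retention_days: int, tb_per_node: float) -> int:
    """Nodes needed when every compute node stores its share of the data locally."""
    return math.ceil(daily_tb * retention_days / tb_per_node)

# Hypothetical workload: 1 TB/day ingested, 10 TB of local disk per node.
one_month = coupled_nodes(1.0, 30, 10.0)
one_year = coupled_nodes(1.0, 360, 10.0)  # twelve 30-day months, for simplicity
print(one_month, one_year, one_year // one_month)  # → 3 36 12
```

With compute and storage separated, the retained volume mostly affects the storage bill, which is why cost can track daily volume instead.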

Q: How has the log analytics space changed dramatically?

JH: With ever-increasing volume comes a need for a standardized view of your data, so I’d shout out to OpenTelemetry (and its predecessors) as maybe the biggest revolution in the observability space over the past decade. Everyone’s dependency trees keep getting bigger, and each dependency is emitting more logs. The extent of those logs not following a common schema radically impacts their usability. The standardized approach of OpenTelemetry really helps engineers operate third party systems reliably without having to become an expert in those systems’ logfile formats.

Q: Can you use machine learning for logs to find larger or time consuming patterns?

JH: For sure, and we’re putting a lot of effort into this. DataSet recently added anomaly detection, so the system can detect spikes/gaps without needing manual thresholds. I think we’ve all been through the cycle of creating an alert and then tuning it over days/weeks to eliminate false positives while still flagging actual problems … it’s annoying, time-consuming, and can be difficult to get right for data with built-in seasonality, diurnal/nocturnal patterns, etc. This is the type of thing that ML excels at, so it’s nice to offload that problem to the computer.
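As a rough illustration of why seasonality defeats a single manual threshold, the sketch below compares each point only with earlier values from the same position in the cycle (for example, the same hour on previous days). This is a simple seasonal z-score written for this summary, not DataSet's anomaly-detection algorithm; the function name, parameters, and sample data are all hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(values, period=24, min_history=3, z_threshold=3.0):
    """Flag indexes that deviate from earlier values in the same cycle slot."""
    anomalies = []
    for i, value in enumerate(values):
        history = values[i % period:i:period]  # same slot in previous cycles
        if len(history) < min_history:
            continue  # not enough past cycles to judge this point
        mu, sigma = mean(history), stdev(history)
        if abs(value - mu) > z_threshold * max(sigma, 1e-9):
            anomalies.append(i)
    return anomalies

# Hypothetical metric with a 4-sample "day" that peaks at 30-31 every cycle.
values = [base + (day % 2) for day in range(5) for base in (10, 12, 20, 30)]
values[17] = 60  # inject a spike into a normally quiet slot
print(find_anomalies(values, period=4))  # → [17]
```

A fixed threshold low enough to catch the spike in a quiet slot would also fire on every regular daily peak, which is the tuning cycle described above.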

Q: How is your team dogfooding product at DataSet?

JH: That might be my favorite part of this job – we use DataSet constantly in the development and operation of DataSet. Any new feature we are developing is a feature we ourselves benefit from. That’s about as tight of a feedback loop as you can get between your code & your tools, unless you’re coding a text editor.

DataSet is hosted and multitenant, typically with just one cluster per geographic region. In our largest cluster, this means a single query will fan out to tens of thousands of CPU cores, all acting in concert to search TBs or PBs of data as quickly as possible. To run clusters of this size we absolutely depend on DataSet to monitor them, for steady-state operation (alerts and dashboards) as well as ad-hoc queries, debugging, reasoning about the system … we couldn’t do it without DataSet.

Fun Fact: John was one of DataSet’s first ten customers (back when it was called Scalyr). He came across this blog post written by Scalyr’s founder, Steve Newman, and after trying Scalyr found it much preferable to Splunk. John liked the product so much he became an early employee of DataSet and now runs the database team, which powers DataSet as well as SentinelOne’s security products.

Get Started with DataSet for Free

DataSet is a modern log analytics platform that helps DevOps, IT engineering, and security teams get answers from their data across all time periods, both live streaming and historical. It’s powered by a unique architecture that uses a massively parallel query engine to provide actionable insights from the data available.

Get started with our 30-day free trial here.

The post Observability Trends and Dogfooding Products appeared first on Dataset.

Unleash the Power of Modern Log Analytics

Maggie Liu — Fri, 19 May 2023 18:40:31 +0000

Our latest eBook covers the importance of event logging in cloud-native applications. Learn how logging works, why it matters, and how it improves your business and end-user experience.

Our latest eBook covers the basics of log analytics, including different types of log data and how to collect, store, and analyze them. If you are currently using an event data solution or looking for alternatives, consider new ways to better manage logs for your entire team.

Do you want to understand log analytics and how it can be used to optimize system and application performance, detect and triage issues, and improve user experience? In this eBook we answer common questions such as:

Why do you need log aggregation?
What makes an effective logging strategy and how do you start?
How can you centralize logs to improve developer productivity?

Download the Modern Logging and Analytics eBook today and start unlocking the full potential of your event data.

Building a Great Logging Strategy

Quickly track down and understand issues that arise within an application. Understanding what data to log and how to log it will make this process significantly easier. Whether you are just getting started or using an alternative logging solution, see the DataSet difference in our Get More Value Out of Your Splunk Investment and Replace ELK Stack whitepapers when evaluating the best solution.

DataSet, SentinelOne at KubeCon Europe 2023

Maggie Liu — Fri, 14 Apr 2023 20:59:56 +0000

DataSet and SentinelOne are proud sponsors of KubeCon + CloudNativeCon Europe this year. Join us in Amsterdam, Netherlands on April 18-21, 2023, to be a part of the biggest event for Kubernetes! We hope to see you in Amsterdam – visit the booth, meet the team, and talk about all things related to Kubernetes from monitoring to security.

Visit Us at the Event

Where: RAI Convention Centre in Amsterdam, NL

When: April 18-21, 2023

Booth: S47

Find us at SentinelOne booth S47 where we will share interactive product demos and you will even have a chance to participate in exciting giveaways!

Register Now

How SentinelOne Helps Kubernetes Security

Check out our guide Defending Cloud-Based Workloads to learn more about implementing Kubernetes (K8s) security. Our Guide to Kubernetes Troubleshooting also highlights how to handle monitoring with K8s. If you can’t make the event, but are looking to learn more – we invite you to try a personalized demo with our team for your use case!

DataSet at SREcon 2023 Americas

Maggie Liu — Mon, 13 Mar 2023 17:52:23 +0000

DataSet is proud to be a sponsor of SREcon Americas this year. Join us in Santa Clara, CA on March 21-23, 2023, to be a part of the biggest event for SREs! The conference is a premier gathering of engineers who are passionate about site reliability, systems engineering, and working with complex distributed systems at scale.

Where to Find Us at SREcon

Where: Santa Clara, CA
When: March 21-23, 2023
Booth: #K5

Our team will be here to share product demos and you will even have a chance to win a Bose Bluetooth Speaker!

Register Now

How DataSet Helps SREs

DataSet is a fully managed SaaS platform that has minimal downtime and maintenance. Site reliability engineers need to scale fast without worrying about high operational overhead, managing large data volumes, and high MTTR and MTTD.

DataSet provides SREs and DevOps engineers with a single and unified log monitoring tool that replaces disparate workflows. Instead of dealing with fragmented tools that complicate log analytics, your team can aggregate multiple server logs, monitor and analyze them, set custom log alerts, and create custom dashboards. With the help of DataSet, achieve unmatched scale and real-time visibility across the entire software environment – all while experiencing a low total cost of ownership.

Can’t make the event? We also invite you to try a personalized demo with our team.

Scaling Beyond Elasticsearch

Maggie Liu — Fri, 03 Mar 2023 22:39:14 +0000

According to IDC, the amount of data created in the next three years is estimated to be greater than that generated in the past thirty. With this data explosion comes the need to efficiently manage, adapt to, and scale with the growing volume. Many customers opt for ELK for its cost savings, scalability, and open source benefits – however, it is important to understand the pitfalls of maintaining legacy tools and consider the advantages of other solutions.

We recently hosted a webinar diving into the challenges with Elasticsearch and how DataSet differentiates itself in log management with Dave Gold, Field CTO of SentinelOne, and Anthony Johnson, Field CTO of DataSet. If you missed the webinar, watch the on-demand recording here.

Here are some key takeaways:

Main Challenges with Legacy and On-Premise Tools

Cannot support the scale and speed requirements of modern architectures such as Kubernetes and microservices
Relies heavily on keyword indexing, batch, and query
Autoscaling data is hard – compute and storage are tightly coupled
Open source doesn’t necessarily mean free; consider operational costs from the beginning

ELK Hidden Costs

There are hidden costs when it comes to maintaining hardware. This can lead ELK deployments to be more expensive and time-consuming than suggested:

Sharding, which divides indexes across nodes, is required. Too many shards on a single node can increase latency and storage usage and make it more difficult to scale.
While it is free and open to modifications, ELK requires heavy lifting in the form of additional paid services, support, and features. This increases operational overhead requiring developers to manage, maintain, and deploy updates.
Managing the infrastructure for ELK requires additional setup work in making sure the data is backed up and protected with the right type of hardware that can scale large volumes of data.

Comparing Scale and Query Data

Using Logstash generators, watch how DataSet stacks up against ELK when handling one, five, and even up to forty generators. You can see the whole demo in the webinar linked here. See how much more data (in TB) DataSet ingests, and how much faster and more easily, over different retention periods compared with ELK. It is clear that DataSet differentiates itself in:

Achieving faster ingestion and query speed, taking seconds at a petabyte scale
Scaling efficiently and autoscaling without the need to rebalance nodes, manage storage, or allocate resources.
Reducing operational overhead and total cost of ownership

The DataSet Difference:

Live Data – Schema-less ingestion and index-free architecture means that data shows up in real time, scaling to petabytes of data. No need to worry about indexing and sharding.
The Power of a SaaS Cloud Platform – Store all log and event data in one place accessible to different teams.
Separate Storage from Compute – Choose between DataSet S3 or Bring Your Own S3 bucket. Our columnar format allows for fast searches and low cost storage.
Streaming Engine – Drive live dashboards and alerts, offloading repeat queries so they are answered in real time.
Best in class security – A security first architecture helps meet encryption and compliance standards.

Lower Your Total Cost of Ownership:

DataSet can lower Total Cost of Ownership by 60-80% over 3 years compared to traditional tools. With the platform, experience:

A paid, fully managed cloud system that needs no maintenance, tuning, or scaling work and is cheaper than open source
Faster, scalable, lower operational expense
Increase productivity with your teams spending 10x less time managing and operating the platform
Integrations that help pull data in (Logstash, Kafka, Fluentd)
White glove support

Read more about the benefits of replacing ELK in our latest whitepaper, The Business and Engineering Case to Replace ELK Stack

DataSet Achieves AWS Container Competency Status

Maggie Liu — Tue, 21 Feb 2023 21:52:48 +0000

DataSet has achieved Amazon Web Services (AWS) Container Competency status. DataSet is a SentinelOne solution that offers logging, monitoring, and troubleshooting in container environments, enabling organizations to fully achieve the benefits of cloud, containers, and Kubernetes.

Achieving the AWS Container Competency differentiates DataSet as an AWS Partner Network (APN) member that provides specialized expertise and proven success in delivering solutions for customers looking to manage, deploy, secure, and monitor their container workloads on AWS.

“We are excited to achieve AWS Container Competency status and to continue our efforts in accelerating our customers’ cloud journey,” Rajiv Taori, General Manager

DataSet works seamlessly with AWS services such as Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS), and AWS Fargate. It continuously collects metrics, events, and logs from the entire stack, surfaces anomalies, and uncovers their root causes, so DevOps, SRE, and engineering teams can detect and resolve performance issues faster than ever before. Read more about this in the case studies referenced on the AWS partner page.

“Dynamic container environments generate a lot of fast-moving data. Traditional solutions are expensive, difficult to scale, and slow to detect anomalies. DataSet delivers easy scalability and real-time performance at a fraction of the cost. SentinelOne is committed to helping customers efficiently modernize their applications by using containers with the range of powerful tools AWS provides, and this recognition further advances our partnership in delivering customer success,” Rajiv Taori, General Manager

About the Architecture

DataSet’s unique architecture combines high performance, low overhead, index-free design, and massively parallel processing that unlocks an unmatched log analytics experience:

Schema-Less Ingestion: Experience enormous flexibility in data collection and ingestion without any processing overhead of schema evaluation
Streaming Engine: Create materialized views for repeat queries, so high-res dashboards refresh, accurate alerts fire, and automation tasks trigger within seconds.
Index-Free Design: Columnar data format eliminates the need to maintain index clusters, re-index, and re-shard storage.
Massively Parallel Query Engine: Query engine uses horizontal scheduling, devoting the entire cluster – every CPU core on every compute node – to one query at a time.
Cost-Efficient Object Storage: Dedicates every node in our compute cluster to retrieve data from Amazon S3 in parallel, saturating the entire network bandwidth to fetch compressed data in the most efficient way possible.
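The index-free, fan-out idea in the list above can be pictured with a toy Python sketch. It is only an illustration of the general pattern (scan column chunks in parallel, then merge partial results), not DataSet's query engine; the chunk layout, field names, and functions are hypothetical, and Python threads here show the structure of the fan-out but do not provide real multi-core scanning.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical columnar layout: each chunk stores one list per field.
chunks = [
    {"status": [200, 500, 200], "latency_ms": [12, 840, 15]},
    {"status": [500, 200], "latency_ms": [910, 9]},
    {"status": [200, 200, 503], "latency_ms": [11, 14, 700]},
]

def scan_chunk(chunk, min_status=500):
    """Brute-force scan of one chunk, reading only the two columns it needs."""
    hits = [latency
            for status, latency in zip(chunk["status"], chunk["latency_ms"])
            if status >= min_status]
    return len(hits), sum(hits)

def query(all_chunks, workers=4):
    """Fan the scan out over every chunk, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(scan_chunk, all_chunks))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, (total / count if count else None)

error_count, avg_error_latency = query(chunks)
print(error_count, round(avg_error_latency, 1))
```

Because no index is consulted, adding a chunk only adds one more unit of scan work, which is the property that lets such a design avoid re-indexing and re-sharding.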

Introducing DataSet Kubernetes Explorer

Amit Sharma — Tue, 10 Jan 2023 23:24:38 +0000

We are excited to announce the general availability of DataSet Kubernetes Explorer for our customers. Kubernetes Explorer is an intuitive and cost-effective way to understand and manage the health and performance of Kubernetes clusters, deployed applications, and underlying infrastructure. Kubernetes Explorer continuously collects metrics, events, and logs from the entire Kubernetes stack, surfaces anomalies, uncovers their root causes, and provides contextual logs, so DevOps, SRE, and engineering teams can triage and troubleshoot independently – faster than ever before.

Enterprises strive for operational efficiencies, multi-cloud portability, and agility by deploying and running applications on Kubernetes. While Kubernetes has emerged as the ‘de-facto’ standard for container orchestration (Gartner) and is getting mainstream adoption, many enterprises encounter challenges as they operationalize Kubernetes at scale.

The latest Cloud Native Computing Foundation (CNCF) survey cites monitoring and logging as the biggest challenge in adopting Kubernetes, followed by complexity and security.

Why Operationalizing Kubernetes is Challenging

As Kubernetes environments scale and applications get more distributed, DevOps, SRE, and engineering teams struggle because:

High data volume: Distributed containerized applications generate a lot of data. Traditional tools struggle to analyze high log data volume in real-time.
Fragmented tooling: Multiple monitoring and logging solutions trap data in silos and require teams to carry context across tools leading to higher mean-time-to-detect.
Lack of cluster-wide access: Not all developers have cluster-level access, because implementing RBAC within Kubernetes is complex, so it is difficult for them to troubleshoot errors efficiently and independently.
Expensive to scale: Higher data volume in cloud-native environments increases the total cost of ownership by orders of magnitude.

SREs and DevOps engineering teams spend the bulk of their time answering basic questions regarding the performance of Kubernetes and deployed applications. Scale increases the effort required to detect and troubleshoot problems. Traditional log data platforms were designed decades ago in the pre-cloud era; they don’t work for modern K8s environments. They are slow, siloed, expensive to scale, and complex to operate.

Introducing DataSet Kubernetes Explorer

Kubernetes Explorer is a turnkey solution that makes it easy for engineering teams to debug and troubleshoot errors no matter where they happen across the entire Kubernetes stack. Kubernetes Explorer delivers immediate value to teams starting on their Kubernetes journey while addressing challenges for complex Kubernetes deployments at scale.

Kubernetes Explorer provides an at-a-glance view into all Kubernetes clusters with the flexibility to drill down into a particular cluster, namespace, nodes, pods, containers, or deployed workloads in mere seconds. Additionally, Kubernetes Explorer provides instant visibility to lifecycle events of Kubernetes components so teams can connect the dots between distributed infrastructure and applications. A seamless pivot to contextual logs gives granular visibility, helping organizations troubleshoot and resolve anomalies quickly.

Cluster Overview

Cluster Overview is the at-a-glance view that provides visibility into all clusters, infrastructure – nodes, pods, containers running in each cluster, and applications. You can filter or search into individual components in mere seconds. Kubernetes Explorer automatically discovers Kubernetes components and containerized applications, so you get instant, out-of-the-box visibility across the entire stack.

Nodes and Pods Overview

Visualize provisioned capacity and key performance characteristics such as CPU, Memory, Network, and Disk performance across clusters using pre-built, curated dashboards. Visualize and get alerted on any component trending hot and draining cluster resources, commonly known as noisy neighbor issues.

Troubleshooting Kubernetes Errors

In distributed systems such as Kubernetes, errors can occur anywhere within the stack, and troubleshooting becomes time-consuming when teams have to inspect the health of individual components and manually stitch context.

Further exacerbating the problem is the lack of cluster-level privileges for all developers. While the platform team, DevOps, or SRE teams will have access to the cluster using kubectl, the complexity of managing RBAC within Kubernetes prevents all developers from gaining such access and thus hinders their ability to troubleshoot their applications independently.

Kubernetes Explorer’s Workloads view shines in these scenarios. It provides an out-of-the-box view of all events that happened to a particular workload.

Application engineers can click the number of events for contextual logging and detailed information about the root cause. In this example, a Python application is causing the container to crash every time it starts, resulting in BackOff. Using DataSet, engineers can get to the root cause, down to the line of code causing the error, within two clicks.

Engineers can quickly narrow down to visualize all events that happened to the service they are supporting, while DevOps and SRE teams can look for all events across the cluster.

Full-stack visibility

Drilling down into an application from the workload tab gives out-of-the-box application performance details using RED metrics – Rate, Errors, and Duration.

The Workload view also shows all lifecycle events that happened on applications so teams can immediately correlate performance-related incidents with Kubernetes events.
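For readers new to RED metrics, the following Python sketch shows one way Rate, Errors, and Duration could be derived from raw request records. The Request type, its field names, and the choice of a nearest-rank 95th percentile for duration are assumptions made for illustration; they are not how Kubernetes Explorer computes its charts.

```python
from dataclasses import dataclass

@dataclass
class Request:
    timestamp: float    # seconds since epoch
    duration_ms: float  # time taken to serve the request
    status: int         # HTTP status code

def red_metrics(requests, window_seconds):
    """Return (requests per second, error ratio, p95 duration in ms)."""
    if not requests:
        return 0.0, 0.0, None
    rate = len(requests) / window_seconds
    error_ratio = sum(1 for r in requests if r.status >= 500) / len(requests)
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank percentile using integer math: ceil(0.95 * n).
    rank = max(1, (95 * len(durations) + 99) // 100)
    return rate, error_ratio, durations[rank - 1]

# Hypothetical 10-second window: 20 requests, the first two failing with HTTP 500.
sample = [Request(float(i), float(i + 1), 500 if i < 2 else 200) for i in range(20)]
rate, error_ratio, p95 = red_metrics(sample, window_seconds=10)
print(rate, error_ratio, p95)
```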

Get Started with DataSet to Simplify Kubernetes Troubleshooting

DataSet Kubernetes Explorer is deployed as a one-step Helm installation and works with all flavors of Kubernetes – native upstream distribution, Amazon EKS, Google Cloud GKE, Microsoft Azure AKS, or Red Hat OpenShift.

Powered by unique architecture, DataSet combines high performance, low overhead, index-free design, and massively parallel processing that unlocks an unmatched analytics experience at a fraction of the cost.

If you haven’t started with DataSet yet, we invite you to try fully-functional DataSet for 30 days, completely free. Request a demo; we’d love to show you how DataSet can enable all your teams to troubleshoot Kubernetes efficiently.

7 Reasons Customers Choose DataSet for Log Management

Maggie Liu — Tue, 20 Dec 2022 21:14:23 +0000

There are numerous reasons to consider a log and data event management tool. To start, here are customer stories we have consolidated to share the benefits of implementing a logging solution into your team’s workflow! Let’s dive into what customers have said about why they chose DataSet, but more importantly how they are using log management to improve their software development process and productivity.

1. Fast Data Ingestion – Petabytes to Scale

One of DataSet’s strongest assets is its blazing data ingestion speed. It is clear that most teams evaluating logging and observability tools are looking for the easiest and most effective way to scale and search through high-volume data. When Zalando migrated from a monolithic code base to a microservices architecture, their teams prioritized being able to access all data within their new complex architecture. Instead of trying to find which services were running on which hosts and manually pulling logs from each host, their DevOps teams simplified the process by ingesting logs from all services into DataSet for instant searches.

An online retail company reduced log search time by 98% from hours down to seconds

While some customers were beginning their journey in the logging space, others were opting for a new solution as opposed to their current legacy competitor tools. A leading talent management software company found a competitor tool’s ingestion performance too slow and noticed their service slowing down as data volume ramped up. On the other hand, a popular online dating company decided to buy a solution with DataSet rather than keep their ELK-based in-house solution.

As a result, both of their engineering teams found a similar reduction in log search time to that of Zalando’s and did not have to worry about storage capacity.

2. Simplify the Journey from Monoliths to Microservices

There are many benefits to moving your architecture to microservices, but there are caveats to consider. For example, engineering teams at Wistia needed a single solution to provide real time visibility into their entire cloud environment. They used DataSet to gain insight into their containerized applications which were being deployed on Amazon EKS. Knowing that DataSet effortlessly unifies logs spread across all microservices, infrastructure, and applications they’re running on, the team was able to gain insight with Kubernetes.

Likewise, the previously mentioned talent management company was able to equip their Cloud Operations team with similar visibility across all applications, cloud stacks, and containers on Amazon ECS. They were able to have all the access they needed to web server logs and applications from DataSet.

A leading talent management software company enhanced its migration to microservices by being able to detect performance issues in seconds – which took competitors 10 minutes

When moving from traditional data centers to Kubernetes, Copart decided to make the jump to log management. They quickly scaled all their logs in order to improve query response times, thus improving scalability and performance in operations.

Lastly, Asana found their applications lacked visibility and transitioned from running scripts to download logs from their S3 buckets to running Kubernetes frameworks to connect their logs for easy central visibility across the entire tech stack.

As you can see, it is generally easy to deploy an agent and start ingesting logs with cloud-native tools like DataSet.

3. Total Cost Reduction

Developer team leaders can attest to a significant decrease in total cost of ownership when looking at the big picture effects of log management. In fact, noted by Copart’s CISO, DevOps and Security teams downsized from two log management solutions to one, with DataSet. They were able to respond to incidents quicker due to aggregated data across servers, routers, infrastructures, and ensure application uptime.

Copart’s CISO recognized a drop in total cost for DevOps teams due to more integrated system and network logs and reduced latency

Another benefit is that DataSet does not charge by the number of users. Some teams feel like they can scale infinitely in relation to product adoption and not worry about uncertain overages.

4. Make Data Accessible to All

No need to worry about the gatekeepers of information with a log management tool as accessible as DataSet. It’s no surprise that making data sets accessible to an entire team benefits everyone. When it came down to sharing a data lake, Zalando enabled hundreds of their small autonomous teams to take ownership of one or more platforms. By having a centralized variety of logs available to search at scale for everyone, DataSet was adopted by 200 teams made up of over 1,000 engineers.

Engineer, Security, DevOps, Support teams all make use of log management resources

Who else can take advantage of logs besides DevOps and security teams? Wistia chose to give its Technical Support team access to log management and found success in usability and self-service. Its tech support team used DataSet to track the root cause of customer issues, using the search function to pinpoint requests within the logs. By analyzing customer encounters, they were able to identify and best solve common problems.

5. Custom Parsing – See What Matters

Parsing common log types makes it easy to ingest data so that it is easy to search and analyze. Customers previously mentioned, such as Zalando and Wistia, found custom parsing useful, found it easy to implement on instances with out-of-the-box agents, and appreciated the flexibility of the formatting rules.

An online dating company’s operations team saves thousands of engineer hours with custom parsers, configuring to extract fields from custom log formats

An online dating company found its match with DataSet’s custom parsing ability. Within their journey, they were able to deploy in production environments, set up the parser, and track down issues in only minutes. Traditionally, it would take hours to query data and pinpoint important levels of information, but with parsing it only takes minutes to track down relevant issues.

DataSet provides Built-In Parsers for common log formats and also allows for custom builds.
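To show what "extracting fields from a custom log format" means in practice, here is a small Python sketch using a regular expression. It is a generic illustration, not DataSet's parser configuration syntax; the log format, field names, and the parse_line helper are hypothetical.

```python
import re

# Hypothetical custom format:
#   <timestamp> [LEVEL] service=<name> latency=<n>ms <free-text message>
LINE = re.compile(
    r"(?P<timestamp>\S+) \[(?P<level>[A-Z]+)\] "
    r"service=(?P<service>\S+) latency=(?P<latency_ms>\d+)ms (?P<message>.*)"
)

def parse_line(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["latency_ms"] = int(fields["latency_ms"])  # numeric, so it can be charted
    return fields

event = parse_line(
    "2023-05-30T18:54:22Z [ERROR] service=checkout latency=840ms upstream timeout"
)
print(event["service"], event["latency_ms"], event["message"])
```

Once a value such as latency is extracted as a number rather than left inside a text line, it can be searched, charted, and alerted on like any other field.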

6. Easy Integrations

It’s possible to aggregate logs from thousands of services with no management overhead. You can use the DataSet API to send and retrieve log data, archive events, and manage configuration files, users, and groups.

Asana onboarded logs within their Kubernetes framework, making it easy to find logs across different clusters

Going from one application to another is simple. As shown by Asana, their teams were able to find the root cause of incidents and correlations at lightning speed with Kubernetes.

Teams can experience the benefits of DataSet on all cloud platforms – from Amazon ECS, Google, to Microsoft Azure and more.

DataSet and Kubernetes
DataSet and Docker
DataSet and Amazon ECS

7. Identify Errors with Alerts

DataSet’s many integrations across platforms like email, Slack, PagerDuty, and OpsGenie allow for instant alerting to flag important incidents. Many customers found that log management helped their teams identify errors and mitigate incidents in response. One company found that faster troubleshooting improved debugging productivity across their engineering team, and as a result they were able to resolve customer-facing issues more quickly – cutting resolution time by 80 percent.

It’s easier and faster to diagnose bugs that frustrate customers – Infrastructure Engineering Lead, Wistia

While some competitor tools do not provide easy access to web server and application logs, DataSet does, so you can spend less time searching through them. What do you do when there is an outage? On-call engineers can quickly find logs that correlate and decide to scale up or down or deploy fixes. If you are dealing with Kubernetes, instead of figuring out which pod has errors and trying to find error logs from there, all the logs are shown in DataSet.

DataSet Has All The Features

Our platform is built for experiencing the speed, scale, and efficiency of enterprise data. Learn more about our features, such as schema-less ingestions, parsing and preprocessing data, live streaming data, parallel query engine, secure storage, and cloud architecture, to make an informed decision when evaluating your next log monitoring solution.

These features allow you to do all sorts of things, including but not limited to:

Streaming data from a broad range of log shippers, queues, agents, distributed stream processors, and APIs in milliseconds
Acquiring visibility on structured and unstructured data upon ingestion
Viewing live dashboards materialized by repeated queries, with alerts that trigger at machine speed and automated tasks that execute in real time
Searching encrypted data instantly in a security-first platform
Experiencing cloud scale

Get Started Now

Teams choose DataSet to elastically scale to petabytes of data while delivering real-time performance at a fraction of the cost. We invite you to try a fully-functional DataSet for 30 days at no cost.

Request a demo – We’ll show you how DataSet provides unparalleled cost advantages for your specific use case.

The post 7 Reasons Customers Choose DataSet for Log Management appeared first on Dataset.

Observability vs. Monitoring: Everything You Need To Know

Maggie Liu — Tue, 22 Nov 2022 20:11:48 +0000

Good monitoring and observability help you detect problems in production software more quickly, spot issues before they become outages, and ultimately save you and your users headaches. Both provide the foundation to improve the customer experience, reduce Mean Time to Repair (MTTR), and increase Mean Time Between Failures (MTBF).
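As a rough illustration of the two reliability metrics named above, the sketch below computes MTTR and MTBF from a list of incidents. The incident timestamps and the function names are hypothetical, and MTBF is taken here as total uptime in the reporting window divided by the number of failures, which is one common definition rather than the only one.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (failure detected, service restored).
incidents = [
    (datetime(2022, 11, 1, 9, 0), datetime(2022, 11, 1, 9, 30)),
    (datetime(2022, 11, 8, 14, 0), datetime(2022, 11, 8, 15, 0)),
    (datetime(2022, 11, 20, 2, 0), datetime(2022, 11, 20, 2, 45)),
]


def total_downtime(incidents):
    """Sum of all failure-to-restore durations."""
    return sum((end - start for start, end in incidents), timedelta())


def mean_time_to_repair(incidents):
    """Average duration from failure to restoration."""
    return total_downtime(incidents) / len(incidents)


def mean_time_between_failures(incidents, window_start, window_end):
    """Uptime inside the reporting window divided by the number of failures."""
    uptime = (window_end - window_start) - total_downtime(incidents)
    return uptime / len(incidents)


mttr = mean_time_to_repair(incidents)
mtbf = mean_time_between_failures(
    incidents, datetime(2022, 11, 1), datetime(2022, 12, 1)
)
print("MTTR:", mttr)
print("MTBF:", mtbf)
```

Shorter repairs lower MTTR, while fewer failures in the same window raise MTBF, which is why the two are usually tracked together.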

In its 2022 report, DevOps Research and Assessment (DORA) highlights that good monitoring alongside an observability solution should be prioritized for high-performing and elite teams.

It is important to know the value monitoring can provide and the role it plays in strengthening observability. Read on to learn:

What is observability and monitoring?
How are observability and monitoring different?
Building your monitoring and observability efforts with DataSet

Observability vs. Monitoring

Observability and monitoring sound similar and are related, but not in the ways you may think. Monitoring helps teams identify problems and receive notifications about them, while observability follows through by aiding debugging and root cause analysis. Additionally, monitoring uses observability tools to track known metrics and failure points, while observability provides tools to resolve unknown or unexpected issues. They go hand in hand: you need both monitoring and observability if you want to build reliable systems.

Monitoring is tooling or a technical solution that allows teams to watch and understand the state of their systems. Monitoring is based on gathering predefined sets of metrics or logs.

Observability is tooling or a technical solution that allows teams to actively debug their system. Observability is based on exploring properties and patterns not defined in advance.

Observability

With observability, you are able to infer the internal state of a system based on its outputs. Since we cannot predict what we will want to know about our system, we want to track enough data to make sure we can analyze problems from different angles and different aggregates when problems inevitably occur.

There are three main pillars that observability encompasses: metrics, logs, and traces. These three solutions should provide insights into what is going on inside the system.

Metrics

Metrics typically aggregate numeric data about your system and application. For example, you can have metrics around your available CPU and memory and track metrics like response codes, traffic, latency, and errors.

Once system metrics are defined, you can also add custom metrics that capture relevant business or domain data. This allows you to track types of payments, shopping cart size, and the number of abandoned carts, to name a few. For instance, with DataSet you can log simple metrics or complex multi-field events with equal ease, then use your metrics for searching, graphing, alerting, and more.

Metric Types

There are many types of metrics, but the following are the most common:

Gauge – Gauges represent measurements at a particular point in time. Metrics like CPU, memory, or queue counts use gauges.
Counter – Counters measure events that occur. For example, you may count the number of requests your API receives, the number of errors that result, or the number of visitors to your application site.
Histogram – Histograms measure the distributions of events. One of the most common uses for histograms is latency. And instead of using just an average or max, you can determine the 50th, 90th, or 99th percentile of latency that your customers experience.
Gauge Histogram – As a combination of the gauge and histogram, here you can see the distribution of gauge data. So if we take queue counts as an example gauge, we could plot how long the data has been in the queue with a histogram.
Info – For information that doesn’t change during a process, you can use info. This can indicate an application version number or dependency version numbers.

With metrics, you have the potential to measure anything that occurs in your system.
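To make the first three metric types above concrete, here is a minimal in-memory sketch. The class and method names are illustrative and do not come from DataSet or any particular metrics library; the histogram keeps raw observations and uses the nearest-rank method, whereas production libraries typically use buckets.

```python
import math


class Counter:
    """Monotonically increasing count of events (requests, errors, visits)."""

    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters only go up")
        self.value += amount


class Gauge:
    """Point-in-time measurement that can go up or down (CPU, queue depth)."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Stores raw observations so percentiles can be read back later."""

    def __init__(self):
        self.observations = []

    def observe(self, value):
        self.observations.append(value)

    def percentile(self, p):
        """Nearest-rank percentile for an integer 0 < p <= 100."""
        if not self.observations:
            raise ValueError("no observations recorded")
        ordered = sorted(self.observations)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, with two slow outliers.
latency = Histogram()
requests = Counter()
for ms in [12, 15, 11, 250, 14, 13, 16, 18, 17, 900]:
    latency.observe(ms)
    requests.inc()

queue_depth = Gauge()
queue_depth.set(7)

average = sum(latency.observations) / len(latency.observations)
print("requests:", requests.value, "queue depth:", queue_depth.value)
print("average:", average)
print("p50:", latency.percentile(50))
print("p90:", latency.percentile(90))
print("p99:", latency.percentile(99))
```

In this made-up data the average is pulled far above what a typical request sees, while the 50th percentile stays close to it and the 90th and 99th expose the slow tail, which is the reason for preferring percentiles over a single average or max.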

Logs

Logs provide textual data regarding events that occur in your system. Some logs provide the crucial piece of information needed to resolve an issue, but most do not add much value and only increase noise. It is worth aggregating logs because they provide the context to recreate and investigate issues, but reducing log clutter can be a challenge in its own right.

It is important to consider leveraging log analytics tools that can help scale data without slowing performance. If you would like to avoid having to query different sources of information when troubleshooting, you want to have your database logs in the same location as your application logs. In other words, you need to have centralized logging, which is provided by solutions like DataSet. Instantly aggregate, search and analyze log data across the entire stack. No matter where an anomaly occurs, you can detect, triage, root cause and resolve.
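The idea of centralized logging can be sketched as merging separate log streams into a single timeline. The snippet below is a toy illustration with hypothetical log lines, not a description of how DataSet ingests data; it assumes each stream is already sorted and that all timestamps use the same ISO 8601 UTC format, so that string order matches time order.

```python
import heapq

# Hypothetical, already time-sorted streams: (timestamp, source, message).
app_logs = [
    ("2022-11-22T20:11:01Z", "app", "checkout request received"),
    ("2022-11-22T20:11:03Z", "app", "payment call failed: timeout"),
]
db_logs = [
    ("2022-11-22T20:11:02Z", "db", "slow query: 1800 ms"),
    ("2022-11-22T20:11:04Z", "db", "connection pool exhausted"),
]


def merge_streams(*streams):
    """Merge time-sorted streams into one timeline ordered by timestamp."""
    return list(heapq.merge(*streams, key=lambda event: event[0]))


timeline = merge_streams(app_logs, db_logs)
for timestamp, source, message in timeline:
    print(timestamp, source, message)
```

Read as one timeline, the slow query sits between the incoming request and the payment timeout, a relationship that is easy to miss when the application and database logs live in separate tools.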

Traces

Both logs and metrics can relate to particular events that occur within the system, but they do not provide the ability to trace one particular transaction or customer until we add tracing.

For example, if you want to follow a customer’s experience for a particular transaction that failed, you can look at traces and tie relevant metrics, errors, and logs together to show the path through the code that a particular transaction took. With traces, we gain the ability to trace one process, transaction, or experience through our system.
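A minimal sketch of that idea follows: events from different services carry a shared trace identifier, and grouping by it reconstructs the path a single transaction took. The field names (trace_id, ts, service, msg) and the events themselves are assumptions made for illustration, not a DataSet or OpenTelemetry schema.

```python
from collections import defaultdict

# Hypothetical events emitted by different services, in arrival order.
events = [
    {"trace_id": "t-1", "ts": 3, "service": "payments", "msg": "card declined"},
    {"trace_id": "t-2", "ts": 1, "service": "frontend", "msg": "GET /cart"},
    {"trace_id": "t-1", "ts": 1, "service": "frontend", "msg": "POST /checkout"},
    {"trace_id": "t-1", "ts": 2, "service": "orders", "msg": "order created"},
]


def group_by_trace(events):
    """Bucket events by trace ID and order each bucket by timestamp."""
    traces = defaultdict(list)
    for event in events:
        traces[event["trace_id"]].append(event)
    for spans in traces.values():
        spans.sort(key=lambda event: event["ts"])
    return dict(traces)


def service_path(trace):
    """List the services a transaction passed through, in order."""
    return [event["service"] for event in trace]


traces = group_by_trace(events)
failed = traces["t-1"]
print(" -> ".join(service_path(failed)))
print("last event:", failed[-1]["msg"])
```

Without the shared identifier, the "card declined" log line would be one of many; with it, the failed checkout can be followed from the frontend through to the payments service.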

Pitfalls: Three Pillars

Businesses should take some caution when determining whether observability is the solution to their problems. Many wrongly assume that if they have monitoring, logging, and tracing, they have observability. But that’s not always the case.

In fact, common pitfalls can add more pain and make debugging more complex. For example, if you have three disparate systems that provide the logging, metrics, and traces, your engineers will have to context switch and attempt to correlate the data in those systems themselves. That can lead to errors, a longer time to debug, and frustration.

Additionally, some companies fall into the “silver bullet” pitfall. Observability isn’t just about throwing a tool or dashboards at your application teams. It is also about building a solid foundation of good logging and metric fundamentals.

Even if your teams have all the tools at their disposal, if application reliability and availability aren’t improving, you may not have observability at all, no matter how many fancy dashboards you have.

Monitoring

Now that we have a good understanding of observability, what about monitoring? With monitoring, we use some of our observability tools to identify issues, notify the software team of those issues, and even predict potential trends in our system’s reliability.

Dashboard monitoring becomes important when tracking metrics and logs, but it should prompt actionable steps. Automated alerts should be set up to provide notifications when things need to be looked at or when systems experience issues, and the dashboards should provide relevant data for investigative purposes. DataSet unifies multiple functions into a single tool: log aggregation, search, and analysis; server metrics; dashboards and alerts; external monitoring; and more. At the heart of all this is the event database, a universal repository for logs, metrics, and other operational data, hosted on our servers.

Now this doesn’t mean that you always need an incident to review dashboards. You can also explore the current health of the system or the activities taking place. Then you can start to see how different types of traffic or load affect other parts of the system. From there, you can start to predict when issues may crop up in the future.

Reaching Observability with Centralized Logging

Observability and monitoring go hand in hand. Dedicate the time to understand your system, its architecture, and its components so you know where reliability is lacking. You can use tools like DataSet to start aggregating your system’s logs in a centralized location and troubleshoot more effectively.

Thousands of Zalando engineers use DataSet for application observability to proactively monitor their end-to-end system health, detect potential problems before they arise, and quickly troubleshoot incidents.

How can you get started? Start a free trial with DataSet and see how you can combine observability and monitoring to ensure your teams can not only detect issues but also resolve them quickly:

Free for 30 days
All features enabled
Unlimited Users, Queries, Dashboards, Real-time alerts
No credit card required
Scale to petabytes of data

Introducing DataSet KubeiQ

Amit Sharma — Tue, 25 Oct 2022 11:49:44 +0000

We are excited to announce the preview of DataSet KubeiQ, an algorithmic approach to automatically detect anomalies no matter where they occur across the Kubernetes cluster – the underlying infrastructure, the Kubernetes platform, or deployed workloads. The innovation enables DevOps and Site Reliability Engineering (SRE) teams to expedite investigations related to Kubernetes, reduce mean-time-to-resolve (MTTR), and improve end-user experience.

DevOps engineers and SREs use static threshold-based alerts, heuristics, or trial-and-error triangulation to investigate incidents within dynamic, containerized environments in an effort to find the most meaningful signals. Now, teams can streamline this process with DataSet KubeiQ, which automatically bubbles up anomalies in Kubernetes clusters.

Quickly Detect and Resolve Anomalies No Matter Where They Occur in a Kubernetes Cluster

DataSet continuously observes the entire cluster and automatically surfaces meaningful insights related to anomalies, such as:

A high number of unhealthy nodes due to memory, disk, or CPU pressure
A high number of unhealthy pods remaining in the Pending status
A high number of unavailable Kubernetes Deployments
A high number of restarted containers

Due to the ephemeral nature of containerized infrastructure, some of the scenarios mentioned above are expected, such as container restarts. Kubernetes is self-healing and automatically corrects when anomalies occur unless there is a system-wide issue such as resource starvation. The challenge is to determine when a performance signal indeed becomes an anomaly.

DataSet continuously analyzes your Kubernetes cluster and leverages machine learning to understand when activity deviates far enough from its historical baseline to be considered anomalous.

Using DataSet, engineers can effectively resolve performance incidents faster than ever before, regardless of their prior knowledge of Kubernetes technology.

For example, suppose you are an SRE who recently joined an e-commerce company that runs several services on Kubernetes, and you want to get notified when the latency of a service exceeds a threshold. What threshold should you set so that you get alerted when there is an anomaly? Many variables can affect the latency of a particular service, such as the rate of requests, errors, or even infrastructure performance. Creating an alert on a static threshold is simplistic and almost always results in alert storms.

KubeiQ uses machine learning to automatically identify truly anomalous patterns and surface them so that you get alerted when there might be a systemic issue. In this example, you can see spikes in a frontend service’s latency.

You would have received false alerts multiple times if an alert had been set based on static thresholds. KubeiQ dynamically sets thresholds and calculates a performance metric’s expected range.

You are alerted only when the actual value exceeds the confidence interval, which is dynamically created based on historical performance, thus minimizing alerts and the cognitive overload associated with triaging such incidents.
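The contrast between static and dynamic thresholds can be sketched with a simple rolling baseline. This is not KubeiQ’s algorithm, which the post describes only as machine learning based; the stand-in below uses a rolling mean plus k standard deviations as the upper bound of the expected range, and the latency values, window size, and k are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical frontend latency samples in milliseconds.
latency_ms = [200, 215, 205, 220, 210, 218, 207, 221, 212, 400, 214, 209]


def static_alerts(series, threshold):
    """Indexes of every sample above a fixed threshold."""
    return [i for i, value in enumerate(series) if value > threshold]


def dynamic_alerts(series, window=6, k=3.0):
    """Indexes of samples above mean + k * stdev of the preceding window."""
    alerts = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        upper_bound = mean(history) + k * stdev(history)
        if series[i] > upper_bound:
            alerts.append(i)
    return alerts


fixed = static_alerts(latency_ms, threshold=210)
adaptive = dynamic_alerts(latency_ms)
print("static threshold alerts at:", fixed)
print("dynamic threshold alerts at:", adaptive)
```

With these made-up numbers the fixed threshold fires on ordinary fluctuation as well as on the real spike, while the rolling bound fires only on the spike. A baseline this simple is also distorted by the spike itself for the next few samples, which is one reason real systems use more robust models.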

A quick pivot to logs indicates that infrastructure was starved of resources, and one of the replicas of this service was evicted.

You can automate the response and resolution for such conditions using an auto-scaler that dynamically provisions just the right compute resources to handle your cluster’s applications.

Another pertinent consideration is ephemerality in dynamic Kubernetes environments. You want to monitor pod restarts to proactively get alerted about a system-wide issue. However, you know that not every occurrence of these events indicates an anomaly. Kubernetes is self-healing, so if a pod is evicted or stopped, Kubernetes will reschedule or restart it to match the desired state.

So should you be alerted on pod restarts? Again, static thresholds no longer work in dynamic Kubernetes and container environments. KubeiQ analyzes these dynamic events and alerts you only when it determines that pod restarts exhibit anomalous patterns and warrant your attention.

In this case, even though there is variation in the number of pod restarts across the system, it is within the expected range as predicted by KubeiQ.

KubeiQ proactively analyzes the performance of the entirety of the Kubernetes clusters and expedites the process of detecting and troubleshooting performance anomalies. Engineering teams can now effectively identify anomalies in far less time, regardless of their prior knowledge of the Kubernetes platform or the underlying infrastructure.

Get Started with DataSet to Automate Kubernetes Monitoring

DataSet KubeiQ, a part of Kubernetes Explorer, is deployed as a one-step Helm installation and works with all flavors of Kubernetes – native upstream distribution, Amazon EKS, Google Cloud GKE, Microsoft Azure AKS, or Red Hat OpenShift.

Meet us at KubeCon or SREcon Europe

Heading to KubeCon + CloudNativeCon North America in Detroit or SREcon Europe in Amsterdam? The DataSet team will be in full force to greet you there. Meet us to get a personalized demo, collect awesome swag, and win exciting prizes!
