What Can You Know About Your Systems?

It's time to put a bunch of observability tools side by side and compare the workflows they push you toward.

Each observability tool tends to lead users toward a specific workflow. In the same way that working with computers on Windows or macOS has very different starting points, Logs, Metrics, and Traces each have several entry points.

I want to dig into the benefits and drawbacks of each default workflow.

Honeycomb as the baseline

Since I’m Honeycomb-brained, I’ll definitely come at this with Honeycomb’s default workflow in the front of my mind.

Each type of person who interacts with the software brings their own mental model of the business, which leads to a lot of different “initial jobs to be done”. Even at the same company, with the same data and even the same question, these users may start with different ideas that lead them to different answers.

The marketing and design folks each have their own set of conflicting personas meant to capture how to reach these folks. I’m going to use a much cruder delineation where it makes sense.

So how does an ops person approach Honeycomb?

The ops people want to see percentiles of performance and errors.

They’re accustomed to a dashboard that shows all the things they’ve cared about in the past.

In Honeycomb, they have to ask a question. If they start with “COUNT” of “EVERYTHING”, they’ll just sort of see how much stuff is happening. The home page for a given service shows error rates and a heatmap of performance, but ops people these days are responsible for dozens or hundreds of services.

Once they realize they can get a HEATMAP(duration_ms) and group by error, they’ll have a good solid idea of what errors and slowness exist.
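To make that query shape concrete, here is a minimal sketch in plain Python of what "duration, grouped by error" boils down to over a pile of wide events. This is not Honeycomb's API or query language. The event records, the `summarize_by_error` helper, and the nearest-rank percentile are all assumptions made up for illustration; only the `duration_ms` and `error` field names come from the query above.

```python
import math
from collections import defaultdict

# Hypothetical wide events. The values are invented for illustration.
events = [
    {"service": "api", "duration_ms": 12.0, "error": None},
    {"service": "api", "duration_ms": 15.0, "error": None},
    {"service": "api", "duration_ms": 18.0, "error": None},
    {"service": "api", "duration_ms": 950.0, "error": "timeout"},
    {"service": "api", "duration_ms": 1020.0, "error": "timeout"},
    {"service": "api", "duration_ms": 40.0, "error": "bad_request"},
]


def percentile(sorted_values, p):
    """Nearest-rank percentile over an already sorted, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_by_error(events):
    """Group durations by the error field, then summarize each group."""
    groups = defaultdict(list)
    for event in events:
        groups[event["error"]].append(event["duration_ms"])

    summary = {}
    for error, durations in groups.items():
        durations.sort()
        summary[error] = {
            "count": len(durations),
            "p50": percentile(durations, 50),
            "p95": percentile(durations, 95),
        }
    return summary


summary = summarize_by_error(events)
for error, stats in summary.items():
    print(error, stats)
```

A heatmap shows the whole distribution rather than a couple of percentiles, but the grouping step is the same idea: split by the error field first, then look at how slow each group is.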

Proper metrics in Honeycomb are a newer offering. As more ops people get exposed to them, I’ll form some opinions and work those into this series as an addendum.

So how does a developer approach Honeycomb?

Developers birth code into the world using their IDE and terminal outputs. The moment their code is sent off to the real world, they try to bring this feedback loop to production, and the wheels come off.

When we can knock down the door and get a developer to investigate something using Honeycomb, they typically dive directly into their service, click on a trace, and see what the hell is going on.

The trace contextualizes the work they’ve done and the spans they’ve added, and puts them in a whole execution chain from the frontend to the databases.

Viewing a trace is nice, but using Honeycomb’s aggregation interface to find the spiciest traces means you aren’t stuck looking at random, uninteresting transactions.
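As a rough sketch of what "aggregate first, then pick a trace" means, the Python below ranks traces by how many error spans they contain and then by their slowest span. It is not how Honeycomb does it internally. The span records, the field names, and the `spiciest_traces` helper are hypothetical, and "spicy" here is my own stand-in definition.

```python
from collections import defaultdict

# Hypothetical spans. The names and values are invented for illustration.
spans = [
    {"trace_id": "t1", "name": "GET /cart", "duration_ms": 120.0, "error": False},
    {"trace_id": "t1", "name": "db.query", "duration_ms": 80.0, "error": False},
    {"trace_id": "t2", "name": "GET /cart", "duration_ms": 2400.0, "error": True},
    {"trace_id": "t2", "name": "db.query", "duration_ms": 2300.0, "error": True},
    {"trace_id": "t3", "name": "GET /cart", "duration_ms": 95.0, "error": False},
]


def spiciest_traces(spans, limit=2):
    """Return trace IDs ranked by error span count, then by slowest span."""
    per_trace = defaultdict(lambda: {"max_duration_ms": 0.0, "error_spans": 0})
    for span in spans:
        agg = per_trace[span["trace_id"]]
        agg["max_duration_ms"] = max(agg["max_duration_ms"], span["duration_ms"])
        if span["error"]:
            agg["error_spans"] += 1

    ranked = sorted(
        per_trace.items(),
        key=lambda item: (item[1]["error_spans"], item[1]["max_duration_ms"]),
        reverse=True,
    )
    return [trace_id for trace_id, _ in ranked[:limit]]


top = spiciest_traces(spans)
print(top)
```

The point is the order of operations: the aggregation tells you which traces deserve a click, so the trace view gets spent on something interesting instead of a random healthy request.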

So how does a product owner approach Honeycomb?

What the hell is a product owner? I’m just gonna merge engineering leads and data people and anyone who didn’t write the code and doesn’t run the code into a mysterious third group.

Honeycomb is fundamentally an analytics engine that loves gobbling up traces and wide events and offers whatever sorts of insights you want. Because of this, everyone at Honeycomb uses Honeycomb to see if users of Honeycomb are enjoying the functionality that is important to them and their budgets.

Customers seem to have a bit more trouble switching from BigQuery to Honeycomb for some reason. I have the least insight into this trend and assume it’s mostly inertia.

With the obvious flaws out there, let’s goooo!!!

Acknowledging that I have 30,000 hours of exposure to Honeycomb and a few dozen with each of the other tools, I’m leaning a lot into what people claim Honeycomb can’t do.

I’m expecting that the patterns I uncover here will help explain why some tools feel better for certain users.

The contestants

I’ll add these as I write them. So far: