April 17, 2026 · 6 min read · Adoba Yua

How I picked an observability platform without trusting a single demo

Three vendors, three polished demos, three curated sample environments. By the end of the week I had no real idea which one would survive a Tuesday incident. So I ran the evaluation inside my own stack instead.

Last quarter I had to pick an observability platform. Three vendors on the shortlist. Each one got an hour with our team. Each one ran a polished demo. Each one sent a custom deck after.

By the end of the week I could have told you what every logo looked like. I could not have told you which one would actually survive a Tuesday incident at 2am.

So I ran the evaluation inside my own stack instead. Here is what that actually looked like.

The questions the demos never answered

Every demo told me latency percentiles on their sample data. Every demo told me retention on hypothetical volumes. Every demo walked me through a dashboard with six services and clean, well-tagged traces.

What I actually needed to know was different.

  • What does our log volume cost per month at our current growth rate, not at their reference customer's growth?
  • Does their SCIM connector actually handle our Okta group structure, including the two legacy groups nobody wants to clean up?
  • Can their trace sampling keep the spans we care about without blowing the retention budget?
  • How does the query editor feel when you are tired, something is broken, and you have fourteen minutes before the next stakeholder ping?

None of those answers live in the vendor's environment. They all live in mine.
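
Take the first question. The arithmetic is simple enough to sketch before any vendor call; every number below (daily volume, growth rate, per-GB price) is a hypothetical placeholder, not anyone's real rate:

```python
# Project monthly log ingest cost at our growth rate, not the vendor's
# reference-customer growth. All inputs are hypothetical placeholders.

def projected_monthly_cost(gb_per_day: float,
                           monthly_growth: float,
                           price_per_gb: float,
                           months: int) -> list[float]:
    """Compound today's daily volume forward and price each month."""
    costs = []
    for m in range(months):
        volume = gb_per_day * (1 + monthly_growth) ** m * 30  # GB in month m
        costs.append(volume * price_per_gb)
    return costs

# 800 GB/day today, growing 8% a month, at a made-up $0.50/GB ingest rate
for month, cost in enumerate(projected_monthly_cost(800, 0.08, 0.50, 12), 1):
    print(f"month {month:2d}: ${cost:,.0f}")
```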

What I actually wired up

I pointed Panaptico at our prod mirror and let it enumerate. Log volume per service per day. Trace cardinality per endpoint. Current alerting rules. How much of our Kubernetes footprint we have actually instrumented today versus what the vendors assumed we had.
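
Even without a tool doing the enumeration, the first dimension is easy to rough out yourself. A minimal sketch, assuming you can export per-service log byte counts from whatever ships your logs today; the record format here is invented:

```python
# Build the log-volume-per-service-per-day picture from exported counters.
# The input rows are a hypothetical export format; substitute whatever your
# log shipper or metrics store can actually give you.
from collections import defaultdict

records = [
    # (service, date, bytes_ingested) -- hypothetical sample rows
    ("checkout", "2026-04-01", 412_000_000_000),
    ("checkout", "2026-04-02", 398_000_000_000),
    ("inventory", "2026-04-01", 9_500_000_000),
]

daily = defaultdict(float)
for service, date, nbytes in records:
    daily[(service, date)] += nbytes / 1e9  # GB

for (service, date), gb in sorted(daily.items()):
    print(f"{date}  {service:<12} {gb:8.1f} GB")
```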

Then I scored the three candidates against that picture, on dimensions I picked, not dimensions their marketing picked.
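
The scoring step needs nothing fancier than a weighted matrix. A sketch with invented weights and scores, just to show the shape:

```python
# Score candidates on self-chosen dimensions. Weights and scores are
# illustrative, not our real numbers.
weights = {"cost_at_our_volume": 0.35, "identity_fit": 0.25,
           "query_ergonomics": 0.20, "sampling_control": 0.20}

scores = {  # 1 (bad) .. 5 (good), from hands-on testing against our stack
    "vendor_a": {"cost_at_our_volume": 2, "identity_fit": 4,
                 "query_ergonomics": 4, "sampling_control": 3},
    "vendor_b": {"cost_at_our_volume": 5, "identity_fit": 4,
                 "query_ergonomics": 3, "sampling_control": 4},
    "vendor_c": {"cost_at_our_volume": 3, "identity_fit": 1,
                 "query_ergonomics": 5, "sampling_control": 4},
}

for vendor, s in scores.items():
    total = sum(weights[dim] * s[dim] for dim in weights)
    print(f"{vendor}: {total:.2f}")
```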

Vendor A looked cheapest on the pricing page. Against our real volume they came out roughly 40% more expensive than vendor B, because of how they count custom metrics at our specific cardinality.
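
That gap comes entirely from the counting rule. A toy comparison of two billing models at high cardinality; every rate and count here is invented, chosen only to show how a per-series model pulls away from a per-name model once labels multiply:

```python
# Two hypothetical billing models for the same custom-metrics workload.
metric_names = 250          # distinct custom metric names
avg_series_per_name = 500   # labels multiply each name into time series

# Model 1: bill per active time series (hurts at high cardinality)
per_series_rate = 0.05      # $/series/month, invented
cost_per_series = metric_names * avg_series_per_name * per_series_rate

# Model 2: bill per metric name with a series allowance baked in
per_name_rate = 18.00       # $/name/month, invented
cost_per_name = metric_names * per_name_rate

print(f"per-series model: ${cost_per_series:,.0f}/month")  # $6,250
print(f"per-name model:   ${cost_per_name:,.0f}/month")    # $4,500, ~40% less
```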

Vendor C had the best demo. Against our identity setup their SCIM provisioning broke on a group structure we actually use. Not broken like a little rough. Broken like we would have had to restructure half our Okta to make it work.
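
This is the kind of failure a short probe catches before procurement does. A hedged sketch of testing whether a vendor's SCIM 2.0 endpoint accepts a group that contains another group, the shape our legacy Okta structure pushes; the base URL and token are placeholders, and the payload shape follows RFC 7643:

```python
# Probe a SCIM 2.0 endpoint with a group that contains another group,
# the structure our legacy Okta groups rely on. The base URL and token
# are placeholders; the payload shape follows RFC 7643.
import requests

BASE = "https://scim.example-vendor.com/scim/v2"   # placeholder URL
HEADERS = {
    "Authorization": "Bearer <token>",             # placeholder token
    "Content-Type": "application/scim+json",
}

child = requests.post(f"{BASE}/Groups", headers=HEADERS, json={
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
    "displayName": "legacy-child-group",
}).json()

parent = requests.post(f"{BASE}/Groups", headers=HEADERS, json={
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
    "displayName": "legacy-parent-group",
    "members": [{"value": child["id"], "type": "Group"}],  # the edge case
})

# 201 means the connector at least accepts nested membership; anything
# else tells you how much Okta restructuring you would be signing up for.
print(parent.status_code, parent.text[:200])
```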

Vendor B was not the flashiest. Against our stack they had the least friction and the cleanest cost story. They won.

The part I would have missed without running it

Every vendor demo ran on clean sample data. Nobody showed me what happens when you have a service that emits 40x the log volume of every other service (we have one, it is not going away). Nobody showed me the weird identity edge case. Nobody warned me about the custom metrics pricing curve at our scale.
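
The 40x service is worth one line of arithmetic, because it is worse than it sounds. With a hypothetical fleet of 25 services, a single emitter at 40x the baseline is most of your bill:

```python
# 25 services (hypothetical count), one of them at 40x the baseline volume.
services = 25
loud_factor = 40

share = loud_factor / (loud_factor + (services - 1))
print(f"{share:.0%} of total log volume comes from one service")  # ~62%
```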

Those are the things that eat your quarterly budget in month three and make you regret the decision at month six.

What I would tell anyone doing this

Do not pick based on the demo. Do not pick based on the pricing page. Do not pick based on what a Gartner quadrant says. Pick based on what the vendor looks like inside your actual environment, on your actual workloads, with your actual identity and network setup.

If you cannot evaluate against live data, you are not evaluating. You are guessing.
