Birbla

Datadog's $65M/year customer mystery solved (blog.pragmaticengineer.com)
122 points by thunderbong - 9 hours ago

And who says that SaaS doesn't pay off?! It pays off like hell!
by mrkramer - 8 hours ago
> For observability, Coinbase spun up a dedicated team with the goal of moving off of Datadog, and onto a Grafana/Prometheus/Clickhouse stack.
We recently did the same, and our Datadog bill was only five figures. We're finding the new stack to be not a poor man's anything, but more flexible, complete and manageable than yet another SaaS. With just a little extra learning curve, observability is a domain where open source trounces proprietary, and not just if you don't have money to set on fire.
by delichon - 8 hours ago
There's also https://openobserve.ai, which, while not as stable as Grafana/Prometheus/Clickhouse, feels a bit easier to set up and manage. It still has a ways to go, but it does the basics and more without issue.
Crazy that they spent so much on observability. Even with Datadog they could've optimized that spend. Datadog does lots of bad things with billing: by default, especially with on-demand instances, you get charged significantly more than you should, as they have (had?) pretty deficient counting of instance hours and instances.
For example, rather than run the agent (which counts as an instance even if it's only on for a minute), you can send the logs, metrics, etc. directly to their ingestion endpoints, so those instances aren't counted towards your usage beyond the log and metric volume itself.
Maybe at that level they don't even get into actual usage-based billing anymore, and they just negotiate arbitrary amounts for some absurd quota of use.
by asnyder - 8 hours ago
I wonder how much that no-expense-spared, money-is-no-object attitude to buying SaaS impacts an engineer's ability to make sensible decisions around infra and architecture. Coinbase might have been fine blowing 65 mil, but take that approach to a new startup and you could trivially eat up a significant amount of runway with it.
I won’t single out Datadog on this because the exact same thing happens with cloud spend, and it’s very literally burning money.
by ljm - 8 hours ago
(May 2023)
by abxyz - 7 hours ago
>Originally published on 11 May 2023
by everfrustrated - 7 hours ago
An article that's basically an ad for Datadog: Pay us a ton of money - it’s still cheaper in the long run.
by cybice - 7 hours ago
> Assume that Datadog cuts the number of outages by half, by preventing them with early monitoring. That would mean that without Datadog, we’d look at 24 hours’ worth of downtime, not 12. Let’s also assume that using Datadog results in mitigating outages 50% faster than without - thanks to being able to connect health metrics with logs, debug faster, pinpoint the root cause and mitigate faster. In that case, without Datadog, we could be looking at 36 hours worth of total downtime, versus the 12 hours with Datadog. To put it in numbers: the company would make around $9M in revenue it would otherwise lose, Now that $10M/year fee practically pays for itself!
Those are some pretty heroic assumptions. In particular, they assume the only options are Datadog or nothing, when there are far cheaper alternatives like the Prometheus/Grafana/Clickhouse stack mentioned in the article itself.
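For reference, the quoted back-of-the-envelope math can be reproduced in a few lines. This is only a sketch of the quote's own assumptions: the 2x prevention factor and the 1.5x mitigation slowdown are one reading that reproduces the 24-hour and 36-hour figures, the hourly revenue is implied by dividing the $9M by the 24 hours avoided rather than stated in the quote, and all function and variable names are made up for illustration.

```python
def downtime_without_tool(hours_with_tool, prevention_factor, mitigation_slowdown):
    """Scale observed downtime by the quote's two assumptions.

    prevention_factor: how many times more outage hours occur without
        early monitoring (2.0 = "cuts the number of outages by half").
    mitigation_slowdown: how much longer each outage lasts without the
        tool (1.5 is an assumed reading of "50% faster" that yields 36h).
    """
    return hours_with_tool * prevention_factor * mitigation_slowdown


hours_with = 12                      # downtime with Datadog, from the quote
hours_without = downtime_without_tool(hours_with, 2.0, 1.5)
hours_avoided = hours_without - hours_with

revenue_protected = 9_000_000        # "$9M in revenue it would otherwise lose"
annual_fee = 10_000_000              # "$10M/year fee"

# Implied by the quote's figures, not stated in it.
implied_revenue_per_hour = revenue_protected / hours_avoided

# Positive means the fee is covered by protected revenue alone.
net = revenue_protected - annual_fee

print(f"downtime without tool: {hours_without:.0f}h")
print(f"implied revenue per downtime hour: ${implied_revenue_per_hour:,.0f}")
print(f"net of fee: ${net:,.0f}")
```

Under these numbers the fee exceeds the protected revenue by $1M, and that is before comparing against a cheaper alternative that might prevent some of the same downtime.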
by decimalenough - 7 hours ago
What problems does Datadog solve that you can't solve with cheaper solutions?
by cloudking - 7 hours ago
I should have known it was Coinbase. I know that Coinbase used to spend $35,000 a month to back up the data directory of ETH nodes.
by therein - 6 hours ago
> we really work with customers to restructure their contracts
Has anyone actually had that experience with Datadog? A few million wasn't enough to get them to talk about anything; we always paid list price, and there was no negotiating when they restructured their pricing either.
by aeyes - 6 hours ago
> To put it in numbers: the company would make around $9M in revenue it would otherwise lose, Now that $10M/year fee practically pays for itself!
Am I misunderstanding, or is the author saying it's better to spend $10M than $9M?
by GuinansEyebrows - 6 hours ago
This person is like the Gossip Guy of tech. Who cares?
by gneray - 6 hours ago
When did this guy stop writing about engineering and start running a tech gossip rag?
by generalpf - 5 hours ago
I have run ELK, Grafana + Prom, Grafana + Thanos/Cortex, New Relic and all of the more traditional products for monitoring/observability. Over the last few years, I have been running full observability stacks via either the Grafana LGTM stack or Datadog at reasonable scale and complexity. Ultimately you want one tool that can alert you off a metric, present you some traces, and drill down into logs, all the way down the stack.
I have found Datadog to be, hands down, the best developer experience from the get-go. The way it glues its mostly decent products together is unparalleled compared to other products (Grafana Cloud/LGTM). I usually say that if you're at a small to medium scale business it just makes sense, IF you understand the product and configure it correctly, which is reasonably easy. The seamless integration between tracing, logging and metrics in the platform, which you can then easily combine with alerts, is great. However, it's easy to misconfigure it and spend a lot of money on seemingly nothing. If you do not implement tracing and structured logs (at the right volume and level) with trace/span ids etc. all the way through your services, it's hard to see the value, and it seems expensive. It requires some good knowledge and configuration of the product to make it pay off. The rest of the product features are generally good; for example, their security suite is a good entry-level option for cloud security monitoring and SIEM too.
However, when you get to a certain scale, the cost of APM and Infrastructure hosts in Datadog can become somewhat prohibitive. Also, Datadog's custom metrics pricing is somewhat expensive, and its query language capabilities do not quite match the power of PromQL, which you start to find yourself needing to debug issues. At that point, the self-hosted LGTM stack starts to make sense. However, it involves a lot more education for end users in both integration (a little less now that OTel is popular) and querying/building dashboards etc., and you also have to run it yourself. The Grafana Cloud platform is more attractive though.
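To make the point about structured logs carrying trace/span ids concrete, here is a minimal sketch using only the Python standard library. The field names `trace_id` and `span_id`, the logger name, and the id values are placeholders for illustration; a real setup would take the ids from whatever tracing library is in use and follow the field naming that the log backend expects for correlation.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical field names; they are set via `extra=` below.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


stream = io.StringIO()
logger = build_logger(stream)

# In a real service these ids would come from the active trace context.
logger.info("payment authorized", extra={"trace_id": "abc123", "span_id": "def456"})

line = stream.getvalue().strip()
print(line)
```

Because every line is a JSON object with the same id fields, a backend can join a log line to the trace that produced it, which is the kind of drill-down described above.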
by willejs - 5 hours ago
Earlier this year, we at Listen Notes switched to Better Stack [0], replacing both Datadog and PagerDuty, and we couldn’t be happier :) Datadog offers a rich set of features, and as a public company, it makes sense for them to keep expanding their product and pushing larger contracts. But as a small team, we don't have a strong demand for constant new features. By switching to Better Stack, we were able to cut our monitoring and alerting costs by 90%, with basically the same things that we used from Datadog previously.
[0] https://www.listennotes.com/blog/use-betterstack-to-replace-...
by wenbin - 4 hours ago