Monitoring my Minecraft server with OpenTelemetry and Prometheus

60 comments

·May 8, 2025

My kids demand SLOs stricter than Moon exploration technology, so I had to monitor our family’s Minecraft server like a pro. As luck would have it, I am one.

darknavi

> Microsoft made things confusing by adding the Bedrock server, which reportedly uses a combination of C, C# and Java,

No C# in Bedrock. No Java unless you're talking about the Android versions. Very little C.

It's mostly C++.

mmanciop

Thanks for setting me straight :-) I updated the article to reflect that.

doabell

> I am a man of simple tastes, and running the “vanilla” Minecraft server as a Systemd unit on a Linux VM in the cloud

Minecraft is famously under-optimized and needy in terms of CPU frequency. If you are running a vanilla (no server mods) version, then switching to something optimized like PaperMC is a better idea for datacenter VMs. (Until you need to dupe sand or something.)

The other route is installing a bunch of optimization mods - some really do help.

ehnto

People love to bother about Java MC performance, but I ran a modded Tekkit server for like 10 years on a base Digital Ocean VM. Shoutout to Digital Ocean for having no impactful changes for 10 years too. They give me a VM, I run the thing, life is good.

strogonoff

From my understanding, Paper and the like are good for Minecraft servers focused around specific mini-games (rather than freeform building), and are the only sensible choice for servers with many people (or not that many people, but really underpowered hardware).

However, they may be a problem if players are sensitive to possible non-vanilla behaviour (as you mentioned, and it’s not limited to cheaty duping). Thankfully, spinning up a server with a selection of performance mods is very easy these days. Various tricks like pre-generating chunks in advance also help.

treyd

It's kinda nuts. The upstream mojang server binary starts to groan if you have >4-5 players on the same server doing stuff. They've really been dropping the ball on optimization in recent years.

Paper is good enough for anyone but very technical players pushing to the limits of redstone tick timing logic, entity behavior, chunk loading mechanics, etc. These don't matter even for advanced players doing normal things.

mmanciop

I actually had to splurge on 2 vCPUs on Digital Ocean to avoid "skipping ticks", and it does sound pretty nuts to me. We play with 3 players at most. I would expect a server under such a load to be able to run on a slightly tuned-up toaster.
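
For context on what "skipping ticks" means in numbers, here is a minimal sketch. It relies on the fact that a Minecraft server targets 20 ticks per second, which leaves 50 ms per tick. The function name `effective_tps` and the sample tick times are made up for illustration; they are not taken from the article or from any server API.

```python
TARGET_TPS = 20                      # the server aims for 20 ticks per second
TICK_BUDGET_MS = 1000 / TARGET_TPS   # which leaves 50 ms of work per tick


def effective_tps(mean_tick_ms):
    """Estimate ticks per second from the mean time spent on one tick."""
    if mean_tick_ms <= TICK_BUDGET_MS:
        # Within budget: the server idles for the rest of the tick.
        return float(TARGET_TPS)
    # Over budget: ticks take longer than 50 ms, so fewer fit into a second.
    return 1000 / mean_tick_ms


# Hypothetical mean tick times in milliseconds.
for ms in (12.5, 50.0, 80.0):
    print(f"{ms:5.1f} ms/tick -> {effective_tps(ms):.1f} TPS")
```

Anything above 50 ms per tick means the game world runs slower than real time, which is roughly what players notice as lag.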

frollogaston

Wasn't it always like this? There's a lot going on in the game, especially if generating new chunks, and it's in Java.

strogonoff

Monitoring and metric collection makes a lot of sense when you run a production system, or a personal but critical system.

Promoting a telemetry solution when it comes to a hobby server, which you host for yourself and which can’t bankrupt you by running up a massive AWS bill, doesn’t seem to make much sense when simply bottling it up in Docker and being able to restart or recreate at will is enough (mount volumes for logs and persistent data, back it up, and you’re good).

With games like Minecraft in particular there’s value in being able to have multiple servers with different worlds, perhaps different mods, etc. If you decide not to have more servers because they are snowflakes you do not have time to set up monitoring for then you rob yourself and your players of the opportunity to have more fun.

Furthermore, containerizing it allows you to upgrade quickly as new game versions come out, by simply spinning up a new container with your preexisting world as a test, and you get basic system resource usage monitoring built in.

What I think could be a more interesting exercise is a dashboard for friends or family that lets them manage the lifetime and configuration of their respective containers.

gmuslera

Implementing proper monitoring in a toy system doesn't prepare you to do it in a massive critical system, but at least you may have learned something in the process, and noticed things that may not be as evident at big scale.

In any case, the fun starts when the system has more interdependent components.

strogonoff

I think there is value in learning which pattern is good to apply in which scenario, and I will argue that in this case the best pattern is “servers are cattle”.

mmanciop

One of the stretch goals for me in writing this article was indeed to show, between the lines, how Prometheus Exporters, the OpenTelemetry Collector and Systemd can all work together. That is a very reusable skill for monitoring workloads running outside containers on Linux VMs or hosts.
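
To make the exporter part a little more concrete: a Prometheus exporter serves plain-text samples over HTTP, which a scraper then parses. The following is only a rough sketch of reading that text format with the Python standard library, not the article's setup. The metric names in `EXAMPLE` are invented for illustration, and the parser skips details such as timestamps and unescaping of label values.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: name="value" (escapes are left as-is).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Parse text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Hypothetical exporter output; these metric names are assumptions.
EXAMPLE = """\
# HELP mc_players_online Players currently connected
# TYPE mc_players_online gauge
mc_players_online 3
mc_tick_duration_seconds{world="overworld"} 0.048
"""

for name, labels, value in parse_exposition(EXAMPLE):
    print(name, labels, value)
```

In a real pipeline the OpenTelemetry Collector or Prometheus does this parsing for you; the sketch only shows how little magic is in the wire format.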

jeroenhd

The goal of this article is to show you how to integrate with this service from just about anything. It's an ad that was fun to make as a hobby project. I doubt the goal was ever to set up a fully integrated Minecraft monitoring pipeline. At best, this is an employee of this company who decided to show the flexibility of their product by integrating it with a random piece of kit they like.

Luckily, all of the interesting components are existing third party libraries so if you don't want to use their SaaS service, you can build your own Minecraft dashboard pretty easily.

mmanciop

I am indeed an employee of Dash0. The telemetry collection setup will work with anything that accepts OTLP, and with minor adjustments the data can be sent elsewhere in other formats too, as the OpenTelemetry Collector is very flexible in that regard.

Alerting is specific to Dash0. I know of no other monitoring solution that lets you run real PromQL on logs. But there will be similar ways of accomplishing the same alerting logic.

dpe82

Have you never just built something for fun?

dengolius

Do you mean something like launching k3s on smartphones https://blog.denv.it/posts/pmos-k3s-cluster/?

strogonoff

I have built a panel like the one I mentioned for fun with friends!

The goal of my comment was to highlight opportunities for more fun and less what seems like toil.

Furthermore, this is an article about a telemetry solution posted on a site of that telemetry solution. They make money from this.

dewey

One person's toil is another person's fun.

koinedad

I’ve recently added telemetry to some “toy” apps at my house because a power outage or other unforeseen issue has caused things like my Siri-enabled garage doors to stop working. Now I get alerts through Grafana and Telegram basically for free, which comes in handy.

strogonoff

A garage door is a security concern.

For a game, a solution that simply restarts the container if it’s down solves the issue. You can mount game logs in a volume if you want, and you can see resource usage in container host dashboard. What value do detailed system metrics bring?

Furthermore, you don’t care what software you run to make your garage door system Siri-enabled, as long as it does its job and is not vulnerable; whereas with a game that adds new gameplay features multiple times per month, you do want to update it frequently. Babysitting a snowflake server makes it way more difficult than it should be.

ajmurmann

I am currently planning to add monitoring to some toy apps I host on a Raspberry Pi cluster. The intent is that this might save me time and stress further down the road. If a new version makes performance worse, I want to see that in the data. If resource needs go up, I want to know that before it's time to move, so that I can plan without any kind of scheduling stress. (I also want to do this in part as an exercise, which is part of the motivation for the cluster and for most things I build that run on it. But don't tell anyone!)

Am I misguided?
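
As a rough illustration of "seeing it in the data", here is a sketch that compares the 95th percentile of some measurement before and after an upgrade. Everything in it is an assumption made for the example: the helper names, the 10% tolerance, and the sample values. It uses `statistics.quantiles` from the Python standard library (Python 3.8 or newer).

```python
import statistics


def p95(samples):
    """95th percentile of the samples (needs at least two data points)."""
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples, n=20)[-1]


def regressed(before, after, tolerance=0.10):
    """True if the p95 of `after` exceeds the p95 of `before` by more than `tolerance`."""
    return p95(after) > p95(before) * (1 + tolerance)


# Hypothetical response times in milliseconds for two app versions.
old_version = [10.0] * 20
new_version = [10.0] * 10 + [20.0] * 10

print("p95 before:", p95(old_version))
print("p95 after: ", p95(new_version))
print("regression:", regressed(old_version, new_version))
```

A percentile is used instead of the mean so that a few slow outliers after an upgrade are not averaged away.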

strogonoff

Well, as far as I’m concerned, if they are toy apps, why stress? If they are going to go in production at some point, then sure; but this certainly is not happening with a family game server.

ajmurmann

Family game server going down can be very stressful, especially if you have kids.

Also, I've had phone tech support sessions with family that were more stressful than calls with large banks who were worried about losing very large amounts of money in case of an outage. Different stressful, but nonetheless...

jauntywundrkind

Seeing what computers are doing is good, actually. Period.

harrall

Setting up telemetry is really easy if you’ve done it before and it’s a learning opportunity if you haven’t.

I have Dockerfiles from 10 years ago for Grafana and a time-series DB so basically you learn it once and you can bang out basic telemetry infra in an hour afterwards.

And I still actually use InfluxDB and Grafana for my hobby stuff. My current Dockerfiles just look like my old ones…

strogonoff

What happens if Grafana or InfluxDB is down? Who monitors the monitors?

mmanciop

For this, I have the impression that https://github.com/dirien/minectl might be very close to what you are thinking of. I did not try it, but I took the Minecraft Exporter from it and used it in my setup.

cpburns2009

> The minecraft-prometheus-exporter ... which uses Fabric, another way to run Minecraft servers with mods. Like Bukkit, Fabric was not an option for me.

Forge and its recent fork Neoforge are supported too.

Lirael

[dead]

Calliope1

[flagged]

Yasuraka

Why are you doing this?

mmanciop

Because I enjoy observability and monitoring a LOT, and because my kids nag me to hell and back when our home IT infrastructure is having a bad day.

Yasuraka

I was asking some LLM spammer

acedTrex

Hi gippity