Some notes on Grafana Loki's new "structured metadata"
45 comments · March 16, 2025
DeathArrow
Has anyone used both Grafana Loki and Kibana? Does Loki have any advantages over Kibana? I am mostly interested in resource usage and the versatility of filtering.
In Kibana, if something is there I will find it with ease, and it doesn't take a lot of time to investigate issues in a microservice-based application. It is also quite fast.
feydaykyn
Compared to Kibana, we've seen:
- 3x lower costs
- no more index corruption because a key changed type
- slower performance for queries over 1 day, especially unoptimized ones without any filtering
- unintuitive UI/UX
So good but not perfect! When we have the time we'll look for alternatives.
kbouck
Re: storage, Kibana (Elastic) has a new (as of v8.17) "logsdb" index mode which claims to be ~2.5x more storage efficient than previous options.
valyala
Did you try VictoriaLogs? It is easier to configure than Loki and Elasticsearch, and it uses less CPU and RAM. See https://itnext.io/how-do-open-source-solutions-for-logs-work...
feydaykyn
Stop spamming. You want your product known? Create a dedicated post.
suraci
the good part:
1. It is extremely resource-efficient.
2. It has a convenient and simple query language.
3. It works very well with traces and metrics.
the pain part:
1. It struggles to query logs over a wide time range.
2. Its indexing (or labeling) capabilities are very limited, similar to Prometheus.
3. Due to 1 and 2, it is difficult to configure and use correctly to avoid errors related to usage limits (e.g., maximum series limits).
valyala
> 2. It has a convenient and simple query language
IMHO, Loki's query language is the most inconvenient log query language I've seen:
- It doesn't support calculating multiple stats in a single query. For example, it cannot calculate the number of logs and the average request duration in a single query.
- Its syntax for aggregate functions is very unintuitive and hard to use, especially if you aren't familiar with PromQL.
- It requires putting an annoying "|=" separator between the words and phrases you are searching for in logs.
- You need to resort to a JSON-parsing hack whenever you want to filter on or calculate stats over log fields (see the sketch below).
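For example (the stream selector and the duration_ms field below are made up, just to show the shape of the queries):

    # Searching for two words requires chaining |= filters:
    {app="checkout"} |= "payment" |= "failed"

    # Stats over a log field require parsing the line and unwrapping the field;
    # counting matching logs and averaging a duration are two separate queries:
    sum(count_over_time({app="checkout"} |= "payment" [5m]))
    avg_over_time({app="checkout"} | json | unwrap duration_ms [5m])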
jakozaur
Kibana is great, but managing Elasticsearch is a pain: slow ingestion, slow queries, it takes a lot of space, and it's harder to tune.
Modern columnar SQL databases such as ClickHouse are 10+ times more efficient in real-world use cases.
I'm the CEO and founder of Quesma, which lets you use Kibana with ClickHouse: https://quesma.com/
Forever free, source-available license.
ohgr
Kibana + Elasticsearch was a mess for us. Was glad to get rid of it. It cost a fortune to run and was time-consuming. Loki, conversely, doesn't even show up on our cost report (other than the S3 bucket) and requires very little if any maintenance!
Also, the out-of-the-box configuration happily sinks 1 TB/hr in microservices mode.
remram
ELK could never deal with my logs, which are sometimes-JSON. Loki can ingest and query them just fine. Also, the query/extraction language makes a lot more sense to me.
valyala
Elasticsearch can store arbitrary text in log fields, including JSON-encoded strings. It can also tokenize a JSON-encoded string and provide fast full-text search over it, in the same way it does for a regular plaintext string.
Why do you need to store a JSON-encoded string inside a log field? It is much better to parse the JSON into separate fields at the log shipper and store the parsed fields in Elasticsearch. This gives better query performance and may also reduce disk space usage, since the values of every parsed field are stored separately (which usually improves the compression ratio and reduces disk read IO during queries if column-oriented storage is used for per-field data).
I tried explaining this at https://itnext.io/why-victorialogs-is-a-better-alternative-t...
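For illustration, a minimal shipper-side sketch with Fluent Bit's parser filter (the "log" field name and the parser name are just examples):

    # fluent-bit.conf: parse the JSON payload into separate top-level fields,
    # keeping the other fields already on the record
    [FILTER]
        Name         parser
        Match        *
        Key_Name     log
        Parser       json
        Reserve_Data On

    # parsers.conf
    [PARSER]
        Name         json
        Format       json

Other shippers (Vector, Logstash, the OTel Collector) have equivalent transforms.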
remram
The problem is that log messages are collected from many researcher-authored applications. Some of them are JSON, in which case I want to parse them and store the parsed JSON as you say. But some of them are not JSON, in which case I need to store the text.
I was not able to do that with the log shipper. If I configured parsing, then the non-JSON messages would get dropped.
kbouck
If your source emits logs in OpenTelemetry format, you could put an OTel Collector in between and do the sometimes-JSON parsing of the log content before it reaches the backend.
parliament32
Yes, we switched metrics and logs from an Elastic stack to Prometheus/Thanos/Loki/Grafana about two years ago. On the logs side specifically, resource usage is WAY lower (300 eps is about 1.5 cores and 4 GB of memory), not to mention going from persistent volumes (disks) to blob storage / S3 is far cheaper and doesn't require any maintenance. Queries are slower, however, because Elastic pre-indexes while Loki searches on demand, so it really comes down to query volume and your need for query performance (does it matter if your search takes 300ms vs 3s?). I've also found that running Elastic yourself requires constant maintenance, while Loki has been very hands-off. Strongly recommend.
Unroasted6154
Loki was much cheaper to run in my experience, using S3 storage. And you could scale the parts you needed dynamically in K8s.
Elastic was kind of a resource hog and much more expensive for the same amount of data.
That might be dependent on your use case though.
weitzj
From the enterprise perspective, at least for my use cases (fine-grained permissions using an extra id), Elasticsearch with Kibana always had a solution available.
With Grafana Cloud and Loki you can get close to good usability with LBAC (label-based access control), but you still need many data sources to map onto each "team view" to make it user-friendly.
What is missing for me is what Elastic offers: a single data source for all logs that every team member across all teams can see, with the visibility level scoped via LBAC.
ptman
@valyala, as others have noted, you are the CEO of VictoriaMetrics and have written (most of?) VictoriaLogs. How is VictoriaLogs coming along? This is an older blog post.
Dusseldorf
I switched our team over to VictoriaLogs from ELK when VL1.0 was released a few months back and we've been very happy with it. Nowhere near as much finicky performance tuning, no more logs failing to ingest because a string looked a bit too numeric, and the query language has fewer weird gotchas.
At the end of the day ELK was throwing us a bunch of roadblocks in order to solve problems we didn't need solved. Maybe if we were trying to build some big analysis layer on top of our logs that would've been nice. VL has worked great for our use case of needing to centralize and view logs.
valyala
VictoriaLogs is free from the issues mentioned in the referenced article. It has supported log fields with a large number of unique values (such as user_id, trace_id, ip, etc.) from the beginning, and it doesn't need any configuration to work with such fields. It automatically indexes all log fields and provides fast full-text search over all the ingested log fields.
This is explained in more detail at https://itnext.io/why-victorialogs-is-a-better-alternative-t...
jiveturkey
(2024)
important because the title includes _new_
kbouck
It's also not ideal to have a different query language for each Grafana datastore (LogQL, PromQL, TraceQL). Are there any plans to make a unified Grafana query language?
jakozaur
There is an effort in OpenTelemetry to create a standard query language for observability. There have been a lot of discussions with a lot of opinions; there were even several talks about it during KubeCon EU.
We are still waiting for a compelling implementation that will show the way.
ople
Why not just use SQL? With LLMs evolving to do sophisticated text-to-SQL, the case for a custom language for the sake of simplicity is diminishing.
I think that expressiveness, performance and the level of fluency of base language models (i.e. the amount of examples in the training set) are the key differentiators for query languages in the future. SQL ticks all those boxes.
jakozaur
You are right. SQL is the best language, but it likely needs some extensions. See SQL with pipe syntax: read the Google paper or try it out in BigQuery.
There are a lot of fundamental operations in observability, but they are very verbose in SQL:
- the rate operator, which converts an absolute counter value into a rate; possible in SQL with window functions, but it takes many lines of code
- pivot, where you'd like to see error counts over time for the top 5 most error-hit microservices plus everything else (roughly sketched below)
- sampling, which is frequent in observability and will be useful for LLMs; it is a one-liner in SQL with pipe syntax, even when you customize the specifics
I actually believe LLMs / gen AI play extremely well with pipe syntax. It allows us to feed partial results to the LLM, to sample, and to show how the LLM evolves queries over time. SQL troubleshooting is not a single query but a series of them.
Still, SQL with pipe syntax is just syntactic sugar on top of SQL. It lets you use all SQL features and compiles down to SQL.
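For example, the pivot-ish "top services by error count" question above reads roughly like this in BigQuery's pipe syntax (the logs table and its columns are made up):

    FROM logs
    |> WHERE severity = 'ERROR'
       AND ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
    |> AGGREGATE COUNT(*) AS errors GROUP BY service
    |> ORDER BY errors DESC
    |> LIMIT 5

Each further refinement (time bucketing, the "other" bucket, sampling) is just another |> stage appended at the end, which is also what makes it easy to hand partial results to an LLM.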
pbh101
I think I’m probably not interested in this. PromQL is already relatively dense to learn, but reasonably well fit to the domain model and internally consistent, unlike most other metric querying tools I’ve tried over the years.
Maybe that would work as well with traces and logs, but IMO the problem space is quite different, and I'm not sure how much value we'd get from a unified language where some subsets only apply to parts (i.e. traces vs. logs vs. metrics), as opposed to spiritually similar but distinct languages.
pjd7
Unifying things slows engineers down, so probably not (for some time).
Wheaties466
Well, not having a unified language slows things down on the other end.
slekker
A disclaimer: the OP is the CEO of another company in the same sector.
NewJazz
Chris Siebenmann? What company is he the CEO of?
fsiefken
No, the OP of the HN thread is from VictoriaMetrics (open source); he's not Chris Siebenmann, the Unix systems administrator at the University of Toronto's Computer Science Labs.
duckydude20
I find him everywhere there's a mention of Loki, trying so hard to push Loki down. Sure, Loki has issues, but I don't feel this is a good approach to promoting your product.
NewJazz
Ah, the poster, not the article author. Gotcha.
xyz-x
Does anyone have experience using Apache Iceberg, Hudi or DeltaLake for storing logs instead of Loki?
pbh101
danluu.com mentioned this approach (or just 'big data' systems) for traces and metrics IIRC. Not sure if for logs too. Aren't all of those relatively tabular? What would you be looking for from those tools to help with logs?
Not much I agree with in this article. It seems to be based on little operational experience with the product, as indicated in particular by a couple of major mistakes and assumptions in the article (compaction does happen; the manual on deployment configurations clearly wasn't read).
Loki has its idiosyncrasies, but they are there for a good reason. Anyone who has sat there waiting hours for a Kibana or Splunk query to run to get some information out will know what I'm referring to. You don't dragnet your entire log stream unless your logs are terrible, which needs to be fixed, or you don't know when something happened, which needs fixing. I watch many people run queries that scan terabytes of data with gay abandon on a regular basis on older platforms and still never get what they need out.
The structured metadata distinction is important because when you query against it you are not using an index, just parsed-out data. That means, explicitly, that you're not filtering, you're scanning, and that is expensive.
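For example (label and field names are made up; this assumes trace_id is stored as structured metadata rather than as an index label):

    # Stream selector: matched against the label index, cheap to narrow down
    {service_name="checkout", env="prod"} |= "timeout"

    # Structured metadata filter: applied only after the matching streams are
    # fetched, so this scans the selected chunks instead of using the index
    {service_name="checkout", env="prod"} |= "timeout" | trace_id="0242ac120002"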
If you have a problem with finding things, then it's not the logging engine, it's the logs!