Show HN: LocalScore – Local LLM Benchmark

26 comments

·April 3, 2025

Hey Folks!

I've been building an open source benchmark for measuring local LLM performance on your own hardware. The benchmarking tool is a CLI written on top of Llamafile to allow for portability across different hardware setups and operating systems. The website is a database of results from the benchmark, allowing you to explore the performance of different models and hardware configurations.

Please give it a try! Any feedback and contribution is much appreciated. I'd love for this to serve as a helpful resource for the local AI community.

For more check out: - Website: https://localscore.ai - Demo video: https://youtu.be/De6pA1bQsHU - Blog post: https://localscore.ai/blog - CLI Github: https://github.com/Mozilla-Ocho/llamafile/tree/main/localsco... - Website Github: https://github.com/cjpais/localscore

Visit

mentalgear

Congrats on the effort - the local-first / private space needs more performant AI, and AI in general needs more comparable and trustworthy benchmarks.

Notes: - Olama integration would be nice - Is there an anonymous federated score sharing? That way, users you approximate a model's performance before downloading it.

sipjca

Can you tell me more about the "anonymous federated score sharing"? Maybe something we can think about more

I totally agree with Ollama integration and if there is interest we will try to upstream into llama.cpp

jsatok

Contributed scores for the M3 Ultra 512 GB unified memory: https://www.localscore.ai/accelerator/404

Happy to test larger models that utilize the memory capacity if helpful.

deanputney

That's very interesting. I guess it just can't compete with any of the Nvidia cards? I would think your results should show up if sorted by "generation"– maybe the leaderboard is cached...

zamadatix

Non-VRAM based options of the moment are only competitive when it comes to the size of a model they can run, since the alternative is "you can't run it at that price otherwise".

Bandwidth at a given size is king, only then followed by enough compute to utilize it.

sipjca

Ty for pointing this out, the results are taken from the db based on LocalScore, I will make some modifications to make the sorting better here

david_draco

I don't know if I should trust and run this code. If it was associated to Mozilla I would. It says it is a Mozilla Builders project, but https://builders.mozilla.org/projects/ does not list it. I don't see a way to verify that localscore.ai is associated with Mozilla.

zamadatix

It doesn't seem they update the site often (the last 'latest' post is from December) but they reposted something claiming the same on X https://x.com/llamafile/status/1907917417118105751

zamadatix

The run and/or troubleshooting steps for Windows should probably include the note you need to install https://developer.nvidia.com/cuda-downloads?target_os=Window... if you have an Nvidia GPU (and probably something similar if you have an AMD GPU?). As it is right now the steps happily get you benchmarking your CPU and I'd say that might even be worth adding a "Warning: The benchmark is operating in CPU only mode, press y to continue if this is intended" type message to the program.

Edit: And for the same prompt and generated token counts it runs ~4x slower than `ollama run hf.co/bartowski/Qwen2.5-14B-Instruct-GGUF:Q4_K_M --verbose`. It's possible I'm mixing up a few things there but my results also post in the same ballpark slower than others with the same GPU so it seems something is up with the application in either case.

roxolotl

This is super cool. I finally just upgraded my desktop and one thing I’m curious to do with it is run local models. Of course the ram is late so I’ve been googling trying to get an idea of what I could expect and there’s not much out there to compare to unless you’re running state of the art stuff.

I’ll make sure to run contribute my benchmark to this once my ram comes in.

jborichevskiy

Congrats on launching!

Stoked to have this dataset out in the open. I submitted a bunch of tests for some models I'm experimenting with on my M4 Pro. Rather paltry scores compared to having a dedicated GPU but I'm excited that running a 24B model locally is actually feasible at this point.

FloatArtifact

I've been waiting for something like this. Have you considered the following based on the benchmark data that's submitted beyond the GPU?

1. User selects a model, size and token output speed and latency. The website generates a hardware list of components that should match the performance requirements.

2. User selects hardware components and the website generates a list of models that performant on that hardware.

3. Monetize through affiliate links the components to fund the project. Think like PC part picker.

I know there's going to be some variability in the benchmarks due to the software stack, but it should give a AI enthusiasts, an educated perspective on what hardware can be relevant for their use case.

sipjca

Right now the main priority is just getting the data out, but in the future may have some interest in this. Or perhaps we can open an API for others to build this as well

chrismorgan

I’m curious: does this fundamentally need to contain an actual model, or would it be okay if it generated a synthetic model itself, full of random weights? I’m picturing downloading just, say, a 20MB file instead of the multi-gigabyte one, and…

Hang on, why is https://blob.localscore.ai/localscore-0.9.2 380MB? I remember llamafile being only a few megabytes. From https://github.com/Mozilla-Ocho/llamafile/releases, looks like it steadily grew from adding support for GPUs on more platforms, up to 28.5MiB¹ in 0.8.12, and then rocketed up to 230MiB in 0.8.13:

> The llamafile executable size is increased from 30mb to 200mb by this release. This is caused by https://github.com/ggml-org/llama.cpp/issues/7156. We're already employing some workarounds to minimize the impact of upstream development contributions on binary size, and we're aiming to find more in the near future.

Ah, of course, CUDA. Honestly I might be more surprised that it’s only this big. That monstrosity will happily consume a dozen gigabytes of disk space.

llamafile-0.9.0 was still 231MiB, then llamafile-0.9.1 was 391MiB, now llamafile-0.9.2 is 293MiB. Fluctuating all over the place, but growing a lot. And localscore-0.9.2 is 363MiB. Why 70MiB extra on top of llamafile-0.9.2? I’m curious, but not curious enough to investigate concretely.

Well, this became a grumble about bloat, but I’d still like to know whether it would be feasible to ship a smaller localscore that would synthesise a suitable model, according to the size required, at runtime.

—⁂—

¹ Eww, GitHub is using the “MB” suffix for its file sizes, but they’re actually mebibytes (2²⁰ bytes, 1048576 bytes, MiB). I thought we’d basically settled on returning the M/mega- prefix to SI with its traditional 10⁶ definition, at least for file sizes, ten or fifteen years ago.

sipjca

LocalScore dev here

Llamafile could certainly be released without the GPU binaries included by default and it would slim down the size tremendously.

The extra 70MiB is that the CUDA binaries for LocalScore are built with CuBLAS and for more generations of NVIDIA architectures (sm60->sm120), whereas Llamafile is built with TinyBLAS and for just a few generations in particular

I think it's possible to randomize weights with a standard set of layers, and maybe a possibility for the future

alchemist1e9

Really awesome project!

Clicking on GPU is a nice simple visualization. I was thinking maybe try to put that type of visual representation intuitively accessible immediately on the landing page.

cpubenchmark.net could he an example technique of drawing the site visitor into the paradigm.

sipjca

I think you might be right, definitely interested in this feedback and creating charts and graphs that are the most useful for folks!

omneity

This is great, congrats for launching!

A couple of ideas .. I would like to benchmark a remote headless server, as well as different methods to run the LLM (vllm vs tgi vs llama.cpp ...) on my local machine, and in this case llamafile is quite limiting. Connecting over an OpenAI-like API instead would be great!

sipjca

LocalScore dev here

Thank you! I think this is quite possible! If you don't mind starting a discussion on this I would love to think aloud there

https://github.com/cjpais/LocalScore/discussions

null

[deleted]

sharmasachin98

This looks super useful, especially with so many folks experimenting with local LLMs now. Curious how well it handles edge devices. Will give it a try!