Show HN: I built a tensor library from scratch in C++/CUDA
4 comments
·June 18, 2025aklein
I noticed you interface with the native code via ctypes. I think cffi is generally preferred (eg, https://cffi.readthedocs.io/en/stable/overview.html#api-mode...). Although you'd have more flexibility if you build your own python extension module (eg using pybind), which will free you from a simple/strict ABI. Curious if this strict separation of C & Python was a deliberate design choice.
kajecounterhack
Cool stuff! Is the goal of this project personal learning, inference performance, or something else?
Would be nice to see how inference speed stacks up against say llama.cpp
liuliu
Both uses cublas under the hood. So I think it is similar for prefilling (of course, this framework is too early and don't have FP16 / BF16 support for GEMM it seems). Hand-roll gemv is faster for token generation hence llama.cpp is better.
helltone
This is very cool. I'm wondering if some of the templates and switch statements would be nicer if there was an intermediate representation and a compiler-like architecture.
I'm also curious about how this compares to something like Jax.
Also curious about how this compares to zml.
Hi HN,
Over the past few months, I've been building `dsc`, a tensor library from scratch in C++/CUDA. My main focus has been on getting the basics right, prioritizing a clean API, simplicity, and clear observability for running small LLMs locally.
The key features are: - C++ core with CUDA support written from scratch. - A familiar, PyTorch-like Python API. - Runs real models: it's complete enough to load a model like Qwen from HuggingFace and run inference on both CUDA and CPU with a single line change[1]. - Simple, built-in observability for both Python and C++.
Next on the roadmap is adding BF16 support and then I'll be working on visualization for GPU workloads.
The project is still early and I would be incredibly grateful for any feedback, code reviews, or questions from the HN community!
GitHub Repo: https://github.com/nirw4nna/dsc
[1]: https://github.com/nirw4nna/dsc/blob/main/examples/models/qw...