Show HN: GPULlama3.java Llama Compilied to PTX/OpenCL Now Integrated in Quarkus

6 comments

·December 11, 2025

wget https://github.com/beehive-lab/TornadoVM/releases/download/v... unzip tornadovm-2.1.0-opencl-linux-amd64.zip # Replace <path-to-sdk> manually with the absolute path of the extracted folder export TORNADO_SDK="<path-to-sdk>/tornadovm-2.1.0-opencl" export PATH=$TORNADO_SDK/bin:$PATH

tornado --devices tornado --version

# Navigate to the project directory cd GPULlama3.java

# Source the project-specific environment paths -> this will ensure the source set_paths

# Build the project using Maven (skip tests for faster build) # mvn clean package -DskipTests or just make make

# Run the model (make sure you have downloaded the model file first - see below) ./llama-tornado --gpu --verbose-init --opencl --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "tell me a joke"

Visit

mikepapadim

https://github.com/beehive-lab/GPULlama3.java

lostmsu

Does it support flash attention? Use tensor cores? Can I write custom kernels?

UPD. found no evidence that it supports tensor cores, so it's going to be many times slower than implementations that do.

mikepapadim

Yes, when you use the PTX backend it supports Tensor Cores.It has also implementation for flash attention. You can also write your own kernels, have a look here: https://github.com/beehive-lab/GPULlama3.java/blob/main/src/... https://github.com/beehive-lab/GPULlama3.java/blob/main/src/...

lostmsu

TornadoVM GitHub has no mentions of tensor cores or WMMA instructions. The only mention of tensor cores is in 2024 and states they are not used: https://github.com/beehive-lab/TornadoVM/discussions/393

mikepapadim

https://github.com/beehive-lab/TornadoVM/pull/732 https://github.com/beehive-lab/TornadoVM/pull/313

sliicemasternet

[dead]