Hi everyone, I just released an open-source load testing tool for LLMs:
https://github.com/twerkmeister/tokenflood
=== What is it and what problems does it solve? ===
Tokenflood is a load testing tool for instruction-tuned LLMs that can simulate arbitrary LLM loads in terms of prompt, prefix, and output lengths and requests per second. Instead of first collecting prompt data for different load types, you can configure the desired parameters for your load test and you are good to go. It also lets you assess the latency effects of potential prompt parameter changes before spending the time and effort to implement them.
I believe it's really useful for developing latency-sensitive LLM applications, in particular for:
* Load testing self-hosted LLM setups
* Assessing the latency benefit of changes to prompt parameters before implementing those changes
* Assessing latency, and its intraday variation, on hosted LLM services before sending your traffic there
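To make the "configure the desired parameters" part concrete, here is a minimal sketch of the underlying idea, assuming an OpenAI-compatible chat completions endpoint. This is not tokenflood's actual code or config format; the endpoint URL, model name, and the rough characters-per-token padding are placeholders for illustration only.

    # Minimal illustrative sketch, NOT tokenflood's actual code or config schema:
    # synthesize a prompt of a target length, cap the output with max_tokens, and
    # send requests at a fixed rate against an OpenAI-compatible endpoint while
    # recording per-request latency. Endpoint, model name, and the crude
    # characters-per-token padding are assumptions for illustration.
    import statistics
    import threading
    import time
    import requests

    ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed local server
    MODEL = "my-model"                                       # placeholder name
    PROMPT_TOKENS = 512      # target prompt length
    OUTPUT_TOKENS = 64       # target output length, enforced via max_tokens
    REQUESTS_PER_SECOND = 2
    DURATION_SECONDS = 30

    latencies = []

    def one_request(prompt):
        start = time.perf_counter()
        requests.post(ENDPOINT, json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": OUTPUT_TOKENS,
        }, timeout=120)
        latencies.append(time.perf_counter() - start)

    def main():
        # ~4 characters per token is a rough rule of thumb; a real tool would use
        # the model's tokenizer to hit the target length exactly.
        prompt = ("lorem " * PROMPT_TOKENS)[: PROMPT_TOKENS * 4]
        interval = 1.0 / REQUESTS_PER_SECOND
        threads = []
        end = time.monotonic() + DURATION_SECONDS
        while time.monotonic() < end:
            t = threading.Thread(target=one_request, args=(prompt,))
            t.start()
            threads.append(t)
            time.sleep(interval)  # fixed send rate, independent of response times
        for t in threads:
            t.join()
        print(f"requests sent: {len(latencies)}")
        print(f"p50 latency: {statistics.median(latencies):.2f}s")
        print(f"p95 latency: {statistics.quantiles(latencies, n=20)[18]:.2f}s")

    if __name__ == "__main__":
        main()

In tokenflood these knobs (prompt, prefix, and output lengths, requests per second) come from the load test configuration instead of being hard-coded into ad-hoc scripts.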
=== Why did I build it? ===
Over the course of the past year, part of my work has been helping my clients meet their latency, throughput, and cost targets for LLMs (PTUs, anyone?). That process involved making numerous choices about cloud providers, hardware, inference software, models, configurations, and prompt changes. During that time I found myself doing similar tests over and over with a collection of ad-hoc scripts. I finally had some time on my hands and wanted to put it all together properly in one tool.
=== What am I looking for? ===
I am sharing this for three reasons: hoping it can make others' work on latency-sensitive LLM applications simpler, learning and improving from feedback, and finding new projects to work on.
So please check it out on GitHub (https://github.com/twerkmeister/tokenflood), comment, and reach out at thomas@werkmeister.me or on LinkedIn (https://www.linkedin.com/in/twerkmeister/) for professional inquiries.
=== Pics ===
image of cli interface: https://github.com/twerkmeister/tokenflood/blob/main/images/...
result image: https://github.com/twerkmeister/tokenflood/blob/main/images/...