Tabby: Self-hosted AI coding assistant
31 comments
· January 12, 2025
maille
Do you have a plugin for MSVC?
wsxiaoys
Not yet; consider subscribing to https://github.com/TabbyML/tabby/issues/322 for future updates!
tootie
Is it only compatible with Nvidia and Apple? Will this work with an AMD GPU?
wsxiaoys
Yes, AMD GPUs are supported through the Vulkan backend.
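For example (a minimal sketch; assumes a Tabby build with Vulkan enabled, and the model name is just an example):

    # run the completion model on an AMD GPU via the Vulkan backend
    tabby serve --model StarCoder-1B --device vulkan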
thih9
As someone unfamiliar with local AIs and eager to try, how does the “run tabby in 1 minute”[1] compare to e.g. ChatGPT’s free 4o-mini? Can I run that docker command on a medium specced macbook pro and have an AI that is comparably fast and capable? Or are we not there (yet)?
Edit: looks like there is a separate page with instructions for macbooks[2] that has more context.
> The compute power of M1/M2 is limited and is likely to be sufficient only for individual usage. If you require a shared instance for a team, we recommend considering Docker hosting with CUDA or ROCm.
[1]: https://github.com/TabbyML/tabby#run-tabby-in-1-minute
docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model StarCoder-1B --device cuda --chat-model Qwen2-1.5B-Instruct
[2]: https://tabby.tabbyml.com/docs/quick-start/installation/appl...
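For reference, the Apple Silicon path in [2] boils down to a Homebrew install rather than Docker; roughly (tap name and flags as I understand the linked docs):

    # install the standalone binary and run on the Metal backend
    brew install tabbyml/tabby/tabby
    tabby serve --device metal --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct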
coder543
gpt-4o-mini might not be the best point of reference for what good LLMs can do with code: https://aider.chat/docs/leaderboards/#aider-polyglot-benchma...
A teeny tiny model such as a 1.5B one is really dumb and not good at interactively generating code in a conversational way, but models in the 3B-or-under range can do a good job of suggesting tab completions.
There are larger "open" models (in the 32B - 70B range) that you can run locally that should be much, much better than gpt-4o-mini at just about everything, including writing code. For a few examples, llama3.3-70b-instruct and qwen2.5-coder-32b-instruct are pretty good. If you're really pressed for RAM, qwen2.5-coder-7b-instruct or codegemma-7b-it might be okay for some simple things.
> medium specced macbook pro
medium specced doesn't mean much. How much RAM do you have? Each "B" (billion) of parameters is going to require about 1GB of RAM, as a rule of thumb. (500MB for really heavily quantized models, 2GB for un-quantized models... but 8-bit quants use 1GB, and that's usually fine.)
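Putting rough numbers on that rule of thumb for the models mentioned above (all approximate):

    # ~1GB per billion parameters at 8-bit; halve for 4-bit, double for fp16
    #  7B -> ~7GB  at 8-bit (fine on a 16GB machine)
    # 32B -> ~32GB at 8-bit, roughly half that at 4-bit
    # 70B -> ~70GB at 8-bit (realistically a 64GB+ Mac or a multi-GPU server)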
eurekin
Also, context size significantly impacts RAM/VRAM usage, and in programming those chats get big quickly.
eric-burel
Side question: open-source models tend to be less "smart" than proprietary ones. Do you intend to compensate by providing better context (e.g. querying relevant technology docs to feed the context)?
KronisLV
For something similar I use Continue.dev with ollama, it’s always nice to see more tools in the space! But as usual, you need pretty formidable hardware to run the actually good models, like the 32B version of Qwen2.5-coder.
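If anyone wants to try that combination, the ollama side is roughly a one-liner (model tag assumed from the ollama library; Continue.dev then points at the local ollama API):

    # pull and run Qwen2.5-Coder locally
    ollama pull qwen2.5-coder:32b
    ollama run qwen2.5-coder:32b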
SOLAR_FIELDS
> How to utilize multiple NVIDIA GPUs?
> Tabby only supports the use of a single GPU. To utilize multiple GPUs, you can initiate multiple Tabby instances and set CUDA_VISIBLE_DEVICES (for cuda) or HIP_VISIBLE_DEVICES (for rocm) accordingly.
So using 2 NVLinked GPUs for inference is not supported? Or is that situation different because NVLink treats the two GPUs as a single one?
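For reference, the FAQ's workaround seems to amount to something like this (ports and flags are assumptions on my part):

    # one Tabby instance per GPU, each pinned to a device and its own port
    CUDA_VISIBLE_DEVICES=0 tabby serve --model StarCoder-1B --device cuda --port 8080 &
    CUDA_VISIBLE_DEVICES=1 tabby serve --model StarCoder-1B --device cuda --port 8081 &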
wsxiaoys
> So using 2 NVLinked GPUs for inference is not supported?
To make better use of multiple GPUs, we suggest employing a dedicated backend for serving the model. Please refer to https://tabby.tabbyml.com/docs/references/models-http-api/vl... for an example.
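A minimal sketch of that setup, assuming vLLM's OpenAI-compatible server and the config.toml field names from the linked docs:

    # serve one model across two GPUs with tensor parallelism
    vllm serve Qwen/Qwen2.5-Coder-7B-Instruct --tensor-parallel-size 2 --port 8000

    # then point Tabby's chat model at it
    cat >> ~/.tabby/config.toml <<'EOF'
    [model.chat.http]
    kind = "openai/chat"
    model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"
    api_endpoint = "http://localhost:8000/v1"
    EOF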
mjrpes
What is the recommended hardware? GPU required? Could this run OK on an older Ryzen APU (Zen 3 with Vega 7 graphics)?
coder543
The usual bottleneck for self-hosted LLMs is memory bandwidth. It doesn't really matter if there are integrated graphics or not... the models will run at the same (very slow) speed on CPU-only. Macs are only decent for LLMs because Apple has given Apple Silicon unusually high memory bandwidth, but they're still nowhere near as fast as a high-end GPU with extremely fast VRAM.
For extremely tiny models like you would use for tab completion, even an old AMD CPU is probably going to do okay.
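For that use case you can skip the GPU entirely; something like this should be serviceable (sketch; model name is just an example):

    # CPU-only inference is viable for small completion models
    tabby serve --model StarCoder-1B --device cpu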
mjrpes
Good to know. It also looks like you can host TabbyML as an on-premise server with docker and serve requests over a private network. Interesting to think that a self-hosted GPU server might become a thing.
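Presumably that's just the quick-start docker command run on the server and exposed over the private network (sketch; the server address is a placeholder):

    # on the GPU box: run Tabby detached and publish the port
    docker run -d --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
      tabbyml/tabby serve --model StarCoder-1B --device cuda --chat-model Qwen2-1.5B-Instruct
    # teammates point their IDE extensions at http://<server-ip>:8080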
wsxiaoys
Check https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ to see a local setup with 3090.
mkl
That thread doesn't seem to mention hardware. It would be really helpful to just put hardware requirements in the GitHub README.
leke
So does this run on your personal machine, or can you install it on a local company server and have everyone in the company connect to it?
wsxiaoys
Tabby is engineered for team usage, intended to be deployed on a shared server. However, with robust local computing resources, you can also run Tabby on your individual machine. Check https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ to see a local setup with 3090.
thedangler
How would I tell this to use an API framework it doesn’t know?
wsxiaoys
Tabby comes with built-in RAG support, so you can add that API framework's documentation to it.
Example: https://demo.tabbyml.com/search/how-to-configure-sso-in-tabb...
Settings page: https://demo.tabbyml.com/settings/providers/doc
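Repositories to index can also be declared in config; a sketch, assuming the [[repositories]] section of ~/.tabby/config.toml (URL is a placeholder):

    # index a repo so completions and chat can draw on it as context
    cat >> ~/.tabby/config.toml <<'EOF'
    [[repositories]]
    name = "my-api-framework"
    git_url = "https://github.com/example/my-api-framework.git"
    EOF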
jslakro
mkl
Not a dupe, as that was nearly two years ago. https://news.ycombinator.com/newsfaq.html#reposts
d--b
Didn’t you mean to name it Spacey?
thecal
Unfortunate name. Can you connect Tabby to the OpenAI-compatible TabbyAPI? https://github.com/theroyallab/tabbyAPI
mbernstein
At least per GitHub, the TabbyML project is older than the TabbyAPI project.
mynameisvlad
Also, it's wildly more popular, to the tune of several orders of magnitude more forks and stars. If anything, this question should be asked of the TabbyAPI project.
karolist
I'm not sure what's going on with TabbyAPI's GitHub metrics, but exl2 quants are very popular among the Nvidia local-LLM crowd, and TabbyAPI comes up in tons of Reddit posts from people using it. Might just be my bubble; I'm not saying the metrics aren't accurate, just generally surprised such a useful project has under 1k stars. On the flip side, LLMs will hallucinate about TabbyML if you ask them TabbyAPI-related questions, so I'd agree the naming is unfortunate.
wsxiaoys
Never imagined our project would make it to the HN front page on Sunday!
Tabby has undergone significant development since its launch two years ago [0]. It is now a comprehensive AI developer platform featuring code completion and a codebase chat, with a team [1] / enterprise focus (SSO, Access Control, User Authentication).
Tabby's adopters [2][3] have discovered that Tabby is the only platform providing a fully self-service onboarding experience as an on-prem offering. It also delivers performance that rivals other options in the market. If you're curious, I encourage you to give it a try!
[0]: https://www.tabbyml.com
[1]: https://demo.tabbyml.com/search/how-to-add-an-embedding-api-...
[2]: https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ
[3]: https://www.linkedin.com/posts/kelvinmu_last-week-i-introduc...