
Show HN: TabPFN-2.5 – SOTA foundation model for tabular data

November 6, 2025

I am excited to announce the release of TabPFN-2.5, our tabular foundation model. It now scales to datasets of up to 50,000 samples and 2,000 features, a 5x increase over TabPFN v2, which was published in Nature earlier this year. TabPFN-2.5 delivers state-of-the-art predictions in one forward pass, without hyperparameter tuning, across classification and regression tasks.

What’s new in 2.5: TabPFN-2.5 keeps the core approach of v2: a pretrained transformer, trained on more than a hundred million synthetic datasets, that performs in-context learning and outputs a predictive distribution for the test data. It natively supports missing values and categorical, text, and numerical features, and it is robust to outliers and uninformative features.
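
For context, the open-source package exposes a scikit-learn-style interface, so "fit" just stores the training context and prediction is a single forward pass. A minimal sketch, assuming the 2.5 release keeps the TabPFNClassifier fit/predict_proba API documented for v2:

    # Minimal sketch of the in-context learning flow with the TabPFN package.
    # Assumes TabPFN-2.5 keeps the scikit-learn-style v2 API
    # (TabPFNClassifier / fit / predict_proba).
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from tabpfn import TabPFNClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = TabPFNClassifier()           # no hyperparameter tuning needed
    clf.fit(X_train, y_train)          # stores the training context
    proba = clf.predict_proba(X_test)  # one forward pass: context + queries
    print(proba[:3])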

The major improvements:

- 5x scale increase: Now handles 50,000 samples × 2,000 features (up from 10,000 × 500 in v2)

- SOTA performance: TabPFN-2.5 outperforms tuned tree-based methods and matches a complex ensemble (AutoGluon 1.4) tuned for 4 hours, an ensemble that itself includes TabPFN v2. Tuning TabPFN-2.5 improves performance further, outperforming AutoGluon 1.4 on regression tasks.

- Rebuilt API: a new REST interface plus a Python SDK with dedicated fit and predict endpoints, making deployment and integration more developer-friendly (see the sketch after this list)

- A distillation engine that converts TabPFN-2.5 into a compact MLP or tree ensemble while preserving accuracy and offering low-latency inference.
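
To make the rebuilt-API item concrete, here is a minimal sketch of calling the hosted model through the client package. The init() login helper and the scikit-learn-style interface are assumptions based on the tabpfn-client quickstart; fit() and predict() wrap the dedicated REST endpoints:

    # Minimal sketch: prediction via the hosted API using the Python SDK.
    # The init() helper and sklearn-style interface are assumed from the
    # tabpfn-client quickstart; check the docs for the exact calls.
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    import tabpfn_client
    from tabpfn_client import TabPFNRegressor

    tabpfn_client.init()  # one-time interactive login / API-key setup

    X, y = make_regression(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    reg = TabPFNRegressor()
    reg.fit(X_train, y_train)     # sends the training context to the fit endpoint
    y_pred = reg.predict(X_test)  # calls the predict endpoint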

There are still some limitations. The model is designed for datasets up to 50K samples; it can handle larger datasets, but that hasn’t been our focus with TabPFN-2.5. The distillation engine is not yet available through the API, only through licenses (though we do show its performance in the model report).

We’re actively working on removing these limitations and intend to release newer models focused on context reasoning, causal inference, graph networks, larger data and time-series. TabPFN-2.5 is available via API and a package on Hugging Face. Would love for you to try it and give us your feedback!

Model report: https://priorlabs.ai/technical-reports/tabpfn-2-5-model-repo...

Package: https://github.com/PriorLabs/TabPFN

Client: https://github.com/PriorLabs/tabpfn-client

Docs: https://docs.priorlabs.ai/quickstart

zurfer

The current go-to solution for the kinds of problems TabPFN is solving would be something like XGBoost. In general it's a good baseline, but the challenge is always that you need to spend a lot of time on feature engineering and tweaking the data representation before something like XGBoost can deliver good performance on your regression or classification problems.
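
For anyone unfamiliar with that workflow, here's a rough sketch of the baseline being described; the file and column names ("data.csv", "city", "price") are hypothetical:

    # Rough sketch of the go-to baseline: the table has to be massaged
    # first (categoricals marked or encoded, target split out) before
    # XGBoost can consume it. File and column names are hypothetical.
    import pandas as pd
    import xgboost as xgb

    df = pd.read_csv("data.csv")
    df["city"] = df["city"].astype("category")  # mark categorical columns
    X, y = df.drop(columns=["price"]), df["price"]

    # enable_categorical requires tree_method="hist"; in practice this is
    # followed by a hyperparameter search, which TabPFN aims to skip.
    model = xgb.XGBRegressor(tree_method="hist", enable_categorical=True)
    model.fit(X, y)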

For me the promise of foundation models for tabular data is that there are enough generalizable patterns, so that you need less manual feature engineering and data cleaning.

And kudos to the team, I think it's a really creative application of neural networks. I was always frustrated with neural networks, since they were hard to tune on "structured" data and always under-performed (for me), but we also never had real foundation models for structured data.

dill_1

Tabular data is still underrated!

abracos

How does it compare to AutoML tools?

noahho

TabPFN-2.5 in its default setting (one forward pass) matches AutoGluon 1.4 tuned for four hours. AutoGluon is the strongest AutoML system: it stacks XGBoost and CatBoost, and it even includes the previous TabPFN v2.

klemens_floege

Good stuff!