
I Solved PyTorch's Cross-Platform Nightmare

lynndotpy

> Setting up a Python project that relies on PyTorch, so that it works across different accelerators and operating systems, is a nightmare.

I would like to add some anecdata to this.

When I was a PhD student, I already had 12 years of using and administrating Linuxes as my personal OS, and I'd already had my share of package manager and dependency woes.

But managing Python, PyTorch, and CUDA dependencies was relatively new to me. Sometimes I'd lose an evening here or there to something silly. But I had one week especially dominated by these woes, to the point where I'd have dreams about package management problems at the terminal.

They were mundane dreams but I'd chalk them up as nightmares. The worst was having the pleasant dream where those problems went away forever, only to wake up to realize that was not the case.

dleeftink

Wake up, lynndotpy

di

Note that https://peps.python.org/pep-0440/#direct-references says:

> Public index servers SHOULD NOT allow the use of direct references in uploaded distributions. Direct references are intended as a tool for software integrators rather than publishers.

This means that PyPI will not accept your project metadata as you currently have it configured. See https://github.com/pypi/warehouse/issues/7136 for more details.
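The "name @ url" form PEP 440 is describing is easy to spot mechanically. A minimal sketch (a plain string check, not the full PEP 508 grammar) of how an index or a pre-upload linter might flag direct references in a dependency list:

```python
# Hedged sketch: detects PEP 440 direct references ("name @ <url>") with a
# simple string check rather than a full requirement parser.
def is_direct_reference(requirement: str) -> bool:
    """True if the requirement pins a dependency to a URL, per PEP 440."""
    name, sep, rest = requirement.partition("@")
    return bool(sep) and rest.strip().startswith(("http://", "https://", "file://"))

deps = [
    "torch @ https://download.pytorch.org/whl/cpu/torch-2.7.1%2Bcpu-cp312-cp312-manylinux_2_28_x86_64.whl",
    "numpy>=1.26",
]
# Only the torch entry is a direct reference; PyPI would reject metadata containing it.
direct = [d for d in deps if is_direct_reference(d)]
```

This is why the pyproject approach from the article works locally but cannot ship to PyPI as-is.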

doctorpangloss

Guess the guy who wrote this article will learn the hard way: The last 20% of packaging is 800% of your time.

mdaniel

> Cross-Platform

  cpu = [
    "torch @ https://download.pytorch.org/whl/cpu/torch-2.7.1%2Bcpu-cp312-cp312-manylinux_2_28_x86_64.whl ; python_version == '3.12'",
    "torch @ https://download.pytorch.org/whl/cpu/torch-2.7.1%2Bcpu-cp313-cp313-manylinux_2_28_x86_64.whl ; python_version == '3.13'",
  ]
:-/ It reminds me of Microsoft calling their thing "cross platform" because it works on several copies of Windows

In all seriousness, I get the impression that pytorch is such a monster PITA to manage because it cares so much about the target hardware. It'd be like a blog post saying "I solved the assembly language nightmare"
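For what it's worth, those `; python_version == '3.12'` suffixes are PEP 508 environment markers: the installer evaluates each marker against the running interpreter and keeps only the wheel whose marker is true. A toy evaluator (illustrative names, handling only the single marker form used above, not pip's real resolver):

```python
import re
import sys

def marker_matches(marker: str, python_version: str) -> bool:
    """Evaluate the one marker shape used above: python_version == 'X.Y'."""
    m = re.fullmatch(r"python_version\s*==\s*'([\d.]+)'", marker.strip())
    return bool(m) and m.group(1) == python_version

# (URL, marker) pairs mirroring the cpu extra quoted above.
requirements = [
    ("https://download.pytorch.org/whl/cpu/torch-2.7.1%2Bcpu-cp312-cp312-manylinux_2_28_x86_64.whl",
     "python_version == '3.12'"),
    ("https://download.pytorch.org/whl/cpu/torch-2.7.1%2Bcpu-cp313-cp313-manylinux_2_28_x86_64.whl",
     "python_version == '3.13'"),
]

running = f"{sys.version_info[0]}.{sys.version_info[1]}"
# At most one URL survives: the wheel built for this interpreter.
selected = [url for url, marker in requirements if marker_matches(marker, running)]
```

So the "cross-platform" claim really is one wheel per (Python version, OS, architecture) tuple, enumerated by hand.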

gobdovan

Torch simply has to work this way because it cares about performance across many operating systems and dozens of GPU architectures. The complexity leaks into packaging.

If you do not care about performance and would rather have portability, use an alternative like tinygrad that does not optimize for every accelerator under the sun.

This need for hardware-specific optimization is also why the assembly language analogy is a little imprecise. Nobody expects one binary to run on every CPU or GPU with peak efficiency, unless you are talking about something like Redbean, which gets surprisingly far (its creator actually worked on the TensorFlow team and tackled similar cross-platform problems).

So maybe the blog post you're looking for is https://justine.lol/redbean2/.

esafak

https://github.com/pypa/manylinux is for building cross-platform wheels.
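Those wheel filenames encode their targets per PEP 427 (name, version, optional build tag, then python/abi/platform tags), and the `manylinux_2_28_x86_64` platform tag is a promise of compatibility with glibc >= 2.28 on x86_64. A rough parser as a sketch (the real one lives in the `packaging` library; this assumes a well-formed filename):

```python
def parse_wheel_filename(filename: str) -> dict[str, str]:
    """Split name-version[-build]-python-abi-platform.whl into its fields."""
    stem = filename.removesuffix(".whl")
    parts = stem.split("-")
    # The python/abi/platform tags are always the last three fields,
    # even when an optional build tag is present.
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }

info = parse_wheel_filename("torch-2.7.1+cpu-cp312-cp312-manylinux_2_28_x86_64.whl")
# info["platform"] == "manylinux_2_28_x86_64"
```

manylinux standardizes the platform tag so one Linux wheel can cover many distros, but it does nothing for the CUDA/ROCm/CPU variant axis, which is the part PyTorch pushes into separate index URLs.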

zbowling

Check out Pixi! Pixi is an alternative to the common conda and PyPI frontends, with a better system for hardware feature detection: it gets the best version of Torch for your hardware that is compatible across your packages (except for AMD at the moment). It can pull in the conda-forge or PyPI builds of PyTorch and help you manage things automagically across platforms. https://pixi.sh/latest/python/pytorch/

It doesn't solve how you package your wheels specifically; that problem is still pushed onto your downstream users because of boneheaded packaging decisions by PyTorch themselves. But as a consumer, Pixi softens the blow. The conda-forge builds of PyTorch are also a bit more sane.

kwon-young

In my opinion, anything that touches compiled packages like PyTorch should be packaged with conda/mamba on conda-forge. It is the only package manager for Python I have found that will reliably detect my hardware and install the correct version of every dependency.

zbowling

Try Pixi! Pixi is a much saner way of building with conda + PyPI packages in a single tool, which makes Torch development so much easier regardless of whether you use the conda-forge or PyPI builds of PyTorch. https://pixi.sh/latest/

Simulacra

Good writeup. PyTorch has generally been very good to me when I can mitigate its occasional resource hogging. Production can be a little wonky, but for everything else it works.