Show HN: I built a hardware processor that runs Python

146 comments · April 28, 2025

Hi everyone, I built PyXL — a hardware processor that executes a custom assembly generated from Python programs, without using a traditional interpreter or virtual machine. The toolchain compiles Python -> CPython bytecode -> a custom instruction set designed for direct hardware execution.

I’m sharing an early benchmark: a GPIO test where PyXL achieves a 480ns round-trip toggle — compared to 14-25 microseconds on a MicroPython Pyboard — even though PyXL runs at a lower clock (100MHz vs. 168MHz).

The design is stack-based, fully pipelined, and preserves Python's dynamic typing without static type restrictions. I independently developed the full stack — toolchain (compiler, linker, codegen) and hardware — to validate the core idea. Full technical details will be presented at PyCon 2025.

Demo and explanation here: https://runpyxl.com/gpio

Happy to answer any questions.
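For context, here is a minimal sketch of the kind of loop such a round-trip benchmark typically measures. The `gpio` module, pin numbers, and function names below are hypothetical, invented for illustration; they are not PyXL's actual API (see the demo link for the real thing).

```python
# Hypothetical sketch only: the gpio module and pin numbers are invented
# for illustration and are not PyXL's real API.
import gpio  # assumed module providing read()/write() on numbered pins

IN_PIN = 0
OUT_PIN = 1

def echo_pin():
    while True:
        value = gpio.read(IN_PIN)    # sample the input pin
        gpio.write(OUT_PIN, value)   # drive the output; the input-to-output
                                     # latency is the round trip being timed
```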

rkagerer

Back when C# came out, I thought for sure someone would make a processor that would natively execute .Net bytecode. Glad to see it finally happened for some language.

kcb

For Java, this was around for a bit: https://en.wikipedia.org/wiki/Jazelle

monocasa

Even better was a complete system, rather than a mode for ARM cores that ran the common JVM opcodes.

https://en.wikipedia.org/wiki/PicoJava

varispeed

Didn't some phones have hardware Java execution or does my memory fail me?

jiehong

Java got that with smart cards, for example. Cute oddities of the past.

monocasa

JavaCard was implemented as just a regular interpreter, last time I checked.

whoomp12342

I'd be surprised if azure app services didn't do this already.

john-h-k

I’d be willing to bet my net worth that they don’t

bongodongobob

Azure runs on Linux if I'm not mistaken.

actionfromafar

Wouldn't that be a real scoop?

actinium226

So first of all, this is awesome and props to you for some great work.

I have what may be a dumb question, but I've heard that Lua can be used in embedded contexts, and that it can be used without dynamic memory allocation and other such things you don't want in real-time systems. How does this project compare to that? Like I said, it's probably a dumb question because I haven't actually used Lua in an embedded context, but if there's something there I imagine you've probably looked at it?

Y_Y

Are there any limitations on what code can run? (discounting e.g. memory limitations and OS interaction)

I'd love to read about the design process. I think the idea of taking bytecode aimed at the runtime of dynamic languages like Python or Ruby or even Lisp or Java and making custom processors for that is awesome and (recently) under-explored.

I'd be very interested to know why you chose to do this, why it was a good idea, and how you went about the implementation (in broad strokes if necessary).

hwpythonner

Thanks — really appreciate the interest!

There are definitely some limitations beyond just memory or OS interaction. Right now, PyXL supports a subset of real Python. Many features from CPython are not implemented yet — this early version is mainly to show that it's possible to run Python efficiently in hardware. I'd prefer to move forward based on clear use cases, rather than trying to reimplement everything blindly.

Also, some features (like heavy runtime reflection, dynamic loading, etc.) would probably never be supported, at least not in the traditional way, because the focus is on embedded and real-time applications.

As for the design process — I’d love to share more! I'm a bit overwhelmed at the moment preparing for PyCon, but I plan to post a more detailed blog post about the design and philosophy on my website after the conference.

mikepurvis

In terms of a feature set to target, would it make sense to go after RPython instead of "real" Python? Doing that would let you leverage all the work PyPy has done on separating the essential primitives required to make a Python from the sugar and abstractions that make it familiar:

https://doc.pypy.org/en/latest/faq.html#what-is-pypy
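As a rough, hedged illustration of what that restriction means in practice (based on RPython's general rule that each variable must keep a single inferable type; the linked FAQ has the precise definition):

```python
# RPython-style code keeps every variable at one inferable type, which is
# what lets PyPy's translator do ahead-of-time type analysis.
def total(xs):
    acc = 0              # acc is an int for its whole lifetime
    for x in xs:
        acc += x
    return acc

# Plain CPython accepts this, but the RPython translator would reject it,
# because `value` is an int on one path and a str on the other.
def not_rpython(flag):
    value = 0 if flag else "zero"
    return value
```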

hermitShell

JVM I think I can understand, but do you happen to know more about LISP machines and whether they use an ISA specifically optimized for the language, or if the compilers for x86 end up just doing the same thing?

In general I think the practical result is that x86 is like democracy. It’s not always efficient but there are other factors that make it the best choice.

boutell

This is very, very cool. Impressive work.

I'm interested to see whether the final feature set will be larger than what you'd get by creating a type-safe language with a pythonic syntax and compiling that to native, rather than building custom hardware.

The background garbage collection thing is easier said than done, but I'm talking to someone who has already done something impressively difficult, so...

warble

Wow, these FPGAs are not cheap. Don't they also have a couple of ARM cores attached on the SoC?

nynx

This is cool for sure. I think you’ll ultimately find that this can’t really be faster than modern OoO cores, because Python instructions are so complex. To execute them OoO, or even at a reasonable frequency (e.g. to reduce combinatorial latency), you’d need to emit type-specialized microcode on the fly, but you can’t do that until the types are known, which for Python is only the case once all the inputs are known.

hwpythonner

Thanks — appreciate it!

You're right that dynamic typing makes high-frequency execution tricky, and modern OoO cores are incredibly good at hiding latencies. But PyXL isn't trying to replace general-purpose CPUs — it's designed for efficient, predictable execution in embedded and real-time systems, where simplicity and determinism matter more than absolute throughput. Most embedded cores (like ARM Cortex-M and simple RISC-V) are in-order too — and deliver huge value by focusing on predictability and power efficiency.

That said, there’s room for smart optimizations even in a simple core — like limited lookahead on types, hazard detection, and other techniques to smooth execution paths. I think embedded and real-time represent the purest core of the architecture — and once that's solid, there's a lot of room to iterate upward for higher-end acceleration later.

IshKebab

Very cool! Nobody who really wants simplicity and determinism is going to be using Python on a microcontroller though.

actionfromafar

Hm, why not though. People managed to do it with tiny JVMs before, so why not a Python variant.

gavinsyancey

Sure, but for embedded use cases (which this is targeting), the goal isn't raw speed so much as being fast enough for specific use cases while minimizing power usage / die area / cost.

hwpythonner

I built a hardware processor that runs Python programs directly, without a traditional VM or interpreter. Early benchmark: GPIO round-trip in 480ns — 30x faster than MicroPython on a Pyboard (at a lower clock). Demo: https://runpyxl.com/gpio

rthomas6

* What HDL did you use to design the processor?

* Could you share the assembly language of the processor?

* What is the benefit of designing the processor and making a Python bytecode compiler for it, vs making a bytecode compiler for an existing processor such as ARM/x86/RISCV?

hwpythonner

Thanks for the question.

HDL: Verilog

Assembly: The processor executes a custom instruction set called PySM (not a very original name, I know :) ). It's inspired by CPython bytecode — stack-based, dynamically typed — but streamlined to allow efficient hardware pipelining. I’m not sharing the full ISA publicly yet, but I'm happy to describe the general structure: it includes instructions for stack manipulation, binary operations, comparisons, branching, function calls, and memory access.

Why not ARM/x86/etc.? Existing CPUs are optimized for static, register-based compiled languages like C/C++. Python’s dynamic nature — stack-based execution, runtime type handling, dynamic dispatch — maps very poorly onto conventional CPUs, resulting in a lot of wasted work (interpreter overhead, dynamic typing penalties, reference counting, poor cache locality, etc.).
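To make the "inspired by CPython bytecode" part concrete, the standard library's `dis` module shows the kind of stack-based instruction stream a toolchain like this starts from (exact opcodes and output format vary by CPython version):

```python
# CPython's disassembler shows the stack-based bytecode such a toolchain
# starts from. Opcode names vary between CPython versions.
import dis

def add_scaled(a, b, k):
    return a + b * k

dis.dis(add_scaled)
# Simplified output on CPython 3.11+ (real output also has offsets and line numbers):
#   LOAD_FAST  a
#   LOAD_FAST  b
#   LOAD_FAST  k
#   BINARY_OP  *        pops b and k, pushes b*k
#   BINARY_OP  +        pops a and b*k, pushes the sum
#   RETURN_VALUE
```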

larusso

This sounds like your 'arch' (sorry, I don't 100% know the correct term here) could potentially also run Ruby/JS if the toolchain can translate them into your assembly language?

pak9rabid

Wow, this is fascinating stuff. Just a side question (and please understand I am not a low-level hardware expert, so pardon me if this is a stupid question): does this arch support any sort of speculative execution, and if so do you have any sort of concerns and/or protections in place against the sort of vulnerabilities that seem to come inherent with that?

hwpythonner

Thanks — and no worries, that’s a great question!

Right now, PyXL runs fully in-order with no speculative execution. This is intentional for a couple of reasons: First, determinism is really important for real-time and embedded systems — avoiding speculative behavior makes timing predictable and eliminates a whole class of side-channel vulnerabilities. Second, PyXL is still at an early stage — the focus right now is on building a clean, efficient architecture that makes sense structurally, without adding complex optimizations like speculation just for the sake of performance.

In the future, if there's a clear real-world need, limited forms of prediction could be considered — but always very carefully to avoid breaking predictability or simplicity.

thenobsta

Amazing work! This is a great project!

Every time I see a project with a great implementation on an FPGA, I lament the fact that Tabula, maker of a truly innovative and fast FPGA, didn’t make it.

<https://en.m.wikipedia.org/wiki/Tabula,_Inc.>

echoangle

Would this be able to handle an exec() or eval() call? Is there a Python bytecode compiler available as Python bytecode to include in this processor?
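For reference, exec() and eval() imply having the Python compiler available at runtime; CPython exposes it as the built-in compile(). Supporting them on PyXL would mean having a source-to-PySM compiler on (or alongside) the device. A minimal illustration using only standard built-ins:

```python
# eval()/exec() rely on the Python compiler being present at runtime:
# compile() turns source text into a code object on the fly.
code_obj = compile("1 + 2 * 3", "<string>", "eval")
print(eval(code_obj))  # 7
```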

IshKebab

Yeah this is surely a subset of Python.

bieganski

It would be nice to have some peripheral drivers implemented (UART, eMMC, etc.).

Having those, the next tempting step is to make the `print` function work, then a filesystem wrapper, etc.

BTW, what I'm missing is clear information about limitations. It's definitely not true that I can take any Python snippet and run it using PyXL (threads, for example, I suppose?)

hwpythonner

Great points!

Peripheral drivers (like UART, SPI, etc.) are definitely on the roadmap — they'd obviously be implemented in hardware. You're absolutely right: once you have basic IO, you can make things like print() and filesystem access feel natural.

Regarding limitations: you're right again. PyXL currently focuses on running a subset of real Python — just enough to show it's real Python and to prove the core concept, while keeping the system small and efficient for hardware execution. I'm intentionally holding off on implementing higher-level features until there's a real use case, because embedded needs can vary a lot, and I want to keep the system tight and purpose-driven.

Also, some features (like threads, heavy runtime reflection, etc.) will likely never be supported — at least not in the traditional way — because PyXL is fundamentally aimed at embedded and real-time applications, where simplicity and determinism matter most.

froh

Do I get this right? This is an ASIC running a Python-specific microcontroller which has Python-tailored microcode? And together with that, a Python bytecode -> microcode compiler, plus support infrastructure to get the compiled bytecode to the ASIC?

fun :-)

but did I get it right?

hwpythonner

You're close: it's currently running on an FPGA (Zynq-7000) — not an ASIC yet — but yes, it could be transferred to an ASIC (not cheap though :))

It's a custom stack-based hardware processor tailored for executing Python programs directly. Instead of traditional microcode, it uses a Python-specific instruction set (PySM) that hardware executes.

The toolchain compiles Python → CPython Bytecode → PySM Assembly → hardware binary.
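A heavily hedged sketch of what the bytecode-to-PySM step could look like in principle; the PySM mnemonics below are invented for illustration, since the real instruction set is not public:

```python
# Illustrative-only sketch of a bytecode -> "PySM" lowering pass.
# The PySM mnemonics are made up; the real ISA is not public.
import dis

# hypothetical 1:1 opcode mapping for a few simple cases
OPCODE_MAP = {
    "LOAD_FAST": "PUSH_LOCAL",
    "LOAD_CONST": "PUSH_CONST",
    "STORE_FAST": "POP_LOCAL",
    "RETURN_VALUE": "RET",
}

def lower_to_pysm(fn):
    """Map CPython bytecode instructions to made-up PySM mnemonics."""
    out = []
    for ins in dis.get_instructions(fn):
        mnemonic = OPCODE_MAP.get(ins.opname, f"; unmapped {ins.opname}")
        out.append(f"{mnemonic} {ins.argrepr}".strip())
    return out

def square(x):
    return x * x

print("\n".join(lower_to_pysm(square)))
```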

cchianel

As someone who did a CPython Bytecode → Java bytecode translator (https://timefold.ai/blog/java-vs-python-speed), I strongly recommend against the CPython Bytecode → PySM Assembly step:

- CPython Bytecode is far from stable; it changes every version, sometimes changing the behaviour of existing bytecodes. As a result, you are pinned to a specific version of Python unless you make multiple translators.

- CPython Bytecode is poorly documented, with some descriptions being misleading/incorrect.

- CPython Bytecode requires restoring the stack on exception, since it keeps a loop iterator on the stack instead of in a local variable.

I recommend instead doing CPython AST → PySM Assembly. CPython AST is significantly more stable.
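For comparison, the AST-level input cchianel suggests targeting is available directly from the standard library's `ast` module (which, as noted above, is significantly more stable across versions than the bytecode):

```python
# The stdlib ast module exposes the tree a front end would target instead
# of bytecode (ast.dump's indent parameter needs Python 3.9+).
import ast

source = """
def square(x):
    return x * x
"""

tree = ast.parse(source)
print(ast.dump(tree, indent=2))  # Module -> FunctionDef -> Return -> BinOp(Mult)
```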

hwpythonner

Thanks — really appreciate your insights.

You're absolutely right that CPython bytecode changes over time and isn’t perfectly documented — I’ve also had to read the CPython source directly at times because of unclear docs.

That said, I intentionally chose to target bytecode instead of AST at this stage. Adhering to the AST would actually make me more vulnerable to changes in the Python language itself (new syntax, new constructs), whereas bytecode changes are usually contained to VM-level behavior. It also made it much easier early on, because the PyXL compiler behaves more like a simple transpiler — taking known bytecode and mapping it directly to PySM instructions — which made validation and iteration faster.

Either way, some adaptation will always be needed when Python evolves — but my goal is to eventually get to a point where only the compiler (the software part of PyXL) needs updates, while keeping the hardware stable.

nurettin

This was my first thought as well. They will be stuck at a certain Python version.

bangaladore

Have you considered joining the next Tiny Tapeout run? This is exactly the type of project I'm sure they would sponsor or try to get to an ASIC.

In case you weren't aware, they give you a 200 x 150 µm tile on a shared chip. There is then some helper logic to mux between the various projects on the chip.

https://tinytapeout.com/

relistan

Not an ASIC, it’s running on an FPGA. There is an ARM CPU that bootstraps the FPGA. The rest of what you said is about right.