Making C and Python Talk to Each Other
rossant · May 27, 2025
Grimblewald
This looks awesome, thanks for sharing - do you have any info on how it compares to matplotlib in terms of plotting speed? (even just rough estimates)
fadesibert
A quick inspection of the article suggests there's a difference of intent.
<snip> Datoviz is a relatively low-level visualization library. It focuses on rendering visual primitives like points, lines, images, and meshes — efficiently and interactively.
Unlike libraries such as Matplotlib, Datoviz does not provide high-level plotting functions like plt.plot(), plt.scatter(), or plt.imshow(). Its goal is not to replace plotting libraries, but to serve as a powerful rendering backend for scientific graphics. </snip>
rossant
Yes, although there isn't much work needed to go from Datoviz to simple scientific plots like scatter plots or polylines with axes. It's just a few lines of code. I should probably clarify this in the documentation.
rossant
Good idea, I should do benchmarks on simple plots. It's orders of magnitude faster. Above 60 FPS on hundreds of thousands to even millions of points depending on the type of plot, on common hardware.
Grimblewald
Thanks, I expected as much but wanted to confirm before I commit to learning something new, not for laziness but for prioritising what to learn next. Project looks great either way!
lgtx
How does it compare to VTK, that also provides a Python API?
rossant
Datoviz is much lighter and younger than VTK. It compiles in seconds, making it much faster to build, and it's easier to install and distribute.
It has far fewer features since it focuses solely on rendering common primitives. For example, it doesn't handle data file loading, except for a few quick loaders like OBJ used mainly for testing.
There's almost no computational geometry, data processing, or signal processing functionality. Datoviz is solely focused on interactive rendering.
Datoviz also supports fast, high-quality 2D vector graphics, scaling efficiently to millions of points. In contrast, VTK is primarily designed for more complex and heavy 3D rendering tasks.
hughw
I realize I'm talking about C++, not C, but coincidentally just today I ported our 7-year-old library's SWIG/Python interface to nanobind. What a fragile c9k SWIG has been all these years (don't touch it!), and the nanobind transformation is so refreshing and clean, with lots of type information suddenly available to Python programs. One day of effort and our tests all pass, and now nanobind seems able to let us improve the ergonomics (from the Python POV) of our lib.
rrdharan
What is c9k short for?
spacechild1
I guess it's "clusterfuck" :)
DonHopkins
A c9k problem is much worse than a y2k problem (yuck).
muragekibicho
Lots of articles focus on Cython and optimizing Python using C code.
This article is about embedding Python scripts inside a C codebase.
kvemkon
When I once needed to implement a simple Python plugin engine in C/C++ software, I successfully used the official guide [1].
vkoskiv
I did a lot of this for my raytracer, c-ray [1]. Originally it was just a self-contained C program, but I got tired of writing buggy and limited asset import/export code, so eventually I put together a minimal public C API [2] that I then wrapped with CPython bindings [3] and some additional python code [4] to expose a more 'pythonic' API. It's all still a WIP, but it has already allowed me to write a Blender plugin [5], so now I can play around with my renderer directly in Blender, and test with more complex scenes others have made.
Fun project, and it's really cool to see my little renderer in the interactive viewport in Blender, but I have also learned that I don't particularly enjoy working with non-trivial amounts of Python code.
[1] https://github.com/vkoskiv/c-ray [2] https://github.com/vkoskiv/c-ray/blob/51a742b2ee4d0b570975cd... [3] https://github.com/vkoskiv/c-ray/tree/51a742b2ee4d0b570975cd... [4] https://github.com/vkoskiv/c-ray/tree/51a742b2ee4d0b570975cd... [5] https://github.com/vkoskiv/c-ray/tree/51a742b2ee4d0b570975cd...
hugs
This is one of the "killer apps" for Nim. Nim makes it easy to wrap C and easy to talk to Python (via Nimpy).
dexzod
The title of the article is misleading. "Making C and Python talk to each other" implies both calling Python from C and calling C from Python. The article only covers the former.
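For completeness, the other direction (calling C from Python) needs no extension module at all for simple cases. Here is a minimal, hedged sketch using the standard-library ctypes module against the C standard library; it assumes a Unix-like system where libc symbols are loadable.

```python
# Sketch: calling a C function from Python via ctypes (no compilation step).
# Assumes a Unix-like system; strlen is standard libc.
import ctypes
import ctypes.util

# Load libc; fall back to the current process's symbols if lookup fails.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello, world"))  # 12
```

Declaring `argtypes`/`restype` matters: without them, ctypes guesses `int` for the return value, which silently truncates `size_t` on some platforms.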
eth_hack77
Thanks a lot for the article. Here's a quick question: did you measure the time of some basic operations in Python vs. C? (E.g., if I do a loop of 10 billion iterations, just dividing numbers, in C, do the same in Python, and then import these operations into one another as libraries, does anything change?)
I'm a beginner engineer so please don't judge me if my question is not making perfect sense.
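One way to get a feel for the gap without writing any C is to time a Python-level loop against the same work done by a builtin whose loop runs in C inside the interpreter. This is only a sketch of the kind of measurement the question asks about, not a rigorous benchmark, and the exact ratio varies by machine and Python version.

```python
# Compare a pure-Python loop with sum(), whose iteration happens in C.
import timeit

n = 1_000_000

def py_loop():
    # One interpreted bytecode iteration per element.
    total = 0
    for i in range(n):
        total += i
    return total

def c_loop():
    # The loop over the range object runs inside the interpreter's C code.
    return sum(range(n))

assert py_loop() == c_loop()  # same result either way

t_py = timeit.timeit(py_loop, number=10)
t_c = timeit.timeit(c_loop, number=10)
print(f"Python loop: {t_py:.3f}s, C-backed sum: {t_c:.3f}s")
```

The same principle explains why importing C code as a library helps: the boundary crossing costs something, so the win comes from keeping the hot loop on the C side.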
bdbenton5255
C is orders of magnitude faster than Python, and you can measure this with something as simple as nested loops and conditionals. Python is built for a higher level of abstraction, and this comes at the cost of speed. That abstraction is also what makes it very natural and human-like to write in.
xandrius
Syntax has nothing to do with the speed of the language: Python could be "natural" and "human-like" while being much faster, or "unnatural" and "inhuman" while being slower.
throwaway314155
Language abstractions that are not "zero-cost" inevitably lead to worse performance. Python has many such abstractions designed to improve developer experience. I think that's all the person you're responding to meant.
bdbenton5255
It does, actually: the syntax is a result of the language's design, and a simpler, more human-like syntax requires a higher level of abstraction, which reduces efficiency.
The design of a language, including its syntax, has a great bearing on its speed and efficiency.
Compare C with Assembly, for example, and you will see that higher level languages take complex actions and simplify them into a more terse syntax.
You will also observe that languages such as Python are not nearly as suitable for lower level tasks like writing operating systems where C is much more suitable due to speed.
Languages like Python and Ruby include a higher level of built-in logic to make writing in them more natural and easy at the cost of efficiency.
jebarker
Lots of people argue that AI R&D is currently done in Python because of the benefits of the rich library ecosystem. This makes me realize that's actually a poor reason for everything to be in Python, since the genuinely useful libraries for things like visualization could easily be called from lower-level languages if they're off the hot path.
crote
> could easily be called from lower level languages
Could? Yes. Easily? No.
People write their business logic in Python because they don't want to code in those lower-level languages unless they absolutely have to. The article neatly shows the kind of additional coding overhead you'd have to deal with - and you're not getting anything back in return.
Python is successful because it's a high-level language which has the right tooling to create easy-to-use wrappers around low-level high-performance libraries. You get all the benefits of a rich high-level language for the cold path, and you only pay a small penalty over using a low-level language for the hot path.
jebarker
The problem I see (day to day working on ML framework optimization) is that it's not just a case of python calling lower level compiled code. Pytorch, for example, has a much closer integration of python and the low level functions than that and it does cause performance bottlenecks. So in theory I agree that using high level languages to script calls to low level is a good idea, but in practice that gets abused to put python in the hot path. Perhaps if the lower level language were the bulk of the framework and just called python for helper functions we'd see better performance-aware design from developers.
sigbottle
What's the bottleneck? Is it serializing to/from pyobjects over and over for the mlops? I thought pytorch was pretty good with this: Tensors are views, the computation graph can be executed in parallel, & you're just calling a bunch of fast linear algebra libraries under the hood, etc.
If it avoids excessive copying & supports parallel computation, surely it's fine?
If your model is small enough where the overhead of python would start dominating the execution time, I mean... does performance even matter that much, then? And if it's large enough, surely the things I mentioned outweigh the costs?
yowlingcat
> but in practice that gets abused to put python in the hot path
But if that's an abuse of the tools (which I agree with) how does that make it the fault of the language rather than the user or package author? Isn't the language with the "rich library ecosystem" the natural place to glue everything together (including performant extensions in other languages) rather than the other way around -- and so in your example, wouldn't the solution just be to address the abuse in pytorch rather than throw away the entire universe within which it's already functionally working?
efavdb
FWIW I would be up to write in c or something else, but use python for the packages / network effects.
Grimblewald
It's not just about library availability. Python wins because it lets you offload the low-level performance work to people who really know what they're doing. Libraries like NumPy or PyTorch/Keras wrap highly optimized C/C++ code, so you get near-C/C++ performance without having to write or debug C yourself, and without needing a whole computer science degree to do so properly.
It's a mistake to assume C is always faster. If you don’t have a deep understanding of memory layout, compiler flags, vectorization, cache behavior, etc. your hand-written C code can easily be slower than high-level Python using well-optimized libraries. See [1] for a good example of that.
Sure, you could call those same libs from C, but then you're reinventing Python's ecosystem with more effort and more chances to shoot yourself in the foot. Python gives you access to powerful, low-level tools while letting you focus on higher-level problems—in a language that’s vastly easier to learn and use.
That tradeoff isn't just convenience—it's what makes modern AI R&D productive at scale.
[1] https://stackoverflow.com/questions/41365723/why-is-my-pytho...
jebarker
I feel like you're re-stating the same claim that crote made that there's a clean cut between python and lower level libraries meaning that the user doesn't need to know what is happening at the lower level to achieve good performance. This is not true in many cases if you are aiming to achieve peak performance - which we should be for training and serving AI systems since they are already so resource hungry.
giancarlostoro
I think it's more than just the available libraries. I think the industry has just predominantly preferred Python. Python is a really rich modern language. It might be quirky, but so is every single language you can name. Nothing is quite as quirky as JavaScript, though; maybe VB6, but that's mostly dead, if slightly lingering.
Mind you I've programmed in all the mentioned languages. ;)
whattheheckheck
It's the ease of distribution of packages and big functionality being a pip install away
kstrauser
That's the killer feature. Whatever it is you want to do, there's almost certainly a package for it. The joke is that Python's the second best language for everything. It's not the best for web backends, but it's pretty great. It's not the best for data analysis, but it's pretty great. It's not the best at security tooling, but it's pretty great. And it probably is the best language for doing all three of those things in one project.
wallunit
This is actually rather a reason to avoid Python, in my opinion. You don't want pip to pollute your system with untracked files. There are tools like virtualenv to contain your Python dependencies, but that isn't the default, and pip is generally rather primitive compared to npm.
ashishb
I rewrote a simple RAG ingestion pipeline from Python to Go.
It reads from a database. Generates embeddings. Writes it to a vector database.
- ~10X faster
- ~10X lower memory usage
The only problem is that you have to spend a lot of time figuring out how to do it. All the instructions on the Internet, and even in the vector database documentation, are in Python.
chpatrick
If speed and memory use aren't a bottleneck then "a lot of time figuring out how to do it" is probably the biggest cost for the company. Generally these things can be run offline and memory is fairly cheap. You can get a month of a machine with a ton of RAM for the equivalent of one hour of developer time of someone who knows how to do this. That's why Python is so popular.
kgeist
>I rewrote a simple RAG ingestion pipeline from Python to Go
I also wrote a RAG pipeline in Go, using OpenSearch for hybrid search (full-text + semantic) and the OpenAI API. I reused OpenSearch because our product was already using it for other purposes, and it supports vector search.
For me, the hardest part was figuring out all the additional settings and knobs in OpenSearch to achieve around 90% successful retrieval, as well as determining the right prompt and various settings for the LLM. I've found that these settings can be very sensitive to the type of data you're applying RAG to. I'm not sure if there's a Python library that solves this out of the box without requiring manual tuning, too.
ashishb
> I'm not sure if there's a Python library that solves this out of the box without requiring manual tuning too
There are Python libraries that will simplify the task by giving a better structure to your problem. The knobs will be fewer and more high-level.
bigger_cheese
I have been using Python recently and have found that a lot of the data visualization tools seem to be wrappers around other languages (mostly JavaScript): things like agGrid, Tabulator, Plotly, etc.
Sometimes you end up embedding chunks of JavaScript directly inside your Python.
For example the docs for Streamlit implementation of AgGrid contain this: https://staggrid-examples.streamlit.app/Advanced_config_and_...
kristjansson
One ... could? But it doesn't seem particularly ergonomic.
jebarker
Ergonomics isn't the point, performance is.
mkoubaa
Nobody has ever, in the history of Python, called the Python C API easy.
SandmanDP
I’ve been curious, what are the motivations for most projects to use Lua for enabling scripting in C over this? Is the concern around including an entire Python interpreter in a project and Lua is lighter?
crote
Lua is absolutely trivial to isolate. As the embedder, you have complete control over what the interpreter and VM are doing. Don't want your Lua scripts to have file access? Don't hook up those functions and you're done. Want to protect against endless loops? Tell the VM to stop after 10,000 instructions. Want to limit the amount of memory a script can use? Absolutely trivial. This makes Lua very attractive for things like game development. You can run untrusted addon code without any worry that it'll be able to mess up the game, or the rest of the system.
Doing the same with Python is a lot harder. Python is designed first and foremost to run on its own. If you embed Python, you are essentially running it alongside your own code, with a bunch of hooks in both directions. Running hostile Python code? Probably not a good idea.
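The point about hostile Python code can be made concrete. Even a "sandbox" that strips all builtins still exposes the whole object graph through introspection; this is a well-known illustration of why naive Python sandboxing fails, not an exploit recipe.

```python
# A naive sandbox: eval() with builtins removed entirely.
untrusted = "().__class__.__base__.__subclasses__()"

result = eval(untrusted, {"__builtins__": {}})

# The untrusted expression still walked from an empty tuple up to
# `object` and enumerated every class loaded in the interpreter --
# a stepping stone toward file and process access. Lua, by contrast,
# only exposes what the embedder explicitly registers.
print(type(result), len(result) > 0)
```

This is why serious Python isolation tends to happen at the OS or process level rather than inside the interpreter.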
OskarS
Another thing to mention is that until very recently (Python 3.12, I think?) every interpreter in the address space shared a lot of global state, including most importantly the GIL. For my area (audio plugins) that made Python a non-starter for embedding, while Lua works great.
I agree though: biggest reason is probably the C API. Lua's is so simple to embed and to integrate with your code-base compared to Python. The language is also optimized for "quick compiling", and it's also very lightweight.
These days, however, one might argue that you gain so much from embedding either Python or JavaScript, it might be worth the extra pain on the C/C++ side.
Derbasti
I've done both. Let me tell you, embedding Lua into a C program is magnitudes easier than embedding Python.
The main reason is the endless fragility of Python's garbage collector. To be clear, the API tries to be as helpful as possible, but it's still a whole bunch of complexity that's easy to mess up. Incidentally, this discussion is left out of the linked article, which makes the article much less useful. In my experience with many a third-party C/Python interface, reference-counting mistakes and leaks are incredibly common in such code.
Lua of course also has a garbage collector, but it essentially only knows two types of values: POD, and tables, with only the latter needing much consideration. The interaction model is based on a stack-based virtual machine, which is more complex than Python's full-function abstraction, but conveniently hides most of the garbage collector complexity. So long as you're just reshuffling things on the stack (i.e. most of the time), you don't need to worry about the garbage collector at all.
bandoti
Lua is much lighter, but the key is that it's probably one of the easiest things to integrate (just copy the sources/includes into your build and it'll work); it has a "header-only" kind of vibe.
But, you can strip down a minimal Python build and statically compile it without too much difficulty.
I tend to prefer Tcl because it has what I feel is the perfect amount of functionality by default, with a relatively small size. Tcl also has the best C API of the bunch if you're working more in C.
Lua is very “pushy” and “poppy” due to its stack-based approach, but that can be fun too if you enjoy programming RPN calculators haha :)
spacechild1
People already mentioned that Lua is very lightweight and easy to integrate. It's also significantly faster than Python. (I'm not even talking about LuaJIT.)
Another big reason: the Lua interpreter does not have any global variables (and therefore also no GIL) so you can have multiple interpreters that are completely independent from each other.
90s_dev
Network effect.
rossant
My visualization library [1] is written in C and exposes a visualization API in C. It is packaged as a Python wheel using auto-generated ctypes bindings, which includes the shared library (.so, .dylib, or .dll) and a few dependencies. This setup works very well, with no need to compile against each Python version. I only need to build it for the supported platforms, which is handled automatically by GitHub Actions. The library is designed to minimize the number of C calls, making the ctypes overhead negligible in practice.
[1] https://datoviz.org/
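The "few big calls" pattern described above can be sketched with ctypes alone: instead of crossing the Python/C boundary once per element, hand C one contiguous buffer and let a single call do all the work. Here libc's memset stands in for a real rendering call; this is not the Datoviz API, just an illustration of the batching idea, and it assumes a POSIX system.

```python
# One boundary crossing operating on a million elements at once.
import ctypes

# On POSIX, CDLL(None) exposes the current process's symbols, incl. libc.
libc = ctypes.CDLL(None)
libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]
libc.memset.restype = ctypes.c_void_p

n = 1_000_000
buf = (ctypes.c_ubyte * n)()  # one contiguous million-byte buffer

# A single C call touches all n bytes; the per-call ctypes overhead
# (argument conversion, GIL handling) is paid once, not n times.
libc.memset(buf, 0xFF, n)

print(buf[0], buf[n - 1])  # 255 255
```

The same reasoning applies to any ctypes-wrapped library: per-call overhead is roughly constant, so APIs that accept whole arrays amortize it to nothing.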