Exploring Polymorphism in C: Lessons from Linux and FFmpeg's Code Design (2019)

inopinatus

This is an excellent pattern in C. The Dovecot mail server has many fine examples of the style as well e.g.

    struct dict dict_driver_ldap = {
        .name = "ldap",
        .v = {
            .init = ldap_dict_init,
            .deinit = ldap_dict_deinit,
            .wait = ldap_dict_wait,
            .lookup = ldap_dict_lookup,
            .lookup_async = ldap_dict_lookup_async,
            .switch_ioloop = ldap_dict_switch_ioloop,
        }
    };
defines the virtual function table for the LDAP module, and any other subsystem that looks things up via the abstract dict interface can consequently be configured to use the ldap service without concrete knowledge of it.

(those interested in a deeper dive might start at https://github.com/dovecot/core/blob/main/src/lib-dict/dict-...)
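
A rough sketch of the shape of this pattern, for anyone who wants to see it end to end. This is not Dovecot's actual API: the `kv_*` names, the toy `memory` driver and its single hard-coded entry are all made up for illustration. The point is only that the generic caller reaches a driver through a configured name and a table of function pointers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical abstract interface; these names are illustrative and
 * are not Dovecot's real declarations. */
struct kv_vfuncs {
    int (*init)(void **ctx_r);
    void (*deinit)(void *ctx);
    const char *(*lookup)(void *ctx, const char *key);
};

struct kv_driver {
    const char *name;
    struct kv_vfuncs v;
};

/* A toy "memory" driver with one hard-coded entry. */
static int mem_init(void **ctx_r) { *ctx_r = NULL; return 0; }
static void mem_deinit(void *ctx) { (void)ctx; }
static const char *mem_lookup(void *ctx, const char *key)
{
    (void)ctx;
    return strcmp(key, "greeting") == 0 ? "hello" : NULL;
}

/* The driver's virtual function table, filled in with designated
 * initializers in the same style as the snippet above. */
static const struct kv_driver kv_driver_memory = {
    .name = "memory",
    .v = {
        .init = mem_init,
        .deinit = mem_deinit,
        .lookup = mem_lookup,
    }
};

/* Registry of known drivers; a real system might fill this at load time. */
static const struct kv_driver *const kv_drivers[] = { &kv_driver_memory };

/* Pick a driver by its configured name; returns NULL if unknown. */
static const struct kv_driver *kv_driver_find(const char *name)
{
    for (size_t i = 0; i < sizeof(kv_drivers) / sizeof(kv_drivers[0]); i++) {
        if (strcmp(kv_drivers[i]->name, name) == 0)
            return kv_drivers[i];
    }
    return NULL;
}

/* Generic caller: knows only the interface, never the concrete driver.
 * The returned pointer stays valid here only because the toy driver
 * returns a string literal; a real driver would have to define lifetime. */
static const char *kv_lookup_once(const char *driver_name, const char *key)
{
    const struct kv_driver *drv = kv_driver_find(driver_name);
    void *ctx;
    const char *value;

    if (drv == NULL || drv->v.init(&ctx) != 0)
        return NULL;
    value = drv->v.lookup(ctx, key);
    drv->v.deinit(ctx);
    return value;
}
```

Calling `kv_lookup_once("memory", "greeting")` goes through the table, while an unregistered driver name simply yields NULL. Adding another backend would mean adding one more `struct kv_driver` to the registry, with no change to the caller.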

dividuum

So does the good old Quake 2 rendering API. The game exported a bunch of functions to the renderer via refimport_t and the renderer in return provided functions via refexport_t. The only visible symbol in a rendering DLL is GetRefAPI_t: https://github.com/id-Software/Quake-2/blob/master/client/re...

I remember being impressed by this approach, so I shamelessly copied it for my programming game: https://github.com/dividuum/infon/blob/master/renderer.h :)
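
For readers who have not seen the import/export exchange before, here is a minimal single-file sketch of the idea described above. It is only loosely modelled on that description: `host_import_t`, `plugin_export_t`, `GetPluginAPI` and the version constant are hypothetical names, not the Quake 2 or infon declarations, and the "plugin" is called directly rather than loaded from a shared library.

```c
#include <stdio.h>

#define PLUGIN_API_VERSION 1  /* hypothetical version constant */

/* Functions the host hands to the plugin. */
typedef struct {
    void (*log_message)(const char *msg);
} host_import_t;

/* Functions the plugin hands back to the host. */
typedef struct {
    int api_version;
    int (*init)(void);
    int (*draw_frame)(int frame_number);
} plugin_export_t;

/* ---- plugin side: would live in the shared library ---- */
static host_import_t host;  /* the plugin's copy of the host's functions */

static int plugin_init(void)
{
    host.log_message("plugin initialised");
    return 1;
}

static int plugin_draw_frame(int frame_number)
{
    host.log_message("drawing frame");
    return frame_number + 1;  /* pretend result: next frame to draw */
}

/* The single entry point that would be exported from the library. */
plugin_export_t GetPluginAPI(host_import_t imports)
{
    plugin_export_t exports = {
        .api_version = PLUGIN_API_VERSION,
        .init = plugin_init,
        .draw_frame = plugin_draw_frame,
    };
    host = imports;
    return exports;
}

/* ---- host side ---- */
static int host_log_count;

static void host_log(const char *msg)
{
    host_log_count++;
    printf("[host] %s\n", msg);
}

/* In a real program the host would obtain GetPluginAPI with dlsym()
 * or GetProcAddress(); here it is simply called directly. */
int host_run_one_frame(void)
{
    host_import_t imports = { .log_message = host_log };
    plugin_export_t plugin = GetPluginAPI(imports);

    if (plugin.api_version != PLUGIN_API_VERSION || !plugin.init())
        return -1;
    return plugin.draw_frame(41);
}
```

Everything except the one entry point can stay `static` on the plugin side, which is what keeps the set of visible symbols so small.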

dfox

I somehow suspect that the reason why Quake2 does this lies in the legacy of Quake1 written in DJGPP. DJGPP supports dynamically loaded libraries (although the API is technically unsupported and internal-only), but does not have any kind of dynamic linker, thus passing around a pair of such structs during library initialization is the only way to make that work.

jamesfinlayson

Pretty sure Half-Life does something pretty similar - all functionality between the game and engine is done via function pointer structs.

MisterTea

The original Half-Life engine, now called Goldsrc, is a heavily modified Quake engine with bits of Quake 2 in there.

immibis

This is cargo cult. You can just directly export and import the rendering functions.

jrmg

Reminds me of Apple’s CoreFoundation.

social_quotient

I spend a ton of time in FFmpeg, and I’m still blown away by how it uses abstractions to stay modular—especially for a project that’s been around forever and still feels so relevant. Those filtergraphs pulling off polymorphism-like tricks in C? It’s such an elegant way to manage complex pipelines. e.g.

    ffmpeg -i input.wav -filter_complex "
        [0:a]asplit=2[a1][a2];
        [a1]lowpass=f=500[a1_low];
        [a2]highpass=f=500[a2_high];
        [a1_low]volume=0.5[a1_low_vol];
        [a2_high]volume=1.5[a2_high_vol];
        [a1_low_vol][a2_high_vol]amix=inputs=2[a_mixed];
        [a_mixed]aecho=0.8:0.9:1000:0.3[a_reverb]
    " -map "[a_reverb]" output.wav

That said, keeping those interfaces clean and consistent as the codebase grows (and ages) takes some real dedication.

I also recently joined the mailing lists, and it’s been awesome to get a step closer to the pulse of the project. I recommend it if you want to casually get more exposure to the breadth of the project.

https://ffmpeg.org/mailman/listinfo

MuffinFlavored

how similar are the C abstractions in ffmpeg and qemu given they were started by the same person?

variadix

I haven’t worked with ffmpeg’s code, but I have worked with QEMU. QEMU has a lot of OOP (implemented in C obviously) that is supported by macros and GCC extensions. I definitely think it would have been better (and the code would be easier to work with) to use C++ rather than roll your own object model in C, but QEMU is quite old so it’s somewhat understandable. I say that as someone who mostly writes C and generally doesn’t like using C++.

shmerl

What's the reason for ffmpeg to use C, also historic?

bonzini

QEMU's abstractions were added when Fabrice was almost completely inactive already (starting in 2008 for some command line support).

cbarrick

For the record, this design pattern is called a virtual method table, or vtable.

I'm surprised that this article never mentioned the term.

C++ programmers will know this pattern from the `virtual` keyword.

rzzzt

You can take it a step further:

- instead of setting the same function pointers on structs over and over again, point to a shared (singleton) struct named "vtable" which keeps track of all function pointers for this "type" of structs

- create a factory function that allocates memory for the struct, initializes fields ("vtable" included), let's call it a "constructor"

- make sure all function signatures in the shared struct start with a pointer to the original struct as the first parameter, a good name for this argument would be "this"

- encode parameter types in the function name to support overloading, e.g. "func1_int_int"

- call functions in the form of "obj->vtable->func1_int_int(obj, param1, param2)"
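
The steps above can be sketched roughly as follows. The `shape` type, the rectangle "class" and all function names are invented for illustration; note that `this` is an ordinary identifier in C, so the block compiles as C but not as C++.

```c
#include <assert.h>
#include <stdlib.h>

struct shape;

/* One shared table of function pointers per "type" of struct. */
struct shape_vtable {
    int  (*area)(const struct shape *this);
    void (*scale_int)(struct shape *this, int factor);
    /* "Overload" distinguished by encoding parameter types in the name. */
    void (*scale_int_int)(struct shape *this, int fx, int fy);
    void (*destroy)(struct shape *this);
};

struct shape {
    const struct shape_vtable *vtable;  /* points at the shared table */
    int width, height;
};

static int rect_area(const struct shape *this)
{
    return this->width * this->height;
}

static void rect_scale_int(struct shape *this, int factor)
{
    this->width *= factor;
    this->height *= factor;
}

static void rect_scale_int_int(struct shape *this, int fx, int fy)
{
    this->width *= fx;
    this->height *= fy;
}

static void rect_destroy(struct shape *this)
{
    free(this);
}

/* The singleton vtable: every rectangle points at this one object. */
static const struct shape_vtable rect_vtable = {
    .area = rect_area,
    .scale_int = rect_scale_int,
    .scale_int_int = rect_scale_int_int,
    .destroy = rect_destroy,
};

/* "Constructor": allocates the struct and initializes every field,
 * vtable pointer included. Returns NULL if allocation fails. */
struct shape *rect_new(int width, int height)
{
    struct shape *obj = malloc(sizeof(*obj));
    if (obj == NULL)
        return NULL;
    obj->vtable = &rect_vtable;
    obj->width = width;
    obj->height = height;
    return obj;
}

/* Usage: every call passes the object itself as the first argument. */
int shape_demo(void)
{
    struct shape *obj = rect_new(2, 3);
    int area;

    assert(obj != NULL);
    obj->vtable->scale_int(obj, 2);         /* now 4 x 6  */
    obj->vtable->scale_int_int(obj, 1, 2);  /* now 4 x 12 */
    area = obj->vtable->area(obj);
    obj->vtable->destroy(obj);
    return area;
}
```

Because the table is shared, each object carries one pointer instead of four, and a second "class" only needs its own constructor and its own `const` table.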

starspangled

> You can take it a step further:

No, that is essentially what Linux does in this article (and by the looks of it also ffmpeg).

struct file does not have a bunch of pointers to functions, it has a pointer to a struct file_operations, and that is set to a (usually / always?) const global struct defined by a filesystem.

As you can see, the function types of the pointers in that file_operations struct take a struct file pointer as the first argument. This is not a hard and fast rule in Linux, arguments even to such ops structures are normally added as required not just-in-case (in part because ABI stability is not a high priority). Also the name is not mangled like that because it would be silly. But otherwise that's what these are, a "real" vtable.

Surely this kind of thing came before C++ or the name vtable? The Unix V4 source code contains pointers to functions (one in file name lookup code, even) (though not in a struct but passed as an argument). "Object oriented" languages and techniques must have first congealed out of existing practices with earlier languages, you would think.

rzzzt

The proto-C++ transpiler used C with this and similar techniques behind the scenes: https://en.wikipedia.org/wiki/Cfront

drivebyhooting

Is this satire? That’s almost exactly the C++ way.

p_l

It's how you do it in many C++ implementations, but IIRC it's not actually mandated in any way unless you strive for GCC's IA-64 ABI compatibility (the effective standard on Linux for C++)

C++'s vtables are also, in my experience, especially bad compared to Objective-C or COM ones (MSVC btw generates vtables specifically aligned for use with COM, IIRC). Mind you it's been 15 years since I touched that part of crazy.

procaryote

Some say C++ is satire

relistan

It’s not satire, it’s how you do full OO in plain C.

chrsw

I've noticed many large C projects resort to these sorts of OOP-like patterns to manage the complexity of the design and size of the code base. But I'm not aware of any one standard way of doing this in C. It seems C++ standardized a lot of these concepts, or C++ developers adopted standard patterns somehow.

discreteevent

> I've noticed many large C projects resort to these sorts of OOP-like patterns to manage the complexity of the design and size of the code base

The Power of Interoperability: Why Objects Are Inevitable

https://www.cs.cmu.edu/~aldrich/papers/objects-essay.pdf

pjmlp

Objects regardless of what shape they take, are basically an evolution of modules that can be passed around as values, instead of having a single instance of them.

That is why they are here to stay, and even all mainstream FP and LP languages offer features that provide similar capabilities, even if they get other names for the same thing.

It is like saying an artifact is useless, only because it gets named differently in English and Chinese.

jcelerier

and C++ also supports optimizing them, especially when you use `final` keyword and LTO which is able to devirtualize at the scale of a whole program.

i_am_a_peasant

Interesting, in Rust those optimizations are more implicit since there's no "final" keyword when you use dynamic dispatch via trait objects. + you also got LTO.

I wonder if there are many cases where C++ will devirtualize and Rust won't.

But then again Rust devs are more likely to use static dispatch via generics if performance is critical.

trelane

I learned it from a textbook. I think it was an earlier printing of https://docs.freebsd.org/en/books/design-44bsd/

KerrAvon

> The interface type in golang is much more powerful than Java’s similar construct because its definition is totally disconnected from the implementation and vice versa. We could even make each codec a ReadWriter and use it all around.

This paragraph completely derailed me — I’m not familiar with golang, but `interface` in Java is like `@protocol` in Objective-C — it defines an interface without an implementation for the class to implement, decoupling it entirely from the implementation. Seems to be exactly the same thing?

mananaysiempre

The difference between Go and Java is that in Go a type need not declare its adherence to an interface up front—any type that has methods of appropriate names and signatures is considered to implement the interface, even if its designers were not aware of the interface’s existence. (This involves a small bit of dynamism in the runtime; easily cached, though, as the set of methods of a given type and the set of all interfaces are both fixed by the time the program runs.) Whether that’s a good thing depends on your design sensibilities (ETA: nominal vs structural).

jeroenhd

For those wishing Java had a similar feature, there's Manifold: https://github.com/manifold-systems/manifold/tree/master/man...

Manifold is a very interesting project that adds a lot of useful features to Java (operator overloading, extension classes, and a whole bunch more). I don't know if it's smart to use it in production code because you basically go from writing Java to writing Manifold, but I still think it's a fun project to experiment with.

sitkack

http://manifold.systems/

> Manifold is a Java compiler plugin, its features include Metaprogramming, Properties, Extension Methods, Operator Overloading, Templates, a Preprocessor, and more.

Neat tool. It is like having a programmable compiler built into your language.

JavierFlores09

The funny thing about Java is that while its design is to be entirely nominally typed, the way it is implemented in the JVM is compatible with structural typing, but there are artificial limitations set to follow the intended design (though of course, if one were to disable these limitations then modeled type safety goes out of the window as Java was simply not designed to be used that way). One community which takes advantage of this fact is the Minecraft modding space, as it is the basis[1] of how modding platforms like Fabric work.

1: https://github.com/SpongePowered/Mixin/wiki/Introduction-to-...

owlstuffing

>The difference between Go and Java is that in Go a type need not declare its adherence to an interface up front.

Go can't declare adherence up front, and in my view that’s a problem. Most of the time, explicitly stating your intent is best, for both humans reading the code and tools analyzing it. That said, structural typing has its moments, like when you need type-safe bridging without extra boilerplate.

duskwuff

You can assert that your type implements an interface at compile time, though, e.g.

    var _ AssertedInterface = &MyType{}

relistan

One of the main uses for interfaces in Go is defining the contract for _your_ dependencies. Rather than saying your function takes a socket, if you only ever call Write(), you can take a Writer, or define another interface that is only the set of functions you need. This is far more powerful than declaring that your type implements an interface up front. It allows for things like e.g. multiple image libraries to implement your interface without knowing it, enabling your project to use them interchangeably. And as another commenter said, you can have the compiler verify your compliance with an interface with a simple (though admittedly odd looking) declaration.

williamdclt

I don’t agree it’s a structural VS nominal difference. Typescript is structural, but it does have the “implements” keyword.

Which makes a million times more sense to me, because realistically when do you ever have a structure that usefully implements an interface without being aware of it?? The common use-case is to implement an existing interface (in which case might as well enforce adherence to the interface at declaration point), not to plug an implementation into an unrelated functionality that happens to expect the right interface.

cognisent

TypeScript doesn't require a class to use it, though, because it's structurally typed. All that "implements Foo" in this example does is make sure that you get a type error on the definition of "One" if it doesn't have the members of "Foo".

If "Two" didn't have a "name: string" member, then the error would be on the call to "test".

    interface Foo {
        name: string
    }

    class One implements Foo {
        constructor(public name: string) {}
    }

    class Two {
        constructor(public name: string) {}
    }

    function test(thing: Foo): void {
        //...
    }

    test(new One('joe'));
    test(new Two('jane'));

kazinator

GNU C++ once had this feature; it was called Signatures. It was removed, though.

A signature declaration resembled an abstract base class. The target class did not have to inherit the signature: just have functions with matching names and types.

The user of the class could cast a pointer to an instance of the class to a pointer to a compatible signature. Code not knowing anything about the class could indirectly call all the functions through the signature pointer.

pjmlp

Nowadays you can do that with concepts.

billfruit

So golang supports 'duck typing'?

theLiminator

I think in a static context, it's generally referred to as structural typing, but yeah.

juwjfoobar

interfaces in Go are structural. Interfaces in Java are nominal and require immediate declaration of intent to implement at type definition.

psychoslave

Shouldn’t this be named phenomenal rather than structural? In both cases there is a structure assumed, but one is implicitly inferred while the other one is explicitly required.

layer8

The difference is that in Go, an interface is assumed to match if the method signatures match. In other words, the match is done on the type structure of the interface, hence the “structural” designation. Nominal typing, on the other hand, considers that interfaces tend to be associated with important semantic requirements in addition to the type signature, and that mere type-structure matching doesn’t at all guarantee a semantic match. For that reason, the semantics are implicitly bound to the declared name of the interface, and the way for an implementation to claim conformance to those semantics is to explicitly bind itself to that name.

relistan

I think you’re making a joke, but in Go you get both. You can have the compiler enforce that you implement an interface with a simple declaration. Most people do.

sitkack

I'd say at 20kloc of C, https://www.lua.org/ gets you as far up the Object Oriented tower as you want.

quietbritishjim

The article is about using OO techniques directly in C code. Lua is implemented in C but it's an entirely separate language. Does its implementation use OO techniques as part of its C source code? If not, then it's not really relevant.

sitkack

I don't see how a distinction here is anything but semantically arbitrary.

Transitively, it most definitely uses OO techniques. Furthermore, by having such a clean C ffi (in both directions) it allows for the weaving of the Lua based OO techniques back into C code.

codr7

No discussion about polymorphism in C is complete without mentioning this macro:

https://stackoverflow.com/questions/15832301/understanding-c...

glouwbug

    *int (*encode)(*int);
Why not compile your snippets? Heads up to the author.

favorited

> for instance, Linux handles network socket, special files (like /proc/cpuinfo) or even USB devices as files. This is a powerful idea that can make easy to write or use programs for linux since we can rely in a set of well known operations from this abstraction called file.

Benno Rice gave a fantastic talk a few years ago called "What UNIX Cost Us," which he starts off by showing how to write some USB device code in macOS, Windows, and Linux. It only takes a few minutes to demonstrate how pretending that everything is a file can be a pretty poor abstraction, and result in far more confusing code, which is why everyone ends up using libusb instead of sysfs.

https://www.youtube.com/watch?v=9-IWMbJXoLM#t=134s

DarkUranium

The same is visible in having to parse a bunch of Linux's more complex /proc entries, vs. simply using syscalls in (say) FreeBSD.

"Everything is a file" is not a bad abstraction for some things. It feels like Linux went the route of a golden hammer here.

favorited

That's the gist of his whole talk – that doing things "the UNIX way" (which can be defined to various degrees of specificity) has been cargo culted, and that we should reexamine whether solutions that were pragmatic 50+ years ago are still the best we can do.

The specific reason I mentioned it was because his initial example was about how much more ceremony and boilerplate is needed when you need to pretend that USB interfaces are actually magic files and directories.

loph

My recollection (which could be rusty, it has been >30 years) is that the Motif API, coded in C, implemented a kind of polymorphism.

jdefr89

What is with the incorrect function declarations? I see:

int (func)().

Maybe you meant: int * (*func)(void)?

Don't mean to be pedantic. Just wanted to point it out so you can fix it.
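
For reference, a small sketch of the declarator forms in question. What the article's author intended is an assumption here; the names (`get_value`, `encode_fn`, `struct codec` and so on) are hypothetical. As far as the syntax goes, parentheses around the name alone are valid C but declare an ordinary function; it is the `*` inside the parentheses that makes a pointer to a function.

```c
static int answer = 42;

/* Parentheses around the name alone change nothing: this declares an
 * ordinary function, not a pointer to one. */
static int (get_value)(void);

static int get_value(void)    { return answer; }
static int *get_address(void) { return &answer; }

/* Pointer to a function taking no arguments and returning int. */
static int (*value_fn)(void) = get_value;

/* Pointer to a function taking no arguments and returning int *. */
static int *(*address_fn)(void) = get_address;

/* A typedef often makes function-pointer struct members easier to read. */
typedef int (*encode_fn)(int *data);

struct codec {
    encode_fn encode;
};

/* Toy "encoder": doubles the value in place and reports success. */
static int encode_double(int *data)
{
    *data *= 2;
    return 0;
}

static const struct codec doubling_codec = { .encode = encode_double };
```

With these in scope, `value_fn()` yields an `int`, `*address_fn()` dereferences the returned pointer, and `doubling_codec.encode(&x)` calls through the struct member.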

brcmthrowaway

Does ffmpeg support SVE?

pjmlp

The language is called Go.

Other than that, yeah doing by hand what C++ and Objective-C do automatically.

programmarchy

For one, ffmpeg is 9 years older than Go. Plus, when dealing with video files a garbage collected language probably isn't going to cut it. C++ and Obj-C also feel overkill for ffmpeg.

pjmlp

Apparently someone has not read the article, otherwise you would have understood my point about Go.

Secondly, Apple and Microsoft do just fine with Objective-C and C++ for their video codecs, without having to manually implement OOP in C.

programmarchy

CoreVideo and CoreAudio are both implemented in C on Apple systems. There are higher level APIs like AVFoundation implemented in Obj-C/Swift, but the codecs themselves are written in C. Even the mid-level frameworks like AudioToolbox and VideoToolbox are written in C. I’m not as familiar with Microsoft but imagine it’s similar.

Also the article doesn’t actually mention OOP. You can use polymorphism without fully buying into OOP (like Go does).

The great thing about C is its interoperability, which is why it’s the go-to language for things like codecs, device drivers, kernel modules, etc.