Skip to content(if available)orjump to list(if available)

I Write Type Safe Generic Data Structures in C

layer8

The casting of the function type assumes that the item pointer type (e.g. Foo*) has the same representation as void*, which the C standard doesn’t guarantee. Calling the function with the converted type may therefore constitute undefined behavior.

o11c

For your level 2 code, `uint64_t data[];` is wrong for types whose alignment is greater than `uint64_t`, and also wasteful for types whose alignment is smaller (for example, under an ilp32 ABI on 64-bit architectures).

For your level 3 code, it should be `int main() { List(Foo) foo_list = {NULL};`

Note that working around a lack of `typeof` means you can't return anything. Also, your particular workaround allows `const`ness errors since `==` is symmetrical.

You can't safely omit `payload` since you need it to know the correct size. Consider a `List(int64_t)` and you try to add an `int32_t` to it - this should be fine, but you can't `sizeof` the `int32_t`. Your code is actually lacking quite a bit to make this work.

=====

There are 2 major limitations to generics in C right now:

* Delegating to a vtable (internal or external) is limited in functionality, since structs cannot contain macros, only functions.

* Delegating to an external vtable (mandatory to avoid overhead) means that you have to forward-declare all of the types you'll ever use a vtable with. So far the best approach I've found is to declare (but not define) static functions in the same forwarding header I declare the typedefs in; note that GCC and Clang differ in what phase the "undefined static" warning appears in for the case where you don't actually include that particular type's header in a given TU.

(think about writing a function that accepts either `struct SizedBuffer {void *p; size_t len;};` or `struct BoundedBuffer {void *begin; void *end;};`, and also const versions thereof - all from different headers).

EPWN3D

I would love for `union`s to be federated, that is, a type could declare itself as thought it was part of a union with another type, without having to pre-declare all possible types in one place.

gritzko

Hi. I object.

The trick#0 you mention is how I made an entire C dialect. Here is a generic binary heap, for example https://github.com/gritzko/librdx/blob/master/abc/HEAPx.h The syntax is a bit heavyweight, but a huge huge advantage is: you get regular C structs in the end, very plain, very predictable, very optimizable. Compiler would eat them like donuts.

In the other cases, it is void* and runtime memory sizing and you have to define macros anyway.

dhooper

OP here. Binary heaps and linked lists are different use cases. A binary heap must read the data you put in it to store it correctly, but a linked list doesn't. If I were writing a generic binary heap, maybe I would weigh my options differently. I mentioned this in the footnotes.

variadix

I agree, there are actually several reasons to prefer the header impl. Debugging is better, both because you can step through the header code where you can’t with a macro function, and because the type information available to the debugger is better. There are more opportunities for compiler optimizations because each instantiation is monomorphized and you don’t pay a runtime cost with variable sizing, generic structures can also be placed on the stack because of the fixed sizing.

There are workarounds for at least two of the problems the author mentions. Naming can be changed from Bar_func(args…) to func(Bar)(args…) with a function name macro that just does name mangling. You can avoid some of the binary bloat by using weak symbols, letting the linker deduplicate functions shared between translation units at link time.

There are other problems for generic containers of pointer types however, you can work around them by using a typedef or a type alias.

Intrusive data structures are more convenient in C still, but working with them in a debugger is a pain.

dhooper

OP here. It's worth noting that no work is being done in the macros of my article, they compile down to a normal c function call which you can step through in a debugger.

There is little benefit in monomorphizing the implementation of a data structure like a linked list where its behavior doesn't depend on the contents of the data it contains (compared to, say, a max heap)

eqvinox

Interesting idea with the union and using typeof(). We (I) went with large macros defining wrappers instead, which, I believe, is a necessity with intrusive data structures, or at least I don't immediately see how to do that with unions & typeof. Maybe it's possible...?

e.g. hash table wrapper: https://github.com/FRRouting/frr/blob/master/lib/typesafe.h#...

(cf. https://docs.frrouting.org/projects/dev-guide/en/latest/list...)

null

[deleted]

HexDecOctBin

The key idea here seems to be to use function pointer's type to enforce type safety rather than using the data "handle" type (that is often found in implementations inspired by Sean Barrett's strechy_buffers).

> One annoying thing about C is that it does not consider these two variables to have the same type

C23 solves that too: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3037.pdf

Supported by latest GCC and Clang, but not by MSVC.

dhooper

OP here. Not quite. The key idea is about using a union to associate type information with a generic data type. Type casting a function is not the only way to use that type information. I discuss that as well as the C23 changes in the footnotes and the "typeof on old compilers" section.

null

[deleted]

ape4

Or write in CFront and have it translated to C

uecker

It is cool trick. I already use in my experimental library though ;-) https://github.com/uecker/noplate/blob/main/src/list.h

eqvinox

I guess if anyone might know it might be you—do you see any way of doing this for intrusive data structures, embedding the node struct in the data (and as side effect supporting an object to be on multiple containers) rather than the data in the node like you're doing there?

notnmeyer

pretty sure C is the new Go.

revskill

Without the concurreny part.

b0a04gl

what happens if two types have same size and alignment but different semantics : like `int` vs `float` or `struct { int a; }` vs `int`? does the type tag system catch accidental reuse . aka defending against structual collisions

asplake

Interesting! I’m working on toy/educational generator of ML-style tagged variants and associated functions in C (for a compiler) and when I’m a bit further along I will see if they’re compatible.