Getting silly with C, part (void*)2
34 comments
·January 10, 2025svilen_dobrev
hehe. similar to
How to Get Fired Using Switch Statements & Statement Expressions:
https://blog.robertelder.org/switch-statements-statement-exp...
kazinator
Without information about how identifiers are declared, you do not know how to parse this:
(A)(B);
It could be a cast of B to type A, or function A being called with argument B.Or this (like the puts(puts) in the article):
A(B):
Could be a declaration of B as an identifier of type A, or a call to a function A with argument B.Back in 1999 I made a small C module called "sfx" (side effects) which parses and identifies C expressions that could plausibly contain side effects. This is one of the bits provided in a small collection called Kazlib.
This can be used to make macros safer; it lets you write a #define macro that inserts an argument multiple times into the expansion. Such a macro could be unsafe if the argument has side effects. With this module, you can write the macro in such a way that it will catch the situation (albeit at run time!). It's like a valgrind for side effects in macros, so to speak.
https://git.savannah.gnu.org/cgit/kazlib.git/tree/sfx.c
In the sfx.c module, there is a rudimentary C expression parser which has to work in the absence of declaration info. In other words it has to make sense of an input like (A)(B).
I made it so that when the parser encounters an ambiguity, it will try parsing it both ways, using backtracking via exception handling (provided by except.c). When it hits a syntax error, it can backtrack to an earlier point and parse alternatively.
Consider (A)(A+B). When we are looking at the left part (A), that could plausibly be a cast or declaration. In recursive descent mode, we are going left to right and looking at left derivations. If we parse it as a declaration, we will hit a syntax error on the +, because there is no such operator in the declarator grammar. So we backtrack and parse it as a cast expression, and then we are good.
Hard to believe that was 26 years ago now. I think I was just on the verge of getting into Lisp.
I see the sfx.c code assumes it would never deal with negative character values, so it cheerfully uses the <ctype.h> functions without a cast to unsigned char. It's a reasonable assumption there since the inputs under the intended use case would be expressions in the user's program, stringified by the preprocessor. Funny bytes would only occur in a multi-byte string literal (e.g. UTF-8). When I review code today, this kind of potential issue immediately stands out.
The same exception module is (still?) used in the Ethereal/Wireshark packet capture and analysis tool. It's used to abort "dissecting" packets that are corrupt or truncated.
null
GranPC
Can anyone explain how the last snippet works?
andreyv
It creates a compound literal [1] of type array of int, and initializes the specified array positions using designated initializers [2] with the results of calls to puts().
Using designated initializers without the = symbol is an obsolete extension.
[1] https://gcc.gnu.org/onlinedocs/gcc/Compound-Literals.html [2] https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
a12k
Basically they execute in order of array initialization, not index order, so it outputs hello cruel world rather than cruel world hello.
sylware
C syntax is already way too rich and complex.
We need a C- ore µC:
No implicit cast except for literals and void* (explicit compile time/runtime casts), one loop statement (loop{}), no switch/enum/generic/_thread/typeof/etc, no integer promotion, only sized primitive types (u64 s32 f32 etc...), no anonymous code block, real compiler hard/compile time constant declaration, many operators have to go (--,++, a?b:c, etc)... and everything I am forgetting right now (the dangerous struct pack attribute...). But we need inline keywords for memory barriers, atomics for modern hardware architecture programming.
wongarsu
There is C0, a stripped-down version of C popular in academia [1]. Great for teaching because it's conceptually simple and easy to write a compiler for. But with a couple of additions (like sized primitive types) it might match what you are imagining
glouwbug
C really just needs if / else / while / and void functions. Function inputs should be in/out (const type* or type*).
bregma
So, FORTRAN IV except for the else.
butterisgood
Pre-scheme?
short_sells_poo
Sooo, assembly :)
poincaredisk
No. Assembly is not portable, not typed and not structured (as in structured programming).
Another usecase for microC: most decompilers decompile to a custom C-like language. It pretends to be C, but in reality this is just a representation of a recovered AST that is often-but not always-a valid C code. MicroC would be a better target, because it would supposedly have less weird edge cases.
jcranmer
I'd argue that unsafe Rust is a better target here (although I don't know if &raw has made it into stable Rust yet, which you need for doing an address-of that doesn't have a risk of UB like & does). Rust's typesystem is less silly (there's only one multiple-types-for-same-value, namely usize), there's no implicit conversions, the core CFG primitives are a little bit richer (labeled break/continue, although those are now added in C2y), and there's a decent usable intrinsic class for common operators-not-in-C like rotations or popcount.
If your goal is a simple language to compile, well, Rust won't ever fit the bill; but as a target for a decompiler, I think the unsafe subset is worth exploring.
uecker
I don't think any weird edge case is a problem when targeting C. You just do not produce such cases when emitting the code.
null
9rx
> Assembly [...] not typed and not structured (as in structured programming).
That depends on the assembly language. Some have structure constructs, some are typed. Portability is out.
But if you accept a slightly higher abstraction, WebAssembly is portable, typed, and structured.
null
Frenchgeek
Sphinx C-- maybe?
mystified5016
You want assembly with some sugar.
Read up on Forth languages. It's pretty much exactly what you're after.
mananaysiempre
Forth is kind of weak dealing with value types of unknown size. For example, suppose you're writing a cross-compiler, and an address on the target machine might take one or two cells on the host machine depending on the host bitness. Now suppose you need to accept a (host-machine) cell and a target-machine address (e.g. an opcode and an immediate) and manipulate them a bit. Trivial in C, definitely possible in Forth, but supremely annoying and the elegance kind of falls apart.
accelbred
Does Zig fit your bill?
matheusmoreira
Note that this is GNU C, not standard C. GNU has extended the normal C language with features such as forward parameter declarations and numeric ranges in switch cases. Lots of people don't know about these things.
dzaima
Note that switch case ranges might be coming in C2y though.
mananaysiempre
Also forward parameter declarations, or is that proposal dead?
wahern
Basically dead. The main motivation would be to make it easier to use variably modified types in function parameters, where the (length) identifier is declared after the variably modified type, as in
> void foo(int a[m][m], int m)
Currently you can only do: > void foo(int m, int a[m][m])
The holy grail is being able to update the prototypes of functions like snprintf to something like: > int snprintf(char buf[bufsiz], size_t bufsiz, const char *, ...);
However, array pointer decay means that foo above is actually: > void foo(int (*a)[m], int m)
Likewise, the snprintf example above would be little different than the current definition.There's related syntax, like
> foo (int m, int a[static m])
But a is still just a pointer, and while it can help some static analyzers to detect mismatched buffer size arguments at the call site, the extent of the analysis is very limited as decay semantics effectively prevent tracing the propagation of buffer sizes across call chains, even statically.There's no active proposal at the moment to make it possible to pass VM arrays (or rather, array references) directly to functions--you can only pass pointers to VM array types. That actually works (sizeof *a == sizeof (int) * m when declaring int (*a)[m] in the prototype), but the code in the function body becomes very stilted with all the syntactical dereferencing--and it's just syntactical as the same code is generated for a function parameter of `int (*a)[m]` as for `int *a` (underneath it's the same pointer value rather than an extra level of memory indirection). There are older proposals but they all lost steam because there aren't any existing implementation examples in any major production C compilers. Without that ability, the value of forward declarations is greatly diminished. Because passing VM array types to functions already requires significant refactoring, most of the WG14 felt it wasn't worth the risk of adopting GCC's syntax when everybody could (and should?) just start declaring size parameters before their respective buffer parameters in new code.
C[++] never ceases to amaze me. Just yesterday I saw this, totally valid, code snippet:
``` void g(); void f() { return g(); } ```