Getting silly with C, part (void*)2

122 comments

·January 10, 2025

sylware

C syntax is already way too rich and complex.

We need a C- or µC:

No implicit cast except for literals and void* (explicit compile time/runtime casts), one loop statement (loop{}), no switch/enum/generic/_thread/typeof/etc, no integer promotion, only sized primitive types (u64 s32 f32 etc...), no anonymous code block, real compiler hard/compile time constant declaration, many operators have to go (--,++, a?b:c, etc)... and everything I am forgetting right now (the dangerous struct pack attribute...). But we need inline keywords for memory barriers, atomics for modern hardware architecture programming.

wongarsu

There is C0, a stripped-down version of C popular in academia [1]. Great for teaching because it's conceptually simple and easy to write a compiler for. But with a couple of additions (like sized primitive types) it might match what you are imagining.

1: https://c0.cs.cmu.edu/docs/c0-reference.pdf

glouwbug

C really just needs if / else / while / and void functions. Function inputs should be in/out (const type* or type*).

bregma

So, FORTRAN IV except for the else.

butterisgood

Pre-scheme?

accelbred

Does Zig fit your bill?

sylware

Dunno, I should have a look though. But I have a recollection of some garbage collector. Right or wrong?

nick__m

I doubt that; Zig is allocator land. Even stdlib data structures require an allocator to be instantiated. Have a look at the selection of allocators: https://zig.guide/standard-library/allocators .

short_sells_poo

Sooo, assembly :)

poincaredisk

No. Assembly is not portable, not typed and not structured (as in structured programming).

Another use case for microC: most decompilers decompile to a custom C-like language. It pretends to be C, but in reality it is just a representation of a recovered AST that is often, but not always, valid C code. MicroC would be a better target, because it would supposedly have fewer weird edge cases.

pjmlp

Hence why Macro Assemblers have existed for almost as long as raw Assembly.

MASM and TASM were already far beyond the features in K&R C, if we overlook the issue of being bound to 80x86 Assembly.

TI has some DSPs where their Assembly is basically bare bones C like, in a SSA kind of approach.

uecker

I don't think any weird edge case is a problem when targeting C. You just do not produce such cases when emitting the code.

jcranmer

I'd argue that unsafe Rust is a better target here (although I don't know if &raw has made it into stable Rust yet, which you need for doing an address-of that doesn't have a risk of UB like & does). Rust's typesystem is less silly (there's only one multiple-types-for-same-value, namely usize), there's no implicit conversions, the core CFG primitives are a little bit richer (labeled break/continue, although those are now added in C2y), and there's a decent usable intrinsic class for common operators-not-in-C like rotations or popcount.

If your goal is a simple language to compile, well, Rust won't ever fit the bill; but as a target for a decompiler, I think the unsafe subset is worth exploring.

9rx

> Assembly [...] not typed and not structured (as in structured programming).

That depends on the assembly language. Some have structure constructs, some are typed. Portability is out.

But if you accept a slightly higher abstraction, WebAssembly is portable, typed, and structured.

sylware

Yep, assembly... but a royalty free, brutally simple ISA...

Wait... we have it... RISC-V.

But we need performant µArchitectures for all major use cases (server/desktop/mobile/etc), and on the best silicon process.

If RISC-V is a success, no need for a µC, just go RISC-V assembly BUT... do not abuse that macro preprocessor, because if it is to move complexity from the C syntax to some macro preprocessor, that would be pointless.

Frenchgeek

Sphinx C-- maybe?

sylware

Is that microsoft C--?

mystified5016

You want assembly with some sugar.

Read up on Forth languages. It's pretty much exactly what you're after.

mananaysiempre

Forth is kind of weak dealing with value types of unknown size. For example, suppose you're writing a cross-compiler, and an address on the target machine might take one or two cells on the host machine depending on the host bitness. Now suppose you need to accept a (host-machine) cell and a target-machine address (e.g. an opcode and an immediate) and manipulate them a bit. Trivial in C, definitely possible in Forth, but supremely annoying and the elegance kind of falls apart.

mystified5016

Assembly isn't portable? Wow! You must be some type of genius! I'll start calling newspapers

sylware

Endian support is like the memory barrier and atomic support: it should be an inline function/keyword.

mhandley

I expect many people know this one, but it's a useful teaching aid when understanding the relationship between arrays and pointers

  int array[10];
  *(array+1) = 56;
  array[2] = 4;
  3[array] = 27;

The first two are obvious, but the third is also legal. It works because array indexing is just sugar for pointer arithmetic, so array[2]=4 is identical in meaning to *(array+2)=4. Therefore 3[array]=27 is identical to *(3+array)=27 and so is legal. But just because you can doesn't mean you should.
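A minimal sketch of that equivalence, assuming any C99 compiler. The function name `index_forms_agree` is made up for illustration; it writes through one spelling and reads back through another to show that all four forms name the same element.

```c
#include <assert.h>
#include <stddef.h>

/* Writes through one spelling of "element i of a" and reads it back
 * through another. a[i], *(a + i), *(i + a) and i[a] all designate the
 * same object. Returns 1 if every read saw the preceding write. */
int index_forms_agree(int *a, size_t i, int value)
{
    a[i] = value;
    if (*(a + i) != value) return 0;

    *(i + a) = value + 1;
    if (i[a] != value + 1) return 0;

    i[a] = value + 2;
    assert(&a[i] == &i[a]);      /* same address either way */
    return a[i] == value + 2;
}
```

Calling `index_forms_agree(array, 3, 27)` on the `array` above leaves `array[3]` holding 29, since the last write stores `value + 2`.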

macintux

The best, most entertaining book I've ever read on C covered that (unless I'm misremembering, but I doubt it): Expert C Programming.

https://www.goodreads.com/book/show/198207.Expert_C_Programm...

dualogy

I'm already liking that one! Page 5 quote:

> There is one other convention — sometimes we repeat a key point to emphasize it. In addition, we sometimes repeat a key point to emphasize it.

One more quote and I'll stop:

> ctime() converts its argument into local time, which will vary from GMT, depending on where you are. California, where this book was written, is eight hours behind London, and several years ahead

WalterBright

> The first two are obvious, but the third is also legal.

D doesn't have that bug!

In 44 years of C programming, I've never encountered a legitimate use for the 3rd. (Other than Obfuscated C, that is.)

WolfeReader

It's not a bug. You're seeing the difference between "this is how you're taught to access arrays" and "this is how array access actually works".

WalterBright

Since the Standard specifies what that does, pedantically it is not a bug. Ok.

But I call it a bug because it has no use and just pointlessly confuses people.

im3w1l

Well it could (and I agree with WalterBright that it should) have been disallowed. a[b] being implemented as an early stage rewrite rule expanding to *(a+b) is an uninteresting implementation detail. And I doubt it is even implemented that way in modern compilers anyway. It certainly can't be in C++ as a[b] and b[a] mean different things when [] is overloaded.

kragen

You seem to be lecturing the author of one of the most prominent early C compilers on how array access actually works in C.

mhandley

Agreed - I've only been programming C for 38 years but I've also never found a legitimate use. However I have used it to illustrate a point when teaching C to beginners - it looks so odd they tend to remember it.

matheusmoreira

Note that this is GNU C, not standard C. GNU has extended the normal C language with features such as forward parameter declarations and numeric ranges in switch cases. Lots of people don't know about these things.

dzaima

Note that switch case ranges might be coming in C2y though.

mananaysiempre

Also forward parameter declarations, or is that proposal dead?

wahern

Basically dead. The main motivation would be to make it easier to use variably modified types in function parameters, where the (length) identifier is declared after the variably modified type, as in

  > void foo(int a[m][m], int m)

Currently you can only do:

  > void foo(int m, int a[m][m])

The holy grail is being able to update the prototypes of functions like snprintf to something like:

  > int snprintf(char buf[bufsiz], size_t bufsiz, const char *, ...);

However, array pointer decay means that foo above is actually:

  > void foo(int (*a)[m], int m)

Likewise, the snprintf example above would be little different than the current definition.

There's related syntax, like

  > foo (int m, int a[static m])

But a is still just a pointer, and while it can help some static analyzers to detect mismatched buffer size arguments at the call site, the extent of the analysis is very limited as decay semantics effectively prevent tracing the propagation of buffer sizes across call chains, even statically.

There's no active proposal at the moment to make it possible to pass VM arrays (or rather, array references) directly to functions--you can only pass pointers to VM array types. That actually works (sizeof *a == sizeof (int) * m when declaring int (*a)[m] in the prototype), but the code in the function body becomes very stilted with all the syntactical dereferencing--and it's just syntactical as the same code is generated for a function parameter of `int (*a)[m]` as for `int *a` (underneath it's the same pointer value rather than an extra level of memory indirection). There are older proposals but they all lost steam because there aren't any existing implementation examples in any major production C compilers. Without that ability, the value of forward declarations is greatly diminished. Because passing VM array types to functions already requires significant refactoring, most of the WG14 felt it wasn't worth the risk of adopting GCC's syntax when everybody could (and should?) just start declaring size parameters before their respective buffer parameters in new code.
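A rough sketch of what passing a pointer to a variably modified array type looks like today, assuming a compiler with VLA support (C99, or C11 and later where the feature is implemented). The names `row_bytes` and `sum_matrix` are hypothetical; the size parameters come first so that the parameter type can refer to `m`.

```c
#include <assert.h>
#include <stddef.h>

/* a points to an int[m]; sizeof *a is computed at run time from m. */
size_t row_bytes(size_t m, int (*a)[m])
{
    assert(a != NULL);
    return sizeof *a;                 /* m * sizeof(int) */
}

/* a points to the first of n rows, each row being an int[m]. */
long sum_matrix(size_t n, size_t m, int (*a)[m])
{
    long total = 0;
    assert(a != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            total += a[i][j];         /* a[i] is the i-th int[m] row */
    return total;
}
```

With `int grid[2][3]`, the call `sum_matrix(2, 3, grid)` works because `grid` decays to `int (*)[3]`, which is compatible with `int (*)[m]` when `m` is 3.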

dfawcus

I'd have to argue the function typedefs are not useless, I've come across two uses.

The obvious one is to use it rather than a function pointer typedef, such that the subsequent use in a struct is obviously a pointer. That helps when others are initially reading unfamiliar structures.

  typedef int handler_ty(int a);

  struct foo {
    handler_ty *handler;
    /* ... */
  };

  struct foo table[] = { { /* init fields */ }, { /* init fields */ }, };

The other case can be somewhat related, namely as an assertion / check when writing such handler functions, and more importantly updating them.

  handler_ty some_handler;
  int some_handler(int a) { /* ... */ }

When updating code, it allowed for easier to decode compiler errors if the expected type of handler_ty was changed, and some specific handler was incorrectly updated, or not updated at all.

Basically the error would generally directly call out the inconsistency with the prior line, rather than with the distant use in the initialisation of 'table'.

As I recall this mechanism has been around since at least C89, I don't recall using it in K&R.
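Both uses can be put together in one small sketch. The handler names (`double_it`, `negate`), the `name` field, and the `dispatch` helper are made up for illustration and are not from the comment above.

```c
#include <assert.h>
#include <stddef.h>

typedef int handler_ty(int a);      /* a function type, not a pointer type */

/* Checked declarations: if handler_ty changes and a definition below is
 * not updated, the compiler reports the conflict right here. */
handler_ty double_it;
handler_ty negate;

int double_it(int a) { return 2 * a; }
int negate(int a)    { return -a; }

struct foo {
    const char *name;               /* hypothetical extra field */
    handler_ty *handler;            /* the '*' makes the pointer visible */
};

static const struct foo table[] = {
    { "double", double_it },
    { "negate", negate },
};

/* Calls the handler stored at position idx in the table. */
int dispatch(size_t idx, int arg)
{
    assert(idx < sizeof table / sizeof table[0]);
    return table[idx].handler(arg);
}
```

Changing the typedef to, say, `int handler_ty(long a)` without touching `double_it` should produce a conflicting-types diagnostic at the definition itself, next to its declaration, which is the easier-to-decode error described above.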

WalterBright

I'm going to speculate a bit on why these silly things are in C.

C was developed on a PDP-11 that had 64 KB of memory. That's not much at all. Therefore, the compiler must be extremely tightly coded.

The fundamental rules of the C language are pretty simple. But articles like these expose consequences of such simple rules. Fixing them requires adding more code. Adding more code means less room for the code being compiled.

Therefore, if the intended use of the language works, the pragmatic approach would be to simply not worry about the quirky consequences.

A more interesting question would be "why do these characteristics persist in modern C compilers?"

The stock answer is "backwards compatibility", "Obfuscated C Code contests" and "gotcha job interview questions". My argument would be that there is no reason for the persistence of such "junk DNA" and it should be deprecated and removed.

I've done my part. D doesn't support that stuff, even though the basic use of the language is easily confused with C.

For example:

    #include <stdio.h>
    void main()
    {
        int i;
        for (i = 0; i < 10; ++i);
            printf("%d\n", i);
    }

I've died on that hill. I know others who lost an entire day staring at it wondering what's wrong with it. I saw it on X recently as "99% of C programmers will not be able to find the bug."

The equivalent D code:

    import core.stdc.stdio;
    void main()
    {
        int i;
        for (i = 0; i < 10; ++i);
            printf("%d\n", i);
    }

gets you:

    test.d(5): Error: use `{ }` for an empty statement, not `;`

C'mon, Standard C! Fix that!

moefh

> I know others who lost an entire day staring at it wondering what's wrong with it. I saw it on X recently as "99% of C programmers will not be able to find the bug."

Both gcc and clang give a warning[1] for that code with just "-Wall", so it's hard to imagine it being a real problem these days.

[1] https://godbolt.org/z/vfPzhc596

WalterBright

I know modern compilers do it, too. But still the language needs to be fixed. The proof is C programmers still get victimized by this useless feature.

Compiler warnings are a good source of material for things that need to be fixed in the language. Unfortunately, every compiler has their own set of warnings, and sometimes warnings from different compilers contradict each other. That encourages programmers to not use the warning feature. That's another reason why the language should be fixed.

uecker

In my experience the warnings work quite well for the programmers I know.

Anyway, ranting on HackerNews does not get anything fixed: https://www.open-std.org/jtc1/sc22/wg14/www/contributing.htm...

HeliumHydride

I was able to find it instantly, but that's because I always use curly braces for my if/while/for loops.

WalterBright

I added a warning for it in my C compiler back in the mid 1980s.

Fun story. A friend of mine (Eric Engstrom!) bought himself a backhoe. I'd never driven one before and he offered to let me drive it. Sure!

The clutch pedal works backwards from that in a car. Press on the clutch to engage it, release the pedal to disengage it. After some struggling with my reflexes being all wrong, I came within a couple feet of taking out the side of his barn - by switching off the key.

There was nothing wrong with that user interface, other than being insane.

binaryturtle

Indeed, enforcing brackets here would be the proper fix, IMHO. :)

I always use brackets too. That's simply a less error-prone style.

dfawcus

clang and gcc now warn of that.

  $ gcc-12 -g -O2 -std=c11 -Wall -Wextra -Wpedantic -Werror c-error.c
  c-error.c:2:10: error: return type of ‘main’ is not ‘int’ [-Werror=main]
      2 |     void main()
        |          ^~~~
  c-error.c: In function ‘main’:
  c-error.c:5:9: error: this ‘for’ clause does not guard... [-Werror=misleading-indentation]
      5 |         for (i = 0; i < 10; ++i);
        |         ^~~
  c-error.c:6:13: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘for’
      6 |             printf("%d\n", i);
        |             ^~~~~~
  cc1: all warnings being treated as errors

and:

  $ clang-14 -g -O2 -std=c11 -Wall -Wextra -Wpedantic -Werror c-error.c
  c-error.c:2:5: error: 'main' must return 'int'
      void main()
      ^~~~
      int
  c-error.c:5:33: error: for loop has empty body [-Werror,-Wempty-body]
          for (i = 0; i < 10; ++i);
                                  ^
  c-error.c:5:33: note: put the semicolon on a separate line to silence this warning
  2 errors generated.

Now granted, those are specific implementations, not things mandated by language changes.

WalterBright

> I've died on that hill.

But I'm feeling much better.

mystified5016

Forward parameter declaration is an insane feature. It makes perfect sense in the context of C's other forward declarations, but it is just bonkers.

I can't wait to slip this into some production code to confuse the hell out of some intern in a few years

svilen_dobrev

hehe. similar to

How to Get Fired Using Switch Statements & Statement Expressions:

https://blog.robertelder.org/switch-statements-statement-exp...

kazinator

Without information about how identifiers are declared, you do not know how to parse this:

  (A)(B);

It could be a cast of B to type A, or function A being called with argument B.

Or this (like the puts(puts) in the article):

  A(B);

Could be a declaration of B as an identifier of type A, or a call to a function A with argument B.
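To make the ambiguity concrete, here is a small sketch in which the same token sequences parse differently depending on what `A` names in scope. All function names are made up for illustration.

```c
#include <assert.h>

static int twice(int x) { return 2 * x; }

/* A names a type here, so (A)(B) is a cast of B to A. */
int as_cast(double B)
{
    typedef int A;
    return (A)(B);              /* converts B to int, truncating toward zero */
}

/* A names a function pointer here, so the same tokens are a call. */
int as_call(int B)
{
    int (*A)(int) = twice;
    return (A)(B);              /* calls twice(B) */
}

/* A names a type here, so the statement A(B); is a declaration. */
int as_declaration(void)
{
    typedef int A;
    A(B);                       /* declares an int named B */
    B = 7;
    return B;
}

void check_parses(void)
{
    assert(as_cast(3.9) == 3);
    assert(as_call(21) == 42);
    assert(as_declaration() == 7);
}
```

A parser without the symbol table sees identical tokens in `as_cast` and `as_call`, which is why it has to guess or try both.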

Back in 1999 I made a small C module called "sfx" (side effects) which parses and identifies C expressions that could plausibly contain side effects. This is one of the bits provided in a small collection called Kazlib.

This can be used to make macros safer; it lets you write a #define macro that inserts an argument multiple times into the expansion. Such a macro could be unsafe if the argument has side effects. With this module, you can write the macro in such a way that it will catch the situation (albeit at run time!). It's like a valgrind for side effects in macros, so to speak.

https://git.savannah.gnu.org/cgit/kazlib.git/tree/sfx.c
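For readers unfamiliar with the hazard, the following sketch shows the problem only, namely an argument with side effects being evaluated twice. It does not use or imitate the sfx interface, and the `SQUARE` macro and helper names are hypothetical.

```c
#include <assert.h>

/* Classic unsafe macro: the argument appears twice in the expansion. */
#define SQUARE(x) ((x) * (x))

static int calls;                          /* evaluations of next_value() */
static int next_value(void) { return ++calls; }
static int square(int x)    { return x * x; }

/* Returns how often the macro evaluated its argument. */
int evaluations_in_macro(void)
{
    calls = 0;
    int r = SQUARE(next_value());          /* ((next_value()) * (next_value())) */
    assert(r == 2);                        /* 1 * 2 in either evaluation order */
    (void)r;
    return calls;
}

/* Returns how often a real function call evaluated its argument. */
int evaluations_in_function(void)
{
    calls = 0;
    int r = square(next_value());          /* argument evaluated exactly once */
    assert(r == 1);
    (void)r;
    return calls;
}
```

The macro version evaluates the argument twice and the function version once, which is the kind of difference a run-time side-effect check is meant to flag.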

In the sfx.c module, there is a rudimentary C expression parser which has to work in the absence of declaration info. In other words it has to make sense of an input like (A)(B).

I made it so that when the parser encounters an ambiguity, it will try parsing it both ways, using backtracking via exception handling (provided by except.c). When it hits a syntax error, it can backtrack to an earlier point and parse alternatively.

Consider (A)(A+B). When we are looking at the left part (A), that could plausibly be a cast or declaration. In recursive descent mode, we are going left to right and looking at left derivations. If we parse it as a declaration, we will hit a syntax error on the +, because there is no such operator in the declarator grammar. So we backtrack and parse it as a cast expression, and then we are good.

Hard to believe that was 26 years ago now. I think I was just on the verge of getting into Lisp.

I see the sfx.c code assumes it would never deal with negative character values, so it cheerfully uses the <ctype.h> functions without a cast to unsigned char. It's a reasonable assumption there since the inputs under the intended use case would be expressions in the user's program, stringified by the preprocessor. Funny bytes would only occur in a multi-byte string literal (e.g. UTF-8). When I review code today, this kind of potential issue immediately stands out.
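The defensive form is short. A minimal sketch, with a hypothetical `count_alpha` helper: the `<ctype.h>` functions take an `int` whose value must be representable as `unsigned char` (or be `EOF`), so a plain `char` that may be negative is converted first.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Counts the bytes of s that isalpha() accepts in the current locale. */
size_t count_alpha(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Without the cast, a byte such as 0xC3 may be passed as a
         * negative int where char is signed, which is undefined behaviour. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```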

The same exception module is (still?) used in the Ethereal/Wireshark packet capture and analysis tool. It's used to abort "dissecting" packets that are corrupt or truncated.

jwilk

First part discussed on HN:

https://news.ycombinator.com/item?id=40835274 (113 comments)

zzo38computer

I had read the GCC documentation and I did not know about the forward parameter declaration. I did know about the other stuff that is mentioned there (and in the first part).

Declarations in for loops is something that I had only ever used in macros (I had not found it useful in other circumstances), such as:

  #define lpt_document() for(int lpt_document_=lpt_begin();lpt_document_;lpt_document_=(lpt_end(),0))
  #define win_form(xxx) for(win_memo win_mem=win_begin_();;win_step_(&win_mem,xxx))

(The compiler will optimize out the loop and the declared variable in the use of the lpt_document macro; I had tested this.)
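A stripped-down sketch of the same trick, for illustration only. `section_begin`, `section_end`, and `SECTION` are hypothetical stand-ins, not the functions from the macros above. The loop variable exists only to make the body run exactly once, bracketed by the begin and end calls.

```c
#include <assert.h>

static int depth;                 /* greater than 0 while inside a section */

static int  section_begin(void) { depth++; return 1; }
static void section_end(void)   { depth--; }

/* Runs the statement that follows exactly once, between begin and end.
 * The comma expression calls section_end() and then yields 0, which
 * terminates the loop. */
#define SECTION() \
    for (int section_once_ = section_begin(); section_once_; \
         section_once_ = (section_end(), 0))

/* Returns 10: the body saw depth 1, and depth is back to 0 afterwards. */
int run_section(void)
{
    int inside = 0;
    SECTION() {
        inside = depth;           /* begin has run, end has not yet */
    }
    assert(depth == 0);
    return inside * 10 + depth;
}
```

One caveat with this pattern: a `break` or `return` inside the body skips the end call, because that call lives in the loop's increment expression.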

teddyh

The comp.lang.c Frequently Asked Questions <https://c-faq.com/> should be required reading for every serious C programmer.

hulitu

Also the C infrequently asked questions https://www.seebs.net/faqs/c-iaq.html

GrantMoyer

I should keep this link handy for when people claim C is a simple language. Even without the GNU extensions, the examples here are pretty wretched.

SAI_Peregrinus

C is a small language. People confuse simple with small quite often. As languages get smaller, using them gets more difficult once below a certain size. The "Turing tarpit" languages like Brainfuck are extremely difficult to write complex programs in, mostly because they're so small.

C is clearly too small to be simple. C++ is too large to be simple. Somewhere in between, there may exist a simple language waiting to be invented.

tpoacher

C is simple in the same way Conway's Game of Life is simple.

That's not to say you can't create interesting monstrosities out of it!