Make C string literals const?
55 comments
· April 6, 2025
jcalvinowens
_kst_
The C standard, since 1989, has said that attempting to modify the array object corresponding to a string literal has undefined behavior. Whether it "works" or not is not the issue.
The problem is that it's currently legal to pass a string literal to a function expecting a (non-const) pointer-to-char argument. As long as the function doesn't try to write through the pointer, there's no undefined behavior. (If the function does try to write through the pointer, the behavior is undefined, but no compile-time diagnostic is required.) If a future version of C made string literals const, such a program would become invalid (a constraint violation requiring a diagnostic). Such code was common in pre-ANSI C, before const was introduced to the language.
The following is currently valid C. The corresponding C++ code would be invalid. The proposal would make it invalid in C, with the cost of breaking some existing code, and the advantage of catching certain errors at compile time.
#include <stdio.h>

void print_message(char *message) {
    puts(message);
    // *message = '\0'; // would have undefined behavior
}

int main(void) {
    print_message("hello");
}
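For contrast, here is a minimal sketch of the const-correct form of the function above, which stays valid whether or not string literals become const. The `int` return type is an assumption added only so the result of `puts` can be checked; the original returns `void`.

```c
#include <stdio.h>

/* Same function as above, but the parameter promises not to write
   through the pointer. Returning int is an illustrative change. */
int print_message(const char *message) {
    /* *message = '\0';   now a constraint violation: the compiler must
                          issue a diagnostic instead of leaving it to run time */
    return puts(message); /* puts returns a non-negative value on success */
}
```

A `const char *` parameter accepts both a string literal and a writable `char` array, so call sites such as `print_message("hello")` need no change.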
jcalvinowens
> Whether it "works" or not is not the issue.
Of course it is. It doesn't work on anything modern, so portable code that actually has to run in the real world cannot have relied on it for a long time.
Your example is not code any competent C programmer would ever write, IMHO. Every proficient C programmer I've ever worked with used "const char *" for string literals, and called out anybody who didn't in review.
Old code already needs special flags to build with modern compilers: I think the benefit of doing this outweighs the cost of editing some makefiles.
_kst_
A conforming implementation could make string literals modifiable, and (obviously non-portable) code could rely on that. I don't know whether any current compilers do so. I suspect not.
Apart from that, it's not about actually modifying string literals. It's about currently valid (but admittedly sloppy) code that uses a non-const pointer to point to a string literal. It's easy to write such code in a way that a modern conforming C compiler will not warn about.
That kind of code is the reason that this proposed change is not just an obvious no-brainer, and the author is doing research to find out how much of an issue it really is.
As it happens, I think that the next C standard should make string literals const. Any code that depends on the current behavior can still be compiled with C23 or earlier compilers, or with a non-conforming option, or by ignoring non-fatal warnings. And of course any such code can be fixed, but that's not necessarily trivial; making the source code changes can be a very small part of the process.
Any change that can break existing valid code should be approached with caution to determine whether it's worth the cost. And if the answer is yes, that's great.
ncruces
The most current SQLite amalgamation (3.49.1) is showing ~70 warnings when compiled with -Wwrite-strings.
But maybe 70 warnings in 250k LoC is OK for your standards of proficiency.
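For readers unfamiliar with the flag: GCC documents -Wwrite-strings as giving string constants the type const char[length] when compiling C, so storing a literal's address in a non-const char * produces a warning. The sketch below shows one pattern that is silent by default and flagged with the option, next to a version that is clean either way. The table contents and function names are hypothetical and are not taken from SQLite.

```c
#include <stddef.h>
#include <string.h>

/* Flagged under -Wwrite-strings: each initializer stores the address of
   a literal in a plain char *. Silent with default GCC/Clang options. */
static char *legacy_names[] = { "alpha", "beta", "gamma" };

/* Clean with or without the option: the elements are pointers to const. */
static const char *const fixed_names[] = { "alpha", "beta", "gamma" };

/* Hypothetical accessor for the legacy table; it only reads. */
const char *legacy_name(size_t i) {
    return legacy_names[i];
}

/* Hypothetical lookup: returns the index of key in names, or -1. */
int find_name(const char *const *names, size_t count, const char *key) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(names[i], key) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

Compiling this with `-Wwrite-strings` should report the three initializers of `legacy_names` and nothing else.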
kazinator
If you suddenly make string literals const char, tons of previously correct code will require diagnostics. Code which requires diagnostics is incorrect and has undefined behavior if it is translated and executed anyway.
C++ went through this over 20 years ago. I can't remember if it was already in C++03 or whether it was a post-'03 draft feature.
zabzonk
Yes, but C (or C++, for that matter) has no concept of .rodata. This is something that needs to be enforced by the compiler, as it is in C++, and why C programmers should probably simply use a C++ compiler, with its much stronger type checking.
jcalvinowens
You missed the point: I'm saying it has been impossible to modify string literals forever, so enforcing const is probably a non-issue except in very old C.
zabzonk
It is completely possible to write C code which does attempt to write to string literals.
the_svd_doctor
Right but some code will stop compiling, no?
hun3
The affected platforms lack an OS (e.g., bootloaders) and/or an MMU/MPU (e.g., microcontrollers like AVR).
jcalvinowens
I don't care about platform-specific stuff. I'm talking about C which is actually intended to be portable. Nothing written with portability in mind in the past ~decade is going to be doing this.
dyhi55
C is not node.js. C has existed for 50 years and is expected to have a stable API. In scientific circles it's not unusual to compile C and F77 libraries built in the '70s and '80s.
BLAS, gemv, GEMM, SGEMM libraries are from 1979, 1984, 1989. You may have seen these words scroll by when compiling modern 2025 CUDA :)
hun3
I think we're going a bit past each other.
On AVR or other MPU-less architectures you can literally modify the string literal memory without triggering a crash.
Why? Because there is no memory protection ("rodata") at all.
And such microcontrollers are still in use today, so it's a bit too far-fetched to say "really old code."
It's UB, sure, but how many embedded programmers actually care? The OP's proposal is trying to change the type system so that this UB becomes much less likely to trigger in practice.
Dwedit
Wait, C string literals are not already const? On many platforms, they live in a read-only data section, which is write-protected memory.
HeliumHydride
They're not const because of backwards compatibility. Const correctness in C is a lot weaker than the way C++ enforces it, letting you implicitly cast it away in a lot of cases.
jcalvinowens
On all modern platforms I'm familiar with, if you try to modify a string literal, you'll segfault. So while it's not const at the language level, it is very much const at the machine level.
zabzonk
At runtime, yes. But I want to know about errors like this at compile time.
dyhi55
You're young. On all the legacy platforms I'm familiar with, you can modify string literals. That's original C.
dyhi55
Strings, including string literals, are supposed to be writable for strtok() to work. const char * is a modern C construct. You gotta deprecate parts of the standard C library, which will break backward compatibility...
kazinator
Using strtok on a string literal has been undefined behavior since ANSI C 89.
The standard C library uses const char * almost everywhere where a string is accepted that will not be modified.
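A minimal sketch of the usual workaround: copy the text into a writable array and let strtok() modify the copy. The function name and the 128-byte scratch buffer are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Counts comma-separated fields in text without writing to text itself.
   Returns -1 if text does not fit the (hypothetical) scratch buffer. */
int count_fields(const char *text) {
    char scratch[128];
    size_t len = strlen(text);
    if (len >= sizeof scratch) {
        return -1;
    }
    memcpy(scratch, text, len + 1); /* copy including the terminator */

    int count = 0;
    /* strtok writes '\0' over delimiters, but only inside scratch. */
    for (char *tok = strtok(scratch, ","); tok != NULL; tok = strtok(NULL, ",")) {
        count++;
    }
    return count;
}
```

Calling `count_fields("a,b,,c")` is fine because the literal is only read. Note that strtok() treats adjacent delimiters as one, so the empty field is not counted. A local `char csv[] = "a,b,c";` works the same way, since the array is a writable copy initialized from the literal.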
kevin_thibedeau
I have a strtok() clone for this purpose that returns a pointer range for each token, leaving the string untouched.
kazinator
But then you have to copy out those pieces in order to have them null terminated so they can correctly function as strings.
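To make the trade-off concrete, here is a rough sketch of what such a tokenizer could look like. It is not kevin_thibedeau's code; the struct and function names are hypothetical. Tokens are (pointer, length) views into the original string, and a separate helper does the copy when a null-terminated string is needed.

```c
#include <stddef.h>
#include <string.h>

/* A token is a view into the original string; it is not null-terminated. */
struct token {
    const char *start;
    size_t len;
};

/* Hypothetical strtok-like scanner that never writes to the input.
   Advances *cursor past the token. Returns 1 and fills *out when a
   token is found, 0 when the input is exhausted. */
int next_token(const char **cursor, const char *delims, struct token *out) {
    const char *p = *cursor + strspn(*cursor, delims); /* skip delimiters */
    if (*p == '\0') {
        *cursor = p;
        return 0;
    }
    out->start = p;
    out->len = strcspn(p, delims); /* length up to the next delimiter */
    *cursor = p + out->len;
    return 1;
}

/* Copies a token into buf and null-terminates it.
   Returns 0 on success, -1 if buf is too small. */
int token_to_string(struct token t, char *buf, size_t bufsize) {
    if (t.len >= bufsize) {
        return -1;
    }
    memcpy(buf, t.start, t.len);
    buf[t.len] = '\0';
    return 0;
}
```

The copy is only needed when a token is handed to an API that expects a null-terminated string. For printing, `printf("%.*s", (int)t.len, t.start)` reads at most `t.len` bytes, so no copy is required there.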
bodyfour
The issue is that "const" didn't exist in the earliest forms of C... and even when it became available not everybody started using it.
So you might have a function that doesn't have proper "const" qualifications in its prototype like:
void my_log(char *message);
and then call-sites like: my_log("Hello, World!");
...and that needed to stay compiling.
kazinator
Some C projects have been ready for this for years due to supporting being compiled as C++.
iknowstuff
Doesn’t the first paragraph address this?
KingLancelot
[dead]
Modifying string literals has never worked on any platform I've run code on in the past 20 years. They're always in .rodata. I can't imagine doing this by default would be a problem except for really old code.