The surprising struggle to get a Unix Epoch time from a UTC string in C or C++
105 comments
January 19, 2025
chikere232
johnisgood
It is not.
#define _GNU_SOURCE /* glibc: exposes strptime() and timegm() */
#include <stdio.h>
#include <time.h>

int main(void) {
    struct tm tm = {0};
    const char *time_str = "Mon, 20 Jan 2025 06:07:07 GMT";
    const char *fmt = "%a, %d %b %Y %H:%M:%S GMT";
    // Parse the time string
    if (strptime(time_str, fmt, &tm) == NULL) {
        fprintf(stderr, "Error parsing time\n");
        return 1;
    }
    // Convert to Unix timestamp (UTC)
    time_t timestamp = timegm(&tm);
    if (timestamp == -1) {
        fprintf(stderr, "Error converting to timestamp\n");
        return 1;
    }
    // time_t has no printf conversion of its own, so cast explicitly
    printf("Unix timestamp: %lld\n", (long long)timestamp);
    return 0;
}
It is a C99 code snippet that parses the UTC time string and safely converts it to a Unix timestamp. It follows best practices from the SEI CERT C standard, avoiding locale and timezone issues by using UTC and timegm(). You can avoid the pitfalls of mktime() by using timegm(), which works directly with UTC time.
Where is the struggle? Am I misunderstanding it?
Oh by the way, must read: https://www.catb.org/esr/time-programming/ (Time, Clock, and Calendar Programming In C by Eric S. Raymond)
1vuio0pswjnm7
"Mon, 20 Jan 2025 06:07:07 GMT"
I thought the default output of date(1), with TZ unset, is something like
Mon Jan 20 06:07:07 UTC 2025
That's the busybox default anyway
johnisgood
Well, `Mon Jan 20 06:07:07 UTC 2025` does not match `fmt` in the code. My input matches the format string exactly, which is why it works.
You could use `"%a %b %d %H:%M:%S %Z %Y"` for `fmt` (which is indeed the default for `date`) and it would work with yours.
Both result in the same timestamp.
1vuio0pswjnm7
If I use "UTC" it works. For example,
date.l:
%{
int fileno (FILE *);
FILE *f;
int printf(const char *__restrict, ...);
#include <time.h>
char *strptime(const char *s, const char *f, struct tm *tm);
struct tm t;
%}
a (Mon|Tue|Wed|Thu|Fri|Sat|Sun)
b (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
d [0-2][0-9]|3[01]
H [0-2][0-9]
M [0-5][0-9]
S [0-5][0-9]
Y [1-9][0-9][0-9][0-9]
%option nounput noinput noyywrap
%%
{a}[ ]{b}[ ]{d}[ ]{H}:{M}:{S}[ ]UTC[ ]{Y} {
    strptime(yytext,"%a %b %d %H:%M:%S UTC %Y",&t);
    printf("%ld\n",mktime(&t));
}
.|\n
%%
int main(){yylex();exit(0);}
flex -8Cem date.l
cc -O3 -std=c89 -W -Wall -pipe lex.yy.c -static -s -o yydate
date|yydate
This works for me. No need for timegm(). But if I substitute %Z or %z for "UTC" in strptime() above then this does not work.
Fun fact: strptime() can make timestamps for dates that do not exist on any calendar.
echo "Thu Jun 31 01:59:26 UTC 2024"|yydate
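The "Jun 31" case suggests that strptime() here checks each field's range on its own rather than the combination, so a caller who cares has to validate the date separately. A minimal sketch of such a check, assuming a hypothetical helper named is_valid_ymd() (not part of libc) that takes a full year and a 1-based month:

```c
#include <stdbool.h>

/* Gregorian leap-year rule. */
bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Hypothetical helper, not a libc function: true if year-month-day names a
 * real date in the proleptic Gregorian calendar. month is 1..12. */
bool is_valid_ymd(int year, int month, int day)
{
    static const int days_in_month[12] =
        { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

    if (month < 1 || month > 12 || day < 1)
        return false;

    int limit = days_in_month[month - 1];
    if (month == 2 && is_leap_year(year))
        limit = 29;

    return day <= limit;
}
```

After a successful strptime() call, something like is_valid_ymd(t.tm_year + 1900, t.tm_mon + 1, t.tm_mday) could be used to reject input such as June 31.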
paxcoder
I can find `timegm` neither in the C99 standard draft nor in POSIX.1-2024.
The first sentence of your link reads:
>The C/Unix time- and date-handling API is a confusing jungle full of the corpses of failed experiments and various other traps for the unwary, many of them resulting from design decisions that may have been defensible when the originals were written but appear at best puzzling today.
wahern
timegm was finally standardized by C23, and POSIX-2024 mentions it in the FUTURE DIRECTIONS section of mktime. I don't know precisely what happened with POSIX. I think timegm got lost in the shuffle and by the time Austin Group attention turned back to it, it made more sense to let C23 pick it up first so there were no accidental conflicts in specification.[1]
[1] POSIX-2024 incorporates C17, not C23, but in practice the typical POSIX environment going forward will likely be targeting POSIX-2024 + C23, or just POSIX-2024 + extensions; and hopefully neither POSIX nor C will wait as long between standard updates as previously.
chikere232
https://man7.org/linux/man-pages/man3/timegm.3.html
It's not POSIX, but it's pretty widely available
johnisgood
Yeah, you're correct that `timegm` is neither part of the C99 standard nor officially specified in POSIX.1-2024 but it is widely supported in practice on many platforms, including glibc, musl, and BSD systems which makes it a pragmatic choice in environments where it is available. Additionally, it is easy to implement it in a portable way when unavailable.
So, while `timegm` is not standardized in C99 or POSIX, it is a practical solution in most real-world environments, and alternatives exist for portability, and thus: handling time in C is not inherently a struggle.
As for the link, it says "You may want to bite the bullet and use timegm(3), even though it’s nominally not portable.", but see what I wrote above.
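For illustration, here is a rough sketch of what a portable fallback could look like when timegm() is missing. The names days_from_civil() and my_timegm() are made up for this example, and it uses the well-known "days from civil" arithmetic rather than anything taken from a particular libc. Unlike a real timegm(), it assumes the struct tm fields are already in range and does not normalize them or fill in tm_wday and tm_yday.

```c
#include <time.h>

/* Days from 1970-01-01 to the given proleptic Gregorian date.
 * month is 1..12, day is 1..31. */
long long days_from_civil(long long y, int m, int d)
{
    y -= m <= 2;                                   /* count Jan/Feb as part of the previous year */
    long long era = (y >= 0 ? y : y - 399) / 400;  /* index of the 400-year cycle */
    long long yoe = y - era * 400;                 /* year within the cycle, 0..399 */
    long long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  /* day of a March-based year */
    long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           /* day within the cycle */
    return era * 146097 + doe - 719468;            /* shift so that 1970-01-01 is day 0 */
}

/* Hypothetical stand-in for timegm(): interprets *tm as UTC.
 * Assumes all fields are already within their normal ranges. */
time_t my_timegm(const struct tm *tm)
{
    long long days = days_from_civil(tm->tm_year + 1900LL, tm->tm_mon + 1, tm->tm_mday);
    long long secs = days * 86400 + tm->tm_hour * 3600LL + tm->tm_min * 60LL + tm->tm_sec;
    return (time_t)secs;
}
```

No time zone lookup or locale is involved, so the result does not depend on TZ.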
kazinator
Here is some of my code that works around not having timegm. It is detected in a configure script, so there's a #define symbol indicating whether it's available.
michaelt
> Is it a struggle though?
It’s twelve lines or more, if you include the imports and error handling.
Spreadsheets and SQL will coerce a string to a date without even being asked to. You might want something more structured than that, but you should be able to do it in far less than 12 lines.
C has many clunky elements like this, which makes working with it like pulling teeth.
Suppafly
>Spreadsheets and SQL will coerce a string to a date without even being asked to.
But only when you don't want them to. When you do want them to do it, it's still a pain.
stonogo
Spreadsheets and SQL will coerce a string to a date because someone programmed them to in C or C++.
sitzkrieg
almost like C is logically operating at a lower level than spreadsheets or SQL or something
oguz-ismail
> you should be able to do it in far less than 12 lines
In C++, maybe. In C, not necessarily. If you're not willing to reinvent the wheel why'd you choose C anyway?
pif
What's a man page? [cit]
johnisgood
"manual pages", type "man man" in your terminal.
TZubiri
Never type up man man, it might make the internet implode.
amelius
It's where people went for programming information before ChatGPT and even before StackOverflow.
werdnapk
It's where people went for information "even before" the internet.
pif
I'm sorry the sarcasm was not evident. I learnt to program when men were men, and man was man.
d_burfoot
My personal rule for time processing: use the language-provided libraries for ONLY 2 operations: converting back and forth between a formatted time string with a time zone, and a Unix epoch timestamp. Perform all other time processing in your own code based on those 2 operations, and whenever you start with a new language or framework, just learn those 2.
I've wasted so many dreary hours trying to figure out crappy time processing APIs and libraries. Never again!
avalys
Starting from timestamp A, how do I find the Unix timestamp B corresponding to exactly 6 months in the future from timestamp A?
cryptonector
Adding or subtracting "months" is inherently difficult because months don't have set lengths, varying from 28 through 31 days. Thus adding one month to May 31 is weird: should that be June 30 or July 1 or some other date?
Try not to have to do this sort of thing. You might have to though, and then you'll have to figure out what adding months means for your app.
mjevans
Welcome to Business Logic. This is where I'd really like pushback to result in things that aren't edge cases.
However you also run into day to day business issues like:
* What if it's now a Holiday and things are closed?
* What if it's some commonly busy time like winter break? (Not quite a single holiday)
* What if a disaster of some kind (even just a burst water pipe) halts operations in an unplanned way?
Usually flexibility needs to be built in. It can be fine to 'target' +3 months, but specify it as something like +3m(-0d:+2w) (so, add '3 months' ignoring the day of month, clamp dom to a valid value, allow 0 days before or 14 days after).
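The "ignore the day of month, then clamp" part of that rule can be sketched as below. This is only one possible policy, the struct and function names are invented for the example, and the tolerance window (-0d:+2w) is left to the caller. It assumes positive years.

```c
/* Hypothetical calendar-date type for illustration; month is 1..12. */
struct ymd { int year, month, day; };

/* Number of days in the given month of the proleptic Gregorian calendar. */
int days_in_month(int year, int month)
{
    static const int table[12] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    int leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    return (month == 2 && leap) ? 29 : table[month - 1];
}

/* Add (or subtract) whole months, then clamp the day to the last valid
 * day of the target month. Assumes the resulting year stays positive. */
struct ymd add_months_clamped(struct ymd date, int months)
{
    int index = date.year * 12 + (date.month - 1) + months;  /* months counted from year 0 */
    struct ymd out;
    out.year = index / 12;
    out.month = index % 12 + 1;

    int limit = days_in_month(out.year, out.month);
    out.day = date.day > limit ? limit : date.day;
    return out;
}
```

Under this policy May 31 plus one month lands on June 30, and August 31 plus six months lands on the last day of February.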
Spivak
I think the parent is describing a "bring your own library" approach where a set of known to the author algorithms will be used for those calculations and the only thing the host language will be used for is the parse/convert.
It does remove a lot of the ambiguity of "I wonder what this stdlib's quirks are in their date calculations" but it also seems like a non-trivial amount of effort to port every time.
d_burfoot
The difficulty of this problem rests on the ambiguity of the phrase "exactly 6 months", which is going to depend totally on the precise business logic. But there's no reason to suppose that the requirements of the business logic will agree with the concepts implemented by the datetime library.
layer8
"Exactly 6 months in the future" from an arbitrary timestamp is not well-defined, even when assuming a fixed time zone. What is it supposed to mean?
1970-01-01
13 more years to go until the 2038 problem.
Surely we'll have everything patched up by then..
ahubert
wow that is dedication 1970-01-01! :-)
xnorswap
It worries me how blasé we seem to be to the 2038 problem.
I wonder if people will still be repeating the "Y2k myth" myth as things start to fail.
robertlagrant
People are doing things[0]. We'll see closer to the date what's left, I suppose.
[0] https://en.wikipedia.org/wiki/Year_2038_problem#Implemented_...
quesera
Almost exactly 13 years, in fact!
The overflow happens at 2038-01-19T03:14:08Z.
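A small sketch of where that moment comes from, assuming the host's own time_t is wide enough to represent the values involved. The helper names utc_fields() and next_second_32bit() are invented for this example.

```c
#include <stdint.h>
#include <time.h>

/* Break a seconds-since-epoch count into UTC calendar fields.
 * Returns an all-zero struct tm if gmtime() fails. */
struct tm utc_fields(int64_t seconds)
{
    time_t t = (time_t)seconds;      /* assumes time_t can hold the value */
    struct tm out = {0};
    struct tm *tm = gmtime(&t);      /* note: gmtime() uses a shared static buffer */
    if (tm != NULL)
        out = *tm;
    return out;
}

/* What a signed 32-bit seconds counter holds one second after `now`.
 * The addition is done in unsigned arithmetic to avoid signed-overflow UB;
 * converting back is implementation-defined but wraps on common compilers. */
int32_t next_second_32bit(int32_t now)
{
    return (int32_t)((uint32_t)now + 1u);
}
```

utc_fields(INT32_MAX) gives 2038-01-19 03:14:07 UTC, the last second a signed 32-bit counter can hold. One tick later the counter wraps to INT32_MIN, which decodes as 1901-12-13 20:45:52 UTC.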
account42
The concept of a process-wide locale was a mistake. All locale-dependent functions should be explicit. Yes, that means some programs won't respect your locale because the author didn't care to add support, but at least they won't break in unexpected ways because some functions magically work differently between the user's and the developer's systems.
robertlagrant
Totally agree. Python's gettext() API feels so ancient because it can only cope with one locale at a time, and it would love to get that locale from an environment variable. Not ideal for writing an HTTP service that sends text based on the Accept-Language header.
layer8
It was a very reasonable design when most programs were local-only.
account42
It really wasn't. Even local-only programs need to process data that isn't formatted in the user's locale.
kazinator
A thread-local locale you can easily save and restore would work. In other words, dynamically scoped.
But you don't want to be processing data in locale-dependent ways using the crap available in ISO C.
jonstewart
The headline doesn’t match the article. As it points out, C++20 has a very nice, and portable, time library. I quibble with the article here, though: in 2025, C++20 is widely available.
jeffbee
Indeed. The article should be retitled "C still useless in 2025, including time handling".
chikere232
It would be incorrect, but it's already incorrect as what they're doing isn't really a struggle, so I guess the net result is neutral?
spacechild1
Damn, I didn't notice that C++20 added a whole bunch of new features to the std::chrono library! Nice!
zX41ZdbW
The first rule of thumb is to never use functions from glibc (gmtime, localtime, mktime, etc) because half of them are non-thread-safe, and another half use a global mutex, and they are unreasonably slow. The second rule of thumb is to never use functions from C++, because iostreams are slow, and a stringstream can lead to a silent data loss if an exception is thrown during memory allocation.
ClickHouse has the "parseDateTimeBestEffort" function: https://clickhouse.com/docs/en/sql-reference/functions/type-... and here is its source code: https://github.com/ClickHouse/ClickHouse/blob/74d8551dadf735...
bagels
I came to make the thread-safety comment. Got bit by that myself formatting ISO 8601, would get wrong output... Sometimes.
I won't believe anyone who tells me that handling time in C/C++ isn't perilous.
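For the ISO 8601 case specifically, one common way to sidestep the shared static buffer is gmtime_r(), which writes into a struct tm the caller owns. A minimal sketch, assuming a POSIX system (Windows spells this differently) and a made-up function name:

```c
#define _POSIX_C_SOURCE 200809L   /* asks glibc to declare gmtime_r() */
#include <stddef.h>
#include <time.h>

/* Write t as "YYYY-MM-DDTHH:MM:SSZ" into buf.
 * Returns the number of characters written, or 0 on failure
 * (conversion error or buffer too small). */
size_t format_iso8601_utc(time_t t, char *buf, size_t size)
{
    struct tm tm;                        /* local to this call, so not shared between threads */
    if (gmtime_r(&t, &tm) == NULL)
        return 0;
    return strftime(buf, size, "%Y-%m-%dT%H:%M:%SZ", &tm);
}
```

Whether this avoids every lock inside the C library is a separate question. It only removes the data race on the result buffer.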
p0w3n3d
I think that time handling is the hardest thing in the world of programming.
Explanation: you can learn heap sort or FFT or whatever algorithm there is and implement it. But writing your own calendar from scratch that will, for example, run a cron job at 3 am on the day of a DST transition, and that works in every TZ, is work for many people and many months if not years...
timewizard
Time handling is exceptionally easy. Time zone handling is hard. It doesn't help that the timezone database isn't actually designed to make this any easier.
p0w3n3d
Meanwhile I edited my comment but we're still agreeing. And adding them for example to embedded systems is additional pain. Example: tram or train electronic boards / screens
DougN7
I don’t know. I’ve written what seemed like obviously simple code that got tripped up by the 25-hour day on a DST transition. That’s when I learned to stick to UTC.
sgarland
Debian’s vixie-cron had a bug [0] where if the system TZ was changed without restarting crond, it would continue to run jobs based on the old TZ. It checked for DST transitions, but not TZ.
In fairness, it’s not something that should happen much at all, if ever.
[0]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1019716
wang_li
Assuming the unstated requirement that you want your cron job to only run once per day, scheduling for 3 am is not a software problem. It's a lack of understanding by the person problem. By definition times around the time change can occur twice or not at all. Also, in the US 3am would never be a problem as the time changes at 2 am.
Also, naming things, cache coherency, and off by one errors are the two hardest problems in computer science.
blindriver
I used the ICU packages when I needed to do something like this but it's been a decade since I coded in C++.
havermeyer
The Abseil time library makes time and date parsing and manipulation a lot nicer in C++: https://abseil.io/docs/cpp/guides/time
rstuart4133
For those skimming, the problem is that mktime() interprets its input as local time, and they want it treated as UTC. So you need to subtract the timezone offset used, but the offset varies with the date you feed mktime() and there is no easy way to determine it.
If you are happy for the time to perhaps be wrong around the hours timezone changes, this is an easy hack:
import time

def time_mktime_utc(_tuple):
    result = time.mktime(_tuple[:-1] + (0,))
    return result * 2 - time.mktime(time.gmtime(result))
If you are just using it for display, this is usually fine, as time zone changes are usually timed to happen when nobody is looking.
cryptonector
And the answer is to use `gmtime()`, which AIX doesn't have and which Windows calls something else, but, whatever, if you need to support AIX you can use an open source library.
shakna
AIX has gmtime [0], too. Since at least 7.1.
[0] https://www.ibm.com/docs/en/aix/7.1?topic=c-ctime-localtime-...
chikere232
That is not really the problem.
mktime() parses the time string which lacks any information on time zones
then the article uses timegm() to convert it to unixtime on the assumption that it was in UTC
also it's about C
rstuart4133
> mktime() parses the time string
No, mktime() doesn't parse a string. Parsing the string is done by strptime(). mktime() takes the output of strptime(), which is a C structure or the equivalent in Python - a named tuple with the same fields.
> the time string lacks any information on time zones
Not necessarily. Time strings often contain a time zone. If the string you happen to be parsing doesn't contain a time zone, you could always append one. If it did have a time zone, you could always change it to UTC. So this isn't the problem either.
The root cause of the issue is that the "struct tm" that strptime() outputs doesn't have a field for the time zone, so if the string has one, it is lost. mktime() needs that missing piece of information. It solves that problem by assuming the missing time zone is local time.
> then the article uses timegm() to convert it to unixtime on the assumption that it was in UTC
It does, but timegm() is not a POSIX function so isn't available on most platforms. gmtime() is a POSIX function and is available everywhere. It doesn't convert a "struct tm", but it does allow you to solve the core problem the article labours over, which is finding out what time zone offset mktime() used. With that piece of information it's trivial to convert to UTC, as the above code demonstrates in 2 lines.
> also it's about C
The Python "time" module is a very thin wrapper around the POSIX libc functions and structures. There is a one-to-one correspondence, mostly with the same names. Consequently any experienced C programmer will be able to translate the above Python to C. I chose Python because it expresses the same algorithm much more concisely.
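For reference, a rough C rendering of that two-line Python hack might look like the following. The function name is invented for the example, and the same caveat applies: the answer can be off around the hours when the local zone changes its offset.

```c
#include <time.h>

/* Hypothetical helper: treat the broken-down time in *utc as UTC and return
 * seconds since the epoch, using only mktime() and gmtime().
 * Returns (time_t)-1 on failure. */
time_t mktime_utc_hack(const struct tm *utc)
{
    struct tm local = *utc;
    local.tm_isdst = 0;                       /* mirrors the "+ (0,)" in the Python */
    time_t as_local = mktime(&local);         /* input misread as local time */
    if (as_local == (time_t)-1)
        return (time_t)-1;

    struct tm *shifted = gmtime(&as_local);   /* static buffer, so not thread-safe */
    if (shifted == NULL)
        return (time_t)-1;

    struct tm copy = *shifted;
    time_t offset = mktime(&copy) - as_local; /* the zone offset mktime() applied */
    return as_local - offset;                 /* equals result * 2 - mktime(gmtime(result)) */
}
```

Away from offset changes the local zone cancels out, so the result should be the same whatever TZ is set to.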
d0mine
It is easier in Python:
>>> from email.utils import parsedate_tz, mktime_tz
>>> mktime_tz(parsedate_tz("Fri, 17 Jan 2025 06:07:07"))
1737094027
It converts RFC 2822 time into a POSIX timestamp ([mean solar] seconds since epoch--elapsed SI seconds not counting leap seconds).
TZubiri
Fun fact: HTTP/1.0 used to pass expirations and dates in string format.
[Missing scene]
"We are releasing the HTTP/1.1 specification, whereby expirations are passed as seconds to expire instead of dates as strings."
richrichie
> give us some truly excellent code that we really don’t deserve
Why such self flagellation?
Is it a struggle though?
They needed to have a locale matching the language of the localised time string they wanted to parse, they needed to use strptime to parse the string, they needed to use timegm() to convert the result to seconds when seen as UTC. The man pages pretty much describe these things.
The interface of these things could certainly be nicer, but most of the things they bring up as issues aren't even relevant for the task they're trying to do. Why do they talk about daylight saving time being confusing when they're only trying to deal with UTC, which doesn't have it?