The surprising struggle to get a Unix Epoch time from a UTC string in C or C++

chikere232

Is it a struggle though?

They needed to have a locale matching the language of the localised time string they wanted to parse, they needed to use strptime to parse the string, they needed to use timegm() to convert the result to seconds when seen as UTC. The man pages pretty much describe these things.

The interface of these things could certainly be nicer, but most of the things they bring up as issues aren't even relevant for the task they're trying to do. Why do they talk about daylight saving time being confusing when they're only trying to deal with UTC, which doesn't have it?

johnisgood

It is not.

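  #define _XOPEN_SOURCE 700   /* exposes strptime() */
  #define _DEFAULT_SOURCE     /* exposes timegm() on glibc and musl */
  #include <stdio.h>
  #include <time.h>

  int main(void) {
      struct tm tm = {0};
      const char *time_str = "Mon, 20 Jan 2025 06:07:07 GMT";
      const char *fmt = "%a, %d %b %Y %H:%M:%S GMT";

      // Parse the time string into broken-down fields
      if (strptime(time_str, fmt, &tm) == NULL) {
          fprintf(stderr, "Error parsing time\n");
          return 1;
      }

      // Convert to Unix timestamp, treating the fields as UTC
      time_t timestamp = timegm(&tm);
      if (timestamp == -1) {
          fprintf(stderr, "Error converting to timestamp\n");
          return 1;
      }

      // time_t has no portable printf format, so cast explicitly
      printf("Unix timestamp: %ld\n", (long)timestamp);
      return 0;
  }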
It is a C99 code snippet that parses the UTC time string and safely converts it to a Unix timestamp and it follows best practices from the SEI CERT C standard, avoiding locale and timezone issues by using UTC and timegm().

You can avoid the pitfalls of mktime() by using timegm(), which works directly with UTC time.

Where is the struggle? Am I misunderstanding it?

Oh by the way, must read: https://www.catb.org/esr/time-programming/ (Time, Clock, and Calendar Programming In C by Eric S. Raymond)

1vuio0pswjnm7

"Mon, 20 Jan 2025 06:07:07 GMT"

I thought the default output of date(1), with TZ unset, is something like

   Mon Jan 20 06:07:07 UTC 2025
That's the busybox default anyway

johnisgood

Well, `Mon Jan 20 06:07:07 UTC 2025` does not match `fmt` in the code. My input matches the format string exactly, which is why it works.

You could use `"%a %b %d %H:%M:%S %Z %Y"` for `fmt` (which is indeed the default for `date`) and it would work with yours.

Both result in the same timestamp.

1vuio0pswjnm7

If I use "UTC" it works. For example,

date.l:

    int fileno (FILE *);
    FILE *f;
    int printf(const char *__restrict, ...);
    #include <time.h>
    char *strptime(const char *s, const char *f, struct tm *tm);
    struct tm t;
   a (Mon|Tue|Wed|Thu|Fri|Sat|Sun)
   b (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
   d [0-2][0-9]|3[01]
   H [0-2][0-9]
   M [0-5][0-9]
   S [0-5][0-9]
   Y [1-9][0-9][0-9][0-9] 
   %option nounput noinput noyywrap
   %%
   {a}[ ]{b}[ ]{d}[ ]{H}:{M}:{S}[ ]UTC[ ]{Y} {
    strptime(yytext,"%a %b %d %H:%M:%S UTC %Y",&t);
    printf("%ld\n",mktime(&t));
    }
   .|\n
   %%
    int main(){yylex();exit(0);}
   
    flex -8Cem date.l
    cc -O3 -std=c89 -W -Wall -pipe lex.yy.c -static -s -o yydate

    date|yydate
This works for me. No need for timegm().

But if I substitute %Z or %z for "UTC" in strptime() above then this does not work.

Fun fact: strptime() can make timestamps for dates that do not exist on any calendar.

     echo "Thu Jun 31 01:59:26 UTC 2024"|yydate
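
One way to catch such dates, sketched under the assumption that timegm() is available (it is on glibc, musl and the BSDs): let the conversion normalize the fields, then check whether they still match what was asked for. The helper name `is_real_utc_date` is made up for illustration.

```c
#define _DEFAULT_SOURCE 1   /* assumption: glibc/musl, exposes timegm() */
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: true only if year/month/day name a date that exists
   on the calendar. month is 1-12, day is 1-31. */
bool is_real_utc_date(int year, int month, int day)
{
    struct tm want = {0};
    want.tm_year = year - 1900;
    want.tm_mon  = month - 1;
    want.tm_mday = day;

    /* timegm() normalizes its argument in place, so "Jun 31" becomes "Jul 1". */
    struct tm got = want;
    (void)timegm(&got);

    /* If normalization changed any field, the requested date was not real. */
    return got.tm_year == want.tm_year &&
           got.tm_mon  == want.tm_mon  &&
           got.tm_mday == want.tm_mday;
}
```

With this, `is_real_utc_date(2024, 6, 31)` is false while `is_real_utc_date(2024, 6, 30)` is true.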

paxcoder

I can't find `timegm` in either the C99 standard draft or POSIX.1-2024.

The first sentence of your link reads:

>The C/Unix time- and date-handling API is a confusing jungle full of the corpses of failed experiments and various other traps for the unwary, many of them resulting from design decisions that may have been defensible when the originals were written but appear at best puzzling today.

wahern

timegm was finally standardized by C23, and POSIX-2024 mentions it in the FUTURE DIRECTIONS section of mktime. I don't know precisely what happened with POSIX. I think timegm got lost in the shuffle and by the time Austin Group attention turned back to it, it made more sense to let C23 pick it up first so there were no accidental conflicts in specification.[1]

[1] POSIX-2024 incorporates C17, not C23, but in practice the typical POSIX environment going forward will likely be targeting POSIX-2024 + C23, or just POSIX-2024 + extensions; and hopefully neither POSIX nor C will wait as long between standard updates as previously.

chikere232

https://man7.org/linux/man-pages/man3/timegm.3.html

It's not posix, but it's pretty available

johnisgood

Yeah, you're correct that `timegm` is neither part of the C99 standard nor officially specified in POSIX.1-2024 but it is widely supported in practice on many platforms, including glibc, musl, and BSD systems which makes it a pragmatic choice in environments where it is available. Additionally, it is easy to implement it in a portable way when unavailable.

So, while `timegm` is not standardized in C99 or POSIX, it is a practical solution in most real-world environments, and alternatives exist for portability, and thus: handling time in C is not inherently a struggle.

As for the link, it says "You may want to bite the bullet and use timegm(3), even though it’s nominally not portable.", but see what I wrote above.
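
For what it's worth, here is a rough sketch of what such a portable fallback could look like. It uses the well-known days-from-civil arithmetic for the proleptic Gregorian calendar rather than any libc conversion. The names `days_from_civil` and `portable_timegm` are made up here, and unlike the real timegm() this sketch assumes the fields are already in range and does not normalize them.

```c
#include <stdint.h>
#include <time.h>

/* Days since 1970-01-01 for a proleptic Gregorian date (month is 1-12). */
static int64_t days_from_civil(int64_t y, int m, int d)
{
    y -= (m <= 2);                                   /* treat Jan/Feb as part of the previous year */
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const int64_t yoe = y - era * 400;               /* year of era, 0..399 */
    const int64_t doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  /* day of year, March-based */
    const int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           /* day of era */
    return era * 146097 + doe - 719468;              /* shift so that 1970-01-01 is day 0 */
}

/* Hypothetical stand-in for timegm(): interprets the fields as UTC.
   Returns int64_t so the result does not depend on the width of time_t.
   Assumes in-range fields; no normalization is performed. */
int64_t portable_timegm(const struct tm *tm)
{
    int64_t days = days_from_civil((int64_t)tm->tm_year + 1900,
                                   tm->tm_mon + 1, tm->tm_mday);
    return days * 86400 + tm->tm_hour * 3600 + tm->tm_min * 60 + tm->tm_sec;
}
```

Fed the fields for 2025-01-20 06:07:07, this gives 1737353227.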

kazinator

Here is some of my code that works around not having timegm. It is detected in a configure script, so there's a #define symbol indicating whether it's available.

https://www.kylheku.com/cgit/txr/tree/time.c

michaelt

> Is it a struggle though?

It’s twelve lines or more, if you include the imports and error handling.

Spreadsheets and SQL will coerce a string to a date without even being asked to. You might want something more structured than that, but you should be able to do it in far less than 12 lines.

C has many clunky elements like this, which makes working with it like pulling teeth.

Suppafly

>Spreadsheets and SQL will coerce a string to a date without even being asked to.

But only when you don't want them to; when you do want them to, it's still a pain.

stonogo

Spreadsheets and SQL will coerce a string to a date because someone programmed them to in C or C++.

sitzkrieg

almost like C is logically operating at a lower level than spreadsheets or SQL or something

oguz-ismail

> you should be able to do it in far less than 12 lines

In C++, maybe. In C, not necessarily. If you're not willing to reinvent the wheel why'd you choose C anyway?

pif

What's a man page? [cit]

johnisgood

"manual pages", type "man man" in your terminal.

https://man7.org/linux/man-pages/man1/man.1.html

TZubiri

Never type up man man, it might make the internet implode.

amelius

It's where people went for programming information before ChatGPT and even before StackOverflow.

werdnapk

It's where people went for information "even before" the internet.

pif

I'm sorry the sarcasm was not evident. I learnt to program when men were men, and man was man.

cstrahan

[dead]

d_burfoot

My personal rule for time processing: use the language-provided libraries for ONLY 2 operations: converting back and forth between a formatted time string with a time zone, and a Unix epoch timestamp. Perform all other time processing in your own code based on those 2 operations, and whenever you start with a new language or framework, just learn those 2.

I've wasted so many dreary hours trying to figure out crappy time processing APIs and libraries. Never again!

avalys

Starting from timestamp A, how do I find the Unix timestamp B corresponding to exactly 6 months in the future from timestamp A?

cryptonector

Adding or subtracting "months" is inherently difficult because months don't have set lengths, varying from 28 through 31 days. Thus adding one month to May 31 is weird: should that be June 30 or July 1 or some other date?

Try not to have to do this sort of thing. You might have to though, and then you'll have to figure out what adding months means for your app.
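
To make the ambiguity concrete, here is a small sketch of two possible policies in UTC, assuming timegm() and gmtime_r() are available. The function names are made up. The first lets the library roll an overflowing day into the next month; the second clamps to the last day of the target month.

```c
#define _DEFAULT_SOURCE 1   /* assumption: glibc/musl, exposes timegm() */
#include <time.h>

static int days_in_month(int year, int mon0)   /* mon0 is 0-11 */
{
    static const int len[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    int leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    return (mon0 == 1 && leap) ? 29 : len[mon0];
}

/* Policy 1: let timegm() normalize. May 31 + 1 month -> "June 31" -> July 1. */
time_t add_months_rollover(time_t t, int months)
{
    struct tm tm;
    if (gmtime_r(&t, &tm) == NULL)
        return (time_t)-1;
    tm.tm_mon += months;
    return timegm(&tm);
}

/* Policy 2: clamp the day of month. May 31 + 1 month -> June 30. */
time_t add_months_clamped(time_t t, int months)
{
    struct tm tm;
    if (gmtime_r(&t, &tm) == NULL)
        return (time_t)-1;

    int total = tm.tm_mon + months;
    int year  = tm.tm_year + 1900 + total / 12;
    int mon   = total % 12;
    if (mon < 0) {              /* C's % can be negative for negative operands */
        mon += 12;
        year -= 1;
    }

    int last = days_in_month(year, mon);
    tm.tm_year = year - 1900;
    tm.tm_mon  = mon;
    if (tm.tm_mday > last)
        tm.tm_mday = last;
    return timegm(&tm);
}
```

Starting from 2025-05-31T00:00:00Z (1748649600), the first policy lands on July 1 and the second on June 30. Neither is "correct"; the app has to pick one.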

mjevans

Welcome to Business Logic. This is where I'd really like pushback to result in things that aren't edge cases.

However you also run into day to day business issues like:

* What if it's now a Holiday and things are closed?

* What if it's some commonly busy time like winter break? (Not quite a single holiday)

* What if a disaster of some kind (even just a burst water pipe) halts operations in an unplanned way?

Usually flexibility needs to be built in. It can be fine to 'target' +3 months, but specify it as something like +3m(-0d:+2w) (so, add '3 months' ignoring the day of month, clamp dom to a valid value, allow 0 days before or 14 days after).

Spivak

I think the parent is describing a "bring your own library" approach where a set of algorithms known to the author will be used for those calculations and the only thing the host language will be used for is the parse/convert.

It does remove a lot of the ambiguity of "I wonder what this stdlib's quirks are in their date calculations" but it also seems like a non-trivial amount of effort to port every time.

d_burfoot

The difficulty of this problem rests on the ambiguity of the phrase "exactly 6 months", which is going to depend totally on the precise business logic. But there's no reason to suppose that the requirements of the business logic will agree with the concepts implemented by the datetime library.

layer8

"Exactly 6 months in the future" from an arbitrary timestamp is not well-defined, even when assuming a fixed time zone. What is it supposed to mean?

1970-01-01

13 more years to go until the 2038 problem.

Surely we'll have everything patched up by then..

ahubert

wow that is dedication 1970-01-01! :-)

xnorswap

It worries me how blasé we seem to be to the 2038 problem.

I wonder if people will still be repeating the "Y2k myth" myth as things start to fail.

robertlagrant

People are doing things[0]. We'll see closer to the date what's left, I suppose.

[0] https://en.wikipedia.org/wiki/Year_2038_problem#Implemented_...

quesera

Almost exactly 13 years, in fact!

The overflow happens at 2038-01-19T03:14:08Z.
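
A small sketch of what that looks like to a signed 32-bit seconds counter, assuming a POSIX gmtime_r(). The helper names are made up. The last representable second is 2038-01-19T03:14:07Z; one tick later the counter wraps to its minimum, which reads as a date in December 1901.

```c
#define _POSIX_C_SOURCE 200809L   /* for gmtime_r() */
#include <stdint.h>
#include <time.h>

/* Simulates a signed 32-bit seconds counter advancing by one second. */
int32_t tick_32bit(int32_t t)
{
    uint32_t u = (uint32_t)t + 1u;   /* unsigned arithmetic wraps with defined behavior */
    /* Map the bit pattern back to a signed value without relying on
       implementation-defined narrowing. */
    int64_t s = (u > (uint32_t)INT32_MAX) ? (int64_t)u - 4294967296LL : (int64_t)u;
    return (int32_t)s;
}

/* UTC calendar year for a 32-bit timestamp, or 0 if the conversion fails. */
int utc_year(int32_t t)
{
    time_t tt = (time_t)t;
    struct tm tm;
    if (gmtime_r(&tt, &tm) == NULL)
        return 0;
    return tm.tm_year + 1900;
}
```

Here `utc_year(INT32_MAX)` is 2038, while `utc_year(tick_32bit(INT32_MAX))` is 1901.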

account42

The concept of a process-wide locale was a mistake. All locale-dependent functions should be explicit. Yes, that means some programs won't respect your locale because the author didn't care to add support, but at least they won't break in unexpected ways because some functions magically work differently between the user's and the developer's systems.

robertlagrant

Totally agree. Python's gettext() API feels so ancient because it can only cope with one locale at a time, and it would love to get that locale from an environment variable. Not ideal for writing an HTTP service that sends text based on the Accept-Language header.

layer8

It was a very reasonable design when most programs were local-only.

account42

It really wasn't. Even local-only programs need to process data that isn't formatted in the user's locale.

kazinator

A thread-local locale that you can easily save and restore would work. In other words, dynamically scoped.

But you don't want to be processing data in locale-dependent ways using the crap available in ISO C.
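
POSIX.1-2008 does have something in this direction: uselocale() switches the locale for the calling thread only and returns the previous one, so it can be saved and restored. A minimal sketch, assuming a platform that declares newlocale() and uselocale() in <locale.h> (glibc and musl do; macOS wants <xlocale.h> as well). The helper name is made up.

```c
#define _POSIX_C_SOURCE 200809L   /* for newlocale(), uselocale(), gmtime_r() */
#include <locale.h>
#include <time.h>

/* Hypothetical helper: formats an HTTP-style date with English day and month
   names regardless of the process-wide locale. Returns 0 on failure. */
size_t format_http_date(time_t t, char *buf, size_t size)
{
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return 0;

    locale_t saved = uselocale(c_loc);   /* affects the calling thread only */

    struct tm tm;
    size_t n = 0;
    if (gmtime_r(&t, &tm) != NULL)
        n = strftime(buf, size, "%a, %d %b %Y %H:%M:%S GMT", &tm);

    uselocale(saved);                    /* restore whatever was active before */
    freelocale(c_loc);
    return n;
}
```

Calling it with 1737353227 writes "Mon, 20 Jan 2025 06:07:07 GMT" even if another thread has called setlocale() with a different language.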

jonstewart

The headline doesn’t match the article. As it points out, C++20 has a very nice, and portable, time library. I quibble with the article here, though: in 2025, C++20 is widely available.

jeffbee

Indeed. The article should be retitled "C still useless in 2025, including time handling".

chikere232

It would be incorrect, but it's already incorrect as what they're doing isn't really a struggle, so I guess the net result is neutral?

spacechild1

Damn, I didn't notice that C++20 added a whole bunch of new features to the std::chrono library! Nice!

zX41ZdbW

The first rule of thumb is to never use functions from glibc (gmtime, localtime, mktime, etc) because half of them are non-thread-safe, and another half use a global mutex, and they are unreasonably slow. The second rule of thumb is to never use functions from C++, because iostreams are slow, and a stringstream can lead to a silent data loss if an exception is thrown during memory allocation.

ClickHouse has the "parseDateTimeBestEffort" function: https://clickhouse.com/docs/en/sql-reference/functions/type-... and here is its source code: https://github.com/ClickHouse/ClickHouse/blob/74d8551dadf735...

bagels

I came to make the thread-safety comment. Got bitten by that myself formatting ISO 8601; I would get wrong output... sometimes.

I won't believe anyone who tells me that handling time in c/c++ isn't perilous.
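
For reference, the usual way around the shared static buffer is the reentrant gmtime_r(), which writes into a struct the caller owns. A minimal sketch assuming POSIX; the helper name is made up.

```c
#define _POSIX_C_SOURCE 200809L   /* for gmtime_r() */
#include <time.h>

/* Hypothetical helper: writes t as an ISO 8601 UTC string into buf.
   Returns the number of characters written, or 0 on failure. */
size_t format_iso8601_utc(time_t t, char *buf, size_t size)
{
    struct tm tm;   /* owned by this call, unlike the static buffer behind gmtime() */
    if (gmtime_r(&t, &tm) == NULL)
        return 0;
    /* Only numeric conversions are used, so the output is locale-independent. */
    return strftime(buf, size, "%Y-%m-%dT%H:%M:%SZ", &tm);
}
```

The buffer needs at least 21 bytes for a four-digit year.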

p0w3n3d

I think that time handling is the hardest thing in the world of programming.

Explanation: you can learn heap sort or FFT or whatever algorithm there is and implement it. But writing your own calendar from scratch, one that will, for example, run a cron job at 3 am on the day of a DST transition and work in every TZ, is work for many people and many months, if not years...

timewizard

Time handling is exceptionally easy. Time zone handling is hard. It doesn't help that the timezone database isn't actually designed to make this any easier.

p0w3n3d

Meanwhile I edited my comment but we're still agreeing. And adding them for example to embedded systems is additional pain. Example: tram or train electronic boards / screens

DougN7

I don’t know. I’ve written what seemed like obviously simple code that got tripped up by the 25-hour day on a DST transition. That’s when I learned to stick to UTC.

sgarland

Debian’s vixie-cron had a bug [0] where if the system TZ was changed without restarting crond, it would continue to run jobs based on the old TZ. It checked for DST transitions, but not TZ.

In fairness, it’s not something that should happen much at all, if ever.

[0]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1019716

wang_li

Assuming the unstated requirement that you want your cron job to only run once per day, scheduling for 3 am is not a software problem. It's a lack of understanding by the person problem. By definition times around the time change can occur twice or not at all. Also, in the US 3am would never be a problem as the time changes at 2 am.
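
The "occurs twice" case can be seen directly with mktime(), which uses tm_isdst to pick between the two readings. A sketch, assuming a libc that accepts POSIX-style TZ rule strings (glibc and musl do) and using US Eastern rules written out by hand. The function name is made up, and note that it changes TZ for the whole process.

```c
#define _POSIX_C_SOURCE 200809L   /* for setenv() and tzset() */
#include <stdlib.h>
#include <time.h>

/* Hypothetical demo: seconds between the two instants that both read
   01:30 local time on 2025-11-02 under US Eastern rules.
   Side effect: overwrites the TZ environment variable for the process. */
long ambiguous_gap_seconds(void)
{
    /* EST is UTC-5, EDT runs from the 2nd Sunday of March to the 1st Sunday of November. */
    setenv("TZ", "EST5EDT,M3.2.0,M11.1.0", 1);
    tzset();

    struct tm first = {0};
    first.tm_year = 2025 - 1900;
    first.tm_mon  = 10;          /* November */
    first.tm_mday = 2;
    first.tm_hour = 1;
    first.tm_min  = 30;

    struct tm second = first;
    first.tm_isdst  = 1;         /* 01:30 while still on daylight time */
    second.tm_isdst = 0;         /* 01:30 again, after falling back */

    time_t a = mktime(&first);
    time_t b = mktime(&second);
    return (long)difftime(b, a);
}
```

The two readings are an hour apart, so a job scheduled for 01:30 local time has two candidate instants that day.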

Also, naming things, cache coherency, and off by one errors are the two hardest problems in computer science.

blindriver

I used the ICU packages when I needed to do something like this but it's been a decade since I coded in C++.

https://unicode-org.github.io/icu/userguide/datetime/

havermeyer

The Abseil time library makes time and date parsing and manipulation a lot nicer in C++: https://abseil.io/docs/cpp/guides/time

rstuart4133

For those skimming, the problem is that mktime() interprets its input as local time, and they want UTC. So you need to subtract the timezone offset it used, but that offset varies with the date you feed mktime() and there is no easy way to determine it.

If you are happy for the time to perhaps be wrong around the hours timezone changes, this is an easy hack:

    import time
    def time_mktime_utc(_tuple):
        result = time.mktime(_tuple[:-1] + (0,))
        return result * 2 - time.mktime(time.gmtime(result))
If you are just using it for display this is usually fine as time zone changes are usually timed to happen when nobody is looking.

cryptonector

And the answer is to use `gmtime()`, which AIX doesn't have and which Windows calls something else, but, whatever, if you need to support AIX you can use an open source library.

shakna

AIX has gmtime [0], too. Since at least 7.1.

[0] https://www.ibm.com/docs/en/aix/7.1?topic=c-ctime-localtime-...

chikere232

That is not really the problem.

mktime() parses the time string which lacks any information on time zones

then the article uses timegm() to convert it to unixtime on the assumption that it was in UTC

also it's about C

rstuart4133

> mktime() parses the time string

No, mktime() doesn't parse a string. Parsing the string is done by strptime(). mktime() takes the output of strptime(), which is a C structure or the equivalent in Python - a named tuple with the same fields.

> the time string lacks any information on time zones

Not necessarily. Time strings often contain a time zone. If the string you happen to be parsing doesn't contain a time zone, you could always append one. If it did have a time zone, you could always change it to UTC. So this isn't the problem either.

The root cause of the issue is that the "struct tm" that strptime() outputs doesn't have a field for the time zone, so if the string has one, it is lost. mktime() needs that missing piece of information. It solves that problem by assuming the missing time zone is local time.

> then the article uses timegm() to convert it to unixtime on the assumption that it was in UTC

It does, but timegm() is not a POSIX function so isn't available on most platforms. gmtime() is a POSIX function and is available everywhere. It doesn't convert a "struct tm", but it does allow you to solve the core problem the article labours over, which is finding out what time zone offset mktime() used. With that piece of information it's trivial to convert to UTC, as the above code demonstrates in 2 lines.

> also it's about C

The Python "time" module is a very thin wrapper around the POSIX libc functions and structures. There is a one-to-one correspondence, mostly with the same names. Consequently any experienced C programmer will be able to translate the above Python to C. I chose Python because it expresses the same algorithm much more concisely.
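
A rough C rendering of the Python above might look like this. It assumes POSIX gmtime_r(), the function name is made up, and it inherits the same caveat: the answer can be off around the hours when the local UTC offset changes.

```c
#define _POSIX_C_SOURCE 200809L   /* for gmtime_r() */
#include <time.h>

/* Hypothetical helper: converts broken-down UTC fields to a Unix timestamp
   using only mktime() and gmtime_r(). */
time_t mktime_utc(const struct tm *utc_fields)
{
    struct tm tmp = *utc_fields;
    tmp.tm_isdst = 0;
    time_t local = mktime(&tmp);        /* fields misread as local time */

    struct tm back;
    if (gmtime_r(&local, &back) == NULL)
        return (time_t)-1;
    back.tm_isdst = 0;
    time_t shifted = mktime(&back);     /* misread a second time: local minus the offset */

    /* local - shifted is the offset mktime() applied, so adding it back
       undoes the misreading. Same as result * 2 - mktime(gmtime(result)). */
    return local + (local - shifted);
}
```

Passing the fields for 2025-01-20 06:07:07 should give 1737353227 whatever TZ is set to, since that date is far from any offset change.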

d0mine

It is easier in Python:

    >>> from email.utils import parsedate_tz, mktime_tz
    >>> mktime_tz(parsedate_tz("Fri, 17 Jan 2025 06:07:07"))
    1737094027
It converts rfc 2822 time into POSIX timestamp ([mean solar] seconds since epoch--elapsed SI seconds not counting leap seconds).

TZubiri

Fun fact, http 1 used to pass expirations and dates in string format.

[Missing scene]

" We are releasing Http1.1 specifications whereby expirations are passed as seconds to expire instead of dates as strings."

richrichie

> give us some truly excellent code that we really don’t deserve

Why such self flagellation?