
A love letter to the CSV format (2024)


74 comments

·September 10, 2025

joz1-k

Except that the comma was a poor choice of separator, CSV is just plain text that can be trivially parsed from any language or platform. That's its biggest value. There is essentially no format, library, or platform lock-in. JSON comes close to this level of openness and ease, but YAML is already too complicated as a file format.

thw_9a83c

The notion of a "platform" caught my attention. Funny story: About five years ago, I got a little nostalgic and wanted to retrieve some data from my Atari XL computer (8-bit) from my preteen years. Back then, I created primitive software that showed a map of my village with notable places, like my friends' homes. I was able to transform all the BASIC files (stored on cassette tape) from the Atari computer to my PC using a special "SIO2PC" cable, but the mapping data were still locked in the BASIC files. So I had the idea to create a simple BASIC program that would run in an Atari 8-bit PC emulator, linearize all the coordinates and metadata, and export them as a CSV file. The funny thing is that the 8-bit Atari didn't even use ASCII, but an unusual ATASCII encoding. But it's the same for letters, numbers, and even for the comma. Finally, I got the data and wrote a little Python script to create an SVG map. So yes, CSV forever! :)

humanfromearth9

And the best thing about CSV is that it is a text file with a standardized, well known, universally shared encoding, so you don't have to guess it when opening a CSV file. Exactly in the same way as any other text file. The next best thing with CSV is that separators are also standardized and never positional, you never have to guess.

whizzter

Almost missed the sarcasm :)

jstanley

JSON has the major annoyance that grep doesn't work well on it. You need tooling to work with JSON.

re

As soon as you encounter any CSVs where field values may contain double quotes, commas, or newlines, you need tooling to work with CSV as well.

(TSV FTW)
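To be fair, the tooling for CSV's quoting rules already ships with most languages. A minimal sketch using Python's stdlib csv module, with a field containing a comma, an escaped quote, and an embedded newline (the sample data is made up):

```python
import csv
import io

# A field containing a comma, a doubled quote, and a newline -- all legal
# in quoted CSV, but fatal to naive split(",") parsing or line-based grep.
raw = 'name,notes\n"Smith, Jane","said ""hi""\nthen left"\n'

rows = list(csv.reader(io.StringIO(raw)))
print(rows)
# -> [['name', 'notes'], ['Smith, Jane', 'said "hi"\nthen left']]
```

The embedded newline is exactly the case where grep stops working on CSV too.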

theknarf

grep is a tool. jq is a good tool for json.

kergonath

grep is POSIX and you can count on it being installed pretty much anywhere. That’s not the case for jq.

john_the_writer

100%. XML also worked here too.

YAML is a pain because it has ever so slightly different versions that sometimes don't play nice.

CSV or TSV files are almost always portable.

hiAndrewQuinn

I like CSV because its simplicity and ubiquity make it an easy Schelling point in the wide world of corporate communication. Even very non-technical people can, with some effort, figure out how to save a CSV from Excel, and figure out how to open a CSV with Notepad if absolutely necessary.

On the technical side libraries like pandas have undergone extreme selection pressure to be able to read in Excel's weird CSV choices without breaking. At that point we have the luxury of writing them out as "proper" CSV, or as a SQLite database, or as whatever else we care about. It's just a reasonable crossing-over point.
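That crossing-over point can be sketched with nothing but the standard library: read the CSV, then land it in SQLite. The table name and sample data here are hypothetical:

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for a cleaned-up Excel export.
raw = "name,amount\nalice,10\nbob,20\n"

rows = list(csv.reader(io.StringIO(raw)))
header, data = rows[0], rows[1:]

# Cross over from CSV to a SQLite database in a few lines.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE sales ({', '.join(header)})")
conn.executemany("INSERT INTO sales VALUES (?, ?)", data)
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # -> 2
```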

heresie-dabord

CSV is a flexible solution that is as simple as possible. The next step is JSONL.

https://jsonlines.org/

https://medium.com/@ManueleCaddeo/understanding-jsonl-bc8922...
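The appeal of JSONL is that it keeps CSV's line-per-record streamability while gaining JSON's types and quoting. A small round-trip sketch (the records are made up):

```python
import io
import json

records = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]

# Write: one JSON object per line -- no enclosing array, so the file
# can be appended to and streamed line by line, just like CSV.
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")

# Read: each line parses independently of the rest of the file.
back = [json.loads(line) for line in buf.getvalue().splitlines()]
print(back == records)  # -> True
```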

roland35

To the people saying that "your boss can open it" is a benefit of CSV: well, I have a funny story!

Back in the early 2000s I designed and built a custom data collector for an Air Force project. It saved data at 100 Hz to an SD card. The project manager loved it! He could pop the SD card out or use the handy USB mass storage mode to grab the CSV files.

The only problem... Why did the data cut off after about 10 minutes?? I couldn't see the actual data collected since it was secret, but I had no issue on my end, assuming there was space on the card and battery life was good.

Turns out, I learned he was using excel 2003 to open the csv file. There is a 65,536 row limit (does that number look familiar?). That took a while to figure out!!

IanCal

Love it.

The first data release I did, Excel couldn't open the CSV file because it started with a capital I (the first column was ID). Excel looks at this file with a comma in the header and text and the .csv extension and says

I KNOW WHAT THIS IS

THIS IS A SYLK FILE

BECAUSE IT STARTS WITH "I"

NO OTHER POSSIBLE FILE COULD START WITH THE LETTER "I"

then reads some more and says

THIS SYLK FILE LOOKS WRONG

IT MUST BE BROKEN

ERROR

https://en.wikipedia.org/wiki/Symbolic_Link_(SYLK)

Dilettante_

"Plz fix. No look! Just fix!!"[1] must be one of the circles of programmer hell.

[1]https://i.kym-cdn.com/entries/icons/facebook/000/027/691/tum...

untrimmed

This is a great defense, but I feel like it misses the single biggest reason CSV will never die: your boss can open it. We can talk about streaming and Parquet all day, but if the marketing team can't double-click the file, it's useless.

imtringued

With what software? LibreOffice? Excel doesn't support opening CSV files with a double click. It lets you import CSV files into a spreadsheet, but that requires reading unreasonably complicated instructions.

ertgbnm

On Windows, CSVs automatically open in Excel through the file explorer. Almost all normal businesses use Windows, so the OP's claim is pretty reasonable.

tommica

Depends on the country/locale - I just generate them with semicolons to enable easy opening

efitz

Excel absolutely can open csv files with a double click if you associate the file type extension.

boshomi

You should never blindly trust Excel when using CSV files. Try this csv file:

    COL1,COL2,COL3 
    5,"+A2&C1","+A2*8&B1"
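Excel treats cells starting with =, +, - or @ as formulas, which is how the snippet above smuggles them in. One common mitigation when exporting (a sketch, not a complete defense; real exporters must also exempt genuine negative numbers) is to prefix such cells with a single quote:

```python
import csv
import io

DANGEROUS = ("=", "+", "-", "@")

def defuse(cell: str) -> str:
    """Prefix cells Excel would evaluate as formulas with a single quote."""
    return "'" + cell if cell.startswith(DANGEROUS) else cell

rows = [["COL1", "COL2", "COL3"], ["5", "+A2&C1", "+A2*8&B1"]]

out = io.StringIO()
writer = csv.writer(out)
for row in rows:
    writer.writerow(defuse(c) for c in row)
print(out.getvalue())
```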

delta_p_delta_x

> Excel doesn't support opening CSV files with a double click

Yes, it does. When Excel is installed, it installs a file type association for CSV and Explorer sets Excel as the default handler.

kelvinjps10

Those programs support opening csv with double click

jowea

How is that not opening?

imtringued

You are creating a new spreadsheet that you can save as an xlsx. What you are looking at is not the CSV file itself.

john_the_writer

What are you talking about? Excel opens csv with zero issue. In windows, and mac. Mac you right click and "open with". Or you open excel, and click file/open and find the csv. I do the first one a dozen times a day.

1wd

Only if the Windows regional settings list separator happens to be a comma, which is not the case in most of Europe (even in regions that use the decimal point), so only CSV files with SEP=, as the first line work reliably with Excel.

guzik

I am glad that we decided to pick CSV as our default format for health data (even for heavy stuff like raw ECG). Yeah, files were bigger, but clients loved that they could just download them, open in Excel, make a quick chart. Meanwhile other software was insisting on EDF (lighter, sure) but not everything could handle it.

efitz

I don’t think I ever heard anyone say “csv is dead”.

Smart people (who have been burned one too many times) put quotes around fields in CSV if they aren't 100% positive the field will be comma-free, and escape quotes in such fields.

mcdonje

>Excel hates CSV. It clearly means CSV must be doing something right.

Use tabs as a delimiter and Excel interops with the format as if natively.

tacker2000

The problem is that nobody in the real world uses tabs.

Everyone uses , or ; as delimiters and then either . or , for decimals, depending on the source.

It shouldn't be so hard to auto-detect these different formats, but somehow, in 2025, Excel still cannot do it.
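Auto-detection is indeed tractable in practice; Python's stdlib csv.Sniffer guesses the delimiter from a sample of the file. A sketch (restricting the candidate delimiters makes the guess more reliable; this is no claim it handles every locale quirk):

```python
import csv
import io

sample = "a;b;c\n1;2;3\n4;5;6\n"

# Sniffer infers a dialect from the sample's character frequencies.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
print(dialect.delimiter)  # -> ';'

rows = list(csv.reader(io.StringIO(sample), dialect))
print(rows[0])  # -> ['a', 'b', 'c']
```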

sfn42

You don't need to auto-detect the format. The delimiter can be declared at the top of the file as for example sep=;

yrro

But now that's not CSV. It's CSV with some kind of ad-hoc header...

pragmatic

Pipe enters the chat.

For whatever reason, pipe seems to be pretty common in healthcare data.

sevensor

I was writing a program a little while ago that put data on the clipboard for you to paste into Excel. I tried all manner of delimiters before I figured out that Excel really loves HTML tables. If you wrap everything in <tr> and <td>, and put it up on the clipboard, it pastes flawlessly.
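That trick can be sketched in a few lines; the clipboard plumbing itself is platform-specific and omitted here, but the HTML fragment is the part Excel cares about:

```python
import html

def rows_to_html_table(rows):
    """Render rows as the <tr>/<td> fragment that Excel pastes cleanly."""
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return "<table>" + body + "</table>"

print(rows_to_html_table([["a", "b"], [1, "x & y"]]))
# -> <table><tr><td>a</td><td>b</td></tr><tr><td>1</td><td>x &amp; y</td></tr></table>
```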

gentooflux

Use tabs as a delimiter and it's not CSV anymore, that's TSV.

mcdonje

They're essentially the same format. Same with PSV. They're all DSVs.

Most arguments for or against one apply to all.

https://en.m.wikipedia.org/wiki/Delimiter-separated_values

roelschroeven

It still can't properly deal with CSVs that use different decimal separators than the UI setting in Excel / Windows. It's still too stupid to understand that UI localization and interoperation should never be mixed.

femto

CSV is good for debugging C/C++ real-time signal processing data paths.

Add cout or printf lines, which on each iteration print out relevant intermediate values separated by commas, with the first cell being a constant tag. Provided you don't overdo it, the software will typically still run in real-time. Pipe stdout to a file.

After the fact, you can then use grep to filter tags to select which intermediate results you want to analyse. This filtered data can be loaded into a spreadsheet, or read into a higher level script for analysis/debugging/plotting/... In this way you can reproducibly visualise internal operation over a long period of time and see infrequent or subtle deviations from expected behaviour.
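The post-processing side of this workflow can be sketched in a few lines; here in Python, with made-up tag names and values standing in for the printf output:

```python
import csv
import io

# Hypothetical captured stdout mixing two tags, one trace line per iteration.
log = "filt,0,0.10\nosc,0,1.00\nfilt,1,0.12\nosc,1,0.98\n"

# Equivalent of `grep '^filt,'`: keep one tag's rows, then parse them as CSV.
wanted = [line for line in log.splitlines() if line.startswith("filt,")]
samples = [(int(i), float(v))
           for _, i, v in csv.reader(io.StringIO("\n".join(wanted)))]
print(samples)  # -> [(0, 0.1), (1, 0.12)]
```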

matt_daemon

Agree this is the main use for it

lan321

I hate parsing CSV. There are so many different implementations that it's a constant game of cat and mouse. Literally any symbol can be the separator; then the ordering starts getting changed; then, since you have to guess what's where, you go by type, but strings, for example, are sometimes in quotation marks and sometimes not; then you get some decimal split with a comma when the values are also separated with commas, so you have to track what's a field separator and what's a decimal comma. Then you get a line with only 2 elements when you expect 7 and have no clue what to do, because there's no documentation for the output and hence for what that line means.

If the CSV is not written by me, it's always been an exercise in making things as difficult as possible. It might be a tad smaller as a format, but I find the parsing so painful you need a really good reason to use it.

Edit: Oh yeah, and some have a header, others don't. And CSV always seems to come from some machine where the techs can come over to do an update and just reorder everything, because fuck your parsing. Then either you get lucky and the parser dies, or, since you don't really have much info, the types just align and you start saving garbage data to your database, until a domain expert notices something isn't quite right and you have to find the last time someone touched the machines and roll back/reparse everything.

jcattle

If you don't care that much about the accuracy of your data (like only caring about a few decimal places in your floats), you don't generate huge amounts of data, and you don't need to work with it across different tools and pass it back and forth, then yes, CSV CAN be nice.

I wouldn't write it a love letter, though. There's a reason that Parquet exists.

christophilus

CSV is just a string serialization, so you can represent floats with any accuracy you choose. It’s streamable and compressible, so large files are fine, though maybe not “huge” depending on how you define “huge”. It works fine passing back and forth between various tools, so…

Without more specifics, I disagree with your take.
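The streamable-and-compressible claim is easy to demonstrate with the stdlib; this sketch writes and reads 100,000 rows through gzip without ever materialising the whole file:

```python
import csv
import gzip
import io

# Write 100,000 rows through gzip, row by row.
buf = io.BytesIO()
with gzip.open(buf, "wt", newline="") as f:
    writer = csv.writer(f)
    for i in range(100_000):
        writer.writerow([i, i * 0.5])

# Read back row by row -- still streaming.
buf.seek(0)
with gzip.open(buf, "rt", newline="") as f:
    total = sum(1 for _ in csv.reader(f))
print(total)  # -> 100000
```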

jcattle

It's only fine passing between various tools if you tell each tool exactly how it should serialize your values. Each tool will interpret column values in some way, and if you want to work with those values in any meaningful way it will convert them to its own representation of the data type that is (likely) present in a column.

Going from tool to tool will leave you with widely different representations of the original data you put in. Because, as you said yourself, this data does not have any meaning; it's just strings. The CSV and the tools do not care if one column was ms-epoch and another was floating point in mathematical notation. It'll all just go through that specific tool's deserialization/serialization mangle, and you'll have completely different data on the other end.

bsghirt

How would you deserialise the entity "0.4288"?

vim-guru

Excel hates CSV

It clearly means CSV must be doing something right.

wkat4242

Especially in Europe because we use the comma as a decimal point. So every csv file opened in Excel is screwed up.

ayhanfuat

Previously: A love letter to the CSV format (https://github.com/medialab/xan/blob/master/docs/LOVE_LETTER...)

708 points | 5 months ago | 698 comments (https://news.ycombinator.com/item?id=43484382)