Formatting code should be unnecessary

289 comments

September 7, 2025

jillesvangurp

There was a movement towards working with syntax trees directly and treating source code as a generated serialization of those syntax trees about 20-25 years ago. This probably started with refactoring as it was pioneered in the nineties. Things like Visual Age actually stored code in a database instead of on the file system. Later intentional programming (Charles Simonyi was pushing that) tried to also do things with this. And of course model driven development was a thing around the same time.

Refactorings (when done right) are syntax tree transformations that preserve things like referential integrity, etc. that ensure code does the same thing before and after applying a refactoring.

A rename becomes trivial if you are simply working on the symbol directly. For that to work with file-based source trees, you need to parse the whole thing, keep track of where symbols are referred to in files, rename the symbol and then update all the places in the source tree. That stuff becomes a lot easier when the code representation isn't a bunch of files but the syntax tree. The symbol just gets a different name. Anything that uses the symbol will still use the same symbol.

People like editing files of course and that has resulted in a lot of friction developing richer tools that don't store text but something that preserves more structure. The fact that we're still going on about formatting issues a quarter century later maybe shows that this is something to revisit. For many languages and editors, robust symbol renames are still somewhat science fiction. And that's just the most basic refactoring.
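A minimal sketch of the rename described above, assuming Python 3.9+ and its standard `ast` module. The `RenameSymbol` class and the `rename` helper are hypothetical names made up for illustration. The sketch is not scope-aware, and `ast` drops comments, so it only shows why a rename on the tree cannot accidentally touch a string literal the way a textual search-and-replace can.

```python
import ast


class RenameSymbol(ast.NodeTransformer):
    """Rename every use of one identifier, leaving string literals untouched."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Name nodes cover both reads and assignments of plain variables.
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        # Function parameters are stored as arg nodes, not Name nodes.
        self.generic_visit(node)
        if node.arg == self.old:
            node.arg = self.new
        return node


def rename(source: str, old: str, new: str) -> str:
    """Parse, rename on the tree, and serialize back to text."""
    tree = RenameSymbol(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


source = '''
def area(w, h):
    label = "w times h"
    return w * h
'''
renamed = rename(source, "w", "width")
print(renamed)
```

The parameter and its use become `width`, while the `w` inside the string literal stays as it was. A real refactoring tool would also resolve scopes so that unrelated symbols with the same name are left alone.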

automatoney

I've never understood why people care so much about the linter settings. It's so obviously bikeshedding, just make a choice, run the linter automatically and be done with it. I'm too busy doing actual software engineering to care about where exactly everything goes - I promise after a week you'll just get used to whatever format your team lands on.

AdieuToLogic

> I've never understood why people care so much about the linter settings.

Source code formatting programs are not the same as lint[0] programs. The former rewrites source code files such that the output is conformant with a set of layout rules without altering existing logic. The latter is a category of idempotent source code analysis programs typically used to identify potential implementation errors within otherwise valid constructs.

Some language tools support both formatting and source code analysis, but this is an implementation detail.

0 - https://en.wikipedia.org/wiki/Lint_(software)
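One way to check the "without altering existing logic" property of a formatter is to compare syntax trees before and after. This is only a sketch using Python's standard `ast` module; the `same_logic` helper and the sample strings are made up for illustration, and no particular formatter is assumed.

```python
import ast


def same_logic(before: str, after: str) -> bool:
    """True when both sources parse to the same syntax tree."""
    # ast.dump omits line and column numbers by default, so layout is ignored.
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


original = "total = price*qty   +   tax\n"
reformatted = "total = price * qty + tax\n"
changed = "total = price * (qty + tax)\n"

print(same_logic(original, reformatted))  # layout-only change
print(same_logic(original, changed))      # the parentheses change the tree
```

The first comparison holds and the second does not, which is the distinction drawn above: a formatter may only produce the first kind of change.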

stavros

Right, but it's obvious they meant "formatter".

kristopolous

Formatters, if you want to be specific, are even worse.

They slyly add git noise and pollute your audit trails by just going through and moving shit around whenever you save a file.

And sometimes, they actually insert bugs - string formatting errors are my favorite example.

It's for people who think good code is about adhering to aesthetic ideologies instead of making things documented and accountable.

This is most noticeable in open source contributions. Sometimes I'll get a pull request with like 2 lines of change and 120 lines of some reformatting tool.

You think I accept that?

It's not a good idea

rcxdude

Running random formatters on random subsets of your code is not a good idea. If you want code in a repo to be formatted a certain way, you need to have one set of settings and enforce it, and yeah, reject anything that just has spurious formatting changes that someone else has run.

misiek08

I’ve seen multiple repos with pre-hook and just CI running formatter on _modified_ code only. Those repos were the cleanest to date.

socalgal2

some settings have advantages. For example, trailing commas on tables

    [
      'apple',
      'banana',
      'orange',
    ]

has an advantage over

    [
      'apple',
      'banana',
      'orange'
    ]

Because adding a new line at the end of the table (1) requires editing 1 line, instead of 2 (2) makes the diffs in code review smaller and easier to read and review. So a bad choice makes my life harder. The same applies to local variable declarations.
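The diff-size claim can be checked with Python's standard `difflib`. This is a rough sketch: the `changed_lines` helper is a made-up name, and it simply counts `+` and `-` lines in a unified diff, which is an assumption about how a reviewer would see the change.

```python
import difflib


def changed_lines(before: list[str], after: list[str]) -> int:
    """Count added and removed lines in a unified diff, skipping the file headers."""
    diff = difflib.unified_diff(before, after, lineterm="")
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


with_comma = ["[", "  'apple',", "  'banana',", "  'orange',", "]"]
without_comma = ["[", "  'apple',", "  'banana',", "  'orange'", "]"]

# Append 'peach' to the end of each variant.
with_comma_after = with_comma[:4] + ["  'peach',"] + with_comma[4:]
without_comma_after = without_comma[:3] + ["  'orange',", "  'peach'"] + without_comma[4:]

print(changed_lines(with_comma, with_comma_after))
print(changed_lines(without_comma, without_comma_after))
```

With the trailing comma the diff contains one added line. Without it, the diff contains one removed line and two added lines, because the old last element has to be rewritten to gain its comma.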

Sorted lists (or sorted includes) is also something that makes my life easier. If they're not sorted then everyone adds their new things to the end, which means there are many times more merge conflicts. sorted doesn't mean there are zero but does mean there are less than "append to the end". So, just like an auto-formatter is there to save time, don't waste my time by not sorting where possible.

Also, my OCD hates inconsistency. So

    [1, 2, 3]
    {a, b, c}

Is ok and

    [ 1, 2, 3 ]
    { a, b, c }

Is ok but

    [1, 2, 3]
    { a, b, c }

Is not. I don't care which but pick ONE style, not two styles!

aleph_minus_one

> Because adding a new line at the end of the table (1) requires editing 1 line, instead of 2 (2) makes the diffs in code review smaller and easier to read and review.

This judgement is rather based on a strong personal opinion (which I don't claim to be wrong, but also not as god-given) on what is one, and what are two changes in the code:

- If you consider adding an additional item to the end of the list to be one code change, I agree that a trailing comma makes sense

- On the other hand, it is also a sensible judgment to consider this to be a code change of two lines:

1. an item (say 'peach') is added to the end of the list

2. 'orange' has been turned from the last element of the list to a non-last element of the list

If you are a proponent of the second interpretation, the version that you consider to be non-advantageous is the one that does make sense.

austin-cheney

Yes, everyone has personal opinions about code vanity. When this becomes a holy war I really start to question the maturity of people on the project. I find that people worry about trivial nonsense to mask their inability to address more valid concerns.

All that really matters is consistency. Let a team make some decisions and then just move forward.

yes_man

The problem is when 2 people with the same level of enthusiasm for linter rules but opposing views collide. If there’s nothing more impactful you could be solving and spending energy and time on than arguing those linter rules, then it’s time to question where the project is at and where it is going.

And if there is something more important, then instead of micro-optimizing the rules when there is strong disagreement it’s probably best if one of the parties takes the high road and lives with it so you can all focus on what matters.

vbezhenar

I guess that's one reason why opinionated tools like prettier or gofmt are popular. They made all the choices for you, they don't have configurable knobs, so you just learn to live with it.

rapind

Let this sink in though:

    [ 'apple'
    , 'banana'
    , 'orange'
    ]

JBiserkov

In Clojure, commas are treated as whitespace and are thus completely optional.

Nevermark

This is so clearly superior. Delimiters are prefixes.

But the scale of technical debt this insight has revealed is depressing.

maest

That makes prepending an element a special case.

muzani

I agree with you on all these points. If you were to argue the opposite point, I'd agree as well.

hananova

Meanwhile, I know and understand the reasons for trailing commas, but I find them incredibly ugly so I always strip them out.

sarchertech

Can’t strip them out if the compiler requires them.

huflungdung

That isn’t ocd.

sotix

A strong reason I enjoy Rust for collaboration is that it's so opinionated, it forces people to focus on solving real problems. I agree that bikeshedding over ESLint and Prettier configs is not a strong use of time.

psychoslave

I don't care that much about the specific retained options (though my own preferences of the day are obviously the best taste ever in the whole existence of the universe) but having a common linter setting to prevent the noise in every damn PR is a must have.

Yes both git and all these PL are actually damn stupid to take lines at face value instead of something more elegant like Ada does. In my 20+ year career I've only once been offered a project that involved Ada.

It's hard to come up with something elegant and efficient. It's even harder to make it reach top-tier global presence, all the more when the ecological niche is already filled with good enough stuff.

rs186

That is true if a set of good linting rules is set up, those that help discover errors or other code smells which are valid issues in 99% of cases, or pure formatting rules when there is no "correct" thing to do. Linting becomes a problem when it is opinionated and has questionable rationale to begin with, and stands in your way instead of helping you catch issues. Nobody should be fighting linting rules, but sadly that's what often happens.

See my other comment: https://news.ycombinator.com/item?id=45166670

worldsayshi

I agree. Linters are one of the more frustrating aspects of modern dev. It's of such little relevance and yet it takes up a sizeable portion of my time when I'm going for a merge. Many editor/language combinations don't give automatic linting out of the box and when they do I can bet that the rules they infer are different from what the CI pipeline infers.

jupp0r

I generally agree, but max line length being so high you have to horizontally scroll while reading code is very detrimental to productivity.

elevation

Formatters eliminating long lines is a pet peeve of mine.

About once every other project, some portion of the source benefits from source code being arranged in a tabular format. Long lines which are juxtaposed help make dissimilar values stand out. The following table is not unlike code I have written:

  setup_spi(&adc,    mode=SPI_01, rate=15, cs_control=CS_MUXED,  cs=0x01);
  setup_spi(&eeprom, mode=SPI_10, rate=13, cs_control=CS_MUXED,  cs=0x02);
  setup_spi(&mram,   mode=SPI_10, rate=50, cs_control=CS_DIRECT, cs=0x08);

Even if we add 4-5 more operational parameters, I find this arrangement much more readable than the short-line equivalent:

  setup_spi(&adc,
      mode=SPI_01,
      rate=15,
      cs_control=CS_MUXED,
      cs=0x01);
  setup_spi(&eeprom,
      mode=SPI_10,
      rate=13,
      cs_control=CS_MUXED,
      cs=0x02);
  setup_spi(&mram,
      mode=SPI_10,
      rate=50,
      cs_control=CS_DIRECT,
      cs=0x08);

Or worse, the formatter may keep the long lines but normalize the spaces, ruining the tabular alignment:

  setup_spi(&adc, mode=SPI_01, rate=15, cs_control=CS_MUXED, cs=0x01);
  setup_spi(&eeprom, mode=SPI_10, rate=13, cs_control=CS_MUXED, cs=0x02);
  setup_spi(&mram, mode=SPI_10, rate=50, cs_control=CS_DIRECT, cs=0x08);

Sometimes a neat, human-maintained block of 200 character lines brings order to chaos, even if you have to scroll a little.

sn0wleppard

The worst is when you have lines in a similar pattern across your formatter's line length boundary and you end up with

  setup_spi(&adc, mode=SPI_01, rate=15, cs_control=CS_MUXED, cs=0x01);
  setup_spi(&eeprom,
      mode=SPI_10,
      rate=13,
      cs_control=CS_MUXED,
      cs=0x02);
  setup_spi(&mram, mode=SPI_10, rate=50, cs_control=CS_DIRECT, cs=0x08);

a_e_k

Yes, so much this!

I've often wished that formatters had some threshold for similarity between adjacent lines. If some X% of the characters on the line match the character right above, then it might be tabular and it could do something to maintain the tabular layout.

Bonus points if it's able to do something like diff the adjacent lines to detect table-like layouts and figure out if something nudged a field or two out of alignment and then insert spaces to fix the table layout.
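The similarity threshold could look something like the sketch below. It assumes Python 3.9+, and the helper names and the 70% threshold are made up for illustration. It only detects a likely table by comparing each character with the one directly above it; it does not attempt the realignment step.

```python
def column_match_ratio(upper: str, lower: str) -> float:
    """Fraction of columns where both lines hold the same character."""
    width = max(len(upper), len(lower))
    if width == 0:
        return 0.0
    same = sum(1 for a, b in zip(upper, lower) if a == b)
    return same / width


def looks_tabular(lines: list[str], threshold: float = 0.7) -> bool:
    """Guess whether every adjacent pair of lines is similar enough to be a table."""
    if len(lines) < 2:
        return False
    return all(
        column_match_ratio(a, b) >= threshold
        for a, b in zip(lines, lines[1:])
    )


aligned = [
    "setup_spi(&adc,    mode=SPI_01, rate=15, cs_control=CS_MUXED,  cs=0x01);",
    "setup_spi(&eeprom, mode=SPI_10, rate=13, cs_control=CS_MUXED,  cs=0x02);",
    "setup_spi(&mram,   mode=SPI_10, rate=50, cs_control=CS_DIRECT, cs=0x08);",
]
normalized = [
    "setup_spi(&adc, mode=SPI_01, rate=15, cs_control=CS_MUXED, cs=0x01);",
    "setup_spi(&eeprom, mode=SPI_10, rate=13, cs_control=CS_MUXED, cs=0x02);",
    "setup_spi(&mram, mode=SPI_10, rate=50, cs_control=CS_DIRECT, cs=0x08);",
]

print(looks_tabular(aligned))
print(looks_tabular(normalized))
```

The hand-aligned block passes the threshold. The space-normalized block does not, because everything after the first argument is shifted by a few columns, which is why a formatter would need the diff-based approach mentioned above to recover a table that has already drifted.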

bryanrasmussen

People have already outlined all the reasons why the long line might be less than optimal, but I will note that really you are using formatting to do styling.

In a post-modern editor (by which I mean any modern editor that takes this kind of thing into consideration which I don't think any do yet) it should be possible for the editor to determine similarity between lines and achieve a tabular layout, perhaps also with styling for dissimilar values in cases where the table has a higher degree of similarity than the one above. Perhaps also with collapsing of tables with some indicator that what is collapsed is not just a sub-tree but a table.

jaimebuelta

The first line should be readable enough, but in case it's longer than that, I way prefer the style of

  setup_spi(&adc, mode=SPI_01, rate=15, cs_control=CS_MUXED,  
            cs=0x01);
  setup_spi(&eeprom, mode=SPI_10, rate=13, cs_control=CS_MUXED,  
            cs=0x02);
  setup_spi(&mram, mode=SPI_10, rate=50, cs_control=CS_DIRECT, 
            cs=0x08);

over the short-line alternative presented.

I like short lines in general, as having a bunch of short lines (which tend to be the norm in code) and suddenly a very long line is terrible for readability. But everything has exceptions. It's also very dependent on the programming language.

lambdaba

I agree, I'm very much against any line length constraint, it's arbitrary and word wrapping exists.

VBprogrammer

Those kinds of tables improve readability right until someone hits a length constraint and has to either touch every line in order to fix the alignment, causing weird conflicts in VCS, or ignore the alignment, and its slow decay into a mess begins.

vbezhenar

It is an obvious example where automatic formatter fails.

But are there more examples? Maybe it's not a high price to pay. I'm using either the second or third approach for my code and I've never had many issues. Yes, the first example is pretty, but it's not a huge deal for me.

account42

Another issue with fixed line lengths is that it requires tab stops to have a defined width instead of everyone being able to choose their desired indentation level in their editor config.

jghn

I’d agree with you except for the trend over the last 10 years or so to set limits back to the Stone Age. For a while there we seemed to be settling on somewhere around 150 characters and yet these days we’re back to the 80-100 range.

tsimionescu

I am at the opposite end. Having any line length constraints whatsoever seems like a massive waste of time every time I've seen it. Let the lines be as long as I need them, and accept that your colleagues will not be idiots. A guideline for newer colleagues is great, but auto-formatters messing with line lengths is a source of significant annoyance.

Cthulhu_

> auto-formatters messing with line lengths is a source of significant annoyance.

Unless they have been a thing since the start of a project; existing code should never be affected by formatters, that's unnecessary churn. If a formatter is introduced later on in a project (or a formatting rule changed), it should be applied to all code in one go and no new code accepted if it hasn't passed through the formatter.

I think nobody should have to think about code formatting, and no diff should contain "just" formatting changes unless there's also an updated formatting rule in there. But also, you should be able to escape the automatic formatting if there is a specific use case for it, like the data table mentioned earlier.
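As one concrete example of such an escape hatch, the Black formatter for Python leaves code between `# fmt: off` and `# fmt: on` comments untouched. The sketch below assumes Black is the formatter in use; the `SPI_DEVICES` table is made up and simply mirrors the SPI example from earlier in the thread.

```python
# Assumes the Black formatter, which skips everything between these two markers.
# fmt: off
SPI_DEVICES = [
    # name,    mode,     rate, cs_control,  cs
    ("adc",    "SPI_01", 15,   "CS_MUXED",  0x01),
    ("eeprom", "SPI_10", 13,   "CS_MUXED",  0x02),
    ("mram",   "SPI_10", 50,   "CS_DIRECT", 0x08),
]
# fmt: on

for name, mode, rate, cs_control, cs in SPI_DEVICES:
    print(f"{name}: mode={mode} rate={rate} cs_control={cs_control} cs={cs:#04x}")
```

Everything outside the marked region is still formatted automatically, so the exception stays local to the table.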

jitl

every editor can wrap text these days. good ones will even indent the wrapped text properly

giveita

That's a slippery slope towards storing semantics and displaying locally preferred syntax ;)

rTX5CMRXIfFG

You still have to minimize the wrapping that happens, because wrapped lines of code tend to be continuous instead of being properly spaced so as to make its parts individually readable.

hulitu

> every editor can wrap text these days.

could. Yesterday notepad (win 10) just plainly refused.

appellations

I forget there are people who don’t configure softwrap in their text editor.

Some languages (java) really need the extra horizontal space if you can afford it and aren’t too hard to read when softwrapped.

forrestthewoods

Define high? I think 120 is pretty reasonable. Maybe even as high as 140.

Log statements however I think have an effectively unbounded length. Nothing I hate more than a stupid linter turning a sprinkling of logs into 7 line monsters. cargo fmt is especially bad about this. It’s so bad.

skinner927

I still prefer 80. I won’t (publicly) scoff at 100 though. IMO 120 is reasonable for HTML and Java, but that’s about it.

Sent from my 49” G9 Ultrawide.

layer8

100 is the sweet spot, IMO.

I like splitting long text as in log statements into appropriate source lines, just like you would a Markdown paragraph. As in:

    logger.info(
        "I like splitting long text as in log statements " +
        "into " + suitableAdjective + " source lines, " +
        "just like you would a Markdown paragraph. " +
        "As in: " + quine);

I agree that many formatters are bad about this, like introducing an indent for all but the first content line, or putting the concatenation operator in the front instead of the back, thereby also causing non-uniform alignment of the text content.

setopt

It’s tricky to find an objective optimum. Personally I’ve been happy with up to 100 chars per line (aim for 80 but some lines are just more readable without wrapping).

But someone will always have to either scroll horizontally or wrap the text. I’m speaking as someone who often views code on my phone, with a ~40 characters wide screen.

In typography, it’s well accepted that an average of ~66 chars per line increases readability of bulk text, with the theory being that short lines require you to mentally «jump» to the beginning of the next line frequently which interrupts flow, but long lines make it harder to mentally keep track of where you are in each line. There is however a difference between newspapers and books, since shorter ~40-char columns allows rapid skimming by moving your eyes down a column instead of zigzagging through the text.

But I don’t think these numbers translate directly to code, which is usually written with most lines indented (on the left) and most lines shorter than the maximum (few statements are so long). Depending on language, I could easily imagine a line length of 100 leading to an average of ~66 chars per line.

smokel

I've never understood why we still look at the plain text representation of code, and not a visualization of the code that makes more sense.

Note that, in my mind, this visualization is not automatically generated, but lovingly created by humans who wish their code to be understood by others. It is not separate from the code, as typical design documentation is, but an integral part of it, stored in metadata. Consider it an extension of variable and function naming.

There is of course "literate programming" [1], but somehow (improvements of) that never took off in larger systems.

[1] https://en.wikipedia.org/wiki/Literate_programming

jraph

> I've never understood why we still look at the plain text representation of code, and not a visualization of the code that makes more sense.

I suppose this is because nobody has been able to create good tooling for it (the visualization itself, the efficient editing, etc). You'll have to deal with the text version of it at some point if not all tools that we rely on get a version for the new visualization.

Another hypothesis is that it might not matter this much that we work with text directly after all.

> Note that, in my mind, this visualization is not automatically generated, but lovingly created by humans who wish their code to be understood by others.

If you allow manual crafting there, I suspect you'll need some sort of linting too.

AdieuToLogic

> I've never understood why we still look at the plain text representation of code, and not a visualization of the code that makes more sense.

My guess is it is the same reason why the most common form of creating source code is typing and not other readily available mechanisms:

  Semantic density

Graphical visualizations are approachable representations and very useful for introductory, infrequent, and/or summary needs. However, they become cumbersome when either a well-defined repetitive workflow is used or usage variations are not known a priori.

An example of both are the emacs and vi editors. The vast majority of supported commands are at most a few keystrokes and any programming language source code can be manipulated by them.

kelseyfrog

The tradeoff here is not being able to use a universal set of tooling to interact with source files. Anything but text makes grep, diff, sed, and version control less effective. You end up locked into specialized tools, formats, or IDE extensions, while the Unix philosophy thrives on composability with plain text.

There's a scissor that cuts through the formatting debate: If initial space width was configurable in their editor of choice, would those who prefer tabs have any other arguments?

gr__or

Text surely is a hill, but I believe it's a local one that we got stuck on due to our short-sighted inability to go into a valley for a few miles until we find the (projectional) mountain.

All of your examples work better for code with structural knowledge:

- grep: symbol search (I use it about 100x as often as a text grep) or https://github.com/ast-grep/ast-grep

- diff: https://semanticdiff.com (and others), i.e.: hide noisy syntax only changes, attempt to capture moved code. I say attempt, because with projectional programming we could have a more expressive notion of code being moved

- sed: https://npmjs.com/package/@codemod/cli

- version control: I'd look towards languages like Unison to see what funky things we could do here, especially for libraries. A general example: no conflicts due to non-semantic changes (re-orderings, irrelevant whitespaces, etc.)

zokier

But as the tools you link demonstrate, having "text" as the on-disk format does not preclude AST based (or even smarter) tools. So there is little benefit in having non-text format. Ultimately it's all just bytes on disk

Tooster

I’d also add:

* [Difftastic](https://difftastic.wilfred.me.uk/): my go-to diff tool for years

* [Nu shell](https://www.nushell.sh/): a promising idea, but still lacking in design/implementation maturity

What I’d really like to see is a *viable projectional editor* and a broader shift from text-centric to data-centric tools.

The issue is that nearly everything we use today (editors, IDEs, coreutils) is built around text, and there’s no agreed-upon data interchange format. There have been attempts (Unison, JetBrains MCP, Nu shell), but none have gained real traction.

Rare “miracles” like the C++ --> Rust migration show paradigm shifts can happen. But a text → projectional transition would be even bigger. For that to succeed, someone influential would need to offer a *clear, opt-in migration path* where:

* some people stick with text-based tools,

* others move to semantic model editing,

* and both can interoperate in the same codebase.

What would be needed:

* Robust, data-native alternatives to [coreutils](https://wiki.archlinux.org/title/Core_utilities) operating directly on structured data (avoid serialize ↔ parse boundaries). Learn from Nushell’s mistakes, and aim for future-compatible, stable, battle-tested tools.

* A more declarative-first mindset.

* Strong theoretical foundations for the new paradigm.

* Seamless conversion between text-based and semantic models.

* New tools that work with mainstream languages (not niche reinventions), and enforce correctness at construction time (no invalid programs).

* Integration of the semantic model with existing version control systems.

* Shared standards for semantic models across languages/tools (something on the scale of MCP or LSP; JetBrains’ are better, but LSP won thanks to Microsoft’s push).

* Dual compatibility in existing editors/IDEs (e.g. VSCode supporting both text files and semantic models).

* Integrate knowledge across many different projects to distill the best way forward -> for example learn from Roslyn's semantic vs syntax model, look into tree sitter, check how difftastic does tree diffing, find tree regex engines, learn from S-expressions and LISP like languages, check unison, adopt helix editor/vim editing model, see how it can be integrated with LSP and MCP etc.

This isn’t something you can brute-force — it needs careful planning and design before implementation. The train started on text rails and won’t stop, so the only way forward is to *build an alternative track* and make switching both gradual and worthwhile. Unfortunately it is pretty impossible to do for an entity without enough influence.

zokier

But almost every editor worth its salt these days has structural editing.

https://docs.helix-editor.com/syntax-aware-motions.html

https://www.masteringemacs.org/article/combobulate-structure...

https://zed.dev/blog/syntax-aware-editing

Etc etc.

jsharpe

Exactly. This idea comes up time and time again, but the cost/benefit just doesn't make sense at all. You're adding an unbelievable amount of complex tooling just to avoid running a simple formatter.

The goal of having every developer viewing the code with their own preferences just isn't that important. On every team I've been on, we just use a standard style guide, enforced by formatter, and while not everyone agrees with every rule, it just doesn't matter. You get used to it.

Arguing and obsessing about code formatting is simply useless bikeshedding.

scubbo

I disagree with almost every choice made by the Go language designers, but `Gofmt's style is no one's favorite, yet gofmt is everyone's favorite` is solid. Pick a not-unreasonable standard, enforce it, and move on to more important things.

spyspy

My only complaint about gofmt is that it’s not even stricter about some things.

rbits

Yeah it would probably be a waste of time. It's a nice idea to dream about though. It would be nice to be able to look at some C# code and not have opening curly brackets on a separate line.

mdaniel

I say this fully cognizant of the thread in which it's posted, but these people are sick

https://astyle.sourceforge.net/astyle.html#_style=whitesmith

And then someone said: oh yeah? Hold my beer https://astyle.sourceforge.net/astyle.html#_style=pico

Buttons840

> Arguing and obsessing about code formatting is simply useless bikeshedding.

Unless it's an accessibility issue, and it is an accessibility issue sometimes.

mmastrac

Maybe if you use 16-wide tabs or a 40 character line length.

raspasov

>> The goal of having every developer viewing the code with their own preferences just isn't that important.

Bah! So, what is more important? Is the average convenience of the herd more important? Average of the convenience, even if there was ever such a thing.

What if you really liked reading books in paper format, but were forced to read them on displays for... reasons?

MyOutfitIsVague

The way I envision this working is with something like git filters. Checking out from version control converts it all into text in your preferred formatting, which you then work with as expected. Staging it converts it into the stored representation. In git, this would be done with smudge and clean filters, like how git LFS works. You'd also have viewers for forges and the like that are built to interpret all the stored representations as needed.

You still work with text, the text just isn't the canonical stored representation. You get diffs to resolve only when structure is changed.

You get most of the same benefit with a pre-commit linter hook, though.
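A rough sketch of what the clean side of such a filter could look like, assuming Python 3.9+ source files. Git's `filter.<name>.clean` and `filter.<name>.smudge` settings and the `filter=` attribute in `.gitattributes` are real; the filter name `canon`, the script name and the `clean` function are made up for illustration. Because `ast.unparse` discards comments and blank lines, this only demonstrates the round trip and is not something to run on a real repository.

```python
"""Hypothetical clean filter: rewrite Python source into one canonical layout.

Assumed wiring (the filter name "canon" and the script path are made up):
    .gitattributes:   *.py filter=canon
    git config filter.canon.clean "python3 canon_filter.py"
"""
import ast
import sys


def clean(source: str) -> str:
    """Return the canonical text that would be stored for a source file."""
    try:
        return ast.unparse(ast.parse(source)) + "\n"
    except SyntaxError:
        # Partial or broken code is stored as-is rather than rejected.
        return source


def main() -> None:
    # Git feeds the working-tree file on stdin and stores whatever is written to stdout.
    sys.stdout.write(clean(sys.stdin.read()))


# A real filter script would call main() here; the lines below only demonstrate clean().
mine = "x=1\ny  =  [1,2,\n 3]\n"
yours = "x = 1\ny = [1, 2, 3]\n"
print(clean(mine) == clean(yours))
```

Two differently formatted working copies produce the same stored text, so no formatting-only diff ever reaches the repository. A matching smudge filter would re-expand the canonical text into each developer's preferred layout on checkout.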

bapak

This is it, unfortunately git is "too dumb" for this. In order to merge code, it would have to understand the AST.

What happens when you stage the line `} else return {`? git doesn't allow staging specific AST nodes. It would also mean that you can't stage partial code (that produces syntax errors)

zokier

Git can use arbitrary merge (and diff) tools. Something like https://mergiraf.org/introduction.html works with git and gets you AST-aware merging. Do not underestimate git's flexibility.

Hendrikto

Smudge and clean filters work on text, git would not need to change at all.

You would still store text, and still check out text, just transformed text. You could still check in anything you want, including partial code, syntax errors, or any other arbitrary text. Diffs would work the same way they do now.

account42

Please no, git trying to automatically "correct" \n vs \r\n line endings is already horrible enough. At least you can turn that off.

accelbred

What if the common intermediate encoding is text, not binary? Then grep/diff/sed all still work.

If we had a formatting tool that operated solely on the AST, checked-in code could be in a canonical form for a given AST. Editors could then parse the AST and display the source with a different formatting of the user's choice, and convert to canonical form when writing the file to disk.

pmontra

All mainstream editors that agree to work on a standard AST for any given language could be nice. I'm not expecting that to happen at any time in future.

About grep and diff working on a textual representation of the AST, it would be like grepping on JavaScript source code when the actual source code is TypeScript or some other more distant language that compiles to JavaScript (does anybody remember CoffeeScript?). We want to see only the source code we typed in.

By the way, add git diff to the list of tools that should work on the AST but show us the real source code.

sublinear

Nobody wants to have to run their own formatter rules in reverse in their head just to know what to grep for. That defeats the point of formatting at all.

pwdisswordfishz

That's why you grep for a syntactic structure, not undifferentiated text.
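A small sketch of searching for a syntactic structure instead of raw text, using Python's standard `ast` module. The `find_calls` helper and the sample source are made up for illustration; dedicated tools such as the ast-grep project linked earlier in the thread do this across many languages.

```python
import ast


def find_calls(source: str, func_name: str) -> list[int]:
    """Return line numbers of calls to func_name, however the call is laid out."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func_name
        ):
            hits.append(node.lineno)
    return sorted(hits)


source = '''
setup_spi(adc, mode="SPI_01")
setup_spi(
    eeprom,
    mode="SPI_10",
)
note = "setup_spi( is mentioned in this string"
'''
print(find_calls(source, "setup_spi"))
```

Both calls are found on lines 2 and 3 regardless of how their arguments are wrapped, and the mention inside the string literal is ignored, which a plain text grep for `setup_spi(` would not manage.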

Avshalom

The entire OS was built around these source files.

the unix philosophy on the other hand only "thrives" if every other tool is designed around (and contains code to parse) "plain text"

lmm

> The entire OS was built around these source files.

And how did that work out for them?

This seems like one of the many cases where unix won out by being a lowest common denominator. Every platform can handle plain text.

account42

Not all platforms come with powerful text handling tools out of the box - or at least they didn't use to until Unix-based systems forced them to catch up.

aleph_minus_one

> This seems like one of the many cases where unix won out by being a lowest common denominator.

The lowest common denominator rather is binary blobs. :-)

rendaw

Grep, diff, sed, and line-based non-semantic merge are all terrible tools for manipulating code... rather than dig ourselves in even further with those, maybe this would be a good reason to come up with something better.

danielheath

If you’re going to store the source in a canonical format and unpack that to suit each developer… why shouldn’t the canonical format just be regular source code?

All the same tools can exist with a text backend, and you get grep/sed support for free too!

psychoslave

That seems like a genius remark actually. If you store the abstract objects and have the mechanism to transform to whatever the desired output form is, it’s almost trivial to expose a version as files and text rendering for tools that are thus oriented, isn’t it?

giveita

My grep may not work on your settings for the same code.

This becomes an issue with say CI where maybe I add a gate to check something with grep. But whose format do I assume? My local (that I used to test it locally) or the canonical (which means I need to switch local format to test it)?

brabel

You really rely on grep on CI? How fragile is that ?! This is a good argument for storing non-text. Grepping code is laughably unreliable. The only way to write things like that reliably is by actually parsing the code and working in its AST. Working in text is like writing code in a completely untyped language. It can be done, but it’s beyond stupid for anything where accuracy matters.

treadmill

You're misunderstanding the idea I think.

You would use the format on disk for the grep. "Your format" only exists displayed in your editor.
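As a rough illustration of that split, here is a sketch that uses Python's `ast.unparse` (Python 3.9+) as a stand-in for "the format on disk". The `canonical` helper and the two sample snippets are hypothetical, and `ast.unparse` drops comments, so a real tool would need a serializer that keeps them:

```python
import ast

def canonical(source: str) -> str:
    """Re-serialize Python source in one fixed layout (comments are lost)."""
    return ast.unparse(ast.parse(source))

# Two developers' preferred layouts of the same statement.
alice = "total = compute( a,b )\n"
bob = "total=compute(a,\n              b)\n"

# Both collapse to the same stored text, so a CI grep has one format to target.
stored = canonical(alice)
print(stored)
print(canonical(bob) == stored)
print("compute(a, b)" in stored)
```

A text search in CI would then run against `stored`, never against what either editor displays.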

eviks

> If initial space width was configurable in their editor of choice, would those who prefer tabs have any other arguments?

Yes, of course, because tab width is *dynamically* flexible, so initial space width isn't enough

pasc1878

Yes, because if you want to deindent with tabs you just delete one character, whilst spaces require you to delete x characters, where x is the number of spaces you indent by.

davetron5000

There’s also a typography element to formatting source code. The notion that all code formatting is mere personal preference isn’t true. Formatting code a certain way can help to communicate meaning and structure. This is lost when the minimal tokens are serialized and re-constituted using an automated tool.

https://naildrivin5.com/blog/2013/05/17/source-code-typograp...

Mikhail_Edoshin

And I'd add that typographers go out of their skin to typeset tables and formulae so that everything is aligned and has proper spacing. For centuries this was done manually because it is important, even though an outsider cannot notice it.

(That said, it must be possible to make a more sophisticated formatter for the source code too.)

IshKebab

Yeah in theory people can do a better job than auto-formatters. In practice they absolutely do not, so that argument is moot.

xpe

> Yeah in theory people can do a better job than auto-formatters. In practice they absolutely do not, so that argument is moot.

Status quo fallacy alert. Arguments are not forever mired in a current state of affairs. People can learn and can build tools to help them do better.

This could change quickly; e.g. if Claude or GitHub or (Your Team) decide to prioritize how source code looks.

frizlab

Yes! I’m always appalled that people cannot see that.

psychoslave

Caring for typography but blindly bending to dubious programming-language convention feels really like putting efforts on the wrong starting point though.

What’s the point of such a heavy obfuscation of the intent, really? Let’s take the first example.

    char *
    strcpy(to, from)
            register char *to;
            register const char *from;
    {
            char *save = to;

            for (; (*to = *from) != 0; ++from, ++to);
            return(save);
    }

If we are fine with the "lengthy" register, why not use character in full word? Or if we want something shorter sign would be actually semantically more on point in general.

What’s with the star to designate a pointer? Why not sign-pointer? Or pin for short if we dare to use a pretty straightforward metaphor, so sign-pin. Ah yes, by the way, using "dot" (.) or "dash, greater than" (->) is such typographical nonsense.

And as a side note *char brings nothing in readability compared to sign-pin-pin. Remember that most people read words or even word sequences as a whole. And let’s compare **char to something like sign-pin-back-5.

What with strcpy? Do we want to play code-obfuscation to look smart being able to decode this pile of letter sequence? What’s wrong with string·copy* or even stringcopy (compare photocopy)? Or even simply copy? If we want to avoid some redundant identifier without relying on overriding through argument types, English is rich in synonyms. For example duplicate, replicate, reproduce.

Various parentheses could just as well be optional to ease code browsing if proper typography is already in place, and English already provides many adverbs/prepositions that could replace or complement them with linguistically more usual counterparts.

Speaking about prepositions, using from and to as identifiers for things which would be far more aptly described with nouns is really such a confusing choice. What’s wrong with origin/source and destination/target? It’s also a bit counterproductive to put the identifier, which is the main point of interest, at the very end of its declaration statement.

Equals for assignment is really just an artifact standing in for a more relevant symbol like ← or ≔, because most keyboard layouts stem from disastrous design. But using a more adequate symbol is really pushing for unnecessarily obscure notation.

Mandatory semicolon to end a statement is obviously also a typographical nonsense.

If a parameter is to be left blank in for, we would obviously be better served with a separate control-flow construction rather than some way to highlight that it’s not filled in for that use.

So packing it all:

     duplicate as function ⟨
          requiring (
               origin as sign-pin-register,
               destination as sign-pin-register
          )
          making {
               save as sign-pin
               save assigned origin
               destination-pin assigned origin-pin until ( zeroized,
                    whilst [
                        origin-increment,
                        destination-increment
                    wrought ]
               done )
               return save
          made }
     built ⟩

Given that in that case the parentheses and commas are purely ornamental, the compiler could just ignore them and would have enough information with something like

     duplicate as function
          requiring
               origin as sign-pin-register
               destination as sign-pin-register
          making
               save as sign-pin
               save assigned origin
               destination-pin assigned origin-pin until zeroized
                    whilst
                        origin-increment
                        destination-increment
                    wrought
               done
               return save
          made
     built

Or even

     duplicate as function requiring origin as sign-pin-register destination as sign-pin-register making save as sign-pin save assigned origin destination-pin assigned origin-pin until zeroized whilst origin-increment destination-increment wrought done return save made built

anticodon

Yes. In Python, black formatter consistently breaks SQLAlchemy queries in an unreadable way (e.g. splitting conditions over multiple lines when it's not really necessary and makes reading harder).

3036e4

For C++ clang-format does things like that all the time as well. Of course it has no idea what semantically belongs together on the same line or not. I wish the C++ world had settled on some other standard linter.

IshKebab

clang-format is probably the worst of the autoformatters. They tried to get fancy with a sort of global optimisation algorithm but in practice it's buggier and uglier than the classic Prettier algorithm which is elegant and generally works very well. It's also way less diff friendly.

I wouldn't draw any conclusions about autoformatters from clang-format.

pwdisswordfishz

> A C argument declaration is made up of modifiers (register, const), a data type (char *), and a name (from).

Now explain a declaration like "char *argv[]"...

> We’ve also re-set the data type such that there is no space between char and * - the data type of both of these variables is “pointer to char”, so it makes more sense to put the space before the argument name, not in the middle the data type’s name (update: it should be pointed out that this only makes sense for a single declaration. A construct like char* a, b will create a pointer to char, a, and a regular char, b).

Ah, yes, the delusional C++ formatting style. At least it's nice that the update provides the explanation why it should be avoided.

yccs27

My $0.02: Don't throw away a perfectly good mental model because of a compiler idiosyncrasy. Just treat it as a special case and use a linter against stuff like char* a, b.
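A very rough sketch of such a lint, written in Python. It is a line-based regex heuristic rather than a C parser, so it will miss multi-line declarations and can misfire on expression statements. The pattern and the function name are illustrative assumptions, not taken from any existing linter:

```python
import re

# Heuristic, not a parser: a line that starts with a type whose '*' is glued
# to the type name and that lists more than one declarator before the ';'.
STAR_ON_TYPE_MULTI = re.compile(r"^\s*(?:\w+\s+)*\w+\*+\s+\w+[^;(]*,[^;(]*;")

def flag_misleading_declarations(c_source: str) -> list[int]:
    """Return 1-based line numbers that look like `char* a, b;`."""
    return [
        number
        for number, line in enumerate(c_source.splitlines(), start=1)
        if STAR_ON_TYPE_MULTI.match(line)
    ]

SAMPLE = """char* a, b;
char *c, *d;
char* e;
const char* f, g;
"""

print(flag_misleading_declarations(SAMPLE))  # flags lines 1 and 4 only
```

Single declarations like `char* e;` pass, which keeps the "pointer to char" reading usable where it is not misleading.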

You also don't think about dollars differently than other units, just because the sign goes before the number.

jauntywundrkind

I'm pretty unconvinced by the examples.

> Some of us even align other parts of our code, such repeated inline comments

> Now, the arguments block forms a table of three columns. The modifiers make up the first column, the data types are aligned in the second column, and the names are in the third column

These feel like pretty trivial routines that can be encompassed by code formatting.

We can contrive more extreme examples, like the for loop, but super custom formatting ("typesetting") like that has always made me feel awkward; it feels like it gives license for people to use all manner of arbitrary formatting. The author has some intent, but when you run into an inconsistent code base with lots of things going on, the variance doesn't feel informative or helpful: it sucks and it's a drain.

What's stored is perhaps more minimal, some kind of reference encoding, maybe prettier-ifies for js. The meat of this article to me is that it shouldn't matter: the IDE should let you view and edit as you like:

> Everyone had their own pretty-printing settings for viewing it however they wanted.

chowells

I have to disagree with the premise. Formatting code is a critical communication channel. Well-formatted code should tell you:

1. The developer has enough experience to understand that formatting matters.

2. The developer has enough discipline to stick with their chosen formatting rules.

3. The developer has the taste necessary to choose good formatting rules.

4. The developer has the judgement necessary to identify when other concerns justify one-off violations of the rules.

These are really important attributes for a developer to have. They affect every aspect of the code, not just formatting. Formatting is just a very quick proxy to measure those by.

Unfortunately, things like autoformatting and linter rules are destroying the signal. Goodhart's law strikes again.

babel_

The blog entry is short and simple, perhaps consider reading it before knee-jerk reacting to the title, and then you might understand why "should" and "unnecessary" are operative in said title.

rho4

Not caring about formatting also signals to me that:

- they have probably never worked on a codebase where files are edited by more than 1 person

- they have never done any significant amount of merging between branches

- they have never maintained a large codebase

- they have never had to refactor a large codebase

- they don't use diff/comparison tools to read the history of their codebase

- they have never written any tooling for their codebase

- they are not good team-players and/or only care about their own stuff

pure-orange

Did you not read the article?

KronisLV

If you are in circumstances where the answers to those questions are a resounding "No" then you should just set up the tooling to format the code on save / commit and perhaps to make the CI complain if anyone skips that and leave it at that.

Furthermore, instead of nitpicking over small details, it can actually be a good idea to just leave everything on default, forgo whatever your individual style might be and stick to what's been deemed to be good enough as the default - so the code will look more familiar to anyone who picks it up (and has used the tools you use for linting and formatting). Yes, formatting is different from linting; though if you set up one, you might as well do the other.

falcor84

The same personality attributes can be assessed even better based on penmanship, so going forward, I'll require all PRs to be submitted in cursive

shit_game

Would you say that someone's code formatting is a shibboleth? How do you feel about formatters and linters in regard to this?

teaearlgraycold

There are times when you really want a specific formatting of the text, like visually turning a list into a table.

rho4

The system should support this, e.g. via // @formatter:off/on tags
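A toy sketch of how a formatter might honor such tags, in Python. The marker strings come from the comment above. The "formatting" rule (collapsing runs of spaces after the indentation) is only a stand-in for a real formatter, and `toy_format` is a hypothetical name:

```python
import re

OFF = "// @formatter:off"
ON = "// @formatter:on"

def toy_format(source: str) -> str:
    """Collapse runs of spaces after the indentation, except inside off/on regions."""
    out = []
    enabled = True
    for line in source.splitlines():
        marker = line.strip()
        if marker == OFF:
            enabled = False
            out.append(line)
        elif marker == ON:
            enabled = True
            out.append(line)
        elif enabled:
            body = line.lstrip(" ")
            indent = line[: len(line) - len(body)]
            out.append(indent + re.sub(r" {2,}", " ", body).rstrip())
        else:
            # Inside a protected region: keep the author's layout byte for byte.
            out.append(line)
    return "\n".join(out)

SAMPLE = """x   =  1;
// @formatter:off
a   = 1;
bcd = 2;
// @formatter:on
y   =  2;"""

print(toy_format(SAMPLE))
```

Only the lines outside the tagged region are rewritten; the hand-aligned table between the tags survives untouched.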

aleph_minus_one

Some (sometimes) desirable source code formatting cannot be deduced from the abstract syntax tree alone:

Consider the following (pseudo-)code example:

  bar.glob = 1;
  bar.plu.a1 = 21;
  bar.plu.coza = fol;

Should this code be formatted this way? Or should it be formatted

  bar.glob     = 1;
  bar.plu.a1   = 21;
  bar.plu.coza = fol;

to emphasize that three assignments are done?

Or should this code be formatted

  bar.glob      = 1;
  bar.plu .a1   = 21;
  bar.plu .coza = fol;

to make the "depth" of the structure variables more tabular so that you can immediately see by the tabular shape which "depth" a member variable has?

We can go even further like

  bar.glob     =   1;
  bar.plu.a1   =  21;
  bar.plu.coza = fol;

which emphasizes that the author considers it to be very important that the reader can easily grasp the magnitudes of the numbers involved (which is why in Excel or LibreOffice Calc, numbers are right-aligned by default). Or combining this with making the depth "tabular":

  bar.glob      =   1;
  bar.plu .a1   =  21;
  bar.plu .coza = fol;

Each of these formattings emphasizes different aspects of the code that the author wants to emphasize. This information cannot be deduced from some abstract syntax tree alone. Rather, this needs additional information by the programmer in which sense the structure behind the code intended by the programmer is to be "interpreted".
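Which of these layouts to use is the author's call, but once a layout has been chosen it can be produced mechanically. A minimal sketch in Python of the second variant (aligned `=` signs), assuming every line contains an assignment; the helper name `align_assignments` is hypothetical:

```python
def align_assignments(lines: list[str]) -> list[str]:
    """Pad the left-hand sides so that every `=` lands in the same column."""
    # Split each line at its first '=' into (left-hand side, right-hand side).
    pairs = [line.split("=", 1) for line in lines]
    width = max(len(left.rstrip()) for left, _ in pairs)
    return [f"{left.rstrip():<{width}} = {right.strip()}" for left, right in pairs]

block = [
    "bar.glob = 1;",
    "bar.plu.a1 = 21;",
    "bar.plu.coza = fol;",
]

for line in align_assignments(block):
    print(line)
```

The missing piece is exactly the one described above: something outside the syntax tree has to record that this block wants the aligned layout rather than one of the others.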

kennywinker

I see what you’re saying, but I also haven’t ever used anything but the first two formats, and my goal was always readability not emphasis.

Storing the AST instead of the text is a lossy encoding, but would we lose something more valuable than what we gain? If your example is the best thing we’d lose, I’d say it’s still a massive net win.

And there are ways to emphasize different parts that would survive the round trip to AST. E.g. one way to emphasize depth:

    setValue([bar, glob], 1)

    setValue([bar, plu, a1], 21)

or to emphasize the data:

    configure(bar, 1, 21, fol)

Or heck, you could allow style overrides if you really wanted to preserve this kind of styling:

    // $formatblk: tabular_keypaths, aligned_assignments

    bar   .glob       = 1

    bar   .plu    .a1 = 21

    // $formatblk-end

Cthulhu_

But "desirable code formatting" is subjective; some people prefer 2, 4 or 8 spaces, some prefer columnar layout like you demonstrated, etc. You can't deduce formatting from an AST alone as an AST is not source code and does not have formatting information.

gentooflux

The second two lines of your example smell like LoD violations. It's not a formatting problem, it's a structural problem.

aleph_minus_one

Sometimes you have to use libraries that are badly designed.

gentooflux

When that happens they're usually badly formatted too.

rs186

Ah, eslint-config-airbnb. My favorite airbnb config issues:

https://github.com/airbnb/javascript/issues/1271

https://github.com/airbnb/javascript/issues/1122

I literally spent over an hour when adapting an existing project to use the airbnb config, when code was perfectly correct, clear and maintainable. I ended up disabling those specific rules locally. I never used it in another project. (Looks like the whole project is no longer maintained. Good riddance.)

The airbnb config is, in my view, the perfect example of unnecessarily wasting people's productivity when linting is done badly.

efortis

Projectional Editing can be done with text sources.

Here’s an old video of JetBrains MPS rendering a table from code https://www.youtube.com/watch?v=XolJx4GfMmg&t=63s

I’m hoping for an IDE able to render dictionaries as tables -- my wishlist doesn’t stop there.

Currently, we have a glimpse of those features, such as code folding, inlay hints, or docstrings rendered as HTML:

https://x.com/efortis/status/1922427544470438381

crq-yml

I think the problem can be defined equally as: we can't invest in something more abstract than "plain text" at this time. When we try, it gets downgraded to a plain text projection of the syntax.

The plain text encoding itself exists in a process of incremental, path-dependent development from Morse Code signals to Unicode resulting in a "Gigantic Lookup Table" (GLUT, my coining) approach to symbolic comprehension. The assumption is useful - lots of features can "just work" by knowing that a particular bit pattern is always a particular symbol.

If we push up the abstraction level, we get a different set of symbols that are better suited to the app, but not equivalent GLUT tooling. Instead we usually get parsing of plain text as a transport. For example, CSV parsing. It is sloppy; it is also good enough.

Edit: XML is also a key example. It goes out of its way to respect the text transport approach. There are dedicated XML editors. But people want to edit it as plain text and they can't quite get there because funny-business with character encodings gets in the way, adding a bunch of ampersands and semicolons onto the symbols they want to edit. Thus we have ended up with "the CSV of hypertext documents", Markdown.

banashark

Interesting read. I’ve often wondered why the projection we see needs to be the same as the stored artifact. Even something like a git diff should be viewable via a projection of the source IR.

With things like treesitter and the like, I sometimes daydream about what an efficient and effective HCI for an AST or IR would look like.

Things like F#'s ordered compilation often make code reviews simpler for me, but that’s because a piece of the intermediate form (dependency order) is exposed to me as a first-class item. I find it much simpler to reason about compared to small changes in code with more lax ordering requirements, where I often find myself jumping up and down and back and forth in a diff and all the related interfaces and abstract classes and implementations to understand what effect the delta is having on the program as a whole.

PaulKeeble

In theory we could have an IDE apply a reformatting to any piece of code we looked at and format any changes back to the standard for the code base on updates. One of the things I dislike is that sometimes autoformatting does a poor job and loses some information that manual formatting provides, but honestly go fmt is mostly fine; it just works.

All of this seems doable; I just think for the most part we don't care very much about our preferences, since they have very little impact on readability. It's definitely doable, however: we could view the code however we most wanted and have it stored in a different formatting. It might not be 100% round-trip stable, but it probably doesn't matter.

There is always a better version where the defaults can be overridden and formatting forced, and where we only format new and changed lines to reduce potential instability, but again go fmt doesn't really suffer from this, so it's possible to make things pretty reliable. It's simple really: there is a default formatting and the code is stored that way, and we can then have our view of choice reformat the code as we want it; when it's stored, it's stored in the default.

Tractor8626

Author advocates for a thing they never used.

"It must have been good because Grady Booch says so".
