
What makes code hard to read: Visual patterns of complexity (2023)

feoren

> Chaining together map/reduce/filter and other functional programming constructs (lambdas, iterators, comprehensions) may be concise, but long/multiple chains hurt readability

This is not at all implied by anything else in the article. This feels like a common "I'm unfamiliar with it so it's bad" gripe that the author just sneaked in. Once you become a little familiar with it, it's usually far easier to both read and write than any of the alternatives. I challenge anyone to come up with a more readable example of this:

    var authorsOfLongBooks = books
        .filter(book => book.pageCount > 1000)
        .map(book => book.author)
        .distinct()
By almost any complexity metric, including his, this code is going to beat the snot out of any other way of doing this. Please, learn just the basics of functional programming. You don't need to be able to explain what a Monad is (I barely can). But you should be familiar enough that you stop randomly badmouthing map and filter like you have some sort of anti-functional-programming Tourette's syndrome.

seeinglogic

This comment seems unnecessarily mean-spirited... perhaps I just feel that way because I'm the person on the other end of it!

I agree the code you have there is very readable, but it's not really an example of what that sentence you quoted is referencing... However I didn't spell out exactly what I meant, so please allow me to clarify.

For me, roughly 5 calls in a chain is where things begin to become harder to read, which is the length of the example I used.

For the meaning of "multiple", I intended that to mean if there are nested chains or if the type being operated on changes, that can slow down the rate of reading for me.

Functional programming constructs can be very elegant, but it's possible to go overboard :)

atoav

To me the functional style is much easier to parse as well. Maybe the lesson is that familiarity can be highly subjective.

I, for example, prefer a well-chosen one-liner list comprehension in Python over a loop with temporary variables and nested if statements most of the time. That is because people who use list comprehensions usually do not write them with side effects, so once I've understood the block of code, it stands on its own.

The same is true for the builder style code. I just need to know what each step does and I know what comes out in the end. I even know that the object that was set up might become relevant later.

With the traditional imperative style that introduces intermediate variables I might infer that those are just temporary, but I can't be sure until I read on, keeping those variables in my head and leaving me, in the end, with many more possible ways the future code could play out. The intermediate variables have the benefit of clarifying steps, but you can have that with a builder pattern too if the interface is well chosen (or if you add comments).

This is why, in an imperative style, variables that are never used should be marked (e.g. one convention is an underscore prefix like _temporaryvalue — a language like Rust would even enforce this via the compiler). But guess what: to a person unfamiliar with that convention this increases mental complexity ("What is that weird name?"), while it factually should reduce it ("I don't have to keep that variable in my head as it won't matter in the future").

In the end many things boil down to familiarity. For example, in electronics many people prefer to write 4.7 kΩ as 4k7 instead, as it prevents you from accidentally overlooking the decimal point and making an off-by-a-magnitude error. This was particularly important in the golden age of the photocopier, as you can imagine. However, show that to a beginner and they will wonder what it is supposed to mean. Familiarity is subjective, and every expert was once a beginner coming from a different world, where different things are familiar.

Something being familiar to a beginner (or someone who learned a particular way of doing X) is valuable, but it is not necessarily an objective measure of how well suited that representation is for a particular task.

jghn

> Maybe the lesson is that familiarity can be highly subjective.

Over the years I've come to firmly believe that readability is highly subjective. And familiarity is a key contributor to that, but not the only one. There are other factors that I've found highly correlate with various personality traits and other preferences. In other words, people shouldn't make claims that one pattern is objectively more readable than another. Ever.

I've reached the point where anyone who claims some style is "more readable" without adding "to me" I just start to tune out. There's very little objective truth to be had here.

What should one do? If on a team and you're the outlier, suck it up and conform. If you're on a team and someone else is the outlier? Try to convince them to suck it up and conform. If you're on a new team? Work empathetically with your teammates to understand what the happy medium style should be.

p2edwards

Agreed —

seeinglogic's article made me think of a 3rd option:

1. Sorta long functional chain where the type changes partway through
2. Use temp variables
3. (New option) Use comments

(Here's funcA from seeinglogic's article, but I added 3 comments)

    function funcC(graph) {
      // target node
      return graph.nodes(`node[name = ${name}]`)
        // neighbor nodes
        .connected()
        .nodes()
        // visible names
        .not('.hidden')
        .data('name');
    }


Compare to funcB which uses temp variables:

    function funcB(graph) {
      const targetNode = graph.nodes(`node[name = ${name}]`)
      const neighborNodes = targetNode.connected().nodes();
      const visibleNames = neighborNodes.not('.hidden').data('name')

      return visibleNames;
    }
For me the commented version is easier to read and audit and it also feels safer for some reason, but I'm not sure how subjective that is.

daotoad

Funny, I see the need to add comments as an indicator that it's time to introduce chunking of some sort--arguably it already has been.

In this case, I'd lean towards intermediate variables, but sometimes I'll use functions or methods to group things.

I prefer function/method/variable names over comments because they are actual first-class parts of the code. In my experience, people are a bit more likely to update them when they stop being true. YMMV.
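
A rough sketch of that route with the example from upthread (same hypothetical graph API as funcB/funcC; the function names are made up):

    // Sketch only: `nodes()`, `connected()`, `not()` and `data()` are the
    // hypothetical graph API from the examples above, not a real library.
    function targetNode(graph, name) {
      return graph.nodes(`node[name = ${name}]`);
    }

    function neighborNodes(node) {
      return node.connected().nodes();
    }

    function visibleNames(nodes) {
      return nodes.not('.hidden').data('name');
    }

    function funcD(graph, name) {
      return visibleNames(neighborNodes(targetNode(graph, name)));
    }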

feoren

The dig on chains of map/reduce/filter was listed as a "Halstead Complexity Takeaway", and seemed to come out of the blue, unjustified by any of the points made about Halstead complexity. In fact in your later funcA vs. funcB example, funcB would seem to have higher Halstead complexity due to its additional variables (depending on whether they count as additional "operands" or not). In general, long chains of functions seem like they'd have lower Halstead complexity.

The "anti-functional Tourette's" comment was partly a response to how completely random and unjustified it seemed in that part of the article, and also that this feels like a very common gut reaction to functional programming from people who aren't really willing to give it a try. I'm not only arguing directly against you here, but that attitude at large.

Your funcA vs. funcB example doesn't strike me as "functional" at all. No functions are even passed as arguments. That "fluent" style of long chains has been around in OO languages for a while, independent of functional programming (e.g. see d3.js*, which is definitely not the oldest). Sure, breaking long "fluent" chains up with intermediate variables can sometimes help readability. I just don't really get how any of this is the fault of functional programming.

I think part of the reason funcB seems so much more readable is that neither function's name explains what it's trying to do, so you go from 0 useful names to 3. If the function was called "getNamesOfVisibleNeighbors" it'd already close the readability gap a lot. Of course if it were called that, it'd be more clear that it might be just trying to do too much at once.

I view the "fluent" style as essentially embedding a DSL inside the host language. How readable it is depends a lot on how clear the DSL itself is. Your examples benefit from additional explanation partly because the DSL just seems rather inscrutable and idiosyncratic.

Is it really clear what ".data()" is supposed to do? Sure, you can learn it, but you're learning an idiosyncrasy of that one library, not an agreed-upon language. And why do we need ".nodes()" after ".connected()"? What else can be connected to a node in a graph other than other nodes? Why do you need to repeat the word "node" in a string inside "graph.nodes()"? Why does a function with the plural "nodes" get assigned to a singular variable?

As an example of how confusing this DSL is, you've claimed to find "visibleNames", but it looks to me like you've actually found the names of visible neighborNodes. It's not the names that are not(.hidden), it's the nodes, right? Consider this:

    function getVisibleNeighborNames(graph) {
        return graph
            .nodeByName(name)
            .connectedNodes()
            .filter(node => !node.isHidden)
            .map(node => node.name)
    }
Note how much clearer ".filter(node => !node.isHidden)" is than ".not('.hidden')", and ".map(node => node.name)" versus ".data('name')". It's much harder to get confused about whether it's the node or the name that's hidden, etc.

Getting the DSL right is really hard, which only increases the benefit of using things like "map" and "filter" which everyone immediately understands, and which have no extrinsic complexity at all.

You could argue that it's somehow "invalid" to change the DSL, but my point is that if you're using the wrong tool for the job to begin with, then any further discussion of readability is in some sense moot. If you're doing a lot of logic on graphs, you should be dealing with a graph representation, not CSS classes and HTML attributes. Then the long chains are not an issue at all, because they read like a DSL in the actual domain you're working in.
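
As a sketch of what I mean, here are hypothetical interfaces, just enough to make the rewritten chain above type-check:

    // Hypothetical types only; the method names mirror the rewritten chain above.
    interface GraphNode {
      name: string;
      isHidden: boolean;
      connectedNodes(): GraphNode[];
    }

    interface Graph {
      nodeByName(name: string): GraphNode;
    }

    function getVisibleNeighborNames(graph: Graph, name: string): string[] {
      return graph
        .nodeByName(name)
        .connectedNodes()
        .filter(node => !node.isHidden)
        .map(node => node.name);
    }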

*Sidenote: I hate d3's standard style, for some of the same reasons you mention, but mainly because "fluent" chains should never be mutating their operand.

climb_stealth

Just want to add that I both agree with you and parent. Your examples are readable and I see no issues there.

It might be a language thing as well. In Python often people take list-comprehensions too far and it becomes an undecipherable mess of nested iterators, casts and lists.

There are always exceptions :)

vitus

> mainly because "fluent" chains should never be mutating their operand.

I see this quite often with builders, actually, and I don't mind it so much there.

    FooBuilder()
      .setBar(bar)
      .setBaz(baz)
      .setQux(qux)
      .build()
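
(A rough TypeScript sketch, for contrast, of the same shape where each setter returns a fresh builder instead of mutating `this`; the names are illustrative.)

    // Illustrative only: every setter returns a new builder, so the chain
    // never mutates its operand.
    interface FooProps { bar?: string; baz?: number; qux?: boolean }

    class FooBuilder {
      constructor(private readonly props: FooProps = {}) {}

      setBar(bar: string) { return new FooBuilder({ ...this.props, bar }); }
      setBaz(baz: number) { return new FooBuilder({ ...this.props, baz }); }
      setQux(qux: boolean) { return new FooBuilder({ ...this.props, qux }); }

      build(): FooProps { return { ...this.props }; }
    }

    const foo = new FooBuilder()
      .setBar("bar")
      .setBaz(42)
      .setQux(true)
      .build();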

desumeku

  o_node := graph.GetNodeByName(name)
  var ret []string
  for _, node := range o_node.connectedNodes() {
    if !node.isHidden {
      ret = append(ret, node.name)
    }
  }
  return ret

ninetyninenine

>For me, roughly 5 calls in a chain is where things begin to become harder to read, which is the length of the example I used.

This isn't just about readability. Chaining or FP is structurally more sound. It is the more proper way to code from an architectural and structural pattern perspective.

     given an array of numbers

   1. I want to add 5 to all numbers
   2. I want to convert to string
   3. I want to concat hello
   4. I want to create a reduced comma separated string
   5. I want to capitalize all letters in the string. 
This is what a for loop would look like:

   // assume x is the array
   acc = ""

   for (var i = 0; i < x.length; i++) {
       value = x[i] + 5
       stringValue = String(value).concat("hello")
       acc += stringValue + ","
   }

   var chars = acc.split("")
   for (var i = 0; i < chars.length; i++) {
       chars[i] = capitalLetter(chars[i])
   }
   acc = chars.join("")
FP:

    addFive(x) = [i + 5 for i in x]
    toString(x) = [str(i) for i in x]
    concatHello(x) = [i + "hello" for i in x]
    reduceStrings(x) = reduce((i, acc) => acc + "," + i, x)
    capitalize(x) = ([capitalLetter(i) for i in x]).toString()

You have 5 steps. With FP all 5 steps are reuseable. With Procedural it is not.

Mind you that I know you're thinking about chaining. Chaining is equivalent to inlining multiple operations together. So for example in that case

     x.map(...).map(...).map(...).reduce(...).map(...)

     //can be made into
     addFive(x) = x.map(...)
     toString(x)= x.map(...)
     ...
By nature functional is modular so such syntax can easily be extracted into modules with each module given a name. The procedural code cannot do this. It is structurally unsound and tightly coupled.

It's not about going overboard here. The FP simply needs to be formatted to be readable, but it is the MORE proper way to code to make your code modular, general, and decoupled.
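
A rough TypeScript rendering of those five steps as named, reusable, side-effect-free functions (a sketch; the `capitalLetter` step here is just `toUpperCase`):

    // Sketch: each step is a small, reusable, side-effect-free function.
    const addFive = (xs: number[]): number[] => xs.map(x => x + 5);
    const toStrings = (xs: number[]): string[] => xs.map(x => String(x));
    const concatHello = (xs: string[]): string[] => xs.map(s => s + "hello");
    const joinWithCommas = (xs: string[]): string =>
      xs.reduce((acc, s) => (acc ? acc + "," + s : s), "");
    const capitalize = (s: string): string => s.toUpperCase();

    // The chain is just the composition of the named steps:
    const result = capitalize(joinWithCommas(concatHello(toStrings(addFive([1, 2, 3])))));
    // "6HELLO,7HELLO,8HELLO"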

harrison_clarke

you have this backwards: reusing code couples the code. copy+paste uncouples code

if you have two functions, they're not coupled. you change one, the other stays as-is

if you refactor it so that they both call a third function, they're now coupled. you can't change the part they have in common without either changing both, or uncoupling them by duplicating the code

(you often want that coupling, if it lines up with the semantics)

jltsiren

Your example is a conceptually simple filter on a single list of items. But once the chain grows too long, the conditions become too complex, and there are too many lists/variables involved, it becomes impossible to understand everything at once.

In a procedural loop, you can assign an intermediate result to a variable. By giving it a name, you can forget the processing you have done so far and focus on the next steps.

stouset

You don't ever need to "understand everything at once". You can read each stanza linearly. The for loop style is the approach where everything often needs to be understood all at once since the logic is interspersed throughout the entire body.

__mharrison__

This. I teach this with Pandas (and Polars) all the time. You don't really care about the intermediate values. You build up the chain operation by operation (validating that it works). At the end you have a recipe for processing the data.

Most professional Pandas users realize that working with chains makes their lives much easier.

By the way, debugging chains isn't hard. I have a chapter in my book that shows you how to do it.

jltsiren

In the example above, you first have a list of books. Then you filter it down to books with >1000 pages. Then you map it to authors of books with >1000 pages. Then you collapse it to distinct authors of books with >1000 pages. Every step in the chain adds further complexity to the description of the things you have, until it exceeds the capacity of your working memory. Then you can no longer reason about it.

The standard approach to complexity like that is to invent useful concepts and give them descriptive names. Then you can reason about the concepts themselves, without having to consider the steps you used to reach them.

ffsm8

That's only true for casual reviewing and writing.

When you're actually analyzing a bug, or need to add a new feature to the code... Then you'll have to keep the whole thing in your mind. No way around it

It gets extra annoying when people have complex maps, reduces, flat maps all chained after the next, and each step moved into a named function.

HF constantly jumping around trying to rationalize why something happens with such code...

It looks good on first glance, but it inevitably becomes a dumpster fire as soon as you need to actually interact with the code.

reubenmorais

In a practical example you'd create a named intermediate type which becomes a new base for reasoning. Once you've convinced yourself that the first part of the chain responsible for creating that type (or a collection of it) is correct, you can forget it and free up working memory to move on to the next part. The pure nature of the steps also makes them trivially testable, as you can just call them individually with easy-to-construct values.
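
For instance, with the books example from upthread (the type names here are made up for illustration):

    // Illustrative sketch: the intermediate concept gets a name and becomes
    // the new base for reasoning about the rest of the pipeline.
    type Book = { title: string; pageCount: number; author: string };
    type LongBook = Book; // semantically: a book with more than 1000 pages

    const selectLongBooks = (books: Book[]): LongBook[] =>
      books.filter(book => book.pageCount > 1000);

    const distinctAuthors = (books: LongBook[]): string[] =>
      [...new Set(books.map(book => book.author))];

    // Each step is trivially testable with hand-built values:
    // distinctAuthors(selectLongBooks(allBooks))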

kccqzy

If you assign an intermediate result to a variable in a procedural loop, you can also assign intermediate results of parts of this chain to variables.

titzer

SELECT DISTINCT author FROM books WHERE pageCount > 1000;

101011

In fairness, if this was in a relational data store, the same code as above would probably look more like...

SELECT DISTINCT authors.some_field FROM books JOIN authors ON books.author_id = authors.author_id WHERE books.pageCount > 1000

And if you wanted to grab the entire authors record (like the code does) you'd probably need some more complexity in there:

SELECT * FROM authors WHERE author_id IN ( SELECT DISTINCT authors.author_id FROM books JOIN authors ON books.author_id = authors.author_id WHERE books.pageCount > 1000 )

titzer

The last one is better as:

SELECT * FROM authors WHERE author_id IN (SELECT author_id FROM books WHERE pageCount > 1000);

But I think you're missing the point. The functional/procedural style of writing is sequentialized and potentially slow. It's not transactional, doesn't handle partial failure, isn't parallelizable (without heavy lifting from the language--maybe LINQ can do this? but definitely not in Java).

With SQL, you push the entire query down into the database engine and expose it to the query optimizer. And SQL is actually supported by many, many systems. And it's what people have been writing for 40+ years.

YesBox

Scrolled to find the SQL. Such an elegant, powerful language. Really happy I chose SQLite for my game/project.

pif

Just add strong types, with verification at something-like-compile-time, and you'll have a sane language.

beryilma

This is 5 times more readable than FP example above for the same computation. The FP example uses variable book(s) five times, where using it once was sufficient for SQL. Perhaps FP languages could have learned something from SQL...

xigoi

Now modify the SQL to support books having multiple authors. (In the FP example, you would just change map to flatMap.)
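
i.e., something along these lines (a sketch, keeping the made-up `.distinct()` from the original example):

    var authorsOfLongBooks = books
        .filter(book => book.pageCount > 1000)
        .flatMap(book => book.authors)  // each book now has an array of authors
        .distinct()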

odyssey7

Notably this example is declarative, the original is functional, and neither is imperative.

__mharrison__

Folks don't seem to have a problem when SQL does it. Only when code like Pandas does it...

mont_tag

Hi Matt! I've observed this phenomenon as well.

When the SQL and Pandas examples are isomorphic except for shallow syntactic differences, the root cause of the complaint must either be:

* that the judgment was emotional rather than substantive
* or that the syntactic differences (dots and parens) actually matter

RedNifre

While I think your example is fine, I think the complaint was more about very long chains. Personally, I like to break them up to give intermediate results names, kinda like using variable names as comments:

  var longBooks = books.filter(book => book.pageCount > 1000)

  var authorsOfLongBooks = longBooks.map(book => book.author).distinct()

matejn

I like the SQL solutions people posted. But what about this one in Prolog?

  ?- setof(Author, Book^Pages^(book_author(Book, Author), book_pages(Book, Pages), Pages > 1000), Authors).
Depending on the structure of the Prolog database, it could be shorter:

  ?- setof(Author, Pages^(book(_, Author, Pages), Pages > 1000), Authors).

dsego

That's not a long chain. It doesn't even have a reduce, try nesting a few reducers and see how you like it.

aaronbrethorst

What is "long"?

brabel

Are you really confused when people don't think 3 operations constitutes "long"? I would guess anyone with half a brain would agree 3 operations is not long; at maybe 5 or 6 you will have many people agreeing that it is, and above that, most.

MathMonkeyMan

> I challenge anyone [...]

    select distinct author from book where pageCount > 1000;

elliottkember

Good example actually. You started with a books array, and changed the type to authors half-way through.

To know the return type of the chain, I have to read through it and get to the end of each line.

A longBooks array, and map(longBooks, ‘author’) wouldn’t be much longer, but would involve more distinct and meaningful phrases.

I used to love doing chains! I used lodash all the time for things like this. It’s fun to write code this way. But now I see that it’s just a one-liner with line breaks.

recursivedoubts

There is a (large, I believe) aspect of good code that is fundamentally qualitative & almost literary. This annoys a lot of computer programmers (and academics) who are inclined to the mathematical mindset and want quantitative answers instead.

I love dostoyevsky and wodehouse, both wrote very well, but also very differently. While I don't think coding is quite that open a playing field, I have worked on good code bases that feel very different qualitatively. It often takes me a while to "get" the style of a code base, just as a new author may take a while for me to get.

louthy

I 100% agree with this. One of the best compliments I ever got (regarding programming) was from one of my principal engineers who said something along the lines of "your code reads like a story". He meant he could open a code file I had written, read from top to bottom and follow the 'narrative' in an easy way, because of how I'd ordered functions, but also how I created declarative implementations that would 'talk' to the reader.

I follow the pure functional programming paradigm which I think lends itself to this more narrative style. The functions are self contained in that their dependencies/inputs are the arguments provided or other pure functions, and the outputs are entirely in the return type.

This makes it incredibly easy to walk a reader through the complexity step-by-step (whereas other paradigms might have other complexities, like hidden state, for example). So, ironically, the most mathematically precise programming paradigm is also the best for the more narrative style (IMHO of course!)

DidYaWipe

I had a similar experience. I was a lead on a project where the client sent a functional expert to literally (at times) watch over my shoulder as I worked. He got very frustrated after a few weeks, seeing little in the way of code being laid down. He even complained to the project manager. That's because this was a complicated manufacturing system, and I was absorbing the necessary rules for it and designing it... a process that involved mostly sitting and thinking.

When I decided on the final design and basically barfed all the code out in a matter of days, I walked this guy (a non-programmer) through the code. He then wrote my manager a letter declaring it to be the "most beautiful code he had ever seen." I still have the Post-It she left in my cube telling me that.

I have little tolerance for untidy code, and also overly-clever syntax that wastes the reader's time trying to unravel it.

And now we have languages building more inconsistent and obscure syntax in as special-case options, wasting more time. Specifically I'm thinking about Swift, where, if the last parameter in a function call is a closure, it's a "trailing" closure and you can just ignore the function signature and plop the whole closure right there AFTER the closing parenthesis. Why? https://www.hackingwithswift.com/sixty/6/5/trailing-closure-...

This is just one example, and yeah... you can get used to it. But in this example, the language has undermined PARENTHESES, a notation that is almost universally understood to enclose things. When something that basic is out the window, you're dealing with language designers who lack an appreciation for human communication.

qwertygnu

> The functions are self contained in that their dependencies/inputs are the arguments provided or other pure functions and the outputs are entirely in the return type.

Is this just a fancy way of saying static functions?

louthy

Nope, pure functions are referentially transparent. The key idea is that you can replace the function invocation with a value and it shouldn’t change the program.

A regular static function could refer to a file, a database, or it could change some global memory, etc. So, replacing the static function (that causes side-effects) with a pure value wouldn’t result in the same program.

Side-effects are usually declaratively represented by something like an IO monad. Which in reality is just a lambda with the side-effecting behaviour in the body of the lambda.

So, to make a pure IO function you don't actually perform the IO in the function, you return a data type (the lambda) that represents the IO to perform. This maintains the purity of the function and 'passes the buck' to the caller. In the case of Haskell, all the way up to its main function and into its runtime — making the language itself pure, even if the runtime isn't.
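
A rough TypeScript sketch of that shape (not the language-ext API, just the idea of returning a description of the effect):

    // Sketch only: IO<A> is just a description (a lambda) of an effect that
    // yields an A when it is eventually run by the caller/runtime.
    type IO<A> = () => A;

    // Pure: nothing is printed here; we only return a description of printing.
    const putLine = (text: string): IO<void> => () => console.log(text);

    // Sequencing two descriptions yields another description; still pure.
    const andThen = <A, B>(first: IO<A>, next: IO<B>): IO<B> =>
      () => { first(); return next(); };

    const program: IO<void> = andThen(putLine("Hello"), putLine("world"));

    // Only at the edge of the program is the effect actually performed.
    program();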

This isn't just a Haskell thing though. I'll write code this way in C# (and have built a large pure-FP framework for C# to facilitate this approach [1]).

Here's an example of the more 'narrative style' [2] of C# using pure-FP. It reads from top-to-bottom, keeping the related functions near each other and walking the reader through the functionality. There's also a massive removal of the usual clutter you see in C#/Java programs, getting down to the essence of the logic. It won't be to everybody's taste (as it's not idiomatic at all), but it demonstrates the idea.

This style works well for regular program logic and less well for things like APIs where there's not always a narrative you can tell.

[1] https://github.com/louthy/language-ext

[2] https://github.com/louthy/language-ext/blob/main/Samples/Car...

hinkley

There’s a difference between simplifying a concept and stating it plainly.

I use this analogy a lot. Code can be like a novel, a short story, or a poem. A short story has to get to the point pretty quickly. A poem has to be even more so, but it relies either on shared context or extensive unpacking to be understood. It’s beautiful but not functional.

And there are a bunch of us short story writers who just want to get to the fucking point with a little bit of artistic flair, surrounded by a bunch of loud novel and mystery writers arguing with the loudest poets over which is right when they are both wrong. And then there’s that asshole over there writing haikus all the fucking time and expecting the rest of us to be impressed. The poets are rightfully intimidated but nobody else wants to deal with his bullshit.

h4ny

I see where you are coming from but that's unnecessarily hostile.

> There’s a difference between simplifying a concept and stating it plainly.

You are right, but they are not mutually exclusive.

The analogy you used with novel, short story, and poem/haiku also doesn't demonstrate your point: it's not like you can compress any novel into a short story, let alone a poem. If you're into games, try equating AAA-quality 3D games to novels, high-resolution 2D games to short stories, and pixel art games to haikus: it doesn't make sense and it's ridiculous.

I respect that you are passionate about the medium you choose, but what you claimed about novels and poems, as per your own words, "they are both wrong" at best. Don't generalize your personal experience to everyone else, there are kind, hardworking people out there writing novels and poems who love short stories just as much -- maybe what you need to do is to find those people instead of spewing your unwarranted anger over them.

zwnow

I consider code bad if it takes more than 5 seconds to read and understand the high level goal of a function.

Doesn't matter how it looks. If it's not possible to understand what a function accomplishes within a reasonable amount of time (without requiring hours upon hours of development experience), it's simply bad.

jacobr1

There is a call-stack depth problem here that is specific to codebases though. For one familiar with the conventions, key data abstractions (not just the data model but the convention of how models are structured and relate) and key code abstractions, a well-formed function is easy to understand. But someone relatively new to the codebase will need to take a bunch of time switching between levels to know what can be assumed about the state or control flow of the system in the context of when that function/subroutine is running. Better codebases avoid side-effects, but even with good separation there, non-trivial changes require strong reasoning about where to make changes in the system to avoid introducing side-effects and not just passing extra state around all over the place.

So, I'd take "good architecture" with ok and above readability, over excellent readability but "poor architecture" any day. Where architecture in this context means the broader code structure of the whole project.

zwnow

But who talked about bad architecture? Good readable code doesn't rule out good architecture. Surely some things are complicated, but even then, a dev should be able to quickly see what's going on with minimal expertise in a codebase.

ikrenji

The purpose of the function should be clear from its name. If it's too complex to convey this information, it should have a docstring that clearly explains what it does. It's not rocket science.

fasbiner

Sounds like you are content to limit yourself to problems that do not contain more irreducible complexity or require more developer context than what fits within five seconds of comprehension.

That's a good rule for straightforward CRUD apps and single-purpose backend systems, but as a universal declaration, "it is simply bad" is an ex cathedra metaphysical claim from someone who has mistaken their home village for the entirety of the universe.

hinkley

> is an ex cathedra metaphysical claim

I have a cargo ship-sized suspicion that your code is difficult to read for reasons other than intrinsic complexity.

You’ve found a way to explain it to yourself and excuse it to others, but you won’t always be the smartest person in the room.

Also that’s not what was said.

> more than 5 seconds to read and understand the high level goal of a function

Understanding what something is for is not understanding how it accomplishes it.

ajross

> I consider code bad if it takes more than 5 seconds to read and understand the high level goal of a function.

That's something that's possible only for fairly trivial logic, though. Real code needs to be built on an internal "language" reflecting its invariants and data model and that's not something you can see with a microscope.

IMHO obsessive attention to microscope qualities (endless style nitpicking in code review, demands to run clang-format or whatever the tool du jour is on all submissions, style guides that limit function length, etc...) hurts and doesn't help. Good code, as the grandparent points out, is a heuristic quality and not one well-defined by rules like yours.

necrotic_comp

I agree with this up to a point - having consistent code style with some sort of formatter (gofmt, black, clang-format) goes a long way to reducing complexity of understanding because it unifies visual style.

I suggest that a codebase should read like a newspaper. While there is room for op-eds in the paper, it's not all op-eds, everything else should read as a single voice.

zwnow

I even wrote no matter how it looks?

I meant the goal of your function needs to be grasped within a reasonable amount of time. This works for every codebase.

hinkley

Code that looks like it has a bug in it but doesn’t will draw the eye over, and over, and over again when fishing for how regressions or bugs got into the code. This is the real cost of code smells. At some point it’s cheaper for me to clean up your mess than to keep walking past it every day. But I’m going to hate you a little bit every time I do.

callc

Generally agree.

Consider reading kernel or driver code. These areas have a huge amount of prerequisite knowledge that - I argue - makes it OK to violate the “understand at a glance” rule of thumb.

andybp85

For the codebase I work on, I made a rule that "functions do what the name says and nothing else". This way, if the function does too much, hopefully you feel dumb typing it and realize you should break it up.

quinnirill

Then what does the function that calls the split functions get called? foo_and_bar_and_qoo? And if they’re called only under some conditions?

hinkley

OrderAuthorsByNameAndCalculateResidualsAndSendPaperCheckWithThankYouCard()

I could see how that might come up in a retrospective.

sunrunner

Does this apply to all domains and all 'kinds' of code?

I feel like there's a fundamental difference in the information density between code that, for example, defines some kind of data structure (introducing a new 'shape' of data into an application) versus code that implements a known algorithm that might appear short in line length but carries a lot of information and therefore complexity.

zwnow

If the algorithm is well known, it's all good as long as the function name for it is somewhat understandable. I have to work with 200 line functions at work and it's a complete, excuse the language, shitshow.

jayd16

So it should take 5 minutes whether it's your language of choice or the assembly it compiles to? Or does it matter how it looks in _that_ case?

freetonik

>This annoys a lot of computer programmers (and academics) who are inclined to the mathematical mindset and want quantitative answers instead.

I find many syntactical patterns that are considered elegant to be the opposite, and not as clear as mathematics, actually. For example, the ternary operator mentioned in the article, `return n % 2 === 0 ? 'Even' : 'Odd';`, feels very backwards to my human brain. It's better suited for the compiler to process the syntax tree rather than a human. A human mathematician would do something like this:

         ⎧  'Even' n mod 2 = 0
  f(n) = ⎨
         ⎩  'Odd'  n mod 2 ≠ 0
Which is super clear.

pc86

Well of course if you have the freedom to write a mathematical expression you're going to be able to present it in a way that is clearer than if you have to type monospace characters into a text editor.

I'm not sure it's realistic to expect to be able to type a mathematical expression using ascii more clearly than you can write it by hand (or implement using special unicode characters).

cdirkx

Quite some years back I worked with JetBrains MPS which used a "projectional editor" instead of a text editor. It was pretty neat to be able to enter "code" as mathematical expressions, or even state machine tables or flow diagrams with actual nodes instead of a text representation.

Sadly not much has happened in that space since then, but it was cool to think about what our tools of the future might look like. (of course ignoring all the practical reasons why we're probably still using regular text files in 100 years)

Timwi

I understand that you might find the mathematical notation clearer but I think it's presumptuous of you to speak on behalf of all humans, or even all human mathematicians. I'm a mathematics graduate and I find the conditional operator more readable in a program because it corresponds to what the program actually does (it checks the condition first); but I also recognize that the two notations have exactly the same information content and only differ superficially in syntax, making it entirely a matter of familiarity.

gwbas1c

This is why code reviews are so critical: They help keep a consistent style while onboarding new team members, and they help a team keep its style (reasonably) consistent.

(Also, see my comment about .editorconfig: https://news.ycombinator.com/item?id=43333011. It helps reduce discussions about style minutia in pull requests.)

WillAdams

Arguably, this is why Literate Programming (see my comment elsethread) didn't take off.

User23

Mathematicians have recognized the importance of elegance for millennia.

mrkeen

The article's good, but misses my most mentally-fatiguing issue when reading code: mutability.

It is such a gift to be able to "lock in" a variable's meaning exactly once while reading a given method, and to hold it constant while reasoning about the rest of the method.

Your understanding of the method should monotonically increase from 0% to 100%, without needing to mentally "restart" the method because you messed up what the loop body did to an accumulator on a particular iteration.
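
In JS/TS terms, it is roughly the difference between reassigning one name and introducing a new `const` for each step (a sketch; `loadItems` is a placeholder):

    declare function loadItems(): { active: boolean }[];

    // Mutable: `result` means something different on every line you read.
    let result = loadItems();
    result = result.filter(item => item.active);
    result = result.slice(0, 10);

    // Immutable: each name is locked in once and holds that meaning forever.
    const items = loadItems();
    const activeItems = items.filter(item => item.active);
    const firstTen = activeItems.slice(0, 10);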

This is the real reason why GOTOs are harmful: I don't have a hard time moving my mind's instruction-pointer around a method; I have a hard time knowing the state of mutable variables when GOTOs are in play.

cle

Disagree. There's an abstract "information space" that the code is modeling, and you have to move around your mind's instruction pointer in that space. This can be helped or hindered by both mutable and immutable vars--it depends on how cleanly the code itself maps into that space. This can be a problem w/ both mutable and immutable vars. There's a slight tactical advantage to immutable vars b/c you don't have to worry about the value changing or it changing in a way that's misleading, but IME it's small and not worth adopting a "always use immutability" rule-of-thumb. Sometimes mutability makes it way easier to map into that "information space" cleanly.

klabb3

> This is the real reason why GOTOs are harmful: I don't have a hard time moving my mind's instruction-pointer around a method; I have a hard time knowing the state of mutable variables when GOTOs are in play.

Well, total complexity is not only about moving the instruction pointer given a known starting point. Look at it from the callee’s pov instead of the call site. If someone can jump to a line, you can’t backtrack and see what happened before, because it could have come from anywhere. Ie you needed global program analysis, instead of local.

If mutability were the true source of goto complexity then if-statements and for loops have the same issue. While I agree mutability and state directly causes complexity, I think goto was in a completely different (and harmful) category.

stared

My pet peeve:

    function getOddness4(n: number): string {
      if (n % 2 === 0) {
        return "Even";
      }
      return "Odd";
    }
While it is shorter, I vastly prefer this one:

    function getOddness2(n: number): string {
      if (n % 2 === 0) {
        return "Even";
      } else {
        return "Odd";
      }
    }
Reason: getOddness4 gives some sense of asymmetry, whereas "Even" and "Odd" are symmetric choices. getOddness2 is in that respect straightforward.

bogomog

  function getOddness(n: number): string {
    return (n % 2 === 0)
      ? "Even"
      : "Odd";
  }
Lowest boilerplate makes it the most readable. If working in a language with the ternary operator it ought to be easily recognized!

ajuc

I love code golf as much as anyone, not sure it's worth it on such small methods tho. Any of the propositions would be fine. Anyway:

    def oddness(n):
      return ["Even", "Odd"][n % 2]
BTW this trick with replacing if-then-else with a lookup is sometimes very useful. Especially if there's many ifs.
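
The same trick in TypeScript, where it pays off once there would otherwise be a pile of if/else-if branches (the keys and labels here are made up):

    // Lookup object instead of an if/else-if chain.
    const sizeLabels: Record<string, string> = {
      sm: "Small",
      md: "Medium",
      lg: "Large",
    };

    function describeSize(size: string): string {
      return sizeLabels[size] ?? "Unknown";
    }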

pasc1878

Or you write it in APL

stared

This is, IMHO, the idiomatic way to do so.

erikerikson

Since the article was in JavaScript:

  const getOddness = (n) => (
    n % 2
      ? 'Odd'
      : 'Even'
  )
Even less visual noise

lioeters

    const getOddness = n => n % 2 ? 'Odd' : 'Even'

culopatin

While this is simple and all, the English words if/else don’t require the reader to know the ?: convention. Depending on what background the reader may have, they could think of the set notation where it could mean “all the evens such that odd is true”, which makes no sense. It’s also very close to a key:value set notation. If/else leave no doubts for the majority of readers. It’s more inclusive if you will.

bogomog

That's why I gave the caveat that if using a language with the ternary operator, one should know that operator. Python tried using English words for a ternary, but I think that's awkward from a readability perspective. A limited set of symbolic syntax improves readability over using words in my opinion, there's less text to scan.

erikerikson

Learning the basic operators of the language seems like table stakes.

Choosing one of world's spoken languages over others seems to be the opposite of inclusive.

elliottkember

Putting “return” on a different line from the actual value you’re returning?

Twisol

If there are two ways to say something, then people will find ways to make their choice of method into speech as well.

To me and my style of coding, there's a difference of intent between the two. A ternary connotes a mere computation, something that should have no side-effects. A conditional connotes a procedure; the arms of the conditional might be expected to have side-effects. (And the case of `if (_) return` or similar are pure control flow guards; they neither compute a value nor perform a procedure as such.)

It's not just about where the symbols go on the screen.

__oh_es

I feel guard clauses/early returns end up shifting developer focus to narrowing what the function handles up front, rather than writing the happy path first and treating the other conditions it could handle as an afterthought.

IME elses also end up leading to further nesting and to evaluating edge cases or variables beyond the immediate scope of the happy path (carrying all that context!).

CatAtHeart

I personally prefer the former as you can visually see the return one level of indentation below the function name. It shows a guaranteed result, barring early exits. Something about having the return embedded lower just seems off to me.

cess11

I would add a blank line to push 'return "Odd";' from the if, and also add brackets around the if-body if the language allows.

There are situations where I allow else, they tend to have side effects, but usually I refactor until I get rid of it because it'll come out clearer than it was. Commonly something rather convoluted turns into a sequence of guards where execution can bail ordered based on importance or execution cost. It isolates the actual function/method logic from the exit conditions.

BlackFly

The asymmetry is apparent if the code gets refactored to continuation/callback style or mutation of a more complex data structure, the first method will fall through and execute the second instruction set. Return is a special operator in this sense in that it breaks control flow and the ordinary control flow of the first method is not capturing the exhaustiveness of the two cases.

In idiomatic rust, return isn't used except for exceptional cases that break the control flow of the method and the second example is more commonly seen without return statements at all. Idiomatic python also typically early exits in the beginning with a return on invalid parameters or state with a tail position return being the usual actual return value. Because of these conventional practices, breaking the exhaustive if-else control structure makes the indented return appear exceptional (like an invalidity). If you follow these conventions than naturally the return statement begins to appear redundant except in the break-control-flow cases and the choice of the rust convention begins to make sense: in all languages return is a statement equivalent to break.

brulard

Nice example of how subjective this is. I immediately thought the first one without "else" is clearly the winner.

This is the problem with formatting rules. A codebase needs to have consistent style, even though that might mean nobody is fully happy with it.

I for example can not stand semicolons in JavaScript. It is just a visual clutter that is completely redundant, and yet some people really want it there.

makeitdouble

If it's this short, the ternary operator would be the absolute best option IMHO.

If any of the clauses are much longer, the first option reads a lot better if it can be a guard clause that returns very quickly.

If neither option is short, I'd argue they should be pushed into scoped and named blocks (e.g. a function), and we're back to either a ternary operation or a guard-like clause.

zoogeny

90% of the time I prefer the first. I am allergic to indentation and I hate anything remotely like:

    function foo(a) {
        if (a) {
            return doThing()
        } else {
            return Error();
        }
    }
I like all of my assertion and predicate guards nicely at the top of a function:

    function foo(a) {
        if (!a) {
            return Error()
        } 

        return doThing()
    }
And for that reason, I would probably go for getOddness4 even though I see your point.

stared

In case of handling exceptions (and similar stuff), I also prefer avoiding unnecessary nesting.

For two (or more) equally valid branches, I prefer keeping the same nesting.

CharlieDigital

Maybe it's just me, but TypeScript makes code hard to read.

It's fine if the data model is kept somewhat "atomic" and devs are diligent about actually declaring and documenting types (on my own projects, I'm super diligent about this).

But once types start deriving from types using utility functions and then devs slack and fall back to type inference (because they skip an explicit type), it really starts to unravel because it's very hard to trace fields back to their origin in a _deep_ stack (like 4-5 levels of type indirection; some inferred, some explicit, some derived, some fields get aliased...).

    type Dog = {
      breed: string
      size: "lg" | "md" | "sm"
      // ...
    }

    type DogBreedAndSize = Pick<Dog, "breed" | "size">

    function checkDogs(dogs: Dog[]) : DogBreedAndSize[] {
      return dogs.map(d => /* ... */)
    }

    const checkedDoggos = checkDogs([])
Versus:

    function checkDogs(dogs: Dog[]) {
      // ...
    }
Very subtle, but for large data models with deep call stacks, the latter is completely unusable and absolutely maddening.

bluefirebrand

I agree that functions should probably specify their output type, MOSTLY to enforce that all paths that return from that function must adhere to that type

I've seen plenty of regressions where someone added a new condition to a function and then returned a slightly different type than other branches did, and it broke things

However, I don't think there is much value in putting types on variable declarations

In your example,

`const checkedDoggos = checkDogs([])` is good. Just let checkedDoggos inherit the type from the function

I have a codebase I'm working on where the linter enforces

`const checkedDoggos: DogBreedAndSize[] = checkDogs([])`

It is very silly and doesn't add much value imo

CharlieDigital

I want it on the other side (on the function return) so that it's consistently displayed in type hints and intellisense so I don't have to navigate the code backwards 3-4 layers to find the root type (do you see what I'm saying?)

    function checkDogs(dogs: Dog[]) : DogBreedAndSize[] {
      return dogs.map(d => /* ... */)
    }
^^^ That's where it's important to not skip the type def because then I can see the root type in the editor hints and I don't need to dig into the call stack (I know the end result is the same whether it's on the assignment side or the declaration side, but it feels like ensuring it's always on the declaration side is where the value is)

dkdbejwi383

I'd prefer to have some type information over nothing if the choice were between TypeScript with some inferred return types, versus JavaScript where you're never really sure and constantly have to walk back up/down the stack and keep it in your mind.

CharlieDigital

I'd say on the backend, my preference is something statically typed like C#: statically typed, but with enough type flexibility to be interesting (tuples, anonymous types, inferred types, etc.)

Sateeshm

My policy is to type only when the typescript compiler yells at me.

userbinator

Smaller functions with fewer variables are generally easier to read

I hate how a lot of focus on "readability" is on micro-readability, which then tends to encourage highly fragmented code under the highly misguided assumption that micro-readability is more important than macro-readability. The dogma-culting around this then breeds plenty of programmers who can't see the forest for the trees and end up creating grossly inefficient code and/or have difficulty with debugging.

APL-family languages are at the other extreme, although I suspect the actual optimum is somewhere in the middle and highly dependent on the individual.

DimmieMan

There's certainly a middle ground, especially when multiple files are involved. 3-4 go-to-definitions in something I'm not familiar with and I'm struggling; now that's a me problem, but I can't imagine most people are miles ahead of me.

.Net culture, especially with "clean architecture", is shocking for this: you go to modify a feature or troubleshoot and things are spread across 4 layers and 15 files, some of which are > 60% keywords.

I don't have an answer for where the cutoff is, but I'll generally take one longer function that's otherwise neat and follows the other recommendations outlined, that I can read sequentially, instead of scrolling up and down every 5 lines because it's so fragmented. The same can be said for types/classes too: that 4-value enum used only for this DTO does not need to be in another file!

neonsunset

What’s tragic is this is completely self-inflicted and you could argue is done against the kind of code structure that is more idiomatic to C#. Luckily, you don’t have to do it if you are not already working with this type of codebase.

jorams

This is an interesting article, but also rather unsatisfying. It very quickly jumps to conclusions and goes right back to opinion. I agree with several of those opinions, but opinion was explicitly not the point of the article.

> Prefer to not use language-specific operators or syntactic sugars, since additional constructs are a tax on the reader.

I don't think this follows from the metric. If a function contains three distinct operators, a language-specific operator that replaces all three of them in one go would reduce the "effort" of function. It's highly scenario-specific.

> Chaining together map/reduce/filter and other functional programming constructs (lambdas, iterators, comprehensions) may be concise, but long/multiple chains hurt readability

I don't think this follows either. One effect of these constructs when used right is that they replace other operators and reduce the "volume". Again this can go both ways.

> ...case in point, these code snippets aren’t actually equivalent!

That's a very language-specific diagnosis, and arguably points at hard-to-read language design in JS. The snippet otherwise doesn't look like JS, but I'm not aware of another language for which this would apply. Indeed it is also commonly known as a "null-safe operator", because most languages don't have separate "null" and "undefined".

> variable shadowing is terrible

> long liveness durations force the reader to keep more possible variables in their head.

These can arguably be contradictory, and that is why I am a huge fan of variable shadowing in some contexts: By shadowing a variable you remove the previous instance from scope, rather than keeping both available.

shortrounddev2

There's a cool plugin for vscode called Highlight[1] that lets you set custom regexes to apply different colors to your code. I think a common use of this is to make //TODO comments yellow, but I use it to de-emphasize logs, which add a lot of visual noise because I put them EVERYWHERE. The library I maintain uses logs that look like:

    this.logger?.info('Some logs here');
So I apply 0.4 opacity to it so that it kind of fades into the background. It's still visible, but at a glance, the actual business logic code pops out at you. This is my configuration for anyone who wants to modify it:

    //In vscode settings.json:
    "highlight.regexes": {
        "((?:this\\.)?(?:_)?logger(?:\\?)?.(debug|error|info|warn)[^\\)]*\\)\\;)": {
            "regexFlags": "gmi",
            "decorations": [{
                "opacity": "0.4"
            }]
        }
    },
---

[1] https://marketplace.visualstudio.com/items?itemName=fabiospa...

gwbas1c

> For long function chains or callbacks that stack up, breaking up the chain into smaller groups and using a well-named variable

> Is the second one marginally less efficient?

> Yes.

No, both versions are just as efficient:

In both versions, the same objects are allocated, stored on the heap, and garbage collected. The difference in efficiency comes down to the compiler.

For the second version, the compiler should observe that each variable is only used immediately after it's declared, and thus treat those objects as "out-of-scope" as if it's a chained function call.

superjan

I agree. After compiling it is even quite likely that the compiler does not care you gave a name to a return value (assuming you let it infer the variable type). What you will see in practice is that the intermediate is explicitly “materialized” (e.g. into a list), because the author wanted to inspect it in the debugger. That will have some cost, mostly in the form of avoidable allocations.

jt2190

Kudos to seeinglogic for trying to quantify what “readability” is. We need a lot more of this. (I feel like the most common definition of readability in use today is “readable to me“.)

I have a half-baked thought that we could find the actual dimensions of readability if we gave a test to a very large group of people and asked them to pick a sentence that describes what the code does. Each question would be timed. The questions that most people answered correctly in the shortest average time would provide us with examples of “real-world readable” code, and more importantly, might help us identify some truly not-readable practices.

I predict we’d see respondents start to cluster around various things, like “how long have they been programming?“, “do they understand programming paradigm X?“, etc. Perhaps the results would shift over time, as various things came into and out of fashion.

alpinisme

Yes one of the core challenges here is that we learn to read code. So what you learn to read and write shapes what you find readable. And lots of factors shape what you learn to read and write, including what you are trying to do, who you’re doing it with, what besides coding you knew how to do ahead of time, what other languages you know, etc. One stark possibility is that a lot of “readability” concerns after the low hanging fruit is gone (like don’t name variables with arbitrary irrelevant or misleading names) are really just about consensus building: maybe there are no right answers that transcend the particular group of programmers you are trying to work with.

bluGill

As an example, for my first decade of programming I worked on code where the coding style banned the ?: operator, so I didn't use it and found such code hard to read the few times I encountered it. Then I got a new job where one of the programmers really liked that operator and so I was forced to learn how to read it; now such code is more readable than the if statements to me - when used in the way we use it on this project.

James_K

I actually don't see any value in it. Code readability is similar to language readability in that it is mostly a concern for people who don't know a language and can be addressed by spending time with it. The real issue of programming is code complexity which you cannot determine from metrics about individual pieces of code. The problem exists in the relationships between functions rather than the implementation decisions in the bodies of those functions.

usrbinenv

That's very one-dimensional. It's usually easy to tell what the code does, but what's hard is to modify or add functionality to it. And this is because of various levels of abstractions that hide how things are interconnected.

usrbinenv

Another thing that comes to mind is the level at which one is familiar with a particular style of code organization & a set of abstractions. For example, Rails devs really have absolutely no problem getting up to speed with any given Rails app, but people who don't practice Ruby/Rails as their primary language/framework often complain how complicated and magical it is.

esafak

Poor abstractions. Good abstractions make it easier to change things, by decomposing the code into cohesive pieces with low coupling so that you can swap them out without having to think about the surrounding pieces beyond their interfaces. A good interface is small and logical.

bluGill

Good abstractions make it easy to change the things that will change. Abstractions always make some changes harder and some easier, but good ones make hard the things you wouldn't change anyway.

James_K

In my view, code complexity is best expressed in the size of its syntax tree, with maybe an additional term for the number of unique nodes. The real mistake here is the assumption that local reductions in complexity make a meaningful difference to overall complexity. Small local decreases in complexity may guide you towards the local minimum of complexity, but will never substantially change the complexity of the code-base overall. All measurements of code complexity are essentially as good as asking "how much code do you have".

bluGill

That is ultimately my problem with the article. It isn't a bad investigation, but it cannot stand alone. I never review one function in isolation. It always needs to be in the context of what calls it (and often what it calls).

Izkata

I think the only one I disagree with here is the function chains example. I may agree with a different example, but with this one I find the chained version without variables much easier to understand because I'm traversing the graph visually in my head, while the variables are additional state I have to keep track of in my head.

----

Really I was hoping this would be about actual visual patterns and not syntax. It's my major issue with how strict code linting/autoformatting is nowadays.

For example, the "black" formatter for python requires this:

    drawer.ellipse((10, 10, 30, 30), fill=(256, 256, 0))
    drawer.ellipse((370, 10, 390, 30), fill=(256, 256, 0))
    drawer.arc((20, 0, 380, 180), 15, 165, fill=(256, 256, 0), width=5)
The first argument is (x1, y1, x2, y2) of a bounding box and black wants to align x1 for "ellipse" with y1 of "arc". Do people really find that more readable than this?

    drawer.ellipse( (10,  10,  30,  30), fill=(256, 256, 0) )
    drawer.ellipse( (370, 10, 390,  30), fill=(256, 256, 0) )
    drawer.arc(     (20,   0, 380, 180), 15, 165, fill=(256, 256, 0), width=5 )
Or perhaps something more common in python, kwargs in function arguments. No spacing at all is standard python style:

    Foobar.objects.create(
        name=form.name,
        owner=user,
        location=form.location,
        source=form.source,
        created=time.now,
    )
Instead for me it's much easier to read this, since I don't have to scan for the "=" and mentally parse each line, it's just a table:

    Foobar.objects.create(
        name        = form.name,
        owner       = user,
        location    = form.location,
        source      = form.source,
        created     = time.now,
    )
But no linters/formatters accept such spacing by default. I think flake8 (linter) could be configured to ignore whitespace like this, but black (formatter) can't.

jraph

There are two issues with such alignment that may make the "less readable" version a bit better despite being, indeed, arguably less readable:

- a modification might lead you to realign several lines, making diffs noisier (though you can ignore white-spaces when displaying diffs, the commits still hold these lines, making navigating the history less straightforward)

- we all have our own preferences wrt alignment, and your way of aligning or what you decide to align might be different from someone else's, and this can lead to friction

Worse is probably better here, as much as I like aligned stuff, I think black is right in this case :-)

12_throw_away

Vertical alignments really makes me want to create a devtool stack that operates at more of an AST level and less of an "monospaced text + extra bells and whistles" level.

Aligning similar expressions for ease of reading seems like exactly the sort of thing an editor should display for us without requiring some arbitrary number of spaces to be stored in a text file ...

Izkata

An AST-level tool would have to automatically figure out what parts should be aligned; an alternative is to keep saving it in text but tweak the meaning of the "tab" character, so the developer still has control over what gets aligned: https://nick-gravgaard.com/elastic-tabstops/

Not great since viewing it in something that doesn't understand elastic tabstops would just be a mess, but it solves one of the issues the other response brings up, and I think some sort of user control like that is going to remain necessary either way.

mont_tag

When vertical alignment helps readability, you have to use `# fmt: off` / `# fmt: on`. Black simply doesn't have enough information to know when assignment alignment (or comment alignment) was done on purpose.