Don't Be Afraid of Types
232 comments
March 18, 2025
DeathArrow
chamomeal
Uncle Bob in particular is a funny case. I feel like his books are largely credited as being the source of those very particular OOP obsessions
But he eventually turned to the dark side. He claims clojure, a functional lisp, is his absolute favorite language.
He’s even got blog posts about it! Multiple!
I stumbled on this while reading about clojure. I really like his blog!
https://blog.cleancoder.com/uncle-bob/2019/08/22/WhyClojure....
turtleyacht
Holub [1] helped clarify this for me. Functional programming can express patterns just as well as OOP. Implementations--idioms--of a pattern can appear different but still retain its design purpose.
Previously, I thought FP was a way to happy-path incidental habits to avoid studying every pattern. But if patterns are discovered, arise out of independently invented idioms, then the best I could do is reinvent what everyone else has found worked for them (and turned out to be a design pattern).
It has also helped me look at Gang of Four (GoF) examples less literally--if we don't have exactly these classes, it's wrong--and more as context: matching a potential solution with a given problem.
The light bulb moment is when OS artifacts like filesystem, programming constructs like modules, and even some one-off scripts can also participate in a pattern implementation, not just having a specific constellation of classes.
[1] Holub on Design Patterns
veqq
> Functional programming can express patterns just as well as OOP.
No! Patterns are just crutches for missing language features or "design patterns are bug reports against your programming language." GoF patterns are concepts useful in OOP, but the recurring patterns and architectures you see in other paradigms are totally different. And they don't apply to Lisp: https://www.norvig.com/design-patterns/ Most don't even apply in Go: https://alexalejandre.com/programming/software-architecture-...
> visitor is type switching, singleton global variables, command and strategy are first class functions with closures, state is building a state machine with first class functions
If you could perfectly compress your code, repeated patterns would be factored away. Macros do this. In Lisp, when you find a pattern, you write code which generates that pattern, so you don't have to.
https://mishadoff.com/blog/clojure-design-patterns/
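To make the claim in the quote above concrete, here is a minimal Python sketch (all names invented, not from the linked posts): the strategy and command "patterns" shrink to passing plain functions around.

```python
# "Strategy" without the pattern: the strategy is just a function argument.
def total(prices, discount):
    return sum(discount(p) for p in prices)

def no_discount(p):
    return p

def half_off(p):
    return p / 2

# "Command" without the pattern: a closure captures its receiver and arguments.
def make_command(log, message):
    def run():
        log.append(message)
    return run

print(total([10, 20], no_discount))  # 30
print(total([10, 20], half_off))     # 15.0
```

No Strategy interface, no Command class hierarchy; first-class functions carry the same design intent.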
> if patterns are discovered, arise out of independently invented idioms
Yes. That's the point behind Christopher Alexander's pattern concept - he found architectural patterns which seemed to promote good social habits, happiness etc. Gabriel's Patterns of Software presents this far better than GoF. I strongly suggest you read it: https://www.dreamsongs.com/Files/PatternsOfSoftware.pdf
agumonkey
I believe OO was never meant to be reasoned about, it was just a way to avoid coupling just enough to avoid death. Also, even as a FP head, I think something is missing in FP for some domains.. where I go from object state soup, to function composition soup. At which point I'm in need for a kind of protocol algebra (sitting between pure FP and object graphs). Maybe haskellers do that in their own way, sadly I don't have the time nor the network to know more.
bayindirh
OO (as a data container or not) fits into some domains very well. Gonna get stuff from a database? Objects are great. Want to move something without being accidentally written/corrupted? Objects are great. Want to model a 3D object with per node/face/$ANYTHING properties, objects are great.
Do objects handle everything? Of course not, but having that capability at hand allows some neat tricks and tidy code.
I believe every problem needs a different mix of capabilities/features, and that particular mix directs my choice about which tools to use. This mindset always served me well, but who knows, maybe I'm the dumbest one in the room.
eterm
Objects are famously bad at (relational) databases, hence a near universal loathing of ORMs.
ColonelPhantom
What is a 'protocol algebra'? I googled the term but didn't find anything that seemed relevant.
agumonkey
Made that up. Say functions can be composed; that's the core algebra: f, g, h... and a compose operator. But often you need more involved logic and types that can't be encoded in a simple domain like Int and types like Int -> Int [0]. You need DB, logging, transactions, whatever lower level system is used. In OO you use inheritance to integrate all this through layered method calls.. I kinda describe this as a protocol. Problem is, OO is loose on types and mutability.. so I'd think there's a gap to fill between function algebras and these 'protocols'. A way to describe typed "function graphs" in a way that can be intercepted / adjusted without passing tons of functions as parameters.
Again that's a bedroom thought, maybe people do this with Category Theory in a haskell library, or caml modules, I'm just not aware of it.
[0] Then there are monadic types to embed a secondary type but that seems too restrictive still.
feoren
Mostly agree, but there are methods that truly are best co-located with their (immutable) data. A good example is Point.Offset(deltaX, deltaY) which returns a new point relative to the callee. Why force that into a static class?
There are plenty of examples where you want to use an abstraction over various immutable record types. Services vs. records is a false dichotomy and there is power in mixing the two.
Yes, there are lots of functions that don't make sense to co-locate with their operands. Static classes are fine for those functions if both of these are true: the function requires no dependencies, and there is only one reasonable implementation of the function. In practice I find it rare that both of these are true. With a good Dependency Injection system (and bad ones abound), requesting a service instance is just as simple as referencing a static class.
ansgri
Your Point example is indeed good, it shows one of the drawbacks of small 'proper' objects. How do you handle the case of offsetting thousands of points with a single operation, which in most cases will make sense and is readily vectorizable? It's better to expose the internals and use external functions here.
There's probably a deeper question, how to make objects 'transposable' in general case (like going from row-based to column-based representation) without duplicating code or exposing internals?
feoren
Honestly I wouldn't think much about offsetting thousands of points in my normal work. I'd expect that the compiler would do a good enough job of optimizing it and it's just as parallelizable (if necessary) as using a static method. Here I'm comparing against a static OffsetPoint(startPoint, x, y) type function. I don't see a performance difference there.
But you're right that it's commonly nice to operate on sets of things instead of individual things. If I were offsetting millions of points repeatedly, I'd look hard for a good data structure to optimize for whatever I'm trying to do.
"Exposing Internals" is not really the big issue here; the big issue is resilience against change. The time when it's appropriate to finely optimize CPU cycles is long after that code has settled into a very durable form. It's just that, for most systems, there's a lot more time spent in the volatile stage than the durable stage. Get your Big-O right and don't chatter over networks and you won't need to worry about performance most of the time. It's much rarer that you don't have to worry about change.
DeathArrow
>With a good Dependency Injection system (and bad ones abound), requesting a service instance is just as simple as referencing a static class.
DI in .NET is very good and you can access an object with ease through DI. Still, why use it? It's another layer between the caller and callee. Creating objects and resolving them through DI takes some CPU cycles without any major added benefits, and your code becomes more complex, so more things can go wrong.
beluchin
> Static classes are fine for those functions if ...
> there is only one reasonable implementation of the function
this. In the absence of polymorphism, a static function is just fine. I am in the phase of avoiding object notation in favor of functional notation whenever possible. I leave OO notation to cases where, for example, there is polymorphism, as the functional wrapper would add little to no value.
pin24
> Yes, there are lots of functions that don't make sense to co-locate with their operands.
May I ask for one or two examples?
feoren
I'd argue that any functions with arity higher than 1 (including "this"/"self"), where one operand is not clearly "primary" somehow, should live in some sort of "service" (and I am including static classes or global modules in this category). Bonus points if the function is used much more rarely than the type it operates on. I'd argue that Math.Add(x, y) is way better than x.Add(y), especially since 7.Add(8) looks a bit odd (if it even works in your language). Note that my preference includes the standard "7 + 8", since + is defined as a static function rather than as a method on an integer.
For a more complex example, let's consider mixing colors.
Say we have red = ColorRGB(196, 0, 0) and blue = ColorRGB(0, 0, 196) and our current task is "allow colors to be mixed". Which looks better:
purple = red.MixWith(blue) or
purple = Color.Mix(red, blue) ?
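For concreteness, here is a minimal Python sketch of both call shapes (the naive channel-averaging "mix" and all names are illustrative, not a real color library):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColorRGB:
    r: int
    g: int
    b: int

    # Method style: mixing lives on the color itself.
    def mix_with(self, other: "ColorRGB") -> "ColorRGB":
        return Color.mix(self, other)

class Color:
    # Service style: mixing is an operation over colors.
    @staticmethod
    def mix(a: ColorRGB, b: ColorRGB) -> "ColorRGB":
        return ColorRGB((a.r + b.r) // 2, (a.g + b.g) // 2, (a.b + b.b) // 2)

red = ColorRGB(196, 0, 0)
blue = ColorRGB(0, 0, 196)
assert red.mix_with(blue) == Color.mix(red, blue) == ColorRGB(98, 0, 98)
```

Both compute the same thing; the question is only where the operation is allowed to grow as more mixing modes appear.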
There are many different ways to mix colors, and many other things you might want to do with them too. When you start getting into things like red.MixHardLight(blue) vs.
ColorMixing.HardLight(red, blue)
The advantage of the latter becomes more clear. And it naturally extends into moving your mixing into an interface instead of a static class, so you can have "ColorMixer.Mix(red, blue)", etc. It's about focusing more on the operation you're doing than the nouns you're doing them on, which just tends to be a cleaner way to think about things. There's just a lot more variety in "different operations you can do with things" than in "types of things", at least in the kind of software development I've experienced.
icedchai
OOP eventually degrades into "lasagna" (a play on "spaghetti code".) Layers and layers of complexity like you describe. Once you have enough "layers" it becomes almost impossible to follow what is actually happening.
samiv
I don't think this is what lasagna means.
Lasagna is when you have your software organized in layers. In other words, instead of having a big ball of mud where A calls B calls C calls A calls C calls B, you have layers (like in lasagna) so that A calls B calls C, and you keep your code base so that classes/modules/types in a lower layer do not depend on or know of anything above them; the dependencies only go one way.
I love lasagna. It's great (both as design and as food) !
seanwilson
Not trying to be funny, but "Ravioli code" might be closer:
https://stackoverflow.com/questions/2052017/ravioli-code-why... https://en.wikipedia.org/wiki/Spaghetti_code#Ravioli_code
A related principle that I don't think is talked about enough is "locality": I'd rather have all the code about one feature in one file or close together, rather than it strewn across files where it's harder to read and understand as a whole. Traditional Java was notorious for being the opposite of this. Traditional HTML+CSS+JavaScript is also very bad for this problem.
icedchai
It may be a poor analogy but I’ve seen it used elsewhere. I didn’t make up the term, though I’ve certainly witnessed it.
pin24
If the encapsulation is implemented so that a) it is not possible to instantiate invalid objects of the type, and b) all modifications via the type's methods preserve the type's invariants (either by mutating it in place or by returning a new instance for every modification), then it is ensured at compile time that this type cannot be misused ("make invalid states unrepresentable"). If that particular logic lives somewhere else, that's not possible.
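In Python the checks happen at construction time rather than compile time, but the same discipline looks roughly like this (Percentage is a made-up example type):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Percentage:
    value: float

    def __post_init__(self):
        # (a) an invalid instance can never be constructed
        if not 0.0 <= self.value <= 100.0:
            raise ValueError(f"percentage out of range: {self.value}")

    def scaled(self, factor: float) -> "Percentage":
        # (b) every modification goes back through the constructor,
        # so the invariant is re-checked and invalid results rejected
        return Percentage(self.value * factor)

assert Percentage(50.0).scaled(1.5) == Percentage(75.0)
```

No code path in the program can ever hold an out-of-range Percentage, so callers never have to re-validate.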
throwaway2037
> because in OOP land
Speaking on behalf of enterprise CRUD devs (the vast majority of programmers), the Java POJO (pure data struct) is alive and well.
throw_m239339
Inheritance is one way to achieve polymorphism. It is not mandatory for OOP.
Unfortunately YES, because of the "Java Enterprise" style pushed by consultancies 15 years ago, a lot of developers insist on encapsulating everything, even when it's redundant.
Fortunately, Java is a better language today.
throwaway2037
> a lot of developers insist on encapsulating everything, even when it's redundant.
Can you give an example? 15 years ago was 2010 and Java 8 was already released.
> Fortunately, Java is a better language today.
In what ways? To me, it has barely changed since JDK 8, when it got lambdas. To be clear, the JVM is leaps and bounds better with each long term supported release.
rustyminnow
Sealed interfaces and records allow you to effectively build sum types. Switch expressions with type based pattern matching really streamline certain patterns. Type inferred vars are massively welcome. Streams and lambdas are pretty old by now, but they are much more widely embraced in e.g. frameworks and std libs which is nice.
Independently none of these are game changing, but together they provide much improved ergonomics and can reduce boilerplate. Now if only we had null safe types <3.
turdprincess
In that setup, how do you write a test which acts against a mock version of your API client (a static function in your proposal?)
nrook
Run a fake version of the API server, and then connect the real client to it. It is usually a mistake to make the "unit" of your unit test too small.
kdps
In this case, API does not refer to client/server. The API of the aforementioned static class is the set of its methods and their signatures.
btreecat
Generally, avoid mocks.
Run a copy of the server in a container, run client E2E tests against it.
MrDarcy
In Go at least you simply define an interface and use it as the receiver then inject whatever mock object you want.
The mock itself may simply be a new type defined in your test case.
beders
yeah, I wouldn't recommend trying to do this with pure Java but you could pass around method handles for that purpose.
You certainly would want to use an `interface` and that means you need an object. It could be an object that has no fields though and receives all data through its methods.
But it does go against the spirit of objects: You want to make use of `this` because you are in the land of nouns.
beders
The issue is names.
Every darn little thing in Java needs a name.
If there is no good name, that's a hint that maybe you don't need a new type.
Obligatory Clojure example:
(defn full-name [{:keys [first-name last-name]}]
(str first-name " " last-name))
This defines a function named `full-name`.
The stuff between [] is the argument list. There's a single argument. The argument has no name.
Instead it is using destructuring to access keys `:first-name` and `:last-name` of a map passed in (so the type of the unnamed argument is just Map). This function works for anything that has the keys `:first-name` and `:last-name`.
There's no need to declare a type ObjectWithFirstNameAndLastName. It would be quite silly.
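For comparison, the same shape in Python: any mapping with the two keys works, and no named type is ever declared.

```python
def full_name(person):
    # accepts any mapping with these two keys; no declared type needed
    return f"{person['first_name']} {person['last_name']}"

assert full_name({"first_name": "Ada", "last_name": "Lovelace"}) == "Ada Lovelace"
```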
motorest
> Every darn little thing in Java needs a name.
Don't all types need names, regardless of what language you use?
Look at typescript, and how it supports structural typing. They don't seem to have a problem with names. Why do you think Java has that problem when nominal type systems simplify the problem?
> There's no need to declare a type ObjectWithFirstNameAndLastName. It would be quite silly.
Naming things is hard, but don't criticize typing for a problem caused by your lack of imagination. A basic fallback strategy to name specialized types is to add adjectives. Instead of ObjectWithFirstNameAndLastName you could have NamedObject. You don't need to overthink it, with namespaces and local contexts making sure conflicts don't happen.
There are two mindsets: try to work around problems to reach your goals, and try to come up with any problem to find excuses to not reach your goals. Complaining about naming sounds a lot like the second.
josephg
> Don't all types need names, regardless of what language you use?
No.
- In very dynamic languages (like javascript), most types arguably don't have names at all. For example, I can make a function to add 2d vectors together. Even though I can use 2d vectors in my program, there doesn't have to be a 2d vector type. (Eg, const vecAdd = (a, b) => ({x: a.x+b.x, y: a.y+b.y}) ).
- Most modern languages have tuples. And tuples are usually anonymous. For example, in rust I could pass around 2d vectors by simply using tuples of (f64, f64). I can even give my implicit vector type functions via the trait system.
- In typescript you can have whole struct definitions be anonymous if you want to. Eg: const MyComponent(props: {x: number, y: string, ...}) {...}.
- There's also lots of types in languages like typescript and rust which are unfortunately impossible to name. For example, if I have this code:
#[derive(Eq, PartialEq)]
enum Color { Red, Green, Blue }
fn foo(c: Color) {
if c == Color::Red { return; }
// What is the type of 'c' here?
}
Arguably, c is a Color object. But actually, c must be either Color::Green or Color::Blue. The compiler understands this and uses it in lots of little ways. But unfortunately we can't actually name the restricted type in the program.
Rust can do the same thing with integers - even though (weirdly) it has no way to name an integer in a restricted range. For example, in this code the compiler knows that y must be less than 256 - so the if statement is always false, and it skips the if statement entirely:
https://rust.godbolt.org/z/3nTrabnYz
But - it's impossible to write a function that takes as input an integer that must be within some arbitrary range.
whilenot-dev
> Arguably, c is a Color object. But actually, c must be either Color::Green or Color::Blue. The compiler understands this and uses it in lots of little ways. But unfortunately we can't actually name the restricted type in the program.
I think that's less a question whether you can, but rather whether you should or shouldn't design it that way... (I'll use TypeScript here for the simpler syntax)
It'd be perfectly fine to do something like this:
type ColorR = 'red';
type ColorG = 'green';
type ColorB = 'blue';
type ColorRGB = ColorR | ColorG | ColorB;
But which constraint should your new type ColorGB (your variable c) adhere to?
// constraint A
type ColorGB = ColorG | ColorB;
// constraint B
type ColorGB = Exclude<ColorRGB, ColorR>;
I'd argue if the type ColorGB is only needed in the derived form from ColorRGB within a single scope, then just let the compiler do its control flow analysis, yes - it'll infer the type as constraint B.
But if you really need to reuse the type ColorGB (probably some categorization other than all the colors), then you'd need to pay close attention to your designed constraint.
relaxing
The named integer range thing is interesting. I guess it depends on what your goal is. Could you use asserts? Could you wrap the integer in an object and embed the restriction logic there?
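The wrapper idea from the question would look something like this in Python (a runtime check, so strictly weaker than the compile-time range knowledge described upthread; RangedInt is a made-up name):

```python
class RangedInt:
    """Wraps an int so the range restriction lives in one place."""

    def __init__(self, value: int, lo: int, hi: int):
        if not lo <= value <= hi:
            raise ValueError(f"{value} not in [{lo}, {hi}]")
        self.value, self.lo, self.hi = value, lo, hi

byte = RangedInt(200, 0, 255)   # ok
# RangedInt(300, 0, 255)        # raises ValueError
```

The restriction is enforced, but only when the program runs; the type system still can't reason about it.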
kirkhawley
"Most modern languages have tuples. And tuples are usually anonymous." Sure! And I LOVE ending up with var names that are always Item1 and Item2... Very descriptive!
gleenn
All types don't need names, exactly like the Clojure example shows. There is no "type" for the argument; likely it's a map under the hood. And maps with keywords are used broadly across Clojure projects as a way of passing groups of data, and no, you don't have to name that arbitrary collection of data. Rich Hickey has an amazing presentation on the silliness of a commonly used Java web request library where the vast majority of object types required could have just been little nested maps, instead of creating a ton of bespoke types that make it difficult to assemble and manipulate all the bits. The interface he roasts could be completely discarded and nearly no benefits lost in terms of readability or usability. Hickey is also famous for saying he would rather have a few data structures and 100 little algorithms instead of 10 data structures and 10 algorithms per data structure.
Capricorn2481
> Hickey is also famous for saying he would rather a few data structures and a 100 little algorithms instead of 10 data structures and 10 algorithms per data structure
This is a quote from Alan J. Perlis, not Rich Hickey, and it's certainly not what Rich is famous for.
noduerme
I think Java culture had something to do with the ridiculously verbose names, but even more so the prevalence of Factory and Singleton paradigms in Java created these issues. Maybe because it was the first OO language that a lot of procedural coders came to in the 90s. Those patterns became sort of escape hatches to avoid reasoning about ownership, inheritance and scope. They're still common patterns for emergencies in a lot of ECMA-whatever languages resembling Java: You need a static generator that doesn't own the thing it instantiated, and sometimes you only need one static object itself... but one reaches for those tools only as a last resort. After thinking through what other ways you could structure your code. The super-long-name thing in Java always felt to me like people trying to write GOTO/GOSUB type procedural programs in an OO language.
pjmlp
For whatever reason Java gets the blame for what was already common in Smalltalk, C++, Clipper 5, Object Pascal, Actor, Eiffel, Objective-C, ... before the Oak idea turned into Java.
To the point many think the famous patterns book used Java, when it is all about Smalltalk and C++ patterns.
re-thc
> I think Java culture had something to do with the ridiculously verbose names
People complain about this all the time but I'd rather take a verbose name than not knowing what is going on. Sometimes these naming conventions help.
Digging into older poorly named, structured and documented codebases is not fun.
motorest
> I think Java culture had something to do with the ridiculously verbose names, but even more so the prevalence of Factory and Singleton paradigms in Java created these issues.
It sounds like you're confusing things. Factories are a way to instantiate objects of a specific type in a certain way. A singleton is just a restriction on how many instances of a type there can be.
These are not Java concepts, nor are they relevant to the topic of declaring and naming types.
peterashford
tbf those are features of early Java frameworks (particularly the awful java enterprise crap)
jayd16
C# has anonymous types, for example. Kind of like tuples but you can name the fields.
GrantMoyer
The types of closures are unnamable in C++ and Rust; each closure has a unique type that can't be written out. Function types (that is, "function item" types) in Rust are also unnamable.
marcosdumay
You kind of answered your question, didn't you?
Because the types in typescript don't need names. And the type "object with firstName and lastName" is one such type that doesn't need a name.
So:
> They don't seem to have a problem with names.
Yes. The problem is much smaller there, and mostly caused by programmer cultures that insist on naming everything.
dharmab
I frequently use anonymous types in my unit tests in Go. I create a custom type describing the inputs, behaviors and expected outputs of a test case. I create a collection of these cases, and use Go's standard library testing package to run the test cases concurrently scaling to the CPU's available threads.
Here's a simple example: https://github.com/dharmab/skyeye/blob/main/pkg/bearings/bea...
jandrewrogers
Anonymous types are a thing in some languages. You also have adjacent concepts like anonymous namespaces in which you can dump types that require a name so that the names don’t leak out of the local context.
Sufficiently flexible namespaces do solve most of these problems. Java is kind of perverse though.
gleenn
Java:
Map m = new HashMap() {{ System.out.println("I am a unique subclass of HashMap with a single instance!");}};
tossandthrow
Now when you get to full functional languages, most will allow you to do
fullName = map list \(firstName, lastName) -> firstName + " " + lastName
and type it as `fullName: (String, String)[] -> String`.
I have worked on large scale systems in both typed and untyped languages and I cannot emphasize strongly enough how important types are.
noduerme
The only thing with anonymous functions is, when the boss says "please include every user's middle initial", you need to go find every instance of an inline function that resembles this. Consolidating that function in a getter in a class object called Person or User or Customer is a lot nicer.
tossandthrow
This is more a question about architecture.
But one thing is certain: When you have that one function that is used 165 times throughout the code base, having a type checker is certainly going to help you when you add in the user's middle initial.
Spivak
In Python you're describing a Protocol. It's actually super reasonable to have a ObjectWithFirstNameAndLastName noun like this. You don't ever need to construct one but you can use it in the type slot and objects you pass in will be checked to conform. You see all kinds of weird specific types floating around the standard lib like this for type hinting.
Duck typing is great; what's even better is documenting when they need to quack or waddle.
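A tiny sketch of such a Protocol, staying with the ducks:

```python
from typing import Protocol

class Duck(Protocol):
    def quack(self) -> str: ...

def greet(d: Duck) -> str:
    return d.quack()

class Mallard:
    # no inheritance from Duck; having the right method is enough
    def quack(self) -> str:
        return "quack"

assert greet(Mallard()) == "quack"
```

A type checker verifies conformance structurally at analysis time; at runtime it's still ordinary duck typing.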
wesselbindt
I think protocols have two major drawbacks regarding readability and safety. When I have a protocol, I cannot easily find its concrete implementations, so it becomes harder to see what the code is actually doing. As for safety, protocols have no way of distinguishing between
class Command:
def execute(self) -> None:
# some implementation
and
class Prisoner:
def execute(self) -> None:
# some other implementation
The implementor of the Prisoner class might not want Prisoner to be able to be slotted in where the Command class can. Your type checker will be of no help here. If you use abstract base classes, your type checker can prevent such mistakes.
So when it comes to your own code, the drawbacks of the structural Protocols in comparison to the nominal ABCs are pretty big. The pros seem non-existent. The pro, I guess, is that you don't have to type the handful of characters "(Baseclass)" with every concrete implementation.
But they do have one major advantage: if you have third party code that you have no control over, and there is some part of that codebase you want to replace with your own, and there's no convenient way to do something like the adapter pattern because it's somehow deeply nested, then a Protocol is a great solution.
sfvisser
Really depends on your intent. Ideally code has meaning that reflects your problem domain and not just what happens to work at the moment.
Code that just works right now never scales.
continuational
Did you forget to include `middle-name`?
There's no way to tell.
nottorp
If we're philosophizing here:
1. This (or maybe a less trivial form of this) will bite you in the ass when you end up using other people's unnamed types. Or even when you use your own unnamed types that come from code you haven't touched in three years.
2. That's what interfaces are for in Java. Or at least modern Java.
ido
I’ve first learned Java in introduction to programming in 2001 and that’s what interfaces were for back then already.
Viliam1234
Interfaces are more fundamental to Java than classes.
Sadly, at the beginning, many people came to Java from C/C++, and they did the thing we used to call "writing C/C++ code in Java".
kitd
This sounds like passing JS objects around and having dependencies between caller and callee on their content being undefined and assumed. I can't think of much worse than that for anything other than a trivial codebase.
At least in Javascript you have JSDoc.
rockyj
Not just names, but a separate file and a package to fit in. I need a small data object, sorry, you have to put it in a separate file and then think of the package it goes in and so on and so forth. Not to mention in Spring you then need to annotate it with something. That is why I say Java development is a pain.
darioush
Kind of disagree with this article, when you add a "noun" (aka type), you're often introducing a new abstraction.
Abstractions have a maintenance cost associated with them, i.e., another developer (or possibly yourself) must be able to recreate the "algebra" associated with that type (your thought process) at the time of making modifications. This creates some problems:
1. Since there's no requirement to create a cohesive algebra (API), there was probably never a cohesive abstraction to begin with.
2. Requirements may have changed since the inception of the abstraction, further breaking its cohesion.
3. Since we largely practice "PR (aka change) driven development", after a few substantial repetitions of step 2, now the abstraction has morphed into something that's actually very tied into the callsites (verbs), and is essentially now tech debt (more like a bespoke rube goldberg machine than a well-designed re-usable software component).
You can introduce types if you follow the open/closed principle which means you don't change abstractions after their creation (instead create new ones and then delete old ones when they have no callsites).
default-kramer
> Kind of disagree with this article, when you add a "noun" (aka type), you're often introducing a new abstraction.
The article is talking about simple "bundle of values" types; the example is the CreateSubscriptionRequest. This is not an abstraction. It is simply a declaration of all the fields that must be provided if you want to create a subscription. And it is usually superior to passing those N fields around individually.
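A sketch of that bundle-of-values idea (the field names here are guesses for illustration, not the article's actual ones):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CreateSubscriptionRequest:
    customer_id: str
    plan: str
    auto_renew: bool

def create_subscription(req: CreateSubscriptionRequest) -> str:
    # one value travels through the call chain instead of three loose arguments
    return f"subscribed {req.customer_id} to {req.plan}"

assert create_subscription(
    CreateSubscriptionRequest("c1", "pro", True)
) == "subscribed c1 to pro"
```

Adding a field later means touching the type once, not every signature in the chain.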
darioush
yeah this is correct, somehow I jumped to a different conclusion :)
yodsanklai
Not creating abstractions has a cost too. Let's say you have a string which is a user_id; just define a user_id type. That'll document the code, help avoid mistakes thanks to type checking, reduce cognitive load, ease refactoring and so on.
And if you need to change abstraction at some point, then refactor.
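In Python the user_id example is nearly free with typing.NewType (a sketch):

```python
from typing import NewType

UserId = NewType("UserId", str)

def load_user(user_id: UserId) -> str:
    return f"user:{user_id}"

uid = UserId("u-42")
assert load_user(uid) == "user:u-42"
# load_user("u-42")  # a type checker flags this, though it runs at runtime
```

Zero runtime overhead, and a plain string can no longer silently stand in for a user id under type checking.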
darioush
this isn't what I would call an abstraction; that's creating a named type. Named types are simple because their algebra is also simple and their maintenance cost is low.
The problem is more when you have types that "do things" and "have responsibilities" (usually to do things with other types they hold pointers to, but do not totally own); such a type is very difficult to maintain because there's now:
- a boundary of its responsibilities that is subjective,
- responsibility of building collaborators and initializing the type
- dealing with test doubles for the collaborators.
parpfish
my code became much easier to maintain once i stopped thinking of it as writing "algorithms" and "processes" and started thinking of it as a series of type conversions.
structuring what lives where became easier, naming things became systematic and consistent, and writing unit tests became simple.
tomtom1337
This resonates strongly! In Python, I often now find myself declaring Pydantic data structures, and then adding classmethods and regular methods on them to facilitate converting between them.
It makes for great APIs (dot-chaining from one type to another), well-defined types (parse, don’t validate) and keeps the code associated with the type.
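The shape of that style, sketched with plain dataclasses to stay dependency-free (RawUser/User are invented names standing in for the Pydantic models):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RawUser:      # data as it arrives, e.g. from an API
    name: str
    age: str

@dataclass(frozen=True)
class User:         # the parsed, well-typed form
    name: str
    age: int

    @classmethod
    def from_raw(cls, raw: RawUser) -> "User":
        return cls(name=raw.name, age=int(raw.age))  # parse, don't validate

user = User.from_raw(RawUser("Ada", "36"))
assert user.age == 36
```

Each conversion lives on the target type, so the program reads as a chain of type-to-type transformations.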
whilenot-dev
Exactly!
class Profile:
...
class User:
@classmethod
def from_profile(cls, profile: Profile) -> 'User':
...
def to_profile(self) -> Profile:
...
...are about all the methods I need in my data records. Three simple rules though:
1. Keep isomorphisms to one class only: Don't put two def to_${OTHER_MODEL_NAME} in each class; instead (like you said) create one static mapping (@classmethod) and one instance mapping
2. Add a mapping to the one class that feels more generalized out of the two: A more generalized data model will probably be used a lot more throughout the application
3. The creation of instances should be pure: If a mapping has side effects and needs to await something then it isn't just a mapping - first resolve all necessary dependencies, then do the mapping
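Filling in the skeleton above with plain dataclasses (the fields are made up): User is the more generalized model, so it owns both directions of the mapping, and both mappings are pure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    display_name: str
    email: str

@dataclass(frozen=True)
class User:
    name: str
    email: str

    # Rule 1: both directions live on one class only.
    # Rule 2: User is the more generalized model, so it owns them.
    @classmethod
    def from_profile(cls, profile: Profile) -> "User":
        return cls(name=profile.display_name, email=profile.email)

    # Rule 3: pure mapping, no I/O, no awaits, no mutation.
    def to_profile(self) -> Profile:
        return Profile(display_name=self.name, email=self.email)

user = User.from_profile(Profile(display_name="Ada", email="ada@example.com"))
```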
taberiand
I agree; in my experience, pretty much everything is ETL. We take data from one thing, change it a bit, and put it somewhere else. Sometimes, as a treat, we take data from two things, put them together, and then put that somewhere else.
Frontend, backend, databases, services, reports, whatever - ETL.
In that context, the types and transformations between types are the most important feature. (Everything is actually Category Theory)
namaria
I remember working on an ETL pipeline in Airflow years ago and thinking: "we're defining and programming an abstract computer".
I guess with general computers everything we do is basically defining nested specific computers. That's, I think, the insight behind SmallTalk and the original concept of objects it used: the objects were supposed to represent computers and the message passing was an abstract network layer.
necovek
Except that "ETL" is a terrible name for it: one of those generic acronyms, that even when expanded to "Extract, Transform, Load", needs explanation for at least the "E" and "L" parts.
To me, looking at it as "functional" approach instead (data in, operate over that data, data out) is cognitively simpler.
jt2190
If you ignore the weird rant about OOP (that references an article from 2006… haven’t we all moved on in the last twenty years?), the author’s main thesis lacks context:
> But take it from someone that’s had do deal with codes passing through and returning several values of strings, ints, and bools through a series of function calls: a single struct value is much easier to work with.
This presupposes that the code should be very strict with the data, which may or may not be desirable. For example, in many CRUD apps the client and the database enforce constraints, while the middle tier just needs to marshal data between the two, and it's questionable whether the middle tier should do type enforcement. As always: challenge your assumptions, try to understand how a "bad" practice might actually be the right approach in certain contexts, etc, etc.
karparov
> That’s what the type system is for: a means of grouping similar bits of information into an easy-to-use whole.
While types can be used for that, they are a much broader concept.
I would say the general purpose of types is to tell apples from oranges.
lurking_swe
not to mention having some actual confidence when making changes to a project! especially one you didn’t author.
DaiPlusPlus
...even in a strictly-statically-typed language with a perfectly expressive type-system, it would be unwise to rely _only_ on a "Build succeeded!" message for having confidence in any changes to the system: there's no substitute for well-trodden unit and integration tests for any codebase of nontrivial importance or complexity.
lurking_swe
it does eliminate an entire class of issues though. The typical "property not found" type of issue that's common in vanilla JavaScript, Python, etc.
Having types helps the IDE help YOU! That’s my favorite part about types and a strong IDE like Webstorm or IntelliJ. I agree it’s not a substitute for proper testing though.
paulddraper
Relatedly, don’t be afraid of (database) tables.
It’s okay, you really can have hundreds of tables, your DBMS can handle it.
Obviously don’t create them for their own sake, but there’s no reason to force reuse or generic design for different things.
necovek
I think with databases, it's natural to go for normalized forms for the benefits you get with a relational database.
Unless you have nobody with any semblance of DB design skills, you usually run into more friction when you want to de-normalize the DB instead.
At least, that's been my experience, but I did have the luck of mostly working with experts in DB design.
salmonellaeater
> I found that there’s a slight aversion to creating new types in the codebases I work in.
I've encountered the same phenomenon, and I too cannot explain why it happens. Some of the highest-value types are the small special-purpose types like the article's "CreateSubscriptionRequest". They make it much easier to test and maintain these kinds of code paths, like API handlers and DAO/ORM methods.
One of the things that Typescript makes easy is that you can declare a type just to describe some values you're working with, independent of where they come from. So there's no need to e.g. implement a new interface when passing in arguments; if the input conforms to the type, then it's accepted by the compiler. I suspect part of the reason for not wanting to introduce a new type in other languages like Java is the extra friction of having to wrap values in a new class that implements the interface. But even in Typescript codebases I see reluctance to declare new types. They're completely free from the caller's perspective, and they help tremendously with preventing bugs and making refactoring easier. Why are so many engineers afraid to use them? Instead the codebase is littered with functions that take six positional arguments of type string and number. It's a recipe for bugs.
motorest
> I've encountered the same phenomenon, and I too cannot explain why it happens.
I think that some languages lead developers to think of types as architecture components. The cognitive cost and actual development work required to add a type to a project is not the one-liner that we see in TypeScript. As soon as you create a new class, you have a new component that is untested and unproven to work, which then requires developers to add test coverages, which then requires them to add the necessary behavior, etc.
Before you know it, even though you started out by creating a class, you end up with 3 or 4 new files in your project and a PR that spans a dozen source files.
Alternatively, you could instead pass an existing type, or even a primitive type?
> But even in Typescript codebases I see reluctance to declare new types.
Of course. Adding types is not free of cost. You're adding cognitive load to be able to understand what that symbol means and how it can and should be used, not to mention support infrastructure like all the type guards you need to have in place to nudge the compiler to help you write things the right way. Think about it for a second: one of the main uses of types is to prevent developers from misusing specific objects if they don't meet specific requirements. Once you define a type, you need to support the happy flows and also the other flows as well. The bulk of the complexity often lies in the non-happy flows.
DaiPlusPlus
> I think that some languages lead developers to think of types as architecture components
It's not any languages doing that; it's their company culture doing that.
Java-style languages (esp. those using nominative typing, so: Java, C#, Kotlin, Swift, but not Go, Rust, etc) have never elevated their `class` types as illuminated representations of some grandiose system architecture (...with the exception of Java's not-uncontroversial one-class-one-file requirement); consider that none of those languages make it difficult to define a simple product-type class - i.e. a "POCO/POJO DTO". (I'll pre-empt anyone thinking of invoking Java's `java.beans.Bean` as evidence of the language leading to over-thinking architecture: the Bean class is not part of the Java language any more than the MS Office COM lib is part of VB).
The counter-argument is straightforward: reach for your GoF Design Patterns book, leaf through to any example and see how new types, used for a single thing, are declared left, right and centre. There's certainly nothing "architectural" about defining an adapter-class or writing a 10-line factory.
...so if anyone does actually think like that, I assume they're misremembering some throwaway advice that maybe applied to a single project they did 20 years ago - and perhaps the company doesn't have a meritocratic vertical-promotion policy and doesn't tolerate subordinates challenging diktats from the top.
> Think about it for a second: one of the main uses of types is to prevent developers from misusing specific objects if they don't meet specific requirements.
...what you're saying here only applies to languages like TypeScript or Python-with-hints - where "objects" are not instances-of-classes, but even then the term "type" means a lot more than just a kind-of static precondition constraint on a function parameter.
re-thc
> But even in Typescript codebases I see reluctance to declare new types.
The current Typescript hype / trend is to infer types.
Problem is, at some point it slows things down to a crawl and can get really confusing. Instead of a type mismatch between type A and type B, you get an error report that looks like a huge JSON chain.
jerf
This can't be the explanation for everything, but I do know that once upon a time just the sheer source-code size of the types was annoying. You have to create a constructor, maybe a destructor, there's all this syntax, and you have to label all the fields as private or public or protected and worry about how it fits into the inheritance hierarchy... and that all still applies in some languages. Even dynamic scripting languages like Python, where you'd think it would be easy, tended to need an annoying and often 100% boilerplate __init__ function to initialize them. You couldn't really get away with "class MyNewType(string): pass" in most cases.
But in many more modern languages, a "new type" is something the equivalent of
type MyNewType string
or data PrimaryColor = Red | Green | Blue
and if that's all your language requires, you really shouldn't be afraid of creating new types. With such a small initial investment it doesn't take much for them to turn net positive. You may need more, but I don't mind paying more to get more. I mind paying more just to tread water.
And I find they tend to very naturally accrete methods/functions (whatever the local thing is) that work on those types that pushes them even more positive fairly quickly. Plus if you've got a language with a halfway modern concept of source documentation you get a nice new thing you can document.
true_blue
>Even the dynamic scripting languages like Python that you'd think it would be easy tended to need an annoying and often 100% boilerplate __init__ function to initialize them.
For this reason, I very much appreciate the dataclass decorator. I notice that I define classes more often since I started using it, so I'm sure that boilerplate is part of the issue.
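For comparison, a sketch of the same type with and without the decorator:

```python
from dataclasses import dataclass

# The boilerplate __init__ that used to discourage declaring small classes:
class PointOld:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# With @dataclass the same type is one declaration, and you get
# __init__, __repr__, and __eq__ generated for free.
@dataclass
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
```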
Jtsummers
> But in many more modern languages, a "new type" is something the equivalent of
You don't even need a modern language for that kind of thing, plenty of languages from a half century or so ago also let you do that. From Ada (40+ years old):
type PrimaryColor is (Red, Green, Blue);
Or if you're content with a mere alias, C (50+ years old) for your first example:
typedef char* MyNewType;
jerf
It is true that modern is strictly speaking not the criterion. But what I was referring to is that the languages that were popular in, say, the 90s, generally did have that degree of ceremony.
C has its own problem. First, typedef isn't a new type, just an alias. But C's problem isn't the ceremony so much as the global namespace. Declaring a new type was nominally easy, but it had to have a crappy name, and you paid for that crappy name on every single use, with every function built for it, and so forth. You couldn't afford to just declare a new "IP" type because who knows what that would conflict with. A new type spent a valuable resource in C, a namespace entry. Fortunately modern languages make namespaces cheap too.
salgernon
I was refactoring a 700-line recursive C function (!) - one of those with all variables declared at the outer scope while the function itself was mainly a one-pass switch with goto's for error handling. I created C++ classes for each case, hoisted them out, and coalesced types that were otherwise identical. The new version was way smaller and (imho) far more readable and maintainable.
At some point I needed to change the types to capture a byte range from a buffer rather than just referring to the base+offset and length, and it was trivial to make that change and have it “just work”.
These were no vtable classes with inline methods, within a single compilation unit - so they just poof go away in a stripped binary.
‘Tis better to create a class, than never to have class at all. Or curse the darkness.
jongjong
I'm not a huge fan of types because they don't model the real world accurately. In the real world, most concepts which we describe with human language have many variations with optional properties, and it's a pain and a waste of time to try to come up with labels to categorize each one... It induces people to believe that two similar concepts are distinct when, in fact, they are extremely similar and should share the same name.

I hate codebases which have many different types for each concept... just because they have a few properties that are different. It overcomplicates things and creates logical boundaries in the system before they are needed or well defined. It forces people to categorize concepts before they understand the business domain.

People will argue that you can define types with optional properties; that's a fair point, but you lose some type safety if you do that... If you can do away with some type safety (with regard to some properties), surely you can do away with it completely?
Kinrany
"Modeling the real world" only makes sense when you're building a simulator of some kind.
Another similar approach is building a "digital twin" of something that is happening in the real world.
But very often the software is the only implementation, there's nothing to simulate.
myvoiceismypass
If you flipped things a bit and think of types as more or less "allowed shapes of the things happening in my program", it might seem closer to the real-world comparison that you are thinking of.
boris_m
Languages like Java are awful in that respect, as they make it super hard to declare new types.
People expect a type to contain some logic, but it doesn't have to. E.g. a configuration is a type that contains other types that contain yet other types. But I have never seen it done like that in languages like Java.
enriquto
As somebody who is afraid of types (and also, who hates types, because we all hate what we fear), may my point of view serve as balance: you don't need a type system if everything is of the same type. Programming in a type-less style is an exhilarating and liberating experience:
assembler : everything is a word
C : everything is an array of bytes
fortran/APL/matlab/octave : everything is a multi-dimensional array of floats
lua : everything is a table
tcl : everything is a string
unix : everything is a file
In some of these languages there are other types, OK, but it helps to treat these objects as awkward deviations from the appropriate thing, and to feel a bit guilty when you use them (e.g., strings in fortran).
zeta0134
I feel the need to issue a correction: while I'm programming in assembly, I very well have types. This word over here (screen position) represents a positive number only, but this one over here (character acceleration) can be negative. When adding one to the other, I need to check the arithmetic flags like so to implement a speed cap...
The types certainly exist. They're in my mind and, increasingly through naming conventions, embedded within some of the comments of my assembler code. But nothing is there to check me. Nothing can catch if I have made an error, and accessed a pointer to a data structure which contains a different type than I thought it did. Without a type system, that error is silent. It may even appear to work! Until 6 months later, when I rearrange my code and the types are arranged differently in memory, and only THEN does it crash.
nottorp
> increasingly through naming conventions
The original goal of Hungarian notation :) But Petzold mistakenly used 'type' in the paper, and we ended up with llpcmstrzVariableName instead of int mmWidth vs int pixelWidth, which was what they were doing in Office and frankly makes a lot of sense.
MrMcCall
But once you get down to the unit data values inside any of those aggregates, you're still dealing with either characters, ints, floats, strings, arrays, and they each have their own individual access patterns and, more importantly, modification functions.
You can't add a number to a string, only to another number.
If you are dealing with a float, you better be careful how you check it for equality.
If it's pure binary, what kind of byte is it? Ascii, unicode code point, unsigned byte, signed multi-byte int, ... whatever.
There's no escaping the details, friend.
And your saying "everything is a word" for assembler is just plain wrong.
SAI_Peregrinus
> You can't add a number to a string, only to another number.
Works in C, as long as the integer keeps the resulting pointer within the bounds of the allocation. See a trivial example[1].
MrMcCall
Ok, sure. But I doubt that's a good practice. In fact, I can't possibly imagine it not being a horrible idea.
So, I ask: what size and signedness of int? 1, 2, 4, 8? What if the string is of length 3, 2, 1, 0?
Why bother with all those corner cases. Everything has a memory layout and appropriate semantics of representation and modification. Pushing those definitions is a recipe for problems.
I like to keep it simple, keeping the semantics simple in how I code specific kinds of transforms.
The less kinds of techniques you use, the less kinds of patterns you have to develop, test, and ensure consistent application of across a codebase.
Especially down in C land, which is effectively assembler.
Gone are the days of Carmack having to save bytes in Doom, unless you're doing embedded work, in which case that's all the more reason to be very careful how you handle those bytes.
enriquto
> And your saying "everything is a word" for assembler is just plain wrong.
I also love x87 registers, but they are becoming rarer these days.
MrMcCall
Your profile says "just another C hacker".
I learned C from reading K&R in the late 80s.
Every C compiler I've worked with could output the code as assembler, so C is really a thin layer of abstraction that wraps assembler. Having programmed in pure assembler before, I understand the benefits of C's abstractions, which began with its minimal, but helpful, type system.
Should I not be taking you seriously?
We are not just talking with each other but sharing our expertise with those who may be reading.
Sometimes I forget that other people can just be unpleasant on purpose. I find no other explanation for your response.
erikerikson
> You can't add a number to a string
Haven't written JavaScript?
MrMcCall
I don't care if it "can" be done, the results will be garbage unless you are very, very careful.
And Javascript is garbage no matter how many people use it successfully, as I have done professionally.
SAI_Peregrinus
Or C. It just turns into pointer math. Godbolt example here[1], just make sure the `int` is an offset within the bounds of the char* and it's well-defined.
ks2048
If you design a large program in C where all your variables are "char*", I suppose "exhilarating" could be one word used to describe it.
The article's perspective would be that structs are useful, so use them liberally. And nearly all good, large C programs do, as far as I can tell.
Of course there are tradeoffs and you can take it too far. The article mentions that as well.
bluGill
I've programmed in typeless languages and they are great for small programs - less than 10,000 lines of code and 5 developers (these numbers are somewhat arbitrary, but close enough for discussion). As you get over that number you start to run into issues, because fundamentally your word / array of bytes / multi-dimensional array of floats / ... has deeper meaning, and when you get it wrong the code might parse and give a result, but the result is wrong.
Types give me a language enforced way to track what the data really means. For small programs I don't care, but types are one of the most powerful tricks needed for thousands of developers to work on a program with millions of lines of code.
wruza
I experience it often in teams of fewer than 2 developers and projects of less than 2,000 lines of code (that's still 50 pages, btw). It boils down to being able to load everything into your mind, and that heavily depends on the type of project, data/code models, IDE, etc., and also various factors unrelated to coding.
A human mind is a cache -- if you overload it, something will fly out and you won't even notice. Anyone who claims that types have no use probably doesn't experience overloads. If it works for them, good, but it doesn't generalize.
RodgerTheGreat
Sure! But many, many useful programs will never need to grow to millions of LoC or thousands of developers.
null
darioush
I'm a big fan of primitive types, in particular byte arrays.
It's okay to create a new data structure that combines some primitive data types in a "struct", like an array that tracks its length.
But we don't want to "build abstractions and associate behavior to them" (just associate behavior to data structures like push/pop).
whattheheckheck
Can you expand on this please?
darioush
Sure, in many languages we have the notation of thing.do_thing(arg1, arg2).
I suggest this is a good notation for data structures like, stack.push(10) or heap.pop()
I'm suggesting we don't use this notation for things like rules to validate a file, so I suggest we write validate(file, rules) instead of rules.validate(file).
Then we can express the rules as a data structure, and keep the (IMO unrelated) behavior separate. Note that we then don't need to worry about whether it should perhaps be file.validate(rules). Who does the validation belong to: the rules or the file? The abstractions created by non-obvious answers to "who does this behavior belong to" are generally problems for future changes.
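A small Python sketch of that style (Rule and validate are illustrative names): the rules are plain data, and validation is a free function rather than a method on either party.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Rules are just data; no behavior attached.
@dataclass(frozen=True)
class Rule:
    max_bytes: int
    allowed_suffixes: Tuple[str, ...]

# Validation is a free function: validate(file, rules), not rules.validate(file).
def validate(filename: str, content: bytes, rule: Rule) -> List[str]:
    errors = []
    if len(content) > rule.max_bytes:
        errors.append("too large")
    if not filename.endswith(rule.allowed_suffixes):
        errors.append("bad suffix")
    return errors

errs = validate("notes.txt", b"hello", Rule(max_bytes=10, allowed_suffixes=(".txt",)))
```

Because neither Rule nor the file "owns" validation, adding a new check later doesn't force a debate about which class it belongs to.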
usernamed7
Files have types though: .txt, .jpg, .bin, etc.
whilenot-dev
The filename suffix isn't much more than part of the filename (a simple variable name in that analogy) - it's more convention than constraint. Nobody is stopping you from giving your file the name you want (and the OS allows). You'd need literal magic[0] to infer an actual type.
"Everything is a file" rather refers to the fact that every resource in UNIX(-like) operating systems is accessible through a file descriptor: devices, processes, everything, even files =)
mdaniel
And the good(sic) thing about conventions: so many to choose from! .htm .html .HTML .jpeg
People might be afraid of types because in OOP land there's the idea that types aren't mere containers for data.
You have to use encapsulation, inheritance and polymorphism. Fields and properties shouldn't have public setters. You assign a value only through a method, otherwise you make the gods angry.
You have gazillions of constructors, static, public, protected, private and internal. And you have gazillions of methods, static, public, private, protected and internal.
You inherit at least an abstract class and an interface.
You have at least some other types composed in your type.
Unless you do all this, some people might not consider them proper types.
I had my engineering manager warn me that data classes are "anemic models". Yes, but why? "We should have logic and methods to set fields in classes". "Yes, but why?" "It's OOP and we do DDD and use encapsulation." "Yes, but why? Imagine we have immutable records that hold just data and static classes as function containers, and those functions just act on the records, return some new ones and change no state. This way we can reason with ease about our goddam software, especially if we don't encapsulate state and mutate it all over the place, Uncle Bob, be damned." He shook his head in horror. He probably thinks I am a kind of heretic.
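For what it's worth, the style being argued for here - immutable records plus pure functions that return new records - is only a few lines in Python (Account, deposit, and withdraw are made-up names):

```python
from dataclasses import dataclass, replace

# An "anemic" immutable record: just data, no behavior.
@dataclass(frozen=True)
class Account:
    owner: str
    balance: int

# Pure functions act on the record and return a new one; no state is mutated.
def deposit(account: Account, amount: int) -> Account:
    return replace(account, balance=account.balance + amount)

def withdraw(account: Account, amount: int) -> Account:
    if amount > account.balance:
        raise ValueError("insufficient funds")
    return replace(account, balance=account.balance - amount)

a0 = Account(owner="mel", balance=100)
a1 = withdraw(deposit(a0, 50), 30)  # a0 is untouched
```

Since nothing mutates, every intermediate state can be inspected and tested, which is exactly the "reason about our software with ease" argument above.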