You Need Subtyping

99 comments

·March 26, 2025

masklinn

> If a function accepts a value that can be String or null, it is ok to pass it a value that is guaranteed to be a String and not null. Or in other words, String is a subtype of String?!

That's... not true? The language can just have an implicit conversion rule for convenience.

> properties like nullability that don’t affect memory layout

That's not true either? It might not affect memory layout because all your objects are boxed or you have niche value optimisation you can use to tag nulls, but make an unboxed integer nullable and you'll definitely see your memory layout be affected.

3np

> That's... not true? The language can just have an implicit conversion rule for convenience.

It is true regardless. What you are talking about is a language implementation detail that is orthogonal to reasoning about the type system. Depending on your definition (reasonable people disagree and could be contextual), the memory layout is also rarely a concern when modeling type theory.

However, I can't make sense of the idea that "most languages don't have subtypes". Every language I can think of above ASM has at least some. In C, int8_t is a subtype of int16_t, for example.

And the idea that this is new territory... Here they contrast subtypes in C++ to that of "the new language Java".

https://www.cs.princeton.edu/courses/archive/fall98/cs441/ma...

I agree with the comment that OP really should make it explicit what they mean. It presents itself as "what is subtypes" but they unfortunately use the same term for different concepts in the same text, which I guess contributes to the wider confusion in this thread. Inserting a few "Algebraic ..." and making the context explicit in a place or two.

Or put differently: Algebraic subtypes are a subtype of subtypes.

kibwen

> What you are talking about is a language implementation detail that is orthogonal to reasoning about the type system.

You can handwave away these implementation details in a high-level language, but a thoughtfully-designed low-level language needs to consider whether or not any given language feature can feasibly be implemented without imposing an undesirable runtime cost.

This is also a pretty poor example for us to index on, because the idea of a function taking a `String?` (or `Option<String>` or whatever) is already so bizarre that's it naturally derails the discussion for want of a better example. It only makes sense to write such a function explicitly for users who do already have the nullable/optional type; if you have the non-nullable/non-optional version of the type, you should just be calling functions defined on that type.

zarakshR

> The language can just have an implicit conversion rule for convenience

the presence of an implicit conversion rule `T -> T?` amounts to the observation that `T <: T?`, where <: is the subtyping relation

> make an unboxed integer nullable ...

I don't think any language allows this, in any case disallowing nullability for unboxed types amounts to the observation that `P !<: P?`, where !<: is "does not subtype"

I believe (unless I have misunderstood you) that both your examples are subsumed (heh) by subtyping

lmm

> the presence of an implicit conversion rule `T -> T?` amounts to the observation that `T <: T?`, where <: is the subtyping relation

Not necessarily, because you might consider it acceptable for the implicit conversion to change the memory layout in this sense.

> I don't think any languages allow this

Plenty do, e.g. Rust's Option works that way.

> in any case disallowing nullability for unboxed types amounts to the observation that `P !<: P?`, where !<: is "does not subtype"

Saying the same thing with fancier words doesn't explain anything. The point is you can't simply treat non-nullable as a subtype of nullable in general, because this case exists.

dwattttt

> Not necessarily, because you might consider it acceptable for the implicit conversion to change the memory layout in this sense.

Does the actual data layout impact the observation?

If you have A, something that accepts B, and you consider it implicitly possible for an A to be a B with either no change or an implicit change... that seems to amount to considering As to be Bs when necessary.

zarakshR

> The point is you can't simply treat non-nullable as a subtype of nullable in general, because this case exists.

But surely, you can still use subtyping in other cases -- when it is already unboxed -- right?

Like so: `T <: T?` for all boxed `T`.

masklinn

> the presence of an implicit conversion rule `T -> T?` amounts to the observation that `T <: T?`, where <: is the subtyping relation

So you assert that numbers and strings are the same type because PHP and Javascript will implicitly convert back and forth? That any C++ implicit constructor creates a subtyping relationship?

> I don't think any language allows this

Rust, Swift, Zig, just off the top of my head. C# too, value types is where and why it first introduced explicit nullability (back in C# 2.0).

zarakshR

> because PHP and Javascript will implicitly convert back and forth

If you can provide (valid!) methods `T -> U` and `U -> T` for two types, why wouldn't `T = U` hold? (Atleast for types where `=` makes sense)

This is the definition I am using:

> S is a subtype of T, written S <: T, if a value of type S can safely be used in any context where a value of type T is expected.

from Pierce's "Software Foundations"

Of course, you may not want to make `String <: Number` and `Number <: String` (and thus `String = Number`), because there is no sensible way to do so; but this is an issue with the way PHP/JS handles subtyping, not with the notion of subtyping itself and certainly does not apply to the example of nullable types.

Unfortunately, I am not familiar enough with C++ to comment on the other question, sorry!

marcosdumay

Yes, numbers and strings are the same type in Javascript and PHP. And yes, a C++ constructor declares a subtype relation.

That's the semantics of those languages.

meltyness

Yeah I think this is "Deref coercion" in Rust.[0]

For example since Vec can Deref to Slice, you can syntactically run Slice methods on a Vec; meaning Vec is a subtype of Slice.

[0] https://doc.rust-lang.org/std/ops/trait.Deref.html#deref-coe...

naasking

> The language can just have an implicit conversion rule for convenience

By "it is ok pass a String where we expect a String?", I think he means that no such program can go wrong, therefore this is sound. Whether you do this via an implicit conversion or a subtyping relation doesn't really matter.

kragen

I don't think subtyping and implicit coercions are really equivalent. What if your implicit coercions aren't transitive, or if different chains of possible implicit coercions give different results, or if implicit coercions change the in-memory representation of a value? In the last case, for example, reflectively examining the dynamic type will give different results.

3np

They aren't equivalent, they're in different domains and abstraction levels. Subtypes are an abstract notion in type theory and grammar while coercions are implementation details/syntax.

> What if your implicit coercions aren't transitive

Have an example in mind?

naasking

I didn't say they were equivalent, I said they were both sound in that context. By "doesn't really matter", I meant in the sense that I had just described, eg. that it "cannot go wrong".

clord

I think that whole section of the post was "For example, in Kotlin," presumably these are true in that language.

manmal

In Swift, String? is an alias to Optional<String>. Optional is an enumeration with two cases: some, and none, where the former has the non-nil value attached. I‘m not sure how that ties in with subtyping though.

ackfoobar

> how that ties in with subtyping

In Swift it does not. But in Kotlin, `String?` is not an enum (sum type), but instead means `String | null`, a union type.

wesselbindt

Maybe I'm off the mark here, but in my mind the definition of subtype is as follows: t1 is a subtype of t2 if any instance of t1 is an instance of t2. Surely with this definition String is a subtype of String?. Every instance of String (i.e. every string) is an instance of String?, after all.

Post-submit clarity edit: Ah, I think I understand. You're objecting to OP's definition of subtyping, as it breaks down for certain language implementations.

seanmcdirmid

String <: String? is true (Any non-null string is a possibly null string), but String? <: String isn’t true (Any possibly null string may not be a non-null string).

wesselbindt

We're all in agreement there it seems.

tlb

Implicit conversions from Foo? to Foo are a major footgun, because the point of returning a Foo? is to have the caller check for failure before assuming there's a Foo to use. Even C++ avoids the implicit conversion from optional<T> to T.

kragen

Does C++ support implicit coercion from T to optional<T>? I think it doesn't, but that would be the corresponding case.

naasking

Yes, but they were describing an implicit conversion going the other way around, eg. String <: String?.

pansa2

> existing programming languages have little or no subtyping [...] If you remember the fad for Object Oriented Programming

I was under the impression that even today, most code is written in an object-oriented style (or at least, written in languages that support OOP). Therefore, most languages support subtyping via inheritance. Is that not true?

josephg

> I was under the impression that even today, most code is written in an object-oriented style (or at least, written in languages that support OOP).

Whoa there - those are very different claims! Most modern languages are multi-paradigm. You can write Python in an OO way, or a data oriented way, or a functional way, or whatever you like. The same is true of JavaScript/typescript. And modern Java. And lots of languages.

Just because the language has the class keyword doesn’t mean everything can be subtyped. If you have an entity-component system, you can’t necessarily inherit from your Entity type. Likewise I can’t inherit from a react component, just because JavaScript has classes. (At least not using modern react)

kllrnohj

> Most modern languages are multi-paradigm. [..] And modern Java.

Modern Java is still very definitely OOP and not multi-paradigm. Other JVM languages, like Scala or Kotlin, add more multi-paradigm features to that ecosystem, but modern Java doesn't.

tkcranny

Java has made some big strides here to catch up with the functional and data-oriented features other languages are all adding.

I was pretty surprised and impressed to see a modern example lately. There’s Record data types, stronger type inference with `var`, pretty great structural pattern matching, switch expressions (that return a value), stream apis for map, filter, and reduce etc. All of these move it further away from strictly OOP.

And separate but still cool is a pretty neat virtual thread system too. Java has definitely had a bit of a glow up since I learned it in uni.

diroussel

It tries to be OOP, but you can still implement a program with int, int[] and static methods.

And you can’t subtype those.

Other JVM languages, like scala, are more OO than Java.

pansa2

> Whoa there - those are very different claims!

I suspect that the stronger claim is true - but even the weaker claim would invalidate the article's statements that "existing programming languages have little or no subtyping" and that OOP was a "fad" and has been left in the past.

mrkeen

Sate your curiosity and read the next 21 words following your quote.

> , “subtyping” might call to mind classes and inheritance hierarchies. However, subtyping is a far more basic and general notion than that.

adelmotsjr

That does not address the fact that yes, inheritance is one type of subtyping polymorphism, and that many languages have them via OO paradigm.

Yes, there are other forms of subtyping, but saying that many languages don't have subtyping just because they don't have these other forms of subtyping is either cluelessness or satire.

dataangel

You're absolutely correct, just look at the top languages and what's idiomatic in them

kragen

Some languages have separate inheritance and subtyping relations, or have inheritance and no static typing, but yes, this statement makes it clear that the author is either astoundingly clueless, intentionally dishonest, or joking.

pyjarrett

> However, existing programming languages have little or no subtyping

Every "type" you use in Ada is actually a subtype.

In Ada, "A subtype of a given type is a combination of the type, a constraint on values of the type, and certain attributes specific to the subtype".

You don't ever refer to actual types in Ada, since creating a type with "type" actually creates an anonymous type and the name refers to the first subtype.[1]

    type Token_Kind is (
       Identifier,
       String_Literal,
       --  many more ...
    
       -- These are character symbols and will be part of Token_Character_Symbol.
       Ampersand,
       --  ...
       Vertical_Bar,
    );
    
    --  A subtype with a compile-time checked predicate.
    subtype Subprogram_Kind is Token_Kind
    with Static_Predicate => Subprogram_Kind in RW_Function | RW_Procedure;
    
    subtype Token_Character_Symbol is Token_Kind range Ampersand .. Vertical_Bar;

> If you have a pointer with more permissions, it should still be usable in a place that only requires a pointer with fewer permissions

This is exactly how "accessibility" of access types works in Ada[2]. If you have a pointer ("access type") to something on the heap, you can use it wherever an anonymous access type can be used. You can also create subtypes of access types which have the constraint of a "null exclusion".

[1]: https://ada-lang.io/docs/arm/AA-3/AA-3.2#p1_3.2.1

[2]: https://ada-lang.io/docs/arm/AA-3/AA-3.10

jolux

Seems like "Why You Need Algebraic Subtyping" might be a better title for this post.

ivanjermakov

I think more common name for this is structured typing, so it's not confused with subtyping which is a kind of polymorphism.

Rusky

Algebraic subtyping and structured typing are not the same thing. Algebraic subtyping is a specific approach to type inference and checking of subtyping, which may or may not be used with structural types.

ijustlovemath

I hear what the author is saying, but I think if composability is your goal, you should look no further than traits/interfaces. Accomplishes the same goals but actually far more general (which, according to the author, is a good thing)

kragen

Trait and interface systems use a subtyping relation; they aren't alternatives to a subtyping relation.

smadge

I think subtyping is the general concept in the programming language / type literature and interfaces / traits are an example of subtyping. If A implements interface B, A is a subtype of B.

ijustlovemath

thanks for clarifying that!

stephantul

What is the meaning of the word “boxed” in this article? Bit confused about it.

tialaramex

If you're familiar with Rust it's what's different about Box<T> compared to T, if you know Java, all their "objects" are boxed it's the difference between Integer and the primitive int type.

In C++ it's approximately std::unique_ptr<T> in a lot of the high level languages (indeed Java I mentioned above fits this if you're not familiar with the details) boxing is silently done for you and you are unaware of it unless you care about fine details.

Think of an arbitrary type Goose, where "is" the Goose? Assume for a moment there's data associated so the Goose does need to be stored somewhere, the two usual candidates are "the stack" and "the heap". If you've no idea what those are then I'm afraid you need a bit more CS to really engage with this conversation. Boxing puts things on the heap, typically with just a pointer to that box kept on the stack.

mostlylurks

Not represented directly in memory in its raw form right where the value is placed, but rather stored somewhere else (usually the heap) / in an opaque manner and accessed via an indirection, such as a pointer / reference. Often involved in making things polymorphic, because different types usually have different representations in memory even if they implement the same interface, so they need to be boxed.

DeathArrow

Boxing means transforming a variable of a value type into a variable of reference type.

//boxing var a = 5; //int, value type

string b = (string)a; //string, reference type

Unboxing is the reverse process of transforming variables of reference types to value types.

A value type variable holds the value directly, while a reference type variable holds a reference that points to a memory address where the actual value is found.

kragen

This is true in some circumstances but not in general. Consider how integers are represented in Emacs Lisp, OCaml, or Smalltalk. They're unboxed, but boxing them like CPython does wouldn't change their semantics, so it doesn't seem correct to say that it would change their type. And there's no way in the language to box them. Fundamentally, boxing is a question about implementation, not semantics.

I think a more generally correct statement would be:

> An unboxed value can be held in CPU registers directly, while a boxed value is referenced through a memory address where the actual value is found.

masklinn

Usually it's used for saying something is heap allocated.

In this case it doesn't seem very relevant, it looks like the article just assumes boxed records are dynamically or structurally typed, even though that's completely unrelated.

SideburnsOfDoom

> Boxing is the process of converting a value type (on the stack, passed by value) to the type "object". When the runtime boxes a value type, it wraps the value inside a object instance and stores it on the managed heap (passed by reference). Unboxing extracts the value type from the object.

https://learn.microsoft.com/en-us/dotnet/csharp/programming-...

gugagore

Subtyping is one form of type polymorphism. Others include:

- parametric polymorphism (generics, templates, ...)

- ad hoc polymorphism (type classes are the nicest example)

- https://en.wikipedia.org/wiki/Row_polymorphism I think this one is the most relevant for the record discussion.

a3w

After reading the article, I propose this gist / title:

"In high level programmming languages, you expect subtyping as a given",

since we expect a lot of features to work as if subtyping was the way it was implemented?

continuational

This would be a lot more convincing if served with an example of a language where subtyping doesn't lead to more rough edges.

jyounker

Dude seems to be coming from a strange place if he thinks that subtyping is uncommon in programming languages. Java, Kotlin, Scala, C++, Python, Go, and C# all have subtyping of one sort or another, and that covers at least 80% of the code being written today.

null

[deleted]

null

[deleted]

HN

You Need Subtyping

You Need Subtyping