Build a Container Image from Scratch

59 comments

·March 18, 2025

godelski

I often wonder, why isn't systemd-nspawn[0] used more often? It's self-described as "chroot on steroids". IME it pretty much lives up to that name. Makes it really easy to containerize things and since it integrates well with systemd you basically don't have to learn new things.

I totally get these are different tools and I don't think nspawn makes docker or podman useless, but I do find it interesting that it isn't more used, especially in things you're using completely locally. Say, your random self-hosted server thing that isn't escaping your LAN (e.g. Jellyfin or anything like this)

[0] https://wiki.archlinux.org/title/Systemd-nspawn

cmeacham98

Because Docker/OCI/etc got the most important part right (or at least much better than the alternatives): distribution.

All you need to start running a Docker container is a location and tag (or hash). To update, all you do is bump the tag (or hash). If a little more complicated setup is necessary (environment variables, volumes, ports, etc) - this can all be easily represented in common formats like Docker compose or Kubernetes manifests.

How do you start running a systemd-nspawn container? Well first, you bootstrap an entire OS, then deal with that OS's package manager to install the application. You have to manage updates with the package manager yourself (which likely aren't immutable). There's no easy declarative config - you'll probably end up writing a shell script or using a third party tool like Ansible.

There have been many container/chroot concepts in the past. Docker's idea was not novel, but they did building and distribution far better than any alternative when it first released, and it still holds up well today.

ranger207

Yeah, this. Docker/container's greatest feature is less the sandboxing than the distribution. The sandboxing is essential to making the distribution work well, but it's a side feature most of the time

placardloop

It’s kind of funny that people think of “sandboxing” as the main feature of containers, or even as a feature at all. The distribution benefits have always been the entire point of Docker.

The logo of Docker is a ship with a bunch of shipping containers on it (the original logo was clearer, but the current logo still shows this). “Containers” has never been about “containment”, but about modularity and portability.

guappa

Docker containers ran as root by default for a great number of years. I'm not even sure if it has now finally been changed.

They provided no sandboxing whatsoever.

godelski

Sorry. I agree, but that's a different question. I'll circle back to that then. Why don't technical people make these interfaces, giving the user experience the same love that something like Docker gets? As you said, it is scriptable, and I think -- us all being programmers here -- we all know that means you can just make the interface easier.

prmoustache

Are you implying that docker or podman haven't been made by _technical people_?

vaylian

> I often wonder, why isn't systemd-nspawn[0] used more often?

I think most people simply don't know about it. A lot of people also don't know that there are alternatives to Docker.

I use both, systemd-nspawn and podman containers. They serve different purposes:

systemd-nspawn: Run a complete operating system in a container. Updates are applied in-place. The whole system is writeable. I manage this system myself. I also use the -M switch for the systemctl and journalctl commands on the host to peek into my nspawn-containers. I create the system with debootstrap.

podman: Run a stripped down operating system or just a contained executable with some supporting files. Most of the system is read-only with some writeable volumes mounted at well-defined locations in the file system tree. I don't manage the container image myself and I have activated auto-updates via the quadlet definition file. I create the container based on an image from a public container registry.

Both solutions have their place. systemd-nspawn is a good choice if you want to create a long-lived linux system with lots of components. podman/docker containers are a good choice if you want to containerize an application with standard requirements.

systemd-nspawn is good for pet containers. podman is good for cattle containers.
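A rough sketch of the nspawn side of that workflow (debootstrap, then systemd-nspawn, then -M from the host), assuming a Debian-style host with root privileges. The machine name `demo`, the suite `stable`, and the `run` helper are illustrative assumptions, and the commands are only printed by default because they need root and network access:

```shell
#!/usr/bin/env bash
# Sketch only: machine name "demo" and suite "stable" are illustrative assumptions.
MACHINE="demo"
ROOT="/var/lib/machines/${MACHINE}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

nspawn_workflow() {
  run debootstrap stable "$ROOT"        # bootstrap a minimal Debian root filesystem
  run systemd-nspawn -D "$ROOT" passwd  # one-off command inside the tree, e.g. set a root password
  run machinectl start "$MACHINE"       # boot the tree as a managed container
  run systemctl -M "$MACHINE" status    # inspect units inside the container from the host
  run journalctl -M "$MACHINE" -b       # read the container's journal from the host
}

nspawn_workflow
```

Setting `DRY_RUN=0` would execute the commands for real; everything after the bootstrap step is ordinary systemd tooling pointed at the container.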

fuhsnn

I just started learning to set up containers and found nspawn a total convenience: just create ./usr, throw some static-linked binaries into ./bin, and systemd-nspawn -D would handle the rest, including network pass-through.

wvh

I used this extensively at the time Docker was up and coming. It worked well, much faster than Docker volumes, but required a lot of scripting and clean-up. What Docker got right, apart from distribution, is better separation of host system and whatever mess you are creating. You do not want to make a mistake bootstrapping an OS or forgetting to `chroot` to the right volume.

magicalhippo

> Say, your random self-hosted server thing that isn't escaping your LAN (e.g. Jellyfin or anything like this)

I tried reading your link but I'm none the wiser, so perhaps you could provide the docker-equivalent one-liner to start a Jellyfin instance using systemd-nspawn?

godelski

There isn't a one liner because no one has built it. Which, to be clear, also had to be done for docker.

I'll admit, the documentation for really anything systemd kinda sucks, but awareness can help change that

magicalhippo

Ok, so I misread your question.

You're asking why hasn't anyone made something like Docker but with systemd-nspawn as the runtime or "engine".

edit: Found this article[1], which tries to do just that. Still not as convenient as Docker, but doesn't look terrible either.

[1]: https://benjamintoll.com/2022/02/04/on-running-systemd-nspaw...

bionsystem

What is the advantage of nspawn vs lxc? I use lxc extensively at the moment to test ansible recipes and it works pretty well even though I'm constrained to an old version.

godelski

It uses the host kernel so you can get better performance. You can also do all the resource and capability management that you can do with systemd, so you have a bit more control over the level of containerization and resource management

mrbluecoat

or nspawn vs apptainer for that matter

jmholla

If the author is here, I think there's a typo in this. In section 1.4, you start working from the scratch layer, but the content continues to refer to alpine as the base layer.

    FROM scratch
    
    COPY ./hello /root/
    
    ENTRYPOINT ["./hello"]

> Here, our image contains 2 layers. The first layer comes from the base image, the alpine official docker image i.e. the root filesystem with all the standard shell tools that come along with an alpine distribution. Almost every instruction inside a Containerfile generates another layer. So in the Containerfile above, the COPY instruction creates the second layer which includes filesystem changes to the layer before it. The change here is “adding” a new file—the hello binary—to the existing filesystem i.e. the alpine root filesystem.

prakashdanish

Thanks for pointing that out, I'm curating a PR with all the suggestions from here, should be fixed soon!

psnehanshu

That, and they also added the "time" command in the config with the scratch base image.

mortar

Just learnt about whiteout files from this, thanks! I'm trying to understand whether purposely including a file whose name starts with the whiteout prefix “.wh.” in a layer would mess with the process that is meant to obfuscate that prefix from subsequent layers.
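For context, a minimal sketch of how a layer entry name might be classified, assuming the OCI image layer convention that a `.wh.` prefix on the basename marks a deletion and the special name `.wh..wh..opq` marks an opaque directory. The helper name `classify_entry` is hypothetical. Because the prefix is reserved, a file deliberately named `.wh.something` cannot be told apart from a whiteout under this scheme:

```shell
#!/usr/bin/env bash
# classify_entry PATH: report how a layer tar entry would be interpreted
# under the ".wh." whiteout naming convention. Hypothetical helper.
classify_entry() {
  local path="$1"
  local base="${path##*/}"     # basename of the entry
  local dir=""
  case "$path" in
    */*) dir="${path%/*}/" ;;  # parent directory, with trailing slash
  esac

  case "$base" in
    # Opaque marker: hide everything under dir that came from lower layers.
    .wh..wh..opq) printf 'opaque %s\n' "${dir:-./}" ;;
    # Plain whiteout: hide one path that came from a lower layer.
    .wh.*)        printf 'delete %s%s\n' "$dir" "${base#.wh.}" ;;
    # Anything else is a regular addition or modification.
    *)            printf 'add %s\n' "$path" ;;
  esac
}

classify_entry "etc/.wh.motd"
classify_entry "var/cache/.wh..wh..opq"
classify_entry "root/hello"
```

The opaque marker has to be checked first, since it also matches the general `.wh.*` pattern.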

m463

I learned about $_

  echo abc && echo $_
  abc
  abc

except it's used with wget...

  wget URL && tar -xvf $_

does this work? Shouldn't tar take a filename?

hmm... also, it says there is an alpine layer with "FROM scratch"??

godelski

$_ is the last argument. Here's a better example to illustrate

  > echo 'Hello' 'world' 'my' 'name' 'is' 'godelski'
  Hello world my name is godelski
  > echo $_
  godelski
  > !:0 !:1 !:2 "I'm" "$_"
  Hello world I'm godelski

The reference manual is here[0] and here's a more helpful list[1]

One of my favorites is

  > git diff some/file/ugh/hierarchy.cpp
  > git add $_
  ## Alternatively, but this is more cumbersome (but more flexible)
  !!:s^diff^add

So what is happening with wget is

  > wget https://dl-cdn.alpinelinux.org/alpine/v3.18/releases/x86_64/alpine-minirootfs-3.18.4-x86_64.tar.gz && tar -xvf $_
  ## Becomes
  > wget https://dl-cdn.alpinelinux.org/alpine/v3.18/releases/x86_64/alpine-minirootfs-3.18.4-x86_64.tar.gz
  > tar -xvf https://dl-cdn.alpinelinux.org/alpine/v3.18/releases/x86_64/alpine-minirootfs-3.18.4-x86_64.tar.gz

Which, you are correct, doesn't work.

It should actually be something like this

  > wget https://dl-cdn.alpinelinux.org/alpine/v3.18/releases/x86_64/alpine-minirootfs-3.18.4-x86_64.tar.gz -O alpine.tar.gz && tar xzf $_

This would work as the last parameter is correct. I also added `z` to the tar and removed `-` because it isn't needed. Note that `v` often makes untarring files MUCH slower

[0] https://www.gnu.org/software/bash/manual/html_node/Bash-Vari...

[1] https://www.gnu.org/software/bash/manual/html_node/Variable-...

ryencoke

If you want to add in another bash trick called Parameter Expansion[0] you can parse out the filename automatically with the special variable $_. Something like:

  > wget https://dl-cdn.alpinelinux.org/alpine/v3.18/releases/x86_64/alpine-minirootfs-3.18.4-x86_64.tar.gz && tar xzf ${_##*/}

[0] https://www.gnu.org/software/bash/manual/html_node/Shell-Par...
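A network-free sketch of that expansion, assuming bash. The URL is a placeholder and `:` stands in for the wget call; the point is only that `${_##*/}` strips everything up to the last slash, which matches the name wget normally saves under:

```shell
#!/usr/bin/env bash
# Placeholder URL; nothing is downloaded here.
url="https://example.invalid/releases/rootfs.tar.gz"

# ':' is a no-op builtin, so this only makes the URL the last argument
# of the previous command, the way the wget call would.
: "$url"
last_arg="$_"                # capture immediately; every command overwrites $_

file_name="${last_arg##*/}"  # remove the longest prefix matching '*/'

printf '%s\n' "$last_arg"
printf '%s\n' "$file_name"
```

Copying `$_` into a named variable first is a design choice for scripts, since any command in between would replace its value.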

alkh

I am surprised that this is working, as I always thought that variables get initialized after the full command is parsed. So, I would assume that $_ would be related to the previous command (defined by a new line) and not this one, because there's no newline character here, but only an ampersand.
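A small sketch of why it works, assuming bash: the whole `&&` list is parsed up front, but each simple command in it is expanded and executed in turn, so `$_` has already been updated by the time the right-hand side is expanded. The variable names are illustrative:

```shell
#!/usr/bin/env bash
# Both commands sit on one line; $_ on the right-hand side is expanded only
# after the left-hand command has finished.
same_line="$(: first && printf '%s' "$_")"

# An ordinary variable behaves the same way: the assignment on the left has
# taken effect by the time the right-hand side is expanded.
x=old
x=new && after="$x"

printf '%s\n' "$same_line" "$after"
```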

fragmede

TIL! I use alt-. for that when running interactively, good to know there's a way to do that in a script

sudahtigabulan

  !!:s^diff^add

This is enough:

  ^diff^add

mdaniel

It's not an alpine layer, it's a Dockerfile construct representing basically an empty tar file layer: <https://docs.docker.com/build/building/base-images/#create-a...> and <https://github.com/moby/moby/pull/8827>

m463

He says:

  FROM scratch

  COPY ./hello /root/

  ENTRYPOINT ["./hello"]

But I thought "FROM scratch" was an empty container, while "FROM alpine" is a container with alpine libs/executables.

otherwise using "FROM scratch" to populate for example an ubuntu image would pollute the container.

prakashdanish

You're right, that doesn't work the way it is shown. Thankfully, a reader (I'm not sure if it's you) pointed this out with a solution[1] that I plan to add to the post shortly. Again, thanks for pointing this out!

[1] - https://github.com/danishprakash/danishpraka.sh/issues/30

DeathArrow

By containers here the author seems to understand Docker containers. But there are other types of containers like Linux/OpenVZ containers, Windows containers etc.

adminm

Yep. Also containers used in the shipping industry. You might have yet another type in your fridge.

The thing is that because Docker started the craze, the word "container" without further context in the IT world has come to mean Docker container.

prakashdanish

Yes that's what I meant, but while not specifically Docker containers, I did mean Linux containers that are most commonly managed by container engines such as Podman or Docker.

tmaly

Is there a Windows version?

donno

Windows is very similar; the differences are in the layer tarballs.

The file system appears in a Files sub-directory as there is a Hives sub-directory for containing the Windows Registry.

The other difference is that there are two extra PAX headers within the tarball: MSWINDOWS.fileattr, which is "32" for a regular file and "16" for a directory, and MSWINDOWS.rawsd, which is a special encoding of the security descriptor, which you can think of as the owner, group and permissions associated with the file (their standard values can be seen in buildkit here: https://github.com/moby/buildkit/blob/22156ab20bcaea1a1466d2...)

I haven't looked into how to handle the Windows Registry aspect as in my exploration I was focused on simply adding a pre-built executable so I didn't need any registry entries created.

The other fun gotcha is to ensure the ENV section contains PATH set to c:\\Windows\\System32;c:\\Windows, otherwise you would be unlikely to be able to run any Windows executable.

stackskipton

Registry is best handled by copying in a .reg file and using CMD reg import blah.reg in the Dockerfile

kritr

Running the container on Windows is probably a lot more complicated because there’s no obvious built in chroot + mount filesystem command (at least from memory).

I believe they’re built on silos. I believe containerd itself is probably as low in the container runtime as you’d want to go… See https://github.com/microsoft/hcsshim for the actual bindings.

fazeirony

would WSL work? (sorry, not used windows in a hot minute...)
