[RFC] initoverlayfs

* [RFC] initoverlayfs - a scalable initial filesystem
@ 2023-12-08 17:59 Eric Curtin
  2023-12-09 12:46 ` Luca Boccassi
  2023-12-11  9:57 ` Lennart Poettering
  0 siblings, 2 replies; 49+ messages in thread
From: Eric Curtin @ 2023-12-08 17:59 UTC (permalink / raw)
  To: systemd-devel, initramfs
  Cc: Stephen Smoogen, Yariv Rachmani, Daniel Walsh, Douglas Landgraf

We have been working on a new initial filesystem called initoverlayfs.
It is a new filesystem that provides a more scalable approach to
initial filesystems as opposed to just using initrds. We are writing
this RFC to the systemd and dracut mailing lists (feel free to forward
to UAPI group also) because although this solution works without
changing the code in these projects, it operates in the same area as
systemd, udev, dracut, etc. and uses these tools.

Brief context:
--------------

initoverlayfs by default uses transient overlays rather than tmpfs to
create throwaway filesystems early in the boot sequence.

Why?

An initramfs has to be decompressed and copied to a tmpfs up front
before it can be used. This results in a situation where you end up
paying for every byte in an initrd in boot performance, even the ones
you don't use in a given boot.

This leads to a fear of using languages that result in larger binaries
sizes early boot, reusing libraries, etc. In some cases, reimplemented
minified versions of software components present in the rootfs are
used.

Alternatively, initoverlayfs uses erofs (with compression) and
overlayfs to achieve this, so you only pay for the bytes you actually
use.

There is also increased pressure from certain industries like
automotive, to start essential services in a boot sequence early.

Requirements:
-------------

An init system
An initramfs building tool
A device manager
overlayfs

Nothing that you wouldn't find in most Linux distributions today.

Design:
-------

Here is the boot sequence with initoverlayfs integrated, the
mini-initramfs contains just enough to get storage drivers loaded and
storage devices initialized. storage-init is a process that is not
designed to replace init, it does just enough to initialize storage
(performs a targeted udev trigger on storage), switches to
initoverlayfs as root and then executes init.

```
fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs

fw -> bootloader -> kernel -> storage-init   -> init ----------------->
```

Benefits:
---------

Scalability: You can put less emphasis on keeping this initial
filesystem small as you will only pay for the bytes you read. This is
probably the bigger picture than raw performance in the next point.

Performance: As this minifies the initramfs to contain only the most
basic storage initialization tasks, linux userspace starts earlier
than it would using just initramfs alone. Leaving all the other
software that require early throwaway filesystems to be executed in
the initoverlayfs. In the case of a Raspberry Pi 4 with sd card, it
leads to systemd starting ~300ms faster and in the case of a Raspberry
Pi 4 with NVMe SSD drive over USB it leads to systemd starting ~500ms
faster. There are some devices that by starting Linux userspace early,
you can expose a slowly initializing storage driver, leading to a
slower boot as with just an initramfs you mask this slow driver by
spending this time on decompression and copying. But a computer is
only as fast as it's slowest component, so if you care about super
fast boots, you need to optimize your storage drivers.

Flexibility: It is now easier to consider using fatter languages like
Rust, etc. Using libraries like graphics libraries, camera libraries,
libevent, glib, C++, etc. early boot can be considered. As you don't
have to decompress and copy this data upfront. This leads to easier to
maintain initrd software also, with more consolidation between rootfs
impelmentations and initial filesystem implementations of components.

Changes required in other projects:
-----------------------------------

There are no major changes required in other projects. Tools like
systemd-analyze might need to be updated to recognize this boot
sequence more accurately, because it has no awareness of
initoverlayfs.

Future plans:
-------------

We intend to propose this to Fedora, CentOS Stream, ostree and
non-ostree variants as we continue this project.

Feel free to try:
-----------------

It should work on most standard 3 partition non-ostree Fedora and
CentOS 9 installs (note: CentOS 9 kernel does not support erofs
compression, so Fedora is a better playground today). It's still in
alpha/beta state I guess. Although I successfully dogfood this on my
laptop and we hard tried this on a couple of different pieces of
hardware and VMs... Maybe run this on a non-critical piece of hardware
or a VM for the next few weeks if you want to try :)

git repo:

https://github.com/containers/initoverlayfs

Also checkout the README.md, there are some graphs and other information there:

https://github.com/containers/initoverlayfs/blob/main/README.md

rpm available in copr:

dnf copr enable @centos-automotive-sig/next
dnf install initoverlayfs
initoverlayfs-install

Is mise le meas/Regards,

Eric Curtin

^ permalink raw reply	[flat|nested] 49+ messages in thread