From: Luca Boccassi <bluca@debian.org>
To: Demi Marie Obenour <demi@invisiblethingslab.com>
Cc: Lennart Poettering <mzerqung@0pointer.de>,
	Eric Curtin <ecurtin@redhat.com>,
	initramfs@vger.kernel.org,  systemd-devel@lists.freedesktop.org,
	Stephen Smoogen <ssmoogen@redhat.com>,
	 Yariv Rachmani <yrachman@redhat.com>,
	Douglas Landgraf <dlandgra@redhat.com>
Subject: Re: [RFC] initoverlayfs - a scalable initial filesystem
Date: Mon, 11 Dec 2023 21:45:45 +0000
Message-ID: <CAMw=ZnTTaMK1S6HCm3FL8YkgbYFuvZNQmbPjSC=kfc98=j7MJw@mail.gmail.com>
In-Reply-To: <ZXd9EVT494r0tuC_@itl-email>

On Mon, 11 Dec 2023 at 21:20, Demi Marie Obenour
<demi@invisiblethingslab.com> wrote:
>
> On Mon, Dec 11, 2023 at 08:58:58PM +0000, Luca Boccassi wrote:
> > On Mon, 11 Dec 2023 at 20:43, Demi Marie Obenour
> > <demi@invisiblethingslab.com> wrote:
> > >
> > > On Mon, Dec 11, 2023 at 08:15:27PM +0000, Luca Boccassi wrote:
> > > > On Mon, 11 Dec 2023 at 17:30, Demi Marie Obenour
> > > > <demi@invisiblethingslab.com> wrote:
> > > > >
> > > > > On Mon, Dec 11, 2023 at 10:57:58AM +0100, Lennart Poettering wrote:
> > > > > > On Fr, 08.12.23 17:59, Eric Curtin (ecurtin@redhat.com) wrote:
> > > > > >
> > > > > > > Here is the boot sequence with initoverlayfs integrated, the
> > > > > > > mini-initramfs contains just enough to get storage drivers loaded and
> > > > > > > storage devices initialized. storage-init is a process that is not
> > > > > > > designed to replace init, it does just enough to initialize storage
> > > > > > > (performs a targeted udev trigger on storage), switches to
> > > > > > > initoverlayfs as root and then executes init.
> > > > > > >
> > > > > > > ```
> > > > > > > fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs
> > > > > > >
> > > > > > > fw -> bootloader -> kernel -> storage-init   -> init ----------------->
> > > > > > > ```
> > > > > >
> > > > > > I am not sure I follow what these chains are supposed to mean? Why are
> > > > > > there two lines?
> > > > > >
> > > > > > So, I generally would agree that the current initrd scheme is not
> > > > > > ideal, and we have been discussing better approaches. But I am not
> > > > > > sure your approach really is useful on generic systems for two
> > > > > > reasons:
> > > > > >
> > > > > > 1. no security model? you need to authenticate your initrd in
> > > > > > >    2023. There's no excuse for not doing that anymore these days. Not
> > > > > >    in automotive, and not anywhere else really.
> > > > > >
> > > > > > 2. no way to deal with complex storage? i.e. people use FDE, want to
> > > > > >    unlock their root disks with TPM2 and similar things. People use
> > > > > >    RAID, LVM, and all that mess.
> > > > > >
> > > > > > Actually the above are kinda the same problem in a way: you need
> > > > > > complex storage, but if you need that you kinda need udev, and
> > > > > > services, and then also systemd and all that other stuff, and that's
> > > > > > why the system works like the system works right now.
> > > > > >
> > > > > > Whenever you devise a system like yours by cutting corners, and
> > > > > > declaring that you don't want TPM, you don't want signed initrds, you
> > > > > > don't want to support weird storage, you just solve your problem in a
> > > > > > very specific way, ignoring the big picture. Which is OK, *if* you can
> > > > > > actually really work without all that and are willing to maintain the
> > > > > > solution for your specific problem only.
> > > > > >
> > > > > > As I understand you are trying to solve multiple problems at once
> > > > > > here, and I think one should start with figuring out clearly what
> > > > > > those are before trying to address them, maybe without compromising on
> > > > > > security. So my guess is you want to address the following:
> > > > > >
> > > > > > 1. You don't want the whole big initrd to be read off disk on every
> > > > > >    boot, but only the parts of it that are actually needed.
> > > > > >
> > > > > > 2. You don't want the whole big initrd to be fully decompressed on every
> > > > > >    boot, but only the parts of it that are actually needed.
> > > > > >
> > > > > > 3. You want to share data between root fs and initrd
> > > > > >
> > > > > > 4. You want to save some boot time by not bringing up an init system
> > > > > >    in the initrd once, then tearing it down again, and starting it
> > > > > >    again from the root fs.
> > > > > >
> > > > > > For the items listed above I think you can find different solutions
> > > > > > which do not necessarily compromise security as much.
> > > > > >
> > > > > > So, in the list above you could address the latter three like this:
> > > > > >
> > > > > > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > > > > > >    loader load the erofs into contiguous memory, then use memmap=X!Y on
> > > > > > >    the kernel cmdline to synthesize a block device from that, which
> > > > > > >    you then mount directly (without any initrd) via
> > > > > > >    root=/dev/pmem0. This means your boot loader will still load the
> > > > > > >    whole image into memory, but only decompress the bits actually
> > > > > > >    needed. (It also has some other nice benefits I like, such as an
> > > > > >    immutable rootfs, which tmpfs-based initrds don't have.)
> > > > > >
> > > > > > > 3. Simply never transition to the root fs, don't mark the initrd in
> > > > > >    systemd's eyes as an initrd (specifically: don't add an
> > > > > >    /etc/initrd-release file to it). Instead, just merge resources of
> > > > > >    the root fs into your initrd fs via overlayfs. systemd has
> > > > > >    infrastructure for this: "systemd-sysext". It takes immutable,
> > > > > >    authenticated erofs images (with verity, we call them "DDIs",
> > > > > >    i.e. "discoverable disk images") that it overlays into /usr/. [You
> > > > > >    could also very nicely combine this approach with systemd's
> > > > > > >    portable services, and nspawn containers, which operate on the same
> > > > > >    authenticated images]. At MSFT we have a major product that works
> > > > > >    exactly like this: the OS runs off a rootfs that is loaded as an
> > > > > > >    initrd, and everything that runs on top of this is just these
> > > > > >    verity disk images, using overlayfs and portable services.
> > > > > >
> > > > > > 4. The proposal in 3 also addresses goal 4.
> > > > > >
> > > > > > Which leaves item 1, which is a bit harder to address. We have been
> > > > > > > discussing this off and on internally too. A generic solution to this
> > > > > > is hard. My current thinking for this could be something like this,
> > > > > > covering the UEFI world: support sticking a DDI for the main initrd in
> > > > > > > the ESP. The ESP is by definition unencrypted and unauthenticated,
> > > > > > but otherwise relatively well defined, i.e. known to be vfat and
> > > > > > discoverable via UUID on a GPT disk. So: build a minimal
> > > > > > single-process initrd into the kernel (i.e. UKI) that has exactly the
> > > > > > storage to find a DDI on the ESP, and set it up. i.e. vfat+erofs fs
> > > > > > drivers, and dm-verity. Then have a PID 1 that does exactly enough to
> > > > > > jump into the rootfs stored in the ESP. That latter then has proper
> > > > > > file system drivers, storage drivers, crypto stack, and can unlock the
> > > > > > real root. This would still be a pretty specific solution to one set
> > > > > > of devices though, as it could not cover network boots (i.e. where
> > > > > > there is just no ESP to boot from), but I think this could be kept
> > > > > > relatively close, as the logic in that case could just fall back into
> > > > > > > loading the DDI that normally would sit in the ESP fully into
> > > > > > memory.
> > > > >
> > > > > I don't think this is "a pretty specific solution to one set of devices"
> > > > > _at all_.  To the contrary, it is _exactly_ what I want to see desktop
> > > > > systems moving to in the future.
> > > > >
> > > > > It solves the problem of large firmware images.  It solves the problem
> > > > > of device-specific configuration, because one can use a file on the EFI
> > > > > system partition that is read by userspace and either treated as
> > > > > untrusted or TPM-signed.
> > > >
> > > > All those problems are already solved, without inventing a new shell
> > > > scripting solution - we have DDIs and credentials. This is the exact
> > > > > opposite of the direction we are pursuing: we want to _kill_ all this
> > > > > initrd-specific infrastructure, tools, build systems, dependency
> > > > > management and so on, because they are difficult to maintain, they
> > > > > create a completely different environment than what is "normally" run,
> > > > > and they end up reinventing everything the 'normal' image does. We
> > > > want to build initrds from packages - as in normal distribution
> > > > packages, not special sauce initrd-only packages, so that the same
> > > > code and the same configuration is used everywhere, in different
> > > > > runtime modes. Because that's what distributions are good at:
> > > > creating package-based ecosystems, with good tooling, infrastructure
> > > > and so on.
> > > >
> > > > The end goal is to build images without initramfs-tools/dracut and
> > > > just using packages, not to stick yet another glue script in front of
> > > > them, that needs yet more special initrd-only arcane magic to put
> > > > together, in order to save a handful of KBs.
> > >
> > > The initramfs being a RAM filesystem is exactly why keeping it small is
> > > so critical.  Lennart's suggestion solves this problem by eagerly
> > > loading an image from disk, which is much less size-constrained.  One
> > > would use distribution packages to build this on-disk image.
> >
> > This is already solved by using extension DDIs for optional packages.
>
> What about non-optional packages?  The goal is to _require_ the on-disk
> image to boot, so that full-featured UI toolkits can be used to e.g.
> prompt for LUKS passphrases.  Ideally, the initramfs would be as minimal
> as possible.

You can use DDIs for anything you want, outside of systemd itself.

> > > > And for ancient, legacy platforms that do not support modern APIs, the
> > > > old ways will still be there, and can be used. Nobody is going to take
> > > > > away grub and dracut from the internet, if you have some special corner
> > > > > case where you want to use them, they will still be there, but the fact
> > > > that such corner cases exist cannot stop the rest of the ecosystem
> > > > that is targeted to modern hardware from evolving into something
> > > > better, more maintainable and more straightforward.
> > >
> > > The problem is not that UEFI is not usable in automotive systems.  The
> > > problem is that U-Boot (or any other UEFI implementation) is an extra
> > > stage in the boot process, slows things down, and has more attack
> > > surface.
> >
> > Whatever firmware you use will have an attack surface, the interface
> > it provides - whether legacy bios or uefi-based - is irrelevant for
> > that. Skipping or reimplementing all the verity, tpm, etc logic also
> > increases the attack surface, as does adding initrd-only code that is
> > never tested and exercised outside of that limited context. If you are
> > running with legacy bios on ancient hardware you also will likely lack
> > tpm, secure boot, and so on, so it's all moot, any security argument
> > goes out of the window. If anybody cares about platform security, then
> > a tpm-capable and secureboot-capable firmware with a modern, usable
> > interface like uefi, running the same code in initrd and full system,
> > using dm-verity everywhere, is pretty much the best one can do.
>
> Neither Chrome OS devices nor Macs with Apple silicon use UEFI, and both
> have better platform security than any UEFI-based device on the market I
> am aware of.

We are talking about Linux distributions here. If one wants to use
proprietary systems, sure, there are better things out there, but
that's off topic.
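
For readers following the erofs-in-memory idea quoted earlier (memmap=X!Y plus
root=/dev/pmem0), here is a minimal sketch of how such a kernel command line
fragment could be assembled. It assumes the kernel's documented
memmap=nn[KMG]!ss[KMG] form, where the first value is the region size and the
second is its start address. The helper name and the example numbers are
hypothetical; the real start address depends on where the boot loader placed
the image.

```python
def pmem_cmdline(size_bytes: int, start_addr: int, device: str = "/dev/pmem0") -> str:
    """Build a kernel cmdline fragment that marks a RAM region as persistent
    memory and uses the resulting block device as the root filesystem.

    Hypothetical helper for illustration only: size_bytes and start_addr must
    describe the region the boot loader actually loaded the image into.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    if start_addr < 0:
        raise ValueError("start_addr must not be negative")
    # memmap=<size>!<start>: the kernel parses both numbers with base
    # auto-detection, so hexadecimal with a 0x prefix is accepted.
    return f"memmap={size_bytes:#x}!{start_addr:#x} root={device}"


if __name__ == "__main__":
    # Example values only: a 64 MiB image placed at the 4 GiB boundary.
    print(pmem_cmdline(64 * 1024 * 1024, 4 * 1024**3))
```

This only formats the string; reserving the region and loading the image there
remains the boot loader's job.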
