linux-kernel.vger.kernel.org archive mirror
From: Eric Curtin <ecurtin@redhat.com>
To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-unionfs@vger.kernel.org, linux-erofs@lists.ozlabs.org
Cc: "Daan De Meyer" <daan.j.demeyer@gmail.com>,
	"Stephen Smoogen" <ssmoogen@redhat.com>,
	"Yariv Rachmani" <yrachman@redhat.com>,
	"Daniel Walsh" <dwalsh@redhat.com>,
	"Douglas Landgraf" <dlandgra@redhat.com>,
	"Alexander Larsson" <alexl@redhat.com>,
	"Colin Walters" <walters@redhat.com>,
	"Brian Masney" <bmasney@redhat.com>,
	"Eric Chanudet" <echanude@redhat.com>,
	"Pavol Brilla" <pbrilla@redhat.com>,
	"Lokesh Mandvekar" <lmandvek@redhat.com>,
	"Petr Šabata" <psabata@redhat.com>,
	"Lennart Poettering" <lennart@poettering.net>,
	"Luca Boccassi" <bluca@debian.org>, "Neal Gompa" <neal@gompa.dev>
Subject: [RFC KERNEL] initoverlayfs - a scalable initial filesystem
Date: Mon, 11 Dec 2023 13:45:58 +0000
Message-ID: <CAOgh=Fwb+JCTQ-iqzjq8st9qbvauxc4gqqafjWG2Xc08MeBabQ@mail.gmail.com>

Hi All,

We have recently been working on something called initoverlayfs, and
we sent an RFC email about it to the systemd and dracut mailing lists
to gather feedback. This is an exploratory email, as we are unsure
whether a solution like this fits in userspace or kernelspace, and we
would like to gather feedback from the community.

To describe this briefly, the idea is to use erofs+overlayfs as an
initial filesystem rather than an initramfs. The first benefit is that
we can start userspace significantly faster, as we do not have to
unpack, decompress and populate a tmpfs upfront; we can rely on
transparent decompression (with lz4hc, for example) instead. What we
believe is the greater benefit is that we can have less fear of initial
filesystem bloat, because with transparent decompression you only pay
for decompressing the bytes you actually use.
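
As a rough illustration of the "only pay for what you use" point, here
is a back-of-the-envelope sketch in Python. The function names, the
4 KiB cluster size and the example sizes are assumptions made up for
this illustration; they are not measurements from initoverlayfs, and
real erofs cluster handling is more involved than simple rounding.

```python
MIB = 1024 * 1024


def upfront_cost_bytes(image_size: int) -> int:
    """Classic initramfs model: the whole archive is decompressed and
    unpacked into a tmpfs before init runs, whatever is used later."""
    return image_size


def on_demand_cost_bytes(bytes_read: int, cluster_size: int = 4096) -> int:
    """Transparent-decompression model: only the data actually read is
    decompressed, rounded up to whole (hypothetical) clusters."""
    if bytes_read < 0 or cluster_size <= 0:
        raise ValueError("bytes_read must be >= 0 and cluster_size > 0")
    clusters = -(-bytes_read // cluster_size)  # integer ceiling division
    return clusters * cluster_size


if __name__ == "__main__":
    image = 100 * MIB   # hypothetical "bells and whistles" image
    touched = 10 * MIB  # hypothetical amount read during boot
    print("upfront  :", upfront_cost_bytes(image), "bytes decompressed")
    print("on demand:", on_demand_cost_bytes(touched), "bytes decompressed")
```

In this simplified model, growing the image only grows the upfront
cost; the on-demand cost depends on what boot actually reads.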

We implemented the first version of this by creating a small
initramfs that contains only storage drivers, udev and a couple of
hundred lines of C code, just enough userspace to mount an erofs with
a transient overlay. Then we build a second initramfs, which has all
the contents of a normal everyday initramfs with all the bells and
whistles, and convert it into an erofs.

Then at boot time you basically transition to this erofs+overlayfs in
userspace, and everything works as it would in a traditional
initramfs.
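
To make the mount sequence concrete, here is a minimal dry-run sketch
in Python that only builds the commands; it does not run them. The
device path, the mount points and the function name are hypothetical,
and the actual C code in initoverlayfs may well do this differently.
The only parts taken as given are the erofs lower layer, a tmpfs-backed
(transient) upper layer, and the standard overlayfs lowerdir, upperdir
and workdir options.

```python
from typing import List


def plan_initoverlayfs_mounts(
    erofs_device: str,
    lower: str = "/initerofs",       # hypothetical read-only erofs mount point
    scratch: str = "/overlay",       # hypothetical tmpfs for upper/work dirs
    merged: str = "/initoverlayfs",  # hypothetical merged view
) -> List[List[str]]:
    """Return the commands (as argv lists) for an erofs lower layer with a
    transient tmpfs-backed overlay on top. Nothing is executed here."""
    upper = f"{scratch}/upper"
    work = f"{scratch}/work"
    overlay_opts = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return [
        ["mkdir", "-p", lower, scratch, merged],
        ["mount", "-t", "erofs", "-o", "ro", erofs_device, lower],
        ["mount", "-t", "tmpfs", "tmpfs", scratch],
        # upperdir and workdir must live on the same writable filesystem
        ["mkdir", "-p", upper, work],
        ["mount", "-t", "overlay", "overlay", "-o", overlay_opts, merged],
    ]


if __name__ == "__main__":
    # "/dev/sda3" is a placeholder for wherever the erofs image lives.
    for argv in plan_initoverlayfs_mounts("/dev/sda3"):
        print(" ".join(argv))
```

Because the upper layer sits on a tmpfs, any writes made during early
boot are discarded when the system moves on to the real rootfs.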

The current implementation looks like this:

```
From the filesystem perspective (roughly):

fw -> bootloader -> kernel -> mini-initramfs -> initoverlayfs -> rootfs

From the process perspective (roughly):

fw -> bootloader -> kernel -> storage-init   -> init ----------------->
```

But we have been asking whether we should implement this in
kernelspace instead, so that it looks more like:

```
From the filesystem perspective (roughly):

fw -> bootloader -> kernel -> initoverlayfs -> rootfs

From the process perspective (roughly):

fw -> bootloader -> kernel -> init ----------------->
```

The kinds of questions we are asking are: Would it be possible to
implement this in kernelspace so we could just mount the initial
filesystem data as an erofs+overlayfs filesystem without unpacking,
decompressing, copying the data to a tmpfs, etc.? Could we memory-map
the initramfs buffer and mount it like an erofs? What other
considerations should be taken into account?
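
One small piece of that question is how an initial filesystem buffer
could be recognised as an erofs image rather than a cpio archive. The
following is a userspace sketch in Python, not kernel code and not a
proposal for how the kernel should do it. It assumes the erofs
superblock magic (0xE0F5E1E2, little-endian, at byte offset 1024) and
the "newc" cpio magics ("070701"/"070702"); the function name is made
up for this illustration, and compressed cpio archives are not handled.

```python
import struct

EROFS_SUPER_OFFSET = 1024          # superblock starts 1 KiB into the image
EROFS_SUPER_MAGIC_V1 = 0xE0F5E1E2  # stored as a little-endian 32-bit value
CPIO_NEWC_MAGICS = (b"070701", b"070702")  # newc, with and without CRC


def sniff_initial_fs(buf: bytes) -> str:
    """Classify a buffer as 'cpio', 'erofs' or 'unknown' by magic number."""
    if buf[:6] in CPIO_NEWC_MAGICS:
        return "cpio"
    if len(buf) >= EROFS_SUPER_OFFSET + 4:
        (magic,) = struct.unpack_from("<I", buf, EROFS_SUPER_OFFSET)
        if magic == EROFS_SUPER_MAGIC_V1:
            return "erofs"
    return "unknown"


if __name__ == "__main__":
    # Build a fake 2 KiB buffer carrying only the erofs magic.
    fake = bytearray(2048)
    struct.pack_into("<I", fake, EROFS_SUPER_OFFSET, EROFS_SUPER_MAGIC_V1)
    print(sniff_initial_fs(bytes(fake)))
    print(sniff_initial_fs(b"070701" + b"0" * 104))
```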

Echoing Lennart, we must also "keep in mind from the beginning how
authentication of every component of your process shall work", as
that's essential to a couple of different Linux distributions today.

We kept this email short because we want people to read it and because
we want to avoid duplicating information from elsewhere. The effort is
described from different perspectives in the systemd/dracut RFC email
and the GitHub README.md. If you'd like to learn more, it's worth
reading the discussion on the systemd mailing list:

https://marc.info/?l=systemd-devel&m=170214639006704&w=2

https://github.com/containers/initoverlayfs/blob/main/README.md

We also received feedback informally in the community that it would be
nice if we could optionally use btrfs as an alternative.

Is mise le meas/Regards,

Eric Curtin

