To: Jonathan Corbet,
	Alexander Viro
Cc: Ignat Korchagin
Subject: [PATCH v3] mnt: add support for non-rootfs initramfs
Date: Tue, 14 Sep 2021 13:09:34 -0400

From: Ignat Korchagin <>

The main need for this is to support container runtimes on stateless Linux
systems (the pivot_root system call from initramfs).

Normally, the task of initramfs is to mount and switch to a "real" root
filesystem. However, on stateless systems (booting over the network) it is
just convenient to have your "real" filesystem as initramfs from the start.

This, however, breaks various container runtimes, because they usually
use the pivot_root system call after creating their mount namespace. But
pivot_root does not work from initramfs, because initramfs runs from
rootfs, which is the root of the mount tree and can't be unmounted.
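
For illustration, the sequence a container runtime typically runs can be
sketched as below. This is a minimal sketch and is not taken from any
particular runtime: enter_new_root() is a hypothetical helper, and the
pivot_root(".", ".") idiom is the one described in pivot_root(2). When the
caller's root is rootfs, the pivot_root step is the one that fails (EINVAL
according to pivot_root(2)).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: make new_root the root of a fresh mount namespace.
 * Returns 0 on success, -1 on the first failing step.
 * Needs CAP_SYS_ADMIN.
 */
int enter_new_root(const char *new_root)
{
	assert(new_root != NULL);

	/* Get a private copy of the mount tree. */
	if (unshare(CLONE_NEWNS) < 0) {
		perror("unshare");
		return -1;
	}
	/* Stop mount events from propagating back to the host. */
	if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0) {
		perror("mount MS_PRIVATE");
		return -1;
	}
	/* pivot_root needs new_root to be a mount point. */
	if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) < 0) {
		perror("mount MS_BIND");
		return -1;
	}
	if (chdir(new_root) < 0) {
		perror("chdir");
		return -1;
	}
	/* This is the step that fails when the current root is rootfs. */
	if (syscall(SYS_pivot_root, ".", ".") < 0) {
		perror("pivot_root");
		return -1;
	}
	/* The old root is now stacked on "."; detach it. */
	if (umount2(".", MNT_DETACH) < 0) {
		perror("umount2");
		return -1;
	}
	return 0;
}
```

A caller would invoke enter_new_root("/path/to/container/rootfs") in the
child process before exec'ing the container payload.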

One workaround is to do:

  mount --bind / /

However, that defeats one of the purposes of using pivot_root in the
cloned containers: getting rid of the host root filesystem, should the
code somehow escape the chroot.

There is a way to solve this problem from userspace, but it is much more
cumbersome:
  * either we have to create a multilayered archive for initramfs, where the
    outer layer creates a tmpfs filesystem, unpacks the inner layer,
    switches root and properly cleans up the old rootfs
  * or we need to use the keepinitrd kernel cmdline option, unpack initramfs
    to rootfs, run a script to create our target tmpfs root, unpack the
    same initramfs there, switch root to it and again properly clean up
    the old root, thus unpacking the same archive twice and also wasting
    memory, because the kernel keeps the compressed initramfs image

With this change we can ask the kernel (by specifying the nonroot_initramfs
kernel cmdline option) to create a "leaf" tmpfs mount for us and switch
root to it before the initramfs handling code, so initramfs gets unpacked
directly into the "leaf" tmpfs with rootfs being empty and no need to
clean up anything.
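
For readers more familiar with userspace, the steps the kernel performs for
this option correspond roughly to the sketch below. It mirrors the
mkdir / mount / chdir / MS_MOVE / chroot sequence from the patch, but
switch_to_leaf_tmpfs() and its leaf argument are hypothetical names used
only for illustration. It is meant for an early init process; on a running
system the MS_MOVE step may fail, for example when the parent mount is
shared.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical userspace analogue of the kernel-side sequence:
 * create a tmpfs at `leaf`, move it over "/" and chroot into it.
 * Returns 0 on success, -1 on the first failing step.
 * Needs CAP_SYS_ADMIN and CAP_SYS_CHROOT.
 */
int switch_to_leaf_tmpfs(const char *leaf)
{
	assert(leaf != NULL && leaf[0] == '/');

	/* Create the directory that will hold the leaf mount. */
	if (mkdir(leaf, 0700) < 0) {
		perror("mkdir");
		return -1;
	}
	/* Mount a fresh tmpfs on it. */
	if (mount("tmpfs", leaf, "tmpfs", 0, NULL) < 0) {
		perror("mount tmpfs");
		return -1;
	}
	if (chdir(leaf) < 0) {
		perror("chdir");
		return -1;
	}
	/* Move the tmpfs mount on top of the current root. */
	if (mount(".", "/", NULL, MS_MOVE, NULL) < 0) {
		perror("mount MS_MOVE");
		return -1;
	}
	/* Make the moved tmpfs the root directory of this process. */
	if (chroot(".") < 0) {
		perror("chroot");
		return -1;
	}
	return 0;
}
```

The kernel version uses "/root" as the leaf directory, so the closest
equivalent call would be switch_to_leaf_tmpfs("/root").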

This also brings the behaviour in line with the older style initrd, where
the initrd is located on some leaf filesystem in the mount tree and rootfs
remains empty.
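
One way to check from userspace which case a booted system is in is to
look at the filesystem type of "/" in /proc/self/mountinfo. The sketch
below assumes that rootfs is reported there with the filesystem type
"rootfs"; mountinfo_fstype() and root_is_rootfs() are hypothetical helper
names, and lines longer than the buffer are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/self/mountinfo line. If it describes `mount_point`,
 * copy its filesystem type into `fstype` and return 1; otherwise
 * return 0 and leave `fstype` untouched.
 */
int mountinfo_fstype(const char *line, const char *mount_point,
		     char *fstype, size_t len)
{
	char mp[256];
	char fs[64];
	const char *sep;

	assert(line && mount_point && fstype && len > 0);

	/* Fields 1-4 are skipped; field 5 is the mount point. */
	if (sscanf(line, "%*s %*s %*s %*s %255s", mp) != 1)
		return 0;
	if (strcmp(mp, mount_point) != 0)
		return 0;

	/* The filesystem type is the first field after " - ". */
	sep = strstr(line, " - ");
	if (!sep || sscanf(sep + 3, "%63s", fs) != 1)
		return 0;

	snprintf(fstype, len, "%s", fs);
	return 1;
}

/* Return 1 if "/" is rootfs, 0 if not, -1 if mountinfo is unreadable. */
int root_is_rootfs(void)
{
	char line[1024];
	char fs[64] = "";
	FILE *f = fopen("/proc/self/mountinfo", "r");

	if (!f)
		return -1;
	/* Later lines shadow earlier ones: keep the last match for "/". */
	while (fgets(line, sizeof(line), f))
		mountinfo_fstype(line, "/", fs, sizeof(fs));
	fclose(f);
	return strcmp(fs, "rootfs") == 0;
}
```

Under the stated assumption, root_is_rootfs() would return 1 on a system
running directly from initramfs without this option, and 0 when the
initramfs was unpacked into a leaf tmpfs.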

Co-developed-by: Graham Christensen <>
Signed-off-by: Graham Christensen <>
Tested-by: Graham Christensen <>
Signed-off-by: Ignat Korchagin <>
---
 .../admin-guide/kernel-parameters.txt         |  9 +++-
 fs/namespace.c                                | 48 +++++++++++++++++++
 2 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 91ba391f9b32..bfbc904ad751 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3517,11 +3517,18 @@
 	nomfgpt		[X86-32] Disable Multi-Function General Purpose
 			Timer usage (for AMD Geode machines).
+	nomodule        Disable module load
 	nonmi_ipi	[X86] Disable using NMI IPIs during panic/reboot to
 			shutdown the other cpus.  Instead use the REBOOT_VECTOR
-	nomodule	Disable module load
+	nonroot_initramfs
+			[KNL] Create an additional tmpfs filesystem under rootfs
+			and unpack initramfs there instead of the rootfs itself.
+			This is useful for stateless systems, which run directly
+			from initramfs, create mount namespaces and use
+			"pivot_root" system call.
 	nopat		[X86] Disable PAT (page attribute table extension of
 			pagetables) support.
diff --git a/fs/namespace.c b/fs/namespace.c
index 659a8f39c61a..c639ea9feb66 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -18,6 +18,7 @@
 #include <linux/cred.h>
 #include <linux/idr.h>
 #include <linux/init.h>		/* init_rootfs */
+#include <linux/init_syscalls.h> /* init_chdir, init_chroot, init_mkdir */
 #include <linux/fs_struct.h>	/* get_fs_root */
 #include <linux/fsnotify.h>	/* fsnotify_vfsmount_delete */
 #include <linux/file.h>
@@ -4302,6 +4303,49 @@ static void __init init_mount_tree(void)
 	set_fs_root(current->fs, &root);
 }
 
+static int __initdata nonroot_initramfs;
+
+static int __init nonroot_initramfs_param(char *str)
+{
+	if (*str)
+		return 0;
+	nonroot_initramfs = 1;
+	return 1;
+}
+__setup("nonroot_initramfs", nonroot_initramfs_param);
+
+static void __init init_nonroot_initramfs(void)
+{
+	int err;
+
+	if (!nonroot_initramfs)
+		return;
+
+	err = init_mkdir("/root", 0700);
+	if (err < 0)
+		goto out;
+
+	err = init_mount("tmpfs", "/root", "tmpfs", 0, NULL);
+	if (err)
+		goto out;
+
+	err = init_chdir("/root");
+	if (err)
+		goto out;
+
+	err = init_mount(".", "/", NULL, MS_MOVE, NULL);
+	if (err)
+		goto out;
+
+	err = init_chroot(".");
+	if (!err)
+		return;
+
+out:
+	pr_warn("Failed to create a non-root filesystem for initramfs\n");
+}
+
 void __init mnt_init(void)
 {
 	int err;
@@ -4335,6 +4379,10 @@ void __init mnt_init(void)
+	init_nonroot_initramfs();
 void put_mnt_ns(struct mnt_namespace *ns)
