linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/2] init/initramfs.c: make initramfs support pivot_root
@ 2021-06-05  3:44 menglong8.dong
  2021-06-05  3:44 ` [PATCH v6 1/2] init/main.c: introduce function ramdisk_exec_exist() menglong8.dong
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: menglong8.dong @ 2021-06-05  3:44 UTC (permalink / raw)
  To: christian.brauner
  Cc: viro, keescook, samitolvanen, johan, ojeda, jeyu, masahiroy, joe,
	dong.menglong, jack, hare, axboe, tj, gregkh, song, neilb, akpm,
	linux, brho, f.fainelli, palmerdabbelt, wangkefeng.wang,
	mhiramat, rostedt, vbabka, glider, pmladek, johannes.berg,
	ebiederm, jojing64, terrelln, geert, linux-fsdevel, linux-kernel,
	mcgrof, arnd, chris, mingo, bhelgaas, josh

From: Menglong Dong <dong.menglong@zte.com.cn>

As Luis Chamberlain suggested, I split the patch:
[init/initramfs.c: make initramfs support pivot_root]
(https://lore.kernel.org/linux-fsdevel/20210520154244.20209-1-dong.menglong@zte.com.cn/)
into three.

The goal of the series patches is to make pivot_root() support initramfs.

In the first patch, I introduce the function ramdisk_exec_exist(), which
is used to check the exist of 'ramdisk_execute_command' in LOOKUP_DOWN
lookup mode.

In the second patch, I create a second mount, which is called
'user root', and make it become the root. Therefore, the root has a
parent mount, and it can be umounted or pivot_root.

In the third patch, I fix rootfs_fs_type with ramfs, as it is not used
directly any more, and it make no sense to switch it between ramfs and
tmpfs, just fix it with ramfs to simplify the code.


Changes since V5:

Remove the third patch and make it combined with the second one.


Changes since V4:                                                                                                                                                     
                                                                                                                                                                      
Do some more code cleanup for the second patch, include:                                                                                                              
- move 'ramdisk_exec_exist()' to 'init.h'                                                                                                                             
- remove unnecessary struct 'fs_rootfs_root'                                                                                                                          
- introduce 'revert_mount_rootfs()'                                                                                                                                   
- [...]                                                                                                                                                               


Changes since V3:

Do a code cleanup for the second patch, as Christian Brauner suggested:
- remove the concept 'user root', which seems not suitable.
- introduce inline function 'check_tmpfs_enabled()' to avoid duplicated
  code.
- rename function 'mount_user_root' to 'prepare_mount_rootfs'
- rename function 'end_mount_user_root' to 'finish_mount_rootfs'
- join 'init_user_rootfs()' with 'prepare_mount_rootfs()'

Changes since V2:

In the first patch, I use vfs_path_lookup() in init_eaccess() to make the
path lookup follow the mount on '/'. After this, the problem reported by
Masami Hiramatsu is solved. Thanks for your report :/


Changes since V1:

In the first patch, I add the flag LOOKUP_DOWN to init_eaccess(), to make
it support the check of filesystem mounted on '/'.

In the second patch, I control 'user root' with kconfig option
'CONFIG_INITRAMFS_USER_ROOT', and add some comments, as Luis Chamberlain
suggested.

In the third patch, I make 'rootfs_fs_type' in control of
'CONFIG_INITRAMFS_USER_ROOT'.



Menglong Dong (2):
  init/do_mounts.c: create second mount for initramfs
  init/do_mounts.c: fix rootfs_fs_type with ramfs

 include/linux/init.h |   4 ++
 init/do_mounts.c     | 101 ++++++++++++++++++++++++++++++++++++++++---
 init/do_mounts.h     |  16 ++++++-
 init/initramfs.c     |   8 ++++
 usr/Kconfig          |  10 +++++
 5 files changed, 131 insertions(+), 8 deletions(-)

-- 
2.32.0.rc0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v6 1/2] init/main.c: introduce function ramdisk_exec_exist()
  2021-06-05  3:44 [PATCH v6 0/2] init/initramfs.c: make initramfs support pivot_root menglong8.dong
@ 2021-06-05  3:44 ` menglong8.dong
  2021-06-05  3:44 ` [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs menglong8.dong
  2021-06-09 14:03 ` [PATCH v6 0/2] init/initramfs.c: make initramfs support pivot_root Masami Hiramatsu
  2 siblings, 0 replies; 16+ messages in thread
From: menglong8.dong @ 2021-06-05  3:44 UTC (permalink / raw)
  To: christian.brauner
  Cc: viro, keescook, samitolvanen, johan, ojeda, jeyu, masahiroy, joe,
	dong.menglong, jack, hare, axboe, tj, gregkh, song, neilb, akpm,
	linux, brho, f.fainelli, palmerdabbelt, wangkefeng.wang,
	mhiramat, rostedt, vbabka, glider, pmladek, johannes.berg,
	ebiederm, jojing64, terrelln, geert, linux-fsdevel, linux-kernel,
	mcgrof, arnd, chris, mingo, bhelgaas, josh

From: Menglong Dong <dong.menglong@zte.com.cn>

Introduce the function ramdisk_exec_exist, which is used to check the
exist of 'ramdisk_execute_command'.

To make path lookup follow the mount on '/', use vfs_path_lookup() in
init_eaccess(), and make the filesystem that mounted on '/' as root
during path lookup.

Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
---
 fs/init.c            | 11 +++++++++--
 include/linux/init.h |  1 +
 init/main.c          |  7 ++++++-
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/init.c b/fs/init.c
index 5c36adaa9b44..166356a1f15f 100644
--- a/fs/init.c
+++ b/fs/init.c
@@ -112,14 +112,21 @@ int __init init_chmod(const char *filename, umode_t mode)
 
 int __init init_eaccess(const char *filename)
 {
-	struct path path;
+	struct path path, root;
 	int error;
 
-	error = kern_path(filename, LOOKUP_FOLLOW, &path);
+	error = kern_path("/", LOOKUP_DOWN, &root);
 	if (error)
 		return error;
+	error = vfs_path_lookup(root.dentry, root.mnt, filename,
+				LOOKUP_FOLLOW, &path);
+	if (error)
+		goto on_err;
 	error = path_permission(&path, MAY_ACCESS);
+
 	path_put(&path);
+on_err:
+	path_put(&root);
 	return error;
 }
 
diff --git a/include/linux/init.h b/include/linux/init.h
index d82b4b2e1d25..889d538b6dfa 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -149,6 +149,7 @@ extern unsigned int reset_devices;
 void setup_arch(char **);
 void prepare_namespace(void);
 void __init init_rootfs(void);
+bool ramdisk_exec_exist(void);
 extern struct file_system_type rootfs_fs_type;
 
 #if defined(CONFIG_STRICT_KERNEL_RWX) || defined(CONFIG_STRICT_MODULE_RWX)
diff --git a/init/main.c b/init/main.c
index eb01e121d2f1..1153571ca977 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1522,6 +1522,11 @@ void __init console_on_rootfs(void)
 	fput(file);
 }
 
+bool __init ramdisk_exec_exist(void)
+{
+	return init_eaccess(ramdisk_execute_command) == 0;
+}
+
 static noinline void __init kernel_init_freeable(void)
 {
 	/*
@@ -1568,7 +1573,7 @@ static noinline void __init kernel_init_freeable(void)
 	 * check if there is an early userspace init.  If yes, let it do all
 	 * the work
 	 */
-	if (init_eaccess(ramdisk_execute_command) != 0) {
+	if (!ramdisk_exec_exist()) {
 		ramdisk_execute_command = NULL;
 		prepare_namespace();
 	}
-- 
2.32.0.rc0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs
  2021-06-05  3:44 [PATCH v6 0/2] init/initramfs.c: make initramfs support pivot_root menglong8.dong
  2021-06-05  3:44 ` [PATCH v6 1/2] init/main.c: introduce function ramdisk_exec_exist() menglong8.dong
@ 2021-06-05  3:44 ` menglong8.dong
  2021-06-05 11:50   ` Christian Brauner
  2021-06-09 14:03 ` [PATCH v6 0/2] init/initramfs.c: make initramfs support pivot_root Masami Hiramatsu
  2 siblings, 1 reply; 16+ messages in thread
From: menglong8.dong @ 2021-06-05  3:44 UTC (permalink / raw)
  To: christian.brauner
  Cc: viro, keescook, samitolvanen, johan, ojeda, jeyu, masahiroy, joe,
	dong.menglong, jack, hare, axboe, tj, gregkh, song, neilb, akpm,
	linux, brho, f.fainelli, palmerdabbelt, wangkefeng.wang,
	mhiramat, rostedt, vbabka, glider, pmladek, johannes.berg,
	ebiederm, jojing64, terrelln, geert, linux-fsdevel, linux-kernel,
	mcgrof, arnd, chris, mingo, bhelgaas, josh

From: Menglong Dong <dong.menglong@zte.com.cn>

If using container platforms such as Docker, upon initialization it
wants to use pivot_root() so that currently mounted devices do not
propagate to containers. An example of value in this is that
a USB device connected prior to the creation of a containers on the
host gets disconnected after a container is created; if the
USB device was mounted on containers, but already removed and
umounted on the host, the mount point will not go away until all
containers unmount the USB device.

Another reason for container platforms such as Docker to use pivot_root
is that upon initialization the net-namspace is mounted under
/var/run/docker/netns/ on the host by dockerd. Without pivot_root
Docker must either wait to create the network namespace prior to
the creation of containers or simply deal with leaking this to each
container.

pivot_root is supported if the rootfs is a initrd or block device, but
it's not supported if the rootfs uses an initramfs (tmpfs). This means
container platforms today must resort to using block devices if
they want to pivot_root from the rootfs. A workaround to use chroot()
is not a clean viable option given every container will have a
duplicate of every mount point on the host.

In order to support using container platforms such as Docker on
all the supported rootfs types we must extend Linux to support
pivot_root on initramfs as well. This patch does the work to do
just that.

pivot_root will unmount the mount of the rootfs from its parent mount
and mount the new root to it. However, when it comes to initramfs, it
donesn't work, because the root filesystem has not parent mount, which
makes initramfs not supported by pivot_root.

In order to make pivot_root supported on initramfs, we create a second
mount with type of rootfs before unpacking cpio, and change root to
this mount after unpacking.

While mounting the second rootfs, 'rootflags' is passed, and it means
that we can set options for the mount of rootfs in boot cmd now.
For example, the size of tmpfs can be set with 'rootflags=size=1024M'.

Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
---
 init/do_mounts.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 init/do_mounts.h | 17 ++++++++++++++++-
 init/initramfs.c |  8 ++++++++
 usr/Kconfig      | 10 ++++++++++
 4 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/init/do_mounts.c b/init/do_mounts.c
index a78e44ee6adb..715bdaa89b81 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -618,6 +618,49 @@ void __init prepare_namespace(void)
 }
 
 static bool is_tmpfs;
+#ifdef CONFIG_INITRAMFS_MOUNT
+
+/*
+ * Give systems running from the initramfs and making use of pivot_root a
+ * proper mount so it can be umounted during pivot_root.
+ */
+int __init prepare_mount_rootfs(void)
+{
+	char *rootfs = "ramfs";
+
+	if (is_tmpfs)
+		rootfs = "tmpfs";
+
+	return do_mount_root(rootfs, rootfs,
+			     root_mountflags & ~MS_RDONLY,
+			     root_mount_data);
+}
+
+/*
+ * Revert to previous mount by chdir to '/' and unmounting the second
+ * mount.
+ */
+void __init revert_mount_rootfs(void)
+{
+	init_chdir("/");
+	init_umount(".", MNT_DETACH);
+}
+
+/*
+ * Change root to the new rootfs that mounted in prepare_mount_rootfs()
+ * if cpio is unpacked successfully and 'ramdisk_execute_command' exist.
+ */
+void __init finish_mount_rootfs(void)
+{
+	init_mount(".", "/", NULL, MS_MOVE, NULL);
+	if (likely(ramdisk_exec_exist()))
+		init_chroot(".");
+	else
+		revert_mount_rootfs();
+}
+
+#define rootfs_init_fs_context ramfs_init_fs_context
+#else
 static int rootfs_init_fs_context(struct fs_context *fc)
 {
 	if (IS_ENABLED(CONFIG_TMPFS) && is_tmpfs)
@@ -625,6 +668,7 @@ static int rootfs_init_fs_context(struct fs_context *fc)
 
 	return ramfs_init_fs_context(fc);
 }
+#endif
 
 struct file_system_type rootfs_fs_type = {
 	.name		= "rootfs",
diff --git a/init/do_mounts.h b/init/do_mounts.h
index 7a29ac3e427b..ae4ab306caa9 100644
--- a/init/do_mounts.h
+++ b/init/do_mounts.h
@@ -10,9 +10,24 @@
 #include <linux/root_dev.h>
 #include <linux/init_syscalls.h>
 
+extern int root_mountflags;
+
 void  mount_block_root(char *name, int flags);
 void  mount_root(void);
-extern int root_mountflags;
+
+#ifdef CONFIG_INITRAMFS_MOUNT
+
+int  prepare_mount_rootfs(void);
+void finish_mount_rootfs(void);
+void revert_mount_rootfs(void);
+
+#else
+
+static inline int  prepare_mount_rootfs(void) { return 0; }
+static inline void finish_mount_rootfs(void) { }
+static inline void revert_mount_rootfs(void) { }
+
+#endif
 
 static inline __init int create_dev(char *name, dev_t dev)
 {
diff --git a/init/initramfs.c b/init/initramfs.c
index af27abc59643..1833de3cf04e 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -16,6 +16,8 @@
 #include <linux/namei.h>
 #include <linux/init_syscalls.h>
 
+#include "do_mounts.h"
+
 static ssize_t __init xwrite(struct file *file, const char *p, size_t count,
 		loff_t *pos)
 {
@@ -682,13 +684,19 @@ static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)
 	else
 		printk(KERN_INFO "Unpacking initramfs...\n");
 
+	if (prepare_mount_rootfs())
+		panic("Failed to mount rootfs");
+
 	err = unpack_to_rootfs((char *)initrd_start, initrd_end - initrd_start);
 	if (err) {
+		revert_mount_rootfs();
 #ifdef CONFIG_BLK_DEV_RAM
 		populate_initrd_image(err);
 #else
 		printk(KERN_EMERG "Initramfs unpacking failed: %s\n", err);
 #endif
+	} else {
+		finish_mount_rootfs();
 	}
 
 done:
diff --git a/usr/Kconfig b/usr/Kconfig
index 8bbcf699fe3b..4f6ac12eafe9 100644
--- a/usr/Kconfig
+++ b/usr/Kconfig
@@ -52,6 +52,16 @@ config INITRAMFS_ROOT_GID
 
 	  If you are not sure, leave it set to "0".
 
+config INITRAMFS_MOUNT
+	bool "Create second mount to make pivot_root() supported"
+	default y
+	help
+	  Before unpacking cpio, create a second mount and make it become
+	  the root filesystem. Therefore, initramfs will be supported by
+	  pivot_root().
+
+	  If container platforms is used with initramfs, say Y.
+
 config RD_GZIP
 	bool "Support initial ramdisk/ramfs compressed using gzip"
 	default y
-- 
2.32.0.rc0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs
  2021-06-05  3:44 ` [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs menglong8.dong
@ 2021-06-05 11:50   ` Christian Brauner
  2021-06-05 14:47     ` Menglong Dong
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2021-06-05 11:50 UTC (permalink / raw)
  To: menglong8.dong
  Cc: viro, keescook, samitolvanen, johan, ojeda, jeyu, masahiroy, joe,
	dong.menglong, jack, hare, axboe, tj, gregkh, song, neilb, akpm,
	linux, brho, f.fainelli, palmerdabbelt, wangkefeng.wang,
	mhiramat, rostedt, vbabka, glider, pmladek, johannes.berg,
	ebiederm, jojing64, terrelln, geert, linux-fsdevel, linux-kernel,
	mcgrof, arnd, chris, mingo, bhelgaas, josh

On Sat, Jun 05, 2021 at 11:44:47AM +0800, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <dong.menglong@zte.com.cn>
> 
> If using container platforms such as Docker, upon initialization it
> wants to use pivot_root() so that currently mounted devices do not
> propagate to containers. An example of value in this is that
> a USB device connected prior to the creation of a containers on the
> host gets disconnected after a container is created; if the
> USB device was mounted on containers, but already removed and
> umounted on the host, the mount point will not go away until all
> containers unmount the USB device.
> 
> Another reason for container platforms such as Docker to use pivot_root
> is that upon initialization the net-namspace is mounted under
> /var/run/docker/netns/ on the host by dockerd. Without pivot_root
> Docker must either wait to create the network namespace prior to
> the creation of containers or simply deal with leaking this to each
> container.
> 
> pivot_root is supported if the rootfs is a initrd or block device, but
> it's not supported if the rootfs uses an initramfs (tmpfs). This means
> container platforms today must resort to using block devices if
> they want to pivot_root from the rootfs. A workaround to use chroot()
> is not a clean viable option given every container will have a
> duplicate of every mount point on the host.
> 
> In order to support using container platforms such as Docker on
> all the supported rootfs types we must extend Linux to support
> pivot_root on initramfs as well. This patch does the work to do
> just that.
> 
> pivot_root will unmount the mount of the rootfs from its parent mount
> and mount the new root to it. However, when it comes to initramfs, it
> donesn't work, because the root filesystem has not parent mount, which
> makes initramfs not supported by pivot_root.
> 
> In order to make pivot_root supported on initramfs, we create a second
> mount with type of rootfs before unpacking cpio, and change root to
> this mount after unpacking.
> 
> While mounting the second rootfs, 'rootflags' is passed, and it means
> that we can set options for the mount of rootfs in boot cmd now.
> For example, the size of tmpfs can be set with 'rootflags=size=1024M'.
> 
> Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
> ---
>  init/do_mounts.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  init/do_mounts.h | 17 ++++++++++++++++-
>  init/initramfs.c |  8 ++++++++
>  usr/Kconfig      | 10 ++++++++++
>  4 files changed, 78 insertions(+), 1 deletion(-)
> 
> diff --git a/init/do_mounts.c b/init/do_mounts.c
> index a78e44ee6adb..715bdaa89b81 100644
> --- a/init/do_mounts.c
> +++ b/init/do_mounts.c
> @@ -618,6 +618,49 @@ void __init prepare_namespace(void)
>  }
>  
>  static bool is_tmpfs;
> +#ifdef CONFIG_INITRAMFS_MOUNT
> +
> +/*
> + * Give systems running from the initramfs and making use of pivot_root a
> + * proper mount so it can be umounted during pivot_root.
> + */
> +int __init prepare_mount_rootfs(void)
> +{
> +	char *rootfs = "ramfs";
> +
> +	if (is_tmpfs)
> +		rootfs = "tmpfs";
> +
> +	return do_mount_root(rootfs, rootfs,
> +			     root_mountflags & ~MS_RDONLY,
> +			     root_mount_data);
> +}
> +
> +/*
> + * Revert to previous mount by chdir to '/' and unmounting the second
> + * mount.
> + */
> +void __init revert_mount_rootfs(void)
> +{
> +	init_chdir("/");
> +	init_umount(".", MNT_DETACH);
> +}
> +
> +/*
> + * Change root to the new rootfs that mounted in prepare_mount_rootfs()
> + * if cpio is unpacked successfully and 'ramdisk_execute_command' exist.
> + */
> +void __init finish_mount_rootfs(void)
> +{
> +	init_mount(".", "/", NULL, MS_MOVE, NULL);
> +	if (likely(ramdisk_exec_exist()))
> +		init_chroot(".");
> +	else
> +		revert_mount_rootfs();
> +}
> +
> +#define rootfs_init_fs_context ramfs_init_fs_context

Sorry, I think we're nearly there. What's the rationale for using ramfs
when unconditionally when a separate mount for initramfs is requested?
Meaning, why do we need this define at all?

> +#else
>  static int rootfs_init_fs_context(struct fs_context *fc)
>  {
>  	if (IS_ENABLED(CONFIG_TMPFS) && is_tmpfs)
> @@ -625,6 +668,7 @@ static int rootfs_init_fs_context(struct fs_context *fc)
>  
>  	return ramfs_init_fs_context(fc);
>  }
> +#endif
>  
>  struct file_system_type rootfs_fs_type = {
>  	.name		= "rootfs",
> diff --git a/init/do_mounts.h b/init/do_mounts.h
> index 7a29ac3e427b..ae4ab306caa9 100644
> --- a/init/do_mounts.h
> +++ b/init/do_mounts.h
> @@ -10,9 +10,24 @@
>  #include <linux/root_dev.h>
>  #include <linux/init_syscalls.h>
>  
> +extern int root_mountflags;
> +
>  void  mount_block_root(char *name, int flags);
>  void  mount_root(void);
> -extern int root_mountflags;
> +
> +#ifdef CONFIG_INITRAMFS_MOUNT
> +
> +int  prepare_mount_rootfs(void);
> +void finish_mount_rootfs(void);
> +void revert_mount_rootfs(void);
> +
> +#else
> +
> +static inline int  prepare_mount_rootfs(void) { return 0; }
> +static inline void finish_mount_rootfs(void) { }
> +static inline void revert_mount_rootfs(void) { }
> +
> +#endif
>  
>  static inline __init int create_dev(char *name, dev_t dev)
>  {
> diff --git a/init/initramfs.c b/init/initramfs.c
> index af27abc59643..1833de3cf04e 100644
> --- a/init/initramfs.c
> +++ b/init/initramfs.c
> @@ -16,6 +16,8 @@
>  #include <linux/namei.h>
>  #include <linux/init_syscalls.h>
>  
> +#include "do_mounts.h"
> +
>  static ssize_t __init xwrite(struct file *file, const char *p, size_t count,
>  		loff_t *pos)
>  {
> @@ -682,13 +684,19 @@ static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)
>  	else
>  		printk(KERN_INFO "Unpacking initramfs...\n");
>  
> +	if (prepare_mount_rootfs())
> +		panic("Failed to mount rootfs");
> +
>  	err = unpack_to_rootfs((char *)initrd_start, initrd_end - initrd_start);
>  	if (err) {
> +		revert_mount_rootfs();
>  #ifdef CONFIG_BLK_DEV_RAM
>  		populate_initrd_image(err);
>  #else
>  		printk(KERN_EMERG "Initramfs unpacking failed: %s\n", err);
>  #endif
> +	} else {
> +		finish_mount_rootfs();
>  	}
>  
>  done:
> diff --git a/usr/Kconfig b/usr/Kconfig
> index 8bbcf699fe3b..4f6ac12eafe9 100644
> --- a/usr/Kconfig
> +++ b/usr/Kconfig
> @@ -52,6 +52,16 @@ config INITRAMFS_ROOT_GID
>  
>  	  If you are not sure, leave it set to "0".
>  
> +config INITRAMFS_MOUNT
> +	bool "Create second mount to make pivot_root() supported"
> +	default y
> +	help
> +	  Before unpacking cpio, create a second mount and make it become
> +	  the root filesystem. Therefore, initramfs will be supported by
> +	  pivot_root().
> +
> +	  If container platforms is used with initramfs, say Y.
> +
>  config RD_GZIP
>  	bool "Support initial ramdisk/ramfs compressed using gzip"
>  	default y
> -- 
> 2.32.0.rc0
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs
  2021-06-05 11:50   ` Christian Brauner
@ 2021-06-05 14:47     ` Menglong Dong
  2021-06-07 10:31       ` Christian Brauner
  0 siblings, 1 reply; 16+ messages in thread
From: Menglong Dong @ 2021-06-05 14:47 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Kees Cook, Sami Tolvanen, johan, ojeda, jeyu,
	masahiroy, joe, Menglong Dong, Jan Kara, hare, Jens Axboe, tj,
	gregkh, song, NeilBrown, Andrew Morton, Rasmus Villemoes,
	Barret Rhoden, f.fainelli, palmerdabbelt, wangkefeng.wang,
	Masami Hiramatsu, Steven Rostedt, vbabka, Alexander Potapenko,
	pmladek, johannes.berg, Eric W. Biederman, jojing64, terrelln,
	geert, linux-fsdevel, LKML, Luis Chamberlain, arnd, Chris Down,
	mingo, Bjorn Helgaas, Josh Triplett

Hello,

On Sat, Jun 5, 2021 at 7:50 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> On Sat, Jun 05, 2021 at 11:44:47AM +0800, menglong8.dong@gmail.com wrote:
> > From: Menglong Dong <dong.menglong@zte.com.cn>
> >
> > If using container platforms such as Docker, upon initialization it
> > wants to use pivot_root() so that currently mounted devices do not
> > propagate to containers. An example of value in this is that
> > a USB device connected prior to the creation of a containers on the
> > host gets disconnected after a container is created; if the
> > USB device was mounted on containers, but already removed and
> > umounted on the host, the mount point will not go away until all
> > containers unmount the USB device.
> >
> > Another reason for container platforms such as Docker to use pivot_root
> > is that upon initialization the net-namspace is mounted under
> > /var/run/docker/netns/ on the host by dockerd. Without pivot_root
> > Docker must either wait to create the network namespace prior to
> > the creation of containers or simply deal with leaking this to each
> > container.
> >
> > pivot_root is supported if the rootfs is a initrd or block device, but
> > it's not supported if the rootfs uses an initramfs (tmpfs). This means
> > container platforms today must resort to using block devices if
> > they want to pivot_root from the rootfs. A workaround to use chroot()
> > is not a clean viable option given every container will have a
> > duplicate of every mount point on the host.
> >
> > In order to support using container platforms such as Docker on
> > all the supported rootfs types we must extend Linux to support
> > pivot_root on initramfs as well. This patch does the work to do
> > just that.
> >
> > pivot_root will unmount the mount of the rootfs from its parent mount
> > and mount the new root to it. However, when it comes to initramfs, it
> > donesn't work, because the root filesystem has not parent mount, which
> > makes initramfs not supported by pivot_root.
> >
> > In order to make pivot_root supported on initramfs, we create a second
> > mount with type of rootfs before unpacking cpio, and change root to
> > this mount after unpacking.
> >
> > While mounting the second rootfs, 'rootflags' is passed, and it means
> > that we can set options for the mount of rootfs in boot cmd now.
> > For example, the size of tmpfs can be set with 'rootflags=size=1024M'.
> >
> > Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
> > ---
> >  init/do_mounts.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> >  init/do_mounts.h | 17 ++++++++++++++++-
> >  init/initramfs.c |  8 ++++++++
> >  usr/Kconfig      | 10 ++++++++++
> >  4 files changed, 78 insertions(+), 1 deletion(-)
> >
> > diff --git a/init/do_mounts.c b/init/do_mounts.c
> > index a78e44ee6adb..715bdaa89b81 100644
> > --- a/init/do_mounts.c
> > +++ b/init/do_mounts.c
> > @@ -618,6 +618,49 @@ void __init prepare_namespace(void)
> >  }
> >
> >  static bool is_tmpfs;
> > +#ifdef CONFIG_INITRAMFS_MOUNT
> > +
> > +/*
> > + * Give systems running from the initramfs and making use of pivot_root a
> > + * proper mount so it can be umounted during pivot_root.
> > + */
> > +int __init prepare_mount_rootfs(void)
> > +{
> > +     char *rootfs = "ramfs";
> > +
> > +     if (is_tmpfs)
> > +             rootfs = "tmpfs";
> > +
> > +     return do_mount_root(rootfs, rootfs,
> > +                          root_mountflags & ~MS_RDONLY,
> > +                          root_mount_data);
> > +}
> > +
> > +/*
> > + * Revert to previous mount by chdir to '/' and unmounting the second
> > + * mount.
> > + */
> > +void __init revert_mount_rootfs(void)
> > +{
> > +     init_chdir("/");
> > +     init_umount(".", MNT_DETACH);
> > +}
> > +
> > +/*
> > + * Change root to the new rootfs that mounted in prepare_mount_rootfs()
> > + * if cpio is unpacked successfully and 'ramdisk_execute_command' exist.
> > + */
> > +void __init finish_mount_rootfs(void)
> > +{
> > +     init_mount(".", "/", NULL, MS_MOVE, NULL);
> > +     if (likely(ramdisk_exec_exist()))
> > +             init_chroot(".");
> > +     else
> > +             revert_mount_rootfs();
> > +}
> > +
> > +#define rootfs_init_fs_context ramfs_init_fs_context
>
> Sorry, I think we're nearly there. What's the rationale for using ramfs
> when unconditionally when a separate mount for initramfs is requested?
> Meaning, why do we need this define at all?

I think it's necessary, as I explained in the third patch. When the rootfs
is a block device, ramfs is used in init_mount_tree() unconditionally,
which can be seen from the enable of is_tmpfs.

That makes sense, because rootfs will not become the root if a block
device is specified by 'root' in boot cmd, so it makes no sense to use
tmpfs, because ramfs is more simple.

Here, I make rootfs as ramfs for the same reason: the first mount is not
used as the root, so make it ramfs which is more simple.

Thanks!
Menglong Dong

>
> > +#else
> >  static int rootfs_init_fs_context(struct fs_context *fc)
> >  {
> >       if (IS_ENABLED(CONFIG_TMPFS) && is_tmpfs)
> > @@ -625,6 +668,7 @@ static int rootfs_init_fs_context(struct fs_context *fc)
> >
> >       return ramfs_init_fs_context(fc);
> >  }
> > +#endif
> >
> >  struct file_system_type rootfs_fs_type = {
> >       .name           = "rootfs",
> > diff --git a/init/do_mounts.h b/init/do_mounts.h
> > index 7a29ac3e427b..ae4ab306caa9 100644
> > --- a/init/do_mounts.h
> > +++ b/init/do_mounts.h
> > @@ -10,9 +10,24 @@
> >  #include <linux/root_dev.h>
> >  #include <linux/init_syscalls.h>
> >
> > +extern int root_mountflags;
> > +
> >  void  mount_block_root(char *name, int flags);
> >  void  mount_root(void);
> > -extern int root_mountflags;
> > +
> > +#ifdef CONFIG_INITRAMFS_MOUNT
> > +
> > +int  prepare_mount_rootfs(void);
> > +void finish_mount_rootfs(void);
> > +void revert_mount_rootfs(void);
> > +
> > +#else
> > +
> > +static inline int  prepare_mount_rootfs(void) { return 0; }
> > +static inline void finish_mount_rootfs(void) { }
> > +static inline void revert_mount_rootfs(void) { }
> > +
> > +#endif
> >
> >  static inline __init int create_dev(char *name, dev_t dev)
> >  {
> > diff --git a/init/initramfs.c b/init/initramfs.c
> > index af27abc59643..1833de3cf04e 100644
> > --- a/init/initramfs.c
> > +++ b/init/initramfs.c
> > @@ -16,6 +16,8 @@
> >  #include <linux/namei.h>
> >  #include <linux/init_syscalls.h>
> >
> > +#include "do_mounts.h"
> > +
> >  static ssize_t __init xwrite(struct file *file, const char *p, size_t count,
> >               loff_t *pos)
> >  {
> > @@ -682,13 +684,19 @@ static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)
> >       else
> >               printk(KERN_INFO "Unpacking initramfs...\n");
> >
> > +     if (prepare_mount_rootfs())
> > +             panic("Failed to mount rootfs");
> > +
> >       err = unpack_to_rootfs((char *)initrd_start, initrd_end - initrd_start);
> >       if (err) {
> > +             revert_mount_rootfs();
> >  #ifdef CONFIG_BLK_DEV_RAM
> >               populate_initrd_image(err);
> >  #else
> >               printk(KERN_EMERG "Initramfs unpacking failed: %s\n", err);
> >  #endif
> > +     } else {
> > +             finish_mount_rootfs();
> >       }
> >
> >  done:
> > diff --git a/usr/Kconfig b/usr/Kconfig
> > index 8bbcf699fe3b..4f6ac12eafe9 100644
> > --- a/usr/Kconfig
> > +++ b/usr/Kconfig
> > @@ -52,6 +52,16 @@ config INITRAMFS_ROOT_GID
> >
> >         If you are not sure, leave it set to "0".
> >
> > +config INITRAMFS_MOUNT
> > +     bool "Create second mount to make pivot_root() supported"
> > +     default y
> > +     help
> > +       Before unpacking cpio, create a second mount and make it become
> > +       the root filesystem. Therefore, initramfs will be supported by
> > +       pivot_root().
> > +
> > +       If container platforms is used with initramfs, say Y.
> > +
> >  config RD_GZIP
> >       bool "Support initial ramdisk/ramfs compressed using gzip"
> >       default y
> > --
> > 2.32.0.rc0
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs
  2021-06-05 14:47     ` Menglong Dong
@ 2021-06-07 10:31       ` Christian Brauner
  2021-06-07 12:15         ` menglong8.dong
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2021-06-07 10:31 UTC (permalink / raw)
  To: Menglong Dong
  Cc: Alexander Viro, Kees Cook, Sami Tolvanen, johan, ojeda, jeyu,
	masahiroy, joe, Menglong Dong, Jan Kara, hare, Jens Axboe, tj,
	gregkh, song, NeilBrown, Andrew Morton, Rasmus Villemoes,
	Barret Rhoden, f.fainelli, palmerdabbelt, wangkefeng.wang,
	Masami Hiramatsu, Steven Rostedt, vbabka, Alexander Potapenko,
	pmladek, johannes.berg, Eric W. Biederman, jojing64, terrelln,
	geert, linux-fsdevel, LKML, Luis Chamberlain, arnd, Chris Down,
	mingo, Bjorn Helgaas, Josh Triplett

On Sat, Jun 05, 2021 at 10:47:07PM +0800, Menglong Dong wrote:
> Hello,
> 
> On Sat, Jun 5, 2021 at 7:50 PM Christian Brauner
> <christian.brauner@ubuntu.com> wrote:
> >
> > On Sat, Jun 05, 2021 at 11:44:47AM +0800, menglong8.dong@gmail.com wrote:
> > > From: Menglong Dong <dong.menglong@zte.com.cn>
> > >
> > > If using container platforms such as Docker, upon initialization it
> > > wants to use pivot_root() so that currently mounted devices do not
> > > propagate to containers. An example of value in this is that
> > > a USB device connected prior to the creation of a containers on the
> > > host gets disconnected after a container is created; if the
> > > USB device was mounted on containers, but already removed and
> > > umounted on the host, the mount point will not go away until all
> > > containers unmount the USB device.
> > >
> > > Another reason for container platforms such as Docker to use pivot_root
> > > is that upon initialization the net-namspace is mounted under
> > > /var/run/docker/netns/ on the host by dockerd. Without pivot_root
> > > Docker must either wait to create the network namespace prior to
> > > the creation of containers or simply deal with leaking this to each
> > > container.
> > >
> > > pivot_root is supported if the rootfs is a initrd or block device, but
> > > it's not supported if the rootfs uses an initramfs (tmpfs). This means
> > > container platforms today must resort to using block devices if
> > > they want to pivot_root from the rootfs. A workaround to use chroot()
> > > is not a clean viable option given every container will have a
> > > duplicate of every mount point on the host.
> > >
> > > In order to support using container platforms such as Docker on
> > > all the supported rootfs types we must extend Linux to support
> > > pivot_root on initramfs as well. This patch does the work to do
> > > just that.
> > >
> > > pivot_root will unmount the mount of the rootfs from its parent mount
> > > and mount the new root to it. However, when it comes to initramfs, it
> > > donesn't work, because the root filesystem has not parent mount, which
> > > makes initramfs not supported by pivot_root.
> > >
> > > In order to make pivot_root supported on initramfs, we create a second
> > > mount with type of rootfs before unpacking cpio, and change root to
> > > this mount after unpacking.
> > >
> > > While mounting the second rootfs, 'rootflags' is passed, and it means
> > > that we can set options for the mount of rootfs in boot cmd now.
> > > For example, the size of tmpfs can be set with 'rootflags=size=1024M'.
> > >
> > > Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
> > > ---
> > >  init/do_mounts.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> > >  init/do_mounts.h | 17 ++++++++++++++++-
> > >  init/initramfs.c |  8 ++++++++
> > >  usr/Kconfig      | 10 ++++++++++
> > >  4 files changed, 78 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/init/do_mounts.c b/init/do_mounts.c
> > > index a78e44ee6adb..715bdaa89b81 100644
> > > --- a/init/do_mounts.c
> > > +++ b/init/do_mounts.c
> > > @@ -618,6 +618,49 @@ void __init prepare_namespace(void)
> > >  }
> > >
> > >  static bool is_tmpfs;
> > > +#ifdef CONFIG_INITRAMFS_MOUNT
> > > +
> > > +/*
> > > + * Give systems running from the initramfs and making use of pivot_root a
> > > + * proper mount so it can be umounted during pivot_root.
> > > + */
> > > +int __init prepare_mount_rootfs(void)
> > > +{
> > > +     char *rootfs = "ramfs";
> > > +
> > > +     if (is_tmpfs)
> > > +             rootfs = "tmpfs";
> > > +
> > > +     return do_mount_root(rootfs, rootfs,
> > > +                          root_mountflags & ~MS_RDONLY,
> > > +                          root_mount_data);
> > > +}
> > > +
> > > +/*
> > > + * Revert to previous mount by chdir to '/' and unmounting the second
> > > + * mount.
> > > + */
> > > +void __init revert_mount_rootfs(void)
> > > +{
> > > +     init_chdir("/");
> > > +     init_umount(".", MNT_DETACH);
> > > +}
> > > +
> > > +/*
> > > + * Change root to the new rootfs that mounted in prepare_mount_rootfs()
> > > + * if cpio is unpacked successfully and 'ramdisk_execute_command' exist.
> > > + */
> > > +void __init finish_mount_rootfs(void)
> > > +{
> > > +     init_mount(".", "/", NULL, MS_MOVE, NULL);
> > > +     if (likely(ramdisk_exec_exist()))
> > > +             init_chroot(".");
> > > +     else
> > > +             revert_mount_rootfs();
> > > +}
> > > +
> > > +#define rootfs_init_fs_context ramfs_init_fs_context
> >
> > Sorry, I think we're nearly there. What's the rationale for using ramfs
> > when unconditionally when a separate mount for initramfs is requested?
> > Meaning, why do we need this define at all?
> 
> I think it's necessary, as I explained in the third patch. When the rootfs
> is a block device, ramfs is used in init_mount_tree() unconditionally,
> which can be seen from the enable of is_tmpfs.
> 
> That makes sense, because rootfs will not become the root if a block
> device is specified by 'root' in boot cmd, so it makes no sense to use
> tmpfs, because ramfs is more simple.
> 
> Here, I make rootfs as ramfs for the same reason: the first mount is not
> used as the root, so make it ramfs which is more simple.

Ok. If you don't mind I'd like to pull and test this before moving
further. (Btw, I talked about this at Plumbers before btw.)
What did you use for testing this? Any way you can share it?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs
  2021-06-07 10:31       ` Christian Brauner
@ 2021-06-07 12:15         ` menglong8.dong
  2021-06-17  3:57           ` Menglong Dong
  0 siblings, 1 reply; 16+ messages in thread
From: menglong8.dong @ 2021-06-07 12:15 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Menglong Dong, Alexander Viro, Kees Cook, Sami Tolvanen, johan,
	ojeda, jeyu, masahiroy, joe, Jan Kara, hare, Jens Axboe, tj,
	gregkh, song, NeilBrown, Andrew Morton, Rasmus Villemoes,
	Barret Rhoden, f.fainelli, palmerdabbelt, wangkefeng.wang,
	Masami Hiramatsu, Steven Rostedt, vbabka, Alexander Potapenko,
	pmladek, johannes.berg, Eric W. Biederman, jojing64, terrelln,
	geert, linux-fsdevel, LKML, Luis Chamberlain, arnd, Chris Down,
	mingo, Bjorn Helgaas, Josh Triplett

On Mon, Jun 07, 2021 at 12:31:47PM +0200, Christian Brauner wrote:
> On Sat, Jun 05, 2021 at 10:47:07PM +0800, Menglong Dong wrote:
[...]
> > 
> > I think it's necessary, as I explained in the third patch. When the rootfs
> > is a block device, ramfs is used in init_mount_tree() unconditionally,
> > which can be seen from the enable of is_tmpfs.
> > 
> > That makes sense, because rootfs will not become the root if a block
> > device is specified by 'root' in boot cmd, so it makes no sense to use
> > tmpfs, because ramfs is more simple.
> > 
> > Here, I make rootfs as ramfs for the same reason: the first mount is not
> > used as the root, so make it ramfs which is more simple.
> 
> Ok. If you don't mind I'd like to pull and test this before moving
> further. (Btw, I talked about this at Plumbers before btw.)
> What did you use for testing this? Any way you can share it?

Ok, no problem definitely. I tested this function in 3 way mainly:

1. I debug the kernel with qemu and gdb, and trace the the whole
   process, to ensure that there is no abnormal situation.
2. I tested pivot_root() in initramfs and ensured that it can be
   used normally. What's more, I also tested docker and ensured
   container can run normally without 'DOCKER_RAMDISK=yes' set in
   initramfs.
3. I tried to enable and disable CONFIG_INITRAMFS_MOUNT, and
   ensured that the system can boot successfully from initramfs, initrd
   and sda.

What's more, our team is going to test it comprehensively, such as
ltp, etc.

Thanks!
Menglong Dong                                                                                                                                                         



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 0/2] init/initramfs.c: make initramfs support pivot_root
  2021-06-05  3:44 [PATCH v6 0/2] init/initramfs.c: make initramfs support pivot_root menglong8.dong
  2021-06-05  3:44 ` [PATCH v6 1/2] init/main.c: introduce function ramdisk_exec_exist() menglong8.dong
  2021-06-05  3:44 ` [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs menglong8.dong
@ 2021-06-09 14:03 ` Masami Hiramatsu
  2021-06-10  6:36   ` Menglong Dong
  2 siblings, 1 reply; 16+ messages in thread
From: Masami Hiramatsu @ 2021-06-09 14:03 UTC (permalink / raw)
  To: menglong8.dong
  Cc: christian.brauner, viro, keescook, samitolvanen, johan, ojeda,
	jeyu, masahiroy, joe, dong.menglong, jack, hare, axboe, tj,
	gregkh, song, neilb, akpm, linux, brho, f.fainelli,
	palmerdabbelt, wangkefeng.wang, mhiramat, rostedt, vbabka,
	glider, pmladek, johannes.berg, ebiederm, jojing64, terrelln,
	geert, linux-fsdevel, linux-kernel, mcgrof, arnd, chris, mingo,
	bhelgaas, josh

On Sat,  5 Jun 2021 11:44:45 +0800
menglong8.dong@gmail.com wrote:

> From: Menglong Dong <dong.menglong@zte.com.cn>
> 
> As Luis Chamberlain suggested, I split the patch:
> [init/initramfs.c: make initramfs support pivot_root]
> (https://lore.kernel.org/linux-fsdevel/20210520154244.20209-1-dong.menglong@zte.com.cn/)
> into three.
> 
> The goal of the series patches is to make pivot_root() support initramfs.
> 
> In the first patch, I introduce the function ramdisk_exec_exist(), which
> is used to check the exist of 'ramdisk_execute_command' in LOOKUP_DOWN
> lookup mode.
> 
> In the second patch, I create a second mount, which is called
> 'user root', and make it become the root. Therefore, the root has a
> parent mount, and it can be umounted or pivot_root.
> 
> In the third patch, I fix rootfs_fs_type with ramfs, as it is not used
> directly any more, and it make no sense to switch it between ramfs and
> tmpfs, just fix it with ramfs to simplify the code.

Hi,

I have tested this series on qemu with shell script container on initramfs.
It works for me!

Tested-by: Masami Hiramatsu <mhiramat@kernel.org>

Thank you,

> 
> 
> Changes since V5:
> 
> Remove the third patch and make it combined with the second one.
> 
> 
> Changes since V4:                                                                                                                                                     
>                                                                                                                                                                       
> Do some more code cleanup for the second patch, include:                                                                                                              
> - move 'ramdisk_exec_exist()' to 'init.h'                                                                                                                             
> - remove unnecessary struct 'fs_rootfs_root'                                                                                                                          
> - introduce 'revert_mount_rootfs()'                                                                                                                                   
> - [...]                                                                                                                                                               
> 
> 
> Changes since V3:
> 
> Do a code cleanup for the second patch, as Christian Brauner suggested:
> - remove the concept 'user root', which seems not suitable.
> - introduce inline function 'check_tmpfs_enabled()' to avoid duplicated
>   code.
> - rename function 'mount_user_root' to 'prepare_mount_rootfs'
> - rename function 'end_mount_user_root' to 'finish_mount_rootfs'
> - join 'init_user_rootfs()' with 'prepare_mount_rootfs()'
> 
> Changes since V2:
> 
> In the first patch, I use vfs_path_lookup() in init_eaccess() to make the
> path lookup follow the mount on '/'. After this, the problem reported by
> Masami Hiramatsu is solved. Thanks for your report :/
> 
> 
> Changes since V1:
> 
> In the first patch, I add the flag LOOKUP_DOWN to init_eaccess(), to make
> it support the check of filesystem mounted on '/'.
> 
> In the second patch, I control 'user root' with kconfig option
> 'CONFIG_INITRAMFS_USER_ROOT', and add some comments, as Luis Chamberlain
> suggested.
> 
> In the third patch, I make 'rootfs_fs_type' in control of
> 'CONFIG_INITRAMFS_USER_ROOT'.
> 
> 
> 
> Menglong Dong (2):
>   init/do_mounts.c: create second mount for initramfs
>   init/do_mounts.c: fix rootfs_fs_type with ramfs
> 
>  include/linux/init.h |   4 ++
>  init/do_mounts.c     | 101 ++++++++++++++++++++++++++++++++++++++++---
>  init/do_mounts.h     |  16 ++++++-
>  init/initramfs.c     |   8 ++++
>  usr/Kconfig          |  10 +++++
>  5 files changed, 131 insertions(+), 8 deletions(-)
> 
> -- 
> 2.32.0.rc0
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 0/2] init/initramfs.c: make initramfs support pivot_root
  2021-06-09 14:03 ` [PATCH v6 0/2] init/initramfs.c: make initramfs support pivot_root Masami Hiramatsu
@ 2021-06-10  6:36   ` Menglong Dong
  0 siblings, 0 replies; 16+ messages in thread
From: Menglong Dong @ 2021-06-10  6:36 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: christian.brauner, viro, keescook, samitolvanen, johan, ojeda,
	jeyu, masahiroy, joe, dong.menglong, jack, hare, axboe, tj,
	gregkh, song, neilb, akpm, linux, brho, f.fainelli,
	palmerdabbelt, wangkefeng.wang, rostedt, vbabka, glider, pmladek,
	johannes.berg, ebiederm, jojing64, terrelln, geert,
	linux-fsdevel, linux-kernel, mcgrof, arnd, chris, mingo,
	bhelgaas, josh

On Wed, Jun 09, 2021 at 11:03:12PM +0900, Masami Hiramatsu wrote:
> On Sat,  5 Jun 2021 11:44:45 +0800
> menglong8.dong@gmail.com wrote:
> 
> > From: Menglong Dong <dong.menglong@zte.com.cn>
> > 
> > As Luis Chamberlain suggested, I split the patch:
> > [init/initramfs.c: make initramfs support pivot_root]
> > (https://lore.kernel.org/linux-fsdevel/20210520154244.20209-1-dong.menglong@zte.com.cn/)
> > into three.
> > 
> > The goal of the series patches is to make pivot_root() support initramfs.
> > 
> > In the first patch, I introduce the function ramdisk_exec_exist(), which
> > is used to check the exist of 'ramdisk_execute_command' in LOOKUP_DOWN
> > lookup mode.
> > 
> > In the second patch, I create a second mount, which is called
> > 'user root', and make it become the root. Therefore, the root has a
> > parent mount, and it can be umounted or pivot_root.
> > 
> > In the third patch, I fix rootfs_fs_type with ramfs, as it is not used
> > directly any more, and it make no sense to switch it between ramfs and
> > tmpfs, just fix it with ramfs to simplify the code.
> 
> Hi,
> 
> I have tested this series on qemu with shell script container on initramfs.
> It works for me!
> 
> Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
> 
> Thank you,
> 

Ok, thank you :/

Menglong Dong

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs
  2021-06-07 12:15         ` menglong8.dong
@ 2021-06-17  3:57           ` Menglong Dong
  2021-06-17 14:38             ` Christian Brauner
  0 siblings, 1 reply; 16+ messages in thread
From: Menglong Dong @ 2021-06-17  3:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Kees Cook, Sami Tolvanen, johan, ojeda, jeyu,
	masahiroy, joe, Jan Kara, hare, Jens Axboe, tj, gregkh, song,
	NeilBrown, Andrew Morton, Rasmus Villemoes, Barret Rhoden,
	f.fainelli, palmerdabbelt, wangkefeng.wang, Masami Hiramatsu,
	Steven Rostedt, vbabka, Alexander Potapenko, pmladek,
	johannes.berg, Eric W. Biederman, jojing64, terrelln, geert,
	linux-fsdevel, LKML, Luis Chamberlain, arnd, Chris Down, mingo,
	Bjorn Helgaas, Josh Triplett

Hello,

On Mon, Jun 07, 2021 at 05:15:24AM -0700, menglong8.dong@gmail.com wrote:
> On Mon, Jun 07, 2021 at 12:31:47PM +0200, Christian Brauner wrote:
> > On Sat, Jun 05, 2021 at 10:47:07PM +0800, Menglong Dong wrote:
> [...]
> > > 
> > > I think it's necessary, as I explained in the third patch. When the rootfs
> > > is a block device, ramfs is used in init_mount_tree() unconditionally,
> > > which can be seen from the enable of is_tmpfs.
> > > 
> > > That makes sense, because rootfs will not become the root if a block
> > > device is specified by 'root' in boot cmd, so it makes no sense to use
> > > tmpfs, because ramfs is more simple.
> > > 
> > > Here, I make rootfs as ramfs for the same reason: the first mount is not
> > > used as the root, so make it ramfs which is more simple.
> > 
> > Ok. If you don't mind I'd like to pull and test this before moving
> > further. (Btw, I talked about this at Plumbers before btw.)
> > What did you use for testing this? Any way you can share it?
> 

I notice that it have been ten days, and is it ok to move a little
further? (knock-knock :/)

Thanks!
Menglong Dong

> Ok, no problem definitely. I tested this function in 3 way mainly:
> 
> 1. I debug the kernel with qemu and gdb, and trace the the whole
>    process, to ensure that there is no abnormal situation.
> 2. I tested pivot_root() in initramfs and ensured that it can be
>    used normally. What's more, I also tested docker and ensured
>    container can run normally without 'DOCKER_RAMDISK=yes' set in
>    initramfs.
> 3. I tried to enable and disable CONFIG_INITRAMFS_MOUNT, and
>    ensured that the system can boot successfully from initramfs, initrd
>    and sda.
> 
> What's more, our team is going to test it comprehensively, such as
> ltp, etc.
> 
> Thanks!
> Menglong Dong                                                                                                                                                         
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs
  2021-06-17  3:57           ` Menglong Dong
@ 2021-06-17 14:38             ` Christian Brauner
  2021-07-27 12:24               ` Menglong Dong
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2021-06-17 14:38 UTC (permalink / raw)
  To: Menglong Dong
  Cc: Alexander Viro, Kees Cook, Sami Tolvanen, johan, ojeda, jeyu,
	masahiroy, joe, Jan Kara, hare, Jens Axboe, tj, gregkh, song,
	NeilBrown, Andrew Morton, Rasmus Villemoes, Barret Rhoden,
	f.fainelli, palmerdabbelt, wangkefeng.wang, Masami Hiramatsu,
	Steven Rostedt, vbabka, Alexander Potapenko, pmladek,
	johannes.berg, Eric W. Biederman, jojing64, terrelln, geert,
	linux-fsdevel, LKML, Luis Chamberlain, arnd, Chris Down, mingo,
	Bjorn Helgaas, Josh Triplett

On Wed, Jun 16, 2021 at 08:57:56PM -0700, Menglong Dong wrote:
> Hello,
> 
> On Mon, Jun 07, 2021 at 05:15:24AM -0700, menglong8.dong@gmail.com wrote:
> > On Mon, Jun 07, 2021 at 12:31:47PM +0200, Christian Brauner wrote:
> > > On Sat, Jun 05, 2021 at 10:47:07PM +0800, Menglong Dong wrote:
> > [...]
> > > > 
> > > > I think it's necessary, as I explained in the third patch. When the rootfs
> > > > is a block device, ramfs is used in init_mount_tree() unconditionally,
> > > > which can be seen from the enable of is_tmpfs.
> > > > 
> > > > That makes sense, because rootfs will not become the root if a block
> > > > device is specified by 'root' in boot cmd, so it makes no sense to use
> > > > tmpfs, because ramfs is more simple.
> > > > 
> > > > Here, I make rootfs as ramfs for the same reason: the first mount is not
> > > > used as the root, so make it ramfs which is more simple.
> > > 
> > > Ok. If you don't mind I'd like to pull and test this before moving
> > > further. (Btw, I talked about this at Plumbers before btw.)
> > > What did you use for testing this? Any way you can share it?
> > 
> 
> I notice that it have been ten days, and is it ok to move a little
> further? (knock-knock :/)

Hey Menglong,

Since we're very close to the next kernel release it's unlikely that
anything will happen before the merge window has closed.
Otherwise I think we're close. I haven't had the time to test yet but if
nothing major comes up I'll pick it up and route it through my tree.
We need to be sure there's no regressions for anyone using this.

Thanks!
Christian

> 
> Thanks!
> Menglong Dong
> 
> > Ok, no problem definitely. I tested this function in 3 way mainly:
> > 
> > 1. I debug the kernel with qemu and gdb, and trace the the whole
> >    process, to ensure that there is no abnormal situation.
> > 2. I tested pivot_root() in initramfs and ensured that it can be
> >    used normally. What's more, I also tested docker and ensured
> >    container can run normally without 'DOCKER_RAMDISK=yes' set in
> >    initramfs.
> > 3. I tried to enable and disable CONFIG_INITRAMFS_MOUNT, and
> >    ensured that the system can boot successfully from initramfs, initrd
> >    and sda.
> > 
> > What's more, our team is going to test it comprehensively, such as
> > ltp, etc.
> > 
> > Thanks!
> > Menglong Dong                                                                                                                                                         
> > 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs
  2021-06-17 14:38             ` Christian Brauner
@ 2021-07-27 12:24               ` Menglong Dong
  2021-07-27 12:37                 ` Christian Brauner
  0 siblings, 1 reply; 16+ messages in thread
From: Menglong Dong @ 2021-07-27 12:24 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Kees Cook, Sami Tolvanen, johan, ojeda, jeyu,
	masahiroy, joe, Jan Kara, hare, Jens Axboe, tj, gregkh, song,
	NeilBrown, Andrew Morton, Rasmus Villemoes, Barret Rhoden,
	f.fainelli, palmerdabbelt, wangkefeng.wang, Masami Hiramatsu,
	Steven Rostedt, vbabka, Alexander Potapenko, pmladek,
	johannes.berg, Eric W. Biederman, jojing64, terrelln, geert,
	linux-fsdevel, LKML, Luis Chamberlain, arnd, Chris Down, mingo,
	Bjorn Helgaas, Josh Triplett

Hello Christian,

On Thu, Jun 17, 2021 at 10:38 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
[...]
>
> Hey Menglong,
>
> Since we're very close to the next kernel release it's unlikely that
> anything will happen before the merge window has closed.
> Otherwise I think we're close. I haven't had the time to test yet but if
> nothing major comes up I'll pick it up and route it through my tree.
> We need to be sure there's no regressions for anyone using this.
>

Seems that it has been a month, and is it ok to move a little
further? (knock-knock :/)

Thanks!
Menglong Dong

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs
  2021-07-27 12:24               ` Menglong Dong
@ 2021-07-27 12:37                 ` Christian Brauner
  2021-07-28  8:07                   ` Petr Mladek
  2021-09-17  1:58                   ` Menglong Dong
  0 siblings, 2 replies; 16+ messages in thread
From: Christian Brauner @ 2021-07-27 12:37 UTC (permalink / raw)
  To: Menglong Dong
  Cc: Alexander Viro, Kees Cook, Sami Tolvanen, johan, ojeda, jeyu,
	masahiroy, joe, Jan Kara, hare, Jens Axboe, tj, gregkh, song,
	NeilBrown, Andrew Morton, Rasmus Villemoes, Barret Rhoden,
	f.fainelli, palmerdabbelt, wangkefeng.wang, Masami Hiramatsu,
	Steven Rostedt, vbabka, Alexander Potapenko, pmladek,
	johannes.berg, Eric W. Biederman, jojing64, terrelln, geert,
	linux-fsdevel, LKML, Luis Chamberlain, arnd, Chris Down, mingo,
	Bjorn Helgaas, Josh Triplett

On Tue, Jul 27, 2021 at 08:24:03PM +0800, Menglong Dong wrote:
> Hello Christian,
> 
> On Thu, Jun 17, 2021 at 10:38 PM Christian Brauner
> <christian.brauner@ubuntu.com> wrote:
> >
> [...]
> >
> > Hey Menglong,
> >
> > Since we're very close to the next kernel release it's unlikely that
> > anything will happen before the merge window has closed.
> > Otherwise I think we're close. I haven't had the time to test yet but if
> > nothing major comes up I'll pick it up and route it through my tree.
> > We need to be sure there's no regressions for anyone using this.
> >
> 
> Seems that it has been a month, and is it ok to move a little
> further? (knock-knock :/)

Yep, sorry.
When I tested this early during the merge window it regressed booting a
regular system for me meaning if I compiled a kernel with this feature
enabled it complained about not being being able to open an initial
console and it dropped me right into initramfs instead of successfully
booting. I haven't looked into what this is caused or how to fix it for
lack of time.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs
  2021-07-27 12:37                 ` Christian Brauner
@ 2021-07-28  8:07                   ` Petr Mladek
  2021-07-29 13:25                     ` Menglong Dong
  2021-09-17  1:58                   ` Menglong Dong
  1 sibling, 1 reply; 16+ messages in thread
From: Petr Mladek @ 2021-07-28  8:07 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Menglong Dong, Alexander Viro, Kees Cook, Sami Tolvanen, johan,
	ojeda, jeyu, masahiroy, joe, Jan Kara, hare, Jens Axboe, tj,
	gregkh, song, NeilBrown, Andrew Morton, Rasmus Villemoes,
	Barret Rhoden, f.fainelli, palmerdabbelt, wangkefeng.wang,
	Masami Hiramatsu, Steven Rostedt, vbabka, Alexander Potapenko,
	johannes.berg, Eric W. Biederman, jojing64, terrelln, geert,
	linux-fsdevel, LKML, Luis Chamberlain, arnd, Chris Down, mingo,
	Bjorn Helgaas, Josh Triplett

On Tue 2021-07-27 14:37:01, Christian Brauner wrote:
> On Tue, Jul 27, 2021 at 08:24:03PM +0800, Menglong Dong wrote:
> > Hello Christian,
> > 
> > On Thu, Jun 17, 2021 at 10:38 PM Christian Brauner
> > <christian.brauner@ubuntu.com> wrote:
> > >
> > [...]
> > >
> > > Hey Menglong,
> > >
> > > Since we're very close to the next kernel release it's unlikely that
> > > anything will happen before the merge window has closed.
> > > Otherwise I think we're close. I haven't had the time to test yet but if
> > > nothing major comes up I'll pick it up and route it through my tree.
> > > We need to be sure there's no regressions for anyone using this.
> > >
> > 
> > Seems that it has been a month, and is it ok to move a little
> > further? (knock-knock :/)
> 
> Yep, sorry.
> When I tested this early during the merge window it regressed booting a
> regular system for me meaning if I compiled a kernel with this feature
> enabled it complained about not being being able to open an initial
> console and it dropped me right into initramfs instead of successfully
> booting. I haven't looked into what this is caused or how to fix it for
> lack of time.

I guess that you have seen the following message printed by
console_on_rootfs():

      "Warning: unable to open an initial console."

This function is responsible for opening stdin, stdout, stderr
file to be used by the init process.

I am not sure how this is supposed to work with the pivot_root
and initramfs.


Some more details:

console_on_rootfs() tries to open /dev/console. It is created
by tty_init(). The open() callback calls:

  + tty_kopen()
    + tty_lookup_driver()
      + console_device()

, where console_device() iterates over all registered consoles
  and returns the first with tty binding.

There is ttynull_console that might be used as a fallback. But I
am not sure if this is what you want.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs
  2021-07-28  8:07                   ` Petr Mladek
@ 2021-07-29 13:25                     ` Menglong Dong
  0 siblings, 0 replies; 16+ messages in thread
From: Menglong Dong @ 2021-07-29 13:25 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Christian Brauner, Alexander Viro, Kees Cook, Sami Tolvanen,
	johan, ojeda, jeyu, masahiroy, joe, Jan Kara, hare, Jens Axboe,
	tj, gregkh, song, NeilBrown, Andrew Morton, Rasmus Villemoes,
	Barret Rhoden, f.fainelli, palmerdabbelt, wangkefeng.wang,
	Masami Hiramatsu, Steven Rostedt, vbabka, Alexander Potapenko,
	johannes.berg, Eric W. Biederman, jojing64, terrelln, geert,
	linux-fsdevel, LKML, Luis Chamberlain, arnd, Chris Down, mingo,
	Bjorn Helgaas, Josh Triplett

Hello,

On Wed, Jul 28, 2021 at 4:07 PM Petr Mladek <pmladek@suse.com> wrote:
>
[...]
>
> I guess that you have seen the following message printed by
> console_on_rootfs():
>
>       "Warning: unable to open an initial console."
>
> This function is responsible for opening stdin, stdout, stderr
> file to be used by the init process.
>
> I am not sure how this is supposed to work with the pivot_root
> and initramfs.
>
>
> Some more details:
>
> console_on_rootfs() tries to open /dev/console. It is created
> by tty_init(). The open() callback calls:
>
>   + tty_kopen()
>     + tty_lookup_driver()
>       + console_device()
>
> , where console_device() iterates over all registered consoles
>   and returns the first with tty binding.
>
> There is ttynull_console that might be used as a fallback. But I
> am not sure if this is what you want.

I didn't figure out the relation between initramfs and initial console,
could you please tell me how this warning came up? I can't
reproduce it in qemu with this command:

qemu-system-x86_64 -nographic -m 2048M -smp cores=4,sockets=1 -s
-kernel ./bzImage -initrd ./rootfs.cpio -append "rdinit=/init
console=ttyS0"

Thanks!
Menglong Dong

>
> Best Regards,
> Petr

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs
  2021-07-27 12:37                 ` Christian Brauner
  2021-07-28  8:07                   ` Petr Mladek
@ 2021-09-17  1:58                   ` Menglong Dong
  1 sibling, 0 replies; 16+ messages in thread
From: Menglong Dong @ 2021-09-17  1:58 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, Kees Cook, Sami Tolvanen, johan, ojeda, jeyu,
	masahiroy, joe, Jan Kara, hare, Jens Axboe, tj, gregkh, song,
	NeilBrown, Andrew Morton, Rasmus Villemoes, Barret Rhoden,
	f.fainelli, palmerdabbelt, wangkefeng.wang, Masami Hiramatsu,
	Steven Rostedt, vbabka, Alexander Potapenko, Petr Mladek,
	johannes.berg, Eric W. Biederman, jojing64, terrelln, geert,
	linux-fsdevel, LKML, Luis Chamberlain, arnd, Chris Down, mingo,
	Bjorn Helgaas, Josh Triplett

Hello,

On Tue, Jul 27, 2021 at 8:37 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
[...]
>
> Yep, sorry.
> When I tested this early during the merge window it regressed booting a
> regular system for me meaning if I compiled a kernel with this feature
> enabled it complained about not being being able to open an initial
> console and it dropped me right into initramfs instead of successfully
> booting. I haven't looked into what this is caused or how to fix it for
> lack of time.

Our team has fully tested this feature, and no abnormalities have been
found yet.
What's more, this feature has been used in the product of our company. So if
there is any potential bug, as you mentioned above, I'd appreciate it if you can
spend some time on looking into it.

What's more, besides the problem that this feature solved, it has some more
benefits: saving memory. The amount of 'mnt_cache' is up to 50k when 180 docker
containers are created without this feature. However, only 15k 'mnt_cache' are
used with this feature enabled. Each 'mnt_cache' eats 320 bytes, so about 11M
memory is saved in this situation.

Please let me know if this feature is illogical or if there is any
better solution, thanks~

Best Wishes!
Menglong Dong

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-09-17  1:58 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-05  3:44 [PATCH v6 0/2] init/initramfs.c: make initramfs support pivot_root menglong8.dong
2021-06-05  3:44 ` [PATCH v6 1/2] init/main.c: introduce function ramdisk_exec_exist() menglong8.dong
2021-06-05  3:44 ` [PATCH v6 2/2] init/do_mounts.c: create second mount for initramfs menglong8.dong
2021-06-05 11:50   ` Christian Brauner
2021-06-05 14:47     ` Menglong Dong
2021-06-07 10:31       ` Christian Brauner
2021-06-07 12:15         ` menglong8.dong
2021-06-17  3:57           ` Menglong Dong
2021-06-17 14:38             ` Christian Brauner
2021-07-27 12:24               ` Menglong Dong
2021-07-27 12:37                 ` Christian Brauner
2021-07-28  8:07                   ` Petr Mladek
2021-07-29 13:25                     ` Menglong Dong
2021-09-17  1:58                   ` Menglong Dong
2021-06-09 14:03 ` [PATCH v6 0/2] init/initramfs.c: make initramfs support pivot_root Masami Hiramatsu
2021-06-10  6:36   ` Menglong Dong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).