Linux-Fsdevel Archive on lore.kernel.org
From: Ignat Korchagin <ignat@cloudflare.com>
To: Arvind Sankar <nivedita@alum.mit.edu>,
	James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>
Subject: Re: [PATCH] mnt: add support for non-rootfs initramfs
Date: Thu, 5 Mar 2020 22:53:54 +0000
Message-ID: <CALrw=nH3pOmjUqN44MkBPcBCXU4VrgT36Bs0R66aSdLPg08XQg@mail.gmail.com>
In-Reply-To: <20200305222117.GA1291132@rani.riverdale.lan>

On Thu, Mar 5, 2020 at 10:21 PM Arvind Sankar <nivedita@alum.mit.edu> wrote:
>
> On Thu, Mar 05, 2020 at 01:09:10PM -0800, James Bottomley wrote:
> > On Thu, 2020-03-05 at 19:35 +0000, Ignat Korchagin wrote:
> > > The main need for this is to support container runtimes on stateless
> > > Linux systems (pivot_root system call from initramfs).
> > >
> > > Normally, the task of initramfs is to mount and switch to a "real"
> > > root filesystem. However, on stateless systems (booting over the
> > > network) it is just convenient to have your "real" filesystem as
> > > initramfs from the start.
> > >
> > > This, however, breaks different container runtimes, because they
> > > usually use pivot_root system call after creating their mount
> > > namespace. But pivot_root does not work from initramfs, because
> > > initramfs runs from rootfs, which is the root of the mount tree and
> > > can't be unmounted.
> >
> > Can you say more about why this is a problem?  We use pivot_root to
> > pivot from the initramfs rootfs to the newly discovered and mounted
> > real root ... the same mechanism should work for a container (mount
> > namespace) running from initramfs ... why doesn't it?
>
> Not sure how it interacts with mount namespaces, but we don't use
> pivot_root to go from rootfs to the real root. We use switch_root, which
> moves the new root onto the old / using mount with MS_MOVE and then
> chroot to it.
>
> https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt
>
> >
> > The sequence usually looks like: create and enter a mount namespace,
> > build a tmpfs for the container in some $root directory then do
> >
> >
> >     cd $root
> >     mkdir old-root
> >     pivot_root . old-root
> >     mount --make-rprivate /old-root
> >     umount -l /old-root
> >     rmdir /old-root
> >
> > Once that's done you're disconnected from the initramfs root.  The
> > sequence is really no accident because it's what the initramfs would
> > have done to pivot to the new root anyway (that's where container
> > people got it from).
> >
> >
> > James
> >

Yes, to add to Arvind's point: the above sequence will only work with an
"old style" initrd (a block ramdisk with some filesystem image on top),
but will not work with a "new style" initramfs (just a disguised
tmpfs). The sequence will fail at "pivot_root" with EINVAL (see
pivot_root(2)). In fact, this patch conceptually tries to provide the same
behaviour as the "old style" initrd. Currently, if you use an initrd:
1. The kernel will create an empty "dummy" initramfs
2. Create a ramdisk
3. Unpack the FS image into the ramdisk
4. Mount the ramdisk
5. Do switch_root/move etc.

So the initial mount tree is: rootfs->some_initrd_fs
(pivot_root works here, and you get an empty rootfs by default)

With this option we end up with something similar: rootfs->tmpfs
and rootfs is empty, because the kernel never unpacked anything there.
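
For illustration only, here is a minimal sketch of how a script could tell
the two cases apart before attempting pivot_root. It is not part of the
patch. The function names root_fstype and pivot_root_expected_to_work are
hypothetical, and the sketch assumes that the last "/" entry in a
mountinfo-format file (see proc(5)) describes the topmost root mount:

```shell
#!/bin/sh
# Hypothetical helper: print the filesystem type of the mount whose
# mount point is "/" in a mountinfo-format file (see proc(5)).
# Assumption: if several mounts are stacked on "/", the last line
# listed is the one on top, so the last match wins.
root_fstype() {
    awk '$5 == "/" {
        # Optional fields start at field 7; a lone "-" ends them and
        # the filesystem type is the field right after it.
        for (i = 7; i <= NF; i++)
            if ($i == "-") { t = $(i + 1); break }
    }
    END { print t }' "${1:-/proc/self/mountinfo}"
}

# Hypothetical check: pivot_root(2) documents EINVAL when the current
# root is on the rootfs (initial ramfs) mount, so report failure then.
pivot_root_expected_to_work() {
    [ "$(root_fstype "$1")" != "rootfs" ]
}
```

With no argument both functions read /proc/self/mountinfo, so a script
could call pivot_root_expected_to_work and choose between pivot_root and
the MS_MOVE plus chroot approach accordingly. This is only a heuristic:
the authoritative answer is still the return value of pivot_root itself.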


Thread overview: 7+ messages
2020-03-05 19:35 Ignat Korchagin
2020-03-05 20:21 ` Al Viro
2020-03-05 22:45   ` Ignat Korchagin
2020-03-05 21:09 ` James Bottomley
2020-03-05 22:21   ` Arvind Sankar
2020-03-05 22:53     ` Ignat Korchagin [this message]
2020-03-11 14:01 ` Ignat Korchagin
