linux-man.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: mtk.manpages@gmail.com,
	Christian Brauner <christian.brauner@ubuntu.com>,
	linux-man <linux-man@vger.kernel.org>,
	Containers <containers@lists.linux-foundation.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Jordan Ogas <jogas@lanl.gov>,
	werner@almesberger.net, Al Viro <viro@ftp.linux.org.uk>
Subject: Re: pivot_root(".", ".") and the fchdir() dance
Date: Mon, 7 Oct 2019 14:02:12 +0300	[thread overview]
Message-ID: <7e4b23df-ab83-3d5a-3dc5-54025e3682cf@gmail.com> (raw)
In-Reply-To: <87y2y6j9i1.fsf@x220.int.ebiederm.org>

Hello Eric,

On 9/30/19 2:42 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> 
>> Hello Eric,
>>
>> A ping on my question below. Could you take a look please?
>>
>> Thanks,
>>
>> Michael
>>
>>>>>> The concern from our conversation at the container mini-summit was that
>>>>>> there is a pathology if in your initial mount namespace all of the
>>>>>> mounts are marked MS_SHARED like systemd does (and is almost necessary
>>>>>> if you are going to use mount propagation), that if new_root itself
>>>>>> is MS_SHARED then unmounting the old_root could propagate.
>>>>>>
>>>>>> So I believe the desired sequence is:
>>>>>>
>>>>>>>>>             chdir(new_root);
>>>>>> +++            mount("", ".", MS_SLAVE | MS_REC, NULL);
>>>>>>>>>             pivot_root(".", ".");
>>>>>>>>>             umount2(".", MNT_DETACH);
>>>>>>
>>>>>> The change to new new_root could be either MS_SLAVE or MS_PRIVATE.  So
>>>>>> long as it is not MS_SHARED the mount won't propagate back to the
>>>>>> parent mount namespace.
>>>>>
>>>>> Thanks. I made that change.
>>>>
>>>> For what it is worth.  The sequence above without the change in mount
>>>> attributes will fail if it is necessary to change the mount attributes
>>>> as "." is both put_old as well as new_root.
>>>>
>>>> When I initially suggested the change I saw "." was new_root and forgot
>>>> "." was also put_old.  So I thought there was a silent danger without
>>>> that sequence.
>>>
>>> So, now I am a little confused by the comments you added here. Do you
>>> now mean that the
>>>
>>> mount("", ".", MS_SLAVE | MS_REC, NULL);
>>>
>>> call is not actually necessary?
> 
> Apologies for being slow getting back to you.

NP. Thanks for your reply.

> To my knowledge there are two cases where pivot_root is used.
> - In the initial mount namespace from a ramdisk when mounting root.
>    This is the original use case and somewhat historical as rootfs
>    (aka an initial ramfs) may not be unmounted.
> 
> - When setting up a new mount namespace to jettison all of the mounts
>    you don't need.
> 
> The sequence:
> 
> 	chdir(new_root);
>          pivot_root(".", ".");
>          umount2(".", MNT_DETACH);
> 
> is perfect for both use cases (as nothing needs to be known about the
> directory layout of the new root filesystem).
> 
> In the case when you are setting up a new mount namespace propogating
> changes in the mount layout to another mount namespace is fatal.  But
> that is not a concern for using that pivot_root sequence above because
> pivot_root will fail deterministically if
> 'mount("", ".", MS_SLAVE | MS_REC, NULL)' is needed but not specified.
> 
> So I would document the above sequence of three system calls in the
> man-page.

Okay. I've changed the example to be just those three calls.

> I would document that pivot_root will fail if propagation would occur.

Yep. That's in the page already.

> I would document in pivot_root or under unshare(CLONE_NEWNS) that if
> mount propagation is enabled (the default with systemd) that you
> need to call 'mount("", "/", MS_SLAVE | MS_REC, NULL);' or
> 'mount("", "/", MS_PRIVATE | MS_REC, NULL);' after creating a mount
> namespace.  Or mounts will propagate backwards, which is usually
> not what people want.

Thanks. Instead, I have added the following text to
mount_namespaces(7), the page that is referred to by both clone(2) and 
unshare(2) in their discussions of CLONE_NEWNS:

        An   application  that  creates  a  new  mount  namespace
        directly using clone(2) or unshare(2) may desire to  pre‐
        vent  propagation  of  mount events to other mount names‐
        paces (as is is done by unshare(1)).  This can be done by
        changing  the propagation type of mount points in the new
        namesapace to either MS_SLAVE  or  MS_PRIVATE.   using  a
        call such as the following:

                   mount(NULL, "/", MS_SLAVE | MS_REC, NULL);

> Creating of a mount namespace in a user namespace automatically does
> 'mount("", "/", MS_SLAVE | MS_REC, NULL);' if the starting mount
> namespace was not created in that user namespace.  AKA creating
> a mount namespace in a user namespace does the unshare for you.

Oh -- I had forgotten that detail. But it is documented
(by you, I think) in mount_namespaces(7):

        *  A  mount  namespace  has  an  owner user namespace.  A
           mount namespace whose owner user namespace is  differ‐
           ent  from the owner user namespace of its parent mount
           namespace is considered a less privileged mount names‐
           pace.

        *  When  creating  a  less  privileged  mount  namespace,
           shared mounts are reduced to  slave  mounts.   (Shared
           and  slave  mounts are discussed below.)  This ensures
           that  mappings  performed  in  less  privileged  mount
           namespaces will not propagate to more privileged mount
           namespaces.

There's one point that description that troubles me. There is a
reference to "parent mount namespace", but as I understand things
there is no parental relationship among mount namespaces instances
(or am I wrong?). Should that wording not be rather something
like "the mount namespace of the process that created this mount
namespace"?

Thanks,

Michael



  reply	other threads:[~2019-10-07 11:02 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-01 13:38 pivot_root(".", ".") and the fchdir() dance Michael Kerrisk (man-pages)
2019-08-05 10:36 ` Aleksa Sarai
2019-08-05 12:29   ` Michael Kerrisk (man-pages)
2019-08-05 13:37     ` Aleksa Sarai
2019-08-06 19:35       ` Michael Kerrisk (man-pages)
2019-08-06  8:12     ` Philipp Wendler
2019-08-06 12:03       ` Michael Kerrisk (man-pages)
2019-09-09 10:40         ` Eric W. Biederman
2019-09-09 14:48           ` Michael Kerrisk (man-pages)
2019-09-09 23:40             ` Eric W. Biederman
2019-09-10 10:27               ` Michael Kerrisk (man-pages)
2019-09-10 11:15                 ` Christian Brauner
2019-09-10 11:21                   ` Michael Kerrisk (man-pages)
2019-09-10 23:06                     ` Eric W. Biederman
2019-09-15  8:12                       ` Michael Kerrisk (man-pages)
2019-09-15 18:17                         ` Eric W. Biederman
2019-09-23 11:10                           ` Michael Kerrisk (man-pages)
2019-09-28 15:05                             ` Michael Kerrisk (man-pages)
2019-09-30 11:42                               ` Eric W. Biederman
2019-10-07 11:02                                 ` Michael Kerrisk (man-pages) [this message]
2019-10-07 15:46                                   ` Eric W. Biederman
2019-10-08 14:27                                     ` Michael Kerrisk (man-pages)
2019-10-08 19:40                                       ` Eric W. Biederman
2019-10-08 21:40                                         ` Michael Kerrisk (man-pages)
2019-10-08 22:16                                           ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7e4b23df-ab83-3d5a-3dc5-54025e3682cf@gmail.com \
    --to=mtk.manpages@gmail.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=jogas@lanl.gov \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=viro@ftp.linux.org.uk \
    --cc=werner@almesberger.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).