Re: [PATCHi, man-pages] mount_namespaces.7: More clearly explain "locked mounts"

From: ebiederm@xmission.com (Eric W. Biederman)
To: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: linux-man <linux-man@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	containers@lists.linux-foundation.org,
	Alejandro Colomar <alx.manpages@gmail.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	linux-kernel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCHi, man-pages] mount_namespaces.7: More clearly explain "locked mounts"
Date: Mon, 16 Aug 2021 11:03:03 -0500	[thread overview]
Message-ID: <87r1et1io8.fsf@disp2133> (raw)
In-Reply-To: <20210813220120.502058-1-mtk.manpages@gmail.com> (Michael Kerrisk's message of "Sat, 14 Aug 2021 00:01:20 +0200")

Michael Kerrisk <mtk.manpages@gmail.com> writes:

> For a long time, this manual page has had a brief discussion of
> "locked" mounts, without clearly saying what this concept is, or
> why it exists. Expand the discussion with an explanation of what
> locked mounts are, why mounts are locked, and some examples of the
> effect of locking.
>
> Thanks to Christian Brauner for a lot of help in understanding
> these details.
>
> Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
> ---
>
> Hello Eric and others,
>
> After some quite helpful info from Chrstian Brauner, I've expanded
> the discussion of locked mounts (a concept I didn't really have a
> good grasp on) in the mount_namespaces(7) manual page. I would be
> grateful to receive review comments, acks, etc., on the patch below.
> Could you take a look please?
>
> Cheers,
>
> Michael
>
>  man7/mount_namespaces.7 | 73 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
>
> diff --git a/man7/mount_namespaces.7 b/man7/mount_namespaces.7
> index e3468bdb7..97427c9ea 100644
> --- a/man7/mount_namespaces.7
> +++ b/man7/mount_namespaces.7
> @@ -107,6 +107,62 @@ operation brings across all of the mounts from the original
>  mount namespace as a single unit,
>  and recursive mounts that propagate between
>  mount namespaces propagate as a single unit.)
> +.IP
> +In this context, "may not be separated" means that the mounts
> +are locked so that they may not be individually unmounted.
> +Consider the following example:
> +.IP
> +.RS
> +.in +4n
> +.EX
> +$ \fBsudo mkdir /mnt/dir\fP
> +$ \fBsudo sh \-c \(aqecho "aaaaaa" > /mnt/dir/a\(aq\fP
> +$ \fBsudo mount \-\-bind -o ro /some/path /mnt/dir\fP
> +$ \fBls /mnt/dir\fP   # Former contents of directory are invisible

Do we want a more motivating example such as a /proc/sys?

It has been common to mount over /proc files and directories that can be
written to by the global root so that users in a mount namespace may not
touch them.

> +.EE
> +.in
> +.RE
> +.IP
> +The above steps, performed in a more privileged user namespace,
> +have created a (read-only) bind mount that
> +obscures the contents of the directory
> +.IR /mnt/dir .
> +For security reasons, it should not be possible to unmount
> +that mount in a less privileged user namespace,
> +since that would reveal the contents of the directory
> +.IR /mnt/dir .
 > +.IP
> +Suppose we now create a new mount namespace
> +owned by a (new) subordinate user namespace.
> +The new mount namespace will inherit copies of all of the mounts
> +from the previous mount namespace.
> +However, those mounts will be locked because the new mount namespace
> +is owned by a less privileged user namespace.
> +Consequently, an attempt to unmount the mount fails:
> +.IP
> +.RS
> +.in +4n
> +.EX
> +$ \fBsudo unshare \-\-user \-\-map\-root\-user \-\-mount \e\fP
> +               \fBstrace \-o /tmp/log \e\fP
> +               \fBumount /mnt/dir\fP
> +umount: /mnt/dir: not mounted.
> +$ \fBgrep \(aq^umount\(aq /tmp/log\fP
> +umount2("/mnt/dir", 0)     = \-1 EINVAL (Invalid argument)
> +.EE
> +.in
> +.RE
> +.IP
> +The error message from
> +.BR mount (8)
> +is a little confusing, but the
> +.BR strace (1)
> +output reveals that the underlying
> +.BR umount2 (2)
> +system call failed with the error
> +.BR EINVAL ,
> +which is the error that the kernel returns to indicate that
> +the mount is locked.

Do you want to mention that you can unmount the entire subtree?  Either
with pivot_root if it is locked to "/" or with
"umount -l /path/to/propagated/directory".

>  .IP *
>  The
>  .BR mount (2)
> @@ -128,6 +184,23 @@ settings become locked
>  when propagated from a more privileged to
>  a less privileged mount namespace,
>  and may not be changed in the less privileged mount namespace.
> +.IP
> +This point can be illustrated by a continuation of the previous example.
> +In that example, the bind mount was marked as read-only.
> +For security reasons,
> +it should not be possible to make the mount writable in
> +a less privileged namespace, and indeed the kernel prevents this,
> +as illustrated by the following:
> +.IP
> +.RS
> +.in +4n
> +.EX
> +$ \fBsudo unshare \-\-user \-\-map\-root\-user \-\-mount \e\fP
> +               \fBmount \-o remount,rw /mnt/dir\fP
> +mount: /mnt/dir: permission denied.
> +.EE
> +.in
> +.RE
>  .IP *
>  .\" (As of 3.18-rc1 (in Al Viro's 2014-08-30 vfs.git#for-next tree))
>  A file or directory that is a mount point in one namespace that is not

Eric