All of lore.kernel.org
 help / color / mirror / Atom feed
* Regression for MS_MOVE on kernel v5.1
@ 2019-06-12 22:54 Christian Brauner
  2019-06-13  4:00 ` Linus Torvalds
  2019-06-13  9:27 ` David Howells
  0 siblings, 2 replies; 9+ messages in thread
From: Christian Brauner @ 2019-06-12 22:54 UTC (permalink / raw)
  To: viro, linux-kernel, torvalds, linux-fsdevel, linux-api, dhowells

Hey,

Sorry to be the bearer of bad news but I think I observed a pretty
gnarly regression for userspace with MS_MOVE from kernel v5.1 onwards.

When propagating mounts across mount namespaces owned by different user
namespaces it is not possible anymore to move the mount in the less
privileged mount namespace.
Here is a reproducer:

sudo mount -t tmpfs tmpfs /mnt
sudo --make-rshared /mnt

# create unprivileged user + mount namespace and preserve propagation
unshare -U -m --map-root --propagation=unchanged

# now change back to the original mount namespace in another terminal:
sudo mkdir /mnt/aaa
sudo mount -t tmpfs tmpfs /mnt/aaa

# now in the unprivileged user + mount namespace
mount --move /mnt/aaa /opt

This will work on kernels prior to 5.1 but will fail on kernels starting
with 5.1.
Unfortunately, this is a pretty big deal for userspace. In LXD - which I
maintain when not doing kernel stuff - we use this mechanism to inject
mounts into running unprivileged containers. Users started reporting
failures against our mount injection feature just a short while ago
(cf.  [1], [2]) and I just came around to looking into this today.

I tracked this down to commit:

commit 3bd045cc9c4be2049602b47505256b43908b4e2f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Jan 30 13:15:45 2019 -0500

    separate copying and locking mount tree on cross-userns copies

    Rather than having propagate_mnt() check doing unprivileged copies,
    lock them before commit_tree().

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

reverting it makes MS_MOVE to work correctly again.
The commit changes the internal logic to lock mounts when propagating
mounts (user+)mount namespaces and - I believe - causes do_mount_move()
to fail at:

if (old->mnt.mnt_flags & MNT_LOCKED)
        goto out;

If that's indeed the case we should either revert this commit (reverts
cleanly, just tested it) or find a fix.

Thanks!
Christian

[1]: https://github.com/lxc/lxd/issues/5788
[2]: https://github.com/lxc/lxd/issues/5836

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression for MS_MOVE on kernel v5.1
  2019-06-12 22:54 Regression for MS_MOVE on kernel v5.1 Christian Brauner
@ 2019-06-13  4:00 ` Linus Torvalds
  2019-06-13 13:22   ` Christian Brauner
  2019-06-13  9:27 ` David Howells
  1 sibling, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2019-06-13  4:00 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Al Viro, Linux List Kernel Mailing, linux-fsdevel, Linux API,
	David Howells

On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote:
>
> The commit changes the internal logic to lock mounts when propagating
> mounts (user+)mount namespaces and - I believe - causes do_mount_move()
> to fail at:

You mean 'do_move_mount()'.

> if (old->mnt.mnt_flags & MNT_LOCKED)
>         goto out;
>
> If that's indeed the case we should either revert this commit (reverts
> cleanly, just tested it) or find a fix.

Hmm.. I'm not entirely sure of the logic here, and just looking at
that commit 3bd045cc9c4b ("separate copying and locking mount tree on
cross-userns copies") doesn't make me go "Ahh" either.

Al? My gut feel is that we need to just revert, since this was in 5.1
and it's getting reasonably late in 5.2 too. But maybe you go "guys,
don't be silly, this is easily fixed with this one-liner".

                      Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression for MS_MOVE on kernel v5.1
  2019-06-12 22:54 Regression for MS_MOVE on kernel v5.1 Christian Brauner
  2019-06-13  4:00 ` Linus Torvalds
@ 2019-06-13  9:27 ` David Howells
  1 sibling, 0 replies; 9+ messages in thread
From: David Howells @ 2019-06-13  9:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Eric W. Biederman, Christian Brauner, Al Viro,
	Linux List Kernel Mailing, linux-fsdevel, Linux API

[Adding Eric to the cc list since he implemented MNT_LOCKED]

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > The commit changes the internal logic to lock mounts when propagating
> > mounts (user+)mount namespaces and - I believe - causes do_mount_move()
> > to fail at:
> 
> You mean 'do_move_mount()'.
> 
> > if (old->mnt.mnt_flags & MNT_LOCKED)
> >         goto out;
> >
> > If that's indeed the case we should either revert this commit (reverts
> > cleanly, just tested it) or find a fix.
> 
> Hmm.. I'm not entirely sure of the logic here, and just looking at
> that commit 3bd045cc9c4b ("separate copying and locking mount tree on
> cross-userns copies") doesn't make me go "Ahh" either.
> 
> Al? My gut feel is that we need to just revert, since this was in 5.1
> and it's getting reasonably late in 5.2 too. But maybe you go "guys,
> don't be silly, this is easily fixed with this one-liner".

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression for MS_MOVE on kernel v5.1
  2019-06-13  4:00 ` Linus Torvalds
@ 2019-06-13 13:22   ` Christian Brauner
  2019-06-13 18:34     ` Eric W. Biederman
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Brauner @ 2019-06-13 13:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Linux List Kernel Mailing, linux-fsdevel, Linux API,
	David Howells, Eric W. Biederman

On Wed, Jun 12, 2019 at 06:00:39PM -1000, Linus Torvalds wrote:
> On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote:
> >
> > The commit changes the internal logic to lock mounts when propagating
> > mounts (user+)mount namespaces and - I believe - causes do_mount_move()
> > to fail at:
> 
> You mean 'do_move_mount()'.
> 
> > if (old->mnt.mnt_flags & MNT_LOCKED)
> >         goto out;
> >
> > If that's indeed the case we should either revert this commit (reverts
> > cleanly, just tested it) or find a fix.
> 
> Hmm.. I'm not entirely sure of the logic here, and just looking at
> that commit 3bd045cc9c4b ("separate copying and locking mount tree on
> cross-userns copies") doesn't make me go "Ahh" either.
> 
> Al? My gut feel is that we need to just revert, since this was in 5.1
> and it's getting reasonably late in 5.2 too. But maybe you go "guys,
> don't be silly, this is easily fixed with this one-liner".

David and I have been staring at that code today for a while together.
I think I made some sense of it.
One thing we weren't absolutely sure is if the old MS_MOVE behavior was
intentional or a bug. If it is a bug we have a problem since we quite
heavily rely on this...

So this whole cross-user+mnt namespace propagation mechanism comes with
a big hammer that Eric indeed did introduce a while back which is
MNT_LOCKED (cf. [1] for the relevant commit).

Afaict, MNT_LOCKED is (among other cases) supposed to prevent a user+mnt
namespace pair to get access to a mount that is hidden underneath an
additional mount. Consider the following scenario:

sudo mount -t tmpfs tmpfs /mnt
sudo mount --make-rshared /mnt
sudo mount -t tmpfs tmpfs /mnt
sudo mount --make-rshared /mnt
unshare -U -m --map-root --propagation=unchanged

umount /mnt
# or
mount --move -mnt /opt

The last umount/MS_MOVE is supposed to fail since the mount is locked
with MNT_LOCKED since umounting or MS_MOVing the mount would reveal the
underlying mount which I didn't have access to prior to the creation of
my user+mnt namespace pair.
(Whether or not this is a reasonable security mechanism is a separate
discussion.)

But now consider the case where from the ancestor user+mnt namespace
pair I do:

# propagate the mount to the user+mount namespace pair                 
sudo mount -t tmpfs tmpfs /mnt
# switch to the child user+mnt namespace pair
umount /mnt
# or
mount --move /mnt /opt

That umount/MS_MOVE should work since that mount was propagated to the
unprivileged task after the user+mnt namespace pair was created.
Also, because I already had access to the underlying mount in the first
place and second because this is literally the only way - we know of -
to inject a mount cross mount namespaces and this is a must have feature
that quite a lot of users rely on.

Christian

[1]: git show 5ff9d8a65ce80efb509ce4e8051394e9ed2cd942

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression for MS_MOVE on kernel v5.1
  2019-06-13 13:22   ` Christian Brauner
@ 2019-06-13 18:34     ` Eric W. Biederman
  2019-06-13 20:25       ` Miklos Szeredi
  0 siblings, 1 reply; 9+ messages in thread
From: Eric W. Biederman @ 2019-06-13 18:34 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linus Torvalds, Al Viro, Linux List Kernel Mailing,
	linux-fsdevel, Linux API, David Howells

Christian Brauner <christian@brauner.io> writes:

> On Wed, Jun 12, 2019 at 06:00:39PM -1000, Linus Torvalds wrote:
>> On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote:
>> >
>> > The commit changes the internal logic to lock mounts when propagating
>> > mounts (user+)mount namespaces and - I believe - causes do_mount_move()
>> > to fail at:
>> 
>> You mean 'do_move_mount()'.
>> 
>> > if (old->mnt.mnt_flags & MNT_LOCKED)
>> >         goto out;
>> >
>> > If that's indeed the case we should either revert this commit (reverts
>> > cleanly, just tested it) or find a fix.
>> 
>> Hmm.. I'm not entirely sure of the logic here, and just looking at
>> that commit 3bd045cc9c4b ("separate copying and locking mount tree on
>> cross-userns copies") doesn't make me go "Ahh" either.
>> 
>> Al? My gut feel is that we need to just revert, since this was in 5.1
>> and it's getting reasonably late in 5.2 too. But maybe you go "guys,
>> don't be silly, this is easily fixed with this one-liner".
>
> David and I have been staring at that code today for a while together.
> I think I made some sense of it.
> One thing we weren't absolutely sure is if the old MS_MOVE behavior was
> intentional or a bug. If it is a bug we have a problem since we quite
> heavily rely on this...

It was intentional.

The only mounts that are locked in propagation are the mounts that
propagate together.  If you see the mounts come in as individuals you
can always see/manipulate/work with the underlying mount.

I can think of only a few ways for MNT_LOCKED to become set:
a) unshare(CLONE_NEWNS)
b) mount --rclone /path/to/mnt/tree /path/to/propagation/point
c) mount --move /path/to/mnt/tree /path/to/propgation/point

Nothing in the target namespace should be locked on the propgation point
but all of the new mounts that came across as a unit should be locked
together.

> So this whole cross-user+mnt namespace propagation mechanism comes with
> a big hammer that Eric indeed did introduce a while back which is
> MNT_LOCKED (cf. [1] for the relevant commit).
>
> Afaict, MNT_LOCKED is (among other cases) supposed to prevent a user+mnt
> namespace pair to get access to a mount that is hidden underneath an
> additional mount. Consider the following scenario:
>
> sudo mount -t tmpfs tmpfs /mnt
> sudo mount --make-rshared /mnt
> sudo mount -t tmpfs tmpfs /mnt
> sudo mount --make-rshared /mnt
> unshare -U -m --map-root --propagation=unchanged
>
> umount /mnt
> # or
> mount --move -mnt /opt
>
> The last umount/MS_MOVE is supposed to fail since the mount is locked
> with MNT_LOCKED since umounting or MS_MOVing the mount would reveal the
> underlying mount which I didn't have access to prior to the creation of
> my user+mnt namespace pair.
> (Whether or not this is a reasonable security mechanism is a separate
> discussion.)
>
> But now consider the case where from the ancestor user+mnt namespace
> pair I do:
>
> # propagate the mount to the user+mount namespace pair                 
> sudo mount -t tmpfs tmpfs /mnt
> # switch to the child user+mnt namespace pair
> umount /mnt
> # or
> mount --move /mnt /opt
>
> That umount/MS_MOVE should work since that mount was propagated to the
> unprivileged task after the user+mnt namespace pair was created.
> Also, because I already had access to the underlying mount in the first
> place and second because this is literally the only way - we know of -
> to inject a mount cross mount namespaces and this is a must have feature
> that quite a lot of users rely on.

Then it breaking is definitely a regression that needs to be fixed.

I believe the problematic change as made because the new mount
api allows attaching floating mounts.  Or that was the plan last I
looked.   Those floating mounts don't have a mnt_ns so will result
in a NULL pointer dereference when they are attached.

So I suspect fixing this is not as simple as reverting a single patch.

Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression for MS_MOVE on kernel v5.1
  2019-06-13 18:34     ` Eric W. Biederman
@ 2019-06-13 20:25       ` Miklos Szeredi
  2019-06-13 21:59         ` Eric W. Biederman
  0 siblings, 1 reply; 9+ messages in thread
From: Miklos Szeredi @ 2019-06-13 20:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Christian Brauner, Linus Torvalds, Al Viro,
	Linux List Kernel Mailing, linux-fsdevel, Linux API,
	David Howells

On Thu, Jun 13, 2019 at 8:35 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Christian Brauner <christian@brauner.io> writes:
>
> > On Wed, Jun 12, 2019 at 06:00:39PM -1000, Linus Torvalds wrote:
> >> On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote:
> >> >
> >> > The commit changes the internal logic to lock mounts when propagating
> >> > mounts (user+)mount namespaces and - I believe - causes do_mount_move()
> >> > to fail at:
> >>
> >> You mean 'do_move_mount()'.
> >>
> >> > if (old->mnt.mnt_flags & MNT_LOCKED)
> >> >         goto out;
> >> >
> >> > If that's indeed the case we should either revert this commit (reverts
> >> > cleanly, just tested it) or find a fix.
> >>
> >> Hmm.. I'm not entirely sure of the logic here, and just looking at
> >> that commit 3bd045cc9c4b ("separate copying and locking mount tree on
> >> cross-userns copies") doesn't make me go "Ahh" either.
> >>
> >> Al? My gut feel is that we need to just revert, since this was in 5.1
> >> and it's getting reasonably late in 5.2 too. But maybe you go "guys,
> >> don't be silly, this is easily fixed with this one-liner".
> >
> > David and I have been staring at that code today for a while together.
> > I think I made some sense of it.
> > One thing we weren't absolutely sure is if the old MS_MOVE behavior was
> > intentional or a bug. If it is a bug we have a problem since we quite
> > heavily rely on this...
>
> It was intentional.
>
> The only mounts that are locked in propagation are the mounts that
> propagate together.  If you see the mounts come in as individuals you
> can always see/manipulate/work with the underlying mount.
>
> I can think of only a few ways for MNT_LOCKED to become set:
> a) unshare(CLONE_NEWNS)
> b) mount --rclone /path/to/mnt/tree /path/to/propagation/point
> c) mount --move /path/to/mnt/tree /path/to/propgation/point
>
> Nothing in the target namespace should be locked on the propgation point
> but all of the new mounts that came across as a unit should be locked
> together.

Locked together means the root of the new mount tree doesn't have
MNT_LOCKED set, but all mounts below do have MNT_LOCKED, right?

Isn't the bug here that the root mount gets MNT_LOCKED as well?

>
> Then it breaking is definitely a regression that needs to be fixed.
>
> I believe the problematic change as made because the new mount
> api allows attaching floating mounts.  Or that was the plan last I
> looked.   Those floating mounts don't have a mnt_ns so will result
> in a NULL pointer dereference when they are attached.

Well, it's called anonymous namespace.  So there *is* an mnt_ns, and
its lifetime is bound to the file returned by fsmount().

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression for MS_MOVE on kernel v5.1
  2019-06-13 20:25       ` Miklos Szeredi
@ 2019-06-13 21:59         ` Eric W. Biederman
  2019-06-13 23:37           ` Christian Brauner
  0 siblings, 1 reply; 9+ messages in thread
From: Eric W. Biederman @ 2019-06-13 21:59 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Christian Brauner, Linus Torvalds, Al Viro,
	Linux List Kernel Mailing, linux-fsdevel, Linux API,
	David Howells

Miklos Szeredi <miklos@szeredi.hu> writes:

> On Thu, Jun 13, 2019 at 8:35 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Christian Brauner <christian@brauner.io> writes:
>>
>> > On Wed, Jun 12, 2019 at 06:00:39PM -1000, Linus Torvalds wrote:
>> >> On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote:
>> >> >
>> >> > The commit changes the internal logic to lock mounts when propagating
>> >> > mounts (user+)mount namespaces and - I believe - causes do_mount_move()
>> >> > to fail at:
>> >>
>> >> You mean 'do_move_mount()'.
>> >>
>> >> > if (old->mnt.mnt_flags & MNT_LOCKED)
>> >> >         goto out;
>> >> >
>> >> > If that's indeed the case we should either revert this commit (reverts
>> >> > cleanly, just tested it) or find a fix.
>> >>
>> >> Hmm.. I'm not entirely sure of the logic here, and just looking at
>> >> that commit 3bd045cc9c4b ("separate copying and locking mount tree on
>> >> cross-userns copies") doesn't make me go "Ahh" either.
>> >>
>> >> Al? My gut feel is that we need to just revert, since this was in 5.1
>> >> and it's getting reasonably late in 5.2 too. But maybe you go "guys,
>> >> don't be silly, this is easily fixed with this one-liner".
>> >
>> > David and I have been staring at that code today for a while together.
>> > I think I made some sense of it.
>> > One thing we weren't absolutely sure is if the old MS_MOVE behavior was
>> > intentional or a bug. If it is a bug we have a problem since we quite
>> > heavily rely on this...
>>
>> It was intentional.
>>
>> The only mounts that are locked in propagation are the mounts that
>> propagate together.  If you see the mounts come in as individuals you
>> can always see/manipulate/work with the underlying mount.
>>
>> I can think of only a few ways for MNT_LOCKED to become set:
>> a) unshare(CLONE_NEWNS)
>> b) mount --rclone /path/to/mnt/tree /path/to/propagation/point
>> c) mount --move /path/to/mnt/tree /path/to/propgation/point
>>
>> Nothing in the target namespace should be locked on the propgation point
>> but all of the new mounts that came across as a unit should be locked
>> together.
>
> Locked together means the root of the new mount tree doesn't have
> MNT_LOCKED set, but all mounts below do have MNT_LOCKED, right?
>
> Isn't the bug here that the root mount gets MNT_LOCKED as well?

Yes, and the code to remove MNT_LOCKED is still sitting there in
propogate_one right after it calls copy_tree.  It should be a trivial
matter of moving that change to after the lock_mnt_tree call.

Now that I have been elightened about anonymous mount namespaces
I am suspecting that we want to take the user_namespace of the anonymous
namespace into account when deciding to lock the mounts.

>> Then it breaking is definitely a regression that needs to be fixed.
>>
>> I believe the problematic change as made because the new mount
>> api allows attaching floating mounts.  Or that was the plan last I
>> looked.   Those floating mounts don't have a mnt_ns so will result
>> in a NULL pointer dereference when they are attached.
>
> Well, it's called anonymous namespace.  So there *is* an mnt_ns, and
> its lifetime is bound to the file returned by fsmount().

Interesting.  That has changed since I last saw the patches.

Below is what will probably be a straight forward fix for the regression.

Eric

diff --git a/fs/namespace.c b/fs/namespace.c
index ffb13f0562b0..a39edeecbc46 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2105,6 +2105,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
                /* Notice when we are propagating across user namespaces */
                if (child->mnt_parent->mnt_ns->user_ns != user_ns)
                        lock_mnt_tree(child);
+               child->mnt.mnt_flags &= ~MNT_LOCKED;
                commit_tree(child);
        }
        put_mountpoint(smp);
diff --git a/fs/pnode.c b/fs/pnode.c
index 7ea6cfb65077..012be405fec0 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -262,7 +262,6 @@ static int propagate_one(struct mount *m)
        child = copy_tree(last_source, last_source->mnt.mnt_root, type);
        if (IS_ERR(child))
                return PTR_ERR(child);
-       child->mnt.mnt_flags &= ~MNT_LOCKED;
        mnt_set_mountpoint(m, mp, child);
        last_dest = m;
        last_source = child;



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: Regression for MS_MOVE on kernel v5.1
  2019-06-13 21:59         ` Eric W. Biederman
@ 2019-06-13 23:37           ` Christian Brauner
  2019-06-14 12:08             ` Eric W. Biederman
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Brauner @ 2019-06-13 23:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Miklos Szeredi, Linus Torvalds, Al Viro,
	Linux List Kernel Mailing, linux-fsdevel, Linux API,
	David Howells

On Thu, Jun 13, 2019 at 04:59:24PM -0500, Eric W. Biederman wrote:
> Miklos Szeredi <miklos@szeredi.hu> writes:
> 
> > On Thu, Jun 13, 2019 at 8:35 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >>
> >> Christian Brauner <christian@brauner.io> writes:
> >>
> >> > On Wed, Jun 12, 2019 at 06:00:39PM -1000, Linus Torvalds wrote:
> >> >> On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote:
> >> >> >
> >> >> > The commit changes the internal logic to lock mounts when propagating
> >> >> > mounts (user+)mount namespaces and - I believe - causes do_mount_move()
> >> >> > to fail at:
> >> >>
> >> >> You mean 'do_move_mount()'.
> >> >>
> >> >> > if (old->mnt.mnt_flags & MNT_LOCKED)
> >> >> >         goto out;
> >> >> >
> >> >> > If that's indeed the case we should either revert this commit (reverts
> >> >> > cleanly, just tested it) or find a fix.
> >> >>
> >> >> Hmm.. I'm not entirely sure of the logic here, and just looking at
> >> >> that commit 3bd045cc9c4b ("separate copying and locking mount tree on
> >> >> cross-userns copies") doesn't make me go "Ahh" either.
> >> >>
> >> >> Al? My gut feel is that we need to just revert, since this was in 5.1
> >> >> and it's getting reasonably late in 5.2 too. But maybe you go "guys,
> >> >> don't be silly, this is easily fixed with this one-liner".
> >> >
> >> > David and I have been staring at that code today for a while together.
> >> > I think I made some sense of it.
> >> > One thing we weren't absolutely sure is if the old MS_MOVE behavior was
> >> > intentional or a bug. If it is a bug we have a problem since we quite
> >> > heavily rely on this...
> >>
> >> It was intentional.
> >>
> >> The only mounts that are locked in propagation are the mounts that
> >> propagate together.  If you see the mounts come in as individuals you
> >> can always see/manipulate/work with the underlying mount.
> >>
> >> I can think of only a few ways for MNT_LOCKED to become set:
> >> a) unshare(CLONE_NEWNS)
> >> b) mount --rclone /path/to/mnt/tree /path/to/propagation/point
> >> c) mount --move /path/to/mnt/tree /path/to/propgation/point
> >>
> >> Nothing in the target namespace should be locked on the propgation point
> >> but all of the new mounts that came across as a unit should be locked
> >> together.
> >
> > Locked together means the root of the new mount tree doesn't have
> > MNT_LOCKED set, but all mounts below do have MNT_LOCKED, right?
> >
> > Isn't the bug here that the root mount gets MNT_LOCKED as well?

Yes, we suspected this as well. We just couldn't pinpoint where the
surgery would need to start.

> 
> Yes, and the code to remove MNT_LOCKED is still sitting there in
> propogate_one right after it calls copy_tree.  It should be a trivial
> matter of moving that change to after the lock_mnt_tree call.
> 
> Now that I have been elightened about anonymous mount namespaces
> I am suspecting that we want to take the user_namespace of the anonymous
> namespace into account when deciding to lock the mounts.
> 
> >> Then it breaking is definitely a regression that needs to be fixed.
> >>
> >> I believe the problematic change as made because the new mount
> >> api allows attaching floating mounts.  Or that was the plan last I
> >> looked.   Those floating mounts don't have a mnt_ns so will result
> >> in a NULL pointer dereference when they are attached.
> >
> > Well, it's called anonymous namespace.  So there *is* an mnt_ns, and
> > its lifetime is bound to the file returned by fsmount().
> 
> Interesting.  That has changed since I last saw the patches.
> 
> Below is what will probably be a straight forward fix for the regression.

Tested the patch just now applied on top of v5.1. It fixes the
regression.
Can you please send a proper patch, Eric?

Tested-by: Christian Brauner <christian@brauner.io>
Acked-by: Christian Brauner <christian@brauner.io>

> 
> Eric
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index ffb13f0562b0..a39edeecbc46 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2105,6 +2105,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>                 /* Notice when we are propagating across user namespaces */
>                 if (child->mnt_parent->mnt_ns->user_ns != user_ns)
>                         lock_mnt_tree(child);
> +               child->mnt.mnt_flags &= ~MNT_LOCKED;
>                 commit_tree(child);
>         }
>         put_mountpoint(smp);
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 7ea6cfb65077..012be405fec0 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -262,7 +262,6 @@ static int propagate_one(struct mount *m)
>         child = copy_tree(last_source, last_source->mnt.mnt_root, type);
>         if (IS_ERR(child))
>                 return PTR_ERR(child);
> -       child->mnt.mnt_flags &= ~MNT_LOCKED;
>         mnt_set_mountpoint(m, mp, child);
>         last_dest = m;
>         last_source = child;
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Regression for MS_MOVE on kernel v5.1
  2019-06-13 23:37           ` Christian Brauner
@ 2019-06-14 12:08             ` Eric W. Biederman
  0 siblings, 0 replies; 9+ messages in thread
From: Eric W. Biederman @ 2019-06-14 12:08 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Miklos Szeredi, Linus Torvalds, Al Viro,
	Linux List Kernel Mailing, linux-fsdevel, Linux API,
	David Howells

Christian Brauner <christian@brauner.io> writes:

> On Thu, Jun 13, 2019 at 04:59:24PM -0500, Eric W. Biederman wrote:
>> Miklos Szeredi <miklos@szeredi.hu> writes:
>> 
>> > On Thu, Jun 13, 2019 at 8:35 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> >>
>> >> Christian Brauner <christian@brauner.io> writes:
>> >>
>> >> > On Wed, Jun 12, 2019 at 06:00:39PM -1000, Linus Torvalds wrote:
>> >> >> On Wed, Jun 12, 2019 at 12:54 PM Christian Brauner <christian@brauner.io> wrote:
>> >> >> >
>> >> >> > The commit changes the internal logic to lock mounts when propagating
>> >> >> > mounts (user+)mount namespaces and - I believe - causes do_mount_move()
>> >> >> > to fail at:
>> >> >>
>> >> >> You mean 'do_move_mount()'.
>> >> >>
>> >> >> > if (old->mnt.mnt_flags & MNT_LOCKED)
>> >> >> >         goto out;
>> >> >> >
>> >> >> > If that's indeed the case we should either revert this commit (reverts
>> >> >> > cleanly, just tested it) or find a fix.
>> >> >>
>> >> >> Hmm.. I'm not entirely sure of the logic here, and just looking at
>> >> >> that commit 3bd045cc9c4b ("separate copying and locking mount tree on
>> >> >> cross-userns copies") doesn't make me go "Ahh" either.
>> >> >>
>> >> >> Al? My gut feel is that we need to just revert, since this was in 5.1
>> >> >> and it's getting reasonably late in 5.2 too. But maybe you go "guys,
>> >> >> don't be silly, this is easily fixed with this one-liner".
>> >> >
>> >> > David and I have been staring at that code today for a while together.
>> >> > I think I made some sense of it.
>> >> > One thing we weren't absolutely sure is if the old MS_MOVE behavior was
>> >> > intentional or a bug. If it is a bug we have a problem since we quite
>> >> > heavily rely on this...
>> >>
>> >> It was intentional.
>> >>
>> >> The only mounts that are locked in propagation are the mounts that
>> >> propagate together.  If you see the mounts come in as individuals you
>> >> can always see/manipulate/work with the underlying mount.
>> >>
>> >> I can think of only a few ways for MNT_LOCKED to become set:
>> >> a) unshare(CLONE_NEWNS)
>> >> b) mount --rclone /path/to/mnt/tree /path/to/propagation/point
>> >> c) mount --move /path/to/mnt/tree /path/to/propgation/point
>> >>
>> >> Nothing in the target namespace should be locked on the propgation point
>> >> but all of the new mounts that came across as a unit should be locked
>> >> together.
>> >
>> > Locked together means the root of the new mount tree doesn't have
>> > MNT_LOCKED set, but all mounts below do have MNT_LOCKED, right?
>> >
>> > Isn't the bug here that the root mount gets MNT_LOCKED as well?
>
> Yes, we suspected this as well. We just couldn't pinpoint where the
> surgery would need to start.
>
>> 
>> Yes, and the code to remove MNT_LOCKED is still sitting there in
>> propogate_one right after it calls copy_tree.  It should be a trivial
>> matter of moving that change to after the lock_mnt_tree call.
>> 
>> Now that I have been elightened about anonymous mount namespaces
>> I am suspecting that we want to take the user_namespace of the anonymous
>> namespace into account when deciding to lock the mounts.
>> 
>> >> Then it breaking is definitely a regression that needs to be fixed.
>> >>
>> >> I believe the problematic change as made because the new mount
>> >> api allows attaching floating mounts.  Or that was the plan last I
>> >> looked.   Those floating mounts don't have a mnt_ns so will result
>> >> in a NULL pointer dereference when they are attached.
>> >
>> > Well, it's called anonymous namespace.  So there *is* an mnt_ns, and
>> > its lifetime is bound to the file returned by fsmount().
>> 
>> Interesting.  That has changed since I last saw the patches.
>> 
>> Below is what will probably be a straight forward fix for the regression.
>
> Tested the patch just now applied on top of v5.1. It fixes the
> regression.
> Can you please send a proper patch, Eric?
>
> Tested-by: Christian Brauner <christian@brauner.io>
> Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

I will let Al or whoever take this over the finish line.
I am too sleep deprived at the moment to say anything about the quality
of my patch.

Eric

>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index ffb13f0562b0..a39edeecbc46 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -2105,6 +2105,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>>                 /* Notice when we are propagating across user namespaces */
>>                 if (child->mnt_parent->mnt_ns->user_ns != user_ns)
>>                         lock_mnt_tree(child);
>> +               child->mnt.mnt_flags &= ~MNT_LOCKED;
>>                 commit_tree(child);
>>         }
>>         put_mountpoint(smp);
>> diff --git a/fs/pnode.c b/fs/pnode.c
>> index 7ea6cfb65077..012be405fec0 100644
>> --- a/fs/pnode.c
>> +++ b/fs/pnode.c
>> @@ -262,7 +262,6 @@ static int propagate_one(struct mount *m)
>>         child = copy_tree(last_source, last_source->mnt.mnt_root, type);
>>         if (IS_ERR(child))
>>                 return PTR_ERR(child);
>> -       child->mnt.mnt_flags &= ~MNT_LOCKED;
>>         mnt_set_mountpoint(m, mp, child);
>>         last_dest = m;
>>         last_source = child;
>> 
>> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-06-14 12:09 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-12 22:54 Regression for MS_MOVE on kernel v5.1 Christian Brauner
2019-06-13  4:00 ` Linus Torvalds
2019-06-13 13:22   ` Christian Brauner
2019-06-13 18:34     ` Eric W. Biederman
2019-06-13 20:25       ` Miklos Szeredi
2019-06-13 21:59         ` Eric W. Biederman
2019-06-13 23:37           ` Christian Brauner
2019-06-14 12:08             ` Eric W. Biederman
2019-06-13  9:27 ` David Howells

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.