From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from out01.mta.xmission.com ([166.70.13.231]:53936 "EHLO
        out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752446AbdBCVDU (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Fri, 3 Feb 2017 16:03:20 -0500
From: ebiederm@xmission.com (Eric W. Biederman)
To: Ram Pai <linuxram@us.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>, linux-fsdevel@vger.kernel.org,
        Andrei Vagin <avagin@virtuozzo.com>
References: <87d1fth8mh.fsf_-_@xmission.com>
        <20170112054548.GT1555@ZenIV.linux.org.uk>
        <87tw8u42fq.fsf_-_@xmission.com>
        <20170121035827.GA5657@ram.oc3035372033.ibm.com>
        <87efzxt5em.fsf@xmission.com>
        <20170123190201.GC5657@ram.oc3035372033.ibm.com>
        <87d1fd1fem.fsf@xmission.com> <87k2977deq.fsf@xmission.com>
        <20170203171019.GC5705@ram.oc3035372033.ibm.com>
        <87lgtn167n.fsf@xmission.com>
        <20170203202814.GD5705@ram.oc3035372033.ibm.com>
Date: Sat, 04 Feb 2017 09:58:39 +1300
In-Reply-To: <20170203202814.GD5705@ram.oc3035372033.ibm.com> (Ram Pai's
        message of "Fri, 3 Feb 2017 12:28:14 -0800")
Message-ID: <877f57uh34.fsf@xmission.com>
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [PATCH v5] mnt: Tuck mounts under others instead of creating shadow/side mounts.
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

Ram Pai <linuxram@us.ibm.com> writes:

> On Sat, Feb 04, 2017 at 07:26:20AM +1300, Eric W. Biederman wrote:
>> Ram Pai <linuxram@us.ibm.com> writes:
>> 
>> > On Fri, Feb 03, 2017 at 11:54:21PM +1300, Eric W. Biederman wrote:
>> >> ebiederm@xmission.com (Eric W. Biederman) writes:
>> >> 
>> >> > Ram Pai <linuxram@us.ibm.com> writes:
>> >> >
>> >> >> On Sat, Jan 21, 2017 at 05:15:29PM +1300, Eric W. Biederman wrote:
>> >> >>> Ram Pai <linuxram@us.ibm.com> writes:
>> >> >>> 
>> >> >>> >> @@ -359,12 +373,24 @@ int propagate_mount_busy(struct mount *mnt, int refcnt)
>> >> >>> >> 
>> >> >>> >>  	for (m = propagation_next(parent, parent); m;
>> >> >>> >>  	     		m = propagation_next(m, parent)) {
>> >> >>> >> -		child = __lookup_mnt_last(&m->mnt, mnt->mnt_mountpoint);
>> >> >>> >> -		if (child && list_empty(&child->mnt_mounts) &&
>> >> >>> >> -		    (ret = do_refcount_check(child, 1)))
>> >> >>> >> -			break;
>> >> >>> >> +		int count = 1;
>> >> >>> >> +		child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint);
>> >> >>> >> +		if (!child)
>> >> >>> >> +			continue;
>> >> >>> >> +
>> >> >>> >> +		/* Is there exactly one mount on the child that covers
>> >> >>> >> +		 * it completely whose reference should be ignored?
>> >> >>> >> +		 */
>> >> >>> >> +		topper = find_topper(child);
>> >> >>> >
>> >> >>> > This is tricky. I understand it is trying to identify the case where a
>> >> >>> > mount got tucked-in because of propagation.  But this will not
>> >> >>> > distinguish the case where a mount got over-mounted genuinely, not because of
>> >> >>> > propagation, but because of explicit user action.
>> >> >>> >
>> >> >>> >
>> >> >>> > example:
>> >> >>> >
>> >> >>> > case 1: (explicit user action)
>> >> >>> > 	B is a slave of A
>> >> >>> > 	mount something on A/a , it will propagate to B/a
>> >> >>> > 	and then mount something on B/a
>> >> >>> >
>> >> >>> > case 2: (tucked mount)
>> >> >>> > 	B is a slave of A
>> >> >>> > 	mount something on B/a
>> >> >>> > 	and then mount something on A/a
>> >> >>> >
>> >> >>> > Both case 1 and case 2 lead to the same mount configuration.
>> >> >>> >
>> >> >>> >
>> >> >>> > 	  however 'umount A/a' in case 1 should fail.
>> >> >>> > 	  and 'umount A/a' in case 2 should pass.
>> >> >>> >
>> >> >>> > Right? in other words, umounts of 'tucked mounts' should pass(case 2).
>> >> >>> > 	whereas umounts of mounts on which overmounts exist should
>> >> >>> > 		fail.(case 1)
>> >> >>> 
>> >> >>> Looking at your example.  I agree that case 1 will fail today.
>> >> >>
>> >> >> And should continue to fail. right? Your semantics change will pass it.
>> >> >
>> >> > I don't see why it should continue to fail.
>> >> >
>> >> >>> However my actual expectation would be for both mount configurations
>> >> >>> to behave the same.  In both cases something has been explicitly mounted
>> >> >>> on B/a and something has propagated to B/a.  In both cases the mount
>> >> >>> on top is what was explicitly mounted, and the mount below is what was
>> >> >>> propagated to B/a.
>> >> >>> 
>> >> >>> I don't see why the order of operations should matter.
>> >> >>
>> >> >> One of the subtle expectation is reversibility.
>> >> >>
>> >> >> Mount followed immediately by unmount has always passed and that is the
>> >> >> standard expectation always. Your proposed code will ensure that.
>> >> >>
>> >> >> However there is one other subtle expectation.
>> >> >>
>> >> >> A mount cannot disappear if a user has explicitly mounted on top of it.
>> >> >>
>> >> >> your proposed code will not meet that expectation. 
>> >> >>
>> >> >> In other words, these two expectations make it behave differently even
>> >> >> when; arguably, they feel like the same configuration.
>> >> >
>> >> > I am not seeing that.
>> >> >
>> >> >
>> >> >
>> >> >>> 
>> >> >>> > maybe we need a flag to identify tucked mounts?
>> >> >>> 
>> >> >>> To preserve our exact current semantics yes.
>> >> >>> 
>> >> >>> The mount configurations that are deliberately constructed that I am
>> >> >>> aware of are comparatively simple.  I don't think anyone has even taken
>> >> >>> advantage of the shadow/side mounts at this point.  I made a reasonable
>> >> >>> effort to find out and no one was even aware they existed.  Much less
>> >> >>> what they were.  And certainly no one I talked to could find code that
>> >> >>> used them.
>> >> >>
>> >> >> But someday; even if its after a decade, someone ;) will
>> >> >> stumble into this semantics and wonder 'why?'. Its better to get it right
>> >> >> sooner. Sorry, I am blaming myself; for keeping some of the problems
>> >> >> open thinking no one will bump into them.
>> >> >
>> >> > Oh definitely.  If we have people ready to talk it through I am happy to
>> >> > dot as many i's and cross as many t's as we productively can.
>> >> >
>> >> > I was just pointing out that I don't have any reason to expect that any
>> >> > one depends on the subtle details of the implementation today so we
>> >> > still have some wiggle room to fix them.  Even if they are visible to
>> >> > user space.
>> >> 
>> >> So I haven't seen a reply, and we are getting awfully close to the merge
>> >> window.  Is there anything concrete we can do to ease concerns?
>> >> 
>> >> Right now I am thinking my last version of the patch is likely the
>> >> best we have time and energy to manage and it would be good to merge it
>> >> before the code bit rots.
>> >
>> > I was waiting for some other opinions on the behavior, since I
>> > continue to think that 'one should not be able to unmount mounts on
>> > which a user has explicitly mounted upon'. I am happy to be overruled,
>> > since your patch significantly improves the rest of the semantics.
>> >
>> > Viro?
>> 
>> Ram Pai, just to be clear you were hoping to add the logic below to my patch?
>
> Yes. the behavior of your patch below is what I was proposing.
>
>> 
>> My objections to the snippet below are:
>> 
>> - It makes it hard for the CRIU folks (yet more state they have to find
>>   and restore).
>
> true. unfortunately one more subtle detail to be aware of.

A bit more than that: reproducing a mount namespace would then require
an almost exact playback of the sequence of mounts in all mount
namespaces.
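
To make the extra state concrete, here is a toy sketch in userspace C. It
is not kernel code. The struct, the enum and every toy_* name are
made up for illustration, and the `lower_tucked` field only stands in for
whatever a "tucked" annotation would record. It shows two mount orders
that leave the same stack at B/a yet differ in the hidden flag, so a
restore tool that reads only the mount tree cannot tell them apart.

```c
#include <assert.h>
#include <stdbool.h>

/* Which operation reached B/a first (hypothetical model, not kernel API). */
enum toy_first_op { TOY_PROPAGATED_FIRST, TOY_EXPLICIT_FIRST };

/* The stack at B/a: a propagated mount below, an explicit mount on top. */
struct toy_stack {
	bool lower_is_propagated;
	bool upper_is_explicit;
	bool lower_tucked;	/* assumed flag: set only if the lower mount arrived second */
};

/* Both orders end with the same visible stack; only the flag differs. */
static struct toy_stack toy_build(enum toy_first_op first)
{
	struct toy_stack s = {
		.lower_is_propagated = true,
		.upper_is_explicit = true,
		.lower_tucked = (first == TOY_EXPLICIT_FIRST),
	};
	return s;
}

/* What can be read off the mount tree alone. */
static bool toy_same_topology(const struct toy_stack *x, const struct toy_stack *y)
{
	assert(x && y);
	return x->lower_is_propagated == y->lower_is_propagated &&
	       x->upper_is_explicit == y->upper_is_explicit;
}

/* What would have to be restored if the flag changes umount behaviour. */
static bool toy_same_state(const struct toy_stack *x, const struct toy_stack *y)
{
	return toy_same_topology(x, y) && x->lower_tucked == y->lower_tucked;
}

/* Under the flag-based rule, umount propagation removes only a tucked mount. */
static bool toy_umount_propagates(const struct toy_stack *s)
{
	return s->lower_tucked;
}
```

With the flag, the order of operations becomes part of the state that has
to be discovered and replayed. Without it, the topology alone is enough.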

>> - It feels subjectively worse to me.
>> 
>> - We already have cases where mounts are unmounted transparently (umount on rmdir).
>
> sorry. i am not aware of this case. some details will help.

The question:

What happens when we rmdir a directory that has a mount on it in another
mount namespace?

What happens when someone on the nfs server deletes a directory that
has a mount on it?


It used to be that we returned -EBUSY, and refused the rmdir operation,
and we lied in the vfs about the nfs dentry being deleted to preserve
the mount.

In recent kernels I have done the work so that we transparently unmount
the mounts and allow the rmdir to happen.  An unprivileged user mounting
over, say, glibc and blocking the yum update of it is a pretty serious
bug.
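
As a rough illustration of the two behaviours just described, here is a
toy model in userspace C. It is a sketch only: `struct toy_dentry` and the
toy_rmdir_* functions are hypothetical names, and the model covers only
the case of mounts that live in some other mount namespace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy directory entry (hypothetical, not the kernel's struct dentry). */
struct toy_dentry {
	size_t nr_foreign_mounts;	/* mounts on it in other mount namespaces */
	bool removed;
};

/* Old behaviour: any mount on the directory pins it and rmdir is refused. */
static int toy_rmdir_old(struct toy_dentry *d)
{
	assert(d != NULL);
	if (d->nr_foreign_mounts > 0)
		return -EBUSY;
	d->removed = true;
	return 0;
}

/* Newer behaviour: the mounts are dropped transparently and rmdir proceeds. */
static int toy_rmdir_new(struct toy_dentry *d)
{
	assert(d != NULL);
	d->nr_foreign_mounts = 0;	/* stand-in for unmounting them */
	d->removed = true;
	return 0;
}
```

In the old model a single mount held elsewhere is enough to block the
removal, which is the denial-of-service shape described above.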

>> - Al Viro claims that the side/shadow mounts are ordinary mounts and
>>   maintaining this extra logic that remembers if we tucked one mount
>>   under another seems to make them less ordinary.
>
> I tend to argue that they are a bit more than ordinary, for they have the
> ability to tuck.
>
>> 
>> - The symmetry for unmounting exists for a tucked mount.  We can unmount
>>   it via propagation or we can unmount the mount above it, and then we
>>   can unmount the new underlying mount.
>
> this is fine with me.
>
>>   So I don't see why we don't
>>   want symmetry in the other case just because we mounted on top of
>>   the mount and rather than had the mount tucked under us.
>
> A tucked mount should be un-tuckable. I agree.  But a non-tucked mount
> cannot pretend to be tucked and this is where I disagree.

I have always seen the question as: Should a mount that is propagated be
unmountable via umount propagation?

Which leads me to think that allowing the umount propagation when it
won't change the application's view of files and filesystems is a good
thing.  From my perspective it also better preserves the reversibility
property that is important.   The mount propagated and now the unmount
propagated.
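
A toy userspace model of that rule, for anyone who wants to poke at it.
This is a sketch under assumptions, not the patch: all toy_* names are
hypothetical, reference counts are ignored entirely, and the "topper"
test only mirrors the comment in the quoted hunk (exactly one mount on
the child that covers it completely).  It builds Ram's case 1 and case 2
at B/a and asks whether umount propagation may remove the lower mount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_CHILDREN 4

/* Toy stand-in for struct mount; every name here is hypothetical. */
struct toy_mount {
	const char *mountpoint;	/* path inside the parent; "/" is the parent's root */
	struct toy_mount *parent;
	struct toy_mount *children[TOY_MAX_CHILDREN];
	size_t nr_children;
};

/* Attach child on top of parent at mountpoint. */
static void toy_attach(struct toy_mount *child, struct toy_mount *parent,
		       const char *mountpoint)
{
	assert(parent->nr_children < TOY_MAX_CHILDREN);
	child->parent = parent;
	child->mountpoint = mountpoint;
	parent->children[parent->nr_children++] = child;
}

/* Remove child from its parent's list of children. */
static void toy_detach(struct toy_mount *child)
{
	struct toy_mount *p = child->parent;
	size_t i;

	for (i = 0; i < p->nr_children; i++) {
		if (p->children[i] == child) {
			p->children[i] = p->children[--p->nr_children];
			break;
		}
	}
	child->parent = NULL;
}

/* Find the mount attached to parent at mountpoint, if any. */
static struct toy_mount *toy_lookup(struct toy_mount *parent, const char *mountpoint)
{
	size_t i;

	for (i = 0; i < parent->nr_children; i++)
		if (strcmp(parent->children[i]->mountpoint, mountpoint) == 0)
			return parent->children[i];
	return NULL;
}

/* Late propagation: slide `tucked` underneath whatever already sits there. */
static void toy_tuck(struct toy_mount *tucked, struct toy_mount *parent,
		     const char *mountpoint)
{
	struct toy_mount *old = toy_lookup(parent, mountpoint);

	if (old)
		toy_detach(old);
	toy_attach(tucked, parent, mountpoint);
	if (old)
		toy_attach(old, tucked, "/");
}

/* The single mount that covers mnt completely, or NULL. */
static struct toy_mount *toy_find_topper(struct toy_mount *mnt)
{
	if (mnt->nr_children != 1)
		return NULL;
	if (strcmp(mnt->children[0]->mountpoint, "/") != 0)
		return NULL;
	return mnt->children[0];
}

/* May umount propagation remove the mount at parent/mountpoint? */
static bool toy_propagated_umount_ok(struct toy_mount *parent, const char *mountpoint)
{
	struct toy_mount *child = toy_lookup(parent, mountpoint);

	if (!child)
		return false;
	return child->nr_children == 0 || toy_find_topper(child) != NULL;
}

/* Case 1: the propagated mount lands first, then the user mounts on top. */
static bool toy_case1(void)
{
	struct toy_mount b = {0}, propagated = {0}, explicit_mnt = {0};

	toy_attach(&propagated, &b, "/a");
	toy_attach(&explicit_mnt, toy_lookup(&b, "/a"), "/");
	return toy_propagated_umount_ok(&b, "/a");
}

/* Case 2: the user mounts first, then the propagated mount is tucked under. */
static bool toy_case2(void)
{
	struct toy_mount b = {0}, propagated = {0}, explicit_mnt = {0};

	toy_attach(&explicit_mnt, &b, "/a");
	toy_tuck(&propagated, &b, "/a");
	return toy_propagated_umount_ok(&b, "/a");
}
```

In this model both cases give the same answer, because the rule looks
only at the resulting tree and never at how it was built.  A mount with
something mounted on a subdirectory rather than on its root is still
treated as busy.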


From a system management point of view one of the largest practical
problems with mount namespaces and mount propagation is: mounts that
propagate into other mount namespaces but don't get unmounted.


Which is to say not unmounting something (especially silently) and
leaving the filesystem busy when something could be unmounted is a
practical problem for people.


I am going to be out for a week, and I am leaving in a few minutes.
So I am going to push my patch to my for-next branch, so there
is a reasonable chance of merging things when the merge window opens.

If the feedback is to add the MNT_TUCKED annotations to make the patch
suitable for merging to Linus's tree I will take care of that when
I get back.

Eric