From mboxrd@z Thu Jan  1 00:00:00 1970
From: Al Viro <viro@ZenIV.linux.org.uk>
Subject: Re: [PATCH review 11/11] mnt: Honor MNT_LOCKED when detaching mounts
Date: Fri, 9 Jan 2015 20:31:26 +0000
Message-ID: <20150109203126.GI22149@ZenIV.linux.org.uk>
References: <87mw5xq7lt.fsf@x220.int.ebiederm.org>
 <1420490787-14387-11-git-send-email-ebiederm@xmission.com>
 <20150107184334.GZ22149@ZenIV.linux.org.uk>
 <87h9w2gzht.fsf@x220.int.ebiederm.org>
 <20150107205239.GB22149@ZenIV.linux.org.uk>
 <87iogi8dka.fsf@x220.int.ebiederm.org>
 <20150108002227.GC22149@ZenIV.linux.org.uk>
 <20150108223212.GF22149@ZenIV.linux.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Linux Containers <containers@lists.linux-foundation.org>,
	linux-fsdevel@vger.kernel.org,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Chen Hanxiao <chenhanxiao@cn.fujitsu.com>,
	Richard Weinberger <richard@nod.at>,
	Andrey Vagin <avagin@openvz.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from zeniv.linux.org.uk ([195.92.253.2]:60518 "EHLO
	ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751630AbbAIUbd (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Fri, 9 Jan 2015 15:31:33 -0500
Content-Disposition: inline
In-Reply-To: <20150108223212.GF22149@ZenIV.linux.org.uk>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Thu, Jan 08, 2015 at 10:32:13PM +0000, Al Viro wrote:
> 	a) easily expressed with that scheme (behaviour is almost identical
> to what your series does, except that we'd empty mnt_child right in
> mntput_no_expire(), rather than keeping it until cleanup_mnt()) and
> 	b) easily generalized to MNT_DETACH with lazy dissolving; in fact,
> the only difference would be to treat *all* submounts as "don't put on
> global list, don't remove from child list", not just the MNT_LOCKED ones.
> 
> I'll play around with that today and tomorrow; hopefully, I'll have a postable
> variant by the weekend...

Hmm...  Linus, what do you think of the following?

struct foo {
	int done;
	wait_queue_head_t wait;
	...
};

void kill_foo(struct foo *p)
{
        wait_queue_t wait;
	wait.flags = WQ_FLAG_EXCLUSIVE;
	wait.private = current;
	wait.func = autoremove_wake_function;
	spin_lock_irq(&p->wait.lock);
	if (likely(!p->done)) {
		p->done = -1;
		spin_unlock_irq(&p->wait.lock);
		rcu_read_unlock();
		/* do cleanup */
		...
		/* remove references to *p */
		...
		spin_lock_irq(&p->wait.lock);
		p->done = 1;	
		wake_up_locked(&p->wait);
		spin_unlock_irq(&p->wait.lock);
		/* RCU-schedule freeing of p */
		...
		return;
	}
	if (p->done > 0) {
		spin_unlock_irq(&p->wait.lock);
		rcu_read_unlock();
		return;
	}
	__add_wait_queue_tail(&p->wait, &wait);
	while (1) {
		set_current_state(TASK_UNINITERRUPTIBLE);
		spin_unlock_irq(&p->wait.lock);
		rcu_read_unlock();
		schedule();
		rcu_read_lock();
		if (likely(list_empty(&wait.task_list)))
			break;
		/* OK, we know p couldn't have been freed yet */
		spin_lock_irq(&p->wait.lock);
		if (p->done > 0) {
			spin_unlock_irq(&p->wait.lock);
			break;
		}
	}
	rcu_read_unlock();
}

AFAICS, that ought to be safe - the first caller makes sure that everybody
else will wait for it to finish, everybody else (ones who come via references
before the first one has removed them) will be waiting for the first one
to do wakeup *and* will take care not to touch the victim unless they knows
it's still there.

Do you see any problems with the above?  I wonder if I ended up open-coding
something already existing in there...