From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Subject: Re: [PATCH] vfs: move ACL cache lookup into generic code
Date: Mon, 25 Jul 2011 13:45:04 +0530
Message-ID: <87tyaag4jb.fsf@skywalker.in.ibm.com>
References: <alpine.LFD.2.02.1107221023170.29055@i5.linux-foundation.org> <alpine.LFD.2.02.1107221038090.29055@i5.linux-foundation.org> <alpine.LFD.2.02.1107221932120.27112@i5.linux-foundation.org> <20110723032944.GA24703@ZenIV.linux.org.uk> <CA+55aFx90c2HaBJ+SA7zEVrnQJkiQRrfK6XUqJvLVA1CsR1Okw@mail.gmail.com> <20110723043120.GB24703@ZenIV.linux.org.uk> <20110723060626.GC24703@ZenIV.linux.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Christoph Hellwig <hch@lst.de>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
To: Al Viro <viro@ZenIV.linux.org.uk>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from e6.ny.us.ibm.com ([32.97.182.146]:42504 "EHLO e6.ny.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751294Ab1GYIbL (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Mon, 25 Jul 2011 04:31:11 -0400
Received: from d01relay01.pok.ibm.com (d01relay01.pok.ibm.com [9.56.227.233])
	by e6.ny.us.ibm.com (8.14.4/8.13.1) with ESMTP id p6P7p7SW000890
	for <linux-fsdevel@vger.kernel.org>; Mon, 25 Jul 2011 03:51:07 -0400
Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216])
	by d01relay01.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p6P8FEfg121814
	for <linux-fsdevel@vger.kernel.org>; Mon, 25 Jul 2011 04:15:14 -0400
Received: from d01av02.pok.ibm.com (loopback [127.0.0.1])
	by d01av02.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p6P8FDnJ017724
	for <linux-fsdevel@vger.kernel.org>; Mon, 25 Jul 2011 05:15:14 -0300
In-Reply-To: <20110723060626.GC24703@ZenIV.linux.org.uk>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Sat, 23 Jul 2011 07:06:26 +0100, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> On Sat, Jul 23, 2011 at 05:31:20AM +0100, Al Viro wrote:
> 
> > Heh...  In addition to ocfs2 leak: 9p leaks nicely if v9fs_acl_mode() is
> > called with !S_ISDIR(mode).  In that case acl reference is simply lost.
> > So yes, it's worth looking at.
> 
> Damn...  FWIW, that code is seriously broken and not only on failure exits.
> Guys, think what happens if we have a negative dentry and call e.g. mkdir().
> Dentry is hashed.  Eventually we get to
>         d_instantiate(dentry, inode);
>         err = v9fs_fid_add(dentry, fid);
> in v9fs_create() (or v9fs_vfs_mkdir_dotl()).  At the same time somebody does
> lookup *in* that directory.  We do find that thing in dcache, it's (already)
> positive and we hit
>         dfid = v9fs_fid_lookup(dentry->d_parent);
> at the same time.  That calls v9fs_fid_lookup_with_uid() and then
> v9fs_fid_add().  Now, we have two tasks doing v9fs_fid_add() on the same
> dentry.  Which is to say,
>         dent = dentry->d_fsdata;
>         if (!dent) {
>                 dent = kmalloc(sizeof(struct v9fs_dentry), GFP_KERNEL);
>                 if (!dent)
>                         return -ENOMEM;
> 
>                 spin_lock_init(&dent->lock);
>                 INIT_LIST_HEAD(&dent->fidlist);
>                 dentry->d_fsdata = dent;
>         }
> and that's an obvious leak.  It's not easy to hit, but...  Note that
> ->d_revalidate() does *not* help - v9fs_create() will force revaliation
> of parent, but the second process might be already past that point.
> Moreover, ->d_revalidate() would just talk to server and return 1, so
> even if the second process does hit it, nothing will change except for
> minor delay.  And the first one might be sleeping in blocking allocation
> at that point...
> 
> I don't like that; we certainly can protect v9fs_fid_add() in standard way
> (i.e. recheck ->d_fsdata after kmalloc(), if we have lost the race - kfree()
> and go on, otherwise set ->d_fsdata and make that check+assignment atomic,
> e.g. by use of ->d_lock).  However, that looks like a kludge.  Is there any
> saner way to do it?  E.g. do we absolutely have to do that *after*
> d_instantiate()?

We should be able to move that v9fs_fid_add before d_instantiate. That
way we can be sure that positive dentries have fids attached.
v9fs_vfs_lookup does that already. I guess we should audit all the
v9fs_fid_add usage to make sure we do that consistently all other place.

-aneesh