From: Oleg Drokin <green@linuxhacker.ru>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Mailing List <linux-kernel@vger.kernel.org>,
	<linux-fsdevel@vger.kernel.org>
Subject: Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
Date: Tue, 5 Jul 2016 00:43:52 -0400
Message-ID: <AD13D806-21D0-4850-AD7D-24E27C306D38@linuxhacker.ru>
In-Reply-To: <1B63970B-923A-4C4C-98ED-70F0087C513D@linuxhacker.ru>
On Jul 4, 2016, at 10:28 PM, Oleg Drokin wrote:
>
> On Jul 3, 2016, at 2:29 AM, Al Viro wrote:
>
>> On Sat, Jun 25, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
>>
>>> Sorry to nag you about this, but did any of those pan out?
>>>
>>> d_alloc_parallel() sounds like a bit too heavy there, esp. considering we came in with
>>> a dentry already (though a potentially shared one, I understand).
>>> Wouldn't it be better to try and establish some dentry locking rule for calling into
>>> d_splice_alias() instead? At least then the callers could make sure the dentry does
>>> not change under them.
>>> Though I guess if there's dentry locking like that, we might as well do all the
>>> checking in d_splice_alias(), but that means unhashed dentries would no
>>> longer be disallowed, which is a change of semantics from what we have now.
>>
>> FWIW, the only interesting case here is this:
>> * no O_CREAT in flags (otherwise the parent is held exclusive).
>> * dentry is found in hash
>> * dentry is negative
>> * dentry has passed ->d_revalidate() (i.e. in case of
>> NFS it had nfs_neg_need_reval() return false).
>>
>> Only two instances are non-trivial in that respect - NFS and Lustre.
>> Everything else will simply fail open() with ENOENT in that case.
>>
>> And at least for NFS we could bloody well do d_drop + d_alloc_parallel +
>> finish_no_open and bugger off in case it's not in_lookup, otherwise do
>> pretty much what we do in case we'd got in_lookup from the very beginning.
>> Some adjustments are needed for that case (basically, we need to make
>> sure we hit d_lookup_done() matching that d_alloc_parallel() and deal
>> with refcounting correctly).
>>
>> Tentative NFS patch follows; I don't understand Lustre well enough, but it
>> looks like a plausible strategy there as well.
>
> This patch seems to have brought back the other crash (or something similar),
> the one with a negative dentry being hashed when it's already hashed.
> It's not as easy to hit as before, but a lot easier than the race we are hitting here.
>
> Also, in all cases it is ls that is crashing now, which suggests it is
> taking some of these new paths that were added.
> I have half a dozen crashdumps if you want me to look something up in them.
> This is on top of 4.7.0-rc6 a99cde438de0c4c0cecc1d1af1a55a75b10bfdef
> with just your patch on top.
>
> I can reproduce it with both the complex workload, or with a simplified one (attached),
> takes anywhere from 5 to 30 minutes to hit.
>
>>
>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> index d8015a03..5474e39 100644
>> --- a/fs/nfs/dir.c
>> +++ b/fs/nfs/dir.c
>> @@ -1485,11 +1485,13 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> struct file *file, unsigned open_flags,
>> umode_t mode, int *opened)
>> {
>> + DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
>> struct nfs_open_context *ctx;
>> struct dentry *res;
>> struct iattr attr = { .ia_valid = ATTR_OPEN };
>> struct inode *inode;
>> unsigned int lookup_flags = 0;
>> + bool switched = false;
>> int err;
>>
>> /* Expect a negative dentry */
>> @@ -1528,6 +1530,17 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> attr.ia_size = 0;
>> }
>>
>> + if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
>> + d_drop(dentry);
>> + switched = true;
>> + dentry = d_alloc_parallel(dentry->d_parent,
>> + &dentry->d_name, &wq);
>
> Hm, it seems d_alloc_parallel() can return some preexisting dentry that it was able to
> look up via RCU, attached to the same parent with the same name?
> What's to stop a parallel thread from rehashing the dentry after we dropped it here,
> so that d_alloc_parallel() happily finds it?
> I am running a test to verify this theory now.
Ok, so at least in my case we do not come out of d_alloc_parallel() with a hashed
dentry, though I wonder why you are not trying to catch this case here.
If you rely on the !d_in_lookup(dentry) check below, that seems to be racy:
if it was __d_lookup_rcu() that found the dentry, we do not seem to perform
any additional checks on it and just return it.
But what if it was just added by a parallel caller that has just finished
d_splice_alias() (= a hashed dentry, so it's not rejected by __d_lookup_rcu()) but
at the same time has not progressed all the way to the d_lookup_done() call below?
Interestingly, in some of the dumps d_hash is all zeroes, which means the dentry
is unhashed, yet the assertion was triggered, so there was some parallel
caller that performed a d_drop() on us (but nothing of interest on the other CPUs).
In all cases d_flags = 140, so we do not have DCACHE_PAR_LOOKUP set at the
point of crash.
Also, in some of the dumps the dentry is actually positive, not negative.
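Roughly, the interleaving I have in mind (just a sketch of the suspected timeline,
using the calls from your patch above):

```
CPU A (our nfs_atomic_open)            CPU B (parallel caller, same name)
-----------------------------          -----------------------------------
d_drop(dentry)
                                       d_splice_alias() -> dentry rehashed
d_alloc_parallel()
  __d_lookup_rcu() finds the
  rehashed dentry, returns it
  with no further checks
!d_in_lookup() is true ->
finish_no_open() on that dentry
                                       ...still has not reached
                                       d_lookup_done()
```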
>
>> + if (IS_ERR(dentry))
>> + return PTR_ERR(dentry);
>> + if (unlikely(!d_in_lookup(dentry)))
>> + return finish_no_open(file, dentry);
>> + }
>> +
>> ctx = create_nfs_open_context(dentry, open_flags);
>> err = PTR_ERR(ctx);
>> if (IS_ERR(ctx))
>> @@ -1563,14 +1576,23 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
>> put_nfs_open_context(ctx);
>> out:
>> + if (unlikely(switched)) {
>> + d_lookup_done(dentry);
>> + dput(dentry);
>> + }
>> return err;
>>
>> no_open:
>> res = nfs_lookup(dir, dentry, lookup_flags);
>> - err = PTR_ERR(res);
>> + if (switched) {
>> + d_lookup_done(dentry);
>> + if (!res)
>> + res = dentry;
>> + else
>> + dput(dentry);
>> + }
>> if (IS_ERR(res))
>> - goto out;
>> -
>> + return PTR_ERR(res);
>> return finish_no_open(file, res);
>> }
>> EXPORT_SYMBOL_GPL(nfs_atomic_open);
>