From: Oleg Drokin <green@linuxhacker.ru>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Mailing List <linux-kernel@vger.kernel.org>,
	"<linux-fsdevel@vger.kernel.org>" <linux-fsdevel@vger.kernel.org>
Subject: Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
Date: Tue, 5 Jul 2016 00:43:52 -0400
Message-ID: <AD13D806-21D0-4850-AD7D-24E27C306D38@linuxhacker.ru>
In-Reply-To: <1B63970B-923A-4C4C-98ED-70F0087C513D@linuxhacker.ru>


On Jul 4, 2016, at 10:28 PM, Oleg Drokin wrote:

> 
> On Jul 3, 2016, at 2:29 AM, Al Viro wrote:
> 
>> On Sat, Jun 25, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
>> 
>>> Sorry to nag you about this, but did any of those pan out?
>>> 
>>> d_alloc_parallel() sounds a bit too heavyweight there, esp. considering we came in with
>>> a dentry already (though a potentially shared one, I understand).
>>> Would it not be better to try and establish some dentry locking rule for calling into
>>> d_splice_alias() instead? At least then the callers could make sure the dentry does
>>> not change under them.
>>> Though I guess if there's dentry locking like that, we might as well do all the
>>> checking in d_splice_alias(), but that means unhashed dentries would no
>>> longer be disallowed, which is a change of semantics from the current behavior.
>> 
>> FWIW, the only interesting case here is this:
>> 	* no O_CREAT in flags (otherwise the parent is held exclusive).
>> 	* dentry is found in hash
>> 	* dentry is negative
>> 	* dentry has passed ->d_revalidate() (i.e. in case of
>> NFS it had nfs_neg_need_reval() return false).
>> 
>> Only two instances are non-trivial in that respect - NFS and Lustre.
>> Everything else will simply fail open() with ENOENT in that case.
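
Purely as an illustration, the case above boils down to something like the
following hypothetical helper (not anything in the tree; it assumes the usual
<linux/dcache.h> / <linux/fcntl.h> kernel context, and the fourth condition,
passing ->d_revalidate(), is up to the filesystem so it is not expressed here):

	static inline bool open_may_hit_the_race(struct dentry *dentry,
						 unsigned int open_flags)
	{
		return !(open_flags & O_CREAT) &&	/* parent not held exclusive */
		       !d_unhashed(dentry) &&		/* dentry was found in the hash */
		       d_really_is_negative(dentry);	/* and it is negative */
	}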
>> 
>> And at least for NFS we could bloody well do d_drop + d_alloc_parallel +
>> finish_no_open and bugger off in case it's not in_lookup, otherwise do
>> pretty much what we do in case we'd got in_lookup from the very beginning.
>> Some adjustments are needed for that case (basically, we need to make
>> sure we hit d_lookup_done() matching that d_alloc_parallel() and deal
>> with refcounting correctly).
>> 
>> Tentative NFS patch follows; I don't understand Lustre well enough, but it
>> looks like a plausible strategy there as well.
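
To distill the dance described above into one contiguous sketch (this assumes
it runs inside nfs_atomic_open() with file, open_flags and the on-stack wq in
scope, and it leaves out the cleanup that the actual patch below has to do on
the exit paths):

	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
		d_drop(dentry);		/* get the shared, hashed dentry out of the way */
		dentry = d_alloc_parallel(dentry->d_parent,
					  &dentry->d_name, &wq);
		if (IS_ERR(dentry))
			return PTR_ERR(dentry);
		if (!d_in_lookup(dentry))
			/* somebody else finished the lookup first; use their dentry */
			return finish_no_open(file, dentry);
		/*
		 * Otherwise we now own an in-lookup dentry and can proceed
		 * as if we had been handed one from the very beginning;
		 * every exit path then needs a matching d_lookup_done()
		 * followed by dput() on the replacement dentry.
		 */
	}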
> 
> This patch seems to have brought the other crash back (or something similar),
> the one with a negative dentry being hashed when it's already hashed.
> It's not as easy to hit as before, but a lot easier than the race we are hitting here.
> 
> Also, in all cases it is ls that is crashing now, which seems to suggest it is
> taking some of these newly added paths.
> I have half a dozen crashdumps if you want me to look something up in them.
> This is 4.7.0-rc6 (a99cde438de0c4c0cecc1d1af1a55a75b10bfdef)
> with just your patch on top.
> 
> I can reproduce it with either the complex workload or a simplified one (attached);
> it takes anywhere from 5 to 30 minutes to hit.
> 
>> 
>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> index d8015a03..5474e39 100644
>> --- a/fs/nfs/dir.c
>> +++ b/fs/nfs/dir.c
>> @@ -1485,11 +1485,13 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> 		    struct file *file, unsigned open_flags,
>> 		    umode_t mode, int *opened)
>> {
>> +	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
>> 	struct nfs_open_context *ctx;
>> 	struct dentry *res;
>> 	struct iattr attr = { .ia_valid = ATTR_OPEN };
>> 	struct inode *inode;
>> 	unsigned int lookup_flags = 0;
>> +	bool switched = false;
>> 	int err;
>> 
>> 	/* Expect a negative dentry */
>> @@ -1528,6 +1530,17 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> 		attr.ia_size = 0;
>> 	}
>> 
>> +	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
>> +		d_drop(dentry);
>> +		switched = true;
>> +		dentry = d_alloc_parallel(dentry->d_parent,
>> +					  &dentry->d_name, &wq);
> 
> Hm, it seems d_alloc_parallel() can return some preexisting dentry it was able to
> look up under RCU, attached to the same parent with the same name?
> What's to stop a parallel thread from rehashing the dentry after we dropped it here,
> so that d_alloc_parallel() happily finds it?
> I am running a test to verify this theory now.

Ok, so at least in my case we do not come out of d_alloc_parallel() with a hashed
dentry, though I wonder why you are not trying to catch this case here?
If you rely on the !d_in_lookup(dentry) check below, that seems racy.
If it was __d_lookup_rcu() that found the dentry, we do not seem to perform
any additional checks on it and just return it.
But what if it was just added by a parallel caller that has just finished
d_splice_alias() (so the dentry is hashed and not rejected by __d_lookup_rcu()),
but has not yet progressed all the way to the d_lookup_done() call below?
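
Roughly, the interleaving I am worried about looks like this (just a sketch of
the suspected race, not something I have proven from the dumps):

	/*
	 * opener A (us)                        opener B (parallel, same name)
	 * -------------                        ------------------------------
	 *                                      d_splice_alias() hashes its dentry
	 * d_drop(our dentry)
	 * d_alloc_parallel()
	 *   __d_lookup_rcu() finds B's
	 *   now-hashed dentry, returns it
	 * !d_in_lookup() -> finish_no_open()
	 *                                      ...still has not reached its
	 *                                      d_lookup_done()
	 */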

Interestingly, in some of the dumps d_hash is all zeroes, which means the dentry
is unhashed, yet the assertion was triggered, so some parallel caller must have
performed a d_drop() on us (but there is nothing of interest on the other CPUs).

In all cases d_flags = 140, so DCACHE_PAR_LOOKUP is not set at the
point of crash.

Also, in some of the dumps the dentry is actually positive, not negative.


> 
>> +		if (IS_ERR(dentry))
>> +			return PTR_ERR(dentry);
>> +		if (unlikely(!d_in_lookup(dentry)))
>> +			return finish_no_open(file, dentry);
>> +	}
>> +
>> 	ctx = create_nfs_open_context(dentry, open_flags);
>> 	err = PTR_ERR(ctx);
>> 	if (IS_ERR(ctx))
>> @@ -1563,14 +1576,23 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> 	trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
>> 	put_nfs_open_context(ctx);
>> out:
>> +	if (unlikely(switched)) {
>> +		d_lookup_done(dentry);
>> +		dput(dentry);
>> +	}
>> 	return err;
>> 
>> no_open:
>> 	res = nfs_lookup(dir, dentry, lookup_flags);
>> -	err = PTR_ERR(res);
>> +	if (switched) {
>> +		d_lookup_done(dentry);
>> +		if (!res)
>> +			res = dentry;
>> +		else
>> +			dput(dentry);
>> +	}
>> 	if (IS_ERR(res))
>> -		goto out;
>> -
>> +		return PTR_ERR(res);
>> 	return finish_no_open(file, res);
>> }
>> EXPORT_SYMBOL_GPL(nfs_atomic_open);
> 
