* More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
@ 2016-06-17  4:09 Oleg Drokin
  2016-06-17  4:29 ` Al Viro
  0 siblings, 1 reply; 38+ messages in thread
From: Oleg Drokin @ 2016-06-17  4:09 UTC (permalink / raw)
  To: Al Viro, Trond Myklebust
  Cc: linux-nfs, Mailing List, <linux-fsdevel@vger.kernel.org>,
	idryomov, sage, zyan

Hello!

   So I think I finally kind of understand this other d_splice_alias problem
   I've been having.

   Imagine that we have a negative dentry in the cache.

   Now two threads try to open this file in parallel. The fs is NFS, so
   ->atomic_open is defined; it's not a create, so the parent is locked in
   shared mode, and they both first enter lookup_open() and then (because
   the dentry is negative) nfs_atomic_open().

   In nfs_atomic_open() the two threads run alongside each other trying to
   open this file, and both fail (the code includes Al's earlier patch for
   the other d_splice_alias problem).

        inode = NFS_PROTO(dir)->open_context(dir, ctx, open_flags, &attr, opened);
        if (IS_ERR(inode)) {
                err = PTR_ERR(inode);
                trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
                put_nfs_open_context(ctx);
                d_drop(dentry);
. . .
                case -ENOTDIR:
                        goto no_open;
. . .
no_open:
        res = nfs_lookup(dir, dentry, lookup_flags);


    So they both do d_drop(), the dentry is now unhashed, and they both
    dive into nfs_lookup().
    There eventually they both call

	res = d_splice_alias(inode, dentry);

   And so the first lucky one continues on its merry way with a hashed dentry,
   but the other, less lucky one ends up calling into d_splice_alias() with a
   dentry that's already hashed and hits the very familiar assertion.
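
   (For reference, the assertion in question is the unhashed check at the top
   of d_splice_alias(); quoting fs/dcache.c roughly from memory:

	struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
	{
		...
		BUG_ON(!d_unhashed(dentry));
		...

   so any caller that arrives with an already-hashed dentry trips it.)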

   I took a brief look at ceph and it looks like a very similar thing might
   happen there, with handle_reply() for two parallel replies calling into
   ceph_fill_trace() and then splice_alias()->d_splice_alias(). The unhashed
   check it does is not under any lock, so it's unsafe, and the problem might
   be more generic than just NFS.

   So I wonder how best to fix this? Hold some sort of dentry lock across the
   call into ->atomic_open() in the VFS? We cannot just make d_splice_alias()
   callers take inode->i_lock, because the dentry might be negative.

   I know this is really happening because I in fact have 3 threads doing the above
   described race, one luckily passing through and two hitting the assertion:
PID: 2689619  TASK: ffff8800ca740540  CPU: 6   COMMAND: "ls"
 #0 [ffff8800c6da3758] machine_kexec at ffffffff81042462
 #1 [ffff8800c6da37b8] __crash_kexec at ffffffff811365ff
 #2 [ffff8800c6da3880] __crash_kexec at ffffffff811366d5
 #3 [ffff8800c6da3898] crash_kexec at ffffffff8113671b
 #4 [ffff8800c6da38b8] oops_end at ffffffff81018cb4
 #5 [ffff8800c6da38e0] die at ffffffff8101917b
 #6 [ffff8800c6da3910] do_trap at ffffffff81015b92
 #7 [ffff8800c6da3960] do_error_trap at ffffffff81015d8b
 #8 [ffff8800c6da3a20] do_invalid_op at ffffffff810166e0
 #9 [ffff8800c6da3a30] invalid_op at ffffffff8188c91e
    [exception RIP: d_splice_alias+478]
    RIP: ffffffff8128784e  RSP: ffff8800c6da3ae8  RFLAGS: 00010286
    RAX: ffff8800d8661ab8  RBX: ffff8800d3b6b178  RCX: ffff8800ca740540
    RDX: ffffffff81382f09  RSI: ffff8800d3b6b178  RDI: ffff8800d8661ab8
    RBP: ffff8800c6da3b20   R8: 0000000000000000   R9: 0000000000000000
    R10: ffff8800ca740540  R11: 0000000000000000  R12: ffff8800d6491600
    R13: 0000000000000102  R14: ffff8800d39063d0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff8800c6da3b28] nfs_lookup at ffffffff8137cd1c
#11 [ffff8800c6da3b80] nfs_atomic_open at ffffffff8137ed61
#12 [ffff8800c6da3c30] lookup_open at ffffffff8127901a
#13 [ffff8800c6da3d50] path_openat at ffffffff8127bf15
#14 [ffff8800c6da3dd8] do_filp_open at ffffffff8127d9b1
#15 [ffff8800c6da3ee0] do_sys_open at ffffffff81269f90
#16 [ffff8800c6da3f40] sys_open at ffffffff8126a09e
#17 [ffff8800c6da3f50] entry_SYSCALL_64_fastpath at ffffffff8188aebc
    RIP: 00007f92a76d4d20  RSP: 00007fff640c1160  RFLAGS: 00000206
    RAX: ffffffffffffffda  RBX: 0000000000000003  RCX: 00007f92a76d4d20
    RDX: 000055c9f5be9f50  RSI: 0000000000090800  RDI: 000055c9f5bea1d0
    RBP: 0000000000000000   R8: 000055c9f5bea340   R9: 0000000000000000
    R10: 000055c9f5bea300  R11: 0000000000000206  R12: 00007f92a800d6b0
    R13: 000055c9f5bf3000  R14: 000055c9f5bea3e0  R15: 0000000000000000
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

PID: 2689597  TASK: ffff880109b40bc0  CPU: 7   COMMAND: "ls"
 #0 [ffff8800c07178e0] die at ffffffff8101914e
 #1 [ffff8800c0717910] do_trap at ffffffff81015b92
 #2 [ffff8800c0717960] do_error_trap at ffffffff81015d8b
 #3 [ffff8800c0717a20] do_invalid_op at ffffffff810166e0
 #4 [ffff8800c0717a30] invalid_op at ffffffff8188c91e
    [exception RIP: d_splice_alias+478]
    RIP: ffffffff8128784e  RSP: ffff8800c0717ae8  RFLAGS: 00010286
    RAX: ffff8800d8661ab8  RBX: ffff8800d3b6b178  RCX: ffff880109b40bc0
    RDX: ffffffff81382f09  RSI: ffff8800d3b6b178  RDI: ffff8800d8661ab8
    RBP: ffff8800c0717b20   R8: 0000000000000000   R9: 0000000000000000
    R10: ffff880109b40bc0  R11: 0000000000000000  R12: ffff8800d6a29e40
    R13: 0000000000000102  R14: ffff8800d39063d0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffff8800c0717b28] nfs_lookup at ffffffff8137cd1c
 #6 [ffff8800c0717b80] nfs_atomic_open at ffffffff8137ed61
 #7 [ffff8800c0717c30] lookup_open at ffffffff8127901a
 #8 [ffff8800c0717d50] path_openat at ffffffff8127bf15
 #9 [ffff8800c0717dd8] do_filp_open at ffffffff8127d9b1
#10 [ffff8800c0717ee0] do_sys_open at ffffffff81269f90
#11 [ffff8800c0717f40] sys_open at ffffffff8126a09e
#12 [ffff8800c0717f50] entry_SYSCALL_64_fastpath at ffffffff8188aebc
    RIP: 00007f5cef6f8d20  RSP: 00007ffcc98f87b0  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000000003  RCX: 00007f5cef6f8d20
    RDX: 000055de64230f50  RSI: 0000000000090800  RDI: 000055de642311d0
    RBP: 0000000000000000   R8: 000055de64231320   R9: 0000000000000000
    R10: 000055de64231300  R11: 0000000000000202  R12: 00007f5cf00316b0
    R13: 000055de6423a000  R14: 000055de642313c0  R15: 0000000000000000
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

The dentry is 0xffff8800d3b6b178 in both cases.


* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-06-17  4:09 More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes Oleg Drokin
@ 2016-06-17  4:29 ` Al Viro
  2016-06-25 16:38   ` Oleg Drokin
  0 siblings, 1 reply; 38+ messages in thread
From: Al Viro @ 2016-06-17  4:29 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: Trond Myklebust, linux-nfs, Mailing List,
	<linux-fsdevel@vger.kernel.org>,
	idryomov, sage, zyan

On Fri, Jun 17, 2016 at 12:09:19AM -0400, Oleg Drokin wrote:

>     So they both do d_drop(), the dentry is now unhashed, and they both
>     dive into nfs_lookup().
>     There eventually they both call
> 
> 	res = d_splice_alias(inode, dentry);
> 
>    And so the first lucky one continues on it's merry way with a hashed dentry,
>    but the other less lucky one ends up calling into d_splice_alias() with
>    dentry that's already hashed and hits the very familiar assertion.
> 
>    I took a brief look into ceph and it looks like a very similar thing
>    might happen there with handle_reply() for two parallel replies calling into
>    ceph_fill_trace() and then splice_alias()->d_splice_alias(), since the
>    unhashed check it does is not under any locks, it's unsafe, so the problem
>    might be more generic than just NFS too.
> 
>    So I wonder how to best fix this? Holding some sort of dentry lock across a call
>    into atomic_open in VFS? We cannot just make d_splice_alias() callers call with
>    inode->i_lock held because dentry might be negative.

Oh, lovely...  So basically the problem is that we violate the "no lookups on
the same name in parallel" rule on those fallbacks from foo_atomic_open() to
foo_lookup().  The thing is, a lot of ->atomic_open() instances have such
fallbacks and I wonder if that's a sign that we need to lift some of that
to fs/namei.c...

Hell knows; alternative is to have that d_drop() followed by d_alloc_parallel()
and feeding that dentry to lookup.  I'll play with that a bit and see what's
better; hopefully I'll have something by tomorrow.


* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-06-17  4:29 ` Al Viro
@ 2016-06-25 16:38   ` Oleg Drokin
  2016-07-03  6:29     ` Al Viro
  0 siblings, 1 reply; 38+ messages in thread
From: Oleg Drokin @ 2016-06-25 16:38 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>

Hello!

On Jun 17, 2016, at 12:29 AM, Al Viro wrote:

> On Fri, Jun 17, 2016 at 12:09:19AM -0400, Oleg Drokin wrote:
> 
>>    So they both do d_drop(), the dentry is now unhashed, and they both
>>    dive into nfs_lookup().
>>    There eventually they both call
>> 
>> 	res = d_splice_alias(inode, dentry);
>> 
>>   And so the first lucky one continues on it's merry way with a hashed dentry,
>>   but the other less lucky one ends up calling into d_splice_alias() with
>>   dentry that's already hashed and hits the very familiar assertion.
>> 
>>   I took a brief look into ceph and it looks like a very similar thing
>>   might happen there with handle_reply() for two parallel replies calling into
>>   ceph_fill_trace() and then splice_alias()->d_splice_alias(), since the
>>   unhashed check it does is not under any locks, it's unsafe, so the problem
>>   might be more generic than just NFS too.
>> 
>>   So I wonder how to best fix this? Holding some sort of dentry lock across a call
>>   into atomic_open in VFS? We cannot just make d_splice_alias() callers call with
>>   inode->i_lock held because dentry might be negative.
> 
> Oh, lovely...  So basically the problem is that we violate the "no lookups on
> the same name in parallel" rule on those fallbacks from foo_atomic_open() to
> foo_lookup().  The thing is, a lot of ->atomic_open() instances have such
> fallbacks and I wonder if that's a sign that we need to lift some of that
> to fs/namei.c...
> 
> Hell knows; alternative is to have that d_drop() followed by d_alloc_parallel()
> and feeding that dentry to lookup.  I'll play with that a bit and see what's
> better; hopefully I'll have something by tomorrow.

Sorry to nag you about this, but did any of those pan out?

d_alloc_parallel() sounds a bit too heavy there, esp. considering we came in with
a dentry already (though a potentially shared one, I understand).
Wouldn't it be better to try to establish some dentry locking rule for calling into
d_splice_alias() instead? At least then the callers could make sure the dentry does
not change under them.
Though I guess if there's dentry locking like that, we might as well do all the
checking in d_splice_alias(), but that means unhashed dentries would no
longer be disallowed, which is a change of semantics from what we have now.


* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-06-25 16:38   ` Oleg Drokin
@ 2016-07-03  6:29     ` Al Viro
  2016-07-04  0:08       ` Al Viro
                         ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Al Viro @ 2016-07-03  6:29 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>

On Sat, Jun 25, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:

> Sorry to nag you about this, but did any of those pan out?
> 
> d_alloc_parallel() sounds like a bit too heavy there, esp. considering we came in with
> a dentry already (though a potentially shared one, I understand).
> Would not it be better to try and establish some dentry locking rule for calling into
> d_splice_alias() instead? At least then the callers can make sure the dentry does
> not change under them?
> Though I guess if there's dentry locking like that, we might as well do all the
> checking in d_splice_alias(), but that means the unhashed dentries would no
> longer be disallowed which is a change of semantic from now.--

FWIW, the only interesting case here is this:
	* no O_CREAT in flags (otherwise the parent is held exclusive).
	* dentry is found in hash
	* dentry is negative
	* dentry has passed ->d_revalidate() (i.e. in case of
NFS it had nfs_neg_need_reval() return false).

Only two instances are non-trivial in that respect - NFS and Lustre.
Everything else will simply fail open() with ENOENT in that case.

And at least for NFS we could bloody well do d_drop + d_alloc_parallel +
finish_no_open and bugger off in case it's not in_lookup, otherwise do
pretty much what we do in case we'd got in_lookup from the very beginning.
Some adjustments are needed for that case (basically, we need to make
sure we hit d_lookup_done() matching that d_alloc_parallel() and deal
with refcounting correctly).

Tentative NFS patch follows; I don't understand Lustre well enough, but it
looks like a plausible strategy there as well.

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index d8015a03..5474e39 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1485,11 +1485,13 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		    struct file *file, unsigned open_flags,
 		    umode_t mode, int *opened)
 {
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	struct nfs_open_context *ctx;
 	struct dentry *res;
 	struct iattr attr = { .ia_valid = ATTR_OPEN };
 	struct inode *inode;
 	unsigned int lookup_flags = 0;
+	bool switched = false;
 	int err;
 
 	/* Expect a negative dentry */
@@ -1528,6 +1530,17 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		attr.ia_size = 0;
 	}
 
+	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
+		d_drop(dentry);
+		switched = true;
+		dentry = d_alloc_parallel(dentry->d_parent,
+					  &dentry->d_name, &wq);
+		if (IS_ERR(dentry))
+			return PTR_ERR(dentry);
+		if (unlikely(!d_in_lookup(dentry)))
+			return finish_no_open(file, dentry);
+	}
+
 	ctx = create_nfs_open_context(dentry, open_flags);
 	err = PTR_ERR(ctx);
 	if (IS_ERR(ctx))
@@ -1563,14 +1576,23 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 	trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
 	put_nfs_open_context(ctx);
 out:
+	if (unlikely(switched)) {
+		d_lookup_done(dentry);
+		dput(dentry);
+	}
 	return err;
 
 no_open:
 	res = nfs_lookup(dir, dentry, lookup_flags);
-	err = PTR_ERR(res);
+	if (switched) {
+		d_lookup_done(dentry);
+		if (!res)
+			res = dentry;
+		else
+			dput(dentry);
+	}
 	if (IS_ERR(res))
-		goto out;
-
+		return PTR_ERR(res);
 	return finish_no_open(file, res);
 }
 EXPORT_SYMBOL_GPL(nfs_atomic_open);


* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-03  6:29     ` Al Viro
@ 2016-07-04  0:08       ` Al Viro
  2016-07-04  0:37           ` Oleg Drokin
  2016-07-05  2:28       ` Oleg Drokin
  2016-07-05  6:22       ` Oleg Drokin
  2 siblings, 1 reply; 38+ messages in thread
From: Al Viro @ 2016-07-04  0:08 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>

On Sun, Jul 03, 2016 at 07:29:46AM +0100, Al Viro wrote:

> Tentative NFS patch follows; I don't understand Lustre well enough, but it
> looks like a plausible strategy there as well.

Speaking of Lustre: WTF is
                        /* Open dentry. */
                        if (S_ISFIFO(d_inode(dentry)->i_mode)) {
                                /* We cannot call open here as it would
                                 * deadlock.
                                 */
                                if (it_disposition(it, DISP_ENQ_OPEN_REF))
                                        ptlrpc_req_finished(
                                                       (struct ptlrpc_request *)
                                                          it->d.lustre.it_data);
                                rc = finish_no_open(file, de);
                        } else {
about and why do we only do that to FIFOs?  What about symlinks or device
nodes?  Directories, for that matter...  Shouldn't that be if (!S_ISREG(...))
instead?


* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-04  0:08       ` Al Viro
@ 2016-07-04  0:37           ` Oleg Drokin
  0 siblings, 0 replies; 38+ messages in thread
From: Oleg Drokin @ 2016-07-04  0:37 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 3, 2016, at 8:08 PM, Al Viro wrote:

> On Sun, Jul 03, 2016 at 07:29:46AM +0100, Al Viro wrote:
> 
>> Tentative NFS patch follows; I don't understand Lustre well enough, but it
>> looks like a plausible strategy there as well.
> 
> Speaking of Lustre: WTF is
>                        /* Open dentry. */
>                        if (S_ISFIFO(d_inode(dentry)->i_mode)) {
>                                /* We cannot call open here as it would
>                                 * deadlock.
>                                 */
>                                if (it_disposition(it, DISP_ENQ_OPEN_REF))
>                                        ptlrpc_req_finished(
>                                                       (struct ptlrpc_request *)
>                                                          it->d.lustre.it_data);
>                                rc = finish_no_open(file, de);
>                        } else {
> about and why do we only do that to FIFOs?  What about symlinks or device
> nodes?  Directories, for that matter...  Shouldn't that be if (!S_ISREG(...))
> instead?

Hm… This dates to sometime in 2006 and my memory is a bit hazy here.

I think when we called into the open, it went into the fifo open and got stuck
there waiting for the other opener. Something like that. And we cannot really
afford to be stuck here, because we are holding some locks that need to be
released in a predictable time.

This code is actually unreachable now because the server never returns an open
handle for special device nodes anymore (there's a comment about it in the
current staging tree, but I guess you are looking at some prior version).

I imagine device nodes might have represented a similar risk too, but it did not
occur to me to test it separately and the testsuite does not do it either.

Directories do not get stuck when you open them so they are ok and we can
atomically open them too, I guess.
Symlinks are handled specially on the server and the open never returns
the actual open handle for those, so this path is also unreachable with those.


* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-04  0:37           ` Oleg Drokin
  (?)
@ 2016-07-04  3:08           ` Al Viro
  2016-07-04  3:55               ` Oleg Drokin
  -1 siblings, 1 reply; 38+ messages in thread
From: Al Viro @ 2016-07-04  3:08 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>

On Sun, Jul 03, 2016 at 08:37:22PM -0400, Oleg Drokin wrote:

> Hm… This dates to sometime in 2006 and my memory is a bit hazy here.
> 
> I think when we called into the open, it went into fifo open and stuck there
> waiting for the other opener. Something like that. And we cannot really be stuck here
> because we are holding some locks that need to be released in predictable time.
> 
> This code is actually unreachable now because the server never returns an openhandle
> for special device nodes anymore (there's a comment about it in current staging tree,
> but I guess you are looking at some prior version).
> 
> I imagine device nodes might have represented a similar risk too, but it did not
> occur to me to test it separately and the testsuite does not do it either.
> 
> Directories do not get stuck when you open them so they are ok and we can
> atomically open them too, I guess.
> Symlinks are handled specially on the server and the open never returns
> the actual open handle for those, so this path is also unreachable with those.

Hmm...  How much does the safety of the client depend upon the correctness of
the server?

BTW, there's a fun issue in ll_revalidate_dentry(): there's nothing to
promise stability of ->d_parent in there, so uses of d_inode(dentry->d_parent)
are not safe.  That's independent from parallel lookups, and it's hard
to hit, but AFAICS it's not impossible to oops there.

Anyway, for Lustre the analogue of that NFS problem is here:
        } else if (!it_disposition(it, DISP_LOOKUP_NEG)  &&
                   !it_disposition(it, DISP_OPEN_CREATE)) {
                /* With DISP_OPEN_CREATE dentry will be
                 * instantiated in ll_create_it.
                 */
                LASSERT(!d_inode(*de));
                d_instantiate(*de, inode);
        }
AFAICS, this (on top of mainline) ought to work:

diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index 5eba0eb..b8da5b4 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -581,9 +581,11 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 			  struct file *file, unsigned open_flags,
 			  umode_t mode, int *opened)
 {
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
 	struct lookup_intent *it;
 	struct dentry *de;
 	long long lookup_flags = LOOKUP_OPEN;
+	bool switched = false;
 	int rc = 0;
 
 	CDEBUG(D_VFSTRACE, "VFS Op:name=%pd, dir="DFID"(%p),file %p,open_flags %x,mode %x opened %d\n",
@@ -603,11 +605,28 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 	it->it_flags = (open_flags & ~O_ACCMODE) | OPEN_FMODE(open_flags);
 
 	/* Dentry added to dcache tree in ll_lookup_it */
+	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
+		d_drop(dentry);
+		switched = true;
+	        dentry = d_alloc_parallel(dentry->d_parent,
+					  &dentry->d_name, &wq);
+		if (IS_ERR(dentry)) {
+			rc = PTR_ERR(dentry);
+			goto out_release;
+		}
+		if (unlikely(!d_in_lookup(dentry))) {
+			rc = finish_no_open(file, dentry);
+			goto out_release;
+		}
+	}
+
 	de = ll_lookup_it(dir, dentry, it, lookup_flags);
 	if (IS_ERR(de))
 		rc = PTR_ERR(de);
 	else if (de)
 		dentry = de;
+	else if (switched)
+		de = dget(dentry);
 
 	if (!rc) {
 		if (it_disposition(it, DISP_OPEN_CREATE)) {
@@ -648,6 +667,10 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 	}
 
 out_release:
+	if (unlikely(switched)) {
+		d_lookup_done(dentry);
+		dput(dentry);
+	}
 	ll_intent_release(it);
 	kfree(it);
 


* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-04  3:08           ` Al Viro
@ 2016-07-04  3:55               ` Oleg Drokin
  0 siblings, 0 replies; 38+ messages in thread
From: Oleg Drokin @ 2016-07-04  3:55 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 3, 2016, at 11:08 PM, Al Viro wrote:

> On Sun, Jul 03, 2016 at 08:37:22PM -0400, Oleg Drokin wrote:
> 
>> Hm… This dates to sometime in 2006 and my memory is a bit hazy here.
>> 
>> I think when we called into the open, it went into fifo open and stuck there
>> waiting for the other opener. Something like that. And we cannot really be stuck here
>> because we are holding some locks that need to be released in predictable time.
>> 
>> This code is actually unreachable now because the server never returns an openhandle
>> for special device nodes anymore (there's a comment about it in current staging tree,
>> but I guess you are looking at some prior version).
>> 
>> I imagine device nodes might have represented a similar risk too, but it did not
>> occur to me to test it separately and the testsuite does not do it either.
>> 
>> Directories do not get stuck when you open them so they are ok and we can
>> atomically open them too, I guess.
>> Symlinks are handled specially on the server and the open never returns
>> the actual open handle for those, so this path is also unreachable with those.
> 
> Hmm...  How much does the safety of client depend upon the correctness of
> server?

Quite a bit, actually. If you connect to a rogue Lustre server,
there are currently many ways it can crash the client.
I suspect this is true not just of Lustre: if, e.g., an NFS server starts
sending directory inodes with duplicated inode numbers or some such,
the VFS would not be super happy about such "hardlinked" directories either.
That is before we even consider that the server can feed you garbage data
to crash your apps (or substitute binaries that do something else).

> BTW, there's a fun issue in ll_revalidate_dentry(): there's nothing to
> promise stability of ->d_parent in there, so uses of d_inode(dentry->d_parent)

Yes, we actually had a discussion about that in March, we were not the only ones
affected, and I think it was decided that dget_parent() was a better solution
to get to the parent (I see ext4 has already converted).
I believe you cannot hit it in Lustre now due to Lustre locking magic, but
I'll create a patch to cover this anyway. Thanks for reminding me about this.
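
Roughly the pattern I have in mind there (just a sketch of the dget_parent()
approach, not the actual patch):

	struct dentry *parent = dget_parent(dentry);
	struct inode *dir = d_inode(parent);	/* stable while we hold the ref */

	/* ... use dir instead of d_inode(dentry->d_parent) ... */

	dput(parent);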

> are not safe.  That's independent from parallel lookups, and it's hard
> to hit, but AFAICS it's not impossible to oops there.
> 
> Anyway, for Lustre the analogue of that NFS problem is here:
>        } else if (!it_disposition(it, DISP_LOOKUP_NEG)  &&
>                   !it_disposition(it, DISP_OPEN_CREATE)) {
>                /* With DISP_OPEN_CREATE dentry will be
>                 * instantiated in ll_create_it.
>                 */
>                LASSERT(!d_inode(*de));
>                d_instantiate(*de, inode);
>        }

Hm… Do you mean the case where we come in hashed here, with a negative dentry
and a positive disposition, and hit the assertion about the inode not being NULL
(the dentry still staying negative, basically)?
This one we cannot hit because negative dentries are protected by a Lustre
DLM lock held by the parent directory. Any create in that parent directory
would invalidate the lock, and once that happens, all negative dentries would
be killed.
Hmm… This probably means this is dead code?
Ah, I guess it's not: if we do a lookup and find this negative dentry (from
2+ threads) and THEN it gets invalidated, and our two threads both race to
instantiate it...
It does look like something that is quite hard to hit, but it still looks
like a race that could happen.

> AFAICS, this (on top of mainline) ought to work:

Thanks, I'll give this a try.
> 
> diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
> index 5eba0eb..b8da5b4 100644
> --- a/drivers/staging/lustre/lustre/llite/namei.c
> +++ b/drivers/staging/lustre/lustre/llite/namei.c
> @@ -581,9 +581,11 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
> 			  struct file *file, unsigned open_flags,
> 			  umode_t mode, int *opened)
> {
> +	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
> 	struct lookup_intent *it;
> 	struct dentry *de;
> 	long long lookup_flags = LOOKUP_OPEN;
> +	bool switched = false;
> 	int rc = 0;
> 
> 	CDEBUG(D_VFSTRACE, "VFS Op:name=%pd, dir="DFID"(%p),file %p,open_flags %x,mode %x opened %d\n",
> @@ -603,11 +605,28 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
> 	it->it_flags = (open_flags & ~O_ACCMODE) | OPEN_FMODE(open_flags);
> 
> 	/* Dentry added to dcache tree in ll_lookup_it */
> +	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
> +		d_drop(dentry);
> +		switched = true;
> +	        dentry = d_alloc_parallel(dentry->d_parent,
> +					  &dentry->d_name, &wq);
> +		if (IS_ERR(dentry)) {
> +			rc = PTR_ERR(dentry);
> +			goto out_release;
> +		}
> +		if (unlikely(!d_in_lookup(dentry))) {
> +			rc = finish_no_open(file, dentry);
> +			goto out_release;
> +		}
> +	}
> +
> 	de = ll_lookup_it(dir, dentry, it, lookup_flags);
> 	if (IS_ERR(de))
> 		rc = PTR_ERR(de);
> 	else if (de)
> 		dentry = de;
> +	else if (switched)
> +		de = dget(dentry);
> 
> 	if (!rc) {
> 		if (it_disposition(it, DISP_OPEN_CREATE)) {
> @@ -648,6 +667,10 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
> 	}
> 
> out_release:
> +	if (unlikely(switched)) {
> +		d_lookup_done(dentry);
> +		dput(dentry);
> +	}
> 	ll_intent_release(it);
> 	kfree(it);
> 

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-04  3:55               ` Oleg Drokin
  (?)
@ 2016-07-05  2:25               ` Al Viro
  2016-07-10 17:01                 ` Oleg Drokin
  -1 siblings, 1 reply; 38+ messages in thread
From: Al Viro @ 2016-07-05  2:25 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>

On Sun, Jul 03, 2016 at 11:55:09PM -0400, Oleg Drokin wrote:
> Quite a bit, actually. If you connect to an rogue Lustre server,
> currently there are many ways it can crash the client.
> I suspect this is true not just of Lustre, if e.g. NFS server starts to
> send directory inodes with duplicated inode numbers or some such,
> VFS would not be super happy about such "hardlinked" directories either.
> This is before we even consider that it can feed you garbage data
> to crash your apps (or substitute binaries to do something else).

The NFS client is at least supposed to try to be resistant to that.  As in,
"if an 0wn3d NFS server can be escalated to a buggered client, it's a bug in
the client and we are expected to try and fix it".

[snip]
> Thanks, I'll give this a try.

BTW, could you take a look at
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git#sendmsg.lustre?
It's a bunch of simplifications that became possible once sendmsg()/recvmsg()
switched to iov_iter, stopped mangling the iovecs and went for predictable
behaviour re advancing the iterator.


* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-03  6:29     ` Al Viro
  2016-07-04  0:08       ` Al Viro
@ 2016-07-05  2:28       ` Oleg Drokin
  2016-07-05  2:32         ` Oleg Drokin
  2016-07-05  4:43         ` Oleg Drokin
  2016-07-05  6:22       ` Oleg Drokin
  2 siblings, 2 replies; 38+ messages in thread
From: Oleg Drokin @ 2016-07-05  2:28 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 3, 2016, at 2:29 AM, Al Viro wrote:

> On Sat, Jun 25, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
> 
>> Sorry to nag you about this, but did any of those pan out?
>> 
>> d_alloc_parallel() sounds like a bit too heavy there, esp. considering we came in with
>> a dentry already (though a potentially shared one, I understand).
>> Would not it be better to try and establish some dentry locking rule for calling into
>> d_splice_alias() instead? At least then the callers can make sure the dentry does
>> not change under them?
>> Though I guess if there's dentry locking like that, we might as well do all the
>> checking in d_splice_alias(), but that means the unhashed dentries would no
>> longer be disallowed which is a change of semantic from now.--
> 
> FWIW, the only interesting case here is this:
> 	* no O_CREAT in flags (otherwise the parent is held exclusive).
> 	* dentry is found in hash
> 	* dentry is negative
> 	* dentry has passed ->d_revalidate() (i.e. in case of
> NFS it had nfs_neg_need_reval() return false).
> 
> Only two instances are non-trivial in that respect - NFS and Lustre.
> Everything else will simply fail open() with ENOENT in that case.
> 
> And at least for NFS we could bloody well do d_drop + d_alloc_parallel +
> finish_no_open and bugger off in case it's not in_lookup, otherwise do
> pretty much what we do in case we'd got in_lookup from the very beginning.
> Some adjustments are needed for that case (basically, we need to make
> sure we hit d_lookup_done() matching that d_alloc_parallel() and deal
> with refcounting correctly).
> 
> Tentative NFS patch follows; I don't understand Lustre well enough, but it
> looks like a plausible strategy there as well.

This patch seems to have brought the other crash back in (or something similar),
the one with a negative dentry being hashed when it's already hashed.
It's not as easy to hit as before, but a lot easier than the race we are hitting here.

Also, in all cases it is ls that is crashing now, which seems to suggest it is
taking some of the new paths added here.
I have half a dozen crash dumps if you want me to look something up in them.
This is on top of 4.7.0-rc6 (a99cde438de0c4c0cecc1d1af1a55a75b10bfdef)
with just your patch on top.

I can reproduce it with both the complex workload and a simplified one (attached);
it takes anywhere from 5 to 30 minutes to hit.

> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index d8015a03..5474e39 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1485,11 +1485,13 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> 		    struct file *file, unsigned open_flags,
> 		    umode_t mode, int *opened)
> {
> +	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
> 	struct nfs_open_context *ctx;
> 	struct dentry *res;
> 	struct iattr attr = { .ia_valid = ATTR_OPEN };
> 	struct inode *inode;
> 	unsigned int lookup_flags = 0;
> +	bool switched = false;
> 	int err;
> 
> 	/* Expect a negative dentry */
> @@ -1528,6 +1530,17 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> 		attr.ia_size = 0;
> 	}
> 
> +	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
> +		d_drop(dentry);
> +		switched = true;
> +		dentry = d_alloc_parallel(dentry->d_parent,
> +					  &dentry->d_name, &wq);

Hm, it seems d_alloc_parallel() can return some preexisting dentry that it was
able to look up under RCU, attached to the same parent with the same name?
What's to stop a parallel thread from rehashing the dentry after we dropped it
here, so that d_alloc_parallel() happily finds it?
I am running a test to verify this theory now.

> +		if (IS_ERR(dentry))
> +			return PTR_ERR(dentry);
> +		if (unlikely(!d_in_lookup(dentry)))
> +			return finish_no_open(file, dentry);
> +	}
> +
> 	ctx = create_nfs_open_context(dentry, open_flags);
> 	err = PTR_ERR(ctx);
> 	if (IS_ERR(ctx))
> @@ -1563,14 +1576,23 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> 	trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
> 	put_nfs_open_context(ctx);
> out:
> +	if (unlikely(switched)) {
> +		d_lookup_done(dentry);
> +		dput(dentry);
> +	}
> 	return err;
> 
> no_open:
> 	res = nfs_lookup(dir, dentry, lookup_flags);
> -	err = PTR_ERR(res);
> +	if (switched) {
> +		d_lookup_done(dentry);
> +		if (!res)
> +			res = dentry;
> +		else
> +			dput(dentry);
> +	}
> 	if (IS_ERR(res))
> -		goto out;
> -
> +		return PTR_ERR(res);
> 	return finish_no_open(file, res);
> }
> EXPORT_SYMBOL_GPL(nfs_atomic_open);


* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05  2:28       ` Oleg Drokin
@ 2016-07-05  2:32         ` Oleg Drokin
  2016-07-05  4:43         ` Oleg Drokin
  1 sibling, 0 replies; 38+ messages in thread
From: Oleg Drokin @ 2016-07-05  2:32 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>

[-- Attachment #1: Type: text/plain, Size: 59 bytes --]

ah,
and of course the testcase that I forgot to attach ;)


[-- Attachment #2: lstest.sh --]
[-- Type: application/octet-stream, Size: 1655 bytes --]

#!/bin/bash
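# Reproducer: build an ext4 image, export it over NFS, mount it back via both
# NFSv3 and NFSv4, then race one mkdir/mv/touch/rm loop on the exported fs
# against nine parallel "ls -lR" readers on the NFSv4 mount.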

cd /home/green/git/lustre-release/lustre/tests/racer
dd if=/dev/zero of=/tmp/loop bs=1024k count=1024
mkfs.ext4 /tmp/loop
service rpcbind start
mount none /var/lib/nfs -t tmpfs
touch /var/lib/nfs/etab
service nfs-mountd start
mount /tmp/loop /mnt/lustre -o loop
service nfs-server start
mount localhost:/mnt/lustre /mnt/nfs -t nfs -o nolock
mount localhost:/ /mnt/nfs2 -t nfs4
mkdir /mnt/lustre/racer

while :; do mkdir /mnt/lustre/racer/a ; mv /mnt/lustre/racer/a /mnt/lustre/racer/b ; touch /mnt/lustre/racer/a ; mv /mnt/lustre/racer/a /mnt/lustre/racer/b/ ; rm -rf /mnt/lustre/racer/b ; done &

while :; do ls -l /mnt/nfs2/racer/a/ ; ls -l /mnt/nfs2/racer/b/ ; ls -lR /mnt/nfs2/racer/ ; done >/dev/null 2>&1 &
while :; do ls -l /mnt/nfs2/racer/a/ ; ls -l /mnt/nfs2/racer/b/ ; ls -lR /mnt/nfs2/racer/ ; done >/dev/null 2>&1 &
while :; do ls -l /mnt/nfs2/racer/a/ ; ls -l /mnt/nfs2/racer/b/ ; ls -lR /mnt/nfs2/racer/ ; done >/dev/null 2>&1 &
while :; do ls -l /mnt/nfs2/racer/a/ ; ls -l /mnt/nfs2/racer/b/ ; ls -lR /mnt/nfs2/racer/ ; done >/dev/null 2>&1 &
while :; do ls -l /mnt/nfs2/racer/a/ ; ls -l /mnt/nfs2/racer/b/ ; ls -lR /mnt/nfs2/racer/ ; done >/dev/null 2>&1 &
while :; do ls -l /mnt/nfs2/racer/a/ ; ls -l /mnt/nfs2/racer/b/ ; ls -lR /mnt/nfs2/racer/ ; done >/dev/null 2>&1 &
while :; do ls -l /mnt/nfs2/racer/a/ ; ls -l /mnt/nfs2/racer/b/ ; ls -lR /mnt/nfs2/racer/ ; done >/dev/null 2>&1 &
while :; do ls -l /mnt/nfs2/racer/a/ ; ls -l /mnt/nfs2/racer/b/ ; ls -lR /mnt/nfs2/racer/ ; done >/dev/null 2>&1 &
while :; do ls -l /mnt/nfs2/racer/a/ ; ls -l /mnt/nfs2/racer/b/ ; ls -lR /mnt/nfs2/racer/ ; done >/dev/null 2>&1 &

wait %1

[-- Attachment #3: Type: text/plain, Size: 4773 bytes --]



On Jul 4, 2016, at 10:28 PM, Oleg Drokin wrote:

> 
> On Jul 3, 2016, at 2:29 AM, Al Viro wrote:
> 
>> On Sat, Jun 25, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
>> 
>>> Sorry to nag you about this, but did any of those pan out?
>>> 
>>> d_alloc_parallel() sounds like a bit too heavy there, esp. considering we came in with
>>> a dentry already (though a potentially shared one, I understand).
>>> Would not it be better to try and establish some dentry locking rule for calling into
>>> d_splice_alias() instead? At least then the callers can make sure the dentry does
>>> not change under them?
>>> Though I guess if there's dentry locking like that, we might as well do all the
>>> checking in d_splice_alias(), but that means the unhashed dentries would no
>>> longer be disallowed which is a change of semantic from now.--
>> 
>> FWIW, the only interesting case here is this:
>> 	* no O_CREAT in flags (otherwise the parent is held exclusive).
>> 	* dentry is found in hash
>> 	* dentry is negative
>> 	* dentry has passed ->d_revalidate() (i.e. in case of
>> NFS it had nfs_neg_need_reval() return false).
>> 
>> Only two instances are non-trivial in that respect - NFS and Lustre.
>> Everything else will simply fail open() with ENOENT in that case.
>> 
>> And at least for NFS we could bloody well do d_drop + d_alloc_parallel +
>> finish_no_open and bugger off in case it's not in_lookup, otherwise do
>> pretty much what we do in case we'd got in_lookup from the very beginning.
>> Some adjustments are needed for that case (basically, we need to make
>> sure we hit d_lookup_done() matching that d_alloc_parallel() and deal
>> with refcounting correctly).
>> 
>> Tentative NFS patch follows; I don't understand Lustre well enough, but it
>> looks like a plausible strategy there as well.
> 
> This patch seems to have brought the other crash back in (or something similar),
> the one with a negative dentry being hashed when it's already hashed.
> it's not as easy to hit as before, but a lot easier than the race we are hitting here.
> 
> Also in all cases it is ls that is crashing now, which seems to highlight it is
> taking some of this new paths added.
> I have half a dosen crashdumps if you want me to look something up there.
> This is on top of 4.7.0-rc6 a99cde438de0c4c0cecc1d1af1a55a75b10bfdef
> with just your patch on top.
> 
> I can reproduce it with both the complex workload, or with a simplified one (attached),
> takes anywhere from 5 to 30 minutes to hit.
> 
>> 
>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> index d8015a03..5474e39 100644
>> --- a/fs/nfs/dir.c
>> +++ b/fs/nfs/dir.c
>> @@ -1485,11 +1485,13 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> 		    struct file *file, unsigned open_flags,
>> 		    umode_t mode, int *opened)
>> {
>> +	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
>> 	struct nfs_open_context *ctx;
>> 	struct dentry *res;
>> 	struct iattr attr = { .ia_valid = ATTR_OPEN };
>> 	struct inode *inode;
>> 	unsigned int lookup_flags = 0;
>> +	bool switched = false;
>> 	int err;
>> 
>> 	/* Expect a negative dentry */
>> @@ -1528,6 +1530,17 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> 		attr.ia_size = 0;
>> 	}
>> 
>> +	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
>> +		d_drop(dentry);
>> +		switched = true;
>> +		dentry = d_alloc_parallel(dentry->d_parent,
>> +					  &dentry->d_name, &wq);
> 
> Hm, d_alloc_parallel can return some preexisting dentry it was able to lookup in
> rcu attached to the same parent with the same name it seems?
> What's to stop a parallel thread to rehash the dentry after we dropped it here,
> and so d_alloc_parallel will happily find it?
> I am running a test to verify this theory now.
> 
>> +		if (IS_ERR(dentry))
>> +			return PTR_ERR(dentry);
>> +		if (unlikely(!d_in_lookup(dentry)))
>> +			return finish_no_open(file, dentry);
>> +	}
>> +
>> 	ctx = create_nfs_open_context(dentry, open_flags);
>> 	err = PTR_ERR(ctx);
>> 	if (IS_ERR(ctx))
>> @@ -1563,14 +1576,23 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> 	trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
>> 	put_nfs_open_context(ctx);
>> out:
>> +	if (unlikely(switched)) {
>> +		d_lookup_done(dentry);
>> +		dput(dentry);
>> +	}
>> 	return err;
>> 
>> no_open:
>> 	res = nfs_lookup(dir, dentry, lookup_flags);
>> -	err = PTR_ERR(res);
>> +	if (switched) {
>> +		d_lookup_done(dentry);
>> +		if (!res)
>> +			res = dentry;
>> +		else
>> +			dput(dentry);
>> +	}
>> 	if (IS_ERR(res))
>> -		goto out;
>> -
>> +		return PTR_ERR(res);
>> 	return finish_no_open(file, res);
>> }
>> EXPORT_SYMBOL_GPL(nfs_atomic_open);
> 



* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05  2:28       ` Oleg Drokin
  2016-07-05  2:32         ` Oleg Drokin
@ 2016-07-05  4:43         ` Oleg Drokin
  1 sibling, 0 replies; 38+ messages in thread
From: Oleg Drokin @ 2016-07-05  4:43 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 4, 2016, at 10:28 PM, Oleg Drokin wrote:

> 
> On Jul 3, 2016, at 2:29 AM, Al Viro wrote:
> 
>> On Sat, Jun 25, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
>> 
>>> Sorry to nag you about this, but did any of those pan out?
>>> 
>>> d_alloc_parallel() sounds like a bit too heavy there, esp. considering we came in with
>>> a dentry already (though a potentially shared one, I understand).
>>> Would not it be better to try and establish some dentry locking rule for calling into
>>> d_splice_alias() instead? At least then the callers can make sure the dentry does
>>> not change under them?
>>> Though I guess if there's dentry locking like that, we might as well do all the
>>> checking in d_splice_alias(), but that means the unhashed dentries would no
>>> longer be disallowed which is a change of semantic from now.--
>> 
>> FWIW, the only interesting case here is this:
>> 	* no O_CREAT in flags (otherwise the parent is held exclusive).
>> 	* dentry is found in hash
>> 	* dentry is negative
>> 	* dentry has passed ->d_revalidate() (i.e. in case of
>> NFS it had nfs_neg_need_reval() return false).
>> 
>> Only two instances are non-trivial in that respect - NFS and Lustre.
>> Everything else will simply fail open() with ENOENT in that case.
>> 
>> And at least for NFS we could bloody well do d_drop + d_alloc_parallel +
>> finish_no_open and bugger off in case it's not in_lookup, otherwise do
>> pretty much what we do in case we'd got in_lookup from the very beginning.
>> Some adjustments are needed for that case (basically, we need to make
>> sure we hit d_lookup_done() matching that d_alloc_parallel() and deal
>> with refcounting correctly).
>> 
>> Tentative NFS patch follows; I don't understand Lustre well enough, but it
>> looks like a plausible strategy there as well.
> 
> This patch seems to have brought the other crash back in (or something similar),
> the one with a negative dentry being hashed when it's already hashed.
> it's not as easy to hit as before, but a lot easier than the race we are hitting here.
> 
> Also in all cases it is ls that is crashing now, which seems to highlight it is
> taking some of this new paths added.
> I have half a dosen crashdumps if you want me to look something up there.
> This is on top of 4.7.0-rc6 a99cde438de0c4c0cecc1d1af1a55a75b10bfdef
> with just your patch on top.
> 
> I can reproduce it with both the complex workload, or with a simplified one (attached),
> takes anywhere from 5 to 30 minutes to hit.
> 
>> 
>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> index d8015a03..5474e39 100644
>> --- a/fs/nfs/dir.c
>> +++ b/fs/nfs/dir.c
>> @@ -1485,11 +1485,13 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> 		    struct file *file, unsigned open_flags,
>> 		    umode_t mode, int *opened)
>> {
>> +	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
>> 	struct nfs_open_context *ctx;
>> 	struct dentry *res;
>> 	struct iattr attr = { .ia_valid = ATTR_OPEN };
>> 	struct inode *inode;
>> 	unsigned int lookup_flags = 0;
>> +	bool switched = false;
>> 	int err;
>> 
>> 	/* Expect a negative dentry */
>> @@ -1528,6 +1530,17 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> 		attr.ia_size = 0;
>> 	}
>> 
>> +	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
>> +		d_drop(dentry);
>> +		switched = true;
>> +		dentry = d_alloc_parallel(dentry->d_parent,
>> +					  &dentry->d_name, &wq);
> 
> Hm, d_alloc_parallel can return some preexisting dentry it was able to lookup in
> rcu attached to the same parent with the same name it seems?
> What's to stop a parallel thread to rehash the dentry after we dropped it here,
> and so d_alloc_parallel will happily find it?
> I am running a test to verify this theory now.

OK, so at least in my case we do not come out of d_alloc_parallel() with a hashed
dentry, though I wonder why you are not trying to catch this case here.
If you rely on the !d_in_lookup(dentry) check below, that seems racy:
if it was __d_lookup_rcu() that found the dentry, we do not seem to perform
any additional checks on it and just return it.
But what if it was just added by a parallel caller that has just finished
d_splice_alias() (= a hashed dentry, so it's not rejected by __d_lookup_rcu())
and at the same time has not progressed all the way to the d_lookup_done()
call below?

Interesting that in some of the dumps d_hash is all zeroes, which means
it is unhashed, yet the assertion was triggered, so some parallel caller
performed a d_drop() on us (but there was nothing of interest on the other CPUs).

In all cases d_flags = 140, so we do not have DCACHE_PAR_LOOKUP set at the
point of the crash.

Also, in some of the dumps the dentry is actually positive, not negative.


> 
>> +		if (IS_ERR(dentry))
>> +			return PTR_ERR(dentry);
>> +		if (unlikely(!d_in_lookup(dentry)))
>> +			return finish_no_open(file, dentry);
>> +	}
>> +
>> 	ctx = create_nfs_open_context(dentry, open_flags);
>> 	err = PTR_ERR(ctx);
>> 	if (IS_ERR(ctx))
>> @@ -1563,14 +1576,23 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> 	trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
>> 	put_nfs_open_context(ctx);
>> out:
>> +	if (unlikely(switched)) {
>> +		d_lookup_done(dentry);
>> +		dput(dentry);
>> +	}
>> 	return err;
>> 
>> no_open:
>> 	res = nfs_lookup(dir, dentry, lookup_flags);
>> -	err = PTR_ERR(res);
>> +	if (switched) {
>> +		d_lookup_done(dentry);
>> +		if (!res)
>> +			res = dentry;
>> +		else
>> +			dput(dentry);
>> +	}
>> 	if (IS_ERR(res))
>> -		goto out;
>> -
>> +		return PTR_ERR(res);
>> 	return finish_no_open(file, res);
>> }
>> EXPORT_SYMBOL_GPL(nfs_atomic_open);
> 


* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-03  6:29     ` Al Viro
  2016-07-04  0:08       ` Al Viro
  2016-07-05  2:28       ` Oleg Drokin
@ 2016-07-05  6:22       ` Oleg Drokin
  2016-07-05 12:31         ` Al Viro
  2 siblings, 1 reply; 38+ messages in thread
From: Oleg Drokin @ 2016-07-05  6:22 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 3, 2016, at 2:29 AM, Al Viro wrote:

> On Sat, Jun 25, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
> 
>> Sorry to nag you about this, but did any of those pan out?
>> 
>> d_alloc_parallel() sounds like a bit too heavy there, esp. considering we came in with
>> a dentry already (though a potentially shared one, I understand).
>> Would not it be better to try and establish some dentry locking rule for calling into
>> d_splice_alias() instead? At least then the callers can make sure the dentry does
>> not change under them?
>> Though I guess if there's dentry locking like that, we might as well do all the
>> checking in d_splice_alias(), but that means the unhashed dentries would no
>> longer be disallowed which is a change of semantic from now.--
> 
> FWIW, the only interesting case here is this:
> 	* no O_CREAT in flags (otherwise the parent is held exclusive).
> 	* dentry is found in hash
> 	* dentry is negative
> 	* dentry has passed ->d_revalidate() (i.e. in case of
> NFS it had nfs_neg_need_reval() return false).
> 
> Only two instances are non-trivial in that respect - NFS and Lustre.
> Everything else will simply fail open() with ENOENT in that case.
> 
> And at least for NFS we could bloody well do d_drop + d_alloc_parallel +
> finish_no_open and bugger off in case it's not in_lookup, otherwise do
> pretty much what we do in case we'd got in_lookup from the very beginning.
> Some adjustments are needed for that case (basically, we need to make
> sure we hit d_lookup_done() matching that d_alloc_parallel() and deal
> with refcounting correctly).
> 
> Tentative NFS patch follows; I don't understand Lustre well enough, but it
> looks like a plausible strategy there as well.

I think I know why this does not work or at least part of the reason.

> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index d8015a03..5474e39 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1485,11 +1485,13 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> 		    struct file *file, unsigned open_flags,
> 		    umode_t mode, int *opened)
> {
> +	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
> 	struct nfs_open_context *ctx;
> 	struct dentry *res;
> 	struct iattr attr = { .ia_valid = ATTR_OPEN };
> 	struct inode *inode;
> 	unsigned int lookup_flags = 0;
> +	bool switched = false;
> 	int err;
> 
> 	/* Expect a negative dentry */
> @@ -1528,6 +1530,17 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> 		attr.ia_size = 0;
> 	}
> 
> +	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {

So we come racing here from multiple threads (say 3 or more - we have seen this
in the older crash reports, so totally possible)

> +		d_drop(dentry);

One lucky thread does this first, before the others perform the !d_unhashed check above.
That makes the other ones skip this block entirely.

And we are back to the original problem of multiple threads trying to instantiate
the same dentry as before.

Only this time the race window is somehow even wider because of some other interactions
I don't really understand; at least with this patch removed I seem to have a
harder time hitting the original issue.
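To make that window concrete, here is a minimal userspace analogy (plain C with
pthreads, nothing NFS-specific in it; every name in it is made up for illustration):
the atomic flag stands in for the dentry hash state, and only the one thread that
flips it gets diverted into the serialized d_alloc_parallel()-style path, while
everyone else falls through and keeps racing on the same name, just like the
d_splice_alias() callers above.

	/* race_sketch.c - build with: cc -std=c11 -pthread race_sketch.c */
	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdio.h>

	static atomic_int hashed = 1;		/* stands in for "dentry is hashed" */
	static atomic_int unserialized;		/* threads that bypassed the guarded path */

	static void *opener(void *arg)
	{
		/* analogue of: if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) */
		if (atomic_exchange(&hashed, 0)) {
			/* the one "lucky" thread: it would go through
			 * d_alloc_parallel() and get properly serialized */
			return NULL;
		}
		/* the others skip the guarded block, keep the shared dentry and
		 * all proceed to "instantiate" the same name concurrently */
		atomic_fetch_add(&unserialized, 1);
		return NULL;
	}

	int main(void)
	{
		pthread_t t[3];

		for (int i = 0; i < 3; i++)
			pthread_create(&t[i], NULL, opener, NULL);
		for (int i = 0; i < 3; i++)
			pthread_join(t[i], NULL);
		printf("threads left racing on the same name: %d\n",
		       atomic_load(&unserialized));
		return 0;
	}

With three threads this always prints 2: exactly one thread wins the guard, and
the other two are back to colliding in d_splice_alias() territory.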


> +		switched = true;
> +		dentry = d_alloc_parallel(dentry->d_parent,
> +					  &dentry->d_name, &wq);
> +		if (IS_ERR(dentry))
> +			return PTR_ERR(dentry);
> +		if (unlikely(!d_in_lookup(dentry)))
> +			return finish_no_open(file, dentry);
> +	}
> +
> 	ctx = create_nfs_open_context(dentry, open_flags);
> 	err = PTR_ERR(ctx);
> 	if (IS_ERR(ctx))
> @@ -1563,14 +1576,23 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> 	trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
> 	put_nfs_open_context(ctx);
> out:
> +	if (unlikely(switched)) {
> +		d_lookup_done(dentry);
> +		dput(dentry);
> +	}
> 	return err;
> 
> no_open:
> 	res = nfs_lookup(dir, dentry, lookup_flags);
> -	err = PTR_ERR(res);
> +	if (switched) {
> +		d_lookup_done(dentry);
> +		if (!res)
> +			res = dentry;
> +		else
> +			dput(dentry);
> +	}
> 	if (IS_ERR(res))
> -		goto out;
> -
> +		return PTR_ERR(res);
> 	return finish_no_open(file, res);
> }
> EXPORT_SYMBOL_GPL(nfs_atomic_open);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05  6:22       ` Oleg Drokin
@ 2016-07-05 12:31         ` Al Viro
  2016-07-05 13:51           ` Al Viro
  0 siblings, 1 reply; 38+ messages in thread
From: Al Viro @ 2016-07-05 12:31 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>

On Tue, Jul 05, 2016 at 02:22:48AM -0400, Oleg Drokin wrote:

> > +	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {

s/d_unhashed/d_in_lookup/ in that.

> So we come racing here from multiple threads (say 3 or more - we have seen this
> in the older crash reports, so totally possible)
> 
> > +		d_drop(dentry);
> 
> One lucky one does this first before the others perform the !d_unhashed check above.
> This makes the other ones to not enter here.
> 
> And we are back to the original problem of multiple threads trying to instantiate
> same dentry as before.

Yep.  See above - it should've been using d_in_lookup() in the first place,
through the entire nfs_atomic_open().  Same in the Lustre part of fixes,
obviously.
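
Concretely, the guard in the tentative patch would then presumably look something
like this (just a sketch of that substitution, not the final committed hunk):

	/* d_in_lookup() is stable for the dentry we were handed - nobody else
	 * can set or clear DCACHE_PAR_LOOKUP on it behind our back, unlike the
	 * hash state, which any other thread can change with d_drop() */
	if (!(open_flags & O_CREAT) && !d_in_lookup(dentry)) {
		d_drop(dentry);
		switched = true;
		dentry = d_alloc_parallel(dentry->d_parent,
					  &dentry->d_name, &wq);
		if (IS_ERR(dentry))
			return PTR_ERR(dentry);
		if (unlikely(!d_in_lookup(dentry)))
			return finish_no_open(file, dentry);
	}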

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05 12:31         ` Al Viro
@ 2016-07-05 13:51           ` Al Viro
  2016-07-05 15:21             ` Oleg Drokin
                               ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Al Viro @ 2016-07-05 13:51 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>

On Tue, Jul 05, 2016 at 01:31:10PM +0100, Al Viro wrote:
> On Tue, Jul 05, 2016 at 02:22:48AM -0400, Oleg Drokin wrote:
> 
> > > +	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
> 
> s/d_unhashed/d_in_lookup/ in that.
> 
> > So we come racing here from multiple threads (say 3 or more - we have seen this
> > in the older crash reports, so totally possible)
> > 
> > > +		d_drop(dentry);
> > 
> > One lucky one does this first before the others perform the !d_unhashed check above.
> > This makes the other ones to not enter here.
> > 
> > And we are back to the original problem of multiple threads trying to instantiate
> > same dentry as before.
> 
> Yep.  See above - it should've been using d_in_lookup() in the first place,
> through the entire nfs_atomic_open().  Same in the Lustre part of fixes,
> obviously.

See current #for-linus for hopefully fixed variants (both lustre and nfs)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05 13:51           ` Al Viro
@ 2016-07-05 15:21             ` Oleg Drokin
  2016-07-05 17:42               ` Al Viro
  2016-07-05 16:33             ` Oleg Drokin
  2016-07-06 16:24             ` Oleg Drokin
  2 siblings, 1 reply; 38+ messages in thread
From: Oleg Drokin @ 2016-07-05 15:21 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 5, 2016, at 9:51 AM, Al Viro wrote:

> On Tue, Jul 05, 2016 at 01:31:10PM +0100, Al Viro wrote:
>> On Tue, Jul 05, 2016 at 02:22:48AM -0400, Oleg Drokin wrote:
>> 
>>>> +	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
>> 
>> s/d_unhashed/d_in_lookup/ in that.
>> 
>>> So we come racing here from multiple threads (say 3 or more - we have seen this
>>> in the older crash reports, so totally possible)
>>> 
>>>> +		d_drop(dentry);
>>> 
>>> One lucky one does this first before the others perform the !d_unhashed check above.
>>> This makes the other ones to not enter here.
>>> 
>>> And we are back to the original problem of multiple threads trying to instantiate
>>> same dentry as before.
>> 
>> Yep.  See above - it should've been using d_in_lookup() in the first place,
>> through the entire nfs_atomic_open().  Same in the Lustre part of fixes,
>> obviously.
> 
> See current #for-linus for hopefully fixed variants (both lustre and nfs)

The first patch of the series:
> @@ -416,9 +416,9 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
> ...
> -       if (d_unhashed(*de)) {
> +       if (d_in_lookup(*de)) {
>                 struct dentry *alias;
>  
>                 alias = ll_splice_alias(inode, *de);

This breaks Lustre because we might now progress further in this function
without calling into ll_splice_alias(), and that is the only place where we do
the ll_d_init() that later code depends on, so we crash violently the next time
we call e.g. d_lustre_revalidate() further down that path.

Also I still wonder what's to stop d_alloc_parallel() from returning
a hashed dentry with d_in_lookup() still true?
Certainly there's a big gap between hashing the dentry and dropping the PAR
bit in there that I imagine might allow __d_lookup_rcu() to pick it up
in between?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05 13:51           ` Al Viro
  2016-07-05 15:21             ` Oleg Drokin
@ 2016-07-05 16:33             ` Oleg Drokin
  2016-07-05 18:08               ` Al Viro
  2016-07-06 16:24             ` Oleg Drokin
  2 siblings, 1 reply; 38+ messages in thread
From: Oleg Drokin @ 2016-07-05 16:33 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 5, 2016, at 9:51 AM, Al Viro wrote:

> On Tue, Jul 05, 2016 at 01:31:10PM +0100, Al Viro wrote:
>> On Tue, Jul 05, 2016 at 02:22:48AM -0400, Oleg Drokin wrote:
>> 
>>>> +	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
>> 
>> s/d_unhashed/d_in_lookup/ in that.
>> 
>>> So we come racing here from multiple threads (say 3 or more - we have seen this
>>> in the older crash reports, so totally possible)
>>> 
>>>> +		d_drop(dentry);
>>> 
>>> One lucky one does this first before the others perform the !d_unhashed check above.
>>> This makes the other ones to not enter here.
>>> 
>>> And we are back to the original problem of multiple threads trying to instantiate
>>> same dentry as before.
>> 
>> Yep.  See above - it should've been using d_in_lookup() in the first place,
>> through the entire nfs_atomic_open().  Same in the Lustre part of fixes,
>> obviously.
> 
> See current #for-linus for hopefully fixed variants (both lustre and nfs)

So at first it looked like we just needed another ll_d_init() in the other arm
of that "if (d_in_lookup())" statement, but alas.

Also, the patch that changed the d_unhashed() check to d_in_lookup() now results in
a stale comment:
        /* Only hash *de if it is unhashed (new dentry).
         * Atomic_open may passing hashed dentries for open.
         */
        if (d_in_lookup(*de)) {


Since we no longer check for d_unhashed(), what would be a better choice of words here?
"Only hash *de if it is a new dentry coming from lookup"?

This also makes me question the whole thing some more. We are definitely in a lookup
when this hits, so the dentry is already new, yet it does not register as
d_in_lookup(). That also means that by skipping ll_splice_alias() we are failing
to hash it, and that causes needless lookups later?
Looking back through the commit history, d_in_lookup() is meant to tell us
that we are in the middle of a lookup. How can we be in the middle of the lookup
path then and not have this set on the dentry? We know the dentry was not
substituted with anything here because we did not call into ll_splice_alias().
So what's going on then?

Here's a backtrace:
[  146.045148]  [<ffffffffa017baa6>] lbug_with_loc+0x46/0xb0 [libcfs]
[  146.045158]  [<ffffffffa05baef3>] ll_lookup_it_finish+0x713/0xaa0 [lustre]
[  146.045160]  [<ffffffff810e6dcd>] ? trace_hardirqs_on+0xd/0x10
[  146.045167]  [<ffffffffa05bb51b>] ll_lookup_it+0x29b/0x710 [lustre]
[  146.045173]  [<ffffffffa05b8830>] ? md_set_lock_data.part.25+0x60/0x60 [lustre]
[  146.045179]  [<ffffffffa05bc6a4>] ll_lookup_nd+0x84/0x190 [lustre]
[  146.045180]  [<ffffffff81276e94>] __lookup_hash+0x64/0xa0
[  146.045181]  [<ffffffff810e1b88>] ? down_write_nested+0xa8/0xc0
[  146.045182]  [<ffffffff8127d55f>] do_unlinkat+0x1bf/0x2f0
[  146.045183]  [<ffffffff8127e12b>] SyS_unlinkat+0x1b/0x30
[  146.045185]  [<ffffffff8188b3bc>] entry_SYSCALL_64_fastpath+0x1f/0xbd

__lookup_hash() does d_alloc (not parallel) and falls through into the ->lookup()
of the filesystem. So the dots do not connect.

The more I look at it, the more I suspect it's wrong.
Everywhere else that patch touched was in *atomic_open(), with
a well-known impact. ll_lookup_it_finish(), on the other hand, is a generic lookup path,
not just for atomic opens.

I took out that part of your patch and the problems went away, it seems.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05 15:21             ` Oleg Drokin
@ 2016-07-05 17:42               ` Al Viro
  2016-07-05 18:12                 ` Oleg Drokin
  0 siblings, 1 reply; 38+ messages in thread
From: Al Viro @ 2016-07-05 17:42 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>

On Tue, Jul 05, 2016 at 11:21:32AM -0400, Oleg Drokin wrote:
> > ...
> > -       if (d_unhashed(*de)) {
> > +       if (d_in_lookup(*de)) {
> >                 struct dentry *alias;
> >  
> >                 alias = ll_splice_alias(inode, *de);
> 
> This breaks Lustre because we now might progress further in this function
> without calling into ll_splice_alias and that's the only place that we do
> ll_d_init() that later code depends on so we violently crash next time
> we call e.g. d_lustre_revalidate() further down that code.

Huh?  How the hell do those conditions differ there?

> Also I still wonder what's to stop d_alloc_parallel() from returning
> a hashed dentry with d_in_lookup() still true?

The fact that such dentries do not exist at any point?

> Certainly there's a big gap between hashing the dentry and dropping the PAR
> bit in there that I imagine might allow __d_lookup_rcu() to pick it up
> in between?--

WTF?  Where do you see that gap?  in-lookup dentries get hashed only in one
place - __d_add().  And there (besides holding ->d_lock around both) we
drop that bit in flags *before* _d_rehash().  AFAICS, the situation with
barriers is OK there, due to lockref_get_not_dead() serving as ACQUIRE
operation; I could be missing something subtle, but a wide gap...  Where?
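
For reference, the ordering being referred to is roughly this (paraphrased from
memory of the 4.7-era __d_add() in fs/dcache.c, heavily trimmed):

	spin_lock(&dentry->d_lock);
	if (unlikely(d_in_lookup(dentry)))
		__d_lookup_done(dentry);	/* clears DCACHE_PAR_LOOKUP ... */
	...
	_d_rehash(dentry);			/* ... before it becomes visible in the hash */
	spin_unlock(&dentry->d_lock);

so a dentry should never be observable as hashed while still flagged as in-lookup.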

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05 16:33             ` Oleg Drokin
@ 2016-07-05 18:08               ` Al Viro
  2016-07-05 19:12                 ` Oleg Drokin
  0 siblings, 1 reply; 38+ messages in thread
From: Al Viro @ 2016-07-05 18:08 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>

On Tue, Jul 05, 2016 at 12:33:09PM -0400, Oleg Drokin wrote:

> This also makes me question the whole thing some more. We are definitely in lookup
> when this hits, so the dentry is already new, yet it does not check off as
> d_in_lookup(). That also means that by skipping the ll_splice_alias we are failing
> to hash it and that causing needless lookups later?
> Looking some back into the history of commits, d_in_lookup() is to tell us
> that we are in the middle of lookup. How can we be in the middle of lookup
> path then and not have this set on a dentry? We know dentry was not
> substituted with anything here because we did not call into ll_split_alias().
> So what's going on then?

Lookup in directory locked exclusive, that's what...  In unlink(), in your
testcase.  And yes, this piece of 1/3 is incorrect; what I do not understand
is the logics of what you are doing with dcache in ll_splice_alias() and
in its caller ;-/

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05 17:42               ` Al Viro
@ 2016-07-05 18:12                 ` Oleg Drokin
  0 siblings, 0 replies; 38+ messages in thread
From: Oleg Drokin @ 2016-07-05 18:12 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 5, 2016, at 1:42 PM, Al Viro wrote:

> On Tue, Jul 05, 2016 at 11:21:32AM -0400, Oleg Drokin wrote:
>>> ...
>>> -       if (d_unhashed(*de)) {
>>> +       if (d_in_lookup(*de)) {
>>>                struct dentry *alias;
>>> 
>>>                alias = ll_splice_alias(inode, *de);
>> 
>> This breaks Lustre because we now might progress further in this function
>> without calling into ll_splice_alias and that's the only place that we do
>> ll_d_init() that later code depends on so we violently crash next time
>> we call e.g. d_lustre_revalidate() further down that code.
> 
> Huh?  How the hell do those conditions differ there?

As explained in my other email, because this is in a normal
lookup path, we can get here with a new dentry that was allocated in
__lookup_hash() via d_alloc() (not d_alloc_parallel()), so it is not marked with the PAR bit.
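
For reference, that path is roughly (heavily trimmed from the 4.7-era
__lookup_hash() in fs/namei.c):

	/* parent held exclusive, nothing usable found in the dcache: */
	dentry = d_alloc(base, name);		/* no DCACHE_PAR_LOOKUP set here */
	...
	old = dir->i_op->lookup(dir, dentry, flags);	/* dir == base->d_inode */

so d_in_lookup() is false for every dentry that reaches ->lookup() this way.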

>> Also I still wonder what's to stop d_alloc_parallel() from returning
>> a hashed dentry with d_in_lookup() still true?
> 
> The fact that such dentries do not exist at any point?
> 
>> Certainly there's a big gap between hashing the dentry and dropping the PAR
>> bit in there that I imagine might allow __d_lookup_rcu() to pick it up
>> in between?--
> 
> WTF?  Where do you see that gap?  in-lookup dentries get hashed only in one
> place - __d_add().  And there (besides holding ->d_lock around both) we
> drop that bit in flags *before* _d_rehash().  AFAICS, the situation with
> barriers is OK there, due to lockref_get_not_dead() serving as ACQUIRE
> operation; I could be missing something subtle, but a wide gap...  Where?

Oh! I see, I missed that __d_add() drops the PAR bit as well, not just the code
at the end of the caller that did d_alloc_parallel().
Then indeed there is no gap; sorry for the false alarm.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05 18:08               ` Al Viro
@ 2016-07-05 19:12                 ` Oleg Drokin
  2016-07-05 20:08                   ` Al Viro
  0 siblings, 1 reply; 38+ messages in thread
From: Oleg Drokin @ 2016-07-05 19:12 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 5, 2016, at 2:08 PM, Al Viro wrote:

> On Tue, Jul 05, 2016 at 12:33:09PM -0400, Oleg Drokin wrote:
> 
>> This also makes me question the whole thing some more. We are definitely in lookup
>> when this hits, so the dentry is already new, yet it does not check off as
>> d_in_lookup(). That also means that by skipping the ll_splice_alias we are failing
>> to hash it and that causing needless lookups later?
>> Looking some back into the history of commits, d_in_lookup() is to tell us
>> that we are in the middle of lookup. How can we be in the middle of lookup
>> path then and not have this set on a dentry? We know dentry was not
>> substituted with anything here because we did not call into ll_split_alias().
>> So what's going on then?
> 
> Lookup in directory locked exclusive, that's what...  In unlink(), in your
> testcase.  And yes, this piece of 1/3 is incorrect; what I do not understand
> is the logics of what you are doing with dcache in ll_splice_alias() and
> in its caller ;-/

When d_in_lookup was introduced, it was described as:
    New primitives: d_in_lookup() (a predicate checking if dentry is in
    the in-lookup state) and d_lookup_done() (tells the system that
    we are done with lookup and if it's still marked as in-lookup, it
    should cease to be such).

I don't see where it mentions anything about exclusive vs. parallel lookups,
which probably led to some confusion.

So with Lustre the dentry can be in three states, really:

1. hashed dentry that's all A-ok to reuse.
2. hashed dentry that's NOT valid (dlm lock went away) - this is distinguished in d_compare by looking at a bit in the fs_data
3. unhashed dentry (I guess it could be either valid or invalid lustre-wise).

So the logic in ll_lookup_it_finish() (part of regular lookup) is this:

If the dentry we have is not hashed, this is a new lookup, so we need to
call into ll_splice_alias() to see if there's a better dentry we should
reuse, one that was already rejected by the VFS before because we did not have the
necessary locks then, but we do have them now.
The comment at the top of ll_dcompare() explains why we don't just unhash the
dentry on lock loss - that apparently leads to a loop around real_lookup() for
really contended dentries.
This is also why we cannot use d_splice_alias here - such cases are possible
for regular files and directories.

Anyway, I guess an additional point of confusion here is why
ll_lookup_it_finish() needs to check whether the dentry is hashed at all, since it is in
a lookup, so we should be unhashed here.
I checked the commit history and this check was added along with atomic_open
support, so I imagine we can just move it up into ll_atomic_open() and then your
change starts to make sense, along with a few other things.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05 19:12                 ` Oleg Drokin
@ 2016-07-05 20:08                   ` Al Viro
  2016-07-05 20:21                     ` Oleg Drokin
  0 siblings, 1 reply; 38+ messages in thread
From: Al Viro @ 2016-07-05 20:08 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>

On Tue, Jul 05, 2016 at 03:12:44PM -0400, Oleg Drokin wrote:
> 
> When d_in_lookup was introduced, it was described as:
>     New primitives: d_in_lookup() (a predicate checking if dentry is in
>     the in-lookup state) and d_lookup_done() (tells the system that
>     we are done with lookup and if it's still marked as in-lookup, it
>     should cease to be such).
> 
> I don't see where it mentions anything about exclusive vs parallel lookup
> that probably led to some confusion.

In the same commit:

#define DCACHE_PAR_LOOKUP               0x10000000 /* being looked up (with parent locked shared) */

static inline int d_in_lookup(struct dentry *dentry)
{
       return dentry->d_flags & DCACHE_PAR_LOOKUP;
}

Sure, we could use d_alloc_parallel() for all lookups, but it wouldn't buy
us anything in terms of exclusion (parent locked exclusive => no other
lookup attempts on that parent/name pair anyway) and it would cost extra
searches both in the primary and in-lookup hashes, as well as insertions and
removals from the latter.

Hell knows - perhaps teaching d_alloc_parallel() that NULL wq => just
allocate and mark in-lookup, without touching either primary or in-lookup
hash (and scream bloody murder if the parent isn't locked exclusive) would
be a good idea.  A few places in fs/dcache.c would need to check for
->d_wait being non-NULL (__d_lookup_done(), __d_add() an __d_move());
We could use that for lookups when parent is locked exclusive; then
d_in_lookup() would be true for *all* dentries passed to ->lookup().  

I'll look into that, but it's obviously the next cycle fodder.
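
Purely as a sketch of that idea (nothing like this exists anywhere), the extra
case could be an early branch in d_alloc_parallel() along these lines:

	/* hypothetical: wq == NULL means "parent is locked exclusive" */
	if (!wq) {
		struct dentry *new = d_alloc(parent, name);

		if (unlikely(!new))
			return ERR_PTR(-ENOMEM);
		/* nobody else can be looking this name up, so skip both the
		 * primary and the in-lookup hash and just mark it in-lookup;
		 * ->d_wait staying NULL is what the checks mentioned above
		 * would key off */
		new->d_flags |= DCACHE_PAR_LOOKUP;
		return new;
	}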

> So with Lustre the dentry can be in three states, really:
> 
> 1. hashed dentry that's all A-ok to reuse.
> 2. hashed dentry that's NOT valid (dlm lock went away) - this is distinguished in d_compare by looking at a bit in the fs_data
> 3. unhashed dentry ( I guess could be both valid and invalid lustre-wise).
> 
> So the logic in ll_lookup_it_finish() (part of regular lookup) is this:
> 
> If the dentry we have is not hashed - this is a new lookup, so we need to
> call into ll_splice_alias() to see if there's a better dentry we need to
> reuse that was already rejected by VFS before since we did not have necessary locks,
> but we do have them now.
> The comment at the top of ll_dcompare() explains why we don't just unhash the
> dentry on lock-loss - that apparently leads to a loop around real_lookup for
> real-contended dentries.
> This is also why we cannot use d_splice_alias here - such cases are possible
> for regular files and directories.
>
> Anyway, I guess additional point of confusion here is then why does
> ll_lookup_it_finish() need to check for hashedness of the dentry since it's in
> lookup, so we should be unhashed here.
> I checked the commit history and this check was added along with atomic_open
> support, so I imagine we can just move it up into ll_atomic_open and then your
> change starts to make sense along with a few other things.

So basically this
        } else if (!it_disposition(it, DISP_LOOKUP_NEG)  &&
                   !it_disposition(it, DISP_OPEN_CREATE)) {
                /* With DISP_OPEN_CREATE dentry will be
                 * instantiated in ll_create_it.
                 */
                LASSERT(!d_inode(*de));
                d_instantiate(*de, inode);
        }
is something we should do only for negative hashed fed to it by
->atomic_open(), and that - only if we have no O_CREAT in flags?

Then, since 3/3 eliminates that case completely, we could just rip that
else-if, along with the check for d_unhashed itself, making the call of
ll_splice_alias() unconditional there.  Or am I misreading what you said?
Do you see any problems with what's in #for-linus now (head at 11f0083)?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05 20:08                   ` Al Viro
@ 2016-07-05 20:21                     ` Oleg Drokin
  2016-07-06  0:29                       ` Oleg Drokin
  0 siblings, 1 reply; 38+ messages in thread
From: Oleg Drokin @ 2016-07-05 20:21 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 5, 2016, at 4:08 PM, Al Viro wrote:

> On Tue, Jul 05, 2016 at 03:12:44PM -0400, Oleg Drokin wrote:
>> 
>> When d_in_lookup was introduced, it was described as:
>>    New primitives: d_in_lookup() (a predicate checking if dentry is in
>>    the in-lookup state) and d_lookup_done() (tells the system that
>>    we are done with lookup and if it's still marked as in-lookup, it
>>    should cease to be such).
>> 
>> I don't see where it mentions anything about exclusive vs parallel lookup
>> that probably led to some confusion.
> 
> In the same commit:
> 
> #define DCACHE_PAR_LOOKUP               0x10000000 /* being looked up (with parent locked shared) */

Well, I did not really check the commit, just the log message,
since there's no other documentation about it apparently ;)

>> So with Lustre the dentry can be in three states, really:
>> 
>> 1. hashed dentry that's all A-ok to reuse.
>> 2. hashed dentry that's NOT valid (dlm lock went away) - this is distinguished in d_compare by looking at a bit in the fs_data
>> 3. unhashed dentry ( I guess could be both valid and invalid lustre-wise).
>> 
>> So the logic in ll_lookup_it_finish() (part of regular lookup) is this:
>> 
>> If the dentry we have is not hashed - this is a new lookup, so we need to
>> call into ll_splice_alias() to see if there's a better dentry we need to
>> reuse that was already rejected by VFS before since we did not have necessary locks,
>> but we do have them now.
>> The comment at the top of ll_dcompare() explains why we don't just unhash the
>> dentry on lock-loss - that apparently leads to a loop around real_lookup for
>> real-contended dentries.
>> This is also why we cannot use d_splice_alias here - such cases are possible
>> for regular files and directories.
>> 
>> Anyway, I guess additional point of confusion here is then why does
>> ll_lookup_it_finish() need to check for hashedness of the dentry since it's in
>> lookup, so we should be unhashed here.
>> I checked the commit history and this check was added along with atomic_open
>> support, so I imagine we can just move it up into ll_atomic_open and then your
>> change starts to make sense along with a few other things.
> 
> So basically this
>        } else if (!it_disposition(it, DISP_LOOKUP_NEG)  &&
>                   !it_disposition(it, DISP_OPEN_CREATE)) {
>                /* With DISP_OPEN_CREATE dentry will be
>                 * instantiated in ll_create_it.
>                 */
>                LASSERT(!d_inode(*de));
>                d_instantiate(*de, inode);
>        }
> is something we should do only for negative hashed fed to it by
> ->atomic_open(), and that - only if we have no O_CREAT in flags?

Yes, and in fact we can totally avoid it I think.

> Then, since 3/3 eliminates that case completely, we could just rip that
> else-if, along with the check for d_unhashed itself, making the call of
> ll_splice_alias() unconditional there.  Or am I misreading what you said?
> Do you see any problems with what's in #for-linus now (head at 11f0083)?

Yes, we can make it unconditional.
I think we can simplify it even more: since, unlike NFS, our negative dentries
are a lot less speculative, we can just return -ENOENT if this is not a create
request and unhash otherwise. This still needs to go through the whole test suite
to make sure it does not break anything unexpected.

At least this is the patch I am playing with right now:
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -391,6 +391,7 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
 	struct inode *inode = NULL;
 	__u64 bits = 0;
 	int rc = 0;
+	struct dentry *alias;
 
 	/* NB 1 request reference will be taken away by ll_intent_lock()
 	 * when I return
@@ -415,26 +416,12 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
 		 */
 	}
 
-	/* Only hash *de if it is unhashed (new dentry).
-	 * Atoimc_open may passing hashed dentries for open.
-	 */
-	if (d_unhashed(*de)) {
-		struct dentry *alias;
-
-		alias = ll_splice_alias(inode, *de);
-		if (IS_ERR(alias)) {
-			rc = PTR_ERR(alias);
-			goto out;
-		}
-		*de = alias;
-	} else if (!it_disposition(it, DISP_LOOKUP_NEG)  &&
-		   !it_disposition(it, DISP_OPEN_CREATE)) {
-		/* With DISP_OPEN_CREATE dentry will be
-		 * instantiated in ll_create_it.
-		 */
-		LASSERT(!d_inode(*de));
-		d_instantiate(*de, inode);
+	alias = ll_splice_alias(inode, *de);
+	if (IS_ERR(alias)) {
+		rc = PTR_ERR(alias);
+		goto out;
 	}
+	*de = alias;
 
 	if (!it_disposition(it, DISP_LOOKUP_NEG)) {
 		/* we have lookup look - unhide dentry */
@@ -590,6 +577,24 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 	       dentry, PFID(ll_inode2fid(dir)), dir, file, open_flags, mode,
 	       *opened);
 
+	/* Only negative dentries enter here */
+	LASSERT(!d_inode(dentry));
+
+	if (!(open_flags & O_CREAT) && !d_in_lookup(dentry)) {
+		/* A valid negative dentry that just passed revalidation,
+		 * there's little point to try and open it server-side,
+		 * even though there's a minuscle chance it might succeed.
+		 * Either way it's a valid race to just return -ENOENT here.
+		 */
+		if (!(open_flags & O_CREAT))
+			return -ENOENT;
+
+		/* Otherwise we just unhash it to be rehashed afresh via
+		 * lookup if necessary
+		 */
+		d_drop(dentry);
+	}
+
 	it = kzalloc(sizeof(*it), GFP_NOFS);
 	if (!it)
 		return -ENOMEM;

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05 20:21                     ` Oleg Drokin
@ 2016-07-06  0:29                       ` Oleg Drokin
  2016-07-06  3:20                         ` Al Viro
  0 siblings, 1 reply; 38+ messages in thread
From: Oleg Drokin @ 2016-07-06  0:29 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 5, 2016, at 4:21 PM, Oleg Drokin wrote:
> 
>>> So with Lustre the dentry can be in three states, really:
>>> 
>>> 1. hashed dentry that's all A-ok to reuse.
>>> 2. hashed dentry that's NOT valid (dlm lock went away) - this is distinguished in d_compare by looking at a bit in the fs_data
>>> 3. unhashed dentry ( I guess could be both valid and invalid lustre-wise).
>>> 
>>> So the logic in ll_lookup_it_finish() (part of regular lookup) is this:
>>> 
>>> If the dentry we have is not hashed - this is a new lookup, so we need to
>>> call into ll_splice_alias() to see if there's a better dentry we need to
>>> reuse that was already rejected by VFS before since we did not have necessary locks,
>>> but we do have them now.
>>> The comment at the top of ll_dcompare() explains why we don't just unhash the
>>> dentry on lock-loss - that apparently leads to a loop around real_lookup for
>>> real-contended dentries.
>>> This is also why we cannot use d_splice_alias here - such cases are possible
>>> for regular files and directories.
>>> 
>>> Anyway, I guess additional point of confusion here is then why does
>>> ll_lookup_it_finish() need to check for hashedness of the dentry since it's in
>>> lookup, so we should be unhashed here.
>>> I checked the commit history and this check was added along with atomic_open
>>> support, so I imagine we can just move it up into ll_atomic_open and then your
>>> change starts to make sense along with a few other things.
>> 
>> So basically this
>>       } else if (!it_disposition(it, DISP_LOOKUP_NEG)  &&
>>                  !it_disposition(it, DISP_OPEN_CREATE)) {
>>               /* With DISP_OPEN_CREATE dentry will be
>>                * instantiated in ll_create_it.
>>                */
>>               LASSERT(!d_inode(*de));
>>               d_instantiate(*de, inode);
>>       }
>> is something we should do only for negative hashed fed to it by
>> ->atomic_open(), and that - only if we have no O_CREAT in flags?
> 
> Yes, and in fact we can totally avoid it I think.
> 
>> Then, since 3/3 eliminates that case completely, we could just rip that
>> else-if, along with the check for d_unhashed itself, making the call of
>> ll_splice_alias() unconditional there.  Or am I misreading what you said?
>> Do you see any problems with what's in #for-linus now (head at 11f0083)?
> 
> Yes, we can make it unconditional
> I think we can simplify it even more and since unlike NFS our negative dentries
> are a lot less speculative, we can just return ENOENT if this is not a create
> request and unhash otherwise. Still needs to go through the whole test suite
> to make sure it does not break anything unexpected.
> 
> At least this is the patch I am playing with right now:
> --- a/drivers/staging/lustre/lustre/llite/namei.c
> +++ b/drivers/staging/lustre/lustre/llite/namei.c
> @@ -391,6 +391,7 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
> 	struct inode *inode = NULL;
> 	__u64 bits = 0;
> 	int rc = 0;
> +	struct dentry *alias;
> 
> 	/* NB 1 request reference will be taken away by ll_intent_lock()
> 	 * when I return
> @@ -415,26 +416,12 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
> 		 */
> 	}
> 
> -	/* Only hash *de if it is unhashed (new dentry).
> -	 * Atoimc_open may passing hashed dentries for open.
> -	 */
> -	if (d_unhashed(*de)) {
> -		struct dentry *alias;
> -
> -		alias = ll_splice_alias(inode, *de);
> -		if (IS_ERR(alias)) {
> -			rc = PTR_ERR(alias);
> -			goto out;
> -		}
> -		*de = alias;
> -	} else if (!it_disposition(it, DISP_LOOKUP_NEG)  &&
> -		   !it_disposition(it, DISP_OPEN_CREATE)) {
> -		/* With DISP_OPEN_CREATE dentry will be
> -		 * instantiated in ll_create_it.
> -		 */
> -		LASSERT(!d_inode(*de));
> -		d_instantiate(*de, inode);
> +	alias = ll_splice_alias(inode, *de);
> +	if (IS_ERR(alias)) {
> +		rc = PTR_ERR(alias);
> +		goto out;
> 	}
> +	*de = alias;
> 
> 	if (!it_disposition(it, DISP_LOOKUP_NEG)) {
> 		/* we have lookup look - unhide dentry */
> @@ -590,6 +577,24 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
> 	       dentry, PFID(ll_inode2fid(dir)), dir, file, open_flags, mode,
> 	       *opened);
> 
> +	/* Only negative dentries enter here */
> +	LASSERT(!d_inode(dentry));
> +
> +	if (!(open_flags & O_CREAT) && !d_in_lookup(dentry)) {

Duh, this obviously was supposed to be
 if (!d_in_lookup(dentry)) {

But even in the form above nothing bad happened in the full testing
because we cannot find any aliases without an inode.

> +		/* A valid negative dentry that just passed revalidation,
> +		 * there's little point to try and open it server-side,
> +		 * even though there's a minuscle chance it might succeed.
> +		 * Either way it's a valid race to just return -ENOENT here.
> +		 */
> +		if (!(open_flags & O_CREAT))
> +			return -ENOENT;
> +
> +		/* Otherwise we just unhash it to be rehashed afresh via
> +		 * lookup if necessary
> +		 */
> +		d_drop(dentry);

So we can even drop this part and retain the top condition as it was.
d_add does not care if the dentry we are feeding it was hashed or not,
so do you see any downsides to doing that I wonder?

> +	}
> +
> 	it = kzalloc(sizeof(*it), GFP_NOFS);
> 	if (!it)
> 		return -ENOMEM;
> 
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-06  0:29                       ` Oleg Drokin
@ 2016-07-06  3:20                         ` Al Viro
  2016-07-06  3:25                             ` Oleg Drokin
  0 siblings, 1 reply; 38+ messages in thread
From: Al Viro @ 2016-07-06  3:20 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>

On Tue, Jul 05, 2016 at 08:29:37PM -0400, Oleg Drokin wrote:
> > +		/* Otherwise we just unhash it to be rehashed afresh via
> > +		 * lookup if necessary
> > +		 */
> > +		d_drop(dentry);
> 
> So we can even drop this part and retain the top condition as it was.
> d_add does not care if the dentry we are feeding it was hashed or not,
> so do you see any downsides to doing that I wonder?

d_add() on hashed dentry will end up reaching this:
static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
{
        BUG_ON(!d_unhashed(entry));

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-06  3:20                         ` Al Viro
@ 2016-07-06  3:25                             ` Oleg Drokin
  0 siblings, 0 replies; 38+ messages in thread
From: Oleg Drokin @ 2016-07-06  3:25 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 5, 2016, at 11:20 PM, Al Viro wrote:

> On Tue, Jul 05, 2016 at 08:29:37PM -0400, Oleg Drokin wrote:
>>> +		/* Otherwise we just unhash it to be rehashed afresh via
>>> +		 * lookup if necessary
>>> +		 */
>>> +		d_drop(dentry);
>> 
>> So we can even drop this part and retain the top condition as it was.
>> d_add does not care if the dentry we are feeding it was hashed or not,
>> so do you see any downsides to doing that I wonder?
> 
> d_add() on hashed dentry will end up reaching this:
> static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
> {
>        BUG_ON(!d_unhashed(entry));

Ah, ok. Yes, I remember it now from the older issue with NFS.

It's still puzzling why I have not hit it yet, but oh well.

I wonder if the handling of negative dentries broke… Time for more investigation.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-06  3:25                             ` Oleg Drokin
@ 2016-07-06  4:35                               ` Oleg Drokin
  -1 siblings, 0 replies; 38+ messages in thread
From: Oleg Drokin @ 2016-07-06  4:35 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 5, 2016, at 11:25 PM, Oleg Drokin wrote:

> 
> On Jul 5, 2016, at 11:20 PM, Al Viro wrote:
> 
>> On Tue, Jul 05, 2016 at 08:29:37PM -0400, Oleg Drokin wrote:
>>>> +		/* Otherwise we just unhash it to be rehashed afresh via
>>>> +		 * lookup if necessary
>>>> +		 */
>>>> +		d_drop(dentry);
>>> 
>>> So we can even drop this part and retain the top condition as it was.
>>> d_add does not care if the dentry we are feeding it was hashed or not,
>>> so do you see any downsides to doing that I wonder?
>> 
>> d_add() on hashed dentry will end up reaching this:
>> static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
>> {
>>       BUG_ON(!d_unhashed(entry));
> 
> Ah, ok. Yes, I remember about it now from the older issue with nfs.
> 
> It's still puzzling why I did not hit it yet, but oh well.
> 
> I wonder if handling of negative dentries broke… Time for more investigations.

Ah, the reason was staring me right in the face: it's the first thing we do in
ll_revalidate_dentry():

        if ((lookup_flags & (LOOKUP_OPEN | LOOKUP_CREATE)) ==
            (LOOKUP_OPEN | LOOKUP_CREATE))
                return 0;


Apparently we are trying to protect against the case where the dentry is valid when we
check it but gets pulled out from under us as soon as we are done (no Lustre locking is
held around it).
So if we don't do this, we fall into ll_file_open()/ll_intent_file_open() and form
a request to open the file based on the inode. If that inode is already gone
server-side, we might get ESTALE, since we don't really send the name to the
server in this case, so it does not know what it is we are after anyway.

On the userspace side of things the picture is not pretty then: we do an
open(O_CREAT) and get ESTALE back.

Looking at do_filp_open(), there's actually a retry:
        if (unlikely(filp == ERR_PTR(-ESTALE)))
                filp = path_openat(&nd, op, flags | LOOKUP_REVAL);

But it only retries once, so for a contended file some other thread can still
do a lookup and provide us with a new cached dentry that is similarly pulled out
from under us the next time around.
But I guess the whole idea of this is to let us know the file is contended, and
we can then just fail the revalidation.

Hm, looking at the server it's even worse: we'd get ENOENT from the server if
the desired inode no longer exists, so that's a perfect opportunity for the
client to turn that ENOENT into ESTALE if the create flag was set.
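
(Just to spell that idea out, the client-side mapping would be something along
these lines; "rc" and the place where O_CREAT is visible are assumptions about
whatever the open-by-fid reply handling actually has at hand, so this is no more
than a sketch of the suggestion above:

	if (rc == -ENOENT && (open_flags & O_CREAT))
		rc = -ESTALE;	/* let do_filp_open() retry with LOOKUP_REVAL */

)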

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05 13:51           ` Al Viro
  2016-07-05 15:21             ` Oleg Drokin
  2016-07-05 16:33             ` Oleg Drokin
@ 2016-07-06 16:24             ` Oleg Drokin
  2 siblings, 0 replies; 38+ messages in thread
From: Oleg Drokin @ 2016-07-06 16:24 UTC (permalink / raw)
  To: Al Viro; +Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 5, 2016, at 9:51 AM, Al Viro wrote:

> On Tue, Jul 05, 2016 at 01:31:10PM +0100, Al Viro wrote:
>> On Tue, Jul 05, 2016 at 02:22:48AM -0400, Oleg Drokin wrote:
>> 
>>>> +	if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
>> 
>> s/d_unhashed/d_in_lookup/ in that.
>> 
>>> So we come racing here from multiple threads (say 3 or more - we have seen this
>>> in the older crash reports, so totally possible)
>>> 
>>>> +		d_drop(dentry);
>>> 
>>> One lucky one does this first before the others perform the !d_unhashed check above.
>>> This makes the other ones to not enter here.
>>> 
>>> And we are back to the original problem of multiple threads trying to instantiate
>>> same dentry as before.
>> 
>> Yep.  See above - it should've been using d_in_lookup() in the first place,
>> through the entire nfs_atomic_open().  Same in the Lustre part of fixes,
>> obviously.
> 
> See current #for-linus for hopefully fixed variants (both lustre and nfs)

So, returning to NFS: the patches held up for 24 hours under various loads
with no crashes or other such ill effects,
so I think the NFS patches are good to go.

As for the Lustre patches, I only tried my own, and those seem to be doing well too.
Do you want me to send you my Lustre patches just for this issue, or should I send
everything to Greg as usual?

Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-05  2:25               ` Al Viro
@ 2016-07-10 17:01                 ` Oleg Drokin
  2016-07-10 18:14                   ` James Simmons
  0 siblings, 1 reply; 38+ messages in thread
From: Oleg Drokin @ 2016-07-10 17:01 UTC (permalink / raw)
  To: Al Viro, James Simmons
  Cc: Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 4, 2016, at 10:25 PM, Al Viro wrote:

> BTW, could you take a look at
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git#sendmsg.lustre?
> It's a bunch of simplifications that became possible once sendmsg()/recvmsg()
> switched to iov_iter, stopped mangling the iovecs and went for predictable
> behaviour re advancing the iterator.

Thanks, this looks good to me and passes my testing (on tcp).

+typedef struct bio_vec lnet_kiov_t;

For this one, I guess we'll eventually need to just get rid of all lnet_kiov_t usage,
but that's something we can do ourselves.

Anyway, your patchset is based on an old tree and no longer applies cleanly,
so I rebased it onto the current staging tree to save you time in case
you want to go forward with it.
It's at git@github.com:verygreen/linux.git branch lustre-next-sendmsg

James, can you please give it a try on IB?

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-10 17:01                 ` Oleg Drokin
@ 2016-07-10 18:14                   ` James Simmons
  2016-07-11  1:01                     ` Al Viro
  0 siblings, 1 reply; 38+ messages in thread
From: James Simmons @ 2016-07-10 18:14 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Al Viro, Mailing List, <linux-fsdevel@vger.kernel.org>


> On Jul 4, 2016, at 10:25 PM, Al Viro wrote:
> 
> > BTW, could you take a look at
> > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git#sendmsg.lustre?
> > It's a bunch of simplifications that became possible once sendmsg()/recvmsg()
> > switched to iov_iter, stopped mangling the iovecs and went for predictable
> > behaviour re advancing the iterator.
> 
> Thanks, this looks good to me and passes my testing (on tcp).
> 
> +typedef struct bio_vec lnet_kiov_t;
> 
> This I guess we'll need to just get rid of all lnet_kiov_t usage, but that's
> something we can do ourselves, I guess.
> 
> Anyway, your patchset is based on old tree that no longer applies cleanly,
> I rebased it to current staging tree to save you time in case
> you want to go forward with it.
> It's at git@github.com:verygreen/linux.git branch lustre-next-sendmsg
> 
> James, can you please give it a try on IB?

It's broken for the ko2iblnd driver.

[  110.840583] LNet: Using FMR for registration
[  110.991747] LNet: Added LNI 10.37.248.137@o2ib1 [63/2560/0/180]
[  110.998211] ------------[ cut here ]------------
[  111.003012] kernel BUG at lib/iov_iter.c:513!
[  111.007545] invalid opcode: 0000 [#1] SMP
[  111.011731] Modules linked in: ko2iblnd(C) ptlrpc(C+) obdclass(C) ksocklnd(C) lnet(C) sha512_generic sha256_generic md5 crc32_generic crc32_pclmul libcfs(C) autofs4 ipmi_devintf auth_rpcgss nfsv4 dns_resolver 8021q iptable_filter ip_tables x_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm configfs ib_cm iw_cm mlx4_ib ib_core dm_mirror dm_region_hash dm_log dm_multipath sg sd_mod joydev pcspkr dm_mod mpt3sas raid_class acpi_cpufreq ipmi_ssif ipmi_si ipmi_msghandler isci libsas scsi_transport_sas wmi tpm_tis tpm i2c_i801 ahci libahci libata scsi_mod ehci_pci ehci_hcd button tcp_cubic nfsv3(E) nfs_acl(E) ipv6(E) nfs(E) lockd(E) sunrpc(E) grace(E) mlx4_en(E) mlx4_core(E) igb(E) i2c_algo_bit(E) i2c_core(E) ptp(E) pps_core(E) hwmon(E)
[  111.086669] CPU: 6 PID: 11899 Comm: router_checker Tainted: G         C E   4.7.0-rc6+ #1
[  111.095248] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.0a 02/19/2014
[  111.102040] task: ffff880826d32d80 ti: ffff880811d24000 task.ti: ffff880811d24000
[  111.109818] RIP: 0010:[<ffffffff8128daf2>]  [<ffffffff8128daf2>] iov_iter_kvec+0x22/0x30
[  111.118302] RSP: 0018:ffff880811d27b28  EFLAGS: 00010246
[  111.123806] RAX: 0000000000000000 RBX: ffff88105e037c00 RCX: 0000000000000000
[  111.131111] RDX: 0000000000000000 RSI: 0000000000000005 RDI: ffff880811d27b78
[  111.138426] RBP: ffff880811d27b28 R08: 0000000000000000 R09: 0000000000000000
[  111.145751] R10: 0000000000000000 R11: 00000000fffd19f7 R12: 0000000000000000
[  111.153083] R13: 0000000000000000 R14: 000500010a25ca3b R15: ffff880811d27b78
[  111.160407] FS:  0000000000000000(0000) GS:ffff88107fd00000(0000) knlGS:0000000000000000
[  111.168797] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  111.174717] CR2: 00007fd075c48945 CR3: 000000105a9ec000 CR4: 00000000000406e0
[  111.182031] Stack:
[  111.184224]  ffff880811d27be8 ffffffffa04012bd 0000000000000000 ffff880811d27b40
[  111.192206]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  111.192206]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  111.200196]  0000880800000002 ffff88105e394000 0000000000000000 0000000000000000
[  111.208179] Call Trace:
[  111.210818]  [<ffffffffa04012bd>] kiblnd_send+0x51d/0x9e0 [ko2iblnd]
[  111.217370]  [<ffffffffa06ec6bd>] lnet_ni_send+0x3d/0xe0 [lnet]
[  111.223487]  [<ffffffffa06ee223>] lnet_send+0x6b3/0xc80 [lnet]
[  111.229501]  [<ffffffffa06eeb58>] LNetGet+0x368/0x650 [lnet]
[  111.235346]  [<ffffffffa0692a50>] ? cfs_percpt_lock+0x50/0x110 [libcfs]
[  111.242139]  [<ffffffffa06f4d8f>] lnet_ping_router_locked+0x20f/0x840 [lnet]
[  111.249384]  [<ffffffffa06f5909>] lnet_router_checker+0xd9/0x490 [lnet]
[  111.256192]  [<ffffffff8108347d>] ? default_wake_function+0xd/0x10
[  111.262549]  [<ffffffff810923f1>] ? __wake_up_common+0x51/0x80
[  111.268562]  [<ffffffffa06f5830>] ? lnet_prune_rc_data+0x470/0x470 [lnet]
[  111.275544]  [<ffffffff81508f9b>] ? schedule+0x3b/0xa0
[  111.280871]  [<ffffffffa06f5830>] ? lnet_prune_rc_data+0x470/0x470 [lnet]
[  111.287849]  [<ffffffff810779d7>] kthread+0xc7/0xe0
[  111.292904]  [<ffffffff8150c3cf>] ret_from_fork+0x1f/0x40
[  111.298475]  [<ffffffff81077910>] ? kthread_freezable_should_stop+0x70/0x70
[  111.305631] Code: 2e 0f 1f 84 00 00 00 00 00 55 40 f6 c6 02 48 89 e5 74 18 89 37 48 89 57 18 48 89 4f 20 48 c7 47 08 00 00 00 00 4c 89 47 10 c9 c3 <0f> 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 48 8b 47 10 55 48 89
[  111.329528] RIP  [<ffffffff8128daf2>] iov_iter_kvec+0x22/0x30
[  111.335533]  RSP <ffff880811d27b28>
[  111.339360] ---[ end trace 1ea9288f558e2c8d ]---

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-10 18:14                   ` James Simmons
@ 2016-07-11  1:01                     ` Al Viro
  2016-07-11  1:03                       ` Al Viro
  2016-07-11 17:15                       ` More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes James Simmons
  0 siblings, 2 replies; 38+ messages in thread
From: Al Viro @ 2016-07-11  1:01 UTC (permalink / raw)
  To: James Simmons
  Cc: Oleg Drokin, Mailing List, <linux-fsdevel@vger.kernel.org>

On Sun, Jul 10, 2016 at 07:14:18PM +0100, James Simmons wrote:

> [  111.210818]  [<ffffffffa04012bd>] kiblnd_send+0x51d/0x9e0 [ko2iblnd]

Mea culpa - in kiblnd_send() this
        if (payload_kiov)
                iov_iter_bvec(&from, ITER_BVEC | WRITE,
                                payload_kiov, payload_niov, payload_nob);
        else
                iov_iter_kvec(&from, ITER_BVEC | WRITE,
                                payload_iov, payload_niov, payload_nob);
should have s/BVEC/KVEC/ in the iov_iter_kvec() arguments.  Cut'n'paste
braindamage...

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-11  1:01                     ` Al Viro
@ 2016-07-11  1:03                       ` Al Viro
  2016-07-11 22:54                         ` lustre sendmsg stuff Oleg Drokin
  2016-07-11 17:15                       ` More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes James Simmons
  1 sibling, 1 reply; 38+ messages in thread
From: Al Viro @ 2016-07-11  1:03 UTC (permalink / raw)
  To: James Simmons
  Cc: Oleg Drokin, Mailing List, <linux-fsdevel@vger.kernel.org>

On Mon, Jul 11, 2016 at 02:01:13AM +0100, Al Viro wrote:
> On Sun, Jul 10, 2016 at 07:14:18PM +0100, James Simmons wrote:
> 
> > [  111.210818]  [<ffffffffa04012bd>] kiblnd_send+0x51d/0x9e0 [ko2iblnd]
> 
> Mea culpa - in kiblnd_send() this
>         if (payload_kiov)
>                 iov_iter_bvec(&from, ITER_BVEC | WRITE,
>                                 payload_kiov, payload_niov, payload_nob);
>         else
>                 iov_iter_kvec(&from, ITER_BVEC | WRITE,
>                                 payload_iov, payload_niov, payload_nob);
> should have s/BVEC/KVEC/ in the iov_iter_kvec() arguments.  Cut'n'paste
> braindamage...

PS: That was introduced in the last commit in that pile - "lustre: introduce
lnet_copy_{k,}iov2iter(), kill lnet_copy_{k,}iov2{k,}iov()".

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
  2016-07-11  1:01                     ` Al Viro
  2016-07-11  1:03                       ` Al Viro
@ 2016-07-11 17:15                       ` James Simmons
  1 sibling, 0 replies; 38+ messages in thread
From: James Simmons @ 2016-07-11 17:15 UTC (permalink / raw)
  To: Al Viro; +Cc: Oleg Drokin, Mailing List, <linux-fsdevel@vger.kernel.org>


> On Sun, Jul 10, 2016 at 07:14:18PM +0100, James Simmons wrote:
> 
> > [  111.210818]  [<ffffffffa04012bd>] kiblnd_send+0x51d/0x9e0 [ko2iblnd]
> 
> Mea culpa - in kiblnd_send() this
>         if (payload_kiov)
>                 iov_iter_bvec(&from, ITER_BVEC | WRITE,
>                                 payload_kiov, payload_niov, payload_nob);
>         else
>                 iov_iter_kvec(&from, ITER_BVEC | WRITE,
>                                 payload_iov, payload_niov, payload_nob);
> should have s/BVEC/KVEC/ in the iov_iter_kvec() arguments.  Cut'n'paste
> braindamage...

That is the fix. Also I believe payload_nob should be payload_nob + payload_offset
instead. I will send a patch against Oleg's tree that addresses these issues; a
sketch of the combined change is below.
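
For illustration only, what the combined fix might look like - the KVEC flag plus
the payload_offset folded into the byte count, then skipped with iov_iter_advance().
The advance call is an assumption about how the offset would be consumed, not
something stated in the messages above:

        if (payload_kiov)
                iov_iter_bvec(&from, ITER_BVEC | WRITE,
                                payload_kiov, payload_niov,
                                payload_nob + payload_offset);
        else
                iov_iter_kvec(&from, ITER_KVEC | WRITE,
                                payload_iov, payload_niov,
                                payload_nob + payload_offset);
        /* skip the part of the payload before payload_offset */
        iov_iter_advance(&from, payload_offset);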

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: lustre sendmsg stuff
  2016-07-11  1:03                       ` Al Viro
@ 2016-07-11 22:54                         ` Oleg Drokin
  0 siblings, 0 replies; 38+ messages in thread
From: Oleg Drokin @ 2016-07-11 22:54 UTC (permalink / raw)
  To: Al Viro
  Cc: James Simmons, Mailing List, <linux-fsdevel@vger.kernel.org>


On Jul 10, 2016, at 9:03 PM, Al Viro wrote:

> On Mon, Jul 11, 2016 at 02:01:13AM +0100, Al Viro wrote:
>> On Sun, Jul 10, 2016 at 07:14:18PM +0100, James Simmons wrote:
>> 
>>> [  111.210818]  [<ffffffffa04012bd>] kiblnd_send+0x51d/0x9e0 [ko2iblnd]
>> 
>> Mea culpa - in kiblnd_send() this
>>        if (payload_kiov)
>>                iov_iter_bvec(&from, ITER_BVEC | WRITE,
>>                                payload_kiov, payload_niov, payload_nob);
>>        else
>>                iov_iter_kvec(&from, ITER_BVEC | WRITE,
>>                                payload_iov, payload_niov, payload_nob);
>> should have s/BVEC/KVEC/ in the iov_iter_kvec() arguments.  Cut'n'paste
>> braindamage...
> 
> PS: That was introduced in the last commit in that pile - "lustre: introduce
> lnet_copy_{k,}iov2iter(), kill lnet_copy_{k,}iov2{k,}iov()".

Is this something you plan to submit to Linus or should I just submit this to
Greg along with other changes?

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2016-07-11 22:55 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-17  4:09 More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes Oleg Drokin
2016-06-17  4:29 ` Al Viro
2016-06-25 16:38   ` Oleg Drokin
2016-07-03  6:29     ` Al Viro
2016-07-04  0:08       ` Al Viro
2016-07-04  0:37         ` Oleg Drokin
2016-07-04  0:37           ` Oleg Drokin
2016-07-04  3:08           ` Al Viro
2016-07-04  3:55             ` Oleg Drokin
2016-07-04  3:55               ` Oleg Drokin
2016-07-05  2:25               ` Al Viro
2016-07-10 17:01                 ` Oleg Drokin
2016-07-10 18:14                   ` James Simmons
2016-07-11  1:01                     ` Al Viro
2016-07-11  1:03                       ` Al Viro
2016-07-11 22:54                         ` lustre sendmsg stuff Oleg Drokin
2016-07-11 17:15                       ` More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes James Simmons
2016-07-05  2:28       ` Oleg Drokin
2016-07-05  2:32         ` Oleg Drokin
2016-07-05  4:43         ` Oleg Drokin
2016-07-05  6:22       ` Oleg Drokin
2016-07-05 12:31         ` Al Viro
2016-07-05 13:51           ` Al Viro
2016-07-05 15:21             ` Oleg Drokin
2016-07-05 17:42               ` Al Viro
2016-07-05 18:12                 ` Oleg Drokin
2016-07-05 16:33             ` Oleg Drokin
2016-07-05 18:08               ` Al Viro
2016-07-05 19:12                 ` Oleg Drokin
2016-07-05 20:08                   ` Al Viro
2016-07-05 20:21                     ` Oleg Drokin
2016-07-06  0:29                       ` Oleg Drokin
2016-07-06  3:20                         ` Al Viro
2016-07-06  3:25                           ` Oleg Drokin
2016-07-06  3:25                             ` Oleg Drokin
2016-07-06  4:35                             ` Oleg Drokin
2016-07-06  4:35                               ` Oleg Drokin
2016-07-06 16:24             ` Oleg Drokin
