From: Oleg Drokin <green@linuxhacker.ru>
To: Al Viro <viro@ZenIV.linux.org.uk>,
Trond Myklebust <trond.myklebust@primarydata.com>
Cc: linux-nfs@vger.kernel.org,
Mailing List <linux-kernel@vger.kernel.org>,
"<linux-fsdevel@vger.kernel.org>" <linux-fsdevel@vger.kernel.org>,
idryomov@gmail.com, sage@redhat.com, zyan@redhat.com
Subject: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
Date: Fri, 17 Jun 2016 00:09:19 -0400 [thread overview]
Message-ID: <F861E78B-00C3-4F17-A10D-C9FCB81D43B9@linuxhacker.ru> (raw)
Hello!
So I think I finally kind of understand this other d_splice_alias problem
I've been having.
Imagine that we have a negative dentry in the cache.
Now we have two threads that are trying to open this file in parallel,
the fs is NFS, so atomic_open is defined, it's not a create, so
parent is locked in shared mode and they both enter first lookup_open
and then (because the dentry is negative) into nfs_atomic_open().
Now in nfs_atomic_open we have two threads that run alongside each other
trying to open this file and both fail (the code is with Al's earlier patch
for the other d_splice_alias problem).
inode = NFS_PROTO(dir)->open_context(dir, ctx, open_flags, &attr, opened);
if (IS_ERR(inode)) {
err = PTR_ERR(inode);
trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
put_nfs_open_context(ctx);
d_drop(dentry);
. . .
case -ENOTDIR:
goto no_open;
. . .
no_open:
res = nfs_lookup(dir, dentry, lookup_flags);
So they both do d_drop(), the dentry is now unhashed, and they both
dive into nfs_lookup().
There eventually they both call
res = d_splice_alias(inode, dentry);
And so the first lucky one continues on it's merry way with a hashed dentry,
but the other less lucky one ends up calling into d_splice_alias() with
dentry that's already hashed and hits the very familiar assertion.
I took a brief look into ceph and it looks like a very similar thing
might happen there with handle_reply() for two parallel replies calling into
ceph_fill_trace() and then splice_alias()->d_splice_alias(), since the
unhashed check it does is not under any locks, it's unsafe, so the problem
might be more generic than just NFS too.
So I wonder how to best fix this? Holding some sort of dentry lock across a call
into atomic_open in VFS? We cannot just make d_splice_alias() callers call with
inode->i_lock held because dentry might be negative.
I know this is really happening because I in fact have 3 threads doing the above
described race, one luckily passing through and two hitting the assertion:
PID: 2689619 TASK: ffff8800ca740540 CPU: 6 COMMAND: "ls"
#0 [ffff8800c6da3758] machine_kexec at ffffffff81042462
#1 [ffff8800c6da37b8] __crash_kexec at ffffffff811365ff
#2 [ffff8800c6da3880] __crash_kexec at ffffffff811366d5
#3 [ffff8800c6da3898] crash_kexec at ffffffff8113671b
#4 [ffff8800c6da38b8] oops_end at ffffffff81018cb4
#5 [ffff8800c6da38e0] die at ffffffff8101917b
#6 [ffff8800c6da3910] do_trap at ffffffff81015b92
#7 [ffff8800c6da3960] do_error_trap at ffffffff81015d8b
#8 [ffff8800c6da3a20] do_invalid_op at ffffffff810166e0
#9 [ffff8800c6da3a30] invalid_op at ffffffff8188c91e
[exception RIP: d_splice_alias+478]
RIP: ffffffff8128784e RSP: ffff8800c6da3ae8 RFLAGS: 00010286
RAX: ffff8800d8661ab8 RBX: ffff8800d3b6b178 RCX: ffff8800ca740540
RDX: ffffffff81382f09 RSI: ffff8800d3b6b178 RDI: ffff8800d8661ab8
RBP: ffff8800c6da3b20 R8: 0000000000000000 R9: 0000000000000000
R10: ffff8800ca740540 R11: 0000000000000000 R12: ffff8800d6491600
R13: 0000000000000102 R14: ffff8800d39063d0 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff8800c6da3b28] nfs_lookup at ffffffff8137cd1c
#11 [ffff8800c6da3b80] nfs_atomic_open at ffffffff8137ed61
#12 [ffff8800c6da3c30] lookup_open at ffffffff8127901a
#13 [ffff8800c6da3d50] path_openat at ffffffff8127bf15
#14 [ffff8800c6da3dd8] do_filp_open at ffffffff8127d9b1
#15 [ffff8800c6da3ee0] do_sys_open at ffffffff81269f90
#16 [ffff8800c6da3f40] sys_open at ffffffff8126a09e
#17 [ffff8800c6da3f50] entry_SYSCALL_64_fastpath at ffffffff8188aebc
RIP: 00007f92a76d4d20 RSP: 00007fff640c1160 RFLAGS: 00000206
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f92a76d4d20
RDX: 000055c9f5be9f50 RSI: 0000000000090800 RDI: 000055c9f5bea1d0
RBP: 0000000000000000 R8: 000055c9f5bea340 R9: 0000000000000000
R10: 000055c9f5bea300 R11: 0000000000000206 R12: 00007f92a800d6b0
R13: 000055c9f5bf3000 R14: 000055c9f5bea3e0 R15: 0000000000000000
ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b
PID: 2689597 TASK: ffff880109b40bc0 CPU: 7 COMMAND: "ls"
#0 [ffff8800c07178e0] die at ffffffff8101914e
#1 [ffff8800c0717910] do_trap at ffffffff81015b92
#2 [ffff8800c0717960] do_error_trap at ffffffff81015d8b
#3 [ffff8800c0717a20] do_invalid_op at ffffffff810166e0
#4 [ffff8800c0717a30] invalid_op at ffffffff8188c91e
[exception RIP: d_splice_alias+478]
RIP: ffffffff8128784e RSP: ffff8800c0717ae8 RFLAGS: 00010286
RAX: ffff8800d8661ab8 RBX: ffff8800d3b6b178 RCX: ffff880109b40bc0
RDX: ffffffff81382f09 RSI: ffff8800d3b6b178 RDI: ffff8800d8661ab8
RBP: ffff8800c0717b20 R8: 0000000000000000 R9: 0000000000000000
R10: ffff880109b40bc0 R11: 0000000000000000 R12: ffff8800d6a29e40
R13: 0000000000000102 R14: ffff8800d39063d0 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#5 [ffff8800c0717b28] nfs_lookup at ffffffff8137cd1c
#6 [ffff8800c0717b80] nfs_atomic_open at ffffffff8137ed61
#7 [ffff8800c0717c30] lookup_open at ffffffff8127901a
#8 [ffff8800c0717d50] path_openat at ffffffff8127bf15
#9 [ffff8800c0717dd8] do_filp_open at ffffffff8127d9b1
#10 [ffff8800c0717ee0] do_sys_open at ffffffff81269f90
#11 [ffff8800c0717f40] sys_open at ffffffff8126a09e
#12 [ffff8800c0717f50] entry_SYSCALL_64_fastpath at ffffffff8188aebc
RIP: 00007f5cef6f8d20 RSP: 00007ffcc98f87b0 RFLAGS: 00000202
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f5cef6f8d20
RDX: 000055de64230f50 RSI: 0000000000090800 RDI: 000055de642311d0
RBP: 0000000000000000 R8: 000055de64231320 R9: 0000000000000000
R10: 000055de64231300 R11: 0000000000000202 R12: 00007f5cf00316b0
R13: 000055de6423a000 R14: 000055de642313c0 R15: 0000000000000000
ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b
The dentry is 0xffff8800d3b6b178 in both cases.
next reply other threads:[~2016-06-17 4:09 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-17 4:09 Oleg Drokin [this message]
2016-06-17 4:29 ` More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes Al Viro
2016-06-25 16:38 ` Oleg Drokin
2016-07-03 6:29 ` Al Viro
2016-07-04 0:08 ` Al Viro
2016-07-04 0:37 ` Oleg Drokin
2016-07-04 0:37 ` Oleg Drokin
2016-07-04 3:08 ` Al Viro
2016-07-04 3:55 ` Oleg Drokin
2016-07-04 3:55 ` Oleg Drokin
2016-07-05 2:25 ` Al Viro
2016-07-10 17:01 ` Oleg Drokin
2016-07-10 18:14 ` James Simmons
2016-07-11 1:01 ` Al Viro
2016-07-11 1:03 ` Al Viro
2016-07-11 22:54 ` lustre sendmsg stuff Oleg Drokin
2016-07-11 17:15 ` More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes James Simmons
2016-07-05 2:28 ` Oleg Drokin
2016-07-05 2:32 ` Oleg Drokin
2016-07-05 4:43 ` Oleg Drokin
2016-07-05 6:22 ` Oleg Drokin
2016-07-05 12:31 ` Al Viro
2016-07-05 13:51 ` Al Viro
2016-07-05 15:21 ` Oleg Drokin
2016-07-05 17:42 ` Al Viro
2016-07-05 18:12 ` Oleg Drokin
2016-07-05 16:33 ` Oleg Drokin
2016-07-05 18:08 ` Al Viro
2016-07-05 19:12 ` Oleg Drokin
2016-07-05 20:08 ` Al Viro
2016-07-05 20:21 ` Oleg Drokin
2016-07-06 0:29 ` Oleg Drokin
2016-07-06 3:20 ` Al Viro
2016-07-06 3:25 ` Oleg Drokin
2016-07-06 3:25 ` Oleg Drokin
2016-07-06 4:35 ` Oleg Drokin
2016-07-06 4:35 ` Oleg Drokin
2016-07-06 16:24 ` Oleg Drokin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=F861E78B-00C3-4F17-A10D-C9FCB81D43B9@linuxhacker.ru \
--to=green@linuxhacker.ru \
--cc=idryomov@gmail.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=sage@redhat.com \
--cc=trond.myklebust@primarydata.com \
--cc=viro@ZenIV.linux.org.uk \
--cc=zyan@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.