From: Oleg Drokin <green@linuxhacker.ru>
To: Al Viro <viro@ZenIV.linux.org.uk>,
	Trond Myklebust <trond.myklebust@primarydata.com>
Cc: linux-nfs@vger.kernel.org,
	Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org,
	idryomov@gmail.com, sage@redhat.com, zyan@redhat.com
Subject: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
Date: Fri, 17 Jun 2016 00:09:19 -0400
Message-ID: <F861E78B-00C3-4F17-A10D-C9FCB81D43B9@linuxhacker.ru>

Hello!

   So I think I finally kind of understand this other d_splice_alias problem
   I've been having.

   Imagine that we have a negative dentry in the cache.

   Now we have two threads trying to open this file in parallel.  The fs is
   NFS, so atomic_open is defined; it's not a create, so the parent is locked
   in shared mode, and both threads first enter lookup_open() and then
   (because the dentry is negative) nfs_atomic_open().

   Now in nfs_atomic_open() the two threads run alongside each other trying
   to open this file, and both fail (the code below is with Al's earlier
   patch for the other d_splice_alias problem applied):

        inode = NFS_PROTO(dir)->open_context(dir, ctx, open_flags, &attr, opened);
        if (IS_ERR(inode)) {
                err = PTR_ERR(inode);
                trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
                put_nfs_open_context(ctx);
                d_drop(dentry);
. . .
                case -ENOTDIR:
                        goto no_open;
. . .
no_open:
        res = nfs_lookup(dir, dentry, lookup_flags);


    So they both do d_drop(), the dentry is now unhashed, and they both
    dive into nfs_lookup().
    There they both eventually call

	res = d_splice_alias(inode, dentry);

   And so the first lucky one continues on its merry way with a hashed dentry,
   but the other, less lucky one ends up calling into d_splice_alias() with a
   dentry that's already hashed and hits the very familiar assertion.

   I took a brief look into ceph, and it looks like a very similar thing
   might happen there with handle_reply() for two parallel replies calling
   into ceph_fill_trace() and then splice_alias()->d_splice_alias(): the
   unhashed check it does is not under any lock, so it's racy, and the
   problem might be more generic than just NFS.

   So I wonder how best to fix this?  Holding some sort of dentry lock in the
   VFS across the call into atomic_open?  We cannot just make d_splice_alias()
   callers hold inode->i_lock, because the dentry might be negative.

   I know this is really happening because I in fact have three threads doing
   the above-described race, one luckily passing through and two hitting the
   assertion:
PID: 2689619  TASK: ffff8800ca740540  CPU: 6   COMMAND: "ls"
 #0 [ffff8800c6da3758] machine_kexec at ffffffff81042462
 #1 [ffff8800c6da37b8] __crash_kexec at ffffffff811365ff
 #2 [ffff8800c6da3880] __crash_kexec at ffffffff811366d5
 #3 [ffff8800c6da3898] crash_kexec at ffffffff8113671b
 #4 [ffff8800c6da38b8] oops_end at ffffffff81018cb4
 #5 [ffff8800c6da38e0] die at ffffffff8101917b
 #6 [ffff8800c6da3910] do_trap at ffffffff81015b92
 #7 [ffff8800c6da3960] do_error_trap at ffffffff81015d8b
 #8 [ffff8800c6da3a20] do_invalid_op at ffffffff810166e0
 #9 [ffff8800c6da3a30] invalid_op at ffffffff8188c91e
    [exception RIP: d_splice_alias+478]
    RIP: ffffffff8128784e  RSP: ffff8800c6da3ae8  RFLAGS: 00010286
    RAX: ffff8800d8661ab8  RBX: ffff8800d3b6b178  RCX: ffff8800ca740540
    RDX: ffffffff81382f09  RSI: ffff8800d3b6b178  RDI: ffff8800d8661ab8
    RBP: ffff8800c6da3b20   R8: 0000000000000000   R9: 0000000000000000
    R10: ffff8800ca740540  R11: 0000000000000000  R12: ffff8800d6491600
    R13: 0000000000000102  R14: ffff8800d39063d0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff8800c6da3b28] nfs_lookup at ffffffff8137cd1c
#11 [ffff8800c6da3b80] nfs_atomic_open at ffffffff8137ed61
#12 [ffff8800c6da3c30] lookup_open at ffffffff8127901a
#13 [ffff8800c6da3d50] path_openat at ffffffff8127bf15
#14 [ffff8800c6da3dd8] do_filp_open at ffffffff8127d9b1
#15 [ffff8800c6da3ee0] do_sys_open at ffffffff81269f90
#16 [ffff8800c6da3f40] sys_open at ffffffff8126a09e
#17 [ffff8800c6da3f50] entry_SYSCALL_64_fastpath at ffffffff8188aebc
    RIP: 00007f92a76d4d20  RSP: 00007fff640c1160  RFLAGS: 00000206
    RAX: ffffffffffffffda  RBX: 0000000000000003  RCX: 00007f92a76d4d20
    RDX: 000055c9f5be9f50  RSI: 0000000000090800  RDI: 000055c9f5bea1d0
    RBP: 0000000000000000   R8: 000055c9f5bea340   R9: 0000000000000000
    R10: 000055c9f5bea300  R11: 0000000000000206  R12: 00007f92a800d6b0
    R13: 000055c9f5bf3000  R14: 000055c9f5bea3e0  R15: 0000000000000000
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

PID: 2689597  TASK: ffff880109b40bc0  CPU: 7   COMMAND: "ls"
 #0 [ffff8800c07178e0] die at ffffffff8101914e
 #1 [ffff8800c0717910] do_trap at ffffffff81015b92
 #2 [ffff8800c0717960] do_error_trap at ffffffff81015d8b
 #3 [ffff8800c0717a20] do_invalid_op at ffffffff810166e0
 #4 [ffff8800c0717a30] invalid_op at ffffffff8188c91e
    [exception RIP: d_splice_alias+478]
    RIP: ffffffff8128784e  RSP: ffff8800c0717ae8  RFLAGS: 00010286
    RAX: ffff8800d8661ab8  RBX: ffff8800d3b6b178  RCX: ffff880109b40bc0
    RDX: ffffffff81382f09  RSI: ffff8800d3b6b178  RDI: ffff8800d8661ab8
    RBP: ffff8800c0717b20   R8: 0000000000000000   R9: 0000000000000000
    R10: ffff880109b40bc0  R11: 0000000000000000  R12: ffff8800d6a29e40
    R13: 0000000000000102  R14: ffff8800d39063d0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffff8800c0717b28] nfs_lookup at ffffffff8137cd1c
 #6 [ffff8800c0717b80] nfs_atomic_open at ffffffff8137ed61
 #7 [ffff8800c0717c30] lookup_open at ffffffff8127901a
 #8 [ffff8800c0717d50] path_openat at ffffffff8127bf15
 #9 [ffff8800c0717dd8] do_filp_open at ffffffff8127d9b1
#10 [ffff8800c0717ee0] do_sys_open at ffffffff81269f90
#11 [ffff8800c0717f40] sys_open at ffffffff8126a09e
#12 [ffff8800c0717f50] entry_SYSCALL_64_fastpath at ffffffff8188aebc
    RIP: 00007f5cef6f8d20  RSP: 00007ffcc98f87b0  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000000003  RCX: 00007f5cef6f8d20
    RDX: 000055de64230f50  RSI: 0000000000090800  RDI: 000055de642311d0
    RBP: 0000000000000000   R8: 000055de64231320   R9: 0000000000000000
    R10: 000055de64231300  R11: 0000000000000202  R12: 00007f5cf00316b0
    R13: 000055de6423a000  R14: 000055de642313c0  R15: 0000000000000000
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

The dentry is 0xffff8800d3b6b178 in both cases.

