From: David Wysochanski <dwysocha@redhat.com>
To: Pavel Shilovsky <piastryyy@gmail.com>
Cc: Steve French <smfrench@gmail.com>,
Ronnie Sahlberg <lsahlber@redhat.com>,
linux-cifs <linux-cifs@vger.kernel.org>
Subject: Re: Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disruption or smb restart
Date: Mon, 21 Oct 2019 23:04:47 -0400 [thread overview]
Message-ID: <CALF+zO=UaXnrwn1X0MnZ9Gn4g5qOZ4_SdH=WDyQ3Uc1fpWtD_Q@mail.gmail.com> (raw)
In-Reply-To: <CAKywueQRnmAuXJq+5ZER3ir6BhUgNFaaFCG6aZE7ep=qe=qbmQ@mail.gmail.com>
On Mon, Oct 21, 2019 at 6:19 PM Pavel Shilovsky <piastryyy@gmail.com> wrote:
>
> пн, 21 окт. 2019 г. в 10:23, David Wysochanski <dwysocha@redhat.com>:
> >
> > HI all,
> >
> > We recently found another problem when running a simple test case
> > where multiple processes open/read/close a single file, and a network
> > disruption or smb restart occurs.
> >
> > After just a few minutes of restarts, a few of the processes error
> > out, but a few remain. If you stop the disruption, the share is
> > accessible again, but the original processes are blocked indefinitely
> > waiting on lock_sem
> >
> > Once this occurs, you'll see two threads deadlocked like the below. Basically:
> > thread1: down_read(lock_sem) inside cifs_strict_readv(); does not
> > block, but blocks on network temporarily
> > thread2: down_write(lock_sem) inside _cifsFileInfo_put; blocks due to thread 1
> > thread1: down_read(lock_sem) inside cifs_reopen_file (after
> > reconnect); unblocks to reconnect, but then blocks due to thread 2
> >
> > At this point there is a deadlock, both threads are waiting for each
> > other to release lock_sem
> >
> > The root of the problem seems to be having to hold the read sem over a
> > network disruption, and no check when it comes back before down_read a
> > second time.
> > Inside cifs_strict_readv there's a comment that explains why we need
> > hold the sem over the read, and it's so no one modifies the lock list
> > with a brlock that prevents writing.
> > However, in this case at least, we're only opening files for reading,
> > and inside _cifsFileInfo_put it needs to down_write to delete the
> > outstanding lock records.
> > So we're not really hitting that case, but are modifying the list.
> >
> > One approach to avoid the deadlock is for the reading process to drop
> > the lock_sem before re-aquiring it again inside cifs_reopen_file.
> > We would drop the lock just before calling, and then inside we'd need
> > to check for the condition of the conflicting brlock, and then
> > possibly error out somehow.
> > But that is probably not preferred.
> >
> > Instead, can the reader only down_read one time? cifs_relock_file is
> > only called from one place (cifs_reopen_file) and it's when we're
> > reconnecting, but
> > cifs_reopen_file is called from many places. I'm not sure if we there
> > is something in existence we can use that says we don't need to
> > re-aquire it, or if we need to add a flag or something?
> >
> >
> >
> >
> >
> > Here is the live crash session
> >
> > crash> ps -m | grep UN
> > [0 01:54:10.166] [UN] PID: 13271 TASK: ffff93b8a7349e40 CPU: 2
> > COMMAND: "python3-openloo"
> > [0 01:54:10.167] [UN] PID: 13273 TASK: ffff93b8b3310000 CPU: 2
> > COMMAND: "python3-openloo"
> > [0 01:54:10.169] [UN] PID: 13269 TASK: ffff93b8a734dac0 CPU: 0
> > COMMAND: "python3-openloo"
> > [0 01:54:10.208] [UN] PID: 13272 TASK: ffff93b8a7348000 CPU: 7
> > COMMAND: "python3-openloo"
> > [0 01:54:10.216] [UN] PID: 13265 TASK: ffff93b8a7358000 CPU: 6
> > COMMAND: "python3-openloo"
> > [0 01:54:10.172] [UN] PID: 13263 TASK: ffff93b8b3313c80 CPU: 2
> > COMMAND: "python3-openloo"
> > [0 01:54:10.229] [UN] PID: 13267 TASK: ffff93b8a7359e40 CPU: 5
> > COMMAND: "python3-openloo"
> > [0 01:54:10.234] [UN] PID: 13277 TASK: ffff93b8a7dddac0 CPU: 3
> > COMMAND: "python3-openloo"
> > [0 01:54:10.235] [UN] PID: 13270 TASK: ffff93b8a734bc80 CPU: 1
> > COMMAND: "python3-openloo"
> > [0 01:54:10.244] [UN] PID: 13276 TASK: ffff93b8a7dd8000 CPU: 6
> > COMMAND: "python3-openloo"
> > [0 01:54:10.260] [UN] PID: 13264 TASK: ffff93b8b3315ac0 CPU: 6
> > COMMAND: "python3-openloo"
> > [0 01:54:10.257] [UN] PID: 13274 TASK: ffff93b8a7ddbc80 CPU: 1
> > COMMAND: "python3-openloo"
> > [0 01:54:31.262] [UN] PID: 13279 TASK: ffff93b8a7d90000 CPU: 1
> > COMMAND: "python3-openloo" <--- block same instant
> > [0 01:54:31.265] [UN] PID: 13275 TASK: ffff93b8a7dd9e40 CPU: 4
> > COMMAND: "python3-openloo" <--- block same instant
> > crash> bt 13279
> > PID: 13279 TASK: ffff93b8a7d90000 CPU: 1 COMMAND: "python3-openloo"
> > #0 [ffffaae1c0a77c20] __schedule at ffffffffa19d4239
> > #1 [ffffaae1c0a77cb0] schedule at ffffffffa19d46b9
> > #2 [ffffaae1c0a77cc8] rwsem_down_write_slowpath at ffffffffa1131ada
> > <------------- T+1: pid 13279 down_write(... lock_sem); blocks due to
> > pid 13275
> > #3 [ffffaae1c0a77d98] _cifsFileInfo_put at ffffffffc0759818 [cifs]
> > #4 [ffffaae1c0a77e78] cifs_close at ffffffffc0759b4f [cifs]
> > #5 [ffffaae1c0a77e88] __fput at ffffffffa12f6e51
> > #6 [ffffaae1c0a77ed0] task_work_run at ffffffffa10fc827
> > #7 [ffffaae1c0a77f08] exit_to_usermode_loop at ffffffffa1003f83
> > #8 [ffffaae1c0a77f38] do_syscall_64 at ffffffffa10043c2
> > #9 [ffffaae1c0a77f50] entry_SYSCALL_64_after_hwframe at ffffffffa1a0008c
> > RIP: 00007fc6caca1778 RSP: 00007ffed0c43de8 RFLAGS: 00000246
> > RAX: 0000000000000000 RBX: 00007fc6bd5a5f30 RCX: 00007fc6caca1778
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
> > RBP: 0000558ebb9ca0d0 R8: 00007fc6cab941a8 R9: 0000000000000000
> > R10: 0000558ebb9ca0d0 R11: 0000000000000246 R12: 0000000000000000
> > R13: 0000000000000000 R14: 00007fc6cab4df0c R15: 00007ffed0c43f10
> > ORIG_RAX: 0000000000000003 CS: 0033 SS: 002b
> > crash> bt 13275
> > PID: 13275 TASK: ffff93b8a7dd9e40 CPU: 4 COMMAND: "python3-openloo"
> > #0 [ffffaae1c0a57a00] __schedule at ffffffffa19d4239
> > #1 [ffffaae1c0a57a90] schedule at ffffffffa19d46b9
> > #2 [ffffaae1c0a57aa8] rwsem_down_read_slowpath at ffffffffa19d72b4
> > #3 [ffffaae1c0a57b60] cifs_reopen_file at ffffffffc0755352 [cifs]
> > <------------- T+2: pid 13275 down_read(... lock_sem); blocks due to
> > pid 13279
> > #4 [ffffaae1c0a57c08] cifs_readpage_worker at ffffffffc07575f0 [cifs]
> > #5 [ffffaae1c0a57cd0] cifs_readpage at ffffffffc0758096 [cifs]
> > #6 [ffffaae1c0a57d08] generic_file_read_iter at ffffffffa123acd2
> > #7 [ffffaae1c0a57e10] cifs_strict_readv at ffffffffc07606a5 [cifs]
> > <------------- T+0: pid 13275 down_read(... lock_sem); does not block
> > #8 [ffffaae1c0a57e40] new_sync_read at ffffffffa12f336a
> > #9 [ffffaae1c0a57ec8] vfs_read at ffffffffa12f5fad
> > #10 [ffffaae1c0a57f00] ksys_read at ffffffffa12f62df
> > #11 [ffffaae1c0a57f38] do_syscall_64 at ffffffffa10042bb
> > #12 [ffffaae1c0a57f50] entry_SYSCALL_64_after_hwframe at ffffffffa1a0008c
> > RIP: 00007f6ed0b7c055 RSP: 00007ffeea637e68 RFLAGS: 00000246
> > RAX: ffffffffffffffda RBX: 00007f6ed05cd600 RCX: 00007f6ed0b7c055
> > RDX: 000000000000001e RSI: 00007f6ec3404c10 RDI: 0000000000000003
> > RBP: 000000000000001e R8: 00007f6ed0a6f1a8 R9: 0000000000000000
> > R10: 00007f6ed090faf0 R11: 0000000000000246 R12: 00007f6ec3480f40
> > R13: 0000000000000003 R14: 00007f6ec3404c10 R15: 000055c449ea60d0
> > ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
> >
> >
> > Here is my tracing from the attached debug patch which shows the
> > conditions, and points at the bugzilla we have open:
> > [ 233.588248] CIFS VFS: file=fs/cifs/file.c line=480 pid=13279
> > cifs_sem_op op=up_write sem=ffff93b8a7424bd8
> > [ 233.589462] CIFS VFS: file=fs/cifs/file.c line=311 pid=13279
> > cifs_sem_op op=down_write sem=ffff93b8a7424bd8
> > [ 233.592333] CIFS VFS: file=fs/cifs/file.c line=314 pid=13275
> > cifs_sem_op op=up_write sem=ffff93b8a7424bd8
> > [ 233.612324] CIFS VFS: file=fs/cifs/file.c line=314 pid=13279
> > cifs_sem_op op=up_write sem=ffff93b8a7424bd8
> > [ 233.612510] CIFS VFS: file=fs/cifs/file.c line=3920 pid=13275
> > cifs_sem_op op=down_read sem=ffff93b8a7424bd8
> > [ 233.616487] CIFS VFS: file=fs/cifs/file.c line=3920 pid=13279
> > cifs_sem_op op=down_read sem=ffff93b8a7424bd8 T0:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
> > [ 254.389458] CIFS VFS: Send error in read = -9
> > [ 254.391943] CIFS VFS: file=fs/cifs/file.c line=3926 pid=13279
> > cifs_sem_op op=up_read sem=ffff93b8a7424bd8
> > [ 254.397453] CIFS VFS: file=fs/cifs/file.c line=471 pid=13279
> > cifs_sem_op op=down_write sem=ffff93b8a7424bd8 T1:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
> > [ 254.399798] CIFS VFS: file=fs/cifs/file.c line=630 pid=13275
> > cifs_sem_op op=down_read sem=ffff93b8a7424bd8 T2:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
> >
> >
> >
> > There was a patch that removed a lockdep warning, but it looks like
> > this removed a semi-valid warning, because clearly we can deadlock
> > here:
> >
> > commit 560d388950ceda5e7c7cdef7f3d9a8ff297bbf9d
> > Author: Rabin Vincent <rabinv@axis.com>
> > Date: Wed May 3 17:17:21 2017 +0200
> >
> > CIFS: silence lockdep splat in cifs_relock_file()
> >
> > cifs_relock_file() can perform a down_write() on the inode's lock_sem even
> > though it was already performed in cifs_strict_readv(). Lockdep complains
> > about this. AFAICS, there is no problem here, and lockdep just needs to be
> > told that this nesting is OK.
> >
> > =============================================
> > [ INFO: possible recursive locking detected ]
> > 4.11.0+ #20 Not tainted
> > ---------------------------------------------
> > cat/701 is trying to acquire lock:
> > (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00
> >
> > but task is already holding lock:
> > (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310
> >
> > other info that might help us debug this:
> > Possible unsafe locking scenario:
> >
> > CPU0
> > ----
> > lock(&cifsi->lock_sem);
> > lock(&cifsi->lock_sem);
> >
> > *** DEADLOCK ***
> >
> > May be due to missing lock nesting notation
> >
> > 1 lock held by cat/701:
> > #0: (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310
> >
> > stack backtrace:
> > CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20
> > Call Trace:
> > dump_stack+0x85/0xc2
> > __lock_acquire+0x17dd/0x2260
> > ? trace_hardirqs_on_thunk+0x1a/0x1c
> > ? preempt_schedule_irq+0x6b/0x80
> > lock_acquire+0xcc/0x260
> > ? lock_acquire+0xcc/0x260
> > ? cifs_reopen_file+0x7a7/0xc00
> > down_read+0x2d/0x70
> > ? cifs_reopen_file+0x7a7/0xc00
> > cifs_reopen_file+0x7a7/0xc00
> > ? printk+0x43/0x4b
> > cifs_readpage_worker+0x327/0x8a0
> > cifs_readpage+0x8c/0x2a0
> > generic_file_read_iter+0x692/0xd00
> > cifs_strict_readv+0x29f/0x310
> > generic_file_splice_read+0x11c/0x1c0
> > do_splice_to+0xa5/0xc0
> > splice_direct_to_actor+0xfa/0x350
> > ? generic_pipe_buf_nosteal+0x10/0x10
> > do_splice_direct+0xb5/0xe0
> > do_sendfile+0x278/0x3a0
> > SyS_sendfile64+0xc4/0xe0
> > entry_SYSCALL_64_fastpath+0x1f/0xbe
> >
> > Signed-off-by: Rabin Vincent <rabinv@axis.com>
> > Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
> > Signed-off-by: Steve French <smfrench@gmail.com>
>
> So, even after we already grabbed the semaphore it doesn't allow us to
> acquire it one more time because of the conflicting waiting down_write
Yes it is my understanding this is the case. Otherwise, readers could
starve writers.
> ? If yes, then we can create another work queue that will be
> re-locking files that have been re-opened. In this case
> cifs_reopen_file will simply submit the "re-lock" request to the queue
> without the need to grab the lock one more time.
>
I think you're approach of a work queue may work but it would have to
be on a per-pid basis, as would any solution.
The root seems to be we're going to sleep holding the lock_sem. Is
there anyway to avoid that?
FWIW, the flag approach I mentioned (i.e. adding another flag in
cifsInodeInfo.flags and checking it), I don't think would work because
some other process could clear it and you'd end up in the same
situation.
I tried it and realized the same problem can happen again.
> Thoughts?
>
> --
> Best regards,
> Pavel Shilovsky
next prev parent reply other threads:[~2019-10-22 3:05 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-21 17:22 Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disruption or smb restart David Wysochanski
2019-10-21 22:19 ` Pavel Shilovsky
2019-10-22 3:04 ` David Wysochanski [this message]
2019-10-22 16:09 ` Pavel Shilovsky
2019-10-22 16:43 ` David Wysochanski
2019-10-22 19:00 ` David Wysochanski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CALF+zO=UaXnrwn1X0MnZ9Gn4g5qOZ4_SdH=WDyQ3Uc1fpWtD_Q@mail.gmail.com' \
--to=dwysocha@redhat.com \
--cc=linux-cifs@vger.kernel.org \
--cc=lsahlber@redhat.com \
--cc=piastryyy@gmail.com \
--cc=smfrench@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.