From: David Wysochanski <dwysocha@redhat.com>
To: Pavel Shilovsky <piastryyy@gmail.com>
Cc: Steve French <smfrench@gmail.com>,
Ronnie Sahlberg <lsahlber@redhat.com>,
linux-cifs <linux-cifs@vger.kernel.org>
Subject: Re: Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disruption or smb restart
Date: Mon, 21 Oct 2019 23:04:47 -0400
Message-ID: <CALF+zO=UaXnrwn1X0MnZ9Gn4g5qOZ4_SdH=WDyQ3Uc1fpWtD_Q@mail.gmail.com>
In-Reply-To: <CAKywueQRnmAuXJq+5ZER3ir6BhUgNFaaFCG6aZE7ep=qe=qbmQ@mail.gmail.com>
On Mon, Oct 21, 2019 at 6:19 PM Pavel Shilovsky <piastryyy@gmail.com> wrote:
>
> On Mon, Oct 21, 2019 at 10:23, David Wysochanski <dwysocha@redhat.com> wrote:
> >
> > Hi all,
> >
> > We recently found another problem when running a simple test case
> > where multiple processes open/read/close a single file, and a network
> > disruption or smb restart occurs.
> >
> > After just a few minutes of restarts, a few of the processes error
> > out, but a few remain. If you stop the disruption, the share becomes
> > accessible again, but the original processes remain blocked
> > indefinitely, waiting on lock_sem.
> >
> > Once this occurs, you'll see two threads deadlocked like the below.
> > Basically:
> > thread1: down_read(lock_sem) inside cifs_strict_readv(); does not
> > block on the semaphore, but then blocks on the network temporarily
> > thread2: down_write(lock_sem) inside _cifsFileInfo_put; blocks due
> > to thread1's read hold
> > thread1: down_read(lock_sem) a second time, inside cifs_reopen_file
> > (after reconnect); blocks behind thread2's queued down_write
> >
> > At this point there is a deadlock: both threads are waiting for the
> > other to release lock_sem.
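> >
> > To make the interleaving concrete, here is a minimal userspace
> > sketch of the same pattern (my own demonstration, not CIFS code;
> > glibc's writer-preferring rwlock kind mimics the kernel rwsem
> > behavior of blocking new readers once a writer is queued):
> >
> >     #define _GNU_SOURCE
> >     #include <pthread.h>
> >     #include <stdio.h>
> >     #include <unistd.h>
> >
> >     static pthread_rwlock_t lock_sem;
> >
> >     static void *reader(void *arg)
> >     {
> >             pthread_rwlock_rdlock(&lock_sem);  /* T+0: cifs_strict_readv() */
> >             sleep(1);                          /* "network disruption"     */
> >             pthread_rwlock_rdlock(&lock_sem);  /* T+2: cifs_reopen_file(); */
> >             printf("reader: never reached\n"); /* blocks behind the writer */
> >             return NULL;
> >     }
> >
> >     static void *writer(void *arg)
> >     {
> >             usleep(100 * 1000);                /* let the reader in first  */
> >             pthread_rwlock_wrlock(&lock_sem);  /* T+1: _cifsFileInfo_put() */
> >             printf("writer: never reached\n");
> >             return NULL;
> >     }
> >
> >     int main(void)
> >     {
> >             pthread_rwlockattr_t attr;
> >             pthread_t r, w;
> >
> >             pthread_rwlockattr_init(&attr);
> >             pthread_rwlockattr_setkind_np(&attr,
> >                 PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
> >             pthread_rwlock_init(&lock_sem, &attr);
> >             pthread_create(&r, NULL, reader, NULL);
> >             pthread_create(&w, NULL, writer, NULL);
> >             pthread_join(r, NULL);             /* never returns: deadlock  */
> >             return 0;
> >     }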
> >
> > The root of the problem seems to be that we hold the read sem across
> > a network disruption, and there is no check, once the connection
> > comes back, before taking down_read a second time.
> > Inside cifs_strict_readv there's a comment that explains why we need
> > to hold the sem over the read: it's so no one can modify the lock
> > list with a brlock that would conflict with the read in flight.
> > However, in this case at least, we're only opening files for reading,
> > and inside _cifsFileInfo_put we need to down_write to delete the
> > outstanding lock records.
> > So we're not really hitting the brlock case, but we are still
> > modifying the list.
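> >
> > For reference, the two paths look roughly like this (abridged and
> > paraphrased from fs/cifs/file.c around 5.4-rc4; not the literal
> > source):
> >
> >     /* cifs_strict_readv(): hold lock_sem across the whole read so
> >      * no conflicting brlock can be added while it is in flight */
> >     down_read(&cinode->lock_sem);
> >     if (!cifs_find_lock_conflict(cfile, ...))
> >             rc = generic_file_read_iter(iocb, to);
> >             /* ^ can hit the network, reconnect, and end up in
> >              *   cifs_reopen_file() -> down_read() a second time */
> >     up_read(&cinode->lock_sem);
> >
> >     /* _cifsFileInfo_put(): delete the file's outstanding lock records */
> >     down_write(&cifsi->lock_sem);
> >     list_for_each_entry_safe(li, tmp, &fid->locks, llist) {
> >             list_del(&li->llist);
> >             kfree(li);
> >     }
> >     up_write(&cifsi->lock_sem);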
> >
> > One approach to avoid the deadlock is for the reading process to drop
> > the lock_sem before re-acquiring it inside cifs_reopen_file.
> > We would drop the lock just before the call, and then inside we'd
> > need to check for a conflicting brlock and possibly error out
> > somehow.
> > But that is probably not preferred.
> >
> > Instead, can the reader only down_read one time? cifs_relock_file is
> > only called from one place (cifs_reopen_file), and only when we're
> > reconnecting, but cifs_reopen_file is called from many places. I'm
> > not sure whether there is something already in existence we can use
> > that says we don't need to re-acquire it, or if we need to add a
> > flag or something?
> >
> >
> > Here is the live crash session
> >
> > crash> ps -m | grep UN
> > [0 01:54:10.166] [UN]  PID: 13271  TASK: ffff93b8a7349e40  CPU: 2  COMMAND: "python3-openloo"
> > [0 01:54:10.167] [UN]  PID: 13273  TASK: ffff93b8b3310000  CPU: 2  COMMAND: "python3-openloo"
> > [0 01:54:10.169] [UN]  PID: 13269  TASK: ffff93b8a734dac0  CPU: 0  COMMAND: "python3-openloo"
> > [0 01:54:10.208] [UN]  PID: 13272  TASK: ffff93b8a7348000  CPU: 7  COMMAND: "python3-openloo"
> > [0 01:54:10.216] [UN]  PID: 13265  TASK: ffff93b8a7358000  CPU: 6  COMMAND: "python3-openloo"
> > [0 01:54:10.172] [UN]  PID: 13263  TASK: ffff93b8b3313c80  CPU: 2  COMMAND: "python3-openloo"
> > [0 01:54:10.229] [UN]  PID: 13267  TASK: ffff93b8a7359e40  CPU: 5  COMMAND: "python3-openloo"
> > [0 01:54:10.234] [UN]  PID: 13277  TASK: ffff93b8a7dddac0  CPU: 3  COMMAND: "python3-openloo"
> > [0 01:54:10.235] [UN]  PID: 13270  TASK: ffff93b8a734bc80  CPU: 1  COMMAND: "python3-openloo"
> > [0 01:54:10.244] [UN]  PID: 13276  TASK: ffff93b8a7dd8000  CPU: 6  COMMAND: "python3-openloo"
> > [0 01:54:10.260] [UN]  PID: 13264  TASK: ffff93b8b3315ac0  CPU: 6  COMMAND: "python3-openloo"
> > [0 01:54:10.257] [UN]  PID: 13274  TASK: ffff93b8a7ddbc80  CPU: 1  COMMAND: "python3-openloo"
> > [0 01:54:31.262] [UN]  PID: 13279  TASK: ffff93b8a7d90000  CPU: 1  COMMAND: "python3-openloo"  <--- block same instant
> > [0 01:54:31.265] [UN]  PID: 13275  TASK: ffff93b8a7dd9e40  CPU: 4  COMMAND: "python3-openloo"  <--- block same instant
> > crash> bt 13279
> > PID: 13279 TASK: ffff93b8a7d90000 CPU: 1 COMMAND: "python3-openloo"
> > #0 [ffffaae1c0a77c20] __schedule at ffffffffa19d4239
> > #1 [ffffaae1c0a77cb0] schedule at ffffffffa19d46b9
> > #2 [ffffaae1c0a77cc8] rwsem_down_write_slowpath at ffffffffa1131ada
> >    <------------- T+1: pid 13279 down_write(... lock_sem); blocks due to pid 13275
> > #3 [ffffaae1c0a77d98] _cifsFileInfo_put at ffffffffc0759818 [cifs]
> > #4 [ffffaae1c0a77e78] cifs_close at ffffffffc0759b4f [cifs]
> > #5 [ffffaae1c0a77e88] __fput at ffffffffa12f6e51
> > #6 [ffffaae1c0a77ed0] task_work_run at ffffffffa10fc827
> > #7 [ffffaae1c0a77f08] exit_to_usermode_loop at ffffffffa1003f83
> > #8 [ffffaae1c0a77f38] do_syscall_64 at ffffffffa10043c2
> > #9 [ffffaae1c0a77f50] entry_SYSCALL_64_after_hwframe at ffffffffa1a0008c
> > RIP: 00007fc6caca1778 RSP: 00007ffed0c43de8 RFLAGS: 00000246
> > RAX: 0000000000000000 RBX: 00007fc6bd5a5f30 RCX: 00007fc6caca1778
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
> > RBP: 0000558ebb9ca0d0 R8: 00007fc6cab941a8 R9: 0000000000000000
> > R10: 0000558ebb9ca0d0 R11: 0000000000000246 R12: 0000000000000000
> > R13: 0000000000000000 R14: 00007fc6cab4df0c R15: 00007ffed0c43f10
> > ORIG_RAX: 0000000000000003 CS: 0033 SS: 002b
> > crash> bt 13275
> > PID: 13275 TASK: ffff93b8a7dd9e40 CPU: 4 COMMAND: "python3-openloo"
> > #0 [ffffaae1c0a57a00] __schedule at ffffffffa19d4239
> > #1 [ffffaae1c0a57a90] schedule at ffffffffa19d46b9
> > #2 [ffffaae1c0a57aa8] rwsem_down_read_slowpath at ffffffffa19d72b4
> > #3 [ffffaae1c0a57b60] cifs_reopen_file at ffffffffc0755352 [cifs]
> >    <------------- T+2: pid 13275 down_read(... lock_sem); blocks due to pid 13279
> > #4 [ffffaae1c0a57c08] cifs_readpage_worker at ffffffffc07575f0 [cifs]
> > #5 [ffffaae1c0a57cd0] cifs_readpage at ffffffffc0758096 [cifs]
> > #6 [ffffaae1c0a57d08] generic_file_read_iter at ffffffffa123acd2
> > #7 [ffffaae1c0a57e10] cifs_strict_readv at ffffffffc07606a5 [cifs]
> > <------------- T+0: pid 13275 down_read(... lock_sem); does not block
> > #8 [ffffaae1c0a57e40] new_sync_read at ffffffffa12f336a
> > #9 [ffffaae1c0a57ec8] vfs_read at ffffffffa12f5fad
> > #10 [ffffaae1c0a57f00] ksys_read at ffffffffa12f62df
> > #11 [ffffaae1c0a57f38] do_syscall_64 at ffffffffa10042bb
> > #12 [ffffaae1c0a57f50] entry_SYSCALL_64_after_hwframe at ffffffffa1a0008c
> > RIP: 00007f6ed0b7c055 RSP: 00007ffeea637e68 RFLAGS: 00000246
> > RAX: ffffffffffffffda RBX: 00007f6ed05cd600 RCX: 00007f6ed0b7c055
> > RDX: 000000000000001e RSI: 00007f6ec3404c10 RDI: 0000000000000003
> > RBP: 000000000000001e R8: 00007f6ed0a6f1a8 R9: 0000000000000000
> > R10: 00007f6ed090faf0 R11: 0000000000000246 R12: 00007f6ec3480f40
> > R13: 0000000000000003 R14: 00007f6ec3404c10 R15: 000055c449ea60d0
> > ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
> >
> >
> > Here is my tracing from the attached debug patch which shows the
> > conditions, and points at the bugzilla we have open:
> > [ 233.588248] CIFS VFS: file=fs/cifs/file.c line=480 pid=13279 cifs_sem_op op=up_write sem=ffff93b8a7424bd8
> > [ 233.589462] CIFS VFS: file=fs/cifs/file.c line=311 pid=13279 cifs_sem_op op=down_write sem=ffff93b8a7424bd8
> > [ 233.592333] CIFS VFS: file=fs/cifs/file.c line=314 pid=13275 cifs_sem_op op=up_write sem=ffff93b8a7424bd8
> > [ 233.612324] CIFS VFS: file=fs/cifs/file.c line=314 pid=13279 cifs_sem_op op=up_write sem=ffff93b8a7424bd8
> > [ 233.612510] CIFS VFS: file=fs/cifs/file.c line=3920 pid=13275 cifs_sem_op op=down_read sem=ffff93b8a7424bd8
> > [ 233.616487] CIFS VFS: file=fs/cifs/file.c line=3920 pid=13279 cifs_sem_op op=down_read sem=ffff93b8a7424bd8  T0: https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
> > [ 254.389458] CIFS VFS: Send error in read = -9
> > [ 254.391943] CIFS VFS: file=fs/cifs/file.c line=3926 pid=13279 cifs_sem_op op=up_read sem=ffff93b8a7424bd8
> > [ 254.397453] CIFS VFS: file=fs/cifs/file.c line=471 pid=13279 cifs_sem_op op=down_write sem=ffff93b8a7424bd8  T1: https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
> > [ 254.399798] CIFS VFS: file=fs/cifs/file.c line=630 pid=13275 cifs_sem_op op=down_read sem=ffff93b8a7424bd8  T2: https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
> >
> > There was a patch that silenced a lockdep warning, but it looks like
> > it suppressed a semi-valid warning, because clearly we can deadlock
> > here:
> >
> > commit 560d388950ceda5e7c7cdef7f3d9a8ff297bbf9d
> > Author: Rabin Vincent <rabinv@axis.com>
> > Date: Wed May 3 17:17:21 2017 +0200
> >
> > CIFS: silence lockdep splat in cifs_relock_file()
> >
> > cifs_relock_file() can perform a down_write() on the inode's lock_sem even
> > though it was already performed in cifs_strict_readv(). Lockdep complains
> > about this. AFAICS, there is no problem here, and lockdep just needs to be
> > told that this nesting is OK.
> >
> > =============================================
> > [ INFO: possible recursive locking detected ]
> > 4.11.0+ #20 Not tainted
> > ---------------------------------------------
> > cat/701 is trying to acquire lock:
> > (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00
> >
> > but task is already holding lock:
> > (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310
> >
> > other info that might help us debug this:
> > Possible unsafe locking scenario:
> >
> > CPU0
> > ----
> > lock(&cifsi->lock_sem);
> > lock(&cifsi->lock_sem);
> >
> > *** DEADLOCK ***
> >
> > May be due to missing lock nesting notation
> >
> > 1 lock held by cat/701:
> > #0: (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310
> >
> > stack backtrace:
> > CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20
> > Call Trace:
> > dump_stack+0x85/0xc2
> > __lock_acquire+0x17dd/0x2260
> > ? trace_hardirqs_on_thunk+0x1a/0x1c
> > ? preempt_schedule_irq+0x6b/0x80
> > lock_acquire+0xcc/0x260
> > ? lock_acquire+0xcc/0x260
> > ? cifs_reopen_file+0x7a7/0xc00
> > down_read+0x2d/0x70
> > ? cifs_reopen_file+0x7a7/0xc00
> > cifs_reopen_file+0x7a7/0xc00
> > ? printk+0x43/0x4b
> > cifs_readpage_worker+0x327/0x8a0
> > cifs_readpage+0x8c/0x2a0
> > generic_file_read_iter+0x692/0xd00
> > cifs_strict_readv+0x29f/0x310
> > generic_file_splice_read+0x11c/0x1c0
> > do_splice_to+0xa5/0xc0
> > splice_direct_to_actor+0xfa/0x350
> > ? generic_pipe_buf_nosteal+0x10/0x10
> > do_splice_direct+0xb5/0xe0
> > do_sendfile+0x278/0x3a0
> > SyS_sendfile64+0xc4/0xe0
> > entry_SYSCALL_64_fastpath+0x1f/0xbe
> >
> > Signed-off-by: Rabin Vincent <rabinv@axis.com>
> > Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
> > Signed-off-by: Steve French <smfrench@gmail.com>
>
> So, even after we already grabbed the semaphore it doesn't allow us to
> acquire it one more time because of the conflicting waiting down_write?

Yes, that is my understanding. Otherwise, readers could starve writers.

> If yes, then we can create another work queue that will be
> re-locking files that have been re-opened. In this case
> cifs_reopen_file will simply submit the "re-lock" request to the queue
> without the need to grab the lock one more time.
>
I think your approach of a work queue may work, but it would have to
be on a per-pid basis, as would any solution.
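
Something like this is how I picture it (purely a hypothetical
sketch -- none of these names exist today, and it ignores the
per-pid aspect):

    /* Hypothetical: defer the relock to cifsiod_wq so the down_read()
     * happens from worker context rather than from a read path that
     * already holds lock_sem. */
    struct cifs_relock_work {
            struct work_struct work;
            struct cifsFileInfo *cfile;     /* pinned by a reference */
    };

    static void cifs_relock_worker(struct work_struct *work)
    {
            struct cifs_relock_work *rw =
                    container_of(work, struct cifs_relock_work, work);

            cifs_relock_file(rw->cfile);    /* takes lock_sem itself */
            cifsFileInfo_put(rw->cfile);
            kfree(rw);
    }

    /* in cifs_reopen_file(), replacing the direct cifs_relock_file()
     * call on the reconnect path: */
    static int cifs_queue_relock(struct cifsFileInfo *cfile)
    {
            struct cifs_relock_work *rw = kmalloc(sizeof(*rw), GFP_KERNEL);

            if (!rw)
                    return -ENOMEM;
            cifsFileInfo_get(cfile);        /* dropped by the worker */
            rw->cfile = cfile;
            INIT_WORK(&rw->work, cifs_relock_worker);
            queue_work(cifsiod_wq, &rw->work);
            return 0;
    }
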
The root cause seems to be that we go to sleep holding the lock_sem.
Is there any way to avoid that?
FWIW, I don't think the flag approach I mentioned (i.e. adding another
flag in cifsInodeInfo.flags and checking it) would work, because some
other process could clear the flag and you'd end up in the same
situation. I tried it and hit the same problem again.
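
For example (hypothetical flag, just to illustrate the race):

    /* Hypothetical CIFS_INO_LOCK_SEM_HELD bit -- nothing like it
     * exists today.  The flag is per-inode, but several processes
     * can be inside cifs_strict_readv() on the same inode at once. */

    /* pid A, in cifs_strict_readv(): */
    down_read(&cinode->lock_sem);
    set_bit(CIFS_INO_LOCK_SEM_HELD, &cinode->flags);
    /* ... read hits the network, session reconnects ... */
    /* pid A, in cifs_reopen_file(): flag is set, so skip down_read() */

    /* meanwhile pid B finishes its own read on the same inode: */
    up_read(&cinode->lock_sem);
    clear_bit(CIFS_INO_LOCK_SEM_HELD, &cinode->flags); /* clears A's flag too */
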
> Thoughts?
>
> --
> Best regards,
> Pavel Shilovsky