Linux-CIFS Archive on lore.kernel.org
* Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disruption or smb restart
@ 2019-10-21 17:22 David Wysochanski
  2019-10-21 22:19 ` Pavel Shilovsky
  0 siblings, 1 reply; 6+ messages in thread
From: David Wysochanski @ 2019-10-21 17:22 UTC (permalink / raw)
  To: linux-cifs

[-- Attachment #1: Type: text/plain, Size: 10296 bytes --]

Hi all,

We recently found another problem when running a simple test case
where multiple processes open/read/close a single file, and a network
disruption or smb restart occurs.

After just a few minutes of restarts, a few of the processes error
out, but a few remain.  If you stop the disruption, the share is
accessible again, but the original processes remain blocked
indefinitely waiting on lock_sem.

Once this occurs, you'll see two threads deadlocked as below.  Basically:
thread1: down_read(lock_sem) inside cifs_strict_readv(); acquires the
sem without blocking, then blocks temporarily on the network
thread2: down_write(lock_sem) inside _cifsFileInfo_put; blocks behind
thread1's read hold
thread1: down_read(lock_sem) a second time inside cifs_reopen_file
(after reconnect); wakes up to reconnect, but then blocks behind
thread2's queued down_write

At this point there is a deadlock: both threads are waiting for each
other to release lock_sem.

The root of the problem seems to be having to hold the read sem across
a network disruption, with no check, once the connection comes back,
before taking down_read a second time.
Inside cifs_strict_readv there's a comment that explains why we need
to hold the sem over the read: so that no one modifies the lock list
with a brlock that prevents reading.
However, in this case at least, we're only opening files for reading,
yet _cifsFileInfo_put needs to down_write to delete the outstanding
lock records.
So we're not actually hitting the brlock case, but we are still
modifying the list.

One approach to avoid the deadlock is for the reading process to drop
lock_sem before re-acquiring it inside cifs_reopen_file.
We would drop the lock just before the call, and then inside we'd need
to re-check for a conflicting brlock and possibly error out somehow.
But that is probably not the preferred approach.

Instead, can the reader take down_read only once?  cifs_relock_file is
called from only one place (cifs_reopen_file), and only when we're
reconnecting, but cifs_reopen_file is called from many places.  I'm not
sure whether there is something existing we can use that says we don't
need to re-acquire the sem, or whether we need to add a flag or
something.
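For illustration, something like the following is what I mean by a
flag.  This is a hypothetical sketch only, not a tested patch; the
"sem_held" parameter name is invented, and the elided condition is
marked with "...":

```c
/*
 * Hypothetical sketch, not actual fs/cifs code: let the caller tell
 * cifs_relock_file() that it already holds lock_sem for read, so we
 * do not take it a second time.
 */
static int
cifs_relock_file(struct cifsFileInfo *cfile, bool sem_held)
{
	struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry));
	struct cifs_tcon *tcon = tlink_tcon(cfile->tlink);
	int rc = 0;

	if (!sem_held)
		down_read_nested(&cinode->lock_sem, SINGLE_DEPTH_NESTING);
	if (cinode->can_cache_brlcks)
		goto out;	/* can cache locks - no need to relock */

	if (cap_unix(tcon->ses) && ...)
		rc = cifs_push_posix_locks(cfile);
	else
		rc = tcon->ses->server->ops->push_mand_locks(cfile);
out:
	if (!sem_held)
		up_read(&cinode->lock_sem);
	return rc;
}
```

cifs_reopen_file would then grow a matching parameter (false for every
caller except the paths, like cifs_strict_readv, that already hold the
sem for read).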

Here is the live crash session

crash> ps -m | grep UN
[0 01:54:10.166] [UN]  PID: 13271  TASK: ffff93b8a7349e40  CPU: 2
COMMAND: "python3-openloo"
[0 01:54:10.167] [UN]  PID: 13273  TASK: ffff93b8b3310000  CPU: 2
COMMAND: "python3-openloo"
[0 01:54:10.169] [UN]  PID: 13269  TASK: ffff93b8a734dac0  CPU: 0
COMMAND: "python3-openloo"
[0 01:54:10.208] [UN]  PID: 13272  TASK: ffff93b8a7348000  CPU: 7
COMMAND: "python3-openloo"
[0 01:54:10.216] [UN]  PID: 13265  TASK: ffff93b8a7358000  CPU: 6
COMMAND: "python3-openloo"
[0 01:54:10.172] [UN]  PID: 13263  TASK: ffff93b8b3313c80  CPU: 2
COMMAND: "python3-openloo"
[0 01:54:10.229] [UN]  PID: 13267  TASK: ffff93b8a7359e40  CPU: 5
COMMAND: "python3-openloo"
[0 01:54:10.234] [UN]  PID: 13277  TASK: ffff93b8a7dddac0  CPU: 3
COMMAND: "python3-openloo"
[0 01:54:10.235] [UN]  PID: 13270  TASK: ffff93b8a734bc80  CPU: 1
COMMAND: "python3-openloo"
[0 01:54:10.244] [UN]  PID: 13276  TASK: ffff93b8a7dd8000  CPU: 6
COMMAND: "python3-openloo"
[0 01:54:10.260] [UN]  PID: 13264  TASK: ffff93b8b3315ac0  CPU: 6
COMMAND: "python3-openloo"
[0 01:54:10.257] [UN]  PID: 13274  TASK: ffff93b8a7ddbc80  CPU: 1
COMMAND: "python3-openloo"
[0 01:54:31.262] [UN]  PID: 13279  TASK: ffff93b8a7d90000  CPU: 1
COMMAND: "python3-openloo" <--- block same instant
[0 01:54:31.265] [UN]  PID: 13275  TASK: ffff93b8a7dd9e40  CPU: 4
COMMAND: "python3-openloo" <--- block same instant
crash> bt 13279
PID: 13279  TASK: ffff93b8a7d90000  CPU: 1   COMMAND: "python3-openloo"
 #0 [ffffaae1c0a77c20] __schedule at ffffffffa19d4239
 #1 [ffffaae1c0a77cb0] schedule at ffffffffa19d46b9
 #2 [ffffaae1c0a77cc8] rwsem_down_write_slowpath at ffffffffa1131ada
<------------- T+1: pid 13279 down_write(... lock_sem); blocks due to
pid 13275
 #3 [ffffaae1c0a77d98] _cifsFileInfo_put at ffffffffc0759818 [cifs]
 #4 [ffffaae1c0a77e78] cifs_close at ffffffffc0759b4f [cifs]
 #5 [ffffaae1c0a77e88] __fput at ffffffffa12f6e51
 #6 [ffffaae1c0a77ed0] task_work_run at ffffffffa10fc827
 #7 [ffffaae1c0a77f08] exit_to_usermode_loop at ffffffffa1003f83
 #8 [ffffaae1c0a77f38] do_syscall_64 at ffffffffa10043c2
 #9 [ffffaae1c0a77f50] entry_SYSCALL_64_after_hwframe at ffffffffa1a0008c
    RIP: 00007fc6caca1778  RSP: 00007ffed0c43de8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 00007fc6bd5a5f30  RCX: 00007fc6caca1778
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000003
    RBP: 0000558ebb9ca0d0   R8: 00007fc6cab941a8   R9: 0000000000000000
    R10: 0000558ebb9ca0d0  R11: 0000000000000246  R12: 0000000000000000
    R13: 0000000000000000  R14: 00007fc6cab4df0c  R15: 00007ffed0c43f10
    ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b
crash> bt 13275
PID: 13275  TASK: ffff93b8a7dd9e40  CPU: 4   COMMAND: "python3-openloo"
 #0 [ffffaae1c0a57a00] __schedule at ffffffffa19d4239
 #1 [ffffaae1c0a57a90] schedule at ffffffffa19d46b9
 #2 [ffffaae1c0a57aa8] rwsem_down_read_slowpath at ffffffffa19d72b4
 #3 [ffffaae1c0a57b60] cifs_reopen_file at ffffffffc0755352 [cifs]
<------------- T+2: pid 13275 down_read(... lock_sem); blocks due to
pid 13279
 #4 [ffffaae1c0a57c08] cifs_readpage_worker at ffffffffc07575f0 [cifs]
 #5 [ffffaae1c0a57cd0] cifs_readpage at ffffffffc0758096 [cifs]
 #6 [ffffaae1c0a57d08] generic_file_read_iter at ffffffffa123acd2
 #7 [ffffaae1c0a57e10] cifs_strict_readv at ffffffffc07606a5 [cifs]
<------------- T+0: pid 13275 down_read(... lock_sem); does not block
 #8 [ffffaae1c0a57e40] new_sync_read at ffffffffa12f336a
 #9 [ffffaae1c0a57ec8] vfs_read at ffffffffa12f5fad
#10 [ffffaae1c0a57f00] ksys_read at ffffffffa12f62df
#11 [ffffaae1c0a57f38] do_syscall_64 at ffffffffa10042bb
#12 [ffffaae1c0a57f50] entry_SYSCALL_64_after_hwframe at ffffffffa1a0008c
    RIP: 00007f6ed0b7c055  RSP: 00007ffeea637e68  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f6ed05cd600  RCX: 00007f6ed0b7c055
    RDX: 000000000000001e  RSI: 00007f6ec3404c10  RDI: 0000000000000003
    RBP: 000000000000001e   R8: 00007f6ed0a6f1a8   R9: 0000000000000000
    R10: 00007f6ed090faf0  R11: 0000000000000246  R12: 00007f6ec3480f40
    R13: 0000000000000003  R14: 00007f6ec3404c10  R15: 000055c449ea60d0
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b


Here is my tracing from the attached debug patch, which shows the
sequence of semaphore operations and points at the bugzilla we have open:
[  233.588248] CIFS VFS: file=fs/cifs/file.c line=480 pid=13279
cifs_sem_op op=up_write sem=ffff93b8a7424bd8
[  233.589462] CIFS VFS: file=fs/cifs/file.c line=311 pid=13279
cifs_sem_op op=down_write sem=ffff93b8a7424bd8
[  233.592333] CIFS VFS: file=fs/cifs/file.c line=314 pid=13275
cifs_sem_op op=up_write sem=ffff93b8a7424bd8
[  233.612324] CIFS VFS: file=fs/cifs/file.c line=314 pid=13279
cifs_sem_op op=up_write sem=ffff93b8a7424bd8
[  233.612510] CIFS VFS: file=fs/cifs/file.c line=3920 pid=13275
cifs_sem_op op=down_read sem=ffff93b8a7424bd8
[  233.616487] CIFS VFS: file=fs/cifs/file.c line=3920 pid=13279
cifs_sem_op op=down_read sem=ffff93b8a7424bd8  T0:
https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
[  254.389458] CIFS VFS: Send error in read = -9
[  254.391943] CIFS VFS: file=fs/cifs/file.c line=3926 pid=13279
cifs_sem_op op=up_read sem=ffff93b8a7424bd8
[  254.397453] CIFS VFS: file=fs/cifs/file.c line=471 pid=13279
cifs_sem_op op=down_write sem=ffff93b8a7424bd8  T1:
https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
[  254.399798] CIFS VFS: file=fs/cifs/file.c line=630 pid=13275
cifs_sem_op op=down_read sem=ffff93b8a7424bd8   T2:
https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9



There was an earlier patch that silenced a lockdep warning, but it
looks like it suppressed a semi-valid warning, because clearly we can
deadlock here:

commit 560d388950ceda5e7c7cdef7f3d9a8ff297bbf9d
Author: Rabin Vincent <rabinv@axis.com>
Date:   Wed May 3 17:17:21 2017 +0200

    CIFS: silence lockdep splat in cifs_relock_file()

    cifs_relock_file() can perform a down_write() on the inode's lock_sem even
    though it was already performed in cifs_strict_readv().  Lockdep complains
    about this.  AFAICS, there is no problem here, and lockdep just needs to be
    told that this nesting is OK.

     =============================================
     [ INFO: possible recursive locking detected ]
     4.11.0+ #20 Not tainted
     ---------------------------------------------
     cat/701 is trying to acquire lock:
      (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00

     but task is already holding lock:
      (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310

     other info that might help us debug this:
      Possible unsafe locking scenario:

            CPU0
            ----
       lock(&cifsi->lock_sem);
       lock(&cifsi->lock_sem);

      *** DEADLOCK ***

      May be due to missing lock nesting notation

     1 lock held by cat/701:
      #0:  (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310

     stack backtrace:
     CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20
     Call Trace:
      dump_stack+0x85/0xc2
      __lock_acquire+0x17dd/0x2260
      ? trace_hardirqs_on_thunk+0x1a/0x1c
      ? preempt_schedule_irq+0x6b/0x80
      lock_acquire+0xcc/0x260
      ? lock_acquire+0xcc/0x260
      ? cifs_reopen_file+0x7a7/0xc00
      down_read+0x2d/0x70
      ? cifs_reopen_file+0x7a7/0xc00
      cifs_reopen_file+0x7a7/0xc00
      ? printk+0x43/0x4b
      cifs_readpage_worker+0x327/0x8a0
      cifs_readpage+0x8c/0x2a0
      generic_file_read_iter+0x692/0xd00
      cifs_strict_readv+0x29f/0x310
      generic_file_splice_read+0x11c/0x1c0
      do_splice_to+0xa5/0xc0
      splice_direct_to_actor+0xfa/0x350
      ? generic_pipe_buf_nosteal+0x10/0x10
      do_splice_direct+0xb5/0xe0
      do_sendfile+0x278/0x3a0
      SyS_sendfile64+0xc4/0xe0
      entry_SYSCALL_64_fastpath+0x1f/0xbe

    Signed-off-by: Rabin Vincent <rabinv@axis.com>
    Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
    Signed-off-by: Steve French <smfrench@gmail.com>

[-- Attachment #2: 0001-debug-cifs-lock_sem-deadlock-redhat-bug-1760835.patch --]
[-- Type: text/x-patch, Size: 10878 bytes --]

From 68fe8004456298070247fd23c3af91b332d98d35 Mon Sep 17 00:00:00 2001
From: Dave Wysochanski <dwysocha@redhat.com>
Date: Mon, 21 Oct 2019 07:28:04 -0400
Subject: [PATCH] debug cifs lock_sem deadlock - redhat bug 1760835

debug of https://bugzilla.redhat.com/show_bug.cgi?id=1760835

cifs_dbg trace of lock_sem
turn off ratelimiting
---
 Makefile            |  2 +-
 fs/cifs/cifsfs.c    |  1 +
 fs/cifs/cifsproto.h | 34 ++++++++++++++++++++++++++++++++++
 fs/cifs/file.c      | 32 ++++++++++++++++++++++++++++++++
 fs/cifs/smb2file.c  |  2 ++
 lib/ratelimit.c     |  3 +--
 6 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 5475cdb6d57d..47b0d42f0418 100644
--- a/Makefile
+++ b/Makefile
@@ -2,7 +2,7 @@
 VERSION = 5
 PATCHLEVEL = 4
 SUBLEVEL = 0
-EXTRAVERSION = -rc4
+EXTRAVERSION = -rc4-dbg3
 NAME = Nesting Opossum
 
 # *DOCUMENTATION*
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index c049c7b3aa87..431a9d46b409 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -1335,6 +1335,7 @@ static ssize_t cifs_copy_file_range(struct file *src_file, loff_t off,
 	struct cifsInodeInfo *cifsi = inode;
 
 	inode_init_once(&cifsi->vfs_inode);
+	cifs_sem_op_trace_init_rwsem(&cifsi->lock_sem);
 	init_rwsem(&cifsi->lock_sem);
 }
 
diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index e53e9f62b87b..5488e5e11e40 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -65,6 +65,40 @@ extern int smb_send(struct TCP_Server_Info *, struct smb_hdr *,
 	else							\
 		trace_smb3_exit_done(curr_xid, __func__);	\
 } while (0)
+
+
+/* Debug lock_sem */
+#define cifs_sem_op_trace(SEM, OP) \
+do { \
+	cifs_dbg(VFS, "file=%s line=%d pid=%d cifs_sem_op op=%s sem=%px\n", \
+		 __FILE__, __LINE__, current->pid, #OP, SEM); \
+} while(0)
+
+#define cifs_sem_op_trace_init_rwsem(X) \
+do { \
+	cifs_sem_op_trace(X, init_rwsem); \
+} while (0)
+
+#define cifs_sem_op_trace_down_read(X) \
+do { \
+	cifs_sem_op_trace(X, down_read); \
+} while (0)
+
+#define cifs_sem_op_trace_up_read(X) \
+do { \
+	cifs_sem_op_trace(X, up_read); \
+} while (0)
+
+#define cifs_sem_op_trace_down_write(X) \
+do { \
+	cifs_sem_op_trace(X, down_write); \
+} while (0)
+
+#define cifs_sem_op_trace_up_write(X) \
+do { \
+	cifs_sem_op_trace(X, up_write); \
+} while (0)
+
 extern int init_cifs_idmap(void);
 extern void exit_cifs_idmap(void);
 extern int init_cifs_spnego(void);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 5ad15de2bb4f..58e554cc397a 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -270,6 +270,7 @@ int cifs_posix_open(char *full_path, struct inode **pinode,
 	struct cifs_fid_locks *cur;
 	bool has_locks = false;
 
+	cifs_sem_op_trace_down_read(&cinode->lock_sem);
 	down_read(&cinode->lock_sem);
 	list_for_each_entry(cur, &cinode->llist, llist) {
 		if (!list_empty(&cur->locks)) {
@@ -277,6 +278,7 @@ int cifs_posix_open(char *full_path, struct inode **pinode,
 			break;
 		}
 	}
+	cifs_sem_op_trace_up_read(&cinode->lock_sem);
 	up_read(&cinode->lock_sem);
 	return has_locks;
 }
@@ -306,8 +308,10 @@ struct cifsFileInfo *
 	INIT_LIST_HEAD(&fdlocks->locks);
 	fdlocks->cfile = cfile;
 	cfile->llist = fdlocks;
+	cifs_sem_op_trace_down_write(&cinode->lock_sem);
 	down_write(&cinode->lock_sem);
 	list_add(&fdlocks->llist, &cinode->llist);
+	cifs_sem_op_trace_up_write(&cinode->lock_sem);
 	up_write(&cinode->lock_sem);
 
 	cfile->count = 1;
@@ -464,6 +468,7 @@ void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)
 	 * Delete any outstanding lock records. We'll lose them when the file
 	 * is closed anyway.
 	 */
+	cifs_sem_op_trace_down_write(&cifsi->lock_sem);
 	down_write(&cifsi->lock_sem);
 	list_for_each_entry_safe(li, tmp, &cifs_file->llist->locks, llist) {
 		list_del(&li->llist);
@@ -472,6 +477,7 @@ void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)
 	}
 	list_del(&cifs_file->llist->llist);
 	kfree(cifs_file->llist);
+	cifs_sem_op_trace_up_write(&cifsi->lock_sem);
 	up_write(&cifsi->lock_sem);
 
 	cifs_put_tlink(cifs_file->tlink);
@@ -621,9 +627,11 @@ int cifs_open(struct inode *inode, struct file *file)
 	struct cifs_tcon *tcon = tlink_tcon(cfile->tlink);
 	int rc = 0;
 
+	cifs_sem_op_trace_down_read(&cinode->lock_sem);
 	down_read_nested(&cinode->lock_sem, SINGLE_DEPTH_NESTING);
 	if (cinode->can_cache_brlcks) {
 		/* can cache locks - no need to relock */
+		cifs_sem_op_trace_up_read(&cinode->lock_sem);
 		up_read(&cinode->lock_sem);
 		return rc;
 	}
@@ -635,6 +643,7 @@ int cifs_open(struct inode *inode, struct file *file)
 	else
 		rc = tcon->ses->server->ops->push_mand_locks(cfile);
 
+	cifs_sem_op_trace_up_read(&cinode->lock_sem);
 	up_read(&cinode->lock_sem);
 	return rc;
 }
@@ -1001,6 +1010,7 @@ int cifs_closedir(struct inode *inode, struct file *file)
 	struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server;
 	bool exist;
 
+	cifs_sem_op_trace_down_read(&cinode->lock_sem);
 	down_read(&cinode->lock_sem);
 
 	exist = cifs_find_lock_conflict(cfile, offset, length, type,
@@ -1019,6 +1029,7 @@ int cifs_closedir(struct inode *inode, struct file *file)
 	else
 		flock->fl_type = F_UNLCK;
 
+	cifs_sem_op_trace_up_read(&cinode->lock_sem);
 	up_read(&cinode->lock_sem);
 	return rc;
 }
@@ -1027,8 +1038,10 @@ int cifs_closedir(struct inode *inode, struct file *file)
 cifs_lock_add(struct cifsFileInfo *cfile, struct cifsLockInfo *lock)
 {
 	struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry));
+	cifs_sem_op_trace_down_write(&cinode->lock_sem);
 	down_write(&cinode->lock_sem);
 	list_add_tail(&lock->llist, &cfile->llist->locks);
+	cifs_sem_op_trace_up_write(&cinode->lock_sem);
 	up_write(&cinode->lock_sem);
 }
 
@@ -1049,6 +1062,7 @@ int cifs_closedir(struct inode *inode, struct file *file)
 
 try_again:
 	exist = false;
+	cifs_sem_op_trace_down_write(&cinode->lock_sem);
 	down_write(&cinode->lock_sem);
 
 	exist = cifs_find_lock_conflict(cfile, lock->offset, lock->length,
@@ -1056,6 +1070,7 @@ int cifs_closedir(struct inode *inode, struct file *file)
 					CIFS_LOCK_OP);
 	if (!exist && cinode->can_cache_brlcks) {
 		list_add_tail(&lock->llist, &cfile->llist->locks);
+		cifs_sem_op_trace_up_write(&cinode->lock_sem);
 		up_write(&cinode->lock_sem);
 		return rc;
 	}
@@ -1066,16 +1081,19 @@ int cifs_closedir(struct inode *inode, struct file *file)
 		rc = -EACCES;
 	else {
 		list_add_tail(&lock->blist, &conf_lock->blist);
+		cifs_sem_op_trace_up_write(&cinode->lock_sem);
 		up_write(&cinode->lock_sem);
 		rc = wait_event_interruptible(lock->block_q,
 					(lock->blist.prev == &lock->blist) &&
 					(lock->blist.next == &lock->blist));
 		if (!rc)
 			goto try_again;
+		cifs_sem_op_trace_down_write(&cinode->lock_sem);
 		down_write(&cinode->lock_sem);
 		list_del_init(&lock->blist);
 	}
 
+	cifs_sem_op_trace_up_write(&cinode->lock_sem);
 	up_write(&cinode->lock_sem);
 	return rc;
 }
@@ -1097,6 +1115,7 @@ int cifs_closedir(struct inode *inode, struct file *file)
 	if ((flock->fl_flags & FL_POSIX) == 0)
 		return 1;
 
+	cifs_sem_op_trace_down_read(&cinode->lock_sem);
 	down_read(&cinode->lock_sem);
 	posix_test_lock(file, flock);
 
@@ -1105,6 +1124,7 @@ int cifs_closedir(struct inode *inode, struct file *file)
 		rc = 1;
 	}
 
+	cifs_sem_op_trace_up_read(&cinode->lock_sem);
 	up_read(&cinode->lock_sem);
 	return rc;
 }
@@ -1125,13 +1145,16 @@ int cifs_closedir(struct inode *inode, struct file *file)
 		return rc;
 
 try_again:
+	cifs_sem_op_trace_down_write(&cinode->lock_sem);
 	down_write(&cinode->lock_sem);
 	if (!cinode->can_cache_brlcks) {
+		cifs_sem_op_trace_up_write(&cinode->lock_sem);
 		up_write(&cinode->lock_sem);
 		return rc;
 	}
 
 	rc = posix_lock_file(file, flock, NULL);
+	cifs_sem_op_trace_up_write(&cinode->lock_sem);
 	up_write(&cinode->lock_sem);
 	if (rc == FILE_LOCK_DEFERRED) {
 		rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
@@ -1331,8 +1354,10 @@ struct lock_to_push {
 	int rc = 0;
 
 	/* we are going to update can_cache_brlcks here - need a write access */
+	cifs_sem_op_trace_down_write(&cinode->lock_sem);
 	down_write(&cinode->lock_sem);
 	if (!cinode->can_cache_brlcks) {
+		cifs_sem_op_trace_up_write(&cinode->lock_sem);
 		up_write(&cinode->lock_sem);
 		return rc;
 	}
@@ -1345,6 +1370,7 @@ struct lock_to_push {
 		rc = tcon->ses->server->ops->push_mand_locks(cfile);
 
 	cinode->can_cache_brlcks = false;
+	cifs_sem_op_trace_up_write(&cinode->lock_sem);
 	up_write(&cinode->lock_sem);
 	return rc;
 }
@@ -1522,6 +1548,7 @@ struct lock_to_push {
 	if (!buf)
 		return -ENOMEM;
 
+	cifs_sem_op_trace_down_write(&cinode->lock_sem);
 	down_write(&cinode->lock_sem);
 	for (i = 0; i < 2; i++) {
 		cur = buf;
@@ -1592,6 +1619,7 @@ struct lock_to_push {
 		}
 	}
 
+	cifs_sem_op_trace_up_write(&cinode->lock_sem);
 	up_write(&cinode->lock_sem);
 	kfree(buf);
 	return rc;
@@ -3148,6 +3176,7 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
 	 * We need to hold the sem to be sure nobody modifies lock list
 	 * with a brlock that prevents writing.
 	 */
+	cifs_sem_op_trace_down_read(&cinode->lock_sem);
 	down_read(&cinode->lock_sem);
 
 	rc = generic_write_checks(iocb, from);
@@ -3161,6 +3190,7 @@ ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from)
 	else
 		rc = -EACCES;
 out:
+	cifs_sem_op_trace_up_read(&cinode->lock_sem);
 	up_read(&cinode->lock_sem);
 	inode_unlock(inode);
 
@@ -3887,11 +3917,13 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
 	 * We need to hold the sem to be sure nobody modifies lock list
 	 * with a brlock that prevents reading.
 	 */
+	cifs_sem_op_trace_down_read(&cinode->lock_sem);
 	down_read(&cinode->lock_sem);
 	if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(to),
 				     tcon->ses->server->vals->shared_lock_type,
 				     0, NULL, CIFS_READ_OP))
 		rc = generic_file_read_iter(iocb, to);
+	cifs_sem_op_trace_up_read(&cinode->lock_sem);
 	up_read(&cinode->lock_sem);
 	return rc;
 }
diff --git a/fs/cifs/smb2file.c b/fs/cifs/smb2file.c
index e6a1fc72018f..9f89a765d977 100644
--- a/fs/cifs/smb2file.c
+++ b/fs/cifs/smb2file.c
@@ -145,6 +145,7 @@
 
 	cur = buf;
 
+	cifs_sem_op_trace_down_write(&cinode->lock_sem);
 	down_write(&cinode->lock_sem);
 	list_for_each_entry_safe(li, tmp, &cfile->llist->locks, llist) {
 		if (flock->fl_start > li->offset ||
@@ -206,6 +207,7 @@
 		} else
 			cifs_free_llist(&tmp_llist);
 	}
+	cifs_sem_op_trace_up_write(&cinode->lock_sem);
 	up_write(&cinode->lock_sem);
 
 	kfree(buf);
diff --git a/lib/ratelimit.c b/lib/ratelimit.c
index e01a93f46f83..bc87f790304c 100644
--- a/lib/ratelimit.c
+++ b/lib/ratelimit.c
@@ -29,8 +29,7 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func)
 	unsigned long flags;
 	int ret;
 
-	if (!rs->interval)
-		return 1;
+	return 1;
 
 	/*
 	 * If we contend on this state's lock then almost
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disruption or smb restart
  2019-10-21 17:22 Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disruption or smb restart David Wysochanski
@ 2019-10-21 22:19 ` Pavel Shilovsky
  2019-10-22  3:04   ` David Wysochanski
  0 siblings, 1 reply; 6+ messages in thread
From: Pavel Shilovsky @ 2019-10-21 22:19 UTC (permalink / raw)
  To: David Wysochanski, Steve French, Ronnie Sahlberg; +Cc: linux-cifs

On Mon, 21 Oct 2019 at 10:23, David Wysochanski <dwysocha@redhat.com> wrote:
>
> HI all,
>
> We recently found another problem when running a simple test case
> where multiple processes open/read/close a single file, and a network
> disruption or smb restart occurs.
>
> After just a few minutes of restarts, a few of the processes error
> out, but a few remain.  If you stop the disruption, the share is
> accessible again, but the original processes are blocked indefinitely
> waiting on lock_sem
>
> Once this occurs, you'll see two threads deadlocked like the below.  Basically:
> thread1: down_read(lock_sem) inside cifs_strict_readv(); does not
> block, but blocks on network temporarily
> thread2: down_write(lock_sem) inside _cifsFileInfo_put; blocks due to thread 1
> thread1: down_read(lock_sem) inside cifs_reopen_file (after
> reconnect); unblocks to reconnect, but then blocks due to thread 2
>
> At this point there is a deadlock, both threads are waiting for each
> other to release lock_sem
>
> [...]

So, even though we have already grabbed the semaphore for reading, we
are not allowed to acquire it one more time because of the conflicting
down_write waiting in the queue? If yes, then we can create another
work queue that would re-lock files that have been re-opened. In that
case cifs_reopen_file would simply submit a "re-lock" request to the
queue without needing to grab the semaphore one more time.

Thoughts?

--
Best regards,
Pavel Shilovsky

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disruption or smb restart
  2019-10-21 22:19 ` Pavel Shilovsky
@ 2019-10-22  3:04   ` David Wysochanski
  2019-10-22 16:09     ` Pavel Shilovsky
  0 siblings, 1 reply; 6+ messages in thread
From: David Wysochanski @ 2019-10-22  3:04 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: Steve French, Ronnie Sahlberg, linux-cifs

On Mon, Oct 21, 2019 at 6:19 PM Pavel Shilovsky <piastryyy@gmail.com> wrote:
>
> пн, 21 окт. 2019 г. в 10:23, David Wysochanski <dwysocha@redhat.com>:
> >
> > Hi all,
> >
> > We recently found another problem when running a simple test case
> > where multiple processes open/read/close a single file, and a network
> > disruption or smb restart occurs.
> >
> > After just a few minutes of restarts, a few of the processes error
> > out, but a few remain.  If you stop the disruption, the share is
> > accessible again, but the original processes are blocked indefinitely
> > waiting on lock_sem
> >
> > Once this occurs, you'll see two threads deadlocked like the below.  Basically:
> > thread1: down_read(lock_sem) inside cifs_strict_readv(); does not
> > block, but blocks on network temporarily
> > thread2: down_write(lock_sem) inside _cifsFileInfo_put; blocks due to thread 1
> > thread1: down_read(lock_sem) inside cifs_reopen_file (after
> > reconnect); unblocks to reconnect, but then blocks due to thread 2
> >
> > At this point there is a deadlock, both threads are waiting for each
> > other to release lock_sem
> >
> > The root of the problem seems to be having to hold the read sem across
> > a network disruption, with no check when the connection comes back
> > before taking down_read a second time.
> > Inside cifs_strict_readv there's a comment that explains why we need
> > to hold the sem over the read: so that no one modifies the lock list
> > with a brlock that prevents reading.
> > However, in this case at least, we're only opening files for reading,
> > and inside _cifsFileInfo_put it needs to down_write to delete the
> > outstanding lock records.
> > So we're not really hitting that case, but are modifying the list.
> >
> > One approach to avoid the deadlock is for the reading process to drop
> > the lock_sem before re-acquiring it inside cifs_reopen_file.
> > We would drop the lock just before the call, and then inside we would
> > need to check for the condition of the conflicting brlock, and then
> > possibly error out somehow.
> > But that is probably not preferred.
> >
> > Instead, can the reader down_read only one time?  cifs_relock_file is
> > called from only one place (cifs_reopen_file), and only when we're
> > reconnecting, but cifs_reopen_file is called from many places.  I'm
> > not sure whether there is something existing we can use that says we
> > don't need to re-acquire it, or whether we need to add a flag or
> > something.
> >
> >
> >
> >
> >
> > Here is the live crash session
> >
> > crash> ps -m | grep UN
> > [0 01:54:10.166] [UN]  PID: 13271  TASK: ffff93b8a7349e40  CPU: 2
> > COMMAND: "python3-openloo"
> > [0 01:54:10.167] [UN]  PID: 13273  TASK: ffff93b8b3310000  CPU: 2
> > COMMAND: "python3-openloo"
> > [0 01:54:10.169] [UN]  PID: 13269  TASK: ffff93b8a734dac0  CPU: 0
> > COMMAND: "python3-openloo"
> > [0 01:54:10.208] [UN]  PID: 13272  TASK: ffff93b8a7348000  CPU: 7
> > COMMAND: "python3-openloo"
> > [0 01:54:10.216] [UN]  PID: 13265  TASK: ffff93b8a7358000  CPU: 6
> > COMMAND: "python3-openloo"
> > [0 01:54:10.172] [UN]  PID: 13263  TASK: ffff93b8b3313c80  CPU: 2
> > COMMAND: "python3-openloo"
> > [0 01:54:10.229] [UN]  PID: 13267  TASK: ffff93b8a7359e40  CPU: 5
> > COMMAND: "python3-openloo"
> > [0 01:54:10.234] [UN]  PID: 13277  TASK: ffff93b8a7dddac0  CPU: 3
> > COMMAND: "python3-openloo"
> > [0 01:54:10.235] [UN]  PID: 13270  TASK: ffff93b8a734bc80  CPU: 1
> > COMMAND: "python3-openloo"
> > [0 01:54:10.244] [UN]  PID: 13276  TASK: ffff93b8a7dd8000  CPU: 6
> > COMMAND: "python3-openloo"
> > [0 01:54:10.260] [UN]  PID: 13264  TASK: ffff93b8b3315ac0  CPU: 6
> > COMMAND: "python3-openloo"
> > [0 01:54:10.257] [UN]  PID: 13274  TASK: ffff93b8a7ddbc80  CPU: 1
> > COMMAND: "python3-openloo"
> > [0 01:54:31.262] [UN]  PID: 13279  TASK: ffff93b8a7d90000  CPU: 1
> > COMMAND: "python3-openloo" <--- block same instant
> > [0 01:54:31.265] [UN]  PID: 13275  TASK: ffff93b8a7dd9e40  CPU: 4
> > COMMAND: "python3-openloo" <--- block same instant
> > crash> bt 13279
> > PID: 13279  TASK: ffff93b8a7d90000  CPU: 1   COMMAND: "python3-openloo"
> >  #0 [ffffaae1c0a77c20] __schedule at ffffffffa19d4239
> >  #1 [ffffaae1c0a77cb0] schedule at ffffffffa19d46b9
> >  #2 [ffffaae1c0a77cc8] rwsem_down_write_slowpath at ffffffffa1131ada
> > <------------- T+1: pid 13279 down_write(... lock_sem); blocks due to
> > pid 13275
> >  #3 [ffffaae1c0a77d98] _cifsFileInfo_put at ffffffffc0759818 [cifs]
> >  #4 [ffffaae1c0a77e78] cifs_close at ffffffffc0759b4f [cifs]
> >  #5 [ffffaae1c0a77e88] __fput at ffffffffa12f6e51
> >  #6 [ffffaae1c0a77ed0] task_work_run at ffffffffa10fc827
> >  #7 [ffffaae1c0a77f08] exit_to_usermode_loop at ffffffffa1003f83
> >  #8 [ffffaae1c0a77f38] do_syscall_64 at ffffffffa10043c2
> >  #9 [ffffaae1c0a77f50] entry_SYSCALL_64_after_hwframe at ffffffffa1a0008c
> >     RIP: 00007fc6caca1778  RSP: 00007ffed0c43de8  RFLAGS: 00000246
> >     RAX: 0000000000000000  RBX: 00007fc6bd5a5f30  RCX: 00007fc6caca1778
> >     RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000003
> >     RBP: 0000558ebb9ca0d0   R8: 00007fc6cab941a8   R9: 0000000000000000
> >     R10: 0000558ebb9ca0d0  R11: 0000000000000246  R12: 0000000000000000
> >     R13: 0000000000000000  R14: 00007fc6cab4df0c  R15: 00007ffed0c43f10
> >     ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b
> > crash> bt 13275
> > PID: 13275  TASK: ffff93b8a7dd9e40  CPU: 4   COMMAND: "python3-openloo"
> >  #0 [ffffaae1c0a57a00] __schedule at ffffffffa19d4239
> >  #1 [ffffaae1c0a57a90] schedule at ffffffffa19d46b9
> >  #2 [ffffaae1c0a57aa8] rwsem_down_read_slowpath at ffffffffa19d72b4
> >  #3 [ffffaae1c0a57b60] cifs_reopen_file at ffffffffc0755352 [cifs]
> > <------------- T+2: pid 13275 down_read(... lock_sem); blocks due to
> > pid 13279
> >  #4 [ffffaae1c0a57c08] cifs_readpage_worker at ffffffffc07575f0 [cifs]
> >  #5 [ffffaae1c0a57cd0] cifs_readpage at ffffffffc0758096 [cifs]
> >  #6 [ffffaae1c0a57d08] generic_file_read_iter at ffffffffa123acd2
> >  #7 [ffffaae1c0a57e10] cifs_strict_readv at ffffffffc07606a5 [cifs]
> > <------------- T+0: pid 13275 down_read(... lock_sem); does not block
> >  #8 [ffffaae1c0a57e40] new_sync_read at ffffffffa12f336a
> >  #9 [ffffaae1c0a57ec8] vfs_read at ffffffffa12f5fad
> > #10 [ffffaae1c0a57f00] ksys_read at ffffffffa12f62df
> > #11 [ffffaae1c0a57f38] do_syscall_64 at ffffffffa10042bb
> > #12 [ffffaae1c0a57f50] entry_SYSCALL_64_after_hwframe at ffffffffa1a0008c
> >     RIP: 00007f6ed0b7c055  RSP: 00007ffeea637e68  RFLAGS: 00000246
> >     RAX: ffffffffffffffda  RBX: 00007f6ed05cd600  RCX: 00007f6ed0b7c055
> >     RDX: 000000000000001e  RSI: 00007f6ec3404c10  RDI: 0000000000000003
> >     RBP: 000000000000001e   R8: 00007f6ed0a6f1a8   R9: 0000000000000000
> >     R10: 00007f6ed090faf0  R11: 0000000000000246  R12: 00007f6ec3480f40
> >     R13: 0000000000000003  R14: 00007f6ec3404c10  R15: 000055c449ea60d0
> >     ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b
> >
> >
> > Here is my tracing from the attached debug patch which shows the
> > conditions, and points at the bugzilla we have open:
> > [  233.588248] CIFS VFS: file=fs/cifs/file.c line=480 pid=13279
> > cifs_sem_op op=up_write sem=ffff93b8a7424bd8
> > [  233.589462] CIFS VFS: file=fs/cifs/file.c line=311 pid=13279
> > cifs_sem_op op=down_write sem=ffff93b8a7424bd8
> > [  233.592333] CIFS VFS: file=fs/cifs/file.c line=314 pid=13275
> > cifs_sem_op op=up_write sem=ffff93b8a7424bd8
> > [  233.612324] CIFS VFS: file=fs/cifs/file.c line=314 pid=13279
> > cifs_sem_op op=up_write sem=ffff93b8a7424bd8
> > [  233.612510] CIFS VFS: file=fs/cifs/file.c line=3920 pid=13275
> > cifs_sem_op op=down_read sem=ffff93b8a7424bd8
> > [  233.616487] CIFS VFS: file=fs/cifs/file.c line=3920 pid=13279
> > cifs_sem_op op=down_read sem=ffff93b8a7424bd8  T0:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
> > [  254.389458] CIFS VFS: Send error in read = -9
> > [  254.391943] CIFS VFS: file=fs/cifs/file.c line=3926 pid=13279
> > cifs_sem_op op=up_read sem=ffff93b8a7424bd8
> > [  254.397453] CIFS VFS: file=fs/cifs/file.c line=471 pid=13279
> > cifs_sem_op op=down_write sem=ffff93b8a7424bd8  T1:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
> > [  254.399798] CIFS VFS: file=fs/cifs/file.c line=630 pid=13275
> > cifs_sem_op op=down_read sem=ffff93b8a7424bd8   T2:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
> >
> >
> >
> > There was a patch that removed a lockdep warning, but it looks like
> > this removed a semi-valid warning, because clearly we can deadlock
> > here:
> >
> > commit 560d388950ceda5e7c7cdef7f3d9a8ff297bbf9d
> > Author: Rabin Vincent <rabinv@axis.com>
> > Date:   Wed May 3 17:17:21 2017 +0200
> >
> >     CIFS: silence lockdep splat in cifs_relock_file()
> >
> >     cifs_relock_file() can perform a down_write() on the inode's lock_sem even
> >     though it was already performed in cifs_strict_readv().  Lockdep complains
> >     about this.  AFAICS, there is no problem here, and lockdep just needs to be
> >     told that this nesting is OK.
> >
> >      =============================================
> >      [ INFO: possible recursive locking detected ]
> >      4.11.0+ #20 Not tainted
> >      ---------------------------------------------
> >      cat/701 is trying to acquire lock:
> >       (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00
> >
> >      but task is already holding lock:
> >       (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310
> >
> >      other info that might help us debug this:
> >       Possible unsafe locking scenario:
> >
> >             CPU0
> >             ----
> >        lock(&cifsi->lock_sem);
> >        lock(&cifsi->lock_sem);
> >
> >       *** DEADLOCK ***
> >
> >       May be due to missing lock nesting notation
> >
> >      1 lock held by cat/701:
> >       #0:  (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310
> >
> >      stack backtrace:
> >      CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20
> >      Call Trace:
> >       dump_stack+0x85/0xc2
> >       __lock_acquire+0x17dd/0x2260
> >       ? trace_hardirqs_on_thunk+0x1a/0x1c
> >       ? preempt_schedule_irq+0x6b/0x80
> >       lock_acquire+0xcc/0x260
> >       ? lock_acquire+0xcc/0x260
> >       ? cifs_reopen_file+0x7a7/0xc00
> >       down_read+0x2d/0x70
> >       ? cifs_reopen_file+0x7a7/0xc00
> >       cifs_reopen_file+0x7a7/0xc00
> >       ? printk+0x43/0x4b
> >       cifs_readpage_worker+0x327/0x8a0
> >       cifs_readpage+0x8c/0x2a0
> >       generic_file_read_iter+0x692/0xd00
> >       cifs_strict_readv+0x29f/0x310
> >       generic_file_splice_read+0x11c/0x1c0
> >       do_splice_to+0xa5/0xc0
> >       splice_direct_to_actor+0xfa/0x350
> >       ? generic_pipe_buf_nosteal+0x10/0x10
> >       do_splice_direct+0xb5/0xe0
> >       do_sendfile+0x278/0x3a0
> >       SyS_sendfile64+0xc4/0xe0
> >       entry_SYSCALL_64_fastpath+0x1f/0xbe
> >
> >     Signed-off-by: Rabin Vincent <rabinv@axis.com>
> >     Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
> >     Signed-off-by: Steve French <smfrench@gmail.com>
>
> So, even after we already grabbed the semaphore it doesn't allow us to
> acquire it one more time because of the conflicting waiting down_write

Yes, it is my understanding that this is the case.  Otherwise, readers
could starve writers.

> ? If yes, then we can create another work queue that will be
> re-locking files that have been re-opened. In this case
> cifs_reopen_file will simply submit the "re-lock" request to the queue
> without the need to grab the lock one more time.
>
I think your approach of a work queue may work, but it would have to
be on a per-pid basis, as would any solution.

The root cause seems to be that we go to sleep while holding the
lock_sem.  Is there any way to avoid that?

FWIW, I don't think the flag approach I mentioned (i.e. adding another
flag in cifsInodeInfo.flags and checking it) would work, because some
other process could clear it and you'd end up in the same situation.
I tried it and saw the same problem happen again.

> Thoughts?
>
> --
> Best regards,
> Pavel Shilovsky


* Re: Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disruption or smb restart
  2019-10-22  3:04   ` David Wysochanski
@ 2019-10-22 16:09     ` Pavel Shilovsky
  2019-10-22 16:43       ` David Wysochanski
  0 siblings, 1 reply; 6+ messages in thread
From: Pavel Shilovsky @ 2019-10-22 16:09 UTC (permalink / raw)
  To: David Wysochanski; +Cc: Steve French, Ronnie Sahlberg, linux-cifs

пн, 21 окт. 2019 г. в 20:05, David Wysochanski <dwysocha@redhat.com>:
> > ? If yes, then we can create another work queue that will be
> > re-locking files that have been re-opened. In this case
> > cifs_reopen_file will simply submit the "re-lock" request to the queue
> > without the need to grab the lock one more time.
> >
> > I think your approach of a work queue may work but it would have to
> be on a per-pid basis, as would any solution.

Why do you think we should have a queue per pid? What are possible
problems to have a single queue per SMB connection or Share (Tree)
connection?

--
Best regards,
Pavel Shilovsky


* Re: Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disruption or smb restart
  2019-10-22 16:09     ` Pavel Shilovsky
@ 2019-10-22 16:43       ` David Wysochanski
  2019-10-22 19:00         ` David Wysochanski
  0 siblings, 1 reply; 6+ messages in thread
From: David Wysochanski @ 2019-10-22 16:43 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: Steve French, Ronnie Sahlberg, linux-cifs

On Tue, Oct 22, 2019 at 12:09 PM Pavel Shilovsky <piastryyy@gmail.com> wrote:
>
> пн, 21 окт. 2019 г. в 20:05, David Wysochanski <dwysocha@redhat.com>:
> > > ? If yes, then we can create another work queue that will be
> > > re-locking files that have been re-opened. In this case
> > > cifs_reopen_file will simply submit the "re-lock" request to the queue
> > > without the need to grab the lock one more time.
> > >
> > I think your approach of a work queue may work but it would have to
> > be on a per-pid basis, as would any solution.
>
> Why do you think we should have a queue per pid? What are possible
> problems to have a single queue per SMB connection or Share (Tree)
> connection?
>

What you're proposing may work, but I would have to see the code and
how lock_sem would be used.  If you serialize the relocking to a
single pid, it should be fine.

I can describe what I tried (attached patch) and why it did not work.
Here's the problem code path.  Note that down_read(lock_sem) #2 is
only a problem if down_write(lock_sem) happens between #1 and #2, but
we're trying to avoid #2 to avoid the deadlock.

PID: 5464   TASK: ffff9862640b2f80  CPU: 1   COMMAND: "python3-openloo"
 #0 [ffffbd2d81203a20] __schedule at ffffffff9aa629e3
 #1 [ffffbd2d81203ac0] schedule at ffffffff9aa62fe8
 #2 [ffffbd2d81203ac8] rwsem_down_read_failed at ffffffff9aa660cd
 #3 [ffffbd2d81203b58] cifs_reopen_file at ffffffffc0529dce [cifs]
<--- down_read (lock_sem) #2
 #4 [ffffbd2d81203c00] cifs_read.constprop.46 at ffffffffc052bbe0 [cifs]
 #5 [ffffbd2d81203ca8] cifs_readpage_worker at ffffffffc052c0a7 [cifs]
 #6 [ffffbd2d81203ce0] cifs_readpage at ffffffffc052c427 [cifs]
 #7 [ffffbd2d81203d20] generic_file_buffered_read at ffffffff9a40f2b1
 #8 [ffffbd2d81203e10] cifs_strict_readv at ffffffffc0533a7b [cifs]
<---- down_read (lock_sem) #1
 #9 [ffffbd2d81203e40] new_sync_read at ffffffff9a4b6e21
#10 [ffffbd2d81203ec8] vfs_read at ffffffff9a4b9671
#11 [ffffbd2d81203f00] ksys_read at ffffffff9a4b9aaf
#12 [ffffbd2d81203f38] do_syscall_64 at ffffffff9a2041cb

Note that each of these down_read() calls is per-pid (each pid has to
get the sem).  I tried this patch:
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1404,6 +1404,7 @@ struct cifsInodeInfo {
 #define CIFS_INO_DELETE_PENDING                  (3) /* delete
pending on server */
 #define CIFS_INO_INVALID_MAPPING         (4) /* pagecache is invalid */
 #define CIFS_INO_LOCK                    (5) /* lock bit for synchronization */
+#define CIFS_LOCK_SEM_HELD               (6) /* the lock_sem is held */
        unsigned long flags;
        spinlock_t writers_lock;
        unsigned int writers;           /* Number of writers on this inode */
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 5ad15de2bb4f..153adcd3ad16 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -620,11 +620,15 @@ cifs_relock_file(struct cifsFileInfo *cfile)
        struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry));
        struct cifs_tcon *tcon = tlink_tcon(cfile->tlink);
        int rc = 0;
+       int lock_sem_held = 0;

-       down_read_nested(&cinode->lock_sem, SINGLE_DEPTH_NESTING);
+       lock_sem_held = test_bit(CIFS_LOCK_SEM_HELD, &cinode->flags);
+       if (!lock_sem_held)
+               down_read_nested(&cinode->lock_sem, SINGLE_DEPTH_NESTING);
        if (cinode->can_cache_brlcks) {
                /* can cache locks - no need to relock */
-               up_read(&cinode->lock_sem);
+               if (!lock_sem_held)
+                       up_read(&cinode->lock_sem);
                return rc;
        }

@@ -635,7 +639,8 @@ cifs_relock_file(struct cifsFileInfo *cfile)
        else
                rc = tcon->ses->server->ops->push_mand_locks(cfile);

-       up_read(&cinode->lock_sem);
+       if (!lock_sem_held)
+               up_read(&cinode->lock_sem);
        return rc;
 }

@@ -3888,11 +3893,13 @@ cifs_strict_readv(struct kiocb *iocb, struct
iov_iter *to)
         * with a brlock that prevents reading.
         */
        down_read(&cinode->lock_sem);
+       set_bit(CIFS_LOCK_SEM_HELD, &cinode->flags);
        if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(to),
                                     tcon->ses->server->vals->shared_lock_type,
                                     0, NULL, CIFS_READ_OP))
                rc = generic_file_read_iter(iocb, to);
        up_read(&cinode->lock_sem);
+       clear_bit(CIFS_LOCK_SEM_HELD, &cinode->flags);
        return rc;
 }


Note that the bit is associated with the cifsInodeInfo, and we only
have one file but multiple pids, so it's not an effective solution.
To elaborate, the patch doesn't work because when the server goes
down, you have many processes that have all passed down_read() #1 and
are blocked waiting for a response.  When the server comes back, at
least one will see the bit set and skip down_read() #2, but it will
then finish cifs_strict_readv() and clear the bit, so other pids may
no longer see the bit set and will call down_read() #2, leading to the
same deadlock.
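To make that failure concrete, here is a minimal userspace sketch
(hypothetical helper names, not the kernel's set_bit/test_bit) showing
why one shared bit in cifsInodeInfo.flags cannot track per-task state:
the first task to finish clears it for everyone.

```c
#define CIFS_LOCK_SEM_HELD 6
static unsigned long inode_flags;    /* one word shared by all pids */

static void set_bit_(int b)   { inode_flags |=  (1UL << b); }
static void clear_bit_(int b) { inode_flags &= ~(1UL << b); }
static int  test_bit_(int b)  { return !!(inode_flags & (1UL << b)); }

/* returns what a second task in cifs_relock_file() would observe */
int observed_after_other_task_finished(void)
{
    set_bit_(CIFS_LOCK_SEM_HELD);    /* pid A enters cifs_strict_readv */
    set_bit_(CIFS_LOCK_SEM_HELD);    /* pid B enters too: same bit */
    clear_bit_(CIFS_LOCK_SEM_HELD);  /* pid A finishes and clears it */
    /* pid B still holds lock_sem for reading, but the bit is gone,
     * so it would take down_read() #2 and deadlock as before */
    return test_bit_(CIFS_LOCK_SEM_HELD);
}
```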


* Re: Easily reproducible deadlock on 5.4-rc4 with cifsInodeInfo.lock_sem semaphore after network disruption or smb restart
  2019-10-22 16:43       ` David Wysochanski
@ 2019-10-22 19:00         ` David Wysochanski
  0 siblings, 0 replies; 6+ messages in thread
From: David Wysochanski @ 2019-10-22 19:00 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: Steve French, Ronnie Sahlberg, linux-cifs

On Tue, Oct 22, 2019 at 12:43 PM David Wysochanski <dwysocha@redhat.com> wrote:
>
> On Tue, Oct 22, 2019 at 12:09 PM Pavel Shilovsky <piastryyy@gmail.com> wrote:
> >
> > пн, 21 окт. 2019 г. в 20:05, David Wysochanski <dwysocha@redhat.com>:
> > > > ? If yes, then we can create another work queue that will be
> > > > re-locking files that have been re-opened. In this case
> > > > cifs_reopen_file will simply submit the "re-lock" request to the queue
> > > > without the need to grab the lock one more time.
> > > >
> > > I think your approach of a work queue may work but it would have to
> > > be on a per-pid basis, as would any solution.
> >
> > Why do you think we should have a queue per pid? What are possible
> > problems to have a single queue per SMB connection or Share (Tree)
> > connection?
> >
>
> What you're proposing may work but would have to see the code and how
> lock_sem would be used.  If you're serializing the relocking to only
> one pid it should be fine.
>
> I can describe what I tried (attached patch) and why it did not work.
> Here's the problem code path.  Note that down_read(lock_sem) #2 is
> only a problem if down_write(lock_sem) happens between #1 and #2, but
> we're trying to avoid #2 to avoid the deadlock.
>
> PID: 5464   TASK: ffff9862640b2f80  CPU: 1   COMMAND: "python3-openloo"
>  #0 [ffffbd2d81203a20] __schedule at ffffffff9aa629e3
>  #1 [ffffbd2d81203ac0] schedule at ffffffff9aa62fe8
>  #2 [ffffbd2d81203ac8] rwsem_down_read_failed at ffffffff9aa660cd
>  #3 [ffffbd2d81203b58] cifs_reopen_file at ffffffffc0529dce [cifs]
> <--- down_read (lock_sem) #2
>  #4 [ffffbd2d81203c00] cifs_read.constprop.46 at ffffffffc052bbe0 [cifs]
>  #5 [ffffbd2d81203ca8] cifs_readpage_worker at ffffffffc052c0a7 [cifs]
>  #6 [ffffbd2d81203ce0] cifs_readpage at ffffffffc052c427 [cifs]
>  #7 [ffffbd2d81203d20] generic_file_buffered_read at ffffffff9a40f2b1
>  #8 [ffffbd2d81203e10] cifs_strict_readv at ffffffffc0533a7b [cifs]
> <---- down_read (lock_sem) #1
>  #9 [ffffbd2d81203e40] new_sync_read at ffffffff9a4b6e21
> #10 [ffffbd2d81203ec8] vfs_read at ffffffff9a4b9671
> #11 [ffffbd2d81203f00] ksys_read at ffffffff9a4b9aaf
> #12 [ffffbd2d81203f38] do_syscall_64 at ffffffff9a2041cb
>
> Note that each of these down_read() calls are per-pid (each pid has to
> get the sem).   I tried this patch:
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1404,6 +1404,7 @@ struct cifsInodeInfo {
>  #define CIFS_INO_DELETE_PENDING                  (3) /* delete
> pending on server */
>  #define CIFS_INO_INVALID_MAPPING         (4) /* pagecache is invalid */
>  #define CIFS_INO_LOCK                    (5) /* lock bit for synchronization */
> +#define CIFS_LOCK_SEM_HELD               (6) /* the lock_sem is held */
>         unsigned long flags;
>         spinlock_t writers_lock;
>         unsigned int writers;           /* Number of writers on this inode */
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 5ad15de2bb4f..153adcd3ad16 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -620,11 +620,15 @@ cifs_relock_file(struct cifsFileInfo *cfile)
>         struct cifsInodeInfo *cinode = CIFS_I(d_inode(cfile->dentry));
>         struct cifs_tcon *tcon = tlink_tcon(cfile->tlink);
>         int rc = 0;
> +       int lock_sem_held = 0;
>
> -       down_read_nested(&cinode->lock_sem, SINGLE_DEPTH_NESTING);
> +       lock_sem_held = test_bit(CIFS_LOCK_SEM_HELD, &cinode->flags);
> +       if (!lock_sem_held)
> +               down_read_nested(&cinode->lock_sem, SINGLE_DEPTH_NESTING);
>         if (cinode->can_cache_brlcks) {
>                 /* can cache locks - no need to relock */
> -               up_read(&cinode->lock_sem);
> +               if (!lock_sem_held)
> +                       up_read(&cinode->lock_sem);
>                 return rc;
>         }
>
> @@ -635,7 +639,8 @@ cifs_relock_file(struct cifsFileInfo *cfile)
>         else
>                 rc = tcon->ses->server->ops->push_mand_locks(cfile);
>
> -       up_read(&cinode->lock_sem);
> +       if (!lock_sem_held)
> +               up_read(&cinode->lock_sem);
>         return rc;
>  }
>
> @@ -3888,11 +3893,13 @@ cifs_strict_readv(struct kiocb *iocb, struct
> iov_iter *to)
>          * with a brlock that prevents reading.
>          */
>         down_read(&cinode->lock_sem);
> +       set_bit(CIFS_LOCK_SEM_HELD, &cinode->flags);
>         if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(to),
>                                      tcon->ses->server->vals->shared_lock_type,
>                                      0, NULL, CIFS_READ_OP))
>                 rc = generic_file_read_iter(iocb, to);
>         up_read(&cinode->lock_sem);
> +       clear_bit(CIFS_LOCK_SEM_HELD, &cinode->flags);
>         return rc;
>  }
>
>
> Note that the bit is associated with the cifsInodeInfo file, and we
> only have one file but multiple pids so it's not an effective
> solution.  To describe, the patch doesn't work because when the server
> goes down, you have many processes all having passed down_read() #1,
> and they are blocked waiting for a response.  When the server comes
> back at least one will see the bit is set and will skip down_read()
> #2, but then it will proceed to finish the cifs_strict_readv(), and so
> clear the bit, and other pids may not see the bit set anymore, and
> will call down_read() #2, leading to the same deadlock.


One other thing to think about here is this.  The other part of the
deadlock, as seen in the reproducer, is due to this process:

PID: 5411   TASK: ffff9862788fc740  CPU: 3   COMMAND: "python3-openloo"
 #0 [ffffbd2d80fbfc60] __schedule at ffffffff9aa629e3
 #1 [ffffbd2d80fbfd00] schedule at ffffffff9aa62fe8
 #2 [ffffbd2d80fbfd08] rwsem_down_write_failed at ffffffff9aa65dbc
 #3 [ffffbd2d80fbfda8] down_write at ffffffff9aa65be9
 #4 [ffffbd2d80fbfdb8] _cifsFileInfo_put at ffffffffc052da65 [cifs]
 #5 [ffffbd2d80fbfe98] cifs_close at ffffffffc052dd3f [cifs]
 #6 [ffffbd2d80fbfea8] __fput at ffffffff9a4ba5c7
 #7 [ffffbd2d80fbfee8] task_work_run at ffffffff9a2d1c5a
 #8 [ffffbd2d80fbff20] exit_to_usermode_loop at ffffffff9a203b4b
 #9 [ffffbd2d80fbff38] do_syscall_64 at ffffffff9a2042f2
#10 [ffffbd2d80fbff50] entry_SYSCALL_64_after_hwframe at ffffffff9ac000ad


What is that process doing?  It is blocked in down_write(), about to
delete and free all entries on cifs_file->llist->locks:
void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)

    /*
     * Delete any outstanding lock records. We'll lose them when the file
     * is closed anyway.
     */
    down_write(&cifsi->lock_sem);
    list_for_each_entry_safe(li, tmp, &cifs_file->llist->locks, llist) {
        list_del(&li->llist);
        cifs_del_lock_waiters(li);
        kfree(li);
    }


The other trace we see where someone is trying to down_write is here:

PID: 14207  TASK: ffff8ac1f3369e40  CPU: 6   COMMAND: "python3-openloo"
 #0 [ffff9cfc41ccb990] __schedule at ffffffff9f9d2d29
 #1 [ffff9cfc41ccba20] schedule at ffffffff9f9d31a9
 #2 [ffff9cfc41ccba38] rwsem_down_write_slowpath at ffffffff9f131b0a
 #3 [ffff9cfc41ccbb08] cifs_new_fileinfo at ffffffffc065833a [cifs]
 #4 [ffff9cfc41ccbb78] cifs_open at ffffffffc0658a8c [cifs]
 #5 [ffff9cfc41ccbcd0] do_dentry_open at ffffffff9f2f09ca
 #6 [ffff9cfc41ccbd00] path_openat at ffffffff9f3050ad
 #7 [ffff9cfc41ccbdd8] do_filp_open at ffffffff9f307431
 #8 [ffff9cfc41ccbee0] do_sys_open at ffffffff9f2f2154
 #9 [ffff9cfc41ccbf38] do_syscall_64 at ffffffff9f0042bb
#10 [ffff9cfc41ccbf50] entry_SYSCALL_64_after_hwframe at ffffffff9fa0008c

What is that process doing?  It is creating a new empty list of
cifs_fid_locks and adding that onto cinode->llist:
cifs_new_fileinfo(struct cifs_fid *fid, struct file *file,
          struct tcon_link *tlink, __u32 oplock)
...
    cfile = kzalloc(sizeof(struct cifsFileInfo), GFP_KERNEL);
    if (cfile == NULL)
        return cfile;

    fdlocks = kzalloc(sizeof(struct cifs_fid_locks), GFP_KERNEL);
    if (!fdlocks) {
        kfree(cfile);
        return NULL;
    }

    INIT_LIST_HEAD(&fdlocks->locks);
    fdlocks->cfile = cfile;
    cfile->llist = fdlocks;
    down_write(&cinode->lock_sem);
    list_add(&fdlocks->llist, &cinode->llist);
    up_write(&cinode->lock_sem);



So back to the reader: why did it take down_read() and hold the
semaphore while sleeping, waiting for the read response (or a
reconnect)?
cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to)
...
    /*
     * We need to hold the sem to be sure nobody modifies lock list
     * with a brlock that prevents reading.
     */
    down_read(&cinode->lock_sem);
    if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(to),
                     tcon->ses->server->vals->shared_lock_type,
                     0, NULL, CIFS_READ_OP))
        rc = generic_file_read_iter(iocb, to);
    up_read(&cinode->lock_sem);


Neither of the above two callers of down_write() is doing the
operation that down_read() #1 is meant to guard against - "modifying
the lock list with a brlock that prevents reading".
So if there is some way to pull that piece out, the deadlock can be
solved.  The existing use of lock_sem both to a) protect reads against
a byte-range lock that prevents reading, and b) protect modifications
to cinode->llist, causes unnecessary contention on the semaphore and
this deadlock.

