Linux-CIFS Archive on lore.kernel.org
* [PATCH 0/1] cifs: move cifsFileInfo_put logic into a work-queue
@ 2019-10-26 21:04 Ronnie Sahlberg
  2019-10-26 21:04 ` [PATCH] " Ronnie Sahlberg
  2019-10-27 23:52 ` [PATCH 0/1] " Frank Sorenson
  0 siblings, 2 replies; 7+ messages in thread
From: Ronnie Sahlberg @ 2019-10-26 21:04 UTC (permalink / raw)
  To: linux-cifs

Steve, Pavel, Frank

This patch moves the logic in cifsFileInfo_put() into a work-queue so that
it will be processed in a different thread which, importantly, does not hold
any other locks.

This should address the deadlock that Frank reported in the thread:
Yet another hang after network disruption or smb restart -- multiple file writers
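The idea behind the patch can be sketched in plain userspace C, with a pthread standing in for the kernel workqueue (all names here are illustrative, not the kernel API): the final, potentially blocking release is handed to a thread that holds none of the caller's locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Illustrative stand-in for cifsFileInfo: the real teardown can block
 * (server round-trips, msleep retry loops), so it must not run while
 * the caller holds page locks or i_rwsem. */
struct file_ref { int refcount; };

static int released;            /* observed after joining the worker */

static void *put_worker(void *arg)
{
    struct file_ref *f = arg;
    /* heavy teardown runs here, with no caller-held locks */
    free(f);
    released++;
    return NULL;
}

/* Hand the final put to a separate thread instead of doing it inline. */
static pthread_t file_put_deferred(struct file_ref *f)
{
    pthread_t t;
    pthread_create(&t, NULL, put_worker, f);
    return t;
}
```

The kernel patch fires and forgets via queue_work(); this sketch returns the thread so the caller can join it and observe that the release finished.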



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] cifs: move cifsFileInfo_put logic into a work-queue
  2019-10-26 21:04 [PATCH 0/1] cifs: move cifsFileInfo_put logic into a work-queue Ronnie Sahlberg
@ 2019-10-26 21:04 ` " Ronnie Sahlberg
  2019-10-28 23:18   ` Pavel Shilovsky
  2019-10-27 23:52 ` [PATCH 0/1] " Frank Sorenson
  1 sibling, 1 reply; 7+ messages in thread
From: Ronnie Sahlberg @ 2019-10-26 21:04 UTC (permalink / raw)
  To: linux-cifs; +Cc: Ronnie Sahlberg

This patch moves the _put logic so that it is processed in a separate thread
that holds no other locks, to prevent deadlocks like the one below.

> there are 6 processes looping while trying to down_write
> cinode->lock_sem, 5 of them from _cifsFileInfo_put, and one from
> cifs_new_fileinfo
>
> and there are 5 other processes which are blocked, several of them
> waiting on either PG_writeback or PG_locked (which are both set), all
> for the same page of the file
>
> 2 inode_lock() (inode->i_rwsem) for the file
> 1 wait_on_page_writeback() for the page
> 1 down_read(inode->i_rwsem) for the inode of the directory
> 1 inode_lock()(inode->i_rwsem) for the inode of the directory
> 1 __lock_page
>
>
> so processes are blocked waiting on:
>   page flags PG_locked and PG_writeback for one specific page
>   inode->i_rwsem for the directory
>   inode->i_rwsem for the file
>   cifsInodeInfo lock_sem
>
>
>
> here are the more gory details (let me know if I need to provide
> anything more/better):
>
> [0 00:48:22.765] [UN]  PID: 8863   TASK: ffff8c691547c5c0  CPU: 3
> COMMAND: "reopen_file"
>  #0 [ffff9965007e3ba8] __schedule at ffffffff9b6e6095
>  #1 [ffff9965007e3c38] schedule at ffffffff9b6e64df
>  #2 [ffff9965007e3c48] rwsem_down_write_slowpath at ffffffff9af283d7
>  #3 [ffff9965007e3cb8] legitimize_path at ffffffff9b0f975d
>  #4 [ffff9965007e3d08] path_openat at ffffffff9b0fe55d
>  #5 [ffff9965007e3dd8] do_filp_open at ffffffff9b100a33
>  #6 [ffff9965007e3ee0] do_sys_open at ffffffff9b0eb2d6
>  #7 [ffff9965007e3f38] do_syscall_64 at ffffffff9ae04315
> * (I think legitimize_path is bogus)
>
> in path_openat
>         } else {
>                 const char *s = path_init(nd, flags);
>                 while (!(error = link_path_walk(s, nd)) &&
>                         (error = do_last(nd, file, op)) > 0) {  <<<<
>
> do_last:
>         if (open_flag & O_CREAT)
>                 inode_lock(dir->d_inode);  <<<<
>         else
> so it's trying to take inode->i_rwsem for the directory
>
>      DENTRY           INODE           SUPERBLK     TYPE PATH
> ffff8c68bb8e79c0 ffff8c691158ef20 ffff8c6915bf9000 DIR  /mnt/vm1_smb/
> inode.i_rwsem is ffff8c691158efc0
>
> <struct rw_semaphore 0xffff8c691158efc0>:
>         owner: <struct task_struct 0xffff8c6914275d00> (UN -   8856 -
> reopen_file), counter: 0x0000000000000003
>         waitlist: 2
>         0xffff9965007e3c90     8863   reopen_file      UN 0  1:29:22.926
>   RWSEM_WAITING_FOR_WRITE
>         0xffff996500393e00     9802   ls               UN 0  1:17:26.700
>   RWSEM_WAITING_FOR_READ
>
>
> the owner of the inode.i_rwsem of the directory is:
>
> [0 00:00:00.109] [UN]  PID: 8856   TASK: ffff8c6914275d00  CPU: 3
> COMMAND: "reopen_file"
>  #0 [ffff99650065b828] __schedule at ffffffff9b6e6095
>  #1 [ffff99650065b8b8] schedule at ffffffff9b6e64df
>  #2 [ffff99650065b8c8] schedule_timeout at ffffffff9b6e9f89
>  #3 [ffff99650065b940] msleep at ffffffff9af573a9
>  #4 [ffff99650065b948] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
>  #5 [ffff99650065ba38] cifs_writepage_locked at ffffffffc0a0b8f3 [cifs]
>  #6 [ffff99650065bab0] cifs_launder_page at ffffffffc0a0bb72 [cifs]
>  #7 [ffff99650065bb30] invalidate_inode_pages2_range at ffffffff9b04d4bd
>  #8 [ffff99650065bcb8] cifs_invalidate_mapping at ffffffffc0a11339 [cifs]
>  #9 [ffff99650065bcd0] cifs_revalidate_mapping at ffffffffc0a1139a [cifs]
> #10 [ffff99650065bcf0] cifs_d_revalidate at ffffffffc0a014f6 [cifs]
> #11 [ffff99650065bd08] path_openat at ffffffff9b0fe7f7
> #12 [ffff99650065bdd8] do_filp_open at ffffffff9b100a33
> #13 [ffff99650065bee0] do_sys_open at ffffffff9b0eb2d6
> #14 [ffff99650065bf38] do_syscall_64 at ffffffff9ae04315
>
> cifs_launder_page is for page 0xffffd1e2c07d2480
>
> crash> page.index,mapping,flags 0xffffd1e2c07d2480
>       index = 0x8
>       mapping = 0xffff8c68f3cd0db0
>   flags = 0xfffffc0008095
>
>   PAGE-FLAG       BIT  VALUE
>   PG_locked         0  0000001
>   PG_uptodate       2  0000004
>   PG_lru            4  0000010
>   PG_waiters        7  0000080
>   PG_writeback     15  0008000
>
>
> inode is ffff8c68f3cd0c40
> inode.i_rwsem is ffff8c68f3cd0ce0
>      DENTRY           INODE           SUPERBLK     TYPE PATH
> ffff8c68a1f1b480 ffff8c68f3cd0c40 ffff8c6915bf9000 REG
> /mnt/vm1_smb/testfile.8853
>
>
> this process holds the inode->i_rwsem for the parent directory, is
> laundering a page attached to the inode of the file it's opening, and in
> _cifsFileInfo_put is trying to down_write the cifsInodeInfo lock_sem
> for the file itself.
>
>
> <struct rw_semaphore 0xffff8c68f3cd0ce0>:
>         owner: <struct task_struct 0xffff8c6914272e80> (UN -   8854 -
> reopen_file), counter: 0x0000000000000003
>         waitlist: 1
>         0xffff9965005dfd80     8855   reopen_file      UN 0  1:29:22.912
>   RWSEM_WAITING_FOR_WRITE
>
> this is the inode.i_rwsem for the file
>
> the owner:
>
> [0 00:48:22.739] [UN]  PID: 8854   TASK: ffff8c6914272e80  CPU: 2
> COMMAND: "reopen_file"
>  #0 [ffff99650054fb38] __schedule at ffffffff9b6e6095
>  #1 [ffff99650054fbc8] schedule at ffffffff9b6e64df
>  #2 [ffff99650054fbd8] io_schedule at ffffffff9b6e68e2
>  #3 [ffff99650054fbe8] __lock_page at ffffffff9b03c56f
>  #4 [ffff99650054fc80] pagecache_get_page at ffffffff9b03dcdf
>  #5 [ffff99650054fcc0] grab_cache_page_write_begin at ffffffff9b03ef4c
>  #6 [ffff99650054fcd0] cifs_write_begin at ffffffffc0a064ec [cifs]
>  #7 [ffff99650054fd30] generic_perform_write at ffffffff9b03bba4
>  #8 [ffff99650054fda8] __generic_file_write_iter at ffffffff9b04060a
>  #9 [ffff99650054fdf0] cifs_strict_writev.cold.70 at ffffffffc0a4469b [cifs]
> #10 [ffff99650054fe48] new_sync_write at ffffffff9b0ec1dd
> #11 [ffff99650054fed0] vfs_write at ffffffff9b0eed35
> #12 [ffff99650054ff00] ksys_write at ffffffff9b0eefd9
> #13 [ffff99650054ff38] do_syscall_64 at ffffffff9ae04315
>
> the process holds the inode->i_rwsem for the file to which it's writing,
> and is trying to __lock_page for the same page as in the other processes
>
>
> the other tasks:
> [0 00:00:00.028] [UN]  PID: 8859   TASK: ffff8c6915479740  CPU: 2
> COMMAND: "reopen_file"
>  #0 [ffff9965007b39d8] __schedule at ffffffff9b6e6095
>  #1 [ffff9965007b3a68] schedule at ffffffff9b6e64df
>  #2 [ffff9965007b3a78] schedule_timeout at ffffffff9b6e9f89
>  #3 [ffff9965007b3af0] msleep at ffffffff9af573a9
>  #4 [ffff9965007b3af8] cifs_new_fileinfo.cold.61 at ffffffffc0a42a07 [cifs]
>  #5 [ffff9965007b3b78] cifs_open at ffffffffc0a0709d [cifs]
>  #6 [ffff9965007b3cd8] do_dentry_open at ffffffff9b0e9b7a
>  #7 [ffff9965007b3d08] path_openat at ffffffff9b0fe34f
>  #8 [ffff9965007b3dd8] do_filp_open at ffffffff9b100a33
>  #9 [ffff9965007b3ee0] do_sys_open at ffffffff9b0eb2d6
> #10 [ffff9965007b3f38] do_syscall_64 at ffffffff9ae04315
>
> this is opening the file, and is trying to down_write cinode->lock_sem
>
>
> [0 00:00:00.041] [UN]  PID: 8860   TASK: ffff8c691547ae80  CPU: 2
> COMMAND: "reopen_file"
> [0 00:00:00.057] [UN]  PID: 8861   TASK: ffff8c6915478000  CPU: 3
> COMMAND: "reopen_file"
> [0 00:00:00.059] [UN]  PID: 8858   TASK: ffff8c6914271740  CPU: 2
> COMMAND: "reopen_file"
> [0 00:00:00.109] [UN]  PID: 8862   TASK: ffff8c691547dd00  CPU: 6
> COMMAND: "reopen_file"
>  #0 [ffff9965007c3c78] __schedule at ffffffff9b6e6095
>  #1 [ffff9965007c3d08] schedule at ffffffff9b6e64df
>  #2 [ffff9965007c3d18] schedule_timeout at ffffffff9b6e9f89
>  #3 [ffff9965007c3d90] msleep at ffffffff9af573a9
>  #4 [ffff9965007c3d98] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
>  #5 [ffff9965007c3e88] cifs_close at ffffffffc0a07aaf [cifs]
>  #6 [ffff9965007c3ea0] __fput at ffffffff9b0efa6e
>  #7 [ffff9965007c3ee8] task_work_run at ffffffff9aef1614
>  #8 [ffff9965007c3f20] exit_to_usermode_loop at ffffffff9ae03d6f
>  #9 [ffff9965007c3f38] do_syscall_64 at ffffffff9ae0444c
>
> closing the file, and trying to down_write cifsi->lock_sem
>
>
> [0 00:48:22.839] [UN]  PID: 8857   TASK: ffff8c6914270000  CPU: 7
> COMMAND: "reopen_file"
>  #0 [ffff9965006a7cc8] __schedule at ffffffff9b6e6095
>  #1 [ffff9965006a7d58] schedule at ffffffff9b6e64df
>  #2 [ffff9965006a7d68] io_schedule at ffffffff9b6e68e2
>  #3 [ffff9965006a7d78] wait_on_page_bit at ffffffff9b03cac6
>  #4 [ffff9965006a7e10] __filemap_fdatawait_range at ffffffff9b03b028
>  #5 [ffff9965006a7ed8] filemap_write_and_wait at ffffffff9b040165
>  #6 [ffff9965006a7ef0] cifs_flush at ffffffffc0a0c2fa [cifs]
>  #7 [ffff9965006a7f10] filp_close at ffffffff9b0e93f1
>  #8 [ffff9965006a7f30] __x64_sys_close at ffffffff9b0e9a0e
>  #9 [ffff9965006a7f38] do_syscall_64 at ffffffff9ae04315
>
> in __filemap_fdatawait_range
>                         wait_on_page_writeback(page);
> for the same page of the file
>
>
>
> [0 00:48:22.718] [UN]  PID: 8855   TASK: ffff8c69142745c0  CPU: 7
> COMMAND: "reopen_file"
>  #0 [ffff9965005dfc98] __schedule at ffffffff9b6e6095
>  #1 [ffff9965005dfd28] schedule at ffffffff9b6e64df
>  #2 [ffff9965005dfd38] rwsem_down_write_slowpath at ffffffff9af283d7
>  #3 [ffff9965005dfdf0] cifs_strict_writev at ffffffffc0a0c40a [cifs]
>  #4 [ffff9965005dfe48] new_sync_write at ffffffff9b0ec1dd
>  #5 [ffff9965005dfed0] vfs_write at ffffffff9b0eed35
>  #6 [ffff9965005dff00] ksys_write at ffffffff9b0eefd9
>  #7 [ffff9965005dff38] do_syscall_64 at ffffffff9ae04315
>
>         inode_lock(inode);
>
>
> and one 'ls' later on, to see whether the rest of the mount is available
> (the test file is in the root, so we get blocked up on the directory
> ->i_rwsem), so the entire mount is unavailable
>
> [0 00:36:26.473] [UN]  PID: 9802   TASK: ffff8c691436ae80  CPU: 4
> COMMAND: "ls"
>  #0 [ffff996500393d28] __schedule at ffffffff9b6e6095
>  #1 [ffff996500393db8] schedule at ffffffff9b6e64df
>  #2 [ffff996500393dc8] rwsem_down_read_slowpath at ffffffff9b6e9421
>  #3 [ffff996500393e78] down_read_killable at ffffffff9b6e95e2
>  #4 [ffff996500393e88] iterate_dir at ffffffff9b103c56
>  #5 [ffff996500393ec8] ksys_getdents64 at ffffffff9b104b0c
>  #6 [ffff996500393f30] __x64_sys_getdents64 at ffffffff9b104bb6
>  #7 [ffff996500393f38] do_syscall_64 at ffffffff9ae04315
>
> in iterate_dir:
>         if (shared)
>                 res = down_read_killable(&inode->i_rwsem);  <<<<
>         else
>                 res = down_write_killable(&inode->i_rwsem);
>

Reported-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
---
 fs/cifs/cifsfs.c   | 13 ++++++++-
 fs/cifs/cifsglob.h |  1 +
 fs/cifs/file.c     | 79 +++++++++++++++++++++++++++++++++++++-----------------
 3 files changed, 68 insertions(+), 25 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index e4e3b573d20c..f8e201c45ccb 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -119,6 +119,7 @@ extern mempool_t *cifs_mid_poolp;
 
 struct workqueue_struct	*cifsiod_wq;
 struct workqueue_struct	*decrypt_wq;
+struct workqueue_struct	*fileinfo_put_wq;
 struct workqueue_struct	*cifsoplockd_wq;
 __u32 cifs_lock_secret;
 
@@ -1557,11 +1558,18 @@ init_cifs(void)
 		goto out_destroy_cifsiod_wq;
 	}
 
+	fileinfo_put_wq = alloc_workqueue("cifsfileinfoput",
+				     WQ_UNBOUND|WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
+	if (!fileinfo_put_wq) {
+		rc = -ENOMEM;
+		goto out_destroy_decrypt_wq;
+	}
+
 	cifsoplockd_wq = alloc_workqueue("cifsoplockd",
 					 WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
 	if (!cifsoplockd_wq) {
 		rc = -ENOMEM;
-		goto out_destroy_decrypt_wq;
+		goto out_destroy_fileinfo_put_wq;
 	}
 
 	rc = cifs_fscache_register();
@@ -1627,6 +1635,8 @@ init_cifs(void)
 	cifs_fscache_unregister();
 out_destroy_cifsoplockd_wq:
 	destroy_workqueue(cifsoplockd_wq);
+out_destroy_fileinfo_put_wq:
+	destroy_workqueue(fileinfo_put_wq);
 out_destroy_decrypt_wq:
 	destroy_workqueue(decrypt_wq);
 out_destroy_cifsiod_wq:
@@ -1656,6 +1666,7 @@ exit_cifs(void)
 	cifs_fscache_unregister();
 	destroy_workqueue(cifsoplockd_wq);
 	destroy_workqueue(decrypt_wq);
+	destroy_workqueue(fileinfo_put_wq);
 	destroy_workqueue(cifsiod_wq);
 	cifs_proc_clean();
 }
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 50dfd9049370..8361e30adcd9 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1902,6 +1902,7 @@ void cifs_queue_oplock_break(struct cifsFileInfo *cfile);
 extern const struct slow_work_ops cifs_oplock_break_ops;
 extern struct workqueue_struct *cifsiod_wq;
 extern struct workqueue_struct *decrypt_wq;
+extern struct workqueue_struct *fileinfo_put_wq;
 extern struct workqueue_struct *cifsoplockd_wq;
 extern __u32 cifs_lock_secret;
 
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0e0217641de1..e222cf04b325 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -368,30 +368,8 @@ cifsFileInfo_get(struct cifsFileInfo *cifs_file)
 	return cifs_file;
 }
 
-/**
- * cifsFileInfo_put - release a reference of file priv data
- *
- * Always potentially wait for oplock handler. See _cifsFileInfo_put().
- */
-void cifsFileInfo_put(struct cifsFileInfo *cifs_file)
-{
-	_cifsFileInfo_put(cifs_file, true);
-}
-
-/**
- * _cifsFileInfo_put - release a reference of file priv data
- *
- * This may involve closing the filehandle @cifs_file out on the
- * server. Must be called without holding tcon->open_file_lock and
- * cifs_file->file_info_lock.
- *
- * If @wait_for_oplock_handler is true and we are releasing the last
- * reference, wait for any running oplock break handler of the file
- * and cancel any pending one. If calling this function from the
- * oplock break handler, you need to pass false.
- *
- */
-void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)
+static void
+_cifsFileInfo_put_work(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)
 {
 	struct inode *inode = d_inode(cifs_file->dentry);
 	struct cifs_tcon *tcon = tlink_tcon(cifs_file->tlink);
@@ -480,6 +458,59 @@ void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)
 	kfree(cifs_file);
 }
 
+struct cifsFileInfo_w {
+	struct work_struct put;
+	struct cifsFileInfo *cifs_file;
+	bool wait_oplock_handler;
+};
+
+static void cifsFileInfo_put_work(struct work_struct *work)
+{
+	struct cifsFileInfo_w *cfiw = container_of(work,
+						   struct cifsFileInfo_w, put);
+	_cifsFileInfo_put_work(cfiw->cifs_file, cfiw->wait_oplock_handler);
+	kfree(cfiw);
+}
+
+/**
+ * cifsFileInfo_put - release a reference of file priv data
+ *
+ * Always potentially wait for oplock handler. See _cifsFileInfo_put().
+ */
+void cifsFileInfo_put(struct cifsFileInfo *cifs_file)
+{
+	_cifsFileInfo_put(cifs_file, true);
+}
+
+/**
+ * _cifsFileInfo_put - release a reference of file priv data
+ *
+ * This may involve closing the filehandle @cifs_file out on the
+ * server. Must be called without holding tcon->open_file_lock and
+ * cifs_file->file_info_lock.
+ *
+ * If @wait_oplock_handler is true and we are releasing the last
+ * reference, wait for any running oplock break handler of the file
+ * and cancel any pending one. If calling this function from the
+ * oplock break handler, you need to pass false.
+ *
+ */
+void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)
+{
+	struct cifsFileInfo_w *work;
+
+	work = kmalloc(sizeof(struct cifsFileInfo_w), GFP_KERNEL);
+	if (work == NULL) {
+		_cifsFileInfo_put_work(cifs_file, wait_oplock_handler);
+		return;
+	}
+
+	INIT_WORK(&work->put, cifsFileInfo_put_work);
+	work->cifs_file = cifs_file;
+	work->wait_oplock_handler = wait_oplock_handler;
+	queue_work(fileinfo_put_wq, &work->put);
+}
+
 int cifs_open(struct inode *inode, struct file *file)
 
 {
-- 
2.13.6



* Re: [PATCH 0/1] cifs: move cifsFileInfo_put logic into a work-queue
  2019-10-26 21:04 [PATCH 0/1] cifs: move cifsFileInfo_put logic into a work-queue Ronnie Sahlberg
  2019-10-26 21:04 ` [PATCH] " Ronnie Sahlberg
@ 2019-10-27 23:52 ` " Frank Sorenson
  2019-10-28 22:51   ` Pavel Shilovsky
  1 sibling, 1 reply; 7+ messages in thread
From: Frank Sorenson @ 2019-10-27 23:52 UTC (permalink / raw)
  To: Ronnie Sahlberg, linux-cifs

On 10/26/19 4:04 PM, Ronnie Sahlberg wrote:
> Steve, Pavel, Frank
> 
> This patch moves the logic in cifsFileInfo_put() into a work-queue so that
> it will be processed in a different thread which, importantly, does not hold
> any other locks.
> 
> This should address the deadlock that Frank reported in the thread:
> Yet another hang after network disruption or smb restart -- multiple file writers

Pavel,

Thanks for understanding my report and translating it into what it meant.

Ronnie,
Thanks for the patch.  I'm happy to say that my attempt at the same
looks like it would have been similar, had I known what I was doing to
begin with :)


Unfortunately, I think the patch really only moves the problem from one
place to another.  I'm guessing that means we have a second underlying
deadlock.  (This reproducer appears to be quite effective.)


On the plus side, only the one file involved is jammed in the deadlock
now, as opposed to the parent directory as well, so it is at least progress.


#!/bin/bash

testfile=testfile
openloop() {
        while true ; do
                exec 10<>$testfile
                <&10 >/dev/null
                echo "testing $$" >&10
                exec 10>&-
        done
}
for i in {1..50} ; do
        openloop &
done
wait


and on the server (once):
# systemctl restart smb.service
(or anything else which forces a reconnect)


After doing the above with 20 child processes, all are now blocked, as
are 3 cifsfileinfoput kworkers, a cifsiod kworker, and a flush-cifs kworker:


# ps x | awk '{if (substr($3, 1, 1) == "D"){print $3,$NF}}' | sort  |
uniq -c | sort -rn
     20 D /var/tmp/openloop.sh
      1 D [kworker/u16:9+cifsfileinfoput]
      1 D [kworker/u16:8+cifsfileinfoput]
      1 D [kworker/u16:25+flush-cifs-1]
      1 D [kworker/u16:19+cifsfileinfoput]
      1 D [kworker/4:3+cifsiod]


12 of the openloop.sh processes:
	#0 __schedule at ffffffffab4f1a45
	#1 schedule at ffffffffab4f1ea0
	#2 rwsem_down_write_slowpath at ffffffffaad29a47
	#3 cifs_strict_writev at ffffffffc08010e2
	#4 new_sync_write at ffffffffaaeeff6d
	#5 vfs_write at ffffffffaaef2ac5
	#6 ksys_write at ffffffffaaef2d69
	#7 do_syscall_64 at ffffffffaac04345
in cifs_strict_writev:
        inode_lock(inode);


3 openloop.sh:
	#0 __schedule at ffffffffab4f1a45
	#1 schedule at ffffffffab4f1ea0
	#2 schedule_timeout at ffffffffab4f5949
	#3 msleep at ffffffffaad58d89
	#4 cifs_down_write at ffffffffc07fa2b5
	#5 cifs_new_fileinfo at ffffffffc07fa392
	#6 cifs_open at ffffffffc07fae1f
	#7 do_dentry_open at ffffffffaaeed90a
	#8 path_openat at ffffffffaaf0204a
	#9 do_filp_open at ffffffffaaf04753
	#10 do_sys_open at ffffffffaaeef066
	#11 do_syscall_64 at ffffffffaac04345



3 openloop.sh
	#0 __schedule at ffffffffab4f1a45
	#1 schedule at ffffffffab4f1ea0
	#2 io_schedule at ffffffffab4f22a2
	#3 wait_on_page_bit at ffffffffaae3fbc6
	#4 __filemap_fdatawait_range at ffffffffaae3e158
	#5 filemap_write_and_wait at ffffffffaae432c5
	#6 cifs_flush at ffffffffc0800fe3
	#7 filp_close at ffffffffaaeed181
	#8 do_dup2 at ffffffffaaf12c0b
	#9 __x64_sys_dup2 at ffffffffaaf1306a
	#10 do_syscall_64 at ffffffffaac04345

in __filemap_fdatawait_range:
	wait_on_page_writeback(page);



1 openloop.sh
	#0 __schedule at ffffffffab4f1a45
	#1 schedule at ffffffffab4f1ea0
	#2 io_schedule at ffffffffab4f22a2
	#3 wait_on_page_bit at ffffffffaae3fbc6
	#4 cifs_writepages at ffffffffc07ffc73
	#5 do_writepages at ffffffffaae4ba71
	#6 __filemap_fdatawrite_range at ffffffffaae4321b
	#7 filemap_write_and_wait at ffffffffaae432aa
	#8 cifs_flush at ffffffffc0800fe3
	#9 filp_close at ffffffffaaeed181
	#10 do_dup2 at ffffffffaaf12c0b
	#11 __x64_sys_dup2 at ffffffffaaf1306a
	#12 do_syscall_64 at ffffffffaac04345

in cifs_writepages, we're really in wdata_prepare_pages
                if (wbc->sync_mode != WB_SYNC_NONE)
                        wait_on_page_writeback(page);

1 openloop.sh
	#0 __schedule at ffffffffab4f1a45
	#1 schedule at ffffffffab4f1ea0
	#2 io_schedule at ffffffffab4f22a2
	#3 __lock_page at ffffffffaae3f66f
	#4 pagecache_get_page at ffffffffaae40ddf
	#5 grab_cache_page_write_begin at ffffffffaae420ac
	#6 cifs_write_begin at ffffffffc07f9ea9
	#7 generic_perform_write at ffffffffaae3eca4
	#8 __generic_file_write_iter at ffffffffaae4376a
	#9 cifs_strict_writev at ffffffffc08011cf
	#10 new_sync_write at ffffffffaaeeff6d
	#11 vfs_write at ffffffffaaef2ac5
	#12 ksys_write at ffffffffaaef2d69
	#13 do_syscall_64 at ffffffffaac04345



kworker/u16:8+cifsfileinfoput
kworker/u16:19+cifsfileinfoput
kworker/u16:9+cifsfileinfoput
	#0 __schedule at ffffffffab4f1a45
	#1 schedule at ffffffffab4f1ea0
	#2 schedule_timeout at ffffffffab4f5949
	#3 msleep at ffffffffaad58d89
	#4 cifs_down_write at ffffffffc07fa2b5
	#5 _cifsFileInfo_put_work at ffffffffc07fb687
	#6 cifsFileInfo_put_work at ffffffffc07fd816
	#7 process_one_work at ffffffffaacecc11
	#8 worker_thread at ffffffffaaced0f0
	#9 kthread at ffffffffaacf2ac2
	#10 ret_from_fork at ffffffffab600215

kworker/4:3+cifsiod
	#0 __schedule at ffffffffab4f1a45
	#1 schedule at ffffffffab4f1ea0
	#2 io_schedule at ffffffffab4f22a2
	#3 __lock_page at ffffffffaae3f66f
	#4 cifs_writev_complete at ffffffffc07df762
	#5 process_one_work at ffffffffaacecc11
	#6 worker_thread at ffffffffaaced0f0
	#7 kthread at ffffffffaacf2ac2
	#8 ret_from_fork at ffffffffab600215

kworker/u16:25+flush-cifs-1
	#0 __schedule at ffffffffab4f1a45
	#1 schedule at ffffffffab4f1ea0
	#2 io_schedule at ffffffffab4f22a2
	#3 __lock_page at ffffffffaae3f66f
	#4 cifs_writepages at ffffffffc07ffcbe
	#5 do_writepages at ffffffffaae4ba71
	#6 __writeback_single_inode at ffffffffaaf25bdd
	#7 writeback_sb_inodes at ffffffffaaf26315
	#8 __writeback_inodes_wb at ffffffffaaf2660d
	#9 wb_writeback at ffffffffaaf2698f
	#10 wb_workfn at ffffffffaaf279cf
	#11 process_one_work at ffffffffaacecc11
	#12 worker_thread at ffffffffaaced0f0
	#13 kthread at ffffffffaacf2ac2
	#14 ret_from_fork at ffffffffab600215


the page which processes are either trying to lock or are waiting on
writeback:
FLAGS: fffffc000809d
  PAGE-FLAG       BIT  VALUE
  PG_locked         0  0000001
  PG_uptodate       2  0000004
  PG_dirty          3  0000008
  PG_lru            4  0000010
  PG_waiters        7  0000080
  PG_writeback     15  0008000
  PG_savepinned     3  0000008





Frank
--
Frank Sorenson
sorenson@redhat.com
Principal Software Maintenance Engineer
Global Support Services - filesystems
Red Hat


* Re: [PATCH 0/1] cifs: move cifsFileInfo_put logic into a work-queue
  2019-10-27 23:52 ` [PATCH 0/1] " Frank Sorenson
@ 2019-10-28 22:51   ` Pavel Shilovsky
  2019-11-05  2:11     ` Deadlock between cifs_writev_requeue and cifs_writepages Pavel Shilovsky
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Shilovsky @ 2019-10-28 22:51 UTC (permalink / raw)
  To: Frank Sorenson; +Cc: Ronnie Sahlberg, linux-cifs

Mon, 28 Oct 2019 at 05:13, Frank Sorenson <sorenson@redhat.com>:
>
> On 10/26/19 4:04 PM, Ronnie Sahlberg wrote:
> > Steve, Pavel, Frank
> >
> > This patch moves the logic in cifsFileInfo_put() into a work-queue so that
> > it will be processed in a different thread which, importantly, does not hold
> > any other locks.
> >
> > This should address the deadlock that Frank reported in the thread:
> > Yet another hang after network disruption or smb restart -- multiple file writers
>
> Pavel,
>
> Thanks for understanding my report and translating it into what it meant.
>
> Ronnie,
> Thanks for the patch.  I'm happy to say that my attempt at the same
> looks like it would have been similar, had I known what I was doing to
> begin with :)
>
>
> Unfortunately, I think the patch really only moves the problem from one
> place to another.  I'm guessing that means we have a second underlying
> deadlock.  (this reproducer appears to be quite effective)
>
>
> On the plus side, only the one file involved is jammed in the deadlock
> now, as opposed to the parent directory as well, so it is at least progress.
>

Hi Frank, Ronnie,

Ok, it seems there is another bug resulting in a deadlock between a
page lock and writeback (WB) flag in cifs_writev_complete and
cifs_writepages.

1) cifs_writev_complete. in the case of WB_SYNC_ALL and EAGAIN
(reconnect) it calls cifs_writev_requeue. At this point it still has
WB flag set for all the pages being retried. Inside
cifs_writev_requeue it tries to lock the page:

kworker/4:3+cifsiod
#0 __schedule at ffffffffab4f1a45
#1 schedule at ffffffffab4f1ea0
#2 io_schedule at ffffffffab4f22a2
#3 __lock_page at ffffffffaae3f66f
#4 cifs_writev_complete at ffffffffc07df762
#5 process_one_work at ffffffffaacecc11
#6 worker_thread at ffffffffaaced0f0
#7 kthread at ffffffffaacf2ac2
#8 ret_from_fork at ffffffffab600215

2) cifs_writepages. It locks the page and then waits on WB flag to be
cleared before processing pages:

1 openloop.sh
#0 __schedule at ffffffffab4f1a45
#1 schedule at ffffffffab4f1ea0
#2 io_schedule at ffffffffab4f22a2
#3 wait_on_page_bit at ffffffffaae3fbc6
#4 cifs_writepages at ffffffffc07ffc73
#5 do_writepages at ffffffffaae4ba71
#6 __filemap_fdatawrite_range at ffffffffaae4321b
#7 filemap_write_and_wait at ffffffffaae432aa
#8 cifs_flush at ffffffffc0800fe3
#9 filp_close at ffffffffaaeed181
#10 do_dup2 at ffffffffaaf12c0b
#11 __x64_sys_dup2 at ffffffffaaf1306a
#12 do_syscall_64 at ffffffffaac04345

The bug is in cifs_writev_complete and cifs_writev_requeue: the 1st
needs to clear the WB flag and re-dirty pages before trying to retry
the IO; the 2nd needs additionally to skip pages that are no longer
dirty (e.g. written by another thread) and re-set the WB flag before
sending pages over the wire.
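As a rough sketch of the proposed state transitions, here is a userspace C simulation of the flag handling (the flag names and bit positions only mirror the crash output above; this is not the real mm API, and the helper names are made up):

```c
#include <assert.h>
#include <stdbool.h>

#define PG_locked    (1u << 0)
#define PG_dirty     (1u << 3)
#define PG_writeback (1u << 15)

struct page { unsigned int flags; };

/* 1st fix: on a retryable error, drop PG_writeback and re-dirty the
 * page before requeueing, so threads waiting on writeback can proceed. */
static void writev_complete_retry(struct page *pg)
{
    pg->flags &= ~PG_writeback;
    pg->flags |= PG_dirty;
}

/* 2nd fix: before resending, skip pages that are no longer dirty
 * (already written by another thread) and re-set PG_writeback. */
static bool writev_requeue_prepare(struct page *pg)
{
    if (!(pg->flags & PG_dirty))
        return false;
    pg->flags &= ~PG_dirty;
    pg->flags |= PG_writeback;
    return true;
}
```

The point of the ordering is that PG_writeback is never left set across the retry window, so a cifs_writepages waiter is not stuck behind a completion handler that is itself waiting for the page lock.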

--
Best regards,
Pavel Shilovsky


* Re: [PATCH] cifs: move cifsFileInfo_put logic into a work-queue
  2019-10-26 21:04 ` [PATCH] " Ronnie Sahlberg
@ 2019-10-28 23:18   ` Pavel Shilovsky
  2019-10-29  3:09     ` Ronnie Sahlberg
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Shilovsky @ 2019-10-28 23:18 UTC (permalink / raw)
  To: Ronnie Sahlberg; +Cc: linux-cifs

Sat, 26 Oct 2019 at 14:04, Ronnie Sahlberg <lsahlber@redhat.com>:
>
> This patch moves the _put logic to be processed in a separate thread that
> holds no other locks to prevent deadlocks like below from happening
>

Ronnie,

Thanks for creating the patch! Please find my comments below.

...
>
> Reported-by: Frank Sorenson <sorenson@redhat.com>:
> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
> ---
>  fs/cifs/cifsfs.c   | 13 ++++++++-
>  fs/cifs/cifsglob.h |  1 +
>  fs/cifs/file.c     | 79 +++++++++++++++++++++++++++++++++++++-----------------
>  3 files changed, 68 insertions(+), 25 deletions(-)
>
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index e4e3b573d20c..f8e201c45ccb 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -119,6 +119,7 @@ extern mempool_t *cifs_mid_poolp;
>
>  struct workqueue_struct        *cifsiod_wq;
>  struct workqueue_struct        *decrypt_wq;
> +struct workqueue_struct        *fileinfo_put_wq;
>  struct workqueue_struct        *cifsoplockd_wq;
>  __u32 cifs_lock_secret;
>
> @@ -1557,11 +1558,18 @@ init_cifs(void)
>                 goto out_destroy_cifsiod_wq;
>         }
>
> +       fileinfo_put_wq = alloc_workqueue("cifsfileinfoput",
> +                                    WQ_UNBOUND|WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
> +       if (!fileinfo_put_wq) {
> +               rc = -ENOMEM;
> +               goto out_destroy_decrypt_wq;
> +       }
> +
>         cifsoplockd_wq = alloc_workqueue("cifsoplockd",
>                                          WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
>         if (!cifsoplockd_wq) {
>                 rc = -ENOMEM;
> -               goto out_destroy_decrypt_wq;
> +               goto out_destroy_fileinfo_put_wq;
>         }
>
>         rc = cifs_fscache_register();
> @@ -1627,6 +1635,8 @@ init_cifs(void)
>         cifs_fscache_unregister();
>  out_destroy_cifsoplockd_wq:
>         destroy_workqueue(cifsoplockd_wq);
> +out_destroy_fileinfo_put_wq:
> +       destroy_workqueue(fileinfo_put_wq);
>  out_destroy_decrypt_wq:
>         destroy_workqueue(decrypt_wq);
>  out_destroy_cifsiod_wq:
> @@ -1656,6 +1666,7 @@ exit_cifs(void)
>         cifs_fscache_unregister();
>         destroy_workqueue(cifsoplockd_wq);
>         destroy_workqueue(decrypt_wq);
> +       destroy_workqueue(fileinfo_put_wq);
>         destroy_workqueue(cifsiod_wq);
>         cifs_proc_clean();
>  }
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index 50dfd9049370..8361e30adcd9 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1902,6 +1902,7 @@ void cifs_queue_oplock_break(struct cifsFileInfo *cfile);
>  extern const struct slow_work_ops cifs_oplock_break_ops;
>  extern struct workqueue_struct *cifsiod_wq;
>  extern struct workqueue_struct *decrypt_wq;
> +extern struct workqueue_struct *fileinfo_put_wq;
>  extern struct workqueue_struct *cifsoplockd_wq;
>  extern __u32 cifs_lock_secret;
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 0e0217641de1..e222cf04b325 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -368,30 +368,8 @@ cifsFileInfo_get(struct cifsFileInfo *cifs_file)
>         return cifs_file;
>  }
>
> -/**
> - * cifsFileInfo_put - release a reference of file priv data
> - *
> - * Always potentially wait for oplock handler. See _cifsFileInfo_put().
> - */
> -void cifsFileInfo_put(struct cifsFileInfo *cifs_file)
> -{
> -       _cifsFileInfo_put(cifs_file, true);
> -}
> -
> -/**
> - * _cifsFileInfo_put - release a reference of file priv data
> - *
> - * This may involve closing the filehandle @cifs_file out on the
> - * server. Must be called without holding tcon->open_file_lock and
> - * cifs_file->file_info_lock.
> - *
> - * If @wait_for_oplock_handler is true and we are releasing the last
> - * reference, wait for any running oplock break handler of the file
> - * and cancel any pending one. If calling this function from the
> - * oplock break handler, you need to pass false.
> - *
> - */
> -void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)
> +static void
> +_cifsFileInfo_put_work(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)
>  {
>         struct inode *inode = d_inode(cifs_file->dentry);
>         struct cifs_tcon *tcon = tlink_tcon(cifs_file->tlink);
> @@ -480,6 +458,59 @@ void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)
>         kfree(cifs_file);
>  }
>
> +struct cifsFileInfo_w {
> +       struct work_struct put;
> +       struct cifsFileInfo *cifs_file;
> +       bool wait_oplock_handler;
> +};
> +
> +static void cifsFileInfo_put_work(struct work_struct *work)
> +{
> +       struct cifsFileInfo_w *cfiw = container_of(work,
> +                                                  struct cifsFileInfo_w, put);
> +       _cifsFileInfo_put_work(cfiw->cifs_file, cfiw->wait_oplock_handler);
> +       kfree(cfiw);
> +}
> +
> +/**
> + * cifsFileInfo_put - release a reference of file priv data
> + *
> + * Always potentially wait for oplock handler. See _cifsFileInfo_put().
> + */
> +void cifsFileInfo_put(struct cifsFileInfo *cifs_file)
> +{
> +       _cifsFileInfo_put(cifs_file, true);
> +}
> +
> +/**
> + * _cifsFileInfo_put - release a reference of file priv data
> + *
> + * This may involve closing the filehandle @cifs_file out on the
> + * server. Must be called without holding tcon->open_file_lock and
> + * cifs_file->file_info_lock.

cinode->open_file_lock should be mentioned here as well.

> + *
> + * If @wait_for_oplock_handler is true and we are releasing the last
> + * reference, wait for any running oplock break handler of the file
> + * and cancel any pending one. If calling this function from the
> + * oplock break handler, you need to pass false.
> + *
> + */
> +void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)
> +{
> +       struct cifsFileInfo_w *work;
> +
> +       work = kmalloc(sizeof(struct cifsFileInfo_w), GFP_KERNEL);

I would suggest making this part of the cifsFileInfo structure (like
the oplock handling work) to avoid an unnecessary memory allocation
every time we put/close the handle.

> +       if (work == NULL) {
> +               _cifsFileInfo_put_work(cifs_file, wait_oplock_handler);
> +               return;
> +       }
> +
> +       INIT_WORK(&work->put, cifsFileInfo_put_work);
> +       work->cifs_file = cifs_file;
> +       work->wait_oplock_handler = wait_oplock_handler;
> +       queue_work(fileinfo_put_wq, &work->put);

I don't think we should offload the work to another thread
unconditionally here - e.g. cifs_close() can safely do it in the same
thread. Since there are more places where we want to offload, let's
do the same thing we did for the "wait_oplock_handler" argument:

cifsFileInfo_put(cifs_file) -> _cifsFileInfo_put(cifs_file, true /*
wait_oplock_handler */, true /* offload */);

and convert cifs_close() to call _cifsFileInfo_put(cifs_file, true /*
wait_oplock_handler */, false /* offload */);

Note that we do not need to conditionally cancel the worker thread as
we do for the oplock handler, because once we reach zero references
there should be only one thread (us) that has access to the file info
structure, so there is no need to cancel anything. We may add a
warning for the case where the work is scheduled but we have reached
zero references, to highlight a potential bug.

Also, as an optimization, we may use the caller thread to check the
number of references and only offload if both conditions are met: the
"offload" argument is true and the current number of references is 1
(putting the last reference). This optimization may be done in the
same patch or as a separate one.

Thoughts?

--
Best regards,
Pavel Shilovsky

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cifs: move cifsFileInfo_put logic into a work-queue
  2019-10-28 23:18   ` Pavel Shilovsky
@ 2019-10-29  3:09     ` Ronnie Sahlberg
  0 siblings, 0 replies; 7+ messages in thread
From: Ronnie Sahlberg @ 2019-10-29  3:09 UTC (permalink / raw)
  To: Pavel Shilovsky; +Cc: linux-cifs





----- Original Message -----
> From: "Pavel Shilovsky" <piastryyy@gmail.com>
> To: "Ronnie Sahlberg" <lsahlber@redhat.com>
> Cc: "linux-cifs" <linux-cifs@vger.kernel.org>
> Sent: Tuesday, 29 October, 2019 9:18:40 AM
> Subject: Re: [PATCH] cifs: move cifsFileInfo_put logic into a work-queue
> 
> Sat, 26 Oct 2019 at 14:04, Ronnie Sahlberg <lsahlber@redhat.com>:
> >
> > This patch moves the _put logic to be processed in a separate thread that
> > holds no other locks to prevent deadlocks like below from happening
> >
> 
> Ronnie,
> 
> Thanks for creating the patch! Please find my comments below.

I will update the patch.  Thanks.

> 
> ...
> >
> > Reported-by: Frank Sorenson <sorenson@redhat.com>
> > Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
> > ---
> >  fs/cifs/cifsfs.c   | 13 ++++++++-
> >  fs/cifs/cifsglob.h |  1 +
> >  fs/cifs/file.c     | 79 +++++++++++++++++++++++++++++++++++++-----------------
> >
> > diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> > index e4e3b573d20c..f8e201c45ccb 100644
> > --- a/fs/cifs/cifsfs.c
> > +++ b/fs/cifs/cifsfs.c
> > @@ -119,6 +119,7 @@ extern mempool_t *cifs_mid_poolp;
> >
> >  struct workqueue_struct        *cifsiod_wq;
> >  struct workqueue_struct        *decrypt_wq;
> > +struct workqueue_struct        *fileinfo_put_wq;
> >  struct workqueue_struct        *cifsoplockd_wq;
> >  __u32 cifs_lock_secret;
> >
> > @@ -1557,11 +1558,18 @@ init_cifs(void)
> >                 goto out_destroy_cifsiod_wq;
> >         }
> >
> > +       fileinfo_put_wq = alloc_workqueue("cifsfileinfoput",
> > +                                         WQ_UNBOUND|WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
> > +       if (!fileinfo_put_wq) {
> > +               rc = -ENOMEM;
> > +               goto out_destroy_decrypt_wq;
> > +       }
> > +
> >         cifsoplockd_wq = alloc_workqueue("cifsoplockd",
> >                                          WQ_FREEZABLE|WQ_MEM_RECLAIM, 0);
> >         if (!cifsoplockd_wq) {
> >                 rc = -ENOMEM;
> > -               goto out_destroy_decrypt_wq;
> > +               goto out_destroy_fileinfo_put_wq;
> >         }
> >
> >         rc = cifs_fscache_register();
> > @@ -1627,6 +1635,8 @@ init_cifs(void)
> >         cifs_fscache_unregister();
> >  out_destroy_cifsoplockd_wq:
> >         destroy_workqueue(cifsoplockd_wq);
> > +out_destroy_fileinfo_put_wq:
> > +       destroy_workqueue(fileinfo_put_wq);
> >  out_destroy_decrypt_wq:
> >         destroy_workqueue(decrypt_wq);
> >  out_destroy_cifsiod_wq:
> > @@ -1656,6 +1666,7 @@ exit_cifs(void)
> >         cifs_fscache_unregister();
> >         destroy_workqueue(cifsoplockd_wq);
> >         destroy_workqueue(decrypt_wq);
> > +       destroy_workqueue(fileinfo_put_wq);
> >         destroy_workqueue(cifsiod_wq);
> >         cifs_proc_clean();
> >  }
> > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> > index 50dfd9049370..8361e30adcd9 100644
> > --- a/fs/cifs/cifsglob.h
> > +++ b/fs/cifs/cifsglob.h
> > @@ -1902,6 +1902,7 @@ void cifs_queue_oplock_break(struct cifsFileInfo
> > *cfile);
> >  extern const struct slow_work_ops cifs_oplock_break_ops;
> >  extern struct workqueue_struct *cifsiod_wq;
> >  extern struct workqueue_struct *decrypt_wq;
> > +extern struct workqueue_struct *fileinfo_put_wq;
> >  extern struct workqueue_struct *cifsoplockd_wq;
> >  extern __u32 cifs_lock_secret;
> >
> > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> > index 0e0217641de1..e222cf04b325 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -368,30 +368,8 @@ cifsFileInfo_get(struct cifsFileInfo *cifs_file)
> >         return cifs_file;
> >  }
> >
> > -/**
> > - * cifsFileInfo_put - release a reference of file priv data
> > - *
> > - * Always potentially wait for oplock handler. See _cifsFileInfo_put().
> > - */
> > -void cifsFileInfo_put(struct cifsFileInfo *cifs_file)
> > -{
> > -       _cifsFileInfo_put(cifs_file, true);
> > -}
> > -
> > -/**
> > - * _cifsFileInfo_put - release a reference of file priv data
> > - *
> > - * This may involve closing the filehandle @cifs_file out on the
> > - * server. Must be called without holding tcon->open_file_lock and
> > - * cifs_file->file_info_lock.
> > - *
> > - * If @wait_for_oplock_handler is true and we are releasing the last
> > - * reference, wait for any running oplock break handler of the file
> > - * and cancel any pending one. If calling this function from the
> > - * oplock break handler, you need to pass false.
> > - *
> > - */
> > -void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)
> > +static void
> > +_cifsFileInfo_put_work(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)
> >  {
> >         struct inode *inode = d_inode(cifs_file->dentry);
> >         struct cifs_tcon *tcon = tlink_tcon(cifs_file->tlink);
> > @@ -480,6 +458,59 @@ void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)
> >         kfree(cifs_file);
> >  }
> >
> > +struct cifsFileInfo_w {
> > +       struct work_struct put;
> > +       struct cifsFileInfo *cifs_file;
> > +       bool wait_oplock_handler;
> > +};
> > +
> > +static void cifsFileInfo_put_work(struct work_struct *work)
> > +{
> > +       struct cifsFileInfo_w *cfiw = container_of(work,
> > +                                                  struct cifsFileInfo_w, put);
> > +       _cifsFileInfo_put_work(cfiw->cifs_file, cfiw->wait_oplock_handler);
> > +       kfree(cfiw);
> > +}
> > +
> > +/**
> > + * cifsFileInfo_put - release a reference of file priv data
> > + *
> > + * Always potentially wait for oplock handler. See _cifsFileInfo_put().
> > + */
> > +void cifsFileInfo_put(struct cifsFileInfo *cifs_file)
> > +{
> > +       _cifsFileInfo_put(cifs_file, true);
> > +}
> > +
> > +/**
> > + * _cifsFileInfo_put - release a reference of file priv data
> > + *
> > + * This may involve closing the filehandle @cifs_file out on the
> > + * server. Must be called without holding tcon->open_file_lock and
> > + * cifs_file->file_info_lock.
> 
> cinode->open_file_lock should be mentioned here as well.
> 
> > + *
> > + * If @wait_for_oplock_handler is true and we are releasing the last
> > + * reference, wait for any running oplock break handler of the file
> > + * and cancel any pending one. If calling this function from the
> > + * oplock break handler, you need to pass false.
> > + *
> > + */
> > +void _cifsFileInfo_put(struct cifsFileInfo *cifs_file, bool wait_oplock_handler)
> > +{
> > +       struct cifsFileInfo_w *work;
> > +
> > +       work = kmalloc(sizeof(struct cifsFileInfo_w), GFP_KERNEL);
> 
> I would suggest making this part of the cifsFileInfo structure (like
> the oplock handling work) to avoid an unnecessary memory allocation
> every time we put/close the handle.
> 
> > +       if (work == NULL) {
> > +               _cifsFileInfo_put_work(cifs_file, wait_oplock_handler);
> > +               return;
> > +       }
> > +
> > +       INIT_WORK(&work->put, cifsFileInfo_put_work);
> > +       work->cifs_file = cifs_file;
> > +       work->wait_oplock_handler = wait_oplock_handler;
> > +       queue_work(fileinfo_put_wq, &work->put);
> 
> I don't think we should offload the work to another thread
> unconditionally here - e.g. cifs_close() can safely do it in the same
> thread. Since there are more places where we want to offload, let's
> do the same thing we did for the "wait_oplock_handler" argument:
> 
> cifsFileInfo_put(cifs_file) -> _cifsFileInfo_put(cifs_file, true /*
> wait_oplock_handler */, true /* offload */);
> 
> and convert cifs_close() to call _cifsFileInfo_put(cifs_file, true /*
> wait_oplock_handler */, false /* offload */);
> 
> Note that we do not need to conditionally cancel the worker thread as
> we do for the oplock handler, because once we reach zero references
> there should be only one thread (us) that has access to the file info
> structure, so there is no need to cancel anything. We may add a
> warning for the case where the work is scheduled but we have reached
> zero references, to highlight a potential bug.
> 
> Also, as an optimization, we may use the caller thread to check the
> number of references and only offload if both conditions are met: the
> "offload" argument is true and the current number of references is 1
> (putting the last reference). This optimization may be done in the
> same patch or as a separate one.
> 
> Thoughts?

Yes, that makes it much better. Thanks!

> 
> --
> Best regards,
> Pavel Shilovsky
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Deadlock between cifs_writev_requeue and cifs_writepages
  2019-10-28 22:51   ` Pavel Shilovsky
@ 2019-11-05  2:11     ` Pavel Shilovsky
  0 siblings, 0 replies; 7+ messages in thread
From: Pavel Shilovsky @ 2019-11-05  2:11 UTC (permalink / raw)
  To: Frank Sorenson, Ronnie Sahlberg; +Cc: linux-cifs, Steve French, Jeff Layton

Mon, 28 Oct 2019 at 15:51, Pavel Shilovsky <piastryyy@gmail.com>:
>
> Mon, 28 Oct 2019 at 05:13, Frank Sorenson <sorenson@redhat.com>:
> >
> > On 10/26/19 4:04 PM, Ronnie Sahlberg wrote:
> > > Steve, Pavel, Frank
> > >
> > > This patch moves the logic in cifsFileInfo_put() into a work-queue so that
> > > it will be processed in a different thread which, importantly, does not hold
> > > any other locks.
> > >
> > > This should address the deadlock that Frank reported in the thread:
> > > Yet another hang after network disruption or smb restart -- multiple file writers
> >
> > Pavel,
> >
> > Thanks for understanding my report and translating it into what it meant.
> >
> > Ronnie,
> > Thanks for the patch.  I'm happy to say that my attempt at the same
> > looks like it would have been similar, had I known what I was doing to
> > begin with :)
> >
> >
> > Unfortunately, I think the patch really only moves the problem from one
> > place to another.  I'm guessing that means we have a second underlying
> > deadlock.  (this reproducer appears to be quite effective)
> >
> >
> > On the plus side, only the one file involved is jammed in the deadlock
> > now, as opposed to the parent directory as well, so it is at least progress.
> >
>
> Hi Frank, Ronnie,
>
> Ok, it seems there is another bug resulting in a deadlock between a
> page lock and writeback (WB) flag in cifs_writev_complete and
> cifs_writepages.
>
> 1) cifs_writev_complete. In the case of WB_SYNC_ALL and EAGAIN
> (reconnect) it calls cifs_writev_requeue. At this point it still has
> WB flag set for all the pages being retried. Inside
> cifs_writev_requeue it tries to lock the page:
>
> kworker/4:3+cifsiod
> #0 __schedule at ffffffffab4f1a45
> #1 schedule at ffffffffab4f1ea0
> #2 io_schedule at ffffffffab4f22a2
> #3 __lock_page at ffffffffaae3f66f
> #4 cifs_writev_complete at ffffffffc07df762
> #5 process_one_work at ffffffffaacecc11
> #6 worker_thread at ffffffffaaced0f0
> #7 kthread at ffffffffaacf2ac2
> #8 ret_from_fork at ffffffffab600215
>
> 2) cifs_writepages. It locks the page and then waits on WB flag to be
> cleared before processing pages:
>
> 1 openloop.sh
> #0 __schedule at ffffffffab4f1a45
> #1 schedule at ffffffffab4f1ea0
> #2 io_schedule at ffffffffab4f22a2
> #3 wait_on_page_bit at ffffffffaae3fbc6
> #4 cifs_writepages at ffffffffc07ffc73
> #5 do_writepages at ffffffffaae4ba71
> #6 __filemap_fdatawrite_range at ffffffffaae4321b
> #7 filemap_write_and_wait at ffffffffaae432aa
> #8 cifs_flush at ffffffffc0800fe3
> #9 filp_close at ffffffffaaeed181
> #10 do_dup2 at ffffffffaaf12c0b
> #11 __x64_sys_dup2 at ffffffffaaf1306a
> #12 do_syscall_64 at ffffffffaac04345
>
> The bug is in cifs_writev_complete and cifs_writev_requeue: the 1st
> needs to clear the WB flag and re-dirty the pages before retrying
> the IO; the 2nd additionally needs to skip pages that are no longer
> dirty (e.g. written by another thread) and re-set the WB flag before
> sending pages over the wire.
>
> --
> Best regards,
> Pavel Shilovsky

(re-titling and cc'ing Steve and Jeff to comment on this)

I looked at other file systems in the kernel. Different modules do
writepages differently, e.g. CephFS doesn't unlock pages after sending
and only does that once the IO completes. We may probably do the same:
postpone unlocking pages until we have successfully written them to
the server. In this case the retry code path (cifs_writev_requeue)
won't need to lock the pages again and thus the deadlock will be
avoided.

Thoughts?

--
Best regards,
Pavel Shilovsky

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-26 21:04 [PATCH 0/1] cifs: move cifsFileInfo_put logic into a work-queue Ronnie Sahlberg
2019-10-26 21:04 ` [PATCH] " Ronnie Sahlberg
2019-10-28 23:18   ` Pavel Shilovsky
2019-10-29  3:09     ` Ronnie Sahlberg
2019-10-27 23:52 ` [PATCH 0/1] " Frank Sorenson
2019-10-28 22:51   ` Pavel Shilovsky
2019-11-05  2:11     ` Deadlock between cifs_writev_requeue and cifs_writepages Pavel Shilovsky

Linux-CIFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-cifs/0 linux-cifs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-cifs linux-cifs/ https://lore.kernel.org/linux-cifs \
		linux-cifs@vger.kernel.org
	public-inbox-index linux-cifs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-cifs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git