* [PATCH 0/5] Misc NVMe lockdep fixes
@ 2018-05-15  7:40 Johannes Thumshirn
  2018-05-15  7:40 ` [PATCHv2 1/5] nvme: fix lockdep warning in nvme_mpath_clear_current_path Johannes Thumshirn
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Johannes Thumshirn @ 2018-05-15  7:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Christoph Hellwig, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist, Hannes Reinecke, Johannes Thumshirn

I've "accidently" activated lockdep and ran some tests on nvme, here's
the post-fallout patch series.

Johannes Thumshirn (5):
  nvme: fix lockdep warning in nvme_mpath_clear_current_path
  nvme: don't hold nvmf_transports_rwsem for more than transport lookups
  nvme: call nvmf_create_ctrl before checking for duplicate assignment
  nvmet: use atomic allocations when allocating fc requests
  nvmet: fcloop: use irqsave spinlocks

 drivers/nvme/host/fabrics.c  | 20 +++++++---------
 drivers/nvme/host/nvme.h     |  3 ++-
 drivers/nvme/target/fcloop.c | 54 ++++++++++++++++++++++++--------------------
 3 files changed, 40 insertions(+), 37 deletions(-)

-- 
2.16.3

* [PATCHv2 1/5] nvme: fix lockdep warning in nvme_mpath_clear_current_path
  2018-05-15  7:40 [PATCH 0/5] Misc NVMe lockdep fixes Johannes Thumshirn
@ 2018-05-15  7:40 ` Johannes Thumshirn
  2018-05-15  8:11   ` Christoph Hellwig
  2018-05-31  9:22   ` Sagi Grimberg
  2018-05-15  7:40 ` [PATCH 2/5] nvme: don't hold nvmf_transports_rwsem for more than transport lookups Johannes Thumshirn
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Johannes Thumshirn @ 2018-05-15  7:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Christoph Hellwig, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist, Hannes Reinecke, Johannes Thumshirn

When running blktest's nvme/005 with a lockdep enabled kernel the test
case fails due to the following lockdep splat in dmesg:

[   18.206166] =============================
[   18.207286] WARNING: suspicious RCU usage
[   18.208417] 4.17.0-rc5 #881 Not tainted
[   18.209487] -----------------------------
[   18.210612] drivers/nvme/host/nvme.h:457 suspicious rcu_dereference_check() usage!
[   18.213486]
[   18.213486] other info that might help us debug this:
[   18.213486]
[   18.214745]
[   18.214745] rcu_scheduler_active = 2, debug_locks = 1
[   18.215798] 3 locks held by kworker/u32:5/1102:
[   18.216535]  #0:         (ptrval) ((wq_completion)"nvme-wq"){+.+.}, at: process_one_work+0x152/0x5c0
[   18.217983]  #1:         (ptrval) ((work_completion)(&ctrl->scan_work)){+.+.}, at: process_one_work+0x152/0x5c0
[   18.219584]  #2:         (ptrval) (&subsys->lock#2){+.+.}, at: nvme_ns_remove+0x43/0x1c0 [nvme_core]
[   18.221037]
[   18.221037] stack backtrace:
[   18.221721] CPU: 12 PID: 1102 Comm: kworker/u32:5 Not tainted 4.17.0-rc5 #881
[   18.222830] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[   18.224451] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[   18.225308] Call Trace:
[   18.225704]  dump_stack+0x78/0xb3
[   18.226224]  nvme_ns_remove+0x1a3/0x1c0 [nvme_core]
[   18.226975]  nvme_validate_ns+0x87/0x850 [nvme_core]
[   18.227749]  ? blk_queue_exit+0x69/0x110
[   18.228358]  ? blk_queue_exit+0x81/0x110
[   18.228960]  ? direct_make_request+0x1a0/0x1a0
[   18.229649]  nvme_scan_work+0x212/0x2d0 [nvme_core]
[   18.230411]  process_one_work+0x1d8/0x5c0
[   18.231037]  ? process_one_work+0x152/0x5c0
[   18.231705]  worker_thread+0x45/0x3e0
[   18.232282]  kthread+0x101/0x140
[   18.232788]  ? process_one_work+0x5c0/0x5c0

The only caller of nvme_mpath_clear_current_path() is nvme_ns_remove(),
which holds the subsys lock, so this is likely a false positive. But by
using rcu_access_pointer() we tell RCU and lockdep that we're only
after the pointer value and never dereference it.
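
For reference, the accessors involved look roughly like this (simplified
from include/linux/srcu.h and include/linux/rcupdate.h; exact form may
differ between kernel versions):

	/* srcu_dereference() warns unless the SRCU read lock is held ... */
	#define srcu_dereference_check(p, sp, c) \
		__rcu_dereference_check((p), (c) || srcu_read_lock_held(sp), __rcu)
	#define srcu_dereference(p, sp)	srcu_dereference_check((p), (sp), 0)

	/*
	 * ... while rcu_access_pointer() only fetches the pointer value and
	 * has no lockdep condition at all, which is fine here because the
	 * pointer is only compared against ns and never dereferenced.
	 */
	#define rcu_access_pointer(p)	__rcu_access_pointer((p), __rcu)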

Fixes: 32acab3181c7 ("nvme: implement multipath access to nvme subsystems")
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

---
Changes to v1:
- Change rcu_dereference_protected() to rcu_access_pointer() (Paul)
---
 drivers/nvme/host/nvme.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 17d2f7cf3fed..af2bb6bc984d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -22,6 +22,7 @@
 #include <linux/lightnvm.h>
 #include <linux/sed-opal.h>
 #include <linux/fault-inject.h>
+#include <linux/rcupdate.h>
 
 extern unsigned int nvme_io_timeout;
 #define NVME_IO_TIMEOUT	(nvme_io_timeout * HZ)
@@ -454,7 +455,7 @@ static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
 {
 	struct nvme_ns_head *head = ns->head;
 
-	if (head && ns == srcu_dereference(head->current_path, &head->srcu))
+	if (head && ns == rcu_access_pointer(head->current_path))
 		rcu_assign_pointer(head->current_path, NULL);
 }
 struct nvme_ns *nvme_find_path(struct nvme_ns_head *head);
-- 
2.16.3

* [PATCH 2/5] nvme: don't hold nvmf_transports_rwsem for more than transport lookups
  2018-05-15  7:40 [PATCH 0/5] Misc NVMe lockdep fixes Johannes Thumshirn
  2018-05-15  7:40 ` [PATCHv2 1/5] nvme: fix lockdep warning in nvme_mpath_clear_current_path Johannes Thumshirn
@ 2018-05-15  7:40 ` Johannes Thumshirn
  2018-05-15  7:48   ` Christoph Hellwig
  2018-05-15  7:40 ` [PATCH 3/5] nvme: call nvmf_create_ctrl before checking for duplicate assignment Johannes Thumshirn
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Johannes Thumshirn @ 2018-05-15  7:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Christoph Hellwig, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist, Hannes Reinecke, Johannes Thumshirn

Only take nvmf_transports_rwsem while looking up the registered
transports. Subsequent actions may take the hctx_lock, which is an
rcu_read_lock(), and lockdep thus complains about suspicious RCU usage.

[    9.703333] =============================
[    9.704797] WARNING: suspicious RCU usage
[    9.706250] 4.17.0-rc5+ #883 Not tainted
[    9.708146] -----------------------------
[    9.708868] ./include/linux/rcupdate.h:304 Illegal context switch in RCU read-side critical section!
[    9.710511]
[    9.710511] other info that might help us debug this:
[    9.710511]
[    9.711959]
[    9.711959] rcu_scheduler_active = 2, debug_locks = 1
[    9.713142] 3 locks held by nvme/1420:
[    9.713800]  #0:         (ptrval) (nvmf_dev_mutex){+.+.}, at: nvmf_dev_write+0x6a/0xb7d [nvme_fabrics]
[    9.715471]  #1:         (ptrval) (nvmf_transports_rwsem){++++}, at: nvmf_dev_write+0x6e5/0xb7d [nvme_fabrics]
[    9.717279]  #2:         (ptrval) (rcu_read_lock){....}, at: hctx_lock+0x56/0xd0
[    9.718636]
[    9.718636] stack backtrace:
[    9.720266] CPU: 2 PID: 1420 Comm: nvme Not tainted 4.17.0-rc5+ #883
[    9.721446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[    9.723003] Call Trace:
[    9.723453]  dump_stack+0x78/0xb3
[    9.724059]  ___might_sleep+0xde/0x250
[    9.724749]  kmem_cache_alloc_trace+0x1ae/0x270
[    9.725565]  fcloop_fcp_req+0x32/0x1a0 [nvme_fcloop]
[    9.726428]  nvme_fc_start_fcp_op.part.39+0x193/0x4c0 [nvme_fc]
[    9.727425]  blk_mq_dispatch_rq_list+0x7f/0x4a0
[    9.728219]  ? blk_mq_flush_busy_ctxs+0xa8/0xf0
[    9.729035]  blk_mq_sched_dispatch_requests+0x16e/0x170
[    9.729984]  __blk_mq_run_hw_queue+0x79/0xd0
[    9.730737]  __blk_mq_delay_run_hw_queue+0x11c/0x160
[    9.731647]  blk_mq_run_hw_queue+0x63/0xc0
[    9.732357]  blk_mq_sched_insert_request+0xb2/0x140
[    9.733204]  blk_execute_rq+0x64/0xc0
[    9.733840]  __nvme_submit_sync_cmd+0x63/0xd0 [nvme_core]
[    9.734772]  nvmf_connect_admin_queue+0x11e/0x190 [nvme_fabrics]
[    9.735815]  ? mark_held_locks+0x6b/0x90
[    9.736504]  nvme_fc_create_association+0x35b/0x970 [nvme_fc]
[    9.737489]  nvme_fc_create_ctrl+0x5d2/0x830 [nvme_fc]
[    9.738379]  nvmf_dev_write+0x92d/0xb7d [nvme_fabrics]
[    9.739253]  __vfs_write+0x21/0x130
[    9.739895]  ? selinux_file_permission+0xe9/0x140
[    9.740698]  ? security_file_permission+0x2f/0xb0
[    9.741526]  vfs_write+0xbd/0x1c0
[    9.742095]  ksys_write+0x40/0xa0
[    9.742681]  ? do_syscall_64+0xd/0x190
[    9.743335]  do_syscall_64+0x51/0x190
[    9.744415]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/nvme/host/fabrics.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 7ae732a77fe8..febf82639b40 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -957,16 +957,17 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
 
 	down_read(&nvmf_transports_rwsem);
 	ops = nvmf_lookup_transport(opts);
+	up_read(&nvmf_transports_rwsem);
 	if (!ops) {
 		pr_info("no handler found for transport %s.\n",
 			opts->transport);
 		ret = -EINVAL;
-		goto out_unlock;
+		goto out_free_opts;
 	}
 
 	if (!try_module_get(ops->module)) {
 		ret = -EBUSY;
-		goto out_unlock;
+		goto out_free_opts;
 	}
 
 	ret = nvmf_check_required_opts(opts, ops->required_opts);
@@ -988,19 +989,15 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
 			"controller returned incorrect NQN: \"%s\".\n",
 			ctrl->subsys->subnqn);
 		module_put(ops->module);
-		up_read(&nvmf_transports_rwsem);
 		nvme_delete_ctrl_sync(ctrl);
 		return ERR_PTR(-EINVAL);
 	}
 
 	module_put(ops->module);
-	up_read(&nvmf_transports_rwsem);
 	return ctrl;
 
 out_module_put:
 	module_put(ops->module);
-out_unlock:
-	up_read(&nvmf_transports_rwsem);
 out_free_opts:
 	nvmf_free_options(opts);
 	return ERR_PTR(ret);
-- 
2.16.3

* [PATCH 3/5] nvme: call nvmf_create_ctrl before checking for duplicate assignment
  2018-05-15  7:40 [PATCH 0/5] Misc NVMe lockdep fixes Johannes Thumshirn
  2018-05-15  7:40 ` [PATCHv2 1/5] nvme: fix lockdep warning in nvme_mpath_clear_current_path Johannes Thumshirn
  2018-05-15  7:40 ` [PATCH 2/5] nvme: don't hold nvmf_transports_rwsem for more than transport lookups Johannes Thumshirn
@ 2018-05-15  7:40 ` Johannes Thumshirn
  2018-05-15  7:50   ` Christoph Hellwig
  2018-05-15  7:40 ` [PATCH 4/5] nvmet: use atomic allocations when allocating fc requests Johannes Thumshirn
  2018-05-15  7:40 ` [PATCH 5/5] nvmet: fcloop: use irqsave spinlocks Johannes Thumshirn
  4 siblings, 1 reply; 19+ messages in thread
From: Johannes Thumshirn @ 2018-05-15  7:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Christoph Hellwig, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist, Hannes Reinecke, Johannes Thumshirn

In nvmf_dev_write() we check whether the /dev/nvme-fabrics device
node's private data is already set and only afterwards create the
controller data structure. The private data is protected by
nvmf_dev_mutex, but there is no need to hold it while calling
nvmf_create_ctrl().

This also reduces the number of lockdep complaints in the 'nvme
connect' with fcloop scenario.

[    9.703333] =============================
[    9.704797] WARNING: suspicious RCU usage
[    9.706250] 4.17.0-rc5+ #883 Not tainted
[    9.708146] -----------------------------
[    9.708868] ./include/linux/rcupdate.h:304 Illegal context switch in RCU read-side critical section!
[    9.710511]
[    9.710511] other info that might help us debug this:
[    9.710511]
[    9.711959]
[    9.711959] rcu_scheduler_active = 2, debug_locks = 1
[    9.713142] 3 locks held by nvme/1420:
[    9.713800]  #0:         (ptrval) (nvmf_dev_mutex){+.+.}, at: nvmf_dev_write+0x6a/0xb7d [nvme_fabrics]
[    9.717279]  #1:         (ptrval) (rcu_read_lock){....}, at: hctx_lock+0x56/0xd0
[    9.718636]
[    9.718636] stack backtrace:
[    9.720266] CPU: 2 PID: 1420 Comm: nvme Not tainted 4.17.0-rc5+ #883
[    9.721446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[    9.723003] Call Trace:
[    9.723453]  dump_stack+0x78/0xb3
[    9.724059]  ___might_sleep+0xde/0x250
[    9.724749]  kmem_cache_alloc_trace+0x1ae/0x270
[    9.725565]  fcloop_fcp_req+0x32/0x1a0 [nvme_fcloop]
[    9.726428]  nvme_fc_start_fcp_op.part.39+0x193/0x4c0 [nvme_fc]
[    9.727425]  blk_mq_dispatch_rq_list+0x7f/0x4a0
[    9.728219]  ? blk_mq_flush_busy_ctxs+0xa8/0xf0
[    9.729035]  blk_mq_sched_dispatch_requests+0x16e/0x170
[    9.729984]  __blk_mq_run_hw_queue+0x79/0xd0
[    9.730737]  __blk_mq_delay_run_hw_queue+0x11c/0x160
[    9.731647]  blk_mq_run_hw_queue+0x63/0xc0
[    9.732357]  blk_mq_sched_insert_request+0xb2/0x140
[    9.733204]  blk_execute_rq+0x64/0xc0
[    9.733840]  __nvme_submit_sync_cmd+0x63/0xd0 [nvme_core]
[    9.734772]  nvmf_connect_admin_queue+0x11e/0x190 [nvme_fabrics]
[    9.735815]  ? mark_held_locks+0x6b/0x90
[    9.736504]  nvme_fc_create_association+0x35b/0x970 [nvme_fc]
[    9.737489]  nvme_fc_create_ctrl+0x5d2/0x830 [nvme_fc]
[    9.738379]  nvmf_dev_write+0x92d/0xb7d [nvme_fabrics]
[    9.739253]  __vfs_write+0x21/0x130
[    9.739895]  ? selinux_file_permission+0xe9/0x140
[    9.740698]  ? security_file_permission+0x2f/0xb0
[    9.741526]  vfs_write+0xbd/0x1c0
[    9.742095]  ksys_write+0x40/0xa0
[    9.742681]  ? do_syscall_64+0xd/0x190
[    9.743335]  do_syscall_64+0x51/0x190
[    9.744415]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/nvme/host/fabrics.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index febf82639b40..757a49b9c5a8 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -1022,18 +1022,17 @@ static ssize_t nvmf_dev_write(struct file *file, const char __user *ubuf,
 	if (IS_ERR(buf))
 		return PTR_ERR(buf);
 
+	ctrl = nvmf_create_ctrl(nvmf_device, buf, count);
+	if (IS_ERR(ctrl))
+		return PTR_ERR(ctrl);
+
 	mutex_lock(&nvmf_dev_mutex);
 	if (seq_file->private) {
+		nvme_delete_ctrl_sync(ctrl);
 		ret = -EINVAL;
 		goto out_unlock;
 	}
 
-	ctrl = nvmf_create_ctrl(nvmf_device, buf, count);
-	if (IS_ERR(ctrl)) {
-		ret = PTR_ERR(ctrl);
-		goto out_unlock;
-	}
-
 	seq_file->private = ctrl;
 
 out_unlock:
-- 
2.16.3

* [PATCH 4/5] nvmet: use atomic allocations when allocating fc requests
  2018-05-15  7:40 [PATCH 0/5] Misc NVMe lockdep fixes Johannes Thumshirn
                   ` (2 preceding siblings ...)
  2018-05-15  7:40 ` [PATCH 3/5] nvme: call nvmf_create_ctrl before checking for duplicate assignment Johannes Thumshirn
@ 2018-05-15  7:40 ` Johannes Thumshirn
  2018-05-31  9:31   ` Sagi Grimberg
  2018-06-12 20:39   ` James Smart
  2018-05-15  7:40 ` [PATCH 5/5] nvmet: fcloop: use irqsave spinlocks Johannes Thumshirn
  4 siblings, 2 replies; 19+ messages in thread
From: Johannes Thumshirn @ 2018-05-15  7:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Christoph Hellwig, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist, Hannes Reinecke, Johannes Thumshirn

fcloop_fcp_req() runs with the hctx_lock (a rcu_read_lock() locked
section) held, so memory allocations done in this context have to be
atomic.

This fixes the following lockdep complaint:
[    9.753313] BUG: sleeping function called from invalid context at mm/slab.h:421
[    9.754518] in_atomic(): 1, irqs_disabled(): 0, pid: 1420, name: nvme
[    9.755613] 3 locks held by nvme/1420:
[    9.756221]  #0:         (ptrval) (nvmf_dev_mutex){+.+.}, at: nvmf_dev_write+0x6a/0xb7d [nvme_fabrics]
[    9.757575]  #1:         (ptrval) (nvmf_transports_rwsem){++++}, at: nvmf_dev_write+0x6e5/0xb7d [nvme_fabrics]
[    9.759000]  #2:         (ptrval) (rcu_read_lock){....}, at: hctx_lock+0x56/0xd0
[    9.760141] CPU: 2 PID: 1420 Comm: nvme Not tainted 4.17.0-rc5+ #883
[    9.761078] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[    9.762624] Call Trace:
[    9.763021]  dump_stack+0x78/0xb3
[    9.763505]  ___might_sleep+0x227/0x250
[    9.764115]  kmem_cache_alloc_trace+0x1ae/0x270
[    9.764793]  fcloop_fcp_req+0x32/0x1a0 [nvme_fcloop]
[    9.765561]  nvme_fc_start_fcp_op.part.39+0x193/0x4c0 [nvme_fc]
[    9.766480]  blk_mq_dispatch_rq_list+0x7f/0x4a0
[    9.767163]  ? blk_mq_flush_busy_ctxs+0xa8/0xf0
[    9.767871]  blk_mq_sched_dispatch_requests+0x16e/0x170
[    9.768644]  __blk_mq_run_hw_queue+0x79/0xd0
[    9.769294]  __blk_mq_delay_run_hw_queue+0x11c/0x160
[    9.770012]  blk_mq_run_hw_queue+0x63/0xc0
[    9.770667]  blk_mq_sched_insert_request+0xb2/0x140
[    9.771399]  blk_execute_rq+0x64/0xc0
[    9.771990]  __nvme_submit_sync_cmd+0x63/0xd0 [nvme_core]
[    9.772765]  nvmf_connect_admin_queue+0x11e/0x190 [nvme_fabrics]
[    9.773659]  ? mark_held_locks+0x6b/0x90
[    9.774798]  nvme_fc_create_association+0x35b/0x970 [nvme_fc]
[    9.775631]  nvme_fc_create_ctrl+0x5d2/0x830 [nvme_fc]
[    9.776423]  nvmf_dev_write+0x92d/0xb7d [nvme_fabrics]
[    9.777188]  __vfs_write+0x21/0x130
[    9.777725]  ? selinux_file_permission+0xe9/0x140
[    9.778417]  ? security_file_permission+0x2f/0xb0
[    9.779158]  vfs_write+0xbd/0x1c0
[    9.779644]  ksys_write+0x40/0xa0
[    9.780184]  ? do_syscall_64+0xd/0x190
[    9.780736]  do_syscall_64+0x51/0x190
[    9.781281]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/nvme/target/fcloop.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
index 34712def81b1..d2209c60f95f 100644
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -509,7 +509,7 @@ fcloop_fcp_req(struct nvme_fc_local_port *localport,
 	if (!rport->targetport)
 		return -ECONNREFUSED;
 
-	tfcp_req = kzalloc(sizeof(*tfcp_req), GFP_KERNEL);
+	tfcp_req = kzalloc(sizeof(*tfcp_req), GFP_ATOMIC);
 	if (!tfcp_req)
 		return -ENOMEM;
 
-- 
2.16.3

* [PATCH 5/5] nvmet: fcloop: use irqsave spinlocks
  2018-05-15  7:40 [PATCH 0/5] Misc NVMe lockdep fixes Johannes Thumshirn
                   ` (3 preceding siblings ...)
  2018-05-15  7:40 ` [PATCH 4/5] nvmet: use atomic allocations when allocating fc requests Johannes Thumshirn
@ 2018-05-15  7:40 ` Johannes Thumshirn
  2018-05-31  9:37   ` Sagi Grimberg
                     ` (2 more replies)
  4 siblings, 3 replies; 19+ messages in thread
From: Johannes Thumshirn @ 2018-05-15  7:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Christoph Hellwig, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist, Hannes Reinecke, Johannes Thumshirn,
	James Smart

Lockdep complains about inconsistent hardirq states when using
nvme-fcloop. So use the irqsave variant of spin_locks for the
nvmefc_fcp_req's reqlock.

Here's the lockdep report:
[   11.138417] ================================
[   11.139085] WARNING: inconsistent lock state
[   11.139207] nvmet: creating controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:046394e7-bff7-4403-9502-1816800efaa0.
[   11.139727] 4.17.0-rc5+ #883 Not tainted
[   11.139732] --------------------------------
[   11.144441] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[   11.145341] swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
[   11.146108]         (ptrval) (&(&tfcp_req->reqlock)->rlock){?.+.}, at: fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
[   11.148633] {HARDIRQ-ON-W} state was registered at:
[   11.149380]   _raw_spin_lock+0x34/0x70
[   11.149940]   fcloop_fcp_recv_work+0x17/0x90 [nvme_fcloop]
[   11.150746]   process_one_work+0x1d8/0x5c0
[   11.151346]   worker_thread+0x45/0x3e0
[   11.151886]   kthread+0x101/0x140
[   11.152370]   ret_from_fork+0x3a/0x50
[   11.152903] irq event stamp: 36666
[   11.153402] hardirqs last  enabled at (36663): [<ffffffffafc601c3>] default_idle+0x13/0x180
[   11.154601] hardirqs last disabled at (36664): [<ffffffffafe00964>] interrupt_entry+0xc4/0xe0
[   11.155817] softirqs last  enabled at (36666): [<ffffffffaf267409>] irq_enter+0x59/0x60
[   11.156952] softirqs last disabled at (36665): [<ffffffffaf2673ee>] irq_enter+0x3e/0x60
[   11.158092]
[   11.158092] other info that might help us debug this:
[   11.159436]  Possible unsafe locking scenario:
[   11.159436]
[   11.160299]        CPU0
[   11.160663]        ----
[   11.161026]   lock(&(&tfcp_req->reqlock)->rlock);
[   11.161709]   <Interrupt>
[   11.162091]     lock(&(&tfcp_req->reqlock)->rlock);
[   11.163148]
[   11.163148]  *** DEADLOCK ***
[   11.163148]
[   11.164007] no locks held by swapper/8/0.
[   11.164596]
[   11.164596] stack backtrace:
[   11.165238] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.17.0-rc5+ #883
[   11.166180] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[   11.167673] Call Trace:
[   11.168037]  <IRQ>
[   11.168349]  dump_stack+0x78/0xb3
[   11.168864]  print_usage_bug+0x1ed/0x1fe
[   11.169440]  ? check_usage_backwards+0x110/0x110
[   11.170111]  mark_lock+0x263/0x2a0
[   11.170995]  __lock_acquire+0x675/0x1870
[   11.171865]  ? __lock_acquire+0x48d/0x1870
[   11.172460]  lock_acquire+0xd4/0x220
[   11.172981]  ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
[   11.173709]  _raw_spin_lock+0x34/0x70
[   11.174243]  ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
[   11.174978]  fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
[   11.175673]  nvmet_fc_transfer_fcp_data+0x9b/0x130 [nvmet_fc]
[   11.176507]  nvmet_req_complete+0x10/0x110 [nvmet]
[   11.177210]  nvmet_bio_done+0x23/0x40 [nvmet]
[   11.177837]  blk_update_request+0xab/0x3b0
[   11.178434]  blk_mq_end_request+0x13/0x60
[   11.179033]  flush_smp_call_function_queue+0x58/0x150
[   11.179755]  smp_call_function_single_interrupt+0x49/0x260
[   11.180531]  call_function_single_interrupt+0xf/0x20
[   11.181236]  </IRQ>
[   11.181542] RIP: 0010:native_safe_halt+0x2/0x10
[   11.182186] RSP: 0018:ffffa481006cbec0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff04
[   11.183265] RAX: ffff9f54f8b86200 RBX: ffff9f54f8b86200 RCX: 0000000000000000
[   11.184264] RDX: ffff9f54f8b86200 RSI: 0000000000000001 RDI: ffff9f54f8b86200
[   11.185270] RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000000
[   11.186271] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[   11.187280] R13: 0000000000000000 R14: ffff9f54f8b86200 R15: ffff9f54f8b86200
[   11.188280]  default_idle+0x18/0x180
[   11.188806]  do_idle+0x176/0x260
[   11.189273]  cpu_startup_entry+0x5a/0x60
[   11.189832]  start_secondary+0x18f/0x1b0

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: James Smart <james.smart@broadcom.com>
---
 drivers/nvme/target/fcloop.c | 52 ++++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
index d2209c60f95f..f4a3700201c1 100644
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -406,9 +406,10 @@ fcloop_fcp_recv_work(struct work_struct *work)
 		container_of(work, struct fcloop_fcpreq, fcp_rcv_work);
 	struct nvmefc_fcp_req *fcpreq = tfcp_req->fcpreq;
 	int ret = 0;
+	unsigned long flags;
 	bool aborted = false;
 
-	spin_lock(&tfcp_req->reqlock);
+	spin_lock_irqsave(&tfcp_req->reqlock, flags);
 	switch (tfcp_req->inistate) {
 	case INI_IO_START:
 		tfcp_req->inistate = INI_IO_ACTIVE;
@@ -417,11 +418,11 @@ fcloop_fcp_recv_work(struct work_struct *work)
 		aborted = true;
 		break;
 	default:
-		spin_unlock(&tfcp_req->reqlock);
+		spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
 		WARN_ON(1);
 		return;
 	}
-	spin_unlock(&tfcp_req->reqlock);
+	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
 
 	if (unlikely(aborted))
 		ret = -ECANCELED;
@@ -442,8 +443,9 @@ fcloop_fcp_abort_recv_work(struct work_struct *work)
 		container_of(work, struct fcloop_fcpreq, abort_rcv_work);
 	struct nvmefc_fcp_req *fcpreq;
 	bool completed = false;
+	unsigned long flags;
 
-	spin_lock(&tfcp_req->reqlock);
+	spin_lock_irqsave(&tfcp_req->reqlock, flags);
 	fcpreq = tfcp_req->fcpreq;
 	switch (tfcp_req->inistate) {
 	case INI_IO_ABORTED:
@@ -452,11 +454,11 @@ fcloop_fcp_abort_recv_work(struct work_struct *work)
 		completed = true;
 		break;
 	default:
-		spin_unlock(&tfcp_req->reqlock);
+		spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
 		WARN_ON(1);
 		return;
 	}
-	spin_unlock(&tfcp_req->reqlock);
+	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
 
 	if (unlikely(completed)) {
 		/* remove reference taken in original abort downcall */
@@ -468,9 +470,9 @@ fcloop_fcp_abort_recv_work(struct work_struct *work)
 		nvmet_fc_rcv_fcp_abort(tfcp_req->tport->targetport,
 					&tfcp_req->tgt_fcp_req);
 
-	spin_lock(&tfcp_req->reqlock);
+	spin_lock_irqsave(&tfcp_req->reqlock, flags);
 	tfcp_req->fcpreq = NULL;
-	spin_unlock(&tfcp_req->reqlock);
+	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
 
 	fcloop_call_host_done(fcpreq, tfcp_req, -ECANCELED);
 	/* call_host_done releases reference for abort downcall */
@@ -486,11 +488,12 @@ fcloop_tgt_fcprqst_done_work(struct work_struct *work)
 	struct fcloop_fcpreq *tfcp_req =
 		container_of(work, struct fcloop_fcpreq, tio_done_work);
 	struct nvmefc_fcp_req *fcpreq;
+	unsigned long flags;
 
-	spin_lock(&tfcp_req->reqlock);
+	spin_lock_irqsave(&tfcp_req->reqlock, flags);
 	fcpreq = tfcp_req->fcpreq;
 	tfcp_req->inistate = INI_IO_COMPLETED;
-	spin_unlock(&tfcp_req->reqlock);
+	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
 
 	fcloop_call_host_done(fcpreq, tfcp_req, tfcp_req->status);
 }
@@ -594,13 +597,14 @@ fcloop_fcp_op(struct nvmet_fc_target_port *tgtport,
 	u32 rsplen = 0, xfrlen = 0;
 	int fcp_err = 0, active, aborted;
 	u8 op = tgt_fcpreq->op;
+	unsigned long flags;
 
-	spin_lock(&tfcp_req->reqlock);
+	spin_lock_irqsave(&tfcp_req->reqlock, flags);
 	fcpreq = tfcp_req->fcpreq;
 	active = tfcp_req->active;
 	aborted = tfcp_req->aborted;
 	tfcp_req->active = true;
-	spin_unlock(&tfcp_req->reqlock);
+	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
 
 	if (unlikely(active))
 		/* illegal - call while i/o active */
@@ -608,9 +612,9 @@ fcloop_fcp_op(struct nvmet_fc_target_port *tgtport,
 
 	if (unlikely(aborted)) {
 		/* target transport has aborted i/o prior */
-		spin_lock(&tfcp_req->reqlock);
+		spin_lock_irqsave(&tfcp_req->reqlock, flags);
 		tfcp_req->active = false;
-		spin_unlock(&tfcp_req->reqlock);
+		spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
 		tgt_fcpreq->transferred_length = 0;
 		tgt_fcpreq->fcp_error = -ECANCELED;
 		tgt_fcpreq->done(tgt_fcpreq);
@@ -666,9 +670,9 @@ fcloop_fcp_op(struct nvmet_fc_target_port *tgtport,
 		break;
 	}
 
-	spin_lock(&tfcp_req->reqlock);
+	spin_lock_irqsave(&tfcp_req->reqlock, flags);
 	tfcp_req->active = false;
-	spin_unlock(&tfcp_req->reqlock);
+	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
 
 	tgt_fcpreq->transferred_length = xfrlen;
 	tgt_fcpreq->fcp_error = fcp_err;
@@ -682,15 +686,16 @@ fcloop_tgt_fcp_abort(struct nvmet_fc_target_port *tgtport,
 			struct nvmefc_tgt_fcp_req *tgt_fcpreq)
 {
 	struct fcloop_fcpreq *tfcp_req = tgt_fcp_req_to_fcpreq(tgt_fcpreq);
+	unsigned long flags;
 
 	/*
 	 * mark aborted only in case there were 2 threads in transport
 	 * (one doing io, other doing abort) and only kills ops posted
 	 * after the abort request
 	 */
-	spin_lock(&tfcp_req->reqlock);
+	spin_lock_irqsave(&tfcp_req->reqlock, flags);
 	tfcp_req->aborted = true;
-	spin_unlock(&tfcp_req->reqlock);
+	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
 
 	tfcp_req->status = NVME_SC_INTERNAL;
 
@@ -726,19 +731,20 @@ fcloop_fcp_abort(struct nvme_fc_local_port *localport,
 	struct fcloop_ini_fcpreq *inireq = fcpreq->private;
 	struct fcloop_fcpreq *tfcp_req;
 	bool abortio = true;
+	unsigned long flags;
 
-	spin_lock(&inireq->inilock);
+	spin_lock_irqsave(&inireq->inilock, flags);
 	tfcp_req = inireq->tfcp_req;
 	if (tfcp_req)
 		fcloop_tfcp_req_get(tfcp_req);
-	spin_unlock(&inireq->inilock);
+	spin_unlock_irqrestore(&inireq->inilock, flags);
 
 	if (!tfcp_req)
 		/* abort has already been called */
 		return;
 
 	/* break initiator/target relationship for io */
-	spin_lock(&tfcp_req->reqlock);
+	spin_lock_irqsave(&tfcp_req->reqlock, flags);
 	switch (tfcp_req->inistate) {
 	case INI_IO_START:
 	case INI_IO_ACTIVE:
@@ -748,11 +754,11 @@ fcloop_fcp_abort(struct nvme_fc_local_port *localport,
 		abortio = false;
 		break;
 	default:
-		spin_unlock(&tfcp_req->reqlock);
+		spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
 		WARN_ON(1);
 		return;
 	}
-	spin_unlock(&tfcp_req->reqlock);
+	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
 
 	if (abortio)
 		/* leave the reference while the work item is scheduled */
-- 
2.16.3

* Re: [PATCH 2/5] nvme: don't hold nvmf_transports_rwsem for more than transport lookups
  2018-05-15  7:40 ` [PATCH 2/5] nvme: don't hold nvmf_transports_rwsem for more than transport lookups Johannes Thumshirn
@ 2018-05-15  7:48   ` Christoph Hellwig
  2018-05-31  9:24     ` Sagi Grimberg
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2018-05-15  7:48 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Keith Busch, Sagi Grimberg, Christoph Hellwig,
	Linux NVMe Mailinglist, Linux Kernel Mailinglist,
	Hannes Reinecke

> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index 7ae732a77fe8..febf82639b40 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -957,16 +957,17 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
>  
>  	down_read(&nvmf_transports_rwsem);
>  	ops = nvmf_lookup_transport(opts);
> +	up_read(&nvmf_transports_rwsem);

And what protects us from the transport getting unregistered right here
by another thread waiting to acquire nvmf_transports_rwsem?

* Re: [PATCH 3/5] nvme: call nvmf_create_ctrl before checking for duplicate assignment
  2018-05-15  7:40 ` [PATCH 3/5] nvme: call nvmf_create_ctrl before checking for duplicate assignment Johannes Thumshirn
@ 2018-05-15  7:50   ` Christoph Hellwig
  2018-05-15  7:52     ` Johannes Thumshirn
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2018-05-15  7:50 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Keith Busch, Sagi Grimberg, Christoph Hellwig,
	Linux NVMe Mailinglist, Linux Kernel Mailinglist,
	Hannes Reinecke

On Tue, May 15, 2018 at 09:40:41AM +0200, Johannes Thumshirn wrote:
> In nvmf_dev_write() we check whether the /dev/nvme-fabrics device
> node's private data is already set and only afterwards create the
> controller data structure. The private data is protected by
> nvmf_dev_mutex, but there is no need to hold it while calling
> nvmf_create_ctrl().
> 
> This also reduces the number of lockdep complaints in the 'nvme
> connect' with fcloop scenario.

Something looks seriously fishy in fcloop.  How do we end up in
an RCU critical section here?

In general it seems like fc loop needs to offload any I/O to a workqueue
just like nvme-loop does, but even then I can't see how that is going to
cause an issue in this area.

* Re: [PATCH 3/5] nvme: call nvmf_create_ctrl before checking for duplicate assignment
  2018-05-15  7:50   ` Christoph Hellwig
@ 2018-05-15  7:52     ` Johannes Thumshirn
  2018-05-31  9:26       ` Sagi Grimberg
  0 siblings, 1 reply; 19+ messages in thread
From: Johannes Thumshirn @ 2018-05-15  7:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Sagi Grimberg, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist, Hannes Reinecke

On Tue, May 15, 2018 at 09:50:55AM +0200, Christoph Hellwig wrote:
> On Tue, May 15, 2018 at 09:40:41AM +0200, Johannes Thumshirn wrote:
> > In nvmf_dev_write() we check whether the /dev/nvme-fabrics device
> > node's private data is already set and only afterwards create the
> > controller data structure. The private data is protected by
> > nvmf_dev_mutex, but there is no need to hold it while calling
> > nvmf_create_ctrl().
> > 
> > This also reduces the number of lockdep complaints in the 'nvme
> > connect' with fcloop scenario.
> 
> Something looks seriously fishy in fcloop.  How do we end up in
> an RCU critical section here?

__blk_mq_run_hw_queue() calls blk_mq_sched_dispatch_requests() with
the hctx_lock(), a.k.a. rcu_read_lock(), held.

nvmf_connect_admin_queue() does I/O which causes the above to be called.
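
Roughly (simplified from block/blk-mq.c around 4.17; the splat above shows
we take the rcu_read_lock() branch here):

	static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
	{
		if (!(hctx->flags & BLK_MQ_F_BLOCKING))
			rcu_read_lock();	/* non-blocking hw queue */
		else
			*srcu_idx = srcu_read_lock(hctx->srcu);
	}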

> In general it seems like fc loop needs to offload any I/O to a workqueue
> just like nvme-loop does, but even then I can't see how that is going to
> cause an issue in this area.

Hmm I'll be looking into it.

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

* Re: [PATCHv2 1/5] nvme: fix lockdep warning in nvme_mpath_clear_current_path
  2018-05-15  7:40 ` [PATCHv2 1/5] nvme: fix lockdep warning in nvme_mpath_clear_current_path Johannes Thumshirn
@ 2018-05-15  8:11   ` Christoph Hellwig
  2018-05-31  9:22   ` Sagi Grimberg
  1 sibling, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-05-15  8:11 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Keith Busch, Sagi Grimberg, Christoph Hellwig,
	Linux NVMe Mailinglist, Linux Kernel Mailinglist,
	Hannes Reinecke

Looks good, although I hate having the whole splat in the commit
message..

Reviewed-by: Christoph Hellwig <hch@lst.de>

* Re: [PATCHv2 1/5] nvme: fix lockdep warning in nvme_mpath_clear_current_path
  2018-05-15  7:40 ` [PATCHv2 1/5] nvme: fix lockdep warning in nvme_mpath_clear_current_path Johannes Thumshirn
  2018-05-15  8:11   ` Christoph Hellwig
@ 2018-05-31  9:22   ` Sagi Grimberg
  1 sibling, 0 replies; 19+ messages in thread
From: Sagi Grimberg @ 2018-05-31  9:22 UTC (permalink / raw)
  To: Johannes Thumshirn, Keith Busch
  Cc: Christoph Hellwig, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist, Hannes Reinecke

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

* Re: [PATCH 2/5] nvme: don't hold nvmf_transports_rwsem for more than transport lookups
  2018-05-15  7:48   ` Christoph Hellwig
@ 2018-05-31  9:24     ` Sagi Grimberg
  0 siblings, 0 replies; 19+ messages in thread
From: Sagi Grimberg @ 2018-05-31  9:24 UTC (permalink / raw)
  To: Christoph Hellwig, Johannes Thumshirn
  Cc: Keith Busch, Linux NVMe Mailinglist, Linux Kernel Mailinglist,
	Hannes Reinecke


>> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
>> index 7ae732a77fe8..febf82639b40 100644
>> --- a/drivers/nvme/host/fabrics.c
>> +++ b/drivers/nvme/host/fabrics.c
>> @@ -957,16 +957,17 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
>>   
>>   	down_read(&nvmf_transports_rwsem);
>>   	ops = nvmf_lookup_transport(opts);
>> +	up_read(&nvmf_transports_rwsem);
> 
> And what protects us from the transport getting unregistered right here
> by another thread waiting to acquire nvmf_transports_rwsem?

I think having the module_get inside as well would protect against it.
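
Something along those lines (untested sketch of that idea, reusing the
names from nvmf_create_ctrl(); error handling simplified):

	down_read(&nvmf_transports_rwsem);
	ops = nvmf_lookup_transport(opts);
	if (ops && !try_module_get(ops->module))
		ops = NULL;	/* transport is being unregistered */
	up_read(&nvmf_transports_rwsem);

	if (!ops) {
		pr_info("no handler found for transport %s.\n",
			opts->transport);
		ret = -EINVAL;
		goto out_free_opts;
	}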

* Re: [PATCH 3/5] nvme: call nvmf_create_ctrl before checking for duplicate assignment
  2018-05-15  7:52     ` Johannes Thumshirn
@ 2018-05-31  9:26       ` Sagi Grimberg
  0 siblings, 0 replies; 19+ messages in thread
From: Sagi Grimberg @ 2018-05-31  9:26 UTC (permalink / raw)
  To: Johannes Thumshirn, Christoph Hellwig
  Cc: Keith Busch, Linux NVMe Mailinglist, Linux Kernel Mailinglist,
	Hannes Reinecke


>> In general it seems like fc loop needs to offload any I/O to a workqueue
>> just like nvme-loop does, but even then I can't see how that is going to
>> cause an issue in this area.
> 
> Hmm I'll be looking into it.

FWIW, I agree it should do that as well.

* Re: [PATCH 4/5] nvmet: use atomic allocations when allocating fc requests
  2018-05-15  7:40 ` [PATCH 4/5] nvmet: use atomic allocations when allocating fc requests Johannes Thumshirn
@ 2018-05-31  9:31   ` Sagi Grimberg
  2018-06-12 20:36     ` James Smart
  2018-06-12 20:39   ` James Smart
  1 sibling, 1 reply; 19+ messages in thread
From: Sagi Grimberg @ 2018-05-31  9:31 UTC (permalink / raw)
  To: Johannes Thumshirn, Keith Busch
  Cc: Christoph Hellwig, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist, Hannes Reinecke


> diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
> index 34712def81b1..d2209c60f95f 100644
> --- a/drivers/nvme/target/fcloop.c
> +++ b/drivers/nvme/target/fcloop.c
> @@ -509,7 +509,7 @@ fcloop_fcp_req(struct nvme_fc_local_port *localport,
>   	if (!rport->targetport)
>   		return -ECONNREFUSED;
>   
> -	tfcp_req = kzalloc(sizeof(*tfcp_req), GFP_KERNEL);
> +	tfcp_req = kzalloc(sizeof(*tfcp_req), GFP_ATOMIC);

Question, why isn't tfcp_req embedded in fcpreq? don't they have
the same lifetime?

>   	if (!tfcp_req)
>   		return -ENOMEM;
>   
> 

* Re: [PATCH 5/5] nvmet: fcloop: use irqsave spinlocks
  2018-05-15  7:40 ` [PATCH 5/5] nvmet: fcloop: use irqsave spinlocks Johannes Thumshirn
@ 2018-05-31  9:37   ` Sagi Grimberg
  2018-06-12 20:41   ` James Smart
  2018-06-12 20:48   ` Jens Axboe
  2 siblings, 0 replies; 19+ messages in thread
From: Sagi Grimberg @ 2018-05-31  9:37 UTC (permalink / raw)
  To: Johannes Thumshirn, Keith Busch
  Cc: Christoph Hellwig, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist, Hannes Reinecke, James Smart


> Lockdep complains about inconsistent hardirq states when using
> nvme-fcloop. So use the irqsave variant of spin_locks for the
> nvmefc_fcp_req's reqlock.

Is this because the nvmet bdev can trigger the rx path in interrupt
context? I don't understand exactly what is violated.

> 
> Here's the lockdep report:
> [   11.138417] ================================
> [   11.139085] WARNING: inconsistent lock state
> [   11.139207] nvmet: creating controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:046394e7-bff7-4403-9502-1816800efaa0.
> [   11.139727] 4.17.0-rc5+ #883 Not tainted
> [   11.139732] --------------------------------
> [   11.144441] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [   11.145341] swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [   11.146108]         (ptrval) (&(&tfcp_req->reqlock)->rlock){?.+.}, at: fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [   11.148633] {HARDIRQ-ON-W} state was registered at:
> [   11.149380]   _raw_spin_lock+0x34/0x70
> [   11.149940]   fcloop_fcp_recv_work+0x17/0x90 [nvme_fcloop]
> [   11.150746]   process_one_work+0x1d8/0x5c0
> [   11.151346]   worker_thread+0x45/0x3e0
> [   11.151886]   kthread+0x101/0x140
> [   11.152370]   ret_from_fork+0x3a/0x50
> [   11.152903] irq event stamp: 36666
> [   11.153402] hardirqs last  enabled at (36663): [<ffffffffafc601c3>] default_idle+0x13/0x180
> [   11.154601] hardirqs last disabled at (36664): [<ffffffffafe00964>] interrupt_entry+0xc4/0xe0
> [   11.155817] softirqs last  enabled at (36666): [<ffffffffaf267409>] irq_enter+0x59/0x60
> [   11.156952] softirqs last disabled at (36665): [<ffffffffaf2673ee>] irq_enter+0x3e/0x60
> [   11.158092]
> [   11.158092] other info that might help us debug this:
> [   11.159436]  Possible unsafe locking scenario:
> [   11.159436]
> [   11.160299]        CPU0
> [   11.160663]        ----
> [   11.161026]   lock(&(&tfcp_req->reqlock)->rlock);
> [   11.161709]   <Interrupt>
> [   11.162091]     lock(&(&tfcp_req->reqlock)->rlock);
> [   11.163148]
> [   11.163148]  *** DEADLOCK ***
> [   11.163148]
> [   11.164007] no locks held by swapper/8/0.
> [   11.164596]
> [   11.164596] stack backtrace:
> [   11.165238] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.17.0-rc5+ #883
> [   11.166180] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> [   11.167673] Call Trace:
> [   11.168037]  <IRQ>
> [   11.168349]  dump_stack+0x78/0xb3
> [   11.168864]  print_usage_bug+0x1ed/0x1fe
> [   11.169440]  ? check_usage_backwards+0x110/0x110
> [   11.170111]  mark_lock+0x263/0x2a0
> [   11.170995]  __lock_acquire+0x675/0x1870
> [   11.171865]  ? __lock_acquire+0x48d/0x1870
> [   11.172460]  lock_acquire+0xd4/0x220
> [   11.172981]  ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [   11.173709]  _raw_spin_lock+0x34/0x70
> [   11.174243]  ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [   11.174978]  fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [   11.175673]  nvmet_fc_transfer_fcp_data+0x9b/0x130 [nvmet_fc]
> [   11.176507]  nvmet_req_complete+0x10/0x110 [nvmet]
> [   11.177210]  nvmet_bio_done+0x23/0x40 [nvmet]
> [   11.177837]  blk_update_request+0xab/0x3b0
> [   11.178434]  blk_mq_end_request+0x13/0x60
> [   11.179033]  flush_smp_call_function_queue+0x58/0x150
> [   11.179755]  smp_call_function_single_interrupt+0x49/0x260
> [   11.180531]  call_function_single_interrupt+0xf/0x20
> [   11.181236]  </IRQ>
> [   11.181542] RIP: 0010:native_safe_halt+0x2/0x10
> [   11.182186] RSP: 0018:ffffa481006cbec0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff04
> [   11.183265] RAX: ffff9f54f8b86200 RBX: ffff9f54f8b86200 RCX: 0000000000000000
> [   11.184264] RDX: ffff9f54f8b86200 RSI: 0000000000000001 RDI: ffff9f54f8b86200
> [   11.185270] RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000000
> [   11.186271] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [   11.187280] R13: 0000000000000000 R14: ffff9f54f8b86200 R15: ffff9f54f8b86200
> [   11.188280]  default_idle+0x18/0x180
> [   11.188806]  do_idle+0x176/0x260
> [   11.189273]  cpu_startup_entry+0x5a/0x60
> [   11.189832]  start_secondary+0x18f/0x1b0
> 
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: James Smart <james.smart@broadcom.com>
> ---
>   drivers/nvme/target/fcloop.c | 52 ++++++++++++++++++++++++--------------------
>   1 file changed, 29 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
> index d2209c60f95f..f4a3700201c1 100644
> --- a/drivers/nvme/target/fcloop.c
> +++ b/drivers/nvme/target/fcloop.c
> @@ -406,9 +406,10 @@ fcloop_fcp_recv_work(struct work_struct *work)
>   		container_of(work, struct fcloop_fcpreq, fcp_rcv_work);
>   	struct nvmefc_fcp_req *fcpreq = tfcp_req->fcpreq;
>   	int ret = 0;
> +	unsigned long flags;
>   	bool aborted = false;
>   
> -	spin_lock(&tfcp_req->reqlock);
> +	spin_lock_irqsave(&tfcp_req->reqlock, flags);
>   	switch (tfcp_req->inistate) {
>   	case INI_IO_START:
>   		tfcp_req->inistate = INI_IO_ACTIVE;
> @@ -417,11 +418,11 @@ fcloop_fcp_recv_work(struct work_struct *work)
>   		aborted = true;
>   		break;
>   	default:
> -		spin_unlock(&tfcp_req->reqlock);
> +		spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>   		WARN_ON(1);
>   		return;
>   	}
> -	spin_unlock(&tfcp_req->reqlock);
> +	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>   
>   	if (unlikely(aborted))
>   		ret = -ECANCELED;
> @@ -442,8 +443,9 @@ fcloop_fcp_abort_recv_work(struct work_struct *work)
>   		container_of(work, struct fcloop_fcpreq, abort_rcv_work);
>   	struct nvmefc_fcp_req *fcpreq;
>   	bool completed = false;
> +	unsigned long flags;
>   
> -	spin_lock(&tfcp_req->reqlock);
> +	spin_lock_irqsave(&tfcp_req->reqlock, flags);
>   	fcpreq = tfcp_req->fcpreq;
>   	switch (tfcp_req->inistate) {
>   	case INI_IO_ABORTED:
> @@ -452,11 +454,11 @@ fcloop_fcp_abort_recv_work(struct work_struct *work)
>   		completed = true;
>   		break;
>   	default:
> -		spin_unlock(&tfcp_req->reqlock);
> +		spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>   		WARN_ON(1);
>   		return;
>   	}
> -	spin_unlock(&tfcp_req->reqlock);
> +	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>   
>   	if (unlikely(completed)) {
>   		/* remove reference taken in original abort downcall */
> @@ -468,9 +470,9 @@ fcloop_fcp_abort_recv_work(struct work_struct *work)
>   		nvmet_fc_rcv_fcp_abort(tfcp_req->tport->targetport,
>   					&tfcp_req->tgt_fcp_req);
>   
> -	spin_lock(&tfcp_req->reqlock);
> +	spin_lock_irqsave(&tfcp_req->reqlock, flags);
>   	tfcp_req->fcpreq = NULL;
> -	spin_unlock(&tfcp_req->reqlock);
> +	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>   
>   	fcloop_call_host_done(fcpreq, tfcp_req, -ECANCELED);
>   	/* call_host_done releases reference for abort downcall */
> @@ -486,11 +488,12 @@ fcloop_tgt_fcprqst_done_work(struct work_struct *work)
>   	struct fcloop_fcpreq *tfcp_req =
>   		container_of(work, struct fcloop_fcpreq, tio_done_work);
>   	struct nvmefc_fcp_req *fcpreq;
> +	unsigned long flags;
>   
> -	spin_lock(&tfcp_req->reqlock);
> +	spin_lock_irqsave(&tfcp_req->reqlock, flags);
>   	fcpreq = tfcp_req->fcpreq;
>   	tfcp_req->inistate = INI_IO_COMPLETED;
> -	spin_unlock(&tfcp_req->reqlock);
> +	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>   
>   	fcloop_call_host_done(fcpreq, tfcp_req, tfcp_req->status);
>   }
> @@ -594,13 +597,14 @@ fcloop_fcp_op(struct nvmet_fc_target_port *tgtport,
>   	u32 rsplen = 0, xfrlen = 0;
>   	int fcp_err = 0, active, aborted;
>   	u8 op = tgt_fcpreq->op;
> +	unsigned long flags;
>   
> -	spin_lock(&tfcp_req->reqlock);
> +	spin_lock_irqsave(&tfcp_req->reqlock, flags);
>   	fcpreq = tfcp_req->fcpreq;
>   	active = tfcp_req->active;
>   	aborted = tfcp_req->aborted;
>   	tfcp_req->active = true;
> -	spin_unlock(&tfcp_req->reqlock);
> +	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>   
>   	if (unlikely(active))
>   		/* illegal - call while i/o active */
> @@ -608,9 +612,9 @@ fcloop_fcp_op(struct nvmet_fc_target_port *tgtport,
>   
>   	if (unlikely(aborted)) {
>   		/* target transport has aborted i/o prior */
> -		spin_lock(&tfcp_req->reqlock);
> +		spin_lock_irqsave(&tfcp_req->reqlock, flags);
>   		tfcp_req->active = false;
> -		spin_unlock(&tfcp_req->reqlock);
> +		spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>   		tgt_fcpreq->transferred_length = 0;
>   		tgt_fcpreq->fcp_error = -ECANCELED;
>   		tgt_fcpreq->done(tgt_fcpreq);
> @@ -666,9 +670,9 @@ fcloop_fcp_op(struct nvmet_fc_target_port *tgtport,
>   		break;
>   	}
>   
> -	spin_lock(&tfcp_req->reqlock);
> +	spin_lock_irqsave(&tfcp_req->reqlock, flags);
>   	tfcp_req->active = false;
> -	spin_unlock(&tfcp_req->reqlock);
> +	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>   
>   	tgt_fcpreq->transferred_length = xfrlen;
>   	tgt_fcpreq->fcp_error = fcp_err;
> @@ -682,15 +686,16 @@ fcloop_tgt_fcp_abort(struct nvmet_fc_target_port *tgtport,
>   			struct nvmefc_tgt_fcp_req *tgt_fcpreq)
>   {
>   	struct fcloop_fcpreq *tfcp_req = tgt_fcp_req_to_fcpreq(tgt_fcpreq);
> +	unsigned long flags;
>   
>   	/*
>   	 * mark aborted only in case there were 2 threads in transport
>   	 * (one doing io, other doing abort) and only kills ops posted
>   	 * after the abort request
>   	 */
> -	spin_lock(&tfcp_req->reqlock);
> +	spin_lock_irqsave(&tfcp_req->reqlock, flags);
>   	tfcp_req->aborted = true;
> -	spin_unlock(&tfcp_req->reqlock);
> +	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>   
>   	tfcp_req->status = NVME_SC_INTERNAL;
>   
> @@ -726,19 +731,20 @@ fcloop_fcp_abort(struct nvme_fc_local_port *localport,
>   	struct fcloop_ini_fcpreq *inireq = fcpreq->private;
>   	struct fcloop_fcpreq *tfcp_req;
>   	bool abortio = true;
> +	unsigned long flags;
>   
> -	spin_lock(&inireq->inilock);
> +	spin_lock_irqsave(&inireq->inilock, flags);
>   	tfcp_req = inireq->tfcp_req;
>   	if (tfcp_req)
>   		fcloop_tfcp_req_get(tfcp_req);
> -	spin_unlock(&inireq->inilock);
> +	spin_unlock_irqrestore(&inireq->inilock, flags);
>   
>   	if (!tfcp_req)
>   		/* abort has already been called */
>   		return;
>   
>   	/* break initiator/target relationship for io */
> -	spin_lock(&tfcp_req->reqlock);
> +	spin_lock_irqsave(&tfcp_req->reqlock, flags);
>   	switch (tfcp_req->inistate) {
>   	case INI_IO_START:
>   	case INI_IO_ACTIVE:
> @@ -748,11 +754,11 @@ fcloop_fcp_abort(struct nvme_fc_local_port *localport,
>   		abortio = false;
>   		break;
>   	default:
> -		spin_unlock(&tfcp_req->reqlock);
> +		spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>   		WARN_ON(1);
>   		return;
>   	}
> -	spin_unlock(&tfcp_req->reqlock);
> +	spin_unlock_irqrestore(&tfcp_req->reqlock, flags);
>   
>   	if (abortio)
>   		/* leave the reference while the work item is scheduled */
> 

* Re: [PATCH 4/5] nvmet: use atomic allocations when allocating fc requests
  2018-05-31  9:31   ` Sagi Grimberg
@ 2018-06-12 20:36     ` James Smart
  0 siblings, 0 replies; 19+ messages in thread
From: James Smart @ 2018-06-12 20:36 UTC (permalink / raw)
  To: Sagi Grimberg, Johannes Thumshirn, Keith Busch
  Cc: Hannes Reinecke, Christoph Hellwig, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist



On 5/31/2018 2:31 AM, Sagi Grimberg wrote:
>
> Question, why isn't tfcp_req embedded in fcpreq? don't they have
> the same lifetime?
>

No, they don't. To properly simulate cable pulls etc., the host side
and the controller side effectively have their own "exchange" structure;
tfcp_req corresponds to the controller side. The lifetimes of the two
halves can differ.

-- james


* Re: [PATCH 4/5] nvmet: use atomic allocations when allocating fc requests
  2018-05-15  7:40 ` [PATCH 4/5] nvmet: use atomic allocations when allocating fc requests Johannes Thumshirn
  2018-05-31  9:31   ` Sagi Grimberg
@ 2018-06-12 20:39   ` James Smart
  1 sibling, 0 replies; 19+ messages in thread
From: James Smart @ 2018-06-12 20:39 UTC (permalink / raw)
  To: Johannes Thumshirn, Keith Busch
  Cc: Sagi Grimberg, Linux Kernel Mailinglist, Linux NVMe Mailinglist,
	Hannes Reinecke, Christoph Hellwig



On 5/15/2018 12:40 AM, Johannes Thumshirn wrote:
> fcloop_fcp_req() runs with the hctx_lock (a rcu_read_lock() locked
> section) held, so memory allocations done in this context have to be
> atomic.
...
>
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>   drivers/nvme/target/fcloop.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
> index 34712def81b1..d2209c60f95f 100644
> --- a/drivers/nvme/target/fcloop.c
> +++ b/drivers/nvme/target/fcloop.c
> @@ -509,7 +509,7 @@ fcloop_fcp_req(struct nvme_fc_local_port *localport,
>   	if (!rport->targetport)
>   		return -ECONNREFUSED;
>   
> -	tfcp_req = kzalloc(sizeof(*tfcp_req), GFP_KERNEL);
> +	tfcp_req = kzalloc(sizeof(*tfcp_req), GFP_ATOMIC);
>   	if (!tfcp_req)
>   		return -ENOMEM;
>   

Reviewed-by:   James Smart  <james.smart@broadcom.com>

* Re: [PATCH 5/5] nvmet: fcloop: use irqsave spinlocks
  2018-05-15  7:40 ` [PATCH 5/5] nvmet: fcloop: use irqsave spinlocks Johannes Thumshirn
  2018-05-31  9:37   ` Sagi Grimberg
@ 2018-06-12 20:41   ` James Smart
  2018-06-12 20:48   ` Jens Axboe
  2 siblings, 0 replies; 19+ messages in thread
From: James Smart @ 2018-06-12 20:41 UTC (permalink / raw)
  To: Johannes Thumshirn, Keith Busch
  Cc: Sagi Grimberg, Christoph Hellwig, Linux NVMe Mailinglist,
	Linux Kernel Mailinglist, Hannes Reinecke



On 5/15/2018 12:40 AM, Johannes Thumshirn wrote:
> Lockdep complains about inconsistent hardirq states when using
> nvme-fcloop. So use the irqsave variant of spin_locks for the
> nvmefc_fcp_req's reqlock.
>
> Here's the lockdep report:
> [   11.138417] ================================
> [   11.139085] WARNING: inconsistent lock state
> [   11.139207] nvmet: creating controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:046394e7-bff7-4403-9502-1816800efaa0.
> [   11.139727] 4.17.0-rc5+ #883 Not tainted
> [   11.139732] --------------------------------
> [   11.144441] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [   11.145341] swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [   11.146108]         (ptrval) (&(&tfcp_req->reqlock)->rlock){?.+.}, at: fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [   11.148633] {HARDIRQ-ON-W} state was registered at:
> [   11.149380]   _raw_spin_lock+0x34/0x70
> [   11.149940]   fcloop_fcp_recv_work+0x17/0x90 [nvme_fcloop]
> [   11.150746]   process_one_work+0x1d8/0x5c0
> [   11.151346]   worker_thread+0x45/0x3e0
> [   11.151886]   kthread+0x101/0x140
> [   11.152370]   ret_from_fork+0x3a/0x50
> [   11.152903] irq event stamp: 36666
> [   11.153402] hardirqs last  enabled at (36663): [<ffffffffafc601c3>] default_idle+0x13/0x180
> [   11.154601] hardirqs last disabled at (36664): [<ffffffffafe00964>] interrupt_entry+0xc4/0xe0
> [   11.155817] softirqs last  enabled at (36666): [<ffffffffaf267409>] irq_enter+0x59/0x60
> [   11.156952] softirqs last disabled at (36665): [<ffffffffaf2673ee>] irq_enter+0x3e/0x60
> [   11.158092]
> [   11.158092] other info that might help us debug this:
> [   11.159436]  Possible unsafe locking scenario:
> [   11.159436]
> [   11.160299]        CPU0
> [   11.160663]        ----
> [   11.161026]   lock(&(&tfcp_req->reqlock)->rlock);
> [   11.161709]   <Interrupt>
> [   11.162091]     lock(&(&tfcp_req->reqlock)->rlock);
> [   11.163148]
> [   11.163148]  *** DEADLOCK ***
> [   11.163148]
> [   11.164007] no locks held by swapper/8/0.
> [   11.164596]
> [   11.164596] stack backtrace:
> [   11.165238] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.17.0-rc5+ #883
> [   11.166180] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> [   11.167673] Call Trace:
> [   11.168037]  <IRQ>
> [   11.168349]  dump_stack+0x78/0xb3
> [   11.168864]  print_usage_bug+0x1ed/0x1fe
> [   11.169440]  ? check_usage_backwards+0x110/0x110
> [   11.170111]  mark_lock+0x263/0x2a0
> [   11.170995]  __lock_acquire+0x675/0x1870
> [   11.171865]  ? __lock_acquire+0x48d/0x1870
> [   11.172460]  lock_acquire+0xd4/0x220
> [   11.172981]  ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [   11.173709]  _raw_spin_lock+0x34/0x70
> [   11.174243]  ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [   11.174978]  fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [   11.175673]  nvmet_fc_transfer_fcp_data+0x9b/0x130 [nvmet_fc]
> [   11.176507]  nvmet_req_complete+0x10/0x110 [nvmet]
> [   11.177210]  nvmet_bio_done+0x23/0x40 [nvmet]
> [   11.177837]  blk_update_request+0xab/0x3b0
> [   11.178434]  blk_mq_end_request+0x13/0x60
> [   11.179033]  flush_smp_call_function_queue+0x58/0x150
> [   11.179755]  smp_call_function_single_interrupt+0x49/0x260
> [   11.180531]  call_function_single_interrupt+0xf/0x20
> [   11.181236]  </IRQ>
> [   11.181542] RIP: 0010:native_safe_halt+0x2/0x10
> [   11.182186] RSP: 0018:ffffa481006cbec0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff04
> [   11.183265] RAX: ffff9f54f8b86200 RBX: ffff9f54f8b86200 RCX: 0000000000000000
> [   11.184264] RDX: ffff9f54f8b86200 RSI: 0000000000000001 RDI: ffff9f54f8b86200
> [   11.185270] RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000000
> [   11.186271] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [   11.187280] R13: 0000000000000000 R14: ffff9f54f8b86200 R15: ffff9f54f8b86200
> [   11.188280]  default_idle+0x18/0x180
> [   11.188806]  do_idle+0x176/0x260
> [   11.189273]  cpu_startup_entry+0x5a/0x60
> [   11.189832]  start_secondary+0x18f/0x1b0
>
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: James Smart <james.smart@broadcom.com>
> ---
>   drivers/nvme/target/fcloop.c | 52 ++++++++++++++++++++++++--------------------
>   1 file changed, 29 insertions(+), 23 deletions(-)
>

Looks ok. I assume it is needed as the bio_done call can happen in an
interrupt handler.

Reviewed-by:   James Smart  <james.smart@broadcom.com>



* Re: [PATCH 5/5] nvmet: fcloop: use irqsave spinlocks
  2018-05-15  7:40 ` [PATCH 5/5] nvmet: fcloop: use irqsave spinlocks Johannes Thumshirn
  2018-05-31  9:37   ` Sagi Grimberg
  2018-06-12 20:41   ` James Smart
@ 2018-06-12 20:48   ` Jens Axboe
  2 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2018-06-12 20:48 UTC (permalink / raw)
  To: Johannes Thumshirn, Keith Busch
  Cc: Sagi Grimberg, Linux Kernel Mailinglist, Linux NVMe Mailinglist,
	James Smart, Hannes Reinecke, Christoph Hellwig

On 5/15/18 1:40 AM, Johannes Thumshirn wrote:
> Lockdep complains about inconsistent hardirq states when using
> nvme-fcloop. So use the irqsave variant of spin_locks for the
> nvmefc_fcp_req's reqlock.
> 
> Here's the lockdep report:
> [   11.138417] ================================
> [   11.139085] WARNING: inconsistent lock state
> [   11.139207] nvmet: creating controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:046394e7-bff7-4403-9502-1816800efaa0.
> [   11.139727] 4.17.0-rc5+ #883 Not tainted
> [   11.139732] --------------------------------
> [   11.144441] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [   11.145341] swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [   11.146108]         (ptrval) (&(&tfcp_req->reqlock)->rlock){?.+.}, at: fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [   11.148633] {HARDIRQ-ON-W} state was registered at:
> [   11.149380]   _raw_spin_lock+0x34/0x70
> [   11.149940]   fcloop_fcp_recv_work+0x17/0x90 [nvme_fcloop]
> [   11.150746]   process_one_work+0x1d8/0x5c0
> [   11.151346]   worker_thread+0x45/0x3e0
> [   11.151886]   kthread+0x101/0x140
> [   11.152370]   ret_from_fork+0x3a/0x50
> [   11.152903] irq event stamp: 36666
> [   11.153402] hardirqs last  enabled at (36663): [<ffffffffafc601c3>] default_idle+0x13/0x180
> [   11.154601] hardirqs last disabled at (36664): [<ffffffffafe00964>] interrupt_entry+0xc4/0xe0
> [   11.155817] softirqs last  enabled at (36666): [<ffffffffaf267409>] irq_enter+0x59/0x60
> [   11.156952] softirqs last disabled at (36665): [<ffffffffaf2673ee>] irq_enter+0x3e/0x60
> [   11.158092]
> [   11.158092] other info that might help us debug this:
> [   11.159436]  Possible unsafe locking scenario:
> [   11.159436]
> [   11.160299]        CPU0
> [   11.160663]        ----
> [   11.161026]   lock(&(&tfcp_req->reqlock)->rlock);
> [   11.161709]   <Interrupt>
> [   11.162091]     lock(&(&tfcp_req->reqlock)->rlock);
> [   11.163148]
> [   11.163148]  *** DEADLOCK ***
> [   11.163148]
> [   11.164007] no locks held by swapper/8/0.
> [   11.164596]
> [   11.164596] stack backtrace:
> [   11.165238] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.17.0-rc5+ #883
> [   11.166180] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> [   11.167673] Call Trace:
> [   11.168037]  <IRQ>
> [   11.168349]  dump_stack+0x78/0xb3
> [   11.168864]  print_usage_bug+0x1ed/0x1fe
> [   11.169440]  ? check_usage_backwards+0x110/0x110
> [   11.170111]  mark_lock+0x263/0x2a0
> [   11.170995]  __lock_acquire+0x675/0x1870
> [   11.171865]  ? __lock_acquire+0x48d/0x1870
> [   11.172460]  lock_acquire+0xd4/0x220
> [   11.172981]  ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [   11.173709]  _raw_spin_lock+0x34/0x70
> [   11.174243]  ? fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [   11.174978]  fcloop_fcp_op+0x20/0x1d0 [nvme_fcloop]
> [   11.175673]  nvmet_fc_transfer_fcp_data+0x9b/0x130 [nvmet_fc]
> [   11.176507]  nvmet_req_complete+0x10/0x110 [nvmet]
> [   11.177210]  nvmet_bio_done+0x23/0x40 [nvmet]
> [   11.177837]  blk_update_request+0xab/0x3b0
> [   11.178434]  blk_mq_end_request+0x13/0x60
> [   11.179033]  flush_smp_call_function_queue+0x58/0x150
> [   11.179755]  smp_call_function_single_interrupt+0x49/0x260
> [   11.180531]  call_function_single_interrupt+0xf/0x20
> [   11.181236]  </IRQ>
> [   11.181542] RIP: 0010:native_safe_halt+0x2/0x10
> [   11.182186] RSP: 0018:ffffa481006cbec0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff04
> [   11.183265] RAX: ffff9f54f8b86200 RBX: ffff9f54f8b86200 RCX: 0000000000000000
> [   11.184264] RDX: ffff9f54f8b86200 RSI: 0000000000000001 RDI: ffff9f54f8b86200
> [   11.185270] RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000000
> [   11.186271] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [   11.187280] R13: 0000000000000000 R14: ffff9f54f8b86200 R15: ffff9f54f8b86200
> [   11.188280]  default_idle+0x18/0x180
> [   11.188806]  do_idle+0x176/0x260
> [   11.189273]  cpu_startup_entry+0x5a/0x60
> [   11.189832]  start_secondary+0x18f/0x1b0
> 
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: James Smart <james.smart@broadcom.com>
> ---
>  drivers/nvme/target/fcloop.c | 52 ++++++++++++++++++++++++--------------------
>  1 file changed, 29 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
> index d2209c60f95f..f4a3700201c1 100644
> --- a/drivers/nvme/target/fcloop.c
> +++ b/drivers/nvme/target/fcloop.c
> @@ -406,9 +406,10 @@ fcloop_fcp_recv_work(struct work_struct *work)
>  		container_of(work, struct fcloop_fcpreq, fcp_rcv_work);
>  	struct nvmefc_fcp_req *fcpreq = tfcp_req->fcpreq;
>  	int ret = 0;
> +	unsigned long flags;
>  	bool aborted = false;
>  
> -	spin_lock(&tfcp_req->reqlock);
> +	spin_lock_irqsave(&tfcp_req->reqlock, flags);

This should just be spin_lock_irq(); you don't need irq-saving locks when
you are always called from a workqueue handler.

Ditto the other workqueue handlers. fcloop_fcp_op() part looks fine.
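
I.e. in fcloop_fcp_recv_work() and the other work handlers something like
this (untested sketch) would be enough, since work items always run in
process context with interrupts enabled:

	/* plain _irq variants: no need to save/restore flags here */
	spin_lock_irq(&tfcp_req->reqlock);
	switch (tfcp_req->inistate) {
	case INI_IO_START:
		tfcp_req->inistate = INI_IO_ACTIVE;
		break;
	case INI_IO_ABORTED:
		aborted = true;
		break;
	default:
		spin_unlock_irq(&tfcp_req->reqlock);
		WARN_ON(1);
		return;
	}
	spin_unlock_irq(&tfcp_req->reqlock);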

-- 
Jens Axboe

