linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/5] nvme-fc: Fix blktests hangers
@ 2023-06-20 13:37 Daniel Wagner
  2023-06-20 13:37 ` [PATCH v2 1/5] nvme-fc: Do not wait in vain when unloading module Daniel Wagner
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Daniel Wagner @ 2023-06-20 13:37 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Daniel Wagner

With the additional 3 patches and the blktests changes [1] I am able to run
the tests in a loop for a while without failures. I expect there are more
problems hidden, but with these changes the first milestone is reached.

The first new patch is actually one from a previous attempt. It addresses a
problem in the unloading path where we don't make any progress anymore [2].

The next two patches change the existing initial connection attempt of the FC
transport to a synchronous one. Without this the auth tests fail [3].

Daniel


[1] https://lore.kernel.org/linux-nvme/20230620132703.20648-1-dwagner@suse.de/
[2] https://lore.kernel.org/linux-nvme/20230418130159.11075-1-dwagner@suse.de/
[3] https://lore.kernel.org/linux-nvme/j4w7724skjsapgtp6wtll5fgsb4adhpfdtrif52lhc7iql4inf@3mu2gbjrrtcs/

changes:

v2:
  - added RBs
  - added new patches

v1:
  - https://lore.kernel.org/linux-nvme/20230615094356.14878-1-dwagner@suse.de/ 


Initial cover letter:

A couple more fixes to enable blktests for the fc transport.

1) Another module unloading hanger which can be triggered with
   the nvme/048 tests:

   run blktests nvme/048 at 2023-06-06 13:04:49
   nvmet: adding nsid 1 to subsystem blktests-subsystem-1
   nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
   (NULL device *): {0:0} Association created
   [478] nvmet: ctrl 1 start keep-alive timer for 1 secs
   [478] nvmet: check nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a
   [478] nvmet: nvmet_setup_dhgroup: ctrl 1 selecting dhgroup 0
   [478] nvmet: No authentication provided
   nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a.
   [478] nvmet: adding queue 1 to ctrl 1.
   [407] nvmet: adding queue 2 to ctrl 1.
   [6549] nvmet: adding queue 3 to ctrl 1.
   [1760] nvmet: adding queue 4 to ctrl 1.
   nvme nvme2: NVME-FC{0}: controller connect complete
   nvme nvme2: NVME-FC{0}: new ctrl: NQN "blktests-subsystem-1"
   [478] nvmet: ctrl 1 reschedule traffic based keep-alive timer
   [407] nvmet: ctrl 1 update keep-alive timer for 1 secs
   nvme nvme2: NVME-FC{0}: io failed due to lldd error 6
   nvme nvme2: NVME-FC{0}: transport association event: transport detected io error
   nvme nvme2: NVME-FC{0}: resetting controller
   [478] nvmet: ctrl 1 stop keep-alive
   (NULL device *): {0:0} Association deleted
   nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
   (NULL device *): {0:0} Association freed
   (NULL device *): {0:0} Association created
   (NULL device *): Disconnect LS failed: No Association
   [407] nvmet: ctrl 1 start keep-alive timer for 1 secs
   [407] nvmet: check nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a
   [407] nvmet: nvmet_setup_dhgroup: ctrl 1 selecting dhgroup 0
   [407] nvmet: No authentication provided
   nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a.
   nvme nvme2: reconnect: revising io queue count from 4 to 1
   [478] nvmet: adding queue 1 to ctrl 1.
   nvme nvme2: NVME-FC{0}: controller connect complete
   [478] nvmet: ctrl 1 reschedule traffic based keep-alive timer
   [478] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [478] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [407] nvmet: ctrl 1 update keep-alive timer for 1 secs
   nvme nvme2: NVME-FC{0}: io failed due to lldd error 6
   nvme nvme2: NVME-FC{0}: transport association event: transport detected io error
   nvme nvme2: NVME-FC{0}: resetting controller
   [478] nvmet: ctrl 1 stop keep-alive
   (NULL device *): {0:0} Association deleted
   (NULL device *): {0:0} Association freed
   (NULL device *): Disconnect LS failed: No Association
   nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
   (NULL device *): {0:0} Association created
   [1760] nvmet: ctrl 1 start keep-alive timer for 1 secs
   [1760] nvmet: check nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a
   [1760] nvmet: nvmet_setup_dhgroup: ctrl 1 selecting dhgroup 0
   [1760] nvmet: No authentication provided
   nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:3d0c3f5d-cb37-4bc4-af29-2168953bfc0a.
   nvme nvme2: reconnect: revising io queue count from 1 to 4
   [478] nvmet: adding queue 1 to ctrl 1.
   [407] nvmet: adding queue 2 to ctrl 1.
   [6549] nvmet: adding queue 3 to ctrl 1.
   [1760] nvmet: adding queue 4 to ctrl 1.
   nvme nvme2: NVME-FC{0}: controller connect complete
   [1760] nvmet: ctrl 1 reschedule traffic based keep-alive timer
   [478] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [478] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [478] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [478] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [1760] nvmet: ctrl 1 update keep-alive timer for 1 secs
   [6549] nvmet: ctrl 1 update keep-alive timer for 1 secs
   nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
   [407] nvmet: ctrl 1 stop keep-alive
   (NULL device *): {0:0} Association deleted
   (NULL device *): {0:0} Association freed
   (NULL device *): Disconnect LS failed: No Association
   nvme_fc: nvme_fc_exit_module: waiting for ctlr deletes
   BTRFS info (device vda2): qgroup scan completed (inconsistency flag cleared)
   INFO: task modprobe:11758 blocked for more than 491 seconds.
         Tainted: G        W          6.4.0-rc2+ #3
   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   task:modprobe        state:D stack:0     pid:11758 ppid:11585  flags:0x00004002
   Call Trace:
    <TASK>
    __schedule+0x17b5/0x4870
    ? finish_lock_switch+0x8f/0x120
    ? finish_task_switch+0x17f/0x5c0
    ? __sched_text_start+0x10/0x10
    ? __schedule+0x17bd/0x4870
    ? do_raw_spin_trylock+0xc9/0x1f0
    schedule+0xe6/0x1e0
    schedule_timeout+0x7e/0x1e0
    ? __cfi_schedule_timeout+0x10/0x10
    ? do_raw_spin_trylock+0xc9/0x1f0
    ? __cfi_lock_release+0x10/0x10
    ? do_raw_spin_unlock+0x116/0x8a0
    wait_for_common+0x377/0x600
    ? wait_for_completion+0x30/0x30
    cleanup_module+0x222/0x240 [nvme_fc bda1ef6f83d00208ba257c5d0ac34dd6bdb58bf9]
    __se_sys_delete_module+0x388/0x550
    ? __x64_sys_delete_module+0x50/0x50
    ? task_work_run+0x236/0x290
    ? syscall_enter_from_user_mode+0x2e/0x210
    do_syscall_64+0x6e/0xa0
    ? syscall_exit_to_user_mode+0x5e/0x220
    ? do_syscall_64+0x7d/0xa0
    ? syscall_exit_to_user_mode+0x5e/0x220
    ? do_syscall_64+0x7d/0xa0
    ? syscall_exit_to_user_mode+0x5e/0x220
    ? do_syscall_64+0x7d/0xa0
    entry_SYSCALL_64_after_hwframe+0x72/0xdc
   RIP: 0033:0x7fa80811aebb
   RSP: 002b:00007fff80ea0a88 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
   RAX: ffffffffffffffda RBX: 000055a0b5acf1f0 RCX: 00007fa80811aebb
   RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055a0b5acf258
   RBP: 0000000000000000 R08: 1999999999999999 R09: 0000000000000000
   R10: 00007fa808193ac0 R11: 0000000000000206 R12: 00007fff80ea0ad0
   R13: 000055a0b5acf1f0 R14: 0000000000000000 R15: 0000000000000000
    </TASK>
 

2) When executing blktests nvme/030 in a tight loop, I was able to
   reproduce a different hanger. In this case the ->done() function
   was never executed on the host fc side. It turns out we didn't
   enqueue work items correctly and thus would only process one
   work item.

   run blktests nvme/030 at 2023-06-13 14:03:52
   nvmet: adding nsid 1 to subsystem blktests-subsystem-1
   nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
   (NULL device *): {0:0} Association created
   [23733] nvmet: ctrl 1 start keep-alive timer for 120 secs
   nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8.
   nvme nvme2: NVME-FC{0}: controller connect complete
   nvme nvme2: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
   nvme nvme2: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
   [23771] nvmet: ctrl 1 stop keep-alive
   nvmet: adding nsid 1 to subsystem blktests-subsystem-2
   (NULL device *): {0:0} Association deleted
   (NULL device *): {0:0} Association freed
   (NULL device *): Disconnect LS failed: No Association
   nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
   (NULL device *): {0:0} Association created
   [27320] nvmet: ctrl 1 start keep-alive timer for 120 secs
   nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8.
   nvme nvme2: NVME-FC{0}: controller connect complete
   nvme nvme2: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
   nvme nvme2: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
   [27320] nvmet: ctrl 1 stop keep-alive
   nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
   (NULL device *): {0:1} Association created
   (NULL device *): {0:0} Association deleted
   (NULL device *): {0:0} Association freed
   (NULL device *): Disconnect LS failed: No Association
   INFO: task kworker/u8:4:102 blocked for more than 491 seconds.
         Tainted: G        W          6.4.0-rc2+ #3
   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   task:kworker/u8:4    state:D stack:0     pid:102   ppid:2      flags:0x00004000
   Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
   Call Trace:
    <TASK>
    __schedule+0x17b5/0x4870
    ? try_to_wake_up+0xb5/0xfc0
    ? __cfi_lock_release+0x10/0x10
    ? __sched_text_start+0x10/0x10
    ? _raw_spin_unlock_irqrestore+0x24/0x50
    ? lock_release+0x2aa/0xd10
    ? wq_worker_sleeping+0x1e/0x200
    schedule+0xe6/0x1e0
    schedule_timeout+0x7e/0x1e0
    ? __cfi_schedule_timeout+0x10/0x10
    ? mark_lock+0x94/0x340
    ? lockdep_hardirqs_on_prepare+0x2aa/0x5e0
    wait_for_common+0x377/0x600
    ? queue_work_on+0x57/0xa0
    ? wait_for_completion+0x30/0x30
    nvme_fc_connect_ctrl_work+0x7dc/0x1a80 [nvme_fc 39e2bf78272df3a610fb1f9884513e99038af746]
    process_one_work+0x80f/0xfb0
    ? rescuer_thread+0x1130/0x1130
    ? do_raw_spin_trylock+0xc9/0x1f0
    ? lock_acquired+0x310/0x9a0
    ? worker_thread+0xd5e/0x1260
    worker_thread+0xbde/0x1260
    kthread+0x25d/0x2f0
    ? __cfi_worker_thread+0x10/0x10
    ? __cfi_kthread+0x10/0x10
    ret_from_fork+0x29/0x50
    </TASK>
 
Daniel Wagner (5):
  nvme-fc: Do not wait in vain when unloading module
  nvme-fcloop: queue work items correctly
  nvmet-fcloop: Remove remote port from list when unlinking
  nvme-fc: Make initial connect attempt synchronous
  nvme-fc: do no free ctrl opts

 drivers/nvme/host/fc.c       | 43 ++++++++++++++++++------------------
 drivers/nvme/target/fcloop.c | 15 +++++--------
 2 files changed, 27 insertions(+), 31 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/5] nvme-fc: Do not wait in vain when unloading module
  2023-06-20 13:37 [PATCH v2 0/5] nvme-fc: Fix blktests hangers Daniel Wagner
@ 2023-06-20 13:37 ` Daniel Wagner
  2023-06-20 13:37 ` [PATCH v2 2/5] nvme-fcloop: queue work items correctly Daniel Wagner
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Daniel Wagner @ 2023-06-20 13:37 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Daniel Wagner

The module unload code will wait for controllers to be deleted even when
there is no controller, so we wait forever for a completion that never
happens. Thus, only wait for the completion when there is a controller
which needs to be removed.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/host/fc.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 2ed75923507d..472ed285fd45 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3932,10 +3932,11 @@ static int __init nvme_fc_init_module(void)
 	return ret;
 }
 
-static void
+static bool
 nvme_fc_delete_controllers(struct nvme_fc_rport *rport)
 {
 	struct nvme_fc_ctrl *ctrl;
+	bool cleanup = false;
 
 	spin_lock(&rport->lock);
 	list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) {
@@ -3943,21 +3944,28 @@ nvme_fc_delete_controllers(struct nvme_fc_rport *rport)
 			"NVME-FC{%d}: transport unloading: deleting ctrl\n",
 			ctrl->cnum);
 		nvme_delete_ctrl(&ctrl->ctrl);
+		cleanup = true;
 	}
 	spin_unlock(&rport->lock);
+
+	return cleanup;
 }
 
-static void
+static bool
 nvme_fc_cleanup_for_unload(void)
 {
 	struct nvme_fc_lport *lport;
 	struct nvme_fc_rport *rport;
+	bool cleanup = false;
 
 	list_for_each_entry(lport, &nvme_fc_lport_list, port_list) {
 		list_for_each_entry(rport, &lport->endp_list, endp_list) {
-			nvme_fc_delete_controllers(rport);
+			if (nvme_fc_delete_controllers(rport))
+				cleanup = true;
 		}
 	}
+
+	return cleanup;
 }
 
 static void __exit nvme_fc_exit_module(void)
@@ -3967,10 +3975,8 @@ static void __exit nvme_fc_exit_module(void)
 
 	spin_lock_irqsave(&nvme_fc_lock, flags);
 	nvme_fc_waiting_to_unload = true;
-	if (!list_empty(&nvme_fc_lport_list)) {
-		need_cleanup = true;
-		nvme_fc_cleanup_for_unload();
-	}
+	if (!list_empty(&nvme_fc_lport_list))
+		need_cleanup = nvme_fc_cleanup_for_unload();
 	spin_unlock_irqrestore(&nvme_fc_lock, flags);
 	if (need_cleanup) {
 		pr_info("%s: waiting for ctlr deletes\n", __func__);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/5] nvme-fcloop: queue work items correctly
  2023-06-20 13:37 [PATCH v2 0/5] nvme-fc: Fix blktests hangers Daniel Wagner
  2023-06-20 13:37 ` [PATCH v2 1/5] nvme-fc: Do not wait in vain when unloading module Daniel Wagner
@ 2023-06-20 13:37 ` Daniel Wagner
  2023-06-20 13:37 ` [PATCH v2 3/5] nvmet-fcloop: Remove remote port from list when unlinking Daniel Wagner
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Daniel Wagner @ 2023-06-20 13:37 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Daniel Wagner

The arguments passed to list_add_tail are reversed. The new element is
the first argument and the list head is the second one.
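
For reference, a minimal sketch of the intended usage; the prototype is
the one from include/linux/list.h and the variables are the ones touched
in the hunks below:

    /* void list_add_tail(struct list_head *new, struct list_head *head)
     * inserts @new before @head, i.e. at the tail of the list @head. */
    list_add_tail(&tls_req->ls_list, &rport->ls_list);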

Fixes: 38803fcffb5b ("nvme-fcloop: fix deallocation of working context")
Fixes: 437c0b824dbd ("nvme-fcloop: add target to host LS request support")

Cc: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/target/fcloop.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
index c65a73433c05..4b35bdcac185 100644
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -358,7 +358,7 @@ fcloop_h2t_ls_req(struct nvme_fc_local_port *localport,
 	if (!rport->targetport) {
 		tls_req->status = -ECONNREFUSED;
 		spin_lock(&rport->lock);
-		list_add_tail(&rport->ls_list, &tls_req->ls_list);
+		list_add_tail(&tls_req->ls_list, &rport->ls_list);
 		spin_unlock(&rport->lock);
 		queue_work(nvmet_wq, &rport->ls_work);
 		return ret;
@@ -391,7 +391,7 @@ fcloop_h2t_xmt_ls_rsp(struct nvmet_fc_target_port *targetport,
 	if (remoteport) {
 		rport = remoteport->private;
 		spin_lock(&rport->lock);
-		list_add_tail(&rport->ls_list, &tls_req->ls_list);
+		list_add_tail(&tls_req->ls_list, &rport->ls_list);
 		spin_unlock(&rport->lock);
 		queue_work(nvmet_wq, &rport->ls_work);
 	}
@@ -446,7 +446,7 @@ fcloop_t2h_ls_req(struct nvmet_fc_target_port *targetport, void *hosthandle,
 	if (!tport->remoteport) {
 		tls_req->status = -ECONNREFUSED;
 		spin_lock(&tport->lock);
-		list_add_tail(&tport->ls_list, &tls_req->ls_list);
+		list_add_tail(&tls_req->ls_list, &tport->ls_list);
 		spin_unlock(&tport->lock);
 		queue_work(nvmet_wq, &tport->ls_work);
 		return ret;
@@ -478,7 +478,7 @@ fcloop_t2h_xmt_ls_rsp(struct nvme_fc_local_port *localport,
 	if (targetport) {
 		tport = targetport->private;
 		spin_lock(&tport->lock);
-		list_add_tail(&tport->ls_list, &tls_req->ls_list);
+		list_add_tail(&tls_req->ls_list, &tport->ls_list);
 		spin_unlock(&tport->lock);
 		queue_work(nvmet_wq, &tport->ls_work);
 	}
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/5] nvmet-fcloop: Remove remote port from list when unlinking
  2023-06-20 13:37 [PATCH v2 0/5] nvme-fc: Fix blktests hangers Daniel Wagner
  2023-06-20 13:37 ` [PATCH v2 1/5] nvme-fc: Do not wait in vain when unloading module Daniel Wagner
  2023-06-20 13:37 ` [PATCH v2 2/5] nvme-fcloop: queue work items correctly Daniel Wagner
@ 2023-06-20 13:37 ` Daniel Wagner
  2023-06-20 13:37 ` [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous Daniel Wagner
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Daniel Wagner @ 2023-06-20 13:37 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Daniel Wagner

The remote port is removed too late from the fcloop_nports list. Remove
it when the port is unregistered.

This prevents a busy loop in fcloop_exit, because otherwise it is
possible that the remote port is still found in the list and thus we
never make progress.

The kernel log will be spammed with

  nvme_fcloop: fcloop_exit: Failed deleting remote port
  nvme_fcloop: fcloop_exit: Failed deleting target port
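
For illustration only, a hypothetical sketch of the stuck pattern (not
the literal fcloop_exit() body); just the list and struct names are
taken from the driver, __delete_remote_port() is a placeholder:

    while (!list_empty(&fcloop_nports)) {
        struct fcloop_nport *nport =
            list_first_entry(&fcloop_nports,
                             struct fcloop_nport, nport_list);

        if (__delete_remote_port(nport))    /* placeholder helper */
            pr_warn("%s: Failed deleting remote port\n", __func__);
        /*
         * If nothing ever does list_del(&nport->nport_list), the same
         * entry is found again on every iteration and the loop spins
         * forever.
         */
    }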

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/target/fcloop.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/target/fcloop.c b/drivers/nvme/target/fcloop.c
index 4b35bdcac185..9f7530147893 100644
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -995,11 +995,6 @@ fcloop_nport_free(struct kref *ref)
 {
 	struct fcloop_nport *nport =
 		container_of(ref, struct fcloop_nport, ref);
-	unsigned long flags;
-
-	spin_lock_irqsave(&fcloop_lock, flags);
-	list_del(&nport->nport_list);
-	spin_unlock_irqrestore(&fcloop_lock, flags);
 
 	kfree(nport);
 }
@@ -1357,6 +1352,8 @@ __unlink_remote_port(struct fcloop_nport *nport)
 		nport->tport->remoteport = NULL;
 	nport->rport = NULL;
 
+	list_del(&nport->nport_list);
+
 	return rport;
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous
  2023-06-20 13:37 [PATCH v2 0/5] nvme-fc: Fix blktests hangers Daniel Wagner
                   ` (2 preceding siblings ...)
  2023-06-20 13:37 ` [PATCH v2 3/5] nvmet-fcloop: Remove remote port from list when unlinking Daniel Wagner
@ 2023-06-20 13:37 ` Daniel Wagner
  2023-06-26 10:59   ` Dan Carpenter
                     ` (2 more replies)
  2023-06-20 13:37 ` [PATCH v2 5/5] nvme-fc: do no free ctrl opts Daniel Wagner
  2023-06-30 13:33 ` [PATCH v2 0/5] nvme-fc: Fix blktests hangers Ewan Milne
  5 siblings, 3 replies; 17+ messages in thread
From: Daniel Wagner @ 2023-06-20 13:37 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Daniel Wagner

Commit 4c984154efa1 ("nvme-fc: change controllers first connect to use
reconnect path") made the connection attempt asynchronous in order to
make the connection attempt from autoconnect/boot via udev/systemd
during system bring-up a bit more reliable.

Unfortunately, one side effect of this is that any wrong parameters
provided from userspace will not be directly reported as invalid, e.g.
auth keys.

So instead of having the policy code inside the kernel, it's better to
address this in userspace, for example in nvme-cli or nvme-stas.

This aligns the fc transport with tcp and rdma.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/host/fc.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 472ed285fd45..aa2911f07c6c 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2943,6 +2943,8 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
 	/* force put free routine to ignore io queues */
 	ctrl->ctrl.tagset = NULL;
 
+	if (ret > 0)
+		ret = -EIO;
 	return ret;
 }
 
@@ -3545,21 +3547,15 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 	list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
 	spin_unlock_irqrestore(&rport->lock, flags);
 
-	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
-	    !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
 		dev_err(ctrl->ctrl.device,
 			"NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
 		goto fail_ctrl;
 	}
 
-	if (!queue_delayed_work(nvme_wq, &ctrl->connect_work, 0)) {
-		dev_err(ctrl->ctrl.device,
-			"NVME-FC{%d}: failed to schedule initial connect\n",
-			ctrl->cnum);
+	ret = nvme_fc_create_association(ctrl);
+	if (ret)
 		goto fail_ctrl;
-	}
-
-	flush_delayed_work(&ctrl->connect_work);
 
 	dev_info(ctrl->ctrl.device,
 		"NVME-FC{%d}: new ctrl: NQN \"%s\"\n",
@@ -3568,7 +3564,6 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 	return &ctrl->ctrl;
 
 fail_ctrl:
-	nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING);
 	cancel_work_sync(&ctrl->ioerr_work);
 	cancel_work_sync(&ctrl->ctrl.reset_work);
 	cancel_delayed_work_sync(&ctrl->connect_work);
@@ -3590,7 +3585,9 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 	 */
 	nvme_fc_rport_get(rport);
 
-	return ERR_PTR(-EIO);
+	if (ret > 0)
+		ret = -EIO;
+	return ERR_PTR(ret);
 
 out_free_queues:
 	kfree(ctrl->queues);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 5/5] nvme-fc: do no free ctrl opts
  2023-06-20 13:37 [PATCH v2 0/5] nvme-fc: Fix blktests hangers Daniel Wagner
                   ` (3 preceding siblings ...)
  2023-06-20 13:37 ` [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous Daniel Wagner
@ 2023-06-20 13:37 ` Daniel Wagner
  2023-06-30 13:33 ` [PATCH v2 0/5] nvme-fc: Fix blktests hangers Ewan Milne
  5 siblings, 0 replies; 17+ messages in thread
From: Daniel Wagner @ 2023-06-20 13:37 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Daniel Wagner

Since the initial addition of the FC transport in commit
e399441de911 ("nvme-fabrics: Add host support for FC transport"), the
transport has also freed the options. Since nvme_free_ctrl() frees the
options too, commit de41447aac03 ("nvme-fc: avoid memory corruption
caused by calling nvmf_free_options() twice") was added to avoid double
frees.

With the change to make the initial connection attempt synchronous
again, the lifetime of all objects is known also in the error case. All
resources will be freed in the same context.

The FC transport should not free the options, as the generic auth code
relies on being able to read the options even in the shutdown path
(nvme_auth_free calls ctrl_max_dhchaps, which relies on opts being a
valid pointer).

TCP and RDMA also avoid freeing the options, so make the FC transport
behave the same.
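
A minimal sketch of the hazard this avoids; the only fact taken from the
text above is that the auth teardown reads ctrl->opts via
ctrl_max_dhchaps(), the surrounding lines are illustrative:

    /* FC error path before this patch: free and clear the options */
    nvmf_free_options(ctrl->ctrl.opts);
    ctrl->ctrl.opts = NULL;

    /*
     * Later, during controller teardown, the generic auth cleanup still
     * consults the options:
     *   nvme_auth_free() -> ctrl_max_dhchaps() -> ctrl->opts->...
     * which now dereferences a NULL (or already freed) pointer.
     */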

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/host/fc.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index aa2911f07c6c..6f5cfa47fee5 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2413,8 +2413,6 @@ nvme_fc_ctrl_free(struct kref *ref)
 	nvme_fc_rport_put(ctrl->rport);
 
 	ida_free(&nvme_fc_ctrl_cnt, ctrl->cnum);
-	if (ctrl->ctrl.opts)
-		nvmf_free_options(ctrl->ctrl.opts);
 	kfree(ctrl);
 }
 
@@ -3568,8 +3566,6 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 	cancel_work_sync(&ctrl->ctrl.reset_work);
 	cancel_delayed_work_sync(&ctrl->connect_work);
 
-	ctrl->ctrl.opts = NULL;
-
 	/* initiate nvme ctrl ref counting teardown */
 	nvme_uninit_ctrl(&ctrl->ctrl);
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous
  2023-06-20 13:37 ` [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous Daniel Wagner
@ 2023-06-26 10:59   ` Dan Carpenter
  2023-06-26 11:33   ` Dan Carpenter
  2023-07-01 12:11   ` James Smart
  2 siblings, 0 replies; 17+ messages in thread
From: Dan Carpenter @ 2023-06-26 10:59 UTC (permalink / raw)
  To: oe-kbuild, Daniel Wagner, linux-nvme
  Cc: lkp, oe-kbuild-all, linux-kernel, linux-block,
	Chaitanya Kulkarni, Shin'ichiro Kawasaki, Sagi Grimberg,
	Hannes Reinecke, James Smart, Daniel Wagner

Hi Daniel,

kernel test robot noticed the following build warnings:

If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Daniel-Wagner/nvme-fc-Do-not-wait-in-vain-when-unloading-module/20230620-213849
base:   linus/master
patch link:    https://lore.kernel.org/r/20230620133711.22840-5-dwagner%40suse.de
patch subject: [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous
config: openrisc-randconfig-m041-20230622 (https://download.01.org/0day-ci/archive/20230624/202306240125.U2jdrjAY-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230624/202306240125.U2jdrjAY-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202306240125.U2jdrjAY-lkp@intel.com/

smatch warnings:
drivers/nvme/host/fc.c:3590 nvme_fc_init_ctrl() warn: passing zero to 'ERR_PTR'

vim +/ERR_PTR +3590 drivers/nvme/host/fc.c

61bff8ef008845 James Smart       2017-04-23  3533  	ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_fc_ctrl_ops, 0);
61bff8ef008845 James Smart       2017-04-23  3534  	if (ret)
98e3528012cd57 Ross Lagerwall    2023-01-20  3535  		goto out_free_queues;
e399441de9115c James Smart       2016-12-02  3536  
61bff8ef008845 James Smart       2017-04-23  3537  	/* at this point, teardown path changes to ref counting on nvme ctrl */
e399441de9115c James Smart       2016-12-02  3538  
98e3528012cd57 Ross Lagerwall    2023-01-20  3539  	ret = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
98e3528012cd57 Ross Lagerwall    2023-01-20  3540  			&nvme_fc_admin_mq_ops,
98e3528012cd57 Ross Lagerwall    2023-01-20  3541  			struct_size((struct nvme_fcp_op_w_sgl *)NULL, priv,
98e3528012cd57 Ross Lagerwall    2023-01-20  3542  				    ctrl->lport->ops->fcprqst_priv_sz));
98e3528012cd57 Ross Lagerwall    2023-01-20  3543  	if (ret)
98e3528012cd57 Ross Lagerwall    2023-01-20  3544  		goto fail_ctrl;
98e3528012cd57 Ross Lagerwall    2023-01-20  3545  
e399441de9115c James Smart       2016-12-02  3546  	spin_lock_irqsave(&rport->lock, flags);
e399441de9115c James Smart       2016-12-02  3547  	list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
e399441de9115c James Smart       2016-12-02  3548  	spin_unlock_irqrestore(&rport->lock, flags);
e399441de9115c James Smart       2016-12-02  3549  
ac881fd1288ca6 Daniel Wagner     2023-06-20  3550  	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
4c984154efa131 James Smart       2018-06-13  3551  		dev_err(ctrl->ctrl.device,
4c984154efa131 James Smart       2018-06-13  3552  			"NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
4c984154efa131 James Smart       2018-06-13  3553  		goto fail_ctrl;

No error code on this path.  Originally it didn't matter because it was
hardcoded to return ERR_PTR(-EIO);

17c4dc6eb7e1b2 James Smart       2017-10-09  3554  	}
17c4dc6eb7e1b2 James Smart       2017-10-09  3555  
ac881fd1288ca6 Daniel Wagner     2023-06-20  3556  	ret = nvme_fc_create_association(ctrl);
ac881fd1288ca6 Daniel Wagner     2023-06-20  3557  	if (ret)
4c984154efa131 James Smart       2018-06-13  3558  		goto fail_ctrl;
4c984154efa131 James Smart       2018-06-13  3559  
4c984154efa131 James Smart       2018-06-13  3560  	dev_info(ctrl->ctrl.device,
4c984154efa131 James Smart       2018-06-13  3561  		"NVME-FC{%d}: new ctrl: NQN \"%s\"\n",
e5ea42faa773c6 Hannes Reinecke   2021-09-22  3562  		ctrl->cnum, nvmf_ctrl_subsysnqn(&ctrl->ctrl));
4c984154efa131 James Smart       2018-06-13  3563  
4c984154efa131 James Smart       2018-06-13  3564  	return &ctrl->ctrl;
4c984154efa131 James Smart       2018-06-13  3565  
4c984154efa131 James Smart       2018-06-13  3566  fail_ctrl:
19fce0470f0503 James Smart       2020-12-01  3567  	cancel_work_sync(&ctrl->ioerr_work);
cf25809bec2c7d James Smart       2018-03-13  3568  	cancel_work_sync(&ctrl->ctrl.reset_work);
cf25809bec2c7d James Smart       2018-03-13  3569  	cancel_delayed_work_sync(&ctrl->connect_work);
cf25809bec2c7d James Smart       2018-03-13  3570  
de41447aac034c Ewan D. Milne     2017-04-24  3571  	ctrl->ctrl.opts = NULL;
17c4dc6eb7e1b2 James Smart       2017-10-09  3572  
61bff8ef008845 James Smart       2017-04-23  3573  	/* initiate nvme ctrl ref counting teardown */
e399441de9115c James Smart       2016-12-02  3574  	nvme_uninit_ctrl(&ctrl->ctrl);
61bff8ef008845 James Smart       2017-04-23  3575  
0b5a7669a457dd James Smart       2017-06-15  3576  	/* Remove core ctrl ref. */
0b5a7669a457dd James Smart       2017-06-15  3577  	nvme_put_ctrl(&ctrl->ctrl);
0b5a7669a457dd James Smart       2017-06-15  3578  
61bff8ef008845 James Smart       2017-04-23  3579  	/* as we're past the point where we transition to the ref
61bff8ef008845 James Smart       2017-04-23  3580  	 * counting teardown path, if we return a bad pointer here,
61bff8ef008845 James Smart       2017-04-23  3581  	 * the calling routine, thinking it's prior to the
61bff8ef008845 James Smart       2017-04-23  3582  	 * transition, will do an rport put. Since the teardown
61bff8ef008845 James Smart       2017-04-23  3583  	 * path also does a rport put, we do an extra get here to
61bff8ef008845 James Smart       2017-04-23  3584  	 * so proper order/teardown happens.
61bff8ef008845 James Smart       2017-04-23  3585  	 */
61bff8ef008845 James Smart       2017-04-23  3586  	nvme_fc_rport_get(rport);
61bff8ef008845 James Smart       2017-04-23  3587  
ac881fd1288ca6 Daniel Wagner     2023-06-20  3588  	if (ret > 0)
ac881fd1288ca6 Daniel Wagner     2023-06-20  3589  		ret = -EIO;
ac881fd1288ca6 Daniel Wagner     2023-06-20 @3590  	return ERR_PTR(ret);
e399441de9115c James Smart       2016-12-02  3591  
61bff8ef008845 James Smart       2017-04-23  3592  out_free_queues:
61bff8ef008845 James Smart       2017-04-23  3593  	kfree(ctrl->queues);
e399441de9115c James Smart       2016-12-02  3594  out_free_ida:
61bff8ef008845 James Smart       2017-04-23  3595  	put_device(ctrl->dev);
3dd83f4013f0e8 Sagi Grimberg     2022-02-14  3596  	ida_free(&nvme_fc_ctrl_cnt, ctrl->cnum);
e399441de9115c James Smart       2016-12-02  3597  out_free_ctrl:
e399441de9115c James Smart       2016-12-02  3598  	kfree(ctrl);
e399441de9115c James Smart       2016-12-02  3599  out_fail:
e399441de9115c James Smart       2016-12-02  3600  	/* exit via here doesn't follow ctlr ref points */
e399441de9115c James Smart       2016-12-02  3601  	return ERR_PTR(ret);
e399441de9115c James Smart       2016-12-02  3602  }

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous
  2023-06-20 13:37 ` [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous Daniel Wagner
  2023-06-26 10:59   ` Dan Carpenter
@ 2023-06-26 11:33   ` Dan Carpenter
  2023-06-27  6:18     ` Daniel Wagner
  2023-07-01 12:11   ` James Smart
  2 siblings, 1 reply; 17+ messages in thread
From: Dan Carpenter @ 2023-06-26 11:33 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart

On Tue, Jun 20, 2023 at 03:37:10PM +0200, Daniel Wagner wrote:
> Commit 4c984154efa1 ("nvme-fc: change controllers first connect to use
> reconnect path") made the connection attempt asynchronous in order to
> make the connection attempt from autoconnect/boot via udev/systemd up
> case a bit more reliable.
> 
> Unfortunately, one side effect of this is that any wrong parameters
> provided from userspace will not be directly reported as invalid, e.g.
> auth keys.
> 
> So instead having the policy code inside the kernel it's better to
> address this in userspace, for example in nvme-cli or nvme-stas.
> 
> This aligns the fc transport with tcp and rdma.
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  drivers/nvme/host/fc.c | 19 ++++++++-----------
>  1 file changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index 472ed285fd45..aa2911f07c6c 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2943,6 +2943,8 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
>  	/* force put free routine to ignore io queues */
>  	ctrl->ctrl.tagset = NULL;
>  
> +	if (ret > 0)
> +		ret = -EIO;

All these checks for ret > 0 make me unhappy.  I don't understand how
they are a part of the commit.

I have tried to look at the context and I think maybe you are working
around the fact that qla_nvme_ls_req() returns QLA_FUNCTION_FAILED on
error.

Also the qla_nvme_ls_req() function returns EINVAL on error.  I just wrote a
commit message saying that none of the callers cared but I missed that it
apparently gets returned to nvme_fc_init_ctrl().  :/
https://lore.kernel.org/all/49866d28-4cfe-47b0-842b-78f110e61aab@moroto.mountain/

Let's just fix qla_nvme_ls_req() instead of working around it here.

And let's add a WARN_ON_ONCE() somewhere to prevent future bugs.

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous
  2023-06-26 11:33   ` Dan Carpenter
@ 2023-06-27  6:18     ` Daniel Wagner
  2023-06-27  6:39       ` Hannes Reinecke
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Wagner @ 2023-06-27  6:18 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart

On Mon, Jun 26, 2023 at 02:33:18PM +0300, Dan Carpenter wrote:
> > @@ -2943,6 +2943,8 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
> >  	/* force put free routine to ignore io queues */
> >  	ctrl->ctrl.tagset = NULL;
> >  
> > +	if (ret > 0)
> > +		ret = -EIO;
> 
> All these checks for ret > 0 make me unhappy.  I don't understand how
> they are a part of the commit.

We have two types of error codes in the nvme subsystem. The negative
values are the usual errnos and the positive ones are NVMe protocol errors.

For example, if the authentication fails because of invalid credentials,
nvmf_connect_admin_queue() will return a value of NVME_SC_AUTH_REQUIRED.
This is also the value which gets propagated to this point here. The problem
is that any positive error code is interpreted as a valid pointer later in
the code, which results in a crash.
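
To spell out the crash mode (assuming only the standard include/linux/err.h
semantics): IS_ERR() recognizes just the -MAX_ERRNO..-1 range, so a positive
status passed through ERR_PTR() looks like a valid pointer to the caller:

    void *ctrl = ERR_PTR(ret);      /* ret > 0, e.g. a positive NVMe status */

    if (IS_ERR(ctrl))               /* false: positive values are not in the
                                     * -MAX_ERRNO..-1 error range */
        return PTR_ERR(ctrl);

    /* caller continues and dereferences 'ctrl', which is a bogus small
     * positive address -> crash */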

> I have tried to look at the context and I think maybe you are working
> around the fact that qla_nvme_ls_req() returns QLA_FUNCTION_FAILED on
> error.

The auth blktests are exercising the error path here and that's why I added
this check. BTW, we already use this pattern in other places; it is not
completely new in this subsystem.

> Also the qla_nvme_ls_req() function EINVAL on error.  I just wrote a
> commit message saying that none of the callers cared but I missed that
> apparently gets returned to nvme_fc_init_ctrl().  :/
> https://lore.kernel.org/all/49866d28-4cfe-47b0-842b-78f110e61aab@moroto.mountain/

Thanks!

> Let's just fix qla_nvme_ls_req() instead of working around it here.
>
> And let's add a WARN_ON_ONCE() somewhere to prevent future bugs.

This makes sense for the driver APIs. Though for the core nvme subsystem, how
to handle the protocol errors needs to be discussed/redesigned first.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous
  2023-06-27  6:18     ` Daniel Wagner
@ 2023-06-27  6:39       ` Hannes Reinecke
  2023-06-27  6:51         ` Hannes Reinecke
  0 siblings, 1 reply; 17+ messages in thread
From: Hannes Reinecke @ 2023-06-27  6:39 UTC (permalink / raw)
  To: Daniel Wagner, Dan Carpenter
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, James Smart

On 6/27/23 08:18, Daniel Wagner wrote:
> On Mon, Jun 26, 2023 at 02:33:18PM +0300, Dan Carpenter wrote:
>>> @@ -2943,6 +2943,8 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
>>>   	/* force put free routine to ignore io queues */
>>>   	ctrl->ctrl.tagset = NULL;
>>>   
>>> +	if (ret > 0)
>>> +		ret = -EIO;
>>
>> All these checks for ret > 0 make me unhappy.  I don't understand how
>> they are a part of the commit.
> 
> We have two types of error message types in the nvme subsystem. The negative
> values are the usual ones and positive ones are nvme protocol errors.
> 
> For example if the authentication fails because of invalid credentials when
> doing the authentication nvmf_connect_admin_queue() will return a value of
> NVME_SC_AUTH_REQUIRED. This is also the value which gets propagated to this
> point here. The problem is any positive error code is interpreted as a valid
> pointer later in the code, which results in a crash.
> 
>> I have tried to look at the context and I think maybe you are working
>> around the fact that qla_nvme_ls_req() returns QLA_FUNCTION_FAILED on
>> error.
> 
> The auth blktests are exercising the error path here and that's why I added
> this check. BTW, we already use in other places, this is not completely new in
> this subsystem.
> 
>> Also the qla_nvme_ls_req() function EINVAL on error.  I just wrote a
>> commit message saying that none of the callers cared but I missed that
>> apparently gets returned to nvme_fc_init_ctrl().  :/
>> https://lore.kernel.org/all/49866d28-4cfe-47b0-842b-78f110e61aab@moroto.mountain/
> 
> Thank!
> 
>> Let's just fix qla_nvme_ls_req() instead of working around it here.
>>
>> And let's add a WARN_ON_ONCE() somewhere to prevent future bugs.
> 
> This makes sense for the driver APIs. Though for the core nvme subsystem this
> needs to be discusses/redesigned how to handle the protocol errors first.
> 
I would stick with the 'normal' nvme syntax of having negative errors as 
internal errors (ie errnos), '0' for no error, and positive numbers as 
NVMe protocol errors.
As such I would also advocate to not map NVMe protocol errors onto error 
numbers but rather fix the callers to not do a pointer conversion.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous
  2023-06-27  6:39       ` Hannes Reinecke
@ 2023-06-27  6:51         ` Hannes Reinecke
  0 siblings, 0 replies; 17+ messages in thread
From: Hannes Reinecke @ 2023-06-27  6:51 UTC (permalink / raw)
  To: Daniel Wagner, Dan Carpenter
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, James Smart

On 6/27/23 08:39, Hannes Reinecke wrote:
> On 6/27/23 08:18, Daniel Wagner wrote:
>> On Mon, Jun 26, 2023 at 02:33:18PM +0300, Dan Carpenter wrote:
>>>> @@ -2943,6 +2943,8 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl 
>>>> *ctrl)
>>>>       /* force put free routine to ignore io queues */
>>>>       ctrl->ctrl.tagset = NULL;
>>>> +    if (ret > 0)
>>>> +        ret = -EIO;
>>>
>>> All these checks for ret > 0 make me unhappy.  I don't understand how
>>> they are a part of the commit.
>>
>> We have two types of error message types in the nvme subsystem. The 
>> negative
>> values are the usual ones and positive ones are nvme protocol errors.
>>
>> For example if the authentication fails because of invalid credentials 
>> when
>> doing the authentication nvmf_connect_admin_queue() will return a 
>> value of
>> NVME_SC_AUTH_REQUIRED. This is also the value which gets propagated to 
>> this
>> point here. The problem is any positive error code is interpreted as a 
>> valid
>> pointer later in the code, which results in a crash.
>>
>>> I have tried to look at the context and I think maybe you are working
>>> around the fact that qla_nvme_ls_req() returns QLA_FUNCTION_FAILED on
>>> error.
>>
>> The auth blktests are exercising the error path here and that's why I 
>> added
>> this check. BTW, we already use in other places, this is not 
>> completely new in
>> this subsystem.
>>
>>> Also the qla_nvme_ls_req() function EINVAL on error.  I just wrote a
>>> commit message saying that none of the callers cared but I missed that
>>> apparently gets returned to nvme_fc_init_ctrl().  :/
>>> https://lore.kernel.org/all/49866d28-4cfe-47b0-842b-78f110e61aab@moroto.mountain/
>>
>> Thank!
>>
>>> Let's just fix qla_nvme_ls_req() instead of working around it here.
>>>
>>> And let's add a WARN_ON_ONCE() somewhere to prevent future bugs.
>>
>> This makes sense for the driver APIs. Though for the core nvme 
>> subsystem this
>> needs to be discusses/redesigned how to handle the protocol errors first.
>>
> I would stick with the 'normal' nvme syntax of having negative errors as 
> internal errors (ie errnos), '0' for no error, and positive numbers as 
> NVMe protocol errors.
> As such I would also advocate to not map NVMe protocol errors onto error 
> numbers but rather fix the callers to not do a pointer conversion.
> 
Aw. Now I see it.

It's the ->create_ctrl() callback which will always return a controller 
pointer or an error value.
If we were to return a protocol error we would need to stick it into the 
controller structure itself. But if we do that then we'll end 
up with a non-existing controller, ie we'd be returning a structure for 
a dead controller. Not exactly pretty, but it would allow us to improve
the userland API to return the NVMe protocol error by reading from the
fabrics device; the controller structure itself would be cleaned up when 
closing that device.

Hmm.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] nvme-fc: Fix blktests hangers
  2023-06-20 13:37 [PATCH v2 0/5] nvme-fc: Fix blktests hangers Daniel Wagner
                   ` (4 preceding siblings ...)
  2023-06-20 13:37 ` [PATCH v2 5/5] nvme-fc: do no free ctrl opts Daniel Wagner
@ 2023-06-30 13:33 ` Ewan Milne
  5 siblings, 0 replies; 17+ messages in thread
From: Ewan Milne @ 2023-06-30 13:33 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart

We ran this patch series through our regression and NVMe/FC boot from SAN tests.
(I think 4/5 will also resolve some instability / race condition with
kickstart, since the connect is now synchronous, but we may need some more
time to fully validate that. I was going to pursue a userspace solution but
this may be good enough.)

Notwithstanding the other comments re: error codes, etc. feel free to add:

Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Tested-by: Marco Patalano <mpatalan@redhat.com>

On Tue, Jun 20, 2023 at 9:39 AM Daniel Wagner <dwagner@suse.de> wrote:
>
> With the additinal 3 patches and the blktests changes [1] I am able to run the
> tests in a loop for a while without failures. I expect there are more problems
> hidden but with these changes the first milesone is reached.
>
> The first new patch is actually one from a previous attempt. It addresses the
> problem in unloading path where we dont make any progress more [2].
>
> The next two patches change the existing initial connection attempt of FC to
> a synchronous one. Without this the auth test fail [3].
>
> Daniel
>
>
> [1] https://lore.kernel.org/linux-nvme/20230620132703.20648-1-dwagner@suse.de/
> [2] https://lore.kernel.org/linux-nvme/20230418130159.11075-1-dwagner@suse.de/
> [3] https://lore.kernel.org/linux-nvme/j4w7724skjsapgtp6wtll5fgsb4adhpfdtrif52lhc7iql4inf@3mu2gbjrrtcs/
>
> changes:
>
> v2:
>   - added RBs
>   - added new patches
>
> v1:
>   - https://lore.kernel.org/linux-nvme/20230615094356.14878-1-dwagner@suse.de/
>
>
> Initial cover letter:
>
> A couple more fixes to enable blktests for the fc transport.
>
> 1) Another module unloading hanger which can be triggered with
>    the nvme/048 tests:
>
> [...]
>
>
> 2) When executing blktests nvme/030 in a tight loop, I was able to
>    reproduce a different hanger. In this case the ->done() function
>    was never executed on the host fc side. It turns out we didn't
>    enqueue work items correctly and thus would only process one
>    work item.
>
> [...]
>
> Daniel Wagner (5):
>   nvme-fc: Do not wait in vain when unloading module
>   nvme-fcloop: queue work items correctly
>   nvmet-fcloop: Remove remote port from list when unlinking
>   nvme-fc: Make initial connect attempt synchronous
>   nvme-fc: do no free ctrl opts
>
>  drivers/nvme/host/fc.c       | 43 ++++++++++++++++++------------------
>  drivers/nvme/target/fcloop.c | 15 +++++--------
>  2 files changed, 27 insertions(+), 31 deletions(-)
>
> --
> 2.41.0
>
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous
  2023-06-20 13:37 ` [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous Daniel Wagner
  2023-06-26 10:59   ` Dan Carpenter
  2023-06-26 11:33   ` Dan Carpenter
@ 2023-07-01 12:11   ` James Smart
  2023-07-06 12:07     ` Daniel Wagner
  2 siblings, 1 reply; 17+ messages in thread
From: James Smart @ 2023-07-01 12:11 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart


On 6/20/2023 6:37 AM, Daniel Wagner wrote:
> Commit 4c984154efa1 ("nvme-fc: change controllers first connect to use
> reconnect path") made the connection attempt asynchronous in order to
> make the connection attempt in the autoconnect/boot via udev/systemd
> startup case a bit more reliable.
> 
> Unfortunately, one side effect of this is that any wrong parameters
> provided from userspace will not be directly reported as invalid, e.g.
> auth keys.
> 
> So instead of having the policy code inside the kernel, it's better to
> address this in userspace, for example in nvme-cli or nvme-stas.
> 
> This aligns the fc transport with tcp and rdma.

As much as you want to make this change to make transports "similar", I 
am dead set against it unless you complete a long qualification of the 
change on real FC hardware and FC-NVME devices. There are probably 1.5 
years of testing of different race conditions that drove this change. 
You cannot declare success from a simplistic toy tool such as fcloop for 
validation.

The original issues still exist, and have probably even morphed in the 
time since the original change, and this will seriously disrupt the 
transport and any downstream releases. So I have a very strong NACK on 
this change.

Yes - things such as the connect failure results are difficult to return 
to nvme-cli. I have had many gripes about nvme-cli's behavior over the 
years, especially on negative cases due to race conditions which 
required retries. It still handles these cases miserably. The async 
reconnect path solved many of these issues for fc.

For the auth failure, how do we deal with things if auth fails over time 
as reconnects fail due to a credential change? I would think commonality 
of this behavior drives part of the choice.

-- james

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous
  2023-07-01 12:11   ` James Smart
@ 2023-07-06 12:07     ` Daniel Wagner
  2023-07-11 22:47       ` James Smart
  2023-07-13 20:35       ` Ewan Milne
  0 siblings, 2 replies; 17+ messages in thread
From: Daniel Wagner @ 2023-07-06 12:07 UTC (permalink / raw)
  To: James Smart
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	Ewan Milne

Hi James,

On Sat, Jul 01, 2023 at 05:11:11AM -0700, James Smart wrote:
> As much as you want to make this change to make transports "similar", I am
> dead set against it unless you are completing a long qualification of the
> change on real FC hardware and FC-NVME devices. There is probably 1.5 yrs of
> testing of different race conditions that drove this change. You cannot
> declare success from a simplistic toy tool such as fcloop for validation.
> 
> The original issues exist, probably have even morphed given the time from
> the original change, and this will seriously disrupt the transport and any
> downstream releases.  So I have a very strong NACK on this change.
> 
> Yes - things such as the connect failure results are difficult to return
> back to nvme-cli. I have had many gripes about the nvme-cli's behavior over
> the years, especially on negative cases due to race conditions which
> required retries. It still fails this miserably.  The async reconnect path
> solved many of these issues for fc.
> 
> For the auth failure, how do we deal with things if auth fails over time as
> reconnects fail due to a credential changes ?  I would think commonality of
> this behavior drives part of the choice.

Alright, what do you think about the idea of introducing a new '--sync'
option to nvme-cli which tells the kernel that we want to wait for the
initial connect to succeed or fail? Obviously, this needs to handle
signals too.
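
Just to make the idea concrete, here is a purely hypothetical sketch of
what such a connect could boil down to on the user-space side. nvme-cli
writes an option string to /dev/nvme-fabrics; the "sync" token below is
made up for illustration, and the FC addresses are placeholders:

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
          /* "sync" is a made-up token; everything else mirrors what
           * nvme-cli already writes to /dev/nvme-fabrics today.
           */
          const char *opts =
                  "transport=fc,"
                  "traddr=nn-0x20001100aa000001:pn-0x20001100aa000001,"
                  "host_traddr=nn-0x20001100aa000002:pn-0x20001100aa000002,"
                  "nqn=blktests-subsystem-1,sync";
          int fd = open("/dev/nvme-fabrics", O_RDWR);

          if (fd < 0) {
                  perror("open /dev/nvme-fabrics");
                  return 1;
          }
          /* today this write returns once the ctrl is created and the
           * async connect is scheduled; with a sync option it would only
           * return after the initial connect has succeeded or failed
           */
          if (write(fd, opts, strlen(opts)) < 0)
                  perror("write");
          close(fd);
          return 0;
  }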

From what I understood this is also what Ewan would like to have.

Hannes thought it would make sense to use the same initial connect logic
in tcp/rdma, because there could also be transient errors there (e.g.
spanning tree protocol). In short, make tcp/rdma do the same thing as fc?

So let's drop the final patch from this series for the time being. Could
you give some feedback on the rest of the patches?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous
  2023-07-06 12:07     ` Daniel Wagner
@ 2023-07-11 22:47       ` James Smart
  2023-07-12  6:50         ` Hannes Reinecke
  2023-07-13 20:35       ` Ewan Milne
  1 sibling, 1 reply; 17+ messages in thread
From: James Smart @ 2023-07-11 22:47 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	Ewan Milne

On 7/6/2023 5:07 AM, Daniel Wagner wrote:
> Hi James,
> 
> On Sat, Jul 01, 2023 at 05:11:11AM -0700, James Smart wrote:
>> As much as you want to make this change to make transports "similar", I am
>> dead set against it unless you are completing a long qualification of the
>> change on real FC hardware and FC-NVME devices. There is probably 1.5 yrs of
>> testing of different race conditions that drove this change. You cannot
>> declare success from a simplistic toy tool such as fcloop for validation.
>>
>> The original issues exist, probably have even morphed given the time from
>> the original change, and this will seriously disrupt the transport and any
>> downstream releases.  So I have a very strong NACK on this change.
>>
>> Yes - things such as the connect failure results are difficult to return
>> back to nvme-cli. I have had many gripes about the nvme-cli's behavior over
>> the years, especially on negative cases due to race conditions which
>> required retries. It still fails this miserably.  The async reconnect path
>> solved many of these issues for fc.
>>
>> For the auth failure, how do we deal with things if auth fails over time as
>> reconnects fail due to a credential changes ?  I would think commonality of
>> this behavior drives part of the choice.
> 
> Alright, what do you think about the idea to introduce a new '--sync' option to
> nvme-cli which forwards this info to the kernel that we want to wait for the
> initial connect to succeed or fail? Obviously, this needs to handle signals too.
> 
>  From what I understood this is also what Ewan would like to have.

To me this is not a sync vs non-sync option, it's a max_reconnects value 
tested for in nvmf_should_reconnect(), which, if set to 0 (or 1), should 
fail if the initial connect fails.

Right now max_reconnects is calculated from ctrl_loss_tmo and 
reconnect_delay, so there's already a way via the cli to make sure 
there's only one connect attempt. I wouldn't mind seeing an explicit cli 
option that sets it to one connection attempt without requiring the user 
to calculate and specify the two values.
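
For reference, the calculation and the check look roughly like this (a
sketch from memory of the fabrics code, not a verbatim copy):

  /* max_reconnects is derived from the two user-visible connect options */
  if (opts->ctrl_loss_tmo < 0)
          opts->max_reconnects = -1;      /* reconnect forever */
  else
          opts->max_reconnects = DIV_ROUND_UP(opts->ctrl_loss_tmo,
                                              opts->reconnect_delay);

  /* ...and the transports ask this before scheduling another attempt */
  bool nvmf_should_reconnect(struct nvme_ctrl *ctrl)
  {
          if (ctrl->opts->max_reconnects == -1 ||
              ctrl->nr_reconnects < ctrl->opts->max_reconnects)
                  return true;
          return false;
  }

So something like 'nvme connect ... --ctrl-loss-tmo=0' should already
limit us to a single attempt, assuming the rounding above does what I
remember.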

I also assume that this is not something that would be set by default in 
the auto-connect scripts or automated cli startup scripts.


> 
> Hannes thought it would make sense to use the same initial connect logic in
> tcp/rdma, because there could also be transient erros (e.g. spanning tree
> protocol). In short making the tcp/rdma do the same thing as fc?

I agree that the same connect logic makes sense for tcp/rdma. Certainly 
a single connect/teardown path, versus one at create and one at 
reconnect, makes sense. The transient errors during the first connect 
were why FC added it, and I would assume tcp/rdma have their own 
transient errors or timing relationships at initial connection setup, etc.

For FC, we're trying to work around errors to transport commands (FC 
NVME ELS's) that are dropped or time out, or commands used to initialize 
the controller which may be dropped or time out and thus fail controller 
init. Although NVMe-FC does have a retransmission option, it generally 
doesn't apply to the FC NVME LS's, and few of the FC devices have turned 
on the retransmission option to deal with the errors. So the general 
behavior is connection termination and/or association termination, which 
then depends on the reconnect path to retry. It's also critical because 
connection requests are automated on FC based on connectivity events. If 
we fail out to the cli due to the fabric dropping some up-front command, 
there's no guarantee there will be another connectivity event to restart 
the controller create, and we end up without device connectivity.

The other issue we had to deal with was how long the sysadmin hung out 
waiting for the auto-connect script to complete. We couldn't wait for 
the entire multiple-retry case, and returning before the first attempt 
was complete was against the spirit of the cli - so we waited for the 
first attempt to finish, released the sysadmin, and let the reconnect go 
on in the background.


> 
> So let's drop the final patch from this series for the time. Could you give some
> feedback on the rest of the patches?
> 
> Thanks,
> Daniel

I'll look at them.

-- james



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous
  2023-07-11 22:47       ` James Smart
@ 2023-07-12  6:50         ` Hannes Reinecke
  0 siblings, 0 replies; 17+ messages in thread
From: Hannes Reinecke @ 2023-07-12  6:50 UTC (permalink / raw)
  To: James Smart, Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Ewan Milne

On 7/12/23 00:47, James Smart wrote:
> On 7/6/2023 5:07 AM, Daniel Wagner wrote:
>> Hi James,
>>
>> On Sat, Jul 01, 2023 at 05:11:11AM -0700, James Smart wrote:
>>> As much as you want to make this change to make transports "similar", 
>>> I am dead set against it unless you are completing a long qualification
>>> of the change on real FC hardware and FC-NVME devices. There is probably 
>>> 1.5 yrs of testing of different race conditions that drove this change.
>>> You cannot declare success from a simplistic toy tool such as fcloop for 
>>> validation.
>>>
>>> The original issues exist, probably have even morphed given the time 
>>> from
>>> the original change, and this will seriously disrupt the transport 
>>> and any
>>> downstream releases.  So I have a very strong NACK on this change.
>>>
>>> Yes - things such as the connect failure results are difficult to return
>>> back to nvme-cli. I have had many gripes about the nvme-cli's 
>>> behavior over
>>> the years, especially on negative cases due to race conditions which
>>> required retries. It still fails this miserably.  The async reconnect 
>>> path
>>> solved many of these issues for fc.
>>>
>>> For the auth failure, how do we deal with things if auth fails over 
>>> time as
>>> reconnects fail due to a credential changes ?  I would think 
>>> commonality of
>>> this behavior drives part of the choice.
>>
>> Alright, what do you think about the idea to introduce a new '--sync' 
>> option to
>> nvme-cli which forwards this info to the kernel that we want to wait 
>> for the
>> initial connect to succeed or fail? Obviously, this needs to handle 
>> signals too.
>>
>>  From what I understood this is also what Ewan would like to have
> To me this is not sync vs non-sync option, it's a max_reconnects value 
> tested for in nvmf_should_reconnect(). Which, if set to 0 (or 1), should 
> fail if the initial connect fails.
> 
Well, this is more of a technical detail while we continue to argue 
about 'sync' vs 'non-sync'.
Currently all instances of ->create_ctrl() run asynchronously, i.e.
->create_ctrl() returns a 'ctrl' object which is still in the process of
establishing the connection.
(And there it doesn't really matter whether it's FC or TCP/RDMA; FC 
kicks off a workqueue for the 'reconnect' call, whereas TCP/RDMA creates 
the association and issues the actual 'connect' NVMe SQE via an I/O 
workqueue; the net result is identical.)
And when we talk about a 'sync' connect we are planning to _wait_ until
this asynchronous operation reaches a steady state, i.e. either after 
the connect attempts have succeeded or after the connect retries are 
exhausted.

And yes, we _are_ aware that this might be quite a long time.
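
As a rough sketch of the kernel side, assuming a new (hypothetical)
'connect_done' completion that the reconnect machinery completes once it
either succeeds or gives up, the wait itself would be trivial:

  /* sketch only: block until the async connect reaches a steady state */
  static int nvmf_wait_for_connect(struct nvme_ctrl *ctrl)
  {
          int ret;

          ret = wait_for_completion_interruptible(&ctrl->connect_done);
          if (ret)        /* interrupted by a signal, e.g. Ctrl-C */
                  return ret;

          return ctrl->state == NVME_CTRL_LIVE ? 0 : -ECONNREFUSED;
  }

->create_ctrl() would call this after kicking off the usual asynchronous
path, so the default (non-sync) behavior stays exactly as it is today.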

> Right now max_reconnects is calculated by the ctrl_loss_tmo and 
> reconnect_delay. So there's already a way via the cli to make sure 
> there's only 1 connect attempt. I wouldn't mind seeing an exact cli 
> option that sets it to 1 connection attempt w/o the user calculation and 
> 2 value specification.
> 
Again, we do _not_ propose to change any of the default settings.
The 'sync' option will not modify the reconnect settings, it will just 
wait until a steady state is reached.

> I also assume that this is not something that would be set by default in 
> the auto-connect scripts or automated cli startup scripts.
> 
You assume correctly. That's why it'll be an additional option.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous
  2023-07-06 12:07     ` Daniel Wagner
  2023-07-11 22:47       ` James Smart
@ 2023-07-13 20:35       ` Ewan Milne
  1 sibling, 0 replies; 17+ messages in thread
From: Ewan Milne @ 2023-07-13 20:35 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: James Smart, linux-nvme, linux-kernel, linux-block,
	Chaitanya Kulkarni, Shin'ichiro Kawasaki, Sagi Grimberg,
	Hannes Reinecke

The basic issue I am trying to solve with NVMe/FC is that we cannot get systems
to boot reliably from NVMe/FC (boot from SAN) because we don't know how long
the connect is going to take, and we have seen spurious failures in our testing.

In general, I think the whole business of a userspace syscall() ->
asynchronous work in the kernel is problematic because the syscall can
return a good status even if the connect is never going to work due to a
condition that is not going to clear. It is difficult and cumbersome to
script around this reliably.

So what I am saying is the current mechanism doesn't work completely
right anyway.

We also encounter the problem Daniel is addressing when trying to get
blktests and other internal tests to work, for similar reasons.

-Ewan

On Thu, Jul 6, 2023 at 8:07 AM Daniel Wagner <dwagner@suse.de> wrote:
>
> Hi James,
>
> On Sat, Jul 01, 2023 at 05:11:11AM -0700, James Smart wrote:
> > As much as you want to make this change to make transports "similar", I am
> > dead set against it unless you are completing a long qualification of the
> > change on real FC hardware and FC-NVME devices. There is probably 1.5 yrs of
> > testing of different race conditions that drove this change. You cannot
> > declare success from a simplistic toy tool such as fcloop for validation.
> >
> > The original issues exist, probably have even morphed given the time from
> > the original change, and this will seriously disrupt the transport and any
> > downstream releases.  So I have a very strong NACK on this change.
> >
> > Yes - things such as the connect failure results are difficult to return
> > back to nvme-cli. I have had many gripes about the nvme-cli's behavior over
> > the years, especially on negative cases due to race conditions which
> > required retries. It still fails this miserably.  The async reconnect path
> > solved many of these issues for fc.
> >
> > For the auth failure, how do we deal with things if auth fails over time as
> > reconnects fail due to a credential changes ?  I would think commonality of
> > this behavior drives part of the choice.
>
> Alright, what do you think about the idea to introduce a new '--sync' option to
> nvme-cli which forwards this info to the kernel that we want to wait for the
> initial connect to succeed or fail? Obviously, this needs to handle signals too.
>
> From what I understood this is also what Ewan would like to have.
>
> Hannes thought it would make sense to use the same initial connect logic in
> tcp/rdma, because there could also be transient erros (e.g. spanning tree
> protocol). In short making the tcp/rdma do the same thing as fc?
>
> So let's drop the final patch from this series for the time. Could you give some
> feedback on the rest of the patches?
>
> Thanks,
> Daniel
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-07-13 20:36 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-20 13:37 [PATCH v2 0/5] nvme-fc: Fix blktests hangers Daniel Wagner
2023-06-20 13:37 ` [PATCH v2 1/5] nvme-fc: Do not wait in vain when unloading module Daniel Wagner
2023-06-20 13:37 ` [PATCH v2 2/5] nvme-fcloop: queue work items correctly Daniel Wagner
2023-06-20 13:37 ` [PATCH v2 3/5] nvmet-fcloop: Remove remote port from list when unlinking Daniel Wagner
2023-06-20 13:37 ` [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous Daniel Wagner
2023-06-26 10:59   ` Dan Carpenter
2023-06-26 11:33   ` Dan Carpenter
2023-06-27  6:18     ` Daniel Wagner
2023-06-27  6:39       ` Hannes Reinecke
2023-06-27  6:51         ` Hannes Reinecke
2023-07-01 12:11   ` James Smart
2023-07-06 12:07     ` Daniel Wagner
2023-07-11 22:47       ` James Smart
2023-07-12  6:50         ` Hannes Reinecke
2023-07-13 20:35       ` Ewan Milne
2023-06-20 13:37 ` [PATCH v2 5/5] nvme-fc: do no free ctrl opts Daniel Wagner
2023-06-30 13:33 ` [PATCH v2 0/5] nvme-fc: Fix blktests hangers Ewan Milne

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).