* [PATCH blktests v1 0/3] More fixes for FC enabling
@ 2023-06-20 13:27 Daniel Wagner
  2023-06-20 13:27 ` [PATCH blktests v1 1/3] nvme/048: Check for queue count check directly Daniel Wagner
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Daniel Wagner @ 2023-06-20 13:27 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger, Daniel Wagner

The first patch addresses the problem that the FC transport reconnects so
fast that the test did not observe all the states from live ->
resetting -> connecting -> live. Instead of trying to see these transitions,
just test for the final state, which is live, and for the correct number of
queues. This also makes the test a little bit more robust. So this patch is
necessary.

The next two patches are more of an RFC, but I think it makes sense to post
them alongside the rest.

The second and the third patch rely on the not yet released nvme-cli features
'volatile configuration' and 'execution context awareness'. These two features
allow nvme-cli to figure out whether an 'nvme connect' should actually be
executed or just ignored. If the FC autoconnect udev/systemd rules are enabled
on a host, they interfere with blktests. Note, this is also a way to get
nvme-stas and nvme-cli to play nicely with each other.

Anyone who wants to run blktests with FC as transport needs to either disable
the autoconnect feature or use the unreleased nvme-cli features.
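
For example, the autoconnect rule can be disabled by masking the packaged
udev rule (assuming the usual rule file name):

  ln -s /dev/null /etc/udev/rules.d/70-nvmf-autoconnect.rules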

Daniel Wagner (3):
  nvme/048: Check for queue count check directly
  nvme/rc: Avoid triggering host nvme-cli autoconnect
  nvme/{041,042,043,044,045}: Use default hostnqn and hostid

 tests/nvme/041 |  8 ++----
 tests/nvme/042 |  8 ++----
 tests/nvme/043 |  8 ++----
 tests/nvme/044 |  8 ++----
 tests/nvme/045 |  8 ++----
 tests/nvme/048 | 35 ++++++++++++++++--------
 tests/nvme/rc  | 73 +++++++++++++++++++++++++++++++++++++++++++-------
 7 files changed, 97 insertions(+), 51 deletions(-)

-- 
2.41.0



* [PATCH blktests v1 1/3] nvme/048: Check for queue count check directly
  2023-06-20 13:27 [PATCH blktests v1 0/3] More fixes for FC enabling Daniel Wagner
@ 2023-06-20 13:27 ` Daniel Wagner
  2023-06-20 13:49   ` Sagi Grimberg
  2023-06-27 10:13   ` Shinichiro Kawasaki
  2023-06-20 13:27 ` [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect Daniel Wagner
  2023-06-20 13:27 ` [PATCH blktests v1 3/3] nvme/{041,042,043,044,045}: Use default hostnqn and hostid Daniel Wagner
  2 siblings, 2 replies; 35+ messages in thread
From: Daniel Wagner @ 2023-06-20 13:27 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger, Daniel Wagner

The test monitored the state changes live -> resetting -> connecting ->
live to figure out whether the queue count change was successful.

The fc transport reconnects very fast, so the state transitions are not
observed by the current approach.

So instead of trying to monitor the state changes, let's just wait for
the live state and the correct queue count.

As the queue count depends on the number of online CPUs, we explicitly
use 1 and 2 for qid_max. This means the queue_count value needs to
reach either 2 or 3 (admin queue included).

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 tests/nvme/048 | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/tests/nvme/048 b/tests/nvme/048
index 81084f0440c2..3dc5169132de 100755
--- a/tests/nvme/048
+++ b/tests/nvme/048
@@ -42,6 +42,26 @@ nvmf_wait_for_state() {
 	return 0
 }
 
+nvmf_wait_for_queue_count() {
+	local subsys_name="$1"
+	local queue_count="$2"
+	local nvmedev
+
+	nvmedev=$(_find_nvme_dev "${subsys_name}")
+
+	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"
+
+	nvmf_wait_for_state "${subsys_name}" "live" || return 1
+
+	queue_count=$((queue_count + 1))
+	if grep -q "${queue_count}" "${queue_count_file}"; then
+		return 0
+	fi
+
+	echo "expected queue count ${queue_count} not set"
+	return 1
+}
+
 set_nvmet_attr_qid_max() {
 	local nvmet_subsystem="$1"
 	local qid_max="$2"
@@ -56,10 +76,7 @@ set_qid_max() {
 	local qid_max="$3"
 
 	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
-
-	# Setting qid_max forces a disconnect and the reconntect attempt starts
-	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
-	nvmf_wait_for_state "${subsys_name}" "live" || return 1
+	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1
 
 	return 0
 }
@@ -77,12 +94,8 @@ test() {
 
 	_setup_nvmet
 
-	hostid="$(uuidgen)"
-	if [ -z "$hostid" ] ; then
-		echo "uuidgen failed"
-		return 1
-	fi
-	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
+	hostid="${def_hostid}"
+	hostnqn="${def_hostnqn}"
 
 	truncate -s "${nvme_img_size}" "${file_path}"
 
@@ -103,7 +116,7 @@ test() {
 			echo FAIL
 		else
 			set_qid_max "${port}" "${subsys_name}" 1 || echo FAIL
-			set_qid_max "${port}" "${subsys_name}" 128 || echo FAIL
+			set_qid_max "${port}" "${subsys_name}" 2 || echo FAIL
 		fi
 
 		_nvme_disconnect_subsys "${subsys_name}"
-- 
2.41.0



* [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-06-20 13:27 [PATCH blktests v1 0/3] More fixes for FC enabling Daniel Wagner
  2023-06-20 13:27 ` [PATCH blktests v1 1/3] nvme/048: Check for queue count check directly Daniel Wagner
@ 2023-06-20 13:27 ` Daniel Wagner
  2023-06-20 14:07   ` Sagi Grimberg
                     ` (2 more replies)
  2023-06-20 13:27 ` [PATCH blktests v1 3/3] nvme/{041,042,043,044,045}: Use default hostnqn and hostid Daniel Wagner
  2 siblings, 3 replies; 35+ messages in thread
From: Daniel Wagner @ 2023-06-20 13:27 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger, Daniel Wagner

When the host has enabled the udev/systemd autoconnect services for the
fc transport, they interact with blktests and make tests break.

nvme-cli learned to ignore connect attempts when using the --context
command line option paired with a volatile configuration. Thus we can
mark all the resources created by blktests and avoid any interaction
with the systemd autoconnect scripts.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 tests/nvme/rc | 73 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 63 insertions(+), 10 deletions(-)

diff --git a/tests/nvme/rc b/tests/nvme/rc
index 191f3e2e0c43..00117d314a38 100644
--- a/tests/nvme/rc
+++ b/tests/nvme/rc
@@ -14,8 +14,8 @@ def_remote_wwnn="0x10001100aa000001"
 def_remote_wwpn="0x20001100aa000001"
 def_local_wwnn="0x10001100aa000002"
 def_local_wwpn="0x20001100aa000002"
-def_hostnqn="$(cat /etc/nvme/hostnqn 2> /dev/null)"
-def_hostid="$(cat /etc/nvme/hostid 2> /dev/null)"
+def_hostnqn="nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8"
+def_hostid="242d4a24-2484-4a80-8234-d0169409c5e8"
 nvme_trtype=${nvme_trtype:-"loop"}
 nvme_img_size=${nvme_img_size:-"1G"}
 nvme_num_iter=${nvme_num_iter:-"1000"}
@@ -161,6 +161,50 @@ _nvme_calc_rand_io_size() {
 	echo "${io_size_kb}k"
 }
 
+_nvme_cli_supports_context() {
+	if ! nvme connect --help 2>&1 | grep -q context > /dev/null; then
+		    return 1
+	fi
+	return 0
+}
+
+_setup_nvme_cli() {
+	if ! _nvme_cli_supports_context; then
+		return
+	fi
+
+	mkdir -p /run/nvme
+	cat >> /run/nvme/blktests.json <<-EOF
+	[
+	  {
+	    "hostnqn": "${def_hostnqn}",
+	    "hostid": "${def_hostid}",
+	    "subsystems": [
+	      {
+	        "application": "blktests",
+	        "nqn": "blktests-subsystem-1",
+	        "ports": [
+	          {
+	            "transport": "fc",
+	            "traddr": "nn-${def_remote_wwnn}:pn-${def_remote_wwpn}",
+	            "host_traddr": "nn-${def_local_wwnn}:pn-${def_local_wwpn}"
+	          }
+	        ]
+	      }
+	    ]
+	  }
+	]
+	EOF
+}
+
+_cleanup_nvme_cli() {
+	if ! _nvme_cli_supports_context; then
+		return
+	fi
+
+	rm -f /run/nvme/blktests.json
+}
+
 _nvme_fcloop_add_rport() {
 	local local_wwnn="$1"
 	local local_wwpn="$2"
@@ -235,6 +279,8 @@ _cleanup_fcloop() {
 	_nvme_fcloop_del_lport "${local_wwnn}" "${local_wwpn}"
 	_nvme_fcloop_del_rport "${local_wwnn}" "${local_wwpn}" \
 			       "${remote_wwnn}" "${remote_wwpn}"
+
+	_cleanup_nvme_cli
 }
 
 _cleanup_nvmet() {
@@ -337,6 +383,8 @@ _setup_nvmet() {
 		def_host_traddr=$(printf "nn-%s:pn-%s" \
 					 "${def_local_wwnn}" \
 					 "${def_local_wwpn}")
+
+		_setup_nvme_cli
 	fi
 }
 
@@ -436,18 +484,18 @@ _nvme_connect_subsys() {
 	trtype="$1"
 	subsysnqn="$2"
 
-	ARGS=(-t "${trtype}" -n "${subsysnqn}")
+	ARGS=(-t "${trtype}"
+	      -n "${subsysnqn}"
+	      --hostnqn="${hostnqn}"
+	      --hostid="${hostid}")
+	if _nvme_cli_supports_context; then
+		ARGS+=(--context="blktests")
+	fi
 	if [[ "${trtype}" == "fc" ]] ; then
 		ARGS+=(-a "${traddr}" -w "${host_traddr}")
 	elif [[ "${trtype}" != "loop" ]]; then
 		ARGS+=(-a "${traddr}" -s "${trsvcid}")
 	fi
-	if [[ "${hostnqn}" != "$def_hostnqn" ]]; then
-		ARGS+=(--hostnqn="${hostnqn}")
-	fi
-	if [[ "${hostid}" != "$def_hostid" ]]; then
-		ARGS+=(--hostid="${hostid}")
-	fi
 	if [[ -n "${hostkey}" ]]; then
 		ARGS+=(--dhchap-secret="${hostkey}")
 	fi
@@ -482,7 +530,12 @@ _nvme_discover() {
 	local host_traddr="${3:-$def_host_traddr}"
 	local trsvcid="${3:-$def_trsvcid}"
 
-	ARGS=(-t "${trtype}")
+	ARGS=(-t "${trtype}"
+	      --hostnqn="${def_hostnqn}"
+	      --hostid="${def_hostid}")
+	if _nvme_cli_supports_context; then
+		ARGS+=(--context="blktests")
+	fi
 	if [[ "${trtype}" = "fc" ]]; then
 		ARGS+=(-a "${traddr}" -w "${host_traddr}")
 	elif [[ "${trtype}" != "loop" ]]; then
-- 
2.41.0



* [PATCH blktests v1 3/3] nvme/{041,042,043,044,045}: Use default hostnqn and hostid
  2023-06-20 13:27 [PATCH blktests v1 0/3] More fixes for FC enabling Daniel Wagner
  2023-06-20 13:27 ` [PATCH blktests v1 1/3] nvme/048: Check for queue count check directly Daniel Wagner
  2023-06-20 13:27 ` [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect Daniel Wagner
@ 2023-06-20 13:27 ` Daniel Wagner
  2023-06-27 10:24   ` Shinichiro Kawasaki
  2023-07-06 16:06   ` Max Gurtovoy
  2 siblings, 2 replies; 35+ messages in thread
From: Daniel Wagner @ 2023-06-20 13:27 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger, Daniel Wagner

The host might have enabled the udev/systemd auto connect feature.
This disturbs the blktests for the fc transport. nvme-cli is able
to distinguish between the different invocations via the --context
option. In order to get this working we have to use the default
hostnqn and hostid and not randon generated IDs for every single
run.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 tests/nvme/041 | 8 ++------
 tests/nvme/042 | 8 ++------
 tests/nvme/043 | 8 ++------
 tests/nvme/044 | 8 ++------
 tests/nvme/045 | 8 ++------
 5 files changed, 10 insertions(+), 30 deletions(-)

diff --git a/tests/nvme/041 b/tests/nvme/041
index 308655dd6090..5b04b99b128e 100755
--- a/tests/nvme/041
+++ b/tests/nvme/041
@@ -30,12 +30,8 @@ test() {
 
 	echo "Running ${TEST_NAME}"
 
-	hostid="$(uuidgen)"
-	if [ -z "$hostid" ] ; then
-		echo "uuidgen failed"
-		return 1
-	fi
-	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
+	hostid="${def_hostid}"
+	hostnqn="${def_hostnqn}"
 	hostkey="$(nvme gen-dhchap-key -n ${subsys_name} 2> /dev/null)"
 	if [ -z "$hostkey" ] ; then
 		echo "nvme gen-dhchap-key failed"
diff --git a/tests/nvme/042 b/tests/nvme/042
index fed2efead013..8df5ed37aacc 100755
--- a/tests/nvme/042
+++ b/tests/nvme/042
@@ -32,12 +32,8 @@ test() {
 
 	echo "Running ${TEST_NAME}"
 
-	hostid="$(uuidgen)"
-	if [ -z "$hostid" ] ; then
-		echo "uuidgen failed"
-		return 1
-	fi
-	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
+	hostid="${def_hostid}"
+	hostnqn="${def_hostnqn}"
 
 	_setup_nvmet
 
diff --git a/tests/nvme/043 b/tests/nvme/043
index a030884aa4ed..b591e39d0706 100755
--- a/tests/nvme/043
+++ b/tests/nvme/043
@@ -33,12 +33,8 @@ test() {
 
 	echo "Running ${TEST_NAME}"
 
-	hostid="$(uuidgen)"
-	if [ -z "$hostid" ] ; then
-		echo "uuidgen failed"
-		return 1
-	fi
-	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
+	hostid="${def_hostid}"
+	hostnqn="${def_hostnqn}"
 
 	_setup_nvmet
 
diff --git a/tests/nvme/044 b/tests/nvme/044
index 9928bcc55397..fca0897af27b 100755
--- a/tests/nvme/044
+++ b/tests/nvme/044
@@ -32,12 +32,8 @@ test() {
 
 	echo "Running ${TEST_NAME}"
 
-	hostid="$(uuidgen)"
-	if [ -z "$hostid" ] ; then
-		echo "uuidgen failed"
-		return 1
-	fi
-	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
+	hostid="${def_hostid}"
+	hostnqn="${def_hostnqn}"
 
 	hostkey="$(nvme gen-dhchap-key -n ${subsys_name} 2> /dev/null)"
 	if [ -z "$hostkey" ] ; then
diff --git a/tests/nvme/045 b/tests/nvme/045
index 26a55335a92c..eca629a18691 100755
--- a/tests/nvme/045
+++ b/tests/nvme/045
@@ -36,12 +36,8 @@ test() {
 
 	echo "Running ${TEST_NAME}"
 
-	hostid="$(uuidgen)"
-	if [ -z "$hostid" ] ; then
-		echo "uuidgen failed"
-		return 1
-	fi
-	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
+	hostid="${def_hostid}"
+	hostnqn="${def_hostnqn}"
 
 	hostkey="$(nvme gen-dhchap-key -n ${subsys_name} 2> /dev/null)"
 	if [ -z "$hostkey" ] ; then
-- 
2.41.0



* Re: [PATCH blktests v1 1/3] nvme/048: Check for queue count check directly
  2023-06-20 13:27 ` [PATCH blktests v1 1/3] nvme/048: Check for queue count check directly Daniel Wagner
@ 2023-06-20 13:49   ` Sagi Grimberg
  2023-06-20 17:37     ` Daniel Wagner
  2023-06-27 10:13   ` Shinichiro Kawasaki
  1 sibling, 1 reply; 35+ messages in thread
From: Sagi Grimberg @ 2023-06-20 13:49 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Hannes Reinecke, James Smart,
	Martin Belanger


> The test monitored the state changes live -> resetting -> connecting ->
> live to figure out whether the queue count change was successful.
> 
> The fc transport reconnects very fast, so the state transitions are not
> observed by the current approach.
> 
> So instead of trying to monitor the state changes, let's just wait for
> the live state and the correct queue count.
> 
> As the queue count depends on the number of online CPUs, we explicitly
> use 1 and 2 for qid_max. This means the queue_count value needs to
> reach either 2 or 3 (admin queue included).
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>   tests/nvme/048 | 35 ++++++++++++++++++++++++-----------
>   1 file changed, 24 insertions(+), 11 deletions(-)
> 
> diff --git a/tests/nvme/048 b/tests/nvme/048
> index 81084f0440c2..3dc5169132de 100755
> --- a/tests/nvme/048
> +++ b/tests/nvme/048
> @@ -42,6 +42,26 @@ nvmf_wait_for_state() {
>   	return 0
>   }
>   
> +nvmf_wait_for_queue_count() {
> +	local subsys_name="$1"
> +	local queue_count="$2"
> +	local nvmedev
> +
> +	nvmedev=$(_find_nvme_dev "${subsys_name}")
> +
> +	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"
> +
> +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> +
> +	queue_count=$((queue_count + 1))
> +	if grep -q "${queue_count}" "${queue_count_file}"; then
> +		return 0
> +	fi
> +
> +	echo "expected queue count ${queue_count} not set"
> +	return 1
> +}
> +
>   set_nvmet_attr_qid_max() {
>   	local nvmet_subsystem="$1"
>   	local qid_max="$2"
> @@ -56,10 +76,7 @@ set_qid_max() {
>   	local qid_max="$3"
>   
>   	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
> -
> -	# Setting qid_max forces a disconnect and the reconntect attempt starts
> -	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
> -	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> +	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1

Why not simply wait for live? The connecting is obviously racy...

>   
>   	return 0
>   }
> @@ -77,12 +94,8 @@ test() {
>   
>   	_setup_nvmet
>   
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"
>   
>   	truncate -s "${nvme_img_size}" "${file_path}"
>   
> @@ -103,7 +116,7 @@ test() {
>   			echo FAIL
>   		else
>   			set_qid_max "${port}" "${subsys_name}" 1 || echo FAIL
> -			set_qid_max "${port}" "${subsys_name}" 128 || echo FAIL
> +			set_qid_max "${port}" "${subsys_name}" 2 || echo FAIL
>   		fi
>   
>   		_nvme_disconnect_subsys "${subsys_name}"


* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-06-20 13:27 ` [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect Daniel Wagner
@ 2023-06-20 14:07   ` Sagi Grimberg
  2023-06-20 17:43     ` Daniel Wagner
  2023-06-27 10:22   ` Shinichiro Kawasaki
  2023-07-06 16:11   ` Max Gurtovoy
  2 siblings, 1 reply; 35+ messages in thread
From: Sagi Grimberg @ 2023-06-20 14:07 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Hannes Reinecke, James Smart,
	Martin Belanger


> When the host has enabled the udev/systemd autoconnect services for the
> fc transport, they interact with blktests and make tests break.
> 
> nvme-cli learned to ignore connect attempts when using the --context
> command line option paired with a volatile configuration. Thus we can
> mark all the resources created by blktests and avoid any interaction
> with the systemd autoconnect scripts.

Hmm... is this happening with non-fc as well?

> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>   tests/nvme/rc | 73 ++++++++++++++++++++++++++++++++++++++++++++-------
>   1 file changed, 63 insertions(+), 10 deletions(-)
> 
> diff --git a/tests/nvme/rc b/tests/nvme/rc
> index 191f3e2e0c43..00117d314a38 100644
> --- a/tests/nvme/rc
> +++ b/tests/nvme/rc
> @@ -14,8 +14,8 @@ def_remote_wwnn="0x10001100aa000001"
>   def_remote_wwpn="0x20001100aa000001"
>   def_local_wwnn="0x10001100aa000002"
>   def_local_wwpn="0x20001100aa000002"
> -def_hostnqn="$(cat /etc/nvme/hostnqn 2> /dev/null)"
> -def_hostid="$(cat /etc/nvme/hostid 2> /dev/null)"
> +def_hostnqn="nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8"
> +def_hostid="242d4a24-2484-4a80-8234-d0169409c5e8"
>   nvme_trtype=${nvme_trtype:-"loop"}
>   nvme_img_size=${nvme_img_size:-"1G"}
>   nvme_num_iter=${nvme_num_iter:-"1000"}
> @@ -161,6 +161,50 @@ _nvme_calc_rand_io_size() {
>   	echo "${io_size_kb}k"
>   }
>   
> +_nvme_cli_supports_context() {
> +	if ! nvme connect --help 2>&1 | grep -q context > /dev/null; then
> +		    return 1
> +	fi
> +	return 0
> +}

Not a great way to check support.

> +
> +_setup_nvme_cli() {
> +	if ! _nvme_cli_supports_context; then
> +		return
> +	fi
> +
> +	mkdir -p /run/nvme
> +	cat >> /run/nvme/blktests.json <<-EOF
> +	[
> +	  {
> +	    "hostnqn": "${def_hostnqn}",
> +	    "hostid": "${def_hostid}",
> +	    "subsystems": [
> +	      {
> +	        "application": "blktests",
> +	        "nqn": "blktests-subsystem-1",
> +	        "ports": [
> +	          {
> +	            "transport": "fc",
> +	            "traddr": "nn-${def_remote_wwnn}:pn-${def_remote_wwpn}",
> +	            "host_traddr": "nn-${def_local_wwnn}:pn-${def_local_wwpn}"
> +	          }
> +	        ]
> +	      }
> +	    ]
> +	  }
> +	]
> +	EOF
> +}
> +
> +_cleanup_nvme_cli() {
> +	if ! _nvme_cli_supports_context; then
> +		return
> +	fi
> +
> +	rm -f /run/nvme/blktests.json
> +}
> +
>   _nvme_fcloop_add_rport() {
>   	local local_wwnn="$1"
>   	local local_wwpn="$2"
> @@ -235,6 +279,8 @@ _cleanup_fcloop() {
>   	_nvme_fcloop_del_lport "${local_wwnn}" "${local_wwpn}"
>   	_nvme_fcloop_del_rport "${local_wwnn}" "${local_wwpn}" \
>   			       "${remote_wwnn}" "${remote_wwpn}"
> +
> +	_cleanup_nvme_cli
>   }
>   
>   _cleanup_nvmet() {
> @@ -337,6 +383,8 @@ _setup_nvmet() {
>   		def_host_traddr=$(printf "nn-%s:pn-%s" \
>   					 "${def_local_wwnn}" \
>   					 "${def_local_wwpn}")
> +
> +		_setup_nvme_cli
>   	fi
>   }
>   
> @@ -436,18 +484,18 @@ _nvme_connect_subsys() {
>   	trtype="$1"
>   	subsysnqn="$2"
>   
> -	ARGS=(-t "${trtype}" -n "${subsysnqn}")
> +	ARGS=(-t "${trtype}"
> +	      -n "${subsysnqn}"
> +	      --hostnqn="${hostnqn}"
> +	      --hostid="${hostid}")
> +	if _nvme_cli_supports_context; then
> +		ARGS+=(--context="blktests")
> +	fi
>   	if [[ "${trtype}" == "fc" ]] ; then
>   		ARGS+=(-a "${traddr}" -w "${host_traddr}")
>   	elif [[ "${trtype}" != "loop" ]]; then
>   		ARGS+=(-a "${traddr}" -s "${trsvcid}")
>   	fi
> -	if [[ "${hostnqn}" != "$def_hostnqn" ]]; then
> -		ARGS+=(--hostnqn="${hostnqn}")
> -	fi
> -	if [[ "${hostid}" != "$def_hostid" ]]; then
> -		ARGS+=(--hostid="${hostid}")
> -	fi
>   	if [[ -n "${hostkey}" ]]; then
>   		ARGS+=(--dhchap-secret="${hostkey}")
>   	fi
> @@ -482,7 +530,12 @@ _nvme_discover() {
>   	local host_traddr="${3:-$def_host_traddr}"
>   	local trsvcid="${3:-$def_trsvcid}"
>   
> -	ARGS=(-t "${trtype}")
> +	ARGS=(-t "${trtype}"
> +	      --hostnqn="${def_hostnqn}"
> +	      --hostid="${def_hostid}")
> +	if _nvme_cli_supports_context; then
> +		ARGS+=(--context="blktests")
> +	fi
>   	if [[ "${trtype}" = "fc" ]]; then
>   		ARGS+=(-a "${traddr}" -w "${host_traddr}")
>   	elif [[ "${trtype}" != "loop" ]]; then


* Re: [PATCH blktests v1 1/3] nvme/048: Check for queue count check directly
  2023-06-20 13:49   ` Sagi Grimberg
@ 2023-06-20 17:37     ` Daniel Wagner
  2023-06-21  9:50       ` Sagi Grimberg
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Wagner @ 2023-06-20 17:37 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Hannes Reinecke, James Smart,
	Martin Belanger

On Tue, Jun 20, 2023 at 04:49:45PM +0300, Sagi Grimberg wrote:
> > +nvmf_wait_for_queue_count() {
> > +	local subsys_name="$1"
> > +	local queue_count="$2"
> > +	local nvmedev
> > +
> > +	nvmedev=$(_find_nvme_dev "${subsys_name}")
> > +
> > +	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"
> > +
> > +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> > +
> > +	queue_count=$((queue_count + 1))
> > +	if grep -q "${queue_count}" "${queue_count_file}"; then
> > +		return 0
> > +	fi
> > +
> > +	echo "expected queue count ${queue_count} not set"
> > +	return 1
> > +}
> > +
> >   set_nvmet_attr_qid_max() {
> >   	local nvmet_subsystem="$1"
> >   	local qid_max="$2"
> > @@ -56,10 +76,7 @@ set_qid_max() {
> >   	local qid_max="$3"
> >   	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
> > -
> > -	# Setting qid_max forces a disconnect and the reconntect attempt starts
> > -	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
> > -	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> > +	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1
> 
> Why not simply wait for live? The connecting is obviously racy...

That is what the new version is doing. It's waiting for the live state and then
checks the queue count.


* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-06-20 14:07   ` Sagi Grimberg
@ 2023-06-20 17:43     ` Daniel Wagner
  2023-06-21  9:02       ` Daniel Wagner
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Wagner @ 2023-06-20 17:43 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Hannes Reinecke, James Smart,
	Martin Belanger

On Tue, Jun 20, 2023 at 05:07:43PM +0300, Sagi Grimberg wrote:
> 
> > When the host has enabled the udev/systemd autoconnect services for the
> > fc transport, they interact with blktests and make tests break.
> > 
> > nvme-cli learned to ignore connect attempts when using the --context
> > command line option paired with a volatile configuration. Thus we can
> > mark all the resources created by blktests and avoid any interaction
> > with the systemd autoconnect scripts.
> 
> Hmm... is this happening with non-fc as well?

I haven't seen a problem for TCP or RDMA yet, but in principle it should also
exist. I can do some tracing to see if we also have a problem there. Two of the
udev rules match on the subsystem and the event type.

> > +_nvme_cli_supports_context() {
> > +	if ! nvme connect --help 2>&1 | grep -q context > /dev/null; then
> > +		    return 1
> > +	fi
> > +	return 0
> > +}
> 
> Not a great way to check support.

Yeah, agree, it's a bit dodgy. I'll try to figure out a different way. Any
suggestions?
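
One conceivable alternative (just a sketch; the minimum nvme-cli version that
ships --context is an assumption here) would be to compare the version string
instead of grepping the help output:

	# sketch: assume --context is available from a known minimum
	# nvme-cli version onwards
	_nvme_cli_version_ge() {
		local min="$1" ver

		ver="$(nvme version | grep -oE '[0-9]+\.[0-9]+' | head -1)"
		[[ "$(printf '%s\n%s\n' "$min" "$ver" | sort -V | head -1)" == "$min" ]]
	}

	_nvme_cli_supports_context() {
		_nvme_cli_version_ge 2.3	# assumed minimum version
	}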


* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-06-20 17:43     ` Daniel Wagner
@ 2023-06-21  9:02       ` Daniel Wagner
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Wagner @ 2023-06-21  9:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Hannes Reinecke, James Smart,
	Martin Belanger

On Tue, Jun 20, 2023 at 07:43:35PM +0200, Daniel Wagner wrote:
> On Tue, Jun 20, 2023 at 05:07:43PM +0300, Sagi Grimberg wrote:
> > Hmm... is this happening with non-fc as well?
> 
> I haven't seen a problem for TCP or RDMA yet, but in principle it should also
> exist. I can do some tracing to see if we also have a problem there. Two of the
> udev rules match on the subsystem and the event type.

The only udev rule which gets triggered during blktests execution is this one:

# nvme-fc transport generated events (old-style for compatibility)
ACTION=="change", SUBSYSTEM=="fc", ENV{FC_EVENT}=="nvmediscovery", \
  ENV{NVMEFC_HOST_TRADDR}=="*",  ENV{NVMEFC_TRADDR}=="*", \
  RUN+="@SYSTEMCTL@ --no-block restart nvmf-connect@--device=none\t--transport=fc\t--traddr=$env{NVMEFC_TRADDR}\t--trsvcid=none\t--host-traddr=$env{NVMEFC_HOST_TRADDR}.service"

That explains why blktests didn't get disturbed by the autoconnect rule, as
this rule matches on the fc subsystem.


* Re: [PATCH blktests v1 1/3] nvme/048: Check for queue count check directly
  2023-06-20 17:37     ` Daniel Wagner
@ 2023-06-21  9:50       ` Sagi Grimberg
  2023-06-21 11:19         ` Daniel Wagner
  0 siblings, 1 reply; 35+ messages in thread
From: Sagi Grimberg @ 2023-06-21  9:50 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Hannes Reinecke, James Smart,
	Martin Belanger


>>> +nvmf_wait_for_queue_count() {
>>> +	local subsys_name="$1"
>>> +	local queue_count="$2"
>>> +	local nvmedev
>>> +
>>> +	nvmedev=$(_find_nvme_dev "${subsys_name}")
>>> +
>>> +	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"
>>> +
>>> +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
>>> +
>>> +	queue_count=$((queue_count + 1))
>>> +	if grep -q "${queue_count}" "${queue_count_file}"; then
>>> +		return 0
>>> +	fi
>>> +
>>> +	echo "expected queue count ${queue_count} not set"
>>> +	return 1
>>> +}
>>> +
>>>    set_nvmet_attr_qid_max() {
>>>    	local nvmet_subsystem="$1"
>>>    	local qid_max="$2"
>>> @@ -56,10 +76,7 @@ set_qid_max() {
>>>    	local qid_max="$3"
>>>    	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
>>> -
>>> -	# Setting qid_max forces a disconnect and the reconntect attempt starts
>>> -	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
>>> -	nvmf_wait_for_state "${subsys_name}" "live" || return 1
>>> +	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1
>>
>> Why not simply wait for live? The connecting is obviously racy...
> 
> That is what the new version is doing. It's waiting for the live state and then
> checks the queue count.

Maybe don't fold waiting for live into waiting for queue_count.


* Re: [PATCH blktests v1 1/3] nvme/048: Check for queue count check directly
  2023-06-21  9:50       ` Sagi Grimberg
@ 2023-06-21 11:19         ` Daniel Wagner
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Wagner @ 2023-06-21 11:19 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Hannes Reinecke, James Smart,
	Martin Belanger

On Wed, Jun 21, 2023 at 12:50:29PM +0300, Sagi Grimberg wrote:
> > > Why not simply wait for live? The connecting is obviously racy...
> > 
> > That is what the new version is doing. It's waiting for the live state and then
> > checks the queue count.
> 
> Maybe don't fold waiting for live into waiting for queue_count.

Alright, will do!


* Re: [PATCH blktests v1 1/3] nvme/048: Check for queue count check directly
  2023-06-20 13:27 ` [PATCH blktests v1 1/3] nvme/048: Check for queue count check directly Daniel Wagner
  2023-06-20 13:49   ` Sagi Grimberg
@ 2023-06-27 10:13   ` Shinichiro Kawasaki
  2023-06-28  5:52     ` Daniel Wagner
  1 sibling, 1 reply; 35+ messages in thread
From: Shinichiro Kawasaki @ 2023-06-27 10:13 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Sagi Grimberg, Hannes Reinecke, James Smart, Martin Belanger

On Jun 20, 2023 / 15:27, Daniel Wagner wrote:
> The test monitored the state changes live -> resetting -> connecting ->
> live to figure out whether the queue count change was successful.
> 
> The fc transport reconnects very fast, so the state transitions are not
> observed by the current approach.
> 
> So instead of trying to monitor the state changes, let's just wait for
> the live state and the correct queue count.
> 
> As the queue count depends on the number of online CPUs, we explicitly
> use 1 and 2 for qid_max. This means the queue_count value needs to
> reach either 2 or 3 (admin queue included).
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  tests/nvme/048 | 35 ++++++++++++++++++++++++-----------
>  1 file changed, 24 insertions(+), 11 deletions(-)
> 
> diff --git a/tests/nvme/048 b/tests/nvme/048
> index 81084f0440c2..3dc5169132de 100755
> --- a/tests/nvme/048
> +++ b/tests/nvme/048
> @@ -42,6 +42,26 @@ nvmf_wait_for_state() {
>  	return 0
>  }
>  
> +nvmf_wait_for_queue_count() {
> +	local subsys_name="$1"
> +	local queue_count="$2"
> +	local nvmedev
> +
> +	nvmedev=$(_find_nvme_dev "${subsys_name}")
> +
> +	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"

Nit: queue_count_file is not declared as a local variable.

> +
> +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> +
> +	queue_count=$((queue_count + 1))
> +	if grep -q "${queue_count}" "${queue_count_file}"; then

Does this check work when the number in queue_count_file has more digits than
queue_count? e.g.) queue_count_file=20, queue_count=2

> +		return 0
> +	fi
> +
> +	echo "expected queue count ${queue_count} not set"
> +	return 1
> +}
> +
>  set_nvmet_attr_qid_max() {
>  	local nvmet_subsystem="$1"
>  	local qid_max="$2"
> @@ -56,10 +76,7 @@ set_qid_max() {
>  	local qid_max="$3"
>  
>  	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
> -
> -	# Setting qid_max forces a disconnect and the reconntect attempt starts
> -	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
> -	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> +	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1
>  
>  	return 0
>  }
> @@ -77,12 +94,8 @@ test() {
>  
>  	_setup_nvmet
>  
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"

I guess it's better to move this hunk to the 3rd patch. Or we can mention it
in the commit message of this patch.

>  
>  	truncate -s "${nvme_img_size}" "${file_path}"
>  
> @@ -103,7 +116,7 @@ test() {
>  			echo FAIL
>  		else
>  			set_qid_max "${port}" "${subsys_name}" 1 || echo FAIL
> -			set_qid_max "${port}" "${subsys_name}" 128 || echo FAIL
> +			set_qid_max "${port}" "${subsys_name}" 2 || echo FAIL
>  		fi
>  
>  		_nvme_disconnect_subsys "${subsys_name}"
> -- 
> 2.41.0
> 


* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-06-20 13:27 ` [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect Daniel Wagner
  2023-06-20 14:07   ` Sagi Grimberg
@ 2023-06-27 10:22   ` Shinichiro Kawasaki
  2023-06-28  6:09     ` Daniel Wagner
  2023-07-06 16:11   ` Max Gurtovoy
  2 siblings, 1 reply; 35+ messages in thread
From: Shinichiro Kawasaki @ 2023-06-27 10:22 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger

On Jun 20, 2023 / 15:27, Daniel Wagner wrote:
> When the host has enabled the udev/systemd autoconnect services for the
> fc transport, they interact with blktests and make tests break.
> 
> nvme-cli learned to ignore connect attempts when using the --context
> command line option paired with a volatile configuration. Thus we can
> mark all the resources created by blktests and avoid any interaction
> with the systemd autoconnect scripts.

This sounds like a good idea. Question: is the "--context" option of the nvme
command mandatory to run the nvme tests with nvme_trtype=fc? Or is it a
nice-to-have feature depending on the test system OS? If it is mandatory, it's
better to check it in _nvme_requires.
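
Something along these lines, perhaps (just a sketch, assuming blktests'
SKIP_REASONS convention):

	_require_nvme_cli_context() {
		if ! _nvme_cli_supports_context; then
			SKIP_REASONS+=("nvme-cli does not support --context")
			return 1
		fi
		return 0
	}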

> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  tests/nvme/rc | 73 ++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 63 insertions(+), 10 deletions(-)
> 
> diff --git a/tests/nvme/rc b/tests/nvme/rc
> index 191f3e2e0c43..00117d314a38 100644
> --- a/tests/nvme/rc
> +++ b/tests/nvme/rc
> @@ -14,8 +14,8 @@ def_remote_wwnn="0x10001100aa000001"
>  def_remote_wwpn="0x20001100aa000001"
>  def_local_wwnn="0x10001100aa000002"
>  def_local_wwpn="0x20001100aa000002"
> -def_hostnqn="$(cat /etc/nvme/hostnqn 2> /dev/null)"
> -def_hostid="$(cat /etc/nvme/hostid 2> /dev/null)"
> +def_hostnqn="nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8"
> +def_hostid="242d4a24-2484-4a80-8234-d0169409c5e8"
>  nvme_trtype=${nvme_trtype:-"loop"}
>  nvme_img_size=${nvme_img_size:-"1G"}
>  nvme_num_iter=${nvme_num_iter:-"1000"}
> @@ -161,6 +161,50 @@ _nvme_calc_rand_io_size() {
>  	echo "${io_size_kb}k"
>  }
>  
> +_nvme_cli_supports_context() {
> +	if ! nvme connect --help 2>&1 | grep -q context > /dev/null; then
> +		    return 1
> +	fi
> +	return 0
> +}
> +
> +_setup_nvme_cli() {
> +	if ! _nvme_cli_supports_context; then
> +		return
> +	fi
> +
> +	mkdir -p /run/nvme
> +	cat >> /run/nvme/blktests.json <<-EOF
> +	[
> +	  {
> +	    "hostnqn": "${def_hostnqn}",
> +	    "hostid": "${def_hostid}",
> +	    "subsystems": [
> +	      {
> +	        "application": "blktests",
> +	        "nqn": "blktests-subsystem-1",
> +	        "ports": [
> +	          {
> +	            "transport": "fc",
> +	            "traddr": "nn-${def_remote_wwnn}:pn-${def_remote_wwpn}",
> +	            "host_traddr": "nn-${def_local_wwnn}:pn-${def_local_wwpn}"
> +	          }
> +	        ]
> +	      }
> +	    ]
> +	  }
> +	]
> +	EOF
> +}
> +
> +_cleanup_nvme_cli() {
> +	if ! _nvme_cli_supports_context; then
> +		return
> +	fi
> +
> +	rm -f /run/nvme/blktests.json
> +}
> +
>  _nvme_fcloop_add_rport() {
>  	local local_wwnn="$1"
>  	local local_wwpn="$2"
> @@ -235,6 +279,8 @@ _cleanup_fcloop() {
>  	_nvme_fcloop_del_lport "${local_wwnn}" "${local_wwpn}"
>  	_nvme_fcloop_del_rport "${local_wwnn}" "${local_wwpn}" \
>  			       "${remote_wwnn}" "${remote_wwpn}"
> +
> +	_cleanup_nvme_cli
>  }
>  
>  _cleanup_nvmet() {
> @@ -337,6 +383,8 @@ _setup_nvmet() {
>  		def_host_traddr=$(printf "nn-%s:pn-%s" \
>  					 "${def_local_wwnn}" \
>  					 "${def_local_wwpn}")
> +
> +		_setup_nvme_cli

Can we move this _setup_nvme_cli call from _setup_nvmet to _setup_fcloop?
_cleanup_nvme_cli is called in _cleanup_fcloop. I think it is a bit cleaner
since those setup/cleanup functions are called at the same level.

>  	fi
>  }
>  
> @@ -436,18 +484,18 @@ _nvme_connect_subsys() {
>  	trtype="$1"
>  	subsysnqn="$2"
>  
> -	ARGS=(-t "${trtype}" -n "${subsysnqn}")
> +	ARGS=(-t "${trtype}"
> +	      -n "${subsysnqn}"
> +	      --hostnqn="${hostnqn}"
> +	      --hostid="${hostid}")
> +	if _nvme_cli_supports_context; then
> +		ARGS+=(--context="blktests")
> +	fi
>  	if [[ "${trtype}" == "fc" ]] ; then
>  		ARGS+=(-a "${traddr}" -w "${host_traddr}")
>  	elif [[ "${trtype}" != "loop" ]]; then
>  		ARGS+=(-a "${traddr}" -s "${trsvcid}")
>  	fi
> -	if [[ "${hostnqn}" != "$def_hostnqn" ]]; then
> -		ARGS+=(--hostnqn="${hostnqn}")
> -	fi
> -	if [[ "${hostid}" != "$def_hostid" ]]; then
> -		ARGS+=(--hostid="${hostid}")
> -	fi
>  	if [[ -n "${hostkey}" ]]; then
>  		ARGS+=(--dhchap-secret="${hostkey}")
>  	fi
> @@ -482,7 +530,12 @@ _nvme_discover() {
>  	local host_traddr="${3:-$def_host_traddr}"
>  	local trsvcid="${3:-$def_trsvcid}"
>  
> -	ARGS=(-t "${trtype}")
> +	ARGS=(-t "${trtype}"
> +	      --hostnqn="${def_hostnqn}"
> +	      --hostid="${def_hostid}")
> +	if _nvme_cli_supports_context; then
> +		ARGS+=(--context="blktests")
> +	fi
>  	if [[ "${trtype}" = "fc" ]]; then
>  		ARGS+=(-a "${traddr}" -w "${host_traddr}")
>  	elif [[ "${trtype}" != "loop" ]]; then
> -- 
> 2.41.0
> 


* Re: [PATCH blktests v1 3/3] nvme/{041,042,043,044,045}: Use default hostnqn and hostid
  2023-06-20 13:27 ` [PATCH blktests v1 3/3] nvme/{041,042,043,044,045}: Use default hostnqn and hostid Daniel Wagner
@ 2023-06-27 10:24   ` Shinichiro Kawasaki
  2023-07-06 16:06   ` Max Gurtovoy
  1 sibling, 0 replies; 35+ messages in thread
From: Shinichiro Kawasaki @ 2023-06-27 10:24 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger

On Jun 20, 2023 / 15:27, Daniel Wagner wrote:
> The host might have enabled the udev/systemd auto connect feature.
> This disturbs the blktests for the fc transport. nvme-cli is able
> to distinguish between the different invocations via the --context
> option. In order to get this working we have to use the default
> hostnqn and hostid and not randon generated IDs for every single

Nit: s/randon/random/, probably.

Other than that this patch looks good to me.

> run.
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  tests/nvme/041 | 8 ++------
>  tests/nvme/042 | 8 ++------
>  tests/nvme/043 | 8 ++------
>  tests/nvme/044 | 8 ++------
>  tests/nvme/045 | 8 ++------
>  5 files changed, 10 insertions(+), 30 deletions(-)
> 
> diff --git a/tests/nvme/041 b/tests/nvme/041
> index 308655dd6090..5b04b99b128e 100755
> --- a/tests/nvme/041
> +++ b/tests/nvme/041
> @@ -30,12 +30,8 @@ test() {
>  
>  	echo "Running ${TEST_NAME}"
>  
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"
>  	hostkey="$(nvme gen-dhchap-key -n ${subsys_name} 2> /dev/null)"
>  	if [ -z "$hostkey" ] ; then
>  		echo "nvme gen-dhchap-key failed"
> diff --git a/tests/nvme/042 b/tests/nvme/042
> index fed2efead013..8df5ed37aacc 100755
> --- a/tests/nvme/042
> +++ b/tests/nvme/042
> @@ -32,12 +32,8 @@ test() {
>  
>  	echo "Running ${TEST_NAME}"
>  
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"
>  
>  	_setup_nvmet
>  
> diff --git a/tests/nvme/043 b/tests/nvme/043
> index a030884aa4ed..b591e39d0706 100755
> --- a/tests/nvme/043
> +++ b/tests/nvme/043
> @@ -33,12 +33,8 @@ test() {
>  
>  	echo "Running ${TEST_NAME}"
>  
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"
>  
>  	_setup_nvmet
>  
> diff --git a/tests/nvme/044 b/tests/nvme/044
> index 9928bcc55397..fca0897af27b 100755
> --- a/tests/nvme/044
> +++ b/tests/nvme/044
> @@ -32,12 +32,8 @@ test() {
>  
>  	echo "Running ${TEST_NAME}"
>  
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"
>  
>  	hostkey="$(nvme gen-dhchap-key -n ${subsys_name} 2> /dev/null)"
>  	if [ -z "$hostkey" ] ; then
> diff --git a/tests/nvme/045 b/tests/nvme/045
> index 26a55335a92c..eca629a18691 100755
> --- a/tests/nvme/045
> +++ b/tests/nvme/045
> @@ -36,12 +36,8 @@ test() {
>  
>  	echo "Running ${TEST_NAME}"
>  
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"
>  
>  	hostkey="$(nvme gen-dhchap-key -n ${subsys_name} 2> /dev/null)"
>  	if [ -z "$hostkey" ] ; then
> -- 
> 2.41.0
> 


* Re: [PATCH blktests v1 1/3] nvme/048: Check for queue count check directly
  2023-06-27 10:13   ` Shinichiro Kawasaki
@ 2023-06-28  5:52     ` Daniel Wagner
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Wagner @ 2023-06-28  5:52 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Sagi Grimberg, Hannes Reinecke, James Smart, Martin Belanger

On Tue, Jun 27, 2023 at 10:13:48AM +0000, Shinichiro Kawasaki wrote:
> > +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> > +
> > +	queue_count=$((queue_count + 1))
> > +	if grep -q "${queue_count}" "${queue_count_file}"; then
> 
> Does this check work when the number in queue_count_file has more digits than
> queue_count? e.g.) queue_count_file=20, queue_count=2

The idea is that it should be an exact match. Let me figure out if this does
what it is supposed to do.
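
For reference, forcing a whole-line match would avoid the substring issue,
e.g. (sketch only):

	if grep -qx "${queue_count}" "${queue_count_file}"; then
		return 0
	fi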

> > -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> > +	hostid="${def_hostid}"
> > +	hostnqn="${def_hostnqn}"
> 
> I guess it's better to move this hunk to the 3rd patch. Or we can mention it
> in the commit message of this patch.

Good point, I'll move this to the 3rd patch so this change is in one patch.


* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-06-27 10:22   ` Shinichiro Kawasaki
@ 2023-06-28  6:09     ` Daniel Wagner
  2023-06-28 10:04       ` Shinichiro Kawasaki
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Wagner @ 2023-06-28  6:09 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger

On Tue, Jun 27, 2023 at 10:22:53AM +0000, Shinichiro Kawasaki wrote:
> On Jun 20, 2023 / 15:27, Daniel Wagner wrote:
> > When the host has enabled the udev/systemd autoconnect services for the
> > fc transport, they interact with blktests and make tests break.
> > 
> > nvme-cli learned to ignore connect attempts when using the --context
> > command line option paired with a volatile configuration. Thus we can
> > mark all the resources created by blktests and avoid any interaction
> > with the systemd autoconnect scripts.
> 
> This sounds like a good idea. Question: is the "--context" option of the nvme
> command mandatory to run the nvme tests with nvme_trtype=fc?

If nvme-cli is called without the '--context' option, the command will be
executed. Though if '--context' is provided as an option and there is a
configuration which matches the connect parameters but doesn't match the
context, it will ignore the operation.

The blktests tests expect that nothing is fiddling with the setup behind
their back while they are running. So far udev didn't trigger for rdma/tcp,
but with fc it will.

Thus, it's mandatory to either use the '--context' parameter or
alternatively disable the rule with

  ln -s /dev/null /etc/udev/rules.d/70-nvmf-autoconnect.rules

BTW, when the udev rule is active I observed crashes when running blktests.
So there is more to fix, though one thing at a time.

> Or is it a nice-to-have feature
> depending on the test system OS? If it is mandatory, it's better to check
> it in _nvme_requires.

Well, I didn't want to make this a hard requirement for all tests. I guess we
could make it fc-only, if this is what you had in mind. The question is, should
it only test for nvme-cli supporting --context, or should it be really clever
and also test whether the udev rule is active (no idea how, but I assume it is
possible)?
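
For what it's worth, such a check could perhaps look like this (a rough
sketch; the rule path and the /dev/null masking convention are assumptions):

	# sketch: treat the packaged autoconnect rule as active unless it
	# is masked with a /dev/null symlink under /etc/udev/rules.d
	_nvmf_autoconnect_rule_active() {
		local rule="70-nvmf-autoconnect.rules"

		[[ -e "/usr/lib/udev/rules.d/${rule}" ]] || return 1
		[[ "$(readlink -f "/etc/udev/rules.d/${rule}" 2>/dev/null)" != /dev/null ]]
	}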

> >  _cleanup_nvmet() {
> > @@ -337,6 +383,8 @@ _setup_nvmet() {
> >  		def_host_traddr=$(printf "nn-%s:pn-%s" \
> >  					 "${def_local_wwnn}" \
> >  					 "${def_local_wwpn}")
> > +
> > +		_setup_nvme_cli
> 
> Can we move this _setup_nvme_cli call from _setup_nvmet to _setup_fcloop?
> _cleanup_nvme_cli is called in _cleanup_fcloop. I think it is a bit cleaner
> since those setup/cleanup functions are called at the same level.

Sure, no problem.


* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-06-28  6:09     ` Daniel Wagner
@ 2023-06-28 10:04       ` Shinichiro Kawasaki
  2023-06-28 15:00         ` Daniel Wagner
  0 siblings, 1 reply; 35+ messages in thread
From: Shinichiro Kawasaki @ 2023-06-28 10:04 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger

On Jun 28, 2023 / 08:09, Daniel Wagner wrote:
> On Tue, Jun 27, 2023 at 10:22:53AM +0000, Shinichiro Kawasaki wrote:
> > On Jun 20, 2023 / 15:27, Daniel Wagner wrote:
> > > When the host has enabled the udev/systemd autoconnect services for the
> > > fc transport, they interact with blktests and make tests break.
> > > 
> > > nvme-cli learned to ignore connect attempts when using the --context
> > > command line option paired with a volatile configuration. Thus we can
> > > mark all the resources created by blktests and avoid any interaction
> > > with the systemd autoconnect scripts.
> > 
> > This sounds like a good idea. Question: is the "--context" option of the nvme
> > command mandatory to run the nvme tests with nvme_trtype=fc?
> 
> If nvme-cli is called without the '--context' option, the command will be
> executed. Though if '--context' is provided as an option and there is a
> configuration which matches the connect parameters but doesn't match the
> context, it will ignore the operation.
> 
> The blktests tests expect that nothing is fiddling with the setup behind
> their back while they are running. So far udev didn't trigger for rdma/tcp,
> but with fc it will.
> 
> Thus, it's mandatory to either use the '--context' parameter or
> alternatively disable the rule with
> 
>   ln -s /dev/null /etc/udev/rules.d/70-nvmf-autoconnect.rules
> 
> BTW, when the udev rule is active I observed crashes when running blktests.
> So there is more to fix, though one thing at a time.
> 
> > Or is it a nice-to-have feature
> > depending on the test system OS? If it is mandatory, it's better to check
> > it in _nvme_requires.
> 
> Well, I didn't want to make this a hard requirement for all tests. I guess we
> could make it fc-only, if this is what you had in mind. The question is, should
> it only test for nvme-cli supporting --context, or should it be really clever
> and also test whether the udev rule is active (no idea how, but I assume it is
> possible)?

Thanks for the explanations. It looks like the requirement check I suggested in
_nvme_requires would be too much. And I don't have a good idea for the udev
rule check either, so let's settle this change without the checks.


* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-06-28 10:04       ` Shinichiro Kawasaki
@ 2023-06-28 15:00         ` Daniel Wagner
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Wagner @ 2023-06-28 15:00 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger

On Wed, Jun 28, 2023 at 10:04:01AM +0000, Shinichiro Kawasaki wrote:
> > BTW, when the udev rule is active I observed crashes when running blktests.
> > So there is more to fix, though one thing at a time.

FTR, this is a typical hang/crash I see when the udev rule is active:

 run blktests nvme/005 at 2023-06-28 16:30:07
 loop0: detected capacity change from 0 to 32768
 nvmet: adding nsid 1 to subsystem blktests-subsystem-1
 nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
 (NULL device *): {0:0} Association created
 [43] nvmet: ctrl 1 start keep-alive timer for 5 secs
 nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8.
 [7] nvmet: adding queue 1 to ctrl 1.
 [363] nvmet: adding queue 2 to ctrl 1.
 [43] nvmet: adding queue 3 to ctrl 1.
 [45] nvmet: adding queue 4 to ctrl 1.
 nvme nvme2: NVME-FC{0}: new ctrl: NQN "blktests-subsystem-1"
 nvme nvme3: NVME-FC{1}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
 (NULL device *): {0:1} Association created
 [43] nvmet: ctrl 2 start keep-alive timer for 120 secs
 nvmet: creating discovery controller 2 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8.
 nvme nvme3: NVME-FC{1}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
 nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
 (NULL device *): {0:2} Association created
 [24] nvmet: ctrl 3 start keep-alive timer for 5 secs
 nvmet: creating nvm controller 3 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8.
 [7] nvmet: adding queue 1 to ctrl 3.
 [24] nvmet: adding queue 2 to ctrl 3.
 [43] nvmet: adding queue 3 to ctrl 3.
 [45] nvmet: adding queue 4 to ctrl 3.
 nvme nvme2: NVME-FC{0}: controller connect complete
 nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
 block nvme2n1: no available path - failing I/O
 block nvme2n1: no available path - failing I/O
 Buffer I/O error on dev nvme2n1, logical block 0, async page read
 [363] nvmet: ctrl 1 stop keep-alive
 (NULL device *): {0:0} Association deleted
 (NULL device *): {0:0} Association freed
 (NULL device *): Disconnect LS failed: No Association
 general protection fault, probably for non-canonical address 0xdffffc00000000a4: 0000 [#1] PREEMPT SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000520-0x0000000000000527]
 CPU: 1 PID: 363 Comm: kworker/1:4 Tainted: G    B   W          6.4.0-rc2+ #4 451dee9b3635bb59af0c91bc19227b1b3bbbbf3f
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown unknown
 Workqueue: nvmet-wq fcloop_fcp_recv_work [nvme_fcloop]
 RIP: 0010:nvmet_execute_disc_get_log_page+0x23f/0x8c0 [nvmet]
 Code: e8 c6 12 63 e3 4c 89 6c 24 40 48 89 5c 24 08 4c 8b 3b 49 8d 9f 20 05 00 00 48 89 d8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 df e8 93 12 63 e3 4c 89 74 24 30 4c 8b 2b
 RSP: 0018:ffff88811ab378a0 EFLAGS: 00010202
 RAX: 00000000000000a4 RBX: 0000000000000520 RCX: dffffc0000000000
 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffffa95d2d70
 RBP: ffff88811ab37ab0 R08: dffffc0000000000 R09: fffffbfff52ba5af
 R10: 0000000000000000 R11: dffffc0000000001 R12: 1ffff11023566f20
 R13: ffff888107db3260 R14: ffff888107db3270 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88815a600000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000002394000 CR3: 000000011e5e8006 CR4: 0000000000370ee0
 Call Trace:
  <TASK>
  ? prepare_alloc_pages+0x1c5/0x580
  ? __cfi_nvmet_execute_disc_get_log_page+0x10/0x10 [nvmet f0c9e080f6cf521f454ee104d12dafd11a5a7613]
  ? __alloc_pages+0x30e/0x650
  ? slab_post_alloc_hook+0x67/0x350
  ? __cfi___alloc_pages+0x10/0x10
  ? alloc_pages+0x30e/0x530
  ? sgl_alloc_order+0x118/0x320
  nvmet_fc_queue_fcp_req+0xa19/0xda0 [nvmet_fc f192e753efa6aca39b2b3b72c07e648aac01348b]
  ? nvmet_fc_rcv_fcp_req+0x9c0/0x9c0 [nvmet_fc f192e753efa6aca39b2b3b72c07e648aac01348b]
  ? do_raw_spin_unlock+0x116/0x8a0
  ? nvmet_fc_rcv_fcp_req+0x4de/0x9c0 [nvmet_fc f192e753efa6aca39b2b3b72c07e648aac01348b]
  nvmet_fc_rcv_fcp_req+0x4f0/0x9c0 [nvmet_fc f192e753efa6aca39b2b3b72c07e648aac01348b]
  fcloop_fcp_recv_work+0x173/0x440 [nvme_fcloop dcecfb508887e13f93cf8f37ef42c0fab90cbd00]
  process_one_work+0x80f/0xfb0
  ? rescuer_thread+0x1130/0x1130
  ? do_raw_spin_trylock+0xc9/0x1f0
  ? lock_acquired+0x4a/0x9a0
  ? wq_worker_sleeping+0x1e/0x200
  worker_thread+0x91e/0x1260
  ? do_raw_spin_unlock+0x116/0x8a0
  kthread+0x25d/0x2f0
  ? __cfi_worker_thread+0x10/0x10
  ? __cfi_kthread+0x10/0x10
  ret_from_fork+0x29/0x50
  </TASK>


* Re: [PATCH blktests v1 3/3] nvme/{041,042,043,044,045}: Use default hostnqn and hostid
  2023-06-20 13:27 ` [PATCH blktests v1 3/3] nvme/{041,042,043,044,045}: Use default hostnqn and hostid Daniel Wagner
  2023-06-27 10:24   ` Shinichiro Kawasaki
@ 2023-07-06 16:06   ` Max Gurtovoy
  2023-07-10  8:32     ` Daniel Wagner
  1 sibling, 1 reply; 35+ messages in thread
From: Max Gurtovoy @ 2023-07-06 16:06 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger



On 20/06/2023 16:27, Daniel Wagner wrote:
> The host might have enabled the udev/systemd auto connect feature.
> This disturbs the blktests for the fc transport. nvme-cli is able
> to distinguish between the different invocations via the --context
> option. In order to get this working we have to use the default
> hostnqn and hostid and not randon generated IDs for every single
> run.
> 

In the tests below, the hostnqn and hostid are randomly generated for
each test, so how will they disturb udev/systemd?
I'm not sure how this change fixes something, and not sure why
--context is mentioned here.

Seems like a better explanation for this patch is that it uses a dedicated
hostnqn and hostid for blktests instead of randomly generating them.

> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>   tests/nvme/041 | 8 ++------
>   tests/nvme/042 | 8 ++------
>   tests/nvme/043 | 8 ++------
>   tests/nvme/044 | 8 ++------
>   tests/nvme/045 | 8 ++------
>   5 files changed, 10 insertions(+), 30 deletions(-)
> 
> diff --git a/tests/nvme/041 b/tests/nvme/041
> index 308655dd6090..5b04b99b128e 100755
> --- a/tests/nvme/041
> +++ b/tests/nvme/041
> @@ -30,12 +30,8 @@ test() {
>   
>   	echo "Running ${TEST_NAME}"
>   
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"
>   	hostkey="$(nvme gen-dhchap-key -n ${subsys_name} 2> /dev/null)"
>   	if [ -z "$hostkey" ] ; then
>   		echo "nvme gen-dhchap-key failed"
> diff --git a/tests/nvme/042 b/tests/nvme/042
> index fed2efead013..8df5ed37aacc 100755
> --- a/tests/nvme/042
> +++ b/tests/nvme/042
> @@ -32,12 +32,8 @@ test() {
>   
>   	echo "Running ${TEST_NAME}"
>   
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"
>   
>   	_setup_nvmet
>   
> diff --git a/tests/nvme/043 b/tests/nvme/043
> index a030884aa4ed..b591e39d0706 100755
> --- a/tests/nvme/043
> +++ b/tests/nvme/043
> @@ -33,12 +33,8 @@ test() {
>   
>   	echo "Running ${TEST_NAME}"
>   
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"
>   
>   	_setup_nvmet
>   
> diff --git a/tests/nvme/044 b/tests/nvme/044
> index 9928bcc55397..fca0897af27b 100755
> --- a/tests/nvme/044
> +++ b/tests/nvme/044
> @@ -32,12 +32,8 @@ test() {
>   
>   	echo "Running ${TEST_NAME}"
>   
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"
>   
>   	hostkey="$(nvme gen-dhchap-key -n ${subsys_name} 2> /dev/null)"
>   	if [ -z "$hostkey" ] ; then
> diff --git a/tests/nvme/045 b/tests/nvme/045
> index 26a55335a92c..eca629a18691 100755
> --- a/tests/nvme/045
> +++ b/tests/nvme/045
> @@ -36,12 +36,8 @@ test() {
>   
>   	echo "Running ${TEST_NAME}"
>   
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"
>   
>   	hostkey="$(nvme gen-dhchap-key -n ${subsys_name} 2> /dev/null)"
>   	if [ -z "$hostkey" ] ; then

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-06-20 13:27 ` [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect Daniel Wagner
  2023-06-20 14:07   ` Sagi Grimberg
  2023-06-27 10:22   ` Shinichiro Kawasaki
@ 2023-07-06 16:11   ` Max Gurtovoy
  2023-07-10  8:27     ` Daniel Wagner
  2 siblings, 1 reply; 35+ messages in thread
From: Max Gurtovoy @ 2023-07-06 16:11 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme
  Cc: linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger



On 20/06/2023 16:27, Daniel Wagner wrote:
> When the host has enabled the udev/systemd autoconnect services for the
> fc transport it interacts with blktests and make tests break.
> 
> nvme-cli learned to ignore connects attemps when using the
> --context command line option paired with a volatile configuration. Thus
> we can mark all the resources created by blktests and avoid any
> interaction with the systemd autoconnect scripts.

Why would we want to ignore connect attempts during blktests?
We define unique NQNs/IDs/etc., so we should never disturb other running 
services unless they use the same parameters, which shouldn't happen.

I agree on setting hard-coded values for hostnqn and hostid instead of 
reading the /etc/nvme/* files. That should do the job IMO, shouldn't it?

> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>   tests/nvme/rc | 73 ++++++++++++++++++++++++++++++++++++++++++++-------
>   1 file changed, 63 insertions(+), 10 deletions(-)
> 
> diff --git a/tests/nvme/rc b/tests/nvme/rc
> index 191f3e2e0c43..00117d314a38 100644
> --- a/tests/nvme/rc
> +++ b/tests/nvme/rc
> @@ -14,8 +14,8 @@ def_remote_wwnn="0x10001100aa000001"
>   def_remote_wwpn="0x20001100aa000001"
>   def_local_wwnn="0x10001100aa000002"
>   def_local_wwpn="0x20001100aa000002"
> -def_hostnqn="$(cat /etc/nvme/hostnqn 2> /dev/null)"
> -def_hostid="$(cat /etc/nvme/hostid 2> /dev/null)"
> +def_hostnqn="nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8"
> +def_hostid="242d4a24-2484-4a80-8234-d0169409c5e8"
>   nvme_trtype=${nvme_trtype:-"loop"}
>   nvme_img_size=${nvme_img_size:-"1G"}
>   nvme_num_iter=${nvme_num_iter:-"1000"}
> @@ -161,6 +161,50 @@ _nvme_calc_rand_io_size() {
>   	echo "${io_size_kb}k"
>   }
>   
> +_nvme_cli_supports_context() {
> +	if ! nvme connect --help 2>&1 | grep -q context > /dev/null; then
> +		    return 1
> +	fi
> +	return 0
> +}
> +
> +_setup_nvme_cli() {
> +	if ! _nvme_cli_supports_context; then
> +		return
> +	fi
> +
> +	mkdir -p /run/nvme
> +	cat >> /run/nvme/blktests.json <<-EOF
> +	[
> +	  {
> +	    "hostnqn": "${def_hostnqn}",
> +	    "hostid": "${def_hostid}",
> +	    "subsystems": [
> +	      {
> +	        "application": "blktests",
> +	        "nqn": "blktests-subsystem-1",
> +	        "ports": [
> +	          {
> +	            "transport": "fc",
> +	            "traddr": "nn-${def_remote_wwnn}:pn-${def_remote_wwpn}",
> +	            "host_traddr": "nn-${def_local_wwnn}:pn-${def_local_wwpn}"
> +	          }
> +	        ]
> +	      }
> +	    ]
> +	  }
> +	]
> +	EOF
> +}
> +
> +_cleanup_nvme_cli() {
> +	if ! _nvme_cli_supports_context; then
> +		return
> +	fi
> +
> +	rm -f /run/nvme/blktests.json
> +}
> +
>   _nvme_fcloop_add_rport() {
>   	local local_wwnn="$1"
>   	local local_wwpn="$2"
> @@ -235,6 +279,8 @@ _cleanup_fcloop() {
>   	_nvme_fcloop_del_lport "${local_wwnn}" "${local_wwpn}"
>   	_nvme_fcloop_del_rport "${local_wwnn}" "${local_wwpn}" \
>   			       "${remote_wwnn}" "${remote_wwpn}"
> +
> +	_cleanup_nvme_cli
>   }
>   
>   _cleanup_nvmet() {
> @@ -337,6 +383,8 @@ _setup_nvmet() {
>   		def_host_traddr=$(printf "nn-%s:pn-%s" \
>   					 "${def_local_wwnn}" \
>   					 "${def_local_wwpn}")
> +
> +		_setup_nvme_cli
>   	fi
>   }
>   
> @@ -436,18 +484,18 @@ _nvme_connect_subsys() {
>   	trtype="$1"
>   	subsysnqn="$2"
>   
> -	ARGS=(-t "${trtype}" -n "${subsysnqn}")
> +	ARGS=(-t "${trtype}"
> +	      -n "${subsysnqn}"
> +	      --hostnqn="${hostnqn}"
> +	      --hostid="${hostid}")
> +	if _nvme_cli_supports_context; then
> +		ARGS+=(--context="blktests")
> +	fi
>   	if [[ "${trtype}" == "fc" ]] ; then
>   		ARGS+=(-a "${traddr}" -w "${host_traddr}")
>   	elif [[ "${trtype}" != "loop" ]]; then
>   		ARGS+=(-a "${traddr}" -s "${trsvcid}")
>   	fi
> -	if [[ "${hostnqn}" != "$def_hostnqn" ]]; then
> -		ARGS+=(--hostnqn="${hostnqn}")
> -	fi
> -	if [[ "${hostid}" != "$def_hostid" ]]; then
> -		ARGS+=(--hostid="${hostid}")
> -	fi
>   	if [[ -n "${hostkey}" ]]; then
>   		ARGS+=(--dhchap-secret="${hostkey}")
>   	fi
> @@ -482,7 +530,12 @@ _nvme_discover() {
>   	local host_traddr="${3:-$def_host_traddr}"
>   	local trsvcid="${3:-$def_trsvcid}"
>   
> -	ARGS=(-t "${trtype}")
> +	ARGS=(-t "${trtype}"
> +	      --hostnqn="${def_hostnqn}"
> +	      --hostid="${def_hostid}")
> +	if _nvme_cli_supports_context; then
> +		ARGS+=(--context="blktests")
> +	fi
>   	if [[ "${trtype}" = "fc" ]]; then
>   		ARGS+=(-a "${traddr}" -w "${host_traddr}")
>   	elif [[ "${trtype}" != "loop" ]]; then

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-07-06 16:11   ` Max Gurtovoy
@ 2023-07-10  8:27     ` Daniel Wagner
  2023-07-10  9:53       ` Max Gurtovoy
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Wagner @ 2023-07-10  8:27 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger

On Thu, Jul 06, 2023 at 07:11:58PM +0300, Max Gurtovoy wrote:
> 
> 
> On 20/06/2023 16:27, Daniel Wagner wrote:
> > When the host has enabled the udev/systemd autoconnect services for the
> > fc transport it interacts with blktests and make tests break.
> > 
> > nvme-cli learned to ignore connects attemps when using the
> > --context command line option paired with a volatile configuration. Thus
> > we can mark all the resources created by blktests and avoid any
> > interaction with the systemd autoconnect scripts.
> 
> why would we like to ignore connect attempts during blktests ?

The problem I observed is that there were two connect attempts (one triggered
via udev and one via the test itself) which disturbed the test. The interference
resulted in a complete hang of the test case:

https://lore.kernel.org/linux-nvme/6g53yj5afhhonwf2psf7ft2gkakkwewy7klkix3y3u6uclpa5w@tt2yfzigyxgg/

> We define unique nqn/ids/etc. So we never should disturb other running
> services, unless it uses the same parameters, which shouldn't happen.

Yes, that should do the decoupling between udev and blktests.

> Agree on setting the hard coded values for hostnqn and hostid instead of
> reading the /etc/nvme/* files. This should do the work IMO, isn't it ?

Do you mean that nvme-cli will ignore the udev trigger when a different hostnqn
is involved? At least for the well-known discovery controller the hostnqn
doesn't work. Though I'm not sure I understood you correctly here.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 3/3] nvme/{041,042,043,044,045}: Use default hostnqn and hostid
  2023-07-06 16:06   ` Max Gurtovoy
@ 2023-07-10  8:32     ` Daniel Wagner
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Wagner @ 2023-07-10  8:32 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger

On Thu, Jul 06, 2023 at 07:06:28PM +0300, Max Gurtovoy wrote:
> 
> 
> On 20/06/2023 16:27, Daniel Wagner wrote:
> > The host might have enabled the udev/systemd auto connect feature.
> > This disturbs the blktests for the fc transport. nvme-cli is able
> > to distinguish between the different invocations via the --context
> > option. In order to get this working we have to use the default
> > hostnqn and hostid and not randon generated IDs for every single
> > run.
> > 
> 
> In the bellow tests the hostnqn and hostid are randomly generated for each
> test, so how will it disturb the udev/systemd ?

No, a random hostnqn should work as well. The only requirement is that the
hostnqn(s) used by blktests are unique.

> I'm not sure how will this change fix something and not sure why --context
> is mentioned here.

Sorry about that. I should have updated the commit message. A leftover from an
earlier version.

> Seems like a good explanation to this patch is to used a dedicated hostnqn
> and hostid for blktests instead of randomly generate it.

Okay, I'll update the commit message to reflect this. I suppose I could also
look into getting rid of the local variables in the tests.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-07-10  8:27     ` Daniel Wagner
@ 2023-07-10  9:53       ` Max Gurtovoy
  2023-07-10 10:19         ` Daniel Wagner
  0 siblings, 1 reply; 35+ messages in thread
From: Max Gurtovoy @ 2023-07-10  9:53 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger



On 10/07/2023 11:27, Daniel Wagner wrote:
> On Thu, Jul 06, 2023 at 07:11:58PM +0300, Max Gurtovoy wrote:
>>
>>
>> On 20/06/2023 16:27, Daniel Wagner wrote:
>>> When the host has enabled the udev/systemd autoconnect services for the
>>> fc transport it interacts with blktests and make tests break.
>>>
>>> nvme-cli learned to ignore connects attemps when using the
>>> --context command line option paired with a volatile configuration. Thus
>>> we can mark all the resources created by blktests and avoid any
>>> interaction with the systemd autoconnect scripts.
>>
>> why would we like to ignore connect attempts during blktests ?
> 
> The problem I observed is that there were two connect attempts (one triggered
> via udev and one via the test itself) which disturbed the test. The interference
> resulted into a complete hang of the test case:
> 
> https://lore.kernel.org/linux-nvme/6g53yj5afhhonwf2psf7ft2gkakkwewy7klkix3y3u6uclpa5w@tt2yfzigyxgg/
> 
>> We define unique nqn/ids/etc. So we never should disturb other running
>> services, unless it uses the same parameters, which shouldn't happen.
> 
> Yes, that should do the decoupling between udev and blktests.
> 
>> Agree on setting the hard coded values for hostnqn and hostid instead of
>> reading the /etc/nvme/* files. This should do the work IMO, isn't it ?
> 
> Do you mean that nvme-cli will ignore the udev trigger when a different hostnqn
> is invovled? At least for the well-known discover controller the hostqnq
> doesn't work. Though not sure if I understood you correctly here.

All I was saying is that your issue would have been easily fixed by 2-3 
LOC: removing the usage of the /etc/nvme/* files that are usually 
configured by the system administrator and using default values for the 
blktests hostnqn and hostid.
The usage of --context in blktests is redundant IMO.
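
Something along these lines in tests/nvme/rc is all I mean (only a sketch;
the UUID is just an example value, taken from the hunk quoted above):

    # blktests-private defaults instead of the admin-managed /etc/nvme/* files
    def_hostnqn="nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8"
    def_hostid="242d4a24-2484-4a80-8234-d0169409c5e8"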

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-07-10  9:53       ` Max Gurtovoy
@ 2023-07-10 10:19         ` Daniel Wagner
  2023-07-10 12:31           ` Max Gurtovoy
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Wagner @ 2023-07-10 10:19 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger

On Mon, Jul 10, 2023 at 12:53:17PM +0300, Max Gurtovoy wrote:
> > > Agree on setting the hard coded values for hostnqn and hostid instead of
> > > reading the /etc/nvme/* files. This should do the work IMO, isn't it ?
> > 
> > Do you mean that nvme-cli will ignore the udev trigger when a different hostnqn
> > is invovled? At least for the well-known discover controller the hostqnq
> > doesn't work. Though not sure if I understood you correctly here.
> 
> All I was saying is that your issue would have been fixed easily by 2-3 LOC
> by removing the usage of /etc/nvme/* files that are usually configured by
> system administrator and use a default values for blktests hostnqn and
> hostid.

Okay.

> The usage of --context in blktests is redundant IMO.

Right, I made a bit of a mess with the commit message. Sorry about that.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-07-10 10:19         ` Daniel Wagner
@ 2023-07-10 12:31           ` Max Gurtovoy
  2023-07-10 15:03             ` Daniel Wagner
  0 siblings, 1 reply; 35+ messages in thread
From: Max Gurtovoy @ 2023-07-10 12:31 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger



On 10/07/2023 13:19, Daniel Wagner wrote:
> On Mon, Jul 10, 2023 at 12:53:17PM +0300, Max Gurtovoy wrote:
>>>> Agree on setting the hard coded values for hostnqn and hostid instead of
>>>> reading the /etc/nvme/* files. This should do the work IMO, isn't it ?
>>>
>>> Do you mean that nvme-cli will ignore the udev trigger when a different hostnqn
>>> is invovled? At least for the well-known discover controller the hostqnq
>>> doesn't work. Though not sure if I understood you correctly here.
>>
>> All I was saying is that your issue would have been fixed easily by 2-3 LOC
>> by removing the usage of /etc/nvme/* files that are usually configured by
>> system administrator and use a default values for blktests hostnqn and
>> hostid.
> 
> Okay.
> 
>> The usage of --context in blktests is redundant IMO.
> 
> Right, I made a bit of a mess with the commit message. Sorry about that.

I think it is more than just the commit message.
A lot of code that we could avoid was added for the --context 
cmdline argument.
Maybe it's worth cleaning it up.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-07-10 12:31           ` Max Gurtovoy
@ 2023-07-10 15:03             ` Daniel Wagner
  2023-07-10 16:30               ` Max Gurtovoy
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Wagner @ 2023-07-10 15:03 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger

On Mon, Jul 10, 2023 at 03:31:23PM +0300, Max Gurtovoy wrote:
> I think it is more than just commit message.

Okay, I'm starting to understand what the problem is.

> A lot of code that we can avoid was added regarding the --context cmdline
> argument.

Correct, and it's not optional to get the tests passing for the fc transport.

> Maybe it's worth cleaning it..

It really solves the problem that the autoconnect setup of nvme-cli is
disturbing the tests (*). The only other way I found to stop the autoconnect is
to disable the udev rule completely. If autoconnect isn't enabled, the context
isn't necessary. Though changing the system configuration from blktests seems a
bit excessive.

Another option is to detect whether autoconnect is enabled and report this when
starting the tests. In that case we could remove the context part completely.
Obviously, I would prefer to keep it, but I am certainly not against dropping
it and making blktests a bit simpler if that is the preference. I would just
need to remember to disable the autoconnect stuff when using blktests.
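
Detecting it could look roughly like this (only a sketch; the service and
udev rule names come from the nvme-cli packaging and may differ between
distributions):

    # Sketch: report whether the nvme-cli autoconnect machinery is active.
    # The unit/rule names below are assumptions and may vary per distro.
    _have_nvme_autoconnect() {
        systemctl is-enabled --quiet nvmf-autoconnect.service 2>/dev/null ||
            [ -e /usr/lib/udev/rules.d/70-nvmf-autoconnect.rules ]
    }

    if _have_nvme_autoconnect; then
        echo "nvme-cli autoconnect appears enabled; it may interfere with fc tests"
    fi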

(*) Sure, we can fix this at some point. Though I think it's a bit strange
for a test suite to depend on/interact with external components in this way.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-07-10 15:03             ` Daniel Wagner
@ 2023-07-10 16:30               ` Max Gurtovoy
  2023-07-12 12:04                 ` Daniel Wagner
  0 siblings, 1 reply; 35+ messages in thread
From: Max Gurtovoy @ 2023-07-10 16:30 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger



On 10/07/2023 18:03, Daniel Wagner wrote:
> On Mon, Jul 10, 2023 at 03:31:23PM +0300, Max Gurtovoy wrote:
>> I think it is more than just commit message.
> 
> Okay, starting to understand what's the problem.
> 
>> A lot of code that we can avoid was added regarding the --context cmdline
>> argument.
> 
> Correct and it's not optional to get the tests passing for the fc transport.

Why does fc need --context to pass the tests?

> 
>> Maybe it's worth cleaning it..
> 
> It really solves the problem that the autoconnect setup of nvme-cli is
> distrubing the tests (*). The only other way I found to stop the autoconnect is
> by disabling the udev rule completely. If autoconnect isn't enabled the context
> isn't necessary. Though changing system configuration from blktests seems at bit
> excessive.

We should not stop any autoconnect during blktests. The autoconnect and 
all the system admin services should run normally.
blktests should not interfere with the regular system configuration and 
should not use any sysadmin files/services/devices.

It should create its own subsystems, NQNs, IDs, namespaces, etc.

> 
> Another option is to detect if autoconect is enabled and report this when
> starting the tests. In this case we could remove the context part completely.

The --context option might be used in some sysadmin scripts or so. I don't 
see a point in doing so in blktests, since we create a new association 
between initiator and target on each test (or on each group of tests).

> Obviously, I would prefer to keep it but I am certaintaly not against dropping
> it and make blktests a bit simpler if this is the preference. I just need to
> remember to disable the autoconnect stuff when using blktests.
> 
> (*) Sure we can fix this but at this point. Though I think it's a bit strange
> for a test suite to depend/interact with external components in this way.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-07-10 16:30               ` Max Gurtovoy
@ 2023-07-12 12:04                 ` Daniel Wagner
  2023-07-13  0:12                   ` Max Gurtovoy
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Wagner @ 2023-07-12 12:04 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger

On Mon, Jul 10, 2023 at 07:30:20PM +0300, Max Gurtovoy wrote:
> 
> 
> On 10/07/2023 18:03, Daniel Wagner wrote:
> > On Mon, Jul 10, 2023 at 03:31:23PM +0300, Max Gurtovoy wrote:
> > > I think it is more than just commit message.
> > 
> > Okay, starting to understand what's the problem.
> > 
> > > A lot of code that we can avoid was added regarding the --context cmdline
> > > argument.
> > 
> > Correct and it's not optional to get the tests passing for the fc transport.
> 
> why the fc needs the --context to pass tests ?

A typical nvme test consists of the following steps (nvme/004):

// nvme target setup (1)
	_create_nvmet_subsystem "blktests-subsystem-1" "${loop_dev}" \
		"91fdba0d-f87b-4c25-b80f-db7be1418b9e"
	_add_nvmet_subsys_to_port "${port}" "blktests-subsystem-1"

// nvme host setup (2)
	_nvme_connect_subsys "${nvme_trtype}" blktests-subsystem-1

	local nvmedev
	nvmedev=$(_find_nvme_dev "blktests-subsystem-1")
	cat "/sys/block/${nvmedev}n1/uuid"
	cat "/sys/block/${nvmedev}n1/wwid"

// nvme host teardown (3)
	_nvme_disconnect_subsys blktests-subsystem-1

// nvme target teardown (4)
	_remove_nvmet_subsystem_from_port "${port}" "blktests-subsystem-1"
	_remove_nvmet_subsystem "blktests-subsystem-1"


The corresponding output with --context

 run blktests nvme/004 at 2023-07-12 13:49:50
// (1)
 loop0: detected capacity change from 0 to 32768
 nvmet: adding nsid 1 to subsystem blktests-subsystem-1
 nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
 (NULL device *): {0:0} Association created
 [174] nvmet: ctrl 1 start keep-alive timer for 5 secs
// (2)
 nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
 [374] nvmet: adding queue 1 to ctrl 1.
 [1138] nvmet: adding queue 2 to ctrl 1.
 [73] nvmet: adding queue 3 to ctrl 1.
 [174] nvmet: adding queue 4 to ctrl 1.
 nvme nvme2: NVME-FC{0}: controller connect complete
 nvme nvme2: NVME-FC{0}: new ctrl: NQN "blktests-subsystem-1"
// (3)
 nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
// (4)
 [1138] nvmet: ctrl 1 stop keep-alive
 (NULL device *): {0:0} Association deleted
 (NULL device *): {0:0} Association freed
 (NULL device *): Disconnect LS failed: No Association


and without --context

 run blktests nvme/004 at 2023-07-12 13:50:33
// (1)
 loop1: detected capacity change from 0 to 32768
 nvmet: adding nsid 1 to subsystem blktests-subsystem-1
 nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
 (NULL device *): {0:0} Association created
 [1235] nvmet: ctrl 1 start keep-alive timer for 120 secs
// XXX udev auto connect
 nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8.
 nvme nvme2: NVME-FC{0}: controller connect complete
 nvme nvme2: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
 nvme nvme3: NVME-FC{1}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
 (NULL device *): {0:1} Association created
 [73] nvmet: ctrl 2 start keep-alive timer for 5 secs
// (2)
 nvmet: creating nvm controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
 [374] nvmet: adding queue 1 to ctrl 2.
 [233] nvmet: adding queue 2 to ctrl 2.
 [73] nvmet: adding queue 3 to ctrl 2.
 [1235] nvmet: adding queue 4 to ctrl 2.
 nvme nvme3: NVME-FC{1}: controller connect complete
 nvme nvme3: NVME-FC{1}: new ctrl: NQN "blktests-subsystem-1"
// (3)
 nvme nvme3: Removing ctrl: NQN "blktests-subsystem-1"
 general protection fault, probably for non-canonical address 0xdffffc00000000a4: 0000 [#1] PREEMPT SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000520-0x0000000000000527]
 CPU: 1 PID: 2076 Comm: kworker/1:1 Tainted: G        W          6.4.0-rc2+ #7 f2a41a58e59b44ee1bb7bc68087ccbe6d76392dd
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown unknown
 Workqueue: nvmet-wq fcloop_fcp_recv_work [nvme_fcloop]
 RIP: 0010:nvmet_execute_disc_get_log_page+0x23f/0x8c0 [nvmet]
 Code: e8 c6 12 c7 e0 4c 89 6c 24 40 48 89 5c 24 08 4c 8b 3b 49 8d 9f 20 05 00 00 48 89 d8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 df e8 93 12 c7 e0 4c 89 74 24 30 4c 8b 2b
 RSP: 0018:ffff888139a778a0 EFLAGS: 00010202
 RAX: 00000000000000a4 RBX: 0000000000000520 RCX: dffffc0000000000
 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffffa8af3a88
 RBP: ffff888139a77ab0 R08: dffffc0000000000 R09: fffffbfff515e752
 R10: 0000000000000000 R11: dffffc0000000001 R12: 1ffff1102734ef20
 R13: ffff888105563260 R14: ffff888105563270 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88815a600000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000824220 CR3: 0000000106512005 CR4: 0000000000370ee0
 Call Trace:
  <TASK>
  ? prepare_alloc_pages+0x1c5/0x580
  ? __cfi_nvmet_execute_disc_get_log_page+0x10/0x10 [nvmet 1da13efcd047161c3381cb240a95399f951fd98f]
  ? __alloc_pages+0x30e/0x650
  ? slab_post_alloc_hook+0x67/0x350
  ? __cfi___alloc_pages+0x10/0x10
  ? alloc_pages+0x30e/0x530
  ? sgl_alloc_order+0x118/0x320
  nvmet_fc_queue_fcp_req+0xa19/0xda0 [nvmet_fc 11628cdb09a094fd591bfaf88be45b97e3b18e3a]
  ? nvmet_fc_rcv_fcp_req+0x9c0/0x9c0 [nvmet_fc 11628cdb09a094fd591bfaf88be45b97e3b18e3a]
  ? lockdep_hardirqs_on_prepare+0x2aa/0x5e0
  ? nvmet_fc_rcv_fcp_req+0x4de/0x9c0 [nvmet_fc 11628cdb09a094fd591bfaf88be45b97e3b18e3a]
  nvmet_fc_rcv_fcp_req+0x4f0/0x9c0 [nvmet_fc 11628cdb09a094fd591bfaf88be45b97e3b18e3a]
  fcloop_fcp_recv_work+0x173/0x440 [nvme_fcloop 05cf1144b564c4e1626f9f15422ccf61f2af41de]
  process_one_work+0x80f/0xfb0
  ? rescuer_thread+0x1130/0x1130
  ? do_raw_spin_trylock+0xc9/0x1f0
  ? lock_acquired+0x310/0x9a0
  ? worker_thread+0xd5e/0x1260
  worker_thread+0x91e/0x1260
  ? __cfi_lock_release+0x10/0x10
  ? do_raw_spin_unlock+0x116/0x8a0
  kthread+0x25d/0x2f0
  ? __cfi_worker_thread+0x10/0x10
  ? __cfi_kthread+0x10/0x10
  ret_from_fork+0x29/0x50
  </TASK>

> > > Maybe it's worth cleaning it..
> > 
> > It really solves the problem that the autoconnect setup of nvme-cli is
> > distrubing the tests (*). The only other way I found to stop the autoconnect is
> > by disabling the udev rule completely. If autoconnect isn't enabled the context
> > isn't necessary. Though changing system configuration from blktests seems at bit
> > excessive.
> 
> we should not stop any autoconnect during blktests. The autoconnect and all
> the system admin services should run normally.

I do not agree here. The current blktests are not designed to run as
integration tests. Sure, we should also test this, but currently blktests is
just not there, and tcp/rdma are not actually covered anyway.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-07-12 12:04                 ` Daniel Wagner
@ 2023-07-13  0:12                   ` Max Gurtovoy
  2023-07-13  6:00                     ` Hannes Reinecke
  0 siblings, 1 reply; 35+ messages in thread
From: Max Gurtovoy @ 2023-07-13  0:12 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, Hannes Reinecke,
	James Smart, Martin Belanger



On 12/07/2023 15:04, Daniel Wagner wrote:
> On Mon, Jul 10, 2023 at 07:30:20PM +0300, Max Gurtovoy wrote:
>>
>>
>> On 10/07/2023 18:03, Daniel Wagner wrote:
>>> On Mon, Jul 10, 2023 at 03:31:23PM +0300, Max Gurtovoy wrote:
>>>> I think it is more than just commit message.
>>>
>>> Okay, starting to understand what's the problem.
>>>
>>>> A lot of code that we can avoid was added regarding the --context cmdline
>>>> argument.
>>>
>>> Correct and it's not optional to get the tests passing for the fc transport.
>>
>> why the fc needs the --context to pass tests ?
> 
> A typical nvme test consists out of following steps (nvme/004):
> 
> // nvme target setup (1)
> 	_create_nvmet_subsystem "blktests-subsystem-1" "${loop_dev}" \
> 		"91fdba0d-f87b-4c25-b80f-db7be1418b9e"
> 	_add_nvmet_subsys_to_port "${port}" "blktests-subsystem-1"
> 
> // nvme host setup (2)
> 	_nvme_connect_subsys "${nvme_trtype}" blktests-subsystem-1
> 
> 	local nvmedev
> 	nvmedev=$(_find_nvme_dev "blktests-subsystem-1")
> 	cat "/sys/block/${nvmedev}n1/uuid"
> 	cat "/sys/block/${nvmedev}n1/wwid"
> 
> // nvme host teardown (3)
> 	_nvme_disconnect_subsys blktests-subsystem-1
> 
> // nvme target teardown (4)
> 	_remove_nvmet_subsystem_from_port "${port}" "blktests-subsystem-1"
> 	_remove_nvmet_subsystem "blktests-subsystem-1"
> 
> 
> The corresponding output with --context
> 
>   run blktests nvme/004 at 2023-07-12 13:49:50
> // (1)
>   loop0: detected capacity change from 0 to 32768
>   nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>   nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
>   (NULL device *): {0:0} Association created
>   [174] nvmet: ctrl 1 start keep-alive timer for 5 secs
> // (2)
>   nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
>   [374] nvmet: adding queue 1 to ctrl 1.
>   [1138] nvmet: adding queue 2 to ctrl 1.
>   [73] nvmet: adding queue 3 to ctrl 1.
>   [174] nvmet: adding queue 4 to ctrl 1.
>   nvme nvme2: NVME-FC{0}: controller connect complete
>   nvme nvme2: NVME-FC{0}: new ctrl: NQN "blktests-subsystem-1"
> // (3)
>   nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
> // (4)
>   [1138] nvmet: ctrl 1 stop keep-alive
>   (NULL device *): {0:0} Association deleted
>   (NULL device *): {0:0} Association freed
>   (NULL device *): Disconnect LS failed: No Association
> 
> 
> and without --context
> 
>   run blktests nvme/004 at 2023-07-12 13:50:33
> // (1)
>   loop1: detected capacity change from 0 to 32768
>   nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>   nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"

Why is this association to the discovery controller created? Because of 
some system service?

Can we configure the blktests subsystem not to be discovered, or add some 
access list to it?

>   (NULL device *): {0:0} Association created
>   [1235] nvmet: ctrl 1 start keep-alive timer for 120 secs
> // XXX udev auto connect
>   nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:242d4a24-2484-4a80-8234-d0169409c5e8.
>   nvme nvme2: NVME-FC{0}: controller connect complete
>   nvme nvme2: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
>   nvme nvme3: NVME-FC{1}: create association : host wwpn 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
>   (NULL device *): {0:1} Association created
>   [73] nvmet: ctrl 2 start keep-alive timer for 5 secs
> // (2)
>   nvmet: creating nvm controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
>   [374] nvmet: adding queue 1 to ctrl 2.
>   [233] nvmet: adding queue 2 to ctrl 2.
>   [73] nvmet: adding queue 3 to ctrl 2.
>   [1235] nvmet: adding queue 4 to ctrl 2.
>   nvme nvme3: NVME-FC{1}: controller connect complete
>   nvme nvme3: NVME-FC{1}: new ctrl: NQN "blktests-subsystem-1"
> // (3)
>   nvme nvme3: Removing ctrl: NQN "blktests-subsystem-1"

The trace below looks like a bug we should fix :)

>   general protection fault, probably for non-canonical address 0xdffffc00000000a4: 0000 [#1] PREEMPT SMP KASAN NOPTI
>   KASAN: null-ptr-deref in range [0x0000000000000520-0x0000000000000527]
>   CPU: 1 PID: 2076 Comm: kworker/1:1 Tainted: G        W          6.4.0-rc2+ #7 f2a41a58e59b44ee1bb7bc68087ccbe6d76392dd
>   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown unknown
>   Workqueue: nvmet-wq fcloop_fcp_recv_work [nvme_fcloop]
>   RIP: 0010:nvmet_execute_disc_get_log_page+0x23f/0x8c0 [nvmet]
>   Code: e8 c6 12 c7 e0 4c 89 6c 24 40 48 89 5c 24 08 4c 8b 3b 49 8d 9f 20 05 00 00 48 89 d8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 df e8 93 12 c7 e0 4c 89 74 24 30 4c 8b 2b
>   RSP: 0018:ffff888139a778a0 EFLAGS: 00010202
>   RAX: 00000000000000a4 RBX: 0000000000000520 RCX: dffffc0000000000
>   RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffffa8af3a88
>   RBP: ffff888139a77ab0 R08: dffffc0000000000 R09: fffffbfff515e752
>   R10: 0000000000000000 R11: dffffc0000000001 R12: 1ffff1102734ef20
>   R13: ffff888105563260 R14: ffff888105563270 R15: 0000000000000000
>   FS:  0000000000000000(0000) GS:ffff88815a600000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000000000824220 CR3: 0000000106512005 CR4: 0000000000370ee0
>   Call Trace:
>    <TASK>
>    ? prepare_alloc_pages+0x1c5/0x580
>    ? __cfi_nvmet_execute_disc_get_log_page+0x10/0x10 [nvmet 1da13efcd047161c3381cb240a95399f951fd98f]
>    ? __alloc_pages+0x30e/0x650
>    ? slab_post_alloc_hook+0x67/0x350
>    ? __cfi___alloc_pages+0x10/0x10
>    ? alloc_pages+0x30e/0x530
>    ? sgl_alloc_order+0x118/0x320
>    nvmet_fc_queue_fcp_req+0xa19/0xda0 [nvmet_fc 11628cdb09a094fd591bfaf88be45b97e3b18e3a]
>    ? nvmet_fc_rcv_fcp_req+0x9c0/0x9c0 [nvmet_fc 11628cdb09a094fd591bfaf88be45b97e3b18e3a]
>    ? lockdep_hardirqs_on_prepare+0x2aa/0x5e0
>    ? nvmet_fc_rcv_fcp_req+0x4de/0x9c0 [nvmet_fc 11628cdb09a094fd591bfaf88be45b97e3b18e3a]
>    nvmet_fc_rcv_fcp_req+0x4f0/0x9c0 [nvmet_fc 11628cdb09a094fd591bfaf88be45b97e3b18e3a]
>    fcloop_fcp_recv_work+0x173/0x440 [nvme_fcloop 05cf1144b564c4e1626f9f15422ccf61f2af41de]
>    process_one_work+0x80f/0xfb0
>    ? rescuer_thread+0x1130/0x1130
>    ? do_raw_spin_trylock+0xc9/0x1f0
>    ? lock_acquired+0x310/0x9a0
>    ? worker_thread+0xd5e/0x1260
>    worker_thread+0x91e/0x1260
>    ? __cfi_lock_release+0x10/0x10
>    ? do_raw_spin_unlock+0x116/0x8a0
>    kthread+0x25d/0x2f0
>    ? __cfi_worker_thread+0x10/0x10
>    ? __cfi_kthread+0x10/0x10
>    ret_from_fork+0x29/0x50
>    </TASK>
> 
>>>> Maybe it's worth cleaning it..
>>>
>>> It really solves the problem that the autoconnect setup of nvme-cli is
>>> distrubing the tests (*). The only other way I found to stop the autoconnect is
>>> by disabling the udev rule completely. If autoconnect isn't enabled the context
>>> isn't necessary. Though changing system configuration from blktests seems at bit
>>> excessive.
>>
>> we should not stop any autoconnect during blktests. The autoconnect and all
>> the system admin services should run normally.
> 
> I do not agree here. The current blktests are not designed for run as
> intergration tests. Sure we should also tests this but currently blktests is
> just not there and tcp/rdma are not actually covered anyway.

What do you mean by tcp/rdma not being covered?

And maybe we should make several changes in blktests to make it 
standalone, without interfering with the existing configuration made by the 
system administrator.

> 
> Thanks,
> Daniel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-07-13  0:12                   ` Max Gurtovoy
@ 2023-07-13  6:00                     ` Hannes Reinecke
  2023-07-13  8:49                       ` Max Gurtovoy
  0 siblings, 1 reply; 35+ messages in thread
From: Hannes Reinecke @ 2023-07-13  6:00 UTC (permalink / raw)
  To: Max Gurtovoy, Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, James Smart,
	Martin Belanger

On 7/13/23 02:12, Max Gurtovoy wrote:
> 
> 
> On 12/07/2023 15:04, Daniel Wagner wrote:
>> On Mon, Jul 10, 2023 at 07:30:20PM +0300, Max Gurtovoy wrote:
>>>
>>>
>>> On 10/07/2023 18:03, Daniel Wagner wrote:
>>>> On Mon, Jul 10, 2023 at 03:31:23PM +0300, Max Gurtovoy wrote:
>>>>> I think it is more than just commit message.
>>>>
>>>> Okay, starting to understand what's the problem.
>>>>
>>>>> A lot of code that we can avoid was added regarding the --context 
>>>>> cmdline
>>>>> argument.
>>>>
>>>> Correct and it's not optional to get the tests passing for the fc 
>>>> transport.
>>>
>>> why the fc needs the --context to pass tests ?
>>
>> A typical nvme test consists out of following steps (nvme/004):
>>
>> // nvme target setup (1)
>>     _create_nvmet_subsystem "blktests-subsystem-1" "${loop_dev}" \
>>         "91fdba0d-f87b-4c25-b80f-db7be1418b9e"
>>     _add_nvmet_subsys_to_port "${port}" "blktests-subsystem-1"
>>
>> // nvme host setup (2)
>>     _nvme_connect_subsys "${nvme_trtype}" blktests-subsystem-1
>>
>>     local nvmedev
>>     nvmedev=$(_find_nvme_dev "blktests-subsystem-1")
>>     cat "/sys/block/${nvmedev}n1/uuid"
>>     cat "/sys/block/${nvmedev}n1/wwid"
>>
>> // nvme host teardown (3)
>>     _nvme_disconnect_subsys blktests-subsystem-1
>>
>> // nvme target teardown (4)
>>     _remove_nvmet_subsystem_from_port "${port}" "blktests-subsystem-1"
>>     _remove_nvmet_subsystem "blktests-subsystem-1"
>>
>>
>> The corresponding output with --context
>>
>>   run blktests nvme/004 at 2023-07-12 13:49:50
>> // (1)
>>   loop0: detected capacity change from 0 to 32768
>>   nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>   nvme nvme2: NVME-FC{0}: create association : host wwpn 
>> 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN 
>> "blktests-subsystem-1"
>>   (NULL device *): {0:0} Association created
>>   [174] nvmet: ctrl 1 start keep-alive timer for 5 secs
>> // (2)
>>   nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 
>> for NQN 
>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
>>   [374] nvmet: adding queue 1 to ctrl 1.
>>   [1138] nvmet: adding queue 2 to ctrl 1.
>>   [73] nvmet: adding queue 3 to ctrl 1.
>>   [174] nvmet: adding queue 4 to ctrl 1.
>>   nvme nvme2: NVME-FC{0}: controller connect complete
>>   nvme nvme2: NVME-FC{0}: new ctrl: NQN "blktests-subsystem-1"
>> // (3)
>>   nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
>> // (4)
>>   [1138] nvmet: ctrl 1 stop keep-alive
>>   (NULL device *): {0:0} Association deleted
>>   (NULL device *): {0:0} Association freed
>>   (NULL device *): Disconnect LS failed: No Association
>>
>>
>> and without --context
>>
>>   run blktests nvme/004 at 2023-07-12 13:50:33
>> // (1)
>>   loop1: detected capacity change from 0 to 32768
>>   nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>   nvme nvme2: NVME-FC{0}: create association : host wwpn 
>> 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN 
>> "nqn.2014-08.org.nvmexpress.discovery"
> 
> why does this association to discovery controller created ? because of 
> some system service ?
> 
Yes. There are nvme-autoconnect udev rules and systemd services 
installed by default (on quite a few systems now).
And it's really hard (if not impossible) to disable these services, as 
we cannot be sure how they are named, hence we wouldn't know which 
service to disable.

> can we configure the blktests subsystem not to be discovered or add some 
> access list to it ?
> 
But that's precisely what the '--context' thing is attempting to do ...

[ .. ]
>>>>
>>>> It really solves the problem that the autoconnect setup of nvme-cli is
>>>> distrubing the tests (*). The only other way I found to stop the 
>>>> autoconnect is by disabling the udev rule completely. If autoconnect 
>>>> isn't enabled the context isn't necessary.
>>>> Though changing system configuration from blktests seems at bit 
>>>> excessive.
>>>
>>> we should not stop any autoconnect during blktests. The autoconnect 
>>> and all the system admin services should run normally.
>>
>> I do not agree here. The current blktests are not designed for run as
>> intergration tests. Sure we should also tests this but currently 
>> blktests is just not there and tcp/rdma are not actually covered anyway.
> 
> what do you mean tcp/rdma not covered ?
> 
Because there is no autoconnect functionality for tcp/rdma.
For FC we have full topology information, and the driver can emit udev 
messages whenever an NVMe port appears in the fabric (and the systemd 
machinery will then start the autoconnect).
For TCP/RDMA we do not have this, so really there's nothing which could 
send udev events (discounting things like mDNS and nvme-stas for now).

> And maybe we should make several changes in the blktests to make it 
> standalone without interfering the existing configuration make by some 
> system administrator.
> 
??
But this is what we are trying to do with these patches.
The '--context' flag only needs to be set for blktests, to inform 
the rest of the system that these subsystems/this configuration are special 
and should be exempted from 'normal' system processing.
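
For instance (a sketch only, reusing the fcloop WWNN/WWPN defaults and the
connect arguments from the tests/nvme/rc hunk quoted earlier in this thread):

    # connection made by blktests, tagged with its own context
    nvme connect -t fc \
        -a "nn-0x10001100aa000001:pn-0x20001100aa000001" \
        -w "nn-0x10001100aa000002:pn-0x20001100aa000002" \
        -n blktests-subsystem-1 \
        --hostnqn="${def_hostnqn}" --hostid="${def_hostid}" \
        --context=blktests

    # the udev/systemd autoconnect path issues its own connect without
    # that context, so nvme-cli can tell the two invocations apart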

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-07-13  6:00                     ` Hannes Reinecke
@ 2023-07-13  8:49                       ` Max Gurtovoy
  2023-07-13 10:14                         ` Hannes Reinecke
  0 siblings, 1 reply; 35+ messages in thread
From: Max Gurtovoy @ 2023-07-13  8:49 UTC (permalink / raw)
  To: Hannes Reinecke, Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, James Smart,
	Martin Belanger



On 13/07/2023 9:00, Hannes Reinecke wrote:
> On 7/13/23 02:12, Max Gurtovoy wrote:
>>
>>
>> On 12/07/2023 15:04, Daniel Wagner wrote:
>>> On Mon, Jul 10, 2023 at 07:30:20PM +0300, Max Gurtovoy wrote:
>>>>
>>>>
>>>> On 10/07/2023 18:03, Daniel Wagner wrote:
>>>>> On Mon, Jul 10, 2023 at 03:31:23PM +0300, Max Gurtovoy wrote:
>>>>>> I think it is more than just commit message.
>>>>>
>>>>> Okay, starting to understand what's the problem.
>>>>>
>>>>>> A lot of code that we can avoid was added regarding the --context 
>>>>>> cmdline
>>>>>> argument.
>>>>>
>>>>> Correct and it's not optional to get the tests passing for the fc 
>>>>> transport.
>>>>
>>>> why the fc needs the --context to pass tests ?
>>>
>>> A typical nvme test consists out of following steps (nvme/004):
>>>
>>> // nvme target setup (1)
>>>     _create_nvmet_subsystem "blktests-subsystem-1" "${loop_dev}" \
>>>         "91fdba0d-f87b-4c25-b80f-db7be1418b9e"
>>>     _add_nvmet_subsys_to_port "${port}" "blktests-subsystem-1"
>>>
>>> // nvme host setup (2)
>>>     _nvme_connect_subsys "${nvme_trtype}" blktests-subsystem-1
>>>
>>>     local nvmedev
>>>     nvmedev=$(_find_nvme_dev "blktests-subsystem-1")
>>>     cat "/sys/block/${nvmedev}n1/uuid"
>>>     cat "/sys/block/${nvmedev}n1/wwid"
>>>
>>> // nvme host teardown (3)
>>>     _nvme_disconnect_subsys blktests-subsystem-1
>>>
>>> // nvme target teardown (4)
>>>     _remove_nvmet_subsystem_from_port "${port}" "blktests-subsystem-1"
>>>     _remove_nvmet_subsystem "blktests-subsystem-1"
>>>
>>>
>>> The corresponding output with --context
>>>
>>>   run blktests nvme/004 at 2023-07-12 13:49:50
>>> // (1)
>>>   loop0: detected capacity change from 0 to 32768
>>>   nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>>   nvme nvme2: NVME-FC{0}: create association : host wwpn 
>>> 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN 
>>> "blktests-subsystem-1"
>>>   (NULL device *): {0:0} Association created
>>>   [174] nvmet: ctrl 1 start keep-alive timer for 5 secs
>>> // (2)
>>>   nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 
>>> for NQN 
>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
>>>   [374] nvmet: adding queue 1 to ctrl 1.
>>>   [1138] nvmet: adding queue 2 to ctrl 1.
>>>   [73] nvmet: adding queue 3 to ctrl 1.
>>>   [174] nvmet: adding queue 4 to ctrl 1.
>>>   nvme nvme2: NVME-FC{0}: controller connect complete
>>>   nvme nvme2: NVME-FC{0}: new ctrl: NQN "blktests-subsystem-1"
>>> // (3)
>>>   nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
>>> // (4)
>>>   [1138] nvmet: ctrl 1 stop keep-alive
>>>   (NULL device *): {0:0} Association deleted
>>>   (NULL device *): {0:0} Association freed
>>>   (NULL device *): Disconnect LS failed: No Association
>>>
>>>
>>> and without --context
>>>
>>>   run blktests nvme/004 at 2023-07-12 13:50:33
>>> // (1)
>>>   loop1: detected capacity change from 0 to 32768
>>>   nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>>   nvme nvme2: NVME-FC{0}: create association : host wwpn 
>>> 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN 
>>> "nqn.2014-08.org.nvmexpress.discovery"
>>
>> why does this association to discovery controller created ? because of 
>> some system service ?
>>
> Yes. There are nvme-autoconnect udev rules and systemd services 
> installed per default (in quite some systems now).
> And it's really hard (if not impossible) to disable these services (as 
> we cannot be sure how they are named, hence we wouldn't know which 
> service to disable.

Right. We shouldn't disable them IMO.

> 
>> can we configure the blktests subsystem not to be discovered or add 
>> some access list to it ?
>>
> But that's precisely what the '--context' thing is attempting to do ...

I'm not sure it is.

Exposing the subsystem is done on the target configuration side.
Additionally, the --context option (which is on the initiator/host side) is, 
according to Daniel, there to distinguish between different 
invocations. I proposed that the blktests subsystem not be part of the 
discoverable fabric, or that it be protected somehow by an access list. 
Then no additional invocation will happen.


> 
> [ .. ]
>>>>>
>>>>> It really solves the problem that the autoconnect setup of nvme-cli is
>>>>> distrubing the tests (*). The only other way I found to stop the 
>>>>> autoconnect is by disabling the udev rule completely. If 
>>>>> autoconnect isn't enabled the context isn't necessary.
>>>>> Though changing system configuration from blktests seems at bit 
>>>>> excessive.
>>>>
>>>> we should not stop any autoconnect during blktests. The autoconnect 
>>>> and all the system admin services should run normally.
>>>
>>> I do not agree here. The current blktests are not designed for run as
>>> intergration tests. Sure we should also tests this but currently 
>>> blktests is just not there and tcp/rdma are not actually covered anyway.
>>
>> what do you mean tcp/rdma not covered ?
>>
> Because there is no autoconnect functionality for tcp/rdma.
> For FC we have full topology information, and the driver can emit udev 
> messages whenever a NVMe port appears in the fabrics (and the systemd 
> machinery will then start autoconnect).
> For TCP/RDMA we do not have this, so really there's nothing which could 
> send udev events (discounting things like mDNS and nvme-stas for now).
> 
>> And maybe we should make several changes in the blktests to make it 
>> standalone without interfering the existing configuration make by some 
>> system administrator.
>>
> ??
> But this is what we are trying with this patches.
> The '--context' flag only needs to be set for the blktests, to inform 
> the rest of the system that these subsystems/configuration is special 
> and should be exempted from 'normal' system processing.

The --context option is initiator configuration; I'm referring to changes in 
the target configuration.
This will guarantee that things also work in environments where 
nvme-cli does not have the --context flag.

> 
> Cheers,
> 
> Hannes

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-07-13  8:49                       ` Max Gurtovoy
@ 2023-07-13 10:14                         ` Hannes Reinecke
  2023-07-13 10:30                           ` Daniel Wagner
  0 siblings, 1 reply; 35+ messages in thread
From: Hannes Reinecke @ 2023-07-13 10:14 UTC (permalink / raw)
  To: Max Gurtovoy, Daniel Wagner
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, James Smart,
	Martin Belanger

On 7/13/23 10:49, Max Gurtovoy wrote:
> 
> 
> On 13/07/2023 9:00, Hannes Reinecke wrote:
>> On 7/13/23 02:12, Max Gurtovoy wrote:
>>>
>>>
>>> On 12/07/2023 15:04, Daniel Wagner wrote:
>>>> On Mon, Jul 10, 2023 at 07:30:20PM +0300, Max Gurtovoy wrote:
>>>>>
>>>>>
>>>>> On 10/07/2023 18:03, Daniel Wagner wrote:
>>>>>> On Mon, Jul 10, 2023 at 03:31:23PM +0300, Max Gurtovoy wrote:
>>>>>>> I think it is more than just commit message.
>>>>>>
>>>>>> Okay, starting to understand what's the problem.
>>>>>>
>>>>>>> A lot of code that we can avoid was added regarding the --context 
>>>>>>> cmdline
>>>>>>> argument.
>>>>>>
>>>>>> Correct and it's not optional to get the tests passing for the fc 
>>>>>> transport.
>>>>>
>>>>> why the fc needs the --context to pass tests ?
>>>>
>>>> A typical nvme test consists out of following steps (nvme/004):
>>>>
>>>> // nvme target setup (1)
>>>>     _create_nvmet_subsystem "blktests-subsystem-1" "${loop_dev}" \
>>>>         "91fdba0d-f87b-4c25-b80f-db7be1418b9e"
>>>>     _add_nvmet_subsys_to_port "${port}" "blktests-subsystem-1"
>>>>
>>>> // nvme host setup (2)
>>>>     _nvme_connect_subsys "${nvme_trtype}" blktests-subsystem-1
>>>>
>>>>     local nvmedev
>>>>     nvmedev=$(_find_nvme_dev "blktests-subsystem-1")
>>>>     cat "/sys/block/${nvmedev}n1/uuid"
>>>>     cat "/sys/block/${nvmedev}n1/wwid"
>>>>
>>>> // nvme host teardown (3)
>>>>     _nvme_disconnect_subsys blktests-subsystem-1
>>>>
>>>> // nvme target teardown (4)
>>>>     _remove_nvmet_subsystem_from_port "${port}" "blktests-subsystem-1"
>>>>     _remove_nvmet_subsystem "blktests-subsystem-1"
>>>>
>>>>
>>>> The corresponding output with --context
>>>>
>>>>   run blktests nvme/004 at 2023-07-12 13:49:50
>>>> // (1)
>>>>   loop0: detected capacity change from 0 to 32768
>>>>   nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>>>   nvme nvme2: NVME-FC{0}: create association : host wwpn 
>>>> 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN 
>>>> "blktests-subsystem-1"
>>>>   (NULL device *): {0:0} Association created
>>>>   [174] nvmet: ctrl 1 start keep-alive timer for 5 secs
>>>> // (2)
>>>>   nvmet: creating nvm controller 1 for subsystem 
>>>> blktests-subsystem-1 for NQN 
>>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
>>>>   [374] nvmet: adding queue 1 to ctrl 1.
>>>>   [1138] nvmet: adding queue 2 to ctrl 1.
>>>>   [73] nvmet: adding queue 3 to ctrl 1.
>>>>   [174] nvmet: adding queue 4 to ctrl 1.
>>>>   nvme nvme2: NVME-FC{0}: controller connect complete
>>>>   nvme nvme2: NVME-FC{0}: new ctrl: NQN "blktests-subsystem-1"
>>>> // (3)
>>>>   nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
>>>> // (4)
>>>>   [1138] nvmet: ctrl 1 stop keep-alive
>>>>   (NULL device *): {0:0} Association deleted
>>>>   (NULL device *): {0:0} Association freed
>>>>   (NULL device *): Disconnect LS failed: No Association
>>>>
>>>>
>>>> and without --context
>>>>
>>>>   run blktests nvme/004 at 2023-07-12 13:50:33
>>>> // (1)
>>>>   loop1: detected capacity change from 0 to 32768
>>>>   nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>>>   nvme nvme2: NVME-FC{0}: create association : host wwpn 
>>>> 0x20001100aa000002  rport wwpn 0x20001100aa000001: NQN 
>>>> "nqn.2014-08.org.nvmexpress.discovery"
>>>
>>> why does this association to discovery controller created ? because 
>>> of some system service ?
>>>
>> Yes. There are nvme-autoconnect udev rules and systemd services 
>> installed per default (in quite some systems now).
>> And it's really hard (if not impossible) to disable these services (as 
>> we cannot be sure how they are named, hence we wouldn't know which 
>> service to disable.
> 
> Right. We shouldn't disable them IMO.
> 
>>
>>> can we configure the blktests subsystem not to be discovered or add 
>>> some access list to it ?
>>>
>> But that's precisely what the '--context' thing is attempting to do ...
> 
> I'm not sure it is.
> 
> Exposing the subsystem is from the target configuration side.
> Additionally, the --context (which is in the initiator/host side), 
> according to Daniel, is there to distinguish between different 
> invocations. I proposed that blktests subsystem will not be part of 
> discoverable fabric or protected somehow by access list. Therefore, no 
> additional invocation will happen.
> 
Hmm. Maybe we can tweak blktests to use its own hostnqn, and always pass 
that for any nvme-cli call.
Daniel?
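
Something along these lines in _nvme_connect_subsys (just a sketch based on
the hunk from patch 2/3 quoted above; the def_* variables are the blktests
defaults):

    ARGS=(-t "${trtype}"
          -n "${subsysnqn}"
          --hostnqn="${def_hostnqn}"
          --hostid="${def_hostid}")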

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-07-13 10:14                         ` Hannes Reinecke
@ 2023-07-13 10:30                           ` Daniel Wagner
  2023-07-13 10:44                             ` Daniel Wagner
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Wagner @ 2023-07-13 10:30 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Max Gurtovoy, linux-nvme, linux-kernel, linux-block,
	Chaitanya Kulkarni, Shin'ichiro Kawasaki, Sagi Grimberg,
	James Smart, Martin Belanger

On Thu, Jul 13, 2023 at 12:14:14PM +0200, Hannes Reinecke wrote:
> > Exposing the subsystem is from the target configuration side.
> > Additionally, the --context (which is in the initiator/host side),
> > according to Daniel, is there to distinguish between different
> > invocations. I proposed that blktests subsystem will not be part of
> > discoverable fabric or protected somehow by access list. Therefore, no
> > additional invocation will happen.

I am confused. This is exactly what the whole --context thing is.

> Hmm. Maybe we can tweak blktest to use it's own HostNQN, and always pass
> that for any nvme-cli call.

This is what the current code already does.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-07-13 10:30                           ` Daniel Wagner
@ 2023-07-13 10:44                             ` Daniel Wagner
  2023-07-13 10:50                               ` Max Gurtovoy
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Wagner @ 2023-07-13 10:44 UTC (permalink / raw)
  To: Hannes Reinecke, g
  Cc: Max Gurtovoy, linux-nvme, linux-kernel, linux-block,
	Chaitanya Kulkarni, Shin'ichiro Kawasaki, Sagi Grimberg,
	James Smart, Martin Belanger

On Thu, Jul 13, 2023 at 12:30:45PM +0200, Daniel Wagner wrote:
> On Thu, Jul 13, 2023 at 12:14:14PM +0200, Hannes Reinecke wrote:
> > > Exposing the subsystem is from the target configuration side.
> > > Additionally, the --context (which is in the initiator/host side),
> > > according to Daniel, is there to distinguish between different
> > > invocations. I proposed that blktests subsystem will not be part of
> > > discoverable fabric or protected somehow by access list. Therefore, no
> > > additional invocation will happen.
> 
> I am confused. This is exactly what the whole --context thing is.

Ah, I think I got it now. You want me to set allow_hosts on the target side too
:)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect
  2023-07-13 10:44                             ` Daniel Wagner
@ 2023-07-13 10:50                               ` Max Gurtovoy
  0 siblings, 0 replies; 35+ messages in thread
From: Max Gurtovoy @ 2023-07-13 10:50 UTC (permalink / raw)
  To: Daniel Wagner, Hannes Reinecke, g
  Cc: linux-nvme, linux-kernel, linux-block, Chaitanya Kulkarni,
	Shin'ichiro Kawasaki, Sagi Grimberg, James Smart,
	Martin Belanger



On 13/07/2023 13:44, Daniel Wagner wrote:
> On Thu, Jul 13, 2023 at 12:30:45PM +0200, Daniel Wagner wrote:
>> On Thu, Jul 13, 2023 at 12:14:14PM +0200, Hannes Reinecke wrote:
>>>> Exposing the subsystem is from the target configuration side.
>>>> Additionally, the --context (which is in the initiator/host side),
>>>> according to Daniel, is there to distinguish between different
>>>> invocations. I proposed that blktests subsystem will not be part of
>>>> discoverable fabric or protected somehow by access list. Therefore, no
>>>> additional invocation will happen.
>>
>> I am confused. This is exactly what the whole --context thing is.
> 
> Ah I think I got it now. You want me to set allow_hosts on the target side too
> :)

Yes.
With a unique hostid/hostnqn for blktests this should work without the 
need for the --context mechanism, which was probably added for system 
administrators and not for the unit-testing/QA/verification engineers who 
run blktests.
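
On the target side that could look roughly like this (a sketch using the
nvmet configfs layout; ${def_hostnqn} is the blktests default, and blktests
would wrap this in its own helpers):

    # Only allow the blktests hostnqn to connect to the blktests subsystem.
    subsys=/sys/kernel/config/nvmet/subsystems/blktests-subsystem-1
    host=/sys/kernel/config/nvmet/hosts/${def_hostnqn}

    mkdir -p "${host}"
    echo 0 > "${subsys}/attr_allow_any_host"
    ln -s "${host}" "${subsys}/allowed_hosts/"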

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2023-07-13 10:50 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-20 13:27 [PATCH blktests v1 0/3] More fixes for FC enabling Daniel Wagner
2023-06-20 13:27 ` [PATCH blktests v1 1/3] nvme/048: Check for queue count check directly Daniel Wagner
2023-06-20 13:49   ` Sagi Grimberg
2023-06-20 17:37     ` Daniel Wagner
2023-06-21  9:50       ` Sagi Grimberg
2023-06-21 11:19         ` Daniel Wagner
2023-06-27 10:13   ` Shinichiro Kawasaki
2023-06-28  5:52     ` Daniel Wagner
2023-06-20 13:27 ` [PATCH blktests v1 2/3] nvme/rc: Avoid triggering host nvme-cli autoconnect Daniel Wagner
2023-06-20 14:07   ` Sagi Grimberg
2023-06-20 17:43     ` Daniel Wagner
2023-06-21  9:02       ` Daniel Wagner
2023-06-27 10:22   ` Shinichiro Kawasaki
2023-06-28  6:09     ` Daniel Wagner
2023-06-28 10:04       ` Shinichiro Kawasaki
2023-06-28 15:00         ` Daniel Wagner
2023-07-06 16:11   ` Max Gurtovoy
2023-07-10  8:27     ` Daniel Wagner
2023-07-10  9:53       ` Max Gurtovoy
2023-07-10 10:19         ` Daniel Wagner
2023-07-10 12:31           ` Max Gurtovoy
2023-07-10 15:03             ` Daniel Wagner
2023-07-10 16:30               ` Max Gurtovoy
2023-07-12 12:04                 ` Daniel Wagner
2023-07-13  0:12                   ` Max Gurtovoy
2023-07-13  6:00                     ` Hannes Reinecke
2023-07-13  8:49                       ` Max Gurtovoy
2023-07-13 10:14                         ` Hannes Reinecke
2023-07-13 10:30                           ` Daniel Wagner
2023-07-13 10:44                             ` Daniel Wagner
2023-07-13 10:50                               ` Max Gurtovoy
2023-06-20 13:27 ` [PATCH blktests v1 3/3] nvme/{041,042,043,044,045}: Use default hostnqn and hostid Daniel Wagner
2023-06-27 10:24   ` Shinichiro Kawasaki
2023-07-06 16:06   ` Max Gurtovoy
2023-07-10  8:32     ` Daniel Wagner
