All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v1 0/4] nvmet-fc blktests & autoconnect fixes
@ 2023-08-29  9:13 Daniel Wagner
  2023-08-29  9:13 ` [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early Daniel Wagner
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Daniel Wagner @ 2023-08-29  9:13 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Hannes Reinecke, Sagi Grimberg, Jason Gunthorpe,
	James Smart, Chaitanya Kulkarni, Christoph Hellwig,
	Daniel Wagner

Currently, blktests will pass with the patches [1] and the revert of
[2]. This is possible because blktests is still disables the
nvmf-autoconnect auto connect service [3].

As I previously reported, blktests is able to trigger various kernel
panics with the system auto-connect running in the background. Let's try
to fix these problems.

The first two patches are fixing nvmet ftrace infrastructure. I think
they could go in right now.

The third patch changes the way the refcounting for association and
queues is done. There is a cycling dependency between these two objects
and this makes the shutdown path very complex and error prone. As the
life time of the queues is coupled to the association, I decided to drop
the refcounting of the queues and only rely on the refcounts of the
association. This made the code a bit simpler to follow and also allowed
to cleanup path to split into two halfs. The first one is to remove the
association from the association RCU list and wait for an grace period
so we know that now new I/Os will enter any queues. Then we drop the
refcounts and then actually remove any resources when the refcount drops
to 0 (all in-flight I/O has been processed). nvme/003 is particular good
in triggering crashes in this path.

nvme/005 is triggering crashes in get discovery log page. The req->port
pointer was never assign a valid pointer. This looks like there is way
to have no port entry binding (remember we have the external autoconnect
running in background).

Unfortunately, there are still some more fallouts, but I though I post
these patches now when my memory is fresh if there are any questions.

[1] https://lore.kernel.org/linux-nvme/sgoyzwj6ckrdrpq22u6fhtcemul5rqj6de4l5gw73vz77o3ils@vmv3jue4rom7/
[2] linux: ee6fdc5055e9 ("nvme-fc: fix race between error recovery and creating association")
[3] blktests: 0478dce70696 ("nvme/rc: Avoid triggering host nvme-cli autoconnect")

Daniel Wagner (4):
  nvmet-trace: avoid dereferencing pointer too early
  nvmet-trace: null terminate device name string correctly
  nvmet-fc: untangle cross refcounting objects
  nvmet-discovery: do not use invalid port

 drivers/nvme/target/discovery.c |  9 +++++
 drivers/nvme/target/fc.c        | 67 ++++++++++++++++-----------------
 drivers/nvme/target/trace.c     |  6 +--
 drivers/nvme/target/trace.h     | 28 +++++++-------
 4 files changed, 60 insertions(+), 50 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early
  2023-08-29  9:13 [RFC v1 0/4] nvmet-fc blktests & autoconnect fixes Daniel Wagner
@ 2023-08-29  9:13 ` Daniel Wagner
  2023-08-29 11:21   ` kernel test robot
                     ` (5 more replies)
  2023-08-29  9:13 ` [RFC v1 2/4] nvmet-trace: null terminate device name string correctly Daniel Wagner
                   ` (3 subsequent siblings)
  4 siblings, 6 replies; 27+ messages in thread
From: Daniel Wagner @ 2023-08-29  9:13 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Hannes Reinecke, Sagi Grimberg, Jason Gunthorpe,
	James Smart, Chaitanya Kulkarni, Christoph Hellwig,
	Daniel Wagner

The first command issued from the host to the target is the fabrics
connect command. At this point, neither the target queue nor the
controller have been allocated. But we already try to trace this command
in nvmet_req_init.

Reported by KASAN.

Signed-off-by: Daniel Wagner <dwagner@suse.de>

---
 drivers/nvme/target/trace.c |  6 +++---
 drivers/nvme/target/trace.h | 24 +++++++++++++-----------
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/nvme/target/trace.h b/drivers/nvme/target/trace.h
index 6109b3806b12..6997bd7e45cf 100644
--- a/drivers/nvme/target/trace.h
+++ b/drivers/nvme/target/trace.h
@@ -32,18 +32,20 @@ const char *nvmet_trace_parse_fabrics_cmd(struct trace_seq *p, u8 fctype,
 	 nvmet_trace_parse_nvm_cmd(p, opcode, cdw10) :			\
 	 nvmet_trace_parse_admin_cmd(p, opcode, cdw10)))
 
-const char *nvmet_trace_ctrl_name(struct trace_seq *p, struct nvmet_ctrl *ctrl);
-#define __print_ctrl_name(ctrl)				\
-	nvmet_trace_ctrl_name(p, ctrl)
+const char *nvmet_trace_ctrl_id(struct trace_seq *p, u16 ctrl_id);
+#define __print_ctrl_id(ctrl_id)			\
+	nvmet_trace_ctrl_id(p, ctrl_id)
 
 const char *nvmet_trace_disk_name(struct trace_seq *p, char *name);
 #define __print_disk_name(name)				\
 	nvmet_trace_disk_name(p, name)
 
 #ifndef TRACE_HEADER_MULTI_READ
-static inline struct nvmet_ctrl *nvmet_req_to_ctrl(struct nvmet_req *req)
+static inline u16 nvmet_req_to_ctrl_id(struct nvmet_req *req)
 {
-	return req->sq->ctrl;
+	if (!req->sq || !req->sq->ctrl)
+		return 0;
+	return req->sq->ctrl->cntlid;
 }
 
 static inline void __assign_req_name(char *name, struct nvmet_req *req)
@@ -63,7 +65,7 @@ TRACE_EVENT(nvmet_req_init,
 	TP_ARGS(req, cmd),
 	TP_STRUCT__entry(
 		__field(struct nvme_command *, cmd)
-		__field(struct nvmet_ctrl *, ctrl)
+		__field(u16, ctrl_id)
 		__array(char, disk, DISK_NAME_LEN)
 		__field(int, qid)
 		__field(u16, cid)
@@ -76,7 +78,7 @@ TRACE_EVENT(nvmet_req_init,
 	),
 	TP_fast_assign(
 		__entry->cmd = cmd;
-		__entry->ctrl = nvmet_req_to_ctrl(req);
+		__entry->ctrl_id = nvmet_req_to_ctrl_id(req);
 		__assign_req_name(__entry->disk, req);
 		__entry->qid = req->sq->qid;
 		__entry->cid = cmd->common.command_id;
@@ -90,7 +92,7 @@ TRACE_EVENT(nvmet_req_init,
 	),
 	TP_printk("nvmet%s: %sqid=%d, cmdid=%u, nsid=%u, flags=%#x, "
 		  "meta=%#llx, cmd=(%s, %s)",
-		__print_ctrl_name(__entry->ctrl),
+		__print_ctrl_id(__entry->ctrl_id),
 		__print_disk_name(__entry->disk),
 		__entry->qid, __entry->cid, __entry->nsid,
 		__entry->flags, __entry->metadata,
@@ -104,7 +106,7 @@ TRACE_EVENT(nvmet_req_complete,
 	TP_PROTO(struct nvmet_req *req),
 	TP_ARGS(req),
 	TP_STRUCT__entry(
-		__field(struct nvmet_ctrl *, ctrl)
+		__field(u16, ctrl_id)
 		__array(char, disk, DISK_NAME_LEN)
 		__field(int, qid)
 		__field(int, cid)
@@ -112,7 +114,7 @@ TRACE_EVENT(nvmet_req_complete,
 		__field(u16, status)
 	),
 	TP_fast_assign(
-		__entry->ctrl = nvmet_req_to_ctrl(req);
+		__entry->ctrl_id = nvmet_req_to_ctrl_id(req);
 		__entry->qid = req->cq->qid;
 		__entry->cid = req->cqe->command_id;
 		__entry->result = le64_to_cpu(req->cqe->result.u64);
@@ -120,7 +122,7 @@ TRACE_EVENT(nvmet_req_complete,
 		__assign_req_name(__entry->disk, req);
 	),
 	TP_printk("nvmet%s: %sqid=%d, cmdid=%u, res=%#llx, status=%#x",
-		__print_ctrl_name(__entry->ctrl),
+		__print_ctrl_id(__entry->ctrl_id),
 		__print_disk_name(__entry->disk),
 		__entry->qid, __entry->cid, __entry->result, __entry->status)
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC v1 2/4] nvmet-trace: null terminate device name string correctly
  2023-08-29  9:13 [RFC v1 0/4] nvmet-fc blktests & autoconnect fixes Daniel Wagner
  2023-08-29  9:13 ` [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early Daniel Wagner
@ 2023-08-29  9:13 ` Daniel Wagner
  2023-09-05  6:49   ` Christoph Hellwig
  2023-09-06 11:01   ` Hannes Reinecke
  2023-08-29  9:13 ` [RFC v1 3/4] nvmet-fc: untangle cross refcounting objects Daniel Wagner
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 27+ messages in thread
From: Daniel Wagner @ 2023-08-29  9:13 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Hannes Reinecke, Sagi Grimberg, Jason Gunthorpe,
	James Smart, Chaitanya Kulkarni, Christoph Hellwig,
	Daniel Wagner

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/target/trace.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/trace.h b/drivers/nvme/target/trace.h
index 6997bd7e45cf..9ba466b49613 100644
--- a/drivers/nvme/target/trace.h
+++ b/drivers/nvme/target/trace.h
@@ -55,8 +55,8 @@ static inline void __assign_req_name(char *name, struct nvmet_req *req)
 		return;
 	}
 
-	strncpy(name, req->ns->device_path,
-		min_t(size_t, DISK_NAME_LEN, strlen(req->ns->device_path)));
+	strscpy(name, req->ns->device_path,
+		min_t(size_t, DISK_NAME_LEN, strlen(req->ns->device_path) + 1));
 }
 #endif
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC v1 3/4] nvmet-fc: untangle cross refcounting objects
  2023-08-29  9:13 [RFC v1 0/4] nvmet-fc blktests & autoconnect fixes Daniel Wagner
  2023-08-29  9:13 ` [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early Daniel Wagner
  2023-08-29  9:13 ` [RFC v1 2/4] nvmet-trace: null terminate device name string correctly Daniel Wagner
@ 2023-08-29  9:13 ` Daniel Wagner
  2023-09-06 11:22   ` Hannes Reinecke
  2023-08-29  9:13 ` [RFC v1 4/4] nvmet-discovery: do not use invalid port Daniel Wagner
  2023-08-29  9:13 ` [RFC v1 4/4] nvmet-discovery: Do " Daniel Wagner
  4 siblings, 1 reply; 27+ messages in thread
From: Daniel Wagner @ 2023-08-29  9:13 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Hannes Reinecke, Sagi Grimberg, Jason Gunthorpe,
	James Smart, Chaitanya Kulkarni, Christoph Hellwig,
	Daniel Wagner

Associations take a refcount on queues, queues take a refcount on
associations.

The existing code lead to the situation that the target executes a
disconnect and the host retriggers a reconnect immediately. The
reconnect command still finds an existing association and uses this.
Though the reconnect crashes later on because
nvmet_fc_delete_target_assoc() blindly goes ahead and removes resources
while the reconnect code wants to use it. The problem is that
nvmet_fc_find_target_assoc() is able to lookup an association which is
beeing removed.

So the first thing to address nvmet_fc_find_target_queue() is to remove
the association out of the list and wait a RCU cycle and free resources
in the free function callback of the kref_put().

The live time of the queues are strictly bound to the lifetime of an
association. Thus we don't need to take reverse refcounts (queue ->
assocation).

Furthermore, streamline the cleanup code by using the workqueue for
delete the association in nvmet_fc_ls_disconnect. This ensures, that we
run throught the same shutdown path in all non error cases.

Reproducer: nvme/003

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/target/fc.c | 67 ++++++++++++++++++++--------------------
 1 file changed, 33 insertions(+), 34 deletions(-)

diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index df7d84aff843..9d7262a8e3db 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -165,6 +165,7 @@ struct nvmet_fc_tgt_assoc {
 	struct nvmet_fc_hostport	*hostport;
 	struct nvmet_fc_ls_iod		*rcv_disconn;
 	struct list_head		a_list;
+	struct nvmet_fc_tgt_queue	*_queues[NVMET_NR_QUEUES + 1];
 	struct nvmet_fc_tgt_queue __rcu	*queues[NVMET_NR_QUEUES + 1];
 	struct kref			ref;
 	struct work_struct		del_work;
@@ -802,14 +803,11 @@ nvmet_fc_alloc_target_queue(struct nvmet_fc_tgt_assoc *assoc,
 	if (!queue)
 		return NULL;
 
-	if (!nvmet_fc_tgt_a_get(assoc))
-		goto out_free_queue;
-
 	queue->work_q = alloc_workqueue("ntfc%d.%d.%d", 0, 0,
 				assoc->tgtport->fc_target_port.port_num,
 				assoc->a_id, qid);
 	if (!queue->work_q)
-		goto out_a_put;
+		goto out_free_queue;
 
 	queue->qid = qid;
 	queue->sqsize = sqsize;
@@ -830,7 +828,8 @@ nvmet_fc_alloc_target_queue(struct nvmet_fc_tgt_assoc *assoc,
 	if (ret)
 		goto out_fail_iodlist;
 
-	WARN_ON(assoc->queues[qid]);
+	WARN_ON(assoc->_queues[qid]);
+	assoc->_queues[qid] = queue;
 	rcu_assign_pointer(assoc->queues[qid], queue);
 
 	return queue;
@@ -838,8 +837,6 @@ nvmet_fc_alloc_target_queue(struct nvmet_fc_tgt_assoc *assoc,
 out_fail_iodlist:
 	nvmet_fc_destroy_fcp_iodlist(assoc->tgtport, queue);
 	destroy_workqueue(queue->work_q);
-out_a_put:
-	nvmet_fc_tgt_a_put(assoc);
 out_free_queue:
 	kfree(queue);
 	return NULL;
@@ -852,12 +849,8 @@ nvmet_fc_tgt_queue_free(struct kref *ref)
 	struct nvmet_fc_tgt_queue *queue =
 		container_of(ref, struct nvmet_fc_tgt_queue, ref);
 
-	rcu_assign_pointer(queue->assoc->queues[queue->qid], NULL);
-
 	nvmet_fc_destroy_fcp_iodlist(queue->assoc->tgtport, queue);
 
-	nvmet_fc_tgt_a_put(queue->assoc);
-
 	destroy_workqueue(queue->work_q);
 
 	kfree_rcu(queue, rcu);
@@ -1100,6 +1093,11 @@ nvmet_fc_delete_assoc(struct work_struct *work)
 		container_of(work, struct nvmet_fc_tgt_assoc, del_work);
 
 	nvmet_fc_delete_target_assoc(assoc);
+
+	/* release get taken in nvmet_fc_find_target_assoc */
+	nvmet_fc_tgt_a_put(assoc);
+
+	/* final reference from nvmet_fc_ls_create_association */
 	nvmet_fc_tgt_a_put(assoc);
 }
 
@@ -1172,13 +1170,18 @@ nvmet_fc_target_assoc_free(struct kref *ref)
 	struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
 	struct nvmet_fc_ls_iod	*oldls;
 	unsigned long flags;
+	int i;
+
+	for (i = NVMET_NR_QUEUES; i >= 0; i--) {
+		if (assoc->_queues[i])
+			nvmet_fc_delete_target_queue(assoc->_queues[i]);
+	}
 
 	/* Send Disconnect now that all i/o has completed */
 	nvmet_fc_xmt_disconnect_assoc(assoc);
 
 	nvmet_fc_free_hostport(assoc->hostport);
 	spin_lock_irqsave(&tgtport->lock, flags);
-	list_del_rcu(&assoc->a_list);
 	oldls = assoc->rcv_disconn;
 	spin_unlock_irqrestore(&tgtport->lock, flags);
 	/* if pending Rcv Disconnect Association LS, send rsp now */
@@ -1208,7 +1211,6 @@ static void
 nvmet_fc_delete_target_assoc(struct nvmet_fc_tgt_assoc *assoc)
 {
 	struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
-	struct nvmet_fc_tgt_queue *queue;
 	int i, terminating;
 
 	terminating = atomic_xchg(&assoc->terminating, 1);
@@ -1217,29 +1219,21 @@ nvmet_fc_delete_target_assoc(struct nvmet_fc_tgt_assoc *assoc)
 	if (terminating)
 		return;
 
+	/* prevent new I/Os entering the queues */
+	for (i = NVMET_NR_QUEUES; i >= 0; i--)
+		rcu_assign_pointer(assoc->queues[i], NULL);
+	list_del_rcu(&assoc->a_list);
+	synchronize_rcu();
 
+	/* ensure all in-flight I/Os have been processed */
 	for (i = NVMET_NR_QUEUES; i >= 0; i--) {
-		rcu_read_lock();
-		queue = rcu_dereference(assoc->queues[i]);
-		if (!queue) {
-			rcu_read_unlock();
-			continue;
-		}
-
-		if (!nvmet_fc_tgt_q_get(queue)) {
-			rcu_read_unlock();
-			continue;
-		}
-		rcu_read_unlock();
-		nvmet_fc_delete_target_queue(queue);
-		nvmet_fc_tgt_q_put(queue);
+		if (assoc->_queues[i])
+			flush_workqueue(assoc->_queues[i]->work_q);
 	}
 
 	dev_info(tgtport->dev,
 		"{%d:%d} Association deleted\n",
 		tgtport->fc_target_port.port_num, assoc->a_id);
-
-	nvmet_fc_tgt_a_put(assoc);
 }
 
 static struct nvmet_fc_tgt_assoc *
@@ -1497,6 +1491,8 @@ __nvmet_fc_free_assocs(struct nvmet_fc_tgtport *tgtport)
 			nvmet_fc_tgt_a_put(assoc);
 	}
 	rcu_read_unlock();
+
+	flush_workqueue(nvmet_wq);
 }
 
 /**
@@ -1870,9 +1866,6 @@ nvmet_fc_ls_disconnect(struct nvmet_fc_tgtport *tgtport,
 				sizeof(struct fcnvme_ls_disconnect_assoc_acc)),
 			FCNVME_LS_DISCONNECT_ASSOC);
 
-	/* release get taken in nvmet_fc_find_target_assoc */
-	nvmet_fc_tgt_a_put(assoc);
-
 	/*
 	 * The rules for LS response says the response cannot
 	 * go back until ABTS's have been sent for all outstanding
@@ -1887,8 +1880,6 @@ nvmet_fc_ls_disconnect(struct nvmet_fc_tgtport *tgtport,
 	assoc->rcv_disconn = iod;
 	spin_unlock_irqrestore(&tgtport->lock, flags);
 
-	nvmet_fc_delete_target_assoc(assoc);
-
 	if (oldls) {
 		dev_info(tgtport->dev,
 			"{%d:%d} Multiple Disconnect Association LS's "
@@ -1904,6 +1895,11 @@ nvmet_fc_ls_disconnect(struct nvmet_fc_tgtport *tgtport,
 		nvmet_fc_xmt_ls_rsp(tgtport, oldls);
 	}
 
+	if (!queue_work(nvmet_wq, &assoc->del_work)) {
+		/* already deleting - release local reference */
+		nvmet_fc_tgt_a_put(assoc);
+	}
+
 	return false;
 }
 
@@ -2933,6 +2929,9 @@ static int __init nvmet_fc_init_module(void)
 
 static void __exit nvmet_fc_exit_module(void)
 {
+	/* ensure any shutdown operation, e.g. delete ctrls have finished */
+	flush_workqueue(nvmet_wq);
+
 	/* sanity check - all lports should be removed */
 	if (!list_empty(&nvmet_fc_target_list))
 		pr_warn("%s: targetport list not empty\n", __func__);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC v1 4/4] nvmet-discovery: do not use invalid port
  2023-08-29  9:13 [RFC v1 0/4] nvmet-fc blktests & autoconnect fixes Daniel Wagner
                   ` (2 preceding siblings ...)
  2023-08-29  9:13 ` [RFC v1 3/4] nvmet-fc: untangle cross refcounting objects Daniel Wagner
@ 2023-08-29  9:13 ` Daniel Wagner
  2023-09-05  6:50   ` Christoph Hellwig
  2023-09-06 11:23   ` Hannes Reinecke
  2023-08-29  9:13 ` [RFC v1 4/4] nvmet-discovery: Do " Daniel Wagner
  4 siblings, 2 replies; 27+ messages in thread
From: Daniel Wagner @ 2023-08-29  9:13 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Hannes Reinecke, Sagi Grimberg, Jason Gunthorpe,
	James Smart, Chaitanya Kulkarni, Christoph Hellwig,
	Daniel Wagner

The port entry binding might not be existing and thus the req->port
pointer is not valid.

Reproducer: nvme/005 with active system nvmf-autoconnect systemd service.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/target/discovery.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/nvme/target/discovery.c b/drivers/nvme/target/discovery.c
index 668d257fa986..fc113057cb95 100644
--- a/drivers/nvme/target/discovery.c
+++ b/drivers/nvme/target/discovery.c
@@ -191,6 +191,15 @@ static void nvmet_execute_disc_get_log_page(struct nvmet_req *req)
 		goto out;
 	}
 
+
+	/* No port assigned, portentrybinding is missing */
+	if (!req->port) {
+		req->error_loc =
+			offsetof(struct nvme_get_log_page_command, lpo);
+		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+		goto out;
+	}
+
 	/*
 	 * Make sure we're passing at least a buffer of response header size.
 	 * If host provided data len is less than the header size, only the
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC v1 4/4] nvmet-discovery: Do not use invalid port
  2023-08-29  9:13 [RFC v1 0/4] nvmet-fc blktests & autoconnect fixes Daniel Wagner
                   ` (3 preceding siblings ...)
  2023-08-29  9:13 ` [RFC v1 4/4] nvmet-discovery: do not use invalid port Daniel Wagner
@ 2023-08-29  9:13 ` Daniel Wagner
  4 siblings, 0 replies; 27+ messages in thread
From: Daniel Wagner @ 2023-08-29  9:13 UTC (permalink / raw)
  To: linux-nvme
  Cc: linux-kernel, Hannes Reinecke, Sagi Grimberg, Jason Gunthorpe,
	James Smart, Chaitanya Kulkarni, Christoph Hellwig,
	Daniel Wagner

The port entry binding might not be existing and thus the req->port
pointer is not valid.

Reproducer: nvme/005 with active system nvmf-autoconnect systemd service.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 drivers/nvme/target/discovery.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/nvme/target/discovery.c b/drivers/nvme/target/discovery.c
index 668d257fa986..fc113057cb95 100644
--- a/drivers/nvme/target/discovery.c
+++ b/drivers/nvme/target/discovery.c
@@ -191,6 +191,15 @@ static void nvmet_execute_disc_get_log_page(struct nvmet_req *req)
 		goto out;
 	}
 
+
+	/* No port assigned, portentrybinding is missing */
+	if (!req->port) {
+		req->error_loc =
+			offsetof(struct nvme_get_log_page_command, lpo);
+		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+		goto out;
+	}
+
 	/*
 	 * Make sure we're passing at least a buffer of response header size.
 	 * If host provided data len is less than the header size, only the
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early
  2023-08-29  9:13 ` [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early Daniel Wagner
@ 2023-08-29 11:21   ` kernel test robot
  2023-08-29 19:55   ` kernel test robot
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: kernel test robot @ 2023-08-29 11:21 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: oe-kbuild-all

Hi Daniel,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on hch-configfs/for-next]
[also build test WARNING on linus/master v6.5 next-20230829]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Daniel-Wagner/nvmet-trace-avoid-dereferencing-pointer-too-early/20230829-171734
base:   git://git.infradead.org/users/hch/configfs.git for-next
patch link:    https://lore.kernel.org/r/20230829091350.16156-2-dwagner%40suse.de
patch subject: [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early
config: sparc-allyesconfig (https://download.01.org/0day-ci/archive/20230829/202308291952.LKh4QF9k-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230829/202308291952.LKh4QF9k-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308291952.LKh4QF9k-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/nvme/target/trace.c:214:13: warning: no previous prototype for 'nvmet_trace_ctrl_name' [-Wmissing-prototypes]
     214 | const char *nvmet_trace_ctrl_name(struct trace_seq *p, struct nvmet_ctrl *ctrl)
         |             ^~~~~~~~~~~~~~~~~~~~~


vim +/nvmet_trace_ctrl_name +214 drivers/nvme/target/trace.c

a5448fdc469d67 Minwoo Im 2019-06-12  213  
a5448fdc469d67 Minwoo Im 2019-06-12 @214  const char *nvmet_trace_ctrl_name(struct trace_seq *p, struct nvmet_ctrl *ctrl)
a5448fdc469d67 Minwoo Im 2019-06-12  215  {
a5448fdc469d67 Minwoo Im 2019-06-12  216  	const char *ret = trace_seq_buffer_ptr(p);
a5448fdc469d67 Minwoo Im 2019-06-12  217  
a5448fdc469d67 Minwoo Im 2019-06-12  218  	/*
a5448fdc469d67 Minwoo Im 2019-06-12  219  	 * XXX: We don't know the controller instance before executing the
a5448fdc469d67 Minwoo Im 2019-06-12  220  	 * connect command itself because the connect command for the admin
a5448fdc469d67 Minwoo Im 2019-06-12  221  	 * queue will not provide the cntlid which will be allocated in this
a5448fdc469d67 Minwoo Im 2019-06-12  222  	 * command.  In case of io queues, the controller instance will be
a5448fdc469d67 Minwoo Im 2019-06-12  223  	 * mapped by the extra data of the connect command.
a5448fdc469d67 Minwoo Im 2019-06-12  224  	 * If we can know the extra data of the connect command in this stage,
a5448fdc469d67 Minwoo Im 2019-06-12  225  	 * we can update this print statement later.
a5448fdc469d67 Minwoo Im 2019-06-12  226  	 */
a5448fdc469d67 Minwoo Im 2019-06-12  227  	if (ctrl)
a5448fdc469d67 Minwoo Im 2019-06-12  228  		trace_seq_printf(p, "%d", ctrl->cntlid);
a5448fdc469d67 Minwoo Im 2019-06-12  229  	else
a5448fdc469d67 Minwoo Im 2019-06-12  230  		trace_seq_printf(p, "_");
a5448fdc469d67 Minwoo Im 2019-06-12  231  	trace_seq_putc(p, 0);
a5448fdc469d67 Minwoo Im 2019-06-12  232  
a5448fdc469d67 Minwoo Im 2019-06-12  233  	return ret;
a5448fdc469d67 Minwoo Im 2019-06-12  234  }
a5448fdc469d67 Minwoo Im 2019-06-12  235  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early
  2023-08-29  9:13 ` [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early Daniel Wagner
  2023-08-29 11:21   ` kernel test robot
@ 2023-08-29 19:55   ` kernel test robot
  2023-08-29 20:37   ` kernel test robot
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: kernel test robot @ 2023-08-29 19:55 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: oe-kbuild-all

Hi Daniel,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on hch-configfs/for-next]
[also build test ERROR on linus/master v6.5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Daniel-Wagner/nvmet-trace-avoid-dereferencing-pointer-too-early/20230829-171734
base:   git://git.infradead.org/users/hch/configfs.git for-next
patch link:    https://lore.kernel.org/r/20230829091350.16156-2-dwagner%40suse.de
patch subject: [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early
config: i386-randconfig-014-20230830 (https://download.01.org/0day-ci/archive/20230830/202308300317.0x6Otev1-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230830/202308300317.0x6Otev1-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308300317.0x6Otev1-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: drivers/nvme/target/core.o: in function `trace_raw_output_nvmet_req_init':
>> drivers/nvme/target/./trace.h:63: undefined reference to `nvmet_trace_ctrl_id'
   ld: drivers/nvme/target/core.o: in function `trace_raw_output_nvmet_req_complete':
   drivers/nvme/target/./trace.h:105: undefined reference to `nvmet_trace_ctrl_id'


vim +63 drivers/nvme/target/./trace.h

a5448fdc469d67 Minwoo Im          2019-06-12   62  
a5448fdc469d67 Minwoo Im          2019-06-12  @63  TRACE_EVENT(nvmet_req_init,
a5448fdc469d67 Minwoo Im          2019-06-12   64  	TP_PROTO(struct nvmet_req *req, struct nvme_command *cmd),
a5448fdc469d67 Minwoo Im          2019-06-12   65  	TP_ARGS(req, cmd),
a5448fdc469d67 Minwoo Im          2019-06-12   66  	TP_STRUCT__entry(
a5448fdc469d67 Minwoo Im          2019-06-12   67  		__field(struct nvme_command *, cmd)
954c6dd2aaf3a8 Daniel Wagner      2023-08-29   68  		__field(u16, ctrl_id)
a5448fdc469d67 Minwoo Im          2019-06-12   69  		__array(char, disk, DISK_NAME_LEN)
a5448fdc469d67 Minwoo Im          2019-06-12   70  		__field(int, qid)
a5448fdc469d67 Minwoo Im          2019-06-12   71  		__field(u16, cid)
a5448fdc469d67 Minwoo Im          2019-06-12   72  		__field(u8, opcode)
a5448fdc469d67 Minwoo Im          2019-06-12   73  		__field(u8, fctype)
a5448fdc469d67 Minwoo Im          2019-06-12   74  		__field(u8, flags)
a5448fdc469d67 Minwoo Im          2019-06-12   75  		__field(u32, nsid)
a5448fdc469d67 Minwoo Im          2019-06-12   76  		__field(u64, metadata)
a5448fdc469d67 Minwoo Im          2019-06-12   77  		__array(u8, cdw10, 24)
a5448fdc469d67 Minwoo Im          2019-06-12   78  	),
a5448fdc469d67 Minwoo Im          2019-06-12   79  	TP_fast_assign(
a5448fdc469d67 Minwoo Im          2019-06-12   80  		__entry->cmd = cmd;
954c6dd2aaf3a8 Daniel Wagner      2023-08-29   81  		__entry->ctrl_id = nvmet_req_to_ctrl_id(req);
3c3751f2daf667 Chaitanya Kulkarni 2020-10-22   82  		__assign_req_name(__entry->disk, req);
a5448fdc469d67 Minwoo Im          2019-06-12   83  		__entry->qid = req->sq->qid;
a5448fdc469d67 Minwoo Im          2019-06-12   84  		__entry->cid = cmd->common.command_id;
a5448fdc469d67 Minwoo Im          2019-06-12   85  		__entry->opcode = cmd->common.opcode;
a5448fdc469d67 Minwoo Im          2019-06-12   86  		__entry->fctype = cmd->fabrics.fctype;
a5448fdc469d67 Minwoo Im          2019-06-12   87  		__entry->flags = cmd->common.flags;
a5448fdc469d67 Minwoo Im          2019-06-12   88  		__entry->nsid = le32_to_cpu(cmd->common.nsid);
a5448fdc469d67 Minwoo Im          2019-06-12   89  		__entry->metadata = le64_to_cpu(cmd->common.metadata);
a5448fdc469d67 Minwoo Im          2019-06-12   90  		memcpy(__entry->cdw10, &cmd->common.cdw10,
a5448fdc469d67 Minwoo Im          2019-06-12   91  			sizeof(__entry->cdw10));
a5448fdc469d67 Minwoo Im          2019-06-12   92  	),
a5448fdc469d67 Minwoo Im          2019-06-12   93  	TP_printk("nvmet%s: %sqid=%d, cmdid=%u, nsid=%u, flags=%#x, "
a5448fdc469d67 Minwoo Im          2019-06-12   94  		  "meta=%#llx, cmd=(%s, %s)",
954c6dd2aaf3a8 Daniel Wagner      2023-08-29   95  		__print_ctrl_id(__entry->ctrl_id),
a5448fdc469d67 Minwoo Im          2019-06-12   96  		__print_disk_name(__entry->disk),
a5448fdc469d67 Minwoo Im          2019-06-12   97  		__entry->qid, __entry->cid, __entry->nsid,
a5448fdc469d67 Minwoo Im          2019-06-12   98  		__entry->flags, __entry->metadata,
a5448fdc469d67 Minwoo Im          2019-06-12   99  		show_opcode_name(__entry->qid, __entry->opcode,
a5448fdc469d67 Minwoo Im          2019-06-12  100  				__entry->fctype),
a5448fdc469d67 Minwoo Im          2019-06-12  101  		parse_nvme_cmd(__entry->qid, __entry->opcode,
a5448fdc469d67 Minwoo Im          2019-06-12  102  				__entry->fctype, __entry->cdw10))
a5448fdc469d67 Minwoo Im          2019-06-12  103  );
a5448fdc469d67 Minwoo Im          2019-06-12  104  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early
  2023-08-29  9:13 ` [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early Daniel Wagner
  2023-08-29 11:21   ` kernel test robot
  2023-08-29 19:55   ` kernel test robot
@ 2023-08-29 20:37   ` kernel test robot
  2023-09-05  6:48   ` Christoph Hellwig
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: kernel test robot @ 2023-08-29 20:37 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: oe-kbuild-all

Hi Daniel,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on hch-configfs/for-next]
[also build test ERROR on linus/master v6.5 next-20230829]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Daniel-Wagner/nvmet-trace-avoid-dereferencing-pointer-too-early/20230829-171734
base:   git://git.infradead.org/users/hch/configfs.git for-next
patch link:    https://lore.kernel.org/r/20230829091350.16156-2-dwagner%40suse.de
patch subject: [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early
config: i386-randconfig-013-20230830 (https://download.01.org/0day-ci/archive/20230830/202308300456.n1DBoTie-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230830/202308300456.n1DBoTie-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308300456.n1DBoTie-lkp@intel.com/

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "nvmet_trace_ctrl_id" [drivers/nvme/target/nvmet.ko] undefined!

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early
  2023-08-29  9:13 ` [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early Daniel Wagner
                     ` (2 preceding siblings ...)
  2023-08-29 20:37   ` kernel test robot
@ 2023-09-05  6:48   ` Christoph Hellwig
  2023-09-05  8:24     ` Daniel Wagner
  2023-09-06 11:00   ` Hannes Reinecke
  2023-09-06 15:19   ` kernel test robot
  5 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2023-09-05  6:48 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, Hannes Reinecke, Sagi Grimberg,
	Jason Gunthorpe, James Smart, Chaitanya Kulkarni,
	Christoph Hellwig

> +static inline u16 nvmet_req_to_ctrl_id(struct nvmet_req *req)
>  {
> -	return req->sq->ctrl;
> +	if (!req->sq || !req->sq->ctrl)
> +		return 0;
> +	return req->sq->ctrl->cntlid;

Can you add a comment here why we have this check?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 2/4] nvmet-trace: null terminate device name string correctly
  2023-08-29  9:13 ` [RFC v1 2/4] nvmet-trace: null terminate device name string correctly Daniel Wagner
@ 2023-09-05  6:49   ` Christoph Hellwig
  2023-09-05 10:25     ` Daniel Wagner
  2023-09-06 11:01   ` Hannes Reinecke
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2023-09-05  6:49 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, Hannes Reinecke, Sagi Grimberg,
	Jason Gunthorpe, James Smart, Chaitanya Kulkarni,
	Christoph Hellwig

On Tue, Aug 29, 2023 at 11:13:47AM +0200, Daniel Wagner wrote:
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  drivers/nvme/target/trace.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/target/trace.h b/drivers/nvme/target/trace.h
> index 6997bd7e45cf..9ba466b49613 100644
> --- a/drivers/nvme/target/trace.h
> +++ b/drivers/nvme/target/trace.h
> @@ -55,8 +55,8 @@ static inline void __assign_req_name(char *name, struct nvmet_req *req)
>  		return;
>  	}
>  
> -	strncpy(name, req->ns->device_path,
> -		min_t(size_t, DISK_NAME_LEN, strlen(req->ns->device_path)));
> +	strscpy(name, req->ns->device_path,
> +		min_t(size_t, DISK_NAME_LEN, strlen(req->ns->device_path) + 1));

I'd just switch to snprintf instead, that the best way to ensure
termination..

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 4/4] nvmet-discovery: do not use invalid port
  2023-08-29  9:13 ` [RFC v1 4/4] nvmet-discovery: do not use invalid port Daniel Wagner
@ 2023-09-05  6:50   ` Christoph Hellwig
  2023-09-05 10:40     ` Daniel Wagner
  2023-09-06 11:23   ` Hannes Reinecke
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2023-09-05  6:50 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, linux-kernel, Hannes Reinecke, Sagi Grimberg,
	Jason Gunthorpe, James Smart, Chaitanya Kulkarni,
	Christoph Hellwig

On Tue, Aug 29, 2023 at 11:13:49AM +0200, Daniel Wagner wrote:
> The port entry binding might not be existing and thus the req->port
> pointer is not valid.
> 
> Reproducer: nvme/005 with active system nvmf-autoconnect systemd service.
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  drivers/nvme/target/discovery.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/nvme/target/discovery.c b/drivers/nvme/target/discovery.c
> index 668d257fa986..fc113057cb95 100644
> --- a/drivers/nvme/target/discovery.c
> +++ b/drivers/nvme/target/discovery.c
> @@ -191,6 +191,15 @@ static void nvmet_execute_disc_get_log_page(struct nvmet_req *req)
>  		goto out;
>  	}
>  
> +
> +	/* No port assigned, portentrybinding is missing */

Double new line above, and I think a missing white space before
binding.  But I'm still confused how we can get here without req->port
set.  Can you try to do a little more analysis as I suspect we have
a deeper problem somewhere.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early
  2023-09-05  6:48   ` Christoph Hellwig
@ 2023-09-05  8:24     ` Daniel Wagner
  2023-09-05  8:33       ` Christoph Hellwig
  0 siblings, 1 reply; 27+ messages in thread
From: Daniel Wagner @ 2023-09-05  8:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, linux-kernel, Hannes Reinecke, Sagi Grimberg,
	Jason Gunthorpe, James Smart, Chaitanya Kulkarni

On Tue, Sep 05, 2023 at 08:48:46AM +0200, Christoph Hellwig wrote:
> > +static inline u16 nvmet_req_to_ctrl_id(struct nvmet_req *req)
> >  {
> > -	return req->sq->ctrl;
> > +	if (!req->sq || !req->sq->ctrl)
> > +		return 0;
> > +	return req->sq->ctrl->cntlid;
> 
> Can you add a comment here why we have this check?

/*
 * The queue and controller pointer are not valid until an
 * association has been established.
 */

?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early
  2023-09-05  8:24     ` Daniel Wagner
@ 2023-09-05  8:33       ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2023-09-05  8:33 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Christoph Hellwig, linux-nvme, linux-kernel, Hannes Reinecke,
	Sagi Grimberg, Jason Gunthorpe, James Smart, Chaitanya Kulkarni

On Tue, Sep 05, 2023 at 10:24:51AM +0200, Daniel Wagner wrote:
> On Tue, Sep 05, 2023 at 08:48:46AM +0200, Christoph Hellwig wrote:
> > > +static inline u16 nvmet_req_to_ctrl_id(struct nvmet_req *req)
> > >  {
> > > -	return req->sq->ctrl;
> > > +	if (!req->sq || !req->sq->ctrl)
> > > +		return 0;
> > > +	return req->sq->ctrl->cntlid;
> > 
> > Can you add a comment here why we have this check?
> 
> /*
>  * The queue and controller pointer are not valid until an
>  * association has been established.
>  */

sounds good.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 2/4] nvmet-trace: null terminate device name string correctly
  2023-09-05  6:49   ` Christoph Hellwig
@ 2023-09-05 10:25     ` Daniel Wagner
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Wagner @ 2023-09-05 10:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, linux-kernel, Hannes Reinecke, Sagi Grimberg,
	Jason Gunthorpe, James Smart, Chaitanya Kulkarni

On Tue, Sep 05, 2023 at 08:49:21AM +0200, Christoph Hellwig wrote:
> On Tue, Aug 29, 2023 at 11:13:47AM +0200, Daniel Wagner wrote:
> > Signed-off-by: Daniel Wagner <dwagner@suse.de>
> > ---
> >  drivers/nvme/target/trace.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/nvme/target/trace.h b/drivers/nvme/target/trace.h
> > index 6997bd7e45cf..9ba466b49613 100644
> > --- a/drivers/nvme/target/trace.h
> > +++ b/drivers/nvme/target/trace.h
> > @@ -55,8 +55,8 @@ static inline void __assign_req_name(char *name, struct nvmet_req *req)
> >  		return;
> >  	}
> >  
> > -	strncpy(name, req->ns->device_path,
> > -		min_t(size_t, DISK_NAME_LEN, strlen(req->ns->device_path)));
> > +	strscpy(name, req->ns->device_path,
> > +		min_t(size_t, DISK_NAME_LEN, strlen(req->ns->device_path) + 1));
> 
> I'd just switch to snprintf instead, that the best way to ensure
> termination..

Okay, but we still need the +1 as the string is cut short; 'loop0' gets
logged as 'loop' at least with the strscpy variant.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 4/4] nvmet-discovery: do not use invalid port
  2023-09-05  6:50   ` Christoph Hellwig
@ 2023-09-05 10:40     ` Daniel Wagner
  2023-09-11 14:44       ` Daniel Wagner
  0 siblings, 1 reply; 27+ messages in thread
From: Daniel Wagner @ 2023-09-05 10:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, linux-kernel, Hannes Reinecke, Sagi Grimberg,
	Jason Gunthorpe, James Smart, Chaitanya Kulkarni

On Tue, Sep 05, 2023 at 08:50:32AM +0200, Christoph Hellwig wrote:
> > +	/* No port assigned, portentrybinding is missing */
> 
> Double new line above, and I think a missing white space before
> binding.

Yep, sorry.

> But I'm still confused how we can get here without req->port
> set.  Can you try to do a little more analysis as I suspect we have
> a deeper problem somewhere.

I am only able to reproduce this if there are two connect/discovery
attempts happening at the same time. I'll collect some logs and attempt
to make some sense out of it.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early
  2023-08-29  9:13 ` [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early Daniel Wagner
                     ` (3 preceding siblings ...)
  2023-09-05  6:48   ` Christoph Hellwig
@ 2023-09-06 11:00   ` Hannes Reinecke
  2023-09-06 15:19   ` kernel test robot
  5 siblings, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2023-09-06 11:00 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme
  Cc: linux-kernel, Sagi Grimberg, Jason Gunthorpe, James Smart,
	Chaitanya Kulkarni, Christoph Hellwig

On 8/29/23 11:13, Daniel Wagner wrote:
> The first command issued from the host to the target is the fabrics
> connect command. At this point, neither the target queue nor the
> controller have been allocated. But we already try to trace this command
> in nvmet_req_init.
> 
> Reported by KASAN.
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> 
> ---
>   drivers/nvme/target/trace.c |  6 +++---
>   drivers/nvme/target/trace.h | 24 +++++++++++++-----------
>   2 files changed, 16 insertions(+), 14 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 2/4] nvmet-trace: null terminate device name string correctly
  2023-08-29  9:13 ` [RFC v1 2/4] nvmet-trace: null terminate device name string correctly Daniel Wagner
  2023-09-05  6:49   ` Christoph Hellwig
@ 2023-09-06 11:01   ` Hannes Reinecke
  1 sibling, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2023-09-06 11:01 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme
  Cc: linux-kernel, Sagi Grimberg, Jason Gunthorpe, James Smart,
	Chaitanya Kulkarni, Christoph Hellwig

On 8/29/23 11:13, Daniel Wagner wrote:
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>   drivers/nvme/target/trace.h | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/target/trace.h b/drivers/nvme/target/trace.h
> index 6997bd7e45cf..9ba466b49613 100644
> --- a/drivers/nvme/target/trace.h
> +++ b/drivers/nvme/target/trace.h
> @@ -55,8 +55,8 @@ static inline void __assign_req_name(char *name, struct nvmet_req *req)
>   		return;
>   	}
>   
> -	strncpy(name, req->ns->device_path,
> -		min_t(size_t, DISK_NAME_LEN, strlen(req->ns->device_path)));
> +	strscpy(name, req->ns->device_path,
> +		min_t(size_t, DISK_NAME_LEN, strlen(req->ns->device_path) + 1));
>   }
>   #endif
>   
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 3/4] nvmet-fc: untangle cross refcounting objects
  2023-08-29  9:13 ` [RFC v1 3/4] nvmet-fc: untangle cross refcounting objects Daniel Wagner
@ 2023-09-06 11:22   ` Hannes Reinecke
  2023-09-11 10:08     ` Daniel Wagner
  0 siblings, 1 reply; 27+ messages in thread
From: Hannes Reinecke @ 2023-09-06 11:22 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme
  Cc: linux-kernel, Sagi Grimberg, Jason Gunthorpe, James Smart,
	Chaitanya Kulkarni, Christoph Hellwig

On 8/29/23 11:13, Daniel Wagner wrote:
> Associations take a refcount on queues, queues take a refcount on
> associations.
> 
> The existing code lead to the situation that the target executes a
> disconnect and the host retriggers a reconnect immediately. The
> reconnect command still finds an existing association and uses this.
> Though the reconnect crashes later on because
> nvmet_fc_delete_target_assoc() blindly goes ahead and removes resources
> while the reconnect code wants to use it. The problem is that
> nvmet_fc_find_target_assoc() is able to lookup an association which is
> beeing removed.
> 
> So the first thing to address nvmet_fc_find_target_queue() is to remove
> the association out of the list and wait a RCU cycle and free resources
> in the free function callback of the kref_put().
> 
> The live time of the queues are strictly bound to the lifetime of an
> association. Thus we don't need to take reverse refcounts (queue ->
> assocation).
> 
> Furthermore, streamline the cleanup code by using the workqueue for
> delete the association in nvmet_fc_ls_disconnect. This ensures, that we
> run throught the same shutdown path in all non error cases.
> 
> Reproducer: nvme/003
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>   drivers/nvme/target/fc.c | 67 ++++++++++++++++++++--------------------
>   1 file changed, 33 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
> index df7d84aff843..9d7262a8e3db 100644
> --- a/drivers/nvme/target/fc.c
> +++ b/drivers/nvme/target/fc.c
> @@ -165,6 +165,7 @@ struct nvmet_fc_tgt_assoc {
>   	struct nvmet_fc_hostport	*hostport;
>   	struct nvmet_fc_ls_iod		*rcv_disconn;
>   	struct list_head		a_list;
> +	struct nvmet_fc_tgt_queue	*_queues[NVMET_NR_QUEUES + 1];
>   	struct nvmet_fc_tgt_queue __rcu	*queues[NVMET_NR_QUEUES + 1];
>   	struct kref			ref;
>   	struct work_struct		del_work;
> @@ -802,14 +803,11 @@ nvmet_fc_alloc_target_queue(struct nvmet_fc_tgt_assoc *assoc,
>   	if (!queue)
>   		return NULL;
>   
> -	if (!nvmet_fc_tgt_a_get(assoc))
> -		goto out_free_queue;
> -
>   	queue->work_q = alloc_workqueue("ntfc%d.%d.%d", 0, 0,
>   				assoc->tgtport->fc_target_port.port_num,
>   				assoc->a_id, qid);
>   	if (!queue->work_q)
> -		goto out_a_put;
> +		goto out_free_queue;
>   
>   	queue->qid = qid;
>   	queue->sqsize = sqsize;
> @@ -830,7 +828,8 @@ nvmet_fc_alloc_target_queue(struct nvmet_fc_tgt_assoc *assoc,
>   	if (ret)
>   		goto out_fail_iodlist;
>   
> -	WARN_ON(assoc->queues[qid]);
> +	WARN_ON(assoc->_queues[qid]);
> +	assoc->_queues[qid] = queue;
>   	rcu_assign_pointer(assoc->queues[qid], queue);
>   
>   	return queue;
> @@ -838,8 +837,6 @@ nvmet_fc_alloc_target_queue(struct nvmet_fc_tgt_assoc *assoc,
>   out_fail_iodlist:
>   	nvmet_fc_destroy_fcp_iodlist(assoc->tgtport, queue);
>   	destroy_workqueue(queue->work_q);
> -out_a_put:
> -	nvmet_fc_tgt_a_put(assoc);
>   out_free_queue:
>   	kfree(queue);
>   	return NULL;
> @@ -852,12 +849,8 @@ nvmet_fc_tgt_queue_free(struct kref *ref)
>   	struct nvmet_fc_tgt_queue *queue =
>   		container_of(ref, struct nvmet_fc_tgt_queue, ref);
>   
> -	rcu_assign_pointer(queue->assoc->queues[queue->qid], NULL);
> -
>   	nvmet_fc_destroy_fcp_iodlist(queue->assoc->tgtport, queue);
>   
> -	nvmet_fc_tgt_a_put(queue->assoc);
> -
>   	destroy_workqueue(queue->work_q);
>   
>   	kfree_rcu(queue, rcu);
> @@ -1100,6 +1093,11 @@ nvmet_fc_delete_assoc(struct work_struct *work)
>   		container_of(work, struct nvmet_fc_tgt_assoc, del_work);
>   
>   	nvmet_fc_delete_target_assoc(assoc);
> +
> +	/* release get taken in nvmet_fc_find_target_assoc */
> +	nvmet_fc_tgt_a_put(assoc);
> +
> +	/* final reference from nvmet_fc_ls_create_association */
>   	nvmet_fc_tgt_a_put(assoc);
>   }
>   
That feels wrong. If we're having to do two put in a row it seems that
we're taking one reference too many here.
I would assume that each queue takes a reference on the association; 
isn't that the case here?

> @@ -1172,13 +1170,18 @@ nvmet_fc_target_assoc_free(struct kref *ref)
>   	struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
>   	struct nvmet_fc_ls_iod	*oldls;
>   	unsigned long flags;
> +	int i;
> +
> +	for (i = NVMET_NR_QUEUES; i >= 0; i--) {
> +		if (assoc->_queues[i])
> +			nvmet_fc_delete_target_queue(assoc->_queues[i]);
> +	}
>   
>   	/* Send Disconnect now that all i/o has completed */
>   	nvmet_fc_xmt_disconnect_assoc(assoc);
>   
>   	nvmet_fc_free_hostport(assoc->hostport);
>   	spin_lock_irqsave(&tgtport->lock, flags);
> -	list_del_rcu(&assoc->a_list);
>   	oldls = assoc->rcv_disconn;
>   	spin_unlock_irqrestore(&tgtport->lock, flags);
>   	/* if pending Rcv Disconnect Association LS, send rsp now */
> @@ -1208,7 +1211,6 @@ static void
>   nvmet_fc_delete_target_assoc(struct nvmet_fc_tgt_assoc *assoc)
>   {
>   	struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
> -	struct nvmet_fc_tgt_queue *queue;
>   	int i, terminating;
>   
>   	terminating = atomic_xchg(&assoc->terminating, 1);
> @@ -1217,29 +1219,21 @@ nvmet_fc_delete_target_assoc(struct nvmet_fc_tgt_assoc *assoc)
>   	if (terminating)
>   		return;
>   
> +	/* prevent new I/Os entering the queues */
> +	for (i = NVMET_NR_QUEUES; i >= 0; i--)
> +		rcu_assign_pointer(assoc->queues[i], NULL);
> +	list_del_rcu(&assoc->a_list);
> +	synchronize_rcu();
>   
Watch out for 'list_del_rcu()'. That does _not_ modify the pointer for 
the element in question, only those from the list.
So to avoid concurrency with nvmet_fc_alloc_target_assoc() I guess we 
need to get the tgtport lock here.

> +	/* ensure all in-flight I/Os have been processed */
>   	for (i = NVMET_NR_QUEUES; i >= 0; i--) {
> -		rcu_read_lock();
> -		queue = rcu_dereference(assoc->queues[i]);
> -		if (!queue) {
> -			rcu_read_unlock();
> -			continue;
> -		}
> -
> -		if (!nvmet_fc_tgt_q_get(queue)) {
> -			rcu_read_unlock();
> -			continue;
> -		}
> -		rcu_read_unlock();
> -		nvmet_fc_delete_target_queue(queue);
> -		nvmet_fc_tgt_q_put(queue);
> +		if (assoc->_queues[i])
> +			flush_workqueue(assoc->_queues[i]->work_q);
>   	}
>   
>   	dev_info(tgtport->dev,
>   		"{%d:%d} Association deleted\n",
>   		tgtport->fc_target_port.port_num, assoc->a_id);
> -
> -	nvmet_fc_tgt_a_put(assoc);
>   }
>   
>   static struct nvmet_fc_tgt_assoc *
> @@ -1497,6 +1491,8 @@ __nvmet_fc_free_assocs(struct nvmet_fc_tgtport *tgtport)
>   			nvmet_fc_tgt_a_put(assoc);
>   	}
>   	rcu_read_unlock();
> +
> +	flush_workqueue(nvmet_wq);
>   }
>   
>   /**
> @@ -1870,9 +1866,6 @@ nvmet_fc_ls_disconnect(struct nvmet_fc_tgtport *tgtport,
>   				sizeof(struct fcnvme_ls_disconnect_assoc_acc)),
>   			FCNVME_LS_DISCONNECT_ASSOC);
>   
> -	/* release get taken in nvmet_fc_find_target_assoc */
> -	nvmet_fc_tgt_a_put(assoc);
> -
>   	/*
>   	 * The rules for LS response says the response cannot
>   	 * go back until ABTS's have been sent for all outstanding
> @@ -1887,8 +1880,6 @@ nvmet_fc_ls_disconnect(struct nvmet_fc_tgtport *tgtport,
>   	assoc->rcv_disconn = iod;
>   	spin_unlock_irqrestore(&tgtport->lock, flags);
>   
> -	nvmet_fc_delete_target_assoc(assoc);
> -
>   	if (oldls) {
>   		dev_info(tgtport->dev,
>   			"{%d:%d} Multiple Disconnect Association LS's "
> @@ -1904,6 +1895,11 @@ nvmet_fc_ls_disconnect(struct nvmet_fc_tgtport *tgtport,
>   		nvmet_fc_xmt_ls_rsp(tgtport, oldls);
>   	}
>   
> +	if (!queue_work(nvmet_wq, &assoc->del_work)) {
> +		/* already deleting - release local reference */
> +		nvmet_fc_tgt_a_put(assoc);
> +	}
> +
>   	return false;
>   }
>   
> @@ -2933,6 +2929,9 @@ static int __init nvmet_fc_init_module(void)
>   
>   static void __exit nvmet_fc_exit_module(void)
>   {
> +	/* ensure any shutdown operation, e.g. delete ctrls have finished */
> +	flush_workqueue(nvmet_wq);
> +
>   	/* sanity check - all lports should be removed */
>   	if (!list_empty(&nvmet_fc_target_list))
>   		pr_warn("%s: targetport list not empty\n", __func__);

Otherwise looks good.

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 4/4] nvmet-discovery: do not use invalid port
  2023-08-29  9:13 ` [RFC v1 4/4] nvmet-discovery: do not use invalid port Daniel Wagner
  2023-09-05  6:50   ` Christoph Hellwig
@ 2023-09-06 11:23   ` Hannes Reinecke
  1 sibling, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2023-09-06 11:23 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme
  Cc: linux-kernel, Sagi Grimberg, Jason Gunthorpe, James Smart,
	Chaitanya Kulkarni, Christoph Hellwig

On 8/29/23 11:13, Daniel Wagner wrote:
> The port entry binding might not be existing and thus the req->port
> pointer is not valid.
> 
> Reproducer: nvme/005 with active system nvmf-autoconnect systemd service.
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>   drivers/nvme/target/discovery.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/nvme/target/discovery.c b/drivers/nvme/target/discovery.c
> index 668d257fa986..fc113057cb95 100644
> --- a/drivers/nvme/target/discovery.c
> +++ b/drivers/nvme/target/discovery.c
> @@ -191,6 +191,15 @@ static void nvmet_execute_disc_get_log_page(struct nvmet_req *req)
>   		goto out;
>   	}
>   
> +
> +	/* No port assigned, portentrybinding is missing */

Missing space.

> +	if (!req->port) {
> +		req->error_loc =
> +			offsetof(struct nvme_get_log_page_command, lpo);
> +		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
> +		goto out;
> +	}
> +
>   	/*
>   	 * Make sure we're passing at least a buffer of response header size.
>   	 * If host provided data len is less than the header size, only the
Otherwise looks good.

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early
  2023-08-29  9:13 ` [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early Daniel Wagner
                     ` (4 preceding siblings ...)
  2023-09-06 11:00   ` Hannes Reinecke
@ 2023-09-06 15:19   ` kernel test robot
  5 siblings, 0 replies; 27+ messages in thread
From: kernel test robot @ 2023-09-06 15:19 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: oe-kbuild-all

Hi Daniel,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on hch-configfs/for-next]
[also build test WARNING on linus/master v6.5 next-20230906]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Daniel-Wagner/nvmet-trace-avoid-dereferencing-pointer-too-early/20230829-171734
base:   git://git.infradead.org/users/hch/configfs.git for-next
patch link:    https://lore.kernel.org/r/20230829091350.16156-2-dwagner%40suse.de
patch subject: [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early
config: x86_64-randconfig-122-20230906 (https://download.01.org/0day-ci/archive/20230906/202309062255.pYH5WG3z-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230906/202309062255.pYH5WG3z-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309062255.pYH5WG3z-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/nvme/target/trace.c:214:12: sparse: sparse: symbol 'nvmet_trace_ctrl_name' was not declared. Should it be static?

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 3/4] nvmet-fc: untangle cross refcounting objects
  2023-09-06 11:22   ` Hannes Reinecke
@ 2023-09-11 10:08     ` Daniel Wagner
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Wagner @ 2023-09-11 10:08 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-nvme, linux-kernel, Sagi Grimberg, Jason Gunthorpe,
	James Smart, Chaitanya Kulkarni, Christoph Hellwig

On Wed, Sep 06, 2023 at 01:22:28PM +0200, Hannes Reinecke wrote:
 >   	destroy_workqueue(queue->work_q);
> >   	kfree_rcu(queue, rcu);
> > @@ -1100,6 +1093,11 @@ nvmet_fc_delete_assoc(struct work_struct *work)
> >   		container_of(work, struct nvmet_fc_tgt_assoc, del_work);
> >   	nvmet_fc_delete_target_assoc(assoc);
> > +
> > +	/* release get taken in nvmet_fc_find_target_assoc */
> > +	nvmet_fc_tgt_a_put(assoc);
> > +
> > +	/* final reference from nvmet_fc_ls_create_association */
> >   	nvmet_fc_tgt_a_put(assoc);
> >   }
> That feels wrong. If we're having to do two put in a row it seems that
> we're taking one reference too many here.

When the association is created the first reference is taken. This is
the one we want to release here. But as nvmet_fc_find_target_assoc
always takes a reference we have to drop that one too. One possibility
would be to introduce a lookup function which doesn't take the
reference.

 > +	/* prevent new I/Os entering the queues */
> > +	for (i = NVMET_NR_QUEUES; i >= 0; i--)
> > +		rcu_assign_pointer(assoc->queues[i], NULL);
> > +	list_del_rcu(&assoc->a_list);
> > +	synchronize_rcu();
> Watch out for 'list_del_rcu()'. That does _not_ modify the pointer for the
> element in question, only those from the list.
> So to avoid concurrency with nvmet_fc_alloc_target_assoc() I guess we need
> to get the tgtport lock here.

Yes, we need to protect from concurent write access obviously.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 4/4] nvmet-discovery: do not use invalid port
  2023-09-05 10:40     ` Daniel Wagner
@ 2023-09-11 14:44       ` Daniel Wagner
  2023-09-11 18:19         ` Daniel Wagner
  0 siblings, 1 reply; 27+ messages in thread
From: Daniel Wagner @ 2023-09-11 14:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, linux-kernel, Hannes Reinecke, Sagi Grimberg,
	Jason Gunthorpe, James Smart, Chaitanya Kulkarni

On Tue, Sep 05, 2023 at 12:40:25PM +0200, Daniel Wagner wrote:
> > But I'm still confused how we can get here without req->port
> > set.  Can you try to do a little more analysis as I suspect we have
> > a deeper problem somewhere.

The problem is that nvme/005 starts to cleanup all resources and there
is a race between the cleanup path and the host trying to figure out
what's going on (get log page).

We have 3 association:

 assoc 0: systemd/udev triggered 'nvme connect-all' discovery controller
 assoc 1: discovery controller from nvme/005
 assoc 2: i/o controller from nvme/005

nvme/005 issues a reset_controller but doesn't waiting or checking the
result. Instead we go directly into the resource cleanup part, nvme
disconnect which removes assoc 1 and assoc. Then the target cleanup
part starts. At this point, assoc 0 is still around.

 nvme nvme3: Removing ctrl: NQN "blktests-subsystem-1"
 block nvme3n1: no available path - failing I/O
 block nvme3n1: no available path - failing I/O
 Buffer I/O error on dev nvme3n1, logical block 89584, async page read
 (NULL device *): {0:2} Association deleted
 nvmet_fc: nvmet_fc_portentry_unbind: tgtport 000000004f5c9138 pe 00000000e2a2da84
 [321] nvmet: ctrl 2 stop keep-alive
 (NULL device *): {0:1} Association freed
 (NULL device *): Disconnect LS failed: No Association
 general protection fault, probably for non-canonical address 0xdffffc00000000a4: 0000 [#1] PREEMPT SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000520-0x0000000000000527]
 CPU: 1 PID: 250 Comm: kworker/1:4 Tainted: G        W          6.5.0-rc2+ #20 e82c2becb08b573f1fa41dfeddc70ac8f6838a63
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
 Workqueue: nvmet-wq fcloop_fcp_recv_work [nvme_fcloop]
 RIP: 0010:nvmet_execute_disc_get_log_page+0x23f/0x8c0 [nvmet]

The target cleanup removes the port from the subsystem
(nvmet_fc_portentry_unbind) and does not check if there is still a
association around. Right after we have removed assoc 1 and 2 the host
sends a get log page command on assoc 0. Though we have remove the port
binding and thus the pointer when nvmet_execute_disc_get_log_page gets
executed.

I am still pondering how to fix this.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 4/4] nvmet-discovery: do not use invalid port
  2023-09-11 14:44       ` Daniel Wagner
@ 2023-09-11 18:19         ` Daniel Wagner
  2023-09-12  6:38           ` Daniel Wagner
  0 siblings, 1 reply; 27+ messages in thread
From: Daniel Wagner @ 2023-09-11 18:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, linux-kernel, Hannes Reinecke, Sagi Grimberg,
	Jason Gunthorpe, James Smart, Chaitanya Kulkarni

On Mon, Sep 11, 2023 at 04:44:47PM +0200, Daniel Wagner wrote:
> On Tue, Sep 05, 2023 at 12:40:25PM +0200, Daniel Wagner wrote:
> > > But I'm still confused how we can get here without req->port
> > > set.  Can you try to do a little more analysis as I suspect we have
> > > a deeper problem somewhere.
> 
> The problem is that nvme/005 starts to cleanup all resources and there
> is a race between the cleanup path and the host trying to figure out
> what's going on (get log page).
> 
> We have 3 association:
> 
>  assoc 0: systemd/udev triggered 'nvme connect-all' discovery controller
>  assoc 1: discovery controller from nvme/005
>  assoc 2: i/o controller from nvme/005

Actually, assoc 1 is also a i/o controller for the same hostnqn as assoc
2. This looks more like it assoc 0 and 1 are both from systemd/udev
trigger. But why the heck is the hostnqn for assoc 1 the same as the
hostnqn we use in blktests? Something is really off here.

The rest of my analysis is still valid.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 4/4] nvmet-discovery: do not use invalid port
  2023-09-11 18:19         ` Daniel Wagner
@ 2023-09-12  6:38           ` Daniel Wagner
  2023-09-13 11:35             ` Christoph Hellwig
  0 siblings, 1 reply; 27+ messages in thread
From: Daniel Wagner @ 2023-09-12  6:38 UTC (permalink / raw)
  To: Christoph Hellwig, g
  Cc: linux-nvme, linux-kernel, Hannes Reinecke, Sagi Grimberg,
	Jason Gunthorpe, James Smart, Chaitanya Kulkarni

 > We have 3 association:
> > 
> >  assoc 0: systemd/udev triggered 'nvme connect-all' discovery controller
> >  assoc 1: discovery controller from nvme/005
> >  assoc 2: i/o controller from nvme/005
> 
> Actually, assoc 1 is also a i/o controller for the same hostnqn as assoc
> 2. This looks more like it assoc 0 and 1 are both from systemd/udev
> trigger. But why the heck is the hostnqn for assoc 1 the same as the
> hostnqn we use in blktests? Something is really off here.
> 
> The rest of my analysis is still valid.

I figured it out. I still used an older version nvme-cli which didn't
apply the hostnqn correctly. We fixed this in the meantime. With the
latest git version of nvme-cli the second I/O controller is not setup
anymore and the crash is gone. Though we still don't cleanup all
resources as the kernel module complains with

[41707.083039] nvmet_fc: nvmet_fc_exit_module: targetport list not empty

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 4/4] nvmet-discovery: do not use invalid port
  2023-09-12  6:38           ` Daniel Wagner
@ 2023-09-13 11:35             ` Christoph Hellwig
  2023-09-13 11:59               ` Daniel Wagner
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2023-09-13 11:35 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Christoph Hellwig, g, linux-nvme, linux-kernel, Hannes Reinecke,
	Sagi Grimberg, Jason Gunthorpe, James Smart, Chaitanya Kulkarni

So that's interesting.  But what I'm mostly worried about is how the
nvmet kernel code allows a request without ->port to progress to the
actual command handler.  We should never allow a command to get that
far if ->port is NULL, and should not allow to clear ->port while
commands are still handled.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC v1 4/4] nvmet-discovery: do not use invalid port
  2023-09-13 11:35             ` Christoph Hellwig
@ 2023-09-13 11:59               ` Daniel Wagner
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Wagner @ 2023-09-13 11:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, linux-kernel, Hannes Reinecke, Sagi Grimberg,
	Jason Gunthorpe, James Smart, Chaitanya Kulkarni

On Wed, Sep 13, 2023 at 01:35:19PM +0200, Christoph Hellwig wrote:
> So that's interesting.  But what I'm mostly worried about is how the
> nvmet kernel code allows a request without ->port to progress to the
> actual command handler.

nvmet_fc_handle_fcp_rqst()

	if (tgtport->pe)
		fod->req.port = tgtport->pe->port;

Not sure why this is there. Will test what happens when we just return
an error when we don't have pe set.

> We should never allow a command to get that
> far if ->port is NULL, and should not allow to clear ->port while
> commands are still handled.

Okay, makes sense. I'll test this when I have access to my rig again tomorrow.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-09-13 11:58 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-29  9:13 [RFC v1 0/4] nvmet-fc blktests & autoconnect fixes Daniel Wagner
2023-08-29  9:13 ` [RFC v1 1/4] nvmet-trace: avoid dereferencing pointer too early Daniel Wagner
2023-08-29 11:21   ` kernel test robot
2023-08-29 19:55   ` kernel test robot
2023-08-29 20:37   ` kernel test robot
2023-09-05  6:48   ` Christoph Hellwig
2023-09-05  8:24     ` Daniel Wagner
2023-09-05  8:33       ` Christoph Hellwig
2023-09-06 11:00   ` Hannes Reinecke
2023-09-06 15:19   ` kernel test robot
2023-08-29  9:13 ` [RFC v1 2/4] nvmet-trace: null terminate device name string correctly Daniel Wagner
2023-09-05  6:49   ` Christoph Hellwig
2023-09-05 10:25     ` Daniel Wagner
2023-09-06 11:01   ` Hannes Reinecke
2023-08-29  9:13 ` [RFC v1 3/4] nvmet-fc: untangle cross refcounting objects Daniel Wagner
2023-09-06 11:22   ` Hannes Reinecke
2023-09-11 10:08     ` Daniel Wagner
2023-08-29  9:13 ` [RFC v1 4/4] nvmet-discovery: do not use invalid port Daniel Wagner
2023-09-05  6:50   ` Christoph Hellwig
2023-09-05 10:40     ` Daniel Wagner
2023-09-11 14:44       ` Daniel Wagner
2023-09-11 18:19         ` Daniel Wagner
2023-09-12  6:38           ` Daniel Wagner
2023-09-13 11:35             ` Christoph Hellwig
2023-09-13 11:59               ` Daniel Wagner
2023-09-06 11:23   ` Hannes Reinecke
2023-08-29  9:13 ` [RFC v1 4/4] nvmet-discovery: Do " Daniel Wagner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.