* [PATCH v2 0/4] nvme: protect against possible request reference after completion
@ 2021-05-19 17:43 Sagi Grimberg
  2021-05-19 17:43 ` [PATCH v2 1/4] params: lift param_set_uint_minmax to common code Sagi Grimberg
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Sagi Grimberg @ 2021-05-19 17:43 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Daniel Wagner

Nothing in nvme protects against referencing a request after it was completed.
For example, if a buggy controller sends a completion twice for the same
request, the host can access and modify a request that has already completed.

At best, this will cause a panic, but in the worst case it can cause silent
data corruption if the request was already reused and re-executed by the time
we reference it.

The nvme command_id is an opaque value in which, thus far, we have simply
placed the request tag. To protect against an access after completion, we
introduce a generation counter in the upper 4 bits of the command_id that is
incremented on every invocation and validated upon the reception of a
completion. This limits the effective maximum queue depth to 4095, but we
hardly ever use such long queues (in fabrics the maximum is already 1024).
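
To make the encoding concrete (a worked example mirroring the masks in
patch 4/4): a request with blk-mq tag 42 whose generation counter is at 3
is sent with command_id = (3 << 12) | 42 = 0x302a. If the controller later
replays a completion for that tag after it has been recycled, the
request's counter has moved on, the upper nibble no longer matches, and
the completion is rejected instead of touching a reused request.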

Changes from v1:
- lift param_set_uint_minmax and reuse it
- simplify initialization in patch 3/4

Sagi Grimberg (4):
  params: lift param_set_uint_minmax to common code
  nvme-pci: limit maximum queue depth to 4095
  nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data
  nvme: code command_id with a genctr for use-after-free validation

 drivers/nvme/host/core.c    |  3 ++-
 drivers/nvme/host/nvme.h    | 47 ++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/pci.c     | 17 ++++++--------
 drivers/nvme/host/rdma.c    |  4 ++--
 drivers/nvme/host/tcp.c     | 38 ++++++++++++------------------
 drivers/nvme/target/loop.c  |  4 ++--
 include/linux/moduleparam.h |  3 +++
 kernel/params.c             | 19 +++++++++++++++
 net/sunrpc/xprtsock.c       | 18 --------------
 9 files changed, 96 insertions(+), 57 deletions(-)

-- 
2.27.0



* [PATCH v2 1/4] params: lift param_set_uint_minmax to common code
  2021-05-19 17:43 [PATCH v2 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
@ 2021-05-19 17:43 ` Sagi Grimberg
  2021-05-19 21:10   ` Chaitanya Kulkarni
  2021-05-20  6:01   ` Christoph Hellwig
  2021-05-19 17:43 ` [PATCH v2 2/4] nvme-pci: limit maximum queue depth to 4095 Sagi Grimberg
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 25+ messages in thread
From: Sagi Grimberg @ 2021-05-19 17:43 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Daniel Wagner

It is a useful helper, hence move it to common code so others can enjoy
it.
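
For illustration only, a sketch of how a hypothetical user could wire it
up (the parameter and function names below are invented for this
example):

#include <linux/moduleparam.h>

static unsigned int nr_queues = 8;

static int param_set_nr_queues(const char *val, const struct kernel_param *kp)
{
	/* reject values outside of [1, 64] with -EINVAL */
	return param_set_uint_minmax(val, kp, 1, 64);
}

static const struct kernel_param_ops nr_queues_ops = {
	.set = param_set_nr_queues,
	.get = param_get_uint,
};

module_param_cb(nr_queues, &nr_queues_ops, &nr_queues, 0644);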

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 include/linux/moduleparam.h |  3 +++
 kernel/params.c             | 19 +++++++++++++++++++
 net/sunrpc/xprtsock.c       | 18 ------------------
 3 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index eed280fae433..47bd89cd2580 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -432,6 +432,9 @@ extern const struct kernel_param_ops param_ops_uint;
 extern int param_set_uint(const char *val, const struct kernel_param *kp);
 extern int param_get_uint(char *buffer, const struct kernel_param *kp);
 #define param_check_uint(name, p) __param_check(name, p, unsigned int)
+int param_set_uint_minmax(const char *val,
+		const struct kernel_param *kp,
+		unsigned int min, unsigned int max);
 
 extern const struct kernel_param_ops param_ops_long;
 extern int param_set_long(const char *val, const struct kernel_param *kp);
diff --git a/kernel/params.c b/kernel/params.c
index 2daa2780a92c..db989a6aa281 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -243,6 +243,25 @@ STANDARD_PARAM_DEF(ulong,	unsigned long,		"%lu",		kstrtoul);
 STANDARD_PARAM_DEF(ullong,	unsigned long long,	"%llu",		kstrtoull);
 STANDARD_PARAM_DEF(hexint,	unsigned int,		"%#08x", 	kstrtouint);
 
+int param_set_uint_minmax(const char *val,
+		const struct kernel_param *kp,
+		unsigned int min, unsigned int max)
+{
+	unsigned int num;
+	int ret;
+
+	if (!val)
+		return -EINVAL;
+	ret = kstrtouint(val, 0, &num);
+	if (ret)
+		return ret;
+	if (num < min || num > max)
+		return -EINVAL;
+	*((unsigned int *)kp->arg) = num;
+	return 0;
+}
+EXPORT_SYMBOL(param_set_uint_minmax);
+
 int param_set_charp(const char *val, const struct kernel_param *kp)
 {
 	if (strlen(val) > 1024) {
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index e35760f238a4..737c0d983813 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -3123,24 +3123,6 @@ void cleanup_socket_xprt(void)
 	xprt_unregister_transport(&xs_bc_tcp_transport);
 }
 
-static int param_set_uint_minmax(const char *val,
-		const struct kernel_param *kp,
-		unsigned int min, unsigned int max)
-{
-	unsigned int num;
-	int ret;
-
-	if (!val)
-		return -EINVAL;
-	ret = kstrtouint(val, 0, &num);
-	if (ret)
-		return ret;
-	if (num < min || num > max)
-		return -EINVAL;
-	*((unsigned int *)kp->arg) = num;
-	return 0;
-}
-
 static int param_set_portnr(const char *val, const struct kernel_param *kp)
 {
 	return param_set_uint_minmax(val, kp,
-- 
2.27.0



* [PATCH v2 2/4] nvme-pci: limit maximum queue depth to 4095
  2021-05-19 17:43 [PATCH v2 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
  2021-05-19 17:43 ` [PATCH v2 1/4] params: lift param_set_uint_minmax to common code Sagi Grimberg
@ 2021-05-19 17:43 ` Sagi Grimberg
  2021-05-19 21:12   ` Chaitanya Kulkarni
  2021-05-26  8:55   ` Hannes Reinecke
  2021-05-19 17:43 ` [PATCH v2 3/4] nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data Sagi Grimberg
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 25+ messages in thread
From: Sagi Grimberg @ 2021-05-19 17:43 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Daniel Wagner

We are going to use the upper 4 bits of the command_id for a generation
counter, so enforce the new queue depth upper limit. As we enforce
both min and max queue depth, use param_set_uint_minmax instead of
open coding it.
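
As a usage note (expected behavior, not a captured session): with the
range enforced in the setter, writing e.g. 4096 or 1 to
/sys/module/nvme/parameters/io_queue_depth now fails with -EINVAL at set
time, while values in [2, 4095] are accepted.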

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/pci.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a29b170701fc..ac26d242b630 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -60,6 +60,8 @@ MODULE_PARM_DESC(sgl_threshold,
 		"Use SGLs when average request segment size is larger or equal to "
 		"this size. Use 0 to disable SGLs.");
 
+#define NVME_PCI_MIN_QUEUE_SIZE 2
+#define NVME_PCI_MAX_QUEUE_SIZE 4095
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp);
 static const struct kernel_param_ops io_queue_depth_ops = {
 	.set = io_queue_depth_set,
@@ -68,7 +70,7 @@ static const struct kernel_param_ops io_queue_depth_ops = {
 
 static unsigned int io_queue_depth = 1024;
 module_param_cb(io_queue_depth, &io_queue_depth_ops, &io_queue_depth, 0644);
-MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
+MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2 and < 4096");
 
 static int io_queue_count_set(const char *val, const struct kernel_param *kp)
 {
@@ -157,14 +159,9 @@ struct nvme_dev {
 
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
 {
-	int ret;
-	u32 n;
-
-	ret = kstrtou32(val, 10, &n);
-	if (ret != 0 || n < 2)
-		return -EINVAL;
-
-	return param_set_uint(val, kp);
+	return param_set_uint_minmax(val, kp,
+			NVME_PCI_MIN_QUEUE_SIZE,
+			NVME_PCI_MAX_QUEUE_SIZE);
 }
 
 static inline unsigned int sq_idx(unsigned int qid, u32 stride)
-- 
2.27.0



* [PATCH v2 3/4] nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data
  2021-05-19 17:43 [PATCH v2 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
  2021-05-19 17:43 ` [PATCH v2 1/4] params: lift param_set_uint_minmax to common code Sagi Grimberg
  2021-05-19 17:43 ` [PATCH v2 2/4] nvme-pci: limit maximum queue depth to 4095 Sagi Grimberg
@ 2021-05-19 17:43 ` Sagi Grimberg
  2021-05-26  8:56   ` Hannes Reinecke
  2021-05-19 17:43 ` [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation Sagi Grimberg
  2021-06-16 16:28 ` [PATCH v2 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
  4 siblings, 1 reply; 25+ messages in thread
From: Sagi Grimberg @ 2021-05-19 17:43 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Daniel Wagner

We already validate the command_id when receiving the c2hdata pdu header,
and it does not change afterwards, so this is a redundant check.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/tcp.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 6bd5b281c818..e189715ceca7 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -695,17 +695,9 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
 			      unsigned int *offset, size_t *len)
 {
 	struct nvme_tcp_data_pdu *pdu = (void *)queue->pdu;
-	struct nvme_tcp_request *req;
-	struct request *rq;
-
-	rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
-	if (!rq) {
-		dev_err(queue->ctrl->ctrl.device,
-			"queue %d tag %#x not found\n",
-			nvme_tcp_queue_id(queue), pdu->command_id);
-		return -ENOENT;
-	}
-	req = blk_mq_rq_to_pdu(rq);
+	struct request *rq =
+		blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
+	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
 
 	while (true) {
 		int recv_len, ret;
-- 
2.27.0



* [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation
  2021-05-19 17:43 [PATCH v2 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
                   ` (2 preceding siblings ...)
  2021-05-19 17:43 ` [PATCH v2 3/4] nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data Sagi Grimberg
@ 2021-05-19 17:43 ` Sagi Grimberg
  2021-05-20  6:49   ` Daniel Wagner
                     ` (2 more replies)
  2021-06-16 16:28 ` [PATCH v2 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
  4 siblings, 3 replies; 25+ messages in thread
From: Sagi Grimberg @ 2021-05-19 17:43 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Daniel Wagner

We cannot detect a (perhaps buggy) controller that is sending us
a completion for a request that was already completed (for example
sending a completion twice); this phenomenon has been seen in the wild
a few times.

So to protect against this, we use the upper 4 msbits of the nvme sqe
command_id as a 4-bit generation counter and verify that it matches
the existing request generation, which increments on every execution.

The 16-bit command_id is now constructed as:
| xxxx | xxxxxxxxxxxx |
  gen    request tag

This means that we are giving up some possible queue depth, as 12 bits
allow for a maximum queue depth of 4095 instead of 65536; however, we
never create such long queues anyway, so no real harm is done.
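
For illustration, a minimal standalone sketch of the encode/decode
round-trip (a userspace mirror of the masks added in this patch, not the
kernel code itself):

#include <stdio.h>
#include <stdint.h>

#define GENCTR_MASK(gen)	((gen) & 0xf)
#define CID(gen, tag)		((uint16_t)((GENCTR_MASK(gen) << 12) | ((tag) & 0xfff)))

int main(void)
{
	uint8_t genctr = 18;	/* free-running counter; only 4 bits land in the cid */
	uint16_t tag = 42;	/* blk-mq tag, kept below 4096 by patch 2/4 */
	uint16_t cid = CID(genctr, tag);

	/* decode the two fields back out of the wire value */
	printf("cid=%#x tag=%#x gen=%#x\n",
	       cid, cid & 0xfff, (cid & 0xf000) >> 12);
	/* prints: cid=0x202a tag=0x2a gen=0x2 */
	return 0;
}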

Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c   |  3 ++-
 drivers/nvme/host/nvme.h   | 47 +++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/pci.c    |  2 +-
 drivers/nvme/host/rdma.c   |  4 ++--
 drivers/nvme/host/tcp.c    | 26 ++++++++++-----------
 drivers/nvme/target/loop.c |  4 ++--
 6 files changed, 66 insertions(+), 20 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e6612971f4eb..8308dd1e9e3f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1006,7 +1006,8 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
 		return BLK_STS_IOERR;
 	}
 
-	cmd->common.command_id = req->tag;
+	nvme_req(req)->genctr++;
+	cmd->common.command_id = nvme_cid(req);
 	trace_nvme_setup_cmd(req, cmd);
 	return ret;
 }
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 05f31a2c64bb..e6f72c82582d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -158,6 +158,7 @@ enum nvme_quirks {
 struct nvme_request {
 	struct nvme_command	*cmd;
 	union nvme_result	result;
+	u8			genctr;
 	u8			retries;
 	u8			flags;
 	u16			status;
@@ -497,6 +498,49 @@ struct nvme_ctrl_ops {
 	int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
 };
 
+/*
+ * nvme command_id is constructed as such:
+ * | xxxx | xxxxxxxxxxxx |
+ *   gen    request tag
+ */
+#define nvme_genctr_mask(gen)			(gen & 0xf)
+#define nvme_cid_install_genctr(gen)		(nvme_genctr_mask(gen) << 12)
+#define nvme_genctr_from_cid(cid)		((cid & 0xf000) >> 12)
+#define nvme_tag_from_cid(cid)			(cid & 0xfff)
+
+static inline u16 nvme_cid(struct request *rq)
+{
+	return nvme_cid_install_genctr(nvme_req(rq)->genctr) | rq->tag;
+}
+
+static inline struct request *nvme_find_rq(struct blk_mq_tags *tags,
+		u16 command_id)
+{
+	u8 genctr = nvme_genctr_from_cid(command_id);
+	u16 tag = nvme_tag_from_cid(command_id);
+	struct request *rq;
+
+	rq = blk_mq_tag_to_rq(tags, tag);
+	if (unlikely(!rq)) {
+		pr_err("could not locate request for tag %#x\n",
+			tag);
+		return NULL;
+	}
+	if (unlikely(nvme_genctr_mask(nvme_req(rq)->genctr) != genctr)) {
+		dev_err(nvme_req(rq)->ctrl->device,
+			"request %#x genctr mismatch (got %#x expected %#x)\n",
+			tag, genctr, nvme_req(rq)->genctr);
+		return NULL;
+	}
+	return rq;
+}
+
+static inline struct request *nvme_cid_to_rq(struct blk_mq_tags *tags,
+                u16 command_id)
+{
+	return blk_mq_tag_to_rq(tags, nvme_tag_from_cid(command_id));
+}
+
 #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
 void nvme_fault_inject_init(struct nvme_fault_inject *fault_inj,
 			    const char *dev_name);
@@ -594,7 +638,8 @@ static inline void nvme_put_ctrl(struct nvme_ctrl *ctrl)
 
 static inline bool nvme_is_aen_req(u16 qid, __u16 command_id)
 {
-	return !qid && command_id >= NVME_AQ_BLK_MQ_DEPTH;
+	return !qid &&
+		nvme_tag_from_cid(command_id) >= NVME_AQ_BLK_MQ_DEPTH;
 }
 
 void nvme_complete_rq(struct request *req);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ac26d242b630..7398c5a76956 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1014,7 +1014,7 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
 		return;
 	}
 
-	req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), command_id);
+	req = nvme_find_rq(nvme_queue_tagset(nvmeq), command_id);
 	if (unlikely(!req)) {
 		dev_warn(nvmeq->dev->ctrl.device,
 			"invalid id %d completed on queue %d\n",
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 203b47a8ec92..ab5b7d175488 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1727,10 +1727,10 @@ static void nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
 	struct request *rq;
 	struct nvme_rdma_request *req;
 
-	rq = blk_mq_tag_to_rq(nvme_rdma_tagset(queue), cqe->command_id);
+	rq = nvme_find_rq(nvme_rdma_tagset(queue), cqe->command_id);
 	if (!rq) {
 		dev_err(queue->ctrl->ctrl.device,
-			"tag 0x%x on QP %#x not found\n",
+			"got bad command_id %#x on QP %#x\n",
 			cqe->command_id, queue->qp->qp_num);
 		nvme_rdma_error_recovery(queue->ctrl);
 		return;
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index e189715ceca7..cf30568c9bbc 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -487,11 +487,11 @@ static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
 {
 	struct request *rq;
 
-	rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), cqe->command_id);
+	rq = nvme_find_rq(nvme_tcp_tagset(queue), cqe->command_id);
 	if (!rq) {
 		dev_err(queue->ctrl->ctrl.device,
-			"queue %d tag 0x%x not found\n",
-			nvme_tcp_queue_id(queue), cqe->command_id);
+			"got bad cqe.command_id %#x on queue %d\n",
+			cqe->command_id, nvme_tcp_queue_id(queue));
 		nvme_tcp_error_recovery(&queue->ctrl->ctrl);
 		return -EINVAL;
 	}
@@ -508,11 +508,11 @@ static int nvme_tcp_handle_c2h_data(struct nvme_tcp_queue *queue,
 {
 	struct request *rq;
 
-	rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
+	rq = nvme_find_rq(nvme_tcp_tagset(queue), pdu->command_id);
 	if (!rq) {
 		dev_err(queue->ctrl->ctrl.device,
-			"queue %d tag %#x not found\n",
-			nvme_tcp_queue_id(queue), pdu->command_id);
+			"got bad c2hdata.command_id %#x on queue %d\n",
+			pdu->command_id, nvme_tcp_queue_id(queue));
 		return -ENOENT;
 	}
 
@@ -599,7 +599,7 @@ static int nvme_tcp_setup_h2c_data_pdu(struct nvme_tcp_request *req,
 	data->hdr.plen =
 		cpu_to_le32(data->hdr.hlen + hdgst + req->pdu_len + ddgst);
 	data->ttag = pdu->ttag;
-	data->command_id = rq->tag;
+	data->command_id = nvme_cid(rq);
 	data->data_offset = cpu_to_le32(req->data_sent);
 	data->data_length = cpu_to_le32(req->pdu_len);
 	return 0;
@@ -612,11 +612,11 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue *queue,
 	struct request *rq;
 	int ret;
 
-	rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
+	rq = nvme_find_rq(nvme_tcp_tagset(queue), pdu->command_id);
 	if (!rq) {
 		dev_err(queue->ctrl->ctrl.device,
-			"queue %d tag %#x not found\n",
-			nvme_tcp_queue_id(queue), pdu->command_id);
+			"got bad r2t.command_id %#x on queue %d\n",
+			pdu->command_id, nvme_tcp_queue_id(queue));
 		return -ENOENT;
 	}
 	req = blk_mq_rq_to_pdu(rq);
@@ -696,7 +696,7 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
 {
 	struct nvme_tcp_data_pdu *pdu = (void *)queue->pdu;
 	struct request *rq =
-		blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
+		nvme_cid_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
 	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
 
 	while (true) {
@@ -789,8 +789,8 @@ static int nvme_tcp_recv_ddgst(struct nvme_tcp_queue *queue,
 	}
 
 	if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
-		struct request *rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue),
-						pdu->command_id);
+		struct request *rq = nvme_cid_to_rq(nvme_tcp_tagset(queue),
+					pdu->command_id);
 
 		nvme_tcp_end_request(rq, NVME_SC_SUCCESS);
 		queue->nr_cqe++;
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 1b89a6bb819a..9f1f5d572960 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -107,10 +107,10 @@ static void nvme_loop_queue_response(struct nvmet_req *req)
 	} else {
 		struct request *rq;
 
-		rq = blk_mq_tag_to_rq(nvme_loop_tagset(queue), cqe->command_id);
+		rq = nvme_find_rq(nvme_loop_tagset(queue), cqe->command_id);
 		if (!rq) {
 			dev_err(queue->ctrl->ctrl.device,
-				"tag 0x%x on queue %d not found\n",
+				"got bad command_id %#x on queue %d\n",
 				cqe->command_id, nvme_loop_queue_idx(queue));
 			return;
 		}
-- 
2.27.0



* Re: [PATCH v2 1/4] params: lift param_set_uint_minmax to common code
  2021-05-19 17:43 ` [PATCH v2 1/4] params: lift param_set_uint_minmax to common code Sagi Grimberg
@ 2021-05-19 21:10   ` Chaitanya Kulkarni
  2021-05-20  6:01   ` Christoph Hellwig
  1 sibling, 0 replies; 25+ messages in thread
From: Chaitanya Kulkarni @ 2021-05-19 21:10 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Daniel Wagner

On 5/19/21 10:52, Sagi Grimberg wrote:
>  
> +int param_set_uint_minmax(const char *val,
> +		const struct kernel_param *kp,
> +		unsigned int min, unsigned int max)

nit: this is not from your patch but from the existing code; we could
make the indentation match that of param_array(), param_attr_store()
and add_sysfs_param():

int param_set_uint_minmax(const char *val,
                          const struct kernel_param *kp,
                          unsigned int min, unsigned int max)


This can be done at the time of applying the patch.
> +{
> +	unsigned int num;
> +	int ret;
> +
> +	if (!val)
> +		return -EINVAL;
> +	ret = kstrtouint(val, 0, &num);
> +	if (ret)
> +		return ret;
> +	if (num < min || num > max)
> +		return -EINVAL;
> +	*((unsigned int *)kp->arg) = num;
> +	return 0;
> +}

Irrespective of that:

Looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>




* Re: [PATCH v2 2/4] nvme-pci: limit maximum queue depth to 4095
  2021-05-19 17:43 ` [PATCH v2 2/4] nvme-pci: limit maximum queue depth to 4095 Sagi Grimberg
@ 2021-05-19 21:12   ` Chaitanya Kulkarni
  2021-05-26  8:55   ` Hannes Reinecke
  1 sibling, 0 replies; 25+ messages in thread
From: Chaitanya Kulkarni @ 2021-05-19 21:12 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Daniel Wagner

On 5/19/21 10:52, Sagi Grimberg wrote:
> We are going to use the upper 4 bits of the command_id for a generation
> counter, so enforce the new queue depth upper limit. As we enforce
> both min and max queue depth, use param_set_uint_minmax instead of
> open coding it.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


Looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>





* Re: [PATCH v2 1/4] params: lift param_set_uint_minmax to common code
  2021-05-19 17:43 ` [PATCH v2 1/4] params: lift param_set_uint_minmax to common code Sagi Grimberg
  2021-05-19 21:10   ` Chaitanya Kulkarni
@ 2021-05-20  6:01   ` Christoph Hellwig
  1 sibling, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2021-05-20  6:01 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, Daniel Wagner

On Wed, May 19, 2021 at 10:43:37AM -0700, Sagi Grimberg wrote:
> It is a useful helper, hence move it to common code so others can enjoy
> it.


This needs a Cc to linux-kernel and linux-nfs.


* Re: [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation
  2021-05-19 17:43 ` [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation Sagi Grimberg
@ 2021-05-20  6:49   ` Daniel Wagner
  2021-05-25 22:50   ` Max Gurtovoy
  2021-05-26  8:59   ` Hannes Reinecke
  2 siblings, 0 replies; 25+ messages in thread
From: Daniel Wagner @ 2021-05-20  6:49 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Keith Busch

Hi Sagi,

On Wed, May 19, 2021 at 10:43:40AM -0700, Sagi Grimberg wrote:
> +static inline struct request *nvme_find_rq(struct blk_mq_tags *tags,
> +		u16 command_id)
> +{
> +	u8 genctr = nvme_genctr_from_cid(command_id);
> +	u16 tag = nvme_tag_from_cid(command_id);
> +	struct request *rq;
> +
> +	rq = blk_mq_tag_to_rq(tags, tag);
> +	if (unlikely(!rq)) {
> +		pr_err("could not locate request for tag %#x\n",
> +			tag);
> +		return NULL;
> +	}
> +	if (unlikely(nvme_genctr_mask(nvme_req(rq)->genctr) != genctr)) {
> +		dev_err(nvme_req(rq)->ctrl->device,
> +			"request %#x genctr mismatch (got %#x expected %#x)\n",
> +			tag, genctr, nvme_req(rq)->genctr);

'nvme_genctr_mask(nvme_req(rq)->genctr)' for the expected value too?
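
i.e., presumably something like (a sketch of the suggested fix):

	dev_err(nvme_req(rq)->ctrl->device,
		"request %#x genctr mismatch (got %#x expected %#x)\n",
		tag, genctr, nvme_genctr_mask(nvme_req(rq)->genctr));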

Thanks,
Daniel


* Re: [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation
  2021-05-19 17:43 ` [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation Sagi Grimberg
  2021-05-20  6:49   ` Daniel Wagner
@ 2021-05-25 22:50   ` Max Gurtovoy
  2021-05-26  0:39     ` Keith Busch
  2021-05-26  8:59   ` Hannes Reinecke
  2 siblings, 1 reply; 25+ messages in thread
From: Max Gurtovoy @ 2021-05-25 22:50 UTC (permalink / raw)
  To: linux-nvme


On 5/19/2021 8:43 PM, Sagi Grimberg wrote:
> We cannot detect a (perhaps buggy) controller that is sending us
> a completion for a request that was already completed (for example
> sending a completion twice); this phenomenon has been seen in the wild
> a few times.

The bad controller should be fixed.

In the past, I've sent patches that check that the sqid matches in the
nvme cqe, to protect against faulty drives that might send the completion
on a wrong msix.

My patch wasn't accepted since it added an additional "if" in the fast path.

Now we're adding many more operations in the fast path because of a
buggy ctrl?


> So to protect against this, we use the upper 4 msbits of the nvme sqe
> command_id as a 4-bit generation counter and verify that it matches
> the existing request generation, which increments on every execution.
>
> The 16-bit command_id is now constructed as:
> | xxxx | xxxxxxxxxxxx |
>    gen    request tag

Is this a new specification for command_id?


>
> This means that we are giving up some possible queue depth, as 12 bits
> allow for a maximum queue depth of 4095 instead of 65536; however, we
> never create such long queues anyway, so no real harm is done.
>
> Suggested-by: Keith Busch <kbusch@kernel.org>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>   drivers/nvme/host/core.c   |  3 ++-
>   drivers/nvme/host/nvme.h   | 47 +++++++++++++++++++++++++++++++++++++-
>   drivers/nvme/host/pci.c    |  2 +-
>   drivers/nvme/host/rdma.c   |  4 ++--
>   drivers/nvme/host/tcp.c    | 26 ++++++++++-----------
>   drivers/nvme/target/loop.c |  4 ++--
>   6 files changed, 66 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e6612971f4eb..8308dd1e9e3f 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1006,7 +1006,8 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
>   		return BLK_STS_IOERR;
>   	}
>   
> -	cmd->common.command_id = req->tag;
> +	nvme_req(req)->genctr++;
> +	cmd->common.command_id = nvme_cid(req);
>   	trace_nvme_setup_cmd(req, cmd);
>   	return ret;
>   }
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 05f31a2c64bb..e6f72c82582d 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -158,6 +158,7 @@ enum nvme_quirks {
>   struct nvme_request {
>   	struct nvme_command	*cmd;
>   	union nvme_result	result;
> +	u8			genctr;
>   	u8			retries;
>   	u8			flags;
>   	u16			status;
> @@ -497,6 +498,49 @@ struct nvme_ctrl_ops {
>   	int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
>   };
>   
> +/*
> + * nvme command_id is constructed as such:
> + * | xxxx | xxxxxxxxxxxx |
> + *   gen    request tag
> + */
> +#define nvme_genctr_mask(gen)			(gen & 0xf)
> +#define nvme_cid_install_genctr(gen)		(nvme_genctr_mask(gen) << 12)
> +#define nvme_genctr_from_cid(cid)		((cid & 0xf000) >> 12)
> +#define nvme_tag_from_cid(cid)			(cid & 0xfff)
> +
> +static inline u16 nvme_cid(struct request *rq)
> +{
> +	return nvme_cid_install_genctr(nvme_req(rq)->genctr) | rq->tag;
> +}
> +
> +static inline struct request *nvme_find_rq(struct blk_mq_tags *tags,
> +		u16 command_id)
> +{
> +	u8 genctr = nvme_genctr_from_cid(command_id);
> +	u16 tag = nvme_tag_from_cid(command_id);
> +	struct request *rq;
> +
> +	rq = blk_mq_tag_to_rq(tags, tag);
> +	if (unlikely(!rq)) {
> +		pr_err("could not locate request for tag %#x\n",
> +			tag);
> +		return NULL;
> +	}
> +	if (unlikely(nvme_genctr_mask(nvme_req(rq)->genctr) != genctr)) {
> +		dev_err(nvme_req(rq)->ctrl->device,
> +			"request %#x genctr mismatch (got %#x expected %#x)\n",
> +			tag, genctr, nvme_req(rq)->genctr);
> +		return NULL;
> +	}
> +	return rq;
> +}
> +
> +static inline struct request *nvme_cid_to_rq(struct blk_mq_tags *tags,
> +                u16 command_id)
> +{
> +	return blk_mq_tag_to_rq(tags, nvme_tag_from_cid(command_id));
> +}
> +
>   #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
>   void nvme_fault_inject_init(struct nvme_fault_inject *fault_inj,
>   			    const char *dev_name);
> @@ -594,7 +638,8 @@ static inline void nvme_put_ctrl(struct nvme_ctrl *ctrl)
>   
>   static inline bool nvme_is_aen_req(u16 qid, __u16 command_id)
>   {
> -	return !qid && command_id >= NVME_AQ_BLK_MQ_DEPTH;
> +	return !qid &&
> +		nvme_tag_from_cid(command_id) >= NVME_AQ_BLK_MQ_DEPTH;
>   }
>   
>   void nvme_complete_rq(struct request *req);
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index ac26d242b630..7398c5a76956 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1014,7 +1014,7 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
>   		return;
>   	}
>   
> -	req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), command_id);
> +	req = nvme_find_rq(nvme_queue_tagset(nvmeq), command_id);
>   	if (unlikely(!req)) {
>   		dev_warn(nvmeq->dev->ctrl.device,
>   			"invalid id %d completed on queue %d\n",
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 203b47a8ec92..ab5b7d175488 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1727,10 +1727,10 @@ static void nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
>   	struct request *rq;
>   	struct nvme_rdma_request *req;
>   
> -	rq = blk_mq_tag_to_rq(nvme_rdma_tagset(queue), cqe->command_id);
> +	rq = nvme_find_rq(nvme_rdma_tagset(queue), cqe->command_id);
>   	if (!rq) {
>   		dev_err(queue->ctrl->ctrl.device,
> -			"tag 0x%x on QP %#x not found\n",
> +			"got bad command_id %#x on QP %#x\n",
>   			cqe->command_id, queue->qp->qp_num);
>   		nvme_rdma_error_recovery(queue->ctrl);
>   		return;
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index e189715ceca7..cf30568c9bbc 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -487,11 +487,11 @@ static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
>   {
>   	struct request *rq;
>   
> -	rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), cqe->command_id);
> +	rq = nvme_find_rq(nvme_tcp_tagset(queue), cqe->command_id);
>   	if (!rq) {
>   		dev_err(queue->ctrl->ctrl.device,
> -			"queue %d tag 0x%x not found\n",
> -			nvme_tcp_queue_id(queue), cqe->command_id);
> +			"got bad cqe.command_id %#x on queue %d\n",
> +			cqe->command_id, nvme_tcp_queue_id(queue));
>   		nvme_tcp_error_recovery(&queue->ctrl->ctrl);
>   		return -EINVAL;
>   	}
> @@ -508,11 +508,11 @@ static int nvme_tcp_handle_c2h_data(struct nvme_tcp_queue *queue,
>   {
>   	struct request *rq;
>   
> -	rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
> +	rq = nvme_find_rq(nvme_tcp_tagset(queue), pdu->command_id);
>   	if (!rq) {
>   		dev_err(queue->ctrl->ctrl.device,
> -			"queue %d tag %#x not found\n",
> -			nvme_tcp_queue_id(queue), pdu->command_id);
> +			"got bad c2hdata.command_id %#x on queue %d\n",
> +			pdu->command_id, nvme_tcp_queue_id(queue));
>   		return -ENOENT;
>   	}
>   
> @@ -599,7 +599,7 @@ static int nvme_tcp_setup_h2c_data_pdu(struct nvme_tcp_request *req,
>   	data->hdr.plen =
>   		cpu_to_le32(data->hdr.hlen + hdgst + req->pdu_len + ddgst);
>   	data->ttag = pdu->ttag;
> -	data->command_id = rq->tag;
> +	data->command_id = nvme_cid(rq);
>   	data->data_offset = cpu_to_le32(req->data_sent);
>   	data->data_length = cpu_to_le32(req->pdu_len);
>   	return 0;
> @@ -612,11 +612,11 @@ static int nvme_tcp_handle_r2t(struct nvme_tcp_queue *queue,
>   	struct request *rq;
>   	int ret;
>   
> -	rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
> +	rq = nvme_find_rq(nvme_tcp_tagset(queue), pdu->command_id);
>   	if (!rq) {
>   		dev_err(queue->ctrl->ctrl.device,
> -			"queue %d tag %#x not found\n",
> -			nvme_tcp_queue_id(queue), pdu->command_id);
> +			"got bad r2t.command_id %#x on queue %d\n",
> +			pdu->command_id, nvme_tcp_queue_id(queue));
>   		return -ENOENT;
>   	}
>   	req = blk_mq_rq_to_pdu(rq);
> @@ -696,7 +696,7 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
>   {
>   	struct nvme_tcp_data_pdu *pdu = (void *)queue->pdu;
>   	struct request *rq =
> -		blk_mq_tag_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
> +		nvme_cid_to_rq(nvme_tcp_tagset(queue), pdu->command_id);
>   	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
>   
>   	while (true) {
> @@ -789,8 +789,8 @@ static int nvme_tcp_recv_ddgst(struct nvme_tcp_queue *queue,
>   	}
>   
>   	if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
> -		struct request *rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue),
> -						pdu->command_id);
> +		struct request *rq = nvme_cid_to_rq(nvme_tcp_tagset(queue),
> +					pdu->command_id);
>   
>   		nvme_tcp_end_request(rq, NVME_SC_SUCCESS);
>   		queue->nr_cqe++;
> diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
> index 1b89a6bb819a..9f1f5d572960 100644
> --- a/drivers/nvme/target/loop.c
> +++ b/drivers/nvme/target/loop.c
> @@ -107,10 +107,10 @@ static void nvme_loop_queue_response(struct nvmet_req *req)
>   	} else {
>   		struct request *rq;
>   
> -		rq = blk_mq_tag_to_rq(nvme_loop_tagset(queue), cqe->command_id);
> +		rq = nvme_find_rq(nvme_loop_tagset(queue), cqe->command_id);
>   		if (!rq) {
>   			dev_err(queue->ctrl->ctrl.device,
> -				"tag 0x%x on queue %d not found\n",
> +				"got bad command_id %#x on queue %d\n",
>   				cqe->command_id, nvme_loop_queue_idx(queue));
>   			return;
>   		}


* Re: [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation
  2021-05-25 22:50   ` Max Gurtovoy
@ 2021-05-26  0:39     ` Keith Busch
  2021-05-26  1:47       ` Chaitanya Kulkarni
  0 siblings, 1 reply; 25+ messages in thread
From: Keith Busch @ 2021-05-26  0:39 UTC (permalink / raw)
  To: Max Gurtovoy; +Cc: linux-nvme

On Wed, May 26, 2021 at 01:50:29AM +0300, Max Gurtovoy wrote:
> 
> On 5/19/2021 8:43 PM, Sagi Grimberg wrote:
> > We cannot detect a (perhaps buggy) controller that is sending us
> > a completion for a request that was already completed (for example
> > sending a completion twice); this phenomenon has been seen in the wild
> > a few times.
> 
> The bad controller should be fixed.
> 
> In the past, I've sent patches that check that the sqid matches in the
> nvme cqe, to protect against faulty drives that might send the completion
> on a wrong msix.
> 
> My patch wasn't accepted since it added an additional "if" in the fast path.
> 
> Now we're adding many more operations in the fast path because of a
> buggy ctrl?

I shared the same performance concern on v1 of this series. I haven't
been able to test this one yet (only have emulation for two more weeks).

Hannes says the bug this catches happens frequently enough on TCP. If we
don't catch it, we get kernel panic or corruption, so we definitely need to
do something. Sagi correctly noted this type of bug is not unique to TCP
(or even NVMe, for that matter), but if there is a performance impact on
PCI, and no one so far has reported such an issue, I would still recommend
this type of mitigation be isolated to transports that actually observe
invalid CQEs.
 
> > So to protect against this, we use the upper 4 msbits of the nvme sqe
> > command_id as a 4-bit generation counter and verify that it matches
> > the existing request generation, which increments on every execution.
> > 
> > The 16-bit command_id is now constructed as:
> > | xxxx | xxxxxxxxxxxx |
> >    gen    request tag
> 
> Is this new specification for command_id ?

The host can put whatever it wants in this field. It's an opaque value
as far as the controller is concerned: the spec only requires that it,
together with the SQ identifier, uniquely identify the command while it
is outstanding.


* Re: [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation
  2021-05-26  0:39     ` Keith Busch
@ 2021-05-26  1:47       ` Chaitanya Kulkarni
  2021-05-26  8:41         ` Sagi Grimberg
  0 siblings, 1 reply; 25+ messages in thread
From: Chaitanya Kulkarni @ 2021-05-26  1:47 UTC (permalink / raw)
  To: Keith Busch, Max Gurtovoy; +Cc: linux-nvme

On 5/25/21 17:45, Keith Busch wrote:
>> The bad controller should be fixed.
>>
>> In the past, I've sent patches that check that the sqid matches in the
>> nvme cqe, to protect against faulty drives that might send the
>> completion on a wrong msix.
>>
>> My patch wasn't accepted since it added an additional "if" in the fast path.
>>
>> Now we're adding many more operations in the fast path because of a
>> buggy ctrl?
> I shared the same performance concern on v1 of this series. I haven't
> been able to test this one yet (only have emulation for two more weeks).
>
> Hannes says the bug this catches happens frequently enough on TCP. If we
> don't catch it, we get kernel panic or corruption, so we definitely need to
> do something. Sagi correctly noted this type of bug is not unique to TCP
> (or even NVMe, for that matter), but if there is a performance impact on
> PCI, and no one so far has reported such an issue, I would still recommend
> this type of mitigation be isolated to transports that actually observe
> invalid CQEs.
>  

Please do that if possible.




* Re: [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation
  2021-05-26  1:47       ` Chaitanya Kulkarni
@ 2021-05-26  8:41         ` Sagi Grimberg
  2021-05-26  8:48           ` Hannes Reinecke
  2021-05-26  9:26           ` Max Gurtovoy
  0 siblings, 2 replies; 25+ messages in thread
From: Sagi Grimberg @ 2021-05-26  8:41 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Keith Busch, Max Gurtovoy; +Cc: linux-nvme


>>> The bad controller should be fixed.
>>>
>>> In the past, I've sent patches that check that the sqid matches in the
>>> nvme cqe, to protect against faulty drives that might send the
>>> completion on a wrong msix.
>>>
>>> My patch wasn't accepted since it added an additional "if" in the fast path.
>>>
>>> Now we're adding many more operations in the fast path because of a
>>> buggy ctrl?
>> I shared the same performance concern on v1 of this series. I haven't
>> been able to test this one yet (only have emulation for two more weeks).
>>
>> Hannes says the bug this catches happens frequently enough on TCP. If we
>> don't catch it, we get kernel panic or corruption, so we definitely need to
>> do something. Sagi correctly noted this type of bug is not unique to TCP
>> (or even NVMe, for that matter), but if there is a performance impact on
>> PCI, and no one so far has reported such an issue, I would still recommend
>> this type of mitigation be isolated to transports that actually observe
>> invalid CQEs.
>>   
> 
> Please do that if possible.

I disagree with that. The original report was about TCP, but that is
pure coincidence as far as I'm concerned; as I indicated before, there
is nothing TCP-specific about this.

Today we don't protect against such a scenario. I originally dismissed
this issue, thinking we should instead address the buggy controller, but
an argument was made that a buggy controller should not be able to crash
the kernel and we should have a safeguard for it, which is a valid
argument. And this is not the worst that can happen: with slightly
different timing this could have caused silent data corruption. So it
seems absolutely appropriate to me considering the possible
consequences.

The way I see it, we should decide where we stand here: either we
continue to ignore this possible panic/data-corruption (which I
personally think would be wrong), or we address it, and if we do,
we should do it in the correct place, not limit it to the observed
component.

As for the performance concerns, I'd be surprised if performance is
noticeably impacted by 2 assignments and 2 optimized branches.
Also, the overall nvme_request did not grow (still 32 bytes, half a
cacheline). But let's see if measurements prove otherwise...

Before patch:
struct nvme_request {
         struct nvme_command *      cmd;                  /*     0     8 */
         union nvme_result          result;               /*     8     8 */
         u8                         retries;              /*    16     1 */
         u8                         flags;                /*    17     1 */
         u16                        status;               /*    18     2 */

         /* XXX 4 bytes hole, try to pack */

         struct nvme_ctrl *         ctrl;                 /*    24     8 */

         /* size: 32, cachelines: 1, members: 6 */
         /* sum members: 28, holes: 1, sum holes: 4 */
         /* last cacheline: 32 bytes */
};

After patch:
struct nvme_request {
         struct nvme_command *      cmd;                  /*     0     8 */
         union nvme_result          result;               /*     8     8 */
         u8                         genctr;               /*    16     1 */
         u8                         retries;              /*    17     1 */
         u8                         flags;                /*    18     1 */

         /* XXX 1 byte hole, try to pack */

         u16                        status;               /*    20     2 */

         /* XXX 2 bytes hole, try to pack */

         struct nvme_ctrl *         ctrl;                 /*    24     8 */

         /* size: 32, cachelines: 1, members: 7 */
         /* sum members: 29, holes: 2, sum holes: 3 */
         /* last cacheline: 32 bytes */
};
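
(Layouts as reported by pahole, e.g. "pahole -C nvme_request" against an
object built with debug info; an illustrative invocation, not a captured
session.)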



* Re: [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation
  2021-05-26  8:41         ` Sagi Grimberg
@ 2021-05-26  8:48           ` Hannes Reinecke
  2021-05-26  9:26           ` Max Gurtovoy
  1 sibling, 0 replies; 25+ messages in thread
From: Hannes Reinecke @ 2021-05-26  8:48 UTC (permalink / raw)
  To: Sagi Grimberg, Chaitanya Kulkarni, Keith Busch, Max Gurtovoy; +Cc: linux-nvme

On 5/26/21 10:41 AM, Sagi Grimberg wrote:
> 
>>>> The bad controller should be fixed.
>>>>
>>>> In the past, I've sent patches that check that the sqid matches in the
>>>> nvme cqe, to protect against faulty drives that might send the
>>>> completion on a wrong msix.
>>>>
>>>> My patch wasn't accepted since it added an additional "if" in the
>>>> fast path.
>>>>
>>>> Now we're adding many more operations in the fast path because of a
>>>> buggy ctrl?
>>> I shared the same performance concern on v1 of this series. I haven't
>>> been able to test this one yet (only have emulation for two more weeks).
>>>
>>> Hannes says the bug this catches happens frequently enough on TCP. If we
>>> don't catch it, we get kernel panic or corruption, so we definitely need
>>> to do something. Sagi correctly noted this type of bug is not unique to
>>> TCP (or even NVMe, for that matter), but if there is a performance impact
>>> on PCI, and no one so far has reported such an issue, I would still
>>> recommend this type of mitigation be isolated to transports that actually
>>> observe invalid CQEs.
>>>   
>>
>> Please do that if possible.
> 
> I disagree with that. The original report was about TCP, but that is
> pure coincidence as far as I'm concerned; as I indicated before, there
> is nothing TCP-specific about this.
> 
> Today we don't protect against such a scenario. I originally dismissed
> this issue, thinking we should instead address the buggy controller, but
> an argument was made that a buggy controller should not be able to crash
> the kernel and we should have a safeguard for it, which is a valid
> argument. And this is not the worst that can happen: with slightly
> different timing this could have caused silent data corruption. So it
> seems absolutely appropriate to me considering the possible
> consequences.
> 
> The way I see it, we should decide where we stand here: either we
> continue to ignore this possible panic/data-corruption (which I
> personally think would be wrong), or we address it, and if we do,
> we should do it in the correct place, not limit it to the observed
> component.
> 
I am perfectly fine with extending this to all transports; the prime
reason why we haven't seen it elsewhere is that other transports like
RDMA or FC are far more encapsulated, making it harder to receive invalid
packets (lossless fabrics and all that).
TCP is far more loosely structured, allowing invalid packets in much
more easily.
But really, this issue is present in all transports, so we should be
addressing it on a general level.
(And it's only a matter of time until someone builds cheapo NVMe-pci
devices having similar issues :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)


* Re: [PATCH v2 2/4] nvme-pci: limit maximum queue depth to 4095
  2021-05-19 17:43 ` [PATCH v2 2/4] nvme-pci: limit maximum queue depth to 4095 Sagi Grimberg
  2021-05-19 21:12   ` Chaitanya Kulkarni
@ 2021-05-26  8:55   ` Hannes Reinecke
  1 sibling, 0 replies; 25+ messages in thread
From: Hannes Reinecke @ 2021-05-26  8:55 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Daniel Wagner

On 5/19/21 7:43 PM, Sagi Grimberg wrote:
> We are going to use the upper 4 bits of the command_id for a generation
> counter, so enforce the new queue depth upper limit. As we enforce
> both min and max queue depth, use param_set_uint_minmax instead of
> open coding it.
> 
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/nvme/host/pci.c | 15 ++++++---------
>  1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index a29b170701fc..ac26d242b630 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -60,6 +60,8 @@ MODULE_PARM_DESC(sgl_threshold,
>  		"Use SGLs when average request segment size is larger or equal to "
>  		"this size. Use 0 to disable SGLs.");
>  
> +#define NVME_PCI_MIN_QUEUE_SIZE 2
> +#define NVME_PCI_MAX_QUEUE_SIZE 4095
>  static int io_queue_depth_set(const char *val, const struct kernel_param *kp);
>  static const struct kernel_param_ops io_queue_depth_ops = {
>  	.set = io_queue_depth_set,
> @@ -68,7 +70,7 @@ static const struct kernel_param_ops io_queue_depth_ops = {
>  
>  static unsigned int io_queue_depth = 1024;
>  module_param_cb(io_queue_depth, &io_queue_depth_ops, &io_queue_depth, 0644);
> -MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
> +MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2 and < 4096");
>  
Why do we limit it to 4095 (i.e. one less than 4096), but have the default
at 1024?
Shouldn't the default be one less, too?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)


* Re: [PATCH v2 3/4] nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data
  2021-05-19 17:43 ` [PATCH v2 3/4] nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data Sagi Grimberg
@ 2021-05-26  8:56   ` Hannes Reinecke
  0 siblings, 0 replies; 25+ messages in thread
From: Hannes Reinecke @ 2021-05-26  8:56 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Daniel Wagner

On 5/19/21 7:43 PM, Sagi Grimberg wrote:
> We already validate the command_id when receiving the c2hdata pdu header,
> and it does not change afterwards, so this is a redundant check.
> 
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/nvme/host/tcp.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)


* Re: [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation
  2021-05-19 17:43 ` [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation Sagi Grimberg
  2021-05-20  6:49   ` Daniel Wagner
  2021-05-25 22:50   ` Max Gurtovoy
@ 2021-05-26  8:59   ` Hannes Reinecke
  2 siblings, 0 replies; 25+ messages in thread
From: Hannes Reinecke @ 2021-05-26  8:59 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Daniel Wagner

On 5/19/21 7:43 PM, Sagi Grimberg wrote:
> We cannot detect a (perhaps buggy) controller that is sending us
> a completion for a request that was already completed (for example
> sending a completion twice); this phenomenon has been seen in the wild
> a few times.
> 
> So to protect against this, we use the upper 4 msbits of the nvme sqe
> command_id as a 4-bit generation counter and verify that it matches
> the existing request generation, which increments on every execution.
> 
> The 16-bit command_id is now constructed as:
> | xxxx | xxxxxxxxxxxx |
>   gen    request tag
> 
> This means that we are giving up some possible queue depth, as 12 bits
> allow for a maximum queue depth of 4095 instead of 65536; however, we
> never create such long queues anyway, so no real harm is done.
> 
> Suggested-by: Keith Busch <kbusch@kernel.org>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/nvme/host/core.c   |  3 ++-
>  drivers/nvme/host/nvme.h   | 47 +++++++++++++++++++++++++++++++++++++-
>  drivers/nvme/host/pci.c    |  2 +-
>  drivers/nvme/host/rdma.c   |  4 ++--
>  drivers/nvme/host/tcp.c    | 26 ++++++++++-----------
>  drivers/nvme/target/loop.c |  4 ++--
>  6 files changed, 66 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e6612971f4eb..8308dd1e9e3f 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1006,7 +1006,8 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
>  		return BLK_STS_IOERR;
>  	}
>  
> -	cmd->common.command_id = req->tag;
> +	nvme_req(req)->genctr++;
> +	cmd->common.command_id = nvme_cid(req);
>  	trace_nvme_setup_cmd(req, cmd);
>  	return ret;
>  }
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 05f31a2c64bb..e6f72c82582d 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -158,6 +158,7 @@ enum nvme_quirks {
>  struct nvme_request {
>  	struct nvme_command	*cmd;
>  	union nvme_result	result;
> +	u8			genctr;
>  	u8			retries;
>  	u8			flags;
>  	u16			status;
> @@ -497,6 +498,49 @@ struct nvme_ctrl_ops {
>  	int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
>  };
>  
> +/*
> + * nvme command_id is constructed as such:
> + * | xxxx | xxxxxxxxxxxx |
> + *   gen    request tag
> + */
> +#define nvme_genctr_mask(gen)			(gen & 0xf)
> +#define nvme_cid_install_genctr(gen)		(nvme_genctr_mask(gen) << 12)
> +#define nvme_genctr_from_cid(cid)		((cid & 0xf000) >> 12)
> +#define nvme_tag_from_cid(cid)			(cid & 0xfff)
> +
> +static inline u16 nvme_cid(struct request *rq)
> +{
> +	return nvme_cid_install_genctr(nvme_req(rq)->genctr) | rq->tag;
> +}
> +
> +static inline struct request *nvme_find_rq(struct blk_mq_tags *tags,
> +		u16 command_id)
> +{
> +	u8 genctr = nvme_genctr_from_cid(command_id);
> +	u16 tag = nvme_tag_from_cid(command_id);
> +	struct request *rq;
> +
> +	rq = blk_mq_tag_to_rq(tags, tag);
> +	if (unlikely(!rq)) {
> +		pr_err("could not locate request for tag %#x\n",
> +			tag);
> +		return NULL;
> +	}
> +	if (unlikely(nvme_genctr_mask(nvme_req(rq)->genctr) != genctr)) {
> +		dev_err(nvme_req(rq)->ctrl->device,
> +			"request %#x genctr mismatch (got %#x expected %#x)\n",
> +			tag, genctr, nvme_req(rq)->genctr);
> +		return NULL;
> +	}
> +	return rq;
> +}
> +
> +static inline struct request *nvme_cid_to_rq(struct blk_mq_tags *tags,
> +                u16 command_id)
> +{
> +	return blk_mq_tag_to_rq(tags, nvme_tag_from_cid(command_id));
> +}
> +
>  #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
>  void nvme_fault_inject_init(struct nvme_fault_inject *fault_inj,
>  			    const char *dev_name);

What about raising NVMF_MAX_QUEUE_SIZE to 4096?

Other than that:

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)


* Re: [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation
  2021-05-26  8:41         ` Sagi Grimberg
  2021-05-26  8:48           ` Hannes Reinecke
@ 2021-05-26  9:26           ` Max Gurtovoy
  1 sibling, 0 replies; 25+ messages in thread
From: Max Gurtovoy @ 2021-05-26  9:26 UTC (permalink / raw)
  To: Sagi Grimberg, Chaitanya Kulkarni, Keith Busch
  Cc: linux-nvme, Christoph Hellwig


On 5/26/2021 11:41 AM, Sagi Grimberg wrote:
>
>>>> The bad controller should be fixed.
>>>>
>>>> In the past, I've sent patches that check that the sqid matches in the
>>>> nvme cqe, to protect against faulty drives that might send the
>>>> completion on a wrong msix.
>>>>
>>>> My patch wasn't accepted since it added an additional "if" in the
>>>> fast path.
>>>>
>>>> Now we're adding many more operations in the fast path because of a
>>>> buggy ctrl?
>>> I shared the same performance concern on v1 of this series. I haven't
>>> been able to test this one yet (only have emulation for two more weeks).
>>>
>>> Hannes says the bug this catches happens frequently enough on TCP. If we
>>> don't catch it, we get kernel panic or corruption, so we definitely need
>>> to do something. Sagi correctly noted this type of bug is not unique to
>>> TCP (or even NVMe, for that matter), but if there is a performance impact
>>> on PCI, and no one so far has reported such an issue, I would still
>>> recommend this type of mitigation be isolated to transports that actually
>>> observe invalid CQEs.
>>
>> Please do that if possible.
>
> I disagree with that. The original report was about TCP, but that is
> pure coincidence as far as I'm concerned, as I indicated before there
> is nothing TCP specific about this.
>
> Today we don't protect against such a scenario. I originally dismissed
> this issue, arguing that we should address the buggy controller, but an
> argument was made that a buggy controller should not be able to crash
> the kernel and we should have a safeguard for it, which is a valid
> argument. And a crash is not the worst that can happen: with slightly
> different timing this could have caused silent data corruption. So the
> safeguard seems absolutely appropriate to me considering the possible
> consequences.
>
> The way I see it, we should decide where we stand here: either we
> continue to ignore this possible panic/data corruption (which I
> personally think would be wrong), or we address it, and if we do, we
> should do it in the correct place, not limit it to the component where
> it happened to be observed.
>
> As for the performance concerns, I'd be surprised if performance is
> noticeably impacted by 2 assignments and 2 optimized branches. Also,
> the overall nvme_request did not grow (still 32 bytes, half a
> cacheline). But let's see if measurements prove otherwise...
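
For reference, the 2 assignments are presumably the per-submission
counter bump and the command_id setup on the submission side, roughly
as follows (a sketch; the helper name and the exact call site in
core.c are assumptions):

/* sketch: the two extra fast-path assignments, per submission */
static inline void nvme_install_cid(struct request *req,
		struct nvme_command *cmd)
{
	nvme_req(req)->genctr++;			/* assignment 1 */
	cmd->common.command_id = nvme_cid(req);		/* assignment 2 */
}

The 2 optimized branches are the two unlikely() checks in
nvme_find_rq() quoted earlier in the thread.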

Of course this will not have a big impact, but that is not the point.

The point is that I don't understand the inconsistency here.

Why didn't we defend against buggy controllers with the code below? (We
had a bug in our NVMe SNAP controller that caused this, so I decided to
contribute this check upstream):

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5a0bf6a..ab7ff34 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -983,6 +983,13 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
         volatile struct nvme_completion *cqe = &nvmeq->cqes[idx];
         struct request *req;

+       if (unlikely(le16_to_cpu(cqe->sq_id) != nvmeq->qid)) {
+               dev_warn(nvmeq->dev->ctrl.device,
+                        "got completion on sqid %d instead of sqid %d\n",
+                        nvmeq->qid, le16_to_cpu(cqe->sq_id));
+               return;
+       }
+
         if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
                 dev_warn(nvmeq->dev->ctrl.device,
                         "invalid id %d completed on queue %d\n",

And why do we want to defend against buggy controllers that perform
double completions?
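
To make the double-completion case concrete, here is a toy model of one
tag slot showing why the stale completion is dropped (illustrative
userspace code, not from the series; the names are made up):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct slot {
	uint8_t genctr;
	bool in_flight;
};

static uint16_t slot_submit(struct slot *s, uint16_t tag)
{
	s->genctr++;			/* bumped on every reuse of the tag */
	s->in_flight = true;
	return ((s->genctr & 0xf) << 12) | (tag & 0xfff);
}

static void slot_complete(struct slot *s, uint16_t cid)
{
	uint8_t wire_gen = (cid & 0xf000) >> 12;

	if ((s->genctr & 0xf) != wire_gen) {
		printf("genctr mismatch (got %#x expected %#x), dropping\n",
		       wire_gen, s->genctr & 0xf);
		return;
	}
	s->in_flight = false;
	printf("completed cid %#x\n", cid);
}

int main(void)
{
	struct slot s = { .genctr = 2 };
	uint16_t cid_a = slot_submit(&s, 5);	/* request A: cid 0x3005 */

	slot_complete(&s, cid_a);	/* normal completion */
	(void)slot_submit(&s, 5);	/* tag 5 reused for request B */
	slot_complete(&s, cid_a);	/* duplicate completion: dropped */
	return 0;
}

Without the generation check, the duplicate completion at the end would
complete request B, which is still in flight.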




* Re: [PATCH v2 0/4] nvme: protect against possible request reference after completion
  2021-05-19 17:43 [PATCH v2 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
                   ` (3 preceding siblings ...)
  2021-05-19 17:43 ` [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation Sagi Grimberg
@ 2021-06-16 16:28 ` Sagi Grimberg
  2021-06-16 17:04   ` Keith Busch
  4 siblings, 1 reply; 25+ messages in thread
From: Sagi Grimberg @ 2021-06-16 16:28 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch; +Cc: Daniel Wagner


> Nothing in nvme protects against referencing a request after it was
> completed. For example, if a buggy controller sends a completion twice
> for the same request, the host can access and modify a request that was
> already completed.
> 
> At best, this will cause a panic, but in the worst case it can cause
> silent data corruption if the request was already reused and executed by
> the time we reference it.
> 
> The nvme command_id is an opaque value in which we have simply placed
> the request tag thus far. To protect against access after completion, we
> introduce a generation counter in the upper 4 bits of the command_id
> that is incremented on every invocation and validated upon reception of
> a completion. This effectively limits the maximum queue depth to 4095,
> but we hardly ever use such deep queues (in fabrics the maximum is
> already 1024).

Keith,

Did you get a chance to look at the performance impact of this patch 
set? I think we are all in agreement that this is a useful safeguard if
there is no major performance impact.


* Re: [PATCH v2 0/4] nvme: protect against possible request reference after completion
  2021-06-16 16:28 ` [PATCH v2 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
@ 2021-06-16 17:04   ` Keith Busch
  2021-06-16 21:05     ` Sagi Grimberg
  0 siblings, 1 reply; 25+ messages in thread
From: Keith Busch @ 2021-06-16 17:04 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Daniel Wagner

On Wed, Jun 16, 2021 at 09:28:53AM -0700, Sagi Grimberg wrote:
> 
> > Nothing in nvme protects against referencing a request after it was
> > completed. For example, if a buggy controller sends a completion twice
> > for the same request, the host can access and modify a request that
> > was already completed.
> > 
> > At best, this will cause a panic, but in the worst case it can cause
> > silent data corruption if the request was already reused and executed
> > by the time we reference it.
> > 
> > The nvme command_id is an opaque value in which we have simply placed
> > the request tag thus far. To protect against access after completion,
> > we introduce a generation counter in the upper 4 bits of the
> > command_id that is incremented on every invocation and validated upon
> > reception of a completion. This effectively limits the maximum queue
> > depth to 4095, but we hardly ever use such deep queues (in fabrics the
> > maximum is already 1024).
> 
> Keith,
> 
> Did you get a chance to look at the performance impact of this patch set? I
> think we are all in agreement that this is a useful safeguard if
> there is no major performance impact.

Yes, I was able to set up my more powerful machine for perf tests
earlier this week. I ran fio hipri tests against this patch yesterday,
and did not observe a measurable difference with either io_uring or
pvsync2 engines, so no objection from me here.

Acked-by: Keith Busch <kbusch@kernel.org>



* Re: [PATCH v2 0/4] nvme: protect against possible request reference after completion
  2021-06-16 17:04   ` Keith Busch
@ 2021-06-16 21:05     ` Sagi Grimberg
  2021-07-01 11:51       ` Daniel Wagner
  0 siblings, 1 reply; 25+ messages in thread
From: Sagi Grimberg @ 2021-06-16 21:05 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme, Christoph Hellwig, Daniel Wagner


>>> Nothing in nvme protects against referencing a request after it was
>>> completed. For example, if a buggy controller sends a completion twice
>>> for the same request, the host can access and modify a request that
>>> was already completed.
>>>
>>> At best, this will cause a panic, but in the worst case it can cause
>>> silent data corruption if the request was already reused and executed
>>> by the time we reference it.
>>>
>>> The nvme command_id is an opaque value in which we have simply placed
>>> the request tag thus far. To protect against access after completion,
>>> we introduce a generation counter in the upper 4 bits of the
>>> command_id that is incremented on every invocation and validated upon
>>> reception of a completion. This effectively limits the maximum queue
>>> depth to 4095, but we hardly ever use such deep queues (in fabrics the
>>> maximum is already 1024).
>>
>> Keith,
>>
>> Did you get a chance to look at the performance impact of this patch set? I
>> think we are all in agreement that this is a useful safeguard if
>> there is no major performance impact.
> 
> Yes, I was able to set up my more powerful machine for perf tests
> earlier this week. I ran fio hipri tests against this patch yesterday,
> and did not observe a measurable difference with either io_uring or
> pvsync2 engines, so no objection from me here.
> 
> Acked-by: Keith Busch <kbusch@kernel.org>

Cool. Christoph, where do you stand on this? Can we move forward with
this series?


* Re: [PATCH v2 0/4] nvme: protect against possible request reference after completion
  2021-06-16 21:05     ` Sagi Grimberg
@ 2021-07-01 11:51       ` Daniel Wagner
  2021-07-01 11:52         ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel Wagner @ 2021-07-01 11:51 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Keith Busch, linux-nvme, Christoph Hellwig

On Wed, Jun 16, 2021 at 02:05:16PM -0700, Sagi Grimberg wrote:
> Cool, Christoph, where do you stand on this? Can we move forward with
> this?

Same here, I would like to get this merged. Thanks!


* Re: [PATCH v2 0/4] nvme: protect against possible request reference after completion
  2021-07-01 11:51       ` Daniel Wagner
@ 2021-07-01 11:52         ` Christoph Hellwig
  2021-07-08 10:02           ` Daniel Wagner
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2021-07-01 11:52 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Sagi Grimberg, Keith Busch, linux-nvme, Christoph Hellwig

On Thu, Jul 01, 2021 at 01:51:23PM +0200, Daniel Wagner wrote:
> On Wed, Jun 16, 2021 at 02:05:16PM -0700, Sagi Grimberg wrote:
> > Cool, Christoph, where do you stand on this? Can we move forward with
> > this?
> 
> Same here, I would like to get this merged. Thanks!

I think this can go in as soon as we open the nvme-5.15 branch.


* Re: [PATCH v2 0/4] nvme: protect against possible request reference after completion
  2021-07-01 11:52         ` Christoph Hellwig
@ 2021-07-08 10:02           ` Daniel Wagner
  2021-07-09  6:37             ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel Wagner @ 2021-07-08 10:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sagi Grimberg, Keith Busch, linux-nvme

On Thu, Jul 01, 2021 at 01:52:20PM +0200, Christoph Hellwig wrote:
> I think this can go in as soon as we open the nvme-5.15 branch.

nvme-5.15 or nvme-5.14?


* Re: [PATCH v2 0/4] nvme: protect against possible request reference after completion
  2021-07-08 10:02           ` Daniel Wagner
@ 2021-07-09  6:37             ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2021-07-09  6:37 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme

On Thu, Jul 08, 2021 at 12:02:43PM +0200, Daniel Wagner wrote:
> On Thu, Jul 01, 2021 at 01:52:20PM +0200, Christoph Hellwig wrote:
> > I think this can go in as soon as we open the nvme-5.15 branch.
> 
> nvme-5.15 or nvme-5.14?

5.15.


end of thread

Thread overview: 25+ messages
2021-05-19 17:43 [PATCH v2 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
2021-05-19 17:43 ` [PATCH v2 1/4] params: lift param_set_uint_minmax to common code Sagi Grimberg
2021-05-19 21:10   ` Chaitanya Kulkarni
2021-05-20  6:01   ` Christoph Hellwig
2021-05-19 17:43 ` [PATCH v2 2/4] nvme-pci: limit maximum queue depth to 4095 Sagi Grimberg
2021-05-19 21:12   ` Chaitanya Kulkarni
2021-05-26  8:55   ` Hannes Reinecke
2021-05-19 17:43 ` [PATCH v2 3/4] nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data Sagi Grimberg
2021-05-26  8:56   ` Hannes Reinecke
2021-05-19 17:43 ` [PATCH v2 4/4] nvme: code command_id with a genctr for use-after-free validation Sagi Grimberg
2021-05-20  6:49   ` Daniel Wagner
2021-05-25 22:50   ` Max Gurtovoy
2021-05-26  0:39     ` Keith Busch
2021-05-26  1:47       ` Chaitanya Kulkarni
2021-05-26  8:41         ` Sagi Grimberg
2021-05-26  8:48           ` Hannes Reinecke
2021-05-26  9:26           ` Max Gurtovoy
2021-05-26  8:59   ` Hannes Reinecke
2021-06-16 16:28 ` [PATCH v2 0/4] nvme: protect against possible request reference after completion Sagi Grimberg
2021-06-16 17:04   ` Keith Busch
2021-06-16 21:05     ` Sagi Grimberg
2021-07-01 11:51       ` Daniel Wagner
2021-07-01 11:52         ` Christoph Hellwig
2021-07-08 10:02           ` Daniel Wagner
2021-07-09  6:37             ` Christoph Hellwig
