* [PATCH 1/1] nvme-pci: check that sqid match in nvme cqe
@ 2019-01-01 13:37 Max Gurtovoy
  2019-01-04 18:13 ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Max Gurtovoy @ 2019-01-01 13:37 UTC (permalink / raw)


This patch protects against faulty drives (e.g., a drive might send
the completion on the wrong MSI-X vector).

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/pci.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5a0bf6a..ab7ff34 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -983,6 +983,13 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
 	volatile struct nvme_completion *cqe = &nvmeq->cqes[idx];
 	struct request *req;
 
+	if (unlikely(le16_to_cpu(cqe->sq_id) != nvmeq->qid)) {
+		dev_warn(nvmeq->dev->ctrl.device,
+			 "got completion on sqid %d instead of sqid %d\n",
+			 nvmeq->qid, le16_to_cpu(cqe->sq_id));
+		return;
+	}
+
 	if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
 		dev_warn(nvmeq->dev->ctrl.device,
 			"invalid id %d completed on queue %d\n",
-- 
1.8.3.1

* [PATCH 1/1] nvme-pci: check that sqid match in nvme cqe
  2019-01-01 13:37 [PATCH 1/1] nvme-pci: check that sqid match in nvme cqe Max Gurtovoy
@ 2019-01-04 18:13 ` Christoph Hellwig
  2019-01-06 12:21   ` Max Gurtovoy
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2019-01-04 18:13 UTC (permalink / raw)


On Tue, Jan 01, 2019 at 03:37:18PM +0200, Max Gurtovoy wrote:
> This patch protects against faulty drives (e.g., a drive might send
> the completion on the wrong MSI-X vector).

Where/how do you see this issue?

* [PATCH 1/1] nvme-pci: check that sqid match in nvme cqe
  2019-01-04 18:13 ` Christoph Hellwig
@ 2019-01-06 12:21   ` Max Gurtovoy
  2019-01-09 18:35     ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Max Gurtovoy @ 2019-01-06 12:21 UTC (permalink / raw)



On 1/4/2019 8:13 PM, Christoph Hellwig wrote:
> On Tue, Jan 01, 2019 at 03:37:18PM +0200, Max Gurtovoy wrote:
>> This patch protects against faulty drives (e.g., a drive might send
>> the completion on the wrong MSI-X vector).
> Where/how do you see this issue?

Maybe it wasn't a good example, but I can trigger this issue by changing
one line in the QEMU code (hw/block/nvme.c).

If I change

req->cqe.sq_id = cpu_to_le16(sq->sqid);

to

req->cqe.sq_id = 0;

we'll always get the wrong sq_id, except for admin queue commands.

This patch tries to protect against HW/FW bugs, and we need to decide
whether it's in our driver's interest to do so.

If not, let's abandon this commit.
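
To make the effect of that one-line QEMU change concrete, here is a
minimal userspace sketch. It is not driver or QEMU code: the sketch_*
types and helpers are hypothetical stand-ins, and it assumes host-endian
fields (so no le16_to_cpu()). It only illustrates that a device which
always reports sq_id 0 passes the check on the admin queue (qid 0) and
fails it on every I/O queue.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a completion queue entry. */
struct sketch_cqe {
	uint16_t sq_id;      /* submission queue id reported by the device */
	uint16_t command_id; /* command this entry completes */
};

/* Hypothetical stand-in for the per-queue state (0 = admin queue). */
struct sketch_queue {
	uint16_t qid;
};

/* Same comparison as the proposed check: the entry must name this queue. */
static bool sketch_sqid_matches(const struct sketch_queue *q,
				const struct sketch_cqe *cqe)
{
	return cqe->sq_id == q->qid;
}

/* Models the modified device: every completion reports sq_id 0. */
static struct sketch_cqe sketch_faulty_completion(uint16_t command_id)
{
	struct sketch_cqe cqe = { .sq_id = 0, .command_id = command_id };

	return cqe;
}

/* Counts how many of the queues 0..nr_queues-1 would reject the entry. */
static unsigned int sketch_count_rejected(uint16_t nr_queues)
{
	struct sketch_cqe cqe = sketch_faulty_completion(0);
	unsigned int rejected = 0;

	for (uint16_t qid = 0; qid < nr_queues; qid++) {
		struct sketch_queue q = { .qid = qid };

		if (!sketch_sqid_matches(&q, &cqe))
			rejected++;
	}
	return rejected;
}
```

With one admin queue and four I/O queues, sketch_count_rejected(5)
reports four rejecting queues, i.e. every queue except the admin queue.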

* [PATCH 1/1] nvme-pci: check that sqid match in nvme cqe
  2019-01-06 12:21   ` Max Gurtovoy
@ 2019-01-09 18:35     ` Christoph Hellwig
  2019-01-09 18:35       ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2019-01-09 18:35 UTC (permalink / raw)


On Sun, Jan 06, 2019 at 02:21:53PM +0200, Max Gurtovoy wrote:
> Maybe it wasn't a good example, but I can trigger this issue by changing
> one line in the QEMU code (hw/block/nvme.c).
>
> If I change
>
> req->cqe.sq_id = cpu_to_le16(sq->sqid);
>
> to
>
> req->cqe.sq_id = 0;
>
> we'll always get the wrong sq_id, except for admin queue commands.
>
> This patch tries to protect against HW/FW bugs, and we need to decide
> whether it's in our driver's interest to do so.
>
> If not, let's abandon this commit.

Yes, but hardware that gets this wrong will just blow up and time out
anyway.  I'm not sure this case is worth adding special detection
in the slow path.

* [PATCH 1/1] nvme-pci: check that sqid match in nvme cqe
  2019-01-09 18:35     ` Christoph Hellwig
@ 2019-01-09 18:35       ` Christoph Hellwig
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2019-01-09 18:35 UTC (permalink / raw)


On Wed, Jan 09, 2019 at 07:35:21PM +0100, Christoph Hellwig wrote:
> Yes, but hardware that gets this wrong will just blow up and time out
> anyway.  I'm not sure this case is worth adding special detection
> in the slow path.

s/slow/fast/
