linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2] Workaround for discard on non-conformant nvme devices
@ 2019-11-10 18:27 Eduard Hasenleithner
  2019-11-11 10:28 ` Christoph Hellwig
  2019-11-11 18:58 ` Keith Busch
  0 siblings, 2 replies; 4+ messages in thread
From: Eduard Hasenleithner @ 2019-11-10 18:27 UTC (permalink / raw)
  To: linux-nvme; +Cc: Keith Busch

As documented in https://bugzilla.kernel.org/show_bug.cgi?id=202665 
there are lots of Linux nvme users which get IO-MMU related errors when 
performing discard on nvme. So far analysis suggests that the errors are 
caused by non-conformant nvme devices which are reading beyond the end 
of the buffer containing the segments to be discarded.

Until now two different variants of this behavior have been observed: 
The controller found on an Intel 660p always reads a multiple of 512 
bytes. If the last chunk exceeds a page it continues with the subsequent 
page. For a Corsair MP510 the situation is even worse: The controller 
always reads a full page (4096) bytes. Then when the address is not 
aligned to 4096 it will continue reading at the address given in PRP2 
(which is most of the time 0).

This patch
* introduces a new NVME_QUIRK_DSM_PAGEALIGN
* makes the nvme_setup_discard function always return the
	(page aligned) discard_page for devices with the quirk

Since this patch is only RFC a list of affected devices is not included yet.

Signed-off-by: Eduard Hasenleithner <eduard@hasenleithner.at>

--- linux-5.3.7/drivers/nvme/host/core.c.orig	2019-11-04 
21:53:20.758837001 +0100
+++ linux-5.3.7/drivers/nvme/host/core.c	2019-11-10 19:11:49.419372363 +0100
@@ -562,8 +562,13 @@ static blk_status_t nvme_setup_discard(s
  	struct nvme_dsm_range *range;
  	struct bio *bio;

-	range = kmalloc_array(segments, sizeof(*range),
-				GFP_ATOMIC | __GFP_NOWARN);
+	if (!(ns->ctrl->quirks & NVME_QUIRK_DSM_PAGEALIGN)) {
+		range = kmalloc_array(segments, sizeof(*range),
+					GFP_ATOMIC | __GFP_NOWARN);
+	} else {
+		/* Device with quirk: use (page aligned) discard_page */
+		range = NULL;
+	}
  	if (!range) {
  		/*
  		 * If we fail allocation our range, fallback to the controller
--- linux-5.3.7/drivers/nvme/host/nvme.h.orig	2019-11-10 
18:16:39.097549037 +0100
+++ linux-5.3.7/drivers/nvme/host/nvme.h	2019-11-10 18:19:26.084893558 +0100
@@ -97,6 +97,11 @@ enum nvme_quirks {
  	 * Force simple suspend/resume path.
  	 */
  	NVME_QUIRK_SIMPLE_SUSPEND		= (1 << 10),
+
+	/*
+	 * Workaround for devices reading DSM beyond end of page.
+	 */
+	NVME_QUIRK_DSM_PAGEALIGN		= (1 << 11),
  };

  /*

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH v2] Workaround for discard on non-conformant nvme devices
  2019-11-10 18:27 [RFC PATCH v2] Workaround for discard on non-conformant nvme devices Eduard Hasenleithner
@ 2019-11-11 10:28 ` Christoph Hellwig
  2019-11-11 18:58 ` Keith Busch
  1 sibling, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2019-11-11 10:28 UTC (permalink / raw)
  To: Eduard Hasenleithner; +Cc: Keith Busch, linux-nvme

> +	if (!(ns->ctrl->quirks & NVME_QUIRK_DSM_PAGEALIGN)) {
> +		range = kmalloc_array(segments, sizeof(*range),
> +					GFP_ATOMIC | __GFP_NOWARN);
> +	} else {
> +		/* Device with quirk: use (page aligned) discard_page */
> +		range = NULL;
> +	}

Nitpick: Try to avoid pointless negations in conditionals.  Also I think
that comment needs to be expanded a bit to explain why we use the range.
That fact that we use it is fairly obvious from the code.

Also please add the quirk to the pci id table so that it actually gets
used.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH v2] Workaround for discard on non-conformant nvme devices
  2019-11-10 18:27 [RFC PATCH v2] Workaround for discard on non-conformant nvme devices Eduard Hasenleithner
  2019-11-11 10:28 ` Christoph Hellwig
@ 2019-11-11 18:58 ` Keith Busch
  2019-11-11 22:10   ` Eduard Hasenleithner
  1 sibling, 1 reply; 4+ messages in thread
From: Keith Busch @ 2019-11-11 18:58 UTC (permalink / raw)
  To: Eduard Hasenleithner; +Cc: linux-nvme

On Sun, Nov 10, 2019 at 07:27:08PM +0100, Eduard Hasenleithner wrote:
> +++ linux-5.3.7/drivers/nvme/host/core.c	2019-11-10 19:11:49.419372363 +0100
> @@ -562,8 +562,13 @@ static blk_status_t nvme_setup_discard(s
>  	struct nvme_dsm_range *range;
>  	struct bio *bio;
> 
> -	range = kmalloc_array(segments, sizeof(*range),
> -				GFP_ATOMIC | __GFP_NOWARN);
> +	if (!(ns->ctrl->quirks & NVME_QUIRK_DSM_PAGEALIGN)) {
> +		range = kmalloc_array(segments, sizeof(*range),
> +					GFP_ATOMIC | __GFP_NOWARN);
> +	} else {
> +		/* Device with quirk: use (page aligned) discard_page */
> +		range = NULL;
> +	}

Please initialize 'range' to NULL so we don't need the 'else' case.

> @@ -97,6 +97,11 @@ enum nvme_quirks {
>  	 * Force simple suspend/resume path.
>  	 */
>  	NVME_QUIRK_SIMPLE_SUSPEND		= (1 << 10),
> +
> +	/*
> +	 * Workaround for devices reading DSM beyond end of page.
> +	 */
> +	NVME_QUIRK_DSM_PAGEALIGN		= (1 << 11),

Let's get a better description of this. My recommendation:

	For devices that do not consider the DSM 'Number of Ranges'
	field when determining how much data to DMA. Page aligned and
	sized is always sufficient as that is the largest a DSM range
	list can be.

And let's add the known devices to the nvme pci_driver that require
this.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH v2] Workaround for discard on non-conformant nvme devices
  2019-11-11 18:58 ` Keith Busch
@ 2019-11-11 22:10   ` Eduard Hasenleithner
  0 siblings, 0 replies; 4+ messages in thread
From: Eduard Hasenleithner @ 2019-11-11 22:10 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme

On 11.11.19 19:58, Keith Busch wrote:
> Please initialize 'range' to NULL so we don't need the 'else' case.
	...
> Let's get a better description of this. My recommendation:
> 
> 	For devices that do not consider the DSM 'Number of Ranges'
> 	field when determining how much data to DMA. Page aligned and
> 	sized is always sufficient as that is the largest a DSM range
> 	list can be.
> 
> And let's add the known devices to the nvme pci_driver that require
> this.


Thanks for the input. I've considered this in the next version: PATCHv3.


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-11-11 22:10 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-10 18:27 [RFC PATCH v2] Workaround for discard on non-conformant nvme devices Eduard Hasenleithner
2019-11-11 10:28 ` Christoph Hellwig
2019-11-11 18:58 ` Keith Busch
2019-11-11 22:10   ` Eduard Hasenleithner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).