Linux-NVME Archive on lore.kernel.org
 help / color / Atom feed
* [PATCHv4] Workaround for discard on non-conformant nvme devices
@ 2019-11-12 20:55 Eduard Hasenleithner
  2019-11-12 21:51 ` Keith Busch
  0 siblings, 1 reply; 2+ messages in thread
From: Eduard Hasenleithner @ 2019-11-12 20:55 UTC (permalink / raw)
  To: linux-nvme; +Cc: Keith Busch

As documented in https://bugzilla.kernel.org/show_bug.cgi?id=202665 there are lots of Linux nvme users which get IO-MMU related errors when performing discard on nvme. So far analysis suggests that the errors are caused by non-conformant nvme devices which are reading beyond the end of the buffer containing the segments to be discarded.

Until now two different variants of this behavior have been observed: SM22XX controllers round up the read size to a multiple of 512 bytes. Phison (E12) are found to unconditionally read the maximum amount of discard segment data allowed by the spec (256 segments or 4kB). In case the start address of the buffer is not aligned (512B or 4kB) and the actual data fits in the remainder of the buffer page it may cause the controller to read a page which is not mapped on IO-MMU.

The patch changes the nvme_setup_discard function to unconditionally allocate memory for a segment array of maximum size (256 segments). This prevents the nvme from reading beyond the end of the IO-MMU mapped buffer.

Signed-off-by: Eduard Hasenleithner <eduard@hasenleithner.at>

--- linux-5.3.7/drivers/nvme/host/core.c.orig	2019-11-12 20:42:16.394800789 +0100
+++ linux-5.3.7/drivers/nvme/host/core.c	2019-11-12 21:23:07.635266361 +0100
@@ -555,15 +555,22 @@ static inline void nvme_setup_flush(stru
 	cmnd->common.nsid = cpu_to_le32(ns->head->ns_id);
 }
 
+#define DSM_SEGMENTS_MAX 256
+
 static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 		struct nvme_command *cmnd)
 {
 	unsigned short segments = blk_rq_nr_discard_segments(req), n = 0;
 	struct nvme_dsm_range *range;
 	struct bio *bio;
+	/* Some devices do not consider the DSM 'Number of Ranges'
+	 * field when determining how much data to DMA. Always allocate
+	 * memory for maximum number of segments to prevent device
+	 * reading beyond end of buffer. */
+	static const size_t alloc_size = sizeof *range * DSM_SEGMENTS_MAX;
 
-	range = kmalloc_array(segments, sizeof(*range),
-				GFP_ATOMIC | __GFP_NOWARN);
+	WARN_ON(segments > DSM_SEGMENTS_MAX);
+	range = kzalloc(alloc_size, GFP_ATOMIC | __GFP_NOWARN);
 	if (!range) {
 		/*
 		 * If we fail allocation our range, fallback to the controller
@@ -603,7 +610,7 @@ static blk_status_t nvme_setup_discard(s
 
 	req->special_vec.bv_page = virt_to_page(range);
 	req->special_vec.bv_offset = offset_in_page(range);
-	req->special_vec.bv_len = sizeof(*range) * segments;
+	req->special_vec.bv_len = alloc_size;
 	req->rq_flags |= RQF_SPECIAL_PAYLOAD;
 
 	return BLK_STS_OK;

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCHv4] Workaround for discard on non-conformant nvme devices
  2019-11-12 20:55 [PATCHv4] Workaround for discard on non-conformant nvme devices Eduard Hasenleithner
@ 2019-11-12 21:51 ` Keith Busch
  0 siblings, 0 replies; 2+ messages in thread
From: Keith Busch @ 2019-11-12 21:51 UTC (permalink / raw)
  To: Eduard Hasenleithner; +Cc: linux-nvme

On Tue, Nov 12, 2019 at 09:55:01PM +0100, Eduard Hasenleithner wrote:
> As documented in https://bugzilla.kernel.org/show_bug.cgi?id=202665 there are lots of Linux nvme users which get IO-MMU related errors when performing discard on nvme. So far analysis suggests that the errors are caused by non-conformant nvme devices which are reading beyond the end of the buffer containing the segments to be discarded.
> 
> Until now two different variants of this behavior have been observed: SM22XX controllers round up the read size to a multiple of 512 bytes. Phison (E12) are found to unconditionally read the maximum amount of discard segment data allowed by the spec (256 segments or 4kB). In case the start address of the buffer is not aligned (512B or 4kB) and the actual data fits in the remainder of the buffer page it may cause the controller to read a page which is not mapped on IO-MMU.
> 
> The patch changes the nvme_setup_discard function to unconditionally allocate memory for a segment array of maximum size (256 segments). This prevents the nvme from reading beyond the end of the IO-MMU mapped buffer.
> 
> Signed-off-by: Eduard Hasenleithner <eduard@hasenleithner.at>

Just a few style comments on kernel guidelines. I won't ask you to
resubmit a patch since I steered you in the wrong direction earlier, so
I modified these points inline with the commit. Please take a look here:

  http://git.infradead.org/nvme.git/commitdiff/530436c45ef2e446c12538a400e465929a0b3ade?hp=400b6a7b13a3fd71cff087139ce45dd1e5fff444

First, changelogs need to be word wrapped at 72 characters and use
imperative tone:

  https://www.kernel.org/doc/html/v5.3/process/submitting-patches.html
 
> --- linux-5.3.7/drivers/nvme/host/core.c.orig	2019-11-12 20:42:16.394800789 +0100
> +++ linux-5.3.7/drivers/nvme/host/core.c	2019-11-12 21:23:07.635266361 +0100
> @@ -555,15 +555,22 @@ static inline void nvme_setup_flush(stru
>  	cmnd->common.nsid = cpu_to_le32(ns->head->ns_id);
>  }
>  
> +#define DSM_SEGMENTS_MAX 256

We actually already have a #define NVME_DSM_MAX_RANGES

>  static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
>  		struct nvme_command *cmnd)
>  {
>  	unsigned short segments = blk_rq_nr_discard_segments(req), n = 0;
>  	struct nvme_dsm_range *range;
>  	struct bio *bio;
> +	/* Some devices do not consider the DSM 'Number of Ranges'
> +	 * field when determining how much data to DMA. Always allocate
> +	 * memory for maximum number of segments to prevent device
> +	 * reading beyond end of buffer. */

Multline comments style look like this:

  /*
   * This is a multiline comment according to the kernel documentation
   * at https://www.kernel.org/doc/html/v5.3/process/coding-style.html
   */

> +	static const size_t alloc_size = sizeof *range * DSM_SEGMENTS_MAX;

The 'sizeof' always uses the () in linux kernel coding style.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, back to index

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-12 20:55 [PATCHv4] Workaround for discard on non-conformant nvme devices Eduard Hasenleithner
2019-11-12 21:51 ` Keith Busch

Linux-NVME Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nvme/0 linux-nvme/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nvme linux-nvme/ https://lore.kernel.org/linux-nvme \
		linux-nvme@lists.infradead.org
	public-inbox-index linux-nvme

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-nvme


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git