Linux-NVME Archive on lore.kernel.org
* [RFC PATCH] Workaround for discard on non-conformant nvme devices
@ 2019-11-04 21:47 Eduard Hasenleithner
  2019-11-06 16:52 ` Sagi Grimberg
  0 siblings, 1 reply; 7+ messages in thread
From: Eduard Hasenleithner @ 2019-11-04 21:47 UTC (permalink / raw)
  To: linux-nvme

As documented in https://bugzilla.kernel.org/show_bug.cgi?id=202665,
many Linux NVMe users get IOMMU-related errors when performing discard
on NVMe devices. Analysis so far suggests that the errors are caused by
non-conformant NVMe devices that read beyond the end of the buffer
containing the segments to be discarded.

So far, two variants of this behavior have been observed. The
controller found on an Intel 660p always reads a multiple of 512 bytes;
if the last chunk crosses a page boundary, it continues with the
subsequent page. For a Corsair MP510 the situation is even worse: the
controller always reads a full page (4096 bytes). When the buffer
address is not aligned to 4096, it continues reading at the address
given in PRP2 (which is 0 most of the time).
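
To make the two read patterns concrete, here is a small userspace
sketch that models them as described above. It is only an illustration
of the arithmetic, not the actual controller behavior; the function
names and the MODEL_PAGE_SIZE constant are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size, as in the description */

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* 660p-style model: bytes read past the end of a len-byte buffer. */
static uint64_t overread_512(uint64_t len)
{
	return round_up_pow2(len, 512) - len;
}

/* 660p-style model: does the 512-multiple read spill into the next page? */
static int crosses_page_512(uint64_t addr, uint64_t len)
{
	uint64_t end = addr + round_up_pow2(len, 512);

	return len != 0 &&
	       (addr / MODEL_PAGE_SIZE) != ((end - 1) / MODEL_PAGE_SIZE);
}

/*
 * MP510-style model: a full-page read starting at addr covers
 * MODEL_PAGE_SIZE - (addr % MODEL_PAGE_SIZE) bytes of the first page;
 * the remainder would be fetched from the address in PRP2.
 */
static uint64_t prp2_bytes_fullpage(uint64_t addr)
{
	return addr % MODEL_PAGE_SIZE;
}
```

For a single 16-byte range placed in the last 16 bytes of a page (for
instance at address 0x1ff0), the first model reads 496 bytes too many
and spills into the next page, while the second model would fetch 4080
bytes via PRP2. With a page-aligned buffer both models stay inside the
page.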

This patch makes the nvme_setup_discard function always request a
multiple of the page size (4096 bytes) from the kernel for storing the
segment array. Since this makes the buffer always page-aligned, the
device no longer reads beyond the end of a page.
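
The size rounding can be sketched in userspace as follows. This is a
model only: dsm_range_model mirrors the 16-byte layout of a DSM range
entry, the function names are hypothetical, and aligned_alloc() stands
in for the kernel allocator so that the page alignment is explicit
rather than assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-in for one 16-byte DSM range entry. */
struct dsm_range_model {
	uint32_t cattr;	/* context attributes */
	uint32_t nlb;	/* number of logical blocks */
	uint64_t slba;	/* starting LBA */
};

/* Size of the segment array, padded to a multiple of 4096 bytes. */
static size_t padded_range_bytes(unsigned short segments)
{
	size_t raw = sizeof(struct dsm_range_model) * segments;

	return (raw + 4095) & ~(size_t)4095;
}

/*
 * Allocate a page-aligned, page-padded segment array.
 * Returns NULL for zero segments or on allocation failure;
 * the caller releases the buffer with free().
 */
static struct dsm_range_model *alloc_range_model(unsigned short segments)
{
	size_t size = padded_range_bytes(segments);

	if (size == 0)
		return NULL;
	return aligned_alloc(4096, size);
}
```

With 16 bytes per entry, any request of 1 to 256 segments is padded to
exactly 4096 bytes, which matches the hard-coded 256 segments used in
the earlier tests.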

The patch is based on the linux-5.3.7 tarball. Note: the patch itself
has not been tested yet; for my tests some time ago I just hard-coded
256 segments. For now, this email is meant to inform the NVMe kernel
developers about the topic.

Signed-off-by: Eduard Hasenleithner <eduard@hasenleithner.at>

--- drivers/nvme/host/core.c.orig	2019-11-04 21:53:20.758837001 +0100
+++ drivers/nvme/host/core.c	2019-11-04 22:05:54.409415849 +0100
@@ -561,9 +561,9 @@ static blk_status_t nvme_setup_discard(s
  	unsigned short segments = blk_rq_nr_discard_segments(req), n = 0;
  	struct nvme_dsm_range *range;
  	struct bio *bio;
+	size_t aligned_size = round_up(sizeof *range * segments, 4096);

-	range = kmalloc_array(segments, sizeof(*range),
-				GFP_ATOMIC | __GFP_NOWARN);
+	range = kmalloc(aligned_size, GFP_ATOMIC | __GFP_NOWARN);
  	if (!range) {
  		/*
  		 * If we fail allocation our range, fallback to the controller



