From: Eduard Hasenleithner <eduard@hasenleithner.at>
To: linux-nvme@lists.infradead.org
Subject: [RFC PATCH] Workaround for discard on non-conformant nvme devices
Date: Mon, 4 Nov 2019 22:47:34 +0100
Message-ID: <f220c69a-793d-9160-4f20-921c52748009@hasenleithner.at>

As documented in https://bugzilla.kernel.org/show_bug.cgi?id=202665
there are many Linux nvme users who get IOMMU-related errors when
performing discard on nvme devices. Analysis so far suggests that the
errors are caused by non-conformant nvme devices which read beyond the
end of the buffer containing the segments to be discarded.

So far two variants of this behavior have been observed:
The controller found on an Intel 660p always reads a multiple of 512
bytes. If the last chunk extends past the end of a page, it continues
with the subsequent page. For a Corsair MP510 the situation is even
worse: the controller always reads a full page (4096 bytes). When the
address is not aligned to 4096, it continues reading at the address
given in PRP2 (which is 0 most of the time).

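To make the two variants concrete, here is a minimal userspace sketch of
the over-read arithmetic. It only models the behavior described above and
is not driver code: the helper names are made up for illustration, and
the 16-byte range size (sizeof(struct nvme_dsm_range)) and the 4096-byte
page size are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ   4096u  /* page size assumed throughout this mail */
#define RANGE_SZ  16u    /* assumed sizeof(struct nvme_dsm_range) */

/* Round len up to a multiple of a power-of-two granule. */
static inline size_t round_up_pow2(size_t len, size_t granule)
{
	assert(granule != 0 && (granule & (granule - 1)) == 0);
	return (len + granule - 1) & ~(granule - 1);
}

/* Hypothetical model of a 660p-like controller: fetches a multiple of 512. */
static inline size_t read_len_512(unsigned int segments)
{
	return round_up_pow2((size_t)segments * RANGE_SZ, 512);
}

/* Hypothetical model of an MP510-like controller: always fetches a full page. */
static inline size_t read_len_page(unsigned int segments)
{
	(void)segments;	/* the fetch size does not depend on the segment count */
	return PAGE_SZ;
}

/* Nonzero if a fetch of read_len bytes starting at page_offset
 * runs past the end of the page it starts in. */
static inline int crosses_page(size_t page_offset, size_t read_len)
{
	return page_offset % PAGE_SZ + read_len > PAGE_SZ;
}
```

In this model a single 16-byte range placed at page offset 3840 makes the
512-byte fetch run into the next page, while the same range at offset
3584 does not. The full-page fetch crosses the boundary for every offset
other than 0.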
This patch makes the nvme_setup_discard function always request a
multiple of the page size (4096 bytes) from the kernel for storing the
segment array. Since this makes the buffer always page-aligned, the
device no longer reads beyond the end of a page.

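As a rough illustration of the size computation, the following userspace
sketch mirrors the round_up() call in the patch. The function name
dsm_aligned_size and the 16-byte range size are assumptions made for the
example. It covers only the requested size; whether the returned buffer
is actually page-aligned depends on the allocator and is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define DSM_RANGE_SIZE 16u    /* assumed sizeof(struct nvme_dsm_range) */
#define ALIGN_BYTES    4096u  /* allocation granule used by the patch */

/* Bytes to request for `segments` discard ranges, rounded up to a
 * multiple of ALIGN_BYTES (hypothetical helper, not a kernel function). */
static inline size_t dsm_aligned_size(unsigned int segments)
{
	size_t bytes = (size_t)segments * DSM_RANGE_SIZE;

	assert(segments > 0);
	return (bytes + ALIGN_BYTES - 1) & ~((size_t)ALIGN_BYTES - 1);
}
```

With these assumptions any request of 1 to 256 segments asks for exactly
one 4096-byte buffer, and 257 segments would ask for 8192 bytes.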
The patch is based on the linux-5.3.7 tarball. Note: the patch itself
is not tested yet; for my tests some time ago I just hard-coded 256
segments. For now this email is meant to inform the nvme kernel
developers about the topic.

Signed-off-by: Eduard Hasenleithner <eduard@hasenleithner.at>
--- drivers/nvme/host/core.c.orig 2019-11-04 21:53:20.758837001 +0100
+++ drivers/nvme/host/core.c 2019-11-04 22:05:54.409415849 +0100
@@ -561,9 +561,9 @@ static blk_status_t nvme_setup_discard(s
 	unsigned short segments = blk_rq_nr_discard_segments(req), n = 0;
 	struct nvme_dsm_range *range;
 	struct bio *bio;
+	size_t aligned_size = round_up(sizeof *range * segments, 4096);
 
-	range = kmalloc_array(segments, sizeof(*range),
-				GFP_ATOMIC | __GFP_NOWARN);
+	range = kmalloc(aligned_size, GFP_ATOMIC | __GFP_NOWARN);
 	if (!range) {
 		/*
 		 * If we fail allocation our range, fallback to the controller