All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nvme: Limit command retries
@ 2016-06-20 17:44 Keith Busch
  2016-06-22 10:30 ` Sagi Grimberg
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Keith Busch @ 2016-06-20 17:44 UTC (permalink / raw)


Many controller implementations will return errors to commands that will
not succeed, but without the DNR bit set. The driver previously retried
these commands an unlimited number of times until the command timeout
has exceeded, which takes an unnecessarilly long period of time.

This patch limits the number of retries a command can have, defaulting
to 5, but is user tunable at load or runtime.

The struct request's 'retries' field is used to track the number of
retries attempted. This is in contrast with scsi's use of this field,
which indicates how many retries are allowed.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/core.c | 5 +++++
 drivers/nvme/host/nvme.h | 5 ++++-
 drivers/nvme/host/pci.c  | 6 ++++++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9d7cee4..2b2eca9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -47,6 +47,11 @@ unsigned char shutdown_timeout = 5;
 module_param(shutdown_timeout, byte, 0644);
 MODULE_PARM_DESC(shutdown_timeout, "timeout in seconds for controller shutdown");
 
+unsigned int nvme_max_retries = 5;
+module_param_named(max_retries, nvme_max_retries, uint, 0644);
+MODULE_PARM_DESC(max_retries, "max number of retries a command may have");
+EXPORT_SYMBOL_GPL(nvme_max_retries);
+
 static int nvme_major;
 module_param(nvme_major, int, 0);
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 282421f..8ab141d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -38,6 +38,8 @@ extern unsigned char admin_timeout;
 extern unsigned char shutdown_timeout;
 #define SHUTDOWN_TIMEOUT	(shutdown_timeout * HZ)
 
+extern unsigned int nvme_max_retries;
+
 enum {
 	NVME_NS_LBA		= 0,
 	NVME_NS_LIGHTNVM	= 1,
@@ -204,7 +206,8 @@ static inline int nvme_error_status(u16 status)
 static inline bool nvme_req_needs_retry(struct request *req, u16 status)
 {
 	return !(status & NVME_SC_DNR || blk_noretry_request(req)) &&
-		(jiffies - req->start_time) < req->timeout;
+		(jiffies - req->start_time) < req->timeout &&
+		req->retries < nvme_max_retries;
 }
 
 void nvme_cancel_request(struct request *req, void *data, bool reserved);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6d33f77..a2b51ea 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -310,6 +310,11 @@ static int nvme_init_iod(struct request *rq, unsigned size,
 	iod->npages = -1;
 	iod->nents = 0;
 	iod->length = size;
+
+	if (!(rq->cmd_flags & REQ_DONTPREP)) {
+		rq->retries = 0;
+		rq->cmd_flags |= REQ_DONTPREP;
+	}
 	return 0;
 }
 
@@ -623,6 +628,7 @@ static void nvme_complete_rq(struct request *req)
 
 	if (unlikely(req->errors)) {
 		if (nvme_req_needs_retry(req, req->errors)) {
+			req->retries++;
 			nvme_requeue_req(req);
 			return;
 		}
-- 
2.7.2

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH] nvme: Limit command retries
  2016-06-20 17:44 [PATCH] nvme: Limit command retries Keith Busch
@ 2016-06-22 10:30 ` Sagi Grimberg
  2016-06-22 14:40   ` Keith Busch
  2016-06-22 16:14 ` Christoph Hellwig
  2016-07-12 23:21 ` Jens Axboe
  2 siblings, 1 reply; 7+ messages in thread
From: Sagi Grimberg @ 2016-06-22 10:30 UTC (permalink / raw)



> Many controller implementations will return errors to commands that will
> not succeed, but without the DNR bit set. The driver previously retried
> these commands an unlimited number of times until the command timeout
> has exceeded, which takes an unnecessarilly long period of time.
>
> This patch limits the number of retries a command can have, defaulting
> to 5, but is user tunable at load or runtime.
>
> The struct request's 'retries' field is used to track the number of
> retries attempted. This is in contrast with scsi's use of this field,
> which indicates how many retries are allowed.

Why not follow the same scheme?

>
> Signed-off-by: Keith Busch <keith.busch at intel.com>
> ---
>   drivers/nvme/host/core.c | 5 +++++
>   drivers/nvme/host/nvme.h | 5 ++++-
>   drivers/nvme/host/pci.c  | 6 ++++++
>   3 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 9d7cee4..2b2eca9 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -47,6 +47,11 @@ unsigned char shutdown_timeout = 5;
>   module_param(shutdown_timeout, byte, 0644);
>   MODULE_PARM_DESC(shutdown_timeout, "timeout in seconds for controller shutdown");
>
> +unsigned int nvme_max_retries = 5;
> +module_param_named(max_retries, nvme_max_retries, uint, 0644);
> +MODULE_PARM_DESC(max_retries, "max number of retries a command may have");
> +EXPORT_SYMBOL_GPL(nvme_max_retries);

This looks like its a per-controller attribute, it might be better to
make nvme-cli pass this via the vendor extensions? This way we can have
it on a per-controller basis.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] nvme: Limit command retries
  2016-06-22 10:30 ` Sagi Grimberg
@ 2016-06-22 14:40   ` Keith Busch
  0 siblings, 0 replies; 7+ messages in thread
From: Keith Busch @ 2016-06-22 14:40 UTC (permalink / raw)


On Wed, Jun 22, 2016@01:30:32PM +0300, Sagi Grimberg wrote:
> >The struct request's 'retries' field is used to track the number of
> >retries attempted. This is in contrast with scsi's use of this field,
> >which indicates how many retries are allowed.
> 
> Why not follow the same scheme?

It's not documented to mean what scsi thinks it means, and it saves some
space from yet another field to track how many retries have occured. For
the pci driver, that field could be added to struct nvme_iod, but
that's not used for fabrics. Using req->retries can be consistent with
all nvme transports.

> >+unsigned int nvme_max_retries = 5;
> >+module_param_named(max_retries, nvme_max_retries, uint, 0644);
> >+MODULE_PARM_DESC(max_retries, "max number of retries a command may have");
> >+EXPORT_SYMBOL_GPL(nvme_max_retries);
> 
> This looks like its a per-controller attribute, it might be better to
> make nvme-cli pass this via the vendor extensions? This way we can have
> it on a per-controller basis.

If we need per-controller, I'd like a sysfs attribute. nvme-cli mostly
does ioctl's, but it also does some sysfs stuff too.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] nvme: Limit command retries
  2016-06-20 17:44 [PATCH] nvme: Limit command retries Keith Busch
  2016-06-22 10:30 ` Sagi Grimberg
@ 2016-06-22 16:14 ` Christoph Hellwig
  2016-06-22 17:01   ` Keith Busch
  2016-07-12 23:21 ` Jens Axboe
  2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2016-06-22 16:14 UTC (permalink / raw)


I've got some work in progress that gets rid of the SCSI-specific
fields in struct request and moves them into a scsi specific private
data structure, and that also does the same for NVMe.

How urgent is this patch?  If it fixes an actual major issue seen
in the field we should probably but the ->retries use in as-is,
otherwise I'd prefer to avoid introducing more users of it for now.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] nvme: Limit command retries
  2016-06-22 16:14 ` Christoph Hellwig
@ 2016-06-22 17:01   ` Keith Busch
  2016-07-10  9:47     ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Keith Busch @ 2016-06-22 17:01 UTC (permalink / raw)


On Wed, Jun 22, 2016@09:14:42AM -0700, Christoph Hellwig wrote:
> How urgent is this patch?  If it fixes an actual major issue seen
> in the field we should probably but the ->retries use in as-is,
> otherwise I'd prefer to avoid introducing more users of it for now.

Cool, thanks. This is not an urgent issue for us, but was hoping to make
the 4.8 window. If we can flush something better out, I'm happy to wait.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] nvme: Limit command retries
  2016-06-22 17:01   ` Keith Busch
@ 2016-07-10  9:47     ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2016-07-10  9:47 UTC (permalink / raw)


On Wed, Jun 22, 2016@01:01:22PM -0400, Keith Busch wrote:
> On Wed, Jun 22, 2016@09:14:42AM -0700, Christoph Hellwig wrote:
> > How urgent is this patch?  If it fixes an actual major issue seen
> > in the field we should probably but the ->retries use in as-is,
> > otherwise I'd prefer to avoid introducing more users of it for now.
> 
> Cool, thanks. This is not an urgent issue for us, but was hoping to make
> the 4.8 window. If we can flush something better out, I'm happy to wait.

Given that even the prep patches for that work aren't merged yet
they aren't likely to make it for 4.8.

Based on that I think we should get this patch in for the merge window:

Reviewed-by: Christoph Hellwig <hch at lst.de>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] nvme: Limit command retries
  2016-06-20 17:44 [PATCH] nvme: Limit command retries Keith Busch
  2016-06-22 10:30 ` Sagi Grimberg
  2016-06-22 16:14 ` Christoph Hellwig
@ 2016-07-12 23:21 ` Jens Axboe
  2 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2016-07-12 23:21 UTC (permalink / raw)


On 06/20/2016 10:44 AM, Keith Busch wrote:
> Many controller implementations will return errors to commands that will
> not succeed, but without the DNR bit set. The driver previously retried
> these commands an unlimited number of times until the command timeout
> has exceeded, which takes an unnecessarilly long period of time.
>
> This patch limits the number of retries a command can have, defaulting
> to 5, but is user tunable at load or runtime.
>
> The struct request's 'retries' field is used to track the number of
> retries attempted. This is in contrast with scsi's use of this field,
> which indicates how many retries are allowed.

Added for 4.8, thanks.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-07-12 23:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-20 17:44 [PATCH] nvme: Limit command retries Keith Busch
2016-06-22 10:30 ` Sagi Grimberg
2016-06-22 14:40   ` Keith Busch
2016-06-22 16:14 ` Christoph Hellwig
2016-06-22 17:01   ` Keith Busch
2016-07-10  9:47     ` Christoph Hellwig
2016-07-12 23:21 ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.