linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/2] Adding per-controller timeout support to nvme
@ 2019-04-03 12:35 Maximilian Heyne
  2019-04-03 12:35 ` [PATCH v2 1/2] nvme: add per-controller io and admin timeouts Maximilian Heyne
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Maximilian Heyne @ 2019-04-03 12:35 UTC (permalink / raw)
  Cc: David Woodhouse, Amit Shah, Maximilian Heyne, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, James Smart,
	linux-nvme, linux-kernel

As different nvme controllers are connected via different fabrics, some
require different timeout settings than others. This series implements
per-controller timeouts in the nvme subsystem, which can be set via sysfs.
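
As a rough illustration of the intended usage (the attribute names come from
patch 2/2; the exact sysfs path and the controller name in this sketch are
assumptions), a userspace tool could set one controller's io timeout, in
seconds, along these lines:

  #include <stdio.h>

  /* Minimal sketch: path and controller name are assumptions, and the
   * io_timeout attribute only exists with patch 2/2 of this series. */
  int main(void)
  {
          FILE *f = fopen("/sys/class/nvme/nvme0/io_timeout", "w");

          if (!f) {
                  perror("io_timeout");
                  return 1;
          }
          fprintf(f, "90\n");     /* value is interpreted in seconds */
          fclose(f);
          return 0;
  }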

We have reached out to the NVMe working group to implement per-controller
timeout values. These patches are paving the way for this.

Changes since v1:
- implement the change not only for the pci NVMe driver but also for fc,
  lightnvm, rdma, tcp and loop.
- add an additional check when updating timeouts so that updates do not race
  with controller creation or deletion

Maximilian Heyne (2):
  nvme: add per-controller io and admin timeouts
  nvme: add sysfs controls for io and admin timeouts

 drivers/nvme/host/core.c     | 123 +++++++++++++++++++++++++++++++++++++++++--
 drivers/nvme/host/fc.c       |   2 +-
 drivers/nvme/host/lightnvm.c |   2 +-
 drivers/nvme/host/nvme.h     |   2 +
 drivers/nvme/host/pci.c      |  13 ++---
 drivers/nvme/host/rdma.c     |   4 +-
 drivers/nvme/host/tcp.c      |   4 +-
 drivers/nvme/target/loop.c   |   4 +-
 8 files changed, 136 insertions(+), 18 deletions(-)

-- 
2.16.5


* [PATCH v2 1/2] nvme: add per-controller io and admin timeouts
  2019-04-03 12:35 [PATCH v2 0/2] Adding per-controller timeout support to nvme Maximilian Heyne
@ 2019-04-03 12:35 ` Maximilian Heyne
  2019-04-03 12:35 ` [PATCH v2 2/2] nvme: add sysfs controls for " Maximilian Heyne
  2019-04-24 16:55 ` [PATCH v2 0/2] Adding per-controller timeout support to nvme Sagi Grimberg
  2 siblings, 0 replies; 10+ messages in thread
From: Maximilian Heyne @ 2019-04-03 12:35 UTC (permalink / raw)
  Cc: David Woodhouse, Amit Shah, Maximilian Heyne, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, James Smart,
	linux-nvme, linux-kernel

Some NVMe controllers require io and admin timeouts that differ from the
defaults, for instance local vs. remote storage.

This patch adds per-controller admin and io timeouts to the nvme_ctrl
structure and replaces all usages of the module parameters in the NVMe
drivers with the per-device timeouts.

Original-patch-by: Milan Pandurov <milanpa@amazon.com>
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
---
v2: use per-controller timeout in other nvme drivers too

 drivers/nvme/host/core.c     | 11 +++++++----
 drivers/nvme/host/fc.c       |  2 +-
 drivers/nvme/host/lightnvm.c |  2 +-
 drivers/nvme/host/nvme.h     |  2 ++
 drivers/nvme/host/pci.c      | 13 +++++++------
 drivers/nvme/host/rdma.c     |  4 ++--
 drivers/nvme/host/tcp.c      |  4 ++--
 drivers/nvme/target/loop.c   |  4 ++--
 8 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 470601980794..d0530bf7a677 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -779,7 +779,7 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
+	req->timeout = timeout ? timeout : q->rq_timeout;
 
 	if (buffer && bufflen) {
 		ret = blk_rq_map_kern(q, req, buffer, bufflen, GFP_KERNEL);
@@ -862,7 +862,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	req->timeout = timeout ? timeout : ADMIN_TIMEOUT;
+	req->timeout = timeout ? timeout : q->rq_timeout;
 	nvme_req(req)->flags |= NVME_REQ_USERCMD;
 
 	if (ubuffer && bufflen) {
@@ -1800,7 +1800,8 @@ int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len,
 	cmd.common.cdw11 = cpu_to_le32(len);
 
 	return __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, buffer, len,
-				      ADMIN_TIMEOUT, NVME_QID_ANY, 1, 0, false);
+				      ctrl->admin_timeout, NVME_QID_ANY, 1, 0,
+				      false);
 }
 EXPORT_SYMBOL_GPL(nvme_sec_submit);
 #endif /* CONFIG_BLK_SED_OPAL */
@@ -3575,7 +3576,7 @@ static void nvme_fw_act_work(struct work_struct *work)
 				msecs_to_jiffies(ctrl->mtfa * 100);
 	else
 		fw_act_timeout = jiffies +
-				msecs_to_jiffies(admin_timeout * 1000);
+		    msecs_to_jiffies((ctrl->admin_timeout / HZ) * 1000);
 
 	nvme_stop_queues(ctrl);
 	while (nvme_ctrl_pp_status(ctrl)) {
@@ -3721,6 +3722,8 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	ctrl->dev = dev;
 	ctrl->ops = ops;
 	ctrl->quirks = quirks;
+	ctrl->io_timeout = NVME_IO_TIMEOUT;
+	ctrl->admin_timeout = ADMIN_TIMEOUT;
 	INIT_WORK(&ctrl->scan_work, nvme_scan_work);
 	INIT_WORK(&ctrl->async_event_work, nvme_async_event_work);
 	INIT_WORK(&ctrl->fw_act_work, nvme_fw_act_work);
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index f3b9d91ba0df..09614a214bed 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2431,7 +2431,7 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
 			    ctrl->lport->ops->fcprqst_priv_sz);
 	ctrl->tag_set.driver_data = ctrl;
 	ctrl->tag_set.nr_hw_queues = ctrl->ctrl.queue_count - 1;
-	ctrl->tag_set.timeout = NVME_IO_TIMEOUT;
+	ctrl->tag_set.timeout = ctrl->ctrl.io_timeout;
 
 	ret = blk_mq_alloc_tag_set(&ctrl->tag_set);
 	if (ret)
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 949e29e1d782..b9160123b53c 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -789,7 +789,7 @@ static int nvme_nvm_submit_user_cmd(struct request_queue *q,
 		goto err_cmd;
 	}
 
-	rq->timeout = timeout ? timeout : ADMIN_TIMEOUT;
+	rq->timeout = timeout ? timeout : ns->ctrl->admin_timeout;
 
 	if (ppa_buf && ppa_len) {
 		ppa_list = dma_pool_alloc(dev->dma_pool, GFP_KERNEL, &ppa_dma);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 527d64545023..1397650edfda 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -201,6 +201,8 @@ struct nvme_ctrl {
 	u32 aen_result;
 	u32 ctratt;
 	unsigned int shutdown_timeout;
+	unsigned int io_timeout;
+	unsigned int admin_timeout;
 	unsigned int kato;
 	bool subsystem;
 	unsigned long quirks;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a90cf5d63aac..16d7a00fecf0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1357,7 +1357,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		return BLK_EH_RESET_TIMER;
 	}
 
-	abort_req->timeout = ADMIN_TIMEOUT;
+	abort_req->timeout = dev->ctrl.admin_timeout;
 	abort_req->end_io_data = NULL;
 	blk_execute_rq_nowait(abort_req->q, NULL, abort_req, 0, abort_endio);
 
@@ -1637,7 +1637,7 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 		dev->admin_tagset.nr_hw_queues = 1;
 
 		dev->admin_tagset.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
-		dev->admin_tagset.timeout = ADMIN_TIMEOUT;
+		dev->admin_tagset.timeout = dev->ctrl.admin_timeout;
 		dev->admin_tagset.numa_node = dev_to_node(dev->dev);
 		dev->admin_tagset.cmd_size = nvme_pci_cmd_size(dev, false);
 		dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
@@ -2226,7 +2226,7 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	req->timeout = ADMIN_TIMEOUT;
+	req->timeout = nvmeq->dev->ctrl.admin_timeout;
 	req->end_io_data = nvmeq;
 
 	init_completion(&nvmeq->delete_done);
@@ -2242,7 +2242,7 @@ static bool __nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode)
 	unsigned long timeout;
 
  retry:
-	timeout = ADMIN_TIMEOUT;
+	timeout = dev->ctrl.admin_timeout;
 	while (nr_queues > 0) {
 		if (nvme_delete_queue(&dev->queues[nr_queues], opcode))
 			break;
@@ -2282,7 +2282,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
 		dev->tagset.nr_maps = 2; /* default + read */
 		if (dev->io_queues[HCTX_TYPE_POLL])
 			dev->tagset.nr_maps++;
-		dev->tagset.timeout = NVME_IO_TIMEOUT;
+		dev->tagset.timeout = dev->ctrl.io_timeout;
 		dev->tagset.numa_node = dev_to_node(dev->dev);
 		dev->tagset.queue_depth =
 				min_t(int, dev->q_depth, BLK_MQ_MAX_DEPTH) - 1;
@@ -2417,7 +2417,8 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	 */
 	if (!dead) {
 		if (shutdown)
-			nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+			nvme_wait_freeze_timeout(&dev->ctrl,
+						 dev->ctrl.io_timeout);
 	}
 
 	nvme_stop_queues(&dev->ctrl);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 11a5ecae78c8..a81049fd3896 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -724,7 +724,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 			SG_CHUNK_SIZE * sizeof(struct scatterlist);
 		set->driver_data = ctrl;
 		set->nr_hw_queues = 1;
-		set->timeout = ADMIN_TIMEOUT;
+		set->timeout = nctrl->admin_timeout;
 		set->flags = BLK_MQ_F_NO_SCHED;
 	} else {
 		set = &ctrl->tag_set;
@@ -738,7 +738,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 			SG_CHUNK_SIZE * sizeof(struct scatterlist);
 		set->driver_data = ctrl;
 		set->nr_hw_queues = nctrl->queue_count - 1;
-		set->timeout = NVME_IO_TIMEOUT;
+		set->timeout = nctrl->io_timeout;
 		set->nr_maps = nctrl->opts->nr_poll_queues ? HCTX_MAX_TYPES : 2;
 	}
 
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 68c49dd67210..dede067e339b 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1449,7 +1449,7 @@ static struct blk_mq_tag_set *nvme_tcp_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->cmd_size = sizeof(struct nvme_tcp_request);
 		set->driver_data = ctrl;
 		set->nr_hw_queues = 1;
-		set->timeout = ADMIN_TIMEOUT;
+		set->timeout = nctrl->admin_timeout;
 	} else {
 		set = &ctrl->tag_set;
 		memset(set, 0, sizeof(*set));
@@ -1461,7 +1461,7 @@ static struct blk_mq_tag_set *nvme_tcp_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->cmd_size = sizeof(struct nvme_tcp_request);
 		set->driver_data = ctrl;
 		set->nr_hw_queues = nctrl->queue_count - 1;
-		set->timeout = NVME_IO_TIMEOUT;
+		set->timeout = nctrl->io_timeout;
 		set->nr_maps = 2 /* default + read */;
 	}
 
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index b9f623ab01f3..7fbefca5c6cb 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -359,7 +359,7 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 		SG_CHUNK_SIZE * sizeof(struct scatterlist);
 	ctrl->admin_tag_set.driver_data = ctrl;
 	ctrl->admin_tag_set.nr_hw_queues = 1;
-	ctrl->admin_tag_set.timeout = ADMIN_TIMEOUT;
+	ctrl->admin_tag_set.timeout = ctrl->ctrl.admin_timeout;
 	ctrl->admin_tag_set.flags = BLK_MQ_F_NO_SCHED;
 
 	ctrl->queues[0].ctrl = ctrl;
@@ -532,7 +532,7 @@ static int nvme_loop_create_io_queues(struct nvme_loop_ctrl *ctrl)
 		SG_CHUNK_SIZE * sizeof(struct scatterlist);
 	ctrl->tag_set.driver_data = ctrl;
 	ctrl->tag_set.nr_hw_queues = ctrl->ctrl.queue_count - 1;
-	ctrl->tag_set.timeout = NVME_IO_TIMEOUT;
+	ctrl->tag_set.timeout = ctrl->ctrl.io_timeout;
 	ctrl->ctrl.tagset = &ctrl->tag_set;
 
 	ret = blk_mq_alloc_tag_set(&ctrl->tag_set);
-- 
2.16.5


* [PATCH v2 2/2] nvme: add sysfs controls for io and admin timeouts
  2019-04-03 12:35 [PATCH v2 0/2] Adding per-controller timeout support to nvme Maximilian Heyne
  2019-04-03 12:35 ` [PATCH v2 1/2] nvme: add per-controller io and admin timeouts Maximilian Heyne
@ 2019-04-03 12:35 ` Maximilian Heyne
  2019-04-09 10:23   ` Christoph Hellwig
  2019-04-24 16:55 ` [PATCH v2 0/2] Adding per-controller timeout support to nvme Sagi Grimberg
  2 siblings, 1 reply; 10+ messages in thread
From: Maximilian Heyne @ 2019-04-03 12:35 UTC (permalink / raw)
  Cc: David Woodhouse, Amit Shah, Maximilian Heyne, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, James Smart,
	linux-nvme, linux-kernel

Add two sysfs files for reading and updating the admin and io timeouts
of individual NVMe controllers.

The controller must be in the LIVE or ADMIN_ONLY state for the update to
succeed, so that setting timeouts does not race with the creation or deletion
of the tagset, admin_tagset, admin_q and connect_q members of nvme_ctrl.
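
As a rough userspace companion to that rule (controller name, sysfs paths and
the exact state strings here are assumptions), a tool might check the
controller's state attribute before writing a new admin timeout in seconds;
the kernel-side check is what actually returns -EBUSY if the state is wrong:

  #include <stdio.h>
  #include <string.h>

  /* Sketch only: controller name, paths and state strings are assumptions. */
  static int state_is(const char *ctrl, const char *want)
  {
          char path[128], state[32] = "";
          FILE *f;

          snprintf(path, sizeof(path), "/sys/class/nvme/%s/state", ctrl);
          f = fopen(path, "r");
          if (!f)
                  return 0;
          if (fgets(state, sizeof(state), f))
                  state[strcspn(state, "\n")] = '\0';
          fclose(f);
          return strcmp(state, want) == 0;
  }

  int main(void)
  {
          FILE *f;

          /* The store handler rejects the update unless the controller is
           * LIVE or ADMIN_ONLY, so checking first only avoids a confusing
           * error; the kernel-side check is what prevents the race. */
          if (!state_is("nvme0", "live") && !state_is("nvme0", "admin-only"))
                  return 1;

          f = fopen("/sys/class/nvme/nvme0/admin_timeout", "w");
          if (!f)
                  return 1;
          fprintf(f, "120\n");    /* seconds */
          fclose(f);
          return 0;
  }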

Original-patch-by: Milan Pandurov <milanpa@amazon.com>
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
---
v2: 
 - rework setting timeouts to work with all relevant nvme drivers
 - add check for state LIVE to not race with controller creation/deletion

 drivers/nvme/host/core.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 112 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d0530bf7a677..51f359d78f17 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2768,6 +2768,116 @@ static ssize_t nvme_sysfs_rescan(struct device *dev,
 }
 static DEVICE_ATTR(rescan_controller, S_IWUSR, NULL, nvme_sysfs_rescan);
 
+static int nvme_set_io_timeout(struct nvme_ctrl *ctrl, unsigned int timeout)
+{
+	struct nvme_ns *ns;
+	unsigned long flags;
+	int ret = -EBUSY;
+
+	spin_lock_irqsave(&ctrl->lock, flags);
+	if (ctrl->state == NVME_CTRL_LIVE) {
+		ctrl->io_timeout = timeout;
+		if (ctrl->tagset)
+			ctrl->tagset->timeout = timeout;
+		if (ctrl->connect_q)
+			blk_queue_rq_timeout(ctrl->connect_q, timeout);
+
+		down_write(&ctrl->namespaces_rwsem);
+		list_for_each_entry(ns, &ctrl->namespaces, list) {
+			blk_queue_rq_timeout(ns->queue, timeout);
+		}
+		up_write(&ctrl->namespaces_rwsem);
+		ret = 0;
+	}
+	spin_unlock_irqrestore(&ctrl->lock, flags);
+	return ret;
+}
+
+static int nvme_set_admin_timeout(struct nvme_ctrl *ctrl, unsigned int timeout)
+{
+	unsigned long flags;
+	int ret = -EBUSY;
+
+	spin_lock_irqsave(&ctrl->lock, flags);
+	if (ctrl->state == NVME_CTRL_LIVE ||
+	    ctrl->state == NVME_CTRL_ADMIN_ONLY) {
+		ctrl->admin_timeout = timeout;
+		ctrl->admin_tagset->timeout = timeout;
+		blk_queue_rq_timeout(ctrl->admin_q, timeout);
+		ret = 0;
+	}
+	spin_unlock_irqrestore(&ctrl->lock, flags);
+	return ret;
+}
+
+static ssize_t io_timeout_show(struct device *dev,
+			       struct device_attribute *attr, char *buf)
+{
+	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%u\n", ctrl->io_timeout / HZ);
+}
+
+static ssize_t io_timeout_store(struct device *dev,
+				struct device_attribute *attr, const char *buf,
+				size_t count)
+{
+	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+	unsigned int timeout;
+	int ret;
+
+	ret = kstrtouint(buf, 10u, &timeout);
+	if (ret == -EINVAL) {
+		dev_warn(ctrl->dev, "Error parsing timeout value.\n");
+		return ret;
+	}
+	if (ret == -ERANGE || timeout == 0 || timeout > UINT_MAX / HZ) {
+		dev_warn(ctrl->dev,
+			 "Timeout value out of range (0 < timeout <= %u).\n",
+			 UINT_MAX / HZ);
+		return -ERANGE;
+	}
+	ret = nvme_set_io_timeout(ctrl, timeout * HZ);
+	if (ret < 0)
+		return ret;
+	return count;
+}
+static DEVICE_ATTR_RW(io_timeout);
+
+static ssize_t admin_timeout_show(struct device *dev,
+				  struct device_attribute *attr, char *buf)
+{
+	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%u\n", ctrl->admin_timeout / HZ);
+}
+
+static ssize_t admin_timeout_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t count)
+{
+	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+	unsigned int timeout;
+	int ret;
+
+	ret = kstrtouint(buf, 10u, &timeout);
+	if (ret == -EINVAL) {
+		dev_warn(ctrl->dev, "Error parsing timeout value.\n");
+		return ret;
+	}
+	if (ret == -ERANGE || timeout == 0 || timeout > UINT_MAX / HZ) {
+		dev_warn(ctrl->dev,
+			 "Timeout value out of range (0 < timeout <= %u).\n",
+			 UINT_MAX / HZ);
+		return -ERANGE;
+	}
+	ret = nvme_set_admin_timeout(ctrl, timeout * HZ);
+	if (ret < 0)
+		return ret;
+	return count;
+}
+static DEVICE_ATTR_RW(admin_timeout);
+
 static inline struct nvme_ns_head *dev_to_ns_head(struct device *dev)
 {
 	struct gendisk *disk = dev_to_disk(dev);
@@ -3008,6 +3118,8 @@ static struct attribute *nvme_dev_attrs[] = {
 	&dev_attr_address.attr,
 	&dev_attr_state.attr,
 	&dev_attr_numa_node.attr,
+	&dev_attr_io_timeout.attr,
+	&dev_attr_admin_timeout.attr,
 	NULL
 };
 
-- 
2.16.5


* Re: [PATCH v2 2/2] nvme: add sysfs controls for io and admin timeouts
  2019-04-03 12:35 ` [PATCH v2 2/2] nvme: add sysfs controls for " Maximilian Heyne
@ 2019-04-09 10:23   ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2019-04-09 10:23 UTC (permalink / raw)
  To: Maximilian Heyne
  Cc: David Woodhouse, Amit Shah, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, James Smart, linux-nvme,
	linux-kernel

I really don't think the iteration over all queues of a tagset
makes a whole lot of sense.  We really need to replace the per-queue
limit with a per-tagset one (or at least add the per-tagset one)
and then allow tweaking that in the block layer instead of writing
this boilerplate code.
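
A rough sketch of the kind of block-layer helper this points at might look
like the following; the function name is made up here, and the locking mirrors
how the block layer already walks a tag set's queues, so treat this as an
assumption about the intended shape rather than an existing API:

  #include <linux/blk-mq.h>
  #include <linux/blkdev.h>

  /* Hypothetical helper, not an existing API: push a new request timeout
   * into the tag set and every request queue attached to it, so drivers
   * would not have to open-code the iteration. */
  void blk_mq_set_rq_timeout(struct blk_mq_tag_set *set, unsigned int timeout)
  {
          struct request_queue *q;

          mutex_lock(&set->tag_list_lock);
          set->timeout = timeout;
          list_for_each_entry(q, &set->tag_list, tag_set_list)
                  blk_queue_rq_timeout(q, timeout);
          mutex_unlock(&set->tag_list_lock);
  }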


* Re: [PATCH v2 0/2] Adding per-controller timeout support to nvme
  2019-04-03 12:35 [PATCH v2 0/2] Adding per-controller timeout support to nvme Maximilian Heyne
  2019-04-03 12:35 ` [PATCH v2 1/2] nvme: add per-controller io and admin timeouts Maximilian Heyne
  2019-04-03 12:35 ` [PATCH v2 2/2] nvme: add sysfs controls for " Maximilian Heyne
@ 2019-04-24 16:55 ` Sagi Grimberg
  2019-04-24 20:07   ` Keith Busch
  2 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2019-04-24 16:55 UTC (permalink / raw)
  To: Maximilian Heyne
  Cc: David Woodhouse, Amit Shah, Keith Busch, Jens Axboe,
	Christoph Hellwig, James Smart, linux-nvme, linux-kernel


> As different nvme controllers are connect via different fabrics, some require
> different timeout settings than others. This series implements per-controller
> timeouts in the nvme subsystem which can be set via sysfs.

How much of a real issue is this?

block io_timeout defaults to 30 seconds, which is considered a universal
eternity for pretty much any nvme fabric. Moreover, io_timeout is
mutable already on a per-namespace level.

This leaves the admin_timeout which goes beyond this to 60 seconds...

Can you describe what exactly are you trying to solve?
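
For reference, the per-namespace knob mentioned above is the block layer's
queue/io_timeout attribute; a minimal sketch of raising it for a single
namespace could look like this (the device name is an assumption, and as far
as I recall that attribute takes milliseconds rather than seconds):

  #include <stdio.h>

  /* Sketch: adjust the existing per-namespace timeout through the block
   * layer.  Device name is an assumption; the value is in milliseconds. */
  int main(void)
  {
          FILE *f = fopen("/sys/block/nvme0n1/queue/io_timeout", "w");

          if (!f)
                  return 1;
          fprintf(f, "90000\n");  /* 90 seconds */
          fclose(f);
          return 0;
  }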


* Re: [PATCH v2 0/2] Adding per-controller timeout support to nvme
  2019-04-24 16:55 ` [PATCH v2 0/2] Adding per-controller timeout support to nvme Sagi Grimberg
@ 2019-04-24 20:07   ` Keith Busch
  2019-04-24 20:30     ` David Woodhouse
  0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2019-04-24 20:07 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Maximilian Heyne, David Woodhouse, Amit Shah, Keith Busch,
	Jens Axboe, Christoph Hellwig, James Smart, linux-nvme,
	linux-kernel

On Wed, Apr 24, 2019 at 09:55:16AM -0700, Sagi Grimberg wrote:
> 
> > As different nvme controllers are connect via different fabrics, some require
> > different timeout settings than others. This series implements per-controller
> > timeouts in the nvme subsystem which can be set via sysfs.
> 
> How much of a real issue is this?
> 
> block io_timeout defaults to 30 seconds which are considered a universal
> eternity for pretty much any nvme fabric. Moreover, io_timeout is
> mutable already on a per-namespace level.
> 
> This leaves the admin_timeout which goes beyond this to 60 seconds...
> 
> Can you describe what exactly are you trying to solve?

I think they must have an nvme target that is backed by slow media
(i.e. non-SSD). If that's the case, I think it may be a better option
if the target advertises relatively shallow queue depths and/or a lower
MDTS that better align with the backing storage capabilities.


* Re: [PATCH v2 0/2] Adding per-controller timeout support to nvme
  2019-04-24 20:07   ` Keith Busch
@ 2019-04-24 20:30     ` David Woodhouse
  2019-04-24 20:44       ` Keith Busch
  2019-04-24 20:58       ` Sagi Grimberg
  0 siblings, 2 replies; 10+ messages in thread
From: David Woodhouse @ 2019-04-24 20:30 UTC (permalink / raw)
  To: Keith Busch, Sagi Grimberg
  Cc: Maximilian Heyne, Amit Shah, Keith Busch, Jens Axboe,
	Christoph Hellwig, James Smart, linux-nvme, linux-kernel


On Wed, 2019-04-24 at 14:07 -0600, Keith Busch wrote:
> On Wed, Apr 24, 2019 at 09:55:16AM -0700, Sagi Grimberg wrote:
> > 
> > > As different nvme controllers are connect via different fabrics, some require
> > > different timeout settings than others. This series implements per-controller
> > > timeouts in the nvme subsystem which can be set via sysfs.
> > 
> > How much of a real issue is this?
> > 
> > block io_timeout defaults to 30 seconds which are considered a universal
> > eternity for pretty much any nvme fabric. Moreover, io_timeout is
> > mutable already on a per-namespace level.
> > 
> > This leaves the admin_timeout which goes beyond this to 60 seconds...
> > 
> > Can you describe what exactly are you trying to solve?
> 
> I think they must have an nvme target that is backed by slow media
> (i.e. non-SSD). If that's the case, I think it may be a better option
> if the target advertises relatively shallow queue depths and/or lower
> MDTS that better aligns to the backing storage capabilies.

It isn't that the media is slow; the max timeout is based on the SLA
for certain classes of "fabric" outages. Linux copes *really* badly
with I/O errors, and if we can make the timeout last long enough to
cover the switch restart worst case, then users are a lot happier.



* Re: [PATCH v2 0/2] Adding per-controller timeout support to nvme
  2019-04-24 20:30     ` David Woodhouse
@ 2019-04-24 20:44       ` Keith Busch
  2019-04-24 20:58       ` Sagi Grimberg
  1 sibling, 0 replies; 10+ messages in thread
From: Keith Busch @ 2019-04-24 20:44 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Sagi Grimberg, Jens Axboe, James Smart, linux-nvme, linux-kernel,
	Keith Busch, Maximilian Heyne, Amit Shah, Christoph Hellwig

On Wed, Apr 24, 2019 at 10:30:08PM +0200, David Woodhouse wrote:
> It isn't that the media is slow; the max timeout is based on the SLA
> for certain classes of "fabric" outages. Linux copes *really* badly
> with I/O errors, and if we can make the timeout last long enough to
> cover the switch restart worst case, then users are a lot happier.

Gotcha. So the default timeout is sufficient under normal operation,
but temporary intermittent outages may exceed it. It'd be a real
disappointment if the command times out with an error anyway after
waiting for the extended time, but we ought to cover the worst-case
time for a successful completion.


* Re: [PATCH v2 0/2] Adding per-controller timeout support to nvme
  2019-04-24 20:30     ` David Woodhouse
  2019-04-24 20:44       ` Keith Busch
@ 2019-04-24 20:58       ` Sagi Grimberg
  2019-04-25  5:45         ` David Woodhouse
  1 sibling, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2019-04-24 20:58 UTC (permalink / raw)
  To: David Woodhouse, Keith Busch
  Cc: Jens Axboe, James Smart, linux-nvme, linux-kernel, Keith Busch,
	Maximilian Heyne, Amit Shah, Christoph Hellwig


>>>> As different nvme controllers are connect via different fabrics, some require
>>>> different timeout settings than others. This series implements per-controller
>>>> timeouts in the nvme subsystem which can be set via sysfs.
>>>
>>> How much of a real issue is this?
>>>
>>> block io_timeout defaults to 30 seconds which are considered a universal
>>> eternity for pretty much any nvme fabric. Moreover, io_timeout is
>>> mutable already on a per-namespace level.
>>>
>>> This leaves the admin_timeout which goes beyond this to 60 seconds...
>>>
>>> Can you describe what exactly are you trying to solve?
>>
>> I think they must have an nvme target that is backed by slow media
>> (i.e. non-SSD). If that's the case, I think it may be a better option
>> if the target advertises relatively shallow queue depths and/or lower
>> MDTS that better aligns to the backing storage capabilies.
> 
> It isn't that the media is slow; the max timeout is based on the SLA
> for certain classes of "fabric" outages. Linux copes *really* badly
> with I/O errors, and if we can make the timeout last long enough to
> cover the switch restart worst case, then users are a lot happier.

Well, what is usually done to handle fabric outages is having multiple
paths to the storage device; I'm not sure if that is applicable for you
or not...

What do you mean by "Linux copes *really* badly with I/O errors"? What
can be done better?


* Re: [PATCH v2 0/2] Adding per-controller timeout support to nvme
  2019-04-24 20:58       ` Sagi Grimberg
@ 2019-04-25  5:45         ` David Woodhouse
  0 siblings, 0 replies; 10+ messages in thread
From: David Woodhouse @ 2019-04-25  5:45 UTC (permalink / raw)
  To: Sagi Grimberg, Keith Busch
  Cc: Jens Axboe, James Smart, linux-nvme, linux-kernel, Keith Busch,
	Maximilian Heyne, Amit Shah, Christoph Hellwig


On Wed, 2019-04-24 at 13:58 -0700, Sagi Grimberg wrote:
> > It isn't that the media is slow; the max timeout is based on the SLA
> > for certain classes of "fabric" outages. Linux copes *really* badly
> > with I/O errors, and if we can make the timeout last long enough to
> > cover the switch restart worst case, then users are a lot happier.
> 
> Well, what is usually done to handle fabric outages is having multiple
> paths to the storage device, not sure if that is applicable for you or
> not...

Yeah, that turns out to be impractical in this case.

> What do you mean by "Linux copes *really* badly with I/O errors"? What
> can be done better?

There's not a lot that can be done here in the short term. If file
systems get errors on certain I/O, then graceful recovery would be
complicated to achieve.

It is better for the I/O timeout to be set higher than the known
worst-case time for a successful completion.



Thread overview: 10+ messages
2019-04-03 12:35 [PATCH v2 0/2] Adding per-controller timeout support to nvme Maximilian Heyne
2019-04-03 12:35 ` [PATCH v2 1/2] nvme: add per-controller io and admin timeouts Maximilian Heyne
2019-04-03 12:35 ` [PATCH v2 2/2] nvme: add sysfs controls for " Maximilian Heyne
2019-04-09 10:23   ` Christoph Hellwig
2019-04-24 16:55 ` [PATCH v2 0/2] Adding per-controller timeout support to nvme Sagi Grimberg
2019-04-24 20:07   ` Keith Busch
2019-04-24 20:30     ` David Woodhouse
2019-04-24 20:44       ` Keith Busch
2019-04-24 20:58       ` Sagi Grimberg
2019-04-25  5:45         ` David Woodhouse
