Linux-NVME Archive on lore.kernel.org
* [PATCH 0/1] nvmet: allow user to set req alloc flag
@ 2020-10-20  1:44 Chaitanya Kulkarni
  2020-10-20  1:44 ` [PATCH 1/1] " Chaitanya Kulkarni
  2020-10-20  8:22 ` [PATCH 0/1] " Sagi Grimberg
  0 siblings, 2 replies; 5+ messages in thread
From: Chaitanya Kulkarni @ 2020-10-20  1:44 UTC (permalink / raw)
  To: linux-nvme; +Cc: logang, hch, Chaitanya Kulkarni, sagi

Hi,

When using the NVMeOF target in passthru mode, we allocate the request
with the BLK_MQ_REQ_NOWAIT flag. This allocates the request in the
following manner :-

nvme_alloc_request()
 blk_mq_alloc_request()
  blk_mq_queue_enter()
   if (flag & BLK_MQ_REQ_NOWAIT)
  	return -EBUSY; <-- return if busy.

On one NVMe controller, a fio random write workload running in parallel
on 32 namespaces with a high queue depth results in an I/O error, with
blk_mq_queue_enter() returning -EBUSY as shown above. This problem is
not easy to reproduce, but it occurs once in a while with the following
error (see 1 for the detailed log) :-

test1: (groupid=0, jobs=32): err= 5 
(file:io_u.c:1744, func=io_u error, error=Input/output error):

When the BLK_MQ_REQ_NOWAIT flag is removed from the allocation, the
workload doesn't result in the error.

This patch fixes the problem with the request allocation by adding
a new configfs attribute so that the user can optionally decide whether
to use BLK_MQ_REQ_NOWAIT or not. We retain the default behavior by
using BLK_MQ_REQ_NOWAIT when creating the nvmet passthru subsystem.

Regards,
Chaitanya

Chaitanya Kulkarni (1):
  nvmet: allow user to set req alloc flag

 drivers/nvme/target/configfs.c | 36 ++++++++++++++++++++++++++++++++++
 drivers/nvme/target/core.c     |  3 +++
 drivers/nvme/target/nvmet.h    |  1 +
 drivers/nvme/target/passthru.c |  3 ++-
 4 files changed, 42 insertions(+), 1 deletion(-)

1. FIO workload resulting in the I/O error with default behavior :-
-------------------------------------------------------------------

# cat nvmet-req-nowait-1.fio.log 
fio-3.8-5-g464b
Starting 32 processes

test1: (groupid=0, jobs=32): err= 5 (file:io_u.c:1744, func=io_u error, error=Input/output error): pid=4580: Mon Oct 19 16:07:42 2020
  write: IOPS=871k, BW=3403MiB/s (3569MB/s)(1664MiB/489msec)
    slat (usec): min=6, max=17846, avg=33.13, stdev=228.94
    clat (usec): min=50, max=40690, avg=4618.04, stdev=1154.78
     lat (usec): min=194, max=40702, avg=4651.20, stdev=1170.03
    clat percentiles (usec):
     |  1.00th=[ 3195],  5.00th=[ 3621], 10.00th=[ 3720], 20.00th=[ 3752],
     | 30.00th=[ 3818], 40.00th=[ 3884], 50.00th=[ 3982], 60.00th=[ 4359],
     | 70.00th=[ 5604], 80.00th=[ 5735], 90.00th=[ 5997], 95.00th=[ 6194],
     | 99.00th=[ 6915], 99.50th=[ 7504], 99.90th=[16057], 99.95th=[17695],
     | 99.99th=[22938]
  lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.21%, 4=50.40%, 10=49.19%, 20=0.09%, 50=0.04%
  cpu          : usr=4.07%, sys=34.86%, ctx=5296, majf=0, minf=393
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,426154,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
  WRITE: bw=3403MiB/s (3569MB/s), 3403MiB/s-3403MiB/s (3569MB/s-3569MB/s), io=1664MiB (1745MB), run=489-489msec

Disk stats (read/write):
  nvme2n1: ios=0/9329, merge=0/0, ticks=0/16477, in_queue=16478, util=1.81%
  nvme2n2: ios=40/7807, merge=0/0, ticks=3/13759, in_queue=13763, util=1.92%
  nvme2n3: ios=40/7577, merge=0/0, ticks=7/17245, in_queue=17253, util=1.98%
  nvme2n4: ios=40/7647, merge=0/0, ticks=7/27139, in_queue=27147, util=2.14%
  nvme2n5: ios=40/8678, merge=0/0, ticks=5/20387, in_queue=20393, util=2.16%
  nvme2n6: ios=40/9069, merge=0/0, ticks=7/26973, in_queue=26981, util=2.29%
  nvme2n7: ios=40/6509, merge=0/0, ticks=8/19087, in_queue=19095, util=2.34%
  nvme2n8: ios=40/9491, merge=0/0, ticks=4/19053, in_queue=19058, util=2.42%
  nvme2n9: ios=40/6604, merge=0/0, ticks=4/27047, in_queue=27051, util=2.43%
  nvme2n10: ios=40/9525, merge=0/0, ticks=8/27259, in_queue=27268, util=2.60%
  nvme2n11: ios=40/8255, merge=0/0, ticks=7/27218, in_queue=27226, util=2.65%
  nvme2n12: ios=40/7239, merge=0/0, ticks=5/26904, in_queue=26910, util=2.75%
  nvme2n13: ios=40/7588, merge=0/0, ticks=6/18549, in_queue=18556, util=2.75%
  nvme2n14: ios=40/7474, merge=0/0, ticks=5/18387, in_queue=18392, util=2.91%
  nvme2n15: ios=40/7293, merge=0/0, ticks=4/14218, in_queue=14223, util=3.23%
  nvme2n16: ios=0/8255, merge=0/0, ticks=0/27107, in_queue=27108, util=3.39%
  nvme2n17: ios=0/8383, merge=0/0, ticks=0/27563, in_queue=27563, util=3.79%
  nvme2n18: ios=0/7900, merge=0/0, ticks=0/27229, in_queue=27229, util=3.94%
  nvme2n19: ios=0/9406, merge=0/0, ticks=0/14151, in_queue=14152, util=4.37%
  nvme2n20: ios=0/9022, merge=0/0, ticks=0/27223, in_queue=27223, util=4.89%
  nvme2n21: ios=0/7231, merge=0/0, ticks=0/14102, in_queue=14102, util=4.98%
  nvme2n22: ios=0/8891, merge=0/0, ticks=0/27090, in_queue=27090, util=5.52%
  nvme2n23: ios=0/6223, merge=0/0, ticks=0/27066, in_queue=27067, util=5.54%
  nvme2n24: ios=0/7366, merge=0/0, ticks=0/27291, in_queue=27291, util=6.09%
  nvme2n25: ios=0/7301, merge=0/0, ticks=0/26984, in_queue=26985, util=7.55%
  nvme2n26: ios=0/7716, merge=0/0, ticks=0/19097, in_queue=19098, util=8.43%
  nvme2n27: ios=0/9144, merge=0/0, ticks=0/27355, in_queue=27356, util=10.98%
  nvme2n28: ios=0/6591, merge=0/0, ticks=0/14058, in_queue=14058, util=12.28%
  nvme2n29: ios=0/8915, merge=0/0, ticks=0/27237, in_queue=27237, util=17.74%
  nvme2n30: ios=0/7098, merge=0/0, ticks=0/25610, in_queue=25610, util=28.49%
  nvme2n31: ios=0/6404, merge=0/0, ticks=0/21010, in_queue=21011, util=31.86%
  nvme2n32: ios=0/9678, merge=0/0, ticks=0/19048, in_queue=19049, util=73.70%

2. FIO workload without any error when not using BLK_MQ_REQ_NOWAIT :-
---------------------------------------------------------------------

# cat nvmet-req-nowait-0.fio.log 
fio-3.8-5-g464b
Starting 32 processes

test1: (groupid=0, jobs=32): err= 0: pid=4665: Mon Oct 19 16:09:06 2020
  write: IOPS=832k, BW=3250MiB/s (3408MB/s)(195GiB/61570msec)
    slat (usec): min=6, max=1984.3k, avg=34.57, stdev=1293.12
    clat (usec): min=42, max=3012.5k, avg=4804.90, stdev=16251.84
     lat (usec): min=80, max=3012.5k, avg=4839.58, stdev=16320.43
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    4], 10.00th=[    4], 20.00th=[    5],
     | 30.00th=[    5], 40.00th=[    5], 50.00th=[    5], 60.00th=[    5],
     | 70.00th=[    5], 80.00th=[    5], 90.00th=[    7], 95.00th=[    7],
     | 99.00th=[    7], 99.50th=[    8], 99.90th=[   13], 99.95th=[   19],
     | 99.99th=[  776]
   bw (  KiB/s): min=   56, max=132007, per=3.23%, avg=107439.18, stdev=20819.41, samples=3790
   iops        : min=   14, max=33001, avg=26859.75, stdev=5204.86, samples=3790
  lat (usec)   : 50=0.01%, 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=0.01%, 4=13.50%, 10=86.26%, 20=0.19%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.02%, 750=0.01%, 1000=0.01%
  cpu          : usr=3.89%, sys=33.38%, ctx=5596245, majf=0, minf=438
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,51228543,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
  WRITE: bw=3250MiB/s (3408MB/s), 3250MiB/s-3250MiB/s (3408MB/s-3408MB/s), io=195GiB (210GB), run=61570-61570msec

Disk stats (read/write):
  nvme2n1: ios=96/1583859, merge=0/0, ticks=13/431764, in_queue=431778, util=74.77%
  nvme2n2: ios=80/1626714, merge=0/0, ticks=9/441432, in_queue=441441, util=77.34%
  nvme2n3: ios=0/1656183, merge=0/0, ticks=0/468030, in_queue=468030, util=79.37%
  nvme2n4: ios=80/1557939, merge=0/0, ticks=8/424431, in_queue=424439, util=73.46%
  nvme2n5: ios=80/1649598, merge=0/0, ticks=8/428050, in_queue=428059, util=79.85%
  nvme2n6: ios=40/1609485, merge=0/0, ticks=4/426484, in_queue=426488, util=79.08%
  nvme2n7: ios=89/1583523, merge=0/0, ticks=11/428733, in_queue=428744, util=76.78%
  nvme2n8: ios=96/1582927, merge=0/0, ticks=11/423911, in_queue=423922, util=77.75%
  nvme2n9: ios=96/1666151, merge=0/0, ticks=13/447659, in_queue=447671, util=81.52%
  nvme2n10: ios=96/1524152, merge=0/0, ticks=12/420001, in_queue=420013, util=76.71%
  nvme2n11: ios=96/1630334, merge=0/0, ticks=14/449232, in_queue=449246, util=80.11%
  nvme2n12: ios=96/1582044, merge=0/0, ticks=11/433235, in_queue=433246, util=79.40%
  nvme2n13: ios=96/1663883, merge=0/0, ticks=10/477186, in_queue=477196, util=84.84%
  nvme2n14: ios=96/1522140, merge=0/0, ticks=14/442828, in_queue=442842, util=76.87%
  nvme2n15: ios=80/1564809, merge=0/0, ticks=11/437801, in_queue=437812, util=80.18%
  nvme2n16: ios=80/1596635, merge=0/0, ticks=10/441174, in_queue=441183, util=82.13%
  nvme2n17: ios=80/1646794, merge=0/0, ticks=7/467660, in_queue=467667, util=86.82%
  nvme2n18: ios=80/1589903, merge=0/0, ticks=11/449100, in_queue=449112, util=83.04%
  nvme2n19: ios=80/1613331, merge=0/0, ticks=12/433444, in_queue=433456, util=86.20%
  nvme2n20: ios=80/1594185, merge=0/0, ticks=9/433980, in_queue=433988, util=87.45%
  nvme2n21: ios=80/1566221, merge=0/0, ticks=11/430998, in_queue=431009, util=86.10%
  nvme2n22: ios=80/1636843, merge=0/0, ticks=8/462546, in_queue=462554, util=87.66%
  nvme2n23: ios=80/1522876, merge=0/0, ticks=13/441547, in_queue=441560, util=84.79%
  nvme2n24: ios=80/1636153, merge=0/0, ticks=8/465240, in_queue=465248, util=90.78%
  nvme2n25: ios=80/1507958, merge=0/0, ticks=9/419422, in_queue=419431, util=83.93%
  nvme2n26: ios=0/1536856, merge=0/0, ticks=0/428050, in_queue=428050, util=88.16%
  nvme2n27: ios=0/1608849, merge=0/0, ticks=0/457471, in_queue=457470, util=91.70%
  nvme2n28: ios=0/1607794, merge=0/0, ticks=0/467126, in_queue=467126, util=94.52%
  nvme2n29: ios=0/1623665, merge=0/0, ticks=0/432039, in_queue=432039, util=93.98%
  nvme2n30: ios=0/1633017, merge=0/0, ticks=0/441836, in_queue=441836, util=97.04%
  nvme2n31: ios=0/1644927, merge=0/0, ticks=0/455211, in_queue=455212, util=96.53%
  nvme2n32: ios=0/1582765, merge=0/0, ticks=0/447856, in_queue=447856, util=94.97%
-- 
2.22.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* [PATCH 1/1] nvmet: allow user to set req alloc flag
  2020-10-20  1:44 [PATCH 0/1] nvmet: allow user to set req alloc flag Chaitanya Kulkarni
@ 2020-10-20  1:44 ` Chaitanya Kulkarni
  2020-10-20  8:22 ` [PATCH 0/1] " Sagi Grimberg
  1 sibling, 0 replies; 5+ messages in thread
From: Chaitanya Kulkarni @ 2020-10-20  1:44 UTC (permalink / raw)
  To: linux-nvme; +Cc: logang, hch, Chaitanya Kulkarni, sagi



By default, we set the passthru request allocation flag such that it
returns the error in the following code path, and we fail the I/O when
BLK_MQ_REQ_NOWAIT is set :-

nvme_alloc_request()
 blk_mq_alloc_request()
  blk_mq_queue_enter()
   if (flag & BLK_MQ_REQ_NOWAIT)
        return -EBUSY; <-- return if busy.

On some controllers, using BLK_MQ_REQ_NOWAIT results in I/O errors even
though the controller is perfectly healthy and not in a degraded state.

The block layer request allocation allows us to wait instead of
returning the error immediately when the BLK_MQ_REQ_NOWAIT flag is not
used. This has been shown to fix the I/O error problem reported under
a heavy random write workload.

This patch fixes the problem with the request allocation by adding
a new configfs attribute so that the user can optionally decide whether
to use BLK_MQ_REQ_NOWAIT or not. We retain the default behavior by
using BLK_MQ_REQ_NOWAIT when creating the nvmet passthru subsystem.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/configfs.c | 36 ++++++++++++++++++++++++++++++++++
 drivers/nvme/target/core.c     |  3 +++
 drivers/nvme/target/nvmet.h    |  1 +
 drivers/nvme/target/passthru.c |  3 ++-
 4 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 37e1d7784e17..3892b8fb0ff6 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -6,6 +6,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/blk-mq.h>
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/ctype.h>
@@ -736,9 +737,44 @@ static ssize_t nvmet_passthru_enable_store(struct config_item *item,
 }
 CONFIGFS_ATTR(nvmet_passthru_, enable);
 
+static ssize_t nvmet_passthru_req_nowait_show(struct config_item *item,
+						char *page)
+{
+	struct nvmet_subsys *subsys = to_subsys(item->ci_parent);
+
+	return snprintf(page, PAGE_SIZE, "%d\n", subsys->req_nowait);
+}
+
+static ssize_t nvmet_passthru_req_nowait_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_subsys *subsys = to_subsys(item->ci_parent);
+	bool req_nowait;
+	int ret = 0;
+
+	mutex_lock(&subsys->lock);
+	if (subsys->passthru_ctrl) {
+		pr_err("disable passthru ctrl before setting req_nowait\n");
+		ret = -EBUSY;
+		goto out;
+	}
+
+	if (strtobool(page, &req_nowait)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	subsys->req_nowait = req_nowait ? BLK_MQ_REQ_NOWAIT : 0;
+out:
+	mutex_unlock(&subsys->lock);
+	return ret ? ret : count;
+}
+CONFIGFS_ATTR(nvmet_passthru_, req_nowait);
+
 static struct configfs_attribute *nvmet_passthru_attrs[] = {
 	&nvmet_passthru_attr_device_path,
 	&nvmet_passthru_attr_enable,
+	&nvmet_passthru_attr_req_nowait,
 	NULL,
 };
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index aafcbc424b7a..50634945bd42 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -1511,6 +1511,9 @@ struct nvmet_subsys *nvmet_subsys_alloc(const char *subsysnqn,
 	}
 	subsys->cntlid_min = NVME_CNTLID_MIN;
 	subsys->cntlid_max = NVME_CNTLID_MAX;
+#ifdef CONFIG_NVME_TARGET_PASSTHRU
+	subsys->req_nowait = BLK_MQ_REQ_NOWAIT;
+#endif
 	kref_init(&subsys->ref);
 
 	mutex_init(&subsys->lock);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 559a15ccc322..a7ca8f5860c9 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -249,6 +249,7 @@ struct nvmet_subsys {
 	struct nvme_ctrl	*passthru_ctrl;
 	char			*passthru_ctrl_path;
 	struct config_group	passthru_group;
+	blk_mq_req_flags_t	req_nowait;
 #endif /* CONFIG_NVME_TARGET_PASSTHRU */
 };
 
diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c
index 56c571052216..7a2ce00f7a25 100644
--- a/drivers/nvme/target/passthru.c
+++ b/drivers/nvme/target/passthru.c
@@ -215,6 +215,7 @@ static int nvmet_passthru_map_sg(struct nvmet_req *req, struct request *rq)
 
 static void nvmet_passthru_execute_cmd(struct nvmet_req *req)
 {
+	blk_mq_req_flags_t req_nowait = req->sq->ctrl->subsys->req_nowait;
 	struct nvme_ctrl *ctrl = nvmet_req_passthru_ctrl(req);
 	struct request_queue *q = ctrl->admin_q;
 	struct nvme_ns *ns = NULL;
@@ -236,7 +237,7 @@ static void nvmet_passthru_execute_cmd(struct nvmet_req *req)
 		q = ns->queue;
 	}
 
-	rq = nvme_alloc_request(q, req->cmd, BLK_MQ_REQ_NOWAIT, NVME_QID_ANY);
+	rq = nvme_alloc_request(q, req->cmd, req_nowait, NVME_QID_ANY);
 	if (IS_ERR(rq)) {
 		status = NVME_SC_INTERNAL;
 		goto out_put_ns;
-- 
2.22.1




* Re: [PATCH 0/1] nvmet: allow user to set req alloc flag
  2020-10-20  1:44 [PATCH 0/1] nvmet: allow user to set req alloc flag Chaitanya Kulkarni
  2020-10-20  1:44 ` [PATCH 1/1] " Chaitanya Kulkarni
@ 2020-10-20  8:22 ` Sagi Grimberg
  2020-10-20 15:39   ` Logan Gunthorpe
  1 sibling, 1 reply; 5+ messages in thread
From: Sagi Grimberg @ 2020-10-20  8:22 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-nvme; +Cc: logang, hch


> Hi,
> 
> When using the NVMeOF target in passthru mode, we allocate the request
> with the BLK_MQ_REQ_NOWAIT flag. This allocates the request in the
> following manner :-
> 
> nvme_alloc_request()
>   blk_mq_alloc_request()
>    blk_mq_queue_enter()
>     if (flag & BLK_MQ_REQ_NOWAIT)
>    	return -EBUSY; <-- return if busy.
> 
> On one NVMe controller, a fio random write workload running in parallel
> on 32 namespaces with a high queue depth results in an I/O error, with
> blk_mq_queue_enter() returning -EBUSY as shown above. This problem is
> not easy to reproduce, but it occurs once in a while with the following
> error (see 1 for the detailed log) :-
> 
> test1: (groupid=0, jobs=32): err= 5
> (file:io_u.c:1744, func=io_u error, error=Input/output error):
> 
> When the BLK_MQ_REQ_NOWAIT flag is removed from the allocation, the
> workload doesn't result in the error.
> 
> This patch fixes the problem with the request allocation by adding
> a new configfs attribute so that the user can optionally decide whether
> to use BLK_MQ_REQ_NOWAIT or not. We retain the default behavior by
> using BLK_MQ_REQ_NOWAIT when creating the nvmet passthru subsystem.

Why should we ever set REQ_NOWAIT at all? Nothing prevents the
host(s) queue depth from exceeding the controller queue depth...


* Re: [PATCH 0/1] nvmet: allow user to set req alloc flag
  2020-10-20  8:22 ` [PATCH 0/1] " Sagi Grimberg
@ 2020-10-20 15:39   ` Logan Gunthorpe
  2020-10-20 15:53     ` Keith Busch
  0 siblings, 1 reply; 5+ messages in thread
From: Logan Gunthorpe @ 2020-10-20 15:39 UTC (permalink / raw)
  To: Sagi Grimberg, Chaitanya Kulkarni, linux-nvme; +Cc: hch



On 2020-10-20 2:22 a.m., Sagi Grimberg wrote:
> 
>> Hi,
>>
>> When using the NVMeOF target in passthru mode, we allocate the request
>> with the BLK_MQ_REQ_NOWAIT flag. This allocates the request in the
>> following manner :-
>>
>> nvme_alloc_request()
>>   blk_mq_alloc_request()
>>    blk_mq_queue_enter()
>>     if (flag & BLK_MQ_REQ_NOWAIT)
>>    	return -EBUSY; <-- return if busy.
>>
>> On one NVMe controller, a fio random write workload running in parallel
>> on 32 namespaces with a high queue depth results in an I/O error, with
>> blk_mq_queue_enter() returning -EBUSY as shown above. This problem is
>> not easy to reproduce, but it occurs once in a while with the following
>> error (see 1 for the detailed log) :-
>>
>> test1: (groupid=0, jobs=32): err= 5
>> (file:io_u.c:1744, func=io_u error, error=Input/output error):
>>
>> When the BLK_MQ_REQ_NOWAIT flag is removed from the allocation, the
>> workload doesn't result in the error.
>>
>> This patch fixes the problem with the request allocation by adding
>> a new configfs attribute so that the user can optionally decide whether
>> to use BLK_MQ_REQ_NOWAIT or not. We retain the default behavior by
>> using BLK_MQ_REQ_NOWAIT when creating the nvmet passthru subsystem.
> 
> Why should we ever set REQ_NOWAIT at all? Nothing prevents the
> host(s) queue depth from exceeding the controller queue depth...

I agree... I certainly found adding a configfs attribute for this rather
off-putting. Why would the user want an option that turns on random errors?

Logan



* Re: [PATCH 0/1] nvmet: allow user to set req alloc flag
  2020-10-20 15:39   ` Logan Gunthorpe
@ 2020-10-20 15:53     ` Keith Busch
  0 siblings, 0 replies; 5+ messages in thread
From: Keith Busch @ 2020-10-20 15:53 UTC (permalink / raw)
  To: Logan Gunthorpe; +Cc: linux-nvme, Sagi Grimberg, Chaitanya Kulkarni, hch

On Tue, Oct 20, 2020 at 09:39:28AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2020-10-20 2:22 a.m., Sagi Grimberg wrote:
> > 
> >> Hi,
> >>
> >> When using the NVMeOF target in passthru mode, we allocate the request
> >> with the BLK_MQ_REQ_NOWAIT flag. This allocates the request in the
> >> following manner :-
> >>
> >> nvme_alloc_request()
> >>   blk_mq_alloc_request()
> >>    blk_mq_queue_enter()
> >>     if (flag & BLK_MQ_REQ_NOWAIT)
> >>    	return -EBUSY; <-- return if busy.
> >>
> >> On one NVMe controller, a fio random write workload running in parallel
> >> on 32 namespaces with a high queue depth results in an I/O error, with
> >> blk_mq_queue_enter() returning -EBUSY as shown above. This problem is
> >> not easy to reproduce, but it occurs once in a while with the following
> >> error (see 1 for the detailed log) :-
> >>
> >> test1: (groupid=0, jobs=32): err= 5
> >> (file:io_u.c:1744, func=io_u error, error=Input/output error):
> >>
> >> When the BLK_MQ_REQ_NOWAIT flag is removed from the allocation, the
> >> workload doesn't result in the error.
> >>
> >> This patch fixes the problem with the request allocation by adding
> >> a new configfs attribute so that the user can optionally decide whether
> >> to use BLK_MQ_REQ_NOWAIT or not. We retain the default behavior by
> >> using BLK_MQ_REQ_NOWAIT when creating the nvmet passthru subsystem.
> > 
> > Why should we ever set REQ_NOWAIT at all? Nothing prevents the
> > host(s) queue depth from exceeding the controller queue depth...
> 
> I agree... I certainly found adding a configfs attribute for this rather
> off-putting. Why would the user want an option that turns on random errors?

Right, you really only want to use NOWAIT when waiting can lead to bad
things like a deadlock. I don't think the target passthrough has any
such issue, so you should be able to safely leave that flag out.

