* [PATCH 0/5] implement nvmf read/write queue maps
@ 2018-12-11 10:49 Sagi Grimberg
2018-12-11 10:49 ` [PATCH 1/5] blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues Sagi Grimberg
` (6 more replies)
0 siblings, 7 replies; 16+ messages in thread
From: Sagi Grimberg @ 2018-12-11 10:49 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block, Jens Axboe
This set implements read/write queue maps to nvmf (implemented in tcp
and rdma). We basically allow the users to pass in nr_write_queues
argument that will basically maps a separate set of queues to host
write I/O (or more correctly non-read I/O) and a set of queues to
hold read I/O (which is now controlled by the known nr_io_queues).
A patchset that restores nvme-rdma polling is in the pipe.
The polling is less trivial because:
1. we can find non I/O completions in the cq (i.e. memreg)
2. we need to start with non-polling for a sane connect and
then switch to polling which is not trivial behind the
cq API we use.
Note that read/write separation for rdma but especially tcp this can be
very clear win as we minimize the risk for head-of-queue blocking for
mixed workloads over a single tcp byte stream.
Sagi Grimberg (5):
blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues
nvme-fabrics: add missing nvmf_ctrl_options documentation
nvme-fabrics: allow user to set nr_write_queues for separate queue
maps
nvme-tcp: support separate queue maps for read and write
nvme-rdma: support read/write queue separation
block/blk-mq-rdma.c | 8 +++---
drivers/nvme/host/fabrics.c | 15 ++++++++++-
drivers/nvme/host/fabrics.h | 6 +++++
drivers/nvme/host/rdma.c | 39 ++++++++++++++++++++++++---
drivers/nvme/host/tcp.c | 53 ++++++++++++++++++++++++++++++++-----
include/linux/blk-mq-rdma.h | 2 +-
6 files changed, 108 insertions(+), 15 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/5] blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues
2018-12-11 10:49 [PATCH 0/5] implement nvmf read/write queue maps Sagi Grimberg
@ 2018-12-11 10:49 ` Sagi Grimberg
2018-12-11 13:34 ` Christoph Hellwig
2018-12-11 10:49 ` [PATCH 2/5] nvme-fabrics: add missing nvmf_ctrl_options documentation Sagi Grimberg
` (5 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2018-12-11 10:49 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block, Jens Axboe
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
block/blk-mq-rdma.c | 8 ++++----
drivers/nvme/host/rdma.c | 2 +-
include/linux/blk-mq-rdma.h | 2 +-
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
index a71576aff3a5..45030a81a1ed 100644
--- a/block/blk-mq-rdma.c
+++ b/block/blk-mq-rdma.c
@@ -29,24 +29,24 @@
* @set->nr_hw_queues, or @dev does not provide an affinity mask for a
* vector, we fallback to the naive mapping.
*/
-int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
+int blk_mq_rdma_map_queues(struct blk_mq_queue_map *map,
struct ib_device *dev, int first_vec)
{
const struct cpumask *mask;
unsigned int queue, cpu;
- for (queue = 0; queue < set->nr_hw_queues; queue++) {
+ for (queue = 0; queue < map->nr_queues; queue++) {
mask = ib_get_vector_affinity(dev, first_vec + queue);
if (!mask)
goto fallback;
for_each_cpu(cpu, mask)
- set->map[0].mq_map[cpu] = queue;
+ map->mq_map[cpu] = map->queue_offset + queue;
}
return 0;
fallback:
- return blk_mq_map_queues(&set->map[0]);
+ return blk_mq_map_queues(map);
}
EXPORT_SYMBOL_GPL(blk_mq_rdma_map_queues);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f2db848f6985..5057d5ab5aaa 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1751,7 +1751,7 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
{
struct nvme_rdma_ctrl *ctrl = set->driver_data;
- return blk_mq_rdma_map_queues(set, ctrl->device->dev, 0);
+ return blk_mq_rdma_map_queues(&set->map[0], ctrl->device->dev, 0);
}
static const struct blk_mq_ops nvme_rdma_mq_ops = {
diff --git a/include/linux/blk-mq-rdma.h b/include/linux/blk-mq-rdma.h
index b4ade198007d..7b6ecf9ac4c3 100644
--- a/include/linux/blk-mq-rdma.h
+++ b/include/linux/blk-mq-rdma.h
@@ -4,7 +4,7 @@
struct blk_mq_tag_set;
struct ib_device;
-int blk_mq_rdma_map_queues(struct blk_mq_tag_set *set,
+int blk_mq_rdma_map_queues(struct blk_mq_queue_map *map,
struct ib_device *dev, int first_vec);
#endif /* _LINUX_BLK_MQ_RDMA_H */
--
2.17.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/5] nvme-fabrics: add missing nvmf_ctrl_options documentation
2018-12-11 10:49 [PATCH 0/5] implement nvmf read/write queue maps Sagi Grimberg
2018-12-11 10:49 ` [PATCH 1/5] blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues Sagi Grimberg
@ 2018-12-11 10:49 ` Sagi Grimberg
2018-12-11 13:35 ` Christoph Hellwig
2018-12-11 10:49 ` [PATCH 3/5] nvme-fabrics: allow user to set nr_write_queues for separate queue maps Sagi Grimberg
` (4 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2018-12-11 10:49 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block, Jens Axboe
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/fabrics.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 524a02a67817..28dc916ef26b 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -88,6 +88,9 @@ enum {
* @max_reconnects: maximum number of allowed reconnect attempts before removing
* the controller, (-1) means reconnect forever, zero means remove
* immediately;
+ * @disable_sqflow: disable controller sq flow control
+ * @hdr_digest: generate/verify header digest (TCP)
+ * @data_digest: generate/verify data digest (TCP)
*/
struct nvmf_ctrl_options {
unsigned mask;
--
2.17.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/5] nvme-fabrics: allow user to set nr_write_queues for separate queue maps
2018-12-11 10:49 [PATCH 0/5] implement nvmf read/write queue maps Sagi Grimberg
2018-12-11 10:49 ` [PATCH 1/5] blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues Sagi Grimberg
2018-12-11 10:49 ` [PATCH 2/5] nvme-fabrics: add missing nvmf_ctrl_options documentation Sagi Grimberg
@ 2018-12-11 10:49 ` Sagi Grimberg
2018-12-11 13:35 ` Christoph Hellwig
2018-12-11 10:49 ` [PATCH 4/5] nvme-tcp: support separate queue maps for read and write Sagi Grimberg
` (3 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2018-12-11 10:49 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block, Jens Axboe
This argument will specify how many I/O queues will be connected
in create_ctrl in addition to nr_io_queues. With this configuration,
I/O that carries payload from the host to the target, will use
the default hctx queue map, and I/O that involves target to host
transfers will use the read hctx queue map.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/fabrics.c | 15 ++++++++++++++-
drivers/nvme/host/fabrics.h | 3 +++
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 9c62c6838b76..066c3a02e08b 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -616,6 +616,7 @@ static const match_table_t opt_tokens = {
{ NVMF_OPT_DISABLE_SQFLOW, "disable_sqflow" },
{ NVMF_OPT_HDR_DIGEST, "hdr_digest" },
{ NVMF_OPT_DATA_DIGEST, "data_digest" },
+ { NVMF_OPT_NR_WRITE_QUEUES, "nr_write_queues=%d" },
{ NVMF_OPT_ERR, NULL }
};
@@ -837,6 +838,18 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
case NVMF_OPT_DATA_DIGEST:
opts->data_digest = true;
break;
+ case NVMF_OPT_NR_WRITE_QUEUES:
+ if (match_int(args, &token)) {
+ ret = -EINVAL;
+ goto out;
+ }
+ if (token <= 0) {
+ pr_err("Invalid nr_write_queues %d\n", token);
+ ret = -EINVAL;
+ goto out;
+ }
+ opts->nr_write_queues = token;
+ break;
default:
pr_warn("unknown parameter or missing value '%s' in ctrl creation request\n",
p);
@@ -954,7 +967,7 @@ EXPORT_SYMBOL_GPL(nvmf_free_options);
#define NVMF_ALLOWED_OPTS (NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
NVMF_OPT_KATO | NVMF_OPT_HOSTNQN | \
NVMF_OPT_HOST_ID | NVMF_OPT_DUP_CONNECT |\
- NVMF_OPT_DISABLE_SQFLOW)
+ NVMF_OPT_DISABLE_SQFLOW | NVMF_OPT_NR_WRITE_QUEUES)
static struct nvme_ctrl *
nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 28dc916ef26b..81b8fd1c0c5d 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -61,6 +61,7 @@ enum {
NVMF_OPT_DISABLE_SQFLOW = 1 << 14,
NVMF_OPT_HDR_DIGEST = 1 << 15,
NVMF_OPT_DATA_DIGEST = 1 << 16,
+ NVMF_OPT_NR_WRITE_QUEUES = 1 << 17,
};
/**
@@ -91,6 +92,7 @@ enum {
* @disable_sqflow: disable controller sq flow control
* @hdr_digest: generate/verify header digest (TCP)
* @data_digest: generate/verify data digest (TCP)
+ * @nr_write_queues: number of queues for write I/O
*/
struct nvmf_ctrl_options {
unsigned mask;
@@ -110,6 +112,7 @@ struct nvmf_ctrl_options {
bool disable_sqflow;
bool hdr_digest;
bool data_digest;
+ unsigned int nr_write_queues;
};
/*
--
2.17.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 4/5] nvme-tcp: support separate queue maps for read and write
2018-12-11 10:49 [PATCH 0/5] implement nvmf read/write queue maps Sagi Grimberg
` (2 preceding siblings ...)
2018-12-11 10:49 ` [PATCH 3/5] nvme-fabrics: allow user to set nr_write_queues for separate queue maps Sagi Grimberg
@ 2018-12-11 10:49 ` Sagi Grimberg
2018-12-11 13:41 ` Christoph Hellwig
2018-12-11 10:49 ` [PATCH 5/5] nvme-rdma: support read/write queue separation Sagi Grimberg
` (2 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2018-12-11 10:49 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block, Jens Axboe
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/tcp.c | 53 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 47 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 15543358e245..5c0ba99fb105 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1215,7 +1215,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl,
struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
struct nvme_tcp_queue *queue = &ctrl->queues[qid];
struct linger sol = { .l_onoff = 1, .l_linger = 0 };
- int ret, opt, rcv_pdu_size;
+ int ret, opt, rcv_pdu_size, n;
queue->ctrl = ctrl;
INIT_LIST_HEAD(&queue->send_list);
@@ -1271,7 +1271,8 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl,
}
queue->sock->sk->sk_allocation = GFP_ATOMIC;
- queue->io_cpu = (qid == 0) ? 0 : qid - 1;
+ n = (qid ? qid - 1 : 0) % num_online_cpus();
+ queue->io_cpu = cpumask_next_wrap(n - 1, cpu_online_mask, -1, false);
queue->request = NULL;
queue->data_remaining = 0;
queue->ddgst_remaining = 0;
@@ -1433,6 +1434,7 @@ static struct blk_mq_tag_set *nvme_tcp_alloc_tagset(struct nvme_ctrl *nctrl,
set->driver_data = ctrl;
set->nr_hw_queues = nctrl->queue_count - 1;
set->timeout = NVME_IO_TIMEOUT;
+ set->nr_maps = 2 /* default + read */;
}
ret = blk_mq_alloc_tag_set(set);
@@ -1527,7 +1529,12 @@ static int nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
static unsigned int nvme_tcp_nr_io_queues(struct nvme_ctrl *ctrl)
{
- return min(ctrl->queue_count - 1, num_online_cpus());
+ unsigned int nr_io_queues;
+
+ nr_io_queues = min(ctrl->opts->nr_io_queues, num_online_cpus());
+ nr_io_queues += min(ctrl->opts->nr_write_queues, num_online_cpus());
+
+ return nr_io_queues;
}
static int nvme_alloc_io_queues(struct nvme_ctrl *ctrl)
@@ -2052,6 +2059,38 @@ static blk_status_t nvme_tcp_queue_rq(struct blk_mq_hw_ctx *hctx,
return BLK_STS_OK;
}
+static int nvme_tcp_map_queues(struct blk_mq_tag_set *set)
+{
+ struct nvme_tcp_ctrl *ctrl = set->driver_data;
+ struct blk_mq_queue_map *map;
+
+ if (ctrl->ctrl.opts->nr_write_queues) {
+ /* separate read/write queues */
+ map = &set->map[HCTX_TYPE_DEFAULT];
+ map->queue_offset = 0;
+ map->nr_queues = ctrl->ctrl.opts->nr_write_queues;
+ blk_mq_map_queues(map);
+
+ map = &set->map[HCTX_TYPE_READ];
+ map->nr_queues = ctrl->ctrl.opts->nr_io_queues;
+ map->queue_offset = ctrl->ctrl.opts->nr_write_queues;
+ blk_mq_map_queues(map);
+ } else {
+ /* mixed read/write queues */
+ map = &set->map[HCTX_TYPE_DEFAULT];
+ map->queue_offset = 0;
+ map->nr_queues = ctrl->ctrl.opts->nr_io_queues;
+ blk_mq_map_queues(map);
+
+ map = &set->map[HCTX_TYPE_READ];
+ map->queue_offset = 0;
+ map->nr_queues = ctrl->ctrl.opts->nr_io_queues;
+ blk_mq_map_queues(map);
+ }
+
+ return 0;
+}
+
static struct blk_mq_ops nvme_tcp_mq_ops = {
.queue_rq = nvme_tcp_queue_rq,
.complete = nvme_complete_rq,
@@ -2059,6 +2098,7 @@ static struct blk_mq_ops nvme_tcp_mq_ops = {
.exit_request = nvme_tcp_exit_request,
.init_hctx = nvme_tcp_init_hctx,
.timeout = nvme_tcp_timeout,
+ .map_queues = nvme_tcp_map_queues,
};
static struct blk_mq_ops nvme_tcp_admin_mq_ops = {
@@ -2113,7 +2153,7 @@ static struct nvme_ctrl *nvme_tcp_create_ctrl(struct device *dev,
INIT_LIST_HEAD(&ctrl->list);
ctrl->ctrl.opts = opts;
- ctrl->ctrl.queue_count = opts->nr_io_queues + 1; /* +1 for admin queue */
+ ctrl->ctrl.queue_count = opts->nr_io_queues + opts->nr_write_queues + 1;
ctrl->ctrl.sqsize = opts->queue_size - 1;
ctrl->ctrl.kato = opts->kato;
@@ -2155,7 +2195,7 @@ static struct nvme_ctrl *nvme_tcp_create_ctrl(struct device *dev,
goto out_free_ctrl;
}
- ctrl->queues = kcalloc(opts->nr_io_queues + 1, sizeof(*ctrl->queues),
+ ctrl->queues = kcalloc(ctrl->ctrl.queue_count, sizeof(*ctrl->queues),
GFP_KERNEL);
if (!ctrl->queues) {
ret = -ENOMEM;
@@ -2206,7 +2246,8 @@ static struct nvmf_transport_ops nvme_tcp_transport = {
.required_opts = NVMF_OPT_TRADDR,
.allowed_opts = NVMF_OPT_TRSVCID | NVMF_OPT_RECONNECT_DELAY |
NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
- NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST,
+ NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
+ NVMF_OPT_NR_IO_QUEUES,
.create_ctrl = nvme_tcp_create_ctrl,
};
--
2.17.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 5/5] nvme-rdma: support read/write queue separation
2018-12-11 10:49 [PATCH 0/5] implement nvmf read/write queue maps Sagi Grimberg
` (3 preceding siblings ...)
2018-12-11 10:49 ` [PATCH 4/5] nvme-tcp: support separate queue maps for read and write Sagi Grimberg
@ 2018-12-11 10:49 ` Sagi Grimberg
2018-12-11 13:42 ` Christoph Hellwig
2018-12-11 10:49 ` [PATCH nvme-cli 6/5] fabrics: pass in nr_write_queues Sagi Grimberg
2018-12-11 13:28 ` [PATCH 0/5] implement nvmf read/write queue maps Christoph Hellwig
6 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2018-12-11 10:49 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block, Jens Axboe
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 39 ++++++++++++++++++++++++++++++++++++---
1 file changed, 36 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 5057d5ab5aaa..cfe823a491f2 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -645,6 +645,8 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
nr_io_queues = min_t(unsigned int, nr_io_queues,
ibdev->num_comp_vectors);
+ nr_io_queues += min(opts->nr_write_queues, num_online_cpus());
+
ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
if (ret)
return ret;
@@ -714,6 +716,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
set->driver_data = ctrl;
set->nr_hw_queues = nctrl->queue_count - 1;
set->timeout = NVME_IO_TIMEOUT;
+ set->nr_maps = 2 /* default + read */;
}
ret = blk_mq_alloc_tag_set(set);
@@ -1750,8 +1753,37 @@ static void nvme_rdma_complete_rq(struct request *rq)
static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
{
struct nvme_rdma_ctrl *ctrl = set->driver_data;
+ struct blk_mq_queue_map *map;
+ int offset = 0;
+
+ if (ctrl->ctrl.opts->nr_write_queues) {
+ /* separate read/write queues */
+ map = &set->map[HCTX_TYPE_DEFAULT];
+ map->queue_offset = offset;
+ map->nr_queues = ctrl->ctrl.opts->nr_write_queues;
+ blk_mq_rdma_map_queues(map, ctrl->device->dev, 0);
+ offset += map->nr_queues;
+
+ map = &set->map[HCTX_TYPE_READ];
+ map->nr_queues = ctrl->ctrl.opts->nr_io_queues;
+ map->queue_offset = offset;
+ blk_mq_rdma_map_queues(map, ctrl->device->dev, offset);
+ offset += map->nr_queues;
- return blk_mq_rdma_map_queues(&set->map[0], ctrl->device->dev, 0);
+ } else {
+ /* mixed read/write queues */
+ map = &set->map[HCTX_TYPE_DEFAULT];
+ map->queue_offset = 0;
+ map->nr_queues = ctrl->ctrl.opts->nr_io_queues;
+ blk_mq_rdma_map_queues(map, ctrl->device->dev, 0);
+
+ map = &set->map[HCTX_TYPE_READ];
+ map->queue_offset = 0;
+ map->nr_queues = ctrl->ctrl.opts->nr_io_queues;
+ blk_mq_rdma_map_queues(map, ctrl->device->dev, 0);
+ }
+
+ return 0;
}
static const struct blk_mq_ops nvme_rdma_mq_ops = {
@@ -1906,7 +1938,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
INIT_WORK(&ctrl->err_work, nvme_rdma_error_recovery_work);
INIT_WORK(&ctrl->ctrl.reset_work, nvme_rdma_reset_ctrl_work);
- ctrl->ctrl.queue_count = opts->nr_io_queues + 1; /* +1 for admin queue */
+ ctrl->ctrl.queue_count = opts->nr_io_queues + opts->nr_write_queues + 1;
ctrl->ctrl.sqsize = opts->queue_size - 1;
ctrl->ctrl.kato = opts->kato;
@@ -1957,7 +1989,8 @@ static struct nvmf_transport_ops nvme_rdma_transport = {
.module = THIS_MODULE,
.required_opts = NVMF_OPT_TRADDR,
.allowed_opts = NVMF_OPT_TRSVCID | NVMF_OPT_RECONNECT_DELAY |
- NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO,
+ NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
+ NVMF_OPT_NR_IO_QUEUES,
.create_ctrl = nvme_rdma_create_ctrl,
};
--
2.17.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH nvme-cli 6/5] fabrics: pass in nr_write_queues
2018-12-11 10:49 [PATCH 0/5] implement nvmf read/write queue maps Sagi Grimberg
` (4 preceding siblings ...)
2018-12-11 10:49 ` [PATCH 5/5] nvme-rdma: support read/write queue separation Sagi Grimberg
@ 2018-12-11 10:49 ` Sagi Grimberg
2018-12-11 19:30 ` Keith Busch
2018-12-11 13:28 ` [PATCH 0/5] implement nvmf read/write queue maps Christoph Hellwig
6 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2018-12-11 10:49 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block, Jens Axboe
nr_write_queues specifies the number of queues additional to nr_io_queues
will be connected. These queues will host write I/O (host to target
payload) while nr_io_queues will host read I/O (target to host payload).
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
fabrics.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/fabrics.c b/fabrics.c
index 93f3410a1c72..55ce2ba79c4f 100644
--- a/fabrics.c
+++ b/fabrics.c
@@ -53,6 +53,7 @@ static struct config {
char *hostnqn;
char *hostid;
int nr_io_queues;
+ int nr_write_queues;
int queue_size;
int keep_alive_tmo;
int reconnect_delay;
@@ -621,6 +622,8 @@ static int build_options(char *argstr, int max_len)
add_argument(&argstr, &max_len, "hostid", cfg.hostid)) ||
add_int_argument(&argstr, &max_len, "nr_io_queues",
cfg.nr_io_queues) ||
+ add_int_argument(&argstr, &max_len, "nr_write_queues",
+ cfg.nr_write_queues) ||
add_int_argument(&argstr, &max_len, "queue_size", cfg.queue_size) ||
add_int_argument(&argstr, &max_len, "keep_alive_tmo",
cfg.keep_alive_tmo) ||
@@ -694,6 +697,13 @@ retry:
p += len;
}
+ if (cfg.nr_write_queues) {
+ len = sprintf(p, ",nr_write_queues=%d", cfg.nr_write_queues);
+ if (len < 0)
+ return -EINVAL;
+ p += len;
+ }
+
if (cfg.host_traddr) {
len = sprintf(p, ",host_traddr=%s", cfg.host_traddr);
if (len < 0)
@@ -1009,6 +1019,7 @@ int connect(const char *desc, int argc, char **argv)
{"hostnqn", 'q', "LIST", CFG_STRING, &cfg.hostnqn, required_argument, "user-defined hostnqn" },
{"hostid", 'I', "LIST", CFG_STRING, &cfg.hostid, required_argument, "user-defined hostid (if default not used)"},
{"nr-io-queues", 'i', "LIST", CFG_INT, &cfg.nr_io_queues, required_argument, "number of io queues to use (default is core count)" },
+ {"nr_write_queues", 'W', "LIST", CFG_INT, &cfg.nr_write_queues, required_argument, "number of write queues to use (default 0)" },
{"queue-size", 'Q', "LIST", CFG_INT, &cfg.queue_size, required_argument, "number of io queue elements to use (default 128)" },
{"keep-alive-tmo", 'k', "LIST", CFG_INT, &cfg.keep_alive_tmo, required_argument, "keep alive timeout period in seconds" },
{"reconnect-delay", 'c', "LIST", CFG_INT, &cfg.reconnect_delay, required_argument, "reconnect timeout period in seconds" },
--
2.17.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 0/5] implement nvmf read/write queue maps
2018-12-11 10:49 [PATCH 0/5] implement nvmf read/write queue maps Sagi Grimberg
` (5 preceding siblings ...)
2018-12-11 10:49 ` [PATCH nvme-cli 6/5] fabrics: pass in nr_write_queues Sagi Grimberg
@ 2018-12-11 13:28 ` Christoph Hellwig
6 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2018-12-11 13:28 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block, Jens Axboe
On Tue, Dec 11, 2018 at 02:49:30AM -0800, Sagi Grimberg wrote:
> This set implements read/write queue maps to nvmf (implemented in tcp
> and rdma). We basically allow the users to pass in nr_write_queues
> argument that will basically maps a separate set of queues to host
> write I/O (or more correctly non-read I/O) and a set of queues to
> hold read I/O (which is now controlled by the known nr_io_queues).
>
> A patchset that restores nvme-rdma polling is in the pipe.
> The polling is less trivial because:
> 1. we can find non I/O completions in the cq (i.e. memreg)
> 2. we need to start with non-polling for a sane connect and
> then switch to polling which is not trivial behind the
> cq API we use.
I think we should enhance the CQ API to better support polling,
the old poll code was a bit of a layering violation vs the core
code..
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/5] blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues
2018-12-11 10:49 ` [PATCH 1/5] blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues Sagi Grimberg
@ 2018-12-11 13:34 ` Christoph Hellwig
0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2018-12-11 13:34 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block, Jens Axboe
Looks good.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/5] nvme-fabrics: add missing nvmf_ctrl_options documentation
2018-12-11 10:49 ` [PATCH 2/5] nvme-fabrics: add missing nvmf_ctrl_options documentation Sagi Grimberg
@ 2018-12-11 13:35 ` Christoph Hellwig
0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2018-12-11 13:35 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block, Jens Axboe
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/5] nvme-fabrics: allow user to set nr_write_queues for separate queue maps
2018-12-11 10:49 ` [PATCH 3/5] nvme-fabrics: allow user to set nr_write_queues for separate queue maps Sagi Grimberg
@ 2018-12-11 13:35 ` Christoph Hellwig
0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2018-12-11 13:35 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block, Jens Axboe
On Tue, Dec 11, 2018 at 02:49:33AM -0800, Sagi Grimberg wrote:
> This argument will specify how many I/O queues will be connected
> in create_ctrl in addition to nr_io_queues. With this configuration,
> I/O that carries payload from the host to the target, will use
> the default hctx queue map, and I/O that involves target to host
> transfers will use the read hctx queue map.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/5] nvme-tcp: support separate queue maps for read and write
2018-12-11 10:49 ` [PATCH 4/5] nvme-tcp: support separate queue maps for read and write Sagi Grimberg
@ 2018-12-11 13:41 ` Christoph Hellwig
2018-12-11 23:11 ` Sagi Grimberg
0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2018-12-11 13:41 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block, Jens Axboe
On Tue, Dec 11, 2018 at 02:49:34AM -0800, Sagi Grimberg wrote:
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
> drivers/nvme/host/tcp.c | 53 ++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 47 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 15543358e245..5c0ba99fb105 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1215,7 +1215,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl,
> struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
> struct nvme_tcp_queue *queue = &ctrl->queues[qid];
> struct linger sol = { .l_onoff = 1, .l_linger = 0 };
> - int ret, opt, rcv_pdu_size;
> + int ret, opt, rcv_pdu_size, n;
>
> queue->ctrl = ctrl;
> INIT_LIST_HEAD(&queue->send_list);
> @@ -1271,7 +1271,8 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl,
> }
>
> queue->sock->sk->sk_allocation = GFP_ATOMIC;
> - queue->io_cpu = (qid == 0) ? 0 : qid - 1;
> + n = (qid ? qid - 1 : 0) % num_online_cpus();
> + queue->io_cpu = cpumask_next_wrap(n - 1, cpu_online_mask, -1, false);
> queue->request = NULL;
> queue->data_remaining = 0;
> queue->ddgst_remaining = 0;
> @@ -1433,6 +1434,7 @@ static struct blk_mq_tag_set *nvme_tcp_alloc_tagset(struct nvme_ctrl *nctrl,
> set->driver_data = ctrl;
> set->nr_hw_queues = nctrl->queue_count - 1;
> set->timeout = NVME_IO_TIMEOUT;
> + set->nr_maps = 2 /* default + read */;
> }
>
> ret = blk_mq_alloc_tag_set(set);
> @@ -1527,7 +1529,12 @@ static int nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
>
> static unsigned int nvme_tcp_nr_io_queues(struct nvme_ctrl *ctrl)
> {
> - return min(ctrl->queue_count - 1, num_online_cpus());
> + unsigned int nr_io_queues;
> +
> + nr_io_queues = min(ctrl->opts->nr_io_queues, num_online_cpus());
> + nr_io_queues += min(ctrl->opts->nr_write_queues, num_online_cpus());
> +
> + return nr_io_queues;
> }
>
> static int nvme_alloc_io_queues(struct nvme_ctrl *ctrl)
> @@ -2052,6 +2059,38 @@ static blk_status_t nvme_tcp_queue_rq(struct blk_mq_hw_ctx *hctx,
> return BLK_STS_OK;
> }
>
> +static int nvme_tcp_map_queues(struct blk_mq_tag_set *set)
> +{
> + struct nvme_tcp_ctrl *ctrl = set->driver_data;
> + struct blk_mq_queue_map *map;
> +
> + if (ctrl->ctrl.opts->nr_write_queues) {
> + /* separate read/write queues */
> + map = &set->map[HCTX_TYPE_DEFAULT];
> + map->queue_offset = 0;
> + map->nr_queues = ctrl->ctrl.opts->nr_write_queues;
> + blk_mq_map_queues(map);
Shouldn't this use nr_io_queues?
> + map = &set->map[HCTX_TYPE_READ];
> + map->nr_queues = ctrl->ctrl.opts->nr_io_queues;
> + map->queue_offset = ctrl->ctrl.opts->nr_write_queues;
> + blk_mq_map_queues(map);
> + } else {
> + /* mixed read/write queues */
> + map = &set->map[HCTX_TYPE_DEFAULT];
> + map->queue_offset = 0;
> + map->nr_queues = ctrl->ctrl.opts->nr_io_queues;
> + blk_mq_map_queues(map);
> +
> + map = &set->map[HCTX_TYPE_READ];
> + map->queue_offset = 0;
> + map->nr_queues = ctrl->ctrl.opts->nr_io_queues;
> + blk_mq_map_queues(map);
Also I find the reused local map variable a little odd and not helpful
for readability. What about something like:
static int nvme_tcp_map_queues(struct blk_mq_tag_set *set)
{
struct nvme_tcp_ctrl *ctrl = set->driver_data;
set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
set->map[HCTX_TYPE_DEFAULT].nr_queues = ctrl->ctrl.opts->nr_io_queues;
blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
if (ctrl->ctrl.opts->nr_write_queues) {
/* separate read/write queues */
set->map[HCTX_TYPE_READ].queue_offset =
ctrl->ctrl.opts->nr_io_queues;
set->map[HCTX_TYPE_READ].nr_queues =
ctrl->ctrl.opts->nr_write_queues;
} else {
/* mixed read/write queues */
set->map[HCTX_TYPE_READ].queue_offset = 0;
set->map[HCTX_TYPE_READ].nr_queues =
ctrl->ctrl.opts->nr_io_queues;
}
blk_mq_map_queues(&set->map[HCTX_TYPE_READ]);
return 0;
}
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 5/5] nvme-rdma: support read/write queue separation
2018-12-11 10:49 ` [PATCH 5/5] nvme-rdma: support read/write queue separation Sagi Grimberg
@ 2018-12-11 13:42 ` Christoph Hellwig
0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2018-12-11 13:42 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block, Jens Axboe
This probably wants a little explanation..
> struct nvme_rdma_ctrl *ctrl = set->driver_data;
> + struct blk_mq_queue_map *map;
> + int offset = 0;
> +
> + if (ctrl->ctrl.opts->nr_write_queues) {
> + /* separate read/write queues */
> + map = &set->map[HCTX_TYPE_DEFAULT];
> + map->queue_offset = offset;
> + map->nr_queues = ctrl->ctrl.opts->nr_write_queues;
> + blk_mq_rdma_map_queues(map, ctrl->device->dev, 0);
> + offset += map->nr_queues;
> +
> + map = &set->map[HCTX_TYPE_READ];
> + map->nr_queues = ctrl->ctrl.opts->nr_io_queues;
> + map->queue_offset = offset;
> + blk_mq_rdma_map_queues(map, ctrl->device->dev, offset);
> + offset += map->nr_queues;
>
> + } else {
> + /* mixed read/write queues */
> + map = &set->map[HCTX_TYPE_DEFAULT];
> + map->queue_offset = 0;
> + map->nr_queues = ctrl->ctrl.opts->nr_io_queues;
> + blk_mq_rdma_map_queues(map, ctrl->device->dev, 0);
> +
> + map = &set->map[HCTX_TYPE_READ];
> + map->queue_offset = 0;
> + map->nr_queues = ctrl->ctrl.opts->nr_io_queues;
> + blk_mq_rdma_map_queues(map, ctrl->device->dev, 0);
> + }
Same comment and suggested style as for the TCP one.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH nvme-cli 6/5] fabrics: pass in nr_write_queues
2018-12-11 10:49 ` [PATCH nvme-cli 6/5] fabrics: pass in nr_write_queues Sagi Grimberg
@ 2018-12-11 19:30 ` Keith Busch
2018-12-11 23:34 ` Sagi Grimberg
0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2018-12-11 19:30 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, linux-block, Jens Axboe
On Tue, Dec 11, 2018 at 02:49:36AM -0800, Sagi Grimberg wrote:
> if (cfg.host_traddr) {
> len = sprintf(p, ",host_traddr=%s", cfg.host_traddr);
> if (len < 0)
> @@ -1009,6 +1019,7 @@ int connect(const char *desc, int argc, char **argv)
> {"hostnqn", 'q', "LIST", CFG_STRING, &cfg.hostnqn, required_argument, "user-defined hostnqn" },
> {"hostid", 'I', "LIST", CFG_STRING, &cfg.hostid, required_argument, "user-defined hostid (if default not used)"},
> {"nr-io-queues", 'i', "LIST", CFG_INT, &cfg.nr_io_queues, required_argument, "number of io queues to use (default is core count)" },
> + {"nr_write_queues", 'W', "LIST", CFG_INT, &cfg.nr_write_queues, required_argument, "number of write queues to use (default 0)" },
I realize the kernel's nvme-admin uses "_", but everything else in the
shell utility here uses "-", so just want to use that for consistency.
And the documentation needs to include this as well.
Otherwise this is great.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 4/5] nvme-tcp: support separate queue maps for read and write
2018-12-11 13:41 ` Christoph Hellwig
@ 2018-12-11 23:11 ` Sagi Grimberg
0 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2018-12-11 23:11 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-nvme, Keith Busch, linux-block, Jens Axboe
>> +static int nvme_tcp_map_queues(struct blk_mq_tag_set *set)
>> +{
>> + struct nvme_tcp_ctrl *ctrl = set->driver_data;
>> + struct blk_mq_queue_map *map;
>> +
>> + if (ctrl->ctrl.opts->nr_write_queues) {
>> + /* separate read/write queues */
>> + map = &set->map[HCTX_TYPE_DEFAULT];
>> + map->queue_offset = 0;
>> + map->nr_queues = ctrl->ctrl.opts->nr_write_queues;
>> + blk_mq_map_queues(map);
>
> Shouldn't this use nr_io_queues?
The intent is that HCTX_TYPE_READ will always use nr_io_queues and
HCTX_TYPE_DEFAULT will use nr_write_queues.. I'll document that
in the change log.
> Also I find the reused local map variable a little odd and not helpful
> for readability. What about something like:
>
> static int nvme_tcp_map_queues(struct blk_mq_tag_set *set)
> {
> struct nvme_tcp_ctrl *ctrl = set->driver_data;
>
> set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
> set->map[HCTX_TYPE_DEFAULT].nr_queues = ctrl->ctrl.opts->nr_io_queues;
> blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
>
> if (ctrl->ctrl.opts->nr_write_queues) {
> /* separate read/write queues */
> set->map[HCTX_TYPE_READ].queue_offset =
> ctrl->ctrl.opts->nr_io_queues;
> set->map[HCTX_TYPE_READ].nr_queues =
> ctrl->ctrl.opts->nr_write_queues;
> } else {
> /* mixed read/write queues */
> set->map[HCTX_TYPE_READ].queue_offset = 0;
> set->map[HCTX_TYPE_READ].nr_queues =
> ctrl->ctrl.opts->nr_io_queues;
> }
> blk_mq_map_queues(&set->map[HCTX_TYPE_READ]);
> return 0;
> }
That's better.. I'll update the patch with a change according to my
note above..
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH nvme-cli 6/5] fabrics: pass in nr_write_queues
2018-12-11 19:30 ` Keith Busch
@ 2018-12-11 23:34 ` Sagi Grimberg
0 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2018-12-11 23:34 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-nvme, Christoph Hellwig, linux-block, Jens Axboe
> On Tue, Dec 11, 2018 at 02:49:36AM -0800, Sagi Grimberg wrote:
>> if (cfg.host_traddr) {
>> len = sprintf(p, ",host_traddr=%s", cfg.host_traddr);
>> if (len < 0)
>> @@ -1009,6 +1019,7 @@ int connect(const char *desc, int argc, char **argv)
>> {"hostnqn", 'q', "LIST", CFG_STRING, &cfg.hostnqn, required_argument, "user-defined hostnqn" },
>> {"hostid", 'I', "LIST", CFG_STRING, &cfg.hostid, required_argument, "user-defined hostid (if default not used)"},
>> {"nr-io-queues", 'i', "LIST", CFG_INT, &cfg.nr_io_queues, required_argument, "number of io queues to use (default is core count)" },
>> + {"nr_write_queues", 'W', "LIST", CFG_INT, &cfg.nr_write_queues, required_argument, "number of write queues to use (default 0)" },
>
> I realize the kernel's nvme-admin uses "_", but everything else in the
> shell utility here uses "-", so just want to use that for consistency.
> And the documentation needs to include this as well.
Will do.
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2018-12-11 23:34 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-11 10:49 [PATCH 0/5] implement nvmf read/write queue maps Sagi Grimberg
2018-12-11 10:49 ` [PATCH 1/5] blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues Sagi Grimberg
2018-12-11 13:34 ` Christoph Hellwig
2018-12-11 10:49 ` [PATCH 2/5] nvme-fabrics: add missing nvmf_ctrl_options documentation Sagi Grimberg
2018-12-11 13:35 ` Christoph Hellwig
2018-12-11 10:49 ` [PATCH 3/5] nvme-fabrics: allow user to set nr_write_queues for separate queue maps Sagi Grimberg
2018-12-11 13:35 ` Christoph Hellwig
2018-12-11 10:49 ` [PATCH 4/5] nvme-tcp: support separate queue maps for read and write Sagi Grimberg
2018-12-11 13:41 ` Christoph Hellwig
2018-12-11 23:11 ` Sagi Grimberg
2018-12-11 10:49 ` [PATCH 5/5] nvme-rdma: support read/write queue separation Sagi Grimberg
2018-12-11 13:42 ` Christoph Hellwig
2018-12-11 10:49 ` [PATCH nvme-cli 6/5] fabrics: pass in nr_write_queues Sagi Grimberg
2018-12-11 19:30 ` Keith Busch
2018-12-11 23:34 ` Sagi Grimberg
2018-12-11 13:28 ` [PATCH 0/5] implement nvmf read/write queue maps Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).