* [PATCH 1/2] nvme-rdma: fix queue mapping when queue count is limited
@ 2019-05-24  4:10 Sagi Grimberg
  2019-05-24  4:10 ` [PATCH 2/2] nvme-tcp: " Sagi Grimberg
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Sagi Grimberg @ 2019-05-24  4:10 UTC (permalink / raw)


When the controller supports fewer queues than requested, we
should make sure that queue mapping does the right thing and
not assume that all queues are available. This fixes a crash
when the controller supports fewer queues than requested.

The rules are:
1. if no write/poll queues are requested, we assign the available queues
   to the default queue map. The default and read queue maps share the
   existing queues.
2. if write queues are requested:
  - first make sure that read queue map gets the requested
    nr_io_queues count
  - then grant the default queue map the minimum between the requested
    nr_write_queues and the remaining queues. If there are no available
    queues to dedicate to the default queue map, fallback to (1) and
    share all the queues in the existing queue map.
3. if poll queues are requested:
  - map the remaining queues to the poll queue map.

Also, provide a log indication on how we constructed the different
queue maps.
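
As a rough illustration of the rules above, the split boils down to the
following standalone sketch (plain C; the function and variable names are
illustrative only, not the driver's):

#include <stdio.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * "granted" is the queue count the controller actually gave us; the
 * nr_* arguments are what the user requested.
 */
static void split_queues(unsigned int granted, unsigned int nr_io_queues,
                         unsigned int nr_write_queues,
                         unsigned int nr_poll_queues,
                         unsigned int *def, unsigned int *read,
                         unsigned int *poll)
{
        *def = *read = *poll = 0;

        if (nr_write_queues && nr_io_queues < granted) {
                /* separate read/write: the read map is filled first */
                *read = nr_io_queues;
                granted -= *read;
                *def = MIN(nr_write_queues, granted);
                granted -= *def;
        } else {
                /* shared read/write: both maps use the same queues */
                *def = MIN(nr_io_queues, granted);
                granted -= *def;
        }

        if (nr_poll_queues && granted)
                /* dedicated poll queues only if anything is left */
                *poll = MIN(nr_poll_queues, granted);
}

int main(void)
{
        unsigned int def, read, poll;

        /* e.g. 8 read + 8 write + 2 poll requested, 10 queues granted */
        split_queues(10, 8, 8, 2, &def, &read, &poll);
        printf("default=%u read=%u poll=%u\n", def, read, poll); /* 2/8/0 */
        return 0;
}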

One side-effect of this change is that we remove the limitation of
the queue count to the rdma device's number of completion vectors.
No rdma device uses managed affinity, so we cannot guarantee any
vector-based queue mapping anyway, and there is no real reason to
keep the limitation. This change is squashed here because this patch
needs to go to stable kernels and it is simpler to have it here than
to have it live on its own without a clear indication of why it
ended up in stable kernels.

Reported-by: Harris, James R <james.r.harris at intel.com>
Cc: <stable at vger.kernel.org> # v5.0+
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/rdma.c | 94 ++++++++++++++++++++++++----------------
 1 file changed, 56 insertions(+), 38 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f383146e7d0f..cd7d93727b30 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -640,35 +640,12 @@ static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
 static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
 {
 	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
-	struct ib_device *ibdev = ctrl->device->dev;
 	unsigned int nr_io_queues;
 	int i, ret;
 
-	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
-
-	/*
-	 * we map queues according to the device irq vectors for
-	 * optimal locality so we don't need more queues than
-	 * completion vectors.
-	 */
-	nr_io_queues = min_t(unsigned int, nr_io_queues,
-				ibdev->num_comp_vectors);
-
-	if (opts->nr_write_queues) {
-		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
-				min(opts->nr_write_queues, nr_io_queues);
-		nr_io_queues += ctrl->io_queues[HCTX_TYPE_DEFAULT];
-	} else {
-		ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
-	}
-
-	ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
-
-	if (opts->nr_poll_queues) {
-		ctrl->io_queues[HCTX_TYPE_POLL] =
-			min(opts->nr_poll_queues, num_online_cpus());
-		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
-	}
+	nr_io_queues = min(opts->nr_io_queues, num_online_cpus()) +
+		min(opts->nr_write_queues, num_online_cpus()) +
+		min(opts->nr_poll_queues, num_online_cpus());
 
 	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
 	if (ret)
@@ -681,6 +658,34 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
 	dev_info(ctrl->ctrl.device,
 		"creating %d I/O queues.\n", nr_io_queues);
 
+	if (opts->nr_write_queues && opts->nr_io_queues < nr_io_queues) {
+		/*
+		 * separate read/write queues
+		 * hand out dedicated default queues only after we have
+		 * sufficient read queues.
+		 */
+		ctrl->io_queues[HCTX_TYPE_READ] = opts->nr_io_queues;
+		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_READ];
+		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
+			min(opts->nr_write_queues, nr_io_queues);
+		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
+	} else {
+		/*
+		 * shared read/write queues
+		 * either no write queues were requested, or we don't have
+		 * sufficient queue count to have dedicated default queues.
+		 */
+		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
+			min(opts->nr_io_queues, nr_io_queues);
+		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
+	}
+
+	if (opts->nr_poll_queues && nr_io_queues) {
+		/* map dedicated poll queues only if we have queues left */
+		ctrl->io_queues[HCTX_TYPE_POLL] =
+			min(opts->nr_poll_queues, nr_io_queues);
+	}
+
 	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
 		ret = nvme_rdma_alloc_queue(ctrl, i,
 				ctrl->ctrl.sqsize + 1);
@@ -1763,17 +1768,24 @@ static void nvme_rdma_complete_rq(struct request *rq)
 static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
 {
 	struct nvme_rdma_ctrl *ctrl = set->driver_data;
+	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
 
-	set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
-	set->map[HCTX_TYPE_DEFAULT].nr_queues =
-			ctrl->io_queues[HCTX_TYPE_DEFAULT];
-	set->map[HCTX_TYPE_READ].nr_queues = ctrl->io_queues[HCTX_TYPE_READ];
-	if (ctrl->ctrl.opts->nr_write_queues) {
+	if (opts->nr_write_queues && ctrl->io_queues[HCTX_TYPE_READ]) {
 		/* separate read/write queues */
+		set->map[HCTX_TYPE_DEFAULT].nr_queues =
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
+		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
+		set->map[HCTX_TYPE_READ].nr_queues =
+			ctrl->io_queues[HCTX_TYPE_READ];
 		set->map[HCTX_TYPE_READ].queue_offset =
-				ctrl->io_queues[HCTX_TYPE_DEFAULT];
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
 	} else {
-		/* mixed read/write queues */
+		/* shared read/write queues */
+		set->map[HCTX_TYPE_DEFAULT].nr_queues =
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
+		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
+		set->map[HCTX_TYPE_READ].nr_queues =
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
 		set->map[HCTX_TYPE_READ].queue_offset = 0;
 	}
 	blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_DEFAULT],
@@ -1781,16 +1793,22 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
 	blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_READ],
 			ctrl->device->dev, 0);
 
-	if (ctrl->ctrl.opts->nr_poll_queues) {
+	if (opts->nr_poll_queues && ctrl->io_queues[HCTX_TYPE_POLL]) {
+		/* map dedicated poll queues only if we have queues left */
 		set->map[HCTX_TYPE_POLL].nr_queues =
 				ctrl->io_queues[HCTX_TYPE_POLL];
 		set->map[HCTX_TYPE_POLL].queue_offset =
-				ctrl->io_queues[HCTX_TYPE_DEFAULT];
-		if (ctrl->ctrl.opts->nr_write_queues)
-			set->map[HCTX_TYPE_POLL].queue_offset +=
-				ctrl->io_queues[HCTX_TYPE_READ];
+			ctrl->io_queues[HCTX_TYPE_DEFAULT] +
+			ctrl->io_queues[HCTX_TYPE_READ];
 		blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);
 	}
+
+	dev_info(ctrl->ctrl.device,
+		"mapped %d/%d/%d default/read/poll queues.\n",
+		ctrl->io_queues[HCTX_TYPE_DEFAULT],
+		ctrl->io_queues[HCTX_TYPE_READ],
+		ctrl->io_queues[HCTX_TYPE_POLL]);
+
 	return 0;
 }
 
-- 
2.17.1


* [PATCH 2/2] nvme-tcp: fix queue mapping when queue count is limited
  2019-05-24  4:10 [PATCH 1/2] nvme-rdma: fix queue mapping when queue count is limited Sagi Grimberg
@ 2019-05-24  4:10 ` Sagi Grimberg
  2019-05-28 21:50   ` Harris, James R
  2019-05-27 15:43 ` [PATCH 1/2] nvme-rdma: " Max Gurtovoy
  2019-05-28 21:49 ` Harris, James R
  2 siblings, 1 reply; 6+ messages in thread
From: Sagi Grimberg @ 2019-05-24  4:10 UTC (permalink / raw)


When the controller supports fewer queues than requested, we
should make sure that queue mapping does the right thing and
not assume that all queues are available. This fixes a crash
when the controller supports fewer queues than requested.

The rules are:
1. if no write queues are requested, we assign the available queues
   to the default queue map. The default and read queue maps share the
   existing queues.
2. if write queues are requested:
  - first make sure that read queue map gets the requested
    nr_io_queues count
  - then grant the default queue map the minimum between the requested
    nr_write_queues and the remaining queues. If there are no available
    queues to dedicate to the default queue map, fallback to (1) and
    share all the queues in the existing queue map.

Also, provide a log indication on how we constructed the different
queue maps.
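
As a rough sketch of how those per-type counts turn into blk-mq map
offsets (simplified C; the struct below is a stand-in for the
nr_queues/queue_offset pair in struct blk_mq_queue_map, not the real type):

#include <stdio.h>

/* simplified stand-in for a blk-mq queue map */
struct qmap {
        unsigned int nr_queues;
        unsigned int queue_offset;
};

/*
 * "def" and "read" are the per-type counts computed above; read == 0
 * means the shared read/write case.
 */
static void layout_maps(unsigned int def, unsigned int read,
                        struct qmap *m_def, struct qmap *m_read)
{
        if (read) {
                /* separate read/write: read queues start after the defaults */
                m_def->nr_queues = def;
                m_def->queue_offset = 0;
                m_read->nr_queues = read;
                m_read->queue_offset = def;
        } else {
                /* shared: both maps cover the same range starting at 0 */
                m_def->nr_queues = def;
                m_def->queue_offset = 0;
                m_read->nr_queues = def;
                m_read->queue_offset = 0;
        }
}

int main(void)
{
        struct qmap d, r;

        /* e.g. 2 dedicated default queues and 8 read queues */
        layout_maps(2, 8, &d, &r);
        printf("default: %u@%u, read: %u@%u\n",
               d.nr_queues, d.queue_offset, r.nr_queues, r.queue_offset);
        return 0;
}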

Reported-by: Harris, James R <james.r.harris at intel.com>
Cc: <stable at vger.kernel.org> # v5.0+
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/tcp.c | 57 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 50 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 2b107a1d152b..08a2501b9357 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -111,6 +111,7 @@ struct nvme_tcp_ctrl {
 	struct work_struct	err_work;
 	struct delayed_work	connect_work;
 	struct nvme_tcp_request async_req;
+	u32			io_queues[HCTX_MAX_TYPES];
 };
 
 static LIST_HEAD(nvme_tcp_ctrl_list);
@@ -1564,6 +1565,35 @@ static unsigned int nvme_tcp_nr_io_queues(struct nvme_ctrl *ctrl)
 	return nr_io_queues;
 }
 
+static void nvme_tcp_set_io_queues(struct nvme_ctrl *nctrl,
+		unsigned int nr_io_queues)
+{
+	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
+	struct nvmf_ctrl_options *opts = nctrl->opts;
+
+	if (opts->nr_write_queues && opts->nr_io_queues < nr_io_queues) {
+		/*
+		 * separate read/write queues
+		 * hand out dedicated default queues only after we have
+		 * sufficient read queues.
+		 */
+		ctrl->io_queues[HCTX_TYPE_READ] = opts->nr_io_queues;
+		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_READ];
+		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
+			min(opts->nr_write_queues, nr_io_queues);
+		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
+	} else {
+		/*
+		 * shared read/write queues
+		 * either no write queues were requested, or we don't have
+		 * sufficient queue count to have dedicated default queues.
+		 */
+		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
+			min(opts->nr_io_queues, nr_io_queues);
+		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
+	}
+}
+
 static int nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
 {
 	unsigned int nr_io_queues;
@@ -1581,6 +1611,8 @@ static int nvme_tcp_alloc_io_queues(struct nvme_ctrl *ctrl)
 	dev_info(ctrl->device,
 		"creating %d I/O queues.\n", nr_io_queues);
 
+	nvme_tcp_set_io_queues(ctrl, nr_io_queues);
+
 	return __nvme_tcp_alloc_io_queues(ctrl);
 }
 
@@ -2089,23 +2121,34 @@ static blk_status_t nvme_tcp_queue_rq(struct blk_mq_hw_ctx *hctx,
 static int nvme_tcp_map_queues(struct blk_mq_tag_set *set)
 {
 	struct nvme_tcp_ctrl *ctrl = set->driver_data;
+	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
 
-	set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
-	set->map[HCTX_TYPE_READ].nr_queues = ctrl->ctrl.opts->nr_io_queues;
-	if (ctrl->ctrl.opts->nr_write_queues) {
+	if (opts->nr_write_queues && ctrl->io_queues[HCTX_TYPE_READ]) {
 		/* separate read/write queues */
 		set->map[HCTX_TYPE_DEFAULT].nr_queues =
-				ctrl->ctrl.opts->nr_write_queues;
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
+		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
+		set->map[HCTX_TYPE_READ].nr_queues =
+			ctrl->io_queues[HCTX_TYPE_READ];
 		set->map[HCTX_TYPE_READ].queue_offset =
-				ctrl->ctrl.opts->nr_write_queues;
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
 	} else {
-		/* mixed read/write queues */
+		/* shared read/write queues */
 		set->map[HCTX_TYPE_DEFAULT].nr_queues =
-				ctrl->ctrl.opts->nr_io_queues;
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
+		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
+		set->map[HCTX_TYPE_READ].nr_queues =
+			ctrl->io_queues[HCTX_TYPE_DEFAULT];
 		set->map[HCTX_TYPE_READ].queue_offset = 0;
 	}
 	blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
 	blk_mq_map_queues(&set->map[HCTX_TYPE_READ]);
+
+	dev_info(ctrl->ctrl.device,
+		"mapped %d/%d default/read queues.\n",
+		ctrl->io_queues[HCTX_TYPE_DEFAULT],
+		ctrl->io_queues[HCTX_TYPE_READ]);
+
 	return 0;
 }
 
-- 
2.17.1


* [PATCH 1/2] nvme-rdma: fix queue mapping when queue count is limited
  2019-05-24  4:10 [PATCH 1/2] nvme-rdma: fix queue mapping when queue count is limited Sagi Grimberg
  2019-05-24  4:10 ` [PATCH 2/2] nvme-tcp: " Sagi Grimberg
@ 2019-05-27 15:43 ` Max Gurtovoy
  2019-05-28 19:46   ` Sagi Grimberg
  2019-05-28 21:49 ` Harris, James R
  2 siblings, 1 reply; 6+ messages in thread
From: Max Gurtovoy @ 2019-05-27 15:43 UTC (permalink / raw)



On 5/24/2019 7:10 AM, Sagi Grimberg wrote:
> When the controller supports less queues than requested, we
> should make sure that queue mapping does the right thing and
> not assume that all queues are available. This fixes a crash
> when the controller supports less queues than requested.
>
> The rules are:
> 1. if no write/poll queues are requested, we assign the available queues
>     to the default queue map. The default and read queue maps share the
>     existing queues.
> 2. if write queues are requested:
>    - first make sure that read queue map gets the requested
>      nr_io_queues count
>    - then grant the default queue map the minimum between the requested
>      nr_write_queues and the remaining queues. If there are no available
>      queues to dedicate to the default queue map, fallback to (1) and
>      share all the queues in the existing queue map.
> 3. if poll queues are requested:
>    - map the remaining queues to the poll queue map.
>
> Also, provide a log indication on how we constructed the different
> queue maps.
>
> One side-affect change is that we remove the rdma device number of
> completion vectors device limitation when setting the queue count.

Why create more rw queues than completion vectors?

Did you test the performance with this change?

> No rdma device uses managed affinity so there is no real reason as
> we cannot guarantee any vector based queue mapping. This change is
> squashed here because this patch needs to go to stable kernel and
> its simpler having it here than having it live on its own without
> a clear indication why it ended up in stable kenrels.
>
> Reported-by: Harris, James R <james.r.harris at intel.com>
> Cc: <stable at vger.kernel.org> # v5.0+
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
>   drivers/nvme/host/rdma.c | 94 ++++++++++++++++++++++++----------------
>   1 file changed, 56 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index f383146e7d0f..cd7d93727b30 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -640,35 +640,12 @@ static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
>   static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>   {
>   	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
> -	struct ib_device *ibdev = ctrl->device->dev;
>   	unsigned int nr_io_queues;
>   	int i, ret;
>   
> -	nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
> -
> -	/*
> -	 * we map queues according to the device irq vectors for
> -	 * optimal locality so we don't need more queues than
> -	 * completion vectors.
> -	 */
> -	nr_io_queues = min_t(unsigned int, nr_io_queues,
> -				ibdev->num_comp_vectors);

I wouldn't do it in this patch as we need to see perf results for this.

> -
> -	if (opts->nr_write_queues) {
> -		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
> -				min(opts->nr_write_queues, nr_io_queues);
> -		nr_io_queues += ctrl->io_queues[HCTX_TYPE_DEFAULT];
> -	} else {
> -		ctrl->io_queues[HCTX_TYPE_DEFAULT] = nr_io_queues;
> -	}
> -
> -	ctrl->io_queues[HCTX_TYPE_READ] = nr_io_queues;
> -
> -	if (opts->nr_poll_queues) {
> -		ctrl->io_queues[HCTX_TYPE_POLL] =
> -			min(opts->nr_poll_queues, num_online_cpus());
> -		nr_io_queues += ctrl->io_queues[HCTX_TYPE_POLL];
> -	}
> +	nr_io_queues = min(opts->nr_io_queues, num_online_cpus()) +
> +		min(opts->nr_write_queues, num_online_cpus()) +
> +		min(opts->nr_poll_queues, num_online_cpus());
>   
>   	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
>   	if (ret)
> @@ -681,6 +658,34 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>   	dev_info(ctrl->ctrl.device,
>   		"creating %d I/O queues.\n", nr_io_queues);
>   
> +	if (opts->nr_write_queues && opts->nr_io_queues < nr_io_queues) {
I would use local variables for the write/poll/default queue counts and
not the ctrl->opts pointer.

opts->nr_write_queues is not limited to num_online_cpus() and can be 1000
(and you might steal poll queues).

Maybe we should limit it in nvmf_parse_options ?

> +		/*
> +		 * separate read/write queues
> +		 * hand out dedicated default queues only after we have
> +		 * sufficient read queues.
> +		 */
> +		ctrl->io_queues[HCTX_TYPE_READ] = opts->nr_io_queues;
> +		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_READ];
> +		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
> +			min(opts->nr_write_queues, nr_io_queues);
> +		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
> +	} else {
> +		/*
> +		 * shared read/write queues
> +		 * either no write queues were requested, or we don't have
> +		 * sufficient queue count to have dedicated default queues.
> +		 */
> +		ctrl->io_queues[HCTX_TYPE_DEFAULT] =
> +			min(opts->nr_io_queues, nr_io_queues);
> +		nr_io_queues -= ctrl->io_queues[HCTX_TYPE_DEFAULT];
> +	}
> +
> +	if (opts->nr_poll_queues && nr_io_queues) {
> +		/* map dedicated poll queues only if we have queues left */
> +		ctrl->io_queues[HCTX_TYPE_POLL] =
> +			min(opts->nr_poll_queues, nr_io_queues);
> +	}
> +
>   	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
>   		ret = nvme_rdma_alloc_queue(ctrl, i,
>   				ctrl->ctrl.sqsize + 1);
> @@ -1763,17 +1768,24 @@ static void nvme_rdma_complete_rq(struct request *rq)
>   static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
>   {
>   	struct nvme_rdma_ctrl *ctrl = set->driver_data;
> +	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
>   
> -	set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
> -	set->map[HCTX_TYPE_DEFAULT].nr_queues =
> -			ctrl->io_queues[HCTX_TYPE_DEFAULT];
> -	set->map[HCTX_TYPE_READ].nr_queues = ctrl->io_queues[HCTX_TYPE_READ];
> -	if (ctrl->ctrl.opts->nr_write_queues) {
> +	if (opts->nr_write_queues && ctrl->io_queues[HCTX_TYPE_READ]) {
>   		/* separate read/write queues */
> +		set->map[HCTX_TYPE_DEFAULT].nr_queues =
> +			ctrl->io_queues[HCTX_TYPE_DEFAULT];
> +		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
> +		set->map[HCTX_TYPE_READ].nr_queues =
> +			ctrl->io_queues[HCTX_TYPE_READ];
>   		set->map[HCTX_TYPE_READ].queue_offset =
> -				ctrl->io_queues[HCTX_TYPE_DEFAULT];
> +			ctrl->io_queues[HCTX_TYPE_DEFAULT];
>   	} else {
> -		/* mixed read/write queues */
> +		/* shared read/write queues */
> +		set->map[HCTX_TYPE_DEFAULT].nr_queues =
> +			ctrl->io_queues[HCTX_TYPE_DEFAULT];
> +		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
> +		set->map[HCTX_TYPE_READ].nr_queues =
> +			ctrl->io_queues[HCTX_TYPE_DEFAULT];
>   		set->map[HCTX_TYPE_READ].queue_offset = 0;

Why should we set the READ map in the case of shared queues?

In that case we should rename TYPE_DEFAULT to TYPE_WRITE...


>   	}
>   	blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_DEFAULT],
> @@ -1781,16 +1793,22 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
>   	blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_READ],
>   			ctrl->device->dev, 0);
>   
> -	if (ctrl->ctrl.opts->nr_poll_queues) {
> +	if (opts->nr_poll_queues && ctrl->io_queues[HCTX_TYPE_POLL]) {
> +		/* map dedicated poll queues only if we have queues left */
>   		set->map[HCTX_TYPE_POLL].nr_queues =
>   				ctrl->io_queues[HCTX_TYPE_POLL];
>   		set->map[HCTX_TYPE_POLL].queue_offset =
> -				ctrl->io_queues[HCTX_TYPE_DEFAULT];
> -		if (ctrl->ctrl.opts->nr_write_queues)
> -			set->map[HCTX_TYPE_POLL].queue_offset +=
> -				ctrl->io_queues[HCTX_TYPE_READ];
> +			ctrl->io_queues[HCTX_TYPE_DEFAULT] +
> +			ctrl->io_queues[HCTX_TYPE_READ];

I would use set->map[TYPE].nr_queues (the real offset).


>   		blk_mq_map_queues(&set->map[HCTX_TYPE_POLL]);
>   	}
> +
> +	dev_info(ctrl->ctrl.device,
> +		"mapped %d/%d/%d default/read/poll queues.\n",
> +		ctrl->io_queues[HCTX_TYPE_DEFAULT],
> +		ctrl->io_queues[HCTX_TYPE_READ],
> +		ctrl->io_queues[HCTX_TYPE_POLL]);
> +
>   	return 0;
>   }
>   


* [PATCH 1/2] nvme-rdma: fix queue mapping when queue count is limited
  2019-05-27 15:43 ` [PATCH 1/2] nvme-rdma: " Max Gurtovoy
@ 2019-05-28 19:46   ` Sagi Grimberg
  0 siblings, 0 replies; 6+ messages in thread
From: Sagi Grimberg @ 2019-05-28 19:46 UTC (permalink / raw)



>> When the controller supports less queues than requested, we
>> should make sure that queue mapping does the right thing and
>> not assume that all queues are available. This fixes a crash
>> when the controller supports less queues than requested.
>>
>> The rules are:
>> 1. if no write/poll queues are requested, we assign the available queues
>>    to the default queue map. The default and read queue maps share the
>>    existing queues.
>> 2. if write queues are requested:
>>   - first make sure that read queue map gets the requested
>>     nr_io_queues count
>>   - then grant the default queue map the minimum between the requested
>>     nr_write_queues and the remaining queues. If there are no available
>>     queues to dedicate to the default queue map, fallback to (1) and
>>     share all the queues in the existing queue map.
>> 3. if poll queues are requested:
>>   - map the remaining queues to the poll queue map.
>>
>> Also, provide a log indication on how we constructed the different
>> queue maps.
>>
>> One side-affect change is that we remove the rdma device number of
>> completion vectors device limitation when setting the queue count.
> 
> Why creating more rw queues than completion vectors ?

If the user asks for num_online_cpus read queues and num_online_cpus write
queues, it will never be able to get them. You are saying that each one
should be limited to the number of completion vectors?

I can do that.

>> @@ -681,6 +658,34 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
>>  	dev_info(ctrl->ctrl.device,
>>  		"creating %d I/O queues.\n", nr_io_queues);
>> +	if (opts->nr_write_queues && opts->nr_io_queues < nr_io_queues) {
> I would use local variables for write/poll/default queues and not 
> ctrl->opts pointer.
> 
> opts->nr_write_queues is not limited to num_online_cpu and can be 1000 
> (and you might steal poll queues).

This is just a question of whether write queues were requested; the
actual number is taken from ctrl->io_queues[HCTX_TYPE_DEFAULT], so
it cannot "steal" any queues.

> Maybe we should limit it in nvmf_parse_options ?

I would avoid changing what the user passed in.

>> @@ -1763,17 +1768,24 @@ static void nvme_rdma_complete_rq(struct request *rq)
>>  static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
>>  {
>>  	struct nvme_rdma_ctrl *ctrl = set->driver_data;
>> +	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
>> -	set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
>> -	set->map[HCTX_TYPE_DEFAULT].nr_queues =
>> -			ctrl->io_queues[HCTX_TYPE_DEFAULT];
>> -	set->map[HCTX_TYPE_READ].nr_queues = ctrl->io_queues[HCTX_TYPE_READ];
>> -	if (ctrl->ctrl.opts->nr_write_queues) {
>> +	if (opts->nr_write_queues && ctrl->io_queues[HCTX_TYPE_READ]) {
>>  		/* separate read/write queues */
>> +		set->map[HCTX_TYPE_DEFAULT].nr_queues =
>> +			ctrl->io_queues[HCTX_TYPE_DEFAULT];
>> +		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
>> +		set->map[HCTX_TYPE_READ].nr_queues =
>> +			ctrl->io_queues[HCTX_TYPE_READ];
>>  		set->map[HCTX_TYPE_READ].queue_offset =
>> -				ctrl->io_queues[HCTX_TYPE_DEFAULT];
>> +			ctrl->io_queues[HCTX_TYPE_DEFAULT];
>>  	} else {
>> -		/* mixed read/write queues */
>> +		/* shared read/write queues */
>> +		set->map[HCTX_TYPE_DEFAULT].nr_queues =
>> +			ctrl->io_queues[HCTX_TYPE_DEFAULT];
>> +		set->map[HCTX_TYPE_DEFAULT].queue_offset = 0;
>> +		set->map[HCTX_TYPE_READ].nr_queues =
>> +			ctrl->io_queues[HCTX_TYPE_DEFAULT];
>>  		set->map[HCTX_TYPE_READ].queue_offset = 0;
> 
> why we should set the READ map in case of shared queues ?

The READ map is always set regardless of the case; it's just
set differently.

> In that case we should rename TYPE_DEFAULT to TYPE_WRITE..

This is the shared queues case. This means that both default
and read queue maps relate to the same queues.

>> @@ -1781,16 +1793,22 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
>>  	blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_READ],
>>  			ctrl->device->dev, 0);
>> -	if (ctrl->ctrl.opts->nr_poll_queues) {
>> +	if (opts->nr_poll_queues && ctrl->io_queues[HCTX_TYPE_POLL]) {
>> +		/* map dedicated poll queues only if we have queues left */
>>  		set->map[HCTX_TYPE_POLL].nr_queues =
>>  				ctrl->io_queues[HCTX_TYPE_POLL];
>>  		set->map[HCTX_TYPE_POLL].queue_offset =
>> -				ctrl->io_queues[HCTX_TYPE_DEFAULT];
>> -		if (ctrl->ctrl.opts->nr_write_queues)
>> -			set->map[HCTX_TYPE_POLL].queue_offset +=
>> -				ctrl->io_queues[HCTX_TYPE_READ];
>> +			ctrl->io_queues[HCTX_TYPE_DEFAULT] +
>> +			ctrl->io_queues[HCTX_TYPE_READ];
> 
> I would use set->map[TYPE].nr_queues (the real offset).

It's exactly the same value, what's the difference?


* [PATCH 1/2] nvme-rdma: fix queue mapping when queue count is limited
  2019-05-24  4:10 [PATCH 1/2] nvme-rdma: fix queue mapping when queue count is limited Sagi Grimberg
  2019-05-24  4:10 ` [PATCH 2/2] nvme-tcp: " Sagi Grimberg
  2019-05-27 15:43 ` [PATCH 1/2] nvme-rdma: " Max Gurtovoy
@ 2019-05-28 21:49 ` Harris, James R
  2 siblings, 0 replies; 6+ messages in thread
From: Harris, James R @ 2019-05-28 21:49 UTC (permalink / raw)




On 5/23/19, 9:10 PM, "Sagi Grimberg" <sagi@grimberg.me> wrote:

    When the controller supports less queues than requested, we
    should make sure that queue mapping does the right thing and
    not assume that all queues are available. This fixes a crash
    when the controller supports less queues than requested.
    
    The rules are:
    1. if no write/poll queues are requested, we assign the available queues
       to the default queue map. The default and read queue maps share the
       existing queues.
    2. if write queues are requested:
      - first make sure that read queue map gets the requested
        nr_io_queues count
      - then grant the default queue map the minimum between the requested
        nr_write_queues and the remaining queues. If there are no available
        queues to dedicate to the default queue map, fallback to (1) and
        share all the queues in the existing queue map.
    3. if poll queues are requested:
      - map the remaining queues to the poll queue map.
    
    Also, provide a log indication on how we constructed the different
    queue maps.
    
    One side-affect change is that we remove the rdma device number of
    completion vectors device limitation when setting the queue count.
    No rdma device uses managed affinity so there is no real reason as
    we cannot guarantee any vector based queue mapping. This change is
    squashed here because this patch needs to go to stable kernel and
    its simpler having it here than having it live on its own without
    a clear indication why it ended up in stable kenrels.
    
    Reported-by: Harris, James R <james.r.harris at intel.com>
    Cc: <stable at vger.kernel.org> # v5.0+
    Signed-off-by: Sagi Grimberg <sagi at grimberg.me>

Thanks Sagi.  This works for me.

Tested-by: Jim Harris <james.r.harris at intel.com>


* [PATCH 2/2] nvme-tcp: fix queue mapping when queue count is limited
  2019-05-24  4:10 ` [PATCH 2/2] nvme-tcp: " Sagi Grimberg
@ 2019-05-28 21:50   ` Harris, James R
  0 siblings, 0 replies; 6+ messages in thread
From: Harris, James R @ 2019-05-28 21:50 UTC (permalink / raw)




On 5/23/19, 9:10 PM, "Sagi Grimberg" <sagi@grimberg.me> wrote:

    When the controller supports less queues than requested, we
    should make sure that queue mapping does the right thing and
    not assume that all queues are available. This fixes a crash
    when the controller supports less queues than requested.
    
    The rules are:
    1. if no write queues are requested, we assign the available queues
       to the default queue map. The default and read queue maps share the
       existing queues.
    2. if write queues are requested:
      - first make sure that read queue map gets the requested
        nr_io_queues count
      - then grant the default queue map the minimum between the requested
        nr_write_queues and the remaining queues. If there are no available
        queues to dedicate to the default queue map, fallback to (1) and
        share all the queues in the existing queue map.
    
    Also, provide a log indication on how we constructed the different
    queue maps.
    
    Reported-by: Harris, James R <james.r.harris at intel.com>
    Cc: <stable at vger.kernel.org> # v5.0+
    Signed-off-by: Sagi Grimberg <sagi at grimberg.me>

Thanks Sagi.  This works too.

Tested-by: Jim Harris <james.r.harris at intel.com>

