* [PATCH] nvme pci: fix nvme_setup_irqs()
@ 2019-01-03  1:34 Ming Lei
  2019-01-07  1:31 ` Ming Lei
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Ming Lei @ 2019-01-03  1:34 UTC (permalink / raw)


When -ENOSPC is returned from pci_alloc_irq_vectors_affinity(), we
retry the allocation with fewer vectors, and in that retry path
irq_queues actually covers the admin queue as well. We don't account
for that, so the number of allocated irq vectors may equal the sum of
io_queues[HCTX_TYPE_DEFAULT] and io_queues[HCTX_TYPE_READ], leaving
no vector for the admin queue. This breaks nvme_pci_map_queues() and
triggers the warning in pci_irq_get_affinity().

IRQ queues should cover the admin queue; this patch makes that
explicit in nvme_calc_io_queues().

We have received several internal reports of boot failures on
aarch64, so please consider fixing this in v4.20.
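
To illustrate with hypothetical numbers: suppose the retry asks for 8
vectors and the allocation succeeds. One of those vectors serves the
admin queue, so only 7 are usable for I/O, yet the old accounting can
still set io_queues[HCTX_TYPE_DEFAULT] + io_queues[HCTX_TYPE_READ] to
8. nvme_pci_map_queues() then maps an I/O queue beyond the vectors
actually available for I/O, and pci_irq_get_affinity() warns.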

Fixes: 6451fe73fa0f ("nvme: fix irq vs io_queue calculations")
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
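
For anyone who wants to sanity-check the accounting outside the
kernel, here is a minimal userspace sketch (illustrative only: the
helper mirrors the fixed nvme_calc_io_queues() logic, and the sample
inputs are assumed values, not something the driver produces):

#include <stdio.h>

enum { HCTX_TYPE_DEFAULT, HCTX_TYPE_READ, HCTX_TYPE_MAX };

/* irq_queues includes the one vector reserved for the admin queue */
static void calc_io_queues(unsigned int io_queues[HCTX_TYPE_MAX],
			   unsigned int irq_queues,
			   unsigned int write_queues)
{
	unsigned int this_w_queues = write_queues;

	if (irq_queues <= 2) {
		/* 1 or 1+1 vectors: admin and I/O end up sharing */
		io_queues[HCTX_TYPE_DEFAULT] = 1;
		io_queues[HCTX_TYPE_READ] = 0;
		return;
	}

	/* leave room for one read queue and the admin vector */
	if (this_w_queues >= irq_queues)
		this_w_queues = irq_queues - 2;

	if (!this_w_queues) {
		io_queues[HCTX_TYPE_DEFAULT] = irq_queues - 1;
		io_queues[HCTX_TYPE_READ] = 0;
	} else {
		io_queues[HCTX_TYPE_DEFAULT] = this_w_queues;
		io_queues[HCTX_TYPE_READ] = irq_queues - this_w_queues - 1;
	}
}

int main(void)
{
	unsigned int io_queues[HCTX_TYPE_MAX];
	unsigned int irq_queues = 8;	/* assumed allocation result */

	calc_io_queues(io_queues, irq_queues, 3 /* assumed write_queues */);

	/* the I/O queue sets must leave exactly one vector for admin */
	printf("default=%u read=%u admin=1 total=%u\n",
	       io_queues[HCTX_TYPE_DEFAULT],
	       io_queues[HCTX_TYPE_READ],
	       io_queues[HCTX_TYPE_DEFAULT] +
	       io_queues[HCTX_TYPE_READ] + 1);
	return 0;
}

With irq_queues = 8 and write_queues = 3 this prints "default=3
read=4 admin=1 total=8", i.e. the two I/O queue sets plus the admin
vector exactly consume the allocated vectors.
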
 drivers/nvme/host/pci.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5a0bf6a24d50..584ea7a57122 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2028,14 +2028,18 @@ static int nvme_setup_host_mem(struct nvme_dev *dev)
 	return ret;
 }
 
+/* irq_queues covers admin queue */
 static void nvme_calc_io_queues(struct nvme_dev *dev, unsigned int irq_queues)
 {
 	unsigned int this_w_queues = write_queues;
 
+	WARN_ON(!irq_queues);
+
 	/*
-	 * Setup read/write queue split
+	 * Setup read/write queue split, assign admin queue one independent
+	 * irq vector if irq_queues is > 1.
 	 */
-	if (irq_queues == 1) {
+	if (irq_queues <= 2) {
 		dev->io_queues[HCTX_TYPE_DEFAULT] = 1;
 		dev->io_queues[HCTX_TYPE_READ] = 0;
 		return;
@@ -2043,21 +2047,21 @@ static void nvme_calc_io_queues(struct nvme_dev *dev, unsigned int irq_queues)
 
 	/*
 	 * If 'write_queues' is set, ensure it leaves room for at least
-	 * one read queue
+	 * one read queue and one admin queue
 	 */
 	if (this_w_queues >= irq_queues)
-		this_w_queues = irq_queues - 1;
+		this_w_queues = irq_queues - 2;
 
 	/*
 	 * If 'write_queues' is set to zero, reads and writes will share
 	 * a queue set.
 	 */
 	if (!this_w_queues) {
-		dev->io_queues[HCTX_TYPE_DEFAULT] = irq_queues;
+		dev->io_queues[HCTX_TYPE_DEFAULT] = irq_queues - 1;
 		dev->io_queues[HCTX_TYPE_READ] = 0;
 	} else {
 		dev->io_queues[HCTX_TYPE_DEFAULT] = this_w_queues;
-		dev->io_queues[HCTX_TYPE_READ] = irq_queues - this_w_queues;
+		dev->io_queues[HCTX_TYPE_READ] = irq_queues - this_w_queues - 1;
 	}
 }
 
@@ -2082,7 +2086,7 @@ static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
 		this_p_queues = nr_io_queues - 1;
 		irq_queues = 1;
 	} else {
-		irq_queues = nr_io_queues - this_p_queues;
+		irq_queues = nr_io_queues - this_p_queues + 1;
 	}
 	dev->io_queues[HCTX_TYPE_POLL] = this_p_queues;
 
@@ -2102,8 +2106,9 @@ static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
 		 * If we got a failure and we're down to asking for just
 		 * 1 + 1 queues, just ask for a single vector. We'll share
 		 * that between the single IO queue and the admin queue.
+		 * Otherwise, we assign one independent vector to the admin queue.
 		 */
-		if (result >= 0 && irq_queues > 1)
+		if (irq_queues > 1)
 			irq_queues = irq_sets[0] + irq_sets[1] + 1;
 
 		result = pci_alloc_irq_vectors_affinity(pdev, irq_queues,
-- 
2.9.5


* [PATCH] nvme pci: fix nvme_setup_irqs()
  2019-01-03  1:34 [PATCH] nvme pci: fix nvme_setup_irqs() Ming Lei
@ 2019-01-07  1:31 ` Ming Lei
  2019-01-07 16:24 ` Keith Busch
  2019-01-15  5:51 ` Christoph Hellwig
  2 siblings, 0 replies; 5+ messages in thread
From: Ming Lei @ 2019-01-07  1:31 UTC (permalink / raw)


On Thu, Jan 03, 2019 at 09:34:39AM +0800, Ming Lei wrote:
> When -ENOSPC is returned from pci_alloc_irq_vectors_affinity(), we
> retry the allocation with fewer vectors, and in that retry path
> irq_queues actually covers the admin queue as well. We don't account
> for that, so the number of allocated irq vectors may equal the sum of
> io_queues[HCTX_TYPE_DEFAULT] and io_queues[HCTX_TYPE_READ], leaving
> no vector for the admin queue. This breaks nvme_pci_map_queues() and
> triggers the warning in pci_irq_get_affinity().
> 
> IRQ queues should cover the admin queue; this patch makes that
> explicit in nvme_calc_io_queues().
> 
> We have received several internal reports of boot failures on
> aarch64, so please consider fixing this in v4.20.
> 
> [ patch tags and diff snipped; quoted in full in the original message above ]

Ping...

Thanks,
Ming


* [PATCH] nvme pci: fix nvme_setup_irqs()
  2019-01-03  1:34 [PATCH] nvme pci: fix nvme_setup_irqs() Ming Lei
  2019-01-07  1:31 ` Ming Lei
@ 2019-01-07 16:24 ` Keith Busch
  2019-01-10  0:51   ` Ming Lei
  2019-01-15  5:51 ` Christoph Hellwig
  2 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2019-01-07 16:24 UTC (permalink / raw)


On Thu, Jan 03, 2019 at 09:34:39AM +0800, Ming Lei wrote:
> When -ENOSPC is returned from pci_alloc_irq_vectors_affinity(), we
> retry the allocation with fewer vectors, and in that retry path
> irq_queues actually covers the admin queue as well. We don't account
> for that, so the number of allocated irq vectors may equal the sum of
> io_queues[HCTX_TYPE_DEFAULT] and io_queues[HCTX_TYPE_READ], leaving
> no vector for the admin queue. This breaks nvme_pci_map_queues() and
> triggers the warning in pci_irq_get_affinity().
> 
> IRQ queues should cover the admin queue; this patch makes that
> explicit in nvme_calc_io_queues().
> 
> We have received several internal reports of boot failures on
> aarch64, so please consider fixing this in v4.20.

I see what you're saying about the inconsistent meaning of irq_queues,
though 4.20 should be fine.

I hope we can make the irq sets easier to use in the future, but your
patch looks correct for the current interface.

Reviewed-by: Keith Busch <keith.busch@intel.com>

> Fixes: 6451fe73fa0f ("nvme: fix irq vs io_queue calculations")
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Jens Axboe <axboe@fb.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>


* [PATCH] nvme pci: fix nvme_setup_irqs()
  2019-01-07 16:24 ` Keith Busch
@ 2019-01-10  0:51   ` Ming Lei
  0 siblings, 0 replies; 5+ messages in thread
From: Ming Lei @ 2019-01-10  0:51 UTC (permalink / raw)


On Mon, Jan 07, 2019 at 09:24:06AM -0700, Keith Busch wrote:
> On Thu, Jan 03, 2019 at 09:34:39AM +0800, Ming Lei wrote:
> > When -ENOSPC is returned from pci_alloc_irq_vectors_affinity(), we
> > retry the allocation with fewer vectors, and in that retry path
> > irq_queues actually covers the admin queue as well. We don't account
> > for that, so the number of allocated irq vectors may equal the sum of
> > io_queues[HCTX_TYPE_DEFAULT] and io_queues[HCTX_TYPE_READ], leaving
> > no vector for the admin queue. This breaks nvme_pci_map_queues() and
> > triggers the warning in pci_irq_get_affinity().
> > 
> > IRQ queues should cover the admin queue; this patch makes that
> > explicit in nvme_calc_io_queues().
> > 
> > We have received several internal reports of boot failures on
> > aarch64, so please consider fixing this in v4.20.
> 
> I see what you're saying about the inconsistent meaning of irq_queues,
> though 4.20 should be fine.
> 
> I hope we can make the irq sets easier to use in the future, but your

Yeah, I agree the current API isn't easy to use for irq sets, and we
need to improve this.

> patch looks correct for the current interface.
> 
> Reviewed-by: Keith Busch <keith.busch@intel.com>

Thanks!

Christoph, what do you think of this patch?

Thanks,
Ming


* [PATCH] nvme pci: fix nvme_setup_irqs()
  2019-01-03  1:34 [PATCH] nvme pci: fix nvme_setup_irqs() Ming Lei
  2019-01-07  1:31 ` Ming Lei
  2019-01-07 16:24 ` Keith Busch
@ 2019-01-15  5:51 ` Christoph Hellwig
  2 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2019-01-15  5:51 UTC (permalink / raw)


Thanks,

applied to nvme-5.0.

