* [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Jens Axboe @ 2018-10-25 21:16 UTC
  To: linux-block, linux-nvme; +Cc: Jens Axboe, Thomas Gleixner, linux-kernel

A driver may have a need to allocate multiple sets of MSI/MSI-X
interrupts, and have them appropriately affinitized. Add support for
defining a number of sets in the irq_affinity structure, of varying
sizes, and get each set affinitized correctly across the machine.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/interrupt.h |  4 ++++
 kernel/irq/affinity.c     | 31 +++++++++++++++++++++++++------
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index eeceac3376fc..9fce2131902c 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -247,10 +247,14 @@ struct irq_affinity_notify {
  *			the MSI(-X) vector space
  * @post_vectors:	Don't apply affinity to @post_vectors at end of
  *			the MSI(-X) vector space
+ * @nr_sets:		Length of passed in *sets array
+ * @sets:		Sizes of the affinitized sets
  */
 struct irq_affinity {
 	int	pre_vectors;
 	int	post_vectors;
+	int	nr_sets;
+	int	*sets;
 };
 
 #if defined(CONFIG_SMP)
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f4f29b9d90ee..0055e252e438 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -180,6 +180,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	int curvec, usedvecs;
 	cpumask_var_t nmsk, npresmsk, *node_to_cpumask;
 	struct cpumask *masks = NULL;
+	int i, nr_sets;
 
 	/*
 	 * If there aren't any vectors left after applying the pre/post
@@ -210,10 +211,23 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	get_online_cpus();
 	build_node_to_cpumask(node_to_cpumask);
 
-	/* Spread on present CPUs starting from affd->pre_vectors */
-	usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
-					    node_to_cpumask, cpu_present_mask,
-					    nmsk, masks);
+	/*
+	 * Spread on present CPUs starting from affd->pre_vectors. If we
+	 * have multiple sets, build each set's affinity mask separately.
+	 */
+	nr_sets = affd->nr_sets;
+	if (!nr_sets)
+		nr_sets = 1;
+
+	for (i = 0, usedvecs = 0; i < nr_sets; i++) {
+		int this_vecs = affd->sets ? affd->sets[i] : affvecs;
+		int nr;
+
+		nr = irq_build_affinity_masks(affd, curvec, this_vecs,
+					      node_to_cpumask, cpu_present_mask,
+					      nmsk, masks + usedvecs);
+		usedvecs += nr;
+	}
 
 	/*
 	 * Spread on non present CPUs starting from the next vector to be
@@ -258,13 +272,18 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
 {
 	int resv = affd->pre_vectors + affd->post_vectors;
 	int vecs = maxvec - resv;
+	int i, set_vecs;
 	int ret;
 
 	if (resv > minvec)
 		return 0;
 
 	get_online_cpus();
-	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
+	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs);
 	put_online_cpus();
-	return ret;
+
+	for (i = 0, set_vecs = 0;  i < affd->nr_sets; i++)
+		set_vecs += affd->sets[i];
+
+	return resv + max(ret, set_vecs);
 }
-- 
2.17.1
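
As an illustration of the interface above, a driver wanting two
affinitized sets plus one non-affinitized admin vector could fill in the
structure along these lines (a minimal sketch; the set sizes and flags
are hypothetical, not taken from the patch):

	int sets[2] = { 8, 4 };	/* e.g. 8 read vectors, 4 write vectors */
	struct irq_affinity affd = {
		.pre_vectors	= 1,	/* admin vector, not spread */
		.nr_sets	= ARRAY_SIZE(sets),
		.sets		= sets,
	};
	int nvecs;

	/* 13 == 1 pre vector + 8 + 4 set vectors */
	nvecs = pci_alloc_irq_vectors_affinity(pdev, 13, 13,
			PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, &affd);

Each set is then spread across the machine independently, so vectors 1-8
and 9-12 each get their own per-CPU distribution.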



* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Keith Busch @ 2018-10-25 21:52 UTC
  To: Jens Axboe; +Cc: linux-block, linux-nvme, Thomas Gleixner, linux-kernel

On Thu, Oct 25, 2018 at 03:16:23PM -0600, Jens Axboe wrote:
> A driver may have a need to allocate multiple sets of MSI/MSI-X
> interrupts, and have them appropriately affinitized. Add support for
> defining a number of sets in the irq_affinity structure, of varying
> sizes, and get each set affinitized correctly across the machine.

<>

> @@ -258,13 +272,18 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
>  {
>  	int resv = affd->pre_vectors + affd->post_vectors;
>  	int vecs = maxvec - resv;
> +	int i, set_vecs;
>  	int ret;
>  
>  	if (resv > minvec)
>  		return 0;
>  
>  	get_online_cpus();
> -	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
> +	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs);
>  	put_online_cpus();
> -	return ret;
> +
> +	for (i = 0, set_vecs = 0;  i < affd->nr_sets; i++)
> +		set_vecs += affd->sets[i];
> +
> +	return resv + max(ret, set_vecs);
>  }

This is looking pretty good, but we may risk getting into an infinite
loop in __pci_enable_msix_range() if we're requesting too many vectors
in a set: the above code may keep returning set_vecs, overriding the
reduced nvec that PCI requested, so MSI-X initialization keeps failing
because it repeatedly asks to activate the same vector count that
failed before.
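
For reference, the retry loop in question looks roughly like this (a
paraphrased sketch of __pci_enable_msix_range(), trimmed to the control
flow that matters here):

	for (;;) {
		if (affd) {
			nvec = irq_calc_affinity_vectors(minvec, nvec, affd);
			if (nvec < minvec)
				return -ENOSPC;
		}

		rc = __pci_enable_msix(dev, entries, nvec, affd);
		if (rc == 0)
			return nvec;
		if (rc < 0)
			return rc;
		if (rc < minvec)
			return -ENOSPC;

		nvec = rc;	/* hardware reported a smaller count, retry */
	}

If irq_calc_affinity_vectors() keeps bumping nvec back up to the sum of
the sets, the reduction in 'rc' is thrown away and the loop never
converges.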


* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Jens Axboe @ 2018-10-25 23:07 UTC
  To: Keith Busch; +Cc: linux-block, linux-nvme, Thomas Gleixner, linux-kernel

On 10/25/18 3:52 PM, Keith Busch wrote:
> On Thu, Oct 25, 2018 at 03:16:23PM -0600, Jens Axboe wrote:
>> A driver may have a need to allocate multiple sets of MSI/MSI-X
>> interrupts, and have them appropriately affinitized. Add support for
>> defining a number of sets in the irq_affinity structure, of varying
>> sizes, and get each set affinitized correctly across the machine.
> 
> <>
> 
>> @@ -258,13 +272,18 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
>>  {
>>  	int resv = affd->pre_vectors + affd->post_vectors;
>>  	int vecs = maxvec - resv;
>> +	int i, set_vecs;
>>  	int ret;
>>  
>>  	if (resv > minvec)
>>  		return 0;
>>  
>>  	get_online_cpus();
>> -	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
>> +	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs);
>>  	put_online_cpus();
>> -	return ret;
>> +
>> +	for (i = 0, set_vecs = 0;  i < affd->nr_sets; i++)
>> +		set_vecs += affd->sets[i];
>> +
>> +	return resv + max(ret, set_vecs);
>>  }
> 
> This is looking pretty good, but we may risk getting into an infinite
> loop in __pci_enable_msix_range() if we're requesting too many vectors
> in a set: the above code may continue returning set_vecs, overriding
> the reduced nvec that pci requested, and pci msix initialization will
> continue to fail because it is repeatedly requesting to activate the
> same vector count that failed before.

Good catch; we always want to apply min() against the passed-in maxvec
there. How about this incremental?


diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 0055e252e438..2046a0f0f0f1 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -272,18 +272,21 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
 {
 	int resv = affd->pre_vectors + affd->post_vectors;
 	int vecs = maxvec - resv;
-	int i, set_vecs;
-	int ret;
+	int set_vecs;
 
 	if (resv > minvec)
 		return 0;
 
-	get_online_cpus();
-	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs);
-	put_online_cpus();
+	if (affd->nr_sets) {
+		int i;
 
-	for (i = 0, set_vecs = 0;  i < affd->nr_sets; i++)
-		set_vecs += affd->sets[i];
+		for (i = 0, set_vecs = 0;  i < affd->nr_sets; i++)
+			set_vecs += affd->sets[i];
+	} else {
+		get_online_cpus();
+		set_vecs = cpumask_weight(cpu_possible_mask);
+		put_online_cpus();
+	}
 
-	return resv + max(ret, set_vecs);
+	return resv + min(set_vecs, vecs);
 }
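
To see what changes with some hypothetical numbers: take pre_vectors = 1,
sets = { 60, 4 } and maxvec = 32 on a machine with at least 31 possible
CPUs. The previous version returned 1 + max(31, 64) = 65, more than PCI
can allocate, which is what feeds the retry loop above. This version
returns 1 + min(64, 31) = 32, so the result can never exceed maxvec.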

-- 
Jens Axboe



* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Hannes Reinecke @ 2018-10-29  7:43 UTC
  To: Jens Axboe, linux-block, linux-nvme; +Cc: Thomas Gleixner, linux-kernel

On 10/25/18 11:16 PM, Jens Axboe wrote:
> A driver may have a need to allocate multiple sets of MSI/MSI-X
> interrupts, and have them appropriately affinitized. Add support for
> defining a number of sets in the irq_affinity structure, of varying
> sizes, and get each set affinitized correctly across the machine.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>   include/linux/interrupt.h |  4 ++++
>   kernel/irq/affinity.c     | 31 +++++++++++++++++++++++++------
>   2 files changed, 29 insertions(+), 6 deletions(-)
> 

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes


* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Jens Axboe @ 2018-10-30 17:47 UTC
  To: Thomas Gleixner; +Cc: Keith Busch, linux-block, linux-scsi, linux-kernel

On 10/30/18 11:46 AM, Thomas Gleixner wrote:
> On Tue, 30 Oct 2018, Jens Axboe wrote:
>> On 10/30/18 11:25 AM, Thomas Gleixner wrote:
>>> Jens,
>>>
>>> On Tue, 30 Oct 2018, Jens Axboe wrote:
>>>> On 10/30/18 10:02 AM, Keith Busch wrote:
>>>>> pci_alloc_irq_vectors_affinity() starts at the provided max_vecs. If
>>>>> that doesn't work, it will iterate down to min_vecs without returning to
>>>>> the caller. The caller doesn't have a chance to adjust its sets between
>>>>> iterations when you provide a range.
>>>>>
>>>>> The 'masks' overrun problem happens if the caller provides min_vecs
>>>>> as a smaller value than the sum of the set (plus any reserved).
>>>>>
>>>>> If it's up to the caller to ensure that doesn't happen, then min and
>>>>> max must both be the same value, and that value must also be the same as
>>>>> the set sum + reserved vectors. The range just becomes redundant since
>>>>> it is already bounded by the set.
>>>>>
>>>>> Using the nvme example, it would need something like this to prevent the
>>>>> 'masks' overrun:
>>>>
>>>> OK, now I hear what you are saying. And you are right, the caller needs
>>>> to provide minvec == maxvec for sets, and then have a loop around that
>>>> to adjust as needed.
>>>
>>> But then we should enforce it in the core code, right?
>>
>> Yes, I was going to ask you if you want a followup patch for that, or
>> an updated version of the original?
> 
> Updated combo patch would be nice :)

I'll re-post the series with the updated combo some time later today.

> 	lazytglx

I understand :-)

-- 
Jens Axboe



* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Thomas Gleixner @ 2018-10-30 17:46 UTC
  To: Jens Axboe; +Cc: Keith Busch, linux-block, linux-scsi, linux-kernel

On Tue, 30 Oct 2018, Jens Axboe wrote:
> On 10/30/18 11:25 AM, Thomas Gleixner wrote:
> > Jens,
> > 
> > On Tue, 30 Oct 2018, Jens Axboe wrote:
> >> On 10/30/18 10:02 AM, Keith Busch wrote:
> >>> pci_alloc_irq_vectors_affinity() starts at the provided max_vecs. If
> >>> that doesn't work, it will iterate down to min_vecs without returning to
> >>> the caller. The caller doesn't have a chance to adjust its sets between
> >>> iterations when you provide a range.
> >>>
> >>> The 'masks' overrun problem happens if the caller provides min_vecs
> >>> as a smaller value than the sum of the set (plus any reserved).
> >>>
> >>> If it's up to the caller to ensure that doesn't happen, then min and
> >>> max must both be the same value, and that value must also be the same as
> >>> the set sum + reserved vectors. The range just becomes redundant since
> >>> it is already bounded by the set.
> >>>
> >>> Using the nvme example, it would need something like this to prevent the
> >>> 'masks' overrun:
> >>
> >> OK, now I hear what you are saying. And you are right, the caller needs
> >> to provide minvec == maxvec for sets, and then have a loop around that
> >> to adjust as needed.
> > 
> > But then we should enforce it in the core code, right?
> 
> Yes, I was going to ask you if you want a followup patch for that, or
> an updated version of the original?

Updated combo patch would be nice :)

Thanks

	lazytglx


* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Jens Axboe @ 2018-10-30 17:43 UTC
  To: Thomas Gleixner; +Cc: Keith Busch, linux-block, linux-scsi, linux-kernel

On 10/30/18 11:34 AM, Jens Axboe wrote:
> On 10/30/18 11:25 AM, Thomas Gleixner wrote:
>> Jens,
>>
>> On Tue, 30 Oct 2018, Jens Axboe wrote:
>>> On 10/30/18 10:02 AM, Keith Busch wrote:
>>>> pci_alloc_irq_vectors_affinity() starts at the provided max_vecs. If
>>>> that doesn't work, it will iterate down to min_vecs without returning to
>>>> the caller. The caller doesn't have a chance to adjust its sets between
>>>> iterations when you provide a range.
>>>>
>>>> The 'masks' overrun problem happens if the caller provides min_vecs
>>>> as a smaller value than the sum of the set (plus any reserved).
>>>>
>>>> If it's up to the caller to ensure that doesn't happen, then min and
>>>> max must both be the same value, and that value must also be the same as
>>>> the set sum + reserved vectors. The range just becomes redundant since
>>>> it is already bounded by the set.
>>>>
>>>> Using the nvme example, it would need something like this to prevent the
>>>> 'masks' overrun:
>>>
>>> OK, now I hear what you are saying. And you are right, the caller needs
>>> to provide minvec == maxvec for sets, and then have a loop around that
>>> to adjust as needed.
>>
>> But then we should enforce it in the core code, right?
> 
> Yes, I was going to ask you if you want a followup patch for that, or
> an updated version of the original?

Here's an incremental; I'm going to fold this into the original unless
I hear otherwise.


diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index af24ed50a245..e6c6e10b9ceb 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1036,6 +1036,13 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 	if (maxvec < minvec)
 		return -ERANGE;
 
+	/*
+	 * If the caller is passing in sets, we can't support a range of
+	 * vectors. The caller needs to handle that.
+	 */
+	if (affd->nr_sets && minvec != maxvec)
+		return -EINVAL;
+
 	if (WARN_ON_ONCE(dev->msi_enabled))
 		return -EINVAL;
 
@@ -1087,6 +1094,13 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
 	if (maxvec < minvec)
 		return -ERANGE;
 
+	/*
+	 * If the caller is passing in sets, we can't support a range of
+	 * vectors. The caller needs to handle that.
+	 */
+	if (affd->nr_sets && minvec != maxvec)
+		return -EINVAL;
+
 	if (WARN_ON_ONCE(dev->msix_enabled))
 		return -EINVAL;
 

-- 
Jens Axboe
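
With that restriction in place, the expected caller-side pattern becomes
a loop of roughly this shape (a hedged sketch, not code from the series;
shrink_sets() stands in for whatever reduction policy the driver uses,
and the set sizes are hypothetical):

	int sets[2] = { 8, 4 };	/* hypothetical starting split */
	struct irq_affinity affd = {
		.pre_vectors	= 1,	/* admin vector */
		.nr_sets	= ARRAY_SIZE(sets),
		.sets		= sets,
	};
	int nvec, ret;

	do {
		/* one vector per set entry, plus the reserved admin vector */
		nvec = sets[0] + sets[1] + 1;
		ret = pci_alloc_irq_vectors_affinity(pdev, nvec, nvec,
				PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, &affd);
		if (ret != -ENOSPC)
			break;
		shrink_sets(sets);	/* hypothetical: reduce the sets and retry */
	} while (sets[0] + sets[1] > 0);

The nvme conversion further down the thread implements exactly this,
shrinking the I/O queue count by one per iteration.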



* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Keith Busch @ 2018-10-30 17:35 UTC
  To: Jens Axboe; +Cc: linux-block, linux-scsi, linux-kernel, Thomas Gleixner

On Tue, Oct 30, 2018 at 11:33:51AM -0600, Jens Axboe wrote:
> On 10/30/18 11:22 AM, Keith Busch wrote:
> > On Tue, Oct 30, 2018 at 11:09:04AM -0600, Jens Axboe wrote:
> >> Pretty trivial, below. This also keeps the queue mapping calculations
> >> cleaner, as we don't have to do one after we're done allocating
> >> IRQs.
> > 
> > Yep, this addresses my concern. It less efficient than PCI since PCI
> > can usually jump straight to a valid vector count in a single iteration
> > where this only subtracts by 1. I really can't be bothered to care for
> > optimizing that, so this works for me! :) 
> 
> It definitely is less efficient than just getting the count that we
> can support, but it's at probe time so I could not really be bothered
> either.
> 
> Can I add your reviewed-by?

Yes, please.

Reviewed-by: Keith Busch <keith.busch@intel.com>

> -- 
> Jens Axboe


* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Jens Axboe @ 2018-10-30 17:34 UTC
  To: Thomas Gleixner; +Cc: Keith Busch, linux-block, linux-scsi, linux-kernel

On 10/30/18 11:25 AM, Thomas Gleixner wrote:
> Jens,
> 
> On Tue, 30 Oct 2018, Jens Axboe wrote:
>> On 10/30/18 10:02 AM, Keith Busch wrote:
>>> pci_alloc_irq_vectors_affinity() starts at the provided max_vecs. If
>>> that doesn't work, it will iterate down to min_vecs without returning to
>>> the caller. The caller doesn't have a chance to adjust its sets between
>>> iterations when you provide a range.
>>>
>>> The 'masks' overrun problem happens if the caller provides min_vecs
>>> as a smaller value than the sum of the set (plus any reserved).
>>>
>>> If it's up to the caller to ensure that doesn't happen, then min and
>>> max must both be the same value, and that value must also be the same as
>>> the set sum + reserved vectors. The range just becomes redundant since
>>> it is already bounded by the set.
>>>
>>> Using the nvme example, it would need something like this to prevent the
>>> 'masks' overrun:
>>
>> OK, now I hear what you are saying. And you are right, the caller needs
>> to provide minvec == maxvec for sets, and then have a loop around that
>> to adjust as needed.
> 
> But then we should enforce it in the core code, right?

Yes, I was going to ask you if you want a followup patch for that, or
an updated version of the original?

-- 
Jens Axboe



* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Jens Axboe @ 2018-10-30 17:33 UTC
  To: Keith Busch; +Cc: linux-block, linux-scsi, linux-kernel, Thomas Gleixner

On 10/30/18 11:22 AM, Keith Busch wrote:
> On Tue, Oct 30, 2018 at 11:09:04AM -0600, Jens Axboe wrote:
>> Pretty trivial, below. This also keeps the queue mapping calculations
>> cleaner, as we don't have to do one after we're done allocating
>> IRQs.
> 
> Yep, this addresses my concern. It's less efficient than PCI since PCI
> can usually jump straight to a valid vector count in a single iteration
> where this only subtracts by 1. I really can't be bothered to care for
> optimizing that, so this works for me! :) 

It definitely is less efficient than just getting the count that we
can support, but it's at probe time so I could not really be bothered
either.

Can I add your reviewed-by?

-- 
Jens Axboe



* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Thomas Gleixner @ 2018-10-30 17:25 UTC
  To: Jens Axboe; +Cc: Keith Busch, linux-block, linux-scsi, linux-kernel

Jens,

On Tue, 30 Oct 2018, Jens Axboe wrote:
> On 10/30/18 10:02 AM, Keith Busch wrote:
> > pci_alloc_irq_vectors_affinity() starts at the provided max_vecs. If
> > that doesn't work, it will iterate down to min_vecs without returning to
> > the caller. The caller doesn't have a chance to adjust its sets between
> > iterations when you provide a range.
> > 
> > The 'masks' overrun problem happens if the caller provides min_vecs
> > as a smaller value than the sum of the set (plus any reserved).
> > 
> > If it's up to the caller to ensure that doesn't happen, then min and
> > max must both be the same value, and that value must also be the same as
> > the set sum + reserved vectors. The range just becomes redundant since
> > it is already bounded by the set.
> > 
> > Using the nvme example, it would need something like this to prevent the
> > 'masks' overrun:
> 
> OK, now I hear what you are saying. And you are right, the caller needs
> to provide minvec == maxvec for sets, and then have a loop around that
> to adjust as needed.

But then we should enforce it in the core code, right?

Thanks,

	tglx


* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Keith Busch @ 2018-10-30 17:22 UTC
  To: Jens Axboe; +Cc: linux-block, linux-scsi, linux-kernel, Thomas Gleixner

On Tue, Oct 30, 2018 at 11:09:04AM -0600, Jens Axboe wrote:
> Pretty trivial, below. This also keeps the queue mapping calculations
> cleaner, as we don't have to do one after we're done allocating
> IRQs.

Yep, this addresses my concern. It's less efficient than PCI since PCI
can usually jump straight to a valid vector count in a single iteration
where this only subtracts by 1. I really can't be bothered to care for
optimizing that, so this works for me! :) 
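
(For a sense of scale, with hypothetical numbers: if the driver asks for
64 vectors and the device can only provide 32, PCI's own range loop lands
on 32 after a single retry, while the subtract-by-one loop makes over 30
calls. Since this only runs at probe time, the difference is noise.)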


* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Jens Axboe @ 2018-10-30 17:09 UTC
  To: Keith Busch; +Cc: linux-block, linux-scsi, linux-kernel, Thomas Gleixner

On 10/30/18 10:42 AM, Jens Axboe wrote:
> On 10/30/18 10:02 AM, Keith Busch wrote:
>> On Tue, Oct 30, 2018 at 09:18:05AM -0600, Jens Axboe wrote:
>>> On 10/30/18 9:08 AM, Keith Busch wrote:
>>>> On Tue, Oct 30, 2018 at 08:53:37AM -0600, Jens Axboe wrote:
>>>>> The sum of the sets can't exceed the nvecs passed in; the nvecs passed in
>>>>> should be greater than or equal to that sum. Granted this isn't enforced,
>>>>> and perhaps that should be the case.
>>>>
>>>> That should at least initially be true for a proper functioning
>>>> driver. It's not enforced as you mentioned, but that's only related to
>>>> the issue I'm referring to.
>>>>
>>>> The problem is pci_alloc_irq_vectors_affinity() takes a range, min_vecs
>>>> and max_vecs, but a range of allowable vector allocations doesn't make
>>>> sense when using sets.
>>>
>>> I feel like we're going in circles here, not sure what you feel the
>>> issue is now? The range is fine, whoever uses sets will need to adjust
>>> their sets based on what pci_alloc_irq_vectors_affinity() returns,
>>> if it didn't return the passed in desired max.
>>
>> Sorry, let me to try again.
>>
>> pci_alloc_irq_vectors_affinity() starts at the provided max_vecs. If
>> that doesn't work, it will iterate down to min_vecs without returning to
>> the caller. The caller doesn't have a chance to adjust its sets between
>> iterations when you provide a range.
>>
>> The 'masks' overrun problem happens if the caller provides min_vecs
>> as a smaller value than the sum of the set (plus any reserved).
>>
>> If it's up to the caller to ensure that doesn't happen, then min and
>> max must both be the same value, and that value must also be the same as
>> the set sum + reserved vectors. The range just becomes redundant since
>> it is already bounded by the set.
>>
>> Using the nvme example, it would need something like this to prevent the
>> 'masks' overrun:
> 
> OK, now I hear what you are saying. And you are right, the caller needs
> to provide minvec == maxvec for sets, and then have a loop around that
> to adjust as needed.
> 
> I'll make that change in nvme.

Pretty trivial, below. This also keeps the queue mapping calculations
cleaner, as we don't have to do one after we're done allocating
IRQs.


commit e8a35d023a192e34540c60f779fe755970b8eeb2
Author: Jens Axboe <axboe@kernel.dk>
Date:   Tue Oct 30 11:06:29 2018 -0600

    nvme: utilize two queue maps, one for reads and one for writes
    
    NVMe does round-robin between queues by default, which means that
    sharing a queue map for both reads and writes can be problematic
    in terms of read servicing. It's much easier to flood the queue
    with writes and reduce the read servicing.
    
    Implement two queue maps, one for reads and one for writes. The
    write queue count is configurable through the 'write_queues'
    parameter.
    
    By default, we retain the previous behavior of having a single
    queue set, shared between reads and writes. Setting 'write_queues'
    to a non-zero value will create two queue sets, one for reads and
    one for writes, the latter using the configurable number of
    queues (hardware queue counts permitting).
    
    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e5d783cb6937..17170686105f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -74,11 +74,29 @@ static int io_queue_depth = 1024;
 module_param_cb(io_queue_depth, &io_queue_depth_ops, &io_queue_depth, 0644);
 MODULE_PARM_DESC(io_queue_depth, "set io queue depth, should >= 2");
 
+static int queue_count_set(const char *val, const struct kernel_param *kp);
+static const struct kernel_param_ops queue_count_ops = {
+	.set = queue_count_set,
+	.get = param_get_int,
+};
+
+static int write_queues;
+module_param_cb(write_queues, &queue_count_ops, &write_queues, 0644);
+MODULE_PARM_DESC(write_queues,
+	"Number of queues to use for writes. If not set, reads and writes "
+	"will share a queue set.");
+
 struct nvme_dev;
 struct nvme_queue;
 
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
 
+enum {
+	NVMEQ_TYPE_READ,
+	NVMEQ_TYPE_WRITE,
+	NVMEQ_TYPE_NR,
+};
+
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
  */
@@ -92,6 +110,7 @@ struct nvme_dev {
 	struct dma_pool *prp_small_pool;
 	unsigned online_queues;
 	unsigned max_qid;
+	unsigned io_queues[NVMEQ_TYPE_NR];
 	unsigned int num_vecs;
 	int q_depth;
 	u32 db_stride;
@@ -134,6 +153,17 @@ static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
 	return param_set_int(val, kp);
 }
 
+static int queue_count_set(const char *val, const struct kernel_param *kp)
+{
+	int n = 0, ret;
+
+	ret = kstrtoint(val, 10, &n);
+	if (ret || n > num_possible_cpus())
+		return -EINVAL;
+
+	return param_set_int(val, kp);
+}
+
 static inline unsigned int sq_idx(unsigned int qid, u32 stride)
 {
 	return qid * 2 * stride;
@@ -218,9 +248,20 @@ static inline void _nvme_check_size(void)
 	BUILD_BUG_ON(sizeof(struct nvme_dbbuf) != 64);
 }
 
+static unsigned int max_io_queues(void)
+{
+	return num_possible_cpus() + write_queues;
+}
+
+static unsigned int max_queue_count(void)
+{
+	/* IO queues + admin queue */
+	return 1 + max_io_queues();
+}
+
 static inline unsigned int nvme_dbbuf_size(u32 stride)
 {
-	return ((num_possible_cpus() + 1) * 8 * stride);
+	return (max_queue_count() * 8 * stride);
 }
 
 static int nvme_dbbuf_dma_alloc(struct nvme_dev *dev)
@@ -431,12 +472,41 @@ static int nvme_init_request(struct blk_mq_tag_set *set, struct request *req,
 	return 0;
 }
 
+static int queue_irq_offset(struct nvme_dev *dev)
+{
+	/* if we have more than 1 vector, the admin queue takes vector 0 */
+	if (dev->num_vecs > 1)
+		return 1;
+
+	return 0;
+}
+
 static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
 {
 	struct nvme_dev *dev = set->driver_data;
+	int i, qoff, offset;
+
+	offset = queue_irq_offset(dev);
+	for (i = 0, qoff = 0; i < set->nr_maps; i++) {
+		struct blk_mq_queue_map *map = &set->map[i];
 
-	return blk_mq_pci_map_queues(&set->map[0], to_pci_dev(dev->dev),
-			dev->num_vecs > 1 ? 1 /* admin queue */ : 0);
+		map->nr_queues = dev->io_queues[i];
+		if (!map->nr_queues) {
+			BUG_ON(i == NVMEQ_TYPE_READ);
+
+			/* shared set, reuse read set parameters */
+			map->nr_queues = dev->io_queues[NVMEQ_TYPE_READ];
+			qoff = 0;
+			offset = queue_irq_offset(dev);
+		}
+
+		map->queue_offset = qoff;
+		blk_mq_pci_map_queues(map, to_pci_dev(dev->dev), offset);
+		qoff += map->nr_queues;
+		offset += map->nr_queues;
+	}
+
+	return 0;
 }
 
 /**
@@ -849,6 +919,14 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return ret;
 }
 
+static int nvme_flags_to_type(struct request_queue *q, unsigned int flags)
+{
+	if ((flags & REQ_OP_MASK) == REQ_OP_READ)
+		return NVMEQ_TYPE_READ;
+
+	return NVMEQ_TYPE_WRITE;
+}
+
 static void nvme_pci_complete_rq(struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -1476,6 +1554,7 @@ static const struct blk_mq_ops nvme_mq_admin_ops = {
 
 static const struct blk_mq_ops nvme_mq_ops = {
 	.queue_rq	= nvme_queue_rq,
+	.flags_to_type	= nvme_flags_to_type,
 	.complete	= nvme_pci_complete_rq,
 	.init_hctx	= nvme_init_hctx,
 	.init_request	= nvme_init_request,
@@ -1888,18 +1967,53 @@ static int nvme_setup_host_mem(struct nvme_dev *dev)
 	return ret;
 }
 
+static void nvme_calc_io_queues(struct nvme_dev *dev, unsigned int nr_io_queues)
+{
+	unsigned int this_w_queues = write_queues;
+
+	/*
+	 * Setup read/write queue split
+	 */
+	if (nr_io_queues == 1) {
+		dev->io_queues[NVMEQ_TYPE_READ] = 1;
+		dev->io_queues[NVMEQ_TYPE_WRITE] = 0;
+		return;
+	}
+
+	/*
+	 * If 'write_queues' is set, ensure it leaves room for at least
+	 * one read queue
+	 */
+	if (this_w_queues >= nr_io_queues)
+		this_w_queues = nr_io_queues - 1;
+
+	/*
+	 * If 'write_queues' is set to zero, reads and writes will share
+	 * a queue set.
+	 */
+	if (!this_w_queues) {
+		dev->io_queues[NVMEQ_TYPE_WRITE] = 0;
+		dev->io_queues[NVMEQ_TYPE_READ] = nr_io_queues;
+	} else {
+		dev->io_queues[NVMEQ_TYPE_WRITE] = this_w_queues;
+		dev->io_queues[NVMEQ_TYPE_READ] = nr_io_queues - this_w_queues;
+	}
+}
+
 static int nvme_setup_io_queues(struct nvme_dev *dev)
 {
 	struct nvme_queue *adminq = &dev->queues[0];
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 	int result, nr_io_queues;
 	unsigned long size;
-
+	int irq_sets[2];
 	struct irq_affinity affd = {
-		.pre_vectors = 1
+		.pre_vectors = 1,
+		.nr_sets = ARRAY_SIZE(irq_sets),
+		.sets = irq_sets,
 	};
 
-	nr_io_queues = num_possible_cpus();
+	nr_io_queues = max_io_queues();
 	result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);
 	if (result < 0)
 		return result;
@@ -1934,13 +2048,48 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	 * setting up the full range we need.
 	 */
 	pci_free_irq_vectors(pdev);
-	result = pci_alloc_irq_vectors_affinity(pdev, 1, nr_io_queues + 1,
-			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
-	if (result <= 0)
-		return -EIO;
+
+	/*
+	 * For irq sets, we have to ask for minvec == maxvec. This passes
+	 * any reduction back to us, so we can adjust our queue counts and
+	 * IRQ vector needs.
+	 */
+	do {
+		nvme_calc_io_queues(dev, nr_io_queues);
+		irq_sets[0] = dev->io_queues[NVMEQ_TYPE_READ];
+		irq_sets[1] = dev->io_queues[NVMEQ_TYPE_WRITE];
+		if (!irq_sets[1])
+			affd.nr_sets = 1;
+
+		/*
+		 * Need IRQs for read+write queues, and one for the admin queue
+		 */
+		nr_io_queues = irq_sets[0] + irq_sets[1];
+
+		result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues + 1,
+				nr_io_queues + 1,
+				PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
+
+		/*
+		 * Need to reduce our vec counts
+		 */
+		if (result == -ENOSPC) {
+			nr_io_queues--;
+			if (!nr_io_queues)
+				return result;
+			continue;
+		} else if (result <= 0)
+			return -EIO;
+		break;
+	} while (1);
+
 	dev->num_vecs = result;
 	dev->max_qid = max(result - 1, 1);
 
+	dev_info(dev->ctrl.device, "%d/%d read/write queues\n",
+					dev->io_queues[NVMEQ_TYPE_READ],
+					dev->io_queues[NVMEQ_TYPE_WRITE]);
+
 	/*
 	 * Should investigate if there's a performance win from allocating
 	 * more queues than interrupt vectors; it might allow the submission
@@ -2042,6 +2191,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
 	if (!dev->ctrl.tagset) {
 		dev->tagset.ops = &nvme_mq_ops;
 		dev->tagset.nr_hw_queues = dev->online_queues - 1;
+		dev->tagset.nr_maps = NVMEQ_TYPE_NR;
 		dev->tagset.timeout = NVME_IO_TIMEOUT;
 		dev->tagset.numa_node = dev_to_node(dev->dev);
 		dev->tagset.queue_depth =
@@ -2489,8 +2639,8 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (!dev)
 		return -ENOMEM;
 
-	dev->queues = kcalloc_node(num_possible_cpus() + 1,
-			sizeof(struct nvme_queue), GFP_KERNEL, node);
+	dev->queues = kcalloc_node(max_queue_count(), sizeof(struct nvme_queue),
+					GFP_KERNEL, node);
 	if (!dev->queues)
 		goto free;
 

-- 
Jens Axboe
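
To trace the read/write split with hypothetical numbers: with
write_queues=4 and 16 I/O queues granted by the controller,
nvme_calc_io_queues() yields io_queues[READ] = 12 and io_queues[WRITE] = 4,
so the first pci_alloc_irq_vectors_affinity() call asks for 12 + 4 + 1 = 17
vectors (the +1 being the admin vector). If the device can only supply 9
vectors, the -ENOSPC path drops one I/O queue per iteration until the
request shrinks to 8 + 1 = 9, at which point the split recalculates to
READ = 4, WRITE = 4 and the allocation succeeds.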



* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Jens Axboe @ 2018-10-30 16:42 UTC
  To: Keith Busch; +Cc: linux-block, linux-scsi, linux-kernel, Thomas Gleixner

On 10/30/18 10:02 AM, Keith Busch wrote:
> On Tue, Oct 30, 2018 at 09:18:05AM -0600, Jens Axboe wrote:
>> On 10/30/18 9:08 AM, Keith Busch wrote:
>>> On Tue, Oct 30, 2018 at 08:53:37AM -0600, Jens Axboe wrote:
>>>> The sum of the sets can't exceed the nvecs passed in; the nvecs passed in
>>>> should be greater than or equal to that sum. Granted this isn't enforced,
>>>> and perhaps that should be the case.
>>>
>>> That should at least initially be true for a proper functioning
>>> driver. It's not enforced as you mentioned, but that's only related to
>>> the issue I'm referring to.
>>>
>>> The problem is pci_alloc_irq_vectors_affinity() takes a range, min_vecs
>>> and max_vecs, but a range of allowable vector allocations doesn't make
>>> sense when using sets.
>>
>> I feel like we're going in circles here, not sure what you feel the
>> issue is now? The range is fine, whoever uses sets will need to adjust
>> their sets based on what pci_alloc_irq_vectors_affinity() returns,
>> if it didn't return the passed in desired max.
> 
> Sorry, let me to try again.
> 
> pci_alloc_irq_vectors_affinity() starts at the provided max_vecs. If
> that doesn't work, it will iterate down to min_vecs without returning to
> the caller. The caller doesn't have a chance to adjust its sets between
> iterations when you provide a range.
> 
> The 'masks' overrun problem happens if the caller provides min_vecs
> as a smaller value than the sum of the set (plus any reserved).
> 
> If it's up to the caller to ensure that doesn't happen, then min and
> max must both be the same value, and that value must also be the same as
> the set sum + reserved vectors. The range just becomes redundant since
> it is already bounded by the set.
> 
> Using the nvme example, it would need something like this to prevent the
> 'masks' overrun:

OK, now I hear what you are saying. And you are right, the caller needs
to provide minvec == maxvec for sets, and then have a loop around that
to adjust as needed.

I'll make that change in nvme.

-- 
Jens Axboe



* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Keith Busch @ 2018-10-30 16:02 UTC
  To: Jens Axboe; +Cc: linux-block, linux-scsi, linux-kernel, Thomas Gleixner

On Tue, Oct 30, 2018 at 09:18:05AM -0600, Jens Axboe wrote:
> On 10/30/18 9:08 AM, Keith Busch wrote:
> > On Tue, Oct 30, 2018 at 08:53:37AM -0600, Jens Axboe wrote:
> >> The sum of the sets can't exceed the nvecs passed in; the nvecs passed in
> >> should be greater than or equal to that sum. Granted this isn't enforced,
> >> and perhaps that should be the case.
> > 
> > That should at least initially be true for a proper functioning
> > driver. It's not enforced as you mentioned, but that's only related to
> > the issue I'm referring to.
> > 
> > The problem is pci_alloc_irq_vectors_affinity() takes a range, min_vecs
> > and max_vecs, but a range of allowable vector allocations doesn't make
> > sense when using sets.
> 
> I feel like we're going in circles here, not sure what you feel the
> issue is now? The range is fine, whoever uses sets will need to adjust
> their sets based on what pci_alloc_irq_vectors_affinity() returns,
> if it didn't return the passed in desired max.

Sorry, let me to try again.

pci_alloc_irq_vectors_affinity() starts at the provided max_vecs. If
that doesn't work, it will iterate down to min_vecs without returning to
the caller. The caller doesn't have a chance to adjust its sets between
iterations when you provide a range.

The 'masks' overrun problem happens if the caller provides min_vecs
as a smaller value than the sum of the set (plus any reserved).

If it's up to the caller to ensure that doesn't happen, then min and
max must both be the same value, and that value must also be the same as
the set sum + reserved vectors. The range just becomes redundant since
it is already bounded by the set.

Using the nvme example, it would need something like this to prevent the
'masks' overrun:

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a8747b956e43..625eff570eaa 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2120,7 +2120,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	 * setting up the full range we need.
 	 */
 	pci_free_irq_vectors(pdev);
-	result = pci_alloc_irq_vectors_affinity(pdev, 1, nr_io_queues,
+	result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues, nr_io_queues,
 			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
 	if (result <= 0)
 		return -EIO;
--


* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Jens Axboe @ 2018-10-30 15:18 UTC
  To: Keith Busch; +Cc: linux-block, linux-scsi, linux-kernel, Thomas Gleixner

On 10/30/18 9:08 AM, Keith Busch wrote:
> On Tue, Oct 30, 2018 at 08:53:37AM -0600, Jens Axboe wrote:
>> The sum of the sets can't exceed the nvecs passed in; the nvecs passed in
>> should be greater than or equal to that sum. Granted this isn't enforced,
>> and perhaps that should be the case.
> 
> That should at least initially be true for a proper functioning
> driver. It's not enforced as you mentioned, but that's only related to
> the issue I'm referring to.
> 
> The problem is pci_alloc_irq_vectors_affinity() takes a range, min_vecs
> and max_vecs, but a range of allowable vector allocations doesn't make
> sense when using sets.

I feel like we're going in circles here, not sure what you feel the
issue is now? The range is fine, whoever uses sets will need to adjust
their sets based on what pci_alloc_irq_vectors_affinity() returns,
if it didn't return the passed in desired max.

-- 
Jens Axboe



* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Keith Busch @ 2018-10-30 15:08 UTC
  To: Jens Axboe; +Cc: linux-block, linux-scsi, linux-kernel, Thomas Gleixner

On Tue, Oct 30, 2018 at 08:53:37AM -0600, Jens Axboe wrote:
> The sum of the sets can't exceed the nvecs passed in; the nvecs passed in
> should be greater than or equal to that sum. Granted this isn't enforced,
> and perhaps that should be the case.

That should at least initially be true for a proper functioning
driver. It's not enforced as you mentioned, but that's only related to
the issue I'm referring to.

The problem is pci_alloc_irq_vectors_affinity() takes a range, min_vecs
and max_vecs, but a range of allowable vector allocations doesn't make
sense when using sets.


* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Jens Axboe @ 2018-10-30 14:53 UTC
  To: Keith Busch; +Cc: linux-block, linux-scsi, linux-kernel, Thomas Gleixner

On 10/30/18 8:45 AM, Keith Busch wrote:
> On Tue, Oct 30, 2018 at 08:36:35AM -0600, Jens Axboe wrote:
>> On 10/30/18 8:26 AM, Keith Busch wrote:
>>> On Mon, Oct 29, 2018 at 10:37:35AM -0600, Jens Axboe wrote:
>>>> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
>>>> index f4f29b9d90ee..2046a0f0f0f1 100644
>>>> --- a/kernel/irq/affinity.c
>>>> +++ b/kernel/irq/affinity.c
>>>> @@ -180,6 +180,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>>>>  	int curvec, usedvecs;
>>>>  	cpumask_var_t nmsk, npresmsk, *node_to_cpumask;
>>>>  	struct cpumask *masks = NULL;
>>>> +	int i, nr_sets;
>>>>  
>>>>  	/*
>>>>  	 * If there aren't any vectors left after applying the pre/post
>>>> @@ -210,10 +211,23 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>>>>  	get_online_cpus();
>>>>  	build_node_to_cpumask(node_to_cpumask);
>>>>  
>>>> -	/* Spread on present CPUs starting from affd->pre_vectors */
>>>> -	usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
>>>> -					    node_to_cpumask, cpu_present_mask,
>>>> -					    nmsk, masks);
>>>> +	/*
>>>> +	 * Spread on present CPUs starting from affd->pre_vectors. If we
>>>> +	 * have multiple sets, build each set's affinity mask separately.
>>>> +	 */
>>>> +	nr_sets = affd->nr_sets;
>>>> +	if (!nr_sets)
>>>> +		nr_sets = 1;
>>>> +
>>>> +	for (i = 0, usedvecs = 0; i < nr_sets; i++) {
>>>> +		int this_vecs = affd->sets ? affd->sets[i] : affvecs;
>>>> +		int nr;
>>>> +
>>>> +		nr = irq_build_affinity_masks(affd, curvec, this_vecs,
>>>> +					      node_to_cpumask, cpu_present_mask,
>>>> +					      nmsk, masks + usedvecs);
>>>> +		usedvecs += nr;
>>>> +	}
>>>
>>>
>>> While the code below returns the appropriate number of possible vectors
>>> when a set requested too many, the above code is still using the value
>>> from the set, which may exceed 'nvecs' used to kcalloc 'masks', so
>>> 'masks + usedvecs' may go out of bounds.
>>
>> How so? nvecs must be the max number of vecs, the sum of the sets can't
>> exceed that value.
> 
> 'nvecs' is what irq_calc_affinity_vectors() returns, which is the smaller
> of the requested max and the sum of the sets, and the sum of the sets
> isn't guaranteed to be the smaller value.

The sum of the sets can't exceed the nvecs passed in; the nvecs passed in
should be greater than or equal to that sum. Granted this isn't enforced,
and perhaps that should be the case.

-- 
Jens Axboe



* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Keith Busch @ 2018-10-30 14:45 UTC
  To: Jens Axboe; +Cc: linux-block, linux-scsi, linux-kernel, Thomas Gleixner

On Tue, Oct 30, 2018 at 08:36:35AM -0600, Jens Axboe wrote:
> On 10/30/18 8:26 AM, Keith Busch wrote:
> > On Mon, Oct 29, 2018 at 10:37:35AM -0600, Jens Axboe wrote:
> >> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> >> index f4f29b9d90ee..2046a0f0f0f1 100644
> >> --- a/kernel/irq/affinity.c
> >> +++ b/kernel/irq/affinity.c
> >> @@ -180,6 +180,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
> >>  	int curvec, usedvecs;
> >>  	cpumask_var_t nmsk, npresmsk, *node_to_cpumask;
> >>  	struct cpumask *masks = NULL;
> >> +	int i, nr_sets;
> >>  
> >>  	/*
> >>  	 * If there aren't any vectors left after applying the pre/post
> >> @@ -210,10 +211,23 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
> >>  	get_online_cpus();
> >>  	build_node_to_cpumask(node_to_cpumask);
> >>  
> >> -	/* Spread on present CPUs starting from affd->pre_vectors */
> >> -	usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
> >> -					    node_to_cpumask, cpu_present_mask,
> >> -					    nmsk, masks);
> >> +	/*
> >> +	 * Spread on present CPUs starting from affd->pre_vectors. If we
> >> +	 * have multiple sets, build each set's affinity mask separately.
> >> +	 */
> >> +	nr_sets = affd->nr_sets;
> >> +	if (!nr_sets)
> >> +		nr_sets = 1;
> >> +
> >> +	for (i = 0, usedvecs = 0; i < nr_sets; i++) {
> >> +		int this_vecs = affd->sets ? affd->sets[i] : affvecs;
> >> +		int nr;
> >> +
> >> +		nr = irq_build_affinity_masks(affd, curvec, this_vecs,
> >> +					      node_to_cpumask, cpu_present_mask,
> >> +					      nmsk, masks + usedvecs);
> >> +		usedvecs += nr;
> >> +	}
> > 
> > 
> > While the code below returns the appropriate number of possible vectors
> > when a set requested too many, the above code is still using the value
> > from the set, which may exceed 'nvecs' used to kcalloc 'masks', so
> > 'masks + usedvecs' may go out of bounds.
> 
> How so? nvecs must be the max number of vecs, the sum of the sets can't
> exceed that value.

'nvecs' is what irq_calc_affinity_vectors() returns, which is the smaller
of the requested max and the sum of the sets, and the sum of the sets
isn't guaranteed to be the smaller value.


* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Jens Axboe @ 2018-10-30 14:36 UTC
  To: Keith Busch; +Cc: linux-block, linux-scsi, linux-kernel, Thomas Gleixner

On 10/30/18 8:26 AM, Keith Busch wrote:
> On Mon, Oct 29, 2018 at 10:37:35AM -0600, Jens Axboe wrote:
>> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
>> index f4f29b9d90ee..2046a0f0f0f1 100644
>> --- a/kernel/irq/affinity.c
>> +++ b/kernel/irq/affinity.c
>> @@ -180,6 +180,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>>  	int curvec, usedvecs;
>>  	cpumask_var_t nmsk, npresmsk, *node_to_cpumask;
>>  	struct cpumask *masks = NULL;
>> +	int i, nr_sets;
>>  
>>  	/*
>>  	 * If there aren't any vectors left after applying the pre/post
>> @@ -210,10 +211,23 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>>  	get_online_cpus();
>>  	build_node_to_cpumask(node_to_cpumask);
>>  
>> -	/* Spread on present CPUs starting from affd->pre_vectors */
>> -	usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
>> -					    node_to_cpumask, cpu_present_mask,
>> -					    nmsk, masks);
>> +	/*
>> +	 * Spread on present CPUs starting from affd->pre_vectors. If we
>> +	 * have multiple sets, build each set's affinity mask separately.
>> +	 */
>> +	nr_sets = affd->nr_sets;
>> +	if (!nr_sets)
>> +		nr_sets = 1;
>> +
>> +	for (i = 0, usedvecs = 0; i < nr_sets; i++) {
>> +		int this_vecs = affd->sets ? affd->sets[i] : affvecs;
>> +		int nr;
>> +
>> +		nr = irq_build_affinity_masks(affd, curvec, this_vecs,
>> +					      node_to_cpumask, cpu_present_mask,
>> +					      nmsk, masks + usedvecs);
>> +		usedvecs += nr;
>> +	}
> 
> 
> While the code below returns the appropriate number of possible vectors
> when a set requested too many, the above code is still using the value
> from the set, which may exceed 'nvecs' used to kcalloc 'masks', so
> 'masks + usedvecs' may go out of bounds.

How so? nvecs must be the max number of vecs, the sum of the sets can't
exceed that value.

-- 
Jens Axboe



* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Keith Busch @ 2018-10-30 14:26 UTC
  To: Jens Axboe; +Cc: linux-block, linux-scsi, linux-kernel, Thomas Gleixner

On Mon, Oct 29, 2018 at 10:37:35AM -0600, Jens Axboe wrote:
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index f4f29b9d90ee..2046a0f0f0f1 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -180,6 +180,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	int curvec, usedvecs;
>  	cpumask_var_t nmsk, npresmsk, *node_to_cpumask;
>  	struct cpumask *masks = NULL;
> +	int i, nr_sets;
>  
>  	/*
>  	 * If there aren't any vectors left after applying the pre/post
> @@ -210,10 +211,23 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	get_online_cpus();
>  	build_node_to_cpumask(node_to_cpumask);
>  
> -	/* Spread on present CPUs starting from affd->pre_vectors */
> -	usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
> -					    node_to_cpumask, cpu_present_mask,
> -					    nmsk, masks);
> +	/*
> +	 * Spread on present CPUs starting from affd->pre_vectors. If we
> +	 * have multiple sets, build each sets affinity mask separately.
> +	 */
> +	nr_sets = affd->nr_sets;
> +	if (!nr_sets)
> +		nr_sets = 1;
> +
> +	for (i = 0, usedvecs = 0; i < nr_sets; i++) {
> +		int this_vecs = affd->sets ? affd->sets[i] : affvecs;
> +		int nr;
> +
> +		nr = irq_build_affinity_masks(affd, curvec, this_vecs,
> +					      node_to_cpumask, cpu_present_mask,
> +					      nmsk, masks + usedvecs);
> +		usedvecs += nr;
> +	}


While the code below returns the appropriate number of possible vectors
when a set requested too many, the above code is still using the value
from the set, which may exceed 'nvecs' used to kcalloc 'masks', so
'masks + usedvecs' may go out of bounds.

  
>  	/*
>  	 * Spread on non present CPUs starting from the next vector to be
> @@ -258,13 +272,21 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
>  {
>  	int resv = affd->pre_vectors + affd->post_vectors;
>  	int vecs = maxvec - resv;
> -	int ret;
> +	int set_vecs;
>  
>  	if (resv > minvec)
>  		return 0;
>  
> -	get_online_cpus();
> -	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
> -	put_online_cpus();
> -	return ret;
> +	if (affd->nr_sets) {
> +		int i;
> +
> +		for (i = 0, set_vecs = 0;  i < affd->nr_sets; i++)
> +			set_vecs += affd->sets[i];
> +	} else {
> +		get_online_cpus();
> +		set_vecs = cpumask_weight(cpu_possible_mask);
> +		put_online_cpus();
> +	}
> +
> +	return resv + min(set_vecs, vecs);
>  }
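
To make the potential overrun concrete (hypothetical numbers): suppose a
buggy caller passes nvecs = 8 with no pre/post vectors but sets = { 8, 8 }.
'masks' is kcalloc'd with 8 entries, yet the spreading loop above walks
the sets and hands irq_build_affinity_masks() 16 vectors in total, writing
through 'masks + usedvecs' well past the end of the allocation.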




* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
From: Ming Lei @ 2018-10-30  9:25 UTC
  To: Jens Axboe; +Cc: linux-block, linux-scsi, linux-kernel, Thomas Gleixner

On Mon, Oct 29, 2018 at 10:37:35AM -0600, Jens Axboe wrote:
> A driver may have a need to allocate multiple sets of MSI/MSI-X
> interrupts, and have them appropriately affinitized. Add support for
> defining a number of sets in the irq_affinity structure, of varying
> sizes, and get each set affinitized correctly across the machine.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Hannes Reinecke <hare@suse.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  include/linux/interrupt.h |  4 ++++
>  kernel/irq/affinity.c     | 40 ++++++++++++++++++++++++++++++---------
>  2 files changed, 35 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 1d6711c28271..ca397ff40836 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -247,10 +247,14 @@ struct irq_affinity_notify {
>   *			the MSI(-X) vector space
>   * @post_vectors:	Don't apply affinity to @post_vectors at end of
>   *			the MSI(-X) vector space
> + * @nr_sets:		Length of passed in *sets array
>   * @sets:		Sizes of the affinitized sets
>   */
>  struct irq_affinity {
>  	int	pre_vectors;
>  	int	post_vectors;
> +	int	nr_sets;
> +	int	*sets;
>  };
>  
>  #if defined(CONFIG_SMP)
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index f4f29b9d90ee..2046a0f0f0f1 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -180,6 +180,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	int curvec, usedvecs;
>  	cpumask_var_t nmsk, npresmsk, *node_to_cpumask;
>  	struct cpumask *masks = NULL;
> +	int i, nr_sets;
>  
>  	/*
>  	 * If there aren't any vectors left after applying the pre/post
> @@ -210,10 +211,23 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
>  	get_online_cpus();
>  	build_node_to_cpumask(node_to_cpumask);
>  
> -	/* Spread on present CPUs starting from affd->pre_vectors */
> -	usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
> -					    node_to_cpumask, cpu_present_mask,
> -					    nmsk, masks);
> +	/*
> +	 * Spread on present CPUs starting from affd->pre_vectors. If we
> +	 * have multiple sets, build each set's affinity mask separately.
> +	 */
> +	nr_sets = affd->nr_sets;
> +	if (!nr_sets)
> +		nr_sets = 1;
> +
> +	for (i = 0, usedvecs = 0; i < nr_sets; i++) {
> +		int this_vecs = affd->sets ? affd->sets[i] : affvecs;
> +		int nr;
> +
> +		nr = irq_build_affinity_masks(affd, curvec, this_vecs,
> +					      node_to_cpumask, cpu_present_mask,
> +					      nmsk, masks + usedvecs);
> +		usedvecs += nr;
> +	}
>  
>  	/*
>  	 * Spread on non present CPUs starting from the next vector to be
> @@ -258,13 +272,21 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
>  {
>  	int resv = affd->pre_vectors + affd->post_vectors;
>  	int vecs = maxvec - resv;
> -	int ret;
> +	int set_vecs;
>  
>  	if (resv > minvec)
>  		return 0;
>  
> -	get_online_cpus();
> -	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
> -	put_online_cpus();
> -	return ret;
> +	if (affd->nr_sets) {
> +		int i;
> +
> +	for (i = 0, set_vecs = 0; i < affd->nr_sets; i++)
> +			set_vecs += affd->sets[i];
> +	} else {
> +		get_online_cpus();
> +		set_vecs = cpumask_weight(cpu_possible_mask);
> +		put_online_cpus();
> +	}
> +
> +	return resv + min(set_vecs, vecs);
>  }
> -- 
> 2.17.1
> 

Looks fine:

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming

^ permalink raw reply	[flat|nested] 25+ messages in thread
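
For context, a driver consuming this interface would describe its
vector sets roughly as below. This is a minimal sketch, not code from
the patch set: the 16/8 set split and the vector counts are made up
for illustration, and pci_alloc_irq_vectors_affinity() is the existing
PCI entry point that hands the irq_affinity descriptor down to
irq_calc_affinity_vectors() and irq_create_affinity_masks().

	/* Hypothetical driver: one non-spread admin vector plus two sets. */
	int sets[2] = { 16, 8 };	/* e.g. read queues and write queues */
	struct irq_affinity affd = {
		.pre_vectors	= 1,	/* admin vector keeps default affinity */
		.nr_sets	= ARRAY_SIZE(sets),
		.sets		= sets,
	};
	int nvecs;

	/* maxvec = 1 pre vector + 16 + 8 set vectors = 25 */
	nvecs = pci_alloc_irq_vectors_affinity(pdev, 2, 25,
					       PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					       &affd);
	if (nvecs < 0)
		return nvecs;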

* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
  2018-10-29 17:08   ` Thomas Gleixner
@ 2018-10-29 17:09     ` Jens Axboe
  0 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2018-10-29 17:09 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-block, linux-scsi, linux-kernel

On 10/29/18 11:08 AM, Thomas Gleixner wrote:
> Jens,
> 
> On Mon, 29 Oct 2018, Jens Axboe wrote:
> 
>> A driver may need to allocate multiple sets of MSI/MSI-X interrupts,
>> and have them appropriately affinitized. Add support for defining a
>> number of sets of varying sizes in the irq_affinity structure, and
>> get each set affinitized correctly across the machine.
>>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: linux-kernel@vger.kernel.org
>> Reviewed-by: Hannes Reinecke <hare@suse.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> This looks good.
> 
> Vs. merge logistics: I'm expecting some other changes in that area as per
> discussion with megasas (IIRC) folks. So I'd like to apply that myself
> right after -rc1 and provide it to you as a single commit to pull from so
> we can avoid collisions in next and the merge window.

That sounds fine, thanks Thomas!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
  2018-10-29 16:37 ` [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs Jens Axboe
@ 2018-10-29 17:08   ` Thomas Gleixner
  2018-10-29 17:09     ` Jens Axboe
  2018-10-30  9:25   ` Ming Lei
  2018-10-30 14:26   ` Keith Busch
  2 siblings, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2018-10-29 17:08 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-scsi, linux-kernel

Jens,

On Mon, 29 Oct 2018, Jens Axboe wrote:

> A driver may need to allocate multiple sets of MSI/MSI-X interrupts,
> and have them appropriately affinitized. Add support for defining a
> number of sets of varying sizes in the irq_affinity structure, and
> get each set affinitized correctly across the machine.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Hannes Reinecke <hare@suse.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

This looks good.

Vs. merge logistics: I'm expecting some other changes in that area as per
discussion with megasas (IIRC) folks. So I'd like to apply that myself
right after -rc1 and provide it to you as a single commit to pull from so
we can avoid collisions in next and the merge window.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs
  2018-10-29 16:37 [PATCHSET v2 0/14] blk-mq: Add support for multiple queue maps Jens Axboe
@ 2018-10-29 16:37 ` Jens Axboe
  2018-10-29 17:08   ` Thomas Gleixner
                     ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Jens Axboe @ 2018-10-29 16:37 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-kernel; +Cc: Jens Axboe, Thomas Gleixner

A driver may need to allocate multiple sets of MSI/MSI-X interrupts,
and have them appropriately affinitized. Add support for defining a
number of sets of varying sizes in the irq_affinity structure, and
get each set affinitized correctly across the machine.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/interrupt.h |  4 ++++
 kernel/irq/affinity.c     | 40 ++++++++++++++++++++++++++++++---------
 2 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 1d6711c28271..ca397ff40836 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -247,10 +247,14 @@ struct irq_affinity_notify {
  *			the MSI(-X) vector space
  * @post_vectors:	Don't apply affinity to @post_vectors at end of
  *			the MSI(-X) vector space
+ * @nr_sets:		Length of passed in *sets array
+ * @sets:		Array holding the size of each set
  */
 struct irq_affinity {
 	int	pre_vectors;
 	int	post_vectors;
+	int	nr_sets;
+	int	*sets;
 };
 
 #if defined(CONFIG_SMP)
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f4f29b9d90ee..2046a0f0f0f1 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -180,6 +180,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	int curvec, usedvecs;
 	cpumask_var_t nmsk, npresmsk, *node_to_cpumask;
 	struct cpumask *masks = NULL;
+	int i, nr_sets;
 
 	/*
 	 * If there aren't any vectors left after applying the pre/post
@@ -210,10 +211,23 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	get_online_cpus();
 	build_node_to_cpumask(node_to_cpumask);
 
-	/* Spread on present CPUs starting from affd->pre_vectors */
-	usedvecs = irq_build_affinity_masks(affd, curvec, affvecs,
-					    node_to_cpumask, cpu_present_mask,
-					    nmsk, masks);
+	/*
+	 * Spread on present CPUs starting from affd->pre_vectors. If we
+	 * have multiple sets, build each set's affinity mask separately.
+	 */
+	nr_sets = affd->nr_sets;
+	if (!nr_sets)
+		nr_sets = 1;
+
+	for (i = 0, usedvecs = 0; i < nr_sets; i++) {
+		int this_vecs = affd->sets ? affd->sets[i] : affvecs;
+		int nr;
+
+		nr = irq_build_affinity_masks(affd, curvec, this_vecs,
+					      node_to_cpumask, cpu_present_mask,
+					      nmsk, masks + usedvecs);
+		usedvecs += nr;
+	}
 
 	/*
 	 * Spread on non present CPUs starting from the next vector to be
@@ -258,13 +272,21 @@ int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity
 {
 	int resv = affd->pre_vectors + affd->post_vectors;
 	int vecs = maxvec - resv;
-	int ret;
+	int set_vecs;
 
 	if (resv > minvec)
 		return 0;
 
-	get_online_cpus();
-	ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
-	put_online_cpus();
-	return ret;
+	if (affd->nr_sets) {
+		int i;
+
+	for (i = 0, set_vecs = 0; i < affd->nr_sets; i++)
+			set_vecs += affd->sets[i];
+	} else {
+		get_online_cpus();
+		set_vecs = cpumask_weight(cpu_possible_mask);
+		put_online_cpus();
+	}
+
+	return resv + min(set_vecs, vecs);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread
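
As a reading aid for the loop added to irq_create_affinity_masks()
above: each set is handed to irq_build_affinity_masks() on its own, so
every set is spread across the present CPUs independently rather than
all sets being spread jointly. With the hypothetical descriptor from
the sketch earlier (pre_vectors = 1, sets = { 16, 8 }, post_vectors = 0),
the MSI-X vector space ends up laid out as:

	vector   0       pre_vectors, default affinity (not spread)
	vectors  1 - 16  set 0, spread across present CPUs
	vectors 17 - 24  set 1, spread across present CPUs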

end of thread, other threads:[~2018-10-30 17:47 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20181025211626.12692-1-axboe@kernel.dk>
2018-10-25 21:16 ` [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs Jens Axboe
2018-10-25 21:52   ` Keith Busch
2018-10-25 23:07     ` Jens Axboe
2018-10-29  7:43   ` Hannes Reinecke
2018-10-29 16:37 [PATCHSET v2 0/14] blk-mq: Add support for multiple queue maps Jens Axboe
2018-10-29 16:37 ` [PATCH 11/14] irq: add support for allocating (and affinitizing) sets of IRQs Jens Axboe
2018-10-29 17:08   ` Thomas Gleixner
2018-10-29 17:09     ` Jens Axboe
2018-10-30  9:25   ` Ming Lei
2018-10-30 14:26   ` Keith Busch
2018-10-30 14:36     ` Jens Axboe
2018-10-30 14:45       ` Keith Busch
2018-10-30 14:53         ` Jens Axboe
2018-10-30 15:08           ` Keith Busch
2018-10-30 15:18             ` Jens Axboe
2018-10-30 16:02               ` Keith Busch
2018-10-30 16:42                 ` Jens Axboe
2018-10-30 17:09                   ` Jens Axboe
2018-10-30 17:22                     ` Keith Busch
2018-10-30 17:33                       ` Jens Axboe
2018-10-30 17:35                         ` Keith Busch
2018-10-30 17:25                   ` Thomas Gleixner
2018-10-30 17:34                     ` Jens Axboe
2018-10-30 17:43                       ` Jens Axboe
2018-10-30 17:46                       ` Thomas Gleixner
2018-10-30 17:47                         ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).