linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH] irq/affinity: Mark the pre/post vectors as regular interrupts
@ 2018-09-13  3:10 Dou Liyang
  2018-09-13 11:00 ` Thomas Gleixner
  0 siblings, 1 reply; 4+ messages in thread
From: Dou Liyang @ 2018-09-13  3:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, kashyap.desai, shivasharan.srikanteshwara, sumit.saxena,
	ming.lei, hch, douly.fnst

From: Dou Liyang <douly.fnst@cn.fujitsu.com>

As Kashyap and Sumit reported, in the MSI/MSI-X subsystem the pre/post
vectors may be used for some extra reply queues to improve performance.
The best way to map these pre/post vectors is to map them to the local
NUMA node.

But current Linux can't do that, because:

  The pre and post vectors are marked managed and their affinity mask
  is set to the irq default affinity mask.

  The default affinity mask is by default ALL cpus, but it can be tweaked
  both on the kernel command line and via proc.

  If that mask is only a subset of CPUs and all of them go offline
  then these vectors are shutdown in managed mode.

So, clear these affinity masks and check them in alloc_descs() to leave
the pre/post vectors as regular interrupts, which can be affinity
controlled and can also move freely on hotplug.

Note: this will break the validation of the affinity mask(s).

Reported-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reported-by: Sumit Saxena <sumit.saxena@broadcom.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 kernel/irq/affinity.c |  9 ++++++---
 kernel/irq/irqdesc.c  | 24 ++++++++++--------------
 2 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f4f29b9d90ee..ba35a5050dd2 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -204,7 +204,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 
 	/* Fill out vectors at the beginning that don't need affinity */
 	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
-		cpumask_copy(masks + curvec, irq_default_affinity);
+		cpumask_clear(masks + curvec);
 
 	/* Stabilize the cpumasks */
 	get_online_cpus();
@@ -234,10 +234,13 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	/* Fill out vectors at the end that don't need affinity */
 	if (usedvecs >= affvecs)
 		curvec = affd->pre_vectors + affvecs;
-	else
+	else {
 		curvec = affd->pre_vectors + usedvecs;
+		for (; curvec < affd->pre_vectors + affvecs; curvec++)
+			cpumask_copy(masks + curvec, irq_default_affinity);
+	}
 	for (; curvec < nvecs; curvec++)
-		cpumask_copy(masks + curvec, irq_default_affinity);
+		cpumask_clear(masks + curvec);
 
 outnodemsk:
 	free_node_to_cpumask(node_to_cpumask);
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 578d0e5f1b5b..5cffa791a20b 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -453,24 +453,20 @@ static int alloc_descs(unsigned int start, unsigned int cnt, int node,
 {
 	const struct cpumask *mask = NULL;
 	struct irq_desc *desc;
-	unsigned int flags;
+	unsigned int flags = 0;
 	int i;
 
-	/* Validate affinity mask(s) */
-	if (affinity) {
-		for (i = 0, mask = affinity; i < cnt; i++, mask++) {
-			if (cpumask_empty(mask))
-				return -EINVAL;
-		}
-	}
-
-	flags = affinity ? IRQD_AFFINITY_MANAGED | IRQD_MANAGED_SHUTDOWN : 0;
-	mask = NULL;
-
 	for (i = 0; i < cnt; i++) {
 		if (affinity) {
-			node = cpu_to_node(cpumask_first(affinity));
-			mask = affinity;
+			if (cpumask_empty(affinity)) {
+				flags = 0;
+				mask = NULL;
+			} else {
+				flags = IRQD_AFFINITY_MANAGED |
+					IRQD_MANAGED_SHUTDOWN;
+				mask = affinity;
+				node = cpu_to_node(cpumask_first(affinity));
+			}
 			affinity++;
 		}
 		desc = alloc_desc(start + i, node, flags, mask, owner);
-- 
2.14.3



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] irq/affinity: Mark the pre/post vectors as regular interrupts
  2018-09-13  3:10 [RFC PATCH] irq/affinity: Mark the pre/post vectors as regular interrupts Dou Liyang
@ 2018-09-13 11:00 ` Thomas Gleixner
  2018-09-20  8:39   ` Kashyap Desai
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Gleixner @ 2018-09-13 11:00 UTC (permalink / raw)
  To: Dou Liyang
  Cc: LKML, kashyap.desai, shivasharan.srikanteshwara, sumit.saxena,
	ming.lei, Christoph Hellwig, douly.fnst, Bjorn Helgaas

On Thu, 13 Sep 2018, Dou Liyang wrote:
> So, clear these affinity mask and check it in alloc_desc() to leave them
> as regular interrupts which can be affinity controlled and also can move
> freely on hotplug.

This is the wrong direction as it does not allow initial affinity
assignment for the non-managed interrupts at allocation time. And that's
what Kashyap and Sumit are looking for.

The trivial fix for the possible breakage when irq_default_affinity !=
cpu_possible_mask is to set the affinity for the pre/post vectors to
cpu_possible_mask and be done with it.

One other thing I noticed while staring at that is the fact that the PCI
code does not care about the return value of irq_create_affinity_masks() at
all. It just proceeds if masks is NULL, as if the stuff was requested with
the affinity descriptor pointer being NULL. I don't think that's a
brilliant idea because the drivers might rely on the interrupt being
managed, but it might be a non-issue and just result in bad locality.

Christoph?

So back to the problem at hand. We need to change the affinity management
scheme in a way which lets us differentiate between managed and unmanaged
interrupts, so that (initial) affinity can be assigned automatically to
all of them.

Right now everything is bound to the cpumasks array, which does not allow
conveying more information. So we need to come up with something different.

Something like the below (does not compile and is just for reference)
should do the trick. I'm not sure whether we want to convey the information
(masks and bitmap) in a single data structure or not, but that's an
implementation detail.

Thanks,

	tglx

8<---------
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -535,15 +535,16 @@ static struct msi_desc *
 msi_setup_entry(struct pci_dev *dev, int nvec, const struct irq_affinity *affd)
 {
 	struct cpumask *masks = NULL;
+	unsigned long managed = 0;
 	struct msi_desc *entry;
 	u16 control;
 
 	if (affd)
-		masks = irq_create_affinity_masks(nvec, affd);
+		masks = irq_create_affinity_masks(nvec, affd, &managed);
 
 
 	/* MSI Entry Initialization */
-	entry = alloc_msi_entry(&dev->dev, nvec, masks);
+	entry = alloc_msi_entry(&dev->dev, nvec, masks, managed);
 	if (!entry)
 		goto out;
 
@@ -672,15 +673,22 @@ static int msix_setup_entries(struct pci
 			      struct msix_entry *entries, int nvec,
 			      const struct irq_affinity *affd)
 {
+	/*
+	 * MSIX_MAX_VECTORS = 2048, i.e. 256 bytes. Might need runtime
+	 * allocation. OTOH, are 2048 vectors realistic?
+	 */
+	DECLARE_BITMAP(managed, MSIX_MAX_VECTORS);
 	struct cpumask *curmsk, *masks = NULL;
 	struct msi_desc *entry;
 	int ret, i;
 
 	if (affd)
-		masks = irq_create_affinity_masks(nvec, affd);
+		masks = irq_create_affinity_masks(nvec, affd, managed);
 
 	for (i = 0, curmsk = masks; i < nvec; i++) {
-		entry = alloc_msi_entry(&dev->dev, 1, curmsk);
+		unsigned long m = test_bit(i, managed) ? 1 : 0;
+
+		entry = alloc_msi_entry(&dev->dev, 1, curmsk, m);
 		if (!entry) {
 			if (!i)
 				iounmap(base);
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -27,7 +27,8 @@
  * and the affinity masks from @affinity are copied.
  */
 struct msi_desc *
-alloc_msi_entry(struct device *dev, int nvec, const struct cpumask *affinity)
+alloc_msi_entry(struct device *dev, int nvec, const struct cpumask *affinity,
+		unsigned long managed)
 {
 	struct msi_desc *desc;
 
@@ -38,6 +39,7 @@ alloc_msi_entry(struct device *dev, int
 	INIT_LIST_HEAD(&desc->list);
 	desc->dev = dev;
 	desc->nvec_used = nvec;
+	desc->managed = managed;
 	if (affinity) {
 		desc->affinity = kmemdup(affinity,
 			nvec * sizeof(*desc->affinity), GFP_KERNEL);
@@ -416,7 +418,7 @@ int msi_domain_alloc_irqs(struct irq_dom
 
 		virq = __irq_domain_alloc_irqs(domain, -1, desc->nvec_used,
 					       dev_to_node(dev), &arg, false,
-					       desc->affinity);
+					       desc->affinity, desc->managed);
 		if (virq < 0) {
 			ret = -ENOSPC;
 			if (ops->handle_error)


* RE: [RFC PATCH] irq/affinity: Mark the pre/post vectors as regular interrupts
  2018-09-13 11:00 ` Thomas Gleixner
@ 2018-09-20  8:39   ` Kashyap Desai
  2018-09-20 12:30     ` Dou Liyang
  0 siblings, 1 reply; 4+ messages in thread
From: Kashyap Desai @ 2018-09-20  8:39 UTC (permalink / raw)
  To: Thomas Gleixner, Dou Liyang
  Cc: LKML, Shivasharan Srikanteshwara, Sumit Saxena, ming.lei,
	Christoph Hellwig, douly.fnst, Bjorn Helgaas

> This is the wrong direction as it does not allow initial affinity
> assignment for the non-managed interrupts at allocation time. And that's
> what Kashyap and Sumit are looking for.
>
> The trivial fix for the possible breakage when irq_default_affinity !=
> cpu_possible_mask is to set the affinity for the pre/post vectors to
> cpu_possible_mask and be done with it.
>
> One other thing I noticed while staring at that is the fact that the PCI
> code does not care about the return value of irq_create_affinity_masks() at
> all. It just proceeds if masks is NULL, as if the stuff was requested with
> the affinity descriptor pointer being NULL. I don't think that's a
> brilliant idea because the drivers might rely on the interrupt being
> managed, but it might be a non-issue and just result in bad locality.
>
> Christoph?
>
> So back to the problem at hand. We need to change the affinity management
> scheme in a way which lets us differentiate between managed and unmanaged
> interrupts, so that (initial) affinity can be assigned automatically to
> all of them.
>
> Right now everything is bound to the cpumasks array, which does not allow
> conveying more information. So we need to come up with something different.
>
> Something like the below (does not compile and is just for reference)
> should do the trick. I'm not sure whether we want to convey the
> information (masks and bitmap) in a single data structure or not, but
> that's an implementation detail.


Dou - Do you want me to test your patch, or shall I wait for the next
version?



* Re: [RFC PATCH] irq/affinity: Mark the pre/post vectors as regular interrupts
  2018-09-20  8:39   ` Kashyap Desai
@ 2018-09-20 12:30     ` Dou Liyang
  0 siblings, 0 replies; 4+ messages in thread
From: Dou Liyang @ 2018-09-20 12:30 UTC (permalink / raw)
  To: Kashyap Desai, Thomas Gleixner
  Cc: LKML, Shivasharan Srikanteshwara, Sumit Saxena, ming.lei,
	Christoph Hellwig, douly.fnst, Bjorn Helgaas

Hi Kashyap,

On 2018/9/20 16:39, Kashyap Desai wrote:
> Dou - Do you want me to test your patch, or shall I wait for the next
> version?

I'm sorry, please wait for the next version.

Thanks,
	dou


