linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] net: mvpp2: Survive CPU hotplug events
@ 2022-02-16  9:08 Marc Zyngier
  2022-02-16  9:08 ` [PATCH 1/2] genirq: Extract irq_set_affinity_masks() from devm_platform_get_irqs_affinity() Marc Zyngier
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Marc Zyngier @ 2022-02-16  9:08 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, Marcin Wojtas, Russell King, David S. Miller,
	Jakub Kicinski, Thomas Gleixner, John Garry, kernel-team

I recently realised that playing with CPU hotplug on a system equipped
with a set of MVPP2 devices (Marvell 8040) was fraught with danger and
would result in a rapid lockup or panic.

As it turns out, the per-CPU nature of the MVPP2 interrupts is
getting in the way. A good solution for this seems to rely on the
kernel's managed interrupt approach, where the core kernel will not
move interrupts around as the CPUs go down, but will simply disable
the corresponding interrupt.

Converting the driver to this requires a bit of refactoring in the IRQ
subsystem to expose the required primitive, as well as a bit of
surgery in the driver itself.

Note that although the system now survives such events, the driver
seems to assume that all queues are always active and doesn't inform
the device that a CPU has gone away. Someone who actually understands
this driver should have a look at it.

Patches on top of 5.17-rc3, lightly tested on a McBin.

Marc Zyngier (2):
  genirq: Extract irq_set_affinity_masks() from
    devm_platform_get_irqs_affinity()
  net: mvpp2: Convert to managed interrupts to fix CPU HP issues

 drivers/base/platform.c                       | 20 +-----
 drivers/net/ethernet/marvell/mvpp2/mvpp2.h    |  1 -
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 67 ++++++++++---------
 include/linux/interrupt.h                     |  8 +++
 kernel/irq/affinity.c                         | 27 ++++++++
 5 files changed, 72 insertions(+), 51 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] genirq: Extract irq_set_affinity_masks() from devm_platform_get_irqs_affinity()
  2022-02-16  9:08 [PATCH 0/2] net: mvpp2: Survive CPU hotplug events Marc Zyngier
@ 2022-02-16  9:08 ` Marc Zyngier
  2022-02-16 10:56   ` Greg Kroah-Hartman
  2022-02-17 17:07   ` John Garry
  2022-02-16  9:08 ` [PATCH 2/2] net: mvpp2: Convert to managed interrupts to fix CPU HP issues Marc Zyngier
  2022-02-16 13:19 ` [PATCH 0/2] net: mvpp2: Survive CPU hotplug events Marcin Wojtas
  2 siblings, 2 replies; 12+ messages in thread
From: Marc Zyngier @ 2022-02-16  9:08 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, Marcin Wojtas, Russell King, David S. Miller,
	Jakub Kicinski, Thomas Gleixner, John Garry, kernel-team

In order to better support drivers that deal with interrupts in a more
"hands-on" way, extract the core of devm_platform_get_irqs_affinity()
and expose it as irq_set_affinity_masks().

This helper allows a driver to provide a set of wired interrupts that
are to be configured as managed interrupts. As with the original helper,
this is exported as EXPORT_SYMBOL_GPL.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/base/platform.c   | 20 +++-----------------
 include/linux/interrupt.h |  8 ++++++++
 kernel/irq/affinity.c     | 27 +++++++++++++++++++++++++++
 3 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 6cb04ac48bf0..b363cf6ce5be 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -335,7 +335,6 @@ int devm_platform_get_irqs_affinity(struct platform_device *dev,
 				    int **irqs)
 {
 	struct irq_affinity_devres *ptr;
-	struct irq_affinity_desc *desc;
 	size_t size;
 	int i, ret, nvec;
 
@@ -376,31 +375,18 @@ int devm_platform_get_irqs_affinity(struct platform_device *dev,
 		ptr->irq[i] = irq;
 	}
 
-	desc = irq_create_affinity_masks(nvec, affd);
-	if (!desc) {
-		ret = -ENOMEM;
+	ret = irq_set_affinity_masks(affd, ptr->irq, nvec);
+	if (ret) {
+		dev_err(&dev->dev, "failed to update affinity descriptors (%d)\n", ret);
 		goto err_free_devres;
 	}
 
-	for (i = 0; i < nvec; i++) {
-		ret = irq_update_affinity_desc(ptr->irq[i], &desc[i]);
-		if (ret) {
-			dev_err(&dev->dev, "failed to update irq%d affinity descriptor (%d)\n",
-				ptr->irq[i], ret);
-			goto err_free_desc;
-		}
-	}
-
 	devres_add(&dev->dev, ptr);
 
-	kfree(desc);
-
 	*irqs = ptr->irq;
 
 	return nvec;
 
-err_free_desc:
-	kfree(desc);
 err_free_devres:
 	devres_free(ptr);
 	return ret;
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 9367f1cb2e3c..6bfce96206f8 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -381,6 +381,8 @@ irq_create_affinity_masks(unsigned int nvec, struct irq_affinity *affd);
 unsigned int irq_calc_affinity_vectors(unsigned int minvec, unsigned int maxvec,
 				       const struct irq_affinity *affd);
 
+int irq_set_affinity_masks(struct irq_affinity *affd, int *irqs, int nvec);
+
 #else /* CONFIG_SMP */
 
 static inline int irq_set_affinity(unsigned int irq, const struct cpumask *m)
@@ -443,6 +445,12 @@ irq_calc_affinity_vectors(unsigned int minvec, unsigned int maxvec,
 	return maxvec;
 }
 
+static inline int
+irq_set_affinity_masks(struct irq_affinity *affd, int *irqs, int nvec)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_SMP */
 
 /*
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f7ff8919dc9b..c0f868cd5b87 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -512,3 +512,30 @@ unsigned int irq_calc_affinity_vectors(unsigned int minvec, unsigned int maxvec,
 
 	return resv + min(set_vecs, maxvec - resv);
 }
+
+/*
+ * irq_set_affinity_masks - Set the affinity masks of a number of interrupts
+ *                          for multiqueue spreading
+ * @affd:	Description of the affinity requirements
+ * @irqs:	An array of interrupt numbers
+ * @nvec:	The total number of interrupts
+ */
+int irq_set_affinity_masks(struct irq_affinity *affd, int *irqs, int nvec)
+{
+	struct irq_affinity_desc *desc;
+	int i, err = 0;
+
+	desc = irq_create_affinity_masks(nvec, affd);
+	if (!desc)
+		return -ENOMEM;
+
+	for (i = 0; i < nvec; i++) {
+		err = irq_update_affinity_desc(irqs[i], desc + i);
+		if (err)
+			break;
+	}
+
+	kfree(desc);
+	return err;
+}
+EXPORT_SYMBOL_GPL(irq_set_affinity_masks);
-- 
2.30.2



* [PATCH 2/2] net: mvpp2: Convert to managed interrupts to fix CPU HP issues
  2022-02-16  9:08 [PATCH 0/2] net: mvpp2: Survive CPU hotplug events Marc Zyngier
  2022-02-16  9:08 ` [PATCH 1/2] genirq: Extract irq_set_affinity_masks() from devm_platform_get_irqs_affinity() Marc Zyngier
@ 2022-02-16  9:08 ` Marc Zyngier
  2022-02-16 11:38   ` Marc Zyngier
  2022-02-16 13:19 ` [PATCH 0/2] net: mvpp2: Survive CPU hotplug events Marcin Wojtas
  2 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2022-02-16  9:08 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, Marcin Wojtas, Russell King, David S. Miller,
	Jakub Kicinski, Thomas Gleixner, John Garry, kernel-team

The MVPP2 driver uses a set of per-CPU interrupts and relies on
each particular interrupt to fire *only* on the CPU it has been
assigned to.

Although the affinity setting is restricted to prevent userspace
from moving interrupts around, this all falls apart when using CPU
hotplug, as this breaks the affinity. Depending on how lucky you
are, the interrupt will then scream on the wrong CPU, eventually
leading to an ugly crash.

Ideally, the interrupt assigned to a given CPU would simply be left
where it is, only masked when the CPU goes down, and brought back
up when the CPU is alive again. As it turns out, this is the model
used for most multi-queue devices, and we'd be better off using it
for the MVPP2 driver.

Drop the home-baked affinity settings in favour of the ready-made
irq_set_affinity_masks() helper, making things slightly simpler.

With this change, the driver is able to sustain CPUs being taken away.
What is still missing is a way to tell the device that it should
stop sending traffic to a given CPU.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2.h    |  1 -
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 67 ++++++++++---------
 2 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
index ad73a488fc5f..86f8feaf5350 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
@@ -1143,7 +1143,6 @@ struct mvpp2_queue_vector {
 	int nrxqs;
 	u32 pending_cause_rx;
 	struct mvpp2_port *port;
-	struct cpumask *mask;
 };
 
 /* Internal represention of a Flow Steering rule */
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 7cdbf8b8bbf6..cdc519583e86 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -4674,49 +4674,54 @@ static void mvpp21_get_mac_address(struct mvpp2_port *port, unsigned char *addr)
 
 static int mvpp2_irqs_init(struct mvpp2_port *port)
 {
-	int err, i;
+	struct irq_affinity affd = {
+		/* No pre/post-vectors, single set */
+	};
+	int err, i, nvec, *irqs;
 
-	for (i = 0; i < port->nqvecs; i++) {
+	for (i = nvec = 0; i < port->nqvecs; i++) {
 		struct mvpp2_queue_vector *qv = port->qvecs + i;
 
-		if (qv->type == MVPP2_QUEUE_VECTOR_PRIVATE) {
-			qv->mask = kzalloc(cpumask_size(), GFP_KERNEL);
-			if (!qv->mask) {
-				err = -ENOMEM;
-				goto err;
-			}
+		if (qv->type == MVPP2_QUEUE_VECTOR_PRIVATE)
+			nvec++;
+	}
 
-			irq_set_status_flags(qv->irq, IRQ_NO_BALANCING);
-		}
+	irqs = kmalloc(sizeof(*irqs) * nvec, GFP_KERNEL);
+	if (!irqs)
+		return -ENOMEM;
 
-		err = request_irq(qv->irq, mvpp2_isr, 0, port->dev->name, qv);
-		if (err)
-			goto err;
+	for (i = 0; i < port->nqvecs; i++) {
+		struct mvpp2_queue_vector *qv = port->qvecs + i;
 
-		if (qv->type == MVPP2_QUEUE_VECTOR_PRIVATE) {
-			unsigned int cpu;
+		if (qv->type == MVPP2_QUEUE_VECTOR_PRIVATE)
+			irqs[i] = qv->irq;
+	}
 
-			for_each_present_cpu(cpu) {
-				if (mvpp2_cpu_to_thread(port->priv, cpu) ==
-				    qv->sw_thread_id)
-					cpumask_set_cpu(cpu, qv->mask);
-			}
+	err = irq_set_affinity_masks(&affd, irqs, nvec);
+	if (err)
+		goto err;
 
-			irq_set_affinity_hint(qv->irq, qv->mask);
+	for (i = 0; i < port->nqvecs; i++) {
+		struct mvpp2_queue_vector *qv = port->qvecs + i;
+
+		err = request_irq(qv->irq, mvpp2_isr, 0, port->dev->name, qv);
+		if (err) {
+			nvec = i;
+			break;
 		}
 	}
 
-	return 0;
-err:
-	for (i = 0; i < port->nqvecs; i++) {
-		struct mvpp2_queue_vector *qv = port->qvecs + i;
+	if (err) {
+		for (i = 0; i < nvec; i++) {
+			struct mvpp2_queue_vector *qv = port->qvecs + i;
 
-		irq_set_affinity_hint(qv->irq, NULL);
-		kfree(qv->mask);
-		qv->mask = NULL;
-		free_irq(qv->irq, qv);
+			free_irq(qv->irq, qv);
+		}
 	}
 
+err:
+	kfree(irqs);
+
 	return err;
 }
 
@@ -4727,10 +4732,6 @@ static void mvpp2_irqs_deinit(struct mvpp2_port *port)
 	for (i = 0; i < port->nqvecs; i++) {
 		struct mvpp2_queue_vector *qv = port->qvecs + i;
 
-		irq_set_affinity_hint(qv->irq, NULL);
-		kfree(qv->mask);
-		qv->mask = NULL;
-		irq_clear_status_flags(qv->irq, IRQ_NO_BALANCING);
 		free_irq(qv->irq, qv);
 	}
 }
-- 
2.30.2



* Re: [PATCH 1/2] genirq: Extract irq_set_affinity_masks() from devm_platform_get_irqs_affinity()
  2022-02-16  9:08 ` [PATCH 1/2] genirq: Extract irq_set_affinity_masks() from devm_platform_get_irqs_affinity() Marc Zyngier
@ 2022-02-16 10:56   ` Greg Kroah-Hartman
  2022-02-17 17:07   ` John Garry
  1 sibling, 0 replies; 12+ messages in thread
From: Greg Kroah-Hartman @ 2022-02-16 10:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, netdev, Marcin Wojtas, Russell King,
	David S. Miller, Jakub Kicinski, Thomas Gleixner, John Garry,
	kernel-team

On Wed, Feb 16, 2022 at 09:08:44AM +0000, Marc Zyngier wrote:
> In order to better support drivers that deal with interrupts in a more
> "hands-on" way, extract the core of devm_platform_get_irqs_affinity()
> and expose it as irq_set_affinity_masks().
> 
> This helper allows a driver to provide a set of wired interrupts that
> are to be configured as managed interrupts. As with the original helper,
> this is exported as EXPORT_SYMBOL_GPL.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  drivers/base/platform.c   | 20 +++-----------------
>  include/linux/interrupt.h |  8 ++++++++
>  kernel/irq/affinity.c     | 27 +++++++++++++++++++++++++++
>  3 files changed, 38 insertions(+), 17 deletions(-)
> 

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


* Re: [PATCH 2/2] net: mvpp2: Convert to managed interrupts to fix CPU HP issues
  2022-02-16  9:08 ` [PATCH 2/2] net: mvpp2: Convert to managed interrupts to fix CPU HP issues Marc Zyngier
@ 2022-02-16 11:38   ` Marc Zyngier
  0 siblings, 0 replies; 12+ messages in thread
From: Marc Zyngier @ 2022-02-16 11:38 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, Marcin Wojtas, Russell King, David S. Miller,
	Jakub Kicinski, Thomas Gleixner, John Garry, kernel-team

On 2022-02-16 09:08, Marc Zyngier wrote:
> The MVPP2 driver uses a set of per-CPU interrupts and relies on
> each particular interrupt to fire *only* on the CPU it has been
> assigned to.
> 
> Although the affinity setting is restricted to prevent userspace
> from moving interrupts around, this all falls apart when using CPU
> hotplug, as this breaks the affinity. Depending on how lucky you
> are, the interrupt will then scream on the wrong CPU, eventually
> leading to an ugly crash.
> 
> Ideally, the interrupt assigned to a given CPU would simply be left
> where it is, only masked when the CPU goes down, and brought back
> up when the CPU is alive again. As it turns out, this is the model
> used for most multi-queue devices, and we'd be better off using it
> for the MVPP2 driver.
> 
> Drop the home-baked affinity settings in favour of the ready-made
> irq_set_affinity_masks() helper, making things slightly simpler.
> 
> With this change, the driver is able to sustain CPUs being taken away.
> What is still missing is a way to tell the device that it should
> stop sending traffic to a given CPU.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  drivers/net/ethernet/marvell/mvpp2/mvpp2.h    |  1 -
>  .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 67 ++++++++++---------
>  2 files changed, 34 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
> b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
> index ad73a488fc5f..86f8feaf5350 100644
> --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
> +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
> @@ -1143,7 +1143,6 @@ struct mvpp2_queue_vector {
>  	int nrxqs;
>  	u32 pending_cause_rx;
>  	struct mvpp2_port *port;
> -	struct cpumask *mask;
>  };
> 
>  /* Internal represention of a Flow Steering rule */
> diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> index 7cdbf8b8bbf6..cdc519583e86 100644
> --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> @@ -4674,49 +4674,54 @@ static void mvpp21_get_mac_address(struct
> mvpp2_port *port, unsigned char *addr)
> 
>  static int mvpp2_irqs_init(struct mvpp2_port *port)
>  {
> -	int err, i;
> +	struct irq_affinity affd = {
> +		/* No pre/post-vectors, single set */
> +	};
> +	int err, i, nvec, *irqs;
> 
> -	for (i = 0; i < port->nqvecs; i++) {
> +	for (i = nvec = 0; i < port->nqvecs; i++) {
>  		struct mvpp2_queue_vector *qv = port->qvecs + i;
> 
> -		if (qv->type == MVPP2_QUEUE_VECTOR_PRIVATE) {
> -			qv->mask = kzalloc(cpumask_size(), GFP_KERNEL);
> -			if (!qv->mask) {
> -				err = -ENOMEM;
> -				goto err;
> -			}
> +		if (qv->type == MVPP2_QUEUE_VECTOR_PRIVATE)
> +			nvec++;
> +	}
> 
> -			irq_set_status_flags(qv->irq, IRQ_NO_BALANCING);
> -		}
> +	irqs = kmalloc(sizeof(*irqs) * nvec, GFP_KERNEL);
> +	if (!irqs)
> +		return -ENOMEM;
> 
> -		err = request_irq(qv->irq, mvpp2_isr, 0, port->dev->name, qv);
> -		if (err)
> -			goto err;
> +	for (i = 0; i < port->nqvecs; i++) {
> +		struct mvpp2_queue_vector *qv = port->qvecs + i;
> 
> -		if (qv->type == MVPP2_QUEUE_VECTOR_PRIVATE) {
> -			unsigned int cpu;
> +		if (qv->type == MVPP2_QUEUE_VECTOR_PRIVATE)
> +			irqs[i] = qv->irq;
> +	}

Errr, this is broken: non-private interrupts are not accounted for
in the sizing of the irqs[] array, so using 'i' as the index is
plain wrong.

I have added this on top:

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c 
b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index cdc519583e86..518ef07a067b 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -4690,11 +4690,11 @@ static int mvpp2_irqs_init(struct mvpp2_port 
*port)
  	if (!irqs)
  		return -ENOMEM;

-	for (i = 0; i < port->nqvecs; i++) {
+	for (i = nvec = 0; i < port->nqvecs; i++) {
  		struct mvpp2_queue_vector *qv = port->qvecs + i;

  		if (qv->type == MVPP2_QUEUE_VECTOR_PRIVATE)
-			irqs[i] = qv->irq;
+			irqs[nvec++] = qv->irq;
  	}

  	err = irq_set_affinity_masks(&affd, irqs, nvec);

Thanks to Russell for pointing out that something was amiss.

         M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 0/2] net: mvpp2: Survive CPU hotplug events
  2022-02-16  9:08 [PATCH 0/2] net: mvpp2: Survive CPU hotplug events Marc Zyngier
  2022-02-16  9:08 ` [PATCH 1/2] genirq: Extract irq_set_affinity_masks() from devm_platform_get_irqs_affinity() Marc Zyngier
  2022-02-16  9:08 ` [PATCH 2/2] net: mvpp2: Convert to managed interrupts to fix CPU HP issues Marc Zyngier
@ 2022-02-16 13:19 ` Marcin Wojtas
  2022-02-16 13:29   ` Marc Zyngier
  2 siblings, 1 reply; 12+ messages in thread
From: Marcin Wojtas @ 2022-02-16 13:19 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Linux Kernel Mailing List, netdev, Greg Kroah-Hartman,
	Russell King, David S. Miller, Jakub Kicinski, Thomas Gleixner,
	John Garry, kernel-team

Hi Marc,

Wed, Feb 16, 2022 at 10:08 AM Marc Zyngier <maz@kernel.org> wrote:
>
> I recently realised that playing with CPU hotplug on a system equipped
> with a set of MVPP2 devices (Marvell 8040) was fraught with danger and
> would result in a rapid lockup or panic.
>
> As it turns out, the per-CPU nature of the MVPP2 interrupts is
> getting in the way. A good solution for this seems to rely on the
> kernel's managed interrupt approach, where the core kernel will not
> move interrupts around as the CPUs go down, but will simply disable
> the corresponding interrupt.
>
> Converting the driver to this requires a bit of refactoring in the IRQ
> subsystem to expose the required primitive, as well as a bit of
> surgery in the driver itself.
>
> Note that although the system now survives such events, the driver
> seems to assume that all queues are always active and doesn't inform
> the device that a CPU has gone away. Someone who actually understands
> this driver should have a look at it.
>
> Patches on top of 5.17-rc3, lightly tested on a McBin.
>

Thank you for the patches. Can you, please, share the commands you
used? I'd like to test it more.

Best regards,
Marcin

> Marc Zyngier (2):
>   genirq: Extract irq_set_affinity_masks() from
>     devm_platform_get_irqs_affinity()
>   net: mvpp2: Convert to managed interrupts to fix CPU HP issues
>
>  drivers/base/platform.c                       | 20 +-----
>  drivers/net/ethernet/marvell/mvpp2/mvpp2.h    |  1 -
>  .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 67 ++++++++++---------
>  include/linux/interrupt.h                     |  8 +++
>  kernel/irq/affinity.c                         | 27 ++++++++
>  5 files changed, 72 insertions(+), 51 deletions(-)
>
> --
> 2.30.2
>


* Re: [PATCH 0/2] net: mvpp2: Survive CPU hotplug events
  2022-02-16 13:19 ` [PATCH 0/2] net: mvpp2: Survive CPU hotplug events Marcin Wojtas
@ 2022-02-16 13:29   ` Marc Zyngier
  2022-02-16 13:32     ` Marcin Wojtas
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2022-02-16 13:29 UTC (permalink / raw)
  To: Marcin Wojtas
  Cc: Linux Kernel Mailing List, netdev, Greg Kroah-Hartman,
	Russell King, David S. Miller, Jakub Kicinski, Thomas Gleixner,
	John Garry, kernel-team

On Wed, 16 Feb 2022 13:19:30 +0000,
Marcin Wojtas <mw@semihalf.com> wrote:
> 
> Hi Marc,
> 
> Wed, Feb 16, 2022 at 10:08 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > I recently realised that playing with CPU hotplug on a system equipped
> > with a set of MVPP2 devices (Marvell 8040) was fraught with danger and
> > would result in a rapid lockup or panic.
> >
> > As it turns out, the per-CPU nature of the MVPP2 interrupts is
> > getting in the way. A good solution for this seems to rely on the
> > kernel's managed interrupt approach, where the core kernel will not
> > move interrupts around as the CPUs go down, but will simply disable
> > the corresponding interrupt.
> >
> > Converting the driver to this requires a bit of refactoring in the IRQ
> > subsystem to expose the required primitive, as well as a bit of
> > surgery in the driver itself.
> >
> > Note that although the system now survives such events, the driver
> > seems to assume that all queues are always active and doesn't inform
> > the device that a CPU has gone away. Someone who actually understands
> > this driver should have a look at it.
> >
> > Patches on top of 5.17-rc3, lightly tested on a McBin.
> >
> 
> Thank you for the patches. Can you, please, share the commands you
> used? I'd like to test it more.

Offline CPU3:
# echo 0 > /sys/devices/system/cpu/cpu3/online

Online CPU3:
# echo 1 > /sys/devices/system/cpu/cpu3/online

Put that in a loop, using different CPUs.
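
A minimal sketch of such a loop (the CPU list and iteration count are
arbitrary; DRY_RUN=1, the default here, only prints the sysfs writes,
since actually offlining CPUs requires root):

```shell
# Hotplug stress loop (sketch). Toggle each listed CPU offline/online.
DRY_RUN=${DRY_RUN:-1}

toggle_cpu() {
	path=/sys/devices/system/cpu/cpu$1/online
	for state in 0 1; do
		if [ "$DRY_RUN" = 1 ]; then
			# Dry run: show the command instead of running it.
			echo "echo $state > $path"
		else
			echo "$state" > "$path"
			sleep 1
		fi
	done
}

for iter in 1 2 3; do
	for cpu in 1 2 3; do
		toggle_cpu "$cpu"
	done
done
```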

On my HW, turning off CPU0 leads to odd behaviours (I wouldn't be
surprised if the firmware was broken in that respect, and also the
fact that the device keeps trying to send stuff to that CPU...).

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 0/2] net: mvpp2: Survive CPU hotplug events
  2022-02-16 13:29   ` Marc Zyngier
@ 2022-02-16 13:32     ` Marcin Wojtas
  0 siblings, 0 replies; 12+ messages in thread
From: Marcin Wojtas @ 2022-02-16 13:32 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Linux Kernel Mailing List, netdev, Greg Kroah-Hartman,
	Russell King, David S. Miller, Jakub Kicinski, Thomas Gleixner,
	John Garry, kernel-team

Wed, Feb 16, 2022 at 2:29 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Wed, 16 Feb 2022 13:19:30 +0000,
> Marcin Wojtas <mw@semihalf.com> wrote:
> >
> > Hi Marc,
> >
> > Wed, Feb 16, 2022 at 10:08 AM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > I recently realised that playing with CPU hotplug on a system equipped
> > > with a set of MVPP2 devices (Marvell 8040) was fraught with danger and
> > > would result in a rapid lockup or panic.
> > >
> > > As it turns out, the per-CPU nature of the MVPP2 interrupts is
> > > getting in the way. A good solution for this seems to rely on the
> > > kernel's managed interrupt approach, where the core kernel will not
> > > move interrupts around as the CPUs go down, but will simply disable
> > > the corresponding interrupt.
> > >
> > > Converting the driver to this requires a bit of refactoring in the IRQ
> > > subsystem to expose the required primitive, as well as a bit of
> > > surgery in the driver itself.
> > >
> > > Note that although the system now survives such events, the driver
> > > seems to assume that all queues are always active and doesn't inform
> > > the device that a CPU has gone away. Someone who actually understands
> > > this driver should have a look at it.
> > >
> > > Patches on top of 5.17-rc3, lightly tested on a McBin.
> > >
> >
> > Thank you for the patches. Can you, please, share the commands you
> > used? I'd like to test it more.
>
> Offline CPU3:
> # echo 0 > /sys/devices/system/cpu/cpu3/online
>
> Online CPU3:
> # echo 1 > /sys/devices/system/cpu/cpu3/online
>
> Put that in a loop, using different CPUs.
>
> On my HW, turning off CPU0 leads to odd behaviours (I wouldn't be
> surprised if the firmware was broken in that respect, and also the
> fact that the device keeps trying to send stuff to that CPU...).
>

Thanks, I think stressing the DUT with traffic during CPU hotplug will be
a good scenario - I'll try that.

Marcin


* Re: [PATCH 1/2] genirq: Extract irq_set_affinity_masks() from devm_platform_get_irqs_affinity()
  2022-02-16  9:08 ` [PATCH 1/2] genirq: Extract irq_set_affinity_masks() from devm_platform_get_irqs_affinity() Marc Zyngier
  2022-02-16 10:56   ` Greg Kroah-Hartman
@ 2022-02-17 17:07   ` John Garry
  2022-02-17 17:17     ` Marc Zyngier
  1 sibling, 1 reply; 12+ messages in thread
From: John Garry @ 2022-02-17 17:07 UTC (permalink / raw)
  To: Marc Zyngier, linux-kernel, netdev
  Cc: Greg Kroah-Hartman, Marcin Wojtas, Russell King, David S. Miller,
	Jakub Kicinski, Thomas Gleixner, kernel-team

On 16/02/2022 09:08, Marc Zyngier wrote:

Hi Marc,

> In order to better support drivers that deal with interrupts in a more
> "hands-on" way, extract the core of devm_platform_get_irqs_affinity()
> and expose it as irq_set_affinity_masks().
> 
> This helper allows a driver to provide a set of wired interrupts that
> are to be configured as managed interrupts. As with the original helper,
> this is exported as EXPORT_SYMBOL_GPL.

I know you mentioned it in 2/2, but it would be interesting to see how 
network controller drivers can handle the problem of missing in-flight 
IO completions for managed irq shutdown. For storage controllers this is 
all now safely handled in the block layer.

> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Just a small comment, below.

Tested-by: John Garry <john.garry@huawei.com> #D05

Thanks,
John

> ---
>   drivers/base/platform.c   | 20 +++-----------------
>   include/linux/interrupt.h |  8 ++++++++
>   kernel/irq/affinity.c     | 27 +++++++++++++++++++++++++++
>   3 files changed, 38 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 6cb04ac48bf0..b363cf6ce5be 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -335,7 +335,6 @@ int devm_platform_get_irqs_affinity(struct platform_device *dev,
>   				    int **irqs)
>   {
>   	struct irq_affinity_devres *ptr;
> -	struct irq_affinity_desc *desc;
>   	size_t size;
>   	int i, ret, nvec;
>   
> @@ -376,31 +375,18 @@ int devm_platform_get_irqs_affinity(struct platform_device *dev,
>   		ptr->irq[i] = irq;
>   	}
>   
> -	desc = irq_create_affinity_masks(nvec, affd);
> -	if (!desc) {
> -		ret = -ENOMEM;
> +	ret = irq_set_affinity_masks(affd, ptr->irq, nvec);
> +	if (ret) {
> +		dev_err(&dev->dev, "failed to update affinity descriptors (%d)\n", ret);
>   		goto err_free_devres;
>   	}
>   
> -	for (i = 0; i < nvec; i++) {
> -		ret = irq_update_affinity_desc(ptr->irq[i], &desc[i]);
> -		if (ret) {
> -			dev_err(&dev->dev, "failed to update irq%d affinity descriptor (%d)\n",
> -				ptr->irq[i], ret);
> -			goto err_free_desc;
> -		}
> -	}
> -
>   	devres_add(&dev->dev, ptr);
>   
> -	kfree(desc);
> -
>   	*irqs = ptr->irq;
>   
>   	return nvec;
>   
> -err_free_desc:
> -	kfree(desc);
>   err_free_devres:
>   	devres_free(ptr);
>   	return ret;
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 9367f1cb2e3c..6bfce96206f8 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -381,6 +381,8 @@ irq_create_affinity_masks(unsigned int nvec, struct irq_affinity *affd);
>   unsigned int irq_calc_affinity_vectors(unsigned int minvec, unsigned int maxvec,
>   				       const struct irq_affinity *affd);
>   
> +int irq_set_affinity_masks(struct irq_affinity *affd, int *irqs, int nvec);
> +
>   #else /* CONFIG_SMP */
>   
>   static inline int irq_set_affinity(unsigned int irq, const struct cpumask *m)
> @@ -443,6 +445,12 @@ irq_calc_affinity_vectors(unsigned int minvec, unsigned int maxvec,
>   	return maxvec;
>   }
>   
> +static inline int
> +irq_set_affinity_masks(struct irq_affinity *affd, int *irqs, int nvec)
> +{
> +	return -EINVAL;
> +}
> +
>   #endif /* CONFIG_SMP */
>   
>   /*
> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
> index f7ff8919dc9b..c0f868cd5b87 100644
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -512,3 +512,30 @@ unsigned int irq_calc_affinity_vectors(unsigned int minvec, unsigned int maxvec,
>   
>   	return resv + min(set_vecs, maxvec - resv);
>   }
> +
> +/*
> + * irq_set_affinity_masks - Set the affinity masks of a number of interrupts
> + *                          for multiqueue spreading
> + * @affd:	Description of the affinity requirements
> + * @irqs:	An array of interrupt numbers
> + * @nvec:	The total number of interrupts
> + */
> +int irq_set_affinity_masks(struct irq_affinity *affd, int *irqs, int nvec)
> +{
> +	struct irq_affinity_desc *desc;
> +	int i, err = 0;

nit: it might be worth doing something similar to how 
pci_alloc_irq_vectors_affinity() handles sets with no pre- and 
post-vectors with msi_default_affd

> +
> +	desc = irq_create_affinity_masks(nvec, affd);
> +	if (!desc)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < nvec; i++) {
> +		err = irq_update_affinity_desc(irqs[i], desc + i);
> +		if (err)
> +			break;
> +	}
> +
> +	kfree(desc);
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(irq_set_affinity_masks);



* Re: [PATCH 1/2] genirq: Extract irq_set_affinity_masks() from devm_platform_get_irqs_affinity()
  2022-02-17 17:07   ` John Garry
@ 2022-02-17 17:17     ` Marc Zyngier
  2022-02-18  8:41       ` John Garry
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2022-02-17 17:17 UTC (permalink / raw)
  To: John Garry
  Cc: linux-kernel, netdev, Greg Kroah-Hartman, Marcin Wojtas,
	Russell King, David S. Miller, Jakub Kicinski, Thomas Gleixner,
	kernel-team

Hi John,

On 2022-02-17 17:07, John Garry wrote:
> On 16/02/2022 09:08, Marc Zyngier wrote:
> 
> Hi Marc,
> 
>> In order to better support drivers that deal with interrupts in a more
>> "hands-on" way, extract the core of devm_platform_get_irqs_affinity()
>> and expose it as irq_set_affinity_masks().
>> 
>> This helper allows a driver to provide a set of wired interrupts that
>> are to be configured as managed interrupts. As with the original 
>> helper,
>> this is exported as EXPORT_SYMBOL_GPL.
> 
> I know you mentioned it in 2/2, but it would be interesting to see how
> network controller drivers can handle the problem of missing in-flight
> IO completions for managed irq shutdown. For storage controllers this
> is all now safely handled in the block layer.

Do you have a pointer to this? It'd be interesting to see if there is
a common pattern.

> 
>> 
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
> 
> Just a small comment, below.
> 
> Tested-by: John Garry <john.garry@huawei.com> #D05
> 
> Thanks,
> John
> 
>> ---
>>   drivers/base/platform.c   | 20 +++-----------------
>>   include/linux/interrupt.h |  8 ++++++++
>>   kernel/irq/affinity.c     | 27 +++++++++++++++++++++++++++
>>   3 files changed, 38 insertions(+), 17 deletions(-)
>> 
>> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
>> index 6cb04ac48bf0..b363cf6ce5be 100644
>> --- a/drivers/base/platform.c
>> +++ b/drivers/base/platform.c
>> @@ -335,7 +335,6 @@ int devm_platform_get_irqs_affinity(struct 
>> platform_device *dev,
>>   				    int **irqs)
>>   {
>>   	struct irq_affinity_devres *ptr;
>> -	struct irq_affinity_desc *desc;
>>   	size_t size;
>>   	int i, ret, nvec;
>>   @@ -376,31 +375,18 @@ int devm_platform_get_irqs_affinity(struct 
>> platform_device *dev,
>>   		ptr->irq[i] = irq;
>>   	}
>>   -	desc = irq_create_affinity_masks(nvec, affd);
>> -	if (!desc) {
>> -		ret = -ENOMEM;
>> +	ret = irq_set_affinity_masks(affd, ptr->irq, nvec);
>> +	if (ret) {
>> +		dev_err(&dev->dev, "failed to update affinity descriptors (%d)\n", 
>> ret);
>>   		goto err_free_devres;
>>   	}
>>   -	for (i = 0; i < nvec; i++) {
>> -		ret = irq_update_affinity_desc(ptr->irq[i], &desc[i]);
>> -		if (ret) {
>> -			dev_err(&dev->dev, "failed to update irq%d affinity descriptor 
>> (%d)\n",
>> -				ptr->irq[i], ret);
>> -			goto err_free_desc;
>> -		}
>> -	}
>> -
>>   	devres_add(&dev->dev, ptr);
>>   -	kfree(desc);
>> -
>>   	*irqs = ptr->irq;
>>     	return nvec;
>>   -err_free_desc:
>> -	kfree(desc);
>>   err_free_devres:
>>   	devres_free(ptr);
>>   	return ret;
>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
>> index 9367f1cb2e3c..6bfce96206f8 100644
>> --- a/include/linux/interrupt.h
>> +++ b/include/linux/interrupt.h
>> @@ -381,6 +381,8 @@ irq_create_affinity_masks(unsigned int nvec, 
>> struct irq_affinity *affd);
>>   unsigned int irq_calc_affinity_vectors(unsigned int minvec, unsigned 
>> int maxvec,
>>   				       const struct irq_affinity *affd);
>>   +int irq_set_affinity_masks(struct irq_affinity *affd, int *irqs, 
>> int nvec);
>> +
>>   #else /* CONFIG_SMP */
>>     static inline int irq_set_affinity(unsigned int irq, const struct 
>> cpumask *m)
>> @@ -443,6 +445,12 @@ irq_calc_affinity_vectors(unsigned int minvec, 
>> unsigned int maxvec,
>>   	return maxvec;
>>   }
>>   +static inline int
>> +irq_set_affinity_masks(struct irq_affinity *affd, int *irqs, int 
>> nvec)
>> +{
>> +	return -EINVAL;
>> +}
>> +
>>   #endif /* CONFIG_SMP */
>>     /*
>> diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
>> index f7ff8919dc9b..c0f868cd5b87 100644
>> --- a/kernel/irq/affinity.c
>> +++ b/kernel/irq/affinity.c
>> @@ -512,3 +512,30 @@ unsigned int irq_calc_affinity_vectors(unsigned 
>> int minvec, unsigned int maxvec,
>>     	return resv + min(set_vecs, maxvec - resv);
>>   }
>> +
>> +/*
>> + * irq_set_affinity_masks - Set the affinity masks of a number of 
>> interrupts
>> + *                          for multiqueue spreading
>> + * @affd:	Description of the affinity requirements
>> + * @irqs:	An array of interrupt numbers
>> + * @nvec:	The total number of interrupts
>> + */
>> +int irq_set_affinity_masks(struct irq_affinity *affd, int *irqs, int 
>> nvec)
>> +{
>> +	struct irq_affinity_desc *desc;
>> +	int i, err = 0;
> 
> nit: it might be worth doing something similar to how
> pci_alloc_irq_vectors_affinity() handles sets with no pre- and
> post-vectors with msi_default_affd

Yes, good point. This would probably simplify most callers of this code.

Thanks,

           M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 1/2] genirq: Extract irq_set_affinity_masks() from devm_platform_get_irqs_affinity()
  2022-02-17 17:17     ` Marc Zyngier
@ 2022-02-18  8:41       ` John Garry
  2022-03-15 14:25         ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: John Garry @ 2022-02-18  8:41 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, netdev, Greg Kroah-Hartman, Marcin Wojtas,
	Russell King, David S. Miller, Jakub Kicinski, Thomas Gleixner,
	kernel-team

On 17/02/2022 17:17, Marc Zyngier wrote:

Hi Marc,

>> I know you mentioned it in 2/2, but it would be interesting to see how
>> network controller drivers can handle the problem of missing in-flight
>> IO completions for managed irq shutdown. For storage controllers this
>> is all now safely handled in the block layer.
> 
> Do you have a pointer to this? It'd be interesting to see if there is
> a common pattern.

Check blk_mq_hctx_notify_offline() and the other hotplug handlers in
block/blk-mq.c, and also blk_mq_get_ctx()/blk_mq_map_queue().

So the key steps in CPU offlining are:
- when the last CPU in HW queue context cpumask is going offline we mark 
the HW queue as inactive and no longer queue requests there
- drain all in-flight requests before we allow that last CPU to go 
offline, meaning that we always have a CPU online to service any 
completion interrupts

This scheme relies on symmetrical HW submission and completion queues, 
and on the blk-mq HW queue context cpumask being the same as the HW 
queue's IRQ affinity mask (see blk_mq_pci_map_queues()).

I am not sure how much this would fit with the networking stack or that 
marvell driver.

Thanks,
John


* Re: [PATCH 1/2] genirq: Extract irq_set_affinity_masks() from devm_platform_get_irqs_affinity()
  2022-02-18  8:41       ` John Garry
@ 2022-03-15 14:25         ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2022-03-15 14:25 UTC (permalink / raw)
  To: John Garry, Marc Zyngier
  Cc: linux-kernel, netdev, Greg Kroah-Hartman, Marcin Wojtas,
	Russell King, David S. Miller, Jakub Kicinski, kernel-team

On Fri, Feb 18 2022 at 08:41, John Garry wrote:
> On 17/02/2022 17:17, Marc Zyngier wrote:
>>> I know you mentioned it in 2/2, but it would be interesting to see how
>>> network controller drivers can handle the problem of missing in-flight
>>> IO completions for managed irq shutdown. For storage controllers this
>>> is all now safely handled in the block layer.
>> 
>> Do you have a pointer to this? It'd be interesting to see if there is
>> a common pattern.
>
> Check blk_mq_hctx_notify_offline() and other hotplug handler friends in 
> block/blk-mq.c and also blk_mq_get_ctx()/blk_mq_map_queue()
>
> So the key steps in CPU offlining are:
> - when the last CPU in HW queue context cpumask is going offline we mark 
> the HW queue as inactive and no longer queue requests there
> - drain all in-flight requests before we allow that last CPU to go 
> offline, meaning that we always have a CPU online to service any 
> completion interrupts
>
> This scheme relies on symmetrical HW submission and completion queues 
> and also that the blk-mq HW queue context cpumask is same as the HW 
> queue's IRQ affinity mask (see blk_mq_pci_map_queues()).
>
> I am not sure how much this would fit with the networking stack or that 
> marvell driver.

The problem with networking is RX flow steering.

The driver in question initializes the RX flows in
mvpp22_port_rss_init() by default so that packets are evenly distributed
across the RX queues.

So without actually steering the RX flows away from the RX queue which is
associated with the CPU about to be unplugged, this does not really work well.

Thanks,

        tglx



Thread overview: 12+ messages
2022-02-16  9:08 [PATCH 0/2] net: mvpp2: Survive CPU hotplug events Marc Zyngier
2022-02-16  9:08 ` [PATCH 1/2] genirq: Extract irq_set_affinity_masks() from devm_platform_get_irqs_affinity() Marc Zyngier
2022-02-16 10:56   ` Greg Kroah-Hartman
2022-02-17 17:07   ` John Garry
2022-02-17 17:17     ` Marc Zyngier
2022-02-18  8:41       ` John Garry
2022-03-15 14:25         ` Thomas Gleixner
2022-02-16  9:08 ` [PATCH 2/2] net: mvpp2: Convert to managed interrupts to fix CPU HP issues Marc Zyngier
2022-02-16 11:38   ` Marc Zyngier
2022-02-16 13:19 ` [PATCH 0/2] net: mvpp2: Survive CPU hotplug events Marcin Wojtas
2022-02-16 13:29   ` Marc Zyngier
2022-02-16 13:32     ` Marcin Wojtas
