* [PATCH 0/4] Fix CPU hotplug IRQ migration
@ 2011-07-21 15:14 Russell King - ARM Linux
  2011-07-21 15:24 ` [PATCH 1/4] ARM: CPU hotplug: fix abuse of irqdesc->node Russell King - ARM Linux
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Russell King - ARM Linux @ 2011-07-21 15:14 UTC (permalink / raw)
  To: linux-arm-kernel

This series fixes a bunch of issues with IRQ migration:

1. Preventing affinity changes causing IRQs to be directed to off-line CPUs.
2. Preventing oopses with non-GIC interrupt controllers.
3. Preventing per-CPU IRQs being candidates for migration.
4. Removing the abuse of irqdesc's node member.

This prevents crashes on OMAP4430 SDP when non-default IRQ affinity
settings are used and a CPU which owns some IRQs is offlined.

With this patch set applied, there is no reason core code can't handle
CPU0 being offlined - on OMAP4, the only remaining issue is CPU0
being woken from WFI and falsely passing the non-spurious wake-up test.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] ARM: CPU hotplug: fix abuse of irqdesc->node
  2011-07-21 15:14 [PATCH 0/4] Fix CPU hotplug IRQ migration Russell King - ARM Linux
@ 2011-07-21 15:24 ` Russell King - ARM Linux
  2011-07-21 15:24 ` [PATCH 2/4] ARM: GIC: avoid routing interrupts to offline CPUs Russell King - ARM Linux
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Russell King - ARM Linux @ 2011-07-21 15:24 UTC (permalink / raw)
  To: linux-arm-kernel

irqdesc's node member is supposed to mark the NUMA node number for the
interrupt.  Our use of it is non-standard.  Remove this, replacing the
functionality with a test of the affinity mask.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/arm/common/gic.c |    1 -
 arch/arm/kernel/irq.c |   10 ++--------
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
index 4ddd0a6..635d985 100644
--- a/arch/arm/common/gic.c
+++ b/arch/arm/common/gic.c
@@ -189,7 +189,6 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 	bit = 1 << (cpu + shift);
 
 	spin_lock(&irq_controller_lock);
-	d->node = cpu;
 	val = readl_relaxed(reg) & ~mask;
 	writel_relaxed(val | bit, reg);
 	spin_unlock(&irq_controller_lock);
diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index 83bbad0..d7aa5c9 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -166,15 +166,9 @@ void migrate_irqs(void)
 		bool affinity_broken = false;
 
 		raw_spin_lock(&desc->lock);
-		do {
-			if (desc->action == NULL)
-				break;
-
-			if (d->node != cpu)
-				break;
-
+		if (desc->action != NULL &&
+		    cpumask_test_cpu(smp_processor_id(), d->affinity))
 			affinity_broken = migrate_one_irq(d);
-		} while (0);
 		raw_spin_unlock(&desc->lock);
 
 		if (affinity_broken && printk_ratelimit())
-- 
1.7.4.4


* [PATCH 2/4] ARM: GIC: avoid routing interrupts to offline CPUs
  2011-07-21 15:14 [PATCH 0/4] Fix CPU hotplug IRQ migration Russell King - ARM Linux
  2011-07-21 15:24 ` [PATCH 1/4] ARM: CPU hotplug: fix abuse of irqdesc->node Russell King - ARM Linux
@ 2011-07-21 15:24 ` Russell King - ARM Linux
  2011-07-21 15:25 ` [PATCH 3/4] ARM: CPU hotplug: pass in proper affinity mask on IRQ migration Russell King - ARM Linux
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Russell King - ARM Linux @ 2011-07-21 15:24 UTC (permalink / raw)
  To: linux-arm-kernel

The irq_set_affinity() method can be called with masks which include
offline CPUs.  This allows offline CPUs to have interrupts routed to
them by writing to /proc/irq/*/smp_affinity after hotplug has taken
a CPU offline.  Fix this by ensuring that we select a target CPU
present in both the required affinity and the online CPU mask.

Ensure that we return IRQ_SET_MASK_OK (which happens to be 0) on
success to ensure generic code copies the new mask into the irq_data
structure.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/arm/common/gic.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm/common/gic.c b/arch/arm/common/gic.c
index 635d985..7bdd917 100644
--- a/arch/arm/common/gic.c
+++ b/arch/arm/common/gic.c
@@ -179,10 +179,10 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 {
 	void __iomem *reg = gic_dist_base(d) + GIC_DIST_TARGET + (gic_irq(d) & ~3);
 	unsigned int shift = (d->irq % 4) * 8;
-	unsigned int cpu = cpumask_first(mask_val);
+	unsigned int cpu = cpumask_any_and(mask_val, cpu_online_mask);
 	u32 val, mask, bit;
 
-	if (cpu >= 8)
+	if (cpu >= 8 || cpu >= nr_cpu_ids)
 		return -EINVAL;
 
 	mask = 0xff << shift;
@@ -193,7 +193,7 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 	writel_relaxed(val | bit, reg);
 	spin_unlock(&irq_controller_lock);
 
-	return 0;
+	return IRQ_SET_MASK_OK;
 }
 #endif
 
-- 
1.7.4.4


* [PATCH 3/4] ARM: CPU hotplug: pass in proper affinity mask on IRQ migration
  2011-07-21 15:14 [PATCH 0/4] Fix CPU hotplug IRQ migration Russell King - ARM Linux
  2011-07-21 15:24 ` [PATCH 1/4] ARM: CPU hotplug: fix abuse of irqdesc->node Russell King - ARM Linux
  2011-07-21 15:24 ` [PATCH 2/4] ARM: GIC: avoid routing interrupts to offline CPUs Russell King - ARM Linux
@ 2011-07-21 15:25 ` Russell King - ARM Linux
  2011-07-25 12:38   ` Santosh Shilimkar
  2011-07-21 15:25 ` [PATCH 4/4] ARM: CPU hotplug: ensure we migrate all IRQs off a downed CPU Russell King - ARM Linux
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Russell King - ARM Linux @ 2011-07-21 15:25 UTC (permalink / raw)
  To: linux-arm-kernel

Now that the GIC takes care of selecting a target interrupt from the
affinity mask, we don't need all this complexity in the core code
anymore.  Just detect when we need to break affinity.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/arm/kernel/irq.c |   10 ++++------
 1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index d7aa5c9..ab63c05 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -133,17 +133,15 @@ int __init arch_probe_nr_irqs(void)
 
 static bool migrate_one_irq(struct irq_data *d)
 {
-	unsigned int cpu = cpumask_any_and(d->affinity, cpu_online_mask);
+	const struct cpumask *affinity = d->affinity;
 	bool ret = false;
 
-	if (cpu >= nr_cpu_ids) {
-		cpu = cpumask_any(cpu_online_mask);
+	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
+		affinity cpu_online_mask;
 		ret = true;
 	}
 
-	pr_debug("IRQ%u: moving from cpu%u to cpu%u\n", d->irq, d->node, cpu);
-
-	d->chip->irq_set_affinity(d, cpumask_of(cpu), true);
+	d->chip->irq_set_affinity(d, affinity, true);
 
 	return ret;
 }
-- 
1.7.4.4


* [PATCH 4/4] ARM: CPU hotplug: ensure we migrate all IRQs off a downed CPU
  2011-07-21 15:14 [PATCH 0/4] Fix CPU hotplug IRQ migration Russell King - ARM Linux
                   ` (2 preceding siblings ...)
  2011-07-21 15:25 ` [PATCH 3/4] ARM: CPU hotplug: pass in proper affinity mask on IRQ migration Russell King - ARM Linux
@ 2011-07-21 15:25 ` Russell King - ARM Linux
  2011-07-22  5:35 ` [PATCH 0/4] Fix CPU hotplug IRQ migration Santosh Shilimkar
  2011-07-22  8:50 ` Will Deacon
  5 siblings, 0 replies; 17+ messages in thread
From: Russell King - ARM Linux @ 2011-07-21 15:25 UTC (permalink / raw)
  To: linux-arm-kernel

Our selection of interrupts to consider for IRQ migration is sub-
standard.  We were potentially including per-CPU interrupts in our
migration strategy, but omitting chained interrupts.  This caused
some interrupts to remain on a downed CPU.

We were also trying to migrate interrupts which were not migratable,
resulting in an OOPS.

Instead, iterate over all interrupts, skipping per-CPU interrupts
or interrupts whose affinity does not include the downed CPU, and
attempt to set the affinity for everything else whose chip implements
irq_set_affinity().

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/arm/kernel/irq.c |   39 ++++++++++++++++++++++++++++-----------
 1 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index ab63c05..0f928a1 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -131,46 +131,63 @@ int __init arch_probe_nr_irqs(void)
 
 #ifdef CONFIG_HOTPLUG_CPU
 
-static bool migrate_one_irq(struct irq_data *d)
+static bool migrate_one_irq(struct irq_desc *desc)
 {
+	struct irq_data *d = irq_desc_get_irq_data(desc);
 	const struct cpumask *affinity = d->affinity;
+	struct irq_chip *c;
 	bool ret = false;
 
+	/*
+	 * If this is a per-CPU interrupt, or the affinity does not
+	 * include this CPU, then we have nothing to do.
+	 */
+	if (irqd_is_per_cpu(d) || !cpumask_test_cpu(smp_processor_id(), affinity))
+		return false;
+
 	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
-		affinity cpu_online_mask;
+		affinity = cpu_online_mask;
 		ret = true;
 	}
 
-	d->chip->irq_set_affinity(d, affinity, true);
+	c = irq_data_get_irq_chip(d);
+	if (c->irq_set_affinity)
+		c->irq_set_affinity(d, affinity, true);
+	else
+		pr_debug("IRQ%u: unable to set affinity\n", d->irq);
 
 	return ret;
 }
 
 /*
- * The CPU has been marked offline.  Migrate IRQs off this CPU.  If
- * the affinity settings do not allow other CPUs, force them onto any
+ * The current CPU has been marked offline.  Migrate IRQs off this CPU.
+ * If the affinity settings do not allow other CPUs, force them onto any
  * available CPU.
+ *
+ * Note: we must iterate over all IRQs, whether they have an attached
+ * action structure or not, as we need to get chained interrupts too.
  */
 void migrate_irqs(void)
 {
-	unsigned int i, cpu = smp_processor_id();
+	unsigned int i;
 	struct irq_desc *desc;
 	unsigned long flags;
 
 	local_irq_save(flags);
 
 	for_each_irq_desc(i, desc) {
-		struct irq_data *d = &desc->irq_data;
 		bool affinity_broken = false;
 
+		if (!desc)
+			continue;
+
 		raw_spin_lock(&desc->lock);
-		if (desc->action != NULL &&
-		    cpumask_test_cpu(smp_processor_id(), d->affinity))
-			affinity_broken = migrate_one_irq(d);
+		affinity_broken = migrate_one_irq(desc);
 		raw_spin_unlock(&desc->lock);
 
 		if (affinity_broken && printk_ratelimit())
-			pr_warning("IRQ%u no longer affine to CPU%u\n", i, cpu);
+			pr_warning("IRQ%u no longer affine to CPU%u\n", i,
+				smp_processor_id());
 	}
 
 	local_irq_restore(flags);
-- 
1.7.4.4


* [PATCH 0/4] Fix CPU hotplug IRQ migration
  2011-07-21 15:14 [PATCH 0/4] Fix CPU hotplug IRQ migration Russell King - ARM Linux
                   ` (3 preceding siblings ...)
  2011-07-21 15:25 ` [PATCH 4/4] ARM: CPU hotplug: ensure we migrate all IRQs off a downed CPU Russell King - ARM Linux
@ 2011-07-22  5:35 ` Santosh Shilimkar
  2011-07-22  8:12   ` Russell King - ARM Linux
  2011-07-25 13:04   ` Santosh Shilimkar
  2011-07-22  8:50 ` Will Deacon
  5 siblings, 2 replies; 17+ messages in thread
From: Santosh Shilimkar @ 2011-07-22  5:35 UTC (permalink / raw)
  To: linux-arm-kernel

+ Colin,

On 7/21/2011 8:44 PM, Russell King - ARM Linux wrote:
> This series fixes a bunch of issues with IRQ migration:
>
> 1. Preventing affinity changes causing IRQs to be directed to off-line CPUs.
> 2. Preventing oopses with non-GIC interrupt controllers.
> 3. Preventing per-CPU IRQs being candidates for migration.
> 4. Removing the abuse of irqdesc's node member.
>
> This prevents crashes on OMAP4430 SDP when non-default IRQ affinity
> settings are used and a CPU which owns some IRQs is offlined.
>
Firstly thanks for fixing the IRQ migration and affinity issues
with hotplug code. Colin found the hotplug irq affinity issue
last week.

Colin quoted ..
----------------------------------
The irq affinity mask is not kept up to date by the ARM irq
code.  In particular, migrate_one_irq is not called when
the irq node does not match the cpu going offline, and
when it is called, it doesn't update the irq affinity
mask before calling the irq chip's set_affinity function.
This causes the use of the affinity mask in the mask
and unmask functions to be unreliable, possibly unmasking
an interrupt on a cpu that is offline.
----------------------------------

Need to check if this series addresses above issue as well.

> With this patch set applied, there is no reason core code can't handle
> CPU0 being offlined - on OMAP4, the only remaining issue is CPU0
> being woken from WFI and falsely passing the non-spurious wake-up test.
There is one, as commented in the other thread: the secure code runs
only on CPU0, so if you offline CPU0, none of the secure services are
available.

Regards
Santosh


* [PATCH 0/4] Fix CPU hotplug IRQ migration
  2011-07-22  5:35 ` [PATCH 0/4] Fix CPU hotplug IRQ migration Santosh Shilimkar
@ 2011-07-22  8:12   ` Russell King - ARM Linux
  2011-07-22  8:22     ` Santosh Shilimkar
  2011-07-22 17:14     ` Colin Cross
  2011-07-25 13:04   ` Santosh Shilimkar
  1 sibling, 2 replies; 17+ messages in thread
From: Russell King - ARM Linux @ 2011-07-22  8:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 22, 2011 at 11:05:01AM +0530, Santosh Shilimkar wrote:
> + Colin,
>
> On 7/21/2011 8:44 PM, Russell King - ARM Linux wrote:
>> This series fixes a bunch of issues with IRQ migration:
>>
>> 1. Preventing affinity changes causing IRQs to be directed to off-line CPUs.
>> 2. Preventing oopses with non-GIC interrupt controllers.
>> 3. Preventing per-CPU IRQs being candidates for migration.
>> 4. Removing the abuse of irqdesc's node member.
>>
>> This prevents crashes on OMAP4430 SDP when non-default IRQ affinity
>> settings are used and a CPU which owns some IRQs is offlined.
>>
> Firstly thanks for fixing the IRQ migration and affinity issues
> with hotplug code. Colin found the hotplug irq affinity issue
> last week.
>
> Colin quoted ..
> ----------------------------------
> The irq affinity mask is not kept up to date by the ARM irq
> code.  In particular, migrate_one_irq is not called when
> the irq node does not match the cpu going offline, and
> when it is called, it doesn't update the irq affinity
> mask before calling the irq chip's set_affinity function.
> This causes the use of the affinity mask in the mask
> and unmask functions to be unreliable, possibly unmasking
> an interrupt on a cpu that is offline.
> ----------------------------------

I don't have a copy of that message... where was it posted?

> Need to check if this series addresses above issue as well.

It does not, and it is wrong to change the affinity mask of the interrupt
due to a CPU going offline.  Think about it.

If you offline CPU0, which migrates all IRQs to CPU1.  You then update
the affinity mask to exclude CPU0.  Now you online CPU0, and offline CPU1.
You now get a pile of kernel messages about breaking the _apparent_ user's
affinity settings for the interrupts, because you have to move them off
CPU1 which is the only CPU they are now affine to.

The affinity mask is a policy setting from userspace.  It is not an
absolute setting which the kernel must keep updated.

And actually you _can't_ tell what CPU the interrupt is routed to from
the affinity mask - the affinity mask gives a hint as to which CPU_s_
(plural) are permitted to receive the interrupt.  The fact we choose -
at the moment - the first CPU to route stuff to from the masks is merely
an implementation detail which must not be relied upon.

So, Colin is wrong on the affinity mask issue.  The correct behaviour
is that the affinity at any point is the set of CPUs in the current
affinity mask _and_ the CPU online mask.  If that is an empty set, then
any online CPU will be chosen to handle the interrupt.

>> With this patch set applied, there is no reason core code can't handle
>> CPU0 being offlined - on OMAP4, the only remaining issue is CPU0
>> being woken from WFI and falsely passing the non-spurious wake-up test.
> There is one as commented in other thread. The Secure code runs only
> on CPU0 so if you offline the CPU0, all the secure services are not
> available.

Secure code is called on CPU1 too, so I don't think your statement is
accurate.  Eg:

	omap_modify_auxcoreboot0(0x0, 0x200)

	omap_read_auxcoreboot0()

both result in calling secure code on CPU1.


* [PATCH 0/4] Fix CPU hotplug IRQ migration
  2011-07-22  8:12   ` Russell King - ARM Linux
@ 2011-07-22  8:22     ` Santosh Shilimkar
  2011-07-22 17:14     ` Colin Cross
  1 sibling, 0 replies; 17+ messages in thread
From: Santosh Shilimkar @ 2011-07-22  8:22 UTC (permalink / raw)
  To: linux-arm-kernel

On 7/22/2011 1:42 PM, Russell King - ARM Linux wrote:
> On Fri, Jul 22, 2011 at 11:05:01AM +0530, Santosh Shilimkar wrote:

[...]

> Secure code is called on CPU1 too, so I don't think your statement is
> accurate.  Eg:
>
> 	omap_modify_auxcoreboot0(0x0, 0x200)
>
> 	omap_read_auxcoreboot0()
>
> both result in calling secure code on CPU1.
These are monitor-mode secure APIs and they can be executed
on any CPU.

I am talking about services like secure encryption/decryption,
secure copy, saving secure RAM, etc.

These services can't be executed on CPU1, hence the concern.
There is an entire secure middleware which runs in parallel with
Linux, and all its services are bound to CPU0 only.

Regards
Santosh


* [PATCH 0/4] Fix CPU hotplug IRQ migration
  2011-07-21 15:14 [PATCH 0/4] Fix CPU hotplug IRQ migration Russell King - ARM Linux
                   ` (4 preceding siblings ...)
  2011-07-22  5:35 ` [PATCH 0/4] Fix CPU hotplug IRQ migration Santosh Shilimkar
@ 2011-07-22  8:50 ` Will Deacon
  5 siblings, 0 replies; 17+ messages in thread
From: Will Deacon @ 2011-07-22  8:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

On Thu, Jul 21, 2011 at 04:14:13PM +0100, Russell King - ARM Linux wrote:
> This series fixes a bunch of issues with IRQ migration:
> 
> 1. Preventing affinity changes causing IRQs to be directed to off-line CPUs.
> 2. Preventing oopses with non-GIC interrupt controllers.
> 3. Preventing per-CPU IRQs being candidates for migration.
> 4. Removing the abuse of irqdesc's node member.
> 
> This prevents crashes on OMAP4430 SDP when non-default IRQ affinity
> settings are used and a CPU which owns some IRQs is offlined.
> 
> With this patch set applied, there is no reason core code can't handle
> CPU0 being offlined - on OMAP4, the only remaining issue is CPU0
> being woken from WFI and falsely passing the non-spurious wake-up test.

I have a related patch that allows the kernel to (in theory) boot on a CPU
other than CPU0. This obviously requires the platform code to be able to
deal with that, but it seems to work fine on my RealView-PBX board booting
on CPU1.

The use case I have for this is making the kexec syscall on a CPU with ID !=
0. This CPU will end up becoming the boot CPU in the kexec'd kernel so you
have to do a bit of tweaking to make sure the thread_info->cpu is correct in
order to boot.

I'll post after the merge window.

Will


* [PATCH 0/4] Fix CPU hotplug IRQ migration
  2011-07-22  8:12   ` Russell King - ARM Linux
  2011-07-22  8:22     ` Santosh Shilimkar
@ 2011-07-22 17:14     ` Colin Cross
  1 sibling, 0 replies; 17+ messages in thread
From: Colin Cross @ 2011-07-22 17:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 22, 2011 at 1:12 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Fri, Jul 22, 2011 at 11:05:01AM +0530, Santosh Shilimkar wrote:
>> + Colin,
>>
>> On 7/21/2011 8:44 PM, Russell King - ARM Linux wrote:
>>> This series fixes a bunch of issues with IRQ migration:
>>>
>>> 1. Preventing affinity changes causing IRQs to be directed to off-line CPUs.
>>> 2. Preventing oopses with non-GIC interrupt controllers.
>>> 3. Preventing per-CPU IRQs being candidates for migration.
>>> 4. Removing the abuse of irqdesc's node member.
>>>
>>> This prevents crashes on OMAP4430 SDP when non-default IRQ affinity
>>> settings are used and a CPU which owns some IRQs is offlined.
>>>
>> Firstly thanks for fixing the IRQ migration and affinity issues
>> with hotplug code. Colin found the hotplug irq affinity issue
>> last week.
>>
>> Colin quoted ..
>> ----------------------------------
>> The irq affinity mask is not kept up to date by the ARM irq
>> code.  In particular, migrate_one_irq is not called when
>> the irq node does not match the cpu going offline, and
>> when it is called, it doesn't update the irq affinity
>> mask before calling the irq chip's set_affinity function.
>> This causes the use of the affinity mask in the mask
>> and unmask functions to be unreliable, possibly unmasking
>> an interrupt on a cpu that is offline.
>> ----------------------------------
>
> I don't have a copy of that message... where was it posted?

It was sent directly to TI, as it only applied to a change that was
not in the mainline tree, where mask and unmask were using
irq_desc->affinity to decide which cpus to modify.

>> Need to check if this series addresses above issue as well.
>
> It does not, and it is wrong to change the affinity mask of the interrupt
> due to a CPU going offline.  Think about it.
>
> If you offline CPU0, which migrates all IRQs to CPU1.  You then update
> the affinity mask to exclude CPU0.  Now you online CPU0, and offline CPU1.
> You now get a pile of kernel messages about breaking the _apparent_ user's
> affinity settings for the interrupts, because you have to move them off
> CPU1 which is the only CPU they are now affine to.
>
> The affinity mask is a policy setting from userspace.  It is not an
> absolute setting which the kernel must keep updated.

OK.  It seems odd to me that chip->irq_set_affinity and
irq_desc->affinity can be different, but I guess it makes sense.

> And actually you _can't_ tell what CPU the interrupt is routed to from
> the affinity mask - the affinity mask gives a hint as to which CPU_s_
> (plural) are permitted to receive the interrupt.  The fact we choose -
> at the moment - the first CPU to route stuff to from the masks is merely
> an implementation detail which must not be relied upon.

This may be a problem for OMAP4.  The WAKEUPGEN module has an unmask
bit for each cpu.  In order for an interrupt to be handled by a cpu
that is powered down during idle, the unmask bit for that cpu has to
be set, so the unmask function for gic_arch_extn has to be able to
determine which cpu will handle the interrupt.  Today, that can only
be done by relying on the gic's implementation.

gic_set_affinity is also going to need to call a gic_arch_extn
function to mask the wakeup bits for the offline cpu.

> So, Colin is wrong on the affinity mask issue.  The correct behaviour
> is that the affinity at any point is the set of CPUs in the current
> affinity mask _and_ the CPU online mask.  If that is an empty set, then
> any online CPU will be chosen to handle the interrupt.
>
>>> With this patch set applied, there is no reason core code can't handle
>>> CPU0 being offlined - on OMAP4, the only remaining issue is CPU0
>>> being woken from WFI and falsely passing the non-spurious wake-up test.
>> There is one as commented in other thread. The Secure code runs only
>> on CPU0 so if you offline the CPU0, all the secure services are not
>> available.
>
> Secure code is called on CPU1 too, so I don't think your statement is
> accurate.  Eg:
>
> 	omap_modify_auxcoreboot0(0x0, 0x200)
>
> 	omap_read_auxcoreboot0()
>
> both result in calling secure code on CPU1.
>


* [PATCH 3/4] ARM: CPU hotplug: pass in proper affinity mask on IRQ migration
  2011-07-21 15:25 ` [PATCH 3/4] ARM: CPU hotplug: pass in proper affinity mask on IRQ migration Russell King - ARM Linux
@ 2011-07-25 12:38   ` Santosh Shilimkar
  2011-07-25 13:06     ` Russell King - ARM Linux
  0 siblings, 1 reply; 17+ messages in thread
From: Santosh Shilimkar @ 2011-07-25 12:38 UTC (permalink / raw)
  To: linux-arm-kernel

On 7/21/2011 8:55 PM, Russell King - ARM Linux wrote:
> Now that the GIC takes care of selecting a target interrupt from the
> affinity mask, we don't need all this complexity in the core code
> anymore.  Just detect when we need to break affinity.
>
> Signed-off-by: Russell King<rmk+kernel@arm.linux.org.uk>
> ---
>   arch/arm/kernel/irq.c |   10 ++++------
>   1 files changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
> index d7aa5c9..ab63c05 100644
> --- a/arch/arm/kernel/irq.c
> +++ b/arch/arm/kernel/irq.c
> @@ -133,17 +133,15 @@ int __init arch_probe_nr_irqs(void)
>
>   static bool migrate_one_irq(struct irq_data *d)
>   {
> -	unsigned int cpu = cpumask_any_and(d->affinity, cpu_online_mask);
> +	const struct cpumask *affinity = d->affinity;
>   	bool ret = false;
>
> -	if (cpu>= nr_cpu_ids) {
> -		cpu = cpumask_any(cpu_online_mask);
> +	if (cpumask_any_and(affinity, cpu_online_mask)>= nr_cpu_ids) {
> +		affinity cpu_online_mask;

I noticed, you missed '=' above and same is corrected in PATCH 4/4.
Should be fixed in this patch itself to avoid git-bisect breakage.

Regards
Santosh


* [PATCH 0/4] Fix CPU hotplug IRQ migration
  2011-07-22  5:35 ` [PATCH 0/4] Fix CPU hotplug IRQ migration Santosh Shilimkar
  2011-07-22  8:12   ` Russell King - ARM Linux
@ 2011-07-25 13:04   ` Santosh Shilimkar
  2011-07-25 13:23     ` Russell King - ARM Linux
  1 sibling, 1 reply; 17+ messages in thread
From: Santosh Shilimkar @ 2011-07-25 13:04 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,
On 7/22/2011 11:05 AM, Santosh Shilimkar wrote:
> + Colin,
>
> On 7/21/2011 8:44 PM, Russell King - ARM Linux wrote:
>> This series fixes a bunch of issues with IRQ migration:
>>
>> 1. Preventing affinity changes causing IRQs to be directed to off-line
>> CPUs.
>> 2. Preventing oopses with non-GIC interrupt controllers.
>> 3. Preventing per-CPU IRQs being candidates for migration.
>> 4. Removing the abuse of irqdesc's node member.
>>
>> This prevents crashes on OMAP4430 SDP when non-default IRQ affinity
>> settings are used and a CPU which owns some IRQs is offlined.
>>
> Firstly thanks for fixing the IRQ migration and affinity issues
> with hotplug code. Colin found the hotplug irq affinity issue
> last week.
>
I have tested and retested this series on OMAP4, though I made one
interesting observation while testing.

# cat /proc/irq/44/smp_affinity
3
# echo 2 > /proc/irq/44/smp_affinity
# cat /proc/irq/44/smp_affinity
2

# cat /proc/interrupts
            CPU0       CPU1
  44:       5179         38       GIC  DMA

# echo 0 > /sys/devices/system/cpu/cpu1/online
[  137.663208] CPU1: shutdown
#
# cat /proc/interrupts
            CPU0
  44:       5241       GIC  DMA

After CPU1 offline, the IRQ is handled by CPU0
even though the mask says it's CPU0. Mask should have
been corrected by hotplug code but as per your comments
this is just userspace settings and shouldn't dictate
which CPU handles the IRQ.

The interesting part is once you online CPU now,
IRQ44 continues to be on CPU0.

Do you think the above behaviour is fine? On my i686 Ubuntu box
I don't see the above behaviour; the hotplug code updates
the IRQ mask to an available CPU.

Secondly, the GIC smp_set_affinity is in effect a set-target-CPU
function. How can we relay the target CPU information to
gic_arch_extn, so that it can update its settings to match
the GIC?

And lastly, we need to ensure that migrate_irqs() is relayed
to gic_arch_extn, so that all the interrupts on the dying CPU
can be masked.

Apart from one comment in patch 3/4, and above observations,
this series looks good to me.

If you need one, you can add my
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>

Regards
Santosh


* [PATCH 3/4] ARM: CPU hotplug: pass in proper affinity mask on IRQ migration
  2011-07-25 12:38   ` Santosh Shilimkar
@ 2011-07-25 13:06     ` Russell King - ARM Linux
  0 siblings, 0 replies; 17+ messages in thread
From: Russell King - ARM Linux @ 2011-07-25 13:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 25, 2011 at 06:08:59PM +0530, Santosh Shilimkar wrote:
> On 7/21/2011 8:55 PM, Russell King - ARM Linux wrote:
>> +	if (cpumask_any_and(affinity, cpu_online_mask)>= nr_cpu_ids) {
>> +		affinity cpu_online_mask;
>
> I noticed, you missed '=' above and same is corrected in PATCH 4/4.
> Should be fixed in this patch itself to avoid git-bisect breakage.

Unfortunately, too late...


* [PATCH 0/4] Fix CPU hotplug IRQ migration
  2011-07-25 13:04   ` Santosh Shilimkar
@ 2011-07-25 13:23     ` Russell King - ARM Linux
  2011-07-25 14:27       ` Santosh Shilimkar
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King - ARM Linux @ 2011-07-25 13:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 25, 2011 at 06:34:07PM +0530, Santosh Shilimkar wrote:
> After CPU1 offline, the IRQ is handled by CPU0
>> even though the mask says it's CPU0. Mask should have

ITYM CPU1 here.

> been corrected by hotplug code but as per your comments
> this is just userspace settings and shouldn't dictate
> which CPU handles the IRQ.
>
> The interesting part is once you online CPU now,
> IRQ44 continues to be on CPU0.

Yes, we don't migrate IRQs back onto their affine CPUs when they come
back online - I don't think anyone does.

> Do you think the above behaviour is fine? On my i686 Ubuntu box
> I don't see the above behaviour; the hotplug code updates
> the IRQ mask to an available CPU.

Yes, I noticed that, and the code is really weird - and I think buggy -
there for two reasons:

1. If you bind an interrupt to CPUs 1, and you take CPU 1 offline,
   the result is that the interrupt now is bound to _all_ CPUs even
   when CPU 1 comes back online.

2. irq_set_affinity() returns a code to indicate whether the affinity
   mask is to be updated (see __irq_set_affinity_locked()) - IRQ_SET_MASK_OK
   or IRQ_SET_MASK_OK_NOCOPY defines whether irq_data->affinity
   is updated.

To have the individual irq_set_affinity() directly set irq_data->affinity
like x86 does seems to go against the intentions of
__irq_set_affinity_locked().

> Secondly, the GIC smp_set_affinity is in effect a set-target-CPU function.

No, the gic_set_affinity() method has always been a "set this interrupt
to be routed to one of these CPUs" operation, except when our IRQ
migration code used to call it with a specific CPU.  That's an
exceptional (and incorrect) case though.

Why?  The interface is that the affinity mask contains a set of CPUs,
and that's how generic code will call it when you write to the
/proc/irq/*/smp_affinity files.  If you have a 4-CPU system, and you
echo '3' into one of those files, gic_set_affinity() will be called
with a mask containing CPU0,1 and not 2,3.

The fact that gic_set_affinity() has always chosen the first CPU in the
mask is an implementation detail, one which should stay private to
gic_set_affinity().

> How can we relay the target CPU information to gic_arch_extn, so that
> it can update its settings to match the GIC?

That was my point to Colin - neither the set_affinity() interface nor
irq_data->affinity is really suitable for doing that.

One way we _could_ do it is have the GIC code recompose a CPU mask
containing exactly one CPU - but if we ever end up with NR_CPUS > 32
the CPU mask idea may become unmanageable - it may be better to pass
the exact CPU number down.  It depends how it's going to be used, and
so far I can see no examples of that.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/4] Fix CPU hotplug IRQ migration
  2011-07-25 13:23     ` Russell King - ARM Linux
@ 2011-07-25 14:27       ` Santosh Shilimkar
  2011-07-25 14:46         ` Russell King - ARM Linux
  0 siblings, 1 reply; 17+ messages in thread
From: Santosh Shilimkar @ 2011-07-25 14:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 7/25/2011 6:53 PM, Russell King - ARM Linux wrote:
> On Mon, Jul 25, 2011 at 06:34:07PM +0530, Santosh Shilimkar wrote:
>> After CPU1 offline, the IRQ is handled by CPU0
>> even though the mask says it's CPU0. The mask should have
>
> ITYM CPU1 here.
>
right.

>> been corrected by hotplug code but as per your comments
>> this is just userspace settings and shouldn't dictate
>> which CPU handles the IRQ.
>>
>> The interesting part is that once you bring the CPU back online,
>> IRQ44 continues to be on CPU0.
>
> Yes, we don't migrate IRQs back onto their affine CPUs when they come
> back online - I don't think anyone does.
>
>> Do you think the above behaviour is fine? On my i686 Ubuntu box,
>> I don't see the above behaviour. The hotplug code updates
>> the IRQ mask to an available CPU.
>
> Yes, I noticed that, and the code is really weird - and I think buggy -
> there for two reasons:
>
> 1. If you bind an interrupt to CPU 1, and you take CPU 1 offline,
>     the result is that the interrupt is now bound to _all_ CPUs even
>     when CPU 1 comes back online.
>
> 2. irq_set_affinity() returns a code to indicate whether the affinity
>     mask is to be updated (see __irq_set_affinity_locked - IRQ_SET_MASK_OK
>     or IRQ_SET_MASK_OK_NOCOPY), which defines whether irq_data->affinity
>     is updated.
>
> To have the individual irq_set_affinity() directly set irq_data->affinity
> like x86 does seems to go against the intentions of
> __irq_set_affinity_locked().
>
ok.

>> Secondly, GIC smp_set_affinity is a kind of set_target_cpu function.
>
> No, the gic_set_affinity() has always been a "set this interrupt to be
> routed to one of these CPUs", except when our IRQ migration code used
> to call it with a specific CPU.  That's an exceptional (and incorrect)
> case though.
>
> Why?  The interface is that the affinity mask contains a set of CPUs,
> and that's how generic code will call it when you write to the
> /proc/irq/*/smp_affinity files.  If you have a 4-CPU system, and you
> echo '3' into one of those files, gic_set_affinity() will be called
> with a mask containing CPU0,1 and not 2,3.
>
> The fact that gic_set_affinity() has always chosen the first CPU in the
> mask is an implementation detail, one which should stay private to
> gic_set_affinity().
>
This is probably the contention point, considering 'gic_arch_extn'
exists and can have per-CPU control. But as long as we ensure the GIC
and 'gic_arch_extn' have the same implementation, it should be fine.

>> How can we relay the target CPU information to gic_arch_extn, so that
>> they can update their settings as per GIC.
>
> That was my point to Colin - neither the set_affinity() interface nor
> irq_data->affinity is really suitable for doing that.
>
Good. We are aligned here.

> One way we _could_ do it is have the GIC code recompose a CPU mask
> containing exactly one CPU - but if we ever end up with NR_CPUS > 32
> the CPU mask idea may become unmanageable - it may be better to pass
> the exact CPU number down.  It depends how it's going to be used, and
> so far I can see no examples of that.
I will make the OMAP4 IRQ code available for this particular aspect and
maybe we can take it from there. There are two main problems I am
seeing currently.

1) Which CPU do gic_arch_extn_[mask/unmask] operate on?
Can we assume that whichever CPU the call is being executed
on is the target CPU? This might not be a safe assumption
though. For the extn, there is no other way to know the target
CPU for mask/unmask of an IRQ

2) Relaying the IRQ migration information to gic_arch_extns

Thanks for good discussion.

Regards
Santosh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/4] Fix CPU hotplug IRQ migration
  2011-07-25 14:27       ` Santosh Shilimkar
@ 2011-07-25 14:46         ` Russell King - ARM Linux
  2011-07-25 15:03           ` Santosh Shilimkar
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King - ARM Linux @ 2011-07-25 14:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 25, 2011 at 07:57:36PM +0530, Santosh Shilimkar wrote:
> I will make the OMAP4 IRQ code available for this particular aspect and
> maybe we can take it from there. There are two main problems I am
> seeing currently.
>
> 1) Which CPU do gic_arch_extn_[mask/unmask] operate on?
> Can we assume that whichever CPU the call is being executed
> on is the target CPU? This might not be a safe assumption
> though. For the extn, there is no other way to know the target
> CPU for mask/unmask of an IRQ

This goes back to the mistake that was made with the initial set of
GIC PPI patches.

You can't assume anything about which CPU the mask/unmask functions end
up running on.  Between code calling request_irq() and the relevant
unmask function being called, we can end up scheduling, either because
of a mutex being waited for or because of a preemption event.  That
means we can end up on a different CPU to that which we started out
on.

So, the only thing you can do is remember the CPU number which it was
last associated with - a simple 'unsigned int' should do in the
extensions data structures, one per IRQ.

However, my point is that doing this:

	gic_arch_extn.irq_set_affinity(d, cpumask_of(cpu), force);

and then:

	cpu = cpumask_first(affinity);

in the extensions code is a very long winded way of communicating a
single CPU number down into the IRQ extension code.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/4] Fix CPU hotplug IRQ migration
  2011-07-25 14:46         ` Russell King - ARM Linux
@ 2011-07-25 15:03           ` Santosh Shilimkar
  0 siblings, 0 replies; 17+ messages in thread
From: Santosh Shilimkar @ 2011-07-25 15:03 UTC (permalink / raw)
  To: linux-arm-kernel

On 7/25/2011 8:16 PM, Russell King - ARM Linux wrote:
> On Mon, Jul 25, 2011 at 07:57:36PM +0530, Santosh Shilimkar wrote:
>> I will make the OMAP4 IRQ code available for this particular aspect and
>> maybe we can take it from there. There are two main problems I am
>> seeing currently.
>>
>> 1) Which CPU do gic_arch_extn_[mask/unmask] operate on?
>> Can we assume that whichever CPU the call is being executed
>> on is the target CPU? This might not be a safe assumption
>> though. For the extn, there is no other way to know the target
>> CPU for mask/unmask of an IRQ
>
> This goes back to the mistake that was made with the initial set of
> GIC PPI patches.
>
> You can't assume anything about which CPU the mask/unmask functions end
> up running on.  Between code calling request_irq() and the relevant
> unmask function being called, we can end up scheduling, either because
> of a mutex being waited for or because of a preemption event.  That
> means we can end up on a different CPU to that which we started out
> on.
>
Thanks for confirming the same.

> So, the only thing you can do is remember the CPU number which it was
> last associated with - a simple 'unsigned int' should do in the
> extensions data structures, one per IRQ.
>
How do you build the per-IRQ CPU association information?
The point is the first mask/unmask call itself can happen from
a CPU which is not the target CPU for that IRQ.
We can do something like the GIC init does: mark all the IRQs
to the boot CPU, save this info and keep using it in mask/unmask.
This association can be changed by CPU hotplug or forced IRQ
affinity setting, and we take care of updating the stored
per-IRQ CPU association.

> However, my point is that doing this:
>
> 	gic_arch_extn.irq_set_affinity(d, cpumask_of(cpu), force);
>
> and then:
>
> 	cpu = cpumask_first(affinity);
>
> in the extensions code is a very long winded way of communicating a
> single CPU number down into the IRQ extension code.
Agree. As you suggested, having the GIC code recompose a CPU mask
containing exactly one CPU and relaying that information with
some custom function might be more efficient.

Regards
Santosh

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2011-07-25 15:03 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-21 15:14 [PATCH 0/4] Fix CPU hotplug IRQ migration Russell King - ARM Linux
2011-07-21 15:24 ` [PATCH 1/4] ARM: CPU hotplug: fix abuse of irqdesc->node Russell King - ARM Linux
2011-07-21 15:24 ` [PATCH 2/4] ARM: GIC: avoid routing interrupts to offline CPUs Russell King - ARM Linux
2011-07-21 15:25 ` [PATCH 3/4] ARM: CPU hotplug: pass in proper affinity mask on IRQ migration Russell King - ARM Linux
2011-07-25 12:38   ` Santosh Shilimkar
2011-07-25 13:06     ` Russell King - ARM Linux
2011-07-21 15:25 ` [PATCH 4/4] ARM: CPU hotplug: ensure we migrate all IRQs off a downed CPU Russell King - ARM Linux
2011-07-22  5:35 ` [PATCH 0/4] Fix CPU hotplug IRQ migration Santosh Shilimkar
2011-07-22  8:12   ` Russell King - ARM Linux
2011-07-22  8:22     ` Santosh Shilimkar
2011-07-22 17:14     ` Colin Cross
2011-07-25 13:04   ` Santosh Shilimkar
2011-07-25 13:23     ` Russell King - ARM Linux
2011-07-25 14:27       ` Santosh Shilimkar
2011-07-25 14:46         ` Russell King - ARM Linux
2011-07-25 15:03           ` Santosh Shilimkar
2011-07-22  8:50 ` Will Deacon
