All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity
@ 2014-05-09 16:40 Sudeep Holla
  2014-05-23 11:26 ` Sudeep Holla
  2014-05-23 12:10 ` Russell King - ARM Linux
  0 siblings, 2 replies; 12+ messages in thread
From: Sudeep Holla @ 2014-05-09 16:40 UTC (permalink / raw)
  To: linux-arm-kernel

From: Sudeep Holla <sudeep.holla@arm.com>

Commit 01f8fa4f01d8("genirq: Allow forcing cpu affinity of interrupts")
enabled the forced irq_set_affinity which previously refused to route an
interrupt to an offline cpu.

Commit ffde1de64012("irqchip: Gic: Support forced affinity setting")
implements this force logic and disables the cpu online check for GIC
interrupt controller.

When __cpu_disable calls migrate_irqs, it disables the current cpu in
cpu_online_mask and uses forced irq_set_affinity to migrate the IRQs
away from the cpu but passes affinity mask with the cpu being offlined
also included in it.

When calling irq_set_affinity with force == true in a cpu hotplug path,
the caller must ensure that the cpu being offlined is not present in the
affinity mask or it may be selected as the target CPU, leading to the
interrupt not being migrated.

This patch uses cpu_online_mask when using forced irq_set_affinity so
that the IRQs are properly migrated away.

Tested on TC2 hotpluging CPU0 in and out. Without this patch the system
locks up as the IRQs are not migrated away from CPU0.

Cc: Russell King <linux@arm.linux.org.uk> 
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
---
 arch/arm/kernel/irq.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index 9723d17..1f216a5 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -155,11 +155,15 @@ static bool migrate_one_irq(struct irq_desc *desc)
 	if (irqd_is_per_cpu(d) || !cpumask_test_cpu(smp_processor_id(), affinity))
 		return false;
 
-	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
-		affinity = cpu_online_mask;
+	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids)
 		ret = true;
-	}
 
+	/*
+	 * when using forced irq_set_affinity we must ensure that the cpu
+	 * being offlined is not present in the affinity mask, it may be
+	 * selected as the target CPU otherwise
+	 */
+	affinity = cpu_online_mask;
 	c = irq_data_get_irq_chip(d);
 	if (!c->irq_set_affinity)
 		pr_debug("IRQ%u: unable to set affinity\n", d->irq);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity
  2014-05-09 16:40 [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity Sudeep Holla
@ 2014-05-23 11:26 ` Sudeep Holla
  2014-05-23 12:10 ` Russell King - ARM Linux
  1 sibling, 0 replies; 12+ messages in thread
From: Sudeep Holla @ 2014-05-23 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

On 09/05/14 17:40, Sudeep Holla wrote:
> From: Sudeep Holla <sudeep.holla@arm.com>
>
> Commit 01f8fa4f01d8("genirq: Allow forcing cpu affinity of interrupts")
> enabled the forced irq_set_affinity which previously refused to route an
> interrupt to an offline cpu.
>
> Commit ffde1de64012("irqchip: Gic: Support forced affinity setting")
> implements this force logic and disables the cpu online check for GIC
> interrupt controller.
>
> When __cpu_disable calls migrate_irqs, it disables the current cpu in
> cpu_online_mask and uses forced irq_set_affinity to migrate the IRQs
> away from the cpu but passes affinity mask with the cpu being offlined
> also included in it.
>
> When calling irq_set_affinity with force == true in a cpu hotplug path,
> the caller must ensure that the cpu being offlined is not present in the
> affinity mask or it may be selected as the target CPU, leading to the
> interrupt not being migrated.
>
> This patch uses cpu_online_mask when using forced irq_set_affinity so
> that the IRQs are properly migrated away.
>
> Tested on TC2 hotpluging CPU0 in and out. Without this patch the system
> locks up as the IRQs are not migrated away from CPU0.
>

If you are fine with this change, can I put this in the patch tracker?

Regards,
Sudeep

> Cc: Russell King <linux@arm.linux.org.uk>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> ---
>   arch/arm/kernel/irq.c | 10 +++++++---
>   1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
> index 9723d17..1f216a5 100644
> --- a/arch/arm/kernel/irq.c
> +++ b/arch/arm/kernel/irq.c
> @@ -155,11 +155,15 @@ static bool migrate_one_irq(struct irq_desc *desc)
>   	if (irqd_is_per_cpu(d) || !cpumask_test_cpu(smp_processor_id(), affinity))
>   		return false;
>
> -	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
> -		affinity = cpu_online_mask;
> +	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids)
>   		ret = true;
> -	}
>
> +	/*
> +	 * when using forced irq_set_affinity we must ensure that the cpu
> +	 * being offlined is not present in the affinity mask, it may be
> +	 * selected as the target CPU otherwise
> +	 */
> +	affinity = cpu_online_mask;
>   	c = irq_data_get_irq_chip(d);
>   	if (!c->irq_set_affinity)
>   		pr_debug("IRQ%u: unable to set affinity\n", d->irq);
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity
  2014-05-09 16:40 [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity Sudeep Holla
  2014-05-23 11:26 ` Sudeep Holla
@ 2014-05-23 12:10 ` Russell King - ARM Linux
  2014-05-23 12:51   ` Sudeep Holla
  1 sibling, 1 reply; 12+ messages in thread
From: Russell King - ARM Linux @ 2014-05-23 12:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, May 09, 2014 at 05:40:40PM +0100, Sudeep Holla wrote:
> From: Sudeep Holla <sudeep.holla@arm.com>
> 
> Commit 01f8fa4f01d8("genirq: Allow forcing cpu affinity of interrupts")
> enabled the forced irq_set_affinity which previously refused to route an
> interrupt to an offline cpu.
> 
> Commit ffde1de64012("irqchip: Gic: Support forced affinity setting")
> implements this force logic and disables the cpu online check for GIC
> interrupt controller.
> 
> When __cpu_disable calls migrate_irqs, it disables the current cpu in
> cpu_online_mask and uses forced irq_set_affinity to migrate the IRQs
> away from the cpu but passes affinity mask with the cpu being offlined
> also included in it.
> 
> When calling irq_set_affinity with force == true in a cpu hotplug path,
> the caller must ensure that the cpu being offlined is not present in the
> affinity mask or it may be selected as the target CPU, leading to the
> interrupt not being migrated.
> 
> This patch uses cpu_online_mask when using forced irq_set_affinity so
> that the IRQs are properly migrated away.
> 
> Tested on TC2 hotpluging CPU0 in and out. Without this patch the system
> locks up as the IRQs are not migrated away from CPU0.

You don't explain /how/ this happens, and I'm not convinced that you've
properly diagnosed this bug.

> @@ -155,11 +155,15 @@ static bool migrate_one_irq(struct irq_desc *desc)
>  	if (irqd_is_per_cpu(d) || !cpumask_test_cpu(smp_processor_id(), affinity))
>  		return false;
>  
> -	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
> -		affinity = cpu_online_mask;
> +	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids)
>  		ret = true;
> -	}

The idea here with the original code is:

- if the current CPU (which is the one being offlined) is not in the
  affinity mask, do nothing.
- if "affinity & cpu_online_mask" indicates that there's no CPUs in the
  new set (cpu_online_mask must have been updated to indicate that the
  current CPU is offline) then re-set the affinity mask and report that
  we forced a change.
- otherwise, re-set the existing affinity (which will force the IRQ
  controller to re-evaluate it's routing given the affinity and online
  CPUs.)

This code is correct.  In fact, changing it as you have, you /always/
reset the affinity mask whether or not the CPU being offlined is the
last CPU in the affinity set.

If you are finding that CPU0 is left with interrupts afterwards, the
bug lies elsewhere - probably in the IRQ controller code.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity
  2014-05-23 12:10 ` Russell King - ARM Linux
@ 2014-05-23 12:51   ` Sudeep Holla
  2014-06-20 13:04     ` Sudeep Holla
  0 siblings, 1 reply; 12+ messages in thread
From: Sudeep Holla @ 2014-05-23 12:51 UTC (permalink / raw)
  To: linux-arm-kernel



On 23/05/14 13:10, Russell King - ARM Linux wrote:
> On Fri, May 09, 2014 at 05:40:40PM +0100, Sudeep Holla wrote:
>> From: Sudeep Holla <sudeep.holla@arm.com>
>>
>> Commit 01f8fa4f01d8("genirq: Allow forcing cpu affinity of interrupts")
>> enabled the forced irq_set_affinity which previously refused to route an
>> interrupt to an offline cpu.
>>
>> Commit ffde1de64012("irqchip: Gic: Support forced affinity setting")
>> implements this force logic and disables the cpu online check for GIC
>> interrupt controller.
>>
>> When __cpu_disable calls migrate_irqs, it disables the current cpu in
>> cpu_online_mask and uses forced irq_set_affinity to migrate the IRQs
>> away from the cpu but passes affinity mask with the cpu being offlined
>> also included in it.
>>
>> When calling irq_set_affinity with force == true in a cpu hotplug path,
>> the caller must ensure that the cpu being offlined is not present in the
>> affinity mask or it may be selected as the target CPU, leading to the
>> interrupt not being migrated.
>>
>> This patch uses cpu_online_mask when using forced irq_set_affinity so
>> that the IRQs are properly migrated away.
>>
>> Tested on TC2 hotpluging CPU0 in and out. Without this patch the system
>> locks up as the IRQs are not migrated away from CPU0.
>
> You don't explain /how/ this happens, and I'm not convinced that you've
> properly diagnosed this bug.
>

Sorry for not being elaborate enough.
- On boot by default all the irqs have cpu_online_mask as affinity
- Now if CPU0 is being hotplugged out, CPU0 is removed from cpu_online_mask
   and migrate_irqs is called
- In migrate_one_irq, when affinity is read from the irq_desc, it still contains
   CPU0 which is expected.
- irq_set_affinity is called with affinity with CPU0 set and force = true,
   which chooses CPU0 resulting in not migrating the IRQ.

>> @@ -155,11 +155,15 @@ static bool migrate_one_irq(struct irq_desc *desc)
>>   	if (irqd_is_per_cpu(d) || !cpumask_test_cpu(smp_processor_id(), affinity))
>>   		return false;
>>
>> -	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
>> -		affinity = cpu_online_mask;
>> +	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids)
>>   		ret = true;
>> -	}
>
> The idea here with the original code is:
>
> - if the current CPU (which is the one being offlined) is not in the
>    affinity mask, do nothing.
> - if "affinity & cpu_online_mask" indicates that there's no CPUs in the
>    new set (cpu_online_mask must have been updated to indicate that the
>    current CPU is offline) then re-set the affinity mask and report that
>    we forced a change.
> - otherwise, re-set the existing affinity (which will force the IRQ
>    controller to re-evaluate it's routing given the affinity and online
>    CPUs.)
>

I completely understand the above idea, except that the new feature added
to allow forced affinity setting(as mentioned in the commit log by 2 commits),
changes the behaviour of last step.

IRQ controller now re-evaluates it's routing based on the given affinity alone
and doesn't consider online CPUs when force = true is set. This will result in
the CPU being offlined chosen as the target if it happens to be the first in the
affinity mask.

> This code is correct.  In fact, changing it as you have, you /always/
> reset the affinity mask whether or not the CPU being offlined is the
> last CPU in the affinity set.
>
> If you are finding that CPU0 is left with interrupts afterwards, the
> bug lies elsewhere - probably in the IRQ controller code.
>

Since the IRQ controller code is changed to provide that feature, either
- we have to choose not to use forced option, or
- we need to make sure we pass valid affinity mask with force = true option

I chose latter in this patch. Let me know your opinion.

Regards,
Sudeep

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity
  2014-05-23 12:51   ` Sudeep Holla
@ 2014-06-20 13:04     ` Sudeep Holla
  2014-06-20 15:16       ` Russell King - ARM Linux
  0 siblings, 1 reply; 12+ messages in thread
From: Sudeep Holla @ 2014-06-20 13:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

On 23/05/14 13:51, Sudeep Holla wrote:
>
>
> On 23/05/14 13:10, Russell King - ARM Linux wrote:
>> On Fri, May 09, 2014 at 05:40:40PM +0100, Sudeep Holla wrote:
>>> From: Sudeep Holla <sudeep.holla@arm.com>
>>>
>>> Commit 01f8fa4f01d8("genirq: Allow forcing cpu affinity of interrupts")
>>> enabled the forced irq_set_affinity which previously refused to route an
>>> interrupt to an offline cpu.
>>>
>>> Commit ffde1de64012("irqchip: Gic: Support forced affinity setting")
>>> implements this force logic and disables the cpu online check for GIC
>>> interrupt controller.
>>>
>>> When __cpu_disable calls migrate_irqs, it disables the current cpu in
>>> cpu_online_mask and uses forced irq_set_affinity to migrate the IRQs
>>> away from the cpu but passes affinity mask with the cpu being offlined
>>> also included in it.
>>>
>>> When calling irq_set_affinity with force == true in a cpu hotplug path,
>>> the caller must ensure that the cpu being offlined is not present in the
>>> affinity mask or it may be selected as the target CPU, leading to the
>>> interrupt not being migrated.
>>>
>>> This patch uses cpu_online_mask when using forced irq_set_affinity so
>>> that the IRQs are properly migrated away.
>>>
>>> Tested on TC2 hotpluging CPU0 in and out. Without this patch the system
>>> locks up as the IRQs are not migrated away from CPU0.
>>
>> You don't explain /how/ this happens, and I'm not convinced that you've
>> properly diagnosed this bug.
>>
>
> Sorry for not being elaborate enough.
> - On boot by default all the irqs have cpu_online_mask as affinity
> - Now if CPU0 is being hotplugged out, CPU0 is removed from cpu_online_mask
>     and migrate_irqs is called
> - In migrate_one_irq, when affinity is read from the irq_desc, it still contains
>     CPU0 which is expected.
> - irq_set_affinity is called with affinity with CPU0 set and force = true,
>     which chooses CPU0 resulting in not migrating the IRQ.
>
>>> @@ -155,11 +155,15 @@ static bool migrate_one_irq(struct irq_desc *desc)
>>>    	if (irqd_is_per_cpu(d) || !cpumask_test_cpu(smp_processor_id(), affinity))
>>>    		return false;
>>>
>>> -	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
>>> -		affinity = cpu_online_mask;
>>> +	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids)
>>>    		ret = true;
>>> -	}
>>
>> The idea here with the original code is:
>>
>> - if the current CPU (which is the one being offlined) is not in the
>>     affinity mask, do nothing.
>> - if "affinity & cpu_online_mask" indicates that there's no CPUs in the
>>     new set (cpu_online_mask must have been updated to indicate that the
>>     current CPU is offline) then re-set the affinity mask and report that
>>     we forced a change.
>> - otherwise, re-set the existing affinity (which will force the IRQ
>>     controller to re-evaluate it's routing given the affinity and online
>>     CPUs.)
>>
>
> I completely understand the above idea, except that the new feature added
> to allow forced affinity setting(as mentioned in the commit log by 2 commits),
> changes the behaviour of last step.
>
> IRQ controller now re-evaluates it's routing based on the given affinity alone
> and doesn't consider online CPUs when force = true is set. This will result in
> the CPU being offlined chosen as the target if it happens to be the first in the
> affinity mask.
>
>> This code is correct.  In fact, changing it as you have, you /always/
>> reset the affinity mask whether or not the CPU being offlined is the
>> last CPU in the affinity set.
>>
>> If you are finding that CPU0 is left with interrupts afterwards, the
>> bug lies elsewhere - probably in the IRQ controller code.
>>
>
> Since the IRQ controller code is changed to provide that feature, either
> - we have to choose not to use forced option, or
> - we need to make sure we pass valid affinity mask with force = true option
>
> I chose latter in this patch. Let me know your opinion.
>

Any suggestions on this ? Since commit 01f8fa4f01d8 and ffde1de64012 are now in
stable releases, CPU0 hotplug is broken there now.

Regards,
Sudeep

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity
  2014-06-20 13:04     ` Sudeep Holla
@ 2014-06-20 15:16       ` Russell King - ARM Linux
  2014-07-17  9:58         ` Sudeep Holla
  0 siblings, 1 reply; 12+ messages in thread
From: Russell King - ARM Linux @ 2014-06-20 15:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 20, 2014 at 02:04:52PM +0100, Sudeep Holla wrote:
> Hi Russell,
>
> On 23/05/14 13:51, Sudeep Holla wrote:
>>
>>
>> On 23/05/14 13:10, Russell King - ARM Linux wrote:
>>> On Fri, May 09, 2014 at 05:40:40PM +0100, Sudeep Holla wrote:
>>>> From: Sudeep Holla <sudeep.holla@arm.com>
>>>>
>>>> Commit 01f8fa4f01d8("genirq: Allow forcing cpu affinity of interrupts")
>>>> enabled the forced irq_set_affinity which previously refused to route an
>>>> interrupt to an offline cpu.
>>>>
>>>> Commit ffde1de64012("irqchip: Gic: Support forced affinity setting")
>>>> implements this force logic and disables the cpu online check for GIC
>>>> interrupt controller.
>>>>
>>>> When __cpu_disable calls migrate_irqs, it disables the current cpu in
>>>> cpu_online_mask and uses forced irq_set_affinity to migrate the IRQs
>>>> away from the cpu but passes affinity mask with the cpu being offlined
>>>> also included in it.
>>>>
>>>> When calling irq_set_affinity with force == true in a cpu hotplug path,
>>>> the caller must ensure that the cpu being offlined is not present in the
>>>> affinity mask or it may be selected as the target CPU, leading to the
>>>> interrupt not being migrated.
>>>>
>>>> This patch uses cpu_online_mask when using forced irq_set_affinity so
>>>> that the IRQs are properly migrated away.
>>>>
>>>> Tested on TC2 hotpluging CPU0 in and out. Without this patch the system
>>>> locks up as the IRQs are not migrated away from CPU0.
>>>
>>> You don't explain /how/ this happens, and I'm not convinced that you've
>>> properly diagnosed this bug.
>>>
>>
>> Sorry for not being elaborate enough.
>> - On boot by default all the irqs have cpu_online_mask as affinity
>> - Now if CPU0 is being hotplugged out, CPU0 is removed from cpu_online_mask
>>     and migrate_irqs is called
>> - In migrate_one_irq, when affinity is read from the irq_desc, it still contains
>>     CPU0 which is expected.
>> - irq_set_affinity is called with affinity with CPU0 set and force = true,
>>     which chooses CPU0 resulting in not migrating the IRQ.
>>
>>>> @@ -155,11 +155,15 @@ static bool migrate_one_irq(struct irq_desc *desc)
>>>>    	if (irqd_is_per_cpu(d) || !cpumask_test_cpu(smp_processor_id(), affinity))
>>>>    		return false;
>>>>
>>>> -	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
>>>> -		affinity = cpu_online_mask;
>>>> +	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids)
>>>>    		ret = true;
>>>> -	}
>>>
>>> The idea here with the original code is:
>>>
>>> - if the current CPU (which is the one being offlined) is not in the
>>>     affinity mask, do nothing.
>>> - if "affinity & cpu_online_mask" indicates that there's no CPUs in the
>>>     new set (cpu_online_mask must have been updated to indicate that the
>>>     current CPU is offline) then re-set the affinity mask and report that
>>>     we forced a change.
>>> - otherwise, re-set the existing affinity (which will force the IRQ
>>>     controller to re-evaluate it's routing given the affinity and online
>>>     CPUs.)
>>>
>>
>> I completely understand the above idea, except that the new feature added
>> to allow forced affinity setting(as mentioned in the commit log by 2 commits),
>> changes the behaviour of last step.
>>
>> IRQ controller now re-evaluates it's routing based on the given affinity alone
>> and doesn't consider online CPUs when force = true is set. This will result in
>> the CPU being offlined chosen as the target if it happens to be the first in the
>> affinity mask.
>>
>>> This code is correct.  In fact, changing it as you have, you /always/
>>> reset the affinity mask whether or not the CPU being offlined is the
>>> last CPU in the affinity set.
>>>
>>> If you are finding that CPU0 is left with interrupts afterwards, the
>>> bug lies elsewhere - probably in the IRQ controller code.
>>>
>>
>> Since the IRQ controller code is changed to provide that feature, either
>> - we have to choose not to use forced option, or
>> - we need to make sure we pass valid affinity mask with force = true option
>>
>> I chose latter in this patch. Let me know your opinion.
>>
>
> Any suggestions on this ? Since commit 01f8fa4f01d8 and ffde1de64012 are now in
> stable releases, CPU0 hotplug is broken there now.

Maybe we should ask Thomas, as he's (a) the maintainer of the irqchip
stuff, and (b) the author of the patch causing the breakage.

>From what I can see looking at the x86 code, the work-around in
ffde1de64012 is wrong.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity
  2014-06-20 15:16       ` Russell King - ARM Linux
@ 2014-07-17  9:58         ` Sudeep Holla
  2014-08-25  9:55           ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Sudeep Holla @ 2014-07-17  9:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

On 20/06/14 16:16, Russell King - ARM Linux wrote:
> On Fri, Jun 20, 2014 at 02:04:52PM +0100, Sudeep Holla wrote:
>> Hi Russell,
>>
>> On 23/05/14 13:51, Sudeep Holla wrote:
>>>
>>>
>>> On 23/05/14 13:10, Russell King - ARM Linux wrote:
>>>> On Fri, May 09, 2014 at 05:40:40PM +0100, Sudeep Holla wrote:
>>>>> From: Sudeep Holla <sudeep.holla@arm.com>
>>>>>
>>>>> Commit 01f8fa4f01d8("genirq: Allow forcing cpu affinity of interrupts")
>>>>> enabled the forced irq_set_affinity which previously refused to route an
>>>>> interrupt to an offline cpu.
>>>>>
>>>>> Commit ffde1de64012("irqchip: Gic: Support forced affinity setting")
>>>>> implements this force logic and disables the cpu online check for GIC
>>>>> interrupt controller.
>>>>>
>>>>> When __cpu_disable calls migrate_irqs, it disables the current cpu in
>>>>> cpu_online_mask and uses forced irq_set_affinity to migrate the IRQs
>>>>> away from the cpu but passes affinity mask with the cpu being offlined
>>>>> also included in it.
>>>>>
>>>>> When calling irq_set_affinity with force == true in a cpu hotplug path,
>>>>> the caller must ensure that the cpu being offlined is not present in the
>>>>> affinity mask or it may be selected as the target CPU, leading to the
>>>>> interrupt not being migrated.
>>>>>
>>>>> This patch uses cpu_online_mask when using forced irq_set_affinity so
>>>>> that the IRQs are properly migrated away.
>>>>>
>>>>> Tested on TC2 hotpluging CPU0 in and out. Without this patch the system
>>>>> locks up as the IRQs are not migrated away from CPU0.
>>>>
>>>> You don't explain /how/ this happens, and I'm not convinced that you've
>>>> properly diagnosed this bug.
>>>>
>>>
>>> Sorry for not being elaborate enough.
>>> - On boot by default all the irqs have cpu_online_mask as affinity
>>> - Now if CPU0 is being hotplugged out, CPU0 is removed from cpu_online_mask
>>>      and migrate_irqs is called
>>> - In migrate_one_irq, when affinity is read from the irq_desc, it still contains
>>>      CPU0 which is expected.
>>> - irq_set_affinity is called with affinity with CPU0 set and force = true,
>>>      which chooses CPU0 resulting in not migrating the IRQ.
>>>
>>>>> @@ -155,11 +155,15 @@ static bool migrate_one_irq(struct irq_desc *desc)
>>>>>     	if (irqd_is_per_cpu(d) || !cpumask_test_cpu(smp_processor_id(), affinity))
>>>>>     		return false;
>>>>>
>>>>> -	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
>>>>> -		affinity = cpu_online_mask;
>>>>> +	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids)
>>>>>     		ret = true;
>>>>> -	}
>>>>
>>>> The idea here with the original code is:
>>>>
>>>> - if the current CPU (which is the one being offlined) is not in the
>>>>      affinity mask, do nothing.
>>>> - if "affinity & cpu_online_mask" indicates that there's no CPUs in the
>>>>      new set (cpu_online_mask must have been updated to indicate that the
>>>>      current CPU is offline) then re-set the affinity mask and report that
>>>>      we forced a change.
>>>> - otherwise, re-set the existing affinity (which will force the IRQ
>>>>      controller to re-evaluate it's routing given the affinity and online
>>>>      CPUs.)
>>>>
>>>
>>> I completely understand the above idea, except that the new feature added
>>> to allow forced affinity setting(as mentioned in the commit log by 2 commits),
>>> changes the behaviour of last step.
>>>
>>> IRQ controller now re-evaluates it's routing based on the given affinity alone
>>> and doesn't consider online CPUs when force = true is set. This will result in
>>> the CPU being offlined chosen as the target if it happens to be the first in the
>>> affinity mask.
>>>
>>>> This code is correct.  In fact, changing it as you have, you /always/
>>>> reset the affinity mask whether or not the CPU being offlined is the
>>>> last CPU in the affinity set.
>>>>
>>>> If you are finding that CPU0 is left with interrupts afterwards, the
>>>> bug lies elsewhere - probably in the IRQ controller code.
>>>>
>>>
>>> Since the IRQ controller code is changed to provide that feature, either
>>> - we have to choose not to use forced option, or
>>> - we need to make sure we pass valid affinity mask with force = true option
>>>
>>> I chose latter in this patch. Let me know your opinion.
>>>
>>
>> Any suggestions on this ? Since commit 01f8fa4f01d8 and ffde1de64012 are now in
>> stable releases, CPU0 hotplug is broken there now.
>
> Maybe we should ask Thomas, as he's (a) the maintainer of the irqchip
> stuff, and (b) the author of the patch causing the breakage.
>
>  From what I can see looking at the x86 code, the work-around in
> ffde1de64012 is wrong.
>

Can provide your thoughts on how to solve this issue ?

Is it expected from all the irqchip implementation to use force flag in
irq_set_affinity to ignore cpu_online_mask similar to GIC ?

Regards,
Sudeep

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity
  2014-07-17  9:58         ` Sudeep Holla
@ 2014-08-25  9:55           ` Thomas Gleixner
  2014-08-26 15:19             ` Sudeep Holla
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2014-08-25  9:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 17 Jul 2014, Sudeep Holla wrote:
> > > Any suggestions on this ? Since commit 01f8fa4f01d8 and ffde1de64012 are
> > > now in
> > > stable releases, CPU0 hotplug is broken there now.
> > 
> > Maybe we should ask Thomas, as he's (a) the maintainer of the irqchip
> > stuff, and (b) the author of the patch causing the breakage.
> > 
> >  From what I can see looking at the x86 code, the work-around in
> > ffde1de64012 is wrong.
> > 
> 
> Can provide your thoughts on how to solve this issue ?

ffde1de64012 is not about offlining a cpu, it's about onlining where
we need to make sure that we assign the affinity to a not yet online
marked cpu.
 
> Is it expected from all the irqchip implementation to use force flag in
> irq_set_affinity to ignore cpu_online_mask similar to GIC ?

No, it's only relevant for the cases where we need to route irqs to
not yet online cpus.

Now the wreckage of offlining was definitely not intended and I wonder
why set_affinity() is called there with force = true. This was
introduced in commit 1dbfa187dad. I acked it back then, but I have no
idea why, because the force argument did not have any effect at that
time.

Changing it to false should solve the issue.

Thanks,

	tglx

diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index 2c4257604513..5c4d38e32a51 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -175,7 +175,7 @@ static bool migrate_one_irq(struct irq_desc *desc)
 	c = irq_data_get_irq_chip(d);
 	if (!c->irq_set_affinity)
 		pr_debug("IRQ%u: unable to set affinity\n", d->irq);
-	else if (c->irq_set_affinity(d, affinity, true) == IRQ_SET_MASK_OK && ret)
+	else if (c->irq_set_affinity(d, affinity, false) == IRQ_SET_MASK_OK && ret)
 		cpumask_copy(d->affinity, affinity);
 
 	return ret;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity
  2014-08-25  9:55           ` Thomas Gleixner
@ 2014-08-26 15:19             ` Sudeep Holla
  2014-08-28  9:32               ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Sudeep Holla @ 2014-08-26 15:19 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

Thanks for your feedback.

On 25/08/14 10:55, Thomas Gleixner wrote:
> On Thu, 17 Jul 2014, Sudeep Holla wrote:
>>>> Any suggestions on this ? Since commit 01f8fa4f01d8 and ffde1de64012 are
>>>> now in
>>>> stable releases, CPU0 hotplug is broken there now.
>>>
>>> Maybe we should ask Thomas, as he's (a) the maintainer of the irqchip
>>> stuff, and (b) the author of the patch causing the breakage.
>>>
>>>   From what I can see looking at the x86 code, the work-around in
>>> ffde1de64012 is wrong.
>>>
>>
>> Can provide your thoughts on how to solve this issue ?
>
> ffde1de64012 is not about offlining a cpu, it's about onlining where
> we need to make sure that we assign the affinity to a not yet online
> marked cpu.
>
>> Is it expected from all the irqchip implementation to use force flag in
>> irq_set_affinity to ignore cpu_online_mask similar to GIC ?
>
> No, it's only relevant for the cases where we need to route irqs to
> not yet online cpus.
>

Ok. IIUC Russell's main concern was if irqchip implementation uses force
flag differently, then we can't change the core code to false. Also
x86 core code also uses forced irq_set_affinity in arch/x86/kernel/irq.c

Russell, any comments on this or are you fine with changing to false.

Regards,
Sudeep

> Now the wreckage of offlining was definitely not intended and I wonder
> why set_affinity() is called there with force = true. This was
> introduced in commit 1dbfa187dad. I acked it back then, but I have no
> idea why, because the force argument did not have any effect at that
> time.
>
> Changing it to false should solve the issue.
>
> Thanks,
>
> 	tglx
>
> diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
> index 2c4257604513..5c4d38e32a51 100644
> --- a/arch/arm/kernel/irq.c
> +++ b/arch/arm/kernel/irq.c
> @@ -175,7 +175,7 @@ static bool migrate_one_irq(struct irq_desc *desc)
>   	c = irq_data_get_irq_chip(d);
>   	if (!c->irq_set_affinity)
>   		pr_debug("IRQ%u: unable to set affinity\n", d->irq);
> -	else if (c->irq_set_affinity(d, affinity, true) == IRQ_SET_MASK_OK && ret)
> +	else if (c->irq_set_affinity(d, affinity, false) == IRQ_SET_MASK_OK && ret)
>   		cpumask_copy(d->affinity, affinity);
>
>   	return ret;
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity
  2014-08-26 15:19             ` Sudeep Holla
@ 2014-08-28  9:32               ` Thomas Gleixner
  2014-08-28 10:12                 ` Sudeep Holla
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2014-08-28  9:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 26 Aug 2014, Sudeep Holla wrote:
> > > Can provide your thoughts on how to solve this issue ?
> > 
> > ffde1de64012 is not about offlining a cpu, it's about onlining where
> > we need to make sure that we assign the affinity to a not yet online
> > marked cpu.
> > 
> > > Is it expected from all the irqchip implementation to use force flag in
> > > irq_set_affinity to ignore cpu_online_mask similar to GIC ?
> > 
> > No, it's only relevant for the cases where we need to route irqs to
> > not yet online cpus.
> > 
> 
> Ok. IIUC Russell's main concern was if irqchip implementation uses force
> flag differently, then we can't change the core code to false. Also
> x86 core code also uses forced irq_set_affinity in arch/x86/kernel/irq.c

Which is pointless as none of the x86 irq chip implementations
actually honours the force argument.

In fact until the point where I implemented it in the GIC driver,
nothing ever used that argument. So the GIC conversion actually added
semantics to the argument. Any driver which will make use of it, has
to follow that now. I'll add documentation to the core code for it ...

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity
  2014-08-28  9:32               ` Thomas Gleixner
@ 2014-08-28 10:12                 ` Sudeep Holla
  2014-08-28 10:16                   ` Russell King - ARM Linux
  0 siblings, 1 reply; 12+ messages in thread
From: Sudeep Holla @ 2014-08-28 10:12 UTC (permalink / raw)
  To: linux-arm-kernel



On 28/08/14 10:32, Thomas Gleixner wrote:
> On Tue, 26 Aug 2014, Sudeep Holla wrote:
>>>> Can provide your thoughts on how to solve this issue ?
>>>
>>> ffde1de64012 is not about offlining a cpu, it's about onlining where
>>> we need to make sure that we assign the affinity to a not yet online
>>> marked cpu.
>>>
>>>> Is it expected from all the irqchip implementation to use force flag in
>>>> irq_set_affinity to ignore cpu_online_mask similar to GIC ?
>>>
>>> No, it's only relevant for the cases where we need to route irqs to
>>> not yet online cpus.
>>>
>>
>> Ok. IIUC Russell's main concern was if irqchip implementation uses force
>> flag differently, then we can't change the core code to false. Also
>> x86 core code also uses forced irq_set_affinity in arch/x86/kernel/irq.c
>
> Which is pointless as none of the x86 irq chip implementations
> actually honours the force argument.
>
> In fact until the point where I implemented it in the GIC driver,
> nothing ever used that argument. So the GIC conversion actually added
> semantics to the argument. Any driver which will make use of it, has
> to follow that now. I'll add documentation to the core code for it ...
>

Thanks Thomas for confirming.

Hi Russell,

Can I post the patch changing force to false in irq_set_affinity call
to fix the issue and cc stable ? It's broken in stable kernels(v3.10 and
v3.14)

Regards,
Sudeep

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity
  2014-08-28 10:12                 ` Sudeep Holla
@ 2014-08-28 10:16                   ` Russell King - ARM Linux
  0 siblings, 0 replies; 12+ messages in thread
From: Russell King - ARM Linux @ 2014-08-28 10:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Aug 28, 2014 at 11:12:15AM +0100, Sudeep Holla wrote:
>
>
> On 28/08/14 10:32, Thomas Gleixner wrote:
>> On Tue, 26 Aug 2014, Sudeep Holla wrote:
>>>>> Can provide your thoughts on how to solve this issue ?
>>>>
>>>> ffde1de64012 is not about offlining a cpu, it's about onlining where
>>>> we need to make sure that we assign the affinity to a not yet online
>>>> marked cpu.
>>>>
>>>>> Is it expected from all the irqchip implementation to use force flag in
>>>>> irq_set_affinity to ignore cpu_online_mask similar to GIC ?
>>>>
>>>> No, it's only relevant for the cases where we need to route irqs to
>>>> not yet online cpus.
>>>>
>>>
>>> Ok. IIUC Russell's main concern was if irqchip implementation uses force
>>> flag differently, then we can't change the core code to false. Also
>>> x86 core code also uses forced irq_set_affinity in arch/x86/kernel/irq.c
>>
>> Which is pointless as none of the x86 irq chip implementations
>> actually honours the force argument.
>>
>> In fact until the point where I implemented it in the GIC driver,
>> nothing ever used that argument. So the GIC conversion actually added
>> semantics to the argument. Any driver which will make use of it, has
>> to follow that now. I'll add documentation to the core code for it ...
>>
>
> Thanks Thomas for confirming.
>
> Hi Russell,
>
> Can I post the patch changing force to false in irq_set_affinity call
> to fix the issue and cc stable ? It's broken in stable kernels(v3.10 and
> v3.14)

I think it's up to Thomas to suggest what the correct solution is to
the problem he introduced, and it's not clear from Thomas' email you
quote above what he thinks would be the right solution.  It sounds
like passing false there would be the right thing, but really it needs
a clear and unambiguous statement from Thomas.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-08-28 10:16 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-09 16:40 [PATCH] arm: use cpu_online_mask when using forced irq_set_affinity Sudeep Holla
2014-05-23 11:26 ` Sudeep Holla
2014-05-23 12:10 ` Russell King - ARM Linux
2014-05-23 12:51   ` Sudeep Holla
2014-06-20 13:04     ` Sudeep Holla
2014-06-20 15:16       ` Russell King - ARM Linux
2014-07-17  9:58         ` Sudeep Holla
2014-08-25  9:55           ` Thomas Gleixner
2014-08-26 15:19             ` Sudeep Holla
2014-08-28  9:32               ` Thomas Gleixner
2014-08-28 10:12                 ` Sudeep Holla
2014-08-28 10:16                   ` Russell King - ARM Linux

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.