linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
  2012-09-26 17:38 [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask Chuansheng Liu
@ 2012-09-26  8:49 ` Srivatsa S. Bhat
  2012-09-26  8:51   ` Liu, Chuansheng
  2012-09-26  8:56   ` Liu, Chuansheng
  2012-09-26 23:45 ` Chuansheng Liu
  1 sibling, 2 replies; 15+ messages in thread
From: Srivatsa S. Bhat @ 2012-09-26  8:49 UTC (permalink / raw)
  To: Chuansheng Liu; +Cc: tglx, mingo, x86, linux-kernel, yanmin_zhang

On 09/26/2012 11:08 PM, Chuansheng Liu wrote:
> 
> When one CPU is going offline, and fixup_irqs() will re-set the
> irq affinity in some cases, we should clean the offlining CPU from
> the irq affinity.
> 
> The reason is setting offlining CPU as of the affinity is useless.
> Moreover, the smp_affinity value will be confusing when the
> offlining CPU come back again.
> 
> Example:
> For irq 93 with 4 CPUS, the default affinity f(1111),
> normal cases: 4 CPUS will receive the irq93 interrupts.
> 
> When echo 0 > /sys/devices/system/cpu/cpu3/online, just CPU0,1,2 will
> receive the interrupts.
> 
> But after the CPU3 is online again, we will not set affinity,the result
> will be:
> the smp_affinity is f, but still just CPU0,1,2 can receive the interrupts.
> 
> So we should clean the offlining CPU from irq affinity mask
> in fixup_irqs().
> 
> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Please hold on.. I'm not yet done reviewing, I might have more comments :-)

> Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
> ---
>  arch/x86/kernel/irq.c |    8 ++++++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index d44f782..08bb905 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -239,6 +239,7 @@ void fixup_irqs(void)
>  	struct irq_desc *desc;
>  	struct irq_data *data;
>  	struct irq_chip *chip;
> +	int cpu = smp_processor_id();
> 
>  	for_each_irq_desc(irq, desc) {
>  		int break_affinity = 0;
> @@ -277,8 +278,11 @@ void fixup_irqs(void)
>  		if (!irqd_can_move_in_process_context(data) && chip->irq_mask)
>  			chip->irq_mask(data);
> 
> -		if (chip->irq_set_affinity)
> -			chip->irq_set_affinity(data, affinity, true);
> +		if ((chip->irq_set_affinity) &&
> +			!chip->irq_set_affinity(data, affinity, true)) {

A return value of 0 and 1 are acceptable. So this check isn't correct.

Regards,
Srivatsa S. Bhat

> +			if (cpumask_test_cpu(cpu, data->affinity))
> +				cpumask_clear_cpu(cpu, data->affinity);

OMG, why did you drop the other hunk which cleared the cpu *before*
invoking ->irq_set_affinity()? IMO, altering irq affinity involves more work
than just altering the mask; that's why you have that ->irq_set_affinity()
function. So, if you alter the mask *after* calling ->irq_set_affinity(),
its not right.. 

Regards,
Srivatsa S. Bhat
 
> +		}
>  		else if (!(warned++))
>  			set_affinity = 0;
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
  2012-09-26  8:49 ` Srivatsa S. Bhat
@ 2012-09-26  8:51   ` Liu, Chuansheng
  2012-09-26  8:56   ` Liu, Chuansheng
  1 sibling, 0 replies; 15+ messages in thread
From: Liu, Chuansheng @ 2012-09-26  8:51 UTC (permalink / raw)
  To: Srivatsa S. Bhat; +Cc: tglx, mingo, x86, linux-kernel, yanmin_zhang

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 267 bytes --]

> Please hold on.. I'm not yet done reviewing, I might have more comments :-)
Sure, welcome, thanks again.
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
  2012-09-26  8:49 ` Srivatsa S. Bhat
  2012-09-26  8:51   ` Liu, Chuansheng
@ 2012-09-26  8:56   ` Liu, Chuansheng
  2012-09-26  9:02     ` Srivatsa S. Bhat
  1 sibling, 1 reply; 15+ messages in thread
From: Liu, Chuansheng @ 2012-09-26  8:56 UTC (permalink / raw)
  To: Srivatsa S. Bhat; +Cc: tglx, mingo, x86, linux-kernel, yanmin_zhang

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 681 bytes --]

> A return value of 0 and 1 are acceptable. So this check isn't correct.
> 
> Regards,
> Srivatsa S. Bhat
> 
Which case value 1 is acceptable, could you share? Thanks.

> OMG, why did you drop the other hunk which cleared the cpu *before*
> invoking ->irq_set_affinity()? IMO, altering irq affinity involves more work
> than just altering the mask; that's why you have that ->irq_set_affinity()
> function. So, if you alter the mask *after* calling ->irq_set_affinity(),
> its not right..
Sorry the mistake, will update.
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
  2012-09-26  8:56   ` Liu, Chuansheng
@ 2012-09-26  9:02     ` Srivatsa S. Bhat
  0 siblings, 0 replies; 15+ messages in thread
From: Srivatsa S. Bhat @ 2012-09-26  9:02 UTC (permalink / raw)
  To: Liu, Chuansheng; +Cc: tglx, mingo, x86, linux-kernel, yanmin_zhang

On 09/26/2012 02:26 PM, Liu, Chuansheng wrote:
>> A return value of 0 and 1 are acceptable. So this check isn't correct.
>>
>> Regards,
>> Srivatsa S. Bhat
>>
> Which case value 1 is acceptable, could you share? Thanks.

I can see the following in include/linux/irq.h:

/*
 * Return value for chip->irq_set_affinity()
 *
 * IRQ_SET_MASK_OK      - OK, core updates irq_data.affinity
 * IRQ_SET_MASK_NOCPY   - OK, chip did update irq_data.affinity
 */
enum {
        IRQ_SET_MASK_OK = 0,
        IRQ_SET_MASK_OK_NOCOPY,
};

And see some of those ->irq_set_affinity() implementations at various
places.

Regards,
Srivatsa S. Bhat

> 
>> OMG, why did you drop the other hunk which cleared the cpu *before*
>> invoking ->irq_set_affinity()? IMO, altering irq affinity involves more work
>> than just altering the mask; that's why you have that ->irq_set_affinity()
>> function. So, if you alter the mask *after* calling ->irq_set_affinity(),
>> its not right..
> Sorry the mistake, will update.
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
  2012-09-26 23:45 ` Chuansheng Liu
@ 2012-09-26 15:47   ` Srivatsa S. Bhat
  2012-09-26 16:03   ` Srivatsa S. Bhat
  1 sibling, 0 replies; 15+ messages in thread
From: Srivatsa S. Bhat @ 2012-09-26 15:47 UTC (permalink / raw)
  To: Chuansheng Liu
  Cc: tglx, mingo, x86, linux-kernel, yanmin_zhang, Suresh Siddha,
	Paul E. McKenney

On 09/27/2012 05:15 AM, Chuansheng Liu wrote:
> 
> When one CPU is going offline, and fixup_irqs() will re-set the
> irq affinity in some cases, we should clean the offlining CPU from
> the irq affinity.
> 
> The reason is setting offlining CPU as of the affinity is useless.
> Moreover, the smp_affinity value will be confusing when the
> offlining CPU come back again.
> 
> Example:
> For irq 93 with 4 CPUS, the default affinity f(1111),
> normal cases: 4 CPUS will receive the irq93 interrupts.
> 
> When echo 0 > /sys/devices/system/cpu/cpu3/online, just CPU0,1,2 will
> receive the interrupts.
> 
> But after the CPU3 is online again, we will not set affinity,the result
> will be:
> the smp_affinity is f, but still just CPU0,1,2 can receive the interrupts.
> 
> So we should clean the offlining CPU from irq affinity mask
> in fixup_irqs().
> 
> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

:-)

OK, so here is the general rule: You shouldn't automatically add
Reviewed-by tags.. You can include them only if the reviewer _explicitly_
lets you know that he is fine with the patch. Often, review happens in
multiple iterations/stages. So just because you addressed all the review
comments raised in iteration 'n' doesn't mean there won't be issues in
iteration 'n+1', perhaps because the way you addressed the concern might
not be the best approach.. or the reviewer might find more issues in
iteration 'n+1' which he might have over-looked in iteration 'n'.
So please refrain from adding such tags automatically!

> Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
> ---
>  arch/x86/kernel/irq.c |   21 +++++++++++++++++----
>  1 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index d44f782..ead0807 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -239,10 +239,13 @@ void fixup_irqs(void)
>  	struct irq_desc *desc;
>  	struct irq_data *data;
>  	struct irq_chip *chip;
> +	int cpu = smp_processor_id();
> 
>  	for_each_irq_desc(irq, desc) {
>  		int break_affinity = 0;
>  		int set_affinity = 1;
> +		bool set_ret = false;
> +
>  		const struct cpumask *affinity;
> 
>  		if (!desc)
> @@ -256,7 +259,8 @@ void fixup_irqs(void)
>  		data = irq_desc_get_irq_data(desc);
>  		affinity = data->affinity;
>  		if (!irq_has_action(irq) || irqd_is_per_cpu(data) ||
> -		    cpumask_subset(affinity, cpu_online_mask)) {
> +		    cpumask_subset(affinity, cpu_online_mask) ||
> +		    !cpumask_test_cpu(cpu, data->affinity)) {

This last check is superfluous, because it already checks if 'affinity'
is a subset of cpu_online_mask. Note that this cpu was already removed
from the cpu_online_mask before coming here.

>  			raw_spin_unlock(&desc->lock);
>  			continue;
>  		}
> @@ -277,9 +281,18 @@ void fixup_irqs(void)
>  		if (!irqd_can_move_in_process_context(data) && chip->irq_mask)
>  			chip->irq_mask(data);
> 
> -		if (chip->irq_set_affinity)
> -			chip->irq_set_affinity(data, affinity, true);
> -		else if (!(warned++))
> +		if (chip->irq_set_affinity) {
> +			struct cpumask mask;

It is good to avoid allocating huge cpumask bitmasks like this on stack.
If we really can't do without a temp mask, you could perhaps do something like:
		cpumask_var_t mask;

		alloc_cpumask_var(&mask, GFP_ATOMIC);

> +			cpumask_copy(&mask, affinity);
> +			cpumask_clear_cpu(cpu, &mask);
> +			switch (chip->irq_set_affinity(data, &mask, true)) {
> +			case IRQ_SET_MASK_OK:
> +				cpumask_copy(data->affinity, &mask);

This is again not required. __ioapic_set_affinity() copies the mask for you.
(And __ioapic_set_affinity() is called in every ->irq_set_affinity implementation,
if I read the source code correctly).


Regards,
Srivatsa S. Bhat

> +			case IRQ_SET_MASK_OK_NOCOPY:
> +				set_ret = true;
> +			}
> +		}
> +		if ((!set_ret) && !(warned++))
>  			set_affinity = 0;
> 
>  		/*
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
  2012-09-26 23:45 ` Chuansheng Liu
  2012-09-26 15:47   ` Srivatsa S. Bhat
@ 2012-09-26 16:03   ` Srivatsa S. Bhat
  2012-09-26 17:06     ` Suresh Siddha
  1 sibling, 1 reply; 15+ messages in thread
From: Srivatsa S. Bhat @ 2012-09-26 16:03 UTC (permalink / raw)
  To: Chuansheng Liu
  Cc: tglx, mingo, x86, linux-kernel, yanmin_zhang, Paul E. McKenney,
	Suresh Siddha, Peter Zijlstra, rusty

On 09/27/2012 05:15 AM, Chuansheng Liu wrote:
> 
> When one CPU is going offline, and fixup_irqs() will re-set the
> irq affinity in some cases, we should clean the offlining CPU from
> the irq affinity.
> 
> The reason is setting offlining CPU as of the affinity is useless.
> Moreover, the smp_affinity value will be confusing when the
> offlining CPU come back again.
> 
> Example:
> For irq 93 with 4 CPUS, the default affinity f(1111),
> normal cases: 4 CPUS will receive the irq93 interrupts.
> 
> When echo 0 > /sys/devices/system/cpu/cpu3/online, just CPU0,1,2 will
> receive the interrupts.
> 
> But after the CPU3 is online again, we will not set affinity,the result
> will be:
> the smp_affinity is f, but still just CPU0,1,2 can receive the interrupts.
> 
> So we should clean the offlining CPU from irq affinity mask
> in fixup_irqs().
> 

I have some fundamental questions here:
1. Why was the CPU never removed from the affinity masks in the original
code? I find it hard to believe that it was just an oversight, because the
whole point of fixup_irqs() is to affine the interrupts to other CPUs, IIUC.
So, is that really a bug or is the existing code correct for some reason
which I don't know of?

2. In case this is indeed a bug, why are the warnings ratelimited when the
interrupts can't be affined to other CPUs? Are they not serious enough to
report? Put more strongly, why do we even silently return with a warning
instead of reporting that the CPU offline operation failed?? Is that because
we have come way too far in the hotplug sequence and we can't easily roll
back? Or are we still actually OK in that situation?

Suresh, I'd be grateful if you could kindly throw some light on these
issues... I'm actually debugging an issue where an offline CPU gets apic timer
interrupts (and in one case, I even saw a device interrupt), which I have
reported in another thread at: https://lkml.org/lkml/2012/9/26/119
But this issue in fixup_irqs() that Liu brought to light looks even more
surprising to me..

Regards,
Srivatsa S. Bhat

> ---
>  arch/x86/kernel/irq.c |   21 +++++++++++++++++----
>  1 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index d44f782..ead0807 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -239,10 +239,13 @@ void fixup_irqs(void)
>  	struct irq_desc *desc;
>  	struct irq_data *data;
>  	struct irq_chip *chip;
> +	int cpu = smp_processor_id();
> 
>  	for_each_irq_desc(irq, desc) {
>  		int break_affinity = 0;
>  		int set_affinity = 1;
> +		bool set_ret = false;
> +
>  		const struct cpumask *affinity;
> 
>  		if (!desc)
> @@ -256,7 +259,8 @@ void fixup_irqs(void)
>  		data = irq_desc_get_irq_data(desc);
>  		affinity = data->affinity;
>  		if (!irq_has_action(irq) || irqd_is_per_cpu(data) ||
> -		    cpumask_subset(affinity, cpu_online_mask)) {
> +		    cpumask_subset(affinity, cpu_online_mask) ||
> +		    !cpumask_test_cpu(cpu, data->affinity)) {
>  			raw_spin_unlock(&desc->lock);
>  			continue;
>  		}
> @@ -277,9 +281,18 @@ void fixup_irqs(void)
>  		if (!irqd_can_move_in_process_context(data) && chip->irq_mask)
>  			chip->irq_mask(data);
> 
> -		if (chip->irq_set_affinity)
> -			chip->irq_set_affinity(data, affinity, true);
> -		else if (!(warned++))
> +		if (chip->irq_set_affinity) {
> +			struct cpumask mask;
> +			cpumask_copy(&mask, affinity);
> +			cpumask_clear_cpu(cpu, &mask);
> +			switch (chip->irq_set_affinity(data, &mask, true)) {
> +			case IRQ_SET_MASK_OK:
> +				cpumask_copy(data->affinity, &mask);
> +			case IRQ_SET_MASK_OK_NOCOPY:
> +				set_ret = true;
> +			}
> +		}
> +		if ((!set_ret) && !(warned++))
>  			set_affinity = 0;
> 
>  		/*
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
  2012-09-26 16:03   ` Srivatsa S. Bhat
@ 2012-09-26 17:06     ` Suresh Siddha
  2012-09-26 17:30       ` Srivatsa S. Bhat
  0 siblings, 1 reply; 15+ messages in thread
From: Suresh Siddha @ 2012-09-26 17:06 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Chuansheng Liu, tglx, mingo, x86, linux-kernel, yanmin_zhang,
	Paul E. McKenney, Peter Zijlstra, rusty

On Wed, 2012-09-26 at 21:33 +0530, Srivatsa S. Bhat wrote:
> I have some fundamental questions here:
> 1. Why was the CPU never removed from the affinity masks in the original
> code? I find it hard to believe that it was just an oversight, because the
> whole point of fixup_irqs() is to affine the interrupts to other CPUs, IIUC.
> So, is that really a bug or is the existing code correct for some reason
> which I don't know of?

I am not aware of the history but my guess is that the affinity mask
which is coming from the user-space wants to be preserved. And
fixup_irqs() is fixing the underlying interrupt routing when the cpu
goes down with a hope that things will be corrected when the cpu comes
back online. But  as Liu noted, we are not correcting the underlying
routing when the cpu comes back online. I think we should fix that
rather than modifying the user-specified affinity.

> 2. In case this is indeed a bug, why are the warnings ratelimited when the
> interrupts can't be affined to other CPUs? Are they not serious enough to
> report? Put more strongly, why do we even silently return with a warning
> instead of reporting that the CPU offline operation failed?? Is that because
> we have come way too far in the hotplug sequence and we can't easily roll
> back? Or are we still actually OK in that situation?

Are you referring to the "cannot set affinity for irq" messages? That
happens only if the irq chip doesn't have the irq_set_affinity() setup.
But that is not common.

> 
> Suresh, I'd be grateful if you could kindly throw some light on these
> issues... I'm actually debugging an issue where an offline CPU gets apic timer
> interrupts (and in one case, I even saw a device interrupt), which I have
> reported in another thread at: https://lkml.org/lkml/2012/9/26/119
> But this issue in fixup_irqs() that Liu brought to light looks even more
> surprising to me..

These issues look different to me, will look into that.

thanks,
suresh


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
  2012-09-26 17:06     ` Suresh Siddha
@ 2012-09-26 17:30       ` Srivatsa S. Bhat
  2012-09-26 22:46         ` Suresh Siddha
  0 siblings, 1 reply; 15+ messages in thread
From: Srivatsa S. Bhat @ 2012-09-26 17:30 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Chuansheng Liu, tglx, mingo, x86, linux-kernel, yanmin_zhang,
	Paul E. McKenney, Peter Zijlstra, rusty

On 09/26/2012 10:36 PM, Suresh Siddha wrote:
> On Wed, 2012-09-26 at 21:33 +0530, Srivatsa S. Bhat wrote:
>> I have some fundamental questions here:
>> 1. Why was the CPU never removed from the affinity masks in the original
>> code? I find it hard to believe that it was just an oversight, because the
>> whole point of fixup_irqs() is to affine the interrupts to other CPUs, IIUC.
>> So, is that really a bug or is the existing code correct for some reason
>> which I don't know of?
> 
> I am not aware of the history but my guess is that the affinity mask
> which is coming from the user-space wants to be preserved. And
> fixup_irqs() is fixing the underlying interrupt routing when the cpu
> goes down

and the code that corresponds to that is:
irq_force_complete_move(irq); is it?

> with a hope that things will be corrected when the cpu comes
> back online. But  as Liu noted, we are not correcting the underlying
> routing when the cpu comes back online. I think we should fix that
> rather than modifying the user-specified affinity.
> 

Hmm, I didn't entirely get your suggestion. Are you saying that we should change
data->affinity (by calling ->irq_set_affinity()) during offline but maintain a
copy of the original affinity mask somewhere, so that we can try to match it
when possible (ie., when CPU comes back online)?

>> 2. In case this is indeed a bug, why are the warnings ratelimited when the
>> interrupts can't be affined to other CPUs? Are they not serious enough to
>> report? Put more strongly, why do we even silently return with a warning
>> instead of reporting that the CPU offline operation failed?? Is that because
>> we have come way too far in the hotplug sequence and we can't easily roll
>> back? Or are we still actually OK in that situation?
> 
> Are you referring to the "cannot set affinity for irq" messages?

Yes

> That happens only if the irq chip doesn't have the irq_set_affinity() setup.

That is my other point of concern : setting irq affinity can fail even if
we have ->irq_set_affinity(). (If __ioapic_set_affinity() fails, for example).
Why don't we complain in that case? I think we should... and if its serious
enough, abort the hotplug operation or atleast indicate that offline failed..

> But that is not common.
> 
>>
>> Suresh, I'd be grateful if you could kindly throw some light on these
>> issues... I'm actually debugging an issue where an offline CPU gets apic timer
>> interrupts (and in one case, I even saw a device interrupt), which I have
>> reported in another thread at: https://lkml.org/lkml/2012/9/26/119
>> But this issue in fixup_irqs() that Liu brought to light looks even more
>> surprising to me..
> 
> These issues look different to me, will look into that.
> 

Ok, thanks a lot!

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
@ 2012-09-26 17:38 Chuansheng Liu
  2012-09-26  8:49 ` Srivatsa S. Bhat
  2012-09-26 23:45 ` Chuansheng Liu
  0 siblings, 2 replies; 15+ messages in thread
From: Chuansheng Liu @ 2012-09-26 17:38 UTC (permalink / raw)
  To: srivatsa.bhat, tglx, mingo, x86; +Cc: linux-kernel, yanmin_zhang


When one CPU is going offline, and fixup_irqs() will re-set the
irq affinity in some cases, we should clean the offlining CPU from
the irq affinity.

The reason is setting offlining CPU as of the affinity is useless.
Moreover, the smp_affinity value will be confusing when the
offlining CPU come back again.

Example:
For irq 93 with 4 CPUS, the default affinity f(1111),
normal cases: 4 CPUS will receive the irq93 interrupts.

When echo 0 > /sys/devices/system/cpu/cpu3/online, just CPU0,1,2 will
receive the interrupts.

But after the CPU3 is online again, we will not set affinity,the result
will be:
the smp_affinity is f, but still just CPU0,1,2 can receive the interrupts.

So we should clean the offlining CPU from irq affinity mask
in fixup_irqs().

Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
---
 arch/x86/kernel/irq.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index d44f782..08bb905 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -239,6 +239,7 @@ void fixup_irqs(void)
 	struct irq_desc *desc;
 	struct irq_data *data;
 	struct irq_chip *chip;
+	int cpu = smp_processor_id();
 
 	for_each_irq_desc(irq, desc) {
 		int break_affinity = 0;
@@ -277,8 +278,11 @@ void fixup_irqs(void)
 		if (!irqd_can_move_in_process_context(data) && chip->irq_mask)
 			chip->irq_mask(data);
 
-		if (chip->irq_set_affinity)
-			chip->irq_set_affinity(data, affinity, true);
+		if ((chip->irq_set_affinity) &&
+			!chip->irq_set_affinity(data, affinity, true)) {
+			if (cpumask_test_cpu(cpu, data->affinity))
+				cpumask_clear_cpu(cpu, data->affinity);
+		}
 		else if (!(warned++))
 			set_affinity = 0;
 
-- 
1.7.0.4




^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
  2012-09-26 17:30       ` Srivatsa S. Bhat
@ 2012-09-26 22:46         ` Suresh Siddha
  2012-09-27 18:42           ` Srivatsa S. Bhat
  2012-10-09  8:51           ` Liu, Chuansheng
  0 siblings, 2 replies; 15+ messages in thread
From: Suresh Siddha @ 2012-09-26 22:46 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Chuansheng Liu, tglx, mingo, x86, linux-kernel, yanmin_zhang,
	Paul E. McKenney, Peter Zijlstra, rusty

On Wed, 2012-09-26 at 23:00 +0530, Srivatsa S. Bhat wrote:
> On 09/26/2012 10:36 PM, Suresh Siddha wrote:
> > On Wed, 2012-09-26 at 21:33 +0530, Srivatsa S. Bhat wrote:
> >> I have some fundamental questions here:
> >> 1. Why was the CPU never removed from the affinity masks in the original
> >> code? I find it hard to believe that it was just an oversight, because the
> >> whole point of fixup_irqs() is to affine the interrupts to other CPUs, IIUC.
> >> So, is that really a bug or is the existing code correct for some reason
> >> which I don't know of?
> > 
> > I am not aware of the history but my guess is that the affinity mask
> > which is coming from the user-space wants to be preserved. And
> > fixup_irqs() is fixing the underlying interrupt routing when the cpu
> > goes down
> 
> and the code that corresponds to that is:
> irq_force_complete_move(irq); is it?

No. irq_set_affinity()

> > with a hope that things will be corrected when the cpu comes
> > back online. But  as Liu noted, we are not correcting the underlying
> > routing when the cpu comes back online. I think we should fix that
> > rather than modifying the user-specified affinity.
> > 
> 
> Hmm, I didn't entirely get your suggestion. Are you saying that we should change
> data->affinity (by calling ->irq_set_affinity()) during offline but maintain a
> copy of the original affinity mask somewhere, so that we can try to match it
> when possible (ie., when CPU comes back online)?

Don't change the data->affinity in the fixup_irqs() and shortly after a
cpu is online, call irq_chip's irq_set_affinity() for those irq's who
affinity included this cpu (now that the cpu is back online,
irq_set_affinity() will setup the HW routing tables correctly).

This presumes that across the suspend/resume, cpu offline/online
operations, we don't want to break the irq affinity setup by the
user-level entity like irqbalance etc...

> > That happens only if the irq chip doesn't have the irq_set_affinity() setup.
> 
> That is my other point of concern : setting irq affinity can fail even if
> we have ->irq_set_affinity(). (If __ioapic_set_affinity() fails, for example).
> Why don't we complain in that case? I think we should... and if its serious
> enough, abort the hotplug operation or atleast indicate that offline failed..

yes if there is a failure then we are in trouble, as the cpu is already
disappeared from the online-masks etc. For platforms with
interrupt-remapping, interrupts can be migrated from the process context
and as such this all can be done much before.

And for legacy platforms we have done quite a few changes in the recent
past like using eoi_ioapic_irq() for level triggered interrupts etc,
that makes it as safe as it can be. Perhaps we can move most of the
fixup_irqs() code much ahead and the lost section of the current
fixup_irqs() (which check IRR bits and use the retrigger function to
trigger the interrupt on another cpu) can still be done late just like
now.

thanks,
suresh


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
  2012-09-26 17:38 [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask Chuansheng Liu
  2012-09-26  8:49 ` Srivatsa S. Bhat
@ 2012-09-26 23:45 ` Chuansheng Liu
  2012-09-26 15:47   ` Srivatsa S. Bhat
  2012-09-26 16:03   ` Srivatsa S. Bhat
  1 sibling, 2 replies; 15+ messages in thread
From: Chuansheng Liu @ 2012-09-26 23:45 UTC (permalink / raw)
  To: srivatsa.bhat, tglx, mingo, x86
  Cc: linux-kernel, yanmin_zhang, chuansheng.liu


When one CPU is going offline, and fixup_irqs() will re-set the
irq affinity in some cases, we should clean the offlining CPU from
the irq affinity.

The reason is setting offlining CPU as of the affinity is useless.
Moreover, the smp_affinity value will be confusing when the
offlining CPU come back again.

Example:
For irq 93 with 4 CPUS, the default affinity f(1111),
normal cases: 4 CPUS will receive the irq93 interrupts.

When echo 0 > /sys/devices/system/cpu/cpu3/online, just CPU0,1,2 will
receive the interrupts.

But after the CPU3 is online again, we will not set affinity,the result
will be:
the smp_affinity is f, but still just CPU0,1,2 can receive the interrupts.

So we should clean the offlining CPU from irq affinity mask
in fixup_irqs().

Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
---
 arch/x86/kernel/irq.c |   21 +++++++++++++++++----
 1 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index d44f782..ead0807 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -239,10 +239,13 @@ void fixup_irqs(void)
 	struct irq_desc *desc;
 	struct irq_data *data;
 	struct irq_chip *chip;
+	int cpu = smp_processor_id();
 
 	for_each_irq_desc(irq, desc) {
 		int break_affinity = 0;
 		int set_affinity = 1;
+		bool set_ret = false;
+
 		const struct cpumask *affinity;
 
 		if (!desc)
@@ -256,7 +259,8 @@ void fixup_irqs(void)
 		data = irq_desc_get_irq_data(desc);
 		affinity = data->affinity;
 		if (!irq_has_action(irq) || irqd_is_per_cpu(data) ||
-		    cpumask_subset(affinity, cpu_online_mask)) {
+		    cpumask_subset(affinity, cpu_online_mask) ||
+		    !cpumask_test_cpu(cpu, data->affinity)) {
 			raw_spin_unlock(&desc->lock);
 			continue;
 		}
@@ -277,9 +281,18 @@ void fixup_irqs(void)
 		if (!irqd_can_move_in_process_context(data) && chip->irq_mask)
 			chip->irq_mask(data);
 
-		if (chip->irq_set_affinity)
-			chip->irq_set_affinity(data, affinity, true);
-		else if (!(warned++))
+		if (chip->irq_set_affinity) {
+			struct cpumask mask;
+			cpumask_copy(&mask, affinity);
+			cpumask_clear_cpu(cpu, &mask);
+			switch (chip->irq_set_affinity(data, &mask, true)) {
+			case IRQ_SET_MASK_OK:
+				cpumask_copy(data->affinity, &mask);
+			case IRQ_SET_MASK_OK_NOCOPY:
+				set_ret = true;
+			}
+		}
+		if ((!set_ret) && !(warned++))
 			set_affinity = 0;
 
 		/*
-- 
1.7.0.4




^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
  2012-09-26 22:46         ` Suresh Siddha
@ 2012-09-27 18:42           ` Srivatsa S. Bhat
  2012-09-27 19:20             ` Suresh Siddha
  2012-10-09  8:51           ` Liu, Chuansheng
  1 sibling, 1 reply; 15+ messages in thread
From: Srivatsa S. Bhat @ 2012-09-27 18:42 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Chuansheng Liu, tglx, mingo, x86, linux-kernel, yanmin_zhang,
	Paul E. McKenney, Peter Zijlstra, rusty

On 09/27/2012 04:16 AM, Suresh Siddha wrote:
> On Wed, 2012-09-26 at 23:00 +0530, Srivatsa S. Bhat wrote:
>> On 09/26/2012 10:36 PM, Suresh Siddha wrote:
>>> On Wed, 2012-09-26 at 21:33 +0530, Srivatsa S. Bhat wrote:
>>>> I have some fundamental questions here:
>>>> 1. Why was the CPU never removed from the affinity masks in the original
>>>> code? I find it hard to believe that it was just an oversight, because the
>>>> whole point of fixup_irqs() is to affine the interrupts to other CPUs, IIUC.
>>>> So, is that really a bug or is the existing code correct for some reason
>>>> which I don't know of?
>>>
>>> I am not aware of the history but my guess is that the affinity mask
>>> which is coming from the user-space wants to be preserved. And
>>> fixup_irqs() is fixing the underlying interrupt routing when the cpu
>>> goes down
>>
>> and the code that corresponds to that is:
>> irq_force_complete_move(irq); is it?
> 
> No. irq_set_affinity()
> 

Um? That takes the updated/changed affinity and sets data->affinity to
that value no? You mentioned that probably the intention of the original
code was to preserve the user-set affinity mask, but still change the
underlying interrupt routing. Sorry, but I still didn't quite understand
what is that part of the code that achieves that.

It appears that ->irq_set_affinity() is doing both - change the (user-set)
affinity as well as the underlying irq routing. And it does it only when
all CPUs in the original affinity mask go offline, not on every offline;
which looks like a real bug to me.

>>> with a hope that things will be corrected when the cpu comes
>>> back online. But  as Liu noted, we are not correcting the underlying
>>> routing when the cpu comes back online. I think we should fix that
>>> rather than modifying the user-specified affinity.
>>>

I am not able to distinguish the 2 things in the existing code, as I
mentioned above :(

>>
>> Hmm, I didn't entirely get your suggestion. Are you saying that we should change
>> data->affinity (by calling ->irq_set_affinity()) during offline but maintain a
>> copy of the original affinity mask somewhere, so that we can try to match it
>> when possible (ie., when CPU comes back online)?
> 
> Don't change the data->affinity in the fixup_irqs()

You mean, IOW, don't call ->irq_set_affinity() during CPU offline at all?
(Would that even be correct?)

> and shortly after a
> cpu is online, call irq_chip's irq_set_affinity() for those irq's who
> affinity included this cpu (now that the cpu is back online,
> irq_set_affinity() will setup the HW routing tables correctly).
> 
> This presumes that across the suspend/resume, cpu offline/online
> operations, we don't want to break the irq affinity setup by the
> user-level entity like irqbalance etc...
> 

The only way I can think of to preserve the user-set affinity but still
alter the underlying routing appropriately when needed, is by having an
additional mask (per-irq) that holds the user-set affinity.


>>> That happens only if the irq chip doesn't have the irq_set_affinity() setup.
>>
>> That is my other point of concern : setting irq affinity can fail even if
>> we have ->irq_set_affinity(). (If __ioapic_set_affinity() fails, for example).
>> Why don't we complain in that case? I think we should... and if its serious
>> enough, abort the hotplug operation or atleast indicate that offline failed..
> 
> yes if there is a failure then we are in trouble, as the cpu is already
> disappeared from the online-masks etc. For platforms with
> interrupt-remapping, interrupts can be migrated from the process context
> and as such this all can be done much before.
> 
> And for legacy platforms we have done quite a few changes in the recent
> past like using eoi_ioapic_irq() for level triggered interrupts etc,
> that makes it as safe as it can be. Perhaps we can move most of the
> fixup_irqs() code much ahead and the lost section of the current
> fixup_irqs() (which check IRR bits and use the retrigger function to
> trigger the interrupt on another cpu) can still be done late just like
> now.
> 

Hmm.. ok.. Thanks for the explanation!

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
  2012-09-27 18:42           ` Srivatsa S. Bhat
@ 2012-09-27 19:20             ` Suresh Siddha
  2012-09-27 20:33               ` Srivatsa S. Bhat
  0 siblings, 1 reply; 15+ messages in thread
From: Suresh Siddha @ 2012-09-27 19:20 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Chuansheng Liu, tglx, mingo, x86, linux-kernel, yanmin_zhang,
	Paul E. McKenney, Peter Zijlstra, rusty

On Fri, 2012-09-28 at 00:12 +0530, Srivatsa S. Bhat wrote:
> On 09/27/2012 04:16 AM, Suresh Siddha wrote:
> > 
> > No. irq_set_affinity()
> > 
> 
> Um? That takes the updated/changed affinity and sets data->affinity to
> that value no? You mentioned that probably the intention of the original
> code was to preserve the user-set affinity mask, but still change the
> underlying interrupt routing. Sorry, but I still didn't quite understand
> what is that part of the code that achieves that.

For the HW routing to be changed we AND it with cpu_online_map and use
that for programming the interrupt entries etc. The user-specified
affinity still has the cpu that is offlined.

And when the cpu comes online and if it is part of the user-specified
affinity, then the HW routing can be again modified to include the new
cpu.

hope this clears it!

thanks.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
  2012-09-27 19:20             ` Suresh Siddha
@ 2012-09-27 20:33               ` Srivatsa S. Bhat
  0 siblings, 0 replies; 15+ messages in thread
From: Srivatsa S. Bhat @ 2012-09-27 20:33 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Chuansheng Liu, tglx, mingo, x86, linux-kernel, yanmin_zhang,
	Paul E. McKenney, Peter Zijlstra, rusty

On 09/28/2012 12:50 AM, Suresh Siddha wrote:
> On Fri, 2012-09-28 at 00:12 +0530, Srivatsa S. Bhat wrote:
>> On 09/27/2012 04:16 AM, Suresh Siddha wrote:
>>>
>>> No. irq_set_affinity()
>>>
>>
>> Um? That takes the updated/changed affinity and sets data->affinity to
>> that value no? You mentioned that probably the intention of the original
>> code was to preserve the user-set affinity mask, but still change the
>> underlying interrupt routing. Sorry, but I still didn't quite understand
>> what is that part of the code that achieves that.
> 
> For the HW routing to be changed we AND it with cpu_online_map and use
> that for programming the interrupt entries etc.

Ah, now I see.. you were referring to the __assign_irq_vector() code, whereas
I was looking only at fixup_irqs() and was trying to find the code that did
what you said.. that's what got me confused earlier :-)

> The user-specified
> affinity still has the cpu that is offlined.
> 

Right, so data->affinity is untouched, whereas cfg->domain is updated when
the CPU is offlined..

> And when the cpu comes online and if it is part of the user-specified
> affinity, then the HW routing can be again modified to include the new
> cpu.
> 

Right, got it.

> hope this clears it!
>

Yep, thanks a lot!

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
  2012-09-26 22:46         ` Suresh Siddha
  2012-09-27 18:42           ` Srivatsa S. Bhat
@ 2012-10-09  8:51           ` Liu, Chuansheng
  1 sibling, 0 replies; 15+ messages in thread
From: Liu, Chuansheng @ 2012-10-09  8:51 UTC (permalink / raw)
  To: Siddha, Suresh B, Srivatsa S. Bhat
  Cc: tglx, mingo, x86, linux-kernel, yanmin_zhang, Paul E. McKenney,
	Peter Zijlstra, rusty

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 3974 bytes --]

One suggestion below, thanks.

> -----Original Message-----
> From: Siddha, Suresh B
> Sent: Thursday, September 27, 2012 6:46 AM
> To: Srivatsa S. Bhat
> Cc: Liu, Chuansheng; tglx@linutronix.de; mingo@redhat.com; x86@kernel.org;
> linux-kernel@vger.kernel.org; yanmin_zhang@linux.intel.com; Paul E.
> McKenney; Peter Zijlstra; rusty@rustcorp.com.au
> Subject: Re: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq
> affinity mask
> 
> On Wed, 2012-09-26 at 23:00 +0530, Srivatsa S. Bhat wrote:
> > On 09/26/2012 10:36 PM, Suresh Siddha wrote:
> > > On Wed, 2012-09-26 at 21:33 +0530, Srivatsa S. Bhat wrote:
> > >> I have some fundamental questions here:
> > >> 1. Why was the CPU never removed from the affinity masks in the original
> > >> code? I find it hard to believe that it was just an oversight, because the
> > >> whole point of fixup_irqs() is to affine the interrupts to other CPUs, IIUC.
> > >> So, is that really a bug or is the existing code correct for some reason
> > >> which I don't know of?
> > >
> > > I am not aware of the history but my guess is that the affinity mask
> > > which is coming from the user-space wants to be preserved. And
> > > fixup_irqs() is fixing the underlying interrupt routing when the cpu
> > > goes down
> >
> > and the code that corresponds to that is:
> > irq_force_complete_move(irq); is it?
> 
> No. irq_set_affinity()
> 
> > > with a hope that things will be corrected when the cpu comes
> > > back online. But  as Liu noted, we are not correcting the underlying
> > > routing when the cpu comes back online. I think we should fix that
> > > rather than modifying the user-specified affinity.
> > >
> >
> > Hmm, I didn't entirely get your suggestion. Are you saying that we should
> change
> > data->affinity (by calling ->irq_set_affinity()) during offline but maintain a
> > copy of the original affinity mask somewhere, so that we can try to match it
> > when possible (ie., when CPU comes back online)?
> 
> Don't change the data->affinity in the fixup_irqs() and shortly after a
> cpu is online, call irq_chip's irq_set_affinity() for those irq's who
> affinity included this cpu (now that the cpu is back online,
> irq_set_affinity() will setup the HW routing tables correctly).
> 
> This presumes that across the suspend/resume, cpu offline/online
> operations, we don't want to break the irq affinity setup by the
> user-level entity like irqbalance etc...
> 
In fixup_irqs(), could we untouch the data->affinity, but calling ->irq_set_affinity() without offlined CPU.
If so, the better affinity is set, and user-space can use the data->affinity correctly, also new affinity setting
will and online cpus.

> > > That happens only if the irq chip doesn't have the irq_set_affinity() setup.
> >
> > That is my other point of concern : setting irq affinity can fail even if
> > we have ->irq_set_affinity(). (If __ioapic_set_affinity() fails, for example).
> > Why don't we complain in that case? I think we should... and if its serious
> > enough, abort the hotplug operation or atleast indicate that offline failed..
> 
> yes if there is a failure then we are in trouble, as the cpu is already
> disappeared from the online-masks etc. For platforms with
> interrupt-remapping, interrupts can be migrated from the process context
> and as such this all can be done much before.
> 
> And for legacy platforms we have done quite a few changes in the recent
> past like using eoi_ioapic_irq() for level triggered interrupts etc,
> that makes it as safe as it can be. Perhaps we can move most of the
> fixup_irqs() code much ahead and the lost section of the current
> fixup_irqs() (which check IRR bits and use the retrigger function to
> trigger the interrupt on another cpu) can still be done late just like
> now.
> 
> thanks,
> suresh

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-10-09  8:51 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-26 17:38 [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask Chuansheng Liu
2012-09-26  8:49 ` Srivatsa S. Bhat
2012-09-26  8:51   ` Liu, Chuansheng
2012-09-26  8:56   ` Liu, Chuansheng
2012-09-26  9:02     ` Srivatsa S. Bhat
2012-09-26 23:45 ` Chuansheng Liu
2012-09-26 15:47   ` Srivatsa S. Bhat
2012-09-26 16:03   ` Srivatsa S. Bhat
2012-09-26 17:06     ` Suresh Siddha
2012-09-26 17:30       ` Srivatsa S. Bhat
2012-09-26 22:46         ` Suresh Siddha
2012-09-27 18:42           ` Srivatsa S. Bhat
2012-09-27 19:20             ` Suresh Siddha
2012-09-27 20:33               ` Srivatsa S. Bhat
2012-10-09  8:51           ` Liu, Chuansheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).