From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Chuansheng Liu <chuansheng.liu@intel.com>,
	tglx@linutronix.de, mingo@redhat.com, x86@kernel.org,
	linux-kernel@vger.kernel.org, yanmin_zhang@linux.intel.com,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"rusty@rustcorp.com.au" <rusty@rustcorp.com.au>
Subject: Re: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
Date: Wed, 26 Sep 2012 23:00:08 +0530
Message-ID: <50633BA0.5030700@linux.vnet.ibm.com>
In-Reply-To: <1348679199.26695.455.camel@sbsiddha-desk.sc.intel.com>

On 09/26/2012 10:36 PM, Suresh Siddha wrote:
> On Wed, 2012-09-26 at 21:33 +0530, Srivatsa S. Bhat wrote:
>> I have some fundamental questions here:
>> 1. Why was the CPU never removed from the affinity masks in the original
>> code? I find it hard to believe that it was just an oversight, because the
>> whole point of fixup_irqs() is to affine the interrupts to other CPUs, IIUC.
>> So, is that really a bug or is the existing code correct for some reason
>> which I don't know of?
> 
> I am not aware of the history but my guess is that the affinity mask
> which is coming from the user-space wants to be preserved. And
> fixup_irqs() is fixing the underlying interrupt routing when the cpu
> goes down

And the code that corresponds to that is the call to
irq_force_complete_move(irq), is it?

> with a hope that things will be corrected when the cpu comes
> back online. But  as Liu noted, we are not correcting the underlying
> routing when the cpu comes back online. I think we should fix that
> rather than modifying the user-specified affinity.
> 

Hmm, I didn't entirely get your suggestion. Are you saying that we should change
data->affinity (by calling ->irq_set_affinity()) during offline but maintain a
copy of the original affinity mask somewhere, so that we can try to match it
when possible (i.e., when the CPU comes back online)?
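
To make the question concrete, here is a minimal userspace sketch of that
reading, not kernel code. It assumes CPU masks fit in a 64-bit word, and every
name in it (irq_model, pick_effective, model_cpu_offline, model_cpu_online) is
hypothetical and exists only for illustration. The user-requested mask is never
touched by hotplug; only the routing actually in effect is recomputed.

```c
#include <stdint.h>

/* Hypothetical model of one interrupt; this is not the kernel's irq_data. */
struct irq_model {
	uint64_t requested;	/* mask asked for by user space, preserved across hotplug */
	uint64_t effective;	/* mask the interrupt is actually routed to */
};

/*
 * Route to the requested CPUs that are online. If none of them is online,
 * fall back to all online CPUs so the interrupt is still delivered somewhere.
 */
static uint64_t pick_effective(uint64_t requested, uint64_t online)
{
	uint64_t usable = requested & online;

	return usable ? usable : online;
}

/* CPU goes down: fix the routing, leave the requested mask alone. */
static void model_cpu_offline(struct irq_model *irq, uint64_t *online,
			      unsigned int cpu)
{
	*online &= ~(UINT64_C(1) << cpu);
	irq->effective = pick_effective(irq->requested, *online);
}

/* CPU comes back: try to match the requested mask again. */
static void model_cpu_online(struct irq_model *irq, uint64_t *online,
			     unsigned int cpu)
{
	*online |= UINT64_C(1) << cpu;
	irq->effective = pick_effective(irq->requested, *online);
}
```

In this model, an interrupt requested on CPU 1 alone is routed to the remaining
online CPUs while CPU 1 is down, and goes back to CPU 1 once it is online again.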

>> 2. In case this is indeed a bug, why are the warnings ratelimited when the
>> interrupts can't be affined to other CPUs? Are they not serious enough to
>> report? Put more strongly, why do we even silently return with a warning
>> instead of reporting that the CPU offline operation failed?? Is that because
>> we have come way too far in the hotplug sequence and we can't easily roll
>> back? Or are we still actually OK in that situation?
> 
> Are you referring to the "cannot set affinity for irq" messages?

Yes

> That happens only if the irq chip doesn't have the irq_set_affinity() setup.

That is my other point of concern: setting irq affinity can fail even if
we have ->irq_set_affinity() (if __ioapic_set_affinity() fails, for example).
Why don't we complain in that case? I think we should... and if it's serious
enough, abort the hotplug operation or at least indicate that the offline failed.
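
Roughly what I have in mind, as a userspace sketch rather than a patch: treat a
failing callback the same way as a missing one, and hand the failure count back
to the caller so it can decide what to do. All names here (set_affinity_fn,
set_affinity_ok, set_affinity_fail, model_fixup_irqs) are hypothetical, and the
-1 return value is just a stand-in for whatever error the real callback reports.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a chip's affinity callback; NULL means "not set up". */
typedef int (*set_affinity_fn)(unsigned int irq, unsigned long mask);

/* Stub callback that always succeeds. */
static int set_affinity_ok(unsigned int irq, unsigned long mask)
{
	(void)irq;
	(void)mask;
	return 0;
}

/* Stub callback that always fails, standing in for a lower-level failure. */
static int set_affinity_fail(unsigned int irq, unsigned long mask)
{
	(void)irq;
	(void)mask;
	return -1;
}

/*
 * Try to move every interrupt to target_mask. Complain both when the callback
 * is missing and when it is present but fails. Returns the number of
 * interrupts that could not be moved.
 */
static int model_fixup_irqs(const set_affinity_fn *chips, size_t nr_irqs,
			    unsigned long target_mask)
{
	int failed = 0;

	for (size_t irq = 0; irq < nr_irqs; irq++) {
		int ret;

		if (!chips[irq]) {
			fprintf(stderr, "irq %zu: no set_affinity callback\n", irq);
			failed++;
			continue;
		}

		ret = chips[irq]((unsigned int)irq, target_mask);
		if (ret) {
			fprintf(stderr, "irq %zu: set_affinity failed (%d)\n", irq, ret);
			failed++;
		}
	}

	return failed;
}
```

A non-zero return here is the point at which the caller could either abort the
offline operation or at least report that it did not complete cleanly.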

> But that is not common.
> 
>>
>> Suresh, I'd be grateful if you could kindly throw some light on these
>> issues... I'm actually debugging an issue where an offline CPU gets apic timer
>> interrupts (and in one case, I even saw a device interrupt), which I have
>> reported in another thread at: https://lkml.org/lkml/2012/9/26/119
>> But this issue in fixup_irqs() that Liu brought to light looks even more
>> surprising to me..
> 
> These issues look different to me, will look into that.
> 

Ok, thanks a lot!

Regards,
Srivatsa S. Bhat

