From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Chuansheng Liu <chuansheng.liu@intel.com>,
tglx@linutronix.de, mingo@redhat.com, x86@kernel.org,
linux-kernel@vger.kernel.org, yanmin_zhang@linux.intel.com,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
"rusty@rustcorp.com.au" <rusty@rustcorp.com.au>
Subject: Re: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the irq affinity mask
Date: Wed, 26 Sep 2012 23:00:08 +0530
Message-ID: <50633BA0.5030700@linux.vnet.ibm.com>
In-Reply-To: <1348679199.26695.455.camel@sbsiddha-desk.sc.intel.com>
On 09/26/2012 10:36 PM, Suresh Siddha wrote:
> On Wed, 2012-09-26 at 21:33 +0530, Srivatsa S. Bhat wrote:
>> I have some fundamental questions here:
>> 1. Why was the CPU never removed from the affinity masks in the original
>> code? I find it hard to believe that it was just an oversight, because the
>> whole point of fixup_irqs() is to affine the interrupts to other CPUs, IIUC.
>> So, is that really a bug or is the existing code correct for some reason
>> which I don't know of?
>
> I am not aware of the history but my guess is that the affinity mask
> which is coming from the user-space wants to be preserved. And
> fixup_irqs() is fixing the underlying interrupt routing when the cpu
> goes down
and the code that corresponds to that is:
irq_force_complete_move(irq); is that right?
> with a hope that things will be corrected when the cpu comes
> back online. But as Liu noted, we are not correcting the underlying
> routing when the cpu comes back online. I think we should fix that
> rather than modifying the user-specified affinity.
>
Hmm, I didn't entirely get your suggestion. Are you saying that we should change
data->affinity (by calling ->irq_set_affinity()) during offline but maintain a
copy of the original affinity mask somewhere, so that we can try to match it
when possible (i.e., when the CPU comes back online)?
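
To make the question concrete, here is a minimal user-space sketch of the
"keep a copy of the original mask" idea. It is only a model under stated
assumptions: the names irq_model, cpumask_model_t, cpu_bit() and
irq_model_reroute() are hypothetical and are not kernel APIs, and a 64-bit
integer stands in for a real cpumask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model, not kernel code: one bit per CPU. */
typedef uint64_t cpumask_model_t;

struct irq_model {
	cpumask_model_t requested;	/* mask the user asked for; hotplug never changes it */
	cpumask_model_t effective;	/* where the interrupt is currently routed */
};

/* Return the mask with only the bit for 'cpu' set. */
static cpumask_model_t cpu_bit(unsigned int cpu)
{
	assert(cpu < 64);
	return (cpumask_model_t)1 << cpu;
}

/*
 * Recompute the routing after any hotplug event (offline or online).
 * Route to (requested AND online); if that is empty, fall back to every
 * online CPU, but leave 'requested' untouched so that it can be honoured
 * again once one of its CPUs comes back.
 */
static void irq_model_reroute(struct irq_model *irq, cpumask_model_t online)
{
	cpumask_model_t target = irq->requested & online;

	if (target == 0)
		target = online;	/* affinity is broken only temporarily */
	irq->effective = target;
}
```

In this model the same function is called on both the offline and the online
path, so the routing returns to the user-specified CPUs without anything
having overwritten the user's mask.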
>> 2. In case this is indeed a bug, why are the warnings ratelimited when the
>> interrupts can't be affined to other CPUs? Are they not serious enough to
>> report? Put more strongly, why do we even silently return with a warning
>> instead of reporting that the CPU offline operation failed?? Is that because
>> we have come way too far in the hotplug sequence and we can't easily roll
>> back? Or are we still actually OK in that situation?
>
> Are you referring to the "cannot set affinity for irq" messages?
Yes
> That happens only if the irq chip doesn't have the irq_set_affinity() setup.
That is my other point of concern: setting irq affinity can fail even if
we have ->irq_set_affinity() (if __ioapic_set_affinity() fails, for example).
Why don't we complain in that case? I think we should, and if it's serious
enough, abort the hotplug operation or at least indicate that the offline failed.
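
As a rough illustration of what I mean by complaining in both cases, here is
a small user-space sketch. It is a model only: chip_model,
migrate_irq_model(), fixup_irqs_model() and the two sample callbacks are
hypothetical names, and the error codes are arbitrary choices for the model
rather than what the kernel returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct chip_model {
	/* NULL models a chip that has no affinity callback at all. */
	int (*set_affinity)(unsigned int irq, unsigned long mask);
};

/* Sample callback that succeeds for any non-empty mask. */
static int set_affinity_ok(unsigned int irq, unsigned long mask)
{
	(void)irq;
	return mask ? 0 : -EINVAL;
}

/* Sample callback that always fails (error code arbitrary for the model). */
static int set_affinity_fail(unsigned int irq, unsigned long mask)
{
	(void)irq;
	(void)mask;
	return -ENOSPC;
}

/* Move one irq; report "no callback" and "callback failed" alike. */
static int migrate_irq_model(const struct chip_model *chip,
			     unsigned int irq, unsigned long mask)
{
	if (chip->set_affinity == NULL)
		return -ENODEV;
	return chip->set_affinity(irq, mask);
}

/* Try every irq and return how many could not be moved. */
static unsigned int fixup_irqs_model(const struct chip_model *chips,
				     unsigned int nr_irqs, unsigned long mask)
{
	unsigned int failed = 0;

	assert(chips != NULL);
	for (unsigned int irq = 0; irq < nr_irqs; irq++)
		if (migrate_irq_model(&chips[irq], irq, mask) != 0)
			failed++;
	return failed;
}
```

A caller could then treat a non-zero count as a reason to warn loudly or to
mark the offline as failed, instead of noticing only the missing-callback
case.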
> But that is not common.
>
>>
>> Suresh, I'd be grateful if you could kindly throw some light on these
>> issues... I'm actually debugging an issue where an offline CPU gets apic timer
>> interrupts (and in one case, I even saw a device interrupt), which I have
>> reported in another thread at: https://lkml.org/lkml/2012/9/26/119
>> But this issue in fixup_irqs() that Liu brought to light looks even more
>> surprising to me..
>
> These issues look different to me, will look into that.
>
Ok, thanks a lot!
Regards,
Srivatsa S. Bhat