From: "Tian, Kevin"
To: Thomas Gleixner
CC: "linux-kernel@vger.kernel.org", "mingo@redhat.com", "hpa@zytor.com", Ian Campbell, "JBeulich@novell.com", "xen-devel@lists.xensource.com"
Date: Fri, 6 May 2011 20:54:39 +0800
Subject: RE: [PATCH v2 2/2] x86: don't unmask disabled irqs when migrating them
Message-ID: <625BA99ED14B2D499DC4E29D8138F1505C8ED7F962@shsmsx502.ccr.corp.intel.com>
References: <625BA99ED14B2D499DC4E29D8138F1505C8ED7F7E3@shsmsx502.ccr.corp.intel.com>

> From: Thomas Gleixner
> Sent: Friday, May 06, 2011 6:00 PM
>
> On Fri, 6 May 2011, Tian, Kevin wrote:
> > x86: don't unmask disabled irqs when migrating them
> >
> > it doesn't make sense to mask/unmask a disabled irq when migrating it
> > from offlined cpu to another, because it's not expected to handle any
> > instance of it. Current mask/set_affinity/unmask steps may trigger
> > unexpected instance on disabled irq which then simply bug on when
> > there is no handler for it. One failing example is observed in Xen.
> > Xen pvops
>
> So there is no handler, why the heck is there an irq action?
>
>         if (!irq_has_action(irq) ....
>                 continue;
>
> Should have caught an uninitialized interrupt. If Xen abuses interrupts that way,
> then it rightfully explodes. And we do not fix it by magic somewhere else.

Sorry for my bad description here. There is a dummy handler registered on
such irqs, which simply hits a BUG_ON when invoked. I should have said that
such an injection is not expected, rather than that there is no handler. :-)

>
> > guest marks a special type of irqs as disabled, which are simply used
>
> As I explained before several times, IRQF_DISABLED has absolutely nothing to
> do with it and pvops _CANNOT_ mark an interrupt disabled.

I have to admit that I need to study the whole interrupt subsystem more to
better understand your explanation here. Again, my description was not
accurate enough. I meant that Xen pvops requests the special irq with the
flags below:
	IRQF_DISABLED|IRQF_PERCPU|IRQF_NOBALANCING
and then later explicitly disables it with disable_irq(). As you said,
IRQF_DISABLED itself has nothing to do with it; it's the later disable_irq()
that takes real effect, because the Xen event chip hooks this callback to mask
the irq at the chip level.

>
> >
> >  	chip = irq_data_get_irq_chip(data);
> > -	if (!irqd_can_move_in_process_context(data) && chip->irq_mask)
> > +	do_mask = !irqd_irq_disabled(data) &&
> > +		!irqd_can_move_in_process_context(data) && chip->irq_mask;
> > +	if (do_mask)
> >  		chip->irq_mask(data);
>
> This is completely wrong. irqd_irq_disabled() is a status information which does
> not tell you whether the interrupt is actually masked at the hardware level
> because we do lazy interrupt hardware masking. So your change would keep
> the line unmasked at the hardware level for all interrupts which are in the lazy
> disabled state.

Got it.

>
> The only conditional which is interesting is the unmask path and that's a simple
> optimization and not a correctness problem.
>

So what's your suggestion based on my updated information? Is there any
interface I can use to differentiate the above exception from the normal case?
Basically, in the Xen usage we want such irqs permanently disabled at the chip
level. Or could we do the mask/unmask only for irqs which are currently
unmasked, if, as you said, it's just an optimization step? :-)

Thanks
Kevin
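
To make the two failure modes discussed above concrete, here is a minimal userspace sketch in C. It is a toy model, not kernel code: every toy_* name is hypothetical, and the only things taken from the thread are the mask/set_affinity/unmask sequence, the idea that a "disabled" software state need not match the hardware mask bit, and the statement that the Xen event chip masks at chip level on disable_irq(). It also assumes the do_mask flag from the patch guards the unmask as well, which the quoted hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy userspace model. Every toy_* name is hypothetical, not a kernel API. */
struct toy_irq {
	bool sw_disabled;	/* software state, the kind of thing irqd_irq_disabled() reports */
	bool hw_masked;		/* mask bit at the simulated chip */
};

/* Lazy disable: only the software state changes, the chip is left alone. */
static void toy_disable_lazy(struct toy_irq *irq)
{
	irq->sw_disabled = true;
}

/* Disable that also masks at chip level, as the mail describes for the Xen event chip. */
static void toy_disable_and_mask(struct toy_irq *irq)
{
	irq->sw_disabled = true;
	irq->hw_masked = true;
}

/*
 * Unconditional mask / set affinity / unmask.
 * Returns true if the line was masked while the affinity changed.
 */
static bool toy_migrate_unconditional(struct toy_irq *irq)
{
	bool masked_during_move;

	irq->hw_masked = true;
	masked_during_move = irq->hw_masked;	/* affinity change would happen here */
	irq->hw_masked = false;			/* unmask regardless of earlier state */
	return masked_during_move;
}

/*
 * Variant that skips mask and unmask when the software state says "disabled".
 * Assumption: the same flag guards the unmask; the quoted hunk shows only the mask.
 */
static bool toy_migrate_skip_sw_disabled(struct toy_irq *irq)
{
	bool do_mask = !irq->sw_disabled;
	bool masked_during_move;

	if (do_mask)
		irq->hw_masked = true;
	masked_during_move = irq->hw_masked;	/* affinity change would happen here */
	if (do_mask)
		irq->hw_masked = false;
	return masked_during_move;
}

/* Runs both scenarios and prints the resulting state. */
static void toy_demo(void)
{
	struct toy_irq xen = { false, false };
	struct toy_irq lazy = { false, false };
	bool masked;

	toy_disable_and_mask(&xen);
	toy_migrate_unconditional(&xen);
	printf("chip-masked irq after unconditional migrate: hw_masked=%d\n",
	       xen.hw_masked);

	toy_disable_lazy(&lazy);
	masked = toy_migrate_skip_sw_disabled(&lazy);
	printf("lazy-disabled irq masked during skipped migrate: %d\n", masked);
	assert(lazy.sw_disabled);
}
```

Calling toy_demo() from a main() walks through both cases. In the first, an irq that was masked at chip level comes out of the unconditional sequence unmasked, which is the situation the patch was trying to avoid. In the second, a lazily disabled irq is never masked during the move, which is the objection raised against checking the software state alone.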