All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	"woodhouse, Jacob Pan" <jacob.jun.pan@intel.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Stephane Eranian <eranian@google.com>,
	Ingo Molnar <mingo@kernel.org>, Borislav Petkov <bp@suse.de>,
	iommu@lists.linux-foundation.org, x86@kernel.org,
	linux-kernel@vger.kernel.org,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	Ricardo Neri <ricardo.neri@intel.com>,
	Andi Kleen <andi.kleen@intel.com>,
	David Woodhouse <dwmw2@infradead.org>
Subject: Re: [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt
Date: Thu, 13 May 2021 18:57:48 -0700	[thread overview]
Message-ID: <20210514015748.GA8236@ranerica-svr.sc.intel.com> (raw)
In-Reply-To: <87lf8uhzk9.ffs@nanos.tec.linutronix.de>

On Wed, May 05, 2021 at 01:03:18AM +0200, Thomas Gleixner wrote:
> On Tue, May 04 2021 at 12:10, Ricardo Neri wrote:

Thank you very much for your feedback, Thomas. I am sorry it took me a
while to reply to your email. I needed to digest and research your
comments.

> > In x86 there is not an IRQF_NMI flag that can be used to indicate the
> 
> There exists no IRQF_NMI flag at all. No architecture provides that.

Thank you for the clarification. I think I meant to say that there is a
request_nmi() function but AFAIK it is only used in the ARM PMU and
would not work on x86.

> 
> > delivery mode when requesting an interrupt (via request_irq()). Thus,
> > there is no way for the interrupt remapping driver to know and set
> > the delivery mode.
> 
> There is no support for this today. So what?

Using request_irq() plus a HPET quirk looked to me a reasonable
way to use the irqdomain hierarchy to allocate an interrupt with NMI as
the delivery mode.

> 
> > Hence, when allocating an interrupt, check if such interrupt belongs to
> > the HPET hardlockup detector and fixup the delivery mode accordingly.
> 
> What?
> 
> > +		/*
> > +		 * If we find the HPET hardlockup detector irq, fixup the
> > +		 * delivery mode.
> > +		 */
> > +		if (is_hpet_irq_hardlockup_detector(info))
> > +			irq_cfg->delivery_mode = APIC_DELIVERY_MODE_NMI;
> 
> Again. We are not sticking some random device checks into that
> code. It's wrong and I explained it to you before.
> 
>   https://lore.kernel.org/lkml/alpine.DEB.2.21.1906161042080.1760@nanos.tec.linutronix.de/
> 
> But I'm happy to repeat it again:
> 
>   "No. This is horrible hackery violating all the layering which we carefully
>    put into place to avoid exactly this kind of sprinkling conditionals into
>    all code pathes.
> 
>    With some thought the existing irqdomain hierarchy can be used to achieve
>    the same thing without tons of extra functions and conditionals."
> 
> So the outcome of thought and using the irqdomain hierarchy is:
> 
>    Replacing an hpet specific conditional in one place with an hpet
>    specific conditional in a different place.
> 
> Impressive.

I am sorry Thomas, I did try to make the quirk less hacky but I did not
think of the solution you provide below.

> 
> hpet_assign_irq(...., bool nmi)
>   init_info(info)
>     ...
>     if (nmi)
>         info.flags |= X86_IRQ_ALLOC_AS_NMI;
>   
>    irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info)
>      intel_irq_remapping_alloc(..., info)
>        irq_domain_alloc_irq_parents(..., info)
>          x86_vector_alloc_irqs(..., info)
>          {   
>            if (info->flags & X86_IRQ_ALLOC_AS_NMI && nr_irqs != 1)
>                   return -EINVAL;
> 
>            for (i = 0; i < nr_irqs; i++) {
>              ....
>              if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
>                  irq_cfg_setup_nmi(apicd);
>                  continue;
>              }
>              ...
>          }
> 
> irq_cfg_setup_nmi() sets irq_cfg->delivery_mode and whatever is required
> and everything else just works. Of course this needs a few other minor
> tweaks but none of those introduces random hpet quirks all over the
> place. Not convoluted enough, right?

Thanks for the detailed demonstration! It does seem cleaner than what I
implemented.

> 
> But that solves none of other problems. Let me summarize again which
> options or non-options we have:
> 
>     1) Selective IPIs from NMI context cannot work
> 
>        As explained in the other thread.
> 
>     2) Shorthand IPI allbutself from NMI
>     
>        This should work, but that obviously does not take the watchdog
>        cpumask into account.
> 
>        Also this only works when IPI shorthand mode is enabled. See
>        apic_smt_update() for details.
> 
>     3) Sending the IPIs from irq_work
> 
>        This would solve the problem, but if the CPU which is the NMI
>        target is really stuck in an interrupt disabled region then the
>        IPIs won't be sent.
> 
>        OTOH, if that's the case then the CPU which was processing the
>        NMI will continue to be stuck until the next NMI hits which
>        will detect that the CPU is stuck which is a good enough
>        reason to send a shorthand IPI to all CPUs ignoring the
>        watchdog cpumask.
> 
>        Same limitation vs. shorthand mode as #2
> 
>     4) Changing affinity of the HPET NMI from NMI
> 
>        As we established two years ago that cannot work with interrupt
>        remapping
> 
>     5) Changing affinity of the HPET NMI from irq_work
> 
>        Same issues as #3
> 
> Anything else than #2 is just causing more problems than it solves, but
> surely the NOHZ_FULL/isolation people might have opinions on this.
> 
> OTOH, as this is opt-in, anything which wants a watchdog mask which is
> not the full online set, has to accept that HPET has these restrictions.
> 
> And that's exactly what I suggested two years ago:
> 
>  https://lore.kernel.org/lkml/alpine.DEB.2.21.1906172343120.1963@nanos.tec.linutronix.de/
> 
>   "It definitely would be worthwhile to experiment with that. if we
>    could use shorthands (also for regular IPIs) that would be a great
>    improvement in general and would nicely solve that NMI issue. Beware
>    of the dragons though."
> 
> As a consequence of this conversation I implemented shorthand IPIs...
> 
> But I haven't seen any mentioning that this has been tried, why the
> approach was not chosen or any discussion about that matter.

Indeed, I focused on 5) and I overlooked your comment on using your
new support for shortand IPIs.

I'll go back and see to implement option #2, or perhaps the alternative
solution you proposed on a separate thread.

Thanks and BR,
Ricardo

WARNING: multiple messages have this Message-ID (diff)
From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	David Woodhouse <dwmw2@infradead.org>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	Stephane Eranian <eranian@google.com>,
	Ricardo Neri <ricardo.neri@intel.com>,
	iommu@lists.linux-foundation.org, "woodhouse,
	Jacob Pan" <jacob.jun.pan@intel.com>,
	Andi Kleen <andi.kleen@intel.com>, Borislav Petkov <bp@suse.de>,
	Will Deacon <will@kernel.org>, Ingo Molnar <mingo@kernel.org>
Subject: Re: [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt
Date: Thu, 13 May 2021 18:57:48 -0700	[thread overview]
Message-ID: <20210514015748.GA8236@ranerica-svr.sc.intel.com> (raw)
In-Reply-To: <87lf8uhzk9.ffs@nanos.tec.linutronix.de>

On Wed, May 05, 2021 at 01:03:18AM +0200, Thomas Gleixner wrote:
> On Tue, May 04 2021 at 12:10, Ricardo Neri wrote:

Thank you very much for your feedback, Thomas. I am sorry it took me a
while to reply to your email. I needed to digest and research your
comments.

> > In x86 there is not an IRQF_NMI flag that can be used to indicate the
> 
> There exists no IRQF_NMI flag at all. No architecture provides that.

Thank you for the clarification. I think I meant to say that there is a
request_nmi() function but AFAIK it is only used in the ARM PMU and
would not work on x86.

> 
> > delivery mode when requesting an interrupt (via request_irq()). Thus,
> > there is no way for the interrupt remapping driver to know and set
> > the delivery mode.
> 
> There is no support for this today. So what?

Using request_irq() plus a HPET quirk looked to me a reasonable
way to use the irqdomain hierarchy to allocate an interrupt with NMI as
the delivery mode.

> 
> > Hence, when allocating an interrupt, check if such interrupt belongs to
> > the HPET hardlockup detector and fixup the delivery mode accordingly.
> 
> What?
> 
> > +		/*
> > +		 * If we find the HPET hardlockup detector irq, fixup the
> > +		 * delivery mode.
> > +		 */
> > +		if (is_hpet_irq_hardlockup_detector(info))
> > +			irq_cfg->delivery_mode = APIC_DELIVERY_MODE_NMI;
> 
> Again. We are not sticking some random device checks into that
> code. It's wrong and I explained it to you before.
> 
>   https://lore.kernel.org/lkml/alpine.DEB.2.21.1906161042080.1760@nanos.tec.linutronix.de/
> 
> But I'm happy to repeat it again:
> 
>   "No. This is horrible hackery violating all the layering which we carefully
>    put into place to avoid exactly this kind of sprinkling conditionals into
>    all code pathes.
> 
>    With some thought the existing irqdomain hierarchy can be used to achieve
>    the same thing without tons of extra functions and conditionals."
> 
> So the outcome of thought and using the irqdomain hierarchy is:
> 
>    Replacing an hpet specific conditional in one place with an hpet
>    specific conditional in a different place.
> 
> Impressive.

I am sorry Thomas, I did try to make the quirk less hacky but I did not
think of the solution you provide below.

> 
> hpet_assign_irq(...., bool nmi)
>   init_info(info)
>     ...
>     if (nmi)
>         info.flags |= X86_IRQ_ALLOC_AS_NMI;
>   
>    irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info)
>      intel_irq_remapping_alloc(..., info)
>        irq_domain_alloc_irq_parents(..., info)
>          x86_vector_alloc_irqs(..., info)
>          {   
>            if (info->flags & X86_IRQ_ALLOC_AS_NMI && nr_irqs != 1)
>                   return -EINVAL;
> 
>            for (i = 0; i < nr_irqs; i++) {
>              ....
>              if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
>                  irq_cfg_setup_nmi(apicd);
>                  continue;
>              }
>              ...
>          }
> 
> irq_cfg_setup_nmi() sets irq_cfg->delivery_mode and whatever is required
> and everything else just works. Of course this needs a few other minor
> tweaks but none of those introduces random hpet quirks all over the
> place. Not convoluted enough, right?

Thanks for the detailed demonstration! It does seem cleaner than what I
implemented.

> 
> But that solves none of other problems. Let me summarize again which
> options or non-options we have:
> 
>     1) Selective IPIs from NMI context cannot work
> 
>        As explained in the other thread.
> 
>     2) Shorthand IPI allbutself from NMI
>     
>        This should work, but that obviously does not take the watchdog
>        cpumask into account.
> 
>        Also this only works when IPI shorthand mode is enabled. See
>        apic_smt_update() for details.
> 
>     3) Sending the IPIs from irq_work
> 
>        This would solve the problem, but if the CPU which is the NMI
>        target is really stuck in an interrupt disabled region then the
>        IPIs won't be sent.
> 
>        OTOH, if that's the case then the CPU which was processing the
>        NMI will continue to be stuck until the next NMI hits which
>        will detect that the CPU is stuck which is a good enough
>        reason to send a shorthand IPI to all CPUs ignoring the
>        watchdog cpumask.
> 
>        Same limitation vs. shorthand mode as #2
> 
>     4) Changing affinity of the HPET NMI from NMI
> 
>        As we established two years ago that cannot work with interrupt
>        remapping
> 
>     5) Changing affinity of the HPET NMI from irq_work
> 
>        Same issues as #3
> 
> Anything else than #2 is just causing more problems than it solves, but
> surely the NOHZ_FULL/isolation people might have opinions on this.
> 
> OTOH, as this is opt-in, anything which wants a watchdog mask which is
> not the full online set, has to accept that HPET has these restrictions.
> 
> And that's exactly what I suggested two years ago:
> 
>  https://lore.kernel.org/lkml/alpine.DEB.2.21.1906172343120.1963@nanos.tec.linutronix.de/
> 
>   "It definitely would be worthwhile to experiment with that. if we
>    could use shorthands (also for regular IPIs) that would be a great
>    improvement in general and would nicely solve that NMI issue. Beware
>    of the dragons though."
> 
> As a consequence of this conversation I implemented shorthand IPIs...
> 
> But I haven't seen any mentioning that this has been tried, why the
> approach was not chosen or any discussion about that matter.

Indeed, I focused on 5) and I overlooked your comment on using your
new support for shortand IPIs.

I'll go back and see to implement option #2, or perhaps the alternative
solution you proposed on a separate thread.

Thanks and BR,
Ricardo
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

  reply	other threads:[~2021-05-14  1:58 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-04 19:10 [RFC PATCH v5 0/7] x86: watchdog/hardlockup/hpet: Add support for interrupt remapping Ricardo Neri
2021-05-04 19:10 ` Ricardo Neri
2021-05-04 19:10 ` [RFC PATCH v5 1/7] x86/apic: Add irq_cfg::delivery_mode Ricardo Neri
2021-05-04 19:10   ` Ricardo Neri
2021-05-04 19:10 ` [RFC PATCH v5 2/7] x86/hpet: Introduce function to identify HPET hardlockup detector irq Ricardo Neri
2021-05-04 19:10   ` Ricardo Neri
2021-05-04 19:10 ` [RFC PATCH v5 3/7] iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode Ricardo Neri
2021-05-04 19:10   ` Ricardo Neri
2021-05-04 19:10 ` [RFC PATCH v5 4/7] iommu/amd: Set the IRTE delivery mode from irq_cfg Ricardo Neri
2021-05-04 19:10   ` Ricardo Neri
2021-05-04 19:10 ` [RFC PATCH v5 5/7] iommu/vt-d: Fixup delivery mode of the HPET hardlockup interrupt Ricardo Neri
2021-05-04 19:10   ` Ricardo Neri
2021-05-04 23:03   ` Thomas Gleixner
2021-05-04 23:03     ` Thomas Gleixner
2021-05-14  1:57     ` Ricardo Neri [this message]
2021-05-14  1:57       ` Ricardo Neri
2021-05-14 21:31   ` Thomas Gleixner
2021-05-14 21:31     ` Thomas Gleixner
2021-05-04 19:10 ` [RFC PATCH v5 6/7] iommu/amd: " Ricardo Neri
2021-05-04 19:10   ` Ricardo Neri
2021-05-04 19:10 ` [RFC PATCH v5 7/7] x86/watchdog/hardlockup/hpet: Support interrupt remapping Ricardo Neri
2021-05-04 19:10   ` Ricardo Neri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210514015748.GA8236@ranerica-svr.sc.intel.com \
    --to=ricardo.neri-calderon@linux.intel.com \
    --cc=andi.kleen@intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bp@suse.de \
    --cc=dwmw2@infradead.org \
    --cc=eranian@google.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jacob.jun.pan@intel.com \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=ricardo.neri@intel.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.