linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org, Tony Luck <tony.luck@intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	Stephane Eranian <eranian@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>,
	David Woodhouse <dwmw2@infradead.org>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	Ricardo Neri <ricardo.neri@intel.com>,
	iommu@lists.linux-foundation.org, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 05/29] x86/apic/vector: Do not allocate vectors for NMIs
Date: Fri, 13 May 2022 11:03:20 -0700	[thread overview]
Message-ID: <20220513180320.GA22683@ranerica-svr.sc.intel.com> (raw)
In-Reply-To: <87zgjufjrf.ffs@tglx>

On Fri, May 06, 2022 at 11:12:20PM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> > Vectors are meaningless when allocating IRQs with NMI as the delivery
> > mode.
> 
> Vectors are not meaningless. NMI has a fixed vector.
> 
> The point is that for a fixed vector there is no vector management
> required.
> 
> Can you spot the difference?

Yes, I see your point now. Thank you for the explanation.

> 
> > In such case, skip the reservation of IRQ vectors. Do it in the lowest-
> > level functions where the actual IRQ reservation takes place.
> >
> > Since NMIs target specific CPUs, keep the functionality to find the best
> > CPU.
> 
> Again. What for?
>   
> > +	if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI) {
> > +		cpu = irq_matrix_find_best_cpu(vector_matrix, dest);
> > +		apicd->cpu = cpu;
> > +		vector = 0;
> > +		goto no_vector;
> > +	}
> 
> Why would a NMI ever end up in this code? There is no vector management
> required and this find cpu exercise is pointless.

But even if the NMI has a fixed vector, it is still necessary to determine
which CPU will get the NMI. It is still necessary to determine what to
write in the Destination ID field of the MSI message.

irq_matrix_find_best_cpu() would find the CPU with the lowest number of
managed vectors so that the NMI is directed to that CPU. 

In today's code, an NMI would end up here because we rely on the existing
interrupt management infrastructure... Unless, the check is done the entry
points as you propose.

> 
> >  	vector = irq_matrix_alloc(vector_matrix, dest, resvd, &cpu);
> >  	trace_vector_alloc(irqd->irq, vector, resvd, vector);
> >  	if (vector < 0)
> >  		return vector;
> >  	apic_update_vector(irqd, vector, cpu);
> > +
> > +no_vector:
> >  	apic_update_irq_cfg(irqd, vector, cpu);
> >  
> >  	return 0;
> > @@ -321,12 +330,22 @@ assign_managed_vector(struct irq_data *irqd, const struct cpumask *dest)
> >  	/* set_affinity might call here for nothing */
> >  	if (apicd->vector && cpumask_test_cpu(apicd->cpu, vector_searchmask))
> >  		return 0;
> > +
> > +	if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI) {
> > +		cpu = irq_matrix_find_best_cpu_managed(vector_matrix, dest);
> > +		apicd->cpu = cpu;
> > +		vector = 0;
> > +		goto no_vector;
> > +	}
> 
> This code can never be reached for a NMI delivery. If so, then it's a
> bug.
> 
> This all is special purpose for that particular HPET NMI watchdog use
> case and we are not exposing this to anything else at all.
> 
> So why are you sprinkling this NMI nonsense all over the place? Just
> because? There are well defined entry points to all of this where this
> can be fenced off.

I put the NMI checks in these points because assign_vector_locked() and
assign_managed_vector() are reached through multiple paths and these are
the two places where the allocation of the vector is requested and the
destination CPU is determined.

I do observe this code being reached for an NMI, but that is because this
code still does not know about NMIs... Unless the checks for NMI are put
in the entry points as you pointed out.

The intent was to refactor the code in a generic manner and not to focus
only in the NMI watchdog. That would have looked hacky IMO.

> 
> If at some day the hardware people get their act together and provide a
> proper vector space for NMIs then we have to revisit that, but then
> there will be a separate NMI vector management too.
> 
> What you want is the below which also covers the next patch. Please keep
> an eye on the comments I added/modified.

Thank you for the code and the clarifying comments.
> 
> Thanks,
> 
>         tglx
> ---
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -42,6 +42,7 @@ EXPORT_SYMBOL_GPL(x86_vector_domain);
>  static DEFINE_RAW_SPINLOCK(vector_lock);
>  static cpumask_var_t vector_searchmask;
>  static struct irq_chip lapic_controller;
> +static struct irq_chip lapic_nmi_controller;
>  static struct irq_matrix *vector_matrix;
>  #ifdef CONFIG_SMP
>  static DEFINE_PER_CPU(struct hlist_head, cleanup_list);
> @@ -451,6 +452,10 @@ static int x86_vector_activate(struct ir
>  	trace_vector_activate(irqd->irq, apicd->is_managed,
>  			      apicd->can_reserve, reserve);
>  
> +	/* NMI has a fixed vector. No vector management required */
> +	if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI)
> +		return 0;
> +
>  	raw_spin_lock_irqsave(&vector_lock, flags);
>  	if (!apicd->can_reserve && !apicd->is_managed)
>  		assign_irq_vector_any_locked(irqd);
> @@ -472,6 +477,10 @@ static void vector_free_reserved_and_man
>  	trace_vector_teardown(irqd->irq, apicd->is_managed,
>  			      apicd->has_reserved);
>  
> +	/* NMI has a fixed vector. No vector management required */
> +	if (apicd->hw_irq_cfg.delivery_mode == APIC_DELIVERY_MODE_NMI)
> +		return;
> +
>  	if (apicd->has_reserved)
>  		irq_matrix_remove_reserved(vector_matrix);
>  	if (apicd->is_managed)
> @@ -568,6 +577,24 @@ static int x86_vector_alloc_irqs(struct
>  		irqd->hwirq = virq + i;
>  		irqd_set_single_target(irqd);
>  
> +		if (info->flags & X86_IRQ_ALLOC_AS_NMI) {
> +			/*
> +			 * NMIs have a fixed vector and need their own
> +			 * interrupt chip so nothing can end up in the
> +			 * regular local APIC management code except the
> +			 * MSI message composing callback.
> +			 */
> +			irqd->chip = &lapic_nmi_controller;
> +			/*
> +			 * Don't allow affinity setting attempts for NMIs.
> +			 * This cannot work with the regular affinity
> +			 * mechanisms and for the intended HPET NMI
> +			 * watchdog use case it's not required.

But we do need the ability to set affinity, right? As stated above, we need
to know what Destination ID to write in the MSI message or in the interrupt
remapping table entry.

It cannot be any CPU because only one specific CPU is supposed to handle the
NMI from the HPET channel.

We cannot hard-code a CPU for that because it may go offline (and ignore NMIs)
or not be part of the monitored CPUs.

Also, if lapic_nmi_controller.irq_set_affinity() is NULL, then irq_chips
INTEL-IR, AMD-IR, those using msi_domain_set_affinity() need to check for NULL.
They currently unconditionally call the parent irq_chip's irq_set_affinity().
I see that there is a irq_chip_set_affinity_parent() function. Perhaps it can
be used for this check?

> +			 */
> +			irqd_set_no_balance(irqd);

This code does not set apicd->hw_irq_cfg.delivery_mode as NMI, right?
I had to add that to make it work.

> +			continue;
> +		}
> +
>  		/*
>  		 * Prevent that any of these interrupts is invoked in
>  		 * non interrupt context via e.g. generic_handle_irq()
> @@ -921,6 +948,11 @@ static struct irq_chip lapic_controller
>  	.irq_retrigger		= apic_retrigger_irq,
>  };
>  
> +static struct irq_chip lapic_nmi_controller = {
> +	.name			= "APIC-NMI",
> +	.irq_compose_msi_msg	= x86_vector_msi_compose_msg,
> +};
> +
>  #ifdef CONFIG_SMP
>  
>  static void free_moved_vector(struct apic_chip_data *apicd)
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -261,6 +261,11 @@ static inline bool irqd_is_per_cpu(struc
>  	return __irqd_to_state(d) & IRQD_PER_CPU;
>  }
>  
> +static inline void irqd_set_no_balance(struct irq_data *d)
> +{
> +	__irqd_to_state(d) |= IRQD_NO_BALANCING;
> +}
> +
>  static inline bool irqd_can_balance(struct irq_data *d)
>  {
>  	return !(__irqd_to_state(d) & (IRQD_PER_CPU | IRQD_NO_BALANCING));

Thanks and BR,
Ricardo

  reply	other threads:[~2022-05-13 17:59 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-05 23:59 [PATCH v6 00/29] x86: Implement an HPET-based hardlockup detector Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 01/29] irq/matrix: Expose functions to allocate the best CPU for new vectors Ricardo Neri
2022-05-06 19:48   ` Thomas Gleixner
2022-05-12  0:09     ` Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 02/29] x86/apic: Add irq_cfg::delivery_mode Ricardo Neri
2022-05-06 19:53   ` Thomas Gleixner
2022-05-12  0:26     ` Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 03/29] x86/apic/msi: Set the delivery mode individually for each IRQ Ricardo Neri
2022-05-06 20:05   ` Thomas Gleixner
2022-05-12  0:38     ` Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 04/29] x86/apic: Add the X86_IRQ_ALLOC_AS_NMI irq allocation flag Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 05/29] x86/apic/vector: Do not allocate vectors for NMIs Ricardo Neri
2022-05-06 21:12   ` Thomas Gleixner
2022-05-13 18:03     ` Ricardo Neri [this message]
2022-05-13 20:50       ` Thomas Gleixner
2022-05-13 23:45         ` Ricardo Neri
2022-05-14  8:15           ` Thomas Gleixner
2022-05-05 23:59 ` [PATCH v6 06/29] x86/apic/vector: Implement support for NMI delivery mode Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 07/29] iommu/vt-d: Clear the redirection hint when the destination mode is physical Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 08/29] iommu/vt-d: Rework prepare_irte() to support per-IRQ delivery mode Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 09/29] iommu/vt-d: Set the IRTE delivery mode individually for each IRQ Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 10/29] iommu/vt-d: Implement minor tweaks for NMI irqs Ricardo Neri
2022-05-06 21:23   ` Thomas Gleixner
2022-05-13 18:07     ` Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 11/29] iommu/amd: Expose [set|get]_dev_entry_bit() Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 12/29] iommu/amd: Enable NMIPass when allocating an NMI irq Ricardo Neri
2022-05-06 21:26   ` Thomas Gleixner
2022-05-13 19:01     ` Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 13/29] iommu/amd: Compose MSI messages for NMI irqs in non-IR format Ricardo Neri
2022-05-06 21:31   ` Thomas Gleixner
2022-05-13 19:03     ` Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 14/29] x86/hpet: Expose hpet_writel() in header Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic() Ricardo Neri
2022-05-06 21:41   ` Thomas Gleixner
2022-05-06 21:51     ` Thomas Gleixner
2022-05-13 21:29       ` Ricardo Neri
2022-05-13 21:19     ` Ricardo Neri
2022-05-14  8:17       ` Thomas Gleixner
2022-05-17 22:54         ` Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 16/29] x86/hpet: Prepare IRQ assignments to use the X86_ALLOC_AS_NMI flag Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 17/29] x86/hpet: Reserve an HPET channel for the hardlockup detector Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 18/29] watchdog/hardlockup: Define a generic function to detect hardlockups Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 19/29] watchdog/hardlockup: Decouple the hardlockup detector from perf Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 20/29] init/main: Delay initialization of the lockup detector after smp_init() Ricardo Neri
2022-05-10 10:38   ` Nicholas Piggin
2022-05-13 23:16     ` Ricardo Neri
2022-05-20  0:25       ` Nicholas Piggin
2022-05-06  0:00 ` [PATCH v6 21/29] x86/nmi: Add an NMI_WATCHDOG NMI handler category Ricardo Neri
2022-05-09 13:59   ` Thomas Gleixner
2022-05-17 18:41     ` Ricardo Neri
2022-05-06  0:00 ` [PATCH v6 22/29] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector Ricardo Neri
2022-05-09 14:03   ` Thomas Gleixner
2022-05-13 22:16     ` Ricardo Neri
2022-05-14 14:04       ` Thomas Gleixner
2022-05-06  0:00 ` [PATCH v6 23/29] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI Ricardo Neri
2022-05-06  0:00 ` [PATCH v6 24/29] watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog" Ricardo Neri
2022-05-10 10:46   ` Nicholas Piggin
2022-05-13 23:17     ` Ricardo Neri
2022-05-06  0:00 ` [PATCH v6 25/29] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter Ricardo Neri
2022-05-06  0:00 ` [PATCH v6 26/29] x86/watchdog: Add a shim hardlockup detector Ricardo Neri
2022-05-06  0:00 ` [PATCH v6 27/29] watchdog: Expose lockup_detector_reconfigure() Ricardo Neri
2022-05-06  0:00 ` [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz Ricardo Neri
2022-05-10 11:16   ` Nicholas Piggin
2022-05-10 11:44     ` Thomas Gleixner
2022-05-17 22:53       ` Ricardo Neri
2022-05-17 22:08     ` Ricardo Neri
2022-05-06  0:00 ` [PATCH v6 29/29] x86/tsc: Switch to perf-based hardlockup detector if TSC become unstable Ricardo Neri
2022-05-10 12:14   ` Nicholas Piggin
2022-05-17  3:09     ` Ricardo Neri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220513180320.GA22683@ranerica-svr.sc.intel.com \
    --to=ricardo.neri-calderon@linux.intel.com \
    --cc=Suravee.Suthikulpanit@amd.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolu.lu@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=eranian@google.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=npiggin@gmail.com \
    --cc=ravi.v.shankar@intel.com \
    --cc=ricardo.neri@intel.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).