linux-kernel.vger.kernel.org archive mirror
From: Thomas Gleixner <tglx@linutronix.de>
To: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>,
	Ingo Molnar <mingo@kernel.org>, Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Ashok Raj <ashok.raj@intel.com>,
	Andi Kleen <ak@linux.intel.com>, Tony Luck <tony.luck@intel.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	"Peter Zijlstra \(Intel\)" <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Stephane Eranian <eranian@google.com>,
	Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	Ricardo Neri <ricardo.neri@intel.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	Ricardo Neri <ricardo.neri-calderon@linux.intel.com>,
	Andi Kleen <andi.kleen@intel.com>
Subject: Re: [RFC PATCH v5 07/16] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
Date: Tue, 04 May 2021 22:53:46 +0200
Message-ID: <87o8dqi5k5.ffs@nanos.tec.linutronix.de>
In-Reply-To: <20210504190526.22347-8-ricardo.neri-calderon@linux.intel.com>

Ricardo,

On Tue, May 04 2021 at 12:05, Ricardo Neri wrote:
> +static int hardlockup_detector_nmi_handler(unsigned int type,
> +					   struct pt_regs *regs)
> +{
> +	struct hpet_hld_data *hdata = hld_data;
> +	int cpu = smp_processor_id();
> +
> +	if (is_hpet_wdt_interrupt(hdata)) {
> +		/*
> +		 * Make a copy of the target mask. We need this as once a CPU
> +		 * gets the watchdog NMI it will clear itself from ipi_cpumask.
> +		 * Also, target_cpumask will be updated in a workqueue for the
> +		 * next NMI IPI.
> +		 */
> +		cpumask_copy(hld_data->ipi_cpumask, hld_data->monitored_cpumask);
> +		/*
> +		 * Even though the NMI IPI will be sent to all CPUs but self,
> +		 * clear the CPU to identify a potential unrelated NMI.
> +		 */
> +		cpumask_clear_cpu(cpu, hld_data->ipi_cpumask);
> +		if (cpumask_weight(hld_data->ipi_cpumask))
> +			apic->send_IPI_mask_allbutself(hld_data->ipi_cpumask,
> +						       NMI_VECTOR);

How is this supposed to work correctly?

x2apic_cluster:
 x2apic_send_IPI_mask_allbutself()
  __x2apic_send_IPI_mask()
	tmpmsk = this_cpu_cpumask_var_ptr(ipi_mask);
	cpumask_copy(tmpmsk, mask);

So if an NMI hits right after or in the middle of the cpumask_copy(),
then the IPI sent from that NMI overwrites tmpmsk, and when it's done
tmpmsk is empty. The same happens when the NMI hits in the middle of
processing, just with the difference that maybe a few IPIs have been
sent already. But the not yet sent ones are lost...
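
The scratch-mask problem can be reproduced in a minimal userspace sketch.
Everything in it is a hypothetical stand-in, not kernel code: a 64-bit word
models the per-CPU tmpmsk, a function pointer models the NMI hitting right
after the copy, and the per-cluster destination collapsing of the real
__x2apic_send_IPI_mask() is reduced to one "IPI" per set bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one word stands in for the per-CPU scratch cpumask. */
static uint64_t scratch_mask;
/* Records every CPU bit for which an "IPI" was actually sent. */
static uint64_t delivered;
/* Simulated NMI, fired once right after the copy into scratch_mask. */
static void (*pending_nmi)(void);

static void fake_send_ipi_mask(uint64_t mask)
{
	/* Models cpumask_copy(tmpmsk, mask). */
	scratch_mask = mask;

	/* Models an NMI arriving between the copy and the send loop. */
	if (pending_nmi) {
		void (*nmi)(void) = pending_nmi;

		pending_nmi = NULL;	/* fire only once */
		nmi();
	}

	/* Send one "IPI" per set bit and clear it from the scratch mask. */
	while (scratch_mask) {
		uint64_t bit = scratch_mask & (~scratch_mask + 1);

		delivered |= bit;
		scratch_mask &= ~bit;
	}
}

/* The NMI handler sends its own IPI through the same scratch mask. */
static void nmi_handler_sends_ipi(void)
{
	fake_send_ipi_mask(0x30);
}

/* Send to CPUs 0-3, optionally interrupted by the simulated NMI. */
static uint64_t run_scenario(int with_nmi)
{
	scratch_mask = 0;
	delivered = 0;
	pending_nmi = with_nmi ? nmi_handler_sends_ipi : NULL;
	fake_send_ipi_mask(0x0f);
	return delivered;
}
```

Without the simulated NMI, run_scenario(0) returns 0x0f. With it,
run_scenario(1) returns 0x30: the nested sender drains the shared scratch
mask, so the interrupted caller finds it empty and none of its own targets
get an IPI.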

Also anything which ends up in __default_send_IPI_dest_field() is
borked:

__default_send_IPI_dest_field()
	cfg = __prepare_ICR2(mask);
	native_apic_mem_write(APIC_ICR2, cfg);

          <- NMI hits and invokes IPI which invokes __default_send_IPI_dest_field()...

	cfg = __prepare_ICR(0, vector, dest);
	native_apic_mem_write(APIC_ICR, cfg);
        
IOW, when the NMI returns, ICR2 has been overwritten and the interrupted
IPI goes into lala land.
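
The ICR2/ICR sequence can be sketched the same way in userspace. This is an
illustration under stated assumptions, not the kernel implementation:
fake_icr2 stands in for the APIC_ICR2 register, the destination is stored as
a plain number rather than in the real register layout, the vector values
are illustrative, and a function pointer models the NMI arriving between the
two register writes.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NMI_VECTOR	0x02u	/* illustrative vector for the NMI IPI */
#define DEMO_IPI_VECTOR	0xfbu	/* hypothetical vector of the interrupted IPI */

struct sent_ipi {
	uint32_t dest;
	uint32_t vector;
};

/* Hypothetical model of the destination register (APIC_ICR2). */
static uint32_t fake_icr2;
/* Log of what the fake APIC actually sent, in order. */
static struct sent_ipi ipi_log[4];
static size_t ipi_count;
/* Simulated NMI, fired once between the ICR2 and ICR writes. */
static void (*pending_nmi)(void);

static void fake_send_ipi_dest_field(uint32_t dest, uint32_t vector)
{
	/* Step 1: program the destination (models the APIC_ICR2 write). */
	fake_icr2 = dest;

	/* Models an NMI hitting between the two register writes. */
	if (pending_nmi) {
		void (*nmi)(void) = pending_nmi;

		pending_nmi = NULL;	/* fire only once */
		nmi();
	}

	/*
	 * Step 2: the ICR write triggers the send to whatever
	 * destination ICR2 holds at this point.
	 */
	if (ipi_count < sizeof(ipi_log) / sizeof(ipi_log[0])) {
		ipi_log[ipi_count].dest = fake_icr2;
		ipi_log[ipi_count].vector = vector;
		ipi_count++;
	}
}

/* The NMI handler sends its own IPI to destination 2. */
static void nmi_handler_sends_ipi(void)
{
	fake_send_ipi_dest_field(2, DEMO_NMI_VECTOR);
}

/* Send an IPI to destination 1, optionally interrupted by the NMI. */
static size_t simulate_icr(int with_nmi)
{
	fake_icr2 = 0;
	ipi_count = 0;
	pending_nmi = with_nmi ? nmi_handler_sends_ipi : NULL;
	fake_send_ipi_dest_field(1, DEMO_IPI_VECTOR);
	return ipi_count;
}
```

In the uninterrupted case the log holds a single entry for destination 1.
With the simulated NMI, both logged sends go to destination 2: the NMI's own
IPI, then the interrupted IPI with its original vector but the clobbered
destination. Destination 1 never receives anything.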

Thanks,

        tglx

Thread overview: 18+ messages
2021-05-04 19:05 [RFC PATCH v5 00/16] x86: Implement an HPET-based hardlockup detector Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 01/16] x86/hpet: Expose hpet_writel() in header Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 02/16] x86/hpet: Add helper function hpet_set_comparator_periodic() Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 03/16] x86/hpet: Reserve an HPET channel for the hardlockup detector Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 04/16] watchdog/hardlockup: Define a generic function to detect hardlockups Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 05/16] watchdog/hardlockup: Decouple the hardlockup detector from perf Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 06/16] x86/nmi: Add an NMI_WATCHDOG NMI handler category Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 07/16] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector Ricardo Neri
2021-05-04 20:53   ` Thomas Gleixner [this message]
2021-05-04 19:05 ` [RFC PATCH v5 08/16] x86/watchdog/hardlockup/hpet: Introduce a target_cpumask Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 09/16] watchdog/hardlockup/hpet: Group packages receiving IPIs when needed Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 10/16] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored groups Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 11/16] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 12/16] watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog" Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 13/16] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 14/16] x86/watchdog: Add a shim hardlockup detector Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 15/16] watchdog: Expose lockup_detector_reconfigure() Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 16/16] x86/tsc: Switch to perf-based hardlockup detector if TSC become unstable Ricardo Neri
