From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
To: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>, Borislav Petkov <bp@suse.de>
Cc: Ashok Raj <ashok.raj@intel.com>, Joerg Roedel <joro@8bytes.org>,
Andi Kleen <andi.kleen@intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>,
Stephane Eranian <eranian@google.com>,
"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
Randy Dunlap <rdunlap@infradead.org>,
x86@kernel.org, linux-kernel@vger.kernel.org,
iommu@lists.linux-foundation.org,
Ricardo Neri <ricardo.neri@intel.com>,
Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Subject: [RFC PATCH v4 00/21] Implement an HPET-based hardlockup detector
Date: Thu, 23 May 2019 18:16:02 -0700
Message-ID: <1558660583-28561-1-git-send-email-ricardo.neri-calderon@linux.intel.com>
Hi,
This is the fourth attempt to demonstrate the implementation of a
hardlockup detector driven by the High-Precision Event Timer. This
version provides a few, but important, updates with respect to the previous
version (please refer to the Changes since v3 section). The three previous
implementations can be found here [1], [2], and [3].
== Introduction ==
In CPU architectures that do not have an NMI watchdog, one can be
constructed using a counter of the Performance Monitoring Unit (PMU).
Counters in the PMU have high granularity and high visibility of the CPU.
These capabilities and their limited number make these counters precious
resources. Unfortunately, the perf-based hardlockup detector permanently
consumes one of these counters per CPU.
These counters could be freed for profiling purposes if the hardlockup
detector were driven by another timer.
The hardlockup detector runs relatively infrequently and does not require
visibility of the CPU activity (beyond detecting locked-up CPUs). A
timer that is external to the CPU (e.g., in the chipset) can be used to
drive the detector.
A key requirement is that the timer needs to be capable of issuing a
non-maskable interrupt to the CPU. In most cases, this can be achieved
by tweaking the delivery mode of the interrupt. It is especially
straightforward for MSI interrupts.
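As an illustration of the delivery-mode tweak described above, here is a minimal user-space sketch of how an MSI data word could be composed for NMI delivery. It assumes the x86 MSI data layout (vector in bits 7:0, delivery mode in bits 10:8, with 100b selecting NMI). The EX_* macros and ex_msi_data() are hypothetical names for this example only, not the definitions added by patch 1.

```c
#include <stdint.h>

/* Hypothetical names; the bit layout follows the x86 MSI data register. */
#define EX_MSI_DATA_VECTOR_MASK     0xffu
#define EX_MSI_DATA_DELIVERY_SHIFT  8
#define EX_MSI_DATA_DELIVERY_FIXED  (0u << EX_MSI_DATA_DELIVERY_SHIFT)
#define EX_MSI_DATA_DELIVERY_NMI    (4u << EX_MSI_DATA_DELIVERY_SHIFT)

/*
 * Compose an MSI data word. With NMI delivery the CPU ignores the
 * vector field, so any vector value is acceptable in that case.
 */
static uint32_t ex_msi_data(uint8_t vector, int nmi)
{
	uint32_t data = vector & EX_MSI_DATA_VECTOR_MASK;

	data |= nmi ? EX_MSI_DATA_DELIVERY_NMI : EX_MSI_DATA_DELIVERY_FIXED;
	return data;
}
```

In this sketch, switching a timer from a regular interrupt to an NMI only changes the delivery-mode bits of the message; the destination CPU is encoded separately in the MSI address.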
== Details of this implementation ==
This implementation uses an HPET timer to deliver an NMI interrupt via
an MSI message.
Unlike the perf-based hardlockup detector, this implementation is
driven by a single timer. The timer targets one CPU at a time in a
round-robin manner. This means that if a CPU must be monitored every
watchdog_thresh seconds, in a system with N monitored CPUs the timer must
expire every watchdog_thresh/N seconds. A per-CPU timer expiration
attribute is maintained.
The timer expiration time per CPU is updated every time CPUs are put
online or offline (a CPU hotplug thread enables and disables the watchdog
in these events) or the user changes the file /proc/sys/kernel/
watchdog_cpumask.
Also, given that a single timer drives the detector, a cpumask is needed
to keep track of which online CPUs are allowed to be monitored. This mask
is also updated every time a CPU is put online or offline or when the user
modifies the mask in /proc/sys/kernel/watchdog_cpumask. This mask
is needed to keep the current behavior of the lockup detector.
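The round-robin scheme and the watchdog_thresh/N expiration can be sketched in user-space C as follows. This is only an illustration under simplifying assumptions: the monitored CPUs fit in a 64-bit mask, the current CPU index is in the range 0..63, and ex_ticks_per_cpu() and ex_next_cpu() are hypothetical helpers, not functions from this series.

```c
#include <stdint.h>

/*
 * Timer ticks between two consecutive expirations when n_cpus CPUs
 * share one timer and each CPU must be checked every thresh_sec seconds.
 */
static uint64_t ex_ticks_per_cpu(uint64_t ticks_per_sec,
				 unsigned int thresh_sec,
				 unsigned int n_cpus)
{
	if (n_cpus == 0)
		return 0;

	return ticks_per_sec * thresh_sec / n_cpus;
}

/*
 * Next monitored CPU after 'cpu' in a 64-bit mask, wrapping around.
 * Returns -1 if no CPU is monitored. Assumes 0 <= cpu < 64.
 */
static int ex_next_cpu(uint64_t mask, int cpu)
{
	for (int i = 1; i <= 64; i++) {
		int next = (cpu + i) % 64;

		if (mask & (UINT64_C(1) << next))
			return next;
	}

	return -1;
}
```

For example, with a timer counting at 14318180 ticks per second, a threshold of 10 seconds and 4 monitored CPUs, the timer would be reprogrammed every 35795450 ticks, each time targeting the CPU returned by ex_next_cpu(). Both values change whenever the mask of monitored CPUs changes.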
In order to avoid reading HPET registers in every NMI, the time-stamp
counter is used to determine whether the HPET caused the interrupt. At
every timer expiration, we compute the value the time-stamp counter is
expected to have the next time the timer expires. I have found
experimentally that the expected TSC value consistently has an error of
less than 1.5%.
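The TSC-based check can be sketched as below. This is a user-space illustration, not the code of this series: ex_tsc_next() and ex_nmi_is_from_hpet() are hypothetical names, the TSC readings are passed in as plain integers, and the error margin is left as a caller-supplied number of TSC cycles.

```c
#include <stdbool.h>
#include <stdint.h>

/* TSC value expected at the next timer expiration. */
static uint64_t ex_tsc_next(uint64_t tsc_now, uint64_t tsc_cycles_per_period)
{
	return tsc_now + tsc_cycles_per_period;
}

/*
 * Decide whether an NMI arrived close enough to the expected TSC value
 * to be attributed to the HPET timer. Unsigned wrap-around folds
 * "expected - margin <= now <= expected + margin" into one comparison.
 */
static bool ex_nmi_is_from_hpet(uint64_t tsc_now, uint64_t tsc_expected,
				uint64_t margin)
{
	return (tsc_now - tsc_expected + margin) <= 2 * margin;
}
```

An NMI whose TSC reading falls outside the window is treated as coming from another source, so no HPET register needs to be read in the NMI handler to make that decision.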
Furthermore, only one write to HPET registers is done every
watchdog_thresh seconds. This write can be eliminated if the HPET timer
is periodic.
== Parts of this series ==
For clarity, patches are grouped as follows:
1) New irq definition. Patch 1 adds a definition for NMI delivery mode
in MSI interrupts. No other changes are done to generic irq code.
2) HPET updates. Patches 2-7 prepare the HPET code to accommodate the
new detector: rework periodic programming, reserve and configure a
timer for the detector and expose a few existing functions.
3) NMI watchdog. Patches 8-11 update the existing hardlockup detector
to uncouple it from perf, switch back to the perf implementation if
TSC becomes unstable, and introduce a new NMI handler category
intended to run after the NMI_LOCAL handlers.
4) New HPET-based hardlockup detector. Patches 12-17 include changes to
probe the hardware resources, configure the interrupt and rotate the
destination of the interrupts among all monitored CPUs. Also, it
includes an x86-specific shim hardlockup detector that selects
between HPET and perf implementations.
5) Interrupt remapping. Patches 18-21 add support to operate this new
detector with interrupt remapping enabled.
Thanks and BR,
Ricardo
Changes since v3:
* Fixed yet another bug in periodic programming of the HPET timer that
prevented the system from booting.
* Fixed computation of HPET frequency to use hpet_readl() only.
* Added a missing #include in watchdog_hld_hpet.c.
* Fixed various typos and grammar errors (Randy Dunlap)
Changes since v2:
* Added functionality to switch to the perf-based hardlockup
detector if the TSC becomes unstable (Thomas Gleixner).
* Brought back the round-robin mechanism proposed in v1 (this time not
using the interrupt subsystem). This also requires computing
expiration times as in v1 (Andi Kleen, Stephane Eranian).
* Fixed a bug in which using a periodic timer was not working (thanks
to Suravee Suthikulpanit!).
* In this version, I incorporate support for interrupt remapping in the
last 4 patches so that they can be reviewed separately if needed.
* Removed redundant documentation of functions (Thomas Gleixner).
* Added a new category of NMI handler, NMI_WATCHDOG, which executes after
NMI_LOCAL handlers (Andi Kleen).
* Updated handling of "nmi_watchdog" to support comma-separated
arguments.
* Undid split of the generic hardlockup detector into a separate file
(Thomas Gleixner).
* Added a new intermediate symbol CONFIG_HARDLOCKUP_DETECTOR_CORE to
select generic parts of the detector (Paul E. McKenney,
Thomas Gleixner).
* Removed use of struct cpumask in favor of a variable length array in
conjunction with kzalloc (Peter Zijlstra).
* Added CPU as an argument to hardlockup_detector_hpet_enable()/disable()
(Thomas Gleixner).
* Removed unnecessary export of function declarations, flags and bit
fields (Thomas Gleixner).
* Removed unnecessary check for FSB support when reserving timer for the
detector (Thomas Gleixner).
* Separated TSC code from HPET code in kick_timer() (Thomas Gleixner).
* Reworked condition to check if the expected TSC value is within the
error margin to avoid conditional (Peter Zijlstra).
* Removed TSC error margin from struct hld_data; use global variable
instead (Peter Zijlstra).
* Removed previously introduced watchdog_get_allowed_cpumask*() and
reworked hardlockup_detector_hpet_enable()/disable() to not need
access to watchdog_allowed_mask (Thomas Gleixner).
Changes since v1:
* Removed reads to HPET registers at every NMI. Instead use the time-stamp
counter to infer the interrupt source (Thomas Gleixner, Andi Kleen).
* Do not target CPUs in a round-robin manner. Instead, the HPET timer
always targets the same CPU; other CPUs are monitored via an
interprocessor interrupt.
* Removed use of generic irq code to set interrupt affinity and NMI
delivery. Instead, configure the interrupt directly in HPET registers
(Thomas Gleixner).
* Removed the proposed ops structure for NMI watchdogs. Instead, split
the existing implementation into a generic library and perf-specific
infrastructure (Thomas Gleixner, Nicholas Piggin).
* Added an x86-specific shim hardlockup detector that selects between
HPET and perf infrastructures as needed (Nicholas Piggin).
* Removed locks taken in NMI and !NMI context. This was wrong and is no
longer needed (Thomas Gleixner).
* Fixed unconditional return of NMI_HANDLED when the HPET timer is programmed
for FSB/MSI delivery (Peter Zijlstra).
References:
[1]. https://lkml.org/lkml/2018/6/12/1027
[2]. https://lkml.org/lkml/2019/2/27/402
[3]. https://lkml.org/lkml/2019/5/14/386
Ricardo Neri (21):
x86/msi: Add definition for NMI delivery mode
x86/hpet: Expose hpet_writel() in header
x86/hpet: Calculate ticks-per-second in a separate function
x86/hpet: Add hpet_set_comparator() for periodic and one-shot modes
x86/hpet: Reserve timer for the HPET hardlockup detector
x86/hpet: Configure the timer used by the hardlockup detector
watchdog/hardlockup: Define a generic function to detect hardlockups
watchdog/hardlockup: Decouple the hardlockup detector from perf
x86/nmi: Add a NMI_WATCHDOG NMI handler category
watchdog/hardlockup: Add function to enable NMI watchdog on all
allowed CPUs at once
x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
watchdog/hardlockup/hpet: Adjust timer expiration on the number of
monitored CPUs
x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI
watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog"
watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot
parameter
x86/watchdog: Add a shim hardlockup detector
x86/tsc: Switch to perf-based hardlockup detector if TSC become
unstable
x86/apic: Add a parameter for the APIC delivery mode
iommu/vt-d: Rework prepare_irte() to support per-irq delivery mode
iommu/vt-d: hpet: Reserve an interrupt remampping table entry for
watchdog
x86/watchdog/hardlockup/hpet: Support interrupt remapping
.../admin-guide/kernel-parameters.txt | 8 +-
arch/x86/Kconfig.debug | 15 +
arch/x86/include/asm/hpet.h | 47 ++
arch/x86/include/asm/hw_irq.h | 5 +-
arch/x86/include/asm/msidef.h | 4 +
arch/x86/include/asm/nmi.h | 1 +
arch/x86/kernel/Makefile | 2 +
arch/x86/kernel/apic/vector.c | 10 +
arch/x86/kernel/hpet.c | 115 ++++-
arch/x86/kernel/nmi.c | 10 +
arch/x86/kernel/tsc.c | 2 +
arch/x86/kernel/watchdog_hld.c | 85 ++++
arch/x86/kernel/watchdog_hld_hpet.c | 453 ++++++++++++++++++
drivers/char/hpet.c | 31 +-
drivers/iommu/intel_irq_remapping.c | 59 ++-
include/linux/hpet.h | 1 +
include/linux/nmi.h | 8 +-
kernel/Makefile | 2 +-
kernel/watchdog.c | 23 +-
kernel/watchdog_hld.c | 50 +-
lib/Kconfig.debug | 4 +
21 files changed, 879 insertions(+), 56 deletions(-)
create mode 100644 arch/x86/kernel/watchdog_hld.c
create mode 100644 arch/x86/kernel/watchdog_hld_hpet.c
--
2.17.1
Thread overview: 55+ messages
2019-05-24 1:16 Ricardo Neri [this message]
2019-05-24 1:16 ` [RFC PATCH v4 01/21] x86/msi: Add definition for NMI delivery mode Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 02/21] x86/hpet: Expose hpet_writel() in header Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 03/21] x86/hpet: Calculate ticks-per-second in a separate function Ricardo Neri
2019-06-14 15:54 ` Thomas Gleixner
2019-06-14 15:59 ` Thomas Gleixner
2019-06-18 22:48 ` Ricardo Neri
2019-06-18 23:13 ` Thomas Gleixner
2019-05-24 1:16 ` [RFC PATCH v4 04/21] x86/hpet: Add hpet_set_comparator() for periodic and one-shot modes Ricardo Neri
2019-06-14 18:17 ` Thomas Gleixner
2019-06-18 22:48 ` Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 05/21] x86/hpet: Reserve timer for the HPET hardlockup detector Ricardo Neri
2019-06-11 19:54 ` Thomas Gleixner
2019-06-14 1:14 ` Ricardo Neri
2019-06-14 16:10 ` Thomas Gleixner
2019-06-18 22:48 ` Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 06/21] x86/hpet: Configure the timer used by the " Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 07/21] watchdog/hardlockup: Define a generic function to detect hardlockups Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 08/21] watchdog/hardlockup: Decouple the hardlockup detector from perf Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 09/21] x86/nmi: Add a NMI_WATCHDOG NMI handler category Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 10/21] watchdog/hardlockup: Add function to enable NMI watchdog on all allowed CPUs at once Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 11/21] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 12/21] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored CPUs Ricardo Neri
2019-06-11 20:11 ` Thomas Gleixner
2019-06-18 22:46 ` Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 13/21] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 14/21] watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog" Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 15/21] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 16/21] x86/watchdog: Add a shim hardlockup detector Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 17/21] x86/tsc: Switch to perf-based hardlockup detector if TSC become unstable Ricardo Neri
2019-06-07 0:35 ` Stephane Eranian
2019-06-07 14:14 ` Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 18/21] x86/apic: Add a parameter for the APIC delivery mode Ricardo Neri
2019-06-16 9:55 ` Thomas Gleixner
2019-06-18 22:47 ` Ricardo Neri
2019-06-18 23:15 ` Thomas Gleixner
2019-05-24 1:16 ` [RFC PATCH v4 19/21] iommu/vt-d: Rework prepare_irte() to support per-irq " Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 20/21] iommu/vt-d: hpet: Reserve an interrupt remampping table entry for watchdog Ricardo Neri
2019-06-16 18:42 ` Thomas Gleixner
2019-06-16 19:21 ` Thomas Gleixner
2019-06-17 8:25 ` Thomas Gleixner
2019-06-17 21:38 ` Stephane Eranian
2019-06-17 23:08 ` Thomas Gleixner
2019-06-19 15:43 ` Jacob Pan
2019-06-21 15:33 ` Thomas Gleixner
2019-06-21 17:31 ` Jacob Pan
2019-06-21 18:39 ` Jacob Pan
2019-06-21 20:05 ` Thomas Gleixner
2019-06-21 23:55 ` Ricardo Neri
2019-06-22 7:21 ` Thomas Gleixner
2019-10-18 2:48 ` Ricardo Neri
2019-06-18 22:45 ` Ricardo Neri
2019-05-24 1:16 ` [RFC PATCH v4 21/21] x86/watchdog/hardlockup/hpet: Support interrupt remapping Ricardo Neri
2019-06-16 8:44 ` Thomas Gleixner
2019-06-16 8:53 ` Thomas Gleixner