[RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector

From: Ricardo Neri @ 2019-02-27 16:05 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri

Hi,

This is the second attempt to demonstrate the implementation of a
hardlockup detector driven by the High-Precision Event Timer. The
initial implementation can be found here [1]. 

== Introduction ==

In CPU architectures that do not have an NMI watchdog, one can be
constructed using a counter of the Performance Monitoring Unit (PMU).
Counters in the PMU have fine granularity and good visibility into the CPU.
These capabilities and their limited number make these counters precious
resources. Unfortunately, the perf-based hardlockup detector permanently
consumes one of these counters per CPU.

These counters could be freed for profiling purposes if the hardlockup
detector were driven by another timer.

The hardlockup detector runs relatively infrequently and does not require
visibility of the CPU activity (beyond detecting locked-up CPUs). A
timer that is external to the CPU (e.g., in the chipset) can be used to
drive the detector.

A key requirement is that the timer needs to be capable of issuing a
non-maskable interrupt to the CPU. In most cases, this can be achieved
by tweaking the delivery mode of the interrupt in the interrupt controller
chip (the exception is the IO APIC).
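
As a rough illustration of what "tweaking the delivery mode" means for an
MSI-capable timer, the sketch below composes an MSI data value with the
delivery-mode field (bits 10:8) set to NMI (100b). The macro names are
local to this example and only modeled on the style of
arch/x86/include/asm/msidef.h; msi_compose_data() is a hypothetical
helper, not a function from this series.

```c
#include <stdint.h>

/* Local, illustrative definitions (assumed names, not kernel headers). */
#define MSI_DATA_VECTOR_MASK		0xffu
#define MSI_DATA_DELIVERY_MODE_SHIFT	8
#define MSI_DATA_DELIVERY_FIXED		(0u << MSI_DATA_DELIVERY_MODE_SHIFT)
#define MSI_DATA_DELIVERY_NMI		(4u << MSI_DATA_DELIVERY_MODE_SHIFT)

/*
 * Hypothetical helper: build the low bits of an MSI data value.
 * With NMI delivery the vector field is ignored by the local APIC,
 * but it is still encoded here for completeness.
 */
static uint32_t msi_compose_data(uint8_t vector, int nmi)
{
	uint32_t data = vector & MSI_DATA_VECTOR_MASK;

	data |= nmi ? MSI_DATA_DELIVERY_NMI : MSI_DATA_DELIVERY_FIXED;
	return data;
}
```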

== Details of this implementation ==

This implementation aims to be simpler than the first attempt. Thus, it
only uses an HPET timer that is capable of issuing interrupts via the
Front Side Bus. Also, the series does not cover the case of interrupt
remapping (to be sent in a subsequent series). The generic interrupt code
is not used and, instead, the detector directly programs all the HPET
registers.

In order to not have to read HPET registers in every NMI, the time-stamp
counter is used to determine whether the HPET caused the interrupt.
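
One possible shape of such a check is sketched below: the detector
remembers the TSC value at which the next HPET expiry is expected, and
the NMI handler attributes the NMI to the HPET only if the current TSC
falls within an error window around that value. The structure, field
names and window are assumptions made for this example; in the kernel
the current value would come from rdtsc().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-detector bookkeeping. */
struct hpet_wdt_sketch {
	uint64_t tsc_next;	 /* TSC value expected at next HPET expiry */
	uint64_t tsc_next_error; /* tolerated deviation, in TSC cycles */
};

/*
 * Return true if an NMI observed at tsc_now is close enough to the
 * expected expiry to be attributed to the HPET timer.
 */
static bool hpet_wdt_caused_nmi(const struct hpet_wdt_sketch *w,
				uint64_t tsc_now)
{
	uint64_t delta = tsc_now > w->tsc_next ? tsc_now - w->tsc_next
					       : w->tsc_next - tsc_now;

	return delta < w->tsc_next_error;
}
```

The trade-off is that an unrelated NMI arriving inside the window would be
misattributed, so the window should be small relative to the timer period.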

Furthermore, only one write to HPET registers is done every
watchdog_thresh seconds. This write can be eliminated if the HPET timer
is periodic.
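
For a non-periodic timer, that single write would be the new comparator
value. A minimal sketch of the arithmetic follows, assuming the HPET
period is reported in femtoseconds (as in the HPET capabilities
register) and ignoring counter wrap-around. The function names are
hypothetical.

```c
#include <stdint.h>

#define FSEC_PER_SEC	1000000000000000ULL

/* Convert the HPET period (femtoseconds per tick) to ticks per second. */
static uint64_t hpet_ticks_per_second(uint32_t period_fs)
{
	return FSEC_PER_SEC / period_fs;
}

/*
 * Comparator value for an expiry watchdog_thresh seconds after
 * counter_now. Wrap-around of a 32-bit counter is not handled here.
 */
static uint64_t hpet_next_comparator(uint64_t counter_now,
				     uint32_t period_fs,
				     unsigned int watchdog_thresh)
{
	return counter_now +
	       hpet_ticks_per_second(period_fs) * watchdog_thresh;
}
```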

Lastly, the HPET timer always targets the same CPU. Hence, it is not
necessary to update the interrupt CPU affinity while the hardlockup
detector is running. The rest of the CPUs in the system are monitored
by issuing an interprocessor interrupt. CPUs check a cpumask to determine
whether they need to look for hardlockups.
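
The fan-out described above can be modeled in a few lines. The sketch
uses a plain 64-bit bitmask as a stand-in for a kernel cpumask (so it
assumes at most 64 CPUs); the type and both helpers are hypothetical
and only illustrate the idea.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a cpumask: bit N set means CPU N is monitored. */
typedef uint64_t cpumask_sketch_t;

/* Check done by each CPU on NMI: should it look for a hardlockup? */
static bool cpu_is_monitored(cpumask_sketch_t mask, unsigned int cpu)
{
	return cpu < 64 && (mask & (1ULL << cpu));
}

/*
 * CPUs the handling CPU (the one the HPET timer targets) would send
 * the interprocessor interrupt to: every monitored CPU except itself.
 * Assumes handling_cpu < 64.
 */
static cpumask_sketch_t ipi_targets(cpumask_sketch_t monitored,
				    unsigned int handling_cpu)
{
	return monitored & ~(1ULL << handling_cpu);
}
```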

== Parts of this series ==

   1) Add a definition for NMI delivery mode in MSI interrupts. No other
      changes are done to generic irq code.

   2) Rework the x86 HPET platform code to reserve and configure a timer,
      and to expose the needed interfaces and definitions. Patches 2-6

   3) Rework the hardlockup detector to decouple its generic parts from
      the perf implementation. Patches 7-10

   4) Add an HPET-based hardlockup detector. This includes probing the
      hardware resources, configuring the interrupt and monitoring the
      remaining CPUs via an interprocessor interrupt. Also, it
      includes an x86-specific shim hardlockup detector that selects
      between HPET and perf implementations. Patches 11-14
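
The selection made by the shim in part 4 might look roughly like the
sketch below. The fallback policy (use HPET only when requested on the
command line and usable, otherwise perf) is an assumption for
illustration; the enum and function are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the two hardlockup detector backends. */
enum hld_backend {
	HLD_BACKEND_PERF,
	HLD_BACKEND_HPET,
};

/*
 * Pick a backend. hpet_requested models the boot parameter;
 * hpet_usable models a successful probe of the HPET timer.
 * Assumed policy: fall back to perf in every other case.
 */
static enum hld_backend hld_select_backend(bool hpet_requested,
					   bool hpet_usable)
{
	if (hpet_requested && hpet_usable)
		return HLD_BACKEND_HPET;

	return HLD_BACKEND_PERF;
}
```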

Thanks and BR,
Ricardo

Changes since v1:

 * Removed reads of HPET registers at every NMI. Instead, use the time-stamp
   counter to infer the interrupt source (Thomas Gleixner, Andi Kleen).
 * Do not target CPUs in a round-robin manner. Instead, the HPET timer
   always targets the same CPU; other CPUs are monitored via an
   interprocessor interrupt.
 * Removed use of generic irq code to set interrupt affinity and NMI
   delivery. Instead, configure the interrupt directly in HPET registers
   (Thomas Gleixner).
 * Removed the proposed ops structure for NMI watchdogs. Instead, split
   the existing implementation into a generic library and perf-specific
   infrastructure (Thomas Gleixner, Nicholas Piggin).
 * Added an x86-specific shim hardlockup detector that selects between
   HPET and perf infrastructures as needed (Nicholas Piggin).
 * Removed locks taken in NMI and !NMI context. This was wrong and is no
   longer needed (Thomas Gleixner).
 * Fixed the unconditional return of NMI_HANDLED when the HPET timer is
   programmed for FSB/MSI delivery (Peter Zijlstra).

References:

[1]. https://lkml.org/lkml/2018/6/12/1027

Ricardo Neri (14):
  kernel/watchdog: Add a function to obtain the watchdog_allowed_mask
  watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based
    implementation
  x86/msi: Add definition for NMI delivery mode
  x86/hpet: Expose more functions to read and write registers
  x86/hpet: Calculate ticks-per-second in a separate function
  x86/hpet: Reserve timer for the HPET hardlockup detector
  x86/hpet: Relocate flag definitions to a header file
  x86/hpet: Configure the timer used by the hardlockup detector
  watchdog/hardlockup: Define a generic function to detect hardlockups
  watchdog/hardlockup: Decouple the hardlockup detector from perf
  x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
  x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI
  watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot
    parameter
  x86/watchdog: Add a shim hardlockup detector

 .../admin-guide/kernel-parameters.txt         |   6 +-
 arch/x86/Kconfig.debug                        |  14 +
 arch/x86/include/asm/hpet.h                   |  46 ++
 arch/x86/include/asm/msidef.h                 |   1 +
 arch/x86/kernel/Makefile                      |   2 +
 arch/x86/kernel/hpet.c                        |  64 ++-
 arch/x86/kernel/watchdog_hld.c                |  78 +++
 arch/x86/kernel/watchdog_hld_hpet.c           | 447 ++++++++++++++++++
 drivers/char/hpet.c                           |  31 +-
 include/linux/hpet.h                          |   1 +
 include/linux/nmi.h                           |  12 +-
 kernel/Makefile                               |   3 +-
 kernel/watchdog.c                             |   9 +-
 kernel/watchdog_hld.c                         | 151 +-----
 kernel/watchdog_hld_perf.c                    | 175 +++++++
 15 files changed, 869 insertions(+), 171 deletions(-)
 create mode 100644 arch/x86/kernel/watchdog_hld.c
 create mode 100644 arch/x86/kernel/watchdog_hld_hpet.c
 create mode 100644 kernel/watchdog_hld_perf.c

-- 
2.17.1
