linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector
@ 2019-02-27 16:05 Ricardo Neri
  2019-02-27 16:05 ` [RFC PATCH v2 01/14] x86/msi: Add definition for NMI delivery mode Ricardo Neri
                   ` (13 more replies)
  0 siblings, 14 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri

Hi,

This is the second attempt to demonstrate the implementation of a
hardlockup detector driven by the High-Precision Event Timer. The
initial implementation can be found here [1]. 

== Introduction ==

In CPU architectures that do not have an NMI watchdog, one can be
constructed using a counter of the Performance Monitoring Unit (PMU).
Counters in the PMU have high granularity and high visibility of the CPU.
These capabilities and their limited number make these counters precious
resources. Unfortunately, the perf-based hardlockup detector permanently
consumes one of these counters per CPU.

These counters could be freed for profiling purposes if the hardlockup
detector were driven by another timer.

The hardlockup detector runs relatively infrequently and does not require
visibility of the CPU activity (in addition to detect locked-up CPUs). A
timer that is external to the CPU (e.g., in the chipset) can be used to
drive the detector.

A key requirement is that the timer needs to be capable of issuing a
non-maskable interrupt to the CPU. In most cases, this can be achieved
by tweaking the delivery mode of the interrupt in the interrupt controller
chip (the exception is the IO APIC).

== Details of this implementation

This implementation aims to be simpler than the first attempt. Thus, it
only uses an HPET timer that is capable of issuing interrupts via the
Front Side Bus. Also, the series does not cover the case of interrupt
remapping (to be sent in a subsequent series). The generic interrupt code
is not used and, instead, the detector directly programs all the HPET
registers.

In order to not have to read HPET registers in every NMI, the time-stamp
counter is used to determine whether the HPET caused the interrupt.

Furthermore, only one write to HPET registers is done every
watchdog_thresh seconds. This write can be eliminated if the HPET timer
is periodic.

Lastly, the HPET timer always targets the same CPU. Hence, it is not
necessary to update the interrupt CPU affinity while the hardlockup
detector is running. The rest of the CPUs in the system are monitored
issuing a interprocessor interrupt. CPUs check a cpumask to determine
whether they need to look for hardlockups.

== Parts of this series ==

   1) Add a definition for NMI delivery mode in MSI interrupts. No other
      changes are done to generic irq code.

   2) Rework the x86 HPET platform code to reserve, configure a timer and
      expose the needed interfaces and definitions. Patches 2-6

   3) Rework the hardlockup detector to decouple its generic parts from
      the perf implementation. Patches 7-10

   4) Add an HPET-based hardlockup detector. This includes probing the
      hardware resources, configure the interrupt and rotate the
      destination of the interrupts among all monitored CPUs. Also, it
      includes an x86-specific shim hardlockup detector that selects
      between HPET and perf implementations. Patches 11-14


Thanks and BR,
Ricardo

Changes since v1:

 * Removed reads to HPET registers at every NMI. Instead use the time-stamp
   counter to infer the interrupt source (Thomas Gleixner, Andi Kleen).
 * Do not target CPUs in a round-robin manner. Instead, the HPET timer
   always targets the same CPU; other CPUs are monitored via an
   interprocessor interrupt.
 * Removed use of generic irq code to set interrupt affinity and NMI
   delivery. Instead, configure the interrupt directly in HPET registers
   (Thomas Gleixner).
 * Removed the proposed ops structure for NMI watchdogs. Instead, split
   the existing implementation into a generic library and perf-specific
   infrastructure (Thomas Gleixner, Nicholas Piggin).
 * Added an x86-specific shim hardlockup detector that selects between
   HPET and perf infrastructures as needed (Nicholas Piggin).
 * Removed locks taken in NMI and !NMI context. This was wrong and is no
   longer needed (Thomas Gleixner).
 * Fixed unconditonal return NMI_HANDLED when the HPET timer is programmed
   for FSB/MSI delivery (Peter Zijlstra).

References:

[1]. https://lkml.org/lkml/2018/6/12/1027

Ricardo Neri (14):
  kernel/watchdog: Add a function to obtain the watchdog_allowed_mask
  watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based
    implementation
  x86/msi: Add definition for NMI delivery mode
  x86/hpet: Expose more functions to read and write registers
  x86/hpet: Calculate ticks-per-second in a separate function
  x86/hpet: Reserve timer for the HPET hardlockup detector
  x86/hpet: Relocate flag definitions to a header file
  x86/hpet: Configure the timer used by the hardlockup detector
  watchdog/hardlockup: Define a generic function to detect hardlockups
  watchdog/hardlockup: Decouple the hardlockup detector from perf
  x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
  x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI
  watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot
    parameter
  x86/watchdog: Add a shim hardlockup detector

 .../admin-guide/kernel-parameters.txt         |   6 +-
 arch/x86/Kconfig.debug                        |  14 +
 arch/x86/include/asm/hpet.h                   |  46 ++
 arch/x86/include/asm/msidef.h                 |   1 +
 arch/x86/kernel/Makefile                      |   2 +
 arch/x86/kernel/hpet.c                        |  64 ++-
 arch/x86/kernel/watchdog_hld.c                |  78 +++
 arch/x86/kernel/watchdog_hld_hpet.c           | 447 ++++++++++++++++++
 drivers/char/hpet.c                           |  31 +-
 include/linux/hpet.h                          |   1 +
 include/linux/nmi.h                           |  12 +-
 kernel/Makefile                               |   3 +-
 kernel/watchdog.c                             |   9 +-
 kernel/watchdog_hld.c                         | 151 +-----
 kernel/watchdog_hld_perf.c                    | 175 +++++++
 15 files changed, 869 insertions(+), 171 deletions(-)
 create mode 100644 arch/x86/kernel/watchdog_hld.c
 create mode 100644 arch/x86/kernel/watchdog_hld_hpet.c
 create mode 100644 kernel/watchdog_hld_perf.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH v2 01/14] x86/msi: Add definition for NMI delivery mode
  2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
@ 2019-02-27 16:05 ` Ricardo Neri
  2019-02-27 16:05 ` [RFC PATCH v2 02/14] x86/hpet: Expose more functions to read and write registers Ricardo Neri
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri, H. Peter Anvin,
	Tony Luck, Joerg Roedel, Juergen Gross, Bjorn Helgaas, Wincy Van,
	Kate Stewart, Philippe Ombredanne, Eric W. Biederman, Baoquan He,
	Dou Liyang, Jan Kiszka

Until now, the delivery mode of MSI interrupts is set to the default
mode set in the APIC driver. However, there are no restrictions in hardware
to configure each interrupt with a different delivery mode. Specifying the
delivery mode per interrupt is useful when one is interested in changing
the delivery mode of a particular interrupt. For instance, this can be used
to deliver an interrupt as non-maskable.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Wincy Van <fanwenyi0529@gmail.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/msidef.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index ee2f8ccc32d0..38ccfdc2d96e 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -18,6 +18,7 @@
 #define MSI_DATA_DELIVERY_MODE_SHIFT	8
 #define  MSI_DATA_DELIVERY_FIXED	(0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
+#define  MSI_DATA_DELIVERY_NMI		(4 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
 #define MSI_DATA_LEVEL_SHIFT		14
 #define	 MSI_DATA_LEVEL_DEASSERT	(0 << MSI_DATA_LEVEL_SHIFT)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH v2 02/14] x86/hpet: Expose more functions to read and write registers
  2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
  2019-02-27 16:05 ` [RFC PATCH v2 01/14] x86/msi: Add definition for NMI delivery mode Ricardo Neri
@ 2019-02-27 16:05 ` Ricardo Neri
  2019-03-26 21:00   ` Thomas Gleixner
  2019-02-27 16:05 ` [RFC PATCH v2 03/14] x86/hpet: Calculate ticks-per-second in a separate function Ricardo Neri
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri, H. Peter Anvin,
	Tony Luck, Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki

Some of the registers in the HPET hardware have a width of 64 bits. 64-bit
access functions are needed mostly to read the counter and write the
comparator in a single read or write. Also, 64-bit accesses can be used to
to read parameters located in the higher bits of some registers (such as
the timer period and the IO APIC pins that can be asserted by the timer)
without the need of masking and shifting the register values.

64-bit read and write functions are added. These functions, along with the
existing hpet_writel(), are exposed via the HPET header to be used by other
kernel subsystems.

Thus far, the only consumer of these functions will the HPET-based
hardlockup detector, which will only be available in 64-bit builds. Thus,
the 64-bit access functions are wrapped in CONFIG_X86_64.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h | 10 ++++++++++
 arch/x86/kernel/hpet.c      | 12 +++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 67385d56d4f4..9e0afde47587 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -72,6 +72,11 @@ extern int is_hpet_enabled(void);
 extern int hpet_enable(void);
 extern void hpet_disable(void);
 extern unsigned int hpet_readl(unsigned int a);
+extern void hpet_writel(unsigned int d, unsigned int a);
+#ifdef CONFIG_X86_64
+extern unsigned long hpet_readq(unsigned int a);
+extern void hpet_writeq(unsigned long d, unsigned int a);
+#endif
 extern void force_hpet_resume(void);
 
 struct irq_data;
@@ -109,6 +114,11 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
 static inline int hpet_enable(void) { return 0; }
 static inline int is_hpet_enabled(void) { return 0; }
 #define hpet_readl(a) 0
+#define hpet_writel(d, a)
+#ifdef CONFIG_X86_64
+#define hpet_readq(a) 0
+#define hpet_writeq(d, a)
+#endif
 #define default_setup_hpet_msi	NULL
 
 #endif
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index dfd3aca82c61..ee439d84e83b 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -61,12 +61,22 @@ inline unsigned int hpet_readl(unsigned int a)
 	return readl(hpet_virt_address + a);
 }
 
-static inline void hpet_writel(unsigned int d, unsigned int a)
+inline void hpet_writel(unsigned int d, unsigned int a)
 {
 	writel(d, hpet_virt_address + a);
 }
 
 #ifdef CONFIG_X86_64
+inline unsigned long hpet_readq(unsigned int a)
+{
+	return readq(hpet_virt_address + a);
+}
+
+inline void hpet_writeq(unsigned long d, unsigned int a)
+{
+	writeq(d, hpet_virt_address + a);
+}
+
 #include <asm/pgtable.h>
 #endif
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH v2 03/14] x86/hpet: Calculate ticks-per-second in a separate function
  2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
  2019-02-27 16:05 ` [RFC PATCH v2 01/14] x86/msi: Add definition for NMI delivery mode Ricardo Neri
  2019-02-27 16:05 ` [RFC PATCH v2 02/14] x86/hpet: Expose more functions to read and write registers Ricardo Neri
@ 2019-02-27 16:05 ` Ricardo Neri
  2019-03-26 21:03   ` Thomas Gleixner
  2019-02-27 16:05 ` [RFC PATCH v2 04/14] x86/hpet: Reserve timer for the HPET hardlockup detector Ricardo Neri
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri, H. Peter Anvin,
	Tony Luck, Clemens Ladisch, Arnd Bergmann, Philippe Ombredanne,
	Kate Stewart, Rafael J. Wysocki

It is easier to compute the expiration times of an HPET timer by using
its frequency (i.e., the number of times it ticks in a second) than its
period, as given in the capabilities register.

In addition to the HPET char driver, the HPET-based hardlockup detector
will also need to know the timer's frequency. Thus, create a common
function that both can use.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 drivers/char/hpet.c  | 31 +++++++++++++++++++++++++------
 include/linux/hpet.h |  1 +
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index 4a22b4b41aef..b73b68c0e127 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -836,6 +836,29 @@ static unsigned long hpet_calibrate(struct hpets *hpetp)
 	return ret;
 }
 
+u64 hpet_get_ticks_per_sec(u64 hpet_caps)
+{
+	u64 ticks_per_sec, period;
+
+	period = (hpet_caps & HPET_COUNTER_CLK_PERIOD_MASK) >>
+		 HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
+
+	/*
+	 * The frequency is the reciprocal of the period. The period is given
+	 * femtoseconds per second. Thus, prepare a dividend to obtain the
+	 * frequency in ticks per second.
+	 */
+
+	/* 10^15 femtoseconds per second */
+	ticks_per_sec = 1000000000000000uLL;
+	ticks_per_sec += period >> 1; /* round */
+
+	/* The quotient is put in the dividend. We drop the remainder. */
+	do_div(ticks_per_sec, period);
+
+	return ticks_per_sec;
+}
+
 int hpet_alloc(struct hpet_data *hdp)
 {
 	u64 cap, mcfg;
@@ -845,7 +868,6 @@ int hpet_alloc(struct hpet_data *hdp)
 	size_t siz;
 	struct hpet __iomem *hpet;
 	static struct hpets *last;
-	unsigned long period;
 	unsigned long long temp;
 	u32 remainder;
 
@@ -881,6 +903,8 @@ int hpet_alloc(struct hpet_data *hdp)
 
 	cap = readq(&hpet->hpet_cap);
 
+	temp = hpet_get_ticks_per_sec(cap);
+
 	ntimer = ((cap & HPET_NUM_TIM_CAP_MASK) >> HPET_NUM_TIM_CAP_SHIFT) + 1;
 
 	if (hpetp->hp_ntimer != ntimer) {
@@ -897,11 +921,6 @@ int hpet_alloc(struct hpet_data *hdp)
 
 	last = hpetp;
 
-	period = (cap & HPET_COUNTER_CLK_PERIOD_MASK) >>
-		HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
-	temp = 1000000000000000uLL; /* 10^15 femtoseconds per second */
-	temp += period >> 1; /* round */
-	do_div(temp, period);
 	hpetp->hp_tick_freq = temp; /* ticks per second */
 
 	printk(KERN_INFO "hpet%d: at MMIO 0x%lx, IRQ%s",
diff --git a/include/linux/hpet.h b/include/linux/hpet.h
index 8604564b985d..e7b36bcf4699 100644
--- a/include/linux/hpet.h
+++ b/include/linux/hpet.h
@@ -107,5 +107,6 @@ static inline void hpet_reserve_timer(struct hpet_data *hd, int timer)
 }
 
 int hpet_alloc(struct hpet_data *);
+u64 hpet_get_ticks_per_sec(u64 hpet_caps);
 
 #endif				/* !__HPET__ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH v2 04/14] x86/hpet: Reserve timer for the HPET hardlockup detector
  2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
                   ` (2 preceding siblings ...)
  2019-02-27 16:05 ` [RFC PATCH v2 03/14] x86/hpet: Calculate ticks-per-second in a separate function Ricardo Neri
@ 2019-02-27 16:05 ` Ricardo Neri
  2019-02-27 16:05 ` [RFC PATCH v2 05/14] x86/hpet: Relocate flag definitions to a header file Ricardo Neri
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri, H. Peter Anvin,
	Tony Luck, Clemens Ladisch, Arnd Bergmann, Philippe Ombredanne,
	Kate Stewart, Rafael J. Wysocki

HPET timer 2 will be used to drive the HPET-based hardlockup detector.
Reserve such timer to ensure it cannot be used by user space programs or
clock events.

When looking for MSI-capable timers for clock events, skip timer 2 if
the HPET hardlockup detector is selected.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h |  3 +++
 arch/x86/kernel/hpet.c      | 19 ++++++++++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 9e0afde47587..3266796f7b60 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -61,6 +61,9 @@
  */
 #define HPET_MIN_PERIOD		100000UL
 
+/* Timer used for the hardlockup detector */
+#define HPET_WD_TIMER_NR 2
+
 /* hpet memory map physical address */
 extern unsigned long hpet_address;
 extern unsigned long force_hpet_address;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index ee439d84e83b..e9e18bf7e65d 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -182,7 +182,8 @@ do {								\
 
 /*
  * When the hpet driver (/dev/hpet) is enabled, we need to reserve
- * timer 0 and timer 1 in case of RTC emulation.
+ * timer 0 and timer 1 in case of RTC emulation. Timer 2 is reserved in case
+ * the HPET-based hardlockup detector is used.
  */
 #ifdef CONFIG_HPET
 
@@ -192,7 +193,7 @@ static void hpet_reserve_platform_timers(unsigned int id)
 {
 	struct hpet __iomem *hpet = hpet_virt_address;
 	struct hpet_timer __iomem *timer = &hpet->hpet_timers[2];
-	unsigned int nrtimers, i;
+	unsigned int nrtimers, i, start_timer;
 	struct hpet_data hd;
 
 	nrtimers = ((id & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT) + 1;
@@ -207,6 +208,13 @@ static void hpet_reserve_platform_timers(unsigned int id)
 	hpet_reserve_timer(&hd, 1);
 #endif
 
+	if (IS_ENABLED(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET)) {
+		hpet_reserve_timer(&hd, HPET_WD_TIMER_NR);
+		start_timer = HPET_WD_TIMER_NR + 1;
+	} else {
+		start_timer = HPET_WD_TIMER_NR;
+	}
+
 	/*
 	 * NOTE that hd_irq[] reflects IOAPIC input pins (LEGACY_8254
 	 * is wrong for i8259!) not the output IRQ.  Many BIOS writers
@@ -215,7 +223,7 @@ static void hpet_reserve_platform_timers(unsigned int id)
 	hd.hd_irq[0] = HPET_LEGACY_8254;
 	hd.hd_irq[1] = HPET_LEGACY_RTC;
 
-	for (i = 2; i < nrtimers; timer++, i++) {
+	for (i = start_timer; i < nrtimers; timer++, i++) {
 		hd.hd_irq[i] = (readl(&timer->hpet_config) &
 			Tn_INT_ROUTE_CNF_MASK) >> Tn_INT_ROUTE_CNF_SHIFT;
 	}
@@ -627,6 +635,11 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
 		struct hpet_dev *hdev = &hpet_devs[num_timers_used];
 		unsigned int cfg = hpet_readl(HPET_Tn_CFG(i));
 
+		/* Do not use timer reserved for the HPET watchdog. */
+		if (IS_ENABLED(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET) &&
+		    i == HPET_WD_TIMER_NR)
+			continue;
+
 		/* Only consider HPET timer with MSI support */
 		if (!(cfg & HPET_TN_FSB_CAP))
 			continue;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH v2 05/14] x86/hpet: Relocate flag definitions to a header file
  2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
                   ` (3 preceding siblings ...)
  2019-02-27 16:05 ` [RFC PATCH v2 04/14] x86/hpet: Reserve timer for the HPET hardlockup detector Ricardo Neri
@ 2019-02-27 16:05 ` Ricardo Neri
  2019-03-26 21:11   ` Thomas Gleixner
  2019-02-27 16:05 ` [RFC PATCH v2 06/14] x86/hpet: Configure the timer used by the hardlockup detector Ricardo Neri
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri, H. Peter Anvin,
	Tony Luck, Clemens Ladisch, Arnd Bergmann, Philippe Ombredanne,
	Kate Stewart, Rafael J. Wysocki

Users of HPET timers (such as the hardlockup detector) need the definitions
of these flags to interpret the configuration of a timer as passed by
platform code.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h | 6 ++++++
 arch/x86/kernel/hpet.c      | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 3266796f7b60..9fd112a0ffdd 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -64,6 +64,12 @@
 /* Timer used for the hardlockup detector */
 #define HPET_WD_TIMER_NR 2
 
+#define HPET_DEV_USED_BIT		2
+#define HPET_DEV_USED			(1 << HPET_DEV_USED_BIT)
+#define HPET_DEV_VALID			0x8
+#define HPET_DEV_FSB_CAP		0x1000
+#define HPET_DEV_PERI_CAP		0x2000
+
 /* hpet memory map physical address */
 extern unsigned long hpet_address;
 extern unsigned long force_hpet_address;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index e9e18bf7e65d..1c5c63366109 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -21,12 +21,6 @@
 
 #define HPET_MASK			CLOCKSOURCE_MASK(32)
 
-#define HPET_DEV_USED_BIT		2
-#define HPET_DEV_USED			(1 << HPET_DEV_USED_BIT)
-#define HPET_DEV_VALID			0x8
-#define HPET_DEV_FSB_CAP		0x1000
-#define HPET_DEV_PERI_CAP		0x2000
-
 #define HPET_MIN_CYCLES			128
 #define HPET_MIN_PROG_DELTA		(HPET_MIN_CYCLES + (HPET_MIN_CYCLES >> 1))
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH v2 06/14] x86/hpet: Configure the timer used by the hardlockup detector
  2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
                   ` (4 preceding siblings ...)
  2019-02-27 16:05 ` [RFC PATCH v2 05/14] x86/hpet: Relocate flag definitions to a header file Ricardo Neri
@ 2019-02-27 16:05 ` Ricardo Neri
  2019-03-26 21:13   ` Thomas Gleixner
  2019-02-27 16:05 ` [RFC PATCH v2 07/14] watchdog/hardlockup: Define a generic function to detect hardlockups Ricardo Neri
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri, H. Peter Anvin,
	Tony Luck, Clemens Ladisch, Arnd Bergmann, Philippe Ombredanne,
	Kate Stewart, Rafael J. Wysocki

Implement the initial configuration of the timer to be used by the
hardlockup detector. Return a data structure with a description of the
timer; this information is subsequently used by the hardlockup detector.

Only provide the timer if it supports Front Side Bus interrupt delivery.
This condition greatly simplifies the implementation of the detector.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h | 13 +++++++++++++
 arch/x86/kernel/hpet.c      | 27 +++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 9fd112a0ffdd..4d559e0c746f 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -118,6 +118,19 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
 
 #endif /* CONFIG_HPET_EMULATE_RTC */
 
+#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR_HPET
+struct hpet_hld_data {
+	u32		num;
+	u32		flags;
+	u64		ticks_per_second;
+};
+
+extern struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void);
+#else
+static inline struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
+{ return NULL; }
+#endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
+
 #else /* CONFIG_HPET_TIMER */
 
 static inline int hpet_enable(void) { return 0; }
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 1c5c63366109..923b8061e815 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -174,6 +174,33 @@ do {								\
 		_hpet_print_config(__func__, __LINE__);	\
 } while (0)
 
+#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR_HPET
+struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
+{
+	struct hpet_hld_data *hdata;
+	unsigned int cfg;
+
+	cfg = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER_NR));
+
+	if (!(cfg & HPET_TN_FSB_CAP))
+		return NULL;
+
+	hdata = kzalloc(sizeof(*hdata), GFP_KERNEL);
+	if (!hdata)
+		return NULL;
+
+	hdata->flags = HPET_DEV_FSB_CAP;
+
+	if (cfg & HPET_TN_PERIODIC_CAP)
+		hdata->flags |= HPET_DEV_PERI_CAP;
+
+	hdata->num = HPET_WD_TIMER_NR;
+	hdata->ticks_per_second = hpet_get_ticks_per_sec(hpet_readq(HPET_ID));
+
+	return hdata;
+}
+#endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
+
 /*
  * When the hpet driver (/dev/hpet) is enabled, we need to reserve
  * timer 0 and timer 1 in case of RTC emulation. Timer 2 is reserved in case
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH v2 07/14] watchdog/hardlockup: Define a generic function to detect hardlockups
  2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
                   ` (5 preceding siblings ...)
  2019-02-27 16:05 ` [RFC PATCH v2 06/14] x86/hpet: Configure the timer used by the hardlockup detector Ricardo Neri
@ 2019-02-27 16:05 ` Ricardo Neri
  2019-02-27 16:05 ` [RFC PATCH v2 08/14] watchdog/hardlockup: Decouple the hardlockup detector from perf Ricardo Neri
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri, H. Peter Anvin,
	Tony Luck, Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Babu Moger, David S. Miller,
	Benjamin Herrenschmidt, Paul Mackerras, Mathieu Desnoyers,
	Masami Hiramatsu, Andrew Morton, Philippe Ombredanne,
	Colin Ian King, Luis R. Rodriguez, sparclinux, linuxppc-dev

The procedure to detect hardlockups is independent of the underlying
mechanism that generated the non-maskable interrupt used to drive the
detector. Thus, it can be put in a separate, generic function. In this
manner, it can be invoked by various implementations of the NMI watchdog.

For this purpose, move the bulk of watchdog_overflow_callback() to the
new function inspect_for_hardlockups(). This function can then be called
from the applicable NMI handlers.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/nmi.h   |  1 +
 kernel/watchdog_hld.c | 18 +++++++++++-------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 9003e29cde46..5a8b19749769 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -212,6 +212,7 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
 				void __user *, size_t *, loff_t *);
 extern int proc_watchdog_cpumask(struct ctl_table *, int,
 				 void __user *, size_t *, loff_t *);
+void inspect_for_hardlockups(struct pt_regs *regs);
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 #include <asm/nmi.h>
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 71381168dede..9724cd57307b 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -106,14 +106,8 @@ static struct perf_event_attr wd_hw_attr = {
 	.disabled	= 1,
 };
 
-/* Callback function for perf event subsystem */
-static void watchdog_overflow_callback(struct perf_event *event,
-				       struct perf_sample_data *data,
-				       struct pt_regs *regs)
+void inspect_for_hardlockups(struct pt_regs *regs)
 {
-	/* Ensure the watchdog never gets throttled */
-	event->hw.interrupts = 0;
-
 	if (__this_cpu_read(watchdog_nmi_touch) == true) {
 		__this_cpu_write(watchdog_nmi_touch, false);
 		return;
@@ -162,6 +156,16 @@ static void watchdog_overflow_callback(struct perf_event *event,
 	return;
 }
 
+/* Callback function for perf event subsystem */
+static void watchdog_overflow_callback(struct perf_event *event,
+				       struct perf_sample_data *data,
+				       struct pt_regs *regs)
+{
+	/* Ensure the watchdog never gets throttled */
+	event->hw.interrupts = 0;
+	inspect_for_hardlockups(regs);
+}
+
 static int hardlockup_detector_event_create(void)
 {
 	unsigned int cpu = smp_processor_id();
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH v2 08/14] watchdog/hardlockup: Decouple the hardlockup detector from perf
  2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
                   ` (6 preceding siblings ...)
  2019-02-27 16:05 ` [RFC PATCH v2 07/14] watchdog/hardlockup: Define a generic function to detect hardlockups Ricardo Neri
@ 2019-02-27 16:05 ` Ricardo Neri
  2019-03-26 21:18   ` Thomas Gleixner
  2019-02-27 16:05 ` [RFC PATCH v2 09/14] watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based implementation Ricardo Neri
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri, H. Peter Anvin,
	Tony Luck, Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Babu Moger, David S. Miller,
	Benjamin Herrenschmidt, Paul Mackerras, Mathieu Desnoyers,
	Masami Hiramatsu, Andrew Morton, Philippe Ombredanne,
	Colin Ian King, Luis R. Rodriguez, sparclinux, linuxppc-dev

The current default implementation of the hardlockup detector assumes that
it is implemented using perf events. However, the hardlockup detector can
be driven by other sources of non-maskable interrupts (e.g., a properly
configured timer).

Put in a separate file all the code that is specific to perf: create and
manage events, stop and start the detector. This perf-specific code is put
in the new file watchdog_hld_perf.c

The code generic code used to monitor the timers' thresholds, check
timestamps and detect hardlockups remains in watchdog_hld.c

Functions and variables are simply relocated to a new file. No functional
changes were made.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/Makefile            |   3 +-
 kernel/watchdog_hld.c      | 153 --------------------------------
 kernel/watchdog_hld_perf.c | 175 +++++++++++++++++++++++++++++++++++++
 3 files changed, 177 insertions(+), 154 deletions(-)
 create mode 100644 kernel/watchdog_hld_perf.c

diff --git a/kernel/Makefile b/kernel/Makefile
index 6aa7543bcdb2..5b75e6003458 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -81,7 +81,8 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
 obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
-obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR) += watchdog_hld.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld_perf.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 9724cd57307b..372db565b1b9 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -22,12 +22,8 @@
 
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
-static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
-static DEFINE_PER_CPU(struct perf_event *, dead_event);
-static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
-static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
 notrace void arch_touch_nmi_watchdog(void)
 {
@@ -98,14 +94,6 @@ static inline bool watchdog_check_timestamp(void)
 }
 #endif
 
-static struct perf_event_attr wd_hw_attr = {
-	.type		= PERF_TYPE_HARDWARE,
-	.config		= PERF_COUNT_HW_CPU_CYCLES,
-	.size		= sizeof(struct perf_event_attr),
-	.pinned		= 1,
-	.disabled	= 1,
-};
-
 void inspect_for_hardlockups(struct pt_regs *regs)
 {
 	if (__this_cpu_read(watchdog_nmi_touch) == true) {
@@ -156,144 +144,3 @@ void inspect_for_hardlockups(struct pt_regs *regs)
 	return;
 }
 
-/* Callback function for perf event subsystem */
-static void watchdog_overflow_callback(struct perf_event *event,
-				       struct perf_sample_data *data,
-				       struct pt_regs *regs)
-{
-	/* Ensure the watchdog never gets throttled */
-	event->hw.interrupts = 0;
-	inspect_for_hardlockups(regs);
-}
-
-static int hardlockup_detector_event_create(void)
-{
-	unsigned int cpu = smp_processor_id();
-	struct perf_event_attr *wd_attr;
-	struct perf_event *evt;
-
-	wd_attr = &wd_hw_attr;
-	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
-
-	/* Try to register using hardware perf events */
-	evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
-					       watchdog_overflow_callback, NULL);
-	if (IS_ERR(evt)) {
-		pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
-			 PTR_ERR(evt));
-		return PTR_ERR(evt);
-	}
-	this_cpu_write(watchdog_ev, evt);
-	return 0;
-}
-
-/**
- * hardlockup_detector_perf_enable - Enable the local event
- */
-void hardlockup_detector_perf_enable(void)
-{
-	if (hardlockup_detector_event_create())
-		return;
-
-	/* use original value for check */
-	if (!atomic_fetch_inc(&watchdog_cpus))
-		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
-
-	perf_event_enable(this_cpu_read(watchdog_ev));
-}
-
-/**
- * hardlockup_detector_perf_disable - Disable the local event
- */
-void hardlockup_detector_perf_disable(void)
-{
-	struct perf_event *event = this_cpu_read(watchdog_ev);
-
-	if (event) {
-		perf_event_disable(event);
-		this_cpu_write(watchdog_ev, NULL);
-		this_cpu_write(dead_event, event);
-		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
-		atomic_dec(&watchdog_cpus);
-	}
-}
-
-/**
- * hardlockup_detector_perf_cleanup - Cleanup disabled events and destroy them
- *
- * Called from lockup_detector_cleanup(). Serialized by the caller.
- */
-void hardlockup_detector_perf_cleanup(void)
-{
-	int cpu;
-
-	for_each_cpu(cpu, &dead_events_mask) {
-		struct perf_event *event = per_cpu(dead_event, cpu);
-
-		/*
-		 * Required because for_each_cpu() reports  unconditionally
-		 * CPU0 as set on UP kernels. Sigh.
-		 */
-		if (event)
-			perf_event_release_kernel(event);
-		per_cpu(dead_event, cpu) = NULL;
-	}
-	cpumask_clear(&dead_events_mask);
-}
-
-/**
- * hardlockup_detector_perf_stop - Globally stop watchdog events
- *
- * Special interface for x86 to handle the perf HT bug.
- */
-void __init hardlockup_detector_perf_stop(void)
-{
-	int cpu;
-
-	lockdep_assert_cpus_held();
-
-	for_each_online_cpu(cpu) {
-		struct perf_event *event = per_cpu(watchdog_ev, cpu);
-
-		if (event)
-			perf_event_disable(event);
-	}
-}
-
-/**
- * hardlockup_detector_perf_restart - Globally restart watchdog events
- *
- * Special interface for x86 to handle the perf HT bug.
- */
-void __init hardlockup_detector_perf_restart(void)
-{
-	int cpu;
-
-	lockdep_assert_cpus_held();
-
-	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
-		return;
-
-	for_each_online_cpu(cpu) {
-		struct perf_event *event = per_cpu(watchdog_ev, cpu);
-
-		if (event)
-			perf_event_enable(event);
-	}
-}
-
-/**
- * hardlockup_detector_perf_init - Probe whether NMI event is available at all
- */
-int __init hardlockup_detector_perf_init(void)
-{
-	int ret = hardlockup_detector_event_create();
-
-	if (ret) {
-		pr_info("Perf NMI watchdog permanently disabled\n");
-	} else {
-		perf_event_release_kernel(this_cpu_read(watchdog_ev));
-		this_cpu_write(watchdog_ev, NULL);
-	}
-	return ret;
-}
diff --git a/kernel/watchdog_hld_perf.c b/kernel/watchdog_hld_perf.c
new file mode 100644
index 000000000000..1d06ec5a8e42
--- /dev/null
+++ b/kernel/watchdog_hld_perf.c
@@ -0,0 +1,175 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Detect hard lockups on a system
+ *
+ * Copyright (C) Intel Corporation 2019
+ *
+ * Note: All of this code comes from the original perf-specific hardlockup
+ * detector.
+ */
+
+#define pr_fmt(fmt) "NMI perf watchdog: " fmt
+
+#include <linux/nmi.h>
+#include <linux/atomic.h>
+#include <linux/module.h>
+#include <linux/sched/debug.h>
+#include <linux/perf_event.h>
+#include <asm/irq_regs.h>
+
+static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
+static DEFINE_PER_CPU(struct perf_event *, dead_event);
+static struct cpumask dead_events_mask;
+
+static atomic_t watchdog_cpus = ATOMIC_INIT(0);
+
+static struct perf_event_attr wd_hw_attr = {
+	.type		= PERF_TYPE_HARDWARE,
+	.config		= PERF_COUNT_HW_CPU_CYCLES,
+	.size		= sizeof(struct perf_event_attr),
+	.pinned		= 1,
+	.disabled	= 1,
+};
+
+/* Callback function for perf event subsystem */
+static void watchdog_overflow_callback(struct perf_event *event,
+				       struct perf_sample_data *data,
+				       struct pt_regs *regs)
+{
+	/* Ensure the watchdog never gets throttled */
+	event->hw.interrupts = 0;
+	inspect_for_hardlockups(regs);
+}
+
+static int hardlockup_detector_event_create(void)
+{
+	unsigned int cpu = smp_processor_id();
+	struct perf_event_attr *wd_attr;
+	struct perf_event *evt;
+
+	wd_attr = &wd_hw_attr;
+	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
+
+	/* Try to register using hardware perf events */
+	evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
+					       watchdog_overflow_callback, NULL);
+	if (IS_ERR(evt)) {
+		pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
+			 PTR_ERR(evt));
+		return PTR_ERR(evt);
+	}
+	this_cpu_write(watchdog_ev, evt);
+	return 0;
+}
+
+/**
+ * hardlockup_detector_perf_enable - Enable the local event
+ */
+void hardlockup_detector_perf_enable(void)
+{
+	if (hardlockup_detector_event_create())
+		return;
+
+	/* use original value for check */
+	if (!atomic_fetch_inc(&watchdog_cpus))
+		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
+
+	perf_event_enable(this_cpu_read(watchdog_ev));
+}
+
+/**
+ * hardlockup_detector_perf_disable - Disable the local event
+ */
+void hardlockup_detector_perf_disable(void)
+{
+	struct perf_event *event = this_cpu_read(watchdog_ev);
+
+	if (event) {
+		perf_event_disable(event);
+		this_cpu_write(watchdog_ev, NULL);
+		this_cpu_write(dead_event, event);
+		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
+		atomic_dec(&watchdog_cpus);
+	}
+}
+
+/**
+ * hardlockup_detector_perf_cleanup - Cleanup disabled events and destroy them
+ *
+ * Called from lockup_detector_cleanup(). Serialized by the caller.
+ */
+void hardlockup_detector_perf_cleanup(void)
+{
+	int cpu;
+
+	for_each_cpu(cpu, &dead_events_mask) {
+		struct perf_event *event = per_cpu(dead_event, cpu);
+
+		/*
+		 * Required because for_each_cpu() reports  unconditionally
+		 * CPU0 as set on UP kernels. Sigh.
+		 */
+		if (event)
+			perf_event_release_kernel(event);
+		per_cpu(dead_event, cpu) = NULL;
+	}
+	cpumask_clear(&dead_events_mask);
+}
+
+/**
+ * hardlockup_detector_perf_stop - Globally stop watchdog events
+ *
+ * Special interface for x86 to handle the perf HT bug.
+ */
+void __init hardlockup_detector_perf_stop(void)
+{
+	int cpu;
+
+	lockdep_assert_cpus_held();
+
+	for_each_online_cpu(cpu) {
+		struct perf_event *event = per_cpu(watchdog_ev, cpu);
+
+		if (event)
+			perf_event_disable(event);
+	}
+}
+
+/**
+ * hardlockup_detector_perf_restart - Globally restart watchdog events
+ *
+ * Special interface for x86 to handle the perf HT bug.
+ */
+void __init hardlockup_detector_perf_restart(void)
+{
+	int cpu;
+
+	lockdep_assert_cpus_held();
+
+	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+		return;
+
+	for_each_online_cpu(cpu) {
+		struct perf_event *event = per_cpu(watchdog_ev, cpu);
+
+		if (event)
+			perf_event_enable(event);
+	}
+}
+
+/**
+ * hardlockup_detector_perf_init - Probe whether NMI event is available at all
+ */
+int __init hardlockup_detector_perf_init(void)
+{
+	int ret = hardlockup_detector_event_create();
+
+	if (ret) {
+		pr_info("Perf NMI watchdog permanently disabled\n");
+	} else {
+		perf_event_release_kernel(this_cpu_read(watchdog_ev));
+		this_cpu_write(watchdog_ev, NULL);
+	}
+	return ret;
+}
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH v2 09/14] watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based implementation
  2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
                   ` (7 preceding siblings ...)
  2019-02-27 16:05 ` [RFC PATCH v2 08/14] watchdog/hardlockup: Decouple the hardlockup detector from perf Ricardo Neri
@ 2019-02-27 16:05 ` Ricardo Neri
  2019-02-27 16:17   ` Paul E. McKenney
  2019-02-27 16:05 ` [RFC PATCH v2 10/14] kernel/watchdog: Add a function to obtain the watchdog_allowed_mask Ricardo Neri
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri, H. Peter Anvin,
	Tony Luck, Rafael J. Wysocki, Don Zickus, Nicholas Piggin,
	Michael Ellerman, Frederic Weisbecker, Alexei Starovoitov,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, sparclinux, linuxppc-dev

CPU architectures that have an NMI watchdog use arch_touch_nmi_watchdog()
to briefly ignore the hardlockup detector. If the architecture does not
have an NMI watchdog, one can be constructed using a source of non-
maskable interrupts. In this case, arch_touch_nmi_watchdog() is common
to any underlying hardware resource used to drive the detector and needs
to be available to other kernel subsystems if hardware different from perf
drives the detector.

There exists perf-based and HPET-based implementations. Make it available
to the latter.

For clarity, wrap this function in a separate preprocessor conditional
from functions which are truly specific to the perf-based implementation.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/nmi.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 5a8b19749769..bf5ebcfdd590 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -94,8 +94,16 @@ static inline void hardlockup_detector_disable(void) {}
 # define NMI_WATCHDOG_SYSCTL_PERM	0444
 #endif
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) || \
+    defined(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET)
 extern void arch_touch_nmi_watchdog(void);
+#else
+# if !defined(CONFIG_HAVE_NMI_WATCHDOG)
+static inline void arch_touch_nmi_watchdog(void) {}
+# endif
+#endif
+
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
 extern void hardlockup_detector_perf_disable(void);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH v2 10/14] kernel/watchdog: Add a function to obtain the watchdog_allowed_mask
  2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
                   ` (8 preceding siblings ...)
  2019-02-27 16:05 ` [RFC PATCH v2 09/14] watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based implementation Ricardo Neri
@ 2019-02-27 16:05 ` Ricardo Neri
  2019-03-26 21:22   ` Thomas Gleixner
  2019-04-09 11:34   ` Peter Zijlstra
  2019-02-27 16:05 ` [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector Ricardo Neri
                   ` (3 subsequent siblings)
  13 siblings, 2 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri, H. Peter Anvin,
	Tony Luck, Rafael J. Wysocki, Don Zickus, Nicholas Piggin,
	Michael Ellerman, Frederic Weisbecker, Alexei Starovoitov,
	Babu Moger, Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, David S. Miller, Benjamin Herrenschmidt,
	sparclinux, linuxppc-dev

Implementations of NMI watchdogs that use a single piece of hardware to
monitor all the CPUs in the system (as opposed to per-CPU implementations
such as perf) need to know which CPUs the watchdog is allowed to monitor.
In this manner, non-maskable interrupts are directed only to the monitored
CPUs.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: sparclinux@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/nmi.h | 1 +
 kernel/watchdog.c   | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index bf5ebcfdd590..b563fb6ef21d 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -83,6 +83,7 @@ static inline void reset_hung_task_detector(void) { }
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void hardlockup_detector_disable(void);
+extern struct cpumask *watchdog_get_allowed_cpumask(void);
 extern unsigned int hardlockup_panic;
 #else
 static inline void hardlockup_detector_disable(void) {}
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 8fbfda94a67b..367aa81294ef 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -44,7 +44,7 @@ int __read_mostly soft_watchdog_user_enabled = 1;
 int __read_mostly watchdog_thresh = 10;
 int __read_mostly nmi_watchdog_available;
 
-struct cpumask watchdog_allowed_mask __read_mostly;
+static struct cpumask watchdog_allowed_mask __read_mostly;
 
 struct cpumask watchdog_cpumask __read_mostly;
 unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
@@ -92,6 +92,11 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
 }
 __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
 # endif /* CONFIG_SMP */
+
+struct cpumask *watchdog_get_allowed_cpumask(void)
+{
+	return &watchdog_allowed_mask;
+}
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
 /*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
  2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
                   ` (9 preceding siblings ...)
  2019-02-27 16:05 ` [RFC PATCH v2 10/14] kernel/watchdog: Add a function to obtain the watchdog_allowed_mask Ricardo Neri
@ 2019-02-27 16:05 ` Ricardo Neri
  2019-03-26 20:49   ` Thomas Gleixner
                     ` (2 more replies)
  2019-02-27 16:05 ` [RFC PATCH v2 12/14] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI Ricardo Neri
                   ` (2 subsequent siblings)
  13 siblings, 3 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri, H. Peter Anvin,
	Tony Luck, Clemens Ladisch, Arnd Bergmann, Philippe Ombredanne,
	Kate Stewart, Rafael J. Wysocki, Mimi Zohar, Jan Kiszka,
	Nick Desaulniers, Masahiro Yamada, Nayna Jain

This is the initial implementation of a hardlockup detector driven by an
HPET timer. This initial implementation includes functions to control
the timer via its registers. It also requests such timer, installs
an NMI interrupt handler and performs the initial configuration of
the timer.

The detector is not functional at this stage. A subsequent changeset will
invoke the interfaces provides by this detector.

In order to minimize the reconfiguration of interrupts, the HPET timer
always target the same CPU (the first CPU present in the
watchdog_allowed_mask cpumask, the handling CPU). If the HPET caused
an NMI on the handling CPU, an NMI interprocessor interrupt is sent
to the other CPUs in the watchdog_allowed_mask. Upon receiving the
interrupt, such CPUs will check a cpumask and inspect for hardlockups
if requested in such mask.

This detector relies on an HPET timer that is capable of using Front Side
Bus interrupts. In order to avoid using the generic interrupt code,
program directly the MSI message register of the HPET timer.

HPET registers are only accessed to kick the timer after looking for
hardlockups. This happens every watchdog_thresh seconds. A subsequent
changeset will determine whether the HPET timer caused the interrupt based
on the value of the time-stamp counter. For now, just add a stub function.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Nayna Jain <nayna@linux.ibm.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/Kconfig.debug              |  10 +
 arch/x86/include/asm/hpet.h         |  12 +
 arch/x86/kernel/Makefile            |   1 +
 arch/x86/kernel/watchdog_hld_hpet.c | 405 ++++++++++++++++++++++++++++
 4 files changed, 428 insertions(+)
 create mode 100644 arch/x86/kernel/watchdog_hld_hpet.c

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 15d0fbe27872..3a2845a29e8a 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -169,6 +169,16 @@ config IOMMU_LEAK
 config HAVE_MMIOTRACE_SUPPORT
 	def_bool y
 
+config X86_HARDLOCKUP_DETECTOR_HPET
+	bool "Use HPET Timer for Hard Lockup Detection"
+	select SOFTLOCKUP_DETECTOR
+	select HARDLOCKUP_DETECTOR
+	depends on HPET_TIMER && HPET && X86_64
+	help
+	  Say y to enable a hardlockup detector that is driven by an High-
+	  Precision Event Timer. This option is helpful to not use counters
+	  from the Performance Monitoring Unit to drive the detector.
+
 config X86_DECODER_SELFTEST
 	bool "x86 instruction decoder selftest"
 	depends on DEBUG_KERNEL && KPROBES
diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 4d559e0c746f..15dc3b576496 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -123,12 +123,24 @@ struct hpet_hld_data {
 	u32		num;
 	u32		flags;
 	u64		ticks_per_second;
+	u32		handling_cpu;
+	struct cpumask	cpu_monitored_mask;
+	struct msi_msg	msi_msg;
 };
 
 extern struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void);
+extern int hardlockup_detector_hpet_init(void);
+extern void hardlockup_detector_hpet_stop(void);
+extern void hardlockup_detector_hpet_enable(void);
+extern void hardlockup_detector_hpet_disable(void);
 #else
 static inline struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
 { return NULL; }
+static inline int hardlockup_detector_hpet_init(void)
+{ return -ENODEV; }
+static inline void hardlockup_detector_hpet_stop(void) {}
+static inline void hardlockup_detector_hpet_enable(void) {}
+static inline void hardlockup_detector_hpet_disable(void) {}
 #endif /* CONFIG_X86_HARDLOCKUP_DETECTOR_HPET */
 
 #else /* CONFIG_HPET_TIMER */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 00b7e27bc2b7..9d610e8a9224 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_VM86)		+= vm86_32.o
 obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
 
 obj-$(CONFIG_HPET_TIMER) 	+= hpet.o
+obj-$(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET) += watchdog_hld_hpet.o
 obj-$(CONFIG_APB_TIMER)		+= apb_timer.o
 
 obj-$(CONFIG_AMD_NB)		+= amd_nb.o
diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c
new file mode 100644
index 000000000000..cfa284da4bf6
--- /dev/null
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -0,0 +1,405 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A hardlockup detector driven by an HPET timer.
+ *
+ * Copyright (C) Intel Corporation 2019
+ *
+ * A hardlockup detector driven by an HPET timer. It implements the same
+ * interfaces as the PERF-based hardlockup detector.
+ *
+ * In order to minimize the reconfiguration of interrupts, the HPET timer
+ * always targets the same CPU (the first CPU present in the
+ * watchdog_allowed_mask cpumask, the handling CPU). If the HPET caused
+ * an NMI on the handling CPU, an NMI interprocessor interrupt is sent
+ * to the other CPUs in the watchdog_allowed_mask.
+ */
+
+#include <linux/nmi.h>
+#include <linux/hpet.h>
+#include <asm/msidef.h>
+#include <asm/hpet.h>
+
+static struct hpet_hld_data *hld_data;
+
+/**
+ * get_count() - Get the current count of the HPET timer
+ *
+ * Returns:
+ *
+ * Value of the main counter of the HPET timer
+ */
+static inline unsigned long get_count(void)
+{
+	return hpet_readq(HPET_COUNTER);
+}
+
+/**
+ * set_comparator() - Update the comparator in an HPET timer instance
+ * @hdata:	A data structure with the timer instance to update
+ * @cmp:	The value to write in the in the comparator registere
+ *
+ * Returns:
+ *
+ * None
+ */
+static inline void set_comparator(struct hpet_hld_data *hdata,
+				  unsigned long cmp)
+{
+	hpet_writeq(cmp, HPET_Tn_CMP(hdata->num));
+}
+
+/**
+ * kick_timer() - Reprogram timer to expire in the future
+ * @hdata:	A data structure with the timer instance to update
+ * @force:	Force reprogram. Useful enabling or re-enabling detector.
+ *
+ * Reprogram the timer to expire within watchdog_thresh seconds in the future.
+ *
+ * Returns:
+ *
+ * None
+ */
+static void kick_timer(struct hpet_hld_data *hdata, bool force)
+{
+	bool kick_needed = force || !(hdata->flags & HPET_DEV_PERI_CAP);
+	unsigned long new_compare, count;
+
+	/*
+	 * Update the comparator in increments of watch_thresh seconds relative
+	 * to the current count. Since watch_thresh is given in seconds, we
+	 * are able to update the comparator before the counter reaches such new
+	 * value.
+	 *
+	 * Let it wrap around if needed.
+	 */
+
+	if (kick_needed) {
+		count = get_count();
+
+		new_compare = count + watchdog_thresh * hdata->ticks_per_second;
+
+		set_comparator(hdata, new_compare);
+	}
+}
+
+/**
+ * disable_timer() - Disable an HPET timer instance
+ * @hdata:	A data structure with the timer instance to disable
+ *
+ * Returns:
+ *
+ * None
+ */
+static void disable_timer(struct hpet_hld_data *hdata)
+{
+	unsigned int v;
+
+	v = hpet_readl(HPET_Tn_CFG(hdata->num));
+	v &= ~HPET_TN_ENABLE;
+	hpet_writel(v, HPET_Tn_CFG(hdata->num));
+}
+
+/**
+ * enable_timer() - Enable an HPET timer instance
+ * @hdata:	A data structure with the timer instance to enable
+ *
+ * Returns:
+ *
+ * None
+ */
+static void enable_timer(struct hpet_hld_data *hdata)
+{
+	unsigned long v;
+
+	v = hpet_readl(HPET_Tn_CFG(hdata->num));
+	v |= HPET_TN_ENABLE;
+	hpet_writel(v, HPET_Tn_CFG(hdata->num));
+}
+
+/**
+ * set_periodic() - Set an HPET timer instance in periodic mode
+ * @hdata:	A data structure with the timer instance to enable
+ *
+ * If the timer supports periodic mode, configure it in such mode.
+ * Returns:
+ *
+ * None
+ */
+static void set_periodic(struct hpet_hld_data *hdata)
+{
+	unsigned long v;
+
+	if (!(hdata->flags & HPET_DEV_PERI_CAP))
+		return;
+
+	v = hpet_readl(HPET_Tn_CFG(hdata->num));
+	v |= HPET_TN_PERIODIC;
+	hpet_writel(v, HPET_Tn_CFG(hdata->num));
+}
+
+/**
+ * is_hpet_wdt_interrupt() - Determine if an HPET timer caused interrupt
+ * @hdata:	A data structure with the timer instance to enable
+ *
+ * Returns:
+ *
+ * True if the HPET watchdog timer caused the interrupt. False otherwise.
+ */
+static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
+{
+	return false;
+}
+
+/**
+ * compose_msi_msg() - Populate address and data fields of an MSI message
+ * @hdata:	A data strucure with the message to populate
+ *
+ * Populate an MSI message to deliver an NMI interrupt. Fields are populated
+ * as in the MSI interrupt domain. This function does not populate the
+ * Destination ID.
+ *
+ * Returns: none
+ */
+static void compose_msi_msg(struct hpet_hld_data *hdata)
+{
+	struct msi_msg *msg = &hdata->msi_msg;
+
+	/*
+	 * The HPET FSB Interrupt Route register does not have an
+	 * address_hi part.
+	 */
+	msg->address_lo = MSI_ADDR_BASE_LO;
+
+	if (apic->irq_dest_mode == 0)
+		msg->address_lo |= MSI_ADDR_DEST_MODE_PHYSICAL;
+	else
+		msg->address_lo |= MSI_ADDR_DEST_MODE_LOGICAL;
+
+	msg->address_lo |= MSI_ADDR_REDIRECTION_CPU;
+
+	/*
+	 * On edge trigger, we don't care about assert level. Also,
+	 * since delivery mode is NMI, no irq vector is needed.
+	 */
+	msg->data = MSI_DATA_TRIGGER_EDGE | MSI_DATA_LEVEL_ASSERT |
+		    MSI_DATA_DELIVERY_NMI;
+}
+
+/** update_handling_cpu() - Update APIC destid of handling CPU
+ * @hdata:	A data strucure with the MSI message to update
+ *
+ * Update the APIC destid of the MSI message generated by the HPET timer
+ * on expiration.
+ */
+static int update_handling_cpu(struct hpet_hld_data *hdata)
+{
+	unsigned int destid;
+
+	hdata->msi_msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
+	destid = apic->calc_dest_apicid(hdata->handling_cpu);
+	hdata->msi_msg.address_lo |= MSI_ADDR_DEST_ID(destid);
+
+	hpet_writel(hdata->msi_msg.address_lo, HPET_Tn_ROUTE(hdata->num) + 4);
+
+	return 0;
+}
+
+/**
+ * hardlockup_detector_nmi_handler() - NMI Interrupt handler
+ * @val:	Attribute associated with the NMI. Not used.
+ * @regs:	Register values as seen when the NMI was asserted
+ *
+ * When in NMI context, check if it was caused by the expiration of the
+ * HPET timer. If yes, create a CPU mask to issue an IPI to the rest
+ * of monitored CPUs. Upon receiving their own NMI, the other CPUs will
+ * check such mask to determine if they need to also look for lockups.
+ *
+ * Returns:
+ *
+ * NMI_DONE if the HPET timer did not cause the interrupt. NMI_HANDLED
+ * otherwise.
+ */
+static int hardlockup_detector_nmi_handler(unsigned int val,
+					   struct pt_regs *regs)
+{
+	struct hpet_hld_data *hdata = hld_data;
+	unsigned int cpu = smp_processor_id();
+
+	if (is_hpet_wdt_interrupt(hdata)) {
+		/* Get ready to check other CPUs for hardlockups. */
+		cpumask_copy(&hdata->cpu_monitored_mask,
+			     watchdog_get_allowed_cpumask());
+		cpumask_clear_cpu(smp_processor_id(),
+				  &hdata->cpu_monitored_mask);
+
+		apic->send_IPI_mask_allbutself(&hdata->cpu_monitored_mask,
+					       NMI_VECTOR);
+
+		kick_timer(hdata, !(hdata->flags & HPET_DEV_PERI_CAP));
+
+		inspect_for_hardlockups(regs);
+
+		return NMI_HANDLED;
+	}
+
+	if (cpumask_test_and_clear_cpu(cpu, &hdata->cpu_monitored_mask)) {
+		inspect_for_hardlockups(regs);
+		return NMI_HANDLED;
+	}
+
+	return NMI_DONE;
+}
+
+/**
+ * setup_irq_msi_mode() - Configure the timer to deliver an MSI interrupt
+ * @data:	Data associated with the instance of the HPET timer to configure
+ *
+ * Configure an instance of the HPET timer to deliver interrupts via the Front-
+ * Side Bus.
+ *
+ * Returns:
+ *
+ * 0 success. An error code in configuration was unsuccessful.
+ */
+static int setup_irq_msi_mode(struct hpet_hld_data *hdata)
+{
+	unsigned int v;
+
+	compose_msi_msg(hdata);
+	hpet_writel(hdata->msi_msg.data, HPET_Tn_ROUTE(hdata->num));
+	hpet_writel(hdata->msi_msg.address_lo, HPET_Tn_ROUTE(hdata->num) + 4);
+
+	/*
+	 * Since FSB interrupt delivery is used, configure as edge-triggered
+	 * interrupt.
+	 */
+	v = hpet_readl(HPET_Tn_CFG(hdata->num));
+	v &= ~HPET_TN_LEVEL;
+	v |= HPET_TN_FSB;
+
+	hpet_writel(v, HPET_Tn_CFG(hdata->num));
+
+	return 0;
+}
+
+/**
+ * setup_hpet_irq() - Configure the interrupt delivery of an HPET timer
+ * @data:	Data associated with the instance of the HPET timer to configure
+ *
+ * Configure the interrupt parameters of an HPET timer. If supported, configure
+ * interrupts to be delivered via the Front-Side Bus. Also, install an interrupt
+ * handler.
+ *
+ * Returns:
+ *
+ * 0 success. An error code in configuration was unsuccessful.
+ */
+static int setup_hpet_irq(struct hpet_hld_data *hdata)
+{
+	int ret;
+
+	ret = setup_irq_msi_mode(hdata);
+	if (ret)
+		return ret;
+
+	ret = register_nmi_handler(NMI_LOCAL, hardlockup_detector_nmi_handler,
+				   0, "hpet_hld");
+
+	return ret;
+}
+
+/**
+ * hardlockup_detector_hpet_enable() - Enable the hardlockup detector
+ *
+ * This function is called for each CPU that enables the lockup watchdog.
+ * Since the HPET timer only targets the handling CPU, configure the timer
+ * only in such case.
+ *
+ * Returns:
+ *
+ * None
+ */
+void hardlockup_detector_hpet_enable(void)
+{
+	struct cpumask *allowed = watchdog_get_allowed_cpumask();
+	unsigned int cpu = smp_processor_id();
+
+	if (!hld_data)
+		return;
+
+	hld_data->handling_cpu = cpumask_first(allowed);
+
+	if (cpu == hld_data->handling_cpu) {
+		update_handling_cpu(hld_data);
+		/* Force timer kick when detector is just enabled */
+		kick_timer(hld_data, true);
+		enable_timer(hld_data);
+	}
+}
+
+/**
+ * hardlockup_detector_hpet_disable() - Disable the hardlockup detector
+ *
+ * The hardlockup detector is disabled for the CPU that executes the
+ * function.
+ *
+ * None
+ */
+void hardlockup_detector_hpet_disable(void)
+{
+	struct cpumask *allowed = watchdog_get_allowed_cpumask();
+
+	if (!hld_data)
+		return;
+
+	/* Only disable the timer if there are no more CPUs to monitor. */
+	if (!cpumask_weight(allowed))
+		disable_timer(hld_data);
+}
+
+/**
+ * hardlockup_detector_hpet_stop() - Stop the NMI watchdog on all CPUs
+ *
+ * Returns:
+ *
+ * None
+ */
+void hardlockup_detector_hpet_stop(void)
+{
+	disable_timer(hld_data);
+}
+
+/**
+ * hardlockup_detector_hpet_init() - Initialize the hardlockup detector
+ *
+ * Only initialize and configure the detector if an HPET is available on the
+ * system.
+ *
+ * Returns:
+ *
+ * 0 success. An error code if initialization was unsuccessful.
+ */
+int __init hardlockup_detector_hpet_init(void)
+{
+	int ret;
+
+	if (!is_hpet_enabled())
+		return -ENODEV;
+
+	if (check_tsc_unstable())
+		return -ENODEV;
+
+	hld_data = hpet_hardlockup_detector_assign_timer();
+	if (!hld_data)
+		return -ENODEV;
+
+	disable_timer(hld_data);
+
+	set_periodic(hld_data);
+
+	ret = setup_hpet_irq(hld_data);
+	if (ret)
+		return -ENODEV;
+
+	return 0;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH v2 12/14] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI
  2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
                   ` (10 preceding siblings ...)
  2019-02-27 16:05 ` [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector Ricardo Neri
@ 2019-02-27 16:05 ` Ricardo Neri
  2019-03-26 20:55   ` Thomas Gleixner
  2019-04-09 11:28   ` Peter Zijlstra
  2019-02-27 16:05 ` [RFC PATCH v2 13/14] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter Ricardo Neri
  2019-02-27 16:05 ` [RFC PATCH v2 14/14] x86/watchdog: Add a shim hardlockup detector Ricardo Neri
  13 siblings, 2 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri, H. Peter Anvin,
	Tony Luck, Clemens Ladisch, Arnd Bergmann, Philippe Ombredanne,
	Kate Stewart, Rafael J. Wysocki, Mimi Zohar, Jan Kiszka,
	Nick Desaulniers, Masahiro Yamada, Nayna Jain

The only direct method to determine whether an HPET timer caused an
interrupt is to read the Interrupt Status register. Unfortunately,
reading HPET registers is slow and, therefore, it is not recommended to
read them while in NMI context. Furthermore, status is not available if
the interrupt is generated vi the Front Side Bus.

An indirect manner is to compute the expected value of the the time-stamp
counter and, at the time of the interrupt and verify that its actual
value is within a range of the expected value. Since the hardlockup
detector operates in seconds, high precision is not needed. This
implementation considers that the HPET caused the HMI if the time-stamp
counter reads the expected value -/+ 1.5%. This value is selected is it
is equivalent to 1/64 and the division can be performed using bit
shifts. Experimentally, the error in the estimation is consistently less
than 1%.

Also, only read the time-stamp counter of the handling CPU (the one
targeted by the HPET timer). This helps to avoid variability of the time
stamp across CPUs.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Nayna Jain <nayna@linux.ibm.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Suggested-by: Andi Kleen <andi.kleen@intel.com> 
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h         |  2 ++
 arch/x86/kernel/watchdog_hld_hpet.c | 28 +++++++++++++++++++++++++---
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 15dc3b576496..09763340c911 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -123,6 +123,8 @@ struct hpet_hld_data {
 	u32		num;
 	u32		flags;
 	u64		ticks_per_second;
+	u64		tsc_next;
+	u64		tsc_next_error;
 	u32		handling_cpu;
 	struct cpumask	cpu_monitored_mask;
 	struct msi_msg	msi_msg;
diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c
index cfa284da4bf6..65b4699f249a 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -55,6 +55,11 @@ static inline void set_comparator(struct hpet_hld_data *hdata,
  *
  * Reprogram the timer to expire within watchdog_thresh seconds in the future.
  *
+ * Also compute the expected value of the time-stamp counter at the time of
+ * expiration as well as a deviation from the expected value. The maximum
+ * deviation is of ~1.5%. This deviation can be easily computed by shifting
+ * by 6 positions the delta between the current and expected time-stamp values.
+ *
  * Returns:
  *
  * None
@@ -62,7 +67,18 @@ static inline void set_comparator(struct hpet_hld_data *hdata,
 static void kick_timer(struct hpet_hld_data *hdata, bool force)
 {
 	bool kick_needed = force || !(hdata->flags & HPET_DEV_PERI_CAP);
-	unsigned long new_compare, count;
+	unsigned long tsc_curr, tsc_delta, new_compare, count;
+
+	/* Start obtaining the current TSC and HPET counts. */
+	tsc_curr = rdtsc();
+
+	if (kick_needed)
+		count = get_count();
+
+	tsc_delta = (unsigned long)watchdog_thresh * (unsigned long)tsc_khz
+		    * 1000L;
+	hdata->tsc_next = tsc_curr + tsc_delta;
+	hdata->tsc_next_error = tsc_delta >> 6;
 
 	/*
 	 * Update the comparator in increments of watch_thresh seconds relative
@@ -74,8 +90,6 @@ static void kick_timer(struct hpet_hld_data *hdata, bool force)
 	 */
 
 	if (kick_needed) {
-		count = get_count();
-
 		new_compare = count + watchdog_thresh * hdata->ticks_per_second;
 
 		set_comparator(hdata, new_compare);
@@ -147,6 +161,14 @@ static void set_periodic(struct hpet_hld_data *hdata)
  */
 static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
 {
+	if (smp_processor_id() == hdata->handling_cpu) {
+		unsigned long tsc_curr;
+
+		tsc_curr = rdtsc();
+		if (abs(tsc_curr - hdata->tsc_next) < hdata->tsc_next_error)
+			return true;
+	}
+
 	return false;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH v2 13/14] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter
  2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
                   ` (11 preceding siblings ...)
  2019-02-27 16:05 ` [RFC PATCH v2 12/14] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI Ricardo Neri
@ 2019-02-27 16:05 ` Ricardo Neri
  2019-03-26 21:29   ` Thomas Gleixner
  2019-02-27 16:05 ` [RFC PATCH v2 14/14] x86/watchdog: Add a shim hardlockup detector Ricardo Neri
  13 siblings, 1 reply; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri, H. Peter Anvin,
	Tony Luck, Clemens Ladisch, Arnd Bergmann, Philippe Ombredanne,
	Kate Stewart, Rafael J. Wysocki, Mimi Zohar, Jan Kiszka,
	Nick Desaulniers, Masahiro Yamada, Nayna Jain

Keep the HPET-based hardlockup detector disabled unless explicitly enabled
via a command line argument. If such parameter is not given, the hardlockup
detector will fallback to use the perf-based implementation.

The function hardlockup_panic_setup() is updated to return 0 in order to
to allow __setup functions of specific hardlockup detectors (in this case
hardlockup_detector_hpet_setup()) to inspect the nmi_watchdog boot
parameter.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Nayna Jain <nayna@linux.ibm.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

--
checkpatch gives the following warning:

CHECK: __setup appears un-documented -- check Documentation/admin-guide/kernel-parameters.rst
+__setup("nmi_watchdog=", hardlockup_detector_hpet_setup);

This is a false-positive as the option nmi_watchdog is already
documented. The option is re-evaluated in this file as well.
---
 .../admin-guide/kernel-parameters.txt         |  6 +++++-
 arch/x86/kernel/watchdog_hld_hpet.c           | 20 +++++++++++++++++++
 kernel/watchdog.c                             |  2 +-
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cf8f5877d85f..c6270ddeb130 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2739,7 +2739,7 @@
 			Format: [state][,regs][,debounce][,die]
 
 	nmi_watchdog=	[KNL,BUGS=X86] Debugging features for SMP kernels
-			Format: [panic,][nopanic,][num]
+			Format: [panic,][nopanic,][num,][hpet]
 			Valid num: 0 or 1
 			0 - turn hardlockup detector in nmi_watchdog off
 			1 - turn hardlockup detector in nmi_watchdog on
@@ -2749,6 +2749,10 @@
 			please see 'nowatchdog'.
 			This is useful when you use a panic=... timeout and
 			need the box quickly up again.
+			When hpet is specified, the NMI watchdog will be driven
+			by an HPET timer, if available in the system. Otherwise,
+			the perf-based implementation will be used. Specifying
+			hpet implies that nmi_watchdog is on.
 
 			These settings can be accessed at runtime via
 			the nmi_watchdog and hardlockup_panic sysctls.
diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c
index 65b4699f249a..4402deff4b77 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -20,6 +20,7 @@
 #include <asm/hpet.h>
 
 static struct hpet_hld_data *hld_data;
+static bool hardlockup_use_hpet;
 
 /**
  * get_count() - Get the current count of the HPET timer
@@ -391,6 +392,22 @@ void hardlockup_detector_hpet_stop(void)
 	disable_timer(hld_data);
 }
 
+/**
+ * hardlockup_detector_hpet_setup() - Parse command-line parameters
+ * @str:	A string containing the kernel command line
+ *
+ * Parse the nmi_watchdog parameter from the kernel command line. If
+ * selected by the user, use this implementation to detect hardlockups.
+ */
+static int __init hardlockup_detector_hpet_setup(char *str)
+{
+	if (strstr(str, "hpet"))
+		hardlockup_use_hpet = true;
+
+	return 0;
+}
+__setup("nmi_watchdog=", hardlockup_detector_hpet_setup);
+
 /**
  * hardlockup_detector_hpet_init() - Initialize the hardlockup detector
  *
@@ -405,6 +422,9 @@ int __init hardlockup_detector_hpet_init(void)
 {
 	int ret;
 
+	if (!hardlockup_use_hpet)
+		return -ENODEV;
+
 	if (!is_hpet_enabled())
 		return -ENODEV;
 
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 367aa81294ef..28cad7310378 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -78,7 +78,7 @@ static int __init hardlockup_panic_setup(char *str)
 		nmi_watchdog_user_enabled = 0;
 	else if (!strncmp(str, "1", 1))
 		nmi_watchdog_user_enabled = 1;
-	return 1;
+	return 0;
 }
 __setup("nmi_watchdog=", hardlockup_panic_setup);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH v2 14/14] x86/watchdog: Add a shim hardlockup detector
  2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
                   ` (12 preceding siblings ...)
  2019-02-27 16:05 ` [RFC PATCH v2 13/14] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter Ricardo Neri
@ 2019-02-27 16:05 ` Ricardo Neri
  13 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-02-27 16:05 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, Ricardo Neri, H. Peter Anvin,
	Tony Luck, Clemens Ladisch, Arnd Bergmann, Philippe Ombredanne,
	Kate Stewart, Rafael J. Wysocki, Mimi Zohar, Jan Kiszka,
	Nick Desaulniers, Masahiro Yamada, Nayna Jain

The generic hardlockup detector is based on perf. It also provides a set
of weak stubs that CPU architectures can override. Add a shim hardlockup
detector for x86 that selects between perf and hpet implementations.

Specifically, this shim implementation is needed for the HPET-based
hardlockup detector; it can also be used for future implementations.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Nayna Jain <nayna@linux.ibm.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/Kconfig.debug         |  4 ++
 arch/x86/kernel/Makefile       |  1 +
 arch/x86/kernel/watchdog_hld.c | 78 ++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+)
 create mode 100644 arch/x86/kernel/watchdog_hld.c

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 3a2845a29e8a..5e5f4d2330af 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -169,10 +169,14 @@ config IOMMU_LEAK
 config HAVE_MMIOTRACE_SUPPORT
 	def_bool y
 
+config X86_HARDLOCKUP_DETECTOR
+	bool
+
 config X86_HARDLOCKUP_DETECTOR_HPET
 	bool "Use HPET Timer for Hard Lockup Detection"
 	select SOFTLOCKUP_DETECTOR
 	select HARDLOCKUP_DETECTOR
+	select X86_HARDLOCKUP_DETECTOR
 	depends on HPET_TIMER && HPET && X86_64
 	help
 	  Say y to enable a hardlockup detector that is driven by an High-
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 9d610e8a9224..5372453fe2a7 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_VM86)		+= vm86_32.o
 obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
 
 obj-$(CONFIG_HPET_TIMER) 	+= hpet.o
+obj-$(CONFIG_X86_HARDLOCKUP_DETECTOR) += watchdog_hld.o
 obj-$(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET) += watchdog_hld_hpet.o
 obj-$(CONFIG_APB_TIMER)		+= apb_timer.o
 
diff --git a/arch/x86/kernel/watchdog_hld.c b/arch/x86/kernel/watchdog_hld.c
new file mode 100644
index 000000000000..a2f6190dbce0
--- /dev/null
+++ b/arch/x86/kernel/watchdog_hld.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A shim hardlockup detector. It simply overrides the weak stubs of the
+ * generic implementation.
+ *
+ * Copyright (C) Intel Corporation 2019
+ */
+
+#include <linux/nmi.h>
+#include <asm/hpet.h>
+
+enum x86_hardlockup_detector {
+	X86_HARDLOCKUP_DETECTOR_PERF,
+	X86_HARDLOCKUP_DETECTOR_HPET,
+};
+
+static enum __read_mostly x86_hardlockup_detector detector_type;
+
+int watchdog_nmi_enable(unsigned int cpu)
+{
+	if (detector_type == X86_HARDLOCKUP_DETECTOR_PERF) {
+		hardlockup_detector_perf_enable();
+		return 0;
+	}
+
+	if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET) {
+		hardlockup_detector_hpet_enable();
+		return 0;
+	}
+
+	return -ENODEV;
+}
+
+void watchdog_nmi_disable(unsigned int cpu)
+{
+	if (detector_type == X86_HARDLOCKUP_DETECTOR_PERF) {
+		hardlockup_detector_perf_disable();
+		return;
+	}
+
+	if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET) {
+		hardlockup_detector_hpet_disable();
+		return;
+	}
+}
+
+int __init watchdog_nmi_probe(void)
+{
+	int ret;
+
+	/*
+	 * Try first with the HPET hardlockup detector. It will only
+	 * succeed if selected at build time and the nmi_watchdog
+	 * command-line parameter is configured. This ensure that the
+	 * perf-based detector is used by default, if selected at
+	 * build time.
+	 */
+	ret = hardlockup_detector_hpet_init();
+	if (!ret) {
+		detector_type = X86_HARDLOCKUP_DETECTOR_HPET;
+		return ret;
+	}
+
+	ret = hardlockup_detector_perf_init();
+	if (!ret) {
+		detector_type = X86_HARDLOCKUP_DETECTOR_PERF;
+		return ret;
+	}
+
+	return ret;
+}
+
+void watchdog_nmi_stop(void)
+{
+	/* Only the HPET lockup detector defines a stop function. */
+	if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET)
+		hardlockup_detector_hpet_enable();
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 09/14] watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based implementation
  2019-02-27 16:05 ` [RFC PATCH v2 09/14] watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based implementation Ricardo Neri
@ 2019-02-27 16:17   ` Paul E. McKenney
  2019-03-01  1:17     ` Ricardo Neri
  0 siblings, 1 reply; 49+ messages in thread
From: Paul E. McKenney @ 2019-02-27 16:17 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel,
	Ricardo Neri, H. Peter Anvin, Tony Luck, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	David S. Miller, Benjamin Herrenschmidt, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Luis R. Rodriguez, Waiman Long, Josh Poimboeuf, Randy Dunlap,
	Davidlohr Bueso, Christoffer Dall, Marc Zyngier, Kai-Heng Feng,
	Konrad Rzeszutek Wilk, David Rientjes, sparclinux, linuxppc-dev

On Wed, Feb 27, 2019 at 08:05:13AM -0800, Ricardo Neri wrote:
> CPU architectures that have an NMI watchdog use arch_touch_nmi_watchdog()
> to briefly ignore the hardlockup detector. If the architecture does not
> have an NMI watchdog, one can be constructed using a source of non-
> maskable interrupts. In this case, arch_touch_nmi_watchdog() is common
> to any underlying hardware resource used to drive the detector and needs
> to be available to other kernel subsystems if hardware different from perf
> drives the detector.
> 
> There exists perf-based and HPET-based implementations. Make it available
> to the latter.
> 
> For clarity, wrap this function in a separate preprocessor conditional
> from functions which are truly specific to the perf-based implementation.
> 
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Andi Kleen <andi.kleen@intel.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Babu Moger <babu.moger@oracle.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Philippe Ombredanne <pombredanne@nexb.com>
> Cc: Colin Ian King <colin.king@canonical.com>
> Cc: Byungchul Park <byungchul.park@lge.com>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> Cc: Waiman Long <longman@redhat.com>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Christoffer Dall <cdall@linaro.org>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
> Cc: x86@kernel.org
> Cc: sparclinux@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  include/linux/nmi.h | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 5a8b19749769..bf5ebcfdd590 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -94,8 +94,16 @@ static inline void hardlockup_detector_disable(void) {}
>  # define NMI_WATCHDOG_SYSCTL_PERM	0444
>  #endif
> 
> -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) || \
> +    defined(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET)

Why not instead make CONFIG_X86_HARDLOCKUP_DETECTOR_HPET select
CONFIG_HARDLOCKUP_DETECTOR_PERF?  Keep the arch-specific details
in the arch-specific files and all that.

							Thanx, Paul

>  extern void arch_touch_nmi_watchdog(void);
> +#else
> +# if !defined(CONFIG_HAVE_NMI_WATCHDOG)
> +static inline void arch_touch_nmi_watchdog(void) {}
> +# endif
> +#endif
> +
> +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
>  extern void hardlockup_detector_perf_stop(void);
>  extern void hardlockup_detector_perf_restart(void);
>  extern void hardlockup_detector_perf_disable(void);
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 09/14] watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based implementation
  2019-02-27 16:17   ` Paul E. McKenney
@ 2019-03-01  1:17     ` Ricardo Neri
  2019-03-26 21:20       ` Thomas Gleixner
  0 siblings, 1 reply; 49+ messages in thread
From: Ricardo Neri @ 2019-03-01  1:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel,
	Ricardo Neri, H. Peter Anvin, Tony Luck, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	David S. Miller, Benjamin Herrenschmidt, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Luis R. Rodriguez, Waiman Long, Josh Poimboeuf, Randy Dunlap,
	Davidlohr Bueso, Christoffer Dall, Marc Zyngier, Kai-Heng Feng,
	Konrad Rzeszutek Wilk, David Rientjes, sparclinux, linuxppc-dev

On Wed, Feb 27, 2019 at 08:17:58AM -0800, Paul E. McKenney wrote:
> On Wed, Feb 27, 2019 at 08:05:13AM -0800, Ricardo Neri wrote:
> > CPU architectures that have an NMI watchdog use arch_touch_nmi_watchdog()
> > to briefly ignore the hardlockup detector. If the architecture does not
> > have an NMI watchdog, one can be constructed using a source of non-
> > maskable interrupts. In this case, arch_touch_nmi_watchdog() is common
> > to any underlying hardware resource used to drive the detector and needs
> > to be available to other kernel subsystems if hardware different from perf
> > drives the detector.
> > 
> > There exists perf-based and HPET-based implementations. Make it available
> > to the latter.
> > 
> > For clarity, wrap this function in a separate preprocessor conditional
> > from functions which are truly specific to the perf-based implementation.
> > 
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Ashok Raj <ashok.raj@intel.com>
> > Cc: Andi Kleen <andi.kleen@intel.com>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > Cc: Don Zickus <dzickus@redhat.com>
> > Cc: Nicholas Piggin <npiggin@gmail.com>
> > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Babu Moger <babu.moger@oracle.com>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Philippe Ombredanne <pombredanne@nexb.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Byungchul Park <byungchul.park@lge.com>
> > Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> > Cc: Waiman Long <longman@redhat.com>
> > Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: Davidlohr Bueso <dave@stgolabs.net>
> > Cc: Christoffer Dall <cdall@linaro.org>
> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Cc: sparclinux@vger.kernel.org
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  include/linux/nmi.h | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> > index 5a8b19749769..bf5ebcfdd590 100644
> > --- a/include/linux/nmi.h
> > +++ b/include/linux/nmi.h
> > @@ -94,8 +94,16 @@ static inline void hardlockup_detector_disable(void) {}
> >  # define NMI_WATCHDOG_SYSCTL_PERM	0444
> >  #endif
> > 
> > -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> > +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) || \
> > +    defined(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET)
> 
> Why not instead make CONFIG_X86_HARDLOCKUP_DETECTOR_HPET select
> CONFIG_HARDLOCKUP_DETECTOR_PERF?  Keep the arch-specific details
> in the arch-specific files and all that.

Thanks for your feedback, Paul! The HPET implementation does not use
perf. Thus, in my opinion is not correct for the HPET HLD to select
the perf implementation. Patch 8 of this series splits the perf-specific
code and the generic hardlockup detector code. Does this make sense?

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
  2019-02-27 16:05 ` [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector Ricardo Neri
@ 2019-03-26 20:49   ` Thomas Gleixner
  2019-04-09  2:02     ` Ricardo Neri
  2019-04-09 10:59     ` Peter Zijlstra
  2019-04-05 16:12   ` Suthikulpanit, Suravee
  2019-04-09 11:03   ` Peter Zijlstra
  2 siblings, 2 replies; 49+ messages in thread
From: Thomas Gleixner @ 2019-03-26 20:49 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, Mimi Zohar,
	Jan Kiszka, Nick Desaulniers, Masahiro Yamada, Nayna Jain

On Wed, 27 Feb 2019, Ricardo Neri wrote:
> +/**
> + * get_count() - Get the current count of the HPET timer
> + *
> + * Returns:
> + *
> + * Value of the main counter of the HPET timer

The extra newline is not required IIRC.

    Returns: Value ......

shoud be sufficient and spares a lot of lines all over the place.

> + */
> +static inline unsigned long get_count(void)
> +{
> +	return hpet_readq(HPET_COUNTER);
> +}
> +
> +/**
> + * set_comparator() - Update the comparator in an HPET timer instance
> + * @hdata:	A data structure with the timer instance to update
> + * @cmp:	The value to write in the in the comparator registere
> + *
> + * Returns:
> + *
> + * None

Returns: None is not required and provides no additional value.

I appreciate that you try to document your functions extensively, but please
apply common sense whether it really provides value. Documenting the
obvious is not necessarily an improvement.

> + */
> +static inline void set_comparator(struct hpet_hld_data *hdata,
> +				  unsigned long cmp)
> +{
> +	hpet_writeq(cmp, HPET_Tn_CMP(hdata->num));
> +}

Also do we really need wrappers plus N lines documentation for
hpet_readq/writeq()?

I fail to see the benefit. It's just increasing the line count for no
reason and requires the reader to lookup functions which are just cosmetic
wrappers.

> +/**
> + * hardlockup_detector_nmi_handler() - NMI Interrupt handler
> + * @val:	Attribute associated with the NMI. Not used.
> + * @regs:	Register values as seen when the NMI was asserted
> + *
> + * When in NMI context, check if it was caused by the expiration of the

When in NMI context? This function _IS_ called in NMI context, right?

> + * HPET timer. If yes, create a CPU mask to issue an IPI to the rest
> + * of monitored CPUs. Upon receiving their own NMI, the other CPUs will
> + * check such mask to determine if they need to also look for lockups.
> + *
> + * Returns:
> + *
> + * NMI_DONE if the HPET timer did not cause the interrupt. NMI_HANDLED
> + * otherwise.
> + */
> +static int hardlockup_detector_nmi_handler(unsigned int val,
> +					   struct pt_regs *regs)
> +{
> +	struct hpet_hld_data *hdata = hld_data;
> +	unsigned int cpu = smp_processor_id();
> +
> +	if (is_hpet_wdt_interrupt(hdata)) {
> +		/* Get ready to check other CPUs for hardlockups. */
> +		cpumask_copy(&hdata->cpu_monitored_mask,
> +			     watchdog_get_allowed_cpumask());
> +		cpumask_clear_cpu(smp_processor_id(),
> +				  &hdata->cpu_monitored_mask);

Why are you copying the mask in NMI context and not updating it when the
watchdog enable/disable functions are called?

Also the mask can contain offline CPUs, which is bad because the send IPI
function expects offline CPUs to be cleared.

Clearing the CPU in the mask is not required because the function you
invoke:

> +		apic->send_IPI_mask_allbutself(&hdata->cpu_monitored_mask,
> +					       NMI_VECTOR);

has it's name for a reason.

> +
> +		kick_timer(hdata, !(hdata->flags & HPET_DEV_PERI_CAP));
> +
> +		inspect_for_hardlockups(regs);
> +
> +		return NMI_HANDLED;
> +	}
> +
> +	if (cpumask_test_and_clear_cpu(cpu, &hdata->cpu_monitored_mask)) {
> +		inspect_for_hardlockups(regs);
> +		return NMI_HANDLED;
> +	}

So if the function above returns false, then why would you try to check
that CPU mask and invoke the inspection? You already cleared that bit
above. This part is pointless and can be removed.

> +void hardlockup_detector_hpet_enable(void)
> +{
> +	struct cpumask *allowed = watchdog_get_allowed_cpumask();
> +	unsigned int cpu = smp_processor_id();

This is called from a function which has already a cpu argument. Why aren't
you handing that in?

> +
> +	if (!hld_data)
> +		return;

Why can hld_data be NULL here?

> +
> +	hld_data->handling_cpu = cpumask_first(allowed);
> +
> +	if (cpu == hld_data->handling_cpu) {
> +		update_handling_cpu(hld_data);
> +		/* Force timer kick when detector is just enabled */
> +		kick_timer(hld_data, true);
> +		enable_timer(hld_data);
> +	}

That logic is wrong. The per cpu enable function is called via:

  softlockup_start_all()
    for_each_cpu(cpu, &watchdog_allowed_mask)
       smp_call_on_cpu(cpu, softlockup_start_fn, NULL, false);

       ---> softlockup_start_fn()
               watchdog_enable()

OR

  lockup_detector_online_cpu()
    watchdog_enable()

The latter is a CPU hotplug operation. The former is invoked when the
watchdog is (re)configured in the core code. Both are mutually exclusive
and guarantee to never invoke the function concurrently on multiple CPUs.

So way you should handle this is:

	cpumask_set_cpu(cpu, hld_data->cpu_monitored_mask);

	if (!hld_data->enabled_cpus++) {
		hld_data->handling_cpu = cpu;
		kick_timer();
		enable_timer();
	}

The cpu mask starts off empty and each CPU sets itself when the function is
invoked on it.

data->enabled_cpus keeps track of the enabled cpus so you avoid
reconfiguration just because a different cpu comes online. And it's
required for disable as well.

> +void hardlockup_detector_hpet_disable(void)
> +{
> +	struct cpumask *allowed = watchdog_get_allowed_cpumask();
> +
> +	if (!hld_data)
> +		return;
> +
> +	/* Only disable the timer if there are no more CPUs to monitor. */
> +	if (!cpumask_weight(allowed))
> +		disable_timer(hld_data);

Again this should be:

	cpumask_clear_cpu(cpu, hld_data->cpu_monitored_mask);
	hld_data->enabled_cpus--;

	if (hld_data->handling_cpu != cpu)
		return;

	disable_timer();
	if (hld_data->enabled_cpus)
		return;

	hld_data->handling_cpu = cpumask_first(hld_data->cpu_monitored_mask);
	enable_timer();

> +}
> +
> +/**
> + * hardlockup_detector_hpet_stop() - Stop the NMI watchdog on all CPUs
> + *
> + * Returns:
> + *
> + * None
> + */
> +void hardlockup_detector_hpet_stop(void)
> +{
> +	disable_timer(hld_data);
> +}

This function is nowhere used.

> +int __init hardlockup_detector_hpet_init(void)
> +{
> +	int ret;
> +
> +	if (!is_hpet_enabled())
> +		return -ENODEV;
> +
> +	if (check_tsc_unstable())

What happens if the TSC becomes unstable _AFTER_ you initialized and
started the HPET watchdog?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 12/14] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI
  2019-02-27 16:05 ` [RFC PATCH v2 12/14] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI Ricardo Neri
@ 2019-03-26 20:55   ` Thomas Gleixner
  2019-04-09  2:02     ` Ricardo Neri
  2019-04-09 11:28   ` Peter Zijlstra
  1 sibling, 1 reply; 49+ messages in thread
From: Thomas Gleixner @ 2019-03-26 20:55 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, Mimi Zohar,
	Jan Kiszka, Nick Desaulniers, Masahiro Yamada, Nayna Jain

On Wed, 27 Feb 2019, Ricardo Neri wrote:
> @@ -62,7 +67,18 @@ static inline void set_comparator(struct hpet_hld_data *hdata,
>  static void kick_timer(struct hpet_hld_data *hdata, bool force)
>  {
>  	bool kick_needed = force || !(hdata->flags & HPET_DEV_PERI_CAP);
> -	unsigned long new_compare, count;
> +	unsigned long tsc_curr, tsc_delta, new_compare, count;
> +
> +	/* Start obtaining the current TSC and HPET counts. */
> +	tsc_curr = rdtsc();
> +
> +	if (kick_needed)
> +		count = get_count();

Can you please keep the TSC code in one block and the HPET block in the
next one? Having this inbetween is really bad to follow.

It really does not matter whether you read the HPET counter before or after
the calculation. This is a crystal ball estimation anyway so a few cyles
more or less are completely irrelevant.

> +	tsc_delta = (unsigned long)watchdog_thresh * (unsigned long)tsc_khz
> +		    * 1000L;
> +	hdata->tsc_next = tsc_curr + tsc_delta;
> +	hdata->tsc_next_error = tsc_delta >> 6;

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 02/14] x86/hpet: Expose more functions to read and write registers
  2019-02-27 16:05 ` [RFC PATCH v2 02/14] x86/hpet: Expose more functions to read and write registers Ricardo Neri
@ 2019-03-26 21:00   ` Thomas Gleixner
  2019-04-09  2:03     ` Ricardo Neri
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Gleixner @ 2019-03-26 21:00 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Philippe Ombredanne, Kate Stewart,
	Rafael J. Wysocki

On Wed, 27 Feb 2019, Ricardo Neri wrote:
>  struct irq_data;
> @@ -109,6 +114,11 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
>  static inline int hpet_enable(void) { return 0; }
>  static inline int is_hpet_enabled(void) { return 0; }
>  #define hpet_readl(a) 0
> +#define hpet_writel(d, a)

What for?

> +#ifdef CONFIG_X86_64
> +#define hpet_readq(a) 0
> +#define hpet_writeq(d, a)
> +#endif

Ditto.

There are no users outside of HPET and your new HPET watchdog code for
those. And both are not compiled when CONFIG_HPET=n.

The only reason to have the hpet_readl() define, which btw. should be an
inline, is to avoid massive ifdeffery in the TSC calibration code.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 03/14] x86/hpet: Calculate ticks-per-second in a separate function
  2019-02-27 16:05 ` [RFC PATCH v2 03/14] x86/hpet: Calculate ticks-per-second in a separate function Ricardo Neri
@ 2019-03-26 21:03   ` Thomas Gleixner
  2019-04-09  2:04     ` Ricardo Neri
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Gleixner @ 2019-03-26 21:03 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki

On Wed, 27 Feb 2019, Ricardo Neri wrote:
>  int hpet_alloc(struct hpet_data *hdp)
>  {
>  	u64 cap, mcfg;
> @@ -845,7 +868,6 @@ int hpet_alloc(struct hpet_data *hdp)
>  	size_t siz;
>  	struct hpet __iomem *hpet;
>  	static struct hpets *last;
> -	unsigned long period;
>  	unsigned long long temp;
>  	u32 remainder;
>  
> @@ -881,6 +903,8 @@ int hpet_alloc(struct hpet_data *hdp)
>  
>  	cap = readq(&hpet->hpet_cap);
>  
> +	temp = hpet_get_ticks_per_sec(cap);

Just putting stuff to random places does not make the code any better.

>  	ntimer = ((cap & HPET_NUM_TIM_CAP_MASK) >> HPET_NUM_TIM_CAP_SHIFT) + 1;
>  
>  	if (hpetp->hp_ntimer != ntimer) {
> @@ -897,11 +921,6 @@ int hpet_alloc(struct hpet_data *hdp)
>  
>  	last = hpetp;
>  
> -	period = (cap & HPET_COUNTER_CLK_PERIOD_MASK) >>
> -		HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
> -	temp = 1000000000000000uLL; /* 10^15 femtoseconds per second */
> -	temp += period >> 1; /* round */
> -	do_div(temp, period);
>  	hpetp->hp_tick_freq = temp; /* ticks per second */

What's wrong with the obvious:

       hpetp->hp_tick_freq = hpet_get_ticks_per_sec(cap);

Hmm?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 05/14] x86/hpet: Relocate flag definitions to a header file
  2019-02-27 16:05 ` [RFC PATCH v2 05/14] x86/hpet: Relocate flag definitions to a header file Ricardo Neri
@ 2019-03-26 21:11   ` Thomas Gleixner
  2019-04-09  2:04     ` Ricardo Neri
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Gleixner @ 2019-03-26 21:11 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki

On Wed, 27 Feb 2019, Ricardo Neri wrote:

> Users of HPET timers (such as the hardlockup detector) need the definitions
> of these flags to interpret the configuration of a timer as passed by
> platform code.

Which platform code?

> +#define HPET_DEV_USED_BIT		2
> +#define HPET_DEV_USED			(1 << HPET_DEV_USED_BIT)
> +#define HPET_DEV_VALID			0x8
> +#define HPET_DEV_FSB_CAP		0x1000
> +#define HPET_DEV_PERI_CAP		0x2000

I'm not seing why you would need any of those in the watchdog code.

The only function related to the watchdog which needs these is
hpet_hardlockup_detector_assign_timer() and that is located in hpet.c
itself.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 06/14] x86/hpet: Configure the timer used by the hardlockup detector
  2019-02-27 16:05 ` [RFC PATCH v2 06/14] x86/hpet: Configure the timer used by the hardlockup detector Ricardo Neri
@ 2019-03-26 21:13   ` Thomas Gleixner
  2019-04-09  2:04     ` Ricardo Neri
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Gleixner @ 2019-03-26 21:13 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki

On Wed, 27 Feb 2019, Ricardo Neri wrote:
> +#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR_HPET
> +struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
> +{
> +	struct hpet_hld_data *hdata;
> +	unsigned int cfg;
> +
> +	cfg = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER_NR));
> +
> +	if (!(cfg & HPET_TN_FSB_CAP))
> +		return NULL;
> +
> +	hdata = kzalloc(sizeof(*hdata), GFP_KERNEL);
> +	if (!hdata)
> +		return NULL;
> +
> +	hdata->flags = HPET_DEV_FSB_CAP;

Pointless.

> +
> +	if (cfg & HPET_TN_PERIODIC_CAP)
> +		hdata->flags |= HPET_DEV_PERI_CAP;

This can be expressed by a simple:

     hdata->has_periodic = 1;

And no flag shuffling required at all.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 08/14] watchdog/hardlockup: Decouple the hardlockup detector from perf
  2019-02-27 16:05 ` [RFC PATCH v2 08/14] watchdog/hardlockup: Decouple the hardlockup detector from perf Ricardo Neri
@ 2019-03-26 21:18   ` Thomas Gleixner
  2019-04-09  2:05     ` Ricardo Neri
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Gleixner @ 2019-03-26 21:18 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Don Zickus, Nicholas Piggin,
	Michael Ellerman, Frederic Weisbecker, Babu Moger,
	David S. Miller, Benjamin Herrenschmidt, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Luis R. Rodriguez,
	sparclinux, linuxppc-dev

On Wed, 27 Feb 2019, Ricardo Neri wrote:
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Detect hard lockups on a system
> + *
> + * Copyright (C) Intel Corporation 2019
> + *
> + * Note: All of this code comes from the original perf-specific hardlockup
> + * detector.

And because you copied it from there without modification this becomes
automatically Copyrighted by Intel?

May I recommend that you talk to your legal departement about this.

Aside of that there is really no value in creating yet another file. The
code in question can simply wrapped into a large #ifdef PERF in the old
file.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 09/14] watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based implementation
  2019-03-01  1:17     ` Ricardo Neri
@ 2019-03-26 21:20       ` Thomas Gleixner
  2019-04-09  2:05         ` Ricardo Neri
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Gleixner @ 2019-03-26 21:20 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Paul E. McKenney, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel,
	Ricardo Neri, H. Peter Anvin, Tony Luck, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	David S. Miller, Benjamin Herrenschmidt, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Luis R. Rodriguez, Waiman Long, Josh Poimboeuf, Randy Dunlap,
	Davidlohr Bueso, Christoffer Dall, Marc Zyngier, Kai-Heng Feng,
	Konrad Rzeszutek Wilk, David Rientjes, sparclinux, linuxppc-dev

On Thu, 28 Feb 2019, Ricardo Neri wrote:
> > > 
> > > -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> > > +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) || \
> > > +    defined(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET)
> > 
> > Why not instead make CONFIG_X86_HARDLOCKUP_DETECTOR_HPET select
> > CONFIG_HARDLOCKUP_DETECTOR_PERF?  Keep the arch-specific details
> > in the arch-specific files and all that.
> 
> Thanks for your feedback, Paul! The HPET implementation does not use
> perf. Thus, in my opinion is not correct for the HPET HLD to select
> the perf implementation. Patch 8 of this series splits the perf-specific
> code and the generic hardlockup detector code. Does this make sense?

That's what intermediate config symbols are for.

config HARDLOCKUP_DETECTOR_CORE
       bool

And make both PERF and HPET select it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 10/14] kernel/watchdog: Add a function to obtain the watchdog_allowed_mask
  2019-02-27 16:05 ` [RFC PATCH v2 10/14] kernel/watchdog: Add a function to obtain the watchdog_allowed_mask Ricardo Neri
@ 2019-03-26 21:22   ` Thomas Gleixner
  2019-04-09  2:05     ` Ricardo Neri
  2019-04-09 11:34   ` Peter Zijlstra
  1 sibling, 1 reply; 49+ messages in thread
From: Thomas Gleixner @ 2019-03-26 21:22 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes,
	David S. Miller, Benjamin Herrenschmidt, sparclinux,
	linuxppc-dev

On Wed, 27 Feb 2019, Ricardo Neri wrote:
>  
> -struct cpumask watchdog_allowed_mask __read_mostly;
> +static struct cpumask watchdog_allowed_mask __read_mostly;

That hunk is correct.

>  struct cpumask watchdog_cpumask __read_mostly;
>  unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
> @@ -92,6 +92,11 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>  }
>  __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>  # endif /* CONFIG_SMP */
> +
> +struct cpumask *watchdog_get_allowed_cpumask(void)
> +{
> +	return &watchdog_allowed_mask;
> +}

That part is pointless as I showed you in the other reply. You don't need
that mask in your code at all.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 13/14] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter
  2019-02-27 16:05 ` [RFC PATCH v2 13/14] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter Ricardo Neri
@ 2019-03-26 21:29   ` Thomas Gleixner
  2019-04-09  2:07     ` Ricardo Neri
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Gleixner @ 2019-03-26 21:29 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, Mimi Zohar,
	Jan Kiszka, Nick Desaulniers, Masahiro Yamada, Nayna Jain

On Wed, 27 Feb 2019, Ricardo Neri wrote:
> +			When hpet is specified, the NMI watchdog will be driven
> +			by an HPET timer, if available in the system. Otherwise,
> +			the perf-based implementation will be used. Specifying
> +			hpet implies that nmi_watchdog is on.

How so?

> +static int __init hardlockup_detector_hpet_setup(char *str)
> +{
> +	if (strstr(str, "hpet"))
> +		hardlockup_use_hpet = true;

strstr()? Not really.

> +
> +	return 0;
> +}
> +__setup("nmi_watchdog=", hardlockup_detector_hpet_setup);
> +
>  /**
>   * hardlockup_detector_hpet_init() - Initialize the hardlockup detector
>   *
> @@ -405,6 +422,9 @@ int __init hardlockup_detector_hpet_init(void)
>  {
>  	int ret;
>  
> +	if (!hardlockup_use_hpet)
> +		return -ENODEV;

This should have been there in the patch which introduces
hardlockup_detector_hpet_init(). And this patch merily adds the command
line magic which sets that flag.

> +
>  	if (!is_hpet_enabled())
>  		return -ENODEV;
>  
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 367aa81294ef..28cad7310378 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -78,7 +78,7 @@ static int __init hardlockup_panic_setup(char *str)
>  		nmi_watchdog_user_enabled = 0;
>  	else if (!strncmp(str, "1", 1))
>  		nmi_watchdog_user_enabled = 1;
> -	return 1;
> +	return 0;

Why?

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
  2019-02-27 16:05 ` [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector Ricardo Neri
  2019-03-26 20:49   ` Thomas Gleixner
@ 2019-04-05 16:12   ` Suthikulpanit, Suravee
  2019-04-09  2:14     ` Ricardo Neri
  2019-04-09 11:03   ` Peter Zijlstra
  2 siblings, 1 reply; 49+ messages in thread
From: Suthikulpanit, Suravee @ 2019-04-05 16:12 UTC (permalink / raw)
  To: Ricardo Neri, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Suthikulpanit, Suravee
  Cc: Ashok Raj, Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86,
	linux-kernel, Ricardo Neri, H. Peter Anvin, Tony Luck,
	Clemens Ladisch, Arnd Bergmann, Philippe Ombredanne,
	Kate Stewart, Rafael J. Wysocki, Mimi Zohar, Jan Kiszka,
	Nick Desaulniers, Masahiro Yamada, Nayna Jain, Lendacky, Thomas,
	Grimm, Jon

Hi Neri,

While trying out this patch series, I found that it does not work when the HPET timer
is in periodic mode.

On 2/27/19 11:05 PM, Ricardo Neri wrote:
> ......
> diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c
> new file mode 100644
> index 000000000000..cfa284da4bf6
> --- /dev/null
> +++ b/arch/x86/kernel/watchdog_hld_hpet.c
> @@ -0,0 +1,405 @@
> .....
> +
> +/**
> + * set_comparator() - Update the comparator in an HPET timer instance
> + * @hdata:	A data structure with the timer instance to update
> + * @cmp:	The value to write in the in the comparator registere
> + *
> + * Returns:
> + *
> + * None
> + */
> +static inline void set_comparator(struct hpet_hld_data *hdata,
> +				  unsigned long cmp)
> +{
> +	hpet_writeq(cmp, HPET_Tn_CMP(hdata->num));
> +}
> +
> +/**
> + * kick_timer() - Reprogram timer to expire in the future
> + * @hdata:	A data structure with the timer instance to update
> + * @force:	Force reprogram. Useful enabling or re-enabling detector.
> + *
> + * Reprogram the timer to expire within watchdog_thresh seconds in the future.
> + *
> + * Returns:
> + *
> + * None
> + */
> +static void kick_timer(struct hpet_hld_data *hdata, bool force)
> +{
> +	bool kick_needed = force || !(hdata->flags & HPET_DEV_PERI_CAP);
> +	unsigned long new_compare, count;
> +
> +	/*
> +	 * Update the comparator in increments of watch_thresh seconds relative
> +	 * to the current count. Since watch_thresh is given in seconds, we
> +	 * are able to update the comparator before the counter reaches such new
> +	 * value.
> +	 *
> +	 * Let it wrap around if needed.
> +	 */
> +
> +	if (kick_needed) {
> +		count = get_count();
> +
> +		new_compare = count + watchdog_thresh * hdata->ticks_per_second;
> +
> +		set_comparator(hdata, new_compare);
> +	}
> +}

It turns out that the set_comparator() does not seem to support periodic mode,
in which it should have been setting the accumulator/period via the comparator register.

Instead, what if we re-factor the code in the arch/x86/kernel/hpet.c:hpet_set_periodic()
to hpet_set_comparator(). Then we can also reuse it in the arch/x86/kernel/watchdog_hld_pet.c:
kick_timer() as shown below.

With these changes, I can test the series on AMD systems, and see all cores
in /proc/interrupts are receiving NMI interrupts.

Regards,
Suravee


diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 0976334..f30957f 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -115,6 +115,7 @@ extern int hpet_rtc_timer_init(void);
  extern irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id);
  extern int hpet_register_irq_handler(rtc_irq_handler handler);
  extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
+extern void hpet_set_comparator(int num, unsigned int cmp, unsigned int period);

  #endif /* CONFIG_HPET_EMULATE_RTC */

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 03b05dae..f3b2351 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -328,6 +328,41 @@ static void hpet_legacy_clockevent_register(void)
  	printk(KERN_DEBUG "hpet clockevent registered\n");
  }

+/*
+ * hpet_set_comparator() - Helper function for setting comparator register
+ * @num:	The timer ID
+ * @cmp:	The value to be written to the comparator/accumulator
+ * @period:	The value to be written to the period (0 = oneshot mode)
+ *
+ * Helper function for updating comparator, accumulator and period values.
+ *
+ * In periodic mode, HPET needs HPET_TN_SETVAL to be set before writing
+ * to the Tn_CMP to update the accumulator. Then, HPET needs a second
+ * write (with HPET_TN_SETVAL cleared) to Tn_CMP to set the period.
+ * The HPET_TN_SETVAL bit is automatically cleared after the first write.
+ *
+ * For one-shot mode, HPET_TN_SETVAL does not need to be set.
+ *
+ * See the following documents:
+ *   - Intel IA-PC HPET (High Precision Event Timers) Specification
+ *   - AMD-8111 HyperTransport I/O Hub Data Sheet, Publication # 24674
+ */
+void hpet_set_comparator(int num, unsigned int cmp, unsigned int period)
+{
+	if (period) {
+		unsigned int v = hpet_readl(HPET_Tn_CFG(num));
+		hpet_writel(v | HPET_TN_SETVAL, HPET_Tn_CFG(num));
+	}
+
+	hpet_writel(cmp, HPET_Tn_CMP(num));
+	if (!period)
+		return;
+
+	udelay(1);
+	hpet_writel(period, HPET_Tn_CMP(num));
+}
+EXPORT_SYMBOL_GPL(hpet_set_comparator);
+
  static int hpet_set_periodic(struct clock_event_device *evt, int timer)
  {
  	unsigned int cfg, cmp, now;
@@ -339,19 +374,11 @@ static int hpet_set_periodic(struct clock_event_device *evt, int timer)
  	now = hpet_readl(HPET_COUNTER);
  	cmp = now + (unsigned int)delta;
  	cfg = hpet_readl(HPET_Tn_CFG(timer));
-	cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
-	       HPET_TN_32BIT;
+	cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_32BIT;
  	hpet_writel(cfg, HPET_Tn_CFG(timer));
-	hpet_writel(cmp, HPET_Tn_CMP(timer));
-	udelay(1);
-	/*
-	 * HPET on AMD 81xx needs a second write (with HPET_TN_SETVAL
-	 * cleared) to T0_CMP to set the period. The HPET_TN_SETVAL
-	 * bit is automatically cleared after the first write.
-	 * (See AMD-8111 HyperTransport I/O Hub Data Sheet,
-	 * Publication # 24674)
-	 */
-	hpet_writel((unsigned int)delta, HPET_Tn_CMP(timer));
+
+	hpet_set_comparator(timer, cmp, (unsigned int)delta);
+
  	hpet_start_counter();
  	hpet_print_config();

diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c
index 4402def..de33c50 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -35,21 +35,6 @@ static inline unsigned long get_count(void)
  }

  /**
- * set_comparator() - Update the comparator in an HPET timer instance
- * @hdata:	A data structure with the timer instance to update
- * @cmp:	The value to write in the in the comparator registere
- *
- * Returns:
- *
- * None
- */
-static inline void set_comparator(struct hpet_hld_data *hdata,
-				  unsigned long cmp)
-{
-	hpet_writeq(cmp, HPET_Tn_CMP(hdata->num));
-}
-
-/**
   * kick_timer() - Reprogram timer to expire in the future
   * @hdata:	A data structure with the timer instance to update
   * @force:	Force reprogram. Useful enabling or re-enabling detector.
@@ -68,7 +53,7 @@ static inline void set_comparator(struct hpet_hld_data *hdata,
  static void kick_timer(struct hpet_hld_data *hdata, bool force)
  {
  	bool kick_needed = force || !(hdata->flags & HPET_DEV_PERI_CAP);
-	unsigned long tsc_curr, tsc_delta, new_compare, count;
+	unsigned long tsc_curr, tsc_delta, new_compare, count, period = 0;

  	/* Start obtaining the current TSC and HPET counts. */
  	tsc_curr = rdtsc();
@@ -81,6 +66,9 @@ static void kick_timer(struct hpet_hld_data *hdata, bool force)
  	hdata->tsc_next = tsc_curr + tsc_delta;
  	hdata->tsc_next_error = tsc_delta >> 6;

+	if (!kick_needed)
+		return;
+
  	/*
  	 * Update the comparator in increments of watch_thresh seconds relative
  	 * to the current count. Since watch_thresh is given in seconds, we
@@ -89,12 +77,11 @@ static void kick_timer(struct hpet_hld_data *hdata, bool force)
  	 *
  	 * Let it wrap around if needed.
  	 */
+	if (hdata->flags & HPET_DEV_PERI_CAP)
+		period = watchdog_thresh * hdata->ticks_per_second;

-	if (kick_needed) {
-		new_compare = count + watchdog_thresh * hdata->ticks_per_second;
-
-		set_comparator(hdata, new_compare);
-	}
+	new_compare = count + period;
+	hpet_set_comparator(hdata->num, new_compare, period);
  }

  /**
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
  2019-03-26 20:49   ` Thomas Gleixner
@ 2019-04-09  2:02     ` Ricardo Neri
  2019-04-09 10:59     ` Peter Zijlstra
  1 sibling, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-09  2:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, Mimi Zohar,
	Jan Kiszka, Nick Desaulniers, Masahiro Yamada, Nayna Jain

On Tue, Mar 26, 2019 at 09:49:13PM +0100, Thomas Gleixner wrote:
> On Wed, 27 Feb 2019, Ricardo Neri wrote:
> > +/**
> > + * get_count() - Get the current count of the HPET timer
> > + *
> > + * Returns:
> > + *
> > + * Value of the main counter of the HPET timer
> 
> The extra newline is not required IIRC.
> 
>     Returns: Value ......
> 
> shoud be sufficient and spares a lot of lines all over the place.
> 

Many thanks for your review!

I checked the documentation again and the extra line is not needed. I'll
remove it.

> > + */
> > +static inline unsigned long get_count(void)
> > +{
> > +	return hpet_readq(HPET_COUNTER);
> > +}
> > +
> > +/**
> > + * set_comparator() - Update the comparator in an HPET timer instance
> > + * @hdata:	A data structure with the timer instance to update
> > + * @cmp:	The value to write in the in the comparator registere
> > + *
> > + * Returns:
> > + *
> > + * None
> 
> Returns: None is not required and provides no additional value.
> 
> I appreciate that you try to document your functions extensively, but please
> apply common sense whether it really provides value. Documenting the
> obvious is not necessarily an improvement.

Sure, I'll limit function documentation to places where it makes sense.

> 
> > + */
> > +static inline void set_comparator(struct hpet_hld_data *hdata,
> > +				  unsigned long cmp)
> > +{
> > +	hpet_writeq(cmp, HPET_Tn_CMP(hdata->num));
> > +}
> 
> Also do we really need wrappers plus N lines documentation for
> hpet_readq/writeq()?
> 
> I fail to see the benefit. It's just increasing the line count for no
> reason and requires the reader to lookup functions which are just cosmetic
> wrappers.

I'll remove the wrapper for set_comparator() and get_count() as well as
their documentation. These are the simplest wrappers. Is this
acceptable?

> 
> > +/**
> > + * hardlockup_detector_nmi_handler() - NMI Interrupt handler
> > + * @val:	Attribute associated with the NMI. Not used.
> > + * @regs:	Register values as seen when the NMI was asserted
> > + *
> > + * When in NMI context, check if it was caused by the expiration of the
> 
> When in NMI context? This function _IS_ called in NMI context, right?
> 

Yes, this wording is incorrect. I'll remove the "When in NMI context"
part.

> > + * HPET timer. If yes, create a CPU mask to issue an IPI to the rest
> > + * of monitored CPUs. Upon receiving their own NMI, the other CPUs will
> > + * check such mask to determine if they need to also look for lockups.
> > + *
> > + * Returns:
> > + *
> > + * NMI_DONE if the HPET timer did not cause the interrupt. NMI_HANDLED
> > + * otherwise.
> > + */
> > +static int hardlockup_detector_nmi_handler(unsigned int val,
> > +					   struct pt_regs *regs)
> > +{
> > +	struct hpet_hld_data *hdata = hld_data;
> > +	unsigned int cpu = smp_processor_id();
> > +
> > +	if (is_hpet_wdt_interrupt(hdata)) {
> > +		/* Get ready to check other CPUs for hardlockups. */
> > +		cpumask_copy(&hdata->cpu_monitored_mask,
> > +			     watchdog_get_allowed_cpumask());
> > +		cpumask_clear_cpu(smp_processor_id(),
> > +				  &hdata->cpu_monitored_mask);
> 
> Why are you copying the mask in NMI context and not updating it when the
> watchdog enable/disable functions are called?

I think I need two CPU masks for this implementation. A mask is used to
track what CPUs need to be monitored at any given time; I use
watchdog_allowed_mask from kernel/watchdog.c for that.

Separately, I need to prepare a CPU mask to determine which CPUs should be
targeted in the IPI; I use hdata->cpu_monitored_mask for this. This mask will
remain clear most of the time. As soon as CPUs receive their own IPI,
they must clear themselves from this mask. This to prevent them from
incorrectly handling unrelated NMIs; as would be the case if it relied on
watchdog_allowed_mask.

> 
> Also the mask can contain offline CPUs, which is bad because the send IPI
> function expects offline CPUs to be cleared.
> 
> Clearing the CPU in the mask is not required because the function you
> invoke:
> 
> > +		apic->send_IPI_mask_allbutself(&hdata->cpu_monitored_mask,
> > +					       NMI_VECTOR);
> 
> has it's name for a reason.
> 

I think it still needs to be cleared. While it may not receive an IPI
from this call, an unrelated NMI may happen and it will incorrectly invoke
inspection as it found its bit in the CPU mask set.

> > +
> > +		kick_timer(hdata, !(hdata->flags & HPET_DEV_PERI_CAP));
> > +
> > +		inspect_for_hardlockups(regs);
> > +
> > +		return NMI_HANDLED;
> > +	}
> > +
> > +	if (cpumask_test_and_clear_cpu(cpu, &hdata->cpu_monitored_mask)) {
> > +		inspect_for_hardlockups(regs);
> > +		return NMI_HANDLED;
> > +	}
> 
> So if the function above returns false, then why would you try to check
> that CPU mask and invoke the inspection? You already cleared that bit
> above. This part is pointless and can be removed.
> 

Do you mean is_hpet_wdt_interrupt()? Such function only returns true if it
determines that the HPET caused an NMI in the handling_cpu. The goal is that
the first block only handles an NMI caused by the HPET and the second block only
handles the NMI IPI issued by send_IPI_mask_allbutself() in the first
block.

Also, using IPIs may not be the best approach if we have to send them across
packages as they can be rather costly. Instead, I plan to go back to my initial
proposal of targeting each online CPU separately in a round robin fashion
instead of having a single CPU handling the HPET interrupt and issuing
IPIs.

> > +void hardlockup_detector_hpet_enable(void)
> > +{
> > +	struct cpumask *allowed = watchdog_get_allowed_cpumask();
> > +	unsigned int cpu = smp_processor_id();
> 
> This is called from a function which has already a cpu argument. Why aren't
> you handing that in?
> 

But it could also be called from watchdog_nmi_stop(), which does not
take a cpu as argument. Perhaps watchdog_nmi_stop() does not need to be
implemented as in addition to it watchdog_nmi_disable() is called on all
CPUs when stopping or reconfiguring the watchdog.

> > +
> > +	if (!hld_data)
> > +		return;
> 
> Why can hld_data be NULL here?

It cannot. It could only be null if, for instance HPET was disabled or
not available. These cases cause hardlockup_detector_hpet_init() to
fail and the NMI watchdog falls back to the perf implementation.

I'll remove this check.

> 
> > +
> > +	hld_data->handling_cpu = cpumask_first(allowed);
> > +
> > +	if (cpu == hld_data->handling_cpu) {
> > +		update_handling_cpu(hld_data);
> > +		/* Force timer kick when detector is just enabled */
> > +		kick_timer(hld_data, true);
> > +		enable_timer(hld_data);
> > +	}
> 
> That logic is wrong. The per cpu enable function is called via:
> 
>   softlockup_start_all()
>     for_each_cpu(cpu, &watchdog_allowed_mask)
>        smp_call_on_cpu(cpu, softlockup_start_fn, NULL, false);
> 
>        ---> softlockup_start_fn()
>                watchdog_enable()
> 
> OR
> 
>   lockup_detector_online_cpu()
>     watchdog_enable()
> 
> The latter is a CPU hotplug operation. The former is invoked when the
> watchdog is (re)configured in the core code. Both are mutually exclusive
> and guarantee to never invoke the function concurrently on multiple CPUs.
> 
> So way you should handle this is:
> 
> 	cpumask_set_cpu(cpu, hld_data->cpu_monitored_mask);
> 
> 	if (!hld_data->enabled_cpus++) {
> 		hld_data->handling_cpu = cpu;
> 		kick_timer();
> 		enable_timer();
> 	}
> 
> The cpu mask starts off empty and each CPU sets itself when the function is
> invoked on it.
> 
> data->enabled_cpus keeps track of the enabled cpus so you avoid
> reconfiguration just because a different cpu comes online. And it's
> required for disable as well.
> 
> > +void hardlockup_detector_hpet_disable(void)
> > +{
> > +	struct cpumask *allowed = watchdog_get_allowed_cpumask();
> > +
> > +	if (!hld_data)
> > +		return;
> > +
> > +	/* Only disable the timer if there are no more CPUs to monitor. */
> > +	if (!cpumask_weight(allowed))
> > +		disable_timer(hld_data);
> 
> Again this should be:
> 
> 	cpumask_clear_cpu(cpu, hld_data->cpu_monitored_mask);
> 	hld_data->enabled_cpus--;
> 
> 	if (hld_data->handling_cpu != cpu)
> 		return;
> 
> 	disable_timer();
> 	if (hld_data->enabled_cpus)
> 		return;
> 
> 	hld_data->handling_cpu = cpumask_first(hld_data->cpu_monitored_mask);
> 	enable_timer();
>

Sure. I will update the impelementation with this suggestion in both
hardlockup_detector_hpet_disable() and
hardlockup_detector_hpet_enable().

> > +}
> > +
> > +/**
> > + * hardlockup_detector_hpet_stop() - Stop the NMI watchdog on all CPUs
> > + *
> > + * Returns:
> > + *
> > + * None
> > + */
> > +void hardlockup_detector_hpet_stop(void)
> > +{
> > +	disable_timer(hld_data);
> > +}
> 
> This function is nowhere used.
> 

Actually, thanks to this comment I found a bug in my patches. This
function should be called from watchdog_nmi_stop().

> > +int __init hardlockup_detector_hpet_init(void)
> > +{
> > +	int ret;
> > +
> > +	if (!is_hpet_enabled())
> > +		return -ENODEV;
> > +
> > +	if (check_tsc_unstable())
> 
> What happens if the TSC becomes unstable _AFTER_ you initialized and
> started the HPET watchdog?

I had not considered such case. Probably it will need to be disabled and
switch to the perf implementation?

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 12/14] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI
  2019-03-26 20:55   ` Thomas Gleixner
@ 2019-04-09  2:02     ` Ricardo Neri
  0 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-09  2:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, Mimi Zohar,
	Jan Kiszka, Nick Desaulniers, Masahiro Yamada, Nayna Jain

On Tue, Mar 26, 2019 at 09:55:35PM +0100, Thomas Gleixner wrote:
> On Wed, 27 Feb 2019, Ricardo Neri wrote:
> > @@ -62,7 +67,18 @@ static inline void set_comparator(struct hpet_hld_data *hdata,
> >  static void kick_timer(struct hpet_hld_data *hdata, bool force)
> >  {
> >  	bool kick_needed = force || !(hdata->flags & HPET_DEV_PERI_CAP);
> > -	unsigned long new_compare, count;
> > +	unsigned long tsc_curr, tsc_delta, new_compare, count;
> > +
> > +	/* Start obtaining the current TSC and HPET counts. */
> > +	tsc_curr = rdtsc();
> > +
> > +	if (kick_needed)
> > +		count = get_count();
> 
> Can you please keep the TSC code in one block and the HPET block in the
> next one? Having this inbetween is really bad to follow.
> 
> It really does not matter whether you read the HPET counter before or after
> the calculation. This is a crystal ball estimation anyway so a few cyles
> more or less are completely irrelevant.

Sure! I will implement this change.

Thanks and BR,
Ricardo

> 
> > +	tsc_delta = (unsigned long)watchdog_thresh * (unsigned long)tsc_khz
> > +		    * 1000L;
> > +	hdata->tsc_next = tsc_curr + tsc_delta;
> > +	hdata->tsc_next_error = tsc_delta >> 6;
> 
> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 02/14] x86/hpet: Expose more functions to read and write registers
  2019-03-26 21:00   ` Thomas Gleixner
@ 2019-04-09  2:03     ` Ricardo Neri
  0 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-09  2:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Philippe Ombredanne, Kate Stewart,
	Rafael J. Wysocki

On Tue, Mar 26, 2019 at 10:00:24PM +0100, Thomas Gleixner wrote:
> On Wed, 27 Feb 2019, Ricardo Neri wrote:
> >  struct irq_data;
> > @@ -109,6 +114,11 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
> >  static inline int hpet_enable(void) { return 0; }
> >  static inline int is_hpet_enabled(void) { return 0; }
> >  #define hpet_readl(a) 0
> > +#define hpet_writel(d, a)
> 
> What for?
> 
> > +#ifdef CONFIG_X86_64
> > +#define hpet_readq(a) 0
> > +#define hpet_writeq(d, a)
> > +#endif
> 
> Ditto.
> 
> There are no users outside of HPET and your new HPET watchdog code for
> those. And both are not compiled when CONFIG_HPET=n.

I'll remove these unneeded defintions.
> 
> The only reason to have the hpet_readl() define, which btw. should be an
> inline, is to avoid massive ifdeffery in the TSC calibration code.

May I ask what is the problem with the #define hpet_readl()? Commit
bfc0f5947afa("x86: merge tsc calibration") changed it from inline to
#define. Should I change it back?

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 03/14] x86/hpet: Calculate ticks-per-second in a separate function
  2019-03-26 21:03   ` Thomas Gleixner
@ 2019-04-09  2:04     ` Ricardo Neri
  0 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-09  2:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki

On Tue, Mar 26, 2019 at 10:03:02PM +0100, Thomas Gleixner wrote:
> On Wed, 27 Feb 2019, Ricardo Neri wrote:
> >  int hpet_alloc(struct hpet_data *hdp)
> >  {
> >  	u64 cap, mcfg;
> > @@ -845,7 +868,6 @@ int hpet_alloc(struct hpet_data *hdp)
> >  	size_t siz;
> >  	struct hpet __iomem *hpet;
> >  	static struct hpets *last;
> > -	unsigned long period;
> >  	unsigned long long temp;
> >  	u32 remainder;
> >  
> > @@ -881,6 +903,8 @@ int hpet_alloc(struct hpet_data *hdp)
> >  
> >  	cap = readq(&hpet->hpet_cap);
> >  
> > +	temp = hpet_get_ticks_per_sec(cap);
> 
> Just putting stuff to random places does not make the code any better.

This seems to not be needed. I'll remove it and directly save the result
in hpetp->hp_tick_freq;

> 
> >  	ntimer = ((cap & HPET_NUM_TIM_CAP_MASK) >> HPET_NUM_TIM_CAP_SHIFT) + 1;
> >  
> >  	if (hpetp->hp_ntimer != ntimer) {
> > @@ -897,11 +921,6 @@ int hpet_alloc(struct hpet_data *hdp)
> >  
> >  	last = hpetp;
> >  
> > -	period = (cap & HPET_COUNTER_CLK_PERIOD_MASK) >>
> > -		HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
> > -	temp = 1000000000000000uLL; /* 10^15 femtoseconds per second */
> > -	temp += period >> 1; /* round */
> > -	do_div(temp, period);
> >  	hpetp->hp_tick_freq = temp; /* ticks per second */
> 
> What's wrong with the obvious:
> 
>        hpetp->hp_tick_freq = hpet_get_ticks_per_sec(cap);
> 
> Hmm?

Nothing wrong. I'll implement this change.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 05/14] x86/hpet: Relocate flag definitions to a header file
  2019-03-26 21:11   ` Thomas Gleixner
@ 2019-04-09  2:04     ` Ricardo Neri
  0 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-09  2:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki

On Tue, Mar 26, 2019 at 10:11:16PM +0100, Thomas Gleixner wrote:
> On Wed, 27 Feb 2019, Ricardo Neri wrote:
> 
> > Users of HPET timers (such as the hardlockup detector) need the definitions
> > of these flags to interpret the configuration of a timer as passed by
> > platform code.
> 
> Which platform code?

Sorry, it is not platform code. Shall I call it the HPET-enabling code?
> 
> > +#define HPET_DEV_USED_BIT		2
> > +#define HPET_DEV_USED			(1 << HPET_DEV_USED_BIT)
> > +#define HPET_DEV_VALID			0x8
> > +#define HPET_DEV_FSB_CAP		0x1000
> > +#define HPET_DEV_PERI_CAP		0x2000
> 
> I'm not seing why you would need any of those in the watchdog code.
> 
> The only function related to the watchdog which needs these is
> hpet_hardlockup_detector_assign_timer() and that is located in hpet.c
> itself.

Yes, I see now that it is not needed. This patch can be removed.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 06/14] x86/hpet: Configure the timer used by the hardlockup detector
  2019-03-26 21:13   ` Thomas Gleixner
@ 2019-04-09  2:04     ` Ricardo Neri
  0 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-09  2:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki

On Tue, Mar 26, 2019 at 10:13:06PM +0100, Thomas Gleixner wrote:
> On Wed, 27 Feb 2019, Ricardo Neri wrote:
> > +#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR_HPET
> > +struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
> > +{
> > +	struct hpet_hld_data *hdata;
> > +	unsigned int cfg;
> > +
> > +	cfg = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER_NR));
> > +
> > +	if (!(cfg & HPET_TN_FSB_CAP))
> > +		return NULL;
> > +
> > +	hdata = kzalloc(sizeof(*hdata), GFP_KERNEL);
> > +	if (!hdata)
> > +		return NULL;
> > +
> > +	hdata->flags = HPET_DEV_FSB_CAP;
> 
> Pointless.

Agreed. Only if the timer is FSB-capable is hdata is initialized.
> 
> > +
> > +	if (cfg & HPET_TN_PERIODIC_CAP)
> > +		hdata->flags |= HPET_DEV_PERI_CAP;
> 
> This can be expressed by a simple:
> 
>      hdata->has_periodic = 1;
> 
> And no flag shuffling required at all.

Sure. I'll implement this change.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 08/14] watchdog/hardlockup: Decouple the hardlockup detector from perf
  2019-03-26 21:18   ` Thomas Gleixner
@ 2019-04-09  2:05     ` Ricardo Neri
  0 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-09  2:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Don Zickus, Nicholas Piggin,
	Michael Ellerman, Frederic Weisbecker, Babu Moger,
	David S. Miller, Benjamin Herrenschmidt, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Luis R. Rodriguez,
	sparclinux, linuxppc-dev

On Tue, Mar 26, 2019 at 10:18:32PM +0100, Thomas Gleixner wrote:
> On Wed, 27 Feb 2019, Ricardo Neri wrote:
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Detect hard lockups on a system
> > + *
> > + * Copyright (C) Intel Corporation 2019
> > + *
> > + * Note: All of this code comes from the original perf-specific hardlockup
> > + * detector.
> 
> And because you copied it from there without modification this becomes
> automatically Copyrighted by Intel?

Yes, it does not look correct.

> 
> May I recommend that you talk to your legal departement about this.
> 
> Aside of that there is really no value in creating yet another file. The
> code in question can simply wrapped into a large #ifdef PERF in the old
> file.

Sure, I'll implement the change in this manner.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 09/14] watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based implementation
  2019-03-26 21:20       ` Thomas Gleixner
@ 2019-04-09  2:05         ` Ricardo Neri
  0 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-09  2:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paul E. McKenney, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel,
	Ricardo Neri, H. Peter Anvin, Tony Luck, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	David S. Miller, Benjamin Herrenschmidt, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Luis R. Rodriguez, Waiman Long, Josh Poimboeuf, Randy Dunlap,
	Davidlohr Bueso, Christoffer Dall, Marc Zyngier, Kai-Heng Feng,
	Konrad Rzeszutek Wilk, David Rientjes, sparclinux, linuxppc-dev

On Tue, Mar 26, 2019 at 10:20:41PM +0100, Thomas Gleixner wrote:
> On Thu, 28 Feb 2019, Ricardo Neri wrote:
> > > > 
> > > > -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> > > > +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) || \
> > > > +    defined(CONFIG_X86_HARDLOCKUP_DETECTOR_HPET)
> > > 
> > > Why not instead make CONFIG_X86_HARDLOCKUP_DETECTOR_HPET select
> > > CONFIG_HARDLOCKUP_DETECTOR_PERF?  Keep the arch-specific details
> > > in the arch-specific files and all that.
> > 
> > Thanks for your feedback, Paul! The HPET implementation does not use
> > perf. Thus, in my opinion is not correct for the HPET HLD to select
> > the perf implementation. Patch 8 of this series splits the perf-specific
> > code and the generic hardlockup detector code. Does this make sense?
> 
> That's what intermediate config symbols are for.
> 
> config HARDLOCKUP_DETECTOR_CORE
>        bool
> 
> And make both PERF and HPET select it.

I'll implement it in this manner.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 10/14] kernel/watchdog: Add a function to obtain the watchdog_allowed_mask
  2019-03-26 21:22   ` Thomas Gleixner
@ 2019-04-09  2:05     ` Ricardo Neri
  0 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-09  2:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes,
	David S. Miller, Benjamin Herrenschmidt, sparclinux,
	linuxppc-dev

On Tue, Mar 26, 2019 at 10:22:40PM +0100, Thomas Gleixner wrote:
> On Wed, 27 Feb 2019, Ricardo Neri wrote:
> >  
> > -struct cpumask watchdog_allowed_mask __read_mostly;
> > +static struct cpumask watchdog_allowed_mask __read_mostly;
> 
> That hunk is correct.

I'll send a separate patch with this hunk only.
> 
> >  struct cpumask watchdog_cpumask __read_mostly;
> >  unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
> > @@ -92,6 +92,11 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
> >  }
> >  __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
> >  # endif /* CONFIG_SMP */
> > +
> > +struct cpumask *watchdog_get_allowed_cpumask(void)
> > +{
> > +	return &watchdog_allowed_mask;
> > +}
> 
> That part is pointless as I showed you in the other reply. You don't need
> that mask in your code at all.

Yes. I'll remove this hunk and follow the approach you suggested in your
reply to patch 11.

Thanks and BR,
Ricardo
> 
> Thanks,
> 
> 	tglx
> 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 13/14] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter
  2019-03-26 21:29   ` Thomas Gleixner
@ 2019-04-09  2:07     ` Ricardo Neri
  0 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-09  2:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Ashok Raj, Andi Kleen,
	Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, Mimi Zohar,
	Jan Kiszka, Nick Desaulniers, Masahiro Yamada, Nayna Jain

On Tue, Mar 26, 2019 at 10:29:52PM +0100, Thomas Gleixner wrote:
> On Wed, 27 Feb 2019, Ricardo Neri wrote:
> > +			When hpet is specified, the NMI watchdog will be driven
> > +			by an HPET timer, if available in the system. Otherwise,
> > +			the perf-based implementation will be used. Specifying
> > +			hpet implies that nmi_watchdog is on.
> 
> How so?
> 
I meant to say that the user does not need to provide nmi_watchdog=1 and
nmi_watchdog=hpet separately.

I think this is true because watchdog_user_enabled in kernel/watchdog.c is set
to 1 when CONFIG_HARDLOCKUP_DETECTOR is selected. Also, if nmi_watchdog_available
is set to true if watchdog_nmi_probe() is successful.

Perhaps I can add a warning in case nmi_watchdog=hpet and either
CONFIG_HARDLOCKUP_DETECTOR or CONFIG_HARDLOCKUP_DETECTOR_HPET are not
selected?

> > +static int __init hardlockup_detector_hpet_setup(char *str)
> > +{
> > +	if (strstr(str, "hpet"))
> > +		hardlockup_use_hpet = true;
> 
> strstr()? Not really.

Is strncmp(str, "hpet", 5) more acceptable?

> 
> > +
> > +	return 0;
> > +}
> > +__setup("nmi_watchdog=", hardlockup_detector_hpet_setup);
> > +
> >  /**
> >   * hardlockup_detector_hpet_init() - Initialize the hardlockup detector
> >   *
> > @@ -405,6 +422,9 @@ int __init hardlockup_detector_hpet_init(void)
> >  {
> >  	int ret;
> >  
> > +	if (!hardlockup_use_hpet)
> > +		return -ENODEV;
> 
> This should have been there in the patch which introduces
> hardlockup_detector_hpet_init(). And this patch merily adds the command
> line magic which sets that flag.

Sure, I will move this check into the patch that introduces
hardlockup_detector_hpet_init().

> 
> > +
> >  	if (!is_hpet_enabled())
> >  		return -ENODEV;
> >  
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index 367aa81294ef..28cad7310378 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -78,7 +78,7 @@ static int __init hardlockup_panic_setup(char *str)
> >  		nmi_watchdog_user_enabled = 0;
> >  	else if (!strncmp(str, "1", 1))
> >  		nmi_watchdog_user_enabled = 1;
> > -	return 1;
> > +	return 0;
> 
> Why?

My understanding is that this is needed so that other __setup functions that also
want to check "nmi_watchdog" are able to do it. Is this understanding
not correct?

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
  2019-04-05 16:12   ` Suthikulpanit, Suravee
@ 2019-04-09  2:14     ` Ricardo Neri
  0 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-09  2:14 UTC (permalink / raw)
  To: Suthikulpanit, Suravee
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Peter Zijlstra, Ravi V. Shankar, x86, linux-kernel,
	Ricardo Neri, H. Peter Anvin, Tony Luck, Clemens Ladisch,
	Arnd Bergmann, Philippe Ombredanne, Kate Stewart,
	Rafael J. Wysocki, Mimi Zohar, Jan Kiszka, Nick Desaulniers,
	Masahiro Yamada, Nayna Jain, Lendacky, Thomas, Grimm, Jon

On Fri, Apr 05, 2019 at 04:12:56PM +0000, Suthikulpanit, Suravee wrote:
> Hi Neri,

Hi Suravee,

Many thanks for testing the patches!
> 
> While trying out this patch series, I found that it does not work when the HPET timer
> is in periodic mode.

I should have tested this better. I'll double check my setup.
> 
> On 2/27/19 11:05 PM, Ricardo Neri wrote:
> > ......
> > diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c
> > new file mode 100644
> > index 000000000000..cfa284da4bf6
> > --- /dev/null
> > +++ b/arch/x86/kernel/watchdog_hld_hpet.c
> > @@ -0,0 +1,405 @@
> > .....
> > +
> > +/**
> > + * set_comparator() - Update the comparator in an HPET timer instance
> > + * @hdata:	A data structure with the timer instance to update
> > + * @cmp:	The value to write in the in the comparator registere
> > + *
> > + * Returns:
> > + *
> > + * None
> > + */
> > +static inline void set_comparator(struct hpet_hld_data *hdata,
> > +				  unsigned long cmp)
> > +{
> > +	hpet_writeq(cmp, HPET_Tn_CMP(hdata->num));
> > +}
> > +
> > +/**
> > + * kick_timer() - Reprogram timer to expire in the future
> > + * @hdata:	A data structure with the timer instance to update
> > + * @force:	Force reprogram. Useful enabling or re-enabling detector.
> > + *
> > + * Reprogram the timer to expire within watchdog_thresh seconds in the future.
> > + *
> > + * Returns:
> > + *
> > + * None
> > + */
> > +static void kick_timer(struct hpet_hld_data *hdata, bool force)
> > +{
> > +	bool kick_needed = force || !(hdata->flags & HPET_DEV_PERI_CAP);
> > +	unsigned long new_compare, count;
> > +
> > +	/*
> > +	 * Update the comparator in increments of watch_thresh seconds relative
> > +	 * to the current count. Since watch_thresh is given in seconds, we
> > +	 * are able to update the comparator before the counter reaches such new
> > +	 * value.
> > +	 *
> > +	 * Let it wrap around if needed.
> > +	 */
> > +
> > +	if (kick_needed) {
> > +		count = get_count();
> > +
> > +		new_compare = count + watchdog_thresh * hdata->ticks_per_second;
> > +
> > +		set_comparator(hdata, new_compare);
> > +	}
> > +}
> 
> It turns out that the set_comparator() does not seem to support periodic mode,
> in which it should have been setting the accumulator/period via the comparator register.
> 
> Instead, what if we re-factor the code in the arch/x86/kernel/hpet.c:hpet_set_periodic()
> to hpet_set_comparator(). Then we can also reuse it in the arch/x86/kernel/watchdog_hld_pet.c:
> kick_timer() as shown below.
> 
> With these changes, I can test the series on AMD systems, and see all cores
> in /proc/interrupts are receiving NMI interrupts.

These changes look OK to me. Thanks! I'll incorporate them in my next version.

> 
> Regards,
> Suravee
> 
> 
> diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
> index 0976334..f30957f 100644
> --- a/arch/x86/include/asm/hpet.h
> +++ b/arch/x86/include/asm/hpet.h
> @@ -115,6 +115,7 @@ extern int hpet_rtc_timer_init(void);
>   extern irqreturn_t hpet_rtc_interrupt(int irq, void *dev_id);
>   extern int hpet_register_irq_handler(rtc_irq_handler handler);
>   extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
> +extern void hpet_set_comparator(int num, unsigned int cmp, unsigned int period);
> 
>   #endif /* CONFIG_HPET_EMULATE_RTC */
> 
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index 03b05dae..f3b2351 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -328,6 +328,41 @@ static void hpet_legacy_clockevent_register(void)
>   	printk(KERN_DEBUG "hpet clockevent registered\n");
>   }
> 
> +/*
> + * hpet_set_comparator() - Helper function for setting comparator register
> + * @num:	The timer ID
> + * @cmp:	The value to be written to the comparator/accumulator
> + * @period:	The value to be written to the period (0 = oneshot mode)
> + *
> + * Helper function for updating comparator, accumulator and period values.
> + *
> + * In periodic mode, HPET needs HPET_TN_SETVAL to be set before writing
> + * to the Tn_CMP to update the accumulator. Then, HPET needs a second
> + * write (with HPET_TN_SETVAL cleared) to Tn_CMP to set the period.
> + * The HPET_TN_SETVAL bit is automatically cleared after the first write.
> + *
> + * For one-shot mode, HPET_TN_SETVAL does not need to be set.
> + *
> + * See the following documents:
> + *   - Intel IA-PC HPET (High Precision Event Timers) Specification
> + *   - AMD-8111 HyperTransport I/O Hub Data Sheet, Publication # 24674
> + */
> +void hpet_set_comparator(int num, unsigned int cmp, unsigned int period)
> +{
> +	if (period) {
> +		unsigned int v = hpet_readl(HPET_Tn_CFG(num));
> +		hpet_writel(v | HPET_TN_SETVAL, HPET_Tn_CFG(num));
> +	}
> +
> +	hpet_writel(cmp, HPET_Tn_CMP(num));
> +	if (!period)
> +		return;
> +
> +	udelay(1);
> +	hpet_writel(period, HPET_Tn_CMP(num));
> +}
> +EXPORT_SYMBOL_GPL(hpet_set_comparator);
> +
>   static int hpet_set_periodic(struct clock_event_device *evt, int timer)
>   {
>   	unsigned int cfg, cmp, now;
> @@ -339,19 +374,11 @@ static int hpet_set_periodic(struct clock_event_device *evt, int timer)
>   	now = hpet_readl(HPET_COUNTER);
>   	cmp = now + (unsigned int)delta;
>   	cfg = hpet_readl(HPET_Tn_CFG(timer));
> -	cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
> -	       HPET_TN_32BIT;
> +	cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_32BIT;
>   	hpet_writel(cfg, HPET_Tn_CFG(timer));
> -	hpet_writel(cmp, HPET_Tn_CMP(timer));
> -	udelay(1);
> -	/*
> -	 * HPET on AMD 81xx needs a second write (with HPET_TN_SETVAL
> -	 * cleared) to T0_CMP to set the period. The HPET_TN_SETVAL
> -	 * bit is automatically cleared after the first write.
> -	 * (See AMD-8111 HyperTransport I/O Hub Data Sheet,
> -	 * Publication # 24674)
> -	 */
> -	hpet_writel((unsigned int)delta, HPET_Tn_CMP(timer));
> +
> +	hpet_set_comparator(timer, cmp, (unsigned int)delta);
> +
>   	hpet_start_counter();
>   	hpet_print_config();
> 
> diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c
> index 4402def..de33c50 100644
> --- a/arch/x86/kernel/watchdog_hld_hpet.c
> +++ b/arch/x86/kernel/watchdog_hld_hpet.c
> @@ -35,21 +35,6 @@ static inline unsigned long get_count(void)
>   }
> 
>   /**
> - * set_comparator() - Update the comparator in an HPET timer instance
> - * @hdata:	A data structure with the timer instance to update
> - * @cmp:	The value to write in the in the comparator registere
> - *
> - * Returns:
> - *
> - * None
> - */
> -static inline void set_comparator(struct hpet_hld_data *hdata,
> -				  unsigned long cmp)
> -{
> -	hpet_writeq(cmp, HPET_Tn_CMP(hdata->num));
> -}
> -
> -/**
>    * kick_timer() - Reprogram timer to expire in the future
>    * @hdata:	A data structure with the timer instance to update
>    * @force:	Force reprogram. Useful enabling or re-enabling detector.
> @@ -68,7 +53,7 @@ static inline void set_comparator(struct hpet_hld_data *hdata,
>   static void kick_timer(struct hpet_hld_data *hdata, bool force)
>   {
>   	bool kick_needed = force || !(hdata->flags & HPET_DEV_PERI_CAP);
> -	unsigned long tsc_curr, tsc_delta, new_compare, count;
> +	unsigned long tsc_curr, tsc_delta, new_compare, count, period = 0;
> 
>   	/* Start obtaining the current TSC and HPET counts. */
>   	tsc_curr = rdtsc();
> @@ -81,6 +66,9 @@ static void kick_timer(struct hpet_hld_data *hdata, bool force)
>   	hdata->tsc_next = tsc_curr + tsc_delta;
>   	hdata->tsc_next_error = tsc_delta >> 6;
> 
> +	if (!kick_needed)
> +		return;
> +
>   	/*
>   	 * Update the comparator in increments of watch_thresh seconds relative
>   	 * to the current count. Since watch_thresh is given in seconds, we
> @@ -89,12 +77,11 @@ static void kick_timer(struct hpet_hld_data *hdata, bool force)
>   	 *
>   	 * Let it wrap around if needed.
>   	 */
> +	if (hdata->flags & HPET_DEV_PERI_CAP)
> +		period = watchdog_thresh * hdata->ticks_per_second;
> 
> -	if (kick_needed) {
> -		new_compare = count + watchdog_thresh * hdata->ticks_per_second;
> -
> -		set_comparator(hdata, new_compare);
> -	}
> +	new_compare = count + period;
> +	hpet_set_comparator(hdata->num, new_compare, period);
>   }
> 
>   /**
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
  2019-03-26 20:49   ` Thomas Gleixner
  2019-04-09  2:02     ` Ricardo Neri
@ 2019-04-09 10:59     ` Peter Zijlstra
  2019-04-10  1:13       ` Ricardo Neri
  1 sibling, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2019-04-09 10:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ricardo Neri, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, Mimi Zohar,
	Jan Kiszka, Nick Desaulniers, Masahiro Yamada, Nayna Jain

On Tue, Mar 26, 2019 at 09:49:13PM +0100, Thomas Gleixner wrote:
> So way you should handle this is:
> 
> 	cpumask_set_cpu(cpu, hld_data->cpu_monitored_mask);
> 
> 	if (!hld_data->enabled_cpus++) {
> 		hld_data->handling_cpu = cpu;
> 		kick_timer();
> 		enable_timer();
> 	}
> 
> The cpu mask starts off empty and each CPU sets itself when the function is
> invoked on it.
> 
> data->enabled_cpus keeps track of the enabled cpus so you avoid
> reconfiguration just because a different cpu comes online. And it's
> required for disable as well.
> 
> > +void hardlockup_detector_hpet_disable(void)
> > +{
> > +	struct cpumask *allowed = watchdog_get_allowed_cpumask();
> > +
> > +	if (!hld_data)
> > +		return;
> > +
> > +	/* Only disable the timer if there are no more CPUs to monitor. */
> > +	if (!cpumask_weight(allowed))
> > +		disable_timer(hld_data);
> 
> Again this should be:
> 
> 	cpumask_clear_cpu(cpu, hld_data->cpu_monitored_mask);
> 	hld_data->enabled_cpus--;
> 
> 	if (hld_data->handling_cpu != cpu)
> 		return;
> 
> 	disable_timer();
> 	if (hld_data->enabled_cpus)
> 		return;

	if (!hld_data->enabled_cpus)
		return;

> 
> 	hld_data->handling_cpu = cpumask_first(hld_data->cpu_monitored_mask);
> 	enable_timer();

That said; you can do the above without ->enabled_cpus, by using
->handling_cpu == nr_cpu_ids to indicate 'empty'. But I'm not at all
sure that is worth the effort, it results in less obious code.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
  2019-02-27 16:05 ` [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector Ricardo Neri
  2019-03-26 20:49   ` Thomas Gleixner
  2019-04-05 16:12   ` Suthikulpanit, Suravee
@ 2019-04-09 11:03   ` Peter Zijlstra
  2019-04-10  1:05     ` Ricardo Neri
  2 siblings, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2019-04-09 11:03 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, Mimi Zohar,
	Jan Kiszka, Nick Desaulniers, Masahiro Yamada, Nayna Jain

On Wed, Feb 27, 2019 at 08:05:15AM -0800, Ricardo Neri wrote:
> diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
> index 4d559e0c746f..15dc3b576496 100644
> --- a/arch/x86/include/asm/hpet.h
> +++ b/arch/x86/include/asm/hpet.h
> @@ -123,12 +123,24 @@ struct hpet_hld_data {
>  	u32		num;
>  	u32		flags;
>  	u64		ticks_per_second;
> +	u32		handling_cpu;
> +	struct cpumask	cpu_monitored_mask;
> +	struct msi_msg	msi_msg;
>  };

Please don't use struct cpumask unless you absolutely have to.

The above is better written as:


	struct hpet_hld_data {
		...
		unsigned long cpumask[0];
	};

and allocated using:

	struct hpet_hld_data *hhd = kzalloc(sizeof(struct hpet_hld_data) + cpumask_size());

and used as:

	to_cpumask(hhd->cpumask);

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 12/14] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI
  2019-02-27 16:05 ` [RFC PATCH v2 12/14] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI Ricardo Neri
  2019-03-26 20:55   ` Thomas Gleixner
@ 2019-04-09 11:28   ` Peter Zijlstra
  2019-04-10  1:19     ` Ricardo Neri
  1 sibling, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2019-04-09 11:28 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, Mimi Zohar,
	Jan Kiszka, Nick Desaulniers, Masahiro Yamada, Nayna Jain

On Wed, Feb 27, 2019 at 08:05:16AM -0800, Ricardo Neri wrote:
> @@ -62,7 +67,18 @@ static inline void set_comparator(struct hpet_hld_data *hdata,
>  static void kick_timer(struct hpet_hld_data *hdata, bool force)
>  {
>  	bool kick_needed = force || !(hdata->flags & HPET_DEV_PERI_CAP);
> -	unsigned long new_compare, count;
> +	unsigned long tsc_curr, tsc_delta, new_compare, count;
> +
> +	/* Start obtaining the current TSC and HPET counts. */
> +	tsc_curr = rdtsc();
> +
> +	if (kick_needed)
> +		count = get_count();
> +
> +	tsc_delta = (unsigned long)watchdog_thresh * (unsigned long)tsc_khz
> +		    * 1000L;
> +	hdata->tsc_next = tsc_curr + tsc_delta;
> +	hdata->tsc_next_error = tsc_delta >> 6;

What do we need a per hld_data tsc_next_error for? It is basically a
global 'constant'.

>  	/*
>  	 * Update the comparator in increments of watch_thresh seconds relative
> @@ -74,8 +90,6 @@ static void kick_timer(struct hpet_hld_data *hdata, bool force)
>  	 */
>  
>  	if (kick_needed) {
> -		count = get_count();
> -
>  		new_compare = count + watchdog_thresh * hdata->ticks_per_second;
>  
>  		set_comparator(hdata, new_compare);
> @@ -147,6 +161,14 @@ static void set_periodic(struct hpet_hld_data *hdata)
>   */
>  static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
>  {
> +	if (smp_processor_id() == hdata->handling_cpu) {
> +		unsigned long tsc_curr;

TSC is u64

> +
> +		tsc_curr = rdtsc();
> +		if (abs(tsc_curr - hdata->tsc_next) < hdata->tsc_next_error)
> +			return true;

You can write that as:

		(tsc_curr - hdata->tsc_next) + tsc_error < 2*tsc_error

which doesn't contain any branches what so ever.

> +	}
> +
>  	return false;
>  }

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 10/14] kernel/watchdog: Add a function to obtain the watchdog_allowed_mask
  2019-02-27 16:05 ` [RFC PATCH v2 10/14] kernel/watchdog: Add a function to obtain the watchdog_allowed_mask Ricardo Neri
  2019-03-26 21:22   ` Thomas Gleixner
@ 2019-04-09 11:34   ` Peter Zijlstra
  2019-04-11  1:15     ` Ricardo Neri
  1 sibling, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2019-04-09 11:34 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes,
	David S. Miller, Benjamin Herrenschmidt, sparclinux,
	linuxppc-dev

On Wed, Feb 27, 2019 at 08:05:14AM -0800, Ricardo Neri wrote:
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 8fbfda94a67b..367aa81294ef 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -44,7 +44,7 @@ int __read_mostly soft_watchdog_user_enabled = 1;
>  int __read_mostly watchdog_thresh = 10;
>  int __read_mostly nmi_watchdog_available;
>  
> -struct cpumask watchdog_allowed_mask __read_mostly;
> +static struct cpumask watchdog_allowed_mask __read_mostly;
>  
>  struct cpumask watchdog_cpumask __read_mostly;
>  unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);

Hurmph, more struct cpumask, ideally this would get converted to
cpumask_var_t, I don't think we need this before the allocators work, do
we?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
  2019-04-09 11:03   ` Peter Zijlstra
@ 2019-04-10  1:05     ` Ricardo Neri
  0 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-10  1:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, Mimi Zohar,
	Jan Kiszka, Nick Desaulniers, Masahiro Yamada, Nayna Jain

On Tue, Apr 09, 2019 at 01:03:40PM +0200, Peter Zijlstra wrote:
> On Wed, Feb 27, 2019 at 08:05:15AM -0800, Ricardo Neri wrote:
> > diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
> > index 4d559e0c746f..15dc3b576496 100644
> > --- a/arch/x86/include/asm/hpet.h
> > +++ b/arch/x86/include/asm/hpet.h
> > @@ -123,12 +123,24 @@ struct hpet_hld_data {
> >  	u32		num;
> >  	u32		flags;
> >  	u64		ticks_per_second;
> > +	u32		handling_cpu;
> > +	struct cpumask	cpu_monitored_mask;
> > +	struct msi_msg	msi_msg;
> >  };
> 
> Please don't use struct cpumask unless you absolutely have to.
> 
> The above is better written as:
> 
> 
> 	struct hpet_hld_data {
> 		...
> 		unsigned long cpumask[0];
> 	};
> 
> and allocated using:
> 
> 	struct hpet_hld_data *hhd = kzalloc(sizeof(struct hpet_hld_data) + cpumask_size());
> 
> and used as:
> 
> 	to_cpumask(hhd->cpumask);

Thanks for your feedback, Peter!

Sure. I will make this change.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
  2019-04-09 10:59     ` Peter Zijlstra
@ 2019-04-10  1:13       ` Ricardo Neri
  0 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-10  1:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, Mimi Zohar,
	Jan Kiszka, Nick Desaulniers, Masahiro Yamada, Nayna Jain

On Tue, Apr 09, 2019 at 12:59:46PM +0200, Peter Zijlstra wrote:
> On Tue, Mar 26, 2019 at 09:49:13PM +0100, Thomas Gleixner wrote:
> > So way you should handle this is:
> > 
> > 	cpumask_set_cpu(cpu, hld_data->cpu_monitored_mask);
> > 
> > 	if (!hld_data->enabled_cpus++) {
> > 		hld_data->handling_cpu = cpu;
> > 		kick_timer();
> > 		enable_timer();
> > 	}
> > 
> > The cpu mask starts off empty and each CPU sets itself when the function is
> > invoked on it.
> > 
> > data->enabled_cpus keeps track of the enabled cpus so you avoid
> > reconfiguration just because a different cpu comes online. And it's
> > required for disable as well.
> > 
> > > +void hardlockup_detector_hpet_disable(void)
> > > +{
> > > +	struct cpumask *allowed = watchdog_get_allowed_cpumask();
> > > +
> > > +	if (!hld_data)
> > > +		return;
> > > +
> > > +	/* Only disable the timer if there are no more CPUs to monitor. */
> > > +	if (!cpumask_weight(allowed))
> > > +		disable_timer(hld_data);
> > 
> > Again this should be:
> > 
> > 	cpumask_clear_cpu(cpu, hld_data->cpu_monitored_mask);
> > 	hld_data->enabled_cpus--;
> > 
> > 	if (hld_data->handling_cpu != cpu)
> > 		return;
> > 
> > 	disable_timer();
> > 	if (hld_data->enabled_cpus)
> > 		return;
> 
> 	if (!hld_data->enabled_cpus)
> 		return;
> 
> > 
> > 	hld_data->handling_cpu = cpumask_first(hld_data->cpu_monitored_mask);
> > 	enable_timer();
> 
> That said; you can do the above without ->enabled_cpus, by using
> ->handling_cpu == nr_cpu_ids to indicate 'empty'. But I'm not at all
> sure that is worth the effort, it results in less obious code.

I agree. It is probably clearer to check for !hld_data->enabled_cpus as
it clearly indicates what happens if there are no more CPUs to monitor.

Thanks and BR,
Ricardo


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 12/14] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI
  2019-04-09 11:28   ` Peter Zijlstra
@ 2019-04-10  1:19     ` Ricardo Neri
  2019-04-10  7:01       ` Peter Zijlstra
  0 siblings, 1 reply; 49+ messages in thread
From: Ricardo Neri @ 2019-04-10  1:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, Mimi Zohar,
	Jan Kiszka, Nick Desaulniers, Masahiro Yamada, Nayna Jain

On Tue, Apr 09, 2019 at 01:28:17PM +0200, Peter Zijlstra wrote:
> On Wed, Feb 27, 2019 at 08:05:16AM -0800, Ricardo Neri wrote:
> > @@ -62,7 +67,18 @@ static inline void set_comparator(struct hpet_hld_data *hdata,
> >  static void kick_timer(struct hpet_hld_data *hdata, bool force)
> >  {
> >  	bool kick_needed = force || !(hdata->flags & HPET_DEV_PERI_CAP);
> > -	unsigned long new_compare, count;
> > +	unsigned long tsc_curr, tsc_delta, new_compare, count;
> > +
> > +	/* Start obtaining the current TSC and HPET counts. */
> > +	tsc_curr = rdtsc();
> > +
> > +	if (kick_needed)
> > +		count = get_count();
> > +
> > +	tsc_delta = (unsigned long)watchdog_thresh * (unsigned long)tsc_khz
> > +		    * 1000L;
> > +	hdata->tsc_next = tsc_curr + tsc_delta;
> > +	hdata->tsc_next_error = tsc_delta >> 6;
> 
> What do we need a per hld_data tsc_next_error for? It is basically a
> global 'constant'.
> 

This is true. I thought I'd keep all the needed variables in a single
struct to make the code more readable. I guess, I did not achieve that
goal. I'll put it as a static global variable.

> >  	/*
> >  	 * Update the comparator in increments of watch_thresh seconds relative
> > @@ -74,8 +90,6 @@ static void kick_timer(struct hpet_hld_data *hdata, bool force)
> >  	 */
> >  
> >  	if (kick_needed) {
> > -		count = get_count();
> > -
> >  		new_compare = count + watchdog_thresh * hdata->ticks_per_second;
> >  
> >  		set_comparator(hdata, new_compare);
> > @@ -147,6 +161,14 @@ static void set_periodic(struct hpet_hld_data *hdata)
> >   */
> >  static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
> >  {
> > +	if (smp_processor_id() == hdata->handling_cpu) {
> > +		unsigned long tsc_curr;
> 
> TSC is u64

In x86_64, isn't u64 an unsigned long? Do you mean to consider the
32-bit case?

> 
> > +
> > +		tsc_curr = rdtsc();
> > +		if (abs(tsc_curr - hdata->tsc_next) < hdata->tsc_next_error)
> > +			return true;
> 
> You can write that as:
> 
> 		(tsc_curr - hdata->tsc_next) + tsc_error < 2*tsc_error
> 
> which doesn't contain any branches what so ever.
> 

Sure, I'll add this change.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 12/14] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI
  2019-04-10  1:19     ` Ricardo Neri
@ 2019-04-10  7:01       ` Peter Zijlstra
  2019-04-11  1:12         ` Ricardo Neri
  0 siblings, 1 reply; 49+ messages in thread
From: Peter Zijlstra @ 2019-04-10  7:01 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, Mimi Zohar,
	Jan Kiszka, Nick Desaulniers, Masahiro Yamada, Nayna Jain

On Tue, Apr 09, 2019 at 06:19:57PM -0700, Ricardo Neri wrote:
> On Tue, Apr 09, 2019 at 01:28:17PM +0200, Peter Zijlstra wrote:
> > > @@ -147,6 +161,14 @@ static void set_periodic(struct hpet_hld_data *hdata)
> > >   */
> > >  static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
> > >  {
> > > +	if (smp_processor_id() == hdata->handling_cpu) {
> > > +		unsigned long tsc_curr;
> > 
> > TSC is u64
> 
> In x86_64, isn't u64 an unsigned long? Do you mean to consider the
> 32-bit case?

Unless none of this code is available for x86_32, you pretty much have
to consider 32bit.

But even then, using u64 for 64bit values is the right thing.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 12/14] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI
  2019-04-10  7:01       ` Peter Zijlstra
@ 2019-04-11  1:12         ` Ricardo Neri
  0 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-11  1:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, Mimi Zohar,
	Jan Kiszka, Nick Desaulniers, Masahiro Yamada, Nayna Jain

On Wed, Apr 10, 2019 at 09:01:52AM +0200, Peter Zijlstra wrote:
> On Tue, Apr 09, 2019 at 06:19:57PM -0700, Ricardo Neri wrote:
> > On Tue, Apr 09, 2019 at 01:28:17PM +0200, Peter Zijlstra wrote:
> > > > @@ -147,6 +161,14 @@ static void set_periodic(struct hpet_hld_data *hdata)
> > > >   */
> > > >  static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
> > > >  {
> > > > +	if (smp_processor_id() == hdata->handling_cpu) {
> > > > +		unsigned long tsc_curr;
> > > 
> > > TSC is u64
> > 
> > In x86_64, isn't u64 an unsigned long? Do you mean to consider the
> > 32-bit case?
> 
> Unless none of this code is available for x86_32, you pretty much have
> to consider 32bit.
> 
> But even then, using u64 for 64bit values is the right thing.

OK, I'll implement this change.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 10/14] kernel/watchdog: Add a function to obtain the watchdog_allowed_mask
  2019-04-09 11:34   ` Peter Zijlstra
@ 2019-04-11  1:15     ` Ricardo Neri
  0 siblings, 0 replies; 49+ messages in thread
From: Ricardo Neri @ 2019-04-11  1:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Ashok Raj,
	Andi Kleen, Ravi V. Shankar, x86, linux-kernel, Ricardo Neri,
	H. Peter Anvin, Tony Luck, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes,
	David S. Miller, Benjamin Herrenschmidt, sparclinux,
	linuxppc-dev

On Tue, Apr 09, 2019 at 01:34:21PM +0200, Peter Zijlstra wrote:
> On Wed, Feb 27, 2019 at 08:05:14AM -0800, Ricardo Neri wrote:
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index 8fbfda94a67b..367aa81294ef 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -44,7 +44,7 @@ int __read_mostly soft_watchdog_user_enabled = 1;
> >  int __read_mostly watchdog_thresh = 10;
> >  int __read_mostly nmi_watchdog_available;
> >  
> > -struct cpumask watchdog_allowed_mask __read_mostly;
> > +static struct cpumask watchdog_allowed_mask __read_mostly;
> >  
> >  struct cpumask watchdog_cpumask __read_mostly;
> >  unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
> 
> Hurmph, more struct cpumask, ideally this would get converted to
> cpumask_var_t, I don't think we need this before the allocators work, do
> we?

I see mm_init() is called before lockup_detector_init(); both from start_kernel().
Thus, IMHO, kzalloc should work at this point.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2019-04-11  1:16 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-27 16:05 [RFC PATCH v2 00/14] Implement an HPET-based hardlockup detector Ricardo Neri
2019-02-27 16:05 ` [RFC PATCH v2 01/14] x86/msi: Add definition for NMI delivery mode Ricardo Neri
2019-02-27 16:05 ` [RFC PATCH v2 02/14] x86/hpet: Expose more functions to read and write registers Ricardo Neri
2019-03-26 21:00   ` Thomas Gleixner
2019-04-09  2:03     ` Ricardo Neri
2019-02-27 16:05 ` [RFC PATCH v2 03/14] x86/hpet: Calculate ticks-per-second in a separate function Ricardo Neri
2019-03-26 21:03   ` Thomas Gleixner
2019-04-09  2:04     ` Ricardo Neri
2019-02-27 16:05 ` [RFC PATCH v2 04/14] x86/hpet: Reserve timer for the HPET hardlockup detector Ricardo Neri
2019-02-27 16:05 ` [RFC PATCH v2 05/14] x86/hpet: Relocate flag definitions to a header file Ricardo Neri
2019-03-26 21:11   ` Thomas Gleixner
2019-04-09  2:04     ` Ricardo Neri
2019-02-27 16:05 ` [RFC PATCH v2 06/14] x86/hpet: Configure the timer used by the hardlockup detector Ricardo Neri
2019-03-26 21:13   ` Thomas Gleixner
2019-04-09  2:04     ` Ricardo Neri
2019-02-27 16:05 ` [RFC PATCH v2 07/14] watchdog/hardlockup: Define a generic function to detect hardlockups Ricardo Neri
2019-02-27 16:05 ` [RFC PATCH v2 08/14] watchdog/hardlockup: Decouple the hardlockup detector from perf Ricardo Neri
2019-03-26 21:18   ` Thomas Gleixner
2019-04-09  2:05     ` Ricardo Neri
2019-02-27 16:05 ` [RFC PATCH v2 09/14] watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based implementation Ricardo Neri
2019-02-27 16:17   ` Paul E. McKenney
2019-03-01  1:17     ` Ricardo Neri
2019-03-26 21:20       ` Thomas Gleixner
2019-04-09  2:05         ` Ricardo Neri
2019-02-27 16:05 ` [RFC PATCH v2 10/14] kernel/watchdog: Add a function to obtain the watchdog_allowed_mask Ricardo Neri
2019-03-26 21:22   ` Thomas Gleixner
2019-04-09  2:05     ` Ricardo Neri
2019-04-09 11:34   ` Peter Zijlstra
2019-04-11  1:15     ` Ricardo Neri
2019-02-27 16:05 ` [RFC PATCH v2 11/14] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector Ricardo Neri
2019-03-26 20:49   ` Thomas Gleixner
2019-04-09  2:02     ` Ricardo Neri
2019-04-09 10:59     ` Peter Zijlstra
2019-04-10  1:13       ` Ricardo Neri
2019-04-05 16:12   ` Suthikulpanit, Suravee
2019-04-09  2:14     ` Ricardo Neri
2019-04-09 11:03   ` Peter Zijlstra
2019-04-10  1:05     ` Ricardo Neri
2019-02-27 16:05 ` [RFC PATCH v2 12/14] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI Ricardo Neri
2019-03-26 20:55   ` Thomas Gleixner
2019-04-09  2:02     ` Ricardo Neri
2019-04-09 11:28   ` Peter Zijlstra
2019-04-10  1:19     ` Ricardo Neri
2019-04-10  7:01       ` Peter Zijlstra
2019-04-11  1:12         ` Ricardo Neri
2019-02-27 16:05 ` [RFC PATCH v2 13/14] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter Ricardo Neri
2019-03-26 21:29   ` Thomas Gleixner
2019-04-09  2:07     ` Ricardo Neri
2019-02-27 16:05 ` [RFC PATCH v2 14/14] x86/watchdog: Add a shim hardlockup detector Ricardo Neri

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).