* [RFC PATCH v5 04/16] watchdog/hardlockup: Define a generic function to detect hardlockups
[not found] <20210504190526.22347-1-ricardo.neri-calderon@linux.intel.com>
@ 2021-05-04 19:05 ` Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 05/16] watchdog/hardlockup: Decouple the hardlockup detector from perf Ricardo Neri
` (2 subsequent siblings)
3 siblings, 0 replies; 4+ messages in thread
From: Ricardo Neri @ 2021-05-04 19:05 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
Cc: Ravi V. Shankar, Tony Luck, Ashok Raj, Peter Zijlstra (Intel),
Ricardo Neri, x86, Ricardo Neri, Nicholas Piggin, linux-kernel,
Andi Kleen, Stephane Eranian, Suravee Suthikulpanit,
H. Peter Anvin, Andi Kleen, Andrew Morton, linuxppc-dev
The procedure to detect hardlockups is independent of the underlying
mechanism that generates the non-maskable interrupt used to drive the
detector. Thus, it can be put in a separate, generic function. In this
manner, it can be invoked by various implementations of the NMI watchdog.
For this purpose, move the bulk of watchdog_overflow_callback() to the
new function inspect_for_hardlockups(). This function can then be called
from the applicable NMI handlers.
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
* None
Changes since v3:
* None
Changes since v2:
* None
Changes since v1:
* None
---
include/linux/nmi.h | 1 +
kernel/watchdog_hld.c | 18 +++++++++++-------
2 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 750c7f395ca9..1b68f48ad440 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -207,6 +207,7 @@ int proc_nmi_watchdog(struct ctl_table *, int , void *, size_t *, loff_t *);
int proc_soft_watchdog(struct ctl_table *, int , void *, size_t *, loff_t *);
int proc_watchdog_thresh(struct ctl_table *, int , void *, size_t *, loff_t *);
int proc_watchdog_cpumask(struct ctl_table *, int, void *, size_t *, loff_t *);
+void inspect_for_hardlockups(struct pt_regs *regs);
#ifdef CONFIG_HAVE_ACPI_APEI_NMI
#include <asm/nmi.h>
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..b352e507b17f 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -106,14 +106,8 @@ static struct perf_event_attr wd_hw_attr = {
.disabled = 1,
};
-/* Callback function for perf event subsystem */
-static void watchdog_overflow_callback(struct perf_event *event,
- struct perf_sample_data *data,
- struct pt_regs *regs)
+void inspect_for_hardlockups(struct pt_regs *regs)
{
- /* Ensure the watchdog never gets throttled */
- event->hw.interrupts = 0;
-
if (__this_cpu_read(watchdog_nmi_touch) == true) {
__this_cpu_write(watchdog_nmi_touch, false);
return;
@@ -163,6 +157,16 @@ static void watchdog_overflow_callback(struct perf_event *event,
return;
}
+/* Callback function for perf event subsystem */
+static void watchdog_overflow_callback(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ /* Ensure the watchdog never gets throttled */
+ event->hw.interrupts = 0;
+ inspect_for_hardlockups(regs);
+}
+
static int hardlockup_detector_event_create(void)
{
unsigned int cpu = smp_processor_id();
--
2.17.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [RFC PATCH v5 05/16] watchdog/hardlockup: Decouple the hardlockup detector from perf
[not found] <20210504190526.22347-1-ricardo.neri-calderon@linux.intel.com>
2021-05-04 19:05 ` [RFC PATCH v5 04/16] watchdog/hardlockup: Define a generic function to detect hardlockups Ricardo Neri
@ 2021-05-04 19:05 ` Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 12/16] watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog" Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 15/16] watchdog: Expose lockup_detector_reconfigure() Ricardo Neri
3 siblings, 0 replies; 4+ messages in thread
From: Ricardo Neri @ 2021-05-04 19:05 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
Cc: Ravi V. Shankar, Tony Luck, Ashok Raj, Peter Zijlstra (Intel),
Ricardo Neri, x86, Ricardo Neri, Nicholas Piggin, linux-kernel,
Andi Kleen, Stephane Eranian, Suravee Suthikulpanit,
H. Peter Anvin, Andi Kleen, Andrew Morton, linuxppc-dev
The current default implementation of the hardlockup detector assumes that
it is implemented using perf events. However, the hardlockup detector can
be driven by other sources of non-maskable interrupts (e.g., a properly
configured timer).
Group and wrap in #ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF all the code
specific to perf: create and manage perf events, stop and start the perf-
based detector.
The generic portion of the detector (monitor the timers' thresholds, check
timestamps and detect hardlockups as well as the implementation of
arch_touch_nmi_watchdog()) is now selected with the new intermediate config
symbol CONFIG_HARDLOCKUP_DETECTOR_CORE.
The perf-based implementation of the detector selects the new intermediate
symbol. Other implementations should do the same.
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
* None
Changes since v3:
* Squashed into this patch a previous patch to make
arch_touch_nmi_watchdog() part of the core detector code.
Changes since v2:
* Undid split of the generic hardlockup detector into a separate file.
(Thomas Gleixner)
* Added a new intermediate symbol CONFIG_HARDLOCKUP_DETECTOR_CORE to
select generic parts of the detector (Paul E. McKenney,
Thomas Gleixner).
Changes since v1:
* Make the generic detector code with CONFIG_HARDLOCKUP_DETECTOR.
---
include/linux/nmi.h | 5 ++++-
kernel/Makefile | 2 +-
kernel/watchdog_hld.c | 32 ++++++++++++++++++++------------
lib/Kconfig.debug | 4 ++++
4 files changed, 29 insertions(+), 14 deletions(-)
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 1b68f48ad440..cf12380e51b3 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -94,8 +94,11 @@ static inline void hardlockup_detector_disable(void) {}
# define NMI_WATCHDOG_SYSCTL_PERM 0444
#endif
-#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_CORE)
extern void arch_touch_nmi_watchdog(void);
+#endif
+
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
extern void hardlockup_detector_perf_stop(void);
extern void hardlockup_detector_perf_restart(void);
extern void hardlockup_detector_perf_disable(void);
diff --git a/kernel/Makefile b/kernel/Makefile
index e8a6715f38dc..03ac041abfff 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -95,7 +95,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
obj-$(CONFIG_KGDB) += debug/
obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
-obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_CORE) += watchdog_hld.o
obj-$(CONFIG_SECCOMP) += seccomp.o
obj-$(CONFIG_RELAY) += relay.o
obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index b352e507b17f..bb6435978c46 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -22,12 +22,8 @@
static DEFINE_PER_CPU(bool, hard_watchdog_warn);
static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
-static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
-static DEFINE_PER_CPU(struct perf_event *, dead_event);
-static struct cpumask dead_events_mask;
static unsigned long hardlockup_allcpu_dumped;
-static atomic_t watchdog_cpus = ATOMIC_INIT(0);
notrace void arch_touch_nmi_watchdog(void)
{
@@ -98,14 +94,6 @@ static inline bool watchdog_check_timestamp(void)
}
#endif
-static struct perf_event_attr wd_hw_attr = {
- .type = PERF_TYPE_HARDWARE,
- .config = PERF_COUNT_HW_CPU_CYCLES,
- .size = sizeof(struct perf_event_attr),
- .pinned = 1,
- .disabled = 1,
-};
-
void inspect_for_hardlockups(struct pt_regs *regs)
{
if (__this_cpu_read(watchdog_nmi_touch) == true) {
@@ -157,6 +145,24 @@ void inspect_for_hardlockups(struct pt_regs *regs)
return;
}
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
+#undef pr_fmt
+#define pr_fmt(fmt) "NMI perf watchdog: " fmt
+
+static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
+static DEFINE_PER_CPU(struct perf_event *, dead_event);
+static struct cpumask dead_events_mask;
+
+static atomic_t watchdog_cpus = ATOMIC_INIT(0);
+
+static struct perf_event_attr wd_hw_attr = {
+ .type = PERF_TYPE_HARDWARE,
+ .config = PERF_COUNT_HW_CPU_CYCLES,
+ .size = sizeof(struct perf_event_attr),
+ .pinned = 1,
+ .disabled = 1,
+};
+
/* Callback function for perf event subsystem */
static void watchdog_overflow_callback(struct perf_event *event,
struct perf_sample_data *data,
@@ -298,3 +304,5 @@ int __init hardlockup_detector_perf_init(void)
}
return ret;
}
+
+#endif /* CONFIG_HARDLOCKUP_DETECTOR_PERF */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 5918e1686736..08332b5ad4fe 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1021,9 +1021,13 @@ config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
default 1 if BOOTPARAM_SOFTLOCKUP_PANIC
+config HARDLOCKUP_DETECTOR_CORE
+ bool
+
config HARDLOCKUP_DETECTOR_PERF
bool
select SOFTLOCKUP_DETECTOR
+ select HARDLOCKUP_DETECTOR_CORE
#
# Enables a timestamp based low pass filter to compensate for perf based
--
2.17.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [RFC PATCH v5 12/16] watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog"
[not found] <20210504190526.22347-1-ricardo.neri-calderon@linux.intel.com>
2021-05-04 19:05 ` [RFC PATCH v5 04/16] watchdog/hardlockup: Define a generic function to detect hardlockups Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 05/16] watchdog/hardlockup: Decouple the hardlockup detector from perf Ricardo Neri
@ 2021-05-04 19:05 ` Ricardo Neri
2021-05-04 19:05 ` [RFC PATCH v5 15/16] watchdog: Expose lockup_detector_reconfigure() Ricardo Neri
3 siblings, 0 replies; 4+ messages in thread
From: Ricardo Neri @ 2021-05-04 19:05 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
Cc: Ravi V. Shankar, Tony Luck, Ashok Raj, Peter Zijlstra (Intel),
Ricardo Neri, x86, Ricardo Neri, Nicholas Piggin, linux-kernel,
Andi Kleen, Stephane Eranian, Suravee Suthikulpanit,
H. Peter Anvin, Andi Kleen, Andrew Morton, linuxppc-dev
Prepare hardlockup_panic_setup() to handle a comma-separated list of
options. Thus, it can continue parsing its own command-line options while
ignoring paremeters that are relevant only to specific implementations of
the hardlockup detector. Such implementations may use an early_param to
parse their own options.
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
* None
Changes since v3:
* None
Changes since v2:
* Introduced this patch.
Changes since v1:
* None
---
kernel/watchdog.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 107bc38b1945..4615064ee282 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -73,13 +73,13 @@ void __init hardlockup_detector_disable(void)
static int __init hardlockup_panic_setup(char *str)
{
- if (!strncmp(str, "panic", 5))
+ if (parse_option_str(str, "panic"))
hardlockup_panic = 1;
- else if (!strncmp(str, "nopanic", 7))
+ else if (parse_option_str(str, "nopanic"))
hardlockup_panic = 0;
- else if (!strncmp(str, "0", 1))
+ else if (parse_option_str(str, "0"))
nmi_watchdog_user_enabled = 0;
- else if (!strncmp(str, "1", 1))
+ else if (parse_option_str(str, "1"))
nmi_watchdog_user_enabled = 1;
return 1;
}
--
2.17.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [RFC PATCH v5 15/16] watchdog: Expose lockup_detector_reconfigure()
[not found] <20210504190526.22347-1-ricardo.neri-calderon@linux.intel.com>
` (2 preceding siblings ...)
2021-05-04 19:05 ` [RFC PATCH v5 12/16] watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog" Ricardo Neri
@ 2021-05-04 19:05 ` Ricardo Neri
3 siblings, 0 replies; 4+ messages in thread
From: Ricardo Neri @ 2021-05-04 19:05 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
Cc: Ravi V. Shankar, Tony Luck, Ashok Raj, Peter Zijlstra (Intel),
Ricardo Neri, x86, Ricardo Neri, Nicholas Piggin, linux-kernel,
Andi Kleen, Stephane Eranian, Suravee Suthikulpanit,
H. Peter Anvin, Andi Kleen, Andrew Morton, linuxppc-dev
When there are more than one implementation of the NMI watchdog, there may
be situations in which switching from one to another is needed. For
if the time-stamp counter becomes unstable, the HPET-based NMI watchdog
an no longer be used.
Switching to another hardlockup detector can be done cleanly by updating
the arch-specific stub and then reconfiguring the whole lockup detector.
Expose lockup_detector_reconfigure() to achieve this goal.
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v4:
* Switching to the perf-based lockup detector under the hood is hacky.
Instead, reconfigure the whole lockup detector.
Changes since v3:
* None
Changes since v2:
* Introduced this patch.
Changes since v1:
* N/A
---
include/linux/nmi.h | 2 ++
kernel/watchdog.c | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index cf12380e51b3..73827a477288 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -16,6 +16,7 @@ void lockup_detector_init(void);
void lockup_detector_soft_poweroff(void);
void lockup_detector_cleanup(void);
bool is_hardlockup(void);
+void lockup_detector_reconfigure(void);
extern int watchdog_user_enabled;
extern int nmi_watchdog_user_enabled;
@@ -37,6 +38,7 @@ extern int sysctl_hardlockup_all_cpu_backtrace;
static inline void lockup_detector_init(void) { }
static inline void lockup_detector_soft_poweroff(void) { }
static inline void lockup_detector_cleanup(void) { }
+static inline void lockup_detector_reconfigure(void) { }
#endif /* !CONFIG_LOCKUP_DETECTOR */
#ifdef CONFIG_SOFTLOCKUP_DETECTOR
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 4615064ee282..96f06938dc83 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -531,7 +531,7 @@ int lockup_detector_offline_cpu(unsigned int cpu)
return 0;
}
-static void lockup_detector_reconfigure(void)
+void lockup_detector_reconfigure(void)
{
cpus_read_lock();
watchdog_nmi_stop();
@@ -577,7 +577,7 @@ static __init void lockup_detector_setup(void)
}
#else /* CONFIG_SOFTLOCKUP_DETECTOR */
-static void lockup_detector_reconfigure(void)
+void lockup_detector_reconfigure(void)
{
cpus_read_lock();
watchdog_nmi_stop();
--
2.17.1
^ permalink raw reply related [flat|nested] 4+ messages in thread