* [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector
@ 2023-05-19 17:18 Douglas Anderson
  2023-05-19 17:18 ` [PATCH v5 01/18] watchdog/perf: Define dummy watchdog_update_hrtimer_threshold() on correct config Douglas Anderson
                   ` (17 more replies)
  0 siblings, 18 replies; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

This patch series adds the "buddy" hardlockup detector. In brief, the
buddy hardlockup detector can detect hardlockups without arch-level
support by having CPUs checkup on a "buddy" CPU periodically. All the
details are in the patch ("watchdog/hardlockup: detect hard lockups
using secondary (buddy) CPUs") and I encourage folks to reply to that
patch for general comments about this approach.
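
To make the idea concrete, here's a rough sketch of the core buddy
check (illustrative only, not the actual patch code; the helper
watchdog_next_cpu() is borrowed from the real patch but everything
else is simplified):

static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);

/* Each CPU's buddy is the next CPU in cpu_online_mask, wrapping. */
static unsigned int watchdog_next_cpu(unsigned int cpu)
{
	unsigned int next_cpu = cpumask_next(cpu, cpu_online_mask);

	if (next_cpu >= nr_cpu_ids)
		next_cpu = cpumask_first(cpu_online_mask);

	return next_cpu;
}

/* Called periodically from each CPU's existing watchdog hrtimer. */
static void watchdog_buddy_check_hardlockup(void)
{
	unsigned int buddy = watchdog_next_cpu(smp_processor_id());
	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, buddy));

	/*
	 * The buddy's own hrtimer increments hrtimer_interrupts. If
	 * the count hasn't moved since we last looked, interrupts on
	 * the buddy have been stuck for a full period: report it.
	 */
	if (per_cpu(hrtimer_interrupts_saved, buddy) == hrint)
		pr_emerg("Watchdog detected hard LOCKUP on cpu %u\n", buddy);

	per_cpu(hrtimer_interrupts_saved, buddy) = hrint;
}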

Note that in v3, this was a single patch [1]. It has now grown into a
much larger series, mostly to accomplish a bunch of cleanup that Petr
requested in response to v3 [2] [3]. The new series ends up sharing a
lot more code with the perf lockup detector. It also tries to bring a
little sanity to some of the naming we had.

v5 of this series attempts to resolve comments made against v4. It
also fixes a bug that I had introduced between v3 and v4 where
watchdog_hardlockup_check() was reading watchdog_hardlockup_touch from
the wrong CPU. As somewhat expected, there was debate about some of
the naming in the v4 responses. I've mostly stuck with Petr Mladek's
opinions here.

The fact that this series now touches code across the whole lockup
detector codebase also caused conflicts with the series trying to add
an arm64 perf-based hardlockup detector [4]. That was a bit
inconvenient for me since I was testing on arm64 and wanted to make
sure that the perf and buddy hardlockup detectors both compiled and
worked. Because of this, I've pulled the latest arm64 perf-based
lockup detector patches into my series. Specifically:
- Patches #1 - #3 of the arm64 perf-based lockup detector patches were
  generic cleanups. I added them early in my series. IMO these should
  just land.
- Patches #4 - #6 of the arm64 perf-based lockup detector patches were
  less generic but still caused conflicts with my series. I tacked them
  onto the end of my series after adapting them to my changes. However,
  I don't really consider them part of this series and I'd be OK if the
  series landed without them. For the use cases I'm aware of, the buddy
  system is sufficient and I see no urgent need to land the arm64 perf
  hardlockup support, though I also don't have any strong objections to
  the patches.

I will note that this patch series currently has no conflicts with the
other patch series I posted recently adding support for pseudo-NMI
based backtraces [5], and the two patch series are meant to work
together.

Given the new design of this patch series, testing all combinations is
fairly difficult. I've attempted to make sure that all combinations of
CONFIG_ options are good, but it wouldn't surprise me if I missed
something. I apologize in advance and I'll do my best to fix any
problems that are found.

I'll also note that the CC list is pretty giant. Some of it is based
on get_maintainer and some of it is people I thought might be
interested. Now that this series touches so many files, I've stopped
auto-CCing everyone that get_maintainer suggests. I'll reply to v3 and
point at this series to make sure folks are aware of it, but if I
stopped CCing you and you want back on then please yell.

As far as I can tell, there's no true MAINTAINERS entry for the
existing watchdog code. Assuming people don't hate this, maybe it
would go through Andrew Morton's tree? There is now some arch-specific
code for sparc and powerpc, but it's all still watchdog code, so
hopefully it would be fine for it all to go through the same tree.

[1] https://lore.kernel.org/r/20230501082341.v3.1.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid
[2] https://lore.kernel.org/r/ZFEqynvf5nqkzEvQ@alley
[3] https://lore.kernel.org/r/ZFErmshcrcikrSU1@alley
[4] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/
[5] https://lore.kernel.org/r/20230419225604.21204-1-dianders@chromium.org

Changes in v5:
- ("More properly prevent false ...") promoted to its own patch for v5.
- Add Nicholas's explanation of why this didn't break builds.
- Don't dump stack on the buddy CPU if we fail to backtrace the hung CPU.
- Fixed wrong __this_cpu to per_cpu (oops).
- Found a few more names / comments to change.
- Move side effect (timestamp check ordering) to its own patch.
- No longer rename touch_nmi_watchdog(), just add comments.
- Tried to make names more consistent as per v4 feedback.
- Use atomic_t for hrtimer_interrupts.
- watchdog_hardlockup_dumped_stacks => watchdog_hardlockup_all_cpu_dumped
- watchdog_hardlockup_interrupt_count() => watchdog_hardlockup_kick()
- watchdog_hardlockup_is_lockedup() => is_hardlockup()
- watchdog_hardlockup_perf.c => kernel/watchdog_hld.c in description.
- watchdog_hardlockup_processed => watchdog_hardlockup_warned
- watchdog_hardlockup_touch => watchdog_hardlockup_touched.

Changes in v4:
- ("Add a "cpu" param to watchdog_hardlockup_check()") new for v4.
- ("Add a weak function for an arch to detect ...") new for v4.
- ("Define dummy watchdog_update_hrtimer_threshold() ...") new for v4.
- ("Have the perf hardlockup use __weak ...") new for v4.
- ("Move perf hardlockup checking/panic ...") new for v4.
- ("Move perf hardlockup watchdog petting to watchdog.c") new for v4.
- ("Rename some "NMI watchdog" constants/function ...") new for v4.
- ("Rename touch_nmi_watchdog() to ...") new for v4.
- ("Rename watchdog_hld.c to watchdog_perf.c") new for v4.
- ("Style changes to watchdog_hardlockup_check ...") new for v4.
- Pulled ("Adapt the watchdog_hld interface ...") into my series for v4.
- Pulled ("Enable perf events based hard ...") into my series for v4.
- Pulled ("Ensure CPU-bound context when creating ...") into my series for v4.
- Pulled ("add hw_nmi_get_sample_period for ...") into my series for v4.
- Pulled ("change watchdog_nmi_enable() to void") into my series for v4.
- Pulled ("remove WATCHDOG_DEFAULT") into my series for v4.
- Reworked atop a whole pile of cleanups, as suggested by Petr.

Changes in v3:
- Added a note in commit message about the effect on idle.
- Cleaned up commit message pros/cons to be complete sentences.
- More cpu => CPU (in Kconfig and comments).
- No code changes other than comments.

Changes in v2:
- No code changes.
- Reworked description and Kconfig based on v1 discussion.
- cpu => CPU (in commit message).

Douglas Anderson (13):
  watchdog/perf: Define dummy watchdog_update_hrtimer_threshold() on
    correct config
  watchdog/perf: More properly prevent false positives with turbo modes
  watchdog/hardlockup: Add comments to touch_nmi_watchdog()
  watchdog/perf: Rename watchdog_hld.c to watchdog_perf.c
  watchdog/hardlockup: Move perf hardlockup checking/panic to common
    watchdog.c
  watchdog/hardlockup: Style changes to watchdog_hardlockup_check() /
    is_hardlockup()
  watchdog/hardlockup: Add a "cpu" param to watchdog_hardlockup_check()
  watchdog/hardlockup: Move perf hardlockup watchdog petting to
    watchdog.c
  watchdog/hardlockup: Rename some "NMI watchdog" constants/function
  watchdog/hardlockup: Have the perf hardlockup use __weak functions
    more cleanly
  watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs
  watchdog/perf: Add a weak function for an arch to detect if perf can
    use NMIs
  arm64: Enable perf events based hard lockup detector

Lecopzer Chen (4):
  watchdog: remove WATCHDOG_DEFAULT
  watchdog/hardlockup: change watchdog_nmi_enable() to void
  watchdog/perf: Adapt the watchdog_perf interface for async model
  arm64: add hw_nmi_get_sample_period for preparation of lockup detector

Pingfan Liu (1):
  watchdog/perf: Ensure CPU-bound context when creating hardlockup
    detector event

 arch/arm64/Kconfig                         |   2 +
 arch/arm64/kernel/Makefile                 |   1 +
 arch/arm64/kernel/watchdog_hld.c           |  36 +++
 arch/powerpc/include/asm/nmi.h             |   4 +-
 arch/powerpc/kernel/watchdog.c             |  12 +-
 arch/powerpc/platforms/pseries/mobility.c  |   4 +-
 arch/sparc/kernel/nmi.c                    |  10 +-
 drivers/perf/arm_pmu.c                     |   5 +
 drivers/perf/arm_pmuv3.c                   |  12 +-
 include/linux/nmi.h                        |  73 +++--
 include/linux/perf/arm_pmu.h               |   2 +
 kernel/Makefile                            |   3 +-
 kernel/watchdog.c                          | 344 +++++++++++++++------
 kernel/watchdog_buddy.c                    |  93 ++++++
 kernel/{watchdog_hld.c => watchdog_perf.c} | 105 +++----
 lib/Kconfig.debug                          |  52 +++-
 16 files changed, 550 insertions(+), 208 deletions(-)
 create mode 100644 arch/arm64/kernel/watchdog_hld.c
 create mode 100644 kernel/watchdog_buddy.c
 rename kernel/{watchdog_hld.c => watchdog_perf.c} (72%)

-- 
2.40.1.698.g37aff9b760-goog



* [PATCH v5 01/18] watchdog/perf: Define dummy watchdog_update_hrtimer_threshold() on correct config
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-19 17:18 ` [PATCH v5 02/18] watchdog/perf: More properly prevent false positives with turbo modes Douglas Anderson
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

The real watchdog_update_hrtimer_threshold() is defined in
kernel/watchdog_hld.c. That file is only built when
CONFIG_HARDLOCKUP_DETECTOR_PERF is enabled, and the function is only
defined there when CONFIG_HARDLOCKUP_CHECK_TIMESTAMP is also set.

The dummy version of the function in "nmi.h" didn't get that quite
right. While this doesn't appear to be a huge deal, it's nice to make
it consistent.

It doesn't break builds because CONFIG_HARDLOCKUP_CHECK_TIMESTAMP is
only selected by x86, so other architectures never see a double
definition, and x86 uses the perf lockup detector, so it gets the
out-of-line version.
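
To spell out the pairing that the fix restores, here's a sketch of the
two guards side by side (my summary, not new code):

/* kernel/watchdog_hld.c: only built when HARDLOCKUP_DETECTOR_PERF is
 * enabled, and only defines the real function under CHECK_TIMESTAMP:
 */
#ifdef CONFIG_HARDLOCKUP_CHECK_TIMESTAMP
void watchdog_update_hrtimer_threshold(u64 period) { /* real version */ }
#endif

/* include/linux/nmi.h: the dummy must therefore cover exactly the
 * complement of (DETECTOR_PERF && CHECK_TIMESTAMP):
 */
#if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && \
    defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
void watchdog_update_hrtimer_threshold(u64 period);
#else
static inline void watchdog_update_hrtimer_threshold(u64 period) { }
#endif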

Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes")
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v5:
- Add Nicholas's explanation of why this didn't break builds.
- watchdog_hardlockup_perf.c => kernel/watchdog_hld.c in description.

Changes in v4:
- ("Define dummy watchdog_update_hrtimer_threshold() ...") new for v4.

 include/linux/nmi.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 048c0b9aa623..771d77b62bc1 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -197,7 +197,7 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh);
 #endif
 
 #if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && \
-    defined(CONFIG_HARDLOCKUP_DETECTOR)
+    defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 void watchdog_update_hrtimer_threshold(u64 period);
 #else
 static inline void watchdog_update_hrtimer_threshold(u64 period) { }
-- 
2.40.1.698.g37aff9b760-goog



* [PATCH v5 02/18] watchdog/perf: More properly prevent false positives with turbo modes
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
  2023-05-19 17:18 ` [PATCH v5 01/18] watchdog/perf: Define dummy watchdog_update_hrtimer_threshold() on correct config Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-23  9:35   ` Petr Mladek
  2023-05-19 17:18 ` [PATCH v5 03/18] watchdog: remove WATCHDOG_DEFAULT Douglas Anderson
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

Currently, in watchdog_overflow_callback() we first check whether the
watchdog has been touched and _then_ handle the workaround for turbo
mode. This order should be reversed.

Specifically, "touching" the hardlockup detector's watchdog should
suppress lockup detection for one period, and that period should be
roughly the same whether or not we're running turbo. That means we
should do the extra turbo accounting _before_ we look at (and clear)
the global indicating that we've been touched.

NOTE: this fix is made based on code inspection. I am not aware of any
reports where the old code would have generated false positives. That
being said, this order seems more correct and also makes it easier
down the line to share code with the "buddy" hardlockup detector.
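
As a sketch, the fixed ordering looks like this (simplified, with the
reasoning as comments; the real change is in the diff below):

static void watchdog_overflow_callback(struct perf_event *event,
				       struct perf_sample_data *data,
				       struct pt_regs *regs)
{
	/* Ensure the watchdog never gets throttled */
	event->hw.interrupts = 0;

	/*
	 * Do the turbo-mode timestamp accounting first so that a
	 * "touch" suppresses detection for roughly the same length
	 * of time whether or not we're running turbo.
	 */
	if (!watchdog_check_timestamp())
		return;

	/* Only now look at (and clear) the touch flag. */
	if (__this_cpu_read(watchdog_nmi_touch) == true) {
		__this_cpu_write(watchdog_nmi_touch, false);
		return;
	}

	/* ... hardlockup check follows ... */
}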

Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v5:
- ("More properly prevent false ...") promoted to its own patch for v5.

 kernel/watchdog_hld.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..1e8a49dc956e 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -114,14 +114,14 @@ static void watchdog_overflow_callback(struct perf_event *event,
 	/* Ensure the watchdog never gets throttled */
 	event->hw.interrupts = 0;
 
+	if (!watchdog_check_timestamp())
+		return;
+
 	if (__this_cpu_read(watchdog_nmi_touch) == true) {
 		__this_cpu_write(watchdog_nmi_touch, false);
 		return;
 	}
 
-	if (!watchdog_check_timestamp())
-		return;
-
 	/* check for a hardlockup
 	 * This is done by making sure our timer interrupt
 	 * is incrementing.  The timer interrupt should have
-- 
2.40.1.698.g37aff9b760-goog



* [PATCH v5 03/18] watchdog: remove WATCHDOG_DEFAULT
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
  2023-05-19 17:18 ` [PATCH v5 01/18] watchdog/perf: Define dummy watchdog_update_hrtimer_threshold() on correct config Douglas Anderson
  2023-05-19 17:18 ` [PATCH v5 02/18] watchdog/perf: More properly prevent false positives with turbo modes Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-19 17:18 ` [PATCH v5 04/18] watchdog/hardlockup: change watchdog_nmi_enable() to void Douglas Anderson
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

From: Lecopzer Chen <lecopzer.chen@mediatek.com>

Nothing references WATCHDOG_DEFAULT anymore, so remove it.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
I yanked this patch from the mailing lists [1] into my series just to
make it easier to avoid conflicts between my series and the one adding
the arm64 perf hardlockup detector, in case someone wanted to test
them both together. This is a nice cleanup and could land together
with the rest of my series if that makes sense.

I changed the patch prefix to match others in my series.

[1] https://lore.kernel.org/r/20220903093415.15850-2-lecopzer.chen@mediatek.com/

(no changes since v4)

Changes in v4:
- Pulled ("remove WATCHDOG_DEFAULT") into my series for v4.

 kernel/watchdog.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 8e61f21e7e33..582d572e1379 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -30,10 +30,8 @@
 static DEFINE_MUTEX(watchdog_mutex);
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
-# define WATCHDOG_DEFAULT	(SOFT_WATCHDOG_ENABLED | NMI_WATCHDOG_ENABLED)
 # define NMI_WATCHDOG_DEFAULT	1
 #else
-# define WATCHDOG_DEFAULT	(SOFT_WATCHDOG_ENABLED)
 # define NMI_WATCHDOG_DEFAULT	0
 #endif
 
-- 
2.40.1.698.g37aff9b760-goog



* [PATCH v5 04/18] watchdog/hardlockup: change watchdog_nmi_enable() to void
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (2 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 03/18] watchdog: remove WATCHDOG_DEFAULT Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-19 17:18 ` [PATCH v5 05/18] watchdog/perf: Ensure CPU-bound context when creating hardlockup detector event Douglas Anderson
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

From: Lecopzer Chen <lecopzer.chen@mediatek.com>

Nobody cares about the return value of watchdog_nmi_enable(), so
change its prototype to void.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
I yanked this patch from the mailing lists [1] into my series just to
make it easier to avoid conflicts between my series and the one adding
the arm64 perf hardlockup detector, in case someone wanted to test
them both together. This is a nice cleanup and could land together
with the rest of my series if that makes sense.

I changed the patch prefix to match others in my series.

[1] https://lore.kernel.org/r/20220903093415.15850-3-lecopzer.chen@mediatek.com/

(no changes since v4)

Changes in v4:
- Pulled ("change watchdog_nmi_enable() to void") into my series for v4.

 arch/sparc/kernel/nmi.c | 8 +++-----
 include/linux/nmi.h     | 2 +-
 kernel/watchdog.c       | 3 +--
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 060fff95a305..5dcf31f7e81f 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -282,11 +282,11 @@ __setup("nmi_watchdog=", setup_nmi_watchdog);
  * sparc specific NMI watchdog enable function.
  * Enables watchdog if it is not enabled already.
  */
-int watchdog_nmi_enable(unsigned int cpu)
+void watchdog_nmi_enable(unsigned int cpu)
 {
 	if (atomic_read(&nmi_active) == -1) {
 		pr_warn("NMI watchdog cannot be enabled or disabled\n");
-		return -1;
+		return;
 	}
 
 	/*
@@ -295,11 +295,9 @@ int watchdog_nmi_enable(unsigned int cpu)
 	 * process first.
 	 */
 	if (!nmi_init_done)
-		return 0;
+		return;
 
 	smp_call_function_single(cpu, start_nmi_watchdog, NULL, 1);
-
-	return 0;
 }
 /*
  * sparc specific NMI watchdog disable function.
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 771d77b62bc1..454fe99c4874 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -119,7 +119,7 @@ static inline int hardlockup_detector_perf_init(void) { return 0; }
 void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
 int watchdog_nmi_probe(void);
-int watchdog_nmi_enable(unsigned int cpu);
+void watchdog_nmi_enable(unsigned int cpu);
 void watchdog_nmi_disable(unsigned int cpu);
 
 void lockup_detector_reconfigure(void);
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 582d572e1379..c705a18b26bf 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -93,10 +93,9 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
  * softlockup watchdog start and stop. The arch must select the
  * SOFTLOCKUP_DETECTOR Kconfig.
  */
-int __weak watchdog_nmi_enable(unsigned int cpu)
+void __weak watchdog_nmi_enable(unsigned int cpu)
 {
 	hardlockup_detector_perf_enable();
-	return 0;
 }
 
 void __weak watchdog_nmi_disable(unsigned int cpu)
-- 
2.40.1.698.g37aff9b760-goog



* [PATCH v5 05/18] watchdog/perf: Ensure CPU-bound context when creating hardlockup detector event
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (3 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 04/18] watchdog/hardlockup: change watchdog_nmi_enable() to void Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-19 17:18 ` [PATCH v5 06/18] watchdog/hardlockup: Add comments to touch_nmi_watchdog() Douglas Anderson
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

From: Pingfan Liu <kernelfans@gmail.com>

hardlockup_detector_event_create() should create the perf_event on the
current CPU. Preemption can't simply be disabled around the call
because perf_event_create_kernel_counter() allocates memory and may
sleep. Instead, CPU locality is achieved by running the code in a
per-CPU bound kthread.

Add a check to catch mistakes if the code is ever called from another
code path.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Co-developed-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
I yanked this patch from the mailing lists [1] into my series just to
make it easier to avoid conflicts between my series and the one adding
the arm64 perf hardlockup detector, in case someone wanted to test
them both together. This is a nice cleanup and could land together
with the rest of my series if that makes sense.

I changed the patch prefix to match others in my series.

[1] https://lore.kernel.org/r/20220903093415.15850-4-lecopzer.chen@mediatek.com/

(no changes since v4)

Changes in v4:
- Pulled ("Ensure CPU-bound context when creating ...") into my series for v4.

 kernel/watchdog_hld.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 1e8a49dc956e..2125b09e09d7 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -165,10 +165,16 @@ static void watchdog_overflow_callback(struct perf_event *event,
 
 static int hardlockup_detector_event_create(void)
 {
-	unsigned int cpu = smp_processor_id();
+	unsigned int cpu;
 	struct perf_event_attr *wd_attr;
 	struct perf_event *evt;
 
+	/*
+	 * Preemption is not disabled because memory will be allocated.
+	 * Ensure CPU-locality by calling this in per-CPU kthread.
+	 */
+	WARN_ON(!is_percpu_thread());
+	cpu = raw_smp_processor_id();
 	wd_attr = &wd_hw_attr;
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
 
-- 
2.40.1.698.g37aff9b760-goog



* [PATCH v5 06/18] watchdog/hardlockup: Add comments to touch_nmi_watchdog()
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (4 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 05/18] watchdog/perf: Ensure CPU-bound context when creating hardlockup detector event Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-23  9:58   ` Petr Mladek
  2023-05-19 17:18 ` [PATCH v5 07/18] watchdog/perf: Rename watchdog_hld.c to watchdog_perf.c Douglas Anderson
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

In preparation for the buddy hardlockup detector, add comments to
touch_nmi_watchdog() to make it obvious that it touches the configured
hardlockup detector regardless of whether it's backed by an NMI. Also
note that, despite its name, arch_touch_nmi_watchdog() may not be
implemented by architecture-specific code.
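
For context, a typical caller looks something like this (illustrative
only; the dump function names are made up):

/*
 * Hypothetical example: a slow dump loop that runs with interrupts
 * disabled for a long time pets the watchdog on each iteration so
 * the hardlockup detector doesn't fire.
 */
static void dump_all_records(unsigned int nr_records)
{
	unsigned int i;

	for (i = 0; i < nr_records; i++) {
		dump_one_record(i);	/* hypothetical helper */
		touch_nmi_watchdog();
	}
}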

Ideally, we'd like to rename these functions but that is a fairly
disruptive change touching a lot of drivers. After discussion [1] the
plan is to defer this until a good time.

[1] https://lore.kernel.org/r/ZFy0TX1tfhlH8gxj@alley

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v5:
- No longer rename touch_nmi_watchdog(), just add comments.

Changes in v4:
- ("Rename touch_nmi_watchdog() to ...") new for v4.

 include/linux/nmi.h | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 454fe99c4874..fafab128f37e 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -125,15 +125,30 @@ void watchdog_nmi_disable(unsigned int cpu);
 void lockup_detector_reconfigure(void);
 
 /**
- * touch_nmi_watchdog - restart NMI watchdog timeout.
+ * touch_nmi_watchdog - manually pet the hardlockup watchdog.
  *
- * If the architecture supports the NMI watchdog, touch_nmi_watchdog()
- * may be used to reset the timeout - for code which intentionally
- * disables interrupts for a long time. This call is stateless.
+ * If we support detecting hardlockups, touch_nmi_watchdog() may be
+ * used to pet the watchdog (reset the timeout) - for code which
+ * intentionally disables interrupts for a long time. This call is stateless.
+ *
+ * Though this function has "nmi" in the name, the hardlockup watchdog might
+ * not be backed by NMIs. This function will likely be renamed to
+ * touch_hardlockup_watchdog() in the future.
  */
 static inline void touch_nmi_watchdog(void)
 {
+	/*
+	 * Pass on to the hardlockup detector selected via CONFIG_. Note that
+	 * the hardlockup detector may not be arch-specific nor using NMIs
+	 * and the arch_touch_nmi_watchdog() function will likely be renamed
+	 * in the future.
+	 */
 	arch_touch_nmi_watchdog();
+
+	/*
+	 * Touching the hardlockup detector implicitly pets the
+	 * softlockup detector too
+	 */
 	touch_softlockup_watchdog();
 }
 
-- 
2.40.1.698.g37aff9b760-goog



* [PATCH v5 07/18] watchdog/perf: Rename watchdog_hld.c to watchdog_perf.c
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (5 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 06/18] watchdog/hardlockup: Add comments to touch_nmi_watchdog() Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-19 17:18 ` [PATCH v5 08/18] watchdog/hardlockup: Move perf hardlockup checking/panic to common watchdog.c Douglas Anderson
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

The code currently in "watchdog_hld.c" is for detecting hardlockups
using perf, as evidenced by the line in the Makefile that only
compiles this file if CONFIG_HARDLOCKUP_DETECTOR_PERF is
defined. Rename the file to prepare for the buddy hardlockup detector,
which doesn't use perf.

It could be argued that the new name makes it less obvious that this
is a hardlockup detector. While true, it's not hard to remember that
the "perf" detector is always a hardlockup detector and it's nice not
to have names that are too convoluted.

Acked-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

(no changes since v4)

Changes in v4:
- ("Rename watchdog_hld.c to watchdog_perf.c") new for v4.

 kernel/Makefile                            | 2 +-
 kernel/{watchdog_hld.c => watchdog_perf.c} | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
 rename kernel/{watchdog_hld.c => watchdog_perf.c} (99%)

diff --git a/kernel/Makefile b/kernel/Makefile
index b69c95315480..7eb72033143c 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -91,7 +91,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
 obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
-obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_perf.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_perf.c
similarity index 99%
rename from kernel/watchdog_hld.c
rename to kernel/watchdog_perf.c
index 2125b09e09d7..8b8015758ea5 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_perf.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Detect hard lockups on a system
+ * Detect hard lockups on a system using perf
  *
  * started by Don Zickus, Copyright (C) 2010 Red Hat, Inc.
  *
-- 
2.40.1.698.g37aff9b760-goog



* [PATCH v5 08/18] watchdog/hardlockup: Move perf hardlockup checking/panic to common watchdog.c
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (6 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 07/18] watchdog/perf: Rename watchdog_hld.c to watchdog_perf.c Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-23 11:45   ` Petr Mladek
  2023-05-19 17:18 ` [PATCH v5 09/18] watchdog/hardlockup: Style changes to watchdog_hardlockup_check() / is_hardlockup() Douglas Anderson
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

The perf hardlockup detector works by looking at interrupt counts and
seeing if they change from run to run. The interrupt counts are
managed by the common watchdog code via its watchdog_timer_fn().

Currently the API between the perf detector and the common code is a
function: is_hardlockup(). When the hard lockup detector sees that
function return true then it handles printing out debug info and
inducing a panic if necessary.

Let's change the API a little bit in preparation for the buddy
hardlockup detector. The buddy hardlockup detector wants to print
nearly the same debug info and have nearly the same panic
behavior. That means we want to move all that code to the common
file. For now, the code in the common file will only be there if the
perf hardlockup detector is enabled, but eventually it will be
selected by a common config.

Right now, this _just_ moves the code from the perf detector file to
the common file and changes the names. It doesn't make the changes
that the buddy hardlockup detector will need and doesn't do any style
cleanups. A future patch will do cleanup to make it more obvious what
changed.

With the above, we no longer have any callers of is_hardlockup()
outside of the "watchdog.c" file, so we can remove it from the header,
make it static, and move it to the same "#ifdef" block as our new
watchdog_hardlockup_check(). While doing this, note that the existing
code compiled the counting/checking of "hrtimer_interrupts" even when
no hardlockup detector was configured. We don't need that, so move all
the "hrtimer_interrupts" counting to only be present when the perf
hardlockup detector is configured as well.

This change is expected to be a no-op.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v5:
- watchdog_hardlockup_interrupt_count() => watchdog_hardlockup_kick()
- watchdog_hardlockup_is_lockedup() => is_hardlockup()

Changes in v4:
- ("Move perf hardlockup checking/panic ...") new for v4.

 include/linux/nmi.h    |  5 ++-
 kernel/watchdog.c      | 93 +++++++++++++++++++++++++++++++++---------
 kernel/watchdog_perf.c | 42 +------------------
 3 files changed, 78 insertions(+), 62 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index fafab128f37e..0c62c1bf0a71 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -15,7 +15,6 @@
 void lockup_detector_init(void);
 void lockup_detector_soft_poweroff(void);
 void lockup_detector_cleanup(void);
-bool is_hardlockup(void);
 
 extern int watchdog_user_enabled;
 extern int nmi_watchdog_user_enabled;
@@ -88,6 +87,10 @@ extern unsigned int hardlockup_panic;
 static inline void hardlockup_detector_disable(void) {}
 #endif
 
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+void watchdog_hardlockup_check(struct pt_regs *regs);
+#endif
+
 #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
 # define NMI_WATCHDOG_SYSCTL_PERM	0644
 #else
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index c705a18b26bf..12ce37d76e7d 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -85,6 +85,78 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
 
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+
+static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
+static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
+static DEFINE_PER_CPU(bool, hard_watchdog_warn);
+static unsigned long hardlockup_allcpu_dumped;
+
+static bool is_hardlockup(void)
+{
+	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
+
+	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
+		return true;
+
+	__this_cpu_write(hrtimer_interrupts_saved, hrint);
+	return false;
+}
+
+static void watchdog_hardlockup_kick(void)
+{
+	__this_cpu_inc(hrtimer_interrupts);
+}
+
+void watchdog_hardlockup_check(struct pt_regs *regs)
+{
+	/* check for a hardlockup
+	 * This is done by making sure our timer interrupt
+	 * is incrementing.  The timer interrupt should have
+	 * fired multiple times before we overflow'd.  If it hasn't
+	 * then this is a good indication the cpu is stuck
+	 */
+	if (is_hardlockup()) {
+		int this_cpu = smp_processor_id();
+
+		/* only print hardlockups once */
+		if (__this_cpu_read(hard_watchdog_warn) == true)
+			return;
+
+		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n",
+			 this_cpu);
+		print_modules();
+		print_irqtrace_events(current);
+		if (regs)
+			show_regs(regs);
+		else
+			dump_stack();
+
+		/*
+		 * Perform all-CPU dump only once to avoid multiple hardlockups
+		 * generating interleaving traces
+		 */
+		if (sysctl_hardlockup_all_cpu_backtrace &&
+				!test_and_set_bit(0, &hardlockup_allcpu_dumped))
+			trigger_allbutself_cpu_backtrace();
+
+		if (hardlockup_panic)
+			nmi_panic(regs, "Hard LOCKUP");
+
+		__this_cpu_write(hard_watchdog_warn, true);
+		return;
+	}
+
+	__this_cpu_write(hard_watchdog_warn, false);
+	return;
+}
+
+#else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */
+
+static inline void watchdog_hardlockup_kick(void) { }
+
+#endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */
+
 /*
  * These functions can be overridden if an architecture implements its
  * own hardlockup detector.
@@ -176,8 +248,6 @@ static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
 static DEFINE_PER_CPU(unsigned long, watchdog_report_ts);
 static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
 static DEFINE_PER_CPU(bool, softlockup_touch_sync);
-static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
-static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
 static unsigned long soft_lockup_nmi_warn;
 
 static int __init nowatchdog_setup(char *str)
@@ -312,22 +382,6 @@ static int is_softlockup(unsigned long touch_ts,
 }
 
 /* watchdog detector functions */
-bool is_hardlockup(void)
-{
-	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
-
-	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
-		return true;
-
-	__this_cpu_write(hrtimer_interrupts_saved, hrint);
-	return false;
-}
-
-static void watchdog_interrupt_count(void)
-{
-	__this_cpu_inc(hrtimer_interrupts);
-}
-
 static DEFINE_PER_CPU(struct completion, softlockup_completion);
 static DEFINE_PER_CPU(struct cpu_stop_work, softlockup_stop_work);
 
@@ -358,8 +412,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	if (!watchdog_enabled)
 		return HRTIMER_NORESTART;
 
-	/* kick the hardlockup detector */
-	watchdog_interrupt_count();
+	watchdog_hardlockup_kick();
 
 	/* kick the softlockup detector */
 	if (completion_done(this_cpu_ptr(&softlockup_completion))) {
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index 8b8015758ea5..04415812d079 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -20,13 +20,11 @@
 #include <asm/irq_regs.h>
 #include <linux/perf_event.h>
 
-static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
 static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 
-static unsigned long hardlockup_allcpu_dumped;
 static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
 notrace void arch_touch_nmi_watchdog(void)
@@ -122,45 +120,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
 		return;
 	}
 
-	/* check for a hardlockup
-	 * This is done by making sure our timer interrupt
-	 * is incrementing.  The timer interrupt should have
-	 * fired multiple times before we overflow'd.  If it hasn't
-	 * then this is a good indication the cpu is stuck
-	 */
-	if (is_hardlockup()) {
-		int this_cpu = smp_processor_id();
-
-		/* only print hardlockups once */
-		if (__this_cpu_read(hard_watchdog_warn) == true)
-			return;
-
-		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n",
-			 this_cpu);
-		print_modules();
-		print_irqtrace_events(current);
-		if (regs)
-			show_regs(regs);
-		else
-			dump_stack();
-
-		/*
-		 * Perform all-CPU dump only once to avoid multiple hardlockups
-		 * generating interleaving traces
-		 */
-		if (sysctl_hardlockup_all_cpu_backtrace &&
-				!test_and_set_bit(0, &hardlockup_allcpu_dumped))
-			trigger_allbutself_cpu_backtrace();
-
-		if (hardlockup_panic)
-			nmi_panic(regs, "Hard LOCKUP");
-
-		__this_cpu_write(hard_watchdog_warn, true);
-		return;
-	}
-
-	__this_cpu_write(hard_watchdog_warn, false);
-	return;
+	watchdog_hardlockup_check(regs);
 }
 
 static int hardlockup_detector_event_create(void)
-- 
2.40.1.698.g37aff9b760-goog



* [PATCH v5 09/18] watchdog/hardlockup: Style changes to watchdog_hardlockup_check() / is_hardlockup()
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (7 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 08/18] watchdog/hardlockup: Move perf hardlockup checking/panic to common watchdog.c Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-19 17:18 ` [PATCH v5 10/18] watchdog/hardlockup: Add a "cpu" param to watchdog_hardlockup_check() Douglas Anderson
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

These are tiny style changes:
- Add a blank line before a "return".
- Rename two globals to use the "watchdog_hardlockup" prefix.
- Store the processor id in an "unsigned int" rather than an "int".
- Minor comment rewording.
- Use "else" rather than extra returns since it seemed more symmetric.

Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v5:
- watchdog_hardlockup_dumped_stacks => watchdog_hardlockup_all_cpu_dumped
- watchdog_hardlockup_processed => watchdog_hardlockup_warned

Changes in v4:
- ("Style changes to watchdog_hardlockup_check ...") new for v4.

 kernel/watchdog.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 12ce37d76e7d..169e5dffbc00 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -89,8 +89,8 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
 
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
-static DEFINE_PER_CPU(bool, hard_watchdog_warn);
-static unsigned long hardlockup_allcpu_dumped;
+static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
+static unsigned long watchdog_hardlockup_all_cpu_dumped;
 
 static bool is_hardlockup(void)
 {
@@ -100,6 +100,7 @@ static bool is_hardlockup(void)
 		return true;
 
 	__this_cpu_write(hrtimer_interrupts_saved, hrint);
+
 	return false;
 }
 
@@ -110,21 +111,20 @@ static void watchdog_hardlockup_kick(void)
 
 void watchdog_hardlockup_check(struct pt_regs *regs)
 {
-	/* check for a hardlockup
-	 * This is done by making sure our timer interrupt
-	 * is incrementing.  The timer interrupt should have
-	 * fired multiple times before we overflow'd.  If it hasn't
+	/*
+	 * Check for a hardlockup by making sure the CPU's timer
+	 * interrupt is incrementing. The timer interrupt should have
+	 * fired multiple times before we overflow'd. If it hasn't
 	 * then this is a good indication the cpu is stuck
 	 */
 	if (is_hardlockup()) {
-		int this_cpu = smp_processor_id();
+		unsigned int this_cpu = smp_processor_id();
 
-		/* only print hardlockups once */
-		if (__this_cpu_read(hard_watchdog_warn) == true)
+		/* Only print hardlockups once. */
+		if (__this_cpu_read(watchdog_hardlockup_warned))
 			return;
 
-		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n",
-			 this_cpu);
+		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", this_cpu);
 		print_modules();
 		print_irqtrace_events(current);
 		if (regs)
@@ -137,18 +137,16 @@ void watchdog_hardlockup_check(struct pt_regs *regs)
 		 * generating interleaving traces
 		 */
 		if (sysctl_hardlockup_all_cpu_backtrace &&
-				!test_and_set_bit(0, &hardlockup_allcpu_dumped))
+		    !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped))
 			trigger_allbutself_cpu_backtrace();
 
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");
 
-		__this_cpu_write(hard_watchdog_warn, true);
-		return;
+		__this_cpu_write(watchdog_hardlockup_warned, true);
+	} else {
+		__this_cpu_write(watchdog_hardlockup_warned, false);
 	}
-
-	__this_cpu_write(hard_watchdog_warn, false);
-	return;
 }
 
 #else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */
-- 
2.40.1.698.g37aff9b760-goog



* [PATCH v5 10/18] watchdog/hardlockup: Add a "cpu" param to watchdog_hardlockup_check()
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (8 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 09/18] watchdog/hardlockup: Style changes to watchdog_hardlockup_check() / is_hardlockup() Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-23 16:02   ` Petr Mladek
  2023-05-19 17:18 ` [PATCH v5 11/18] watchdog/hardlockup: Move perf hardlockup watchdog petting to watchdog.c Douglas Anderson
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

In preparation for the buddy hardlockup detector where the CPU
checking for lockup might not be the currently running CPU, add a
"cpu" parameter to watchdog_hardlockup_check().

As part of this change, make hrtimer_interrupts an atomic_t since now
the CPU incrementing the value and the CPU reading the value might be
different. Technically this could also be done with just READ_ONCE and
WRITE_ONCE, but atomic_t feels a little cleaner in this case.

While hrtimer_interrupts is made atomic_t, we change
hrtimer_interrupts_saved from "unsigned long" to "int". The "int" is
needed to match the data type backing atomic_t for hrtimer_interrupts.
Even where this moves us from 64 bits to 32 bits (as it does on 64-bit
platforms), it doesn't really matter. All we ever do is increment the
value every few seconds and compare it to an old value, so 32 bits is
fine (even 16 bits would be). The "signed" vs. "unsigned" distinction
also doesn't matter for simple equality comparisons.

hrtimer_interrupts_saved is _not_ switched to atomic_t nor even
accessed with READ_ONCE / WRITE_ONCE, because a given instance of
hrtimer_interrupts_saved is always consistently accessed by the same
CPU. NOTE: with the upcoming "buddy" detector there is one special
case. When a CPU goes offline/online, the CPU that consistently
accesses a given instance of hrtimer_interrupts_saved can change. We
still can't end up with a partially updated hrtimer_interrupts_saved,
however, because we end up petting all affected CPUs to make sure the
new and old CPU can't somehow read/write hrtimer_interrupts_saved at
the same time.
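
To spell out the cross-CPU pattern (my summary of the change below):

	/* On the CPU being watched, from its own hrtimer: */
	atomic_inc(raw_cpu_ptr(&hrtimer_interrupts));

	/* On the checking CPU, which may be a different CPU: */
	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));

The atomic_t makes that increment/read pair safe without any extra
locking; hrtimer_interrupts_saved needs nothing special because only
one CPU ever touches a given instance at a time.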

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v5:
- Don't dump stack on the buddy CPU if we fail to backtrace the hung CPU.
- Use atomic_t for hrtimer_interrupts.

Changes in v4:
- ("Add a "cpu" param to watchdog_hardlockup_check()") new for v4.

 include/linux/nmi.h    |  2 +-
 kernel/watchdog.c      | 52 ++++++++++++++++++++++++++----------------
 kernel/watchdog_perf.c |  2 +-
 3 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 0c62c1bf0a71..92aa568c0c42 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -88,7 +88,7 @@ static inline void hardlockup_detector_disable(void) {}
 #endif
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
-void watchdog_hardlockup_check(struct pt_regs *regs);
+void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
 #endif
 
 #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 169e5dffbc00..2552e224f76a 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -87,29 +87,34 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 
-static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
-static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
+static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
+static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
 static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
 static unsigned long watchdog_hardlockup_all_cpu_dumped;
 
-static bool is_hardlockup(void)
+static bool is_hardlockup(unsigned int cpu)
 {
-	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
+	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
 
-	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
+	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
 		return true;
 
-	__this_cpu_write(hrtimer_interrupts_saved, hrint);
+	/*
+	 * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
+	 * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
+	 * written/read by a single CPU.
+	 */
+	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
 
 	return false;
 }
 
 static void watchdog_hardlockup_kick(void)
 {
-	__this_cpu_inc(hrtimer_interrupts);
+	atomic_inc(raw_cpu_ptr(&hrtimer_interrupts));
 }
 
-void watchdog_hardlockup_check(struct pt_regs *regs)
+void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
 	/*
 	 * Check for a hardlockup by making sure the CPU's timer
@@ -117,35 +122,42 @@ void watchdog_hardlockup_check(struct pt_regs *regs)
 	 * fired multiple times before we overflow'd. If it hasn't
 	 * then this is a good indication the cpu is stuck
 	 */
-	if (is_hardlockup()) {
+	if (is_hardlockup(cpu)) {
 		unsigned int this_cpu = smp_processor_id();
+		struct cpumask backtrace_mask = *cpu_online_mask;
 
 		/* Only print hardlockups once. */
-		if (__this_cpu_read(watchdog_hardlockup_warned))
+		if (per_cpu(watchdog_hardlockup_warned, cpu))
 			return;
 
-		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", this_cpu);
+		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", cpu);
 		print_modules();
 		print_irqtrace_events(current);
-		if (regs)
-			show_regs(regs);
-		else
-			dump_stack();
+		if (cpu == this_cpu) {
+			if (regs)
+				show_regs(regs);
+			else
+				dump_stack();
+			cpumask_clear_cpu(cpu, &backtrace_mask);
+		} else {
+			if (trigger_single_cpu_backtrace(cpu))
+				cpumask_clear_cpu(cpu, &backtrace_mask);
+		}
 
 		/*
-		 * Perform all-CPU dump only once to avoid multiple hardlockups
-		 * generating interleaving traces
+		 * Perform multi-CPU dump only once to avoid multiple
+		 * hardlockups generating interleaving traces
 		 */
 		if (sysctl_hardlockup_all_cpu_backtrace &&
 		    !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped))
-			trigger_allbutself_cpu_backtrace();
+			trigger_cpumask_backtrace(&backtrace_mask);
 
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");
 
-		__this_cpu_write(watchdog_hardlockup_warned, true);
+		per_cpu(watchdog_hardlockup_warned, cpu) = true;
 	} else {
-		__this_cpu_write(watchdog_hardlockup_warned, false);
+		per_cpu(watchdog_hardlockup_warned, cpu) = false;
 	}
 }
 
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index 04415812d079..4e60e8023515 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -120,7 +120,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
 		return;
 	}
 
-	watchdog_hardlockup_check(regs);
+	watchdog_hardlockup_check(smp_processor_id(), regs);
 }
 
 static int hardlockup_detector_event_create(void)
-- 
2.40.1.698.g37aff9b760-goog



* [PATCH v5 11/18] watchdog/hardlockup: Move perf hardlockup watchdog petting to watchdog.c
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (9 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 10/18] watchdog/hardlockup: Add a "cpu" param to watchdog_hardlockup_check() Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-24 13:07   ` Petr Mladek
  2023-05-19 17:18 ` [PATCH v5 12/18] watchdog/hardlockup: Rename some "NMI watchdog" constants/function Douglas Anderson
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

In preparation for the buddy hardlockup detector, which wants the same
petting logic as the current perf hardlockup detector, move the code
to watchdog.c. While doing this, rename the global variable to match
others nearby. As part of this change we also have to account for the
fact that the CPU we're running on might be different from the one
we're checking.

Currently the code in watchdog.c is guarded by
CONFIG_HARDLOCKUP_DETECTOR_PERF, which makes this change seem
silly. However, a future patch will change this.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v5:
- Fixed wrong __this_cpu to per_cpu (oops).
- Move side effect (timestamp check ordering) to its own patch.
- watchdog_hardlockup_touch => watchdog_hardlockup_touched.

Changes in v4:
- ("Move perf hardlockup watchdog petting to watchdog.c") new for v4.

 include/linux/nmi.h    |  5 +++--
 kernel/watchdog.c      | 19 +++++++++++++++++++
 kernel/watchdog_perf.c | 19 -------------------
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 92aa568c0c42..e286a2a1902d 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -88,7 +88,10 @@ static inline void hardlockup_detector_disable(void) {}
 #endif
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+void arch_touch_nmi_watchdog(void);
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
+#elif !defined(CONFIG_HAVE_NMI_WATCHDOG)
+static inline void arch_touch_nmi_watchdog(void) { }
 #endif
 
 #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
@@ -98,7 +101,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
 #endif
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
-extern void arch_touch_nmi_watchdog(void);
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
 extern void hardlockup_detector_perf_disable(void);
@@ -113,7 +115,6 @@ static inline void hardlockup_detector_perf_enable(void) { }
 static inline void hardlockup_detector_perf_cleanup(void) { }
 # if !defined(CONFIG_HAVE_NMI_WATCHDOG)
 static inline int hardlockup_detector_perf_init(void) { return -ENODEV; }
-static inline void arch_touch_nmi_watchdog(void) {}
 # else
 static inline int hardlockup_detector_perf_init(void) { return 0; }
 # endif
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 2552e224f76a..64d7d2a0a7df 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -90,8 +90,22 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
 static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
 static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
 static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
+static DEFINE_PER_CPU(bool, watchdog_hardlockup_touched);
 static unsigned long watchdog_hardlockup_all_cpu_dumped;
 
+notrace void arch_touch_nmi_watchdog(void)
+{
+	/*
+	 * Using __raw here because some code paths have
+	 * preemption enabled.  If preemption is enabled
+	 * then interrupts should be enabled too, in which
+	 * case we shouldn't have to worry about the watchdog
+	 * going off.
+	 */
+	raw_cpu_write(watchdog_hardlockup_touched, true);
+}
+EXPORT_SYMBOL(arch_touch_nmi_watchdog);
+
 static bool is_hardlockup(unsigned int cpu)
 {
 	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
@@ -116,6 +130,11 @@ static void watchdog_hardlockup_kick(void)
 
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
+	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
+		per_cpu(watchdog_hardlockup_touched, cpu) = false;
+		return;
+	}
+
 	/*
 	 * Check for a hardlockup by making sure the CPU's timer
 	 * interrupt is incrementing. The timer interrupt should have
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index 4e60e8023515..547917ebd5d3 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -20,26 +20,12 @@
 #include <asm/irq_regs.h>
 #include <linux/perf_event.h>
 
-static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
 static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 
 static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
-notrace void arch_touch_nmi_watchdog(void)
-{
-	/*
-	 * Using __raw here because some code paths have
-	 * preemption enabled.  If preemption is enabled
-	 * then interrupts should be enabled too, in which
-	 * case we shouldn't have to worry about the watchdog
-	 * going off.
-	 */
-	raw_cpu_write(watchdog_nmi_touch, true);
-}
-EXPORT_SYMBOL(arch_touch_nmi_watchdog);
-
 #ifdef CONFIG_HARDLOCKUP_CHECK_TIMESTAMP
 static DEFINE_PER_CPU(ktime_t, last_timestamp);
 static DEFINE_PER_CPU(unsigned int, nmi_rearmed);
@@ -115,11 +101,6 @@ static void watchdog_overflow_callback(struct perf_event *event,
 	if (!watchdog_check_timestamp())
 		return;
 
-	if (__this_cpu_read(watchdog_nmi_touch) == true) {
-		__this_cpu_write(watchdog_nmi_touch, false);
-		return;
-	}
-
 	watchdog_hardlockup_check(smp_processor_id(), regs);
 }
 
-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 12/18] watchdog/hardlockup: Rename some "NMI watchdog" constants/function
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (10 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 11/18] watchdog/hardlockup: Move perf hardlockup watchdog petting to watchdog.c Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-24 13:38   ` Petr Mladek
  2023-05-19 17:18 ` [PATCH v5 13/18] watchdog/hardlockup: Have the perf hardlockup use __weak functions more cleanly Douglas Anderson
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

Do a search and replace of:
- NMI_WATCHDOG_ENABLED => WATCHDOG_HARDLOCKUP_ENABLED
- SOFT_WATCHDOG_ENABLED => WATCHDOG_SOFTOCKUP_ENABLED
- watchdog_nmi_ => watchdog_hardlockup_
- nmi_watchdog_available => watchdog_hardlockup_available
- nmi_watchdog_user_enabled => watchdog_hardlockup_user_enabled
- soft_watchdog_user_enabled => watchdog_softlockup_user_enabled
- NMI_WATCHDOG_DEFAULT => WATCHDOG_HARDLOCKUP_DEFAULT

Then update a few comments near where names were changed.

This is specifically to make it less confusing when we introduce the
buddy hardlockup detector, which doesn't use NMIs. As part of this, we
sanitized a few names for consistency.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v5:
- Found a few more names / comments to change.
- Tried to make names more consistent as per v4 feedback.

Changes in v4:
- ("Rename some "NMI watchdog" constants/function ...") new for v4.

 arch/powerpc/include/asm/nmi.h            |   4 +-
 arch/powerpc/kernel/watchdog.c            |  12 +--
 arch/powerpc/platforms/pseries/mobility.c |   4 +-
 arch/sparc/kernel/nmi.c                   |   4 +-
 include/linux/nmi.h                       |  18 ++--
 kernel/watchdog.c                         | 113 +++++++++++-----------
 kernel/watchdog_perf.c                    |   2 +-
 7 files changed, 79 insertions(+), 78 deletions(-)

diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
index c3c7adef74de..43bfd4de868f 100644
--- a/arch/powerpc/include/asm/nmi.h
+++ b/arch/powerpc/include/asm/nmi.h
@@ -5,10 +5,10 @@
 #ifdef CONFIG_PPC_WATCHDOG
 extern void arch_touch_nmi_watchdog(void);
 long soft_nmi_interrupt(struct pt_regs *regs);
-void watchdog_nmi_set_timeout_pct(u64 pct);
+void watchdog_hardlockup_set_timeout_pct(u64 pct);
 #else
 static inline void arch_touch_nmi_watchdog(void) {}
-static inline void watchdog_nmi_set_timeout_pct(u64 pct) {}
+static inline void watchdog_hardlockup_set_timeout_pct(u64 pct) {}
 #endif
 
 #ifdef CONFIG_NMI_IPI
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index dbcc4a793f0b..edb2dd1f53eb 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -438,7 +438,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 {
 	int cpu = smp_processor_id();
 
-	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+	if (!(watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED))
 		return HRTIMER_NORESTART;
 
 	if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
@@ -479,7 +479,7 @@ static void start_watchdog(void *arg)
 		return;
 	}
 
-	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+	if (!(watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED))
 		return;
 
 	if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
@@ -546,7 +546,7 @@ static void watchdog_calc_timeouts(void)
 	wd_timer_period_ms = watchdog_thresh * 1000 * 2 / 5;
 }
 
-void watchdog_nmi_stop(void)
+void watchdog_hardlockup_stop(void)
 {
 	int cpu;
 
@@ -554,7 +554,7 @@ void watchdog_nmi_stop(void)
 		stop_watchdog_on_cpu(cpu);
 }
 
-void watchdog_nmi_start(void)
+void watchdog_hardlockup_start(void)
 {
 	int cpu;
 
@@ -566,7 +566,7 @@ void watchdog_nmi_start(void)
 /*
  * Invoked from core watchdog init.
  */
-int __init watchdog_nmi_probe(void)
+int __init watchdog_hardlockup_probe(void)
 {
 	int err;
 
@@ -582,7 +582,7 @@ int __init watchdog_nmi_probe(void)
 }
 
 #ifdef CONFIG_PPC_PSERIES
-void watchdog_nmi_set_timeout_pct(u64 pct)
+void watchdog_hardlockup_set_timeout_pct(u64 pct)
 {
 	pr_info("Set the NMI watchdog timeout factor to %llu%%\n", pct);
 	WRITE_ONCE(wd_timeout_pct, pct);
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 6f30113b5468..cd632ba9ebff 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -750,7 +750,7 @@ static int pseries_migrate_partition(u64 handle)
 		goto out;
 
 	if (factor)
-		watchdog_nmi_set_timeout_pct(factor);
+		watchdog_hardlockup_set_timeout_pct(factor);
 
 	ret = pseries_suspend(handle);
 	if (ret == 0) {
@@ -766,7 +766,7 @@ static int pseries_migrate_partition(u64 handle)
 		pseries_cancel_migration(handle, ret);
 
 	if (factor)
-		watchdog_nmi_set_timeout_pct(0);
+		watchdog_hardlockup_set_timeout_pct(0);
 
 out:
 	vas_migration_handler(VAS_RESUME);
diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 5dcf31f7e81f..9d9e29b75c43 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -282,7 +282,7 @@ __setup("nmi_watchdog=", setup_nmi_watchdog);
  * sparc specific NMI watchdog enable function.
  * Enables watchdog if it is not enabled already.
  */
-void watchdog_nmi_enable(unsigned int cpu)
+void watchdog_hardlockup_enable(unsigned int cpu)
 {
 	if (atomic_read(&nmi_active) == -1) {
 		pr_warn("NMI watchdog cannot be enabled or disabled\n");
@@ -303,7 +303,7 @@ void watchdog_nmi_enable(unsigned int cpu)
  * sparc specific NMI watchdog disable function.
  * Disables watchdog if it is not disabled already.
  */
-void watchdog_nmi_disable(unsigned int cpu)
+void watchdog_hardlockup_disable(unsigned int cpu)
 {
 	if (atomic_read(&nmi_active) == -1)
 		pr_warn_once("NMI watchdog cannot be enabled or disabled\n");
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index e286a2a1902d..83076bf70ce8 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -75,10 +75,10 @@ static inline void reset_hung_task_detector(void) { }
  * handled differently because its value is not boolean, and the lockup
  * detectors are 'suspended' while 'watchdog_thresh' is equal zero.
  */
-#define NMI_WATCHDOG_ENABLED_BIT   0
-#define SOFT_WATCHDOG_ENABLED_BIT  1
-#define NMI_WATCHDOG_ENABLED      (1 << NMI_WATCHDOG_ENABLED_BIT)
-#define SOFT_WATCHDOG_ENABLED     (1 << SOFT_WATCHDOG_ENABLED_BIT)
+#define WATCHDOG_HARDLOCKUP_ENABLED_BIT  0
+#define WATCHDOG_SOFTOCKUP_ENABLED_BIT   1
+#define WATCHDOG_HARDLOCKUP_ENABLED     (1 << WATCHDOG_HARDLOCKUP_ENABLED_BIT)
+#define WATCHDOG_SOFTOCKUP_ENABLED      (1 << WATCHDOG_SOFTOCKUP_ENABLED_BIT)
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void hardlockup_detector_disable(void);
@@ -120,11 +120,11 @@ static inline int hardlockup_detector_perf_init(void) { return 0; }
 # endif
 #endif
 
-void watchdog_nmi_stop(void);
-void watchdog_nmi_start(void);
-int watchdog_nmi_probe(void);
-void watchdog_nmi_enable(unsigned int cpu);
-void watchdog_nmi_disable(unsigned int cpu);
+void watchdog_hardlockup_stop(void);
+void watchdog_hardlockup_start(void);
+int watchdog_hardlockup_probe(void);
+void watchdog_hardlockup_enable(unsigned int cpu);
+void watchdog_hardlockup_disable(unsigned int cpu);
 
 void lockup_detector_reconfigure(void);
 
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 64d7d2a0a7df..31548c0ae874 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -30,17 +30,17 @@
 static DEFINE_MUTEX(watchdog_mutex);
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
-# define NMI_WATCHDOG_DEFAULT	1
+# define WATCHDOG_HARDLOCKUP_DEFAULT	1
 #else
-# define NMI_WATCHDOG_DEFAULT	0
+# define WATCHDOG_HARDLOCKUP_DEFAULT	0
 #endif
 
 unsigned long __read_mostly watchdog_enabled;
 int __read_mostly watchdog_user_enabled = 1;
-int __read_mostly nmi_watchdog_user_enabled = NMI_WATCHDOG_DEFAULT;
-int __read_mostly soft_watchdog_user_enabled = 1;
+int __read_mostly watchdog_hardlockup_user_enabled = WATCHDOG_HARDLOCKUP_DEFAULT;
+int __read_mostly watchdog_softlockup_user_enabled = 1;
 int __read_mostly watchdog_thresh = 10;
-static int __read_mostly nmi_watchdog_available;
+static int __read_mostly watchdog_hardlockup_available;
 
 struct cpumask watchdog_cpumask __read_mostly;
 unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
@@ -66,7 +66,7 @@ unsigned int __read_mostly hardlockup_panic =
  */
 void __init hardlockup_detector_disable(void)
 {
-	nmi_watchdog_user_enabled = 0;
+	watchdog_hardlockup_user_enabled = 0;
 }
 
 static int __init hardlockup_panic_setup(char *str)
@@ -76,9 +76,9 @@ static int __init hardlockup_panic_setup(char *str)
 	else if (!strncmp(str, "nopanic", 7))
 		hardlockup_panic = 0;
 	else if (!strncmp(str, "0", 1))
-		nmi_watchdog_user_enabled = 0;
+		watchdog_hardlockup_user_enabled = 0;
 	else if (!strncmp(str, "1", 1))
-		nmi_watchdog_user_enabled = 1;
+		watchdog_hardlockup_user_enabled = 1;
 	return 1;
 }
 __setup("nmi_watchdog=", hardlockup_panic_setup);
@@ -190,40 +190,40 @@ static inline void watchdog_hardlockup_kick(void) { }
  * These functions can be overridden if an architecture implements its
  * own hardlockup detector.
  *
- * watchdog_nmi_enable/disable can be implemented to start and stop when
+ * watchdog_hardlockup_enable/disable can be implemented to start and stop when
  * softlockup watchdog start and stop. The arch must select the
  * SOFTLOCKUP_DETECTOR Kconfig.
  */
-void __weak watchdog_nmi_enable(unsigned int cpu)
+void __weak watchdog_hardlockup_enable(unsigned int cpu)
 {
 	hardlockup_detector_perf_enable();
 }
 
-void __weak watchdog_nmi_disable(unsigned int cpu)
+void __weak watchdog_hardlockup_disable(unsigned int cpu)
 {
 	hardlockup_detector_perf_disable();
 }
 
-/* Return 0, if a NMI watchdog is available. Error code otherwise */
-int __weak __init watchdog_nmi_probe(void)
+/* Return 0, if a hardlockup watchdog is available. Error code otherwise */
+int __weak __init watchdog_hardlockup_probe(void)
 {
 	return hardlockup_detector_perf_init();
 }
 
 /**
- * watchdog_nmi_stop - Stop the watchdog for reconfiguration
+ * watchdog_hardlockup_stop - Stop the watchdog for reconfiguration
  *
  * The reconfiguration steps are:
- * watchdog_nmi_stop();
+ * watchdog_hardlockup_stop();
  * update_variables();
- * watchdog_nmi_start();
+ * watchdog_hardlockup_start();
  */
-void __weak watchdog_nmi_stop(void) { }
+void __weak watchdog_hardlockup_stop(void) { }
 
 /**
- * watchdog_nmi_start - Start the watchdog after reconfiguration
+ * watchdog_hardlockup_start - Start the watchdog after reconfiguration
  *
- * Counterpart to watchdog_nmi_stop().
+ * Counterpart to watchdog_hardlockup_stop().
  *
  * The following variables have been updated in update_variables() and
  * contain the currently valid configuration:
@@ -231,23 +231,23 @@ void __weak watchdog_nmi_stop(void) { }
  * - watchdog_thresh
  * - watchdog_cpumask
  */
-void __weak watchdog_nmi_start(void) { }
+void __weak watchdog_hardlockup_start(void) { }
 
 /**
  * lockup_detector_update_enable - Update the sysctl enable bit
  *
- * Caller needs to make sure that the NMI/perf watchdogs are off, so this
- * can't race with watchdog_nmi_disable().
+ * Caller needs to make sure that the hard watchdogs are off, so this
+ * can't race with watchdog_hardlockup_disable().
  */
 static void lockup_detector_update_enable(void)
 {
 	watchdog_enabled = 0;
 	if (!watchdog_user_enabled)
 		return;
-	if (nmi_watchdog_available && nmi_watchdog_user_enabled)
-		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
-	if (soft_watchdog_user_enabled)
-		watchdog_enabled |= SOFT_WATCHDOG_ENABLED;
+	if (watchdog_hardlockup_available && watchdog_hardlockup_user_enabled)
+		watchdog_enabled |= WATCHDOG_HARDLOCKUP_ENABLED;
+	if (watchdog_softlockup_user_enabled)
+		watchdog_enabled |= WATCHDOG_SOFTOCKUP_ENABLED;
 }
 
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
@@ -288,7 +288,7 @@ __setup("nowatchdog", nowatchdog_setup);
 
 static int __init nosoftlockup_setup(char *str)
 {
-	soft_watchdog_user_enabled = 0;
+	watchdog_softlockup_user_enabled = 0;
 	return 1;
 }
 __setup("nosoftlockup", nosoftlockup_setup);
@@ -402,7 +402,7 @@ static int is_softlockup(unsigned long touch_ts,
 			 unsigned long period_ts,
 			 unsigned long now)
 {
-	if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh){
+	if ((watchdog_enabled & WATCHDOG_SOFTOCKUP_ENABLED) && watchdog_thresh) {
 		/* Warn about unreasonable delays. */
 		if (time_after(now, period_ts + get_softlockup_thresh()))
 			return now - touch_ts;
@@ -537,7 +537,7 @@ static void watchdog_enable(unsigned int cpu)
 	complete(done);
 
 	/*
-	 * Start the timer first to prevent the NMI watchdog triggering
+	 * Start the timer first to prevent the hardlockup watchdog triggering
 	 * before the timer has a chance to fire.
 	 */
 	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
@@ -547,9 +547,9 @@ static void watchdog_enable(unsigned int cpu)
 
 	/* Initialize timestamp */
 	update_touch_ts();
-	/* Enable the perf event */
-	if (watchdog_enabled & NMI_WATCHDOG_ENABLED)
-		watchdog_nmi_enable(cpu);
+	/* Enable the hardlockup detector */
+	if (watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED)
+		watchdog_hardlockup_enable(cpu);
 }
 
 static void watchdog_disable(unsigned int cpu)
@@ -559,11 +559,11 @@ static void watchdog_disable(unsigned int cpu)
 	WARN_ON_ONCE(cpu != smp_processor_id());
 
 	/*
-	 * Disable the perf event first. That prevents that a large delay
-	 * between disabling the timer and disabling the perf event causes
-	 * the perf NMI to detect a false positive.
+	 * Disable the hardlockup detector first. That prevents that a large
+	 * delay between disabling the timer and disabling the hardlockup
+	 * detector causes a false positive.
 	 */
-	watchdog_nmi_disable(cpu);
+	watchdog_hardlockup_disable(cpu);
 	hrtimer_cancel(hrtimer);
 	wait_for_completion(this_cpu_ptr(&softlockup_completion));
 }
@@ -619,7 +619,7 @@ int lockup_detector_offline_cpu(unsigned int cpu)
 static void __lockup_detector_reconfigure(void)
 {
 	cpus_read_lock();
-	watchdog_nmi_stop();
+	watchdog_hardlockup_stop();
 
 	softlockup_stop_all();
 	set_sample_period();
@@ -627,7 +627,7 @@ static void __lockup_detector_reconfigure(void)
 	if (watchdog_enabled && watchdog_thresh)
 		softlockup_start_all();
 
-	watchdog_nmi_start();
+	watchdog_hardlockup_start();
 	cpus_read_unlock();
 	/*
 	 * Must be called outside the cpus locked section to prevent
@@ -668,9 +668,9 @@ static __init void lockup_detector_setup(void)
 static void __lockup_detector_reconfigure(void)
 {
 	cpus_read_lock();
-	watchdog_nmi_stop();
+	watchdog_hardlockup_stop();
 	lockup_detector_update_enable();
-	watchdog_nmi_start();
+	watchdog_hardlockup_start();
 	cpus_read_unlock();
 }
 void lockup_detector_reconfigure(void)
@@ -725,14 +725,14 @@ static void proc_watchdog_update(void)
 /*
  * common function for watchdog, nmi_watchdog and soft_watchdog parameter
  *
- * caller             | table->data points to      | 'which'
- * -------------------|----------------------------|--------------------------
- * proc_watchdog      | watchdog_user_enabled      | NMI_WATCHDOG_ENABLED |
- *                    |                            | SOFT_WATCHDOG_ENABLED
- * -------------------|----------------------------|--------------------------
- * proc_nmi_watchdog  | nmi_watchdog_user_enabled  | NMI_WATCHDOG_ENABLED
- * -------------------|----------------------------|--------------------------
- * proc_soft_watchdog | soft_watchdog_user_enabled | SOFT_WATCHDOG_ENABLED
+ * caller             | table->data points to            | 'which'
+ * -------------------|----------------------------------|-------------------------------
+ * proc_watchdog      | watchdog_user_enabled            | WATCHDOG_HARDLOCKUP_ENABLED |
+ *                    |                                  | WATCHDOG_SOFTOCKUP_ENABLED
+ * -------------------|----------------------------------|-------------------------------
+ * proc_nmi_watchdog  | watchdog_hardlockup_user_enabled | WATCHDOG_HARDLOCKUP_ENABLED
+ * -------------------|----------------------------------|-------------------------------
+ * proc_soft_watchdog | watchdog_softlockup_user_enabled | WATCHDOG_SOFTOCKUP_ENABLED
  */
 static int proc_watchdog_common(int which, struct ctl_table *table, int write,
 				void *buffer, size_t *lenp, loff_t *ppos)
@@ -764,7 +764,8 @@ static int proc_watchdog_common(int which, struct ctl_table *table, int write,
 int proc_watchdog(struct ctl_table *table, int write,
 		  void *buffer, size_t *lenp, loff_t *ppos)
 {
-	return proc_watchdog_common(NMI_WATCHDOG_ENABLED|SOFT_WATCHDOG_ENABLED,
+	return proc_watchdog_common(WATCHDOG_HARDLOCKUP_ENABLED |
+				    WATCHDOG_SOFTOCKUP_ENABLED,
 				    table, write, buffer, lenp, ppos);
 }
 
@@ -774,9 +775,9 @@ int proc_watchdog(struct ctl_table *table, int write,
 int proc_nmi_watchdog(struct ctl_table *table, int write,
 		      void *buffer, size_t *lenp, loff_t *ppos)
 {
-	if (!nmi_watchdog_available && write)
+	if (!watchdog_hardlockup_available && write)
 		return -ENOTSUPP;
-	return proc_watchdog_common(NMI_WATCHDOG_ENABLED,
+	return proc_watchdog_common(WATCHDOG_HARDLOCKUP_ENABLED,
 				    table, write, buffer, lenp, ppos);
 }
 
@@ -786,7 +787,7 @@ int proc_nmi_watchdog(struct ctl_table *table, int write,
 int proc_soft_watchdog(struct ctl_table *table, int write,
 			void *buffer, size_t *lenp, loff_t *ppos)
 {
-	return proc_watchdog_common(SOFT_WATCHDOG_ENABLED,
+	return proc_watchdog_common(WATCHDOG_SOFTOCKUP_ENABLED,
 				    table, write, buffer, lenp, ppos);
 }
 
@@ -854,7 +855,7 @@ static struct ctl_table watchdog_sysctls[] = {
 	},
 	{
 		.procname       = "nmi_watchdog",
-		.data		= &nmi_watchdog_user_enabled,
+		.data		= &watchdog_hardlockup_user_enabled,
 		.maxlen		= sizeof(int),
 		.mode		= NMI_WATCHDOG_SYSCTL_PERM,
 		.proc_handler   = proc_nmi_watchdog,
@@ -871,7 +872,7 @@ static struct ctl_table watchdog_sysctls[] = {
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
 	{
 		.procname       = "soft_watchdog",
-		.data		= &soft_watchdog_user_enabled,
+		.data		= &watchdog_softlockup_user_enabled,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler   = proc_soft_watchdog,
@@ -940,8 +941,8 @@ void __init lockup_detector_init(void)
 	cpumask_copy(&watchdog_cpumask,
 		     housekeeping_cpumask(HK_TYPE_TIMER));
 
-	if (!watchdog_nmi_probe())
-		nmi_watchdog_available = true;
+	if (!watchdog_hardlockup_probe())
+		watchdog_hardlockup_available = true;
 	lockup_detector_setup();
 	watchdog_sysctl_init();
 }
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index 547917ebd5d3..9e6042a892b3 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -215,7 +215,7 @@ void __init hardlockup_detector_perf_restart(void)
 
 	lockdep_assert_cpus_held();
 
-	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+	if (!(watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED))
 		return;
 
 	for_each_online_cpu(cpu) {
-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 13/18] watchdog/hardlockup: Have the perf hardlockup use __weak functions more cleanly
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (11 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 12/18] watchdog/hardlockup: Rename some "NMI watchdog" constants/function Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-24 13:59   ` Petr Mladek
  2023-05-19 17:18 ` [PATCH v5 14/18] watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs Douglas Anderson
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

The fact that watchdog_hardlockup_enable(),
watchdog_hardlockup_disable(), and watchdog_hardlockup_probe() are
declared __weak means that the configured hardlockup detector can
define non-weak versions of those functions if it needs to. Instead of
doing this, the perf hardlockup detector hooked itself into the
default __weak implementation, which was a bit awkward. Clean this up.

From the comments, it looks as if the original design was done because
the __weak functions were expected to be implemented by the architecture
and not by the configured hardlockup detector. This got awkward when we
tried to add the buddy lockup detector which was not arch-specific but
wanted to hook into those same functions.

This is not expected to have any functional impact.
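
For reference, a minimal sketch of the __weak pattern being used here
(illustrative only, not the code from this patch): the linker picks a
strong definition when one exists, otherwise the __weak default wins.

/* watchdog.c: default, used when no hardlockup detector overrides it. */
int __weak __init watchdog_hardlockup_probe(void)
{
	return -ENODEV;
}

/* watchdog_perf.c: a strong definition that replaces the __weak one
 * at link time when the perf detector is built in.
 */
int __init watchdog_hardlockup_probe(void)
{
	return hardlockup_detector_event_create() ? -ENODEV : 0;
}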

Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

(no changes since v4)

Changes in v4:
- ("Have the perf hardlockup use __weak ...") new for v4.

 include/linux/nmi.h    | 10 ----------
 kernel/watchdog.c      | 30 ++++++++++++++++++------------
 kernel/watchdog_perf.c | 20 ++++++++++++++------
 3 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 83076bf70ce8..c216e8a1be1f 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -103,21 +103,11 @@ static inline void arch_touch_nmi_watchdog(void) { }
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
-extern void hardlockup_detector_perf_disable(void);
-extern void hardlockup_detector_perf_enable(void);
 extern void hardlockup_detector_perf_cleanup(void);
-extern int hardlockup_detector_perf_init(void);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
-static inline void hardlockup_detector_perf_disable(void) { }
-static inline void hardlockup_detector_perf_enable(void) { }
 static inline void hardlockup_detector_perf_cleanup(void) { }
-# if !defined(CONFIG_HAVE_NMI_WATCHDOG)
-static inline int hardlockup_detector_perf_init(void) { return -ENODEV; }
-# else
-static inline int hardlockup_detector_perf_init(void) { return 0; }
-# endif
 #endif
 
 void watchdog_hardlockup_stop(void);
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 31548c0ae874..08ce046f636d 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -187,27 +187,33 @@ static inline void watchdog_hardlockup_kick(void) { }
 #endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */
 
 /*
- * These functions can be overridden if an architecture implements its
- * own hardlockup detector.
+ * These functions can be overridden based on the configured hardlockup detector.
  *
  * watchdog_hardlockup_enable/disable can be implemented to start and stop when
- * softlockup watchdog start and stop. The arch must select the
+ * softlockup watchdog start and stop. The detector must select the
  * SOFTLOCKUP_DETECTOR Kconfig.
  */
-void __weak watchdog_hardlockup_enable(unsigned int cpu)
-{
-	hardlockup_detector_perf_enable();
-}
+void __weak watchdog_hardlockup_enable(unsigned int cpu) { }
 
-void __weak watchdog_hardlockup_disable(unsigned int cpu)
-{
-	hardlockup_detector_perf_disable();
-}
+void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
 
 /* Return 0, if a hardlockup watchdog is available. Error code otherwise */
 int __weak __init watchdog_hardlockup_probe(void)
 {
-	return hardlockup_detector_perf_init();
+	/*
+	 * If CONFIG_HAVE_NMI_WATCHDOG is defined then an architecture
+	 * is assumed to have the hard watchdog available and we return 0.
+	 */
+	if (IS_ENABLED(CONFIG_HAVE_NMI_WATCHDOG))
+		return 0;
+
+	/*
+	 * Hardlockup detectors other than those using CONFIG_HAVE_NMI_WATCHDOG
+	 * are required to implement a non-weak version of this probe function
+	 * to tell whether they are available. If they don't override then
+	 * we'll return -ENODEV.
+	 */
+	return -ENODEV;
 }
 
 /**
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index 9e6042a892b3..349fcd4d2abc 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -132,10 +132,14 @@ static int hardlockup_detector_event_create(void)
 }
 
 /**
- * hardlockup_detector_perf_enable - Enable the local event
+ * watchdog_hardlockup_enable - Enable the local event
+ *
+ * @cpu: The CPU to enable hardlockup detection on.
  */
-void hardlockup_detector_perf_enable(void)
+void watchdog_hardlockup_enable(unsigned int cpu)
 {
+	WARN_ON_ONCE(cpu != smp_processor_id());
+
 	if (hardlockup_detector_event_create())
 		return;
 
@@ -147,12 +151,16 @@ void hardlockup_detector_perf_enable(void)
 }
 
 /**
- * hardlockup_detector_perf_disable - Disable the local event
+ * watchdog_hardlockup_disable - Disable the local event
+ *
+ * @cpu: The CPU to disable hardlockup detection on.
  */
-void hardlockup_detector_perf_disable(void)
+void watchdog_hardlockup_disable(unsigned int cpu)
 {
 	struct perf_event *event = this_cpu_read(watchdog_ev);
 
+	WARN_ON_ONCE(cpu != smp_processor_id());
+
 	if (event) {
 		perf_event_disable(event);
 		this_cpu_write(watchdog_ev, NULL);
@@ -227,9 +235,9 @@ void __init hardlockup_detector_perf_restart(void)
 }
 
 /**
- * hardlockup_detector_perf_init - Probe whether NMI event is available at all
+ * watchdog_hardlockup_probe - Probe whether NMI event is available at all
  */
-int __init hardlockup_detector_perf_init(void)
+int __init watchdog_hardlockup_probe(void)
 {
 	int ret = hardlockup_detector_event_create();
 
-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 14/18] watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (12 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 13/18] watchdog/hardlockup: Have the perf hardlockup use __weak functions more cleanly Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-25 16:26   ` Petr Mladek
  2023-05-19 17:18 ` [PATCH v5 15/18] watchdog/perf: Add a weak function for an arch to detect if perf can use NMIs Douglas Anderson
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson, Colin Cross

Implement a hardlockup detector that doesn't need any extra
arch-specific support code to detect lockups. Instead of using
something arch-specific we will use the buddy system, where each CPU
watches out for another one. Specifically, each CPU will use its
softlockup hrtimer to check that the next CPU is processing hrtimer
interrupts by verifying that a counter is increasing.
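
In other words (a simplified sketch of the idea, not the code from this
patch; buddy_is_hardlocked() is a made-up name), each CPU compares its
buddy's hrtimer-interrupt count with the value it saw last time:

static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);

/* Called from this CPU's hrtimer; 'buddy' is the next online CPU. */
static bool buddy_is_hardlocked(unsigned int buddy)
{
	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, buddy));

	/* No hrtimer ticks on the buddy since our last check? */
	if (per_cpu(hrtimer_interrupts_saved, buddy) == hrint)
		return true;

	per_cpu(hrtimer_interrupts_saved, buddy) = hrint;
	return false;
}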

NOTE: unlike the other hard lockup detectors, the buddy one can't
easily show what's happening on the CPU that locked up just by doing a
simple backtrace. It relies on some other mechanism in the system to
get information about the locked up CPUs. This could be support for
NMI backtraces like [1], it could be a mechanism for printing the PC
of locked CPUs at panic time like [2] / [3], or it could be something
else. Even though that means we still rely on arch-specific code, this
arch-specific code seems to often be implemented even on architectures
that don't have a hardlockup detector.

This style of hardlockup detector originated in some downstream
Android trees and has been rebased on / carried in ChromeOS trees for
quite a long time for use on arm and arm64 boards. Historically on
these boards we've leveraged mechanism [2] / [3] to get information
about hung CPUs, but we could move to [1].

Although the original motivation for the buddy system was for use on
systems without an arch-specific hardlockup detector, it can still be
useful to use even on systems that _do_ have an arch-specific
hardlockup detector. On x86, for instance, there is a 24-part patch
series [4] in progress switching the arch-specific hard lockup
detector from a scarce perf counter to a less-scarce hardware
resource. Potentially the buddy system could be a simpler alternative
to free up the perf counter but still get hard lockup detection.

Overall, pros (+) and cons (-) of the buddy system compared to an
arch-specific hardlockup detector (which might be implemented using
perf):
+ The buddy system is usable on systems that don't have an
  arch-specific hardlockup detector, like arm32 and arm64 (though it's
  being worked on for arm64 [5]).
+ The buddy system may free up scarce hardware resources.
+ If a CPU totally goes out to lunch (can't process NMIs) the buddy
  system could still detect the problem (though it would be unlikely
  to be able to get a stack trace).
+ The buddy system uses the same timer function to pet the hardlockup
  detector on the running CPU as it uses to detect hardlockups on
  other CPUs. Compared to other hardlockup detectors, this means it
  generates fewer interrupts and thus is likely better able to let
  CPUs stay idle longer.
- If all CPUs are hard locked up at the same time the buddy system
  can't detect it.
- If we don't have SMP we can't use the buddy system.
- The buddy system needs an arch-specific mechanism (possibly NMI
  backtrace) to get info about the locked up CPU.

[1] https://lore.kernel.org/r/20230419225604.21204-1-dianders@chromium.org
[2] https://issuetracker.google.com/172213129
[3] https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html
[4] https://lore.kernel.org/lkml/20230301234753.28582-1-ricardo.neri-calderon@linux.intel.com/
[5] https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/

Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Tzung-Bi Shih <tzungbi@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
This patch has been rebased in ChromeOS kernel trees many times, and
each time someone had to do work on it they added their
Signed-off-by. I've included those here. I've also left the author as
Colin Cross since the core code is still his, even if it's now been
reorganized a lot.

(no changes since v4)

Changes in v4:
- Reworked atop a whole pile of cleanups, as suggested by Petr.

Changes in v3:
- Added a note in commit message about the effect on idle.
- Cleaned up commit message pros/cons to be complete sentences.
- More cpu => CPU (in Kconfig and comments).
- No code changes other than comments.

Changes in v2:
- No code changes.
- Reworked description and Kconfig based on v1 discussion.
- cpu => CPU (in commit message).

 include/linux/nmi.h     |  9 +++-
 kernel/Makefile         |  1 +
 kernel/watchdog.c       | 29 +++++++++----
 kernel/watchdog_buddy.c | 93 +++++++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug       | 52 +++++++++++++++++++++--
 5 files changed, 173 insertions(+), 11 deletions(-)
 create mode 100644 kernel/watchdog_buddy.c

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index c216e8a1be1f..47db14e7da52 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -87,8 +87,9 @@ extern unsigned int hardlockup_panic;
 static inline void hardlockup_detector_disable(void) {}
 #endif
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER)
 void arch_touch_nmi_watchdog(void);
+void watchdog_hardlockup_touch_cpu(unsigned int cpu);
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
 #elif !defined(CONFIG_HAVE_NMI_WATCHDOG)
 static inline void arch_touch_nmi_watchdog(void) { }
@@ -118,6 +119,12 @@ void watchdog_hardlockup_disable(unsigned int cpu);
 
 void lockup_detector_reconfigure(void);
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY
+void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts);
+#else
+static inline void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts) {}
+#endif
+
 /**
  * touch_nmi_watchdog - manually pet the hardlockup watchdog.
  *
diff --git a/kernel/Makefile b/kernel/Makefile
index 7eb72033143c..f9e3fd9195d9 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -91,6 +91,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
 obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_BUDDY) += watchdog_buddy.o
 obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_perf.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 08ce046f636d..4110f7ca23a5 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -85,7 +85,7 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
 
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER)
 
 static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
 static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
@@ -106,6 +106,14 @@ notrace void arch_touch_nmi_watchdog(void)
 }
 EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 
+void watchdog_hardlockup_touch_cpu(unsigned int cpu)
+{
+	per_cpu(watchdog_hardlockup_touched, cpu) = true;
+
+	/* Match with smp_rmb() in watchdog_hardlockup_check() */
+	smp_wmb();
+}
+
 static bool is_hardlockup(unsigned int cpu)
 {
 	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
@@ -123,13 +131,16 @@ static bool is_hardlockup(unsigned int cpu)
 	return false;
 }
 
-static void watchdog_hardlockup_kick(void)
+static unsigned long watchdog_hardlockup_kick(void)
 {
-	atomic_inc(raw_cpu_ptr(&hrtimer_interrupts));
+	return atomic_inc_return(raw_cpu_ptr(&hrtimer_interrupts));
 }
 
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
+	/* Match with smp_wmb() in watchdog_hardlockup_touch_cpu() */
+	smp_rmb();
+
 	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
 		per_cpu(watchdog_hardlockup_touched, cpu) = false;
 		return;
@@ -180,11 +191,11 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 	}
 }
 
-#else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */
+#else /* CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */
 
-static inline void watchdog_hardlockup_kick(void) { }
+static inline unsigned long watchdog_hardlockup_kick(void) { return 0; }
 
-#endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */
+#endif /* !CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */
 
 /*
 * These functions can be overridden based on the configured hardlockup detector.
@@ -443,11 +454,15 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	struct pt_regs *regs = get_irq_regs();
 	int duration;
 	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
+	unsigned long hrtimer_interrupts;
 
 	if (!watchdog_enabled)
 		return HRTIMER_NORESTART;
 
-	watchdog_hardlockup_kick();
+	hrtimer_interrupts = watchdog_hardlockup_kick();
+
+	/* test for hardlockups */
+	watchdog_buddy_check_hardlockup(hrtimer_interrupts);
 
 	/* kick the softlockup detector */
 	if (completion_done(this_cpu_ptr(&softlockup_completion))) {
diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
new file mode 100644
index 000000000000..fee45af2e5bd
--- /dev/null
+++ b/kernel/watchdog_buddy.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/kernel.h>
+#include <linux/nmi.h>
+#include <linux/percpu-defs.h>
+
+static cpumask_t __read_mostly watchdog_cpus;
+
+static unsigned int watchdog_next_cpu(unsigned int cpu)
+{
+	cpumask_t cpus = watchdog_cpus;
+	unsigned int next_cpu;
+
+	next_cpu = cpumask_next(cpu, &cpus);
+	if (next_cpu >= nr_cpu_ids)
+		next_cpu = cpumask_first(&cpus);
+
+	if (next_cpu == cpu)
+		return nr_cpu_ids;
+
+	return next_cpu;
+}
+
+int __init watchdog_hardlockup_probe(void)
+{
+	return 0;
+}
+
+void watchdog_hardlockup_enable(unsigned int cpu)
+{
+	unsigned int next_cpu;
+
+	/*
+	 * The new CPU will be marked online before the hrtimer interrupt
+	 * gets a chance to run on it. If another CPU tests for a
+	 * hardlockup on the new CPU before it has run its hrtimer
+	 * interrupt, it will get a false positive. Touch the watchdog on
+	 * the new CPU to delay the check for at least 3 sampling periods
+	 * to guarantee one hrtimer has run on the new CPU.
+	 */
+	watchdog_hardlockup_touch_cpu(cpu);
+
+	/*
+	 * We are going to check the next CPU. Our watchdog_hrtimer
+	 * need not be zero if the CPU has already been online earlier.
+	 * Touch the watchdog on the next CPU to avoid false positive
+	 * if we try to check it in less than 3 interrupts.
+	 */
+	next_cpu = watchdog_next_cpu(cpu);
+	if (next_cpu < nr_cpu_ids)
+		watchdog_hardlockup_touch_cpu(next_cpu);
+
+	cpumask_set_cpu(cpu, &watchdog_cpus);
+}
+
+void watchdog_hardlockup_disable(unsigned int cpu)
+{
+	unsigned int next_cpu = watchdog_next_cpu(cpu);
+
+	/*
+	 * Offlining this CPU will cause the CPU before this one to start
+	 * checking the one after this one. If this CPU just finished checking
+	 * the next CPU and updating hrtimer_interrupts_saved, and then the
+	 * previous CPU checks it within one sample period, it will trigger a
+	 * false positive. Touch the watchdog on the next CPU to prevent it.
+	 */
+	if (next_cpu < nr_cpu_ids)
+		watchdog_hardlockup_touch_cpu(next_cpu);
+
+	cpumask_clear_cpu(cpu, &watchdog_cpus);
+}
+
+void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts)
+{
+	unsigned int next_cpu;
+
+	/*
+	 * Test for hardlockups every 3 samples. The sample period is
+	 *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
+	 *  watchdog_thresh (over by 20%).
+	 */
+	if (hrtimer_interrupts % 3 != 0)
+		return;
+
+	/* check for a hardlockup on the next CPU */
+	next_cpu = watchdog_next_cpu(smp_processor_id());
+	if (next_cpu >= nr_cpu_ids)
+		return;
+
+	watchdog_hardlockup_check(next_cpu, NULL);
+}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ce51d4dc6803..abcad0513a32 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1035,10 +1035,55 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
 
 	  Say N if unsure.
 
-config HARDLOCKUP_DETECTOR_PERF
+# Both the "perf" and "buddy" hardlockup detectors count hrtimer
+# interrupts. This config enables functions managing this common code.
+config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
 	bool
 	select SOFTLOCKUP_DETECTOR
 
+config HARDLOCKUP_DETECTOR_PERF
+	bool
+	depends on HAVE_HARDLOCKUP_DETECTOR_PERF
+	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
+
+config HARDLOCKUP_DETECTOR_BUDDY
+	bool
+	depends on SMP
+	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
+
+# For hardlockup detectors you can have one directly provided by the arch
+# or use a "non-arch" one. If you're using a "non-arch" one that is
+# further divided the perf hardlockup detector (which, confusingly, needs
+# arch-provided perf support) and the buddy hardlockup detector (which just
+# needs SMP). In either case, using the "non-arch" code conflicts with
+# the NMI watchdog code (which is sometimes used directly and sometimes used
+# by the arch-provided hardlockup detector).
+config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
+	bool
+	depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG
+	default y
+
+config HARDLOCKUP_DETECTOR_PREFER_BUDDY
+	bool "Prefer the buddy CPU hardlockup detector"
+	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP
+	help
+	  Say Y here to prefer the buddy hardlockup detector over the perf one.
+
+	  With the buddy detector, each CPU uses its softlockup hrtimer
+	  to check that the next CPU is processing hrtimer interrupts by
+	  verifying that a counter is increasing.
+
+	  This hardlockup detector is useful on systems that don't have
+	  an arch-specific hardlockup detector or if resources needed
+	  for the hardlockup detector are better used for other things.
+
+# This will select the appropriate non-arch hardlockup detector
+config HARDLOCKUP_DETECTOR_NON_ARCH
+	bool
+	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
+	select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
+	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
+
 #
 # Enables a timestamp based low pass filter to compensate for perf based
 # hard lockup detection which runs too fast due to turbo modes.
@@ -1053,9 +1098,10 @@ config HARDLOCKUP_CHECK_TIMESTAMP
 config HARDLOCKUP_DETECTOR
 	bool "Detect Hard Lockups"
 	depends on DEBUG_KERNEL && !S390
-	depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
+	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
 	select LOCKUP_DETECTOR
-	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
+	select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
+
 	help
 	  Say Y here to enable the kernel to act as a watchdog to detect
 	  hard lockups.
-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 15/18] watchdog/perf: Add a weak function for an arch to detect if perf can use NMIs
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (13 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 14/18] watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-26 12:36   ` Petr Mladek
  2023-06-12 10:33   ` Mark Rutland
  2023-05-19 17:18 ` [PATCH v5 16/18] watchdog/perf: Adapt the watchdog_perf interface for async model Douglas Anderson
                   ` (2 subsequent siblings)
  17 siblings, 2 replies; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

On arm64, NMI support needs to be detected at runtime. Add a weak
function to the perf hardlockup detector so that an architecture can
implement it to detect whether NMIs are available.
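
For example, the arm64 patches later in this series can then provide a
strong override along these lines (a sketch; see the actual arm64 patch
for the final form):

/* arch/arm64: report whether the PMU interrupt was requested as NMI. */
bool __init arch_perf_nmi_is_available(void)
{
	return arm_pmu_irq_is_nmi();
}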

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
While I won't object to this patch landing, I consider it part of the
arm64 perf hardlockup effort. I would be OK with the earlier patches
in the series landing without ${SUBJECT} patch or anything after it.

I'll also note that, as an alternative to this, it would be nice if we
could figure out how to make perf_event_create_kernel_counter() fail
on arm64 if NMIs aren't available. Maybe we could add a "must_use_nmi"
element to "struct perf_event_attr"?

(no changes since v4)

Changes in v4:
- ("Add a weak function for an arch to detect ...") new for v4.

 include/linux/nmi.h    |  1 +
 kernel/watchdog_perf.c | 12 +++++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 47db14e7da52..eb616fc07c85 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -210,6 +210,7 @@ static inline bool trigger_single_cpu_backtrace(int cpu)
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
+bool arch_perf_nmi_is_available(void);
 #endif
 
 #if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && \
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index 349fcd4d2abc..8ea00c4a24b2 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -234,12 +234,22 @@ void __init hardlockup_detector_perf_restart(void)
 	}
 }
 
+bool __weak __init arch_perf_nmi_is_available(void)
+{
+	return true;
+}
+
 /**
  * watchdog_hardlockup_probe - Probe whether NMI event is available at all
  */
 int __init watchdog_hardlockup_probe(void)
 {
-	int ret = hardlockup_detector_event_create();
+	int ret;
+
+	if (!arch_perf_nmi_is_available())
+		return -ENODEV;
+
+	ret = hardlockup_detector_event_create();
 
 	if (ret) {
 		pr_info("Perf NMI watchdog permanently disabled\n");
-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 16/18] watchdog/perf: Adapt the watchdog_perf interface for async model
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (14 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 15/18] watchdog/perf: Add a weak function for an arch to detect if perf can use NMIs Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-19 17:18 ` [PATCH v5 17/18] arm64: add hw_nmi_get_sample_period for preparation of lockup detector Douglas Anderson
  2023-05-19 17:18 ` [PATCH v5 18/18] arm64: Enable perf events based hard " Douglas Anderson
  17 siblings, 0 replies; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

From: Lecopzer Chen <lecopzer.chen@mediatek.com>

When lockup_detector_init() calls watchdog_hardlockup_probe(), the PMU
may not be ready yet. E.g. on arm64, the PMU is not ready until
device_initcall(armv8_pmu_driver_init), and it is deeply integrated
with the driver model and cpuhp. Hence it is hard to push this
initialization before smp_init().

But it is easy to take the opposite approach and try to initialize
the watchdog once again later. The delayed probe is called using
workqueues. It needs to allocate memory and must proceed in a normal
context. The delayed probe is used when watchdog_hardlockup_probe()
returns non-zero, which is the return code when the PMU is not ready
yet.

Provide an API - lockup_detector_retry_init() - for anyone who needs
to retry initializing the lockup detector after it failed in
lockup_detector_init().

The original assumption is: nobody should use the delayed probe after
lockup_detector_check(), which has the __init attribute. That is,
anyone using this API must call it between lockup_detector_init() and
lockup_detector_check(), and the caller must have the __init attribute.
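
A caller would then look something like this sketch (made-up names; on
arm64 the real caller would be the PMU driver's initcall):

static int __init my_pmu_driver_init(void)
{
	int ret;

	ret = register_my_pmu();	/* hypothetical PMU registration */
	if (ret)
		return ret;

	/*
	 * The PMU is usable now; ask the lockup detector to retry the
	 * probe that failed earlier in lockup_detector_init().
	 */
	lockup_detector_retry_init();
	return 0;
}
device_initcall(my_pmu_driver_init);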

Reviewed-by: Petr Mladek <pmladek@suse.com>
Co-developed-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
I yanked this patch from the mailing lists [1] into my series just to
make it easier to avoid conflicts between my series and the one adding
the arm64 perf hardlockup detector, in case someone wanted to test
them both together.

As part of making this match with my series, I resolved a few
conflicts and did a few renames. I also fixed the kernel test robot
yell [2] by providing a dummy implementation of the retry function if
CONFIG_LOCKUP_DETECTOR wasn't defined. That led me to move the
definition of the function and sanitize the name of the function to
match others around it.

This patch makes less sense without the patches to add support for the
arm64 perf hardlockup detector. Thus, I've put it at the end of the
series.

I removed the "kernel test robot" tag since that didn't really make
sense. Presumably the robot found a problem on a previous version of
the patch, but that's not normally a reason to add the Reported-by.

[1] https://lore.kernel.org/r/20220903093415.15850-5-lecopzer.chen@mediatek.com/
[2] https://lore.kernel.org/r/202209050639.jDaWd49E-lkp@intel.com/

(no changes since v4)

Changes in v4:
- Pulled ("Adapt the watchdog_hld interface ...") into my series for v4.

 include/linux/nmi.h |  2 ++
 kernel/watchdog.c   | 67 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index eb616fc07c85..d23902a2fd49 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -13,6 +13,7 @@
 
 #ifdef CONFIG_LOCKUP_DETECTOR
 void lockup_detector_init(void);
+void lockup_detector_retry_init(void);
 void lockup_detector_soft_poweroff(void);
 void lockup_detector_cleanup(void);
 
@@ -34,6 +35,7 @@ extern int sysctl_hardlockup_all_cpu_backtrace;
 
 #else /* CONFIG_LOCKUP_DETECTOR */
 static inline void lockup_detector_init(void) { }
+static inline void lockup_detector_retry_init(void) { }
 static inline void lockup_detector_soft_poweroff(void) { }
 static inline void lockup_detector_cleanup(void) { }
 #endif /* !CONFIG_LOCKUP_DETECTOR */
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 4110f7ca23a5..877d8670f26e 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -208,7 +208,13 @@ void __weak watchdog_hardlockup_enable(unsigned int cpu) { }
 
 void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
 
-/* Return 0, if a hardlockup watchdog is available. Error code otherwise */
+/*
+ * Watchdog-detector specific API.
+ *
+ * Return 0 when hardlockup watchdog is available, negative value otherwise.
+ * Note that the negative value means that a delayed probe might
+ * succeed later.
+ */
 int __weak __init watchdog_hardlockup_probe(void)
 {
 	/*
@@ -954,6 +960,62 @@ static void __init watchdog_sysctl_init(void)
 #define watchdog_sysctl_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
+static void __init lockup_detector_delay_init(struct work_struct *work);
+static bool allow_lockup_detector_init_retry __initdata;
+
+static struct work_struct detector_work __initdata =
+		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
+
+static void __init lockup_detector_delay_init(struct work_struct *work)
+{
+	int ret;
+
+	ret = watchdog_hardlockup_probe();
+	if (ret) {
+		pr_info("Delayed init of the lockup detector failed: %d\n", ret);
+		pr_info("Hard watchdog permanently disabled\n");
+		return;
+	}
+
+	allow_lockup_detector_init_retry = false;
+
+	watchdog_hardlockup_available = true;
+	lockup_detector_setup();
+}
+
+/*
+ * lockup_detector_retry_init - retry init lockup detector if possible.
+ *
+ * Retry hardlockup detector init. It is useful when it requires some
+ * functionality that has to be initialized later on a particular
+ * platform.
+ */
+void __init lockup_detector_retry_init(void)
+{
+	/* Must be called before late init calls */
+	if (!allow_lockup_detector_init_retry)
+		return;
+
+	schedule_work(&detector_work);
+}
+
+/*
+ * Ensure that the optional delayed hardlockup init has run before
+ * the init code and memory are freed.
+ */
+static int __init lockup_detector_check(void)
+{
+	/* Prevent any later retry. */
+	allow_lockup_detector_init_retry = false;
+
+	/* Make sure no work is pending. */
+	flush_work(&detector_work);
+
+	return 0;
+}
+late_initcall_sync(lockup_detector_check);
+
 void __init lockup_detector_init(void)
 {
 	if (tick_nohz_full_enabled())
@@ -964,6 +1026,9 @@ void __init lockup_detector_init(void)
 
 	if (!watchdog_hardlockup_probe())
 		watchdog_hardlockup_available = true;
+	else
+		allow_lockup_detector_init_retry = true;
+
 	lockup_detector_setup();
 	watchdog_sysctl_init();
 }
-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 17/18] arm64: add hw_nmi_get_sample_period for preparation of lockup detector
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (15 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 16/18] watchdog/perf: Adapt the watchdog_perf interface for async model Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  2023-05-19 17:18 ` [PATCH v5 18/18] arm64: Enable perf events based hard " Douglas Anderson
  17 siblings, 0 replies; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

From: Lecopzer Chen <lecopzer.chen@mediatek.com>

Set a safe maximum CPU frequency of 5 GHz to be used when a particular
platform doesn't implement a cpufreq driver. Although the architecture
doesn't put any restriction on the maximum frequency, 5 GHz seems to be
a safe maximum given that the Arm CPUs available on the market are
clocked well below 5 GHz.

On the other hand, we can't make it much higher, as that would lead to
a large hard-lockup detection timeout on parts which run slower
(e.g. 1 GHz on Developerbox) and don't have a cpufreq driver.
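
To put rough numbers on that (back-of-the-envelope math, using the
default watchdog_thresh of 10): hw_nmi_get_sample_period() returns
max_cpu_freq * watchdog_thresh cycles, so the 5 GHz fallback gives
5,000,000,000 * 10 = 50,000,000,000 cycles. A real 5 GHz part burns
through that in the intended ~10 seconds, while a 1 GHz part without
a cpufreq driver takes ~50 seconds, and the gap only grows if the
constant is raised further.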

Co-developed-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Co-developed-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
I yanked this patch from the mailing lists [1] into my series just to
make it easier to avoid conflicts between my series and the one adding
the arm64 perf hardlockup detector, in case someone wanted to test
them both together.

The only change I made here was to remove an extra blank line that
checkpatch complained about.

As mentioned in the cover letter, I'm not really expecting this patch
to land together with the patches for the buddy detector. I included
it with my series simply for convenience of testing both series
together.

NOTE: the previous patch posted by Lecopzer pointed to Sumit's
patch [2] in the commit text but provided no context. I moved it to
this "after the cut" note.

[1] https://lore.kernel.org/r/20220903093415.15850-6-lecopzer.chen@mediatek.com/
[2] http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org

(no changes since v4)

Changes in v4:
- Pulled ("add hw_nmi_get_sample_period for ...") into my series for v4.

 arch/arm64/kernel/Makefile       |  1 +
 arch/arm64/kernel/watchdog_hld.c | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)
 create mode 100644 arch/arm64/kernel/watchdog_hld.c

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 7c2bb4e72476..cc22011ab66a 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -45,6 +45,7 @@ obj-$(CONFIG_FUNCTION_TRACER)		+= ftrace.o entry-ftrace.o
 obj-$(CONFIG_MODULES)			+= module.o
 obj-$(CONFIG_ARM64_MODULE_PLTS)		+= module-plts.o
 obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o perf_callchain.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF)	+= watchdog_hld.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
 obj-$(CONFIG_CPU_PM)			+= sleep.o suspend.o
 obj-$(CONFIG_CPU_IDLE)			+= cpuidle.o
diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
new file mode 100644
index 000000000000..2401eb1b7e55
--- /dev/null
+++ b/arch/arm64/kernel/watchdog_hld.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/cpufreq.h>
+
+/*
+ * Safe maximum CPU frequency in case a particular platform doesn't implement
+ * a cpufreq driver. Although the architecture doesn't put any restriction on
+ * the maximum frequency, 5 GHz seems to be a safe maximum given that available
+ * Arm CPUs on the market are clocked well below 5 GHz. On the other hand, we
+ * can't make it much higher as that would lead to a large hard-lockup
+ * detection timeout on parts which run slower (e.g. 1 GHz on Developerbox)
+ * and don't have a cpufreq driver.
+ */
+#define SAFE_MAX_CPU_FREQ	5000000000UL // 5 GHz
+u64 hw_nmi_get_sample_period(int watchdog_thresh)
+{
+	unsigned int cpu = smp_processor_id();
+	unsigned long max_cpu_freq;
+
+	max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL;
+	if (!max_cpu_freq)
+		max_cpu_freq = SAFE_MAX_CPU_FREQ;
+
+	return (u64)max_cpu_freq * watchdog_thresh;
+}
-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v5 18/18] arm64: Enable perf events based hard lockup detector
  2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
                   ` (16 preceding siblings ...)
  2023-05-19 17:18 ` [PATCH v5 17/18] arm64: add hw_nmi_get_sample_period for preparation of lockup detector Douglas Anderson
@ 2023-05-19 17:18 ` Douglas Anderson
  17 siblings, 0 replies; 38+ messages in thread
From: Douglas Anderson @ 2023-05-19 17:18 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: Matthias Kaehlcke, kgdb-bugreport, Stephane Eranian, mpe,
	Tzung-Bi Shih, Daniel Thompson, Mark Rutland, linuxppc-dev,
	Sumit Garg, npiggin, davem, Marc Zyngier, Stephen Boyd,
	sparclinux, christophe.leroy, Catalin Marinas, ravi.v.shankar,
	Randy Dunlap, Pingfan Liu, Guenter Roeck, Lecopzer Chen,
	Ian Rogers, ito-yuichi, ricardo.neri, linux-arm-kernel,
	linux-perf-users, Will Deacon, Chen-Yu Tsai, linux-kernel,
	Masayoshi Mizuma, Andi Kleen, Douglas Anderson

With the recent feature that allows perf events to use pseudo-NMIs
as interrupts on platforms which support GICv3 or later, it is now
possible to enable the hard lockup detector (or NMI watchdog) on arm64
platforms. So enable the corresponding support.

One thing to note here is that the lockup detector is normally
initialized just after the early initcalls, but the PMU on arm64 comes
up much later, as a device_initcall(). To cope with that, override
arch_perf_nmi_is_available() to let the watchdog framework know that
the PMU is not ready yet, and have the framework re-initialize lockup
detection once the PMU has been initialized.
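
Roughly, the resulting init flow looks like this (a simplified sketch
of the pieces from this series):

  lockup_detector_init()                    // just after early initcalls
    watchdog_hardlockup_probe()             // fails; PMU not probed yet
    allow_lockup_detector_init_retry = true;

  armv8_pmu_driver_init()                   // device_initcall()
    lockup_detector_retry_init()
      schedule_work(&detector_work)
        lockup_detector_delay_init()
          watchdog_hardlockup_probe()       // now succeeds
          lockup_detector_setup()

  lockup_detector_check()                   // late_initcall_sync(): forbids
                                            // further retries, flushes work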

Co-developed-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Co-developed-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
I yanked this patch from the mailing lists [1] into my series just to
make it easier to avoid conflicts between my series and the one adding
the arm64 perf hardlockup detector, in case someone wanted to test
them both together.

As part of making this patch mesh with my series, I needed to resolve
conflicts with the patch ("watchdog/hardlockup: Have the perf
hardlockup use __weak functions more cleanly"). This makes the
${SUBJECT} patch depend on the patch ("watchdog/perf: Add a weak
function for an arch to detect if perf can use NMIs"). As talked about
in that patch, there may be better alternatives to accomplish the same
thing.

As mentioned in the cover letter, I'm not really expecting this patch
to land together with the patches for the buddy detector. I included
it with my series simply for convenience of testing both series
together.

NOTE: the previous patch posted by Lecopzer pointed to Sumit's
patch [2] in the commit text but provided no context. I moved it to
this "after the cut" note.

[1] https://lore.kernel.org/r/20220903093415.15850-7-lecopzer.chen@mediatek.com/
[2] http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org

(no changes since v4)

Changes in v4:
- Pulled ("Enable perf events based hard ...") into my series for v4.

 arch/arm64/Kconfig               |  2 ++
 arch/arm64/kernel/watchdog_hld.c | 12 ++++++++++++
 drivers/perf/arm_pmu.c           |  5 +++++
 drivers/perf/arm_pmuv3.c         | 12 ++++++++++--
 include/linux/perf/arm_pmu.h     |  2 ++
 5 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b1201d25a8a4..b3718e538f18 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -203,12 +203,14 @@ config ARM64
 	select HAVE_FUNCTION_ERROR_INJECTION
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_GCC_PLUGINS
+	select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS
 	select HAVE_IOREMAP_PROT
 	select HAVE_IRQ_TIME_ACCOUNTING
 	select HAVE_KVM
 	select HAVE_NMI
 	select HAVE_PERF_EVENTS
+	select HAVE_PERF_EVENTS_NMI if ARM64_PSEUDO_NMI
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_PREEMPT_DYNAMIC_KEY
diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
index 2401eb1b7e55..dcd25322127c 100644
--- a/arch/arm64/kernel/watchdog_hld.c
+++ b/arch/arm64/kernel/watchdog_hld.c
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <linux/nmi.h>
 #include <linux/cpufreq.h>
+#include <linux/perf/arm_pmu.h>
 
 /*
  * Safe maximum CPU frequency in case a particular platform doesn't implement
@@ -22,3 +24,13 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
 
 	return (u64)max_cpu_freq * watchdog_thresh;
 }
+
+bool __init arch_perf_nmi_is_available(void)
+{
+	/*
+	 * hardlockup_detector_perf_init() will succeed even if Pseudo-NMI is
+	 * turned off; however, the PMU interrupt will then act like a normal
+	 * interrupt instead of an NMI and the hardlockup detector would be broken.
+	 */
+	return arm_pmu_irq_is_nmi();
+}
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 15bd1e34a88e..7b9caa502d33 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -687,6 +687,11 @@ static int armpmu_get_cpu_irq(struct arm_pmu *pmu, int cpu)
 	return per_cpu(hw_events->irq, cpu);
 }
 
+bool arm_pmu_irq_is_nmi(void)
+{
+	return has_nmi;
+}
+
 /*
  * PMU hardware loses all context when a CPU goes offline.
  * When a CPU is hotplugged back in, since some hardware registers are
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index c98e4039386d..7b28d65f3f1c 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -22,6 +22,7 @@
 #include <linux/platform_device.h>
 #include <linux/sched_clock.h>
 #include <linux/smp.h>
+#include <linux/nmi.h>
 
 #include <asm/arm_pmuv3.h>
 
@@ -1348,10 +1349,17 @@ static struct platform_driver armv8_pmu_driver = {
 
 static int __init armv8_pmu_driver_init(void)
 {
+	int ret;
+
 	if (acpi_disabled)
-		return platform_driver_register(&armv8_pmu_driver);
+		ret = platform_driver_register(&armv8_pmu_driver);
 	else
-		return arm_pmu_acpi_probe(armv8_pmuv3_pmu_init);
+		ret = arm_pmu_acpi_probe(armv8_pmuv3_pmu_init);
+
+	if (!ret)
+		lockup_detector_retry_init();
+
+	return ret;
 }
 device_initcall(armv8_pmu_driver_init)
 
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 525b5d64e394..5b00f5cb4cf9 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -171,6 +171,8 @@ void kvm_host_pmu_init(struct arm_pmu *pmu);
 #define kvm_host_pmu_init(x)	do { } while(0)
 #endif
 
+bool arm_pmu_irq_is_nmi(void);
+
 /* Internal functions only for core arm_pmu code */
 struct arm_pmu *armpmu_alloc(void);
 void armpmu_free(struct arm_pmu *pmu);
-- 
2.40.1.698.g37aff9b760-goog


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 02/18] watchdog/perf: More properly prevent false positives with turbo modes
  2023-05-19 17:18 ` [PATCH v5 02/18] watchdog/perf: More properly prevent false positives with turbo modes Douglas Anderson
@ 2023-05-23  9:35   ` Petr Mladek
  0 siblings, 0 replies; 38+ messages in thread
From: Petr Mladek @ 2023-05-23  9:35 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen

On Fri 2023-05-19 10:18:26, Douglas Anderson wrote:
> Currently, in the watchdog_overflow_callback() we first check to see
> if the watchdog had been touched and _then_ we handle the workaround
> for turbo mode. This order should be reversed.
> 
> Specifically, "touching" the hardlockup detector's watchdog should
> avoid lockups being detected for one period that should be roughly the
> same regardless of whether we're running turbo or not. That means that
> we should do the extra accounting for turbo _before_ we look at (and
> clear) the global indicating that we've been touched.

The ideal solution would be to reset the turbo-mode-related
variables when the watchdog is touched, and to keep checking
watchdog_nmi_touch first.

But this ordering change should be good enough. It ensures that
we always check watchdog_nmi_touch after the turbo-mode-related
variables have already been reset.
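
In other words, the overflow callback ends up looking roughly like
this (just a sketch, not the exact code):

	/* The turbo-mode accounting always runs first ... */
	if (!watchdog_check_timestamp())
		return;

	/* ... and only then is the touch consumed. */
	if (__this_cpu_read(watchdog_nmi_touch) == true) {
		__this_cpu_write(watchdog_nmi_touch, false);
		return;
	}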

> NOTE: this fix is made based on code inspection. I am not aware of any
> reports where the old code would have generated false positives. That
> being said, this order seems more correct and also makes it easier
> down the line to share code with the "buddy" hardlockup detector.
> 
> Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 06/18] watchdog/hardlockup: Add comments to touch_nmi_watchdog()
  2023-05-19 17:18 ` [PATCH v5 06/18] watchdog/hardlockup: Add comments to touch_nmi_watchdog() Douglas Anderson
@ 2023-05-23  9:58   ` Petr Mladek
  0 siblings, 0 replies; 38+ messages in thread
From: Petr Mladek @ 2023-05-23  9:58 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen

On Fri 2023-05-19 10:18:30, Douglas Anderson wrote:
> In preparation for the buddy hardlockup detector, add comments to
> touch_nmi_watchdog() to make it obvious that it touches the configured
> hardlockup detector regardless of whether it's backed by an NMI. Also
> note that arch_touch_nmi_watchdog() may not be architecture-specific.
> 
> Ideally, we'd like to rename these functions but that is a fairly
> disruptive change touching a lot of drivers. After discussion [1] the
> plan is to defer this until a good time.
> 
> [1] https://lore.kernel.org/r/ZFy0TX1tfhlH8gxj@alley
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
> Changes in v5:
> - No longer rename touch_nmi_watchdog(), just add comments.
> 
> Changes in v4:
> - ("Rename touch_nmi_watchdog() to ...") new for v4.
> 
>  include/linux/nmi.h | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 454fe99c4874..fafab128f37e 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -125,15 +125,30 @@ void watchdog_nmi_disable(unsigned int cpu);
>  void lockup_detector_reconfigure(void);
>  
>  /**
> - * touch_nmi_watchdog - restart NMI watchdog timeout.
> + * touch_nmi_watchdog - manually pet the hardlockup watchdog.
>   *
> - * If the architecture supports the NMI watchdog, touch_nmi_watchdog()
> - * may be used to reset the timeout - for code which intentionally
> - * disables interrupts for a long time. This call is stateless.
> + * If we support detecting hardlockups, touch_nmi_watchdog() may be
> + * used to pet the watchdog (reset the timeout) - for code which

Nit: I personally prefer "reset the timeout" over "pet the watchdog".
     "pet" is just another ambiguous name as "touch" ;-)

> + * intentionally disables interrupts for a long time. This call is stateless.
> + *
> + * Though this function has "nmi" in the name, the hardlockup watchdog might
> + * not be backed by NMIs. This function will likely be renamed to
> + * touch_hardlockup_watchdog() in the future.
>   */
>  static inline void touch_nmi_watchdog(void)
>  {
> +	/*
> +	 * Pass on to the hardlockup detector selected via CONFIG_. Note that
> +	 * the hardlockup detector may not be arch-specific nor using NMIs
> +	 * and the arch_touch_nmi_watchdog() function will likely be renamed
> +	 * in the future.
> +	 */
>  	arch_touch_nmi_watchdog();
> +
> +	/*
> +	 * Touching the hardlock detector implcitily pets the
> +	 * softlockup detector too
> +	 */

s/implcitily/implicitly/

That said, I would remove this comment completely. It describes what
is clear from the code.

A more useful information would be why it is done. But it is probably
clear as well. CPU could not schedule when interrupts are disabled.

>  	touch_softlockup_watchdog();
>  }

With the removed comment above touch_softlockup_watchdog():

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 08/18] watchdog/hardlockup: Move perf hardlockup checking/panic to common watchdog.c
  2023-05-19 17:18 ` [PATCH v5 08/18] watchdog/hardlockup: Move perf hardlockup checking/panic to common watchdog.c Douglas Anderson
@ 2023-05-23 11:45   ` Petr Mladek
  0 siblings, 0 replies; 38+ messages in thread
From: Petr Mladek @ 2023-05-23 11:45 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen

On Fri 2023-05-19 10:18:32, Douglas Anderson wrote:
> The perf hardlockup detector works by looking at interrupt counts and
> seeing if they change from run to run. The interrupt counts are
> managed by the common watchdog code via its watchdog_timer_fn().
> 
> Currently the API between the perf detector and the common code is a
> function: is_hardlockup(). When the hard lockup detector sees that
> function return true then it handles printing out debug info and
> inducing a panic if necessary.
> 
> Let's change the API a little bit in preparation for the buddy
> hardlockup detector. The buddy hardlockup detector wants to print
> nearly the same debug info and have nearly the same panic
> behavior. That means we want to move all that code to the common
> file. For now, the code in the common file will only be there if the
> perf hardlockup detector is enabled, but eventually it will be
> selected by a common config.
> 
> Right now, this _just_ moves the code from the perf detector file to
> the common file and changes the names. It doesn't make the changes
> that the buddy hardlockup detector will need and doesn't do any style
> cleanups. A future patch will do cleanup to make it more obvious what
> changed.
> 
> With the above, we no longer have any callers of is_hardlockup()
> outside of the "watchdog.c" file, so we can remove it from the header,
> make it static, and move it to the same "#ifdef" block as our new
> watchdog_hardlockup_check(). While doing this, it can be noted that
> the existing code counted/checked "hrtimer_interrupts" even when no
> hardlockup detector was configured. We didn't need to
> do that, so move all the "hrtimer_interrupts" counting to only be
> there if the perf hardlockup detector is configured as well.
> 
> This change is expected to be a no-op.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 10/18] watchdog/hardlockup: Add a "cpu" param to watchdog_hardlockup_check()
  2023-05-19 17:18 ` [PATCH v5 10/18] watchdog/hardlockup: Add a "cpu" param to watchdog_hardlockup_check() Douglas Anderson
@ 2023-05-23 16:02   ` Petr Mladek
  2023-05-23 16:34     ` Doug Anderson
  0 siblings, 1 reply; 38+ messages in thread
From: Petr Mladek @ 2023-05-23 16:02 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen

On Fri 2023-05-19 10:18:34, Douglas Anderson wrote:
> In preparation for the buddy hardlockup detector where the CPU
> checking for lockup might not be the currently running CPU, add a
> "cpu" parameter to watchdog_hardlockup_check().
> 
> As part of this change, make hrtimer_interrupts an atomic_t since now
> the CPU incrementing the value and the CPU reading the value might be
> different. Technically this could also be done with just READ_ONCE and
> WRITE_ONCE, but atomic_t feels a little cleaner in this case.
> 
> While hrtimer_interrupts is made atomic_t, we change
> hrtimer_interrupts_saved from "unsigned long" to "int". The "int" is
> needed to match the data type backing atomic_t for hrtimer_interrupts.
> Even if this changes us from 64-bits to 32-bits (which I don't think
> is true for most compilers), it doesn't really matter. All we ever do
> is increment it every few seconds and compare it to an old value so
> 32-bits is fine (even 16-bits would be). The "signed" vs "unsigned"
> also doesn't matter for simple equality comparisons.
> 
> hrtimer_interrupts_saved is _not_ switched to atomic_t nor even
> accessed with READ_ONCE / WRITE_ONCE. The hrtimer_interrupts_saved is
> always consistently accessed with the same CPU. NOTE: with the
> upcoming "buddy" detector there is one special case. When a CPU goes
> offline/online then we can change which CPU is the one to consistently
> access a given instance of hrtimer_interrupts_saved. We still can't
> end up with a partially updated hrtimer_interrupts_saved, however,
> because we end up petting all affected CPUs to make sure the new and
> old CPU can't end up somehow read/write hrtimer_interrupts_saved at
> the same time.
> 
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -87,29 +87,34 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
>  
>  #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
>  
> -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
> +static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
> +static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
>  static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
>  static unsigned long watchdog_hardlockup_all_cpu_dumped;
>  
> -static bool is_hardlockup(void)
> +static bool is_hardlockup(unsigned int cpu)
>  {
> -	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
> +	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
>  
> -	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> +	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
>  		return true;
>  
> -	__this_cpu_write(hrtimer_interrupts_saved, hrint);
> +	/*
> +	 * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> +	 * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
> +	 * written/read by a single CPU.
> +	 */
> +	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
>  
>  	return false;
>  }
>  
>  static void watchdog_hardlockup_kick(void)
>  {
> -	__this_cpu_inc(hrtimer_interrupts);
> +	atomic_inc(raw_cpu_ptr(&hrtimer_interrupts));

Is there any particular reason why raw_*() is needed, please?

My expectation is that the raw_ API should be used only when
there is a good reason for it. Where a good reason might be
when the checks might fail but the consistency is guaranteed
another way.

IMHO, we should use:

	atomic_inc(this_cpu_ptr(&hrtimer_interrupts));

To be honest, I am a bit lost in the per_cpu API definitions.

But this_cpu_ptr() seems to be implemented the same way as
per_cpu_ptr() when CONFIG_DEBUG_PREEMPT is enabled.
And we use per_cpu_ptr() in is_hardlockup().

Also this_cpu_ptr() is used more commonly:

$> git grep this_cpu_ptr | wc -l
1385
$> git grep raw_cpu_ptr | wc -l
114
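
If I read percpu-defs.h correctly, both flavors compute the same
address and differ only in the debug checking, roughly:

	/* with CONFIG_DEBUG_PREEMPT: warns via debug_smp_processor_id()
	 * when called from preemptible context */
	atomic_inc(this_cpu_ptr(&hrtimer_interrupts));

	/* never warns, even from preemptible context */
	atomic_inc(raw_cpu_ptr(&hrtimer_interrupts));

But I might be reading it wrong, hence the question.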

>  }
>  
> -void watchdog_hardlockup_check(struct pt_regs *regs)
> +void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  {
>  	/*
>  	 * Check for a hardlockup by making sure the CPU's timer
> @@ -117,35 +122,42 @@ void watchdog_hardlockup_check(struct pt_regs *regs)
>  	 * fired multiple times before we overflow'd. If it hasn't
>  	 * then this is a good indication the cpu is stuck
>  	 */
> -	if (is_hardlockup()) {
> +	if (is_hardlockup(cpu)) {
>  		unsigned int this_cpu = smp_processor_id();
> +		struct cpumask backtrace_mask = *cpu_online_mask;

Does this work, please?

IMHO, we should use cpumask_copy().

>  
>  		/* Only print hardlockups once. */
> -		if (__this_cpu_read(watchdog_hardlockup_warned))
> +		if (per_cpu(watchdog_hardlockup_warned, cpu))
>  			return;
>  

Otherwise, it looks good to me.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 10/18] watchdog/hardlockup: Add a "cpu" param to watchdog_hardlockup_check()
  2023-05-23 16:02   ` Petr Mladek
@ 2023-05-23 16:34     ` Doug Anderson
  2023-05-24 11:36       ` Petr Mladek
  0 siblings, 1 reply; 38+ messages in thread
From: Doug Anderson @ 2023-05-23 16:34 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen

Hi,

On Tue, May 23, 2023 at 9:02 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Fri 2023-05-19 10:18:34, Douglas Anderson wrote:
> > In preparation for the buddy hardlockup detector where the CPU
> > checking for lockup might not be the currently running CPU, add a
> > "cpu" parameter to watchdog_hardlockup_check().
> >
> > As part of this change, make hrtimer_interrupts an atomic_t since now
> > the CPU incrementing the value and the CPU reading the value might be
> > different. Technically this could also be done with just READ_ONCE and
> > WRITE_ONCE, but atomic_t feels a little cleaner in this case.
> >
> > While hrtimer_interrupts is made atomic_t, we change
> > hrtimer_interrupts_saved from "unsigned long" to "int". The "int" is
> > needed to match the data type backing atomic_t for hrtimer_interrupts.
> > Even if this changes us from 64-bits to 32-bits (which I don't think
> > is true for most compilers), it doesn't really matter. All we ever do
> > is increment it every few seconds and compare it to an old value so
> > 32-bits is fine (even 16-bits would be). The "signed" vs "unsigned"
> > also doesn't matter for simple equality comparisons.
> >
> > hrtimer_interrupts_saved is _not_ switched to atomic_t nor even
> > accessed with READ_ONCE / WRITE_ONCE. The hrtimer_interrupts_saved is
> > always consistently accessed with the same CPU. NOTE: with the
> > upcoming "buddy" detector there is one special case. When a CPU goes
> > offline/online then we can change which CPU is the one to consistently
> > access a given instance of hrtimer_interrupts_saved. We still can't
> > end up with a partially updated hrtimer_interrupts_saved, however,
> > because we end up petting all affected CPUs to make sure the new and
> > old CPU can't end up somehow read/write hrtimer_interrupts_saved at
> > the same time.
> >
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -87,29 +87,34 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
> >
> >  #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> >
> > -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> > -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
> > +static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
> > +static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
> >  static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
> >  static unsigned long watchdog_hardlockup_all_cpu_dumped;
> >
> > -static bool is_hardlockup(void)
> > +static bool is_hardlockup(unsigned int cpu)
> >  {
> > -     unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
> > +     int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> >
> > -     if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> > +     if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> >               return true;
> >
> > -     __this_cpu_write(hrtimer_interrupts_saved, hrint);
> > +     /*
> > +      * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> > +      * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
> > +      * written/read by a single CPU.
> > +      */
> > +     per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> >
> >       return false;
> >  }
> >
> >  static void watchdog_hardlockup_kick(void)
> >  {
> > -     __this_cpu_inc(hrtimer_interrupts);
> > +     atomic_inc(raw_cpu_ptr(&hrtimer_interrupts));
>
> Is there any particular reason why raw_*() is needed, please?
>
> My expectation is that the raw_ API should be used only when
> there is a good reason for it. Where a good reason might be
> when the checks might fail but the consistency is guaranteed
> another way.
>
> IMHO, we should use:
>
>         atomic_inc(this_cpu_ptr(&hrtimer_interrupts));
>
> To be honest, I am a bit lost in the per_cpu API definitions.
>
> But this_cpu_ptr() seems to be implemented the same way as
> per_cpu_ptr() when CONFIG_DEBUG_PREEMPT is enabled.
> And we use per_cpu_ptr() in is_hardlockup().
>
> Also this_cpu_ptr() is used more commonly:
>
> $> git grep this_cpu_ptr | wc -l
> 1385
> $> git grep raw_cpu_ptr | wc -l
> 114

Hmmm, I think maybe I confused myself. The old code purposely used the
double-underscore prefixed version of this_cpu_inc(). I couldn't find
a double-underscore version of this_cpu_ptr() and I somehow convinced
myself that raw_cpu_ptr() was the right equivalent.

You're right that this_cpu_ptr() is a better choice here and I don't
see any reason why we'd need the raw version.

> >  }
> >
> > -void watchdog_hardlockup_check(struct pt_regs *regs)
> > +void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> >  {
> >       /*
> >        * Check for a hardlockup by making sure the CPU's timer
> > @@ -117,35 +122,42 @@ void watchdog_hardlockup_check(struct pt_regs *regs)
> >        * fired multiple times before we overflow'd. If it hasn't
> >        * then this is a good indication the cpu is stuck
> >        */
> > -     if (is_hardlockup()) {
> > +     if (is_hardlockup(cpu)) {
> >               unsigned int this_cpu = smp_processor_id();
> > +             struct cpumask backtrace_mask = *cpu_online_mask;
>
> Does this work, please?
>
> IMHO, we should use cpumask_copy().

Ah, good call, thanks!


> >               /* Only print hardlockups once. */
> > -             if (__this_cpu_read(watchdog_hardlockup_warned))
> > +             if (per_cpu(watchdog_hardlockup_warned, cpu))
> >                       return;
> >
>
> Otherwise, it looks good to me.

Neither change seems urgent, though both are important to fix. I'll
wait a day or two to see if you have feedback on any of the other
patches, and then I'll send a fixup series.

-Doug

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 10/18] watchdog/hardlockup: Add a "cpu" param to watchdog_hardlockup_check()
  2023-05-23 16:34     ` Doug Anderson
@ 2023-05-24 11:36       ` Petr Mladek
  0 siblings, 0 replies; 38+ messages in thread
From: Petr Mladek @ 2023-05-24 11:36 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen

On Tue 2023-05-23 09:34:37, Doug Anderson wrote:
> Hi,
> 
> On Tue, May 23, 2023 at 9:02 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Fri 2023-05-19 10:18:34, Douglas Anderson wrote:
> > > In preparation for the buddy hardlockup detector where the CPU
> > > checking for lockup might not be the currently running CPU, add a
> > > "cpu" parameter to watchdog_hardlockup_check().
> > >
> > > As part of this change, make hrtimer_interrupts an atomic_t since now
> > > the CPU incrementing the value and the CPU reading the value might be
> > > different. Technically this could also be done with just READ_ONCE and
> > > WRITE_ONCE, but atomic_t feels a little cleaner in this case.
> > >
> > > While hrtimer_interrupts is made atomic_t, we change
> > > hrtimer_interrupts_saved from "unsigned long" to "int". The "int" is
> > > needed to match the data type backing atomic_t for hrtimer_interrupts.
> > > Even if this changes us from 64-bits to 32-bits (which I don't think
> > > is true for most compilers), it doesn't really matter. All we ever do
> > > is increment it every few seconds and compare it to an old value so
> > > 32-bits is fine (even 16-bits would be). The "signed" vs "unsigned"
> > > also doesn't matter for simple equality comparisons.
> > >
> > > hrtimer_interrupts_saved is _not_ switched to atomic_t nor even
> > > accessed with READ_ONCE / WRITE_ONCE. The hrtimer_interrupts_saved is
> > > always consistently accessed with the same CPU. NOTE: with the
> > > upcoming "buddy" detector there is one special case. When a CPU goes
> > > offline/online then we can change which CPU is the one to consistently
> > > access a given instance of hrtimer_interrupts_saved. We still can't
> > > end up with a partially updated hrtimer_interrupts_saved, however,
> > > because we end up petting all affected CPUs to make sure the new and
> > > old CPU can't end up somehow read/write hrtimer_interrupts_saved at
> > > the same time.
> > >
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -87,29 +87,34 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
> > >
> > >  #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> > >
> > > -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> > > -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
> > > +static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
> > > +static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
> > >  static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
> > >  static unsigned long watchdog_hardlockup_all_cpu_dumped;
> > >
> > > -static bool is_hardlockup(void)
> > > +static bool is_hardlockup(unsigned int cpu)
> > >  {
> > > -     unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
> > > +     int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > >
> > > -     if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> > > +     if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> > >               return true;
> > >
> > > -     __this_cpu_write(hrtimer_interrupts_saved, hrint);
> > > +     /*
> > > +      * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> > > +      * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
> > > +      * written/read by a single CPU.
> > > +      */
> > > +     per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> > >
> > >       return false;
> > >  }
> > >
> > >  static void watchdog_hardlockup_kick(void)
> > >  {
> > > -     __this_cpu_inc(hrtimer_interrupts);
> > > +     atomic_inc(raw_cpu_ptr(&hrtimer_interrupts));
> >
> > Is there any particular reason why raw_*() is needed, please?
> >
> > My expectation is that the raw_ API should be used only when
> > there is a good reason for it. Where a good reason might be
> > when the checks might fail but the consistency is guaranteed
> > another way.
> >
> > IMHO, we should use:
> >
> >         atomic_inc(this_cpu_ptr(&hrtimer_interrupts));
> >
> > To be honest, I am a bit lost in the per_cpu API definitions.
> >
> > But this_cpu_ptr() seems to be implemented the same way as
> > per_cpu_ptr() when CONFIG_DEBUG_PREEMPT is enabled.
> > And we use per_cpu_ptr() in is_hardlockup().
> >
> > Also this_cpu_ptr() is used more commonly:
> >
> > $> git grep this_cpu_ptr | wc -l
> > 1385
> > $> git grep raw_cpu_ptr | wc -l
> > 114
> 
> Hmmm, I think maybe I confused myself. The old code purposely used the
> double-underscore prefixed version of this_cpu_inc(). I couldn't find
> a double-underscore version of this_cpu_ptr() and I somehow convinced
> myself that the raw() version was the right equivalent version.
> 
> You're right that this_cpu_ptr() is a better choice here and I don't
> see any reason why we'd need the raw version.

I was confused too. Honestly, it looks a bit messy to me.

My understanding is that the this_cpu_*() API has the following semantics:

    + this_cpu_*() actively disables interrupts/preemption

    + __this_cpu_*() just warns when the task could migrate
		between CPUs.

    + raw_cpu_*() can be used in preemptible context when
		the validity is guaranteed another way.

this_cpu_ptr() does not fit the above. I guess that it is
because it is just providing the address and it is not
accessing the data. So it is enough to read the current
CPU id in an atomic way.

IMHO, it would make sense to distinguish how the pointer is
going to be used. From this POV, __this_cpu_ptr() and
raw_cpu_ptr() would make more sense to me.

But it looks to me that this_cpu_ptr() has the same semantic
as per_cpu_ptr().
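
To illustrate the three flavors on an actual access (a sketch, given
some DEFINE_PER_CPU(int, cnt)):

	this_cpu_inc(cnt);	/* actively preemption/irq safe     */
	__this_cpu_inc(cnt);	/* only warns if we might migrate   */
	raw_cpu_inc(cnt);	/* no protection, no warning at all */

while this_cpu_ptr() only hands back an address, so none of the three
guarantees above extends to the access made through it.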

> Neither change seems urgent though both are important to fix, I'll
> wait a day or two to see if you have feedback on any of the other
> patches and I'll send a fixup series.

Yup, I am going to review the rest.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 11/18] watchdog/hardlockup: Move perf hardlockup watchdog petting to watchdog.c
  2023-05-19 17:18 ` [PATCH v5 11/18] watchdog/hardlockup: Move perf hardlockup watchdog petting to watchdog.c Douglas Anderson
@ 2023-05-24 13:07   ` Petr Mladek
  0 siblings, 0 replies; 38+ messages in thread
From: Petr Mladek @ 2023-05-24 13:07 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen

On Fri 2023-05-19 10:18:35, Douglas Anderson wrote:
> In preparation for the buddy hardlockup detector, which wants the same
> petting logic as the current perf hardlockup detector, move the code
> to watchdog.c. While doing this, rename the global variable to match
> others nearby. As part of this change we have to change the code to
> account for the fact that the CPU we're running on might be different
> than the one we're checking.
> 
> Currently the code in watchdog.c is guarded by
> CONFIG_HARDLOCKUP_DETECTOR_PERF, which makes this change seem
> silly. However, a future patch will change this.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 12/18] watchdog/hardlockup: Rename some "NMI watchdog" constants/function
  2023-05-19 17:18 ` [PATCH v5 12/18] watchdog/hardlockup: Rename some "NMI watchdog" constants/function Douglas Anderson
@ 2023-05-24 13:38   ` Petr Mladek
  2023-05-25 23:33     ` Doug Anderson
  0 siblings, 1 reply; 38+ messages in thread
From: Petr Mladek @ 2023-05-24 13:38 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen

On Fri 2023-05-19 10:18:36, Douglas Anderson wrote:
> Do a search and replace of:
> - NMI_WATCHDOG_ENABLED => WATCHDOG_HARDLOCKUP_ENABLED
> - SOFT_WATCHDOG_ENABLED => WATCHDOG_SOFTOCKUP_ENABLED
> - watchdog_nmi_ => watchdog_hardlockup_
> - nmi_watchdog_available => watchdog_hardlockup_available
> - nmi_watchdog_user_enabled => watchdog_hardlockup_user_enabled
> - soft_watchdog_user_enabled => watchdog_softlockup_user_enabled
> - NMI_WATCHDOG_DEFAULT => WATCHDOG_HARDLOCKUP_DEFAULT
> 
> Then update a few comments near where names were changed.
> 
> This is specifically to make it less confusing when we want to
> introduce the buddy hardlockup detector, which isn't using NMIs. As
> part of this, we sanitized a few names for consistency.
> 
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -30,17 +30,17 @@
>  static DEFINE_MUTEX(watchdog_mutex);
>  
>  #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
> -# define NMI_WATCHDOG_DEFAULT	1
> +# define WATCHDOG_HARDLOCKUP_DEFAULT	1
>  #else
> -# define NMI_WATCHDOG_DEFAULT	0
> +# define WATCHDOG_HARDLOCKUP_DEFAULT	0
>  #endif
>  
>  unsigned long __read_mostly watchdog_enabled;
>  int __read_mostly watchdog_user_enabled = 1;
> -int __read_mostly nmi_watchdog_user_enabled = NMI_WATCHDOG_DEFAULT;
> -int __read_mostly soft_watchdog_user_enabled = 1;
> +int __read_mostly watchdog_hardlockup_user_enabled = WATCHDOG_HARDLOCKUP_DEFAULT;
> +int __read_mostly watchdog_softlockup_user_enabled = 1;

I still see nmi_watchdog_user_enabled and soft_watchdog_user_enabled
in include/linux/nmi.h. They are declared there and also mentioned
in a comment.

It seems that they do not actually need to be declared there.
I guess that it was needed for the /proc stuff. But it was
moved into kernel/watchdog.c by commit dd0693fdf054f2ed37
("watchdog: move watchdog sysctl interface to watchdog.c").

>  int __read_mostly watchdog_thresh = 10;
> -static int __read_mostly nmi_watchdog_available;
> +static int __read_mostly watchdog_hardlockup_available;
>  
>  struct cpumask watchdog_cpumask __read_mostly;
>  unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);

Otherwise, I like the changes.

With the following:

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 83076bf70ce8..d14fe345eae9 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -17,8 +17,6 @@ void lockup_detector_soft_poweroff(void);
 void lockup_detector_cleanup(void);
 
 extern int watchdog_user_enabled;
-extern int nmi_watchdog_user_enabled;
-extern int soft_watchdog_user_enabled;
 extern int watchdog_thresh;
 extern unsigned long watchdog_enabled;
 
@@ -68,8 +66,8 @@ static inline void reset_hung_task_detector(void) { }
  * 'watchdog_enabled' variable. Each lockup detector has its dedicated bit -
  * bit 0 for the hard lockup detector and bit 1 for the soft lockup detector.
  *
- * 'watchdog_user_enabled', 'nmi_watchdog_user_enabled' and
- * 'soft_watchdog_user_enabled' are variables that are only used as an
+ * 'watchdog_user_enabled', 'watchdog_hardlockup_user_enabled' and
+ * 'watchdog_softlockup_user_enabled' are variables that are only used as an
  * 'interface' between the parameters in /proc/sys/kernel and the internal
  * state bits in 'watchdog_enabled'. The 'watchdog_thresh' variable is
  * handled differently because its value is not boolean, and the lockup

Reviewed-by: Petr Mladek <pmladek@suse.com>

Even better might be to remove the unused declarations in a separate
patch. I think that more of the declarations are no longer needed after
moving the /proc stuff into kernel/watchdog.c.

But it might also be fixed later.

Best Regards,
Petr

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 13/18] watchdog/hardlockup: Have the perf hardlockup use __weak functions more cleanly
  2023-05-19 17:18 ` [PATCH v5 13/18] watchdog/hardlockup: Have the perf hardlockup use __weak functions more cleanly Douglas Anderson
@ 2023-05-24 13:59   ` Petr Mladek
  2023-05-24 19:38     ` Doug Anderson
  0 siblings, 1 reply; 38+ messages in thread
From: Petr Mladek @ 2023-05-24 13:59 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen

On Fri 2023-05-19 10:18:37, Douglas Anderson wrote:
> The fact that there watchdog_hardlockup_enable(),
> watchdog_hardlockup_disable(), and watchdog_hardlockup_probe() are
> declared __weak means that the configured hardlockup detector can
> define non-weak versions of those functions if it needs to. Instead of
> doing this, the perf hardlockup detector hooked itself into the
> default __weak implementation, which was a bit awkward. Clean this up.
> 
> >From comments, it looks as if the original design was done because the
> __weak functions were expected to be implemented by the architecture and
> not by the configured hardlockup detector. This got awkward when we
> tried to add the buddy lockup detector which was not arch-specific but
> wanted to hook into those same functions.
> 
> This is not expected to have any functional impact.
>
> @@ -187,27 +187,33 @@ static inline void watchdog_hardlockup_kick(void) { }
>  #endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */
>  
>  /*
> - * These functions can be overridden if an architecture implements its
> - * own hardlockup detector.
> + * These functions can be overridden based on the configured hardlockup detector.
>   *
>   * watchdog_hardlockup_enable/disable can be implemented to start and stop when
> - * softlockup watchdog start and stop. The arch must select the
> + * softlockup watchdog start and stop. The detector must select the
>   * SOFTLOCKUP_DETECTOR Kconfig.
>   */
> -void __weak watchdog_hardlockup_enable(unsigned int cpu)
> -{
> -	hardlockup_detector_perf_enable();
> -}
> +void __weak watchdog_hardlockup_enable(unsigned int cpu) { }
>  
> -void __weak watchdog_hardlockup_disable(unsigned int cpu)
> -{
> -	hardlockup_detector_perf_disable();
> -}
> +void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
>  
>  /* Return 0, if a hardlockup watchdog is available. Error code otherwise */
>  int __weak __init watchdog_hardlockup_probe(void)
>  {
> -	return hardlockup_detector_perf_init();
> +	/*
> +	 * If CONFIG_HAVE_NMI_WATCHDOG is defined then an architecture
> +	 * is assumed to have the hard watchdog available and we return 0.
> +	 */
> +	if (IS_ENABLED(CONFIG_HAVE_NMI_WATCHDOG))
> +		return 0;
> +
> +	/*
> +	 * Hardlockup detectors other than those using CONFIG_HAVE_NMI_WATCHDOG
> +	 * are required to implement a non-weak version of this probe function
> +	 * to tell whether they are available. If they don't override then
> +	 * we'll return -ENODEV.
> +	 */
> +	return -ENODEV;
>  }

Thinking about it some more, it is weird that we need to handle
CONFIG_HAVE_NMI_WATCHDOG in this default weak function.

It should be handled in watchdog_hardlockup_probe() implemented
in kernel/watchdog_perf.c.

IMHO, the default __weak function could always return -ENODEV;

Would it make sense, please?
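
I mean something like this (just a sketch, I have not tried to
compile it):

	/* kernel/watchdog.c */
	int __weak __init watchdog_hardlockup_probe(void)
	{
		return -ENODEV;
	}

	/* kernel/watchdog_perf.c */
	int __init watchdog_hardlockup_probe(void)
	{
		if (IS_ENABLED(CONFIG_HAVE_NMI_WATCHDOG))
			return 0;

		return hardlockup_detector_perf_init();
	}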

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 13/18] watchdog/hardlockup: Have the perf hardlockup use __weak functions more cleanly
  2023-05-24 13:59   ` Petr Mladek
@ 2023-05-24 19:38     ` Doug Anderson
  2023-05-26 14:44       ` Petr Mladek
  0 siblings, 1 reply; 38+ messages in thread
From: Doug Anderson @ 2023-05-24 19:38 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen

Hi,

On Wed, May 24, 2023 at 6:59 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Fri 2023-05-19 10:18:37, Douglas Anderson wrote:
> > The fact that there watchdog_hardlockup_enable(),
> > watchdog_hardlockup_disable(), and watchdog_hardlockup_probe() are
> > declared __weak means that the configured hardlockup detector can
> > define non-weak versions of those functions if it needs to. Instead of
> > doing this, the perf hardlockup detector hooked itself into the
> > default __weak implementation, which was a bit awkward. Clean this up.
> >
> > >From comments, it looks as if the original design was done because the
> > __weak functions were expected to be implemented by the architecture and
> > not by the configured hardlockup detector. This got awkward when we
> > tried to add the buddy lockup detector which was not arch-specific but
> > wanted to hook into those same functions.
> >
> > This is not expected to have any functional impact.
> >
> > @@ -187,27 +187,33 @@ static inline void watchdog_hardlockup_kick(void) { }
> >  #endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */
> >
> >  /*
> > - * These functions can be overridden if an architecture implements its
> > - * own hardlockup detector.
> > + * These functions can be overridden based on the configured hardlockup detector.
> >   *
> >   * watchdog_hardlockup_enable/disable can be implemented to start and stop when
> > - * softlockup watchdog start and stop. The arch must select the
> > + * softlockup watchdog start and stop. The detector must select the
> >   * SOFTLOCKUP_DETECTOR Kconfig.
> >   */
> > -void __weak watchdog_hardlockup_enable(unsigned int cpu)
> > -{
> > -     hardlockup_detector_perf_enable();
> > -}
> > +void __weak watchdog_hardlockup_enable(unsigned int cpu) { }
> >
> > -void __weak watchdog_hardlockup_disable(unsigned int cpu)
> > -{
> > -     hardlockup_detector_perf_disable();
> > -}
> > +void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
> >
> >  /* Return 0, if a hardlockup watchdog is available. Error code otherwise */
> >  int __weak __init watchdog_hardlockup_probe(void)
> >  {
> > -     return hardlockup_detector_perf_init();
> > +     /*
> > +      * If CONFIG_HAVE_NMI_WATCHDOG is defined then an architecture
> > +      * is assumed to have the hard watchdog available and we return 0.
> > +      */
> > +     if (IS_ENABLED(CONFIG_HAVE_NMI_WATCHDOG))
> > +             return 0;
> > +
> > +     /*
> > +      * Hardlockup detectors other than those using CONFIG_HAVE_NMI_WATCHDOG
> > +      * are required to implement a non-weak version of this probe function
> > +      * to tell whether they are available. If they don't override then
> > +      * we'll return -ENODEV.
> > +      */
> > +     return -ENODEV;
> >  }
>
> Thinking about it some more, it is weird that we need to handle
> CONFIG_HAVE_NMI_WATCHDOG in this default weak function.
>
> It should be handled in watchdog_hardlockup_probe() implemented
> in kernel/watchdog_perf.c.
>
> IMHO, the default __weak function could always return -ENODEV;
>
> Would it make sense, please?

I don't quite understand. I'd agree that the special case for
CONFIG_HAVE_NMI_WATCHDOG is ugly, but it was also ugly before. IMO
it's actually a little less ugly / easier to understand after my
patch. ...but let me walk through how I think this special case works
and maybe you can tell me where I'm confused.

The first thing to understand is that CONFIG_HARDLOCKUP_DETECTOR_PERF
and CONFIG_HAVE_NMI_WATCHDOG are mutually exclusive from each other.
This was true before any of my patches and is still true after them.
Specifically, if CONFIG_HAVE_NMI_WATCHDOG is defined then an
architecture implements arch_touch_nmi_watchdog() (as documented in
the Kconfig docs for HAVE_NMI_WATCHDOG). Looking at the tree before my
series you can see that the perf hardlockup detector also implemented
arch_touch_nmi_watchdog(). This would have caused a conflict. The
mutual exclusion was presumably enforced by an architecture not
defining both HAVE_NMI_WATCHDOG and HAVE_HARDLOCKUP_DETECTOR_PERF.

The second thing to understand is that an architecture that defines
CONFIG_HAVE_NMI_WATCHDOG is _not_ required to implement
watchdog_hardlockup_probe() (used to be called watchdog_nmi_probe()).
Maybe this should change, but at the very least it appears that
SPARC64 defines HAVE_NMI_WATCHDOG but doesn't define
watchdog_hardlockup_probe() AKA watchdog_nmi_probe(). Anyone who
defines CONFIG_HAVE_NMI_WATCHDOG and doesn't implement
watchdog_hardlockup_probe() is claiming that their watchdog needs no
probing and is always available.

So with that context:

1. We can't handle any special cases for CONFIG_HAVE_NMI_WATCHDOG in
"kernel/watchdog_perf.c". The special cases that we need to handle are
all for the cases where CONFIG_HARDLOCKUP_DETECTOR_PERF isn't defined
and that means "kernel/watchdog_perf.c" isn't included.

2. We can't have the default __weak function return -ENODEV because
CONFIG_HAVE_NMI_WATCHDOG doesn't require an arch to implement
watchdog_hardlockup_probe(), but we want watchdog_hardlockup_probe()
to return "no error" in that case so that
"watchdog_hardlockup_available" gets set to true.

Does that sound right?

I'd agree that a future improvement saying that
CONFIG_HAVE_NMI_WATCHDOG means you _must_ implement
watchdog_hardlockup_probe() would make sense and that would allow us
to get rid of the special case. IMO, though, that's a separate patch.
I'd be happy to review that patch if you wanted to post it up. :-)

If we want to add that requirement, I _think_ the only thing you'd
need to do is to add watchdog_hardlockup_probe() to sparc64 and have
it return 0 and put that definition in the same file containing
arch_touch_nmi_watchdog(). powerpc also gets CONFIG_HAVE_NMI_WATCHDOG
as a side effect of selecting CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH but
it looks like they implement watchdog_hardlockup_probe() already. Oh,
but maybe this will fix a preexisting (existed before my patches)
minor bug... Unless I'm missing something (entirely possible!), on
powerpc today if you turn off CONFIG_PPC_WATCHDOG then
CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH and CONFIG_HAVE_NMI_WATCHDOG
would still be defined and we'd end up returning 0 (no error) from
watchdog_hardlockup_probe(). That means that on powerpc today, if you
turn off CONFIG_PPC_WATCHDOG, '/proc/sys/kernel/nmi_watchdog' will
still think the watchdog is enabled?
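
Concretely, I think the sparc64 side of that would just be a trivial
stub, something like this (a rough sketch, untested; the exact file is
whatever contains arch_touch_nmi_watchdog()):

	/* sparc64: the NMI watchdog needs no probing; it's always there */
	int __init watchdog_hardlockup_probe(void)
	{
		return 0;
	}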


-Doug

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 14/18] watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs
  2023-05-19 17:18 ` [PATCH v5 14/18] watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs Douglas Anderson
@ 2023-05-25 16:26   ` Petr Mladek
  2023-05-25 20:08     ` Doug Anderson
  0 siblings, 1 reply; 38+ messages in thread
From: Petr Mladek @ 2023-05-25 16:26 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen,
	Colin Cross

On Fri 2023-05-19 10:18:38, Douglas Anderson wrote:
> Implement a hardlockup detector that doesn't need any extra
> arch-specific support code to detect lockups. Instead of using
> something arch-specific we will use the buddy system, where each CPU
> watches out for another one. Specifically, each CPU will use its
> softlockup hrtimer to check that the next CPU is processing hrtimer
> interrupts by verifying that a counter is increasing.
> 
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -85,7 +85,7 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
>  
>  #endif /* CONFIG_HARDLOCKUP_DETECTOR */
>  
> -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> +#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER)
>  
>  static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
>  static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
> @@ -106,6 +106,14 @@ notrace void arch_touch_nmi_watchdog(void)
>  }
>  EXPORT_SYMBOL(arch_touch_nmi_watchdog);
>  
> +void watchdog_hardlockup_touch_cpu(unsigned int cpu)
> +{
> +	per_cpu(watchdog_hardlockup_touched, cpu) = true;
> +
> +	/* Match with smp_rmb() in watchdog_hardlockup_check() */
> +	smp_wmb();

It is great that you described where the related barrier is.

Another important information is what exactly is synchronized.
And I am actually not sure what we are synchronizing here.

My understanding is that a write barrier should synchronize
related writes, for example:

	X = ...;
	/* Make sure that X is modified before Y */
	smp_wmb();
	Y = ...;

And the related read barrier should synchronize the related reads,
for example:

	if (test(Y)) {
		/*
		 * Make sure that we use the updated X when
		 * we saw the updated Y.
		 */
		 smp_rmb();
		 do_something(X);
	 }

IMHO, we do not need any barrier here because we have only
one variable "watchdog_hardlockup_touched" here.
watchdog_hardlockup_check() will either see the updated value
or not. But it does not synchronize it against any other
variables or values.

> +}
> +
>  static bool is_hardlockup(unsigned int cpu)
>  {
>  	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> @@ -443,11 +454,15 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  	struct pt_regs *regs = get_irq_regs();
>  	int duration;
>  	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
> +	unsigned long hrtimer_interrupts;
>  
>  	if (!watchdog_enabled)
>  		return HRTIMER_NORESTART;
>  
> -	watchdog_hardlockup_kick();
> +	hrtimer_interrupts = watchdog_hardlockup_kick();
> +
> +	/* test for hardlockups */

I would omit the comment. It is not valid when the perf detector is used.
And checking the buddy is clear from the function name.

> +	watchdog_buddy_check_hardlockup(hrtimer_interrupts);

I would personally move this into watchdog_hardlockup_kick().
watchdog_timer_fn() is already complex enough. And checking
the buddy when kicking a CPU makes sense.

Also I would not pass "hrtimer_interrupts". I guess that it is
just an optimization. It is an extra churn in the code. IMHO,
it is not worth it. This code does not need to be super optimized.
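
I mean something like this (untested sketch;
watchdog_buddy_check_hardlockup() would then read the hrtimer count
itself):

	static void watchdog_hardlockup_kick(void)
	{
		/* This CPU proves that it is alive... */
		atomic_inc(this_cpu_ptr(&hrtimer_interrupts));

		/* ...and checks on its buddy at the same time. */
		watchdog_buddy_check_hardlockup();
	}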

> +

>  
>  	/* kick the softlockup detector */
>  	if (completion_done(this_cpu_ptr(&softlockup_completion))) {
> diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
> new file mode 100644
> index 000000000000..fee45af2e5bd
> --- /dev/null
> +++ b/kernel/watchdog_buddy.c
> @@ -0,0 +1,93 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/kernel.h>
> +#include <linux/nmi.h>
> +#include <linux/percpu-defs.h>
> +
> +static cpumask_t __read_mostly watchdog_cpus;
> +
> +static unsigned int watchdog_next_cpu(unsigned int cpu)
> +{
> +	cpumask_t cpus = watchdog_cpus;

A copy should be done by cpumask_copy().

But the question is why a copy would be needed. When called from
watchdog_buddy_check_hardlockup(), this function is not synchronized
against CPU hotplug. And even the copying would be racy.

IMHO, the copy does not help much and we do not need it.

The only important thing is that this function returns
a valid CPU. And I think that is guaranteed because
CPU0 cannot be disabled.
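
I.e. just use the global mask directly, something like (sketch):

	static unsigned int watchdog_next_cpu(unsigned int cpu)
	{
		unsigned int next_cpu;

		next_cpu = cpumask_next(cpu, &watchdog_cpus);
		if (next_cpu >= nr_cpu_ids)
			next_cpu = cpumask_first(&watchdog_cpus);

		/* A CPU watching only itself is not a useful buddy. */
		if (next_cpu == cpu)
			return nr_cpu_ids;

		return next_cpu;
	}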

> +	unsigned int next_cpu;
> +
> +	next_cpu = cpumask_next(cpu, &cpus);
> +	if (next_cpu >= nr_cpu_ids)
> +		next_cpu = cpumask_first(&cpus);
> +
> +	if (next_cpu == cpu)
> +		return nr_cpu_ids;
> +	return next_cpu;
> +}
> +
> +int __init watchdog_hardlockup_probe(void)
> +{
> +	return 0;
> +}
> +
> +void watchdog_hardlockup_enable(unsigned int cpu)
> +{
> +	unsigned int next_cpu;
> +
> +	/*
> +	 * The new CPU will be marked online before the hrtimer interrupt
> +	 * gets a chance to run on it. If another CPU tests for a
> +	 * hardlockup on the new CPU before it has run its hrtimer
> +	 * interrupt, it will get a false positive. Touch the watchdog on
> +	 * the new CPU to delay the check for at least 3 sampling periods
> +	 * to guarantee one hrtimer has run on the new CPU.
> +	 */
> +	watchdog_hardlockup_touch_cpu(cpu);
> +
> +	/*
> +	 * We are going to check the next CPU. Our watchdog_hrtimer
> +	 * need not be zero if the CPU has already been online earlier.
> +	 * Touch the watchdog on the next CPU to avoid false positive
> +	 * if we try to check it in less than 3 interrupts.
> +	 */
> +	next_cpu = watchdog_next_cpu(cpu);
> +	if (next_cpu < nr_cpu_ids)
> +		watchdog_hardlockup_touch_cpu(next_cpu);

Thinking loudly:

This feels racy when many CPUs are enabled/disabled in parallel.
I am not 100% sure it can happen though.

My understanding is that it can't happen because the CPU hotplug
is serialized by cpu_add_remove_lock.

So, this seems to work after all.

> +
> +	cpumask_set_cpu(cpu, &watchdog_cpus);
> +}
> +
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1035,10 +1035,55 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
>  
>  	  Say N if unsure.
>  
> -config HARDLOCKUP_DETECTOR_PERF
> +# Both the "perf" and "buddy" hardlockup detectors count hrtimer
> +# interrupts. This config enables functions managing this common code.
> +config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
>  	bool
>  	select SOFTLOCKUP_DETECTOR
>  
> +config HARDLOCKUP_DETECTOR_PERF
> +	bool
> +	depends on HAVE_HARDLOCKUP_DETECTOR_PERF
> +	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> +
> +config HARDLOCKUP_DETECTOR_BUDDY
> +	bool
> +	depends on SMP
> +	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> +
> +# For hardlockup detectors you can have one directly provided by the arch
> +# or use a "non-arch" one. If you're using a "non-arch" one that is
> +# further divided into the perf hardlockup detector (which, confusingly, needs
> +# arch-provided perf support) and the buddy hardlockup detector (which just
> +# needs SMP). In either case, using the "non-arch" code conflicts with
> +# the NMI watchdog code (which is sometimes used directly and sometimes used
> +# by the arch-provided hardlockup detector).
> +config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> +	bool
> +	depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG
> +	default y
> +
> +config HARDLOCKUP_DETECTOR_PREFER_BUDDY
> +	bool "Prefer the buddy CPU hardlockup detector"
> +	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP

Huh, I have big trouble wrapping my head around this check:

       HAVE_HARDLOCKUP_DETECTOR_NON_ARCH depends on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP

       and this depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH and again
	       on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP.

> +	help
> +	  Say Y here to prefer the buddy hardlockup detector over the perf one.
> +
> +	  With the buddy detector, each CPU uses its softlockup hrtimer
> +	  to check that the next CPU is processing hrtimer interrupts by
> +	  verifying that a counter is increasing.
> +
> +	  This hardlockup detector is useful on systems that don't have
> +	  an arch-specific hardlockup detector or if resources needed
> +	  for the hardlockup detector are better used for other things.
> +
> +# This will select the appropriate non-arch hardlockup detector
> +config HARDLOCKUP_DETECTOR_NON_ARCH
> +	bool
> +	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> +	select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
> +	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
> +
>  #
>  # Enables a timestamp based low pass filter to compensate for perf based
>  # hard lockup detection which runs too fast due to turbo modes.
> @@ -1053,9 +1098,10 @@ config HARDLOCKUP_CHECK_TIMESTAMP
>  config HARDLOCKUP_DETECTOR
>  	bool "Detect Hard Lockups"
>  	depends on DEBUG_KERNEL && !S390

Is there any reason why S390 could not or would not want to use the buddy
hardlockup detector?

> -	depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
> +	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
>  	select LOCKUP_DETECTOR
> -	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
> +	select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH

Anyway, the configuration of the hard lockup detectors is insane and
this patchset makes it even worse, especially the new
HARDLOCKUP_DETECTOR_NON_ARCH stuff.

It seems that sparc, powerpc and s390 are somehow special. Do you
still have in mind how they are distinguished using the Kconfig
variables?

For example, I am pretty confused by the meaning of HAVE_NMI_WATCHDOG.

And sparc has its own variant of
watchdog_hardlockup_enable()/disable(). It means that it is
arch-specific. Does it work with the 13th patch which made
watchdog_hardlockup_enable()/disable() watchdog-hardlockup-type
specific? Is it somehow related to HAVE_NMI_WATCHDOG?
Does this replace the entire watchdog or only the enable part?

I think that we need to make this more straightforward. But I first
need to understand the existing maze of config variables.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 14/18] watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs
  2023-05-25 16:26   ` Petr Mladek
@ 2023-05-25 20:08     ` Doug Anderson
  2023-05-26 12:29       ` Petr Mladek
  0 siblings, 1 reply; 38+ messages in thread
From: Doug Anderson @ 2023-05-25 20:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen,
	Colin Cross

Hi,

On Thu, May 25, 2023 at 9:27 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Fri 2023-05-19 10:18:38, Douglas Anderson wrote:
> > Implement a hardlockup detector that doesn't need any extra
> > arch-specific support code to detect lockups. Instead of using
> > something arch-specific we will use the buddy system, where each CPU
> > watches out for another one. Specifically, each CPU will use its
> > softlockup hrtimer to check that the next CPU is processing hrtimer
> > interrupts by verifying that a counter is increasing.
> >
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -85,7 +85,7 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
> >
> >  #endif /* CONFIG_HARDLOCKUP_DETECTOR */
> >
> > -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> > +#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER)
> >
> >  static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
> >  static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
> > @@ -106,6 +106,14 @@ notrace void arch_touch_nmi_watchdog(void)
> >  }
> >  EXPORT_SYMBOL(arch_touch_nmi_watchdog);
> >
> > +void watchdog_hardlockup_touch_cpu(unsigned int cpu)
> > +{
> > +     per_cpu(watchdog_hardlockup_touched, cpu) = true;
> > +
> > +     /* Match with smp_rmb() in watchdog_hardlockup_check() */
> > +     smp_wmb();
>
> It is great that you described where the related barrier is.
>
> Another important information is what exactly is synchronized.
> And I am actually not sure what we are synchronizing here.
>
> My understanding is that a write barrier should synchronize
> related writes, for example:
>
>         X = ...;
>         /* Make sure that X is modified before Y */
>         smp_wmb();
>         Y = ...;
>
> And the related read barrier should synchronize the related reads,
> for example:
>
>         if (test(Y)) {
>                 /*
>                  * Make sure that we use the updated X when
>                  * we saw the updated Y.
>                  */
>                  smp_rmb();
>                  do_something(X);
>          }
>
> IMHO, we do not need any barrier here because we have only
> one variable "watchdog_hardlockup_touched" here.
> watchdog_hardlockup_check() will either see the updated value
> or not. But it does not synchronize it against any other
> variables or values.

Fair. These barriers were present in the original buddy lockup
detector that we've been carrying in ChromeOS but that doesn't
necessarily mean that they were there for a good reason.

Reasoning about weakly ordered memory always makes my brain hurt and I
never feel confident at the end that I got the right answer and, of
course, this is compounded by the fact that a logic error in my
reasoning might cause a rare / subtle bug. :( When possible I
try to write code that uses full blown locks to make it easier to
reason about (even if less efficient), but that's not necessarily
possible here. While we obviously don't just want to sprinkle barriers
all over the code, IMO it's not a terrible sin to put a barrier in a
case where it makes it easier to reason about the order of things.

In any case, I guess in this case I would worry about some sort of
ordering race when enabling / disabling the buddy lockup detector. At
the end of the buddy's watchdog_hardlockup_enable() /
watchdog_hardlockup_disable() we adjust the "watchdog_cpus" which
changes buddy assignments. Without a barrier, I _think_ it would be
possible for a new CPU to notice the change in buddies without
noticing the touch. Does that match your understanding? Now when
reasoning about CPUs going online/offline we need to consider this
extra case and we have to decide if there's any chance it could lead
to a false lockup detection. With the memory barriers here, it's a
little easier to think about.
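
To be explicit about the ordering I'm worried about, the enable path
is roughly (pseudo-code, simplified from the patch):

	/* watchdog_hardlockup_enable(new_cpu), simplified */
	per_cpu(watchdog_hardlockup_touched, new_cpu) = true;	/* write A */
	smp_wmb();
	cpumask_set_cpu(new_cpu, &watchdog_cpus);		/* write B */

Without the smp_wmb(), the CPU whose buddy becomes new_cpu could
observe write B but not write A and immediately check a CPU whose
hrtimer has never fired.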

Did the above convince you about keeping the barriers? If so, do you
have any suggested comment that would make it clearer?


> > +}
> > +
> >  static bool is_hardlockup(unsigned int cpu)
> >  {
> >       int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > @@ -443,11 +454,15 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >       struct pt_regs *regs = get_irq_regs();
> >       int duration;
> >       int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
> > +     unsigned long hrtimer_interrupts;
> >
> >       if (!watchdog_enabled)
> >               return HRTIMER_NORESTART;
> >
> > -     watchdog_hardlockup_kick();
> > +     hrtimer_interrupts = watchdog_hardlockup_kick();
> > +
> > +     /* test for hardlockups */
>
> I would omit the comment. It is not valid when the perf detector is used.
> And checking the buddy is clear from the function name.
>
> > +     watchdog_buddy_check_hardlockup(hrtimer_interrupts);
>
> I would personally move this into watchdog_hardlockup_kick().
> watchdog_timer_fn() is already complex enough. And checking
> the buddy when kicking a CPU makes sense.

Sure, I'll add that to my list of things to follow-up with.


> Also I would not pass "hrtimer_interrupts". I guess that it is
> just an optimization. It is an extra churn in the code. IMHO,
> it is not worth it. This code does not need to be super optimized.

The main reason I did it is that "hrtimer_interrupts" is static to
watchdog.c now. If I don't pass it in then I have to make it
non-static and add it to the header. That also means anyone looking at
the variable and figuring out how it is read/written needs to go
search for the other places that reference it. I feel like it's cleaner to
just pass it in. If you feel strongly that I should change this then
let me know, but otherwise I'll plan to leave this how I have it.


> >       /* kick the softlockup detector */
> >       if (completion_done(this_cpu_ptr(&softlockup_completion))) {
> > diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
> > new file mode 100644
> > index 000000000000..fee45af2e5bd
> > --- /dev/null
> > +++ b/kernel/watchdog_buddy.c
> > @@ -0,0 +1,93 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/cpu.h>
> > +#include <linux/cpumask.h>
> > +#include <linux/kernel.h>
> > +#include <linux/nmi.h>
> > +#include <linux/percpu-defs.h>
> > +
> > +static cpumask_t __read_mostly watchdog_cpus;
> > +
> > +static unsigned int watchdog_next_cpu(unsigned int cpu)
> > +{
> > +     cpumask_t cpus = watchdog_cpus;
>
> A copy should be done by cpumask_copy().
>
> But the question is why a copy would be needed. When called from
> watchdog_buddy_check_hardlockup(), this function is not synchronized
> against CPU hotplug. And even the copying would be racy.
>
> IMHO, the copy does not help much and we do not need it.
>
> The only important thing is that this function returns
> a valid CPU. And I think that is guaranteed because
> CPU0 cannot be disabled.

Yup, I'll get rid of the copy.


> > +     unsigned int next_cpu;
> > +
> > +     next_cpu = cpumask_next(cpu, &cpus);
> > +     if (next_cpu >= nr_cpu_ids)
> > +             next_cpu = cpumask_first(&cpus);
> > +
> > +     if (next_cpu == cpu)
> > +             return nr_cpu_ids;
> > +     return next_cpu;
> > +}
> > +
> > +int __init watchdog_hardlockup_probe(void)
> > +{
> > +     return 0;
> > +}
> > +
> > +void watchdog_hardlockup_enable(unsigned int cpu)
> > +{
> > +     unsigned int next_cpu;
> > +
> > +     /*
> > +      * The new CPU will be marked online before the hrtimer interrupt
> > +      * gets a chance to run on it. If another CPU tests for a
> > +      * hardlockup on the new CPU before it has run its hrtimer
> > +      * interrupt, it will get a false positive. Touch the watchdog on
> > +      * the new CPU to delay the check for at least 3 sampling periods
> > +      * to guarantee one hrtimer has run on the new CPU.
> > +      */
> > +     watchdog_hardlockup_touch_cpu(cpu);
> > +
> > +     /*
> > +      * We are going to check the next CPU. Our watchdog_hrtimer
> > +      * need not be zero if the CPU has already been online earlier.
> > +      * Touch the watchdog on the next CPU to avoid false positive
> > +      * if we try to check it in less than 3 interrupts.
> > +      */
> > +     next_cpu = watchdog_next_cpu(cpu);
> > +     if (next_cpu < nr_cpu_ids)
> > +             watchdog_hardlockup_touch_cpu(next_cpu);
>
> Thinking loudly:
>
> This feels racy when many CPUs are enabled/disabled in parallel.
> I am not 100% sure it can happen though.
>
> My understanding is that it can't happen because the CPU hotplug
> is serialized by cpu_add_remove_lock.
>
> So, this seems to work after all.
>
> > +
> > +     cpumask_set_cpu(cpu, &watchdog_cpus);
> > +}
> > +
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1035,10 +1035,55 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
> >
> >         Say N if unsure.
> >
> > -config HARDLOCKUP_DETECTOR_PERF
> > +# Both the "perf" and "buddy" hardlockup detectors count hrtimer
> > +# interrupts. This config enables functions managing this common code.
> > +config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> >       bool
> >       select SOFTLOCKUP_DETECTOR
> >
> > +config HARDLOCKUP_DETECTOR_PERF
> > +     bool
> > +     depends on HAVE_HARDLOCKUP_DETECTOR_PERF
> > +     select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> > +
> > +config HARDLOCKUP_DETECTOR_BUDDY
> > +     bool
> > +     depends on SMP
> > +     select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> > +
> > +# For hardlockup detectors you can have one directly provided by the arch
> > +# or use a "non-arch" one. If you're using a "non-arch" one that is
> > +# further divided into the perf hardlockup detector (which, confusingly, needs
> > +# arch-provided perf support) and the buddy hardlockup detector (which just
> > +# needs SMP). In either case, using the "non-arch" code conflicts with
> > +# the NMI watchdog code (which is sometimes used directly and sometimes used
> > +# by the arch-provided hardlockup detector).
> > +config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> > +     bool
> > +     depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG
> > +     default y
> > +
> > +config HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > +     bool "Prefer the buddy CPU hardlockup detector"
> > +     depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP
>
> Huh, I have big trouble wrapping my head around this check:
>
>        HAVE_HARDLOCKUP_DETECTOR_NON_ARCH depends on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP
>
>        and this depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH and again
>                on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP.

The goal is to have "HARDLOCKUP_DETECTOR_PREFER_BUDDY" show up as
an option if there is an option _other_ than the buddy. If there's not
more than one hardlockup detector to pick from then there's no reason
to ask the person configuring the kernel which one they'd prefer. At
the moment, if you have an "arch" lockup detector then you're stuck
with it, so you only get a choice if a "perf" detector is available
and you've got SMP.

Ah, so I guess this could be simplified to:

depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP

OK, I'll add that to the list.
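
That would make the whole entry look like (sketch):

	config HARDLOCKUP_DETECTOR_PREFER_BUDDY
		bool "Prefer the buddy CPU hardlockup detector"
		depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP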


> > +     help
> > +       Say Y here to prefer the buddy hardlockup detector over the perf one.
> > +
> > +       With the buddy detector, each CPU uses its softlockup hrtimer
> > +       to check that the next CPU is processing hrtimer interrupts by
> > +       verifying that a counter is increasing.
> > +
> > +       This hardlockup detector is useful on systems that don't have
> > +       an arch-specific hardlockup detector or if resources needed
> > +       for the hardlockup detector are better used for other things.
> > +
> > +# This will select the appropriate non-arch hardlockup detector
> > +config HARDLOCKUP_DETECTOR_NON_ARCH
> > +     bool
> > +     depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> > +     select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > +     select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > +
> >  #
> >  # Enables a timestamp based low pass filter to compensate for perf based
> >  # hard lockup detection which runs too fast due to turbo modes.
> > @@ -1053,9 +1098,10 @@ config HARDLOCKUP_CHECK_TIMESTAMP
> >  config HARDLOCKUP_DETECTOR
> >       bool "Detect Hard Lockups"
> >       depends on DEBUG_KERNEL && !S390
>
> Is there any reason why S390 could not or would not want to use the buddy
> hardlockup detector?

This isn't a new dependency, but it's a good question. Looking at the
git history, I see commit dea20a3fbdd0 ("[PATCH] Disable
DETECT_SOFTLOCKUP for s390"). ...and it looks like the softlockup
detector still says it's broken on s390. That would mean that the
buddy detector is broken too.


> > -     depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
> > +     depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
> >       select LOCKUP_DETECTOR
> > -     select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
> > +     select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
>
> Anyway, the configuration of the hard lockup detectors is insane and
> this patchset makes it even worse, especially the new
> HARDLOCKUP_DETECTOR_NON_ARCH stuff.
>
> It seems that sparc, powerpc and s390 are somehow special. Do you
> still have in mind how they are distinguished using the Kconfig
> variables?
>
> For example, I am pretty confused by the meaning of HAVE_NMI_WATCHDOG.
>
> And sparc has its own variant of
> watchdog_hardlockup_enable()/disable(). It means that it is
> arch-specific. Does it work with the 13th patch which made
> watchdog_hardlockup_enable()/disable() watchdog-hardlockup-type
> specific? Is it somehow related to HAVE_NMI_WATCHDOG?
> Does this replace the entire watchdog or only the enable part?
>
> I think that we need to make this more straightforward. But I first
> need to understand the existing maze of config variables.

I agree that it's confusing. I'm obviously biased, but IMO it's less
confusing after my patchset than before. ;-) The state of the world
before my patchset set a pretty low bar.

As far as I understand it, at an architecture-level you can choose any
_ONE_ of the following (summarized in a little table after the list):

a) Implement bits needed for the "perf" hardlockup detector. x86
has done this, some configs of powerpc do this, and arm64 now after my
patch series. This is HAVE_HARDLOCKUP_DETECTOR_PERF.

b) Implement your own totally separate hardlockup detector that
doesn't use any of the common "perf" code but still looks the same to
userspace (same sysctls, etc). Only powerpc does this (in some
configs). As per conversations in previous versions of my patch
series, apparently powerpc's version is quite fancy and maybe someday
people can move some of these features to the common code. This is
HAVE_HARDLOCKUP_DETECTOR_ARCH.

c) Don't implement the full features of a hardlockup detector but
still have the basics. At the very least, I think it doesn't support
the sysctls "hardlockup_panic" and "hardlockup_all_cpu_backtrace". It
doesn't support the kernel command line parameter "nmi_watchdog=". I
don't know for sure if there are any other differences. Only sparc64
does this. This is HAVE_NMI_WATCHDOG. Confusingly,
HAVE_HARDLOCKUP_DETECTOR_ARCH selects HAVE_NMI_WATCHDOG.

d) Don't implement _any_ hardlockup detector of any sort. After my
patchset you can still end up with "buddy" if you have SMP.
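
Or, as a little summary table (my shorthand, same info as the list):

	Kconfig symbol                   what the arch provides
	------------------------------   ------------------------------------
	HAVE_HARDLOCKUP_DETECTOR_PERF    bits for the common "perf" detector
	HAVE_HARDLOCKUP_DETECTOR_ARCH    a full custom detector (powerpc)
	HAVE_NMI_WATCHDOG                a basic detector, fewer features (sparc64)
	(none of the above)              no arch detector; "buddy" needs only SMP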


One thing that would probably help would be to bring sparc64 to a full
"arch" hardlockup implementation and then get rid of the special case.
That seems a bit outside my scope, though if someone wanted to post
patches for that I'd be willing to give them a review.

I guess other than that, the best we could try to do is to rename some
configs and/or add some subconfigs to describe certain features? Maybe
HAVE_NMI_WATCHDOG => HAVE_HARDLOCKUP_DETECTOR_ARCH_BASIC_FEATURES
would help? I'd love to come up with a better name for
HAVE_HARDLOCKUP_DETECTOR_NON_ARCH but I couldn't come up with one.
Maybe the unwieldy "HAVE_HARDLOCKUP_DETECTOR_THAT_COUNTS_HRTIMER"?

If you have concrete suggestions for what would be cleaner, let me
know and I can queue up a patch. ...or I'm happy to review a patch.

-Doug

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 12/18] watchdog/hardlockup: Rename some "NMI watchdog" constants/function
  2023-05-24 13:38   ` Petr Mladek
@ 2023-05-25 23:33     ` Doug Anderson
  0 siblings, 0 replies; 38+ messages in thread
From: Doug Anderson @ 2023-05-25 23:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen

Hi,

On Wed, May 24, 2023 at 6:38 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Fri 2023-05-19 10:18:36, Douglas Anderson wrote:
> > Do a search and replace of:
> > - NMI_WATCHDOG_ENABLED => WATCHDOG_HARDLOCKUP_ENABLED
> > - SOFT_WATCHDOG_ENABLED => WATCHDOG_SOFTOCKUP_ENABLED
> > - watchdog_nmi_ => watchdog_hardlockup_
> > - nmi_watchdog_available => watchdog_hardlockup_available
> > - nmi_watchdog_user_enabled => watchdog_hardlockup_user_enabled
> > - soft_watchdog_user_enabled => watchdog_softlockup_user_enabled
> > - NMI_WATCHDOG_DEFAULT => WATCHDOG_HARDLOCKUP_DEFAULT
> >
> > Then update a few comments near where names were changed.
> >
> > This is specifically to make it less confusing when we want to
> > introduce the buddy hardlockup detector, which isn't using NMIs. As
> > part of this, we sanitized a few names for consistency.
> >
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -30,17 +30,17 @@
> >  static DEFINE_MUTEX(watchdog_mutex);
> >
> >  #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
> > -# define NMI_WATCHDOG_DEFAULT        1
> > +# define WATCHDOG_HARDLOCKUP_DEFAULT 1
> >  #else
> > -# define NMI_WATCHDOG_DEFAULT        0
> > +# define WATCHDOG_HARDLOCKUP_DEFAULT 0
> >  #endif
> >
> >  unsigned long __read_mostly watchdog_enabled;
> >  int __read_mostly watchdog_user_enabled = 1;
> > -int __read_mostly nmi_watchdog_user_enabled = NMI_WATCHDOG_DEFAULT;
> > -int __read_mostly soft_watchdog_user_enabled = 1;
> > +int __read_mostly watchdog_hardlockup_user_enabled = WATCHDOG_HARDLOCKUP_DEFAULT;
> > +int __read_mostly watchdog_softlockup_user_enabled = 1;
>
> I still see nmi_watchdog_user_enabled and soft_watchdog_user_enabled
> in include/linux/nmi.h. They are declared there and also mentioned
> in a comment.
>
> It seems that they do not actually need to be declared there.
> I guess that it was needed for the /proc stuff. But it was
> moved into kernel/watchdog.c by commit dd0693fdf054f2ed37
> ("watchdog: move watchdog sysctl interface to watchdog.c").
>
> >  int __read_mostly watchdog_thresh = 10;
> > -static int __read_mostly nmi_watchdog_available;
> > +static int __read_mostly watchdog_hardlockup_available;
> >
> >  struct cpumask watchdog_cpumask __read_mostly;
> >  unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>
> Otherwise, I like the changes.
>
> With the following:
>
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 83076bf70ce8..d14fe345eae9 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -17,8 +17,6 @@ void lockup_detector_soft_poweroff(void);
>  void lockup_detector_cleanup(void);
>
>  extern int watchdog_user_enabled;
> -extern int nmi_watchdog_user_enabled;
> -extern int soft_watchdog_user_enabled;
>  extern int watchdog_thresh;
>  extern unsigned long watchdog_enabled;
>
> @@ -68,8 +66,8 @@ static inline void reset_hung_task_detector(void) { }
>   * 'watchdog_enabled' variable. Each lockup detector has its dedicated bit -
>   * bit 0 for the hard lockup detector and bit 1 for the soft lockup detector.
>   *
> - * 'watchdog_user_enabled', 'nmi_watchdog_user_enabled' and
> - * 'soft_watchdog_user_enabled' are variables that are only used as an
> + * 'watchdog_user_enabled', 'watchdog_hardlockup_user_enabled' and
> + * 'watchdog_softlockup_user_enabled' are variables that are only used as an
>   * 'interface' between the parameters in /proc/sys/kernel and the internal
>   * state bits in 'watchdog_enabled'. The 'watchdog_thresh' variable is
>   * handled differently because its value is not boolean, and the lockup
>
> Reviewed-by: Petr Mladek <pmladek@suse.com>
>
> Even better might be to remove the unused declaration in a separate
> patch. I think that more declarations are not needed after moving
> the /proc stuff into kernel/watchdog.c.
>
> But it might also be fixed later.

Breadcrumbs: I squashed your suggestion together with Tom's patch and
posted the result:

https://lore.kernel.org/r/20230525162822.1.I0fb41d138d158c9230573eaa37dc56afa2fb14ee@changeid

-Doug

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 14/18] watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs
  2023-05-25 20:08     ` Doug Anderson
@ 2023-05-26 12:29       ` Petr Mladek
  0 siblings, 0 replies; 38+ messages in thread
From: Petr Mladek @ 2023-05-26 12:29 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen,
	Colin Cross

On Thu 2023-05-25 13:08:04, Doug Anderson wrote:
> Hi,
> 
> On Thu, May 25, 2023 at 9:27 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Fri 2023-05-19 10:18:38, Douglas Anderson wrote:
> > > Implement a hardlockup detector that doesn't need any extra
> > > arch-specific support code to detect lockups. Instead of using
> > > something arch-specific we will use the buddy system, where each CPU
> > > watches out for another one. Specifically, each CPU will use its
> > > softlockup hrtimer to check that the next CPU is processing hrtimer
> > > interrupts by verifying that a counter is increasing.
> > >
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -85,7 +85,7 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
> > >
> > >  #endif /* CONFIG_HARDLOCKUP_DETECTOR */
> > >
> > > -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> > > +#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER)
> > >
> > >  static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
> > >  static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
> > > @@ -106,6 +106,14 @@ notrace void arch_touch_nmi_watchdog(void)
> > >  }
> > >  EXPORT_SYMBOL(arch_touch_nmi_watchdog);
> > >
> > > +void watchdog_hardlockup_touch_cpu(unsigned int cpu)
> > > +{
> > > +     per_cpu(watchdog_hardlockup_touched, cpu) = true;
> > > +
> > > +     /* Match with smp_rmb() in watchdog_hardlockup_check() */
> > > +     smp_wmb();
> >
> > It is great that you described where the related barrier is.
> >
> > Another important information is what exactly is synchronized.
> > And I am actually not sure what we are synchronizing here.
> >
> > My understanding is that a write barrier should synchronize
> > related writes, for example:
> >
> >         X = ...;
> >         /* Make sure that X is modified before Y */
> >         smp_wmb();
> >         Y = ...;
> >
> > And the related read barrier should synchronize the related reads,
> > for example:
> >
> >         if (test(Y)) {
> >                 /*
> >                  * Make sure that we use the updated X when
> >                  * we saw the updated Y.
> >                  */
> >                  smp_rmb();
> >                  do_something(X);
> >          }
> >
> > IMHO, we do not need any barrier here because we have only
> > one variable "watchdog_hardlockup_touched" here.
> > watchdog_hardlockup_check() will either see the updated value
> > or not. But it does not synchronize it against any other
> > variables or values.
> 
> Fair. These barriers were present in the original buddy lockup
> detector that we've been carrying in ChromeOS but that doesn't
> necessarily mean that they were there for a good reason.
> 
> Reasoning about weakly ordered memory always makes my brain hurt and I
> never feel confident at the end that I got the right answer and, of
> course, this is compounded by the fact that a logic error in my
> reasoning might cause a rare / subtle bug. :(

Sure. Lockless code is complicated.

> When possible I
> try to write code that uses full blown locks to make it easier to
> reason about (even if less efficient),

Makes sense. There should be a good reason to use lockless code
because it is complicated to do it right and maintain.

> but that's not necessarily
> possible here. While we obviously don't just want to sprinkle barriers
> all over the code, IMO it's not a terrible sin to put a barrier in a
> case where it makes it easier to reason about the order of things.

I understand this. Well, it is always important to describe the
reason why the barrier was added there. Even when it is wrong,
it gives a clue about the motivation. Otherwise, it is hard
to make changes to the code later.

I guess that it might be more problematic for you because
you probably are not the original author.

> In any case, I guess in this case I would worry about some sort of
> ordering race when enabling / disabling the buddy lockup detector. At
> the end of the buddy's watchdog_hardlockup_enable() /
> watchdog_hardlockup_disable() we adjust the "watchdog_cpus" which
> changes buddy assignments. Without a barrier, I _think_ it would be
> possible for a new CPU to notice the change in buddies without
> noticing the touch. Does that match your understanding? Now when
> reasoning about CPUs going online/offline we need to consider this
> extra case and we have to decide if there's any chance it could lead
> to a false lockup detection. With the memory barriers here, it's a
> little easier to think about.

This makes sense. I did not think about the hotplug scenario.

Well, I suggest moving the barriers into the buddy code and describing
them there. It does not make sense to use the barriers for the perf
hardlockup.

I mean something like:

diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
index fee45af2e5bd..ebe71dcb55e6 100644
--- a/kernel/watchdog_buddy.c
+++ b/kernel/watchdog_buddy.c
@@ -52,6 +52,13 @@ void watchdog_hardlockup_enable(unsigned int cpu)
 	if (next_cpu < nr_cpu_ids)
 		watchdog_hardlockup_touch_cpu(next_cpu);
 
+	/*
+	 * Make sure that the watchdog is touched on this CPU before
+	 * other CPUs can see it in watchdog_cpus. The counterpart
+	 * is in watchdog_buddy_check_hardlockup().
+	 */
+	smp_wmb();
+
 	cpumask_set_cpu(cpu, &watchdog_cpus);
 }
 
@@ -69,6 +76,13 @@ void watchdog_hardlockup_disable(unsigned int cpu)
 	if (next_cpu < nr_cpu_ids)
 		watchdog_hardlockup_touch_cpu(next_cpu);
 
+	/*
+	 * Make sure that the watchdog is touched on the next CPU before
+	 * this CPU disappears from watchdog_cpus. The counterpart is in
+	 * watchdog_buddy_check_hardlockup().
+	 */
+	smp_wmb();
+
 	cpumask_clear_cpu(cpu, &watchdog_cpus);
 }
 
@@ -89,5 +103,12 @@ void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts)
 	if (next_cpu >= nr_cpu_ids)
 		return;
 
+	/*
+	 * Make sure that the watchdog was touched on the next CPU when
+	 * watchdog_next_cpu() returned a different one because of
+	 * a change in watchdog_hardlockup_enable()/disable().
+	 */
+	smp_rmb();
+
 	watchdog_hardlockup_check(next_cpu, NULL);
 }

> Did the above convince you about keeping the barriers? If so, do you
> have any suggested comment that would make it clearer?
> 
> 
> > > +}
> > > +
> > >  static bool is_hardlockup(unsigned int cpu)
> > >  {
> > >       int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > > @@ -443,11 +454,15 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> > >       struct pt_regs *regs = get_irq_regs();
> > >       int duration;
> > >       int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
> > > +     unsigned long hrtimer_interrupts;
> > >
> > >       if (!watchdog_enabled)
> > >               return HRTIMER_NORESTART;
> > >
> > > -     watchdog_hardlockup_kick();
> > > +     hrtimer_interrupts = watchdog_hardlockup_kick();
> > > +
> > > +     /* test for hardlockups */
> >
> > I would omit the comment. It is not valid when the perf detector is used.
> > And checking the buddy is clear from the function name.
> >
> > > +     watchdog_buddy_check_hardlockup(hrtimer_interrupts);
> >
> > I would personally move this into watchdog_hardlockup_kick().
> > watchdog_timer_fn() is already complex enough. And checking
> > the buddy when kicking a CPU makes sense.
> 
> Sure, I'll add that to my list of things to follow-up with.
> 
> 
> > Also I would not pass "hrtimer_interrupts". I guess that it is
> > just an optimization. It is an extra churn in the code. IMHO,
> > it is not worth it. This code does not need to be super optimized.
> 
> The main reason I did it is that "hrtimer_interrupts" is static to
> watchdog.c now. If I don't pass it in then I have to make it
> non-static and add it to the header. That also means anyone looking at
> the variable and figuring out how it is read/written needs to go
> search for other people that reference it. I feel like it's cleaner to
> just pass it in. If you feel strongly that I should change this then
> let me know, but otherwise I'll plan to leave this how I have it.

I do not have a strong opinion. For me, the more important change is
to move watchdog_buddy_check_hardlockup() into
watchdog_hardlockup_kick(). watchdog_timer_fn() is already too complex.

> 
> > >       /* kick the softlockup detector */
> > >       if (completion_done(this_cpu_ptr(&softlockup_completion))) {
> > > --- a/lib/Kconfig.debug
> > > +++ b/lib/Kconfig.debug
> > > @@ -1035,10 +1035,55 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
> > >
> > >         Say N if unsure.
> > >
> > > -config HARDLOCKUP_DETECTOR_PERF
> > > +# Both the "perf" and "buddy" hardlockup detectors count hrtimer
> > > +# interrupts. This config enables functions managing this common code.
> > > +config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> > >       bool
> > >       select SOFTLOCKUP_DETECTOR
> > >
> > > +config HARDLOCKUP_DETECTOR_PERF
> > > +     bool
> > > +     depends on HAVE_HARDLOCKUP_DETECTOR_PERF
> > > +     select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> > > +
> > > +config HARDLOCKUP_DETECTOR_BUDDY
> > > +     bool
> > > +     depends on SMP
> > > +     select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
> > > +
> > > +# For hardlockup detectors you can have one directly provided by the arch
> > > +# or use a "non-arch" one. If you're using a "non-arch" one that is
> > > +# further divided into the perf hardlockup detector (which, confusingly, needs
> > > +# arch-provided perf support) and the buddy hardlockup detector (which just
> > > +# needs SMP). In either case, using the "non-arch" code conflicts with
> > > +# the NMI watchdog code (which is sometimes used directly and sometimes used
> > > +# by the arch-provided hardlockup detector).
> > > +config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> > > +     bool
> > > +     depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG
> > > +     default y
> > > +
> > > +config HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > > +     bool "Prefer the buddy CPU hardlockup detector"
> > > +     depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP
> >
> > Huh, I have big trouble wrapping my head around this check:
> >
> >        HAVE_HARDLOCKUP_DETECTOR_NON_ARCH depends on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP
> >
> >        and this depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH and again
> >                on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP.
> 
> The goal is to have "HARDLOCKUP_DETECTOR_PREFER_BUDDY" show up as
> an option if there is an option _other_ than the buddy. If there's not
> more than one hardlockup detector to pick from then there's no reason
> to ask the person configuring the kernel which one they'd prefer. At
> the moment, if you have an "arch" lockup detector then you're stuck
> with it, so you only get a choice if a "perf" detector is available
> and you've got SMP.
> 
> Ah, so I guess this could be simplified to:
> 
> depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP

Yes, this is much better.

> OK, I'll add that to the list.
> 
> 
> > > +     help
> > > +       Say Y here to prefer the buddy hardlockup detector over the perf one.
> > > +
> > > +       With the buddy detector, each CPU uses its softlockup hrtimer
> > > +       to check that the next CPU is processing hrtimer interrupts by
> > > +       verifying that a counter is increasing.
> > > +
> > > +       This hardlockup detector is useful on systems that don't have
> > > +       an arch-specific hardlockup detector or if resources needed
> > > +       for the hardlockup detector are better used for other things.
> > > +
> > > +# This will select the appropriate non-arch hardlockup detector
> > > +config HARDLOCKUP_DETECTOR_NON_ARCH
> > > +     bool
> > > +     depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> > > +     select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > > +     select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
> > > +
> > >  #
> > >  # Enables a timestamp based low pass filter to compensate for perf based
> > >  # hard lockup detection which runs too fast due to turbo modes.
> > > @@ -1053,9 +1098,10 @@ config HARDLOCKUP_CHECK_TIMESTAMP
> > >  config HARDLOCKUP_DETECTOR
> > >       bool "Detect Hard Lockups"
> > >       depends on DEBUG_KERNEL && !S390
> >
> > Is there any reason why S390 could not or would not want to use the buddy
> > hardlockup detector?
> 
> This isn't a new dependency, but it's a good question. Looking at the
> git history, I see commit dea20a3fbdd0 ("[PATCH] Disable
> DETECT_SOFTLOCKUP for s390"). ...and it looks like the softlockup
> detector still says it's broken on s390. That would mean that the
> buddy detector is broken too.

It seems that s390 wanted to disable the watchdog completely, see
the commit dea20a3fbdd08e5 ("[PATCH] Disable DETECT_SOFTLOCKUP for s390")
because they got too many false positives.

> 
> > > -     depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
> > > +     depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
> > >       select LOCKUP_DETECTOR
> > > -     select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
> > > +     select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> >
> > Anyway, the configuration of the hard lockup detectors is insane and
> > this patchset makes it even worse, especially the new
> > HARDLOCKUP_DETECTOR_NON_ARCH stuff.
> >
> > It seems that sparc, powerpc and s390 are somehow special. Do you
> > still have in mind how they are distinguished using the Kconfig
> > variables?
> >
> > For example, I am pretty confused by the meaning of HAVE_NMI_WATCHDOG.
> >
> > And sparc has its own variant of
> > watchdog_hardlockup_enable()/disable(). It means that it is
> > arch-specific. Does it work with the 13th patch which made
> > watchdog_hardlockup_enable()/disable() watchdog-hardlockup-type
> > specific? Is it somehow related to HAVE_NMI_WATCHDOG?
> > Does this replace the entire watchdog or only the enable part?
> >
> > I think that we need to make this more straightforward. But I first
> > need to understand the existing maze of config variables.
> 
> I agree that it's confusing. I'm obviously biased, but IMO it's less
> confusing after my patchset than before. ;-) The state of the world
> before my patchset set a pretty low bar.
> 
> As far as I understand it, at an architecture-level you can choose any
> _ONE_ of the following:
> 
> a) Implement bits needed for the "perf" hardlockup detector. x86
> has done this, some configs of powerpc do this, and arm64 now after my
> patch series. This is HAVE_HARDLOCKUP_DETECTOR_PERF.
> 
> b) Implement your own totally separate hardlockup detector that
> doesn't use any of the common "perf" code but still looks the same to
> userspace (same sysctls, etc). Only powerpc does this (in some
> configs). As per conversations in previous versions of my patch
> series, apparently powerpc's version is quite fancy and maybe someday
> people can move some of these features to the common code. This is
> HAVE_HARDLOCKUP_DETECTOR_ARCH.
> 
> c) Don't implement the full features of a hardlockup detector but
> still have the basics. At the very least, I think it doesn't support
> the sysctls "hardlockup_panic" and "hardlockup_all_cpu_backtrace". It
> doesn't support the kernel command line parameter "nmi_watchdog=". I
> don't know for sure if there are any other differences. Only sparc64
> does this. This is HAVE_NMI_WATCHDOG. Confusingly,
> HAVE_HARDLOCKUP_DETECTOR_ARCH selects HAVE_NMI_WATCHDOG.
> 
> d) Don't implement _any_ hardlockup detector of any sort. After my
> patchset you can still end up with "buddy" if you have SMP.
>
> One thing that would probably help would be to bring sparc64 to a full
> "arch" hardlockup implementation and then get rid of the special case.
> That seems a bit outside my scope, though if someone wanted to post
> patches for that I'd be willing to give them a review.

It would be nice but it might be problematic if we do not have
access to the hardware.

> I guess other than that, the best we could try to do is to rename some
> configs and/or add some subconfigs to describe certain features? Maybe
> HAVE_NMI_WATCHDOG => HAVE_HARDLOCKUP_DETECTOR_ARCH_BASIC_FEATURES
> would help? I'd love to come up with a better name for
> HAVE_HARDLOCKUP_DETECTOR_NON_ARCH but I couldn't come up with one.
> Maybe the unwieldy "HAVE_HARDLOCKUP_DETECTOR_THAT_COUNTS_HRTIMER"?

Renaming the config variables seems to be the best solution at the moment.
IMHO, it would be nice to have something like:

  + CONFIG_HARDLOCKUP_DETECTOR for code shared by all hardlockup
    detectors

  + CONFIG_HARDLOCKUP_DETECTOR_PERF for code using kernel/watchdog_perf.c

  + CONFIG_HARDLOCKUP_DETECTOR_BUDDY for code using
	  kernel/watchdog_buddy.c

  + HAVE_HARDLOCKUP_DETECTOR_PERF set by architectures that support
	  using kernel/watchdog_perf.c

  + HAVE_HARDLOCKUP_DETECTOR_ARCH set by architectures that have
	an alternative implementation of the hardlockup detector

  + CONFIG_HARDLOCKUP_DETECTOR_PREFER_BUDDY
	Allow preferring the buddy detector when _PERF or _ARCH
	is available as well.

  + HAVE_HARDLOCKUP_PANIC
  + HAVE_HARDLOCKUP_ALL_CPU_BACKTRACE
	set when the architecture supports these features and
	used for the sysctl interface

> If you have concrete suggestions for what would be cleaner, let me
> know and I can queue up a patch. ...or I'm happy to review a patch.

I am not sure how complicated it would be to rename the config
variables to something sane. I am sorry, but I do not have time to
prepare the patches at the moment.

I'll let Andrew decide whether he requires this cleanup to accept
the patchset.

Best Regards,
Petr

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 15/18] watchdog/perf: Add a weak function for an arch to detect if perf can use NMIs
  2023-05-19 17:18 ` [PATCH v5 15/18] watchdog/perf: Add a weak function for an arch to detect if perf can use NMIs Douglas Anderson
@ 2023-05-26 12:36   ` Petr Mladek
  2023-06-12 10:33   ` Mark Rutland
  1 sibling, 0 replies; 38+ messages in thread
From: Petr Mladek @ 2023-05-26 12:36 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen

On Fri 2023-05-19 10:18:39, Douglas Anderson wrote:
> On arm64, NMI support needs to be detected at runtime. Add a weak
> function to the perf hardlockup detector so that an architecture can
> implement it to detect whether NMIs are available.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> While I won't object to this patch landing, I consider it part of the
> arm64 perf hardlockup effort. I would be OK with the earlier patches
> in the series landing and then not landing ${SUBJECT} patch nor
> anything else later.
> 
> I'll also note that, as an alternative to this, it would be nice if we
> could figure out how to make perf_event_create_kernel_counter() fail
> on arm64 if NMIs aren't available. Maybe we could add a "must_use_nmi"
> element to "struct perf_event_attr"?
> 
> --- a/kernel/watchdog_perf.c
> +++ b/kernel/watchdog_perf.c
> @@ -234,12 +234,22 @@ void __init hardlockup_detector_perf_restart(void)
>  	}
>  }
>  
> +bool __weak __init arch_perf_nmi_is_available(void)
> +{
> +	return true;
> +}
> +
>  /**
>   * watchdog_hardlockup_probe - Probe whether NMI event is available at all
>   */
>  int __init watchdog_hardlockup_probe(void)
>  {
> -	int ret = hardlockup_detector_event_create();
> +	int ret;
> +
> +	if (!arch_perf_nmi_is_available())
> +		return -ENODEV;

My understanding is that this would block the perf hardlockup detector
at runtime. Does it work with the "nmi_watchdog" sysctl? I see
that it is made read-only when the detector is not enabled at build time,
see NMI_WATCHDOG_SYSCTL_PERM.
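
For reference, the permission handling I mean is roughly this (quoting
from memory, so take it as a sketch):

	/* kernel/watchdog.c */
	#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
	# define NMI_WATCHDOG_SYSCTL_PERM	0644
	#else
	# define NMI_WATCHDOG_SYSCTL_PERM	0444
	#endif

i.e. the sysctl is only writable when a hardlockup detector was
enabled at build time.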

> +
> +	ret = hardlockup_detector_event_create();
>  
>  	if (ret) {
>  		pr_info("Perf NMI watchdog permanently disabled\n");

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 13/18] watchdog/hardlockup: Have the perf hardlockup use __weak functions more cleanly
  2023-05-24 19:38     ` Doug Anderson
@ 2023-05-26 14:44       ` Petr Mladek
  0 siblings, 0 replies; 38+ messages in thread
From: Petr Mladek @ 2023-05-26 14:44 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	Mark Rutland, linuxppc-dev, Sumit Garg, npiggin, davem,
	Marc Zyngier, Stephen Boyd, sparclinux, christophe.leroy,
	Catalin Marinas, ravi.v.shankar, Randy Dunlap, Pingfan Liu,
	Guenter Roeck, Lecopzer Chen, Ian Rogers, ito-yuichi,
	ricardo.neri, linux-arm-kernel, linux-perf-users, Will Deacon,
	Chen-Yu Tsai, linux-kernel, Masayoshi Mizuma, Andi Kleen

On Wed 2023-05-24 12:38:49, Doug Anderson wrote:
> Hi,
> 
> On Wed, May 24, 2023 at 6:59 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Fri 2023-05-19 10:18:37, Douglas Anderson wrote:
> > > The fact that watchdog_hardlockup_enable(),
> > > watchdog_hardlockup_disable(), and watchdog_hardlockup_probe() are
> > > declared __weak means that the configured hardlockup detector can
> > > define non-weak versions of those functions if it needs to. Instead of
> > > doing this, the perf hardlockup detector hooked itself into the
> > > default __weak implementation, which was a bit awkward. Clean this up.
> > >
> > > From comments, it looks as if the original design was done because the
> > > __weak functions were expected to be implemented by the architecture and
> > > not by the configured hardlockup detector. This got awkward when we
> > > tried to add the buddy lockup detector which was not arch-specific but
> > > wanted to hook into those same functions.
> > >
> > > This is not expected to have any functional impact.
> > >
> > > @@ -187,27 +187,33 @@ static inline void watchdog_hardlockup_kick(void) { }
> > >  #endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */
> > >
> > >  /*
> > > - * These functions can be overridden if an architecture implements its
> > > - * own hardlockup detector.
> > > + * These functions can be overridden based on the configured hardlockup detector.
> > >   *
> > >   * watchdog_hardlockup_enable/disable can be implemented to start and stop when
> > > - * softlockup watchdog start and stop. The arch must select the
> > > + * the softlockup watchdog starts and stops. The detector must select the
> > >   * SOFTLOCKUP_DETECTOR Kconfig.
> > >   */
> > > -void __weak watchdog_hardlockup_enable(unsigned int cpu)
> > > -{
> > > -     hardlockup_detector_perf_enable();
> > > -}
> > > +void __weak watchdog_hardlockup_enable(unsigned int cpu) { }
> > >
> > > -void __weak watchdog_hardlockup_disable(unsigned int cpu)
> > > -{
> > > -     hardlockup_detector_perf_disable();
> > > -}
> > > +void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
> > >
> > >  /* Return 0, if a hardlockup watchdog is available. Error code otherwise */
> > >  int __weak __init watchdog_hardlockup_probe(void)
> > >  {
> > > -     return hardlockup_detector_perf_init();
> > > +     /*
> > > +      * If CONFIG_HAVE_NMI_WATCHDOG is defined then an architecture
> > > +      * is assumed to have the hard watchdog available and we return 0.
> > > +      */
> > > +     if (IS_ENABLED(CONFIG_HAVE_NMI_WATCHDOG))
> > > +             return 0;
> > > +
> > > +     /*
> > > +      * Hardlockup detectors other than those using CONFIG_HAVE_NMI_WATCHDOG
> > > +      * are required to implement a non-weak version of this probe function
> > > +      * to tell whether they are available. If they don't override then
> > > +      * we'll return -ENODEV.
> > > +      */
> > > +     return -ENODEV;
> > >  }
> >
> > When thinking more about it, it is weird that we need to handle
> > CONFIG_HAVE_NMI_WATCHDOG in this default weak function.
> >
> > It should be handled by the watchdog_hardlockup_probe() implementation
> > in kernel/watchdog_perf.c.
> >
> > IMHO, the default __weak function could always return -ENODEV;
> >
> > Would it make sense, please?
> 
> I don't quite understand. I'd agree that the special case for
> CONFIG_HAVE_NMI_WATCHDOG is ugly, but it was also ugly before. IMO
> it's actually a little less ugly / easier to understand after my
> patch. ...but let me walk through how I think this special case works
> and maybe you can tell me where I'm confused.
> 
> The first thing to understand is that CONFIG_HARDLOCKUP_DETECTOR_PERF
> and CONFIG_HAVE_NMI_WATCHDOG are mutually exclusive. This was true
> before any of my patches and is still true after them.
> Specifically, if CONFIG_HAVE_NMI_WATCHDOG is defined then an
> architecture implements arch_touch_nmi_watchdog() (as documented in
> the Kconfig docs for HAVE_NMI_WATCHDOG). Looking at the tree before my
> series you can see that the perf hardlockup detector also implemented
> arch_touch_nmi_watchdog(). This would have caused a conflict. The
> mutual exclusion was presumably enforced by an architecture not
> defining both HAVE_NMI_WATCHDOG and HAVE_HARDLOCKUP_DETECTOR_PERF.
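> 
> To illustrate the shape of it (hypothetical Kconfig fragments just to
> show the idea, not copied from the tree):
> 
>     # an arch with its own always-available NMI watchdog
>     config SPARC64
>             select HAVE_NMI_WATCHDOG
> 
>     # an arch relying on the perf-based detector instead
>     config X86
>             select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI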
> 
> The second thing to understand is that an architecture that defines
> CONFIG_HAVE_NMI_WATCHDOG is _not_ required to implement
> watchdog_hardlockup_probe() (used to be called watchdog_nmi_probe()).
> Maybe this should change, but at the very least it appears that
> SPARC64 defines HAVE_NMI_WATCHDOG but doesn't define
> watchdog_hardlockup_probe() AKA watchdog_nmi_probe(). Anyone who
> defines CONFIG_HAVE_NMI_WATCHDOG and doesn't implement
> watchdog_hardlockup_probe() is claiming that their watchdog needs no
> probing and is always available.
> 
> So with that context:
> 
> 1. We can't handle any special cases for CONFIG_HAVE_NMI_WATCHDOG in
> "kernel/watchdog_perf.c". The special cases we need to handle are all
> for configurations where CONFIG_HARDLOCKUP_DETECTOR_PERF isn't defined,
> which means "kernel/watchdog_perf.c" isn't compiled at all.
> 
> 2. We can't have the default __weak function return -ENODEV because
> CONFIG_HAVE_NMI_WATCHDOG doesn't require an arch to implement
> watchdog_hardlockup_probe(), but we want watchdog_hardlockup_probe()
> to return "no error" in that case so that
> "watchdog_hardlockup_available" gets set to true.
> 
> Does that sound right?
> 
> I'd agree that a future improvement saying that
> CONFIG_HAVE_NMI_WATCHDOG means you _must_ implement
> watchdog_hardlockup_probe() would make sense and that would allow us
> to get rid of the special case. IMO, though, that's a separate patch.
> I'd be happy to review that patch if you wanted to post it up. :-)
> 
> If we want to add that requirement, I _think_ the only thing you'd
> need to do is add watchdog_hardlockup_probe() to sparc64, have it
> return 0, and put that definition in the same file that contains
> arch_touch_nmi_watchdog().

This is my understanding. IMHO, if we define
watchdog_hardlockup_probe() in arch/sparc/kernel/nmi.c
then we could remove the CONFIG_HAVE_NMI_WATCHDOG check from
the default watchdog_hardlockup_probe().

Honestly, I am afraid that nobody really thought about any rules.
People were just adding their stuff any way that worked for them.
And this is why we ended up with this maze.

It is not your fault. You just moved the ifdef from the header file
into the function definition. But it showed very well how ugly it was.

By ugly, I mean that the powerpc, perf, and buddy hardlockup detectors
make watchdog_hardlockup_probe() succeed by overriding
the default weak implementation. Only sparc64
reports success via a crazy check in the default weak implementation.

The check is crazy because it makes the decision based on
CONFIG_HAVE_NMI_WATCHDOG. But the related code is in
arch/sparc/kernel/nmi.c, which is compiled only when CONFIG_SPARC64
is enabled.

The connection between CONFIG_HAVE_NMI_WATCHDOG and
CONFIG_SPARC64 is hidden in arch/sparc/Kconfig.

It would be much more straightforward if the weak function were
overridden in arch/sparc/kernel/nmi.c. It would then be clear
that probe() succeeds when the watchdog gets initialized.
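
Something like this in arch/sparc/kernel/nmi.c would make that
explicit (an untested sketch, not a real patch):

	/* sparc64's NMI watchdog needs no probing; it is always available */
	int __init watchdog_hardlockup_probe(void)
	{
		return 0;
	}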


> powerpc also gets CONFIG_HAVE_NMI_WATCHDOG
> as a side effect of selecting CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH, but
> it looks like they implement watchdog_hardlockup_probe() already. Oh,
> but maybe this will fix a preexisting minor bug (one that existed
> before my patches)... Unless I'm missing something (entirely
> possible!), on powerpc today if you turn off CONFIG_PPC_WATCHDOG then
> CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH and CONFIG_HAVE_NMI_WATCHDOG
> would still be defined and we'd end up returning 0 (no error) from
> watchdog_hardlockup_probe(). That means that on powerpc today, if you
> turn off CONFIG_PPC_WATCHDOG, '/proc/sys/kernel/nmi_watchdog' will
> still think the watchdog is enabled?

Yeah, it seems that this bug was there. And it will get fixed when
the default weak implementation of watchdog_hardlockup_probe()
always returns -ENODEV.
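
That is, the end state would be just (sketch):

	/* kernel/watchdog.c: the default when no detector overrides it */
	int __weak __init watchdog_hardlockup_probe(void)
	{
		return -ENODEV;
	}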

Again, I'll let Andrew decide whether this should get cleaned up
in this patchset or later. But it would be fine to fix it now that
we have spent so much time understanding the mess.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 15/18] watchdog/perf: Add a weak function for an arch to detect if perf can use NMIs
  2023-05-19 17:18 ` [PATCH v5 15/18] watchdog/perf: Add a weak function for an arch to detect if perf can use NMIs Douglas Anderson
  2023-05-26 12:36   ` Petr Mladek
@ 2023-06-12 10:33   ` Mark Rutland
  2023-06-12 13:55     ` Doug Anderson
  1 sibling, 1 reply; 38+ messages in thread
From: Mark Rutland @ 2023-06-12 10:33 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Petr Mladek, Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	linuxppc-dev, Sumit Garg, npiggin, davem, Marc Zyngier,
	Stephen Boyd, sparclinux, christophe.leroy, Catalin Marinas,
	ravi.v.shankar, Randy Dunlap, Pingfan Liu, Guenter Roeck,
	Lecopzer Chen, Ian Rogers, ito-yuichi, ricardo.neri,
	linux-arm-kernel, linux-perf-users, Will Deacon, Chen-Yu Tsai,
	linux-kernel, Masayoshi Mizuma, Andi Kleen

On Fri, May 19, 2023 at 10:18:39AM -0700, Douglas Anderson wrote:
> On arm64, NMI support needs to be detected at runtime. Add a weak
> function to the perf hardlockup detector so that an architecture can
> implement it to detect whether NMIs are available.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> While I won't object to this patch landing, I consider it part of the
> arm64 perf hardlockup effort. I would be OK with the earlier patches
> in the series landing and then not landing ${SUBJECT} patch nor
> anything else later.

FWIW, everything prior to this looks fine to me, so I reckon it'd be worth
splitting the series here and getting the buddy lockup detector in first, to
avoid a log-jam on all the subsequent NMI bits.

Thanks,
Mark.

> I'll also note that, as an alternative to this, it would be nice if we
> could figure out how to make perf_event_create_kernel_counter() fail
> on arm64 if NMIs aren't available. Maybe we could add a "must_use_nmi"
> element to "struct perf_event_attr"?
> 
> (no changes since v4)
> 
> Changes in v4:
> - ("Add a weak function for an arch to detect ...") new for v4.
> 
>  include/linux/nmi.h    |  1 +
>  kernel/watchdog_perf.c | 12 +++++++++++-
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 47db14e7da52..eb616fc07c85 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -210,6 +210,7 @@ static inline bool trigger_single_cpu_backtrace(int cpu)
>  
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
>  u64 hw_nmi_get_sample_period(int watchdog_thresh);
> +bool arch_perf_nmi_is_available(void);
>  #endif
>  
>  #if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && \
> diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
> index 349fcd4d2abc..8ea00c4a24b2 100644
> --- a/kernel/watchdog_perf.c
> +++ b/kernel/watchdog_perf.c
> @@ -234,12 +234,22 @@ void __init hardlockup_detector_perf_restart(void)
>  	}
>  }
>  
> +bool __weak __init arch_perf_nmi_is_available(void)
> +{
> +	return true;
> +}
> +
>  /**
>   * watchdog_hardlockup_probe - Probe whether NMI event is available at all
>   */
>  int __init watchdog_hardlockup_probe(void)
>  {
> -	int ret = hardlockup_detector_event_create();
> +	int ret;
> +
> +	if (!arch_perf_nmi_is_available())
> +		return -ENODEV;
> +
> +	ret = hardlockup_detector_event_create();
>  
>  	if (ret) {
>  		pr_info("Perf NMI watchdog permanently disabled\n");
> -- 
> 2.40.1.698.g37aff9b760-goog
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 15/18] watchdog/perf: Add a weak function for an arch to detect if perf can use NMIs
  2023-06-12 10:33   ` Mark Rutland
@ 2023-06-12 13:55     ` Doug Anderson
  2023-06-12 13:59       ` Mark Rutland
  0 siblings, 1 reply; 38+ messages in thread
From: Doug Anderson @ 2023-06-12 13:55 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Petr Mladek, Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	linuxppc-dev, Sumit Garg, npiggin, davem, Marc Zyngier,
	Stephen Boyd, sparclinux, christophe.leroy, Catalin Marinas,
	ravi.v.shankar, Randy Dunlap, Pingfan Liu, Guenter Roeck,
	Lecopzer Chen, Ian Rogers, ito-yuichi, ricardo.neri,
	linux-arm-kernel, linux-perf-users, Will Deacon, Chen-Yu Tsai,
	linux-kernel, Masayoshi Mizuma, Andi Kleen

Mark,

On Mon, Jun 12, 2023 at 3:33 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Fri, May 19, 2023 at 10:18:39AM -0700, Douglas Anderson wrote:
> > On arm64, NMI support needs to be detected at runtime. Add a weak
> > function to the perf hardlockup detector so that an architecture can
> > implement it to detect whether NMIs are available.
> >
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> > While I won't object to this patch landing, I consider it part of the
> > arm64 perf hardlockup effort. I would be OK with the earlier patches
> > in the series landing and then not landing ${SUBJECT} patch nor
> > anything else later.
>
> FWIW, everything prior to this looks fine to me, so I reckon it'd be worth
> splitting the series here and getting the buddy lockup detector in first, to
> avoid a log-jam on all the subsequent NMI bits.

I think the whole series has already landed in Andrew's tree,
including the arm64 "perf" lockup detector bits. I saw all the
notifications from Andrew go through over the weekend that they were
moved from an "unstable" branch to a "stable" one and I see them at:

https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/log/?h=mm-nonmm-stable

When I first saw Andrew land the arm64 perf lockup detector bits in
his unstable branch several weeks ago, I sent a private message to the
arm64 maintainers (yourself included) to make sure you were aware of
it and that it hadn't been caught in mail filters. I got the
impression that everything was OK. Is that not the case?


-Doug

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v5 15/18] watchdog/perf: Add a weak function for an arch to detect if perf can use NMIs
  2023-06-12 13:55     ` Doug Anderson
@ 2023-06-12 13:59       ` Mark Rutland
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Rutland @ 2023-06-12 13:59 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Petr Mladek, Andrew Morton, Matthias Kaehlcke, kgdb-bugreport,
	Stephane Eranian, mpe, Tzung-Bi Shih, Daniel Thompson,
	linuxppc-dev, Sumit Garg, npiggin, davem, Marc Zyngier,
	Stephen Boyd, sparclinux, christophe.leroy, Catalin Marinas,
	ravi.v.shankar, Randy Dunlap, Pingfan Liu, Guenter Roeck,
	Lecopzer Chen, Ian Rogers, ito-yuichi, ricardo.neri,
	linux-arm-kernel, linux-perf-users, Will Deacon, Chen-Yu Tsai,
	linux-kernel, Masayoshi Mizuma, Andi Kleen

On Mon, Jun 12, 2023 at 06:55:37AM -0700, Doug Anderson wrote:
> Mark,
> 
> On Mon, Jun 12, 2023 at 3:33 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Fri, May 19, 2023 at 10:18:39AM -0700, Douglas Anderson wrote:
> > > On arm64, NMI support needs to be detected at runtime. Add a weak
> > > function to the perf hardlockup detector so that an architecture can
> > > implement it to detect whether NMIs are available.
> > >
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > > While I won't object to this patch landing, I consider it part of the
> > > arm64 perf hardlockup effort. I would be OK with the earlier patches
> > > in the series landing and then not landing ${SUBJECT} patch nor
> > > anything else later.
> >
> > FWIW, everything prior to this looks fine to me, so I reckon it'd be worth
> > splitting the series here and getting the buddy lockup detector in first, to
> > avoid a log-jam on all the subsequent NMI bits.
> 
> I think the whole series has already landed in Andrew's tree,
> including the arm64 "perf" lockup detector bits. I saw all the
> notifications from Andrew go through over the weekend that they were
> moved from an "unstable" branch to a "stable" one and I see them at:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/log/?h=mm-nonmm-stable
> 
> When I first saw Anderw land the arm64 perf lockup detector bits in
> his unstable branch several weeks ago, I sent a private message to the
> arm64 maintainers (yourself included) to make sure you were aware of
> it and that it hadn't been caught in mail filters. I got the
> impression that everything was OK. Is that not the case?

Sorry; I'm slowly catching up with a backlog of email, and I'm just behind.

Feel free to ignore this; sorry for the noise!

If we spot anything going wrong in testing we can look at fixing those up.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2023-06-12 14:00 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-19 17:18 [PATCH v5 00/18] watchdog/hardlockup: Add the buddy hardlockup detector Douglas Anderson
2023-05-19 17:18 ` [PATCH v5 01/18] watchdog/perf: Define dummy watchdog_update_hrtimer_threshold() on correct config Douglas Anderson
2023-05-19 17:18 ` [PATCH v5 02/18] watchdog/perf: More properly prevent false positives with turbo modes Douglas Anderson
2023-05-23  9:35   ` Petr Mladek
2023-05-19 17:18 ` [PATCH v5 03/18] watchdog: remove WATCHDOG_DEFAULT Douglas Anderson
2023-05-19 17:18 ` [PATCH v5 04/18] watchdog/hardlockup: change watchdog_nmi_enable() to void Douglas Anderson
2023-05-19 17:18 ` [PATCH v5 05/18] watchdog/perf: Ensure CPU-bound context when creating hardlockup detector event Douglas Anderson
2023-05-19 17:18 ` [PATCH v5 06/18] watchdog/hardlockup: Add comments to touch_nmi_watchdog() Douglas Anderson
2023-05-23  9:58   ` Petr Mladek
2023-05-19 17:18 ` [PATCH v5 07/18] watchdog/perf: Rename watchdog_hld.c to watchdog_perf.c Douglas Anderson
2023-05-19 17:18 ` [PATCH v5 08/18] watchdog/hardlockup: Move perf hardlockup checking/panic to common watchdog.c Douglas Anderson
2023-05-23 11:45   ` Petr Mladek
2023-05-19 17:18 ` [PATCH v5 09/18] watchdog/hardlockup: Style changes to watchdog_hardlockup_check() / is_hardlockup() Douglas Anderson
2023-05-19 17:18 ` [PATCH v5 10/18] watchdog/hardlockup: Add a "cpu" param to watchdog_hardlockup_check() Douglas Anderson
2023-05-23 16:02   ` Petr Mladek
2023-05-23 16:34     ` Doug Anderson
2023-05-24 11:36       ` Petr Mladek
2023-05-19 17:18 ` [PATCH v5 11/18] watchdog/hardlockup: Move perf hardlockup watchdog petting to watchdog.c Douglas Anderson
2023-05-24 13:07   ` Petr Mladek
2023-05-19 17:18 ` [PATCH v5 12/18] watchdog/hardlockup: Rename some "NMI watchdog" constants/function Douglas Anderson
2023-05-24 13:38   ` Petr Mladek
2023-05-25 23:33     ` Doug Anderson
2023-05-19 17:18 ` [PATCH v5 13/18] watchdog/hardlockup: Have the perf hardlockup use __weak functions more cleanly Douglas Anderson
2023-05-24 13:59   ` Petr Mladek
2023-05-24 19:38     ` Doug Anderson
2023-05-26 14:44       ` Petr Mladek
2023-05-19 17:18 ` [PATCH v5 14/18] watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs Douglas Anderson
2023-05-25 16:26   ` Petr Mladek
2023-05-25 20:08     ` Doug Anderson
2023-05-26 12:29       ` Petr Mladek
2023-05-19 17:18 ` [PATCH v5 15/18] watchdog/perf: Add a weak function for an arch to detect if perf can use NMIs Douglas Anderson
2023-05-26 12:36   ` Petr Mladek
2023-06-12 10:33   ` Mark Rutland
2023-06-12 13:55     ` Doug Anderson
2023-06-12 13:59       ` Mark Rutland
2023-05-19 17:18 ` [PATCH v5 16/18] watchdog/perf: Adapt the watchdog_perf interface for async model Douglas Anderson
2023-05-19 17:18 ` [PATCH v5 17/18] arm64: add hw_nmi_get_sample_period for preparation of lockup detector Douglas Anderson
2023-05-19 17:18 ` [PATCH v5 18/18] arm64: Enable perf events based hard " Douglas Anderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).