All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/10] watchdog: Cleanup / fixes after buddy series v5 reviews
@ 2023-05-27  1:41 ` Douglas Anderson
  0 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, Michael Ellerman,
	linuxppc-dev, Christophe Leroy, sparclinux, David S . Miller,
	linux-perf-users, Douglas Anderson


This patch series attempts to finish resolving the feedback received
from Petr Mladek on the v5 series I posted. Andrew has already landed
v5 so I'm posting this as additional patches.

Probably the only thing that wasn't fully as clean as Petr requested
was the Kconfig stuff. I couldn't find a better way to express it
without a more major overhaul. In the very least, I renamed "NON_ARCH"
to "PERF_OR_BUDDY" in the hopes that will make it marginally better.

Nothing in this series is terribly critical and even the bugfixes are
small. However, it does cleanup a few things that were pointed out in
review.


Douglas Anderson (10):
  watchdog/hardlockup: Keep kernel.nmi_watchdog sysctl as 0444 if probe
    fails
  watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement
    watchdog_hardlockup_probe()
  watchdog/hardlockup: Don't use raw_cpu_ptr() in
    watchdog_hardlockup_kick()
  watchdog/hardlockup: In watchdog_hardlockup_check() use cpumask_copy()
  watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
  watchdog/buddy: Cleanup how watchdog_buddy_check_hardlockup() is
    called
  watchdog/buddy: Don't copy the cpumask in watchdog_next_cpu()
  watchdog/buddy: Simplify the dependency for
    HARDLOCKUP_DETECTOR_PREFER_BUDDY
  watchdog/hardlockup: Move SMP barriers from common code to buddy code
  watchdog/hardlockup: Rename HAVE_HARDLOCKUP_DETECTOR_NON_ARCH to
    ..._PERF_OR_BUDDY

 arch/Kconfig            |  3 +-
 arch/sparc/kernel/nmi.c |  5 +++
 include/linux/nmi.h     | 14 ++-------
 kernel/watchdog.c       | 68 ++++++++++++++++++-----------------------
 kernel/watchdog_buddy.c | 28 ++++++++++++++---
 lib/Kconfig.debug       | 14 ++++-----
 6 files changed, 70 insertions(+), 62 deletions(-)

-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 00/10] watchdog: Cleanup / fixes after buddy series v5 reviews
@ 2023-05-27  1:41 ` Douglas Anderson
  0 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Douglas Anderson, linux-perf-users,
	Nicholas Piggin, sparclinux, linuxppc-dev, David S . Miller


This patch series attempts to finish resolving the feedback received
from Petr Mladek on the v5 series I posted. Andrew has already landed
v5 so I'm posting this as additional patches.

Probably the only thing that wasn't fully as clean as Petr requested
was the Kconfig stuff. I couldn't find a better way to express it
without a more major overhaul. In the very least, I renamed "NON_ARCH"
to "PERF_OR_BUDDY" in the hopes that will make it marginally better.

Nothing in this series is terribly critical and even the bugfixes are
small. However, it does cleanup a few things that were pointed out in
review.


Douglas Anderson (10):
  watchdog/hardlockup: Keep kernel.nmi_watchdog sysctl as 0444 if probe
    fails
  watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement
    watchdog_hardlockup_probe()
  watchdog/hardlockup: Don't use raw_cpu_ptr() in
    watchdog_hardlockup_kick()
  watchdog/hardlockup: In watchdog_hardlockup_check() use cpumask_copy()
  watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
  watchdog/buddy: Cleanup how watchdog_buddy_check_hardlockup() is
    called
  watchdog/buddy: Don't copy the cpumask in watchdog_next_cpu()
  watchdog/buddy: Simplify the dependency for
    HARDLOCKUP_DETECTOR_PREFER_BUDDY
  watchdog/hardlockup: Move SMP barriers from common code to buddy code
  watchdog/hardlockup: Rename HAVE_HARDLOCKUP_DETECTOR_NON_ARCH to
    ..._PERF_OR_BUDDY

 arch/Kconfig            |  3 +-
 arch/sparc/kernel/nmi.c |  5 +++
 include/linux/nmi.h     | 14 ++-------
 kernel/watchdog.c       | 68 ++++++++++++++++++-----------------------
 kernel/watchdog_buddy.c | 28 ++++++++++++++---
 lib/Kconfig.debug       | 14 ++++-----
 6 files changed, 70 insertions(+), 62 deletions(-)

-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 01/10] watchdog/hardlockup: Keep kernel.nmi_watchdog sysctl as 0444 if probe fails
  2023-05-27  1:41 ` Douglas Anderson
@ 2023-05-27  1:41   ` Douglas Anderson
  -1 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, Michael Ellerman,
	linuxppc-dev, Christophe Leroy, sparclinux, David S . Miller,
	linux-perf-users, Douglas Anderson

The permissions for the kernel.nmi_watchdog sysctl have always been
set at compile time despite the fact that a watchdog can fail to
probe. Let's fix this and set the permissions based on whether the
hardlockup detector actually probed.

Fixes: a994a3147e4c ("watchdog/hardlockup/perf: Implement init time detection of perf")
Reported-by: Petr Mladek <pmladek@suse.com>
Closes: https://lore.kernel.org/r/ZHCn4hNxFpY5-9Ki@alley
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 include/linux/nmi.h |  6 ------
 kernel/watchdog.c   | 30 ++++++++++++++++++++----------
 2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 333465e235e1..3a27169ec383 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -95,12 +95,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
 static inline void arch_touch_nmi_watchdog(void) { }
 #endif
 
-#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
-# define NMI_WATCHDOG_SYSCTL_PERM	0644
-#else
-# define NMI_WATCHDOG_SYSCTL_PERM	0444
-#endif
-
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 237990e8d345..4b9e31edb47f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -880,15 +880,6 @@ static struct ctl_table watchdog_sysctls[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= (void *)&sixty,
 	},
-	{
-		.procname       = "nmi_watchdog",
-		.data		= &watchdog_hardlockup_user_enabled,
-		.maxlen		= sizeof(int),
-		.mode		= NMI_WATCHDOG_SYSCTL_PERM,
-		.proc_handler   = proc_nmi_watchdog,
-		.extra1		= SYSCTL_ZERO,
-		.extra2		= SYSCTL_ONE,
-	},
 	{
 		.procname	= "watchdog_cpumask",
 		.data		= &watchdog_cpumask_bits,
@@ -952,10 +943,28 @@ static struct ctl_table watchdog_sysctls[] = {
 	{}
 };
 
+static struct ctl_table watchdog_hardlockup_sysctl[] = {
+	{
+		.procname       = "nmi_watchdog",
+		.data		= &watchdog_hardlockup_user_enabled,
+		.maxlen		= sizeof(int),
+		.mode		= 0444,
+		.proc_handler   = proc_nmi_watchdog,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+	{}
+};
+
 static void __init watchdog_sysctl_init(void)
 {
 	register_sysctl_init("kernel", watchdog_sysctls);
+
+	if (watchdog_hardlockup_available)
+		watchdog_hardlockup_sysctl[0].mode = 0644;
+	register_sysctl_init("kernel", watchdog_hardlockup_sysctl);
 }
+
 #else
 #define watchdog_sysctl_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
@@ -1011,6 +1020,8 @@ static int __init lockup_detector_check(void)
 	/* Make sure no work is pending. */
 	flush_work(&detector_work);
 
+	watchdog_sysctl_init();
+
 	return 0;
 
 }
@@ -1030,5 +1041,4 @@ void __init lockup_detector_init(void)
 		allow_lockup_detector_init_retry = true;
 
 	lockup_detector_setup();
-	watchdog_sysctl_init();
 }
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 01/10] watchdog/hardlockup: Keep kernel.nmi_watchdog sysctl as 0444 if probe fails
@ 2023-05-27  1:41   ` Douglas Anderson
  0 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Douglas Anderson, linux-perf-users,
	Nicholas Piggin, sparclinux, linuxppc-dev, David S . Miller

The permissions for the kernel.nmi_watchdog sysctl have always been
set at compile time despite the fact that a watchdog can fail to
probe. Let's fix this and set the permissions based on whether the
hardlockup detector actually probed.

Fixes: a994a3147e4c ("watchdog/hardlockup/perf: Implement init time detection of perf")
Reported-by: Petr Mladek <pmladek@suse.com>
Closes: https://lore.kernel.org/r/ZHCn4hNxFpY5-9Ki@alley
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 include/linux/nmi.h |  6 ------
 kernel/watchdog.c   | 30 ++++++++++++++++++++----------
 2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 333465e235e1..3a27169ec383 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -95,12 +95,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
 static inline void arch_touch_nmi_watchdog(void) { }
 #endif
 
-#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
-# define NMI_WATCHDOG_SYSCTL_PERM	0644
-#else
-# define NMI_WATCHDOG_SYSCTL_PERM	0444
-#endif
-
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 237990e8d345..4b9e31edb47f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -880,15 +880,6 @@ static struct ctl_table watchdog_sysctls[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= (void *)&sixty,
 	},
-	{
-		.procname       = "nmi_watchdog",
-		.data		= &watchdog_hardlockup_user_enabled,
-		.maxlen		= sizeof(int),
-		.mode		= NMI_WATCHDOG_SYSCTL_PERM,
-		.proc_handler   = proc_nmi_watchdog,
-		.extra1		= SYSCTL_ZERO,
-		.extra2		= SYSCTL_ONE,
-	},
 	{
 		.procname	= "watchdog_cpumask",
 		.data		= &watchdog_cpumask_bits,
@@ -952,10 +943,28 @@ static struct ctl_table watchdog_sysctls[] = {
 	{}
 };
 
+static struct ctl_table watchdog_hardlockup_sysctl[] = {
+	{
+		.procname       = "nmi_watchdog",
+		.data		= &watchdog_hardlockup_user_enabled,
+		.maxlen		= sizeof(int),
+		.mode		= 0444,
+		.proc_handler   = proc_nmi_watchdog,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+	{}
+};
+
 static void __init watchdog_sysctl_init(void)
 {
 	register_sysctl_init("kernel", watchdog_sysctls);
+
+	if (watchdog_hardlockup_available)
+		watchdog_hardlockup_sysctl[0].mode = 0644;
+	register_sysctl_init("kernel", watchdog_hardlockup_sysctl);
 }
+
 #else
 #define watchdog_sysctl_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
@@ -1011,6 +1020,8 @@ static int __init lockup_detector_check(void)
 	/* Make sure no work is pending. */
 	flush_work(&detector_work);
 
+	watchdog_sysctl_init();
+
 	return 0;
 
 }
@@ -1030,5 +1041,4 @@ void __init lockup_detector_init(void)
 		allow_lockup_detector_init_retry = true;
 
 	lockup_detector_setup();
-	watchdog_sysctl_init();
 }
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 02/10] watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
  2023-05-27  1:41 ` Douglas Anderson
@ 2023-05-27  1:41   ` Douglas Anderson
  -1 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, Michael Ellerman,
	linuxppc-dev, Christophe Leroy, sparclinux, David S . Miller,
	linux-perf-users, Douglas Anderson

Right now there is one arch (sparc64) that selects HAVE_NMI_WATCHDOG
without selecting HAVE_HARDLOCKUP_DETECTOR_ARCH. Because of that one
architecture, we have some special case code in the watchdog core to
handle the fact that watchdog_hardlockup_probe() isn't implemented.

Let's implement watchdog_hardlockup_probe() for sparc64 and get rid of
the special case.

As a side effect of doing this, code inspection tells us that we could
fix a minor bug where the system won't properly realize that NMI
watchdogs are disabled. Specifically, on powerpc if
CONFIG_PPC_WATCHDOG is turned off the arch might still select
CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH which selects
CONFIG_HAVE_NMI_WATCHDOG. Since CONFIG_PPC_WATCHDOG was off then
nothing will override the "weak" watchdog_hardlockup_probe() and we'll
fallback to looking at CONFIG_HAVE_NMI_WATCHDOG.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
Though this does fix a minor bug, I didn't mark this as "Fixes"
because it's super minor. One could also argue that this wasn't a bug
at all but simply was never an implemented feature. The code that
added some amount of dynamicness here was commit a994a3147e4c
("watchdog/hardlockup/perf: Implement init time detection of perf")
which, as per the title, was only intending to make "perf"
dynamic. The old NMI watchdog presumably has never been handled
dynamically.

 arch/Kconfig            |  3 ++-
 arch/sparc/kernel/nmi.c |  5 +++++
 kernel/watchdog.c       | 13 -------------
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 64d771855ecd..b4f6387b12fe 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -428,7 +428,8 @@ config HAVE_NMI_WATCHDOG
 	bool
 	help
 	  The arch provides a low level NMI watchdog. It provides
-	  asm/nmi.h, and defines its own arch_touch_nmi_watchdog().
+	  asm/nmi.h, and defines its own watchdog_hardlockup_probe() and
+	  arch_touch_nmi_watchdog().
 
 config HAVE_HARDLOCKUP_DETECTOR_ARCH
 	bool
diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 9d9e29b75c43..17cdfdbf1f3b 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -65,6 +65,11 @@ void arch_touch_nmi_watchdog(void)
 }
 EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 
+int __init watchdog_hardlockup_probe(void)
+{
+	return 0;
+}
+
 static void die_nmi(const char *str, struct pt_regs *regs, int do_panic)
 {
 	int this_cpu = smp_processor_id();
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 4b9e31edb47f..62230f5b8878 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -217,19 +217,6 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
  */
 int __weak __init watchdog_hardlockup_probe(void)
 {
-	/*
-	 * If CONFIG_HAVE_NMI_WATCHDOG is defined then an architecture
-	 * is assumed to have the hard watchdog available and we return 0.
-	 */
-	if (IS_ENABLED(CONFIG_HAVE_NMI_WATCHDOG))
-		return 0;
-
-	/*
-	 * Hardlockup detectors other than those using CONFIG_HAVE_NMI_WATCHDOG
-	 * are required to implement a non-weak version of this probe function
-	 * to tell whether they are available. If they don't override then
-	 * we'll return -ENODEV.
-	 */
 	return -ENODEV;
 }
 
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 02/10] watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
@ 2023-05-27  1:41   ` Douglas Anderson
  0 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Douglas Anderson, linux-perf-users,
	Nicholas Piggin, sparclinux, linuxppc-dev, David S . Miller

Right now there is one arch (sparc64) that selects HAVE_NMI_WATCHDOG
without selecting HAVE_HARDLOCKUP_DETECTOR_ARCH. Because of that one
architecture, we have some special case code in the watchdog core to
handle the fact that watchdog_hardlockup_probe() isn't implemented.

Let's implement watchdog_hardlockup_probe() for sparc64 and get rid of
the special case.

As a side effect of doing this, code inspection tells us that we could
fix a minor bug where the system won't properly realize that NMI
watchdogs are disabled. Specifically, on powerpc if
CONFIG_PPC_WATCHDOG is turned off the arch might still select
CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH which selects
CONFIG_HAVE_NMI_WATCHDOG. Since CONFIG_PPC_WATCHDOG was off then
nothing will override the "weak" watchdog_hardlockup_probe() and we'll
fallback to looking at CONFIG_HAVE_NMI_WATCHDOG.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
Though this does fix a minor bug, I didn't mark this as "Fixes"
because it's super minor. One could also argue that this wasn't a bug
at all but simply was never an implemented feature. The code that
added some amount of dynamicness here was commit a994a3147e4c
("watchdog/hardlockup/perf: Implement init time detection of perf")
which, as per the title, was only intending to make "perf"
dynamic. The old NMI watchdog presumably has never been handled
dynamically.

 arch/Kconfig            |  3 ++-
 arch/sparc/kernel/nmi.c |  5 +++++
 kernel/watchdog.c       | 13 -------------
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 64d771855ecd..b4f6387b12fe 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -428,7 +428,8 @@ config HAVE_NMI_WATCHDOG
 	bool
 	help
 	  The arch provides a low level NMI watchdog. It provides
-	  asm/nmi.h, and defines its own arch_touch_nmi_watchdog().
+	  asm/nmi.h, and defines its own watchdog_hardlockup_probe() and
+	  arch_touch_nmi_watchdog().
 
 config HAVE_HARDLOCKUP_DETECTOR_ARCH
 	bool
diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 9d9e29b75c43..17cdfdbf1f3b 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -65,6 +65,11 @@ void arch_touch_nmi_watchdog(void)
 }
 EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 
+int __init watchdog_hardlockup_probe(void)
+{
+	return 0;
+}
+
 static void die_nmi(const char *str, struct pt_regs *regs, int do_panic)
 {
 	int this_cpu = smp_processor_id();
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 4b9e31edb47f..62230f5b8878 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -217,19 +217,6 @@ void __weak watchdog_hardlockup_disable(unsigned int cpu) { }
  */
 int __weak __init watchdog_hardlockup_probe(void)
 {
-	/*
-	 * If CONFIG_HAVE_NMI_WATCHDOG is defined then an architecture
-	 * is assumed to have the hard watchdog available and we return 0.
-	 */
-	if (IS_ENABLED(CONFIG_HAVE_NMI_WATCHDOG))
-		return 0;
-
-	/*
-	 * Hardlockup detectors other than those using CONFIG_HAVE_NMI_WATCHDOG
-	 * are required to implement a non-weak version of this probe function
-	 * to tell whether they are available. If they don't override then
-	 * we'll return -ENODEV.
-	 */
 	return -ENODEV;
 }
 
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 03/10] watchdog/hardlockup: Don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
  2023-05-27  1:41 ` Douglas Anderson
@ 2023-05-27  1:41   ` Douglas Anderson
  -1 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, Michael Ellerman,
	linuxppc-dev, Christophe Leroy, sparclinux, David S . Miller,
	linux-perf-users, Douglas Anderson

In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") there was no reason to use
raw_cpu_ptr(). Using this_cpu_ptr() works fine.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 kernel/watchdog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 62230f5b8878..32dac8028753 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -133,7 +133,7 @@ static bool is_hardlockup(unsigned int cpu)
 
 static unsigned long watchdog_hardlockup_kick(void)
 {
-	return atomic_inc_return(raw_cpu_ptr(&hrtimer_interrupts));
+	return atomic_inc_return(this_cpu_ptr(&hrtimer_interrupts));
 }
 
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 03/10] watchdog/hardlockup: Don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
@ 2023-05-27  1:41   ` Douglas Anderson
  0 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Douglas Anderson, linux-perf-users,
	Nicholas Piggin, sparclinux, linuxppc-dev, David S . Miller

In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") there was no reason to use
raw_cpu_ptr(). Using this_cpu_ptr() works fine.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 kernel/watchdog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 62230f5b8878..32dac8028753 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -133,7 +133,7 @@ static bool is_hardlockup(unsigned int cpu)
 
 static unsigned long watchdog_hardlockup_kick(void)
 {
-	return atomic_inc_return(raw_cpu_ptr(&hrtimer_interrupts));
+	return atomic_inc_return(this_cpu_ptr(&hrtimer_interrupts));
 }
 
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 04/10] watchdog/hardlockup: In watchdog_hardlockup_check() use cpumask_copy()
  2023-05-27  1:41 ` Douglas Anderson
@ 2023-05-27  1:41   ` Douglas Anderson
  -1 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, Michael Ellerman,
	linuxppc-dev, Christophe Leroy, sparclinux, David S . Miller,
	linux-perf-users, Douglas Anderson

In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") we started using a cpumask to keep track
of which CPUs to backtrace. When setting up this cpumask, it's better
to use cpumask_copy() than to just copy the structure directly. Fix this.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 kernel/watchdog.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 32dac8028753..85f4839b6faf 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -154,7 +154,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 	 */
 	if (is_hardlockup(cpu)) {
 		unsigned int this_cpu = smp_processor_id();
-		struct cpumask backtrace_mask = *cpu_online_mask;
+		struct cpumask backtrace_mask;
+
+		cpumask_copy(&backtrace_mask, cpu_online_mask);
 
 		/* Only print hardlockups once. */
 		if (per_cpu(watchdog_hardlockup_warned, cpu))
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 04/10] watchdog/hardlockup: In watchdog_hardlockup_check() use cpumask_copy()
@ 2023-05-27  1:41   ` Douglas Anderson
  0 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Douglas Anderson, linux-perf-users,
	Nicholas Piggin, sparclinux, linuxppc-dev, David S . Miller

In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") we started using a cpumask to keep track
of which CPUs to backtrace. When setting up this cpumask, it's better
to use cpumask_copy() than to just copy the structure directly. Fix this.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 kernel/watchdog.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 32dac8028753..85f4839b6faf 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -154,7 +154,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 	 */
 	if (is_hardlockup(cpu)) {
 		unsigned int this_cpu = smp_processor_id();
-		struct cpumask backtrace_mask = *cpu_online_mask;
+		struct cpumask backtrace_mask;
+
+		cpumask_copy(&backtrace_mask, cpu_online_mask);
 
 		/* Only print hardlockups once. */
 		if (per_cpu(watchdog_hardlockup_warned, cpu))
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 05/10] watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
  2023-05-27  1:41 ` Douglas Anderson
@ 2023-05-27  1:41   ` Douglas Anderson
  -1 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, Michael Ellerman,
	linuxppc-dev, Christophe Leroy, sparclinux, David S . Miller,
	linux-perf-users, Douglas Anderson

In the patch ("watchdog/hardlockup: add comments to
touch_nmi_watchdog()") we adjusted some comments for
touch_nmi_watchdog(). The comment about the softlockup had a typo and
were also felt to be too obvious. Remove it.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 include/linux/nmi.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 3a27169ec383..6c6a5ce77c66 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -140,10 +140,6 @@ static inline void touch_nmi_watchdog(void)
 	 */
 	arch_touch_nmi_watchdog();
 
-	/*
-	 * Touching the hardlock detector implcitily pets the
-	 * softlockup detector too
-	 */
 	touch_softlockup_watchdog();
 }
 
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 05/10] watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
@ 2023-05-27  1:41   ` Douglas Anderson
  0 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Douglas Anderson, linux-perf-users,
	Nicholas Piggin, sparclinux, linuxppc-dev, David S . Miller

In the patch ("watchdog/hardlockup: add comments to
touch_nmi_watchdog()") we adjusted some comments for
touch_nmi_watchdog(). The comment about the softlockup had a typo and
were also felt to be too obvious. Remove it.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 include/linux/nmi.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 3a27169ec383..6c6a5ce77c66 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -140,10 +140,6 @@ static inline void touch_nmi_watchdog(void)
 	 */
 	arch_touch_nmi_watchdog();
 
-	/*
-	 * Touching the hardlock detector implcitily pets the
-	 * softlockup detector too
-	 */
 	touch_softlockup_watchdog();
 }
 
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 06/10] watchdog/buddy: Cleanup how watchdog_buddy_check_hardlockup() is called
  2023-05-27  1:41 ` Douglas Anderson
@ 2023-05-27  1:41   ` Douglas Anderson
  -1 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, Michael Ellerman,
	linuxppc-dev, Christophe Leroy, sparclinux, David S . Miller,
	linux-perf-users, Douglas Anderson

In the patch ("watchdog/hardlockup: detect hard lockups using
secondary (buddy) CPUs"), we added a call from the common watchdog.c
file into the buddy. That call could be done more cleanly.
Specifically:
1. If we move the call into watchdog_hardlockup_kick() then it keeps
   watchdog_timer_fn() simpler.
2. We don't need to pass an "unsigned long" to the buddy for the timer
   count. In the patch ("watchdog/hardlockup: add a "cpu" param to
   watchdog_hardlockup_check()") the count was changed to "atomic_t"
   which is backed by an int, so we should match types.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 include/linux/nmi.h     |  4 ++--
 kernel/watchdog.c       | 15 +++++++--------
 kernel/watchdog_buddy.c |  2 +-
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 6c6a5ce77c66..43893e5f858a 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -114,9 +114,9 @@ void watchdog_hardlockup_disable(unsigned int cpu);
 void lockup_detector_reconfigure(void);
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY
-void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts);
+void watchdog_buddy_check_hardlockup(int hrtimer_interrupts);
 #else
-static inline void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts) {}
+static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {}
 #endif
 
 /**
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 85f4839b6faf..6cc46b8e3d07 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -131,9 +131,12 @@ static bool is_hardlockup(unsigned int cpu)
 	return false;
 }
 
-static unsigned long watchdog_hardlockup_kick(void)
+static void watchdog_hardlockup_kick(void)
 {
-	return atomic_inc_return(this_cpu_ptr(&hrtimer_interrupts));
+	int new_interrupts;
+
+	new_interrupts = atomic_inc_return(this_cpu_ptr(&hrtimer_interrupts));
+	watchdog_buddy_check_hardlockup(new_interrupts);
 }
 
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
@@ -195,7 +198,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 
 #else /* CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */
 
-static inline unsigned long watchdog_hardlockup_kick(void) { return 0; }
+static inline void watchdog_hardlockup_kick(void) { }
 
 #endif /* !CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */
 
@@ -449,15 +452,11 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	struct pt_regs *regs = get_irq_regs();
 	int duration;
 	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
-	unsigned long hrtimer_interrupts;
 
 	if (!watchdog_enabled)
 		return HRTIMER_NORESTART;
 
-	hrtimer_interrupts = watchdog_hardlockup_kick();
-
-	/* test for hardlockups */
-	watchdog_buddy_check_hardlockup(hrtimer_interrupts);
+	watchdog_hardlockup_kick();
 
 	/* kick the softlockup detector */
 	if (completion_done(this_cpu_ptr(&softlockup_completion))) {
diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
index fee45af2e5bd..3ffc5f2ede5a 100644
--- a/kernel/watchdog_buddy.c
+++ b/kernel/watchdog_buddy.c
@@ -72,7 +72,7 @@ void watchdog_hardlockup_disable(unsigned int cpu)
 	cpumask_clear_cpu(cpu, &watchdog_cpus);
 }
 
-void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts)
+void watchdog_buddy_check_hardlockup(int hrtimer_interrupts)
 {
 	unsigned int next_cpu;
 
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 06/10] watchdog/buddy: Cleanup how watchdog_buddy_check_hardlockup() is called
@ 2023-05-27  1:41   ` Douglas Anderson
  0 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Douglas Anderson, linux-perf-users,
	Nicholas Piggin, sparclinux, linuxppc-dev, David S . Miller

In the patch ("watchdog/hardlockup: detect hard lockups using
secondary (buddy) CPUs"), we added a call from the common watchdog.c
file into the buddy. That call could be done more cleanly.
Specifically:
1. If we move the call into watchdog_hardlockup_kick() then it keeps
   watchdog_timer_fn() simpler.
2. We don't need to pass an "unsigned long" to the buddy for the timer
   count. In the patch ("watchdog/hardlockup: add a "cpu" param to
   watchdog_hardlockup_check()") the count was changed to "atomic_t"
   which is backed by an int, so we should match types.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 include/linux/nmi.h     |  4 ++--
 kernel/watchdog.c       | 15 +++++++--------
 kernel/watchdog_buddy.c |  2 +-
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 6c6a5ce77c66..43893e5f858a 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -114,9 +114,9 @@ void watchdog_hardlockup_disable(unsigned int cpu);
 void lockup_detector_reconfigure(void);
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY
-void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts);
+void watchdog_buddy_check_hardlockup(int hrtimer_interrupts);
 #else
-static inline void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts) {}
+static inline void watchdog_buddy_check_hardlockup(int hrtimer_interrupts) {}
 #endif
 
 /**
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 85f4839b6faf..6cc46b8e3d07 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -131,9 +131,12 @@ static bool is_hardlockup(unsigned int cpu)
 	return false;
 }
 
-static unsigned long watchdog_hardlockup_kick(void)
+static void watchdog_hardlockup_kick(void)
 {
-	return atomic_inc_return(this_cpu_ptr(&hrtimer_interrupts));
+	int new_interrupts;
+
+	new_interrupts = atomic_inc_return(this_cpu_ptr(&hrtimer_interrupts));
+	watchdog_buddy_check_hardlockup(new_interrupts);
 }
 
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
@@ -195,7 +198,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 
 #else /* CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */
 
-static inline unsigned long watchdog_hardlockup_kick(void) { return 0; }
+static inline void watchdog_hardlockup_kick(void) { }
 
 #endif /* !CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */
 
@@ -449,15 +452,11 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	struct pt_regs *regs = get_irq_regs();
 	int duration;
 	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
-	unsigned long hrtimer_interrupts;
 
 	if (!watchdog_enabled)
 		return HRTIMER_NORESTART;
 
-	hrtimer_interrupts = watchdog_hardlockup_kick();
-
-	/* test for hardlockups */
-	watchdog_buddy_check_hardlockup(hrtimer_interrupts);
+	watchdog_hardlockup_kick();
 
 	/* kick the softlockup detector */
 	if (completion_done(this_cpu_ptr(&softlockup_completion))) {
diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
index fee45af2e5bd..3ffc5f2ede5a 100644
--- a/kernel/watchdog_buddy.c
+++ b/kernel/watchdog_buddy.c
@@ -72,7 +72,7 @@ void watchdog_hardlockup_disable(unsigned int cpu)
 	cpumask_clear_cpu(cpu, &watchdog_cpus);
 }
 
-void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts)
+void watchdog_buddy_check_hardlockup(int hrtimer_interrupts)
 {
 	unsigned int next_cpu;
 
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 07/10] watchdog/buddy: Don't copy the cpumask in watchdog_next_cpu()
  2023-05-27  1:41 ` Douglas Anderson
@ 2023-05-27  1:41   ` Douglas Anderson
  -1 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, Michael Ellerman,
	linuxppc-dev, Christophe Leroy, sparclinux, David S . Miller,
	linux-perf-users, Douglas Anderson

There's no reason to make a copy of the "watchdog_cpus" locally in
watchdog_next_cpu(). Making a copy wouldn't make things any more race
free and we're just reading the value so there's no need for a copy.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 kernel/watchdog_buddy.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
index 3ffc5f2ede5a..2ef88722c5e7 100644
--- a/kernel/watchdog_buddy.c
+++ b/kernel/watchdog_buddy.c
@@ -10,12 +10,11 @@ static cpumask_t __read_mostly watchdog_cpus;
 
 static unsigned int watchdog_next_cpu(unsigned int cpu)
 {
-	cpumask_t cpus = watchdog_cpus;
 	unsigned int next_cpu;
 
-	next_cpu = cpumask_next(cpu, &cpus);
+	next_cpu = cpumask_next(cpu, &watchdog_cpus);
 	if (next_cpu >= nr_cpu_ids)
-		next_cpu = cpumask_first(&cpus);
+		next_cpu = cpumask_first(&watchdog_cpus);
 
 	if (next_cpu == cpu)
 		return nr_cpu_ids;
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 07/10] watchdog/buddy: Don't copy the cpumask in watchdog_next_cpu()
@ 2023-05-27  1:41   ` Douglas Anderson
  0 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Douglas Anderson, linux-perf-users,
	Nicholas Piggin, sparclinux, linuxppc-dev, David S . Miller

There's no reason to make a copy of the "watchdog_cpus" locally in
watchdog_next_cpu(). Making a copy wouldn't make things any more race
free and we're just reading the value so there's no need for a copy.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 kernel/watchdog_buddy.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
index 3ffc5f2ede5a..2ef88722c5e7 100644
--- a/kernel/watchdog_buddy.c
+++ b/kernel/watchdog_buddy.c
@@ -10,12 +10,11 @@ static cpumask_t __read_mostly watchdog_cpus;
 
 static unsigned int watchdog_next_cpu(unsigned int cpu)
 {
-	cpumask_t cpus = watchdog_cpus;
 	unsigned int next_cpu;
 
-	next_cpu = cpumask_next(cpu, &cpus);
+	next_cpu = cpumask_next(cpu, &watchdog_cpus);
 	if (next_cpu >= nr_cpu_ids)
-		next_cpu = cpumask_first(&cpus);
+		next_cpu = cpumask_first(&watchdog_cpus);
 
 	if (next_cpu == cpu)
 		return nr_cpu_ids;
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 08/10] watchdog/buddy: Simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY
  2023-05-27  1:41 ` Douglas Anderson
@ 2023-05-27  1:41   ` Douglas Anderson
  -1 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, Michael Ellerman,
	linuxppc-dev, Christophe Leroy, sparclinux, David S . Miller,
	linux-perf-users, Douglas Anderson

The dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY was more
complicated than it needed to be. If the "perf" detector is available
and we have SMP then we have a choice, so enable the config based on
just those two config items.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 lib/Kconfig.debug | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c9df93402237..eb1edd5905bc 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1065,7 +1065,7 @@ config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
 
 config HARDLOCKUP_DETECTOR_PREFER_BUDDY
 	bool "Prefer the buddy CPU hardlockup detector"
-	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP
+	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP
 	help
 	  Say Y here to prefer the buddy hardlockup detector over the perf one.
 
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 08/10] watchdog/buddy: Simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY
@ 2023-05-27  1:41   ` Douglas Anderson
  0 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Douglas Anderson, linux-perf-users,
	Nicholas Piggin, sparclinux, linuxppc-dev, David S . Miller

The dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY was more
complicated than it needed to be. If the "perf" detector is available
and we have SMP then we have a choice, so enable the config based on
just those two config items.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 lib/Kconfig.debug | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c9df93402237..eb1edd5905bc 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1065,7 +1065,7 @@ config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
 
 config HARDLOCKUP_DETECTOR_PREFER_BUDDY
 	bool "Prefer the buddy CPU hardlockup detector"
-	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP
+	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP
 	help
 	  Say Y here to prefer the buddy hardlockup detector over the perf one.
 
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 09/10] watchdog/hardlockup: Move SMP barriers from common code to buddy code
  2023-05-27  1:41 ` Douglas Anderson
@ 2023-05-27  1:41   ` Douglas Anderson
  -1 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, Michael Ellerman,
	linuxppc-dev, Christophe Leroy, sparclinux, David S . Miller,
	linux-perf-users, Douglas Anderson

It's been suggested that since the SMP barriers are only potentially
useful for the buddy hardlockup detector, not the perf hardlockup
detector, that the barriers belong in the buddy code. Let's move them
and add clearer comments about why they're needed.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 kernel/watchdog.c       |  6 ------
 kernel/watchdog_buddy.c | 21 +++++++++++++++++++++
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 6cc46b8e3d07..a351ab0c35eb 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -109,9 +109,6 @@ EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 void watchdog_hardlockup_touch_cpu(unsigned int cpu)
 {
 	per_cpu(watchdog_hardlockup_touched, cpu) = true;
-
-	/* Match with smp_rmb() in watchdog_hardlockup_check() */
-	smp_wmb();
 }
 
 static bool is_hardlockup(unsigned int cpu)
@@ -141,9 +138,6 @@ static void watchdog_hardlockup_kick(void)
 
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
-	/* Match with smp_wmb() in watchdog_hardlockup_touch_cpu() */
-	smp_rmb();
-
 	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
 		per_cpu(watchdog_hardlockup_touched, cpu) = false;
 		return;
diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
index 2ef88722c5e7..34dbfe091f4b 100644
--- a/kernel/watchdog_buddy.c
+++ b/kernel/watchdog_buddy.c
@@ -51,6 +51,13 @@ void watchdog_hardlockup_enable(unsigned int cpu)
 	if (next_cpu < nr_cpu_ids)
 		watchdog_hardlockup_touch_cpu(next_cpu);
 
+	/*
+	 * Makes sure that watchdog is touched on this CPU before
+	 * other CPUs could see it in watchdog_cpus. The counter
+	 * part is in watchdog_buddy_check_hardlockup().
+	 */
+	smp_wmb();
+
 	cpumask_set_cpu(cpu, &watchdog_cpus);
 }
 
@@ -68,6 +75,13 @@ void watchdog_hardlockup_disable(unsigned int cpu)
 	if (next_cpu < nr_cpu_ids)
 		watchdog_hardlockup_touch_cpu(next_cpu);
 
+	/*
+	 * Makes sure that watchdog is touched on the next CPU before
+	 * this CPU disappear in watchdog_cpus. The counter part is in
+	 * watchdog_buddy_check_hardlockup().
+	 */
+	smp_wmb();
+
 	cpumask_clear_cpu(cpu, &watchdog_cpus);
 }
 
@@ -88,5 +102,12 @@ void watchdog_buddy_check_hardlockup(int hrtimer_interrupts)
 	if (next_cpu >= nr_cpu_ids)
 		return;
 
+	/*
+	 * Make sure that the watchdog was touched on next CPU when
+	 * watchdog_next_cpu() returned another one because of
+	 * a change in watchdog_hardlockup_enable()/disable().
+	 */
+	smp_rmb();
+
 	watchdog_hardlockup_check(next_cpu, NULL);
 }
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 09/10] watchdog/hardlockup: Move SMP barriers from common code to buddy code
@ 2023-05-27  1:41   ` Douglas Anderson
  0 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Douglas Anderson, linux-perf-users,
	Nicholas Piggin, sparclinux, linuxppc-dev, David S . Miller

It's been suggested that since the SMP barriers are only potentially
useful for the buddy hardlockup detector, not the perf hardlockup
detector, that the barriers belong in the buddy code. Let's move them
and add clearer comments about why they're needed.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 kernel/watchdog.c       |  6 ------
 kernel/watchdog_buddy.c | 21 +++++++++++++++++++++
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 6cc46b8e3d07..a351ab0c35eb 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -109,9 +109,6 @@ EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 void watchdog_hardlockup_touch_cpu(unsigned int cpu)
 {
 	per_cpu(watchdog_hardlockup_touched, cpu) = true;
-
-	/* Match with smp_rmb() in watchdog_hardlockup_check() */
-	smp_wmb();
 }
 
 static bool is_hardlockup(unsigned int cpu)
@@ -141,9 +138,6 @@ static void watchdog_hardlockup_kick(void)
 
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
-	/* Match with smp_wmb() in watchdog_hardlockup_touch_cpu() */
-	smp_rmb();
-
 	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
 		per_cpu(watchdog_hardlockup_touched, cpu) = false;
 		return;
diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c
index 2ef88722c5e7..34dbfe091f4b 100644
--- a/kernel/watchdog_buddy.c
+++ b/kernel/watchdog_buddy.c
@@ -51,6 +51,13 @@ void watchdog_hardlockup_enable(unsigned int cpu)
 	if (next_cpu < nr_cpu_ids)
 		watchdog_hardlockup_touch_cpu(next_cpu);
 
+	/*
+	 * Makes sure that watchdog is touched on this CPU before
+	 * other CPUs could see it in watchdog_cpus. The counter
+	 * part is in watchdog_buddy_check_hardlockup().
+	 */
+	smp_wmb();
+
 	cpumask_set_cpu(cpu, &watchdog_cpus);
 }
 
@@ -68,6 +75,13 @@ void watchdog_hardlockup_disable(unsigned int cpu)
 	if (next_cpu < nr_cpu_ids)
 		watchdog_hardlockup_touch_cpu(next_cpu);
 
+	/*
+	 * Makes sure that watchdog is touched on the next CPU before
+	 * this CPU disappear in watchdog_cpus. The counter part is in
+	 * watchdog_buddy_check_hardlockup().
+	 */
+	smp_wmb();
+
 	cpumask_clear_cpu(cpu, &watchdog_cpus);
 }
 
@@ -88,5 +102,12 @@ void watchdog_buddy_check_hardlockup(int hrtimer_interrupts)
 	if (next_cpu >= nr_cpu_ids)
 		return;
 
+	/*
+	 * Make sure that the watchdog was touched on next CPU when
+	 * watchdog_next_cpu() returned another one because of
+	 * a change in watchdog_hardlockup_enable()/disable().
+	 */
+	smp_rmb();
+
 	watchdog_hardlockup_check(next_cpu, NULL);
 }
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 10/10] watchdog/hardlockup: Rename HAVE_HARDLOCKUP_DETECTOR_NON_ARCH to ..._PERF_OR_BUDDY
  2023-05-27  1:41 ` Douglas Anderson
@ 2023-05-27  1:41   ` Douglas Anderson
  -1 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, Michael Ellerman,
	linuxppc-dev, Christophe Leroy, sparclinux, David S . Miller,
	linux-perf-users, Douglas Anderson

HAVE_HARDLOCKUP_DETECTOR_NON_ARCH is a mouthful and
confusing. HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY is even more of a
mouthful, but probably less confusing. Rename the Kconfig names.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 lib/Kconfig.debug | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index eb1edd5905bc..b9e162698a82 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1058,7 +1058,7 @@ config HARDLOCKUP_DETECTOR_BUDDY
 # needs SMP). In either case, using the "non-arch" code conflicts with
 # the NMI watchdog code (which is sometimes used directly and sometimes used
 # by the arch-provided hardlockup detector).
-config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
+config HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
 	bool
 	depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG
 	default y
@@ -1077,10 +1077,10 @@ config HARDLOCKUP_DETECTOR_PREFER_BUDDY
 	  an arch-specific hardlockup detector or if resources needed
 	  for the hardlockup detector are better used for other things.
 
-# This will select the appropriate non-arch hardlockdup detector
-config HARDLOCKUP_DETECTOR_NON_ARCH
+# This will select the appropriate non-arch hardlockup detector
+config HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
 	bool
-	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
+	depends on HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
 	select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
 	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
 
@@ -1098,9 +1098,9 @@ config HARDLOCKUP_CHECK_TIMESTAMP
 config HARDLOCKUP_DETECTOR
 	bool "Detect Hard Lockups"
 	depends on DEBUG_KERNEL && !S390
-	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
+	depends on HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH
 	select LOCKUP_DETECTOR
-	select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
+	select HARDLOCKUP_DETECTOR_PERF_OR_BUDDY if HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
 
 	help
 	  Say Y here to enable the kernel to act as a watchdog to detect
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 10/10] watchdog/hardlockup: Rename HAVE_HARDLOCKUP_DETECTOR_NON_ARCH to ..._PERF_OR_BUDDY
@ 2023-05-27  1:41   ` Douglas Anderson
  0 siblings, 0 replies; 42+ messages in thread
From: Douglas Anderson @ 2023-05-27  1:41 UTC (permalink / raw)
  To: Petr Mladek, Andrew Morton
  Cc: kgdb-bugreport, linux-kernel, Douglas Anderson, linux-perf-users,
	Nicholas Piggin, sparclinux, linuxppc-dev, David S . Miller

HAVE_HARDLOCKUP_DETECTOR_NON_ARCH is a mouthful and
confusing. HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY is even more of a
mouthful, but probably less confusing. Rename the Kconfig names.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 lib/Kconfig.debug | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index eb1edd5905bc..b9e162698a82 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1058,7 +1058,7 @@ config HARDLOCKUP_DETECTOR_BUDDY
 # needs SMP). In either case, using the "non-arch" code conflicts with
 # the NMI watchdog code (which is sometimes used directly and sometimes used
 # by the arch-provided hardlockup detector).
-config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
+config HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
 	bool
 	depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG
 	default y
@@ -1077,10 +1077,10 @@ config HARDLOCKUP_DETECTOR_PREFER_BUDDY
 	  an arch-specific hardlockup detector or if resources needed
 	  for the hardlockup detector are better used for other things.
 
-# This will select the appropriate non-arch hardlockdup detector
-config HARDLOCKUP_DETECTOR_NON_ARCH
+# This will select the appropriate non-arch hardlockup detector
+config HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
 	bool
-	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
+	depends on HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
 	select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
 	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
 
@@ -1098,9 +1098,9 @@ config HARDLOCKUP_CHECK_TIMESTAMP
 config HARDLOCKUP_DETECTOR
 	bool "Detect Hard Lockups"
 	depends on DEBUG_KERNEL && !S390
-	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
+	depends on HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH
 	select LOCKUP_DETECTOR
-	select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
+	select HARDLOCKUP_DETECTOR_PERF_OR_BUDDY if HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
 
 	help
 	  Say Y here to enable the kernel to act as a watchdog to detect
-- 
2.41.0.rc0.172.g3f132b7071-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 01/10] watchdog/hardlockup: Keep kernel.nmi_watchdog sysctl as 0444 if probe fails
  2023-05-27  1:41   ` Douglas Anderson
@ 2023-05-30 14:15     ` Petr Mladek
  -1 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:15 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, kgdb-bugreport, linux-kernel, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev, Christophe Leroy, sparclinux,
	David S . Miller, linux-perf-users

On Fri 2023-05-26 18:41:31, Douglas Anderson wrote:
> The permissions for the kernel.nmi_watchdog sysctl have always been
> set at compile time despite the fact that a watchdog can fail to
> probe. Let's fix this and set the permissions based on whether the
> hardlockup detector actually probed.
> 
> Fixes: a994a3147e4c ("watchdog/hardlockup/perf: Implement init time detection of perf")
> Reported-by: Petr Mladek <pmladek@suse.com>
> Closes: https://lore.kernel.org/r/ZHCn4hNxFpY5-9Ki@alley
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Looks good to me:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 01/10] watchdog/hardlockup: Keep kernel.nmi_watchdog sysctl as 0444 if probe fails
@ 2023-05-30 14:15     ` Petr Mladek
  0 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:15 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, linux-perf-users,
	sparclinux, Andrew Morton, linuxppc-dev, David S . Miller

On Fri 2023-05-26 18:41:31, Douglas Anderson wrote:
> The permissions for the kernel.nmi_watchdog sysctl have always been
> set at compile time despite the fact that a watchdog can fail to
> probe. Let's fix this and set the permissions based on whether the
> hardlockup detector actually probed.
> 
> Fixes: a994a3147e4c ("watchdog/hardlockup/perf: Implement init time detection of perf")
> Reported-by: Petr Mladek <pmladek@suse.com>
> Closes: https://lore.kernel.org/r/ZHCn4hNxFpY5-9Ki@alley
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Looks good to me:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/10] watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
  2023-05-27  1:41   ` Douglas Anderson
@ 2023-05-30 14:38     ` Petr Mladek
  -1 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:38 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, kgdb-bugreport, linux-kernel, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev, Christophe Leroy, sparclinux,
	David S . Miller, linux-perf-users

On Fri 2023-05-26 18:41:32, Douglas Anderson wrote:
> Right now there is one arch (sparc64) that selects HAVE_NMI_WATCHDOG
> without selecting HAVE_HARDLOCKUP_DETECTOR_ARCH. Because of that one
> architecture, we have some special case code in the watchdog core to
> handle the fact that watchdog_hardlockup_probe() isn't implemented.
> 
> Let's implement watchdog_hardlockup_probe() for sparc64 and get rid of
> the special case.
> 
> As a side effect of doing this, code inspection tells us that we could
> fix a minor bug where the system won't properly realize that NMI
> watchdogs are disabled. Specifically, on powerpc if
> CONFIG_PPC_WATCHDOG is turned off the arch might still select
> CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH which selects
> CONFIG_HAVE_NMI_WATCHDOG. Since CONFIG_PPC_WATCHDOG was off then
> nothing will override the "weak" watchdog_hardlockup_probe() and we'll
> fallback to looking at CONFIG_HAVE_NMI_WATCHDOG.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Looks good:

Reviewed-by: Petr Mladek <pmladek@suse.com>

> ---
> Though this does fix a minor bug, I didn't mark this as "Fixes"
> because it's super minor. One could also argue that this wasn't a bug
> at all but simply was never an implemented feature. The code that
> added some amount of dynamicness here was commit a994a3147e4c
> ("watchdog/hardlockup/perf: Implement init time detection of perf")
> which, as per the title, was only intending to make "perf"
> dynamic. The old NMI watchdog presumably has never been handled
> dynamically.

I agree that it is a minor bug and Fixes tag is not needed.

Another motivation is that all the dependencies are quite
complicated. IMHO, it is not worth spending time on
reviewing potential backports.

That said, I am afraid that the artificial intelligence will nominate
this patch for stable backports even without the Fixes tag.
You know, there are the words "fix" and "bug" in the commit message ;-)

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/10] watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
@ 2023-05-30 14:38     ` Petr Mladek
  0 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:38 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, linux-perf-users,
	sparclinux, Andrew Morton, linuxppc-dev, David S . Miller

On Fri 2023-05-26 18:41:32, Douglas Anderson wrote:
> Right now there is one arch (sparc64) that selects HAVE_NMI_WATCHDOG
> without selecting HAVE_HARDLOCKUP_DETECTOR_ARCH. Because of that one
> architecture, we have some special case code in the watchdog core to
> handle the fact that watchdog_hardlockup_probe() isn't implemented.
> 
> Let's implement watchdog_hardlockup_probe() for sparc64 and get rid of
> the special case.
> 
> As a side effect of doing this, code inspection tells us that we could
> fix a minor bug where the system won't properly realize that NMI
> watchdogs are disabled. Specifically, on powerpc if
> CONFIG_PPC_WATCHDOG is turned off the arch might still select
> CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH which selects
> CONFIG_HAVE_NMI_WATCHDOG. Since CONFIG_PPC_WATCHDOG was off then
> nothing will override the "weak" watchdog_hardlockup_probe() and we'll
> fallback to looking at CONFIG_HAVE_NMI_WATCHDOG.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Looks good:

Reviewed-by: Petr Mladek <pmladek@suse.com>

> ---
> Though this does fix a minor bug, I didn't mark this as "Fixes"
> because it's super minor. One could also argue that this wasn't a bug
> at all but simply was never an implemented feature. The code that
> added some amount of dynamicness here was commit a994a3147e4c
> ("watchdog/hardlockup/perf: Implement init time detection of perf")
> which, as per the title, was only intending to make "perf"
> dynamic. The old NMI watchdog presumably has never been handled
> dynamically.

I agree that it is a minor bug and Fixes tag is not needed.

Another motivation is that all the dependencies are quite
complicated. IMHO, it is not worth spending time on
reviewing potential backports.

That said, I am afraid that the artificial intelligence will nominate
this patch for stable backports even without the Fixes tag.
You know, there are the words "fix" and "bug" in the commit message ;-)

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/10] watchdog/hardlockup: Don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
  2023-05-27  1:41   ` Douglas Anderson
@ 2023-05-30 14:39     ` Petr Mladek
  -1 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:39 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, kgdb-bugreport, linux-kernel, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev, Christophe Leroy, sparclinux,
	David S . Miller, linux-perf-users

On Fri 2023-05-26 18:41:33, Douglas Anderson wrote:
> In the patch ("watchdog/hardlockup: add a "cpu" param to
> watchdog_hardlockup_check()") there was no reason to use
> raw_cpu_ptr(). Using this_cpu_ptr() works fine.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/10] watchdog/hardlockup: Don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
@ 2023-05-30 14:39     ` Petr Mladek
  0 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:39 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, linux-perf-users,
	sparclinux, Andrew Morton, linuxppc-dev, David S . Miller

On Fri 2023-05-26 18:41:33, Douglas Anderson wrote:
> In the patch ("watchdog/hardlockup: add a "cpu" param to
> watchdog_hardlockup_check()") there was no reason to use
> raw_cpu_ptr(). Using this_cpu_ptr() works fine.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 04/10] watchdog/hardlockup: In watchdog_hardlockup_check() use cpumask_copy()
  2023-05-27  1:41   ` Douglas Anderson
@ 2023-05-30 14:40     ` Petr Mladek
  -1 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:40 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, kgdb-bugreport, linux-kernel, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev, Christophe Leroy, sparclinux,
	David S . Miller, linux-perf-users

On Fri 2023-05-26 18:41:34, Douglas Anderson wrote:
> In the patch ("watchdog/hardlockup: add a "cpu" param to
> watchdog_hardlockup_check()") we started using a cpumask to keep track
> of which CPUs to backtrace. When setting up this cpumask, it's better
> to use cpumask_copy() than to just copy the structure directly. Fix this.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 04/10] watchdog/hardlockup: In watchdog_hardlockup_check() use cpumask_copy()
@ 2023-05-30 14:40     ` Petr Mladek
  0 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:40 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, linux-perf-users,
	sparclinux, Andrew Morton, linuxppc-dev, David S . Miller

On Fri 2023-05-26 18:41:34, Douglas Anderson wrote:
> In the patch ("watchdog/hardlockup: add a "cpu" param to
> watchdog_hardlockup_check()") we started using a cpumask to keep track
> of which CPUs to backtrace. When setting up this cpumask, it's better
> to use cpumask_copy() than to just copy the structure directly. Fix this.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/10] watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
  2023-05-27  1:41   ` Douglas Anderson
@ 2023-05-30 14:42     ` Petr Mladek
  -1 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:42 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, kgdb-bugreport, linux-kernel, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev, Christophe Leroy, sparclinux,
	David S . Miller, linux-perf-users

On Fri 2023-05-26 18:41:35, Douglas Anderson wrote:
> In the patch ("watchdog/hardlockup: add comments to
> touch_nmi_watchdog()") we adjusted some comments for
> touch_nmi_watchdog(). The comment about the softlockup had a typo and
> were also felt to be too obvious. Remove it.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/10] watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
@ 2023-05-30 14:42     ` Petr Mladek
  0 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:42 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, linux-perf-users,
	sparclinux, Andrew Morton, linuxppc-dev, David S . Miller

On Fri 2023-05-26 18:41:35, Douglas Anderson wrote:
> In the patch ("watchdog/hardlockup: add comments to
> touch_nmi_watchdog()") we adjusted some comments for
> touch_nmi_watchdog(). The comment about the softlockup had a typo and
> were also felt to be too obvious. Remove it.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 06/10] watchdog/buddy: Cleanup how watchdog_buddy_check_hardlockup() is called
  2023-05-27  1:41   ` Douglas Anderson
@ 2023-05-30 14:56     ` Petr Mladek
  -1 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:56 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, kgdb-bugreport, linux-kernel, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev, Christophe Leroy, sparclinux,
	David S . Miller, linux-perf-users

On Fri 2023-05-26 18:41:36, Douglas Anderson wrote:
> In the patch ("watchdog/hardlockup: detect hard lockups using
> secondary (buddy) CPUs"), we added a call from the common watchdog.c
> file into the buddy. That call could be done more cleanly.
> Specifically:
> 1. If we move the call into watchdog_hardlockup_kick() then it keeps
>    watchdog_timer_fn() simpler.
> 2. We don't need to pass an "unsigned long" to the buddy for the timer
>    count. In the patch ("watchdog/hardlockup: add a "cpu" param to
>    watchdog_hardlockup_check()") the count was changed to "atomic_t"
>    which is backed by an int, so we should match types.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>


The change looks fine:

Reviewed-by: Petr Mladek <pmladek@suse.com>

That said, I would prefer to squash it into the patch ("watchdog/hardlockup:
detect hard lockups using secondary (buddy) CPUs"). It would remove
some back and forth churn in the git history. But it is up to Andrew.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 06/10] watchdog/buddy: Cleanup how watchdog_buddy_check_hardlockup() is called
@ 2023-05-30 14:56     ` Petr Mladek
  0 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:56 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, linux-perf-users,
	sparclinux, Andrew Morton, linuxppc-dev, David S . Miller

On Fri 2023-05-26 18:41:36, Douglas Anderson wrote:
> In the patch ("watchdog/hardlockup: detect hard lockups using
> secondary (buddy) CPUs"), we added a call from the common watchdog.c
> file into the buddy. That call could be done more cleanly.
> Specifically:
> 1. If we move the call into watchdog_hardlockup_kick() then it keeps
>    watchdog_timer_fn() simpler.
> 2. We don't need to pass an "unsigned long" to the buddy for the timer
>    count. In the patch ("watchdog/hardlockup: add a "cpu" param to
>    watchdog_hardlockup_check()") the count was changed to "atomic_t"
>    which is backed by an int, so we should match types.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>


The change looks fine:

Reviewed-by: Petr Mladek <pmladek@suse.com>

That said, I would prefer to squash it into the patch ("watchdog/hardlockup:
detect hard lockups using secondary (buddy) CPUs"). It would remove
some back and forth churn in the git history. But it is up to Andrew.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 07/10] watchdog/buddy: Don't copy the cpumask in watchdog_next_cpu()
  2023-05-27  1:41   ` Douglas Anderson
@ 2023-05-30 14:57     ` Petr Mladek
  -1 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:57 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, kgdb-bugreport, linux-kernel, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev, Christophe Leroy, sparclinux,
	David S . Miller, linux-perf-users

On Fri 2023-05-26 18:41:37, Douglas Anderson wrote:
> There's no reason to make a copy of the "watchdog_cpus" locally in
> watchdog_next_cpu(). Making a copy wouldn't make things any more race
> free and we're just reading the value so there's no need for a copy.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 07/10] watchdog/buddy: Don't copy the cpumask in watchdog_next_cpu()
@ 2023-05-30 14:57     ` Petr Mladek
  0 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:57 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, linux-perf-users,
	sparclinux, Andrew Morton, linuxppc-dev, David S . Miller

On Fri 2023-05-26 18:41:37, Douglas Anderson wrote:
> There's no reason to make a copy of the "watchdog_cpus" locally in
> watchdog_next_cpu(). Making a copy wouldn't make things any more race
> free and we're just reading the value so there's no need for a copy.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 08/10] watchdog/buddy: Simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY
  2023-05-27  1:41   ` Douglas Anderson
@ 2023-05-30 14:58     ` Petr Mladek
  -1 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:58 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, kgdb-bugreport, linux-kernel, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev, Christophe Leroy, sparclinux,
	David S . Miller, linux-perf-users

On Fri 2023-05-26 18:41:38, Douglas Anderson wrote:
> The dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY was more
> complicated than it needed to be. If the "perf" detector is available
> and we have SMP then we have a choice, so enable the config based on
> just those two config items.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 08/10] watchdog/buddy: Simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY
@ 2023-05-30 14:58     ` Petr Mladek
  0 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 14:58 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, linux-perf-users,
	sparclinux, Andrew Morton, linuxppc-dev, David S . Miller

On Fri 2023-05-26 18:41:38, Douglas Anderson wrote:
> The dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY was more
> complicated than it needed to be. If the "perf" detector is available
> and we have SMP then we have a choice, so enable the config based on
> just those two config items.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 09/10] watchdog/hardlockup: Move SMP barriers from common code to buddy code
  2023-05-27  1:41   ` Douglas Anderson
@ 2023-05-30 15:00     ` Petr Mladek
  -1 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 15:00 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, kgdb-bugreport, linux-kernel, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev, Christophe Leroy, sparclinux,
	David S . Miller, linux-perf-users

On Fri 2023-05-26 18:41:39, Douglas Anderson wrote:
> It's been suggested that since the SMP barriers are only potentially
> useful for the buddy hardlockup detector, not the perf hardlockup
> detector, that the barriers belong in the buddy code. Let's move them
> and add clearer comments about why they're needed.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 09/10] watchdog/hardlockup: Move SMP barriers from common code to buddy code
@ 2023-05-30 15:00     ` Petr Mladek
  0 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-05-30 15:00 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, linux-perf-users,
	sparclinux, Andrew Morton, linuxppc-dev, David S . Miller

On Fri 2023-05-26 18:41:39, Douglas Anderson wrote:
> It's been suggested that since the SMP barriers are only potentially
> useful for the buddy hardlockup detector, not the perf hardlockup
> detector, that the barriers belong in the buddy code. Let's move them
> and add clearer comments about why they're needed.
> 
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 10/10] watchdog/hardlockup: Rename HAVE_HARDLOCKUP_DETECTOR_NON_ARCH to ..._PERF_OR_BUDDY
  2023-05-27  1:41   ` Douglas Anderson
@ 2023-06-01 13:03     ` Petr Mladek
  -1 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-06-01 13:03 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Andrew Morton, kgdb-bugreport, linux-kernel, Nicholas Piggin,
	Michael Ellerman, linuxppc-dev, Christophe Leroy, sparclinux,
	David S . Miller, linux-perf-users

On Fri 2023-05-26 18:41:40, Douglas Anderson wrote:
> HAVE_HARDLOCKUP_DETECTOR_NON_ARCH is a mouthful and
> confusing. HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY is even more of a
> mouthful, but probably less confusing. Rename the Kconfig names.

It is better. But I have an idea that might be even better.

> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>  lib/Kconfig.debug | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index eb1edd5905bc..b9e162698a82 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1058,7 +1058,7 @@ config HARDLOCKUP_DETECTOR_BUDDY
>  # needs SMP). In either case, using the "non-arch" code conflicts with
>  # the NMI watchdog code (which is sometimes used directly and sometimes used
>  # by the arch-provided hardlockup detector).

The comment above still uses the term "no-arch" and tries to
explain the confusion around it.

> -config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> +config HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
>  	bool
>  	depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG
>  	default y
> @@ -1077,10 +1077,10 @@ config HARDLOCKUP_DETECTOR_PREFER_BUDDY
>  	  an arch-specific hardlockup detector or if resources needed
>  	  for the hardlockup detector are better used for other things.
>  
> -# This will select the appropriate non-arch hardlockdup detector
> -config HARDLOCKUP_DETECTOR_NON_ARCH
> +# This will select the appropriate non-arch hardlockup detector
> +config HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
>  	bool
> -	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> +	depends on HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
>  	select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
>  	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
>  
> @@ -1098,9 +1098,9 @@ config HARDLOCKUP_CHECK_TIMESTAMP
>  config HARDLOCKUP_DETECTOR
>  	bool "Detect Hard Lockups"
>  	depends on DEBUG_KERNEL && !S390
> -	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
> +	depends on HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH
>  	select LOCKUP_DETECTOR
> -	select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> +	select HARDLOCKUP_DETECTOR_PERF_OR_BUDDY if HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
>  
>  	help
>  	  Say Y here to enable the kernel to act as a watchdog to detect

I am sorry but I am still confused by the logic. For me, it is far
from clear what combinations are possible, impossible, and optional.

Especially, the effect of HAVE_NMI_WATCHDOG and
HAVE_HARDLOCKUP_DETECTOR_ARCH is quite tricky.

I was playing with it and came up with a more straightforward solution
and found more possibilities how the simplify the logic. I am going
to prepare a patchset that would replace this patch.

Just to get the idea. I made the following changes:

     + define the values in logical order:
	+ HAVE_*
	+ HARDLOCKUP_DETECTOR y/n value
	+ HARDLOCKUP_DETECTOR_PREFER_BUDDY y/n value
	+ HARDLOCKUP_DETECTOR_PERF decision based on above
	+ HARDLOCKUP_DETECTOR_BUDDY decision based on above

     + remove HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY,
       instead, explicitly define the dependencies on all HAVE_*
       variables to make it clear what it possible
       and what is not possible

     + remove HARDLOCKUP_DETECTOR_PERF_OR_BUDDY,
       instead use "imply" in HARDLOCKUP_DETECTOR to trigger
       re-evaluation of HARDLOCKUP_DETECTOR_PERF and
       HARDLOCKUP_DETECTOR_BUDDY decisions


My current version has the following in lib/Kconfig.devel:

--- cut ---
config HAVE_HARDLOCKUP_DETECTOR_BUDDY
	bool
	depends on SMP
	default y

#
# arch/ can define HAVE_NMI_WATCHDOG to provide their own hard
# lockup detector rather than the generic perf or buddy detector.
#
config HARDLOCKUP_DETECTOR
	bool "Detect Hard Lockups"
	depends on DEBUG_KERNEL && !S390
	depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY || HAVE_NMI_WATCHDOG
	imply HARDLOCKUP_DETECTOR_PERF
	imply HARDLOCKUP_DETECTOR_BUDDY
	select LOCKUP_DETECTOR

	help
	  Say Y here to enable the kernel to act as a watchdog to detect
	  hard lockups.

	  Hardlockups are bugs that cause the CPU to loop in kernel mode
	  for more than 10 seconds, without letting other interrupts have a
	  chance to run.  The current stack trace is displayed upon detection
	  and the system will stay locked up.

#
# The architecture-specific variant is always used when available,
# see HAVE_NMI_WATCHDOG
#
config HARDLOCKUP_DETECTOR_PREFER_BUDDY
	bool "Prefer the buddy CPU hardlockup detector"
	depends on HARDLOCKUP_DETECTOR
	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY && !HAVE_NMI_WATCHDOG
	default n
	help
	  Say Y here to prefer the buddy hardlockup detector over the perf one.

	  With the buddy detector, each CPU uses its softlockup hrtimer
	  to check that the next CPU is processing hrtimer interrupts by
	  verifying that a counter is increasing.

	  This hardlockup detector is useful on systems that don't have
	  an arch-specific hardlockup detector or if resources needed
	  for the hardlockup detector are better used for other things.

config HARDLOCKUP_DETECTOR_PERF
	bool
	depends on HARDLOCKUP_DETECTOR
	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY && !HAVE_NMI_WATCHDOG
	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER

config HARDLOCKUP_DETECTOR_BUDDY
	bool
	depends on HARDLOCKUP_DETECTOR
	depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY
	depends on HARDLOCKUP_DETECTOR_PREFER_BUDDY || !HAVE_HARDLOCKUP_DETECTOR_PERF
	depends on !HAVE_NMI_WATCHDOG
	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER

# Both the "perf" and "buddy" hardlockup detectors need counting hrtimer
# interrupts.
config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
	bool
	depends on HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_BUDDY
	select SOFTLOCKUP_DETECTOR
--- cut ---

Also I am going to break the dependency between HAVE_NMI_WATCHDOG and
HAVE_HADRDLOCKUP_DETECTOR_ARCH. HAVE_NMI_WATCHDOG is needed only
for the very special powerpc64 watchdog. I am going to make sure
that it will be used only there and it will not be needed for
sparc and arm. As a result, we would have 4 separate implementations:

    + HAVE_HARDLOCKUP_DETECTOR_BUDDY enabled on any SMP system

    + HAVE_HARDLOCKUP_DETECTOR_PERF enabled on architectures supporting
	this perf-based solution

    + HAVE_HARDLOCKUP_DETECTOR_ARCH enabled on architectures which
	need another solution instead of the perf interface;
	they would support the usual HARDLOCKUP_DETECTOR command
	line parameters and sysctl interface

    + HAVE_NMI_WATCHDOG enabled just on powerpc64; it is special
	solution with its own command line parameters. Also it does
	not support hardlockup sysctl interface. I think about
	renaming it to HAVE_HARDLOCKUP_DETECTOR_POWERPC64 or
	_CUSTOM.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 10/10] watchdog/hardlockup: Rename HAVE_HARDLOCKUP_DETECTOR_NON_ARCH to ..._PERF_OR_BUDDY
@ 2023-06-01 13:03     ` Petr Mladek
  0 siblings, 0 replies; 42+ messages in thread
From: Petr Mladek @ 2023-06-01 13:03 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: kgdb-bugreport, linux-kernel, Nicholas Piggin, linux-perf-users,
	sparclinux, Andrew Morton, linuxppc-dev, David S . Miller

On Fri 2023-05-26 18:41:40, Douglas Anderson wrote:
> HAVE_HARDLOCKUP_DETECTOR_NON_ARCH is a mouthful and
> confusing. HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY is even more of a
> mouthful, but probably less confusing. Rename the Kconfig names.

It is better. But I have an idea that might be even better.

> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>  lib/Kconfig.debug | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index eb1edd5905bc..b9e162698a82 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1058,7 +1058,7 @@ config HARDLOCKUP_DETECTOR_BUDDY
>  # needs SMP). In either case, using the "non-arch" code conflicts with
>  # the NMI watchdog code (which is sometimes used directly and sometimes used
>  # by the arch-provided hardlockup detector).

The comment above still uses the term "no-arch" and tries to
explain the confusion around it.

> -config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> +config HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
>  	bool
>  	depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG
>  	default y
> @@ -1077,10 +1077,10 @@ config HARDLOCKUP_DETECTOR_PREFER_BUDDY
>  	  an arch-specific hardlockup detector or if resources needed
>  	  for the hardlockup detector are better used for other things.
>  
> -# This will select the appropriate non-arch hardlockdup detector
> -config HARDLOCKUP_DETECTOR_NON_ARCH
> +# This will select the appropriate non-arch hardlockup detector
> +config HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
>  	bool
> -	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> +	depends on HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
>  	select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY
>  	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY
>  
> @@ -1098,9 +1098,9 @@ config HARDLOCKUP_CHECK_TIMESTAMP
>  config HARDLOCKUP_DETECTOR
>  	bool "Detect Hard Lockups"
>  	depends on DEBUG_KERNEL && !S390
> -	depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH
> +	depends on HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH
>  	select LOCKUP_DETECTOR
> -	select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH
> +	select HARDLOCKUP_DETECTOR_PERF_OR_BUDDY if HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY
>  
>  	help
>  	  Say Y here to enable the kernel to act as a watchdog to detect

I am sorry but I am still confused by the logic. For me, it is far
from clear what combinations are possible, impossible, and optional.

Especially, the effect of HAVE_NMI_WATCHDOG and
HAVE_HARDLOCKUP_DETECTOR_ARCH is quite tricky.

I was playing with it and came up with a more straightforward solution
and found more possibilities how the simplify the logic. I am going
to prepare a patchset that would replace this patch.

Just to get the idea. I made the following changes:

     + define the values in logical order:
	+ HAVE_*
	+ HARDLOCKUP_DETECTOR y/n value
	+ HARDLOCKUP_DETECTOR_PREFER_BUDDY y/n value
	+ HARDLOCKUP_DETECTOR_PERF decision based on above
	+ HARDLOCKUP_DETECTOR_BUDDY decision based on above

     + remove HAVE_HARDLOCKUP_DETECTOR_PERF_OR_BUDDY,
       instead, explicitly define the dependencies on all HAVE_*
       variables to make it clear what it possible
       and what is not possible

     + remove HARDLOCKUP_DETECTOR_PERF_OR_BUDDY,
       instead use "imply" in HARDLOCKUP_DETECTOR to trigger
       re-evaluation of HARDLOCKUP_DETECTOR_PERF and
       HARDLOCKUP_DETECTOR_BUDDY decisions


My current version has the following in lib/Kconfig.devel:

--- cut ---
config HAVE_HARDLOCKUP_DETECTOR_BUDDY
	bool
	depends on SMP
	default y

#
# arch/ can define HAVE_NMI_WATCHDOG to provide their own hard
# lockup detector rather than the generic perf or buddy detector.
#
config HARDLOCKUP_DETECTOR
	bool "Detect Hard Lockups"
	depends on DEBUG_KERNEL && !S390
	depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY || HAVE_NMI_WATCHDOG
	imply HARDLOCKUP_DETECTOR_PERF
	imply HARDLOCKUP_DETECTOR_BUDDY
	select LOCKUP_DETECTOR

	help
	  Say Y here to enable the kernel to act as a watchdog to detect
	  hard lockups.

	  Hardlockups are bugs that cause the CPU to loop in kernel mode
	  for more than 10 seconds, without letting other interrupts have a
	  chance to run.  The current stack trace is displayed upon detection
	  and the system will stay locked up.

#
# The architecture-specific variant is always used when available,
# see HAVE_NMI_WATCHDOG
#
config HARDLOCKUP_DETECTOR_PREFER_BUDDY
	bool "Prefer the buddy CPU hardlockup detector"
	depends on HARDLOCKUP_DETECTOR
	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && HAVE_HARDLOCKUP_DETECTOR_BUDDY && !HAVE_NMI_WATCHDOG
	default n
	help
	  Say Y here to prefer the buddy hardlockup detector over the perf one.

	  With the buddy detector, each CPU uses its softlockup hrtimer
	  to check that the next CPU is processing hrtimer interrupts by
	  verifying that a counter is increasing.

	  This hardlockup detector is useful on systems that don't have
	  an arch-specific hardlockup detector or if resources needed
	  for the hardlockup detector are better used for other things.

config HARDLOCKUP_DETECTOR_PERF
	bool
	depends on HARDLOCKUP_DETECTOR
	depends on HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY && !HAVE_NMI_WATCHDOG
	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER

config HARDLOCKUP_DETECTOR_BUDDY
	bool
	depends on HARDLOCKUP_DETECTOR
	depends on HAVE_HARDLOCKUP_DETECTOR_BUDDY
	depends on HARDLOCKUP_DETECTOR_PREFER_BUDDY || !HAVE_HARDLOCKUP_DETECTOR_PERF
	depends on !HAVE_NMI_WATCHDOG
	select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER

# Both the "perf" and "buddy" hardlockup detectors need counting hrtimer
# interrupts.
config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER
	bool
	depends on HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_BUDDY
	select SOFTLOCKUP_DETECTOR
--- cut ---

Also I am going to break the dependency between HAVE_NMI_WATCHDOG and
HAVE_HADRDLOCKUP_DETECTOR_ARCH. HAVE_NMI_WATCHDOG is needed only
for the very special powerpc64 watchdog. I am going to make sure
that it will be used only there and it will not be needed for
sparc and arm. As a result, we would have 4 separate implementations:

    + HAVE_HARDLOCKUP_DETECTOR_BUDDY enabled on any SMP system

    + HAVE_HARDLOCKUP_DETECTOR_PERF enabled on architectures supporting
	this perf-based solution

    + HAVE_HARDLOCKUP_DETECTOR_ARCH enabled on architectures which
	need another solution instead of the perf interface;
	they would support the usual HARDLOCKUP_DETECTOR command
	line parameters and sysctl interface

    + HAVE_NMI_WATCHDOG enabled just on powerpc64; it is special
	solution with its own command line parameters. Also it does
	not support hardlockup sysctl interface. I think about
	renaming it to HAVE_HARDLOCKUP_DETECTOR_POWERPC64 or
	_CUSTOM.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2023-06-01 13:04 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-27  1:41 [PATCH 00/10] watchdog: Cleanup / fixes after buddy series v5 reviews Douglas Anderson
2023-05-27  1:41 ` Douglas Anderson
2023-05-27  1:41 ` [PATCH 01/10] watchdog/hardlockup: Keep kernel.nmi_watchdog sysctl as 0444 if probe fails Douglas Anderson
2023-05-27  1:41   ` Douglas Anderson
2023-05-30 14:15   ` Petr Mladek
2023-05-30 14:15     ` Petr Mladek
2023-05-27  1:41 ` [PATCH 02/10] watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe() Douglas Anderson
2023-05-27  1:41   ` Douglas Anderson
2023-05-30 14:38   ` Petr Mladek
2023-05-30 14:38     ` Petr Mladek
2023-05-27  1:41 ` [PATCH 03/10] watchdog/hardlockup: Don't use raw_cpu_ptr() in watchdog_hardlockup_kick() Douglas Anderson
2023-05-27  1:41   ` Douglas Anderson
2023-05-30 14:39   ` Petr Mladek
2023-05-30 14:39     ` Petr Mladek
2023-05-27  1:41 ` [PATCH 04/10] watchdog/hardlockup: In watchdog_hardlockup_check() use cpumask_copy() Douglas Anderson
2023-05-27  1:41   ` Douglas Anderson
2023-05-30 14:40   ` Petr Mladek
2023-05-30 14:40     ` Petr Mladek
2023-05-27  1:41 ` [PATCH 05/10] watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog() Douglas Anderson
2023-05-27  1:41   ` Douglas Anderson
2023-05-30 14:42   ` Petr Mladek
2023-05-30 14:42     ` Petr Mladek
2023-05-27  1:41 ` [PATCH 06/10] watchdog/buddy: Cleanup how watchdog_buddy_check_hardlockup() is called Douglas Anderson
2023-05-27  1:41   ` Douglas Anderson
2023-05-30 14:56   ` Petr Mladek
2023-05-30 14:56     ` Petr Mladek
2023-05-27  1:41 ` [PATCH 07/10] watchdog/buddy: Don't copy the cpumask in watchdog_next_cpu() Douglas Anderson
2023-05-27  1:41   ` Douglas Anderson
2023-05-30 14:57   ` Petr Mladek
2023-05-30 14:57     ` Petr Mladek
2023-05-27  1:41 ` [PATCH 08/10] watchdog/buddy: Simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY Douglas Anderson
2023-05-27  1:41   ` Douglas Anderson
2023-05-30 14:58   ` Petr Mladek
2023-05-30 14:58     ` Petr Mladek
2023-05-27  1:41 ` [PATCH 09/10] watchdog/hardlockup: Move SMP barriers from common code to buddy code Douglas Anderson
2023-05-27  1:41   ` Douglas Anderson
2023-05-30 15:00   ` Petr Mladek
2023-05-30 15:00     ` Petr Mladek
2023-05-27  1:41 ` [PATCH 10/10] watchdog/hardlockup: Rename HAVE_HARDLOCKUP_DETECTOR_NON_ARCH to ..._PERF_OR_BUDDY Douglas Anderson
2023-05-27  1:41   ` Douglas Anderson
2023-06-01 13:03   ` Petr Mladek
2023-06-01 13:03     ` Petr Mladek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.