* [PATCH 0/5] Support hld based on Pseudo-NMI for arm64
@ 2022-02-12 10:43 Lecopzer Chen
  2022-02-12 10:43 ` [PATCH 1/5] kernel/watchdog: remove WATCHDOG_DEFAULT Lecopzer Chen
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Lecopzer Chen @ 2022-02-12 10:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, davem, Matthias Brugger, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Petr Mladek,
	Andrew Morton, Wang Qing, Luis Chamberlain, Xiaoming Ni,
	linux-arm-kernel, linux-perf-users, sparclinux, linux-mediatek,
	sumit.garg, kernelfans, lecopzer.chen, yj.chiang

We have used hld internally for arm64 since 2020, but there is still
no proper upstream support for it, and we badly need it.

This series is rebased on 5.17-rc3 from [1]; the original authors are
Pingfan Liu <kernelfans@gmail.com>
Sumit Garg <sumit.garg@linaro.org>

Patch v3 of [1] wasn't reviewed, so I'll take over the further
development of this.


Quote from [1]:

The hard lockup detector is helpful for diagnosing unpaired irq
enable/disable. But the current watchdog framework cannot easily cope
with the arm64 hw perf event.

On arm64, when lockup_detector_init()->watchdog_nmi_probe() runs, the PMU
is not ready; it only comes up at device_initcall(armv8_pmu_driver_init),
which is deeply integrated with the driver model and cpuhp. Hence it is
hard to push the initialization of armv8_pmu_driver_init() before
smp_init().

But it is easy to take the opposite approach and let watchdog_hld gain
the PMU capability asynchronously.
The async model is achieved by letting watchdog_nmi_probe() return
-EBUSY, and by a re-initializing work_struct which waits on a
wait_queue_head.

[1] https://lore.kernel.org/lkml/20211014024155.15253-1-kernelfans@gmail.com/
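
A condensed sketch of the handshake (the real code is in patches 4 and
5 below):

	/* common code, kernel/watchdog.c (patch 4) */
	ret = watchdog_nmi_probe();
	if (ret == -EBUSY)	/* PMU driver not probed yet */
		queue_work_on(smp_processor_id(), system_wq, &detector_work);

	/* arch code, once the PMU driver has probed (patch 5) */
	detector_delay_init_state = DELAY_INIT_READY;
	wake_up(&hld_detector_wait);	/* queued work re-runs the probe */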


Lecopzer Chen (2):
  kernel/watchdog: remove WATCHDOG_DEFAULT
  kernel/watchdog: change watchdog_nmi_enable() to void

Pingfan Liu (2):
  kernel/watchdog_hld: Ensure CPU-bound context when creating hardlockup
    detector event
  kernel/watchdog: Adapt the watchdog_hld interface for async model

Sumit Garg (1):
  arm64: Enable perf events based hard lockup detector

 arch/arm64/Kconfig               |  2 ++
 arch/arm64/kernel/Makefile       |  1 +
 arch/arm64/kernel/perf_event.c   | 11 ++++--
 arch/arm64/kernel/watchdog_hld.c | 36 +++++++++++++++++++
 arch/sparc/kernel/nmi.c          |  8 ++---
 drivers/perf/arm_pmu.c           |  5 +++
 include/linux/nmi.h              | 11 +++++-
 include/linux/perf/arm_pmu.h     |  2 ++
 kernel/watchdog.c                | 61 ++++++++++++++++++++++++++++----
 kernel/watchdog_hld.c            |  8 ++++-
 10 files changed, 131 insertions(+), 14 deletions(-)
 create mode 100644 arch/arm64/kernel/watchdog_hld.c

-- 
2.25.1


* [PATCH 1/5] kernel/watchdog: remove WATCHDOG_DEFAULT
  2022-02-12 10:43 [PATCH 0/5] Support hld based on Pseudo-NMI for arm64 Lecopzer Chen
@ 2022-02-12 10:43 ` Lecopzer Chen
  2022-02-25 12:47   ` Petr Mladek
  2022-02-12 10:43 ` [PATCH 2/5] kernel/watchdog: change watchdog_nmi_enable() to void Lecopzer Chen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Lecopzer Chen @ 2022-02-12 10:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, davem, Matthias Brugger, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Petr Mladek,
	Andrew Morton, Wang Qing, Luis Chamberlain, Xiaoming Ni,
	linux-arm-kernel, linux-perf-users, sparclinux, linux-mediatek,
	sumit.garg, kernelfans, lecopzer.chen, yj.chiang

No reference to WATCHDOG_DEFAULT, remove it.

This commit is reworked from [1].

[1] https://lore.kernel.org/lkml/20211014024155.15253-2-kernelfans@gmail.com/
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
---
 kernel/watchdog.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 99afb88d2e85..db451e943a04 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -30,10 +30,8 @@
 static DEFINE_MUTEX(watchdog_mutex);
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
-# define WATCHDOG_DEFAULT	(SOFT_WATCHDOG_ENABLED | NMI_WATCHDOG_ENABLED)
 # define NMI_WATCHDOG_DEFAULT	1
 #else
-# define WATCHDOG_DEFAULT	(SOFT_WATCHDOG_ENABLED)
 # define NMI_WATCHDOG_DEFAULT	0
 #endif
 
-- 
2.25.1


* [PATCH 2/5] kernel/watchdog: change watchdog_nmi_enable() to void
  2022-02-12 10:43 [PATCH 0/5] Support hld based on Pseudo-NMI for arm64 Lecopzer Chen
  2022-02-12 10:43 ` [PATCH 1/5] kernel/watchdog: remove WATCHDOG_DEFAULT Lecopzer Chen
@ 2022-02-12 10:43 ` Lecopzer Chen
  2022-02-25 12:50   ` Petr Mladek
  2022-02-12 10:43 ` [PATCH 3/5] kernel/watchdog_hld: Ensure CPU-bound context when creating hardlockup detector event Lecopzer Chen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Lecopzer Chen @ 2022-02-12 10:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, davem, Matthias Brugger, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Petr Mladek,
	Andrew Morton, Wang Qing, Luis Chamberlain, Xiaoming Ni,
	linux-arm-kernel, linux-perf-users, sparclinux, linux-mediatek,
	sumit.garg, kernelfans, lecopzer.chen, yj.chiang


Nobody cares about the return value of watchdog_nmi_enable(), so
change its prototype to void.

This commit is reworked from [1].

[1] https://lore.kernel.org/lkml/20211014024155.15253-2-kernelfans@gmail.com/
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
---
 arch/sparc/kernel/nmi.c | 8 ++++----
 include/linux/nmi.h     | 2 +-
 kernel/watchdog.c       | 3 +--
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 060fff95a305..8dc0f4e820b0 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -282,11 +282,11 @@ __setup("nmi_watchdog=", setup_nmi_watchdog);
  * sparc specific NMI watchdog enable function.
  * Enables watchdog if it is not enabled already.
  */
-int watchdog_nmi_enable(unsigned int cpu)
+void watchdog_nmi_enable(unsigned int cpu)
 {
 	if (atomic_read(&nmi_active) == -1) {
 		pr_warn("NMI watchdog cannot be enabled or disabled\n");
-		return -1;
+		return;
 	}
 
 	/*
@@ -295,11 +295,11 @@ int watchdog_nmi_enable(unsigned int cpu)
 	 * process first.
 	 */
 	if (!nmi_init_done)
-		return 0;
+		return;
 
 	smp_call_function_single(cpu, start_nmi_watchdog, NULL, 1);
 
-	return 0;
+	return;
 }
 /*
  * sparc specific NMI watchdog disable function.
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 750c7f395ca9..b7bcd63c36b4 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -119,7 +119,7 @@ static inline int hardlockup_detector_perf_init(void) { return 0; }
 void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
 int watchdog_nmi_probe(void);
-int watchdog_nmi_enable(unsigned int cpu);
+void watchdog_nmi_enable(unsigned int cpu);
 void watchdog_nmi_disable(unsigned int cpu);
 
 /**
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index db451e943a04..b71d434cf648 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -93,10 +93,9 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
  * softlockup watchdog start and stop. The arch must select the
  * SOFTLOCKUP_DETECTOR Kconfig.
  */
-int __weak watchdog_nmi_enable(unsigned int cpu)
+void __weak watchdog_nmi_enable(unsigned int cpu)
 {
 	hardlockup_detector_perf_enable();
-	return 0;
 }
 
 void __weak watchdog_nmi_disable(unsigned int cpu)
-- 
2.25.1


* [PATCH 3/5] kernel/watchdog_hld: Ensure CPU-bound context when creating hardlockup detector event
  2022-02-12 10:43 [PATCH 0/5] Support hld based on Pseudo-NMI for arm64 Lecopzer Chen
  2022-02-12 10:43 ` [PATCH 1/5] kernel/watchdog: remove WATCHDOG_DEFAULT Lecopzer Chen
  2022-02-12 10:43 ` [PATCH 2/5] kernel/watchdog: change watchdog_nmi_enable() to void Lecopzer Chen
@ 2022-02-12 10:43 ` Lecopzer Chen
  2022-02-25 13:15   ` Petr Mladek
  2022-02-12 10:43 ` [PATCH 4/5] kernel/watchdog: Adapt the watchdog_hld interface for async model Lecopzer Chen
  2022-02-12 10:43 ` [PATCH 5/5] arm64: Enable perf events based hard lockup detector Lecopzer Chen
  4 siblings, 1 reply; 15+ messages in thread
From: Lecopzer Chen @ 2022-02-12 10:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, davem, Matthias Brugger, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Petr Mladek,
	Andrew Morton, Wang Qing, Luis Chamberlain, Xiaoming Ni,
	linux-arm-kernel, linux-perf-users, sparclinux, linux-mediatek,
	sumit.garg, kernelfans, lecopzer.chen, yj.chiang

From: Pingfan Liu <kernelfans@gmail.com>

hardlockup_detector_event_create() should create the perf_event on the
current CPU. Preemption cannot be disabled because
perf_event_create_kernel_counter() allocates memory. Instead,
CPU locality is achieved by running the code in a per-CPU bound
kthread.

Add a check to prevent mistakes when calling the code in another
code path.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Co-developed-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
---
 kernel/watchdog_hld.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..96b717205952 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -165,10 +165,16 @@ static void watchdog_overflow_callback(struct perf_event *event,
 
 static int hardlockup_detector_event_create(void)
 {
-	unsigned int cpu = smp_processor_id();
+	unsigned int cpu;
 	struct perf_event_attr *wd_attr;
 	struct perf_event *evt;
 
+	/*
+	 * Preemption is not disabled because memory will be allocated.
+	 * Ensure CPU-locality by calling this in per-CPU kthread.
+	 */
+	WARN_ON(!is_percpu_thread());
+	cpu = raw_smp_processor_id();
 	wd_attr = &wd_hw_attr;
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
 
-- 
2.25.1


* [PATCH 4/5] kernel/watchdog: Adapt the watchdog_hld interface for async model
  2022-02-12 10:43 [PATCH 0/5] Support hld based on Pseudo-NMI for arm64 Lecopzer Chen
                   ` (2 preceding siblings ...)
  2022-02-12 10:43 ` [PATCH 3/5] kernel/watchdog_hld: Ensure CPU-bound context when creating hardlockup detector event Lecopzer Chen
@ 2022-02-12 10:43 ` Lecopzer Chen
  2022-02-25 15:20   ` Petr Mladek
  2022-02-12 10:43 ` [PATCH 5/5] arm64: Enable perf events based hard lockup detector Lecopzer Chen
  4 siblings, 1 reply; 15+ messages in thread
From: Lecopzer Chen @ 2022-02-12 10:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, davem, Matthias Brugger, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Petr Mladek,
	Andrew Morton, Wang Qing, Luis Chamberlain, Xiaoming Ni,
	linux-arm-kernel, linux-perf-users, sparclinux, linux-mediatek,
	sumit.garg, kernelfans, lecopzer.chen, yj.chiang

From: Pingfan Liu <kernelfans@gmail.com>

When lockup_detector_init()->watchdog_nmi_probe() runs, the PMU may not
be ready yet. E.g. on arm64, the PMU is not ready until
device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
with the driver model and cpuhp. Hence it is hard to push this
initialization before smp_init().

But it is easy to take the opposite approach and let watchdog_hld gain
the PMU capability asynchronously.

The async model is achieved by letting watchdog_nmi_probe() return
-EBUSY, and by a re-initializing work_struct which waits on a
wait_queue_head.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Co-developed-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
---
 kernel/watchdog.c | 56 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b71d434cf648..fa8490cfeef8 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -103,7 +103,11 @@ void __weak watchdog_nmi_disable(unsigned int cpu)
 	hardlockup_detector_perf_disable();
 }
 
-/* Return 0, if a NMI watchdog is available. Error code otherwise */
+/*
+ * Arch-specific API. Return 0 if an NMI watchdog is available, or -EBUSY if
+ * it is not ready yet (arch code should wake up hld_detector_wait when it is
+ * ready), or another negative value if it is not supported.
+ */
 int __weak __init watchdog_nmi_probe(void)
 {
 	return hardlockup_detector_perf_init();
@@ -839,16 +843,64 @@ static void __init watchdog_sysctl_init(void)
 #define watchdog_sysctl_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */
 
+static void lockup_detector_delay_init(struct work_struct *work);
+enum hld_detector_state detector_delay_init_state __initdata;
+
+struct wait_queue_head hld_detector_wait __initdata =
+		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
+
+static struct work_struct detector_work __initdata =
+		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
+
+static void __init lockup_detector_delay_init(struct work_struct *work)
+{
+	int ret;
+
+	wait_event(hld_detector_wait,
+			detector_delay_init_state == DELAY_INIT_READY);
+	ret = watchdog_nmi_probe();
+	if (!ret) {
+		nmi_watchdog_available = true;
+		lockup_detector_setup();
+	} else {
+		WARN_ON(ret == -EBUSY);
+		pr_info("Perf NMI watchdog permanently disabled\n");
+	}
+}
+
+/* Ensure the check is called after the initialization of PMU driver */
+static int __init lockup_detector_check(void)
+{
+	if (detector_delay_init_state < DELAY_INIT_WAIT)
+		return 0;
+
+	if (WARN_ON(detector_delay_init_state == DELAY_INIT_WAIT)) {
+		detector_delay_init_state = DELAY_INIT_READY;
+		wake_up(&hld_detector_wait);
+	}
+	flush_work(&detector_work);
+	return 0;
+}
+late_initcall_sync(lockup_detector_check);
+
 void __init lockup_detector_init(void)
 {
+	int ret;
+
 	if (tick_nohz_full_enabled())
 		pr_info("Disabling watchdog on nohz_full cores by default\n");
 
 	cpumask_copy(&watchdog_cpumask,
 		     housekeeping_cpumask(HK_FLAG_TIMER));
 
-	if (!watchdog_nmi_probe())
+	ret = watchdog_nmi_probe();
+	if (!ret)
 		nmi_watchdog_available = true;
+	else if (ret == -EBUSY) {
+		detector_delay_init_state = DELAY_INIT_WAIT;
+		queue_work_on(smp_processor_id(), system_wq, &detector_work);
+	}
+
 	lockup_detector_setup();
 	watchdog_sysctl_init();
 }
-- 
2.25.1


* [PATCH 5/5] arm64: Enable perf events based hard lockup detector
  2022-02-12 10:43 [PATCH 0/5] Support hld based on Pseudo-NMI for arm64 Lecopzer Chen
                   ` (3 preceding siblings ...)
  2022-02-12 10:43 ` [PATCH 4/5] kernel/watchdog: Adapt the watchdog_hld interface for async model Lecopzer Chen
@ 2022-02-12 10:43 ` Lecopzer Chen
  4 siblings, 0 replies; 15+ messages in thread
From: Lecopzer Chen @ 2022-02-12 10:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, davem, Matthias Brugger, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Petr Mladek,
	Andrew Morton, Wang Qing, Luis Chamberlain, Xiaoming Ni,
	linux-arm-kernel, linux-perf-users, sparclinux, linux-mediatek,
	sumit.garg, kernelfans, lecopzer.chen, yj.chiang

From: Sumit Garg <sumit.garg@linaro.org>

from: Pingfan Liu <kernelfans@gmail.com>

With the recent feature that lets perf events use pseudo NMIs as
interrupts on platforms which support GICv3 or later, it is now
possible to enable the hard lockup detector (or NMI watchdog) on arm64
platforms. So enable the corresponding support.

One thing to note here is that normally the lockup detector is initialized
just after the early initcalls, but the PMU on arm64 comes up much later,
at device_initcall() time. To cope with that, override watchdog_nmi_probe()
to let the watchdog framework know that the PMU is not ready, and inform
the framework to re-initialize lockup detection once the PMU has been
initialized.

[1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org

Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
(Pingfan: adapt it to watchdog_hld async model based on [1])
Co-developed-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Co-developed-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
---
 arch/arm64/Kconfig               |  2 ++
 arch/arm64/kernel/Makefile       |  1 +
 arch/arm64/kernel/perf_event.c   | 11 ++++++++--
 arch/arm64/kernel/watchdog_hld.c | 36 ++++++++++++++++++++++++++++++++
 drivers/perf/arm_pmu.c           |  5 +++++
 include/linux/nmi.h              |  9 ++++++++
 include/linux/perf/arm_pmu.h     |  2 ++
 7 files changed, 64 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/kernel/watchdog_hld.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 09b885cc4db5..df6fed8327ba 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -190,6 +190,8 @@ config ARM64
 	select HAVE_NMI
 	select HAVE_PATA_PLATFORM
 	select HAVE_PERF_EVENTS
+	select HAVE_PERF_EVENTS_NMI if ARM64_PSEUDO_NMI
+	select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_REGS_AND_STACK_ACCESS_API
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 88b3e2a21408..3e62a8877ed7 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -46,6 +46,7 @@ obj-$(CONFIG_MODULES)			+= module.o
 obj-$(CONFIG_ARM64_MODULE_PLTS)		+= module-plts.o
 obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o perf_callchain.o
 obj-$(CONFIG_HW_PERF_EVENTS)		+= perf_event.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF)	+= watchdog_hld.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
 obj-$(CONFIG_CPU_PM)			+= sleep.o suspend.o
 obj-$(CONFIG_CPU_IDLE)			+= cpuidle.o
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index cab678ed6618..73db9f2588d5 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -23,6 +23,7 @@
 #include <linux/platform_device.h>
 #include <linux/sched_clock.h>
 #include <linux/smp.h>
+#include <linux/nmi.h>
 
 /* ARMv8 Cortex-A53 specific event types. */
 #define ARMV8_A53_PERFCTR_PREF_LINEFILL				0xC2
@@ -1380,10 +1381,16 @@ static struct platform_driver armv8_pmu_driver = {
 
 static int __init armv8_pmu_driver_init(void)
 {
+	int ret;
+
 	if (acpi_disabled)
-		return platform_driver_register(&armv8_pmu_driver);
+		ret = platform_driver_register(&armv8_pmu_driver);
 	else
-		return arm_pmu_acpi_probe(armv8_pmuv3_pmu_init);
+		ret = arm_pmu_acpi_probe(armv8_pmuv3_pmu_init);
+
+	detector_delay_init_state = DELAY_INIT_READY;
+	wake_up(&hld_detector_wait);
+	return ret;
 }
 device_initcall(armv8_pmu_driver_init)
 
diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
new file mode 100644
index 000000000000..85536906a186
--- /dev/null
+++ b/arch/arm64/kernel/watchdog_hld.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/nmi.h>
+#include <linux/cpufreq.h>
+#include <linux/perf/arm_pmu.h>
+
+/*
+ * Safe maximum CPU frequency in case a particular platform doesn't implement
+ * a cpufreq driver. The architecture doesn't put any restriction on the
+ * maximum frequency, but 5 GHz seems to be a safe maximum given the available
+ * Arm CPUs on the market, which are clocked much lower than 5 GHz. On the
+ * other hand, we can't make it much higher as it would lead to a large
+ * hard-lockup detection timeout on parts which are running slower (e.g.
+ * 1 GHz on Developerbox) and don't possess a cpufreq driver.
+ */
+#define SAFE_MAX_CPU_FREQ	5000000000UL // 5 GHz
+u64 hw_nmi_get_sample_period(int watchdog_thresh)
+{
+	unsigned int cpu = smp_processor_id();
+	unsigned long max_cpu_freq;
+
+	max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL;
+	if (!max_cpu_freq)
+		max_cpu_freq = SAFE_MAX_CPU_FREQ;
+
+	return (u64)max_cpu_freq * watchdog_thresh;
+}
+
+int __init watchdog_nmi_probe(void)
+{
+	if (detector_delay_init_state != DELAY_INIT_READY)
+		return -EBUSY;
+	else if (!arm_pmu_irq_is_nmi())
+		return -ENODEV;
+
+	return hardlockup_detector_perf_init();
+}
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 295cc7952d0e..e77f4897fca2 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -697,6 +697,11 @@ static int armpmu_get_cpu_irq(struct arm_pmu *pmu, int cpu)
 	return per_cpu(hw_events->irq, cpu);
 }
 
+bool arm_pmu_irq_is_nmi(void)
+{
+	return has_nmi;
+}
+
 /*
  * PMU hardware loses all context when a CPU goes offline.
  * When a CPU is hotplugged back in, since some hardware registers are
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index b7bcd63c36b4..9def85c00bd8 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -118,6 +118,15 @@ static inline int hardlockup_detector_perf_init(void) { return 0; }
 
 void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
+
+enum hld_detector_state {
+	DELAY_INIT_NOP,
+	DELAY_INIT_WAIT,
+	DELAY_INIT_READY
+};
+
+extern enum hld_detector_state detector_delay_init_state;
+extern struct wait_queue_head hld_detector_wait;
 int watchdog_nmi_probe(void);
 void watchdog_nmi_enable(unsigned int cpu);
 void watchdog_nmi_disable(unsigned int cpu);
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 2512e2f9cd4e..9325d01adc3e 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -169,6 +169,8 @@ void kvm_host_pmu_init(struct arm_pmu *pmu);
 #define kvm_host_pmu_init(x)	do { } while(0)
 #endif
 
+bool arm_pmu_irq_is_nmi(void);
+
 /* Internal functions only for core arm_pmu code */
 struct arm_pmu *armpmu_alloc(void);
 struct arm_pmu *armpmu_alloc_atomic(void);
-- 
2.25.1


* Re: [PATCH 1/5] kernel/watchdog: remove WATCHDOG_DEFAULT
  2022-02-12 10:43 ` [PATCH 1/5] kernel/watchdog: remove WATCHDOG_DEFAULT Lecopzer Chen
@ 2022-02-25 12:47   ` Petr Mladek
  2022-02-26  9:52     ` Lecopzer Chen
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2022-02-25 12:47 UTC (permalink / raw)
  To: Lecopzer Chen
  Cc: linux-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, davem,
	Matthias Brugger, Marc Zyngier, Julien Thierry, Kees Cook,
	Masahiro Yamada, Andrew Morton, Wang Qing, Luis Chamberlain,
	Xiaoming Ni, linux-arm-kernel, linux-perf-users, sparclinux,
	linux-mediatek, sumit.garg, kernelfans, yj.chiang

On Sat 2022-02-12 18:43:45, Lecopzer Chen wrote:
> No reference to WATCHDOG_DEFAULT, remove it.

It would be enough to mention the link to the previous
version in the cover letter.

> this commit reworks from [1].
> 
> [1] https://lore.kernel.org/lkml/20211014024155.15253-2-kernelfans@gmail.com/
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

* Re: [PATCH 2/5] kernel/watchdog: change watchdog_nmi_enable() to void
  2022-02-12 10:43 ` [PATCH 2/5] kernel/watchdog: change watchdog_nmi_enable() to void Lecopzer Chen
@ 2022-02-25 12:50   ` Petr Mladek
  2022-02-26  9:54     ` Lecopzer Chen
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2022-02-25 12:50 UTC (permalink / raw)
  To: Lecopzer Chen
  Cc: linux-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, davem,
	Matthias Brugger, Marc Zyngier, Julien Thierry, Kees Cook,
	Masahiro Yamada, Andrew Morton, Wang Qing, Luis Chamberlain,
	Xiaoming Ni, linux-arm-kernel, linux-perf-users, sparclinux,
	linux-mediatek, sumit.garg, kernelfans, yj.chiang

On Sat 2022-02-12 18:43:46, Lecopzer Chen wrote:
> 
> Nobody cares about the return value of watchdog_nmi_enable(),
> changing its prototype to void.
> 
> --- a/arch/sparc/kernel/nmi.c
> +++ b/arch/sparc/kernel/nmi.c
> @@ -282,11 +282,11 @@ __setup("nmi_watchdog=", setup_nmi_watchdog);
>   * sparc specific NMI watchdog enable function.
>   * Enables watchdog if it is not enabled already.
>   */
> -int watchdog_nmi_enable(unsigned int cpu)
> +void watchdog_nmi_enable(unsigned int cpu)
>  {
>  	if (atomic_read(&nmi_active) == -1) {
>  		pr_warn("NMI watchdog cannot be enabled or disabled\n");
> -		return -1;
> +		return;
>  	}
>  
>  	/*
> @@ -295,11 +295,11 @@ int watchdog_nmi_enable(unsigned int cpu)
>  	 * process first.
>  	 */
>  	if (!nmi_init_done)
> -		return 0;
> +		return;
>  
>  	smp_call_function_single(cpu, start_nmi_watchdog, NULL, 1);
>  
> -	return 0;
> +	return;

The return at the end of the function is superfluous.
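
I.e. the tail of the function can simply be (sketch):

	if (!nmi_init_done)
		return;

	smp_call_function_single(cpu, start_nmi_watchdog, NULL, 1);
}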

>  }
>  /*
>   * sparc specific NMI watchdog disable function.

Otherwise, it looks good.

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

* Re: [PATCH 3/5] kernel/watchdog_hld: Ensure CPU-bound context when creating hardlockup detector event
  2022-02-12 10:43 ` [PATCH 3/5] kernel/watchdog_hld: Ensure CPU-bound context when creating hardlockup detector event Lecopzer Chen
@ 2022-02-25 13:15   ` Petr Mladek
  0 siblings, 0 replies; 15+ messages in thread
From: Petr Mladek @ 2022-02-25 13:15 UTC (permalink / raw)
  To: Lecopzer Chen
  Cc: linux-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, davem,
	Matthias Brugger, Marc Zyngier, Julien Thierry, Kees Cook,
	Masahiro Yamada, Andrew Morton, Wang Qing, Luis Chamberlain,
	Xiaoming Ni, linux-arm-kernel, linux-perf-users, sparclinux,
	linux-mediatek, sumit.garg, kernelfans, yj.chiang

On Sat 2022-02-12 18:43:47, Lecopzer Chen wrote:
> From: Pingfan Liu <kernelfans@gmail.com>
> 
> from: Pingfan Liu <kernelfans@gmail.com>
> 
> hardlockup_detector_event_create() should create perf_event on the
> current CPU. Preemption could not get disabled because
> perf_event_create_kernel_counter() allocates memory. Instead,
> the CPU locality is achieved by processing the code in a per-CPU
> bound kthread.
> 
> Add a check to prevent mistakes when calling the code in another
> code path.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Co-developed-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

* Re: [PATCH 4/5] kernel/watchdog: Adapt the watchdog_hld interface for async model
  2022-02-12 10:43 ` [PATCH 4/5] kernel/watchdog: Adapt the watchdog_hld interface for async model Lecopzer Chen
@ 2022-02-25 15:20   ` Petr Mladek
  2022-02-26 10:52     ` Lecopzer Chen
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2022-02-25 15:20 UTC (permalink / raw)
  To: Lecopzer Chen
  Cc: linux-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, davem,
	Matthias Brugger, Marc Zyngier, Julien Thierry, Kees Cook,
	Masahiro Yamada, Andrew Morton, Wang Qing, Luis Chamberlain,
	Xiaoming Ni, linux-arm-kernel, linux-perf-users, sparclinux,
	linux-mediatek, sumit.garg, kernelfans, yj.chiang

On Sat 2022-02-12 18:43:48, Lecopzer Chen wrote:
> From: Pingfan Liu <kernelfans@gmail.com>
> 
> from: Pingfan Liu <kernelfans@gmail.com>
> 
> When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> yet. E.g. on arm64, PMU is not ready until
> device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> with the driver model and cpuhp. Hence it is hard to push this
> initialization before smp_init().
> 
> But it is easy to take an opposite approach by enabling watchdog_hld to
> get the capability of PMU async.
> 
> The async model is achieved by expanding watchdog_nmi_probe() with
> -EBUSY, and a re-initializing work_struct which waits on a wait_queue_head.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Co-developed-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
> Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
> ---
>  kernel/watchdog.c | 56 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 54 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index b71d434cf648..fa8490cfeef8 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -839,16 +843,64 @@ static void __init watchdog_sysctl_init(void)
>  #define watchdog_sysctl_init() do { } while (0)
>  #endif /* CONFIG_SYSCTL */
>  
> +static void lockup_detector_delay_init(struct work_struct *work);
> +enum hld_detector_state detector_delay_init_state __initdata;

I would call this "lockup_detector_init_state" to use the same
naming scheme everywhere.

> +
> +struct wait_queue_head hld_detector_wait __initdata =
> +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> +
> +static struct work_struct detector_work __initdata =

I would call this "lockup_detector_work" to use the same naming scheme
everywhere.

> +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> +
> +static void __init lockup_detector_delay_init(struct work_struct *work)
> +{
> +	int ret;
> +
> +	wait_event(hld_detector_wait,
> +			detector_delay_init_state == DELAY_INIT_READY);

DELAY_INIT_READY is defined in the 5th patch.

There are many other build errors because this patch uses something
that is defined in the 5th patch.

> +	ret = watchdog_nmi_probe();
> +	if (!ret) {
> +		nmi_watchdog_available = true;
> +		lockup_detector_setup();
> +	} else {
> +		WARN_ON(ret == -EBUSY);

Why WARN_ON(), please?

Note that it might cause panic() when "panic_on_warn" command line
parameter is used.

Also the backtrace will not help much. The context is well known.
This code is called from a workqueue worker.
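
For example (just a sketch, untested), a plain message would carry the
same information without the backtrace:

	} else {
		pr_info("Delayed init of the lockup detector failed: %d\n", ret);
		pr_info("Perf NMI watchdog permanently disabled\n");
	}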


> +		pr_info("Perf NMI watchdog permanently disabled\n");
> +	}
> +}
> +
> +/* Ensure the check is called after the initialization of PMU driver */
> +static int __init lockup_detector_check(void)
> +{
> +	if (detector_delay_init_state < DELAY_INIT_WAIT)
> +		return 0;
> +
> +	if (WARN_ON(detector_delay_init_state == DELAY_INIT_WAIT)) {

Again. Is WARN_ON() needed?

Also the condition looks wrong. IMHO, this is the expected state.

> +		detector_delay_init_state = DELAY_INIT_READY;
> +		wake_up(&hld_detector_wait);
> +	}
> +	flush_work(&detector_work);
> +	return 0;
> +}
> +late_initcall_sync(lockup_detector_check);

Otherwise, it makes sense.

Best Regards,
Petr

PS: I am not going to review the last patch because I am not familiar
    with arm. I reviewed just the changes in the generic watchdog
    code.

* Re: [PATCH 1/5] kernel/watchdog: remove WATCHDOG_DEFAULT
  2022-02-25 12:47   ` Petr Mladek
@ 2022-02-26  9:52     ` Lecopzer Chen
  0 siblings, 0 replies; 15+ messages in thread
From: Lecopzer Chen @ 2022-02-26  9:52 UTC (permalink / raw)
  To: pmladek
  Cc: acme, akpm, alexander.shishkin, catalin.marinas, davem, jolsa,
	jthierry, keescook, kernelfans, lecopzer.chen, linux-arm-kernel,
	linux-kernel, linux-mediatek, linux-perf-users, mark.rutland,
	masahiroy, matthias.bgg, maz, mcgrof, mingo, namhyung,
	nixiaoming, peterz, sparclinux, sumit.garg, wangqing, will,
	yj.chiang

> On Sat 2022-02-12 18:43:45, Lecopzer Chen wrote:
> > No reference to WATCHDOG_DEFAULT, remove it.
> 
> It would be enough to mention the link to the previous
> version in the cover letter.

Thanks, I'll tweak the commit message in the next version of the patch.

> 
> > this commit reworks from [1].
> > 
> > [1] https://lore.kernel.org/lkml/20211014024155.15253-2-kernelfans@gmail.com/
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
> 
> Reviewed-by: Petr Mladek <pmladek@suse.com>


BRs,
Lecopzer

* Re: [PATCH 2/5] kernel/watchdog: change watchdog_nmi_enable() to void
  2022-02-25 12:50   ` Petr Mladek
@ 2022-02-26  9:54     ` Lecopzer Chen
  0 siblings, 0 replies; 15+ messages in thread
From: Lecopzer Chen @ 2022-02-26  9:54 UTC (permalink / raw)
  To: pmladek
  Cc: acme, akpm, alexander.shishkin, catalin.marinas, davem, jolsa,
	jthierry, keescook, kernelfans, lecopzer.chen, linux-arm-kernel,
	linux-kernel, linux-mediatek, linux-perf-users, mark.rutland,
	masahiroy, matthias.bgg, maz, mcgrof, mingo, namhyung,
	nixiaoming, peterz, sparclinux, sumit.garg, wangqing, will,
	yj.chiang

> On Sat 2022-02-12 18:43:46, Lecopzer Chen wrote:
> > 
> > Nobody cares about the return value of watchdog_nmi_enable(),
> > changing its prototype to void.
> > 
> > --- a/arch/sparc/kernel/nmi.c
> > +++ b/arch/sparc/kernel/nmi.c
> > @@ -282,11 +282,11 @@ __setup("nmi_watchdog=", setup_nmi_watchdog);
> >   * sparc specific NMI watchdog enable function.
> >   * Enables watchdog if it is not enabled already.
> >   */
> > -int watchdog_nmi_enable(unsigned int cpu)
> > +void watchdog_nmi_enable(unsigned int cpu)
> >  {
> >  	if (atomic_read(&nmi_active) == -1) {
> >  		pr_warn("NMI watchdog cannot be enabled or disabled\n");
> > -		return -1;
> > +		return;
> >  	}
> >  
> >  	/*
> > @@ -295,11 +295,11 @@ int watchdog_nmi_enable(unsigned int cpu)
> >  	 * process first.
> >  	 */
> >  	if (!nmi_init_done)
> > -		return 0;
> > +		return;
> >  
> >  	smp_call_function_single(cpu, start_nmi_watchdog, NULL, 1);
> >  
> > -	return 0;
> > +	return;
> 
> Return at the end of the function is superfluous.

Thanks, I'll fix it in the next patch.

> 
> >  }
> >  /*
> >   * sparc specific NMI watchdog disable function.
> 
> Otherwise, it looks good.
> 
> Reviewed-by: Petr Mladek <pmladek@suse.com>
> 

* Re: [PATCH 4/5] kernel/watchdog: Adapt the watchdog_hld interface for async model
  2022-02-25 15:20   ` Petr Mladek
@ 2022-02-26 10:52     ` Lecopzer Chen
  2022-02-28 10:14       ` Petr Mladek
  0 siblings, 1 reply; 15+ messages in thread
From: Lecopzer Chen @ 2022-02-26 10:52 UTC (permalink / raw)
  To: pmladek
  Cc: acme, akpm, alexander.shishkin, catalin.marinas, davem, jolsa,
	jthierry, keescook, kernelfans, lecopzer.chen, linux-arm-kernel,
	linux-kernel, linux-mediatek, linux-perf-users, mark.rutland,
	masahiroy, matthias.bgg, maz, mcgrof, mingo, namhyung,
	nixiaoming, peterz, sparclinux, sumit.garg, wangqing, will,
	yj.chiang

> On Sat 2022-02-12 18:43:48, Lecopzer Chen wrote:
> > From: Pingfan Liu <kernelfans@gmail.com>
> > 
> > from: Pingfan Liu <kernelfans@gmail.com>
> > 
> > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > yet. E.g. on arm64, PMU is not ready until
> > device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> > with the driver model and cpuhp. Hence it is hard to push this
> > initialization before smp_init().
> > 
> > But it is easy to take an opposite approach by enabling watchdog_hld to
> > get the capability of PMU async.
> > 
> > The async model is achieved by expanding watchdog_nmi_probe() with
> > -EBUSY, and a re-initializing work_struct which waits on a wait_queue_head.
> > 
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Co-developed-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
> > Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
> > ---
> >  kernel/watchdog.c | 56 +++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 54 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index b71d434cf648..fa8490cfeef8 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -839,16 +843,64 @@ static void __init watchdog_sysctl_init(void)
> >  #define watchdog_sysctl_init() do { } while (0)
> >  #endif /* CONFIG_SYSCTL */
> >  
> > +static void lockup_detector_delay_init(struct work_struct *work);
> > +enum hld_detector_state detector_delay_init_state __initdata;
> 
> I would call this "lockup_detector_init_state" to use the same
> naming scheme everywhere.
> 
> > +
> > +struct wait_queue_head hld_detector_wait __initdata =
> > +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> > +
> > +static struct work_struct detector_work __initdata =
> 
> I would call this "lockup_detector_work" to use the same naming scheme
> everywhere.

For the naming part, I'll revise both of them in next patch.

> 
> > +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> > +
> > +static void __init lockup_detector_delay_init(struct work_struct *work)
> > +{
> > +	int ret;
> > +
> > +	wait_event(hld_detector_wait,
> > +			detector_delay_init_state == DELAY_INIT_READY);
> 
> DELAY_INIT_READY is defined in the 5th patch.
> 
> There are many other build errors because this patch uses something
> that is defined in the 5th patch.

Thanks for pointing this out; I'll fix the 4th and 5th patches to correct the order.

> 
> > +	ret = watchdog_nmi_probe();
> > +	if (!ret) {
> > +		nmi_watchdog_available = true;
> > +		lockup_detector_setup();
> > +	} else {
> > +		WARN_ON(ret == -EBUSY);
> 
> Why WARN_ON(), please?
> 
> Note that it might cause panic() when "panic_on_warn" command line
> parameter is used.
> 
> Also the backtrace will not help much. The context is well known.
> This code is called from a workqueue worker.
 
The motivation for the WARN was:

lockup_detector_init
-> watchdog_nmi_probe return -EBUSY
-> lockup_detector_delay_init checks (detector_delay_init_state == DELAY_INIT_READY)
-> watchdog_nmi_probe checks
+	if (detector_delay_init_state != DELAY_INIT_READY)
+		return -EBUSY;

Since we first check that detector_delay_init_state equals DELAY_INIT_READY,
then go into watchdog_nmi_probe() and check detector_delay_init_state again
(because now we move from the common part to the arch part of the code),
there shouldn't be any racing on detector_delay_init_state.
If an unknown race does happen, a warning is shown for it.

I think it makes sense to remove the WARN now because it looks verbose...
However, I would rather change the following printk to
"Delayed init for lockup detector failed."

Is this fine with you?



> 
> > +		pr_info("Perf NMI watchdog permanently disabled\n");
> > +	}
> > +}
> > +
> > +/* Ensure the check is called after the initialization of PMU driver */
> > +static int __init lockup_detector_check(void)
> > +{
> > +	if (detector_delay_init_state < DELAY_INIT_WAIT)
> > +		return 0;
> > +
> > +	if (WARN_ON(detector_delay_init_state == DELAY_INIT_WAIT)) {
> 
> Again. Is WARN_ON() needed?
> 
> Also the condition looks wrong. IMHO, this is the expected state.
> 

This does expect DELAY_INIT_READY here, which means everyone who comes
here to be checked should be READY, and we WARN if we are still in the
WAIT state, which means the previous lockup_detector_delay_init()
failed.

IMO, either keeping or removing WARN is fine with me.

I think I'll remove WARN and add
pr_info("Delayed init checking for lockup detector failed, retry for once.");
inside the `if (detector_delay_init_state == DELAY_INIT_WAIT)`

Or do you have any other suggestion? Thanks.

> > +		detector_delay_init_state = DELAY_INIT_READY;
> > +		wake_up(&hld_detector_wait);
> > +	}
> > +	flush_work(&detector_work);
> > +	return 0;
> > +}
> > +late_initcall_sync(lockup_detector_check);
> 
> Otherwise, it make sense.
> 
> Best Regards,
> Petr
> 
> PS: I am not going to review the last patch because I am no familiar
>     with arm. I reviewed just the changes in the generic watchdog
>     code.

Thanks again for your review.


BRs,
Lecopzer

* Re: [PATCH 4/5] kernel/watchdog: Adapt the watchdog_hld interface for async model
  2022-02-26 10:52     ` Lecopzer Chen
@ 2022-02-28 10:14       ` Petr Mladek
  2022-02-28 16:32         ` Lecopzer Chen
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Mladek @ 2022-02-28 10:14 UTC (permalink / raw)
  To: Lecopzer Chen
  Cc: acme, akpm, alexander.shishkin, catalin.marinas, davem, jolsa,
	jthierry, keescook, kernelfans, linux-arm-kernel, linux-kernel,
	linux-mediatek, linux-perf-users, mark.rutland, masahiroy,
	matthias.bgg, maz, mcgrof, mingo, namhyung, nixiaoming, peterz,
	sparclinux, sumit.garg, wangqing, will, yj.chiang

On Sat 2022-02-26 18:52:29, Lecopzer Chen wrote:
> > On Sat 2022-02-12 18:43:48, Lecopzer Chen wrote:
> > > From: Pingfan Liu <kernelfans@gmail.com>
> > > 
> > > from: Pingfan Liu <kernelfans@gmail.com>
> > > 
> > > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > > yet. E.g. on arm64, PMU is not ready until
> > > device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> > > with the driver model and cpuhp. Hence it is hard to push this
> > > initialization before smp_init().
> > > 
> > > But it is easy to take an opposite approach by enabling watchdog_hld to
> > > get the capability of PMU async.
> > > 
> > > The async model is achieved by expanding watchdog_nmi_probe() with
> > > -EBUSY, and a re-initializing work_struct which waits on a wait_queue_head.
> > > 
> > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > index b71d434cf648..fa8490cfeef8 100644
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -839,16 +843,64 @@ static void __init watchdog_sysctl_init(void)
> > >  #define watchdog_sysctl_init() do { } while (0)
> > >  #endif /* CONFIG_SYSCTL */
> > >  
> > > +static void lockup_detector_delay_init(struct work_struct *work);
> > > +enum hld_detector_state detector_delay_init_state __initdata;
> > 
> > I would call this "lockup_detector_init_state" to use the same
> > naming scheme everywhere.
> > 
> > > +
> > > +struct wait_queue_head hld_detector_wait __initdata =
> > > +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> > > +
> > > +static struct work_struct detector_work __initdata =
> > 
> > I would call this "lockup_detector_work" to use the same naming scheme
> > everywhere.
> 
> For the naming part, I'll revise both of them in next patch.
> 
> > 
> > > +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> > > +
> > > +static void __init lockup_detector_delay_init(struct work_struct *work)
> > > +{
> > > +	int ret;
> > > +
> > > +	wait_event(hld_detector_wait,
> > > +			detector_delay_init_state == DELAY_INIT_READY);
> > 
> > DELAY_INIT_READY is defined in the 5th patch.
> > 
> > There are many other build errors because this patch uses something
> > that is defined in the 5th patch.
> 
> Thanks for pointing this out, the I'll fix 4th and 5th patches to correct the order.
> 
> > 
> > > +	ret = watchdog_nmi_probe();
> > > +	if (!ret) {
> > > +		nmi_watchdog_available = true;
> > > +		lockup_detector_setup();
> > > +	} else {
> > > +		WARN_ON(ret == -EBUSY);
> > 
> > Why WARN_ON(), please?
> > 
> > Note that it might cause panic() when "panic_on_warn" command line
> > parameter is used.
> > 
> > Also the backtrace will not help much. The context is well known.
> > This code is called from a workqueue worker.
>  
> The motivation to WARN should be:
> 
> lockup_detector_init
> -> watchdog_nmi_probe return -EBUSY
> -> lockup_detector_delay_init checks (detector_delay_init_state == DELAY_INIT_READY)
> -> watchdog_nmi_probe checks
> +	if (detector_delay_init_state != DELAY_INIT_READY)
> +		return -EBUSY;
> 
> Since we first check detector_delay_init_state equals to DELAY_INIT_READY
> and goes into watchdog_nmi_probe() and checks detector_delay_init_state again
> becasue now we move from common part to arch part code.
> In this condition, there shouldn't have any racing to detector_delay_init_state.
> If it does happend an unknown racing, then shows a warning to it.

There should not be any race.

     wait_event(hld_detector_wait,
		detector_delay_init_state == DELAY_INIT_READY);

waits until it is woken by lockup_detector_check(). Well, it could
wait forever when lockup_detector_check() is called earlier, see below.


> I think it make sense to remove WARN now becasue it looks verbosely...
> However, I would rather change the following printk to
> "Delayed init for lockup detector failed."

I would print both messages. The above message says what failed.


> > > +		pr_info("Perf NMI watchdog permanently disabled\n");

And this message explains what is the result of the above failure.
It is not obvious.

> > > +	}
> > > +}
> > > +
> > > +/* Ensure the check is called after the initialization of PMU driver */
> > > +static int __init lockup_detector_check(void)
> > > +{
> > > +	if (detector_delay_init_state < DELAY_INIT_WAIT)
> > > +		return 0;
> > > +
> > > +	if (WARN_ON(detector_delay_init_state == DELAY_INIT_WAIT)) {
> > 
> > Again. Is WARN_ON() needed?
> > 
> > Also the condition looks wrong. IMHO, this is the expected state.
> > 
> 
> This does expected DELAY_INIT_READY here, which means,
> every one who comes here to be checked should be READY and WARN if you're
> still in WAIT state, and which means the previous lockup_detector_delay_init()
> failed.

No, DELAY_INIT_READY is set below. DELAY_INIT_WAIT is a valid value here.
It means that the lockup_detector_delay_init() work is queued.


> IMO, either keeping or removing WARN is fine with me.
> 
> I think I'll remove WARN and add
> pr_info("Delayed init checking for lockup detector failed, retry for once.");
> inside the `if (detector_delay_init_state == DELAY_INIT_WAIT)`
> 
> Or would you have any other suggestion? thanks.
> 
> > > +		detector_delay_init_state = DELAY_INIT_READY;
> > > +		wake_up(&hld_detector_wait);

I see another problem now. We should always call the wake-up here
when the work was queued. Otherwise, the worker will stay blocked
forever.

The worker will also get blocked when the late_initcall is called
before the work is processed by a worker.

> > > +	}
> > > +	flush_work(&detector_work);
> > > +	return 0;
> > > +}
> > > +late_initcall_sync(lockup_detector_check);


OK, I think that the three states are too complicated. I suggest
using only a single bool. Something like:

static bool lockup_detector_pending_init __initdata;

struct wait_queue_head lockup_detector_wait __initdata =
		__WAIT_QUEUE_HEAD_INITIALIZER(lockup_detector_wait);

static struct work_struct lockup_detector_work __initdata =
		__WORK_INITIALIZER(lockup_detector_work,
				   lockup_detector_delay_init);

static void __init lockup_detector_delay_init(struct work_struct *work)
{
	int ret;

	wait_event(lockup_detector_wait, lockup_detector_pending_init == false);

	ret = watchdog_nmi_probe();
	if (ret) {
		pr_info("Delayed init of the lockup detector failed: %\n);
		pr_info("Perf NMI watchdog permanently disabled\n");
		return;
	}

	nmi_watchdog_available = true;
	lockup_detector_setup();
}

/* Ensure the delayed init is triggered after the initialization of the PMU driver */
static int __init lockup_detector_check(void)
{
	if (!lockup_detector_pending_init)
		return 0;

	lockup_detector_pending_init = false;
	wake_up(&lockup_detector_wait);
	return 0;
}
late_initcall_sync(lockup_detector_check);

void __init lockup_detector_init(void)
{
	int ret;

	if (tick_nohz_full_enabled())
		pr_info("Disabling watchdog on nohz_full cores by default\n");

	cpumask_copy(&watchdog_cpumask,
		     housekeeping_cpumask(HK_FLAG_TIMER));

	ret = watchdog_nmi_probe();
	if (!ret)
		nmi_watchdog_available = true;
	else if (ret == -EBUSY) {
		lockup_detector_pending_init = true;
		/* Init must be done in a process context on a bound CPU. */
		queue_work_on(smp_processor_id(), system_wq, 
				  &lockup_detector_work);
	}

	lockup_detector_setup();
	watchdog_sysctl_init();
}

The result is that lockup_detector_delay_init() will never stay blocked
forever. There are two possibilities:

1. lockup_detector_delay_init() is called before lockup_detector_check().
   In this case, wait_event() will wait until lockup_detector_check()
   clears lockup_detector_pending_init and calls wake_up().

2. lockup_detector_check() is called before lockup_detector_delay_init().
   In this case, wait_event() will immediately continue because
   it will see the cleared lockup_detector_pending_init.


Best Regards,
Petr

* Re: [PATCH 4/5] kernel/watchdog: Adapt the watchdog_hld interface for async model
  2022-02-28 10:14       ` Petr Mladek
@ 2022-02-28 16:32         ` Lecopzer Chen
  0 siblings, 0 replies; 15+ messages in thread
From: Lecopzer Chen @ 2022-02-28 16:32 UTC (permalink / raw)
  To: pmladek
  Cc: acme, akpm, alexander.shishkin, catalin.marinas, davem, jolsa,
	jthierry, keescook, kernelfans, lecopzer.chen, linux-arm-kernel,
	linux-kernel, linux-mediatek, linux-perf-users, mark.rutland,
	masahiroy, matthias.bgg, maz, mcgrof, mingo, namhyung,
	nixiaoming, peterz, sparclinux, sumit.garg, wangqing, will,
	yj.chiang

Yes, there is no race now; the condition is more like a verbose check of
the state. I'll remove it.


> > I think it make sense to remove WARN now becasue it looks verbosely...
> > However, I would rather change the following printk to
> > "Delayed init for lockup detector failed."
> 
> I would print both messages. The above message says what failed.
> 
> 
> > > > +		pr_info("Perf NMI watchdog permanently disabled\n");
> 
> And this message explains what is the result of the above failure.
> It is not obvious.

Yes, that makes sense, let's print both.


> 
> > > > +	}
> > > > +}
> > > > +
> > > > +/* Ensure the check is called after the initialization of PMU driver */
> > > > +static int __init lockup_detector_check(void)
> > > > +{
> > > > +	if (detector_delay_init_state < DELAY_INIT_WAIT)
> > > > +		return 0;
> > > > +
> > > > +	if (WARN_ON(detector_delay_init_state == DELAY_INIT_WAIT)) {
> > > 
> > > Again. Is WARN_ON() needed?
> > > 
> > > Also the condition looks wrong. IMHO, this is the expected state.
> > > 
> > 
> > This does expected DELAY_INIT_READY here, which means,
> > every one who comes here to be checked should be READY and WARN if you're
> > still in WAIT state, and which means the previous lockup_detector_delay_init()
> > failed.
> 
> No, DELAY_INIT_READY is set below. DELAY_INIT_WAIT is valid value here.
> It means that lockup_detector_delay_init() work is queued.
> 

Sorry, I didn't describe it clearly.

For the call flow:

kernel_init_freeable()
-> lockup_detector_init()
--> queue work(lockup_detector_delay_init) with the state set
    to DELAY_INIT_WAIT.
---> lockup_detector_delay_init waits for DELAY_INIT_READY, which is set
     by armv8_pmu_driver_init().
----> device_initcall(armv8_pmu_driver_init)
      sets the state to READY and wakes up the work. (in 5th patch)
-----> lockup_detector_delay_init receives READY and calls
       watchdog_nmi_probe() again.
------> late_initcall_sync(lockup_detector_check)
        checks whether the state is READY. In other words, did the arch driver
        finish probing the watchdog between "queue work" and "late_initcall_sync()"?
        If not, we forcibly set the state to READY and wake_up again.


> 
> > IMO, either keeping or removing WARN is fine with me.
> > 
> > I think I'll remove WARN and add
> > pr_info("Delayed init checking for lockup detector failed, retry for once.");
> > inside the `if (detector_delay_init_state == DELAY_INIT_WAIT)`
> > 
> > Or would you have any other suggestion? thanks.
> > 
> > > > +		detector_delay_init_state = DELAY_INIT_READY;
> > > > +		wake_up(&hld_detector_wait);
> 
> I see another problem now. We should always call the wake up here
> when the work was queued. Otherwise, the worker will stay blocked
> forewer.
> 
> The worker will also get blocked when the late_initcall is called
> before the work is proceed by a worker.

lockup_detector_check() is used to resolve the blocked state.
As described above, if the state is still WAIT when lockup_detector_check()
runs, we forcibly set the state to READY and wake up the work once.
After lockup_detector_check(), nobody cares about the state and the worker
has also finished its work.

> 
> > > > +	}
> > > > +	flush_work(&detector_work);
> > > > +	return 0;
> > > > +}
> > > > +late_initcall_sync(lockup_detector_check);
> 
> 
> OK, I think that the three states are too complicated. I suggest to
> use only a single bool. Something like:
> 
> static bool lockup_detector_pending_init __initdata;
> 
> struct wait_queue_head lockup_detector_wait __initdata =
> 		__WAIT_QUEUE_HEAD_INITIALIZER(lockup_detector_wait);
> 
> static struct work_struct detector_work __initdata =
> 		__WORK_INITIALIZER(lockup_detector_work,
> 				   lockup_detector_delay_init);
> 
> static void __init lockup_detector_delay_init(struct work_struct *work)
> {
> 	int ret;
> 
> 	wait_event(lockup_detector_wait, lockup_detector_pending_init == false);
> 
> 	ret = watchdog_nmi_probe();
> 	if (ret) {
> 		pr_info("Delayed init of the lockup detector failed: %\n);
> 		pr_info("Perf NMI watchdog permanently disabled\n");
> 		return;
> 	}
> 
> 	nmi_watchdog_available = true;
> 	lockup_detector_setup();
> }
> 
> /* Trigger delayedEnsure the check is called after the initialization of PMU driver */
> static int __init lockup_detector_check(void)
> {
> 	if (!lockup_detector_pending_init)
> 		return;
> 
> 	lockup_detector_pending_init = false;
> 	wake_up(&lockup_detector_wait);
> 	return 0;
> }
> late_initcall_sync(lockup_detector_check);
> 
> void __init lockup_detector_init(void)
> {
> 	int ret;
> 
> 	if (tick_nohz_full_enabled())
> 		pr_info("Disabling watchdog on nohz_full cores by default\n");
> 
> 	cpumask_copy(&watchdog_cpumask,
> 		     housekeeping_cpumask(HK_FLAG_TIMER));
> 
> 	ret = watchdog_nmi_probe();
> 	if (!ret)
> 		nmi_watchdog_available = true;
> 	else if (ret == -EBUSY) {
> 		detector_delay_pending_init = true;
> 		/* Init must be done in a process context on a bound CPU. */
> 		queue_work_on(smp_processor_id(), system_wq, 
> 				  &lockup_detector_work);
> 	}
> 
> 	lockup_detector_setup();
> 	watchdog_sysctl_init();
> }
> 
> The result is that lockup_detector_work() will never stay blocked
> forever. There are two possibilities:
> 
> 1.  lockup_detector_work() called before lockup_detector_check().
>     In this case, wait_event() will wait until lockup_detector_check()
>     clears detector_delay_pending_init and calls wake_up().
> 
> 2. lockup_detector_check() called before lockup_detector_work().
>    In this case, wait_even() will immediately continue because
>    it will see cleared detector_delay_pending_init.
> 

Thanks, I think this logic is much simpler than the three states for our
use case now. It also fits the call flow described above; I will revise it
based on this code.


Thanks a lot for your code and review!

BRs,
Lecopzer
