linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] watchdog_hld cleanup and async model for arm64
@ 2021-09-15  3:50 Pingfan Liu
  2021-09-15  3:50 ` [PATCH 1/5] kernel/watchdog: remove useless WATCHDOG_DEFAULT Pingfan Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Pingfan Liu @ 2021-09-15  3:50 UTC (permalink / raw)
  To: linux-arm-kernel, linux-kernel
  Cc: Pingfan Liu, Sumit Garg, Catalin Marinas, Will Deacon,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Sami Tolvanen,
	Petr Mladek, Andrew Morton, Wang Qing, Peter Zijlstra (Intel),
	Santosh Sivaraj

The hard lockup detector is helpful for diagnosing unpaired irq enable/disable.
Sumit has tried with a series, and the latest version is V5 [1].
Since it has gone a long time without any update, I am taking another try,
which addresses the delayed initialization of watchdog_hld.

There is an obstacle to integrating the arm64 hw perf event into watchdog_hld:
when lockup_detector_init()->watchdog_nmi_probe() runs, on arm64 the PMU is not
ready, since it is not initialized until device_initcall(armv8_pmu_driver_init).
And the PMU is deeply integrated with the driver model and cpuhp, so it is hard
to move the initialization in armv8_pmu_driver_init() before smp_init().

But it is easy to take the opposite approach: let watchdog_hld acquire the
PMU capability asynchronously. The async model is achieved by introducing
an extra notifier parameter to watchdog_nmi_probe().

In this series, [1-2/5] are trivial cleanups, and [3-5/5] implement this async
model.
    
[1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org

Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Julien Thierry <jthierry@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wang Qing <wangqing@vivo.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Santosh Sivaraj <santosh@fossix.org>
To: linux-arm-kernel@lists.infradead.org
To: linux-kernel@vger.kernel.org


Pingfan Liu (5):
  kernel/watchdog: remove useless WATCHDOG_DEFAULT
  kernel/watchdog_hld: clarify the condition in
    hardlockup_detector_event_create()
  kernel/watchdog: adapt the watchdog_hld interface for async model
  kernel/watchdog_hld: simplify the detecting of hld watchdog
  arm64/watchdog_hld: enable hard lockup on arm64 platform

 arch/arm64/Kconfig                  |  3 ++
 arch/arm64/include/asm/perf_event.h |  5 ++
 arch/arm64/kernel/Makefile          |  1 +
 arch/arm64/kernel/perf_event.c      | 14 ++++-
 arch/arm64/kernel/watchdog_hld.c    | 83 +++++++++++++++++++++++++++++
 drivers/perf/arm_pmu.c              |  5 ++
 include/linux/nmi.h                 | 20 ++++---
 kernel/watchdog.c                   | 69 ++++++++++++++++++------
 kernel/watchdog_hld.c               | 26 +++------
 9 files changed, 181 insertions(+), 45 deletions(-)
 create mode 100644 arch/arm64/kernel/watchdog_hld.c

-- 
2.31.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/5] kernel/watchdog: remove useless WATCHDOG_DEFAULT
  2021-09-15  3:50 [PATCH 0/5] watchdog_hld cleanup and async model for arm64 Pingfan Liu
@ 2021-09-15  3:50 ` Pingfan Liu
  2021-09-15  3:51 ` [PATCH 2/5] kernel/watchdog_hld: clarify the condition in hardlockup_detector_event_create() Pingfan Liu
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Pingfan Liu @ 2021-09-15  3:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Petr Mladek, Andrew Morton, Wang Qing,
	Peter Zijlstra (Intel),
	Santosh Sivaraj

Since there are no references to WATCHDOG_DEFAULT, remove it.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wang Qing <wangqing@vivo.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Santosh Sivaraj <santosh@fossix.org>
To: linux-kernel@vger.kernel.org
---
 kernel/watchdog.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index ad912511a0c0..e2a9e3331416 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -30,10 +30,8 @@
 static DEFINE_MUTEX(watchdog_mutex);
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
-# define WATCHDOG_DEFAULT	(SOFT_WATCHDOG_ENABLED | NMI_WATCHDOG_ENABLED)
 # define NMI_WATCHDOG_DEFAULT	1
 #else
-# define WATCHDOG_DEFAULT	(SOFT_WATCHDOG_ENABLED)
 # define NMI_WATCHDOG_DEFAULT	0
 #endif
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/5] kernel/watchdog_hld: clarify the condition in hardlockup_detector_event_create()
  2021-09-15  3:50 [PATCH 0/5] watchdog_hld cleanup and async model for arm64 Pingfan Liu
  2021-09-15  3:50 ` [PATCH 1/5] kernel/watchdog: remove useless WATCHDOG_DEFAULT Pingfan Liu
@ 2021-09-15  3:51 ` Pingfan Liu
  2021-09-15  4:06   ` Andrew Morton
  2021-09-15 13:45   ` Peter Zijlstra
  2021-09-15  3:51 ` [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model Pingfan Liu
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 21+ messages in thread
From: Pingfan Liu @ 2021-09-15  3:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Petr Mladek, Andrew Morton, Wang Qing,
	Peter Zijlstra (Intel),
	Santosh Sivaraj, Sumit Garg, Will Deacon, Mark Rutland

hardlockup_detector_event_create() indirectly calls
kmem_cache_alloc_node(), which may block.

So the context this function is really intended for is is_percpu_thread().

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wang Qing <wangqing@vivo.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Santosh Sivaraj <santosh@fossix.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
To: linux-kernel@vger.kernel.org
---
 kernel/watchdog_hld.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..6876e796dbf5 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -165,10 +165,13 @@ static void watchdog_overflow_callback(struct perf_event *event,
 
 static int hardlockup_detector_event_create(void)
 {
-	unsigned int cpu = smp_processor_id();
+	unsigned int cpu;
 	struct perf_event_attr *wd_attr;
 	struct perf_event *evt;
 
+	/* This function is expected to execute in a CPU-bound kthread */
+	BUG_ON(!is_percpu_thread());
+	cpu = raw_smp_processor_id();
 	wd_attr = &wd_hw_attr;
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model
  2021-09-15  3:50 [PATCH 0/5] watchdog_hld cleanup and async model for arm64 Pingfan Liu
  2021-09-15  3:50 ` [PATCH 1/5] kernel/watchdog: remove useless WATCHDOG_DEFAULT Pingfan Liu
  2021-09-15  3:51 ` [PATCH 2/5] kernel/watchdog_hld: clarify the condition in hardlockup_detector_event_create() Pingfan Liu
@ 2021-09-15  3:51 ` Pingfan Liu
  2021-09-15 14:02   ` Peter Zijlstra
  2021-09-16  8:29   ` Petr Mladek
  2021-09-15  3:51 ` [PATCH 4/5] kernel/watchdog_hld: simplify the detecting of hld watchdog Pingfan Liu
  2021-09-15  3:51 ` [PATCH 5/5] arm64/watchdog_hld: enable hard lockup on arm64 platform Pingfan Liu
  4 siblings, 2 replies; 21+ messages in thread
From: Pingfan Liu @ 2021-09-15  3:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Sumit Garg, Catalin Marinas, Will Deacon,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Sami Tolvanen,
	Petr Mladek, Andrew Morton, Wang Qing, Peter Zijlstra (Intel),
	Santosh Sivaraj

When lockup_detector_init()->watchdog_nmi_probe() runs, the PMU may not be
ready yet. E.g. on arm64, the PMU is not ready until
device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
with the driver model and cpuhp, so it is hard to move this
initialization before smp_init().

But it is easy to take the opposite approach: let watchdog_hld acquire the
PMU capability asynchronously.

The async model is achieved by introducing an extra parameter notifier
of watchdog_nmi_probe().

Note that after this patch, the async model, which is utilized by the next
patch, does not yet take effect.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Julien Thierry <jthierry@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wang Qing <wangqing@vivo.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Santosh Sivaraj <santosh@fossix.org>
To: linux-kernel@vger.kernel.org
---
 include/linux/nmi.h | 12 +++++++++--
 kernel/watchdog.c   | 49 +++++++++++++++++++++++++++++++++++----------
 2 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 750c7f395ca9..70665fa6e0a9 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -78,8 +78,10 @@ static inline void reset_hung_task_detector(void) { }
  */
 #define NMI_WATCHDOG_ENABLED_BIT   0
 #define SOFT_WATCHDOG_ENABLED_BIT  1
+#define NMI_WATCHDOG_UNDETERMINED_BIT  2
 #define NMI_WATCHDOG_ENABLED      (1 << NMI_WATCHDOG_ENABLED_BIT)
 #define SOFT_WATCHDOG_ENABLED     (1 << SOFT_WATCHDOG_ENABLED_BIT)
+#define NMI_WATCHDOG_UNDETERMINED    (1 << NMI_WATCHDOG_UNDETERMINED_BIT)
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void hardlockup_detector_disable(void);
@@ -116,10 +118,16 @@ static inline int hardlockup_detector_perf_init(void) { return 0; }
 # endif
 #endif
 
+struct watchdog_nmi_status {
+	unsigned int cpu;
+	int status;
+};
+
+typedef void (*watchdog_nmi_status_reporter)(struct watchdog_nmi_status *);
 void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
-int watchdog_nmi_probe(void);
-int watchdog_nmi_enable(unsigned int cpu);
+int watchdog_nmi_probe(watchdog_nmi_status_reporter notifier);
+void watchdog_nmi_enable(unsigned int cpu);
 void watchdog_nmi_disable(unsigned int cpu);
 
 /**
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index e2a9e3331416..4ab71943d65f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -40,7 +40,7 @@ int __read_mostly watchdog_user_enabled = 1;
 int __read_mostly nmi_watchdog_user_enabled = NMI_WATCHDOG_DEFAULT;
 int __read_mostly soft_watchdog_user_enabled = 1;
 int __read_mostly watchdog_thresh = 10;
-static int __read_mostly nmi_watchdog_available;
+static int __read_mostly nmi_watchdog_status;
 
 struct cpumask watchdog_cpumask __read_mostly;
 unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
@@ -85,6 +85,10 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
 
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
+static void lockup_detector_update_enable(void);
+
+static watchdog_nmi_status_reporter status_reporter;
+
 /*
  * These functions can be overridden if an architecture implements its
  * own hardlockup detector.
@@ -93,10 +97,9 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
  * softlockup watchdog start and stop. The arch must select the
  * SOFTLOCKUP_DETECTOR Kconfig.
  */
-int __weak watchdog_nmi_enable(unsigned int cpu)
+void __weak watchdog_nmi_enable(unsigned int cpu)
 {
 	hardlockup_detector_perf_enable();
-	return 0;
 }
 
 void __weak watchdog_nmi_disable(unsigned int cpu)
@@ -104,8 +107,28 @@ void __weak watchdog_nmi_disable(unsigned int cpu)
 	hardlockup_detector_perf_disable();
 }
 
-/* Return 0, if a NMI watchdog is available. Error code otherwise */
-int __weak __init watchdog_nmi_probe(void)
+static void watchdog_nmi_report_capability(struct watchdog_nmi_status *data)
+{
+	/* Set status to 1 temporarily to block any further access */
+	if (atomic_cmpxchg((atomic_t *)&nmi_watchdog_status, -EBUSY, 1)
+			== -EBUSY) {
+		if (!data->status) {
+			nmi_watchdog_status = 0;
+			lockup_detector_update_enable();
+		} else {
+			nmi_watchdog_status = -ENODEV;
+			/* turn off watchdog_enabled forever */
+			lockup_detector_update_enable();
+			pr_info("Perf NMI watchdog permanently disabled\n");
+		}
+	}
+}
+
+/*
+ * Return 0 if an NMI watchdog is available, -ENODEV if unavailable, or -EBUSY
+ * if undetermined at this stage; an async notifier will update the status later.
+ */
+int __weak __init watchdog_nmi_probe(watchdog_nmi_status_reporter notifier)
 {
 	return hardlockup_detector_perf_init();
 }
@@ -144,8 +167,12 @@ static void lockup_detector_update_enable(void)
 	watchdog_enabled = 0;
 	if (!watchdog_user_enabled)
 		return;
-	if (nmi_watchdog_available && nmi_watchdog_user_enabled)
-		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
+	if (nmi_watchdog_user_enabled) {
+		if (nmi_watchdog_status == 0)
+			watchdog_enabled |= NMI_WATCHDOG_ENABLED;
+		else if (nmi_watchdog_status == -EBUSY)
+			watchdog_enabled |= NMI_WATCHDOG_UNDETERMINED;
+	}
 	if (soft_watchdog_user_enabled)
 		watchdog_enabled |= SOFT_WATCHDOG_ENABLED;
 }
@@ -467,7 +494,8 @@ static void watchdog_enable(unsigned int cpu)
 	/* Initialize timestamp */
 	update_touch_ts();
 	/* Enable the perf event */
-	if (watchdog_enabled & NMI_WATCHDOG_ENABLED)
+	if (watchdog_enabled &
+			(NMI_WATCHDOG_ENABLED | NMI_WATCHDOG_UNDETERMINED))
 		watchdog_nmi_enable(cpu);
 }
 
@@ -682,7 +710,7 @@ int proc_watchdog(struct ctl_table *table, int write,
 int proc_nmi_watchdog(struct ctl_table *table, int write,
 		      void *buffer, size_t *lenp, loff_t *ppos)
 {
-	if (!nmi_watchdog_available && write)
+	if (!nmi_watchdog_status && write)
 		return -ENOTSUPP;
 	return proc_watchdog_common(NMI_WATCHDOG_ENABLED,
 				    table, write, buffer, lenp, ppos);
@@ -748,7 +776,6 @@ void __init lockup_detector_init(void)
 	cpumask_copy(&watchdog_cpumask,
 		     housekeeping_cpumask(HK_FLAG_TIMER));
 
-	if (!watchdog_nmi_probe())
-		nmi_watchdog_available = true;
+	nmi_watchdog_status = watchdog_nmi_probe(watchdog_nmi_report_capability);
 	lockup_detector_setup();
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/5] kernel/watchdog_hld: simplify the detecting of hld watchdog
  2021-09-15  3:50 [PATCH 0/5] watchdog_hld cleanup and async model for arm64 Pingfan Liu
                   ` (2 preceding siblings ...)
  2021-09-15  3:51 ` [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model Pingfan Liu
@ 2021-09-15  3:51 ` Pingfan Liu
  2021-09-15  3:51 ` [PATCH 5/5] arm64/watchdog_hld: enable hard lockup on arm64 platform Pingfan Liu
  4 siblings, 0 replies; 21+ messages in thread
From: Pingfan Liu @ 2021-09-15  3:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Petr Mladek, Andrew Morton, Wang Qing,
	Peter Zijlstra (Intel),
	Santosh Sivaraj

By utilizing the new interface model, the probe and enable stages can be
merged, which saves a pair of perf_event alloc and free operations.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Wang Qing <wangqing@vivo.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Santosh Sivaraj <santosh@fossix.org>
To: linux-kernel@vger.kernel.org
---
 include/linux/nmi.h   |  8 ++------
 kernel/watchdog.c     | 16 ++++++++++++++--
 kernel/watchdog_hld.c | 21 +++------------------
 3 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 70665fa6e0a9..e41cdf3edba7 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -101,20 +101,16 @@ extern void arch_touch_nmi_watchdog(void);
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
 extern void hardlockup_detector_perf_disable(void);
-extern void hardlockup_detector_perf_enable(void);
+extern int hardlockup_detector_perf_enable(void);
 extern void hardlockup_detector_perf_cleanup(void);
-extern int hardlockup_detector_perf_init(void);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
 static inline void hardlockup_detector_perf_disable(void) { }
-static inline void hardlockup_detector_perf_enable(void) { }
+static inline int hardlockup_detector_perf_enable(void) { return -ENODEV; }
 static inline void hardlockup_detector_perf_cleanup(void) { }
 # if !defined(CONFIG_HAVE_NMI_WATCHDOG)
-static inline int hardlockup_detector_perf_init(void) { return -ENODEV; }
 static inline void arch_touch_nmi_watchdog(void) {}
-# else
-static inline int hardlockup_detector_perf_init(void) { return 0; }
 # endif
 #endif
 
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 4ab71943d65f..3f5efbd5961c 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -99,7 +99,17 @@ static watchdog_nmi_status_reporter status_reporter;
  */
 void __weak watchdog_nmi_enable(unsigned int cpu)
 {
-	hardlockup_detector_perf_enable();
+	struct watchdog_nmi_status data;
+	int ret;
+
+	ret = hardlockup_detector_perf_enable();
+	/* No concurrency risk because the boot CPU executes this before smp_init() */
+	if (watchdog_enabled & NMI_WATCHDOG_UNDETERMINED
+			&& status_reporter) {
+		data.cpu = cpu;
+		data.status = ret;
+		(*status_reporter)(&data);
+	}
 }
 
 void __weak watchdog_nmi_disable(unsigned int cpu)
@@ -130,7 +140,9 @@ static void watchdog_nmi_report_capability(struct watchdog_nmi_status *data)
  */
 int __weak __init watchdog_nmi_probe(watchdog_nmi_status_reporter notifier)
 {
-	return hardlockup_detector_perf_init();
+	status_reporter = notifier;
+
+	return -EBUSY;
 }
 
 /**
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 6876e796dbf5..2894778fbc6d 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -190,16 +190,17 @@ static int hardlockup_detector_event_create(void)
 /**
  * hardlockup_detector_perf_enable - Enable the local event
  */
-void hardlockup_detector_perf_enable(void)
+int hardlockup_detector_perf_enable(void)
 {
 	if (hardlockup_detector_event_create())
-		return;
+		return -ENODEV;
 
 	/* use original value for check */
 	if (!atomic_fetch_inc(&watchdog_cpus))
 		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
 
 	perf_event_enable(this_cpu_read(watchdog_ev));
+	return 0;
 }
 
 /**
@@ -281,19 +282,3 @@ void __init hardlockup_detector_perf_restart(void)
 			perf_event_enable(event);
 	}
 }
-
-/**
- * hardlockup_detector_perf_init - Probe whether NMI event is available at all
- */
-int __init hardlockup_detector_perf_init(void)
-{
-	int ret = hardlockup_detector_event_create();
-
-	if (ret) {
-		pr_info("Perf NMI watchdog permanently disabled\n");
-	} else {
-		perf_event_release_kernel(this_cpu_read(watchdog_ev));
-		this_cpu_write(watchdog_ev, NULL);
-	}
-	return ret;
-}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/5] arm64/watchdog_hld: enable hard lockup on arm64 platform
  2021-09-15  3:50 [PATCH 0/5] watchdog_hld cleanup and async model for arm64 Pingfan Liu
                   ` (3 preceding siblings ...)
  2021-09-15  3:51 ` [PATCH 4/5] kernel/watchdog_hld: simplify the detecting of hld watchdog Pingfan Liu
@ 2021-09-15  3:51 ` Pingfan Liu
  2021-09-17 15:11   ` Pingfan Liu
  4 siblings, 1 reply; 21+ messages in thread
From: Pingfan Liu @ 2021-09-15  3:51 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Pingfan Liu, Catalin Marinas, Will Deacon, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Sami Tolvanen,
	linux-kernel

With watchdog_hld armed with the async model, arm64 can probe and enable
the perf NMI after smp_init(). At the boot stage, all CPUs arm the hard
lockup detector through the async model.

In this patch, the function hw_nmi_get_sample_period() is borrowed from
[1]; credit goes to Sumit.

[1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Julien Thierry <jthierry@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
To: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 arch/arm64/Kconfig                  |  3 ++
 arch/arm64/include/asm/perf_event.h |  5 ++
 arch/arm64/kernel/Makefile          |  1 +
 arch/arm64/kernel/perf_event.c      | 14 ++++-
 arch/arm64/kernel/watchdog_hld.c    | 83 +++++++++++++++++++++++++++++
 drivers/perf/arm_pmu.c              |  5 ++
 6 files changed, 109 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/kernel/watchdog_hld.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 62c3c1d2190f..0f49e58a9dd8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -193,6 +193,9 @@ config ARM64
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
+	select HAVE_PERF_EVENTS_NMI
+	select HAVE_HARDLOCKUP_DETECTOR_PERF \
+		if PERF_EVENTS && HW_PERF_EVENTS && HAVE_PERF_EVENTS_NMI
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_FUNCTION_ARG_ACCESS_API
 	select HAVE_FUTEX_CMPXCHG if FUTEX
diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
index 4ef6f19331f9..712a75f432f0 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -6,6 +6,7 @@
 #ifndef __ASM_PERF_EVENT_H
 #define __ASM_PERF_EVENT_H
 
+#include <linux/wait.h>
 #include <asm/stack_pointer.h>
 #include <asm/ptrace.h>
 
@@ -259,4 +260,8 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
 	(regs)->pstate = PSR_MODE_EL1h;	\
 }
 
+extern bool arm_pmu_initialized;
+extern wait_queue_head_t arm_pmu_wait;
+extern bool check_pmu_nmi_ability(void);
+
 #endif
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 3f1490bfb938..789c2fe5bb90 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -46,6 +46,7 @@ obj-$(CONFIG_MODULES)			+= module.o
 obj-$(CONFIG_ARM64_MODULE_PLTS)		+= module-plts.o
 obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o perf_callchain.o
 obj-$(CONFIG_HW_PERF_EVENTS)		+= perf_event.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF)	+= watchdog_hld.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
 obj-$(CONFIG_CPU_PM)			+= sleep.o suspend.o
 obj-$(CONFIG_CPU_IDLE)			+= cpuidle.o
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index d07788dad388..c4144cea0d55 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -24,6 +24,9 @@
 #include <linux/sched_clock.h>
 #include <linux/smp.h>
 
+bool arm_pmu_initialized;
+DECLARE_WAIT_QUEUE_HEAD(arm_pmu_wait);
+
 /* ARMv8 Cortex-A53 specific event types. */
 #define ARMV8_A53_PERFCTR_PREF_LINEFILL				0xC2
 
@@ -1284,10 +1287,17 @@ static struct platform_driver armv8_pmu_driver = {
 
 static int __init armv8_pmu_driver_init(void)
 {
+	int ret;
+
 	if (acpi_disabled)
-		return platform_driver_register(&armv8_pmu_driver);
+		ret = platform_driver_register(&armv8_pmu_driver);
 	else
-		return arm_pmu_acpi_probe(armv8_pmuv3_init);
+		ret = arm_pmu_acpi_probe(armv8_pmuv3_init);
+
+	arm_pmu_initialized = true;
+	wake_up_all(&arm_pmu_wait);
+
+	return ret;
 }
 device_initcall(armv8_pmu_driver_init)
 
diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
new file mode 100644
index 000000000000..18bfa62f1058
--- /dev/null
+++ b/arch/arm64/kernel/watchdog_hld.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/kthread.h>
+#include <linux/sched.h>
+#include <linux/nmi.h>
+#include <linux/cpufreq.h>
+#include <asm/perf_event.h>
+
+/*
+ * Safe maximum CPU frequency in case a particular platform doesn't implement
+ * a cpufreq driver. The architecture doesn't put any restriction on the
+ * maximum frequency, but 5 GHz seems to be a safe maximum given the Arm CPUs
+ * available on the market, which are clocked much lower than that. On the
+ * other hand, we can't make it much higher, as it would lead to a large
+ * hard-lockup detection timeout on parts which run slower (e.g. 1 GHz on
+ * Developerbox) and don't have a cpufreq driver.
+ */
+#define SAFE_MAX_CPU_FREQ	5000000000UL // 5 GHz
+u64 hw_nmi_get_sample_period(int watchdog_thresh)
+{
+	unsigned int cpu = smp_processor_id();
+	unsigned long max_cpu_freq;
+
+	max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL;
+	if (!max_cpu_freq)
+		max_cpu_freq = SAFE_MAX_CPU_FREQ;
+
+	return (u64)max_cpu_freq * watchdog_thresh;
+}
+
+static watchdog_nmi_status_reporter status_reporter;
+
+static int hld_enabled_thread_fun(void *unused)
+{
+	struct watchdog_nmi_status status;
+	watchdog_nmi_status_reporter local_reporter;
+	int ret;
+
+	wait_event(arm_pmu_wait, arm_pmu_initialized);
+	status.cpu = raw_smp_processor_id();
+
+	if (!check_pmu_nmi_ability()) {
+		status.status = -ENODEV;
+		goto report;
+	}
+
+	ret = hardlockup_detector_perf_enable();
+	status.status = ret;
+
+report:
+	local_reporter = (watchdog_nmi_status_reporter)atomic64_fetch_and(0,
+				(atomic64_t *)&status_reporter);
+	if (local_reporter)
+		(*local_reporter)(&status);
+
+	return 0;
+}
+
+/* As for watchdog_nmi_disable(), use the default implementation */
+void watchdog_nmi_enable(unsigned int cpu)
+{
+	struct task_struct *t;
+
+	/* PMU is not ready */
+	if (!arm_pmu_initialized) {
+		t = kthread_create_on_cpu(hld_enabled_thread_fun, NULL, cpu,
+			       "arm64_hld.%u");
+		if (IS_ERR(t))
+			return;
+		wake_up_process(t);
+		return;
+	}
+
+	/* For a CPU being hotplugged in */
+	hardlockup_detector_perf_enable();
+}
+
+int __init watchdog_nmi_probe(watchdog_nmi_status_reporter notifier)
+{
+	/* On arm64, arm pmu is not ready at this stage */
+	status_reporter = notifier;
+	return -EBUSY;
+}
+
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 3cbc3baf087f..e08961b37922 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -105,6 +105,11 @@ static DEFINE_PER_CPU(const struct pmu_irq_ops *, cpu_irq_ops);
 
 static bool has_nmi;
 
+bool check_pmu_nmi_ability(void)
+{
+	return has_nmi;
+}
+
 static inline u64 arm_pmu_event_max_period(struct perf_event *event)
 {
 	if (event->hw.flags & ARMPMU_EVT_64BIT)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] kernel/watchdog_hld: clarify the condition in hardlockup_detector_event_create()
  2021-09-15  3:51 ` [PATCH 2/5] kernel/watchdog_hld: clarify the condition in hardlockup_detector_event_create() Pingfan Liu
@ 2021-09-15  4:06   ` Andrew Morton
  2021-09-16  3:47     ` Pingfan Liu
  2021-09-15 13:45   ` Peter Zijlstra
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2021-09-15  4:06 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, Petr Mladek, Wang Qing, Peter Zijlstra (Intel),
	Santosh Sivaraj, Sumit Garg, Will Deacon, Mark Rutland

On Wed, 15 Sep 2021 11:51:00 +0800 Pingfan Liu <kernelfans@gmail.com> wrote:

> hardlockup_detector_event_create() indirectly calls
> kmem_cache_alloc_node(), which is blockable.
> 
> So here, the really planned context is is_percpu_thread().
> 
> ...
>
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -165,10 +165,13 @@ static void watchdog_overflow_callback(struct perf_event *event,
>  
>  static int hardlockup_detector_event_create(void)
>  {
> -	unsigned int cpu = smp_processor_id();
> +	unsigned int cpu;
>  	struct perf_event_attr *wd_attr;
>  	struct perf_event *evt;
>  
> +	/* This function plans to execute in cpu bound kthread */
> +	BUG_ON(!is_percpu_thread());

Can we avoid adding the BUG()?  Find a way to emit a WARNing and then
permit the kernel to continue?

> +	cpu = raw_smp_processor_id();
>  	wd_attr = &wd_hw_attr;
>  	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] kernel/watchdog_hld: clarify the condition in hardlockup_detector_event_create()
  2021-09-15  3:51 ` [PATCH 2/5] kernel/watchdog_hld: clarify the condition in hardlockup_detector_event_create() Pingfan Liu
  2021-09-15  4:06   ` Andrew Morton
@ 2021-09-15 13:45   ` Peter Zijlstra
  2021-09-16  3:57     ` Pingfan Liu
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2021-09-15 13:45 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, Petr Mladek, Andrew Morton, Wang Qing,
	Santosh Sivaraj, Sumit Garg, Will Deacon, Mark Rutland

On Wed, Sep 15, 2021 at 11:51:00AM +0800, Pingfan Liu wrote:
> hardlockup_detector_event_create() indirectly calls
> kmem_cache_alloc_node(), which is blockable.
> 
> So here, the really planned context is is_percpu_thread().
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Wang Qing <wangqing@vivo.com>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Santosh Sivaraj <santosh@fossix.org>
> Cc: Sumit Garg <sumit.garg@linaro.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> To: linux-kernel@vger.kernel.org
> ---
>  kernel/watchdog_hld.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 247bf0b1582c..6876e796dbf5 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -165,10 +165,13 @@ static void watchdog_overflow_callback(struct perf_event *event,
>  
>  static int hardlockup_detector_event_create(void)
>  {
> -	unsigned int cpu = smp_processor_id();
> +	unsigned int cpu;
>  	struct perf_event_attr *wd_attr;
>  	struct perf_event *evt;
>  
> +	/* This function plans to execute in cpu bound kthread */
> +	BUG_ON(!is_percpu_thread());
> +	cpu = raw_smp_processor_id();
>  	wd_attr = &wd_hw_attr;
>  	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);

This patch makes no sense.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model
  2021-09-15  3:51 ` [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model Pingfan Liu
@ 2021-09-15 14:02   ` Peter Zijlstra
  2021-09-16  3:07     ` Pingfan Liu
  2021-09-16  8:29   ` Petr Mladek
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2021-09-15 14:02 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, Sumit Garg, Catalin Marinas, Will Deacon,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Sami Tolvanen,
	Petr Mladek, Andrew Morton, Wang Qing, Santosh Sivaraj

On Wed, Sep 15, 2021 at 11:51:01AM +0800, Pingfan Liu wrote:
> When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> yet. E.g. on arm64, PMU is not ready until
> device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> with the driver model and cpuhp. Hence it is hard to push this
> initialization before smp_init().
> 
> But it is easy to take an opposite approach by enabling watchdog_hld to
> get the capability of PMU async.
> 
> The async model is achieved by introducing an extra parameter notifier
> of watchdog_nmi_probe().
> 
> Note after this patch, the async model, which is utilized by the next
> patch, does not take effect yet.

I can't make any sense of what you're trying to do..

> +static void watchdog_nmi_report_capability(struct watchdog_nmi_status *data)
> +{
> +	/* Set status to 1 temporarily to block any further access */
> +	if (atomic_cmpxchg((atomic_t *)&nmi_watchdog_status, -EBUSY, 1)
> +			== -EBUSY) {

But this..

> +		if (!data->status) {
> +			nmi_watchdog_status = 0;
> +			lockup_detector_update_enable();
> +		} else {
> +			nmi_watchdog_status = -ENODEV;
> +			/* turn off watchdog_enabled forever */
> +			lockup_detector_update_enable();
> +			pr_info("Perf NMI watchdog permanently disabled\n");
> +		}
> +	}
> +}

> @@ -467,7 +494,8 @@ static void watchdog_enable(unsigned int cpu)
>  	/* Initialize timestamp */
>  	update_touch_ts();
>  	/* Enable the perf event */
> -	if (watchdog_enabled & NMI_WATCHDOG_ENABLED)
> +	if (watchdog_enabled &
> +			(NMI_WATCHDOG_ENABLED | NMI_WATCHDOG_UNDETERMINED))

and this, are horrible indenting.


* Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model
  2021-09-15 14:02   ` Peter Zijlstra
@ 2021-09-16  3:07     ` Pingfan Liu
  0 siblings, 0 replies; 21+ messages in thread
From: Pingfan Liu @ 2021-09-16  3:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Pingfan Liu, linux-kernel, Sumit Garg, Catalin Marinas,
	Will Deacon, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Sami Tolvanen,
	Petr Mladek, Andrew Morton, Wang Qing, Santosh Sivaraj

On Wed, Sep 15, 2021 at 04:02:28PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 15, 2021 at 11:51:01AM +0800, Pingfan Liu wrote:
> > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > yet. E.g. on arm64, PMU is not ready until
> > device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> > with the driver model and cpuhp. Hence it is hard to push this
> > initialization before smp_init().
> > 
> > But it is easy to take an opposite approach by enabling watchdog_hld to
> > get the capability of PMU async.
> > 
> > The async model is achieved by introducing an extra parameter notifier
> > of watchdog_nmi_probe().
> > 
> > Note after this patch, the async model, which is utilized by the next
> > patch, does not take effect yet.
> 
> I can't make any sense of what you're trying to do..
> 
Sorry for the bad wording. What I mean is: this patch [3/5] provides a
framework for the async model. But since watchdog_nmi_probe() still returns 0
or -ENODEV after this patch, the code's behavior is the same as the original.

Does it make sense to you?
> > +static void watchdog_nmi_report_capability(struct watchdog_nmi_status *data)
> > +{
> > +	/* Set status to 1 temporarily to block any further access */
> > +	if (atomic_cmpxchg((atomic_t *)&nmi_watchdog_status, -EBUSY, 1)
> > +			== -EBUSY) {
> 
> But this..
> 
Oh, checking other code, for a wrapped condition, spaces seem to be the better choice.
> > +		if (!data->status) {
> > +			nmi_watchdog_status = 0;
> > +			lockup_detector_update_enable();
> > +		} else {
> > +			nmi_watchdog_status = -ENODEV;
> > +			/* turn off watchdog_enabled forever */
> > +			lockup_detector_update_enable();
> > +			pr_info("Perf NMI watchdog permanently disabled\n");
> > +		}
> > +	}
> > +}
> 
> > @@ -467,7 +494,8 @@ static void watchdog_enable(unsigned int cpu)
> >  	/* Initialize timestamp */
> >  	update_touch_ts();
> >  	/* Enable the perf event */
> > -	if (watchdog_enabled & NMI_WATCHDOG_ENABLED)
> > +	if (watchdog_enabled &
> > +			(NMI_WATCHDOG_ENABLED | NMI_WATCHDOG_UNDETERMINED))
> 
> and this, are horrible indenting.

Ditto.

Thanks for your comment and review.

Regards,

	Pingfan



* Re: [PATCH 2/5] kernel/watchdog_hld: clarify the condition in hardlockup_detector_event_create()
  2021-09-15  4:06   ` Andrew Morton
@ 2021-09-16  3:47     ` Pingfan Liu
  0 siblings, 0 replies; 21+ messages in thread
From: Pingfan Liu @ 2021-09-16  3:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Pingfan Liu, linux-kernel, Petr Mladek, Wang Qing,
	Peter Zijlstra (Intel),
	Santosh Sivaraj, Sumit Garg, Will Deacon, Mark Rutland

On Tue, Sep 14, 2021 at 09:06:27PM -0700, Andrew Morton wrote:
> On Wed, 15 Sep 2021 11:51:00 +0800 Pingfan Liu <kernelfans@gmail.com> wrote:
> 
> > hardlockup_detector_event_create() indirectly calls
> > kmem_cache_alloc_node(), which is blockable.
> > 
> > So here, the really planned context is is_percpu_thread().
> > 
> > ...
> >
> > --- a/kernel/watchdog_hld.c
> > +++ b/kernel/watchdog_hld.c
> > @@ -165,10 +165,13 @@ static void watchdog_overflow_callback(struct perf_event *event,
> >  
> >  static int hardlockup_detector_event_create(void)
> >  {
> > -	unsigned int cpu = smp_processor_id();
> > +	unsigned int cpu;
> >  	struct perf_event_attr *wd_attr;
> >  	struct perf_event *evt;
> >  
> > +	/* This function plans to execute in cpu bound kthread */
> > +	BUG_ON(!is_percpu_thread());
> 
> Can we avoid adding the BUG()?  Find a way to emit a WARNing and then
> permit the kernel to continue?
> 
Yes, WARN_ON() can work in this case.

Thanks,

	Pingfan



* Re: [PATCH 2/5] kernel/watchdog_hld: clarify the condition in hardlockup_detector_event_create()
  2021-09-15 13:45   ` Peter Zijlstra
@ 2021-09-16  3:57     ` Pingfan Liu
  2021-09-16  8:02       ` Petr Mladek
  0 siblings, 1 reply; 21+ messages in thread
From: Pingfan Liu @ 2021-09-16  3:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Pingfan Liu, linux-kernel, Petr Mladek, Andrew Morton, Wang Qing,
	Santosh Sivaraj, Sumit Garg, Will Deacon, Mark Rutland

On Wed, Sep 15, 2021 at 03:45:06PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 15, 2021 at 11:51:00AM +0800, Pingfan Liu wrote:
> > hardlockup_detector_event_create() indirectly calls
> > kmem_cache_alloc_node(), which is blockable.
> > 
> > So here, the really planned context is is_percpu_thread().
> > 
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Petr Mladek <pmladek@suse.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Wang Qing <wangqing@vivo.com>
> > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> > Cc: Santosh Sivaraj <santosh@fossix.org>
> > Cc: Sumit Garg <sumit.garg@linaro.org>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > To: linux-kernel@vger.kernel.org
> > ---
> >  kernel/watchdog_hld.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> > index 247bf0b1582c..6876e796dbf5 100644
> > --- a/kernel/watchdog_hld.c
> > +++ b/kernel/watchdog_hld.c
> > @@ -165,10 +165,13 @@ static void watchdog_overflow_callback(struct perf_event *event,
> >  
> >  static int hardlockup_detector_event_create(void)
> >  {
> > -	unsigned int cpu = smp_processor_id();
> > +	unsigned int cpu;
> >  	struct perf_event_attr *wd_attr;
> >  	struct perf_event *evt;
> >  
> > +	/* This function plans to execute in cpu bound kthread */
> > +	BUG_ON(!is_percpu_thread());
> > +	cpu = raw_smp_processor_id();
> >  	wd_attr = &wd_hw_attr;
> >  	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
> 
> This patch makes no sense.

This patch aims to rule out any attempt, such as using get_cpu()/put_cpu(),
to silence check_preemption_disabled().

But anybody familiar with the integration of watchdog_hld and
cpuhp should know the right way without this BUG_ON() or warning.

Do you still think it is pointless?


Thanks,

	Pingfan



* Re: [PATCH 2/5] kernel/watchdog_hld: clarify the condition in hardlockup_detector_event_create()
  2021-09-16  3:57     ` Pingfan Liu
@ 2021-09-16  8:02       ` Petr Mladek
  2021-09-17 15:08         ` Pingfan Liu
  0 siblings, 1 reply; 21+ messages in thread
From: Petr Mladek @ 2021-09-16  8:02 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Peter Zijlstra, Pingfan Liu, linux-kernel, Andrew Morton,
	Wang Qing, Santosh Sivaraj, Sumit Garg, Will Deacon,
	Mark Rutland

On Thu 2021-09-16 11:57:44, Pingfan Liu wrote:
> On Wed, Sep 15, 2021 at 03:45:06PM +0200, Peter Zijlstra wrote:
> > On Wed, Sep 15, 2021 at 11:51:00AM +0800, Pingfan Liu wrote:
> > > hardlockup_detector_event_create() indirectly calls
> > > kmem_cache_alloc_node(), which is blockable.
> > > 
> > > So here, the really planned context is is_percpu_thread().
> > > 
> > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > Cc: Petr Mladek <pmladek@suse.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: Wang Qing <wangqing@vivo.com>
> > > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> > > Cc: Santosh Sivaraj <santosh@fossix.org>
> > > Cc: Sumit Garg <sumit.garg@linaro.org>
> > > Cc: Will Deacon <will@kernel.org>
> > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > To: linux-kernel@vger.kernel.org
> > > ---
> > >  kernel/watchdog_hld.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> > > index 247bf0b1582c..6876e796dbf5 100644
> > > --- a/kernel/watchdog_hld.c
> > > +++ b/kernel/watchdog_hld.c
> > > @@ -165,10 +165,13 @@ static void watchdog_overflow_callback(struct perf_event *event,
> > >  
> > >  static int hardlockup_detector_event_create(void)
> > >  {
> > > -	unsigned int cpu = smp_processor_id();
> > > +	unsigned int cpu;
> > >  	struct perf_event_attr *wd_attr;
> > >  	struct perf_event *evt;
> > >  
> > > +	/* This function plans to execute in cpu bound kthread */
> > > +	BUG_ON(!is_percpu_thread());
> > > +	cpu = raw_smp_processor_id();
> > >  	wd_attr = &wd_hw_attr;
> > >  	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
> > 
> > This patch makes no sense.
> 
> This patch aims to disable any attempt such as using get_cpu()/put_cpu() to
> shut up the check_preemption_disabled().

I have to say that the description of the problem is really cryptic.
Please, provide more context, code paths, sample code, next time.

Well, I probably got it. The code might sleep. But it should run on the
same CPU even after waking up. You try to achieve this by running
the code in a process that is bound to a single CPU.

IMHO, this is not reliable. Anyone could change the affinity
of the process in the meantime.

I see two solutions. Either avoid the sleep or making sure
that the code access per-CPU variables on the same CPU
all the time.

For example, you might use

	*per_cpu_ptr(watchdog_ev, cpu) = evt;

instead of

	this_cpu_write(watchdog_ev, evt);

Best Regards,
Petr


* Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model
  2021-09-15  3:51 ` [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model Pingfan Liu
  2021-09-15 14:02   ` Peter Zijlstra
@ 2021-09-16  8:29   ` Petr Mladek
  2021-09-16  8:36     ` Petr Mladek
  2021-09-17 14:43     ` Pingfan Liu
  1 sibling, 2 replies; 21+ messages in thread
From: Petr Mladek @ 2021-09-16  8:29 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, Sumit Garg, Catalin Marinas, Will Deacon,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Sami Tolvanen,
	Andrew Morton, Wang Qing, Peter Zijlstra (Intel),
	Santosh Sivaraj

On Wed 2021-09-15 11:51:01, Pingfan Liu wrote:
> When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> yet. E.g. on arm64, PMU is not ready until
> device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> with the driver model and cpuhp. Hence it is hard to push this
> initialization before smp_init().
> 
> But it is easy to take an opposite approach by enabling watchdog_hld to
> get the capability of PMU async.

This is another cryptic description. I have probably got it after
looking at the 5th patch (was not Cc :-(

> The async model is achieved by introducing an extra parameter notifier
> of watchdog_nmi_probe().

I would say that the code is horrible and looks too complex.

What about simply calling watchdog_nmi_probe() and
lockup_detector_setup() once again when watchdog_nmi_probe()
failed in lockup_detector_init()?

Or do not call lockup_detector_init() at all in
kernel_init_freeable() when PMU is not ready yet.

Best Regards,
Petr


* Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model
  2021-09-16  8:29   ` Petr Mladek
@ 2021-09-16  8:36     ` Petr Mladek
  2021-09-17 15:41       ` Pingfan Liu
  2021-09-17 14:43     ` Pingfan Liu
  1 sibling, 1 reply; 21+ messages in thread
From: Petr Mladek @ 2021-09-16  8:36 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, Sumit Garg, Catalin Marinas, Will Deacon,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Sami Tolvanen,
	Andrew Morton, Wang Qing, Peter Zijlstra (Intel),
	Santosh Sivaraj

On Thu 2021-09-16 10:29:05, Petr Mladek wrote:
> On Wed 2021-09-15 11:51:01, Pingfan Liu wrote:
> > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > yet. E.g. on arm64, PMU is not ready until
> > device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> > with the driver model and cpuhp. Hence it is hard to push this
> > initialization before smp_init().
> > 
> > But it is easy to take an opposite approach by enabling watchdog_hld to
> > get the capability of PMU async.
> 
> This is another cryptic description. I have probably got it after
> looking at the 5th patch (was not Cc :-(
> 
> > The async model is achieved by introducing an extra parameter notifier
> > of watchdog_nmi_probe().
> 
> I would say that the code is horrible and looks too complex.
> 
> What about simply calling watchdog_nmi_probe() and
> lockup_detector_setup() once again when watchdog_nmi_probe()
> failed in lockup_detector_init()?
> 
> Or do not call lockup_detector_init() at all in
> kernel_init_freeable() when PMU is not ready yet.

BTW: It is overkill to create your own kthread just to run some
code once. And you implemented it the wrong way. The kthread
must wait in a loop until someone else stops it and reads
the exit code.

The easiest solution is to queue a work into system_wq for this.

I was not Cc'ed on the 5th patch, so I write it here.

Best Regards,
Petr


* Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model
  2021-09-16  8:29   ` Petr Mladek
  2021-09-16  8:36     ` Petr Mladek
@ 2021-09-17 14:43     ` Pingfan Liu
  1 sibling, 0 replies; 21+ messages in thread
From: Pingfan Liu @ 2021-09-17 14:43 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Pingfan Liu, linux-kernel, Sumit Garg, Catalin Marinas,
	Will Deacon, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Sami Tolvanen,
	Andrew Morton, Wang Qing, Peter Zijlstra (Intel),
	Santosh Sivaraj

On Thu, Sep 16, 2021 at 10:29:04AM +0200, Petr Mladek wrote:
> On Wed 2021-09-15 11:51:01, Pingfan Liu wrote:
> > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > yet. E.g. on arm64, PMU is not ready until
> > device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> > with the driver model and cpuhp. Hence it is hard to push this
> > initialization before smp_init().
> > 
> > But it is easy to take an opposite approach by enabling watchdog_hld to
> > get the capability of PMU async.
> 
> This is another cryptic description. I have probably got it after
> looking at the 5th patch (was not Cc :-(
> 
> > The async model is achieved by introducing an extra parameter notifier
> > of watchdog_nmi_probe().
> 
> I would say that the code is horrible and looks too complex.
> 
> What about simply calling watchdog_nmi_probe() and
> lockup_detector_setup() once again when watchdog_nmi_probe()
> failed in lockup_detector_init()?
> 
It may work. But there should still be a way to report the PMU NMI
capability to the watchdog layer accurately. And the API should be
extended somehow.

I am thinking it over; maybe I can model it in another way.
> Or do not call lockup_detector_init() at all in
> kernel_init_freeable() when PMU is not ready yet.
> 

This may not be a good choice, since lockup_detector_init() had better
be ready as early as possible, especially before drivers.


Thanks,

	Pingfan



* Re: [PATCH 2/5] kernel/watchdog_hld: clarify the condition in hardlockup_detector_event_create()
  2021-09-16  8:02       ` Petr Mladek
@ 2021-09-17 15:08         ` Pingfan Liu
  0 siblings, 0 replies; 21+ messages in thread
From: Pingfan Liu @ 2021-09-17 15:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Peter Zijlstra, Pingfan Liu, linux-kernel, Andrew Morton,
	Wang Qing, Santosh Sivaraj, Sumit Garg, Will Deacon,
	Mark Rutland

On Thu, Sep 16, 2021 at 10:02:27AM +0200, Petr Mladek wrote:
> On Thu 2021-09-16 11:57:44, Pingfan Liu wrote:
> > On Wed, Sep 15, 2021 at 03:45:06PM +0200, Peter Zijlstra wrote:
> > > On Wed, Sep 15, 2021 at 11:51:00AM +0800, Pingfan Liu wrote:
> > > > hardlockup_detector_event_create() indirectly calls
> > > > kmem_cache_alloc_node(), which is blockable.
> > > > 
> > > > So here, the really planned context is is_percpu_thread().
> > > > 
> > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > > Cc: Petr Mladek <pmladek@suse.com>
> > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > Cc: Wang Qing <wangqing@vivo.com>
> > > > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> > > > Cc: Santosh Sivaraj <santosh@fossix.org>
> > > > Cc: Sumit Garg <sumit.garg@linaro.org>
> > > > Cc: Will Deacon <will@kernel.org>
> > > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > > To: linux-kernel@vger.kernel.org
> > > > ---
> > > >  kernel/watchdog_hld.c | 5 ++++-
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> > > > index 247bf0b1582c..6876e796dbf5 100644
> > > > --- a/kernel/watchdog_hld.c
> > > > +++ b/kernel/watchdog_hld.c
> > > > @@ -165,10 +165,13 @@ static void watchdog_overflow_callback(struct perf_event *event,
> > > >  
> > > >  static int hardlockup_detector_event_create(void)
> > > >  {
> > > > -	unsigned int cpu = smp_processor_id();
> > > > +	unsigned int cpu;
> > > >  	struct perf_event_attr *wd_attr;
> > > >  	struct perf_event *evt;
> > > >  
> > > > +	/* This function plans to execute in cpu bound kthread */
> > > > +	BUG_ON(!is_percpu_thread());
> > > > +	cpu = raw_smp_processor_id();
> > > >  	wd_attr = &wd_hw_attr;
> > > >  	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
> > > 
> > > This patch makes no sense.
> > 
> > This patch aims to disable any attempt such as using get_cpu()/put_cpu() to
> > shut up the check_preemption_disabled().
> 
> I have to say that the description of the problem is really cryptic.
> Please, provide more context, code paths, sample code, next time.
> 
Sorry, I will be more straightforward. I had thought the commit log
had mentioned it.
> Well, I probably got it. The code might sleep. But it should run on the

And you do get it.
> same CPU even after waking up. You try to achieve this by running
> the code in a process that is bound to a single CPU.
> 
> IMHO, this is not reliable. Anyone could change the affinity
> of the process in the meantime.
> 
No, the affinity cannot be changed. As the code says: PF_NO_SETAFFINITY.
static inline bool is_percpu_thread(void)
{
#ifdef CONFIG_SMP
	return (current->flags & PF_NO_SETAFFINITY) &&
		(current->nr_cpus_allowed  == 1);
#else
	return true;
#endif
}

This is critical for cpuhp_* (kernel/cpu.c). And
hardlockup_detector_event_create() should be planned to run on such a
kthread.

Thanks,

	Pingfan

> I see two solutions. Either avoid the sleep or making sure
> that the code access per-CPU variables on the same CPU
> all the time.
> 
> For example, you might use
> 
> 	*per_cpu_ptr(watchdog_ev, cpu) = evt;
> 
> instead of
> 
> 	this_cpu_write(watchdog_ev, evt);
> 
> Best Regards,
> Petr



* Re: [PATCH 5/5] arm64/watchdog_hld: enable hard lockup on arm64 platform
  2021-09-15  3:51 ` [PATCH 5/5] arm64/watchdog_hld: enable hard lockup on arm64 platform Pingfan Liu
@ 2021-09-17 15:11   ` Pingfan Liu
  0 siblings, 0 replies; 21+ messages in thread
From: Pingfan Liu @ 2021-09-17 15:11 UTC (permalink / raw)
  To: Petr Mladek
  Cc: linux-arm-kernel, Catalin Marinas, Will Deacon, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Sami Tolvanen,
	linux-kernel

On Wed, Sep 15, 2021 at 11:51:03AM +0800, Pingfan Liu wrote:

+ Petr Mladek

> With watchdog_hld armed with the async model, arm64 can probe and enable
> perf NMI after smp_init(). At the boot stage, all CPUs arm the hard
> lockup detector in the async model.
> 
> In this patch, the function hw_nmi_get_sample_period() is borrowed from
> [1], credit goes to Sumit.
> 
> [1]: http://lore.kernel.org/linux-arm-kernel/1610712101-14929-1-git-send-email-sumit.garg@linaro.org
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Julien Thierry <jthierry@redhat.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Masahiro Yamada <masahiroy@kernel.org>
> Cc: Sami Tolvanen <samitolvanen@google.com>
> To: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/arm64/Kconfig                  |  3 ++
>  arch/arm64/include/asm/perf_event.h |  5 ++
>  arch/arm64/kernel/Makefile          |  1 +
>  arch/arm64/kernel/perf_event.c      | 14 ++++-
>  arch/arm64/kernel/watchdog_hld.c    | 83 +++++++++++++++++++++++++++++
>  drivers/perf/arm_pmu.c              |  5 ++
>  6 files changed, 109 insertions(+), 2 deletions(-)
>  create mode 100644 arch/arm64/kernel/watchdog_hld.c
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 62c3c1d2190f..0f49e58a9dd8 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -193,6 +193,9 @@ config ARM64
>  	select HAVE_PERF_EVENTS
>  	select HAVE_PERF_REGS
>  	select HAVE_PERF_USER_STACK_DUMP
> +	select HAVE_PERF_EVENTS_NMI
> +	select HAVE_HARDLOCKUP_DETECTOR_PERF \
> +		if PERF_EVENTS && HW_PERF_EVENTS && HAVE_PERF_EVENTS_NMI
>  	select HAVE_REGS_AND_STACK_ACCESS_API
>  	select HAVE_FUNCTION_ARG_ACCESS_API
>  	select HAVE_FUTEX_CMPXCHG if FUTEX
> diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
> index 4ef6f19331f9..712a75f432f0 100644
> --- a/arch/arm64/include/asm/perf_event.h
> +++ b/arch/arm64/include/asm/perf_event.h
> @@ -6,6 +6,7 @@
>  #ifndef __ASM_PERF_EVENT_H
>  #define __ASM_PERF_EVENT_H
>  
> +#include <linux/wait.h>
>  #include <asm/stack_pointer.h>
>  #include <asm/ptrace.h>
>  
> @@ -259,4 +260,8 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
>  	(regs)->pstate = PSR_MODE_EL1h;	\
>  }
>  
> +extern bool arm_pmu_initialized;
> +extern wait_queue_head_t arm_pmu_wait;
> +extern bool check_pmu_nmi_ability(void);
> +
>  #endif
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 3f1490bfb938..789c2fe5bb90 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -46,6 +46,7 @@ obj-$(CONFIG_MODULES)			+= module.o
>  obj-$(CONFIG_ARM64_MODULE_PLTS)		+= module-plts.o
>  obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o perf_callchain.o
>  obj-$(CONFIG_HW_PERF_EVENTS)		+= perf_event.o
> +obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF)	+= watchdog_hld.o
>  obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
>  obj-$(CONFIG_CPU_PM)			+= sleep.o suspend.o
>  obj-$(CONFIG_CPU_IDLE)			+= cpuidle.o
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index d07788dad388..c4144cea0d55 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -24,6 +24,9 @@
>  #include <linux/sched_clock.h>
>  #include <linux/smp.h>
>  
> +bool arm_pmu_initialized;
> +DECLARE_WAIT_QUEUE_HEAD(arm_pmu_wait);
> +
>  /* ARMv8 Cortex-A53 specific event types. */
>  #define ARMV8_A53_PERFCTR_PREF_LINEFILL				0xC2
>  
> @@ -1284,10 +1287,17 @@ static struct platform_driver armv8_pmu_driver = {
>  
>  static int __init armv8_pmu_driver_init(void)
>  {
> +	int ret;
> +
>  	if (acpi_disabled)
> -		return platform_driver_register(&armv8_pmu_driver);
> +		ret = platform_driver_register(&armv8_pmu_driver);
>  	else
> -		return arm_pmu_acpi_probe(armv8_pmuv3_init);
> +		ret = arm_pmu_acpi_probe(armv8_pmuv3_init);
> +
> +	arm_pmu_initialized = true;
> +	wake_up_all(&arm_pmu_wait);
> +
> +	return ret;
>  }
>  device_initcall(armv8_pmu_driver_init)
>  
> diff --git a/arch/arm64/kernel/watchdog_hld.c b/arch/arm64/kernel/watchdog_hld.c
> new file mode 100644
> index 000000000000..18bfa62f1058
> --- /dev/null
> +++ b/arch/arm64/kernel/watchdog_hld.c
> @@ -0,0 +1,83 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/kthread.h>
> +#include <linux/sched.h>
> +#include <linux/nmi.h>
> +#include <linux/cpufreq.h>
> +#include <asm/perf_event.h>
> +
> +/*
> + * Safe maximum CPU frequency in case a particular platform doesn't implement
> + * cpufreq driver. Although, architecture doesn't put any restrictions on
> + * maximum frequency but 5 GHz seems to be safe maximum given the available
> + * Arm CPUs in the market which are clocked much less than 5 GHz. On the other
> + * hand, we can't make it much higher as it would lead to a large hard-lockup
> + * detection timeout on parts which are running slower (eg. 1GHz on
> + * Developerbox) and doesn't possess a cpufreq driver.
> + */
> +#define SAFE_MAX_CPU_FREQ	5000000000UL // 5 GHz
> +u64 hw_nmi_get_sample_period(int watchdog_thresh)
> +{
> +	unsigned int cpu = smp_processor_id();
> +	unsigned long max_cpu_freq;
> +
> +	max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL;
> +	if (!max_cpu_freq)
> +		max_cpu_freq = SAFE_MAX_CPU_FREQ;
> +
> +	return (u64)max_cpu_freq * watchdog_thresh;
> +}
> +
> +static watchdog_nmi_status_reporter status_reporter;
> +
> +static int hld_enabled_thread_fun(void *unused)
> +{
> +	struct watchdog_nmi_status status;
> +	watchdog_nmi_status_reporter local_reporter;
> +	int ret;
> +
> +	wait_event(arm_pmu_wait, arm_pmu_initialized);
> +	status.cpu = raw_smp_processor_id();
> +
> +	if (!check_pmu_nmi_ability()) {
> +		status.status = -ENODEV;
> +		goto report;
> +	}
> +
> +	ret = hardlockup_detector_perf_enable();
> +	status.status = ret;
> +
> +report:
> +	local_reporter = (watchdog_nmi_status_reporter)atomic64_fetch_and(0,
> +				(atomic64_t *)&status_reporter);
> +	if (local_reporter)
> +		(*local_reporter)(&status);
> +
> +	return 0;
> +}
> +
> +/* As for watchdog_nmi_disable(), using the default implement */
> +void watchdog_nmi_enable(unsigned int cpu)
> +{
> +	struct task_struct *t;
> +
> +	/* PMU is not ready */
> +	if (!arm_pmu_initialized) {
> +		t = kthread_create_on_cpu(hld_enabled_thread_fun, NULL, cpu,
> +			       "arm64_hld.%u");
> +		if (IS_ERR(t))
> +			return;
> +		wake_up_process(t);
> +		return;
> +	}
> +
> +	/* For hotplug in cpu */
> +	hardlockup_detector_perf_enable();
> +}
> +
> +int __init watchdog_nmi_probe(watchdog_nmi_status_reporter notifier)
> +{
> +	/* On arm64, arm pmu is not ready at this stage */
> +	status_reporter = notifier;
> +	return -EBUSY;
> +}
> +
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 3cbc3baf087f..e08961b37922 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -105,6 +105,11 @@ static DEFINE_PER_CPU(const struct pmu_irq_ops *, cpu_irq_ops);
>  
>  static bool has_nmi;
>  
> +bool check_pmu_nmi_ability(void)
> +{
> +	return has_nmi;
> +}
> +
>  static inline u64 arm_pmu_event_max_period(struct perf_event *event)
>  {
>  	if (event->hw.flags & ARMPMU_EVT_64BIT)
> -- 
> 2.31.1
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



* Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model
  2021-09-16  8:36     ` Petr Mladek
@ 2021-09-17 15:41       ` Pingfan Liu
  2021-09-20  8:20         ` Petr Mladek
  0 siblings, 1 reply; 21+ messages in thread
From: Pingfan Liu @ 2021-09-17 15:41 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Pingfan Liu, linux-kernel, Sumit Garg, Catalin Marinas,
	Will Deacon, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Sami Tolvanen,
	Andrew Morton, Wang Qing, Peter Zijlstra (Intel),
	Santosh Sivaraj

On Thu, Sep 16, 2021 at 10:36:10AM +0200, Petr Mladek wrote:
> On Thu 2021-09-16 10:29:05, Petr Mladek wrote:
> > On Wed 2021-09-15 11:51:01, Pingfan Liu wrote:
> > > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > > yet. E.g. on arm64, PMU is not ready until
> > > device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> > > with the driver model and cpuhp. Hence it is hard to push this
> > > initialization before smp_init().
> > > 
> > > But it is easy to take an opposite approach by enabling watchdog_hld to
> > > get the capability of PMU async.
> > 
> > This is another cryptic description. I have probably got it after
> > looking at the 5th patch (was not Cc :-(
> > 
> > > The async model is achieved by introducing an extra parameter notifier
> > > of watchdog_nmi_probe().
> > 
> > I would say that the code is horrible and looks too complex.
> > 
> > What about simply calling watchdog_nmi_probe() and
> > lockup_detector_setup() once again when watchdog_nmi_probe()
> > failed in lockup_detector_init()?
> > 
> > Or do not call lockup_detector_init() at all in
> > kernel_init_freeable() when PMU is not ready yet.
> 
> BTW: It is an overkill to create your own kthread just to run some
> code just once. And you implemeted it a wrong way. The kthread

I had thought about queue_work_on() in watchdog_nmi_enable(). But this
work would block the worker kthread for that cpu, so in the end another
worker kthread would have to be created for other work.

But now, I think queue_work_on() may be the neater option.
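
For instance, something roughly like this (an untested sketch;
retry_lockup_detector_init() and the exact hook point are made up here,
while DECLARE_WORK(), queue_work() and system_wq are existing kernel APIs):

```c
/*
 * Sketch: defer the hardlockup detector probe to a work item instead
 * of a dedicated kthread. The work runs once the PMU driver is ready.
 */
static void lockup_detector_delay_init(struct work_struct *work)
{
	if (watchdog_nmi_probe()) {
		pr_info("Perf NMI watchdog probe failed, disabling it\n");
		return;
	}
	lockup_detector_setup();
}

static DECLARE_WORK(detector_work, lockup_detector_delay_init);

/* Hypothetical hook, called e.g. from armv8_pmu_driver_init(). */
void retry_lockup_detector_init(void)
{
	queue_work(system_wq, &detector_work);
}
```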

> must wait in a loop until someone else stops it and reads
> the exit code.
> 
Is this behavior mandatory? This kthread can decide the exit
condition by itself.

And from a quick look through static int kthread(void *_create), I am not
aware of any problem with it.

> The easiest solution is to queue a work into system_wq for this.
> 
> I was not Cc'ed on the 5th patch, so I am writing it here.
> 
Sorry for the inconvenience; I have Cc'ed you now in case you have
further comments, and I will Cc you on the next version.

I appreciate all of your suggestions and comments.

Thanks,

	Pingfan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model
  2021-09-17 15:41       ` Pingfan Liu
@ 2021-09-20  8:20         ` Petr Mladek
  2021-09-22  4:26           ` Pingfan Liu
  0 siblings, 1 reply; 21+ messages in thread
From: Petr Mladek @ 2021-09-20  8:20 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Pingfan Liu, linux-kernel, Sumit Garg, Catalin Marinas,
	Will Deacon, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Sami Tolvanen,
	Andrew Morton, Wang Qing, Peter Zijlstra (Intel),
	Santosh Sivaraj

On Fri 2021-09-17 23:41:31, Pingfan Liu wrote:
> On Thu, Sep 16, 2021 at 10:36:10AM +0200, Petr Mladek wrote:
> > On Thu 2021-09-16 10:29:05, Petr Mladek wrote:
> > > On Wed 2021-09-15 11:51:01, Pingfan Liu wrote:
> > > > When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> > > > yet. E.g. on arm64, PMU is not ready until
> > > > device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> > > > with the driver model and cpuhp. Hence it is hard to push this
> > > > initialization before smp_init().
> > > > 
> > > > But it is easy to take an opposite approach by enabling watchdog_hld to
> > > > get the capability of PMU async.
> > > 
> > > This is another cryptic description. I have probably got it after
> > > looking at the 5th patch (I was not Cc'ed :-(
> > > 
> > > > The async model is achieved by introducing an extra parameter notifier
> > > > of watchdog_nmi_probe().
> > > 
> > > I would say that the code is horrible and looks too complex.
> > > 
> > > What about simply calling watchdog_nmi_probe() and
> > > lockup_detector_setup() once again when watchdog_nmi_probe()
> > > failed in lockup_detector_init()?
> > > 
> > > Or do not call lockup_detector_init() at all in
> > > kernel_init_freeable() when PMU is not ready yet.
> > 
> > BTW: It is an overkill to create your own kthread just to run some
> > code just once. And you implemented it the wrong way. The kthread
> 
> I had thought about queue_work_on() in watchdog_nmi_enable(). But this
> work would block the worker kthread for that cpu, so in the end another
> worker kthread would have to be created for other work.

This is not a problem. workqueues use a pool of workers that are
already created and can be used when one worker gets blocked.

> But now, I think queue_work_on() may be the neater option.
> 
> > must wait in a loop until someone else stops it and reads
> > the exit code.
> > 
> Is this behavior mandatory? This kthread can decide the exit
> condition by itself.

I am pretty sure. Unfortunately, I can't find it in the documentation.

My view is the following. Each process has a task_struct. The
scheduler needs task_struct so that it can switch processes.
The task_struct must still exist when the process exits.
The scheduler puts the task into TASK_DEAD state.
Another process has to read the exit code and destroy the
task struct.

See do_exit() in kernel/exit.c. It ends with do_task_dead().
It is the point when the process goes into TASK_DEAD state.

For a good example, see lib/test_vmalloc.c. The kthread waits
until someone wants it to stop:

static int test_func(void *private)
{
[...]

	/*
	 * Wait for the kthread_stop() call.
	 */
	while (!kthread_should_stop())
		msleep(10);

	return 0;
}

The kthreads are started and stopped in:

static void do_concurrent_test(void)
{
[...]
	for (i = 0; i < nr_threads; i++) {
[...]
		t->task = kthread_run(test_func, t, "vmalloc_test/%d", i);
[...]
	/*
	 * Sleep quiet until all workers are done with 1 second
	 * interval. Since the test can take a lot of time we
	 * can run into a stack trace of the hung task. That is
	 * why we go with completion_timeout and HZ value.
	 */
	do {
		ret = wait_for_completion_timeout(&test_all_done_comp, HZ);
	} while (!ret);
[...]
	for (i = 0; i < nr_threads; i++) {
[...]
		if (!IS_ERR(t->task))
			kthread_stop(t->task);
[...]
}


You do not have to solve this if you use the system workqueue
(system_wq).

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model
  2021-09-20  8:20         ` Petr Mladek
@ 2021-09-22  4:26           ` Pingfan Liu
  0 siblings, 0 replies; 21+ messages in thread
From: Pingfan Liu @ 2021-09-22  4:26 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Pingfan Liu, linux-kernel, Sumit Garg, Catalin Marinas,
	Will Deacon, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Marc Zyngier,
	Julien Thierry, Kees Cook, Masahiro Yamada, Sami Tolvanen,
	Andrew Morton, Wang Qing, Peter Zijlstra (Intel),
	Santosh Sivaraj

On Mon, Sep 20, 2021 at 10:20:46AM +0200, Petr Mladek wrote:
> On Fri 2021-09-17 23:41:31, Pingfan Liu wrote:
[...]
> > 
> > I had thought about queue_work_on() in watchdog_nmi_enable(). But this
> > work would block the worker kthread for that cpu, so in the end another
> > worker kthread would have to be created for other work.
> 
> This is not a problem. workqueues use a pool of workers that are
> already created and can be used when one worker gets blocked.
> 
Yes, you are right. The creation is dynamic and immune to blocking.

> > But now, I think queue_work_on() may be the neater option.
> > 
> > > must wait in a loop until someone else stops it and reads
> > > the exit code.
> > > 
> > Is this behavior mandatory? This kthread can decide the exit
> > condition by itself.
> 
> I am pretty sure. Unfortunately, I can't find it in the documentation.
> 
> My view is the following. Each process has a task_struct. The
> scheduler needs task_struct so that it can switch processes.
> The task_struct must still exist when the process exits.
> The scheduler puts the task into TASK_DEAD state.
> Another process has to read the exit code and destroy the
> task struct.
> 
Thanks for bringing this up; it gave me an opportunity to think it through.

The core of the problem is put_task_struct(), and who releases the
last reference.
That should be finish_task_switch()->put_task_struct_rcu_user()->delayed_put_task_struct()->put_task_struct()
when prev_state == TASK_DEAD, and it does not depend on another task.
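
For reference, the TASK_DEAD branch can be sketched like this (a paraphrase
of finish_task_switch() in kernel/sched/core.c from memory; the exact code
differs between kernel versions):

```c
/* Paraphrased sketch of kernel/sched/core.c:finish_task_switch() */
if (unlikely(prev_state == TASK_DEAD)) {
	if (prev->sched_class->task_dead)
		prev->sched_class->task_dead(prev);

	/*
	 * Drop the last rcu_users reference; delayed_put_task_struct()
	 * later calls put_task_struct() after an RCU grace period, so
	 * no other task needs to reap the exiting kthread here.
	 */
	put_task_struct_rcu_user(prev);
}
```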

> See do_exit() in kernel/exit.c. It ends with do_task_dead().
> It is the point when the process goes into TASK_DEAD state.
> 
> For a good example, see lib/test_vmalloc.c. The kthread waits
> until someone wants it to stop:
> 
> static int test_func(void *private)
> {
> [...]
> 
> 	/*
> 	 * Wait for the kthread_stop() call.
> 	 */
> 	while (!kthread_should_stop())
> 		msleep(10);
> 
> 	return 0;
> }
> 
> The kthreads are started and stopped in:
> 
> static void do_concurrent_test(void)
> {
> [...]
> 	for (i = 0; i < nr_threads; i++) {
> [...]
> 		t->task = kthread_run(test_func, t, "vmalloc_test/%d", i);
> [...]
> 	/*
> 	 * Sleep quiet until all workers are done with 1 second
> 	 * interval. Since the test can take a lot of time we
> 	 * can run into a stack trace of the hung task. That is
> 	 * why we go with completion_timeout and HZ value.
> 	 */
> 	do {
> 		ret = wait_for_completion_timeout(&test_all_done_comp, HZ);
> 	} while (!ret);
> [...]
> 	for (i = 0; i < nr_threads; i++) {
> [...]
> 		if (!IS_ERR(t->task))
> 			kthread_stop(t->task);
> [...]
> }

These are good and elegant examples.
> 
> 
> You do not have to solve this if you use the system workqueue
> (system_wq).
> 
Yes, workqueue is a better choice.

Thanks for your great patience.


Regards,

	Pingfan



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-09-22  4:27 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-15  3:50 [PATCH 0/5] watchdog_hld cleanup and async model for arm64 Pingfan Liu
2021-09-15  3:50 ` [PATCH 1/5] kernel/watchdog: remove useless WATCHDOG_DEFAULT Pingfan Liu
2021-09-15  3:51 ` [PATCH 2/5] kernel/watchdog_hld: clarify the condition in hardlockup_detector_event_create() Pingfan Liu
2021-09-15  4:06   ` Andrew Morton
2021-09-16  3:47     ` Pingfan Liu
2021-09-15 13:45   ` Peter Zijlstra
2021-09-16  3:57     ` Pingfan Liu
2021-09-16  8:02       ` Petr Mladek
2021-09-17 15:08         ` Pingfan Liu
2021-09-15  3:51 ` [PATCH 3/5] kernel/watchdog: adapt the watchdog_hld interface for async model Pingfan Liu
2021-09-15 14:02   ` Peter Zijlstra
2021-09-16  3:07     ` Pingfan Liu
2021-09-16  8:29   ` Petr Mladek
2021-09-16  8:36     ` Petr Mladek
2021-09-17 15:41       ` Pingfan Liu
2021-09-20  8:20         ` Petr Mladek
2021-09-22  4:26           ` Pingfan Liu
2021-09-17 14:43     ` Pingfan Liu
2021-09-15  3:51 ` [PATCH 4/5] kernel/watchdog_hld: simplify the detecting of hld watchdog Pingfan Liu
2021-09-15  3:51 ` [PATCH 5/5] arm64/watchdog_hld: enable hard lockup on arm64 platform Pingfan Liu
2021-09-17 15:11   ` Pingfan Liu
