linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage
       [not found] <20170912193654.321505854@linutronix.de>
@ 2017-09-12 19:37 ` Thomas Gleixner
  2017-10-03  0:29   ` Michael Ellerman
  2017-09-12 19:37 ` [patch V2 29/29] lockup_detector: Cleanup hotplug locking mess Thomas Gleixner
  1 sibling, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2017-09-12 19:37 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Peter Zijlstra, Borislav Petkov, Andrew Morton,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell, Benjamin Herrenschmidt, Michael Ellerman,
	linuxppc-dev

Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure()
need to be done in two steps.

     1) Stop all NMIs
     2) Read the new parameters and start NMIs

Right now watchdog_nmi_reconfigure() is a combination of both. To allow a
clean reconfiguration add a 'run' argument and split the functionality in
powerpc.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170831073054.740462115@linutronix.de

---
 arch/powerpc/kernel/watchdog.c |   17 +++++++++--------
 include/linux/nmi.h            |    2 ++
 kernel/watchdog.c              |   31 ++++++++++++++++++++++---------
 3 files changed, 33 insertions(+), 17 deletions(-)

--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -355,17 +355,18 @@ static void watchdog_calc_timeouts(void)
 	wd_timer_period_ms = watchdog_thresh * 1000 * 2 / 5;
 }
 
-void watchdog_nmi_reconfigure(void)
+void watchdog_nmi_reconfigure(bool run)
 {
 	int cpu;
 
-	watchdog_calc_timeouts();
-
-	for_each_cpu(cpu, &wd_cpus_enabled)
-		stop_wd_on_cpu(cpu);
-
-	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
-		start_wd_on_cpu(cpu);
+	if (!run) {
+		for_each_cpu(cpu, &wd_cpus_enabled)
+			stop_wd_on_cpu(cpu);
+	} else {
+		watchdog_calc_timeouts();
+		for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
+			start_wd_on_cpu(cpu);
+	}
 }
 
 /*
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -103,6 +103,8 @@ static inline void arch_touch_nmi_watchd
 #endif
 #endif
 
+void watchdog_nmi_reconfigure(bool run);
+
 /**
  * touch_nmi_watchdog - restart NMI watchdog timeout.
  *
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -112,17 +112,25 @@ void __weak watchdog_nmi_disable(unsigne
 	hardlockup_detector_perf_disable();
 }
 
-/*
- * watchdog_nmi_reconfigure can be implemented to be notified after any
- * watchdog configuration change. The arch hardlockup watchdog should
- * respond to the following variables:
+/**
+ * watchdog_nmi_reconfigure - Optional function to reconfigure NMI watchdogs
+ * @run:	If false stop the watchdogs on all enabled CPUs
+ *		If true start the watchdogs on all enabled CPUs
+ *
+ * The core call order is:
+ * watchdog_nmi_reconfigure(false);
+ * update_variables();
+ * watchdog_nmi_reconfigure(true);
+ *
+ * The second call which starts the watchdogs again guarantees that the
+ * following variables are stable across the call.
  * - watchdog_enabled
  * - watchdog_thresh
  * - watchdog_cpumask
- * - sysctl_hardlockup_all_cpu_backtrace
- * - hardlockup_panic
+ *
+ * After the call the variables can be changed again.
  */
-void __weak watchdog_nmi_reconfigure(void) { }
+void __weak watchdog_nmi_reconfigure(bool run) { }
 
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
 
@@ -515,10 +523,12 @@ static void softlockup_unpark_threads(vo
 
 static void softlockup_reconfigure_threads(bool enabled)
 {
+	watchdog_nmi_reconfigure(false);
 	softlockup_park_all_threads();
 	set_sample_period();
 	if (enabled)
 		softlockup_unpark_threads();
+	watchdog_nmi_reconfigure(true);
 }
 
 /*
@@ -559,7 +569,11 @@ static inline void watchdog_unpark_threa
 static inline int watchdog_enable_all_cpus(void) { return 0; }
 static inline void watchdog_disable_all_cpus(void) { }
 static inline void softlockup_init_threads(void) { }
-static inline void softlockup_reconfigure_threads(bool enabled) { }
+static void softlockup_reconfigure_threads(bool enabled)
+{
+	watchdog_nmi_reconfigure(false);
+	watchdog_nmi_reconfigure(true);
+}
 #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
 
 static void __lockup_detector_cleanup(void)
@@ -599,7 +613,6 @@ static void proc_watchdog_update(void)
 	/* Remove impossible cpus to keep sysctl output clean. */
 	cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
 	softlockup_reconfigure_threads(watchdog_enabled && watchdog_thresh);
-	watchdog_nmi_reconfigure();
 }
 
 /*

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [patch V2 29/29] lockup_detector: Cleanup hotplug locking mess
       [not found] <20170912193654.321505854@linutronix.de>
  2017-09-12 19:37 ` [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage Thomas Gleixner
@ 2017-09-12 19:37 ` Thomas Gleixner
  1 sibling, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-09-12 19:37 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Peter Zijlstra, Borislav Petkov, Andrew Morton,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell, Benjamin Herrenschmidt, Michael Ellerman,
	linuxppc-dev

All watchdog thread related functions are delegated to the smpboot thread
infrastructure, which handles serialization against CPU hotplug correctly.

The sysctl interface is completely decoupled from anything which requires
CPU hotplug protection.

No need to protect the sysctl writes against cpu hotplug anymore. Remove it
and add the now required protection to the powerpc arch_nmi_watchdog
implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170831073055.331699267@linutronix.de

---
 arch/powerpc/kernel/watchdog.c |    2 ++
 kernel/watchdog.c              |    6 ------
 2 files changed, 2 insertions(+), 6 deletions(-)

--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -359,6 +359,7 @@ void watchdog_nmi_reconfigure(bool run)
 {
 	int cpu;
 
+	cpus_read_lock();
 	if (!run) {
 		for_each_cpu(cpu, &wd_cpus_enabled)
 			stop_wd_on_cpu(cpu);
@@ -367,6 +368,7 @@ void watchdog_nmi_reconfigure(bool run)
 		for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
 			start_wd_on_cpu(cpu);
 	}
+	cpus_read_unlock();
 }
 
 /*
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -664,7 +664,6 @@ static int proc_watchdog_common(int whic
 {
 	int err, old, *param = table->data;
 
-	cpu_hotplug_disable();
 	mutex_lock(&watchdog_mutex);
 
 	if (!write) {
@@ -681,7 +680,6 @@ static int proc_watchdog_common(int whic
 			proc_watchdog_update();
 	}
 	mutex_unlock(&watchdog_mutex);
-	cpu_hotplug_enable();
 	return err;
 }
 
@@ -725,7 +723,6 @@ int proc_watchdog_thresh(struct ctl_tabl
 {
 	int err, old;
 
-	cpu_hotplug_disable();
 	mutex_lock(&watchdog_mutex);
 
 	old = READ_ONCE(watchdog_thresh);
@@ -735,7 +732,6 @@ int proc_watchdog_thresh(struct ctl_tabl
 		proc_watchdog_update();
 
 	mutex_unlock(&watchdog_mutex);
-	cpu_hotplug_enable();
 	return err;
 }
 
@@ -750,7 +746,6 @@ int proc_watchdog_cpumask(struct ctl_tab
 {
 	int err;
 
-	cpu_hotplug_disable();
 	mutex_lock(&watchdog_mutex);
 
 	err = proc_do_large_bitmap(table, write, buffer, lenp, ppos);
@@ -758,7 +753,6 @@ int proc_watchdog_cpumask(struct ctl_tab
 		proc_watchdog_update();
 
 	mutex_unlock(&watchdog_mutex);
-	cpu_hotplug_enable();
 	return err;
 }
 #endif /* CONFIG_SYSCTL */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage
  2017-09-12 19:37 ` [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage Thomas Gleixner
@ 2017-10-03  0:29   ` Michael Ellerman
  2017-10-03  6:50     ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Ellerman @ 2017-10-03  0:29 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Ingo Molnar, Peter Zijlstra, Borislav Petkov, Andrew Morton,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell, Benjamin Herrenschmidt, linuxppc-dev

Hi Thomas,

Thomas Gleixner <tglx@linutronix.de> writes:
> Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure()
> need to be done in two steps.
>
>      1) Stop all NMIs
>      2) Read the new parameters and start NMIs
>
> Right now watchdog_nmi_reconfigure() is a combination of both. To allow a
> clean reconfiguration add a 'run' argument and split the functionality in
> powerpc.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: Chris Metcalf <cmetcalf@mellanox.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Sebastian Siewior <bigeasy@linutronix.de>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Ulrich Obergfell <uobergfe@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> Link: http://lkml.kernel.org/r/20170831073054.740462115@linutronix.de
>
> ---
>  arch/powerpc/kernel/watchdog.c |   17 +++++++++--------
>  include/linux/nmi.h            |    2 ++
>  kernel/watchdog.c              |   31 ++++++++++++++++++++++---------
>  3 files changed, 33 insertions(+), 17 deletions(-)

Unfortunately this is hitting the WARN_ON in start_wd_cpu() on powerpc
because we're calling it multiple times for the boot CPU.

The first call is via:

  start_wd_on_cpu+0x80/0x2f0
  watchdog_nmi_reconfigure+0x124/0x170
  softlockup_reconfigure_threads+0x110/0x130
  lockup_detector_init+0xbc/0xe0
  kernel_init_freeable+0x18c/0x37c
  kernel_init+0x2c/0x160
  ret_from_kernel_thread+0x5c/0xbc

And then again via the CPU hotplug registration:

  start_wd_on_cpu+0x80/0x2f0
  cpuhp_invoke_callback+0x194/0x620
  cpuhp_thread_fun+0x7c/0x1b0
  smpboot_thread_fn+0x290/0x2a0
  kthread+0x168/0x1b0
  ret_from_kernel_thread+0x5c/0xbc


The first call is new because previously watchdog_nmi_reconfigure()
wasn't called from softlockup_reconfigure_threads().

I'm not sure what the easiest fix is. One option would be to just drop
the WARN_ON, it's just there for paranoia AFAICS.

cheers

>
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -355,17 +355,18 @@ static void watchdog_calc_timeouts(void)
>  	wd_timer_period_ms = watchdog_thresh * 1000 * 2 / 5;
>  }
>  
> -void watchdog_nmi_reconfigure(void)
> +void watchdog_nmi_reconfigure(bool run)
>  {
>  	int cpu;
>  
> -	watchdog_calc_timeouts();
> -
> -	for_each_cpu(cpu, &wd_cpus_enabled)
> -		stop_wd_on_cpu(cpu);
> -
> -	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
> -		start_wd_on_cpu(cpu);
> +	if (!run) {
> +		for_each_cpu(cpu, &wd_cpus_enabled)
> +			stop_wd_on_cpu(cpu);
> +	} else {
> +		watchdog_calc_timeouts();
> +		for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
> +			start_wd_on_cpu(cpu);
> +	}
>  }
>  
>  /*
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -103,6 +103,8 @@ static inline void arch_touch_nmi_watchd
>  #endif
>  #endif
>  
> +void watchdog_nmi_reconfigure(bool run);
> +
>  /**
>   * touch_nmi_watchdog - restart NMI watchdog timeout.
>   *
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -112,17 +112,25 @@ void __weak watchdog_nmi_disable(unsigne
>  	hardlockup_detector_perf_disable();
>  }
>  
> -/*
> - * watchdog_nmi_reconfigure can be implemented to be notified after any
> - * watchdog configuration change. The arch hardlockup watchdog should
> - * respond to the following variables:
> +/**
> + * watchdog_nmi_reconfigure - Optional function to reconfigure NMI watchdogs
> + * @run:	If false stop the watchdogs on all enabled CPUs
> + *		If true start the watchdogs on all enabled CPUs
> + *
> + * The core call order is:
> + * watchdog_nmi_reconfigure(false);
> + * update_variables();
> + * watchdog_nmi_reconfigure(true);
> + *
> + * The second call which starts the watchdogs again guarantees that the
> + * following variables are stable across the call.
>   * - watchdog_enabled
>   * - watchdog_thresh
>   * - watchdog_cpumask
> - * - sysctl_hardlockup_all_cpu_backtrace
> - * - hardlockup_panic
> + *
> + * After the call the variables can be changed again.
>   */
> -void __weak watchdog_nmi_reconfigure(void) { }
> +void __weak watchdog_nmi_reconfigure(bool run) { }
>  
>  #ifdef CONFIG_SOFTLOCKUP_DETECTOR
>  
> @@ -515,10 +523,12 @@ static void softlockup_unpark_threads(vo
>  
>  static void softlockup_reconfigure_threads(bool enabled)
>  {
> +	watchdog_nmi_reconfigure(false);
>  	softlockup_park_all_threads();
>  	set_sample_period();
>  	if (enabled)
>  		softlockup_unpark_threads();
> +	watchdog_nmi_reconfigure(true);
>  }
>  
>  /*
> @@ -559,7 +569,11 @@ static inline void watchdog_unpark_threa
>  static inline int watchdog_enable_all_cpus(void) { return 0; }
>  static inline void watchdog_disable_all_cpus(void) { }
>  static inline void softlockup_init_threads(void) { }
> -static inline void softlockup_reconfigure_threads(bool enabled) { }
> +static void softlockup_reconfigure_threads(bool enabled)
> +{
> +	watchdog_nmi_reconfigure(false);
> +	watchdog_nmi_reconfigure(true);
> +}
>  #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
>  
>  static void __lockup_detector_cleanup(void)
> @@ -599,7 +613,6 @@ static void proc_watchdog_update(void)
>  	/* Remove impossible cpus to keep sysctl output clean. */
>  	cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
>  	softlockup_reconfigure_threads(watchdog_enabled && watchdog_thresh);
> -	watchdog_nmi_reconfigure();
>  }
>  
>  /*

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage
  2017-10-03  0:29   ` Michael Ellerman
@ 2017-10-03  6:50     ` Thomas Gleixner
  2017-10-03  7:04       ` Thomas Gleixner
  2017-10-03 11:36       ` Michael Ellerman
  0 siblings, 2 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-10-03  6:50 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Borislav Petkov,
	Andrew Morton, Sebastian Siewior, Nicholas Piggin, Don Zickus,
	Chris Metcalf, Ulrich Obergfell, Benjamin Herrenschmidt,
	linuxppc-dev

On Tue, 3 Oct 2017, Michael Ellerman wrote:
> Hi Thomas,
> Unfortunately this is hitting the WARN_ON in start_wd_cpu() on powerpc
> because we're calling it multiple times for the boot CPU.
> 
> The first call is via:
> 
>   start_wd_on_cpu+0x80/0x2f0
>   watchdog_nmi_reconfigure+0x124/0x170
>   softlockup_reconfigure_threads+0x110/0x130
>   lockup_detector_init+0xbc/0xe0
>   kernel_init_freeable+0x18c/0x37c
>   kernel_init+0x2c/0x160
>   ret_from_kernel_thread+0x5c/0xbc
> 
> And then again via the CPU hotplug registration:
> 
>   start_wd_on_cpu+0x80/0x2f0
>   cpuhp_invoke_callback+0x194/0x620
>   cpuhp_thread_fun+0x7c/0x1b0
>   smpboot_thread_fn+0x290/0x2a0
>   kthread+0x168/0x1b0
>   ret_from_kernel_thread+0x5c/0xbc
> 
> 
> The first call is new because previously watchdog_nmi_reconfigure()
> wasn't called from softlockup_reconfigure_threads().

Hmm, don't you have the same problem with CPU hotplug or do you just get
lucky because the hotplug callback in your code is ordered vs. the
softlockup thread hotplug callback in a way that this does not hit?

> I'm not sure what the easiest fix is. One option would be to just drop
> the WARN_ON, it's just there for paranoia AFAICS.

The straight forward way is to make use of the new probe function. Patch
below.

Thanks,

	tglx

8<------------------
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -375,20 +375,18 @@ void watchdog_nmi_start(void)
 /*
  * This runs after lockup_detector_init() which sets up watchdog_cpumask.
  */
-static int __init powerpc_watchdog_init(void)
+int __init watchdog_nmi_probe(void)
 {
 	int err;
 
-	watchdog_calc_timeouts();
-
-	err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "powerpc/watchdog:online",
-				start_wd_on_cpu, stop_wd_on_cpu);
+	err = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+					"powerpc/watchdog:online",
+					start_wd_on_cpu, stop_wd_on_cpu);
 	if (err < 0)
 		pr_warn("Watchdog could not be initialized");
 
 	return 0;
 }
-arch_initcall(powerpc_watchdog_init);
 
 static void handle_backtrace_ipi(struct pt_regs *regs)
 {
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -608,7 +608,6 @@ static inline int watchdog_park_threads(
 static inline void watchdog_unpark_threads(void) { }
 static inline int watchdog_enable_all_cpus(void) { return 0; }
 static inline void watchdog_disable_all_cpus(void) { }
-static inline void softlockup_init_threads(void) { }
 static void softlockup_reconfigure_threads(void)
 {
 	cpus_read_lock();
@@ -617,6 +616,10 @@ static void softlockup_reconfigure_threa
 	watchdog_nmi_start();
 	cpus_read_unlock();
 }
+static inline void softlockup_init_threads(void)
+{
+	softlockup_reconfigure_threads();
+}
 #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
 
 static void __lockup_detector_cleanup(void)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage
  2017-10-03  6:50     ` Thomas Gleixner
@ 2017-10-03  7:04       ` Thomas Gleixner
  2017-10-03 10:01         ` Nicholas Piggin
  2017-10-03 11:36       ` Michael Ellerman
  1 sibling, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2017-10-03  7:04 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Borislav Petkov,
	Andrew Morton, Sebastian Siewior, Nicholas Piggin, Don Zickus,
	Chris Metcalf, Ulrich Obergfell, Benjamin Herrenschmidt,
	linuxppc-dev

On Tue, 3 Oct 2017, Thomas Gleixner wrote:
> On Tue, 3 Oct 2017, Michael Ellerman wrote:
> > Hi Thomas,
> > Unfortunately this is hitting the WARN_ON in start_wd_cpu() on powerpc
> > because we're calling it multiple times for the boot CPU.
> > 
> > The first call is via:
> > 
> >   start_wd_on_cpu+0x80/0x2f0
> >   watchdog_nmi_reconfigure+0x124/0x170
> >   softlockup_reconfigure_threads+0x110/0x130
> >   lockup_detector_init+0xbc/0xe0
> >   kernel_init_freeable+0x18c/0x37c
> >   kernel_init+0x2c/0x160
> >   ret_from_kernel_thread+0x5c/0xbc
> > 
> > And then again via the CPU hotplug registration:
> > 
> >   start_wd_on_cpu+0x80/0x2f0
> >   cpuhp_invoke_callback+0x194/0x620
> >   cpuhp_thread_fun+0x7c/0x1b0
> >   smpboot_thread_fn+0x290/0x2a0
> >   kthread+0x168/0x1b0
> >   ret_from_kernel_thread+0x5c/0xbc
> > 
> > 
> > The first call is new because previously watchdog_nmi_reconfigure()
> > wasn't called from softlockup_reconfigure_threads().
> 
> Hmm, don't you have the same problem with CPU hotplug or do you just get
> lucky because the hotplug callback in your code is ordered vs. the
> softlockup thread hotplug callback in a way that this does not hit?

Which leads me to the question why you need the hotplug state at all if the
softlockup detector is enabled. Wouldn't it make more sense to only
register the state if softlockup detector is turned off in Kconfig and
actually move it to the core code?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage
  2017-10-03  7:04       ` Thomas Gleixner
@ 2017-10-03 10:01         ` Nicholas Piggin
  2017-10-03 10:56           ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Nicholas Piggin @ 2017-10-03 10:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michael Ellerman, LKML, Ingo Molnar, Peter Zijlstra,
	Borislav Petkov, Andrew Morton, Sebastian Siewior, Don Zickus,
	Chris Metcalf, Ulrich Obergfell, Benjamin Herrenschmidt,
	linuxppc-dev

On Tue, 3 Oct 2017 09:04:03 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Tue, 3 Oct 2017, Thomas Gleixner wrote:
> > On Tue, 3 Oct 2017, Michael Ellerman wrote:  
> > > Hi Thomas,
> > > Unfortunately this is hitting the WARN_ON in start_wd_cpu() on powerpc
> > > because we're calling it multiple times for the boot CPU.
> > > 
> > > The first call is via:
> > > 
> > >   start_wd_on_cpu+0x80/0x2f0
> > >   watchdog_nmi_reconfigure+0x124/0x170
> > >   softlockup_reconfigure_threads+0x110/0x130
> > >   lockup_detector_init+0xbc/0xe0
> > >   kernel_init_freeable+0x18c/0x37c
> > >   kernel_init+0x2c/0x160
> > >   ret_from_kernel_thread+0x5c/0xbc
> > > 
> > > And then again via the CPU hotplug registration:
> > > 
> > >   start_wd_on_cpu+0x80/0x2f0
> > >   cpuhp_invoke_callback+0x194/0x620
> > >   cpuhp_thread_fun+0x7c/0x1b0
> > >   smpboot_thread_fn+0x290/0x2a0
> > >   kthread+0x168/0x1b0
> > >   ret_from_kernel_thread+0x5c/0xbc
> > > 
> > > 
> > > The first call is new because previously watchdog_nmi_reconfigure()
> > > wasn't called from softlockup_reconfigure_threads().  
> > 
> > Hmm, don't you have the same problem with CPU hotplug or do you just get
> > lucky because the hotplug callback in your code is ordered vs. the
> > softlockup thread hotplug callback in a way that this does not hit?

I had the idea that it watchdog_nmi_reconfigure() being only called
with get_online_cpus held would prevent hotplug callbacks running.
  
> 
> Which leads me to the question why you need the hotplug state at all if the
> softlockup detector is enabled. Wouldn't it make more sense to only
> register the state if softlockup detector is turned off in Kconfig and
> actually move it to the core code?

I don't understand what you mean exactly, but it was done to avoid
relying on the softlockup detector at all, because it wasn't needed
for anything else (unlike the perf lockup detector).

Thanks,
Nick

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage
  2017-10-03 10:01         ` Nicholas Piggin
@ 2017-10-03 10:56           ` Thomas Gleixner
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-10-03 10:56 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Michael Ellerman, LKML, Ingo Molnar, Peter Zijlstra,
	Borislav Petkov, Andrew Morton, Sebastian Siewior, Don Zickus,
	Chris Metcalf, Ulrich Obergfell, Benjamin Herrenschmidt,
	linuxppc-dev

On Tue, 3 Oct 2017, Nicholas Piggin wrote:
> On Tue, 3 Oct 2017 09:04:03 +0200 (CEST)
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > On Tue, 3 Oct 2017, Thomas Gleixner wrote:
> > > On Tue, 3 Oct 2017, Michael Ellerman wrote:  
> > > > Hi Thomas,
> > > > Unfortunately this is hitting the WARN_ON in start_wd_cpu() on powerpc
> > > > because we're calling it multiple times for the boot CPU.
> > > > 
> > > > The first call is via:
> > > > 
> > > >   start_wd_on_cpu+0x80/0x2f0
> > > >   watchdog_nmi_reconfigure+0x124/0x170
> > > >   softlockup_reconfigure_threads+0x110/0x130
> > > >   lockup_detector_init+0xbc/0xe0
> > > >   kernel_init_freeable+0x18c/0x37c
> > > >   kernel_init+0x2c/0x160
> > > >   ret_from_kernel_thread+0x5c/0xbc
> > > > 
> > > > And then again via the CPU hotplug registration:
> > > > 
> > > >   start_wd_on_cpu+0x80/0x2f0
> > > >   cpuhp_invoke_callback+0x194/0x620
> > > >   cpuhp_thread_fun+0x7c/0x1b0
> > > >   smpboot_thread_fn+0x290/0x2a0
> > > >   kthread+0x168/0x1b0
> > > >   ret_from_kernel_thread+0x5c/0xbc
> > > > 
> > > > 
> > > > The first call is new because previously watchdog_nmi_reconfigure()
> > > > wasn't called from softlockup_reconfigure_threads().  
> > > 
> > > Hmm, don't you have the same problem with CPU hotplug or do you just get
> > > lucky because the hotplug callback in your code is ordered vs. the
> > > softlockup thread hotplug callback in a way that this does not hit?
> 
> I had the idea that it watchdog_nmi_reconfigure() being only called
> with get_online_cpus held would prevent hotplug callbacks running.
>   
> > 
> > Which leads me to the question why you need the hotplug state at all if the
> > softlockup detector is enabled. Wouldn't it make more sense to only
> > register the state if softlockup detector is turned off in Kconfig and
> > actually move it to the core code?
> 
> I don't understand what you mean exactly, but it was done to avoid
> relying on the softlockup detector at all, because it wasn't needed
> for anything else (unlike the perf lockup detector).

If the softlockup detector is enabled along with your hardlockup detector
then the current code in mainline invokes watchdog_nmi_enable(cpu), which
is a weak function and as I just noticed not implemented by powerpc. So
it's a non issue because it's not implemented.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage
  2017-10-03  6:50     ` Thomas Gleixner
  2017-10-03  7:04       ` Thomas Gleixner
@ 2017-10-03 11:36       ` Michael Ellerman
  2017-10-03 12:13         ` Thomas Gleixner
  1 sibling, 1 reply; 13+ messages in thread
From: Michael Ellerman @ 2017-10-03 11:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Borislav Petkov,
	Andrew Morton, Sebastian Siewior, Nicholas Piggin, Don Zickus,
	Chris Metcalf, Ulrich Obergfell, Benjamin Herrenschmidt,
	linuxppc-dev

Thomas Gleixner <tglx@linutronix.de> writes:

> On Tue, 3 Oct 2017, Michael Ellerman wrote:
>> Hi Thomas,
>> Unfortunately this is hitting the WARN_ON in start_wd_cpu() on powerpc
>> because we're calling it multiple times for the boot CPU.
>> 
>> The first call is via:
>> 
>>   start_wd_on_cpu+0x80/0x2f0
>>   watchdog_nmi_reconfigure+0x124/0x170
>>   softlockup_reconfigure_threads+0x110/0x130
>>   lockup_detector_init+0xbc/0xe0
>>   kernel_init_freeable+0x18c/0x37c
>>   kernel_init+0x2c/0x160
>>   ret_from_kernel_thread+0x5c/0xbc
>> 
>> And then again via the CPU hotplug registration:
>> 
>>   start_wd_on_cpu+0x80/0x2f0
>>   cpuhp_invoke_callback+0x194/0x620
>>   cpuhp_thread_fun+0x7c/0x1b0
>>   smpboot_thread_fn+0x290/0x2a0
>>   kthread+0x168/0x1b0
>>   ret_from_kernel_thread+0x5c/0xbc
>> 
>> 
>> The first call is new because previously watchdog_nmi_reconfigure()
>> wasn't called from softlockup_reconfigure_threads().
>
> Hmm, don't you have the same problem with CPU hotplug or do you just get
> lucky because the hotplug callback in your code is ordered vs. the
> softlockup thread hotplug callback in a way that this does not hit?

I don't see it with CPU hotplug.

AFAICS that's because softlockup_reconfigure_threads() isn't called for
CPU hotplug. Unless there's a path I'm missing?

>> I'm not sure what the easiest fix is. One option would be to just drop
>> the WARN_ON, it's just there for paranoia AFAICS.
>
> The straight forward way is to make use of the new probe function. Patch
> below.

Thanks.

Hmm, I tried that patch, it makes the warning go away. But then I
triggered a deliberate hard lockup and got nothing.

Then I went back to the existing code (in linux-next), and I still get
no warning from a deliberate hard lockup.

So seems there may be some more gremlins. Will test more in the morning.

cheers

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage
  2017-10-03 11:36       ` Michael Ellerman
@ 2017-10-03 12:13         ` Thomas Gleixner
  2017-10-03 13:20           ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2017-10-03 12:13 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Borislav Petkov,
	Andrew Morton, Sebastian Siewior, Nicholas Piggin, Don Zickus,
	Chris Metcalf, Ulrich Obergfell, Benjamin Herrenschmidt,
	linuxppc-dev

On Tue, 3 Oct 2017, Michael Ellerman wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> >> The first call is new because previously watchdog_nmi_reconfigure()
> >> wasn't called from softlockup_reconfigure_threads().
> >
> > Hmm, don't you have the same problem with CPU hotplug or do you just get
> > lucky because the hotplug callback in your code is ordered vs. the
> > softlockup thread hotplug callback in a way that this does not hit?
> 
> I don't see it with CPU hotplug.
> 
> AFAICS that's because softlockup_reconfigure_threads() isn't called for
> CPU hotplug. Unless there's a path I'm missing?

As I said in the other reply, I assumed that its called via
watchdog_nmi_enable(cpu), but that's a weak function which is not
implemented on power. So no issue.

> >> I'm not sure what the easiest fix is. One option would be to just drop
> >> the WARN_ON, it's just there for paranoia AFAICS.
> >
> > The straight forward way is to make use of the new probe function. Patch
> > below.
> 
> Hmm, I tried that patch, it makes the warning go away. But then I
> triggered a deliberate hard lockup and got nothing.
> 
> Then I went back to the existing code (in linux-next), and I still get
> no warning from a deliberate hard lockup.
> 
> So seems there may be some more gremlins. Will test more in the morning.

Hrm. That's weird. I'll have a look and send a proper patch series on top
of next.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage
  2017-10-03 12:13         ` Thomas Gleixner
@ 2017-10-03 13:20           ` Thomas Gleixner
  2017-10-03 19:27             ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2017-10-03 13:20 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Borislav Petkov,
	Andrew Morton, Sebastian Siewior, Nicholas Piggin, Don Zickus,
	Chris Metcalf, Ulrich Obergfell, Benjamin Herrenschmidt,
	linuxppc-dev

On Tue, 3 Oct 2017, Thomas Gleixner wrote:
> On Tue, 3 Oct 2017, Michael Ellerman wrote:
> > Hmm, I tried that patch, it makes the warning go away. But then I
> > triggered a deliberate hard lockup and got nothing.
> > 
> > Then I went back to the existing code (in linux-next), and I still get
> > no warning from a deliberate hard lockup.
> > 
> > So seems there may be some more gremlins. Will test more in the morning.
> 
> Hrm. That's weird. I'll have a look and send a proper patch series on top
> of next.

The major difference is that the reworked code utilizes
watchdog_nmi_reconfigure() for both init and the sysctl updates, but I
can't for my life figure out why that doesn't work.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage
  2017-10-03 13:20           ` Thomas Gleixner
@ 2017-10-03 19:27             ` Thomas Gleixner
  2017-10-04  5:53               ` Michael Ellerman
  2017-10-05 16:17               ` Don Zickus
  0 siblings, 2 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-10-03 19:27 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Borislav Petkov,
	Andrew Morton, Sebastian Siewior, Nicholas Piggin, Don Zickus,
	Chris Metcalf, Ulrich Obergfell, Benjamin Herrenschmidt,
	linuxppc-dev

On Tue, 3 Oct 2017, Thomas Gleixner wrote:
> On Tue, 3 Oct 2017, Thomas Gleixner wrote:
> > On Tue, 3 Oct 2017, Michael Ellerman wrote:
> > > Hmm, I tried that patch, it makes the warning go away. But then I
> > > triggered a deliberate hard lockup and got nothing.
> > > 
> > > Then I went back to the existing code (in linux-next), and I still get
> > > no warning from a deliberate hard lockup.
> > > 
> > > So seems there may be some more gremlins. Will test more in the morning.
> > 
> > Hrm. That's weird. I'll have a look and send a proper patch series on top
> > of next.
> 
> The major difference is that the reworked code utilizes
> watchdog_nmi_reconfigure() for both init and the sysctl updates, but I
> can't for my life figure out why that doesn't work.

I collected the changes which Linus requested along with the nmi_probe()
one and pushed them into:

 git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.core/urgent

That's based on 4.13 final so it neither contains 4.14 nor -next material.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage
  2017-10-03 19:27             ` Thomas Gleixner
@ 2017-10-04  5:53               ` Michael Ellerman
  2017-10-05 16:17               ` Don Zickus
  1 sibling, 0 replies; 13+ messages in thread
From: Michael Ellerman @ 2017-10-04  5:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Borislav Petkov,
	Andrew Morton, Sebastian Siewior, Nicholas Piggin, Don Zickus,
	Chris Metcalf, Ulrich Obergfell, Benjamin Herrenschmidt,
	linuxppc-dev

Thomas Gleixner <tglx@linutronix.de> writes:

> On Tue, 3 Oct 2017, Thomas Gleixner wrote:
>> On Tue, 3 Oct 2017, Thomas Gleixner wrote:
>> > On Tue, 3 Oct 2017, Michael Ellerman wrote:
>> > > Hmm, I tried that patch, it makes the warning go away. But then I
>> > > triggered a deliberate hard lockup and got nothing.
>> > > 
>> > > Then I went back to the existing code (in linux-next), and I still get
>> > > no warning from a deliberate hard lockup.
>> > > 
>> > > So seems there may be some more gremlins. Will test more in the morning.
>> > 
>> > Hrm. That's weird. I'll have a look and send a proper patch series on top
>> > of next.
>> 
>> The major difference is that the reworked code utilizes
>> watchdog_nmi_reconfigure() for both init and the sysctl updates, but I
>> can't for my life figure out why that doesn't work.
>
> I collected the changes which Linus requested along with the nmi_probe()
> one and pushed them into:
>
>  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.core/urgent
>
> That's based on 4.13 final so it neither contains 4.14 nor -next material.

Thanks. I tested that here and it seems fine. The warning at boot is
gone and it is correctly catching a hard lockup triggered via LKDTM, eg:

  # mount -t debugfs none /sys/kernel/debug
  # echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
  lkdtm: Performing direct entry HARDLOCKUP
  Watchdog CPU:0 Hard LOCKUP
  Modules linked in:
  CPU: 0 PID: 1215 Comm: sh Not tainted 4.13.0-gcc6-11846-g86be5ee #162
  task: c0000000f1fc4c00 task.stack: c0000000ee3ac000
  NIP:  c0000000007205a4 LR: c00000000071f950 CTR: c000000000720570
  REGS: c00000003ffffd80 TRAP: 0900   Not tainted  (4.13.0-gcc6-11846-g86be5ee)
  MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28002228  XER: 00000000
  CFAR: c0000000007205a8 SOFTE: 0 
  GPR00: c00000000071f950 c0000000ee3afbb0 c00000000107cf00 c0000000010604f0 
  GPR04: c0000000ffa05d90 c0000000ffa1c968 0000000000000000 0000000000000000 
  GPR08: 0000000000000007 0000000000000001 0000000000000000 9000000030001003 
  GPR12: c000000000720570 c00000000fd40000 0000000000000000 0000000000000000 
  GPR16: 0000000000000000 0000000000000000 0000000000000000 00000000100b8fd0 
  GPR20: 000001002f5a3485 00000000100b8f90 0000000000000000 0000000000000000 
  GPR24: c000000001060778 c0000000ee3afe00 c0000000ee3afe00 c0000000010603b0 
  GPR28: 000000000000000b c0000000f1fe0000 0000000000000140 c0000000010604f0 
  NIP [c0000000007205a4] lkdtm_HARDLOCKUP+0x34/0x40
  LR [c00000000071f950] lkdtm_do_action+0x50/0x70
  Call Trace:
  [c0000000ee3afbb0] [0000000000000140] 0x140 (unreliable)
  [c0000000ee3afbd0] [c00000000071f950] lkdtm_do_action+0x50/0x70
  [c0000000ee3afc00] [c00000000071fdc0] direct_entry+0x110/0x1b0
  [c0000000ee3afc90] [c00000000050141c] full_proxy_write+0x9c/0x110
  [c0000000ee3afcf0] [c000000000336a3c] __vfs_write+0x6c/0x210
  [c0000000ee3afd90] [c000000000338960] vfs_write+0xd0/0x270
  [c0000000ee3afde0] [c00000000033a93c] SyS_write+0x6c/0x110
  [c0000000ee3afe30] [c00000000000b220] system_call+0x58/0x6c
  Instruction dump:
  3842c990 7c0802a6 f8010010 f821ffe1 60000000 60000000 39400000 892d027a 
  994d027a 60000000 60420000 7c210b78 <7c421378> 4bfffff8 60420000 3c4c0096 
  Kernel panic - not syncing: Hard LOCKUP

Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)

cheers

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage
  2017-10-03 19:27             ` Thomas Gleixner
  2017-10-04  5:53               ` Michael Ellerman
@ 2017-10-05 16:17               ` Don Zickus
  1 sibling, 0 replies; 13+ messages in thread
From: Don Zickus @ 2017-10-05 16:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michael Ellerman, LKML, Ingo Molnar, Peter Zijlstra,
	Borislav Petkov, Andrew Morton, Sebastian Siewior,
	Nicholas Piggin, Chris Metcalf, Ulrich Obergfell,
	Benjamin Herrenschmidt, linuxppc-dev

On Tue, Oct 03, 2017 at 07:27:01PM +0000, Thomas Gleixner wrote:
> On Tue, 3 Oct 2017, Thomas Gleixner wrote:
> > On Tue, 3 Oct 2017, Thomas Gleixner wrote:
> > > On Tue, 3 Oct 2017, Michael Ellerman wrote:
> > > > Hmm, I tried that patch, it makes the warning go away. But then I
> > > > triggered a deliberate hard lockup and got nothing.
> > > > 
> > > > Then I went back to the existing code (in linux-next), and I still get
> > > > no warning from a deliberate hard lockup.
> > > > 
> > > > So seems there may be some more gremlins. Will test more in the morning.
> > > 
> > > Hrm. That's weird. I'll have a look and send a proper patch series on top
> > > of next.
> > 
> > The major difference is that the reworked code utilizes
> > watchdog_nmi_reconfigure() for both init and the sysctl updates, but I
> > can't for my life figure out why that doesn't work.
> 
> I collected the changes which Linus requested along with the nmi_probe()
> one and pushed them into:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.core/urgent
> 
> That's based on 4.13 final so it neither contains 4.14 nor -next material.

Tested your changes on 4.14-rc3 and it passes my tests.  Thanks!

Tested-and-Reviewed-by: Don Zickus <dzickus@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-10-05 16:17 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20170912193654.321505854@linutronix.de>
2017-09-12 19:37 ` [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage Thomas Gleixner
2017-10-03  0:29   ` Michael Ellerman
2017-10-03  6:50     ` Thomas Gleixner
2017-10-03  7:04       ` Thomas Gleixner
2017-10-03 10:01         ` Nicholas Piggin
2017-10-03 10:56           ` Thomas Gleixner
2017-10-03 11:36       ` Michael Ellerman
2017-10-03 12:13         ` Thomas Gleixner
2017-10-03 13:20           ` Thomas Gleixner
2017-10-03 19:27             ` Thomas Gleixner
2017-10-04  5:53               ` Michael Ellerman
2017-10-05 16:17               ` Don Zickus
2017-09-12 19:37 ` [patch V2 29/29] lockup_detector: Cleanup hotplug locking mess Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).