linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC GIT Pull] core watchdog sanitizing
       [not found] ` <CA+55aFxQsmSb=zohGZOZK7eR4n5xOHNArmCa9j1b7+wJ+khQrg@mail.gmail.com>
@ 2017-10-02 18:46   ` Thomas Gleixner
  2017-10-02 19:04     ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2017-10-02 18:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Don Zickus,
	Benjamin Herrenschmidt, Michael Ellerman, Nicholas Piggin,
	linuxppc-dev

On Sun, 1 Oct 2017, Linus Torvalds wrote:
> So get rid of that kind of shit, and I may reconsider. But as is, I
> look at that patch and say "no, this is worse than the garbage it used
> to be".

I agree that adding that 'run' argument was certainly not a piece of
art. Though I disagree with the sentiment that non-functional garbage is
preferrable over functionally correct code which merily contains a bad
implementation choice. Without knowing you for years, it would certainly
seem more rewarding to dump garbage and get it merged, than caring and
mopping up garbage, which should not have been merged in the first
place. Your motivation skills are truly outstanding.

Enough vented. Find below the cure for that major offense.

Thanks,

	tglx

8<--------------------

Subject: watchdog/core, powerpc: Replace watchdog_nmi_reconfigure()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Mon, 02 Oct 2017 12:34:50 +0200

The recent cleanup of the watchdog code split watchdog_nmi_reconfigure()
into two stages. One to stop the NMI and one to restart it after
reconfiguration. That was done by adding a boolean 'run' argument to the
function, which is functionally correct but not necessarily a piece of art.

Replace it by two explicit functions: watchdog_nmi_stop() and
watchdog_nmi_start().

Fixes: 6592ad2fcc8f ("watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage")
Requested-by: Linus 'Nursing his pet-peeve' Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Thomas 'Mopping up garbage' Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org

---
 arch/powerpc/kernel/watchdog.c |   23 ++++++++++++++---------
 include/linux/nmi.h            |    3 ++-
 kernel/watchdog.c              |   33 ++++++++++++++++++---------------
 3 files changed, 34 insertions(+), 25 deletions(-)

--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -355,19 +355,24 @@ static void watchdog_calc_timeouts(void)
 	wd_timer_period_ms = watchdog_thresh * 1000 * 2 / 5;
 }
 
-void watchdog_nmi_reconfigure(bool run)
+void watchdog_nmi_stop(void)
 {
 	int cpu;
 
 	cpus_read_lock();
-	if (!run) {
-		for_each_cpu(cpu, &wd_cpus_enabled)
-			stop_wd_on_cpu(cpu);
-	} else {
-		watchdog_calc_timeouts();
-		for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
-			start_wd_on_cpu(cpu);
-	}
+	for_each_cpu(cpu, &wd_cpus_enabled)
+		stop_wd_on_cpu(cpu);
+	cpus_read_unlock();
+}
+
+void watchdog_nmi_start(void)
+{
+	int cpu;
+
+	cpus_read_lock();
+	watchdog_calc_timeouts();
+	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
+		start_wd_on_cpu(cpu);
 	cpus_read_unlock();
 }
 
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -109,7 +109,8 @@ static inline int hardlockup_detector_pe
 # endif
 #endif
 
-void watchdog_nmi_reconfigure(bool run);
+void watchdog_nmi_stop(void);
+void watchdog_nmi_start(void);
 
 /**
  * touch_nmi_watchdog - restart NMI watchdog timeout.
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -123,24 +123,27 @@ int __weak __init watchdog_nmi_probe(voi
 }
 
 /**
- * watchdog_nmi_reconfigure - Optional function to reconfigure NMI watchdogs
- * @run:	If false stop the watchdogs on all enabled CPUs
- *		If true start the watchdogs on all enabled CPUs
+ * watchdog_nmi_stop - Stop the watchdog for reconfiguration
  *
- * The core call order is:
- * watchdog_nmi_reconfigure(false);
+ * The reconfiguration steps are:
+ * watchdog_nmi_stop();
  * update_variables();
- * watchdog_nmi_reconfigure(true);
+ * watchdog_nmi_start();
+ */
+void __weak watchdog_nmi_stop(void) { }
+
+/**
+ * watchdog_nmi_start - Start the watchdog after reconfiguration
  *
- * The second call which starts the watchdogs again guarantees that the
- * following variables are stable across the call.
+ * Counterpart to watchdog_nmi_stop().
+ *
+ * The following variables have been updated in update_variables() and
+ * contain the currently valid configuration:
  * - watchdog_enabled
  * - watchdog_thresh
  * - watchdog_cpumask
- *
- * After the call the variables can be changed again.
  */
-void __weak watchdog_nmi_reconfigure(bool run) { }
+void __weak watchdog_nmi_start(void) { }
 
 /**
  * lockup_detector_update_enable - Update the sysctl enable bit
@@ -551,13 +554,13 @@ static void softlockup_unpark_threads(vo
 
 static void softlockup_reconfigure_threads(void)
 {
-	watchdog_nmi_reconfigure(false);
+	watchdog_nmi_stop();
 	softlockup_park_all_threads();
 	set_sample_period();
 	lockup_detector_update_enable();
 	if (watchdog_enabled && watchdog_thresh)
 		softlockup_unpark_threads();
-	watchdog_nmi_reconfigure(true);
+	watchdog_nmi_start();
 }
 
 /*
@@ -602,9 +605,9 @@ static inline void watchdog_disable_all_
 static inline void softlockup_init_threads(void) { }
 static void softlockup_reconfigure_threads(void)
 {
-	watchdog_nmi_reconfigure(false);
+	watchdog_nmi_stop();
 	lockup_detector_update_enable();
-	watchdog_nmi_reconfigure(true);
+	watchdog_nmi_start();
 }
 #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC GIT Pull] core watchdog sanitizing
  2017-10-02 18:46   ` [RFC GIT Pull] core watchdog sanitizing Thomas Gleixner
@ 2017-10-02 19:04     ` Linus Torvalds
  2017-10-02 19:32       ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2017-10-02 19:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Don Zickus,
	Benjamin Herrenschmidt, Michael Ellerman, Nicholas Piggin,
	ppc-dev

On Mon, Oct 2, 2017 at 11:46 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> I agree that adding that 'run' argument was certainly not a piece of
> art. Though I disagree with the sentiment that non-functional garbage is
> preferrable over functionally correct code which merily contains a bad
> implementation choice.

I agree that it's somewhat arbitrary, but I also find it really hard
to vet code where my initial reaction is just "this is too ugly".

So it may be superficial, but ..

> Enough vented. Find below the cure for that major offense.

Looks much better to me. Thanks.

Side note: would it perhaps make sense to have that
cpus_read_lock/unlock() sequence around the whole reconfiguration
section?

Because while looking at that sequence, it looks a bit odd to me that
cpu's can come and go in the middle of the nmi watchdog
reconfiguration sequence.

In particular, what happens if a new CPU is brought up just as the NMI
matchdog is being reconfigured? The NMI's have been stopped for the
old CPU's, what happens for the new one that came up in between that
watchdog_nmi_stop/start?

This may be all obviously safe, I'm just asking for clarification.

               Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC GIT Pull] core watchdog sanitizing
  2017-10-02 19:04     ` Linus Torvalds
@ 2017-10-02 19:32       ` Thomas Gleixner
  2017-10-02 20:32         ` Don Zickus
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2017-10-02 19:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Don Zickus,
	Benjamin Herrenschmidt, Michael Ellerman, Nicholas Piggin,
	ppc-dev

On Mon, 2 Oct 2017, Linus Torvalds wrote:
> Side note: would it perhaps make sense to have that
> cpus_read_lock/unlock() sequence around the whole reconfiguration
> section?
> 
> Because while looking at that sequence, it looks a bit odd to me that
> cpu's can come and go in the middle of the nmi watchdog
> reconfiguration sequence.
> 
> In particular, what happens if a new CPU is brought up just as the NMI
> matchdog is being reconfigured? The NMI's have been stopped for the
> old CPU's, what happens for the new one that came up in between that
> watchdog_nmi_stop/start?
> 
> This may be all obviously safe, I'm just asking for clarification.

It's safe because the newly upcoming CPU will see an empty enabled mask in
the powerpc implementation. The perf based implementation has a similar
protection.

Though yes, it would be more obvious to expand the cpus locked
section. That requires a bit of shuffling. Untested patch below.

Thanks,

	tglx

8<------------------

--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -359,21 +359,17 @@ void watchdog_nmi_stop(void)
 {
 	int cpu;
 
-	cpus_read_lock();
 	for_each_cpu(cpu, &wd_cpus_enabled)
 		stop_wd_on_cpu(cpu);
-	cpus_read_unlock();
 }
 
 void watchdog_nmi_start(void)
 {
 	int cpu;
 
-	cpus_read_lock();
 	watchdog_calc_timeouts();
 	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
 		start_wd_on_cpu(cpu);
-	cpus_read_unlock();
 }
 
 /*
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -351,7 +351,7 @@ void smpboot_update_cpumask_percpu_threa
 	static struct cpumask tmp;
 	unsigned int cpu;
 
-	get_online_cpus();
+	lockdep_assert_cpus_held();
 	mutex_lock(&smpboot_threads_lock);
 
 	/* Park threads that were exclusively enabled on the old mask. */
@@ -367,7 +367,6 @@ void smpboot_update_cpumask_percpu_threa
 	cpumask_copy(old, new);
 
 	mutex_unlock(&smpboot_threads_lock);
-	put_online_cpus();
 }
 
 static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD);
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -535,7 +535,6 @@ static void softlockup_update_smpboot_th
 
 	smpboot_update_cpumask_percpu_thread(&watchdog_threads,
 					     &watchdog_allowed_mask);
-	__lockup_detector_cleanup();
 }
 
 /* Temporarily park all watchdog threads */
@@ -554,6 +553,7 @@ static void softlockup_unpark_threads(vo
 
 static void softlockup_reconfigure_threads(void)
 {
+	cpus_read_lock();
 	watchdog_nmi_stop();
 	softlockup_park_all_threads();
 	set_sample_period();
@@ -561,6 +561,12 @@ static void softlockup_reconfigure_threa
 	if (watchdog_enabled && watchdog_thresh)
 		softlockup_unpark_threads();
 	watchdog_nmi_start();
+	cpus_read_unlock();
+	/*
+	 * Must be called outside the cpus locked section to prevent
+	 * recursive locking in the perf code.
+	 */
+	__lockup_detector_cleanup();
 }
 
 /*

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC GIT Pull] core watchdog sanitizing
  2017-10-02 19:32       ` Thomas Gleixner
@ 2017-10-02 20:32         ` Don Zickus
  2017-10-02 20:45           ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Don Zickus @ 2017-10-02 20:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, LKML, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra, Benjamin Herrenschmidt, Michael Ellerman,
	Nicholas Piggin, ppc-dev

On Mon, Oct 02, 2017 at 07:32:57PM +0000, Thomas Gleixner wrote:
> On Mon, 2 Oct 2017, Linus Torvalds wrote:
> > Side note: would it perhaps make sense to have that
> > cpus_read_lock/unlock() sequence around the whole reconfiguration
> > section?
> > 
> > Because while looking at that sequence, it looks a bit odd to me that
> > cpu's can come and go in the middle of the nmi watchdog
> > reconfiguration sequence.
> > 
> > In particular, what happens if a new CPU is brought up just as the NMI
> > matchdog is being reconfigured? The NMI's have been stopped for the
> > old CPU's, what happens for the new one that came up in between that
> > watchdog_nmi_stop/start?
> > 
> > This may be all obviously safe, I'm just asking for clarification.
> 
> It's safe because the newly upcoming CPU will see an empty enabled mask in
> the powerpc implementation. The perf based implementation has a similar
> protection.
> 
> Though yes, it would be more obvious to expand the cpus locked
> section. That requires a bit of shuffling. Untested patch below.
> 
> Thanks,
> 
> 	tglx
> 
> 8<------------------
> 
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -359,21 +359,17 @@ void watchdog_nmi_stop(void)
>  {
>  	int cpu;
>  
> -	cpus_read_lock();
>  	for_each_cpu(cpu, &wd_cpus_enabled)
>  		stop_wd_on_cpu(cpu);
> -	cpus_read_unlock();
>  }
>  
>  void watchdog_nmi_start(void)
>  {
>  	int cpu;
>  
> -	cpus_read_lock();
>  	watchdog_calc_timeouts();
>  	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
>  		start_wd_on_cpu(cpu);
> -	cpus_read_unlock();
>  }
>  
>  /*
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -351,7 +351,7 @@ void smpboot_update_cpumask_percpu_threa
>  	static struct cpumask tmp;
>  	unsigned int cpu;
>  
> -	get_online_cpus();
> +	lockdep_assert_cpus_held();
>  	mutex_lock(&smpboot_threads_lock);
>  
>  	/* Park threads that were exclusively enabled on the old mask. */
> @@ -367,7 +367,6 @@ void smpboot_update_cpumask_percpu_threa
>  	cpumask_copy(old, new);
>  
>  	mutex_unlock(&smpboot_threads_lock);
> -	put_online_cpus();
>  }
>  
>  static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD);
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -535,7 +535,6 @@ static void softlockup_update_smpboot_th
>  
>  	smpboot_update_cpumask_percpu_thread(&watchdog_threads,
>  					     &watchdog_allowed_mask);
> -	__lockup_detector_cleanup();
>  }
>  
>  /* Temporarily park all watchdog threads */
> @@ -554,6 +553,7 @@ static void softlockup_unpark_threads(vo
>  
>  static void softlockup_reconfigure_threads(void)

There is a second copy of ^^^^, you will need to add identical locking there
too.

I can test both of these patches tomorrow.

Cheers,
Don

>  {
> +	cpus_read_lock();
>  	watchdog_nmi_stop();
>  	softlockup_park_all_threads();
>  	set_sample_period();
> @@ -561,6 +561,12 @@ static void softlockup_reconfigure_threa
>  	if (watchdog_enabled && watchdog_thresh)
>  		softlockup_unpark_threads();
>  	watchdog_nmi_start();
> +	cpus_read_unlock();
> +	/*
> +	 * Must be called outside the cpus locked section to prevent
> +	 * recursive locking in the perf code.
> +	 */
> +	__lockup_detector_cleanup();
>  }
>  
>  /*
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC GIT Pull] core watchdog sanitizing
  2017-10-02 20:32         ` Don Zickus
@ 2017-10-02 20:45           ` Thomas Gleixner
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Gleixner @ 2017-10-02 20:45 UTC (permalink / raw)
  To: Don Zickus
  Cc: Linus Torvalds, LKML, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra, Benjamin Herrenschmidt, Michael Ellerman,
	Nicholas Piggin, ppc-dev

On Mon, 2 Oct 2017, Don Zickus wrote:
> On Mon, Oct 02, 2017 at 07:32:57PM +0000, Thomas Gleixner wrote:
> >  static void softlockup_reconfigure_threads(void)
> 
> There is a second copy of ^^^^, you will need to add identical locking there
> too.

Good catch, but then we might move it further out
 
> > +	cpus_read_unlock();
> > +	/*
> > +	 * Must be called outside the cpus locked section to prevent
> > +	 * recursive locking in the perf code.
> > +	 */
> > +	__lockup_detector_cleanup();

Or not due to this ....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-10-02 20:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <alpine.DEB.2.20.1710011217510.3874@nanos>
     [not found] ` <CA+55aFxQsmSb=zohGZOZK7eR4n5xOHNArmCa9j1b7+wJ+khQrg@mail.gmail.com>
2017-10-02 18:46   ` [RFC GIT Pull] core watchdog sanitizing Thomas Gleixner
2017-10-02 19:04     ` Linus Torvalds
2017-10-02 19:32       ` Thomas Gleixner
2017-10-02 20:32         ` Don Zickus
2017-10-02 20:45           ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).