LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Michael Ellerman <mpe@ellerman.id.au>
To: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@alien8.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Sebastian Siewior <bigeasy@linutronix.de>,
	Nicholas Piggin <npiggin@gmail.com>,
	Don Zickus <dzickus@redhat.com>,
	Chris Metcalf <cmetcalf@mellanox.com>,
	Ulrich Obergfell <uobergfe@redhat.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage
Date: Tue, 03 Oct 2017 11:29:59 +1100
Message-ID: <87d165dqew.fsf@concordia.ellerman.id.au> (raw)
In-Reply-To: <20170912194147.862865570@linutronix.de>

Hi Thomas,

Thomas Gleixner <tglx@linutronix.de> writes:
> Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure()
> need to be done in two steps.
>
>      1) Stop all NMIs
>      2) Read the new parameters and start NMIs
>
> Right now watchdog_nmi_reconfigure() is a combination of both. To allow a
> clean reconfiguration add a 'run' argument and split the functionality in
> powerpc.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: Chris Metcalf <cmetcalf@mellanox.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Sebastian Siewior <bigeasy@linutronix.de>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Ulrich Obergfell <uobergfe@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> Link: http://lkml.kernel.org/r/20170831073054.740462115@linutronix.de
>
> ---
>  arch/powerpc/kernel/watchdog.c |   17 +++++++++--------
>  include/linux/nmi.h            |    2 ++
>  kernel/watchdog.c              |   31 ++++++++++++++++++++++---------
>  3 files changed, 33 insertions(+), 17 deletions(-)

Unfortunately this is hitting the WARN_ON in start_wd_cpu() on powerpc
because we're calling it multiple times for the boot CPU.

The first call is via:

  start_wd_on_cpu+0x80/0x2f0
  watchdog_nmi_reconfigure+0x124/0x170
  softlockup_reconfigure_threads+0x110/0x130
  lockup_detector_init+0xbc/0xe0
  kernel_init_freeable+0x18c/0x37c
  kernel_init+0x2c/0x160
  ret_from_kernel_thread+0x5c/0xbc

And then again via the CPU hotplug registration:

  start_wd_on_cpu+0x80/0x2f0
  cpuhp_invoke_callback+0x194/0x620
  cpuhp_thread_fun+0x7c/0x1b0
  smpboot_thread_fn+0x290/0x2a0
  kthread+0x168/0x1b0
  ret_from_kernel_thread+0x5c/0xbc


The first call is new because previously watchdog_nmi_reconfigure()
wasn't called from softlockup_reconfigure_threads().

I'm not sure what the easiest fix is. One option would be to just drop
the WARN_ON, it's just there for paranoia AFAICS.

cheers

>
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -355,17 +355,18 @@ static void watchdog_calc_timeouts(void)
>  	wd_timer_period_ms = watchdog_thresh * 1000 * 2 / 5;
>  }
>  
> -void watchdog_nmi_reconfigure(void)
> +void watchdog_nmi_reconfigure(bool run)
>  {
>  	int cpu;
>  
> -	watchdog_calc_timeouts();
> -
> -	for_each_cpu(cpu, &wd_cpus_enabled)
> -		stop_wd_on_cpu(cpu);
> -
> -	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
> -		start_wd_on_cpu(cpu);
> +	if (!run) {
> +		for_each_cpu(cpu, &wd_cpus_enabled)
> +			stop_wd_on_cpu(cpu);
> +	} else {
> +		watchdog_calc_timeouts();
> +		for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
> +			start_wd_on_cpu(cpu);
> +	}
>  }
>  
>  /*
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -103,6 +103,8 @@ static inline void arch_touch_nmi_watchd
>  #endif
>  #endif
>  
> +void watchdog_nmi_reconfigure(bool run);
> +
>  /**
>   * touch_nmi_watchdog - restart NMI watchdog timeout.
>   *
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -112,17 +112,25 @@ void __weak watchdog_nmi_disable(unsigne
>  	hardlockup_detector_perf_disable();
>  }
>  
> -/*
> - * watchdog_nmi_reconfigure can be implemented to be notified after any
> - * watchdog configuration change. The arch hardlockup watchdog should
> - * respond to the following variables:
> +/**
> + * watchdog_nmi_reconfigure - Optional function to reconfigure NMI watchdogs
> + * @run:	If false stop the watchdogs on all enabled CPUs
> + *		If true start the watchdogs on all enabled CPUs
> + *
> + * The core call order is:
> + * watchdog_nmi_reconfigure(false);
> + * update_variables();
> + * watchdog_nmi_reconfigure(true);
> + *
> + * The second call which starts the watchdogs again guarantees that the
> + * following variables are stable across the call.
>   * - watchdog_enabled
>   * - watchdog_thresh
>   * - watchdog_cpumask
> - * - sysctl_hardlockup_all_cpu_backtrace
> - * - hardlockup_panic
> + *
> + * After the call the variables can be changed again.
>   */
> -void __weak watchdog_nmi_reconfigure(void) { }
> +void __weak watchdog_nmi_reconfigure(bool run) { }
>  
>  #ifdef CONFIG_SOFTLOCKUP_DETECTOR
>  
> @@ -515,10 +523,12 @@ static void softlockup_unpark_threads(vo
>  
>  static void softlockup_reconfigure_threads(bool enabled)
>  {
> +	watchdog_nmi_reconfigure(false);
>  	softlockup_park_all_threads();
>  	set_sample_period();
>  	if (enabled)
>  		softlockup_unpark_threads();
> +	watchdog_nmi_reconfigure(true);
>  }
>  
>  /*
> @@ -559,7 +569,11 @@ static inline void watchdog_unpark_threa
>  static inline int watchdog_enable_all_cpus(void) { return 0; }
>  static inline void watchdog_disable_all_cpus(void) { }
>  static inline void softlockup_init_threads(void) { }
> -static inline void softlockup_reconfigure_threads(bool enabled) { }
> +static void softlockup_reconfigure_threads(bool enabled)
> +{
> +	watchdog_nmi_reconfigure(false);
> +	watchdog_nmi_reconfigure(true);
> +}
>  #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
>  
>  static void __lockup_detector_cleanup(void)
> @@ -599,7 +613,6 @@ static void proc_watchdog_update(void)
>  	/* Remove impossible cpus to keep sysctl output clean. */
>  	cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
>  	softlockup_reconfigure_threads(watchdog_enabled && watchdog_thresh);
> -	watchdog_nmi_reconfigure();
>  }
>  
>  /*

  parent reply index

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-12 19:36 [patch V2 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
2017-09-12 19:36 ` [patch V2 01/29] hardlockup_detector: Provide interface to stop/restart perf events Thomas Gleixner
2017-09-14 10:40   ` [tip:core/urgent] watchdog/hardlockup: " tip-bot for Peter Zijlstra
2017-09-12 19:36 ` [patch V2 02/29] perf/x86/intel: Sanitize PMU HT bug workaround Thomas Gleixner
2017-09-14 10:40   ` [tip:core/urgent] perf/x86/intel, watchdog/core: " tip-bot for Peter Zijlstra
2017-09-12 19:36 ` [patch V2 03/29] lockup_detector: Provide interface to stop from poweroff() Thomas Gleixner
2017-09-14 10:40   ` [tip:core/urgent] watchdog/core: " tip-bot for Thomas Gleixner
2017-09-12 19:36 ` [patch V2 04/29] parisc: Use lockup_detector_stop() Thomas Gleixner
2017-09-14  8:59   ` Helge Deller
2017-09-14 13:46     ` Don Zickus
2017-09-14 10:41   ` [tip:core/urgent] parisc, watchdog/core: " tip-bot for Thomas Gleixner
2017-09-12 19:36 ` [patch V2 05/29] lockup_detector: Remove broken suspend/resume interfaces Thomas Gleixner
2017-09-14 10:41   ` [tip:core/urgent] watchdog/core: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 06/29] lockup_detector: Rework cpu hotplug locking Thomas Gleixner
2017-09-14 10:41   ` [tip:core/urgent] watchdog/core: Rework CPU " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 07/29] lockup_detector: Rename watchdog_proc_mutex Thomas Gleixner
2017-09-14 10:42   ` [tip:core/urgent] watchdog/core: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 08/29] lockup_detector: Mark hardlockup_detector_disable() __init Thomas Gleixner
2017-09-14 10:42   ` [tip:core/urgent] watchdog/core: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 09/29] lockup_detector/perf: Remove broken self disable on failure Thomas Gleixner
2017-09-14 10:43   ` [tip:core/urgent] watchdog/hardlockup/perf: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 10/29] lockup_detector/perf: Prevent cpu hotplug deadlock Thomas Gleixner
2017-09-14 10:43   ` [tip:core/urgent] watchdog/hardlockup/perf: Prevent CPU " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 11/29] lockup_detector: Remove park_in_progress obfuscation Thomas Gleixner
2017-09-12 19:37 ` [patch V2 12/29] lockup_detector: Cleanup stub functions Thomas Gleixner
2017-09-14 10:44   ` [tip:core/urgent] watchdog/core: Clean up " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 13/29] lockup_detector: Cleanup the ifdef maze Thomas Gleixner
2017-09-14 10:44   ` [tip:core/urgent] watchdog/core: Clean up the #ifdef maze tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 14/29] lockup_detector: Split out cpumask write function Thomas Gleixner
2017-09-14 10:45   ` [tip:core/urgent] watchdog/core: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 15/29] smpboot/threads: Avoid runtime allocation Thomas Gleixner
2017-09-14 10:45   ` [tip:core/urgent] smpboot/threads, watchdog/core: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 16/29] lockup_detector: Create new thread handling infrastructure Thomas Gleixner
2017-09-14 10:45   ` [tip:core/urgent] watchdog/core: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 17/29] lockup_detector: Get rid of the thread teardown/setup dance Thomas Gleixner
2017-09-14 10:46   ` [tip:core/urgent] watchdog/core: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 18/29] lockup_detector: Further simplify sysctl handling Thomas Gleixner
2017-09-14 10:46   ` [tip:core/urgent] watchdog/core: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 19/29] lockup_detector: Cleanup header mess Thomas Gleixner
2017-09-14 10:47   ` [tip:core/urgent] watchdog/core: Clean up " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 20/29] lockup_detector/sysctl: Get rid of the ifdeffery Thomas Gleixner
2017-09-14 10:47   ` [tip:core/urgent] watchdog/sysctl: Get rid of the #ifdeffery tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 21/29] lockup_detector: Cleanup sysctl variable name space Thomas Gleixner
2017-09-14 10:47   ` [tip:core/urgent] watchdog/sysctl: Clean up " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage Thomas Gleixner
2017-09-14 10:48   ` [tip:core/urgent] watchdog/core, powerpc: " tip-bot for Thomas Gleixner
2017-10-03  0:29   ` Michael Ellerman [this message]
2017-10-03  6:50     ` [patch V2 22/29] lockup_detector: " Thomas Gleixner
2017-10-03  7:04       ` Thomas Gleixner
2017-10-03 10:01         ` Nicholas Piggin
2017-10-03 10:56           ` Thomas Gleixner
2017-10-03 11:36       ` Michael Ellerman
2017-10-03 12:13         ` Thomas Gleixner
2017-10-03 13:20           ` Thomas Gleixner
2017-10-03 19:27             ` Thomas Gleixner
2017-10-04  5:53               ` Michael Ellerman
2017-10-05 16:17               ` Don Zickus
2017-09-12 19:37 ` [patch V2 23/29] lockup_detector: Get rid of the racy update loop Thomas Gleixner
2017-09-14 10:48   ` [tip:core/urgent] watchdog/core: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 24/29] lockup_detector/perf: Implement init time perf validation Thomas Gleixner
2017-09-14 10:48   ` [tip:core/urgent] watchdog/hardlockup/perf: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 25/29] lockup_detector: Implement init time detection of perf Thomas Gleixner
2017-09-13 18:02   ` Don Zickus
2017-09-13 18:05     ` Thomas Gleixner
2017-09-14  5:27       ` Ingo Molnar
2017-09-14 10:49   ` [tip:core/urgent] watchdog/hardlockup/perf: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 26/29] lockup_detector/perf: Implement CPU enable replacement Thomas Gleixner
2017-09-14 10:49   ` [tip:core/urgent] watchdog/hardlockup/perf: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 27/29] lockup_detector: Use new perf CPU enable mechanism Thomas Gleixner
2017-09-14 10:50   ` [tip:core/urgent] watchdog/hardlockup/perf: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 28/29] lockup_detector/perf: Simplify deferred event destroy Thomas Gleixner
2017-09-14 10:50   ` [tip:core/urgent] watchdog/hardlockup/perf: " tip-bot for Thomas Gleixner
2017-09-12 19:37 ` [patch V2 29/29] lockup_detector: Cleanup hotplug locking mess Thomas Gleixner
2017-09-14 10:50   ` [tip:core/urgent] watchdog/hardlockup: Clean up " tip-bot for Thomas Gleixner
2017-09-13 18:06 ` [patch V2 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Don Zickus
2017-09-14  5:27   ` Ingo Molnar
2017-09-14  8:11   ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87d165dqew.fsf@concordia.ellerman.id.au \
    --to=mpe@ellerman.id.au \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=bigeasy@linutronix.de \
    --cc=bp@alien8.de \
    --cc=cmetcalf@mellanox.com \
    --cc=dzickus@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mingo@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=uobergfe@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git