From: Don Zickus <dzickus@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Borislav Petkov <bp@alien8.de>,
Andrew Morton <akpm@linux-foundation.org>,
Sebastian Siewior <bigeasy@linutronix.de>,
Nicholas Piggin <npiggin@gmail.com>,
Chris Metcalf <cmetcalf@mellanox.com>,
Ulrich Obergfell <uobergfe@redhat.com>
Subject: Re: [patch V2 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape
Date: Wed, 13 Sep 2017 14:06:42 -0400
Message-ID: <20170913180642.ywrszlfziobv7yiu@redhat.com>
In-Reply-To: <20170912193654.321505854@linutronix.de>
On Tue, Sep 12, 2017 at 09:36:54PM +0200, Thomas Gleixner wrote:
> The lockup detector is broken in several ways:
>
> - It's deadlock prone vs. CPU hotplug in various ways. Some of these
> are due to recursive cpus_read_lock() others are due to
> cpus_read_lock() from CPU hotplug callbacks which immediately lock
> the machine because cpus are write locked.
>
> - The handling of the cpu hotplug threads happens sideways to the
> smpboot thread infrastructure, which is racy and pointless
>
> - The handling of the user space sysctl interface is a complete
> trainwreck as it fiddles directly with variables which can be
> modified or evaluated by the running watchdogs.
>
> - The perf event initialization is a trainwreck as it tries to create
> perf events over and over even if perf is not functional (no
> hardware, ....). To avoid excessive dmesg spam it contains magic
> printk ratelimiting along with either wrong or useless messages.
>
> - The code structure is horrible as ifdef sections are scattered all
> over the place which makes it unreadable
>
> - There is more wreckage, but see the changelogs for the ugly details.
>
Aside from the simple compile issue in patch 25, I have no issues with this
patchset. Thanks, Thomas!
Reviewed-by: Don Zickus <dzickus@redhat.com>
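
For readers following along, the perf initialization point in the list above
(events being created over and over even when perf is not functional) comes
down to probing once at init and caching the result. Below is a minimal
userspace sketch of that idea, not the kernel code. fake_perf_init(),
perf_hw_present, probe_calls, detector_init(), detector_enable_cpu() and
SKETCH_NR_CPUS are hypothetical names made up for illustration. Only
nmi_watchdog_available and the "0 means available, error code otherwise"
convention come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count for the sketch */

/* Hypothetical stand-ins for the hardware state and the perf probe. */
static bool perf_hw_present;	/* pretend "is the PMU usable?" state */
static int probe_calls;		/* counts how often the probe ran */

/* Assumed to mimic a probe that returns 0 on success, -ENODEV otherwise. */
static int fake_perf_init(void)
{
	probe_calls++;
	return perf_hw_present ? 0 : -ENODEV;
}

/* Cached result of the one-time probe, named after the flag in the patch. */
static bool nmi_watchdog_available;

/* Run once at init: probe and remember the outcome. */
static void detector_init(void)
{
	if (!fake_perf_init())
		nmi_watchdog_available = true;
}

/* Per-CPU enable path: consult the cached flag, never probe again. */
static int detector_enable_cpu(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	if (!nmi_watchdog_available)
		return -ENODEV;
	return 0;
}
```

With the probe result cached, bringing up further CPUs cannot trigger repeated
probe attempts or the message spam that goes with them.
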
> The following series sanitizes the facility and addresses the problems.
>
> Changes since V1:
>
> - Wrapped the perf specific calls into the weak watchdog_nmi_*
> functions
>
> - Fixed the compile error pointed out by Don
>
> - Fixed the reconfiguration parameter inconsistency which broke
> powerpc
>
> - Picked up the updated version of patch 11/29
>
> Delta patch below.
>
> Thanks,
>
> tglx
> ---
> Diffstat for the series:
>
> arch/parisc/kernel/process.c | 2
> arch/powerpc/kernel/watchdog.c | 22 -
> arch/x86/events/intel/core.c | 11
> include/linux/nmi.h | 121 +++----
> include/linux/smpboot.h | 4
> kernel/cpu.c | 6
> kernel/smpboot.c | 22 -
> kernel/sysctl.c | 22 -
> kernel/watchdog.c | 633 ++++++++++++++---------------------------
> kernel/watchdog_hld.c | 193 ++++++------
> 10 files changed, 434 insertions(+), 602 deletions(-)
>
> Delta patch vs. V1
> 8<------------------------
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -355,12 +355,12 @@ static void watchdog_calc_timeouts(void)
> wd_timer_period_ms = watchdog_thresh * 1000 * 2 / 5;
> }
>
> -void watchdog_nmi_reconfigure(bool stop)
> +void watchdog_nmi_reconfigure(bool run)
> {
> int cpu;
>
> cpus_read_lock();
> - if (stop) {
> + if (!run) {
> for_each_cpu(cpu, &wd_cpus_enabled)
> stop_wd_on_cpu(cpu);
> } else {
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -102,7 +102,7 @@ static inline void hardlockup_detector_p
> static inline void hardlockup_detector_perf_enable(void) { }
> static inline void hardlockup_detector_perf_cleanup(void) { }
> # if !defined(CONFIG_HAVE_NMI_WATCHDOG)
> -static int hardlockup_detector_perf_init(void) { return -ENODEV; }
> +static inline int hardlockup_detector_perf_init(void) { return -ENODEV; }
> static inline void arch_touch_nmi_watchdog(void) {}
> # else
> static int hardlockup_detector_perf_init(void) { return 0; }
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -105,18 +105,32 @@ static int __init hardlockup_all_cpu_bac
> * softlockup watchdog threads start and stop. The arch must select the
> * SOFTLOCKUP_DETECTOR Kconfig.
> */
> -int __weak watchdog_nmi_enable(unsigned int cpu) { return 0; }
> -void __weak watchdog_nmi_disable(unsigned int cpu) { }
> +int __weak watchdog_nmi_enable(unsigned int cpu)
> +{
> + hardlockup_detector_perf_enable();
> + return 0;
> +}
> +
> +void __weak watchdog_nmi_disable(unsigned int cpu)
> +{
> + hardlockup_detector_perf_disable();
> +}
> +
> +/* Return 0, if a NMI watchdog is available. Error code otherwise */
> +int __weak __init watchdog_nmi_probe(void)
> +{
> + return hardlockup_detector_perf_init();
> +}
>
> /**
> * watchdog_nmi_reconfigure - Optional function to reconfigure NMI watchdogs
> - * @stop: If true stop the watchdogs on all enabled CPUs
> - * If false start the watchdogs on all enabled CPUs
> + * @run: If false stop the watchdogs on all enabled CPUs
> + * If true start the watchdogs on all enabled CPUs
> *
> * The core call order is:
> - * watchdog_nmi_reconfigure(true);
> - * update_variables();
> * watchdog_nmi_reconfigure(false);
> + * update_variables();
> + * watchdog_nmi_reconfigure(true);
> *
> * The second call which starts the watchdogs again guarantees that the
> * following variables are stable across the call.
> @@ -126,13 +140,13 @@ void __weak watchdog_nmi_disable(unsigne
> *
> * After the call the variables can be changed again.
> */
> -void __weak watchdog_nmi_reconfigure(bool stop) { }
> +void __weak watchdog_nmi_reconfigure(bool run) { }
>
> /**
> * lockup_detector_update_enable - Update the sysctl enable bit
> *
> * Caller needs to make sure that the NMI/perf watchdogs are off, so this
> - * can't race with hardlockup_detector_disable().
> + * can't race with watchdog_nmi_disable().
> */
> static void lockup_detector_update_enable(void)
> {
> @@ -453,8 +467,7 @@ static void watchdog_enable(unsigned int
> __touch_watchdog();
> /* Enable the perf event */
> if (watchdog_enabled & NMI_WATCHDOG_ENABLED)
> - hardlockup_detector_perf_enable();
> - watchdog_nmi_enable(cpu);
> + watchdog_nmi_enable(cpu);
>
> watchdog_set_prio(SCHED_FIFO, MAX_RT_PRIO - 1);
> }
> @@ -469,7 +482,6 @@ static void watchdog_disable(unsigned in
> * between disabling the timer and disabling the perf event causes
> * the perf NMI to detect a false positive.
> */
> - hardlockup_detector_perf_disable();
> watchdog_nmi_disable(cpu);
> hrtimer_cancel(hrtimer);
> }
> @@ -745,12 +757,6 @@ int proc_watchdog_cpumask(struct ctl_tab
> }
> #endif /* CONFIG_SYSCTL */
>
> -static __init void detect_nmi_watchdog(void)
> -{
> - if (!hardlockup_detector_perf_init())
> - nmi_watchdog_available = true;
> -}
> -
> void __init lockup_detector_init(void)
> {
> #ifdef CONFIG_NO_HZ_FULL
> @@ -763,6 +769,7 @@ void __init lockup_detector_init(void)
> cpumask_copy(&watchdog_cpumask, cpu_possible_mask);
> #endif
>
> - detect_nmi_watchdog();
> + if (!watchdog_nmi_probe())
> + nmi_watchdog_available = true;
> softlockup_init_threads();
> }
>
>
>
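
The call order documented for watchdog_nmi_reconfigure() in the delta above
(stop, update the variables, start again) can be sketched in plain userspace
C. This is only an illustration under stated assumptions, not the kernel
implementation. sketch_nmi_reconfigure(), sketch_update_thresh(), wd_running,
wd_thresh_seen and SKETCH_NR_CPUS are hypothetical names. The weak default
uses the GCC/Clang __attribute__((weak)) extension as a stand-in for the
kernel's __weak, and watchdog_thresh merely borrows the name of the variable
seen in the powerpc hunk.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count for the sketch */

static bool wd_running[SKETCH_NR_CPUS];		/* per-CPU "watchdog active" */
static int wd_thresh_seen[SKETCH_NR_CPUS];	/* threshold each watchdog sampled */
static int watchdog_thresh = 10;		/* the tunable being updated */

/*
 * Weak default: an architecture could provide a strong definition with the
 * same name to override it (GCC/Clang extension, assumed here).
 * run == false stops all watchdogs, run == true starts them again.
 */
__attribute__((weak)) void sketch_nmi_reconfigure(bool run)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		wd_running[cpu] = run;
		if (run)	/* variables are stable here, safe to sample */
			wd_thresh_seen[cpu] = watchdog_thresh;
	}
}

/* Two-stage update: nothing runs while the variable changes. */
static void sketch_update_thresh(int new_thresh)
{
	sketch_nmi_reconfigure(false);
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		assert(!wd_running[cpu]);
	watchdog_thresh = new_thresh;
	sketch_nmi_reconfigure(true);
}
```

Because the watchdogs only sample watchdog_thresh inside the second call, they
never observe a half-updated configuration, which is the guarantee the
kerneldoc comment describes.
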