From: Don Zickus <dzickus@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Borislav Petkov <bp@alien8.de>,
Sebastian Siewior <bigeasy@linutronix.de>,
Nicholas Piggin <npiggin@gmail.com>,
Chris Metcalf <cmetcalf@mellanox.com>,
Ulrich Obergfell <uobergfe@redhat.com>
Subject: Re: [patch 17/29] lockup_detector: Get rid of the thread teardown/setup dance
Date: Fri, 1 Sep 2017 15:08:22 -0400
Message-ID: <20170901190822.4vmgl72qefp72s2j@redhat.com>
In-Reply-To: <20170831073054.346816852@linutronix.de>

On Thu, Aug 31, 2017 at 09:16:15AM +0200, Thomas Gleixner wrote:
> The lockup detector reconfiguration tears down all watchdog threads when
> the watchdog is disabled and sets them up again when it's enabled.
>
> That's a pointless exercise. The watchdog threads are not consuming an
> insane amount of resources, so it's enough to set them up at init time and
> keep them in parked position when the watchdog is disabled and unpark them
> when it is reenabled. The smpboot thread infrastructure takes care of
> keeping the force parked threads in place even across cpu hotplug.

The original reason for this complexity was that we were worried about
catching kthread_[un]park errors. The thinking was that, in the rare case
where one of those calls failed, we would hang. Perhaps we misunderstood
the kthread_[un]park code.

Cheers,
Don

>
> Aside from that the code implements the park/unpark facility of smp hotplug
> threads on its own, which is even more pointless. We have functionality in
> the smpboot thread code to do so.
>
> Use the new thread management functions and get rid of the unholy mess.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> kernel/watchdog.c | 190 +++++-------------------------------------------------
> 1 file changed, 19 insertions(+), 171 deletions(-)
>
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -92,13 +92,6 @@ struct cpumask watchdog_cpumask __read_m
> unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>
> /*
> - * The 'watchdog_running' variable is set to 1 when the watchdog threads
> - * are registered/started and is set to 0 when the watchdog threads are
> - * unregistered/stopped, so it is an indicator whether the threads exist.
> - */
> -static int __read_mostly watchdog_running;
> -
> -/*
> * These functions can be overridden if an architecture implements its
> * own hardlockup detector.
> *
> @@ -123,10 +116,6 @@ void __weak watchdog_nmi_reconfigure(voi
>
> #ifdef CONFIG_SOFTLOCKUP_DETECTOR
>
> -/* Helper for online, unparked cpus. */
> -#define for_each_watchdog_cpu(cpu) \
> - for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
> -
> /* Global variables, exported for sysctl */
> unsigned int __read_mostly softlockup_panic =
> CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;
> @@ -252,11 +241,15 @@ void touch_all_softlockup_watchdogs(void
> int cpu;
>
> /*
> - * this is done lockless
> - * do we care if a 0 races with a timestamp?
> - * all it means is the softlock check starts one cycle later
> + * watchdog_mutex cannot be taken here, as this might be called
> + * from (soft)interrupt context, so the access to
> + * watchdog_allowed_mask might race with a concurrent update.
> + *
> + * The watchdog time stamp can race against a concurrent real
> + * update as well, the only side effect might be a cycle delay for
> + * the softlockup check.
> */
> - for_each_watchdog_cpu(cpu)
> + for_each_cpu(cpu, &watchdog_allowed_mask)
> per_cpu(watchdog_touch_ts, cpu) = 0;
> wq_watchdog_touch(-1);
> }
> @@ -296,9 +289,6 @@ static void watchdog_interrupt_count(voi
> __this_cpu_inc(hrtimer_interrupts);
> }
>
> -static int watchdog_enable_all_cpus(void);
> -static void watchdog_disable_all_cpus(void);
> -
> /* watchdog kicker functions */
> static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> {
> @@ -492,95 +482,6 @@ static struct smp_hotplug_thread watchdo
> .unpark = watchdog_enable,
> };
>
> -/*
> - * park all watchdog threads that are specified in 'watchdog_cpumask'
> - *
> - * This function returns an error if kthread_park() of a watchdog thread
> - * fails. In this situation, the watchdog threads of some CPUs can already
> - * be parked and the watchdog threads of other CPUs can still be runnable.
> - * Callers are expected to handle this special condition as appropriate in
> - * their context.
> - *
> - * This function may only be called in a context that is protected against
> - * races with CPU hotplug - for example, via get_online_cpus().
> - */
> -static int watchdog_park_threads(void)
> -{
> - int cpu, ret = 0;
> -
> - for_each_watchdog_cpu(cpu) {
> - ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
> - if (ret)
> - break;
> - }
> - return ret;
> -}
> -
> -/*
> - * unpark all watchdog threads that are specified in 'watchdog_cpumask'
> - *
> - * This function may only be called in a context that is protected against
> - * races with CPU hotplug - for example, via get_online_cpus().
> - */
> -static void watchdog_unpark_threads(void)
> -{
> - int cpu;
> -
> - for_each_watchdog_cpu(cpu)
> - kthread_unpark(per_cpu(softlockup_watchdog, cpu));
> -}
> -
> -static int update_watchdog_all_cpus(void)
> -{
> - int ret;
> -
> - ret = watchdog_park_threads();
> - if (ret)
> - return ret;
> -
> - watchdog_unpark_threads();
> -
> - return 0;
> -}
> -
> -static int watchdog_enable_all_cpus(void)
> -{
> - int err = 0;
> -
> - if (!watchdog_running) {
> - err = smpboot_register_percpu_thread_cpumask(&watchdog_threads,
> - &watchdog_cpumask);
> - if (err)
> - pr_err("Failed to create watchdog threads, disabled\n");
> - else
> - watchdog_running = 1;
> - } else {
> - /*
> - * Enable/disable the lockup detectors or
> - * change the sample period 'on the fly'.
> - */
> - err = update_watchdog_all_cpus();
> -
> - if (err) {
> - watchdog_disable_all_cpus();
> - pr_err("Failed to update lockup detectors, disabled\n");
> - }
> - }
> -
> - if (err)
> - watchdog_enabled = 0;
> -
> - return err;
> -}
> -
> -static void watchdog_disable_all_cpus(void)
> -{
> - if (watchdog_running) {
> - watchdog_running = 0;
> - smpboot_unregister_percpu_thread(&watchdog_threads);
> - }
> -}
> -
> static void softlockup_update_smpboot_threads(void)
> {
> lockdep_assert_held(&watchdog_mutex);
> @@ -655,7 +556,6 @@ static inline int watchdog_park_threads(
> static inline void watchdog_unpark_threads(void) { }
> static inline int watchdog_enable_all_cpus(void) { return 0; }
> static inline void watchdog_disable_all_cpus(void) { }
> -static inline void set_sample_period(void) { }
> static inline void softlockup_init_threads(void) { }
> static inline void softlockup_update_threads(void) { }
> static inline void softlockup_reconfigure_threads(bool enabled) { }
> @@ -695,28 +595,10 @@ void lockup_detector_soft_poweroff(void)
> /*
> * Update the run state of the lockup detectors.
> */
> -static int proc_watchdog_update(void)
> +static void proc_watchdog_update(void)
> {
> - int err = 0;
> -
> - /*
> - * Watchdog threads won't be started if they are already active.
> - * The 'watchdog_running' variable in watchdog_*_all_cpus() takes
> - * care of this. If those threads are already active, the sample
> - * period will be updated and the lockup detectors will be enabled
> - * or disabled 'on the fly'.
> - */
> - if (watchdog_enabled && watchdog_thresh)
> - err = watchdog_enable_all_cpus();
> - else
> - watchdog_disable_all_cpus();
> -
> + softlockup_reconfigure_threads(watchdog_enabled && watchdog_thresh);
> watchdog_nmi_reconfigure();
> -
> - __lockup_detector_cleanup();
> -
> - return err;
> -
> }
>
> /*
> @@ -772,17 +654,8 @@ static int proc_watchdog_common(int whic
> new = old & ~which;
> } while (cmpxchg(&watchdog_enabled, old, new) != old);
>
> - /*
> - * Update the run state of the lockup detectors. There is _no_
> - * need to check the value returned by proc_watchdog_update()
> - * and to restore the previous value of 'watchdog_enabled' as
> - * both lockup detectors are disabled if proc_watchdog_update()
> - * returns an error.
> - */
> - if (old == new)
> - goto out;
> -
> - err = proc_watchdog_update();
> + if (old != new)
> + proc_watchdog_update();
> }
> out:
> mutex_unlock(&watchdog_mutex);
> @@ -826,50 +699,28 @@ int proc_soft_watchdog(struct ctl_table
> int proc_watchdog_thresh(struct ctl_table *table, int write,
> void __user *buffer, size_t *lenp, loff_t *ppos)
> {
> - int err, old, new;
> + int err, old;
>
> cpu_hotplug_disable();
> mutex_lock(&watchdog_mutex);
>
> - old = ACCESS_ONCE(watchdog_thresh);
> + old = READ_ONCE(watchdog_thresh);
> err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>
> - if (err || !write)
> - goto out;
> + if (!err && write && old != READ_ONCE(watchdog_thresh))
> + proc_watchdog_update();
>
> - /*
> - * Update the sample period. Restore on failure.
> - */
> - new = ACCESS_ONCE(watchdog_thresh);
> - if (old == new)
> - goto out;
> -
> - set_sample_period();
> - err = proc_watchdog_update();
> - if (err) {
> - watchdog_thresh = old;
> - set_sample_period();
> - }
> -out:
> mutex_unlock(&watchdog_mutex);
> cpu_hotplug_enable();
> return err;
> }
>
> -static void watchdog_update_cpus(void)
> -{
> - if (IS_ENABLED(CONFIG_SOFTLOCKUP_DETECTOR) && watchdog_running) {
> - smpboot_update_cpumask_percpu_thread(&watchdog_threads,
> - &watchdog_cpumask);
> - __lockup_detector_cleanup();
> - }
> -}
> -
> static void proc_watchdog_cpumask_update(void)
> {
> /* Remove impossible cpus to keep sysctl output clean. */
> cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
> - watchdog_update_cpus();
> +
> + softlockup_update_threads();
> watchdog_nmi_reconfigure();
> }
>
> @@ -899,8 +750,6 @@ int proc_watchdog_cpumask(struct ctl_tab
>
> void __init lockup_detector_init(void)
> {
> - set_sample_period();
> -
> #ifdef CONFIG_NO_HZ_FULL
> if (tick_nohz_full_enabled()) {
> pr_info("Disabling watchdog on nohz_full cores by default\n");
> @@ -911,6 +760,5 @@ void __init lockup_detector_init(void)
> cpumask_copy(&watchdog_cpumask, cpu_possible_mask);
> #endif
>
> - if (watchdog_enabled)
> - watchdog_enable_all_cpus();
> + softlockup_init_threads();
> }
>
>
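
As a rough illustration of the approach the commit message describes (create
the threads once, keep them parked while the watchdog is disabled, unpark
them when it is re-enabled), here is a minimal userspace sketch. It is an
analogy only: pthreads stand in for kthreads, and struct parkable and the
parkable_*() helpers are hypothetical names invented for this example. It is
not the kernel's kthread_park()/kthread_unpark() or smpboot implementation.

```c
/* Userspace analogy only: pthreads stand in for kernel threads. */
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

struct parkable {
	pthread_t       thread;
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	bool            should_park;   /* request from the controller */
	bool            parked;        /* acknowledgement from the worker */
	bool            should_stop;
	unsigned long   ticks;         /* work done while unparked */
};

static void *parkable_fn(void *arg)
{
	struct parkable *p = arg;

	pthread_mutex_lock(&p->lock);
	while (!p->should_stop) {
		if (p->should_park) {
			/* Acknowledge, then sleep until unparked or stopped. */
			p->parked = true;
			pthread_cond_broadcast(&p->cond);
			while (p->should_park && !p->should_stop)
				pthread_cond_wait(&p->cond, &p->lock);
			p->parked = false;
			continue;
		}
		p->ticks++;	/* stands in for the periodic watchdog work */
		/* Drop the lock briefly so the controller can get in. */
		pthread_mutex_unlock(&p->lock);
		sched_yield();
		pthread_mutex_lock(&p->lock);
	}
	pthread_mutex_unlock(&p->lock);
	return NULL;
}

/* Create the thread once, initially parked. Returns 0 on success. */
static int parkable_start(struct parkable *p)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->cond, NULL);
	p->should_park = true;
	p->parked = false;
	p->should_stop = false;
	p->ticks = 0;
	return pthread_create(&p->thread, NULL, parkable_fn, p);
}

/* Request parking and wait until the worker has acknowledged it. */
static void parkable_park(struct parkable *p)
{
	pthread_mutex_lock(&p->lock);
	p->should_park = true;
	while (!p->parked)
		pthread_cond_wait(&p->cond, &p->lock);
	pthread_mutex_unlock(&p->lock);
}

/* Let the worker run again; the existing thread is reused. */
static void parkable_unpark(struct parkable *p)
{
	pthread_mutex_lock(&p->lock);
	p->should_park = false;
	pthread_cond_broadcast(&p->cond);
	pthread_mutex_unlock(&p->lock);
}

static unsigned long parkable_ticks(struct parkable *p)
{
	unsigned long t;

	pthread_mutex_lock(&p->lock);
	t = p->ticks;
	pthread_mutex_unlock(&p->lock);
	return t;
}

/* Final teardown, needed only once at shutdown. */
static void parkable_stop(struct parkable *p)
{
	pthread_mutex_lock(&p->lock);
	p->should_stop = true;
	pthread_cond_broadcast(&p->cond);
	pthread_mutex_unlock(&p->lock);
	pthread_join(p->thread, NULL);
	pthread_cond_destroy(&p->cond);
	pthread_mutex_destroy(&p->lock);
}
```

In this sketch, disabling and re-enabling is just parkable_park() followed
by parkable_unpark() on the same thread. Thread creation, the only step here
that reports an error, happens once in parkable_start().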