linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org,
	sboyd@kernel.org, corbet@lwn.net, Mark.Rutland@arm.com,
	maz@kernel.org, kernel-team@fb.com, neeraju@codeaurora.org,
	ak@linux.intel.com
Subject: Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
Date: Mon, 12 Apr 2021 11:20:49 -0700	[thread overview]
Message-ID: <20210412182049.GE4510@paulmck-ThinkPad-P17-Gen-1> (raw)
In-Reply-To: <87k0p71whr.ffs@nanos.tec.linutronix.de>

On Mon, Apr 12, 2021 at 03:08:16PM +0200, Thomas Gleixner wrote:
> On Sun, Apr 11 2021 at 21:21, Paul E. McKenney wrote:
> > On Sun, Apr 11, 2021 at 09:46:12AM -0700, Paul E. McKenney wrote:
> >> So I need to is inline clocksource_verify_percpu_wq()
> >> into clocksource_verify_percpu() and then move the call to
> >> clocksource_verify_percpu() to __clocksource_watchdog_kthread(), right
> >> before the existing call to list_del_init().  Will do!
> >
> > Except that this triggers the WARN_ON_ONCE() in smp_call_function_single()
> > due to interrupts being disabled across that list_del_init().
> >
> > Possibilities include:
> >
> > 1.	Figure out why interrupts must be disabled only sometimes while
> > 	holding watchdog_lock, in the hope that they need not be across
> > 	the entire critical section for __clocksource_watchdog_kthread().
> > 	As in:
> >
> > 		local_irq_restore(flags);
> > 		clocksource_verify_percpu(cs);
> > 		local_irq_save(flags);
> >
> > 	Trying this first with lockdep enabled.  Might be spectacular.
> 
> Yes, it's a possible deadlock against the watchdog timer firing ...

And lockdep most emphatically agrees with you.  ;-)

> The reason for irqsave is again historical AFAICT and nobody bothered to
> clean it up. spin_lock_bh() should be sufficient to serialize against
> the watchdog timer, though I haven't looked at all possible scenarios.

Though if BH is disabled, there is not so much advantage to
invoking it from __clocksource_watchdog_kthread().  Might as
well just invoke it directly from clocksource_watchdog().

> > 2.	Invoke clocksource_verify_percpu() from its original
> > 	location in clocksource_watchdog(), just before the call to
> > 	__clocksource_unstable().  This relies on the fact that
> > 	clocksource_watchdog() acquires watchdog_lock without
> > 	disabling interrupts.
> 
> That should be fine, but this might cause the softirq to 'run' for a
> very long time which is not pretty either.
> 
> Aside of that, do we really need to check _all_ online CPUs? What you
> are trying to figure out is whether the wreckage is CPU local or global,
> right?
> 
> Wouldn't a shirt-sleeve approach of just querying _one_ CPU be good
> enough? Either the other CPU has the same wreckage, then it's global or
> it hasn't which points to a per CPU local issue.
> 
> Sure it does not catch the case where a subset (>1) of all CPUs is
> affected, but I'm not seing how that really buys us anything.

Good point!  My thought is to randomly pick eight CPUs to keep the
duration reasonable while having a good chance of hitting "interesting"
CPU choices in multicore and multisocket systems.

However, if a hard-to-reproduce problem occurred, it would be good to take
the hit and scan all the CPUs.  Additionally, there are some workloads
for which the switch from TSC to HPET is fatal anyway due to increased
overhead.  For these workloads, the full CPU scan is no additional pain.

So I am thinking in terms of a default that probes eight randomly selected
CPUs without worrying about duplicates (as in there would be some chance
that fewer CPUs would actually be probed), but with a boot-time flag
that does all CPUs.  I would add the (default) random selection as a
separate patch.

I will send a new series out later today, Pacific Time.

							Thanx, Paul

  reply	other threads:[~2021-04-12 18:20 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-06  0:40 [PATCH RFC clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
2021-01-06  0:41 ` [PATCH RFC clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
2021-01-06  0:41 ` [PATCH RFC clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
2021-01-06 16:28   ` Rik van Riel
2021-01-06 19:53     ` Paul E. McKenney
2021-01-06 20:59       ` Rik van Riel
2021-01-06  0:41 ` [PATCH RFC clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
2021-01-06  0:41 ` [PATCH RFC clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
2021-01-06  0:41 ` [PATCH RFC clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
2021-01-12  0:42 ` [PATCH v2 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
2021-01-12  0:45   ` [PATCH v2 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
2021-01-12  0:45   ` [PATCH v2 clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
2021-01-12  0:45   ` [PATCH v2 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
2021-01-12  0:45   ` [PATCH v2 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
2021-01-12  0:45   ` [PATCH v2 clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
2021-02-02 17:04   ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
2021-02-02 17:06     ` [PATCH clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
2021-02-02 17:06     ` [PATCH clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
2021-02-02 17:06     ` [PATCH clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
2021-02-02 17:06     ` [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
2021-02-02 19:51       ` Randy Dunlap
2021-02-03  0:50         ` Paul E. McKenney
2021-02-03  1:31           ` Randy Dunlap
2021-02-03  1:40             ` Paul E. McKenney
2021-02-02 17:06     ` [PATCH clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
2021-02-17 21:28     ` [PATCH v3 clocksource] Do not mark clocks unstable due to delays Paul E. McKenney
2021-02-17 21:29       ` [PATCH clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
2021-02-17 21:29       ` [PATCH clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
2021-02-17 21:29       ` [PATCH clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
2021-02-17 21:29       ` [PATCH clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
2021-02-17 21:29       ` [PATCH clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
2021-03-04  0:49       ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
2021-03-04  0:53         ` [PATCH kernel/time 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
2021-03-04  0:53         ` [PATCH kernel/time 2/5] clocksource: Retry clock read if long delays detected paulmck
2021-03-04  0:53         ` [PATCH kernel/time 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
2021-03-04  0:53         ` [PATCH kernel/time 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
2021-03-04  0:53         ` [PATCH kernel/time 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
2021-04-02 20:29         ` [PATCH v5 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Provide module parameters to inject delays in watchdog paulmck
2021-04-02 22:22             ` Thomas Gleixner
2021-04-02 22:37               ` Paul E. McKenney
2021-04-02 22:48               ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
2021-04-02 22:49                 ` [PATCH v7 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog paulmck
2021-04-02 22:49                 ` [PATCH v7 clocksource 2/5] clocksource: Retry clock read if long delays detected paulmck
2021-04-10  8:41                   ` Thomas Gleixner
2021-04-10 23:50                     ` Paul E. McKenney
2021-04-02 22:49                 ` [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable paulmck
2021-04-10  9:00                   ` Thomas Gleixner
2021-04-11  0:20                     ` Paul E. McKenney
2021-04-11 10:33                       ` Thomas Gleixner
2021-04-11 16:46                         ` Paul E. McKenney
2021-04-12  4:21                           ` Paul E. McKenney
2021-04-12 13:08                             ` Thomas Gleixner
2021-04-12 18:20                               ` Paul E. McKenney [this message]
2021-04-12 18:54                                 ` Thomas Gleixner
2021-04-12 19:57                                   ` Paul E. McKenney
2021-04-12 20:37                                     ` Thomas Gleixner
2021-04-12 23:18                                       ` Paul E. McKenney
2021-04-13 20:49                                         ` Thomas Gleixner
2021-04-14  4:48                                           ` Paul E. McKenney
2021-04-02 22:49                 ` [PATCH v7 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
2021-04-02 22:49                 ` [PATCH v7 clocksource 5/5] clocksource: Do pairwise clock-desynchronization checking paulmck
2021-04-10  9:04                   ` Thomas Gleixner
2021-04-11  0:21                     ` Paul E. McKenney
2021-04-10  8:01                 ` [PATCH v7 clocksource] Do not mark clocks unstable due to delays for v5.13 Thomas Gleixner
2021-04-10 23:26                   ` Paul E. McKenney
2021-04-11 10:58                     ` Thomas Gleixner
2021-04-11 16:50                       ` Paul E. McKenney
2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Retry clock read if long delays detected paulmck
2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Check per-CPU clock synchronization when marked unstable paulmck
2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Provide a module parameter to fuzz per-CPU clock checking paulmck
2021-04-02 20:31           ` [PATCH v6 clocksource] Do not mark clocks unstable dueclocksource: Do pairwise clock-desynchronization checking paulmck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210412182049.GE4510@paulmck-ThinkPad-P17-Gen-1 \
    --to=paulmck@kernel.org \
    --cc=Mark.Rutland@arm.com \
    --cc=ak@linux.intel.com \
    --cc=corbet@lwn.net \
    --cc=john.stultz@linaro.org \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=neeraju@codeaurora.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).