From: "Paul E. McKenney" <paulmck@kernel.org>
To: Luming Yu <luming.yu@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
LKML <linux-kernel@vger.kernel.org>,
John Stultz <john.stultz@linaro.org>,
sboyd@kernel.org, corbet@lwn.net, Mark.Rutland@arm.com,
maz@kernel.org, kernel-team@fb.com, neeraju@codeaurora.org,
Andi Kleen <ak@linux.intel.com>,
feng.tang@intel.com, zhengjun.xing@intel.com
Subject: Re: [PATCH V11 clocksource 0/6] Do not mark clocks unstable due to delays for v5.13
Date: Thu, 29 Apr 2021 22:10:59 -0700 [thread overview]
Message-ID: <20210430051059.GE975577@paulmck-ThinkPad-P17-Gen-1> (raw)
In-Reply-To: <CAJRGBZxre5=xt-RQFo6HU3rBYu7YuVtXZxNHicbKFX3FMB1T7A@mail.gmail.com>
On Thu, Apr 29, 2021 at 07:13:40PM +0800, Luming Yu wrote:
> On Thu, Apr 29, 2021 at 9:30 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > Hello!
> >
> > If there is a sufficient delay between reading the watchdog clock and the
> > clock under test, the clock under test will be marked unstable through no
> > fault of its own. This series checks for this, doing limited retries
> > to get a good set of clock reads. If the clock is marked unstable
> > and is marked as being per-CPU, cross-CPU synchronization is checked.
> > This series also provides delay injection, which may be enabled via
> > kernel boot parameters to test the checking for delays.
> >
> > Note that "sufficient delay" can be provided by SMIs, NMIs, and of course
> > vCPU preemption.
> >
> > 1. Provide module parameters to inject delays in watchdog.
> >
> > 2. Retry clock read if long delays detected.
> >
> > 3. Check per-CPU clock synchronization when marked unstable.
> >
> > 4. Provide a module parameter to fuzz per-CPU clock checking.
> >
> > 5. Limit number of CPUs checked for clock synchronization.
> >
> > 6. Reduce clocksource-skew threshold for TSC.
> >
> > Changes since v10, based on feedback from Thomas Gleixner, Peter Zijlstra,
> > Feng Tang, Andi Kleen, Luming Yu, Xing Zhengju, and the indefatigible
> > kernel test robot:
> >
> > o Automatically compute the uncertainty margin for clocksource, and
> > also allow them to be specified manually before that clocksource
> > is registered.
> >
> > o For the automatically computed uncertainty margins, bound them
> > below by 100 microseconds (2 * WATCHDOG_MAX_SKEW).
> >
> > o For the manually specified uncertainty margins, splat (but
> > continue) if they are less than 100 microseconds (again 2 *
> > WATCHDOG_MAX_SKEW). The purpose of splatting is to discourage
> > production use of this clock-skew-inducing debugging technique.
> >
> > o Manually set the uncertainty margin for clocksource_jiffies
> > (and thus refined_jiffies) to TICK_NSEC to compensate for the
> > very low frequency of these clocks.
> >
> > o Manually set the uncertainty margin for clocksource_tsc_early
> > to 32 milliseconds.
> >
> > o Apply numerous "Link:" fields to all patches.
> >
> > o Add some acks and CCs.
> >
> > Changes since v9:
> >
> > o Forgive tsc_early drift, based on feedback from Feng Tang; Xing,
> > Zhengjun; and Thomas Gleixner.
> >
> > o Improve CPU selection for clock-synchronization checking.
> >
> > Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
> >
> > Changes since v8, based on Thomas Gleixner feedback:
> >
> > o Reduced clock-skew threshold to 200us and delay limit to 50us.
> >
> > o Split out a cs_watchdog_read() function.
> >
> > o Removed the pointless CLOCK_SOURCE_VERIFY_PERCPU from kvm_clock.
> >
> > o Initialized cs_nsec_max and cs_nsec_min to avoid firsttime checks.
> >
> > Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
> >
> > Changes since v7, based on Thomas Gleixner feedback:
> >
> > o Fix embarrassing git-format-patch operator error.
> >
> > o Merge pairwise clock-desynchronization checking into the checking
> > of per-CPU clock synchronization when marked unstable.
> >
> > o Do selective per-CPU checking rather than blindly checking all
> > CPUs. Provide a clocksource.verify_n_cpus kernel boot parameter
> > to control this behavior, with the value -1 choosing the old
> > check-all-CPUs behavior. The default is to randomly check 8 CPUs.
> >
> > o Fix the clock-desynchronization checking to avoid a potential
> > use-after-free error for dynamically allocated clocksource
> > structures.
> >
> > o Remove redundance "wdagain_nsec < 0" from clocksource_watchdog()
> > clocksource skew checking.
> >
> > o Update commit logs and do code-style updates.
> >
> > Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
> >
> > Changes since v5:
> >
> > o Rebased to v5.12-rc5.
> >
> > Changes since v4:
> >
> > o Rebased to v5.12-rc1.
> >
> > Changes since v3:
> >
> > o Rebased to v5.11.
> >
> > o Apply Randy Dunlap feedback.
> >
> > Changes since v2:
> >
> > o Rebased to v5.11-rc6.
> >
> > o Updated Cc: list.
> >
> > Changes since v1:
> >
> > o Applied feedback from Rik van Riel.
> >
> > o Rebased to v5.11-rc3.
> >
> > o Stripped "RFC" from the subject lines.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > Documentation/admin-guide/kernel-parameters.txt | 32 +++
> > arch/x86/kernel/tsc.c | 1
> > b/Documentation/admin-guide/kernel-parameters.txt | 21 ++
> > b/arch/x86/kernel/tsc.c | 3
> > b/include/linux/clocksource.h | 2
> > b/kernel/time/clocksource.c | 23 ++
> > b/kernel/time/jiffies.c | 15 -
> > include/linux/clocksource.h | 3
> > kernel/time/clocksource.c | 227 ++++++++++++++++++++--
> > 9 files changed, 304 insertions(+), 23 deletions(-)
>
> Hi Paul,
> using the v11, I added a approve flag and made it work for my early
> inject test where tsc is good
> through a cross tsc sync test. Ideally with the small tweak, we could
> get less tsc issues to debug.
> And I'm not sure it would help in real trouble shooting cases. But we
> will see if it would help.
Thank you for the patch!
However, Thomas had me rework this code to put the error injection into
a kernel module, so this effect is now obtained in a different way.
So I am unable to make use of your patch.
Thanx, Paul
> My hack patch snippet as below:
>
>
>
> static void __clocksource_unstable(struct clocksource *cs)
> {
> + if (!cs->approved)
> + return;
> +
> cs->flags &= ~(CLOCK_SOURCE_VALID_FOR_HRES | CLOCK_SOURCE_WATCHDOG);
> cs->flags |= CLOCK_SOURCE_UNSTABLE;
>
> @@ -366,9 +369,12 @@
> if (!cpumask_empty(&cpus_behind))
> pr_warn(" CPUs %*pbl behind CPU %d for
> clocksource %s.\n",
> cpumask_pr_args(&cpus_behind), testcpu, cs->name);
> - if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind))
> + if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
> pr_warn(" CPU %d check durations %lldns -
> %lldns for clocksource %s.\n",
> testcpu, cs_nsec_min, cs_nsec_max, cs->name);
> + cs->approved = true;
> + __clocksource_unstable(cs);
> + }
> }
>
> static void clocksource_watchdog(struct timer_list *unused)
> @@ -396,6 +402,7 @@
>
> if (!cs_watchdog_read(cs, &csnow, &wdnow)) {
> /* Clock readout unreliable, so give it up. */
> + cs->approved = false;
> __clocksource_unstable(cs);
> continue;
> }
next prev parent reply other threads:[~2021-04-30 5:11 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-29 1:29 [PATCH V11 clocksource 0/6] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
2021-04-29 1:30 ` [PATCH v11 clocksource 1/6] clocksource: Provide module parameters to inject delays in watchdog Paul E. McKenney
2021-04-29 1:30 ` [PATCH v11 clocksource 2/6] clocksource: Retry clock read if long delays detected Paul E. McKenney
2021-04-29 1:30 ` [PATCH v11 clocksource 3/6] clocksource: Check per-CPU clock synchronization when marked unstable Paul E. McKenney
2021-04-29 1:30 ` [PATCH v11 clocksource 4/6] clocksource: Provide a module parameter to fuzz per-CPU clock checking Paul E. McKenney
2021-04-29 1:30 ` [PATCH v11 clocksource 5/6] clocksource: Limit number of CPUs checked for clock synchronization Paul E. McKenney
2021-04-29 1:30 ` [PATCH v11 clocksource 6/6] clocksource: Reduce clocksource-skew threshold for TSC Paul E. McKenney
2021-04-29 7:02 ` kernel test robot
2021-04-29 14:04 ` Paul E. McKenney
2021-04-29 11:13 ` [PATCH V11 clocksource 0/6] Do not mark clocks unstable due to delays for v5.13 Luming Yu
2021-04-30 5:10 ` Paul E. McKenney [this message]
2021-04-30 6:52 ` Luming Yu
2021-05-01 4:28 ` Paul E. McKenney
2021-06-02 5:10 ` Luming Yu
2021-06-02 17:46 ` Paul E. McKenney
2021-06-02 18:24 ` Paul E. McKenney
2021-06-03 4:25 ` Luming Yu
2021-06-03 14:40 ` Paul E. McKenney
2021-06-04 4:23 ` Luming Yu
2021-06-04 20:26 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210430051059.GE975577@paulmck-ThinkPad-P17-Gen-1 \
--to=paulmck@kernel.org \
--cc=Mark.Rutland@arm.com \
--cc=ak@linux.intel.com \
--cc=corbet@lwn.net \
--cc=feng.tang@intel.com \
--cc=john.stultz@linaro.org \
--cc=kernel-team@fb.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luming.yu@gmail.com \
--cc=maz@kernel.org \
--cc=neeraju@codeaurora.org \
--cc=sboyd@kernel.org \
--cc=tglx@linutronix.de \
--cc=zhengjun.xing@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).