From: Luming Yu <luming.yu@gmail.com>
To: paulmck@kernel.org
Cc: Andi Kleen <ak@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	john.stultz@linaro.org, sboyd@kernel.org, corbet@lwn.net,
	Mark.Rutland@arm.com, maz@kernel.org, kernel-team@fb.com,
	neeraju@codeaurora.org, feng.tang@intel.com,
	zhengjun.xing@intel.com, Chris Mason <clm@fb.com>
Subject: Re: [PATCH v10 clocksource 1/7] clocksource: Provide module parameters to inject delays in watchdog
Date: Wed, 28 Apr 2021 22:24:42 +0800
Message-ID: <CAJRGBZyFbvpdrbKmV9KrXz7VkMcTMqVp2PAcFRvroZue6b9tag@mail.gmail.com>
In-Reply-To: <20210428135725.GN975577@paulmck-ThinkPad-P17-Gen-1>

On Wed, Apr 28, 2021 at 9:57 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Wed, Apr 28, 2021 at 12:49:12PM +0800, Luming Yu wrote:
> > We'd expect that, with the patch set applied, the clocksource watchdog
> > would avoid doing the wrong thing because of an injected delay, or a
> > real-life delay, by re-checking TSC synchronization.
> > However, the noise still causes wrong actions, and the patch doesn't
> > defeat the injected delay.
> > Please correct me if I'm wrong.
>
> Injecting delay is just a test.  In real life, if you got four delays
> in a row, the cause is likely that the clock read is broken and taking
> a very long time.  In which case marking that clock unstable is a
> reasonable response.
>
> Other causes include having an NMI or SMI storm, getting extremely
> unlucky with vCPU preemptions, and so on.  In these cases, you are not
> making much forward progress anyway, so a marked-unstable clock is the
> least of your worries.
>
> I ran this (without injected delays) for more than a thousand hours on
> rcutorture guest OSes and didn't see any instances of even two consecutive
> bad reads.  There was the very occasional single instance of a bad read.
>
> Therefore, the code marks the clock unstable if it has four bad reads
> in a row, as it should.
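
For readers following along, the retry policy described in the quoted text can be sketched in user-space C. This is only an illustration and not the patch's code: MAX_ATTEMPTS is taken from "attempt 4, marking unstable" in the log, while MAX_READ_DELAY_NS and should_mark_unstable() are hypothetical names and values chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: four attempts, matching "attempt 4, marking unstable". */
#define MAX_ATTEMPTS 4
/* Hypothetical threshold; the real limit is defined by the patch series. */
#define MAX_READ_DELAY_NS 100000ULL

/*
 * Return true only if every one of the MAX_ATTEMPTS read-back delays
 * exceeded the threshold, i.e. there was no clean read to fall back on.
 */
static bool should_mark_unstable(const uint64_t delays_ns[MAX_ATTEMPTS])
{
	for (int i = 0; i < MAX_ATTEMPTS; i++) {
		if (delays_ns[i] <= MAX_READ_DELAY_NS)
			return false;	/* one clean read ends the retries */
	}
	return true;
}
```

With four delays of 7220833ns, as in the log, this returns true; a single read under the threshold anywhere in the four makes it return false.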

The hard problem to solve is the case where the TSC is still in good
shape. That can be verified, in the injected situation or in a real-life
case, with a quick cross-check against the TSC counts of other
threads/cores, to determine whether it is truly a TSC problem or a
problem with the watchdog's reference clock.
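
A rough sketch of the kind of cross-check meant here, in user-space C. The idea is to bracket a read taken on another CPU between two reads on the local CPU and see whether the remote value falls outside that window. The names (enum skew, classify_remote_read) are hypothetical, and this is an assumption about how such a check could look, not code from the patch set.

```c
#include <stdint.h>

enum skew { SKEW_NONE, SKEW_BEHIND, SKEW_AHEAD };

/*
 * local_before and local_after are counter reads on the checking CPU,
 * taken just before and just after the remote CPU's read.
 * The signed differences keep the comparison correct across wraparound.
 */
static enum skew classify_remote_read(uint64_t local_before,
				      uint64_t remote,
				      uint64_t local_after)
{
	if ((int64_t)(remote - local_before) < 0)
		return SKEW_BEHIND;	/* remote counter lags the local one */
	if ((int64_t)(local_after - remote) < 0)
		return SKEW_AHEAD;	/* remote counter runs ahead */
	return SKEW_NONE;		/* remote read falls inside the window */
}
```

If every checked CPU comes back SKEW_NONE, the TSCs agree with one another, which would point at the watchdog's reference clock (or the delay itself) rather than at the TSC.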

Ideally, we could separate hard-to-debug unstable-TSC problems from
clocksource watchdog problems, and see fewer and fewer unstable-TSC
sightings caused by the clocksource watchdog.

>
>                                                         Thanx, Paul
>
> > parameters]# cat *
> > 1
> > 1
> > -1
> > 3
> > 8
> >
> > [62939.809615] clocksource: clocksource_watchdog_inject_delay(): Injecting delay.
> > [62939.816867] clocksource: clocksource_watchdog_inject_delay(): Injecting delay.
> > [62939.824094] clocksource: clocksource_watchdog_inject_delay(): Injecting delay.
> > [62939.831314] clocksource: clocksource_watchdog_inject_delay(): Injecting delay.
> > [62939.838536] clocksource: timekeeping watchdog on CPU26: hpet read-back delay of 7220833ns, attempt 4, marking unstable
> > [62939.849230] tsc: Marking TSC unstable due to clocksource watchdog
> > [62939.855340] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
> > [62939.863972] sched_clock: Marking unstable (62943398530130, -3543150114)<-(62941186607503, -1331276112)
> > [62939.875104] clocksource: Checking clocksource tsc synchronization from CPU 123 to CPUs 0,6,26,62,78,97-98,137.
> > [62939.886518] clocksource: Switched to clocksource hpet
> >
> > On Tue, Apr 27, 2021 at 2:27 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Mon, Apr 26, 2021 at 10:56:27AM -0700, Andi Kleen wrote:
> > > > > ------------------------------------------------------------------------
> > > > >
> > > > > - module parameters
> > > > >
> > > > >   If the scope of the fault injection capability is limited to a
> > > > >   single kernel module, it is better to provide module parameters to
> > > > >   configure the fault attributes.
> > > > >
> > > > > ------------------------------------------------------------------------
> > > > >
> > > > > And in this case, the fault injection capability is in fact limited to
> > > > > kernel/clocksource.c.
> > > >
> > > >
> > > > I disagree with this recommendation because it prevents fuzzer coverage.
> > > >
> > > > Much better to have a uniform interface that can be automatically
> > > > explored.
> > >
> > > The permissions for these module parameters are 0644, so there is no
> > > reason why the fuzzers cannot use them via sysfs.
> > >
> > >                                                         Thanx, Paul
