linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org,
	sboyd@kernel.org, corbet@lwn.net, Mark.Rutland@arm.com,
	maz@kernel.org, kernel-team@fb.com, neeraju@codeaurora.org,
	ak@linux.intel.com, Chris Mason <clm@fb.com>
Subject: Re: [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
Date: Sat, 17 Apr 2021 16:51:36 -0700	[thread overview]
Message-ID: <20210417235136.GD5006@paulmck-ThinkPad-P17-Gen-1> (raw)
In-Reply-To: <87sg3prsbt.ffs@nanos.tec.linutronix.de>

On Sat, Apr 17, 2021 at 02:47:18PM +0200, Thomas Gleixner wrote:
> On Tue, Apr 13 2021 at 21:36, Paul E. McKenney wrote:
> 
> Bah, hit send too quick.
> 
> > +	cpumask_clear(&cpus_ahead);
> > +	cpumask_clear(&cpus_behind);
> > +	preempt_disable();
> 
> Daft. 

Would migrate_disable() be better?

Yes, I know, in virtual environments, the hypervisor can migrate anyway,
but this does limit the potential damage to one out of the two schedulers.

> > +	testcpu = smp_processor_id();
> > +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n", cs->name, testcpu);
> > +	for_each_online_cpu(cpu) {
> > +		if (cpu == testcpu)
> > +			continue;
> > +		csnow_begin = cs->read(cs);
> > +		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);
> > +		csnow_end = cs->read(cs);
> 
> As this must run with interrupts enabled, that's a pretty rough
> approximation like measuring wind speed with a wet thumb.
> 
> Wouldn't it be smarter to let the remote CPU do the watchdog dance and
> take that result? i.e. split out more of the watchdog code so that you
> can get the nanoseconds delta on that remote CPU to the watchdog.

First, an interrupt, NMI, SMI, vCPU preemption, or whatever could
not cause a false positive.  A false negative, perhaps, but no
false positives.  Second, in normal operation, these are rare, so that
hitting the (eventual) default of eight CPUs is very likely to result in
tight bounds on the delay-based error for most of those CPUs.  Third,
we really need to compare the TSC on one CPU to the TSC on the other
in order to have a very clear indication of a problem, should a real
TSC-synchronization issue arise.  In contrast, comparisons against the
watchdog timer will be more complicated and any errors detected will be
quite hard to prove to be due to TSC issues.

Or am I once again missing something?

> > +		delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
> > +		if (delta < 0)
> > +			cpumask_set_cpu(cpu, &cpus_behind);
> > +		delta = (csnow_end - csnow_mid) & cs->mask;
> > +		if (delta < 0)
> > +			cpumask_set_cpu(cpu, &cpus_ahead);
> > +		delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
> > +		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
> 
> > +		if (firsttime || cs_nsec > cs_nsec_max)
> > +			cs_nsec_max = cs_nsec;
> > +		if (firsttime || cs_nsec < cs_nsec_min)
> > +			cs_nsec_min = cs_nsec;
> > +		firsttime = 0;
> 
>   int64_t cs_nsec_max = 0, cs_nsec_min = LLONG_MAX;
> 
> and then the firsttime muck is not needed at all.

Good point, will fix!

And again, thank you for looking all of this over.

							Thanx, Paul

  reply	other threads:[~2021-04-17 23:51 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-14  4:34 [PATCH v8 clocksource] Do not mark clocks unstable due to delays for v5.13 Paul E. McKenney
2021-04-14  4:35 ` [PATCH v8 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog Paul E. McKenney
2021-04-16 20:10   ` Thomas Gleixner
2021-04-16 22:38     ` Paul E. McKenney
2021-04-14  4:35 ` [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected Paul E. McKenney
2021-04-16 20:45   ` Thomas Gleixner
2021-04-17  0:25     ` Paul E. McKenney
2021-04-17 12:24   ` Thomas Gleixner
2021-04-17 22:54     ` Paul E. McKenney
2021-04-17 23:15       ` Thomas Gleixner
2021-04-17 23:40         ` Paul E. McKenney
2021-04-14  4:36 ` [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable Paul E. McKenney
2021-04-17 12:28   ` Thomas Gleixner
2021-04-17 23:42     ` Paul E. McKenney
2021-04-17 12:47   ` Thomas Gleixner
2021-04-17 23:51     ` Paul E. McKenney [this message]
2021-04-18 16:20       ` Paul E. McKenney
2021-04-14  4:36 ` [PATCH v8 clocksource 4/5] clocksource: Provide a module parameter to fuzz per-CPU clock checking Paul E. McKenney
2021-04-14  4:36 ` [PATCH v8 clocksource 5/5] clocksource: Limit number of CPUs checked for clock synchronization Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210417235136.GD5006@paulmck-ThinkPad-P17-Gen-1 \
    --to=paulmck@kernel.org \
    --cc=Mark.Rutland@arm.com \
    --cc=ak@linux.intel.com \
    --cc=clm@fb.com \
    --cc=corbet@lwn.net \
    --cc=john.stultz@linaro.org \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=neeraju@codeaurora.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).