All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org,
	sboyd@kernel.org, corbet@lwn.net, Mark.Rutland@arm.com,
	maz@kernel.org, kernel-team@meta.com, neeraju@codeaurora.org,
	ak@linux.intel.com, feng.tang@intel.com, zhengjun.xing@intel.com,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Chris Mason <clm@meta.com>, John Stultz <jstultz@google.com>,
	Waiman Long <longman@redhat.com>
Subject: Re: [PATCH clocksource 1/3] clocksource: Reject bogus watchdog clocksource measurements
Date: Thu, 17 Nov 2022 22:57:34 +0100	[thread overview]
Message-ID: <87mt8pkzw1.ffs@tglx> (raw)
In-Reply-To: <20221114232827.835599-1-paulmck@kernel.org>

Paul!

On Mon, Nov 14 2022 at 15:28, Paul E. McKenney wrote:
>  
> +		/* Check for bogus measurements. */
> +		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> +		if (wd_nsec < (wdi >> 2)) {
> +			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> +			continue;
> +		}
> +		if (wd_nsec > (wdi << 2)) {
> +			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> +			continue;
> +		}

This is really getting ridiculous.

The clocksource watchdog is supposed to run periodically with period =
WATCHDOG_INTERVAL.

That periodic schedule depends on the clocksource which is monitored. If
the clocksource runs fast the period is shortened and if it runs slow is
prolonged.

Now you add checks:

 1) If the period observed by the watchdog clocksource is less than 1/4
    of the expected period, everything is fine.

 2) If the period observed by the watchdog clocksource is greater than 4
    times the expected period, everything is fine.

IOW, you are preventing detection of one class of problems which caused
us to implement the watchdog clocksource in the first place.

You are preventing it by making the watchdog decision circular dependent
on the clocksource it is supposed to watch. IOW, you put a fox in charge
of the henhouse. That's a really brilliant plan.

But what's worse is the constant stream of heuristics which make the
clocksource watchdog "work" under workloads which are simply impossible
to be handled by its current implementation.

If I look at the full set of them by now, I'm pretty sure that a real
TSC fail would not be noticed anymore because there are more exceptions
and excuses why a particular measurement it bogus or invalid or
whatever.

I didn't do a full analysis yet, but I have a hard time to convince
myself that - assumed we add this gem - the watchdog will be anything
else than a useless waste of CPU cycles as there is always one of the
heuristics declaring that everything is fine along with incomprehensible
messages in dmesg which will create more confusion to most people than
being helpful.

This is hunting us for 20+ years now and why do we still need this? I'm
pretty sure that farcebook does not run their server farms on 20 years
old silicon.

That means even the largest customers have not been able to convince the
CPU manufactures to fix this idiocy by now, right? Either that or they
did not even try because it's simpler to "fix" it in software.

That's the only two explanations I have for the constant stream of
voodoo logic. Both are just a proof for my claim that this industry just
"works" by chance.

Can we stop this pretty please and either come up with something
fundamentally different or just admit defeat and remove the whole thing?

Thanks,

        tglx


  reply	other threads:[~2022-11-17 21:59 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-14 23:28 [PATCH clocksource 0/3] Reject bogus watchdog clocksource measurements Paul E. McKenney
2022-11-14 23:28 ` [PATCH clocksource 1/3] clocksource: " Paul E. McKenney
2022-11-17 21:57   ` Thomas Gleixner [this message]
2022-11-17 23:09     ` Paul E. McKenney
2022-11-21  0:55       ` Feng Tang
2022-11-21 15:21         ` Paul E. McKenney
2022-11-21 18:14         ` Paul E. McKenney
2022-11-22 15:55           ` Feng Tang
2022-11-22 22:07             ` Paul E. McKenney
2022-11-23  2:36               ` Feng Tang
2022-11-23 21:23                 ` Paul E. McKenney
2022-11-28  2:15                   ` Feng Tang
2022-11-29 19:29                     ` Paul E. McKenney
2022-11-30  1:38                       ` Feng Tang
2022-11-30  4:12                         ` Paul E. McKenney
2022-11-30  4:49                           ` Feng Tang
2022-11-30  5:16                             ` Paul E. McKenney
2022-11-30  5:35                               ` Feng Tang
2022-11-30  5:50                                 ` Paul E. McKenney
2022-11-30  6:00                                   ` Feng Tang
2022-12-01 17:24                                     ` Paul E. McKenney
2022-12-02  1:10                                       ` Feng Tang
2022-12-02  1:44                                         ` Paul E. McKenney
2022-12-02  2:02                                           ` Feng Tang
2022-12-02 22:24                                             ` Paul E. McKenney
2022-12-03  2:51                                               ` Feng Tang
2022-11-14 23:28 ` [PATCH clocksource 2/3] clocksource: Add comments to classify bogus measurements Paul E. McKenney
2022-11-14 23:28 ` [PATCH clocksource 3/3] clocksource: Exponential backoff for load-induced bogus watchdog reads Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mt8pkzw1.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=Mark.Rutland@arm.com \
    --cc=ak@linux.intel.com \
    --cc=clm@meta.com \
    --cc=corbet@lwn.net \
    --cc=feng.tang@intel.com \
    --cc=john.stultz@linaro.org \
    --cc=jstultz@google.com \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=maz@kernel.org \
    --cc=neeraju@codeaurora.org \
    --cc=paulmck@kernel.org \
    --cc=sboyd@kernel.org \
    --cc=zhengjun.xing@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.