linux-kernel.vger.kernel.org archive mirror
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Feng Tang <feng.tang@intel.com>
Cc: Waiman Long <longman@redhat.com>,
	John Stultz <john.stultz@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Cassio Neri <cassio.neri@gmail.com>,
	Linus Walleij <linus.walleij@linaro.org>,
	Colin Ian King <colin.king@canonical.com>,
	Frederic Weisbecker <frederic@kernel.org>
Subject: Re: [PATCH 1/2] clocksource: Avoid accidental unstable marking of clocksources
Date: Tue, 16 Nov 2021 12:36:50 -0800
Message-ID: <20211116203650.GV641268@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <20211116013651.GC34844@shbuild999.sh.intel.com>

On Tue, Nov 16, 2021 at 09:36:51AM +0800, Feng Tang wrote:
> On Mon, Nov 15, 2021 at 06:07:09AM -0800, Paul E. McKenney wrote:
> > On Mon, Nov 15, 2021 at 03:59:15PM +0800, Feng Tang wrote:
> > > On Sun, Nov 14, 2021 at 10:24:56PM -0500, Waiman Long wrote:
> > > > 
> > > > On 11/14/21 21:08, Feng Tang wrote:
> > > > > Or did you have something else in mind?
> > > > > > > > I'm not sure of the details in Waiman's cases, but in our cases (stress-ng)
> > > > > > > > the delays between the watchdog (HPET here) reads were not linear: from
> > > > > > > > debug data, the 3-2 difference was sometimes bigger, or much bigger,
> > > > > > > > than the 2-1 difference.
> > > > > > > > 
> > > > > > > > The reason could be that the gap between 2 reads depends hugely on the
> > > > > > > > system pressure at the time the 3 HPET reads happen. On our test box (a
> > > > > > > > 2-Socket Cascade Lake AP server), the 2-1 and 3-2 differences are stable
> > > > > > > > at about 2.5 us, while under stress they could be bumped to anywhere
> > > > > > > > from 6 us to 2800 us.
> > > > > > > > 
> > > > > > > > So I think checking the 3-2 difference plus increasing the max retries
> > > > > > > > to 10 may be a simple approach: if the watchdog read is found to be
> > > > > > > > abnormally long, we skip this round of checking.
> > > > > > > On one of the test systems, I had measured that the normal delay
> > > > > > > (hpet->tsc->hpet) was a bit over 2us. It was a bit more than 4us at
> > > > > > > bootup time. However, the same system under stress could have a delay of
> > > > > > > over 200us at bootup time. When I measured the consecutive hpet delay, it
> > > > > > > was about 180us. So hpet read did dominate the total clocksource read delay.
> > > > > > Thank you both for the data!
> > > > > > 
> > > > > > > I would not suggest increasing the max retries, as it may still fail in most
> > > > > > > cases because the system stress will likely not go away within a short
> > > > > > > time. So we would likely just be wasting CPU time. I believe we should just skip
> > > > > > > it if it is the watchdog read that is causing most of the delay.
> > > > > > If anything, adding that extra read would cause me to -reduce- the number
> > > > > > of retries to avoid increasing the per-watchdog overhead.
> > > > > I understand Waiman's concern here, and in our test patch, the 2
> > > > > consecutive watchdog read delay check is done inside this retrying
> > > > > loop accompanying the 'cs' read, and once an abnormal delay is found,
> > > > > the watchdog check is skipped without waiting for the max-retries to
> > > > > complete.
> > > > > 
> > > > > Our test data shows the consecutive delay is not always big even when
> > > > > the system is heavily stressed; that's why I suggest increasing the
> > > > > retries.
> > > > 
> > > > If we need a large number of retries to avoid triggering the unstable TSC
> > > > message, we should consider increasing the threshold instead. Right?
> > > > 
> > > > That is why my patch 2 makes the max skew value a configurable option so
> > > > that we can tune it if necessary.
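The following sketch illustrates what a tunable skew limit means. It compares the interval measured by the clocksource under test with the interval measured by the watchdog. The macro `CONFIG_WATCHDOG_MAX_SKEW_US`, its default of 100, and the helper `skew_exceeds_limit()` are assumptions made for this example. The actual Kconfig symbol and default in patch 2 may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical build-time knob (name and default are illustrative).
 * A Kconfig option normally surfaces as a CONFIG_ macro; here it can be
 * overridden with -DCONFIG_WATCHDOG_MAX_SKEW_US=<value>.
 */
#ifndef CONFIG_WATCHDOG_MAX_SKEW_US
#define CONFIG_WATCHDOG_MAX_SKEW_US 100
#endif

static_assert(CONFIG_WATCHDOG_MAX_SKEW_US > 0, "max skew must be positive");

#define NSEC_PER_USEC 1000LL
#define WATCHDOG_MAX_SKEW ((int64_t)CONFIG_WATCHDOG_MAX_SKEW_US * NSEC_PER_USEC)

/*
 * cs_interval_ns: elapsed time according to the clocksource under test (TSC).
 * wd_interval_ns: elapsed time according to the watchdog (HPET).
 * Returns true when they disagree by more than the configured skew, i.e.
 * when the clocksource would be treated as unstable.
 */
static bool skew_exceeds_limit(int64_t cs_interval_ns, int64_t wd_interval_ns)
{
	int64_t skew = cs_interval_ns - wd_interval_ns;

	if (skew < 0)
		skew = -skew;
	return skew > WATCHDOG_MAX_SKEW;
}
```

Building with, say, `-DCONFIG_WATCHDOG_MAX_SKEW_US=500` raises the limit without changing the comparison logic. That is the kind of per-system tuning described above.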
> > > 
> > > I'm fine with it, though the ideal case I had in mind is that, with carefully
> > > picked values for max_retries/skew_threshold, we could save users
> > > from configuring these. But given the complexity of all the HW out there,
> > > it's not an easy goal.
> > 
> > That is my goal as well, but I expect that more experience, testing,
> > and patches will be required to reach that goal.
> > 
> > > And I still suggest putting the consecutive watchdog read check inside
> > > the retry loop, so that it could bail out early when detecting the
> > > abnormal delay.
> > 
> > If the HPET read shows abnormal delay, agreed.  But if the abnormal
> > delay is only in the clocksource under test (TSC in this case), then
> > a re-read seems to me to make sense.
> 
> Yes, I agree. The retry logic you introduced does help to filter
> many false alarms from a watchdog.
> 
> > > Another thing is we may need to set 'watchdog_reset_pending', as
> > > under stress there could be many consecutive "skipped"
> > > watchdog checks, and the saved values of 'cs' and 'watchdog' should be
> > > reset.
> > 
> > My thought was to count a read failure only if the HPET read did not
> > have excessive delays.  This means that a cache-buster workload could 
> > indefinitely delay a clock-skew check, which was one reason that I
> > was thinking in terms of using the actual measured delays to set the
> > clock-skew check criterion.
> > 
> > Either way, something like Waiman's patch checking the HPET delay looks
> > to me to be valuable.
> 
> Yes, and Waiman is working on a new version.

Looking forward to seeing it!

> btw, here is our easy reproducer (the case you worked on with Oliver
> Sang), running stress-ng's ioport case (192 is the number of CPUs on the test
> box):
> 
>  sudo stress-ng --timeout 30 --times --verify --metrics-brief --ioport 192

Good to know, thank you!

							Thanx, Paul

Thread overview: 24+ messages
2021-11-10 22:17 [PATCH 0/2] clocksource: Avoid incorrect hpet fallback Waiman Long
2021-11-10 22:17 ` [PATCH 1/2] clocksource: Avoid accidental unstable marking of clocksources Waiman Long
2021-11-11  4:57   ` Feng Tang
2021-11-11 14:43     ` Paul E. McKenney
2021-11-12  5:44       ` Feng Tang
2021-11-12 13:47         ` Paul E. McKenney
2021-11-13  3:43         ` Waiman Long
2021-11-14 15:54           ` Paul E. McKenney
2021-11-15  2:08             ` Feng Tang
2021-11-15  3:24               ` Waiman Long
2021-11-15  7:59                 ` Feng Tang
2021-11-15 14:07                   ` Paul E. McKenney
2021-11-16  1:36                     ` Feng Tang
2021-11-16 20:36                       ` Paul E. McKenney [this message]
2021-11-15 19:19                   ` Waiman Long
2021-11-10 22:17 ` [PATCH 2/2] clocksource: Add a Kconfig option for WATCHDOG_MAX_SKEW Waiman Long
2021-11-10 22:32 ` [PATCH 0/2] clocksource: Avoid incorrect hpet fallback Paul E. McKenney
2021-11-10 23:25   ` Waiman Long
2021-11-11  0:04     ` Paul E. McKenney
2021-11-11  1:19       ` Waiman Long
2021-11-11  1:23 ` Feng Tang
2021-11-11  1:30   ` Waiman Long
2021-11-11  1:53     ` Feng Tang
2021-11-11  3:07       ` Paul E. McKenney
