From: Feng Tang <feng.tang@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@intel.com>,
	"H . Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	rui.zhang@intel.com, andi.kleen@intel.com, len.brown@intel.com,
	tim.c.chen@intel.com
Subject: Re: [PATCH v3 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms
Date: Tue, 7 Dec 2021 09:41:06 +0800
Message-ID: <20211207014106.GB32145@shbuild999.sh.intel.com>
In-Reply-To: <20211130162815.GU641268@paulmck-ThinkPad-P17-Gen-1>

Hi Paul,

On Tue, Nov 30, 2021 at 08:28:15AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 30, 2021 at 11:02:56PM +0800, Feng Tang wrote:
> > And similar big gap between 'tsc' and 'hpet' is seen for the server
> > case (5.5 kernel which doesn't have the cs_watchdog_read() patchset). 
> > 
> > [1196945.314929] clocksource: timekeeping watchdog on CPU67: Marking clocksource 'tsc' as unstable because the skew is too large:
> > [1196945.314935] clocksource:                       'hpet' wd_now: 25272026 wd_last: 2e9ce418 mask: ffffffff
> > [1196945.314938] clocksource:                       'tsc' cs_now: 95b400003fdf1 cs_last: 95ae7ed7c33f7 mask: ffffffffffffffff
> > [1196945.314948] tsc: Marking TSC unstable due to clocksource watchdog
> > [1196945.314977] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
> > [1196945.314981] sched_clock: Marking unstable (1196945264804527, 50153181)<-(1196945399926576, -84962703)
> > [1196945.316255] clocksource: Switched to clocksource hpet
> > 
> > For this case, I don't have access to the HW and only have the
> > dmesg log, from which it seems the watchdog timer has been postponed
> > a very long time from running.
> 
> Thank you for the analysis!
> 
> One approach to handle this situation would be to avoid checking for
> clock skew if the time since the last watchdog read was more than (say)
> twice the desired watchdog spacing.  This does leave open the question of
> exactly which clocksource to use to measure the time between successive
> clocksource reads.  My thought is to check this only once upon entry to
> the handler and to use the designated-good clocksource.
> 
> Does that make sense, or would something else work better?

For this case, where the watchdog timer was delayed for a very long
time (about 170 seconds here), it may be a general problem. IIRC, there
was a similar report on LKML for a non-x86 platform.

As for a fix, I thought about scaling the comparison: say the timer
is delayed by 10 seconds and our checking interval is 500 ms, then
maybe we can raise the checking margin by 20X. But this has the problem
that the watchdog's counter could wrap. In the above case, the HPET
had already wrapped once (about 170+ seconds), and the wrap time
could be much shorter for other timers (4 seconds for the acpi_pm timer?).
So your idea of limiting the max delay is reasonable.
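
To make the wrap concern concrete, here is a rough userspace sketch, not
kernel code. All function names are hypothetical. The 25 MHz HPET rate is
an assumption chosen because it gives a 32-bit wrap period close to the
170+ seconds above; the acpi_pm figures (24-bit counter at 3.579545 MHz)
are the usual ones. The "twice the interval" cutoff is the example value
from your mail, not a tested threshold.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Seconds until a free-running counter of 'bits' width wraps at the
 * given frequency. Valid for bits < 64.
 */
static double wrap_seconds(unsigned int bits, double freq_hz)
{
	return (double)(1ULL << bits) / freq_hz;
}

/*
 * Elapsed ticks between two reads of a masked counter. The result is
 * only meaningful if the counter wrapped at most once in between: any
 * further wraps alias onto a smaller delta and cannot be detected from
 * these two reads alone.
 */
static uint64_t masked_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

/*
 * Hypothetical guard: only compare clocksources when the watchdog
 * interval, as measured by the watchdog clocksource itself, is no more
 * than twice the intended spacing. Otherwise skip this round.
 */
static bool should_check_skew(uint64_t wd_delta_ns, uint64_t interval_ns)
{
	return wd_delta_ns <= 2 * interval_ns;
}
```

With the values from the dmesg above, masked_delta(0x25272026,
0x2e9ce418, 0xffffffff) is 4136254478 ticks, which would be roughly
165 seconds at the assumed 25 MHz. wrap_seconds(24, 3579545.0) comes
out a bit under 4.7 seconds for acpi_pm, so with a short-wrapping
watchdog a long delay can alias to a small delta and slip past a guard
like should_check_skew().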

Thanks,
Feng

> 							Thanx, Paul
