LKML Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] clocksource: Untrust the clocksource watchdog when its interval is too small
@ 2019-05-16  9:06 Harry Pan
  2019-05-18 14:10 ` [PATCH v2] " Harry Pan
  0 siblings, 1 reply; 5+ messages in thread
From: Harry Pan @ 2019-05-16  9:06 UTC (permalink / raw)
  To: LKML; +Cc: gs0622, Harry Pan, Stephen Boyd, Thomas Gleixner, John Stultz

This patch performs a sanity check on the deviation of the clocksource watchdog,
target to reduce false alarm that incorrectly marks current clocksource unstable
when there comes discrepancy.

Say if there is a discrepancy between the current clocksource and watchdog,
validate the watchdog deviation first, if its interval is too small against
the expected timer interval, we shall trust the current clocksource.

It is identified on some Coffee Lake platform w/ PC10 allowed, when the CPU
entered and exited from PC10 (the residency counter is increased), the HPET
generates timestamp delay, this causes discrepancy making kernel incorrectly
untrust the current clocksource (TSC in this case) and re-select the next
clocksource which is the problematic HPET, this eventually causes a user
sensible wall clock delay.

The HPET timestamp delay shall be tackled in firmware domain in order to
properly handle the timer offload between XTAL and RTC when it enters PC10,
while this patch is a mitigation to reduce the false alarm of clocksource
unstable regardless what clocksources are paired.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=203183
Signed-off-by: Harry Pan <harry.pan@intel.com>

---

 kernel/time/clocksource.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3bcc19ceb073..fb0a67827346 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -96,6 +96,7 @@ static u64 suspend_start;
 #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
 static void clocksource_watchdog_work(struct work_struct *work);
 static void clocksource_select(void);
+static void clocksource_dequeue_watchdog(struct clocksource *cs);
 
 static LIST_HEAD(watchdog_list);
 static struct clocksource *watchdog;
@@ -236,6 +237,12 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		/* Check the deviation from the watchdog clocksource. */
 		if (abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
+			if (wd_nsec < jiffies_to_nsecs(WATCHDOG_INTERVAL) - WATCHDOG_THRESHOLD) {
+				pr_err("Stop timekeeping watchdog '%s' because expected interval is too small in %lld ns only\n",
+					watchdog->name, wd_nsec);
+				clocksource_dequeue_watchdog(cs);
+				return;
+			}
 			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
 				smp_processor_id(), cs->name);
 			pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
-- 
2.20.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2] clocksource: Untrust the clocksource watchdog when its interval is too small
  2019-05-16  9:06 [PATCH] clocksource: Untrust the clocksource watchdog when its interval is too small Harry Pan
@ 2019-05-18 14:10 ` " Harry Pan
  2019-05-18 15:26   ` Thomas Gleixner
  2019-05-18 17:45   ` [PATCH v3] clocksource: Untrust the watchdog if " Harry Pan
  0 siblings, 2 replies; 5+ messages in thread
From: Harry Pan @ 2019-05-18 14:10 UTC (permalink / raw)
  To: LKML; +Cc: gs0622, Harry Pan, Stephen Boyd, Thomas Gleixner, John Stultz

This patch performs a sanity check on the deviation of the clocksource watchdog,
target to reduce false alarm that incorrectly marks current clocksource unstable
when there comes discrepancy.

Say if there is a discrepancy between the current clocksource and watchdog,
validate the watchdog deviation first, if its interval is too small against
the expected timer interval, we shall trust the current clocksource.

It is identified on some Coffee Lake platform w/ PC10 allowed, when the CPU
entered and exited from PC10 (the residency counter is increased), the HPET
generates timestamp delay, this causes discrepancy making kernel incorrectly
untrust the current clocksource (TSC in this case) and re-select the next
clocksource which is the problematic HPET, this eventually causes a user
sensible wall clock delay.

The HPET timestamp delay shall be tackled in firmware domain in order to
properly handle the timer offload between XTAL and RTC when it enters PC10,
while this patch is a mitigation to reduce the false alarm of clocksource
unstable regardless what clocksources are paired.

v2: fix resource leak: the locked watchdog_lock

Link: https://bugzilla.kernel.org/show_bug.cgi?id=203183
Signed-off-by: Harry Pan <harry.pan@intel.com>

---

 kernel/time/clocksource.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3bcc19ceb073..090d937d5ec4 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -96,6 +96,7 @@ static u64 suspend_start;
 #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
 static void clocksource_watchdog_work(struct work_struct *work);
 static void clocksource_select(void);
+static void clocksource_dequeue_watchdog(struct clocksource *cs);
 
 static LIST_HEAD(watchdog_list);
 static struct clocksource *watchdog;
@@ -236,6 +237,12 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		/* Check the deviation from the watchdog clocksource. */
 		if (abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
+			if (wd_nsec < jiffies_to_nsecs(WATCHDOG_INTERVAL) - WATCHDOG_THRESHOLD) {
+				pr_err("Stop timekeeping watchdog '%s' because expected interval is too small in %lld ns only\n",
+					watchdog->name, wd_nsec);
+				clocksource_dequeue_watchdog(cs);
+				goto out;
+			}
 			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
 				smp_processor_id(), cs->name);
 			pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
-- 
2.20.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] clocksource: Untrust the clocksource watchdog when its interval is too small
  2019-05-18 14:10 ` [PATCH v2] " Harry Pan
@ 2019-05-18 15:26   ` Thomas Gleixner
       [not found]     ` <CAHECPZMhOmQnvH=usFJoJTmF=Tc74uD+JgE6euWzqwz46LfMMQ@mail.gmail.com>
  2019-05-18 17:45   ` [PATCH v3] clocksource: Untrust the watchdog if " Harry Pan
  1 sibling, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2019-05-18 15:26 UTC (permalink / raw)
  To: Harry Pan
  Cc: LKML, gs0622, Stephen Boyd, John Stultz, x86, Dave Hansen,
	H. Peter Anvin

On Sat, 18 May 2019, Harry Pan wrote:

> This patch performs a sanity check on the deviation of the clocksource watchdog,

Please read Documentation/process/submitting-patches.rst and search for
'This patch'.

> target to reduce false alarm that incorrectly marks current clocksource unstable
> when there comes discrepancy.
> 
> Say if there is a discrepancy between the current clocksource and watchdog,
> validate the watchdog deviation first, if its interval is too small against
> the expected timer interval, we shall trust the current clocksource.
> 
> It is identified on some Coffee Lake platform w/ PC10 allowed, when the CPU
> entered and exited from PC10 (the residency counter is increased), the HPET
> generates timestamp delay, this causes discrepancy making kernel incorrectly
> untrust the current clocksource (TSC in this case) and re-select the next
> clocksource which is the problematic HPET, this eventually causes a user
> sensible wall clock delay.
> 
> The HPET timestamp delay shall be tackled in firmware domain in order to
> properly handle the timer offload between XTAL and RTC when it enters PC10,
> while this patch is a mitigation to reduce the false alarm of clocksource
> unstable regardless what clocksources are paired.

That's completely wrong. If Intel managed to wreckage the HPET then the
HPET needs to be blacklisted on those platforms and not worked around in
the watchdog code. HPET is exposed by other means as well which means these
interfaces are broken.

If we finally could trust the TSC then we could avoid the watchdog mess
completely, but it's still exposed to possible SMM/BIOS wreckage and the
multi-socket unreliability. Sigh, I'm explaining this for almost two
decades to Intel that the kernel needs a trustable, reliable clocksource,
but all we get are more "features" which make timekeeping a trainwreck.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v3] clocksource: Untrust the watchdog if its interval is too small
  2019-05-18 14:10 ` [PATCH v2] " Harry Pan
  2019-05-18 15:26   ` Thomas Gleixner
@ 2019-05-18 17:45   ` " Harry Pan
  1 sibling, 0 replies; 5+ messages in thread
From: Harry Pan @ 2019-05-18 17:45 UTC (permalink / raw)
  To: LKML; +Cc: gs0622, Harry Pan, Stephen Boyd, Thomas Gleixner, John Stultz

Perform sanity check on the watchdog to validate its interval, avoid
to generate a false alarm that incorrectly marks the main clocksource
as unstable when there comes discrepancy.

Say if there is a discrepancy between the current clocksource and watchdog,
validate the watchdog deviation first, if its interval is too small against
the expected timer interval, we shall trust the current clocksource, else
incorrectly kick off the main clocksource could mess up the wall clock.

It is identified on some Coffee Lake platform w/ PC10 allowed, it has a
problematic HPET timer in the platform integration, when the CPU exited
from the low power mode of PC10, the HPET generates timestamp delay, this
causes discrepancy making kernel incorrectly untrust the current clocksource
(TSC in this case) and re-select the next clocksource which is the problematic
HPET, this eventually causes a user sensible wall clock delay.

v2: fix resource leak: the locked watchdog_lock
v3: revise the communication: focus on the timer self validation

Link: https://bugzilla.kernel.org/show_bug.cgi?id=203183
Signed-off-by: Harry Pan <harry.pan@intel.com>

---

 kernel/time/clocksource.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3bcc19ceb073..090d937d5ec4 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -96,6 +96,7 @@ static u64 suspend_start;
 #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
 static void clocksource_watchdog_work(struct work_struct *work);
 static void clocksource_select(void);
+static void clocksource_dequeue_watchdog(struct clocksource *cs);
 
 static LIST_HEAD(watchdog_list);
 static struct clocksource *watchdog;
@@ -236,6 +237,12 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		/* Check the deviation from the watchdog clocksource. */
 		if (abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
+			if (wd_nsec < jiffies_to_nsecs(WATCHDOG_INTERVAL) - WATCHDOG_THRESHOLD) {
+				pr_err("Stop timekeeping watchdog '%s' because expected interval is too small in %lld ns only\n",
+					watchdog->name, wd_nsec);
+				clocksource_dequeue_watchdog(cs);
+				goto out;
+			}
 			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
 				smp_processor_id(), cs->name);
 			pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
-- 
2.20.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] clocksource: Untrust the clocksource watchdog when its interval is too small
       [not found]     ` <CAHECPZMhOmQnvH=usFJoJTmF=Tc74uD+JgE6euWzqwz46LfMMQ@mail.gmail.com>
@ 2019-05-18 18:21       ` Thomas Gleixner
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Gleixner @ 2019-05-18 18:21 UTC (permalink / raw)
  To: Harry Pan
  Cc: Harry Pan, LKML, Stephen Boyd, John Stultz, x86, Dave Hansen,
	H. Peter Anvin

Harry,

On Sun, 19 May 2019, Harry Pan wrote:

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

> I just want to point out: it is wrong if a problematic watchdog clocksource
> kicks off the main clocksource while this watchdog mechanism is unable to
> validate itself through a simple interval check;
> there is no any hardwired knowledge in this patch.

The point is, that any clocksource which is not reliable needs to be marked
as such and cannot be used as watchdog clocksource or as clocksource at all.

You're papering over the underlying problem. If HPET is not longer
reliable, then HPET needs to be blacklisted, not only as clocksource, also
as clockevent device and no exposure via the HPET device interface.

Everything else is just cosmetical surgery. And no, we are not going to
verify whether the watchdog clocksource might be wrong simply because you
create a circular dependency of what is watching what.

Please provide a list of SKUs which are affected and we disable HPET on
those.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, back to index

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-16  9:06 [PATCH] clocksource: Untrust the clocksource watchdog when its interval is too small Harry Pan
2019-05-18 14:10 ` [PATCH v2] " Harry Pan
2019-05-18 15:26   ` Thomas Gleixner
     [not found]     ` <CAHECPZMhOmQnvH=usFJoJTmF=Tc74uD+JgE6euWzqwz46LfMMQ@mail.gmail.com>
2019-05-18 18:21       ` Thomas Gleixner
2019-05-18 17:45   ` [PATCH v3] clocksource: Untrust the watchdog if " Harry Pan

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git