linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] clocksource: Make clocksource watchdog check with WATCHDOG_INTERVAL period
@ 2022-05-28  1:57 Waiman Long
  2022-05-28  2:07 ` Waiman Long
  0 siblings, 1 reply; 6+ messages in thread
From: Waiman Long @ 2022-05-28  1:57 UTC (permalink / raw)
  To: John Stultz, Thomas Gleixner, Stephen Boyd, Feng Tang, Paul E. McKenney
  Cc: linux-kernel, Michael Larabel, Joe Mario, Michey Mehta, Waiman Long

Since commit c86ff8c55b8a ("clocksource: Avoid accidental unstable
marking of clocksource"), a new WD_READ_SKIP value was introduced
as a possible return value of cs_watchdog_read() to skip the current
check. However, this has an undesriable side effect of extending the
time gap between csnow and cs_last to more than one WATCHDOG_INTERVAL
(0.5s) in case of intermittent WD_READ_SKIP's.

There was an instance of reported clocksource watchdog failure with
the time skew of 485us where the uncertainly threshold is 400us. In
that particular case, the (now - last) gap was about 2s. Looking at
the dmesg log, it was clear there was a successful cs_watchdog_read()
followed by 3 skips and then another successful cs_watchdog_read().

If there is an existing skew between the hpet (watchdog) and tsc
clocksource, enlarging the period by 4x will certainly increase the
measured skew causing it to exceed the threshold in this case. Fix
this variable period problem by resetting the CLOCK_SOURCE_WATCHDOG bit
after each WD_READ_SKIP to force the reloading of wd_last and cs_last
in the next round. This ensures that we have two consecutive successful
cs_watchdog_read()'s before checking the clock skew.

Fixes: c86ff8c55b8a ("clocksource: Avoid accidental unstable marking of clocksource")
Reported-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/time/clocksource.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cee5da1e54c4..173e052c12b6 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -411,9 +411,18 @@ static void clocksource_watchdog(struct timer_list *unused)
 		read_ret = cs_watchdog_read(cs, &csnow, &wdnow);
 
 		if (read_ret != WD_READ_SUCCESS) {
-			if (read_ret == WD_READ_UNSTABLE)
+			if (read_ret == WD_READ_UNSTABLE) {
 				/* Clock readout unreliable, so give it up. */
 				__clocksource_unstable(cs);
+			} else { /* WD_READ_SKIP */
+				/*
+				 * Watchdog clock unstable at the moment,
+				 * discard the stored wd_last and cs_last to
+				 * make sure the gap between now and last
+				 * is always one WATCHDOG_INTERVAL.
+				 */
+				cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
+			}
 			continue;
 		}
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] clocksource: Make clocksource watchdog check with WATCHDOG_INTERVAL period
  2022-05-28  1:57 [PATCH] clocksource: Make clocksource watchdog check with WATCHDOG_INTERVAL period Waiman Long
@ 2022-05-28  2:07 ` Waiman Long
  2022-05-29 22:16   ` Michael Larabel
  0 siblings, 1 reply; 6+ messages in thread
From: Waiman Long @ 2022-05-28  2:07 UTC (permalink / raw)
  To: Thomas Gleixner, Stephen Boyd, Feng Tang, Paul E. McKenney, John Stultz
  Cc: linux-kernel, Michael Larabel, Joe Mario, Michey Mehta

On 5/27/22 21:57, Waiman Long wrote:
> Since commit c86ff8c55b8a ("clocksource: Avoid accidental unstable
> marking of clocksource"), a new WD_READ_SKIP value was introduced
> as a possible return value of cs_watchdog_read() to skip the current
> check. However, this has an undesriable side effect of extending the
> time gap between csnow and cs_last to more than one WATCHDOG_INTERVAL
> (0.5s) in case of intermittent WD_READ_SKIP's.
>
> There was an instance of reported clocksource watchdog failure with
> the time skew of 485us where the uncertainly threshold is 400us. In
> that particular case, the (now - last) gap was about 2s. Looking at
> the dmesg log, it was clear there was a successful cs_watchdog_read()
> followed by 3 skips and then another successful cs_watchdog_read().
>
> If there is an existing skew between the hpet (watchdog) and tsc
> clocksource, enlarging the period by 4x will certainly increase the
> measured skew causing it to exceed the threshold in this case. Fix
> this variable period problem by resetting the CLOCK_SOURCE_WATCHDOG bit
> after each WD_READ_SKIP to force the reloading of wd_last and cs_last
> in the next round. This ensures that we have two consecutive successful
> cs_watchdog_read()'s before checking the clock skew.
>
> Fixes: c86ff8c55b8a ("clocksource: Avoid accidental unstable marking of clocksource")
> Reported-by: Michael Larabel <Michael@phoronix.com>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>   kernel/time/clocksource.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index cee5da1e54c4..173e052c12b6 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -411,9 +411,18 @@ static void clocksource_watchdog(struct timer_list *unused)
>   		read_ret = cs_watchdog_read(cs, &csnow, &wdnow);
>   
>   		if (read_ret != WD_READ_SUCCESS) {
> -			if (read_ret == WD_READ_UNSTABLE)
> +			if (read_ret == WD_READ_UNSTABLE) {
>   				/* Clock readout unreliable, so give it up. */
>   				__clocksource_unstable(cs);
> +			} else { /* WD_READ_SKIP */
> +				/*
> +				 * Watchdog clock unstable at the moment,
> +				 * discard the stored wd_last and cs_last to
> +				 * make sure the gap between now and last
> +				 * is always one WATCHDOG_INTERVAL.
> +				 */
> +				cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
> +			}
>   			continue;
>   		}
>   

Sorry, I accidentally use the old email address for John.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] clocksource: Make clocksource watchdog check with WATCHDOG_INTERVAL period
  2022-05-28  2:07 ` Waiman Long
@ 2022-05-29 22:16   ` Michael Larabel
  2022-05-30  0:16     ` Waiman Long
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Larabel @ 2022-05-29 22:16 UTC (permalink / raw)
  To: Waiman Long, Thomas Gleixner, Stephen Boyd, Feng Tang,
	Paul E. McKenney, John Stultz
  Cc: linux-kernel, Joe Mario, Michey Mehta

On 5/27/22 21:07, Waiman Long wrote:
> On 5/27/22 21:57, Waiman Long wrote:
>> Since commit c86ff8c55b8a ("clocksource: Avoid accidental unstable
>> marking of clocksource"), a new WD_READ_SKIP value was introduced
>> as a possible return value of cs_watchdog_read() to skip the current
>> check. However, this has an undesriable side effect of extending the
>> time gap between csnow and cs_last to more than one WATCHDOG_INTERVAL
>> (0.5s) in case of intermittent WD_READ_SKIP's.
>>
>> There was an instance of reported clocksource watchdog failure with
>> the time skew of 485us where the uncertainly threshold is 400us. In
>> that particular case, the (now - last) gap was about 2s. Looking at
>> the dmesg log, it was clear there was a successful cs_watchdog_read()
>> followed by 3 skips and then another successful cs_watchdog_read().
>>
>> If there is an existing skew between the hpet (watchdog) and tsc
>> clocksource, enlarging the period by 4x will certainly increase the
>> measured skew causing it to exceed the threshold in this case. Fix
>> this variable period problem by resetting the CLOCK_SOURCE_WATCHDOG bit
>> after each WD_READ_SKIP to force the reloading of wd_last and cs_last
>> in the next round. This ensures that we have two consecutive successful
>> cs_watchdog_read()'s before checking the clock skew.
>>
>> Fixes: c86ff8c55b8a ("clocksource: Avoid accidental unstable marking 
>> of clocksource")
>> Reported-by: Michael Larabel <Michael@phoronix.com>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   kernel/time/clocksource.c | 11 ++++++++++-
>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>> index cee5da1e54c4..173e052c12b6 100644
>> --- a/kernel/time/clocksource.c
>> +++ b/kernel/time/clocksource.c
>> @@ -411,9 +411,18 @@ static void clocksource_watchdog(struct 
>> timer_list *unused)
>>           read_ret = cs_watchdog_read(cs, &csnow, &wdnow);
>>             if (read_ret != WD_READ_SUCCESS) {
>> -            if (read_ret == WD_READ_UNSTABLE)
>> +            if (read_ret == WD_READ_UNSTABLE) {
>>                   /* Clock readout unreliable, so give it up. */
>>                   __clocksource_unstable(cs);
>> +            } else { /* WD_READ_SKIP */
>> +                /*
>> +                 * Watchdog clock unstable at the moment,
>> +                 * discard the stored wd_last and cs_last to
>> +                 * make sure the gap between now and last
>> +                 * is always one WATCHDOG_INTERVAL.
>> +                 */
>> +                cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
>> +            }
>>               continue;
>>           }
>
> Sorry, I accidentally use the old email address for John.
>
> Cheers,
> Longman


I've tested this patch on the affected Daytona + Milan-X system and can 
confirm it does fix the performance problem that led to this issue. 
Though it is spamming the kernel log now every half-second with 
clocksource messages,  not sure if that is intended/desirable behavior?


[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 2195.990 MHz processor
[    1.238759] clocksource: tsc-early: mask: 0xffffffffffffffff 
max_cycles: 0x1fa766bc6ba, max_idle_ns: 440795275714 ns
[    2.769608] clocksource: Switched to clocksource tsc-early
[    3.263925] clocksource: wd-tsc-early-wd read-back delay of 292215ns, 
clock-skew test skipped!
[    3.743804] clocksource: wd-tsc-early-wd read-back delay of 268469ns, 
clock-skew test skipped!
[    3.935663] tsc: Refined TSC clocksource calibration: 2195.274 MHz
[    3.935844] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 
0x1fa4c255513, max_idle_ns: 440795289702 ns
[    3.936449] clocksource: Switched to clocksource tsc
[    4.255932] clocksource: wd-tsc-wd read-back delay of 260228ns, 
clock-skew test skipped!
[    4.767892] clocksource: wd-tsc-wd read-back delay of 272520ns, 
clock-skew test skipped!
[    5.247581] clocksource: wd-tsc-wd read-back delay of 200444ns, 
clock-skew test skipped!
[    5.759560] clocksource: wd-tsc-wd read-back delay of 165942ns, 
clock-skew test skipped!
[    6.239687] clocksource: wd-tsc-wd read-back delay of 232222ns, 
clock-skew test skipped!
[    7.264014] clocksource: wd-tsc-wd read-back delay of 282927ns, 
clock-skew test skipped!
[    7.743864] clocksource: wd-tsc-wd read-back delay of 288374ns, 
clock-skew test skipped!
[    8.255590] clocksource: wd-tsc-wd read-back delay of 206730ns, 
clock-skew test skipped!
[    8.767778] clocksource: wd-tsc-wd read-back delay of 267771ns, 
clock-skew test skipped!
[    9.247870] clocksource: wd-tsc-wd read-back delay of 224469ns, 
clock-skew test skipped!
[   10.239340] clocksource: wd-tsc-wd read-back delay of 109720ns, 
clock-skew test skipped!
[   12.255276] clocksource: wd-tsc-wd read-back delay of 104692ns, 
clock-skew test skipped!
[   16.255362] clocksource: wd-tsc-wd read-back delay of 122780ns, 
clock-skew test skipped!
[   17.759335] clocksource: wd-tsc-wd read-back delay of 155885ns, 
clock-skew test skipped!
[   18.239500] clocksource: wd-tsc-wd read-back delay of 176558ns, 
clock-skew test skipped!
[   18.751341] clocksource: wd-tsc-wd read-back delay of 157352ns, 
clock-skew test skipped!
[   19.263618] clocksource: wd-tsc-wd read-back delay of 177606ns, 
clock-skew test skipped!
[   19.743487] clocksource: wd-tsc-wd read-back delay of 157841ns, 
clock-skew test skipped!
[   20.255482] clocksource: wd-tsc-wd read-back delay of 157701ns, 
clock-skew test skipped!
[   20.767634] clocksource: wd-tsc-wd read-back delay of 173136ns, 
clock-skew test skipped!
[   21.247405] clocksource: wd-tsc-wd read-back delay of 175441ns, 
clock-skew test skipped!
...

Thanks,
Michael


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] clocksource: Make clocksource watchdog check with WATCHDOG_INTERVAL period
  2022-05-29 22:16   ` Michael Larabel
@ 2022-05-30  0:16     ` Waiman Long
  2022-05-30  0:43       ` Michael Larabel
  0 siblings, 1 reply; 6+ messages in thread
From: Waiman Long @ 2022-05-30  0:16 UTC (permalink / raw)
  To: Michael Larabel, Thomas Gleixner, Stephen Boyd, Feng Tang,
	Paul E. McKenney, John Stultz
  Cc: linux-kernel, Joe Mario, Michey Mehta

On 5/29/22 18:16, Michael Larabel wrote:
> On 5/27/22 21:07, Waiman Long wrote:
>> On 5/27/22 21:57, Waiman Long wrote:
>>> Since commit c86ff8c55b8a ("clocksource: Avoid accidental unstable
>>> marking of clocksource"), a new WD_READ_SKIP value was introduced
>>> as a possible return value of cs_watchdog_read() to skip the current
>>> check. However, this has an undesriable side effect of extending the
>>> time gap between csnow and cs_last to more than one WATCHDOG_INTERVAL
>>> (0.5s) in case of intermittent WD_READ_SKIP's.
>>>
>>> There was an instance of reported clocksource watchdog failure with
>>> the time skew of 485us where the uncertainly threshold is 400us. In
>>> that particular case, the (now - last) gap was about 2s. Looking at
>>> the dmesg log, it was clear there was a successful cs_watchdog_read()
>>> followed by 3 skips and then another successful cs_watchdog_read().
>>>
>>> If there is an existing skew between the hpet (watchdog) and tsc
>>> clocksource, enlarging the period by 4x will certainly increase the
>>> measured skew causing it to exceed the threshold in this case. Fix
>>> this variable period problem by resetting the CLOCK_SOURCE_WATCHDOG bit
>>> after each WD_READ_SKIP to force the reloading of wd_last and cs_last
>>> in the next round. This ensures that we have two consecutive successful
>>> cs_watchdog_read()'s before checking the clock skew.
>>>
>>> Fixes: c86ff8c55b8a ("clocksource: Avoid accidental unstable marking 
>>> of clocksource")
>>> Reported-by: Michael Larabel <Michael@phoronix.com>
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>> ---
>>>   kernel/time/clocksource.c | 11 ++++++++++-
>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>>> index cee5da1e54c4..173e052c12b6 100644
>>> --- a/kernel/time/clocksource.c
>>> +++ b/kernel/time/clocksource.c
>>> @@ -411,9 +411,18 @@ static void clocksource_watchdog(struct 
>>> timer_list *unused)
>>>           read_ret = cs_watchdog_read(cs, &csnow, &wdnow);
>>>             if (read_ret != WD_READ_SUCCESS) {
>>> -            if (read_ret == WD_READ_UNSTABLE)
>>> +            if (read_ret == WD_READ_UNSTABLE) {
>>>                   /* Clock readout unreliable, so give it up. */
>>>                   __clocksource_unstable(cs);
>>> +            } else { /* WD_READ_SKIP */
>>> +                /*
>>> +                 * Watchdog clock unstable at the moment,
>>> +                 * discard the stored wd_last and cs_last to
>>> +                 * make sure the gap between now and last
>>> +                 * is always one WATCHDOG_INTERVAL.
>>> +                 */
>>> +                cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
>>> +            }
>>>               continue;
>>>           }
>>
>> Sorry, I accidentally use the old email address for John.
>>
>> Cheers,
>> Longman
>
>
> I've tested this patch on the affected Daytona + Milan-X system and 
> can confirm it does fix the performance problem that led to this 
> issue. Though it is spamming the kernel log now every half-second with 
> clocksource messages,  not sure if that is intended/desirable behavior?
>
>
> [    0.000000] tsc: Fast TSC calibration using PIT
> [    0.000000] tsc: Detected 2195.990 MHz processor
> [    1.238759] clocksource: tsc-early: mask: 0xffffffffffffffff 
> max_cycles: 0x1fa766bc6ba, max_idle_ns: 440795275714 ns
> [    2.769608] clocksource: Switched to clocksource tsc-early
> [    3.263925] clocksource: wd-tsc-early-wd read-back delay of 
> 292215ns, clock-skew test skipped!
> [    3.743804] clocksource: wd-tsc-early-wd read-back delay of 
> 268469ns, clock-skew test skipped!
> [    3.935663] tsc: Refined TSC clocksource calibration: 2195.274 MHz
> [    3.935844] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 
> 0x1fa4c255513, max_idle_ns: 440795289702 ns
> [    3.936449] clocksource: Switched to clocksource tsc
> [    4.255932] clocksource: wd-tsc-wd read-back delay of 260228ns, 
> clock-skew test skipped!
> [    4.767892] clocksource: wd-tsc-wd read-back delay of 272520ns, 
> clock-skew test skipped!
> [    5.247581] clocksource: wd-tsc-wd read-back delay of 200444ns, 
> clock-skew test skipped!
> [    5.759560] clocksource: wd-tsc-wd read-back delay of 165942ns, 
> clock-skew test skipped!
> [    6.239687] clocksource: wd-tsc-wd read-back delay of 232222ns, 
> clock-skew test skipped!
> [    7.264014] clocksource: wd-tsc-wd read-back delay of 282927ns, 
> clock-skew test skipped!
> [    7.743864] clocksource: wd-tsc-wd read-back delay of 288374ns, 
> clock-skew test skipped!
> [    8.255590] clocksource: wd-tsc-wd read-back delay of 206730ns, 
> clock-skew test skipped!
> [    8.767778] clocksource: wd-tsc-wd read-back delay of 267771ns, 
> clock-skew test skipped!
> [    9.247870] clocksource: wd-tsc-wd read-back delay of 224469ns, 
> clock-skew test skipped!
> [   10.239340] clocksource: wd-tsc-wd read-back delay of 109720ns, 
> clock-skew test skipped!
> [   12.255276] clocksource: wd-tsc-wd read-back delay of 104692ns, 
> clock-skew test skipped!
> [   16.255362] clocksource: wd-tsc-wd read-back delay of 122780ns, 
> clock-skew test skipped!
> [   17.759335] clocksource: wd-tsc-wd read-back delay of 155885ns, 
> clock-skew test skipped!
> [   18.239500] clocksource: wd-tsc-wd read-back delay of 176558ns, 
> clock-skew test skipped!
> [   18.751341] clocksource: wd-tsc-wd read-back delay of 157352ns, 
> clock-skew test skipped!
> [   19.263618] clocksource: wd-tsc-wd read-back delay of 177606ns, 
> clock-skew test skipped!
> [   19.743487] clocksource: wd-tsc-wd read-back delay of 157841ns, 
> clock-skew test skipped!
> [   20.255482] clocksource: wd-tsc-wd read-back delay of 157701ns, 
> clock-skew test skipped!
> [   20.767634] clocksource: wd-tsc-wd read-back delay of 173136ns, 
> clock-skew test skipped!
> [   21.247405] clocksource: wd-tsc-wd read-back delay of 175441ns, 
> clock-skew test skipped!
> ...
>
> Thanks,
> Michael
>
Thanks for the testing. Did the spamming stop after a while?

It does show that your particular Milan-X CPU(s) have unreliable hpet. 
The only way to stop the spamming is to build a kernel with a larger 
CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US. By default, it is 100us. The 
allowable hpet-hpet delay is only half of that. In this particular case, 
it will have to be set to at least 500 or maybe even 600.

Thanks,
Longman


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] clocksource: Make clocksource watchdog check with WATCHDOG_INTERVAL period
  2022-05-30  0:16     ` Waiman Long
@ 2022-05-30  0:43       ` Michael Larabel
  2022-05-30  2:57         ` Waiman Long
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Larabel @ 2022-05-30  0:43 UTC (permalink / raw)
  To: Waiman Long, Thomas Gleixner, Stephen Boyd, Feng Tang,
	Paul E. McKenney, John Stultz
  Cc: linux-kernel, Joe Mario, Michey Mehta


On 5/29/22 19:16, Waiman Long wrote:
> On 5/29/22 18:16, Michael Larabel wrote:
>> On 5/27/22 21:07, Waiman Long wrote:
>>> On 5/27/22 21:57, Waiman Long wrote:
>>>> Since commit c86ff8c55b8a ("clocksource: Avoid accidental unstable
>>>> marking of clocksource"), a new WD_READ_SKIP value was introduced
>>>> as a possible return value of cs_watchdog_read() to skip the current
>>>> check. However, this has an undesriable side effect of extending the
>>>> time gap between csnow and cs_last to more than one WATCHDOG_INTERVAL
>>>> (0.5s) in case of intermittent WD_READ_SKIP's.
>>>>
>>>> There was an instance of reported clocksource watchdog failure with
>>>> the time skew of 485us where the uncertainly threshold is 400us. In
>>>> that particular case, the (now - last) gap was about 2s. Looking at
>>>> the dmesg log, it was clear there was a successful cs_watchdog_read()
>>>> followed by 3 skips and then another successful cs_watchdog_read().
>>>>
>>>> If there is an existing skew between the hpet (watchdog) and tsc
>>>> clocksource, enlarging the period by 4x will certainly increase the
>>>> measured skew causing it to exceed the threshold in this case. Fix
>>>> this variable period problem by resetting the CLOCK_SOURCE_WATCHDOG 
>>>> bit
>>>> after each WD_READ_SKIP to force the reloading of wd_last and cs_last
>>>> in the next round. This ensures that we have two consecutive 
>>>> successful
>>>> cs_watchdog_read()'s before checking the clock skew.
>>>>
>>>> Fixes: c86ff8c55b8a ("clocksource: Avoid accidental unstable 
>>>> marking of clocksource")
>>>> Reported-by: Michael Larabel <Michael@phoronix.com>
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>>   kernel/time/clocksource.c | 11 ++++++++++-
>>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>>>> index cee5da1e54c4..173e052c12b6 100644
>>>> --- a/kernel/time/clocksource.c
>>>> +++ b/kernel/time/clocksource.c
>>>> @@ -411,9 +411,18 @@ static void clocksource_watchdog(struct 
>>>> timer_list *unused)
>>>>           read_ret = cs_watchdog_read(cs, &csnow, &wdnow);
>>>>             if (read_ret != WD_READ_SUCCESS) {
>>>> -            if (read_ret == WD_READ_UNSTABLE)
>>>> +            if (read_ret == WD_READ_UNSTABLE) {
>>>>                   /* Clock readout unreliable, so give it up. */
>>>>                   __clocksource_unstable(cs);
>>>> +            } else { /* WD_READ_SKIP */
>>>> +                /*
>>>> +                 * Watchdog clock unstable at the moment,
>>>> +                 * discard the stored wd_last and cs_last to
>>>> +                 * make sure the gap between now and last
>>>> +                 * is always one WATCHDOG_INTERVAL.
>>>> +                 */
>>>> +                cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
>>>> +            }
>>>>               continue;
>>>>           }
>>>
>>> Sorry, I accidentally use the old email address for John.
>>>
>>> Cheers,
>>> Longman
>>
>>
>> I've tested this patch on the affected Daytona + Milan-X system and 
>> can confirm it does fix the performance problem that led to this 
>> issue. Though it is spamming the kernel log now every half-second 
>> with clocksource messages,  not sure if that is intended/desirable 
>> behavior?
>>
>>
>> [    0.000000] tsc: Fast TSC calibration using PIT
>> [    0.000000] tsc: Detected 2195.990 MHz processor
>> [    1.238759] clocksource: tsc-early: mask: 0xffffffffffffffff 
>> max_cycles: 0x1fa766bc6ba, max_idle_ns: 440795275714 ns
>> [    2.769608] clocksource: Switched to clocksource tsc-early
>> [    3.263925] clocksource: wd-tsc-early-wd read-back delay of 
>> 292215ns, clock-skew test skipped!
>> [    3.743804] clocksource: wd-tsc-early-wd read-back delay of 
>> 268469ns, clock-skew test skipped!
>> [    3.935663] tsc: Refined TSC clocksource calibration: 2195.274 MHz
>> [    3.935844] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 
>> 0x1fa4c255513, max_idle_ns: 440795289702 ns
>> [    3.936449] clocksource: Switched to clocksource tsc
>> [    4.255932] clocksource: wd-tsc-wd read-back delay of 260228ns, 
>> clock-skew test skipped!
>> [    4.767892] clocksource: wd-tsc-wd read-back delay of 272520ns, 
>> clock-skew test skipped!
>> [    5.247581] clocksource: wd-tsc-wd read-back delay of 200444ns, 
>> clock-skew test skipped!
>> [    5.759560] clocksource: wd-tsc-wd read-back delay of 165942ns, 
>> clock-skew test skipped!
>> [    6.239687] clocksource: wd-tsc-wd read-back delay of 232222ns, 
>> clock-skew test skipped!
>> [    7.264014] clocksource: wd-tsc-wd read-back delay of 282927ns, 
>> clock-skew test skipped!
>> [    7.743864] clocksource: wd-tsc-wd read-back delay of 288374ns, 
>> clock-skew test skipped!
>> [    8.255590] clocksource: wd-tsc-wd read-back delay of 206730ns, 
>> clock-skew test skipped!
>> [    8.767778] clocksource: wd-tsc-wd read-back delay of 267771ns, 
>> clock-skew test skipped!
>> [    9.247870] clocksource: wd-tsc-wd read-back delay of 224469ns, 
>> clock-skew test skipped!
>> [   10.239340] clocksource: wd-tsc-wd read-back delay of 109720ns, 
>> clock-skew test skipped!
>> [   12.255276] clocksource: wd-tsc-wd read-back delay of 104692ns, 
>> clock-skew test skipped!
>> [   16.255362] clocksource: wd-tsc-wd read-back delay of 122780ns, 
>> clock-skew test skipped!
>> [   17.759335] clocksource: wd-tsc-wd read-back delay of 155885ns, 
>> clock-skew test skipped!
>> [   18.239500] clocksource: wd-tsc-wd read-back delay of 176558ns, 
>> clock-skew test skipped!
>> [   18.751341] clocksource: wd-tsc-wd read-back delay of 157352ns, 
>> clock-skew test skipped!
>> [   19.263618] clocksource: wd-tsc-wd read-back delay of 177606ns, 
>> clock-skew test skipped!
>> [   19.743487] clocksource: wd-tsc-wd read-back delay of 157841ns, 
>> clock-skew test skipped!
>> [   20.255482] clocksource: wd-tsc-wd read-back delay of 157701ns, 
>> clock-skew test skipped!
>> [   20.767634] clocksource: wd-tsc-wd read-back delay of 173136ns, 
>> clock-skew test skipped!
>> [   21.247405] clocksource: wd-tsc-wd read-back delay of 175441ns, 
>> clock-skew test skipped!
>> ...
>>
>> Thanks,
>> Michael
>>
> Thanks for the testing. Did the spamming stop after a while?
>
> It does show that your particular Milan-X CPU(s) have unreliable hpet. 
> The only way to stop the spamming is to build a kernel with a larger 
> CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US. By default, it is 100us. The 
> allowable hpet-hpet delay is only half of that. In this particular 
> case, it will have to be set to at least 500 or maybe even 600.
>
> Thanks,
> Longman
>

No, the spamming hadn't stopped. At least as of one hour into the uptime 
it was still spewing every half second. Can confirm tomorrow if it ever 
stops after a longer duration but at least as of one hour of running the 
benchmarks to verify the performance was back to expectations, I had 
shut down that server to move onto other work.

Thanks,
Michael

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] clocksource: Make clocksource watchdog check with WATCHDOG_INTERVAL period
  2022-05-30  0:43       ` Michael Larabel
@ 2022-05-30  2:57         ` Waiman Long
  0 siblings, 0 replies; 6+ messages in thread
From: Waiman Long @ 2022-05-30  2:57 UTC (permalink / raw)
  To: Michael Larabel, Thomas Gleixner, Stephen Boyd, Feng Tang,
	Paul E. McKenney, John Stultz
  Cc: linux-kernel, Joe Mario, Michey Mehta

On 5/29/22 20:43, Michael Larabel wrote:
>
> On 5/29/22 19:16, Waiman Long wrote:
>> On 5/29/22 18:16, Michael Larabel wrote:
>>> On 5/27/22 21:07, Waiman Long wrote:
>>>> On 5/27/22 21:57, Waiman Long wrote:
>>>>> Since commit c86ff8c55b8a ("clocksource: Avoid accidental unstable
>>>>> marking of clocksource"), a new WD_READ_SKIP value was introduced
>>>>> as a possible return value of cs_watchdog_read() to skip the current
>>>>> check. However, this has an undesriable side effect of extending the
>>>>> time gap between csnow and cs_last to more than one WATCHDOG_INTERVAL
>>>>> (0.5s) in case of intermittent WD_READ_SKIP's.
>>>>>
>>>>> There was an instance of reported clocksource watchdog failure with
>>>>> the time skew of 485us where the uncertainly threshold is 400us. In
>>>>> that particular case, the (now - last) gap was about 2s. Looking at
>>>>> the dmesg log, it was clear there was a successful cs_watchdog_read()
>>>>> followed by 3 skips and then another successful cs_watchdog_read().
>>>>>
>>>>> If there is an existing skew between the hpet (watchdog) and tsc
>>>>> clocksource, enlarging the period by 4x will certainly increase the
>>>>> measured skew causing it to exceed the threshold in this case. Fix
>>>>> this variable period problem by resetting the 
>>>>> CLOCK_SOURCE_WATCHDOG bit
>>>>> after each WD_READ_SKIP to force the reloading of wd_last and cs_last
>>>>> in the next round. This ensures that we have two consecutive 
>>>>> successful
>>>>> cs_watchdog_read()'s before checking the clock skew.
>>>>>
>>>>> Fixes: c86ff8c55b8a ("clocksource: Avoid accidental unstable 
>>>>> marking of clocksource")
>>>>> Reported-by: Michael Larabel <Michael@phoronix.com>
>>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>> ---
>>>>>   kernel/time/clocksource.c | 11 ++++++++++-
>>>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>>>>> index cee5da1e54c4..173e052c12b6 100644
>>>>> --- a/kernel/time/clocksource.c
>>>>> +++ b/kernel/time/clocksource.c
>>>>> @@ -411,9 +411,18 @@ static void clocksource_watchdog(struct 
>>>>> timer_list *unused)
>>>>>           read_ret = cs_watchdog_read(cs, &csnow, &wdnow);
>>>>>             if (read_ret != WD_READ_SUCCESS) {
>>>>> -            if (read_ret == WD_READ_UNSTABLE)
>>>>> +            if (read_ret == WD_READ_UNSTABLE) {
>>>>>                   /* Clock readout unreliable, so give it up. */
>>>>>                   __clocksource_unstable(cs);
>>>>> +            } else { /* WD_READ_SKIP */
>>>>> +                /*
>>>>> +                 * Watchdog clock unstable at the moment,
>>>>> +                 * discard the stored wd_last and cs_last to
>>>>> +                 * make sure the gap between now and last
>>>>> +                 * is always one WATCHDOG_INTERVAL.
>>>>> +                 */
>>>>> +                cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
>>>>> +            }
>>>>>               continue;
>>>>>           }
>>>>
>>>> Sorry, I accidentally use the old email address for John.
>>>>
>>>> Cheers,
>>>> Longman
>>>
>>>
>>> I've tested this patch on the affected Daytona + Milan-X system and 
>>> can confirm it does fix the performance problem that led to this 
>>> issue. Though it is spamming the kernel log now every half-second 
>>> with clocksource messages,  not sure if that is intended/desirable 
>>> behavior?
>>>
>>>
>>> [    0.000000] tsc: Fast TSC calibration using PIT
>>> [    0.000000] tsc: Detected 2195.990 MHz processor
>>> [    1.238759] clocksource: tsc-early: mask: 0xffffffffffffffff 
>>> max_cycles: 0x1fa766bc6ba, max_idle_ns: 440795275714 ns
>>> [    2.769608] clocksource: Switched to clocksource tsc-early
>>> [    3.263925] clocksource: wd-tsc-early-wd read-back delay of 
>>> 292215ns, clock-skew test skipped!
>>> [    3.743804] clocksource: wd-tsc-early-wd read-back delay of 
>>> 268469ns, clock-skew test skipped!
>>> [    3.935663] tsc: Refined TSC clocksource calibration: 2195.274 MHz
>>> [    3.935844] clocksource: tsc: mask: 0xffffffffffffffff 
>>> max_cycles: 0x1fa4c255513, max_idle_ns: 440795289702 ns
>>> [    3.936449] clocksource: Switched to clocksource tsc
>>> [    4.255932] clocksource: wd-tsc-wd read-back delay of 260228ns, 
>>> clock-skew test skipped!
>>> [    4.767892] clocksource: wd-tsc-wd read-back delay of 272520ns, 
>>> clock-skew test skipped!
>>> [    5.247581] clocksource: wd-tsc-wd read-back delay of 200444ns, 
>>> clock-skew test skipped!
>>> [    5.759560] clocksource: wd-tsc-wd read-back delay of 165942ns, 
>>> clock-skew test skipped!
>>> [    6.239687] clocksource: wd-tsc-wd read-back delay of 232222ns, 
>>> clock-skew test skipped!
>>> [    7.264014] clocksource: wd-tsc-wd read-back delay of 282927ns, 
>>> clock-skew test skipped!
>>> [    7.743864] clocksource: wd-tsc-wd read-back delay of 288374ns, 
>>> clock-skew test skipped!
>>> [    8.255590] clocksource: wd-tsc-wd read-back delay of 206730ns, 
>>> clock-skew test skipped!
>>> [    8.767778] clocksource: wd-tsc-wd read-back delay of 267771ns, 
>>> clock-skew test skipped!
>>> [    9.247870] clocksource: wd-tsc-wd read-back delay of 224469ns, 
>>> clock-skew test skipped!
>>> [   10.239340] clocksource: wd-tsc-wd read-back delay of 109720ns, 
>>> clock-skew test skipped!
>>> [   12.255276] clocksource: wd-tsc-wd read-back delay of 104692ns, 
>>> clock-skew test skipped!
>>> [   16.255362] clocksource: wd-tsc-wd read-back delay of 122780ns, 
>>> clock-skew test skipped!
>>> [   17.759335] clocksource: wd-tsc-wd read-back delay of 155885ns, 
>>> clock-skew test skipped!
>>> [   18.239500] clocksource: wd-tsc-wd read-back delay of 176558ns, 
>>> clock-skew test skipped!
>>> [   18.751341] clocksource: wd-tsc-wd read-back delay of 157352ns, 
>>> clock-skew test skipped!
>>> [   19.263618] clocksource: wd-tsc-wd read-back delay of 177606ns, 
>>> clock-skew test skipped!
>>> [   19.743487] clocksource: wd-tsc-wd read-back delay of 157841ns, 
>>> clock-skew test skipped!
>>> [   20.255482] clocksource: wd-tsc-wd read-back delay of 157701ns, 
>>> clock-skew test skipped!
>>> [   20.767634] clocksource: wd-tsc-wd read-back delay of 173136ns, 
>>> clock-skew test skipped!
>>> [   21.247405] clocksource: wd-tsc-wd read-back delay of 175441ns, 
>>> clock-skew test skipped!
>>> ...
>>>
>>> Thanks,
>>> Michael
>>>
>> Thanks for the testing. Did the spamming stop after a while?
>>
>> It does show that your particular Milan-X CPU(s) have unreliable 
>> hpet. The only way to stop the spamming is to build a kernel with a 
>> larger CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US. By default, it is 
>> 100us. The allowable hpet-hpet delay is only half of that. In this 
>> particular case, it will have to be set to at least 500 or maybe even 
>> 600.
>>
>> Thanks,
>> Longman
>>
>
> No, the spamming hadn't stopped. At least as of one hour into the 
> uptime it was still spewing every half second. Can confirm tomorrow if 
> it ever stops after a longer duration but at least as of one hour of 
> running the benchmarks to verify the performance was back to 
> expectations, I had shut down that server to move onto other work.

With my past experience, hpet read will slow down significantly when the 
system is running a very cpu intensive workload leading to cpu being 
under thermal stress. Benchmarking is certainly an activity that may 
trigger this problem. However, they usually go away when the cpus are 
not at 100% load anymore and the system can cool down.

If this still happens at idle, there is definitely something wrong with 
the cpus. I don't think we have Milan system internally that can 
reproduce this problem at the moment. However Milan-X has HBM memory 
chiplets on top of the cpu chiplets. That may make heat dissipation 
harder and exacerbate this hpet problem.

Thanks,
Longman


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-05-30  2:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-28  1:57 [PATCH] clocksource: Make clocksource watchdog check with WATCHDOG_INTERVAL period Waiman Long
2022-05-28  2:07 ` Waiman Long
2022-05-29 22:16   ` Michael Larabel
2022-05-30  0:16     ` Waiman Long
2022-05-30  0:43       ` Michael Larabel
2022-05-30  2:57         ` Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).