linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] clocksource: improve unstable clocksource detection
       [not found] <cover.1438795461.git.shli@fb.com>
@ 2015-08-05 18:12 ` Shaohua Li
  2015-08-17 16:14   ` Shaohua Li
  2015-08-05 18:12 ` [PATCH 2/2] clocksource: sanity check watchdog clocksource Shaohua Li
  1 sibling, 1 reply; 4+ messages in thread
From: Shaohua Li @ 2015-08-05 18:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Kernel-team, John Stultz, Thomas Gleixner, Ingo Molnar

>From time to time we saw TSC is marked as unstable in our systems, while
the CPUs declare to have stable TSC. Looking at the clocksource unstable
detection, there are two problems:
- watchdog clock source wrap. HPET is the most common watchdog clock
  source. It's 32-bit and runs in 14.3Mhz. That means the hpet counter
  can wrap in about 5 minutes.
- threshold isn't scaled against interval. The threshold is 0.0625s in
  0.5s interval. What if the actual interval is bigger than 0.5s?

The watchdog runs in a timer bh, so hard/soft irq can defer its running.
Heavy network stack softirq can hog a cpu. IPMI driver can disable
interrupt for a very long time. The first problem is mostly we are
suffering I think.

Here is a simple patch to fix the issues. If the waterdog doesn't run
for a long time, we ignore the detection. This should work for the two
problems. For the second one, we probably doen't need to scale if the
interval isn't very long.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 kernel/time/clocksource.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 841b72f..8417c83 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -122,9 +122,10 @@ static int clocksource_watchdog_kthread(void *data);
 static void __clocksource_change_rating(struct clocksource *cs, int rating);
 
 /*
- * Interval: 0.5sec Threshold: 0.0625s
+ * Interval: 0.5sec MaxInterval: 1s Threshold: 0.0625s
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
+#define WATCHDOG_MAX_INTERVAL_NS (NSEC_PER_SEC)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
 
 static void clocksource_watchdog_work(struct work_struct *work)
@@ -217,7 +218,9 @@ static void clocksource_watchdog(unsigned long data)
 			continue;
 
 		/* Check the deviation from the watchdog clocksource. */
-		if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
+		if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) &&
+		    cs_nsec < WATCHDOG_MAX_INTERVAL_NS &&
+		    wd_nsec < WATCHDOG_MAX_INTERVAL_NS) {
 			pr_warn("timekeeping watchdog: Marking clocksource '%s' as unstable because the skew is too large:\n",
 				cs->name);
 			pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] clocksource: sanity check watchdog clocksource
       [not found] <cover.1438795461.git.shli@fb.com>
  2015-08-05 18:12 ` [PATCH 1/2] clocksource: improve unstable clocksource detection Shaohua Li
@ 2015-08-05 18:12 ` Shaohua Li
  1 sibling, 0 replies; 4+ messages in thread
From: Shaohua Li @ 2015-08-05 18:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Kernel-team, John Stultz, Thomas Gleixner, Ingo Molnar

Add a sanity check to make sure watchdog clocksource doesn't wrap too
quickly.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 kernel/time/clocksource.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 8417c83..64e4629 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -327,7 +327,8 @@ static void clocksource_enqueue_watchdog(struct clocksource *cs)
 		if (cs->flags & CLOCK_SOURCE_IS_CONTINUOUS)
 			cs->flags |= CLOCK_SOURCE_VALID_FOR_HRES;
 		/* Pick the best watchdog. */
-		if (!watchdog || cs->rating > watchdog->rating) {
+		if (cs->max_idle_ns > WATCHDOG_MAX_INTERVAL_NS &&
+		    (!watchdog || cs->rating > watchdog->rating)) {
 			watchdog = cs;
 			/* Reset watchdog cycles */
 			clocksource_reset_watchdog();
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] clocksource: improve unstable clocksource detection
  2015-08-05 18:12 ` [PATCH 1/2] clocksource: improve unstable clocksource detection Shaohua Li
@ 2015-08-17 16:14   ` Shaohua Li
  2015-08-17 16:22     ` John Stultz
  0 siblings, 1 reply; 4+ messages in thread
From: Shaohua Li @ 2015-08-17 16:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: Kernel-team, John Stultz, Thomas Gleixner, Ingo Molnar

ping, any comments?

On Wed, Aug 05, 2015 at 11:12:53AM -0700, Shaohua Li wrote:
> From time to time we saw TSC is marked as unstable in our systems, while
> the CPUs declare to have stable TSC. Looking at the clocksource unstable
> detection, there are two problems:
> - watchdog clock source wrap. HPET is the most common watchdog clock
>   source. It's 32-bit and runs in 14.3Mhz. That means the hpet counter
>   can wrap in about 5 minutes.
> - threshold isn't scaled against interval. The threshold is 0.0625s in
>   0.5s interval. What if the actual interval is bigger than 0.5s?
> 
> The watchdog runs in a timer bh, so hard/soft irq can defer its running.
> Heavy network stack softirq can hog a cpu. IPMI driver can disable
> interrupt for a very long time. The first problem is mostly we are
> suffering I think.
> 
> Here is a simple patch to fix the issues. If the waterdog doesn't run
> for a long time, we ignore the detection. This should work for the two
> problems. For the second one, we probably doen't need to scale if the
> interval isn't very long.
> 
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
>  kernel/time/clocksource.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 841b72f..8417c83 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -122,9 +122,10 @@ static int clocksource_watchdog_kthread(void *data);
>  static void __clocksource_change_rating(struct clocksource *cs, int rating);
>  
>  /*
> - * Interval: 0.5sec Threshold: 0.0625s
> + * Interval: 0.5sec MaxInterval: 1s Threshold: 0.0625s
>   */
>  #define WATCHDOG_INTERVAL (HZ >> 1)
> +#define WATCHDOG_MAX_INTERVAL_NS (NSEC_PER_SEC)
>  #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
>  
>  static void clocksource_watchdog_work(struct work_struct *work)
> @@ -217,7 +218,9 @@ static void clocksource_watchdog(unsigned long data)
>  			continue;
>  
>  		/* Check the deviation from the watchdog clocksource. */
> -		if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
> +		if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) &&
> +		    cs_nsec < WATCHDOG_MAX_INTERVAL_NS &&
> +		    wd_nsec < WATCHDOG_MAX_INTERVAL_NS) {
>  			pr_warn("timekeeping watchdog: Marking clocksource '%s' as unstable because the skew is too large:\n",
>  				cs->name);
>  			pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
> -- 
> 1.8.5.6
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] clocksource: improve unstable clocksource detection
  2015-08-17 16:14   ` Shaohua Li
@ 2015-08-17 16:22     ` John Stultz
  0 siblings, 0 replies; 4+ messages in thread
From: John Stultz @ 2015-08-17 16:22 UTC (permalink / raw)
  To: Shaohua Li; +Cc: lkml, Kernel-team, Thomas Gleixner, Ingo Molnar

On Mon, Aug 17, 2015 at 9:14 AM, Shaohua Li <shli@fb.com> wrote:
> ping, any comments?
>
> On Wed, Aug 05, 2015 at 11:12:53AM -0700, Shaohua Li wrote:
>> From time to time we saw TSC is marked as unstable in our systems, while
>> the CPUs declare to have stable TSC. Looking at the clocksource unstable
>> detection, there are two problems:
>> - watchdog clock source wrap. HPET is the most common watchdog clock
>>   source. It's 32-bit and runs in 14.3Mhz. That means the hpet counter
>>   can wrap in about 5 minutes.
>> - threshold isn't scaled against interval. The threshold is 0.0625s in
>>   0.5s interval. What if the actual interval is bigger than 0.5s?
>>
>> The watchdog runs in a timer bh, so hard/soft irq can defer its running.
>> Heavy network stack softirq can hog a cpu. IPMI driver can disable
>> interrupt for a very long time. The first problem is mostly we are
>> suffering I think.
>>
>> Here is a simple patch to fix the issues. If the waterdog doesn't run
>> for a long time, we ignore the detection. This should work for the two
>> problems. For the second one, we probably doen't need to scale if the
>> interval isn't very long.
>>
>> Cc: John Stultz <john.stultz@linaro.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Signed-off-by: Shaohua Li <shli@fb.com>
>> ---
>>  kernel/time/clocksource.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>> index 841b72f..8417c83 100644
>> --- a/kernel/time/clocksource.c
>> +++ b/kernel/time/clocksource.c
>> @@ -122,9 +122,10 @@ static int clocksource_watchdog_kthread(void *data);
>>  static void __clocksource_change_rating(struct clocksource *cs, int rating);
>>
>>  /*
>> - * Interval: 0.5sec Threshold: 0.0625s
>> + * Interval: 0.5sec MaxInterval: 1s Threshold: 0.0625s
>>   */
>>  #define WATCHDOG_INTERVAL (HZ >> 1)
>> +#define WATCHDOG_MAX_INTERVAL_NS (NSEC_PER_SEC)
>>  #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
>>
>>  static void clocksource_watchdog_work(struct work_struct *work)
>> @@ -217,7 +218,9 @@ static void clocksource_watchdog(unsigned long data)
>>                       continue;
>>
>>               /* Check the deviation from the watchdog clocksource. */
>> -             if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
>> +             if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) &&
>> +                 cs_nsec < WATCHDOG_MAX_INTERVAL_NS &&
>> +                 wd_nsec < WATCHDOG_MAX_INTERVAL_NS) {
>>                       pr_warn("timekeeping watchdog: Marking clocksource '%s' as unstable because the skew is too large:\n",
>>                               cs->name);
>>                       pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",

Sorry. I got back from vacation last week and haven't finished queuing
items from my inbox.  It looks reasonable to me. I'm hoping to put
together my patchset for 4.3, today and this hopefully should make it
unless I run into anything concerning.

thanks
-john

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-08-17 16:22 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <cover.1438795461.git.shli@fb.com>
2015-08-05 18:12 ` [PATCH 1/2] clocksource: improve unstable clocksource detection Shaohua Li
2015-08-17 16:14   ` Shaohua Li
2015-08-17 16:22     ` John Stultz
2015-08-05 18:12 ` [PATCH 2/2] clocksource: sanity check watchdog clocksource Shaohua Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).