linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [URGENT rfc patch 0/3] tsc clocksource bug fix
@ 2013-07-04  5:34 Alex Shi
  2013-07-04  5:34 ` [PATCH 1/3] clocksource: clean up clocksource_select Alex Shi
                   ` (4 more replies)
  0 siblings, 5 replies; 34+ messages in thread
From: Alex Shi @ 2013-07-04  5:34 UTC (permalink / raw)
  To: tglx, hpa, tim.c.chen; +Cc: linux-kernel, andi.kleen, a.p.zijlstra, mingo

We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,
like oltp, tbench, hackbench etc. and bisected the commit 5d33b883ae
cause this regression. Due to this commit, the clocksource was changed
to hpet from tsc even tsc will be set CLOCK_SOURCE_VALID_FOR_HRES later
in clocksource_watchdog. 

Tim Chen said the hpet reading cost much. That cause this regression.

This patchset fixed this bug by re-select clocksource after this flag set
on tsc.

BTW,
If the tsc is marked as constant and nonstop, could we set it as system
clocksource when do tsc register? w/o checking it on clocksource_watchdog?

regards!
Alex


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/3] clocksource: clean up clocksource_select
  2013-07-04  5:34 [URGENT rfc patch 0/3] tsc clocksource bug fix Alex Shi
@ 2013-07-04  5:34 ` Alex Shi
  2013-07-04  9:57   ` Thomas Gleixner
  2013-07-04  5:34 ` [PATCH 2/3] clockesource: set override clocksource Alex Shi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: Alex Shi @ 2013-07-04  5:34 UTC (permalink / raw)
  To: tglx, hpa, tim.c.chen; +Cc: linux-kernel, andi.kleen, a.p.zijlstra, mingo

After clocksource_find_best() introduced, it is impossible to get into
some code path. so clean them up.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/time/clocksource.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index e713ef7..021c159 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -582,26 +582,12 @@ static void __clocksource_select(bool skipcur)
 	if (!best)
 		return;
 
-	/* Check for the override clocksource. */
 	list_for_each_entry(cs, &clocksource_list, list) {
 		if (skipcur && cs == curr_clocksource)
 			continue;
 		if (strcmp(cs->name, override_name) != 0)
 			continue;
-		/*
-		 * Check to make sure we don't switch to a non-highres
-		 * capable clocksource if the tick code is in oneshot
-		 * mode (highres or nohz)
-		 */
-		if (!(cs->flags & CLOCK_SOURCE_VALID_FOR_HRES) && oneshot) {
-			/* Override clocksource cannot be used. */
-			printk(KERN_WARNING "Override clocksource %s is not "
-			       "HRT compatible. Cannot switch while in "
-			       "HRT/NOHZ mode\n", cs->name);
-			override_name[0] = 0;
-		} else
-			/* Override clocksource can be used. */
-			best = cs;
+		best = cs;
 		break;
 	}
 
-- 
1.7.12


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/3] clockesource: set override clocksource
  2013-07-04  5:34 [URGENT rfc patch 0/3] tsc clocksource bug fix Alex Shi
  2013-07-04  5:34 ` [PATCH 1/3] clocksource: clean up clocksource_select Alex Shi
@ 2013-07-04  5:34 ` Alex Shi
  2013-07-04 10:23   ` Thomas Gleixner
  2013-07-04  5:34 ` [PATCH 3/3] clocksource: fix can not set tsc as clocksource bug Alex Shi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: Alex Shi @ 2013-07-04  5:34 UTC (permalink / raw)
  To: tglx, hpa, tim.c.chen; +Cc: linux-kernel, andi.kleen, a.p.zijlstra, mingo

Shrink the mutex region. And save a clocksource_select action if set
clocksource is same as current clocksource.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/time/clocksource.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 021c159..9d6c333 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -885,13 +885,15 @@ static ssize_t sysfs_override_clocksource(struct device *dev,
 {
 	size_t ret;
 
-	mutex_lock(&clocksource_mutex);
-
 	ret = sysfs_get_uname(buf, override_name, count);
-	if (ret >= 0)
-		clocksource_select();
+	if (ret >= 0) {
+		if (!strcmp(curr_clocksource->name, override_name))
+			return ret;
 
-	mutex_unlock(&clocksource_mutex);
+		mutex_lock(&clocksource_mutex);
+		clocksource_select();
+		mutex_unlock(&clocksource_mutex);
+	}
 
 	return ret;
 }
-- 
1.7.12


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 3/3] clocksource: fix can not set tsc as clocksource bug
  2013-07-04  5:34 [URGENT rfc patch 0/3] tsc clocksource bug fix Alex Shi
  2013-07-04  5:34 ` [PATCH 1/3] clocksource: clean up clocksource_select Alex Shi
  2013-07-04  5:34 ` [PATCH 2/3] clockesource: set override clocksource Alex Shi
@ 2013-07-04  5:34 ` Alex Shi
  2013-07-04 10:55   ` Thomas Gleixner
  2013-07-04  7:58 ` [URGENT rfc patch 0/3] tsc clocksource bug fix Peter Zijlstra
  2013-07-04 11:00 ` Thomas Gleixner
  4 siblings, 1 reply; 34+ messages in thread
From: Alex Shi @ 2013-07-04  5:34 UTC (permalink / raw)
  To: tglx, hpa, tim.c.chen; +Cc: linux-kernel, andi.kleen, a.p.zijlstra, mingo

commit 5d33b883aed81c6fbcd09c6f7c3619eee850a7e2
    clocksource: Always verify highres capability

This commit will reject a clock to be system clocksource if it has no
CLOCK_SOURCE_VALID_FOR_HRES flags. Then the tsc to be rejected as
clocksource, because this flag for tsc is set in clocksource_watchdog
which run after the tsc register.
TSC registered in tsc_refine_calibration_work() and started the watchdog
at that time.

This patch re-select the clocksource after we make sure the tsc has this
flag. Fixed this bug.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/time/clocksource.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 9d6c333..3ad9f29 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -308,6 +308,8 @@ static void clocksource_watchdog(unsigned long data)
 			 * transition into high-res mode:
 			 */
 			tick_clock_notify();
+			if (finished_booting)
+				schedule_work(&watchdog_work);
 		}
 	}
 
@@ -404,6 +406,7 @@ static void clocksource_dequeue_watchdog(struct clocksource *cs)
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
+static void clocksource_select(void);
 static int clocksource_watchdog_kthread(void *data)
 {
 	struct clocksource *cs, *tmp;
@@ -412,11 +415,14 @@ static int clocksource_watchdog_kthread(void *data)
 
 	mutex_lock(&clocksource_mutex);
 	spin_lock_irqsave(&watchdog_lock, flags);
-	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list)
+	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
 			list_del_init(&cs->wd_list);
 			list_add(&cs->wd_list, &unstable);
 		}
+		if (cs->flags & CLOCK_SOURCE_VALID_FOR_HRES)
+			clocksource_select();
+	}
 	/* Check if the watchdog timer needs to be stopped. */
 	clocksource_stop_watchdog();
 	spin_unlock_irqrestore(&watchdog_lock, flags);
-- 
1.7.12


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-04  5:34 [URGENT rfc patch 0/3] tsc clocksource bug fix Alex Shi
                   ` (2 preceding siblings ...)
  2013-07-04  5:34 ` [PATCH 3/3] clocksource: fix can not set tsc as clocksource bug Alex Shi
@ 2013-07-04  7:58 ` Peter Zijlstra
  2013-07-04  8:13   ` Alex Shi
  2013-07-05 14:23   ` Frederic Weisbecker
  2013-07-04 11:00 ` Thomas Gleixner
  4 siblings, 2 replies; 34+ messages in thread
From: Peter Zijlstra @ 2013-07-04  7:58 UTC (permalink / raw)
  To: Alex Shi; +Cc: tglx, hpa, tim.c.chen, linux-kernel, andi.kleen, mingo

On Thu, Jul 04, 2013 at 01:34:13PM +0800, Alex Shi wrote:

> If the tsc is marked as constant and nonstop, could we set it as system
> clocksource when do tsc register? w/o checking it on clocksource_watchdog?

I'd not do that; the BIOS can still screw you over, we need some validation.

That said; we do need means to disable the clocksource watchdog -- although I
suppose Frederic might already have provided this for this NOHZ efforts when I
wasn't looking.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-04  7:58 ` [URGENT rfc patch 0/3] tsc clocksource bug fix Peter Zijlstra
@ 2013-07-04  8:13   ` Alex Shi
  2013-07-05 14:23   ` Frederic Weisbecker
  1 sibling, 0 replies; 34+ messages in thread
From: Alex Shi @ 2013-07-04  8:13 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: tglx, hpa, tim.c.chen, linux-kernel, andi.kleen, mingo

On 07/04/2013 03:58 PM, Peter Zijlstra wrote:
> On Thu, Jul 04, 2013 at 01:34:13PM +0800, Alex Shi wrote:
> 
>> If the tsc is marked as constant and nonstop, could we set it as system
>> clocksource when do tsc register? w/o checking it on clocksource_watchdog?
> 
> I'd not do that; the BIOS can still screw you over, we need some validation.

I see. thanks!
> 
> That said; we do need means to disable the clocksource watchdog -- although I
> suppose Frederic might already have provided this for this NOHZ efforts when I
> wasn't looking.
> 


-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/3] clocksource: clean up clocksource_select
  2013-07-04  5:34 ` [PATCH 1/3] clocksource: clean up clocksource_select Alex Shi
@ 2013-07-04  9:57   ` Thomas Gleixner
  2013-07-04 10:21     ` Alex Shi
  0 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2013-07-04  9:57 UTC (permalink / raw)
  To: Alex Shi; +Cc: hpa, tim.c.chen, linux-kernel, andi.kleen, a.p.zijlstra, mingo

On Thu, 4 Jul 2013, Alex Shi wrote:

> After clocksource_find_best() introduced, it is impossible to get into
> some code path. so clean them up.

That's wrong.
 
> Signed-off-by: Alex Shi <alex.shi@intel.com>
> ---
>  kernel/time/clocksource.c | 16 +---------------
>  1 file changed, 1 insertion(+), 15 deletions(-)
> 
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index e713ef7..021c159 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -582,26 +582,12 @@ static void __clocksource_select(bool skipcur)
>  	if (!best)
>  		return;
>  
> -	/* Check for the override clocksource. */
>  	list_for_each_entry(cs, &clocksource_list, list) {
>  		if (skipcur && cs == curr_clocksource)
>  			continue;
>  		if (strcmp(cs->name, override_name) != 0)
>  			continue;

We need this check and it is completely unrelated to the problem
you're trying to solve.

Assume the following:

       System boots with clocksource A and switches into highres mode.
       Now clocksource B gets registered and B is not highres capable.

       clocksource_find_best() selects again A, but we have
       clocksource=B on the kernel command line to override the kernel
       decision.

By removing the check, you install he non highres capable clocksource
B and kill the machine.

> -		/*
> -		 * Check to make sure we don't switch to a non-highres
> -		 * capable clocksource if the tick code is in oneshot
> -		 * mode (highres or nohz)
> -		 */
> -		if (!(cs->flags & CLOCK_SOURCE_VALID_FOR_HRES) && oneshot) {
> -			/* Override clocksource cannot be used. */
> -			printk(KERN_WARNING "Override clocksource %s is not "
> -			       "HRT compatible. Cannot switch while in "
> -			       "HRT/NOHZ mode\n", cs->name);
> -			override_name[0] = 0;
> -		} else
> -			/* Override clocksource can be used. */
> -			best = cs;
> +		best = cs;
>  		break;
>  	}

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/3] clocksource: clean up clocksource_select
  2013-07-04  9:57   ` Thomas Gleixner
@ 2013-07-04 10:21     ` Alex Shi
  2013-07-04 10:27       ` Thomas Gleixner
  0 siblings, 1 reply; 34+ messages in thread
From: Alex Shi @ 2013-07-04 10:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: hpa, tim.c.chen, linux-kernel, andi.kleen, a.p.zijlstra, mingo


> We need this check and it is completely unrelated to the problem
> you're trying to solve.
> 
> Assume the following:
> 
>        System boots with clocksource A and switches into highres mode.
>        Now clocksource B gets registered and B is not highres capable.
> 
>        clocksource_find_best() selects again A, but we have
>        clocksource=B on the kernel command line to override the kernel
>        decision.
> 
> By removing the check, you install he non highres capable clocksource
> B and kill the machine.
> 

You'r right. my bad, Sorry!

BTW, why we allow user override a second best clocksource? I mean user
can override the tsc with hpet. because undetected unstable tsc?



-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/3] clockesource: set override clocksource
  2013-07-04  5:34 ` [PATCH 2/3] clockesource: set override clocksource Alex Shi
@ 2013-07-04 10:23   ` Thomas Gleixner
  0 siblings, 0 replies; 34+ messages in thread
From: Thomas Gleixner @ 2013-07-04 10:23 UTC (permalink / raw)
  To: Alex Shi; +Cc: hpa, tim.c.chen, linux-kernel, andi.kleen, a.p.zijlstra, mingo

On Thu, 4 Jul 2013, Alex Shi wrote:

> Shrink the mutex region. And save a clocksource_select action if set
> clocksource is same as current clocksource.

Again, how is that related to the issue described in 0/3 ?

That's an optimization and not a regression fix. And it's wrong as
well.
 
> Signed-off-by: Alex Shi <alex.shi@intel.com>
> ---
>  kernel/time/clocksource.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 021c159..9d6c333 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -885,13 +885,15 @@ static ssize_t sysfs_override_clocksource(struct device *dev,
>  {
>  	size_t ret;
>  
> -	mutex_lock(&clocksource_mutex);
> -
>  	ret = sysfs_get_uname(buf, override_name, count);
> -	if (ret >= 0)
> -		clocksource_select();
> +	if (ret >= 0) {
> +		if (!strcmp(curr_clocksource->name, override_name))

What if you get preempted in the middle of sysfs_get_uname() or in the
middle of strcmp() and some other code triggers a clocksource_select()
while you are off the CPU?

That might end up in a half filled override_name buffer or accessing
memory which might have been freed already because curr_clocksource
changed and the old driver was unloaded.

Not pretty.

If at all we can do:

-	mutex_lock(&clocksource_mutex);
-
-  	ret = sysfs_get_uname(buf, override_name, count);
+  	ret = sysfs_get_uname(buf, tmp_buf, count);
+	if (ret < 0)
+	   	return ret;

+	mutex_lock(&clocksource_mutex);
+	if (strcmp(override_name, tmp_buf) != 0) {
+	   	memcpy(override_name, tmp_buf, sizeof(tmp_buf));		  
+		clocksource_select();
+	}	

	mutex_unlock(&clocksource_mutex);
  	return ret;
}

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/3] clocksource: clean up clocksource_select
  2013-07-04 10:21     ` Alex Shi
@ 2013-07-04 10:27       ` Thomas Gleixner
  2013-07-05  1:22         ` Alex Shi
  0 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2013-07-04 10:27 UTC (permalink / raw)
  To: Alex Shi; +Cc: hpa, tim.c.chen, linux-kernel, andi.kleen, a.p.zijlstra, mingo

On Thu, 4 Jul 2013, Alex Shi wrote:
> > We need this check and it is completely unrelated to the problem
> > you're trying to solve.
> > 
> > Assume the following:
> > 
> >        System boots with clocksource A and switches into highres mode.
> >        Now clocksource B gets registered and B is not highres capable.
> > 
> >        clocksource_find_best() selects again A, but we have
> >        clocksource=B on the kernel command line to override the kernel
> >        decision.
> > 
> > By removing the check, you install he non highres capable clocksource
> > B and kill the machine.
> > 
> 
> You'r right. my bad, Sorry!
> 
> BTW, why we allow user override a second best clocksource? I mean user
> can override the tsc with hpet. because undetected unstable tsc?

The user can decide to override with clocksource=jiffies if he wants
for testing purposes. That wont switch into highres ever if done from
the kernel command line. Now we have a sysfs interface, so we need to
sanity check the user override against highres.

The override is quite useful to test hpet, pmtimer on a machine which
would always select TSC.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] clocksource: fix can not set tsc as clocksource bug
  2013-07-04  5:34 ` [PATCH 3/3] clocksource: fix can not set tsc as clocksource bug Alex Shi
@ 2013-07-04 10:55   ` Thomas Gleixner
  2013-07-04 13:04     ` Thomas Gleixner
  0 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2013-07-04 10:55 UTC (permalink / raw)
  To: Alex Shi; +Cc: hpa, tim.c.chen, linux-kernel, andi.kleen, a.p.zijlstra, mingo

On Thu, 4 Jul 2013, Alex Shi wrote:
> commit 5d33b883aed81c6fbcd09c6f7c3619eee850a7e2
>     clocksource: Always verify highres capability
> 
> This commit will reject a clock to be system clocksource if it has no
> CLOCK_SOURCE_VALID_FOR_HRES flags. Then the tsc to be rejected as

No. It rejects the clocksource if the system is in oneshot mode and
the clocksource does not support HIGHRES.

So at boot time, the tick device mode is PERIODIC and the clocksource
is set to jiffies.

In clocksource_done_booting() we select the best clocksource from the
registered list. We are still in PERIODIC mode, so the selection in
clocksource_find_best() should grab TSC whether the HIGHRES_VALID flag
is set or not. The relevant check in find_best() is:

        if (oneshot && !(cs->flags & CLOCK_SOURCE_VALID_FOR_HRES))
                 continue;                                         

And oneshot should be definitely false at that point. So we don't care
about the HRES flag at all.

So if we select TSC from clocksource_done_booting() that prevents the
system from switching into highres mode as long as the VALID_FOR_HRES
flag is not set by the watchdog. If it gets set then
tick_clock_notify() tells the tick code to reevaluate.

So now the question is why you are observing that HPET is selected in
the first place.

Can you add instrumentation to the code and provide the data please? I
try to reproduce myself. What's the environment you're using?

> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 9d6c333..3ad9f29 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -308,6 +308,8 @@ static void clocksource_watchdog(unsigned long data)
>  			 * transition into high-res mode:
>  			 */
>  			tick_clock_notify();
> +			if (finished_booting)
> +				schedule_work(&watchdog_work);

This only makes sense, when the clocksource which gets the FLAG set
has the highest rating, but was not selected because the system was in
ONESHOT mode already.

And I can't see why that should suddenly happen.

>  static int clocksource_watchdog_kthread(void *data)
>  {
>  	struct clocksource *cs, *tmp;
> @@ -412,11 +415,14 @@ static int clocksource_watchdog_kthread(void *data)
>  
>  	mutex_lock(&clocksource_mutex);
>  	spin_lock_irqsave(&watchdog_lock, flags);
> -	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list)
> +	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
>  		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
>  			list_del_init(&cs->wd_list);
>  			list_add(&cs->wd_list, &unstable);
>  		}
> +		if (cs->flags & CLOCK_SOURCE_VALID_FOR_HRES)
> +			clocksource_select();

Unlikely, but if we have 3 watched clocksources which have the HRES
bit set, you'd call 3 times clocksource_select().

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-04  5:34 [URGENT rfc patch 0/3] tsc clocksource bug fix Alex Shi
                   ` (3 preceding siblings ...)
  2013-07-04  7:58 ` [URGENT rfc patch 0/3] tsc clocksource bug fix Peter Zijlstra
@ 2013-07-04 11:00 ` Thomas Gleixner
  2013-07-04 18:11   ` Davidlohr Bueso
  4 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2013-07-04 11:00 UTC (permalink / raw)
  To: Alex Shi; +Cc: hpa, tim.c.chen, linux-kernel, andi.kleen, a.p.zijlstra, mingo

On Thu, 4 Jul 2013, Alex Shi wrote:

> We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,
> like oltp, tbench, hackbench etc. and bisected the commit 5d33b883ae
> cause this regression. Due to this commit, the clocksource was changed
> to hpet from tsc even tsc will be set CLOCK_SOURCE_VALID_FOR_HRES later
> in clocksource_watchdog. 

5d33b883ae is not in tip/sched/core. So what are you testing and
bisecting?
 
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] clocksource: fix can not set tsc as clocksource bug
  2013-07-04 10:55   ` Thomas Gleixner
@ 2013-07-04 13:04     ` Thomas Gleixner
  2013-07-04 20:46       ` [PATCH] clocksource: Reselect clocksource when watchdog validated Thomas Gleixner
  2013-07-05  2:48       ` [PATCH 3/3] clocksource: fix can not set tsc as clocksource bug Alex Shi
  0 siblings, 2 replies; 34+ messages in thread
From: Thomas Gleixner @ 2013-07-04 13:04 UTC (permalink / raw)
  To: Alex Shi; +Cc: hpa, tim.c.chen, linux-kernel, andi.kleen, a.p.zijlstra, mingo

On Thu, 4 Jul 2013, Thomas Gleixner wrote:
> On Thu, 4 Jul 2013, Alex Shi wrote:
> > commit 5d33b883aed81c6fbcd09c6f7c3619eee850a7e2
> >     clocksource: Always verify highres capability
> > 
> > This commit will reject a clock to be system clocksource if it has no
> > CLOCK_SOURCE_VALID_FOR_HRES flags. Then the tsc to be rejected as
> 
> No. It rejects the clocksource if the system is in oneshot mode and
> the clocksource does not support HIGHRES.
> 
> So at boot time, the tick device mode is PERIODIC and the clocksource
> is set to jiffies.
> 
> In clocksource_done_booting() we select the best clocksource from the
> registered list. We are still in PERIODIC mode, so the selection in
> clocksource_find_best() should grab TSC whether the HIGHRES_VALID flag
> is set or not. The relevant check in find_best() is:
> 
>         if (oneshot && !(cs->flags & CLOCK_SOURCE_VALID_FOR_HRES))
>                  continue;                                         
> 
> And oneshot should be definitely false at that point. So we don't care
> about the HRES flag at all.
> 
> So if we select TSC from clocksource_done_booting() that prevents the
> system from switching into highres mode as long as the VALID_FOR_HRES
> flag is not set by the watchdog. If it gets set then
> tick_clock_notify() tells the tick code to reevaluate.
> 
> So now the question is why you are observing that HPET is selected in
> the first place.
> 
> Can you add instrumentation to the code and provide the data please? I
> try to reproduce myself. What's the environment you're using?

Ok. Figured it out.

Boot.
start tsc revalidation()
register(hpet)
register(pmtimer);

clocksource_done_booting()
	select HPET

switch to oneshot

tsc_revalidation()
	register(TSC)
	clocksource_select()
		keep HPET because TSC is not yet validated by the watchdog

watchdog validates TSC, but that does not make TSC the system
clocksource.

So, the old code was actually installing TSC over HPET even when TSC
was not yet qualified for HRES. Kinda worked :)

So yes, we need somthing like your patch to fix that. 

> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index 9d6c333..3ad9f29 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -308,6 +308,8 @@ static void clocksource_watchdog(unsigned long data)
> >  			 * transition into high-res mode:
> >  			 */
> >  			tick_clock_notify();
> > +			if (finished_booting)
> > +				schedule_work(&watchdog_work);
> 
> This only makes sense, when the clocksource which gets the FLAG set
> has the highest rating, but was not selected because the system was in
> ONESHOT mode already.
> 
> And I can't see why that should suddenly happen.

Now I can :)

> >  static int clocksource_watchdog_kthread(void *data)
> >  {
> >  	struct clocksource *cs, *tmp;
> > @@ -412,11 +415,14 @@ static int clocksource_watchdog_kthread(void *data)
> >  
> >  	mutex_lock(&clocksource_mutex);
> >  	spin_lock_irqsave(&watchdog_lock, flags);
> > -	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list)
> > +	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
> >  		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
> >  			list_del_init(&cs->wd_list);
> >  			list_add(&cs->wd_list, &unstable);
> >  		}
> > +		if (cs->flags & CLOCK_SOURCE_VALID_FOR_HRES)
> > +			clocksource_select();
> 
> Unlikely, but if we have 3 watched clocksources which have the HRES
> bit set, you'd call 3 times clocksource_select().

Also the reselect must be called outside the watchdog_lock region.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-04 11:00 ` Thomas Gleixner
@ 2013-07-04 18:11   ` Davidlohr Bueso
  2013-07-04 20:27     ` Thomas Gleixner
  0 siblings, 1 reply; 34+ messages in thread
From: Davidlohr Bueso @ 2013-07-04 18:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Alex Shi, hpa, tim.c.chen, linux-kernel, andi.kleen, a.p.zijlstra, mingo

On Thu, 2013-07-04 at 13:00 +0200, Thomas Gleixner wrote:
> On Thu, 4 Jul 2013, Alex Shi wrote:
> 
> > We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,
> > like oltp, tbench, hackbench etc. and bisected the commit 5d33b883ae
> > cause this regression. Due to this commit, the clocksource was changed
> > to hpet from tsc even tsc will be set CLOCK_SOURCE_VALID_FOR_HRES later
> > in clocksource_watchdog. 
> 
> 5d33b883ae is not in tip/sched/core. So what are you testing and
> bisecting?

I think he's referring to:

commit 5d33b883aed81c6fbcd09c6f7c3619eee850a7e2
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Thu Apr 25 20:31:43 2013 +0000

    clocksource: Always verify highres capability
    
    If a clocksource has a (wrong) high rating, but can't be used as a
    timebase for oneshot tick mode, it is unconditionally selected even
    when the system is already in oneshot tick mode. This causes full
    system failure.
    
    Verify the clocksource selection against the oneshot mode.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: John Stultz <john.stultz@linaro.org>
    Cc: Magnus Damm <magnus.damm@gmail.com>
    Link: http://lkml.kernel.org/r/20130425143435.635040849@linutronix.de
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-04 18:11   ` Davidlohr Bueso
@ 2013-07-04 20:27     ` Thomas Gleixner
  2013-07-05  1:12       ` Alex Shi
  0 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2013-07-04 20:27 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Alex Shi, hpa, tim.c.chen, linux-kernel, andi.kleen, a.p.zijlstra, mingo

On Thu, 4 Jul 2013, Davidlohr Bueso wrote:

> On Thu, 2013-07-04 at 13:00 +0200, Thomas Gleixner wrote:
> > On Thu, 4 Jul 2013, Alex Shi wrote:
> > 
> > > We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,
> > > like oltp, tbench, hackbench etc. and bisected the commit 5d33b883ae
> > > cause this regression. Due to this commit, the clocksource was changed
> > > to hpet from tsc even tsc will be set CLOCK_SOURCE_VALID_FOR_HRES later
> > > in clocksource_watchdog. 
> > 
> > 5d33b883ae is not in tip/sched/core. So what are you testing and
> > bisecting?
> 
> I think he's referring to:
> 
> commit 5d33b883aed81c6fbcd09c6f7c3619eee850a7e2

I know what he is referring to. He explicitly mentions this commit:

> > > like oltp, tbench, hackbench etc. and bisected the commit 5d33b883ae

What I was pointing out that he was referring to tip sched/core at the
same time

> > > We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,

And that branch does NOT have that commit included. So how can you see
a regression on a branch caused by a commit NOT included into that
branch?

The offending commit is in tip timers/core and not in tip
sched/core. What I'm wanted to say is, that we need a proper
description of problems and not some random association.

It tricked you to assume, that I'm not able to figure it out myself :)

See? These things are complex and subtle, so we need precise
descriptions and not some sloppy semi correct data.

I'm well aware of the issue and with Peters help I got a reasonable
explanation for it. A proper fix is about to be sent out.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH] clocksource: Reselect clocksource when watchdog validated
  2013-07-04 13:04     ` Thomas Gleixner
@ 2013-07-04 20:46       ` Thomas Gleixner
  2013-07-05  1:38         ` Alex Shi
  2013-07-05  9:13         ` [tip:timers/core] clocksource: Reselect clocksource when watchdog validated high-res capability tip-bot for Thomas Gleixner
  2013-07-05  2:48       ` [PATCH 3/3] clocksource: fix can not set tsc as clocksource bug Alex Shi
  1 sibling, 2 replies; 34+ messages in thread
From: Thomas Gleixner @ 2013-07-04 20:46 UTC (permalink / raw)
  To: Alex Shi
  Cc: hpa, tim.c.chen, LKML, andi.kleen, a.p.zijlstra, Ingo Molnar,
	Davidlohr Bueso, John Stultz

Up to commit 5d33b883a (clocksource: Always verify highres capability)
we had no sanity check when selecting a clocksource, which prevented
that a non highres capable clocksource is used when the system already
switched to highres/nohz mode.

The new sanity check works as Alex and Tim found out. It prevents the
TSC from being used. This happens because on x86 the boot process
looks like this:

 tsc_start_freqency_validation(TSC);
 clocksource_register(HPET);
 clocksource_done_booting();
	clocksource_select()
		Selects HPET which is valid for high-res

 switch_to_highres();

 clocksource_register(TSC); 
 	TSC is not selected, because it is not yet
	flagged as VALID_HIGH_RES

 clocksource_watchdog()
	Validates TSC for highres, but that does not make TSC
	the current clocksource.

Before the sanity check was added, we installed TSC unvalidated which
worked most of the time. If the TSC was really detected as unstable,
then the unstable logic removed it and installed HPET again.

The sanity check is correct and needed. So the watchdog needs to kick
a reselection of the clocksource, when it qualifies TSC as a valid
high res clocksource.

To solve this, we mark the clocksource which got the flag
CLOCK_SOURCE_VALID_FOR_HRES set by the watchdog with an new flag
CLOCK_SOURCE_RESELECT and trigger the watchdog thread. The watchdog
thread evaluates the flag and invokes clocksource_select() when set.

To avoid that the clocksource_done_booting() code, which is about to
install the first real clocksource anyway, needs to go through
clocksource_select and tick_oneshot_notify() pointlessly, split out
the clocksource_watchdog_kthread() list walk code and invoke the
select/notify only when called from clocksource_watchdog_kthread().

So clocksource_done_booting() can utilize the same splitout code
without the select/notify invocation and the clocksource_mutex
unlock/relock dance.

Reported-by: Alex Shi <alex.shi@intel.com>
Cc: Hans Peter Anvin <hpa@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/clocksource.h |    1 
 kernel/time/clocksource.c   |   57 ++++++++++++++++++++++++++++++++------------
 2 files changed, 43 insertions(+), 15 deletions(-)

Index: tip/include/linux/clocksource.h
===================================================================
--- tip.orig/include/linux/clocksource.h
+++ tip/include/linux/clocksource.h
@@ -210,6 +210,7 @@ struct clocksource {
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
+#define CLOCK_SOURCE_RESELECT			0x100
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
Index: tip/kernel/time/clocksource.c
===================================================================
--- tip.orig/kernel/time/clocksource.c
+++ tip/kernel/time/clocksource.c
@@ -181,6 +181,7 @@ static int finished_booting;
 
 #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
 static void clocksource_watchdog_work(struct work_struct *work);
+static void clocksource_select(void);
 
 static LIST_HEAD(watchdog_list);
 static struct clocksource *watchdog;
@@ -301,13 +302,30 @@ static void clocksource_watchdog(unsigne
 		if (!(cs->flags & CLOCK_SOURCE_VALID_FOR_HRES) &&
 		    (cs->flags & CLOCK_SOURCE_IS_CONTINUOUS) &&
 		    (watchdog->flags & CLOCK_SOURCE_IS_CONTINUOUS)) {
+			/* Mark it valid for high-res. */
 			cs->flags |= CLOCK_SOURCE_VALID_FOR_HRES;
+
+			/*
+			 * clocksource_done_booting() will sort it if
+			 * finished_booting is not set yet.
+			 */
+			if (!finished_booting)
+				continue;
+
 			/*
-			 * We just marked the clocksource as highres-capable,
-			 * notify the rest of the system as well so that we
-			 * transition into high-res mode:
+			 * If this is not the current clocksource let
+			 * the watchdog thread reselect it. Due to the
+			 * change to high res this clocksource might
+			 * be preferred now. If it is the current
+			 * clocksource let the tick code know about
+			 * that change.
 			 */
-			tick_clock_notify();
+			if (cs != curr_clocksource) {
+				cs->flags |= CLOCK_SOURCE_RESELECT;
+				schedule_work(&watchdog_work);
+			} else {
+				tick_clock_notify();
+			}
 		}
 	}
 
@@ -404,19 +422,25 @@ static void clocksource_dequeue_watchdog
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
-static int clocksource_watchdog_kthread(void *data)
+static int __clocksource_watchdog_kthread(void)
 {
 	struct clocksource *cs, *tmp;
 	unsigned long flags;
 	LIST_HEAD(unstable);
+	int select = 0;
 
-	mutex_lock(&clocksource_mutex);
 	spin_lock_irqsave(&watchdog_lock, flags);
-	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list)
+	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
 			list_del_init(&cs->wd_list);
 			list_add(&cs->wd_list, &unstable);
+			select = 1;
+		}
+		if (cs->flags & CLOCK_SOURCE_RESELECT) {
+			cs->flags &= ~CLOCK_SOURCE_RESELECT;
+			select = 1;
 		}
+	}
 	/* Check if the watchdog timer needs to be stopped. */
 	clocksource_stop_watchdog();
 	spin_unlock_irqrestore(&watchdog_lock, flags);
@@ -426,6 +450,14 @@ static int clocksource_watchdog_kthread(
 		list_del_init(&cs->wd_list);
 		__clocksource_change_rating(cs, 0);
 	}
+	return select;
+}
+
+static int clocksource_watchdog_kthread(void *data)
+{
+	mutex_lock(&clocksource_mutex);
+	if (__clocksource_watchdog_kthread())
+		clocksource_select();
 	mutex_unlock(&clocksource_mutex);
 	return 0;
 }
@@ -445,7 +477,7 @@ static void clocksource_enqueue_watchdog
 
 static inline void clocksource_dequeue_watchdog(struct clocksource *cs) { }
 static inline void clocksource_resume_watchdog(void) { }
-static inline int clocksource_watchdog_kthread(void *data) { return 0; }
+static inline int __clocksource_watchdog_kthread(void) { return 0; }
 static bool clocksource_is_watchdog(struct clocksource *cs) { return false; }
 
 #endif /* CONFIG_CLOCKSOURCE_WATCHDOG */
@@ -647,16 +679,11 @@ static int __init clocksource_done_booti
 {
 	mutex_lock(&clocksource_mutex);
 	curr_clocksource = clocksource_default_clock();
-	mutex_unlock(&clocksource_mutex);
-
 	finished_booting = 1;
-
 	/*
 	 * Run the watchdog first to eliminate unstable clock sources
 	 */
-	clocksource_watchdog_kthread(NULL);
-
-	mutex_lock(&clocksource_mutex);
+	__clocksource_watchdog_kthread();
 	clocksource_select();
 	mutex_unlock(&clocksource_mutex);
 	return 0;
@@ -789,7 +816,6 @@ static void __clocksource_change_rating(
 	list_del(&cs->list);
 	cs->rating = rating;
 	clocksource_enqueue(cs);
-	clocksource_select();
 }
 
 /**
@@ -801,6 +827,7 @@ void clocksource_change_rating(struct cl
 {
 	mutex_lock(&clocksource_mutex);
 	__clocksource_change_rating(cs, rating);
+	clocksource_select();
 	mutex_unlock(&clocksource_mutex);
 }
 EXPORT_SYMBOL(clocksource_change_rating);

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-04 20:27     ` Thomas Gleixner
@ 2013-07-05  1:12       ` Alex Shi
  2013-07-05  5:58         ` Thomas Gleixner
  0 siblings, 1 reply; 34+ messages in thread
From: Alex Shi @ 2013-07-05  1:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Davidlohr Bueso, hpa, tim.c.chen, linux-kernel, andi.kleen,
	a.p.zijlstra, mingo

On 07/05/2013 04:27 AM, Thomas Gleixner wrote:
>>>> We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,
> And that branch does NOT have that commit included. So how can you see
> a regression on a branch caused by a commit NOT included into that
> branch?
> 
> The offending commit is in tip timers/core and not in tip
> sched/core. What I'm wanted to say is, that we need a proper
> description of problems and not some random association.
> 
> It tricked you to assume, that I'm not able to figure it out myself :)
> 
> See? These things are complex and subtle, so we need precise
> descriptions and not some sloppy semi correct data.
> 
> I'm well aware of the issue and with Peters help I got a reasonable
> explanation for it. A proper fix is about to be sent out.
> 

Ingo had merged your branch into sched/core. :)

commit f9bed7021710a3e45c331f7d7781de914cc1b939
Merge: 7e76057 67dd331
Author: Ingo Molnar <mingo@kernel.org>
Date:   Wed May 29 11:21:59 2013 +0200

    Merge branch 'timers/urgent'

...

commit 7d194f78bde64ec813c1ed8291181bdd61515e78
Merge: 0298bf7 1eaff67
Author: Ingo Molnar <mingo@kernel.org>
Date:   Tue May 28 09:53:41 2013 +0200

    Merge branch 'timers/core'

commit 0298bf70644d7334bec16ae47f3aa58f4f883b59
Merge: cc662fa 2938d27
Author: Ingo Molnar <mingo@kernel.org>
Date:   Tue May 28 09:49:51 2013 +0200

    Merge branch 'timers/urgent'

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/3] clocksource: clean up clocksource_select
  2013-07-04 10:27       ` Thomas Gleixner
@ 2013-07-05  1:22         ` Alex Shi
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Shi @ 2013-07-05  1:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: hpa, tim.c.chen, linux-kernel, andi.kleen, a.p.zijlstra, mingo

On 07/04/2013 06:27 PM, Thomas Gleixner wrote:
>> > BTW, why we allow user override a second best clocksource? I mean user
>> > can override the tsc with hpet. because undetected unstable tsc?
> The user can decide to override with clocksource=jiffies if he wants
> for testing purposes. That wont switch into highres ever if done from
> the kernel command line. Now we have a sysfs interface, so we need to
> sanity check the user override against highres.
> 
> The override is quite useful to test hpet, pmtimer on a machine which
> would always select TSC.

Got it. thanks a lot for friendly explanation!

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] clocksource: Reselect clocksource when watchdog validated
  2013-07-04 20:46       ` [PATCH] clocksource: Reselect clocksource when watchdog validated Thomas Gleixner
@ 2013-07-05  1:38         ` Alex Shi
  2013-07-05  9:13         ` [tip:timers/core] clocksource: Reselect clocksource when watchdog validated high-res capability tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 34+ messages in thread
From: Alex Shi @ 2013-07-05  1:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: hpa, tim.c.chen, LKML, andi.kleen, a.p.zijlstra, Ingo Molnar,
	Davidlohr Bueso, John Stultz

On 07/05/2013 04:46 AM, Thomas Gleixner wrote:
> 
> Reported-by: Alex Shi <alex.shi@intel.com>
> Cc: Hans Peter Anvin <hpa@linux.intel.com>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Andi Kleen <andi.kleen@intel.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Glad to the problem fixed. Thanks!

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] clocksource: fix can not set tsc as clocksource bug
  2013-07-04 13:04     ` Thomas Gleixner
  2013-07-04 20:46       ` [PATCH] clocksource: Reselect clocksource when watchdog validated Thomas Gleixner
@ 2013-07-05  2:48       ` Alex Shi
  2013-07-05  5:41         ` Thomas Gleixner
  1 sibling, 1 reply; 34+ messages in thread
From: Alex Shi @ 2013-07-05  2:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: hpa, tim.c.chen, linux-kernel, andi.kleen, a.p.zijlstra, mingo

On 07/04/2013 09:04 PM, Thomas Gleixner wrote:
>>> > >  static int clocksource_watchdog_kthread(void *data)
>>> > >  {
>>> > >  	struct clocksource *cs, *tmp;
>>> > > @@ -412,11 +415,14 @@ static int clocksource_watchdog_kthread(void *data)
>>> > >  
>>> > >  	mutex_lock(&clocksource_mutex);
>>> > >  	spin_lock_irqsave(&watchdog_lock, flags);
>>> > > -	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list)
>>> > > +	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
>>> > >  		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
>>> > >  			list_del_init(&cs->wd_list);
>>> > >  			list_add(&cs->wd_list, &unstable);
>>> > >  		}
>>> > > +		if (cs->flags & CLOCK_SOURCE_VALID_FOR_HRES)
>>> > > +			clocksource_select();
>> > 
>> > Unlikely, but if we have 3 watched clocksources which have the HRES
>> > bit set, you'd call 3 times clocksource_select().
> Also the reselect must be called outside the watchdog_lock region.

Sorry for stupid, the watchdog_lock used protect watchdog related
resource. but clocksource_select doesn't touch them. So, I know it isn't
necessary to put reselect under this lock. Just don't know why the
reselect *must* be called outside of it?

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] clocksource: fix can not set tsc as clocksource bug
  2013-07-05  2:48       ` [PATCH 3/3] clocksource: fix can not set tsc as clocksource bug Alex Shi
@ 2013-07-05  5:41         ` Thomas Gleixner
  2013-07-05  6:44           ` Alex Shi
  0 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2013-07-05  5:41 UTC (permalink / raw)
  To: Alex Shi; +Cc: hpa, tim.c.chen, linux-kernel, andi.kleen, a.p.zijlstra, mingo

On Fri, 5 Jul 2013, Alex Shi wrote:
> On 07/04/2013 09:04 PM, Thomas Gleixner wrote:
> >>> > >  static int clocksource_watchdog_kthread(void *data)
> >>> > >  {
> >>> > >  	struct clocksource *cs, *tmp;
> >>> > > @@ -412,11 +415,14 @@ static int clocksource_watchdog_kthread(void *data)
> >>> > >  
> >>> > >  	mutex_lock(&clocksource_mutex);
> >>> > >  	spin_lock_irqsave(&watchdog_lock, flags);
> >>> > > -	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list)
> >>> > > +	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
> >>> > >  		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
> >>> > >  			list_del_init(&cs->wd_list);
> >>> > >  			list_add(&cs->wd_list, &unstable);
> >>> > >  		}
> >>> > > +		if (cs->flags & CLOCK_SOURCE_VALID_FOR_HRES)
> >>> > > +			clocksource_select();
> >> > 
> >> > Unlikely, but if we have 3 watched clocksources which have the HRES
> >> > bit set, you'd call 3 times clocksource_select().
> > Also the reselect must be called outside the watchdog_lock region.
> 
> Sorry for stupid, the watchdog_lock used protect watchdog related
> resource. but clocksource_select doesn't touch them. So, I know it isn't
> necessary to put reselect under this lock. Just don't know why the
> reselect *must* be called outside of it?

  clocksource_select()
    timekeeping_notify()
      stop_machine()
        get_online_cpus()
	  might_sleep()
	  mutex_lock()

So we need to be in preemptible context when we call clocksource_select().

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-05  1:12       ` Alex Shi
@ 2013-07-05  5:58         ` Thomas Gleixner
  2013-07-05  6:28           ` Alex Shi
  0 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2013-07-05  5:58 UTC (permalink / raw)
  To: Alex Shi
  Cc: Davidlohr Bueso, hpa, tim.c.chen, linux-kernel, andi.kleen,
	a.p.zijlstra, mingo

On Fri, 5 Jul 2013, Alex Shi wrote:
> On 07/05/2013 04:27 AM, Thomas Gleixner wrote:
> >>>> We find some benchmarks drop a lot on tip/sched/core on many Intel boxes,
> > And that branch does NOT have that commit included. So how can you see
> > a regression on a branch caused by a commit NOT included into that
> > branch?
> > 
> > The offending commit is in tip timers/core and not in tip
> > sched/core. What I'm wanted to say is, that we need a proper
> > description of problems and not some random association.
> > 
> > It tricked you to assume, that I'm not able to figure it out myself :)
> > 
> > See? These things are complex and subtle, so we need precise
> > descriptions and not some sloppy semi correct data.
> > 
> > I'm well aware of the issue and with Peters help I got a reasonable
> > explanation for it. A proper fix is about to be sent out.
> > 
> 
> Ingo had merged your branch into sched/core. :)
> 
> commit f9bed7021710a3e45c331f7d7781de914cc1b939
> Merge: 7e76057 67dd331
> Author: Ingo Molnar <mingo@kernel.org>
> Date:   Wed May 29 11:21:59 2013 +0200
> 
>     Merge branch 'timers/urgent'

Not really.

tip$ git branch --contains f9bed70217
* master

tip$ git branch --contains 5d33b883ae
* master
  timers/core

So you are testing tip/master not tip/sched/core.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-05  5:58         ` Thomas Gleixner
@ 2013-07-05  6:28           ` Alex Shi
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Shi @ 2013-07-05  6:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Davidlohr Bueso, hpa, tim.c.chen, linux-kernel, andi.kleen,
	a.p.zijlstra, mingo

On 07/05/2013 01:58 PM, Thomas Gleixner wrote:
>> > 
>> > Ingo had merged your branch into sched/core. :)
>> > 
>> > commit f9bed7021710a3e45c331f7d7781de914cc1b939
>> > Merge: 7e76057 67dd331
>> > Author: Ingo Molnar <mingo@kernel.org>
>> > Date:   Wed May 29 11:21:59 2013 +0200
>> > 
>> >     Merge branch 'timers/urgent'
> Not really.
> 
> tip$ git branch --contains f9bed70217
> * master
> 
> tip$ git branch --contains 5d33b883ae
> * master
>   timers/core
> 
> So you are testing tip/master not tip/sched/core.

You'r right. I mixed them. sorry.

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] clocksource: fix can not set tsc as clocksource bug
  2013-07-05  5:41         ` Thomas Gleixner
@ 2013-07-05  6:44           ` Alex Shi
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Shi @ 2013-07-05  6:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: hpa, tim.c.chen, linux-kernel, andi.kleen, a.p.zijlstra, mingo

On 07/05/2013 01:41 PM, Thomas Gleixner wrote:
>>>>> > >> > Unlikely, but if we have 3 watched clocksources which have the HRES
>>>>> > >> > bit set, you'd call 3 times clocksource_select().
>>> > > Also the reselect must be called outside the watchdog_lock region.
>> > 
>> > Sorry for stupid, the watchdog_lock used protect watchdog related
>> > resource. but clocksource_select doesn't touch them. So, I know it isn't
>> > necessary to put reselect under this lock. Just don't know why the
>> > reselect *must* be called outside of it?
>   clocksource_select()
>     timekeeping_notify()
>       stop_machine()
>         get_online_cpus()
> 	  might_sleep()
> 	  mutex_lock()
> 
> So we need to be in preemptible context when we call clocksource_select().

understand! many thanks!

-- 
Thanks
    Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [tip:timers/core] clocksource: Reselect clocksource when watchdog validated high-res capability
  2013-07-04 20:46       ` [PATCH] clocksource: Reselect clocksource when watchdog validated Thomas Gleixner
  2013-07-05  1:38         ` Alex Shi
@ 2013-07-05  9:13         ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 34+ messages in thread
From: tip-bot for Thomas Gleixner @ 2013-07-05  9:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, davidlohr.bueso, a.p.zijlstra,
	tim.c.chen, alex.shi, andi.kleen, john.stultz, tglx, hpa

Commit-ID:  332962f2c88868ed3cdab466870baaa34dd58612
Gitweb:     http://git.kernel.org/tip/332962f2c88868ed3cdab466870baaa34dd58612
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 4 Jul 2013 22:46:45 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 5 Jul 2013 11:09:28 +0200

clocksource: Reselect clocksource when watchdog validated high-res capability

Up to commit 5d33b883a (clocksource: Always verify highres capability)
we had no sanity check when selecting a clocksource, which prevented
that a non highres capable clocksource is used when the system already
switched to highres/nohz mode.

The new sanity check works as Alex and Tim found out. It prevents the
TSC from being used. This happens because on x86 the boot process
looks like this:

 tsc_start_freqency_validation(TSC);
 clocksource_register(HPET);
 clocksource_done_booting();
	clocksource_select()
		Selects HPET which is valid for high-res

 switch_to_highres();

 clocksource_register(TSC);
 	TSC is not selected, because it is not yet
	flagged as VALID_HIGH_RES

 clocksource_watchdog()
	Validates TSC for highres, but that does not make TSC
	the current clocksource.

Before the sanity check was added, we installed TSC unvalidated which
worked most of the time. If the TSC was really detected as unstable,
then the unstable logic removed it and installed HPET again.

The sanity check is correct and needed. So the watchdog needs to kick
a reselection of the clocksource, when it qualifies TSC as a valid
high res clocksource.

To solve this, we mark the clocksource which got the flag
CLOCK_SOURCE_VALID_FOR_HRES set by the watchdog with an new flag
CLOCK_SOURCE_RESELECT and trigger the watchdog thread. The watchdog
thread evaluates the flag and invokes clocksource_select() when set.

To avoid that the clocksource_done_booting() code, which is about to
install the first real clocksource anyway, needs to go through
clocksource_select and tick_oneshot_notify() pointlessly, split out
the clocksource_watchdog_kthread() list walk code and invoke the
select/notify only when called from clocksource_watchdog_kthread().

So clocksource_done_booting() can utilize the same splitout code
without the select/notify invocation and the clocksource_mutex
unlock/relock dance.

Reported-and-tested-by: Alex Shi <alex.shi@intel.com>
Cc: Hans Peter Anvin <hpa@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Tested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307042239150.11637@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/clocksource.h |  1 +
 kernel/time/clocksource.c   | 57 +++++++++++++++++++++++++++++++++------------
 2 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 2f39a49..dbbf8aa 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -210,6 +210,7 @@ struct clocksource {
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
+#define CLOCK_SOURCE_RESELECT			0x100
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index e713ef7..50a8736 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -181,6 +181,7 @@ static int finished_booting;
 
 #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
 static void clocksource_watchdog_work(struct work_struct *work);
+static void clocksource_select(void);
 
 static LIST_HEAD(watchdog_list);
 static struct clocksource *watchdog;
@@ -301,13 +302,30 @@ static void clocksource_watchdog(unsigned long data)
 		if (!(cs->flags & CLOCK_SOURCE_VALID_FOR_HRES) &&
 		    (cs->flags & CLOCK_SOURCE_IS_CONTINUOUS) &&
 		    (watchdog->flags & CLOCK_SOURCE_IS_CONTINUOUS)) {
+			/* Mark it valid for high-res. */
 			cs->flags |= CLOCK_SOURCE_VALID_FOR_HRES;
+
+			/*
+			 * clocksource_done_booting() will sort it if
+			 * finished_booting is not set yet.
+			 */
+			if (!finished_booting)
+				continue;
+
 			/*
-			 * We just marked the clocksource as highres-capable,
-			 * notify the rest of the system as well so that we
-			 * transition into high-res mode:
+			 * If this is not the current clocksource let
+			 * the watchdog thread reselect it. Due to the
+			 * change to high res this clocksource might
+			 * be preferred now. If it is the current
+			 * clocksource let the tick code know about
+			 * that change.
 			 */
-			tick_clock_notify();
+			if (cs != curr_clocksource) {
+				cs->flags |= CLOCK_SOURCE_RESELECT;
+				schedule_work(&watchdog_work);
+			} else {
+				tick_clock_notify();
+			}
 		}
 	}
 
@@ -404,19 +422,25 @@ static void clocksource_dequeue_watchdog(struct clocksource *cs)
 	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
-static int clocksource_watchdog_kthread(void *data)
+static int __clocksource_watchdog_kthread(void)
 {
 	struct clocksource *cs, *tmp;
 	unsigned long flags;
 	LIST_HEAD(unstable);
+	int select = 0;
 
-	mutex_lock(&clocksource_mutex);
 	spin_lock_irqsave(&watchdog_lock, flags);
-	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list)
+	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
 			list_del_init(&cs->wd_list);
 			list_add(&cs->wd_list, &unstable);
+			select = 1;
+		}
+		if (cs->flags & CLOCK_SOURCE_RESELECT) {
+			cs->flags &= ~CLOCK_SOURCE_RESELECT;
+			select = 1;
 		}
+	}
 	/* Check if the watchdog timer needs to be stopped. */
 	clocksource_stop_watchdog();
 	spin_unlock_irqrestore(&watchdog_lock, flags);
@@ -426,6 +450,14 @@ static int clocksource_watchdog_kthread(void *data)
 		list_del_init(&cs->wd_list);
 		__clocksource_change_rating(cs, 0);
 	}
+	return select;
+}
+
+static int clocksource_watchdog_kthread(void *data)
+{
+	mutex_lock(&clocksource_mutex);
+	if (__clocksource_watchdog_kthread())
+		clocksource_select();
 	mutex_unlock(&clocksource_mutex);
 	return 0;
 }
@@ -445,7 +477,7 @@ static void clocksource_enqueue_watchdog(struct clocksource *cs)
 
 static inline void clocksource_dequeue_watchdog(struct clocksource *cs) { }
 static inline void clocksource_resume_watchdog(void) { }
-static inline int clocksource_watchdog_kthread(void *data) { return 0; }
+static inline int __clocksource_watchdog_kthread(void) { return 0; }
 static bool clocksource_is_watchdog(struct clocksource *cs) { return false; }
 
 #endif /* CONFIG_CLOCKSOURCE_WATCHDOG */
@@ -647,16 +679,11 @@ static int __init clocksource_done_booting(void)
 {
 	mutex_lock(&clocksource_mutex);
 	curr_clocksource = clocksource_default_clock();
-	mutex_unlock(&clocksource_mutex);
-
 	finished_booting = 1;
-
 	/*
 	 * Run the watchdog first to eliminate unstable clock sources
 	 */
-	clocksource_watchdog_kthread(NULL);
-
-	mutex_lock(&clocksource_mutex);
+	__clocksource_watchdog_kthread();
 	clocksource_select();
 	mutex_unlock(&clocksource_mutex);
 	return 0;
@@ -789,7 +816,6 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating)
 	list_del(&cs->list);
 	cs->rating = rating;
 	clocksource_enqueue(cs);
-	clocksource_select();
 }
 
 /**
@@ -801,6 +827,7 @@ void clocksource_change_rating(struct clocksource *cs, int rating)
 {
 	mutex_lock(&clocksource_mutex);
 	__clocksource_change_rating(cs, rating);
+	clocksource_select();
 	mutex_unlock(&clocksource_mutex);
 }
 EXPORT_SYMBOL(clocksource_change_rating);

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-04  7:58 ` [URGENT rfc patch 0/3] tsc clocksource bug fix Peter Zijlstra
  2013-07-04  8:13   ` Alex Shi
@ 2013-07-05 14:23   ` Frederic Weisbecker
  2013-07-05 14:38     ` Peter Zijlstra
  1 sibling, 1 reply; 34+ messages in thread
From: Frederic Weisbecker @ 2013-07-05 14:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alex Shi, tglx, hpa, tim.c.chen, linux-kernel, andi.kleen, mingo

2013/7/4 Peter Zijlstra <peterz@infradead.org>:
> On Thu, Jul 04, 2013 at 01:34:13PM +0800, Alex Shi wrote:
>
>> If the tsc is marked as constant and nonstop, could we set it as system
>> clocksource when do tsc register? w/o checking it on clocksource_watchdog?
>
> I'd not do that; the BIOS can still screw you over, we need some validation.
>
> That said; we do need means to disable the clocksource watchdog -- although I
> suppose Frederic might already have provided this for this NOHZ efforts when I
> wasn't looking.

Nope, I haven't touched that. I prefer not to fiddle with unstable
clocksource for now :)

As for unstable TSCs, if sched_clock_tick() needs to be fed, we simply
don't stop the tick.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-05 14:23   ` Frederic Weisbecker
@ 2013-07-05 14:38     ` Peter Zijlstra
  2013-07-05 15:24       ` Thomas Gleixner
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2013-07-05 14:38 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Alex Shi, tglx, hpa, tim.c.chen, linux-kernel, andi.kleen, mingo

On Fri, Jul 05, 2013 at 04:23:33PM +0200, Frederic Weisbecker wrote:
> Nope, I haven't touched that. I prefer not to fiddle with unstable
> clocksource for now :)
> 
> As for unstable TSCs, if sched_clock_tick() needs to be fed, we simply
> don't stop the tick.

Not entirely the same thing; I thought the clocksource watchdog was ran
even when we have a 'stable' TSC, just to make sure it stays stable.
There's known cases where the BIOS f*cks us over and wrecks TSC sync.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-05 14:38     ` Peter Zijlstra
@ 2013-07-05 15:24       ` Thomas Gleixner
  2013-07-05 21:22         ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2013-07-05 15:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Alex Shi, hpa, tim.c.chen, linux-kernel,
	andi.kleen, mingo

On Fri, 5 Jul 2013, Peter Zijlstra wrote:

> On Fri, Jul 05, 2013 at 04:23:33PM +0200, Frederic Weisbecker wrote:
> > Nope, I haven't touched that. I prefer not to fiddle with unstable
> > clocksource for now :)
> > 
> > As for unstable TSCs, if sched_clock_tick() needs to be fed, we simply
> > don't stop the tick.
> 
> Not entirely the same thing; I thought the clocksource watchdog was ran
> even when we have a 'stable' TSC, just to make sure it stays stable.
> There's known cases where the BIOS f*cks us over and wrecks TSC sync.

See arch/x86/kernel/tsc.c

We disable the watchdog for the TSC when tsc_clocksource_reliable is
set.

tsc_clocksource_reliable is set when:

 - you add tsc=reliable to the kernel command line

 - boot_cpu_has(X86_FEATURE_TSC_RELIABLE)
 
   X86_FEATURE_TSC_RELIABLE is a software flag, set by vmware and
   moorsetown. So all other machines keep the watchdog enabled.

 - On Geode LX (OLPC)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-05 15:24       ` Thomas Gleixner
@ 2013-07-05 21:22         ` Peter Zijlstra
  2013-07-05 21:50           ` Thomas Gleixner
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2013-07-05 21:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Frederic Weisbecker, Alex Shi, hpa, tim.c.chen, linux-kernel,
	andi.kleen, mingo

On Fri, Jul 05, 2013 at 05:24:09PM +0200, Thomas Gleixner wrote:
> See arch/x86/kernel/tsc.c
> 
> We disable the watchdog for the TSC when tsc_clocksource_reliable is
> set.
> 
> tsc_clocksource_reliable is set when:
> 
>  - you add tsc=reliable to the kernel command line

Ah, I didn't know about that one, useful.

>  - boot_cpu_has(X86_FEATURE_TSC_RELIABLE)
>  
>    X86_FEATURE_TSC_RELIABLE is a software flag, set by vmware and
>    moorsetown. So all other machines keep the watchdog enabled.

Right.. I knew it was enabled on my machines even though they normally
have usable TSC.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-05 21:22         ` Peter Zijlstra
@ 2013-07-05 21:50           ` Thomas Gleixner
  2013-07-05 21:58             ` Borislav Petkov
  2013-07-06 10:50             ` Peter Zijlstra
  0 siblings, 2 replies; 34+ messages in thread
From: Thomas Gleixner @ 2013-07-05 21:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Alex Shi, hpa, tim.c.chen, linux-kernel,
	andi.kleen, mingo

On Fri, 5 Jul 2013, Peter Zijlstra wrote:
> On Fri, Jul 05, 2013 at 05:24:09PM +0200, Thomas Gleixner wrote:
> > See arch/x86/kernel/tsc.c
> > 
> > We disable the watchdog for the TSC when tsc_clocksource_reliable is
> > set.
> > 
> > tsc_clocksource_reliable is set when:
> > 
> >  - you add tsc=reliable to the kernel command line
> 
> Ah, I didn't know about that one, useful.
> 
> >  - boot_cpu_has(X86_FEATURE_TSC_RELIABLE)
> >  
> >    X86_FEATURE_TSC_RELIABLE is a software flag, set by vmware and
> >    moorsetown. So all other machines keep the watchdog enabled.
> 
> Right.. I knew it was enabled on my machines even though they normally
> have usable TSC.

Yeah, but our well justified paranoia still prevents us from trusting
these CPU flags. Maybe some day BIOS is going to be replaced by
something useful. You know: Hope springs eternal....





^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-05 21:50           ` Thomas Gleixner
@ 2013-07-05 21:58             ` Borislav Petkov
  2013-07-05 22:17               ` Thomas Gleixner
  2013-07-06 10:50             ` Peter Zijlstra
  1 sibling, 1 reply; 34+ messages in thread
From: Borislav Petkov @ 2013-07-05 21:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Frederic Weisbecker, Alex Shi, hpa, tim.c.chen,
	linux-kernel, andi.kleen, mingo

On Fri, Jul 05, 2013 at 11:50:05PM +0200, Thomas Gleixner wrote:
> Yeah, but our well justified paranoia still prevents us from trusting
> these CPU flags. Maybe some day BIOS is going to be replaced by
> something useful. You know: Hope springs eternal....

Not in the next 10 yrs at least if one took a look at the
overengineered, obese at birth and braindead crap by the name of UEFI.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-05 21:58             ` Borislav Petkov
@ 2013-07-05 22:17               ` Thomas Gleixner
  2013-07-06  8:37                 ` Borislav Petkov
  0 siblings, 1 reply; 34+ messages in thread
From: Thomas Gleixner @ 2013-07-05 22:17 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, Frederic Weisbecker, Alex Shi, hpa, tim.c.chen,
	linux-kernel, andi.kleen, mingo

On Fri, 5 Jul 2013, Borislav Petkov wrote:
> On Fri, Jul 05, 2013 at 11:50:05PM +0200, Thomas Gleixner wrote:
> > Yeah, but our well justified paranoia still prevents us from trusting
> > these CPU flags. Maybe some day BIOS is going to be replaced by
> > something useful. You know: Hope springs eternal....
> 
> Not in the next 10 yrs at least if one took a look at the
> overengineered, obese at birth and braindead crap by the name of UEFI.

Good news! 10 years is way less than eternity and just before
retirement :)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-05 22:17               ` Thomas Gleixner
@ 2013-07-06  8:37                 ` Borislav Petkov
  0 siblings, 0 replies; 34+ messages in thread
From: Borislav Petkov @ 2013-07-06  8:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Frederic Weisbecker, Alex Shi, hpa, tim.c.chen,
	linux-kernel, andi.kleen, mingo

On Sat, Jul 06, 2013 at 12:17:46AM +0200, Thomas Gleixner wrote:
> Good news! 10 years is way less than eternity and just before
> retirement :)

You know that after the 10 years they'll come up with an even uglier
platform-differentiation-fiddle-with-dong-while-smoking-crack-crap which
will even replace the OS, right?

Hmm, I'm wondering what would be faster: wait *at least* 10 more years
or get an old mainboard and start experimenting with coreboot...

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [URGENT rfc patch 0/3] tsc clocksource bug fix
  2013-07-05 21:50           ` Thomas Gleixner
  2013-07-05 21:58             ` Borislav Petkov
@ 2013-07-06 10:50             ` Peter Zijlstra
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2013-07-06 10:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Frederic Weisbecker, Alex Shi, hpa, tim.c.chen, linux-kernel,
	andi.kleen, mingo

On Fri, Jul 05, 2013 at 11:50:05PM +0200, Thomas Gleixner wrote:
> On Fri, 5 Jul 2013, Peter Zijlstra wrote:
> > On Fri, Jul 05, 2013 at 05:24:09PM +0200, Thomas Gleixner wrote:
> > > See arch/x86/kernel/tsc.c
> > > 
> > > We disable the watchdog for the TSC when tsc_clocksource_reliable is
> > > set.
> > > 
> > > tsc_clocksource_reliable is set when:
> > > 
> > >  - you add tsc=reliable to the kernel command line
> > 
> > Ah, I didn't know about that one, useful.
> > 
> > >  - boot_cpu_has(X86_FEATURE_TSC_RELIABLE)
> > >  
> > >    X86_FEATURE_TSC_RELIABLE is a software flag, set by vmware and
> > >    moorsetown. So all other machines keep the watchdog enabled.
> > 
> > Right.. I knew it was enabled on my machines even though they normally
> > have usable TSC.
> 
> Yeah, but our well justified paranoia still prevents us from trusting
> these CPU flags. Maybe some day BIOS is going to be replaced by
> something useful. You know: Hope springs eternal....

Oh quite agreed. Its just that at several times I've wanted to disable the
thing. Now I know you can do using the kernel cmdline. Previously I had to
wreck code -- not that much harder really :-)

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2013-07-06 10:50 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-04  5:34 [URGENT rfc patch 0/3] tsc clocksource bug fix Alex Shi
2013-07-04  5:34 ` [PATCH 1/3] clocksource: clean up clocksource_select Alex Shi
2013-07-04  9:57   ` Thomas Gleixner
2013-07-04 10:21     ` Alex Shi
2013-07-04 10:27       ` Thomas Gleixner
2013-07-05  1:22         ` Alex Shi
2013-07-04  5:34 ` [PATCH 2/3] clockesource: set override clocksource Alex Shi
2013-07-04 10:23   ` Thomas Gleixner
2013-07-04  5:34 ` [PATCH 3/3] clocksource: fix can not set tsc as clocksource bug Alex Shi
2013-07-04 10:55   ` Thomas Gleixner
2013-07-04 13:04     ` Thomas Gleixner
2013-07-04 20:46       ` [PATCH] clocksource: Reselect clocksource when watchdog validated Thomas Gleixner
2013-07-05  1:38         ` Alex Shi
2013-07-05  9:13         ` [tip:timers/core] clocksource: Reselect clocksource when watchdog validated high-res capability tip-bot for Thomas Gleixner
2013-07-05  2:48       ` [PATCH 3/3] clocksource: fix can not set tsc as clocksource bug Alex Shi
2013-07-05  5:41         ` Thomas Gleixner
2013-07-05  6:44           ` Alex Shi
2013-07-04  7:58 ` [URGENT rfc patch 0/3] tsc clocksource bug fix Peter Zijlstra
2013-07-04  8:13   ` Alex Shi
2013-07-05 14:23   ` Frederic Weisbecker
2013-07-05 14:38     ` Peter Zijlstra
2013-07-05 15:24       ` Thomas Gleixner
2013-07-05 21:22         ` Peter Zijlstra
2013-07-05 21:50           ` Thomas Gleixner
2013-07-05 21:58             ` Borislav Petkov
2013-07-05 22:17               ` Thomas Gleixner
2013-07-06  8:37                 ` Borislav Petkov
2013-07-06 10:50             ` Peter Zijlstra
2013-07-04 11:00 ` Thomas Gleixner
2013-07-04 18:11   ` Davidlohr Bueso
2013-07-04 20:27     ` Thomas Gleixner
2013-07-05  1:12       ` Alex Shi
2013-07-05  5:58         ` Thomas Gleixner
2013-07-05  6:28           ` Alex Shi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).