All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch] x86, tsc: fix SMI induced variation in quick_pit_calibrate()
@ 2012-01-16 20:15 Suresh Siddha
  2012-01-17  0:18 ` Linus Torvalds
  2012-01-17  2:47 ` [patch] x86, tsc: fix " Yinghai Lu
  0 siblings, 2 replies; 10+ messages in thread
From: Suresh Siddha @ 2012-01-16 20:15 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar, Thomas Gleixner, H. Peter Anvin
  Cc: linux-kernel, asit.k.mallick

Linus, We are seeing NTP failures on a big cluster as a result of big
variation in calibrated TSC values. Our debug showed that it is indeed
because of the SMI and its effect on quick pit calibration. Appended
patch helps fix it. It ran over the weekend boot tests with out any
failures.

thanks,
suresh
---
From: Suresh Siddha <suresh.b.siddha@intel.com>
Subject: x86, tsc: fix SMI induced variation in quick_pit_calibrate()

pit_expect_msb() returns success wrongly in the below SMI scenario:

a. pit_verify_msb() has not yet seen the MSB transition.

b. We are close to the MSB transition though and got a SMI immediately after
   returning from pit_verify_msb() which didn't see the MSB transition. PIT MSB
   transition has happened somewhere during SMI execution.

c. Returned from SMI and we noted down the 'tsc', saw the pit MSB change now and
   exited the loop to calculate d1/d2. Instead of noting the TSC at the MSB
   transition, we are way off because of the SMI.  And as the SMI happened
   between the pit_verify_msb() and before the 'tsc' is recorded in the
   for loop, d1/d2 will be small and quick_pit_calibrate() will not notice
   this error.

Depending on whether SMI disturbance happens while computing d1 or d2, we will
see the TSC calibrated value smaller or bigger than the expected value. As a
result, in a cluster we were seeing a variation of approximately +/- 20MHz in
the calibrated values, resulting in NTP failures.

  [ As far as the SMI source is concerned, this is a periodic SMI that gets
    disabled after ACPI is enabled by the OS. But the TSC calibration happens
    before the ACPI is enabled. ]

Fix this by comparing the returned delta (that is supposed to capture the MSB
transition and represents d1/d2 in the quick_pit_calibrate()) with
the one before in the for loop. If both of them are similar, then the returned
delta, tsc is captured closer to the MSB transition. Otherwise we will return
failure and fallback to slow PIT calibration.

Any SMI induced disturbance in returned delta (d1/d2) itself is already caught
in our requirements of error has to be less than 500ppm.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 arch/x86/kernel/tsc.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index c0dd5b6..27a1311 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -290,11 +290,12 @@ static inline int pit_verify_msb(unsigned char val)
 static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *deltap)
 {
 	int count;
-	u64 tsc = 0;
+	u64 tsc = 0, tsc_old;
 
 	for (count = 0; count < 50000; count++) {
 		if (!pit_verify_msb(val))
 			break;
+		tsc_old = tsc;
 		tsc = get_cycles();
 	}
 	*deltap = get_cycles() - tsc;
@@ -304,7 +305,7 @@ static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *de
 	 * We require _some_ success, but the quality control
 	 * will be based on the error terms on the TSC values.
 	 */
-	return count > 5;
+	return count > 5 && (tsc - tsc_old <= 2 * (*deltap));
 }
 
 /*



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tsc: fix SMI induced variation in quick_pit_calibrate()
  2012-01-16 20:15 [patch] x86, tsc: fix SMI induced variation in quick_pit_calibrate() Suresh Siddha
@ 2012-01-17  0:18 ` Linus Torvalds
  2012-01-17  0:41   ` Suresh Siddha
  2012-01-17  2:47 ` [patch] x86, tsc: fix " Yinghai Lu
  1 sibling, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2012-01-17  0:18 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-kernel,
	asit.k.mallick

[-- Attachment #1: Type: text/plain, Size: 1663 bytes --]

On Mon, Jan 16, 2012 at 12:15 PM, Suresh Siddha
<suresh.b.siddha@intel.com> wrote:
> Linus, We are seeing NTP failures on a big cluster as a result of big
> variation in calibrated TSC values. Our debug showed that it is indeed
> because of the SMI and its effect on quick pit calibration. Appended
> patch helps fix it. It ran over the weekend boot tests with out any
> failures.

Ok, I think your patch is wrong.

HOWEVER.

I think it may be *close* to right.

So what happens is that right now we calculate the "deltatsc" over the
last PIT read - the one that returned the new MSB.

But what we *should* do is to calculate deltatsc over the *two* last
PIT reads - so that we have the time delay over seeing both the old
MSB _and_ seeign the new one. That's the true measure of how precisely
we caught the "MSB changed" thing, after all!

So I think your patch is a total hack that just compares the two last
TSC deltas, but it's actually close to the "correct' thing in that it
does start taking the time to see the last of the previous MSB into
account.

So this patch changes pit_expect_msb() so that

 - the 'tsc' is the TSC in between the two reads that read the MSB
change from the PIT (same as before)

 - the 'delta' is the difference in TSC from *before* the MSB changed
to *after* the MSB changed.

Now the delta is twice as big as before (it covers four PIT accesses,
roughly 4us), so the comments might have to be updated to match, but
the rest of the code should "just work" (except it might loop a bit
longer, and maybe it gives closer to 250 ppm precision).

Does this fix it for you? I have NOT tested it in any way.

                     Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 686 bytes --]

 arch/x86/kernel/tsc.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index c0dd5b603749..6b742da41304 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -290,14 +290,15 @@ static inline int pit_verify_msb(unsigned char val)
 static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *deltap)
 {
 	int count;
-	u64 tsc = 0;
+	u64 tsc = 0, prev_tsc = 0;
 
 	for (count = 0; count < 50000; count++) {
 		if (!pit_verify_msb(val))
 			break;
+		prev_tsc = tsc;
 		tsc = get_cycles();
 	}
-	*deltap = get_cycles() - tsc;
+	*deltap = get_cycles() - prev_tsc;
 	*tscp = tsc;
 
 	/*

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tsc: fix SMI induced variation in quick_pit_calibrate()
  2012-01-17  0:18 ` Linus Torvalds
@ 2012-01-17  0:41   ` Suresh Siddha
  2012-01-17  0:52     ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Suresh Siddha @ 2012-01-17  0:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-kernel,
	asit.k.mallick

On Mon, 2012-01-16 at 16:18 -0800, Linus Torvalds wrote:
> So I think your patch is a total hack that just compares the two last
> TSC deltas, but it's actually close to the "correct' thing in that it
> does start taking the time to see the last of the previous MSB into
> account.

yeah, it was a hack but your alternative sounds good.

> Now the delta is twice as big as before (it covers four PIT accesses,
> roughly 4us), so the comments might have to be updated to match, but
> the rest of the code should "just work" (except it might loop a bit
> longer, and maybe it gives closer to 250 ppm precision).
> 
> Does this fix it for you? I have NOT tested it in any way.
> 

also may be we should change the correction 

from: delta += (long)(d2 - d1)/2;
to:   delta += (long)(d2 - d1)/4;

I will give this patch a try.

thanks,
suresh




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tsc: fix SMI induced variation in quick_pit_calibrate()
  2012-01-17  0:41   ` Suresh Siddha
@ 2012-01-17  0:52     ` Linus Torvalds
  2012-01-17  1:30       ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2012-01-17  0:52 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-kernel,
	asit.k.mallick

On Mon, Jan 16, 2012 at 4:41 PM, Suresh Siddha
<suresh.b.siddha@intel.com> wrote:
>
> also may be we should change the correction
>
> from: delta += (long)(d2 - d1)/2;
> to:   delta += (long)(d2 - d1)/4;
>
> I will give this patch a try.

Hmm. I get the feeling that we should remove that line entirely.

I think it actually makes the 'delta' less precise - the "d2-d1" thing
is how much of a difference there was in the first and last PIT MSB
differences, but while that difference might give some kind of error
estimate for how close we can expect 'delta' to be from correct, I
don't think we can really *add* that error to 'delta'. We might as
well subtract it (and we do - the sign is random and depends on which
one happened to be longer).

So I think that line should go away entirely. It doesn't have any
meaning. Yes, 'delta' may not be exact, but we don't know which
direction to correct into, so..

I realize that I wrote it, and that it as such must be bug-free, but I
suspect that removing that line is even *more* bug-free.

                 Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tsc: fix SMI induced variation in quick_pit_calibrate()
  2012-01-17  0:52     ` Linus Torvalds
@ 2012-01-17  1:30       ` Linus Torvalds
  2012-01-17  4:06         ` Suresh Siddha
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2012-01-17  1:30 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-kernel,
	asit.k.mallick

[-- Attachment #1: Type: text/plain, Size: 388 bytes --]

On Mon, Jan 16, 2012 at 4:52 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Hmm. I get the feeling that we should remove that line entirely.

Yeah, I really cannot come up with a single good reason to keep that
line, and suspect that I was a bit loopy when I wrote it.

So here's the suggested trivially updated patch. Does this work for people?

                    Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1294 bytes --]

 arch/x86/kernel/tsc.c |   10 ++++------
 1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index c0dd5b603749..b2abd647d233 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -290,14 +290,15 @@ static inline int pit_verify_msb(unsigned char val)
 static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *deltap)
 {
 	int count;
-	u64 tsc = 0;
+	u64 tsc = 0, prev_tsc = 0;
 
 	for (count = 0; count < 50000; count++) {
 		if (!pit_verify_msb(val))
 			break;
+		prev_tsc = tsc;
 		tsc = get_cycles();
 	}
-	*deltap = get_cycles() - tsc;
+	*deltap = get_cycles() - prev_tsc;
 	*tscp = tsc;
 
 	/*
@@ -383,15 +384,12 @@ success:
 	 *
 	 * As a result, we can depend on there not being
 	 * any odd delays anywhere, and the TSC reads are
-	 * reliable (within the error). We also adjust the
-	 * delta to the middle of the error bars, just
-	 * because it looks nicer.
+	 * reliable (within the error).
 	 *
 	 * kHz = ticks / time-in-seconds / 1000;
 	 * kHz = (t2 - t1) / (I * 256 / PIT_TICK_RATE) / 1000
 	 * kHz = ((t2 - t1) * PIT_TICK_RATE) / (I * 256 * 1000)
 	 */
-	delta += (long)(d2 - d1)/2;
 	delta *= PIT_TICK_RATE;
 	do_div(delta, i*256*1000);
 	printk("Fast TSC calibration using PIT\n");

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tsc: fix SMI induced variation in quick_pit_calibrate()
  2012-01-16 20:15 [patch] x86, tsc: fix SMI induced variation in quick_pit_calibrate() Suresh Siddha
  2012-01-17  0:18 ` Linus Torvalds
@ 2012-01-17  2:47 ` Yinghai Lu
  1 sibling, 0 replies; 10+ messages in thread
From: Yinghai Lu @ 2012-01-17  2:47 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	linux-kernel, asit.k.mallick

On Mon, Jan 16, 2012 at 12:15 PM, Suresh Siddha
<suresh.b.siddha@intel.com> wrote:
> Linus, We are seeing NTP failures on a big cluster as a result of big
> variation in calibrated TSC values. Our debug showed that it is indeed
> because of the SMI and its effect on quick pit calibration. Appended
> patch helps fix it. It ran over the weekend boot tests with out any
> failures.
>
>
>  [ As far as the SMI source is concerned, this is a periodic SMI that gets
>    disabled after ACPI is enabled by the OS. But the TSC calibration happens
>    before the ACPI is enabled. ]
>

Hi, Suresh

Do you know what is SMI doing at that time ? with USB legacy emulating?

I have some patches that will do USB hand off early.
Wonder if that would help.
please check those patches at

git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
usb_smi_disable_early

Thanks

Yinghai Lu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tsc: fix SMI induced variation in quick_pit_calibrate()
  2012-01-17  1:30       ` Linus Torvalds
@ 2012-01-17  4:06         ` Suresh Siddha
  2012-01-17  5:15           ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Suresh Siddha @ 2012-01-17  4:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-kernel,
	asit.k.mallick

On Mon, 2012-01-16 at 17:30 -0800, Linus Torvalds wrote:
> On Mon, Jan 16, 2012 at 4:52 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Hmm. I get the feeling that we should remove that line entirely.
> 
> Yeah, I really cannot come up with a single good reason to keep that
> line, and suspect that I was a bit loopy when I wrote it.
> 
> So here's the suggested trivially updated patch. Does this work for people?

It does work. But it is taking almost 105 iterations or so to stabilize.
And if I boot the system in power-save mode (lowest p-state), it takes
almost 115 iterations to stabilize (close to the 25msec limit). This is
on the latest gen platform.

So I suspect we can either go back to 500ppm error tolerance:

                        if ((d1+d2) >= delta >> 10)
                                continue;

or increase the MAX_QUICK_PIT_MS bit more.

thanks,
suresh


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tsc: fix SMI induced variation in quick_pit_calibrate()
  2012-01-17  4:06         ` Suresh Siddha
@ 2012-01-17  5:15           ` Linus Torvalds
  2012-01-17 23:35             ` Suresh Siddha
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2012-01-17  5:15 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-kernel,
	asit.k.mallick

On Mon, Jan 16, 2012 at 8:06 PM, Suresh Siddha
<suresh.b.siddha@intel.com> wrote:
>
> So I suspect we can either go back to 500ppm error tolerance:
>
>                        if ((d1+d2) >= delta >> 10)
>                                continue;
>
> or increase the MAX_QUICK_PIT_MS bit more.

Agreed. Either works for me. I think I'd prefer increasing
MAX_QUICK_PIT_MS, because 25ms is still a fairly short time, but the
better we can make the precision of the clock, the better off we'll be
later.

So let's just double MAX_QUICK_PIT_MS to 50, ok?

And since you've tested it, I'm perfectly happy to sign off on that
patch so you can add my signed-off-by line. And if you can write a
reasonable description for it, maybe we can get Ingo & co to apply it.

              Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tsc: fix SMI induced variation in quick_pit_calibrate()
  2012-01-17  5:15           ` Linus Torvalds
@ 2012-01-17 23:35             ` Suresh Siddha
  2012-01-18  0:25               ` [tip:x86/urgent] x86, tsc: Fix " tip-bot for Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Suresh Siddha @ 2012-01-17 23:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, linux-kernel,
	asit.k.mallick

On Mon, 2012-01-16 at 21:15 -0800, Linus Torvalds wrote:
> So let's just double MAX_QUICK_PIT_MS to 50, ok?
> 

Ok. Here is the patch with the changelog. x86 folks, can you please
apply it?

Thanks.
---
From: Linus Torvalds <torvalds@linux-foundation.org>
Subject: x86, tsc: fix SMI induced variation in quick_pit_calibrate()

pit_expect_msb() returns success wrongly in the below SMI scenario:

a. pit_verify_msb() has not yet seen the MSB transition.

b. we are close to the MSB transition though and got a SMI immediately after
   returning from pit_verify_msb() which didn't see the MSB transition. PIT MSB
   transition has happened somewhere during SMI execution.

c. returned from SMI and we noted down the 'tsc', saw the pit MSB change now and
   exited the loop to calculate 'deltatsc'. Instead of noting the TSC at the MSB
   transition, we are way off because of the SMI.  And as the SMI happened
   between the pit_verify_msb() and before the 'tsc' is recorded in the
   for loop, 'delattsc' (d1/d2 in quick_pit_calibrate()) will be small and
   quick_pit_calibrate() will not notice this error.

Depending on whether SMI disturbance happens while computing d1 or d2, we will
see the TSC calibrated value smaller or bigger than the expected value. As a
result, in a cluster we were seeing a variation of approximately +/- 20MHz in
the calibrated values, resulting in NTP failures.

  [ As far as the SMI source is concerned, this is a periodic SMI that gets
    disabled after ACPI is enabled by the OS. But the TSC calibration happens
    before the ACPI is enabled. ]

To address this, change pit_expect_msb() so that

 - the 'tsc' is the TSC in between the two reads that read the MSB
change from the PIT (same as before)

 - the 'delta' is the difference in TSC from *before* the MSB changed
to *after* the MSB changed.

Now the delta is twice as big as before (it covers four PIT accesses,
roughly 4us) and quick_pit_calibrate() will loop a bit longer to get
the calibrated value with in the 500ppm precision. As the delta (d1/d2)
covers four PIT accesses, actual calibrated result might be closer to
250ppm precision.

As the loop now takes longer to stabilize, double MAX_QUICK_PIT_MS to 50.

SMI disturbance will showup as much larger delta's and the loop will take
longer than usual for the result to be with in the accepted precision. Or will
fallback to slow PIT calibration if it takes more than 50msec.

Also while we are at this, remove the calibration correction that aims to
get the result to the middle of the error bars. We really don't know which
direction to correct into, so remove it.

Reported-and-tested-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 arch/x86/kernel/tsc.c |   14 ++++++--------
 1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2c9cf0f..f546946 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -290,14 +290,15 @@ static inline int pit_verify_msb(unsigned char val)
 static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *deltap)
 {
 	int count;
-	u64 tsc = 0;
+	u64 tsc = 0, prev_tsc = 0;
 
 	for (count = 0; count < 50000; count++) {
 		if (!pit_verify_msb(val))
 			break;
+		prev_tsc = tsc;
 		tsc = get_cycles();
 	}
-	*deltap = get_cycles() - tsc;
+	*deltap = get_cycles() - prev_tsc;
 	*tscp = tsc;
 
 	/*
@@ -311,9 +312,9 @@ static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *de
  * How many MSB values do we want to see? We aim for
  * a maximum error rate of 500ppm (in practice the
  * real error is much smaller), but refuse to spend
- * more than 25ms on it.
+ * more than 50ms on it.
  */
-#define MAX_QUICK_PIT_MS 25
+#define MAX_QUICK_PIT_MS 50
 #define MAX_QUICK_PIT_ITERATIONS (MAX_QUICK_PIT_MS * PIT_TICK_RATE / 1000 / 256)
 
 static unsigned long quick_pit_calibrate(void)
@@ -383,15 +384,12 @@ success:
 	 *
 	 * As a result, we can depend on there not being
 	 * any odd delays anywhere, and the TSC reads are
-	 * reliable (within the error). We also adjust the
-	 * delta to the middle of the error bars, just
-	 * because it looks nicer.
+	 * reliable (within the error).
 	 *
 	 * kHz = ticks / time-in-seconds / 1000;
 	 * kHz = (t2 - t1) / (I * 256 / PIT_TICK_RATE) / 1000
 	 * kHz = ((t2 - t1) * PIT_TICK_RATE) / (I * 256 * 1000)
 	 */
-	delta += (long)(d2 - d1)/2;
 	delta *= PIT_TICK_RATE;
 	do_div(delta, i*256*1000);
 	printk("Fast TSC calibration using PIT\n");



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip:x86/urgent] x86, tsc: Fix SMI induced variation in quick_pit_calibrate()
  2012-01-17 23:35             ` Suresh Siddha
@ 2012-01-18  0:25               ` tip-bot for Linus Torvalds
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Linus Torvalds @ 2012-01-18  0:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, suresh.b.siddha, tglx

Commit-ID:  68f30fbee19cc67849b9fa8e153ede70758afe81
Gitweb:     http://git.kernel.org/tip/68f30fbee19cc67849b9fa8e153ede70758afe81
Author:     Linus Torvalds <torvalds@linux-foundation.org>
AuthorDate: Tue, 17 Jan 2012 15:35:37 -0800
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Tue, 17 Jan 2012 15:46:51 -0800

x86, tsc: Fix SMI induced variation in quick_pit_calibrate()

pit_expect_msb() returns success wrongly in the below SMI scenario:

a. pit_verify_msb() has not yet seen the MSB transition.

b. we are close to the MSB transition though and got a SMI immediately after
   returning from pit_verify_msb() which didn't see the MSB transition. PIT MSB
   transition has happened somewhere during SMI execution.

c. returned from SMI and we noted down the 'tsc', saw the pit MSB change now and
   exited the loop to calculate 'deltatsc'. Instead of noting the TSC at the MSB
   transition, we are way off because of the SMI.  And as the SMI happened
   between the pit_verify_msb() and before the 'tsc' is recorded in the
   for loop, 'delattsc' (d1/d2 in quick_pit_calibrate()) will be small and
   quick_pit_calibrate() will not notice this error.

Depending on whether SMI disturbance happens while computing d1 or d2, we will
see the TSC calibrated value smaller or bigger than the expected value. As a
result, in a cluster we were seeing a variation of approximately +/- 20MHz in
the calibrated values, resulting in NTP failures.

  [ As far as the SMI source is concerned, this is a periodic SMI that gets
    disabled after ACPI is enabled by the OS. But the TSC calibration happens
    before the ACPI is enabled. ]

To address this, change pit_expect_msb() so that

 - the 'tsc' is the TSC in between the two reads that read the MSB
change from the PIT (same as before)

 - the 'delta' is the difference in TSC from *before* the MSB changed
to *after* the MSB changed.

Now the delta is twice as big as before (it covers four PIT accesses,
roughly 4us) and quick_pit_calibrate() will loop a bit longer to get
the calibrated value with in the 500ppm precision. As the delta (d1/d2)
covers four PIT accesses, actual calibrated result might be closer to
250ppm precision.

As the loop now takes longer to stabilize, double MAX_QUICK_PIT_MS to 50.

SMI disturbance will showup as much larger delta's and the loop will take
longer than usual for the result to be with in the accepted precision. Or will
fallback to slow PIT calibration if it takes more than 50msec.

Also while we are at this, remove the calibration correction that aims to
get the result to the middle of the error bars. We really don't know which
direction to correct into, so remove it.

Reported-and-tested-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1326843337.5291.4.camel@sbsiddha-mobl2
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/kernel/tsc.c |   14 ++++++--------
 1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2c9cf0f..f546946 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -290,14 +290,15 @@ static inline int pit_verify_msb(unsigned char val)
 static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *deltap)
 {
 	int count;
-	u64 tsc = 0;
+	u64 tsc = 0, prev_tsc = 0;
 
 	for (count = 0; count < 50000; count++) {
 		if (!pit_verify_msb(val))
 			break;
+		prev_tsc = tsc;
 		tsc = get_cycles();
 	}
-	*deltap = get_cycles() - tsc;
+	*deltap = get_cycles() - prev_tsc;
 	*tscp = tsc;
 
 	/*
@@ -311,9 +312,9 @@ static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *de
  * How many MSB values do we want to see? We aim for
  * a maximum error rate of 500ppm (in practice the
  * real error is much smaller), but refuse to spend
- * more than 25ms on it.
+ * more than 50ms on it.
  */
-#define MAX_QUICK_PIT_MS 25
+#define MAX_QUICK_PIT_MS 50
 #define MAX_QUICK_PIT_ITERATIONS (MAX_QUICK_PIT_MS * PIT_TICK_RATE / 1000 / 256)
 
 static unsigned long quick_pit_calibrate(void)
@@ -383,15 +384,12 @@ success:
 	 *
 	 * As a result, we can depend on there not being
 	 * any odd delays anywhere, and the TSC reads are
-	 * reliable (within the error). We also adjust the
-	 * delta to the middle of the error bars, just
-	 * because it looks nicer.
+	 * reliable (within the error).
 	 *
 	 * kHz = ticks / time-in-seconds / 1000;
 	 * kHz = (t2 - t1) / (I * 256 / PIT_TICK_RATE) / 1000
 	 * kHz = ((t2 - t1) * PIT_TICK_RATE) / (I * 256 * 1000)
 	 */
-	delta += (long)(d2 - d1)/2;
 	delta *= PIT_TICK_RATE;
 	do_div(delta, i*256*1000);
 	printk("Fast TSC calibration using PIT\n");

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-01-18  0:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-16 20:15 [patch] x86, tsc: fix SMI induced variation in quick_pit_calibrate() Suresh Siddha
2012-01-17  0:18 ` Linus Torvalds
2012-01-17  0:41   ` Suresh Siddha
2012-01-17  0:52     ` Linus Torvalds
2012-01-17  1:30       ` Linus Torvalds
2012-01-17  4:06         ` Suresh Siddha
2012-01-17  5:15           ` Linus Torvalds
2012-01-17 23:35             ` Suresh Siddha
2012-01-18  0:25               ` [tip:x86/urgent] x86, tsc: Fix " tip-bot for Linus Torvalds
2012-01-17  2:47 ` [patch] x86, tsc: fix " Yinghai Lu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.