All of lore.kernel.org
 help / color / mirror / Atom feed
* x86: tsc: make TSC calibration more immune to interrupts
@ 2011-04-20 18:52 Kasper Pedersen
  2011-04-20 19:15 ` john stultz
  0 siblings, 1 reply; 11+ messages in thread
From: Kasper Pedersen @ 2011-04-20 18:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, John Stultz,
	Peter Zijlstra, Suresh Siddha, Kasper Pedersen

When a SMI or plain interrupt occurs during the delayed part
of TSC calibration, and the SMI/irq handler is good and fast
so that is does not exceed SMI_TRESHOLD, tsc_khz can be a bit
off (10-30ppm).

We should not depend on interrupts being longer than 50000
clocks, so always do the 5 tries, and use the best sample we
get.
This should work always for any four periodic or rate-limited
interrupt sources. If we get 5 interrupts with 500ns gaps in
a row, behaviour should be as without this patch.

This costs us 20-100 microseconds in startup time, as
tsc_read_refs is called 8 times.

measurements:
On a 700MHz P3 I see t2-t1=~22000, and 31ppm error.
A Core2 is similar: http://n1.taur.dk/tscdeviat.png
(while mostly t2-t1=~1000, in about 1 of 3000 tests
I see t2-t1=~20000 for both machines.)
vmware ESX4 has t2-t1=~8000 and up.

Signed-off-by: Kasper Pedersen <kernel@kasperkp.dk>
---
 arch/x86/kernel/tsc.c |   24 +++++++++++++++++-------
 1 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ffe5755..e916f99 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -117,7 +117,7 @@ static int __init tsc_setup(char *str)
 
 __setup("tsc=", tsc_setup);
 
-#define MAX_RETRIES     5
+#define BESTOF_SAMPLES      5
 #define SMI_TRESHOLD    50000
 
 /*
@@ -125,19 +125,29 @@ __setup("tsc=", tsc_setup);
  */
 static u64 tsc_read_refs(u64 *p, int hpet)
 {
-	u64 t1, t2;
+	u64 t1, t2, tp, best_uncertainty, uncertainty, best_t2;
 	int i;
 
-	for (i = 0; i < MAX_RETRIES; i++) {
+	best_uncertainty = SMI_TRESHOLD;
+	best_t2 = 0;
+	for (i = 0; i < BESTOF_SAMPLES; i++) {
 		t1 = get_cycles();
 		if (hpet)
-			*p = hpet_readl(HPET_COUNTER) & 0xFFFFFFFF;
+			tp = hpet_readl(HPET_COUNTER) & 0xFFFFFFFF;
 		else
-			*p = acpi_pm_read_early();
+			tp = acpi_pm_read_early();
 		t2 = get_cycles();
-		if ((t2 - t1) < SMI_TRESHOLD)
-			return t2;
+		uncertainty = t2 - t1;
+		if (uncertainty < best_uncertainty) {
+			best_uncertainty = uncertainty;
+			best_t2 = t2;
+			*p = tp;
+		}
 	}
+	if (best_uncertainty < SMI_TRESHOLD)
+		return best_t2;
+
+	*p = tp;
 	return ULLONG_MAX;
 }
 


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: x86: tsc: make TSC calibration more immune to interrupts
  2011-04-20 18:52 x86: tsc: make TSC calibration more immune to interrupts Kasper Pedersen
@ 2011-04-20 19:15 ` john stultz
  2011-04-20 19:44   ` Kasper Pedersen
  0 siblings, 1 reply; 11+ messages in thread
From: john stultz @ 2011-04-20 19:15 UTC (permalink / raw)
  To: Kasper Pedersen
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Peter Zijlstra, Suresh Siddha

On Wed, 2011-04-20 at 20:52 +0200, Kasper Pedersen wrote:
> When a SMI or plain interrupt occurs during the delayed part
> of TSC calibration, and the SMI/irq handler is good and fast
> so that is does not exceed SMI_TRESHOLD, tsc_khz can be a bit
> off (10-30ppm).
> 
> We should not depend on interrupts being longer than 50000
> clocks, so always do the 5 tries, and use the best sample we
> get.
> This should work always for any four periodic or rate-limited
> interrupt sources. If we get 5 interrupts with 500ns gaps in
> a row, behaviour should be as without this patch.
> 
> This costs us 20-100 microseconds in startup time, as
> tsc_read_refs is called 8 times.
> 
> measurements:
> On a 700MHz P3 I see t2-t1=~22000, and 31ppm error.
> A Core2 is similar: http://n1.taur.dk/tscdeviat.png
> (while mostly t2-t1=~1000, in about 1 of 3000 tests
> I see t2-t1=~20000 for both machines.)
> vmware ESX4 has t2-t1=~8000 and up.


I guess I'm curious how useful this is with the refined TSC calibration
that was added not too long ago:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=08ec0c58fb8a05d3191d5cb6f5d6f81adb419798

Are you saying that you see the same 10-30ppm variance in the dmesg
line: "Refined TSC clocksource calibration: XXXX.XXX MHz" ?

thanks
-john



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86: tsc: make TSC calibration more immune to interrupts
  2011-04-20 19:15 ` john stultz
@ 2011-04-20 19:44   ` Kasper Pedersen
  2011-04-20 20:28     ` john stultz
  0 siblings, 1 reply; 11+ messages in thread
From: Kasper Pedersen @ 2011-04-20 19:44 UTC (permalink / raw)
  To: john stultz
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Peter Zijlstra, Suresh Siddha

On 04/20/2011 09:15 PM, john stultz wrote:
> On Wed, 2011-04-20 at 20:52 +0200, Kasper Pedersen wrote:
>> When a SMI or plain interrupt occurs during the delayed part
>> of TSC calibration, and the SMI/irq handler is good and fast
>> so that is does not exceed SMI_TRESHOLD, tsc_khz can be a bit
>> off (10-30ppm).

> I guess I'm curious how useful this is with the refined TSC calibration
> that was added not too long ago:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=08ec0c58fb8a05d3191d5cb6f5d6f81adb419798
> 
> Are you saying that you see the same 10-30ppm variance in the dmesg
> line: "Refined TSC clocksource calibration: XXXX.XXX MHz" ?
>

Yes, I do.
With the delayed workqueue patch, I see a wonderful 0.4ppm
offset _almost_ all of the time.

Very rarely though (about 1 in 3000) the value output by
"Refined TSC .." jumps. It can jump no further than
50000/F_TSC/delayed_time, so on a modern machine it jumps
no further than 20ppm.

This can happen when a short irq occurs between
*p = hpet_readl(HPET_COUNTER) & 0xFFFFFFFF;
and
t2 = get_cycles();


Without the delayed workqueue patch 30ppm is insignificant.

/Kasper Pederen

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86: tsc: make TSC calibration more immune to interrupts
  2011-04-20 19:44   ` Kasper Pedersen
@ 2011-04-20 20:28     ` john stultz
  2011-04-20 21:22       ` x86: tsc: v2 " Kasper Pedersen
  0 siblings, 1 reply; 11+ messages in thread
From: john stultz @ 2011-04-20 20:28 UTC (permalink / raw)
  To: Kasper Pedersen
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Peter Zijlstra, Suresh Siddha

On Wed, 2011-04-20 at 21:44 +0200, Kasper Pedersen wrote:
> With the delayed workqueue patch, I see a wonderful 0.4ppm
> offset _almost_ all of the time.
> 
> Very rarely though (about 1 in 3000) the value output by
> "Refined TSC .." jumps. It can jump no further than
> 50000/F_TSC/delayed_time, so on a modern machine it jumps
> no further than 20ppm.
> 
> This can happen when a short irq occurs between
> *p = hpet_readl(HPET_COUNTER) & 0xFFFFFFFF;
> and
> t2 = get_cycles();
> 
> 
> Without the delayed workqueue patch 30ppm is insignificant.

Thanks for the additional details!

In that case, the patch seems reasonable to me.

If the additional ~200us (I suspect 100us is a little low, as acpi pm
can take ~3.5us access time * 5 * 8 = 140us) is an issue, you could
probably add a "good vs best" flag, which would decide between returning
the first value that is under the SMI_THRESHOLD vs your method of
returning the best value with the lowest uncertainty. This would allow
us to only use it in the refined calibration where the smaller noise
reduction makes a difference.

Ingo, Thomas, any other complaints?

thanks
-john



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86: tsc: v2 make TSC calibration more immune to interrupts
  2011-04-20 20:28     ` john stultz
@ 2011-04-20 21:22       ` Kasper Pedersen
  2011-04-20 22:39         ` Josh Triplett
  0 siblings, 1 reply; 11+ messages in thread
From: Kasper Pedersen @ 2011-04-20 21:22 UTC (permalink / raw)
  To: john stultz
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Peter Zijlstra, Suresh Siddha

When a SMI or plain interrupt occurs during the delayed part
of TSC calibration, and the SMI/irq handler is good and fast
so that is does not exceed SMI_TRESHOLD, tsc_khz can be a bit
off (10-30ppm).

We should not depend on interrupts being longer than 50000
clocks, so, in the refined calibration, always do the 5
tries, and use the best sample we get.

This should work always for any four periodic or rate-limited
interrupt sources. If we get 5 interrupts with 500ns gaps in
a row, behaviour should be as without this patch.

It is safe to use the first value that passes SMI_TRESHOLD
for the initial calibration: As long as tsc_khz is above
100MHz, SMI_TRESHOLD represents less than 1% of error.

The 8 additional samples costs us 28 microseconds in startup
time.

measurements:
On a 700MHz P3 I see t2-t1=~22000, and 31ppm error.
A Core2 is similar: http://n1.taur.dk/tscdeviat.png
(while mostly t2-t1=~1000, in about 1 of 3000 tests
I see t2-t1=~20000 for both machines.)
vmware ESX4 has t2-t1=~8000 and up.

v2: John Stulz suggested limiting best uncertainty to
where it is needed, saving ~170usec startup time.

Signed-off-by: Kasper Pedersen <kernel@kasperkp.dk>
---
 arch/x86/kernel/tsc.c |   36 ++++++++++++++++++++++++------------
 1 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ffe5755..8dc813b 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -117,27 +117,39 @@ static int __init tsc_setup(char *str)
 
 __setup("tsc=", tsc_setup);
 
-#define MAX_RETRIES     5
+#define BESTOF_SAMPLES      5
 #define SMI_TRESHOLD    50000
 
 /*
  * Read TSC and the reference counters. Take care of SMI disturbance
  */
-static u64 tsc_read_refs(u64 *p, int hpet)
+static u64 tsc_read_refs(u64 *p, int hpet, int find_best)
 {
-	u64 t1, t2;
+	u64 t1, t2, tp, best_uncertainty, uncertainty, best_t2;
 	int i;
 
-	for (i = 0; i < MAX_RETRIES; i++) {
+	best_uncertainty = SMI_TRESHOLD;
+	best_t2 = 0;
+	for (i = 0; i < BESTOF_SAMPLES; i++) {
 		t1 = get_cycles();
 		if (hpet)
-			*p = hpet_readl(HPET_COUNTER) & 0xFFFFFFFF;
+			tp = hpet_readl(HPET_COUNTER) & 0xFFFFFFFF;
 		else
-			*p = acpi_pm_read_early();
+			tp = acpi_pm_read_early();
 		t2 = get_cycles();
-		if ((t2 - t1) < SMI_TRESHOLD)
-			return t2;
+		uncertainty = t2 - t1;
+		if (uncertainty < best_uncertainty) {
+			best_uncertainty = uncertainty;
+			best_t2 = t2;
+			*p = tp;
+			if (!find_best)
+				break;
+		}
 	}
+	if (best_uncertainty < SMI_TRESHOLD)
+		return best_t2;
+
+	*p = tp;
 	return ULLONG_MAX;
 }
 
@@ -455,9 +467,9 @@ unsigned long native_calibrate_tsc(void)
 		 * read the end value.
 		 */
 		local_irq_save(flags);
-		tsc1 = tsc_read_refs(&ref1, hpet);
+		tsc1 = tsc_read_refs(&ref1, hpet, 0);
 		tsc_pit_khz = pit_calibrate_tsc(latch, ms, loopmin);
-		tsc2 = tsc_read_refs(&ref2, hpet);
+		tsc2 = tsc_read_refs(&ref2, hpet, 0);
 		local_irq_restore(flags);
 
 		/* Pick the lowest PIT TSC calibration so far */
@@ -928,11 +940,11 @@ static void tsc_refine_calibration_work(struct work_struct *work)
 		 */
 		hpet = is_hpet_enabled();
 		schedule_delayed_work(&tsc_irqwork, HZ);
-		tsc_start = tsc_read_refs(&ref_start, hpet);
+		tsc_start = tsc_read_refs(&ref_start, hpet, 1);
 		return;
 	}
 
-	tsc_stop = tsc_read_refs(&ref_stop, hpet);
+	tsc_stop = tsc_read_refs(&ref_stop, hpet, 1);
 
 	/* hpet or pmtimer available ? */
 	if (ref_start == ref_stop)



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: x86: tsc: v2 make TSC calibration more immune to interrupts
  2011-04-20 21:22       ` x86: tsc: v2 " Kasper Pedersen
@ 2011-04-20 22:39         ` Josh Triplett
  2011-04-21  2:19           ` john stultz
                             ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Josh Triplett @ 2011-04-20 22:39 UTC (permalink / raw)
  To: Kasper Pedersen
  Cc: john stultz, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Peter Zijlstra, Suresh Siddha

On Wed, Apr 20, 2011 at 11:22:19PM +0200, Kasper Pedersen wrote:
> When a SMI or plain interrupt occurs during the delayed part
> of TSC calibration, and the SMI/irq handler is good and fast
> so that is does not exceed SMI_TRESHOLD, tsc_khz can be a bit
> off (10-30ppm).
> 
> We should not depend on interrupts being longer than 50000
> clocks, so, in the refined calibration, always do the 5
> tries, and use the best sample we get.
> 
> This should work always for any four periodic or rate-limited
> interrupt sources. If we get 5 interrupts with 500ns gaps in
> a row, behaviour should be as without this patch.
> 
> It is safe to use the first value that passes SMI_TRESHOLD
> for the initial calibration: As long as tsc_khz is above
> 100MHz, SMI_TRESHOLD represents less than 1% of error.
> 
> The 8 additional samples costs us 28 microseconds in startup
> time.
> 
> measurements:
> On a 700MHz P3 I see t2-t1=~22000, and 31ppm error.
> A Core2 is similar: http://n1.taur.dk/tscdeviat.png
> (while mostly t2-t1=~1000, in about 1 of 3000 tests
> I see t2-t1=~20000 for both machines.)
> vmware ESX4 has t2-t1=~8000 and up.
> 
> v2: John Stulz suggested limiting best uncertainty to
> where it is needed, saving ~170usec startup time.

Have you considered disabling interrupts while calibrating?  That would
ensure that you only have to care about SMIs, not arbitrary interrupts.

Also, on more recent x86 systems you could look at MSR_SMI_COUNT (MSR
0x34) to detect if any SMIs have occurred during the sample period.
rdmsr, start sample period, stop sample period, rdmsr, if delta of 0
then no SMIs occurred.  Exists on Nehalem and newer, at least.

- Josh Triplett

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86: tsc: v2 make TSC calibration more immune to interrupts
  2011-04-20 22:39         ` Josh Triplett
@ 2011-04-21  2:19           ` john stultz
  2011-04-21  4:32             ` Josh Triplett
  2011-04-21 19:46           ` Kasper Pedersen
  2011-04-21 19:52           ` x86: tsc: v3 " Kasper Pedersen
  2 siblings, 1 reply; 11+ messages in thread
From: john stultz @ 2011-04-21  2:19 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Kasper Pedersen, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Peter Zijlstra, Suresh Siddha

On Wed, 2011-04-20 at 15:39 -0700, Josh Triplett wrote:
> On Wed, Apr 20, 2011 at 11:22:19PM +0200, Kasper Pedersen wrote:
> > When a SMI or plain interrupt occurs during the delayed part
> > of TSC calibration, and the SMI/irq handler is good and fast
> > so that is does not exceed SMI_TRESHOLD, tsc_khz can be a bit
> > off (10-30ppm).
> > 
> > We should not depend on interrupts being longer than 50000
> > clocks, so, in the refined calibration, always do the 5
> > tries, and use the best sample we get.
> > 
> > This should work always for any four periodic or rate-limited
> > interrupt sources. If we get 5 interrupts with 500ns gaps in
> > a row, behaviour should be as without this patch.
> > 
> > It is safe to use the first value that passes SMI_TRESHOLD
> > for the initial calibration: As long as tsc_khz is above
> > 100MHz, SMI_TRESHOLD represents less than 1% of error.
> > 
> > The 8 additional samples costs us 28 microseconds in startup
> > time.
> > 
> > measurements:
> > On a 700MHz P3 I see t2-t1=~22000, and 31ppm error.
> > A Core2 is similar: http://n1.taur.dk/tscdeviat.png
> > (while mostly t2-t1=~1000, in about 1 of 3000 tests
> > I see t2-t1=~20000 for both machines.)
> > vmware ESX4 has t2-t1=~8000 and up.
> > 
> > v2: John Stulz suggested limiting best uncertainty to
> > where it is needed, saving ~170usec startup time.
> 
> Have you considered disabling interrupts while calibrating?  That would
> ensure that you only have to care about SMIs, not arbitrary interrupts.

This calibration is actually timer based (and runs for 1 second,
allowing the system to continue booting in the meantime), so disabling
irqs wouldn't work. You could just disable irqs during the tsc_getref,
but that still has the possibility to get hit by SMIs, which are the
real issue.

> Also, on more recent x86 systems you could look at MSR_SMI_COUNT (MSR
> 0x34) to detect if any SMIs have occurred during the sample period.
> rdmsr, start sample period, stop sample period, rdmsr, if delta of 0
> then no SMIs occurred.  Exists on Nehalem and newer, at least.

That's interesting... but probably still too machine specific to be
generally useful.

thanks
-john


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86: tsc: v2 make TSC calibration more immune to interrupts
  2011-04-21  2:19           ` john stultz
@ 2011-04-21  4:32             ` Josh Triplett
  0 siblings, 0 replies; 11+ messages in thread
From: Josh Triplett @ 2011-04-21  4:32 UTC (permalink / raw)
  To: john stultz
  Cc: Kasper Pedersen, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Peter Zijlstra, Suresh Siddha

On Wed, Apr 20, 2011 at 07:19:28PM -0700, john stultz wrote:
> On Wed, 2011-04-20 at 15:39 -0700, Josh Triplett wrote:
> > On Wed, Apr 20, 2011 at 11:22:19PM +0200, Kasper Pedersen wrote:
> > > When a SMI or plain interrupt occurs during the delayed part
> > > of TSC calibration, and the SMI/irq handler is good and fast
> > > so that is does not exceed SMI_TRESHOLD, tsc_khz can be a bit
> > > off (10-30ppm).
> > > 
> > > We should not depend on interrupts being longer than 50000
> > > clocks, so, in the refined calibration, always do the 5
> > > tries, and use the best sample we get.
> > > 
> > > This should work always for any four periodic or rate-limited
> > > interrupt sources. If we get 5 interrupts with 500ns gaps in
> > > a row, behaviour should be as without this patch.
> > > 
> > > It is safe to use the first value that passes SMI_TRESHOLD
> > > for the initial calibration: As long as tsc_khz is above
> > > 100MHz, SMI_TRESHOLD represents less than 1% of error.
> > > 
> > > The 8 additional samples costs us 28 microseconds in startup
> > > time.
> > > 
> > > measurements:
> > > On a 700MHz P3 I see t2-t1=~22000, and 31ppm error.
> > > A Core2 is similar: http://n1.taur.dk/tscdeviat.png
> > > (while mostly t2-t1=~1000, in about 1 of 3000 tests
> > > I see t2-t1=~20000 for both machines.)
> > > vmware ESX4 has t2-t1=~8000 and up.
> > > 
> > > v2: John Stulz suggested limiting best uncertainty to
> > > where it is needed, saving ~170usec startup time.
> > 
> > Have you considered disabling interrupts while calibrating?  That would
> > ensure that you only have to care about SMIs, not arbitrary interrupts.
> 
> This calibration is actually timer based (and runs for 1 second,
> allowing the system to continue booting in the meantime), so disabling
> irqs wouldn't work. You could just disable irqs during the tsc_getref,
> but that still has the possibility to get hit by SMIs, which are the
> real issue.

Ah, I see.  But it sounds like disabling IRQs during the critical region
would at least control all the sources of jitter that the kernel has
control over, and if tsc_getref only lasts for a few microseconds then
it has a very good chance of avoiding SMIs, as evidenced by the rarity
of the original problem reported in this thread ("about 1 in 3000").

> > Also, on more recent x86 systems you could look at MSR_SMI_COUNT (MSR
> > 0x34) to detect if any SMIs have occurred during the sample period.
> > rdmsr, start sample period, stop sample period, rdmsr, if delta of 0
> > then no SMIs occurred.  Exists on Nehalem and newer, at least.
> 
> That's interesting... but probably still too machine specific to be
> generally useful.

It seems like something usable as an enhancement if available: if the
MSR exists, use it to detect a lack of SMIs, and if no SMIs occur then
you don't need to keep sampling.  If the MSR doesn't exist, then go
ahead and sample a few times.

- Josh Triplett

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86: tsc: v2 make TSC calibration more immune to interrupts
  2011-04-20 22:39         ` Josh Triplett
  2011-04-21  2:19           ` john stultz
@ 2011-04-21 19:46           ` Kasper Pedersen
  2011-04-23  1:38             ` john stultz
  2011-04-21 19:52           ` x86: tsc: v3 " Kasper Pedersen
  2 siblings, 1 reply; 11+ messages in thread
From: Kasper Pedersen @ 2011-04-21 19:46 UTC (permalink / raw)
  To: Josh Triplett
  Cc: john stultz, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Peter Zijlstra, Suresh Siddha

On 04/21/2011 12:39 AM, Josh Triplett wrote:
> 
> Have you considered disabling interrupts while calibrating?  That would
> ensure that you only have to care about SMIs, not arbitrary interrupts.
>
> Also, on more recent x86 systems you could look at MSR_SMI_COUNT (MSR
> 0x34) to detect if any SMIs have occurred during the sample period.
> rdmsr, start sample period, stop sample period, rdmsr, if delta of 0
> then no SMIs occurred.  Exists on Nehalem and newer, at least.


I have now tested this, and it is worth doing.

+		local_irq_save(flags);
		t1 = get_cycles();
		if (hpet)
			tp = hpet_readl(HPET_COUNTER) & 0xFFFFFFFF;
		else
			tp = acpi_pm_read_early();
		t2 = get_cycles();
+		local_irq_restore(flags);

On P3, when no SMI occur, the sample is now always perfect.
On Core2 there is still very rare variation - up to 8000 clock,
maybe bus contention?
Thinking it was cache I tried keeping irqs disabled for 5M
iterations, and that makes it occur _more_ often.

Atom shows the same as Core2, but even more rarely.

So, even if we have a Nehalem with SMI counter, the 5 samples
will still have benefit.


/Kasper Pedersen
-- 
This is my second patch ever, tell me when I do something wrong.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* x86: tsc: v3 make TSC calibration more immune to interrupts
  2011-04-20 22:39         ` Josh Triplett
  2011-04-21  2:19           ` john stultz
  2011-04-21 19:46           ` Kasper Pedersen
@ 2011-04-21 19:52           ` Kasper Pedersen
  2 siblings, 0 replies; 11+ messages in thread
From: Kasper Pedersen @ 2011-04-21 19:52 UTC (permalink / raw)
  To: Josh Triplett
  Cc: john stultz, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Peter Zijlstra, Suresh Siddha

When a SMI or plain interrupt occurs during the delayed part
of TSC calibration, and the SMI/irq handler is good and fast
so that is does not exceed SMI_TRESHOLD, tsc_khz can be a bit
off (10-30ppm).

We should keep plain interrupts out of the reading of
tsc/reference, so disable interrupts there.

We should not depend on SMIs being longer than 50000 clocks,
so, in the refined calibration, always do the 5 tries, and
use the best sample we get.

This should work always for any four periodic or rate-limited
SMI sources. If we get 5 SMIs with 500ns gaps in a row, 
behaviour should be as without this patch.

It is safe to use the first value that passes SMI_TRESHOLD
for the initial calibration: As long as tsc_khz is above
100MHz, SMI_TRESHOLD represents less than 1% of error.

The 8 additional samples costs us 28 microseconds in startup
time.

measurements:
On a 700MHz P3 I see t2-t1=~22000, and 31ppm error.
A Core2 is similar: http://n1.taur.dk/tscdeviat.png
(while mostly t2-t1=~1000, in about 1 of 3000 tests
I see t2-t1=~20000 for both machines.)
vmware ESX4 has t2-t1=~8000 and up.

v2: John Stultz suggested limiting best uncertainty to
where it is needed, saving ~170usec startup time.

v3: Josh Triplett suggested disabling irqs. This does
indeed help, and the 5-sample code now only needs to
handle the SMI case.

Signed-off-by: Kasper Pedersen <kernel@kasperkp.dk>
---
 arch/x86/kernel/tsc.c |   39 +++++++++++++++++++++++++++------------
 1 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ffe5755..9983c03 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -117,27 +117,42 @@ static int __init tsc_setup(char *str)
 
 __setup("tsc=", tsc_setup);
 
-#define MAX_RETRIES     5
+#define BESTOF_SAMPLES      5
 #define SMI_TRESHOLD    50000
 
 /*
  * Read TSC and the reference counters. Take care of SMI disturbance
  */
-static u64 tsc_read_refs(u64 *p, int hpet)
+static u64 tsc_read_refs(u64 *p, int hpet, int find_best)
 {
-	u64 t1, t2;
+	u64 t1, t2, tp, best_uncertainty, uncertainty, best_t2;
 	int i;
+	unsigned long flags;
 
-	for (i = 0; i < MAX_RETRIES; i++) {
+	best_uncertainty = SMI_TRESHOLD;
+	best_t2 = 0;
+	for (i = 0; i < BESTOF_SAMPLES; i++) {
+		local_irq_save(flags);
 		t1 = get_cycles();
 		if (hpet)
-			*p = hpet_readl(HPET_COUNTER) & 0xFFFFFFFF;
+			tp = hpet_readl(HPET_COUNTER) & 0xFFFFFFFF;
 		else
-			*p = acpi_pm_read_early();
+			tp = acpi_pm_read_early();
 		t2 = get_cycles();
-		if ((t2 - t1) < SMI_TRESHOLD)
-			return t2;
+		local_irq_restore(flags);
+		uncertainty = t2 - t1;
+		if (uncertainty < best_uncertainty) {
+			best_uncertainty = uncertainty;
+			best_t2 = t2;
+			*p = tp;
+			if (!find_best)
+				break;
+		}
 	}
+	if (best_uncertainty < SMI_TRESHOLD)
+		return best_t2;
+
+	*p = tp;
 	return ULLONG_MAX;
 }
 
@@ -455,9 +470,9 @@ unsigned long native_calibrate_tsc(void)
 		 * read the end value.
 		 */
 		local_irq_save(flags);
-		tsc1 = tsc_read_refs(&ref1, hpet);
+		tsc1 = tsc_read_refs(&ref1, hpet, 0);
 		tsc_pit_khz = pit_calibrate_tsc(latch, ms, loopmin);
-		tsc2 = tsc_read_refs(&ref2, hpet);
+		tsc2 = tsc_read_refs(&ref2, hpet, 0);
 		local_irq_restore(flags);
 
 		/* Pick the lowest PIT TSC calibration so far */
@@ -928,11 +943,11 @@ static void tsc_refine_calibration_work(struct work_struct *work)
 		 */
 		hpet = is_hpet_enabled();
 		schedule_delayed_work(&tsc_irqwork, HZ);
-		tsc_start = tsc_read_refs(&ref_start, hpet);
+		tsc_start = tsc_read_refs(&ref_start, hpet, 1);
 		return;
 	}
 
-	tsc_stop = tsc_read_refs(&ref_stop, hpet);
+	tsc_stop = tsc_read_refs(&ref_stop, hpet, 1);
 
 	/* hpet or pmtimer available ? */
 	if (ref_start == ref_stop)


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: x86: tsc: v2 make TSC calibration more immune to interrupts
  2011-04-21 19:46           ` Kasper Pedersen
@ 2011-04-23  1:38             ` john stultz
  0 siblings, 0 replies; 11+ messages in thread
From: john stultz @ 2011-04-23  1:38 UTC (permalink / raw)
  To: Kasper Pedersen
  Cc: Josh Triplett, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, Peter Zijlstra, Suresh Siddha

On Thu, 2011-04-21 at 21:46 +0200, Kasper Pedersen wrote:
> On 04/21/2011 12:39 AM, Josh Triplett wrote:
> > 
> > Have you considered disabling interrupts while calibrating?  That would
> > ensure that you only have to care about SMIs, not arbitrary interrupts.
> >
> > Also, on more recent x86 systems you could look at MSR_SMI_COUNT (MSR
> > 0x34) to detect if any SMIs have occurred during the sample period.
> > rdmsr, start sample period, stop sample period, rdmsr, if delta of 0
> > then no SMIs occurred.  Exists on Nehalem and newer, at least.
> 
> 
> I have now tested this, and it is worth doing.

Cool! When you have the chance, send out your latest patch and I'll
review/ack and hopefully queue it for tglx.

thanks
-john



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2011-04-23  1:38 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-20 18:52 x86: tsc: make TSC calibration more immune to interrupts Kasper Pedersen
2011-04-20 19:15 ` john stultz
2011-04-20 19:44   ` Kasper Pedersen
2011-04-20 20:28     ` john stultz
2011-04-20 21:22       ` x86: tsc: v2 " Kasper Pedersen
2011-04-20 22:39         ` Josh Triplett
2011-04-21  2:19           ` john stultz
2011-04-21  4:32             ` Josh Triplett
2011-04-21 19:46           ` Kasper Pedersen
2011-04-23  1:38             ` john stultz
2011-04-21 19:52           ` x86: tsc: v3 " Kasper Pedersen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.