* [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error
@ 2015-11-09 19:59 gratian.crisan
  2015-11-09 22:02 ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: gratian.crisan @ 2015-11-09 19:59 UTC
  To: Thomas Gleixner
  Cc: linux-kernel, Ingo Molnar, H . Peter Anvin, x86, Borislav Petkov,
	Peter Zijlstra, Josh Cartwright, gratian

From: Gratian Crisan <gratian.crisan@ni.com>

The Intel Xeon E5 processor family suffers from errata[1] BT81:
"TSC is Not Affected by Warm Reset.
Problem: The TSC (Time Stamp Counter MSR 10H) should be cleared on reset.
Due to this erratum the TSC is not affected by warm reset.
Implication: The TSC is not cleared by a warm reset. The TSC is cleared by
power-on reset as expected. Intel has not observed any functional failures
due to this erratum.
Workaround: None identified.
Status: There are no plans to fix this erratum."

The observed behavior: after a warm reset the TSC gets reset for CPU0 but
not for any of the other cores i.e. they continue incrementing from the
value they had before the reset. The TSCs are otherwise stable and
always-running so the offset error stays constant.

Add x86 bug flag if an Intel Xeon E5 gets detected and based on that
synchronize the TSC offset by performing the following measurement:

target     source
  t0 ---\
         \-->
             ts
         /--
  t1 <--/

Where:
  * target is the target CPU whose TSC offset we are trying to correct;
  * source is the source CPU used as a reference (i.e. the boot CPU);
  * t0, t1 are TSC time-stamps obtained on the target CPU;
  * ts is the time-stamp acquired on the source CPU.

If the source and target CPU TSCs are synchronized, and the interconnect is
symmetric, then ts falls exactly half-way between t0 and t1. In practice
the measured offset will include the RDTSC instruction latency as well as
the latency introduced by the interconnect. To account for these latencies
we perform the offset measurement in a loop and use the measured offset
with the smallest magnitude for the correction, the idea being that it
contains the least unaccounted-for latency. That offset is then used to
adjust the TSC register on the target CPU.

Signed-off-by: Gratian Crisan <gratian.crisan@ni.com>

[1] http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-family-spec-update.pdf
---
 arch/x86/include/asm/cpufeature.h |   1 +
 arch/x86/kernel/cpu/intel.c       |   9 +++
 arch/x86/kernel/tsc_sync.c        | 124 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 134 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index e4f8010..3fb0b62 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -272,6 +272,7 @@
 #define X86_BUG_FXSAVE_LEAK	X86_BUG(6) /* FXSAVE leaks FOP/FIP/FOP */
 #define X86_BUG_CLFLUSH_MONITOR	X86_BUG(7) /* AAI65, CLFLUSH required before MONITOR */
 #define X86_BUG_SYSRET_SS_ATTRS	X86_BUG(8) /* SYSRET doesn't fix up SS attrs */
+#define X86_BUG_TSC_OFFSET	X86_BUG(9) /* CPU has skewed but stable TSCs */
 
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 209ac1e..42732dc 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -296,6 +296,15 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
 #else
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
+#ifdef CONFIG_X86_TSC
+	/*
+	 * Xeon E5 BT81 errata: TSC is not affected by warm reset.
+	 * The TSC registers for CPUs other than CPU0 are not cleared by a warm
+	 * reset resulting in a constant offset error.
+	 */
+	if ((c->x86 == 6) && (c->x86_model == 0x3f))
+		set_cpu_bug(c, X86_BUG_TSC_OFFSET);
+#endif
 }
 #endif
 
diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 78083bf..0d0f40c 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -114,6 +114,124 @@ static inline unsigned int loop_timeout(int cpu)
 }
 
 /*
+ * Read the current TSC counter value excluding time-stamps that are zero.
+ * Zero is treated as a special measurement synchronization value in the TSC
+ * offset synchronization code.
+ */
+static inline unsigned long long get_cycles_nz(void)
+{
+	unsigned long long ts;
+again:
+	ts = rdtsc_ordered();
+	if (unlikely(!ts))
+		goto again;
+	return ts;
+}
+
+static atomic64_t target_t0;
+static atomic64_t target_t1;
+static atomic64_t source_ts;
+/*
+ * Measure the TSC offset for the target CPU being brought up vs. the source
+ * CPU. We are collecting three time-stamps:
+ *
+ * target     source
+ *   t0 ---\
+ *          \-->
+ *              ts
+ *          /--
+ *   t1 <--/
+ *
+ * If the source and target TSCs are synchronized, and the interconnect is
+ * symmetric, then ts falls exactly half-way between t0 and t1. We are returning
+ * any deviation from [t0..t1] mid-point as the offset of the target TSC vs. the
+ * source TSC. The measured offset will contain errors like the latency of RDTSC
+ * instruction and the latency introduced by the interconnect. Multiple
+ * measurements are required to filter out these errors.
+ */
+static s64 target_tsc_offset(void)
+{
+	u64 t0, t1, ts;
+	s64 offset;
+
+	t0 = get_cycles_nz();
+	atomic64_set(&target_t0, t0);
+
+	while (!(ts = atomic64_read(&source_ts)))
+		cpu_relax();
+	atomic64_set(&source_ts, 0);
+
+	t1 = get_cycles_nz();
+
+	/* Calculate the offset w/o overflow. */
+	offset = t0/2 + t1/2 - ts;
+	offset += ((t0 & 0x1) & (t1 & 0x1));
+
+	atomic64_set(&target_t1, t1);
+
+	return offset;
+}
+
+static void source_tsc_offset(void)
+{
+	while (!atomic64_read(&target_t0))
+		cpu_relax();
+	atomic64_set(&target_t0, 0);
+
+	atomic64_set(&source_ts, get_cycles_nz());
+
+	while (!atomic64_read(&target_t1))
+		cpu_relax();
+	atomic64_set(&target_t1, 0);
+}
+
+static void adjust_tsc_offset(s64 offset)
+{
+	u64 ts;
+
+	ts = rdtsc_ordered();
+	ts -= offset;
+	write_tsc((u32)ts, (u32)(ts >> 32));
+}
+
+/*
+ * Synchronize a target CPU that has a constant offset vs. a source CPU.
+ * Multiple measurements of the TSC offset are performed and the minimum
+ * value is used for adjustment. This is to eliminate as much of the measurement
+ * latency as possible; it will also filter out the errors in the first
+ * iteration caused by the target CPU arriving early.
+ */
+#define NUM_SYNC_ROUNDS 64
+static void sync_tsc_target(void)
+{
+	int i;
+	s64 off, min_off;
+
+	min_off = S64_MAX;
+	for (i = 0; i < NUM_SYNC_ROUNDS; i++) {
+		off = target_tsc_offset();
+		if (i && (abs64(off) < abs64(min_off)))
+			min_off = off;
+		if (unlikely(!(i & 7)))
+			touch_nmi_watchdog();
+	}
+	adjust_tsc_offset(min_off);
+}
+
+static void sync_tsc_source(void)
+{
+	int i;
+
+	preempt_disable();
+	for (i = 0; i < NUM_SYNC_ROUNDS; i++) {
+		source_tsc_offset();
+		if (unlikely(!(i & 7)))
+			touch_nmi_watchdog();
+	}
+	preempt_enable();
+}
+
+/*
  * Source CPU calls into this - it waits for the freshly booted
  * target CPU to arrive and then starts the measurement:
  */
@@ -121,6 +239,9 @@ void check_tsc_sync_source(int cpu)
 {
 	int cpus = 2;
 
+	if (static_cpu_has_bug(X86_BUG_TSC_OFFSET))
+		sync_tsc_source();
+
 	/*
 	 * No need to check if we already know that the TSC is not
 	 * synchronized or if we have no TSC.
@@ -187,6 +308,9 @@ void check_tsc_sync_target(void)
 {
 	int cpus = 2;
 
+	if (static_cpu_has_bug(X86_BUG_TSC_OFFSET))
+		sync_tsc_target();
+
 	/* Also aborts if there is no TSC. */
 	if (unsynchronized_tsc() || tsc_clocksource_reliable)
 		return;
-- 
2.6.2



* Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error
  2015-11-09 19:59 [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error gratian.crisan
@ 2015-11-09 22:02 ` Peter Zijlstra
       [not found]   ` <CAKA=qzarnUUmZb7DQE+u0Dei3F+FQNoL2bak_-dV9D9+3L=itQ@mail.gmail.com>
  2015-11-13 21:13   ` Dave Hansen
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2015-11-09 22:02 UTC
  To: gratian.crisan
  Cc: Thomas Gleixner, linux-kernel, Ingo Molnar, H . Peter Anvin, x86,
	Borislav Petkov, Josh Cartwright, gratian

On Mon, Nov 09, 2015 at 01:59:02PM -0600, gratian.crisan@ni.com wrote:

> The Intel Xeon E5 processor family suffers from errata[1] BT81:

> +#ifdef CONFIG_X86_TSC
> +	/*
> +	 * Xeon E5 BT81 errata: TSC is not affected by warm reset.
> +	 * The TSC registers for CPUs other than CPU0 are not cleared by a warm
> +	 * reset resulting in a constant offset error.
> +	 */
> +	if ((c->x86 == 6) && (c->x86_model == 0x3f))
> +		set_cpu_bug(c, X86_BUG_TSC_OFFSET);
> +#endif

That's hardly a family, that's just one, Haswell server.



* Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error
       [not found]   ` <CAKA=qzarnUUmZb7DQE+u0Dei3F+FQNoL2bak_-dV9D9+3L=itQ@mail.gmail.com>
@ 2015-11-10 18:27     ` Josh Hunt
  2015-11-10 19:47       ` Gratian Crisan
  0 siblings, 1 reply; 11+ messages in thread
From: Josh Hunt @ 2015-11-10 18:27 UTC
  To: Peter Zijlstra
  Cc: gratian.crisan, Thomas Gleixner, LKML, Ingo Molnar,
	H . Peter Anvin, x86, Borislav Petkov, Josh Cartwright, gratian

On Tue, Nov 10, 2015 at 12:24 PM, Josh Hunt <joshhunt00@gmail.com> wrote:
>
> On Mon, Nov 9, 2015 at 4:02 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> On Mon, Nov 09, 2015 at 01:59:02PM -0600, gratian.crisan@ni.com wrote:
>>
>> > The Intel Xeon E5 processor family suffers from errata[1] BT81:
>>
>> > +#ifdef CONFIG_X86_TSC
>> > +     /*
>> > +      * Xeon E5 BT81 errata: TSC is not affected by warm reset.
>> > +      * The TSC registers for CPUs other than CPU0 are not cleared by a warm
>> > +      * reset resulting in a constant offset error.
>> > +      */
>> > +     if ((c->x86 == 6) && (c->x86_model == 0x3f))
>> > +             set_cpu_bug(c, X86_BUG_TSC_OFFSET);
>> > +#endif
>>
>> That's hardly a family, that's just one, Haswell server.
>
>
> Are you actually observing this problem on this processor?
>
> The document you've referenced and the x86_model # above do not match up. The errata should be for Intel processors with an x86_model value of 0x2d by my calculations:
>
> Model: 1101b
> Extended Model: 0010b
>
> The calc from cpu_detect() is:
>                  if (c->x86 >= 0x6)
>                         c->x86_model += ((tfms >> 16) & 0xf) << 4;
>
> 0x3f is a different CPU.
> --
> Josh

Resending, as gmail inserted html and lkml dropped the previous reply...


-- 
Josh


* Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error
  2015-11-10 18:27     ` Josh Hunt
@ 2015-11-10 19:47       ` Gratian Crisan
  2015-11-10 20:41         ` Josh Hunt
  0 siblings, 1 reply; 11+ messages in thread
From: Gratian Crisan @ 2015-11-10 19:47 UTC
  To: Josh Hunt
  Cc: Peter Zijlstra, gratian.crisan, Thomas Gleixner, LKML,
	Ingo Molnar, H . Peter Anvin, x86, Borislav Petkov,
	Josh Cartwright, gratian


Josh Hunt writes:

> On Tue, Nov 10, 2015 at 12:24 PM, Josh Hunt <joshhunt00@gmail.com> wrote:
>>
>> On Mon, Nov 9, 2015 at 4:02 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>
>>> On Mon, Nov 09, 2015 at 01:59:02PM -0600, gratian.crisan@ni.com wrote:
>>>
>>> > The Intel Xeon E5 processor family suffers from errata[1] BT81:
>>>
>>> > +#ifdef CONFIG_X86_TSC
>>> > +     /*
>>> > +      * Xeon E5 BT81 errata: TSC is not affected by warm reset.
>>> > +      * The TSC registers for CPUs other than CPU0 are not cleared by a warm
>>> > +      * reset resulting in a constant offset error.
>>> > +      */
>>> > +     if ((c->x86 == 6) && (c->x86_model == 0x3f))
>>> > +             set_cpu_bug(c, X86_BUG_TSC_OFFSET);
>>> > +#endif
>>>
>>> That's hardly a family, that's just one, Haswell server.
>>
>>
>> Are you actually observing this problem on this processor?
>>
>> The document you've referenced and the x86_model # above do not match up. The errata should be for Intel processors with an x86_model value of 0x2d by my calculations:
>>
>> Model: 1101b
>> Extended Model: 0010b
>>
>> The calc from cpu_detect() is:
>>                  if (c->x86 >= 0x6)
>>                         c->x86_model += ((tfms >> 16) & 0xf) << 4;
>>
>> 0x3f is a different CPU.

The processor I am seeing the issue on is (according to /proc/cpuinfo):
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU E5-2618L v3 @ 2.30GHz
stepping	: 2
microcode	: 0x2e

The observed behavior does seem to match BT81 errata i.e. the TSC does
not get reset on warm reboots and it is otherwise stable.

However you are correct in pointing out that the errata CPU model number
does not match. My apologies, decoding the x86 cpu info/model numbers is
a new area for me and the title of the errata made it sound like it
applies to the whole Intel Xeon E5 family. It was only in trying to
reply to Peter's comment that I noticed the discrepancy in the model
number.

I am currently trying to figure out if there is an erratum that
specifically lists Xeon E5-2618L with TSC problems on warm resets and I
will re-work this.

Sorry again for not double checking.

-Gratian


* Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error
  2015-11-10 19:47       ` Gratian Crisan
@ 2015-11-10 20:41         ` Josh Hunt
  2015-11-11 15:41           ` Gratian Crisan
  0 siblings, 1 reply; 11+ messages in thread
From: Josh Hunt @ 2015-11-10 20:41 UTC
  To: Gratian Crisan
  Cc: Peter Zijlstra, Thomas Gleixner, LKML, Ingo Molnar,
	H . Peter Anvin, x86, Borislav Petkov, Josh Cartwright, gratian

On Tue, Nov 10, 2015 at 1:47 PM, Gratian Crisan <gratian.crisan@ni.com> wrote:
>
> The observed behavior does seem to match BT81 errata i.e. the TSC does
> not get reset on warm reboots and it is otherwise stable.
>
If you have a simple testcase to reproduce the problem I'd be
interested in seeing it.

-- 
Josh


* Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error
  2015-11-10 20:41         ` Josh Hunt
@ 2015-11-11 15:41           ` Gratian Crisan
  2015-11-13 20:43             ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Gratian Crisan @ 2015-11-11 15:41 UTC
  To: Josh Hunt
  Cc: Gratian Crisan, Peter Zijlstra, Thomas Gleixner, LKML,
	Ingo Molnar, H . Peter Anvin, x86, Borislav Petkov,
	Josh Cartwright, gratian


Josh Hunt writes:

> On Tue, Nov 10, 2015 at 1:47 PM, Gratian Crisan <gratian.crisan@ni.com> wrote:
>>
>> The observed behavior does seem to match BT81 errata i.e. the TSC does
>> not get reset on warm reboots and it is otherwise stable.
>>
> If you have a simple testcase to reproduce the problem I'd be
> interested in seeing it.

We first hit this bug on a 4.1 PREEMPT_RT kernel, where it actually
causes a boot hang on warm reboots. I haven't yet got to the bottom of
why the TSC having a large offset vs. CPU0 would cause the hang. I
have some stack traces around somewhere I can dig up.

I also wrote a small C utility[1], with a bit of code borrowed from the
kernel, for reading the TSC on all CPUs. It starts a high priority
thread per CPU, tries to synchronize them and prints out the TSC values
and their offset with regards to CPU0.
It can be called from a SysV init shell script[2] at the beginning of
the boot process and right before a reboot to save the values in a file.

I've pasted the results after 3 reboots [3]. You can see CPU0's TSC
getting reset on reboot and the other cores happily ticking on throughout
the reboot.

-Gratian

[1] read-tsc.c
--8<---------------cut here---------------start------------->8---
#define _GNU_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <pthread.h>
#include <sched.h>
#include <errno.h>
#include <assert.h>

#define DECLARE_ARGS(val, low, high)	unsigned low, high
#define EAX_EDX_VAL(val, low, high)	((low) | ((uint64_t)(high) << 32))
#define EAX_EDX_ARGS(val, low, high)	"a" (low), "d" (high)
#define EAX_EDX_RET(val, low, high)	"=a" (low), "=d" (high)

static int thread_sync;
static unsigned long long *tsc_data = NULL;

#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

static inline void rep_nop(void)
{
	asm volatile("rep; nop" ::: "memory");
}

static inline void cpu_relax(void)
{
	rep_nop();
}

static inline unsigned long long rdtsc_ordered(void)
{
	DECLARE_ARGS(val, low, high);

	asm volatile("lfence" : : : "memory");
	asm volatile("rdtsc" : EAX_EDX_RET(val, low, high));

	return EAX_EDX_VAL(val, low, high);
}

static void* threadfn(void *param)
{
	long cpu = (long)param;
	cpu_set_t mask;
	struct sched_param schedp;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
		perror("error: Failed to set the CPU affinity");
		return NULL;
	}

	/*
	 * Set the thread priority just below the migration thread's. The idea
	 * is to minimize the chances of being preempted while running the test.
	 */
	memset(&schedp, 0, sizeof(schedp));
	schedp.sched_priority = sched_get_priority_max(SCHED_FIFO) - 1;
	if (sched_setscheduler(0, SCHED_FIFO, &schedp) == -1) {
		perror("error: Failed to set the thread priority");
		return NULL;
	}

	__sync_sub_and_fetch(&thread_sync, 1);
	while (ACCESS_ONCE(thread_sync))
		cpu_relax();

	tsc_data[cpu] = rdtsc_ordered();

	return NULL;
}

int main(int argc, char* argv[])
{
	long i;
	unsigned long n_cpus;
	pthread_t *th = NULL;
	int ret = EXIT_SUCCESS;

	n_cpus = sysconf(_SC_NPROCESSORS_ONLN);
	thread_sync = n_cpus;
	__sync_synchronize();

	tsc_data = (unsigned long long*)malloc(n_cpus *
					       sizeof(unsigned long long));
	if (!tsc_data) {
		fprintf(stderr, "error: Failed to allocate memory for TSC data\n");
		ret = EXIT_FAILURE;
		goto out;
	}

	th = (pthread_t *)malloc(n_cpus * sizeof(pthread_t));
	if (!th) {
		fprintf(stderr, "error: Failed to allocate memory for thread data\n");
		ret = EXIT_FAILURE;
		goto out;
	}

	for (i = 0; i < n_cpus; i++)
		pthread_create(&th[i], NULL, threadfn, (void*)i);		

	for (i = 0; i < n_cpus; i++)
		pthread_join(th[i], NULL);

	if (argc > 1)
		printf("%s: ", argv[1]);
	for (i = 0; i < n_cpus; i++)
		/* The difference is unsigned; cast it to match the %lld conversion. */
		printf("%llu[%lld] ", tsc_data[i],
		       (long long)(tsc_data[i] - tsc_data[0]));
	printf("\n");
	
  out:
	free(tsc_data);
	free(th);
		
	return ret;
}
--8<---------------cut here---------------end--------------->8---


[2] /etc/init.d/save-tsc
--8<---------------cut here---------------start------------->8---
#!/bin/sh

read-tsc "$1" >> /tsc.dat
exit 0
--8<---------------cut here---------------end--------------->8---


[3] tsc.dat
--8<---------------cut here---------------start------------->8---
stop: 222292260504[0] 146566095145777[146343802885273] 146566095145866[146343802885362] 146566095145817[146343802885313] 146566095145895[146343802885391] 146566095145840[146343802885336] 146566095145751[146343802885247] 146566095145707[146343802885203] 
start: 42437383741[0] 146626054987730[146583617603989] 146626054987813[146583617604072] 146626054987873[146583617604132] 146626054987444[146583617603703] 146626054987557[146583617603816] 146626054987703[146583617603962] 146626054987922[146583617604181] 
stop: 175075718467[0] 146758693322318[146583617603851] 146758693322251[146583617603784] 146758693322294[146583617603827] 146758693322276[146583617603809] 146758693322197[146583617603730] 146758693322228[146583617603761] 146758693322116[146583617603649] 
start: 42318111746[0] 146818573335855[146776255224109] 146818573336118[146776255224372] 146818573335988[146776255224242] 146818573335796[146776255224050] 146818573335930[146776255224184] 146818573335738[146776255223992] 146818573335619[146776255223873] 
stop: 117186647162[0] 146893441871380[146776255224218] 146893441871412[146776255224250] 146893441871361[146776255224199] 146893441871287[146776255224125] 146893441871335[146776255224173] 146893441871439[146776255224277] 146893441871269[146776255224107] 
start: 42577639385[0] 146953539519284[146910961879899] 146953539519333[146910961879948] 146953539519268[146910961879883] 146953539536718[146910961897333] 146953539519223[146910961879838] 146953539519068[146910961879683] 146953539519185[146910961879800] 
--8<---------------cut here---------------end--------------->8---


* Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error
  2015-11-11 15:41           ` Gratian Crisan
@ 2015-11-13 20:43             ` Peter Zijlstra
  2015-11-17 16:38               ` Gratian Crisan
  2015-11-19 19:04               ` Gratian Crisan
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2015-11-13 20:43 UTC
  To: Gratian Crisan
  Cc: Josh Hunt, Thomas Gleixner, LKML, Ingo Molnar, H . Peter Anvin,
	x86, Borislav Petkov, Josh Cartwright, gratian

On Wed, Nov 11, 2015 at 09:41:25AM -0600, Gratian Crisan wrote:
> I also wrote a small C utility[1], with a bit of code borrowed from the
> kernel, for reading the TSC on all CPUs. It starts a high priority
> thread per CPU, tries to synchronize them and prints out the TSC values
> and their offset with regards to CPU0.
> It can be called from a SysV init shell script[2] at the beginning of
> the boot process and right before a reboot to save the values in a file.

Could you also read and print TSC_ADJUST (msr 0x3b) ? This would tell us
if for example your BIOS messed it up.



* Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error
  2015-11-09 22:02 ` Peter Zijlstra
       [not found]   ` <CAKA=qzarnUUmZb7DQE+u0Dei3F+FQNoL2bak_-dV9D9+3L=itQ@mail.gmail.com>
@ 2015-11-13 21:13   ` Dave Hansen
  2015-11-17 16:49     ` Gratian Crisan
  1 sibling, 1 reply; 11+ messages in thread
From: Dave Hansen @ 2015-11-13 21:13 UTC
  To: Peter Zijlstra, gratian.crisan
  Cc: Thomas Gleixner, linux-kernel, Ingo Molnar, H . Peter Anvin, x86,
	Borislav Petkov, Josh Cartwright, gratian

On 11/09/2015 02:02 PM, Peter Zijlstra wrote:
> On Mon, Nov 09, 2015 at 01:59:02PM -0600, gratian.crisan@ni.com wrote:
>> The Intel Xeon E5 processor family suffers from errata[1] BT81:
> 
>> +#ifdef CONFIG_X86_TSC
>> +	/*
>> +	 * Xeon E5 BT81 errata: TSC is not affected by warm reset.
>> +	 * The TSC registers for CPUs other than CPU0 are not cleared by a warm
>> +	 * reset resulting in a constant offset error.
>> +	 */
>> +	if ((c->x86 == 6) && (c->x86_model == 0x3f))
>> +		set_cpu_bug(c, X86_BUG_TSC_OFFSET);
>> +#endif
> 
> That's hardly a family, that's just one, Haswell server.

How did you come up with that x86_model?  The document you linked to
claims that "Extended Model" is 0010b and "Model Number" is 1101b, so
the x86_model you are looking for should be 0x2d.



* Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error
  2015-11-13 20:43             ` Peter Zijlstra
@ 2015-11-17 16:38               ` Gratian Crisan
  2015-11-19 19:04               ` Gratian Crisan
  1 sibling, 0 replies; 11+ messages in thread
From: Gratian Crisan @ 2015-11-17 16:38 UTC
  To: Peter Zijlstra
  Cc: Gratian Crisan, Josh Hunt, Thomas Gleixner, LKML, Ingo Molnar,
	H . Peter Anvin, x86, Borislav Petkov, Josh Cartwright, gratian


Peter Zijlstra writes:

> On Wed, Nov 11, 2015 at 09:41:25AM -0600, Gratian Crisan wrote:
>> I also wrote a small C utility[1], with a bit of code borrowed from the
>> kernel, for reading the TSC on all CPUs. It starts a high priority
>> thread per CPU, tries to synchronize them and prints out the TSC values
>> and their offset with regards to CPU0.
>> It can be called from a SysV init shell script[2] at the beginning of
>> the boot process and right before a reboot to save the values in a file.
>
> Could you also read and print TSC_ADJUST (msr 0x3b) ? This would tell us
> if for example your BIOS messed it up.

Good call on the TSC_ADJUST. The BIOS seems to set it for CPU0 but not
for any of the other ones. I'll bug our BIOS guys about it.
Here's how the data looks after a couple of reboots:

stop      : 127385698358[0] 741784252175365[741656866477007] 741784252175432[741656866477074] 741784252175471[741656866477113] 741784252175349[741656866476991] 741784252175458[741656866477100] 741784252175285[741656866476927] 741784252175501[741656866477143] 
TSC_ADJUST: -741656866477048 0 0 0 0 0 0 0 

start     : 47601069657[0] 741849504816842[741801903747185] 741849504816884[741801903747227] 741849504817004[741801903747347] 741849504817113[741801903747456] 741849504817051[741801903747394] 741849504816746[741801903747089] 741849504816962[741801903747305] 
TSC_ADJUST: -741801903747272 0 0 0 0 0 0 0 

stop      : 127495422447[0] 741929399169793[741801903747346] 741929399169821[741801903747374] 741929399169739[741801903747292] 741929399169767[741801903747320] 741929399169657[741801903747210] 741929399169612[741801903747165] 741929399169679[741801903747232] 
TSC_ADJUST: -741801903747272 0 0 0 0 0 0 0 

start     : 47522880051[0] 741994508088208[741946985208157] 741994508088258[741946985208207] 741994508088305[741946985208254] 741994508088110[741946985208059] 741994508088052[741946985208001] 741994508088020[741946985207969] 741994508087930[741946985207879] 
TSC_ADJUST: -741946985208111 0 0 0 0 0 0 0

Thanks a lot for helping with this,
-Gratian


* Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error
  2015-11-13 21:13   ` Dave Hansen
@ 2015-11-17 16:49     ` Gratian Crisan
  0 siblings, 0 replies; 11+ messages in thread
From: Gratian Crisan @ 2015-11-17 16:49 UTC
  To: Dave Hansen
  Cc: Peter Zijlstra, gratian.crisan, Thomas Gleixner, linux-kernel,
	Ingo Molnar, H . Peter Anvin, x86, Borislav Petkov,
	Josh Cartwright, gratian


Dave Hansen writes:

> On 11/09/2015 02:02 PM, Peter Zijlstra wrote:
>> On Mon, Nov 09, 2015 at 01:59:02PM -0600, gratian.crisan@ni.com wrote:
>>> The Intel Xeon E5 processor family suffers from errata[1] BT81:
>> 
>>> +#ifdef CONFIG_X86_TSC
>>> +	/*
>>> +	 * Xeon E5 BT81 errata: TSC is not affected by warm reset.
>>> +	 * The TSC registers for CPUs other than CPU0 are not cleared by a warm
>>> +	 * reset resulting in a constant offset error.
>>> +	 */
>>> +	if ((c->x86 == 6) && (c->x86_model == 0x3f))
>>> +		set_cpu_bug(c, X86_BUG_TSC_OFFSET);
>>> +#endif
>> 
>> That's hardly a family, that's just one, Haswell server.
>
> How did you come up with that x86_model?  The document you linked to
> claims that "Extended Model" is 0010b and "Model Number" is 1101b, so
> the x86_model you are looking for should be 0x2d.

Apologies. I've messed up. The observed behavior seemed to match the
errata and it was a Xeon E5. I used the model number I read off the
machine exhibiting the behavior w/o properly matching it with the model
number in the errata.

In the meantime Peter Zijlstra pointed me in the right direction i.e. it
looks like the BIOS is changing the TSC_ADJUST for CPU0 but not any of
the other ones. I'll sort it out with our BIOS guys and drop this patch.

Sorry again for the confusion.
-Gratian


* Re: [RFC PATCH] tsc: synchronize TSCs on buggy Intel Xeon E5 CPUs with offset error
  2015-11-13 20:43             ` Peter Zijlstra
  2015-11-17 16:38               ` Gratian Crisan
@ 2015-11-19 19:04               ` Gratian Crisan
  1 sibling, 0 replies; 11+ messages in thread
From: Gratian Crisan @ 2015-11-19 19:04 UTC
  To: Peter Zijlstra
  Cc: Gratian Crisan, Josh Hunt, Thomas Gleixner, LKML, Ingo Molnar,
	H . Peter Anvin, x86, Borislav Petkov, Josh Cartwright, gratian


Peter Zijlstra writes:

> On Wed, Nov 11, 2015 at 09:41:25AM -0600, Gratian Crisan wrote:
>> I also wrote a small C utility[1], with a bit of code borrowed from the
>> kernel, for reading the TSC on all CPUs. It starts a high priority
>> thread per CPU, tries to synchronize them and prints out the TSC values
>> and their offset with regards to CPU0.
>> It can be called from a SysV init shell script[2] at the beginning of
>> the boot process and right before a reboot to save the values in a file.
>
> Could you also read and print TSC_ADJUST (msr 0x3b) ? This would tell us
> if for example your BIOS messed it up.

We've gathered some more information on this:

1. We were able to confirm that the TSC_ADJUST is set for CPU0 before
the bootloader or kernel start. Using an EFI shell to read MSR 0x3b
shows:
 0: 0xFFFFFFFFAD999CCB
 1: 0x0000000000000000
 2: 0x0000000000000000
 3: 0x0000000000000000
 4: 0x0000000000000000
 5: 0x0000000000000000
 6: 0x0000000000000000
 7: 0x0000000000000000

2. We were also able to reproduce this behavior on a Dell Precision
T5810 workstation PC. Relevant /proc/cpuinfo below[1]. Additionally we
have observed the same behavior on an off-the-shelf motherboard
equipped with a Haswell-E similar to the Haswell-EP we originally saw
this on.

We are still trying to figure out what in the boot chain sets CPU0's
TSC_ADJUST (i.e. the BIOS or something pre-BIOS) and how to proceed from
there.

Thanks,
        Gratian

[1]
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz
stepping        : 2
microcode       : 0x27
cpu MHz         : 1200.117
cache size      : 15360 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
bogomips        : 6983.91
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
