All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] x86/platform/UV: Update TSC support
@ 2017-09-21 20:12 mike.travis
  2017-09-21 20:12 ` [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being non-null is valid mike.travis
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: mike.travis @ 2017-09-21 20:12 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner
  Cc: Peter Zijlstra, Bin Gao, Prarit Bhargava, Dimitri Sivanich,
	Andrew Banman, Russ Anderson, linux-kernel, x86


The UV BIOS goes to considerable effort to get the TSC synchronization
accurate across the entire system.  Included in that are multiple
chassis that can have 32+ sockets.  The architecture also supports an
external high resolution clock to help all the chassis and CPU's in
maintaining this synchronization.

As a result the UV system BIOS provides much better TSC accuracy than
the current kernel TSC ADJUST functions.  This is extremely important in
applications that, for example, store and access data bases.

These two patches disable an assumption made by the kernel TSC ADJUST
functions that are not applicable to the UV system (and maybe others?).
That being that Socket 0 in the system should have a TSC ADJUST value
of zero.  This is incorrect when all the chassis are started asynchronously
to each other, therefore which TSC's are farther advanced is not predictable.
The UV BIOS deals with this scenario correctly.

-- 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being non-null is valid
  2017-09-21 20:12 [PATCH 0/3] x86/platform/UV: Update TSC support mike.travis
@ 2017-09-21 20:12 ` mike.travis
  2017-09-25 15:30   ` Thomas Gleixner
  2017-09-21 20:12 ` [PATCH 2/3] x86/kernel: Skip TSC test and error messages if already unstable mike.travis
  2017-09-21 20:12 ` [PATCH 3/3] x86/platform/UV: Add check of TSC state set by UV BIOS mike.travis
  2 siblings, 1 reply; 9+ messages in thread
From: mike.travis @ 2017-09-21 20:12 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner
  Cc: Peter Zijlstra, Bin Gao, Prarit Bhargava, Dimitri Sivanich,
	Andrew Banman, Russ Anderson, linux-kernel, x86

[-- Attachment #1: add-base-nonzero --]
[-- Type: text/plain, Size: 4194 bytes --]

Add a flag to indicate that a TSC ADJUST value of non-zero is valid
on Socket 0.  This is required on multiple chassis systems for which
the Time Stamp Counter on all the chassis are started asynchronously.
The UV architecture is an example of this.

In this scenario the UV system BIOS will adjust all the TSC ADJUST values
forward so there are no negative ADJUST values.  This may cause the TSC
ADJUST value on socket 0 to not necessarily be zero.

This procedure results TSC skew rates that are far less than the values
as set by the current kernel TSC adjustment functions which force the
TSC ADJUST to be zero on Socket 0.  Note also that an assumption of
zero for TSC ADJUST was also made by not initializing the adjust values.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Reviewed-by: Russ Anderson <russ.anderson@hpe.com>
Reviewed-by: Andrew Banman <andrew.abanman@hpe.com>
---
 arch/x86/include/asm/tsc.h |    2 ++
 arch/x86/kernel/tsc.c      |   19 +++++++++++++++++++
 arch/x86/kernel/tsc_sync.c |   25 +++++++++++++++++++++----
 3 files changed, 42 insertions(+), 4 deletions(-)

--- linux.orig/arch/x86/include/asm/tsc.h
+++ linux/arch/x86/include/asm/tsc.h
@@ -35,6 +35,8 @@ extern void tsc_init(void);
 extern void mark_tsc_unstable(char *reason);
 extern int unsynchronized_tsc(void);
 extern int check_tsc_unstable(void);
+extern int check_tsc_socket0_nonzero(void);
+extern void mark_tsc_socket0_nonzero(char *reason);
 extern unsigned long native_calibrate_cpu(void);
 extern unsigned long native_calibrate_tsc(void);
 extern unsigned long long native_sched_clock_from_tsc(u64 tsc);
--- linux.orig/arch/x86/kernel/tsc.c
+++ linux/arch/x86/kernel/tsc.c
@@ -37,6 +37,11 @@ EXPORT_SYMBOL(tsc_khz);
  */
 static int __read_mostly tsc_unstable;
 
+/*
+ * TSC on socket 0 being non-zero may be correct as set by BIOS
+ */
+static int __read_mostly tsc_socket0_nonzero;
+
 /* native_sched_clock() is called before tsc_init(), so
    we must start with the TSC soft disabled to prevent
    erroneous rdtsc usage on !boot_cpu_has(X86_FEATURE_TSC) processors */
@@ -244,6 +249,20 @@ int check_tsc_unstable(void)
 }
 EXPORT_SYMBOL_GPL(check_tsc_unstable);
 
+void mark_tsc_socket0_nonzero(char *reason)
+{
+	tsc_socket0_nonzero = 1;
+	pr_info("Marking TSC non-zero value valid for socket 0 due to %s\n",
+		reason);
+}
+EXPORT_SYMBOL_GPL(mark_tsc_socket0_nonzero);
+
+int check_tsc_socket0_nonzero(void)
+{
+	return tsc_socket0_nonzero;
+}
+EXPORT_SYMBOL_GPL(check_tsc_socket0_nonzero);
+
 #ifdef CONFIG_X86_TSC
 int __init notsc_setup(char *str)
 {
--- linux.orig/arch/x86/kernel/tsc_sync.c
+++ linux/arch/x86/kernel/tsc_sync.c
@@ -71,12 +71,22 @@ static void tsc_sanitize_first_cpu(struc
 	 * non zero. We don't do that on non boot cpus because physical
 	 * hotplug should have set the ADJUST register to a value > 0 so
 	 * the TSC is in sync with the already running cpus.
+	 *
+	 * Also don't force the ADJUST value from non-zero to zero if that
+	 * is a valid value for socket 0 as determined by the system BIOS.
+	 * This is required where multiple chassis are started asynchronously
+	 * with each other and socket 0 may not have an TSC ADJUST value of 0.
 	 */
 	if (bootcpu && bootval != 0) {
-		pr_warn(FW_BUG "TSC ADJUST: CPU%u: %lld force to 0\n", cpu,
-			bootval);
-		wrmsrl(MSR_IA32_TSC_ADJUST, 0);
-		bootval = 0;
+		if (!check_tsc_socket0_nonzero()) {
+			pr_warn(FW_BUG "TSC ADJUST: CPU%u: %lld force to 0\n",
+				cpu, bootval);
+			wrmsrl(MSR_IA32_TSC_ADJUST, 0);
+			bootval = 0;
+		} else {
+			pr_info("TSC ADJUST: CPU%u: %lld NOT forced to 0\n",
+				cpu, bootval);
+		}
 	}
 	cur->adjusted = bootval;
 }
@@ -118,6 +128,13 @@ bool tsc_store_and_check_tsc_adjust(bool
 	cur->warned = false;
 
 	/*
+	 * If a non-zero TSC value for socket 0 is valid then the default
+	 * adjusted value cannot assumed to be zero.
+	 */
+	if (check_tsc_socket0_nonzero())
+		cur->adjusted = bootval;
+
+	/*
 	 * Check whether this CPU is the first in a package to come up. In
 	 * this case do not check the boot value against another package
 	 * because the new package might have been physically hotplugged,

-- 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/3] x86/kernel: Skip TSC test and error messages if already unstable
  2017-09-21 20:12 [PATCH 0/3] x86/platform/UV: Update TSC support mike.travis
  2017-09-21 20:12 ` [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being non-null is valid mike.travis
@ 2017-09-21 20:12 ` mike.travis
  2017-09-21 20:12 ` [PATCH 3/3] x86/platform/UV: Add check of TSC state set by UV BIOS mike.travis
  2 siblings, 0 replies; 9+ messages in thread
From: mike.travis @ 2017-09-21 20:12 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner
  Cc: Peter Zijlstra, Bin Gao, Prarit Bhargava, Dimitri Sivanich,
	Andrew Banman, Russ Anderson, linux-kernel, x86

[-- Attachment #1: unstable-tsc-disable-check --]
[-- Type: text/plain, Size: 1254 bytes --]

If the TSC has already been determined to be unstable, then checking
TSC ADJUST values is a waste of time and generates unnecessary error
messages (840 for a 16 socket Skylake 28/2 core/ht system).

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Reviewed-by: Russ Anderson <russ.anderson@hpe.com>
---
 arch/x86/kernel/tsc_sync.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- linux.orig/arch/x86/kernel/tsc_sync.c
+++ linux/arch/x86/kernel/tsc_sync.c
@@ -38,6 +38,10 @@ void tsc_verify_tsc_adjust(bool resume)
 	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
 		return;
 
+	/* Skip unnecessary error messages if TSC already unstable */
+	if (check_tsc_unstable())
+		return;
+
 	/* Rate limit the MSR check */
 	if (!resume && time_before(jiffies, adj->nextcheck))
 		return;
@@ -61,6 +65,9 @@ void tsc_verify_tsc_adjust(bool resume)
 static void tsc_sanitize_first_cpu(struct tsc_adjust *cur, s64 bootval,
 				   unsigned int cpu, bool bootcpu)
 {
+	/* Skip unnecessary error messages if TSC already unstable */
+	if (check_tsc_unstable())
+		return;
 	/*
 	 * First online CPU in a package stores the boot value in the
 	 * adjustment value. This value might change later via the sync

-- 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 3/3] x86/platform/UV: Add check of TSC state set by UV BIOS
  2017-09-21 20:12 [PATCH 0/3] x86/platform/UV: Update TSC support mike.travis
  2017-09-21 20:12 ` [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being non-null is valid mike.travis
  2017-09-21 20:12 ` [PATCH 2/3] x86/kernel: Skip TSC test and error messages if already unstable mike.travis
@ 2017-09-21 20:12 ` mike.travis
  2 siblings, 0 replies; 9+ messages in thread
From: mike.travis @ 2017-09-21 20:12 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner
  Cc: Peter Zijlstra, Bin Gao, Prarit Bhargava, Dimitri Sivanich,
	Andrew Banman, Russ Anderson, linux-kernel, x86

[-- Attachment #1: uv_set-base-nonzero --]
[-- Type: text/plain, Size: 3584 bytes --]

Insert a check early in UV system startup that checks whether BIOS was
able to obtain satisfactory TSC Sync stability.  If not, it usually
is caused by an error in the external TSC clock generation source.
In this case the best fallback is to use the builtin hardware RTC as
the kernel would also not be able to obtain an accurate TSC sync.

Signed-off-by: Mike Travis <mike.travis@hpe.com>
Reviewed-by: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Reviewed-by: Russ Anderson <russ.anderson@hpe.com>
Reviewed-by: Andrew Banman <andrew.abanman@hpe.com>
---
 arch/x86/include/asm/uv/uv_hub.h   |   19 +++++++++++++--
 arch/x86/kernel/apic/x2apic_uv_x.c |   44 +++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 3 deletions(-)

--- linux.orig/arch/x86/include/asm/uv/uv_hub.h
+++ linux/arch/x86/include/asm/uv/uv_hub.h
@@ -776,9 +776,22 @@ static inline int uv_num_possible_blades
 extern void uv_nmi_setup(void);
 extern void uv_nmi_setup_hubless(void);
 
+/* BIOS/Kernel flags exchange MMR */
+#define	UVH_BIOS_KERNEL_MMR		UVH_SCRATCH5
+#define	UVH_BIOS_KERNEL_MMR_ALIAS	UVH_SCRATCH5_ALIAS
+#define UVH_BIOS_KERNEL_MMR_ALIAS_2	UVH_SCRATCH5_ALIAS_2
+
+/* TSC sync valid, set by BIOS */
+#define	UVH_TSC_SYNC_MMR	UVH_BIOS_KERNEL_MMR
+#define	UVH_TSC_SYNC_SHIFT	10
+#define	UVH_TSC_SYNC_SHIFT_UV2K	16	/* UV3k has different bits */
+#define	UVH_TSC_SYNC_MASK	3	/* 0011 */
+#define	UVH_TSC_SYNC_VALID	3	/* 0011 */
+#define	UVH_TSC_SYNC_INVALID	2	/* 0010 */
+
 /* BMC sets a bit this MMR non-zero before sending an NMI */
-#define UVH_NMI_MMR		UVH_SCRATCH5
-#define UVH_NMI_MMR_CLEAR	UVH_SCRATCH5_ALIAS
+#define UVH_NMI_MMR		UVH_BIOS_KERNEL_MMR
+#define UVH_NMI_MMR_CLEAR	UVH_BIOS_KERNEL_MMR_ALIAS
 #define UVH_NMI_MMR_SHIFT	63
 #define	UVH_NMI_MMR_TYPE	"SCRATCH5"
 
@@ -792,7 +805,7 @@ extern void uv_nmi_setup_hubless(void);
 #define UVH_NMI_MMRX_SUPPORTED	UVH_EXTIO_INT0_BROADCAST
 
 /* Indicates to BIOS that we want to use the newer SMM NMI handler */
-#define UVH_NMI_MMRX_REQ	UVH_SCRATCH5_ALIAS_2
+#define UVH_NMI_MMRX_REQ	UVH_BIOS_KERNEL_MMR_ALIAS_2
 #define UVH_NMI_MMRX_REQ_SHIFT	62
 
 struct uv_hub_nmi_s {
--- linux.orig/arch/x86/kernel/apic/x2apic_uv_x.c
+++ linux/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -154,6 +154,49 @@ static int __init early_get_pnodeid(void
 	return pnode;
 }
 
+static void uv_tsc_check_sync(void)
+{
+	u64 mmr;
+	int sync_state;
+	int mmr_shift;
+	char *reason;
+	char *state;
+	bool valid;
+
+	/* Accommodate different UV arch BIOSes */
+	mmr = uv_early_read_mmr(UVH_TSC_SYNC_MMR);
+	mmr_shift =
+		is_uv1_hub() ? 0 :
+		is_uv2_hub() ? UVH_TSC_SYNC_SHIFT_UV2K : UVH_TSC_SYNC_SHIFT;
+	if (mmr_shift)
+		sync_state = (mmr >> mmr_shift) & UVH_TSC_SYNC_MASK;
+	else
+		sync_state = 0;
+
+	switch (sync_state) {
+	case UVH_TSC_SYNC_VALID:
+		state = "in sync";
+		valid = true;
+		reason = "UV BIOS";
+		break;
+
+	case UVH_TSC_SYNC_INVALID:
+		state = "unstable";
+		valid = false;
+		break;
+	default:
+		state = "unknown";
+		valid = true;
+		reason = "UV default";
+		break;
+	}
+	pr_info("UV: TSC sync state from BIOS:0%d(%s)\n", sync_state, state);
+	if (valid)
+		mark_tsc_socket0_nonzero(reason);
+	else
+		mark_tsc_unstable("UV BIOS");
+}
+
 /* [Copied from arch/x86/kernel/cpu/topology.c:detect_extended_topology()] */
 
 #define SMT_LEVEL			0	/* Leaf 0xb SMT level */
@@ -288,6 +331,7 @@ static int __init uv_acpi_madt_oem_check
 	}
 
 	pr_info("UV: OEM IDs %s/%s, System/HUB Types %d/%d, uv_apic %d\n", oem_id, oem_table_id, uv_system_type, uv_min_hub_revision_id, uv_apic);
+	uv_tsc_check_sync();
 
 	return uv_apic;
 

-- 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being non-null is valid
  2017-09-21 20:12 ` [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being non-null is valid mike.travis
@ 2017-09-25 15:30   ` Thomas Gleixner
  2017-09-25 16:47     ` Mike Travis
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2017-09-25 15:30 UTC (permalink / raw)
  To: mike.travis
  Cc: Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Bin Gao,
	Prarit Bhargava, Dimitri Sivanich, Andrew Banman, Russ Anderson,
	linux-kernel, x86

On Thu, 21 Sep 2017, mike.travis@hpe.com wrote:
> +/*
> + * TSC on socket 0 being non-zero may be correct as set by BIOS
> + */
> +static int __read_mostly tsc_socket0_nonzero;
> +
>  /* native_sched_clock() is called before tsc_init(), so
>     we must start with the TSC soft disabled to prevent
>     erroneous rdtsc usage on !boot_cpu_has(X86_FEATURE_TSC) processors */
> @@ -244,6 +249,20 @@ int check_tsc_unstable(void)
>  }
>  EXPORT_SYMBOL_GPL(check_tsc_unstable);
>  
> +void mark_tsc_socket0_nonzero(char *reason)
> +{
> +	tsc_socket0_nonzero = 1;
> +	pr_info("Marking TSC non-zero value valid for socket 0 due to %s\n",
> +		reason);
> +}
> +EXPORT_SYMBOL_GPL(mark_tsc_socket0_nonzero);
>
> +int check_tsc_socket0_nonzero(void)
> +{
> +	return tsc_socket0_nonzero;
> +}
> +EXPORT_SYMBOL_GPL(check_tsc_socket0_nonzero);

Is there a real reason to export these functions? I can't see the UV early
boot code and tsc_sync being built as modules in the forseeable future, but
perhaps you know more than I do :)

Aside of that I really do not like this kind of special case hackery. The
real question is whether we need to enforce TSC_ADJUST == 0 on the boot cpu
at all. In principle we don't anymore now that we handle that TSC deadline
timer wreckage cleanly.

But the UV 'boot chassis at different times' brings me to a related
question:

How is this setup dealing with ART (Always Running Timer, which is
distributed over PCIe for hardware timestamping and hardware assisted event
correlation)?

I assume that ART on UV is also per chassis, but that means that the
documented relation ship of:

	TSC = ART * n/d + offset

where $offset is system wide (the TSC_ADJUST value of the boot cpu), is
not applicable.

Is there some other magic in play which makes ART work across chassis?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being non-null is valid
  2017-09-25 15:30   ` Thomas Gleixner
@ 2017-09-25 16:47     ` Mike Travis
  2017-09-25 18:10       ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Mike Travis @ 2017-09-25 16:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Bin Gao,
	Prarit Bhargava, Dimitri Sivanich, Andrew Banman, Russ Anderson,
	linux-kernel, x86



On 9/25/2017 8:30 AM, Thomas Gleixner wrote:
> On Thu, 21 Sep 2017, mike.travis@hpe.com wrote:
>> +/*
>> + * TSC on socket 0 being non-zero may be correct as set by BIOS
>> + */
>> +static int __read_mostly tsc_socket0_nonzero;
>> +
>>   /* native_sched_clock() is called before tsc_init(), so
>>      we must start with the TSC soft disabled to prevent
>>      erroneous rdtsc usage on !boot_cpu_has(X86_FEATURE_TSC) processors */
>> @@ -244,6 +249,20 @@ int check_tsc_unstable(void)
>>   }
>>   EXPORT_SYMBOL_GPL(check_tsc_unstable);
>>   
>> +void mark_tsc_socket0_nonzero(char *reason)
>> +{
>> +	tsc_socket0_nonzero = 1;
>> +	pr_info("Marking TSC non-zero value valid for socket 0 due to %s\n",
>> +		reason);
>> +}
>> +EXPORT_SYMBOL_GPL(mark_tsc_socket0_nonzero);
>>
>> +int check_tsc_socket0_nonzero(void)
>> +{
>> +	return tsc_socket0_nonzero;
>> +}
>> +EXPORT_SYMBOL_GPL(check_tsc_socket0_nonzero);
> 
> Is there a real reason to export these functions? I can't see the UV early
> boot code and tsc_sync being built as modules in the forseeable future, but
> perhaps you know more than I do :)

Yes, that was a mistake.  I originally inserted this by following the 
example of the check_tsc_unstable() function.  I had a later patch 
moving this to tsc_sync.c as a local flag but apparently it reverted 
back to the earlier version.  I'll fix that.
> 
> Aside of that I really do not like this kind of special case hackery. The
> real question is whether we need to enforce TSC_ADJUST == 0 on the boot cpu
> at all. In principle we don't anymore now that we handle that TSC deadline
> timer wreckage cleanly.

I am hesitant to make such a global change as it appears the author 
intentionally added this.  It not only caused our internal tsc sync 
tests to become totally out of whack, it also generated an avalanche of 
error messages to the system console (>3000 messages for a 32 socket 
Skylake system).  And I don't have the means to test how major changes 
to the TSC adjust functions will affect standard whitebox PC's.

Our BIOS team did make a change to conform to the "TSC_ADJUST should be 
the same on all cpu threads on a single socket" requirement, so we were 
able to pass that part of the TSC validation functions.  (Prior to this, 
the TSC's were synced by writing directly to the TSC MSR and natural 
delays in the processor firmware caused the slight differences in the 
TSC ADJUST values.)

> 
> But the UV 'boot chassis at different times' brings me to a related
> question:

Essentially what happens is the system reset signals are distributed in 
various ways which cause the different chassis to start up 
asynchronously with each other.  The UV system is not "hard" bound to 
each other but adapts to the system configuration as it starts up.

> 
> How is this setup dealing with ART (Always Running Timer, which is
> distributed over PCIe for hardware timestamping and hardware assisted event
> correlation)?
> 
> I assume that ART on UV is also per chassis, but that means that the
> documented relation ship of:
> 
> 	TSC = ART * n/d + offset
> 
> where $offset is system wide (the TSC_ADJUST value of the boot cpu), is
> not applicable.
> 
> Is there some other magic in play which makes ART work across chassis?
 >
 > Thanks,
 >
 > 	tglx
 >

Sorry, I'm not sure how the UV hardware mimics the concept of 'ART'.  It 
does have an external clock generator that is distributed as part of the 
NumaLink protocol and signal set.  Since separate chassis can be 
configured to be either within the same SSI or in separate SSI's then it 
has the ability to configure which chassis are in sync with each other 
and which are on a different clock sync.  This is all within the purview 
of the BIOS folks.

We do have independent methods to verify if TSCs' are in sync with each 
other by measuring the skew rate.  Typical deviations on UV are within a 
two digit clock tick spread, which at an Uncore frequency of 2.5Ghz is 
in the small single digit or less nanosecond range.

I'll post the updated patch set shortly.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being non-null is valid
  2017-09-25 16:47     ` Mike Travis
@ 2017-09-25 18:10       ` Thomas Gleixner
  2017-09-26  0:13         ` Mike Travis
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2017-09-25 18:10 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Bin Gao,
	Prarit Bhargava, Dimitri Sivanich, Andrew Banman, Russ Anderson,
	linux-kernel, x86

On Mon, 25 Sep 2017, Mike Travis wrote:
> On 9/25/2017 8:30 AM, Thomas Gleixner wrote:
> > Aside of that I really do not like this kind of special case hackery. The
> > real question is whether we need to enforce TSC_ADJUST == 0 on the boot cpu
> > at all. In principle we don't anymore now that we handle that TSC deadline
> > timer wreckage cleanly.
> 
> I am hesitant to make such a global change as it appears the author
> intentionally added this.  It not only caused our internal tsc sync tests to
> become totally out of whack, it also generated an avalanche of error messages
> to the system console (>3000 messages for a 32 socket Skylake system).  And I
> don't have the means to test how major changes to the TSC adjust functions
> will affect standard whitebox PC's.

The reason why I put it there in the first place was to make the TSC
deadline timer work on a whole range of systems. It turned out that our
'fix' was not enough so we changed that later to disable the deadline timer
completely on affected systems when the firmware does not contain a fix for
it. So there is no real technical reason anymore to enforce TSC_ADJUST == 0
on the boot CPU. So rather than special casing this for UV we should just
remove that requirement and leave the boot value as is.

> Our BIOS team did make a change to conform to the "TSC_ADJUST should be the
> same on all cpu threads on a single socket" requirement, so we were able to
> pass that part of the TSC validation functions.  (Prior to this, the TSC's
> were synced by writing directly to the TSC MSR and natural delays in the
> processor firmware caused the slight differences in the TSC ADJUST values.)

Right. TSC_ADJUST is there for a reason.

> > But the UV 'boot chassis at different times' brings me to a related
> > question:
> 
> Essentially what happens is the system reset signals are distributed in
> various ways which cause the different chassis to start up asynchronously with
> each other.  The UV system is not "hard" bound to each other but adapts to the
> system configuration as it starts up.

I figured that much.

> > How is this setup dealing with ART (Always Running Timer, which is
> > distributed over PCIe for hardware timestamping and hardware assisted event
> > correlation)?
> > 
> > I assume that ART on UV is also per chassis, but that means that the
> > documented relation ship of:
> > 
> > 	TSC = ART * n/d + offset
> > 
> > where $offset is system wide (the TSC_ADJUST value of the boot cpu), is
> > not applicable.
> > 
> > Is there some other magic in play which makes ART work across chassis?
> >
> > Thanks,
> >
> > 	tglx
> >
> 
> Sorry, I'm not sure how the UV hardware mimics the concept of 'ART'.  It does
> have an external clock generator that is distributed as part of the NumaLink
> protocol and signal set.  Since separate chassis can be configured to be
> either within the same SSI or in separate SSI's then it has the ability to
> configure which chassis are in sync with each other and which are on a
> different clock sync.  This is all within the purview of the BIOS folks.

Cute. How is that supposed to work, when the chassis are out of sync?

> We do have independent methods to verify if TSCs' are in sync with each other
> by measuring the skew rate.  Typical deviations on UV are within a two digit
> clock tick spread, which at an Uncore frequency of 2.5Ghz is in the small
> single digit or less nanosecond range.

That should be good enough to pass the kernel side tests.

But back to my question about ART. You might talk to your BIOS/HW folks
about that and eventually disable the ART related functionality in the
kernel on UV.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being non-null is valid
  2017-09-25 18:10       ` Thomas Gleixner
@ 2017-09-26  0:13         ` Mike Travis
  2017-09-26  7:28           ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Mike Travis @ 2017-09-26  0:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Bin Gao,
	Prarit Bhargava, Dimitri Sivanich, Andrew Banman, Russ Anderson,
	linux-kernel, x86



On 9/25/2017 11:10 AM, Thomas Gleixner wrote:
> On Mon, 25 Sep 2017, Mike Travis wrote:
>> On 9/25/2017 8:30 AM, Thomas Gleixner wrote:
>>> Aside of that I really do not like this kind of special case hackery. The
>>> real question is whether we need to enforce TSC_ADJUST == 0 on the boot cpu
>>> at all. In principle we don't anymore now that we handle that TSC deadline
>>> timer wreckage cleanly.
>>
>> I am hesitant to make such a global change as it appears the author
>> intentionally added this.  It not only caused our internal tsc sync tests to
>> become totally out of whack, it also generated an avalanche of error messages
>> to the system console (>3000 messages for a 32 socket Skylake system).  And I
>> don't have the means to test how major changes to the TSC adjust functions
>> will affect standard whitebox PC's.
> 
> The reason why I put it there in the first place was to make the TSC
> deadline timer work on a whole range of systems. It turned out that our
> 'fix' was not enough so we changed that later to disable the deadline timer
> completely on affected systems when the firmware does not contain a fix for
> it. So there is no real technical reason anymore to enforce TSC_ADJUST == 0
> on the boot CPU. So rather than special casing this for UV we should just
> remove that requirement and leave the boot value as is.

Okay, I could do that but as I mentioned I'm not comfortable changing 
code that I cannot personally verify is correct.  I'm assuming you only 
mean removing the part where socket 0 should have a TSC_ADJUST value of 
0?  And the part mentioned below that all cpu threads on a socket should 
have the same adjust values should not be changed?
> 
>> Our BIOS team did make a change to conform to the "TSC_ADJUST should be the
>> same on all cpu threads on a single socket" requirement, so we were able to
>> pass that part of the TSC validation functions.  (Prior to this, the TSC's
>> were synced by writing directly to the TSC MSR and natural delays in the
>> processor firmware caused the slight differences in the TSC ADJUST values.)
> 
> Right. TSC_ADJUST is there for a reason.

Well, there is the problem with bit 63 (sign bit) in the TSC ADJUST MSR. 
So it's almost there... :)
> 
>>> But the UV 'boot chassis at different times' brings me to a related
>>> question:
>>
>> Essentially what happens is the system reset signals are distributed in
>> various ways which cause the different chassis to start up asynchronously with
>> each other.  The UV system is not "hard" bound to each other but adapts to the
>> system configuration as it starts up.
> 
> I figured that much.
> 
>>> How is this setup dealing with ART (Always Running Timer, which is
>>> distributed over PCIe for hardware timestamping and hardware assisted event
>>> correlation)?
>>>
>>> I assume that ART on UV is also per chassis, but that means that the
>>> documented relation ship of:
>>>
>>> 	TSC = ART * n/d + offset
>>>
>>> where $offset is system wide (the TSC_ADJUST value of the boot cpu), is
>>> not applicable.
>>>
>>> Is there some other magic in play which makes ART work across chassis?
>>>
>>> Thanks,
>>>
>>> 	tglx
>>>
>>
>> Sorry, I'm not sure how the UV hardware mimics the concept of 'ART'.  It does
>> have an external clock generator that is distributed as part of the NumaLink
>> protocol and signal set.  Since separate chassis can be configured to be
>> either within the same SSI or in separate SSI's then it has the ability to
>> configure which chassis are in sync with each other and which are on a
>> different clock sync.  This is all within the purview of the BIOS folks.
> 
> Cute. How is that supposed to work, when the chassis are out of sync?

Huh?  The TSC's are only guaranteed to be synced within an SSI.  As long 
as the sockets in the same SSI share a common clock, the BIOS can 
synchronize the TSC's to be very, very close.

>> We do have independent methods to verify if TSCs' are in sync with each other
>> by measuring the skew rate.  Typical deviations on UV are within a two digit
>> clock tick spread, which at an Uncore frequency of 2.5Ghz is in the small
>> single digit or less nanosecond range.
> 
> That should be good enough to pass the kernel side tests.
> 
> But back to my question about ART. You might talk to your BIOS/HW folks
> about that and eventually disable the ART related functionality in the
> kernel on UV.

Once again, I'm not sure what code you are talking about.  I talked to 
the BIOS engineer specifically working on TSC related items and he was 
also not familiar with the Intel ART spec.  I could go farther up the 
chain to the hardware designers but right now this would be extremely 
low on their priority queue.

Perhaps if you were more specific in what code should be disabled?  We 
would like to keep as much as the self check code as possible as there 
are still some remotely possible failure cases.  And removing the self 
checks would have a negative impact on reliability.

I actually did test using the CPU flags, "TSC reliable" and no "TSC 
ADJUST" to prevent the kernel from overwriting the already correct TSC 
ADJUST values.  But that also turned off the clock source watchdog and 
having the check_tsc_sync_target() test run was a good verification 
step.  Our goal was/is to conform as much as possible to the checks and 
the only unfixable code was the assumption that socket 0 should have a 
TSC_ADJUST value of 0.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being non-null is valid
  2017-09-26  0:13         ` Mike Travis
@ 2017-09-26  7:28           ` Thomas Gleixner
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2017-09-26  7:28 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Bin Gao,
	Prarit Bhargava, Dimitri Sivanich, Andrew Banman, Russ Anderson,
	linux-kernel, x86

On Mon, 25 Sep 2017, Mike Travis wrote:
> On 9/25/2017 11:10 AM, Thomas Gleixner wrote:
> > The reason why I put it there in the first place was to make the TSC
> > deadline timer work on a whole range of systems. It turned out that our
> > 'fix' was not enough so we changed that later to disable the deadline timer
> > completely on affected systems when the firmware does not contain a fix for
> > it. So there is no real technical reason anymore to enforce TSC_ADJUST == 0
> > on the boot CPU. So rather than special casing this for UV we should just
> > remove that requirement and leave the boot value as is.
> 
> Okay, I could do that but as I mentioned I'm not comfortable changing code
> that I cannot personally verify is correct.  I'm assuming you only mean
> removing the part where socket 0 should have a TSC_ADJUST value of 0?  And the
> part mentioned below that all cpu threads on a socket should have the same
> adjust values should not be changed?

Right, only the restriction of TSC_ADJUST == 0 on the boot CPU can be
removed. If the TSC_ADJUST values in a package differ, then BIOS did
something really stupid because the underlying counter is the same for all
threads in a package and different TSC_ADJUST values will result in TSC out
of sync being detected.

> > > Our BIOS team did make a change to conform to the "TSC_ADJUST should be
> > > the
> > > same on all cpu threads on a single socket" requirement, so we were able
> > > to
> > > pass that part of the TSC validation functions.  (Prior to this, the TSC's
> > > were synced by writing directly to the TSC MSR and natural delays in the
> > > processor firmware caused the slight differences in the TSC ADJUST
> > > values.)
> > 
> > Right. TSC_ADJUST is there for a reason.
> 
> Well, there is the problem with bit 63 (sign bit) in the TSC ADJUST MSR. So
> it's almost there... :)

What's the problem with bit 63?

> > But back to my question about ART. You might talk to your BIOS/HW folks
> > about that and eventually disable the ART related functionality in the
> > kernel on UV.
> 
> Once again, I'm not sure what code you are talking about.  I talked to the
> BIOS engineer specifically working on TSC related items and he was also not
> familiar with the Intel ART spec.  I could go farther up the chain to the
> hardware designers but right now this would be extremely low on their priority
> queue.
> 
> Perhaps if you were more specific in what code should be disabled?  We would
> like to keep as much as the self check code as possible as there are still
> some remotely possible failure cases.  And removing the self checks would have
> a negative impact on reliability.

detect_art() -> X86_FEATURE_ART

X86_FEATURE_ART enables the hardware assisted PTP <-> TSC correlation in
network cards and is used for network synchronized audio stuff. It's
probably a non issue today but TSN is becoming more widespread so I expect
more users in the foreseeable future. We can just drop out in detect_art()
if the system is UV so X86_FEATURE_ART does not get set and all related
functionality is disabled automagically.

> I actually did test using the CPU flags, "TSC reliable" and no "TSC ADJUST" to
> prevent the kernel from overwriting the already correct TSC ADJUST values.
> But that also turned off the clock source watchdog and having the
> check_tsc_sync_target() test run was a good verification step.  Our goal
> was/is to conform as much as possible to the checks and the only unfixable
> code was the assumption that socket 0 should have a TSC_ADJUST value of 0.

Which can be removed. If you don't want to do it. I'm happy to write the
patch^Wchangelog myself.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-09-26  7:29 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-21 20:12 [PATCH 0/3] x86/platform/UV: Update TSC support mike.travis
2017-09-21 20:12 ` [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being non-null is valid mike.travis
2017-09-25 15:30   ` Thomas Gleixner
2017-09-25 16:47     ` Mike Travis
2017-09-25 18:10       ` Thomas Gleixner
2017-09-26  0:13         ` Mike Travis
2017-09-26  7:28           ` Thomas Gleixner
2017-09-21 20:12 ` [PATCH 2/3] x86/kernel: Skip TSC test and error messages if already unstable mike.travis
2017-09-21 20:12 ` [PATCH 3/3] x86/platform/UV: Add check of TSC state set by UV BIOS mike.travis

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.