[PATCH] arch/x86/kernel/tsc.c : set X86_FEATURE_ART for TSC on CPUs like i7-4910MQ : bug #194609

* [PATCH] arch/x86/kernel/tsc.c : set X86_FEATURE_ART for TSC on CPUs like i7-4910MQ : bug #194609
@ 2017-02-19  0:31 Jason Vas Dias
  2017-02-19 15:35 ` Jason Vas Dias
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Vas Dias @ 2017-02-19  0:31 UTC (permalink / raw)
  To: kernel-janitors, linux-kernel, prarit

[-- Attachment #1: Type: text/plain, Size: 3499 bytes --]

I originally reported this issue on bugzilla.kernel.org : bug # 194609 :
https://bugzilla.kernel.org/show_bug.cgi?id=194609
, but it was not posted to the list .

My CPU reports 'model name' as
"Intel(R) Core(TM) i7-4910MQ CPU @ 2.90GHz" ,
has 4 physical & 8 hyperthreading cores with a frequency scalable from 800000
to 3900000 (/sys/devices/system/cpu/cpu0/cpufreq/scaling_{min,max}_freq) , and
flags :
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1
sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase
tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm xsaveopt dtherm
ida arat pln pts

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
$

CPUID:15H is available in user-space, returning the integers : ( 7,
832, 832 ) in EAX:EBX:ECX , yet boot_cpu_data.cpuid_level is 13 , so
in detect_art() in tsc.c,
Linux does not think ART is enabled, and does not set the synthesized CPUID +
((3*32)+10) bit, so a program looking at /dev/cpu/0/cpuid would not
see this bit set .
if an e1000 NIC card had been installed, PTP would not be available.
Also, if the MSR TSC_ADJUST has not yet been written, as it seems to be
nowhere else in Linux,  the code will always think X86_FEATURE_ART is 0
because the CPU will always get a fault reading the MSR since it has
never been written.

So the attached patch makes tsc.c set X86_FEATURE_ART correctly in tsc.c ,
and set the TSC_ADJUST to 0 if the rdmsr gets an error .
Please consider applying it to a future linux version.

It would be nice for user-space programs that want to use the TSC with
rdtsc / rdtscp instructions, such as the demo program attached to the
bug report,
could have confidence that Linux is actually generating the results of
clock_gettime(CLOCK_MONOTONIC_RAW, &timespec)
in a predictable way from the TSC by looking at the
 /dev/cpu/0/cpuid[bit(((3*32)+10)] value before enabling user-space
use of TSC values, so that they can correlate TSC values with linux
clock_gettime() values.

The patch applies to linux kernels v4.8 & v4.9.10 GIT tags  and the
kernels build
and run & the demo program produces results like :
 $ ./ttsc1
has tsc: 1 constant: 1
832 / 7 = 118 : 832 - 9.888914286E+04hz : OK:1
Hooray! TSC is enabled with KHz: 2893300
ts2 - ts1: 261 ts3 - ts2: 211 ns1: 0.000000146 ns2: 0.000001629
ts3 - ts2: 27 ns1: 0.000000168
ts3 - ts2: 20 ns1: 0.000000147
ts3 - ts2: 14 ns1: 0.000000152
ts3 - ts2: 15 ns1: 0.000000151
ts3 - ts2: 15 ns1: 0.000000153
ts3 - ts2: 15 ns1: 0.000000150
ts3 - ts2: 20 ns1: 0.000000148
ts3 - ts2: 19 ns1: 0.000000164
ts3 - ts2: 19 ns1: 0.000000164
ts3 - ts2: 19 ns1: 0.000000160
t1 - t0: 52901 - ns2: 0.000053951

The value 'ts3 - ts2' is the number of nanoseconds measured by
successive calls to
'rdtscp'; the 'ns1' value is the number of nanoseconds (shown as
decimal seconds)
measured by
  clock_gettime(CLOCK_MONOTONIC_RAW, &ts2) -
  clock_gettime(CLOCK_MONOTONIC_RAW, &ts1)
when casting each {ts.tv_sec, ts.tv_nsec} to a 128 bit long long integer .
It shows a user-space program can read the TSC with a latency of @20ns
but can only measure times >= @ 140ns using Linux clock_gettime()  on this CPU.

[-- Attachment #2: x86_kernel_tsc-bz194609.patch --]
[-- Type: application/octet-stream, Size: 2993 bytes --]

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 46b2f41..f76cca8 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1030,6 +1030,7 @@ core_initcall(cpufreq_register_tsc_scaling);
 #endif /* CONFIG_CPU_FREQ */
 
 #define ART_CPUID_LEAF (0x15)
+#define MINIMUM_CPUID_EXTENDED_LEAF_THAT_MUST_HAVE_ART (0x80000008)
 #define ART_MIN_DENOMINATOR (1)
 
 
@@ -1038,24 +1039,43 @@ core_initcall(cpufreq_register_tsc_scaling);
  */
 static void detect_art(void)
 {
-	unsigned int unused[2];
-
-	if (boot_cpu_data.cpuid_level < ART_CPUID_LEAF)
-		return;
-
-	cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
-	      &art_to_tsc_numerator, unused, unused+1);
-
+	unsigned int v[2];
+
+	if(boot_cpu_data.cpuid_level < ART_CPUID_LEAF)
+        {
+                if(boot_cpu_data.extended_cpuid_level >= MINIMUM_CPUID_EXTENDED_LEAF_THAT_MUST_HAVE_ART)
+                {
+                        pr_info("Would normally not use ART - cpuid_level:%d < %d - but extended_cpuid_level is: %x, so probing for ART support.\n",
+                        boot_cpu_data.cpuid_level, ART_CPUID_LEAF, boot_cpu_data.extended_cpuid_level);
+                }else
+                        return;
+        }                
+
+        cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
+              &art_to_tsc_numerator, v, v+1);
+        
 	/* Don't enable ART in a VM, non-stop TSC required */
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR) ||
-	    !boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ||
-	    art_to_tsc_denominator < ART_MIN_DENOMINATOR)
-		return;
-
-	if (rdmsrl_safe(MSR_IA32_TSC_ADJUST, &art_to_tsc_offset))
-		return;
-
+           !boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ||
+	   art_to_tsc_denominator < ART_MIN_DENOMINATOR)
+        {
+                pr_info("Not using Intel ART for TSC - HYPERVISOR:%d  NO NONSTOP_TSC:%d  bad TSC/Crystal ratio denominator: %d.", boot_cpu_has(X86_FEATURE_HYPERVISOR), !boot_cpu_has(X86_FEATURE_NONSTOP_TSC), art_to_tsc_denominator );
+                return;
+        }
+	if (  (v[0]=rdmsrl_safe(MSR_IA32_TSC_ADJUST, &art_to_tsc_offset))!=0) /* will get fault on first read if nothing written yet */
+        {
+                if((v[1]=wrmsrl_safe(MSR_IA32_TSC_ADJUST, 0))!=0)
+                {
+                        pr_info("Not using Intel ART for TSC - failed to initialize TSC_ADJUST: %d %d.\n", v[0], v[1] );
+                        return;
+                }else
+                {
+                        art_to_tsc_offset = 0; /* perhaps initalize to -1 * current rdtsc value ? */
+                        pr_info("Using Intel ART for TSC - TSC_ADJUST initialized to %llu.\n",art_to_tsc_offset);
+                }
+        }
 	/* Make this sticky over multiple CPU init calls */
+        pr_info("Using Intel Always Running Timer (ART) feature %x for TSC on all CPUs - TSC/CCC: %d/%d offset: %llu.\n", X86_FEATURE_ART, art_to_tsc_numerator, art_to_tsc_denominator, art_to_tsc_offset );
 	setup_force_cpu_cap(X86_FEATURE_ART);
 }
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread