I have found a new source of weirdness with the TSC when using
clock_gettime(CLOCK_MONOTONIC_RAW,&ts):

The vsyscall_gtod_data.mult field changes slightly between calls to
clock_gettime(CLOCK_MONOTONIC_RAW,&ts), so that sometimes an extra
(2^24) nanoseconds are added to or removed from the value derived from
the TSC and stored in 'ts'.

This is demonstrated by the output of the test program in the attached
ttsc.tar file:

  $ ./tlgtd
  it worked! - GTOD: clock:1 mult:5798662 shift:24 synced - mult now: 5798661

The program finds the address of the 'vsyscall_gtod_data' structure in
/proc/kallsyms, maps that virtual address to an ELF section offset within
/proc/kcore, and reads just the 'vsyscall_gtod_data' structure into
user-space memory.

Really, this 'mult' value, which is used to compute the
seconds|nanoseconds value as
   ( tsc_cycles * mult ) >> shift      (where shift is 24)
should not change after it is first initialized. The TSC is meant to be
FIXED FREQUENCY, right? So how could, or why should, the conversion
function from TSC ticks to nanoseconds change?

So now it is doubly difficult for user-space libraries to keep their
RDTSC-derived seconds|nanoseconds values well correlated with those
returned by the kernel, because they must regularly re-read the updated
'mult' value used by the kernel.

I really don't think the kernel should randomly be deciding to
increase / decrease the TSC tick period by 2^24 nanoseconds!

Is this a bug or intentional? I am searching for all places where a
'[.>]mult.*=' occurs, but this returns rather a lot of matches.

Please could a future version of linux at least export the 'mult' and
'shift' values for the current clocksource!
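To get a feel for how far apart two 'mult' values push the result, here is a
minimal sketch of the ( tsc_cycles * mult ) >> shift conversion in C. The
function names are made up for illustration and are not kernel symbols; it
also ignores any base offset the kernel adds and assumes short intervals, so
that the 64-bit product does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a TSC cycle count to nanoseconds as (cycles * mult) >> shift.
 * Hypothetical helper: the 64-bit product overflows for large cycle
 * counts, so this is only meant for short intervals. */
static inline uint64_t tsc_cycles_to_ns(uint64_t cycles, uint32_t mult,
                                        uint32_t shift)
{
    assert(shift < 64);
    return (cycles * (uint64_t)mult) >> shift;
}

/* Difference in computed nanoseconds when the same cycle count is
 * converted with two multipliers; mult_a >= mult_b is assumed. */
static inline uint64_t ns_skew_between_mults(uint64_t cycles, uint32_t mult_a,
                                             uint32_t mult_b, uint32_t shift)
{
    return tsc_cycles_to_ns(cycles, mult_a, shift)
         - tsc_cycles_to_ns(cycles, mult_b, shift);
}
```

With shift 24 and the two 'mult' values printed above (5798662 and 5798661),
ns_skew_between_mults() returns 1 for a cycle count of 2^24, which is roughly
5.8 ms worth of ticks at the 2.893299GHz reported in this thread, and 0 for
an interval of only 1000 cycles.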
Regards,
Jason

On 22/02/2017, Jason Vas Dias wrote:
> OK, last post on this issue today -
> can anyone explain why, with standard 4.10.0 kernel & no new
> 'notsc_adjust' option, and the same maths being used, these two runs
> should display such a wide disparity between
> clock_gettime(CLOCK_MONOTONIC_RAW,&ts) values ? :
>
> $ J/pub/ttsc/ttsc1
> max_extended_leaf: 80000008
> has tsc: 1 constant: 1
> Invariant TSC is enabled: Actual TSC freq: 2.893299GHz - TSC adjust: 1.
> ts2 - ts1: 162 ts3 - ts2: 110 ns1: 0.000000641 ns2: 0.000002850
> ts3 - ts2: 175 ns1: 0.000000659
> ts3 - ts2: 18 ns1: 0.000000643
> ts3 - ts2: 18 ns1: 0.000000618
> ts3 - ts2: 17 ns1: 0.000000620
> ts3 - ts2: 17 ns1: 0.000000616
> ts3 - ts2: 18 ns1: 0.000000641
> ts3 - ts2: 18 ns1: 0.000000709
> ts3 - ts2: 20 ns1: 0.000000763
> ts3 - ts2: 20 ns1: 0.000000735
> ts3 - ts2: 20 ns1: 0.000000761
> t1 - t0: 78200 - ns2: 0.000080824
> $ J/pub/ttsc/ttsc1
> max_extended_leaf: 80000008
> has tsc: 1 constant: 1
> Invariant TSC is enabled: Actual TSC freq: 2.893299GHz - TSC adjust: 1.
> ts2 - ts1: 217 ts3 - ts2: 221 ns1: 0.000001294 ns2: 0.000005375
> ts3 - ts2: 210 ns1: 0.000001418
> ts3 - ts2: 23 ns1: 0.000001399
> ts3 - ts2: 22 ns1: 0.000001445
> ts3 - ts2: 25 ns1: 0.000001321
> ts3 - ts2: 20 ns1: 0.000001428
> ts3 - ts2: 25 ns1: 0.000001367
> ts3 - ts2: 23 ns1: 0.000001425
> ts3 - ts2: 23 ns1: 0.000001357
> ts3 - ts2: 22 ns1: 0.000001487
> ts3 - ts2: 25 ns1: 0.000001377
> t1 - t0: 145753 - ns2: 0.000150781
>
> (complete source of test program ttsc1 attached in ttsc.tar
> $ tar -xpf ttsc.tar
> $ cd ttsc
> $ make
> ).
>
> On 22/02/2017, Jason Vas Dias wrote:
>> I actually tried adding a 'notsc_adjust' kernel option to disable any
>> setting or access to the TSC_ADJUST MSR, but then I see the problems -
>> a big disparity in values depending on which CPU the thread is
>> scheduled - and no improvement in clock_gettime() latency.
>> So I don't think the new TSC_ADJUST code in tsc_sync.c itself is the
>> issue - but something added @ 460ns onto every clock_gettime() call
>> when moving from v4.8.0 -> v4.10.0.
>> As I don't think fixing the clock_gettime() latency issue is my problem
>> or even possible with the current clock architecture approach, it is a
>> non-issue.
>>
>> But please, can anyone tell me if there are any plans to move the time
>> infrastructure out of the kernel and into glibc along the lines
>> outlined in my previous mail - if not, I am going to concentrate on this
>> more radical overhaul approach for my own systems.
>>
>> At least, I think mapping the clocksource information structure itself
>> in some kind of sharable page makes sense. Processes could map that
>> page copy-on-write so they could start off with all the timing
>> parameters preloaded, then keep their copy updated using the rdtscp
>> instruction, or msync() (read-only) with the kernel's single copy to
>> get the latest time any process has requested.
>> All real-time parameters & adjustments could be stored in that page,
>> & eventually a single copy of the tzdata could be used by both kernel
>> & user-space.
>> That is what I am working towards. Any plans to make linux real-time tsc
>> clock user-friendly ?
>>
>> On 22/02/2017, Jason Vas Dias wrote:
>>> Yes, my CPU is still getting a fault every time the TSC_ADJUST MSR is
>>> read or written. It is probably because it genuinely does not
>>> support any cpuid > 13, or the modern TSC_ADJUST interface. This is
>>> probably why my clock_gettime() latencies are so bad. Now I have to
>>> develop a patch to disable all access to the TSC_ADJUST MSR if
>>> boot_cpu_data.cpuid_level <= 13.
>>> I really have an unlucky CPU :-) .
>>>
>>> But really, I think this issue goes deeper into the fundamental limits
>>> of time measurement on Linux : it is never going to be possible to
>>> measure minimum times with clock_gettime() comparable with those
>>> returned by the rdtscp instruction - the time taken to enter the kernel
>>> through the VDSO, queue an access to vsyscall_gtod_data via a
>>> workqueue, access it & do computations & copy the value to user-space
>>> is NEVER going to be up to the job of measuring small real-time
>>> durations of the order of 10-20 TSC ticks.
>>>
>>> I think the best way to solve this problem going forward would be to
>>> store the entire vsyscall_gtod_data data structure representing the
>>> current clocksource in a shared page which is memory-mappable
>>> (read-only) by user-space.
>>> I think user-space programs should be able to do something like :
>>>   int fd =
>>>     open("/sys/devices/system/clocksource/clocksource0/gtod.page",O_RDONLY);
>>>   size_t psz = getpagesize();
>>>   void *gtod = mmap( 0, psz, PROT_READ, MAP_PRIVATE, fd, 0 );
>>>   msync(gtod,psz,MS_SYNC);
>>>
>>> Then they could all read the real-time clock values as they are updated
>>> in real-time by the kernel, and know exactly how to interpret them.
>>>
>>> I also think that all mktime() / gmtime() / localtime() timezone
>>> handling functionality should be moved to user-space, and that the
>>> kernel should actually load and link in some
>>>   /lib/libtzdata.so
>>> library, provided by glibc / libc implementations, that is exactly the
>>> same library used by glibc code to parse tzdata ; tzdata should be
>>> loaded at boot time by the kernel from the same places glibc loads it,
>>> and both the kernel and glibc should use identical mktime(), gmtime(),
>>> etc. functions to access it, and glibc-using code would not need to
>>> enter the kernel at all for any time-handling code.
>>> This tzdata-library code could be automatically loaded into process
>>> images the same way the vdso region is, and the whole system could
>>> access only one copy of it and the 'gtod.page' in memory.
>>>
>>> That's just my two-cents worth, and how I'd like to eventually get
>>> things working on my system.
>>>
>>> All the best, Regards,
>>> Jason
>>>
>>> On 22/02/2017, Jason Vas Dias wrote:
>>>> On 22/02/2017, Jason Vas Dias wrote:
>>>>> RE:
>>>>>>> 4.10 has new code which utilizes the TSC_ADJUST MSR.
>>>>>
>>>>> I just built an unpatched linux v4.10 with tglx's TSC improvements -
>>>>> much else improved in this kernel (like iwlwifi) - thanks!
>>>>>
>>>>> I have attached an updated version of the test program which
>>>>> doesn't print the bogus "Nominal TSC Frequency" (the previous
>>>>> version printed it, but equally ignored it).
>>>>>
>>>>> The clock_gettime(CLOCK_MONOTONIC_RAW,&ts) latency has improved by
>>>>> a factor of 2 - it used to be @140ns and is now @ 70ns ! Wow! :
>>>>>
>>>>> $ uname -r
>>>>> 4.10.0
>>>>> $ ./ttsc1
>>>>> max_extended_leaf: 80000008
>>>>> has tsc: 1 constant: 1
>>>>> Invariant TSC is enabled: Actual TSC freq: 2.893299GHz.
>>>>> ts2 - ts1: 144 ts3 - ts2: 96 ns1: 0.000000588 ns2: 0.000002599
>>>>> ts3 - ts2: 178 ns1: 0.000000592
>>>>> ts3 - ts2: 14 ns1: 0.000000577
>>>>> ts3 - ts2: 14 ns1: 0.000000651
>>>>> ts3 - ts2: 17 ns1: 0.000000625
>>>>> ts3 - ts2: 17 ns1: 0.000000677
>>>>> ts3 - ts2: 17 ns1: 0.000000626
>>>>> ts3 - ts2: 17 ns1: 0.000000627
>>>>> ts3 - ts2: 17 ns1: 0.000000627
>>>>> ts3 - ts2: 18 ns1: 0.000000655
>>>>> ts3 - ts2: 17 ns1: 0.000000631
>>>>> t1 - t0: 89067 - ns2: 0.000091411
>>>>>
>>>>
>>>> Oops, going blind in my old age. These latencies are actually 3 times
>>>> greater than under 4.8 !!
>>>>
>>>> Under 4.8, the program printed latencies of @ 140ns for clock_gettime,
>>>> as shown in bug 194609 as the 'ns1' (timespec_b - timespec_a) value:
>>>>
>>>> ts3 - ts2: 24 ns1: 0.000000162
>>>> ts3 - ts2: 17 ns1: 0.000000143
>>>> ts3 - ts2: 17 ns1: 0.000000146
>>>> ts3 - ts2: 17 ns1: 0.000000149
>>>> ts3 - ts2: 17 ns1: 0.000000141
>>>> ts3 - ts2: 16 ns1: 0.000000142
>>>>
>>>> now the clock_gettime(CLOCK_MONOTONIC_RAW,&ts) latency is @
>>>> 600ns, @ 4 times more than under 4.8 .
>>>> But I'm glad the TSC_ADJUST problems are fixed.
>>>>
>>>> Will programs reading :
>>>> $ cat /sys/devices/msr/events/tsc
>>>> event=0x00
>>>> read a new event for each setting of the TSC_ADJUST MSR or a wrmsr on
>>>> the TSC ?
>>>>
>>>>> I think this is because under Linux 4.8, the CPU got a fault every
>>>>> time it read the TSC_ADJUST MSR.
>>>>
>>>> maybe it still is!
>>>>
>>>>> But user programs wanting to use the TSC and correlate its value to
>>>>> clock_gettime(CLOCK_MONOTONIC_RAW) values accurately like the above
>>>>> program still have to dig the TSC frequency value out of the kernel
>>>>> with objdump - this was really the point of the bug #194609.
>>>>>
>>>>> I would still like to investigate exporting 'tsc_khz' & 'mult' +
>>>>> 'shift' values via sysfs.
>>>>>
>>>>> Regards,
>>>>> Jason.
>>>>>
>>>>> On 21/02/2017, Jason Vas Dias wrote:
>>>>>> Thank You for enlightening me -
>>>>>>
>>>>>> I was just having a hard time believing that Intel would ship a chip
>>>>>> that features a monotonic, fixed frequency timestamp counter
>>>>>> without specifying in either documentation or on-chip or in ACPI what
>>>>>> precisely that hard-wired frequency is, but I now know that to
>>>>>> be the case for the unfortunate i7-4910MQ - I mean, the CPU does
>>>>>> assert CPUID:80000007[8] ( InvariantTSC ), which is difficult to
>>>>>> reconcile with the statement in the SDM :
>>>>>>   17.16.4 Invariant Time-Keeping
>>>>>>     The invariant TSC is based on the invariant timekeeping hardware
>>>>>>     (called Always Running Timer or ART), that runs at the core
>>>>>>     crystal clock frequency. The ratio defined by CPUID leaf 15H
>>>>>>     expresses the frequency relationship between the ART hardware
>>>>>>     and TSC. If CPUID.15H:EBX[31:0] != 0 and
>>>>>>     CPUID.80000007H:EDX[InvariantTSC] = 1, the following linearity
>>>>>>     relationship holds between TSC and the ART hardware:
>>>>>>       TSC_Value = (ART_Value * CPUID.15H:EBX[31:0] )
>>>>>>                   / CPUID.15H:EAX[31:0] + K
>>>>>>     Where 'K' is an offset that can be adjusted by a privileged
>>>>>>     agent*2.
>>>>>>     When ART hardware is reset, both invariant TSC and K are also
>>>>>>     reset.
>>>>>>
>>>>>> So I'm just trying to figure out what CPUID.15H:EBX[31:0] and
>>>>>> CPUID.15H:EAX[31:0] are for my hardware. I assumed (incorrectly)
>>>>>> that the "Nominal TSC Frequency" formulae in the manual must apply
>>>>>> to all CPUs with InvariantTSC.
>>>>>>
>>>>>> Do I understand correctly, that since I do have InvariantTSC, the
>>>>>> TSC_Value is in fact calculated according to the above formula, but
>>>>>> with a "hidden" ART Value, & Core Crystal Clock frequency & its
>>>>>> ratio to TSC frequency ?
>>>>>> It was obvious this nominal TSC Frequency had nothing to do with the
>>>>>> actual TSC frequency used by Linux, which is 'tsc_khz'.
>>>>>> I guess wishful thinking led me to believe CPUID:15h was actually
>>>>>> supported somehow, because I thought InvariantTSC meant it had ART
>>>>>> hardware.
>>>>>>
>>>>>> I do strongly suggest that Linux exports its calibrated TSC kHz
>>>>>> somewhere to user space.
>>>>>>
>>>>>> I think the best long-term solution would be to allow programs to
>>>>>> somehow read the TSC without invoking
>>>>>> clock_gettime(CLOCK_MONOTONIC_RAW,&ts), & having to enter the
>>>>>> kernel, which incurs an overhead of > 120ns on my system.
>>>>>>
>>>>>> Couldn't linux export its 'tsc_khz' and / or 'clocksource->mult' and
>>>>>> 'clocksource->shift' values to sysfs somehow ?
>>>>>>
>>>>>> For instance, only if the 'current_clocksource' is 'tsc', then
>>>>>> these values could be exported as:
>>>>>>   /sys/devices/system/clocksource/clocksource0/shift
>>>>>>   /sys/devices/system/clocksource/clocksource0/mult
>>>>>>   /sys/devices/system/clocksource/clocksource0/freq
>>>>>>
>>>>>> So user-space programs could know that the value returned by
>>>>>>   clock_gettime(CLOCK_MONOTONIC_RAW)
>>>>>> would be
>>>>>>   { .tv_sec  = ( ( rdtsc() * mult ) >> shift ) / NSEC_PER_SEC,
>>>>>>     .tv_nsec = ( ( rdtsc() * mult ) >> shift ) % NSEC_PER_SEC
>>>>>>   }
>>>>>> and that represents ticks of period (1.0 / ( freq * 1000 )) S.
>>>>>>
>>>>>> That would save user-space programs from having to know 'tsc_khz' by
>>>>>> parsing the 'Refined TSC' frequency from log files or by examining
>>>>>> the running kernel with objdump to obtain this value & figure out
>>>>>> 'mult' & 'shift' themselves.
>>>>>>
>>>>>> And why not a
>>>>>>   /sys/devices/system/clocksource/clocksource0/value
>>>>>> file that actually prints this ( ( rdtsc() * mult ) >> shift )
>>>>>> expression as a long integer?
>>>>>> And perhaps a
>>>>>>   /sys/devices/pnp0/XX\:YY/rtc/rtc0/nanoseconds
>>>>>> file that actually prints out the number of real-time nano-seconds
>>>>>> since the contents of the existing
>>>>>>   /sys/devices/pnp0/XX\:YY/rtc/rtc0/{time,since_epoch}
>>>>>> files using the current TSC value?
>>>>>> To read the rtc0/{date,time} files is already faster than entering
>>>>>> the kernel to call
>>>>>> clock_gettime(CLOCK_REALTIME, &ts) & convert to integer for scripts.
>>>>>>
>>>>>> I will work on developing a patch to this effect if no-one else is.
>>>>>>
>>>>>> Also, am I right in assuming that the maximum granularity of the
>>>>>> real-time clock on my system is 1/64th of a second ? :
>>>>>>   $ cat /sys/devices/pnp0/00\:02/rtc/rtc0/max_user_freq
>>>>>>   64
>>>>>> This is the maximum granularity that can be stored in CMOS , not
>>>>>> returned by TSC? Couldn't we have something similar that gave an
>>>>>> accurate idea of TSC frequency and the precise formula applied to TSC
>>>>>> value to get clock_gettime(CLOCK_MONOTONIC_RAW) value ?
>>>>>>
>>>>>> Regards,
>>>>>> Jason
>>>>>>
>>>>>> This code does produce good timestamps with a latency of @20ns
>>>>>> that correlate well with clock_gettime(CLOCK_MONOTONIC_RAW,&ts)
>>>>>> values, but it depends on a global variable that is initialized to
>>>>>> the 'tsc_khz' value computed by the running kernel, parsed from
>>>>>> objdump /proc/kcore output :
>>>>>>
>>>>>> static inline __attribute__((always_inline))
>>>>>> U64_t
>>>>>> IA64_tsc_now()
>>>>>> { if(!( _ia64_invariant_tsc_enabled
>>>>>>       ||(( _cpu0id_fd == -1) && IA64_invariant_tsc_is_enabled(NULL,NULL))
>>>>>>       )
>>>>>>     )
>>>>>>   { /* the format string takes a line number and a function name */
>>>>>>     fprintf(stderr, __FILE__":%d:(%s): must be called with invariant TSC enabled.\n",
>>>>>>             __LINE__, __func__);
>>>>>>     return 0;
>>>>>>   }
>>>>>>   U32_t tsc_hi, tsc_lo;
>>>>>>   register UL_t tsc;
>>>>>>   asm volatile
>>>>>>   ( "rdtscp\n\t"
>>>>>>     "mov %%edx, %0\n\t"
>>>>>>     "mov %%eax, %1\n\t"
>>>>>>     "mov %%ecx, %2\n\t"
>>>>>>   : "=m" (tsc_hi) ,
>>>>>>     "=m" (tsc_lo) ,
>>>>>>     "=m" (_ia64_tsc_user_cpu) :
>>>>>>   : "%eax","%ecx","%edx"
>>>>>>   );
>>>>>>   tsc=(((UL_t)tsc_hi) << 32)|((UL_t)tsc_lo);
>>>>>>   return tsc;
>>>>>> }
>>>>>>
>>>>>> __thread
>>>>>> U64_t _ia64_first_tsc = 0xffffffffffffffffUL;
>>>>>>
>>>>>> static inline __attribute__((always_inline))
>>>>>> U64_t IA64_tsc_ticks_since_start()
>>>>>> { if(_ia64_first_tsc == 0xffffffffffffffffUL)
>>>>>>   { _ia64_first_tsc = IA64_tsc_now();
>>>>>>     return 0;
>>>>>>   }
>>>>>>   return (IA64_tsc_now() - _ia64_first_tsc) ;
>>>>>> }
>>>>>>
>>>>>> static inline __attribute__((always_inline))
>>>>>> void
>>>>>> ia64_tsc_calc_mult_shift
>>>>>> ( register U32_t *mult,
>>>>>>   register U32_t *shift
>>>>>> )
>>>>>> { /* paraphrases Linux clocksource.c's clocks_calc_mult_shift()
>>>>>>    * function:
>>>>>>    * calculates second + nanosecond mult + shift in same way linux
>>>>>>    * does.
>>>>>>    * we want to be compatible with what linux returns in struct
>>>>>>    * timespec ts after call to
>>>>>>    * clock_gettime(CLOCK_MONOTONIC_RAW, &ts).
>>>>>>    */
>>>>>>   const U32_t scale=1000U;
>>>>>>   register U32_t from= IA64_tsc_khz();
>>>>>>   register U32_t to = NSEC_PER_SEC / scale;
>>>>>>   register U64_t sec = ( ~0UL / from ) / scale;
>>>>>>   sec = (sec > 600) ? 600 : ((sec > 0) ? sec : 1);
>>>>>>   register U64_t maxsec = sec * scale;
>>>>>>   UL_t tmp;
>>>>>>   U32_t sft, sftacc=32;
>>>>>>   /*
>>>>>>    * Calculate the shift factor which is limiting the conversion
>>>>>>    * range:
>>>>>>    */
>>>>>>   tmp = (maxsec * from) >> 32;
>>>>>>   while (tmp)
>>>>>>   { tmp >>=1;
>>>>>>     sftacc--;
>>>>>>   }
>>>>>>   /*
>>>>>>    * Find the conversion shift/mult pair which has the best
>>>>>>    * accuracy and fits the maxsec conversion range:
>>>>>>    */
>>>>>>   for (sft = 32; sft > 0; sft--)
>>>>>>   { tmp = ((UL_t) to) << sft;
>>>>>>     tmp += from / 2;
>>>>>>     tmp = tmp / from;
>>>>>>     if ((tmp >> sftacc) == 0)
>>>>>>       break;
>>>>>>   }
>>>>>>   *mult = tmp;
>>>>>>   *shift = sft;
>>>>>> }
>>>>>>
>>>>>> __thread
>>>>>> U32_t _ia64_tsc_mult = ~0U, _ia64_tsc_shift=~0U;
>>>>>>
>>>>>> static inline __attribute__((always_inline))
>>>>>> U64_t IA64_s_ns_since_start()
>>>>>> { if( ( _ia64_tsc_mult == ~0U ) || ( _ia64_tsc_shift == ~0U ) )
>>>>>>     ia64_tsc_calc_mult_shift( &_ia64_tsc_mult, &_ia64_tsc_shift);
>>>>>>   register U64_t cycles = IA64_tsc_ticks_since_start();
>>>>>>   register U64_t ns = ((cycles *((UL_t)_ia64_tsc_mult))>>_ia64_tsc_shift);
>>>>>>   return( (((ns / NSEC_PER_SEC)&0xffffffffUL) << 32)
>>>>>>         | ((ns % NSEC_PER_SEC)&0x3fffffffUL) );
>>>>>>   /* Yes, we are purposefully ignoring durations of more than 4.2
>>>>>>      billion seconds here! */
>>>>>> }
>>>>>>
>>>>>> I think Linux should export the 'tsc_khz', 'mult' and 'shift' values
>>>>>> somehow, then user-space libraries could have more confidence in
>>>>>> using 'rdtsc' or 'rdtscp' if Linux's current_clocksource is 'tsc'.
>>>>>>
>>>>>> Regards,
>>>>>> Jason
>>>>>>
>>>>>> On 20/02/2017, Thomas Gleixner wrote:
>>>>>>> On Sun, 19 Feb 2017, Jason Vas Dias wrote:
>>>>>>>
>>>>>>>> CPUID:15H is available in user-space, returning the integers : ( 7,
>>>>>>>> 832, 832 ) in EAX:EBX:ECX , yet boot_cpu_data.cpuid_level is 13 ,
>>>>>>>> so in detect_art() in tsc.c,
>>>>>>>
>>>>>>> By some definition of available. You can feed CPUID random leaf
>>>>>>> numbers and it will return something, usually the value of the last
>>>>>>> valid CPUID leaf, which is 13 on your CPU. A similar CPU model has
>>>>>>>
>>>>>>> 0x0000000d 0x00: eax=0x00000007 ebx=0x00000340 ecx=0x00000340
>>>>>>> edx=0x00000000
>>>>>>>
>>>>>>> i.e. 7, 832, 832, 0
>>>>>>>
>>>>>>> Looks familiar, right?
>>>>>>>
>>>>>>> You can verify that with 'cpuid -1 -r' on your machine.
>>>>>>>
>>>>>>>> Linux does not think ART is enabled, and does not set the
>>>>>>>> synthesized CPUID + ((3*32)+10) bit, so a program looking at
>>>>>>>> /dev/cpu/0/cpuid would not see this bit set .
>>>>>>>
>>>>>>> Rightfully so. This is a Haswell Core model.
>>>>>>>
>>>>>>>> if an e1000 NIC card had been installed, PTP would not be
>>>>>>>> available.
>>>>>>>
>>>>>>> PTP is independent of the ART kernel feature . ART just provides
>>>>>>> enhanced PTP features. You are confusing things here.
>>>>>>>
>>>>>>> The ART feature as the kernel sees it is a hardware extension which
>>>>>>> feeds the ART clock to peripherals for timestamping and time
>>>>>>> correlation purposes. The ratio between ART and TSC is described by
>>>>>>> CPUID leaf 0x15 so the kernel can make use of that correlation,
>>>>>>> e.g. for enhanced PTP accuracy.
>>>>>>>
>>>>>>> It's correct, that the NONSTOP_TSC feature depends on the
>>>>>>> availability of ART, but that has nothing to do with the feature
>>>>>>> bit, which solely describes the ratio between TSC and the ART
>>>>>>> frequency which is exposed to peripherals. That frequency is not
>>>>>>> necessarily the real ART frequency.
>>>>>>>
>>>>>>>> Also, if the MSR TSC_ADJUST has not yet been written, as it seems
>>>>>>>> to be nowhere else in Linux, the code will always think
>>>>>>>> X86_FEATURE_ART is 0 because the CPU will always get a fault
>>>>>>>> reading the MSR since it has never been written.
>>>>>>>
>>>>>>> Huch? If an access to the TSC ADJUST MSR faults, then something is
>>>>>>> really wrong. And writing it unconditionally to 0 is not going to
>>>>>>> happen. 4.10 has new code which utilizes the TSC_ADJUST MSR.
>>>>>>>
>>>>>>>> It would be nice if user-space programs that want to use the TSC
>>>>>>>> with rdtsc / rdtscp instructions, such as the demo program
>>>>>>>> attached to the bug report, could have confidence that Linux is
>>>>>>>> actually generating the results of
>>>>>>>> clock_gettime(CLOCK_MONOTONIC_RAW, &timespec)
>>>>>>>> in a predictable way from the TSC by looking at the
>>>>>>>> /dev/cpu/0/cpuid[bit(((3*32)+10)] value before enabling user-space
>>>>>>>> use of TSC values, so that they can correlate TSC values with linux
>>>>>>>> clock_gettime() values.
>>>>>>>
>>>>>>> What has ART to do with correct CLOCK_MONOTONIC_RAW values?
>>>>>>>
>>>>>>> Nothing at all, really.
>>>>>>>
>>>>>>> The kernel makes use of the proper information values already.
>>>>>>>
>>>>>>> The TSC frequency is determined from:
>>>>>>>
>>>>>>> 1) CPUID(0x16) if available
>>>>>>> 2) MSRs if available
>>>>>>> 3) By calibration against a known clock
>>>>>>>
>>>>>>> If the kernel uses TSC as clocksource then the CLOCK_MONOTONIC_*
>>>>>>> values are correct whether that machine has ART exposed to
>>>>>>> peripherals or not.
>>>>>>>
>>>>>>>> has tsc: 1 constant: 1
>>>>>>>> 832 / 7 = 118 : 832 - 9.888914286E+04hz : OK:1
>>>>>>>
>>>>>>> And that voodoo math tells us what? That you found a way to
>>>>>>> correlate CPUID(0xd) to the TSC frequency on that machine.
>>>>>>>
>>>>>>> Now I'm curious how you do that on this other machine which returns
>>>>>>> for cpuid(15): 1, 1, 1
>>>>>>>
>>>>>>> You can't because all of this is completely wrong.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> tglx
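
On the point made in the quoted reply that CPUID returns data even for leaf
numbers beyond the last valid one: a user-space program can guard against
that by comparing the requested leaf with the maximum leaf the CPU reports
before trusting the result. The following is only a sketch, assuming an x86
build with GCC or Clang, whose <cpuid.h> provides __get_cpuid_max(); the
helper name cpuid_leaf_supported() is made up for illustration.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns 1 if `leaf` lies within the range the CPU reports as valid,
 * 0 otherwise. Hypothetical helper, not part of any library. */
static inline int cpuid_leaf_supported(uint32_t leaf)
{
#if defined(__x86_64__) || defined(__i386__)
    /* 0 selects the basic leaf range, 0x80000000 the extended range. */
    unsigned int range = leaf & 0x80000000u;
    unsigned int max_leaf = __get_cpuid_max(range, 0);
    return max_leaf != 0 && leaf <= max_leaf;
#else
    (void)leaf;
    return 0;   /* not an x86 build: no CPUID instruction available */
#endif
}
```

A program would call cpuid_leaf_supported(0x15) before reading leaf 15H; on a
CPU whose maximum basic leaf is 13, as described in this thread, the check
returns 0 and the leaf 15H registers should not be interpreted.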