On 24/10/2019 20:45, speck for Borislav Petkov wrote:
> On Thu, Oct 24, 2019 at 06:39:35PM +0100, speck for Andrew Cooper wrote:
>> On 23/10/2019 10:01, speck for Pawan Gupta wrote:
>>> +	if (tsx_ctrl_state == TSX_CTRL_DISABLE) {
>>> +		tsx_disable();
>>> +
>>> +		/*
>>> +		 * tsx_disable() will change the state of the
>>> +		 * RTM CPUID bit. Clear it here since it is now
>>> +		 * expected to be not set.
>>> +		 */
>>> +		setup_clear_cpu_cap(X86_FEATURE_RTM);
>>
>> This same argument applies to HLE, and it would be weird for
>> pre-TSX_CTRL CPUs with tsx=off to report HLE but not RTM in
>> /proc/cpuinfo.
>
> Right, the correct fix for that would be to run tsx_init() before we
> read CPUID leafs in get_cpu_cap(), because then it'll load the proper
> bits already and we won't have to clear anything - the MSR write
> would've cleared both CPUID bits already.
>
> I just checked that it happens globally even:
>
> $ ./cpuid -r | grep -E "^\s+0x00000007" | awk '{ print $6 }' | uniq
> edx=0xbc000400
>
> Bits 4 (HLE) and 11 (RTM) are cleared on all CPUs.
>
> I'll try this tomorrow to check whether it would even work that early.
>
> If there are issues with it, then we'll have to do the above thing and
> clear HLE too, by hand.

On Xen, I've juggled things such that we load microcode, then interpret
tsx= if the user has provided it (taking care to always write
MSR_TSX_CTRL if it is available, to discard whatever settings firmware
or kexec left), before querying CPUID.

Later, the spec-ctrl= interpretation happens, which might choose to turn
off TSX due to TAA, and which then has to modify MSR_TSX_CTRL and force
clear the bits in the policy.

Given that the speculative mitigations rely heavily on CPUID, I can't
see a clean way to disentangle this, but the above seems to be the least
complicated ordering.
>> Furthermore, while grepping through the tree, I found
>>
>> events/intel/lbr.c-267-static inline bool lbr_from_signext_quirk_needed(void)
>> events/intel/lbr.c-268-{
>> events/intel/lbr.c-269-	int lbr_format = x86_pmu.intel_cap.lbr_format;
>> events/intel/lbr.c:270:	bool tsx_support = boot_cpu_has(X86_FEATURE_HLE) ||
>> events/intel/lbr.c-271-			   boot_cpu_has(X86_FEATURE_RTM);
>> events/intel/lbr.c-272-
>> events/intel/lbr.c-273-	return !tsx_support && (lbr_desc[lbr_format] & LBR_TSX);
>>
>> which is going to need an adjustment to avoid applying the quirk on
>> non-broken hardware.
>
> When both HLE and RTM bits are cleared, that should still be correct,
> no? Or am I missing something?

On Haswell and Broadwell, the microcode which turned HLE/RTM off in the
pipeline left the LBR MSRs in a state where you can't context switch the
value: they would yield a value via RDMSR which WRMSR then faulted on,
because the two operations had an asymmetric view of how the top bits of
metadata should be interpreted, given some TSX-related metadata and a
sign-extended linear address.

On Skylake, where you can't actually turn RTM off but we may hide
FEATURE_RTM/HLE, the above quirk is probably not true.

On Cascadelake, who knows?  RTM is being turned off in the pipeline, but
maybe the HSX/BWX bug has been fixed, or maybe it is being turned off in
a different way, or ...

~Andrew