From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.linutronix.de (193.142.43.55:993) by crypto-ml.lab.linutronix.de with IMAP4-SSL for ; 24 Oct 2019 20:07:41 -0000 Received: from esa2.hc3370-68.iphmx.com ([216.71.145.153]) by Galois.linutronix.de with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1iNjOV-00008M-OW for speck@linutronix.de; Thu, 24 Oct 2019 22:07:40 +0200 Subject: [MODERATED] Re: [PATCH 3/9] TAA 3 References: <580e02757c3e639bff00fcea830aa46eba46a92f.1571905227.git.bp@suse.de> <6f1ab744-622c-179b-276b-5506b2fd9ae1@citrix.com> <20191024194503.GH14115@zn.tnic> From: Andrew Cooper Message-ID: <38430127-3ece-dc06-2264-6b3bc347b523@citrix.com> Date: Thu, 24 Oct 2019 21:07:27 +0100 MIME-Version: 1.0 In-Reply-To: <20191024194503.GH14115@zn.tnic> Content-Type: multipart/mixed; boundary="lbbqloG4VN3oFyJWuPgDyjvRkeHkNVbfH"; protected-headers="v1" To: speck@linutronix.de List-ID: --lbbqloG4VN3oFyJWuPgDyjvRkeHkNVbfH Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Content-Language: en-GB On 24/10/2019 20:45, speck for Borislav Petkov wrote: > On Thu, Oct 24, 2019 at 06:39:35PM +0100, speck for Andrew Cooper wrote= : >> On 23/10/2019 10:01, speck for Pawan Gupta wrote: >>> + if (tsx_ctrl_state =3D=3D TSX_CTRL_DISABLE) { >>> + tsx_disable(); >>> + >>> + /* >>> + * tsx_disable() will change the state of the >>> + * RTM CPUID bit. Clear it here since it is now >>> + * expected to be not set. >>> + */ >>> + setup_clear_cpu_cap(X86_FEATURE_RTM); >> This same argument applies to HLE, and it would be weird for >> pre-TSX_CTRL CPUs with tsx=3Doff to report HLE but not RTM in /proc/cp= uid > Right, the correct fix for that would be to run tsx_init() before we > read CPUID leafs in get_cpu_cap() because then it'll load the proper > bits already and we won't have to clear anything - the MSR write > would've cleared both CPUID bits already. > > I just checked that it happens globally even: > > $ ./cpuid -r | grep -E "^\s+0x00000007" | awk '{ print $6 }' | uniq > edx=3D0xbc000400 > > Bits 4 (HLE) and 11 (RTM) are cleared on all CPUs. > > I'll try this tomorrow to check whether it would even work that early. > > If there's issues with it, then we'll have to do the above thing and > clear HLE too, by hand. On Xen, I've juggled things such that we load microcode, then interpret tsx=3D if the user has provided it (taking care to always write MSR_TSX_CTRL if it is available, to discard whatever settings firmware or kexec left), before querying CPUID. Later, the spec-ctrl=3D interpretation happens, which might choose to tur= n off TSX due to TAA, which then has to modify MSR_TSX_CTRL and force clear the bits in the policy. Given that speculative mitigations rely heavily on CPUID, I can't reason about a clean way to disentangle this, but the above seems to be the least complicated algorithm. >> Furthermore, while grepping through the tree, I found >> >> events/intel/lbr.c-267-static inline bool >> lbr_from_signext_quirk_needed(void) >> events/intel/lbr.c-268-{ >> events/intel/lbr.c-269- int lbr_format =3D x86_pmu.intel_cap.lbr_forma= t; >> events/intel/lbr.c:270: bool tsx_support =3D boot_cpu_has(X86_FEATURE_= HLE) || >> events/intel/lbr.c-271-=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 boot_c= pu_has(X86_FEATURE_RTM); >> events/intel/lbr.c-272- >> events/intel/lbr.c-273- return !tsx_support && (lbr_desc[lbr_format] &= >> LBR_TSX); >> >> which is going to need an adjustment to avoid applying the quirks on >> non-broken hardware. > When both HLE and RTM bits are cleared, that should still be correct, > no? Or am I missing something? On Haswell and Broadwell, the microcode which turned HLE/RTM off in the pipeline left the LBR MSRs in a state where you can't context switch the value, because they would yield a value via RDMSR which WRMSR faulted on, because the two operations had an asymmetric view of how the top bits of metadata should be interpreted, given some TSX-related metadata and a sign extended linear address. On Skylake where you can't actually turn RTM off, but we may hide FEATURE_RTM/HLE, the above quirk is probably not true. On Cascadelake, who knows?=C2=A0 RTM is being turned off in the pipeline,= but maybe the HSX/BWX bug has been fixed, or maybe it is being turned off in a different way, or ... ~Andrew --lbbqloG4VN3oFyJWuPgDyjvRkeHkNVbfH--