Subject: [MODERATED] Re: [PATCH v2 5/8] MDSv2 7
From: Andrew Cooper
Date: Wed, 12 Dec 2018 10:05:32 -0800
To: speck@linutronix.de
References: <86f7d711109a095f09a0a9cfd9ca5efeba04cb23.1544464266.git.ak@linux.intel.com>

On 10/12/2018 16:33, speck for Andrew Cooper wrote:
> On 10/12/2018 17:53, speck for Andi Kleen wrote:
>> xdiff --git a/arch/x86/lib/clear_cpu.S b/arch/x86/lib/clear_cpu.S
>> new file mode 100644
>> index 000000000000..5af33baf5427
>> --- /dev/null
>> +++ b/arch/x86/lib/clear_cpu.S
>> @@ -0,0 +1,107 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#include
>> +#include
>> +#include
>> +
>> +/*
>> + * Clear internal CPU buffers on kernel boundaries.
>> + *
>> + * These sequences are somewhat fragile, please don't add
>> + * or change instructions in the middle of the areas marked with
>> + * start/end.
>> + *
>> + * Interrupts and NMIs we deal with by reclearing. We clear parts
>> + * of the kernel stack, which has other advantages too.
>> + *
>> + * Save all registers to make it easier to use for callers.
>> + *
>> + * This sequence is for Nehalem-IvyBridge. For Haswell we jump
>> + * to hsw_clear_buf.
>> + *
>> + * These functions need to be called on a full stack, as they may
>> + * use upto 1.5k of stack. They should be also called with
>> + * interrupts disabled. NMIs etc. are handled by letting every
>> + * NMI do its own clear sequence.
>> + */
>> +ENTRY(ivb_clear_cpu)
>> +GLOBAL(do_clear_cpu)
>> +	/*
>> +	 * obj[tf]ool complains about unreachable code here,
>> +	 * which appears to be spurious?!?
>> +	 */
>> +	ALTERNATIVE "", "jmp hsw_clear_cpu", X86_BUG_MDS_CLEAR_CPU_HSWange
>> +	push %__ASM_REG(si)
>> +	push %__ASM_REG(di)
>> +	push %__ASM_REG(cx)
>> +	mov %_ASM_SP, %__ASM_REG(si)
>> +	sub $2*16, %_ASM_SP
>> +	and $-16,%_ASM_SP
>> +	movdqa %xmm0, (%_ASM_SP)
>> +	movdqa %xmm1, 1*16(%_ASM_SP)
> You don't need to preserve %xmm1 here.  It is unmodified by the
> sequence, because the orpd pulls zero out of its memory operand.
> Similarly...
>
>> +	sub $672, %_ASM_SP
>> +	xorpd %xmm0,%xmm0
>> +	movdqa %xmm0, (%_ASM_SP)
>> +	movdqa %xmm0, 16(%_ASM_SP)
> ... this store doesn't appear to do anything useful, as that stack slot
> isn't read again, and...
>
>> +	mov %_ASM_SP, %__ASM_REG(di)
>> +	/* Clear sequence start */
>> +	movdqu %xmm0,(%__ASM_REG(di))
>> +	lfence
>> +	orpd (%__ASM_REG(di)), %xmm0
>> +	orpd (%__ASM_REG(di)), %xmm1
>> +	mfence
>> +	movl $40, %ecx
>> +	add $32, %__ASM_REG(di)
> ... I know this was in the recommended sequence, but bytes 16-31 aren't
> used at all, and it feels fishy.
>
> Either this wants to be $16, or the second orpd wants a displacement of
> 16 (and we do need to retain the second zeroing write) so all 32 bytes
> are used.
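To put the second option in code, this is a sketch only (untested,
keeping the patch's %__ASM_REG() spelling) of what I mean by retaining
the second zeroing store and giving the second orpd a 16-byte
displacement, so that both halves of the 32-byte window are actually
read:

	movdqu %xmm0, (%__ASM_REG(di))
	lfence
	orpd (%__ASM_REG(di)), %xmm0
	orpd 16(%__ASM_REG(di)), %xmm1	/* pulls zero from the second 16 bytes */
	mfence
	movl $40, %ecx
	add $32, %__ASM_REG(di)

That said, it may turn out to be moot: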
Based on what Ronak has said in person, two back-to-back loads are
guaranteed to be scheduled on alternate load ports, so this can be a
16-byte change rather than 32.

However, there is a separate problem with synchronising the other
threads, because a pause waitloop will be racing with this sequence for
allocation of load ports.  There doesn't appear to be a viable option to
guarantee that these two orpd's hit both load ports in the core.

Given the GPR restoration later in the return-to-guest path, which is
15-ish loads in a row, one option being discussed is to do away with the
first half of the software sequence entirely.

~Andrew
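P.S. For anyone without the return-to-guest path paged in, the GPR
restoration referred to above is just a dense, back-to-back run of
loads, roughly along these lines (illustrative only; the UREGS_* offsets
are named loosely after Xen's asm offsets rather than copied from any
particular tree):

	/* Illustrative only: ~15 adjacent loads while restoring guest GPRs */
	mov UREGS_r15(%rsp), %r15
	mov UREGS_r14(%rsp), %r14
	mov UREGS_r13(%rsp), %r13
	mov UREGS_r12(%rsp), %r12
	mov UREGS_rbp(%rsp), %rbp
	mov UREGS_rbx(%rsp), %rbx
	mov UREGS_r11(%rsp), %r11
	mov UREGS_r10(%rsp), %r10
	mov UREGS_r9(%rsp), %r9
	mov UREGS_r8(%rsp), %r8
	mov UREGS_rax(%rsp), %rax
	mov UREGS_rcx(%rsp), %rcx
	mov UREGS_rdx(%rsp), %rdx
	mov UREGS_rsi(%rsp), %rsi
	mov UREGS_rdi(%rsp), %rdi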