From: David Laight
To: 'Peter Zijlstra'
CC: 'Linus Torvalds', Sami Tolvanen, Thomas Gleixner, Joao Moreira, LKML,
 "the arch/x86 maintainers", Tim Chen, "Josh Poimboeuf", "Cooper, Andrew",
 Pawan Gupta, Johannes Wikner, Alyssa Milburn, Jann Horn, "H.J. Lu",
 "Moreira, Joao", "Nuzman, Joseph", Steven Rostedt, "Gross, Jurgen",
 Masami Hiramatsu, Alexei Starovoitov, Daniel Borkmann, Peter Collingbourne,
 Kees Cook
Subject: RE: [patch 00/38] x86/retbleed: Call depth tracking mitigation
Date: Fri, 22 Jul 2022 13:27:50 +0000
Message-ID: <4a0a9639ce41498e8116dc56a9083955@AcuMS.aculab.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Peter Zijlstra
> Sent: 22 July 2022 12:03
>
> On Thu, Jul 21, 2022 at 10:01:12PM +0000, David Laight wrote:
> >
> > Since: "If the callee is a variadic function, then the number of floating
> > point arguments passed to the function in vector registers must be provided
> > by the caller in the AL register."
> >
> > And that that never happens in the kernel you can use %eax instead
> > of %r10d.
>
> Except there's the AMD BTC thing and we should (compiler patch seems
> MIA) have an unconditional: 'xor %eax,%eax' in front of every function
> call.

I've just read https://www.amd.com/system/files/documents/technical-guidance-for-mitigating-branch-type-confusion_v7_20220712.pdf

It doesn't seem to suggest clearing registers except as a vague
'might help' before a function return (to limit what the speculated
code can do).

The only advantage I can think of for 'xor %eax,%eax' is that it is done
as a register rename - and isn't dependent on older instructions.
So it might reduce some pipeline stalls.

I'm guessing that someone might find a 'gadget' that depends on %eax
and it may be possible to find somewhere that leaves an arbitrary
value in it.
It is also about the only register that isn't live!

> (The official mitigation strategy was CALL; LFENCE IIRC, but that's so
> horrible nobody is actually considering that)
>
> Yes, the suggested sequence ends with rax being zero, but since we start
> the speculation before that result is computed that's not good enough I
> suspect.

The speculated code can't use the 'wrong' %eax value.
The only problem is that reading from -4(%r11) is likely to be a
D$ miss, giving plenty of time for the cpu to execute 'crap'.
But I'm not sure a later 'xor %eax,%eax' helps.
(OTOH this is all horrid and makes my brain hurt.)

AFAICT with BTC you 'just lose'.
I thought it was bad enough that some cpus used the BTB for predicted
conditional jumps - but using it to decide 'this must be a branch
instruction' seems especially broken.

Seems the best thing to do with those cpus is to run an embedded
system with a busybox+buildroot userspace where almost everything
runs as root :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)