From: Thomas Gleixner
To: Sami Tolvanen, Peter Zijlstra
Cc: Linus Torvalds, LKML, the arch/x86 maintainers, Tim Chen,
 Josh Poimboeuf, Andrew Cooper, Pawan Gupta, Johannes Wikner,
 Alyssa Milburn, Jann Horn, "H.J. Lu", Joao Moreira, Joseph Nuzman,
 Steven Rostedt, Juergen Gross, Masami Hiramatsu, Alexei Starovoitov,
 Daniel Borkmann
Subject: Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
References: <20220716230344.239749011@linutronix.de> <87wncauslw.ffs@tglx>
 <87tu7euska.ffs@tglx> <87o7xmup5t.ffs@tglx>
Date: Tue, 19 Jul 2022 00:59:35 +0200
Message-ID: <87ilnuuiw8.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jul 18 2022 at 15:48, Sami Tolvanen wrote:
> On Mon, Jul 18, 2022 at 2:18 PM Peter Zijlstra wrote:
>>
>> On Mon, Jul 18, 2022 at 10:44:14PM +0200, Thomas Gleixner wrote:
>> > And we need input from the Clang folks because their CFI work also puts
>> > stuff in front of the function entry, which nicely collides.
>>
>> Right, I need to go look at the latest kCFI patches, that sorta got
>> side-tracked for working on all the retbleed muck :/
>>
>> Basically kCFI wants to preface every (indirect callable) function with:
>>
>> __cfi_\func:
>>	int3
>>	movl $0x12345678, %rax
>>	int3
>>	int3
>> \func:
>
> Yes, and in order to avoid scattering the code with call target
> gadgets, the preamble should remain immediately before the function.
>
>> Ofc, we can still put the whole:
>>
>>	sarq $5, PER_CPU_VAR(__x86_call_depth);
>>	jmp \func_direct
>>
>> thing in front of that.
>
> Sure, that would work.
>
>> But it does somewhat destroy the version I had that only needs the
>> 10 bytes padding for the sarq.
>
> There's also the question of how function alignment should work in the
> KCFI case. Currently, the __cfi_ preamble is 16-byte aligned, which
> obviously means the function itself isn't.

That's bad.
The function entry should be 16-byte aligned, and, as I just learned, for
AMD the ideal alignment would possibly be 32 bytes as that's their I-fetch
width. But my experiments show that 16-byte alignment, independent of the
padding muck, is beneficial for both AMD and Intel over the 4-byte
alignment we have right now.

This really needs a lot of thought and performance analysis before we
commit to anything here. Peter's and my investigations have shown how
sensitive this is. We can't just add stuff without taking the whole
picture into account (independent of the proposed padding muck).

Thanks,

        tglx