From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9EC1CC2D0A3 for ; Thu, 29 Oct 2020 11:51:06 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D2DE120796 for ; Thu, 29 Oct 2020 11:51:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="rmHnhVsN" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D2DE120796 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References:Message-ID: Subject:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=oijWIolNxE6IlYuZ3BecBXyyQvrpOvVKJ03O3o4BUow=; b=rmHnhVsN+RV0GkGe/C9uiMLNv JblXKSKmYKhzJq1gCgm6gSRzIFBjG0Wfs4aouridvOWBk5cMx9goOBhNof8hm5+PYnt+6edWWfFJA 86LXq9Jk/HM6clmV9CJYbYOyacc/xMx1U8nL25XD0dWDJbkljkED/zBlWqTLxeu0pNMtG8Svj31Zo bMqjOHl356Q3s5oAERhwrHAt7r5s6hWMZR128X04dzx8ojUNEtjum8sONjPZr967YILAkm0xj6bqJ juAGY8KmDmTlbD2sHv9LPu9CcZJ0DSgF9u37gvFFp+3dWemsCytuTVpUcCaNv0ZqvLeVnzNHY2SkH BUgcncZig==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kY6Ry-00044V-DM; Thu, 29 Oct 2020 11:50:38 +0000 Received: from foss.arm.com ([217.140.110.172]) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kY6Rw-00043N-5K for linux-arm-kernel@lists.infradead.org; Thu, 29 Oct 2020 11:50:37 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DBCFB13A1; Thu, 29 Oct 2020 04:50:33 -0700 (PDT) Received: from C02TD0UTHF1T.local (unknown [10.57.55.248]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 879163F719; Thu, 29 Oct 2020 04:50:32 -0700 (PDT) Date: Thu, 29 Oct 2020 11:50:26 +0000 From: Mark Rutland To: Ard Biesheuvel Subject: Re: [PATCH v2] arm64: implement support for static call trampolines Message-ID: <20201029115026.GA61831@C02TD0UTHF1T.local> References: <20201028184114.6834-1-ardb@kernel.org> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20201028184114.6834-1-ardb@kernel.org> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20201029_075036_446394_2053CB44 X-CRM114-Status: GOOD ( 28.60 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Zijlstra , catalin.marinas@arm.com, will@kernel.org, james.morse@arm.com, linux-arm-kernel@lists.infradead.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Hi Ard, On Wed, Oct 28, 2020 at 07:41:14PM +0100, Ard Biesheuvel wrote: > Implement arm64 support for the 'unoptimized' static call variety, > which routes all calls through a single trampoline that is patched > to perform a tail call to the selected function. Given the complexity and subtlety here, do we actually need this? If there's some core feature that depends on this (or wants to), it'd be nice to call that out in the commit message. > Since static call targets may be located in modules loaded out of > direct branching range, we need to be able to fall back to issuing > a ADRP/ADD pair to load the branch target into R16 and use a BR > instruction. As this involves patching more than a single B or NOP > instruction (for which the architecture makes special provisions > in terms of the synchronization needed), we may need to run the > full blown instruction patching logic that uses stop_machine(). It > also means that once we've patched in a ADRP/ADD pair once, we are > quite restricted in the patching we can code subsequently, and we > may end up using an indirect call after all (note that Noted. I guess we [...] > v2: > This turned nasty really quickly when I realized that any sleeping task > could have been interrupted right in the middle of the ADRP/ADD pair > that we emit for static call targets that are out of immediate branching > range. > +/* > + * The static call trampoline consists of one of the following sequences: > + * > + * (A) (B) (C) (D) (E) > + * 00: BTI C BTI C BTI C BTI C BTI C > + * 04: B fn NOP NOP NOP NOP > + * 08: RET RET ADRP X16, fn ADRP X16, fn ADRP X16, fn > + * 0c: NOP NOP ADD X16, fn ADD X16, fn ADD X16, fn > + * 10: BR X16 RET NOP > + * 14: ADRP X16, &fn > + * 18: LDR X16, [X16, &fn] > + * 1c: BR X16 Are these all padded with trailing NOPs or UDFs? I assume the same space is statically reserved. > + * > + * The architecture permits us to patch B instructions into NOPs or vice versa > + * directly, but patching any other instruction sequence requires careful > + * synchronization. Since branch targets may be out of range for ordinary > + * immediate branch instructions, we may have to fall back to ADRP/ADD/BR > + * sequences in some cases, which complicates things considerably; since any > + * sleeping tasks may have been preempted right in the middle of any of these > + * sequences, we have to carefully transform one into the other, and ensure > + * that it is safe to resume execution at any point in the sequence for tasks > + * that have already executed part of it. > + * > + * So the rules are: > + * - we start out with (A) or (B) > + * - a branch within immediate range can always be patched in at offset 0x4; > + * - sequence (A) can be turned into (B) for NULL branch targets; > + * - a branch outside of immediate range can be patched using (C), but only if > + * . the sequence being updated is (A) or (B), or > + * . the branch target address modulo 4k results in the same ADD opcode > + * (which could occur when patching the same far target a second time) > + * - once we have patched in (C) we cannot go back to (A) or (B), so patching > + * in a NULL target now requires sequence (D); > + * - if we cannot patch a far target using (C), we fall back to sequence (E), > + * which loads the function pointer from memory. Cases C-E all use an indirect branch, which goes against one of the arguments for having static calls (the assumption that CPUs won't mis-predict direct branches). Similarly case E is a literal pool with more steps. That means that for us, static calls would only be an opportunistic optimization rather than a hardening feature. Do they actually save us much, or could we get by with an inline literal pool in the trampoline? It'd be much easier to reason consistently if the trampoline were always: | BTI C | LDR X16, _literal // pc-relative load | BR X16 | _literal: | < patch a 64-bit value here atomically > ... and I would strongly prefer that to having multiple sequences that could all be live -- I'm really not keen on the complexity and subtlety that entails. [...] > + * Note that sequence (E) is only used when switching between multiple far > + * targets, and that it is not a terminal degraded state. I think what you're saying here is that you can go from (E) to (C) if switching to a new far branch that's in ADRP+ADD range, but it can be misread to mean (E) is a transient step while patching a branch rather than some branches only being possible to encode with (E). Thanks, Mark. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel