Date: Thu, 29 Oct 2020 11:40:07 +0100
From: Peter Zijlstra
To: Ard Biesheuvel
Cc: mark.rutland@arm.com, catalin.marinas@arm.com, will@kernel.org, james.morse@arm.com, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2] arm64: implement support for static call trampolines
Message-ID: <20201029104007.GK2628@hirez.programming.kicks-ass.net>
References: <20201028184114.6834-1-ardb@kernel.org>
In-Reply-To: <20201028184114.6834-1-ardb@kernel.org>

On Wed, Oct 28, 2020 at 07:41:14PM +0100, Ard Biesheuvel wrote:
> +/*
> + * The static call trampoline consists
> + * of one of the following sequences:
> + *
> + *     (A)           (B)           (C)           (D)           (E)
> + * 00: BTI  C        BTI  C        BTI  C        BTI  C        BTI  C
> + * 04: B    fn       NOP           NOP           NOP           NOP
> + * 08: RET           RET           ADRP X16, fn  ADRP X16, fn  ADRP X16, fn
> + * 0c: NOP           NOP           ADD  X16, fn  ADD  X16, fn  ADD  X16, fn
> + * 10:                             BR   X16      RET           NOP
> + * 14:                                                         ADRP X16, &fn
> + * 18:                                                         LDR  X16, [X16, &fn]
> + * 1c:                                                         BR   X16
> + *
> + * The architecture permits us to patch B instructions into NOPs or vice versa
> + * directly, but patching any other instruction sequence requires careful
> + * synchronization. Since branch targets may be out of range for ordinary
> + * immediate branch instructions, we may have to fall back to ADRP/ADD/BR
> + * sequences in some cases, which complicates things considerably; since any
> + * sleeping tasks may have been preempted right in the middle of any of these
> + * sequences, we have to carefully transform one into the other, and ensure
> + * that it is safe to resume execution at any point in the sequence for tasks
> + * that have already executed part of it.
> + *
> + * So the rules are:
> + * - we start out with (A) or (B)
> + * - a branch within immediate range can always be patched in at offset 0x4;
> + * - sequence (A) can be turned into (B) for NULL branch targets;
> + * - a branch outside of immediate range can be patched using (C), but only if
> + *   . the sequence being updated is (A) or (B), or
> + *   . the branch target address modulo 4k results in the same ADD opcode
> + *     (which could occur when patching the same far target a second time)
> + * - once we have patched in (C) we cannot go back to (A) or (B), so patching
> + *   in a NULL target now requires sequence (D);
> + * - if we cannot patch a far target using (C), we fall back to sequence (E),
> + *   which loads the function pointer from memory.
> + *
> + * If we abide by these rules, then the following must hold for tasks that were
> + * interrupted halfway through execution of the trampoline:
> + * - when resuming at offset 0x8, we can only encounter a RET if (B) or (D)
> + *   was patched in at any point, and therefore a NULL target is valid;
> + * - when resuming at offset 0xc, we are executing the ADD opcode that is only
> + *   reachable via the preceding ADRP, and which is patched in only a single
> + *   time, and is therefore guaranteed to be consistent with the ADRP target;
> + * - when resuming at offset 0x10, X16 must refer to a valid target, since it
> + *   is only reachable via a ADRP/ADD pair that is guaranteed to be consistent.
> + *
> + * Note that sequence (E) is only used when switching between multiple far
> + * targets, and that it is not a terminal degraded state.
> + */

Would it make things easier if your trampoline consisted of two complete
slots, between which you can flip?

Something like:

    0x00    B 0x24 / NOP
    0x04    < slot 1 >
            ....
    0x20
    0x24    < slot 2 >
            ....
    0x40

Then each (0x20 byte) slot can contain any of the variants above and you
can write the unused slot without stop-machine. Then, when the unused
slot is populated, flip the initial instruction (like a static-branch),
issue synchronize_rcu_tasks() and flip to using the other slot for next
time.

Alternatively, you can patch the call-sites to point to the alternative
trampoline slot, but that might be pushing things a bit.
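
To make the ordering of the two-slot flip concrete, here is a rough userspace model of it, not kernel code. Everything in it is hypothetical and for illustration only: struct tramp_model, tramp_update() and model_wait_for_tasks() are made-up names, the instruction words are opaque placeholders rather than real encodings, and the wait helper merely stands in for synchronize_rcu_tasks(). It also ignores instruction-cache maintenance and any concurrency between updaters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 0x20 bytes per slot: room for the longest sequence, (E), at 8 instructions. */
#define SLOT_INSNS 8

/* Hypothetical model of the two-slot trampoline. */
struct tramp_model {
	int      active;                /* slot selected by the insn at 0x00: 0 or 1 */
	uint32_t slot[2][SLOT_INSNS];   /* placeholder instruction words per slot */
	unsigned grace_periods;         /* how often we waited for old callers */
};

/* Stand-in for synchronize_rcu_tasks(); the model only counts the calls. */
static void model_wait_for_tasks(struct tramp_model *t)
{
	t->grace_periods++;
}

/* Install a new sequence of n placeholder instruction words. */
static void tramp_update(struct tramp_model *t, const uint32_t *insns, size_t n)
{
	int unused = !t->active;

	assert(n <= SLOT_INSNS);

	/*
	 * 1. Populate the unused slot. The wait at the end of the previous
	 *    update is what lets us assume nobody is still executing it.
	 */
	memset(t->slot[unused], 0, sizeof(t->slot[unused]));
	memcpy(t->slot[unused], insns, n * sizeof(*insns));

	/* 2. Flip the selector: models the B <-> NOP patch at offset 0x00. */
	t->active = unused;

	/* 3. Wait for tasks that were preempted inside the old slot. */
	model_wait_for_tasks(t);
}
```

Calling tramp_update() twice on a zero-initialised struct tramp_model should leave the first sequence untouched in slot 2 while slot 1 holds the second one, which is the property that would make a stop-machine unnecessary: the slot being rewritten is never the one that callers are being steered into.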