From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=JspG=EE=lists.infradead.org=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH,
	DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,
	SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 9EC1CC2D0A3
	for <linux-arm-kernel@archiver.kernel.org>; Thu, 29 Oct 2020 11:51:06 +0000 (UTC)
Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id D2DE120796
	for <linux-arm-kernel@archiver.kernel.org>; Thu, 29 Oct 2020 11:51:05 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="rmHnhVsN"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D2DE120796
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=arm.com
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding:
	Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive:
	List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References:Message-ID:
	Subject:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date:
	Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner;
	 bh=oijWIolNxE6IlYuZ3BecBXyyQvrpOvVKJ03O3o4BUow=; b=rmHnhVsN+RV0GkGe/C9uiMLNv
	JblXKSKmYKhzJq1gCgm6gSRzIFBjG0Wfs4aouridvOWBk5cMx9goOBhNof8hm5+PYnt+6edWWfFJA
	86LXq9Jk/HM6clmV9CJYbYOyacc/xMx1U8nL25XD0dWDJbkljkED/zBlWqTLxeu0pNMtG8Svj31Zo
	bMqjOHl356Q3s5oAERhwrHAt7r5s6hWMZR128X04dzx8ojUNEtjum8sONjPZr967YILAkm0xj6bqJ
	juAGY8KmDmTlbD2sHv9LPu9CcZJ0DSgF9u37gvFFp+3dWemsCytuTVpUcCaNv0ZqvLeVnzNHY2SkH
	BUgcncZig==;
Received: from localhost ([::1] helo=merlin.infradead.org)
	by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux))
	id 1kY6Ry-00044V-DM; Thu, 29 Oct 2020 11:50:38 +0000
Received: from foss.arm.com ([217.140.110.172])
 by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux))
 id 1kY6Rw-00043N-5K
 for linux-arm-kernel@lists.infradead.org; Thu, 29 Oct 2020 11:50:37 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DBCFB13A1;
 Thu, 29 Oct 2020 04:50:33 -0700 (PDT)
Received: from C02TD0UTHF1T.local (unknown [10.57.55.248])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 879163F719;
 Thu, 29 Oct 2020 04:50:32 -0700 (PDT)
Date: Thu, 29 Oct 2020 11:50:26 +0000
From: Mark Rutland <mark.rutland@arm.com>
To: Ard Biesheuvel <ardb@kernel.org>
Subject: Re: [PATCH v2] arm64: implement support for static call trampolines
Message-ID: <20201029115026.GA61831@C02TD0UTHF1T.local>
References: <20201028184114.6834-1-ardb@kernel.org>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20201028184114.6834-1-ardb@kernel.org>
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20201029_075036_446394_2053CB44 
X-CRM114-Status: GOOD (  28.60  )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <linux-arm-kernel.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-arm-kernel/>
List-Post: <mailto:linux-arm-kernel@lists.infradead.org>
List-Help: <mailto:linux-arm-kernel-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=subscribe>
Cc: Peter Zijlstra <peterz@infradead.org>, catalin.marinas@arm.com,
 will@kernel.org, james.morse@arm.com, linux-arm-kernel@lists.infradead.org
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

Hi Ard,

On Wed, Oct 28, 2020 at 07:41:14PM +0100, Ard Biesheuvel wrote:
> Implement arm64 support for the 'unoptimized' static call variety,
> which routes all calls through a single trampoline that is patched
> to perform a tail call to the selected function.

Given the complexity and subtlety here, do we actually need this?

If there's some core feature that depends on this (or wants to), it'd be
nice to call that out in the commit message.

> Since static call targets may be located in modules loaded out of
> direct branching range, we need to be able to fall back to issuing
> a ADRP/ADD pair to load the branch target into R16 and use a BR
> instruction. As this involves patching more than a single B or NOP
> instruction (for which the architecture makes special provisions
> in terms of the synchronization needed), we may need to run the
> full blown instruction patching logic that uses stop_machine(). It
> also means that once we've patched in a ADRP/ADD pair once, we are
> quite restricted in the patching we can code subsequently, and we
> may end up using an indirect call after all (note that 

Noted. I guess we

[...]

> v2:
> This turned nasty really quickly when I realized that any sleeping task
> could have been interrupted right in the middle of the ADRP/ADD pair
> that we emit for static call targets that are out of immediate branching
> range.

> +/*
> + * The static call trampoline consists of one of the following sequences:
> + *
> + *      (A)           (B)           (C)           (D)           (E)
> + * 00: BTI  C        BTI  C        BTI  C        BTI  C        BTI  C
> + * 04: B    fn       NOP           NOP           NOP           NOP
> + * 08: RET           RET           ADRP X16, fn  ADRP X16, fn  ADRP X16, fn
> + * 0c: NOP           NOP           ADD  X16, fn  ADD  X16, fn  ADD  X16, fn
> + * 10:                             BR   X16      RET           NOP
> + * 14:                                                         ADRP X16, &fn
> + * 18:                                                         LDR  X16, [X16, &fn]
> + * 1c:                                                         BR   X16

Are these all padded with trailing NOPs or UDFs? I assume the same space is
statically reserved.

> + *
> + * The architecture permits us to patch B instructions into NOPs or vice versa
> + * directly, but patching any other instruction sequence requires careful
> + * synchronization. Since branch targets may be out of range for ordinary
> + * immediate branch instructions, we may have to fall back to ADRP/ADD/BR
> + * sequences in some cases, which complicates things considerably; since any
> + * sleeping tasks may have been preempted right in the middle of any of these
> + * sequences, we have to carefully transform one into the other, and ensure
> + * that it is safe to resume execution at any point in the sequence for tasks
> + * that have already executed part of it.
> + *
> + * So the rules are:
> + * - we start out with (A) or (B)
> + * - a branch within immediate range can always be patched in at offset 0x4;
> + * - sequence (A) can be turned into (B) for NULL branch targets;
> + * - a branch outside of immediate range can be patched using (C), but only if
> + *   . the sequence being updated is (A) or (B), or
> + *   . the branch target address modulo 4k results in the same ADD opcode
> + *     (which could occur when patching the same far target a second time)
> + * - once we have patched in (C) we cannot go back to (A) or (B), so patching
> + *   in a NULL target now requires sequence (D);
> + * - if we cannot patch a far target using (C), we fall back to sequence (E),
> + *   which loads the function pointer from memory.

Cases C-E all use an indirect branch, which goes against one of the
arguments for having static calls (the assumption that CPUs won't
mis-predict direct branches). Similarly case E is a literal pool with
more steps.

That means that for us, static calls would only be an opportunistic
optimization rather than a hardening feature. Do they actually save us
much, or could we get by with an inline literal pool in the trampoline?

It'd be much easier to reason consistently if the trampoline were
always:

| 	BTI C
| 	LDR X16, _literal // pc-relative load
| 	BR X16
| _literal:
| 	< patch a 64-bit value here atomically >

... and I would strongly prefer that to having multiple sequences that
could all be live -- I'm really not keen on the complexity and subtlety
that entails.

[...]

> + * Note that sequence (E) is only used when switching between multiple far
> + * targets, and that it is not a terminal degraded state.

I think what you're saying here is that you can go from (E) to (C) if
switching to a new far branch that's in ADRP+ADD range, but it can be
misread to mean (E) is a transient step while patching a branch rather
than some branches only being possible to encode with (E).

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel