From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.2 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B616C433DF for ; Tue, 7 Jul 2020 10:37:32 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CB3092065F for ; Tue, 7 Jul 2020 10:37:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="cYiWSta0" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CB3092065F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References:Message-ID: Subject:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=Yo37i0M4ornX6gaFlGXG5ji2JgdnT4qIzkz02NfCdZU=; b=cYiWSta0uRWthXe6huTlpKGxn HWZEG9p0DDZt4DBk+I7wNSGOJUg+VVV0y9y4XQ4sXOg/UL8zVQ77OlO5pXzHwKE2hvVT3UQ4WeGSO SOkX30YV1Bb9K80WWuuK0KWNq288vyBs9cLmZyWoFRmps/dMwPpiBv3dPw6/j5tfY92InMVYrdtsN m8UXYuZNLyK0bO+H2T+6uqTM/jLednNXQ1StLQpInUxVeSSaCXvflOa4CFGK3heLJxNiWRTI5D0Ie u9YGoCEXEPy4Vg7A20DJnqB3tjGstHWfKVlQJLueXk6fthnLd/6Rn7mLw1UltXLeEN9ITmd4rE06K eGNISIkhA==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1jskxH-0005Ki-40; Tue, 07 Jul 2020 10:36:03 +0000 Received: from foss.arm.com ([217.140.110.172]) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1jskxC-0005JE-SQ for linux-arm-kernel@lists.infradead.org; Tue, 07 Jul 2020 10:36:00 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3041EC0A; Tue, 7 Jul 2020 03:35:57 -0700 (PDT) Received: from arm.com (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B9AB83F71E; Tue, 7 Jul 2020 03:35:55 -0700 (PDT) Date: Tue, 7 Jul 2020 11:35:53 +0100 From: Dave Martin To: Ard Biesheuvel Subject: Re: [PATCH] arm64/alternatives: use subsections for replacement sequences Message-ID: <20200707103552.GJ10992@arm.com> References: <20200630081921.13443-1-ardb@kernel.org> <20200701170055.v5xufbjjkjknnkuc@bakewell.cambridge.arm.com> <20200706155024.GA10992@arm.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20200707_063559_070165_414D87E9 X-CRM114-Status: GOOD ( 65.01 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Mark Rutland , Anders Roxell , Arnd Bergmann , Suzuki K Poulose , Catalin Marinas , Mark Brown , James Morse , Andre Przywara , Will Deacon , Linux ARM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Mon, Jul 06, 2020 at 07:04:44PM +0300, Ard Biesheuvel wrote: > On Mon, 6 Jul 2020 at 18:50, Dave Martin wrote: > > > > On Wed, Jul 01, 2020 at 07:32:07PM +0200, Ard Biesheuvel wrote: > > > On Wed, 1 Jul 2020 at 19:30, Ard Biesheuvel wrote: > > > > > > > > On Wed, 1 Jul 2020 at 19:01, Dave P Martin wrote: > > > > > > > > > > On Tue, Jun 30, 2020 at 10:19:21AM +0200, Ard Biesheuvel wrote: > > > > > > When building very large kernels, the logic that emits replacement > > > > > > sequences for alternatives fails when relative branches are present > > > > > > in the code that is emitted into the .altinstr_replacement section > > > > > > and patched in at the original site and fixed up. The reason is that > > > > > > the linker will insert veneers if relative branches go out of range, > > > > > > and due to the relative distance of the .altinstr_replacement from > > > > > > the .text section where its branch targets usually live, veneers > > > > > > may be emitted at the end of the .altinstr_replacement section, with > > > > > > the relative branches in the sequence pointed at the veneers instead > > > > > > of the actual target. > > > > > > > > > > > > The alternatives patching logic will attempt to fix up the branch to > > > > > > point to its original target, which will be the veneer in this case, > > > > > > but given that the patch site is likely to be far away as well, it > > > > > > will be out of range and so patching will fail. There are other cases > > > > > > where these veneers are problematic, e.g., when the target of the > > > > > > branch is in .text while the patch site is in .init.text, in which > > > > > > case putting the replacement sequence inside .text may not help either. > > > > > > > > > > > > So let's use subsections to emit the replacement code as closely as > > > > > > possible to the patch site, to ensure that veneers are only likely to > > > > > > be emitted if they are required at the patch site as well, in which > > > > > > case they will be in range for the replacement sequence both before > > > > > > and after it is transported to the patch site. > > > > > > > > > > > > This will prevent alternative sequences in non-init code from being > > > > > > released from memory after boot, but this is tolerable given that the > > > > > > entire section is only 512 KB on an allyesconfig build (which weighs in > > > > > > at 500+ MB for the entire Image). Also, note that modules today carry > > > > > > the replacement sequences in non-init sections as well, and any of > > > > > > those that target init code will be emitted into init sections after > > > > > > this change. > > > > > > > > > > > > This fixes an early crash when booting an allyesconfig kernel on a > > > > > > system where any of the alternatives sequences containing relative > > > > > > branches are activated at boot (e.g., ARM64_HAS_PAN on TX2) > > > > > > > > > > > > Cc: Suzuki K Poulose > > > > > > Cc: James Morse > > > > > > Cc: Andre Przywara > > > > > > Cc: Dave P Martin > > > > > > Signed-off-by: Ard Biesheuvel > > > > > > --- > > > > > > arch/arm64/include/asm/alternative.h | 16 ++++++++-------- > > > > > > arch/arm64/kernel/vmlinux.lds.S | 3 --- > > > > > > 2 files changed, 8 insertions(+), 11 deletions(-) > > > > > > > > > > > > diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h > > > > > > index 5e5dc05d63a0..12f0eb56a1cc 100644 > > > > > > --- a/arch/arm64/include/asm/alternative.h > > > > > > +++ b/arch/arm64/include/asm/alternative.h > > > > > > @@ -73,11 +73,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > > > > > > ".pushsection .altinstructions,\"a\"\n" \ > > > > > > ALTINSTR_ENTRY(feature) \ > > > > > > ".popsection\n" \ > > > > > > - ".pushsection .altinstr_replacement, \"a\"\n" \ > > > > > > + ".subsection 1\n" \ > > > > > > > > > > This uses subsections in existing sections. Could that interfere with > > > > > existing (or future) uses of subsections? (I've not checked whether > > > > > there actually are such uses. I'm also assuming that clobbering the > > > > > invoker's idea of what section is .previous doesn't matter.) > > > > > > > > > > > > > It shouldn't matter, really. You can use different indexes for the > > > > subsection, but since the execution never flows from one subsection > > > > into the next, all that matters is that they are 'somewhere else' > > > > > > > > As for the use of .previous - the idea is that this does not affect > > > > the contents of the section stack, which I think makes sense. We could > > > > use '.pushsection .text, 1' as well to enter another subsection in > > > > .text, but that means we keep the .text vs .init.text issue that this > > > > patch solves. > > > > The following works: > > > > .pushsection junk > > .previous > > .subsection foo > > // ... > > > > .popsection > > > > though the gas documentation is not very clear about the relationship > > between .previous and the section stack directives. In fact, each stack > > slot has its own notion of .previous. If this trick is too uncertain, > > we can probably do without it though. Relying on .previous after > > invoking a macro is ill-advised anyway, and I haven't seen this issue > > come up in practice. > > > > There is some mention about .previous only affecting the slot at the > top of the stack, but it is rather vague. > > > Since foo is just a number, maybe we could just pick a randomish value, > > similarly to the way we "manage" asm local labels generated by macros, > > leaving small-integer values reserved for top-level code? This might > > help prevent surprises later on. > > > > In general it's reasonable to use subsections to consolidate things > > that should be kept close but otherwise contiguous in the output > > object, such as collecting together entries for a jump table. If a .S > > file and macros it calls are both using subsection 1, the macros might > > spit out garbage into the middle of the table the .S file is trying to > > build for example. However, I don't see any obvious evidence of that > > kind of thing in arch/arm64, and nothing in core code (no surprise > > there, this is asm). > > > > All uses of subsections I am aware just use it to emit code out of > line, with a label at the start and a branch at the end, and the only > reason is to remove it from the hot path. For that kind of use, the > only requirement for the subsection index is that it is != 0. > > If you are doing smart things with subsections and expect some > consistency in how they are organized at the end of the object file, > you might care about the subsection index, but I don't see a reason to > do so here. If we used .subsection 77, say, then a conflict would be highly unlikely. But I'd agree that this might just cause more confusion in the long run (if it ever matters). > > > > > > > > > Another wrinkle: the replacement code now becomes executable, whereas > > > > > I think it was previously in rodata. I'm not sure how much this > > > > > matters, but it might be a source of gadgets. > > > > > > > > > > > > > True. Perhaps we need to get rid of relative branches in alternative > > > > sequences entirely - see below. > > > > > > > > > > > > > > A different option would be to add an explicitly veneered branch macro > > > > > for use in alternatives, maybe adrp+add+br. For BTI compatility, we'd > > > > > need a bti j or equivalent at the destination, which might or might not > > > > > be easy to achieve -- mind you, I think we theoretically need that > > > > > anyway for veneers to work properly in all cases. > > > > > > > > > > Because we would define the exact instruction sequence, the > > > > > alternatives code could probably replace it with a direct branch if the > > > > > actual destination is close enough. The downside is that it wouldn't > > > > > be a single instruction any more, and there would be some overhead for > > > > > conditional branches if we replace the unneeded insns with NOPs. > > > > > > > > > > > > > Yeah, this becomes quite hairy very quickly, especially because now > > > > you need to allocate a spare register each time you do this. > > > > which is not ideal, I agree. > > > > Do we anticipate any truly out-of-range branches, or are we assuming the > > kernel text + modules area is always compact enough that direct > > branching always works? > > > > There are two realistic failure modes, afaict: > - *Really* large kernels, such as allyesconfig, which is kind of > artificial but still relevant for validation purposes > - Occurrences where the module region is almost exhausted, to the > point where the next module's non-init segment no longer fits, but the > init section does. In this case, any alternative sequences with > branches that need to be patched into the init text section may be out > of range for their targets (which could be actual functions or PLTs > that were emitted into the non-init section) > > > > > > > > > > One other option is to simply disallow branches in the alternative > > > > sequences: I spotted three occurrences [0] that are quite easily > > > > fixed, by inverting the condition so that the relative branch is > > > > emitted in place, and the alternative sequence is just NOPs. > > > > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm64-alt-branches > > > > That might be a nice simplification if there's no significant > > performance impact. > > > > Actually, the PAN code could be improved a bit more - I just sent a > patch - but in general, disallowing such branches would be a nice > simplification but I am not sure we can easily enforce it at build > time. We could scan for problem relocs when linking, but that would introduce an extra build step. So I'd guess that wouldn't be popular just for this. Cheers ---Dave _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel