From mboxrd@z Thu Jan 1 00:00:00 1970
From: ard.biesheuvel@linaro.org (Ard Biesheuvel)
Date: Thu, 25 Feb 2016 17:13:07 +0100
Subject: [PATCH v5sub2 1/8] arm64: add support for module PLTs
In-Reply-To: 
References: <1454332178-4414-1-git-send-email-ard.biesheuvel@linaro.org>
 <1454332178-4414-2-git-send-email-ard.biesheuvel@linaro.org>
 <20160225160714.GA16546@arm.com>
Message-ID: 
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 25 February 2016 at 17:12, Ard Biesheuvel wrote:
> On 25 February 2016 at 17:07, Will Deacon wrote:
>> Hi Ard,
>>
>> On Mon, Feb 01, 2016 at 02:09:31PM +0100, Ard Biesheuvel wrote:
>>> This adds support for emitting PLTs at module load time for relative
>>> branches that are out of range. This is a prerequisite for KASLR, which
>>> may place the kernel and the modules anywhere in the vmalloc area,
>>> making it more likely that branch target offsets exceed the maximum
>>> range of +/- 128 MB.
>>>
>>> Signed-off-by: Ard Biesheuvel 
>>> ---
>>>
>>> In this version, I removed the distinction between relocations against
>>> .init executable sections and ordinary executable sections. The reason
>>> is that it is hardly worth the trouble, given that .init.text usually
>>> does not contain that many far branches, and this version now only
>>> reserves PLT entry space for jump and call relocations against undefined
>>> symbols (since symbols defined in the same module can be assumed to be
>>> within +/- 128 MB).
>>>
>>> For example, the mac80211.ko module (which is fairly sizable at ~400 KB)
>>> built with -mcmodel=large gives the following relocation counts:
>>>
>>>                     relocs    branches   unique     !local
>>>   .text               3925        3347      518        219
>>>   .init.text            11           8        7          1
>>>   .exit.text             4           4        4          1
>>>   .text.unlikely        81          67       36         17
>>>
>>> ('unique' means branches to unique type/symbol/addend combos, of which
>>> !local is the subset referring to undefined symbols)
>>>
>>> IOW, we are only emitting a single PLT entry for the .init sections, and
>>> we are better off just adding it to the core PLT section instead.
>>> ---
>>>  arch/arm64/Kconfig              |   9 +
>>>  arch/arm64/Makefile             |   6 +-
>>>  arch/arm64/include/asm/module.h |  11 ++
>>>  arch/arm64/kernel/Makefile      |   1 +
>>>  arch/arm64/kernel/module-plts.c | 201 ++++++++++++++++++++
>>>  arch/arm64/kernel/module.c      |  12 ++
>>>  arch/arm64/kernel/module.lds    |   3 +
>>>  7 files changed, 242 insertions(+), 1 deletion(-)
>>
>> [...]
>>
>>> +struct plt_entry {
>>> +	/*
>>> +	 * A program that conforms to the AArch64 Procedure Call Standard
>>> +	 * (AAPCS64) must assume that a veneer that alters IP0 (x16) and/or
>>> +	 * IP1 (x17) may be inserted at any branch instruction that is
>>> +	 * exposed to a relocation that supports long branches. Since that
>>> +	 * is exactly what we are dealing with here, we are free to use x16
>>> +	 * as a scratch register in the PLT veneers.
>>> +	 */
>>> +	__le32 mov0;	/* movn	x16, #0x....			*/
>>> +	__le32 mov1;	/* movk	x16, #0x...., lsl #16		*/
>>> +	__le32 mov2;	/* movk	x16, #0x...., lsl #32		*/
>>> +	__le32 br;	/* br	x16				*/
>>> +};
>>
>> I'm worried about this code when CONFIG_ARM64_LSE_ATOMICS=y, but we don't
>> detect them on the CPU at runtime. In this case, all atomic operations
>> are moved out-of-line and called using a bl instruction from inline asm.
>>
>> The out-of-line code is compiled with magic GCC options
>
> Which options are those exactly?
>
>> to force the
>> explicit save/restore of all used registers (see arch/arm64/lib/Makefile),
>> otherwise we'd have to clutter the inline asm with constraints that
>> wouldn't be needed had we managed to patch the bl with an LSE atomic
>> instruction.
>>
>> If you're emitting a PLT, couldn't we end up with silent corruption of
>> x16 for modules using out-of-line atomics like this?
>>
>
> If you violate the AAPCS64 ABI, then obviously the claim above does not
> hold.

IOW, yes, x16 will be corrupted unexpectedly if you are expecting it to
be preserved across a bl or b instruction that is exposed to a jump/call
relocation.
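[For reference, the movn/movk/movk/br veneer quoted above can be encoded
mechanically. The sketch below is illustrative only -- it is not the
kernel's actual module-plts.c code, and the helper names make_plt_entry
and plt_entry_target are made up. It assumes the branch target is a
kernel VA with bits 48..63 all set, since movn leaves those bits as
ones.]

```c
#include <stdint.h>

/*
 * Illustrative sketch of the veneer encoding discussed above:
 *
 *   movn x16, #imm16            ; x16 = ~imm16 (bits 16..63 become 1s)
 *   movk x16, #imm16, lsl #16   ; patch bits 16..31
 *   movk x16, #imm16, lsl #32   ; patch bits 32..47
 *   br   x16
 *
 * Because movn leaves bits 48..63 all-ones, this only reaches targets
 * whose top 16 address bits are set, which holds for arm64 kernel VAs.
 */
struct plt_entry {
	uint32_t mov0;	/* movn x16, #...		*/
	uint32_t mov1;	/* movk x16, #..., lsl #16	*/
	uint32_t mov2;	/* movk x16, #..., lsl #32	*/
	uint32_t br;	/* br   x16			*/
};

static struct plt_entry make_plt_entry(uint64_t val)
{
	struct plt_entry e = {
		/* MOVN Xd: imm16 in bits [20:5], Rd = x16 */
		.mov0 = 0x92800010u | (uint32_t)((~val & 0xffff) << 5),
		/* MOVK Xd, lsl #16 (hw = 01) */
		.mov1 = 0xf2a00010u | (uint32_t)(((val >> 16) & 0xffff) << 5),
		/* MOVK Xd, lsl #32 (hw = 10) */
		.mov2 = 0xf2c00010u | (uint32_t)(((val >> 32) & 0xffff) << 5),
		/* BR Xn with Rn = x16 in bits [9:5] */
		.br   = 0xd61f0200u,
	};
	return e;
}

/* Recover the branch target from a veneer (inverse of the above). */
static uint64_t plt_entry_target(const struct plt_entry *e)
{
	uint64_t lo  = ~(uint64_t)(e->mov0 >> 5) & 0xffff;	/* undo movn */
	uint64_t mid = (e->mov1 >> 5) & 0xffff;
	uint64_t hi  = (e->mov2 >> 5) & 0xffff;

	/* movn left bits 48..63 all-ones, valid for kernel VAs */
	return 0xffff000000000000ull | (hi << 32) | (mid << 16) | lo;
}
```

[Note that nothing in this sequence touches the stack or any register
other than x16 -- which is exactly why the AAPCS64 caveat quoted above
matters: the veneer is invisible to conforming code but clobbers x16
for anyone who wrongly assumed it survives a relocated b/bl.]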