All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ard Biesheuvel <ardb@kernel.org>
To: Hou Wenlong <houwenlong.hwl@antgroup.com>
Cc: linux-kernel@vger.kernel.org,
	Lai Jiangshan <jiangshan.ljs@antgroup.com>,
	Kees Cook <keescook@chromium.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Petr Mladek <pmladek@suse.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	Song Liu <song@kernel.org>,
	Julian Pidancet <julian.pidancet@oracle.com>
Subject: Re: [PATCH RFC 31/43] x86/modules: Adapt module loading for PIE support
Date: Wed, 10 May 2023 10:15:51 +0200	[thread overview]
Message-ID: <CAMj1kXF3N89J-Y8OptZjXPxnOLW+K5Ot+jCSBT+py90uvBxh1w@mail.gmail.com> (raw)
In-Reply-To: <20230510070949.GA7127@k08j02272.eu95sqa>

On Wed, 10 May 2023 at 09:15, Hou Wenlong <houwenlong.hwl@antgroup.com> wrote:
>
> On Mon, May 08, 2023 at 05:16:34PM +0800, Ard Biesheuvel wrote:
> > On Mon, 8 May 2023 at 10:38, Hou Wenlong <houwenlong.hwl@antgroup.com> wrote:
> > >
> > > On Sat, Apr 29, 2023 at 03:29:32AM +0800, Ard Biesheuvel wrote:
> > > > On Fri, 28 Apr 2023 at 10:53, Hou Wenlong <houwenlong.hwl@antgroup.com> wrote:
> > > > >
> > > > > Adapt module loading to support PIE relocations. No GOT is generared for
> > > > > module, all the GOT entry of got references in module should exist in
> > > > > kernel GOT.  Currently, there is only one usable got reference for
> > > > > __fentry__().
> > > > >
> > > >
> > > > I don't think this is the right approach. We should permit GOTPCREL
> > > > relocations properly, which means making them point to a location in
> > > > memory that carries the absolute address of the symbol. There are
> > > > several ways to go about that, but perhaps the simplest way is to make
> > > > the symbol address in ksymtab a 64-bit absolute value (but retain the
> > > > PC32 references for the symbol name and the symbol namespace name).
> > > > That way, you can always resolve such GOTPCREL relocations by pointing
> > > > it to the ksymtab entry. Another option would be to take inspiration
> > > > from the PLT code we have on ARM and arm64 (and other architectures,
> > > > surely) and to count the GOT based relocations, allocate some extra
> > > > r/o module space for each, and allocate slots and populate them with
> > > > the right value as you fix up the relocations.
> > > >
> > > > Then, many such relocations can be relaxed at module load time if the
> > > > symbol is in range. IIUC, the module and kernel will still be inside
> > > > the same 2G window even after widening the KASLR range to 512G, so
> > > > most GOT loads can be converted into RIP relative LEA instructions.
> > > >
> > > > Note that this will also permit you to do things like
> > > >
> > > > #define PV_VCPU_PREEMPTED_ASM \
> > > >  "leaq __per_cpu_offset(%rip), %rax \n\t" \
> > > >  "movq (%rax,%rdi,8), %rax \n\t" \
> > > >  "addq steal_time@GOTPCREL(%rip), %rax \n\t" \
> > > >  "cmpb $0, " __stringify(KVM_STEAL_TIME_preempted) "(%rax) \n\t" \
> > > >  "setne %al\n\t"
> > > >
> > > > or
> > > >
> > > > +#ifdef CONFIG_X86_PIE
> > > > + " pushq arch_rethook_trampoline@GOTPCREL(%rip)\n"
> > > > +#else
> > > > " pushq $arch_rethook_trampoline\n"
> > > > +#endif
> > > >
> > > > instead of having these kludgy push/pop sequences to free up temp registers.
> > > >
> > > > (FYI I have looked into this PIE linking just a few weeks ago [0] so
> > > > this is all rather fresh in my memory)
> > > >
> > > >
> > > >
> > > >
> > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=x86-pie
> > > >
> > > >
> > > Hi Ard,
> > > Thanks for providing the link, it has been very helpful for me as I am
> > > new to the topic of compilers.
> >
> > Happy to hear that.
> >
> > > One key difference I noticed is that you
> > > linked the kernel with "-pie" instead of "--emit-reloc". I also noticed
> > > that Thomas' initial patchset[0] used "-pie", but in RFC v3 [1], it
> > > switched to "--emit-reloc" in order to reduce dynamic relocation space
> > > on mapped memory.
> > >
> >
> > The problem with --emit-relocs is that the relocations emitted into
> > the binary may get out of sync with the actual code after the linker
> > has applied relocations.
> >
> > $ cat /tmp/a.s
> > foo:movq foo@GOTPCREL(%rip), %rax
> >
> > $ x86_64-linux-gnu-gcc -c -o /tmp/a.o /tmp/a.s
> > ard@gambale:~/linux$ x86_64-linux-gnu-objdump -dr /tmp/a.o
> >
> > /tmp/a.o:     file format elf64-x86-64
> >
> >
> > Disassembly of section .text:
> >
> > 0000000000000000 <foo>:
> >    0: 48 8b 05 00 00 00 00 mov    0x0(%rip),%rax        # 7 <foo+0x7>
> > 3: R_X86_64_REX_GOTPCRELX foo-0x4
> >
> > $ x86_64-linux-gnu-gcc -c -o /tmp/a.o /tmp/a.s
> > $ x86_64-linux-gnu-objdump -dr /tmp/a.o
> > 0000000000000000 <foo>:
> >    0: 48 8b 05 00 00 00 00 mov    0x0(%rip),%rax        # 7 <foo+0x7>
> > 3: R_X86_64_REX_GOTPCRELX foo-0x4
> >
> > $ x86_64-linux-gnu-gcc -o /tmp/a.elf -nostartfiles
> > -Wl,-no-pie,-q,--defsym,_start=0x0 /tmp/a.s
> > $ x86_64-linux-gnu-objdump -dr /tmp/a.elf
> > 0000000000401000 <foo>:
> >   401000: 48 c7 c0 00 10 40 00 mov    $0x401000,%rax
> > 401003: R_X86_64_32S foo
> >
> > $ x86_64-linux-gnu-gcc -o /tmp/a.elf -nostartfiles
> > -Wl,-q,--defsym,_start=0x0 /tmp/a.s
> > $ x86_64-linux-gnu-objdump -dr /tmp/a.elf
> > 0000000000001000 <foo>:
> >     1000: 48 8d 05 f9 ff ff ff lea    -0x7(%rip),%rax        # 1000 <foo>
> > 1003: R_X86_64_PC32 foo-0x4
> >
> > This all looks as expected. However, when using Clang, we end up with
> >
> > $ clang -target x86_64-linux-gnu -o /tmp/a.elf -nostartfiles
> > -fuse-ld=lld -Wl,--relax,-q,--defsym,_start=0x0 /tmp/a.s
> > $ x86_64-linux-gnu-objdump -dr /tmp/a.elf
> > 00000000000012c0 <foo>:
> >     12c0: 48 8d 05 f9 ff ff ff lea    -0x7(%rip),%rax        # 12c0 <foo>
> > 12c3: R_X86_64_REX_GOTPCRELX foo-0x4
> >
> > So in this case, what --emit-relocs gives us is not what is actually
> > in the binary. We cannot just ignore these either, given that they are
> > treated differently depending on whether the symbol is a per-CPU
> > symbol or not - in the former case, we need to perform a fixup if the
> > relaxed reference is RIP relative, and in the latter case, if the
> > relaxed reference is absolute.
> >
> > On top of that, --emit-relocs does not cover the GOT, so we'd still
> > need to process that from the code explicitly.
> >
> > In general, relying on --emit-relocs is kind of dodgy, and I think
> > combining PIE linking with --emit-relocs is a bad idea.
> >
> > > The another issue is that it requires the addition of the
> > > "-mrelax-relocations=no" option to support older compilers and linkers.
> >
> > Why? The decompressor is now linked in PIE mode so we should be able
> > to drop that. Or do you need to add is somewhere else?
> >
> Hi Ard,
>
> After removing the "-mrelax-relocations=no" option, I noticed that the
> linker was relaxing GOT references as absolute references for mov
> instructions, even if the symbol was in a high address, as long as I
> kept the compile-time base address of the kernel image in the top 2G. I
> consulted the "Optimize GOTPCRELX Relocations" chapter in x86-64 psABI,
> which stated that "When position-independent code is disabled and foo is
> defined locally in the lower 32-bit address space, memory operand in mov
> can be converted into immediate operand". However, it seemed that if the
> symbol was in the higher 32-bit address space, the memory operand in mov
> would also be converted into an immediate operand. If I decreased the
> compile-time base address of the kernel image, it would be relaxed as
> lea. Therefore, I believe that using "-mrelax-relocations=no" without
> "-pie" option is necessary.

Indeed. As you noted, the linker assumes that non-PIE linked binaries
will always appear at their link time address, and relaxations will
try to take advantage of that.

Currently, we use -pie linking only for the decompressor, and we
should be able to drop -mrelax-relocations=no from its LDFLAGS. But
position dependent linking should not use relaxations at all.

> Is there a way to force the linker to relax
> it as lea without using the "-pie" option when linking?
>

Not that I am aware of.

> Since all GOT references cannot be omitted, perhaps I should try linking
> the kernel with the "-pie" option.
>

That way, we will end up with two sets of relocations, the static ones
from --emit-relocs and the dynamic ones from -pie. This should be
manageable, given that the difference between those sets should
exactly cover the GOT.

However, relying on --emit-relocs and -pie at the same time seems
clumsy to me. I'd prefer to only depend on -pie at /some/ point.

-- 
Ard.

  reply	other threads:[~2023-05-10  8:16 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-28  9:50 [PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 01/43] x86/crypto: Adapt assembly for PIE support Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 02/43] x86: Add macro to get symbol address " Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 03/43] x86: relocate_kernel - Adapt assembly " Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 04/43] x86/entry/64: " Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 05/43] x86: pm-trace: " Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 06/43] x86/CPU: " Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 07/43] x86/acpi: " Hou Wenlong
2023-04-28 11:32   ` Rafael J. Wysocki
2023-04-28  9:50 ` [PATCH RFC 08/43] x86/boot/64: " Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 09/43] x86/power/64: " Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 10/43] x86/alternatives: " Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 11/43] x86/irq: " Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 12/43] x86,rethook: " Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 13/43] x86/paravirt: Use relative reference for original instruction Hou Wenlong
2023-06-01  9:29   ` Juergen Gross
2023-06-01  9:29     ` Juergen Gross via Virtualization
2023-06-05  6:40     ` Nadav Amit
2023-06-05  6:40       ` Nadav Amit via Virtualization
2023-06-06 11:35       ` Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 14/43] x86/Kconfig: Introduce new Kconfig for PIE kernel building Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 15/43] x86/PVH: Use fixed_percpu_data to set up GS base Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 16/43] x86-64: Use per-cpu stack canary if supported by compiler Hou Wenlong
2023-05-01 17:27   ` Nick Desaulniers
2023-05-05  6:14     ` Hou Wenlong
2023-05-05 18:02       ` Nick Desaulniers
2023-05-05 19:06         ` Fangrui Song
2023-05-08  8:06         ` Hou Wenlong
2023-05-04 10:31   ` Juergen Gross
2023-05-05  3:09     ` Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 17/43] x86/pie: Enable stack protector only if per-cpu stack canary is supported Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 18/43] x86/percpu: Use PC-relative addressing for percpu variable references Hou Wenlong
2023-04-28  9:50 ` [PATCH RFC 19/43] x86/tools: Explicitly include autoconf.h for hostprogs Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 20/43] x86/percpu: Adapt percpu references relocation for PIE support Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 21/43] x86/ftrace: Adapt assembly " Hou Wenlong
2023-04-28 13:37   ` Steven Rostedt
2023-04-29  3:43     ` Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 22/43] x86/ftrace: Adapt ftrace nop patching " Hou Wenlong
2023-04-28 13:44   ` Steven Rostedt
2023-04-29  3:38     ` Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 23/43] x86/pie: Force hidden visibility for all symbol references Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 24/43] x86/boot/compressed: Adapt sed command to generate voffset.h when PIE is enabled Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 25/43] x86/mm: Make the x86 GOT read-only Hou Wenlong
2023-04-30 14:23   ` Ard Biesheuvel
2023-05-08 11:40     ` Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 26/43] x86/pie: Add .data.rel.* sections into link script Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 27/43] x86/relocs: Handle PIE relocations Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 28/43] KVM: x86: Adapt assembly for PIE support Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 29/43] x86/PVH: Adapt PVH booting " Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 30/43] x86/bpf: Adapt BPF_CALL JIT codegen " Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 31/43] x86/modules: Adapt module loading " Hou Wenlong
2023-04-28 19:29   ` Ard Biesheuvel
2023-05-08  8:32     ` Hou Wenlong
2023-05-08  9:16       ` Ard Biesheuvel
2023-05-08 11:40         ` Hou Wenlong
2023-05-08 17:47           ` Ard Biesheuvel
2023-05-09  9:42             ` Hou Wenlong
2023-05-09  9:52               ` Ard Biesheuvel
2023-05-09 12:35                 ` Hou Wenlong
2023-05-10  7:09         ` Hou Wenlong
2023-05-10  8:15           ` Ard Biesheuvel [this message]
2023-04-28  9:51 ` [PATCH RFC 32/43] x86/boot/64: Use data relocation to get absloute address when PIE is enabled Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 33/43] objtool: Add validation for x86 PIE support Hou Wenlong
2023-04-28 10:28   ` Christophe Leroy
2023-04-28 11:43     ` Peter Zijlstra
2023-04-29  4:04       ` Hou Wenlong
2023-04-29  3:52     ` Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 34/43] objtool: Adapt indirect call of __fentry__() for " Hou Wenlong
2023-04-28 15:18   ` Peter Zijlstra
2023-04-28  9:51 ` [PATCH RFC 35/43] x86/pie: Build the kernel as PIE Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 36/43] x86/vsyscall: Don't use set_fixmap() to map vsyscall page Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 37/43] x86/xen: Pin up to VSYSCALL_ADDR when vsyscall page is out of fixmap area Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 38/43] x86/fixmap: Move vsyscall page " Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 39/43] x86/fixmap: Unify FIXADDR_TOP Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 40/43] x86/boot: Fill kernel image puds dynamically Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 41/43] x86/mm: Sort address_markers array when X86 PIE is enabled Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 42/43] x86/pie: Allow kernel image to be relocated in top 512G Hou Wenlong
2023-04-28  9:51 ` [PATCH RFC 43/43] x86/boot: Extend relocate range for PIE kernel image Hou Wenlong
2023-04-28 15:22 ` [PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible Peter Zijlstra
2023-05-06  7:19   ` Hou Wenlong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMj1kXF3N89J-Y8OptZjXPxnOLW+K5Ot+jCSBT+py90uvBxh1w@mail.gmail.com \
    --to=ardb@kernel.org \
    --cc=Jason@zx2c4.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=houwenlong.hwl@antgroup.com \
    --cc=hpa@zytor.com \
    --cc=jiangshan.ljs@antgroup.com \
    --cc=julian.pidancet@oracle.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=song@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.