From: "Hou Wenlong" <houwenlong.hwl@antgroup.com>
To: linux-kernel@vger.kernel.org
Cc: "Thomas Garnier" <thgarnie@chromium.org>,
"Lai Jiangshan" <jiangshan.ljs@antgroup.com>,
"Kees Cook" <keescook@chromium.org>,
"Hou Wenlong" <houwenlong.hwl@antgroup.com>,
"Nathan Chancellor" <nathan@kernel.org>,
"Nick Desaulniers" <ndesaulniers@google.com>,
"Tom Rix" <trix@redhat.com>, <bpf@vger.kernel.org>,
<llvm@lists.linux.dev>
Subject: [PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible
Date: Fri, 28 Apr 2023 17:50:40 +0800 [thread overview]
Message-ID: <cover.1682673542.git.houwenlong.hwl@antgroup.com> (raw)
Purpose:
These patches make the changes necessary to build the kernel as Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated
below the top 2G of the virtual address space. And this patchset
provides an example to allow kernel image to be relocated in top 512G of
the address space.
The ultimate purpose for PIE kernel is to increase the security of the
the kernel and also the fleixbility of the kernel image's virtual
address, which can be even in the low half of the address space. More
locations the kernel can fit in, this means an attacker could guess
harder.
The patchset is based on Thomas Garnier's X86 PIE patchset v6[1] and
v11[2]. However, some design changes are made and some bugs are fixed by
testing with different configurations and compilers.
Important changes:
- For fixmap area, move vsyscall page out of fixmap area and unify
__FIXADDR_TOP for x86. Then fixmap area could be relocated together
with kernel image.
- For compile-time base address of kernel image, keep it in top 2G of
address space. Introduce a new variable to store the run-time base
address and adapt for VA/PA transition during runtime.
- For percpu section, keep it as zero mapping for SMP. Because
compile-time base address of kernel image still resides in top 2G of
address space, then RIP-relative reference can still be used when
percpu section is zero mapping. However, when do relocation for
percpu variable references, percpu variable should be treated as
normal variable and absolute references should be relocated
accordingly. In addition, the relocation offset should be subtracted
from the GS base in order to ensure correct operation.
- For x86/boot/head64.c, don't build it as mcmodel=large. Instead, use
data relocation to acqiure global symbol's value and make
fixup_pointer() as a nop when running in identity mapping. This is
because not all global symbol references in the code use
fixup_pointer(), e.g. variables in macro related to 5-level paging,
which can be optimized by GCC as relative referencs. If build it as
mcmodel=large, there will be more fixup_pointer() calls, resulting
in uglier code. Actually, if build it as PIE even when
CONFIG_X86_PIE is disabled, then all fixup_pointer() could be
dropped. However stack protector would be broken if per-cpu stack
protector is not supported.
Limitations:
- Since I am not familiar with XEN, it has been disabled for now as it
is not adapted for PIE. This is due to the assignment of wrong
pointers (with low address values) to x86_ini_ops when running in
identity mapping. This issue can be resolved by building pagetable
eraly and jumping to high kernel address as soon as possible.
- It is not allowed to reference global variables in an alternative
section since RIP-relative addressing is not fixed in
apply_alternatives(). Fortunately, all disallowed relocations in the
alternative section can be captured by objtool. I believe that this
issue can also be fixed by using objtool.
- For module loading, only allow to load module without GOT for
simplicity. Only weak global variable referencs are using GOT.
Tests:
I only have tested booting with GCC 5.1.0 (min version), GCC 12.2.0
and CLANG 15.0.7. And I have also run the following tests for both
default configuration and Ubuntu configuration.
Performance/Size impact (GCC 12.2.0):
Size of vmlinux (Default configuration):
File size:
- PIE disabled: +0.012%
- PIE enabled: -2.219%
instructions:
- PIE disabled: same
- PIE enabled: +1.383%
.text section:
- PIE disabled: same
- PIE enabled: +0.589%
Size of vmlinux (Ubuntu configuration):
File size:
- PIE disabled: same
- PIE enabled: +2.391%
instructions:
- PIE disabled: +0.013%
- PIE enabled: +1.566%
.text section:
- PIE disabled: same
- PIE enabled: +0.055%
The .text section size increase is due to more instructions required for
PIE code. There are two reasons that have been mentioned in previous
mailist. Firstly, switch folding is disabled under PIE [3]. Secondly,
two instructions are needed for PIE to represent a single instruction
with sign extension, such as when accessing an array element. While only
one instruction is required when using mcmode=kernel, for PIE, it needs
to use lea to get the base of the array first.
Hackbench (50% and 1600% on thread/process for pipe/sockets):
- PIE disabled: no significant change (avg -/+ 0.5% on default config).
- PIE enabled: -2% to +2% in average (default config).
Kernbench (average of 10 Half and Optimal runs):
Elapsed Time:
- PIE disabled: no significant change (avg -0.2% on ubuntu config)
- PIE enabled: average -0.2% to +0.2%
System Time:
- PIE disabled: no significant change (avg -0.5% on ubuntu config)
- PIE enabled: average -0.5% to +0.5%
[1] https://lore.kernel.org/all/20190131192533.34130-1-thgarnie@chromium.org
[2] https://lore.kernel.org/all/20200228000105.165012-1-thgarnie@chromium.org
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303
Brian Gerst (1):
x86-64: Use per-cpu stack canary if supported by compiler
Hou Wenlong (29):
x86/irq: Adapt assembly for PIE support
x86,rethook: Adapt assembly for PIE support
x86/paravirt: Use relative reference for original instruction
x86/Kconfig: Introduce new Kconfig for PIE kernel building
x86/PVH: Use fixed_percpu_data to set up GS base
x86/pie: Enable stack protector only if per-cpu stack canary is
supported
x86/percpu: Use PC-relative addressing for percpu variable references
x86/tools: Explicitly include autoconf.h for hostprogs
x86/percpu: Adapt percpu references relocation for PIE support
x86/ftrace: Adapt assembly for PIE support
x86/pie: Force hidden visibility for all symbol references
x86/boot/compressed: Adapt sed command to generate voffset.h when PIE
is enabled
x86/pie: Add .data.rel.* sections into link script
KVM: x86: Adapt assembly for PIE support
x86/PVH: Adapt PVH booting for PIE support
x86/bpf: Adapt BPF_CALL JIT codegen for PIE support
x86/modules: Adapt module loading for PIE support
x86/boot/64: Use data relocation to get absloute address when PIE is
enabled
objtool: Add validation for x86 PIE support
objtool: Adapt indirect call of __fentry__() for PIE support
x86/pie: Build the kernel as PIE
x86/vsyscall: Don't use set_fixmap() to map vsyscall page
x86/xen: Pin up to VSYSCALL_ADDR when vsyscall page is out of fixmap
area
x86/fixmap: Move vsyscall page out of fixmap area
x86/fixmap: Unify FIXADDR_TOP
x86/boot: Fill kernel image puds dynamically
x86/mm: Sort address_markers array when X86 PIE is enabled
x86/pie: Allow kernel image to be relocated in top 512G
x86/boot: Extend relocate range for PIE kernel image
Thomas Garnier (13):
x86/crypto: Adapt assembly for PIE support
x86: Add macro to get symbol address for PIE support
x86: relocate_kernel - Adapt assembly for PIE support
x86/entry/64: Adapt assembly for PIE support
x86: pm-trace: Adapt assembly for PIE support
x86/CPU: Adapt assembly for PIE support
x86/acpi: Adapt assembly for PIE support
x86/boot/64: Adapt assembly for PIE support
x86/power/64: Adapt assembly for PIE support
x86/alternatives: Adapt assembly for PIE support
x86/ftrace: Adapt ftrace nop patching for PIE support
x86/mm: Make the x86 GOT read-only
x86/relocs: Handle PIE relocations
Documentation/x86/x86_64/mm.rst | 4 +
arch/x86/Kconfig | 36 +++++-
arch/x86/Makefile | 33 +++--
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/kaslr.c | 55 +++++++++
arch/x86/boot/compressed/misc.c | 4 +-
arch/x86/boot/compressed/misc.h | 9 ++
arch/x86/crypto/aegis128-aesni-asm.S | 6 +-
arch/x86/crypto/aesni-intel_asm.S | 2 +-
arch/x86/crypto/aesni-intel_avx-x86_64.S | 3 +-
arch/x86/crypto/aria-aesni-avx-asm_64.S | 30 ++---
arch/x86/crypto/camellia-aesni-avx-asm_64.S | 30 ++---
arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 30 ++---
arch/x86/crypto/camellia-x86_64-asm_64.S | 8 +-
arch/x86/crypto/cast5-avx-x86_64-asm_64.S | 50 ++++----
arch/x86/crypto/cast6-avx-x86_64-asm_64.S | 44 ++++---
arch/x86/crypto/crc32c-pcl-intel-asm_64.S | 3 +-
arch/x86/crypto/des3_ede-asm_64.S | 96 ++++++++++-----
arch/x86/crypto/ghash-clmulni-intel_asm.S | 4 +-
arch/x86/crypto/sha256-avx2-asm.S | 18 ++-
arch/x86/entry/calling.h | 17 ++-
arch/x86/entry/entry_64.S | 22 +++-
arch/x86/entry/vdso/Makefile | 2 +-
arch/x86/entry/vsyscall/vsyscall_64.c | 7 +-
arch/x86/include/asm/alternative.h | 6 +-
arch/x86/include/asm/asm.h | 1 +
arch/x86/include/asm/fixmap.h | 28 +----
arch/x86/include/asm/irq_stack.h | 2 +-
arch/x86/include/asm/kmsan.h | 6 +-
arch/x86/include/asm/nospec-branch.h | 10 +-
arch/x86/include/asm/page_64.h | 8 +-
arch/x86/include/asm/page_64_types.h | 8 ++
arch/x86/include/asm/paravirt.h | 17 ++-
arch/x86/include/asm/paravirt_types.h | 12 +-
arch/x86/include/asm/percpu.h | 29 ++++-
arch/x86/include/asm/pgtable_64_types.h | 10 +-
arch/x86/include/asm/pm-trace.h | 2 +-
arch/x86/include/asm/processor.h | 17 ++-
arch/x86/include/asm/sections.h | 5 +
arch/x86/include/asm/stackprotector.h | 16 ++-
arch/x86/include/asm/sync_core.h | 6 +-
arch/x86/include/asm/vsyscall.h | 13 ++
arch/x86/kernel/acpi/wakeup_64.S | 31 ++---
arch/x86/kernel/alternative.c | 8 +-
arch/x86/kernel/asm-offsets_64.c | 2 +-
arch/x86/kernel/callthunks.c | 2 +-
arch/x86/kernel/cpu/common.c | 15 ++-
arch/x86/kernel/ftrace.c | 46 ++++++-
arch/x86/kernel/ftrace_64.S | 9 +-
arch/x86/kernel/head64.c | 77 +++++++++---
arch/x86/kernel/head_64.S | 68 ++++++++---
arch/x86/kernel/kvm.c | 21 +++-
arch/x86/kernel/module.c | 27 +++++
arch/x86/kernel/paravirt.c | 4 +
arch/x86/kernel/relocate_kernel_64.S | 2 +-
arch/x86/kernel/rethook.c | 8 ++
arch/x86/kernel/setup.c | 6 +
arch/x86/kernel/vmlinux.lds.S | 10 +-
arch/x86/kvm/svm/vmenter.S | 10 +-
arch/x86/kvm/vmx/vmenter.S | 2 +-
arch/x86/lib/cmpxchg16b_emu.S | 8 +-
arch/x86/mm/dump_pagetables.c | 36 +++++-
arch/x86/mm/fault.c | 1 -
arch/x86/mm/init_64.c | 10 +-
arch/x86/mm/ioremap.c | 5 +-
arch/x86/mm/kasan_init_64.c | 4 +-
arch/x86/mm/pat/set_memory.c | 2 +-
arch/x86/mm/pgtable.c | 13 ++
arch/x86/mm/pgtable_32.c | 3 -
arch/x86/mm/physaddr.c | 14 +--
arch/x86/net/bpf_jit_comp.c | 17 ++-
arch/x86/platform/efi/efi_thunk_64.S | 4 +
arch/x86/platform/pvh/head.S | 29 ++++-
arch/x86/power/hibernate_asm_64.S | 4 +-
arch/x86/tools/Makefile | 4 +-
arch/x86/tools/relocs.c | 113 ++++++++++++++++-
arch/x86/xen/mmu_pv.c | 32 +++--
arch/x86/xen/xen-asm.S | 10 +-
arch/x86/xen/xen-head.S | 14 ++-
include/asm-generic/vmlinux.lds.h | 12 ++
scripts/Makefile.lib | 1 +
scripts/recordmcount.c | 81 ++++++++-----
tools/objtool/arch/x86/decode.c | 10 +-
tools/objtool/builtin-check.c | 4 +-
tools/objtool/check.c | 121 +++++++++++++++++++
tools/objtool/include/objtool/builtin.h | 1 +
86 files changed, 1202 insertions(+), 410 deletions(-)
Patchset is based on tip/master.
base-commit: 01cbd032298654fe4c85e153dd9a224e5bc10194
--
2.31.1
next reply other threads:[~2023-04-28 9:51 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-28 9:50 Hou Wenlong [this message]
2023-04-28 9:50 ` [PATCH RFC 01/43] x86/crypto: Adapt assembly for PIE support Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 02/43] x86: Add macro to get symbol address " Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 03/43] x86: relocate_kernel - Adapt assembly " Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 04/43] x86/entry/64: " Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 05/43] x86: pm-trace: " Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 06/43] x86/CPU: " Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 07/43] x86/acpi: " Hou Wenlong
2023-04-28 11:32 ` Rafael J. Wysocki
2023-04-28 9:50 ` [PATCH RFC 08/43] x86/boot/64: " Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 09/43] x86/power/64: " Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 10/43] x86/alternatives: " Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 11/43] x86/irq: " Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 12/43] x86,rethook: " Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 13/43] x86/paravirt: Use relative reference for original instruction Hou Wenlong
2023-06-01 9:29 ` Juergen Gross
2023-06-05 6:40 ` Nadav Amit
2023-06-06 11:35 ` Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 14/43] x86/Kconfig: Introduce new Kconfig for PIE kernel building Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 15/43] x86/PVH: Use fixed_percpu_data to set up GS base Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 16/43] x86-64: Use per-cpu stack canary if supported by compiler Hou Wenlong
2023-05-01 17:27 ` Nick Desaulniers
2023-05-05 6:14 ` Hou Wenlong
2023-05-05 18:02 ` Nick Desaulniers
2023-05-05 19:06 ` Fangrui Song
2023-05-08 8:06 ` Hou Wenlong
2023-05-04 10:31 ` Juergen Gross
2023-05-05 3:09 ` Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 17/43] x86/pie: Enable stack protector only if per-cpu stack canary is supported Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 18/43] x86/percpu: Use PC-relative addressing for percpu variable references Hou Wenlong
2023-04-28 9:50 ` [PATCH RFC 19/43] x86/tools: Explicitly include autoconf.h for hostprogs Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 20/43] x86/percpu: Adapt percpu references relocation for PIE support Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 21/43] x86/ftrace: Adapt assembly " Hou Wenlong
2023-04-28 13:37 ` Steven Rostedt
2023-04-29 3:43 ` Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 22/43] x86/ftrace: Adapt ftrace nop patching " Hou Wenlong
2023-04-28 13:44 ` Steven Rostedt
2023-04-29 3:38 ` Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 23/43] x86/pie: Force hidden visibility for all symbol references Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 24/43] x86/boot/compressed: Adapt sed command to generate voffset.h when PIE is enabled Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 25/43] x86/mm: Make the x86 GOT read-only Hou Wenlong
2023-04-30 14:23 ` Ard Biesheuvel
2023-05-08 11:40 ` Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 26/43] x86/pie: Add .data.rel.* sections into link script Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 27/43] x86/relocs: Handle PIE relocations Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 28/43] KVM: x86: Adapt assembly for PIE support Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 29/43] x86/PVH: Adapt PVH booting " Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 30/43] x86/bpf: Adapt BPF_CALL JIT codegen " Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 31/43] x86/modules: Adapt module loading " Hou Wenlong
2023-04-28 19:29 ` Ard Biesheuvel
2023-05-08 8:32 ` Hou Wenlong
2023-05-08 9:16 ` Ard Biesheuvel
2023-05-08 11:40 ` Hou Wenlong
2023-05-08 17:47 ` Ard Biesheuvel
2023-05-09 9:42 ` Hou Wenlong
2023-05-09 9:52 ` Ard Biesheuvel
2023-05-09 12:35 ` Hou Wenlong
2023-05-10 7:09 ` Hou Wenlong
2023-05-10 8:15 ` Ard Biesheuvel
2023-04-28 9:51 ` [PATCH RFC 32/43] x86/boot/64: Use data relocation to get absloute address when PIE is enabled Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 33/43] objtool: Add validation for x86 PIE support Hou Wenlong
2023-04-28 10:28 ` Christophe Leroy
2023-04-28 11:43 ` Peter Zijlstra
2023-04-29 4:04 ` Hou Wenlong
2023-04-29 3:52 ` Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 34/43] objtool: Adapt indirect call of __fentry__() for " Hou Wenlong
2023-04-28 15:18 ` Peter Zijlstra
2023-04-28 9:51 ` [PATCH RFC 35/43] x86/pie: Build the kernel as PIE Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 36/43] x86/vsyscall: Don't use set_fixmap() to map vsyscall page Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 37/43] x86/xen: Pin up to VSYSCALL_ADDR when vsyscall page is out of fixmap area Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 38/43] x86/fixmap: Move vsyscall page " Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 39/43] x86/fixmap: Unify FIXADDR_TOP Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 40/43] x86/boot: Fill kernel image puds dynamically Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 41/43] x86/mm: Sort address_markers array when X86 PIE is enabled Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 42/43] x86/pie: Allow kernel image to be relocated in top 512G Hou Wenlong
2023-04-28 9:51 ` [PATCH RFC 43/43] x86/boot: Extend relocate range for PIE kernel image Hou Wenlong
2023-04-28 15:22 ` [PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible Peter Zijlstra
2023-05-06 7:19 ` Hou Wenlong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1682673542.git.houwenlong.hwl@antgroup.com \
--to=houwenlong.hwl@antgroup.com \
--cc=bpf@vger.kernel.org \
--cc=jiangshan.ljs@antgroup.com \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=llvm@lists.linux.dev \
--cc=nathan@kernel.org \
--cc=ndesaulniers@google.com \
--cc=thgarnie@chromium.org \
--cc=trix@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).