* x86: PIE support and option to extend KASLR randomization
@ 2017-10-04 21:19 ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

These patches make the changes necessary to build the kernel as a Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space, which allows optionally extending the
KASLR randomization range from 1G to 3G.

Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler
changes, PIE support and KASLR in general. Thanks to Roland McGrath for his
feedback on using -pie versus --emit-relocs and for the details on compiler
code generation.

The patches:
 - 1-3, 5-13, 17-18: Changes to assembly code to make it PIE compliant.
 - 4: Add a new _ASM_GET_PTR macro to fetch a symbol address generically.
 - 14: Adapt percpu design to work correctly when PIE is enabled.
 - 15: Provide an option to default symbol visibility to hidden except for key
       symbols. It removes errors between compilation units (see the sketch
       after this list).
 - 16: Adapt relocation tool to handle PIE binary correctly.
 - 19: Add support for a global stack cookie.
 - 20: Support ftrace with PIE (used in the Ubuntu config).
 - 21: Fix incorrect address marker on dump_pagetables.
 - 22: Add option to move the module section just after the kernel.
 - 23: Adapt module loading to support PIE with dynamic GOT.
 - 24: Make the GOT read-only.
 - 25: Add the CONFIG_X86_PIE option (off by default).
 - 26: Adapt relocation tool to generate a 64-bit relocation table.
 - 27: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
       from 1G to 3G (off by default).
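
For reference, here is a rough sketch of what symbol visibility changes in the
code generated for a PIE build (a generic example, not code from the patches;
some_var is a placeholder symbol):

	/* hidden (or local) symbol: direct RIP-relative access */
	movq	some_var(%rip), %rax

	/* default visibility in a PIE: the address is loaded from the GOT */
	movq	some_var@GOTPCREL(%rip), %rax
	movq	(%rax), %rax

Defaulting to hidden visibility (patch 15) keeps cross-unit references in the
first form; the GOT itself is the subject of patches 23-24.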

Performance/Size impact:

Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.000031%
 - PIE enabled: -3.210% (fewer relocations)
 .text section:
 - PIE disabled: +0.000644%
 - PIE enabled: +0.837%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: -0.201%
 - PIE enabled: -0.082%
 .text section:
 - PIE disabled: same
 - PIE enabled: +1.319%

Size of vmlinux (Default configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +0.814%

Size of vmlinux (Ubuntu configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +1.26%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. A bug [1] was opened against gcc to get better code
generation for kernel PIE.
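
As a rough illustration of where the extra .text comes from (a generic sketch,
not taken from the patches; table is a placeholder symbol): with mcmodel=kernel
an indexed table access can encode the symbol as a 32-bit signed absolute
displacement, while PIE has to build the address RIP-relatively first, costing
an extra instruction per site:

	/* mcmodel=kernel: 32-bit signed absolute displacement */
	movl	table(,%rcx,4), %eax

	/* PIE: RIP-relative base, then indexed access */
	leaq	table(%rip), %rdx
	movl	(%rdx,%rcx,4), %eax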

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg +0.1% on latest test).
 - PIE enabled: between -0.50% and +0.86% on average (default and Ubuntu config).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-2% on latest run, likely noise).
 - PIE enabled: between -1% and +0.8% on latest runs.

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.239%)
 - PIE enabled: average +0.07%
 System Time:
 - PIE disabled: no significant change (avg -0.277%)
 - PIE enabled: average +0.7%

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

diffstat:
 Documentation/x86/x86_64/mm.txt              |    3 
 arch/x86/Kconfig                             |   37 ++++
 arch/x86/Makefile                            |   14 +
 arch/x86/boot/boot.h                         |    2 
 arch/x86/boot/compressed/Makefile            |    5 
 arch/x86/boot/compressed/misc.c              |   10 +
 arch/x86/crypto/aes-x86_64-asm_64.S          |   45 +++--
 arch/x86/crypto/aesni-intel_asm.S            |   14 +
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |    6 
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |   42 ++---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |   44 ++---
 arch/x86/crypto/camellia-x86_64-asm_64.S     |    8 -
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    |   50 +++---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    |   44 +++--
 arch/x86/crypto/des3_ede-asm_64.S            |   96 ++++++++----
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |    4 
 arch/x86/crypto/glue_helper-asm-avx.S        |    4 
 arch/x86/crypto/glue_helper-asm-avx2.S       |    6 
 arch/x86/entry/entry_32.S                    |    3 
 arch/x86/entry/entry_64.S                    |   29 ++-
 arch/x86/include/asm/asm.h                   |   13 +
 arch/x86/include/asm/bug.h                   |    2 
 arch/x86/include/asm/ftrace.h                |   23 ++-
 arch/x86/include/asm/jump_label.h            |    8 -
 arch/x86/include/asm/kvm_host.h              |    6 
 arch/x86/include/asm/module.h                |   14 +
 arch/x86/include/asm/page_64_types.h         |    9 +
 arch/x86/include/asm/paravirt_types.h        |   12 +
 arch/x86/include/asm/percpu.h                |   25 ++-
 arch/x86/include/asm/pgtable_64_types.h      |    6 
 arch/x86/include/asm/pm-trace.h              |    2 
 arch/x86/include/asm/processor.h             |   12 +
 arch/x86/include/asm/sections.h              |    4 
 arch/x86/include/asm/setup.h                 |    2 
 arch/x86/include/asm/stackprotector.h        |   19 +-
 arch/x86/kernel/acpi/wakeup_64.S             |   31 ++--
 arch/x86/kernel/asm-offsets.c                |    3 
 arch/x86/kernel/asm-offsets_32.c             |    3 
 arch/x86/kernel/asm-offsets_64.c             |    3 
 arch/x86/kernel/cpu/common.c                 |    7 
 arch/x86/kernel/cpu/microcode/core.c         |    4 
 arch/x86/kernel/ftrace.c                     |  168 ++++++++++++++--------
 arch/x86/kernel/head64.c                     |   32 +++-
 arch/x86/kernel/head_32.S                    |    3 
 arch/x86/kernel/head_64.S                    |   41 ++++-
 arch/x86/kernel/kvm.c                        |    6 
 arch/x86/kernel/module.c                     |  204 ++++++++++++++++++++++++++-
 arch/x86/kernel/module.lds                   |    3 
 arch/x86/kernel/process.c                    |    5 
 arch/x86/kernel/relocate_kernel_64.S         |    8 -
 arch/x86/kernel/setup_percpu.c               |    2 
 arch/x86/kernel/vmlinux.lds.S                |   13 +
 arch/x86/kvm/svm.c                           |    4 
 arch/x86/lib/cmpxchg16b_emu.S                |    8 -
 arch/x86/mm/dump_pagetables.c                |   11 -
 arch/x86/power/hibernate_asm_64.S            |    4 
 arch/x86/tools/relocs.c                      |  170 ++++++++++++++++++++--
 arch/x86/tools/relocs.h                      |    4 
 arch/x86/tools/relocs_common.c               |   15 +
 arch/x86/xen/xen-asm.S                       |   12 -
 arch/x86/xen/xen-head.S                      |    9 -
 arch/x86/xen/xen-pvh.S                       |   13 +
 drivers/base/firmware_class.c                |    4 
 include/asm-generic/sections.h               |    6 
 include/asm-generic/vmlinux.lds.h            |   12 +
 include/linux/compiler.h                     |    8 +
 init/Kconfig                                 |    9 +
 kernel/kallsyms.c                            |   16 +-
 kernel/trace/trace.h                         |    4 
 lib/dynamic_debug.c                          |    4 
 70 files changed, 1109 insertions(+), 363 deletions(-)

* [RFC v3 01/27] x86/crypto: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols so that
the kernel can be built as PIE.

Position Independent Executable (PIE) support will allow extending the KASLR
randomization range below the -2G memory limit.
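
The transformation is mechanical; the two recurring patterns in the hunks below
are direct references gaining an explicit (%rip) suffix, and indexed table
lookups going through a scratch register first. Simplified example based on the
diff:

	/* before: absolute references */
	movaps	.Lbswap_mask, BSWAP_MASK
	xorq	s1(, RT3, 8), to

	/* after: RIP-relative, with a scratch register for the indexed case */
	movaps	.Lbswap_mask(%rip), BSWAP_MASK
	leaq	s1(%rip), RW1
	xorq	(RW1, RT3, 8), to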

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/crypto/aes-x86_64-asm_64.S          | 45 ++++++++-----
 arch/x86/crypto/aesni-intel_asm.S            | 14 ++--
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |  6 +-
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  | 42 ++++++------
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 44 ++++++-------
 arch/x86/crypto/camellia-x86_64-asm_64.S     |  8 ++-
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    | 50 ++++++++-------
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    | 44 +++++++------
 arch/x86/crypto/des3_ede-asm_64.S            | 96 ++++++++++++++++++----------
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |  4 +-
 arch/x86/crypto/glue_helper-asm-avx.S        |  4 +-
 arch/x86/crypto/glue_helper-asm-avx2.S       |  6 +-
 12 files changed, 211 insertions(+), 152 deletions(-)

diff --git a/arch/x86/crypto/aes-x86_64-asm_64.S b/arch/x86/crypto/aes-x86_64-asm_64.S
index 8739cf7795de..86fa068e5e81 100644
--- a/arch/x86/crypto/aes-x86_64-asm_64.S
+++ b/arch/x86/crypto/aes-x86_64-asm_64.S
@@ -48,8 +48,12 @@
 #define R10	%r10
 #define R11	%r11
 
+/* Hold global for PIE support */
+#define RBASE	%r12
+
 #define prologue(FUNC,KEY,B128,B192,r1,r2,r5,r6,r7,r8,r9,r10,r11) \
 	ENTRY(FUNC);			\
+	pushq	RBASE;			\
 	movq	r1,r2;			\
 	leaq	KEY+48(r8),r9;		\
 	movq	r10,r11;		\
@@ -74,54 +78,63 @@
 	movl	r6 ## E,4(r9);		\
 	movl	r7 ## E,8(r9);		\
 	movl	r8 ## E,12(r9);		\
+	popq	RBASE;			\
 	ret;				\
 	ENDPROC(FUNC);
 
+#define round_mov(tab_off, reg_i, reg_o) \
+	leaq	tab_off(%rip), RBASE; \
+	movl	(RBASE,reg_i,4), reg_o;
+
+#define round_xor(tab_off, reg_i, reg_o) \
+	leaq	tab_off(%rip), RBASE; \
+	xorl	(RBASE,reg_i,4), reg_o;
+
 #define round(TAB,OFFSET,r1,r2,r3,r4,r5,r6,r7,r8,ra,rb,rc,rd) \
 	movzbl	r2 ## H,r5 ## E;	\
 	movzbl	r2 ## L,r6 ## E;	\
-	movl	TAB+1024(,r5,4),r5 ## E;\
+	round_mov(TAB+1024, r5, r5 ## E)\
 	movw	r4 ## X,r2 ## X;	\
-	movl	TAB(,r6,4),r6 ## E;	\
+	round_mov(TAB, r6, r6 ## E)	\
 	roll	$16,r2 ## E;		\
 	shrl	$16,r4 ## E;		\
 	movzbl	r4 ## L,r7 ## E;	\
 	movzbl	r4 ## H,r4 ## E;	\
 	xorl	OFFSET(r8),ra ## E;	\
 	xorl	OFFSET+4(r8),rb ## E;	\
-	xorl	TAB+3072(,r4,4),r5 ## E;\
-	xorl	TAB+2048(,r7,4),r6 ## E;\
+	round_xor(TAB+3072, r4, r5 ## E)\
+	round_xor(TAB+2048, r7, r6 ## E)\
 	movzbl	r1 ## L,r7 ## E;	\
 	movzbl	r1 ## H,r4 ## E;	\
-	movl	TAB+1024(,r4,4),r4 ## E;\
+	round_mov(TAB+1024, r4, r4 ## E)\
 	movw	r3 ## X,r1 ## X;	\
 	roll	$16,r1 ## E;		\
 	shrl	$16,r3 ## E;		\
-	xorl	TAB(,r7,4),r5 ## E;	\
+	round_xor(TAB, r7, r5 ## E)	\
 	movzbl	r3 ## L,r7 ## E;	\
 	movzbl	r3 ## H,r3 ## E;	\
-	xorl	TAB+3072(,r3,4),r4 ## E;\
-	xorl	TAB+2048(,r7,4),r5 ## E;\
+	round_xor(TAB+3072, r3, r4 ## E)\
+	round_xor(TAB+2048, r7, r5 ## E)\
 	movzbl	r1 ## L,r7 ## E;	\
 	movzbl	r1 ## H,r3 ## E;	\
 	shrl	$16,r1 ## E;		\
-	xorl	TAB+3072(,r3,4),r6 ## E;\
-	movl	TAB+2048(,r7,4),r3 ## E;\
+	round_xor(TAB+3072, r3, r6 ## E)\
+	round_mov(TAB+2048, r7, r3 ## E)\
 	movzbl	r1 ## L,r7 ## E;	\
 	movzbl	r1 ## H,r1 ## E;	\
-	xorl	TAB+1024(,r1,4),r6 ## E;\
-	xorl	TAB(,r7,4),r3 ## E;	\
+	round_xor(TAB+1024, r1, r6 ## E)\
+	round_xor(TAB, r7, r3 ## E)	\
 	movzbl	r2 ## H,r1 ## E;	\
 	movzbl	r2 ## L,r7 ## E;	\
 	shrl	$16,r2 ## E;		\
-	xorl	TAB+3072(,r1,4),r3 ## E;\
-	xorl	TAB+2048(,r7,4),r4 ## E;\
+	round_xor(TAB+3072, r1, r3 ## E)\
+	round_xor(TAB+2048, r7, r4 ## E)\
 	movzbl	r2 ## H,r1 ## E;	\
 	movzbl	r2 ## L,r2 ## E;	\
 	xorl	OFFSET+8(r8),rc ## E;	\
 	xorl	OFFSET+12(r8),rd ## E;	\
-	xorl	TAB+1024(,r1,4),r3 ## E;\
-	xorl	TAB(,r2,4),r4 ## E;
+	round_xor(TAB+1024, r1, r3 ## E)\
+	round_xor(TAB, r2, r4 ## E)
 
 #define move_regs(r1,r2,r3,r4) \
 	movl	r3 ## E,r1 ## E;	\
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 16627fec80b2..5f73201dff32 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -325,7 +325,8 @@ _get_AAD_rest0\num_initial_blocks\operation:
 	vpshufb and an array of shuffle masks */
 	movq	   %r12, %r11
 	salq	   $4, %r11
-	movdqu	   aad_shift_arr(%r11), \TMP1
+	leaq	   aad_shift_arr(%rip), %rax
+	movdqu	   (%rax,%r11,), \TMP1
 	PSHUFB_XMM \TMP1, %xmm\i
 _get_AAD_rest_final\num_initial_blocks\operation:
 	PSHUFB_XMM   %xmm14, %xmm\i # byte-reflect the AAD data
@@ -584,7 +585,8 @@ _get_AAD_rest0\num_initial_blocks\operation:
 	vpshufb and an array of shuffle masks */
 	movq	   %r12, %r11
 	salq	   $4, %r11
-	movdqu	   aad_shift_arr(%r11), \TMP1
+	leaq	   aad_shift_arr(%rip), %rax
+	movdqu	   (%rax,%r11,), \TMP1
 	PSHUFB_XMM \TMP1, %xmm\i
 _get_AAD_rest_final\num_initial_blocks\operation:
 	PSHUFB_XMM   %xmm14, %xmm\i # byte-reflect the AAD data
@@ -2722,7 +2724,7 @@ ENDPROC(aesni_cbc_dec)
  */
 .align 4
 _aesni_inc_init:
-	movaps .Lbswap_mask, BSWAP_MASK
+	movaps .Lbswap_mask(%rip), BSWAP_MASK
 	movaps IV, CTR
 	PSHUFB_XMM BSWAP_MASK CTR
 	mov $1, TCTR_LOW
@@ -2850,12 +2852,12 @@ ENTRY(aesni_xts_crypt8)
 	cmpb $0, %cl
 	movl $0, %ecx
 	movl $240, %r10d
-	leaq _aesni_enc4, %r11
-	leaq _aesni_dec4, %rax
+	leaq _aesni_enc4(%rip), %r11
+	leaq _aesni_dec4(%rip), %rax
 	cmovel %r10d, %ecx
 	cmoveq %rax, %r11
 
-	movdqa .Lgf128mul_x_ble_mask, GF128MUL_MASK
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
 	movups (IVP), IV
 
 	mov 480(KEYP), KLEN
diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index faecb1518bf8..488605b19fe8 100644
--- a/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -454,7 +454,8 @@ _get_AAD_rest0\@:
 	vpshufb and an array of shuffle masks */
 	movq    %r12, %r11
 	salq    $4, %r11
-	movdqu  aad_shift_arr(%r11), \T1
+	leaq	aad_shift_arr(%rip), %rax
+	movdqu  (%rax,%r11,), \T1
 	vpshufb \T1, reg_i, reg_i
 _get_AAD_rest_final\@:
 	vpshufb SHUF_MASK(%rip), reg_i, reg_i
@@ -1761,7 +1762,8 @@ _get_AAD_rest0\@:
 	vpshufb and an array of shuffle masks */
 	movq    %r12, %r11
 	salq    $4, %r11
-	movdqu  aad_shift_arr(%r11), \T1
+	leaq	aad_shift_arr(%rip), %rax
+	movdqu  (%rax,%r11,), \T1
 	vpshufb \T1, reg_i, reg_i
 _get_AAD_rest_final\@:
 	vpshufb SHUF_MASK(%rip), reg_i, reg_i
diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index f7c495e2863c..46feaea52632 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -52,10 +52,10 @@
 	/* \
 	 * S-function with AES subbytes \
 	 */ \
-	vmovdqa .Linv_shift_row, t4; \
-	vbroadcastss .L0f0f0f0f, t7; \
-	vmovdqa .Lpre_tf_lo_s1, t0; \
-	vmovdqa .Lpre_tf_hi_s1, t1; \
+	vmovdqa .Linv_shift_row(%rip), t4; \
+	vbroadcastss .L0f0f0f0f(%rip), t7; \
+	vmovdqa .Lpre_tf_lo_s1(%rip), t0; \
+	vmovdqa .Lpre_tf_hi_s1(%rip), t1; \
 	\
 	/* AES inverse shift rows */ \
 	vpshufb t4, x0, x0; \
@@ -68,8 +68,8 @@
 	vpshufb t4, x6, x6; \
 	\
 	/* prefilter sboxes 1, 2 and 3 */ \
-	vmovdqa .Lpre_tf_lo_s4, t2; \
-	vmovdqa .Lpre_tf_hi_s4, t3; \
+	vmovdqa .Lpre_tf_lo_s4(%rip), t2; \
+	vmovdqa .Lpre_tf_hi_s4(%rip), t3; \
 	filter_8bit(x0, t0, t1, t7, t6); \
 	filter_8bit(x7, t0, t1, t7, t6); \
 	filter_8bit(x1, t0, t1, t7, t6); \
@@ -83,8 +83,8 @@
 	filter_8bit(x6, t2, t3, t7, t6); \
 	\
 	/* AES subbytes + AES shift rows */ \
-	vmovdqa .Lpost_tf_lo_s1, t0; \
-	vmovdqa .Lpost_tf_hi_s1, t1; \
+	vmovdqa .Lpost_tf_lo_s1(%rip), t0; \
+	vmovdqa .Lpost_tf_hi_s1(%rip), t1; \
 	vaesenclast t4, x0, x0; \
 	vaesenclast t4, x7, x7; \
 	vaesenclast t4, x1, x1; \
@@ -95,16 +95,16 @@
 	vaesenclast t4, x6, x6; \
 	\
 	/* postfilter sboxes 1 and 4 */ \
-	vmovdqa .Lpost_tf_lo_s3, t2; \
-	vmovdqa .Lpost_tf_hi_s3, t3; \
+	vmovdqa .Lpost_tf_lo_s3(%rip), t2; \
+	vmovdqa .Lpost_tf_hi_s3(%rip), t3; \
 	filter_8bit(x0, t0, t1, t7, t6); \
 	filter_8bit(x7, t0, t1, t7, t6); \
 	filter_8bit(x3, t0, t1, t7, t6); \
 	filter_8bit(x6, t0, t1, t7, t6); \
 	\
 	/* postfilter sbox 3 */ \
-	vmovdqa .Lpost_tf_lo_s2, t4; \
-	vmovdqa .Lpost_tf_hi_s2, t5; \
+	vmovdqa .Lpost_tf_lo_s2(%rip), t4; \
+	vmovdqa .Lpost_tf_hi_s2(%rip), t5; \
 	filter_8bit(x2, t2, t3, t7, t6); \
 	filter_8bit(x5, t2, t3, t7, t6); \
 	\
@@ -443,7 +443,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	transpose_4x4(c0, c1, c2, c3, a0, a1); \
 	transpose_4x4(d0, d1, d2, d3, a0, a1); \
 	\
-	vmovdqu .Lshufb_16x16b, a0; \
+	vmovdqu .Lshufb_16x16b(%rip), a0; \
 	vmovdqu st1, a1; \
 	vpshufb a0, a2, a2; \
 	vpshufb a0, a3, a3; \
@@ -482,7 +482,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 #define inpack16_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \
 		     y6, y7, rio, key) \
 	vmovq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor 0 * 16(rio), x0, y7; \
 	vpxor 1 * 16(rio), x0, y6; \
@@ -533,7 +533,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	vmovdqu x0, stack_tmp0; \
 	\
 	vmovq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor x0, y7, y7; \
 	vpxor x0, y6, y6; \
@@ -1016,7 +1016,7 @@ ENTRY(camellia_ctr_16way)
 	subq $(16 * 16), %rsp;
 	movq %rsp, %rax;
 
-	vmovdqa .Lbswap128_mask, %xmm14;
+	vmovdqa .Lbswap128_mask(%rip), %xmm14;
 
 	/* load IV and byteswap */
 	vmovdqu (%rcx), %xmm0;
@@ -1065,7 +1065,7 @@ ENTRY(camellia_ctr_16way)
 
 	/* inpack16_pre: */
 	vmovq (key_table)(CTX), %xmm15;
-	vpshufb .Lpack_bswap, %xmm15, %xmm15;
+	vpshufb .Lpack_bswap(%rip), %xmm15, %xmm15;
 	vpxor %xmm0, %xmm15, %xmm0;
 	vpxor %xmm1, %xmm15, %xmm1;
 	vpxor %xmm2, %xmm15, %xmm2;
@@ -1133,7 +1133,7 @@ camellia_xts_crypt_16way:
 	subq $(16 * 16), %rsp;
 	movq %rsp, %rax;
 
-	vmovdqa .Lxts_gf128mul_and_shl1_mask, %xmm14;
+	vmovdqa .Lxts_gf128mul_and_shl1_mask(%rip), %xmm14;
 
 	/* load IV */
 	vmovdqu (%rcx), %xmm0;
@@ -1209,7 +1209,7 @@ camellia_xts_crypt_16way:
 
 	/* inpack16_pre: */
 	vmovq (key_table)(CTX, %r8, 8), %xmm15;
-	vpshufb .Lpack_bswap, %xmm15, %xmm15;
+	vpshufb .Lpack_bswap(%rip), %xmm15, %xmm15;
 	vpxor 0 * 16(%rax), %xmm15, %xmm0;
 	vpxor %xmm1, %xmm15, %xmm1;
 	vpxor %xmm2, %xmm15, %xmm2;
@@ -1264,7 +1264,7 @@ ENTRY(camellia_xts_enc_16way)
 	 */
 	xorl %r8d, %r8d; /* input whitening key, 0 for enc */
 
-	leaq __camellia_enc_blk16, %r9;
+	leaq __camellia_enc_blk16(%rip), %r9;
 
 	jmp camellia_xts_crypt_16way;
 ENDPROC(camellia_xts_enc_16way)
@@ -1282,7 +1282,7 @@ ENTRY(camellia_xts_dec_16way)
 	movl $24, %eax;
 	cmovel %eax, %r8d;  /* input whitening key, last for dec */
 
-	leaq __camellia_dec_blk16, %r9;
+	leaq __camellia_dec_blk16(%rip), %r9;
 
 	jmp camellia_xts_crypt_16way;
 ENDPROC(camellia_xts_dec_16way)
diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index eee5b3982cfd..93da327fec83 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -69,12 +69,12 @@
 	/* \
 	 * S-function with AES subbytes \
 	 */ \
-	vbroadcasti128 .Linv_shift_row, t4; \
-	vpbroadcastd .L0f0f0f0f, t7; \
-	vbroadcasti128 .Lpre_tf_lo_s1, t5; \
-	vbroadcasti128 .Lpre_tf_hi_s1, t6; \
-	vbroadcasti128 .Lpre_tf_lo_s4, t2; \
-	vbroadcasti128 .Lpre_tf_hi_s4, t3; \
+	vbroadcasti128 .Linv_shift_row(%rip), t4; \
+	vpbroadcastd .L0f0f0f0f(%rip), t7; \
+	vbroadcasti128 .Lpre_tf_lo_s1(%rip), t5; \
+	vbroadcasti128 .Lpre_tf_hi_s1(%rip), t6; \
+	vbroadcasti128 .Lpre_tf_lo_s4(%rip), t2; \
+	vbroadcasti128 .Lpre_tf_hi_s4(%rip), t3; \
 	\
 	/* AES inverse shift rows */ \
 	vpshufb t4, x0, x0; \
@@ -120,8 +120,8 @@
 	vinserti128 $1, t2##_x, x6, x6; \
 	vextracti128 $1, x1, t3##_x; \
 	vextracti128 $1, x4, t2##_x; \
-	vbroadcasti128 .Lpost_tf_lo_s1, t0; \
-	vbroadcasti128 .Lpost_tf_hi_s1, t1; \
+	vbroadcasti128 .Lpost_tf_lo_s1(%rip), t0; \
+	vbroadcasti128 .Lpost_tf_hi_s1(%rip), t1; \
 	vaesenclast t4##_x, x2##_x, x2##_x; \
 	vaesenclast t4##_x, t6##_x, t6##_x; \
 	vinserti128 $1, t6##_x, x2, x2; \
@@ -136,16 +136,16 @@
 	vinserti128 $1, t2##_x, x4, x4; \
 	\
 	/* postfilter sboxes 1 and 4 */ \
-	vbroadcasti128 .Lpost_tf_lo_s3, t2; \
-	vbroadcasti128 .Lpost_tf_hi_s3, t3; \
+	vbroadcasti128 .Lpost_tf_lo_s3(%rip), t2; \
+	vbroadcasti128 .Lpost_tf_hi_s3(%rip), t3; \
 	filter_8bit(x0, t0, t1, t7, t6); \
 	filter_8bit(x7, t0, t1, t7, t6); \
 	filter_8bit(x3, t0, t1, t7, t6); \
 	filter_8bit(x6, t0, t1, t7, t6); \
 	\
 	/* postfilter sbox 3 */ \
-	vbroadcasti128 .Lpost_tf_lo_s2, t4; \
-	vbroadcasti128 .Lpost_tf_hi_s2, t5; \
+	vbroadcasti128 .Lpost_tf_lo_s2(%rip), t4; \
+	vbroadcasti128 .Lpost_tf_hi_s2(%rip), t5; \
 	filter_8bit(x2, t2, t3, t7, t6); \
 	filter_8bit(x5, t2, t3, t7, t6); \
 	\
@@ -482,7 +482,7 @@ ENDPROC(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	transpose_4x4(c0, c1, c2, c3, a0, a1); \
 	transpose_4x4(d0, d1, d2, d3, a0, a1); \
 	\
-	vbroadcasti128 .Lshufb_16x16b, a0; \
+	vbroadcasti128 .Lshufb_16x16b(%rip), a0; \
 	vmovdqu st1, a1; \
 	vpshufb a0, a2, a2; \
 	vpshufb a0, a3, a3; \
@@ -521,7 +521,7 @@ ENDPROC(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 #define inpack32_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \
 		     y6, y7, rio, key) \
 	vpbroadcastq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor 0 * 32(rio), x0, y7; \
 	vpxor 1 * 32(rio), x0, y6; \
@@ -572,7 +572,7 @@ ENDPROC(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	vmovdqu x0, stack_tmp0; \
 	\
 	vpbroadcastq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor x0, y7, y7; \
 	vpxor x0, y6, y6; \
@@ -1112,7 +1112,7 @@ ENTRY(camellia_ctr_32way)
 	vmovdqu (%rcx), %xmm0;
 	vmovdqa %xmm0, %xmm1;
 	inc_le128(%xmm0, %xmm15, %xmm14);
-	vbroadcasti128 .Lbswap128_mask, %ymm14;
+	vbroadcasti128 .Lbswap128_mask(%rip), %ymm14;
 	vinserti128 $1, %xmm0, %ymm1, %ymm0;
 	vpshufb %ymm14, %ymm0, %ymm13;
 	vmovdqu %ymm13, 15 * 32(%rax);
@@ -1158,7 +1158,7 @@ ENTRY(camellia_ctr_32way)
 
 	/* inpack32_pre: */
 	vpbroadcastq (key_table)(CTX), %ymm15;
-	vpshufb .Lpack_bswap, %ymm15, %ymm15;
+	vpshufb .Lpack_bswap(%rip), %ymm15, %ymm15;
 	vpxor %ymm0, %ymm15, %ymm0;
 	vpxor %ymm1, %ymm15, %ymm1;
 	vpxor %ymm2, %ymm15, %ymm2;
@@ -1242,13 +1242,13 @@ camellia_xts_crypt_32way:
 	subq $(16 * 32), %rsp;
 	movq %rsp, %rax;
 
-	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_0, %ymm12;
+	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_0(%rip), %ymm12;
 
 	/* load IV and construct second IV */
 	vmovdqu (%rcx), %xmm0;
 	vmovdqa %xmm0, %xmm15;
 	gf128mul_x_ble(%xmm0, %xmm12, %xmm13);
-	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_1, %ymm13;
+	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_1(%rip), %ymm13;
 	vinserti128 $1, %xmm0, %ymm15, %ymm0;
 	vpxor 0 * 32(%rdx), %ymm0, %ymm15;
 	vmovdqu %ymm15, 15 * 32(%rax);
@@ -1325,7 +1325,7 @@ camellia_xts_crypt_32way:
 
 	/* inpack32_pre: */
 	vpbroadcastq (key_table)(CTX, %r8, 8), %ymm15;
-	vpshufb .Lpack_bswap, %ymm15, %ymm15;
+	vpshufb .Lpack_bswap(%rip), %ymm15, %ymm15;
 	vpxor 0 * 32(%rax), %ymm15, %ymm0;
 	vpxor %ymm1, %ymm15, %ymm1;
 	vpxor %ymm2, %ymm15, %ymm2;
@@ -1383,7 +1383,7 @@ ENTRY(camellia_xts_enc_32way)
 
 	xorl %r8d, %r8d; /* input whitening key, 0 for enc */
 
-	leaq __camellia_enc_blk32, %r9;
+	leaq __camellia_enc_blk32(%rip), %r9;
 
 	jmp camellia_xts_crypt_32way;
 ENDPROC(camellia_xts_enc_32way)
@@ -1401,7 +1401,7 @@ ENTRY(camellia_xts_dec_32way)
 	movl $24, %eax;
 	cmovel %eax, %r8d;  /* input whitening key, last for dec */
 
-	leaq __camellia_dec_blk32, %r9;
+	leaq __camellia_dec_blk32(%rip), %r9;
 
 	jmp camellia_xts_crypt_32way;
 ENDPROC(camellia_xts_dec_32way)
diff --git a/arch/x86/crypto/camellia-x86_64-asm_64.S b/arch/x86/crypto/camellia-x86_64-asm_64.S
index 95ba6956a7f6..ef1137406959 100644
--- a/arch/x86/crypto/camellia-x86_64-asm_64.S
+++ b/arch/x86/crypto/camellia-x86_64-asm_64.S
@@ -92,11 +92,13 @@
 #define RXORbl %r9b
 
 #define xor2ror16(T0, T1, tmp1, tmp2, ab, dst) \
+	leaq T0(%rip), 			tmp1; \
 	movzbl ab ## bl,		tmp2 ## d; \
+	xorq (tmp1, tmp2, 8),		dst; \
+	leaq T1(%rip), 			tmp2; \
 	movzbl ab ## bh,		tmp1 ## d; \
-	rorq $16,			ab; \
-	xorq T0(, tmp2, 8),		dst; \
-	xorq T1(, tmp1, 8),		dst;
+	xorq (tmp2, tmp1, 8),		dst; \
+	rorq $16,			ab;
 
 /**********************************************************************
   1-way camellia
diff --git a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
index 86107c961bb4..64eb5c87d04a 100644
--- a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
@@ -98,16 +98,20 @@
 
 
 #define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	shrq $16,	src;                     \
-	movl		s1(, RID1, 4), dst ## d; \
-	op1		s2(, RID2, 4), dst ## d; \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	interleave_op(il_reg);			 \
-	op2		s3(, RID1, 4), dst ## d; \
-	op3		s4(, RID2, 4), dst ## d;
+	movzbl		src ## bh,       RID1d;    \
+	leaq		s1(%rip),        RID2;     \
+	movl		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,       RID2d;    \
+	leaq		s2(%rip),        RID1;     \
+	op1		(RID1, RID2, 4), dst ## d; \
+	shrq $16,	src;                       \
+	movzbl		src ## bh,     RID1d;      \
+	leaq		s3(%rip),        RID2;     \
+	op2		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,     RID2d;      \
+	leaq		s4(%rip),        RID1;     \
+	op3		(RID1, RID2, 4), dst ## d; \
+	interleave_op(il_reg);
 
 #define dummy(d) /* do nothing */
 
@@ -166,15 +170,15 @@
 	subround(l ## 3, r ## 3, l ## 4, r ## 4, f);
 
 #define enc_preload_rkr() \
-	vbroadcastss	.L16_mask,                RKR;      \
+	vbroadcastss	.L16_mask(%rip),          RKR;      \
 	/* add 16-bit rotation to key rotations (mod 32) */ \
 	vpxor		kr(CTX),                  RKR, RKR;
 
 #define dec_preload_rkr() \
-	vbroadcastss	.L16_mask,                RKR;      \
+	vbroadcastss	.L16_mask(%rip),          RKR;      \
 	/* add 16-bit rotation to key rotations (mod 32) */ \
 	vpxor		kr(CTX),                  RKR, RKR; \
-	vpshufb		.Lbswap128_mask,          RKR, RKR;
+	vpshufb		.Lbswap128_mask(%rip),    RKR, RKR;
 
 #define transpose_2x4(x0, x1, t0, t1) \
 	vpunpckldq		x1, x0, t0; \
@@ -251,9 +255,9 @@ __cast5_enc_blk16:
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 	enc_preload_rkr();
 
 	inpack_blocks(RL1, RR1, RTMP, RX, RKM);
@@ -287,7 +291,7 @@ __cast5_enc_blk16:
 	popq %rbx;
 	popq %r15;
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 
 	outunpack_blocks(RR1, RL1, RTMP, RX, RKM);
 	outunpack_blocks(RR2, RL2, RTMP, RX, RKM);
@@ -325,9 +329,9 @@ __cast5_dec_blk16:
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 	dec_preload_rkr();
 
 	inpack_blocks(RL1, RR1, RTMP, RX, RKM);
@@ -358,7 +362,7 @@ __cast5_dec_blk16:
 	round(RL, RR, 1, 2);
 	round(RR, RL, 0, 1);
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 	popq %rbx;
 	popq %r15;
 
@@ -521,8 +525,8 @@ ENTRY(cast5_ctr_16way)
 
 	vpcmpeqd RKR, RKR, RKR;
 	vpaddq RKR, RKR, RKR; /* low: -2, high: -2 */
-	vmovdqa .Lbswap_iv_mask, R1ST;
-	vmovdqa .Lbswap128_mask, RKM;
+	vmovdqa .Lbswap_iv_mask(%rip), R1ST;
+	vmovdqa .Lbswap128_mask(%rip), RKM;
 
 	/* load IV and byteswap */
 	vmovq (%rcx), RX;
diff --git a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
index 7f30b6f0d72c..da1b7e4a23e4 100644
--- a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
@@ -98,16 +98,20 @@
 
 
 #define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	shrq $16,	src;                     \
-	movl		s1(, RID1, 4), dst ## d; \
-	op1		s2(, RID2, 4), dst ## d; \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	interleave_op(il_reg);			 \
-	op2		s3(, RID1, 4), dst ## d; \
-	op3		s4(, RID2, 4), dst ## d;
+	movzbl		src ## bh,       RID1d;    \
+	leaq		s1(%rip),        RID2;     \
+	movl		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,       RID2d;    \
+	leaq		s2(%rip),        RID1;     \
+	op1		(RID1, RID2, 4), dst ## d; \
+	shrq $16,	src;                       \
+	movzbl		src ## bh,     RID1d;      \
+	leaq		s3(%rip),        RID2;     \
+	op2		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,     RID2d;      \
+	leaq		s4(%rip),        RID1;     \
+	op3		(RID1, RID2, 4), dst ## d; \
+	interleave_op(il_reg);
 
 #define dummy(d) /* do nothing */
 
@@ -190,10 +194,10 @@
 	qop(RD, RC, 1);
 
 #define shuffle(mask) \
-	vpshufb		mask,            RKR, RKR;
+	vpshufb		mask(%rip),            RKR, RKR;
 
 #define preload_rkr(n, do_mask, mask) \
-	vbroadcastss	.L16_mask,                RKR;      \
+	vbroadcastss	.L16_mask(%rip),          RKR;      \
 	/* add 16-bit rotation to key rotations (mod 32) */ \
 	vpxor		(kr+n*16)(CTX),           RKR, RKR; \
 	do_mask(mask);
@@ -275,9 +279,9 @@ __cast6_enc_blk8:
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 
 	inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -301,7 +305,7 @@ __cast6_enc_blk8:
 	popq %rbx;
 	popq %r15;
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 
 	outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -323,9 +327,9 @@ __cast6_dec_blk8:
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 
 	inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -349,7 +353,7 @@ __cast6_dec_blk8:
 	popq %rbx;
 	popq %r15;
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 	outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
 
diff --git a/arch/x86/crypto/des3_ede-asm_64.S b/arch/x86/crypto/des3_ede-asm_64.S
index 8e49ce117494..4bbd3ec78df5 100644
--- a/arch/x86/crypto/des3_ede-asm_64.S
+++ b/arch/x86/crypto/des3_ede-asm_64.S
@@ -138,21 +138,29 @@
 	movzbl RW0bl, RT2d; \
 	movzbl RW0bh, RT3d; \
 	shrq $16, RW0; \
-	movq s8(, RT0, 8), RT0; \
-	xorq s6(, RT1, 8), to; \
+	leaq s8(%rip), RW1; \
+	movq (RW1, RT0, 8), RT0; \
+	leaq s6(%rip), RW1; \
+	xorq (RW1, RT1, 8), to; \
 	movzbl RW0bl, RL1d; \
 	movzbl RW0bh, RT1d; \
 	shrl $16, RW0d; \
-	xorq s4(, RT2, 8), RT0; \
-	xorq s2(, RT3, 8), to; \
+	leaq s4(%rip), RW1; \
+	xorq (RW1, RT2, 8), RT0; \
+	leaq s2(%rip), RW1; \
+	xorq (RW1, RT3, 8), to; \
 	movzbl RW0bl, RT2d; \
 	movzbl RW0bh, RT3d; \
-	xorq s7(, RL1, 8), RT0; \
-	xorq s5(, RT1, 8), to; \
-	xorq s3(, RT2, 8), RT0; \
+	leaq s7(%rip), RW1; \
+	xorq (RW1, RL1, 8), RT0; \
+	leaq s5(%rip), RW1; \
+	xorq (RW1, RT1, 8), to; \
+	leaq s3(%rip), RW1; \
+	xorq (RW1, RT2, 8), RT0; \
 	load_next_key(n, RW0); \
 	xorq RT0, to; \
-	xorq s1(, RT3, 8), to; \
+	leaq s1(%rip), RW1; \
+	xorq (RW1, RT3, 8), to; \
 
 #define load_next_key(n, RWx) \
 	movq (((n) + 1) * 8)(CTX), RWx;
@@ -364,65 +372,89 @@ ENDPROC(des3_ede_x86_64_crypt_blk)
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	shrq $16, RW0; \
-	xorq s8(, RT3, 8), to##0; \
-	xorq s6(, RT1, 8), to##0; \
+	leaq s8(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s6(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	shrq $16, RW0; \
-	xorq s4(, RT3, 8), to##0; \
-	xorq s2(, RT1, 8), to##0; \
+	leaq s4(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s2(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	shrl $16, RW0d; \
-	xorq s7(, RT3, 8), to##0; \
-	xorq s5(, RT1, 8), to##0; \
+	leaq s7(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s5(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	load_next_key(n, RW0); \
-	xorq s3(, RT3, 8), to##0; \
-	xorq s1(, RT1, 8), to##0; \
+	leaq s3(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s1(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 		xorq from##1, RW1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		shrq $16, RW1; \
-		xorq s8(, RT3, 8), to##1; \
-		xorq s6(, RT1, 8), to##1; \
+		leaq s8(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s6(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		shrq $16, RW1; \
-		xorq s4(, RT3, 8), to##1; \
-		xorq s2(, RT1, 8), to##1; \
+		leaq s4(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s2(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		shrl $16, RW1d; \
-		xorq s7(, RT3, 8), to##1; \
-		xorq s5(, RT1, 8), to##1; \
+		leaq s7(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s5(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		do_movq(RW0, RW1); \
-		xorq s3(, RT3, 8), to##1; \
-		xorq s1(, RT1, 8), to##1; \
+		leaq s3(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s1(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 			xorq from##2, RW2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			shrq $16, RW2; \
-			xorq s8(, RT3, 8), to##2; \
-			xorq s6(, RT1, 8), to##2; \
+			leaq s8(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s6(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			shrq $16, RW2; \
-			xorq s4(, RT3, 8), to##2; \
-			xorq s2(, RT1, 8), to##2; \
+			leaq s4(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s2(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			shrl $16, RW2d; \
-			xorq s7(, RT3, 8), to##2; \
-			xorq s5(, RT1, 8), to##2; \
+			leaq s7(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s5(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			do_movq(RW0, RW2); \
-			xorq s3(, RT3, 8), to##2; \
-			xorq s1(, RT1, 8), to##2;
+			leaq s3(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s1(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2;
 
 #define __movq(src, dst) \
 	movq src, dst;
diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S
index f94375a8dcd1..d56a281221fb 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_asm.S
+++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S
@@ -97,7 +97,7 @@ ENTRY(clmul_ghash_mul)
 	FRAME_BEGIN
 	movups (%rdi), DATA
 	movups (%rsi), SHASH
-	movaps .Lbswap_mask, BSWAP
+	movaps .Lbswap_mask(%rip), BSWAP
 	PSHUFB_XMM BSWAP DATA
 	call __clmul_gf128mul_ble
 	PSHUFB_XMM BSWAP DATA
@@ -114,7 +114,7 @@ ENTRY(clmul_ghash_update)
 	FRAME_BEGIN
 	cmp $16, %rdx
 	jb .Lupdate_just_ret	# check length
-	movaps .Lbswap_mask, BSWAP
+	movaps .Lbswap_mask(%rip), BSWAP
 	movups (%rdi), DATA
 	movups (%rcx), SHASH
 	PSHUFB_XMM BSWAP DATA
diff --git a/arch/x86/crypto/glue_helper-asm-avx.S b/arch/x86/crypto/glue_helper-asm-avx.S
index 02ee2308fb38..8a49ab1699ef 100644
--- a/arch/x86/crypto/glue_helper-asm-avx.S
+++ b/arch/x86/crypto/glue_helper-asm-avx.S
@@ -54,7 +54,7 @@
 #define load_ctr_8way(iv, bswap, x0, x1, x2, x3, x4, x5, x6, x7, t0, t1, t2) \
 	vpcmpeqd t0, t0, t0; \
 	vpsrldq $8, t0, t0; /* low: -1, high: 0 */ \
-	vmovdqa bswap, t1; \
+	vmovdqa bswap(%rip), t1; \
 	\
 	/* load IV and byteswap */ \
 	vmovdqu (iv), x7; \
@@ -99,7 +99,7 @@
 
 #define load_xts_8way(iv, src, dst, x0, x1, x2, x3, x4, x5, x6, x7, tiv, t0, \
 		      t1, xts_gf128mul_and_shl1_mask) \
-	vmovdqa xts_gf128mul_and_shl1_mask, t0; \
+	vmovdqa xts_gf128mul_and_shl1_mask(%rip), t0; \
 	\
 	/* load IV */ \
 	vmovdqu (iv), tiv; \
diff --git a/arch/x86/crypto/glue_helper-asm-avx2.S b/arch/x86/crypto/glue_helper-asm-avx2.S
index a53ac11dd385..e04c80467bd2 100644
--- a/arch/x86/crypto/glue_helper-asm-avx2.S
+++ b/arch/x86/crypto/glue_helper-asm-avx2.S
@@ -67,7 +67,7 @@
 	vmovdqu (iv), t2x; \
 	vmovdqa t2x, t3x; \
 	inc_le128(t2x, t0x, t1x); \
-	vbroadcasti128 bswap, t1; \
+	vbroadcasti128 bswap(%rip), t1; \
 	vinserti128 $1, t2x, t3, t2; /* ab: le0 ; cd: le1 */ \
 	vpshufb t1, t2, x0; \
 	\
@@ -124,13 +124,13 @@
 		       tivx, t0, t0x, t1, t1x, t2, t2x, t3, \
 		       xts_gf128mul_and_shl1_mask_0, \
 		       xts_gf128mul_and_shl1_mask_1) \
-	vbroadcasti128 xts_gf128mul_and_shl1_mask_0, t1; \
+	vbroadcasti128 xts_gf128mul_and_shl1_mask_0(%rip), t1; \
 	\
 	/* load IV and construct second IV */ \
 	vmovdqu (iv), tivx; \
 	vmovdqa tivx, t0x; \
 	gf128mul_x_ble(tivx, t1x, t2x); \
-	vbroadcasti128 xts_gf128mul_and_shl1_mask_1, t2; \
+	vbroadcasti128 xts_gf128mul_and_shl1_mask_1(%rip), t2; \
 	vinserti128 $1, tivx, t0, tiv; \
 	vpxor (0*32)(src), tiv, x0; \
 	vmovdqu tiv, (0*32)(dst); \
-- 
2.14.2.920.gcf0c67979c-goog


 	PSHUFB_XMM BSWAP_MASK CTR
 	mov $1, TCTR_LOW
@@ -2850,12 +2852,12 @@ ENTRY(aesni_xts_crypt8)
 	cmpb $0, %cl
 	movl $0, %ecx
 	movl $240, %r10d
-	leaq _aesni_enc4, %r11
-	leaq _aesni_dec4, %rax
+	leaq _aesni_enc4(%rip), %r11
+	leaq _aesni_dec4(%rip), %rax
 	cmovel %r10d, %ecx
 	cmoveq %rax, %r11
 
-	movdqa .Lgf128mul_x_ble_mask, GF128MUL_MASK
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
 	movups (IVP), IV
 
 	mov 480(KEYP), KLEN
diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index faecb1518bf8..488605b19fe8 100644
--- a/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -454,7 +454,8 @@ _get_AAD_rest0\@:
 	vpshufb and an array of shuffle masks */
 	movq    %r12, %r11
 	salq    $4, %r11
-	movdqu  aad_shift_arr(%r11), \T1
+	leaq	aad_shift_arr(%rip), %rax
+	movdqu  (%rax,%r11,), \T1
 	vpshufb \T1, reg_i, reg_i
 _get_AAD_rest_final\@:
 	vpshufb SHUF_MASK(%rip), reg_i, reg_i
@@ -1761,7 +1762,8 @@ _get_AAD_rest0\@:
 	vpshufb and an array of shuffle masks */
 	movq    %r12, %r11
 	salq    $4, %r11
-	movdqu  aad_shift_arr(%r11), \T1
+	leaq	aad_shift_arr(%rip), %rax
+	movdqu  (%rax,%r11,), \T1
 	vpshufb \T1, reg_i, reg_i
 _get_AAD_rest_final\@:
 	vpshufb SHUF_MASK(%rip), reg_i, reg_i
diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index f7c495e2863c..46feaea52632 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -52,10 +52,10 @@
 	/* \
 	 * S-function with AES subbytes \
 	 */ \
-	vmovdqa .Linv_shift_row, t4; \
-	vbroadcastss .L0f0f0f0f, t7; \
-	vmovdqa .Lpre_tf_lo_s1, t0; \
-	vmovdqa .Lpre_tf_hi_s1, t1; \
+	vmovdqa .Linv_shift_row(%rip), t4; \
+	vbroadcastss .L0f0f0f0f(%rip), t7; \
+	vmovdqa .Lpre_tf_lo_s1(%rip), t0; \
+	vmovdqa .Lpre_tf_hi_s1(%rip), t1; \
 	\
 	/* AES inverse shift rows */ \
 	vpshufb t4, x0, x0; \
@@ -68,8 +68,8 @@
 	vpshufb t4, x6, x6; \
 	\
 	/* prefilter sboxes 1, 2 and 3 */ \
-	vmovdqa .Lpre_tf_lo_s4, t2; \
-	vmovdqa .Lpre_tf_hi_s4, t3; \
+	vmovdqa .Lpre_tf_lo_s4(%rip), t2; \
+	vmovdqa .Lpre_tf_hi_s4(%rip), t3; \
 	filter_8bit(x0, t0, t1, t7, t6); \
 	filter_8bit(x7, t0, t1, t7, t6); \
 	filter_8bit(x1, t0, t1, t7, t6); \
@@ -83,8 +83,8 @@
 	filter_8bit(x6, t2, t3, t7, t6); \
 	\
 	/* AES subbytes + AES shift rows */ \
-	vmovdqa .Lpost_tf_lo_s1, t0; \
-	vmovdqa .Lpost_tf_hi_s1, t1; \
+	vmovdqa .Lpost_tf_lo_s1(%rip), t0; \
+	vmovdqa .Lpost_tf_hi_s1(%rip), t1; \
 	vaesenclast t4, x0, x0; \
 	vaesenclast t4, x7, x7; \
 	vaesenclast t4, x1, x1; \
@@ -95,16 +95,16 @@
 	vaesenclast t4, x6, x6; \
 	\
 	/* postfilter sboxes 1 and 4 */ \
-	vmovdqa .Lpost_tf_lo_s3, t2; \
-	vmovdqa .Lpost_tf_hi_s3, t3; \
+	vmovdqa .Lpost_tf_lo_s3(%rip), t2; \
+	vmovdqa .Lpost_tf_hi_s3(%rip), t3; \
 	filter_8bit(x0, t0, t1, t7, t6); \
 	filter_8bit(x7, t0, t1, t7, t6); \
 	filter_8bit(x3, t0, t1, t7, t6); \
 	filter_8bit(x6, t0, t1, t7, t6); \
 	\
 	/* postfilter sbox 3 */ \
-	vmovdqa .Lpost_tf_lo_s2, t4; \
-	vmovdqa .Lpost_tf_hi_s2, t5; \
+	vmovdqa .Lpost_tf_lo_s2(%rip), t4; \
+	vmovdqa .Lpost_tf_hi_s2(%rip), t5; \
 	filter_8bit(x2, t2, t3, t7, t6); \
 	filter_8bit(x5, t2, t3, t7, t6); \
 	\
@@ -443,7 +443,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	transpose_4x4(c0, c1, c2, c3, a0, a1); \
 	transpose_4x4(d0, d1, d2, d3, a0, a1); \
 	\
-	vmovdqu .Lshufb_16x16b, a0; \
+	vmovdqu .Lshufb_16x16b(%rip), a0; \
 	vmovdqu st1, a1; \
 	vpshufb a0, a2, a2; \
 	vpshufb a0, a3, a3; \
@@ -482,7 +482,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 #define inpack16_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \
 		     y6, y7, rio, key) \
 	vmovq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor 0 * 16(rio), x0, y7; \
 	vpxor 1 * 16(rio), x0, y6; \
@@ -533,7 +533,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	vmovdqu x0, stack_tmp0; \
 	\
 	vmovq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor x0, y7, y7; \
 	vpxor x0, y6, y6; \
@@ -1016,7 +1016,7 @@ ENTRY(camellia_ctr_16way)
 	subq $(16 * 16), %rsp;
 	movq %rsp, %rax;
 
-	vmovdqa .Lbswap128_mask, %xmm14;
+	vmovdqa .Lbswap128_mask(%rip), %xmm14;
 
 	/* load IV and byteswap */
 	vmovdqu (%rcx), %xmm0;
@@ -1065,7 +1065,7 @@ ENTRY(camellia_ctr_16way)
 
 	/* inpack16_pre: */
 	vmovq (key_table)(CTX), %xmm15;
-	vpshufb .Lpack_bswap, %xmm15, %xmm15;
+	vpshufb .Lpack_bswap(%rip), %xmm15, %xmm15;
 	vpxor %xmm0, %xmm15, %xmm0;
 	vpxor %xmm1, %xmm15, %xmm1;
 	vpxor %xmm2, %xmm15, %xmm2;
@@ -1133,7 +1133,7 @@ camellia_xts_crypt_16way:
 	subq $(16 * 16), %rsp;
 	movq %rsp, %rax;
 
-	vmovdqa .Lxts_gf128mul_and_shl1_mask, %xmm14;
+	vmovdqa .Lxts_gf128mul_and_shl1_mask(%rip), %xmm14;
 
 	/* load IV */
 	vmovdqu (%rcx), %xmm0;
@@ -1209,7 +1209,7 @@ camellia_xts_crypt_16way:
 
 	/* inpack16_pre: */
 	vmovq (key_table)(CTX, %r8, 8), %xmm15;
-	vpshufb .Lpack_bswap, %xmm15, %xmm15;
+	vpshufb .Lpack_bswap(%rip), %xmm15, %xmm15;
 	vpxor 0 * 16(%rax), %xmm15, %xmm0;
 	vpxor %xmm1, %xmm15, %xmm1;
 	vpxor %xmm2, %xmm15, %xmm2;
@@ -1264,7 +1264,7 @@ ENTRY(camellia_xts_enc_16way)
 	 */
 	xorl %r8d, %r8d; /* input whitening key, 0 for enc */
 
-	leaq __camellia_enc_blk16, %r9;
+	leaq __camellia_enc_blk16(%rip), %r9;
 
 	jmp camellia_xts_crypt_16way;
 ENDPROC(camellia_xts_enc_16way)
@@ -1282,7 +1282,7 @@ ENTRY(camellia_xts_dec_16way)
 	movl $24, %eax;
 	cmovel %eax, %r8d;  /* input whitening key, last for dec */
 
-	leaq __camellia_dec_blk16, %r9;
+	leaq __camellia_dec_blk16(%rip), %r9;
 
 	jmp camellia_xts_crypt_16way;
 ENDPROC(camellia_xts_dec_16way)
diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index eee5b3982cfd..93da327fec83 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -69,12 +69,12 @@
 	/* \
 	 * S-function with AES subbytes \
 	 */ \
-	vbroadcasti128 .Linv_shift_row, t4; \
-	vpbroadcastd .L0f0f0f0f, t7; \
-	vbroadcasti128 .Lpre_tf_lo_s1, t5; \
-	vbroadcasti128 .Lpre_tf_hi_s1, t6; \
-	vbroadcasti128 .Lpre_tf_lo_s4, t2; \
-	vbroadcasti128 .Lpre_tf_hi_s4, t3; \
+	vbroadcasti128 .Linv_shift_row(%rip), t4; \
+	vpbroadcastd .L0f0f0f0f(%rip), t7; \
+	vbroadcasti128 .Lpre_tf_lo_s1(%rip), t5; \
+	vbroadcasti128 .Lpre_tf_hi_s1(%rip), t6; \
+	vbroadcasti128 .Lpre_tf_lo_s4(%rip), t2; \
+	vbroadcasti128 .Lpre_tf_hi_s4(%rip), t3; \
 	\
 	/* AES inverse shift rows */ \
 	vpshufb t4, x0, x0; \
@@ -120,8 +120,8 @@
 	vinserti128 $1, t2##_x, x6, x6; \
 	vextracti128 $1, x1, t3##_x; \
 	vextracti128 $1, x4, t2##_x; \
-	vbroadcasti128 .Lpost_tf_lo_s1, t0; \
-	vbroadcasti128 .Lpost_tf_hi_s1, t1; \
+	vbroadcasti128 .Lpost_tf_lo_s1(%rip), t0; \
+	vbroadcasti128 .Lpost_tf_hi_s1(%rip), t1; \
 	vaesenclast t4##_x, x2##_x, x2##_x; \
 	vaesenclast t4##_x, t6##_x, t6##_x; \
 	vinserti128 $1, t6##_x, x2, x2; \
@@ -136,16 +136,16 @@
 	vinserti128 $1, t2##_x, x4, x4; \
 	\
 	/* postfilter sboxes 1 and 4 */ \
-	vbroadcasti128 .Lpost_tf_lo_s3, t2; \
-	vbroadcasti128 .Lpost_tf_hi_s3, t3; \
+	vbroadcasti128 .Lpost_tf_lo_s3(%rip), t2; \
+	vbroadcasti128 .Lpost_tf_hi_s3(%rip), t3; \
 	filter_8bit(x0, t0, t1, t7, t6); \
 	filter_8bit(x7, t0, t1, t7, t6); \
 	filter_8bit(x3, t0, t1, t7, t6); \
 	filter_8bit(x6, t0, t1, t7, t6); \
 	\
 	/* postfilter sbox 3 */ \
-	vbroadcasti128 .Lpost_tf_lo_s2, t4; \
-	vbroadcasti128 .Lpost_tf_hi_s2, t5; \
+	vbroadcasti128 .Lpost_tf_lo_s2(%rip), t4; \
+	vbroadcasti128 .Lpost_tf_hi_s2(%rip), t5; \
 	filter_8bit(x2, t2, t3, t7, t6); \
 	filter_8bit(x5, t2, t3, t7, t6); \
 	\
@@ -482,7 +482,7 @@ ENDPROC(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	transpose_4x4(c0, c1, c2, c3, a0, a1); \
 	transpose_4x4(d0, d1, d2, d3, a0, a1); \
 	\
-	vbroadcasti128 .Lshufb_16x16b, a0; \
+	vbroadcasti128 .Lshufb_16x16b(%rip), a0; \
 	vmovdqu st1, a1; \
 	vpshufb a0, a2, a2; \
 	vpshufb a0, a3, a3; \
@@ -521,7 +521,7 @@ ENDPROC(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 #define inpack32_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \
 		     y6, y7, rio, key) \
 	vpbroadcastq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor 0 * 32(rio), x0, y7; \
 	vpxor 1 * 32(rio), x0, y6; \
@@ -572,7 +572,7 @@ ENDPROC(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	vmovdqu x0, stack_tmp0; \
 	\
 	vpbroadcastq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor x0, y7, y7; \
 	vpxor x0, y6, y6; \
@@ -1112,7 +1112,7 @@ ENTRY(camellia_ctr_32way)
 	vmovdqu (%rcx), %xmm0;
 	vmovdqa %xmm0, %xmm1;
 	inc_le128(%xmm0, %xmm15, %xmm14);
-	vbroadcasti128 .Lbswap128_mask, %ymm14;
+	vbroadcasti128 .Lbswap128_mask(%rip), %ymm14;
 	vinserti128 $1, %xmm0, %ymm1, %ymm0;
 	vpshufb %ymm14, %ymm0, %ymm13;
 	vmovdqu %ymm13, 15 * 32(%rax);
@@ -1158,7 +1158,7 @@ ENTRY(camellia_ctr_32way)
 
 	/* inpack32_pre: */
 	vpbroadcastq (key_table)(CTX), %ymm15;
-	vpshufb .Lpack_bswap, %ymm15, %ymm15;
+	vpshufb .Lpack_bswap(%rip), %ymm15, %ymm15;
 	vpxor %ymm0, %ymm15, %ymm0;
 	vpxor %ymm1, %ymm15, %ymm1;
 	vpxor %ymm2, %ymm15, %ymm2;
@@ -1242,13 +1242,13 @@ camellia_xts_crypt_32way:
 	subq $(16 * 32), %rsp;
 	movq %rsp, %rax;
 
-	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_0, %ymm12;
+	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_0(%rip), %ymm12;
 
 	/* load IV and construct second IV */
 	vmovdqu (%rcx), %xmm0;
 	vmovdqa %xmm0, %xmm15;
 	gf128mul_x_ble(%xmm0, %xmm12, %xmm13);
-	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_1, %ymm13;
+	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_1(%rip), %ymm13;
 	vinserti128 $1, %xmm0, %ymm15, %ymm0;
 	vpxor 0 * 32(%rdx), %ymm0, %ymm15;
 	vmovdqu %ymm15, 15 * 32(%rax);
@@ -1325,7 +1325,7 @@ camellia_xts_crypt_32way:
 
 	/* inpack32_pre: */
 	vpbroadcastq (key_table)(CTX, %r8, 8), %ymm15;
-	vpshufb .Lpack_bswap, %ymm15, %ymm15;
+	vpshufb .Lpack_bswap(%rip), %ymm15, %ymm15;
 	vpxor 0 * 32(%rax), %ymm15, %ymm0;
 	vpxor %ymm1, %ymm15, %ymm1;
 	vpxor %ymm2, %ymm15, %ymm2;
@@ -1383,7 +1383,7 @@ ENTRY(camellia_xts_enc_32way)
 
 	xorl %r8d, %r8d; /* input whitening key, 0 for enc */
 
-	leaq __camellia_enc_blk32, %r9;
+	leaq __camellia_enc_blk32(%rip), %r9;
 
 	jmp camellia_xts_crypt_32way;
 ENDPROC(camellia_xts_enc_32way)
@@ -1401,7 +1401,7 @@ ENTRY(camellia_xts_dec_32way)
 	movl $24, %eax;
 	cmovel %eax, %r8d;  /* input whitening key, last for dec */
 
-	leaq __camellia_dec_blk32, %r9;
+	leaq __camellia_dec_blk32(%rip), %r9;
 
 	jmp camellia_xts_crypt_32way;
 ENDPROC(camellia_xts_dec_32way)
diff --git a/arch/x86/crypto/camellia-x86_64-asm_64.S b/arch/x86/crypto/camellia-x86_64-asm_64.S
index 95ba6956a7f6..ef1137406959 100644
--- a/arch/x86/crypto/camellia-x86_64-asm_64.S
+++ b/arch/x86/crypto/camellia-x86_64-asm_64.S
@@ -92,11 +92,13 @@
 #define RXORbl %r9b
 
 #define xor2ror16(T0, T1, tmp1, tmp2, ab, dst) \
+	leaq T0(%rip), 			tmp1; \
 	movzbl ab ## bl,		tmp2 ## d; \
+	xorq (tmp1, tmp2, 8),		dst; \
+	leaq T1(%rip), 			tmp2; \
 	movzbl ab ## bh,		tmp1 ## d; \
-	rorq $16,			ab; \
-	xorq T0(, tmp2, 8),		dst; \
-	xorq T1(, tmp1, 8),		dst;
+	xorq (tmp2, tmp1, 8),		dst; \
+	rorq $16,			ab;
 
 /**********************************************************************
   1-way camellia
diff --git a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
index 86107c961bb4..64eb5c87d04a 100644
--- a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
@@ -98,16 +98,20 @@
 
 
 #define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	shrq $16,	src;                     \
-	movl		s1(, RID1, 4), dst ## d; \
-	op1		s2(, RID2, 4), dst ## d; \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	interleave_op(il_reg);			 \
-	op2		s3(, RID1, 4), dst ## d; \
-	op3		s4(, RID2, 4), dst ## d;
+	movzbl		src ## bh,       RID1d;    \
+	leaq		s1(%rip),        RID2;     \
+	movl		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,       RID2d;    \
+	leaq		s2(%rip),        RID1;     \
+	op1		(RID1, RID2, 4), dst ## d; \
+	shrq $16,	src;                       \
+	movzbl		src ## bh,     RID1d;      \
+	leaq		s3(%rip),        RID2;     \
+	op2		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,     RID2d;      \
+	leaq		s4(%rip),        RID1;     \
+	op3		(RID1, RID2, 4), dst ## d; \
+	interleave_op(il_reg);
 
 #define dummy(d) /* do nothing */
 
@@ -166,15 +170,15 @@
 	subround(l ## 3, r ## 3, l ## 4, r ## 4, f);
 
 #define enc_preload_rkr() \
-	vbroadcastss	.L16_mask,                RKR;      \
+	vbroadcastss	.L16_mask(%rip),          RKR;      \
 	/* add 16-bit rotation to key rotations (mod 32) */ \
 	vpxor		kr(CTX),                  RKR, RKR;
 
 #define dec_preload_rkr() \
-	vbroadcastss	.L16_mask,                RKR;      \
+	vbroadcastss	.L16_mask(%rip),          RKR;      \
 	/* add 16-bit rotation to key rotations (mod 32) */ \
 	vpxor		kr(CTX),                  RKR, RKR; \
-	vpshufb		.Lbswap128_mask,          RKR, RKR;
+	vpshufb		.Lbswap128_mask(%rip),    RKR, RKR;
 
 #define transpose_2x4(x0, x1, t0, t1) \
 	vpunpckldq		x1, x0, t0; \
@@ -251,9 +255,9 @@ __cast5_enc_blk16:
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 	enc_preload_rkr();
 
 	inpack_blocks(RL1, RR1, RTMP, RX, RKM);
@@ -287,7 +291,7 @@ __cast5_enc_blk16:
 	popq %rbx;
 	popq %r15;
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 
 	outunpack_blocks(RR1, RL1, RTMP, RX, RKM);
 	outunpack_blocks(RR2, RL2, RTMP, RX, RKM);
@@ -325,9 +329,9 @@ __cast5_dec_blk16:
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 	dec_preload_rkr();
 
 	inpack_blocks(RL1, RR1, RTMP, RX, RKM);
@@ -358,7 +362,7 @@ __cast5_dec_blk16:
 	round(RL, RR, 1, 2);
 	round(RR, RL, 0, 1);
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 	popq %rbx;
 	popq %r15;
 
@@ -521,8 +525,8 @@ ENTRY(cast5_ctr_16way)
 
 	vpcmpeqd RKR, RKR, RKR;
 	vpaddq RKR, RKR, RKR; /* low: -2, high: -2 */
-	vmovdqa .Lbswap_iv_mask, R1ST;
-	vmovdqa .Lbswap128_mask, RKM;
+	vmovdqa .Lbswap_iv_mask(%rip), R1ST;
+	vmovdqa .Lbswap128_mask(%rip), RKM;
 
 	/* load IV and byteswap */
 	vmovq (%rcx), RX;
diff --git a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
index 7f30b6f0d72c..da1b7e4a23e4 100644
--- a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
@@ -98,16 +98,20 @@
 
 
 #define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	shrq $16,	src;                     \
-	movl		s1(, RID1, 4), dst ## d; \
-	op1		s2(, RID2, 4), dst ## d; \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	interleave_op(il_reg);			 \
-	op2		s3(, RID1, 4), dst ## d; \
-	op3		s4(, RID2, 4), dst ## d;
+	movzbl		src ## bh,       RID1d;    \
+	leaq		s1(%rip),        RID2;     \
+	movl		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,       RID2d;    \
+	leaq		s2(%rip),        RID1;     \
+	op1		(RID1, RID2, 4), dst ## d; \
+	shrq $16,	src;                       \
+	movzbl		src ## bh,     RID1d;      \
+	leaq		s3(%rip),        RID2;     \
+	op2		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,     RID2d;      \
+	leaq		s4(%rip),        RID1;     \
+	op3		(RID1, RID2, 4), dst ## d; \
+	interleave_op(il_reg);
 
 #define dummy(d) /* do nothing */
 
@@ -190,10 +194,10 @@
 	qop(RD, RC, 1);
 
 #define shuffle(mask) \
-	vpshufb		mask,            RKR, RKR;
+	vpshufb		mask(%rip),            RKR, RKR;
 
 #define preload_rkr(n, do_mask, mask) \
-	vbroadcastss	.L16_mask,                RKR;      \
+	vbroadcastss	.L16_mask(%rip),          RKR;      \
 	/* add 16-bit rotation to key rotations (mod 32) */ \
 	vpxor		(kr+n*16)(CTX),           RKR, RKR; \
 	do_mask(mask);
@@ -275,9 +279,9 @@ __cast6_enc_blk8:
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 
 	inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -301,7 +305,7 @@ __cast6_enc_blk8:
 	popq %rbx;
 	popq %r15;
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 
 	outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -323,9 +327,9 @@ __cast6_dec_blk8:
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 
 	inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -349,7 +353,7 @@ __cast6_dec_blk8:
 	popq %rbx;
 	popq %r15;
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 	outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
 
diff --git a/arch/x86/crypto/des3_ede-asm_64.S b/arch/x86/crypto/des3_ede-asm_64.S
index 8e49ce117494..4bbd3ec78df5 100644
--- a/arch/x86/crypto/des3_ede-asm_64.S
+++ b/arch/x86/crypto/des3_ede-asm_64.S
@@ -138,21 +138,29 @@
 	movzbl RW0bl, RT2d; \
 	movzbl RW0bh, RT3d; \
 	shrq $16, RW0; \
-	movq s8(, RT0, 8), RT0; \
-	xorq s6(, RT1, 8), to; \
+	leaq s8(%rip), RW1; \
+	movq (RW1, RT0, 8), RT0; \
+	leaq s6(%rip), RW1; \
+	xorq (RW1, RT1, 8), to; \
 	movzbl RW0bl, RL1d; \
 	movzbl RW0bh, RT1d; \
 	shrl $16, RW0d; \
-	xorq s4(, RT2, 8), RT0; \
-	xorq s2(, RT3, 8), to; \
+	leaq s4(%rip), RW1; \
+	xorq (RW1, RT2, 8), RT0; \
+	leaq s2(%rip), RW1; \
+	xorq (RW1, RT3, 8), to; \
 	movzbl RW0bl, RT2d; \
 	movzbl RW0bh, RT3d; \
-	xorq s7(, RL1, 8), RT0; \
-	xorq s5(, RT1, 8), to; \
-	xorq s3(, RT2, 8), RT0; \
+	leaq s7(%rip), RW1; \
+	xorq (RW1, RL1, 8), RT0; \
+	leaq s5(%rip), RW1; \
+	xorq (RW1, RT1, 8), to; \
+	leaq s3(%rip), RW1; \
+	xorq (RW1, RT2, 8), RT0; \
 	load_next_key(n, RW0); \
 	xorq RT0, to; \
-	xorq s1(, RT3, 8), to; \
+	leaq s1(%rip), RW1; \
+	xorq (RW1, RT3, 8), to; \
 
 #define load_next_key(n, RWx) \
 	movq (((n) + 1) * 8)(CTX), RWx;
@@ -364,65 +372,89 @@ ENDPROC(des3_ede_x86_64_crypt_blk)
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	shrq $16, RW0; \
-	xorq s8(, RT3, 8), to##0; \
-	xorq s6(, RT1, 8), to##0; \
+	leaq s8(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s6(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	shrq $16, RW0; \
-	xorq s4(, RT3, 8), to##0; \
-	xorq s2(, RT1, 8), to##0; \
+	leaq s4(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s2(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	shrl $16, RW0d; \
-	xorq s7(, RT3, 8), to##0; \
-	xorq s5(, RT1, 8), to##0; \
+	leaq s7(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s5(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	load_next_key(n, RW0); \
-	xorq s3(, RT3, 8), to##0; \
-	xorq s1(, RT1, 8), to##0; \
+	leaq s3(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s1(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 		xorq from##1, RW1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		shrq $16, RW1; \
-		xorq s8(, RT3, 8), to##1; \
-		xorq s6(, RT1, 8), to##1; \
+		leaq s8(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s6(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		shrq $16, RW1; \
-		xorq s4(, RT3, 8), to##1; \
-		xorq s2(, RT1, 8), to##1; \
+		leaq s4(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s2(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		shrl $16, RW1d; \
-		xorq s7(, RT3, 8), to##1; \
-		xorq s5(, RT1, 8), to##1; \
+		leaq s7(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s5(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		do_movq(RW0, RW1); \
-		xorq s3(, RT3, 8), to##1; \
-		xorq s1(, RT1, 8), to##1; \
+		leaq s3(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s1(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 			xorq from##2, RW2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			shrq $16, RW2; \
-			xorq s8(, RT3, 8), to##2; \
-			xorq s6(, RT1, 8), to##2; \
+			leaq s8(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s6(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			shrq $16, RW2; \
-			xorq s4(, RT3, 8), to##2; \
-			xorq s2(, RT1, 8), to##2; \
+			leaq s4(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s2(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			shrl $16, RW2d; \
-			xorq s7(, RT3, 8), to##2; \
-			xorq s5(, RT1, 8), to##2; \
+			leaq s7(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s5(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			do_movq(RW0, RW2); \
-			xorq s3(, RT3, 8), to##2; \
-			xorq s1(, RT1, 8), to##2;
+			leaq s3(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s1(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2;
 
 #define __movq(src, dst) \
 	movq src, dst;
diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S
index f94375a8dcd1..d56a281221fb 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_asm.S
+++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S
@@ -97,7 +97,7 @@ ENTRY(clmul_ghash_mul)
 	FRAME_BEGIN
 	movups (%rdi), DATA
 	movups (%rsi), SHASH
-	movaps .Lbswap_mask, BSWAP
+	movaps .Lbswap_mask(%rip), BSWAP
 	PSHUFB_XMM BSWAP DATA
 	call __clmul_gf128mul_ble
 	PSHUFB_XMM BSWAP DATA
@@ -114,7 +114,7 @@ ENTRY(clmul_ghash_update)
 	FRAME_BEGIN
 	cmp $16, %rdx
 	jb .Lupdate_just_ret	# check length
-	movaps .Lbswap_mask, BSWAP
+	movaps .Lbswap_mask(%rip), BSWAP
 	movups (%rdi), DATA
 	movups (%rcx), SHASH
 	PSHUFB_XMM BSWAP DATA
diff --git a/arch/x86/crypto/glue_helper-asm-avx.S b/arch/x86/crypto/glue_helper-asm-avx.S
index 02ee2308fb38..8a49ab1699ef 100644
--- a/arch/x86/crypto/glue_helper-asm-avx.S
+++ b/arch/x86/crypto/glue_helper-asm-avx.S
@@ -54,7 +54,7 @@
 #define load_ctr_8way(iv, bswap, x0, x1, x2, x3, x4, x5, x6, x7, t0, t1, t2) \
 	vpcmpeqd t0, t0, t0; \
 	vpsrldq $8, t0, t0; /* low: -1, high: 0 */ \
-	vmovdqa bswap, t1; \
+	vmovdqa bswap(%rip), t1; \
 	\
 	/* load IV and byteswap */ \
 	vmovdqu (iv), x7; \
@@ -99,7 +99,7 @@
 
 #define load_xts_8way(iv, src, dst, x0, x1, x2, x3, x4, x5, x6, x7, tiv, t0, \
 		      t1, xts_gf128mul_and_shl1_mask) \
-	vmovdqa xts_gf128mul_and_shl1_mask, t0; \
+	vmovdqa xts_gf128mul_and_shl1_mask(%rip), t0; \
 	\
 	/* load IV */ \
 	vmovdqu (iv), tiv; \
diff --git a/arch/x86/crypto/glue_helper-asm-avx2.S b/arch/x86/crypto/glue_helper-asm-avx2.S
index a53ac11dd385..e04c80467bd2 100644
--- a/arch/x86/crypto/glue_helper-asm-avx2.S
+++ b/arch/x86/crypto/glue_helper-asm-avx2.S
@@ -67,7 +67,7 @@
 	vmovdqu (iv), t2x; \
 	vmovdqa t2x, t3x; \
 	inc_le128(t2x, t0x, t1x); \
-	vbroadcasti128 bswap, t1; \
+	vbroadcasti128 bswap(%rip), t1; \
 	vinserti128 $1, t2x, t3, t2; /* ab: le0 ; cd: le1 */ \
 	vpshufb t1, t2, x0; \
 	\
@@ -124,13 +124,13 @@
 		       tivx, t0, t0x, t1, t1x, t2, t2x, t3, \
 		       xts_gf128mul_and_shl1_mask_0, \
 		       xts_gf128mul_and_shl1_mask_1) \
-	vbroadcasti128 xts_gf128mul_and_shl1_mask_0, t1; \
+	vbroadcasti128 xts_gf128mul_and_shl1_mask_0(%rip), t1; \
 	\
 	/* load IV and construct second IV */ \
 	vmovdqu (iv), tivx; \
 	vmovdqa tivx, t0x; \
 	gf128mul_x_ble(tivx, t1x, t2x); \
-	vbroadcasti128 xts_gf128mul_and_shl1_mask_1, t2; \
+	vbroadcasti128 xts_gf128mul_and_shl1_mask_1(%rip), t2; \
 	vinserti128 $1, tivx, t0, tiv; \
 	vpxor (0*32)(src), tiv, x0; \
 	vmovdqu tiv, (0*32)(dst); \
-- 
2.14.2.920.gcf0c67979c-goog


* [kernel-hardening] [RFC v3 01/27] x86/crypto: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Change the assembly code to use only relative references to symbols so that
the kernel can be built as PIE.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
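
The pattern is the same across the .S files touched here: a plain absolute
reference becomes RIP-relative, and indexed table lookups need an extra
leaq because x86-64 has no RIP-relative addressing mode that also takes an
index register. A minimal sketch of the transformation, with illustrative
registers instead of the macro arguments used in the real code:

	/* Before: absolute references, only valid with -mcmodel=kernel */
	movaps	.Lbswap_mask, %xmm0
	xorq	s1(, %rcx, 8), %rax

	/* After: PIE compatible, at the cost of one scratch register */
	movaps	.Lbswap_mask(%rip), %xmm0
	leaq	s1(%rip), %rbx
	xorq	(%rbx, %rcx, 8), %rax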

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/crypto/aes-x86_64-asm_64.S          | 45 ++++++++-----
 arch/x86/crypto/aesni-intel_asm.S            | 14 ++--
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |  6 +-
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  | 42 ++++++------
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 44 ++++++-------
 arch/x86/crypto/camellia-x86_64-asm_64.S     |  8 ++-
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    | 50 ++++++++-------
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    | 44 +++++++------
 arch/x86/crypto/des3_ede-asm_64.S            | 96 ++++++++++++++++++----------
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |  4 +-
 arch/x86/crypto/glue_helper-asm-avx.S        |  4 +-
 arch/x86/crypto/glue_helper-asm-avx2.S       |  6 +-
 12 files changed, 211 insertions(+), 152 deletions(-)

diff --git a/arch/x86/crypto/aes-x86_64-asm_64.S b/arch/x86/crypto/aes-x86_64-asm_64.S
index 8739cf7795de..86fa068e5e81 100644
--- a/arch/x86/crypto/aes-x86_64-asm_64.S
+++ b/arch/x86/crypto/aes-x86_64-asm_64.S
@@ -48,8 +48,12 @@
 #define R10	%r10
 #define R11	%r11
 
+/* Hold global for PIE support */
+#define RBASE	%r12
+
 #define prologue(FUNC,KEY,B128,B192,r1,r2,r5,r6,r7,r8,r9,r10,r11) \
 	ENTRY(FUNC);			\
+	pushq	RBASE;			\
 	movq	r1,r2;			\
 	leaq	KEY+48(r8),r9;		\
 	movq	r10,r11;		\
@@ -74,54 +78,63 @@
 	movl	r6 ## E,4(r9);		\
 	movl	r7 ## E,8(r9);		\
 	movl	r8 ## E,12(r9);		\
+	popq	RBASE;			\
 	ret;				\
 	ENDPROC(FUNC);
 
+#define round_mov(tab_off, reg_i, reg_o) \
+	leaq	tab_off(%rip), RBASE; \
+	movl	(RBASE,reg_i,4), reg_o;
+
+#define round_xor(tab_off, reg_i, reg_o) \
+	leaq	tab_off(%rip), RBASE; \
+	xorl	(RBASE,reg_i,4), reg_o;
+
 #define round(TAB,OFFSET,r1,r2,r3,r4,r5,r6,r7,r8,ra,rb,rc,rd) \
 	movzbl	r2 ## H,r5 ## E;	\
 	movzbl	r2 ## L,r6 ## E;	\
-	movl	TAB+1024(,r5,4),r5 ## E;\
+	round_mov(TAB+1024, r5, r5 ## E)\
 	movw	r4 ## X,r2 ## X;	\
-	movl	TAB(,r6,4),r6 ## E;	\
+	round_mov(TAB, r6, r6 ## E)	\
 	roll	$16,r2 ## E;		\
 	shrl	$16,r4 ## E;		\
 	movzbl	r4 ## L,r7 ## E;	\
 	movzbl	r4 ## H,r4 ## E;	\
 	xorl	OFFSET(r8),ra ## E;	\
 	xorl	OFFSET+4(r8),rb ## E;	\
-	xorl	TAB+3072(,r4,4),r5 ## E;\
-	xorl	TAB+2048(,r7,4),r6 ## E;\
+	round_xor(TAB+3072, r4, r5 ## E)\
+	round_xor(TAB+2048, r7, r6 ## E)\
 	movzbl	r1 ## L,r7 ## E;	\
 	movzbl	r1 ## H,r4 ## E;	\
-	movl	TAB+1024(,r4,4),r4 ## E;\
+	round_mov(TAB+1024, r4, r4 ## E)\
 	movw	r3 ## X,r1 ## X;	\
 	roll	$16,r1 ## E;		\
 	shrl	$16,r3 ## E;		\
-	xorl	TAB(,r7,4),r5 ## E;	\
+	round_xor(TAB, r7, r5 ## E)	\
 	movzbl	r3 ## L,r7 ## E;	\
 	movzbl	r3 ## H,r3 ## E;	\
-	xorl	TAB+3072(,r3,4),r4 ## E;\
-	xorl	TAB+2048(,r7,4),r5 ## E;\
+	round_xor(TAB+3072, r3, r4 ## E)\
+	round_xor(TAB+2048, r7, r5 ## E)\
 	movzbl	r1 ## L,r7 ## E;	\
 	movzbl	r1 ## H,r3 ## E;	\
 	shrl	$16,r1 ## E;		\
-	xorl	TAB+3072(,r3,4),r6 ## E;\
-	movl	TAB+2048(,r7,4),r3 ## E;\
+	round_xor(TAB+3072, r3, r6 ## E)\
+	round_mov(TAB+2048, r7, r3 ## E)\
 	movzbl	r1 ## L,r7 ## E;	\
 	movzbl	r1 ## H,r1 ## E;	\
-	xorl	TAB+1024(,r1,4),r6 ## E;\
-	xorl	TAB(,r7,4),r3 ## E;	\
+	round_xor(TAB+1024, r1, r6 ## E)\
+	round_xor(TAB, r7, r3 ## E)	\
 	movzbl	r2 ## H,r1 ## E;	\
 	movzbl	r2 ## L,r7 ## E;	\
 	shrl	$16,r2 ## E;		\
-	xorl	TAB+3072(,r1,4),r3 ## E;\
-	xorl	TAB+2048(,r7,4),r4 ## E;\
+	round_xor(TAB+3072, r1, r3 ## E)\
+	round_xor(TAB+2048, r7, r4 ## E)\
 	movzbl	r2 ## H,r1 ## E;	\
 	movzbl	r2 ## L,r2 ## E;	\
 	xorl	OFFSET+8(r8),rc ## E;	\
 	xorl	OFFSET+12(r8),rd ## E;	\
-	xorl	TAB+1024(,r1,4),r3 ## E;\
-	xorl	TAB(,r2,4),r4 ## E;
+	round_xor(TAB+1024, r1, r3 ## E)\
+	round_xor(TAB, r2, r4 ## E)
 
 #define move_regs(r1,r2,r3,r4) \
 	movl	r3 ## E,r1 ## E;	\
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 16627fec80b2..5f73201dff32 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -325,7 +325,8 @@ _get_AAD_rest0\num_initial_blocks\operation:
 	vpshufb and an array of shuffle masks */
 	movq	   %r12, %r11
 	salq	   $4, %r11
-	movdqu	   aad_shift_arr(%r11), \TMP1
+	leaq	   aad_shift_arr(%rip), %rax
+	movdqu	   (%rax,%r11,), \TMP1
 	PSHUFB_XMM \TMP1, %xmm\i
 _get_AAD_rest_final\num_initial_blocks\operation:
 	PSHUFB_XMM   %xmm14, %xmm\i # byte-reflect the AAD data
@@ -584,7 +585,8 @@ _get_AAD_rest0\num_initial_blocks\operation:
 	vpshufb and an array of shuffle masks */
 	movq	   %r12, %r11
 	salq	   $4, %r11
-	movdqu	   aad_shift_arr(%r11), \TMP1
+	leaq	   aad_shift_arr(%rip), %rax
+	movdqu	   (%rax,%r11,), \TMP1
 	PSHUFB_XMM \TMP1, %xmm\i
 _get_AAD_rest_final\num_initial_blocks\operation:
 	PSHUFB_XMM   %xmm14, %xmm\i # byte-reflect the AAD data
@@ -2722,7 +2724,7 @@ ENDPROC(aesni_cbc_dec)
  */
 .align 4
 _aesni_inc_init:
-	movaps .Lbswap_mask, BSWAP_MASK
+	movaps .Lbswap_mask(%rip), BSWAP_MASK
 	movaps IV, CTR
 	PSHUFB_XMM BSWAP_MASK CTR
 	mov $1, TCTR_LOW
@@ -2850,12 +2852,12 @@ ENTRY(aesni_xts_crypt8)
 	cmpb $0, %cl
 	movl $0, %ecx
 	movl $240, %r10d
-	leaq _aesni_enc4, %r11
-	leaq _aesni_dec4, %rax
+	leaq _aesni_enc4(%rip), %r11
+	leaq _aesni_dec4(%rip), %rax
 	cmovel %r10d, %ecx
 	cmoveq %rax, %r11
 
-	movdqa .Lgf128mul_x_ble_mask, GF128MUL_MASK
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
 	movups (IVP), IV
 
 	mov 480(KEYP), KLEN
diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index faecb1518bf8..488605b19fe8 100644
--- a/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -454,7 +454,8 @@ _get_AAD_rest0\@:
 	vpshufb and an array of shuffle masks */
 	movq    %r12, %r11
 	salq    $4, %r11
-	movdqu  aad_shift_arr(%r11), \T1
+	leaq	aad_shift_arr(%rip), %rax
+	movdqu  (%rax,%r11,), \T1
 	vpshufb \T1, reg_i, reg_i
 _get_AAD_rest_final\@:
 	vpshufb SHUF_MASK(%rip), reg_i, reg_i
@@ -1761,7 +1762,8 @@ _get_AAD_rest0\@:
 	vpshufb and an array of shuffle masks */
 	movq    %r12, %r11
 	salq    $4, %r11
-	movdqu  aad_shift_arr(%r11), \T1
+	leaq	aad_shift_arr(%rip), %rax
+	movdqu  (%rax,%r11,), \T1
 	vpshufb \T1, reg_i, reg_i
 _get_AAD_rest_final\@:
 	vpshufb SHUF_MASK(%rip), reg_i, reg_i
diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index f7c495e2863c..46feaea52632 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -52,10 +52,10 @@
 	/* \
 	 * S-function with AES subbytes \
 	 */ \
-	vmovdqa .Linv_shift_row, t4; \
-	vbroadcastss .L0f0f0f0f, t7; \
-	vmovdqa .Lpre_tf_lo_s1, t0; \
-	vmovdqa .Lpre_tf_hi_s1, t1; \
+	vmovdqa .Linv_shift_row(%rip), t4; \
+	vbroadcastss .L0f0f0f0f(%rip), t7; \
+	vmovdqa .Lpre_tf_lo_s1(%rip), t0; \
+	vmovdqa .Lpre_tf_hi_s1(%rip), t1; \
 	\
 	/* AES inverse shift rows */ \
 	vpshufb t4, x0, x0; \
@@ -68,8 +68,8 @@
 	vpshufb t4, x6, x6; \
 	\
 	/* prefilter sboxes 1, 2 and 3 */ \
-	vmovdqa .Lpre_tf_lo_s4, t2; \
-	vmovdqa .Lpre_tf_hi_s4, t3; \
+	vmovdqa .Lpre_tf_lo_s4(%rip), t2; \
+	vmovdqa .Lpre_tf_hi_s4(%rip), t3; \
 	filter_8bit(x0, t0, t1, t7, t6); \
 	filter_8bit(x7, t0, t1, t7, t6); \
 	filter_8bit(x1, t0, t1, t7, t6); \
@@ -83,8 +83,8 @@
 	filter_8bit(x6, t2, t3, t7, t6); \
 	\
 	/* AES subbytes + AES shift rows */ \
-	vmovdqa .Lpost_tf_lo_s1, t0; \
-	vmovdqa .Lpost_tf_hi_s1, t1; \
+	vmovdqa .Lpost_tf_lo_s1(%rip), t0; \
+	vmovdqa .Lpost_tf_hi_s1(%rip), t1; \
 	vaesenclast t4, x0, x0; \
 	vaesenclast t4, x7, x7; \
 	vaesenclast t4, x1, x1; \
@@ -95,16 +95,16 @@
 	vaesenclast t4, x6, x6; \
 	\
 	/* postfilter sboxes 1 and 4 */ \
-	vmovdqa .Lpost_tf_lo_s3, t2; \
-	vmovdqa .Lpost_tf_hi_s3, t3; \
+	vmovdqa .Lpost_tf_lo_s3(%rip), t2; \
+	vmovdqa .Lpost_tf_hi_s3(%rip), t3; \
 	filter_8bit(x0, t0, t1, t7, t6); \
 	filter_8bit(x7, t0, t1, t7, t6); \
 	filter_8bit(x3, t0, t1, t7, t6); \
 	filter_8bit(x6, t0, t1, t7, t6); \
 	\
 	/* postfilter sbox 3 */ \
-	vmovdqa .Lpost_tf_lo_s2, t4; \
-	vmovdqa .Lpost_tf_hi_s2, t5; \
+	vmovdqa .Lpost_tf_lo_s2(%rip), t4; \
+	vmovdqa .Lpost_tf_hi_s2(%rip), t5; \
 	filter_8bit(x2, t2, t3, t7, t6); \
 	filter_8bit(x5, t2, t3, t7, t6); \
 	\
@@ -443,7 +443,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	transpose_4x4(c0, c1, c2, c3, a0, a1); \
 	transpose_4x4(d0, d1, d2, d3, a0, a1); \
 	\
-	vmovdqu .Lshufb_16x16b, a0; \
+	vmovdqu .Lshufb_16x16b(%rip), a0; \
 	vmovdqu st1, a1; \
 	vpshufb a0, a2, a2; \
 	vpshufb a0, a3, a3; \
@@ -482,7 +482,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 #define inpack16_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \
 		     y6, y7, rio, key) \
 	vmovq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor 0 * 16(rio), x0, y7; \
 	vpxor 1 * 16(rio), x0, y6; \
@@ -533,7 +533,7 @@ ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	vmovdqu x0, stack_tmp0; \
 	\
 	vmovq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor x0, y7, y7; \
 	vpxor x0, y6, y6; \
@@ -1016,7 +1016,7 @@ ENTRY(camellia_ctr_16way)
 	subq $(16 * 16), %rsp;
 	movq %rsp, %rax;
 
-	vmovdqa .Lbswap128_mask, %xmm14;
+	vmovdqa .Lbswap128_mask(%rip), %xmm14;
 
 	/* load IV and byteswap */
 	vmovdqu (%rcx), %xmm0;
@@ -1065,7 +1065,7 @@ ENTRY(camellia_ctr_16way)
 
 	/* inpack16_pre: */
 	vmovq (key_table)(CTX), %xmm15;
-	vpshufb .Lpack_bswap, %xmm15, %xmm15;
+	vpshufb .Lpack_bswap(%rip), %xmm15, %xmm15;
 	vpxor %xmm0, %xmm15, %xmm0;
 	vpxor %xmm1, %xmm15, %xmm1;
 	vpxor %xmm2, %xmm15, %xmm2;
@@ -1133,7 +1133,7 @@ camellia_xts_crypt_16way:
 	subq $(16 * 16), %rsp;
 	movq %rsp, %rax;
 
-	vmovdqa .Lxts_gf128mul_and_shl1_mask, %xmm14;
+	vmovdqa .Lxts_gf128mul_and_shl1_mask(%rip), %xmm14;
 
 	/* load IV */
 	vmovdqu (%rcx), %xmm0;
@@ -1209,7 +1209,7 @@ camellia_xts_crypt_16way:
 
 	/* inpack16_pre: */
 	vmovq (key_table)(CTX, %r8, 8), %xmm15;
-	vpshufb .Lpack_bswap, %xmm15, %xmm15;
+	vpshufb .Lpack_bswap(%rip), %xmm15, %xmm15;
 	vpxor 0 * 16(%rax), %xmm15, %xmm0;
 	vpxor %xmm1, %xmm15, %xmm1;
 	vpxor %xmm2, %xmm15, %xmm2;
@@ -1264,7 +1264,7 @@ ENTRY(camellia_xts_enc_16way)
 	 */
 	xorl %r8d, %r8d; /* input whitening key, 0 for enc */
 
-	leaq __camellia_enc_blk16, %r9;
+	leaq __camellia_enc_blk16(%rip), %r9;
 
 	jmp camellia_xts_crypt_16way;
 ENDPROC(camellia_xts_enc_16way)
@@ -1282,7 +1282,7 @@ ENTRY(camellia_xts_dec_16way)
 	movl $24, %eax;
 	cmovel %eax, %r8d;  /* input whitening key, last for dec */
 
-	leaq __camellia_dec_blk16, %r9;
+	leaq __camellia_dec_blk16(%rip), %r9;
 
 	jmp camellia_xts_crypt_16way;
 ENDPROC(camellia_xts_dec_16way)
diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index eee5b3982cfd..93da327fec83 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -69,12 +69,12 @@
 	/* \
 	 * S-function with AES subbytes \
 	 */ \
-	vbroadcasti128 .Linv_shift_row, t4; \
-	vpbroadcastd .L0f0f0f0f, t7; \
-	vbroadcasti128 .Lpre_tf_lo_s1, t5; \
-	vbroadcasti128 .Lpre_tf_hi_s1, t6; \
-	vbroadcasti128 .Lpre_tf_lo_s4, t2; \
-	vbroadcasti128 .Lpre_tf_hi_s4, t3; \
+	vbroadcasti128 .Linv_shift_row(%rip), t4; \
+	vpbroadcastd .L0f0f0f0f(%rip), t7; \
+	vbroadcasti128 .Lpre_tf_lo_s1(%rip), t5; \
+	vbroadcasti128 .Lpre_tf_hi_s1(%rip), t6; \
+	vbroadcasti128 .Lpre_tf_lo_s4(%rip), t2; \
+	vbroadcasti128 .Lpre_tf_hi_s4(%rip), t3; \
 	\
 	/* AES inverse shift rows */ \
 	vpshufb t4, x0, x0; \
@@ -120,8 +120,8 @@
 	vinserti128 $1, t2##_x, x6, x6; \
 	vextracti128 $1, x1, t3##_x; \
 	vextracti128 $1, x4, t2##_x; \
-	vbroadcasti128 .Lpost_tf_lo_s1, t0; \
-	vbroadcasti128 .Lpost_tf_hi_s1, t1; \
+	vbroadcasti128 .Lpost_tf_lo_s1(%rip), t0; \
+	vbroadcasti128 .Lpost_tf_hi_s1(%rip), t1; \
 	vaesenclast t4##_x, x2##_x, x2##_x; \
 	vaesenclast t4##_x, t6##_x, t6##_x; \
 	vinserti128 $1, t6##_x, x2, x2; \
@@ -136,16 +136,16 @@
 	vinserti128 $1, t2##_x, x4, x4; \
 	\
 	/* postfilter sboxes 1 and 4 */ \
-	vbroadcasti128 .Lpost_tf_lo_s3, t2; \
-	vbroadcasti128 .Lpost_tf_hi_s3, t3; \
+	vbroadcasti128 .Lpost_tf_lo_s3(%rip), t2; \
+	vbroadcasti128 .Lpost_tf_hi_s3(%rip), t3; \
 	filter_8bit(x0, t0, t1, t7, t6); \
 	filter_8bit(x7, t0, t1, t7, t6); \
 	filter_8bit(x3, t0, t1, t7, t6); \
 	filter_8bit(x6, t0, t1, t7, t6); \
 	\
 	/* postfilter sbox 3 */ \
-	vbroadcasti128 .Lpost_tf_lo_s2, t4; \
-	vbroadcasti128 .Lpost_tf_hi_s2, t5; \
+	vbroadcasti128 .Lpost_tf_lo_s2(%rip), t4; \
+	vbroadcasti128 .Lpost_tf_hi_s2(%rip), t5; \
 	filter_8bit(x2, t2, t3, t7, t6); \
 	filter_8bit(x5, t2, t3, t7, t6); \
 	\
@@ -482,7 +482,7 @@ ENDPROC(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	transpose_4x4(c0, c1, c2, c3, a0, a1); \
 	transpose_4x4(d0, d1, d2, d3, a0, a1); \
 	\
-	vbroadcasti128 .Lshufb_16x16b, a0; \
+	vbroadcasti128 .Lshufb_16x16b(%rip), a0; \
 	vmovdqu st1, a1; \
 	vpshufb a0, a2, a2; \
 	vpshufb a0, a3, a3; \
@@ -521,7 +521,7 @@ ENDPROC(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 #define inpack32_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \
 		     y6, y7, rio, key) \
 	vpbroadcastq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor 0 * 32(rio), x0, y7; \
 	vpxor 1 * 32(rio), x0, y6; \
@@ -572,7 +572,7 @@ ENDPROC(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	vmovdqu x0, stack_tmp0; \
 	\
 	vpbroadcastq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor x0, y7, y7; \
 	vpxor x0, y6, y6; \
@@ -1112,7 +1112,7 @@ ENTRY(camellia_ctr_32way)
 	vmovdqu (%rcx), %xmm0;
 	vmovdqa %xmm0, %xmm1;
 	inc_le128(%xmm0, %xmm15, %xmm14);
-	vbroadcasti128 .Lbswap128_mask, %ymm14;
+	vbroadcasti128 .Lbswap128_mask(%rip), %ymm14;
 	vinserti128 $1, %xmm0, %ymm1, %ymm0;
 	vpshufb %ymm14, %ymm0, %ymm13;
 	vmovdqu %ymm13, 15 * 32(%rax);
@@ -1158,7 +1158,7 @@ ENTRY(camellia_ctr_32way)
 
 	/* inpack32_pre: */
 	vpbroadcastq (key_table)(CTX), %ymm15;
-	vpshufb .Lpack_bswap, %ymm15, %ymm15;
+	vpshufb .Lpack_bswap(%rip), %ymm15, %ymm15;
 	vpxor %ymm0, %ymm15, %ymm0;
 	vpxor %ymm1, %ymm15, %ymm1;
 	vpxor %ymm2, %ymm15, %ymm2;
@@ -1242,13 +1242,13 @@ camellia_xts_crypt_32way:
 	subq $(16 * 32), %rsp;
 	movq %rsp, %rax;
 
-	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_0, %ymm12;
+	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_0(%rip), %ymm12;
 
 	/* load IV and construct second IV */
 	vmovdqu (%rcx), %xmm0;
 	vmovdqa %xmm0, %xmm15;
 	gf128mul_x_ble(%xmm0, %xmm12, %xmm13);
-	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_1, %ymm13;
+	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_1(%rip), %ymm13;
 	vinserti128 $1, %xmm0, %ymm15, %ymm0;
 	vpxor 0 * 32(%rdx), %ymm0, %ymm15;
 	vmovdqu %ymm15, 15 * 32(%rax);
@@ -1325,7 +1325,7 @@ camellia_xts_crypt_32way:
 
 	/* inpack32_pre: */
 	vpbroadcastq (key_table)(CTX, %r8, 8), %ymm15;
-	vpshufb .Lpack_bswap, %ymm15, %ymm15;
+	vpshufb .Lpack_bswap(%rip), %ymm15, %ymm15;
 	vpxor 0 * 32(%rax), %ymm15, %ymm0;
 	vpxor %ymm1, %ymm15, %ymm1;
 	vpxor %ymm2, %ymm15, %ymm2;
@@ -1383,7 +1383,7 @@ ENTRY(camellia_xts_enc_32way)
 
 	xorl %r8d, %r8d; /* input whitening key, 0 for enc */
 
-	leaq __camellia_enc_blk32, %r9;
+	leaq __camellia_enc_blk32(%rip), %r9;
 
 	jmp camellia_xts_crypt_32way;
 ENDPROC(camellia_xts_enc_32way)
@@ -1401,7 +1401,7 @@ ENTRY(camellia_xts_dec_32way)
 	movl $24, %eax;
 	cmovel %eax, %r8d;  /* input whitening key, last for dec */
 
-	leaq __camellia_dec_blk32, %r9;
+	leaq __camellia_dec_blk32(%rip), %r9;
 
 	jmp camellia_xts_crypt_32way;
 ENDPROC(camellia_xts_dec_32way)
diff --git a/arch/x86/crypto/camellia-x86_64-asm_64.S b/arch/x86/crypto/camellia-x86_64-asm_64.S
index 95ba6956a7f6..ef1137406959 100644
--- a/arch/x86/crypto/camellia-x86_64-asm_64.S
+++ b/arch/x86/crypto/camellia-x86_64-asm_64.S
@@ -92,11 +92,13 @@
 #define RXORbl %r9b
 
 #define xor2ror16(T0, T1, tmp1, tmp2, ab, dst) \
+	leaq T0(%rip), 			tmp1; \
 	movzbl ab ## bl,		tmp2 ## d; \
+	xorq (tmp1, tmp2, 8),		dst; \
+	leaq T1(%rip), 			tmp2; \
 	movzbl ab ## bh,		tmp1 ## d; \
-	rorq $16,			ab; \
-	xorq T0(, tmp2, 8),		dst; \
-	xorq T1(, tmp1, 8),		dst;
+	xorq (tmp2, tmp1, 8),		dst; \
+	rorq $16,			ab;
 
 /**********************************************************************
   1-way camellia
diff --git a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
index 86107c961bb4..64eb5c87d04a 100644
--- a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
@@ -98,16 +98,20 @@
 
 
 #define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	shrq $16,	src;                     \
-	movl		s1(, RID1, 4), dst ## d; \
-	op1		s2(, RID2, 4), dst ## d; \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	interleave_op(il_reg);			 \
-	op2		s3(, RID1, 4), dst ## d; \
-	op3		s4(, RID2, 4), dst ## d;
+	movzbl		src ## bh,       RID1d;    \
+	leaq		s1(%rip),        RID2;     \
+	movl		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,       RID2d;    \
+	leaq		s2(%rip),        RID1;     \
+	op1		(RID1, RID2, 4), dst ## d; \
+	shrq $16,	src;                       \
+	movzbl		src ## bh,     RID1d;      \
+	leaq		s3(%rip),        RID2;     \
+	op2		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,     RID2d;      \
+	leaq		s4(%rip),        RID1;     \
+	op3		(RID1, RID2, 4), dst ## d; \
+	interleave_op(il_reg);
 
 #define dummy(d) /* do nothing */
 
@@ -166,15 +170,15 @@
 	subround(l ## 3, r ## 3, l ## 4, r ## 4, f);
 
 #define enc_preload_rkr() \
-	vbroadcastss	.L16_mask,                RKR;      \
+	vbroadcastss	.L16_mask(%rip),          RKR;      \
 	/* add 16-bit rotation to key rotations (mod 32) */ \
 	vpxor		kr(CTX),                  RKR, RKR;
 
 #define dec_preload_rkr() \
-	vbroadcastss	.L16_mask,                RKR;      \
+	vbroadcastss	.L16_mask(%rip),          RKR;      \
 	/* add 16-bit rotation to key rotations (mod 32) */ \
 	vpxor		kr(CTX),                  RKR, RKR; \
-	vpshufb		.Lbswap128_mask,          RKR, RKR;
+	vpshufb		.Lbswap128_mask(%rip),    RKR, RKR;
 
 #define transpose_2x4(x0, x1, t0, t1) \
 	vpunpckldq		x1, x0, t0; \
@@ -251,9 +255,9 @@ __cast5_enc_blk16:
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 	enc_preload_rkr();
 
 	inpack_blocks(RL1, RR1, RTMP, RX, RKM);
@@ -287,7 +291,7 @@ __cast5_enc_blk16:
 	popq %rbx;
 	popq %r15;
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 
 	outunpack_blocks(RR1, RL1, RTMP, RX, RKM);
 	outunpack_blocks(RR2, RL2, RTMP, RX, RKM);
@@ -325,9 +329,9 @@ __cast5_dec_blk16:
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 	dec_preload_rkr();
 
 	inpack_blocks(RL1, RR1, RTMP, RX, RKM);
@@ -358,7 +362,7 @@ __cast5_dec_blk16:
 	round(RL, RR, 1, 2);
 	round(RR, RL, 0, 1);
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 	popq %rbx;
 	popq %r15;
 
@@ -521,8 +525,8 @@ ENTRY(cast5_ctr_16way)
 
 	vpcmpeqd RKR, RKR, RKR;
 	vpaddq RKR, RKR, RKR; /* low: -2, high: -2 */
-	vmovdqa .Lbswap_iv_mask, R1ST;
-	vmovdqa .Lbswap128_mask, RKM;
+	vmovdqa .Lbswap_iv_mask(%rip), R1ST;
+	vmovdqa .Lbswap128_mask(%rip), RKM;
 
 	/* load IV and byteswap */
 	vmovq (%rcx), RX;
diff --git a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
index 7f30b6f0d72c..da1b7e4a23e4 100644
--- a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
@@ -98,16 +98,20 @@
 
 
 #define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	shrq $16,	src;                     \
-	movl		s1(, RID1, 4), dst ## d; \
-	op1		s2(, RID2, 4), dst ## d; \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	interleave_op(il_reg);			 \
-	op2		s3(, RID1, 4), dst ## d; \
-	op3		s4(, RID2, 4), dst ## d;
+	movzbl		src ## bh,       RID1d;    \
+	leaq		s1(%rip),        RID2;     \
+	movl		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,       RID2d;    \
+	leaq		s2(%rip),        RID1;     \
+	op1		(RID1, RID2, 4), dst ## d; \
+	shrq $16,	src;                       \
+	movzbl		src ## bh,     RID1d;      \
+	leaq		s3(%rip),        RID2;     \
+	op2		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,     RID2d;      \
+	leaq		s4(%rip),        RID1;     \
+	op3		(RID1, RID2, 4), dst ## d; \
+	interleave_op(il_reg);
 
 #define dummy(d) /* do nothing */
 
@@ -190,10 +194,10 @@
 	qop(RD, RC, 1);
 
 #define shuffle(mask) \
-	vpshufb		mask,            RKR, RKR;
+	vpshufb		mask(%rip),            RKR, RKR;
 
 #define preload_rkr(n, do_mask, mask) \
-	vbroadcastss	.L16_mask,                RKR;      \
+	vbroadcastss	.L16_mask(%rip),          RKR;      \
 	/* add 16-bit rotation to key rotations (mod 32) */ \
 	vpxor		(kr+n*16)(CTX),           RKR, RKR; \
 	do_mask(mask);
@@ -275,9 +279,9 @@ __cast6_enc_blk8:
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 
 	inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -301,7 +305,7 @@ __cast6_enc_blk8:
 	popq %rbx;
 	popq %r15;
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 
 	outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -323,9 +327,9 @@ __cast6_dec_blk8:
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 
 	inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -349,7 +353,7 @@ __cast6_dec_blk8:
 	popq %rbx;
 	popq %r15;
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 	outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
 
diff --git a/arch/x86/crypto/des3_ede-asm_64.S b/arch/x86/crypto/des3_ede-asm_64.S
index 8e49ce117494..4bbd3ec78df5 100644
--- a/arch/x86/crypto/des3_ede-asm_64.S
+++ b/arch/x86/crypto/des3_ede-asm_64.S
@@ -138,21 +138,29 @@
 	movzbl RW0bl, RT2d; \
 	movzbl RW0bh, RT3d; \
 	shrq $16, RW0; \
-	movq s8(, RT0, 8), RT0; \
-	xorq s6(, RT1, 8), to; \
+	leaq s8(%rip), RW1; \
+	movq (RW1, RT0, 8), RT0; \
+	leaq s6(%rip), RW1; \
+	xorq (RW1, RT1, 8), to; \
 	movzbl RW0bl, RL1d; \
 	movzbl RW0bh, RT1d; \
 	shrl $16, RW0d; \
-	xorq s4(, RT2, 8), RT0; \
-	xorq s2(, RT3, 8), to; \
+	leaq s4(%rip), RW1; \
+	xorq (RW1, RT2, 8), RT0; \
+	leaq s2(%rip), RW1; \
+	xorq (RW1, RT3, 8), to; \
 	movzbl RW0bl, RT2d; \
 	movzbl RW0bh, RT3d; \
-	xorq s7(, RL1, 8), RT0; \
-	xorq s5(, RT1, 8), to; \
-	xorq s3(, RT2, 8), RT0; \
+	leaq s7(%rip), RW1; \
+	xorq (RW1, RL1, 8), RT0; \
+	leaq s5(%rip), RW1; \
+	xorq (RW1, RT1, 8), to; \
+	leaq s3(%rip), RW1; \
+	xorq (RW1, RT2, 8), RT0; \
 	load_next_key(n, RW0); \
 	xorq RT0, to; \
-	xorq s1(, RT3, 8), to; \
+	leaq s1(%rip), RW1; \
+	xorq (RW1, RT3, 8), to; \
 
 #define load_next_key(n, RWx) \
 	movq (((n) + 1) * 8)(CTX), RWx;
@@ -364,65 +372,89 @@ ENDPROC(des3_ede_x86_64_crypt_blk)
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	shrq $16, RW0; \
-	xorq s8(, RT3, 8), to##0; \
-	xorq s6(, RT1, 8), to##0; \
+	leaq s8(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s6(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	shrq $16, RW0; \
-	xorq s4(, RT3, 8), to##0; \
-	xorq s2(, RT1, 8), to##0; \
+	leaq s4(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s2(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	shrl $16, RW0d; \
-	xorq s7(, RT3, 8), to##0; \
-	xorq s5(, RT1, 8), to##0; \
+	leaq s7(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s5(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	load_next_key(n, RW0); \
-	xorq s3(, RT3, 8), to##0; \
-	xorq s1(, RT1, 8), to##0; \
+	leaq s3(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s1(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 		xorq from##1, RW1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		shrq $16, RW1; \
-		xorq s8(, RT3, 8), to##1; \
-		xorq s6(, RT1, 8), to##1; \
+		leaq s8(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s6(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		shrq $16, RW1; \
-		xorq s4(, RT3, 8), to##1; \
-		xorq s2(, RT1, 8), to##1; \
+		leaq s4(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s2(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		shrl $16, RW1d; \
-		xorq s7(, RT3, 8), to##1; \
-		xorq s5(, RT1, 8), to##1; \
+		leaq s7(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s5(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		do_movq(RW0, RW1); \
-		xorq s3(, RT3, 8), to##1; \
-		xorq s1(, RT1, 8), to##1; \
+		leaq s3(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s1(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 			xorq from##2, RW2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			shrq $16, RW2; \
-			xorq s8(, RT3, 8), to##2; \
-			xorq s6(, RT1, 8), to##2; \
+			leaq s8(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s6(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			shrq $16, RW2; \
-			xorq s4(, RT3, 8), to##2; \
-			xorq s2(, RT1, 8), to##2; \
+			leaq s4(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s2(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			shrl $16, RW2d; \
-			xorq s7(, RT3, 8), to##2; \
-			xorq s5(, RT1, 8), to##2; \
+			leaq s7(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s5(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			do_movq(RW0, RW2); \
-			xorq s3(, RT3, 8), to##2; \
-			xorq s1(, RT1, 8), to##2;
+			leaq s3(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s1(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2;
 
 #define __movq(src, dst) \
 	movq src, dst;
diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S
index f94375a8dcd1..d56a281221fb 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_asm.S
+++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S
@@ -97,7 +97,7 @@ ENTRY(clmul_ghash_mul)
 	FRAME_BEGIN
 	movups (%rdi), DATA
 	movups (%rsi), SHASH
-	movaps .Lbswap_mask, BSWAP
+	movaps .Lbswap_mask(%rip), BSWAP
 	PSHUFB_XMM BSWAP DATA
 	call __clmul_gf128mul_ble
 	PSHUFB_XMM BSWAP DATA
@@ -114,7 +114,7 @@ ENTRY(clmul_ghash_update)
 	FRAME_BEGIN
 	cmp $16, %rdx
 	jb .Lupdate_just_ret	# check length
-	movaps .Lbswap_mask, BSWAP
+	movaps .Lbswap_mask(%rip), BSWAP
 	movups (%rdi), DATA
 	movups (%rcx), SHASH
 	PSHUFB_XMM BSWAP DATA
diff --git a/arch/x86/crypto/glue_helper-asm-avx.S b/arch/x86/crypto/glue_helper-asm-avx.S
index 02ee2308fb38..8a49ab1699ef 100644
--- a/arch/x86/crypto/glue_helper-asm-avx.S
+++ b/arch/x86/crypto/glue_helper-asm-avx.S
@@ -54,7 +54,7 @@
 #define load_ctr_8way(iv, bswap, x0, x1, x2, x3, x4, x5, x6, x7, t0, t1, t2) \
 	vpcmpeqd t0, t0, t0; \
 	vpsrldq $8, t0, t0; /* low: -1, high: 0 */ \
-	vmovdqa bswap, t1; \
+	vmovdqa bswap(%rip), t1; \
 	\
 	/* load IV and byteswap */ \
 	vmovdqu (iv), x7; \
@@ -99,7 +99,7 @@
 
 #define load_xts_8way(iv, src, dst, x0, x1, x2, x3, x4, x5, x6, x7, tiv, t0, \
 		      t1, xts_gf128mul_and_shl1_mask) \
-	vmovdqa xts_gf128mul_and_shl1_mask, t0; \
+	vmovdqa xts_gf128mul_and_shl1_mask(%rip), t0; \
 	\
 	/* load IV */ \
 	vmovdqu (iv), tiv; \
diff --git a/arch/x86/crypto/glue_helper-asm-avx2.S b/arch/x86/crypto/glue_helper-asm-avx2.S
index a53ac11dd385..e04c80467bd2 100644
--- a/arch/x86/crypto/glue_helper-asm-avx2.S
+++ b/arch/x86/crypto/glue_helper-asm-avx2.S
@@ -67,7 +67,7 @@
 	vmovdqu (iv), t2x; \
 	vmovdqa t2x, t3x; \
 	inc_le128(t2x, t0x, t1x); \
-	vbroadcasti128 bswap, t1; \
+	vbroadcasti128 bswap(%rip), t1; \
 	vinserti128 $1, t2x, t3, t2; /* ab: le0 ; cd: le1 */ \
 	vpshufb t1, t2, x0; \
 	\
@@ -124,13 +124,13 @@
 		       tivx, t0, t0x, t1, t1x, t2, t2x, t3, \
 		       xts_gf128mul_and_shl1_mask_0, \
 		       xts_gf128mul_and_shl1_mask_1) \
-	vbroadcasti128 xts_gf128mul_and_shl1_mask_0, t1; \
+	vbroadcasti128 xts_gf128mul_and_shl1_mask_0(%rip), t1; \
 	\
 	/* load IV and construct second IV */ \
 	vmovdqu (iv), tivx; \
 	vmovdqa tivx, t0x; \
 	gf128mul_x_ble(tivx, t1x, t2x); \
-	vbroadcasti128 xts_gf128mul_and_shl1_mask_1, t2; \
+	vbroadcasti128 xts_gf128mul_and_shl1_mask_1(%rip), t2; \
 	vinserti128 $1, tivx, t0, tiv; \
 	vpxor (0*32)(src), tiv, x0; \
 	vmovdqu tiv, (0*32)(dst); \
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 02/27] x86: Use symbol name on bug table for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Replace the %c constraint with %P. The %c constraint is incompatible with
PIE because it implies an immediate value, whereas %P references a symbol.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
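
For reference, a rough sketch of the __bug_table entry this asm emits for a
WARN() at line 42 of a file foo.c (label names and sizes below are
illustrative); %P0 lets the compiler print the bare file-string symbol even
in a PIE build, where %c0 would insist on an absolute immediate:

2:	.long 1b - 2b		# bug_entry::bug_addr
	.long .LC0 - 2b		# bug_entry::file ("foo.c" string)
	.word 42		# bug_entry::line
	.word 0			# bug_entry::flags
	.org 2b + 12		# sizeof(struct bug_entry)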

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/bug.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index aa6b2023d8f8..1210d22ad547 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -37,7 +37,7 @@ do {									\
 	asm volatile("1:\t" ins "\n"					\
 		     ".pushsection __bug_table,\"aw\"\n"		\
 		     "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"	\
-		     "\t"  __BUG_REL(%c0) "\t# bug_entry::file\n"	\
+		     "\t"  __BUG_REL(%P0) "\t# bug_entry::file\n"	\
 		     "\t.word %c1"        "\t# bug_entry::line\n"	\
 		     "\t.word %c2"        "\t# bug_entry::flags\n"	\
 		     "\t.org 2b+%c3\n"					\
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 02/27] x86: Use symbol name on bug table for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Replace the %c constraint with %P. The %c constraint is incompatible with
PIE because it implies an immediate value, whereas %P references a symbol.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/bug.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index aa6b2023d8f8..1210d22ad547 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -37,7 +37,7 @@ do {									\
 	asm volatile("1:\t" ins "\n"					\
 		     ".pushsection __bug_table,\"aw\"\n"		\
 		     "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"	\
-		     "\t"  __BUG_REL(%c0) "\t# bug_entry::file\n"	\
+		     "\t"  __BUG_REL(%P0) "\t# bug_entry::file\n"	\
 		     "\t.word %c1"        "\t# bug_entry::line\n"	\
 		     "\t.word %c2"        "\t# bug_entry::flags\n"	\
 		     "\t.org 2b+%c3\n"					\
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 02/27] x86: Use symbol name on bug table for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (3 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Replace the %c constraint with %P. The %c constraint is incompatible with
PIE because it implies an immediate value, whereas %P references a symbol.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/bug.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index aa6b2023d8f8..1210d22ad547 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -37,7 +37,7 @@ do {									\
 	asm volatile("1:\t" ins "\n"					\
 		     ".pushsection __bug_table,\"aw\"\n"		\
 		     "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"	\
-		     "\t"  __BUG_REL(%c0) "\t# bug_entry::file\n"	\
+		     "\t"  __BUG_REL(%P0) "\t# bug_entry::file\n"	\
 		     "\t.word %c1"        "\t# bug_entry::line\n"	\
 		     "\t.word %c2"        "\t# bug_entry::flags\n"	\
 		     "\t.org 2b+%c3\n"					\
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 02/27] x86: Use symbol name on bug table for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Replace the %c constraint with %P. The %c constraint is incompatible with
PIE because it implies an immediate value, whereas %P references a symbol.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/bug.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index aa6b2023d8f8..1210d22ad547 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -37,7 +37,7 @@ do {									\
 	asm volatile("1:\t" ins "\n"					\
 		     ".pushsection __bug_table,\"aw\"\n"		\
 		     "2:\t" __BUG_REL(1b) "\t# bug_entry::bug_addr\n"	\
-		     "\t"  __BUG_REL(%c0) "\t# bug_entry::file\n"	\
+		     "\t"  __BUG_REL(%P0) "\t# bug_entry::file\n"	\
 		     "\t.word %c1"        "\t# bug_entry::line\n"	\
 		     "\t.word %c2"        "\t# bug_entry::flags\n"	\
 		     "\t.org 2b+%c3\n"					\
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 03/27] x86: Use symbol name in jump table for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Replace the %c constraint with %P. The %c constraint is incompatible with
PIE because it implies an immediate value, whereas %P references a symbol.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
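
Concretely, for a static key used with branch == true, the __jump_table
entry ends up the same either way; the change is only in how the third
field (the key address with the branch bit folded into its low bit) is
expressed to the compiler so that it remains acceptable when the kernel is
built as PIE. A sketch with an illustrative key name:

	.pushsection __jump_table, "aw"
	.balign 8
	.quad 1b, .Ll_yes, my_key + 1	# code addr, jump target, &my_key | branch
	.popsection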

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/jump_label.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index adc54c12cbd1..6e558e4524dc 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -36,9 +36,9 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
 		".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
-		_ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+		_ASM_PTR "1b, %l[l_yes], %P0 \n\t"
 		".popsection \n\t"
-		: :  "i" (key), "i" (branch) : : l_yes);
+		: :  "X" (&((char *)key)[branch]) : : l_yes);
 
 	return false;
 l_yes:
@@ -52,9 +52,9 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
 		"2:\n\t"
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
-		_ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+		_ASM_PTR "1b, %l[l_yes], %P0 \n\t"
 		".popsection \n\t"
-		: :  "i" (key), "i" (branch) : : l_yes);
+		: :  "X" (&((char *)key)[branch]) : : l_yes);
 
 	return false;
 l_yes:
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 03/27] x86: Use symbol name in jump table for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Replace the %c constraint with %P. The %c constraint is incompatible with
PIE because it implies an immediate value, whereas %P references a symbol.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/jump_label.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index adc54c12cbd1..6e558e4524dc 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -36,9 +36,9 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
 		".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
-		_ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+		_ASM_PTR "1b, %l[l_yes], %P0 \n\t"
 		".popsection \n\t"
-		: :  "i" (key), "i" (branch) : : l_yes);
+		: :  "X" (&((char *)key)[branch]) : : l_yes);
 
 	return false;
 l_yes:
@@ -52,9 +52,9 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
 		"2:\n\t"
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
-		_ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+		_ASM_PTR "1b, %l[l_yes], %P0 \n\t"
 		".popsection \n\t"
-		: :  "i" (key), "i" (branch) : : l_yes);
+		: :  "X" (&((char *)key)[branch]) : : l_yes);
 
 	return false;
 l_yes:
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 03/27] x86: Use symbol name in jump table for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (6 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Replace the %c constraint with %P. The %c constraint is incompatible with
PIE because it implies an immediate value, whereas %P references a symbol.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/jump_label.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index adc54c12cbd1..6e558e4524dc 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -36,9 +36,9 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
 		".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
-		_ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+		_ASM_PTR "1b, %l[l_yes], %P0 \n\t"
 		".popsection \n\t"
-		: :  "i" (key), "i" (branch) : : l_yes);
+		: :  "X" (&((char *)key)[branch]) : : l_yes);
 
 	return false;
 l_yes:
@@ -52,9 +52,9 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
 		"2:\n\t"
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
-		_ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+		_ASM_PTR "1b, %l[l_yes], %P0 \n\t"
 		".popsection \n\t"
-		: :  "i" (key), "i" (branch) : : l_yes);
+		: :  "X" (&((char *)key)[branch]) : : l_yes);
 
 	return false;
 l_yes:
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 03/27] x86: Use symbol name in jump table for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Replace the %c constraint with %P. The %c constraint is incompatible with
PIE because it implies an immediate value, whereas %P references a symbol.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/jump_label.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index adc54c12cbd1..6e558e4524dc 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -36,9 +36,9 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
 		".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
-		_ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+		_ASM_PTR "1b, %l[l_yes], %P0 \n\t"
 		".popsection \n\t"
-		: :  "i" (key), "i" (branch) : : l_yes);
+		: :  "X" (&((char *)key)[branch]) : : l_yes);
 
 	return false;
 l_yes:
@@ -52,9 +52,9 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
 		"2:\n\t"
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
-		_ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
+		_ASM_PTR "1b, %l[l_yes], %P0 \n\t"
 		".popsection \n\t"
-		: :  "i" (key), "i" (branch) : : l_yes);
+		: :  "X" (&((char *)key)[branch]) : : l_yes);
 
 	return false;
 l_yes:
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 04/27] x86: Add macro to get symbol address for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Add a new _ASM_GET_PTR macro to fetch a symbol address. It will be used
to replace "_ASM_MOV $<symbol>, %dst" constructs, which are not compatible
with PIE.
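
A sketch of the intended use in 64-bit assembly, with an illustrative
symbol and destination register (the 32-bit case keeps the absolute movl,
since PIE only matters for x86_64 here):

	_ASM_GET_PTR(phys_base, %rax)	/* emits: leaq (phys_base)(%rip), %rax */

which replaces the absolute form that needs a 32-bit signed relocation and
therefore -mcmodel=kernel:

	_ASM_MOV $phys_base, %rax	/* emits: movq $phys_base, %rax */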

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/asm.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index c1eadbaf1115..dddcb8a3b777 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -55,6 +55,19 @@
 # define CC_OUT(c) [_cc_ ## c] "=qm"
 #endif
 
+/* Macros to get a global variable address with PIE support on 64-bit */
+#ifdef CONFIG_X86_32
+#define __ASM_GET_PTR_PRE(_src) __ASM_FORM_COMMA(movl $##_src)
+#else
+#ifdef __ASSEMBLY__
+#define __ASM_GET_PTR_PRE(_src) __ASM_FORM_COMMA(leaq (_src)(%rip))
+#else
+#define __ASM_GET_PTR_PRE(_src) __ASM_FORM_COMMA(leaq (_src)(%%rip))
+#endif
+#endif
+#define _ASM_GET_PTR(_src, _dst) \
+		__ASM_GET_PTR_PRE(_src) __ASM_FORM(_dst)
+
 /* Exception table entry */
 #ifdef __ASSEMBLY__
 # define _ASM_EXTABLE_HANDLE(from, to, handler)			\
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 04/27] x86: Add macro to get symbol address for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Add a new _ASM_GET_PTR macro to fetch a symbol address. It will be used
to replace "_ASM_MOV $<symbol>, %dst" constructs, which are not compatible
with PIE.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/asm.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index c1eadbaf1115..dddcb8a3b777 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -55,6 +55,19 @@
 # define CC_OUT(c) [_cc_ ## c] "=qm"
 #endif
 
+/* Macros to get a global variable address with PIE support on 64-bit */
+#ifdef CONFIG_X86_32
+#define __ASM_GET_PTR_PRE(_src) __ASM_FORM_COMMA(movl $##_src)
+#else
+#ifdef __ASSEMBLY__
+#define __ASM_GET_PTR_PRE(_src) __ASM_FORM_COMMA(leaq (_src)(%rip))
+#else
+#define __ASM_GET_PTR_PRE(_src) __ASM_FORM_COMMA(leaq (_src)(%%rip))
+#endif
+#endif
+#define _ASM_GET_PTR(_src, _dst) \
+		__ASM_GET_PTR_PRE(_src) __ASM_FORM(_dst)
+
 /* Exception table entry */
 #ifdef __ASSEMBLY__
 # define _ASM_EXTABLE_HANDLE(from, to, handler)			\
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 04/27] x86: Add macro to get symbol address for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (7 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Add a new _ASM_GET_PTR macro to fetch a symbol address. It will be used
to replace "_ASM_MOV $<symbol>, %dst" constructs, which are not compatible
with PIE.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/asm.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index c1eadbaf1115..dddcb8a3b777 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -55,6 +55,19 @@
 # define CC_OUT(c) [_cc_ ## c] "=qm"
 #endif
 
+/* Macros to get a global variable address with PIE support on 64-bit */
+#ifdef CONFIG_X86_32
+#define __ASM_GET_PTR_PRE(_src) __ASM_FORM_COMMA(movl $##_src)
+#else
+#ifdef __ASSEMBLY__
+#define __ASM_GET_PTR_PRE(_src) __ASM_FORM_COMMA(leaq (_src)(%rip))
+#else
+#define __ASM_GET_PTR_PRE(_src) __ASM_FORM_COMMA(leaq (_src)(%%rip))
+#endif
+#endif
+#define _ASM_GET_PTR(_src, _dst) \
+		__ASM_GET_PTR_PRE(_src) __ASM_FORM(_dst)
+
 /* Exception table entry */
 #ifdef __ASSEMBLY__
 # define _ASM_EXTABLE_HANDLE(from, to, handler)			\
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 04/27] x86: Add macro to get symbol address for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Add a new _ASM_GET_PTR macro to fetch a symbol address. It will be used
to replace the "_ASM_MOV $<symbol>, %dst" construct, which is not
compatible with PIE.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/asm.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index c1eadbaf1115..dddcb8a3b777 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -55,6 +55,19 @@
 # define CC_OUT(c) [_cc_ ## c] "=qm"
 #endif
 
+/* Macros to get a global variable address with PIE support on 64-bit */
+#ifdef CONFIG_X86_32
+#define __ASM_GET_PTR_PRE(_src) __ASM_FORM_COMMA(movl $##_src)
+#else
+#ifdef __ASSEMBLY__
+#define __ASM_GET_PTR_PRE(_src) __ASM_FORM_COMMA(leaq (_src)(%rip))
+#else
+#define __ASM_GET_PTR_PRE(_src) __ASM_FORM_COMMA(leaq (_src)(%%rip))
+#endif
+#endif
+#define _ASM_GET_PTR(_src, _dst) \
+		__ASM_GET_PTR_PRE(_src) __ASM_FORM(_dst)
+
 /* Exception table entry */
 #ifdef __ASSEMBLY__
 # define _ASM_EXTABLE_HANDLE(from, to, handler)			\
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 05/27] x86: relocate_kernel - Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/relocate_kernel_64.S | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 307d3bac5f04..2ecbdcbe985b 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -200,9 +200,11 @@ identity_mapped:
 	movq	%rax, %cr3
 	lea	PAGE_SIZE(%r8), %rsp
 	call	swap_pages
-	movq	$virtual_mapped, %rax
-	pushq	%rax
-	ret
+	jmp	*virtual_mapped_addr(%rip)
+
+	/* Absolute value for PIE support */
+virtual_mapped_addr:
+	.quad virtual_mapped
 
 virtual_mapped:
 	movq	RSP(%r8), %rsp
-- 
2.14.2.920.gcf0c67979c-goog
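
The shape of the transformation as a standalone sketch, with a
hypothetical target label: mcmodel=kernel allowed the absolute address as
a 32-bit signed immediate, which PIE does not. The absolute destination is
still used, but its value moves into a 64-bit data slot that the
relocation handling can fix up, and the code only reaches that slot
RIP-relatively.

	/* before: absolute immediate in the instruction stream */
	movq	$target, %rax
	pushq	%rax
	ret

	/* after: absolute value stored as data, reached RIP-relatively */
	jmp	*target_addr(%rip)
target_addr:
	.quad	target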


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 05/27] x86: relocate_kernel - Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/relocate_kernel_64.S | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 307d3bac5f04..2ecbdcbe985b 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -200,9 +200,11 @@ identity_mapped:
 	movq	%rax, %cr3
 	lea	PAGE_SIZE(%r8), %rsp
 	call	swap_pages
-	movq	$virtual_mapped, %rax
-	pushq	%rax
-	ret
+	jmp	*virtual_mapped_addr(%rip)
+
+	/* Absolute value for PIE support */
+virtual_mapped_addr:
+	.quad virtual_mapped
 
 virtual_mapped:
 	movq	RSP(%r8), %rsp
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 05/27] x86: relocate_kernel - Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (10 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/relocate_kernel_64.S | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 307d3bac5f04..2ecbdcbe985b 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -200,9 +200,11 @@ identity_mapped:
 	movq	%rax, %cr3
 	lea	PAGE_SIZE(%r8), %rsp
 	call	swap_pages
-	movq	$virtual_mapped, %rax
-	pushq	%rax
-	ret
+	jmp	*virtual_mapped_addr(%rip)
+
+	/* Absolute value for PIE support */
+virtual_mapped_addr:
+	.quad virtual_mapped
 
 virtual_mapped:
 	movq	RSP(%r8), %rsp
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 05/27] x86: relocate_kernel - Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/relocate_kernel_64.S | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 307d3bac5f04..2ecbdcbe985b 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -200,9 +200,11 @@ identity_mapped:
 	movq	%rax, %cr3
 	lea	PAGE_SIZE(%r8), %rsp
 	call	swap_pages
-	movq	$virtual_mapped, %rax
-	pushq	%rax
-	ret
+	jmp	*virtual_mapped_addr(%rip)
+
+	/* Absolute value for PIE support */
+virtual_mapped_addr:
+	.quad virtual_mapped
 
 virtual_mapped:
 	movq	RSP(%r8), %rsp
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 06/27] x86/entry/64: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/entry/entry_64.S | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 49167258d587..15bd5530d2ae 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -194,12 +194,15 @@ entry_SYSCALL_64_fastpath:
 	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
 	movq	%r10, %rcx
 
+	/* Ensures the call is position independent */
+	leaq	sys_call_table(%rip), %r11
+
 	/*
 	 * This call instruction is handled specially in stub_ptregs_64.
 	 * It might end up jumping to the slow path.  If it jumps, RAX
 	 * and all argument registers are clobbered.
 	 */
-	call	*sys_call_table(, %rax, 8)
+	call	*(%r11, %rax, 8)
 .Lentry_SYSCALL_64_after_fastpath_call:
 
 	movq	%rax, RAX(%rsp)
@@ -334,7 +337,8 @@ ENTRY(stub_ptregs_64)
 	 * RAX stores a pointer to the C function implementing the syscall.
 	 * IRQs are on.
 	 */
-	cmpq	$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
+	leaq	.Lentry_SYSCALL_64_after_fastpath_call(%rip), %r11
+	cmpq	%r11, (%rsp)
 	jne	1f
 
 	/*
@@ -1172,7 +1176,8 @@ ENTRY(error_entry)
 	movl	%ecx, %eax			/* zero extend */
 	cmpq	%rax, RIP+8(%rsp)
 	je	.Lbstep_iret
-	cmpq	$.Lgs_change, RIP+8(%rsp)
+	leaq	.Lgs_change(%rip), %rcx
+	cmpq	%rcx, RIP+8(%rsp)
 	jne	.Lerror_entry_done
 
 	/*
@@ -1383,10 +1388,10 @@ ENTRY(nmi)
 	 * resume the outer NMI.
 	 */
 
-	movq	$repeat_nmi, %rdx
+	leaq	repeat_nmi(%rip), %rdx
 	cmpq	8(%rsp), %rdx
 	ja	1f
-	movq	$end_repeat_nmi, %rdx
+	leaq	end_repeat_nmi(%rip), %rdx
 	cmpq	8(%rsp), %rdx
 	ja	nested_nmi_out
 1:
@@ -1440,7 +1445,8 @@ nested_nmi:
 	pushq	%rdx
 	pushfq
 	pushq	$__KERNEL_CS
-	pushq	$repeat_nmi
+	leaq	repeat_nmi(%rip), %rdx
+	pushq	%rdx
 
 	/* Put stack back */
 	addq	$(6*8), %rsp
@@ -1479,7 +1485,9 @@ first_nmi:
 	addq	$8, (%rsp)	/* Fix up RSP */
 	pushfq			/* RFLAGS */
 	pushq	$__KERNEL_CS	/* CS */
-	pushq	$1f		/* RIP */
+	pushq	%rax		/* Support Position Independent Code */
+	leaq	1f(%rip), %rax	/* RIP */
+	xchgq	%rax, (%rsp)	/* Restore RAX, put 1f */
 	INTERRUPT_RETURN	/* continues at repeat_nmi below */
 	UNWIND_HINT_IRET_REGS
 1:
-- 
2.14.2.920.gcf0c67979c-goog
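
The least obvious hunk is the last one: pushq with a symbol as its
immediate is limited to a 32-bit sign-extended value, so under PIE the
label address has to come from a RIP-relative leaq instead. A sketch of
the trick used when no register may be clobbered (local label 1 is just an
example):

	pushq	%rax		/* reserve a stack slot, saving %rax into it */
	leaq	1f(%rip), %rax	/* label address, computed relative to RIP */
	xchgq	%rax, (%rsp)	/* slot now holds the label, %rax is restored */
	/* ... the slot is consumed later, by iretq in the NMI path ... */
1: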


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 06/27] x86/entry/64: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/entry/entry_64.S | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 49167258d587..15bd5530d2ae 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -194,12 +194,15 @@ entry_SYSCALL_64_fastpath:
 	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
 	movq	%r10, %rcx
 
+	/* Ensures the call is position independent */
+	leaq	sys_call_table(%rip), %r11
+
 	/*
 	 * This call instruction is handled specially in stub_ptregs_64.
 	 * It might end up jumping to the slow path.  If it jumps, RAX
 	 * and all argument registers are clobbered.
 	 */
-	call	*sys_call_table(, %rax, 8)
+	call	*(%r11, %rax, 8)
 .Lentry_SYSCALL_64_after_fastpath_call:
 
 	movq	%rax, RAX(%rsp)
@@ -334,7 +337,8 @@ ENTRY(stub_ptregs_64)
 	 * RAX stores a pointer to the C function implementing the syscall.
 	 * IRQs are on.
 	 */
-	cmpq	$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
+	leaq	.Lentry_SYSCALL_64_after_fastpath_call(%rip), %r11
+	cmpq	%r11, (%rsp)
 	jne	1f
 
 	/*
@@ -1172,7 +1176,8 @@ ENTRY(error_entry)
 	movl	%ecx, %eax			/* zero extend */
 	cmpq	%rax, RIP+8(%rsp)
 	je	.Lbstep_iret
-	cmpq	$.Lgs_change, RIP+8(%rsp)
+	leaq	.Lgs_change(%rip), %rcx
+	cmpq	%rcx, RIP+8(%rsp)
 	jne	.Lerror_entry_done
 
 	/*
@@ -1383,10 +1388,10 @@ ENTRY(nmi)
 	 * resume the outer NMI.
 	 */
 
-	movq	$repeat_nmi, %rdx
+	leaq	repeat_nmi(%rip), %rdx
 	cmpq	8(%rsp), %rdx
 	ja	1f
-	movq	$end_repeat_nmi, %rdx
+	leaq	end_repeat_nmi(%rip), %rdx
 	cmpq	8(%rsp), %rdx
 	ja	nested_nmi_out
 1:
@@ -1440,7 +1445,8 @@ nested_nmi:
 	pushq	%rdx
 	pushfq
 	pushq	$__KERNEL_CS
-	pushq	$repeat_nmi
+	leaq	repeat_nmi(%rip), %rdx
+	pushq	%rdx
 
 	/* Put stack back */
 	addq	$(6*8), %rsp
@@ -1479,7 +1485,9 @@ first_nmi:
 	addq	$8, (%rsp)	/* Fix up RSP */
 	pushfq			/* RFLAGS */
 	pushq	$__KERNEL_CS	/* CS */
-	pushq	$1f		/* RIP */
+	pushq	%rax		/* Support Position Independent Code */
+	leaq	1f(%rip), %rax	/* RIP */
+	xchgq	%rax, (%rsp)	/* Restore RAX, put 1f */
 	INTERRUPT_RETURN	/* continues at repeat_nmi below */
 	UNWIND_HINT_IRET_REGS
 1:
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 06/27] x86/entry/64: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (11 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/entry/entry_64.S | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 49167258d587..15bd5530d2ae 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -194,12 +194,15 @@ entry_SYSCALL_64_fastpath:
 	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
 	movq	%r10, %rcx
 
+	/* Ensures the call is position independent */
+	leaq	sys_call_table(%rip), %r11
+
 	/*
 	 * This call instruction is handled specially in stub_ptregs_64.
 	 * It might end up jumping to the slow path.  If it jumps, RAX
 	 * and all argument registers are clobbered.
 	 */
-	call	*sys_call_table(, %rax, 8)
+	call	*(%r11, %rax, 8)
 .Lentry_SYSCALL_64_after_fastpath_call:
 
 	movq	%rax, RAX(%rsp)
@@ -334,7 +337,8 @@ ENTRY(stub_ptregs_64)
 	 * RAX stores a pointer to the C function implementing the syscall.
 	 * IRQs are on.
 	 */
-	cmpq	$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
+	leaq	.Lentry_SYSCALL_64_after_fastpath_call(%rip), %r11
+	cmpq	%r11, (%rsp)
 	jne	1f
 
 	/*
@@ -1172,7 +1176,8 @@ ENTRY(error_entry)
 	movl	%ecx, %eax			/* zero extend */
 	cmpq	%rax, RIP+8(%rsp)
 	je	.Lbstep_iret
-	cmpq	$.Lgs_change, RIP+8(%rsp)
+	leaq	.Lgs_change(%rip), %rcx
+	cmpq	%rcx, RIP+8(%rsp)
 	jne	.Lerror_entry_done
 
 	/*
@@ -1383,10 +1388,10 @@ ENTRY(nmi)
 	 * resume the outer NMI.
 	 */
 
-	movq	$repeat_nmi, %rdx
+	leaq	repeat_nmi(%rip), %rdx
 	cmpq	8(%rsp), %rdx
 	ja	1f
-	movq	$end_repeat_nmi, %rdx
+	leaq	end_repeat_nmi(%rip), %rdx
 	cmpq	8(%rsp), %rdx
 	ja	nested_nmi_out
 1:
@@ -1440,7 +1445,8 @@ nested_nmi:
 	pushq	%rdx
 	pushfq
 	pushq	$__KERNEL_CS
-	pushq	$repeat_nmi
+	leaq	repeat_nmi(%rip), %rdx
+	pushq	%rdx
 
 	/* Put stack back */
 	addq	$(6*8), %rsp
@@ -1479,7 +1485,9 @@ first_nmi:
 	addq	$8, (%rsp)	/* Fix up RSP */
 	pushfq			/* RFLAGS */
 	pushq	$__KERNEL_CS	/* CS */
-	pushq	$1f		/* RIP */
+	pushq	%rax		/* Support Position Independent Code */
+	leaq	1f(%rip), %rax	/* RIP */
+	xchgq	%rax, (%rsp)	/* Restore RAX, put 1f */
 	INTERRUPT_RETURN	/* continues at repeat_nmi below */
 	UNWIND_HINT_IRET_REGS
 1:
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 06/27] x86/entry/64: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/entry/entry_64.S | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 49167258d587..15bd5530d2ae 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -194,12 +194,15 @@ entry_SYSCALL_64_fastpath:
 	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
 	movq	%r10, %rcx
 
+	/* Ensures the call is position independent */
+	leaq	sys_call_table(%rip), %r11
+
 	/*
 	 * This call instruction is handled specially in stub_ptregs_64.
 	 * It might end up jumping to the slow path.  If it jumps, RAX
 	 * and all argument registers are clobbered.
 	 */
-	call	*sys_call_table(, %rax, 8)
+	call	*(%r11, %rax, 8)
 .Lentry_SYSCALL_64_after_fastpath_call:
 
 	movq	%rax, RAX(%rsp)
@@ -334,7 +337,8 @@ ENTRY(stub_ptregs_64)
 	 * RAX stores a pointer to the C function implementing the syscall.
 	 * IRQs are on.
 	 */
-	cmpq	$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
+	leaq	.Lentry_SYSCALL_64_after_fastpath_call(%rip), %r11
+	cmpq	%r11, (%rsp)
 	jne	1f
 
 	/*
@@ -1172,7 +1176,8 @@ ENTRY(error_entry)
 	movl	%ecx, %eax			/* zero extend */
 	cmpq	%rax, RIP+8(%rsp)
 	je	.Lbstep_iret
-	cmpq	$.Lgs_change, RIP+8(%rsp)
+	leaq	.Lgs_change(%rip), %rcx
+	cmpq	%rcx, RIP+8(%rsp)
 	jne	.Lerror_entry_done
 
 	/*
@@ -1383,10 +1388,10 @@ ENTRY(nmi)
 	 * resume the outer NMI.
 	 */
 
-	movq	$repeat_nmi, %rdx
+	leaq	repeat_nmi(%rip), %rdx
 	cmpq	8(%rsp), %rdx
 	ja	1f
-	movq	$end_repeat_nmi, %rdx
+	leaq	end_repeat_nmi(%rip), %rdx
 	cmpq	8(%rsp), %rdx
 	ja	nested_nmi_out
 1:
@@ -1440,7 +1445,8 @@ nested_nmi:
 	pushq	%rdx
 	pushfq
 	pushq	$__KERNEL_CS
-	pushq	$repeat_nmi
+	leaq	repeat_nmi(%rip), %rdx
+	pushq	%rdx
 
 	/* Put stack back */
 	addq	$(6*8), %rsp
@@ -1479,7 +1485,9 @@ first_nmi:
 	addq	$8, (%rsp)	/* Fix up RSP */
 	pushfq			/* RFLAGS */
 	pushq	$__KERNEL_CS	/* CS */
-	pushq	$1f		/* RIP */
+	pushq	%rax		/* Support Position Independent Code */
+	leaq	1f(%rip), %rax	/* RIP */
+	xchgq	%rax, (%rsp)	/* Restore RAX, put 1f */
 	INTERRUPT_RETURN	/* continues at repeat_nmi below */
 	UNWIND_HINT_IRET_REGS
 1:
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 07/27] x86: pm-trace - Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly to use the new _ASM_GET_PTR macro instead of _ASM_MOV
so that the code is PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/pm-trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pm-trace.h b/arch/x86/include/asm/pm-trace.h
index 7b7ac42c3661..a3801261f0dd 100644
--- a/arch/x86/include/asm/pm-trace.h
+++ b/arch/x86/include/asm/pm-trace.h
@@ -7,7 +7,7 @@
 do {								\
 	if (pm_trace_enabled) {					\
 		const void *tracedata;				\
-		asm volatile(_ASM_MOV " $1f,%0\n"		\
+		asm volatile(_ASM_GET_PTR(1f, %0) "\n"		\
 			     ".section .tracedata,\"a\"\n"	\
 			     "1:\t.word %c1\n\t"		\
 			     _ASM_PTR " %c2\n"			\
-- 
2.14.2.920.gcf0c67979c-goog
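
Roughly the code generation difference on x86_64, ignoring the exact
whitespace produced by the stringification helpers; %rdi stands in for
whatever register the compiler assigns to %0:

	/* before: absolute address as a sign-extended 32-bit immediate,
	 * which only works with mcmodel=kernel */
	movq	$1f, %rdi

	/* after: RIP-relative, no absolute relocation in the text section */
	leaq	1f(%rip), %rdi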


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 07/27] x86: pm-trace - Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly to use the new _ASM_GET_PTR macro instead of _ASM_MOV
so that the code is PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/pm-trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pm-trace.h b/arch/x86/include/asm/pm-trace.h
index 7b7ac42c3661..a3801261f0dd 100644
--- a/arch/x86/include/asm/pm-trace.h
+++ b/arch/x86/include/asm/pm-trace.h
@@ -7,7 +7,7 @@
 do {								\
 	if (pm_trace_enabled) {					\
 		const void *tracedata;				\
-		asm volatile(_ASM_MOV " $1f,%0\n"		\
+		asm volatile(_ASM_GET_PTR(1f, %0) "\n"		\
 			     ".section .tracedata,\"a\"\n"	\
 			     "1:\t.word %c1\n\t"		\
 			     _ASM_PTR " %c2\n"			\
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 07/27] x86: pm-trace - Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (14 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly to use the new _ASM_GET_PTR macro instead of _ASM_MOV
so that the code is PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/pm-trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pm-trace.h b/arch/x86/include/asm/pm-trace.h
index 7b7ac42c3661..a3801261f0dd 100644
--- a/arch/x86/include/asm/pm-trace.h
+++ b/arch/x86/include/asm/pm-trace.h
@@ -7,7 +7,7 @@
 do {								\
 	if (pm_trace_enabled) {					\
 		const void *tracedata;				\
-		asm volatile(_ASM_MOV " $1f,%0\n"		\
+		asm volatile(_ASM_GET_PTR(1f, %0) "\n"		\
 			     ".section .tracedata,\"a\"\n"	\
 			     "1:\t.word %c1\n\t"		\
 			     _ASM_PTR " %c2\n"			\
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 07/27] x86: pm-trace - Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Change the assembly to use the new _ASM_GET_PTR macro instead of _ASM_MOV
so that the code is PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/pm-trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pm-trace.h b/arch/x86/include/asm/pm-trace.h
index 7b7ac42c3661..a3801261f0dd 100644
--- a/arch/x86/include/asm/pm-trace.h
+++ b/arch/x86/include/asm/pm-trace.h
@@ -7,7 +7,7 @@
 do {								\
 	if (pm_trace_enabled) {					\
 		const void *tracedata;				\
-		asm volatile(_ASM_MOV " $1f,%0\n"		\
+		asm volatile(_ASM_GET_PTR(1f, %0) "\n"		\
 			     ".section .tracedata,\"a\"\n"	\
 			     "1:\t.word %c1\n\t"		\
 			     _ASM_PTR " %c2\n"			\
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 08/27] x86/CPU: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible. Use the new _ASM_GET_PTR
macro instead of the 'mov $symbol, %dst' construct to avoid absolute
references.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/processor.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index b446c5a082ad..b09bd50b06c7 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -49,7 +49,7 @@ static inline void *current_text_addr(void)
 {
 	void *pc;
 
-	asm volatile("mov $1f, %0; 1:":"=r" (pc));
+	asm volatile(_ASM_GET_PTR(1f, %0) "; 1:":"=r" (pc));
 
 	return pc;
 }
@@ -695,6 +695,7 @@ static inline void sync_core(void)
 		: ASM_CALL_CONSTRAINT : : "memory");
 #else
 	unsigned int tmp;
+	unsigned long tmp2;
 
 	asm volatile (
 		UNWIND_HINT_SAVE
@@ -705,11 +706,13 @@ static inline void sync_core(void)
 		"pushfq\n\t"
 		"mov %%cs, %0\n\t"
 		"pushq %q0\n\t"
-		"pushq $1f\n\t"
+		"leaq 1f(%%rip), %1\n\t"
+		"pushq %1\n\t"
 		"iretq\n\t"
 		UNWIND_HINT_RESTORE
 		"1:"
-		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
+		: "=&r" (tmp), "=&r" (tmp2), ASM_CALL_CONSTRAINT
+		: : "cc", "memory");
 #endif
 }
 
-- 
2.14.2.920.gcf0c67979c-goog
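
For reference, a sketch of what the current_text_addr() hunk expands to on
x86_64 once _ASM_GET_PTR is substituted (the real expansion differs only
in whitespace added by the stringification helpers):

	static inline void *current_text_addr(void)
	{
		void *pc;

		/* was: asm volatile("mov $1f, %0; 1:" : "=r" (pc)); */
		asm volatile("leaq (1f)(%%rip), %0 ; 1:" : "=r" (pc));

		return pc;
	}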


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 08/27] x86/CPU: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible. Use the new _ASM_GET_PTR
macro instead of the 'mov $symbol, %dst' construct to avoid absolute
references.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/processor.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index b446c5a082ad..b09bd50b06c7 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -49,7 +49,7 @@ static inline void *current_text_addr(void)
 {
 	void *pc;
 
-	asm volatile("mov $1f, %0; 1:":"=r" (pc));
+	asm volatile(_ASM_GET_PTR(1f, %0) "; 1:":"=r" (pc));
 
 	return pc;
 }
@@ -695,6 +695,7 @@ static inline void sync_core(void)
 		: ASM_CALL_CONSTRAINT : : "memory");
 #else
 	unsigned int tmp;
+	unsigned long tmp2;
 
 	asm volatile (
 		UNWIND_HINT_SAVE
@@ -705,11 +706,13 @@ static inline void sync_core(void)
 		"pushfq\n\t"
 		"mov %%cs, %0\n\t"
 		"pushq %q0\n\t"
-		"pushq $1f\n\t"
+		"leaq 1f(%%rip), %1\n\t"
+		"pushq %1\n\t"
 		"iretq\n\t"
 		UNWIND_HINT_RESTORE
 		"1:"
-		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
+		: "=&r" (tmp), "=&r" (tmp2), ASM_CALL_CONSTRAINT
+		: : "cc", "memory");
 #endif
 }
 
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 08/27] x86/CPU: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (16 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible. Use the new _ASM_GET_PTR
macro instead of the 'mov $symbol, %dst' construct to avoid absolute
references.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/processor.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index b446c5a082ad..b09bd50b06c7 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -49,7 +49,7 @@ static inline void *current_text_addr(void)
 {
 	void *pc;
 
-	asm volatile("mov $1f, %0; 1:":"=r" (pc));
+	asm volatile(_ASM_GET_PTR(1f, %0) "; 1:":"=r" (pc));
 
 	return pc;
 }
@@ -695,6 +695,7 @@ static inline void sync_core(void)
 		: ASM_CALL_CONSTRAINT : : "memory");
 #else
 	unsigned int tmp;
+	unsigned long tmp2;
 
 	asm volatile (
 		UNWIND_HINT_SAVE
@@ -705,11 +706,13 @@ static inline void sync_core(void)
 		"pushfq\n\t"
 		"mov %%cs, %0\n\t"
 		"pushq %q0\n\t"
-		"pushq $1f\n\t"
+		"leaq 1f(%%rip), %1\n\t"
+		"pushq %1\n\t"
 		"iretq\n\t"
 		UNWIND_HINT_RESTORE
 		"1:"
-		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
+		: "=&r" (tmp), "=&r" (tmp2), ASM_CALL_CONSTRAINT
+		: : "cc", "memory");
 #endif
 }
 
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 08/27] x86/CPU: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible. Use the new _ASM_GET_PTR
macro instead of the 'mov $symbol, %dst' construct to avoid absolute
references.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/processor.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index b446c5a082ad..b09bd50b06c7 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -49,7 +49,7 @@ static inline void *current_text_addr(void)
 {
 	void *pc;
 
-	asm volatile("mov $1f, %0; 1:":"=r" (pc));
+	asm volatile(_ASM_GET_PTR(1f, %0) "; 1:":"=r" (pc));
 
 	return pc;
 }
@@ -695,6 +695,7 @@ static inline void sync_core(void)
 		: ASM_CALL_CONSTRAINT : : "memory");
 #else
 	unsigned int tmp;
+	unsigned long tmp2;
 
 	asm volatile (
 		UNWIND_HINT_SAVE
@@ -705,11 +706,13 @@ static inline void sync_core(void)
 		"pushfq\n\t"
 		"mov %%cs, %0\n\t"
 		"pushq %q0\n\t"
-		"pushq $1f\n\t"
+		"leaq 1f(%%rip), %1\n\t"
+		"pushq %1\n\t"
 		"iretq\n\t"
 		UNWIND_HINT_RESTORE
 		"1:"
-		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
+		: "=&r" (tmp), "=&r" (tmp2), ASM_CALL_CONSTRAINT
+		: : "cc", "memory");
 #endif
 }
 
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 09/27] x86/acpi: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/acpi/wakeup_64.S | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index 50b8ed0317a3..472659c0f811 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -14,7 +14,7 @@
 	 * Hooray, we are in Long 64-bit mode (but still running in low memory)
 	 */
 ENTRY(wakeup_long64)
-	movq	saved_magic, %rax
+	movq	saved_magic(%rip), %rax
 	movq	$0x123456789abcdef0, %rdx
 	cmpq	%rdx, %rax
 	jne	bogus_64_magic
@@ -25,14 +25,14 @@ ENTRY(wakeup_long64)
 	movw	%ax, %es
 	movw	%ax, %fs
 	movw	%ax, %gs
-	movq	saved_rsp, %rsp
+	movq	saved_rsp(%rip), %rsp
 
-	movq	saved_rbx, %rbx
-	movq	saved_rdi, %rdi
-	movq	saved_rsi, %rsi
-	movq	saved_rbp, %rbp
+	movq	saved_rbx(%rip), %rbx
+	movq	saved_rdi(%rip), %rdi
+	movq	saved_rsi(%rip), %rsi
+	movq	saved_rbp(%rip), %rbp
 
-	movq	saved_rip, %rax
+	movq	saved_rip(%rip), %rax
 	jmp	*%rax
 ENDPROC(wakeup_long64)
 
@@ -45,7 +45,7 @@ ENTRY(do_suspend_lowlevel)
 	xorl	%eax, %eax
 	call	save_processor_state
 
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	%rsp, pt_regs_sp(%rax)
 	movq	%rbp, pt_regs_bp(%rax)
 	movq	%rsi, pt_regs_si(%rax)
@@ -64,13 +64,14 @@ ENTRY(do_suspend_lowlevel)
 	pushfq
 	popq	pt_regs_flags(%rax)
 
-	movq	$.Lresume_point, saved_rip(%rip)
+	leaq	.Lresume_point(%rip), %rax
+	movq	%rax, saved_rip(%rip)
 
-	movq	%rsp, saved_rsp
-	movq	%rbp, saved_rbp
-	movq	%rbx, saved_rbx
-	movq	%rdi, saved_rdi
-	movq	%rsi, saved_rsi
+	movq	%rsp, saved_rsp(%rip)
+	movq	%rbp, saved_rbp(%rip)
+	movq	%rbx, saved_rbx(%rip)
+	movq	%rdi, saved_rdi(%rip)
+	movq	%rsi, saved_rsi(%rip)
 
 	addq	$8, %rsp
 	movl	$3, %edi
@@ -82,7 +83,7 @@ ENTRY(do_suspend_lowlevel)
 	.align 4
 .Lresume_point:
 	/* We don't restore %rax, it must be 0 anyway */
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	saved_context_cr4(%rax), %rbx
 	movq	%rbx, %cr4
 	movq	saved_context_cr3(%rax), %rbx
-- 
2.14.2.920.gcf0c67979c-goog
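
The whole patch applies two mechanical rules, sketched here with a
hypothetical global named var (%rax is only an example destination):

	/* loading the value of a global: make the addressing RIP-relative */
	movq	var(%rip), %rax		/* was: movq var, %rax */

	/* taking the address of a global: lea instead of an absolute immediate */
	leaq	var(%rip), %rax		/* was: movq $var, %rax */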


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 09/27] x86/acpi: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/acpi/wakeup_64.S | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index 50b8ed0317a3..472659c0f811 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -14,7 +14,7 @@
 	 * Hooray, we are in Long 64-bit mode (but still running in low memory)
 	 */
 ENTRY(wakeup_long64)
-	movq	saved_magic, %rax
+	movq	saved_magic(%rip), %rax
 	movq	$0x123456789abcdef0, %rdx
 	cmpq	%rdx, %rax
 	jne	bogus_64_magic
@@ -25,14 +25,14 @@ ENTRY(wakeup_long64)
 	movw	%ax, %es
 	movw	%ax, %fs
 	movw	%ax, %gs
-	movq	saved_rsp, %rsp
+	movq	saved_rsp(%rip), %rsp
 
-	movq	saved_rbx, %rbx
-	movq	saved_rdi, %rdi
-	movq	saved_rsi, %rsi
-	movq	saved_rbp, %rbp
+	movq	saved_rbx(%rip), %rbx
+	movq	saved_rdi(%rip), %rdi
+	movq	saved_rsi(%rip), %rsi
+	movq	saved_rbp(%rip), %rbp
 
-	movq	saved_rip, %rax
+	movq	saved_rip(%rip), %rax
 	jmp	*%rax
 ENDPROC(wakeup_long64)
 
@@ -45,7 +45,7 @@ ENTRY(do_suspend_lowlevel)
 	xorl	%eax, %eax
 	call	save_processor_state
 
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	%rsp, pt_regs_sp(%rax)
 	movq	%rbp, pt_regs_bp(%rax)
 	movq	%rsi, pt_regs_si(%rax)
@@ -64,13 +64,14 @@ ENTRY(do_suspend_lowlevel)
 	pushfq
 	popq	pt_regs_flags(%rax)
 
-	movq	$.Lresume_point, saved_rip(%rip)
+	leaq	.Lresume_point(%rip), %rax
+	movq	%rax, saved_rip(%rip)
 
-	movq	%rsp, saved_rsp
-	movq	%rbp, saved_rbp
-	movq	%rbx, saved_rbx
-	movq	%rdi, saved_rdi
-	movq	%rsi, saved_rsi
+	movq	%rsp, saved_rsp(%rip)
+	movq	%rbp, saved_rbp(%rip)
+	movq	%rbx, saved_rbx(%rip)
+	movq	%rdi, saved_rdi(%rip)
+	movq	%rsi, saved_rsi(%rip)
 
 	addq	$8, %rsp
 	movl	$3, %edi
@@ -82,7 +83,7 @@ ENTRY(do_suspend_lowlevel)
 	.align 4
 .Lresume_point:
 	/* We don't restore %rax, it must be 0 anyway */
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	saved_context_cr4(%rax), %rbx
 	movq	%rbx, %cr4
 	movq	saved_context_cr3(%rax), %rbx
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 09/27] x86/acpi: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (17 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/acpi/wakeup_64.S | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index 50b8ed0317a3..472659c0f811 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -14,7 +14,7 @@
 	 * Hooray, we are in Long 64-bit mode (but still running in low memory)
 	 */
 ENTRY(wakeup_long64)
-	movq	saved_magic, %rax
+	movq	saved_magic(%rip), %rax
 	movq	$0x123456789abcdef0, %rdx
 	cmpq	%rdx, %rax
 	jne	bogus_64_magic
@@ -25,14 +25,14 @@ ENTRY(wakeup_long64)
 	movw	%ax, %es
 	movw	%ax, %fs
 	movw	%ax, %gs
-	movq	saved_rsp, %rsp
+	movq	saved_rsp(%rip), %rsp
 
-	movq	saved_rbx, %rbx
-	movq	saved_rdi, %rdi
-	movq	saved_rsi, %rsi
-	movq	saved_rbp, %rbp
+	movq	saved_rbx(%rip), %rbx
+	movq	saved_rdi(%rip), %rdi
+	movq	saved_rsi(%rip), %rsi
+	movq	saved_rbp(%rip), %rbp
 
-	movq	saved_rip, %rax
+	movq	saved_rip(%rip), %rax
 	jmp	*%rax
 ENDPROC(wakeup_long64)
 
@@ -45,7 +45,7 @@ ENTRY(do_suspend_lowlevel)
 	xorl	%eax, %eax
 	call	save_processor_state
 
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	%rsp, pt_regs_sp(%rax)
 	movq	%rbp, pt_regs_bp(%rax)
 	movq	%rsi, pt_regs_si(%rax)
@@ -64,13 +64,14 @@ ENTRY(do_suspend_lowlevel)
 	pushfq
 	popq	pt_regs_flags(%rax)
 
-	movq	$.Lresume_point, saved_rip(%rip)
+	leaq	.Lresume_point(%rip), %rax
+	movq	%rax, saved_rip(%rip)
 
-	movq	%rsp, saved_rsp
-	movq	%rbp, saved_rbp
-	movq	%rbx, saved_rbx
-	movq	%rdi, saved_rdi
-	movq	%rsi, saved_rsi
+	movq	%rsp, saved_rsp(%rip)
+	movq	%rbp, saved_rbp(%rip)
+	movq	%rbx, saved_rbx(%rip)
+	movq	%rdi, saved_rdi(%rip)
+	movq	%rsi, saved_rsi(%rip)
 
 	addq	$8, %rsp
 	movl	$3, %edi
@@ -82,7 +83,7 @@ ENTRY(do_suspend_lowlevel)
 	.align 4
 .Lresume_point:
 	/* We don't restore %rax, it must be 0 anyway */
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	saved_context_cr4(%rax), %rbx
 	movq	%rbx, %cr4
 	movq	saved_context_cr3(%rax), %rbx
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 09/27] x86/acpi: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Change the assembly code to use only relative references to symbols, as
required for the kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/acpi/wakeup_64.S | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index 50b8ed0317a3..472659c0f811 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -14,7 +14,7 @@
 	 * Hooray, we are in Long 64-bit mode (but still running in low memory)
 	 */
 ENTRY(wakeup_long64)
-	movq	saved_magic, %rax
+	movq	saved_magic(%rip), %rax
 	movq	$0x123456789abcdef0, %rdx
 	cmpq	%rdx, %rax
 	jne	bogus_64_magic
@@ -25,14 +25,14 @@ ENTRY(wakeup_long64)
 	movw	%ax, %es
 	movw	%ax, %fs
 	movw	%ax, %gs
-	movq	saved_rsp, %rsp
+	movq	saved_rsp(%rip), %rsp
 
-	movq	saved_rbx, %rbx
-	movq	saved_rdi, %rdi
-	movq	saved_rsi, %rsi
-	movq	saved_rbp, %rbp
+	movq	saved_rbx(%rip), %rbx
+	movq	saved_rdi(%rip), %rdi
+	movq	saved_rsi(%rip), %rsi
+	movq	saved_rbp(%rip), %rbp
 
-	movq	saved_rip, %rax
+	movq	saved_rip(%rip), %rax
 	jmp	*%rax
 ENDPROC(wakeup_long64)
 
@@ -45,7 +45,7 @@ ENTRY(do_suspend_lowlevel)
 	xorl	%eax, %eax
 	call	save_processor_state
 
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	%rsp, pt_regs_sp(%rax)
 	movq	%rbp, pt_regs_bp(%rax)
 	movq	%rsi, pt_regs_si(%rax)
@@ -64,13 +64,14 @@ ENTRY(do_suspend_lowlevel)
 	pushfq
 	popq	pt_regs_flags(%rax)
 
-	movq	$.Lresume_point, saved_rip(%rip)
+	leaq	.Lresume_point(%rip), %rax
+	movq	%rax, saved_rip(%rip)
 
-	movq	%rsp, saved_rsp
-	movq	%rbp, saved_rbp
-	movq	%rbx, saved_rbx
-	movq	%rdi, saved_rdi
-	movq	%rsi, saved_rsi
+	movq	%rsp, saved_rsp(%rip)
+	movq	%rbp, saved_rbp(%rip)
+	movq	%rbx, saved_rbx(%rip)
+	movq	%rdi, saved_rdi(%rip)
+	movq	%rsi, saved_rsi(%rip)
 
 	addq	$8, %rsp
 	movl	$3, %edi
@@ -82,7 +83,7 @@ ENTRY(do_suspend_lowlevel)
 	.align 4
 .Lresume_point:
 	/* We don't restore %rax, it must be 0 anyway */
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	saved_context_cr4(%rax), %rbx
 	movq	%rbx, %cr4
 	movq	saved_context_cr3(%rax), %rbx
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 10/27] x86/boot/64: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols so
the kernel can be PIE compatible.

Early in boot, the kernel is mapped at a temporary address while preparing
the page table. To know the changes needed for the page table with KASLR,
the boot code calculates the difference between the expected address of
the kernel and the one chosen by KASLR. This does not work with PIE
because all symbol references in code are relative: instead of the future
relocated virtual address, they yield the current temporary mapping. The
solution is to use global variables that will be relocated as expected.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
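
As a hedged illustration of that point, here is a hypothetical user-space
translation unit (names assumed, not taken from the patch). Compiling it
with "gcc -O2 -fpie -c demo.c" and running "readelf -r demo.o" should show
a PC-relative relocation for the address taken in code and a 64-bit
absolute relocation for the initialized data word; only the latter is
patched by a relocation pass to the final KASLR address, which is the
property the boot code relies on while it still runs from the temporary
mapping.

char early_top_pgt[4096];               /* stand-in for the real symbol */

unsigned long pgt_from_code(void)
{
        /* Emitted as "lea early_top_pgt(%rip), ...": current mapping. */
        return (unsigned long)early_top_pgt;
}

/* Carries an absolute relocation, like the ".quad" globals in this patch. */
unsigned long pgt_from_data = (unsigned long)early_top_pgt;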

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/head_64.S | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 42e32c2e51bb..32d1899f48df 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -86,8 +86,21 @@ startup_64:
 	popq	%rsi
 
 	/* Form the CR3 value being sure to include the CR3 modifier */
-	addq	$(early_top_pgt - __START_KERNEL_map), %rax
+	addq    _early_top_pgt_offset(%rip), %rax
 	jmp 1f
+
+	/*
+	 * Position Independent Code takes only relative references in code
+	 * meaning a global variable address is relative to RIP and not its
+	 * future virtual address. Global variables can be used instead as they
+	 * are still relocated on the expected kernel mapping address.
+	 */
+	.align 8
+_early_top_pgt_offset:
+	.quad early_top_pgt - __START_KERNEL_map
+_init_top_offset:
+	.quad init_top_pgt - __START_KERNEL_map
+
 ENTRY(secondary_startup_64)
 	UNWIND_HINT_EMPTY
 	/*
@@ -116,7 +129,7 @@ ENTRY(secondary_startup_64)
 	popq	%rsi
 
 	/* Form the CR3 value being sure to include the CR3 modifier */
-	addq	$(init_top_pgt - __START_KERNEL_map), %rax
+	addq    _init_top_offset(%rip), %rax
 1:
 
 	/* Enable PAE mode, PGE and LA57 */
@@ -131,7 +144,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr3
 
 	/* Ensure I am executing from virtual addresses */
-	movq	$1f, %rax
+	movabs  $1f, %rax
 	jmp	*%rax
 1:
 	UNWIND_HINT_EMPTY
@@ -230,11 +243,12 @@ ENTRY(secondary_startup_64)
 	 *	REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
 	 *		address given in m16:64.
 	 */
-	pushq	$.Lafter_lret	# put return address on stack for unwinder
+	leaq	.Lafter_lret(%rip), %rax
+	pushq	%rax		# put return address on stack for unwinder
 	xorq	%rbp, %rbp	# clear frame pointer
-	movq	initial_code(%rip), %rax
+	leaq	initial_code(%rip), %rax
 	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
+	pushq	(%rax)		# target address in negative space
 	lretq
 .Lafter_lret:
 END(secondary_startup_64)
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 10/27] x86/boot/64: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols so
the kernel can be PIE compatible.

Early in boot, the kernel is mapped at a temporary address while preparing
the page table. To know the changes needed for the page table with KASLR,
the boot code calculates the difference between the expected address of
the kernel and the one chosen by KASLR. This does not work with PIE
because all symbol references in code are relative: instead of the future
relocated virtual address, they yield the current temporary mapping. The
solution is to use global variables that will be relocated as expected.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/head_64.S | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 42e32c2e51bb..32d1899f48df 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -86,8 +86,21 @@ startup_64:
 	popq	%rsi
 
 	/* Form the CR3 value being sure to include the CR3 modifier */
-	addq	$(early_top_pgt - __START_KERNEL_map), %rax
+	addq    _early_top_pgt_offset(%rip), %rax
 	jmp 1f
+
+	/*
+	 * Position Independent Code takes only relative references in code
+	 * meaning a global variable address is relative to RIP and not its
+	 * future virtual address. Global variables can be used instead as they
+	 * are still relocated on the expected kernel mapping address.
+	 */
+	.align 8
+_early_top_pgt_offset:
+	.quad early_top_pgt - __START_KERNEL_map
+_init_top_offset:
+	.quad init_top_pgt - __START_KERNEL_map
+
 ENTRY(secondary_startup_64)
 	UNWIND_HINT_EMPTY
 	/*
@@ -116,7 +129,7 @@ ENTRY(secondary_startup_64)
 	popq	%rsi
 
 	/* Form the CR3 value being sure to include the CR3 modifier */
-	addq	$(init_top_pgt - __START_KERNEL_map), %rax
+	addq    _init_top_offset(%rip), %rax
 1:
 
 	/* Enable PAE mode, PGE and LA57 */
@@ -131,7 +144,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr3
 
 	/* Ensure I am executing from virtual addresses */
-	movq	$1f, %rax
+	movabs  $1f, %rax
 	jmp	*%rax
 1:
 	UNWIND_HINT_EMPTY
@@ -230,11 +243,12 @@ ENTRY(secondary_startup_64)
 	 *	REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
 	 *		address given in m16:64.
 	 */
-	pushq	$.Lafter_lret	# put return address on stack for unwinder
+	leaq	.Lafter_lret(%rip), %rax
+	pushq	%rax		# put return address on stack for unwinder
 	xorq	%rbp, %rbp	# clear frame pointer
-	movq	initial_code(%rip), %rax
+	leaq	initial_code(%rip), %rax
 	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
+	pushq	(%rax)		# target address in negative space
 	lretq
 .Lafter_lret:
 END(secondary_startup_64)
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 10/27] x86/boot/64: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (19 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols so
the kernel can be PIE compatible.

Early in boot, the kernel is mapped at a temporary address while preparing
the page table. To know the changes needed for the page table with KASLR,
the boot code calculates the difference between the expected address of
the kernel and the one chosen by KASLR. This does not work with PIE
because all symbol references in code are relative: instead of the future
relocated virtual address, they yield the current temporary mapping. The
solution is to use global variables that will be relocated as expected.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/head_64.S | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 42e32c2e51bb..32d1899f48df 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -86,8 +86,21 @@ startup_64:
 	popq	%rsi
 
 	/* Form the CR3 value being sure to include the CR3 modifier */
-	addq	$(early_top_pgt - __START_KERNEL_map), %rax
+	addq    _early_top_pgt_offset(%rip), %rax
 	jmp 1f
+
+	/*
+	 * Position Independent Code takes only relative references in code
+	 * meaning a global variable address is relative to RIP and not its
+	 * future virtual address. Global variables can be used instead as they
+	 * are still relocated on the expected kernel mapping address.
+	 */
+	.align 8
+_early_top_pgt_offset:
+	.quad early_top_pgt - __START_KERNEL_map
+_init_top_offset:
+	.quad init_top_pgt - __START_KERNEL_map
+
 ENTRY(secondary_startup_64)
 	UNWIND_HINT_EMPTY
 	/*
@@ -116,7 +129,7 @@ ENTRY(secondary_startup_64)
 	popq	%rsi
 
 	/* Form the CR3 value being sure to include the CR3 modifier */
-	addq	$(init_top_pgt - __START_KERNEL_map), %rax
+	addq    _init_top_offset(%rip), %rax
 1:
 
 	/* Enable PAE mode, PGE and LA57 */
@@ -131,7 +144,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr3
 
 	/* Ensure I am executing from virtual addresses */
-	movq	$1f, %rax
+	movabs  $1f, %rax
 	jmp	*%rax
 1:
 	UNWIND_HINT_EMPTY
@@ -230,11 +243,12 @@ ENTRY(secondary_startup_64)
 	 *	REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
 	 *		address given in m16:64.
 	 */
-	pushq	$.Lafter_lret	# put return address on stack for unwinder
+	leaq	.Lafter_lret(%rip), %rax
+	pushq	%rax		# put return address on stack for unwinder
 	xorq	%rbp, %rbp	# clear frame pointer
-	movq	initial_code(%rip), %rax
+	leaq	initial_code(%rip), %rax
 	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
+	pushq	(%rax)		# target address in negative space
 	lretq
 .Lafter_lret:
 END(secondary_startup_64)
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 10/27] x86/boot/64: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Change the assembly code to use only relative references to symbols so
the kernel can be PIE compatible.

Early in boot, the kernel is mapped at a temporary address while preparing
the page table. To know the changes needed for the page table with KASLR,
the boot code calculates the difference between the expected address of
the kernel and the one chosen by KASLR. This does not work with PIE
because all symbol references in code are relative: instead of the future
relocated virtual address, they yield the current temporary mapping. The
solution is to use global variables that will be relocated as expected.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/head_64.S | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 42e32c2e51bb..32d1899f48df 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -86,8 +86,21 @@ startup_64:
 	popq	%rsi
 
 	/* Form the CR3 value being sure to include the CR3 modifier */
-	addq	$(early_top_pgt - __START_KERNEL_map), %rax
+	addq    _early_top_pgt_offset(%rip), %rax
 	jmp 1f
+
+	/*
+	 * Position Independent Code takes only relative references in code
+	 * meaning a global variable address is relative to RIP and not its
+	 * future virtual address. Global variables can be used instead as they
+	 * are still relocated on the expected kernel mapping address.
+	 */
+	.align 8
+_early_top_pgt_offset:
+	.quad early_top_pgt - __START_KERNEL_map
+_init_top_offset:
+	.quad init_top_pgt - __START_KERNEL_map
+
 ENTRY(secondary_startup_64)
 	UNWIND_HINT_EMPTY
 	/*
@@ -116,7 +129,7 @@ ENTRY(secondary_startup_64)
 	popq	%rsi
 
 	/* Form the CR3 value being sure to include the CR3 modifier */
-	addq	$(init_top_pgt - __START_KERNEL_map), %rax
+	addq    _init_top_offset(%rip), %rax
 1:
 
 	/* Enable PAE mode, PGE and LA57 */
@@ -131,7 +144,7 @@ ENTRY(secondary_startup_64)
 	movq	%rax, %cr3
 
 	/* Ensure I am executing from virtual addresses */
-	movq	$1f, %rax
+	movabs  $1f, %rax
 	jmp	*%rax
 1:
 	UNWIND_HINT_EMPTY
@@ -230,11 +243,12 @@ ENTRY(secondary_startup_64)
 	 *	REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
 	 *		address given in m16:64.
 	 */
-	pushq	$.Lafter_lret	# put return address on stack for unwinder
+	leaq	.Lafter_lret(%rip), %rax
+	pushq	%rax		# put return address on stack for unwinder
 	xorq	%rbp, %rbp	# clear frame pointer
-	movq	initial_code(%rip), %rax
+	leaq	initial_code(%rip), %rax
 	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
+	pushq	(%rax)		# target address in negative space
 	lretq
 .Lafter_lret:
 END(secondary_startup_64)
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 11/27] x86/power/64: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols so
the kernel can be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/power/hibernate_asm_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index ce8da3a0412c..6fdd7bbc3c33 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -24,7 +24,7 @@
 #include <asm/frame.h>
 
 ENTRY(swsusp_arch_suspend)
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	%rsp, pt_regs_sp(%rax)
 	movq	%rbp, pt_regs_bp(%rax)
 	movq	%rsi, pt_regs_si(%rax)
@@ -115,7 +115,7 @@ ENTRY(restore_registers)
 	movq	%rax, %cr4;  # turn PGE back on
 
 	/* We don't restore %rax, it must be 0 anyway */
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	pt_regs_sp(%rax), %rsp
 	movq	pt_regs_bp(%rax), %rbp
 	movq	pt_regs_si(%rax), %rsi
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 11/27] x86/power/64: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols so
the kernel can be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/power/hibernate_asm_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index ce8da3a0412c..6fdd7bbc3c33 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -24,7 +24,7 @@
 #include <asm/frame.h>
 
 ENTRY(swsusp_arch_suspend)
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	%rsp, pt_regs_sp(%rax)
 	movq	%rbp, pt_regs_bp(%rax)
 	movq	%rsi, pt_regs_si(%rax)
@@ -115,7 +115,7 @@ ENTRY(restore_registers)
 	movq	%rax, %cr4;  # turn PGE back on
 
 	/* We don't restore %rax, it must be 0 anyway */
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	pt_regs_sp(%rax), %rsp
 	movq	pt_regs_bp(%rax), %rbp
 	movq	pt_regs_si(%rax), %rsi
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 11/27] x86/power/64: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (22 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols so
the kernel can be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/power/hibernate_asm_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index ce8da3a0412c..6fdd7bbc3c33 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -24,7 +24,7 @@
 #include <asm/frame.h>
 
 ENTRY(swsusp_arch_suspend)
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	%rsp, pt_regs_sp(%rax)
 	movq	%rbp, pt_regs_bp(%rax)
 	movq	%rsi, pt_regs_si(%rax)
@@ -115,7 +115,7 @@ ENTRY(restore_registers)
 	movq	%rax, %cr4;  # turn PGE back on
 
 	/* We don't restore %rax, it must be 0 anyway */
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	pt_regs_sp(%rax), %rsp
 	movq	pt_regs_bp(%rax), %rbp
 	movq	pt_regs_si(%rax), %rsi
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 11/27] x86/power/64: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Change the assembly code to use only relative references to symbols so
the kernel can be PIE compatible.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/power/hibernate_asm_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index ce8da3a0412c..6fdd7bbc3c33 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -24,7 +24,7 @@
 #include <asm/frame.h>
 
 ENTRY(swsusp_arch_suspend)
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	%rsp, pt_regs_sp(%rax)
 	movq	%rbp, pt_regs_bp(%rax)
 	movq	%rsi, pt_regs_si(%rax)
@@ -115,7 +115,7 @@ ENTRY(restore_registers)
 	movq	%rax, %cr4;  # turn PGE back on
 
 	/* We don't restore %rax, it must be 0 anyway */
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	pt_regs_sp(%rax), %rsp
 	movq	pt_regs_bp(%rax), %rbp
 	movq	pt_regs_si(%rax), %rsi
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 12/27] x86/paravirt: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

If PIE is enabled, switch the paravirt assembly constraints to be
compatible. The %c/i constraints generate smaller code, so they are kept
by default.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
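
A rough user-space sketch of the constraint switch (illustrative only: it
mimics a pv_ops slot with a plain function pointer, and the clobber list
and red-zone handling are simplified compared to the real macro). The
"%c"/"i" form embeds an absolute address, so it is expected to build only
without PIE; the "%a"/"p" form lets the compiler print the operand as an
address, which can be RIP-relative under -fpie.

#include <stdio.h>

static void hello(void)
{
        puts("hello");
}

/* Mimics a pv_ops slot: memory that holds a function pointer. */
void (*op)(void) = hello;

static void call_c_constraint(void)
{
        /* "%c" drops the immediate prefix, so this expands to "call *op". */
        asm volatile ("call *%c[pv]"
                      : : [pv] "i" (&op)
                      : "rax", "rcx", "rdx", "rsi", "rdi",
                        "r8", "r9", "r10", "r11", "memory");
}

static void call_a_constraint(void)
{
        /* "%a" prints an address operand; it can become "call *op(%rip)". */
        asm volatile ("call *%a[pv]"
                      : : [pv] "p" (&op)
                      : "rax", "rcx", "rdx", "rsi", "rdi",
                        "r8", "r9", "r10", "r11", "memory");
}

int main(void)
{
        call_c_constraint();    /* absolute form: non-PIE builds only */
        call_a_constraint();    /* address operand: PIE friendly */
        return 0;
}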

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/paravirt_types.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 280d94c36dad..e6961f8a74aa 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -335,9 +335,17 @@ extern struct pv_lock_ops pv_lock_ops;
 #define PARAVIRT_PATCH(x)					\
 	(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
 
+#ifdef CONFIG_X86_PIE
+#define paravirt_opptr_call "a"
+#define paravirt_opptr_type "p"
+#else
+#define paravirt_opptr_call "c"
+#define paravirt_opptr_type "i"
+#endif
+
 #define paravirt_type(op)				\
 	[paravirt_typenum] "i" (PARAVIRT_PATCH(op)),	\
-	[paravirt_opptr] "i" (&(op))
+	[paravirt_opptr] paravirt_opptr_type (&(op))
 #define paravirt_clobber(clobber)		\
 	[paravirt_clobber] "i" (clobber)
 
@@ -391,7 +399,7 @@ int paravirt_disable_iospace(void);
  * offset into the paravirt_patch_template structure, and can therefore be
  * freely converted back into a structure offset.
  */
-#define PARAVIRT_CALL	"call *%c[paravirt_opptr];"
+#define PARAVIRT_CALL	"call *%" paravirt_opptr_call "[paravirt_opptr];"
 
 /*
  * These macros are intended to wrap calls through one of the paravirt
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 12/27] x86/paravirt: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

If PIE is enabled, switch the paravirt assembly constraints to be
compatible. The %c/i constraints generate smaller code, so they are kept
by default.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/paravirt_types.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 280d94c36dad..e6961f8a74aa 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -335,9 +335,17 @@ extern struct pv_lock_ops pv_lock_ops;
 #define PARAVIRT_PATCH(x)					\
 	(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
 
+#ifdef CONFIG_X86_PIE
+#define paravirt_opptr_call "a"
+#define paravirt_opptr_type "p"
+#else
+#define paravirt_opptr_call "c"
+#define paravirt_opptr_type "i"
+#endif
+
 #define paravirt_type(op)				\
 	[paravirt_typenum] "i" (PARAVIRT_PATCH(op)),	\
-	[paravirt_opptr] "i" (&(op))
+	[paravirt_opptr] paravirt_opptr_type (&(op))
 #define paravirt_clobber(clobber)		\
 	[paravirt_clobber] "i" (clobber)
 
@@ -391,7 +399,7 @@ int paravirt_disable_iospace(void);
  * offset into the paravirt_patch_template structure, and can therefore be
  * freely converted back into a structure offset.
  */
-#define PARAVIRT_CALL	"call *%c[paravirt_opptr];"
+#define PARAVIRT_CALL	"call *%" paravirt_opptr_call "[paravirt_opptr];"
 
 /*
  * These macros are intended to wrap calls through one of the paravirt
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 12/27] x86/paravirt: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (24 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

If PIE is enabled, switch the paravirt assembly constraints to be
compatible. The %c/i constraints generate smaller code, so they are kept
by default.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/paravirt_types.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 280d94c36dad..e6961f8a74aa 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -335,9 +335,17 @@ extern struct pv_lock_ops pv_lock_ops;
 #define PARAVIRT_PATCH(x)					\
 	(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
 
+#ifdef CONFIG_X86_PIE
+#define paravirt_opptr_call "a"
+#define paravirt_opptr_type "p"
+#else
+#define paravirt_opptr_call "c"
+#define paravirt_opptr_type "i"
+#endif
+
 #define paravirt_type(op)				\
 	[paravirt_typenum] "i" (PARAVIRT_PATCH(op)),	\
-	[paravirt_opptr] "i" (&(op))
+	[paravirt_opptr] paravirt_opptr_type (&(op))
 #define paravirt_clobber(clobber)		\
 	[paravirt_clobber] "i" (clobber)
 
@@ -391,7 +399,7 @@ int paravirt_disable_iospace(void);
  * offset into the paravirt_patch_template structure, and can therefore be
  * freely converted back into a structure offset.
  */
-#define PARAVIRT_CALL	"call *%c[paravirt_opptr];"
+#define PARAVIRT_CALL	"call *%" paravirt_opptr_call "[paravirt_opptr];"
 
 /*
  * These macros are intended to wrap calls through one of the paravirt
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 12/27] x86/paravirt: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

If PIE is enabled, switch the paravirt assembly constraints to be
compatible. The %c/i constraints generate smaller code, so they are kept
by default.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/paravirt_types.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 280d94c36dad..e6961f8a74aa 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -335,9 +335,17 @@ extern struct pv_lock_ops pv_lock_ops;
 #define PARAVIRT_PATCH(x)					\
 	(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
 
+#ifdef CONFIG_X86_PIE
+#define paravirt_opptr_call "a"
+#define paravirt_opptr_type "p"
+#else
+#define paravirt_opptr_call "c"
+#define paravirt_opptr_type "i"
+#endif
+
 #define paravirt_type(op)				\
 	[paravirt_typenum] "i" (PARAVIRT_PATCH(op)),	\
-	[paravirt_opptr] "i" (&(op))
+	[paravirt_opptr] paravirt_opptr_type (&(op))
 #define paravirt_clobber(clobber)		\
 	[paravirt_clobber] "i" (clobber)
 
@@ -391,7 +399,7 @@ int paravirt_disable_iospace(void);
  * offset into the paravirt_patch_template structure, and can therefore be
  * freely converted back into a structure offset.
  */
-#define PARAVIRT_CALL	"call *%c[paravirt_opptr];"
+#define PARAVIRT_CALL	"call *%" paravirt_opptr_call "[paravirt_opptr];"
 
 /*
  * These macros are intended to wrap calls through one of the paravirt
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 13/27] x86/boot/64: Use _text in a global for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

By default, PIE-generated code creates only relative references, so _text
points to the temporary virtual address. Instead, use a global variable
so the relocation is done as expected.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
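
A small numeric sketch of the delta computation, using assumed values that
match the usual defaults rather than anything read from a real boot:

#include <stdio.h>

int main(void)
{
        unsigned long start_kernel_map = 0xffffffff80000000UL; /* assumed */
        unsigned long text             = 0xffffffff81000000UL; /* assumed _text */
        unsigned long physaddr         = 0x4000000UL;          /* where we run */

        /* What the relocated global holds: link-time _text - map base. */
        unsigned long text_offset = text - start_kernel_map;

        /* Delta the early page-table fixups must apply. */
        unsigned long load_delta = physaddr - text_offset;

        printf("_text_offset = %#lx, load_delta = %#lx\n",
               text_offset, load_delta);
        return 0;
}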

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/head64.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index bab4fa579450..675f1dba3b21 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -45,8 +45,14 @@ static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
 	return ptr - (void *)_text + (void *)physaddr;
 }
 
-unsigned long __head __startup_64(unsigned long physaddr,
-				  struct boot_params *bp)
+/*
+ * Use a global variable to properly calculate _text delta on PIE. By default
+ * a PIE binary do a RIP relative difference instead of the relocated address.
+ */
+unsigned long _text_offset = (unsigned long)(_text - __START_KERNEL_map);
+
+unsigned long __head notrace __startup_64(unsigned long physaddr,
+					  struct boot_params *bp)
 {
 	unsigned long load_delta, *p;
 	unsigned long pgtable_flags;
@@ -65,7 +71,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * Compute the delta between the address I am compiled to run at
 	 * and the address I am actually running at.
 	 */
-	load_delta = physaddr - (unsigned long)(_text - __START_KERNEL_map);
+	load_delta = physaddr - _text_offset;
 
 	/* Is the address not 2M aligned? */
 	if (load_delta & ~PMD_PAGE_MASK)
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 13/27] x86/boot/64: Use _text in a global for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

By default, PIE-generated code creates only relative references, so _text
points to the temporary virtual address. Instead, use a global variable
so the relocation is done as expected.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/head64.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index bab4fa579450..675f1dba3b21 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -45,8 +45,14 @@ static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
 	return ptr - (void *)_text + (void *)physaddr;
 }
 
-unsigned long __head __startup_64(unsigned long physaddr,
-				  struct boot_params *bp)
+/*
+ * Use a global variable to properly calculate _text delta on PIE. By default
+ * a PIE binary do a RIP relative difference instead of the relocated address.
+ */
+unsigned long _text_offset = (unsigned long)(_text - __START_KERNEL_map);
+
+unsigned long __head notrace __startup_64(unsigned long physaddr,
+					  struct boot_params *bp)
 {
 	unsigned long load_delta, *p;
 	unsigned long pgtable_flags;
@@ -65,7 +71,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * Compute the delta between the address I am compiled to run at
 	 * and the address I am actually running at.
 	 */
-	load_delta = physaddr - (unsigned long)(_text - __START_KERNEL_map);
+	load_delta = physaddr - _text_offset;
 
 	/* Is the address not 2M aligned? */
 	if (load_delta & ~PMD_PAGE_MASK)
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 13/27] x86/boot/64: Use _text in a global for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (25 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

By default, PIE-generated code creates only relative references, so _text
points to the temporary virtual address. Instead, use a global variable
so the relocation is done as expected.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/head64.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index bab4fa579450..675f1dba3b21 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -45,8 +45,14 @@ static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
 	return ptr - (void *)_text + (void *)physaddr;
 }
 
-unsigned long __head __startup_64(unsigned long physaddr,
-				  struct boot_params *bp)
+/*
+ * Use a global variable to properly calculate _text delta on PIE. By default
+ * a PIE binary do a RIP relative difference instead of the relocated address.
+ */
+unsigned long _text_offset = (unsigned long)(_text - __START_KERNEL_map);
+
+unsigned long __head notrace __startup_64(unsigned long physaddr,
+					  struct boot_params *bp)
 {
 	unsigned long load_delta, *p;
 	unsigned long pgtable_flags;
@@ -65,7 +71,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * Compute the delta between the address I am compiled to run at
 	 * and the address I am actually running at.
 	 */
-	load_delta = physaddr - (unsigned long)(_text - __START_KERNEL_map);
+	load_delta = physaddr - _text_offset;
 
 	/* Is the address not 2M aligned? */
 	if (load_delta & ~PMD_PAGE_MASK)
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 13/27] x86/boot/64: Use _text in a global for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

By default, PIE-generated code creates only relative references, so _text
points to the temporary virtual address. Instead, use a global variable
so the relocation is done as expected.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/kernel/head64.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index bab4fa579450..675f1dba3b21 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -45,8 +45,14 @@ static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
 	return ptr - (void *)_text + (void *)physaddr;
 }
 
-unsigned long __head __startup_64(unsigned long physaddr,
-				  struct boot_params *bp)
+/*
+ * Use a global variable to properly calculate _text delta on PIE. By default
+ * a PIE binary do a RIP relative difference instead of the relocated address.
+ */
+unsigned long _text_offset = (unsigned long)(_text - __START_KERNEL_map);
+
+unsigned long __head notrace __startup_64(unsigned long physaddr,
+					  struct boot_params *bp)
 {
 	unsigned long load_delta, *p;
 	unsigned long pgtable_flags;
@@ -65,7 +71,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * Compute the delta between the address I am compiled to run at
 	 * and the address I am actually running at.
 	 */
-	load_delta = physaddr - (unsigned long)(_text - __START_KERNEL_map);
+	load_delta = physaddr - _text_offset;
 
 	/* Is the address not 2M aligned? */
 	if (load_delta & ~PMD_PAGE_MASK)
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 14/27] x86/percpu: Adapt percpu for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Percpu uses a clever design where the .percpu ELF section has a virtual
address of zero and the relocation code avoids relocating specific
symbols. It makes the code simple and easily adaptable with or without
SMP support.

This design is incompatible with PIE because generated code always tries
to access the zero virtual address relative to the default mapping
address. That becomes impossible when KASLR is configured to go below
-2G. This patch solves the problem by removing the zero mapping and
adapting the GS base to be relative to the expected address. These
changes are done only when PIE is enabled; the original implementation
is kept as-is by default.

The assembly and PER_CPU macros are changed to use relative references
when PIE is enabled.

The KALLSYMS_ABSOLUTE_PERCPU configuration is disabled with PIE since
percpu symbols are not absolute in this case.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
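
The GS base adjustment can be checked with simple arithmetic. Below is a
hedged sketch with made-up addresses (none taken from a real system): the
classic layout adds a zero-based symbol offset to a GS base that points at
the per-cpu area, while the PIE layout keeps normal kernel addresses for
the symbols and shifts the GS base down by __per_cpu_start, so both forms
resolve to the same slot.

#include <assert.h>
#include <stdio.h>

int main(void)
{
        unsigned long per_cpu_area  = 0xffff88807fc00000UL; /* assumed area */
        unsigned long per_cpu_start = 0xffffffff82000000UL; /* assumed __per_cpu_start */
        unsigned long var_offset    = 0x1000UL;             /* offset of a variable */

        /* Classic: symbols are zero-based offsets, GS base = area address. */
        unsigned long classic = per_cpu_area + var_offset;

        /* PIE: symbols keep kernel addresses, GS base is shifted down. */
        unsigned long pie = (per_cpu_area - per_cpu_start)
                            + (per_cpu_start + var_offset);

        assert(classic == pie);
        printf("both resolve to %#lx\n", classic);
        return 0;
}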

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/entry/entry_64.S      |  4 ++--
 arch/x86/include/asm/percpu.h  | 25 +++++++++++++++++++------
 arch/x86/kernel/cpu/common.c   |  4 +++-
 arch/x86/kernel/head_64.S      |  4 ++++
 arch/x86/kernel/setup_percpu.c |  2 +-
 arch/x86/kernel/vmlinux.lds.S  | 13 +++++++++++--
 arch/x86/lib/cmpxchg16b_emu.S  |  8 ++++----
 arch/x86/xen/xen-asm.S         | 12 ++++++------
 init/Kconfig                   |  2 +-
 9 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 15bd5530d2ae..d3a52d2342af 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -392,7 +392,7 @@ ENTRY(__switch_to_asm)
 
 #ifdef CONFIG_CC_STACKPROTECTOR
 	movq	TASK_stack_canary(%rsi), %rbx
-	movq	%rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+	movq	%rbx, PER_CPU_VAR(irq_stack_union + stack_canary_offset)
 #endif
 
 	/* restore callee-saved registers */
@@ -808,7 +808,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work_interrupt		smp_irq_work_interrupt
 /*
  * Exception entry points.
  */
-#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss) + (TSS_ist + ((x) - 1) * 8)
+#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss + (TSS_ist + ((x) - 1) * 8))
 
 .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
 ENTRY(\sym)
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index b21a475fd7ed..07250f1099b5 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -4,9 +4,11 @@
 #ifdef CONFIG_X86_64
 #define __percpu_seg		gs
 #define __percpu_mov_op		movq
+#define __percpu_rel		(%rip)
 #else
 #define __percpu_seg		fs
 #define __percpu_mov_op		movl
+#define __percpu_rel
 #endif
 
 #ifdef __ASSEMBLY__
@@ -27,10 +29,14 @@
 #define PER_CPU(var, reg)						\
 	__percpu_mov_op %__percpu_seg:this_cpu_off, reg;		\
 	lea var(reg), reg
-#define PER_CPU_VAR(var)	%__percpu_seg:var
+/* Compatible with Position Independent Code */
+#define PER_CPU_VAR(var)		%__percpu_seg:(var)##__percpu_rel
+/* Rare absolute reference */
+#define PER_CPU_VAR_ABS(var)		%__percpu_seg:var
 #else /* ! SMP */
 #define PER_CPU(var, reg)	__percpu_mov_op $var, reg
-#define PER_CPU_VAR(var)	var
+#define PER_CPU_VAR(var)	(var)##__percpu_rel
+#define PER_CPU_VAR_ABS(var)	var
 #endif	/* SMP */
 
 #ifdef CONFIG_X86_64_SMP
@@ -208,27 +214,34 @@ do {									\
 	pfo_ret__;					\
 })
 
+/* Position Independent code uses relative addresses only */
+#ifdef CONFIG_X86_PIE
+#define __percpu_stable_arg __percpu_arg(a1)
+#else
+#define __percpu_stable_arg __percpu_arg(P1)
+#endif
+
 #define percpu_stable_op(op, var)			\
 ({							\
 	typeof(var) pfo_ret__;				\
 	switch (sizeof(var)) {				\
 	case 1:						\
-		asm(op "b "__percpu_arg(P1)",%0"	\
+		asm(op "b "__percpu_stable_arg ",%0"	\
 		    : "=q" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 2:						\
-		asm(op "w "__percpu_arg(P1)",%0"	\
+		asm(op "w "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 4:						\
-		asm(op "l "__percpu_arg(P1)",%0"	\
+		asm(op "l "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 8:						\
-		asm(op "q "__percpu_arg(P1)",%0"	\
+		asm(op "q "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c9176bae7fd8..8732c9e719d5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -461,7 +461,9 @@ void load_percpu_segment(int cpu)
 	loadsegment(fs, __KERNEL_PERCPU);
 #else
 	__loadsegment_simple(gs, 0);
-	wrmsrl(MSR_GS_BASE, (unsigned long)per_cpu(irq_stack_union.gs_base, cpu));
+	wrmsrl(MSR_GS_BASE,
+	       (unsigned long)per_cpu(irq_stack_union.gs_base, cpu) -
+	       (unsigned long)__per_cpu_start);
 #endif
 	load_stack_canary_segment();
 }
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 32d1899f48df..df5198e310fc 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -274,7 +274,11 @@ ENDPROC(start_cpu0)
 	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
 	GLOBAL(initial_gs)
+#ifdef CONFIG_X86_PIE
+	.quad	0
+#else
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
+#endif
 	GLOBAL(initial_stack)
 	/*
 	 * The SIZEOF_PTREGS gap is a convention which helps the in-kernel
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 28dafed6c682..271829a1cc38 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -25,7 +25,7 @@
 DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
 EXPORT_PER_CPU_SYMBOL(cpu_number);
 
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_X86_PIE)
 #define BOOT_PERCPU_OFFSET ((unsigned long)__per_cpu_load)
 #else
 #define BOOT_PERCPU_OFFSET 0
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index f05f00acac89..48268d059ebe 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -186,9 +186,14 @@ SECTIONS
 	/*
 	 * percpu offsets are zero-based on SMP.  PERCPU_VADDR() changes the
 	 * output PHDR, so the next output section - .init.text - should
-	 * start another segment - init.
+	 * start another segment - init. For Position Independent Code, the
+	 * per-cpu section cannot be zero-based because everything is relative.
 	 */
+#ifdef CONFIG_X86_PIE
+	PERCPU_SECTION(INTERNODE_CACHE_BYTES)
+#else
 	PERCPU_VADDR(INTERNODE_CACHE_BYTES, 0, :percpu)
+#endif
 	ASSERT(SIZEOF(.data..percpu) < CONFIG_PHYSICAL_START,
 	       "per-CPU data too large - increase CONFIG_PHYSICAL_START")
 #endif
@@ -364,7 +369,11 @@ SECTIONS
  * Per-cpu symbols which need to be offset from __per_cpu_load
  * for the boot processor.
  */
+#ifdef CONFIG_X86_PIE
+#define INIT_PER_CPU(x) init_per_cpu__##x = x
+#else
 #define INIT_PER_CPU(x) init_per_cpu__##x = x + __per_cpu_load
+#endif
 INIT_PER_CPU(gdt_page);
 INIT_PER_CPU(irq_stack_union);
 
@@ -374,7 +383,7 @@ INIT_PER_CPU(irq_stack_union);
 . = ASSERT((_end - _text <= KERNEL_IMAGE_SIZE),
 	   "kernel image bigger than KERNEL_IMAGE_SIZE");
 
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) && !defined(CONFIG_X86_PIE)
 . = ASSERT((irq_stack_union == 0),
            "irq_stack_union is not at start of per-cpu area");
 #endif
diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
index 9b330242e740..254950604ae4 100644
--- a/arch/x86/lib/cmpxchg16b_emu.S
+++ b/arch/x86/lib/cmpxchg16b_emu.S
@@ -33,13 +33,13 @@ ENTRY(this_cpu_cmpxchg16b_emu)
 	pushfq
 	cli
 
-	cmpq PER_CPU_VAR((%rsi)), %rax
+	cmpq PER_CPU_VAR_ABS((%rsi)), %rax
 	jne .Lnot_same
-	cmpq PER_CPU_VAR(8(%rsi)), %rdx
+	cmpq PER_CPU_VAR_ABS(8(%rsi)), %rdx
 	jne .Lnot_same
 
-	movq %rbx, PER_CPU_VAR((%rsi))
-	movq %rcx, PER_CPU_VAR(8(%rsi))
+	movq %rbx, PER_CPU_VAR_ABS((%rsi))
+	movq %rcx, PER_CPU_VAR_ABS(8(%rsi))
 
 	popfq
 	mov $1, %al
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index dcd31fa39b5d..495d7f42f254 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -20,7 +20,7 @@
 ENTRY(xen_irq_enable_direct)
 	FRAME_BEGIN
 	/* Unmask events */
-	movb $0, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	movb $0, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 
 	/*
 	 * Preempt here doesn't matter because that will deal with any
@@ -29,7 +29,7 @@ ENTRY(xen_irq_enable_direct)
 	 */
 
 	/* Test for pending */
-	testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
+	testb $0xff, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_pending)
 	jz 1f
 
 	call check_events
@@ -44,7 +44,7 @@ ENTRY(xen_irq_enable_direct)
  * non-zero.
  */
 ENTRY(xen_irq_disable_direct)
-	movb $1, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	movb $1, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 	ret
 ENDPROC(xen_irq_disable_direct)
 
@@ -58,7 +58,7 @@ ENDPROC(xen_irq_disable_direct)
  * x86 use opposite senses (mask vs enable).
  */
 ENTRY(xen_save_fl_direct)
-	testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	testb $0xff, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 	setz %ah
 	addb %ah, %ah
 	ret
@@ -79,7 +79,7 @@ ENTRY(xen_restore_fl_direct)
 #else
 	testb $X86_EFLAGS_IF>>8, %ah
 #endif
-	setz PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	setz PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 	/*
 	 * Preempt here doesn't matter because that will deal with any
 	 * pending interrupts.  The pending check may end up being run
@@ -87,7 +87,7 @@ ENTRY(xen_restore_fl_direct)
 	 */
 
 	/* check for unmasked and pending */
-	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
+	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_pending)
 	jnz 1f
 	call check_events
 1:
diff --git a/init/Kconfig b/init/Kconfig
index 78cb2461012e..ccb1d8daf241 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1201,7 +1201,7 @@ config KALLSYMS_ALL
 config KALLSYMS_ABSOLUTE_PERCPU
 	bool
 	depends on KALLSYMS
-	default X86_64 && SMP
+	default X86_64 && SMP && !X86_PIE
 
 config KALLSYMS_BASE_RELATIVE
 	bool
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 14/27] x86/percpu: Adapt percpu for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Percpu uses a clever design where the .data..percpu ELF section has a
virtual address of zero and the relocation code avoids relocating
specific symbols. It keeps the code simple and easily adaptable with
or without SMP support.

This design is incompatible with PIE because generated code always
tries to access the zero virtual address relative to the default
mapping address, which becomes impossible when KASLR is configured to
go below -2G. This patch solves the problem by removing the zero
mapping and adapting the GS base to be relative to the expected
address. These changes are done only when PIE is enabled. The original
implementation is kept as-is by default.

The assembly and PER_CPU macros are changed to use relative references
when PIE is enabled.

The KALLSYMS_ABSOLUTE_PERCPU configuration is disabled with PIE
because percpu symbols are not absolute in this case.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
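
(Not part of the patch: a self-contained C sketch of the GS base math
described above. Every address and the offset below are made-up
stand-ins; only __per_cpu_start and load_percpu_segment() refer to real
kernel symbols.)

/*
 * Sketch of the GS base adjustment; not kernel code.  The addresses and
 * the offset are stand-ins for __per_cpu_start and this CPU's per-cpu
 * area.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uint64_t cpu_area   = 0xffff88807f000000ULL; /* runtime per-cpu copy   */
	const uint64_t var_offset = 0x40;                  /* variable offset inside */

	/* Default build: .data..percpu is linked at 0, so a symbol address is
	 * just its offset and the GS base is the per-cpu copy address itself. */
	uint64_t sym_default = 0 + var_offset;
	uint64_t gs_default  = cpu_area;
	assert(gs_default + sym_default == cpu_area + var_offset);

	/* PIE build: the section keeps a real address, so a symbol resolves to
	 * __per_cpu_start + offset and the GS base must compensate, which is
	 * what the load_percpu_segment() change below does. */
	uint64_t per_cpu_start = 0xffffffff82000000ULL;    /* stand-in for __per_cpu_start */
	uint64_t sym_pie = per_cpu_start + var_offset;
	uint64_t gs_pie  = cpu_area - per_cpu_start;       /* wraps, like the MSR value */
	assert(gs_pie + sym_pie == cpu_area + var_offset);

	printf("both layouts resolve %%gs:var to %#llx\n",
	       (unsigned long long)(cpu_area + var_offset));
	return 0;
}

Both layouts resolve %gs:var to the same final address; only the value
written to MSR_GS_BASE changes.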

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/entry/entry_64.S      |  4 ++--
 arch/x86/include/asm/percpu.h  | 25 +++++++++++++++++++------
 arch/x86/kernel/cpu/common.c   |  4 +++-
 arch/x86/kernel/head_64.S      |  4 ++++
 arch/x86/kernel/setup_percpu.c |  2 +-
 arch/x86/kernel/vmlinux.lds.S  | 13 +++++++++++--
 arch/x86/lib/cmpxchg16b_emu.S  |  8 ++++----
 arch/x86/xen/xen-asm.S         | 12 ++++++------
 init/Kconfig                   |  2 +-
 9 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 15bd5530d2ae..d3a52d2342af 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -392,7 +392,7 @@ ENTRY(__switch_to_asm)
 
 #ifdef CONFIG_CC_STACKPROTECTOR
 	movq	TASK_stack_canary(%rsi), %rbx
-	movq	%rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+	movq	%rbx, PER_CPU_VAR(irq_stack_union + stack_canary_offset)
 #endif
 
 	/* restore callee-saved registers */
@@ -808,7 +808,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work_interrupt		smp_irq_work_interrupt
 /*
  * Exception entry points.
  */
-#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss) + (TSS_ist + ((x) - 1) * 8)
+#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss + (TSS_ist + ((x) - 1) * 8))
 
 .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
 ENTRY(\sym)
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index b21a475fd7ed..07250f1099b5 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -4,9 +4,11 @@
 #ifdef CONFIG_X86_64
 #define __percpu_seg		gs
 #define __percpu_mov_op		movq
+#define __percpu_rel		(%rip)
 #else
 #define __percpu_seg		fs
 #define __percpu_mov_op		movl
+#define __percpu_rel
 #endif
 
 #ifdef __ASSEMBLY__
@@ -27,10 +29,14 @@
 #define PER_CPU(var, reg)						\
 	__percpu_mov_op %__percpu_seg:this_cpu_off, reg;		\
 	lea var(reg), reg
-#define PER_CPU_VAR(var)	%__percpu_seg:var
+/* Compatible with Position Independent Code */
+#define PER_CPU_VAR(var)		%__percpu_seg:(var)##__percpu_rel
+/* Rare absolute reference */
+#define PER_CPU_VAR_ABS(var)		%__percpu_seg:var
 #else /* ! SMP */
 #define PER_CPU(var, reg)	__percpu_mov_op $var, reg
-#define PER_CPU_VAR(var)	var
+#define PER_CPU_VAR(var)	(var)##__percpu_rel
+#define PER_CPU_VAR_ABS(var)	var
 #endif	/* SMP */
 
 #ifdef CONFIG_X86_64_SMP
@@ -208,27 +214,34 @@ do {									\
 	pfo_ret__;					\
 })
 
+/* Position Independent code uses relative addresses only */
+#ifdef CONFIG_X86_PIE
+#define __percpu_stable_arg __percpu_arg(a1)
+#else
+#define __percpu_stable_arg __percpu_arg(P1)
+#endif
+
 #define percpu_stable_op(op, var)			\
 ({							\
 	typeof(var) pfo_ret__;				\
 	switch (sizeof(var)) {				\
 	case 1:						\
-		asm(op "b "__percpu_arg(P1)",%0"	\
+		asm(op "b "__percpu_stable_arg ",%0"	\
 		    : "=q" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 2:						\
-		asm(op "w "__percpu_arg(P1)",%0"	\
+		asm(op "w "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 4:						\
-		asm(op "l "__percpu_arg(P1)",%0"	\
+		asm(op "l "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 8:						\
-		asm(op "q "__percpu_arg(P1)",%0"	\
+		asm(op "q "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c9176bae7fd8..8732c9e719d5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -461,7 +461,9 @@ void load_percpu_segment(int cpu)
 	loadsegment(fs, __KERNEL_PERCPU);
 #else
 	__loadsegment_simple(gs, 0);
-	wrmsrl(MSR_GS_BASE, (unsigned long)per_cpu(irq_stack_union.gs_base, cpu));
+	wrmsrl(MSR_GS_BASE,
+	       (unsigned long)per_cpu(irq_stack_union.gs_base, cpu) -
+	       (unsigned long)__per_cpu_start);
 #endif
 	load_stack_canary_segment();
 }
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 32d1899f48df..df5198e310fc 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -274,7 +274,11 @@ ENDPROC(start_cpu0)
 	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
 	GLOBAL(initial_gs)
+#ifdef CONFIG_X86_PIE
+	.quad	0
+#else
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
+#endif
 	GLOBAL(initial_stack)
 	/*
 	 * The SIZEOF_PTREGS gap is a convention which helps the in-kernel
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 28dafed6c682..271829a1cc38 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -25,7 +25,7 @@
 DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
 EXPORT_PER_CPU_SYMBOL(cpu_number);
 
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_X86_PIE)
 #define BOOT_PERCPU_OFFSET ((unsigned long)__per_cpu_load)
 #else
 #define BOOT_PERCPU_OFFSET 0
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index f05f00acac89..48268d059ebe 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -186,9 +186,14 @@ SECTIONS
 	/*
 	 * percpu offsets are zero-based on SMP.  PERCPU_VADDR() changes the
 	 * output PHDR, so the next output section - .init.text - should
-	 * start another segment - init.
+	 * start another segment - init. For Position Independent Code, the
+	 * per-cpu section cannot be zero-based because everything is relative.
 	 */
+#ifdef CONFIG_X86_PIE
+	PERCPU_SECTION(INTERNODE_CACHE_BYTES)
+#else
 	PERCPU_VADDR(INTERNODE_CACHE_BYTES, 0, :percpu)
+#endif
 	ASSERT(SIZEOF(.data..percpu) < CONFIG_PHYSICAL_START,
 	       "per-CPU data too large - increase CONFIG_PHYSICAL_START")
 #endif
@@ -364,7 +369,11 @@ SECTIONS
  * Per-cpu symbols which need to be offset from __per_cpu_load
  * for the boot processor.
  */
+#ifdef CONFIG_X86_PIE
+#define INIT_PER_CPU(x) init_per_cpu__##x = x
+#else
 #define INIT_PER_CPU(x) init_per_cpu__##x = x + __per_cpu_load
+#endif
 INIT_PER_CPU(gdt_page);
 INIT_PER_CPU(irq_stack_union);
 
@@ -374,7 +383,7 @@ INIT_PER_CPU(irq_stack_union);
 . = ASSERT((_end - _text <= KERNEL_IMAGE_SIZE),
 	   "kernel image bigger than KERNEL_IMAGE_SIZE");
 
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) && !defined(CONFIG_X86_PIE)
 . = ASSERT((irq_stack_union == 0),
            "irq_stack_union is not at start of per-cpu area");
 #endif
diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
index 9b330242e740..254950604ae4 100644
--- a/arch/x86/lib/cmpxchg16b_emu.S
+++ b/arch/x86/lib/cmpxchg16b_emu.S
@@ -33,13 +33,13 @@ ENTRY(this_cpu_cmpxchg16b_emu)
 	pushfq
 	cli
 
-	cmpq PER_CPU_VAR((%rsi)), %rax
+	cmpq PER_CPU_VAR_ABS((%rsi)), %rax
 	jne .Lnot_same
-	cmpq PER_CPU_VAR(8(%rsi)), %rdx
+	cmpq PER_CPU_VAR_ABS(8(%rsi)), %rdx
 	jne .Lnot_same
 
-	movq %rbx, PER_CPU_VAR((%rsi))
-	movq %rcx, PER_CPU_VAR(8(%rsi))
+	movq %rbx, PER_CPU_VAR_ABS((%rsi))
+	movq %rcx, PER_CPU_VAR_ABS(8(%rsi))
 
 	popfq
 	mov $1, %al
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index dcd31fa39b5d..495d7f42f254 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -20,7 +20,7 @@
 ENTRY(xen_irq_enable_direct)
 	FRAME_BEGIN
 	/* Unmask events */
-	movb $0, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	movb $0, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 
 	/*
 	 * Preempt here doesn't matter because that will deal with any
@@ -29,7 +29,7 @@ ENTRY(xen_irq_enable_direct)
 	 */
 
 	/* Test for pending */
-	testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
+	testb $0xff, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_pending)
 	jz 1f
 
 	call check_events
@@ -44,7 +44,7 @@ ENTRY(xen_irq_enable_direct)
  * non-zero.
  */
 ENTRY(xen_irq_disable_direct)
-	movb $1, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	movb $1, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 	ret
 ENDPROC(xen_irq_disable_direct)
 
@@ -58,7 +58,7 @@ ENDPROC(xen_irq_disable_direct)
  * x86 use opposite senses (mask vs enable).
  */
 ENTRY(xen_save_fl_direct)
-	testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	testb $0xff, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 	setz %ah
 	addb %ah, %ah
 	ret
@@ -79,7 +79,7 @@ ENTRY(xen_restore_fl_direct)
 #else
 	testb $X86_EFLAGS_IF>>8, %ah
 #endif
-	setz PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	setz PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 	/*
 	 * Preempt here doesn't matter because that will deal with any
 	 * pending interrupts.  The pending check may end up being run
@@ -87,7 +87,7 @@ ENTRY(xen_restore_fl_direct)
 	 */
 
 	/* check for unmasked and pending */
-	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
+	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_pending)
 	jnz 1f
 	call check_events
 1:
diff --git a/init/Kconfig b/init/Kconfig
index 78cb2461012e..ccb1d8daf241 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1201,7 +1201,7 @@ config KALLSYMS_ALL
 config KALLSYMS_ABSOLUTE_PERCPU
 	bool
 	depends on KALLSYMS
-	default X86_64 && SMP
+	default X86_64 && SMP && !X86_PIE
 
 config KALLSYMS_BASE_RELATIVE
 	bool
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 14/27] x86/percpu: Adapt percpu for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (27 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Percpu uses a clever design where the .data..percpu ELF section has a
virtual address of zero and the relocation code avoids relocating
specific symbols. It keeps the code simple and easily adaptable with
or without SMP support.

This design is incompatible with PIE because generated code always
tries to access the zero virtual address relative to the default
mapping address, which becomes impossible when KASLR is configured to
go below -2G. This patch solves the problem by removing the zero
mapping and adapting the GS base to be relative to the expected
address. These changes are done only when PIE is enabled. The original
implementation is kept as-is by default.

The assembly and PER_CPU macros are changed to use relative references
when PIE is enabled.

The KALLSYMS_ABSOLUTE_PERCPU configuration is disabled with PIE
because percpu symbols are not absolute in this case.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/entry/entry_64.S      |  4 ++--
 arch/x86/include/asm/percpu.h  | 25 +++++++++++++++++++------
 arch/x86/kernel/cpu/common.c   |  4 +++-
 arch/x86/kernel/head_64.S      |  4 ++++
 arch/x86/kernel/setup_percpu.c |  2 +-
 arch/x86/kernel/vmlinux.lds.S  | 13 +++++++++++--
 arch/x86/lib/cmpxchg16b_emu.S  |  8 ++++----
 arch/x86/xen/xen-asm.S         | 12 ++++++------
 init/Kconfig                   |  2 +-
 9 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 15bd5530d2ae..d3a52d2342af 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -392,7 +392,7 @@ ENTRY(__switch_to_asm)
 
 #ifdef CONFIG_CC_STACKPROTECTOR
 	movq	TASK_stack_canary(%rsi), %rbx
-	movq	%rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+	movq	%rbx, PER_CPU_VAR(irq_stack_union + stack_canary_offset)
 #endif
 
 	/* restore callee-saved registers */
@@ -808,7 +808,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work_interrupt		smp_irq_work_interrupt
 /*
  * Exception entry points.
  */
-#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss) + (TSS_ist + ((x) - 1) * 8)
+#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss + (TSS_ist + ((x) - 1) * 8))
 
 .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
 ENTRY(\sym)
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index b21a475fd7ed..07250f1099b5 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -4,9 +4,11 @@
 #ifdef CONFIG_X86_64
 #define __percpu_seg		gs
 #define __percpu_mov_op		movq
+#define __percpu_rel		(%rip)
 #else
 #define __percpu_seg		fs
 #define __percpu_mov_op		movl
+#define __percpu_rel
 #endif
 
 #ifdef __ASSEMBLY__
@@ -27,10 +29,14 @@
 #define PER_CPU(var, reg)						\
 	__percpu_mov_op %__percpu_seg:this_cpu_off, reg;		\
 	lea var(reg), reg
-#define PER_CPU_VAR(var)	%__percpu_seg:var
+/* Compatible with Position Independent Code */
+#define PER_CPU_VAR(var)		%__percpu_seg:(var)##__percpu_rel
+/* Rare absolute reference */
+#define PER_CPU_VAR_ABS(var)		%__percpu_seg:var
 #else /* ! SMP */
 #define PER_CPU(var, reg)	__percpu_mov_op $var, reg
-#define PER_CPU_VAR(var)	var
+#define PER_CPU_VAR(var)	(var)##__percpu_rel
+#define PER_CPU_VAR_ABS(var)	var
 #endif	/* SMP */
 
 #ifdef CONFIG_X86_64_SMP
@@ -208,27 +214,34 @@ do {									\
 	pfo_ret__;					\
 })
 
+/* Position Independent code uses relative addresses only */
+#ifdef CONFIG_X86_PIE
+#define __percpu_stable_arg __percpu_arg(a1)
+#else
+#define __percpu_stable_arg __percpu_arg(P1)
+#endif
+
 #define percpu_stable_op(op, var)			\
 ({							\
 	typeof(var) pfo_ret__;				\
 	switch (sizeof(var)) {				\
 	case 1:						\
-		asm(op "b "__percpu_arg(P1)",%0"	\
+		asm(op "b "__percpu_stable_arg ",%0"	\
 		    : "=q" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 2:						\
-		asm(op "w "__percpu_arg(P1)",%0"	\
+		asm(op "w "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 4:						\
-		asm(op "l "__percpu_arg(P1)",%0"	\
+		asm(op "l "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 8:						\
-		asm(op "q "__percpu_arg(P1)",%0"	\
+		asm(op "q "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c9176bae7fd8..8732c9e719d5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -461,7 +461,9 @@ void load_percpu_segment(int cpu)
 	loadsegment(fs, __KERNEL_PERCPU);
 #else
 	__loadsegment_simple(gs, 0);
-	wrmsrl(MSR_GS_BASE, (unsigned long)per_cpu(irq_stack_union.gs_base, cpu));
+	wrmsrl(MSR_GS_BASE,
+	       (unsigned long)per_cpu(irq_stack_union.gs_base, cpu) -
+	       (unsigned long)__per_cpu_start);
 #endif
 	load_stack_canary_segment();
 }
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 32d1899f48df..df5198e310fc 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -274,7 +274,11 @@ ENDPROC(start_cpu0)
 	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
 	GLOBAL(initial_gs)
+#ifdef CONFIG_X86_PIE
+	.quad	0
+#else
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
+#endif
 	GLOBAL(initial_stack)
 	/*
 	 * The SIZEOF_PTREGS gap is a convention which helps the in-kernel
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 28dafed6c682..271829a1cc38 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -25,7 +25,7 @@
 DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
 EXPORT_PER_CPU_SYMBOL(cpu_number);
 
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_X86_PIE)
 #define BOOT_PERCPU_OFFSET ((unsigned long)__per_cpu_load)
 #else
 #define BOOT_PERCPU_OFFSET 0
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index f05f00acac89..48268d059ebe 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -186,9 +186,14 @@ SECTIONS
 	/*
 	 * percpu offsets are zero-based on SMP.  PERCPU_VADDR() changes the
 	 * output PHDR, so the next output section - .init.text - should
-	 * start another segment - init.
+	 * start another segment - init. For Position Independent Code, the
+	 * per-cpu section cannot be zero-based because everything is relative.
 	 */
+#ifdef CONFIG_X86_PIE
+	PERCPU_SECTION(INTERNODE_CACHE_BYTES)
+#else
 	PERCPU_VADDR(INTERNODE_CACHE_BYTES, 0, :percpu)
+#endif
 	ASSERT(SIZEOF(.data..percpu) < CONFIG_PHYSICAL_START,
 	       "per-CPU data too large - increase CONFIG_PHYSICAL_START")
 #endif
@@ -364,7 +369,11 @@ SECTIONS
  * Per-cpu symbols which need to be offset from __per_cpu_load
  * for the boot processor.
  */
+#ifdef CONFIG_X86_PIE
+#define INIT_PER_CPU(x) init_per_cpu__##x = x
+#else
 #define INIT_PER_CPU(x) init_per_cpu__##x = x + __per_cpu_load
+#endif
 INIT_PER_CPU(gdt_page);
 INIT_PER_CPU(irq_stack_union);
 
@@ -374,7 +383,7 @@ INIT_PER_CPU(irq_stack_union);
 . = ASSERT((_end - _text <= KERNEL_IMAGE_SIZE),
 	   "kernel image bigger than KERNEL_IMAGE_SIZE");
 
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) && !defined(CONFIG_X86_PIE)
 . = ASSERT((irq_stack_union == 0),
            "irq_stack_union is not at start of per-cpu area");
 #endif
diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
index 9b330242e740..254950604ae4 100644
--- a/arch/x86/lib/cmpxchg16b_emu.S
+++ b/arch/x86/lib/cmpxchg16b_emu.S
@@ -33,13 +33,13 @@ ENTRY(this_cpu_cmpxchg16b_emu)
 	pushfq
 	cli
 
-	cmpq PER_CPU_VAR((%rsi)), %rax
+	cmpq PER_CPU_VAR_ABS((%rsi)), %rax
 	jne .Lnot_same
-	cmpq PER_CPU_VAR(8(%rsi)), %rdx
+	cmpq PER_CPU_VAR_ABS(8(%rsi)), %rdx
 	jne .Lnot_same
 
-	movq %rbx, PER_CPU_VAR((%rsi))
-	movq %rcx, PER_CPU_VAR(8(%rsi))
+	movq %rbx, PER_CPU_VAR_ABS((%rsi))
+	movq %rcx, PER_CPU_VAR_ABS(8(%rsi))
 
 	popfq
 	mov $1, %al
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index dcd31fa39b5d..495d7f42f254 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -20,7 +20,7 @@
 ENTRY(xen_irq_enable_direct)
 	FRAME_BEGIN
 	/* Unmask events */
-	movb $0, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	movb $0, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 
 	/*
 	 * Preempt here doesn't matter because that will deal with any
@@ -29,7 +29,7 @@ ENTRY(xen_irq_enable_direct)
 	 */
 
 	/* Test for pending */
-	testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
+	testb $0xff, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_pending)
 	jz 1f
 
 	call check_events
@@ -44,7 +44,7 @@ ENTRY(xen_irq_enable_direct)
  * non-zero.
  */
 ENTRY(xen_irq_disable_direct)
-	movb $1, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	movb $1, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 	ret
 ENDPROC(xen_irq_disable_direct)
 
@@ -58,7 +58,7 @@ ENDPROC(xen_irq_disable_direct)
  * x86 use opposite senses (mask vs enable).
  */
 ENTRY(xen_save_fl_direct)
-	testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	testb $0xff, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 	setz %ah
 	addb %ah, %ah
 	ret
@@ -79,7 +79,7 @@ ENTRY(xen_restore_fl_direct)
 #else
 	testb $X86_EFLAGS_IF>>8, %ah
 #endif
-	setz PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	setz PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 	/*
 	 * Preempt here doesn't matter because that will deal with any
 	 * pending interrupts.  The pending check may end up being run
@@ -87,7 +87,7 @@ ENTRY(xen_restore_fl_direct)
 	 */
 
 	/* check for unmasked and pending */
-	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
+	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_pending)
 	jnz 1f
 	call check_events
 1:
diff --git a/init/Kconfig b/init/Kconfig
index 78cb2461012e..ccb1d8daf241 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1201,7 +1201,7 @@ config KALLSYMS_ALL
 config KALLSYMS_ABSOLUTE_PERCPU
 	bool
 	depends on KALLSYMS
-	default X86_64 && SMP
+	default X86_64 && SMP && !X86_PIE
 
 config KALLSYMS_BASE_RELATIVE
 	bool
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 14/27] x86/percpu: Adapt percpu for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Percpu uses a clever design where the .data..percpu ELF section has a
virtual address of zero and the relocation code avoids relocating
specific symbols. It keeps the code simple and easily adaptable with
or without SMP support.

This design is incompatible with PIE because generated code always
tries to access the zero virtual address relative to the default
mapping address, which becomes impossible when KASLR is configured to
go below -2G. This patch solves the problem by removing the zero
mapping and adapting the GS base to be relative to the expected
address. These changes are done only when PIE is enabled. The original
implementation is kept as-is by default.

The assembly and PER_CPU macros are changed to use relative references
when PIE is enabled.

The KALLSYMS_ABSOLUTE_PERCPU configuration is disabled with PIE
because percpu symbols are not absolute in this case.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/entry/entry_64.S      |  4 ++--
 arch/x86/include/asm/percpu.h  | 25 +++++++++++++++++++------
 arch/x86/kernel/cpu/common.c   |  4 +++-
 arch/x86/kernel/head_64.S      |  4 ++++
 arch/x86/kernel/setup_percpu.c |  2 +-
 arch/x86/kernel/vmlinux.lds.S  | 13 +++++++++++--
 arch/x86/lib/cmpxchg16b_emu.S  |  8 ++++----
 arch/x86/xen/xen-asm.S         | 12 ++++++------
 init/Kconfig                   |  2 +-
 9 files changed, 51 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 15bd5530d2ae..d3a52d2342af 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -392,7 +392,7 @@ ENTRY(__switch_to_asm)
 
 #ifdef CONFIG_CC_STACKPROTECTOR
 	movq	TASK_stack_canary(%rsi), %rbx
-	movq	%rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
+	movq	%rbx, PER_CPU_VAR(irq_stack_union + stack_canary_offset)
 #endif
 
 	/* restore callee-saved registers */
@@ -808,7 +808,7 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work_interrupt		smp_irq_work_interrupt
 /*
  * Exception entry points.
  */
-#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss) + (TSS_ist + ((x) - 1) * 8)
+#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss + (TSS_ist + ((x) - 1) * 8))
 
 .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
 ENTRY(\sym)
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index b21a475fd7ed..07250f1099b5 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -4,9 +4,11 @@
 #ifdef CONFIG_X86_64
 #define __percpu_seg		gs
 #define __percpu_mov_op		movq
+#define __percpu_rel		(%rip)
 #else
 #define __percpu_seg		fs
 #define __percpu_mov_op		movl
+#define __percpu_rel
 #endif
 
 #ifdef __ASSEMBLY__
@@ -27,10 +29,14 @@
 #define PER_CPU(var, reg)						\
 	__percpu_mov_op %__percpu_seg:this_cpu_off, reg;		\
 	lea var(reg), reg
-#define PER_CPU_VAR(var)	%__percpu_seg:var
+/* Compatible with Position Independent Code */
+#define PER_CPU_VAR(var)		%__percpu_seg:(var)##__percpu_rel
+/* Rare absolute reference */
+#define PER_CPU_VAR_ABS(var)		%__percpu_seg:var
 #else /* ! SMP */
 #define PER_CPU(var, reg)	__percpu_mov_op $var, reg
-#define PER_CPU_VAR(var)	var
+#define PER_CPU_VAR(var)	(var)##__percpu_rel
+#define PER_CPU_VAR_ABS(var)	var
 #endif	/* SMP */
 
 #ifdef CONFIG_X86_64_SMP
@@ -208,27 +214,34 @@ do {									\
 	pfo_ret__;					\
 })
 
+/* Position Independent code uses relative addresses only */
+#ifdef CONFIG_X86_PIE
+#define __percpu_stable_arg __percpu_arg(a1)
+#else
+#define __percpu_stable_arg __percpu_arg(P1)
+#endif
+
 #define percpu_stable_op(op, var)			\
 ({							\
 	typeof(var) pfo_ret__;				\
 	switch (sizeof(var)) {				\
 	case 1:						\
-		asm(op "b "__percpu_arg(P1)",%0"	\
+		asm(op "b "__percpu_stable_arg ",%0"	\
 		    : "=q" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 2:						\
-		asm(op "w "__percpu_arg(P1)",%0"	\
+		asm(op "w "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 4:						\
-		asm(op "l "__percpu_arg(P1)",%0"	\
+		asm(op "l "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
 	case 8:						\
-		asm(op "q "__percpu_arg(P1)",%0"	\
+		asm(op "q "__percpu_stable_arg ",%0"	\
 		    : "=r" (pfo_ret__)			\
 		    : "p" (&(var)));			\
 		break;					\
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c9176bae7fd8..8732c9e719d5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -461,7 +461,9 @@ void load_percpu_segment(int cpu)
 	loadsegment(fs, __KERNEL_PERCPU);
 #else
 	__loadsegment_simple(gs, 0);
-	wrmsrl(MSR_GS_BASE, (unsigned long)per_cpu(irq_stack_union.gs_base, cpu));
+	wrmsrl(MSR_GS_BASE,
+	       (unsigned long)per_cpu(irq_stack_union.gs_base, cpu) -
+	       (unsigned long)__per_cpu_start);
 #endif
 	load_stack_canary_segment();
 }
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 32d1899f48df..df5198e310fc 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -274,7 +274,11 @@ ENDPROC(start_cpu0)
 	GLOBAL(initial_code)
 	.quad	x86_64_start_kernel
 	GLOBAL(initial_gs)
+#ifdef CONFIG_X86_PIE
+	.quad	0
+#else
 	.quad	INIT_PER_CPU_VAR(irq_stack_union)
+#endif
 	GLOBAL(initial_stack)
 	/*
 	 * The SIZEOF_PTREGS gap is a convention which helps the in-kernel
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 28dafed6c682..271829a1cc38 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -25,7 +25,7 @@
 DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
 EXPORT_PER_CPU_SYMBOL(cpu_number);
 
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_X86_PIE)
 #define BOOT_PERCPU_OFFSET ((unsigned long)__per_cpu_load)
 #else
 #define BOOT_PERCPU_OFFSET 0
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index f05f00acac89..48268d059ebe 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -186,9 +186,14 @@ SECTIONS
 	/*
 	 * percpu offsets are zero-based on SMP.  PERCPU_VADDR() changes the
 	 * output PHDR, so the next output section - .init.text - should
-	 * start another segment - init.
+	 * start another segment - init. For Position Independent Code, the
+	 * per-cpu section cannot be zero-based because everything is relative.
 	 */
+#ifdef CONFIG_X86_PIE
+	PERCPU_SECTION(INTERNODE_CACHE_BYTES)
+#else
 	PERCPU_VADDR(INTERNODE_CACHE_BYTES, 0, :percpu)
+#endif
 	ASSERT(SIZEOF(.data..percpu) < CONFIG_PHYSICAL_START,
 	       "per-CPU data too large - increase CONFIG_PHYSICAL_START")
 #endif
@@ -364,7 +369,11 @@ SECTIONS
  * Per-cpu symbols which need to be offset from __per_cpu_load
  * for the boot processor.
  */
+#ifdef CONFIG_X86_PIE
+#define INIT_PER_CPU(x) init_per_cpu__##x = x
+#else
 #define INIT_PER_CPU(x) init_per_cpu__##x = x + __per_cpu_load
+#endif
 INIT_PER_CPU(gdt_page);
 INIT_PER_CPU(irq_stack_union);
 
@@ -374,7 +383,7 @@ INIT_PER_CPU(irq_stack_union);
 . = ASSERT((_end - _text <= KERNEL_IMAGE_SIZE),
 	   "kernel image bigger than KERNEL_IMAGE_SIZE");
 
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) && !defined(CONFIG_X86_PIE)
 . = ASSERT((irq_stack_union == 0),
            "irq_stack_union is not at start of per-cpu area");
 #endif
diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
index 9b330242e740..254950604ae4 100644
--- a/arch/x86/lib/cmpxchg16b_emu.S
+++ b/arch/x86/lib/cmpxchg16b_emu.S
@@ -33,13 +33,13 @@ ENTRY(this_cpu_cmpxchg16b_emu)
 	pushfq
 	cli
 
-	cmpq PER_CPU_VAR((%rsi)), %rax
+	cmpq PER_CPU_VAR_ABS((%rsi)), %rax
 	jne .Lnot_same
-	cmpq PER_CPU_VAR(8(%rsi)), %rdx
+	cmpq PER_CPU_VAR_ABS(8(%rsi)), %rdx
 	jne .Lnot_same
 
-	movq %rbx, PER_CPU_VAR((%rsi))
-	movq %rcx, PER_CPU_VAR(8(%rsi))
+	movq %rbx, PER_CPU_VAR_ABS((%rsi))
+	movq %rcx, PER_CPU_VAR_ABS(8(%rsi))
 
 	popfq
 	mov $1, %al
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index dcd31fa39b5d..495d7f42f254 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -20,7 +20,7 @@
 ENTRY(xen_irq_enable_direct)
 	FRAME_BEGIN
 	/* Unmask events */
-	movb $0, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	movb $0, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 
 	/*
 	 * Preempt here doesn't matter because that will deal with any
@@ -29,7 +29,7 @@ ENTRY(xen_irq_enable_direct)
 	 */
 
 	/* Test for pending */
-	testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
+	testb $0xff, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_pending)
 	jz 1f
 
 	call check_events
@@ -44,7 +44,7 @@ ENTRY(xen_irq_enable_direct)
  * non-zero.
  */
 ENTRY(xen_irq_disable_direct)
-	movb $1, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	movb $1, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 	ret
 ENDPROC(xen_irq_disable_direct)
 
@@ -58,7 +58,7 @@ ENDPROC(xen_irq_disable_direct)
  * x86 use opposite senses (mask vs enable).
  */
 ENTRY(xen_save_fl_direct)
-	testb $0xff, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	testb $0xff, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 	setz %ah
 	addb %ah, %ah
 	ret
@@ -79,7 +79,7 @@ ENTRY(xen_restore_fl_direct)
 #else
 	testb $X86_EFLAGS_IF>>8, %ah
 #endif
-	setz PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
+	setz PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_mask)
 	/*
 	 * Preempt here doesn't matter because that will deal with any
 	 * pending interrupts.  The pending check may end up being run
@@ -87,7 +87,7 @@ ENTRY(xen_restore_fl_direct)
 	 */
 
 	/* check for unmasked and pending */
-	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_pending
+	cmpw $0x0001, PER_CPU_VAR(xen_vcpu_info + XEN_vcpu_info_pending)
 	jnz 1f
 	call check_events
 1:
diff --git a/init/Kconfig b/init/Kconfig
index 78cb2461012e..ccb1d8daf241 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1201,7 +1201,7 @@ config KALLSYMS_ALL
 config KALLSYMS_ABSOLUTE_PERCPU
 	bool
 	depends on KALLSYMS
-	default X86_64 && SMP
+	default X86_64 && SMP && !X86_PIE
 
 config KALLSYMS_BASE_RELATIVE
 	bool
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 15/27] compiler: Option to default to hidden symbols
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Provide an option to default symbol visibility to hidden except for
key symbols. This option is disabled by default and will be used by
x86_64 PIE support to remove errors between compilation units.

Default visibility is also kept for external symbols that are compared
against each other, as they may be equal (start/end of sections). In
this case, older versions of GCC will remove the comparison if the
symbols are hidden. This issue exists at least on gcc 4.9 and earlier.
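
(Not part of the patch: a minimal user-space sketch of the scheme; the
file and symbol names are hypothetical. Built as a shared object, e.g.
gcc -O2 -fpic -shared -o demo.so demo.c, objdump -dR typically shows
the hidden symbol accessed PC-relative while the default-visibility
one still goes through the GOT.)

/* demo.c - sketch of the hidden-by-default scheme; not kernel code, and
 * the symbol names are hypothetical.  The pragma/attribute pair mirrors
 * what CONFIG_DEFAULT_HIDDEN injects through compiler.h. */
#pragma GCC visibility push(hidden)

#define __default_visibility __attribute__((visibility("default")))

int hidden_counter;                         /* hidden: direct PC-relative access */
int exported_counter __default_visibility;  /* default: may go through the GOT   */

int bump(void)
{
	/* Hidden visibility tells the compiler the definition cannot be
	 * preempted, so it can skip the GOT indirection and the matching
	 * dynamic relocation for hidden_counter. */
	return ++hidden_counter + ++exported_counter;
}

#pragma GCC visibility pop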

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/boot/boot.h                 |  2 +-
 arch/x86/include/asm/setup.h         |  2 +-
 arch/x86/kernel/cpu/microcode/core.c |  4 ++--
 drivers/base/firmware_class.c        |  4 ++--
 include/asm-generic/sections.h       |  6 ++++++
 include/linux/compiler.h             |  8 ++++++++
 init/Kconfig                         |  7 +++++++
 kernel/kallsyms.c                    | 16 ++++++++--------
 kernel/trace/trace.h                 |  4 ++--
 lib/dynamic_debug.c                  |  4 ++--
 10 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index ef5a9cc66fb8..d726c35bdd96 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -193,7 +193,7 @@ static inline bool memcmp_gs(const void *s1, addr_t s2, size_t len)
 }
 
 /* Heap -- available for dynamic lists. */
-extern char _end[];
+extern char _end[] __default_visibility;
 extern char *HEAP;
 extern char *heap_end;
 #define RESET_HEAP() ((void *)( HEAP = _end ))
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index a65cf544686a..7e0b54f605c6 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -67,7 +67,7 @@ static inline void x86_ce4100_early_setup(void) { }
  * This is set up by the setup-routine at boot-time
  */
 extern struct boot_params boot_params;
-extern char _text[];
+extern char _text[] __default_visibility;
 
 static inline bool kaslr_enabled(void)
 {
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 86e8f0b2537b..8f021783a929 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -144,8 +144,8 @@ static bool __init check_loader_disabled_bsp(void)
 	return *res;
 }
 
-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
+extern struct builtin_fw __start_builtin_fw[] __default_visibility;
+extern struct builtin_fw __end_builtin_fw[] __default_visibility;
 
 bool get_builtin_firmware(struct cpio_data *cd, const char *name)
 {
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 4b57cf5bc81d..77d4727f6594 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -45,8 +45,8 @@ MODULE_LICENSE("GPL");
 
 #ifdef CONFIG_FW_LOADER
 
-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
+extern struct builtin_fw __start_builtin_fw[] __default_visibility;
+extern struct builtin_fw __end_builtin_fw[] __default_visibility;
 
 static bool fw_get_builtin_firmware(struct firmware *fw, const char *name,
 				    void *buf, size_t size)
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index e5da44eddd2f..1aa5d6dac9e1 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -30,6 +30,9 @@
  *	__irqentry_text_start, __irqentry_text_end
  *	__softirqentry_text_start, __softirqentry_text_end
  */
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility push(default)
+#endif
 extern char _text[], _stext[], _etext[];
 extern char _data[], _sdata[], _edata[];
 extern char __bss_start[], __bss_stop[];
@@ -46,6 +49,9 @@ extern char __softirqentry_text_start[], __softirqentry_text_end[];
 
 /* Start and end of .ctors section - used for constructor calls. */
 extern char __ctors_start[], __ctors_end[];
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility pop
+#endif
 
 extern __visible const void __nosave_begin, __nosave_end;
 
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index e95a2631e545..6997716f73bf 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -78,6 +78,14 @@ extern void __chk_io_ptr(const volatile void __iomem *);
 #include <linux/compiler-clang.h>
 #endif
 
+/* Useful for Position Independent Code to reduce global references */
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility push(hidden)
+#define __default_visibility  __attribute__((visibility ("default")))
+#else
+#define __default_visibility
+#endif
+
 /*
  * Generic compiler-dependent macros required for kernel
  * build go below this comment. Actual compiler/compiler version
diff --git a/init/Kconfig b/init/Kconfig
index ccb1d8daf241..b640201fcff7 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1649,6 +1649,13 @@ config PROFILING
 config TRACEPOINTS
 	bool
 
+#
+# Default to hidden visibility for all symbols.
+# Useful for Position Independent Code to reduce global references.
+#
+config DEFAULT_HIDDEN
+	bool
+
 source "arch/Kconfig"
 
 endmenu		# General setup
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 127e7cfafa55..252019c8c3a9 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -32,24 +32,24 @@
  * These will be re-linked against their real values
  * during the second link stage.
  */
-extern const unsigned long kallsyms_addresses[] __weak;
-extern const int kallsyms_offsets[] __weak;
-extern const u8 kallsyms_names[] __weak;
+extern const unsigned long kallsyms_addresses[] __weak __default_visibility;
+extern const int kallsyms_offsets[] __weak __default_visibility;
+extern const u8 kallsyms_names[] __weak __default_visibility;
 
 /*
  * Tell the compiler that the count isn't in the small data section if the arch
  * has one (eg: FRV).
  */
 extern const unsigned long kallsyms_num_syms
-__attribute__((weak, section(".rodata")));
+__attribute__((weak, section(".rodata"))) __default_visibility;
 
 extern const unsigned long kallsyms_relative_base
-__attribute__((weak, section(".rodata")));
+__attribute__((weak, section(".rodata"))) __default_visibility;
 
-extern const u8 kallsyms_token_table[] __weak;
-extern const u16 kallsyms_token_index[] __weak;
+extern const u8 kallsyms_token_table[] __weak __default_visibility;
+extern const u16 kallsyms_token_index[] __weak __default_visibility;
 
-extern const unsigned long kallsyms_markers[] __weak;
+extern const unsigned long kallsyms_markers[] __weak __default_visibility;
 
 static inline int is_kernel_inittext(unsigned long addr)
 {
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 652c682707cd..31cb920039a2 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1742,8 +1742,8 @@ extern int trace_event_enable_disable(struct trace_event_file *file,
 				      int enable, int soft_disable);
 extern int tracing_alloc_snapshot(void);
 
-extern const char *__start___trace_bprintk_fmt[];
-extern const char *__stop___trace_bprintk_fmt[];
+extern const char *__start___trace_bprintk_fmt[] __default_visibility;
+extern const char *__stop___trace_bprintk_fmt[] __default_visibility;
 
 extern const char *__start___tracepoint_str[];
 extern const char *__stop___tracepoint_str[];
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index da796e2dc4f5..10ed20177354 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -37,8 +37,8 @@
 #include <linux/device.h>
 #include <linux/netdevice.h>
 
-extern struct _ddebug __start___verbose[];
-extern struct _ddebug __stop___verbose[];
+extern struct _ddebug __start___verbose[] __default_visibility;
+extern struct _ddebug __stop___verbose[] __default_visibility;
 
 struct ddebug_table {
 	struct list_head link;
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 15/27] compiler: Option to default to hidden symbols
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Provide an option to default symbol visibility to hidden except for
key symbols. This option is disabled by default and will be used by
x86_64 PIE support to remove errors between compilation units.

Default visibility is also kept for external symbols that are compared
against each other, as they may be equal (start/end of sections). In
this case, older versions of GCC will remove the comparison if the
symbols are hidden. This issue exists at least on gcc 4.9 and earlier.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/boot/boot.h                 |  2 +-
 arch/x86/include/asm/setup.h         |  2 +-
 arch/x86/kernel/cpu/microcode/core.c |  4 ++--
 drivers/base/firmware_class.c        |  4 ++--
 include/asm-generic/sections.h       |  6 ++++++
 include/linux/compiler.h             |  8 ++++++++
 init/Kconfig                         |  7 +++++++
 kernel/kallsyms.c                    | 16 ++++++++--------
 kernel/trace/trace.h                 |  4 ++--
 lib/dynamic_debug.c                  |  4 ++--
 10 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index ef5a9cc66fb8..d726c35bdd96 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -193,7 +193,7 @@ static inline bool memcmp_gs(const void *s1, addr_t s2, size_t len)
 }
 
 /* Heap -- available for dynamic lists. */
-extern char _end[];
+extern char _end[] __default_visibility;
 extern char *HEAP;
 extern char *heap_end;
 #define RESET_HEAP() ((void *)( HEAP = _end ))
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index a65cf544686a..7e0b54f605c6 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -67,7 +67,7 @@ static inline void x86_ce4100_early_setup(void) { }
  * This is set up by the setup-routine at boot-time
  */
 extern struct boot_params boot_params;
-extern char _text[];
+extern char _text[] __default_visibility;
 
 static inline bool kaslr_enabled(void)
 {
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 86e8f0b2537b..8f021783a929 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -144,8 +144,8 @@ static bool __init check_loader_disabled_bsp(void)
 	return *res;
 }
 
-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
+extern struct builtin_fw __start_builtin_fw[] __default_visibility;
+extern struct builtin_fw __end_builtin_fw[] __default_visibility;
 
 bool get_builtin_firmware(struct cpio_data *cd, const char *name)
 {
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 4b57cf5bc81d..77d4727f6594 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -45,8 +45,8 @@ MODULE_LICENSE("GPL");
 
 #ifdef CONFIG_FW_LOADER
 
-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
+extern struct builtin_fw __start_builtin_fw[] __default_visibility;
+extern struct builtin_fw __end_builtin_fw[] __default_visibility;
 
 static bool fw_get_builtin_firmware(struct firmware *fw, const char *name,
 				    void *buf, size_t size)
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index e5da44eddd2f..1aa5d6dac9e1 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -30,6 +30,9 @@
  *	__irqentry_text_start, __irqentry_text_end
  *	__softirqentry_text_start, __softirqentry_text_end
  */
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility push(default)
+#endif
 extern char _text[], _stext[], _etext[];
 extern char _data[], _sdata[], _edata[];
 extern char __bss_start[], __bss_stop[];
@@ -46,6 +49,9 @@ extern char __softirqentry_text_start[], __softirqentry_text_end[];
 
 /* Start and end of .ctors section - used for constructor calls. */
 extern char __ctors_start[], __ctors_end[];
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility pop
+#endif
 
 extern __visible const void __nosave_begin, __nosave_end;
 
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index e95a2631e545..6997716f73bf 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -78,6 +78,14 @@ extern void __chk_io_ptr(const volatile void __iomem *);
 #include <linux/compiler-clang.h>
 #endif
 
+/* Useful for Position Independent Code to reduce global references */
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility push(hidden)
+#define __default_visibility  __attribute__((visibility ("default")))
+#else
+#define __default_visibility
+#endif
+
 /*
  * Generic compiler-dependent macros required for kernel
  * build go below this comment. Actual compiler/compiler version
diff --git a/init/Kconfig b/init/Kconfig
index ccb1d8daf241..b640201fcff7 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1649,6 +1649,13 @@ config PROFILING
 config TRACEPOINTS
 	bool
 
+#
+# Default to hidden visibility for all symbols.
+# Useful for Position Independent Code to reduce global references.
+#
+config DEFAULT_HIDDEN
+	bool
+
 source "arch/Kconfig"
 
 endmenu		# General setup
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 127e7cfafa55..252019c8c3a9 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -32,24 +32,24 @@
  * These will be re-linked against their real values
  * during the second link stage.
  */
-extern const unsigned long kallsyms_addresses[] __weak;
-extern const int kallsyms_offsets[] __weak;
-extern const u8 kallsyms_names[] __weak;
+extern const unsigned long kallsyms_addresses[] __weak __default_visibility;
+extern const int kallsyms_offsets[] __weak __default_visibility;
+extern const u8 kallsyms_names[] __weak __default_visibility;
 
 /*
  * Tell the compiler that the count isn't in the small data section if the arch
  * has one (eg: FRV).
  */
 extern const unsigned long kallsyms_num_syms
-__attribute__((weak, section(".rodata")));
+__attribute__((weak, section(".rodata"))) __default_visibility;
 
 extern const unsigned long kallsyms_relative_base
-__attribute__((weak, section(".rodata")));
+__attribute__((weak, section(".rodata"))) __default_visibility;
 
-extern const u8 kallsyms_token_table[] __weak;
-extern const u16 kallsyms_token_index[] __weak;
+extern const u8 kallsyms_token_table[] __weak __default_visibility;
+extern const u16 kallsyms_token_index[] __weak __default_visibility;
 
-extern const unsigned long kallsyms_markers[] __weak;
+extern const unsigned long kallsyms_markers[] __weak __default_visibility;
 
 static inline int is_kernel_inittext(unsigned long addr)
 {
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 652c682707cd..31cb920039a2 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1742,8 +1742,8 @@ extern int trace_event_enable_disable(struct trace_event_file *file,
 				      int enable, int soft_disable);
 extern int tracing_alloc_snapshot(void);
 
-extern const char *__start___trace_bprintk_fmt[];
-extern const char *__stop___trace_bprintk_fmt[];
+extern const char *__start___trace_bprintk_fmt[] __default_visibility;
+extern const char *__stop___trace_bprintk_fmt[] __default_visibility;
 
 extern const char *__start___tracepoint_str[];
 extern const char *__stop___tracepoint_str[];
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index da796e2dc4f5..10ed20177354 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -37,8 +37,8 @@
 #include <linux/device.h>
 #include <linux/netdevice.h>
 
-extern struct _ddebug __start___verbose[];
-extern struct _ddebug __stop___verbose[];
+extern struct _ddebug __start___verbose[] __default_visibility;
+extern struct _ddebug __stop___verbose[] __default_visibility;
 
 struct ddebug_table {
 	struct list_head link;
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 15/27] compiler: Option to default to hidden symbols
  2017-10-04 21:19 ` Thomas Garnier
                   ` (30 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Provide an option to default symbol visibility to hidden except for
key symbols. This option is disabled by default and will be used by
x86_64 PIE support to remove errors between compilation units.

Default visibility is also kept for external symbols that are compared
against each other, as they may be equal (start/end of sections). In
this case, older versions of GCC will remove the comparison if the
symbols are hidden. This issue exists at least on gcc 4.9 and earlier.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/boot/boot.h                 |  2 +-
 arch/x86/include/asm/setup.h         |  2 +-
 arch/x86/kernel/cpu/microcode/core.c |  4 ++--
 drivers/base/firmware_class.c        |  4 ++--
 include/asm-generic/sections.h       |  6 ++++++
 include/linux/compiler.h             |  8 ++++++++
 init/Kconfig                         |  7 +++++++
 kernel/kallsyms.c                    | 16 ++++++++--------
 kernel/trace/trace.h                 |  4 ++--
 lib/dynamic_debug.c                  |  4 ++--
 10 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index ef5a9cc66fb8..d726c35bdd96 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -193,7 +193,7 @@ static inline bool memcmp_gs(const void *s1, addr_t s2, size_t len)
 }
 
 /* Heap -- available for dynamic lists. */
-extern char _end[];
+extern char _end[] __default_visibility;
 extern char *HEAP;
 extern char *heap_end;
 #define RESET_HEAP() ((void *)( HEAP = _end ))
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index a65cf544686a..7e0b54f605c6 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -67,7 +67,7 @@ static inline void x86_ce4100_early_setup(void) { }
  * This is set up by the setup-routine at boot-time
  */
 extern struct boot_params boot_params;
-extern char _text[];
+extern char _text[] __default_visibility;
 
 static inline bool kaslr_enabled(void)
 {
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 86e8f0b2537b..8f021783a929 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -144,8 +144,8 @@ static bool __init check_loader_disabled_bsp(void)
 	return *res;
 }
 
-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
+extern struct builtin_fw __start_builtin_fw[] __default_visibility;
+extern struct builtin_fw __end_builtin_fw[] __default_visibility;
 
 bool get_builtin_firmware(struct cpio_data *cd, const char *name)
 {
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 4b57cf5bc81d..77d4727f6594 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -45,8 +45,8 @@ MODULE_LICENSE("GPL");
 
 #ifdef CONFIG_FW_LOADER
 
-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
+extern struct builtin_fw __start_builtin_fw[] __default_visibility;
+extern struct builtin_fw __end_builtin_fw[] __default_visibility;
 
 static bool fw_get_builtin_firmware(struct firmware *fw, const char *name,
 				    void *buf, size_t size)
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index e5da44eddd2f..1aa5d6dac9e1 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -30,6 +30,9 @@
  *	__irqentry_text_start, __irqentry_text_end
  *	__softirqentry_text_start, __softirqentry_text_end
  */
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility push(default)
+#endif
 extern char _text[], _stext[], _etext[];
 extern char _data[], _sdata[], _edata[];
 extern char __bss_start[], __bss_stop[];
@@ -46,6 +49,9 @@ extern char __softirqentry_text_start[], __softirqentry_text_end[];
 
 /* Start and end of .ctors section - used for constructor calls. */
 extern char __ctors_start[], __ctors_end[];
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility pop
+#endif
 
 extern __visible const void __nosave_begin, __nosave_end;
 
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index e95a2631e545..6997716f73bf 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -78,6 +78,14 @@ extern void __chk_io_ptr(const volatile void __iomem *);
 #include <linux/compiler-clang.h>
 #endif
 
+/* Useful for Position Independent Code to reduce global references */
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility push(hidden)
+#define __default_visibility  __attribute__((visibility ("default")))
+#else
+#define __default_visibility
+#endif
+
 /*
  * Generic compiler-dependent macros required for kernel
  * build go below this comment. Actual compiler/compiler version
diff --git a/init/Kconfig b/init/Kconfig
index ccb1d8daf241..b640201fcff7 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1649,6 +1649,13 @@ config PROFILING
 config TRACEPOINTS
 	bool
 
+#
+# Default to hidden visibility for all symbols.
+# Useful for Position Independent Code to reduce global references.
+#
+config DEFAULT_HIDDEN
+	bool
+
 source "arch/Kconfig"
 
 endmenu		# General setup
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 127e7cfafa55..252019c8c3a9 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -32,24 +32,24 @@
  * These will be re-linked against their real values
  * during the second link stage.
  */
-extern const unsigned long kallsyms_addresses[] __weak;
-extern const int kallsyms_offsets[] __weak;
-extern const u8 kallsyms_names[] __weak;
+extern const unsigned long kallsyms_addresses[] __weak __default_visibility;
+extern const int kallsyms_offsets[] __weak __default_visibility;
+extern const u8 kallsyms_names[] __weak __default_visibility;
 
 /*
  * Tell the compiler that the count isn't in the small data section if the arch
  * has one (eg: FRV).
  */
 extern const unsigned long kallsyms_num_syms
-__attribute__((weak, section(".rodata")));
+__attribute__((weak, section(".rodata"))) __default_visibility;
 
 extern const unsigned long kallsyms_relative_base
-__attribute__((weak, section(".rodata")));
+__attribute__((weak, section(".rodata"))) __default_visibility;
 
-extern const u8 kallsyms_token_table[] __weak;
-extern const u16 kallsyms_token_index[] __weak;
+extern const u8 kallsyms_token_table[] __weak __default_visibility;
+extern const u16 kallsyms_token_index[] __weak __default_visibility;
 
-extern const unsigned long kallsyms_markers[] __weak;
+extern const unsigned long kallsyms_markers[] __weak __default_visibility;
 
 static inline int is_kernel_inittext(unsigned long addr)
 {
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 652c682707cd..31cb920039a2 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1742,8 +1742,8 @@ extern int trace_event_enable_disable(struct trace_event_file *file,
 				      int enable, int soft_disable);
 extern int tracing_alloc_snapshot(void);
 
-extern const char *__start___trace_bprintk_fmt[];
-extern const char *__stop___trace_bprintk_fmt[];
+extern const char *__start___trace_bprintk_fmt[] __default_visibility;
+extern const char *__stop___trace_bprintk_fmt[] __default_visibility;
 
 extern const char *__start___tracepoint_str[];
 extern const char *__stop___tracepoint_str[];
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index da796e2dc4f5..10ed20177354 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -37,8 +37,8 @@
 #include <linux/device.h>
 #include <linux/netdevice.h>
 
-extern struct _ddebug __start___verbose[];
-extern struct _ddebug __stop___verbose[];
+extern struct _ddebug __start___verbose[] __default_visibility;
+extern struct _ddebug __stop___verbose[] __default_visibility;
 
 struct ddebug_table {
 	struct list_head link;
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 15/27] compiler: Option to default to hidden symbols
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Provide an option to set the default symbol visibility to hidden, except
for key symbols. This option is disabled by default and will be used by
x86_64 PIE support to remove errors between compilation units.

Default visibility is also kept for external symbols that are compared
against each other, as they may be equal (e.g. the start/end markers of a
section). Older versions of GCC, at least up to gcc 4.9, remove such a
comparison when the symbols are hidden.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/boot/boot.h                 |  2 +-
 arch/x86/include/asm/setup.h         |  2 +-
 arch/x86/kernel/cpu/microcode/core.c |  4 ++--
 drivers/base/firmware_class.c        |  4 ++--
 include/asm-generic/sections.h       |  6 ++++++
 include/linux/compiler.h             |  8 ++++++++
 init/Kconfig                         |  7 +++++++
 kernel/kallsyms.c                    | 16 ++++++++--------
 kernel/trace/trace.h                 |  4 ++--
 lib/dynamic_debug.c                  |  4 ++--
 10 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index ef5a9cc66fb8..d726c35bdd96 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -193,7 +193,7 @@ static inline bool memcmp_gs(const void *s1, addr_t s2, size_t len)
 }
 
 /* Heap -- available for dynamic lists. */
-extern char _end[];
+extern char _end[] __default_visibility;
 extern char *HEAP;
 extern char *heap_end;
 #define RESET_HEAP() ((void *)( HEAP = _end ))
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index a65cf544686a..7e0b54f605c6 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -67,7 +67,7 @@ static inline void x86_ce4100_early_setup(void) { }
  * This is set up by the setup-routine at boot-time
  */
 extern struct boot_params boot_params;
-extern char _text[];
+extern char _text[] __default_visibility;
 
 static inline bool kaslr_enabled(void)
 {
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 86e8f0b2537b..8f021783a929 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -144,8 +144,8 @@ static bool __init check_loader_disabled_bsp(void)
 	return *res;
 }
 
-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
+extern struct builtin_fw __start_builtin_fw[] __default_visibility;
+extern struct builtin_fw __end_builtin_fw[] __default_visibility;
 
 bool get_builtin_firmware(struct cpio_data *cd, const char *name)
 {
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 4b57cf5bc81d..77d4727f6594 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -45,8 +45,8 @@ MODULE_LICENSE("GPL");
 
 #ifdef CONFIG_FW_LOADER
 
-extern struct builtin_fw __start_builtin_fw[];
-extern struct builtin_fw __end_builtin_fw[];
+extern struct builtin_fw __start_builtin_fw[] __default_visibility;
+extern struct builtin_fw __end_builtin_fw[] __default_visibility;
 
 static bool fw_get_builtin_firmware(struct firmware *fw, const char *name,
 				    void *buf, size_t size)
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index e5da44eddd2f..1aa5d6dac9e1 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -30,6 +30,9 @@
  *	__irqentry_text_start, __irqentry_text_end
  *	__softirqentry_text_start, __softirqentry_text_end
  */
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility push(default)
+#endif
 extern char _text[], _stext[], _etext[];
 extern char _data[], _sdata[], _edata[];
 extern char __bss_start[], __bss_stop[];
@@ -46,6 +49,9 @@ extern char __softirqentry_text_start[], __softirqentry_text_end[];
 
 /* Start and end of .ctors section - used for constructor calls. */
 extern char __ctors_start[], __ctors_end[];
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility pop
+#endif
 
 extern __visible const void __nosave_begin, __nosave_end;
 
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index e95a2631e545..6997716f73bf 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -78,6 +78,14 @@ extern void __chk_io_ptr(const volatile void __iomem *);
 #include <linux/compiler-clang.h>
 #endif
 
+/* Useful for Position Independent Code to reduce global references */
+#ifdef CONFIG_DEFAULT_HIDDEN
+#pragma GCC visibility push(hidden)
+#define __default_visibility  __attribute__((visibility ("default")))
+#else
+#define __default_visibility
+#endif
+
 /*
  * Generic compiler-dependent macros required for kernel
  * build go below this comment. Actual compiler/compiler version
diff --git a/init/Kconfig b/init/Kconfig
index ccb1d8daf241..b640201fcff7 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1649,6 +1649,13 @@ config PROFILING
 config TRACEPOINTS
 	bool
 
+#
+# Default to hidden visibility for all symbols.
+# Useful for Position Independent Code to reduce global references.
+#
+config DEFAULT_HIDDEN
+	bool
+
 source "arch/Kconfig"
 
 endmenu		# General setup
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 127e7cfafa55..252019c8c3a9 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -32,24 +32,24 @@
  * These will be re-linked against their real values
  * during the second link stage.
  */
-extern const unsigned long kallsyms_addresses[] __weak;
-extern const int kallsyms_offsets[] __weak;
-extern const u8 kallsyms_names[] __weak;
+extern const unsigned long kallsyms_addresses[] __weak __default_visibility;
+extern const int kallsyms_offsets[] __weak __default_visibility;
+extern const u8 kallsyms_names[] __weak __default_visibility;
 
 /*
  * Tell the compiler that the count isn't in the small data section if the arch
  * has one (eg: FRV).
  */
 extern const unsigned long kallsyms_num_syms
-__attribute__((weak, section(".rodata")));
+__attribute__((weak, section(".rodata"))) __default_visibility;
 
 extern const unsigned long kallsyms_relative_base
-__attribute__((weak, section(".rodata")));
+__attribute__((weak, section(".rodata"))) __default_visibility;
 
-extern const u8 kallsyms_token_table[] __weak;
-extern const u16 kallsyms_token_index[] __weak;
+extern const u8 kallsyms_token_table[] __weak __default_visibility;
+extern const u16 kallsyms_token_index[] __weak __default_visibility;
 
-extern const unsigned long kallsyms_markers[] __weak;
+extern const unsigned long kallsyms_markers[] __weak __default_visibility;
 
 static inline int is_kernel_inittext(unsigned long addr)
 {
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 652c682707cd..31cb920039a2 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1742,8 +1742,8 @@ extern int trace_event_enable_disable(struct trace_event_file *file,
 				      int enable, int soft_disable);
 extern int tracing_alloc_snapshot(void);
 
-extern const char *__start___trace_bprintk_fmt[];
-extern const char *__stop___trace_bprintk_fmt[];
+extern const char *__start___trace_bprintk_fmt[] __default_visibility;
+extern const char *__stop___trace_bprintk_fmt[] __default_visibility;
 
 extern const char *__start___tracepoint_str[];
 extern const char *__stop___tracepoint_str[];
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index da796e2dc4f5..10ed20177354 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -37,8 +37,8 @@
 #include <linux/device.h>
 #include <linux/netdevice.h>
 
-extern struct _ddebug __start___verbose[];
-extern struct _ddebug __stop___verbose[];
+extern struct _ddebug __start___verbose[] __default_visibility;
+extern struct _ddebug __stop___verbose[] __default_visibility;
 
 struct ddebug_table {
 	struct list_head link;
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 16/27] x86/relocs: Handle PIE relocations
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the relocation tool to correctly handle relocations generated by
the -fPIE option:

 - Add a relocation for each entry of the .got section, since the linker
   does not generate R_X86_64_GLOB_DAT relocations on a simple link.
 - Ignore R_X86_64_GOTPCREL and R_X86_64_PLT32.
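
As a hedged, standalone illustration of the GOT walk described above (it
uses elf.h types but hypothetical addresses, not the relocs tool's own
section structures), the sketch below emits one simulated
R_X86_64_GLOB_DAT record per pointer-sized .got entry:

  #include <elf.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Simplified stand-in for one emitted relocation record. */
  struct fake_rel {
          Elf64_Addr r_offset;
          uint32_t   r_type;
  };

  /*
   * Walk a raw copy of the .got contents loaded at got_addr and produce
   * one GLOB_DAT-style relocation per entry, the way the linker would
   * have if it emitted GOT relocations for a statically linked kernel.
   */
  static void walk_got(const Elf64_Addr *got, size_t size,
                       Elf64_Addr got_addr,
                       void (*emit)(const struct fake_rel *))
  {
          size_t i;

          for (i = 0; i < size / sizeof(Elf64_Addr); i++) {
                  struct fake_rel rel = {
                          .r_offset = got_addr + i * sizeof(Elf64_Addr),
                          .r_type   = R_X86_64_GLOB_DAT,
                  };
                  emit(&rel);
          }
  }

  static void print_rel(const struct fake_rel *rel)
  {
          printf("reloc at 0x%llx type %u\n",
                 (unsigned long long)rel->r_offset, rel->r_type);
  }

  int main(void)
  {
          /* Hypothetical GOT contents and load address. */
          Elf64_Addr got[] = { 0xffffffff81000000ull, 0xffffffff81002000ull };

          walk_got(got, sizeof(got), 0xffffffff82000000ull, print_rel);
          return 0;
  }

The entries the real tool has to cover can be cross-checked against
"objdump -s -j .got vmlinux".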

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/tools/relocs.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 73eb7fd4aec4..5d3eb2760198 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -31,6 +31,7 @@ struct section {
 	Elf_Sym        *symtab;
 	Elf_Rel        *reltab;
 	char           *strtab;
+	Elf_Addr       *got;
 };
 static struct section *secs;
 
@@ -292,6 +293,35 @@ static Elf_Sym *sym_lookup(const char *symname)
 	return 0;
 }
 
+static Elf_Sym *sym_lookup_addr(Elf_Addr addr, const char **name)
+{
+	int i;
+	for (i = 0; i < ehdr.e_shnum; i++) {
+		struct section *sec = &secs[i];
+		long nsyms;
+		Elf_Sym *symtab;
+		Elf_Sym *sym;
+
+		if (sec->shdr.sh_type != SHT_SYMTAB)
+			continue;
+
+		nsyms = sec->shdr.sh_size/sizeof(Elf_Sym);
+		symtab = sec->symtab;
+
+		for (sym = symtab; --nsyms >= 0; sym++) {
+			if (sym->st_value == addr) {
+				if (name) {
+					*name = sym_name(sec->link->strtab,
+							 sym);
+				}
+				return sym;
+			}
+		}
+	}
+	return 0;
+}
+
+
 #if BYTE_ORDER == LITTLE_ENDIAN
 #define le16_to_cpu(val) (val)
 #define le32_to_cpu(val) (val)
@@ -512,6 +542,33 @@ static void read_relocs(FILE *fp)
 	}
 }
 
+static void read_got(FILE *fp)
+{
+	int i;
+	for (i = 0; i < ehdr.e_shnum; i++) {
+		struct section *sec = &secs[i];
+		sec->got = NULL;
+		if (sec->shdr.sh_type != SHT_PROGBITS ||
+		    strcmp(sec_name(i), ".got")) {
+			continue;
+		}
+		sec->got = malloc(sec->shdr.sh_size);
+		if (!sec->got) {
+			die("malloc of %d bytes for got failed\n",
+				sec->shdr.sh_size);
+		}
+		if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
+			die("Seek to %d failed: %s\n",
+				sec->shdr.sh_offset, strerror(errno));
+		}
+		if (fread(sec->got, 1, sec->shdr.sh_size, fp)
+		    != sec->shdr.sh_size) {
+			die("Cannot read got: %s\n",
+				strerror(errno));
+		}
+	}
+}
+
 
 static void print_absolute_symbols(void)
 {
@@ -642,6 +699,32 @@ static void add_reloc(struct relocs *r, uint32_t offset)
 	r->offset[r->count++] = offset;
 }
 
+/*
+ * The linker does not generate relocations for the GOT for the kernel.
+ * If a GOT is found, simulate the relocations that should have been included.
+ */
+static void walk_got_table(int (*process)(struct section *sec, Elf_Rel *rel,
+					  Elf_Sym *sym, const char *symname),
+			   struct section *sec)
+{
+	int i;
+	Elf_Addr entry;
+	Elf_Sym *sym;
+	const char *symname;
+	Elf_Rel rel;
+
+	for (i = 0; i < sec->shdr.sh_size/sizeof(Elf_Addr); i++) {
+		entry = sec->got[i];
+		sym = sym_lookup_addr(entry, &symname);
+		if (!sym)
+			die("Could not found got symbol for entry %d\n", i);
+		rel.r_offset = sec->shdr.sh_addr + i * sizeof(Elf_Addr);
+		rel.r_info = ELF_BITS == 64 ? R_X86_64_GLOB_DAT
+			     : R_386_GLOB_DAT;
+		process(sec, &rel, sym, symname);
+	}
+}
+
 static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
 			Elf_Sym *sym, const char *symname))
 {
@@ -655,6 +738,8 @@ static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
 		struct section *sec = &secs[i];
 
 		if (sec->shdr.sh_type != SHT_REL_TYPE) {
+			if (sec->got)
+				walk_got_table(process, sec);
 			continue;
 		}
 		sec_symtab  = sec->link;
@@ -764,6 +849,8 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		offset += per_cpu_load_addr;
 
 	switch (r_type) {
+	case R_X86_64_PLT32:
+	case R_X86_64_GOTPCREL:
 	case R_X86_64_NONE:
 		/* NONE can be ignored. */
 		break;
@@ -805,7 +892,7 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		 * the relocations are processed.
 		 * Make sure that the offset will fit.
 		 */
-		if ((int32_t)offset != (int64_t)offset)
+		if (r_type != R_X86_64_64 && (int32_t)offset != (int64_t)offset)
 			die("Relocation offset doesn't fit in 32 bits\n");
 
 		if (r_type == R_X86_64_64)
@@ -814,6 +901,10 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 			add_reloc(&relocs32, offset);
 		break;
 
+	case R_X86_64_GLOB_DAT:
+		add_reloc(&relocs64, offset);
+		break;
+
 	default:
 		die("Unsupported relocation type: %s (%d)\n",
 		    rel_type(r_type), r_type);
@@ -1083,6 +1174,7 @@ void process(FILE *fp, int use_real_mode, int as_text,
 	read_strtabs(fp);
 	read_symtabs(fp);
 	read_relocs(fp);
+	read_got(fp);
 	if (ELF_BITS == 64)
 		percpu_init();
 	if (show_absolute_syms) {
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 16/27] x86/relocs: Handle PIE relocations
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the relocation tool to correctly handle relocations generated by
the -fPIE option:

 - Add a relocation for each entry of the .got section, since the linker
   does not generate R_X86_64_GLOB_DAT relocations on a simple link.
 - Ignore R_X86_64_GOTPCREL and R_X86_64_PLT32.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/tools/relocs.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 73eb7fd4aec4..5d3eb2760198 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -31,6 +31,7 @@ struct section {
 	Elf_Sym        *symtab;
 	Elf_Rel        *reltab;
 	char           *strtab;
+	Elf_Addr       *got;
 };
 static struct section *secs;
 
@@ -292,6 +293,35 @@ static Elf_Sym *sym_lookup(const char *symname)
 	return 0;
 }
 
+static Elf_Sym *sym_lookup_addr(Elf_Addr addr, const char **name)
+{
+	int i;
+	for (i = 0; i < ehdr.e_shnum; i++) {
+		struct section *sec = &secs[i];
+		long nsyms;
+		Elf_Sym *symtab;
+		Elf_Sym *sym;
+
+		if (sec->shdr.sh_type != SHT_SYMTAB)
+			continue;
+
+		nsyms = sec->shdr.sh_size/sizeof(Elf_Sym);
+		symtab = sec->symtab;
+
+		for (sym = symtab; --nsyms >= 0; sym++) {
+			if (sym->st_value == addr) {
+				if (name) {
+					*name = sym_name(sec->link->strtab,
+							 sym);
+				}
+				return sym;
+			}
+		}
+	}
+	return 0;
+}
+
+
 #if BYTE_ORDER == LITTLE_ENDIAN
 #define le16_to_cpu(val) (val)
 #define le32_to_cpu(val) (val)
@@ -512,6 +542,33 @@ static void read_relocs(FILE *fp)
 	}
 }
 
+static void read_got(FILE *fp)
+{
+	int i;
+	for (i = 0; i < ehdr.e_shnum; i++) {
+		struct section *sec = &secs[i];
+		sec->got = NULL;
+		if (sec->shdr.sh_type != SHT_PROGBITS ||
+		    strcmp(sec_name(i), ".got")) {
+			continue;
+		}
+		sec->got = malloc(sec->shdr.sh_size);
+		if (!sec->got) {
+			die("malloc of %d bytes for got failed\n",
+				sec->shdr.sh_size);
+		}
+		if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
+			die("Seek to %d failed: %s\n",
+				sec->shdr.sh_offset, strerror(errno));
+		}
+		if (fread(sec->got, 1, sec->shdr.sh_size, fp)
+		    != sec->shdr.sh_size) {
+			die("Cannot read got: %s\n",
+				strerror(errno));
+		}
+	}
+}
+
 
 static void print_absolute_symbols(void)
 {
@@ -642,6 +699,32 @@ static void add_reloc(struct relocs *r, uint32_t offset)
 	r->offset[r->count++] = offset;
 }
 
+/*
+ * The linker does not generate relocations for the GOT for the kernel.
+ * If a GOT is found, simulate the relocations that should have been included.
+ */
+static void walk_got_table(int (*process)(struct section *sec, Elf_Rel *rel,
+					  Elf_Sym *sym, const char *symname),
+			   struct section *sec)
+{
+	int i;
+	Elf_Addr entry;
+	Elf_Sym *sym;
+	const char *symname;
+	Elf_Rel rel;
+
+	for (i = 0; i < sec->shdr.sh_size/sizeof(Elf_Addr); i++) {
+		entry = sec->got[i];
+		sym = sym_lookup_addr(entry, &symname);
+		if (!sym)
+			die("Could not found got symbol for entry %d\n", i);
+		rel.r_offset = sec->shdr.sh_addr + i * sizeof(Elf_Addr);
+		rel.r_info = ELF_BITS == 64 ? R_X86_64_GLOB_DAT
+			     : R_386_GLOB_DAT;
+		process(sec, &rel, sym, symname);
+	}
+}
+
 static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
 			Elf_Sym *sym, const char *symname))
 {
@@ -655,6 +738,8 @@ static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
 		struct section *sec = &secs[i];
 
 		if (sec->shdr.sh_type != SHT_REL_TYPE) {
+			if (sec->got)
+				walk_got_table(process, sec);
 			continue;
 		}
 		sec_symtab  = sec->link;
@@ -764,6 +849,8 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		offset += per_cpu_load_addr;
 
 	switch (r_type) {
+	case R_X86_64_PLT32:
+	case R_X86_64_GOTPCREL:
 	case R_X86_64_NONE:
 		/* NONE can be ignored. */
 		break;
@@ -805,7 +892,7 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		 * the relocations are processed.
 		 * Make sure that the offset will fit.
 		 */
-		if ((int32_t)offset != (int64_t)offset)
+		if (r_type != R_X86_64_64 && (int32_t)offset != (int64_t)offset)
 			die("Relocation offset doesn't fit in 32 bits\n");
 
 		if (r_type == R_X86_64_64)
@@ -814,6 +901,10 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 			add_reloc(&relocs32, offset);
 		break;
 
+	case R_X86_64_GLOB_DAT:
+		add_reloc(&relocs64, offset);
+		break;
+
 	default:
 		die("Unsupported relocation type: %s (%d)\n",
 		    rel_type(r_type), r_type);
@@ -1083,6 +1174,7 @@ void process(FILE *fp, int use_real_mode, int as_text,
 	read_strtabs(fp);
 	read_symtabs(fp);
 	read_relocs(fp);
+	read_got(fp);
 	if (ELF_BITS == 64)
 		percpu_init();
 	if (show_absolute_syms) {
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 16/27] x86/relocs: Handle PIE relocations
  2017-10-04 21:19 ` Thomas Garnier
                   ` (32 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the relocation tool to correctly handle relocations generated by
the -fPIE option:

 - Add a relocation for each entry of the .got section, since the linker
   does not generate R_X86_64_GLOB_DAT relocations on a simple link.
 - Ignore R_X86_64_GOTPCREL and R_X86_64_PLT32.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/tools/relocs.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 73eb7fd4aec4..5d3eb2760198 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -31,6 +31,7 @@ struct section {
 	Elf_Sym        *symtab;
 	Elf_Rel        *reltab;
 	char           *strtab;
+	Elf_Addr       *got;
 };
 static struct section *secs;
 
@@ -292,6 +293,35 @@ static Elf_Sym *sym_lookup(const char *symname)
 	return 0;
 }
 
+static Elf_Sym *sym_lookup_addr(Elf_Addr addr, const char **name)
+{
+	int i;
+	for (i = 0; i < ehdr.e_shnum; i++) {
+		struct section *sec = &secs[i];
+		long nsyms;
+		Elf_Sym *symtab;
+		Elf_Sym *sym;
+
+		if (sec->shdr.sh_type != SHT_SYMTAB)
+			continue;
+
+		nsyms = sec->shdr.sh_size/sizeof(Elf_Sym);
+		symtab = sec->symtab;
+
+		for (sym = symtab; --nsyms >= 0; sym++) {
+			if (sym->st_value == addr) {
+				if (name) {
+					*name = sym_name(sec->link->strtab,
+							 sym);
+				}
+				return sym;
+			}
+		}
+	}
+	return 0;
+}
+
+
 #if BYTE_ORDER == LITTLE_ENDIAN
 #define le16_to_cpu(val) (val)
 #define le32_to_cpu(val) (val)
@@ -512,6 +542,33 @@ static void read_relocs(FILE *fp)
 	}
 }
 
+static void read_got(FILE *fp)
+{
+	int i;
+	for (i = 0; i < ehdr.e_shnum; i++) {
+		struct section *sec = &secs[i];
+		sec->got = NULL;
+		if (sec->shdr.sh_type != SHT_PROGBITS ||
+		    strcmp(sec_name(i), ".got")) {
+			continue;
+		}
+		sec->got = malloc(sec->shdr.sh_size);
+		if (!sec->got) {
+			die("malloc of %d bytes for got failed\n",
+				sec->shdr.sh_size);
+		}
+		if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
+			die("Seek to %d failed: %s\n",
+				sec->shdr.sh_offset, strerror(errno));
+		}
+		if (fread(sec->got, 1, sec->shdr.sh_size, fp)
+		    != sec->shdr.sh_size) {
+			die("Cannot read got: %s\n",
+				strerror(errno));
+		}
+	}
+}
+
 
 static void print_absolute_symbols(void)
 {
@@ -642,6 +699,32 @@ static void add_reloc(struct relocs *r, uint32_t offset)
 	r->offset[r->count++] = offset;
 }
 
+/*
+ * The linker does not generate relocations for the GOT for the kernel.
+ * If a GOT is found, simulate the relocations that should have been included.
+ */
+static void walk_got_table(int (*process)(struct section *sec, Elf_Rel *rel,
+					  Elf_Sym *sym, const char *symname),
+			   struct section *sec)
+{
+	int i;
+	Elf_Addr entry;
+	Elf_Sym *sym;
+	const char *symname;
+	Elf_Rel rel;
+
+	for (i = 0; i < sec->shdr.sh_size/sizeof(Elf_Addr); i++) {
+		entry = sec->got[i];
+		sym = sym_lookup_addr(entry, &symname);
+		if (!sym)
+			die("Could not found got symbol for entry %d\n", i);
+		rel.r_offset = sec->shdr.sh_addr + i * sizeof(Elf_Addr);
+		rel.r_info = ELF_BITS == 64 ? R_X86_64_GLOB_DAT
+			     : R_386_GLOB_DAT;
+		process(sec, &rel, sym, symname);
+	}
+}
+
 static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
 			Elf_Sym *sym, const char *symname))
 {
@@ -655,6 +738,8 @@ static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
 		struct section *sec = &secs[i];
 
 		if (sec->shdr.sh_type != SHT_REL_TYPE) {
+			if (sec->got)
+				walk_got_table(process, sec);
 			continue;
 		}
 		sec_symtab  = sec->link;
@@ -764,6 +849,8 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		offset += per_cpu_load_addr;
 
 	switch (r_type) {
+	case R_X86_64_PLT32:
+	case R_X86_64_GOTPCREL:
 	case R_X86_64_NONE:
 		/* NONE can be ignored. */
 		break;
@@ -805,7 +892,7 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		 * the relocations are processed.
 		 * Make sure that the offset will fit.
 		 */
-		if ((int32_t)offset != (int64_t)offset)
+		if (r_type != R_X86_64_64 && (int32_t)offset != (int64_t)offset)
 			die("Relocation offset doesn't fit in 32 bits\n");
 
 		if (r_type == R_X86_64_64)
@@ -814,6 +901,10 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 			add_reloc(&relocs32, offset);
 		break;
 
+	case R_X86_64_GLOB_DAT:
+		add_reloc(&relocs64, offset);
+		break;
+
 	default:
 		die("Unsupported relocation type: %s (%d)\n",
 		    rel_type(r_type), r_type);
@@ -1083,6 +1174,7 @@ void process(FILE *fp, int use_real_mode, int as_text,
 	read_strtabs(fp);
 	read_symtabs(fp);
 	read_relocs(fp);
+	read_got(fp);
 	if (ELF_BITS == 64)
 		percpu_init();
 	if (show_absolute_syms) {
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 16/27] x86/relocs: Handle PIE relocations
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Change the relocation tool to correctly handle relocations generated by
the -fPIE option:

 - Add a relocation for each entry of the .got section, since the linker
   does not generate R_X86_64_GLOB_DAT relocations on a simple link.
 - Ignore R_X86_64_GOTPCREL and R_X86_64_PLT32.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/tools/relocs.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 73eb7fd4aec4..5d3eb2760198 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -31,6 +31,7 @@ struct section {
 	Elf_Sym        *symtab;
 	Elf_Rel        *reltab;
 	char           *strtab;
+	Elf_Addr       *got;
 };
 static struct section *secs;
 
@@ -292,6 +293,35 @@ static Elf_Sym *sym_lookup(const char *symname)
 	return 0;
 }
 
+static Elf_Sym *sym_lookup_addr(Elf_Addr addr, const char **name)
+{
+	int i;
+	for (i = 0; i < ehdr.e_shnum; i++) {
+		struct section *sec = &secs[i];
+		long nsyms;
+		Elf_Sym *symtab;
+		Elf_Sym *sym;
+
+		if (sec->shdr.sh_type != SHT_SYMTAB)
+			continue;
+
+		nsyms = sec->shdr.sh_size/sizeof(Elf_Sym);
+		symtab = sec->symtab;
+
+		for (sym = symtab; --nsyms >= 0; sym++) {
+			if (sym->st_value == addr) {
+				if (name) {
+					*name = sym_name(sec->link->strtab,
+							 sym);
+				}
+				return sym;
+			}
+		}
+	}
+	return 0;
+}
+
+
 #if BYTE_ORDER == LITTLE_ENDIAN
 #define le16_to_cpu(val) (val)
 #define le32_to_cpu(val) (val)
@@ -512,6 +542,33 @@ static void read_relocs(FILE *fp)
 	}
 }
 
+static void read_got(FILE *fp)
+{
+	int i;
+	for (i = 0; i < ehdr.e_shnum; i++) {
+		struct section *sec = &secs[i];
+		sec->got = NULL;
+		if (sec->shdr.sh_type != SHT_PROGBITS ||
+		    strcmp(sec_name(i), ".got")) {
+			continue;
+		}
+		sec->got = malloc(sec->shdr.sh_size);
+		if (!sec->got) {
+			die("malloc of %d bytes for got failed\n",
+				sec->shdr.sh_size);
+		}
+		if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
+			die("Seek to %d failed: %s\n",
+				sec->shdr.sh_offset, strerror(errno));
+		}
+		if (fread(sec->got, 1, sec->shdr.sh_size, fp)
+		    != sec->shdr.sh_size) {
+			die("Cannot read got: %s\n",
+				strerror(errno));
+		}
+	}
+}
+
 
 static void print_absolute_symbols(void)
 {
@@ -642,6 +699,32 @@ static void add_reloc(struct relocs *r, uint32_t offset)
 	r->offset[r->count++] = offset;
 }
 
+/*
+ * The linker does not generate relocations for the GOT for the kernel.
+ * If a GOT is found, simulate the relocations that should have been included.
+ */
+static void walk_got_table(int (*process)(struct section *sec, Elf_Rel *rel,
+					  Elf_Sym *sym, const char *symname),
+			   struct section *sec)
+{
+	int i;
+	Elf_Addr entry;
+	Elf_Sym *sym;
+	const char *symname;
+	Elf_Rel rel;
+
+	for (i = 0; i < sec->shdr.sh_size/sizeof(Elf_Addr); i++) {
+		entry = sec->got[i];
+		sym = sym_lookup_addr(entry, &symname);
+		if (!sym)
+			die("Could not found got symbol for entry %d\n", i);
+		rel.r_offset = sec->shdr.sh_addr + i * sizeof(Elf_Addr);
+		rel.r_info = ELF_BITS == 64 ? R_X86_64_GLOB_DAT
+			     : R_386_GLOB_DAT;
+		process(sec, &rel, sym, symname);
+	}
+}
+
 static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
 			Elf_Sym *sym, const char *symname))
 {
@@ -655,6 +738,8 @@ static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
 		struct section *sec = &secs[i];
 
 		if (sec->shdr.sh_type != SHT_REL_TYPE) {
+			if (sec->got)
+				walk_got_table(process, sec);
 			continue;
 		}
 		sec_symtab  = sec->link;
@@ -764,6 +849,8 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		offset += per_cpu_load_addr;
 
 	switch (r_type) {
+	case R_X86_64_PLT32:
+	case R_X86_64_GOTPCREL:
 	case R_X86_64_NONE:
 		/* NONE can be ignored. */
 		break;
@@ -805,7 +892,7 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		 * the relocations are processed.
 		 * Make sure that the offset will fit.
 		 */
-		if ((int32_t)offset != (int64_t)offset)
+		if (r_type != R_X86_64_64 && (int32_t)offset != (int64_t)offset)
 			die("Relocation offset doesn't fit in 32 bits\n");
 
 		if (r_type == R_X86_64_64)
@@ -814,6 +901,10 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 			add_reloc(&relocs32, offset);
 		break;
 
+	case R_X86_64_GLOB_DAT:
+		add_reloc(&relocs64, offset);
+		break;
+
 	default:
 		die("Unsupported relocation type: %s (%d)\n",
 		    rel_type(r_type), r_type);
@@ -1083,6 +1174,7 @@ void process(FILE *fp, int use_real_mode, int as_text,
 	read_strtabs(fp);
 	read_symtabs(fp);
 	read_relocs(fp);
+	read_got(fp);
 	if (ELF_BITS == 64)
 		percpu_init();
 	if (show_absolute_syms) {
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 17/27] xen: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use the new _ASM_GET_PTR macro, which gets a
symbol reference in a PIE-compatible way. Adapt the relocation tool to
ignore the 32-bit Xen code.

Position Independent Executable (PIE) support will allow the KASLR
randomization range to be extended below the -2G memory limit.
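
For readers unfamiliar with the two addressing forms involved, the hedged
userspace sketch below contrasts an absolute reference with a
RIP-relative one (the real _ASM_GET_PTR macro is defined earlier in the
series and its exact expansion may differ; __bss_start merely stands in
for a kernel symbol):

  static inline void *symbol_addr(void)
  {
          void *p;

  #ifdef __x86_64__
          /* PIE compatible: address is computed relative to %rip. */
          asm("leaq __bss_start(%%rip), %0" : "=r"(p));
  #else
          /* Absolute immediate: needs a fixed link-time address. */
          asm("movl $__bss_start, %0" : "=r"(p));
  #endif
          return p;
  }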

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/tools/relocs.c | 16 +++++++++++++++-
 arch/x86/xen/xen-head.S |  9 +++++----
 arch/x86/xen/xen-pvh.S  | 13 +++++++++----
 3 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 5d3eb2760198..bc032ad88b22 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -831,6 +831,16 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
 		strncmp(symname, "init_per_cpu_", 13);
 }
 
+/*
+ * Check if the 32-bit relocation is within the xenpvh 32-bit code.
+ * If so, ignores it.
+ */
+static int is_in_xenpvh_assembly(ElfW(Addr) offset)
+{
+	ElfW(Sym) *sym = sym_lookup("pvh_start_xen");
+	return sym && (offset >= sym->st_value) &&
+		(offset < (sym->st_value + sym->st_size));
+}
 
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		      const char *symname)
@@ -892,8 +902,12 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		 * the relocations are processed.
 		 * Make sure that the offset will fit.
 		 */
-		if (r_type != R_X86_64_64 && (int32_t)offset != (int64_t)offset)
+		if (r_type != R_X86_64_64 &&
+		    (int32_t)offset != (int64_t)offset) {
+			if (is_in_xenpvh_assembly(offset))
+				break;
 			die("Relocation offset doesn't fit in 32 bits\n");
+		}
 
 		if (r_type == R_X86_64_64)
 			add_reloc(&relocs64, offset);
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 124941d09b2b..e5b7b9566191 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -25,14 +25,15 @@ ENTRY(startup_xen)
 
 	/* Clear .bss */
 	xor %eax,%eax
-	mov $__bss_start, %_ASM_DI
-	mov $__bss_stop, %_ASM_CX
+	_ASM_GET_PTR(__bss_start, %_ASM_DI)
+	_ASM_GET_PTR(__bss_stop, %_ASM_CX)
 	sub %_ASM_DI, %_ASM_CX
 	shr $__ASM_SEL(2, 3), %_ASM_CX
 	rep __ASM_SIZE(stos)
 
-	mov %_ASM_SI, xen_start_info
-	mov $init_thread_union+THREAD_SIZE, %_ASM_SP
+	_ASM_GET_PTR(xen_start_info, %_ASM_AX)
+	mov %_ASM_SI, (%_ASM_AX)
+	_ASM_GET_PTR(init_thread_union+THREAD_SIZE, %_ASM_SP)
 
 	jmp xen_start_kernel
 END(startup_xen)
diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
index e1a5fbeae08d..43e234c7c2de 100644
--- a/arch/x86/xen/xen-pvh.S
+++ b/arch/x86/xen/xen-pvh.S
@@ -101,8 +101,8 @@ ENTRY(pvh_start_xen)
 	call xen_prepare_pvh
 
 	/* startup_64 expects boot_params in %rsi. */
-	mov $_pa(pvh_bootparams), %rsi
-	mov $_pa(startup_64), %rax
+	movabs $_pa(pvh_bootparams), %rsi
+	movabs $_pa(startup_64), %rax
 	jmp *%rax
 
 #else /* CONFIG_X86_64 */
@@ -137,10 +137,15 @@ END(pvh_start_xen)
 
 	.section ".init.data","aw"
 	.balign 8
+	/*
+	 * Use a quad for _pa(gdt_start) because PIE does not understand a
+	 * long is enough. The resulting value will still be in the lower long
+	 * part.
+	 */
 gdt:
 	.word gdt_end - gdt_start
-	.long _pa(gdt_start)
-	.word 0
+	.quad _pa(gdt_start)
+	.balign 8
 gdt_start:
 	.quad 0x0000000000000000            /* NULL descriptor */
 	.quad 0x0000000000000000            /* reserved */
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 17/27] xen: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use the new _ASM_GET_PTR macro, which gets a
symbol reference in a PIE-compatible way. Adapt the relocation tool to
ignore the 32-bit Xen code.

Position Independent Executable (PIE) support will allow the KASLR
randomization range to be extended below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/tools/relocs.c | 16 +++++++++++++++-
 arch/x86/xen/xen-head.S |  9 +++++----
 arch/x86/xen/xen-pvh.S  | 13 +++++++++----
 3 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 5d3eb2760198..bc032ad88b22 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -831,6 +831,16 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
 		strncmp(symname, "init_per_cpu_", 13);
 }
 
+/*
+ * Check if the 32-bit relocation is within the xenpvh 32-bit code.
+ * If so, ignores it.
+ */
+static int is_in_xenpvh_assembly(ElfW(Addr) offset)
+{
+	ElfW(Sym) *sym = sym_lookup("pvh_start_xen");
+	return sym && (offset >= sym->st_value) &&
+		(offset < (sym->st_value + sym->st_size));
+}
 
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		      const char *symname)
@@ -892,8 +902,12 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		 * the relocations are processed.
 		 * Make sure that the offset will fit.
 		 */
-		if (r_type != R_X86_64_64 && (int32_t)offset != (int64_t)offset)
+		if (r_type != R_X86_64_64 &&
+		    (int32_t)offset != (int64_t)offset) {
+			if (is_in_xenpvh_assembly(offset))
+				break;
 			die("Relocation offset doesn't fit in 32 bits\n");
+		}
 
 		if (r_type == R_X86_64_64)
 			add_reloc(&relocs64, offset);
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 124941d09b2b..e5b7b9566191 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -25,14 +25,15 @@ ENTRY(startup_xen)
 
 	/* Clear .bss */
 	xor %eax,%eax
-	mov $__bss_start, %_ASM_DI
-	mov $__bss_stop, %_ASM_CX
+	_ASM_GET_PTR(__bss_start, %_ASM_DI)
+	_ASM_GET_PTR(__bss_stop, %_ASM_CX)
 	sub %_ASM_DI, %_ASM_CX
 	shr $__ASM_SEL(2, 3), %_ASM_CX
 	rep __ASM_SIZE(stos)
 
-	mov %_ASM_SI, xen_start_info
-	mov $init_thread_union+THREAD_SIZE, %_ASM_SP
+	_ASM_GET_PTR(xen_start_info, %_ASM_AX)
+	mov %_ASM_SI, (%_ASM_AX)
+	_ASM_GET_PTR(init_thread_union+THREAD_SIZE, %_ASM_SP)
 
 	jmp xen_start_kernel
 END(startup_xen)
diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
index e1a5fbeae08d..43e234c7c2de 100644
--- a/arch/x86/xen/xen-pvh.S
+++ b/arch/x86/xen/xen-pvh.S
@@ -101,8 +101,8 @@ ENTRY(pvh_start_xen)
 	call xen_prepare_pvh
 
 	/* startup_64 expects boot_params in %rsi. */
-	mov $_pa(pvh_bootparams), %rsi
-	mov $_pa(startup_64), %rax
+	movabs $_pa(pvh_bootparams), %rsi
+	movabs $_pa(startup_64), %rax
 	jmp *%rax
 
 #else /* CONFIG_X86_64 */
@@ -137,10 +137,15 @@ END(pvh_start_xen)
 
 	.section ".init.data","aw"
 	.balign 8
+	/*
+	 * Use a quad for _pa(gdt_start) because PIE does not understand a
+	 * long is enough. The resulting value will still be in the lower long
+	 * part.
+	 */
 gdt:
 	.word gdt_end - gdt_start
-	.long _pa(gdt_start)
-	.word 0
+	.quad _pa(gdt_start)
+	.balign 8
 gdt_start:
 	.quad 0x0000000000000000            /* NULL descriptor */
 	.quad 0x0000000000000000            /* reserved */
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 17/27] xen: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (33 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use the new _ASM_GET_PTR macro, which gets a
symbol reference in a PIE-compatible way. Adapt the relocation tool to
ignore the 32-bit Xen code.

Position Independent Executable (PIE) support will allow the KASLR
randomization range to be extended below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/tools/relocs.c | 16 +++++++++++++++-
 arch/x86/xen/xen-head.S |  9 +++++----
 arch/x86/xen/xen-pvh.S  | 13 +++++++++----
 3 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 5d3eb2760198..bc032ad88b22 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -831,6 +831,16 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
 		strncmp(symname, "init_per_cpu_", 13);
 }
 
+/*
+ * Check if the 32-bit relocation is within the xenpvh 32-bit code.
+ * If so, ignores it.
+ */
+static int is_in_xenpvh_assembly(ElfW(Addr) offset)
+{
+	ElfW(Sym) *sym = sym_lookup("pvh_start_xen");
+	return sym && (offset >= sym->st_value) &&
+		(offset < (sym->st_value + sym->st_size));
+}
 
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		      const char *symname)
@@ -892,8 +902,12 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		 * the relocations are processed.
 		 * Make sure that the offset will fit.
 		 */
-		if (r_type != R_X86_64_64 && (int32_t)offset != (int64_t)offset)
+		if (r_type != R_X86_64_64 &&
+		    (int32_t)offset != (int64_t)offset) {
+			if (is_in_xenpvh_assembly(offset))
+				break;
 			die("Relocation offset doesn't fit in 32 bits\n");
+		}
 
 		if (r_type == R_X86_64_64)
 			add_reloc(&relocs64, offset);
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 124941d09b2b..e5b7b9566191 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -25,14 +25,15 @@ ENTRY(startup_xen)
 
 	/* Clear .bss */
 	xor %eax,%eax
-	mov $__bss_start, %_ASM_DI
-	mov $__bss_stop, %_ASM_CX
+	_ASM_GET_PTR(__bss_start, %_ASM_DI)
+	_ASM_GET_PTR(__bss_stop, %_ASM_CX)
 	sub %_ASM_DI, %_ASM_CX
 	shr $__ASM_SEL(2, 3), %_ASM_CX
 	rep __ASM_SIZE(stos)
 
-	mov %_ASM_SI, xen_start_info
-	mov $init_thread_union+THREAD_SIZE, %_ASM_SP
+	_ASM_GET_PTR(xen_start_info, %_ASM_AX)
+	mov %_ASM_SI, (%_ASM_AX)
+	_ASM_GET_PTR(init_thread_union+THREAD_SIZE, %_ASM_SP)
 
 	jmp xen_start_kernel
 END(startup_xen)
diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
index e1a5fbeae08d..43e234c7c2de 100644
--- a/arch/x86/xen/xen-pvh.S
+++ b/arch/x86/xen/xen-pvh.S
@@ -101,8 +101,8 @@ ENTRY(pvh_start_xen)
 	call xen_prepare_pvh
 
 	/* startup_64 expects boot_params in %rsi. */
-	mov $_pa(pvh_bootparams), %rsi
-	mov $_pa(startup_64), %rax
+	movabs $_pa(pvh_bootparams), %rsi
+	movabs $_pa(startup_64), %rax
 	jmp *%rax
 
 #else /* CONFIG_X86_64 */
@@ -137,10 +137,15 @@ END(pvh_start_xen)
 
 	.section ".init.data","aw"
 	.balign 8
+	/*
+	 * Use a quad for _pa(gdt_start) because PIE does not understand a
+	 * long is enough. The resulting value will still be in the lower long
+	 * part.
+	 */
 gdt:
 	.word gdt_end - gdt_start
-	.long _pa(gdt_start)
-	.word 0
+	.quad _pa(gdt_start)
+	.balign 8
 gdt_start:
 	.quad 0x0000000000000000            /* NULL descriptor */
 	.quad 0x0000000000000000            /* reserved */
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 17/27] xen: Adapt assembly for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Change the assembly code to use the new _ASM_GET_PTR macro, which gets a
symbol reference in a PIE-compatible way. Adapt the relocation tool to
ignore the 32-bit Xen code.

Position Independent Executable (PIE) support will allow the KASLR
randomization range to be extended below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/tools/relocs.c | 16 +++++++++++++++-
 arch/x86/xen/xen-head.S |  9 +++++----
 arch/x86/xen/xen-pvh.S  | 13 +++++++++----
 3 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 5d3eb2760198..bc032ad88b22 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -831,6 +831,16 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
 		strncmp(symname, "init_per_cpu_", 13);
 }
 
+/*
+ * Check if the 32-bit relocation is within the xenpvh 32-bit code.
+ * If so, ignores it.
+ */
+static int is_in_xenpvh_assembly(ElfW(Addr) offset)
+{
+	ElfW(Sym) *sym = sym_lookup("pvh_start_xen");
+	return sym && (offset >= sym->st_value) &&
+		(offset < (sym->st_value + sym->st_size));
+}
 
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		      const char *symname)
@@ -892,8 +902,12 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		 * the relocations are processed.
 		 * Make sure that the offset will fit.
 		 */
-		if (r_type != R_X86_64_64 && (int32_t)offset != (int64_t)offset)
+		if (r_type != R_X86_64_64 &&
+		    (int32_t)offset != (int64_t)offset) {
+			if (is_in_xenpvh_assembly(offset))
+				break;
 			die("Relocation offset doesn't fit in 32 bits\n");
+		}
 
 		if (r_type == R_X86_64_64)
 			add_reloc(&relocs64, offset);
diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S
index 124941d09b2b..e5b7b9566191 100644
--- a/arch/x86/xen/xen-head.S
+++ b/arch/x86/xen/xen-head.S
@@ -25,14 +25,15 @@ ENTRY(startup_xen)
 
 	/* Clear .bss */
 	xor %eax,%eax
-	mov $__bss_start, %_ASM_DI
-	mov $__bss_stop, %_ASM_CX
+	_ASM_GET_PTR(__bss_start, %_ASM_DI)
+	_ASM_GET_PTR(__bss_stop, %_ASM_CX)
 	sub %_ASM_DI, %_ASM_CX
 	shr $__ASM_SEL(2, 3), %_ASM_CX
 	rep __ASM_SIZE(stos)
 
-	mov %_ASM_SI, xen_start_info
-	mov $init_thread_union+THREAD_SIZE, %_ASM_SP
+	_ASM_GET_PTR(xen_start_info, %_ASM_AX)
+	mov %_ASM_SI, (%_ASM_AX)
+	_ASM_GET_PTR(init_thread_union+THREAD_SIZE, %_ASM_SP)
 
 	jmp xen_start_kernel
 END(startup_xen)
diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
index e1a5fbeae08d..43e234c7c2de 100644
--- a/arch/x86/xen/xen-pvh.S
+++ b/arch/x86/xen/xen-pvh.S
@@ -101,8 +101,8 @@ ENTRY(pvh_start_xen)
 	call xen_prepare_pvh
 
 	/* startup_64 expects boot_params in %rsi. */
-	mov $_pa(pvh_bootparams), %rsi
-	mov $_pa(startup_64), %rax
+	movabs $_pa(pvh_bootparams), %rsi
+	movabs $_pa(startup_64), %rax
 	jmp *%rax
 
 #else /* CONFIG_X86_64 */
@@ -137,10 +137,15 @@ END(pvh_start_xen)
 
 	.section ".init.data","aw"
 	.balign 8
+	/*
+	 * Use a quad for _pa(gdt_start) because PIE does not understand a
+	 * long is enough. The resulting value will still be in the lower long
+	 * part.
+	 */
 gdt:
 	.word gdt_end - gdt_start
-	.long _pa(gdt_start)
-	.word 0
+	.quad _pa(gdt_start)
+	.balign 8
 gdt_start:
 	.quad 0x0000000000000000            /* NULL descriptor */
 	.quad 0x0000000000000000            /* reserved */
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 18/27] kvm: Adapt assembly for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Change the assembly code to use only relative references to symbols so
that the kernel is PIE compatible. The new __ASM_GET_PTR_PRE macro is
used to get the address of a symbol on both 32-bit and 64-bit with PIE
support.

Position Independent Executable (PIE) support will allow the KASLR
randomization range to be extended below the -2G memory limit.
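
As a hedged standalone sketch of the kvm_rebooting change below (the
flag and function names here are made up for illustration), the same
byte test written with an absolute operand and with a RIP-relative
operand:

  char rebooting_flag;       /* stand-in for kvm_rebooting */

  static int flag_is_set(void)
  {
          unsigned char set;

  #ifdef USE_PIE_FORM
          asm("cmpb $0, rebooting_flag(%%rip)\n\t"
              "setne %0" : "=r"(set) : : "cc", "memory");
  #else
          /* Only links when the image has a fixed load address. */
          asm("cmpb $0, rebooting_flag\n\t"
              "setne %0" : "=r"(set) : : "cc", "memory");
  #endif
          return set;
  }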

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/kvm_host.h | 6 ++++--
 arch/x86/kernel/kvm.c           | 6 ++++--
 arch/x86/kvm/svm.c              | 4 ++--
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9d7d856b2d89..14073fda75fb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1342,9 +1342,11 @@ asmlinkage void kvm_spurious_fault(void);
 	".pushsection .fixup, \"ax\" \n" \
 	"667: \n\t" \
 	cleanup_insn "\n\t"		      \
-	"cmpb $0, kvm_rebooting \n\t"	      \
+	"cmpb $0, kvm_rebooting" __ASM_SEL(,(%%rip)) " \n\t" \
 	"jne 668b \n\t"      		      \
-	__ASM_SIZE(push) " $666b \n\t"	      \
+	__ASM_SIZE(push) "%%" _ASM_AX " \n\t"		\
+	__ASM_GET_PTR_PRE(666b) "%%" _ASM_AX "\n\t"	\
+	"xchg %%" _ASM_AX ", (%%" _ASM_SP ") \n\t"	\
 	"call kvm_spurious_fault \n\t"	      \
 	".popsection \n\t" \
 	_ASM_EXTABLE(666b, 667b)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index aa60a08b65b1..07176bfc188b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -620,8 +620,10 @@ asm(
 ".global __raw_callee_save___kvm_vcpu_is_preempted;"
 ".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
 "__raw_callee_save___kvm_vcpu_is_preempted:"
-"movq	__per_cpu_offset(,%rdi,8), %rax;"
-"cmpb	$0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax);"
+"leaq	__per_cpu_offset(%rip), %rax;"
+"movq	(%rax,%rdi,8), %rax;"
+"addq	" __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rip), %rax;"
+"cmpb	$0, (%rax);"
 "setne	%al;"
 "ret;"
 ".popsection");
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0e68f0b3cbf7..364536080438 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -568,12 +568,12 @@ static u32 svm_msrpm_offset(u32 msr)
 
 static inline void clgi(void)
 {
-	asm volatile (__ex(SVM_CLGI));
+	asm volatile (__ex(SVM_CLGI) : :);
 }
 
 static inline void stgi(void)
 {
-	asm volatile (__ex(SVM_STGI));
+	asm volatile (__ex(SVM_STGI) : :);
 }
 
 static inline void invlpga(unsigned long addr, u32 asid)
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 19/27] x86: Support global stack cookie
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Add an off-by-default configuration option to use a global stack cookie
instead of the default TLS-based one. This configuration option will only
be used with PIE binaries.

For the kernel stack cookie, the compiler relies on mcmodel=kernel to
switch from the fs segment to the gs segment. A PIE binary does not use
mcmodel=kernel because it can be relocated anywhere, so the compiler
defaults to the fs segment register. This will be fixed by a compiler
change allowing the segment register to be selected, as already done on
PowerPC. In the meantime, this configuration can be used to support older
compilers.
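
A rough userspace sketch of the concept, assuming a compiler that accepts
-mstack-protector-guard=global (which the Makefile change below probes with
cc-option); the init_global_canary() helper and its entropy source are made
up for the example and far weaker than the kernel's
boot_init_stack_canary(). With a global guard, every compiler-emitted check
compares against one variable, so the guard must be set once, early, and
kept non-zero, and the function that updates it must not itself return
through a stack-protected frame.

#include <stdint.h>
#include <stdio.h>
#include <time.h>

/*
 * Single global cookie that compiler-emitted checks would read when code
 * is built with -fstack-protector-strong -mstack-protector-guard=global.
 */
unsigned long __stack_chk_guard;

static void init_global_canary(void)
{
	uint64_t canary = (uint64_t)time(NULL);

	canary ^= (unsigned long)&canary >> 3;	/* weak entropy, demo only */
	canary &= ~0xffUL;			/* keep a NUL byte in the cookie */

	/* Set only once and never leave the guard at zero. */
	if (__stack_chk_guard == 0)
		__stack_chk_guard = canary ? canary : 1;
}

int main(void)
{
	init_global_canary();
	printf("global stack cookie: 0x%lx\n", __stack_chk_guard);
	return 0;
}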

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/Kconfig                      |  4 ++++
 arch/x86/Makefile                     |  9 +++++++++
 arch/x86/entry/entry_32.S             |  3 ++-
 arch/x86/entry/entry_64.S             |  3 ++-
 arch/x86/include/asm/processor.h      |  3 ++-
 arch/x86/include/asm/stackprotector.h | 19 ++++++++++++++-----
 arch/x86/kernel/asm-offsets.c         |  3 ++-
 arch/x86/kernel/asm-offsets_32.c      |  3 ++-
 arch/x86/kernel/asm-offsets_64.c      |  3 ++-
 arch/x86/kernel/cpu/common.c          |  3 ++-
 arch/x86/kernel/head_32.S             |  3 ++-
 arch/x86/kernel/process.c             |  5 +++++
 12 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 063f1e0d51aa..777197fab6dd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2133,6 +2133,10 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
 
 	   If unsure, leave at the default value.
 
+config X86_GLOBAL_STACKPROTECTOR
+	bool
+	depends on CC_STACKPROTECTOR
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 6276572259c8..4cb4f0495ddc 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -60,6 +60,15 @@ endif
 KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow
 KBUILD_CFLAGS += $(call cc-option,-mno-avx,)
 
+ifdef CONFIG_X86_GLOBAL_STACKPROTECTOR
+        ifeq ($(call cc-option, -mstack-protector-guard=global),)
+                $(error Cannot use CONFIG_X86_GLOBAL_STACKPROTECTOR: \
+                        -mstack-protector-guard=global not supported \
+                        by compiler)
+        endif
+        KBUILD_CFLAGS += -mstack-protector-guard=global
+endif
+
 ifeq ($(CONFIG_X86_32),y)
         BITS := 32
         UTS_MACHINE := i386
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 8a13d468635a..ab3e5056722f 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -237,7 +237,8 @@ ENTRY(__switch_to_asm)
 	movl	%esp, TASK_threadsp(%eax)
 	movl	TASK_threadsp(%edx), %esp
 
-#ifdef CONFIG_CC_STACKPROTECTOR
+#if defined(CONFIG_CC_STACKPROTECTOR) && \
+	!defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
 	movl	TASK_stack_canary(%edx), %ebx
 	movl	%ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
 #endif
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index d3a52d2342af..01be62c1b436 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -390,7 +390,8 @@ ENTRY(__switch_to_asm)
 	movq	%rsp, TASK_threadsp(%rdi)
 	movq	TASK_threadsp(%rsi), %rsp
 
-#ifdef CONFIG_CC_STACKPROTECTOR
+#if defined(CONFIG_CC_STACKPROTECTOR) && \
+	!defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
 	movq	TASK_stack_canary(%rsi), %rbx
 	movq	%rbx, PER_CPU_VAR(irq_stack_union + stack_canary_offset)
 #endif
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index b09bd50b06c7..e3a7ef8d5fb8 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -394,7 +394,8 @@ DECLARE_PER_CPU(char *, irq_stack_ptr);
 DECLARE_PER_CPU(unsigned int, irq_count);
 extern asmlinkage void ignore_sysret(void);
 #else	/* X86_64 */
-#ifdef CONFIG_CC_STACKPROTECTOR
+#if defined(CONFIG_CC_STACKPROTECTOR) && \
+	defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
 /*
  * Make sure stack canary segment base is cached-aligned:
  *   "For Intel Atom processors, avoid non zero segment base address
diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
index 8abedf1d650e..66462d778dc5 100644
--- a/arch/x86/include/asm/stackprotector.h
+++ b/arch/x86/include/asm/stackprotector.h
@@ -51,6 +51,10 @@
 #define GDT_STACK_CANARY_INIT						\
 	[GDT_ENTRY_STACK_CANARY] = GDT_ENTRY_INIT(0x4090, 0, 0x18),
 
+#ifdef CONFIG_X86_GLOBAL_STACKPROTECTOR
+extern unsigned long __stack_chk_guard;
+#endif
+
 /*
  * Initialize the stackprotector canary value.
  *
@@ -62,7 +66,7 @@ static __always_inline void boot_init_stack_canary(void)
 	u64 canary;
 	u64 tsc;
 
-#ifdef CONFIG_X86_64
+#if defined(CONFIG_X86_64) && !defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
 	BUILD_BUG_ON(offsetof(union irq_stack_union, stack_canary) != 40);
 #endif
 	/*
@@ -76,17 +80,22 @@ static __always_inline void boot_init_stack_canary(void)
 	canary += tsc + (tsc << 32UL);
 	canary &= CANARY_MASK;
 
+#ifdef CONFIG_X86_GLOBAL_STACKPROTECTOR
+	if (__stack_chk_guard == 0)
+		__stack_chk_guard = canary ?: 1;
+#else /* !CONFIG_X86_GLOBAL_STACKPROTECTOR */
 	current->stack_canary = canary;
 #ifdef CONFIG_X86_64
 	this_cpu_write(irq_stack_union.stack_canary, canary);
-#else
+#else /* CONFIG_X86_32 */
 	this_cpu_write(stack_canary.canary, canary);
 #endif
+#endif
 }
 
 static inline void setup_stack_canary_segment(int cpu)
 {
-#ifdef CONFIG_X86_32
+#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
 	unsigned long canary = (unsigned long)&per_cpu(stack_canary, cpu);
 	struct desc_struct *gdt_table = get_cpu_gdt_rw(cpu);
 	struct desc_struct desc;
@@ -99,7 +108,7 @@ static inline void setup_stack_canary_segment(int cpu)
 
 static inline void load_stack_canary_segment(void)
 {
-#ifdef CONFIG_X86_32
+#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
 	asm("mov %0, %%gs" : : "r" (__KERNEL_STACK_CANARY) : "memory");
 #endif
 }
@@ -115,7 +124,7 @@ static inline void setup_stack_canary_segment(int cpu)
 
 static inline void load_stack_canary_segment(void)
 {
-#ifdef CONFIG_X86_32
+#if defined(CONFIG_X86_32) && !defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
 	asm volatile ("mov %0, %%gs" : : "r" (0));
 #endif
 }
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index de827d6ac8c2..b30a12cd021e 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -30,7 +30,8 @@
 void common(void) {
 	BLANK();
 	OFFSET(TASK_threadsp, task_struct, thread.sp);
-#ifdef CONFIG_CC_STACKPROTECTOR
+#if defined(CONFIG_CC_STACKPROTECTOR) && \
+	!defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
 	OFFSET(TASK_stack_canary, task_struct, stack_canary);
 #endif
 
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index 710edab9e644..33584e7e486b 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -54,7 +54,8 @@ void foo(void)
 	/* Size of SYSENTER_stack */
 	DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack));
 
-#ifdef CONFIG_CC_STACKPROTECTOR
+#if defined(CONFIG_CC_STACKPROTECTOR) && \
+	!defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
 	BLANK();
 	OFFSET(stack_canary_offset, stack_canary, canary);
 #endif
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index cf42206926af..06feb31a09f5 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -64,7 +64,8 @@ int main(void)
 	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
 	BLANK();
 
-#ifdef CONFIG_CC_STACKPROTECTOR
+#if defined(CONFIG_CC_STACKPROTECTOR) && \
+	!defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
 	DEFINE(stack_canary_offset, offsetof(union irq_stack_union, stack_canary));
 	BLANK();
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8732c9e719d5..839f4da2ed76 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1431,7 +1431,8 @@ DEFINE_PER_CPU(unsigned long, cpu_current_top_of_stack) =
 	(unsigned long)&init_thread_union + THREAD_SIZE;
 EXPORT_PER_CPU_SYMBOL(cpu_current_top_of_stack);
 
-#ifdef CONFIG_CC_STACKPROTECTOR
+#if defined(CONFIG_CC_STACKPROTECTOR) && \
+	!defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
 DEFINE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);
 #endif
 
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 9ed3074d0d27..a55a67b33934 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -377,7 +377,8 @@ ENDPROC(startup_32_smp)
  */
 __INIT
 setup_once:
-#ifdef CONFIG_CC_STACKPROTECTOR
+#if defined(CONFIG_CC_STACKPROTECTOR) && \
+	!defined(CONFIG_X86_GLOBAL_STACKPROTECTOR)
 	/*
 	 * Configure the stack canary. The linker can't handle this by
 	 * relocation.  Manually set base address in stack canary
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index bd6b85fac666..66ea1a35413e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -73,6 +73,11 @@ EXPORT_PER_CPU_SYMBOL(cpu_tss);
 DEFINE_PER_CPU(bool, __tss_limit_invalid);
 EXPORT_PER_CPU_SYMBOL_GPL(__tss_limit_invalid);
 
+#ifdef CONFIG_X86_GLOBAL_STACKPROTECTOR
+unsigned long __stack_chk_guard __read_mostly;
+EXPORT_SYMBOL(__stack_chk_guard);
+#endif
+
 /*
  * this gets called so that we can store lazy state into memory and copy the
  * current task into the new thread.
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 20/27] x86/ftrace: Adapt function tracing for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

When using -fPIE/PIC with function tracing, the compiler generates a
call through the GOT (call *__fentry__@GOTPCREL). This instruction
takes 6 bytes instead of the 5 bytes of the usual relative call.

With this change, function tracing supports the 6-byte call on traceable
functions and can still replace relative calls in the ftrace assembly
functions.

Position Independent Executable (PIE) support will allow the KASLR
randomization range to be extended below the -2G memory limit.
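
A small standalone sketch of the two encodings involved (the struct names
and addresses below are made up for illustration; the kernel uses its own
ftrace_code_union types): the relative call is e8 plus a 32-bit
displacement, the GOT call is ff 15 plus a 32-bit RIP-relative offset to
the GOT slot, and both displacements are measured from the end of the
instruction, as ftrace_calc_offset() does.

#include <stdint.h>
#include <stdio.h>

struct rel_call {			/* e8 <rel32>: 5 bytes */
	uint8_t  e8;
	int32_t  offset;
} __attribute__((packed));

struct got_call {			/* ff 15 <rel32>: 6 bytes */
	uint16_t ff15;
	int32_t  offset;
} __attribute__((packed));

int main(void)
{
	unsigned long ip = 0xffffffff81000010UL;	/* call site */
	unsigned long target = 0xffffffff81a00000UL;	/* __fentry__ */
	unsigned long got_slot = 0xffffffff82000000UL;	/* GOT entry */
	struct rel_call rel = { .e8 = 0xe8 };
	struct got_call got = { .ff15 = 0x15ff };	/* ff 15, little endian */

	/* Displacements are relative to the end of the instruction. */
	rel.offset = (int32_t)(target - (ip + sizeof(rel)));
	got.offset = (int32_t)(got_slot - (ip + sizeof(got)));

	printf("relative call: %zu bytes, rel32 = %d\n", sizeof(rel), rel.offset);
	printf("GOT call:      %zu bytes, rel32 = %d\n", sizeof(got), got.offset);
	return 0;
}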

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/include/asm/ftrace.h   |  23 +++++-
 arch/x86/include/asm/sections.h |   4 +
 arch/x86/kernel/ftrace.c        | 168 ++++++++++++++++++++++++++--------------
 arch/x86/kernel/module.lds      |   3 +
 4 files changed, 139 insertions(+), 59 deletions(-)
 create mode 100644 arch/x86/kernel/module.lds

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index eccd0ac6bc38..b8bbcc7fad7f 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -1,6 +1,7 @@
 #ifndef _ASM_X86_FTRACE_H
 #define _ASM_X86_FTRACE_H
 
+
 #ifdef CONFIG_FUNCTION_TRACER
 #ifdef CC_USING_FENTRY
 # define MCOUNT_ADDR		((unsigned long)(__fentry__))
@@ -8,7 +9,19 @@
 # define MCOUNT_ADDR		((unsigned long)(mcount))
 # define HAVE_FUNCTION_GRAPH_FP_TEST
 #endif
-#define MCOUNT_INSN_SIZE	5 /* sizeof mcount call */
+
+#define MCOUNT_RELINSN_SIZE	5 /* sizeof relative (call or jump) */
+#define MCOUNT_GOTCALL_SIZE	6 /* sizeof call *got */
+
+/*
+ * MCOUNT_INSN_SIZE is the highest size of instructions based on the
+ * configuration.
+ */
+#ifdef CONFIG_X86_PIE
+#define MCOUNT_INSN_SIZE	MCOUNT_GOTCALL_SIZE
+#else
+#define MCOUNT_INSN_SIZE	MCOUNT_RELINSN_SIZE
+#endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 #define ARCH_SUPPORTS_FTRACE_OPS 1
@@ -17,6 +30,8 @@
 #define HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
 
 #ifndef __ASSEMBLY__
+#include <asm/sections.h>
+
 extern void mcount(void);
 extern atomic_t modifying_ftrace_code;
 extern void __fentry__(void);
@@ -24,9 +39,11 @@ extern void __fentry__(void);
 static inline unsigned long ftrace_call_adjust(unsigned long addr)
 {
 	/*
-	 * addr is the address of the mcount call instruction.
-	 * recordmcount does the necessary offset calculation.
+	 * addr is the address of the mcount call instruction. With PIE, a
+	 * byte is always added at the start of the function.
 	 */
+	if (IS_ENABLED(CONFIG_X86_PIE))
+		addr -= 1;
 	return addr;
 }
 
diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index 2f75f30cb2f6..6b2d496cf1aa 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -11,4 +11,8 @@ extern struct exception_table_entry __stop___ex_table[];
 extern char __end_rodata_hpage_align[];
 #endif
 
+#if defined(CONFIG_X86_PIE)
+extern char __start_got[], __end_got[];
+#endif
+
 #endif	/* _ASM_X86_SECTIONS_H */
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 9bef1bbeba63..41d8c4c4306d 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -58,12 +58,17 @@ static int ftrace_calc_offset(long ip, long addr)
 	return (int)(addr - ip);
 }
 
-static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
+static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr,
+					  unsigned int size)
 {
 	static union ftrace_code_union calc;
 
+	/* On PIE, fill the rest of the buffer with nops */
+	if (IS_ENABLED(CONFIG_X86_PIE))
+		memset(calc.code, ideal_nops[1][0], sizeof(calc.code));
+
 	calc.e8		= 0xe8;
-	calc.offset	= ftrace_calc_offset(ip + MCOUNT_INSN_SIZE, addr);
+	calc.offset	= ftrace_calc_offset(ip + MCOUNT_RELINSN_SIZE, addr);
 
 	/*
 	 * No locking needed, this must be called via kstop_machine
@@ -72,6 +77,44 @@ static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
 	return calc.code;
 }
 
+#ifdef CONFIG_X86_PIE
+union ftrace_code_got_union {
+	char code[MCOUNT_INSN_SIZE];
+	struct {
+		unsigned short ff15;
+		int offset;
+	} __attribute__((packed));
+};
+
+/* Used to identify a mcount GOT call on PIE */
+static unsigned char *ftrace_original_call(struct module* mod, unsigned long ip,
+					   unsigned long addr,
+					   unsigned int size)
+{
+	static union ftrace_code_got_union calc;
+	unsigned long gotaddr;
+
+	calc.ff15 = 0x15ff;
+
+	gotaddr = module_find_got_entry(mod, addr);
+	if (!gotaddr) {
+		pr_err("Failed to find GOT entry for 0x%lx\n", addr);
+		return NULL;
+	}
+
+	calc.offset = ftrace_calc_offset(ip + MCOUNT_GOTCALL_SIZE, gotaddr);
+	return calc.code;
+}
+#else
+static unsigned char *ftrace_original_call(struct module* mod, unsigned long ip,
+					   unsigned long addr,
+					   unsigned int size)
+{
+	return ftrace_call_replace(ip, addr, size);
+}
+
+#endif
+
 static inline int
 within(unsigned long addr, unsigned long start, unsigned long end)
 {
@@ -94,16 +137,18 @@ static unsigned long text_ip_addr(unsigned long ip)
 	return ip;
 }
 
-static const unsigned char *ftrace_nop_replace(void)
+static const unsigned char *ftrace_nop_replace(unsigned int size)
 {
-	return ideal_nops[NOP_ATOMIC5];
+	return ideal_nops[size == 5 ? NOP_ATOMIC5 : size];
 }
 
 static int
-ftrace_modify_code_direct(unsigned long ip, unsigned const char *old_code,
-		   unsigned const char *new_code)
+ftrace_modify_code_direct(struct dyn_ftrace *rec, unsigned const char *old_code,
+			  unsigned const char *new_code)
 {
 	unsigned char replaced[MCOUNT_INSN_SIZE];
+	unsigned long ip = rec->ip;
+	unsigned int size = MCOUNT_INSN_SIZE;
 
 	ftrace_expected = old_code;
 
@@ -116,17 +161,17 @@ ftrace_modify_code_direct(unsigned long ip, unsigned const char *old_code,
 	 */
 
 	/* read the text we want to modify */
-	if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
+	if (probe_kernel_read(replaced, (void *)ip, size))
 		return -EFAULT;
 
 	/* Make sure it is what we expect it to be */
-	if (memcmp(replaced, old_code, MCOUNT_INSN_SIZE) != 0)
+	if (memcmp(replaced, old_code, size) != 0)
 		return -EINVAL;
 
 	ip = text_ip_addr(ip);
 
 	/* replace the text with the new text */
-	if (probe_kernel_write((void *)ip, new_code, MCOUNT_INSN_SIZE))
+	if (probe_kernel_write((void *)ip, new_code, size))
 		return -EPERM;
 
 	sync_core();
@@ -139,9 +184,7 @@ int ftrace_make_nop(struct module *mod,
 {
 	unsigned const char *new, *old;
 	unsigned long ip = rec->ip;
-
-	old = ftrace_call_replace(ip, addr);
-	new = ftrace_nop_replace();
+	unsigned int size = MCOUNT_INSN_SIZE;
 
 	/*
 	 * On boot up, and when modules are loaded, the MCOUNT_ADDR
@@ -151,14 +194,20 @@ int ftrace_make_nop(struct module *mod,
 	 * We do not want to use the breakpoint version in this case,
 	 * just modify the code directly.
 	 */
-	if (addr == MCOUNT_ADDR)
-		return ftrace_modify_code_direct(rec->ip, old, new);
+	if (addr != MCOUNT_ADDR) {
+		ftrace_expected = NULL;
 
-	ftrace_expected = NULL;
+		/* Normal cases use add_brk_on_nop */
+		WARN_ONCE(1, "invalid use of ftrace_make_nop");
+		return -EINVAL;
+	}
 
-	/* Normal cases use add_brk_on_nop */
-	WARN_ONCE(1, "invalid use of ftrace_make_nop");
-	return -EINVAL;
+	old = ftrace_original_call(mod, ip, addr, size);
+	if (!old)
+		return -EINVAL;
+	new = ftrace_nop_replace(size);
+
+	return ftrace_modify_code_direct(rec, old, new);
 }
 
 int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
@@ -166,11 +215,11 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	unsigned const char *new, *old;
 	unsigned long ip = rec->ip;
 
-	old = ftrace_nop_replace();
-	new = ftrace_call_replace(ip, addr);
+	old = ftrace_nop_replace(MCOUNT_INSN_SIZE);
+	new = ftrace_call_replace(ip, addr, MCOUNT_INSN_SIZE);
 
 	/* Should only be called when module is loaded */
-	return ftrace_modify_code_direct(rec->ip, old, new);
+	return ftrace_modify_code_direct(rec, old, new);
 }
 
 /*
@@ -233,7 +282,7 @@ static int update_ftrace_func(unsigned long ip, void *new)
 	unsigned char old[MCOUNT_INSN_SIZE];
 	int ret;
 
-	memcpy(old, (void *)ip, MCOUNT_INSN_SIZE);
+	memcpy(old, (void *)ip, MCOUNT_RELINSN_SIZE);
 
 	ftrace_update_func = ip;
 	/* Make sure the breakpoints see the ftrace_update_func update */
@@ -255,13 +304,14 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
 	unsigned char *new;
 	int ret;
 
-	new = ftrace_call_replace(ip, (unsigned long)func);
+	new = ftrace_call_replace(ip, (unsigned long)func, MCOUNT_RELINSN_SIZE);
 	ret = update_ftrace_func(ip, new);
 
 	/* Also update the regs callback function */
 	if (!ret) {
 		ip = (unsigned long)(&ftrace_regs_call);
-		new = ftrace_call_replace(ip, (unsigned long)func);
+		new = ftrace_call_replace(ip, (unsigned long)func,
+					  MCOUNT_RELINSN_SIZE);
 		ret = update_ftrace_func(ip, new);
 	}
 
@@ -309,18 +359,18 @@ static int ftrace_write(unsigned long ip, const char *val, int size)
 	return 0;
 }
 
-static int add_break(unsigned long ip, const char *old)
+static int add_break(unsigned long ip, const char *old, unsigned int size)
 {
 	unsigned char replaced[MCOUNT_INSN_SIZE];
 	unsigned char brk = BREAKPOINT_INSTRUCTION;
 
-	if (probe_kernel_read(replaced, (void *)ip, MCOUNT_INSN_SIZE))
+	if (probe_kernel_read(replaced, (void *)ip, size))
 		return -EFAULT;
 
 	ftrace_expected = old;
 
 	/* Make sure it is what we expect it to be */
-	if (memcmp(replaced, old, MCOUNT_INSN_SIZE) != 0)
+	if (memcmp(replaced, old, size) != 0)
 		return -EINVAL;
 
 	return ftrace_write(ip, &brk, 1);
@@ -330,20 +380,22 @@ static int add_brk_on_call(struct dyn_ftrace *rec, unsigned long addr)
 {
 	unsigned const char *old;
 	unsigned long ip = rec->ip;
+	unsigned int size = MCOUNT_INSN_SIZE;
 
-	old = ftrace_call_replace(ip, addr);
+	old = ftrace_call_replace(ip, addr, size);
 
-	return add_break(rec->ip, old);
+	return add_break(rec->ip, old, size);
 }
 
 
 static int add_brk_on_nop(struct dyn_ftrace *rec)
 {
 	unsigned const char *old;
+	unsigned int size = MCOUNT_INSN_SIZE;
 
-	old = ftrace_nop_replace();
+	old = ftrace_nop_replace(size);
 
-	return add_break(rec->ip, old);
+	return add_break(rec->ip, old, size);
 }
 
 static int add_breakpoints(struct dyn_ftrace *rec, int enable)
@@ -386,22 +438,23 @@ static int remove_breakpoint(struct dyn_ftrace *rec)
 	const unsigned char *nop;
 	unsigned long ftrace_addr;
 	unsigned long ip = rec->ip;
+	unsigned int size = MCOUNT_INSN_SIZE;
 
 	/* If we fail the read, just give up */
-	if (probe_kernel_read(ins, (void *)ip, MCOUNT_INSN_SIZE))
+	if (probe_kernel_read(ins, (void *)ip, size))
 		return -EFAULT;
 
 	/* If this does not have a breakpoint, we are done */
 	if (ins[0] != brk)
 		return 0;
 
-	nop = ftrace_nop_replace();
+	nop = ftrace_nop_replace(size);
 
 	/*
 	 * If the last 4 bytes of the instruction do not match
 	 * a nop, then we assume that this is a call to ftrace_addr.
 	 */
-	if (memcmp(&ins[1], &nop[1], MCOUNT_INSN_SIZE - 1) != 0) {
+	if (memcmp(&ins[1], &nop[1], size - 1) != 0) {
 		/*
 		 * For extra paranoidism, we check if the breakpoint is on
 		 * a call that would actually jump to the ftrace_addr.
@@ -409,18 +462,18 @@ static int remove_breakpoint(struct dyn_ftrace *rec)
 		 * a disaster.
 		 */
 		ftrace_addr = ftrace_get_addr_new(rec);
-		nop = ftrace_call_replace(ip, ftrace_addr);
+		nop = ftrace_call_replace(ip, ftrace_addr, size);
 
-		if (memcmp(&ins[1], &nop[1], MCOUNT_INSN_SIZE - 1) == 0)
+		if (memcmp(&ins[1], &nop[1], size - 1) == 0)
 			goto update;
 
 		/* Check both ftrace_addr and ftrace_old_addr */
 		ftrace_addr = ftrace_get_addr_curr(rec);
-		nop = ftrace_call_replace(ip, ftrace_addr);
+		nop = ftrace_call_replace(ip, ftrace_addr, size);
 
 		ftrace_expected = nop;
 
-		if (memcmp(&ins[1], &nop[1], MCOUNT_INSN_SIZE - 1) != 0)
+		if (memcmp(&ins[1], &nop[1], size - 1) != 0)
 			return -EINVAL;
 	}
 
@@ -428,30 +481,33 @@ static int remove_breakpoint(struct dyn_ftrace *rec)
 	return ftrace_write(ip, nop, 1);
 }
 
-static int add_update_code(unsigned long ip, unsigned const char *new)
+static int add_update_code(unsigned long ip, unsigned const char *new,
+			   unsigned int size)
 {
 	/* skip breakpoint */
 	ip++;
 	new++;
-	return ftrace_write(ip, new, MCOUNT_INSN_SIZE - 1);
+	return ftrace_write(ip, new, size - 1);
 }
 
 static int add_update_call(struct dyn_ftrace *rec, unsigned long addr)
 {
 	unsigned long ip = rec->ip;
+	unsigned int size = MCOUNT_INSN_SIZE;
 	unsigned const char *new;
 
-	new = ftrace_call_replace(ip, addr);
-	return add_update_code(ip, new);
+	new = ftrace_call_replace(ip, addr, size);
+	return add_update_code(ip, new, size);
 }
 
 static int add_update_nop(struct dyn_ftrace *rec)
 {
 	unsigned long ip = rec->ip;
+	unsigned int size = MCOUNT_INSN_SIZE;
 	unsigned const char *new;
 
-	new = ftrace_nop_replace();
-	return add_update_code(ip, new);
+	new = ftrace_nop_replace(size);
+	return add_update_code(ip, new, size);
 }
 
 static int add_update(struct dyn_ftrace *rec, int enable)
@@ -485,7 +541,7 @@ static int finish_update_call(struct dyn_ftrace *rec, unsigned long addr)
 	unsigned long ip = rec->ip;
 	unsigned const char *new;
 
-	new = ftrace_call_replace(ip, addr);
+	new = ftrace_call_replace(ip, addr, MCOUNT_INSN_SIZE);
 
 	return ftrace_write(ip, new, 1);
 }
@@ -495,7 +551,7 @@ static int finish_update_nop(struct dyn_ftrace *rec)
 	unsigned long ip = rec->ip;
 	unsigned const char *new;
 
-	new = ftrace_nop_replace();
+	new = ftrace_nop_replace(MCOUNT_INSN_SIZE);
 
 	return ftrace_write(ip, new, 1);
 }
@@ -619,13 +675,13 @@ ftrace_modify_code(unsigned long ip, unsigned const char *old_code,
 {
 	int ret;
 
-	ret = add_break(ip, old_code);
+	ret = add_break(ip, old_code, MCOUNT_RELINSN_SIZE);
 	if (ret)
 		goto out;
 
 	run_sync();
 
-	ret = add_update_code(ip, new_code);
+	ret = add_update_code(ip, new_code, MCOUNT_RELINSN_SIZE);
 	if (ret)
 		goto fail_update;
 
@@ -670,7 +726,7 @@ static unsigned char *ftrace_jmp_replace(unsigned long ip, unsigned long addr)
 
 	/* Jmp not a call (ignore the .e8) */
 	calc.e8		= 0xe9;
-	calc.offset	= ftrace_calc_offset(ip + MCOUNT_INSN_SIZE, addr);
+	calc.offset	= ftrace_calc_offset(ip + MCOUNT_RELINSN_SIZE, addr);
 
 	/*
 	 * ftrace external locks synchronize the access to the static variable.
@@ -766,11 +822,11 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 	 * the jmp to ftrace_epilogue, as well as the address of
 	 * the ftrace_ops this trampoline is used for.
 	 */
-	trampoline = alloc_tramp(size + MCOUNT_INSN_SIZE + sizeof(void *));
+	trampoline = alloc_tramp(size + MCOUNT_RELINSN_SIZE + sizeof(void *));
 	if (!trampoline)
 		return 0;
 
-	*tramp_size = size + MCOUNT_INSN_SIZE + sizeof(void *);
+	*tramp_size = size + MCOUNT_RELINSN_SIZE + sizeof(void *);
 
 	/* Copy ftrace_caller onto the trampoline memory */
 	ret = probe_kernel_read(trampoline, (void *)start_offset, size);
@@ -783,7 +839,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 
 	/* The trampoline ends with a jmp to ftrace_epilogue */
 	jmp = ftrace_jmp_replace(ip, (unsigned long)ftrace_epilogue);
-	memcpy(trampoline + size, jmp, MCOUNT_INSN_SIZE);
+	memcpy(trampoline + size, jmp, MCOUNT_RELINSN_SIZE);
 
 	/*
 	 * The address of the ftrace_ops that is used for this trampoline
@@ -793,7 +849,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 	 * the global function_trace_op variable.
 	 */
 
-	ptr = (unsigned long *)(trampoline + size + MCOUNT_INSN_SIZE);
+	ptr = (unsigned long *)(trampoline + size + MCOUNT_RELINSN_SIZE);
 	*ptr = (unsigned long)ops;
 
 	op_offset -= start_offset;
@@ -868,7 +924,7 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 	func = ftrace_ops_get_func(ops);
 
 	/* Do a safe modify in case the trampoline is executing */
-	new = ftrace_call_replace(ip, (unsigned long)func);
+	new = ftrace_call_replace(ip, (unsigned long)func, MCOUNT_RELINSN_SIZE);
 	ret = update_ftrace_func(ip, new);
 	set_memory_ro(ops->trampoline, npages);
 
@@ -882,7 +938,7 @@ static void *addr_from_call(void *ptr)
 	union ftrace_code_union calc;
 	int ret;
 
-	ret = probe_kernel_read(&calc, ptr, MCOUNT_INSN_SIZE);
+	ret = probe_kernel_read(&calc, ptr, MCOUNT_RELINSN_SIZE);
 	if (WARN_ON_ONCE(ret < 0))
 		return NULL;
 
@@ -892,7 +948,7 @@ static void *addr_from_call(void *ptr)
 		return NULL;
 	}
 
-	return ptr + MCOUNT_INSN_SIZE + calc.offset;
+	return ptr + MCOUNT_RELINSN_SIZE + calc.offset;
 }
 
 void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
diff --git a/arch/x86/kernel/module.lds b/arch/x86/kernel/module.lds
new file mode 100644
index 000000000000..fd6e95a4b454
--- /dev/null
+++ b/arch/x86/kernel/module.lds
@@ -0,0 +1,3 @@
+SECTIONS {
+	.got (NOLOAD) : { BYTE(0) }
+}
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

+	if (probe_kernel_read(replaced, (void *)ip, size))
 		return -EFAULT;
 
 	ftrace_expected = old;
 
 	/* Make sure it is what we expect it to be */
-	if (memcmp(replaced, old, MCOUNT_INSN_SIZE) != 0)
+	if (memcmp(replaced, old, size) != 0)
 		return -EINVAL;
 
 	return ftrace_write(ip, &brk, 1);
@@ -330,20 +380,22 @@ static int add_brk_on_call(struct dyn_ftrace *rec, unsigned long addr)
 {
 	unsigned const char *old;
 	unsigned long ip = rec->ip;
+	unsigned int size = MCOUNT_INSN_SIZE;
 
-	old = ftrace_call_replace(ip, addr);
+	old = ftrace_call_replace(ip, addr, size);
 
-	return add_break(rec->ip, old);
+	return add_break(rec->ip, old, size);
 }
 
 
 static int add_brk_on_nop(struct dyn_ftrace *rec)
 {
 	unsigned const char *old;
+	unsigned int size = MCOUNT_INSN_SIZE;
 
-	old = ftrace_nop_replace();
+	old = ftrace_nop_replace(size);
 
-	return add_break(rec->ip, old);
+	return add_break(rec->ip, old, size);
 }
 
 static int add_breakpoints(struct dyn_ftrace *rec, int enable)
@@ -386,22 +438,23 @@ static int remove_breakpoint(struct dyn_ftrace *rec)
 	const unsigned char *nop;
 	unsigned long ftrace_addr;
 	unsigned long ip = rec->ip;
+	unsigned int size = MCOUNT_INSN_SIZE;
 
 	/* If we fail the read, just give up */
-	if (probe_kernel_read(ins, (void *)ip, MCOUNT_INSN_SIZE))
+	if (probe_kernel_read(ins, (void *)ip, size))
 		return -EFAULT;
 
 	/* If this does not have a breakpoint, we are done */
 	if (ins[0] != brk)
 		return 0;
 
-	nop = ftrace_nop_replace();
+	nop = ftrace_nop_replace(size);
 
 	/*
 	 * If the last 4 bytes of the instruction do not match
 	 * a nop, then we assume that this is a call to ftrace_addr.
 	 */
-	if (memcmp(&ins[1], &nop[1], MCOUNT_INSN_SIZE - 1) != 0) {
+	if (memcmp(&ins[1], &nop[1], size - 1) != 0) {
 		/*
 		 * For extra paranoidism, we check if the breakpoint is on
 		 * a call that would actually jump to the ftrace_addr.
@@ -409,18 +462,18 @@ static int remove_breakpoint(struct dyn_ftrace *rec)
 		 * a disaster.
 		 */
 		ftrace_addr = ftrace_get_addr_new(rec);
-		nop = ftrace_call_replace(ip, ftrace_addr);
+		nop = ftrace_call_replace(ip, ftrace_addr, size);
 
-		if (memcmp(&ins[1], &nop[1], MCOUNT_INSN_SIZE - 1) == 0)
+		if (memcmp(&ins[1], &nop[1], size - 1) == 0)
 			goto update;
 
 		/* Check both ftrace_addr and ftrace_old_addr */
 		ftrace_addr = ftrace_get_addr_curr(rec);
-		nop = ftrace_call_replace(ip, ftrace_addr);
+		nop = ftrace_call_replace(ip, ftrace_addr, size);
 
 		ftrace_expected = nop;
 
-		if (memcmp(&ins[1], &nop[1], MCOUNT_INSN_SIZE - 1) != 0)
+		if (memcmp(&ins[1], &nop[1], size - 1) != 0)
 			return -EINVAL;
 	}
 
@@ -428,30 +481,33 @@ static int remove_breakpoint(struct dyn_ftrace *rec)
 	return ftrace_write(ip, nop, 1);
 }
 
-static int add_update_code(unsigned long ip, unsigned const char *new)
+static int add_update_code(unsigned long ip, unsigned const char *new,
+			   unsigned int size)
 {
 	/* skip breakpoint */
 	ip++;
 	new++;
-	return ftrace_write(ip, new, MCOUNT_INSN_SIZE - 1);
+	return ftrace_write(ip, new, size - 1);
 }
 
 static int add_update_call(struct dyn_ftrace *rec, unsigned long addr)
 {
 	unsigned long ip = rec->ip;
+	unsigned int size = MCOUNT_INSN_SIZE;
 	unsigned const char *new;
 
-	new = ftrace_call_replace(ip, addr);
-	return add_update_code(ip, new);
+	new = ftrace_call_replace(ip, addr, size);
+	return add_update_code(ip, new, size);
 }
 
 static int add_update_nop(struct dyn_ftrace *rec)
 {
 	unsigned long ip = rec->ip;
+	unsigned int size = MCOUNT_INSN_SIZE;
 	unsigned const char *new;
 
-	new = ftrace_nop_replace();
-	return add_update_code(ip, new);
+	new = ftrace_nop_replace(size);
+	return add_update_code(ip, new, size);
 }
 
 static int add_update(struct dyn_ftrace *rec, int enable)
@@ -485,7 +541,7 @@ static int finish_update_call(struct dyn_ftrace *rec, unsigned long addr)
 	unsigned long ip = rec->ip;
 	unsigned const char *new;
 
-	new = ftrace_call_replace(ip, addr);
+	new = ftrace_call_replace(ip, addr, MCOUNT_INSN_SIZE);
 
 	return ftrace_write(ip, new, 1);
 }
@@ -495,7 +551,7 @@ static int finish_update_nop(struct dyn_ftrace *rec)
 	unsigned long ip = rec->ip;
 	unsigned const char *new;
 
-	new = ftrace_nop_replace();
+	new = ftrace_nop_replace(MCOUNT_INSN_SIZE);
 
 	return ftrace_write(ip, new, 1);
 }
@@ -619,13 +675,13 @@ ftrace_modify_code(unsigned long ip, unsigned const char *old_code,
 {
 	int ret;
 
-	ret = add_break(ip, old_code);
+	ret = add_break(ip, old_code, MCOUNT_RELINSN_SIZE);
 	if (ret)
 		goto out;
 
 	run_sync();
 
-	ret = add_update_code(ip, new_code);
+	ret = add_update_code(ip, new_code, MCOUNT_RELINSN_SIZE);
 	if (ret)
 		goto fail_update;
 
@@ -670,7 +726,7 @@ static unsigned char *ftrace_jmp_replace(unsigned long ip, unsigned long addr)
 
 	/* Jmp not a call (ignore the .e8) */
 	calc.e8		= 0xe9;
-	calc.offset	= ftrace_calc_offset(ip + MCOUNT_INSN_SIZE, addr);
+	calc.offset	= ftrace_calc_offset(ip + MCOUNT_RELINSN_SIZE, addr);
 
 	/*
 	 * ftrace external locks synchronize the access to the static variable.
@@ -766,11 +822,11 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 	 * the jmp to ftrace_epilogue, as well as the address of
 	 * the ftrace_ops this trampoline is used for.
 	 */
-	trampoline = alloc_tramp(size + MCOUNT_INSN_SIZE + sizeof(void *));
+	trampoline = alloc_tramp(size + MCOUNT_RELINSN_SIZE + sizeof(void *));
 	if (!trampoline)
 		return 0;
 
-	*tramp_size = size + MCOUNT_INSN_SIZE + sizeof(void *);
+	*tramp_size = size + MCOUNT_RELINSN_SIZE + sizeof(void *);
 
 	/* Copy ftrace_caller onto the trampoline memory */
 	ret = probe_kernel_read(trampoline, (void *)start_offset, size);
@@ -783,7 +839,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 
 	/* The trampoline ends with a jmp to ftrace_epilogue */
 	jmp = ftrace_jmp_replace(ip, (unsigned long)ftrace_epilogue);
-	memcpy(trampoline + size, jmp, MCOUNT_INSN_SIZE);
+	memcpy(trampoline + size, jmp, MCOUNT_RELINSN_SIZE);
 
 	/*
 	 * The address of the ftrace_ops that is used for this trampoline
@@ -793,7 +849,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 	 * the global function_trace_op variable.
 	 */
 
-	ptr = (unsigned long *)(trampoline + size + MCOUNT_INSN_SIZE);
+	ptr = (unsigned long *)(trampoline + size + MCOUNT_RELINSN_SIZE);
 	*ptr = (unsigned long)ops;
 
 	op_offset -= start_offset;
@@ -868,7 +924,7 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 	func = ftrace_ops_get_func(ops);
 
 	/* Do a safe modify in case the trampoline is executing */
-	new = ftrace_call_replace(ip, (unsigned long)func);
+	new = ftrace_call_replace(ip, (unsigned long)func, MCOUNT_RELINSN_SIZE);
 	ret = update_ftrace_func(ip, new);
 	set_memory_ro(ops->trampoline, npages);
 
@@ -882,7 +938,7 @@ static void *addr_from_call(void *ptr)
 	union ftrace_code_union calc;
 	int ret;
 
-	ret = probe_kernel_read(&calc, ptr, MCOUNT_INSN_SIZE);
+	ret = probe_kernel_read(&calc, ptr, MCOUNT_RELINSN_SIZE);
 	if (WARN_ON_ONCE(ret < 0))
 		return NULL;
 
@@ -892,7 +948,7 @@ static void *addr_from_call(void *ptr)
 		return NULL;
 	}
 
-	return ptr + MCOUNT_INSN_SIZE + calc.offset;
+	return ptr + MCOUNT_RELINSN_SIZE + calc.offset;
 }
 
 void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
diff --git a/arch/x86/kernel/module.lds b/arch/x86/kernel/module.lds
new file mode 100644
index 000000000000..fd6e95a4b454
--- /dev/null
+++ b/arch/x86/kernel/module.lds
@@ -0,0 +1,3 @@
+SECTIONS {
+	.got (NOLOAD) : { BYTE(0) }
+}
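
For readers skimming the hunks above: PIE-built code reaches __fentry__
through the GOT with a 6-byte "ff 15" indirect call, while the classic
mcount site is a 5-byte "e8" relative call, which is why the code now
distinguishes two instruction sizes and pads with nops. The sketch below
is illustrative only (not part of the patch); the 5/6-byte sizes and the
helper names are assumptions inferred from the hunks.

/*
 * Illustrative sketch, not patch code: the two mcount call encodings.
 */
#include <stdint.h>
#include <string.h>

#define RELCALL_SIZE	5	/* e8 <rel32>    : call func       */
#define GOTCALL_SIZE	6	/* ff 15 <rel32> : call *got(%rip) */

static void encode_rel_call(uint8_t *buf, uint64_t ip, uint64_t target)
{
	int32_t rel = (int32_t)(target - (ip + RELCALL_SIZE));

	buf[0] = 0xe8;			/* direct, PC-relative call */
	memcpy(&buf[1], &rel, sizeof(rel));
}

static void encode_got_call(uint8_t *buf, uint64_t ip, uint64_t got_entry)
{
	int32_t rel = (int32_t)(got_entry - (ip + GOTCALL_SIZE));

	buf[0] = 0xff;			/* indirect call via a GOT slot */
	buf[1] = 0x15;
	memcpy(&buf[2], &rel, sizeof(rel));
}
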
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 21/27] x86/mm/dump_pagetables: Fix address markers index on x86_64
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

The address_markers_idx enum is not aligned with the table when EFI is
enabled. Add an EFI_VA_END_NR entry in this case.
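
To illustrate the pattern (a minimal sketch with hypothetical CONFIG_FOO
names, not code from this patch): the enum values are used as indexes
into the address_markers table, so a table row guarded by an #ifdef
needs a matching enum entry under the same #ifdef, otherwise every later
index selects the wrong row when entries are patched at runtime.

/* Minimal sketch, hypothetical names. */
struct addr_marker {
	unsigned long start_address;
	const char *name;
};

enum marker_idx {
	A_NR,
#ifdef CONFIG_FOO
	FOO_NR,			/* must exist iff the table has a FOO row */
#endif
	B_NR,
};

static struct addr_marker markers[] = {
	{ 0, "A" },
#ifdef CONFIG_FOO
	{ 0, "FOO" },
#endif
	{ 0, "B" },
};

/* Runtime fixups rely on the indexes lining up, e.g.:
 *	markers[B_NR].start_address = some_runtime_value;
 */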

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/mm/dump_pagetables.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 5e3ac6fe6c9e..8691a57da63e 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -52,12 +52,15 @@ enum address_markers_idx {
 	LOW_KERNEL_NR,
 	VMALLOC_START_NR,
 	VMEMMAP_START_NR,
-#ifdef CONFIG_KASAN
+# ifdef CONFIG_KASAN
 	KASAN_SHADOW_START_NR,
 	KASAN_SHADOW_END_NR,
-#endif
+# endif
 # ifdef CONFIG_X86_ESPFIX64
 	ESPFIX_START_NR,
+# endif
+# ifdef CONFIG_EFI
+	EFI_VA_END_NR,
 # endif
 	HIGH_KERNEL_NR,
 	MODULES_VADDR_NR,
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 22/27] x86/modules: Add option to start module section after kernel
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Add an option so the module section is placed just after the mapped
kernel. This ensures position-independent modules are always at the
right distance from the kernel and do not require mcmodel=large. It
also optimizes the available size for modules by getting rid of the
empty space in the kernel randomization range.
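
As a rough illustration (a sketch under assumed constants, not kernel
code), the dynamic MODULES_VADDR is the end of the mapped kernel plus
one page, rounded up to a 2MB PMD boundary:

#define PAGE_SIZE	0x1000UL
#define PMD_SIZE	0x200000UL	/* assumed 2MB PMD */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

static unsigned long modules_vaddr(unsigned long kernel_end)
{
	/* e.g. kernel_end = 0xffffffff826f1000 -> 0xffffffff82800000 */
	return ALIGN_UP(kernel_end + PAGE_SIZE, PMD_SIZE);
}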

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 Documentation/x86/x86_64/mm.txt         | 3 +++
 arch/x86/Kconfig                        | 4 ++++
 arch/x86/include/asm/pgtable_64_types.h | 6 +++++-
 arch/x86/kernel/head64.c                | 5 ++++-
 arch/x86/mm/dump_pagetables.c           | 4 ++--
 5 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index b0798e281aa6..b51d66386e32 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -73,4 +73,7 @@ Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
 physical memory, vmalloc/ioremap space and virtual memory map are randomized.
 Their order is preserved but their base will be offset early at boot time.
 
+If CONFIG_DYNAMIC_MODULE_BASE is enabled, the module section follows the end of
+the mapped kernel.
+
 -Andi Kleen, Jul 2004
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 777197fab6dd..1e4b399c64e5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2133,6 +2133,10 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
 
 	   If unsure, leave at the default value.
 
+# Module section starts just after the end of the kernel image
+config DYNAMIC_MODULE_BASE
+	bool
+
 config X86_GLOBAL_STACKPROTECTOR
 	bool
 	depends on CC_STACKPROTECTOR
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 06470da156ba..e00fc429b898 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -6,6 +6,7 @@
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 #include <asm/kaslr.h>
+#include <asm/sections.h>
 
 /*
  * These are used to make use of C type-checking..
@@ -18,7 +19,6 @@ typedef unsigned long	pgdval_t;
 typedef unsigned long	pgprotval_t;
 
 typedef struct { pteval_t pte; } pte_t;
-
 #endif	/* !__ASSEMBLY__ */
 
 #define SHARED_KERNEL_PMD	0
@@ -93,7 +93,11 @@ typedef struct { pteval_t pte; } pte_t;
 #define VMEMMAP_START	__VMEMMAP_BASE
 #endif /* CONFIG_RANDOMIZE_MEMORY */
 #define VMALLOC_END	(VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
+#ifdef CONFIG_DYNAMIC_MODULE_BASE
+#define MODULES_VADDR   ALIGN(((unsigned long)_end + PAGE_SIZE), PMD_SIZE)
+#else
 #define MODULES_VADDR    (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
+#endif
 /* The module sections ends with the start of the fixmap */
 #define MODULES_END   __fix_to_virt(__end_of_fixed_addresses + 1)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 675f1dba3b21..b6363f0d11a7 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -321,12 +321,15 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 	 * Build-time sanity checks on the kernel image and module
 	 * area mappings. (these are purely build-time and produce no code)
 	 */
+#ifndef CONFIG_DYNAMIC_MODULE_BASE
 	BUILD_BUG_ON(MODULES_VADDR < __START_KERNEL_map);
 	BUILD_BUG_ON(MODULES_VADDR - __START_KERNEL_map < KERNEL_IMAGE_SIZE);
-	BUILD_BUG_ON(MODULES_LEN + KERNEL_IMAGE_SIZE > 2*PUD_SIZE);
+	BUILD_BUG_ON(!IS_ENABLED(CONFIG_RANDOMIZE_BASE_LARGE) &&
+		     MODULES_LEN + KERNEL_IMAGE_SIZE > 2*PUD_SIZE);
 	BUILD_BUG_ON((__START_KERNEL_map & ~PMD_MASK) != 0);
 	BUILD_BUG_ON((MODULES_VADDR & ~PMD_MASK) != 0);
 	BUILD_BUG_ON(!(MODULES_VADDR > __START_KERNEL));
+#endif
 	BUILD_BUG_ON(!(((MODULES_END - 1) & PGDIR_MASK) ==
 				(__START_KERNEL & PGDIR_MASK)));
 	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 8691a57da63e..8565b2b45848 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -95,7 +95,7 @@ static struct addr_marker address_markers[] = {
 	{ EFI_VA_END,		"EFI Runtime Services" },
 # endif
 	{ __START_KERNEL_map,   "High Kernel Mapping" },
-	{ MODULES_VADDR,        "Modules" },
+	{ 0/* MODULES_VADDR */,	"Modules" },
 	{ MODULES_END,          "End Modules" },
 #else
 	{ PAGE_OFFSET,          "Kernel Mapping" },
@@ -529,7 +529,7 @@ static int __init pt_dump_init(void)
 # endif
 	address_markers[FIXADDR_START_NR].start_address = FIXADDR_START;
 #endif
-
+	address_markers[MODULES_VADDR_NR].start_address = MODULES_VADDR;
 	return 0;
 }
 __initcall(pt_dump_init);
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 23/27] x86/modules: Adapt module loading for PIE support
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:19   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Adapt module loading to support PIE relocations. Generate a dynamic GOT
entry if a symbol requires one but no entry exists in the kernel GOT.

Position Independent Executable (PIE) support will allow extending the
KASLR randomization range below the -2G memory limit.
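
To sketch the mechanism (simplified, not the patch code itself): the
relocated instruction receives a 32-bit PC-relative offset to a GOT
slot, and the slot holds the symbol's full 64-bit address, so only the
GOT entry, not the symbol itself, has to sit within +/-2G of the module:

#include <stdint.h>

static int apply_gotpcrel(void *loc, uint64_t got_entry, int64_t addend)
{
	int64_t val = (int64_t)(got_entry + addend - (uint64_t)loc);

	if (val != (int32_t)val)	/* GOT slot must stay within +/-2G */
		return -1;		/* overflow */

	*(int32_t *)loc = (int32_t)val;	/* PC-relative offset to the slot */
	return 0;
}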

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/Makefile             |   4 +
 arch/x86/include/asm/module.h |  14 +++
 arch/x86/kernel/module.c      | 204 ++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 217 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 4cb4f0495ddc..42774185a58a 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -143,7 +143,11 @@ else
         KBUILD_CFLAGS += $(cflags-y)
 
         KBUILD_CFLAGS += -mno-red-zone
+ifdef CONFIG_X86_PIE
+        KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/x86/kernel/module.lds
+else
         KBUILD_CFLAGS += -mcmodel=kernel
+endif
 
         # -funit-at-a-time shrinks the kernel .text considerably
         # unfortunately it makes reading oopses harder.
diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
index 9eb7c718aaf8..8e0bd52bbadf 100644
--- a/arch/x86/include/asm/module.h
+++ b/arch/x86/include/asm/module.h
@@ -4,12 +4,23 @@
 #include <asm-generic/module.h>
 #include <asm/orc_types.h>
 
+#ifdef CONFIG_X86_PIE
+struct mod_got_sec {
+	struct elf64_shdr	*got;
+	int			got_num_entries;
+	int			got_max_entries;
+};
+#endif
+
 struct mod_arch_specific {
 #ifdef CONFIG_ORC_UNWINDER
 	unsigned int num_orcs;
 	int *orc_unwind_ip;
 	struct orc_entry *orc_unwind;
 #endif
+#ifdef CONFIG_X86_PIE
+	struct mod_got_sec	core;
+#endif
 };
 
 #ifdef CONFIG_X86_64
@@ -70,4 +81,7 @@ struct mod_arch_specific {
 # define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY
 #endif
 
+
+u64 module_find_got_entry(struct module *mod, u64 addr);
+
 #endif /* _ASM_X86_MODULE_H */
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 62e7d70aadd5..3b9b43a9d63b 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -30,6 +30,7 @@
 #include <linux/gfp.h>
 #include <linux/jump_label.h>
 #include <linux/random.h>
+#include <linux/sort.h>
 
 #include <asm/text-patching.h>
 #include <asm/page.h>
@@ -77,6 +78,195 @@ static unsigned long int get_module_load_offset(void)
 }
 #endif
 
+#ifdef CONFIG_X86_PIE
+static u64 find_got_kernel_entry(Elf64_Sym *sym, const Elf64_Rela *rela)
+{
+	u64 *pos;
+
+	for (pos = (u64*)__start_got; pos < (u64*)__end_got; pos++) {
+		if (*pos == sym->st_value)
+			return (u64)pos + rela->r_addend;
+	}
+
+	return 0;
+}
+
+/* Search the GOT entry for a specific address and a module (optional) */
+u64 module_find_got_entry(struct module *mod, u64 addr)
+{
+	int i;
+	u64 *pos;
+
+	for (pos = (u64*)__start_got; pos < (u64*)__end_got; pos++) {
+		if (*pos == addr)
+			return (u64)pos;
+	}
+
+	if (!mod)
+		return 0;
+
+	pos = (u64*)mod->arch.core.got->sh_addr;
+	for (i = 0; i < mod->arch.core.got_num_entries; i++) {
+		if (pos[i] == addr)
+			return (u64)&pos[i];
+	}
+	return 0;
+}
+
+static u64 module_emit_got_entry(struct module *mod, void *loc,
+				 const Elf64_Rela *rela, Elf64_Sym *sym)
+{
+	struct mod_got_sec *gotsec = &mod->arch.core;
+	u64 *got = (u64*)gotsec->got->sh_addr;
+	int i = gotsec->got_num_entries;
+	u64 ret;
+
+	/* Check if we can use the kernel GOT */
+	ret = find_got_kernel_entry(sym, rela);
+	if (ret)
+		return ret;
+
+	got[i] = sym->st_value;
+
+	/*
+	 * Check if the entry we just created is a duplicate. Given that the
+	 * relocations are sorted, this will be the last entry we allocated.
+	 * (if one exists).
+	 */
+	if (i > 0 && got[i] == got[i - 1]) {
+		ret = (u64)&got[i - 1];
+	} else {
+		gotsec->got_num_entries++;
+		BUG_ON(gotsec->got_num_entries > gotsec->got_max_entries);
+		ret = (u64)&got[i];
+	}
+
+	return ret + rela->r_addend;
+}
+
+#define cmp_3way(a,b)	((a) < (b) ? -1 : (a) > (b))
+
+static int cmp_rela(const void *a, const void *b)
+{
+	const Elf64_Rela *x = a, *y = b;
+	int i;
+
+	/* sort by type, symbol index and addend */
+	i = cmp_3way(ELF64_R_TYPE(x->r_info), ELF64_R_TYPE(y->r_info));
+	if (i == 0)
+		i = cmp_3way(ELF64_R_SYM(x->r_info), ELF64_R_SYM(y->r_info));
+	if (i == 0)
+		i = cmp_3way(x->r_addend, y->r_addend);
+	return i;
+}
+
+static bool duplicate_rel(const Elf64_Rela *rela, int num)
+{
+	/*
+	 * Entries are sorted by type, symbol index and addend. That means
+	 * that, if a duplicate entry exists, it must be in the preceding
+	 * slot.
+	 */
+	return num > 0 && cmp_rela(rela + num, rela + num - 1) == 0;
+}
+
+static unsigned int count_gots(Elf64_Sym *syms, Elf64_Rela *rela, int num)
+{
+	unsigned int ret = 0;
+	Elf64_Sym *s;
+	int i;
+
+	for (i = 0; i < num; i++) {
+		switch (ELF64_R_TYPE(rela[i].r_info)) {
+		case R_X86_64_GOTPCREL:
+			s = syms + ELF64_R_SYM(rela[i].r_info);
+
+			/*
+			 * Use the kernel GOT when possible, else reserve a
+			 * custom one for this module.
+			 */
+			if (!duplicate_rel(rela, i) &&
+			    !find_got_kernel_entry(s, rela + i))
+				ret++;
+			break;
+		}
+	}
+	return ret;
+}
+
+/*
+ * Generate GOT entries for GOTPCREL relocations that do not exist in the
+ * kernel GOT. Based on arm64 module-plts implementation.
+ */
+int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
+			      char *secstrings, struct module *mod)
+{
+	unsigned long gots = 0;
+	Elf_Shdr *symtab = NULL;
+	Elf64_Sym *syms = NULL;
+	char *strings, *name;
+	int i;
+
+	/*
+	 * Find the empty .got section so we can expand it to store the GOT
+	 * entries. Record the symtab address as well.
+	 */
+	for (i = 0; i < ehdr->e_shnum; i++) {
+		if (!strcmp(secstrings + sechdrs[i].sh_name, ".got")) {
+			mod->arch.core.got = sechdrs + i;
+		} else if (sechdrs[i].sh_type == SHT_SYMTAB) {
+			symtab = sechdrs + i;
+			syms = (Elf64_Sym *)symtab->sh_addr;
+		}
+	}
+
+	if (!mod->arch.core.got) {
+		pr_err("%s: module GOT section missing\n", mod->name);
+		return -ENOEXEC;
+	}
+	if (!syms) {
+		pr_err("%s: module symtab section missing\n", mod->name);
+		return -ENOEXEC;
+	}
+
+	for (i = 0; i < ehdr->e_shnum; i++) {
+		Elf64_Rela *rels = (void *)ehdr + sechdrs[i].sh_offset;
+		int numrels = sechdrs[i].sh_size / sizeof(Elf64_Rela);
+
+		if (sechdrs[i].sh_type != SHT_RELA)
+			continue;
+
+		/* sort by type, symbol index and addend */
+		sort(rels, numrels, sizeof(Elf64_Rela), cmp_rela, NULL);
+
+		gots += count_gots(syms, rels, numrels);
+	}
+
+	mod->arch.core.got->sh_type = SHT_NOBITS;
+	mod->arch.core.got->sh_flags = SHF_ALLOC;
+	mod->arch.core.got->sh_addralign = L1_CACHE_BYTES;
+	mod->arch.core.got->sh_size = (gots + 1) * sizeof(u64);
+	mod->arch.core.got_num_entries = 0;
+	mod->arch.core.got_max_entries = gots;
+
+	/*
+	 * If a _GLOBAL_OFFSET_TABLE_ symbol exists, make it absolute for
+	 * modules to correctly reference it. Similar to s390 implementation.
+	 */
+	strings = (void *) ehdr + sechdrs[symtab->sh_link].sh_offset;
+	for (i = 0; i < symtab->sh_size/sizeof(Elf_Sym); i++) {
+		if (syms[i].st_shndx != SHN_UNDEF)
+			continue;
+		name = strings + syms[i].st_name;
+		if (!strcmp(name, "_GLOBAL_OFFSET_TABLE_")) {
+			syms[i].st_shndx = SHN_ABS;
+			break;
+		}
+	}
+	return 0;
+}
+#endif
+
 void *module_alloc(unsigned long size)
 {
 	void *p;
@@ -184,13 +374,18 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 			if ((s64)val != *(s32 *)loc)
 				goto overflow;
 			break;
+#ifdef CONFIG_X86_PIE
+		case R_X86_64_GOTPCREL:
+			val = module_emit_got_entry(me, loc, rel + i, sym);
+			/* fallthrough */
+#endif
+		case R_X86_64_PLT32:
 		case R_X86_64_PC32:
 			val -= (u64)loc;
 			*(u32 *)loc = val;
-#if 0
-			if ((s64)val != *(s32 *)loc)
+			if (IS_ENABLED(CONFIG_X86_PIE) &&
+			    (s64)val != *(s32 *)loc)
 				goto overflow;
-#endif
 			break;
 		default:
 			pr_err("%s: Unknown rela relocation: %llu\n",
@@ -203,8 +398,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 overflow:
 	pr_err("overflow in relocation type %d val %Lx\n",
 	       (int)ELF64_R_TYPE(rel[i].r_info), val);
-	pr_err("`%s' likely not compiled with -mcmodel=kernel\n",
-	       me->name);
+	pr_err("`%s' likely too far from the kernel\n", me->name);
 	return -ENOEXEC;
 }
 #endif
-- 
2.14.2.920.gcf0c67979c-goog


^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 23/27] x86/modules: Adapt module loading for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Adapt module loading to support PIE relocations. Generate dynamic GOT if
a symbol requires it but no entry exist in the kernel GOT.

Position Independent Executable (PIE) support will allow to extended the
KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/Makefile             |   4 +
 arch/x86/include/asm/module.h |  14 +++
 arch/x86/kernel/module.c      | 204 ++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 217 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 4cb4f0495ddc..42774185a58a 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -143,7 +143,11 @@ else
         KBUILD_CFLAGS += $(cflags-y)
 
         KBUILD_CFLAGS += -mno-red-zone
+ifdef CONFIG_X86_PIE
+        KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/x86/kernel/module.lds
+else
         KBUILD_CFLAGS += -mcmodel=kernel
+endif
 
         # -funit-at-a-time shrinks the kernel .text considerably
         # unfortunately it makes reading oopses harder.
diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
index 9eb7c718aaf8..8e0bd52bbadf 100644
--- a/arch/x86/include/asm/module.h
+++ b/arch/x86/include/asm/module.h
@@ -4,12 +4,23 @@
 #include <asm-generic/module.h>
 #include <asm/orc_types.h>
 
+#ifdef CONFIG_X86_PIE
+struct mod_got_sec {
+	struct elf64_shdr	*got;
+	int			got_num_entries;
+	int			got_max_entries;
+};
+#endif
+
 struct mod_arch_specific {
 #ifdef CONFIG_ORC_UNWINDER
 	unsigned int num_orcs;
 	int *orc_unwind_ip;
 	struct orc_entry *orc_unwind;
 #endif
+#ifdef CONFIG_X86_PIE
+	struct mod_got_sec	core;
+#endif
 };
 
 #ifdef CONFIG_X86_64
@@ -70,4 +81,7 @@ struct mod_arch_specific {
 # define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY
 #endif
 
+
+u64 module_find_got_entry(struct module *mod, u64 addr);
+
 #endif /* _ASM_X86_MODULE_H */
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 62e7d70aadd5..3b9b43a9d63b 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -30,6 +30,7 @@
 #include <linux/gfp.h>
 #include <linux/jump_label.h>
 #include <linux/random.h>
+#include <linux/sort.h>
 
 #include <asm/text-patching.h>
 #include <asm/page.h>
@@ -77,6 +78,195 @@ static unsigned long int get_module_load_offset(void)
 }
 #endif
 
+#ifdef CONFIG_X86_PIE
+static u64 find_got_kernel_entry(Elf64_Sym *sym, const Elf64_Rela *rela)
+{
+	u64 *pos;
+
+	for (pos = (u64*)__start_got; pos < (u64*)__end_got; pos++) {
+		if (*pos == sym->st_value)
+			return (u64)pos + rela->r_addend;
+	}
+
+	return 0;
+}
+
+/* Search the GOT entry for a specific address and a module (optional) */
+u64 module_find_got_entry(struct module *mod, u64 addr)
+{
+	int i;
+	u64 *pos;
+
+	for (pos = (u64*)__start_got; pos < (u64*)__end_got; pos++) {
+		if (*pos == addr)
+			return (u64)pos;
+	}
+
+	if (!mod)
+		return 0;
+
+	pos = (u64*)mod->arch.core.got->sh_addr;
+	for (i = 0; i < mod->arch.core.got_num_entries; i++) {
+		if (pos[i] == addr)
+			return (u64)&pos[i];
+	}
+	return 0;
+}
+
+static u64 module_emit_got_entry(struct module *mod, void *loc,
+				 const Elf64_Rela *rela, Elf64_Sym *sym)
+{
+	struct mod_got_sec *gotsec = &mod->arch.core;
+	u64 *got = (u64*)gotsec->got->sh_addr;
+	int i = gotsec->got_num_entries;
+	u64 ret;
+
+	/* Check if we can use the kernel GOT */
+	ret = find_got_kernel_entry(sym, rela);
+	if (ret)
+		return ret;
+
+	got[i] = sym->st_value;
+
+	/*
+	 * Check if the entry we just created is a duplicate. Given that the
+	 * relocations are sorted, this will be the last entry we allocated.
+	 * (if one exists).
+	 */
+	if (i > 0 && got[i] == got[i - 2]) {
+		ret = (u64)&got[i - 1];
+	} else {
+		gotsec->got_num_entries++;
+		BUG_ON(gotsec->got_num_entries > gotsec->got_max_entries);
+		ret = (u64)&got[i];
+	}
+
+	return ret + rela->r_addend;
+}
+
+#define cmp_3way(a,b)	((a) < (b) ? -1 : (a) > (b))
+
+static int cmp_rela(const void *a, const void *b)
+{
+	const Elf64_Rela *x = a, *y = b;
+	int i;
+
+	/* sort by type, symbol index and addend */
+	i = cmp_3way(ELF64_R_TYPE(x->r_info), ELF64_R_TYPE(y->r_info));
+	if (i == 0)
+		i = cmp_3way(ELF64_R_SYM(x->r_info), ELF64_R_SYM(y->r_info));
+	if (i == 0)
+		i = cmp_3way(x->r_addend, y->r_addend);
+	return i;
+}
+
+static bool duplicate_rel(const Elf64_Rela *rela, int num)
+{
+	/*
+	 * Entries are sorted by type, symbol index and addend. That means
+	 * that, if a duplicate entry exists, it must be in the preceding
+	 * slot.
+	 */
+	return num > 0 && cmp_rela(rela + num, rela + num - 1) == 0;
+}
+
+static unsigned int count_gots(Elf64_Sym *syms, Elf64_Rela *rela, int num)
+{
+	unsigned int ret = 0;
+	Elf64_Sym *s;
+	int i;
+
+	for (i = 0; i < num; i++) {
+		switch (ELF64_R_TYPE(rela[i].r_info)) {
+		case R_X86_64_GOTPCREL:
+			s = syms + ELF64_R_SYM(rela[i].r_info);
+
+			/*
+			 * Use the kernel GOT when possible, else reserve a
+			 * custom one for this module.
+			 */
+			if (!duplicate_rel(rela, i) &&
+			    !find_got_kernel_entry(s, rela + i))
+				ret++;
+			break;
+		}
+	}
+	return ret;
+}
+
+/*
+ * Generate GOT entries for GOTPCREL relocations that do not exist in the
+ * kernel GOT. Based on arm64 module-plts implementation.
+ */
+int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
+			      char *secstrings, struct module *mod)
+{
+	unsigned long gots = 0;
+	Elf_Shdr *symtab = NULL;
+	Elf64_Sym *syms = NULL;
+	char *strings, *name;
+	int i;
+
+	/*
+	 * Find the empty .got section so we can expand it to store the PLT
+	 * entries. Record the symtab address as well.
+	 */
+	for (i = 0; i < ehdr->e_shnum; i++) {
+		if (!strcmp(secstrings + sechdrs[i].sh_name, ".got")) {
+			mod->arch.core.got = sechdrs + i;
+		} else if (sechdrs[i].sh_type == SHT_SYMTAB) {
+			symtab = sechdrs + i;
+			syms = (Elf64_Sym *)symtab->sh_addr;
+		}
+	}
+
+	if (!mod->arch.core.got) {
+		pr_err("%s: module GOT section missing\n", mod->name);
+		return -ENOEXEC;
+	}
+	if (!syms) {
+		pr_err("%s: module symtab section missing\n", mod->name);
+		return -ENOEXEC;
+	}
+
+	for (i = 0; i < ehdr->e_shnum; i++) {
+		Elf64_Rela *rels = (void *)ehdr + sechdrs[i].sh_offset;
+		int numrels = sechdrs[i].sh_size / sizeof(Elf64_Rela);
+
+		if (sechdrs[i].sh_type != SHT_RELA)
+			continue;
+
+		/* sort by type, symbol index and addend */
+		sort(rels, numrels, sizeof(Elf64_Rela), cmp_rela, NULL);
+
+		gots += count_gots(syms, rels, numrels);
+	}
+
+	mod->arch.core.got->sh_type = SHT_NOBITS;
+	mod->arch.core.got->sh_flags = SHF_ALLOC;
+	mod->arch.core.got->sh_addralign = L1_CACHE_BYTES;
+	mod->arch.core.got->sh_size = (gots + 1) * sizeof(u64);
+	mod->arch.core.got_num_entries = 0;
+	mod->arch.core.got_max_entries = gots;
+
+	/*
+	 * If a _GLOBAL_OFFSET_TABLE_ symbol exists, make it absolute for
+	 * modules to correctly reference it. Similar to s390 implementation.
+	 */
+	strings = (void *) ehdr + sechdrs[symtab->sh_link].sh_offset;
+	for (i = 0; i < symtab->sh_size/sizeof(Elf_Sym); i++) {
+		if (syms[i].st_shndx != SHN_UNDEF)
+			continue;
+		name = strings + syms[i].st_name;
+		if (!strcmp(name, "_GLOBAL_OFFSET_TABLE_")) {
+			syms[i].st_shndx = SHN_ABS;
+			break;
+		}
+	}
+	return 0;
+}
+#endif
+
 void *module_alloc(unsigned long size)
 {
 	void *p;
@@ -184,13 +374,18 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 			if ((s64)val != *(s32 *)loc)
 				goto overflow;
 			break;
+#ifdef CONFIG_X86_PIE
+		case R_X86_64_GOTPCREL:
+			val = module_emit_got_entry(me, loc, rel + i, sym);
+			/* fallthrough */
+#endif
+		case R_X86_64_PLT32:
 		case R_X86_64_PC32:
 			val -= (u64)loc;
 			*(u32 *)loc = val;
-#if 0
-			if ((s64)val != *(s32 *)loc)
+			if (IS_ENABLED(CONFIG_X86_PIE) &&
+			    (s64)val != *(s32 *)loc)
 				goto overflow;
-#endif
 			break;
 		default:
 			pr_err("%s: Unknown rela relocation: %llu\n",
@@ -203,8 +398,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 overflow:
 	pr_err("overflow in relocation type %d val %Lx\n",
 	       (int)ELF64_R_TYPE(rel[i].r_info), val);
-	pr_err("`%s' likely not compiled with -mcmodel=kernel\n",
-	       me->name);
+	pr_err("`%s' likely too far from the kernel\n", me->name);
 	return -ENOEXEC;
 }
 #endif
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 23/27] x86/modules: Adapt module loading for PIE support
  2017-10-04 21:19 ` Thomas Garnier
                   ` (46 preceding siblings ...)
  (?)
@ 2017-10-04 21:19 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Adapt module loading to support PIE relocations. Generate a dynamic GOT for
the module if a symbol requires it but no entry exists in the kernel GOT.

Position Independent Executable (PIE) support will allow extending the KASLR
randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
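A minimal userspace sketch of the arithmetic done for the new
R_X86_64_GOTPCREL case below, assuming G is the address of the GOT slot
holding the symbol value, A the relocation addend and P the patched
location; patch_gotpcrel() and the values in main() are invented for
illustration and are not part of the patch:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Patch a 32-bit field at loc with the PC-relative offset to a GOT slot. */
static int patch_gotpcrel(void *loc, uint64_t got_slot, int64_t addend)
{
	uint64_t p = (uint64_t)(uintptr_t)loc;
	int64_t val = (int64_t)(got_slot + (uint64_t)addend - p);
	int32_t val32 = (int32_t)val;

	/*
	 * Same overflow check idea as apply_relocate_add(): the distance
	 * must still fit in a signed 32-bit displacement.
	 */
	if ((int64_t)val32 != val)
		return -1;
	memcpy(loc, &val32, sizeof(val32));
	return 0;
}

int main(void)
{
	unsigned char field[4];	/* stand-in for the relocated instruction field */
	uint64_t fake_got_slot = (uint64_t)(uintptr_t)field + 0x1000;
	int32_t patched;

	if (patch_gotpcrel(field, fake_got_slot, -4)) {
		fprintf(stderr, "relocation overflow\n");
		return 1;
	}
	memcpy(&patched, field, sizeof(patched));
	printf("patched offset: %d\n", patched);	/* 0x1000 - 4 */
	return 0;
}
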
 arch/x86/Makefile             |   4 +
 arch/x86/include/asm/module.h |  14 +++
 arch/x86/kernel/module.c      | 204 ++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 217 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 4cb4f0495ddc..42774185a58a 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -143,7 +143,11 @@ else
         KBUILD_CFLAGS += $(cflags-y)
 
         KBUILD_CFLAGS += -mno-red-zone
+ifdef CONFIG_X86_PIE
+        KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/x86/kernel/module.lds
+else
         KBUILD_CFLAGS += -mcmodel=kernel
+endif
 
         # -funit-at-a-time shrinks the kernel .text considerably
         # unfortunately it makes reading oopses harder.
diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
index 9eb7c718aaf8..8e0bd52bbadf 100644
--- a/arch/x86/include/asm/module.h
+++ b/arch/x86/include/asm/module.h
@@ -4,12 +4,23 @@
 #include <asm-generic/module.h>
 #include <asm/orc_types.h>
 
+#ifdef CONFIG_X86_PIE
+struct mod_got_sec {
+	struct elf64_shdr	*got;
+	int			got_num_entries;
+	int			got_max_entries;
+};
+#endif
+
 struct mod_arch_specific {
 #ifdef CONFIG_ORC_UNWINDER
 	unsigned int num_orcs;
 	int *orc_unwind_ip;
 	struct orc_entry *orc_unwind;
 #endif
+#ifdef CONFIG_X86_PIE
+	struct mod_got_sec	core;
+#endif
 };
 
 #ifdef CONFIG_X86_64
@@ -70,4 +81,7 @@ struct mod_arch_specific {
 # define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY
 #endif
 
+
+u64 module_find_got_entry(struct module *mod, u64 addr);
+
 #endif /* _ASM_X86_MODULE_H */
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 62e7d70aadd5..3b9b43a9d63b 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -30,6 +30,7 @@
 #include <linux/gfp.h>
 #include <linux/jump_label.h>
 #include <linux/random.h>
+#include <linux/sort.h>
 
 #include <asm/text-patching.h>
 #include <asm/page.h>
@@ -77,6 +78,195 @@ static unsigned long int get_module_load_offset(void)
 }
 #endif
 
+#ifdef CONFIG_X86_PIE
+static u64 find_got_kernel_entry(Elf64_Sym *sym, const Elf64_Rela *rela)
+{
+	u64 *pos;
+
+	for (pos = (u64*)__start_got; pos < (u64*)__end_got; pos++) {
+		if (*pos == sym->st_value)
+			return (u64)pos + rela->r_addend;
+	}
+
+	return 0;
+}
+
+/* Search the kernel GOT and optionally a module's GOT for a given address */
+u64 module_find_got_entry(struct module *mod, u64 addr)
+{
+	int i;
+	u64 *pos;
+
+	for (pos = (u64*)__start_got; pos < (u64*)__end_got; pos++) {
+		if (*pos == addr)
+			return (u64)pos;
+	}
+
+	if (!mod)
+		return 0;
+
+	pos = (u64*)mod->arch.core.got->sh_addr;
+	for (i = 0; i < mod->arch.core.got_num_entries; i++) {
+		if (pos[i] == addr)
+			return (u64)&pos[i];
+	}
+	return 0;
+}
+
+static u64 module_emit_got_entry(struct module *mod, void *loc,
+				 const Elf64_Rela *rela, Elf64_Sym *sym)
+{
+	struct mod_got_sec *gotsec = &mod->arch.core;
+	u64 *got = (u64*)gotsec->got->sh_addr;
+	int i = gotsec->got_num_entries;
+	u64 ret;
+
+	/* Check if we can use the kernel GOT */
+	ret = find_got_kernel_entry(sym, rela);
+	if (ret)
+		return ret;
+
+	got[i] = sym->st_value;
+
+	/*
+	 * Check if the entry we just created is a duplicate. Given that the
+	 * relocations are sorted, it can only match the last entry we
+	 * allocated (if one exists).
+	 */
+	if (i > 0 && got[i] == got[i - 1]) {
+		ret = (u64)&got[i - 1];
+	} else {
+		gotsec->got_num_entries++;
+		BUG_ON(gotsec->got_num_entries > gotsec->got_max_entries);
+		ret = (u64)&got[i];
+	}
+
+	return ret + rela->r_addend;
+}
+
+#define cmp_3way(a,b)	((a) < (b) ? -1 : (a) > (b))
+
+static int cmp_rela(const void *a, const void *b)
+{
+	const Elf64_Rela *x = a, *y = b;
+	int i;
+
+	/* sort by type, symbol index and addend */
+	i = cmp_3way(ELF64_R_TYPE(x->r_info), ELF64_R_TYPE(y->r_info));
+	if (i == 0)
+		i = cmp_3way(ELF64_R_SYM(x->r_info), ELF64_R_SYM(y->r_info));
+	if (i == 0)
+		i = cmp_3way(x->r_addend, y->r_addend);
+	return i;
+}
+
+static bool duplicate_rel(const Elf64_Rela *rela, int num)
+{
+	/*
+	 * Entries are sorted by type, symbol index and addend. That means
+	 * that, if a duplicate entry exists, it must be in the preceding
+	 * slot.
+	 */
+	return num > 0 && cmp_rela(rela + num, rela + num - 1) == 0;
+}
+
+static unsigned int count_gots(Elf64_Sym *syms, Elf64_Rela *rela, int num)
+{
+	unsigned int ret = 0;
+	Elf64_Sym *s;
+	int i;
+
+	for (i = 0; i < num; i++) {
+		switch (ELF64_R_TYPE(rela[i].r_info)) {
+		case R_X86_64_GOTPCREL:
+			s = syms + ELF64_R_SYM(rela[i].r_info);
+
+			/*
+			 * Use the kernel GOT when possible, else reserve a
+			 * custom one for this module.
+			 */
+			if (!duplicate_rel(rela, i) &&
+			    !find_got_kernel_entry(s, rela + i))
+				ret++;
+			break;
+		}
+	}
+	return ret;
+}
+
+/*
+ * Generate GOT entries for GOTPCREL relocations that do not exist in the
+ * kernel GOT. Based on arm64 module-plts implementation.
+ */
+int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
+			      char *secstrings, struct module *mod)
+{
+	unsigned long gots = 0;
+	Elf_Shdr *symtab = NULL;
+	Elf64_Sym *syms = NULL;
+	char *strings, *name;
+	int i;
+
+	/*
+	 * Find the empty .got section so we can expand it to store the PLT
+	 * entries. Record the symtab address as well.
+	 */
+	for (i = 0; i < ehdr->e_shnum; i++) {
+		if (!strcmp(secstrings + sechdrs[i].sh_name, ".got")) {
+			mod->arch.core.got = sechdrs + i;
+		} else if (sechdrs[i].sh_type == SHT_SYMTAB) {
+			symtab = sechdrs + i;
+			syms = (Elf64_Sym *)symtab->sh_addr;
+		}
+	}
+
+	if (!mod->arch.core.got) {
+		pr_err("%s: module GOT section missing\n", mod->name);
+		return -ENOEXEC;
+	}
+	if (!syms) {
+		pr_err("%s: module symtab section missing\n", mod->name);
+		return -ENOEXEC;
+	}
+
+	for (i = 0; i < ehdr->e_shnum; i++) {
+		Elf64_Rela *rels = (void *)ehdr + sechdrs[i].sh_offset;
+		int numrels = sechdrs[i].sh_size / sizeof(Elf64_Rela);
+
+		if (sechdrs[i].sh_type != SHT_RELA)
+			continue;
+
+		/* sort by type, symbol index and addend */
+		sort(rels, numrels, sizeof(Elf64_Rela), cmp_rela, NULL);
+
+		gots += count_gots(syms, rels, numrels);
+	}
+
+	mod->arch.core.got->sh_type = SHT_NOBITS;
+	mod->arch.core.got->sh_flags = SHF_ALLOC;
+	mod->arch.core.got->sh_addralign = L1_CACHE_BYTES;
+	mod->arch.core.got->sh_size = (gots + 1) * sizeof(u64);
+	mod->arch.core.got_num_entries = 0;
+	mod->arch.core.got_max_entries = gots;
+
+	/*
+	 * If a _GLOBAL_OFFSET_TABLE_ symbol exists, make it absolute for
+	 * modules to correctly reference it. Similar to s390 implementation.
+	 */
+	strings = (void *) ehdr + sechdrs[symtab->sh_link].sh_offset;
+	for (i = 0; i < symtab->sh_size/sizeof(Elf_Sym); i++) {
+		if (syms[i].st_shndx != SHN_UNDEF)
+			continue;
+		name = strings + syms[i].st_name;
+		if (!strcmp(name, "_GLOBAL_OFFSET_TABLE_")) {
+			syms[i].st_shndx = SHN_ABS;
+			break;
+		}
+	}
+	return 0;
+}
+#endif
+
 void *module_alloc(unsigned long size)
 {
 	void *p;
@@ -184,13 +374,18 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 			if ((s64)val != *(s32 *)loc)
 				goto overflow;
 			break;
+#ifdef CONFIG_X86_PIE
+		case R_X86_64_GOTPCREL:
+			val = module_emit_got_entry(me, loc, rel + i, sym);
+			/* fallthrough */
+#endif
+		case R_X86_64_PLT32:
 		case R_X86_64_PC32:
 			val -= (u64)loc;
 			*(u32 *)loc = val;
-#if 0
-			if ((s64)val != *(s32 *)loc)
+			if (IS_ENABLED(CONFIG_X86_PIE) &&
+			    (s64)val != *(s32 *)loc)
 				goto overflow;
-#endif
 			break;
 		default:
 			pr_err("%s: Unknown rela relocation: %llu\n",
@@ -203,8 +398,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 overflow:
 	pr_err("overflow in relocation type %d val %Lx\n",
 	       (int)ELF64_R_TYPE(rel[i].r_info), val);
-	pr_err("`%s' likely not compiled with -mcmodel=kernel\n",
-	       me->name);
+	pr_err("`%s' likely too far from the kernel\n", me->name);
 	return -ENOEXEC;
 }
 #endif
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 23/27] x86/modules: Adapt module loading for PIE support
@ 2017-10-04 21:19   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Adapt module loading to support PIE relocations. Generate a dynamic GOT for
the module if a symbol requires it but no entry exists in the kernel GOT.

Position Independent Executable (PIE) support will allow extending the KASLR
randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
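The GOT sizing below relies on sorting the RELA entries and treating equal
neighbours as duplicates. A standalone userspace sketch of that counting
step, reusing the patch's three-way comparison but leaving out the kernel
GOT lookup; count_unique_gotpcrel() and the sample entries in main() are
invented for illustration:

#include <elf.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define cmp_3way(a, b)	((a) < (b) ? -1 : (a) > (b))

/* Order relocations by type, symbol index and addend, as in the patch. */
static int cmp_rela(const void *a, const void *b)
{
	const Elf64_Rela *x = a, *y = b;
	int i;

	i = cmp_3way(ELF64_R_TYPE(x->r_info), ELF64_R_TYPE(y->r_info));
	if (i == 0)
		i = cmp_3way(ELF64_R_SYM(x->r_info), ELF64_R_SYM(y->r_info));
	if (i == 0)
		i = cmp_3way(x->r_addend, y->r_addend);
	return i;
}

/* Count GOTPCREL relocations, skipping duplicates of the previous entry. */
static unsigned int count_unique_gotpcrel(Elf64_Rela *rela, int num)
{
	unsigned int count = 0;
	int i;

	qsort(rela, num, sizeof(*rela), cmp_rela);
	for (i = 0; i < num; i++) {
		bool dup = i > 0 && cmp_rela(&rela[i], &rela[i - 1]) == 0;

		if (ELF64_R_TYPE(rela[i].r_info) == R_X86_64_GOTPCREL && !dup)
			count++;
	}
	return count;
}

int main(void)
{
	/* Two identical GOTPCREL relocations need only one GOT slot. */
	Elf64_Rela demo[2] = {
		{ .r_info = ELF64_R_INFO(1, R_X86_64_GOTPCREL), .r_addend = 0 },
		{ .r_info = ELF64_R_INFO(1, R_X86_64_GOTPCREL), .r_addend = 0 },
	};

	printf("unique GOT slots needed: %u\n", count_unique_gotpcrel(demo, 2));
	return 0;
}
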
 arch/x86/Makefile             |   4 +
 arch/x86/include/asm/module.h |  14 +++
 arch/x86/kernel/module.c      | 204 ++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 217 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 4cb4f0495ddc..42774185a58a 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -143,7 +143,11 @@ else
         KBUILD_CFLAGS += $(cflags-y)
 
         KBUILD_CFLAGS += -mno-red-zone
+ifdef CONFIG_X86_PIE
+        KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/x86/kernel/module.lds
+else
         KBUILD_CFLAGS += -mcmodel=kernel
+endif
 
         # -funit-at-a-time shrinks the kernel .text considerably
         # unfortunately it makes reading oopses harder.
diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
index 9eb7c718aaf8..8e0bd52bbadf 100644
--- a/arch/x86/include/asm/module.h
+++ b/arch/x86/include/asm/module.h
@@ -4,12 +4,23 @@
 #include <asm-generic/module.h>
 #include <asm/orc_types.h>
 
+#ifdef CONFIG_X86_PIE
+struct mod_got_sec {
+	struct elf64_shdr	*got;
+	int			got_num_entries;
+	int			got_max_entries;
+};
+#endif
+
 struct mod_arch_specific {
 #ifdef CONFIG_ORC_UNWINDER
 	unsigned int num_orcs;
 	int *orc_unwind_ip;
 	struct orc_entry *orc_unwind;
 #endif
+#ifdef CONFIG_X86_PIE
+	struct mod_got_sec	core;
+#endif
 };
 
 #ifdef CONFIG_X86_64
@@ -70,4 +81,7 @@ struct mod_arch_specific {
 # define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY
 #endif
 
+
+u64 module_find_got_entry(struct module *mod, u64 addr);
+
 #endif /* _ASM_X86_MODULE_H */
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 62e7d70aadd5..3b9b43a9d63b 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -30,6 +30,7 @@
 #include <linux/gfp.h>
 #include <linux/jump_label.h>
 #include <linux/random.h>
+#include <linux/sort.h>
 
 #include <asm/text-patching.h>
 #include <asm/page.h>
@@ -77,6 +78,195 @@ static unsigned long int get_module_load_offset(void)
 }
 #endif
 
+#ifdef CONFIG_X86_PIE
+static u64 find_got_kernel_entry(Elf64_Sym *sym, const Elf64_Rela *rela)
+{
+	u64 *pos;
+
+	for (pos = (u64*)__start_got; pos < (u64*)__end_got; pos++) {
+		if (*pos == sym->st_value)
+			return (u64)pos + rela->r_addend;
+	}
+
+	return 0;
+}
+
+/* Search the kernel GOT and optionally a module's GOT for a given address */
+u64 module_find_got_entry(struct module *mod, u64 addr)
+{
+	int i;
+	u64 *pos;
+
+	for (pos = (u64*)__start_got; pos < (u64*)__end_got; pos++) {
+		if (*pos == addr)
+			return (u64)pos;
+	}
+
+	if (!mod)
+		return 0;
+
+	pos = (u64*)mod->arch.core.got->sh_addr;
+	for (i = 0; i < mod->arch.core.got_num_entries; i++) {
+		if (pos[i] == addr)
+			return (u64)&pos[i];
+	}
+	return 0;
+}
+
+static u64 module_emit_got_entry(struct module *mod, void *loc,
+				 const Elf64_Rela *rela, Elf64_Sym *sym)
+{
+	struct mod_got_sec *gotsec = &mod->arch.core;
+	u64 *got = (u64*)gotsec->got->sh_addr;
+	int i = gotsec->got_num_entries;
+	u64 ret;
+
+	/* Check if we can use the kernel GOT */
+	ret = find_got_kernel_entry(sym, rela);
+	if (ret)
+		return ret;
+
+	got[i] = sym->st_value;
+
+	/*
+	 * Check if the entry we just created is a duplicate. Given that the
+	 * relocations are sorted, it can only match the last entry we
+	 * allocated (if one exists).
+	 */
+	if (i > 0 && got[i] == got[i - 1]) {
+		ret = (u64)&got[i - 1];
+	} else {
+		gotsec->got_num_entries++;
+		BUG_ON(gotsec->got_num_entries > gotsec->got_max_entries);
+		ret = (u64)&got[i];
+	}
+
+	return ret + rela->r_addend;
+}
+
+#define cmp_3way(a,b)	((a) < (b) ? -1 : (a) > (b))
+
+static int cmp_rela(const void *a, const void *b)
+{
+	const Elf64_Rela *x = a, *y = b;
+	int i;
+
+	/* sort by type, symbol index and addend */
+	i = cmp_3way(ELF64_R_TYPE(x->r_info), ELF64_R_TYPE(y->r_info));
+	if (i == 0)
+		i = cmp_3way(ELF64_R_SYM(x->r_info), ELF64_R_SYM(y->r_info));
+	if (i == 0)
+		i = cmp_3way(x->r_addend, y->r_addend);
+	return i;
+}
+
+static bool duplicate_rel(const Elf64_Rela *rela, int num)
+{
+	/*
+	 * Entries are sorted by type, symbol index and addend. That means
+	 * that, if a duplicate entry exists, it must be in the preceding
+	 * slot.
+	 */
+	return num > 0 && cmp_rela(rela + num, rela + num - 1) == 0;
+}
+
+static unsigned int count_gots(Elf64_Sym *syms, Elf64_Rela *rela, int num)
+{
+	unsigned int ret = 0;
+	Elf64_Sym *s;
+	int i;
+
+	for (i = 0; i < num; i++) {
+		switch (ELF64_R_TYPE(rela[i].r_info)) {
+		case R_X86_64_GOTPCREL:
+			s = syms + ELF64_R_SYM(rela[i].r_info);
+
+			/*
+			 * Use the kernel GOT when possible, else reserve a
+			 * custom one for this module.
+			 */
+			if (!duplicate_rel(rela, i) &&
+			    !find_got_kernel_entry(s, rela + i))
+				ret++;
+			break;
+		}
+	}
+	return ret;
+}
+
+/*
+ * Generate GOT entries for GOTPCREL relocations that do not exist in the
+ * kernel GOT. Based on arm64 module-plts implementation.
+ */
+int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
+			      char *secstrings, struct module *mod)
+{
+	unsigned long gots = 0;
+	Elf_Shdr *symtab = NULL;
+	Elf64_Sym *syms = NULL;
+	char *strings, *name;
+	int i;
+
+	/*
+	 * Find the empty .got section so we can expand it to store the PLT
+	 * entries. Record the symtab address as well.
+	 */
+	for (i = 0; i < ehdr->e_shnum; i++) {
+		if (!strcmp(secstrings + sechdrs[i].sh_name, ".got")) {
+			mod->arch.core.got = sechdrs + i;
+		} else if (sechdrs[i].sh_type == SHT_SYMTAB) {
+			symtab = sechdrs + i;
+			syms = (Elf64_Sym *)symtab->sh_addr;
+		}
+	}
+
+	if (!mod->arch.core.got) {
+		pr_err("%s: module GOT section missing\n", mod->name);
+		return -ENOEXEC;
+	}
+	if (!syms) {
+		pr_err("%s: module symtab section missing\n", mod->name);
+		return -ENOEXEC;
+	}
+
+	for (i = 0; i < ehdr->e_shnum; i++) {
+		Elf64_Rela *rels = (void *)ehdr + sechdrs[i].sh_offset;
+		int numrels = sechdrs[i].sh_size / sizeof(Elf64_Rela);
+
+		if (sechdrs[i].sh_type != SHT_RELA)
+			continue;
+
+		/* sort by type, symbol index and addend */
+		sort(rels, numrels, sizeof(Elf64_Rela), cmp_rela, NULL);
+
+		gots += count_gots(syms, rels, numrels);
+	}
+
+	mod->arch.core.got->sh_type = SHT_NOBITS;
+	mod->arch.core.got->sh_flags = SHF_ALLOC;
+	mod->arch.core.got->sh_addralign = L1_CACHE_BYTES;
+	mod->arch.core.got->sh_size = (gots + 1) * sizeof(u64);
+	mod->arch.core.got_num_entries = 0;
+	mod->arch.core.got_max_entries = gots;
+
+	/*
+	 * If a _GLOBAL_OFFSET_TABLE_ symbol exists, make it absolute for
+	 * modules to correctly reference it. Similar to s390 implementation.
+	 */
+	strings = (void *) ehdr + sechdrs[symtab->sh_link].sh_offset;
+	for (i = 0; i < symtab->sh_size/sizeof(Elf_Sym); i++) {
+		if (syms[i].st_shndx != SHN_UNDEF)
+			continue;
+		name = strings + syms[i].st_name;
+		if (!strcmp(name, "_GLOBAL_OFFSET_TABLE_")) {
+			syms[i].st_shndx = SHN_ABS;
+			break;
+		}
+	}
+	return 0;
+}
+#endif
+
 void *module_alloc(unsigned long size)
 {
 	void *p;
@@ -184,13 +374,18 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 			if ((s64)val != *(s32 *)loc)
 				goto overflow;
 			break;
+#ifdef CONFIG_X86_PIE
+		case R_X86_64_GOTPCREL:
+			val = module_emit_got_entry(me, loc, rel + i, sym);
+			/* fallthrough */
+#endif
+		case R_X86_64_PLT32:
 		case R_X86_64_PC32:
 			val -= (u64)loc;
 			*(u32 *)loc = val;
-#if 0
-			if ((s64)val != *(s32 *)loc)
+			if (IS_ENABLED(CONFIG_X86_PIE) &&
+			    (s64)val != *(s32 *)loc)
 				goto overflow;
-#endif
 			break;
 		default:
 			pr_err("%s: Unknown rela relocation: %llu\n",
@@ -203,8 +398,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 overflow:
 	pr_err("overflow in relocation type %d val %Lx\n",
 	       (int)ELF64_R_TYPE(rel[i].r_info), val);
-	pr_err("`%s' likely not compiled with -mcmodel=kernel\n",
-	       me->name);
+	pr_err("`%s' likely too far from the kernel\n", me->name);
 	return -ENOEXEC;
 }
 #endif
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 24/27] x86/mm: Make the x86 GOT read-only
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:20   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

The GOT is only changed during early boot, when relocations are applied, so
it can be made read-only directly. This table exists only for a PIE binary.

Position Independent Executable (PIE) support will allow extending the KASLR
randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
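The __start_got/__end_got symbols defined below are linker-provided boundary
markers, and C code walks the slots between them like an array (as
find_got_kernel_entry() does in the previous patch). A userspace analogue
relying on the __start_SECNAME/__stop_SECNAME symbols that GNU ld generates
for sections whose names are valid C identifiers; the demo_got section and
its values are invented for this sketch:

#include <stdint.h>
#include <stdio.h>

/* Two values placed in a made-up section; GNU ld provides the bounds. */
static const uint64_t entry_a __attribute__((section("demo_got"), used)) = 0x1111;
static const uint64_t entry_b __attribute__((section("demo_got"), used)) = 0x2222;

extern const uint64_t __start_demo_got[], __stop_demo_got[];

int main(void)
{
	const uint64_t *pos;

	/* Walk the slots between the boundary symbols, like the kernel GOT. */
	for (pos = __start_demo_got; pos < __stop_demo_got; pos++)
		printf("slot %p holds %#llx\n", (const void *)pos,
		       (unsigned long long)*pos);
	return 0;
}
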
 include/asm-generic/vmlinux.lds.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e549bff87c5b..a2301c292e26 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -279,6 +279,17 @@
 	VMLINUX_SYMBOL(__end_ro_after_init) = .;
 #endif
 
+#ifdef CONFIG_X86_PIE
+#define RO_GOT_X86							\
+	.got        : AT(ADDR(.got) - LOAD_OFFSET) {			\
+		VMLINUX_SYMBOL(__start_got) = .;			\
+		*(.got);						\
+		VMLINUX_SYMBOL(__end_got) = .;				\
+	}
+#else
+#define RO_GOT_X86
+#endif
+
 /*
  * Read only Data
  */
@@ -335,6 +346,7 @@
 		VMLINUX_SYMBOL(__end_builtin_fw) = .;			\
 	}								\
 									\
+	RO_GOT_X86							\
 	TRACEDATA							\
 									\
 	/* Kernel symbol table: Normal symbols */			\
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 24/27] x86/mm: Make the x86 GOT read-only
@ 2017-10-04 21:20   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

The GOT is only changed during early boot, when relocations are applied, so
it can be made read-only directly. This table exists only for a PIE binary.

Position Independent Executable (PIE) support will allow extending the KASLR
randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 include/asm-generic/vmlinux.lds.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e549bff87c5b..a2301c292e26 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -279,6 +279,17 @@
 	VMLINUX_SYMBOL(__end_ro_after_init) = .;
 #endif
 
+#ifdef CONFIG_X86_PIE
+#define RO_GOT_X86							\
+	.got        : AT(ADDR(.got) - LOAD_OFFSET) {			\
+		VMLINUX_SYMBOL(__start_got) = .;			\
+		*(.got);						\
+		VMLINUX_SYMBOL(__end_got) = .;				\
+	}
+#else
+#define RO_GOT_X86
+#endif
+
 /*
  * Read only Data
  */
@@ -335,6 +346,7 @@
 		VMLINUX_SYMBOL(__end_builtin_fw) = .;			\
 	}								\
 									\
+	RO_GOT_X86							\
 	TRACEDATA							\
 									\
 	/* Kernel symbol table: Normal symbols */			\
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 24/27] x86/mm: Make the x86 GOT read-only
  2017-10-04 21:19 ` Thomas Garnier
                   ` (47 preceding siblings ...)
  (?)
@ 2017-10-04 21:20 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

The GOT is only changed during early boot, when relocations are applied, so
it can be made read-only directly. This table exists only for a PIE binary.

Position Independent Executable (PIE) support will allow extending the KASLR
randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 include/asm-generic/vmlinux.lds.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e549bff87c5b..a2301c292e26 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -279,6 +279,17 @@
 	VMLINUX_SYMBOL(__end_ro_after_init) = .;
 #endif
 
+#ifdef CONFIG_X86_PIE
+#define RO_GOT_X86							\
+	.got        : AT(ADDR(.got) - LOAD_OFFSET) {			\
+		VMLINUX_SYMBOL(__start_got) = .;			\
+		*(.got);						\
+		VMLINUX_SYMBOL(__end_got) = .;				\
+	}
+#else
+#define RO_GOT_X86
+#endif
+
 /*
  * Read only Data
  */
@@ -335,6 +346,7 @@
 		VMLINUX_SYMBOL(__end_builtin_fw) = .;			\
 	}								\
 									\
+	RO_GOT_X86							\
 	TRACEDATA							\
 									\
 	/* Kernel symbol table: Normal symbols */			\
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 24/27] x86/mm: Make the x86 GOT read-only
@ 2017-10-04 21:20   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

The GOT is only changed during early boot, when relocations are applied, so
it can be made read-only directly. This table exists only for a PIE binary.

Position Independent Executable (PIE) support will allow extending the KASLR
randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 include/asm-generic/vmlinux.lds.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e549bff87c5b..a2301c292e26 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -279,6 +279,17 @@
 	VMLINUX_SYMBOL(__end_ro_after_init) = .;
 #endif
 
+#ifdef CONFIG_X86_PIE
+#define RO_GOT_X86							\
+	.got        : AT(ADDR(.got) - LOAD_OFFSET) {			\
+		VMLINUX_SYMBOL(__start_got) = .;			\
+		*(.got);						\
+		VMLINUX_SYMBOL(__end_got) = .;				\
+	}
+#else
+#define RO_GOT_X86
+#endif
+
 /*
  * Read only Data
  */
@@ -335,6 +346,7 @@
 		VMLINUX_SYMBOL(__end_builtin_fw) = .;			\
 	}								\
 									\
+	RO_GOT_X86							\
 	TRACEDATA							\
 									\
 	/* Kernel symbol table: Normal symbols */			\
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 25/27] x86/pie: Add option to build the kernel as PIE
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:20   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Add the CONFIG_X86_PIE option which builds the kernel as a Position
Independent Executable (PIE). The kernel is currently built with the
-mcmodel=kernel option, which forces it to stay in the top 2G of the
virtual address space. With PIE, the kernel will be able to move below
the current limit.

The --emit-relocs linker option was kept instead of using -pie to limit
the impact on mapped sections. Any incompatible relocation will be
caught by the arch/x86/tools/relocs binary at compile time.

Performance/Size impact:
Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.000031%
 - PIE enabled: -3.210% (less relocations)
 .text section:
 - PIE disabled: +0.000644%
 - PIE enabled: +0.837%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: -0.201%
 - PIE enabled: -0.082%
 .text section:
 - PIE disabled: same
 - PIE enabled: +1.319%

Size of vmlinux (Default configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +0.814%

Size of vmlinux (Ubuntu configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +1.26%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. This bug [1] was opened with gcc to provide a better
code generation for kernel PIE.

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg +0.1% on latest test).
 - PIE enabled: between -0.50% to +0.86% in average (default and Ubuntu config).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-2% on latest run, likely noise).
 - PIE enabled: between -1% and +0.8% on latest runs.

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.239%)
 - PIE enabled: average +0.07%
 System Time:
 - PIE disabled: no significant change (avg -0.277%)
 - PIE enabled: average +0.7%

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/Kconfig  | 8 ++++++++
 arch/x86/Makefile | 1 +
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1e4b399c64e5..b92f96923712 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2141,6 +2141,14 @@ config X86_GLOBAL_STACKPROTECTOR
 	bool
 	depends on CC_STACKPROTECTOR
 
+config X86_PIE
+	bool
+	depends on X86_64
+	select DEFAULT_HIDDEN
+	select DYNAMIC_MODULE_BASE
+	select MODULE_REL_CRCS if MODVERSIONS
+	select X86_GLOBAL_STACKPROTECTOR if CC_STACKPROTECTOR
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 42774185a58a..c49855b4b1be 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -144,6 +144,7 @@ else
 
         KBUILD_CFLAGS += -mno-red-zone
 ifdef CONFIG_X86_PIE
+        KBUILD_CFLAGS += -fPIC
         KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/x86/kernel/module.lds
 else
         KBUILD_CFLAGS += -mcmodel=kernel
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 25/27] x86/pie: Add option to build the kernel as PIE
@ 2017-10-04 21:20   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Add the CONFIG_X86_PIE option which builds the kernel as a Position
Independent Executable (PIE). The kernel is currently built with the
-mcmodel=kernel option, which forces it to stay in the top 2G of the
virtual address space. With PIE, the kernel will be able to move below
the current limit.

The --emit-relocs linker option was kept instead of using -pie to limit
the impact on mapped sections. Any incompatible relocation will be
caught by the arch/x86/tools/relocs binary at compile time.

Performance/Size impact:
Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.000031%
 - PIE enabled: -3.210% (less relocations)
 .text section:
 - PIE disabled: +0.000644%
 - PIE enabled: +0.837%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: -0.201%
 - PIE enabled: -0.082%
 .text section:
 - PIE disabled: same
 - PIE enabled: +1.319%

Size of vmlinux (Default configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +0.814%

Size of vmlinux (Ubuntu configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +1.26%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. This bug [1] was opened with gcc to provide a better
code generation for kernel PIE.

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg +0.1% on latest test).
 - PIE enabled: between -0.50% to +0.86% in average (default and Ubuntu config).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-2% on latest run, likely noise).
 - PIE enabled: between -1% and +0.8% on latest runs.

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.239%)
 - PIE enabled: average +0.07%
 System Time:
 - PIE disabled: no significant change (avg -0.277%)
 - PIE enabled: average +0.7%

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/Kconfig  | 8 ++++++++
 arch/x86/Makefile | 1 +
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1e4b399c64e5..b92f96923712 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2141,6 +2141,14 @@ config X86_GLOBAL_STACKPROTECTOR
 	bool
 	depends on CC_STACKPROTECTOR
 
+config X86_PIE
+	bool
+	depends on X86_64
+	select DEFAULT_HIDDEN
+	select DYNAMIC_MODULE_BASE
+	select MODULE_REL_CRCS if MODVERSIONS
+	select X86_GLOBAL_STACKPROTECTOR if CC_STACKPROTECTOR
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 42774185a58a..c49855b4b1be 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -144,6 +144,7 @@ else
 
         KBUILD_CFLAGS += -mno-red-zone
 ifdef CONFIG_X86_PIE
+        KBUILD_CFLAGS += -fPIC
         KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/x86/kernel/module.lds
 else
         KBUILD_CFLAGS += -mcmodel=kernel
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 25/27] x86/pie: Add option to build the kernel as PIE
  2017-10-04 21:19 ` Thomas Garnier
                   ` (49 preceding siblings ...)
  (?)
@ 2017-10-04 21:20 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Add the CONFIG_X86_PIE option which builds the kernel as a Position
Independent Executable (PIE). The kernel is currently built with the
-mcmodel=kernel option, which forces it to stay in the top 2G of the
virtual address space. With PIE, the kernel will be able to move below
the current limit.

The --emit-relocs linker option was kept instead of using -pie to limit
the impact on mapped sections. Any incompatible relocation will be
caught by the arch/x86/tools/relocs binary at compile time.

Performance/Size impact:
Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.000031%
 - PIE enabled: -3.210% (less relocations)
 .text section:
 - PIE disabled: +0.000644%
 - PIE enabled: +0.837%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: -0.201%
 - PIE enabled: -0.082%
 .text section:
 - PIE disabled: same
 - PIE enabled: +1.319%

Size of vmlinux (Default configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +0.814%

Size of vmlinux (Ubuntu configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +1.26%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. This bug [1] was opened with gcc to provide a better
code generation for kernel PIE.

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg +0.1% on latest test).
 - PIE enabled: between -0.50% to +0.86% in average (default and Ubuntu config).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-2% on latest run, likely noise).
 - PIE enabled: between -1% and +0.8% on latest runs.

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.239%)
 - PIE enabled: average +0.07%
 System Time:
 - PIE disabled: no significant change (avg -0.277%)
 - PIE enabled: average +0.7%

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/Kconfig  | 8 ++++++++
 arch/x86/Makefile | 1 +
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1e4b399c64e5..b92f96923712 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2141,6 +2141,14 @@ config X86_GLOBAL_STACKPROTECTOR
 	bool
 	depends on CC_STACKPROTECTOR
 
+config X86_PIE
+	bool
+	depends on X86_64
+	select DEFAULT_HIDDEN
+	select DYNAMIC_MODULE_BASE
+	select MODULE_REL_CRCS if MODVERSIONS
+	select X86_GLOBAL_STACKPROTECTOR if CC_STACKPROTECTOR
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 42774185a58a..c49855b4b1be 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -144,6 +144,7 @@ else
 
         KBUILD_CFLAGS += -mno-red-zone
 ifdef CONFIG_X86_PIE
+        KBUILD_CFLAGS += -fPIC
         KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/x86/kernel/module.lds
 else
         KBUILD_CFLAGS += -mcmodel=kernel
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 25/27] x86/pie: Add option to build the kernel as PIE
@ 2017-10-04 21:20   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Add the CONFIG_X86_PIE option which builds the kernel as a Position
Independent Executable (PIE). The kernel is currently built with the
-mcmodel=kernel option, which forces it to stay in the top 2G of the
virtual address space. With PIE, the kernel will be able to move below
the current limit.

The --emit-relocs linker option was kept instead of using -pie to limit
the impact on mapped sections. Any incompatible relocation will be
caught by the arch/x86/tools/relocs binary at compile time.

Performance/Size impact:
Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.000031%
 - PIE enabled: -3.210% (less relocations)
 .text section:
 - PIE disabled: +0.000644%
 - PIE enabled: +0.837%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: -0.201%
 - PIE enabled: -0.082%
 .text section:
 - PIE disabled: same
 - PIE enabled: +1.319%

Size of vmlinux (Default configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +0.814%

Size of vmlinux (Ubuntu configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +1.26%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. This bug [1] was opened with gcc to provide a better
code generation for kernel PIE.

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg +0.1% on latest test).
 - PIE enabled: between -0.50% to +0.86% in average (default and Ubuntu config).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-2% on latest run, likely noise).
 - PIE enabled: between -1% and +0.8% on latest runs.

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.239%)
 - PIE enabled: average +0.07%
 System Time:
 - PIE disabled: no significant change (avg -0.277%)
 - PIE enabled: average +0.7%

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/Kconfig  | 8 ++++++++
 arch/x86/Makefile | 1 +
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1e4b399c64e5..b92f96923712 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2141,6 +2141,14 @@ config X86_GLOBAL_STACKPROTECTOR
 	bool
 	depends on CC_STACKPROTECTOR
 
+config X86_PIE
+	bool
+	depends on X86_64
+	select DEFAULT_HIDDEN
+	select DYNAMIC_MODULE_BASE
+	select MODULE_REL_CRCS if MODVERSIONS
+	select X86_GLOBAL_STACKPROTECTOR if CC_STACKPROTECTOR
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 42774185a58a..c49855b4b1be 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -144,6 +144,7 @@ else
 
         KBUILD_CFLAGS += -mno-red-zone
 ifdef CONFIG_X86_PIE
+        KBUILD_CFLAGS += -fPIC
         KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/x86/kernel/module.lds
 else
         KBUILD_CFLAGS += -mcmodel=kernel
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 26/27] x86/relocs: Add option to generate 64-bit relocations
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:20   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

The x86 relocation tool generates a list of 32-bit signed integers. There
was no need for 64-bit integers because all addresses were within the top
2G of the virtual address space.

This change adds a --large-reloc option to generate 64-bit unsigned
integers. It can be used when the kernel plans to go below the top 2G and
32-bit integers are not enough.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
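The only functional difference with --large-reloc below is the width of each
emitted offset: 8 little-endian bytes instead of 4. A small userspace sketch
of that emission step, with the tool's put_unaligned_le32/put_unaligned_le64
helpers replaced by explicit byte stores; write_le() and the sample offsets
are invented for illustration:

#include <stdint.h>
#include <stdio.h>

/* Emit one relocation offset as a little-endian integer of 4 or 8 bytes. */
static int write_le(uint64_t v, int width, FILE *f)
{
	unsigned char buf[8];
	int i;

	for (i = 0; i < width; i++)
		buf[i] = (unsigned char)(v >> (8 * i));
	return fwrite(buf, 1, width, f) == (size_t)width ? 0 : -1;
}

int main(void)
{
	/*
	 * 32-bit offsets are enough while everything sits in the top 2G;
	 * 64-bit offsets cover the larger PIE randomization range.
	 */
	if (write_le(0xffffffff81000000ull, 8, stdout) ||	/* --large-reloc */
	    write_le(0x81000000ull, 4, stdout))			/* classic 32-bit */
		return 1;
	return 0;
}
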
 arch/x86/tools/relocs.c        | 60 +++++++++++++++++++++++++++++++++---------
 arch/x86/tools/relocs.h        |  4 +--
 arch/x86/tools/relocs_common.c | 15 +++++++----
 3 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index bc032ad88b22..e7497ea1fe76 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -12,8 +12,14 @@
 
 static Elf_Ehdr ehdr;
 
+#if ELF_BITS == 64
+typedef uint64_t rel_off_t;
+#else
+typedef uint32_t rel_off_t;
+#endif
+
 struct relocs {
-	uint32_t	*offset;
+	rel_off_t	*offset;
 	unsigned long	count;
 	unsigned long	size;
 };
@@ -684,7 +690,7 @@ static void print_absolute_relocs(void)
 		printf("\n");
 }
 
-static void add_reloc(struct relocs *r, uint32_t offset)
+static void add_reloc(struct relocs *r, rel_off_t offset)
 {
 	if (r->count == r->size) {
 		unsigned long newsize = r->size + 50000;
@@ -1058,26 +1064,48 @@ static void sort_relocs(struct relocs *r)
 	qsort(r->offset, r->count, sizeof(r->offset[0]), cmp_relocs);
 }
 
-static int write32(uint32_t v, FILE *f)
+static int write32(rel_off_t rel, FILE *f)
 {
-	unsigned char buf[4];
+	unsigned char buf[sizeof(uint32_t)];
+	uint32_t v = (uint32_t)rel;
 
 	put_unaligned_le32(v, buf);
-	return fwrite(buf, 1, 4, f) == 4 ? 0 : -1;
+	return fwrite(buf, 1, sizeof(buf), f) == sizeof(buf) ? 0 : -1;
 }
 
-static int write32_as_text(uint32_t v, FILE *f)
+static int write32_as_text(rel_off_t rel, FILE *f)
 {
+	uint32_t v = (uint32_t)rel;
 	return fprintf(f, "\t.long 0x%08"PRIx32"\n", v) > 0 ? 0 : -1;
 }
 
-static void emit_relocs(int as_text, int use_real_mode)
+static int write64(rel_off_t rel, FILE *f)
+{
+	unsigned char buf[sizeof(uint64_t)];
+	uint64_t v = (uint64_t)rel;
+
+	put_unaligned_le64(v, buf);
+	return fwrite(buf, 1, sizeof(buf), f) == sizeof(buf) ? 0 : -1;
+}
+
+static int write64_as_text(rel_off_t rel, FILE *f)
+{
+	uint64_t v = (uint64_t)rel;
+	return fprintf(f, "\t.quad 0x%016"PRIx64"\n", v) > 0 ? 0 : -1;
+}
+
+static void emit_relocs(int as_text, int use_real_mode, int use_large_reloc)
 {
 	int i;
-	int (*write_reloc)(uint32_t, FILE *) = write32;
+	int (*write_reloc)(rel_off_t, FILE *);
 	int (*do_reloc)(struct section *sec, Elf_Rel *rel, Elf_Sym *sym,
 			const char *symname);
 
+	if (use_large_reloc)
+		write_reloc = write64;
+	else
+		write_reloc = write32;
+
 #if ELF_BITS == 64
 	if (!use_real_mode)
 		do_reloc = do_reloc64;
@@ -1088,6 +1116,9 @@ static void emit_relocs(int as_text, int use_real_mode)
 		do_reloc = do_reloc32;
 	else
 		do_reloc = do_reloc_real;
+
+	/* Large relocations only for 64-bit */
+	use_large_reloc = 0;
 #endif
 
 	/* Collect up the relocations */
@@ -1111,8 +1142,13 @@ static void emit_relocs(int as_text, int use_real_mode)
 		 * gas will like.
 		 */
 		printf(".section \".data.reloc\",\"a\"\n");
-		printf(".balign 4\n");
-		write_reloc = write32_as_text;
+		if (use_large_reloc) {
+			printf(".balign 8\n");
+			write_reloc = write64_as_text;
+		} else {
+			printf(".balign 4\n");
+			write_reloc = write32_as_text;
+		}
 	}
 
 	if (use_real_mode) {
@@ -1180,7 +1216,7 @@ static void print_reloc_info(void)
 
 void process(FILE *fp, int use_real_mode, int as_text,
 	     int show_absolute_syms, int show_absolute_relocs,
-	     int show_reloc_info)
+	     int show_reloc_info, int use_large_reloc)
 {
 	regex_init(use_real_mode);
 	read_ehdr(fp);
@@ -1203,5 +1239,5 @@ void process(FILE *fp, int use_real_mode, int as_text,
 		print_reloc_info();
 		return;
 	}
-	emit_relocs(as_text, use_real_mode);
+	emit_relocs(as_text, use_real_mode, use_large_reloc);
 }
diff --git a/arch/x86/tools/relocs.h b/arch/x86/tools/relocs.h
index 1d23bf953a4a..cb771cc4412d 100644
--- a/arch/x86/tools/relocs.h
+++ b/arch/x86/tools/relocs.h
@@ -30,8 +30,8 @@ enum symtype {
 
 void process_32(FILE *fp, int use_real_mode, int as_text,
 		int show_absolute_syms, int show_absolute_relocs,
-		int show_reloc_info);
+		int show_reloc_info, int use_large_reloc);
 void process_64(FILE *fp, int use_real_mode, int as_text,
 		int show_absolute_syms, int show_absolute_relocs,
-		int show_reloc_info);
+		int show_reloc_info, int use_large_reloc);
 #endif /* RELOCS_H */
diff --git a/arch/x86/tools/relocs_common.c b/arch/x86/tools/relocs_common.c
index acab636bcb34..9cf1391af50a 100644
--- a/arch/x86/tools/relocs_common.c
+++ b/arch/x86/tools/relocs_common.c
@@ -11,14 +11,14 @@ void die(char *fmt, ...)
 
 static void usage(void)
 {
-	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode]" \
-	    " vmlinux\n");
+	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode|" \
+	    "--large-reloc]  vmlinux\n");
 }
 
 int main(int argc, char **argv)
 {
 	int show_absolute_syms, show_absolute_relocs, show_reloc_info;
-	int as_text, use_real_mode;
+	int as_text, use_real_mode, use_large_reloc;
 	const char *fname;
 	FILE *fp;
 	int i;
@@ -29,6 +29,7 @@ int main(int argc, char **argv)
 	show_reloc_info = 0;
 	as_text = 0;
 	use_real_mode = 0;
+	use_large_reloc = 0;
 	fname = NULL;
 	for (i = 1; i < argc; i++) {
 		char *arg = argv[i];
@@ -53,6 +54,10 @@ int main(int argc, char **argv)
 				use_real_mode = 1;
 				continue;
 			}
+			if (strcmp(arg, "--large-reloc") == 0) {
+				use_large_reloc = 1;
+				continue;
+			}
 		}
 		else if (!fname) {
 			fname = arg;
@@ -74,11 +79,11 @@ int main(int argc, char **argv)
 	if (e_ident[EI_CLASS] == ELFCLASS64)
 		process_64(fp, use_real_mode, as_text,
 			   show_absolute_syms, show_absolute_relocs,
-			   show_reloc_info);
+			   show_reloc_info, use_large_reloc);
 	else
 		process_32(fp, use_real_mode, as_text,
 			   show_absolute_syms, show_absolute_relocs,
-			   show_reloc_info);
+			   show_reloc_info, use_large_reloc);
 	fclose(fp);
 	return 0;
 }
-- 
2.14.2.920.gcf0c67979c-goog



^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 26/27] x86/relocs: Add option to generate 64-bit relocations
@ 2017-10-04 21:20   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

The x86 relocation tool generates a list of 32-bit signed integers. There
was no need for 64-bit integers because all addresses were within the top
2G of the virtual address space.

This change adds a --large-reloc option to generate 64-bit unsigned
integers. It can be used when the kernel plans to go below the top 2G and
32-bit integers are not enough.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/tools/relocs.c        | 60 +++++++++++++++++++++++++++++++++---------
 arch/x86/tools/relocs.h        |  4 +--
 arch/x86/tools/relocs_common.c | 15 +++++++----
 3 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index bc032ad88b22..e7497ea1fe76 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -12,8 +12,14 @@
 
 static Elf_Ehdr ehdr;
 
+#if ELF_BITS == 64
+typedef uint64_t rel_off_t;
+#else
+typedef uint32_t rel_off_t;
+#endif
+
 struct relocs {
-	uint32_t	*offset;
+	rel_off_t	*offset;
 	unsigned long	count;
 	unsigned long	size;
 };
@@ -684,7 +690,7 @@ static void print_absolute_relocs(void)
 		printf("\n");
 }
 
-static void add_reloc(struct relocs *r, uint32_t offset)
+static void add_reloc(struct relocs *r, rel_off_t offset)
 {
 	if (r->count == r->size) {
 		unsigned long newsize = r->size + 50000;
@@ -1058,26 +1064,48 @@ static void sort_relocs(struct relocs *r)
 	qsort(r->offset, r->count, sizeof(r->offset[0]), cmp_relocs);
 }
 
-static int write32(uint32_t v, FILE *f)
+static int write32(rel_off_t rel, FILE *f)
 {
-	unsigned char buf[4];
+	unsigned char buf[sizeof(uint32_t)];
+	uint32_t v = (uint32_t)rel;
 
 	put_unaligned_le32(v, buf);
-	return fwrite(buf, 1, 4, f) == 4 ? 0 : -1;
+	return fwrite(buf, 1, sizeof(buf), f) == sizeof(buf) ? 0 : -1;
 }
 
-static int write32_as_text(uint32_t v, FILE *f)
+static int write32_as_text(rel_off_t rel, FILE *f)
 {
+	uint32_t v = (uint32_t)rel;
 	return fprintf(f, "\t.long 0x%08"PRIx32"\n", v) > 0 ? 0 : -1;
 }
 
-static void emit_relocs(int as_text, int use_real_mode)
+static int write64(rel_off_t rel, FILE *f)
+{
+	unsigned char buf[sizeof(uint64_t)];
+	uint64_t v = (uint64_t)rel;
+
+	put_unaligned_le64(v, buf);
+	return fwrite(buf, 1, sizeof(buf), f) == sizeof(buf) ? 0 : -1;
+}
+
+static int write64_as_text(rel_off_t rel, FILE *f)
+{
+	uint64_t v = (uint64_t)rel;
+	return fprintf(f, "\t.quad 0x%016"PRIx64"\n", v) > 0 ? 0 : -1;
+}
+
+static void emit_relocs(int as_text, int use_real_mode, int use_large_reloc)
 {
 	int i;
-	int (*write_reloc)(uint32_t, FILE *) = write32;
+	int (*write_reloc)(rel_off_t, FILE *);
 	int (*do_reloc)(struct section *sec, Elf_Rel *rel, Elf_Sym *sym,
 			const char *symname);
 
+	if (use_large_reloc)
+		write_reloc = write64;
+	else
+		write_reloc = write32;
+
 #if ELF_BITS == 64
 	if (!use_real_mode)
 		do_reloc = do_reloc64;
@@ -1088,6 +1116,9 @@ static void emit_relocs(int as_text, int use_real_mode)
 		do_reloc = do_reloc32;
 	else
 		do_reloc = do_reloc_real;
+
+	/* Large relocations only for 64-bit */
+	use_large_reloc = 0;
 #endif
 
 	/* Collect up the relocations */
@@ -1111,8 +1142,13 @@ static void emit_relocs(int as_text, int use_real_mode)
 		 * gas will like.
 		 */
 		printf(".section \".data.reloc\",\"a\"\n");
-		printf(".balign 4\n");
-		write_reloc = write32_as_text;
+		if (use_large_reloc) {
+			printf(".balign 8\n");
+			write_reloc = write64_as_text;
+		} else {
+			printf(".balign 4\n");
+			write_reloc = write32_as_text;
+		}
 	}
 
 	if (use_real_mode) {
@@ -1180,7 +1216,7 @@ static void print_reloc_info(void)
 
 void process(FILE *fp, int use_real_mode, int as_text,
 	     int show_absolute_syms, int show_absolute_relocs,
-	     int show_reloc_info)
+	     int show_reloc_info, int use_large_reloc)
 {
 	regex_init(use_real_mode);
 	read_ehdr(fp);
@@ -1203,5 +1239,5 @@ void process(FILE *fp, int use_real_mode, int as_text,
 		print_reloc_info();
 		return;
 	}
-	emit_relocs(as_text, use_real_mode);
+	emit_relocs(as_text, use_real_mode, use_large_reloc);
 }
diff --git a/arch/x86/tools/relocs.h b/arch/x86/tools/relocs.h
index 1d23bf953a4a..cb771cc4412d 100644
--- a/arch/x86/tools/relocs.h
+++ b/arch/x86/tools/relocs.h
@@ -30,8 +30,8 @@ enum symtype {
 
 void process_32(FILE *fp, int use_real_mode, int as_text,
 		int show_absolute_syms, int show_absolute_relocs,
-		int show_reloc_info);
+		int show_reloc_info, int use_large_reloc);
 void process_64(FILE *fp, int use_real_mode, int as_text,
 		int show_absolute_syms, int show_absolute_relocs,
-		int show_reloc_info);
+		int show_reloc_info, int use_large_reloc);
 #endif /* RELOCS_H */
diff --git a/arch/x86/tools/relocs_common.c b/arch/x86/tools/relocs_common.c
index acab636bcb34..9cf1391af50a 100644
--- a/arch/x86/tools/relocs_common.c
+++ b/arch/x86/tools/relocs_common.c
@@ -11,14 +11,14 @@ void die(char *fmt, ...)
 
 static void usage(void)
 {
-	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode]" \
-	    " vmlinux\n");
+	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode|" \
+	    "--large-reloc]  vmlinux\n");
 }
 
 int main(int argc, char **argv)
 {
 	int show_absolute_syms, show_absolute_relocs, show_reloc_info;
-	int as_text, use_real_mode;
+	int as_text, use_real_mode, use_large_reloc;
 	const char *fname;
 	FILE *fp;
 	int i;
@@ -29,6 +29,7 @@ int main(int argc, char **argv)
 	show_reloc_info = 0;
 	as_text = 0;
 	use_real_mode = 0;
+	use_large_reloc = 0;
 	fname = NULL;
 	for (i = 1; i < argc; i++) {
 		char *arg = argv[i];
@@ -53,6 +54,10 @@ int main(int argc, char **argv)
 				use_real_mode = 1;
 				continue;
 			}
+			if (strcmp(arg, "--large-reloc") == 0) {
+				use_large_reloc = 1;
+				continue;
+			}
 		}
 		else if (!fname) {
 			fname = arg;
@@ -74,11 +79,11 @@ int main(int argc, char **argv)
 	if (e_ident[EI_CLASS] == ELFCLASS64)
 		process_64(fp, use_real_mode, as_text,
 			   show_absolute_syms, show_absolute_relocs,
-			   show_reloc_info);
+			   show_reloc_info, use_large_reloc);
 	else
 		process_32(fp, use_real_mode, as_text,
 			   show_absolute_syms, show_absolute_relocs,
-			   show_reloc_info);
+			   show_reloc_info, use_large_reloc);
 	fclose(fp);
 	return 0;
 }
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 26/27] x86/relocs: Add option to generate 64-bit relocations
  2017-10-04 21:19 ` Thomas Garnier
                   ` (52 preceding siblings ...)
  (?)
@ 2017-10-04 21:20 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

The x86 relocation tool generates a list of 32-bit signed integers. There
was no need for 64-bit integers because all addresses were within the top
2G of the virtual address space.

This change adds a --large-reloc option to generate 64-bit unsigned
integers instead. It is needed when the kernel is expected to be placed
below the top 2G, where 32-bit integers can no longer represent the
relocated addresses.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/tools/relocs.c        | 60 +++++++++++++++++++++++++++++++++---------
 arch/x86/tools/relocs.h        |  4 +--
 arch/x86/tools/relocs_common.c | 15 +++++++----
 3 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index bc032ad88b22..e7497ea1fe76 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -12,8 +12,14 @@
 
 static Elf_Ehdr ehdr;
 
+#if ELF_BITS == 64
+typedef uint64_t rel_off_t;
+#else
+typedef uint32_t rel_off_t;
+#endif
+
 struct relocs {
-	uint32_t	*offset;
+	rel_off_t	*offset;
 	unsigned long	count;
 	unsigned long	size;
 };
@@ -684,7 +690,7 @@ static void print_absolute_relocs(void)
 		printf("\n");
 }
 
-static void add_reloc(struct relocs *r, uint32_t offset)
+static void add_reloc(struct relocs *r, rel_off_t offset)
 {
 	if (r->count == r->size) {
 		unsigned long newsize = r->size + 50000;
@@ -1058,26 +1064,48 @@ static void sort_relocs(struct relocs *r)
 	qsort(r->offset, r->count, sizeof(r->offset[0]), cmp_relocs);
 }
 
-static int write32(uint32_t v, FILE *f)
+static int write32(rel_off_t rel, FILE *f)
 {
-	unsigned char buf[4];
+	unsigned char buf[sizeof(uint32_t)];
+	uint32_t v = (uint32_t)rel;
 
 	put_unaligned_le32(v, buf);
-	return fwrite(buf, 1, 4, f) == 4 ? 0 : -1;
+	return fwrite(buf, 1, sizeof(buf), f) == sizeof(buf) ? 0 : -1;
 }
 
-static int write32_as_text(uint32_t v, FILE *f)
+static int write32_as_text(rel_off_t rel, FILE *f)
 {
+	uint32_t v = (uint32_t)rel;
 	return fprintf(f, "\t.long 0x%08"PRIx32"\n", v) > 0 ? 0 : -1;
 }
 
-static void emit_relocs(int as_text, int use_real_mode)
+static int write64(rel_off_t rel, FILE *f)
+{
+	unsigned char buf[sizeof(uint64_t)];
+	uint64_t v = (uint64_t)rel;
+
+	put_unaligned_le64(v, buf);
+	return fwrite(buf, 1, sizeof(buf), f) == sizeof(buf) ? 0 : -1;
+}
+
+static int write64_as_text(rel_off_t rel, FILE *f)
+{
+	uint64_t v = (uint64_t)rel;
+	return fprintf(f, "\t.quad 0x%016"PRIx64"\n", v) > 0 ? 0 : -1;
+}
+
+static void emit_relocs(int as_text, int use_real_mode, int use_large_reloc)
 {
 	int i;
-	int (*write_reloc)(uint32_t, FILE *) = write32;
+	int (*write_reloc)(rel_off_t, FILE *);
 	int (*do_reloc)(struct section *sec, Elf_Rel *rel, Elf_Sym *sym,
 			const char *symname);
 
+	if (use_large_reloc)
+		write_reloc = write64;
+	else
+		write_reloc = write32;
+
 #if ELF_BITS == 64
 	if (!use_real_mode)
 		do_reloc = do_reloc64;
@@ -1088,6 +1116,9 @@ static void emit_relocs(int as_text, int use_real_mode)
 		do_reloc = do_reloc32;
 	else
 		do_reloc = do_reloc_real;
+
+	/* Large relocations only for 64-bit */
+	use_large_reloc = 0;
 #endif
 
 	/* Collect up the relocations */
@@ -1111,8 +1142,13 @@ static void emit_relocs(int as_text, int use_real_mode)
 		 * gas will like.
 		 */
 		printf(".section \".data.reloc\",\"a\"\n");
-		printf(".balign 4\n");
-		write_reloc = write32_as_text;
+		if (use_large_reloc) {
+			printf(".balign 8\n");
+			write_reloc = write64_as_text;
+		} else {
+			printf(".balign 4\n");
+			write_reloc = write32_as_text;
+		}
 	}
 
 	if (use_real_mode) {
@@ -1180,7 +1216,7 @@ static void print_reloc_info(void)
 
 void process(FILE *fp, int use_real_mode, int as_text,
 	     int show_absolute_syms, int show_absolute_relocs,
-	     int show_reloc_info)
+	     int show_reloc_info, int use_large_reloc)
 {
 	regex_init(use_real_mode);
 	read_ehdr(fp);
@@ -1203,5 +1239,5 @@ void process(FILE *fp, int use_real_mode, int as_text,
 		print_reloc_info();
 		return;
 	}
-	emit_relocs(as_text, use_real_mode);
+	emit_relocs(as_text, use_real_mode, use_large_reloc);
 }
diff --git a/arch/x86/tools/relocs.h b/arch/x86/tools/relocs.h
index 1d23bf953a4a..cb771cc4412d 100644
--- a/arch/x86/tools/relocs.h
+++ b/arch/x86/tools/relocs.h
@@ -30,8 +30,8 @@ enum symtype {
 
 void process_32(FILE *fp, int use_real_mode, int as_text,
 		int show_absolute_syms, int show_absolute_relocs,
-		int show_reloc_info);
+		int show_reloc_info, int use_large_reloc);
 void process_64(FILE *fp, int use_real_mode, int as_text,
 		int show_absolute_syms, int show_absolute_relocs,
-		int show_reloc_info);
+		int show_reloc_info, int use_large_reloc);
 #endif /* RELOCS_H */
diff --git a/arch/x86/tools/relocs_common.c b/arch/x86/tools/relocs_common.c
index acab636bcb34..9cf1391af50a 100644
--- a/arch/x86/tools/relocs_common.c
+++ b/arch/x86/tools/relocs_common.c
@@ -11,14 +11,14 @@ void die(char *fmt, ...)
 
 static void usage(void)
 {
-	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode]" \
-	    " vmlinux\n");
+	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode|" \
+	    "--large-reloc]  vmlinux\n");
 }
 
 int main(int argc, char **argv)
 {
 	int show_absolute_syms, show_absolute_relocs, show_reloc_info;
-	int as_text, use_real_mode;
+	int as_text, use_real_mode, use_large_reloc;
 	const char *fname;
 	FILE *fp;
 	int i;
@@ -29,6 +29,7 @@ int main(int argc, char **argv)
 	show_reloc_info = 0;
 	as_text = 0;
 	use_real_mode = 0;
+	use_large_reloc = 0;
 	fname = NULL;
 	for (i = 1; i < argc; i++) {
 		char *arg = argv[i];
@@ -53,6 +54,10 @@ int main(int argc, char **argv)
 				use_real_mode = 1;
 				continue;
 			}
+			if (strcmp(arg, "--large-reloc") == 0) {
+				use_large_reloc = 1;
+				continue;
+			}
 		}
 		else if (!fname) {
 			fname = arg;
@@ -74,11 +79,11 @@ int main(int argc, char **argv)
 	if (e_ident[EI_CLASS] == ELFCLASS64)
 		process_64(fp, use_real_mode, as_text,
 			   show_absolute_syms, show_absolute_relocs,
-			   show_reloc_info);
+			   show_reloc_info, use_large_reloc);
 	else
 		process_32(fp, use_real_mode, as_text,
 			   show_absolute_syms, show_absolute_relocs,
-			   show_reloc_info);
+			   show_reloc_info, use_large_reloc);
 	fclose(fp);
 	return 0;
 }
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 26/27] x86/relocs: Add option to generate 64-bit relocations
@ 2017-10-04 21:20   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

The x86 relocation tool generates a list of 32-bit signed integers. There
was no need for 64-bit integers because all addresses were within the top
2G of the virtual address space.

This change adds a --large-reloc option to generate 64-bit unsigned
integers instead. It is needed when the kernel is expected to be placed
below the top 2G, where 32-bit integers can no longer represent the
relocated addresses.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/tools/relocs.c        | 60 +++++++++++++++++++++++++++++++++---------
 arch/x86/tools/relocs.h        |  4 +--
 arch/x86/tools/relocs_common.c | 15 +++++++----
 3 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index bc032ad88b22..e7497ea1fe76 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -12,8 +12,14 @@
 
 static Elf_Ehdr ehdr;
 
+#if ELF_BITS == 64
+typedef uint64_t rel_off_t;
+#else
+typedef uint32_t rel_off_t;
+#endif
+
 struct relocs {
-	uint32_t	*offset;
+	rel_off_t	*offset;
 	unsigned long	count;
 	unsigned long	size;
 };
@@ -684,7 +690,7 @@ static void print_absolute_relocs(void)
 		printf("\n");
 }
 
-static void add_reloc(struct relocs *r, uint32_t offset)
+static void add_reloc(struct relocs *r, rel_off_t offset)
 {
 	if (r->count == r->size) {
 		unsigned long newsize = r->size + 50000;
@@ -1058,26 +1064,48 @@ static void sort_relocs(struct relocs *r)
 	qsort(r->offset, r->count, sizeof(r->offset[0]), cmp_relocs);
 }
 
-static int write32(uint32_t v, FILE *f)
+static int write32(rel_off_t rel, FILE *f)
 {
-	unsigned char buf[4];
+	unsigned char buf[sizeof(uint32_t)];
+	uint32_t v = (uint32_t)rel;
 
 	put_unaligned_le32(v, buf);
-	return fwrite(buf, 1, 4, f) == 4 ? 0 : -1;
+	return fwrite(buf, 1, sizeof(buf), f) == sizeof(buf) ? 0 : -1;
 }
 
-static int write32_as_text(uint32_t v, FILE *f)
+static int write32_as_text(rel_off_t rel, FILE *f)
 {
+	uint32_t v = (uint32_t)rel;
 	return fprintf(f, "\t.long 0x%08"PRIx32"\n", v) > 0 ? 0 : -1;
 }
 
-static void emit_relocs(int as_text, int use_real_mode)
+static int write64(rel_off_t rel, FILE *f)
+{
+	unsigned char buf[sizeof(uint64_t)];
+	uint64_t v = (uint64_t)rel;
+
+	put_unaligned_le64(v, buf);
+	return fwrite(buf, 1, sizeof(buf), f) == sizeof(buf) ? 0 : -1;
+}
+
+static int write64_as_text(rel_off_t rel, FILE *f)
+{
+	uint64_t v = (uint64_t)rel;
+	return fprintf(f, "\t.quad 0x%016"PRIx64"\n", v) > 0 ? 0 : -1;
+}
+
+static void emit_relocs(int as_text, int use_real_mode, int use_large_reloc)
 {
 	int i;
-	int (*write_reloc)(uint32_t, FILE *) = write32;
+	int (*write_reloc)(rel_off_t, FILE *);
 	int (*do_reloc)(struct section *sec, Elf_Rel *rel, Elf_Sym *sym,
 			const char *symname);
 
+	if (use_large_reloc)
+		write_reloc = write64;
+	else
+		write_reloc = write32;
+
 #if ELF_BITS == 64
 	if (!use_real_mode)
 		do_reloc = do_reloc64;
@@ -1088,6 +1116,9 @@ static void emit_relocs(int as_text, int use_real_mode)
 		do_reloc = do_reloc32;
 	else
 		do_reloc = do_reloc_real;
+
+	/* Large relocations only for 64-bit */
+	use_large_reloc = 0;
 #endif
 
 	/* Collect up the relocations */
@@ -1111,8 +1142,13 @@ static void emit_relocs(int as_text, int use_real_mode)
 		 * gas will like.
 		 */
 		printf(".section \".data.reloc\",\"a\"\n");
-		printf(".balign 4\n");
-		write_reloc = write32_as_text;
+		if (use_large_reloc) {
+			printf(".balign 8\n");
+			write_reloc = write64_as_text;
+		} else {
+			printf(".balign 4\n");
+			write_reloc = write32_as_text;
+		}
 	}
 
 	if (use_real_mode) {
@@ -1180,7 +1216,7 @@ static void print_reloc_info(void)
 
 void process(FILE *fp, int use_real_mode, int as_text,
 	     int show_absolute_syms, int show_absolute_relocs,
-	     int show_reloc_info)
+	     int show_reloc_info, int use_large_reloc)
 {
 	regex_init(use_real_mode);
 	read_ehdr(fp);
@@ -1203,5 +1239,5 @@ void process(FILE *fp, int use_real_mode, int as_text,
 		print_reloc_info();
 		return;
 	}
-	emit_relocs(as_text, use_real_mode);
+	emit_relocs(as_text, use_real_mode, use_large_reloc);
 }
diff --git a/arch/x86/tools/relocs.h b/arch/x86/tools/relocs.h
index 1d23bf953a4a..cb771cc4412d 100644
--- a/arch/x86/tools/relocs.h
+++ b/arch/x86/tools/relocs.h
@@ -30,8 +30,8 @@ enum symtype {
 
 void process_32(FILE *fp, int use_real_mode, int as_text,
 		int show_absolute_syms, int show_absolute_relocs,
-		int show_reloc_info);
+		int show_reloc_info, int use_large_reloc);
 void process_64(FILE *fp, int use_real_mode, int as_text,
 		int show_absolute_syms, int show_absolute_relocs,
-		int show_reloc_info);
+		int show_reloc_info, int use_large_reloc);
 #endif /* RELOCS_H */
diff --git a/arch/x86/tools/relocs_common.c b/arch/x86/tools/relocs_common.c
index acab636bcb34..9cf1391af50a 100644
--- a/arch/x86/tools/relocs_common.c
+++ b/arch/x86/tools/relocs_common.c
@@ -11,14 +11,14 @@ void die(char *fmt, ...)
 
 static void usage(void)
 {
-	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode]" \
-	    " vmlinux\n");
+	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode|" \
+	    "--large-reloc]  vmlinux\n");
 }
 
 int main(int argc, char **argv)
 {
 	int show_absolute_syms, show_absolute_relocs, show_reloc_info;
-	int as_text, use_real_mode;
+	int as_text, use_real_mode, use_large_reloc;
 	const char *fname;
 	FILE *fp;
 	int i;
@@ -29,6 +29,7 @@ int main(int argc, char **argv)
 	show_reloc_info = 0;
 	as_text = 0;
 	use_real_mode = 0;
+	use_large_reloc = 0;
 	fname = NULL;
 	for (i = 1; i < argc; i++) {
 		char *arg = argv[i];
@@ -53,6 +54,10 @@ int main(int argc, char **argv)
 				use_real_mode = 1;
 				continue;
 			}
+			if (strcmp(arg, "--large-reloc") == 0) {
+				use_large_reloc = 1;
+				continue;
+			}
 		}
 		else if (!fname) {
 			fname = arg;
@@ -74,11 +79,11 @@ int main(int argc, char **argv)
 	if (e_ident[EI_CLASS] == ELFCLASS64)
 		process_64(fp, use_real_mode, as_text,
 			   show_absolute_syms, show_absolute_relocs,
-			   show_reloc_info);
+			   show_reloc_info, use_large_reloc);
 	else
 		process_32(fp, use_real_mode, as_text,
 			   show_absolute_syms, show_absolute_relocs,
-			   show_reloc_info);
+			   show_reloc_info, use_large_reloc);
 	fclose(fp);
 	return 0;
 }
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 27/27] x86/kaslr: Add option to extend KASLR range from 1GB to 3GB
  2017-10-04 21:19 ` Thomas Garnier
  (?)
@ 2017-10-04 21:20   ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Bor
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Add a new CONFIG_RANDOMIZE_BASE_LARGE option that takes advantage of PIE
support. It increases the KASLR range from 1GB to 3GB. The new range
starts at 0xffffffff00000000, just above the EFI memory region. This
option is off by default.

The boot code is adapted to create the appropriate page tables spanning
three PUD pages.

The relocation table uses 64-bit integers generated by the updated
relocation tool with the --large-reloc option.
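
As a quick worked example of the page table sizing (a standalone sketch,
assuming the standard x86_64 PUD_SHIFT of 30, i.e. 1G covered per level-3
entry), the pud_count() arithmetic added by this patch gives:

#include <stdio.h>
#include <stdint.h>

#define PUD_SHIFT	30
#define PUD_SIZE	(1ULL << PUD_SHIFT)
#define pud_count(x)	((((x) + (PUD_SIZE - 1)) & ~(PUD_SIZE - 1)) >> PUD_SHIFT)

int main(void)
{
	uint64_t one_gb   = 1ULL << 30;	/* current KERNEL_IMAGE_SIZE */
	uint64_t three_gb = 3ULL << 30;	/* with CONFIG_RANDOMIZE_BASE_LARGE */

	printf("1G image -> %llu level-3 entries\n",
	       (unsigned long long)pud_count(one_gb));		/* prints 1 */
	printf("3G image -> %llu level-3 entries\n",
	       (unsigned long long)pud_count(three_gb));	/* prints 3 */
	return 0;
}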

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/Kconfig                     | 21 +++++++++++++++++++++
 arch/x86/boot/compressed/Makefile    |  5 +++++
 arch/x86/boot/compressed/misc.c      | 10 +++++++++-
 arch/x86/include/asm/page_64_types.h |  9 +++++++++
 arch/x86/kernel/head64.c             | 15 ++++++++++++---
 arch/x86/kernel/head_64.S            | 11 ++++++++++-
 6 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b92f96923712..81f4512549d1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2149,6 +2149,27 @@ config X86_PIE
 	select MODULE_REL_CRCS if MODVERSIONS
 	select X86_GLOBAL_STACKPROTECTOR if CC_STACKPROTECTOR
 
+config RANDOMIZE_BASE_LARGE
+	bool "Increase the randomization range of the kernel image"
+	depends on X86_64 && RANDOMIZE_BASE
+	select X86_PIE
+	select X86_MODULE_PLTS if MODULES
+	default n
+	---help---
+	  Build the kernel as a Position Independent Executable (PIE) and
+	  increase the available randomization range from 1GB to 3GB.
+
+	  This option impacts performance of CPU-intensive kernel workloads by
+	  up to 10% due to PIE-generated code. The impact on user-mode
+	  processes and typical usage is significantly smaller (0.50% for a
+	  kernel build).
+
+	  The kernel and modules will generate slightly more assembly (a 1% to
+	  2% increase in the .text sections). The vmlinux binary will be
+	  significantly smaller due to fewer relocations.
+
+	  If unsure, say N.
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 8a958274b54c..94dfee5a7cd2 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -112,7 +112,12 @@ $(obj)/vmlinux.bin: vmlinux FORCE
 
 targets += $(patsubst $(obj)/%,%,$(vmlinux-objs-y)) vmlinux.bin.all vmlinux.relocs
 
+# Large randomization requires a bigger relocation table
+ifeq ($(CONFIG_RANDOMIZE_BASE_LARGE),y)
+CMD_RELOCS = arch/x86/tools/relocs --large-reloc
+else
 CMD_RELOCS = arch/x86/tools/relocs
+endif
 quiet_cmd_relocs = RELOCS  $@
       cmd_relocs = $(CMD_RELOCS) $< > $@;$(CMD_RELOCS) --abs-relocs $<
 $(obj)/vmlinux.relocs: vmlinux FORCE
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c14217cd0155..c1ac9f2e283d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -169,10 +169,18 @@ void __puthex(unsigned long value)
 }
 
 #if CONFIG_X86_NEED_RELOCS
+
+/* Large randomization goes lower than -2G and uses a large relocation table */
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+typedef long rel_t;
+#else
+typedef int rel_t;
+#endif
+
 static void handle_relocations(void *output, unsigned long output_len,
 			       unsigned long virt_addr)
 {
-	int *reloc;
+	rel_t *reloc;
 	unsigned long delta, map, ptr;
 	unsigned long min_addr = (unsigned long)output;
 	unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 3f5f08b010d0..6b65f846dd64 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -48,7 +48,11 @@
 #define __PAGE_OFFSET           __PAGE_OFFSET_BASE
 #endif /* CONFIG_RANDOMIZE_MEMORY */
 
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+#define __START_KERNEL_map	_AC(0xffffffff00000000, UL)
+#else
 #define __START_KERNEL_map	_AC(0xffffffff80000000, UL)
+#endif /* CONFIG_RANDOMIZE_BASE_LARGE */
 
 /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
 #ifdef CONFIG_X86_5LEVEL
@@ -65,9 +69,14 @@
  * 512MiB by default, leaving 1.5GiB for modules once the page tables
  * are fully set up. If kernel ASLR is configured, it can extend the
  * kernel page table mapping, reducing the size of the modules area.
+ * On PIE, we relocate the binary 2G lower so add this extra space.
  */
 #if defined(CONFIG_RANDOMIZE_BASE)
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+#define KERNEL_IMAGE_SIZE	(_AC(3, UL) * 1024 * 1024 * 1024)
+#else
 #define KERNEL_IMAGE_SIZE	(1024 * 1024 * 1024)
+#endif
 #else
 #define KERNEL_IMAGE_SIZE	(512 * 1024 * 1024)
 #endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index b6363f0d11a7..d603d0f5a40a 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -39,6 +39,7 @@ static unsigned int __initdata next_early_pgt;
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
 #define __head	__section(.head.text)
+#define pud_count(x)   (((x + (PUD_SIZE - 1)) & ~(PUD_SIZE - 1)) >> PUD_SHIFT)
 
 static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
 {
@@ -56,6 +57,8 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 {
 	unsigned long load_delta, *p;
 	unsigned long pgtable_flags;
+	unsigned long level3_kernel_start, level3_kernel_count;
+	unsigned long level3_fixmap_start;
 	pgdval_t *pgd;
 	p4dval_t *p4d;
 	pudval_t *pud;
@@ -83,6 +86,11 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 	/* Include the SME encryption mask in the fixup value */
 	load_delta += sme_get_me_mask();
 
+	/* Look at the randomization spread to adapt the page tables used */
+	level3_kernel_start = pud_index(__START_KERNEL_map);
+	level3_kernel_count = pud_count(KERNEL_IMAGE_SIZE);
+	level3_fixmap_start = level3_kernel_start + level3_kernel_count;
+
 	/* Fixup the physical addresses in the page table */
 
 	pgd = fixup_pointer(&early_top_pgt, physaddr);
@@ -94,8 +102,9 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 	}
 
 	pud = fixup_pointer(&level3_kernel_pgt, physaddr);
-	pud[510] += load_delta;
-	pud[511] += load_delta;
+	for (i = 0; i < level3_kernel_count; i++)
+		pud[level3_kernel_start + i] += load_delta;
+	pud[level3_fixmap_start] += load_delta;
 
 	pmd = fixup_pointer(level2_fixmap_pgt, physaddr);
 	pmd[506] += load_delta;
@@ -150,7 +159,7 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 	 */
 
 	pmd = fixup_pointer(level2_kernel_pgt, physaddr);
-	for (i = 0; i < PTRS_PER_PMD; i++) {
+	for (i = 0; i < PTRS_PER_PMD * level3_kernel_count; i++) {
 		if (pmd[i] & _PAGE_PRESENT)
 			pmd[i] += load_delta;
 	}
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index df5198e310fc..7918ffefc9c9 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -39,11 +39,15 @@
 
 #define p4d_index(x)	(((x) >> P4D_SHIFT) & (PTRS_PER_P4D-1))
 #define pud_index(x)	(((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
+#define pud_count(x)   (((x + (PUD_SIZE - 1)) & ~(PUD_SIZE - 1)) >> PUD_SHIFT)
 
 PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
 PGD_START_KERNEL = pgd_index(__START_KERNEL_map)
 L3_START_KERNEL = pud_index(__START_KERNEL_map)
 
+/* Adapt page table L3 space based on range of randomization */
+L3_KERNEL_ENTRY_COUNT = pud_count(KERNEL_IMAGE_SIZE)
+
 	.text
 	__HEAD
 	.code64
@@ -413,7 +417,12 @@ NEXT_PAGE(level4_kernel_pgt)
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
-	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	i = 0
+	.rept	L3_KERNEL_ENTRY_COUNT
+	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC \
+		+ PAGE_SIZE*i
+	i = i + 1
+	.endr
 	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
 
 NEXT_PAGE(level2_kernel_pgt)
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 27/27] x86/kaslr: Add option to extend KASLR range from 1GB to 3GB
@ 2017-10-04 21:20   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Add a new CONFIG_RANDOMIZE_BASE_LARGE option that takes advantage of PIE
support. It increases the KASLR range from 1GB to 3GB. The new range
starts at 0xffffffff00000000, just above the EFI memory region. This
option is off by default.

The boot code is adapted to create the appropriate page tables spanning
three PUD pages.

The relocation table uses 64-bit integers generated by the updated
relocation tool with the --large-reloc option.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/Kconfig                     | 21 +++++++++++++++++++++
 arch/x86/boot/compressed/Makefile    |  5 +++++
 arch/x86/boot/compressed/misc.c      | 10 +++++++++-
 arch/x86/include/asm/page_64_types.h |  9 +++++++++
 arch/x86/kernel/head64.c             | 15 ++++++++++++---
 arch/x86/kernel/head_64.S            | 11 ++++++++++-
 6 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b92f96923712..81f4512549d1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2149,6 +2149,27 @@ config X86_PIE
 	select MODULE_REL_CRCS if MODVERSIONS
 	select X86_GLOBAL_STACKPROTECTOR if CC_STACKPROTECTOR
 
+config RANDOMIZE_BASE_LARGE
+	bool "Increase the randomization range of the kernel image"
+	depends on X86_64 && RANDOMIZE_BASE
+	select X86_PIE
+	select X86_MODULE_PLTS if MODULES
+	default n
+	---help---
+	  Build the kernel as a Position Independent Executable (PIE) and
+	  increase the available randomization range from 1GB to 3GB.
+
+	  This option impacts performance of CPU-intensive kernel workloads by
+	  up to 10% due to PIE-generated code. The impact on user-mode
+	  processes and typical usage is significantly smaller (0.50% for a
+	  kernel build).
+
+	  The kernel and modules will generate slightly more assembly (a 1% to
+	  2% increase in the .text sections). The vmlinux binary will be
+	  significantly smaller due to fewer relocations.
+
+	  If unsure, say N.
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 8a958274b54c..94dfee5a7cd2 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -112,7 +112,12 @@ $(obj)/vmlinux.bin: vmlinux FORCE
 
 targets += $(patsubst $(obj)/%,%,$(vmlinux-objs-y)) vmlinux.bin.all vmlinux.relocs
 
+# Large randomization requires a bigger relocation table
+ifeq ($(CONFIG_RANDOMIZE_BASE_LARGE),y)
+CMD_RELOCS = arch/x86/tools/relocs --large-reloc
+else
 CMD_RELOCS = arch/x86/tools/relocs
+endif
 quiet_cmd_relocs = RELOCS  $@
       cmd_relocs = $(CMD_RELOCS) $< > $@;$(CMD_RELOCS) --abs-relocs $<
 $(obj)/vmlinux.relocs: vmlinux FORCE
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c14217cd0155..c1ac9f2e283d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -169,10 +169,18 @@ void __puthex(unsigned long value)
 }
 
 #if CONFIG_X86_NEED_RELOCS
+
+/* Large randomization goes lower than -2G and uses a large relocation table */
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+typedef long rel_t;
+#else
+typedef int rel_t;
+#endif
+
 static void handle_relocations(void *output, unsigned long output_len,
 			       unsigned long virt_addr)
 {
-	int *reloc;
+	rel_t *reloc;
 	unsigned long delta, map, ptr;
 	unsigned long min_addr = (unsigned long)output;
 	unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 3f5f08b010d0..6b65f846dd64 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -48,7 +48,11 @@
 #define __PAGE_OFFSET           __PAGE_OFFSET_BASE
 #endif /* CONFIG_RANDOMIZE_MEMORY */
 
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+#define __START_KERNEL_map	_AC(0xffffffff00000000, UL)
+#else
 #define __START_KERNEL_map	_AC(0xffffffff80000000, UL)
+#endif /* CONFIG_RANDOMIZE_BASE_LARGE */
 
 /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
 #ifdef CONFIG_X86_5LEVEL
@@ -65,9 +69,14 @@
  * 512MiB by default, leaving 1.5GiB for modules once the page tables
  * are fully set up. If kernel ASLR is configured, it can extend the
  * kernel page table mapping, reducing the size of the modules area.
+ * On PIE, we relocate the binary 2G lower so add this extra space.
  */
 #if defined(CONFIG_RANDOMIZE_BASE)
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+#define KERNEL_IMAGE_SIZE	(_AC(3, UL) * 1024 * 1024 * 1024)
+#else
 #define KERNEL_IMAGE_SIZE	(1024 * 1024 * 1024)
+#endif
 #else
 #define KERNEL_IMAGE_SIZE	(512 * 1024 * 1024)
 #endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index b6363f0d11a7..d603d0f5a40a 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -39,6 +39,7 @@ static unsigned int __initdata next_early_pgt;
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
 #define __head	__section(.head.text)
+#define pud_count(x)   (((x + (PUD_SIZE - 1)) & ~(PUD_SIZE - 1)) >> PUD_SHIFT)
 
 static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
 {
@@ -56,6 +57,8 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 {
 	unsigned long load_delta, *p;
 	unsigned long pgtable_flags;
+	unsigned long level3_kernel_start, level3_kernel_count;
+	unsigned long level3_fixmap_start;
 	pgdval_t *pgd;
 	p4dval_t *p4d;
 	pudval_t *pud;
@@ -83,6 +86,11 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 	/* Include the SME encryption mask in the fixup value */
 	load_delta += sme_get_me_mask();
 
+	/* Look at the randomization spread to adapt the page tables used */
+	level3_kernel_start = pud_index(__START_KERNEL_map);
+	level3_kernel_count = pud_count(KERNEL_IMAGE_SIZE);
+	level3_fixmap_start = level3_kernel_start + level3_kernel_count;
+
 	/* Fixup the physical addresses in the page table */
 
 	pgd = fixup_pointer(&early_top_pgt, physaddr);
@@ -94,8 +102,9 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 	}
 
 	pud = fixup_pointer(&level3_kernel_pgt, physaddr);
-	pud[510] += load_delta;
-	pud[511] += load_delta;
+	for (i = 0; i < level3_kernel_count; i++)
+		pud[level3_kernel_start + i] += load_delta;
+	pud[level3_fixmap_start] += load_delta;
 
 	pmd = fixup_pointer(level2_fixmap_pgt, physaddr);
 	pmd[506] += load_delta;
@@ -150,7 +159,7 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 	 */
 
 	pmd = fixup_pointer(level2_kernel_pgt, physaddr);
-	for (i = 0; i < PTRS_PER_PMD; i++) {
+	for (i = 0; i < PTRS_PER_PMD * level3_kernel_count; i++) {
 		if (pmd[i] & _PAGE_PRESENT)
 			pmd[i] += load_delta;
 	}
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index df5198e310fc..7918ffefc9c9 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -39,11 +39,15 @@
 
 #define p4d_index(x)	(((x) >> P4D_SHIFT) & (PTRS_PER_P4D-1))
 #define pud_index(x)	(((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
+#define pud_count(x)   (((x + (PUD_SIZE - 1)) & ~(PUD_SIZE - 1)) >> PUD_SHIFT)
 
 PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
 PGD_START_KERNEL = pgd_index(__START_KERNEL_map)
 L3_START_KERNEL = pud_index(__START_KERNEL_map)
 
+/* Adapt page table L3 space based on range of randomization */
+L3_KERNEL_ENTRY_COUNT = pud_count(KERNEL_IMAGE_SIZE)
+
 	.text
 	__HEAD
 	.code64
@@ -413,7 +417,12 @@ NEXT_PAGE(level4_kernel_pgt)
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
-	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	i = 0
+	.rept	L3_KERNEL_ENTRY_COUNT
+	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC \
+		+ PAGE_SIZE*i
+	i = i + 1
+	.endr
 	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
 
 NEXT_PAGE(level2_kernel_pgt)
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [RFC v3 27/27] x86/kaslr: Add option to extend KASLR range from 1GB to 3GB
  2017-10-04 21:19 ` Thomas Garnier
                   ` (53 preceding siblings ...)
  (?)
@ 2017-10-04 21:20 ` Thomas Garnier via Virtualization
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

Add a new CONFIG_RANDOMIZE_BASE_LARGE option that takes advantage of PIE
support. It increases the KASLR range from 1GB to 3GB. The new range
starts at 0xffffffff00000000, just above the EFI memory region. This
option is off by default.

The boot code is adapted to create the appropriate page tables spanning
three PUD pages.

The relocation table uses 64-bit integers generated by the updated
relocation tool with the --large-reloc option.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/Kconfig                     | 21 +++++++++++++++++++++
 arch/x86/boot/compressed/Makefile    |  5 +++++
 arch/x86/boot/compressed/misc.c      | 10 +++++++++-
 arch/x86/include/asm/page_64_types.h |  9 +++++++++
 arch/x86/kernel/head64.c             | 15 ++++++++++++---
 arch/x86/kernel/head_64.S            | 11 ++++++++++-
 6 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b92f96923712..81f4512549d1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2149,6 +2149,27 @@ config X86_PIE
 	select MODULE_REL_CRCS if MODVERSIONS
 	select X86_GLOBAL_STACKPROTECTOR if CC_STACKPROTECTOR
 
+config RANDOMIZE_BASE_LARGE
+	bool "Increase the randomization range of the kernel image"
+	depends on X86_64 && RANDOMIZE_BASE
+	select X86_PIE
+	select X86_MODULE_PLTS if MODULES
+	default n
+	---help---
+	  Build the kernel as a Position Independent Executable (PIE) and
+	  increase the available randomization range from 1GB to 3GB.
+
+	  This option impacts performance of CPU-intensive kernel workloads by
+	  up to 10% due to PIE-generated code. The impact on user-mode
+	  processes and typical usage is significantly smaller (0.50% for a
+	  kernel build).
+
+	  The kernel and modules will generate slightly more assembly (a 1% to
+	  2% increase in the .text sections). The vmlinux binary will be
+	  significantly smaller due to fewer relocations.
+
+	  If unsure, say N.
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 8a958274b54c..94dfee5a7cd2 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -112,7 +112,12 @@ $(obj)/vmlinux.bin: vmlinux FORCE
 
 targets += $(patsubst $(obj)/%,%,$(vmlinux-objs-y)) vmlinux.bin.all vmlinux.relocs
 
+# Large randomization requires a bigger relocation table
+ifeq ($(CONFIG_RANDOMIZE_BASE_LARGE),y)
+CMD_RELOCS = arch/x86/tools/relocs --large-reloc
+else
 CMD_RELOCS = arch/x86/tools/relocs
+endif
 quiet_cmd_relocs = RELOCS  $@
       cmd_relocs = $(CMD_RELOCS) $< > $@;$(CMD_RELOCS) --abs-relocs $<
 $(obj)/vmlinux.relocs: vmlinux FORCE
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c14217cd0155..c1ac9f2e283d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -169,10 +169,18 @@ void __puthex(unsigned long value)
 }
 
 #if CONFIG_X86_NEED_RELOCS
+
+/* Large randomization goes lower than -2G and uses a large relocation table */
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+typedef long rel_t;
+#else
+typedef int rel_t;
+#endif
+
 static void handle_relocations(void *output, unsigned long output_len,
 			       unsigned long virt_addr)
 {
-	int *reloc;
+	rel_t *reloc;
 	unsigned long delta, map, ptr;
 	unsigned long min_addr = (unsigned long)output;
 	unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 3f5f08b010d0..6b65f846dd64 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -48,7 +48,11 @@
 #define __PAGE_OFFSET           __PAGE_OFFSET_BASE
 #endif /* CONFIG_RANDOMIZE_MEMORY */
 
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+#define __START_KERNEL_map	_AC(0xffffffff00000000, UL)
+#else
 #define __START_KERNEL_map	_AC(0xffffffff80000000, UL)
+#endif /* CONFIG_RANDOMIZE_BASE_LARGE */
 
 /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
 #ifdef CONFIG_X86_5LEVEL
@@ -65,9 +69,14 @@
  * 512MiB by default, leaving 1.5GiB for modules once the page tables
  * are fully set up. If kernel ASLR is configured, it can extend the
  * kernel page table mapping, reducing the size of the modules area.
+ * On PIE, we relocate the binary 2G lower so add this extra space.
  */
 #if defined(CONFIG_RANDOMIZE_BASE)
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+#define KERNEL_IMAGE_SIZE	(_AC(3, UL) * 1024 * 1024 * 1024)
+#else
 #define KERNEL_IMAGE_SIZE	(1024 * 1024 * 1024)
+#endif
 #else
 #define KERNEL_IMAGE_SIZE	(512 * 1024 * 1024)
 #endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index b6363f0d11a7..d603d0f5a40a 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -39,6 +39,7 @@ static unsigned int __initdata next_early_pgt;
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
 #define __head	__section(.head.text)
+#define pud_count(x)   (((x + (PUD_SIZE - 1)) & ~(PUD_SIZE - 1)) >> PUD_SHIFT)
 
 static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
 {
@@ -56,6 +57,8 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 {
 	unsigned long load_delta, *p;
 	unsigned long pgtable_flags;
+	unsigned long level3_kernel_start, level3_kernel_count;
+	unsigned long level3_fixmap_start;
 	pgdval_t *pgd;
 	p4dval_t *p4d;
 	pudval_t *pud;
@@ -83,6 +86,11 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 	/* Include the SME encryption mask in the fixup value */
 	load_delta += sme_get_me_mask();
 
+	/* Look at the randomization spread to adapt the page tables used */
+	level3_kernel_start = pud_index(__START_KERNEL_map);
+	level3_kernel_count = pud_count(KERNEL_IMAGE_SIZE);
+	level3_fixmap_start = level3_kernel_start + level3_kernel_count;
+
 	/* Fixup the physical addresses in the page table */
 
 	pgd = fixup_pointer(&early_top_pgt, physaddr);
@@ -94,8 +102,9 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 	}
 
 	pud = fixup_pointer(&level3_kernel_pgt, physaddr);
-	pud[510] += load_delta;
-	pud[511] += load_delta;
+	for (i = 0; i < level3_kernel_count; i++)
+		pud[level3_kernel_start + i] += load_delta;
+	pud[level3_fixmap_start] += load_delta;
 
 	pmd = fixup_pointer(level2_fixmap_pgt, physaddr);
 	pmd[506] += load_delta;
@@ -150,7 +159,7 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 	 */
 
 	pmd = fixup_pointer(level2_kernel_pgt, physaddr);
-	for (i = 0; i < PTRS_PER_PMD; i++) {
+	for (i = 0; i < PTRS_PER_PMD * level3_kernel_count; i++) {
 		if (pmd[i] & _PAGE_PRESENT)
 			pmd[i] += load_delta;
 	}
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index df5198e310fc..7918ffefc9c9 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -39,11 +39,15 @@
 
 #define p4d_index(x)	(((x) >> P4D_SHIFT) & (PTRS_PER_P4D-1))
 #define pud_index(x)	(((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
+#define pud_count(x)   (((x + (PUD_SIZE - 1)) & ~(PUD_SIZE - 1)) >> PUD_SHIFT)
 
 PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
 PGD_START_KERNEL = pgd_index(__START_KERNEL_map)
 L3_START_KERNEL = pud_index(__START_KERNEL_map)
 
+/* Adapt page table L3 space based on range of randomization */
+L3_KERNEL_ENTRY_COUNT = pud_count(KERNEL_IMAGE_SIZE)
+
 	.text
 	__HEAD
 	.code64
@@ -413,7 +417,12 @@ NEXT_PAGE(level4_kernel_pgt)
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
-	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	i = 0
+	.rept	L3_KERNEL_ENTRY_COUNT
+	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC \
+		+ PAGE_SIZE*i
+	i = i + 1
+	.endr
 	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
 
 NEXT_PAGE(level2_kernel_pgt)
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* [kernel-hardening] [RFC v3 27/27] x86/kaslr: Add option to extend KASLR range from 1GB to 3GB
@ 2017-10-04 21:20   ` Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter, Boris Ostrovsky, Alexey Dobriyan,
	Andrew Morton, Paul Gortmaker, Chris Metcalf, Paul E . McKenney,
	Nicolas Pitre, Borislav Petkov, Luis R . Rodriguez,
	Greg Kroah-Hartman, Christopher Li, Steven Rostedt, Jason Baron,
	Dou Liyang, Rafael J . Wysocki, Mika Westerberg, Lukas Wunner,
	Masahiro Yamada, Alexei Starovoitov, Daniel Borkmann,
	Markus Trippelsdorf, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Rik van Riel, David Howells, Ard Biesheuvel,
	Waiman Long, Kyle Huey, Andrey Ryabinin, Jonathan Corbet,
	Matthew Wilcox, Michal Hocko, Peter Foley, Paul Bolle,
	Jiri Kosina, Rob Landley, H . J . Lu, Baoquan He,
	Jan H . Schönherr, Daniel Micay
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

Add a new CONFIG_RANDOMIZE_BASE_LARGE option that takes advantage of PIE
support. It increases the KASLR range from 1GB to 3GB. The new range
starts at 0xffffffff00000000, just above the EFI memory region. This
option is off by default.

The boot code is adapted to create the appropriate page tables spanning
three PUD pages.

The relocation table uses 64-bit integers generated by the updated
relocation tool with the --large-reloc option.

Signed-off-by: Thomas Garnier <thgarnie@google.com>
---
 arch/x86/Kconfig                     | 21 +++++++++++++++++++++
 arch/x86/boot/compressed/Makefile    |  5 +++++
 arch/x86/boot/compressed/misc.c      | 10 +++++++++-
 arch/x86/include/asm/page_64_types.h |  9 +++++++++
 arch/x86/kernel/head64.c             | 15 ++++++++++++---
 arch/x86/kernel/head_64.S            | 11 ++++++++++-
 6 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b92f96923712..81f4512549d1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2149,6 +2149,27 @@ config X86_PIE
 	select MODULE_REL_CRCS if MODVERSIONS
 	select X86_GLOBAL_STACKPROTECTOR if CC_STACKPROTECTOR
 
+config RANDOMIZE_BASE_LARGE
+	bool "Increase the randomization range of the kernel image"
+	depends on X86_64 && RANDOMIZE_BASE
+	select X86_PIE
+	select X86_MODULE_PLTS if MODULES
+	default n
+	---help---
+	  Build the kernel as a Position Independent Executable (PIE) and
+	  increase the available randomization range from 1GB to 3GB.
+
+	  This option impacts performance of CPU-intensive kernel workloads by
+	  up to 10% due to PIE-generated code. The impact on user-mode
+	  processes and typical usage is significantly smaller (0.50% for a
+	  kernel build).
+
+	  The kernel and modules will generate slightly more assembly (a 1% to
+	  2% increase in the .text sections). The vmlinux binary will be
+	  significantly smaller due to fewer relocations.
+
+	  If unsure, say N.
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 8a958274b54c..94dfee5a7cd2 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -112,7 +112,12 @@ $(obj)/vmlinux.bin: vmlinux FORCE
 
 targets += $(patsubst $(obj)/%,%,$(vmlinux-objs-y)) vmlinux.bin.all vmlinux.relocs
 
+# Large randomization requires a bigger relocation table
+ifeq ($(CONFIG_RANDOMIZE_BASE_LARGE),y)
+CMD_RELOCS = arch/x86/tools/relocs --large-reloc
+else
 CMD_RELOCS = arch/x86/tools/relocs
+endif
 quiet_cmd_relocs = RELOCS  $@
       cmd_relocs = $(CMD_RELOCS) $< > $@;$(CMD_RELOCS) --abs-relocs $<
 $(obj)/vmlinux.relocs: vmlinux FORCE
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c14217cd0155..c1ac9f2e283d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -169,10 +169,18 @@ void __puthex(unsigned long value)
 }
 
 #if CONFIG_X86_NEED_RELOCS
+
+/* Large randomization goes lower than -2G and uses a large relocation table */
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+typedef long rel_t;
+#else
+typedef int rel_t;
+#endif
+
 static void handle_relocations(void *output, unsigned long output_len,
 			       unsigned long virt_addr)
 {
-	int *reloc;
+	rel_t *reloc;
 	unsigned long delta, map, ptr;
 	unsigned long min_addr = (unsigned long)output;
 	unsigned long max_addr = min_addr + (VO___bss_start - VO__text);
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 3f5f08b010d0..6b65f846dd64 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -48,7 +48,11 @@
 #define __PAGE_OFFSET           __PAGE_OFFSET_BASE
 #endif /* CONFIG_RANDOMIZE_MEMORY */
 
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+#define __START_KERNEL_map	_AC(0xffffffff00000000, UL)
+#else
 #define __START_KERNEL_map	_AC(0xffffffff80000000, UL)
+#endif /* CONFIG_RANDOMIZE_BASE_LARGE */
 
 /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
 #ifdef CONFIG_X86_5LEVEL
@@ -65,9 +69,14 @@
  * 512MiB by default, leaving 1.5GiB for modules once the page tables
  * are fully set up. If kernel ASLR is configured, it can extend the
  * kernel page table mapping, reducing the size of the modules area.
+ * On PIE, we relocate the binary 2G lower so add this extra space.
  */
 #if defined(CONFIG_RANDOMIZE_BASE)
+#ifdef CONFIG_RANDOMIZE_BASE_LARGE
+#define KERNEL_IMAGE_SIZE	(_AC(3, UL) * 1024 * 1024 * 1024)
+#else
 #define KERNEL_IMAGE_SIZE	(1024 * 1024 * 1024)
+#endif
 #else
 #define KERNEL_IMAGE_SIZE	(512 * 1024 * 1024)
 #endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index b6363f0d11a7..d603d0f5a40a 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -39,6 +39,7 @@ static unsigned int __initdata next_early_pgt;
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
 #define __head	__section(.head.text)
+#define pud_count(x)   (((x + (PUD_SIZE - 1)) & ~(PUD_SIZE - 1)) >> PUD_SHIFT)
 
 static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
 {
@@ -56,6 +57,8 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 {
 	unsigned long load_delta, *p;
 	unsigned long pgtable_flags;
+	unsigned long level3_kernel_start, level3_kernel_count;
+	unsigned long level3_fixmap_start;
 	pgdval_t *pgd;
 	p4dval_t *p4d;
 	pudval_t *pud;
@@ -83,6 +86,11 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 	/* Include the SME encryption mask in the fixup value */
 	load_delta += sme_get_me_mask();
 
+	/* Look at the randomization spread to adapt the page tables used */
+	level3_kernel_start = pud_index(__START_KERNEL_map);
+	level3_kernel_count = pud_count(KERNEL_IMAGE_SIZE);
+	level3_fixmap_start = level3_kernel_start + level3_kernel_count;
+
 	/* Fixup the physical addresses in the page table */
 
 	pgd = fixup_pointer(&early_top_pgt, physaddr);
@@ -94,8 +102,9 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 	}
 
 	pud = fixup_pointer(&level3_kernel_pgt, physaddr);
-	pud[510] += load_delta;
-	pud[511] += load_delta;
+	for (i = 0; i < level3_kernel_count; i++)
+		pud[level3_kernel_start + i] += load_delta;
+	pud[level3_fixmap_start] += load_delta;
 
 	pmd = fixup_pointer(level2_fixmap_pgt, physaddr);
 	pmd[506] += load_delta;
@@ -150,7 +159,7 @@ unsigned long __head notrace __startup_64(unsigned long physaddr,
 	 */
 
 	pmd = fixup_pointer(level2_kernel_pgt, physaddr);
-	for (i = 0; i < PTRS_PER_PMD; i++) {
+	for (i = 0; i < PTRS_PER_PMD * level3_kernel_count; i++) {
 		if (pmd[i] & _PAGE_PRESENT)
 			pmd[i] += load_delta;
 	}
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index df5198e310fc..7918ffefc9c9 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -39,11 +39,15 @@
 
 #define p4d_index(x)	(((x) >> P4D_SHIFT) & (PTRS_PER_P4D-1))
 #define pud_index(x)	(((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
+#define pud_count(x)   (((x + (PUD_SIZE - 1)) & ~(PUD_SIZE - 1)) >> PUD_SHIFT)
 
 PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
 PGD_START_KERNEL = pgd_index(__START_KERNEL_map)
 L3_START_KERNEL = pud_index(__START_KERNEL_map)
 
+/* Adapt page table L3 space based on range of randomization */
+L3_KERNEL_ENTRY_COUNT = pud_count(KERNEL_IMAGE_SIZE)
+
 	.text
 	__HEAD
 	.code64
@@ -413,7 +417,12 @@ NEXT_PAGE(level4_kernel_pgt)
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
-	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	i = 0
+	.rept	L3_KERNEL_ENTRY_COUNT
+	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC \
+		+ PAGE_SIZE*i
+	i = i + 1
+	.endr
 	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
 
 NEXT_PAGE(level2_kernel_pgt)
-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply related	[flat|nested] 231+ messages in thread

* Re: [RFC v3 20/27] x86/ftrace: Adapt function tracing for PIE support
  2017-10-04 21:19   ` Thomas Garnier
@ 2017-10-05 13:06     ` Steven Rostedt
  -1 siblings, 0 replies; 231+ messages in thread
From: Steven Rostedt @ 2017-10-05 13:06 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Michal Hocko, Radim Krčmář,
	linux-doc, Daniel Micay, Len Brown, Peter Zijlstra,
	Christopher Li, Jan H . Schönherr, Alexei Starovoitov,
	Matthew Wilcox, virtualization, David Howells, Paul Gortmaker,
	Pavel Machek, H . Peter Anvin, kernel-hardening, Andrey Ryabinin,
	Christoph Lameter, Thomas Gleixner, Chris Metcalf, x86,
	Herbert Xu

On Wed,  4 Oct 2017 14:19:56 -0700
Thomas Garnier <thgarnie@google.com> wrote:

> When using -fPIE/PIC with function tracing, the compiler generates a
> call through the GOT (call *__fentry__@GOTPCREL). This instruction
> takes 6 bytes instead of 5 on the usual relative call.
> 
> With this change, function tracing supports 6 bytes on traceable
> functions and can still replace relative calls on the ftrace assembly
> functions.
> 
> Position Independent Executable (PIE) support will allow extending the
> KASLR randomization range below the -2G memory limit.

Question: This 6 bytes is only the initial call that gcc creates. When
function tracing is enabled, the calls are back to the normal call to
the ftrace trampoline?

-- Steve

^ permalink raw reply	[flat|nested] 231+ messages in thread
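
For reference, the 5-byte vs. 6-byte difference being discussed comes
straight from the x86-64 encodings: a direct near call is e8 plus a 32-bit
displacement, while the GOT-indirect form needs an opcode plus a ModRM byte
(ff 15) before its 32-bit displacement. A small sketch; the zeroed
displacements are placeholders for whatever the linker or ftrace writes:

#include <stdio.h>

/* call __fentry__                  -> e8 <rel32>     (5 bytes) */
static const unsigned char call_rel32[]    = { 0xe8, 0, 0, 0, 0 };

/* call *__fentry__@GOTPCREL(%rip)  -> ff 15 <disp32> (6 bytes) */
static const unsigned char call_gotpcrel[] = { 0xff, 0x15, 0, 0, 0, 0 };

int main(void)
{
	printf("direct: %zu bytes, via GOT: %zu bytes\n",
	       sizeof(call_rel32), sizeof(call_gotpcrel));
	return 0;
}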

* Re: [RFC v3 20/27] x86/ftrace: Adapt function tracing for PIE support
  2017-10-05 13:06     ` [kernel-hardening] " Steven Rostedt
  (?)
@ 2017-10-05 16:01       ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-05 16:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Kees Cook, Matthias Kaehlcke, Tom Lendacky, Andy Lutomirski,
	Kirill A . Shutemov, Borislav Petkov, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Juergen Gross, Chris Wright,
	Alok Kataria, Rusty Russell, Tejun Heo, Christoph Lameter,
	Boris Ostrovsky

On Thu, Oct 5, 2017 at 6:06 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed,  4 Oct 2017 14:19:56 -0700
> Thomas Garnier <thgarnie@google.com> wrote:
>
>> When using -fPIE/PIC with function tracing, the compiler generates a
>> call through the GOT (call *__fentry__@GOTPCREL). This instruction
>> takes 6 bytes instead of 5 on the usual relative call.
>>
>> With this change, function tracing supports 6 bytes on traceable
>> functions and can still replace relative calls on the ftrace assembly
>> functions.
>>
>> Position Independent Executable (PIE) support will allow extending the
>> KASLR randomization range below the -2G memory limit.
>
> Question: This 6 bytes is only the initial call that gcc creates. When
> function tracing is enabled, the calls are back to the normal call to
> the ftrace trampoline?

That is correct.

>
> -- Steve
>



-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: [RFC v3 20/27] x86/ftrace: Adapt function tracing for PIE support
  2017-10-05 16:01       ` Thomas Garnier
@ 2017-10-05 16:11         ` Steven Rostedt
  -1 siblings, 0 replies; 231+ messages in thread
From: Steven Rostedt @ 2017-10-05 16:11 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Michal Hocko, Radim Krčmář,
	linux-doc, Daniel Micay, Len Brown, Peter Zijlstra,
	Christopher Li, Jan H . Schönherr, Alexei Starovoitov,
	Matthew Wilcox, virtualization, David Howells, Paul Gortmaker,
	Pavel Machek, H . Peter Anvin, Kernel Hardening, Andrey Ryabinin,
	Christoph Lameter, Thomas Gleixner, Chris Metcalf,
	the arch/x86 maintainers

On Thu, 5 Oct 2017 09:01:14 -0700
Thomas Garnier <thgarnie@google.com> wrote:

> On Thu, Oct 5, 2017 at 6:06 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> > On Wed,  4 Oct 2017 14:19:56 -0700
> > Thomas Garnier <thgarnie@google.com> wrote:
> >  
> >> When using -fPIE/PIC with function tracing, the compiler generates a
> >> call through the GOT (call *__fentry__@GOTPCREL). This instruction
> >> takes 6 bytes instead of 5 on the usual relative call.
> >>
> >> With this change, function tracing supports 6 bytes on traceable
> >> functions and can still replace relative calls on the ftrace assembly
> >> functions.
> >>
> >> Position Independent Executable (PIE) support will allow extending the
> >> KASLR randomization range below the -2G memory limit.
> >
> > Question: This 6 bytes is only the initial call that gcc creates. When
> > function tracing is enabled, the calls are back to the normal call to
> > the ftrace trampoline?  
> 
> That is correct.
> 

Then I think a better idea is to simply nop them out at compile time,
and have the code that updates them to nops know about it.

See scripts/recordmcount.c

Could we simply add a 5 byte nop followed by a 1 byte nop, and treat it
the same as if it didn't exist? This code can be a little complex, and
can cause really nasty side effects if things go wrong. I would like to
keep from adding more variables to the changes here.

-- Steve

^ permalink raw reply	[flat|nested] 231+ messages in thread
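
For reference, the shape of the padding suggested here: the 6-byte GOT call
site becomes a 5-byte nop followed by a 1-byte nop, so the existing 5-byte
patching logic can keep treating the first five bytes as the patch site. A
sketch with the usual x86-64 encodings; the real kernel would take its nops
from the ideal_nops tables and do the rewrite at build time (see
scripts/recordmcount.c), not at run time as in this toy function:

#include <string.h>

static const unsigned char nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 }; /* nopl 0x0(%rax,%rax,1) */
static const unsigned char nop1[1] = { 0x90 };                         /* nop */

/* Overwrite a 6-byte fentry call site with nop5 + nop1. */
static void pad_fentry_site(unsigned char site[6])
{
	memcpy(site, nop5, sizeof(nop5));
	site[5] = nop1[0];
}

int main(void)
{
	unsigned char site[6] = { 0xff, 0x15, 0, 0, 0, 0 }; /* call *...@GOTPCREL(%rip) */

	pad_fentry_site(site);
	return site[0] == 0x0f ? 0 : 1;
}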

* Re: [RFC v3 20/27] x86/ftrace: Adapt function tracing for PIE support
  2017-10-05 16:11         ` [kernel-hardening] " Steven Rostedt
  (?)
@ 2017-10-05 16:14           ` Thomas Garnier
  -1 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-05 16:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Kees Cook, Matthias Kaehlcke, Tom Lendacky, Andy Lutomirski,
	Kirill A . Shutemov, Borislav Petkov, Rafael J . Wysocki,
	Len Brown, Pavel Machek, Juergen Gross, Chris Wright,
	Alok Kataria, Rusty Russell, Tejun Heo, Christoph Lameter,
	Boris Ostrovsky

On Thu, Oct 5, 2017 at 9:11 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 5 Oct 2017 09:01:14 -0700
> Thomas Garnier <thgarnie@google.com> wrote:
>
>> On Thu, Oct 5, 2017 at 6:06 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> > On Wed,  4 Oct 2017 14:19:56 -0700
>> > Thomas Garnier <thgarnie@google.com> wrote:
>> >
>> >> When using -fPIE/PIC with function tracing, the compiler generates a
>> >> call through the GOT (call *__fentry__@GOTPCREL). This instruction
>> >> takes 6 bytes instead of 5 on the usual relative call.
>> >>
>> >> With this change, function tracing supports 6 bytes on traceable
>> >> functions and can still replace relative calls on the ftrace assembly
>> >> functions.
>> >>
>> >> Position Independent Executable (PIE) support will allow extending the
>> >> KASLR randomization range below the -2G memory limit.
>> >
>> > Question: This 6 bytes is only the initial call that gcc creates. When
>> > function tracing is enabled, the calls are back to the normal call to
>> > the ftrace trampoline?
>>
>> That is correct.
>>
>
> Then I think a better idea is to simply nop them out at compile time,
> and have the code that updates them to nops know about it.
>
> See scripts/recordmcount.c
>
> Could we simply add a 5 byte nop followed by a 1 byte nop, and treat it
> the same as if it didn't exist? This code can be a little complex, and
> can cause really nasty side effects if things go wrong. I would like to
> keep from adding more variables to the changes here.

Sure, I will simplify it for the next iteration.

Thanks for the feedback.

>
> -- Steve



-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-10-06 10:39                       ` Pavel Machek
@ 2017-10-20  8:13                         ` Ingo Molnar
  2017-10-20  8:13                         ` Ingo Molnar
  1 sibling, 0 replies; 231+ messages in thread
From: Ingo Molnar @ 2017-10-20  8:13 UTC (permalink / raw)
  To: Pavel Machek
  Cc: H. Peter Anvin, Peter Zijlstra, Thomas Garnier, Herbert Xu,
	David S . Miller, Thomas Gleixner, Ingo Molnar, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Tejun Heo


* Pavel Machek <pavel@ucw.cz> wrote:

> On Mon 2017-09-25 09:33:42, Ingo Molnar wrote:
> > 
> > * Pavel Machek <pavel@ucw.cz> wrote:
> > 
> > > > For example, there would be collision with regular user-space mappings, right? 
> > > > Can local unprivileged users use mmap(MAP_FIXED) probing to figure out where 
> > > > the kernel lives?
> > > 
> > > Local unprivileged users can probably get your secret bits using cache probing 
> > > and jump prediction buffers.
> > > 
> > > Yes, you don't want to leak the information using mmap(MAP_FIXED), but CPU will 
> > > leak it for you, anyway.
> > 
> > Depends on the CPU I think, and CPU vendors are busy trying to mitigate this 
> > angle.
> 
> I believe any x86 CPU running Linux will leak it. And with CPU vendors
> putting "artificial intelligence" into branch prediction, no, I don't
> think it is going to get better.
> 
> That does not mean we should not prevent the mmap() info leak, but...

That might or might not be so, but there's a world of difference between
running a relatively long statistical attack to figure out the kernel's
location and being able to programmatically probe the kernel's location
with large MAP_FIXED user-space mmap()s, within a few dozen microseconds
or so and with a 100% guaranteed, non-statistical result.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 231+ messages in thread
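
To make the deterministic-probe point concrete, here is a rough sketch of
the kind of probe being described, under the assumption that the
randomization range overlapped addresses user space is otherwise allowed to
map (which is exactly the collision concern quoted above). It uses
MAP_FIXED_NOREPLACE, a later addition, purely so the probe does not clobber
the process's own mappings; the range and step below are made up for
illustration only:

#define _GNU_SOURCE
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000
#endif

int main(void)
{
	unsigned long addr, start = 0x100000000UL, end = 0x200000000UL;
	unsigned long step = 1UL << 30;   /* probe at 1 GiB granularity */

	for (addr = start; addr < end; addr += step) {
		void *p = mmap((void *)addr, 4096, PROT_NONE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
			       -1, 0);
		if (p == MAP_FAILED)
			printf("0x%lx: refused (errno %d)\n", addr, errno);
		else
			munmap(p, 4096);
	}
	return 0;
}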

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-25  7:33                     ` Ingo Molnar
  2017-10-06 10:39                       ` Pavel Machek
@ 2017-10-06 10:39                       ` Pavel Machek
  2017-10-20  8:13                         ` Ingo Molnar
  2017-10-20  8:13                         ` Ingo Molnar
  1 sibling, 2 replies; 231+ messages in thread
From: Pavel Machek @ 2017-10-06 10:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Peter Zijlstra, Thomas Garnier, Herbert Xu,
	David S . Miller, Thomas Gleixner, Ingo Molnar, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Tejun Heo

[-- Attachment #1: Type: text/plain, Size: 1039 bytes --]

On Mon 2017-09-25 09:33:42, Ingo Molnar wrote:
> 
> * Pavel Machek <pavel@ucw.cz> wrote:
> 
> > > For example, there would be collision with regular user-space mappings, right? 
> > > Can local unprivileged users use mmap(MAP_FIXED) probing to figure out where 
> > > the kernel lives?
> > 
> > Local unprivileged users can probably get your secret bits using cache probing 
> > and jump prediction buffers.
> > 
> > Yes, you don't want to leak the information using mmap(MAP_FIXED), but CPU will 
> > leak it for you, anyway.
> 
> Depends on the CPU I think, and CPU vendors are busy trying to mitigate this 
> angle.

I believe any x86 CPU running Linux will leak it. And with CPU vendors
putting "artificial intelligence" into branch prediction, no, I don't
think it is going to get better.

That does not mean we should not prevent the mmap() info leak, but...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 231+ messages in thread

* x86: PIE support and option to extend KASLR randomization
@ 2017-10-04 21:19 Thomas Garnier via Virtualization
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier via Virtualization @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

These patches make the changes necessary to build the kernel as Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space. It allows to optionally extend the
KASLR randomization range from 1G to 3G.

Thanks a lot to Ard Biesheuvel & Kees Cook on their feedback on compiler
changes, PIE support and KASLR in general. Thanks to Roland McGrath on his
feedback for using -pie versus --emit-relocs and details on compiler code
generation.

The patches:
 - 1-3, 5-13, 17-18: Change in assembly code to be PIE compliant.
 - 4: Add a new _ASM_GET_PTR macro to fetch a symbol address generically.
 - 14: Adapt percpu design to work correctly when PIE is enabled.
 - 15: Provide an option to default visibility to hidden except for key symbols.
       It removes errors between compilation units.
 - 16: Adapt relocation tool to handle PIE binary correctly.
 - 19: Add support for global cookie.
 - 20: Support ftrace with PIE (used on Ubuntu config).
 - 21: Fix incorrect address marker on dump_pagetables.
 - 22: Add option to move the module section just after the kernel.
 - 23: Adapt module loading to support PIE with dynamic GOT.
 - 24: Make the GOT read-only.
 - 25: Add the CONFIG_X86_PIE option (off by default).
 - 26: Adapt relocation tool to generate a 64-bit relocation table.
 - 27: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
       from 1G to 3G (off by default).

Performance/Size impact:

Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.000031%
 - PIE enabled: -3.210% (less relocations)
 .text section:
 - PIE disabled: +0.000644%
 - PIE enabled: +0.837%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: -0.201%
 - PIE enabled: -0.082%
 .text section:
 - PIE disabled: same
 - PIE enabled: +1.319%

Size of vmlinux (Default configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +0.814%

Size of vmlinux (Ubuntu configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +1.26%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. This bug [1] was opened with gcc to provide a better
code generation for kernel PIE.

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg +0.1% on latest test).
 - PIE enabled: between -0.50% to +0.86% in average (default and Ubuntu config).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-2% on latest run, likely noise).
 - PIE enabled: between -1% and +0.8% on latest runs.

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.239%)
 - PIE enabled: average +0.07%
 System Time:
 - PIE disabled: no significant change (avg -0.277%)
 - PIE enabled: average +0.7%

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

diffstat:
 Documentation/x86/x86_64/mm.txt              |    3 
 arch/x86/Kconfig                             |   37 ++++
 arch/x86/Makefile                            |   14 +
 arch/x86/boot/boot.h                         |    2 
 arch/x86/boot/compressed/Makefile            |    5 
 arch/x86/boot/compressed/misc.c              |   10 +
 arch/x86/crypto/aes-x86_64-asm_64.S          |   45 +++--
 arch/x86/crypto/aesni-intel_asm.S            |   14 +
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |    6 
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |   42 ++---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |   44 ++---
 arch/x86/crypto/camellia-x86_64-asm_64.S     |    8 -
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    |   50 +++---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    |   44 +++--
 arch/x86/crypto/des3_ede-asm_64.S            |   96 ++++++++----
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |    4 
 arch/x86/crypto/glue_helper-asm-avx.S        |    4 
 arch/x86/crypto/glue_helper-asm-avx2.S       |    6 
 arch/x86/entry/entry_32.S                    |    3 
 arch/x86/entry/entry_64.S                    |   29 ++-
 arch/x86/include/asm/asm.h                   |   13 +
 arch/x86/include/asm/bug.h                   |    2 
 arch/x86/include/asm/ftrace.h                |   23 ++-
 arch/x86/include/asm/jump_label.h            |    8 -
 arch/x86/include/asm/kvm_host.h              |    6 
 arch/x86/include/asm/module.h                |   14 +
 arch/x86/include/asm/page_64_types.h         |    9 +
 arch/x86/include/asm/paravirt_types.h        |   12 +
 arch/x86/include/asm/percpu.h                |   25 ++-
 arch/x86/include/asm/pgtable_64_types.h      |    6 
 arch/x86/include/asm/pm-trace.h              |    2 
 arch/x86/include/asm/processor.h             |   12 +
 arch/x86/include/asm/sections.h              |    4 
 arch/x86/include/asm/setup.h                 |    2 
 arch/x86/include/asm/stackprotector.h        |   19 +-
 arch/x86/kernel/acpi/wakeup_64.S             |   31 ++--
 arch/x86/kernel/asm-offsets.c                |    3 
 arch/x86/kernel/asm-offsets_32.c             |    3 
 arch/x86/kernel/asm-offsets_64.c             |    3 
 arch/x86/kernel/cpu/common.c                 |    7 
 arch/x86/kernel/cpu/microcode/core.c         |    4 
 arch/x86/kernel/ftrace.c                     |  168 ++++++++++++++--------
 arch/x86/kernel/head64.c                     |   32 +++-
 arch/x86/kernel/head_32.S                    |    3 
 arch/x86/kernel/head_64.S                    |   41 ++++-
 arch/x86/kernel/kvm.c                        |    6 
 arch/x86/kernel/module.c                     |  204 ++++++++++++++++++++++++++-
 arch/x86/kernel/module.lds                   |    3 
 arch/x86/kernel/process.c                    |    5 
 arch/x86/kernel/relocate_kernel_64.S         |    8 -
 arch/x86/kernel/setup_percpu.c               |    2 
 arch/x86/kernel/vmlinux.lds.S                |   13 +
 arch/x86/kvm/svm.c                           |    4 
 arch/x86/lib/cmpxchg16b_emu.S                |    8 -
 arch/x86/mm/dump_pagetables.c                |   11 -
 arch/x86/power/hibernate_asm_64.S            |    4 
 arch/x86/tools/relocs.c                      |  170 ++++++++++++++++++++--
 arch/x86/tools/relocs.h                      |    4 
 arch/x86/tools/relocs_common.c               |   15 +
 arch/x86/xen/xen-asm.S                       |   12 -
 arch/x86/xen/xen-head.S                      |    9 -
 arch/x86/xen/xen-pvh.S                       |   13 +
 drivers/base/firmware_class.c                |    4 
 include/asm-generic/sections.h               |    6 
 include/asm-generic/vmlinux.lds.h            |   12 +
 include/linux/compiler.h                     |    8 +
 init/Kconfig                                 |    9 +
 kernel/kallsyms.c                            |   16 +-
 kernel/trace/trace.h                         |    4 
 lib/dynamic_debug.c                          |    4 
 70 files changed, 1109 insertions(+), 363 deletions(-)

^ permalink raw reply	[flat|nested] 231+ messages in thread
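
For anyone wanting to try the series, the two new options above could be
enabled with a config fragment along these lines; the names are taken from
the patch list (both default to off), and CONFIG_RANDOMIZE_BASE is the
existing KASLR switch:

CONFIG_RANDOMIZE_BASE=y
CONFIG_X86_PIE=y
CONFIG_RANDOMIZE_BASE_LARGE=y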

* x86: PIE support and option to extend KASLR randomization
@ 2017-10-04 21:19 Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: linux-arch, kvm, linux-pm, x86, linux-doc, linux-kernel,
	virtualization, linux-sparse, linux-crypto, kernel-hardening,
	xen-devel

These patches make the changes necessary to build the kernel as Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space. It allows to optionally extend the
KASLR randomization range from 1G to 3G.

Thanks a lot to Ard Biesheuvel & Kees Cook on their feedback on compiler
changes, PIE support and KASLR in general. Thanks to Roland McGrath on his
feedback for using -pie versus --emit-relocs and details on compiler code
generation.

The patches:
 - 1-3, 5-1#, 17-18: Change in assembly code to be PIE compliant.
 - 4: Add a new _ASM_GET_PTR macro to fetch a symbol address generically.
 - 14: Adapt percpu design to work correctly when PIE is enabled.
 - 15: Provide an option to default visibility to hidden except for key symbols.
       It removes errors between compilation units.
 - 16: Adapt relocation tool to handle PIE binary correctly.
 - 19: Add support for global cookie.
 - 20: Support ftrace with PIE (used on Ubuntu config).
 - 21: Fix incorrect address marker on dump_pagetables.
 - 22: Add option to move the module section just after the kernel.
 - 23: Adapt module loading to support PIE with dynamic GOT.
 - 24: Make the GOT read-only.
 - 25: Add the CONFIG_X86_PIE option (off by default).
 - 26: Adapt relocation tool to generate a 64-bit relocation table.
 - 27: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
       from 1G to 3G (off by default).

Performance/Size impact:

Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.000031%
 - PIE enabled: -3.210% (less relocations)
 .text section:
 - PIE disabled: +0.000644%
 - PIE enabled: +0.837%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: -0.201%
 - PIE enabled: -0.082%
 .text section:
 - PIE disabled: same
 - PIE enabled: +1.319%

Size of vmlinux (Default configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +0.814%

Size of vmlinux (Ubuntu configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +1.26%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. This bug [1] was opened with gcc to provide a better
code generation for kernel PIE.

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg +0.1% on latest test).
 - PIE enabled: between -0.50% to +0.86% in average (default and Ubuntu config).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-2% on latest run, likely noise).
 - PIE enabled: between -1% and +0.8% on latest runs.

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.239%)
 - PIE enabled: average +0.07%
 System Time:
 - PIE disabled: no significant change (avg -0.277%)
 - PIE enabled: average +0.7%

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

diffstat:
 Documentation/x86/x86_64/mm.txt              |    3 
 arch/x86/Kconfig                             |   37 ++++
 arch/x86/Makefile                            |   14 +
 arch/x86/boot/boot.h                         |    2 
 arch/x86/boot/compressed/Makefile            |    5 
 arch/x86/boot/compressed/misc.c              |   10 +
 arch/x86/crypto/aes-x86_64-asm_64.S          |   45 +++--
 arch/x86/crypto/aesni-intel_asm.S            |   14 +
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |    6 
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |   42 ++---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |   44 ++---
 arch/x86/crypto/camellia-x86_64-asm_64.S     |    8 -
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    |   50 +++---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    |   44 +++--
 arch/x86/crypto/des3_ede-asm_64.S            |   96 ++++++++----
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |    4 
 arch/x86/crypto/glue_helper-asm-avx.S        |    4 
 arch/x86/crypto/glue_helper-asm-avx2.S       |    6 
 arch/x86/entry/entry_32.S                    |    3 
 arch/x86/entry/entry_64.S                    |   29 ++-
 arch/x86/include/asm/asm.h                   |   13 +
 arch/x86/include/asm/bug.h                   |    2 
 arch/x86/include/asm/ftrace.h                |   23 ++-
 arch/x86/include/asm/jump_label.h            |    8 -
 arch/x86/include/asm/kvm_host.h              |    6 
 arch/x86/include/asm/module.h                |   14 +
 arch/x86/include/asm/page_64_types.h         |    9 +
 arch/x86/include/asm/paravirt_types.h        |   12 +
 arch/x86/include/asm/percpu.h                |   25 ++-
 arch/x86/include/asm/pgtable_64_types.h      |    6 
 arch/x86/include/asm/pm-trace.h              |    2 
 arch/x86/include/asm/processor.h             |   12 +
 arch/x86/include/asm/sections.h              |    4 
 arch/x86/include/asm/setup.h                 |    2 
 arch/x86/include/asm/stackprotector.h        |   19 +-
 arch/x86/kernel/acpi/wakeup_64.S             |   31 ++--
 arch/x86/kernel/asm-offsets.c                |    3 
 arch/x86/kernel/asm-offsets_32.c             |    3 
 arch/x86/kernel/asm-offsets_64.c             |    3 
 arch/x86/kernel/cpu/common.c                 |    7 
 arch/x86/kernel/cpu/microcode/core.c         |    4 
 arch/x86/kernel/ftrace.c                     |  168 ++++++++++++++--------
 arch/x86/kernel/head64.c                     |   32 +++-
 arch/x86/kernel/head_32.S                    |    3 
 arch/x86/kernel/head_64.S                    |   41 ++++-
 arch/x86/kernel/kvm.c                        |    6 
 arch/x86/kernel/module.c                     |  204 ++++++++++++++++++++++++++-
 arch/x86/kernel/module.lds                   |    3 
 arch/x86/kernel/process.c                    |    5 
 arch/x86/kernel/relocate_kernel_64.S         |    8 -
 arch/x86/kernel/setup_percpu.c               |    2 
 arch/x86/kernel/vmlinux.lds.S                |   13 +
 arch/x86/kvm/svm.c                           |    4 
 arch/x86/lib/cmpxchg16b_emu.S                |    8 -
 arch/x86/mm/dump_pagetables.c                |   11 -
 arch/x86/power/hibernate_asm_64.S            |    4 
 arch/x86/tools/relocs.c                      |  170 ++++++++++++++++++++--
 arch/x86/tools/relocs.h                      |    4 
 arch/x86/tools/relocs_common.c               |   15 +
 arch/x86/xen/xen-asm.S                       |   12 -
 arch/x86/xen/xen-head.S                      |    9 -
 arch/x86/xen/xen-pvh.S                       |   13 +
 drivers/base/firmware_class.c                |    4 
 include/asm-generic/sections.h               |    6 
 include/asm-generic/vmlinux.lds.h            |   12 +
 include/linux/compiler.h                     |    8 +
 init/Kconfig                                 |    9 +
 kernel/kallsyms.c                            |   16 +-
 kernel/trace/trace.h                         |    4 
 lib/dynamic_debug.c                          |    4 
 70 files changed, 1109 insertions(+), 363 deletions(-)



^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-23  9:43                                 ` Ingo Molnar
  2017-10-02 20:28                                   ` Thomas Garnier
@ 2017-10-02 20:28                                   ` Thomas Garnier
  1 sibling, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-10-02 20:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Sat, Sep 23, 2017 at 2:43 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> >   2) we first implement the additional entropy bits that Linus suggested.
>> >
>> > does this work for you?
>>
>> Sure, I can look at how feasible that is. If it is, can I send
>> everything as part of the same patch set? The additional entropy would
>> be enabled for all KASLR but PIE will be off-by-default of course.
>
> Sure, can all be part of the same series.

I looked deeper into the change Linus proposed (moving the .text section
based on the cacheline). I think the complexity is too high for the
value of this change.

To move only the .text section would require at least the following changes:
 - An overall change in how relocations are processed, to separate
relocations inside the .text section from those outside of it.
 - Breaking assumptions on _text alignment while keeping size
calculations accurate (for example _end - _text).

With a rough attempt at this, I managed to get past early boot but still
crashed later on.

This change would be valuable if you leak the address of a section
other than .text and you want to know where .text is, meaning the main
bug you are trying to exploit only allows you to execute code (and
you are trying to ROP into .text). I would argue that a better
mitigation for this type of bug is moving function pointers to
read-only sections and using stack cookies (for return addresses). This
change won't prevent other types of attacks, like data corruption.
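
(A minimal sketch of the function-pointer part of that mitigation --
hypothetical names, not from this patch set: keeping a dispatch table
const so it ends up in a read-only section.)

  struct io_ops {
          int (*open)(void *ctx);
          int (*close)(void *ctx);
  };

  static int dummy_open(void *ctx)  { (void)ctx; return 0; }
  static int dummy_close(void *ctx) { (void)ctx; return 0; }

  /* const places the table in a read-only section, so a memory-corruption
   * primitive cannot overwrite these pointers to hijack control flow. */
  static const struct io_ops default_ops = {
          .open  = dummy_open,
          .close = dummy_close,
  };

  int do_open(void *ctx)
  {
          return default_ops.open(ctx);
  }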

I think it would be more valuable to look at something like selfrando
/ pagerando [1] but maybe wait a bit for it to be more mature
(especially on the debugging side).

What do you think?

[1] http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html

>
> Thanks,
>
>         Ingo



-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-24 22:37                   ` Pavel Machek
  2017-09-25  7:33                     ` Ingo Molnar
@ 2017-09-25  7:33                     ` Ingo Molnar
  2017-10-06 10:39                       ` Pavel Machek
  2017-10-06 10:39                       ` Pavel Machek
  1 sibling, 2 replies; 231+ messages in thread
From: Ingo Molnar @ 2017-09-25  7:33 UTC (permalink / raw)
  To: Pavel Machek
  Cc: H. Peter Anvin, Peter Zijlstra, Thomas Garnier, Herbert Xu,
	David S . Miller, Thomas Gleixner, Ingo Molnar, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Tejun Heo


* Pavel Machek <pavel@ucw.cz> wrote:

> > For example, there would be collision with regular user-space mappings, right? 
> > Can local unprivileged users use mmap(MAP_FIXED) probing to figure out where 
> > the kernel lives?
> 
> Local unprivileged users can probably get your secret bits using cache probing 
> and jump prediction buffers.
> 
> Yes, you don't want to leak the information using mmap(MAP_FIXED), but CPU will 
> leak it for you, anyway.

Depends on the CPU I think, and CPU vendors are busy trying to mitigate this 
angle.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-23 10:00                 ` Ingo Molnar
@ 2017-09-24 22:37                   ` Pavel Machek
  2017-09-25  7:33                     ` Ingo Molnar
  2017-09-25  7:33                     ` Ingo Molnar
  2017-09-24 22:37                   ` Pavel Machek
  1 sibling, 2 replies; 231+ messages in thread
From: Pavel Machek @ 2017-09-24 22:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Peter Zijlstra, Thomas Garnier, Herbert Xu,
	David S . Miller, Thomas Gleixner, Ingo Molnar, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Tejun Heo

Hi!

> > We do need to consider how we want modules to fit into whatever model we
> > choose, though.  They can be adjacent, or we could go with a more
> > traditional dynamic link model where the modules can be separate, and
> > chained together with the main kernel via the GOT.
> 
> So I believe we should start with 'adjacent'. The thing is, having modules 
> separately randomized mostly helps if any of the secret locations fails and
> we want to prevent hopping from one to the other. But if one of the kernel-privileged
> secret locations fails, then KASLR has already failed to a significant degree...
> 
> So I think the large-PIC model for modules does not buy us any real advantages in 
> practice, and the disadvantages of large-PIC are real and most Linux users have to 
> pay that cost unconditionally, as distro kernels have half of their kernel 
> functionality living in modules.
> 
> But I do see fundamental value in being able to hide the kernel somewhere in a ~48 
> bits address space, especially if we also implement Linus's suggestion to utilize 
> the lower bits as well. 0..281474976710656 is a nicely large range and will get 
> larger with time.
> 
> But it should all be done smartly and carefully:
> 
> For example, there would be collision with regular user-space mappings, right?
> Can local unprivileged users use mmap(MAP_FIXED) probing to figure out where
> the kernel lives?

Local unprivileged users can probably get your secret bits using
cache probing and jump prediction buffers.

Yes, you don't want to leak the information using mmap(MAP_FIXED), but
the CPU will leak it for you anyway.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:27               ` H. Peter Anvin
@ 2017-09-23 10:00                 ` Ingo Molnar
  2017-09-24 22:37                   ` Pavel Machek
  2017-09-24 22:37                   ` Pavel Machek
  2017-09-23 10:00                 ` Ingo Molnar
  1 sibling, 2 replies; 231+ messages in thread
From: Ingo Molnar @ 2017-09-23 10:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Thomas Garnier, Herbert Xu, David S . Miller,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* H. Peter Anvin <hpa@zytor.com> wrote:

> We do need to consider how we want modules to fit into whatever model we
> choose, though.  They can be adjacent, or we could go with a more
> traditional dynamic link model where the modules can be separate, and
> chained together with the main kernel via the GOT.

So I believe we should start with 'adjacent'. The thing is, having modules 
separately randomized mostly helps if any of the secret locations fails and
we want to prevent hopping from one to the other. But if one of the kernel-privileged
secret locations fails, then KASLR has already failed to a significant degree...

So I think the large-PIC model for modules does not buy us any real advantages in 
practice, and the disadvantages of large-PIC are real and most Linux users have to 
pay that cost unconditionally, as distro kernels have half of their kernel 
functionality living in modules.

But I do see fundamental value in being able to hide the kernel somewhere in a ~48 
bits address space, especially if we also implement Linus's suggestion to utilize 
the lower bits as well. 0..281474976710656 is a nicely large range and will get 
larger with time.

But it should all be done smartly and carefully:

For example, there would be collision with regular user-space mappings, right?
Can local unprivileged users use mmap(MAP_FIXED) probing to figure out where
the kernel lives?
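
(As a minimal illustration of the probing idea being asked about -- a
hypothetical sketch with an arbitrary address, not an exploit: a failed
fixed mapping tells user space that an address range is off limits to it.)

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <sys/mman.h>

  /* Probe one candidate address.  Note that MAP_FIXED silently replaces
   * existing user mappings, so this only distinguishes "mappable by user
   * space" from "not mappable at all". */
  static int user_mappable(unsigned long addr)
  {
          void *p = mmap((void *)addr, 4096, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

          if (p == MAP_FAILED)
                  return 0;
          munmap(p, 4096);
          return 1;
  }

  int main(void)
  {
          unsigned long addr = 0x700000000000UL;  /* arbitrary example */

          printf("%#lx is %s\n", addr,
                 user_mappable(addr) ? "mappable" : "not mappable");
          return 0;
  }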

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:38                               ` H. Peter Anvin
                                                   ` (4 preceding siblings ...)
  2017-09-23  9:49                                 ` Ingo Molnar
@ 2017-09-23  9:49                                 ` Ingo Molnar
  5 siblings, 0 replies; 231+ messages in thread
From: Ingo Molnar @ 2017-09-23  9:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Garnier, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 09/22/17 09:32, Ingo Molnar wrote:
> > 
> > BTW., I think things improved with ORC because with ORC we have RBP as an extra 
> > register and with PIE we lose RBX - so register pressure in code generation is 
> > lower.
> > 
> 
> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
> has RIP-relative addressing there is no need for a dedicated PIC register.

Indeed, but we'd use a new register _a lot_ for constructs like this, transforming:

  mov    r9,QWORD PTR [r11*8-0x7e3da060] (8 bytes)

into:

  lea    rbx,[rip+<off>] (7 bytes)
  mov    r9,QWORD PTR [rbx+r11*8] (6 bytes)

... which I suppose is quite close to (but not the same as) 'losing' RBX.
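
(For reference, a C-level sketch of the kind of construct behind sequences
like the above -- hypothetical names, and the exact output depends on the
compiler: an indexed load from a global table.)

  /* Under -mcmodel=kernel the table's address fits in a sign-extended
   * 32-bit immediate, so the access is a single mov; under PIE the
   * address has to be formed RIP-relatively first (lea), then used as
   * the base of the indexed access. */
  extern unsigned long handlers[];

  unsigned long pick_handler(unsigned long idx)
  {
          return handlers[idx];
  }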

Of course the compiler can pick other registers as well, not that it matters much 
to register pressure in larger functions in the end. Plus if the compiler has to 
pick a callee-saved register there's the additional saving/restoring overhead of 
that as well.

Right?

> I'm somewhat confused how we can have as much as almost 1% overhead.  I suspect 
> that we end up making a GOT and maybe even a PLT for no good reason.

So the above transformation alone would explain a good chunk of the overhead I 
think.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:08                               ` Thomas Garnier
@ 2017-09-23  9:43                                 ` Ingo Molnar
  2017-10-02 20:28                                   ` Thomas Garnier
  2017-10-02 20:28                                   ` Thomas Garnier
  2017-09-23  9:43                                 ` Ingo Molnar
  1 sibling, 2 replies; 231+ messages in thread
From: Ingo Molnar @ 2017-09-23  9:43 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* Thomas Garnier <thgarnie@google.com> wrote:

> >   2) we first implement the additional entropy bits that Linus suggested.
> >
> > does this work for you?
> 
> Sure, I can look at how feasible that is. If it is, can I send
> everything as part of the same patch set? The additional entropy would
> be enabled for all KASLR but PIE will be off-by-default of course.

Sure, can all be part of the same series.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 21:21                             ` Thomas Garnier
  2017-09-22  4:24                               ` Markus Trippelsdorf
  2017-09-22 23:55                               ` Thomas Garnier
@ 2017-09-22 23:55                               ` Thomas Garnier
  2 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-09-22 23:55 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On Thu, Sep 21, 2017 at 2:21 PM, Thomas Garnier <thgarnie@google.com> wrote:
> On Thu, Sep 21, 2017 at 9:10 AM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>>
>> On 21 September 2017 at 08:59, Ingo Molnar <mingo@kernel.org> wrote:
>> >
>> > ( Sorry about the delay in answering this. I could blame the delay on the merge
>> >   window, but in reality I've been procrastinating this is due to the permanent,
>> >   non-trivial impact PIE has on generated C code. )
>> >
>> > * Thomas Garnier <thgarnie@google.com> wrote:
>> >
>> >> 1) PIE sometime needs two instructions to represent a single
>> >> instruction on mcmodel=kernel.
>> >
>> > What again is the typical frequency of this occurring in an x86-64 defconfig
>> > kernel, with the very latest GCC?
>> >
>> > Also, to make sure: which unwinder did you use for your measurements,
>> > frame-pointers or ORC? Please use ORC only for future numbers, as
>> > frame-pointers is obsolete from a performance measurement POV.
>> >
>> >> 2) GCC does not optimize switches in PIE in order to reduce relocations:
>> >
>> > Hopefully this can either be fixed in GCC or at least influenced via a compiler
>> > switch in the future.
>> >
>>
>> There are somewhat related concerns in the ARM world, so it would be
>> good if we could work with the GCC developers to get a more high level
>> and arch neutral command line option (-mkernel-pie? sounds yummy!)
>> that stops the compiler from making inferences that only hold for
>> shared libraries and/or other hosted executables (GOT indirections,
>> avoiding text relocations etc). That way, we will also be able to drop
>> the 'hidden' visibility override at some point, which we currently
>> need to prevent the compiler from redirecting all global symbol
>> references via entries in the GOT.
>
> My plan was to add a -mtls-reg=<fs|gs> to switch the default segment
> register for stack cookies but I can see great benefits in having a
> more general kernel flag that would allow to get rid of the GOT and
> PLT when you are building position independent code for the kernel. It
> could also include optimizations like folding switch tables etc...
>
> Should we start a separate discussion on that? Anyone that would be
> more experienced than I to push that to gcc & clang upstream?

After separate discussion, opened:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

>
>>
>> All we really need is the ability to move the image around in virtual
>> memory, and things like reducing the CoW footprint or enabling ELF
>> symbol preemption are completely irrelevant for us.
>
>
>
>
> --
> Thomas



-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 19:06                                   ` H. Peter Anvin
  2017-09-22 22:19                                     ` hjl.tools
@ 2017-09-22 22:30                                     ` hjl.tools
  1 sibling, 0 replies; 231+ messages in thread
From: hjl.tools @ 2017-09-22 22:30 UTC (permalink / raw)
  To: H. Peter Anvin, Kees Cook
  Cc: Radim Krčmář,
	Peter Zijlstra, Paul Gortmaker, Pavel Machek, Christoph Lameter,
	Ingo Molnar, Herbert Xu, Joerg Roedel, Matthias Kaehlcke,
	Borislav Petkov, Len Brown, Arnd Bergmann, Brian Gerst,
	Andy Lutomirski, Josh Poimboeuf

On September 23, 2017 3:06:16 AM GMT+08:00, "H. Peter Anvin" <hpa@zytor.com> wrote:
>On 09/22/17 11:57, Kees Cook wrote:
>> On Fri, Sep 22, 2017 at 11:38 AM, H. Peter Anvin <hpa@zytor.com>
>wrote:
>>> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since
>x86-64
>>> has RIP-relative addressing there is no need for a dedicated PIC
>register.
>> 
>> FWIW, since gcc 5, the PIC register isn't totally lost. It is now
>> reusable, and that seems to have improved performance:
>> https://gcc.gnu.org/gcc-5/changes.html
>
>It still talks about a PIC register on x86-64, which confuses me.
>Perhaps older gcc's would allocate a PIC register under certain
>circumstances, and then lose it for the entire function?
>
>For i386, the PIC register is required by the ABI to be %ebx at the
>point any PLT entry is called.  Not an issue with -mno-plt which goes
>straight to the GOT, although in most cases there needs to be a PIC
>register to find the GOT unless load-time relocation is permitted.
>
>	-hpa
We need a static PIE option so that the compiler can optimize it
without using hidden visibility.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:57                                 ` Kees Cook
@ 2017-09-22 19:06                                   ` H. Peter Anvin
  2017-09-22 22:19                                     ` hjl.tools
  2017-09-22 22:30                                     ` hjl.tools
  2017-09-22 19:06                                   ` H. Peter Anvin
  1 sibling, 2 replies; 231+ messages in thread
From: H. Peter Anvin @ 2017-09-22 19:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Thomas Garnier, Herbert Xu, David S . Miller,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov

On 09/22/17 11:57, Kees Cook wrote:
> On Fri, Sep 22, 2017 at 11:38 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
>> has RIP-relative addressing there is no need for a dedicated PIC register.
> 
> FWIW, since gcc 5, the PIC register isn't totally lost. It is now
> reusable, and that seems to have improved performance:
> https://gcc.gnu.org/gcc-5/changes.html

It still talks about a PIC register on x86-64, which confuses me.
Perhaps older gcc's would allocate a PIC register under certain
circumstances, and then lose it for the entire function?

For i386, the PIC register is required by the ABI to be %ebx at the
point any PLT entry is called.  Not an issue with -mno-plt which goes
straight to the GOT, although in most cases there needs to be a PIC
register to find the GOT unless load-time relocation is permitted.

	-hpa

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:38                               ` H. Peter Anvin
                                                   ` (2 preceding siblings ...)
  2017-09-22 18:59                                 ` Thomas Garnier
@ 2017-09-22 18:59                                 ` Thomas Garnier
  2017-09-23  9:49                                 ` Ingo Molnar
  2017-09-23  9:49                                 ` Ingo Molnar
  5 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-09-22 18:59 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Fri, Sep 22, 2017 at 11:38 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 09/22/17 09:32, Ingo Molnar wrote:
>>
>> BTW., I think things improved with ORC because with ORC we have RBP as an extra
>> register and with PIE we lose RBX - so register pressure in code generation is
>> lower.
>>
>
> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
> has RIP-relative addressing there is no need for a dedicated PIC register.
>
> I'm somewhat confused how we can have as much as almost 1% overhead.  I
> suspect that we end up making a GOT and maybe even a PLT for no good reason.

We have a GOT with very few entries, mainly linker-script globals that
I think we can work to reduce or remove.

We have a PLT, but it is empty. On the latest iteration (not sent yet),
modules have PLT32 relocations but no PLT entries. I got rid of
mcmodel=large for modules and instead moved the beginning of the
module section to just after the kernel so relative relocations work.
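
(A rough sketch of the reach constraint involved -- a hypothetical helper,
not the actual module-loader code: a 32-bit PC-relative relocation only
reaches targets within +/-2GB of the relocation site, which is why placing
the module area right after the kernel matters.)

  #include <stdbool.h>
  #include <stdint.h>

  /* True if 'target' is reachable from 'site' with a sign-extended 32-bit
   * displacement, i.e. within the range an R_X86_64_PC32/PLT32-style
   * relocation can express. */
  bool within_s32_reach(uint64_t site, uint64_t target)
  {
          int64_t disp = (int64_t)(target - site);

          return disp == (int64_t)(int32_t)disp;
  }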

>
>         -hpa



-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:38                               ` H. Peter Anvin
@ 2017-09-22 18:57                                 ` Kees Cook
  2017-09-22 19:06                                   ` H. Peter Anvin
  2017-09-22 19:06                                   ` H. Peter Anvin
  2017-09-22 18:57                                 ` Kees Cook
                                                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 231+ messages in thread
From: Kees Cook @ 2017-09-22 18:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Thomas Garnier, Herbert Xu, David S . Miller,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On Fri, Sep 22, 2017 at 11:38 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
> has RIP-relative addressing there is no need for a dedicated PIC register.

FWIW, since gcc 5, the PIC register isn't totally lost. It is now
reusable, and that seems to have improved performance:
https://gcc.gnu.org/gcc-5/changes.html

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 16:32                             ` Ingo Molnar
  2017-09-22 18:08                               ` Thomas Garnier
  2017-09-22 18:08                               ` Thomas Garnier
@ 2017-09-22 18:38                               ` H. Peter Anvin
  2017-09-22 18:57                                 ` Kees Cook
                                                   ` (5 more replies)
  2017-09-22 18:38                               ` H. Peter Anvin
  3 siblings, 6 replies; 231+ messages in thread
From: H. Peter Anvin @ 2017-09-22 18:38 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann, Matthias Kaehlcke,
	Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown

On 09/22/17 09:32, Ingo Molnar wrote:
> 
> BTW., I think things improved with ORC because with ORC we have RBP as an extra 
> register and with PIE we lose RBX - so register pressure in code generation is 
> lower.
> 

We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
has RIP-relative addressing there is no need for a dedicated PIC register.

I'm somewhat confused how we can have as much as almost 1% overhead.  I
suspect that we end up making a GOT and maybe even a PLT for no good reason.

	-hpa

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 14:28             ` Peter Zijlstra
@ 2017-09-22 18:27               ` H. Peter Anvin
  2017-09-23 10:00                 ` Ingo Molnar
  2017-09-23 10:00                 ` Ingo Molnar
  2017-09-22 18:27               ` H. Peter Anvin
  1 sibling, 2 replies; 231+ messages in thread
From: H. Peter Anvin @ 2017-09-22 18:27 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Thomas Garnier, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Arnd Bergmann, Matthias Kaehlcke,
	Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown

On 08/21/17 07:28, Peter Zijlstra wrote:
> 
> Ah, I see, this is large mode and that needs to use MOVABS to load 64bit
> immediates. Still, small RIP relative should be able to live at any
> point as long as everything lives inside the same 2G relative range, so
> would still allow the goal of increasing the KASLR range.
> 
> So I'm not seeing how we need large mode for that. That said, after
> reading up on all this, RIP relative will not be too pretty either,
> while CALL is naturally RIP relative, data still needs an explicit %rip
> offset, still loads better than the large model.
> 

The large model makes no sense whatsoever.  I think what we're actually
looking for is the small-PIC model.

Ingo asked:
> I.e. is there no GCC code generation mode where code can be placed anywhere in the 
> canonical address space, yet call and jump distance is within 31 bits so that the 
> generated code is fast?

That's the small-PIC model.  I think if all symbols are forced to hidden
visibility, then it won't even need a GOT/PLT.
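
(A minimal sketch of what forcing a symbol to hidden visibility looks
like -- hypothetical names; the same effect can come from building with
-fvisibility=hidden. A hidden symbol cannot be preempted, so -fpie can
reference it RIP-relatively instead of going through the GOT.)

  __attribute__((visibility("hidden"))) extern int shared_counter;

  int read_shared_counter(void)
  {
          /* With default visibility this load may go through a GOT entry;
           * with hidden visibility it is a plain RIP-relative access. */
          return shared_counter;
  }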

We do need to consider how we want modules to fit into whatever model we
choose, though.  They can be adjacent, or we could go with a more
traditional dynamic link model where the modules can be separate, and
chained together with the main kernel via the GOT.

	-hpa

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 16:32                             ` Ingo Molnar
  2017-09-22 18:08                               ` Thomas Garnier
@ 2017-09-22 18:08                               ` Thomas Garnier
  2017-09-23  9:43                                 ` Ingo Molnar
  2017-09-23  9:43                                 ` Ingo Molnar
  2017-09-22 18:38                               ` H. Peter Anvin
  2017-09-22 18:38                               ` H. Peter Anvin
  3 siblings, 2 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-09-22 18:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Fri, Sep 22, 2017 at 9:32 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> On Thu, Sep 21, 2017 at 8:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> >
>> > ( Sorry about the delay in answering this. I could blame the delay on the merge
>> >   window, but in reality I've been procrastinating this is due to the permanent,
>> >   non-trivial impact PIE has on generated C code. )
>> >
>> > * Thomas Garnier <thgarnie@google.com> wrote:
>> >
>> >> 1) PIE sometime needs two instructions to represent a single
>> >> instruction on mcmodel=kernel.
>> >
>> > What again is the typical frequency of this occurring in an x86-64 defconfig
>> > kernel, with the very latest GCC?
>>
>> I am not sure what is the best way to measure that.
>
> If this is the dominant factor then 'sizeof vmlinux' ought to be enough:
>
>> With ORC: PIE .text is 0.814224% than baseline
>
> I.e. the overhead is +0.81% in both size and (roughly) in number of instructions
> executed.
>
> BTW., I think things improved with ORC because with ORC we have RBP as an extra
> register and with PIE we lose RBX - so register pressure in code generation is
> lower.

That makes sense.

>
> Ok, I suspect we can try it, but my preconditions for merging it would be:
>
>   1) Linus doesn't NAK it (obviously)

Of course.

>   2) we first implement the additional entropy bits that Linus suggested.
>
> does this work for you?

Sure, I can look at how feasible that is. If it is, can I send
everything as part of the same patch set? The additional entropy would
be enabled for all KASLR, but PIE will be off by default, of course.

>
> Thanks,
>
>         Ingo



-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 21:16                           ` Thomas Garnier
  2017-09-22  0:06                             ` Thomas Garnier
  2017-09-22  0:06                             ` Thomas Garnier
@ 2017-09-22 16:32                             ` Ingo Molnar
  2017-09-22 18:08                               ` Thomas Garnier
                                                 ` (3 more replies)
  2017-09-22 16:32                             ` Ingo Molnar
  3 siblings, 4 replies; 231+ messages in thread
From: Ingo Molnar @ 2017-09-22 16:32 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* Thomas Garnier <thgarnie@google.com> wrote:

> On Thu, Sep 21, 2017 at 8:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > ( Sorry about the delay in answering this. I could blame the delay on the merge
> >   window, but in reality I've been procrastinating this is due to the permanent,
> >   non-trivial impact PIE has on generated C code. )
> >
> > * Thomas Garnier <thgarnie@google.com> wrote:
> >
> >> 1) PIE sometime needs two instructions to represent a single
> >> instruction on mcmodel=kernel.
> >
> > What again is the typical frequency of this occurring in an x86-64 defconfig
> > kernel, with the very latest GCC?
> 
> I am not sure what is the best way to measure that.

If this is the dominant factor then 'sizeof vmlinux' ought to be enough:

> With ORC: PIE .text is 0.814224% than baseline

I.e. the overhead is +0.81% in both size and (roughly) in number of instructions 
executed.

BTW., I think things improved with ORC because with ORC we have RBP as an extra 
register and with PIE we lose RBX - so register pressure in code generation is 
lower.

Ok, I suspect we can try it, but my preconditions for merging it would be:

  1) Linus doesn't NAK it (obviously)
  2) we first implement the additional entropy bits that Linus suggested.

does this work for you?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22  4:24                               ` Markus Trippelsdorf
@ 2017-09-22 14:38                                 ` Thomas Garnier
  2017-09-22 14:38                                 ` Thomas Garnier
  1 sibling, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-09-22 14:38 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: Ard Biesheuvel, Ingo Molnar, Herbert Xu, David S . Miller,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Peter Zijlstra,
	Josh Poimboeuf, Arnd Bergmann, Matthias Kaehlcke,
	Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown

On Thu, Sep 21, 2017 at 9:24 PM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
> On 2017.09.21 at 14:21 -0700, Thomas Garnier wrote:
>> On Thu, Sep 21, 2017 at 9:10 AM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>> >
>> > On 21 September 2017 at 08:59, Ingo Molnar <mingo@kernel.org> wrote:
>> > >
>> > > ( Sorry about the delay in answering this. I could blame the delay on the merge
>> > >   window, but in reality I've been procrastinating this is due to the permanent,
>> > >   non-trivial impact PIE has on generated C code. )
>> > >
>> > > * Thomas Garnier <thgarnie@google.com> wrote:
>> > >
>> > >> 1) PIE sometime needs two instructions to represent a single
>> > >> instruction on mcmodel=kernel.
>> > >
>> > > What again is the typical frequency of this occurring in an x86-64 defconfig
>> > > kernel, with the very latest GCC?
>> > >
>> > > Also, to make sure: which unwinder did you use for your measurements,
>> > > frame-pointers or ORC? Please use ORC only for future numbers, as
>> > > frame-pointers is obsolete from a performance measurement POV.
>> > >
>> > >> 2) GCC does not optimize switches in PIE in order to reduce relocations:
>> > >
>> > > Hopefully this can either be fixed in GCC or at least influenced via a compiler
>> > > switch in the future.
>> > >
>> >
>> > There are somewhat related concerns in the ARM world, so it would be
>> > good if we could work with the GCC developers to get a more high level
>> > and arch neutral command line option (-mkernel-pie? sounds yummy!)
>> > that stops the compiler from making inferences that only hold for
>> > shared libraries and/or other hosted executables (GOT indirections,
>> > avoiding text relocations etc). That way, we will also be able to drop
>> > the 'hidden' visibility override at some point, which we currently
>> > need to prevent the compiler from redirecting all global symbol
>> > references via entries in the GOT.
>>
>> My plan was to add a -mtls-reg=<fs|gs> to switch the default segment
>> register for stack cookies but I can see great benefits in having a
>> more general kernel flag that would allow to get rid of the GOT and
>> PLT when you are building position independent code for the kernel. It
>> could also include optimizations like folding switch tables etc...
>>
>> Should we start a separate discussion on that? Anyone that would be
>> more experienced than I to push that to gcc & clang upstream?
>
> Just open a gcc bug. See
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708 as an example.

Makes sense, I will look into this. Thanks, Andy, for the stack cookie bug!

>
> --
> Markus



-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 21:21                             ` Thomas Garnier
@ 2017-09-22  4:24                               ` Markus Trippelsdorf
  2017-09-22 14:38                                 ` Thomas Garnier
  2017-09-22 14:38                                 ` Thomas Garnier
  2017-09-22 23:55                               ` Thomas Garnier
  2017-09-22 23:55                               ` Thomas Garnier
  2 siblings, 2 replies; 231+ messages in thread
From: Markus Trippelsdorf @ 2017-09-22  4:24 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, Len Brown,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Ingo Molnar, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On 2017.09.21 at 14:21 -0700, Thomas Garnier wrote:
> On Thu, Sep 21, 2017 at 9:10 AM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> >
> > On 21 September 2017 at 08:59, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > > ( Sorry about the delay in answering this. I could blame the delay on the merge
> > >   window, but in reality I've been procrastinating this is due to the permanent,
> > >   non-trivial impact PIE has on generated C code. )
> > >
> > > * Thomas Garnier <thgarnie@google.com> wrote:
> > >
> > >> 1) PIE sometime needs two instructions to represent a single
> > >> instruction on mcmodel=kernel.
> > >
> > > What again is the typical frequency of this occurring in an x86-64 defconfig
> > > kernel, with the very latest GCC?
> > >
> > > Also, to make sure: which unwinder did you use for your measurements,
> > > frame-pointers or ORC? Please use ORC only for future numbers, as
> > > frame-pointers is obsolete from a performance measurement POV.
> > >
> > >> 2) GCC does not optimize switches in PIE in order to reduce relocations:
> > >
> > > Hopefully this can either be fixed in GCC or at least influenced via a compiler
> > > switch in the future.
> > >
> >
> > There are somewhat related concerns in the ARM world, so it would be
> > good if we could work with the GCC developers to get a more high level
> > and arch neutral command line option (-mkernel-pie? sounds yummy!)
> > that stops the compiler from making inferences that only hold for
> > shared libraries and/or other hosted executables (GOT indirections,
> > avoiding text relocations etc). That way, we will also be able to drop
> > the 'hidden' visibility override at some point, which we currently
> > need to prevent the compiler from redirecting all global symbol
> > references via entries in the GOT.
> 
> My plan was to add a -mtls-reg=<fs|gs> to switch the default segment
> register for stack cookies but I can see great benefits in having a
> more general kernel flag that would allow to get rid of the GOT and
> PLT when you are building position independent code for the kernel. It
> could also include optimizations like folding switch tables etc...
> 
> Should we start a separate discussion on that? Anyone that would be
> more experienced than I to push that to gcc & clang upstream?

Just open a gcc bug. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708 as an example.

-- 
Markus

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 21:16                           ` Thomas Garnier
  2017-09-22  0:06                             ` Thomas Garnier
@ 2017-09-22  0:06                             ` Thomas Garnier
  2017-09-22 16:32                             ` Ingo Molnar
  2017-09-22 16:32                             ` Ingo Molnar
  3 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-09-22  0:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Thu, Sep 21, 2017 at 2:16 PM, Thomas Garnier <thgarnie@google.com> wrote:
>
> On Thu, Sep 21, 2017 at 8:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > ( Sorry about the delay in answering this. I could blame the delay on the merge
> >   window, but in reality I've been procrastinating this is due to the permanent,
> >   non-trivial impact PIE has on generated C code. )
> >
> > * Thomas Garnier <thgarnie@google.com> wrote:
> >
> >> 1) PIE sometime needs two instructions to represent a single
> >> instruction on mcmodel=kernel.
> >
> > What again is the typical frequency of this occurring in an x86-64 defconfig
> > kernel, with the very latest GCC?
>
> I am not sure what is the best way to measure that.

A very approximate approach is to look at each instruction that uses the
sign-extension trick with a _32S relocation. Not all _32S relocations
translate into extra instructions, because some just relocate part of an
absolute mov that would actually be smaller if it were RIP-relative.

Used this command to get a relative estimate:

objdump -dr ./baseline/vmlinux | egrep -A 2 '\-0x[0-9a-f]{8}' | grep
_32S | wc -l

That found 6130 places; if you assume each adds at least 7 bytes, it
accounts for at least 42910 extra bytes in the .text section (6130 * 7 =
42910). The .text section is 78599 bytes bigger from baseline to PIE, so
that is at least 54% of the size difference (42910 / 78599), assuming we
found all of them and without being able to factor in the impact of using
an additional register.

A similar approach works for the switch tables, but it is a bit more complex:

1) Find all constructs with an lea (%rip) followed by a jmp instruction
inside a function (the typical unfolded switch case).
2) Remove destination addresses with fewer than 4 occurrences.

Result: 480 switch cases in 49 functions. Each case takes at least 9
bytes and the switch itself takes 16 bytes (assuming one switch per
function).

That's 5104 bytes (480 * 9 + 49 * 16) for easy-to-identify switches, less
than 7% of the increase.

I am certainly missing a lot of differences. I checked whether the percpu
changes impacted the size and they don't (only 3 bytes added with PIE).

I also tried different ways to compare the .text section, like the size of
symbols or the number of bytes in a full disassembly, but the results are
really off from the whole .text size, so I am not sure that is the right
way to go about it.

>
> >
> > Also, to make sure: which unwinder did you use for your measurements,
> > frame-pointers or ORC? Please use ORC only for future numbers, as
> > frame-pointers is obsolete from a performance measurement POV.
>
> I used the default configuration which uses frame-pointer. I built all
> the different binaries with ORC and I see an improvement in size:
>
> On latest revision (just built and ran performance tests this week):
>
> With framepointer: PIE .text is 0.837324% than baseline
>
> With ORC: PIE .text is 0.814224% than baseline
>
> Comparing baselines only, ORC is -2.849832% than frame-pointers.
>
> >
> >> 2) GCC does not optimize switches in PIE in order to reduce relocations:
> >
> > Hopefully this can either be fixed in GCC or at least influenced via a compiler
> > switch in the future.
> >
> >> The switches are the biggest increase on small functions but I don't
> >> think they represent a large portion of the difference (number 1 is).
> >
> > Ok.
> >
> >> A side note, while testing gcc 7.2.0 on hackbench I have seen the PIE
> >> kernel being faster by 1% across multiple runs (comparing 50 runs done
> >> across 5 reboots twice). I don't think PIE is faster than a
> >> mcmodel=kernel but recent versions of gcc makes them fairly similar.
> >
> > So I think we are down to an overhead range where the inherent noise (both random
> > and systematic one) in 'hackbench' overwhelms the signal we are trying to measure.
> >
> > So I think it's the kernel .text size change that is the best noise-free proxy for
> > the overhead impact of PIE.
>
> I agree but it might be hard to measure the exact impact. What is
> acceptable and what is not?
>
> >
> > It doesn't hurt to double check actual real performance as well, just don't expect
> > there to be much of a signal for anything but fully cached microbenchmark
> > workloads.
>
> That's aligned with what I see in the latest performance testing.
> Performance is close enough that it is hard to get exact numbers (pie
> is just a bit slower than baseline on hackench (~1%)).
>
> >
> > Thanks,
> >
> >         Ingo
>
>
>
> --
> Thomas




-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 16:10                           ` Ard Biesheuvel
  2017-09-21 21:21                             ` Thomas Garnier
@ 2017-09-21 21:21                             ` Thomas Garnier
  2017-09-22  4:24                               ` Markus Trippelsdorf
                                                 ` (2 more replies)
  1 sibling, 3 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-09-21 21:21 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On Thu, Sep 21, 2017 at 9:10 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> On 21 September 2017 at 08:59, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > ( Sorry about the delay in answering this. I could blame the delay on the merge
> >   window, but in reality I've been procrastinating this is due to the permanent,
> >   non-trivial impact PIE has on generated C code. )
> >
> > * Thomas Garnier <thgarnie@google.com> wrote:
> >
> >> 1) PIE sometime needs two instructions to represent a single
> >> instruction on mcmodel=kernel.
> >
> > What again is the typical frequency of this occurring in an x86-64 defconfig
> > kernel, with the very latest GCC?
> >
> > Also, to make sure: which unwinder did you use for your measurements,
> > frame-pointers or ORC? Please use ORC only for future numbers, as
> > frame-pointers is obsolete from a performance measurement POV.
> >
> >> 2) GCC does not optimize switches in PIE in order to reduce relocations:
> >
> > Hopefully this can either be fixed in GCC or at least influenced via a compiler
> > switch in the future.
> >
>
> There are somewhat related concerns in the ARM world, so it would be
> good if we could work with the GCC developers to get a more high level
> and arch neutral command line option (-mkernel-pie? sounds yummy!)
> that stops the compiler from making inferences that only hold for
> shared libraries and/or other hosted executables (GOT indirections,
> avoiding text relocations etc). That way, we will also be able to drop
> the 'hidden' visibility override at some point, which we currently
> need to prevent the compiler from redirecting all global symbol
> references via entries in the GOT.

My plan was to add a -mtls-reg=<fs|gs> option to switch the default segment
register for stack cookies, but I can see great benefits in having a
more general kernel flag that would allow us to get rid of the GOT and
PLT when building position independent code for the kernel. It
could also include optimizations like folding switch tables, etc.
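
To make the stack cookie point concrete, here is a rough sketch (an
ordinary userspace function with made-up names, not from the patch set;
the assembly in the comments is typical gcc output for
-fstack-protector-strong):

#include <string.h>

unsigned long copy_demo(const char *src)
{
    char buf[64];    /* local array, so a canary is inserted */

    strncpy(buf, src, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    return buf[0];
}

/*
 * Userspace x86-64 guards the frame through the TLS segment register:
 *     movq  %fs:40, %rax        # load the canary
 *     ...
 *     xorq  %fs:40, %rdx        # check it on return
 *     jne   __stack_chk_fail
 * The kernel keeps its per-cpu area behind %gs instead, so it needs the
 * same sequence against %gs:<canary offset>, hence the idea of a switch
 * to pick the segment register.
 */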

Should we start a separate discussion on that? Is there anyone more
experienced than I am who could push that to gcc & clang upstream?

>
> All we really need is the ability to move the image around in virtual
> memory, and things like reducing the CoW footprint or enabling ELF
> symbol preemption are completely irrelevant for us.




-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 15:59                         ` Ingo Molnar
  2017-09-21 16:10                           ` Ard Biesheuvel
  2017-09-21 16:10                           ` Ard Biesheuvel
@ 2017-09-21 21:16                           ` Thomas Garnier
  2017-09-22  0:06                             ` Thomas Garnier
                                               ` (3 more replies)
  2017-09-21 21:16                           ` Thomas Garnier
  3 siblings, 4 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-09-21 21:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Thu, Sep 21, 2017 at 8:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> ( Sorry about the delay in answering this. I could blame the delay on the merge
>   window, but in reality I've been procrastinating this is due to the permanent,
>   non-trivial impact PIE has on generated C code. )
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> 1) PIE sometime needs two instructions to represent a single
>> instruction on mcmodel=kernel.
>
> What again is the typical frequency of this occurring in an x86-64 defconfig
> kernel, with the very latest GCC?

I am not sure what is the best way to measure that.

>
> Also, to make sure: which unwinder did you use for your measurements,
> frame-pointers or ORC? Please use ORC only for future numbers, as
> frame-pointers is obsolete from a performance measurement POV.

I used the default configuration, which uses frame pointers. I built all
the different binaries with ORC as well and I see an improvement in size:

On the latest revision (just built and ran performance tests this week):

With frame pointers: PIE .text is 0.837324% larger than baseline

With ORC: PIE .text is 0.814224% larger than baseline

Comparing baselines only, ORC is 2.849832% smaller than frame pointers.

>
>> 2) GCC does not optimize switches in PIE in order to reduce relocations:
>
> Hopefully this can either be fixed in GCC or at least influenced via a compiler
> switch in the future.
>
>> The switches are the biggest increase on small functions but I don't
>> think they represent a large portion of the difference (number 1 is).
>
> Ok.
>
>> A side note, while testing gcc 7.2.0 on hackbench I have seen the PIE
>> kernel being faster by 1% across multiple runs (comparing 50 runs done
>> across 5 reboots twice). I don't think PIE is faster than a
>> mcmodel=kernel but recent versions of gcc makes them fairly similar.
>
> So I think we are down to an overhead range where the inherent noise (both random
> and systematic one) in 'hackbench' overwhelms the signal we are trying to measure.
>
> So I think it's the kernel .text size change that is the best noise-free proxy for
> the overhead impact of PIE.

I agree but it might be hard to measure the exact impact. What is
acceptable and what is not?

>
> It doesn't hurt to double check actual real performance as well, just don't expect
> there to be much of a signal for anything but fully cached microbenchmark
> workloads.

That's aligned with what I see in the latest performance testing.
Performance is close enough that it is hard to get exact numbers (PIE
is just a bit slower than baseline on hackbench, ~1%).

>
> Thanks,
>
>         Ingo



-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 15:59                         ` Ingo Molnar
  2017-09-21 16:10                           ` Ard Biesheuvel
@ 2017-09-21 16:10                           ` Ard Biesheuvel
  2017-09-21 21:21                             ` Thomas Garnier
  2017-09-21 21:21                             ` Thomas Garnier
  2017-09-21 21:16                           ` Thomas Garnier
  2017-09-21 21:16                           ` Thomas Garnier
  3 siblings, 2 replies; 231+ messages in thread
From: Ard Biesheuvel @ 2017-09-21 16:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Garnier, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On 21 September 2017 at 08:59, Ingo Molnar <mingo@kernel.org> wrote:
>
> ( Sorry about the delay in answering this. I could blame the delay on the merge
>   window, but in reality I've been procrastinating this is due to the permanent,
>   non-trivial impact PIE has on generated C code. )
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> 1) PIE sometime needs two instructions to represent a single
>> instruction on mcmodel=kernel.
>
> What again is the typical frequency of this occurring in an x86-64 defconfig
> kernel, with the very latest GCC?
>
> Also, to make sure: which unwinder did you use for your measurements,
> frame-pointers or ORC? Please use ORC only for future numbers, as
> frame-pointers is obsolete from a performance measurement POV.
>
>> 2) GCC does not optimize switches in PIE in order to reduce relocations:
>
> Hopefully this can either be fixed in GCC or at least influenced via a compiler
> switch in the future.
>

There are somewhat related concerns in the ARM world, so it would be
good if we could work with the GCC developers to get a more high level
and arch neutral command line option (-mkernel-pie? sounds yummy!)
that stops the compiler from making inferences that only hold for
shared libraries and/or other hosted executables (GOT indirections,
avoiding text relocations etc). That way, we will also be able to drop
the 'hidden' visibility override at some point, which we currently
need to prevent the compiler from redirecting all global symbol
references via entries in the GOT.

All we really need is the ability to move the image around in virtual
memory, and things like reducing the CoW footprint or enabling ELF
symbol preemption are completely irrelevant for us.

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-29 19:34                       ` Thomas Garnier
@ 2017-09-21 15:59                         ` Ingo Molnar
  2017-09-21 16:10                           ` Ard Biesheuvel
                                             ` (3 more replies)
  2017-09-21 15:59                         ` Ingo Molnar
  1 sibling, 4 replies; 231+ messages in thread
From: Ingo Molnar @ 2017-09-21 15:59 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


( Sorry about the delay in answering this. I could blame the delay on the merge 
  window, but in reality I've been procrastinating; this is due to the permanent,
  non-trivial impact PIE has on generated C code. )

* Thomas Garnier <thgarnie@google.com> wrote:

> 1) PIE sometime needs two instructions to represent a single
> instruction on mcmodel=kernel.

What again is the typical frequency of this occurring in an x86-64 defconfig 
kernel, with the very latest GCC?

Also, to make sure: which unwinder did you use for your measurements, 
frame-pointers or ORC? Please use ORC only for future numbers, as
frame-pointers is obsolete from a performance measurement POV.

> 2) GCC does not optimize switches in PIE in order to reduce relocations:

Hopefully this can either be fixed in GCC or at least influenced via a compiler 
switch in the future.

> The switches are the biggest increase on small functions but I don't
> think they represent a large portion of the difference (number 1 is).

Ok.

> A side note, while testing gcc 7.2.0 on hackbench I have seen the PIE
> kernel being faster by 1% across multiple runs (comparing 50 runs done
> across 5 reboots twice). I don't think PIE is faster than a
> mcmodel=kernel but recent versions of gcc makes them fairly similar.

So I think we are down to an overhead range where the inherent noise (both random 
and systematic one) in 'hackbench' overwhelms the signal we are trying to measure.

So I think it's the kernel .text size change that is the best noise-free proxy for 
the overhead impact of PIE.

It doesn't hurt to double check actual real performance as well, just don't expect 
there to be much of a signal for anything but fully cached microbenchmark 
workloads.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-29 19:34                       ` Thomas Garnier
  2017-09-21 15:59                         ` Ingo Molnar
@ 2017-09-21 15:59                         ` Ingo Molnar
  1 sibling, 0 replies; 231+ messages in thread
From: Ingo Molnar @ 2017-09-21 15:59 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley


( Sorry about the delay in answering this. I could blame the delay on the merge 
  window, but in reality I've been procrastinating this is due to the permanent,
  non-trivial impact PIE has on generated C code. )

* Thomas Garnier <thgarnie@google.com> wrote:

> 1) PIE sometime needs two instructions to represent a single
> instruction on mcmodel=kernel.

What again is the typical frequency of this occurring in an x86-64 defconfig 
kernel, with the very latest GCC?

Also, to make sure: which unwinder did you use for your measurements, 
frame-pointers or ORC? Please use ORC only for future numbers, as
frame-pointers is obsolete from a performance measurement POV.

> 2) GCC does not optimize switches in PIE in order to reduce relocations:

Hopefully this can either be fixed in GCC or at least influenced via a compiler 
switch in the future.

> The switches are the biggest increase on small functions but I don't
> think they represent a large portion of the difference (number 1 is).

Ok.

> As a side note, while testing gcc 7.2.0 on hackbench I have seen the PIE
> kernel being faster by 1% across multiple runs (comparing 50 runs done
> across 5 reboots, twice). I don't think PIE is faster than
> mcmodel=kernel, but recent versions of gcc make them fairly similar.

So I think we are down to an overhead range where the inherent noise (both random 
and systematic) in 'hackbench' overwhelms the signal we are trying to measure.

So I think it's the kernel .text size change that is the best noise-free proxy for 
the overhead impact of PIE.

It doesn't hurt to double check actual real performance as well, just don't expect 
there to be much of a signal for anything but fully cached microbenchmark 
workloads.

Thanks,

	Ingo


^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-25 15:05                     ` Thomas Garnier
@ 2017-08-29 19:34                       ` Thomas Garnier
  2017-09-21 15:59                         ` Ingo Molnar
  2017-09-21 15:59                         ` Ingo Molnar
  2017-08-29 19:34                       ` Thomas Garnier
  1 sibling, 2 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-29 19:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Fri, Aug 25, 2017 at 8:05 AM, Thomas Garnier <thgarnie@google.com> wrote:
> On Fri, Aug 25, 2017 at 1:04 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Thomas Garnier <thgarnie@google.com> wrote:
>>
>>> With the fix for function tracing, the hackbench results have an
>>> average of +0.8 to +1.4% (from +8% to +10% before). With a default
>>> configuration, the numbers are closer to 0.8%.
>>>
>>> On the .text size, with gcc 4.9 I see +0.8% on default configuration
>>> and +1.180% on the ubuntu configuration.
>>
>> A 1% text size increase is still significant. Could you look at the disassembly,
>> where does the size increase come from?
>
> I will take a look; in this current iteration I added the .got and
> .got.plt sections, so removing them will save a bit (even if they are
> small, we don't use them, for perf reasons).
>
> What do you think about the perf numbers in general so far?

I looked at the size increase. I could identify two common cases:

1) PIE sometimes needs two instructions to represent a single
instruction on mcmodel=kernel.

For example, this instruction relies on sign extension (mcmodel=kernel):

mov    r9,QWORD PTR [r11*8-0x7e3da060] (8 bytes)

The address 0xffffffff81c25fa0 can be represented as -0x7e3da060 using
a 32S relocation.

with PIE:

lea    rbx,[rip+<off>] (7 bytes)
mov    r9,QWORD PTR [rbx+r11*8] (6 bytes)
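
For illustration, a minimal standalone C sketch (hypothetical file and
symbol names, assuming the codegen behaviour described above) that
produces this kind of global-array access:

/* global_index.c -- hypothetical reproducer for case 1.
 * Built without PIE (e.g. gcc -O2 -fno-pie), the access below can be
 * encoded as a single mov with a 32-bit absolute displacement; built
 * with gcc -O2 -fpie, the compiler typically first materializes the
 * array base with a RIP-relative lea and then indexes off it, i.e. the
 * lea+mov pair shown above.
 */
unsigned long table[512];

unsigned long load_entry(unsigned long idx)
{
        return table[idx];
}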

2) GCC does not optimize switches in PIE in order to reduce relocations:

For example the switch in phy_modes [1]:

static inline const char *phy_modes(phy_interface_t interface)
{
    switch (interface) {
    case PHY_INTERFACE_MODE_NA:
        return "";
    case PHY_INTERFACE_MODE_INTERNAL:
        return "internal";
    case PHY_INTERFACE_MODE_MII:
        return "mii";

Without PIE (gcc 7.2.0), the whole table is optimized into one instruction:

   0x000000000040045b <+27>:    mov    rdi,QWORD PTR [rax*8+0x400660]

With PIE (gcc 7.2.0):

   0x0000000000000641 <+33>:    movsxd rax,DWORD PTR [rdx+rax*4]
   0x0000000000000645 <+37>:    add    rax,rdx
   0x0000000000000648 <+40>:    jmp    rax
....
   0x000000000000065d <+61>:    lea    rdi,[rip+0x264]        # 0x8c8
   0x0000000000000664 <+68>:    jmp    0x651 <main+49>
   0x0000000000000666 <+70>:    lea    rdi,[rip+0x2bc]        # 0x929
   0x000000000000066d <+77>:    jmp    0x651 <main+49>
   0x000000000000066f <+79>:    lea    rdi,[rip+0x2a8]        # 0x91e
   0x0000000000000676 <+86>:    jmp    0x651 <main+49>
   0x0000000000000678 <+88>:    lea    rdi,[rip+0x294]        # 0x913
   0x000000000000067f <+95>:    jmp    0x651 <main+49>

That's a deliberate choice; clang is able to optimize it (clang-3.8):

   0x0000000000000963 <+19>:    lea    rcx,[rip+0x200406]        # 0x200d70
   0x000000000000096a <+26>:    mov    rdi,QWORD PTR [rcx+rax*8]

I checked gcc, and the code that decides whether to fold the switch simply
does not do it for PIC, in order to reduce relocations [2].
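
For reference, a small self-contained C sketch (hypothetical function and
case names, modeled on the phy_modes() pattern above) that can be built
with and without -fpie to compare the generated switch code:

/* switch_table.c -- hypothetical reproducer for case 2.
 * Without PIE, gcc's switch conversion typically folds this into a
 * lookup in a table of string pointers (one mov); with -fpie that table
 * would need one relocation per entry, so gcc keeps the jump/branch
 * sequence instead, as shown in the disassembly above.
 */
const char *mode_name(int mode)
{
        switch (mode) {
        case 0: return "";
        case 1: return "internal";
        case 2: return "mii";
        case 3: return "gmii";
        case 4: return "sgmii";
        case 5: return "rgmii";
        case 6: return "rmii";
        default: return "unknown";
        }
}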

The switches are the biggest increase in small functions, but I don't
think they represent a large portion of the difference (case 1 does).

As a side note, while testing gcc 7.2.0 on hackbench I have seen the PIE
kernel being faster by 1% across multiple runs (comparing 50 runs done
across 5 reboots, twice). I don't think PIE is faster than
mcmodel=kernel, but recent versions of gcc make them fairly similar.

[1] http://elixir.free-electrons.com/linux/v4.13-rc7/source/include/linux/phy.h#L113
[2] https://github.com/gcc-mirror/gcc/blob/7977b0509f07e42fbe0f06efcdead2b7e4a5135f/gcc/tree-switch-conversion.c#L828

>
>>
>> Thanks,
>>
>>         Ingo
>
>
>
> --
> Thomas



-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-25 15:05                     ` Thomas Garnier
  2017-08-29 19:34                       ` Thomas Garnier
@ 2017-08-29 19:34                       ` Thomas Garnier
  1 sibling, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-29 19:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Fri, Aug 25, 2017 at 8:05 AM, Thomas Garnier <thgarnie@google.com> wrote:
> On Fri, Aug 25, 2017 at 1:04 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Thomas Garnier <thgarnie@google.com> wrote:
>>
>>> With the fix for function tracing, the hackbench results have an
>>> average of +0.8 to +1.4% (from +8% to +10% before). With a default
>>> configuration, the numbers are closer to 0.8%.
>>>
>>> On the .text size, with gcc 4.9 I see +0.8% on default configuration
>>> and +1.180% on the ubuntu configuration.
>>
>> A 1% text size increase is still significant. Could you look at the disassembly,
>> where does the size increase come from?
>
> I will take a look; in this current iteration I added the .got and
> .got.plt sections, so removing them will save a bit (even if they are
> small, we don't use them, for perf reasons).
>
> What do you think about the perf numbers in general so far?

I looked at the size increase. I could identify two common cases:

1) PIE sometimes needs two instructions to represent a single
instruction on mcmodel=kernel.

For example, this instruction relies on sign extension (mcmodel=kernel):

mov    r9,QWORD PTR [r11*8-0x7e3da060] (8 bytes)

The address 0xffffffff81c25fa0 can be represented as -0x7e3da060 using
a 32S relocation.

with PIE:

lea    rbx,[rip+<off>] (7 bytes)
mov    r9,QWORD PTR [rbx+r11*8] (6 bytes)

2) GCC does not optimize switches in PIE in order to reduce relocations:

For example the switch in phy_modes [1]:

static inline const char *phy_modes(phy_interface_t interface)
{
    switch (interface) {
    case PHY_INTERFACE_MODE_NA:
        return "";
    case PHY_INTERFACE_MODE_INTERNAL:
        return "internal";
    case PHY_INTERFACE_MODE_MII:
        return "mii";

Without PIE (gcc 7.2.0), the whole table is optimized into one instruction:

   0x000000000040045b <+27>:    mov    rdi,QWORD PTR [rax*8+0x400660]

With PIE (gcc 7.2.0):

   0x0000000000000641 <+33>:    movsxd rax,DWORD PTR [rdx+rax*4]
   0x0000000000000645 <+37>:    add    rax,rdx
   0x0000000000000648 <+40>:    jmp    rax
....
   0x000000000000065d <+61>:    lea    rdi,[rip+0x264]        # 0x8c8
   0x0000000000000664 <+68>:    jmp    0x651 <main+49>
   0x0000000000000666 <+70>:    lea    rdi,[rip+0x2bc]        # 0x929
   0x000000000000066d <+77>:    jmp    0x651 <main+49>
   0x000000000000066f <+79>:    lea    rdi,[rip+0x2a8]        # 0x91e
   0x0000000000000676 <+86>:    jmp    0x651 <main+49>
   0x0000000000000678 <+88>:    lea    rdi,[rip+0x294]        # 0x913
   0x000000000000067f <+95>:    jmp    0x651 <main+49>

That's a deliberate choice; clang is able to optimize it (clang-3.8):

   0x0000000000000963 <+19>:    lea    rcx,[rip+0x200406]        # 0x200d70
   0x000000000000096a <+26>:    mov    rdi,QWORD PTR [rcx+rax*8]

I checked gcc, and the code that decides whether to fold the switch simply
does not do it for PIC, in order to reduce relocations [2].

The switches are the biggest increase in small functions, but I don't
think they represent a large portion of the difference (case 1 does).

As a side note, while testing gcc 7.2.0 on hackbench I have seen the PIE
kernel being faster by 1% across multiple runs (comparing 50 runs done
across 5 reboots, twice). I don't think PIE is faster than
mcmodel=kernel, but recent versions of gcc make them fairly similar.

[1] http://elixir.free-electrons.com/linux/v4.13-rc7/source/include/linux/phy.h#L113
[2] https://github.com/gcc-mirror/gcc/blob/7977b0509f07e42fbe0f06efcdead2b7e4a5135f/gcc/tree-switch-conversion.c#L828

>
>>
>> Thanks,
>>
>>         Ingo
>
>
>
> --
> Thomas



-- 
Thomas


^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 14:31         ` Peter Zijlstra
  2017-08-21 15:57           ` Thomas Garnier
  2017-08-21 15:57           ` Thomas Garnier
@ 2017-08-28  1:26           ` H. Peter Anvin
  2017-08-28  1:26           ` H. Peter Anvin
  3 siblings, 0 replies; 231+ messages in thread
From: H. Peter Anvin @ 2017-08-28  1:26 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Garnier
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Arnd Bergmann, Matthias Kaehlcke,
	Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On 08/21/17 07:31, Peter Zijlstra wrote:
> On Tue, Aug 15, 2017 at 07:20:38AM -0700, Thomas Garnier wrote:
>> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
> 
>>> Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
>>> -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
>>> x86-64 address space to randomize the location of kernel text. The location of
>>> modules can be further randomized within that 2GB window.
>>
>> -mcmodel=small/medium assumes you are in the low 32 bits. It generates
>> instructions where the virtual addresses have the high 32 bits set to
>> zero.
> 
> That's a compiler fail, right? Because the SDM states that for "CALL
> rel32" the 32bit displacement is sign extended on x86_64.
> 

No.  It is about whether you can do something like:

	movl $variable, %eax		/* rax = &variable; */

or

	addl %ecx,variable(,%rsi,4)	/* variable[rsi] += ecx */
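
As a rough illustration, C along these lines (hypothetical names, a
sketch rather than code from the thread) is what lets the compiler emit
such 32-bit absolute addressing forms:

/* abs_addr.c -- hypothetical sketch.
 * With -mcmodel=small these forms assume the symbol's address fits in a
 * zero-extended 32-bit immediate/displacement; -mcmodel=kernel uses the
 * same idea with sign-extended 32-bit (R_X86_64_32S) addresses in the
 * top 2G of the address space, which is what PIE has to give up.
 */
int variable[256];

unsigned long variable_address(void)
{
        return (unsigned long)&variable;   /* may become: mov $variable, %eax */
}

void add_to_slot(unsigned long idx, int val)
{
        variable[idx] += val;              /* may become: addl <reg>, variable(,<reg>,4) */
}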

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 14:31         ` Peter Zijlstra
                             ` (2 preceding siblings ...)
  2017-08-28  1:26           ` H. Peter Anvin
@ 2017-08-28  1:26           ` H. Peter Anvin
  3 siblings, 0 replies; 231+ messages in thread
From: H. Peter Anvin @ 2017-08-28  1:26 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Garnier
  Cc: Nicolas Pitre, Michal Hocko, Len Brown,
	Radim Krčmář,
	Catalin Marinas, Christopher Li, Alexei Starovoitov,
	David Howells, Paul Gortmaker, Pavel Machek, Kernel Hardening,
	Christoph Lameter, Ingo Molnar, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel, Rafael J . Wysocki,
	Daniel Micay

On 08/21/17 07:31, Peter Zijlstra wrote:
> On Tue, Aug 15, 2017 at 07:20:38AM -0700, Thomas Garnier wrote:
>> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
> 
>>> Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
>>> -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
>>> x86-64 address space to randomize the location of kernel text. The location of
>>> modules can be further randomized within that 2GB window.
>>
>> -mcmodel=small/medium assumes you are in the low 32 bits. It generates
>> instructions where the virtual addresses have the high 32 bits set to
>> zero.
> 
> That's a compiler fail, right? Because the SDM states that for "CALL
> rel32" the 32bit displacement is sign extended on x86_64.
> 

No.  It is about whether you can do something like:

	movl $variable, %eax		/* rax = &variable; */

or

	addl %ecx,variable(,%rsi,4)	/* variable[rsi] += ecx */


^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-24 21:42                   ` Linus Torvalds
@ 2017-08-25 15:35                     ` Thomas Garnier
  2017-08-25 15:35                     ` Thomas Garnier
  1 sibling, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-25 15:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On Thu, Aug 24, 2017 at 2:42 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Aug 24, 2017 at 2:13 PM, Thomas Garnier <thgarnie@google.com> wrote:
> >
> > My original performance testing was done with an Ubuntu generic
> > configuration. This configuration has the CONFIG_FUNCTION_TRACER
> > option which was incompatible with PIE. The tracer failed to replace
> > the __fentry__ call by a nop slide on each traceable function because
> > the instruction was not the one expected. If PIE is enabled, gcc
> > generates a different call instruction based on the GOT without
> > checking the visibility options (basically call *__fentry__@GOTPCREL).
>
> Gah.
>
> Don't we actually have *more* address bits for randomization at the
> low end, rather than getting rid of -mcmodel=kernel?

We have, but I think we use most of it for potential modules and the
fixmap, and it is not that big. The increase in range from 1G to 3G is
just an example and a way to ensure PIE works as expected. The long-term
goal is being able to put the kernel wherever we want in memory,
randomizing the position and the order of almost all memory sections.

That would be valuable against BTB attacks [1], for example, where
randomization of the low 32 bits is ineffective.

[1] https://github.com/felixwilhelm/mario_baslr

>
> Has anybody looked at just moving kernel text by smaller values than
> the page size? Yeah, yeah, the kernel has several sections that need
> page alignment, but I think we could relocate normal text by just the
> cacheline size, and that sounds like it would give several bits of
> randomness with little downside.

I didn't look into it. There is value in it depending on the performance
impact. I think both PIE and finer-grained randomization would be
useful.

>
> Or has somebody already looked at it and I just missed it?
>
>                Linus




-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-24 21:42                   ` Linus Torvalds
  2017-08-25 15:35                     ` Thomas Garnier
@ 2017-08-25 15:35                     ` Thomas Garnier
  1 sibling, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-25 15:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Ingo Molnar, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Thu, Aug 24, 2017 at 2:42 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Aug 24, 2017 at 2:13 PM, Thomas Garnier <thgarnie@google.com> wrote:
> >
> > My original performance testing was done with an Ubuntu generic
> > configuration. This configuration has the CONFIG_FUNCTION_TRACER
> > option which was incompatible with PIE. The tracer failed to replace
> > the __fentry__ call by a nop slide on each traceable function because
> > the instruction was not the one expected. If PIE is enabled, gcc
> > generates a different call instruction based on the GOT without
> > checking the visibility options (basically call *__fentry__@GOTPCREL).
>
> Gah.
>
> Don't we actually have *more* address bits for randomization at the
> low end, rather than getting rid of -mcmodel=kernel?

We have, but I think we use most of it for potential modules and the
fixmap, and it is not that big. The increase in range from 1G to 3G is
just an example and a way to ensure PIE works as expected. The long-term
goal is being able to put the kernel wherever we want in memory,
randomizing the position and the order of almost all memory sections.

That would be valuable against BTB attacks [1], for example, where
randomization of the low 32 bits is ineffective.

[1] https://github.com/felixwilhelm/mario_baslr

>
> Has anybody looked at just moving kernel text by smaller values than
> the page size? Yeah, yeah, the kernel has several sections that need
> page alignment, but I think we could relocate normal text by just the
> cacheline size, and that sounds like it would give several bits of
> randomness with little downside.

I didn't look into it. There is value in it depending on the performance
impact. I think both PIE and finer-grained randomization would be
useful.

>
> Or has somebody already looked at it and I just missed it?
>
>                Linus




-- 
Thomas


^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-25  8:04                   ` Ingo Molnar
  2017-08-25 15:05                     ` Thomas Garnier
@ 2017-08-25 15:05                     ` Thomas Garnier
  2017-08-29 19:34                       ` Thomas Garnier
  2017-08-29 19:34                       ` Thomas Garnier
  1 sibling, 2 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-25 15:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Fri, Aug 25, 2017 at 1:04 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> With the fix for function tracing, the hackbench results have an
>> average of +0.8 to +1.4% (from +8% to +10% before). With a default
>> configuration, the numbers are closer to 0.8%.
>>
>> On the .text size, with gcc 4.9 I see +0.8% on default configuration
>> and +1.180% on the ubuntu configuration.
>
> A 1% text size increase is still significant. Could you look at the disassembly,
> where does the size increase come from?

I will take a look; in this current iteration I added the .got and
.got.plt sections, so removing them will save a bit (even if they are
small, we don't use them, for perf reasons).

What do you think about the perf numbers in general so far?

>
> Thanks,
>
>         Ingo



-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-25  8:04                   ` Ingo Molnar
@ 2017-08-25 15:05                     ` Thomas Garnier
  2017-08-25 15:05                     ` Thomas Garnier
  1 sibling, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-25 15:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Fri, Aug 25, 2017 at 1:04 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> With the fix for function tracing, the hackbench results have an
>> average of +0.8 to +1.4% (from +8% to +10% before). With a default
>> configuration, the numbers are closer to 0.8%.
>>
>> On the .text size, with gcc 4.9 I see +0.8% on default configuration
>> and +1.180% on the ubuntu configuration.
>
> A 1% text size increase is still significant. Could you look at the disassembly,
> where does the size increase come from?

I will take a look; in this current iteration I added the .got and
.got.plt sections, so removing them will save a bit (even if they are
small, we don't use them, for perf reasons).

What do you think about the perf numbers in general so far?

>
> Thanks,
>
>         Ingo



-- 
Thomas


^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-24 21:13                 ` Thomas Garnier
                                     ` (3 preceding siblings ...)
  2017-08-25  8:04                   ` Ingo Molnar
@ 2017-08-25  8:04                   ` Ingo Molnar
  2017-08-25 15:05                     ` Thomas Garnier
  2017-08-25 15:05                     ` Thomas Garnier
  4 siblings, 2 replies; 231+ messages in thread
From: Ingo Molnar @ 2017-08-25  8:04 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* Thomas Garnier <thgarnie@google.com> wrote:

> With the fix for function tracing, the hackbench results have an
> average of +0.8 to +1.4% (from +8% to +10% before). With a default
> configuration, the numbers are closer to 0.8%.
> 
> On the .text size, with gcc 4.9 I see +0.8% on default configuration
> and +1.180% on the ubuntu configuration.

A 1% text size increase is still significant. Could you look at the disassembly, 
where does the size increase come from?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-24 21:13                 ` Thomas Garnier
                                     ` (2 preceding siblings ...)
  2017-08-25  1:07                   ` Steven Rostedt
@ 2017-08-25  8:04                   ` Ingo Molnar
  2017-08-25  8:04                   ` Ingo Molnar
  4 siblings, 0 replies; 231+ messages in thread
From: Ingo Molnar @ 2017-08-25  8:04 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley


* Thomas Garnier <thgarnie@google.com> wrote:

> With the fix for function tracing, the hackbench results have an
> average of +0.8 to +1.4% (from +8% to +10% before). With a default
> configuration, the numbers are closer to 0.8%.
> 
> On the .text size, with gcc 4.9 I see +0.8% on default configuration
> and +1.180% on the ubuntu configuration.

A 1% text size increase is still significant. Could you look at the disassembly, 
where does the size increase come from?

Thanks,

	Ingo


^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-24 21:13                 ` Thomas Garnier
  2017-08-24 21:42                   ` Linus Torvalds
  2017-08-24 21:42                   ` Linus Torvalds
@ 2017-08-25  1:07                   ` Steven Rostedt
  2017-08-25  8:04                   ` Ingo Molnar
  2017-08-25  8:04                   ` Ingo Molnar
  4 siblings, 0 replies; 231+ messages in thread
From: Steven Rostedt @ 2017-08-25  1:07 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Ingo Molnar, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Thu, 24 Aug 2017 14:13:38 -0700
Thomas Garnier <thgarnie@google.com> wrote:

> With the fix for function tracing, the hackbench results have an
> average of +0.8 to +1.4% (from +8% to +10% before). With a default
> configuration, the numbers are closer to 0.8%.

Wow, an empty fentry function that is not "nop"ed out only added 8% to 10%
overhead. I never benchmarked that case, since my measurements were done
before fentry was introduced, with the old "mcount". That gave an
average of 13% overhead in hackbench.

-- Steve


^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-24 21:13                 ` Thomas Garnier
  2017-08-24 21:42                   ` Linus Torvalds
@ 2017-08-24 21:42                   ` Linus Torvalds
  2017-08-25 15:35                     ` Thomas Garnier
  2017-08-25 15:35                     ` Thomas Garnier
  2017-08-25  1:07                   ` Steven Rostedt
                                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 231+ messages in thread
From: Linus Torvalds @ 2017-08-24 21:42 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On Thu, Aug 24, 2017 at 2:13 PM, Thomas Garnier <thgarnie@google.com> wrote:
>
> My original performance testing was done with an Ubuntu generic
> configuration. This configuration has the CONFIG_FUNCTION_TRACER
> option which was incompatible with PIE. The tracer failed to replace
> the __fentry__ call by a nop slide on each traceable function because
> the instruction was not the one expected. If PIE is enabled, gcc
> generates a different call instruction based on the GOT without
> checking the visibility options (basically call *__fentry__@GOTPCREL).

Gah.

Don't we actually have *more* address bits for randomization at the
low end, rather than getting rid of -mcmodel=kernel?

Has anybody looked at just moving kernel text by smaller values than
the page size? Yeah, yeah, the kernel has several sections that need
page alignment, but I think we could relocate normal text by just the
cacheline size, and that sounds like it would give several bits of
randomness with little downside.

Or has somebody already looked at it and I just missed it?

               Linus

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-24 21:13                 ` Thomas Garnier
@ 2017-08-24 21:42                   ` Linus Torvalds
  2017-08-24 21:42                   ` Linus Torvalds
                                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 231+ messages in thread
From: Linus Torvalds @ 2017-08-24 21:42 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Ingo Molnar, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Thu, Aug 24, 2017 at 2:13 PM, Thomas Garnier <thgarnie@google.com> wrote:
>
> My original performance testing was done with an Ubuntu generic
> configuration. This configuration has the CONFIG_FUNCTION_TRACER
> option which was incompatible with PIE. The tracer failed to replace
> the __fentry__ call by a nop slide on each traceable function because
> the instruction was not the one expected. If PIE is enabled, gcc
> generates a different call instruction based on the GOT without
> checking the visibility options (basically call *__fentry__@GOTPCREL).

Gah.

Don't we actually have *more* address bits for randomization at the
low end, rather than getting rid of -mcmodel=kernel?

Has anybody looked at just moving kernel text by smaller values than
the page size? Yeah, yeah, the kernel has several sections that need
page alignment, but I think we could relocate normal text by just the
cacheline size, and that sounds like it would give several bits of
randomness with little downside.

Or has somebody already looked at it and I just missed it?

               Linus


^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-17 14:10               ` Thomas Garnier
  2017-08-24 21:13                 ` Thomas Garnier
@ 2017-08-24 21:13                 ` Thomas Garnier
  2017-08-24 21:42                   ` Linus Torvalds
                                     ` (4 more replies)
  1 sibling, 5 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-24 21:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Thu, Aug 17, 2017 at 7:10 AM, Thomas Garnier <thgarnie@google.com> wrote:
>
> On Thu, Aug 17, 2017 at 1:09 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Thomas Garnier <thgarnie@google.com> wrote:
> >
> > > > > -mcmodel=small/medium assumes you are in the low 32 bits. It generates
> > > > > instructions where the virtual addresses have the high 32 bits set to zero.
> > > >
> > > > How are these assumptions hardcoded by GCC? Most of the instructions should be
> > > > relocatable straight away, as most call/jump/branch instructions are
> > > > RIP-relative.
> > >
> > > I think PIE is capable of using relative instructions well. mcmodel=large assumes
> > > symbols can be anywhere.
> >
> > So if the numbers in your changelog and Kconfig text cannot be trusted, there's
> > this description of the size impact which I suspect is less susceptible to
> > measurement error:
> >
> > +         The kernel and modules will generate slightly more assembly (1 to 2%
> > +         increase on the .text sections). The vmlinux binary will be
> > +         significantly smaller due to less relocations.
> >
> > ... but describing a 1-2% kernel text size increase as "slightly more assembly"
> > shows a gratuitous disregard for kernel code generation quality! In reality that's
> > a huge size increase that in most cases will almost directly transfer to a 1-2%
> > slowdown for kernel-intensive workloads.
> >
> >
> > Where does that size increase come from, if PIE is capable of using relative
> > instructions well? Does it come from the loss of a general-purpose register and the
> > resulting increase in register pressure, stack spills, etc.?
>
> I will try to gather more information on the size increase. The size
> increase might be smaller with gcc 4.9 given performance was much
> better.

Coming back on this thread as I identified the root cause of the
performance issue.

My original performance testing was done with an Ubuntu generic
configuration. This configuration has the CONFIG_FUNCTION_TRACER
option which was incompatible with PIE. The tracer failed to replace
the __fentry__ call by a nop slide on each traceable function because
the instruction was not the one expected. If PIE is enabled, gcc
generates a different call instruction based on the GOT without
checking the visibility options (basically call *__fentry__@GOTPCREL).
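
A minimal sketch of the mismatch (hypothetical file name; the byte
sequences are given for illustration, following the description above):

/* fentry_call.c -- hypothetical sketch of the tracing problem.
 * A function built with -pg -mfentry starts with a call to __fentry__.
 * Without PIE this is a direct 5-byte "e8 <rel32>" call that ftrace
 * knows how to patch into a nop; with -fpie and default symbol
 * visibility the compiler instead emits an indirect call through the
 * GOT, "call *__fentry__@GOTPCREL(%rip)" (a 6-byte ff 15 <disp32>
 * form), which is not the instruction the nop patching expects.
 */
int traced_function(int x)
{
        return x + 1;
}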

With the fix for function tracing, the hackbench results have an
average of +0.8 to +1.4% (from +8% to +10% before). With a default
configuration, the numbers are closer to 0.8%.

On the .text size, with gcc 4.9 I see +0.8% on default configuration
and +1.180% on the ubuntu configuration.

The next iteration should have an updated set of performance metrics (I
will try to use gcc 6.0 or higher) and incorporate the fix for function
tracing.

Let me know if you have questions and feedback.

>
> >
> > So I'm still unhappy about this all, and about the attitude surrounding it.
> >
> > Thanks,
> >
> >         Ingo
>
>
>
>
> --
> Thomas




-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-17 14:10               ` Thomas Garnier
@ 2017-08-24 21:13                 ` Thomas Garnier
  2017-08-24 21:13                 ` Thomas Garnier
  1 sibling, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-24 21:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Thu, Aug 17, 2017 at 7:10 AM, Thomas Garnier <thgarnie@google.com> wrote:
>
> On Thu, Aug 17, 2017 at 1:09 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Thomas Garnier <thgarnie@google.com> wrote:
> >
> > > > > -mcmodel=small/medium assumes you are in the low 32 bits. It generates
> > > > > instructions where the virtual addresses have the high 32 bits set to zero.
> > > >
> > > > How are these assumptions hardcoded by GCC? Most of the instructions should be
> > > > relocatable straight away, as most call/jump/branch instructions are
> > > > RIP-relative.
> > >
> > > I think PIE is capable of using relative instructions well. mcmodel=large assumes
> > > symbols can be anywhere.
> >
> > So if the numbers in your changelog and Kconfig text cannot be trusted, there's
> > this description of the size impact which I suspect is less susceptible to
> > measurement error:
> >
> > +         The kernel and modules will generate slightly more assembly (1 to 2%
> > +         increase on the .text sections). The vmlinux binary will be
> > +         significantly smaller due to less relocations.
> >
> > ... but describing a 1-2% kernel text size increase as "slightly more assembly"
> > shows a gratuitous disregard for kernel code generation quality! In reality that's
> > a huge size increase that in most cases will almost directly transfer to a 1-2%
> > slowdown for kernel-intensive workloads.
> >
> >
> > Where does that size increase come from, if PIE is capable of using relative
> > instructions well? Does it come from the loss of a general-purpose register and the
> > resulting increase in register pressure, stack spills, etc.?
>
> I will try to gather more information on the size increase. The size
> increase might be smaller with gcc 4.9 given performance was much
> better.

Coming back on this thread as I identified the root cause of the
performance issue.

My original performance testing was done with an Ubuntu generic
configuration. This configuration has the CONFIG_FUNCTION_TRACER
option which was incompatible with PIE. The tracer failed to replace
the __fentry__ call by a nop slide on each traceable function because
the instruction was not the one expected. If PIE is enabled, gcc
generates a different call instruction based on the GOT without
checking the visibility options (basically call *__fentry__@GOTPCREL).

With the fix for function tracing, the hackbench results have an
average of +0.8 to +1.4% (from +8% to +10% before). With a default
configuration, the numbers are closer to 0.8%.

On the .text size, with gcc 4.9 I see +0.8% on default configuration
and +1.180% on the ubuntu configuration.

The next iteration should have an updated set of performance metrics (I
will try to use gcc 6.0 or higher) and incorporate the fix for function
tracing.

Let me know if you have questions and feedback.

>
> >
> > So I'm still unhappy about this all, and about the attitude surrounding it.
> >
> > Thanks,
> >
> >         Ingo
>
>
>
>
> --
> Thomas




-- 
Thomas


^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 14:31         ` Peter Zijlstra
  2017-08-21 15:57           ` Thomas Garnier
@ 2017-08-21 15:57           ` Thomas Garnier
  2017-08-28  1:26           ` H. Peter Anvin
  2017-08-28  1:26           ` H. Peter Anvin
  3 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-21 15:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Mon, Aug 21, 2017 at 7:31 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Aug 15, 2017 at 07:20:38AM -0700, Thomas Garnier wrote:
>> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
>> > Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
>> > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
>> > x86-64 address space to randomize the location of kernel text. The location of
>> > modules can be further randomized within that 2GB window.
>>
>> -mcmodel=small/medium assumes you are in the low 32 bits. It generates
>> instructions where the virtual addresses have the high 32 bits set to
>> zero.
>
> That's a compiler fail, right? Because the SDM states that for "CALL
> rel32" the 32bit displacement is sign extended on x86_64.
>

That's different than what I expected at first too.

Now, I think I have an alternative to using mcmodel=large. I could use
-fPIC and ensure modules are never far away from the main kernel
(moving the module section start close to the random kernel end). I
looked at it and that seems possible but will require more work. I
plan to start with the mcmodel=large support and add this mode in a
way that could benefit classic KASLR (without -fPIC), because it
randomizes where modules start based on the kernel.

-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 14:31         ` Peter Zijlstra
@ 2017-08-21 15:57           ` Thomas Garnier
  2017-08-21 15:57           ` Thomas Garnier
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-21 15:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nicolas Pitre, Michal Hocko, Len Brown,
	Radim Krčmář,
	Catalin Marinas, Christopher Li, Alexei Starovoitov,
	David Howells, Paul Gortmaker, Pavel Machek, H . Peter Anvin,
	Kernel Hardening, Christoph Lameter, Ingo Molnar, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel, Rafael J . Wysocki

On Mon, Aug 21, 2017 at 7:31 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Aug 15, 2017 at 07:20:38AM -0700, Thomas Garnier wrote:
>> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
>> > Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
>> > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
>> > x86-64 address space to randomize the location of kernel text. The location of
>> > modules can be further randomized within that 2GB window.
>>
>> -mcmodel=small/medium assumes you are in the low 32 bits. It generates
>> instructions where the virtual addresses have the high 32 bits set to
>> zero.
>
> That's a compiler fail, right? Because the SDM states that for "CALL
> rel32" the 32bit displacement is sign extended on x86_64.
>

That's different than what I expected at first too.

Now, I think I have an alternative to using mcmodel=large. I could use
-fPIC and ensure modules are never far away from the main kernel
(moving the module section start close to the random kernel end). I
looked at it and that seems possible but will require more work. I
plan to start with the mcmodel=large support and add this mode in a
way that could benefit classic KASLR (without -fPIC), because it
randomizes where modules start based on the kernel.

-- 
Thomas


^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15 14:20       ` Thomas Garnier
                           ` (3 preceding siblings ...)
  2017-08-16 15:12         ` Ingo Molnar
@ 2017-08-21 14:31         ` Peter Zijlstra
  2017-08-21 15:57           ` Thomas Garnier
                             ` (3 more replies)
  2017-08-21 14:31         ` Peter Zijlstra
  5 siblings, 4 replies; 231+ messages in thread
From: Peter Zijlstra @ 2017-08-21 14:31 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo, Christoph Lameter

On Tue, Aug 15, 2017 at 07:20:38AM -0700, Thomas Garnier wrote:
> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:

> > Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
> > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
> > x86-64 address space to randomize the location of kernel text. The location of
> > modules can be further randomized within that 2GB window.
> 
> -mcmodel=small/medium assumes you are in the low 32 bits. It generates
> instructions where the virtual addresses have the high 32 bits set to
> zero.

That's a compiler fail, right? Because the SDM states that for "CALL
rel32" the 32bit displacement is sign extended on x86_64.

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15 14:20       ` Thomas Garnier
                           ` (4 preceding siblings ...)
  2017-08-21 14:31         ` Peter Zijlstra
@ 2017-08-21 14:31         ` Peter Zijlstra
  5 siblings, 0 replies; 231+ messages in thread
From: Peter Zijlstra @ 2017-08-21 14:31 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Michal Hocko, Len Brown,
	Radim Krčmář,
	Catalin Marinas, Christopher Li, Alexei Starovoitov,
	David Howells, Paul Gortmaker, Pavel Machek, H . Peter Anvin,
	Kernel Hardening, Christoph Lameter, Ingo Molnar, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel, Rafael J . Wysocki

On Tue, Aug 15, 2017 at 07:20:38AM -0700, Thomas Garnier wrote:
> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:

> > Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
> > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
> > x86-64 address space to randomize the location of kernel text. The location of
> > modules can be further randomized within that 2GB window.
> 
> -mcmodel=small/medium assumes you are in the low 32 bits. It generates
> instructions where the virtual addresses have the high 32 bits set to
> zero.

That's a compiler fail, right? Because the SDM states that for "CALL
rel32" the 32bit displacement is sign extended on x86_64.



^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 13:32           ` Peter Zijlstra
  2017-08-21 14:28             ` Peter Zijlstra
@ 2017-08-21 14:28             ` Peter Zijlstra
  2017-09-22 18:27               ` H. Peter Anvin
  2017-09-22 18:27               ` H. Peter Anvin
  1 sibling, 2 replies; 231+ messages in thread
From: Peter Zijlstra @ 2017-08-21 14:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Garnier, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Mon, Aug 21, 2017 at 03:32:22PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 16, 2017 at 05:12:35PM +0200, Ingo Molnar wrote:
> > Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine 
> > instruction level.
> > 
> > Function calls look like this:
> > 
> >  -mcmodel=medium:
> > 
> >    757:   e8 98 ff ff ff          callq  6f4 <test_code>
> > 
> >  -mcmodel=large
> > 
> >    77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
> >    782:   ff ff ff 
> >    785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
> >    789:   ff d0                   callq  *%rax
> > 
> > And we'd do this for _EVERY_ function call in the kernel. That kind of crap is 
> > totally unacceptable.
> 
> So why does this need to be computed for every single call? How often
> will we move the kernel around at runtime?
> 
> Why can't we process the relocation at load time and then discard the
> relocation tables along with the rest of __init ?

Ah, I see, this is large mode and that needs to use MOVABS to load 64bit
immediates. Still, small RIP-relative code should be able to live at any
point as long as everything lives inside the same 2G relative range, so
it would still allow the goal of increasing the KASLR range.

So I'm not seeing why we need large mode for that. That said, after
reading up on all this, RIP-relative will not be too pretty either:
while CALL is naturally RIP-relative, data still needs an explicit %rip
offset. Still, it is loads better than the large model.
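
A small sketch of what that means at the source level (hypothetical
names; a rough illustration under the assumptions above):

/* rip_relative.c -- hypothetical sketch.
 * The function call is RIP-relative by construction ("call bump" with a
 * rel32 displacement), while the data reference needs an explicit
 * %rip-relative operand ("mov counter(%rip), ...") once the code has to
 * be position independent within a 2G range.
 */
int counter;

void bump(void);   /* assumed to live within the same 2G range */

int read_and_bump(void)
{
        bump();
        return counter;
}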

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 13:32           ` Peter Zijlstra
@ 2017-08-21 14:28             ` Peter Zijlstra
  2017-08-21 14:28             ` Peter Zijlstra
  1 sibling, 0 replies; 231+ messages in thread
From: Peter Zijlstra @ 2017-08-21 14:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Michal Hocko, Len Brown,
	Radim Krčmář,
	Catalin Marinas, Christopher Li, Alexei Starovoitov,
	David Howells, Paul Gortmaker, Pavel Machek, H . Peter Anvin,
	Kernel Hardening, Christoph Lameter, Thomas Gleixner, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel, Rafael J . Wysocki

On Mon, Aug 21, 2017 at 03:32:22PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 16, 2017 at 05:12:35PM +0200, Ingo Molnar wrote:
> > Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine 
> > instruction level.
> > 
> > Function calls look like this:
> > 
> >  -mcmodel=medium:
> > 
> >    757:   e8 98 ff ff ff          callq  6f4 <test_code>
> > 
> >  -mcmodel=large
> > 
> >    77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
> >    782:   ff ff ff 
> >    785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
> >    789:   ff d0                   callq  *%rax
> > 
> > And we'd do this for _EVERY_ function call in the kernel. That kind of crap is 
> > totally unacceptable.
> 
> So why does this need to be computed for every single call? How often
> will we move the kernel around at runtime?
> 
> Why can't we process the relocation at load time and then discard the
> relocation tables along with the rest of __init ?

Ah, I see, this is large mode and that needs to use MOVABS to load 64bit
immediates. Still, small RIP-relative code should be able to live at any
point as long as everything lives inside the same 2G relative range, so
it would still allow the goal of increasing the KASLR range.

So I'm not seeing why we need large mode for that. That said, after
reading up on all this, RIP-relative will not be too pretty either:
while CALL is naturally RIP-relative, data still needs an explicit %rip
offset. Still, it is loads better than the large model.


^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 15:12         ` Ingo Molnar
                             ` (5 preceding siblings ...)
  2017-08-16 16:57           ` Thomas Garnier
@ 2017-08-21 13:32           ` Peter Zijlstra
  2017-08-21 14:28             ` Peter Zijlstra
  2017-08-21 14:28             ` Peter Zijlstra
  2017-08-21 13:32           ` Peter Zijlstra
  7 siblings, 2 replies; 231+ messages in thread
From: Peter Zijlstra @ 2017-08-21 13:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Garnier, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Wed, Aug 16, 2017 at 05:12:35PM +0200, Ingo Molnar wrote:
> Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine 
> instruction level.
> 
> Function calls look like this:
> 
>  -mcmodel=medium:
> 
>    757:   e8 98 ff ff ff          callq  6f4 <test_code>
> 
>  -mcmodel=large
> 
>    77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
>    782:   ff ff ff 
>    785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
>    789:   ff d0                   callq  *%rax
> 
> And we'd do this for _EVERY_ function call in the kernel. That kind of crap is 
> totally unacceptable.

So why does this need to be computed for every single call? How often
will we move the kernel around at runtime?

Why can't we process the relocation at load time and then discard the
relocation tables along with the rest of __init ?

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 15:12         ` Ingo Molnar
                             ` (6 preceding siblings ...)
  2017-08-21 13:32           ` Peter Zijlstra
@ 2017-08-21 13:32           ` Peter Zijlstra
  7 siblings, 0 replies; 231+ messages in thread
From: Peter Zijlstra @ 2017-08-21 13:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Michal Hocko, Len Brown,
	Radim Krčmář,
	Catalin Marinas, Christopher Li, Alexei Starovoitov,
	David Howells, Paul Gortmaker, Pavel Machek, H . Peter Anvin,
	Kernel Hardening, Christoph Lameter, Thomas Gleixner, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel, Rafael J . Wysocki

On Wed, Aug 16, 2017 at 05:12:35PM +0200, Ingo Molnar wrote:
> Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine 
> instruction level.
> 
> Function calls look like this:
> 
>  -mcmodel=medium:
> 
>    757:   e8 98 ff ff ff          callq  6f4 <test_code>
> 
>  -mcmodel=large
> 
>    77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
>    782:   ff ff ff 
>    785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
>    789:   ff d0                   callq  *%rax
> 
> And we'd do this for _EVERY_ function call in the kernel. That kind of crap is 
> totally unacceptable.

So why does this need to be computed for every single call? How often
will we move the kernel around at runtime?

Why can't we process the relocation at load time and then discard the
relocation tables along with the rest of __init ?


^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-17  8:09             ` Ingo Molnar
  2017-08-17 14:10               ` Thomas Garnier
@ 2017-08-17 14:10               ` Thomas Garnier
  2017-08-24 21:13                 ` Thomas Garnier
  2017-08-24 21:13                 ` Thomas Garnier
  1 sibling, 2 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-17 14:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Thu, Aug 17, 2017 at 1:09 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
> > > > -mcmodel=small/medium assumes you are in the low 32 bits. It generates
> > > > instructions where the virtual addresses have the high 32 bits set to zero.
> > >
> > > How are these assumptions hardcoded by GCC? Most of the instructions should be
> > > relocatable straight away, as most call/jump/branch instructions are
> > > RIP-relative.
> >
> > I think PIE is capable of using relative instructions well. mcmodel=large assumes
> > symbols can be anywhere.
>
> So if the numbers in your changelog and Kconfig text cannot be trusted, there's
> this description of the size impact which I suspect is less susceptible to
> measurement error:
>
> +         The kernel and modules will generate slightly more assembly (1 to 2%
> +         increase on the .text sections). The vmlinux binary will be
> +         significantly smaller due to less relocations.
>
> ... but describing a 1-2% kernel text size increase as "slightly more assembly"
> shows a gratuitous disregard for kernel code generation quality! In reality that's
> a huge size increase that in most cases will almost directly transfer to a 1-2%
> slowdown for kernel-intensive workloads.
>
>
> Where does that size increase come from, if PIE is capable of using relative
> instructions well? Does it come from the loss of a generic register and the
> resulting increase in register pressure, stack spills, etc.?

I will try to gather more information on the size increase. The size
increase might be smaller with gcc 4.9 given performance was much
better.

>
> So I'm still unhappy about this all, and about the attitude surrounding it.
>
> Thanks,
>
>         Ingo




-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 16:57           ` Thomas Garnier
  2017-08-17  8:09             ` Ingo Molnar
@ 2017-08-17  8:09             ` Ingo Molnar
  2017-08-17 14:10               ` Thomas Garnier
  2017-08-17 14:10               ` Thomas Garnier
  1 sibling, 2 replies; 231+ messages in thread
From: Ingo Molnar @ 2017-08-17  8:09 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* Thomas Garnier <thgarnie@google.com> wrote:

> > > -mcmodel=small/medium assume you are in the low 32 bits. They generate
> > > instructions where the high 32 bits of the virtual addresses are zero.
> >
> > How are these assumptions hardcoded by GCC? Most of the instructions should be 
> > relocatable straight away, as most call/jump/branch instructions are 
> > RIP-relative.
> 
> I think PIE is capable of using relative instructions well. mcmodel=large assumes
> symbols can be anywhere.

So if the numbers in your changelog and Kconfig text cannot be trusted, there's 
this description of the size impact which I suspect is less susceptible to 
measurement error:

+         The kernel and modules will generate slightly more assembly (1 to 2%
+         increase on the .text sections). The vmlinux binary will be
+         significantly smaller due to less relocations.

... but describing a 1-2% kernel text size increase as "slightly more assembly" 
shows a gratuitous disregard for kernel code generation quality! In reality that's
a huge size increase that in most cases will almost directly transfer to a 1-2%
slowdown for kernel-intensive workloads.

Where does that size increase come from, if PIE is capable of using relative 
instructions well? Does it come from the loss of a generic register and the
resulting increase in register pressure, stack spills, etc.?

So I'm still unhappy about this all, and about the attitude surrounding it.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 15:12         ` Ingo Molnar
                             ` (4 preceding siblings ...)
  2017-08-16 16:57           ` Thomas Garnier
@ 2017-08-16 16:57           ` Thomas Garnier
  2017-08-17  8:09             ` Ingo Molnar
  2017-08-17  8:09             ` Ingo Molnar
  2017-08-21 13:32           ` Peter Zijlstra
  2017-08-21 13:32           ` Peter Zijlstra
  7 siblings, 2 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-16 16:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Wed, Aug 16, 2017 at 8:12 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
> > On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > > * Thomas Garnier <thgarnie@google.com> wrote:
> > >
> > >> > Do these changes get us closer to being able to build the kernel as truly
> > >> > position independent, i.e. to place it anywhere in the valid x86-64 address
> > >> > space? Or any other advantages?
> > >>
> > >> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
> > >> have a full randomized address space where position and order of sections are
> > >> completely random. There is still some work to get there but being able to build
> > >> a PIE kernel is a significant step.
> > >
> > > So I _really_ dislike the whole PIE approach, because of the huge slowdown:
> > >
> > > +config RANDOMIZE_BASE_LARGE
> > > +       bool "Increase the randomization range of the kernel image"
> > > +       depends on X86_64 && RANDOMIZE_BASE
> > > +       select X86_PIE
> > > +       select X86_MODULE_PLTS if MODULES
> > > +       default n
> > > +       ---help---
> > > +         Build the kernel as a Position Independent Executable (PIE) and
> > > +         increase the available randomization range from 1GB to 3GB.
> > > +
> > > +         This option impacts performance on kernel CPU intensive workloads up
> > > +         to 10% due to PIE generated code. Impact on user-mode processes and
> > > +         typical usage would be significantly less (0.50% when you build the
> > > +         kernel).
> > > +
> > > +         The kernel and modules will generate slightly more assembly (1 to 2%
> > > +         increase on the .text sections). The vmlinux binary will be
> > > +         significantly smaller due to less relocations.
> > >
> > > To put 10% kernel overhead into perspective: enabling this option wipes out about
> > > 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
> > > ... (!!)
> >
> > Note that 10% is the high-bound of a CPU intensive workload.
>
> Note that the 8-10% hackbench or even a 2%-4% range would be 'huge' in terms of
> modern kernel performance. In many cases we are literally applying cycle level
> optimizations that are barely measurable. A 0.1% speedup in linear execution speed
> is already a big success.
>
> > I am going to start doing performance testing on -mcmodel=large to see if it is
> > faster than -fPIE.
>
> Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine
> instruction level.
>
> Function calls look like this:
>
>  -mcmodel=medium:
>
>    757:   e8 98 ff ff ff          callq  6f4 <test_code>
>
>  -mcmodel=large
>
>    77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
>    782:   ff ff ff
>    785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
>    789:   ff d0                   callq  *%rax
>
> And we'd do this for _EVERY_ function call in the kernel. That kind of crap is
> totally unacceptable.
>

I started looking into mcmodel=large and ran into multiple issues. In
the meantime, I thought I would try different configurations and
compilers.

I did 10 hackbench runs across 10 reboots with and without PIE (same
commit) with gcc 4.9. I copied the results below; depending on the
hackbench configuration we are between -0.29% and 1.92% (0.8% on
average), which seems more aligned with what people discussed in this
thread.

I don't know how I got a 10% maximum on hackbench earlier; I am still
investigating. It could be the configuration I used or my base
compiler being too old.

> > > I think the fundamental flaw is the assumption that we need a PIE executable
> > > to have a freely relocatable kernel on 64-bit CPUs.
> > >
> > > Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
> > > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
> > > x86-64 address space to randomize the location of kernel text. The location of
> > > modules can be further randomized within that 2GB window.
> >
> > -mcmodel=small/medium assume you are in the low 32 bits. They generate instructions
> > where the high 32 bits of the virtual addresses are zero.
>
> How are these assumptions hardcoded by GCC? Most of the instructions should be
> relocatable straight away, as most call/jump/branch instructions are RIP-relative.

I think PIE is capable of using relative instructions well.
mcmodel=large assumes symbols can be anywhere.

>
> I.e. is there no GCC code generation mode where code can be placed anywhere in the
> canonical address space, yet call and jump distance is within 31 bits so that the
> generated code is fast?

I think that's basically PIE. With PIE, you have the assumption that
everything is close; the main issue is any assembly referencing
absolute addresses.
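
A minimal example of the kind of assembly that breaks (illustrative
only, with a made-up symbol, not code taken from the series):

  char example_sym;   /* stand-in symbol just for the example */

  /* Breaks under PIE: a sign-extended 32-bit absolute reference only
   * links if the symbol is guaranteed to sit in the top 2GB
   * (mcmodel=kernel); with -fpie/-pie the linker rejects it. */
  unsigned long addr_absolute(void)
  {
          unsigned long addr;
          asm ("movq $example_sym, %0" : "=r" (addr));
          return addr;
  }

  /* PIE-friendly: a RIP-relative lea works wherever the image is
   * loaded. */
  unsigned long addr_rip_relative(void)
  {
          unsigned long addr;
          asm ("leaq example_sym(%%rip), %0" : "=r" (addr));
          return addr;
  }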

>
> Thanks,
>
>         Ingo

process-pipe-1600 ------
         baseline_samecommit     pie  % diff
0                     16.985  16.999   0.082
1                     17.065  17.071   0.033
2                     17.188  17.130  -0.342
3                     17.148  17.107  -0.240
4                     17.217  17.170  -0.275
5                     17.216  17.145  -0.415
6                     17.161  17.109  -0.304
7                     17.202  17.122  -0.465
8                     17.169  17.173   0.024
9                     17.217  17.178  -0.227
average               17.157  17.120  -0.213
median                17.169  17.122  -0.271
min                   16.985  16.999   0.082
max                   17.217  17.178  -0.228

[14 rows x 3 columns]
threads-pipe-1600 ------
         baseline_samecommit     pie  % diff
0                     17.914  18.041   0.707
1                     18.337  18.352   0.083
2                     18.233  18.457   1.225
3                     18.334  18.402   0.366
4                     18.381  18.369  -0.066
5                     18.370  18.408   0.207
6                     18.337  18.400   0.345
7                     18.368  18.372   0.020
8                     18.328  18.588   1.421
9                     18.369  18.344  -0.138
average               18.297  18.373   0.415
median                18.337  18.373   0.200
min                   17.914  18.041   0.707
max                   18.381  18.588   1.126

[14 rows x 3 columns]
threads-pipe-50 ------
         baseline_samecommit     pie  % diff
0                     23.491  22.794  -2.965
1                     23.219  23.542   1.387
2                     22.886  23.638   3.286
3                     23.233  23.778   2.343
4                     23.228  23.703   2.046
5                     23.000  23.376   1.636
6                     23.589  23.335  -1.079
7                     23.043  23.543   2.169
8                     23.117  23.350   1.007
9                     23.059  23.420   1.564
average               23.187  23.448   1.127
median                23.187  23.448   1.127
min                   22.886  22.794  -0.399
max                   23.589  23.778   0.800

[14 rows x 3 columns]
process-socket-50 ------
         baseline_samecommit     pie  % diff
0                     20.333  20.430   0.479
1                     20.198  20.371   0.856
2                     20.494  20.737   1.185
3                     20.445  21.264   4.008
4                     20.530  20.911   1.854
5                     20.281  20.487   1.015
6                     20.311  20.871   2.757
7                     20.472  20.890   2.044
8                     20.568  20.422  -0.710
9                     20.415  20.647   1.134
average               20.405  20.703   1.462
median                20.415  20.703   1.410
min                   20.198  20.371   0.856
max                   20.568  21.264   3.385

[14 rows x 3 columns]
process-pipe-50 ------
         baseline_samecommit     pie  % diff
0                     20.131  20.643   2.541
1                     20.184  20.658   2.349
2                     20.359  20.907   2.693
3                     20.365  21.284   4.514
4                     20.506  20.578   0.352
5                     20.393  20.599   1.010
6                     20.245  20.515   1.331
7                     20.627  20.964   1.636
8                     20.519  20.862   1.670
9                     20.505  20.741   1.150
average               20.383  20.775   1.922
median                20.383  20.741   1.753
min                   20.131  20.515   1.907
max                   20.627  21.284   3.186

[14 rows x 3 columns]
threads-socket-50 ------
         baseline_samecommit     pie  % diff
0                     23.197  23.728   2.286
1                     23.304  23.585   1.205
2                     23.098  23.379   1.217
3                     23.028  23.787   3.295
4                     23.242  23.122  -0.517
5                     23.036  23.512   2.068
6                     23.139  23.258   0.512
7                     22.801  23.458   2.881
8                     23.319  23.276  -0.187
9                     22.989  23.577   2.557
average               23.115  23.468   1.526
median                23.115  23.468   1.526
min                   22.801  23.122   1.407
max                   23.319  23.787   2.006

[14 rows x 3 columns]
process-socket-1600 ------
         baseline_samecommit     pie  % diff
0                     17.214  17.168  -0.262
1                     17.172  17.195   0.135
2                     17.278  17.137  -0.817
3                     17.173  17.102  -0.414
4                     17.211  17.153  -0.335
5                     17.220  17.160  -0.345
6                     17.224  17.161  -0.365
7                     17.224  17.224  -0.004
8                     17.176  17.135  -0.236
9                     17.242  17.188  -0.311
average               17.213  17.162  -0.296
median                17.214  17.161  -0.306
min                   17.172  17.102  -0.405
max                   17.278  17.224  -0.315

[14 rows x 3 columns]
threads-socket-1600 ------
         baseline_samecommit     pie  % diff
0                     18.395  18.389  -0.031
1                     18.459  18.404  -0.296
2                     18.427  18.445   0.096
3                     18.449  18.421  -0.150
4                     18.416  18.411  -0.026
5                     18.409  18.443   0.185
6                     18.325  18.308  -0.092
7                     18.491  18.317  -0.940
8                     18.496  18.375  -0.656
9                     18.436  18.385  -0.279
average               18.430  18.390  -0.219
median                18.430  18.390  -0.219
min                   18.325  18.308  -0.092
max                   18.496  18.445  -0.278

[14 rows x 3 columns]
Total stats ======
         baseline_samecommit     pie  % diff
average               19.773  19.930   0.791
median                19.773  19.930   0.791
min                   16.985  16.999   0.082
max                   23.589  23.787   0.839

[4 rows x 3 columns]
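
(The % diff column is (pie - baseline) / baseline * 100, apparently
computed from the unrounded timings; e.g. the first process-pipe-1600
row: (16.999 - 16.985) / 16.985 * 100 ~= 0.082. Positive numbers mean
PIE is slower.)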

-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 16:26           ` Daniel Micay
  2017-08-16 16:32             ` Ard Biesheuvel
@ 2017-08-16 16:32             ` Ard Biesheuvel
  1 sibling, 0 replies; 231+ messages in thread
From: Ard Biesheuvel @ 2017-08-16 16:32 UTC (permalink / raw)
  To: Daniel Micay
  Cc: Ingo Molnar, Thomas Garnier, Herbert Xu, David S . Miller,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Peter Zijlstra,
	Josh Poimboeuf, Arnd Bergmann, Matthias Kaehlcke,
	Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown

On 16 August 2017 at 17:26, Daniel Micay <danielmicay@gmail.com> wrote:
>> How are these assumptions hardcoded by GCC? Most of the instructions
>> should be
>> relocatable straight away, as most call/jump/branch instructions are
>> RIP-relative.
>>
>> I.e. is there no GCC code generation mode where code can be placed
>> anywhere in the
>> canonical address space, yet call and jump distance is within 31 bits
>> so that the
>> generated code is fast?
>
> That's what PIE is meant to do. However, not disabling support for lazy
> linking (-fno-plt) / symbol interposition (-Bsymbolic) is going to cause
> it to add needless overhead.
>
> arm64 is using -pie -shared -Bsymbolic in arch/arm64/Makefile for their
> CONFIG_RELOCATABLE option. See 08cc55b2afd97a654f71b3bebf8bb0ec89fdc498.

The difference with arm64 is that its generic small code model is
already position independent, so we don't have to pass -fpic or -fpie
to the compiler. We only link in PIE mode to get the linker to emit
the dynamic relocation tables into the ELF binary. Relative branches
have a range of +/- 128 MB, which covers the kernel and modules
(unless the option to randomize the module region independently has
been selected, in which case branches between the kernel and modules
may be resolved via PLT entries that are emitted at module load time).

I am not sure how this extrapolates to x86, just adding some context.

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 15:12         ` Ingo Molnar
  2017-08-16 16:09           ` Christopher Lameter
  2017-08-16 16:09           ` Christopher Lameter
@ 2017-08-16 16:26           ` Daniel Micay
  2017-08-16 16:32             ` Ard Biesheuvel
  2017-08-16 16:32             ` Ard Biesheuvel
  2017-08-16 16:26           ` Daniel Micay
                             ` (4 subsequent siblings)
  7 siblings, 2 replies; 231+ messages in thread
From: Daniel Micay @ 2017-08-16 16:26 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

> How are these assumptions hardcoded by GCC? Most of the instructions
> should be 
> relocatable straight away, as most call/jump/branch instructions are
> RIP-relative.
> 
> I.e. is there no GCC code generation mode where code can be placed
> anywhere in the 
> canonical address space, yet call and jump distance is within 31 bits
> so that the 
> generated code is fast?

That's what PIE is meant to do. However, leaving lazy linking and
symbol interposition enabled (i.e. not using -fno-plt / -Bsymbolic) is
going to add needless overhead.

arm64 is using -pie -shared -Bsymbolic in arch/arm64/Makefile for their
CONFIG_RELOCATABLE option. See 08cc55b2afd97a654f71b3bebf8bb0ec89fdc498.
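
To make the difference concrete, here is roughly how a call to an
external function is emitted under the options above (illustrative
comments based on typical GCC output, not on the patch set; "foo" is a
made-up symbol):

  extern void foo(void);

  void bar(void)
  {
          /* -fpie (default)        : call foo@PLT             (PLT stub + GOT)
           * -fpie -fno-plt         : call *foo@GOTPCREL(%rip) (GOT load, no stub)
           * hidden / locally bound : call foo                 (direct rel32 call)
           *
           * -Bsymbolic acts at link time: definitions bind locally, so
           * the remaining indirection cannot be interposed. */
          foo();
  }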

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 15:12         ` Ingo Molnar
  2017-08-16 16:09           ` Christopher Lameter
@ 2017-08-16 16:09           ` Christopher Lameter
  2017-08-16 16:26           ` Daniel Micay
                             ` (5 subsequent siblings)
  7 siblings, 0 replies; 231+ messages in thread
From: Christopher Lameter @ 2017-08-16 16:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Garnier, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki

On Wed, 16 Aug 2017, Ingo Molnar wrote:

> And we'd do this for _EVERY_ function call in the kernel. That kind of crap is
> totally unacceptable.

Ahh, finally a limit is in sight as to how much security hardening etc.
can reduce kernel performance.

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15 14:20       ` Thomas Garnier
                           ` (2 preceding siblings ...)
  2017-08-16 15:12         ` Ingo Molnar
@ 2017-08-16 15:12         ` Ingo Molnar
  2017-08-16 16:09           ` Christopher Lameter
                             ` (7 more replies)
  2017-08-21 14:31         ` Peter Zijlstra
  2017-08-21 14:31         ` Peter Zijlstra
  5 siblings, 8 replies; 231+ messages in thread
From: Ingo Molnar @ 2017-08-16 15:12 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* Thomas Garnier <thgarnie@google.com> wrote:

> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Thomas Garnier <thgarnie@google.com> wrote:
> >
> >> > Do these changes get us closer to being able to build the kernel as truly
> >> > position independent, i.e. to place it anywhere in the valid x86-64 address
> >> > space? Or any other advantages?
> >>
> >> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
> >> have a full randomized address space where position and order of sections are
> >> completely random. There is still some work to get there but being able to build
> >> a PIE kernel is a significant step.
> >
> > So I _really_ dislike the whole PIE approach, because of the huge slowdown:
> >
> > +config RANDOMIZE_BASE_LARGE
> > +       bool "Increase the randomization range of the kernel image"
> > +       depends on X86_64 && RANDOMIZE_BASE
> > +       select X86_PIE
> > +       select X86_MODULE_PLTS if MODULES
> > +       default n
> > +       ---help---
> > +         Build the kernel as a Position Independent Executable (PIE) and
> > +         increase the available randomization range from 1GB to 3GB.
> > +
> > +         This option impacts performance on kernel CPU intensive workloads up
> > +         to 10% due to PIE generated code. Impact on user-mode processes and
> > +         typical usage would be significantly less (0.50% when you build the
> > +         kernel).
> > +
> > +         The kernel and modules will generate slightly more assembly (1 to 2%
> > +         increase on the .text sections). The vmlinux binary will be
> > +         significantly smaller due to less relocations.
> >
> > To put 10% kernel overhead into perspective: enabling this option wipes out about
> > 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
> > ... (!!)
> 
> Note that 10% is the high-bound of a CPU intensive workload.

Note that the 8-10% hackbench or even a 2%-4% range would be 'huge' in terms of 
modern kernel performance. In many cases we are literally applying cycle level 
optimizations that are barely measurable. A 0.1% speedup in linear execution speed 
is already a big success.

> I am going to start doing performance testing on -mcmodel=large to see if it is 
> faster than -fPIE.

Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine 
instruction level.

Function calls look like this:

 -mcmodel=medium:

   757:   e8 98 ff ff ff          callq  6f4 <test_code>

 -mcmodel=large

   77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
   782:   ff ff ff 
   785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
   789:   ff d0                   callq  *%rax

And we'd do this for _EVERY_ function call in the kernel. That kind of crap is 
totally unacceptable.
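
(For reference, the source side of those dumps is as small as something
like this, compiled once per code model and disassembled with
objdump -d; the exact flags are not shown above:)

  void test_code(void);      /* the callee shown in the disassembly */

  void caller(void)
  {
          /* medium model: a single rel32 callq;
           * large model : movabs of the address, a lea against a base
           *               register, then an indirect callq. */
          test_code();
  }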

> > I think the fundamental flaw is the assumption that we need a PIE executable 
> > to have a freely relocatable kernel on 64-bit CPUs.
> >
> > Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie 
> > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical 
> > x86-64 address space to randomize the location of kernel text. The location of 
> > modules can be further randomized within that 2GB window.
> 
> -mcmodel=small/medium assume you are in the low 32 bits. They generate instructions
> where the high 32 bits of the virtual addresses are zero.

How are these assumptions hardcoded by GCC? Most of the instructions should be 
relocatable straight away, as most call/jump/branch instructions are RIP-relative.

I.e. is there no GCC code generation mode where code can be placed anywhere in the 
canonical address space, yet call and jump distance is within 31 bits so that the 
generated code is fast?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15 14:47         ` Daniel Micay
  2017-08-15 14:58           ` Thomas Garnier
@ 2017-08-15 14:58           ` Thomas Garnier
  1 sibling, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-15 14:58 UTC (permalink / raw)
  To: Daniel Micay
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On Tue, Aug 15, 2017 at 7:47 AM, Daniel Micay <danielmicay@gmail.com> wrote:
> On 15 August 2017 at 10:20, Thomas Garnier <thgarnie@google.com> wrote:
>> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>>
>>> * Thomas Garnier <thgarnie@google.com> wrote:
>>>
>>>> > Do these changes get us closer to being able to build the kernel as truly
>>>> > position independent, i.e. to place it anywhere in the valid x86-64 address
>>>> > space? Or any other advantages?
>>>>
>>>> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
>>>> have a full randomized address space where position and order of sections are
>>>> completely random. There is still some work to get there but being able to build
>>>> a PIE kernel is a significant step.
>>>
>>> So I _really_ dislike the whole PIE approach, because of the huge slowdown:
>>>
>>> +config RANDOMIZE_BASE_LARGE
>>> +       bool "Increase the randomization range of the kernel image"
>>> +       depends on X86_64 && RANDOMIZE_BASE
>>> +       select X86_PIE
>>> +       select X86_MODULE_PLTS if MODULES
>>> +       default n
>>> +       ---help---
>>> +         Build the kernel as a Position Independent Executable (PIE) and
>>> +         increase the available randomization range from 1GB to 3GB.
>>> +
>>> +         This option impacts performance on kernel CPU intensive workloads up
>>> +         to 10% due to PIE generated code. Impact on user-mode processes and
>>> +         typical usage would be significantly less (0.50% when you build the
>>> +         kernel).
>>> +
>>> +         The kernel and modules will generate slightly more assembly (1 to 2%
>>> +         increase on the .text sections). The vmlinux binary will be
>>> +         significantly smaller due to less relocations.
>>>
>>> To put 10% kernel overhead into perspective: enabling this option wipes out about
>>> 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
>>> ... (!!)
>>
>> Note that 10% is the high-bound of a CPU intensive workload.
>
> The cost can be reduced by using -fno-plt these days but some work
> might be required to make that work with the kernel.
>
> Where does that 10% estimate in the kernel config docs come from? I'd
> be surprised if it really cost that much on x86_64. That's a realistic
> cost for i386 with modern GCC (it used to be worse) but I'd expect
> x86_64 to be closer to 2% even for CPU intensive workloads. It should
> be very close to zero with -fno-plt.

I got 8 to 10% on hackbench. Other benchmarks were 4% or lower.

I will also look at a more recent compiler and -fno-plt.

-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15 14:20       ` Thomas Garnier
  2017-08-15 14:47         ` Daniel Micay
@ 2017-08-15 14:47         ` Daniel Micay
  2017-08-15 14:58           ` Thomas Garnier
  2017-08-15 14:58           ` Thomas Garnier
  2017-08-16 15:12         ` Ingo Molnar
                           ` (3 subsequent siblings)
  5 siblings, 2 replies; 231+ messages in thread
From: Daniel Micay @ 2017-08-15 14:47 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On 15 August 2017 at 10:20, Thomas Garnier <thgarnie@google.com> wrote:
> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Thomas Garnier <thgarnie@google.com> wrote:
>>
>>> > Do these changes get us closer to being able to build the kernel as truly
>>> > position independent, i.e. to place it anywhere in the valid x86-64 address
>>> > space? Or any other advantages?
>>>
>>> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
>>> have a full randomized address space where position and order of sections are
>>> completely random. There is still some work to get there but being able to build
>>> a PIE kernel is a significant step.
>>
>> So I _really_ dislike the whole PIE approach, because of the huge slowdown:
>>
>> +config RANDOMIZE_BASE_LARGE
>> +       bool "Increase the randomization range of the kernel image"
>> +       depends on X86_64 && RANDOMIZE_BASE
>> +       select X86_PIE
>> +       select X86_MODULE_PLTS if MODULES
>> +       default n
>> +       ---help---
>> +         Build the kernel as a Position Independent Executable (PIE) and
>> +         increase the available randomization range from 1GB to 3GB.
>> +
>> +         This option impacts performance on kernel CPU intensive workloads up
>> +         to 10% due to PIE generated code. Impact on user-mode processes and
>> +         typical usage would be significantly less (0.50% when you build the
>> +         kernel).
>> +
>> +         The kernel and modules will generate slightly more assembly (1 to 2%
>> +         increase on the .text sections). The vmlinux binary will be
>> +         significantly smaller due to less relocations.
>>
>> To put 10% kernel overhead into perspective: enabling this option wipes out about
>> 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
>> ... (!!)
>
> Note that 10% is the high-bound of a CPU intensive workload.

The cost can be reduced by using -fno-plt these days but some work
might be required to make that work with the kernel.

Where does that 10% estimate in the kernel config docs come from? I'd
be surprised if it really cost that much on x86_64. That's a realistic
cost for i386 with modern GCC (it used to be worse) but I'd expect
x86_64 to be closer to 2% even for CPU intensive workloads. It should
be very close to zero with -fno-plt.
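
Concretely, with -fpie the compiler normally routes a call to a global
function through a PLT stub, which itself bounces through the GOT;
-fno-plt makes it load the GOT entry at the call site instead. A rough
sketch of the difference (illustrative only, exact code generation
depends on the compiler version and symbol visibility):

  /* extern, default visibility: the compiler can't assume it's local */
  extern void helper(void);

  void caller(void)
  {
          /* -fpie:            call helper@PLT
           *                   ...stub then does: jmp *helper@GOTPCREL(%rip)
           * -fpie -fno-plt:   call *helper@GOTPCREL(%rip)
           *                   one indirect call, no intermediate stub */
          helper();
  }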

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15  7:56     ` Ingo Molnar
  2017-08-15 14:20       ` Thomas Garnier
@ 2017-08-15 14:20       ` Thomas Garnier
  2017-08-15 14:47         ` Daniel Micay
                           ` (5 more replies)
  1 sibling, 6 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-15 14:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> > Do these changes get us closer to being able to build the kernel as truly
>> > position independent, i.e. to place it anywhere in the valid x86-64 address
>> > space? Or any other advantages?
>>
>> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
>> have a full randomized address space where position and order of sections are
>> completely random. There is still some work to get there but being able to build
>> a PIE kernel is a significant step.
>
> So I _really_ dislike the whole PIE approach, because of the huge slowdown:
>
> +config RANDOMIZE_BASE_LARGE
> +       bool "Increase the randomization range of the kernel image"
> +       depends on X86_64 && RANDOMIZE_BASE
> +       select X86_PIE
> +       select X86_MODULE_PLTS if MODULES
> +       default n
> +       ---help---
> +         Build the kernel as a Position Independent Executable (PIE) and
> +         increase the available randomization range from 1GB to 3GB.
> +
> +         This option impacts performance on kernel CPU intensive workloads up
> +         to 10% due to PIE generated code. Impact on user-mode processes and
> +         typical usage would be significantly less (0.50% when you build the
> +         kernel).
> +
> +         The kernel and modules will generate slightly more assembly (1 to 2%
> +         increase on the .text sections). The vmlinux binary will be
> +         significantly smaller due to less relocations.
>
> To put 10% kernel overhead into perspective: enabling this option wipes out about
> 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
> ... (!!)

Note that 10% is the high bound for a CPU-intensive workload.

>
> I think the fundamental flaw is the assumption that we need a PIE executable to
> have a freely relocatable kernel on 64-bit CPUs.
>
> Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
> -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
> x86-64 address space to randomize the location of kernel text. The location of
> modules can be further randomized within that 2GB window.

-mcmodel=small/medium assumes the code is in the low 32 bits of the address
space. It generates instructions where the high 32 bits of virtual addresses
are zero.

I am going to start doing performance testing on -mcmodel=large to see
if it is faster than -fPIE.

>
> It should have far less performance impact than the register-losing and
> overhead-inducing -fpie / -mcmodel=large (for modules) execution models.
>
> My quick guess is that the performance impact might be close to zero in fact.

If mcmodel=small/medium were possible for the kernel, I don't think it
would have less performance impact than mcmodel=large. It would still
need to set the high 32 bits to a static value; only the relocation
would be a different size.
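
To make that concrete, this is roughly how taking the address of a global
ends up encoded under each model (illustrative only, not code from the
series; exact output varies by compiler and symbol visibility):

  extern int some_var;

  int *addr_of(void)
  {
          /* mcmodel=small/medium: movl    $some_var, %eax      (32-bit, zero-extended)
           * mcmodel=kernel:       movq    $some_var, %rax      (32-bit, sign-extended, top 2G)
           * mcmodel=large:        movabsq $some_var, %rax      (full 64-bit immediate)
           * -fpie:                leaq    some_var(%rip), %rax (RIP-relative)
           *                       or a load through the GOT if the symbol may
           *                       be preempted, which the hidden-visibility
           *                       patch in this series is meant to avoid */
          return &some_var;
  }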

>
> Thanks,
>
>         Ingo



-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-11 15:09   ` Thomas Garnier
  2017-08-15  7:56     ` Ingo Molnar
@ 2017-08-15  7:56     ` Ingo Molnar
  2017-08-15 14:20       ` Thomas Garnier
  2017-08-15 14:20       ` Thomas Garnier
  1 sibling, 2 replies; 231+ messages in thread
From: Ingo Molnar @ 2017-08-15  7:56 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* Thomas Garnier <thgarnie@google.com> wrote:

> > Do these changes get us closer to being able to build the kernel as truly 
> > position independent, i.e. to place it anywhere in the valid x86-64 address 
> > space? Or any other advantages?
> 
> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to 
> have a full randomized address space where position and order of sections are 
> completely random. There is still some work to get there but being able to build 
> a PIE kernel is a significant step.

So I _really_ dislike the whole PIE approach, because of the huge slowdown:

+config RANDOMIZE_BASE_LARGE
+       bool "Increase the randomization range of the kernel image"
+       depends on X86_64 && RANDOMIZE_BASE
+       select X86_PIE
+       select X86_MODULE_PLTS if MODULES
+       default n
+       ---help---
+         Build the kernel as a Position Independent Executable (PIE) and
+         increase the available randomization range from 1GB to 3GB.
+
+         This option impacts performance on kernel CPU intensive workloads up
+         to 10% due to PIE generated code. Impact on user-mode processes and
+         typical usage would be significantly less (0.50% when you build the
+         kernel).
+
+         The kernel and modules will generate slightly more assembly (1 to 2%
+         increase on the .text sections). The vmlinux binary will be
+         significantly smaller due to less relocations.

To put 10% kernel overhead into perspective: enabling this option wipes out about 
5-10 years worth of painstaking optimizations we've done to keep the kernel fast 
... (!!)

I think the fundamental flaw is the assumption that we need a PIE executable to 
have a freely relocatable kernel on 64-bit CPUs.

Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie 
-mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical 
x86-64 address space to randomize the location of kernel text. The location of 
modules can be further randomized within that 2GB window.

It should have far less performance impact than the register-losing and 
overhead-inducing -fpie / -mcmodel=large (for modules) execution models.

My quick guess is that the performance impact might be close to zero in fact.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-11 12:41 ` Ingo Molnar
  2017-08-11 15:09   ` Thomas Garnier
@ 2017-08-11 15:09   ` Thomas Garnier
  2017-08-15  7:56     ` Ingo Molnar
  2017-08-15  7:56     ` Ingo Molnar
  1 sibling, 2 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-11 15:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Fri, Aug 11, 2017 at 5:41 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> Changes:
>>  - v2:
>>    - Add support for global stack cookie while compiler default to fs without
>>      mcmodel=kernel
>>    - Change patch 7 to correctly jump out of the identity mapping on kexec load
>>      preserve.
>>
>> These patches make the changes necessary to build the kernel as Position
>> Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
>> the top 2G of the virtual address space. It allows to optionally extend the
>> KASLR randomization range from 1G to 3G.
>
> So this:
>
>  61 files changed, 923 insertions(+), 299 deletions(-)
>
> ... is IMHO an _awful_ lot of churn and extra complexity in pretty fragile pieces
> of code, to gain what appears to be only ~1.5 more bits of randomization!

The range increase is a way to use PIE right away.

>
> Do these changes get us closer to being able to build the kernel as truly position
> independent, i.e. to place it anywhere in the valid x86-64 address space? Or any
> other advantages?

Yes, PIE allows us to put the kernel anywhere in memory. It will allow
us to have a fully randomized address space where the position and order of
sections are completely random. There is still some work to get there
but being able to build a PIE kernel is a significant step.

>
> Thanks,
>
>         Ingo



-- 
Thomas

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-10 17:25 Thomas Garnier
@ 2017-08-11 12:41 ` Ingo Molnar
  2017-08-11 15:09   ` Thomas Garnier
  2017-08-11 15:09   ` Thomas Garnier
  2017-08-11 12:41 ` Ingo Molnar
  1 sibling, 2 replies; 231+ messages in thread
From: Ingo Molnar @ 2017-08-11 12:41 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* Thomas Garnier <thgarnie@google.com> wrote:

> Changes:
>  - v2:
>    - Add support for global stack cookie while compiler default to fs without
>      mcmodel=kernel
>    - Change patch 7 to correctly jump out of the identity mapping on kexec load
>      preserve.
> 
> These patches make the changes necessary to build the kernel as Position
> Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
> the top 2G of the virtual address space. It allows to optionally extend the
> KASLR randomization range from 1G to 3G.

So this:

 61 files changed, 923 insertions(+), 299 deletions(-)

... is IMHO an _awful_ lot of churn and extra complexity in pretty fragile pieces 
of code, to gain what appears to be only ~1.5 more bits of randomization!
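
(For scale: assuming the usual 2 MiB slot alignment, a 1G range gives 512
possible positions, i.e. 9 bits, while 3G gives 1536 positions, about 10.6
bits, hence the ~1.5 extra bits.)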

Do these changes get us closer to being able to build the kernel as truly position 
independent, i.e. to place it anywhere in the valid x86-64 address space? Or any 
other advantages?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 231+ messages in thread

* x86: PIE support and option to extend KASLR randomization
@ 2017-08-10 17:25 Thomas Garnier
  2017-08-11 12:41 ` Ingo Molnar
  2017-08-11 12:41 ` Ingo Molnar
  0 siblings, 2 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-08-10 17:25 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Thomas Garnier, Matthias Kaehlcke, Boris Ostrovsky,
	Juergen Gross, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown
  Cc: x86, linux-crypto, linux-kernel, xen-devel, kvm, linux-pm,
	linux-arch, linux-sparse, kernel-hardening

Changes:
 - v2:
   - Add support for a global stack cookie, since the compiler defaults to an
     fs-based cookie when mcmodel=kernel is not used (a rough sketch follows
     after this list).
   - Change patch 7 to correctly jump out of the identity mapping on kexec load
     preserve.
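
A rough sketch of why the cookie location matters (illustrative only, not
code from the series; exact sequences depend on the compiler and on options
such as -mstack-protector-guard=, where available):

  /* Any function with an on-stack buffer gets a canary check; what changes
   * is where the canary value is loaded from:
   *   default (userspace TLS ABI):  mov %fs:0x28, %rax
   *       - assumes an %fs TLS base the kernel does not set up
   *   with mcmodel=kernel:          the per-cpu %gs-based canary
   *   global cookie (this series):  mov __stack_chk_guard(%rip), %rax
   *       - one shared cookie symbol, which also works for PIE code
   */
  int first_char(const char *src)
  {
          char buf[64];
          unsigned int i;

          for (i = 0; src[i] && i < sizeof(buf) - 1; i++)
                  buf[i] = src[i];
          buf[i] = '\0';

          return buf[0];
  }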

These patches make the changes necessary to build the kernel as Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space. It allows the KASLR randomization
range to be optionally extended from 1G to 3G.

Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler
changes, PIE support and KASLR in general.

The patches:
 - 1-3, 5-15: Change in assembly code to be PIE compliant.
 - 4: Add a new _ASM_GET_PTR macro to fetch a symbol address generically (a
       rough sketch follows after this list).
 - 16: Adapt percpu design to work correctly when PIE is enabled.
 - 17: Provide an option to default visibility to hidden except for key symbols.
       It removes errors between compilation units.
 - 18: Adapt relocation tool to handle PIE binary correctly.
 - 19: Add support for global cookie
 - 20: Add the CONFIG_X86_PIE option (off by default)
 - 21: Adapt relocation tool to generate a 64-bit relocation table.
 - 22: Add options to build modules as mcmodel=large and dynamically create a
       PLT for relative references out of range (adapted from arm64).
 - 23: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
       from 1G to 3G (off by default).
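
For patch 4, the underlying problem is that assembly which hard-codes an
absolute "movq $sym, %reg" (fine under mcmodel=kernel) needs a RIP-relative
"leaq sym(%rip), %reg" once the kernel is PIE. A hypothetical shape of such
a macro (the name and expansion here are illustrative, not copied from the
patch):

  #ifdef CONFIG_X86_PIE
  # define GET_SYM_PTR(sym, reg)  leaq sym(%rip), reg
  #else
  # define GET_SYM_PTR(sym, reg)  movq $sym, reg
  #endif

Having one macro lets the same assembly source build both ways without
sprinkling CONFIG_X86_PIE checks at every site.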

Performance/Size impact:

Hackbench (50% and 1600% loads):
 - PIE disabled: no significant change (-0.50% / +0.50%)
 - PIE enabled: 7% to 8% on half load, 10% on heavy load.

These results are in line with existing research on user-mode PIE
impact on CPU-intensive benchmarks (around 10% on x86_64).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-1% / +1%)
 - PIE enabled: 3% to 4%

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (-0.22% / +0.06%)
 - PIE enabled: around 0.50%
 System Time:
 - PIE disabled: no significant change (-0.99% / -1.28%)
 - PIE enabled: 5% to 6%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: 472928672 bytes (-0.000169% from baseline)
 - PIE enabled: 216878461 bytes (-54.14% from baseline)
 .text sections:
 - PIE disabled: 9373572 bytes (+0.04% from baseline)
 - PIE enabled: 9499138 bytes (+1.38% from baseline)

The big decrease in vmlinux file size is due to the lower number of
relocations appended to the file.

diffstat:
 arch/x86/Kconfig                             |   42 +++++
 arch/x86/Makefile                            |   28 +++
 arch/x86/boot/boot.h                         |    2 
 arch/x86/boot/compressed/Makefile            |    5 
 arch/x86/boot/compressed/misc.c              |   10 +
 arch/x86/crypto/aes-x86_64-asm_64.S          |   45 +++---
 arch/x86/crypto/aesni-intel_asm.S            |   14 +
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |    6 
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |   42 ++---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |   44 +++---
 arch/x86/crypto/camellia-x86_64-asm_64.S     |    8 -
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    |   50 +++---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    |   44 +++---
 arch/x86/crypto/des3_ede-asm_64.S            |   96 ++++++++-----
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |    4 
 arch/x86/crypto/glue_helper-asm-avx.S        |    4 
 arch/x86/crypto/glue_helper-asm-avx2.S       |    6 
 arch/x86/entry/entry_32.S                    |    3 
 arch/x86/entry/entry_64.S                    |   29 ++-
 arch/x86/include/asm/asm.h                   |   13 +
 arch/x86/include/asm/bug.h                   |    2 
 arch/x86/include/asm/jump_label.h            |    8 -
 arch/x86/include/asm/kvm_host.h              |    6 
 arch/x86/include/asm/module.h                |   17 ++
 arch/x86/include/asm/page_64_types.h         |    9 +
 arch/x86/include/asm/paravirt_types.h        |   12 +
 arch/x86/include/asm/percpu.h                |   25 ++-
 arch/x86/include/asm/pm-trace.h              |    2 
 arch/x86/include/asm/processor.h             |   11 -
 arch/x86/include/asm/setup.h                 |    2 
 arch/x86/include/asm/stackprotector.h        |   19 +-
 arch/x86/kernel/Makefile                     |    2 
 arch/x86/kernel/acpi/wakeup_64.S             |   31 ++--
 arch/x86/kernel/asm-offsets.c                |    3 
 arch/x86/kernel/asm-offsets_32.c             |    3 
 arch/x86/kernel/asm-offsets_64.c             |    3 
 arch/x86/kernel/cpu/common.c                 |    7 
 arch/x86/kernel/head64.c                     |   30 +++-
 arch/x86/kernel/head_32.S                    |    3 
 arch/x86/kernel/head_64.S                    |   46 +++++-
 arch/x86/kernel/kvm.c                        |    6 
 arch/x86/kernel/module-plts.c                |  198 +++++++++++++++++++++++++++
 arch/x86/kernel/module.c                     |   18 +-
 arch/x86/kernel/module.lds                   |    4 
 arch/x86/kernel/process.c                    |    5 
 arch/x86/kernel/relocate_kernel_64.S         |    8 -
 arch/x86/kernel/setup_percpu.c               |    2 
 arch/x86/kernel/vmlinux.lds.S                |   13 +
 arch/x86/kvm/svm.c                           |    4 
 arch/x86/lib/cmpxchg16b_emu.S                |    8 -
 arch/x86/power/hibernate_asm_64.S            |    4 
 arch/x86/tools/relocs.c                      |  134 +++++++++++++++---
 arch/x86/tools/relocs.h                      |    4 
 arch/x86/tools/relocs_common.c               |   15 +-
 arch/x86/xen/xen-asm.S                       |   12 -
 arch/x86/xen/xen-asm.h                       |    3 
 arch/x86/xen/xen-head.S                      |    9 -
 include/asm-generic/sections.h               |    6 
 include/linux/compiler.h                     |    8 +
 init/Kconfig                                 |    9 +
 kernel/kallsyms.c                            |   16 +-
 61 files changed, 923 insertions(+), 299 deletions(-)

^ permalink raw reply	[flat|nested] 231+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-07-18 22:33 Thomas Garnier
@ 2017-07-19 14:08 ` Christopher Lameter
  2017-07-19 14:08 ` Christopher Lameter
  1 sibling, 0 replies; 231+ messages in thread
From: Christopher Lameter @ 2017-07-19 14:08 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Andy Lutomirski, Borislav Petkov,
	Kirill A . Shutemov, Brian Gerst, Borislav Petkov,
	Christian Borntraeger, Rafael J . Wysocki

On Tue, 18 Jul 2017, Thomas Garnier wrote:

> Performance/Size impact:
> Hackbench (50% and 1600% loads):
>  - PIE enabled: 7% to 8% on half load, 10% on heavy load.
> slab_test (average of 10 runs):
>  - PIE enabled: 3% to 4%
> Kernbench (average of 10 Half and Optimal runs):
>  - PIE enabled: 5% to 6%
>
> Size of vmlinux (Ubuntu configuration):
>  File size:
>  - PIE disabled: 472928672 bytes (-0.000169% from baseline)
>  - PIE enabled: 216878461 bytes (-54.14% from baseline)

Maybe we need something like CONFIG_PARANOIA so that we can determine at
build time how much performance we want to sacrifice for hardening?

It's going to be difficult to understand what all these hardening config
options do.

^ permalink raw reply	[flat|nested] 231+ messages in thread

* x86: PIE support and option to extend KASLR randomization
@ 2017-07-18 22:33 Thomas Garnier
  2017-07-19 14:08 ` Christopher Lameter
  2017-07-19 14:08 ` Christopher Lameter
  0 siblings, 2 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-07-18 22:33 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Andy Lutomirski, Borislav Petkov,
	Kirill A . Shutemov, Brian Gerst, Borislav Petkov,
	Christian Borntraeger, Rafael J . Wysocki
  Cc: x86, linux-crypto, linux-kernel, xen-devel, kvm, linux-pm,
	linux-arch, linux-sparse, kernel-hardening

These patches make the changes necessary to build the kernel as Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space. It allows the KASLR randomization
range to be optionally extended from 1G to 3G.

Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler
changes, PIE support and KASLR in general.

The patches:
 - 1-3, 5-15: Change in assembly code to be PIE compliant.
 - 4: Add a new _ASM_GET_PTR macro to fetch a symbol address generically.
 - 16: Adapt percpu design to work correctly when PIE is enabled.
 - 17: Provide an option to default visibility to hidden except for key symbols.
       It removes errors between compilation units.
 - 18: Adapt relocation tool to handle PIE binary correctly.
 - 19: Add the CONFIG_X86_PIE option (off by default)
 - 20: Adapt relocation tool to generate a 64-bit relocation table.
 - 21: Add options to build modules as mcmodel=large and dynamically create a
       PLT for relative references out of range (adapted from arm64; a rough
       sketch of such a PLT entry follows after this list).
 - 22: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
       from 1G to 3G (off by default).
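
For patch 21, the idea borrowed from arm64 is that a 32-bit relative call
only reaches +/-2G; when a module ends up farther than that from its target,
the relocation is redirected to a small trampoline allocated with the module
that materializes the full 64-bit address and jumps to it. One plausible
shape of such an entry (illustrative only, not the layout from the patch;
u8/u64 are the usual <linux/types.h> types):

  /* veneer:  movabsq $target, %r11   (49 bb <imm64>)
   *          jmpq    *%r11           (41 ff e3)
   * %r11 is call-clobbered and not used for argument passing, so it is a
   * safe scratch register for a call/jump trampoline. */
  struct plt_entry_sketch {
          u8  mov_opcode[2];   /* 0x49, 0xbb: movabs imm64, %r11 */
          u64 target;          /* absolute destination address   */
          u8  jmp_opcode[3];   /* 0x41, 0xff, 0xe3: jmpq *%r11   */
  } __packed;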

Performance/Size impact:

Hackbench (50% and 1600% loads):
 - PIE disabled: no significant change (-0.50% / +0.50%)
 - PIE enabled: 7% to 8% on half load, 10% on heavy load.

These results are in line with existing research on user-mode PIE
impact on CPU-intensive benchmarks (around 10% on x86_64).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-1% / +1%)
 - PIE enabled: 3% to 4%

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (-0.22% / +0.06%)
 - PIE enabled: around 0.50%
 System Time:
 - PIE disabled: no significant change (-0.99% / -1.28%)
 - PIE enabled: 5% to 6%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: 472928672 bytes (-0.000169% from baseline)
 - PIE enabled: 216878461 bytes (-54.14% from baseline)
 .text sections:
 - PIE disabled: 9373572 bytes (+0.04% from baseline)
 - PIE enabled: 9499138 bytes (+1.38% from baseline)

The big decrease in vmlinux file size is due to the lower number of
relocations appended to the file.

diffstat:
 arch/x86/Kconfig                             |   37 +++++
 arch/x86/Makefile                            |   17 ++
 arch/x86/boot/boot.h                         |    2 
 arch/x86/boot/compressed/Makefile            |    5 
 arch/x86/boot/compressed/misc.c              |   10 +
 arch/x86/crypto/aes-x86_64-asm_64.S          |   45 +++---
 arch/x86/crypto/aesni-intel_asm.S            |   14 +
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |    6 
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |   42 ++---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |   44 +++---
 arch/x86/crypto/camellia-x86_64-asm_64.S     |    8 -
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    |   50 +++---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    |   44 +++---
 arch/x86/crypto/des3_ede-asm_64.S            |   96 ++++++++-----
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |    4 
 arch/x86/crypto/glue_helper-asm-avx.S        |    4 
 arch/x86/crypto/glue_helper-asm-avx2.S       |    6 
 arch/x86/entry/entry_64.S                    |   26 ++-
 arch/x86/include/asm/asm.h                   |   13 +
 arch/x86/include/asm/bug.h                   |    2 
 arch/x86/include/asm/jump_label.h            |    8 -
 arch/x86/include/asm/kvm_host.h              |    6 
 arch/x86/include/asm/module.h                |   16 ++
 arch/x86/include/asm/page_64_types.h         |    9 +
 arch/x86/include/asm/paravirt_types.h        |   12 +
 arch/x86/include/asm/percpu.h                |   25 ++-
 arch/x86/include/asm/pm-trace.h              |    2 
 arch/x86/include/asm/processor.h             |    8 -
 arch/x86/include/asm/setup.h                 |    2 
 arch/x86/kernel/Makefile                     |    2 
 arch/x86/kernel/acpi/wakeup_64.S             |   31 ++--
 arch/x86/kernel/cpu/common.c                 |    4 
 arch/x86/kernel/head64.c                     |   28 +++
 arch/x86/kernel/head_64.S                    |   47 +++++-
 arch/x86/kernel/kvm.c                        |    6 
 arch/x86/kernel/module-plts.c                |  198 +++++++++++++++++++++++++++
 arch/x86/kernel/module.c                     |   18 +-
 arch/x86/kernel/module.lds                   |    4 
 arch/x86/kernel/relocate_kernel_64.S         |    2 
 arch/x86/kernel/setup_percpu.c               |    2 
 arch/x86/kernel/vmlinux.lds.S                |   13 +
 arch/x86/kvm/svm.c                           |    4 
 arch/x86/lib/cmpxchg16b_emu.S                |    8 -
 arch/x86/power/hibernate_asm_64.S            |    4 
 arch/x86/tools/relocs.c                      |  134 +++++++++++++++---
 arch/x86/tools/relocs.h                      |    4 
 arch/x86/tools/relocs_common.c               |   15 +-
 arch/x86/xen/xen-asm.S                       |   12 -
 arch/x86/xen/xen-asm.h                       |    3 
 arch/x86/xen/xen-head.S                      |    9 -
 include/asm-generic/sections.h               |    6 
 include/linux/compiler.h                     |    8 +
 init/Kconfig                                 |    9 +
 kernel/kallsyms.c                            |   16 +-
 54 files changed, 868 insertions(+), 282 deletions(-)

^ permalink raw reply	[flat|nested] 231+ messages in thread

* x86: PIE support and option to extend KASLR randomization
@ 2017-07-18 22:33 Thomas Garnier
  0 siblings, 0 replies; 231+ messages in thread
From: Thomas Garnier @ 2017-07-18 22:33 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Andy Lutomirski, Borislav Petkov,
	Kirill A . Shutemov, Brian Gerst, Borislav Petkov,
	Christian Borntraeger, Rafael J . Wysocki
  Cc: linux-arch, kvm, linux-pm, x86, linux-kernel, linux-sparse,
	linux-crypto, kernel-hardening, xen-devel

These patches make the changes necessary to build the kernel as Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space. It allows to optionally extend the
KASLR randomization range from 1G to 3G.

Thanks a lot to Ard Biesheuvel & Kees Cook on their feedback on compiler
changes, PIE support and KASLR in general.

The patches:
 - 1-3, 5-15: Change in assembly code to be PIE compliant.
 - 4: Add a new _ASM_GET_PTR macro to fetch a symbol address generically.
 - 16: Adapt percpu design to work correctly when PIE is enabled.
 - 17: Provide an option to default visibility to hidden except for key symbols.
       It removes errors between compilation units.
 - 18: Adapt relocation tool to handle PIE binary correctly.
 - 19: Add the CONFIG_X86_PIE option (off by default)
 - 20: Adapt relocation tool to generate a 64-bit relocation table.
 - 21: Add options to build modules as mcmodel=large and dynamically create a
       PLT for relative references out of range (adapted from arm64).
 - 22: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
       from 1G to 3G (off by default).

Performance/Size impact:

Hackbench (50% and 1600% loads):
 - PIE disabled: no significant change (-0.50% / +0.50%)
 - PIE enabled: 7% to 8% on half load, 10% on heavy load.

These results are aligned with the different research on user-mode PIE
impact on cpu intensive benchmarks (around 10% on x86_64).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-1% / +1%)
 - PIE enabled: 3% to 4%

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (-0.22% / +0.06%)
 - PIE enabled: around 0.50%
 System Time:
 - PIE disabled: no significant change (-0.99% / -1.28%)
 - PIE enabled: 5% to 6%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: 472928672 bytes (-0.000169% from baseline)
 - PIE enabled: 216878461 bytes (-54.14% from baseline)
 .text sections:
 - PIE disabled: 9373572 bytes (+0.04% from baseline)
 - PIE enabled: 9499138 bytes (+1.38% from baseline)

The big decrease in vmlinux file size is due to the lower number of
relocations appended to the file.

diffstat:
 arch/x86/Kconfig                             |   37 +++++
 arch/x86/Makefile                            |   17 ++
 arch/x86/boot/boot.h                         |    2 
 arch/x86/boot/compressed/Makefile            |    5 
 arch/x86/boot/compressed/misc.c              |   10 +
 arch/x86/crypto/aes-x86_64-asm_64.S          |   45 +++---
 arch/x86/crypto/aesni-intel_asm.S            |   14 +
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |    6 
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |   42 ++---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |   44 +++---
 arch/x86/crypto/camellia-x86_64-asm_64.S     |    8 -
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    |   50 +++---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    |   44 +++---
 arch/x86/crypto/des3_ede-asm_64.S            |   96 ++++++++-----
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |    4 
 arch/x86/crypto/glue_helper-asm-avx.S        |    4 
 arch/x86/crypto/glue_helper-asm-avx2.S       |    6 
 arch/x86/entry/entry_64.S                    |   26 ++-
 arch/x86/include/asm/asm.h                   |   13 +
 arch/x86/include/asm/bug.h                   |    2 
 arch/x86/include/asm/jump_label.h            |    8 -
 arch/x86/include/asm/kvm_host.h              |    6 
 arch/x86/include/asm/module.h                |   16 ++
 arch/x86/include/asm/page_64_types.h         |    9 +
 arch/x86/include/asm/paravirt_types.h        |   12 +
 arch/x86/include/asm/percpu.h                |   25 ++-
 arch/x86/include/asm/pm-trace.h              |    2 
 arch/x86/include/asm/processor.h             |    8 -
 arch/x86/include/asm/setup.h                 |    2 
 arch/x86/kernel/Makefile                     |    2 
 arch/x86/kernel/acpi/wakeup_64.S             |   31 ++--
 arch/x86/kernel/cpu/common.c                 |    4 
 arch/x86/kernel/head64.c                     |   28 +++
 arch/x86/kernel/head_64.S                    |   47 +++++-
 arch/x86/kernel/kvm.c                        |    6 
 arch/x86/kernel/module-plts.c                |  198 +++++++++++++++++++++++++++
 arch/x86/kernel/module.c                     |   18 +-
 arch/x86/kernel/module.lds                   |    4 
 arch/x86/kernel/relocate_kernel_64.S         |    2 
 arch/x86/kernel/setup_percpu.c               |    2 
 arch/x86/kernel/vmlinux.lds.S                |   13 +
 arch/x86/kvm/svm.c                           |    4 
 arch/x86/lib/cmpxchg16b_emu.S                |    8 -
 arch/x86/power/hibernate_asm_64.S            |    4 
 arch/x86/tools/relocs.c                      |  134 +++++++++++++++---
 arch/x86/tools/relocs.h                      |    4 
 arch/x86/tools/relocs_common.c               |   15 +-
 arch/x86/xen/xen-asm.S                       |   12 -
 arch/x86/xen/xen-asm.h                       |    3 
 arch/x86/xen/xen-head.S                      |    9 -
 include/asm-generic/sections.h               |    6 
 include/linux/compiler.h                     |    8 +
 init/Kconfig                                 |    9 +
 kernel/kallsyms.c                            |   16 +-
 54 files changed, 868 insertions(+), 282 deletions(-)


Thread overview: 231+ messages
2017-10-04 21:19 x86: PIE support and option to extend KASLR randomization Thomas Garnier
2017-10-04 21:19 ` [RFC v3 01/27] x86/crypto: Adapt assembly for PIE support Thomas Garnier
2017-10-04 21:19 ` [RFC v3 02/27] x86: Use symbol name on bug table " Thomas Garnier
2017-10-04 21:19 ` [RFC v3 03/27] x86: Use symbol name in jump " Thomas Garnier
2017-10-04 21:19 ` [RFC v3 04/27] x86: Add macro to get symbol address " Thomas Garnier
2017-10-04 21:19 ` [RFC v3 05/27] x86: relocate_kernel - Adapt assembly " Thomas Garnier
2017-10-04 21:19 ` [RFC v3 06/27] x86/entry/64: " Thomas Garnier
2017-10-04 21:19 ` [RFC v3 07/27] x86: pm-trace - " Thomas Garnier
2017-10-04 21:19 ` [RFC v3 08/27] x86/CPU: " Thomas Garnier
2017-10-04 21:19 ` [RFC v3 09/27] x86/acpi: " Thomas Garnier
2017-10-04 21:19 ` [RFC v3 10/27] x86/boot/64: " Thomas Garnier
2017-10-04 21:19 ` [RFC v3 11/27] x86/power/64: " Thomas Garnier
2017-10-04 21:19 ` [RFC v3 12/27] x86/paravirt: " Thomas Garnier
2017-10-04 21:19 ` [RFC v3 13/27] x86/boot/64: Use _text in a global " Thomas Garnier
2017-10-04 21:19 ` [RFC v3 14/27] x86/percpu: Adapt percpu " Thomas Garnier
2017-10-04 21:19 ` [RFC v3 15/27] compiler: Option to default to hidden symbols Thomas Garnier
2017-10-04 21:19 ` [RFC v3 16/27] x86/relocs: Handle PIE relocations Thomas Garnier
2017-10-04 21:19 ` [RFC v3 17/27] xen: Adapt assembly for PIE support Thomas Garnier
2017-10-04 21:19 ` [RFC v3 18/27] kvm: " Thomas Garnier
2017-10-04 21:19 ` [RFC v3 19/27] x86: Support global stack cookie Thomas Garnier
2017-10-04 21:19 ` [RFC v3 20/27] x86/ftrace: Adapt function tracing for PIE support Thomas Garnier
2017-10-05 13:06   ` Steven Rostedt
2017-10-05 16:01     ` Thomas Garnier
2017-10-05 16:11       ` Steven Rostedt
2017-10-05 16:14         ` Thomas Garnier
2017-10-04 21:19 ` [RFC v3 21/27] x86/mm/dump_pagetables: Fix address markers index on x86_64 Thomas Garnier
2017-10-04 21:19 ` [RFC v3 22/27] x86/modules: Add option to start module section after kernel Thomas Garnier
2017-10-04 21:19 ` [RFC v3 23/27] x86/modules: Adapt module loading for PIE support Thomas Garnier
2017-10-04 21:20 ` [RFC v3 24/27] x86/mm: Make the x86 GOT read-only Thomas Garnier
2017-10-04 21:20 ` [RFC v3 25/27] x86/pie: Add option to build the kernel as PIE Thomas Garnier
2017-10-04 21:20 ` [RFC v3 26/27] x86/relocs: Add option to generate 64-bit relocations Thomas Garnier
2017-10-04 21:20 ` [RFC v3 27/27] x86/kaslr: Add option to extend KASLR range from 1GB to 3GB Thomas Garnier
  -- strict thread matches above, loose matches on Subject: below --
2017-10-04 21:19 x86: PIE support and option to extend KASLR randomization Thomas Garnier
2017-08-10 17:25 Thomas Garnier
2017-08-11 12:41 ` Ingo Molnar
2017-08-11 15:09   ` Thomas Garnier
2017-08-15  7:56     ` Ingo Molnar
2017-08-15 14:20       ` Thomas Garnier
2017-08-15 14:47         ` Daniel Micay
2017-08-15 14:58           ` Thomas Garnier
2017-08-16 15:12         ` Ingo Molnar
2017-08-16 16:09           ` Christopher Lameter
2017-08-16 16:26           ` Daniel Micay
2017-08-16 16:32             ` Ard Biesheuvel
2017-08-16 16:57           ` Thomas Garnier
2017-08-17  8:09             ` Ingo Molnar
2017-08-17 14:10               ` Thomas Garnier
2017-08-24 21:13                 ` Thomas Garnier
2017-08-24 21:42                   ` Linus Torvalds
2017-08-25 15:35                     ` Thomas Garnier
2017-08-25  1:07                   ` Steven Rostedt
2017-08-25  8:04                   ` Ingo Molnar
2017-08-25 15:05                     ` Thomas Garnier
2017-08-29 19:34                       ` Thomas Garnier
2017-09-21 15:59                         ` Ingo Molnar
2017-09-21 16:10                           ` Ard Biesheuvel
2017-09-21 21:21                             ` Thomas Garnier
2017-09-22  4:24                               ` Markus Trippelsdorf
2017-09-22 14:38                                 ` Thomas Garnier
2017-09-22 23:55                               ` Thomas Garnier
2017-09-21 21:16                           ` Thomas Garnier
2017-09-22  0:06                             ` Thomas Garnier
2017-09-22 16:32                             ` Ingo Molnar
2017-09-22 18:08                               ` Thomas Garnier
2017-09-23  9:43                                 ` Ingo Molnar
2017-10-02 20:28                                   ` Thomas Garnier
2017-09-22 18:38                               ` H. Peter Anvin
2017-09-22 18:57                                 ` Kees Cook
2017-09-22 19:06                                   ` H. Peter Anvin
2017-09-22 22:19                                     ` hjl.tools
2017-09-22 22:30                                     ` hjl.tools
2017-09-22 18:59                                 ` Thomas Garnier
2017-09-23  9:49                                 ` Ingo Molnar
2017-08-21 13:32           ` Peter Zijlstra
2017-08-21 14:28             ` Peter Zijlstra
2017-09-22 18:27               ` H. Peter Anvin
2017-09-23 10:00                 ` Ingo Molnar
2017-09-24 22:37                   ` Pavel Machek
2017-09-25  7:33                     ` Ingo Molnar
2017-10-06 10:39                       ` Pavel Machek
2017-10-20  8:13                         ` Ingo Molnar
2017-08-21 14:31         ` Peter Zijlstra
2017-08-21 15:57           ` Thomas Garnier
2017-08-28  1:26           ` H. Peter Anvin
2017-07-18 22:33 Thomas Garnier
2017-07-19 14:08 ` Christopher Lameter
