* x86: PIE support and option to extend KASLR randomization
@ 2017-07-18 22:33 Thomas Garnier
  0 siblings, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-07-18 22:33 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Andy Lutomirski, Borislav Petkov,
	Kirill A . Shutemov, Brian Gerst, Borislav Petkov,
	Christian Borntraeger, Rafael J . Wysocki
  Cc: linux-arch, kvm, linux-pm, x86, linux-kernel, linux-sparse,
	linux-crypto, kernel-hardening, xen-devel

These patches make the changes necessary to build the kernel as a Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space, which makes it possible to optionally
extend the KASLR randomization range from 1G to 3G.

Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler
changes, PIE support and KASLR in general.

The patches:
 - 1-3, 5-15: Changes in assembly code to make it PIE compliant (see the
       sketch after this list).
 - 4: Add a new _ASM_GET_PTR macro to fetch a symbol address generically.
 - 16: Adapt percpu design to work correctly when PIE is enabled.
 - 17: Provide an option to set the default symbol visibility to hidden, except
       for key symbols. This removes errors between compilation units.
 - 18: Adapt relocation tool to handle PIE binary correctly.
 - 19: Add the CONFIG_X86_PIE option (off by default).
 - 20: Adapt relocation tool to generate a 64-bit relocation table.
 - 21: Add options to build modules as mcmodel=large and dynamically create a
       PLT for relative references out of range (adapted from arm64).
 - 22: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
       from 1G to 3G (off by default).
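
To make the flavor of the assembly changes concrete, here is a small stand-alone
user-space sketch (illustrative only; example_table and get_example_table are
made-up names, not code from the series). It shows the RIP-relative form the
patches switch to, replacing absolute references such as
"movq $example_table, %reg" that only work when the image is linked in the top
2G (mcmodel=kernel); this is also, presumably, the kind of choice the
_ASM_GET_PTR helper of patch 4 abstracts. It builds and runs with a plain
"gcc -O2 demo.c":

#include <stdio.h>

unsigned long example_table[4];

/*
 * PIE-friendly form: compute the symbol address relative to the instruction
 * pointer, so the code works at whatever address the image was loaded.
 */
static unsigned long *get_example_table(void)
{
	unsigned long *p;

	asm ("leaq example_table(%%rip), %0" : "=r" (p));
	return p;
}

int main(void)
{
	/* Both lines should print the same address at any load address. */
	printf("C view:   %p\n", (void *)example_table);
	printf("asm view: %p\n", (void *)get_example_table());
	return 0;
}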

Performance/Size impact:

Hackbench (50% and 1600% loads):
 - PIE disabled: no significant change (-0.50% / +0.50%)
 - PIE enabled: 7% to 8% slowdown at half load, 10% at heavy load.

These results are in line with existing research on the impact of user-mode
PIE on CPU-intensive benchmarks (around 10% on x86_64).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-1% / +1%)
 - PIE enabled: 3% to 4% slowdown

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (-0.22% / +0.06%)
 - PIE enabled: around a 0.50% increase
 System Time:
 - PIE disabled: no significant change (-0.99% / -1.28%)
 - PIE enabled: 5% to 6% increase

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: 472928672 bytes (-0.000169% from baseline)
 - PIE enabled: 216878461 bytes (-54.14% from baseline)
 .text sections:
 - PIE disabled: 9373572 bytes (+0.04% from baseline)
 - PIE enabled: 9499138 bytes (+1.38% from baseline)

The large decrease in vmlinux file size is due to the much smaller number of
relocations appended to the file.
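
For context, those appended relocations are what allows the boot code to move
the kernel to its randomized address. Conceptually, the boot-time pass looks
like the simplified sketch below (illustrative only, not the actual
arch/x86/boot/compressed/misc.c implementation; names are made up):

#include <stdint.h>
#include <stddef.h>

/*
 * Simplified sketch of a boot-time relocation pass. 'relocs' stands in for
 * the table appended by the relocation tool: each entry is the link-time
 * address of a location in the image that embeds an absolute address and
 * therefore has to be shifted by the random displacement chosen by KASLR.
 */
void apply_relocations(uint8_t *image, const uint64_t *relocs, size_t count,
		       uint64_t link_base, uint64_t kaslr_delta)
{
	for (size_t i = 0; i < count; i++) {
		/* Translate the link-time address into a pointer into the
		 * loaded image, then adjust the value stored there. */
		uint64_t *site = (uint64_t *)(image + (relocs[i] - link_base));

		*site += kaslr_delta;
	}
}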

diffstat:
 arch/x86/Kconfig                             |   37 +++++
 arch/x86/Makefile                            |   17 ++
 arch/x86/boot/boot.h                         |    2 
 arch/x86/boot/compressed/Makefile            |    5 
 arch/x86/boot/compressed/misc.c              |   10 +
 arch/x86/crypto/aes-x86_64-asm_64.S          |   45 +++---
 arch/x86/crypto/aesni-intel_asm.S            |   14 +
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |    6 
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |   42 ++---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |   44 +++---
 arch/x86/crypto/camellia-x86_64-asm_64.S     |    8 -
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    |   50 +++---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    |   44 +++---
 arch/x86/crypto/des3_ede-asm_64.S            |   96 ++++++++-----
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |    4 
 arch/x86/crypto/glue_helper-asm-avx.S        |    4 
 arch/x86/crypto/glue_helper-asm-avx2.S       |    6 
 arch/x86/entry/entry_64.S                    |   26 ++-
 arch/x86/include/asm/asm.h                   |   13 +
 arch/x86/include/asm/bug.h                   |    2 
 arch/x86/include/asm/jump_label.h            |    8 -
 arch/x86/include/asm/kvm_host.h              |    6 
 arch/x86/include/asm/module.h                |   16 ++
 arch/x86/include/asm/page_64_types.h         |    9 +
 arch/x86/include/asm/paravirt_types.h        |   12 +
 arch/x86/include/asm/percpu.h                |   25 ++-
 arch/x86/include/asm/pm-trace.h              |    2 
 arch/x86/include/asm/processor.h             |    8 -
 arch/x86/include/asm/setup.h                 |    2 
 arch/x86/kernel/Makefile                     |    2 
 arch/x86/kernel/acpi/wakeup_64.S             |   31 ++--
 arch/x86/kernel/cpu/common.c                 |    4 
 arch/x86/kernel/head64.c                     |   28 +++
 arch/x86/kernel/head_64.S                    |   47 +++++-
 arch/x86/kernel/kvm.c                        |    6 
 arch/x86/kernel/module-plts.c                |  198 +++++++++++++++++++++++++++
 arch/x86/kernel/module.c                     |   18 +-
 arch/x86/kernel/module.lds                   |    4 
 arch/x86/kernel/relocate_kernel_64.S         |    2 
 arch/x86/kernel/setup_percpu.c               |    2 
 arch/x86/kernel/vmlinux.lds.S                |   13 +
 arch/x86/kvm/svm.c                           |    4 
 arch/x86/lib/cmpxchg16b_emu.S                |    8 -
 arch/x86/power/hibernate_asm_64.S            |    4 
 arch/x86/tools/relocs.c                      |  134 +++++++++++++++---
 arch/x86/tools/relocs.h                      |    4 
 arch/x86/tools/relocs_common.c               |   15 +-
 arch/x86/xen/xen-asm.S                       |   12 -
 arch/x86/xen/xen-asm.h                       |    3 
 arch/x86/xen/xen-head.S                      |    9 -
 include/asm-generic/sections.h               |    6 
 include/linux/compiler.h                     |    8 +
 init/Kconfig                                 |    9 +
 kernel/kallsyms.c                            |   16 +-
 54 files changed, 868 insertions(+), 282 deletions(-)


* Re: x86: PIE support and option to extend KASLR randomization
  2017-10-06 10:39                       ` Pavel Machek
@ 2017-10-20  8:13                         ` Ingo Molnar
  2017-10-20  8:13                         ` Ingo Molnar
  1 sibling, 0 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-10-20  8:13 UTC (permalink / raw)
  To: Pavel Machek
  Cc: H. Peter Anvin, Peter Zijlstra, Thomas Garnier, Herbert Xu,
	David S . Miller, Thomas Gleixner, Ingo Molnar, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Tejun Heo


* Pavel Machek <pavel@ucw.cz> wrote:

> On Mon 2017-09-25 09:33:42, Ingo Molnar wrote:
> > 
> > * Pavel Machek <pavel@ucw.cz> wrote:
> > 
> > > > For example, there would be collision with regular user-space mappings, right? 
> > > > Can local unprivileged users use mmap(MAP_FIXED) probing to figure out where 
> > > > the kernel lives?
> > > 
> > > Local unprivileged users can probably get your secret bits using cache probing 
> > > and jump prediction buffers.
> > > 
> > > Yes, you don't want to leak the information using mmap(MAP_FIXED), but CPU will 
> > > leak it for you, anyway.
> > 
> > Depends on the CPU I think, and CPU vendors are busy trying to mitigate this 
> > angle.
> 
> I believe any x86 CPU running Linux will leak it. And with CPU vendors
> putting "artificial intelligence" into branch prediction, no, I don't
> think it is going to get better.
> 
> That does not mean we should not prevent the mmap() info leak, but...

That might or might not be so, but there's a world of difference between
running a relatively long statistical attack to figure out the kernel's
location, versus being able to programmatically probe the kernel's location
by using large MAP_FIXED user-space mmap()s, within a few dozen microseconds
or so and with a 100% guaranteed, non-statistical result.

Thanks,

	Ingo


* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-25  7:33                     ` Ingo Molnar
  2017-10-06 10:39                       ` Pavel Machek
@ 2017-10-06 10:39                       ` Pavel Machek
  2017-10-20  8:13                         ` Ingo Molnar
  2017-10-20  8:13                         ` Ingo Molnar
  1 sibling, 2 replies; 106+ messages in thread
From: Pavel Machek @ 2017-10-06 10:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Peter Zijlstra, Thomas Garnier, Herbert Xu,
	David S . Miller, Thomas Gleixner, Ingo Molnar, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Tejun Heo

On Mon 2017-09-25 09:33:42, Ingo Molnar wrote:
> 
> * Pavel Machek <pavel@ucw.cz> wrote:
> 
> > > For example, there would be collision with regular user-space mappings, right? 
> > > Can local unprivileged users use mmap(MAP_FIXED) probing to figure out where 
> > > the kernel lives?
> > 
> > Local unprivileged users can probably get your secret bits using cache probing 
> > and jump prediction buffers.
> > 
> > Yes, you don't want to leak the information using mmap(MAP_FIXED), but CPU will 
> > leak it for you, anyway.
> 
> Depends on the CPU I think, and CPU vendors are busy trying to mitigate this 
> angle.

I believe any x86 CPU running Linux will leak it. And with CPU vendors
putting "artificial intelligence" into branch prediction, no, I don't
think it is going to get better.

That does not mean we should not prevent the mmap() info leak, but...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* x86: PIE support and option to extend KASLR randomization
@ 2017-10-04 21:19 ` Thomas Garnier
  0 siblings, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-10-04 21:19 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Kees Cook, Matthias Kaehlcke, Tom Lendacky,
	Andy Lutomirski, Kirill A . Shutemov, Borislav Petkov,
	Rafael J . Wysocki, Len Brown, Pavel Machek, Juergen Gross,
	Chris Wright, Alok Kataria, Rusty Russell, Tejun Heo,
	Christoph Lameter
  Cc: x86, linux-crypto, linux-kernel, linux-pm, virtualization,
	xen-devel, linux-arch, linux-sparse, kvm, linux-doc,
	kernel-hardening

These patches make the changes necessary to build the kernel as a Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space, which makes it possible to optionally
extend the KASLR randomization range from 1G to 3G.

Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler
changes, PIE support and KASLR in general. Thanks to Roland McGrath for his
feedback on using -pie versus --emit-relocs and for the details on compiler
code generation.

The patches:
 - 1-3, 5-13, 17-18: Changes in assembly code to make it PIE compliant.
 - 4: Add a new _ASM_GET_PTR macro to fetch a symbol address generically.
 - 14: Adapt percpu design to work correctly when PIE is enabled.
 - 15: Provide an option to set the default symbol visibility to hidden, except
       for key symbols. This removes errors between compilation units.
 - 16: Adapt relocation tool to handle PIE binary correctly.
 - 19: Add support for the global stack cookie.
 - 20: Support ftrace with PIE (used on Ubuntu config).
 - 21: Fix incorrect address marker on dump_pagetables.
 - 22: Add option to move the module section just after the kernel.
 - 23: Adapt module loading to support PIE with a dynamic GOT (see the sketch
       after this list).
 - 24: Make the GOT read-only.
 - 25: Add the CONFIG_X86_PIE option (off by default).
 - 26: Adapt relocation tool to generate a 64-bit relocation table.
 - 27: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
       from 1G to 3G (off by default).
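
As a rough illustration of the dynamic GOT approach of patches 23-24, here is a
tiny user-space sketch with made-up names; it only shows the indirection idea,
not the series' implementation. Instead of embedding the absolute address of a
possibly out-of-range symbol at every call site, the call site reads the target
address from a nearby pointer-sized slot (a GOT entry) that the module loader
fills in once at load time; patch 24 then makes those slots read-only:

#include <stdio.h>

void target_function(void);

/* One slot of a miniature "GOT", filled in by the "loader" step in main(). */
static void (*got_slot_target_function)(void);

static void call_via_got(void)
{
	/* An indirect call through the slot reaches the target no matter
	 * how far away it ended up in the address space. */
	got_slot_target_function();
}

void target_function(void)
{
	puts("called through the GOT slot");
}

int main(void)
{
	got_slot_target_function = target_function;	/* module-load step */
	call_via_got();
	return 0;
}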

Performance/Size impact:

Size of vmlinux (Default configuration):
 File size:
 - PIE disabled: +0.000031%
 - PIE enabled: -3.210% (fewer relocations)
 .text section:
 - PIE disabled: +0.000644%
 - PIE enabled: +0.837%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: -0.201%
 - PIE enabled: -0.082%
 .text section:
 - PIE disabled: same
 - PIE enabled: +1.319%

Size of vmlinux (Default configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +0.814%

Size of vmlinux (Ubuntu configuration + ORC):
 File size:
 - PIE enabled: -3.167%
 .text section:
 - PIE enabled: +1.26%

The size increase is mainly due to not having access to the 32-bit signed
relocation that can be used with mcmodel=kernel. A small part is due to reduced
optimization for PIE code. A gcc bug [1] was opened to request better code
generation for kernel PIE.
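
A tiny example of the kind of construct where the 32-bit signed relocation
saves an instruction (illustrative; the exact output depends on the compiler
version and flags, and the symbol name is made up):

unsigned long table[1024];

unsigned long lookup(unsigned long idx)
{
	/*
	 * With -mcmodel=kernel the compiler can fold the symbol address into
	 * a 32-bit sign-extended displacement and index in one instruction:
	 *
	 *	movq	table(,%rdi,8), %rax
	 *
	 * With -fpie that absolute form is unavailable, so the base address
	 * is typically materialized first, costing an extra instruction:
	 *
	 *	leaq	table(%rip), %rax
	 *	movq	(%rax,%rdi,8), %rax
	 */
	return table[idx];
}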

Hackbench (50% and 1600% on thread/process for pipe/sockets):
 - PIE disabled: no significant change (avg +0.1% on latest test).
 - PIE enabled: between -0.50% and +0.86% on average (default and Ubuntu configs).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-2% on latest run, likely noise).
 - PIE enabled: between -1% and +0.8% on latest runs.

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (avg -0.239%)
 - PIE enabled: average +0.07%
 System Time:
 - PIE disabled: no significant change (avg -0.277%)
 - PIE enabled: average +0.7%

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

diffstat:
 Documentation/x86/x86_64/mm.txt              |    3 
 arch/x86/Kconfig                             |   37 ++++
 arch/x86/Makefile                            |   14 +
 arch/x86/boot/boot.h                         |    2 
 arch/x86/boot/compressed/Makefile            |    5 
 arch/x86/boot/compressed/misc.c              |   10 +
 arch/x86/crypto/aes-x86_64-asm_64.S          |   45 +++--
 arch/x86/crypto/aesni-intel_asm.S            |   14 +
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |    6 
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |   42 ++---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |   44 ++---
 arch/x86/crypto/camellia-x86_64-asm_64.S     |    8 -
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    |   50 +++---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    |   44 +++--
 arch/x86/crypto/des3_ede-asm_64.S            |   96 ++++++++----
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |    4 
 arch/x86/crypto/glue_helper-asm-avx.S        |    4 
 arch/x86/crypto/glue_helper-asm-avx2.S       |    6 
 arch/x86/entry/entry_32.S                    |    3 
 arch/x86/entry/entry_64.S                    |   29 ++-
 arch/x86/include/asm/asm.h                   |   13 +
 arch/x86/include/asm/bug.h                   |    2 
 arch/x86/include/asm/ftrace.h                |   23 ++-
 arch/x86/include/asm/jump_label.h            |    8 -
 arch/x86/include/asm/kvm_host.h              |    6 
 arch/x86/include/asm/module.h                |   14 +
 arch/x86/include/asm/page_64_types.h         |    9 +
 arch/x86/include/asm/paravirt_types.h        |   12 +
 arch/x86/include/asm/percpu.h                |   25 ++-
 arch/x86/include/asm/pgtable_64_types.h      |    6 
 arch/x86/include/asm/pm-trace.h              |    2 
 arch/x86/include/asm/processor.h             |   12 +
 arch/x86/include/asm/sections.h              |    4 
 arch/x86/include/asm/setup.h                 |    2 
 arch/x86/include/asm/stackprotector.h        |   19 +-
 arch/x86/kernel/acpi/wakeup_64.S             |   31 ++--
 arch/x86/kernel/asm-offsets.c                |    3 
 arch/x86/kernel/asm-offsets_32.c             |    3 
 arch/x86/kernel/asm-offsets_64.c             |    3 
 arch/x86/kernel/cpu/common.c                 |    7 
 arch/x86/kernel/cpu/microcode/core.c         |    4 
 arch/x86/kernel/ftrace.c                     |  168 ++++++++++++++--------
 arch/x86/kernel/head64.c                     |   32 +++-
 arch/x86/kernel/head_32.S                    |    3 
 arch/x86/kernel/head_64.S                    |   41 ++++-
 arch/x86/kernel/kvm.c                        |    6 
 arch/x86/kernel/module.c                     |  204 ++++++++++++++++++++++++++-
 arch/x86/kernel/module.lds                   |    3 
 arch/x86/kernel/process.c                    |    5 
 arch/x86/kernel/relocate_kernel_64.S         |    8 -
 arch/x86/kernel/setup_percpu.c               |    2 
 arch/x86/kernel/vmlinux.lds.S                |   13 +
 arch/x86/kvm/svm.c                           |    4 
 arch/x86/lib/cmpxchg16b_emu.S                |    8 -
 arch/x86/mm/dump_pagetables.c                |   11 -
 arch/x86/power/hibernate_asm_64.S            |    4 
 arch/x86/tools/relocs.c                      |  170 ++++++++++++++++++++--
 arch/x86/tools/relocs.h                      |    4 
 arch/x86/tools/relocs_common.c               |   15 +
 arch/x86/xen/xen-asm.S                       |   12 -
 arch/x86/xen/xen-head.S                      |    9 -
 arch/x86/xen/xen-pvh.S                       |   13 +
 drivers/base/firmware_class.c                |    4 
 include/asm-generic/sections.h               |    6 
 include/asm-generic/vmlinux.lds.h            |   12 +
 include/linux/compiler.h                     |    8 +
 init/Kconfig                                 |    9 +
 kernel/kallsyms.c                            |   16 +-
 kernel/trace/trace.h                         |    4 
 lib/dynamic_debug.c                          |    4 
 70 files changed, 1109 insertions(+), 363 deletions(-)


* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-23  9:43                                 ` Ingo Molnar
  2017-10-02 20:28                                   ` Thomas Garnier
@ 2017-10-02 20:28                                   ` Thomas Garnier
  1 sibling, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-10-02 20:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Sat, Sep 23, 2017 at 2:43 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> >   2) we first implement the additional entropy bits that Linus suggested.
>> >
>> > does this work for you?
>>
>> Sure, I can look at how feasible that is. If it is, can I send
>> everything as part of the same patch set? The additional entropy would
>> be enabled for all KASLR but PIE will be off-by-default of course.
>
> Sure, can all be part of the same series.

I looked deeper into the change Linus proposed (moving the .text section
based on the cache line). I think the complexity is too high for the
value of this change.

Moving only the .text section would require at least the following changes:
 - An overall change in how relocations are processed: relocations inside and
   outside of the .text section would need to be handled separately.
 - Breaking assumptions on _text alignment while keeping size calculations
   accurate (for example _end - _text).

With a rough attempt at this, I managed to get past early boot but still
crashed later on.

This change would be valuable if an attacker leaks the address of a section
other than .text and wants to know where .text is, meaning the main bug
being exploited only allows code execution (and the goal is to ROP in
.text). I would argue that a better mitigation for this type of bug is
moving function pointers to read-only sections and using stack cookies
(for return addresses); a minimal sketch follows. This change won't
prevent other types of attacks, like data corruption.
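
For reference, this is the "function pointers in read-only sections"
mitigation in its simplest form (a minimal sketch with made-up names, not
taken from any kernel subsystem):

struct example_ops {
	int (*handle)(int arg);
};

static int real_handler(int arg)
{
	return arg + 1;
}

/*
 * 'const' places the structure in a read-only section (.rodata), so the
 * function pointer cannot simply be overwritten by a data-corruption bug.
 */
static const struct example_ops example_ops = {
	.handle = real_handler,
};

int dispatch(int arg)
{
	return example_ops.handle(arg);
}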

I think it would be more valuable to look at something like selfrando
/ pagerando [1] but maybe wait a bit for it to be more mature
(especially on the debugging side).

What do you think?

[1] http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html

>
> Thanks,
>
>         Ingo



-- 
Thomas


* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-24 22:37                   ` Pavel Machek
  2017-09-25  7:33                     ` Ingo Molnar
@ 2017-09-25  7:33                     ` Ingo Molnar
  2017-10-06 10:39                       ` Pavel Machek
  2017-10-06 10:39                       ` Pavel Machek
  1 sibling, 2 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-09-25  7:33 UTC (permalink / raw)
  To: Pavel Machek
  Cc: H. Peter Anvin, Peter Zijlstra, Thomas Garnier, Herbert Xu,
	David S . Miller, Thomas Gleixner, Ingo Molnar, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Tejun Heo


* Pavel Machek <pavel@ucw.cz> wrote:

> > For example, there would be collision with regular user-space mappings, right? 
> > Can local unprivileged users use mmap(MAP_FIXED) probing to figure out where 
> > the kernel lives?
> 
> Local unprivileged users can probably get your secret bits using cache probing 
> and jump prediction buffers.
> 
> Yes, you don't want to leak the information using mmap(MAP_FIXED), but CPU will 
> leak it for you, anyway.

Depends on the CPU I think, and CPU vendors are busy trying to mitigate this 
angle.

Thanks,

	Ingo


* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-23 10:00                 ` Ingo Molnar
@ 2017-09-24 22:37                   ` Pavel Machek
  2017-09-25  7:33                     ` Ingo Molnar
  2017-09-25  7:33                     ` Ingo Molnar
  2017-09-24 22:37                   ` Pavel Machek
  1 sibling, 2 replies; 106+ messages in thread
From: Pavel Machek @ 2017-09-24 22:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Peter Zijlstra, Thomas Garnier, Herbert Xu,
	David S . Miller, Thomas Gleixner, Ingo Molnar, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Tejun Heo

Hi!

> > We do need to consider how we want modules to fit into whatever model we
> > choose, though.  They can be adjacent, or we could go with a more
> > traditional dynamic link model where the modules can be separate, and
> > chained together with the main kernel via the GOT.
> 
> So I believe we should start with 'adjacent'. The thing is, having modules 
> separately randomized mostly helps if any of the secret locations fails and
> we want to prevent hopping from one to the other. But if one of the kernel-privileged
> secret locations fails then KASLR has already failed to a significant degree...
> 
> So I think the large-PIC model for modules does not buy us any real advantages in 
> practice, and the disadvantages of large-PIC are real and most Linux users have to 
> pay that cost unconditionally, as distro kernels have half of their kernel 
> functionality living in modules.
> 
> But I do see fundamental value in being able to hide the kernel somewhere in a ~48 
> bits address space, especially if we also implement Linus's suggestion to utilize 
> the lower bits as well. 0..281474976710656 is a nicely large range and will get 
> larger with time.
> 
> But it should all be done smartly and carefully:
> 
> For example, there would be collision with regular user-space mappings, right?
> Can local unprivileged users use mmap(MAP_FIXED) probing to figure out where
> the kernel lives?

Local unprivileged users can probably get your secret bits using
cache probing and jump prediction buffers.

Yes, you don't want to leak the information using mmap(MAP_FIXED), but
the CPU will leak it for you anyway.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-23 10:00                 ` Ingo Molnar
  2017-09-24 22:37                   ` Pavel Machek
@ 2017-09-24 22:37                   ` Pavel Machek
  1 sibling, 0 replies; 106+ messages in thread
From: Pavel Machek @ 2017-09-24 22:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Peter Foley,
	H. Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Joerg Roedel,
	Rafael J . Wysocki


[-- Attachment #1.1: Type: text/plain, Size: 1813 bytes --]

Hi!

> > We do need to consider how we want modules to fit into whatever model we
> > choose, though.  They can be adjacent, or we could go with a more
> > traditional dynamic link model where the modules can be separate, and
> > chained together with the main kernel via the GOT.
> 
> So I believe we should start with 'adjacent'. The thing is, having modules 
> separately randomized mostly helps if any of the secret locations fails and
> we want to prevent hopping from one to the other. But if one of the kernel-privileged
> secret locations fails then KASLR has already failed to a significant degree...
> 
> So I think the large-PIC model for modules does not buy us any real advantages in 
> practice, and the disadvantages of large-PIC are real and most Linux users have to 
> pay that cost unconditionally, as distro kernels have half of their kernel 
> functionality living in modules.
> 
> But I do see fundamental value in being able to hide the kernel somewhere in a ~48 
> bits address space, especially if we also implement Linus's suggestion to utilize 
> the lower bits as well. 0..281474976710656 is a nicely large range and will get 
> larger with time.
> 
> But it should all be done smartly and carefully:
> 
> For example, there would be collision with regular user-space mappings, right?
> Can local unprivileged users use mmap(MAP_FIXED) probing to figure out where
> the kernel lives?

Local unprivileged users can probably get your secret bits using
cache probing and jump prediction buffers.

Yes, you don't want to leak the information using mmap(MAP_FIXED), but
the CPU will leak it for you anyway.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:27               ` H. Peter Anvin
@ 2017-09-23 10:00                 ` Ingo Molnar
  2017-09-24 22:37                   ` Pavel Machek
  2017-09-24 22:37                   ` Pavel Machek
  2017-09-23 10:00                 ` Ingo Molnar
  1 sibling, 2 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-09-23 10:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, Thomas Garnier, Herbert Xu, David S . Miller,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* H. Peter Anvin <hpa@zytor.com> wrote:

> We do need to consider how we want modules to fit into whatever model we
> choose, though.  They can be adjacent, or we could go with a more
> traditional dynamic link model where the modules can be separate, and
> chained together with the main kernel via the GOT.

So I believe we should start with 'adjacent'. The thing is, having modules 
separately randomized mostly helps if any of the secret locations fails and
we want to prevent hopping from one to the other. But if one of the kernel-privileged
secret locations fails then KASLR has already failed to a significant degree...

So I think the large-PIC model for modules does not buy us any real advantages in 
practice, and the disadvantages of large-PIC are real and most Linux users have to 
pay that cost unconditionally, as distro kernels have half of their kernel 
functionality living in modules.

But I do see fundamental value in being able to hide the kernel somewhere in a ~48 
bits address space, especially if we also implement Linus's suggestion to utilize 
the lower bits as well. 0..281474976710656 is a nicely large range and will get 
larger with time.

But it should all be done smartly and carefully:

For example, there would be collision with regular user-space mappings, right?
Can local unprivileged users use mmap(MAP_FIXED) probing to figure out where
the kernel lives?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:27               ` H. Peter Anvin
  2017-09-23 10:00                 ` Ingo Molnar
@ 2017-09-23 10:00                 ` Ingo Molnar
  1 sibling, 0 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-09-23 10:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Nicolas Pitre, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	Kernel Hardening, Christoph Lameter, Thomas Gleixner, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel, Rafael J . Wysocki


* H. Peter Anvin <hpa@zytor.com> wrote:

> We do need to consider how we want modules to fit into whatever model we
> choose, though.  They can be adjacent, or we could go with a more
> traditional dynamic link model where the modules can be separate, and
> chained together with the main kernel via the GOT.

So I believe we should start with 'adjacent'. The thing is, having modules 
separately randomized mostly helps if any of the secret locations fails and
we want to prevent hopping from one to the other. But if one of the kernel-privileged
secret locations fails then KASLR has already failed to a significant degree...

So I think the large-PIC model for modules does not buy us any real advantages in 
practice, and the disadvantages of large-PIC are real and most Linux users have to 
pay that cost unconditionally, as distro kernels have half of their kernel 
functionality living in modules.

But I do see fundamental value in being able to hide the kernel somewhere in a ~48 
bits address space, especially if we also implement Linus's suggestion to utilize 
the lower bits as well. 0..281474976710656 is a nicely large range and will get 
larger with time.

But it should all be done smartly and carefully:

For example, there would be collision with regular user-space mappings, right?
Can local unprivileged users use mmap(MAP_FIXED) probing to figure out where
the kernel lives?

Thanks,

	Ingo

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:38                               ` H. Peter Anvin
                                                   ` (4 preceding siblings ...)
  2017-09-23  9:49                                 ` Ingo Molnar
@ 2017-09-23  9:49                                 ` Ingo Molnar
  5 siblings, 0 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-09-23  9:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Garnier, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 09/22/17 09:32, Ingo Molnar wrote:
> > 
> > BTW., I think things improved with ORC because with ORC we have RBP as an extra 
> > register and with PIE we lose RBX - so register pressure in code generation is 
> > lower.
> > 
> 
> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
> has RIP-relative addressing there is no need for a dedicated PIC register.

Indeed, but we'd use a new register _a lot_ for constructs, transforming:

  mov    r9,QWORD PTR [r11*8-0x7e3da060] (8 bytes)

into:

  lea    rbx,[rip+<off>] (7 bytes)
  mov    r9,QWORD PTR [rbx+r11*8] (6 bytes)

... which I suppose is quite close to (but not the same as) 'losing' RBX.
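
For reference, a minimal C sketch of the kind of construct that produces this
pattern (the symbol name and the exact register/offset choices are made up;
real output depends on the compiler version and flags):

  /* Global table indexed by a runtime value, e.g. a hash or per-CPU lookup. */
  extern unsigned long table[];

  unsigned long lookup(unsigned long idx)
  {
          /*
           * mcmodel=kernel: the table's link-time address fits in a
           * sign-extended 32-bit displacement, so one instruction is enough:
           *
           *   mov    r9,QWORD PTR [r11*8-0x7e3da060]
           *
           * PIE: the address is only known relative to RIP, so a base has to
           * be materialized first, costing an extra instruction and a
           * (possibly callee-saved) register:
           *
           *   lea    rbx,[rip+<off>]
           *   mov    r9,QWORD PTR [rbx+r11*8]
           */
          return table[idx];
  }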

Of course the compiler can pick other registers as well, not that it matters much 
to register pressure in larger functions in the end. Plus if the compiler has to 
pick a callee-saved register there's the additional saving/restoring overhead of 
that as well.

Right?

> I'm somewhat confused how we can have as much as almost 1% overhead.  I suspect 
> that we end up making a GOT and maybe even a PLT for no good reason.

So the above transformation alone would explain a good chunk of the overhead I 
think.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:38                               ` H. Peter Anvin
                                                   ` (3 preceding siblings ...)
  2017-09-22 18:59                                 ` Thomas Garnier
@ 2017-09-23  9:49                                 ` Ingo Molnar
  2017-09-23  9:49                                 ` Ingo Molnar
  5 siblings, 0 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-09-23  9:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	Kernel Hardening, Christoph Lameter, Thomas Gleixner, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 09/22/17 09:32, Ingo Molnar wrote:
> > 
> > BTW., I think things improved with ORC because with ORC we have RBP as an extra 
> > register and with PIE we lose RBX - so register pressure in code generation is 
> > lower.
> > 
> 
> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
> has RIP-relative addressing there is no need for a dedicated PIC register.

Indeed, but we'd use a new register _a lot_ for constructs, transforming:

  mov    r9,QWORD PTR [r11*8-0x7e3da060] (8 bytes)

into:

  lea    rbx,[rip+<off>] (7 bytes)
  mov    r9,QWORD PTR [rbx+r11*8] (6 bytes)

... which I suppose is quite close to (but not the same as) 'losing' RBX.

Of course the compiler can pick other registers as well, not that it matters much 
to register pressure in larger functions in the end. Plus if the compiler has to 
pick a callee-saved register there's the additional saving/restoring overhead of 
that as well.

Right?

> I'm somewhat confused how we can have as much as almost 1% overhead.  I suspect 
> that we end up making a GOT and maybe even a PLT for no good reason.

So the above transformation alone would explain a good chunk of the overhead I 
think.

Thanks,

	Ingo

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:08                               ` Thomas Garnier
@ 2017-09-23  9:43                                 ` Ingo Molnar
  2017-10-02 20:28                                   ` Thomas Garnier
  2017-10-02 20:28                                   ` Thomas Garnier
  2017-09-23  9:43                                 ` Ingo Molnar
  1 sibling, 2 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-09-23  9:43 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* Thomas Garnier <thgarnie@google.com> wrote:

> >   2) we first implement the additional entropy bits that Linus suggested.
> >
> > does this work for you?
> 
> Sure, I can look at how feasible that is. If it is, can I send
> everything as part of the same patch set? The additional entropy would
> be enabled for all KASLR but PIE will be off-by-default of course.

Sure, can all be part of the same series.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:08                               ` Thomas Garnier
  2017-09-23  9:43                                 ` Ingo Molnar
@ 2017-09-23  9:43                                 ` Ingo Molnar
  1 sibling, 0 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-09-23  9:43 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley


* Thomas Garnier <thgarnie@google.com> wrote:

> >   2) we first implement the additional entropy bits that Linus suggested.
> >
> > does this work for you?
> 
> Sure, I can look at how feasible that is. If it is, can I send
> everything as part of the same patch set? The additional entropy would
> be enabled for all KASLR but PIE will be off-by-default of course.

Sure, can all be part of the same series.

Thanks,

	Ingo

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 21:21                             ` Thomas Garnier
  2017-09-22  4:24                               ` Markus Trippelsdorf
  2017-09-22 23:55                               ` Thomas Garnier
@ 2017-09-22 23:55                               ` Thomas Garnier
  2 siblings, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-09-22 23:55 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On Thu, Sep 21, 2017 at 2:21 PM, Thomas Garnier <thgarnie@google.com> wrote:
> On Thu, Sep 21, 2017 at 9:10 AM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>>
>> On 21 September 2017 at 08:59, Ingo Molnar <mingo@kernel.org> wrote:
>> >
>> > ( Sorry about the delay in answering this. I could blame the delay on the merge
>> >   window, but in reality I've been procrastinating on this due to the permanent,
>> >   non-trivial impact PIE has on generated C code. )
>> >
>> > * Thomas Garnier <thgarnie@google.com> wrote:
>> >
>> >> 1) PIE sometimes needs two instructions to represent a single
>> >> instruction on mcmodel=kernel.
>> >
>> > What again is the typical frequency of this occurring in an x86-64 defconfig
>> > kernel, with the very latest GCC?
>> >
>> > Also, to make sure: which unwinder did you use for your measurements,
>> > frame-pointers or ORC? Please use ORC only for future numbers, as
>> > frame-pointers is obsolete from a performance measurement POV.
>> >
>> >> 2) GCC does not optimize switches in PIE in order to reduce relocations:
>> >
>> > Hopefully this can either be fixed in GCC or at least influenced via a compiler
>> > switch in the future.
>> >
>>
>> There are somewhat related concerns in the ARM world, so it would be
>> good if we could work with the GCC developers to get a more high level
>> and arch neutral command line option (-mkernel-pie? sounds yummy!)
>> that stops the compiler from making inferences that only hold for
>> shared libraries and/or other hosted executables (GOT indirections,
>> avoiding text relocations etc). That way, we will also be able to drop
>> the 'hidden' visibility override at some point, which we currently
>> need to prevent the compiler from redirecting all global symbol
>> references via entries in the GOT.
>
> My plan was to add a -mtls-reg=<fs|gs> to switch the default segment
> register for stack cookies but I can see great benefits in having a
> more general kernel flag that would allow us to get rid of the GOT and
> PLT when you are building position independent code for the kernel. It
> could also include optimizations like folding switch tables etc...
>
> Should we start a separate discussion on that? Is there anyone more experienced
> than I who could push that to gcc & clang upstream?

After separate discussion, opened:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

>
>>
>> All we really need is the ability to move the image around in virtual
>> memory, and things like reducing the CoW footprint or enabling ELF
>> symbol preemption are completely irrelevant for us.
>
>
>
>
> --
> Thomas



-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 21:21                             ` Thomas Garnier
  2017-09-22  4:24                               ` Markus Trippelsdorf
@ 2017-09-22 23:55                               ` Thomas Garnier
  2017-09-22 23:55                               ` Thomas Garnier
  2 siblings, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-09-22 23:55 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Ingo Molnar, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Thu, Sep 21, 2017 at 2:21 PM, Thomas Garnier <thgarnie@google.com> wrote:
> On Thu, Sep 21, 2017 at 9:10 AM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>>
>> On 21 September 2017 at 08:59, Ingo Molnar <mingo@kernel.org> wrote:
>> >
>> > ( Sorry about the delay in answering this. I could blame the delay on the merge
>> >   window, but in reality I've been procrastinating on this due to the permanent,
>> >   non-trivial impact PIE has on generated C code. )
>> >
>> > * Thomas Garnier <thgarnie@google.com> wrote:
>> >
>> >> 1) PIE sometimes needs two instructions to represent a single
>> >> instruction on mcmodel=kernel.
>> >
>> > What again is the typical frequency of this occurring in an x86-64 defconfig
>> > kernel, with the very latest GCC?
>> >
>> > Also, to make sure: which unwinder did you use for your measurements,
>> > frame-pointers or ORC? Please use ORC only for future numbers, as
>> > frame-pointers is obsolete from a performance measurement POV.
>> >
>> >> 2) GCC does not optimize switches in PIE in order to reduce relocations:
>> >
>> > Hopefully this can either be fixed in GCC or at least influenced via a compiler
>> > switch in the future.
>> >
>>
>> There are somewhat related concerns in the ARM world, so it would be
>> good if we could work with the GCC developers to get a more high level
>> and arch neutral command line option (-mkernel-pie? sounds yummy!)
>> that stops the compiler from making inferences that only hold for
>> shared libraries and/or other hosted executables (GOT indirections,
>> avoiding text relocations etc). That way, we will also be able to drop
>> the 'hidden' visibility override at some point, which we currently
>> need to prevent the compiler from redirecting all global symbol
>> references via entries in the GOT.
>
> My plan was to add a -mtls-reg=<fs|gs> to switch the default segment
> register for stack cookies but I can see great benefits in having a
> more general kernel flag that would allow us to get rid of the GOT and
> PLT when you are building position independent code for the kernel. It
> could also include optimizations like folding switch tables etc...
>
> Should we start a separate discussion on that? Is there anyone more experienced
> than I who could push that to gcc & clang upstream?

After separate discussion, opened:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82303

>
>>
>> All we really need is the ability to move the image around in virtual
>> memory, and things like reducing the CoW footprint or enabling ELF
>> symbol preemption are completely irrelevant for us.
>
>
>
>
> --
> Thomas



-- 
Thomas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 19:06                                   ` H. Peter Anvin
  2017-09-22 22:19                                     ` hjl.tools
@ 2017-09-22 22:30                                     ` hjl.tools
  1 sibling, 0 replies; 106+ messages in thread
From: hjl.tools @ 2017-09-22 22:30 UTC (permalink / raw)
  To: H. Peter Anvin, Kees Cook
  Cc: Radim Krčmář,
	Peter Zijlstra, Paul Gortmaker, Pavel Machek, Christoph Lameter,
	Ingo Molnar, Herbert Xu, Joerg Roedel, Matthias Kaehlcke,
	Borislav Petkov, Len Brown, Arnd Bergmann, Brian Gerst,
	Andy Lutomirski, Josh Poimboeuf

From: "H.J. Lu" <hjl.tools@gmail.com>
Message-ID: <CFFA3E3A-3136-4FAF-80E1-96A515A5C903@gmail.com>



On September 23, 2017 3:06:16 AM GMT+08:00, "H. Peter Anvin" <hpa@zytor.com> wrote:
>On 09/22/17 11:57, Kees Cook wrote:
>> On Fri, Sep 22, 2017 at 11:38 AM, H. Peter Anvin <hpa@zytor.com>
>wrote:
>>> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since
>x86-64
>>> has RIP-relative addressing there is no need for a dedicated PIC
>register.
>> 
>> FWIW, since gcc 5, the PIC register isn't totally lost. It is now
>> reusable, and that seems to have improved performance:
>> https://gcc.gnu.org/gcc-5/changes.html
>
>It still talks about a PIC register on x86-64, which confuses me.
>Perhaps older gcc's would allocate a PIC register under certain
>circumstances, and then lose it for the entire function?
>
>For i386, the PIC register is required by the ABI to be %ebx at the
>point any PLT entry is called.  Not an issue with -mno-plt which goes
>straight to the GOT, although in most cases there needs to be a PIC
>register to find the GOT unless load-time relocation is permitted.
>
>	-hpa
We need a static PIE option so that the compiler can optimize it
without using hidden visibility.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 19:06                                   ` H. Peter Anvin
@ 2017-09-22 22:19                                     ` hjl.tools
  2017-09-22 22:30                                     ` hjl.tools
  1 sibling, 0 replies; 106+ messages in thread
From: hjl.tools @ 2017-09-22 22:19 UTC (permalink / raw)
  To: H. Peter Anvin, Kees Cook
  Cc: Radim Krčmář,
	Peter Zijlstra, Paul Gortmaker, Pavel Machek, Christoph Lameter,
	Ingo Molnar, Herbert Xu, Joerg Roedel, Matthias Kaehlcke,
	Borislav Petkov, Len Brown, Arnd Bergmann, Brian Gerst,
	Andy Lutomirski, Josh Poimboeuf






[-- Attachment #2.2: Type: text/plain, Size: 1208 bytes --]



On September 23, 2017 3:06:16 AM GMT+08:00, "H. Peter Anvin" <hpa@zytor.com> wrote:
>On 09/22/17 11:57, Kees Cook wrote:
>> On Fri, Sep 22, 2017 at 11:38 AM, H. Peter Anvin <hpa@zytor.com>
>wrote:
>>> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since
>x86-64
>>> has RIP-relative addressing there is no need for a dedicated PIC
>register.
>> 
>> FWIW, since gcc 5, the PIC register isn't totally lost. It is now
>> reusable, and that seems to have improved performance:
>> https://gcc.gnu.org/gcc-5/changes.html
>
>It still talks about a PIC register on x86-64, which confuses me.
>Perhaps older gcc's would allocate a PIC register under certain
>circumstances, and then lose it for the entire function?
>
>For i386, the PIC register is required by the ABI to be %ebx at the
>point any PLT entry is called.  Not an issue with -mno-plt which goes
>straight to the GOT, although in most cases there needs to be a PIC
>register to find the GOT unless load-time relocation is permitted.
>
>	-hpa

We need a static PIE option so that the compiler can optimize it
without using hidden visibility.

H.J.
Sent from my Android device with K-9 Mail. Please excuse my brevity.

[-- Attachment #2.3: Type: text/html, Size: 1855 bytes --]

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:57                                 ` Kees Cook
@ 2017-09-22 19:06                                   ` H. Peter Anvin
  2017-09-22 22:19                                     ` hjl.tools
  2017-09-22 22:30                                     ` hjl.tools
  2017-09-22 19:06                                   ` H. Peter Anvin
  1 sibling, 2 replies; 106+ messages in thread
From: H. Peter Anvin @ 2017-09-22 19:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Thomas Garnier, Herbert Xu, David S . Miller,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov

On 09/22/17 11:57, Kees Cook wrote:
> On Fri, Sep 22, 2017 at 11:38 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
>> has RIP-relative addressing there is no need for a dedicated PIC register.
> 
> FWIW, since gcc 5, the PIC register isn't totally lost. It is now
> reusable, and that seems to have improved performance:
> https://gcc.gnu.org/gcc-5/changes.html

It still talks about a PIC register on x86-64, which confuses me.
Perhaps older gcc's would allocate a PIC register under certain
circumstances, and then lose it for the entire function?

For i386, the PIC register is required by the ABI to be %ebx at the
point any PLT entry is called.  Not an issue with -mno-plt which goes
straight to the GOT, although in most cases there needs to be a PIC
register to find the GOT unless load-time relocation is permitted.
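
As a minimal sketch of the i386 case (hypothetical function name; the sequences
below are roughly what gcc emits, exact output varies by version):

  extern void foo(void);

  void caller(void)
  {
          /*
           * i386 -fPIC with PLT calls: the ABI requires %ebx to hold the GOT
           * address at the call site, so the compiler sets it up first:
           *
           *   call  __x86.get_pc_thunk.bx
           *   add   $_GLOBAL_OFFSET_TABLE_, %ebx
           *   call  foo@PLT
           *
           * i386 -fPIC -mno-plt: the call goes through the GOT directly, but
           * a PIC register is still needed to find the GOT:
           *
           *   call  *foo@GOT(%ebx)
           */
          foo();
  }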

	-hpa

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:57                                 ` Kees Cook
  2017-09-22 19:06                                   ` H. Peter Anvin
@ 2017-09-22 19:06                                   ` H. Peter Anvin
  1 sibling, 0 replies; 106+ messages in thread
From: H. Peter Anvin @ 2017-09-22 19:06 UTC (permalink / raw)
  To: Kees Cook
  Cc: Nicolas Pitre, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	Kernel Hardening, Christoph Lameter, Ingo Molnar, Peter Zijlstra,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel, Rafael J . Wysocki

On 09/22/17 11:57, Kees Cook wrote:
> On Fri, Sep 22, 2017 at 11:38 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
>> has RIP-relative addressing there is no need for a dedicated PIC register.
> 
> FWIW, since gcc 5, the PIC register isn't totally lost. It is now
> reusable, and that seems to have improved performance:
> https://gcc.gnu.org/gcc-5/changes.html

It still talks about a PIC register on x86-64, which confuses me.
Perhaps older gcc's would allocate a PIC register under certain
circumstances, and then lose it for the entire function?

For i386, the PIC register is required by the ABI to be %ebx at the
point any PLT entry is called.  Not an issue with -mno-plt which goes
straight to the GOT, although in most cases there needs to be a PIC
register to find the GOT unless load-time relocation is permitted.

	-hpa


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:38                               ` H. Peter Anvin
                                                   ` (2 preceding siblings ...)
  2017-09-22 18:59                                 ` Thomas Garnier
@ 2017-09-22 18:59                                 ` Thomas Garnier
  2017-09-23  9:49                                 ` Ingo Molnar
  2017-09-23  9:49                                 ` Ingo Molnar
  5 siblings, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-09-22 18:59 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Fri, Sep 22, 2017 at 11:38 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 09/22/17 09:32, Ingo Molnar wrote:
>>
>> BTW., I think things improved with ORC because with ORC we have RBP as an extra
>> register and with PIE we lose RBX - so register pressure in code generation is
>> lower.
>>
>
> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
> has RIP-relative addressing there is no need for a dedicated PIC register.
>
> I'm somewhat confused how we can have as much as almost 1% overhead.  I
> suspect that we end up making a GOT and maybe even a PLT for no good reason.

We have a GOT with very few entries, mainly linker script globals that
I think we can work to reduce or remove.

We have a PLT but it is empty. On the latest iteration (not sent yet),
modules have PLT32 relocations but no PLT entries. I got rid of
mcmodel=large for modules and instead moved the beginning of the
module section to just after the kernel so that relative relocations work.
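
As a rough sketch of what that means for a module call (illustrative only, not
the actual patch; printk is just a convenient example of a kernel symbol):

  extern int printk(const char *fmt, ...);

  int example_module_call(void)
  {
          /*
           * Built as position-independent code, the compiler emits:
           *
           *   call  printk@PLT            (R_X86_64_PLT32 relocation)
           *
           * If the module text loads within +/-2G of the kernel image, the
           * module loader can resolve this as a direct rel32 call, so no PLT
           * entry is ever materialized. Placing the module area right after
           * the kernel preserves that guarantee across the wider KASLR range.
           */
          return printk("example\n");
  }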

>
>         -hpa



-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:38                               ` H. Peter Anvin
  2017-09-22 18:57                                 ` Kees Cook
  2017-09-22 18:57                                 ` Kees Cook
@ 2017-09-22 18:59                                 ` Thomas Garnier
  2017-09-22 18:59                                 ` Thomas Garnier
                                                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-09-22 18:59 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	Kernel Hardening, Christoph Lameter, Ingo Molnar, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel

On Fri, Sep 22, 2017 at 11:38 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 09/22/17 09:32, Ingo Molnar wrote:
>>
>> BTW., I think things improved with ORC because with ORC we have RBP as an extra
>> register and with PIE we lose RBX - so register pressure in code generation is
>> lower.
>>
>
> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
> has RIP-relative addressing there is no need for a dedicated PIC register.
>
> I'm somewhat confused how we can have as much as almost 1% overhead.  I
> suspect that we end up making a GOT and maybe even a PLT for no good reason.

We have a GOT with very few entries, mainly linker script globals that
I think we can work to reduce or remove.

We have a PLT but it is empty. On the latest iteration (not sent yet),
modules have PLT32 relocations but no PLT entries. I got rid of
mcmodel=large for modules and instead moved the beginning of the
module section to just after the kernel so that relative relocations work.

>
>         -hpa



-- 
Thomas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:38                               ` H. Peter Anvin
@ 2017-09-22 18:57                                 ` Kees Cook
  2017-09-22 19:06                                   ` H. Peter Anvin
  2017-09-22 19:06                                   ` H. Peter Anvin
  2017-09-22 18:57                                 ` Kees Cook
                                                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 106+ messages in thread
From: Kees Cook @ 2017-09-22 18:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Thomas Garnier, Herbert Xu, David S . Miller,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On Fri, Sep 22, 2017 at 11:38 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
> has RIP-relative addressing there is no need for a dedicated PIC register.

FWIW, since gcc 5, the PIC register isn't totally lost. It is now
reusable, and that seems to have improved performance:
https://gcc.gnu.org/gcc-5/changes.html

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 18:38                               ` H. Peter Anvin
  2017-09-22 18:57                                 ` Kees Cook
@ 2017-09-22 18:57                                 ` Kees Cook
  2017-09-22 18:59                                 ` Thomas Garnier
                                                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 106+ messages in thread
From: Kees Cook @ 2017-09-22 18:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Nicolas Pitre, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	Kernel Hardening, Christoph Lameter, Ingo Molnar, Peter Zijlstra,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel, Rafael J . Wysocki

On Fri, Sep 22, 2017 at 11:38 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
> has RIP-relative addressing there is no need for a dedicated PIC register.

FWIW, since gcc 5, the PIC register isn't totally lost. It is now
reusable, and that seems to have improved performance:
https://gcc.gnu.org/gcc-5/changes.html

-Kees

-- 
Kees Cook
Pixel Security

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 16:32                             ` Ingo Molnar
  2017-09-22 18:08                               ` Thomas Garnier
  2017-09-22 18:08                               ` Thomas Garnier
@ 2017-09-22 18:38                               ` H. Peter Anvin
  2017-09-22 18:57                                 ` Kees Cook
                                                   ` (5 more replies)
  2017-09-22 18:38                               ` H. Peter Anvin
  3 siblings, 6 replies; 106+ messages in thread
From: H. Peter Anvin @ 2017-09-22 18:38 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann, Matthias Kaehlcke,
	Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown

On 09/22/17 09:32, Ingo Molnar wrote:
> 
> BTW., I think things improved with ORC because with ORC we have RBP as an extra 
> register and with PIE we lose RBX - so register pressure in code generation is 
> lower.
> 

We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
has RIP-relative addressing there is no need for a dedicated PIC register.

I'm somewhat confused how we can have as much as almost 1% overhead.  I
suspect that we end up making a GOT and maybe even a PLT for no good reason.

	-hpa

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 16:32                             ` Ingo Molnar
                                                 ` (2 preceding siblings ...)
  2017-09-22 18:38                               ` H. Peter Anvin
@ 2017-09-22 18:38                               ` H. Peter Anvin
  3 siblings, 0 replies; 106+ messages in thread
From: H. Peter Anvin @ 2017-09-22 18:38 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	Kernel Hardening, Christoph Lameter, Thomas Gleixner, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel

On 09/22/17 09:32, Ingo Molnar wrote:
> 
> BTW., I think things improved with ORC because with ORC we have RBP as an extra 
> register and with PIE we lose RBX - so register pressure in code generation is 
> lower.
> 

We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
has RIP-relative addressing there is no need for a dedicated PIC register.

I'm somewhat confused how we can have as much as almost 1% overhead.  I
suspect that we end up making a GOT and maybe even a PLT for no good reason.

	-hpa

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 14:28             ` Peter Zijlstra
@ 2017-09-22 18:27               ` H. Peter Anvin
  2017-09-23 10:00                 ` Ingo Molnar
  2017-09-23 10:00                 ` Ingo Molnar
  2017-09-22 18:27               ` H. Peter Anvin
  1 sibling, 2 replies; 106+ messages in thread
From: H. Peter Anvin @ 2017-09-22 18:27 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Thomas Garnier, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Arnd Bergmann, Matthias Kaehlcke,
	Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown

On 08/21/17 07:28, Peter Zijlstra wrote:
> 
> Ah, I see, this is large mode and that needs to use MOVABS to load 64bit
> immediates. Still, small RIP relative should be able to live at any
> point as long as everything lives inside the same 2G relative range, so
> would still allow the goal of increasing the KASLR range.
> 
> So I'm not seeing how we need large mode for that. That said, after
> reading up on all this, RIP relative will not be too pretty either,
> while CALL is naturally RIP relative, data still needs an explicit %rip
> offset, still loads better than the large model.
> 

The large model makes no sense whatsoever.  I think what we're actually
looking for is the small-PIC model.

Ingo asked:
> I.e. is there no GCC code generation mode where code can be placed anywhere in the 
> canonical address space, yet call and jump distance is within 31 bits so that the 
> generated code is fast?

That's the small-PIC model.  I think if all symbols are forced to hidden
then it won't even need a GOT/PLT.
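
A minimal sketch of what hidden visibility buys (hypothetical symbol name; the
sequences are roughly what gcc emits under -fpie, exact output varies):

  /*
   * Default visibility: the symbol is treated as potentially preemptible,
   * so the reference goes through the GOT:
   *
   *   movq  kernel_var@GOTPCREL(%rip), %rax
   *   movl  (%rax), %eax
   *
   * Hidden visibility (or -fvisibility=hidden): the symbol is known to bind
   * locally, so a plain RIP-relative access suffices and no GOT entry is
   * created:
   *
   *   movl  kernel_var(%rip), %eax
   */
  extern int kernel_var __attribute__((visibility("hidden")));

  int read_kernel_var(void)
  {
          return kernel_var;
  }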

We do need to consider how we want modules to fit into whatever model we
choose, though.  They can be adjacent, or we could go with a more
traditional dynamic link model where the modules can be separate, and
chained together with the main kernel via the GOT.

	-hpa

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 14:28             ` Peter Zijlstra
  2017-09-22 18:27               ` H. Peter Anvin
@ 2017-09-22 18:27               ` H. Peter Anvin
  1 sibling, 0 replies; 106+ messages in thread
From: H. Peter Anvin @ 2017-09-22 18:27 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Nicolas Pitre, Michal Hocko, Len Brown,
	Radim Krčmář,
	Catalin Marinas, Christopher Li, Alexei Starovoitov,
	David Howells, Paul Gortmaker, Pavel Machek, Kernel Hardening,
	Christoph Lameter, Thomas Gleixner, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel, Rafael J . Wysocki,
	Daniel Micay

On 08/21/17 07:28, Peter Zijlstra wrote:
> 
> Ah, I see, this is large mode and that needs to use MOVABS to load 64bit
> immediates. Still, small RIP relative should be able to live at any
> point as long as everything lives inside the same 2G relative range, so
> would still allow the goal of increasing the KASLR range.
> 
> So I'm not seeing how we need large mode for that. That said, after
> reading up on all this, RIP relative will not be too pretty either,
> while CALL is naturally RIP relative, data still needs an explicit %rip
> offset, still loads better than the large model.
> 

The large model makes no sense whatsoever.  I think what we're actually
looking for is the small-PIC model.

Ingo asked:
> I.e. is there no GCC code generation mode where code can be placed anywhere in the 
> canonical address space, yet call and jump distance is within 31 bits so that the 
> generated code is fast?

That's the small-PIC model.  I think if all symbols are forced to hidden
then it won't even need a GOT/PLT.

We do need to consider how we want modules to fit into whatever model we
choose, though.  They can be adjacent, or we could go with a more
traditional dynamic link model where the modules can be separate, and
chained together with the main kernel via the GOT.

	-hpa

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 16:32                             ` Ingo Molnar
  2017-09-22 18:08                               ` Thomas Garnier
@ 2017-09-22 18:08                               ` Thomas Garnier
  2017-09-23  9:43                                 ` Ingo Molnar
  2017-09-23  9:43                                 ` Ingo Molnar
  2017-09-22 18:38                               ` H. Peter Anvin
  2017-09-22 18:38                               ` H. Peter Anvin
  3 siblings, 2 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-09-22 18:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Fri, Sep 22, 2017 at 9:32 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> On Thu, Sep 21, 2017 at 8:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> >
>> > ( Sorry about the delay in answering this. I could blame the delay on the merge
>> >   window, but in reality I've been procrastinating on this due to the permanent,
>> >   non-trivial impact PIE has on generated C code. )
>> >
>> > * Thomas Garnier <thgarnie@google.com> wrote:
>> >
>> >> 1) PIE sometimes needs two instructions to represent a single
>> >> instruction on mcmodel=kernel.
>> >
>> > What again is the typical frequency of this occurring in an x86-64 defconfig
>> > kernel, with the very latest GCC?
>>
>> I am not sure what is the best way to measure that.
>
> If this is the dominant factor then 'sizeof vmlinux' ought to be enough:
>
>> With ORC: PIE .text is 0.814224% larger than baseline
>
> I.e. the overhead is +0.81% in both size and (roughly) in number of instructions
> executed.
>
> BTW., I think things improved with ORC because with ORC we have RBP as an extra
> register and with PIE we lose RBX - so register pressure in code generation is
> lower.

That makes sense.

>
> Ok, I suspect we can try it, but my preconditions for merging it would be:
>
>   1) Linus doesn't NAK it (obviously)

Of course.

>   2) we first implement the additional entropy bits that Linus suggested.
>
> does this work for you?

Sure, I can look at how feasible that is. If it is, can I send
everything as part of the same patch set? The additional entropy would
be enabled for all KASLR but PIE will be off-by-default of course.

>
> Thanks,
>
>         Ingo



-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22 16:32                             ` Ingo Molnar
@ 2017-09-22 18:08                               ` Thomas Garnier
  2017-09-22 18:08                               ` Thomas Garnier
                                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-09-22 18:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Fri, Sep 22, 2017 at 9:32 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> On Thu, Sep 21, 2017 at 8:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> >
>> > ( Sorry about the delay in answering this. I could blame the delay on the merge
>> >   window, but in reality I've been procrastinating on this due to the permanent,
>> >   non-trivial impact PIE has on generated C code. )
>> >
>> > * Thomas Garnier <thgarnie@google.com> wrote:
>> >
>> >> 1) PIE sometimes needs two instructions to represent a single
>> >> instruction on mcmodel=kernel.
>> >
>> > What again is the typical frequency of this occurring in an x86-64 defconfig
>> > kernel, with the very latest GCC?
>>
>> I am not sure what is the best way to measure that.
>
> If this is the dominant factor then 'sizeof vmlinux' ought to be enough:
>
>> With ORC: PIE .text is 0.814224% larger than baseline
>
> I.e. the overhead is +0.81% in both size and (roughly) in number of instructions
> executed.
>
> BTW., I think things improved with ORC because with ORC we have RBP as an extra
> register and with PIE we lose RBX - so register pressure in code generation is
> lower.

That makes sense.

>
> Ok, I suspect we can try it, but my preconditions for merging it would be:
>
>   1) Linus doesn't NAK it (obviously)

Of course.

>   2) we first implement the additional entropy bits that Linus suggested.
>
> does this work for you?

Sure, I can look at how feasible that is. If it is, can I send
everything as part of the same patch set? The additional entropy would
be enabled for all KASLR but PIE will be off-by-default of course.

>
> Thanks,
>
>         Ingo



-- 
Thomas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 21:16                           ` Thomas Garnier
  2017-09-22  0:06                             ` Thomas Garnier
  2017-09-22  0:06                             ` Thomas Garnier
@ 2017-09-22 16:32                             ` Ingo Molnar
  2017-09-22 18:08                               ` Thomas Garnier
                                                 ` (3 more replies)
  2017-09-22 16:32                             ` Ingo Molnar
  3 siblings, 4 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-09-22 16:32 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* Thomas Garnier <thgarnie@google.com> wrote:

> On Thu, Sep 21, 2017 at 8:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > ( Sorry about the delay in answering this. I could blame the delay on the merge
> >   window, but in reality I've been procrastinating on this due to the permanent,
> >   non-trivial impact PIE has on generated C code. )
> >
> > * Thomas Garnier <thgarnie@google.com> wrote:
> >
> >> 1) PIE sometimes needs two instructions to represent a single
> >> instruction on mcmodel=kernel.
> >
> > What again is the typical frequency of this occurring in an x86-64 defconfig
> > kernel, with the very latest GCC?
> 
> I am not sure what is the best way to measure that.

If this is the dominant factor then 'sizeof vmlinux' ought to be enough:

> With ORC: PIE .text is 0.814224% larger than baseline

I.e. the overhead is +0.81% in both size and (roughly) in number of instructions 
executed.

BTW., I think things improved with ORC because with ORC we have RBP as an extra 
register and with PIE we lose RBX - so register pressure in code generation is 
lower.

Ok, I suspect we can try it, but my preconditions for merging it would be:

  1) Linus doesn't NAK it (obviously)
  2) we first implement the additional entropy bits that Linus suggested.

does this work for you?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 21:16                           ` Thomas Garnier
                                               ` (2 preceding siblings ...)
  2017-09-22 16:32                             ` Ingo Molnar
@ 2017-09-22 16:32                             ` Ingo Molnar
  3 siblings, 0 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-09-22 16:32 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley


* Thomas Garnier <thgarnie@google.com> wrote:

> On Thu, Sep 21, 2017 at 8:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > ( Sorry about the delay in answering this. I could blame the delay on the merge
> >   window, but in reality I've been procrastinating on this due to the permanent,
> >   non-trivial impact PIE has on generated C code. )
> >
> > * Thomas Garnier <thgarnie@google.com> wrote:
> >
> >> 1) PIE sometimes needs two instructions to represent a single
> >> instruction on mcmodel=kernel.
> >
> > What again is the typical frequency of this occurring in an x86-64 defconfig
> > kernel, with the very latest GCC?
> 
> I am not sure what is the best way to measure that.

If this is the dominant factor then 'sizeof vmlinux' ought to be enough:

> With ORC: PIE .text is 0.814224% larger than baseline

I.e. the overhead is +0.81% in both size and (roughly) in number of instructions 
executed.

BTW., I think things improved with ORC because with ORC we have RBP as an extra 
register and with PIE we lose RBX - so register pressure in code generation is 
lower.

Ok, I suspect we can try it, but my preconditions for merging it would be:

  1) Linus doesn't NAK it (obviously)
  2) we first implement the additional entropy bits that Linus suggested.

does this work for you?

Thanks,

	Ingo

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-22  4:24                               ` Markus Trippelsdorf
@ 2017-09-22 14:38                                 ` Thomas Garnier
  2017-09-22 14:38                                 ` Thomas Garnier
  1 sibling, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-09-22 14:38 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: Ard Biesheuvel, Ingo Molnar, Herbert Xu, David S . Miller,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Peter Zijlstra,
	Josh Poimboeuf, Arnd Bergmann, Matthias Kaehlcke,
	Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown

On Thu, Sep 21, 2017 at 9:24 PM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
> On 2017.09.21 at 14:21 -0700, Thomas Garnier wrote:
>> On Thu, Sep 21, 2017 at 9:10 AM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>> >
>> > On 21 September 2017 at 08:59, Ingo Molnar <mingo@kernel.org> wrote:
>> > >
>> > > ( Sorry about the delay in answering this. I could blame the delay on the merge
>> > >   window, but in reality I've been procrastinating this is due to the permanent,
>> > >   non-trivial impact PIE has on generated C code. )
>> > >
>> > > * Thomas Garnier <thgarnie@google.com> wrote:
>> > >
>> > >> 1) PIE sometime needs two instructions to represent a single
>> > >> instruction on mcmodel=kernel.
>> > >
>> > > What again is the typical frequency of this occurring in an x86-64 defconfig
>> > > kernel, with the very latest GCC?
>> > >
>> > > Also, to make sure: which unwinder did you use for your measurements,
>> > > frame-pointers or ORC? Please use ORC only for future numbers, as
>> > > frame-pointers is obsolete from a performance measurement POV.
>> > >
>> > >> 2) GCC does not optimize switches in PIE in order to reduce relocations:
>> > >
>> > > Hopefully this can either be fixed in GCC or at least influenced via a compiler
>> > > switch in the future.
>> > >
>> >
>> > There are somewhat related concerns in the ARM world, so it would be
>> > good if we could work with the GCC developers to get a more high level
>> > and arch neutral command line option (-mkernel-pie? sounds yummy!)
>> > that stops the compiler from making inferences that only hold for
>> > shared libraries and/or other hosted executables (GOT indirections,
>> > avoiding text relocations etc). That way, we will also be able to drop
>> > the 'hidden' visibility override at some point, which we currently
>> > need to prevent the compiler from redirecting all global symbol
>> > references via entries in the GOT.
>>
>> My plan was to add a -mtls-reg=<fs|gs> to switch the default segment
>> register for stack cookies but I can see great benefits in having a
>> more general kernel flag that would allow to get rid of the GOT and
>> PLT when you are building position independent code for the kernel. It
>> could also include optimizations like folding switch tables etc...
>>
>> Should we start a separate discussion on that? Anyone that would be
>> more experienced than I to push that to gcc & clang upstream?
>
> Just open a gcc bug. See
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708 as an example.

Makes sense, I will look into this. Thanks, Andy, for the stack cookie bug!

>
> --
> Markus



-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 21:21                             ` Thomas Garnier
@ 2017-09-22  4:24                               ` Markus Trippelsdorf
  2017-09-22 14:38                                 ` Thomas Garnier
  2017-09-22 14:38                                 ` Thomas Garnier
  2017-09-22 23:55                               ` Thomas Garnier
  2017-09-22 23:55                               ` Thomas Garnier
  2 siblings, 2 replies; 106+ messages in thread
From: Markus Trippelsdorf @ 2017-09-22  4:24 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, Len Brown,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Ingo Molnar, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On 2017.09.21 at 14:21 -0700, Thomas Garnier wrote:
> On Thu, Sep 21, 2017 at 9:10 AM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> >
> > On 21 September 2017 at 08:59, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > > ( Sorry about the delay in answering this. I could blame the delay on the merge
> > >   window, but in reality I've been procrastinating this is due to the permanent,
> > >   non-trivial impact PIE has on generated C code. )
> > >
> > > * Thomas Garnier <thgarnie@google.com> wrote:
> > >
> > >> 1) PIE sometime needs two instructions to represent a single
> > >> instruction on mcmodel=kernel.
> > >
> > > What again is the typical frequency of this occurring in an x86-64 defconfig
> > > kernel, with the very latest GCC?
> > >
> > > Also, to make sure: which unwinder did you use for your measurements,
> > > frame-pointers or ORC? Please use ORC only for future numbers, as
> > > frame-pointers is obsolete from a performance measurement POV.
> > >
> > >> 2) GCC does not optimize switches in PIE in order to reduce relocations:
> > >
> > > Hopefully this can either be fixed in GCC or at least influenced via a compiler
> > > switch in the future.
> > >
> >
> > There are somewhat related concerns in the ARM world, so it would be
> > good if we could work with the GCC developers to get a more high level
> > and arch neutral command line option (-mkernel-pie? sounds yummy!)
> > that stops the compiler from making inferences that only hold for
> > shared libraries and/or other hosted executables (GOT indirections,
> > avoiding text relocations etc). That way, we will also be able to drop
> > the 'hidden' visibility override at some point, which we currently
> > need to prevent the compiler from redirecting all global symbol
> > references via entries in the GOT.
> 
> My plan was to add a -mtls-reg=<fs|gs> to switch the default segment
> register for stack cookies but I can see great benefits in having a
> more general kernel flag that would allow to get rid of the GOT and
> PLT when you are building position independent code for the kernel. It
> could also include optimizations like folding switch tables etc...
> 
> Should we start a separate discussion on that? Anyone that would be
> more experienced than I to push that to gcc & clang upstream?

Just open a gcc bug. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708 as an example.

-- 
Markus

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 21:16                           ` Thomas Garnier
  2017-09-22  0:06                             ` Thomas Garnier
@ 2017-09-22  0:06                             ` Thomas Garnier
  2017-09-22 16:32                             ` Ingo Molnar
  2017-09-22 16:32                             ` Ingo Molnar
  3 siblings, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-09-22  0:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Thu, Sep 21, 2017 at 2:16 PM, Thomas Garnier <thgarnie@google.com> wrote:
>
> On Thu, Sep 21, 2017 at 8:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > ( Sorry about the delay in answering this. I could blame the delay on the merge
> >   window, but in reality I've been procrastinating this is due to the permanent,
> >   non-trivial impact PIE has on generated C code. )
> >
> > * Thomas Garnier <thgarnie@google.com> wrote:
> >
> >> 1) PIE sometime needs two instructions to represent a single
> >> instruction on mcmodel=kernel.
> >
> > What again is the typical frequency of this occurring in an x86-64 defconfig
> > kernel, with the very latest GCC?
>
> I am not sure what is the best way to measure that.

A very approximate approach is to look at each instruction that uses
the signed trick with a _32S relocation. Not all _32S relocations get
translated into more instructions: some just relocate part of an
absolute mov, which would actually be smaller if it were relative.

I used this command to get a rough estimate:

objdump -dr ./baseline/vmlinux | egrep -A 2 '\-0x[0-9a-f]{8}' | grep _32S | wc -l

It found 6130 places. If you assume each one adds at least 7 bytes,
that accounts for at least 42910 bytes of the .text section. The .text
section is 78599 bytes bigger from baseline to PIE, so that is at
least 54% of the size difference. That assumes we found all of them,
and it does not factor in the impact of using an additional register.

A similar approach works for the switch tables, but it is a bit more complex:

1) Find all constructs with an lea (%rip) followed by a jmp
instruction inside a function (the typical unfolded switch case).
2) Drop destination addresses that occur fewer than 4 times.

Result: 480 switch cases in 49 functions. Each case takes at least 9
bytes and the switch itself takes 16 bytes (assuming one switch per
function).

That's 5104 bytes for the easy-to-identify switches (less than 7% of the increase).

I am certainly missing a lot of differences. I checked whether the
percpu changes impacted the size and they don't (only 3 bytes added on
PIE).

I also tried different ways to compare the .text section, like the sum
of symbol sizes or the number of bytes in a full disassembly, but the
results are really far off from the whole .text size, so I am not sure
that is the right way to go about it.

>
> >
> > Also, to make sure: which unwinder did you use for your measurements,
> > frame-pointers or ORC? Please use ORC only for future numbers, as
> > frame-pointers is obsolete from a performance measurement POV.
>
> I used the default configuration which uses frame-pointer. I built all
> the different binaries with ORC and I see an improvement in size:
>
> On latest revision (just built and ran performance tests this week):
>
> With framepointer: PIE .text is 0.837324% larger than baseline
>
> With ORC: PIE .text is 0.814224% larger than baseline
>
> Comparing baselines only, the ORC .text is 2.849832% smaller than with frame-pointers.
>
> >
> >> 2) GCC does not optimize switches in PIE in order to reduce relocations:
> >
> > Hopefully this can either be fixed in GCC or at least influenced via a compiler
> > switch in the future.
> >
> >> The switches are the biggest increase on small functions but I don't
> >> think they represent a large portion of the difference (number 1 is).
> >
> > Ok.
> >
> >> A side note, while testing gcc 7.2.0 on hackbench I have seen the PIE
> >> kernel being faster by 1% across multiple runs (comparing 50 runs done
> >> across 5 reboots twice). I don't think PIE is faster than a
> >> mcmodel=kernel but recent versions of gcc makes them fairly similar.
> >
> > So I think we are down to an overhead range where the inherent noise (both random
> > and systematic one) in 'hackbench' overwhelms the signal we are trying to measure.
> >
> > So I think it's the kernel .text size change that is the best noise-free proxy for
> > the overhead impact of PIE.
>
> I agree but it might be hard to measure the exact impact. What is
> acceptable and what is not?
>
> >
> > It doesn't hurt to double check actual real performance as well, just don't expect
> > there to be much of a signal for anything but fully cached microbenchmark
> > workloads.
>
> That's aligned with what I see in the latest performance testing.
> Performance is close enough that it is hard to get exact numbers (PIE
> is just a bit slower than baseline on hackbench (~1%)).
>
> >
> > Thanks,
> >
> >         Ingo
>
>
>
> --
> Thomas




-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 16:10                           ` Ard Biesheuvel
  2017-09-21 21:21                             ` Thomas Garnier
@ 2017-09-21 21:21                             ` Thomas Garnier
  2017-09-22  4:24                               ` Markus Trippelsdorf
                                                 ` (2 more replies)
  1 sibling, 3 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-09-21 21:21 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On Thu, Sep 21, 2017 at 9:10 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> On 21 September 2017 at 08:59, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > ( Sorry about the delay in answering this. I could blame the delay on the merge
> >   window, but in reality I've been procrastinating this is due to the permanent,
> >   non-trivial impact PIE has on generated C code. )
> >
> > * Thomas Garnier <thgarnie@google.com> wrote:
> >
> >> 1) PIE sometime needs two instructions to represent a single
> >> instruction on mcmodel=kernel.
> >
> > What again is the typical frequency of this occurring in an x86-64 defconfig
> > kernel, with the very latest GCC?
> >
> > Also, to make sure: which unwinder did you use for your measurements,
> > frame-pointers or ORC? Please use ORC only for future numbers, as
> > frame-pointers is obsolete from a performance measurement POV.
> >
> >> 2) GCC does not optimize switches in PIE in order to reduce relocations:
> >
> > Hopefully this can either be fixed in GCC or at least influenced via a compiler
> > switch in the future.
> >
>
> There are somewhat related concerns in the ARM world, so it would be
> good if we could work with the GCC developers to get a more high level
> and arch neutral command line option (-mkernel-pie? sounds yummy!)
> that stops the compiler from making inferences that only hold for
> shared libraries and/or other hosted executables (GOT indirections,
> avoiding text relocations etc). That way, we will also be able to drop
> the 'hidden' visibility override at some point, which we currently
> need to prevent the compiler from redirecting all global symbol
> references via entries in the GOT.

My plan was to add a -mtls-reg=<fs|gs> option to switch the default
segment register for stack cookies, but I can see great benefits in a
more general kernel flag that would allow us to get rid of the GOT and
PLT when building position independent code for the kernel. It could
also include optimizations like folding switch tables etc...
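
To make the stack cookie point concrete, here is a rough sketch (the
flag name and offsets are illustrative, this is not an existing gcc
option):

#include <string.h>

/* Any function with a local buffer like this gets stack-protector
 * instrumentation when built with -fstack-protector-strong. */
void copy_name(char *dst, const char *src)
{
	char buf[64];

	strncpy(buf, src, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';
	strcpy(dst, buf);
}

/* For user space, gcc hard-codes the canary load as  mov %fs:0x28,%rax.
 * The kernel keeps its per-cpu data behind %gs, so it needs the canary
 * access to go through %gs instead - hence the idea of -mtls-reg=gs (or
 * a broader kernel PIC flag) at the compiler level. */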

Should we start a separate discussion on that? Is there anyone more
experienced than I am who could push that to gcc & clang upstream?

>
> All we really need is the ability to move the image around in virtual
> memory, and things like reducing the CoW footprint or enabling ELF
> symbol preemption are completely irrelevant for us.




-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 15:59                         ` Ingo Molnar
  2017-09-21 16:10                           ` Ard Biesheuvel
  2017-09-21 16:10                           ` Ard Biesheuvel
@ 2017-09-21 21:16                           ` Thomas Garnier
  2017-09-22  0:06                             ` Thomas Garnier
                                               ` (3 more replies)
  2017-09-21 21:16                           ` Thomas Garnier
  3 siblings, 4 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-09-21 21:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Thu, Sep 21, 2017 at 8:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> ( Sorry about the delay in answering this. I could blame the delay on the merge
>   window, but in reality I've been procrastinating this is due to the permanent,
>   non-trivial impact PIE has on generated C code. )
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> 1) PIE sometime needs two instructions to represent a single
>> instruction on mcmodel=kernel.
>
> What again is the typical frequency of this occurring in an x86-64 defconfig
> kernel, with the very latest GCC?

I am not sure what is the best way to measure that.

>
> Also, to make sure: which unwinder did you use for your measurements,
> frame-pointers or ORC? Please use ORC only for future numbers, as
> frame-pointers is obsolete from a performance measurement POV.

I used the default configuration which uses frame-pointer. I built all
the different binaries with ORC and I see an improvement in size:

On latest revision (just built and ran performance tests this week):

With framepointer: PIE .text is 0.837324% larger than baseline

With ORC: PIE .text is 0.814224% larger than baseline

Comparing baselines only, the ORC .text is 2.849832% smaller than with frame-pointers.

>
>> 2) GCC does not optimize switches in PIE in order to reduce relocations:
>
> Hopefully this can either be fixed in GCC or at least influenced via a compiler
> switch in the future.
>
>> The switches are the biggest increase on small functions but I don't
>> think they represent a large portion of the difference (number 1 is).
>
> Ok.
>
>> A side note, while testing gcc 7.2.0 on hackbench I have seen the PIE
>> kernel being faster by 1% across multiple runs (comparing 50 runs done
>> across 5 reboots twice). I don't think PIE is faster than a
>> mcmodel=kernel but recent versions of gcc makes them fairly similar.
>
> So I think we are down to an overhead range where the inherent noise (both random
> and systematic one) in 'hackbench' overwhelms the signal we are trying to measure.
>
> So I think it's the kernel .text size change that is the best noise-free proxy for
> the overhead impact of PIE.

I agree but it might be hard to measure the exact impact. What is
acceptable and what is not?

>
> It doesn't hurt to double check actual real performance as well, just don't expect
> there to be much of a signal for anything but fully cached microbenchmark
> workloads.

That's aligned with what I see in the latest performance testing.
Performance is close enough that it is hard to get exact numbers (PIE
is just a bit slower than baseline on hackbench (~1%)).

>
> Thanks,
>
>         Ingo



-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-09-21 15:59                         ` Ingo Molnar
  2017-09-21 16:10                           ` Ard Biesheuvel
@ 2017-09-21 16:10                           ` Ard Biesheuvel
  2017-09-21 21:21                             ` Thomas Garnier
  2017-09-21 21:21                             ` Thomas Garnier
  2017-09-21 21:16                           ` Thomas Garnier
  2017-09-21 21:16                           ` Thomas Garnier
  3 siblings, 2 replies; 106+ messages in thread
From: Ard Biesheuvel @ 2017-09-21 16:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Garnier, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On 21 September 2017 at 08:59, Ingo Molnar <mingo@kernel.org> wrote:
>
> ( Sorry about the delay in answering this. I could blame the delay on the merge
>   window, but in reality I've been procrastinating this is due to the permanent,
>   non-trivial impact PIE has on generated C code. )
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> 1) PIE sometime needs two instructions to represent a single
>> instruction on mcmodel=kernel.
>
> What again is the typical frequency of this occurring in an x86-64 defconfig
> kernel, with the very latest GCC?
>
> Also, to make sure: which unwinder did you use for your measurements,
> frame-pointers or ORC? Please use ORC only for future numbers, as
> frame-pointers is obsolete from a performance measurement POV.
>
>> 2) GCC does not optimize switches in PIE in order to reduce relocations:
>
> Hopefully this can either be fixed in GCC or at least influenced via a compiler
> switch in the future.
>

There are somewhat related concerns in the ARM world, so it would be
good if we could work with the GCC developers to get a more high level
and arch neutral command line option (-mkernel-pie? sounds yummy!)
that stops the compiler from making inferences that only hold for
shared libraries and/or other hosted executables (GOT indirections,
avoiding text relocations etc). That way, we will also be able to drop
the 'hidden' visibility override at some point, which we currently
need to prevent the compiler from redirecting all global symbol
references via entries in the GOT.

All we really need is the ability to move the image around in virtual
memory, and things like reducing the CoW footprint or enabling ELF
symbol preemption are completely irrelevant for us.
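
As a small illustration of the GOT indirection this is about (a made-up
example, not from the kernel tree):

/* foo.c - built with -fpie, default visibility */
extern int important_counter;	/* defined in another object file */

int read_counter(void)
{
	return important_counter;
}

/*
 * With default visibility, -fpie assumes the symbol may be preempted
 * and goes through the GOT:
 *
 *	mov    important_counter@GOTPCREL(%rip),%rax
 *	mov    (%rax),%eax
 *
 * With the hidden visibility override (or -fvisibility=hidden), the
 * load collapses to a single RIP-relative access:
 *
 *	mov    important_counter(%rip),%eax
 */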

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-29 19:34                       ` Thomas Garnier
@ 2017-09-21 15:59                         ` Ingo Molnar
  2017-09-21 16:10                           ` Ard Biesheuvel
                                             ` (3 more replies)
  2017-09-21 15:59                         ` Ingo Molnar
  1 sibling, 4 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-09-21 15:59 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


( Sorry about the delay in answering this. I could blame the delay on the merge
  window, but in reality I've been procrastinating; this is due to the permanent,
  non-trivial impact PIE has on generated C code. )

* Thomas Garnier <thgarnie@google.com> wrote:

> 1) PIE sometime needs two instructions to represent a single
> instruction on mcmodel=kernel.

What again is the typical frequency of this occurring in an x86-64 defconfig 
kernel, with the very latest GCC?

Also, to make sure: which unwinder did you use for your measurements, 
frame-pointers or ORC? Please use ORC only for future numbers, as
frame-pointers is obsolete from a performance measurement POV.

> 2) GCC does not optimize switches in PIE in order to reduce relocations:

Hopefully this can either be fixed in GCC or at least influenced via a compiler 
switch in the future.

> The switches are the biggest increase on small functions but I don't
> think they represent a large portion of the difference (number 1 is).

Ok.

> A side note, while testing gcc 7.2.0 on hackbench I have seen the PIE
> kernel being faster by 1% across multiple runs (comparing 50 runs done
> across 5 reboots twice). I don't think PIE is faster than a
> mcmodel=kernel but recent versions of gcc makes them fairly similar.

So I think we are down to an overhead range where the inherent noise (both random
and systematic) in 'hackbench' overwhelms the signal we are trying to measure.

So I think it's the kernel .text size change that is the best noise-free proxy for 
the overhead impact of PIE.

It doesn't hurt to double check actual real performance as well, just don't expect 
there to be much of a signal for anything but fully cached microbenchmark 
workloads.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-25 15:05                     ` Thomas Garnier
@ 2017-08-29 19:34                       ` Thomas Garnier
  2017-09-21 15:59                         ` Ingo Molnar
  2017-09-21 15:59                         ` Ingo Molnar
  2017-08-29 19:34                       ` Thomas Garnier
  1 sibling, 2 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-29 19:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Fri, Aug 25, 2017 at 8:05 AM, Thomas Garnier <thgarnie@google.com> wrote:
> On Fri, Aug 25, 2017 at 1:04 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Thomas Garnier <thgarnie@google.com> wrote:
>>
>>> With the fix for function tracing, the hackbench results have an
>>> average of +0.8 to +1.4% (from +8% to +10% before). With a default
>>> configuration, the numbers are closer to 0.8%.
>>>
>>> On the .text size, with gcc 4.9 I see +0.8% on default configuration
>>> and +1.180% on the ubuntu configuration.
>>
>> A 1% text size increase is still significant. Could you look at the disassembly,
>> where does the size increase come from?
>
> I will take a look, in this current iteration I added the .got and
> .got.plt so removing them will remove a big (even if they are small,
> we don't use them to increase perf).
>
> What do you think about the perf numbers in general so far?

I looked at the size increase. I could identify two common cases:

1) PIE sometimes needs two instructions to represent a single
instruction on mcmodel=kernel.

For example, this instruction plays on the sign extension (mcmodel=kernel):

mov    r9,QWORD PTR [r11*8-0x7e3da060] (8 bytes)

The address 0xffffffff81c25fa0 can be represented as -0x7e3da060 using
a 32S relocation.

with PIE:

lea    rbx,[rip+<off>] (7 bytes)
mov    r9,QWORD PTR [rbx+r11*8] (6 bytes)
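
For reference, a typical C pattern that produces this kind of access (a
made-up example, not taken from the kernel):

extern unsigned long handlers[];	/* global table in the kernel image */

unsigned long pick(unsigned long idx)
{
	return handlers[idx];
}

/*
 * With mcmodel=kernel this is a single mov whose displacement is the
 * sign-extended 32-bit address of 'handlers' (the _32S relocation).
 * With PIE the compiler first materializes &handlers with a
 * RIP-relative lea into a scratch register, then does the indexed load.
 */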

2) GCC does not optimize switches in PIE in order to reduce relocations:

For example the switch in phy_modes [1]:

static inline const char *phy_modes(phy_interface_t interface)
{
    switch (interface) {
    case PHY_INTERFACE_MODE_NA:
        return "";
    case PHY_INTERFACE_MODE_INTERNAL:
        return "internal";
    case PHY_INTERFACE_MODE_MII:
        return "mii";

Without PIE (gcc 7.2.0), the whole table is optimized into one instruction:

   0x000000000040045b <+27>:    mov    rdi,QWORD PTR [rax*8+0x400660]

With PIE (gcc 7.2.0):

   0x0000000000000641 <+33>:    movsxd rax,DWORD PTR [rdx+rax*4]
   0x0000000000000645 <+37>:    add    rax,rdx
   0x0000000000000648 <+40>:    jmp    rax
....
   0x000000000000065d <+61>:    lea    rdi,[rip+0x264]        # 0x8c8
   0x0000000000000664 <+68>:    jmp    0x651 <main+49>
   0x0000000000000666 <+70>:    lea    rdi,[rip+0x2bc]        # 0x929
   0x000000000000066d <+77>:    jmp    0x651 <main+49>
   0x000000000000066f <+79>:    lea    rdi,[rip+0x2a8]        # 0x91e
   0x0000000000000676 <+86>:    jmp    0x651 <main+49>
   0x0000000000000678 <+88>:    lea    rdi,[rip+0x294]        # 0x913
   0x000000000000067f <+95>:    jmp    0x651 <main+49>

That's a deliberate choice; clang is able to optimize it (clang-3.8):

   0x0000000000000963 <+19>:    lea    rcx,[rip+0x200406]        # 0x200d70
   0x000000000000096a <+26>:    mov    rdi,QWORD PTR [rcx+rax*8]

I checked gcc, and the code that decides whether to fold the switch
basically does not do it for PIC, in order to reduce relocations [2].

The switches are the biggest increase on small functions but I don't
think they represent a large portion of the difference (number 1 is).

A side note: while testing gcc 7.2.0 on hackbench I have seen the PIE
kernel being faster by 1% across multiple runs (comparing 50 runs done
across 5 reboots, twice). I don't think PIE is faster than
mcmodel=kernel, but recent versions of gcc make them fairly similar.

[1] http://elixir.free-electrons.com/linux/v4.13-rc7/source/include/linux/phy.h#L113
[2] https://github.com/gcc-mirror/gcc/blob/7977b0509f07e42fbe0f06efcdead2b7e4a5135f/gcc/tree-switch-conversion.c#L828

>
>>
>> Thanks,
>>
>>         Ingo
>
>
>
> --
> Thomas



-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 14:31         ` Peter Zijlstra
  2017-08-21 15:57           ` Thomas Garnier
  2017-08-21 15:57           ` Thomas Garnier
@ 2017-08-28  1:26           ` H. Peter Anvin
  2017-08-28  1:26           ` H. Peter Anvin
  3 siblings, 0 replies; 106+ messages in thread
From: H. Peter Anvin @ 2017-08-28  1:26 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Garnier
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, Josh Poimboeuf, Arnd Bergmann, Matthias Kaehlcke,
	Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On 08/21/17 07:31, Peter Zijlstra wrote:
> On Tue, Aug 15, 2017 at 07:20:38AM -0700, Thomas Garnier wrote:
>> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
> 
>>> Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
>>> -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
>>> x86-64 address space to randomize the location of kernel text. The location of
>>> modules can be further randomized within that 2GB window.
>>
>> -model=small/medium assume you are on the low 32-bit. It generates
>> instructions where the virtual addresses have the high 32-bit to be
>> zero.
> 
> That's a compiler fail, right? Because the SDM states that for "CALL
> rel32" the 32bit displacement is sign extended on x86_64.
> 

No.  It is about whether you can do something like:

	movl $variable, %eax		/* rax = &variable; */

or

	addl %ecx,variable(,%rsi,4)	/* variable[rsi] += ecx */
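
In C terms, the distinction is roughly this (a made-up illustration,
not from the thread):

static int variable;

int *variable_addr(void)
{
	return &variable;
}

/*
 * With -mcmodel=small the compiler may emit
 *	movl $variable, %eax
 * i.e. a zero-extended 32-bit absolute address, which only works if the
 * symbol lives in the low 4GB, while the sign-extended displacement in
 *	addl %ecx, variable(,%rsi,4)
 * only works in the low 2GB or the top 2GB. A kernel placed at the top
 * of the address space cannot satisfy both, which is why the
 * sign-extension of "CALL rel32" alone does not save us.
 */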

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 14:31         ` Peter Zijlstra
                             ` (2 preceding siblings ...)
  2017-08-28  1:26           ` H. Peter Anvin
@ 2017-08-28  1:26           ` H. Peter Anvin
  3 siblings, 0 replies; 106+ messages in thread
From: H. Peter Anvin @ 2017-08-28  1:26 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Garnier
  Cc: Nicolas Pitre, Michal Hocko, Len Brown,
	Radim Krčmář,
	Catalin Marinas, Christopher Li, Alexei Starovoitov,
	David Howells, Paul Gortmaker, Pavel Machek, Kernel Hardening,
	Christoph Lameter, Ingo Molnar, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel, Rafael J . Wysocki,
	Daniel Micay

On 08/21/17 07:31, Peter Zijlstra wrote:
> On Tue, Aug 15, 2017 at 07:20:38AM -0700, Thomas Garnier wrote:
>> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
> 
>>> Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
>>> -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
>>> x86-64 address space to randomize the location of kernel text. The location of
>>> modules can be further randomized within that 2GB window.
>>
>> -model=small/medium assume you are on the low 32-bit. It generates
>> instructions where the virtual addresses have the high 32-bit to be
>> zero.
> 
> That's a compiler fail, right? Because the SDM states that for "CALL
> rel32" the 32bit displacement is sign extended on x86_64.
> 

No.  It is about whether you can do something like:

	movl $variable, %eax		/* rax = &variable; */

or

	addl %ecx,variable(,%rsi,4)	/* variable[rsi] += ecx */

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-24 21:42                   ` Linus Torvalds
@ 2017-08-25 15:35                     ` Thomas Garnier
  2017-08-25 15:35                     ` Thomas Garnier
  1 sibling, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-25 15:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On Thu, Aug 24, 2017 at 2:42 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Aug 24, 2017 at 2:13 PM, Thomas Garnier <thgarnie@google.com> wrote:
> >
> > My original performance testing was done with an Ubuntu generic
> > configuration. This configuration has the CONFIG_FUNCTION_TRACER
> > option which was incompatible with PIE. The tracer failed to replace
> > the __fentry__ call by a nop slide on each traceable function because
> > the instruction was not the one expected. If PIE is enabled, gcc
> > generates a different call instruction based on the GOT without
> > checking the visibility options (basically call *__fentry__@GOTPCREL).
>
> Gah.
>
> Don't we actually have *more* address bits for randomization at the
> low end, rather than getting rid of -mcmodel=kernel?

We have, but I think we use most of it for potential modules and the
fixmap, and it is not that big. The increase in range from 1G to 3G is
just an example and a way to ensure PIE works as expected. The long-term
goal is being able to put the kernel wherever we want in memory,
randomizing the position and the order of almost all memory sections.

That would be valuable against BTB attacks [1], for example, where
randomization of only the low 32 bits is ineffective.

[1] https://github.com/felixwilhelm/mario_baslr

>
> Has anybody looked at just moving kernel text by smaller values than
> the page size? Yeah, yeah, the kernel has several sections that need
> page alignment, but I think we could relocate normal text by just the
> cacheline size, and that sounds like it would give several bits of
> randomness with little downside.

I didn't look into it. There is value in it, depending on the
performance impact. I think both PIE and finer-grained randomization
would be useful.

>
> Or has somebody already looked at it and I just missed it?
>
>                Linus




-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-24 21:42                   ` Linus Torvalds
  2017-08-25 15:35                     ` Thomas Garnier
@ 2017-08-25 15:35                     ` Thomas Garnier
  1 sibling, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-25 15:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Ingo Molnar, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Thu, Aug 24, 2017 at 2:42 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Aug 24, 2017 at 2:13 PM, Thomas Garnier <thgarnie@google.com> wrote:
> >
> > My original performance testing was done with an Ubuntu generic
> > configuration. This configuration has the CONFIG_FUNCTION_TRACER
> > option which was incompatible with PIE. The tracer failed to replace
> > the __fentry__ call by a nop slide on each traceable function because
> > the instruction was not the one expected. If PIE is enabled, gcc
> > generates a different call instruction based on the GOT without
> > checking the visibility options (basically call *__fentry__@GOTPCREL).
>
> Gah.
>
> Don't we actually have *more* address bits for randomization at the
> low end, rather than getting rid of -mcmodel=kernel?

We have, but I think we use most of it for potential modules and the
fixmap, and it is not that big. The increase in range from 1G to 3G is
just an example and a way to ensure PIE works as expected. The long-term
goal is being able to put the kernel wherever we want in memory,
randomizing the position and the order of almost all memory sections.

That would be valuable against BTB attacks [1], for example, where
randomization of only the low 32 bits is ineffective.

[1] https://github.com/felixwilhelm/mario_baslr

>
> Has anybody looked at just moving kernel text by smaller values than
> the page size? Yeah, yeah, the kernel has several sections that need
> page alignment, but I think we could relocate normal text by just the
> cacheline size, and that sounds like it would give several bits of
> randomness with little downside.

I didn't look into it. There is value in it, depending on the
performance impact. I think both PIE and finer-grained randomization
would be useful.

>
> Or has somebody already looked at it and I just missed it?
>
>                Linus




-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-25  8:04                   ` Ingo Molnar
  2017-08-25 15:05                     ` Thomas Garnier
@ 2017-08-25 15:05                     ` Thomas Garnier
  2017-08-29 19:34                       ` Thomas Garnier
  2017-08-29 19:34                       ` Thomas Garnier
  1 sibling, 2 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-25 15:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Fri, Aug 25, 2017 at 1:04 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> With the fix for function tracing, the hackbench results have an
>> average of +0.8 to +1.4% (from +8% to +10% before). With a default
>> configuration, the numbers are closer to 0.8%.
>>
>> On the .text size, with gcc 4.9 I see +0.8% on default configuration
>> and +1.180% on the ubuntu configuration.
>
> A 1% text size increase is still significant. Could you look at the disassembly,
> where does the size increase come from?

I will take a look. In this current iteration I added the .got and
.got.plt sections, so removing them will remove a bit of the size (even
if they are small, we don't use them, so dropping them should also help
perf).

What do you think about the perf numbers in general so far?

>
> Thanks,
>
>         Ingo



-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-25  8:04                   ` Ingo Molnar
@ 2017-08-25 15:05                     ` Thomas Garnier
  2017-08-25 15:05                     ` Thomas Garnier
  1 sibling, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-25 15:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Fri, Aug 25, 2017 at 1:04 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> With the fix for function tracing, the hackbench results have an
>> average of +0.8 to +1.4% (from +8% to +10% before). With a default
>> configuration, the numbers are closer to 0.8%.
>>
>> On the .text size, with gcc 4.9 I see +0.8% on default configuration
>> and +1.180% on the ubuntu configuration.
>
> A 1% text size increase is still significant. Could you look at the disassembly,
> where does the size increase come from?

I will take a look. In this current iteration I added the .got and
.got.plt sections, so removing them will remove a bit of the size (even
if they are small, we don't use them, so dropping them should also help
perf).

What do you think about the perf numbers in general so far?

>
> Thanks,
>
>         Ingo



-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-24 21:13                 ` Thomas Garnier
                                     ` (3 preceding siblings ...)
  2017-08-25  8:04                   ` Ingo Molnar
@ 2017-08-25  8:04                   ` Ingo Molnar
  2017-08-25 15:05                     ` Thomas Garnier
  2017-08-25 15:05                     ` Thomas Garnier
  4 siblings, 2 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-08-25  8:04 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* Thomas Garnier <thgarnie@google.com> wrote:

> With the fix for function tracing, the hackbench results have an
> average of +0.8 to +1.4% (from +8% to +10% before). With a default
> configuration, the numbers are closer to 0.8%.
> 
> On the .text size, with gcc 4.9 I see +0.8% on default configuration
> and +1.180% on the ubuntu configuration.

A 1% text size increase is still significant. Could you look at the disassembly, 
where does the size increase come from?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-24 21:13                 ` Thomas Garnier
                                     ` (2 preceding siblings ...)
  2017-08-25  1:07                   ` Steven Rostedt
@ 2017-08-25  8:04                   ` Ingo Molnar
  2017-08-25  8:04                   ` Ingo Molnar
  4 siblings, 0 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-08-25  8:04 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley


* Thomas Garnier <thgarnie@google.com> wrote:

> With the fix for function tracing, the hackbench results have an
> average of +0.8 to +1.4% (from +8% to +10% before). With a default
> configuration, the numbers are closer to 0.8%.
> 
> On the .text size, with gcc 4.9 I see +0.8% on default configuration
> and +1.180% on the ubuntu configuration.

A 1% text size increase is still significant. Could you look at the disassembly, 
where does the size increase come from?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-24 21:13                 ` Thomas Garnier
  2017-08-24 21:42                   ` Linus Torvalds
  2017-08-24 21:42                   ` Linus Torvalds
@ 2017-08-25  1:07                   ` Steven Rostedt
  2017-08-25  8:04                   ` Ingo Molnar
  2017-08-25  8:04                   ` Ingo Molnar
  4 siblings, 0 replies; 106+ messages in thread
From: Steven Rostedt @ 2017-08-25  1:07 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Ingo Molnar, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Thu, 24 Aug 2017 14:13:38 -0700
Thomas Garnier <thgarnie@google.com> wrote:

> With the fix for function tracing, the hackbench results have an
> average of +0.8 to +1.4% (from +8% to +10% before). With a default
> configuration, the numbers are closer to 0.8%.

Wow, an empty fentry function not "nop"ed out added only 8% to 10%
overhead. I never benchmarked that, since my measurements were done
before fentry was introduced, with the old "mcount". That gave an
average of 13% overhead in hackbench.

-- Steve

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-24 21:13                 ` Thomas Garnier
  2017-08-24 21:42                   ` Linus Torvalds
@ 2017-08-24 21:42                   ` Linus Torvalds
  2017-08-25 15:35                     ` Thomas Garnier
  2017-08-25 15:35                     ` Thomas Garnier
  2017-08-25  1:07                   ` Steven Rostedt
                                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 106+ messages in thread
From: Linus Torvalds @ 2017-08-24 21:42 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On Thu, Aug 24, 2017 at 2:13 PM, Thomas Garnier <thgarnie@google.com> wrote:
>
> My original performance testing was done with an Ubuntu generic
> configuration. This configuration has the CONFIG_FUNCTION_TRACER
> option which was incompatible with PIE. The tracer failed to replace
> the __fentry__ call by a nop slide on each traceable function because
> the instruction was not the one expected. If PIE is enabled, gcc
> generates a different call instruction based on the GOT without
> checking the visibility options (basically call *__fentry__@GOTPCREL).

Gah.

Don't we actually have *more* address bits for randomization at the
low end, rather than getting rid of -mcmodel=kernel?

Has anybody looked at just moving kernel text by smaller values than
the page size? Yeah, yeah, the kernel has several sections that need
page alignment, but I think we could relocate normal text by just the
cacheline size, and that sounds like it would give several bits of
randomness with little downside.
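
As a back-of-the-envelope illustration (my numbers, assuming 4K pages
and 64-byte cachelines, not figures from this thread), cacheline-granular
placement would add about log2(4096 / 64) = 6 bits of randomness:

#include <stdio.h>

int main(void)
{
        unsigned long page = 4096, cacheline = 64;
        unsigned int bits = 0;

        /* count how many times the placement step can be halved */
        while ((page >>= 1) >= cacheline)
                bits++;

        printf("extra randomness bits: %u\n", bits);    /* prints 6 */
        return 0;
}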

Or has somebody already looked at it and I just missed it?

               Linus

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-24 21:13                 ` Thomas Garnier
@ 2017-08-24 21:42                   ` Linus Torvalds
  2017-08-24 21:42                   ` Linus Torvalds
                                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 106+ messages in thread
From: Linus Torvalds @ 2017-08-24 21:42 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Ingo Molnar, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Thu, Aug 24, 2017 at 2:13 PM, Thomas Garnier <thgarnie@google.com> wrote:
>
> My original performance testing was done with an Ubuntu generic
> configuration. This configuration has the CONFIG_FUNCTION_TRACER
> option which was incompatible with PIE. The tracer failed to replace
> the __fentry__ call by a nop slide on each traceable function because
> the instruction was not the one expected. If PIE is enabled, gcc
> generates a different call instruction based on the GOT without
> checking the visibility options (basically call *__fentry__@GOTPCREL).

Gah.

Don't we actually have *more* address bits for randomization at the
low end, rather than getting rid of -mcmodel=kernel?

Has anybody looked at just moving kernel text by smaller values than
the page size? Yeah, yeah, the kernel has several sections that need
page alignment, but I think we could relocate normal text by just the
cacheline size, and that sounds like it would give several bits of
randomness with little downside.

Or has somebody already looked at it and I just missed it?

               Linus

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-17 14:10               ` Thomas Garnier
  2017-08-24 21:13                 ` Thomas Garnier
@ 2017-08-24 21:13                 ` Thomas Garnier
  2017-08-24 21:42                   ` Linus Torvalds
                                     ` (4 more replies)
  1 sibling, 5 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-24 21:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Thu, Aug 17, 2017 at 7:10 AM, Thomas Garnier <thgarnie@google.com> wrote:
>
> On Thu, Aug 17, 2017 at 1:09 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Thomas Garnier <thgarnie@google.com> wrote:
> >
> > > > > -model=small/medium assume you are on the low 32-bit. It generates
> > > > > instructions where the virtual addresses have the high 32-bit to be zero.
> > > >
> > > > How are these assumptions hardcoded by GCC? Most of the instructions should be
> > > > relocatable straight away, as most call/jump/branch instructions are
> > > > RIP-relative.
> > >
> > > I think PIE is capable to use relative instructions well. mcmodel=large assumes
> > > symbols can be anywhere.
> >
> > So if the numbers in your changelog and Kconfig text cannot be trusted, there's
> > this description of the size impact which I suspect is less susceptible to
> > measurement error:
> >
> > +         The kernel and modules will generate slightly more assembly (1 to 2%
> > +         increase on the .text sections). The vmlinux binary will be
> > +         significantly smaller due to less relocations.
> >
> > ... but describing a 1-2% kernel text size increase as "slightly more assembly"
> > shows a gratituous disregard to kernel code generation quality! In reality that's
> > a huge size increase that in most cases will almost directly transfer to a 1-2%
> > slowdown for kernel intense workloads.
> >
> >
> > Where does that size increase come from, if PIE is capable of using relative
> > instructins well? Does it come from the loss of a generic register and the
> > resulting increase in register pressure, stack spills, etc.?
>
> I will try to gather more information on the size increase. The size
> increase might be smaller with gcc 4.9 given performance was much
> better.

Coming back to this thread, as I have identified the root cause of the
performance issue.

My original performance testing was done with an Ubuntu generic
configuration. This configuration has the CONFIG_FUNCTION_TRACER
option which was incompatible with PIE. The tracer failed to replace
the __fentry__ call by a nop slide on each traceable function because
the instruction was not the one expected. If PIE is enabled, gcc
generates a different call instruction based on the GOT without
checking the visibility options (basically call *__fentry__@GOTPCREL).
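
For reference, a hedged sketch of what the tracer trips over
(approximate encodings, illustrative function name, not output copied
from a kernel build): any function compiled with -pg -mfentry starts
with an __fentry__ call, and the two modes emit it differently:

/*
 *   -mcmodel=kernel:  e8 xx xx xx xx     call   __fentry__
 *   -fpie:            ff 15 xx xx xx xx  call   *__fentry__@GOTPCREL(%rip)
 *
 * ftrace expects the 5-byte direct call so it can patch it into a nop;
 * the 6-byte GOT-indirect form is "not the one expected".
 */
void some_traced_function(void)
{
        /* built with: gcc -pg -mfentry [-fpie|-mcmodel=kernel] -c trace.c */
}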

With the fix for function tracing, the hackbench results have an
average of +0.8 to +1.4% (from +8% to +10% before). With a default
configuration, the numbers are closer to 0.8%.

On the .text size, with gcc 4.9 I see +0.8% on default configuration
and +1.180% on the ubuntu configuration.

The next iteration should have an updated set of performance metrics (I
will try to use gcc 6.0 or higher) and incorporate the fix for function
tracing.

Let me know if you have questions and feedback.

>
> >
> > So I'm still unhappy about this all, and about the attitude surrounding it.
> >
> > Thanks,
> >
> >         Ingo
>
>
>
>
> --
> Thomas




-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-17 14:10               ` Thomas Garnier
@ 2017-08-24 21:13                 ` Thomas Garnier
  2017-08-24 21:13                 ` Thomas Garnier
  1 sibling, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-24 21:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Thu, Aug 17, 2017 at 7:10 AM, Thomas Garnier <thgarnie@google.com> wrote:
>
> On Thu, Aug 17, 2017 at 1:09 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Thomas Garnier <thgarnie@google.com> wrote:
> >
> > > > > -model=small/medium assume you are on the low 32-bit. It generates
> > > > > instructions where the virtual addresses have the high 32-bit to be zero.
> > > >
> > > > How are these assumptions hardcoded by GCC? Most of the instructions should be
> > > > relocatable straight away, as most call/jump/branch instructions are
> > > > RIP-relative.
> > >
> > > I think PIE is capable to use relative instructions well. mcmodel=large assumes
> > > symbols can be anywhere.
> >
> > So if the numbers in your changelog and Kconfig text cannot be trusted, there's
> > this description of the size impact which I suspect is less susceptible to
> > measurement error:
> >
> > +         The kernel and modules will generate slightly more assembly (1 to 2%
> > +         increase on the .text sections). The vmlinux binary will be
> > +         significantly smaller due to less relocations.
> >
> > ... but describing a 1-2% kernel text size increase as "slightly more assembly"
> > shows a gratituous disregard to kernel code generation quality! In reality that's
> > a huge size increase that in most cases will almost directly transfer to a 1-2%
> > slowdown for kernel intense workloads.
> >
> >
> > Where does that size increase come from, if PIE is capable of using relative
> > instructins well? Does it come from the loss of a generic register and the
> > resulting increase in register pressure, stack spills, etc.?
>
> I will try to gather more information on the size increase. The size
> increase might be smaller with gcc 4.9 given performance was much
> better.

Coming back to this thread, as I have identified the root cause of the
performance issue.

My original performance testing was done with an Ubuntu generic
configuration. This configuration has the CONFIG_FUNCTION_TRACER
option which was incompatible with PIE. The tracer failed to replace
the __fentry__ call by a nop slide on each traceable function because
the instruction was not the one expected. If PIE is enabled, gcc
generates a different call instruction based on the GOT without
checking the visibility options (basically call *__fentry__@GOTPCREL).

With the fix for function tracing, the hackbench results have an
average of +0.8 to +1.4% (from +8% to +10% before). With a default
configuration, the numbers are closer to 0.8%.

On the .text size, with gcc 4.9 I see +0.8% on default configuration
and +1.180% on the ubuntu configuration.

The next iteration should have an updated set of performance metrics (I
will try to use gcc 6.0 or higher) and incorporate the fix for function
tracing.

Let me know if you have questions and feedback.

>
> >
> > So I'm still unhappy about this all, and about the attitude surrounding it.
> >
> > Thanks,
> >
> >         Ingo
>
>
>
>
> --
> Thomas




-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 14:31         ` Peter Zijlstra
  2017-08-21 15:57           ` Thomas Garnier
@ 2017-08-21 15:57           ` Thomas Garnier
  2017-08-28  1:26           ` H. Peter Anvin
  2017-08-28  1:26           ` H. Peter Anvin
  3 siblings, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-21 15:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Mon, Aug 21, 2017 at 7:31 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Aug 15, 2017 at 07:20:38AM -0700, Thomas Garnier wrote:
>> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
>> > Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
>> > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
>> > x86-64 address space to randomize the location of kernel text. The location of
>> > modules can be further randomized within that 2GB window.
>>
>> -model=small/medium assume you are on the low 32-bit. It generates
>> instructions where the virtual addresses have the high 32-bit to be
>> zero.
>
> That's a compiler fail, right? Because the SDM states that for "CALL
> rel32" the 32bit displacement is sign extended on x86_64.
>

That's different from what I expected at first too.

Now, I think I have an alternative to using mcmodel=large. I could use
-fPIC and ensure modules are never far away from the main kernel
(moving the module section start close to the randomized kernel end). I
looked at it and that seems possible but will require more work. I
plan to start with the mcmodel=large support and add this mode in a
way that could also benefit classic KASLR (without -fPIC), because it
randomizes where modules start based on the kernel.
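
A rough sketch of that placement idea (hypothetical names and sizes,
not from the series): start the module area right after the randomized
kernel image so kernel<->module references always stay within the +/-2G
reach of 32-bit RIP-relative code:

#include <stdint.h>

#define MODULES_LEN     (1024ULL << 20) /* illustrative: 1G of module space */
#define GUARD_HOLE      (2ULL << 20)    /* illustrative: 2M gap             */

static uint64_t module_area_start(uint64_t random_kernel_end)
{
        return random_kernel_end + GUARD_HOLE;
}

static uint64_t module_area_end(uint64_t random_kernel_end)
{
        return module_area_start(random_kernel_end) + MODULES_LEN;
}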

-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 14:31         ` Peter Zijlstra
@ 2017-08-21 15:57           ` Thomas Garnier
  2017-08-21 15:57           ` Thomas Garnier
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-21 15:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nicolas Pitre, Michal Hocko, Len Brown,
	Radim Krčmář,
	Catalin Marinas, Christopher Li, Alexei Starovoitov,
	David Howells, Paul Gortmaker, Pavel Machek, H . Peter Anvin,
	Kernel Hardening, Christoph Lameter, Ingo Molnar, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel, Rafael J . Wysocki

On Mon, Aug 21, 2017 at 7:31 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Aug 15, 2017 at 07:20:38AM -0700, Thomas Garnier wrote:
>> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
>> > Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
>> > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
>> > x86-64 address space to randomize the location of kernel text. The location of
>> > modules can be further randomized within that 2GB window.
>>
>> -model=small/medium assume you are on the low 32-bit. It generates
>> instructions where the virtual addresses have the high 32-bit to be
>> zero.
>
> That's a compiler fail, right? Because the SDM states that for "CALL
> rel32" the 32bit displacement is sign extended on x86_64.
>

That's different from what I expected at first too.

Now, I think I have an alternative to using mcmodel=large. I could use
-fPIC and ensure modules are never far away from the main kernel
(moving the module section start close to the randomized kernel end). I
looked at it and that seems possible but will require more work. I
plan to start with the mcmodel=large support and add this mode in a
way that could also benefit classic KASLR (without -fPIC), because it
randomizes where modules start based on the kernel.

-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15 14:20       ` Thomas Garnier
                           ` (3 preceding siblings ...)
  2017-08-16 15:12         ` Ingo Molnar
@ 2017-08-21 14:31         ` Peter Zijlstra
  2017-08-21 15:57           ` Thomas Garnier
                             ` (3 more replies)
  2017-08-21 14:31         ` Peter Zijlstra
  5 siblings, 4 replies; 106+ messages in thread
From: Peter Zijlstra @ 2017-08-21 14:31 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo, Christoph Lameter

On Tue, Aug 15, 2017 at 07:20:38AM -0700, Thomas Garnier wrote:
> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:

> > Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
> > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
> > x86-64 address space to randomize the location of kernel text. The location of
> > modules can be further randomized within that 2GB window.
> 
> -model=small/medium assume you are on the low 32-bit. It generates
> instructions where the virtual addresses have the high 32-bit to be
> zero.

That's a compiler fail, right? Because the SDM states that for "CALL
rel32" the 32bit displacement is sign extended on x86_64.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15 14:20       ` Thomas Garnier
                           ` (4 preceding siblings ...)
  2017-08-21 14:31         ` Peter Zijlstra
@ 2017-08-21 14:31         ` Peter Zijlstra
  5 siblings, 0 replies; 106+ messages in thread
From: Peter Zijlstra @ 2017-08-21 14:31 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Michal Hocko, Len Brown,
	Radim Krčmář,
	Catalin Marinas, Christopher Li, Alexei Starovoitov,
	David Howells, Paul Gortmaker, Pavel Machek, H . Peter Anvin,
	Kernel Hardening, Christoph Lameter, Ingo Molnar, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel, Rafael J . Wysocki

On Tue, Aug 15, 2017 at 07:20:38AM -0700, Thomas Garnier wrote:
> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:

> > Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
> > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
> > x86-64 address space to randomize the location of kernel text. The location of
> > modules can be further randomized within that 2GB window.
> 
> -model=small/medium assume you are on the low 32-bit. It generates
> instructions where the virtual addresses have the high 32-bit to be
> zero.

That's a compiler fail, right? Because the SDM states that for "CALL
rel32" the 32bit displacement is sign extended on x86_64.


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 13:32           ` Peter Zijlstra
  2017-08-21 14:28             ` Peter Zijlstra
@ 2017-08-21 14:28             ` Peter Zijlstra
  2017-09-22 18:27               ` H. Peter Anvin
  2017-09-22 18:27               ` H. Peter Anvin
  1 sibling, 2 replies; 106+ messages in thread
From: Peter Zijlstra @ 2017-08-21 14:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Garnier, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Mon, Aug 21, 2017 at 03:32:22PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 16, 2017 at 05:12:35PM +0200, Ingo Molnar wrote:
> > Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine 
> > instruction level.
> > 
> > Function calls look like this:
> > 
> >  -mcmodel=medium:
> > 
> >    757:   e8 98 ff ff ff          callq  6f4 <test_code>
> > 
> >  -mcmodel=large
> > 
> >    77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
> >    782:   ff ff ff 
> >    785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
> >    789:   ff d0                   callq  *%rax
> > 
> > And we'd do this for _EVERY_ function call in the kernel. That kind of crap is 
> > totally unacceptable.
> 
> So why does this need to be computed for every single call? How often
> will we move the kernel around at runtime?
> 
> Why can't we process the relocation at load time and then discard the
> relocation tables along with the rest of __init ?

Ah, I see, this is large mode, and that needs to use MOVABS to load
64-bit immediates. Still, small RIP-relative code should be able to live
at any point as long as everything stays inside the same 2G relative
range, so it would still allow the goal of increasing the KASLR range.

So I'm not seeing how we need large mode for that. That said, after
reading up on all this, RIP-relative will not be too pretty either:
while CALL is naturally RIP-relative, data still needs an explicit %rip
offset. Still, it is loads better than the large model.
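
A hedged side-by-side of the two forms being weighed here, for a plain
global load (illustrative symbol, approximate compiler output in the
comments):

extern unsigned long data;

unsigned long load_data(void)
{
        /*
         * RIP-relative (small model / PIE), reachable within +/-2G:
         *     mov    data(%rip),%rax
         *
         * -mcmodel=large, symbol can be anywhere in the 64-bit space:
         *     movabs $data,%rax
         *     mov    (%rax),%rax
         */
        return data;
}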

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-21 13:32           ` Peter Zijlstra
@ 2017-08-21 14:28             ` Peter Zijlstra
  2017-08-21 14:28             ` Peter Zijlstra
  1 sibling, 0 replies; 106+ messages in thread
From: Peter Zijlstra @ 2017-08-21 14:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Michal Hocko, Len Brown,
	Radim Krčmář,
	Catalin Marinas, Christopher Li, Alexei Starovoitov,
	David Howells, Paul Gortmaker, Pavel Machek, H . Peter Anvin,
	Kernel Hardening, Christoph Lameter, Thomas Gleixner, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel, Rafael J . Wysocki

On Mon, Aug 21, 2017 at 03:32:22PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 16, 2017 at 05:12:35PM +0200, Ingo Molnar wrote:
> > Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine 
> > instruction level.
> > 
> > Function calls look like this:
> > 
> >  -mcmodel=medium:
> > 
> >    757:   e8 98 ff ff ff          callq  6f4 <test_code>
> > 
> >  -mcmodel=large
> > 
> >    77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
> >    782:   ff ff ff 
> >    785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
> >    789:   ff d0                   callq  *%rax
> > 
> > And we'd do this for _EVERY_ function call in the kernel. That kind of crap is 
> > totally unacceptable.
> 
> So why does this need to be computed for every single call? How often
> will we move the kernel around at runtime?
> 
> Why can't we process the relocation at load time and then discard the
> relocation tables along with the rest of __init ?

Ah, I see, this is large mode, and that needs to use MOVABS to load
64-bit immediates. Still, small RIP-relative code should be able to live
at any point as long as everything stays inside the same 2G relative
range, so it would still allow the goal of increasing the KASLR range.

So I'm not seeing how we need large mode for that. That said, after
reading up on all this, RIP-relative will not be too pretty either:
while CALL is naturally RIP-relative, data still needs an explicit %rip
offset. Still, it is loads better than the large model.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 15:12         ` Ingo Molnar
                             ` (5 preceding siblings ...)
  2017-08-16 16:57           ` Thomas Garnier
@ 2017-08-21 13:32           ` Peter Zijlstra
  2017-08-21 14:28             ` Peter Zijlstra
  2017-08-21 14:28             ` Peter Zijlstra
  2017-08-21 13:32           ` Peter Zijlstra
  7 siblings, 2 replies; 106+ messages in thread
From: Peter Zijlstra @ 2017-08-21 13:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Garnier, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Wed, Aug 16, 2017 at 05:12:35PM +0200, Ingo Molnar wrote:
> Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine 
> instruction level.
> 
> Function calls look like this:
> 
>  -mcmodel=medium:
> 
>    757:   e8 98 ff ff ff          callq  6f4 <test_code>
> 
>  -mcmodel=large
> 
>    77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
>    782:   ff ff ff 
>    785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
>    789:   ff d0                   callq  *%rax
> 
> And we'd do this for _EVERY_ function call in the kernel. That kind of crap is 
> totally unacceptable.

So why does this need to be computed for every single call? How often
will we move the kernel around at runtime?

Why can't we process the relocation at load time and then discard the
relocation tables along with the rest of __init ?

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 15:12         ` Ingo Molnar
                             ` (6 preceding siblings ...)
  2017-08-21 13:32           ` Peter Zijlstra
@ 2017-08-21 13:32           ` Peter Zijlstra
  7 siblings, 0 replies; 106+ messages in thread
From: Peter Zijlstra @ 2017-08-21 13:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Michal Hocko, Len Brown,
	Radim Krčmář,
	Catalin Marinas, Christopher Li, Alexei Starovoitov,
	David Howells, Paul Gortmaker, Pavel Machek, H . Peter Anvin,
	Kernel Hardening, Christoph Lameter, Thomas Gleixner, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel, Rafael J . Wysocki

On Wed, Aug 16, 2017 at 05:12:35PM +0200, Ingo Molnar wrote:
> Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine 
> instruction level.
> 
> Function calls look like this:
> 
>  -mcmodel=medium:
> 
>    757:   e8 98 ff ff ff          callq  6f4 <test_code>
> 
>  -mcmodel=large
> 
>    77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
>    782:   ff ff ff 
>    785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
>    789:   ff d0                   callq  *%rax
> 
> And we'd do this for _EVERY_ function call in the kernel. That kind of crap is 
> totally unacceptable.

So why does this need to be computed for every single call? How often
will we move the kernel around at runtime?

Why can't we process the relocation at load time and then discard the
relocation tables along with the rest of __init ?

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-17  8:09             ` Ingo Molnar
  2017-08-17 14:10               ` Thomas Garnier
@ 2017-08-17 14:10               ` Thomas Garnier
  2017-08-24 21:13                 ` Thomas Garnier
  2017-08-24 21:13                 ` Thomas Garnier
  1 sibling, 2 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-17 14:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Thu, Aug 17, 2017 at 1:09 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
> > > > -model=small/medium assume you are on the low 32-bit. It generates
> > > > instructions where the virtual addresses have the high 32-bit to be zero.
> > >
> > > How are these assumptions hardcoded by GCC? Most of the instructions should be
> > > relocatable straight away, as most call/jump/branch instructions are
> > > RIP-relative.
> >
> > I think PIE is capable to use relative instructions well. mcmodel=large assumes
> > symbols can be anywhere.
>
> So if the numbers in your changelog and Kconfig text cannot be trusted, there's
> this description of the size impact which I suspect is less susceptible to
> measurement error:
>
> +         The kernel and modules will generate slightly more assembly (1 to 2%
> +         increase on the .text sections). The vmlinux binary will be
> +         significantly smaller due to less relocations.
>
> ... but describing a 1-2% kernel text size increase as "slightly more assembly"
> shows a gratituous disregard to kernel code generation quality! In reality that's
> a huge size increase that in most cases will almost directly transfer to a 1-2%
> slowdown for kernel intense workloads.
>
>
> Where does that size increase come from, if PIE is capable of using relative
> instructins well? Does it come from the loss of a generic register and the
> resulting increase in register pressure, stack spills, etc.?

I will try to gather more information on the size increase. The size
increase might be smaller with gcc 4.9, given that performance was much
better.

>
> So I'm still unhappy about this all, and about the attitude surrounding it.
>
> Thanks,
>
>         Ingo




-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-17  8:09             ` Ingo Molnar
@ 2017-08-17 14:10               ` Thomas Garnier
  2017-08-17 14:10               ` Thomas Garnier
  1 sibling, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-17 14:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Thu, Aug 17, 2017 at 1:09 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
> > > > -model=small/medium assume you are on the low 32-bit. It generates
> > > > instructions where the virtual addresses have the high 32-bit to be zero.
> > >
> > > How are these assumptions hardcoded by GCC? Most of the instructions should be
> > > relocatable straight away, as most call/jump/branch instructions are
> > > RIP-relative.
> >
> > I think PIE is capable to use relative instructions well. mcmodel=large assumes
> > symbols can be anywhere.
>
> So if the numbers in your changelog and Kconfig text cannot be trusted, there's
> this description of the size impact which I suspect is less susceptible to
> measurement error:
>
> +         The kernel and modules will generate slightly more assembly (1 to 2%
> +         increase on the .text sections). The vmlinux binary will be
> +         significantly smaller due to less relocations.
>
> ... but describing a 1-2% kernel text size increase as "slightly more assembly"
> shows a gratituous disregard to kernel code generation quality! In reality that's
> a huge size increase that in most cases will almost directly transfer to a 1-2%
> slowdown for kernel intense workloads.
>
>
> Where does that size increase come from, if PIE is capable of using relative
> instructins well? Does it come from the loss of a generic register and the
> resulting increase in register pressure, stack spills, etc.?

I will try to gather more information on the size increase. The size
increase might be smaller with gcc 4.9, given that performance was much
better.

>
> So I'm still unhappy about this all, and about the attitude surrounding it.
>
> Thanks,
>
>         Ingo




-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 16:57           ` Thomas Garnier
  2017-08-17  8:09             ` Ingo Molnar
@ 2017-08-17  8:09             ` Ingo Molnar
  2017-08-17 14:10               ` Thomas Garnier
  2017-08-17 14:10               ` Thomas Garnier
  1 sibling, 2 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-08-17  8:09 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* Thomas Garnier <thgarnie@google.com> wrote:

> > > -model=small/medium assume you are on the low 32-bit. It generates 
> > > instructions where the virtual addresses have the high 32-bit to be zero.
> >
> > How are these assumptions hardcoded by GCC? Most of the instructions should be 
> > relocatable straight away, as most call/jump/branch instructions are 
> > RIP-relative.
> 
> I think PIE is capable to use relative instructions well. mcmodel=large assumes 
> symbols can be anywhere.

So if the numbers in your changelog and Kconfig text cannot be trusted, there's 
this description of the size impact which I suspect is less susceptible to 
measurement error:

+         The kernel and modules will generate slightly more assembly (1 to 2%
+         increase on the .text sections). The vmlinux binary will be
+         significantly smaller due to less relocations.

... but describing a 1-2% kernel text size increase as "slightly more assembly"
shows a gratuitous disregard for kernel code generation quality! In reality that's
a huge size increase that in most cases will almost directly translate to a 1-2%
slowdown for kernel-intensive workloads.

Where does that size increase come from, if PIE is capable of using relative
instructions well? Does it come from the loss of a general-purpose register and
the resulting increase in register pressure, stack spills, etc.?

So I'm still unhappy about this all, and about the attitude surrounding it.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 16:57           ` Thomas Garnier
@ 2017-08-17  8:09             ` Ingo Molnar
  2017-08-17  8:09             ` Ingo Molnar
  1 sibling, 0 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-08-17  8:09 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley


* Thomas Garnier <thgarnie@google.com> wrote:

> > > -model=small/medium assume you are on the low 32-bit. It generates 
> > > instructions where the virtual addresses have the high 32-bit to be zero.
> >
> > How are these assumptions hardcoded by GCC? Most of the instructions should be 
> > relocatable straight away, as most call/jump/branch instructions are 
> > RIP-relative.
> 
> I think PIE is capable to use relative instructions well. mcmodel=large assumes 
> symbols can be anywhere.

So if the numbers in your changelog and Kconfig text cannot be trusted, there's 
this description of the size impact which I suspect is less susceptible to 
measurement error:

+         The kernel and modules will generate slightly more assembly (1 to 2%
+         increase on the .text sections). The vmlinux binary will be
+         significantly smaller due to less relocations.

... but describing a 1-2% kernel text size increase as "slightly more assembly"
shows a gratuitous disregard for kernel code generation quality! In reality that's
a huge size increase that in most cases will almost directly translate to a 1-2%
slowdown for kernel-intensive workloads.

Where does that size increase come from, if PIE is capable of using relative
instructions well? Does it come from the loss of a general-purpose register and
the resulting increase in register pressure, stack spills, etc.?

So I'm still unhappy about this all, and about the attitude surrounding it.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 15:12         ` Ingo Molnar
                             ` (4 preceding siblings ...)
  2017-08-16 16:57           ` Thomas Garnier
@ 2017-08-16 16:57           ` Thomas Garnier
  2017-08-17  8:09             ` Ingo Molnar
  2017-08-17  8:09             ` Ingo Molnar
  2017-08-21 13:32           ` Peter Zijlstra
  2017-08-21 13:32           ` Peter Zijlstra
  7 siblings, 2 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-16 16:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Wed, Aug 16, 2017 at 8:12 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
> > On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > > * Thomas Garnier <thgarnie@google.com> wrote:
> > >
> > >> > Do these changes get us closer to being able to build the kernel as truly
> > >> > position independent, i.e. to place it anywhere in the valid x86-64 address
> > >> > space? Or any other advantages?
> > >>
> > >> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
> > >> have a full randomized address space where position and order of sections are
> > >> completely random. There is still some work to get there but being able to build
> > >> a PIE kernel is a significant step.
> > >
> > > So I _really_ dislike the whole PIE approach, because of the huge slowdown:
> > >
> > > +config RANDOMIZE_BASE_LARGE
> > > +       bool "Increase the randomization range of the kernel image"
> > > +       depends on X86_64 && RANDOMIZE_BASE
> > > +       select X86_PIE
> > > +       select X86_MODULE_PLTS if MODULES
> > > +       default n
> > > +       ---help---
> > > +         Build the kernel as a Position Independent Executable (PIE) and
> > > +         increase the available randomization range from 1GB to 3GB.
> > > +
> > > +         This option impacts performance on kernel CPU intensive workloads up
> > > +         to 10% due to PIE generated code. Impact on user-mode processes and
> > > +         typical usage would be significantly less (0.50% when you build the
> > > +         kernel).
> > > +
> > > +         The kernel and modules will generate slightly more assembly (1 to 2%
> > > +         increase on the .text sections). The vmlinux binary will be
> > > +         significantly smaller due to less relocations.
> > >
> > > To put 10% kernel overhead into perspective: enabling this option wipes out about
> > > 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
> > > ... (!!)
> >
> > Note that 10% is the high-bound of a CPU intensive workload.
>
> Note that the 8-10% hackbench or even a 2%-4% range would be 'huge' in terms of
> modern kernel performance. In many cases we are literally applying cycle level
> optimizations that are barely measurable. A 0.1% speedup in linear execution speed
> is already a big success.
>
> > I am going to start doing performance testing on -mcmodel=large to see if it is
> > faster than -fPIE.
>
> Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine
> instruction level.
>
> Function calls look like this:
>
>  -mcmodel=medium:
>
>    757:   e8 98 ff ff ff          callq  6f4 <test_code>
>
>  -mcmodel=large
>
>    77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
>    782:   ff ff ff
>    785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
>    789:   ff d0                   callq  *%rax
>
> And we'd do this for _EVERY_ function call in the kernel. That kind of crap is
> totally unacceptable.
>

I started looking into mcmodel=large and ran into multiple issues. In the
meantime, I thought I would try different configurations and compilers.

I did 10 hackbench runs across 10 reboots with and without PIE (same commit)
with gcc 4.9. I copied the results below; depending on the hackbench
configuration we are between -0.29% and 1.92% (the average across
configurations is 0.8%), which seems more aligned with what people discussed
in this thread.

I don't know how I got a 10% maximum on hackbench before; I am still
investigating. It could be the configuration I used or my base compiler
being too old.
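
(For reference, a minimal sketch of how runs like these can be driven. It is
not the exact harness used here; it assumes perf's sched-messaging benchmark
as a stand-in for the standalone hackbench binary, and the group counts are
only placeholders for the 50/1600 load levels.)

# Sketch only: one pass of hackbench-style runs; repeat per boot and per
# kernel (baseline vs. PIE) and compare the averages.
for groups in 50 1600; do
  for run in $(seq 10); do
    perf bench sched messaging -g "$groups"       >> "process-socket-$groups.txt"
    perf bench sched messaging -g "$groups" -p    >> "process-pipe-$groups.txt"
    perf bench sched messaging -g "$groups" -t    >> "threads-socket-$groups.txt"
    perf bench sched messaging -g "$groups" -t -p >> "threads-pipe-$groups.txt"
  done
done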

> > > I think the fundamental flaw is the assumption that we need a PIE executable
> > > to have a freely relocatable kernel on 64-bit CPUs.
> > >
> > > Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
> > > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
> > > x86-64 address space to randomize the location of kernel text. The location of
> > > modules can be further randomized within that 2GB window.
> >
> > -mcmodel=small/medium assumes you are in the low 32 bits of the address
> > space. It generates instructions where the high 32 bits of virtual
> > addresses are zero.
>
> How are these assumptions hardcoded by GCC? Most of the instructions should be
> relocatable straight away, as most call/jump/branch instructions are RIP-relative.

I think PIE is capable of using relative instructions well.
mcmodel=large assumes symbols can be anywhere.

>
> I.e. is there no GCC code generation mode where code can be placed anywhere in the
> canonical address space, yet call and jump distance is within 31 bits so that the
> generated code is fast?

I think that's basically PIE. With PIE, you have the assumption that
everything is close; the main issue is any assembly referencing
absolute addresses.
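
(To illustrate that last point with a stand-alone sketch, independent of the
patches: an absolute reference in hand-written assembly needs an absolute
relocation, while a RIP-relative one stays position independent. A user-space
gcc/readelf is assumed.)

cat > ref.S << 'EOF'
        .globl  probe
probe:
        movabsq $target, %rax         # absolute 64-bit reference
        leaq    target(%rip), %rbx    # RIP-relative reference
        ret
        .data
target: .quad   0
EOF
gcc -c ref.S -o ref.o
readelf -r ref.o   # movabsq -> R_X86_64_64 (must be fixed up if the code moves)
                   # leaq    -> R_X86_64_PC32 (fine wherever the code lands)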

>
> Thanks,
>
>         Ingo

process-pipe-1600 ------
         baseline_samecommit     pie  % diff
0                     16.985  16.999   0.082
1                     17.065  17.071   0.033
2                     17.188  17.130  -0.342
3                     17.148  17.107  -0.240
4                     17.217  17.170  -0.275
5                     17.216  17.145  -0.415
6                     17.161  17.109  -0.304
7                     17.202  17.122  -0.465
8                     17.169  17.173   0.024
9                     17.217  17.178  -0.227
average               17.157  17.120  -0.213
median                17.169  17.122  -0.271
min                   16.985  16.999   0.082
max                   17.217  17.178  -0.228

[14 rows x 3 columns]
threads-pipe-1600 ------
         baseline_samecommit     pie  % diff
0                     17.914  18.041   0.707
1                     18.337  18.352   0.083
2                     18.233  18.457   1.225
3                     18.334  18.402   0.366
4                     18.381  18.369  -0.066
5                     18.370  18.408   0.207
6                     18.337  18.400   0.345
7                     18.368  18.372   0.020
8                     18.328  18.588   1.421
9                     18.369  18.344  -0.138
average               18.297  18.373   0.415
median                18.337  18.373   0.200
min                   17.914  18.041   0.707
max                   18.381  18.588   1.126

[14 rows x 3 columns]
threads-pipe-50 ------
         baseline_samecommit     pie  % diff
0                     23.491  22.794  -2.965
1                     23.219  23.542   1.387
2                     22.886  23.638   3.286
3                     23.233  23.778   2.343
4                     23.228  23.703   2.046
5                     23.000  23.376   1.636
6                     23.589  23.335  -1.079
7                     23.043  23.543   2.169
8                     23.117  23.350   1.007
9                     23.059  23.420   1.564
average               23.187  23.448   1.127
median                23.187  23.448   1.127
min                   22.886  22.794  -0.399
max                   23.589  23.778   0.800

[14 rows x 3 columns]
process-socket-50 ------
         baseline_samecommit     pie  % diff
0                     20.333  20.430   0.479
1                     20.198  20.371   0.856
2                     20.494  20.737   1.185
3                     20.445  21.264   4.008
4                     20.530  20.911   1.854
5                     20.281  20.487   1.015
6                     20.311  20.871   2.757
7                     20.472  20.890   2.044
8                     20.568  20.422  -0.710
9                     20.415  20.647   1.134
average               20.405  20.703   1.462
median                20.415  20.703   1.410
min                   20.198  20.371   0.856
max                   20.568  21.264   3.385

[14 rows x 3 columns]
process-pipe-50 ------
         baseline_samecommit     pie  % diff
0                     20.131  20.643   2.541
1                     20.184  20.658   2.349
2                     20.359  20.907   2.693
3                     20.365  21.284   4.514
4                     20.506  20.578   0.352
5                     20.393  20.599   1.010
6                     20.245  20.515   1.331
7                     20.627  20.964   1.636
8                     20.519  20.862   1.670
9                     20.505  20.741   1.150
average               20.383  20.775   1.922
median                20.383  20.741   1.753
min                   20.131  20.515   1.907
max                   20.627  21.284   3.186

[14 rows x 3 columns]
threads-socket-50 ------
         baseline_samecommit     pie  % diff
0                     23.197  23.728   2.286
1                     23.304  23.585   1.205
2                     23.098  23.379   1.217
3                     23.028  23.787   3.295
4                     23.242  23.122  -0.517
5                     23.036  23.512   2.068
6                     23.139  23.258   0.512
7                     22.801  23.458   2.881
8                     23.319  23.276  -0.187
9                     22.989  23.577   2.557
average               23.115  23.468   1.526
median                23.115  23.468   1.526
min                   22.801  23.122   1.407
max                   23.319  23.787   2.006

[14 rows x 3 columns]
process-socket-1600 ------
         baseline_samecommit     pie  % diff
0                     17.214  17.168  -0.262
1                     17.172  17.195   0.135
2                     17.278  17.137  -0.817
3                     17.173  17.102  -0.414
4                     17.211  17.153  -0.335
5                     17.220  17.160  -0.345
6                     17.224  17.161  -0.365
7                     17.224  17.224  -0.004
8                     17.176  17.135  -0.236
9                     17.242  17.188  -0.311
average               17.213  17.162  -0.296
median                17.214  17.161  -0.306
min                   17.172  17.102  -0.405
max                   17.278  17.224  -0.315

[14 rows x 3 columns]
threads-socket-1600 ------
         baseline_samecommit     pie  % diff
0                     18.395  18.389  -0.031
1                     18.459  18.404  -0.296
2                     18.427  18.445   0.096
3                     18.449  18.421  -0.150
4                     18.416  18.411  -0.026
5                     18.409  18.443   0.185
6                     18.325  18.308  -0.092
7                     18.491  18.317  -0.940
8                     18.496  18.375  -0.656
9                     18.436  18.385  -0.279
average               18.430  18.390  -0.219
median                18.430  18.390  -0.219
min                   18.325  18.308  -0.092
max                   18.496  18.445  -0.278

[14 rows x 3 columns]
Total stats ======
         baseline_samecommit     pie  % diff
average               19.773  19.930   0.791
median                19.773  19.930   0.791
min                   16.985  16.999   0.082
max                   23.589  23.787   0.839

[4 rows x 3 columns]

-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 15:12         ` Ingo Molnar
                             ` (3 preceding siblings ...)
  2017-08-16 16:26           ` Daniel Micay
@ 2017-08-16 16:57           ` Thomas Garnier
  2017-08-16 16:57           ` Thomas Garnier
                             ` (2 subsequent siblings)
  7 siblings, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-16 16:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Wed, Aug 16, 2017 at 8:12 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
> > On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > > * Thomas Garnier <thgarnie@google.com> wrote:
> > >
> > >> > Do these changes get us closer to being able to build the kernel as truly
> > >> > position independent, i.e. to place it anywhere in the valid x86-64 address
> > >> > space? Or any other advantages?
> > >>
> > >> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
> > >> have a full randomized address space where position and order of sections are
> > >> completely random. There is still some work to get there but being able to build
> > >> a PIE kernel is a significant step.
> > >
> > > So I _really_ dislike the whole PIE approach, because of the huge slowdown:
> > >
> > > +config RANDOMIZE_BASE_LARGE
> > > +       bool "Increase the randomization range of the kernel image"
> > > +       depends on X86_64 && RANDOMIZE_BASE
> > > +       select X86_PIE
> > > +       select X86_MODULE_PLTS if MODULES
> > > +       default n
> > > +       ---help---
> > > +         Build the kernel as a Position Independent Executable (PIE) and
> > > +         increase the available randomization range from 1GB to 3GB.
> > > +
> > > +         This option impacts performance on kernel CPU intensive workloads up
> > > +         to 10% due to PIE generated code. Impact on user-mode processes and
> > > +         typical usage would be significantly less (0.50% when you build the
> > > +         kernel).
> > > +
> > > +         The kernel and modules will generate slightly more assembly (1 to 2%
> > > +         increase on the .text sections). The vmlinux binary will be
> > > +         significantly smaller due to less relocations.
> > >
> > > To put 10% kernel overhead into perspective: enabling this option wipes out about
> > > 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
> > > ... (!!)
> >
> > Note that 10% is the high-bound of a CPU intensive workload.
>
> Note that the 8-10% hackbench or even a 2%-4% range would be 'huge' in terms of
> modern kernel performance. In many cases we are literally applying cycle level
> optimizations that are barely measurable. A 0.1% speedup in linear execution speed
> is already a big success.
>
> > I am going to start doing performance testing on -mcmodel=large to see if it is
> > faster than -fPIE.
>
> Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine
> instruction level.
>
> Function calls look like this:
>
>  -mcmodel=medium:
>
>    757:   e8 98 ff ff ff          callq  6f4 <test_code>
>
>  -mcmodel=large
>
>    77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
>    782:   ff ff ff
>    785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
>    789:   ff d0                   callq  *%rax
>
> And we'd do this for _EVERY_ function call in the kernel. That kind of crap is
> totally unacceptable.
>

I started looking into mcmodel=large and ran into multiple issues. In the
meantime, I thought I would try different configurations and compilers.

I did 10 hackbench runs across 10 reboots with and without PIE (same commit)
with gcc 4.9. I copied the results below; depending on the hackbench
configuration we are between -0.29% and 1.92% (the average across
configurations is 0.8%), which seems more aligned with what people discussed
in this thread.

I don't know how I got a 10% maximum on hackbench before; I am still
investigating. It could be the configuration I used or my base compiler
being too old.

> > > I think the fundamental flaw is the assumption that we need a PIE executable
> > > to have a freely relocatable kernel on 64-bit CPUs.
> > >
> > > Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
> > > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
> > > x86-64 address space to randomize the location of kernel text. The location of
> > > modules can be further randomized within that 2GB window.
> >
> > -mcmodel=small/medium assumes you are in the low 32 bits of the address
> > space. It generates instructions where the high 32 bits of virtual
> > addresses are zero.
>
> How are these assumptions hardcoded by GCC? Most of the instructions should be
> relocatable straight away, as most call/jump/branch instructions are RIP-relative.

I think PIE is capable of using relative instructions well.
mcmodel=large assumes symbols can be anywhere.

>
> I.e. is there no GCC code generation mode where code can be placed anywhere in the
> canonical address space, yet call and jump distance is within 31 bits so that the
> generated code is fast?

I think that's basically PIE. With PIE, you have the assumption that
everything is close; the main issue is any assembly referencing
absolute addresses.

>
> Thanks,
>
>         Ingo

process-pipe-1600 ------
         baseline_samecommit     pie  % diff
0                     16.985  16.999   0.082
1                     17.065  17.071   0.033
2                     17.188  17.130  -0.342
3                     17.148  17.107  -0.240
4                     17.217  17.170  -0.275
5                     17.216  17.145  -0.415
6                     17.161  17.109  -0.304
7                     17.202  17.122  -0.465
8                     17.169  17.173   0.024
9                     17.217  17.178  -0.227
average               17.157  17.120  -0.213
median                17.169  17.122  -0.271
min                   16.985  16.999   0.082
max                   17.217  17.178  -0.228

[14 rows x 3 columns]
threads-pipe-1600 ------
         baseline_samecommit     pie  % diff
0                     17.914  18.041   0.707
1                     18.337  18.352   0.083
2                     18.233  18.457   1.225
3                     18.334  18.402   0.366
4                     18.381  18.369  -0.066
5                     18.370  18.408   0.207
6                     18.337  18.400   0.345
7                     18.368  18.372   0.020
8                     18.328  18.588   1.421
9                     18.369  18.344  -0.138
average               18.297  18.373   0.415
median                18.337  18.373   0.200
min                   17.914  18.041   0.707
max                   18.381  18.588   1.126

[14 rows x 3 columns]
threads-pipe-50 ------
         baseline_samecommit     pie  % diff
0                     23.491  22.794  -2.965
1                     23.219  23.542   1.387
2                     22.886  23.638   3.286
3                     23.233  23.778   2.343
4                     23.228  23.703   2.046
5                     23.000  23.376   1.636
6                     23.589  23.335  -1.079
7                     23.043  23.543   2.169
8                     23.117  23.350   1.007
9                     23.059  23.420   1.564
average               23.187  23.448   1.127
median                23.187  23.448   1.127
min                   22.886  22.794  -0.399
max                   23.589  23.778   0.800

[14 rows x 3 columns]
process-socket-50 ------
         baseline_samecommit     pie  % diff
0                     20.333  20.430   0.479
1                     20.198  20.371   0.856
2                     20.494  20.737   1.185
3                     20.445  21.264   4.008
4                     20.530  20.911   1.854
5                     20.281  20.487   1.015
6                     20.311  20.871   2.757
7                     20.472  20.890   2.044
8                     20.568  20.422  -0.710
9                     20.415  20.647   1.134
average               20.405  20.703   1.462
median                20.415  20.703   1.410
min                   20.198  20.371   0.856
max                   20.568  21.264   3.385

[14 rows x 3 columns]
process-pipe-50 ------
         baseline_samecommit     pie  % diff
0                     20.131  20.643   2.541
1                     20.184  20.658   2.349
2                     20.359  20.907   2.693
3                     20.365  21.284   4.514
4                     20.506  20.578   0.352
5                     20.393  20.599   1.010
6                     20.245  20.515   1.331
7                     20.627  20.964   1.636
8                     20.519  20.862   1.670
9                     20.505  20.741   1.150
average               20.383  20.775   1.922
median                20.383  20.741   1.753
min                   20.131  20.515   1.907
max                   20.627  21.284   3.186

[14 rows x 3 columns]
threads-socket-50 ------
         baseline_samecommit     pie  % diff
0                     23.197  23.728   2.286
1                     23.304  23.585   1.205
2                     23.098  23.379   1.217
3                     23.028  23.787   3.295
4                     23.242  23.122  -0.517
5                     23.036  23.512   2.068
6                     23.139  23.258   0.512
7                     22.801  23.458   2.881
8                     23.319  23.276  -0.187
9                     22.989  23.577   2.557
average               23.115  23.468   1.526
median                23.115  23.468   1.526
min                   22.801  23.122   1.407
max                   23.319  23.787   2.006

[14 rows x 3 columns]
process-socket-1600 ------
         baseline_samecommit     pie  % diff
0                     17.214  17.168  -0.262
1                     17.172  17.195   0.135
2                     17.278  17.137  -0.817
3                     17.173  17.102  -0.414
4                     17.211  17.153  -0.335
5                     17.220  17.160  -0.345
6                     17.224  17.161  -0.365
7                     17.224  17.224  -0.004
8                     17.176  17.135  -0.236
9                     17.242  17.188  -0.311
average               17.213  17.162  -0.296
median                17.214  17.161  -0.306
min                   17.172  17.102  -0.405
max                   17.278  17.224  -0.315

[14 rows x 3 columns]
threads-socket-1600 ------
         baseline_samecommit     pie  % diff
0                     18.395  18.389  -0.031
1                     18.459  18.404  -0.296
2                     18.427  18.445   0.096
3                     18.449  18.421  -0.150
4                     18.416  18.411  -0.026
5                     18.409  18.443   0.185
6                     18.325  18.308  -0.092
7                     18.491  18.317  -0.940
8                     18.496  18.375  -0.656
9                     18.436  18.385  -0.279
average               18.430  18.390  -0.219
median                18.430  18.390  -0.219
min                   18.325  18.308  -0.092
max                   18.496  18.445  -0.278

[14 rows x 3 columns]
Total stats ======
         baseline_samecommit     pie  % diff
average               19.773  19.930   0.791
median                19.773  19.930   0.791
min                   16.985  16.999   0.082
max                   23.589  23.787   0.839

[4 rows x 3 columns]

-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 16:26           ` Daniel Micay
  2017-08-16 16:32             ` Ard Biesheuvel
@ 2017-08-16 16:32             ` Ard Biesheuvel
  1 sibling, 0 replies; 106+ messages in thread
From: Ard Biesheuvel @ 2017-08-16 16:32 UTC (permalink / raw)
  To: Daniel Micay
  Cc: Ingo Molnar, Thomas Garnier, Herbert Xu, David S . Miller,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Peter Zijlstra,
	Josh Poimboeuf, Arnd Bergmann, Matthias Kaehlcke,
	Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown

On 16 August 2017 at 17:26, Daniel Micay <danielmicay@gmail.com> wrote:
>> How are these assumptions hardcoded by GCC? Most of the instructions
>> should be
>> relocatable straight away, as most call/jump/branch instructions are
>> RIP-relative.
>>
>> I.e. is there no GCC code generation mode where code can be placed
>> anywhere in the
>> canonical address space, yet call and jump distance is within 31 bits
>> so that the
>> generated code is fast?
>
> That's what PIE is meant to do. However, leaving lazy linking and symbol
> interposition enabled (i.e. not using -fno-plt / -Bsymbolic) is going to
> add needless overhead.
>
> arm64 is using -pie -shared -Bsymbolic in arch/arm64/Makefile for their
> CONFIG_RELOCATABLE option. See 08cc55b2afd97a654f71b3bebf8bb0ec89fdc498.

The difference with arm64 is that its generic small code model is
already position independent, so we don't have to pass -fpic or -fpie
to the compiler. We only link in PIE mode to get the linker to emit
the dynamic relocation tables into the ELF binary. Relative branches
have a range of +/- 128 MB, which covers the kernel and modules
(unless the option to randomize the module region independently has
been selected, in which case branches between the kernel and modules
may be resolved via PLT entries that are emitted at module load time).

I am not sure how this extrapolates to x86, just adding some context.
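
(A small sketch of that difference, assuming an aarch64-linux-gnu cross
toolchain is installed; it only shows the compiler side, not the kernel
link, which is where the -pie -shared -Bsymbolic flags mentioned above
come in.)

cat > p.c << 'EOF'
int counter;
int *counter_addr(void) { return &counter; }
EOF
aarch64-linux-gnu-gcc -O2 -fno-pie -S -o - p.c
# Expected: adrp x0, counter / add x0, x0, :lo12:counter -- PC-relative
# addressing even without -fpic, so the code model itself is already
# position independent and only the link step needs the PIE flags.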

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 16:26           ` Daniel Micay
@ 2017-08-16 16:32             ` Ard Biesheuvel
  2017-08-16 16:32             ` Ard Biesheuvel
  1 sibling, 0 replies; 106+ messages in thread
From: Ard Biesheuvel @ 2017-08-16 16:32 UTC (permalink / raw)
  To: Daniel Micay
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, Len Brown,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Ingo Molnar, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On 16 August 2017 at 17:26, Daniel Micay <danielmicay@gmail.com> wrote:
>> How are these assumptions hardcoded by GCC? Most of the instructions
>> should be
>> relocatable straight away, as most call/jump/branch instructions are
>> RIP-relative.
>>
>> I.e. is there no GCC code generation mode where code can be placed
>> anywhere in the
>> canonical address space, yet call and jump distance is within 31 bits
>> so that the
>> generated code is fast?
>
> That's what PIE is meant to do. However, leaving lazy linking and symbol
> interposition enabled (i.e. not using -fno-plt / -Bsymbolic) is going to
> add needless overhead.
>
> arm64 is using -pie -shared -Bsymbolic in arch/arm64/Makefile for their
> CONFIG_RELOCATABLE option. See 08cc55b2afd97a654f71b3bebf8bb0ec89fdc498.

The difference with arm64 is that its generic small code model is
already position independent, so we don't have to pass -fpic or -fpie
to the compiler. We only link in PIE mode to get the linker to emit
the dynamic relocation tables into the ELF binary. Relative branches
have a range of +/- 128 MB, which covers the kernel and modules
(unless the option to randomize the module region independently has
been selected, in which case branches between the kernel and modules
may be resolved via PLT entries that are emitted at module load time).

I am not sure how this extrapolates to x86, just adding some context.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 15:12         ` Ingo Molnar
  2017-08-16 16:09           ` Christopher Lameter
  2017-08-16 16:09           ` Christopher Lameter
@ 2017-08-16 16:26           ` Daniel Micay
  2017-08-16 16:32             ` Ard Biesheuvel
  2017-08-16 16:32             ` Ard Biesheuvel
  2017-08-16 16:26           ` Daniel Micay
                             ` (4 subsequent siblings)
  7 siblings, 2 replies; 106+ messages in thread
From: Daniel Micay @ 2017-08-16 16:26 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

> How are these assumptions hardcoded by GCC? Most of the instructions
> should be 
> relocatable straight away, as most call/jump/branch instructions are
> RIP-relative.
> 
> I.e. is there no GCC code generation mode where code can be placed
> anywhere in the 
> canonical address space, yet call and jump distance is within 31 bits
> so that the 
> generated code is fast?

That's what PIE is meant to do. However, leaving lazy linking and symbol
interposition enabled (i.e. not using -fno-plt / -Bsymbolic) is going to
add needless overhead.

arm64 is using -pie -shared -Bsymbolic in arch/arm64/Makefile for their
CONFIG_RELOCATABLE option. See 08cc55b2afd97a654f71b3bebf8bb0ec89fdc498.
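
(A user-space sketch of what -fno-plt changes in the call sequence; it
assumes a reasonably recent gcc, since -fno-plt only exists in newer
releases.)

cat > call.c << 'EOF'
extern int helper(void);
int caller(void) { return helper() + 1; }
EOF
gcc -O2 -fpie -c call.c -o plt.o
gcc -O2 -fpie -fno-plt -c call.c -o noplt.o
objdump -dr plt.o     # call helper (R_X86_64_PLT32): routed through the PLT
objdump -dr noplt.o   # call *helper@GOTPCREL(%rip): loads the target straight
                      # from the GOT, no PLT stub in the path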

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 15:12         ` Ingo Molnar
                             ` (2 preceding siblings ...)
  2017-08-16 16:26           ` Daniel Micay
@ 2017-08-16 16:26           ` Daniel Micay
  2017-08-16 16:57           ` Thomas Garnier
                             ` (3 subsequent siblings)
  7 siblings, 0 replies; 106+ messages in thread
From: Daniel Micay @ 2017-08-16 16:26 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

> How are these assumptions hardcoded by GCC? Most of the instructions
> should be 
> relocatable straight away, as most call/jump/branch instructions are
> RIP-relative.
> 
> I.e. is there no GCC code generation mode where code can be placed
> anywhere in the 
> canonical address space, yet call and jump distance is within 31 bits
> so that the 
> generated code is fast?

That's what PIE is meant to do. However, leaving lazy linking and symbol
interposition enabled (i.e. not using -fno-plt / -Bsymbolic) is going to
add needless overhead.

arm64 is using -pie -shared -Bsymbolic in arch/arm64/Makefile for their
CONFIG_RELOCATABLE option. See 08cc55b2afd97a654f71b3bebf8bb0ec89fdc498.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 15:12         ` Ingo Molnar
  2017-08-16 16:09           ` Christopher Lameter
@ 2017-08-16 16:09           ` Christopher Lameter
  2017-08-16 16:26           ` Daniel Micay
                             ` (5 subsequent siblings)
  7 siblings, 0 replies; 106+ messages in thread
From: Christopher Lameter @ 2017-08-16 16:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Garnier, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki

On Wed, 16 Aug 2017, Ingo Molnar wrote:

> And we'd do this for _EVERY_ function call in the kernel. That kind of crap is
> totally unacceptable.

Ahh finally a limit is in sight as to how much security hardening etc can
reduce kernel performance.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-16 15:12         ` Ingo Molnar
@ 2017-08-16 16:09           ` Christopher Lameter
  2017-08-16 16:09           ` Christopher Lameter
                             ` (6 subsequent siblings)
  7 siblings, 0 replies; 106+ messages in thread
From: Christopher Lameter @ 2017-08-16 16:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Thomas Gleixner, Kees Cook,
	the arch/x86 maintainers, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel

On Wed, 16 Aug 2017, Ingo Molnar wrote:

> And we'd do this for _EVERY_ function call in the kernel. That kind of crap is
> totally unacceptable.

Ahh finally a limit is in sight as to how much security hardening etc can
reduce kernel performance.


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15 14:20       ` Thomas Garnier
                           ` (2 preceding siblings ...)
  2017-08-16 15:12         ` Ingo Molnar
@ 2017-08-16 15:12         ` Ingo Molnar
  2017-08-16 16:09           ` Christopher Lameter
                             ` (7 more replies)
  2017-08-21 14:31         ` Peter Zijlstra
  2017-08-21 14:31         ` Peter Zijlstra
  5 siblings, 8 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-08-16 15:12 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* Thomas Garnier <thgarnie@google.com> wrote:

> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Thomas Garnier <thgarnie@google.com> wrote:
> >
> >> > Do these changes get us closer to being able to build the kernel as truly
> >> > position independent, i.e. to place it anywhere in the valid x86-64 address
> >> > space? Or any other advantages?
> >>
> >> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
> >> have a full randomized address space where position and order of sections are
> >> completely random. There is still some work to get there but being able to build
> >> a PIE kernel is a significant step.
> >
> > So I _really_ dislike the whole PIE approach, because of the huge slowdown:
> >
> > +config RANDOMIZE_BASE_LARGE
> > +       bool "Increase the randomization range of the kernel image"
> > +       depends on X86_64 && RANDOMIZE_BASE
> > +       select X86_PIE
> > +       select X86_MODULE_PLTS if MODULES
> > +       default n
> > +       ---help---
> > +         Build the kernel as a Position Independent Executable (PIE) and
> > +         increase the available randomization range from 1GB to 3GB.
> > +
> > +         This option impacts performance on kernel CPU intensive workloads up
> > +         to 10% due to PIE generated code. Impact on user-mode processes and
> > +         typical usage would be significantly less (0.50% when you build the
> > +         kernel).
> > +
> > +         The kernel and modules will generate slightly more assembly (1 to 2%
> > +         increase on the .text sections). The vmlinux binary will be
> > +         significantly smaller due to less relocations.
> >
> > To put 10% kernel overhead into perspective: enabling this option wipes out about
> > 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
> > ... (!!)
> 
> Note that 10% is the high-bound of a CPU intensive workload.

Note that the 8-10% hackbench or even a 2%-4% range would be 'huge' in terms of 
modern kernel performance. In many cases we are literally applying cycle level 
optimizations that are barely measurable. A 0.1% speedup in linear execution speed 
is already a big success.

> I am going to start doing performance testing on -mcmodel=large to see if it is 
> faster than -fPIE.

Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine 
instruction level.

Function calls look like this:

 -mcmodel=medium:

   757:   e8 98 ff ff ff          callq  6f4 <test_code>

 -mcmodel=large

   77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
   782:   ff ff ff 
   785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
   789:   ff d0                   callq  *%rax

And we'd do this for _EVERY_ function call in the kernel. That kind of crap is 
totally unacceptable.
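
(For reference, this kind of comparison can be reproduced with a user-space
compile; the kernel itself builds with -mcmodel=kernel, so the sketch below
is only illustrative, not the kernel's actual codegen.)

cat > calls.c << 'EOF'
extern void test_code(void);
void run(void) { test_code(); test_code(); }
EOF
gcc -O2 -fno-pie -mcmodel=medium -c calls.c -o medium.o && objdump -dr medium.o
gcc -O2 -fno-pie -mcmodel=large  -c calls.c -o large.o  && objdump -dr large.o
# medium: callq test_code                      -- single 5-byte rel32 call
# large:  movabs $test_code,%rax ; callq *%rax -- 64-bit absolute address
#         loaded into a register, then an indirect call, at every call site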

> > I think the fundamental flaw is the assumption that we need a PIE executable 
> > to have a freely relocatable kernel on 64-bit CPUs.
> >
> > Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie 
> > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical 
> > x86-64 address space to randomize the location of kernel text. The location of 
> > modules can be further randomized within that 2GB window.
> 
> -mcmodel=small/medium assumes you are in the low 32 bits of the address
> space. It generates instructions where the high 32 bits of virtual
> addresses are zero.

How are these assumptions hardcoded by GCC? Most of the instructions should be 
relocatable straight away, as most call/jump/branch instructions are RIP-relative.

I.e. is there no GCC code generation mode where code can be placed anywhere in the 
canonical address space, yet call and jump distance is within 31 bits so that the 
generated code is fast?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15 14:20       ` Thomas Garnier
  2017-08-15 14:47         ` Daniel Micay
  2017-08-15 14:47         ` Daniel Micay
@ 2017-08-16 15:12         ` Ingo Molnar
  2017-08-16 15:12         ` Ingo Molnar
                           ` (2 subsequent siblings)
  5 siblings, 0 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-08-16 15:12 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley


* Thomas Garnier <thgarnie@google.com> wrote:

> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Thomas Garnier <thgarnie@google.com> wrote:
> >
> >> > Do these changes get us closer to being able to build the kernel as truly
> >> > position independent, i.e. to place it anywhere in the valid x86-64 address
> >> > space? Or any other advantages?
> >>
> >> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
> >> have a full randomized address space where position and order of sections are
> >> completely random. There is still some work to get there but being able to build
> >> a PIE kernel is a significant step.
> >
> > So I _really_ dislike the whole PIE approach, because of the huge slowdown:
> >
> > +config RANDOMIZE_BASE_LARGE
> > +       bool "Increase the randomization range of the kernel image"
> > +       depends on X86_64 && RANDOMIZE_BASE
> > +       select X86_PIE
> > +       select X86_MODULE_PLTS if MODULES
> > +       default n
> > +       ---help---
> > +         Build the kernel as a Position Independent Executable (PIE) and
> > +         increase the available randomization range from 1GB to 3GB.
> > +
> > +         This option impacts performance on kernel CPU intensive workloads up
> > +         to 10% due to PIE generated code. Impact on user-mode processes and
> > +         typical usage would be significantly less (0.50% when you build the
> > +         kernel).
> > +
> > +         The kernel and modules will generate slightly more assembly (1 to 2%
> > +         increase on the .text sections). The vmlinux binary will be
> > +         significantly smaller due to less relocations.
> >
> > To put 10% kernel overhead into perspective: enabling this option wipes out about
> > 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
> > ... (!!)
> 
> Note that 10% is the high-bound of a CPU intensive workload.

Note that the 8-10% hackbench or even a 2%-4% range would be 'huge' in terms of 
modern kernel performance. In many cases we are literally applying cycle level 
optimizations that are barely measurable. A 0.1% speedup in linear execution speed 
is already a big success.

> I am going to start doing performance testing on -mcmodel=large to see if it is 
> faster than -fPIE.

Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine 
instruction level.

Function calls look like this:

 -mcmodel=medium:

   757:   e8 98 ff ff ff          callq  6f4 <test_code>

 -mcmodel=large

   77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
   782:   ff ff ff 
   785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
   789:   ff d0                   callq  *%rax

And we'd do this for _EVERY_ function call in the kernel. That kind of crap is 
totally unacceptable.

> > I think the fundamental flaw is the assumption that we need a PIE executable 
> > to have a freely relocatable kernel on 64-bit CPUs.
> >
> > Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie 
> > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical 
> > x86-64 address space to randomize the location of kernel text. The location of 
> > modules can be further randomized within that 2GB window.
> 
> -mcmodel=small/medium assumes you are in the low 32 bits of the address
> space. It generates instructions where the high 32 bits of virtual
> addresses are zero.

How are these assumptions hardcoded by GCC? Most of the instructions should be 
relocatable straight away, as most call/jump/branch instructions are RIP-relative.

I.e. is there no GCC code generation mode where code can be placed anywhere in the 
canonical address space, yet call and jump distance is within 31 bits so that the 
generated code is fast?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15 14:47         ` Daniel Micay
  2017-08-15 14:58           ` Thomas Garnier
@ 2017-08-15 14:58           ` Thomas Garnier
  1 sibling, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-15 14:58 UTC (permalink / raw)
  To: Daniel Micay
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On Tue, Aug 15, 2017 at 7:47 AM, Daniel Micay <danielmicay@gmail.com> wrote:
> On 15 August 2017 at 10:20, Thomas Garnier <thgarnie@google.com> wrote:
>> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>>
>>> * Thomas Garnier <thgarnie@google.com> wrote:
>>>
>>>> > Do these changes get us closer to being able to build the kernel as truly
>>>> > position independent, i.e. to place it anywhere in the valid x86-64 address
>>>> > space? Or any other advantages?
>>>>
>>>> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
>>>> have a full randomized address space where position and order of sections are
>>>> completely random. There is still some work to get there but being able to build
>>>> a PIE kernel is a significant step.
>>>
>>> So I _really_ dislike the whole PIE approach, because of the huge slowdown:
>>>
>>> +config RANDOMIZE_BASE_LARGE
>>> +       bool "Increase the randomization range of the kernel image"
>>> +       depends on X86_64 && RANDOMIZE_BASE
>>> +       select X86_PIE
>>> +       select X86_MODULE_PLTS if MODULES
>>> +       default n
>>> +       ---help---
>>> +         Build the kernel as a Position Independent Executable (PIE) and
>>> +         increase the available randomization range from 1GB to 3GB.
>>> +
>>> +         This option impacts performance on kernel CPU intensive workloads up
>>> +         to 10% due to PIE generated code. Impact on user-mode processes and
>>> +         typical usage would be significantly less (0.50% when you build the
>>> +         kernel).
>>> +
>>> +         The kernel and modules will generate slightly more assembly (1 to 2%
>>> +         increase on the .text sections). The vmlinux binary will be
>>> +         significantly smaller due to less relocations.
>>>
>>> To put 10% kernel overhead into perspective: enabling this option wipes out about
>>> 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
>>> ... (!!)
>>
>> Note that 10% is the high-bound of a CPU intensive workload.
>
> The cost can be reduced by using -fno-plt these days but some work
> might be required to make that work with the kernel.
>
> Where does that 10% estimate in the kernel config docs come from? I'd
> be surprised if it really cost that much on x86_64. That's a realistic
> cost for i386 with modern GCC (it used to be worse) but I'd expect
> x86_64 to be closer to 2% even for CPU intensive workloads. It should
> be very close to zero with -fno-plt.

I got 8 to 10% on hackbench. Other benchmarks were 4% or lower.

I will also look at a more recent compiler and at no-plt.

-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15 14:47         ` Daniel Micay
@ 2017-08-15 14:58           ` Thomas Garnier
  2017-08-15 14:58           ` Thomas Garnier
  1 sibling, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-15 14:58 UTC (permalink / raw)
  To: Daniel Micay
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Ingo Molnar, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Tue, Aug 15, 2017 at 7:47 AM, Daniel Micay <danielmicay@gmail.com> wrote:
> On 15 August 2017 at 10:20, Thomas Garnier <thgarnie@google.com> wrote:
>> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>>
>>> * Thomas Garnier <thgarnie@google.com> wrote:
>>>
>>>> > Do these changes get us closer to being able to build the kernel as truly
>>>> > position independent, i.e. to place it anywhere in the valid x86-64 address
>>>> > space? Or any other advantages?
>>>>
>>>> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
>>>> have a full randomized address space where position and order of sections are
>>>> completely random. There is still some work to get there but being able to build
>>>> a PIE kernel is a significant step.
>>>
>>> So I _really_ dislike the whole PIE approach, because of the huge slowdown:
>>>
>>> +config RANDOMIZE_BASE_LARGE
>>> +       bool "Increase the randomization range of the kernel image"
>>> +       depends on X86_64 && RANDOMIZE_BASE
>>> +       select X86_PIE
>>> +       select X86_MODULE_PLTS if MODULES
>>> +       default n
>>> +       ---help---
>>> +         Build the kernel as a Position Independent Executable (PIE) and
>>> +         increase the available randomization range from 1GB to 3GB.
>>> +
>>> +         This option impacts performance on kernel CPU intensive workloads up
>>> +         to 10% due to PIE generated code. Impact on user-mode processes and
>>> +         typical usage would be significantly less (0.50% when you build the
>>> +         kernel).
>>> +
>>> +         The kernel and modules will generate slightly more assembly (1 to 2%
>>> +         increase on the .text sections). The vmlinux binary will be
>>> +         significantly smaller due to less relocations.
>>>
>>> To put 10% kernel overhead into perspective: enabling this option wipes out about
>>> 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
>>> ... (!!)
>>
>> Note that 10% is the high-bound of a CPU intensive workload.
>
> The cost can be reduced by using -fno-plt these days but some work
> might be required to make that work with the kernel.
>
> Where does that 10% estimate in the kernel config docs come from? I'd
> be surprised if it really cost that much on x86_64. That's a realistic
> cost for i386 with modern GCC (it used to be worse) but I'd expect
> x86_64 to be closer to 2% even for CPU intensive workloads. It should
> be very close to zero with -fno-plt.

I got 8 to 10% on hackbench. Other benchmarks were 4% or lower.

I will also look at a more recent compiler and at no-plt.

-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15 14:20       ` Thomas Garnier
  2017-08-15 14:47         ` Daniel Micay
@ 2017-08-15 14:47         ` Daniel Micay
  2017-08-15 14:58           ` Thomas Garnier
  2017-08-15 14:58           ` Thomas Garnier
  2017-08-16 15:12         ` Ingo Molnar
                           ` (3 subsequent siblings)
  5 siblings, 2 replies; 106+ messages in thread
From: Daniel Micay @ 2017-08-15 14:47 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Ingo Molnar, Herbert Xu, David S . Miller, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek

On 15 August 2017 at 10:20, Thomas Garnier <thgarnie@google.com> wrote:
> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Thomas Garnier <thgarnie@google.com> wrote:
>>
>>> > Do these changes get us closer to being able to build the kernel as truly
>>> > position independent, i.e. to place it anywhere in the valid x86-64 address
>>> > space? Or any other advantages?
>>>
>>> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
>>> have a full randomized address space where position and order of sections are
>>> completely random. There is still some work to get there but being able to build
>>> a PIE kernel is a significant step.
>>
>> So I _really_ dislike the whole PIE approach, because of the huge slowdown:
>>
>> +config RANDOMIZE_BASE_LARGE
>> +       bool "Increase the randomization range of the kernel image"
>> +       depends on X86_64 && RANDOMIZE_BASE
>> +       select X86_PIE
>> +       select X86_MODULE_PLTS if MODULES
>> +       default n
>> +       ---help---
>> +         Build the kernel as a Position Independent Executable (PIE) and
>> +         increase the available randomization range from 1GB to 3GB.
>> +
>> +         This option impacts performance on kernel CPU intensive workloads up
>> +         to 10% due to PIE generated code. Impact on user-mode processes and
>> +         typical usage would be significantly less (0.50% when you build the
>> +         kernel).
>> +
>> +         The kernel and modules will generate slightly more assembly (1 to 2%
>> +         increase on the .text sections). The vmlinux binary will be
>> +         significantly smaller due to less relocations.
>>
>> To put 10% kernel overhead into perspective: enabling this option wipes out about
>> 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
>> ... (!!)
>
> Note that 10% is the high-bound of a CPU intensive workload.

The cost can be reduced by using -fno-plt these days but some work
might be required to make that work with the kernel.

Where does that 10% estimate in the kernel config docs come from? I'd
be surprised if it really cost that much on x86_64. That's a realistic
cost for i386 with modern GCC (it used to be worse) but I'd expect
x86_64 to be closer to 2% even for CPU intensive workloads. It should
be very close to zero with -fno-plt.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15 14:20       ` Thomas Garnier
@ 2017-08-15 14:47         ` Daniel Micay
  2017-08-15 14:47         ` Daniel Micay
                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 106+ messages in thread
From: Daniel Micay @ 2017-08-15 14:47 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Ingo Molnar, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On 15 August 2017 at 10:20, Thomas Garnier <thgarnie@google.com> wrote:
> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Thomas Garnier <thgarnie@google.com> wrote:
>>
>>> > Do these changes get us closer to being able to build the kernel as truly
>>> > position independent, i.e. to place it anywhere in the valid x86-64 address
>>> > space? Or any other advantages?
>>>
>>> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
>>> have a full randomized address space where position and order of sections are
>>> completely random. There is still some work to get there but being able to build
>>> a PIE kernel is a significant step.
>>
>> So I _really_ dislike the whole PIE approach, because of the huge slowdown:
>>
>> +config RANDOMIZE_BASE_LARGE
>> +       bool "Increase the randomization range of the kernel image"
>> +       depends on X86_64 && RANDOMIZE_BASE
>> +       select X86_PIE
>> +       select X86_MODULE_PLTS if MODULES
>> +       default n
>> +       ---help---
>> +         Build the kernel as a Position Independent Executable (PIE) and
>> +         increase the available randomization range from 1GB to 3GB.
>> +
>> +         This option impacts performance on kernel CPU intensive workloads up
>> +         to 10% due to PIE generated code. Impact on user-mode processes and
>> +         typical usage would be significantly less (0.50% when you build the
>> +         kernel).
>> +
>> +         The kernel and modules will generate slightly more assembly (1 to 2%
>> +         increase on the .text sections). The vmlinux binary will be
>> +         significantly smaller due to less relocations.
>>
>> To put 10% kernel overhead into perspective: enabling this option wipes out about
>> 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
>> ... (!!)
>
> Note that 10% is the high-bound of a CPU intensive workload.

The cost can be reduced by using -fno-plt these days but some work
might be required to make that work with the kernel.

Where does that 10% estimate in the kernel config docs come from? I'd
be surprised if it really cost that much on x86_64. That's a realistic
cost for i386 with modern GCC (it used to be worse) but I'd expect
x86_64 to be closer to 2% even for CPU intensive workloads. It should
be very close to zero with -fno-plt.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15  7:56     ` Ingo Molnar
  2017-08-15 14:20       ` Thomas Garnier
@ 2017-08-15 14:20       ` Thomas Garnier
  2017-08-15 14:47         ` Daniel Micay
                           ` (5 more replies)
  1 sibling, 6 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-15 14:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> > Do these changes get us closer to being able to build the kernel as truly
>> > position independent, i.e. to place it anywhere in the valid x86-64 address
>> > space? Or any other advantages?
>>
>> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
>> have a full randomized address space where position and order of sections are
>> completely random. There is still some work to get there but being able to build
>> a PIE kernel is a significant step.
>
> So I _really_ dislike the whole PIE approach, because of the huge slowdown:
>
> +config RANDOMIZE_BASE_LARGE
> +       bool "Increase the randomization range of the kernel image"
> +       depends on X86_64 && RANDOMIZE_BASE
> +       select X86_PIE
> +       select X86_MODULE_PLTS if MODULES
> +       default n
> +       ---help---
> +         Build the kernel as a Position Independent Executable (PIE) and
> +         increase the available randomization range from 1GB to 3GB.
> +
> +         This option impacts performance on kernel CPU intensive workloads up
> +         to 10% due to PIE generated code. Impact on user-mode processes and
> +         typical usage would be significantly less (0.50% when you build the
> +         kernel).
> +
> +         The kernel and modules will generate slightly more assembly (1 to 2%
> +         increase on the .text sections). The vmlinux binary will be
> +         significantly smaller due to less relocations.
>
> To put 10% kernel overhead into perspective: enabling this option wipes out about
> 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
> ... (!!)

Note that 10% is the high-bound of a CPU intensive workload.

>
> I think the fundamental flaw is the assumption that we need a PIE executable to
> have a freely relocatable kernel on 64-bit CPUs.
>
> Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
> -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
> x86-64 address space to randomize the location of kernel text. The location of
> modules can be further randomized within that 2GB window.

-mcmodel=small/medium assumes you are in the low 32 bits of the address
space. It generates instructions where the high 32 bits of virtual
addresses are zero.

I am going to start doing performance testing on -mcmodel=large to see
if it is faster than -fPIE.

>
> It should have far less performance impact than the register-losing and
> overhead-inducing -fpie / -mcmodel=large (for modules) execution models.
>
> My quick guess is that the performance impact might be close to zero in fact.

If mcmodel=small/medium were possible for the kernel, I don't think it
would have less performance impact than mcmodel=large. It would still
need to set the high 32 bits to a static value; only the relocation
would be a different size.
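
As a rough illustration of the difference (a sketch, not taken from the
patches), this is approximately what taking the address of a kernel
symbol compiles to under each model:

/* Illustrative only: approximate code generation for "&kernel_sym". */
extern int kernel_sym;

int *addr_of_sym(void)
{
        /*
         * -mcmodel=kernel/small: movq    $kernel_sym, %rax      (7 bytes,
         *     32-bit absolute relocation, the upper 32 bits of the
         *     address are assumed to be a fixed value)
         * -fPIE:                 leaq    kernel_sym(%rip), %rax (7 bytes,
         *     RIP-relative, no absolute relocation needed)
         * -mcmodel=large:        movabsq $kernel_sym, %rax      (10 bytes,
         *     full 64-bit absolute relocation)
         */
        return &kernel_sym;
}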

>
> Thanks,
>
>         Ingo



-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-15  7:56     ` Ingo Molnar
@ 2017-08-15 14:20       ` Thomas Garnier
  2017-08-15 14:20       ` Thomas Garnier
  1 sibling, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-15 14:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> > Do these changes get us closer to being able to build the kernel as truly
>> > position independent, i.e. to place it anywhere in the valid x86-64 address
>> > space? Or any other advantages?
>>
>> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to
>> have a full randomized address space where position and order of sections are
>> completely random. There is still some work to get there but being able to build
>> a PIE kernel is a significant step.
>
> So I _really_ dislike the whole PIE approach, because of the huge slowdown:
>
> +config RANDOMIZE_BASE_LARGE
> +       bool "Increase the randomization range of the kernel image"
> +       depends on X86_64 && RANDOMIZE_BASE
> +       select X86_PIE
> +       select X86_MODULE_PLTS if MODULES
> +       default n
> +       ---help---
> +         Build the kernel as a Position Independent Executable (PIE) and
> +         increase the available randomization range from 1GB to 3GB.
> +
> +         This option impacts performance on kernel CPU intensive workloads up
> +         to 10% due to PIE generated code. Impact on user-mode processes and
> +         typical usage would be significantly less (0.50% when you build the
> +         kernel).
> +
> +         The kernel and modules will generate slightly more assembly (1 to 2%
> +         increase on the .text sections). The vmlinux binary will be
> +         significantly smaller due to less relocations.
>
> To put 10% kernel overhead into perspective: enabling this option wipes out about
> 5-10 years worth of painstaking optimizations we've done to keep the kernel fast
> ... (!!)

Note that 10% is the high bound, seen on a CPU-intensive workload.

>
> I think the fundamental flaw is the assumption that we need a PIE executable to
> have a freely relocatable kernel on 64-bit CPUs.
>
> Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie
> -mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical
> x86-64 address space to randomize the location of kernel text. The location of
> modules can be further randomized within that 2GB window.

-mcmodel=small/medium assume the code is in the low 32 bits of the
address space. They generate instructions where the virtual addresses
have the high 32 bits set to zero.

I am going to start doing performance testing on -mcmodel=large to see
if it is faster than -fPIE.

>
> It should have far less performance impact than the register-losing and
> overhead-inducing -fpie / -mcmodel=large (for modules) execution models.
>
> My quick guess is tha the performance impact might be close to zero in fact.

If mcmodel=small/medium were possible for the kernel, I don't think it
would have less performance impact than mcmodel=large. It would still
need to set the high 32 bits to a static value; only the relocation
would be a different size.

>
> Thanks,
>
>         Ingo



-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-11 15:09   ` Thomas Garnier
  2017-08-15  7:56     ` Ingo Molnar
@ 2017-08-15  7:56     ` Ingo Molnar
  2017-08-15 14:20       ` Thomas Garnier
  2017-08-15 14:20       ` Thomas Garnier
  1 sibling, 2 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-08-15  7:56 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* Thomas Garnier <thgarnie@google.com> wrote:

> > Do these changes get us closer to being able to build the kernel as truly 
> > position independent, i.e. to place it anywhere in the valid x86-64 address 
> > space? Or any other advantages?
> 
> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to 
> have a full randomized address space where position and order of sections are 
> completely random. There is still some work to get there but being able to build 
> a PIE kernel is a significant step.

So I _really_ dislike the whole PIE approach, because of the huge slowdown:

+config RANDOMIZE_BASE_LARGE
+       bool "Increase the randomization range of the kernel image"
+       depends on X86_64 && RANDOMIZE_BASE
+       select X86_PIE
+       select X86_MODULE_PLTS if MODULES
+       default n
+       ---help---
+         Build the kernel as a Position Independent Executable (PIE) and
+         increase the available randomization range from 1GB to 3GB.
+
+         This option impacts performance on kernel CPU intensive workloads up
+         to 10% due to PIE generated code. Impact on user-mode processes and
+         typical usage would be significantly less (0.50% when you build the
+         kernel).
+
+         The kernel and modules will generate slightly more assembly (1 to 2%
+         increase on the .text sections). The vmlinux binary will be
+         significantly smaller due to less relocations.

To put 10% kernel overhead into perspective: enabling this option wipes out about 
5-10 years worth of painstaking optimizations we've done to keep the kernel fast 
... (!!)

I think the fundamental flaw is the assumption that we need a PIE executable to 
have a freely relocatable kernel on 64-bit CPUs.

Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie 
-mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical 
x86-64 address space to randomize the location of kernel text. The location of 
modules can be further randomized within that 2GB window.
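
A minimal sketch of that idea (illustrative only; it assumes lo/hi are
2MB-aligned and uses get_random_long() as a stand-in for whatever
early-boot entropy source would really be used):

#include <linux/random.h>

#define WINDOW_SIZE     (2UL << 30)     /* 2GB window for kernel + modules */
#define WINDOW_ALIGN    (2UL << 20)     /* keep the usual 2MB alignment */

/* Pick a random, 2MB-aligned base for the window inside [lo, hi). */
static unsigned long pick_text_window(unsigned long lo, unsigned long hi)
{
        unsigned long slots = (hi - lo - WINDOW_SIZE) / WINDOW_ALIGN;

        return lo + (get_random_long() % slots) * WINDOW_ALIGN;
}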

It should have far less performance impact than the register-losing and 
overhead-inducing -fpie / -mcmodel=large (for modules) execution models.

My quick guess is that the performance impact might be close to zero in fact.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-11 15:09   ` Thomas Garnier
@ 2017-08-15  7:56     ` Ingo Molnar
  2017-08-15  7:56     ` Ingo Molnar
  1 sibling, 0 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-08-15  7:56 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley


* Thomas Garnier <thgarnie@google.com> wrote:

> > Do these changes get us closer to being able to build the kernel as truly 
> > position independent, i.e. to place it anywhere in the valid x86-64 address 
> > space? Or any other advantages?
> 
> Yes, PIE allows us to put the kernel anywhere in memory. It will allow us to 
> have a full randomized address space where position and order of sections are 
> completely random. There is still some work to get there but being able to build 
> a PIE kernel is a significant step.

So I _really_ dislike the whole PIE approach, because of the huge slowdown:

+config RANDOMIZE_BASE_LARGE
+       bool "Increase the randomization range of the kernel image"
+       depends on X86_64 && RANDOMIZE_BASE
+       select X86_PIE
+       select X86_MODULE_PLTS if MODULES
+       default n
+       ---help---
+         Build the kernel as a Position Independent Executable (PIE) and
+         increase the available randomization range from 1GB to 3GB.
+
+         This option impacts performance on kernel CPU intensive workloads up
+         to 10% due to PIE generated code. Impact on user-mode processes and
+         typical usage would be significantly less (0.50% when you build the
+         kernel).
+
+         The kernel and modules will generate slightly more assembly (1 to 2%
+         increase on the .text sections). The vmlinux binary will be
+         significantly smaller due to less relocations.

To put 10% kernel overhead into perspective: enabling this option wipes out about 
5-10 years worth of painstaking optimizations we've done to keep the kernel fast 
... (!!)

I think the fundamental flaw is the assumption that we need a PIE executable to 
have a freely relocatable kernel on 64-bit CPUs.

Have you considered a kernel with -mcmodel=small (or medium) instead of -fpie 
-mcmodel=large? We can pick a random 2GB window in the (non-kernel) canonical 
x86-64 address space to randomize the location of kernel text. The location of 
modules can be further randomized within that 2GB window.

It should have far less performance impact than the register-losing and 
overhead-inducing -fpie / -mcmodel=large (for modules) execution models.

My quick guess is that the performance impact might be close to zero in fact.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-11 12:41 ` Ingo Molnar
  2017-08-11 15:09   ` Thomas Garnier
@ 2017-08-11 15:09   ` Thomas Garnier
  2017-08-15  7:56     ` Ingo Molnar
  2017-08-15  7:56     ` Ingo Molnar
  1 sibling, 2 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-11 15:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo

On Fri, Aug 11, 2017 at 5:41 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> Changes:
>>  - v2:
>>    - Add support for global stack cookie while compiler default to fs without
>>      mcmodel=kernel
>>    - Change patch 7 to correctly jump out of the identity mapping on kexec load
>>      preserve.
>>
>> These patches make the changes necessary to build the kernel as Position
>> Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
>> the top 2G of the virtual address space. It allows to optionally extend the
>> KASLR randomization range from 1G to 3G.
>
> So this:
>
>  61 files changed, 923 insertions(+), 299 deletions(-)
>
> ... is IMHO an _awful_ lot of churn and extra complexity in pretty fragile pieces
> of code, to gain what appears to be only ~1.5 more bits of randomization!

The range increase is a way to use PIE right away.

>
> Do these changes get us closer to being able to build the kernel as truly position
> independent, i.e. to place it anywhere in the valid x86-64 address space? Or any
> other advantages?

Yes, PIE allows us to put the kernel anywhere in memory. It will allow
us to have a fully randomized address space where the position and order
of sections are completely random. There is still some work to get there,
but being able to build a PIE kernel is a significant step.

>
> Thanks,
>
>         Ingo



-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-11 12:41 ` Ingo Molnar
@ 2017-08-11 15:09   ` Thomas Garnier
  2017-08-11 15:09   ` Thomas Garnier
  1 sibling, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-11 15:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm list,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, Kernel Hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, the arch/x86 maintainers, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Peter Foley

On Fri, Aug 11, 2017 at 5:41 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Thomas Garnier <thgarnie@google.com> wrote:
>
>> Changes:
>>  - v2:
>>    - Add support for global stack cookie while compiler default to fs without
>>      mcmodel=kernel
>>    - Change patch 7 to correctly jump out of the identity mapping on kexec load
>>      preserve.
>>
>> These patches make the changes necessary to build the kernel as Position
>> Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
>> the top 2G of the virtual address space. It allows to optionally extend the
>> KASLR randomization range from 1G to 3G.
>
> So this:
>
>  61 files changed, 923 insertions(+), 299 deletions(-)
>
> ... is IMHO an _awful_ lot of churn and extra complexity in pretty fragile pieces
> of code, to gain what appears to be only ~1.5 more bits of randomization!

The range increase is a way to use PIE right away.

>
> Do these changes get us closer to being able to build the kernel as truly position
> independent, i.e. to place it anywhere in the valid x86-64 address space? Or any
> other advantages?

Yes, PIE allows us to put the kernel anywhere in memory. It will allow
us to have a fully randomized address space where the position and order
of sections are completely random. There is still some work to get there,
but being able to build a PIE kernel is a significant step.

>
> Thanks,
>
>         Ingo



-- 
Thomas

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-10 17:25 Thomas Garnier
@ 2017-08-11 12:41 ` Ingo Molnar
  2017-08-11 15:09   ` Thomas Garnier
  2017-08-11 15:09   ` Thomas Garnier
  2017-08-11 12:41 ` Ingo Molnar
  1 sibling, 2 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-08-11 12:41 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek, Tejun Heo


* Thomas Garnier <thgarnie@google.com> wrote:

> Changes:
>  - v2:
>    - Add support for global stack cookie while compiler default to fs without
>      mcmodel=kernel
>    - Change patch 7 to correctly jump out of the identity mapping on kexec load
>      preserve.
> 
> These patches make the changes necessary to build the kernel as Position
> Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
> the top 2G of the virtual address space. It allows to optionally extend the
> KASLR randomization range from 1G to 3G.

So this:

 61 files changed, 923 insertions(+), 299 deletions(-)

... is IMHO an _awful_ lot of churn and extra complexity in pretty fragile pieces 
of code, to gain what appears to be only ~1.5 more bits of randomization!
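
For reference, the ~1.5 bits figure follows directly from the range sizes
(assuming the usual 2MB randomization alignment; the delta is the same for
any alignment):

  1G range / 2M alignment =  512 slots -> log2(512)  =  9.00 bits
  3G range / 2M alignment = 1536 slots -> log2(1536) ~ 10.58 bits
                                          difference ~  1.58 bits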

Do these changes get us closer to being able to build the kernel as truly position 
independent, i.e. to place it anywhere in the valid x86-64 address space? Or any 
other advantages?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-08-10 17:25 Thomas Garnier
  2017-08-11 12:41 ` Ingo Molnar
@ 2017-08-11 12:41 ` Ingo Molnar
  1 sibling, 0 replies; 106+ messages in thread
From: Ingo Molnar @ 2017-08-11 12:41 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Nicolas Pitre, Peter Zijlstra, Michal Hocko, kvm,
	Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li,
	Alexei Starovoitov, David Howells, Paul Gortmaker, Pavel Machek,
	H . Peter Anvin, kernel-hardening, Christoph Lameter,
	Thomas Gleixner, Kees Cook, x86, Herbert Xu, Daniel Borkmann,
	Matthew Wilcox, Peter Foley, Joerg Roedel


* Thomas Garnier <thgarnie@google.com> wrote:

> Changes:
>  - v2:
>    - Add support for global stack cookie while compiler default to fs without
>      mcmodel=kernel
>    - Change patch 7 to correctly jump out of the identity mapping on kexec load
>      preserve.
> 
> These patches make the changes necessary to build the kernel as Position
> Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
> the top 2G of the virtual address space. It allows to optionally extend the
> KASLR randomization range from 1G to 3G.

So this:

 61 files changed, 923 insertions(+), 299 deletions(-)

... is IMHO an _awful_ lot of churn and extra complexity in pretty fragile pieces 
of code, to gain what appears to be only ~1.5 more bits of randomization!

Do these changes get us closer to being able to build the kernel as truly position 
independent, i.e. to place it anywhere in the valid x86-64 address space? Or any 
other advantages?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 106+ messages in thread

* x86: PIE support and option to extend KASLR randomization
@ 2017-08-10 17:25 Thomas Garnier
  2017-08-11 12:41 ` Ingo Molnar
  2017-08-11 12:41 ` Ingo Molnar
  0 siblings, 2 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-10 17:25 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Thomas Garnier, Matthias Kaehlcke, Boris Ostrovsky,
	Juergen Gross, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown
  Cc: x86, linux-crypto, linux-kernel, xen-devel, kvm, linux-pm,
	linux-arch, linux-sparse, kernel-hardening

Changes:
 - v2:
   - Add support for a global stack cookie while the compiler defaults to fs
     without mcmodel=kernel (a sketch of the idea follows this list)
   - Change patch 7 to correctly jump out of the identity mapping on kexec load
     preserve.
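
A minimal sketch of the global-cookie idea (assumptions: the compiler is
told to reference a single __stack_chk_guard symbol instead of the default
segment-relative slot, and the init helper name is purely illustrative):

#include <linux/random.h>

/* One shared canary, referenced directly by compiler-generated
 * prologues/epilogues instead of a segment-relative per-task slot. */
unsigned long __stack_chk_guard;

static void __init init_global_stack_cookie(void)
{
        __stack_chk_guard = get_random_long();
}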

These patches make the changes necessary to build the kernel as a Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space. This allows optionally extending
the KASLR randomization range from 1G to 3G.

Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler
changes, PIE support and KASLR in general.

The patches:
 - 1-3, 5-15: Changes in assembly code to be PIE compliant.
 - 4: Add a new _ASM_GET_PTR macro to fetch a symbol address generically.
 - 16: Adapt percpu design to work correctly when PIE is enabled.
 - 17: Provide an option to default visibility to hidden except for key symbols.
       It removes errors between compilation units.
 - 18: Adapt relocation tool to handle PIE binary correctly.
 - 19: Add support for a global stack cookie
 - 20: Add the CONFIG_X86_PIE option (off by default)
 - 21: Adapt relocation tool to generate a 64-bit relocation table.
 - 22: Add options to build modules as mcmodel=large and dynamically create a
       PLT for relative references out of range (adapted from arm64; see the
       sketch after this list).
 - 23: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
       from 1G to 3G (off by default).
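
To make the patch-22 mechanism concrete, here is a sketch of what an
x86-64 module PLT entry could look like (illustrative layout; the struct
and field names are not the ones used in the patch). A relative call or
jump whose target is out of +/-2GB range is redirected to such a stub,
which jumps through a full 64-bit literal:

#include <linux/types.h>

/* Stub emitted into a module section; a PC-relative relocation that
 * cannot reach its real target is pointed at one of these instead. */
struct plt_entry {
        u8  jmp[6];     /* ff 25 00 00 00 00: jmpq *0(%rip) -> 'target' */
        u64 target;     /* full 64-bit destination address */
} __packed;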

Performance/Size impact:

Hackbench (50% and 1600% loads):
 - PIE disabled: no significant change (-0.50% / +0.50%)
 - PIE enabled: 7% to 8% on half load, 10% on heavy load.

These results are in line with prior research on user-mode PIE impact on
CPU-intensive benchmarks (around 10% on x86_64).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-1% / +1%)
 - PIE enabled: 3% to 4%

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (-0.22% / +0.06%)
 - PIE enabled: around 0.50%
 System Time:
 - PIE disabled: no significant change (-0.99% / -1.28%)
 - PIE enabled: 5% to 6%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: 472928672 bytes (-0.000169% from baseline)
 - PIE enabled: 216878461 bytes (-54.14% from baseline)
 .text sections:
 - PIE disabled: 9373572 bytes (+0.04% from baseline)
 - PIE enabled: 9499138 bytes (+1.38% from baseline)

The big decrease in vmlinux file size is due to the lower number of
relocations appended to the file.

diffstat:
 arch/x86/Kconfig                             |   42 +++++
 arch/x86/Makefile                            |   28 +++
 arch/x86/boot/boot.h                         |    2 
 arch/x86/boot/compressed/Makefile            |    5 
 arch/x86/boot/compressed/misc.c              |   10 +
 arch/x86/crypto/aes-x86_64-asm_64.S          |   45 +++---
 arch/x86/crypto/aesni-intel_asm.S            |   14 +
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |    6 
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |   42 ++---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |   44 +++---
 arch/x86/crypto/camellia-x86_64-asm_64.S     |    8 -
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    |   50 +++---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    |   44 +++---
 arch/x86/crypto/des3_ede-asm_64.S            |   96 ++++++++-----
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |    4 
 arch/x86/crypto/glue_helper-asm-avx.S        |    4 
 arch/x86/crypto/glue_helper-asm-avx2.S       |    6 
 arch/x86/entry/entry_32.S                    |    3 
 arch/x86/entry/entry_64.S                    |   29 ++-
 arch/x86/include/asm/asm.h                   |   13 +
 arch/x86/include/asm/bug.h                   |    2 
 arch/x86/include/asm/jump_label.h            |    8 -
 arch/x86/include/asm/kvm_host.h              |    6 
 arch/x86/include/asm/module.h                |   17 ++
 arch/x86/include/asm/page_64_types.h         |    9 +
 arch/x86/include/asm/paravirt_types.h        |   12 +
 arch/x86/include/asm/percpu.h                |   25 ++-
 arch/x86/include/asm/pm-trace.h              |    2 
 arch/x86/include/asm/processor.h             |   11 -
 arch/x86/include/asm/setup.h                 |    2 
 arch/x86/include/asm/stackprotector.h        |   19 +-
 arch/x86/kernel/Makefile                     |    2 
 arch/x86/kernel/acpi/wakeup_64.S             |   31 ++--
 arch/x86/kernel/asm-offsets.c                |    3 
 arch/x86/kernel/asm-offsets_32.c             |    3 
 arch/x86/kernel/asm-offsets_64.c             |    3 
 arch/x86/kernel/cpu/common.c                 |    7 
 arch/x86/kernel/head64.c                     |   30 +++-
 arch/x86/kernel/head_32.S                    |    3 
 arch/x86/kernel/head_64.S                    |   46 +++++-
 arch/x86/kernel/kvm.c                        |    6 
 arch/x86/kernel/module-plts.c                |  198 +++++++++++++++++++++++++++
 arch/x86/kernel/module.c                     |   18 +-
 arch/x86/kernel/module.lds                   |    4 
 arch/x86/kernel/process.c                    |    5 
 arch/x86/kernel/relocate_kernel_64.S         |    8 -
 arch/x86/kernel/setup_percpu.c               |    2 
 arch/x86/kernel/vmlinux.lds.S                |   13 +
 arch/x86/kvm/svm.c                           |    4 
 arch/x86/lib/cmpxchg16b_emu.S                |    8 -
 arch/x86/power/hibernate_asm_64.S            |    4 
 arch/x86/tools/relocs.c                      |  134 +++++++++++++++---
 arch/x86/tools/relocs.h                      |    4 
 arch/x86/tools/relocs_common.c               |   15 +-
 arch/x86/xen/xen-asm.S                       |   12 -
 arch/x86/xen/xen-asm.h                       |    3 
 arch/x86/xen/xen-head.S                      |    9 -
 include/asm-generic/sections.h               |    6 
 include/linux/compiler.h                     |    8 +
 init/Kconfig                                 |    9 +
 kernel/kallsyms.c                            |   16 +-
 61 files changed, 923 insertions(+), 299 deletions(-)

^ permalink raw reply	[flat|nested] 106+ messages in thread

* x86: PIE support and option to extend KASLR randomization
@ 2017-08-10 17:25 Thomas Garnier
  0 siblings, 0 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-08-10 17:25 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Thomas Garnier, Matthias Kaehlcke, Boris Ostrovsky,
	Juergen Gross, Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Tom Lendacky, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Kirill A . Shutemov, Rafael J . Wysocki, Len Brown,
	Pavel Machek
  Cc: linux-arch, kvm, linux-pm, x86, linux-kernel, linux-sparse,
	linux-crypto, kernel-hardening, xen-devel

Changes:
 - v2:
   - Add support for a global stack cookie while the compiler defaults to fs
     without mcmodel=kernel
   - Change patch 7 to correctly jump out of the identity mapping on kexec load
     preserve.

These patches make the changes necessary to build the kernel as a Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space. This allows optionally extending
the KASLR randomization range from 1G to 3G.

Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler
changes, PIE support and KASLR in general.

The patches:
 - 1-3, 5-15: Changes in assembly code to be PIE compliant.
 - 4: Add a new _ASM_GET_PTR macro to fetch a symbol address generically.
 - 16: Adapt percpu design to work correctly when PIE is enabled.
 - 17: Provide an option to default visibility to hidden except for key symbols.
       It removes errors between compilation units.
 - 18: Adapt relocation tool to handle PIE binary correctly.
 - 19: Add support for a global stack cookie
 - 20: Add the CONFIG_X86_PIE option (off by default)
 - 21: Adapt relocation tool to generate a 64-bit relocation table.
 - 22: Add options to build modules as mcmodel=large and dynamically create a
       PLT for relative references out of range (adapted from arm64).
 - 23: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
       from 1G to 3G (off by default).

Performance/Size impact:

Hackbench (50% and 1600% loads):
 - PIE disabled: no significant change (-0.50% / +0.50%)
 - PIE enabled: 7% to 8% on half load, 10% on heavy load.

These results are in line with prior research on user-mode PIE impact on
CPU-intensive benchmarks (around 10% on x86_64).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-1% / +1%)
 - PIE enabled: 3% to 4%

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (-0.22% / +0.06%)
 - PIE enabled: around 0.50%
 System Time:
 - PIE disabled: no significant change (-0.99% / -1.28%)
 - PIE enabled: 5% to 6%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: 472928672 bytes (-0.000169% from baseline)
 - PIE enabled: 216878461 bytes (-54.14% from baseline)
 .text sections:
 - PIE disabled: 9373572 bytes (+0.04% from baseline)
 - PIE enabled: 9499138 bytes (+1.38% from baseline)

The big decrease in vmlinux file size is due to the lower number of
relocations appended to the file.

diffstat:
 arch/x86/Kconfig                             |   42 +++++
 arch/x86/Makefile                            |   28 +++
 arch/x86/boot/boot.h                         |    2 
 arch/x86/boot/compressed/Makefile            |    5 
 arch/x86/boot/compressed/misc.c              |   10 +
 arch/x86/crypto/aes-x86_64-asm_64.S          |   45 +++---
 arch/x86/crypto/aesni-intel_asm.S            |   14 +
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |    6 
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |   42 ++---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |   44 +++---
 arch/x86/crypto/camellia-x86_64-asm_64.S     |    8 -
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    |   50 +++---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    |   44 +++---
 arch/x86/crypto/des3_ede-asm_64.S            |   96 ++++++++-----
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |    4 
 arch/x86/crypto/glue_helper-asm-avx.S        |    4 
 arch/x86/crypto/glue_helper-asm-avx2.S       |    6 
 arch/x86/entry/entry_32.S                    |    3 
 arch/x86/entry/entry_64.S                    |   29 ++-
 arch/x86/include/asm/asm.h                   |   13 +
 arch/x86/include/asm/bug.h                   |    2 
 arch/x86/include/asm/jump_label.h            |    8 -
 arch/x86/include/asm/kvm_host.h              |    6 
 arch/x86/include/asm/module.h                |   17 ++
 arch/x86/include/asm/page_64_types.h         |    9 +
 arch/x86/include/asm/paravirt_types.h        |   12 +
 arch/x86/include/asm/percpu.h                |   25 ++-
 arch/x86/include/asm/pm-trace.h              |    2 
 arch/x86/include/asm/processor.h             |   11 -
 arch/x86/include/asm/setup.h                 |    2 
 arch/x86/include/asm/stackprotector.h        |   19 +-
 arch/x86/kernel/Makefile                     |    2 
 arch/x86/kernel/acpi/wakeup_64.S             |   31 ++--
 arch/x86/kernel/asm-offsets.c                |    3 
 arch/x86/kernel/asm-offsets_32.c             |    3 
 arch/x86/kernel/asm-offsets_64.c             |    3 
 arch/x86/kernel/cpu/common.c                 |    7 
 arch/x86/kernel/head64.c                     |   30 +++-
 arch/x86/kernel/head_32.S                    |    3 
 arch/x86/kernel/head_64.S                    |   46 +++++-
 arch/x86/kernel/kvm.c                        |    6 
 arch/x86/kernel/module-plts.c                |  198 +++++++++++++++++++++++++++
 arch/x86/kernel/module.c                     |   18 +-
 arch/x86/kernel/module.lds                   |    4 
 arch/x86/kernel/process.c                    |    5 
 arch/x86/kernel/relocate_kernel_64.S         |    8 -
 arch/x86/kernel/setup_percpu.c               |    2 
 arch/x86/kernel/vmlinux.lds.S                |   13 +
 arch/x86/kvm/svm.c                           |    4 
 arch/x86/lib/cmpxchg16b_emu.S                |    8 -
 arch/x86/power/hibernate_asm_64.S            |    4 
 arch/x86/tools/relocs.c                      |  134 +++++++++++++++---
 arch/x86/tools/relocs.h                      |    4 
 arch/x86/tools/relocs_common.c               |   15 +-
 arch/x86/xen/xen-asm.S                       |   12 -
 arch/x86/xen/xen-asm.h                       |    3 
 arch/x86/xen/xen-head.S                      |    9 -
 include/asm-generic/sections.h               |    6 
 include/linux/compiler.h                     |    8 +
 init/Kconfig                                 |    9 +
 kernel/kallsyms.c                            |   16 +-
 61 files changed, 923 insertions(+), 299 deletions(-)


^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-07-18 22:33 Thomas Garnier
@ 2017-07-19 14:08 ` Christopher Lameter
  2017-07-19 14:08 ` Christopher Lameter
  1 sibling, 0 replies; 106+ messages in thread
From: Christopher Lameter @ 2017-07-19 14:08 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Arnd Bergmann,
	Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross, Paolo Bonzini,
	Radim Krčmář,
	Joerg Roedel, Andy Lutomirski, Borislav Petkov,
	Kirill A . Shutemov, Brian Gerst, Borislav Petkov,
	Christian Borntraeger, Rafael J . Wysocki

On Tue, 18 Jul 2017, Thomas Garnier wrote:

> Performance/Size impact:
> Hackbench (50% and 1600% loads):
>  - PIE enabled: 7% to 8% on half load, 10% on heavy load.
> slab_test (average of 10 runs):
>  - PIE enabled: 3% to 4%
> Kernbench (average of 10 Half and Optimal runs):
>  - PIE enabled: 5% to 6%
>
> Size of vmlinux (Ubuntu configuration):
>  File size:
>  - PIE disabled: 472928672 bytes (-0.000169% from baseline)
>  - PIE enabled: 216878461 bytes (-54.14% from baseline)

Maybe we need something like CONFIG_PARANOIA so that we can determine at
build time how much performance we want to sacrifice for security?

It's going to be difficult to understand what all these hardening config
options do.

^ permalink raw reply	[flat|nested] 106+ messages in thread

* Re: x86: PIE support and option to extend KASLR randomization
  2017-07-18 22:33 Thomas Garnier
  2017-07-19 14:08 ` Christopher Lameter
@ 2017-07-19 14:08 ` Christopher Lameter
  1 sibling, 0 replies; 106+ messages in thread
From: Christopher Lameter @ 2017-07-19 14:08 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Michal Hocko, kvm, Radim Krčmář,
	Peter Zijlstra, Catalin Marinas, Christopher Li, x86,
	Paul Gortmaker, Pavel Machek, H . Peter Anvin, kernel-hardening,
	Thomas Gleixner, Chris Metcalf, linux-arch, Herbert Xu,
	Daniel Borkmann, Matthew Wilcox, Joerg Roedel, Peter Foley,
	Christian Borntraeger, linux-sparse, Matthias Kaehlcke,
	xen-devel, Borislav Petkov

On Tue, 18 Jul 2017, Thomas Garnier wrote:

> Performance/Size impact:
> Hackbench (50% and 1600% loads):
>  - PIE enabled: 7% to 8% on half load, 10% on heavy load.
> slab_test (average of 10 runs):
>  - PIE enabled: 3% to 4%
> Kernbench (average of 10 Half and Optimal runs):
>  - PIE enabled: 5% to 6%
>
> Size of vmlinux (Ubuntu configuration):
>  File size:
>  - PIE disabled: 472928672 bytes (-0.000169% from baseline)
>  - PIE enabled: 216878461 bytes (-54.14% from baseline)

Maybe we need something like CONFIG_PARANOIA so that we can determine at
build time how much performance we want to sacrifice for security?

It's going to be difficult to understand what all these hardening config
options do.


^ permalink raw reply	[flat|nested] 106+ messages in thread

* x86: PIE support and option to extend KASLR randomization
@ 2017-07-18 22:33 Thomas Garnier
  2017-07-19 14:08 ` Christopher Lameter
  2017-07-19 14:08 ` Christopher Lameter
  0 siblings, 2 replies; 106+ messages in thread
From: Thomas Garnier @ 2017-07-18 22:33 UTC (permalink / raw)
  To: Herbert Xu, David S . Miller, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Peter Zijlstra, Josh Poimboeuf, Thomas Garnier,
	Arnd Bergmann, Matthias Kaehlcke, Boris Ostrovsky, Juergen Gross,
	Paolo Bonzini, Radim Krčmář,
	Joerg Roedel, Andy Lutomirski, Borislav Petkov,
	Kirill A . Shutemov, Brian Gerst, Borislav Petkov,
	Christian Borntraeger, Rafael J . Wysocki
  Cc: x86, linux-crypto, linux-kernel, xen-devel, kvm, linux-pm,
	linux-arch, linux-sparse, kernel-hardening

These patches make the changes necessary to build the kernel as a Position
Independent Executable (PIE) on x86_64. A PIE kernel can be relocated below
the top 2G of the virtual address space. This allows optionally extending
the KASLR randomization range from 1G to 3G.

Thanks a lot to Ard Biesheuvel & Kees Cook for their feedback on compiler
changes, PIE support and KASLR in general.

The patches:
 - 1-3, 5-15: Changes in assembly code to be PIE compliant.
 - 4: Add a new _ASM_GET_PTR macro to fetch a symbol address generically.
 - 16: Adapt percpu design to work correctly when PIE is enabled.
 - 17: Provide an option to default visibility to hidden except for key symbols.
       It removes errors between compilation units.
 - 18: Adapt relocation tool to handle PIE binary correctly.
 - 19: Add the CONFIG_X86_PIE option (off by default)
 - 20: Adapt relocation tool to generate a 64-bit relocation table.
 - 21: Add options to build modules as mcmodel=large and dynamically create a
       PLT for relative references out of range (adapted from arm64).
 - 22: Add the CONFIG_RANDOMIZE_BASE_LARGE option to increase relocation range
       from 1G to 3G (off by default).

Performance/Size impact:

Hackbench (50% and 1600% loads):
 - PIE disabled: no significant change (-0.50% / +0.50%)
 - PIE enabled: 7% to 8% on half load, 10% on heavy load.

These results are in line with prior research on user-mode PIE impact on
CPU-intensive benchmarks (around 10% on x86_64).

slab_test (average of 10 runs):
 - PIE disabled: no significant change (-1% / +1%)
 - PIE enabled: 3% to 4%

Kernbench (average of 10 Half and Optimal runs):
 Elapsed Time:
 - PIE disabled: no significant change (-0.22% / +0.06%)
 - PIE enabled: around 0.50%
 System Time:
 - PIE disabled: no significant change (-0.99% / -1.28%)
 - PIE enabled: 5% to 6%

Size of vmlinux (Ubuntu configuration):
 File size:
 - PIE disabled: 472928672 bytes (-0.000169% from baseline)
 - PIE enabled: 216878461 bytes (-54.14% from baseline)
 .text sections:
 - PIE disabled: 9373572 bytes (+0.04% from baseline)
 - PIE enabled: 9499138 bytes (+1.38% from baseline)

The big decrease in vmlinux file size is due to the lower number of
relocations appended to the file.

diffstat:
 arch/x86/Kconfig                             |   37 +++++
 arch/x86/Makefile                            |   17 ++
 arch/x86/boot/boot.h                         |    2 
 arch/x86/boot/compressed/Makefile            |    5 
 arch/x86/boot/compressed/misc.c              |   10 +
 arch/x86/crypto/aes-x86_64-asm_64.S          |   45 +++---
 arch/x86/crypto/aesni-intel_asm.S            |   14 +
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |    6 
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  |   42 ++---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S |   44 +++---
 arch/x86/crypto/camellia-x86_64-asm_64.S     |    8 -
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    |   50 +++---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    |   44 +++---
 arch/x86/crypto/des3_ede-asm_64.S            |   96 ++++++++-----
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |    4 
 arch/x86/crypto/glue_helper-asm-avx.S        |    4 
 arch/x86/crypto/glue_helper-asm-avx2.S       |    6 
 arch/x86/entry/entry_64.S                    |   26 ++-
 arch/x86/include/asm/asm.h                   |   13 +
 arch/x86/include/asm/bug.h                   |    2 
 arch/x86/include/asm/jump_label.h            |    8 -
 arch/x86/include/asm/kvm_host.h              |    6 
 arch/x86/include/asm/module.h                |   16 ++
 arch/x86/include/asm/page_64_types.h         |    9 +
 arch/x86/include/asm/paravirt_types.h        |   12 +
 arch/x86/include/asm/percpu.h                |   25 ++-
 arch/x86/include/asm/pm-trace.h              |    2 
 arch/x86/include/asm/processor.h             |    8 -
 arch/x86/include/asm/setup.h                 |    2 
 arch/x86/kernel/Makefile                     |    2 
 arch/x86/kernel/acpi/wakeup_64.S             |   31 ++--
 arch/x86/kernel/cpu/common.c                 |    4 
 arch/x86/kernel/head64.c                     |   28 +++
 arch/x86/kernel/head_64.S                    |   47 +++++-
 arch/x86/kernel/kvm.c                        |    6 
 arch/x86/kernel/module-plts.c                |  198 +++++++++++++++++++++++++++
 arch/x86/kernel/module.c                     |   18 +-
 arch/x86/kernel/module.lds                   |    4 
 arch/x86/kernel/relocate_kernel_64.S         |    2 
 arch/x86/kernel/setup_percpu.c               |    2 
 arch/x86/kernel/vmlinux.lds.S                |   13 +
 arch/x86/kvm/svm.c                           |    4 
 arch/x86/lib/cmpxchg16b_emu.S                |    8 -
 arch/x86/power/hibernate_asm_64.S            |    4 
 arch/x86/tools/relocs.c                      |  134 +++++++++++++++---
 arch/x86/tools/relocs.h                      |    4 
 arch/x86/tools/relocs_common.c               |   15 +-
 arch/x86/xen/xen-asm.S                       |   12 -
 arch/x86/xen/xen-asm.h                       |    3 
 arch/x86/xen/xen-head.S                      |    9 -
 include/asm-generic/sections.h               |    6 
 include/linux/compiler.h                     |    8 +
 init/Kconfig                                 |    9 +
 kernel/kallsyms.c                            |   16 +-
 54 files changed, 868 insertions(+), 282 deletions(-)

^ permalink raw reply	[flat|nested] 106+ messages in thread

end of thread, other threads:[~2017-10-20  8:13 UTC | newest]

Thread overview: 106+ messages
2017-07-18 22:33 x86: PIE support and option to extend KASLR randomization Thomas Garnier
2017-07-18 22:33 Thomas Garnier
2017-07-19 14:08 ` Christopher Lameter
2017-07-19 14:08 ` Christopher Lameter
2017-08-10 17:25 Thomas Garnier
2017-08-10 17:25 Thomas Garnier
2017-08-11 12:41 ` Ingo Molnar
2017-08-11 15:09   ` Thomas Garnier
2017-08-11 15:09   ` Thomas Garnier
2017-08-15  7:56     ` Ingo Molnar
2017-08-15  7:56     ` Ingo Molnar
2017-08-15 14:20       ` Thomas Garnier
2017-08-15 14:20       ` Thomas Garnier
2017-08-15 14:47         ` Daniel Micay
2017-08-15 14:47         ` Daniel Micay
2017-08-15 14:58           ` Thomas Garnier
2017-08-15 14:58           ` Thomas Garnier
2017-08-16 15:12         ` Ingo Molnar
2017-08-16 15:12         ` Ingo Molnar
2017-08-16 16:09           ` Christopher Lameter
2017-08-16 16:09           ` Christopher Lameter
2017-08-16 16:26           ` Daniel Micay
2017-08-16 16:32             ` Ard Biesheuvel
2017-08-16 16:32             ` Ard Biesheuvel
2017-08-16 16:26           ` Daniel Micay
2017-08-16 16:57           ` Thomas Garnier
2017-08-16 16:57           ` Thomas Garnier
2017-08-17  8:09             ` Ingo Molnar
2017-08-17  8:09             ` Ingo Molnar
2017-08-17 14:10               ` Thomas Garnier
2017-08-17 14:10               ` Thomas Garnier
2017-08-24 21:13                 ` Thomas Garnier
2017-08-24 21:13                 ` Thomas Garnier
2017-08-24 21:42                   ` Linus Torvalds
2017-08-24 21:42                   ` Linus Torvalds
2017-08-25 15:35                     ` Thomas Garnier
2017-08-25 15:35                     ` Thomas Garnier
2017-08-25  1:07                   ` Steven Rostedt
2017-08-25  8:04                   ` Ingo Molnar
2017-08-25  8:04                   ` Ingo Molnar
2017-08-25 15:05                     ` Thomas Garnier
2017-08-25 15:05                     ` Thomas Garnier
2017-08-29 19:34                       ` Thomas Garnier
2017-09-21 15:59                         ` Ingo Molnar
2017-09-21 16:10                           ` Ard Biesheuvel
2017-09-21 16:10                           ` Ard Biesheuvel
2017-09-21 21:21                             ` Thomas Garnier
2017-09-21 21:21                             ` Thomas Garnier
2017-09-22  4:24                               ` Markus Trippelsdorf
2017-09-22 14:38                                 ` Thomas Garnier
2017-09-22 14:38                                 ` Thomas Garnier
2017-09-22 23:55                               ` Thomas Garnier
2017-09-22 23:55                               ` Thomas Garnier
2017-09-21 21:16                           ` Thomas Garnier
2017-09-22  0:06                             ` Thomas Garnier
2017-09-22  0:06                             ` Thomas Garnier
2017-09-22 16:32                             ` Ingo Molnar
2017-09-22 18:08                               ` Thomas Garnier
2017-09-22 18:08                               ` Thomas Garnier
2017-09-23  9:43                                 ` Ingo Molnar
2017-10-02 20:28                                   ` Thomas Garnier
2017-10-02 20:28                                   ` Thomas Garnier
2017-09-23  9:43                                 ` Ingo Molnar
2017-09-22 18:38                               ` H. Peter Anvin
2017-09-22 18:57                                 ` Kees Cook
2017-09-22 19:06                                   ` H. Peter Anvin
2017-09-22 22:19                                     ` hjl.tools
2017-09-22 22:30                                     ` hjl.tools
2017-09-22 19:06                                   ` H. Peter Anvin
2017-09-22 18:57                                 ` Kees Cook
2017-09-22 18:59                                 ` Thomas Garnier
2017-09-22 18:59                                 ` Thomas Garnier
2017-09-23  9:49                                 ` Ingo Molnar
2017-09-23  9:49                                 ` Ingo Molnar
2017-09-22 18:38                               ` H. Peter Anvin
2017-09-22 16:32                             ` Ingo Molnar
2017-09-21 21:16                           ` Thomas Garnier
2017-09-21 15:59                         ` Ingo Molnar
2017-08-29 19:34                       ` Thomas Garnier
2017-08-21 13:32           ` Peter Zijlstra
2017-08-21 14:28             ` Peter Zijlstra
2017-08-21 14:28             ` Peter Zijlstra
2017-09-22 18:27               ` H. Peter Anvin
2017-09-23 10:00                 ` Ingo Molnar
2017-09-24 22:37                   ` Pavel Machek
2017-09-25  7:33                     ` Ingo Molnar
2017-09-25  7:33                     ` Ingo Molnar
2017-10-06 10:39                       ` Pavel Machek
2017-10-06 10:39                       ` Pavel Machek
2017-10-20  8:13                         ` Ingo Molnar
2017-10-20  8:13                         ` Ingo Molnar
2017-09-24 22:37                   ` Pavel Machek
2017-09-23 10:00                 ` Ingo Molnar
2017-09-22 18:27               ` H. Peter Anvin
2017-08-21 13:32           ` Peter Zijlstra
2017-08-21 14:31         ` Peter Zijlstra
2017-08-21 15:57           ` Thomas Garnier
2017-08-21 15:57           ` Thomas Garnier
2017-08-28  1:26           ` H. Peter Anvin
2017-08-28  1:26           ` H. Peter Anvin
2017-08-21 14:31         ` Peter Zijlstra
2017-08-11 12:41 ` Ingo Molnar
2017-10-04 21:19 Thomas Garnier
2017-10-04 21:19 ` Thomas Garnier
2017-10-04 21:19 Thomas Garnier via Virtualization
2017-10-04 21:19 Thomas Garnier
