* [PATCH v11 00/11] x86: PIE support to extend KASLR randomization
@ 2020-02-28  0:00 ` Thomas Garnier
  0 siblings, 0 replies; 39+ messages in thread
From: Thomas Garnier @ 2020-02-28  0:00 UTC (permalink / raw)
  To: kernel-hardening
  Cc: kristen, keescook, Herbert Xu, David S. Miller, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, x86,
	Andy Lutomirski, Juergen Gross, Thomas Hellstrom, VMware, Inc.,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Rasmus Villemoes,
	Peter Zijlstra, Thomas Garnier, Miguel Ojeda, Will Deacon,
	Ard Biesheuvel, Masami Hiramatsu, Jiri Slaby, Boris Ostrovsky,
	Josh Poimboeuf, Cao jin, Allison Randal, linux-crypto,
	linux-kernel, virtualization, linux-pm

Minor changes based on feedback and rebase from v10.

Splitting the previous series in two. This part contains the assembly code
changes required for PIE, without any direct dependencies on the rest of the
patchset.

Note: Using objtool to detect non-compliant PIE relocations is not yet
possible as this patchset only includes the simplest PIE changes.
Additional changes are needed in kvm, xen and percpu code.

Changes:
 - patch v11 (assembly):
   - Fix comments on x86/entry/64.
   - Remove the KASLR PIE explanation from all commits.
   - Add a note that objtool checking is not possible at this stage of the
     patchset.
 - patch v10 (assembly):
   - Swap rax for rdx on entry/64 changes based on feedback.
   - Addressed feedback from Borislav Petkov on boot, paravirt, alternatives
     and globally.
   - Rebased the patchset and ensured it works with large KASLR (not included).
 - patch v9 (assembly):
   - Moved to a relative reference for sync_core based on feedback.
   - x86/crypto had multiple algorithms deleted; removed the PIE changes to them.
   - Fix a typo at the end of a comment line.
 - patch v8 (assembly):
   - Fix issues in crypto changes (thanks to Eric Biggers).
   - Remove unnecessary jump table change.
   - Change author and signoff to chromium email address.
 - patch v7 (assembly):
   - Split patchset and reorder changes.
 - patch v6:
   - Rebase on latest changes in jump tables and crypto.
   - Fix wording on a couple of commits.
   - Revisit checkpatch warnings.
   - Move to @chromium.org.
 - patch v5:
   - Adapt new crypto modules for PIE.
   - Improve per-cpu commit message.
   - Fix xen 32-bit build error with .quad.
   - Remove extra code for ftrace.
 - patch v4:
   - Simplify early boot by removing global variables.
   - Modify the mcount location script for __mcount_loc instead of the address
     read in the ftrace implementation.
   - Edit commit description to explain better where the kernel can be located.
   - Streamlined the testing done on each patch proposal. Always testing
     hibernation, suspend, ftrace and kprobe to ensure no regressions.
 - patch v3:
   - Update commit messages to describe the longer-term PIE goal.
   - Minor change on ftrace if condition.
   - Changed code using xchgq.
 - patch v2:
   - Adapt patch to work post KPTI and compiler changes
   - Redo all performance testing with latest configs and compilers
   - Simplify mov macro on PIE (MOVABS now)
   - Reduce GOT footprint
 - patch v1:
   - Simplify ftrace implementation.
   - Use gcc -mstack-protector-guard-reg=%gs with PIE when possible.
 - rfc v3:
   - Use --emit-relocs instead of -pie to reduce dynamic relocation space on
     mapped memory. It also simplifies the relocation process.
   - Move the start of the module section next to the kernel. Remove the need
     for -mcmodel=large on modules. Extend the module space from 1 to 2 GB
     maximum.
   - Support for XEN PVH as 32-bit relocations can be ignored with
     --emit-relocs.
   - Support for GOT relocations previously done automatically with -pie.
   - Remove need for dynamic PLT in modules.
   - Support dynamic GOT for modules.
 - rfc v2:
   - Add support for a global stack cookie while the compiler defaults to fs
     without mcmodel=kernel.
   - Change patch 7 to correctly jump out of the identity mapping on kexec load
     preserve.

These patches make some of the changes necessary to build the kernel as a
Position Independent Executable (PIE) on x86_64. Another patchset will add
the PIE option and the larger architecture changes. PIE allows the kernel to
be placed below 0xffffffff80000000, increasing the range of KASLR.
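
Most of the series applies a single pattern: replace absolute symbol
references in assembly with RIP-relative ones. A minimal sketch of the
transformation (some_symbol is a hypothetical name, used only for
illustration):

	/* Not PIE compatible: absolute reference, assumes a fixed link address */
	movq	some_symbol, %rax

	/* PIE compatible: the symbol is encoded as an offset from the
	 * current instruction, valid wherever the kernel is placed */
	movq	some_symbol(%rip), %rax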

The patches:
 - 1, 3-11: Changes in assembly code to be PIE compliant.
 - 2: Add a new _ASM_MOVABS macro to fetch a symbol address generically.

diffstat:
 crypto/aegis128-aesni-asm.S         |    6 +-
 crypto/aesni-intel_asm.S            |    8 +--
 crypto/aesni-intel_avx-x86_64.S     |    3 -
 crypto/camellia-aesni-avx-asm_64.S  |   42 +++++++--------
 crypto/camellia-aesni-avx2-asm_64.S |   44 ++++++++--------
 crypto/camellia-x86_64-asm_64.S     |    8 +--
 crypto/cast5-avx-x86_64-asm_64.S    |   50 ++++++++++--------
 crypto/cast6-avx-x86_64-asm_64.S    |   44 +++++++++-------
 crypto/des3_ede-asm_64.S            |   96 ++++++++++++++++++++++++------------
 crypto/ghash-clmulni-intel_asm.S    |    4 -
 crypto/glue_helper-asm-avx.S        |    4 -
 crypto/glue_helper-asm-avx2.S       |    6 +-
 crypto/sha256-avx2-asm.S            |   18 ++++--
 entry/entry_64.S                    |   16 ++++--
 include/asm/alternative.h           |    6 +-
 include/asm/asm.h                   |    1 
 include/asm/bug.h                   |    2 
 include/asm/paravirt_types.h        |   32 ++++++++++--
 include/asm/pm-trace.h              |    2 
 include/asm/processor.h             |    6 +-
 kernel/acpi/wakeup_64.S             |   31 ++++++-----
 kernel/head_64.S                    |   15 +++--
 kernel/relocate_kernel_64.S         |    2 
 power/hibernate_asm_64.S            |    4 -
 24 files changed, 268 insertions(+), 182 deletions(-)

Patchset is based on next-20200227.




* [PATCH v11 01/11] x86/crypto: Adapt assembly for PIE support
  2020-02-28  0:00 ` Thomas Garnier
@ 2020-02-28  0:00 ` Thomas Garnier
  -1 siblings, 0 replies; 39+ messages in thread
From: Thomas Garnier @ 2020-02-28  0:00 UTC (permalink / raw)
  To: kernel-hardening
  Cc: kristen, keescook, Thomas Garnier, Herbert Xu, David S. Miller,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-crypto, linux-kernel

Change the assembly code to use only relative references to symbols so that
the kernel can be PIE compatible.

Signed-off-by: Thomas Garnier <thgarnie@chromium.org>
---
 arch/x86/crypto/aegis128-aesni-asm.S         |  6 +-
 arch/x86/crypto/aesni-intel_asm.S            |  8 +-
 arch/x86/crypto/aesni-intel_avx-x86_64.S     |  3 +-
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  | 42 ++++-----
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 44 ++++-----
 arch/x86/crypto/camellia-x86_64-asm_64.S     |  8 +-
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S    | 50 +++++-----
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S    | 44 +++++----
 arch/x86/crypto/des3_ede-asm_64.S            | 96 +++++++++++++-------
 arch/x86/crypto/ghash-clmulni-intel_asm.S    |  4 +-
 arch/x86/crypto/glue_helper-asm-avx.S        |  4 +-
 arch/x86/crypto/glue_helper-asm-avx2.S       |  6 +-
 arch/x86/crypto/sha256-avx2-asm.S            | 18 ++--
 13 files changed, 191 insertions(+), 142 deletions(-)

diff --git a/arch/x86/crypto/aegis128-aesni-asm.S b/arch/x86/crypto/aegis128-aesni-asm.S
index 51d46d93efbc..c4bd32d8ca43 100644
--- a/arch/x86/crypto/aegis128-aesni-asm.S
+++ b/arch/x86/crypto/aegis128-aesni-asm.S
@@ -200,8 +200,8 @@ SYM_FUNC_START(crypto_aegis128_aesni_init)
 	movdqa KEY, STATE4
 
 	/* load the constants: */
-	movdqa .Laegis128_const_0, STATE2
-	movdqa .Laegis128_const_1, STATE1
+	movdqa .Laegis128_const_0(%rip), STATE2
+	movdqa .Laegis128_const_1(%rip), STATE1
 	pxor STATE2, STATE3
 	pxor STATE1, STATE4
 
@@ -681,7 +681,7 @@ SYM_FUNC_START(crypto_aegis128_aesni_dec_tail)
 	punpcklbw T0, T0
 	punpcklbw T0, T0
 	punpcklbw T0, T0
-	movdqa .Laegis128_counter, T1
+	movdqa .Laegis128_counter(%rip), T1
 	pcmpgtb T1, T0
 	pand T0, MSG
 
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index cad6e1bfa7d5..a614e277d15b 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -2597,7 +2597,7 @@ SYM_FUNC_END(aesni_cbc_dec)
  *	BSWAP_MASK == endian swapping mask
  */
 SYM_FUNC_START_LOCAL(_aesni_inc_init)
-	movaps .Lbswap_mask, BSWAP_MASK
+	movaps .Lbswap_mask(%rip), BSWAP_MASK
 	movaps IV, CTR
 	PSHUFB_XMM BSWAP_MASK CTR
 	mov $1, TCTR_LOW
@@ -2724,12 +2724,12 @@ SYM_FUNC_START(aesni_xts_crypt8)
 	cmpb $0, %cl
 	movl $0, %ecx
 	movl $240, %r10d
-	leaq _aesni_enc4, %r11
-	leaq _aesni_dec4, %rax
+	leaq _aesni_enc4(%rip), %r11
+	leaq _aesni_dec4(%rip), %rax
 	cmovel %r10d, %ecx
 	cmoveq %rax, %r11
 
-	movdqa .Lgf128mul_x_ble_mask, GF128MUL_MASK
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
 	movups (IVP), IV
 
 	mov 480(KEYP), KLEN
diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index bfa1c0b3e5b4..838221507976 100644
--- a/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -660,7 +660,8 @@ _get_AAD_rest0\@:
 	vpshufb and an array of shuffle masks */
 	movq    %r12, %r11
 	salq    $4, %r11
-	vmovdqu  aad_shift_arr(%r11), \T1
+	leaq    aad_shift_arr(%rip), %rax
+	vmovdqu  (%rax,%r11,), \T1
 	vpshufb \T1, \T7, \T7
 _get_AAD_rest_final\@:
 	vpshufb SHUF_MASK(%rip), \T7, \T7
diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index d01ddd73de65..8a203c507f81 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -53,10 +53,10 @@
 	/* \
 	 * S-function with AES subbytes \
 	 */ \
-	vmovdqa .Linv_shift_row, t4; \
-	vbroadcastss .L0f0f0f0f, t7; \
-	vmovdqa .Lpre_tf_lo_s1, t0; \
-	vmovdqa .Lpre_tf_hi_s1, t1; \
+	vmovdqa .Linv_shift_row(%rip), t4; \
+	vbroadcastss .L0f0f0f0f(%rip), t7; \
+	vmovdqa .Lpre_tf_lo_s1(%rip), t0; \
+	vmovdqa .Lpre_tf_hi_s1(%rip), t1; \
 	\
 	/* AES inverse shift rows */ \
 	vpshufb t4, x0, x0; \
@@ -69,8 +69,8 @@
 	vpshufb t4, x6, x6; \
 	\
 	/* prefilter sboxes 1, 2 and 3 */ \
-	vmovdqa .Lpre_tf_lo_s4, t2; \
-	vmovdqa .Lpre_tf_hi_s4, t3; \
+	vmovdqa .Lpre_tf_lo_s4(%rip), t2; \
+	vmovdqa .Lpre_tf_hi_s4(%rip), t3; \
 	filter_8bit(x0, t0, t1, t7, t6); \
 	filter_8bit(x7, t0, t1, t7, t6); \
 	filter_8bit(x1, t0, t1, t7, t6); \
@@ -84,8 +84,8 @@
 	filter_8bit(x6, t2, t3, t7, t6); \
 	\
 	/* AES subbytes + AES shift rows */ \
-	vmovdqa .Lpost_tf_lo_s1, t0; \
-	vmovdqa .Lpost_tf_hi_s1, t1; \
+	vmovdqa .Lpost_tf_lo_s1(%rip), t0; \
+	vmovdqa .Lpost_tf_hi_s1(%rip), t1; \
 	vaesenclast t4, x0, x0; \
 	vaesenclast t4, x7, x7; \
 	vaesenclast t4, x1, x1; \
@@ -96,16 +96,16 @@
 	vaesenclast t4, x6, x6; \
 	\
 	/* postfilter sboxes 1 and 4 */ \
-	vmovdqa .Lpost_tf_lo_s3, t2; \
-	vmovdqa .Lpost_tf_hi_s3, t3; \
+	vmovdqa .Lpost_tf_lo_s3(%rip), t2; \
+	vmovdqa .Lpost_tf_hi_s3(%rip), t3; \
 	filter_8bit(x0, t0, t1, t7, t6); \
 	filter_8bit(x7, t0, t1, t7, t6); \
 	filter_8bit(x3, t0, t1, t7, t6); \
 	filter_8bit(x6, t0, t1, t7, t6); \
 	\
 	/* postfilter sbox 3 */ \
-	vmovdqa .Lpost_tf_lo_s2, t4; \
-	vmovdqa .Lpost_tf_hi_s2, t5; \
+	vmovdqa .Lpost_tf_lo_s2(%rip), t4; \
+	vmovdqa .Lpost_tf_hi_s2(%rip), t5; \
 	filter_8bit(x2, t2, t3, t7, t6); \
 	filter_8bit(x5, t2, t3, t7, t6); \
 	\
@@ -444,7 +444,7 @@ SYM_FUNC_END(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	transpose_4x4(c0, c1, c2, c3, a0, a1); \
 	transpose_4x4(d0, d1, d2, d3, a0, a1); \
 	\
-	vmovdqu .Lshufb_16x16b, a0; \
+	vmovdqu .Lshufb_16x16b(%rip), a0; \
 	vmovdqu st1, a1; \
 	vpshufb a0, a2, a2; \
 	vpshufb a0, a3, a3; \
@@ -483,7 +483,7 @@ SYM_FUNC_END(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 #define inpack16_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \
 		     y6, y7, rio, key) \
 	vmovq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor 0 * 16(rio), x0, y7; \
 	vpxor 1 * 16(rio), x0, y6; \
@@ -534,7 +534,7 @@ SYM_FUNC_END(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	vmovdqu x0, stack_tmp0; \
 	\
 	vmovq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor x0, y7, y7; \
 	vpxor x0, y6, y6; \
@@ -1017,7 +1017,7 @@ SYM_FUNC_START(camellia_ctr_16way)
 	subq $(16 * 16), %rsp;
 	movq %rsp, %rax;
 
-	vmovdqa .Lbswap128_mask, %xmm14;
+	vmovdqa .Lbswap128_mask(%rip), %xmm14;
 
 	/* load IV and byteswap */
 	vmovdqu (%rcx), %xmm0;
@@ -1066,7 +1066,7 @@ SYM_FUNC_START(camellia_ctr_16way)
 
 	/* inpack16_pre: */
 	vmovq (key_table)(CTX), %xmm15;
-	vpshufb .Lpack_bswap, %xmm15, %xmm15;
+	vpshufb .Lpack_bswap(%rip), %xmm15, %xmm15;
 	vpxor %xmm0, %xmm15, %xmm0;
 	vpxor %xmm1, %xmm15, %xmm1;
 	vpxor %xmm2, %xmm15, %xmm2;
@@ -1134,7 +1134,7 @@ SYM_FUNC_START_LOCAL(camellia_xts_crypt_16way)
 	subq $(16 * 16), %rsp;
 	movq %rsp, %rax;
 
-	vmovdqa .Lxts_gf128mul_and_shl1_mask, %xmm14;
+	vmovdqa .Lxts_gf128mul_and_shl1_mask(%rip), %xmm14;
 
 	/* load IV */
 	vmovdqu (%rcx), %xmm0;
@@ -1210,7 +1210,7 @@ SYM_FUNC_START_LOCAL(camellia_xts_crypt_16way)
 
 	/* inpack16_pre: */
 	vmovq (key_table)(CTX, %r8, 8), %xmm15;
-	vpshufb .Lpack_bswap, %xmm15, %xmm15;
+	vpshufb .Lpack_bswap(%rip), %xmm15, %xmm15;
 	vpxor 0 * 16(%rax), %xmm15, %xmm0;
 	vpxor %xmm1, %xmm15, %xmm1;
 	vpxor %xmm2, %xmm15, %xmm2;
@@ -1265,7 +1265,7 @@ SYM_FUNC_START(camellia_xts_enc_16way)
 	 */
 	xorl %r8d, %r8d; /* input whitening key, 0 for enc */
 
-	leaq __camellia_enc_blk16, %r9;
+	leaq __camellia_enc_blk16(%rip), %r9;
 
 	jmp camellia_xts_crypt_16way;
 SYM_FUNC_END(camellia_xts_enc_16way)
@@ -1283,7 +1283,7 @@ SYM_FUNC_START(camellia_xts_dec_16way)
 	movl $24, %eax;
 	cmovel %eax, %r8d;  /* input whitening key, last for dec */
 
-	leaq __camellia_dec_blk16, %r9;
+	leaq __camellia_dec_blk16(%rip), %r9;
 
 	jmp camellia_xts_crypt_16way;
 SYM_FUNC_END(camellia_xts_dec_16way)
diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index 563ef6e83cdd..d7f4cd4b1702 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -65,12 +65,12 @@
 	/* \
 	 * S-function with AES subbytes \
 	 */ \
-	vbroadcasti128 .Linv_shift_row, t4; \
-	vpbroadcastd .L0f0f0f0f, t7; \
-	vbroadcasti128 .Lpre_tf_lo_s1, t5; \
-	vbroadcasti128 .Lpre_tf_hi_s1, t6; \
-	vbroadcasti128 .Lpre_tf_lo_s4, t2; \
-	vbroadcasti128 .Lpre_tf_hi_s4, t3; \
+	vbroadcasti128 .Linv_shift_row(%rip), t4; \
+	vpbroadcastd .L0f0f0f0f(%rip), t7; \
+	vbroadcasti128 .Lpre_tf_lo_s1(%rip), t5; \
+	vbroadcasti128 .Lpre_tf_hi_s1(%rip), t6; \
+	vbroadcasti128 .Lpre_tf_lo_s4(%rip), t2; \
+	vbroadcasti128 .Lpre_tf_hi_s4(%rip), t3; \
 	\
 	/* AES inverse shift rows */ \
 	vpshufb t4, x0, x0; \
@@ -116,8 +116,8 @@
 	vinserti128 $1, t2##_x, x6, x6; \
 	vextracti128 $1, x1, t3##_x; \
 	vextracti128 $1, x4, t2##_x; \
-	vbroadcasti128 .Lpost_tf_lo_s1, t0; \
-	vbroadcasti128 .Lpost_tf_hi_s1, t1; \
+	vbroadcasti128 .Lpost_tf_lo_s1(%rip), t0; \
+	vbroadcasti128 .Lpost_tf_hi_s1(%rip), t1; \
 	vaesenclast t4##_x, x2##_x, x2##_x; \
 	vaesenclast t4##_x, t6##_x, t6##_x; \
 	vinserti128 $1, t6##_x, x2, x2; \
@@ -132,16 +132,16 @@
 	vinserti128 $1, t2##_x, x4, x4; \
 	\
 	/* postfilter sboxes 1 and 4 */ \
-	vbroadcasti128 .Lpost_tf_lo_s3, t2; \
-	vbroadcasti128 .Lpost_tf_hi_s3, t3; \
+	vbroadcasti128 .Lpost_tf_lo_s3(%rip), t2; \
+	vbroadcasti128 .Lpost_tf_hi_s3(%rip), t3; \
 	filter_8bit(x0, t0, t1, t7, t6); \
 	filter_8bit(x7, t0, t1, t7, t6); \
 	filter_8bit(x3, t0, t1, t7, t6); \
 	filter_8bit(x6, t0, t1, t7, t6); \
 	\
 	/* postfilter sbox 3 */ \
-	vbroadcasti128 .Lpost_tf_lo_s2, t4; \
-	vbroadcasti128 .Lpost_tf_hi_s2, t5; \
+	vbroadcasti128 .Lpost_tf_lo_s2(%rip), t4; \
+	vbroadcasti128 .Lpost_tf_hi_s2(%rip), t5; \
 	filter_8bit(x2, t2, t3, t7, t6); \
 	filter_8bit(x5, t2, t3, t7, t6); \
 	\
@@ -478,7 +478,7 @@ SYM_FUNC_END(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	transpose_4x4(c0, c1, c2, c3, a0, a1); \
 	transpose_4x4(d0, d1, d2, d3, a0, a1); \
 	\
-	vbroadcasti128 .Lshufb_16x16b, a0; \
+	vbroadcasti128 .Lshufb_16x16b(%rip), a0; \
 	vmovdqu st1, a1; \
 	vpshufb a0, a2, a2; \
 	vpshufb a0, a3, a3; \
@@ -517,7 +517,7 @@ SYM_FUNC_END(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 #define inpack32_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \
 		     y6, y7, rio, key) \
 	vpbroadcastq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor 0 * 32(rio), x0, y7; \
 	vpxor 1 * 32(rio), x0, y6; \
@@ -568,7 +568,7 @@ SYM_FUNC_END(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 	vmovdqu x0, stack_tmp0; \
 	\
 	vpbroadcastq key, x0; \
-	vpshufb .Lpack_bswap, x0, x0; \
+	vpshufb .Lpack_bswap(%rip), x0, x0; \
 	\
 	vpxor x0, y7, y7; \
 	vpxor x0, y6, y6; \
@@ -1108,7 +1108,7 @@ SYM_FUNC_START(camellia_ctr_32way)
 	vmovdqu (%rcx), %xmm0;
 	vmovdqa %xmm0, %xmm1;
 	inc_le128(%xmm0, %xmm15, %xmm14);
-	vbroadcasti128 .Lbswap128_mask, %ymm14;
+	vbroadcasti128 .Lbswap128_mask(%rip), %ymm14;
 	vinserti128 $1, %xmm0, %ymm1, %ymm0;
 	vpshufb %ymm14, %ymm0, %ymm13;
 	vmovdqu %ymm13, 15 * 32(%rax);
@@ -1154,7 +1154,7 @@ SYM_FUNC_START(camellia_ctr_32way)
 
 	/* inpack32_pre: */
 	vpbroadcastq (key_table)(CTX), %ymm15;
-	vpshufb .Lpack_bswap, %ymm15, %ymm15;
+	vpshufb .Lpack_bswap(%rip), %ymm15, %ymm15;
 	vpxor %ymm0, %ymm15, %ymm0;
 	vpxor %ymm1, %ymm15, %ymm1;
 	vpxor %ymm2, %ymm15, %ymm2;
@@ -1238,13 +1238,13 @@ SYM_FUNC_START_LOCAL(camellia_xts_crypt_32way)
 	subq $(16 * 32), %rsp;
 	movq %rsp, %rax;
 
-	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_0, %ymm12;
+	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_0(%rip), %ymm12;
 
 	/* load IV and construct second IV */
 	vmovdqu (%rcx), %xmm0;
 	vmovdqa %xmm0, %xmm15;
 	gf128mul_x_ble(%xmm0, %xmm12, %xmm13);
-	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_1, %ymm13;
+	vbroadcasti128 .Lxts_gf128mul_and_shl1_mask_1(%rip), %ymm13;
 	vinserti128 $1, %xmm0, %ymm15, %ymm0;
 	vpxor 0 * 32(%rdx), %ymm0, %ymm15;
 	vmovdqu %ymm15, 15 * 32(%rax);
@@ -1321,7 +1321,7 @@ SYM_FUNC_START_LOCAL(camellia_xts_crypt_32way)
 
 	/* inpack32_pre: */
 	vpbroadcastq (key_table)(CTX, %r8, 8), %ymm15;
-	vpshufb .Lpack_bswap, %ymm15, %ymm15;
+	vpshufb .Lpack_bswap(%rip), %ymm15, %ymm15;
 	vpxor 0 * 32(%rax), %ymm15, %ymm0;
 	vpxor %ymm1, %ymm15, %ymm1;
 	vpxor %ymm2, %ymm15, %ymm2;
@@ -1379,7 +1379,7 @@ SYM_FUNC_START(camellia_xts_enc_32way)
 
 	xorl %r8d, %r8d; /* input whitening key, 0 for enc */
 
-	leaq __camellia_enc_blk32, %r9;
+	leaq __camellia_enc_blk32(%rip), %r9;
 
 	jmp camellia_xts_crypt_32way;
 SYM_FUNC_END(camellia_xts_enc_32way)
@@ -1397,7 +1397,7 @@ SYM_FUNC_START(camellia_xts_dec_32way)
 	movl $24, %eax;
 	cmovel %eax, %r8d;  /* input whitening key, last for dec */
 
-	leaq __camellia_dec_blk32, %r9;
+	leaq __camellia_dec_blk32(%rip), %r9;
 
 	jmp camellia_xts_crypt_32way;
 SYM_FUNC_END(camellia_xts_dec_32way)
diff --git a/arch/x86/crypto/camellia-x86_64-asm_64.S b/arch/x86/crypto/camellia-x86_64-asm_64.S
index 1372e6408850..3a4e5cca583a 100644
--- a/arch/x86/crypto/camellia-x86_64-asm_64.S
+++ b/arch/x86/crypto/camellia-x86_64-asm_64.S
@@ -77,11 +77,13 @@
 #define RXORbl %r9b
 
 #define xor2ror16(T0, T1, tmp1, tmp2, ab, dst) \
+	leaq T0(%rip), 			tmp1; \
 	movzbl ab ## bl,		tmp2 ## d; \
+	xorq (tmp1, tmp2, 8),		dst; \
+	leaq T1(%rip), 			tmp2; \
 	movzbl ab ## bh,		tmp1 ## d; \
-	rorq $16,			ab; \
-	xorq T0(, tmp2, 8),		dst; \
-	xorq T1(, tmp1, 8),		dst;
+	xorq (tmp2, tmp1, 8),		dst; \
+	rorq $16,			ab;
 
 /**********************************************************************
   1-way camellia
diff --git a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
index 8a6181b08b59..ffa9ed5f096b 100644
--- a/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast5-avx-x86_64-asm_64.S
@@ -83,16 +83,20 @@
 
 
 #define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	shrq $16,	src;                     \
-	movl		s1(, RID1, 4), dst ## d; \
-	op1		s2(, RID2, 4), dst ## d; \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	interleave_op(il_reg);			 \
-	op2		s3(, RID1, 4), dst ## d; \
-	op3		s4(, RID2, 4), dst ## d;
+	movzbl		src ## bh,       RID1d;    \
+	leaq		s1(%rip),        RID2;     \
+	movl		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,       RID2d;    \
+	leaq		s2(%rip),        RID1;     \
+	op1		(RID1, RID2, 4), dst ## d; \
+	shrq $16,	src;                       \
+	movzbl		src ## bh,     RID1d;      \
+	leaq		s3(%rip),        RID2;     \
+	op2		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,     RID2d;      \
+	leaq		s4(%rip),        RID1;     \
+	op3		(RID1, RID2, 4), dst ## d; \
+	interleave_op(il_reg);
 
 #define dummy(d) /* do nothing */
 
@@ -151,15 +155,15 @@
 	subround(l ## 3, r ## 3, l ## 4, r ## 4, f);
 
 #define enc_preload_rkr() \
-	vbroadcastss	.L16_mask,                RKR;      \
+	vbroadcastss	.L16_mask(%rip),          RKR;      \
 	/* add 16-bit rotation to key rotations (mod 32) */ \
 	vpxor		kr(CTX),                  RKR, RKR;
 
 #define dec_preload_rkr() \
-	vbroadcastss	.L16_mask,                RKR;      \
+	vbroadcastss	.L16_mask(%rip),          RKR;      \
 	/* add 16-bit rotation to key rotations (mod 32) */ \
 	vpxor		kr(CTX),                  RKR, RKR; \
-	vpshufb		.Lbswap128_mask,          RKR, RKR;
+	vpshufb		.Lbswap128_mask(%rip),    RKR, RKR;
 
 #define transpose_2x4(x0, x1, t0, t1) \
 	vpunpckldq		x1, x0, t0; \
@@ -236,9 +240,9 @@ SYM_FUNC_START_LOCAL(__cast5_enc_blk16)
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 	enc_preload_rkr();
 
 	inpack_blocks(RL1, RR1, RTMP, RX, RKM);
@@ -272,7 +276,7 @@ SYM_FUNC_START_LOCAL(__cast5_enc_blk16)
 	popq %rbx;
 	popq %r15;
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 
 	outunpack_blocks(RR1, RL1, RTMP, RX, RKM);
 	outunpack_blocks(RR2, RL2, RTMP, RX, RKM);
@@ -310,9 +314,9 @@ SYM_FUNC_START_LOCAL(__cast5_dec_blk16)
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 	dec_preload_rkr();
 
 	inpack_blocks(RL1, RR1, RTMP, RX, RKM);
@@ -343,7 +347,7 @@ SYM_FUNC_START_LOCAL(__cast5_dec_blk16)
 	round(RL, RR, 1, 2);
 	round(RR, RL, 0, 1);
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 	popq %rbx;
 	popq %r15;
 
@@ -506,8 +510,8 @@ SYM_FUNC_START(cast5_ctr_16way)
 
 	vpcmpeqd RKR, RKR, RKR;
 	vpaddq RKR, RKR, RKR; /* low: -2, high: -2 */
-	vmovdqa .Lbswap_iv_mask, R1ST;
-	vmovdqa .Lbswap128_mask, RKM;
+	vmovdqa .Lbswap_iv_mask(%rip), R1ST;
+	vmovdqa .Lbswap128_mask(%rip), RKM;
 
 	/* load IV and byteswap */
 	vmovq (%rcx), RX;
diff --git a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
index 932a3ce32a88..c46a61411f98 100644
--- a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
@@ -83,16 +83,20 @@
 
 
 #define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	shrq $16,	src;                     \
-	movl		s1(, RID1, 4), dst ## d; \
-	op1		s2(, RID2, 4), dst ## d; \
-	movzbl		src ## bh,     RID1d;    \
-	movzbl		src ## bl,     RID2d;    \
-	interleave_op(il_reg);			 \
-	op2		s3(, RID1, 4), dst ## d; \
-	op3		s4(, RID2, 4), dst ## d;
+	movzbl		src ## bh,       RID1d;    \
+	leaq		s1(%rip),        RID2;     \
+	movl		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,       RID2d;    \
+	leaq		s2(%rip),        RID1;     \
+	op1		(RID1, RID2, 4), dst ## d; \
+	shrq $16,	src;                       \
+	movzbl		src ## bh,     RID1d;      \
+	leaq		s3(%rip),        RID2;     \
+	op2		(RID2, RID1, 4), dst ## d; \
+	movzbl		src ## bl,     RID2d;      \
+	leaq		s4(%rip),        RID1;     \
+	op3		(RID1, RID2, 4), dst ## d; \
+	interleave_op(il_reg);
 
 #define dummy(d) /* do nothing */
 
@@ -175,10 +179,10 @@
 	qop(RD, RC, 1);
 
 #define shuffle(mask) \
-	vpshufb		mask,            RKR, RKR;
+	vpshufb		mask(%rip),            RKR, RKR;
 
 #define preload_rkr(n, do_mask, mask) \
-	vbroadcastss	.L16_mask,                RKR;      \
+	vbroadcastss	.L16_mask(%rip),          RKR;      \
 	/* add 16-bit rotation to key rotations (mod 32) */ \
 	vpxor		(kr+n*16)(CTX),           RKR, RKR; \
 	do_mask(mask);
@@ -260,9 +264,9 @@ SYM_FUNC_START_LOCAL(__cast6_enc_blk8)
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 
 	inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -286,7 +290,7 @@ SYM_FUNC_START_LOCAL(__cast6_enc_blk8)
 	popq %rbx;
 	popq %r15;
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 
 	outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -308,9 +312,9 @@ SYM_FUNC_START_LOCAL(__cast6_dec_blk8)
 
 	movq %rdi, CTX;
 
-	vmovdqa .Lbswap_mask, RKM;
-	vmovd .Lfirst_mask, R1ST;
-	vmovd .L32_mask, R32;
+	vmovdqa .Lbswap_mask(%rip), RKM;
+	vmovd .Lfirst_mask(%rip), R1ST;
+	vmovd .L32_mask(%rip), R32;
 
 	inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
@@ -334,7 +338,7 @@ SYM_FUNC_START_LOCAL(__cast6_dec_blk8)
 	popq %rbx;
 	popq %r15;
 
-	vmovdqa .Lbswap_mask, RKM;
+	vmovdqa .Lbswap_mask(%rip), RKM;
 	outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
 	outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
 
diff --git a/arch/x86/crypto/des3_ede-asm_64.S b/arch/x86/crypto/des3_ede-asm_64.S
index fac0fdc3f25d..c841de5fe065 100644
--- a/arch/x86/crypto/des3_ede-asm_64.S
+++ b/arch/x86/crypto/des3_ede-asm_64.S
@@ -129,21 +129,29 @@
 	movzbl RW0bl, RT2d; \
 	movzbl RW0bh, RT3d; \
 	shrq $16, RW0; \
-	movq s8(, RT0, 8), RT0; \
-	xorq s6(, RT1, 8), to; \
+	leaq s8(%rip), RW1; \
+	movq (RW1, RT0, 8), RT0; \
+	leaq s6(%rip), RW1; \
+	xorq (RW1, RT1, 8), to; \
 	movzbl RW0bl, RL1d; \
 	movzbl RW0bh, RT1d; \
 	shrl $16, RW0d; \
-	xorq s4(, RT2, 8), RT0; \
-	xorq s2(, RT3, 8), to; \
+	leaq s4(%rip), RW1; \
+	xorq (RW1, RT2, 8), RT0; \
+	leaq s2(%rip), RW1; \
+	xorq (RW1, RT3, 8), to; \
 	movzbl RW0bl, RT2d; \
 	movzbl RW0bh, RT3d; \
-	xorq s7(, RL1, 8), RT0; \
-	xorq s5(, RT1, 8), to; \
-	xorq s3(, RT2, 8), RT0; \
+	leaq s7(%rip), RW1; \
+	xorq (RW1, RL1, 8), RT0; \
+	leaq s5(%rip), RW1; \
+	xorq (RW1, RT1, 8), to; \
+	leaq s3(%rip), RW1; \
+	xorq (RW1, RT2, 8), RT0; \
 	load_next_key(n, RW0); \
 	xorq RT0, to; \
-	xorq s1(, RT3, 8), to; \
+	leaq s1(%rip), RW1; \
+	xorq (RW1, RT3, 8), to; \
 
 #define load_next_key(n, RWx) \
 	movq (((n) + 1) * 8)(CTX), RWx;
@@ -355,65 +363,89 @@ SYM_FUNC_END(des3_ede_x86_64_crypt_blk)
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	shrq $16, RW0; \
-	xorq s8(, RT3, 8), to##0; \
-	xorq s6(, RT1, 8), to##0; \
+	leaq s8(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s6(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	shrq $16, RW0; \
-	xorq s4(, RT3, 8), to##0; \
-	xorq s2(, RT1, 8), to##0; \
+	leaq s4(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s2(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	shrl $16, RW0d; \
-	xorq s7(, RT3, 8), to##0; \
-	xorq s5(, RT1, 8), to##0; \
+	leaq s7(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s5(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 	movzbl RW0bl, RT3d; \
 	movzbl RW0bh, RT1d; \
 	load_next_key(n, RW0); \
-	xorq s3(, RT3, 8), to##0; \
-	xorq s1(, RT1, 8), to##0; \
+	leaq s3(%rip), RT2; \
+	xorq (RT2, RT3, 8), to##0; \
+	leaq s1(%rip), RT2; \
+	xorq (RT2, RT1, 8), to##0; \
 		xorq from##1, RW1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		shrq $16, RW1; \
-		xorq s8(, RT3, 8), to##1; \
-		xorq s6(, RT1, 8), to##1; \
+		leaq s8(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s6(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		shrq $16, RW1; \
-		xorq s4(, RT3, 8), to##1; \
-		xorq s2(, RT1, 8), to##1; \
+		leaq s4(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s2(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		shrl $16, RW1d; \
-		xorq s7(, RT3, 8), to##1; \
-		xorq s5(, RT1, 8), to##1; \
+		leaq s7(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s5(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 		movzbl RW1bl, RT3d; \
 		movzbl RW1bh, RT1d; \
 		do_movq(RW0, RW1); \
-		xorq s3(, RT3, 8), to##1; \
-		xorq s1(, RT1, 8), to##1; \
+		leaq s3(%rip), RT2; \
+		xorq (RT2, RT3, 8), to##1; \
+		leaq s1(%rip), RT2; \
+		xorq (RT2, RT1, 8), to##1; \
 			xorq from##2, RW2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			shrq $16, RW2; \
-			xorq s8(, RT3, 8), to##2; \
-			xorq s6(, RT1, 8), to##2; \
+			leaq s8(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s6(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			shrq $16, RW2; \
-			xorq s4(, RT3, 8), to##2; \
-			xorq s2(, RT1, 8), to##2; \
+			leaq s4(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s2(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			shrl $16, RW2d; \
-			xorq s7(, RT3, 8), to##2; \
-			xorq s5(, RT1, 8), to##2; \
+			leaq s7(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s5(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2; \
 			movzbl RW2bl, RT3d; \
 			movzbl RW2bh, RT1d; \
 			do_movq(RW0, RW2); \
-			xorq s3(, RT3, 8), to##2; \
-			xorq s1(, RT1, 8), to##2;
+			leaq s3(%rip), RT2; \
+			xorq (RT2, RT3, 8), to##2; \
+			leaq s1(%rip), RT2; \
+			xorq (RT2, RT1, 8), to##2;
 
 #define __movq(src, dst) \
 	movq src, dst;
diff --git a/arch/x86/crypto/ghash-clmulni-intel_asm.S b/arch/x86/crypto/ghash-clmulni-intel_asm.S
index bb9735fbb865..59717ade66d7 100644
--- a/arch/x86/crypto/ghash-clmulni-intel_asm.S
+++ b/arch/x86/crypto/ghash-clmulni-intel_asm.S
@@ -94,7 +94,7 @@ SYM_FUNC_START(clmul_ghash_mul)
 	FRAME_BEGIN
 	movups (%rdi), DATA
 	movups (%rsi), SHASH
-	movaps .Lbswap_mask, BSWAP
+	movaps .Lbswap_mask(%rip), BSWAP
 	PSHUFB_XMM BSWAP DATA
 	call __clmul_gf128mul_ble
 	PSHUFB_XMM BSWAP DATA
@@ -111,7 +111,7 @@ SYM_FUNC_START(clmul_ghash_update)
 	FRAME_BEGIN
 	cmp $16, %rdx
 	jb .Lupdate_just_ret	# check length
-	movaps .Lbswap_mask, BSWAP
+	movaps .Lbswap_mask(%rip), BSWAP
 	movups (%rdi), DATA
 	movups (%rcx), SHASH
 	PSHUFB_XMM BSWAP DATA
diff --git a/arch/x86/crypto/glue_helper-asm-avx.S b/arch/x86/crypto/glue_helper-asm-avx.S
index d08fc575ef7f..a9736f85fef0 100644
--- a/arch/x86/crypto/glue_helper-asm-avx.S
+++ b/arch/x86/crypto/glue_helper-asm-avx.S
@@ -44,7 +44,7 @@
 #define load_ctr_8way(iv, bswap, x0, x1, x2, x3, x4, x5, x6, x7, t0, t1, t2) \
 	vpcmpeqd t0, t0, t0; \
 	vpsrldq $8, t0, t0; /* low: -1, high: 0 */ \
-	vmovdqa bswap, t1; \
+	vmovdqa bswap(%rip), t1; \
 	\
 	/* load IV and byteswap */ \
 	vmovdqu (iv), x7; \
@@ -89,7 +89,7 @@
 
 #define load_xts_8way(iv, src, dst, x0, x1, x2, x3, x4, x5, x6, x7, tiv, t0, \
 		      t1, xts_gf128mul_and_shl1_mask) \
-	vmovdqa xts_gf128mul_and_shl1_mask, t0; \
+	vmovdqa xts_gf128mul_and_shl1_mask(%rip), t0; \
 	\
 	/* load IV */ \
 	vmovdqu (iv), tiv; \
diff --git a/arch/x86/crypto/glue_helper-asm-avx2.S b/arch/x86/crypto/glue_helper-asm-avx2.S
index d84508c85c13..efbf4953707e 100644
--- a/arch/x86/crypto/glue_helper-asm-avx2.S
+++ b/arch/x86/crypto/glue_helper-asm-avx2.S
@@ -62,7 +62,7 @@
 	vmovdqu (iv), t2x; \
 	vmovdqa t2x, t3x; \
 	inc_le128(t2x, t0x, t1x); \
-	vbroadcasti128 bswap, t1; \
+	vbroadcasti128 bswap(%rip), t1; \
 	vinserti128 $1, t2x, t3, t2; /* ab: le0 ; cd: le1 */ \
 	vpshufb t1, t2, x0; \
 	\
@@ -119,13 +119,13 @@
 		       tivx, t0, t0x, t1, t1x, t2, t2x, t3, \
 		       xts_gf128mul_and_shl1_mask_0, \
 		       xts_gf128mul_and_shl1_mask_1) \
-	vbroadcasti128 xts_gf128mul_and_shl1_mask_0, t1; \
+	vbroadcasti128 xts_gf128mul_and_shl1_mask_0(%rip), t1; \
 	\
 	/* load IV and construct second IV */ \
 	vmovdqu (iv), tivx; \
 	vmovdqa tivx, t0x; \
 	gf128mul_x_ble(tivx, t1x, t2x); \
-	vbroadcasti128 xts_gf128mul_and_shl1_mask_1, t2; \
+	vbroadcasti128 xts_gf128mul_and_shl1_mask_1(%rip), t2; \
 	vinserti128 $1, tivx, t0, tiv; \
 	vpxor (0*32)(src), tiv, x0; \
 	vmovdqu tiv, (0*32)(dst); \
diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S
index 499d9ec129de..4b2a08445dc1 100644
--- a/arch/x86/crypto/sha256-avx2-asm.S
+++ b/arch/x86/crypto/sha256-avx2-asm.S
@@ -592,19 +592,23 @@ last_block_enter:
 
 .align 16
 loop1:
-	vpaddd	K256+0*32(SRND), X0, XFER
+	leaq	K256(%rip), INP
+	vpaddd	0*32(INP, SRND), X0, XFER
 	vmovdqa XFER, 0*32+_XFER(%rsp, SRND)
 	FOUR_ROUNDS_AND_SCHED	_XFER + 0*32
 
-	vpaddd	K256+1*32(SRND), X0, XFER
+	leaq	K256(%rip), INP
+	vpaddd	1*32(INP, SRND), X0, XFER
 	vmovdqa XFER, 1*32+_XFER(%rsp, SRND)
 	FOUR_ROUNDS_AND_SCHED	_XFER + 1*32
 
-	vpaddd	K256+2*32(SRND), X0, XFER
+	leaq	K256(%rip), INP
+	vpaddd	2*32(INP, SRND), X0, XFER
 	vmovdqa XFER, 2*32+_XFER(%rsp, SRND)
 	FOUR_ROUNDS_AND_SCHED	_XFER + 2*32
 
-	vpaddd	K256+3*32(SRND), X0, XFER
+	leaq	K256(%rip), INP
+	vpaddd	3*32(INP, SRND), X0, XFER
 	vmovdqa XFER, 3*32+_XFER(%rsp, SRND)
 	FOUR_ROUNDS_AND_SCHED	_XFER + 3*32
 
@@ -614,11 +618,13 @@ loop1:
 
 loop2:
 	## Do last 16 rounds with no scheduling
-	vpaddd	K256+0*32(SRND), X0, XFER
+	leaq	K256(%rip), INP
+	vpaddd	0*32(INP, SRND), X0, XFER
 	vmovdqa XFER, 0*32+_XFER(%rsp, SRND)
 	DO_4ROUNDS	_XFER + 0*32
 
-	vpaddd	K256+1*32(SRND), X1, XFER
+	leaq	K256(%rip), INP
+	vpaddd	1*32(INP, SRND), X1, XFER
 	vmovdqa XFER, 1*32+_XFER(%rsp, SRND)
 	DO_4ROUNDS	_XFER + 1*32
 	add	$2*32, SRND
-- 
2.25.1.481.gfbce0eb801-goog



* [PATCH v11 02/11] x86: Add macro to get symbol address for PIE support
  2020-02-28  0:00 ` Thomas Garnier
@ 2020-02-28  0:00 ` Thomas Garnier
  -1 siblings, 0 replies; 39+ messages in thread
From: Thomas Garnier @ 2020-02-28  0:00 UTC (permalink / raw)
  To: kernel-hardening
  Cc: kristen, keescook, Thomas Garnier, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, x86, Peter Zijlstra,
	Masami Hiramatsu, Will Deacon, linux-kernel

Add a new _ASM_MOVABS macro to fetch a symbol address. Replace
"_ASM_MOV $<symbol>, %dst" constructs, which are not compatible with
PIE.
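
A minimal sketch of the intended use (saved_ptr is a hypothetical symbol,
for illustration only). On x86_64, _ASM_MOVABS expands to movabsq, whose
64-bit immediate can hold a fully relocated address, while _ASM_MOV only
takes a 32-bit sign-extended immediate:

	/* Not PIE compatible: expands to movq with a 32-bit immediate */
	_ASM_MOV	$saved_ptr, %rax

	/* PIE compatible: expands to movabsq with a 64-bit immediate
	 * that the relocation tooling can process */
	_ASM_MOVABS	$saved_ptr, %rax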

Signed-off-by: Thomas Garnier <thgarnie@chromium.org>
---
 arch/x86/include/asm/asm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index cd339b88d5d4..644bdbf149ee 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -32,6 +32,7 @@
 #define _ASM_ALIGN	__ASM_SEL(.balign 4, .balign 8)
 
 #define _ASM_MOV	__ASM_SIZE(mov)
+#define _ASM_MOVABS	__ASM_SEL(movl, movabsq)
 #define _ASM_INC	__ASM_SIZE(inc)
 #define _ASM_DEC	__ASM_SIZE(dec)
 #define _ASM_ADD	__ASM_SIZE(add)
-- 
2.25.1.481.gfbce0eb801-goog



* [PATCH v11 03/11] x86: relocate_kernel - Adapt assembly for PIE support
  2020-02-28  0:00 ` Thomas Garnier
@ 2020-02-28  0:00 ` Thomas Garnier
  -1 siblings, 0 replies; 39+ messages in thread
From: Thomas Garnier @ 2020-02-28  0:00 UTC (permalink / raw)
  To: kernel-hardening
  Cc: kristen, keescook, Thomas Garnier, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, x86, Allison Randal,
	Enrico Weigelt, Greg Kroah-Hartman, Jiri Slaby, linux-kernel

Change the assembly code to use only absolute references to symbols so that
the kernel can be PIE compatible. This code is copied and run away from its
link address, so a RIP-relative reference would not resolve correctly; a
64-bit absolute reference (movabsq) can still be processed by the relocation
tooling.

Signed-off-by: Thomas Garnier <thgarnie@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/kernel/relocate_kernel_64.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index ef3ba99068d3..c294339df5ef 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -206,7 +206,7 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	movq	%rax, %cr3
 	lea	PAGE_SIZE(%r8), %rsp
 	call	swap_pages
-	movq	$virtual_mapped, %rax
+	movabsq	$virtual_mapped, %rax
 	pushq	%rax
 	ret
 SYM_CODE_END(identity_mapped)
-- 
2.25.1.481.gfbce0eb801-goog



* [PATCH v11 04/11] x86/entry/64: Adapt assembly for PIE support
  2020-02-28  0:00 ` Thomas Garnier
@ 2020-02-28  0:00 ` Thomas Garnier
  -1 siblings, 0 replies; 39+ messages in thread
From: Thomas Garnier @ 2020-02-28  0:00 UTC (permalink / raw)
  To: kernel-hardening
  Cc: kristen, keescook, Thomas Garnier, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-kernel

Change the assembly code to use only relative references to symbols so that
the kernel can be PIE compatible.

Signed-off-by: Thomas Garnier <thgarnie@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/entry/entry_64.S | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f2bb91e87877..2c8200d35797 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1329,7 +1329,8 @@ SYM_CODE_START_LOCAL(error_entry)
 	movl	%ecx, %eax			/* zero extend */
 	cmpq	%rax, RIP+8(%rsp)
 	je	.Lbstep_iret
-	cmpq	$.Lgs_change, RIP+8(%rsp)
+	leaq	.Lgs_change(%rip), %rcx
+	cmpq	%rcx, RIP+8(%rsp)
 	jne	.Lerror_entry_done_lfence
 
 	/*
@@ -1529,10 +1530,10 @@ SYM_CODE_START(nmi)
 	 * resume the outer NMI.
 	 */
 
-	movq	$repeat_nmi, %rdx
+	leaq	repeat_nmi(%rip), %rdx
 	cmpq	8(%rsp), %rdx
 	ja	1f
-	movq	$end_repeat_nmi, %rdx
+	leaq	end_repeat_nmi(%rip), %rdx
 	cmpq	8(%rsp), %rdx
 	ja	nested_nmi_out
 1:
@@ -1586,7 +1587,8 @@ nested_nmi:
 	pushq	%rdx
 	pushfq
 	pushq	$__KERNEL_CS
-	pushq	$repeat_nmi
+	leaq	repeat_nmi(%rip), %rdx
+	pushq	%rdx
 
 	/* Put stack back */
 	addq	$(6*8), %rsp
@@ -1625,7 +1627,11 @@ first_nmi:
 	addq	$8, (%rsp)	/* Fix up RSP */
 	pushfq			/* RFLAGS */
 	pushq	$__KERNEL_CS	/* CS */
-	pushq	$1f		/* RIP */
+	pushq	$0		/* Space for RIP */
+	pushq	%rdx		/* Save RDX */
+	leaq	1f(%rip), %rdx	/* Put the address of 1f label into RDX */
+	movq    %rdx, 8(%rsp)   /* Store it in RIP field */
+	popq	%rdx		/* Restore RDX */
 	iretq			/* continues at repeat_nmi below */
 	UNWIND_HINT_IRET_REGS
 1:
-- 
2.25.1.481.gfbce0eb801-goog



* [PATCH v11 05/11] x86: pm-trace - Adapt assembly for PIE support
  2020-02-28  0:00 ` Thomas Garnier
@ 2020-02-28  0:00 ` Thomas Garnier
  -1 siblings, 0 replies; 39+ messages in thread
From: Thomas Garnier @ 2020-02-28  0:00 UTC (permalink / raw)
  To: kernel-hardening
  Cc: kristen, keescook, Thomas Garnier, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, x86, linux-kernel

Change the assembly to use the new _ASM_MOVABS macro instead of _ASM_MOV so
that it is PIE compatible.

Signed-off-by: Thomas Garnier <thgarnie@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/include/asm/pm-trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pm-trace.h b/arch/x86/include/asm/pm-trace.h
index bfa32aa428e5..972070806ce9 100644
--- a/arch/x86/include/asm/pm-trace.h
+++ b/arch/x86/include/asm/pm-trace.h
@@ -8,7 +8,7 @@
 do {								\
 	if (pm_trace_enabled) {					\
 		const void *tracedata;				\
-		asm volatile(_ASM_MOV " $1f,%0\n"		\
+		asm volatile(_ASM_MOVABS " $1f,%0\n"		\
 			     ".section .tracedata,\"a\"\n"	\
 			     "1:\t.word %c1\n\t"		\
 			     _ASM_PTR " %c2\n"			\
-- 
2.25.1.481.gfbce0eb801-goog



* [PATCH v11 06/11] x86/CPU: Adapt assembly for PIE support
  2020-02-28  0:00 ` Thomas Garnier
@ 2020-02-28  0:00 ` Thomas Garnier
  2020-03-03  4:58   ` Kees Cook
  -1 siblings, 1 reply; 39+ messages in thread
From: Thomas Garnier @ 2020-02-28  0:00 UTC (permalink / raw)
  To: kernel-hardening
  Cc: kristen, keescook, Thomas Garnier, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, x86, Andy Lutomirski,
	Peter Zijlstra (Intel),
	Len Brown, linux-kernel

Change the assembly code to use only relative references to symbols so that
the kernel can be PIE compatible.

Signed-off-by: Thomas Garnier <thgarnie@chromium.org>
---
 arch/x86/include/asm/processor.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 09705ccc393c..fdf6366c482d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -746,11 +746,13 @@ static inline void sync_core(void)
 		"pushfq\n\t"
 		"mov %%cs, %0\n\t"
 		"pushq %q0\n\t"
-		"pushq $1f\n\t"
+		"leaq 1f(%%rip), %q0\n\t"
+		"pushq %q0\n\t"
 		"iretq\n\t"
 		UNWIND_HINT_RESTORE
 		"1:"
-		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
+		: "=&r" (tmp), ASM_CALL_CONSTRAINT
+		: : "cc", "memory");
 #endif
 }
 
-- 
2.25.1.481.gfbce0eb801-goog



* [PATCH v11 07/11] x86/acpi: Adapt assembly for PIE support
  2020-02-28  0:00 ` Thomas Garnier
@ 2020-02-28  0:00 ` Thomas Garnier
  -1 siblings, 0 replies; 39+ messages in thread
From: Thomas Garnier @ 2020-02-28  0:00 UTC (permalink / raw)
  To: kernel-hardening
  Cc: kristen, keescook, Thomas Garnier, Pavel Machek,
	Rafael J . Wysocki, Rafael J. Wysocki, Len Brown,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-pm, linux-kernel

Change the assembly code to use only relative references to symbols so that
the kernel can be PIE compatible.

Signed-off-by: Thomas Garnier <thgarnie@chromium.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/kernel/acpi/wakeup_64.S | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index c8daa92f38dc..8e221285d9f1 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -15,7 +15,7 @@
 	 * Hooray, we are in Long 64-bit mode (but still running in low memory)
 	 */
 SYM_FUNC_START(wakeup_long64)
-	movq	saved_magic, %rax
+	movq	saved_magic(%rip), %rax
 	movq	$0x123456789abcdef0, %rdx
 	cmpq	%rdx, %rax
 	je	2f
@@ -31,14 +31,14 @@ SYM_FUNC_START(wakeup_long64)
 	movw	%ax, %es
 	movw	%ax, %fs
 	movw	%ax, %gs
-	movq	saved_rsp, %rsp
+	movq	saved_rsp(%rip), %rsp
 
-	movq	saved_rbx, %rbx
-	movq	saved_rdi, %rdi
-	movq	saved_rsi, %rsi
-	movq	saved_rbp, %rbp
+	movq	saved_rbx(%rip), %rbx
+	movq	saved_rdi(%rip), %rdi
+	movq	saved_rsi(%rip), %rsi
+	movq	saved_rbp(%rip), %rbp
 
-	movq	saved_rip, %rax
+	movq	saved_rip(%rip), %rax
 	jmp	*%rax
 SYM_FUNC_END(wakeup_long64)
 
@@ -48,7 +48,7 @@ SYM_FUNC_START(do_suspend_lowlevel)
 	xorl	%eax, %eax
 	call	save_processor_state
 
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	%rsp, pt_regs_sp(%rax)
 	movq	%rbp, pt_regs_bp(%rax)
 	movq	%rsi, pt_regs_si(%rax)
@@ -67,13 +67,14 @@ SYM_FUNC_START(do_suspend_lowlevel)
 	pushfq
 	popq	pt_regs_flags(%rax)
 
-	movq	$.Lresume_point, saved_rip(%rip)
+	leaq	.Lresume_point(%rip), %rax
+	movq	%rax, saved_rip(%rip)
 
-	movq	%rsp, saved_rsp
-	movq	%rbp, saved_rbp
-	movq	%rbx, saved_rbx
-	movq	%rdi, saved_rdi
-	movq	%rsi, saved_rsi
+	movq	%rsp, saved_rsp(%rip)
+	movq	%rbp, saved_rbp(%rip)
+	movq	%rbx, saved_rbx(%rip)
+	movq	%rdi, saved_rdi(%rip)
+	movq	%rsi, saved_rsi(%rip)
 
 	addq	$8, %rsp
 	movl	$3, %edi
@@ -85,7 +86,7 @@ SYM_FUNC_START(do_suspend_lowlevel)
 	.align 4
 .Lresume_point:
 	/* We don't restore %rax, it must be 0 anyway */
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	saved_context_cr4(%rax), %rbx
 	movq	%rbx, %cr4
 	movq	saved_context_cr3(%rax), %rbx
-- 
2.25.1.481.gfbce0eb801-goog



* [PATCH v11 08/11] x86/boot/64: Adapt assembly for PIE support
  2020-02-28  0:00 ` Thomas Garnier
@ 2020-02-28  0:00 ` Thomas Garnier
  -1 siblings, 0 replies; 39+ messages in thread
From: Thomas Garnier @ 2020-02-28  0:00 UTC (permalink / raw)
  To: kernel-hardening
  Cc: kristen, keescook, Thomas Garnier, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, x86, Jiri Slaby, Peter Zijlstra,
	Cao jin, Josh Poimboeuf, linux-kernel

Change the assembly code to use absolute references for transitions between
address spaces and relative references when referencing global variables
within the same address space. This ensures a kernel built with PIE
references the correct addresses based on context.
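
A condensed sketch of the two cases, in the spirit of the diff below:

	/* Transition between address spaces: a relocated 64-bit
	 * absolute address is required */
	movabs	$1f, %rax
	jmp	*%rax
1:
	/* Same address space: a RIP-relative reference suffices */
	leaq	initial_code(%rip), %rax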

Signed-off-by: Thomas Garnier <thgarnie@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/kernel/head_64.S | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 4bbc770af632..40a467f8e116 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -87,7 +87,8 @@ SYM_CODE_START_NOALIGN(startup_64)
 	popq	%rsi
 
 	/* Form the CR3 value being sure to include the CR3 modifier */
-	addq	$(early_top_pgt - __START_KERNEL_map), %rax
+	movabs  $(early_top_pgt - __START_KERNEL_map), %rcx
+	addq    %rcx, %rax
 	jmp 1f
 SYM_CODE_END(startup_64)
 
@@ -119,7 +120,8 @@ SYM_CODE_START(secondary_startup_64)
 	popq	%rsi
 
 	/* Form the CR3 value being sure to include the CR3 modifier */
-	addq	$(init_top_pgt - __START_KERNEL_map), %rax
+	movabs	$(init_top_pgt - __START_KERNEL_map), %rcx
+	addq    %rcx, %rax
 1:
 
 	/* Enable PAE mode, PGE and LA57 */
@@ -137,7 +139,7 @@ SYM_CODE_START(secondary_startup_64)
 	movq	%rax, %cr3
 
 	/* Ensure I am executing from virtual addresses */
-	movq	$1f, %rax
+	movabs  $1f, %rax
 	ANNOTATE_RETPOLINE_SAFE
 	jmp	*%rax
 1:
@@ -234,11 +236,12 @@ SYM_CODE_START(secondary_startup_64)
 	 *	REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
 	 *		address given in m16:64.
 	 */
-	pushq	$.Lafter_lret	# put return address on stack for unwinder
+	movabs  $.Lafter_lret, %rax
+	pushq	%rax		# put return address on stack for unwinder
 	xorl	%ebp, %ebp	# clear frame pointer
-	movq	initial_code(%rip), %rax
+	leaq	initial_code(%rip), %rax
 	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
+	pushq	(%rax)		# target address in negative space
 	lretq
 .Lafter_lret:
 SYM_CODE_END(secondary_startup_64)
-- 
2.25.1.481.gfbce0eb801-goog



* [PATCH v11 09/11] x86/power/64: Adapt assembly for PIE support
  2020-02-28  0:00 ` Thomas Garnier
@ 2020-02-28  0:00 ` Thomas Garnier
  -1 siblings, 0 replies; 39+ messages in thread
From: Thomas Garnier @ 2020-02-28  0:00 UTC (permalink / raw)
  To: kernel-hardening
  Cc: kristen, keescook, Thomas Garnier, Pavel Machek,
	Rafael J . Wysocki, Rafael J. Wysocki, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, x86, linux-pm,
	linux-kernel

Change the assembly code to use only relative references to symbols so that
the kernel can be PIE compatible.

Signed-off-by: Thomas Garnier <thgarnie@chromium.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/power/hibernate_asm_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/power/hibernate_asm_64.S b/arch/x86/power/hibernate_asm_64.S
index 7918b8415f13..977b8ae85045 100644
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -23,7 +23,7 @@
 #include <asm/frame.h>
 
 SYM_FUNC_START(swsusp_arch_suspend)
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	%rsp, pt_regs_sp(%rax)
 	movq	%rbp, pt_regs_bp(%rax)
 	movq	%rsi, pt_regs_si(%rax)
@@ -116,7 +116,7 @@ SYM_FUNC_START(restore_registers)
 	movq	%rax, %cr4;  # turn PGE back on
 
 	/* We don't restore %rax, it must be 0 anyway */
-	movq	$saved_context, %rax
+	leaq	saved_context(%rip), %rax
 	movq	pt_regs_sp(%rax), %rsp
 	movq	pt_regs_bp(%rax), %rbp
 	movq	pt_regs_si(%rax), %rsi
-- 
2.25.1.481.gfbce0eb801-goog



* [PATCH v11 10/11] x86/paravirt: Adapt assembly for PIE support
  2020-02-28  0:00 ` Thomas Garnier
@ 2020-02-28  0:00 ` Thomas Garnier
  -1 siblings, 0 replies; 39+ messages in thread
From: Thomas Garnier @ 2020-02-28  0:00 UTC (permalink / raw)
  To: kernel-hardening
  Cc: kristen, keescook, Thomas Garnier, Juergen Gross,
	Thomas Hellstrom, VMware, Inc.,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, virtualization, linux-kernel

If PIE is enabled, switch the paravirt assembly constraints to be
compatible. The %c/i constraints generate smaller code, so they are kept by
default.

Signed-off-by: Thomas Garnier <thgarnie@chromium.org>
Acked-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/include/asm/paravirt_types.h | 32 +++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 84812964d3dd..82f7ca22e0ae 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -336,9 +336,32 @@ extern struct paravirt_patch_template pv_ops;
 #define PARAVIRT_PATCH(x)					\
 	(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
 
+#ifdef CONFIG_X86_PIE
+#define paravirt_opptr_call "a"
+#define paravirt_opptr_type "p"
+
+/*
+ * Alternative patching requires a maximum of 7 bytes but the relative call is
+ * only 6 bytes. If PIE is enabled, add an additional nop to the call
+ * instruction to ensure patching is possible.
+ *
+ * Without PIE, the call is reg/mem64:
+ * ff 14 25 68 37 02 82    callq  *0xffffffff82023768
+ *
+ * With PIE, it is relative to %rip and takes one byte less:
+ * ff 15 fa d9 ff 00       callq  *0xffd9fa(%rip) # <pv_ops+0x30>
+ *
+ */
+#define PARAVIRT_CALL_POST  "nop;"
+#else
+#define paravirt_opptr_call "c"
+#define paravirt_opptr_type "i"
+#define PARAVIRT_CALL_POST  ""
+#endif
+
 #define paravirt_type(op)				\
 	[paravirt_typenum] "i" (PARAVIRT_PATCH(op)),	\
-	[paravirt_opptr] "i" (&(pv_ops.op))
+	[paravirt_opptr] paravirt_opptr_type (&(pv_ops.op))
 #define paravirt_clobber(clobber)		\
 	[paravirt_clobber] "i" (clobber)
 
@@ -377,9 +400,10 @@ int paravirt_disable_iospace(void);
  * offset into the paravirt_patch_template structure, and can therefore be
  * freely converted back into a structure offset.
  */
-#define PARAVIRT_CALL					\
-	ANNOTATE_RETPOLINE_SAFE				\
-	"call *%c[paravirt_opptr];"
+#define PARAVIRT_CALL						\
+	ANNOTATE_RETPOLINE_SAFE					\
+	"call *%" paravirt_opptr_call "[paravirt_opptr];"	\
+	PARAVIRT_CALL_POST
 
 /*
  * These macros are intended to wrap calls through one of the paravirt
-- 
2.25.1.481.gfbce0eb801-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v11 11/11] x86/alternatives: Adapt assembly for PIE support
  2020-02-28  0:00 ` Thomas Garnier
                   ` (10 preceding siblings ...)
  (?)
@ 2020-02-28  0:00 ` Thomas Garnier
  2020-03-03  4:59   ` Kees Cook
  -1 siblings, 1 reply; 39+ messages in thread
From: Thomas Garnier @ 2020-02-28  0:00 UTC (permalink / raw)
  To: kernel-hardening
  Cc: kristen, keescook, Thomas Garnier, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, x86, Miguel Ojeda,
	Rasmus Villemoes, Peter Zijlstra, linux-kernel

Change the assembly constraints to work with pointers instead of
integers. The generated code is the same; PIE just requires that the
input be a pointer.
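
As a hedged sketch of what the constraint change permits (old_impl and
call_patched are made-up names, and the ALTERNATIVE() machinery is
omitted):

  extern void old_impl(void);

  static void call_patched(void)
  {
      /* "i" demands a link-time integer constant, which a symbol
       * address no longer is when building with -fpie. "X" accepts
       * any operand while %P still prints the bare symbol, so the
       * emitted instruction remains "call old_impl". */
      asm volatile("call %P[f]" : : [f] "X" (old_impl));
  }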

Signed-off-by: Thomas Garnier <thgarnie@chromium.org>
---
 arch/x86/include/asm/alternative.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 13adca37c99a..43a148042656 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -243,7 +243,7 @@ static inline int alternatives_text_reserved(void *start, void *end)
 /* Like alternative_io, but for replacing a direct call with another one. */
 #define alternative_call(oldfunc, newfunc, feature, output, input...)	\
 	asm_inline volatile (ALTERNATIVE("call %P[old]", "call %P[new]", feature) \
-		: output : [old] "i" (oldfunc), [new] "i" (newfunc), ## input)
+		: output : [old] "X" (oldfunc), [new] "X" (newfunc), ## input)
 
 /*
  * Like alternative_call, but there are two features and respective functions.
@@ -256,8 +256,8 @@ static inline int alternatives_text_reserved(void *start, void *end)
 	asm_inline volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\
 		"call %P[new2]", feature2)				      \
 		: output, ASM_CALL_CONSTRAINT				      \
-		: [old] "i" (oldfunc), [new1] "i" (newfunc1),		      \
-		  [new2] "i" (newfunc2), ## input)
+		: [old] "X" (oldfunc), [new1] "X" (newfunc1),		      \
+		  [new2] "X" (newfunc2), ## input)
 
 /*
  * use this macro(s) if you need more than one output parameter
-- 
2.25.1.481.gfbce0eb801-goog


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v11 06/11] x86/CPU: Adapt assembly for PIE support
  2020-02-28  0:00 ` [PATCH v11 06/11] x86/CPU: " Thomas Garnier
@ 2020-03-03  4:58   ` Kees Cook
  0 siblings, 0 replies; 39+ messages in thread
From: Kees Cook @ 2020-03-03  4:58 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: kernel-hardening, kristen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, x86, Andy Lutomirski,
	Peter Zijlstra (Intel),
	Len Brown, linux-kernel

On Thu, Feb 27, 2020 at 04:00:51PM -0800, Thomas Garnier wrote:
> Change the assembly code to use only relative references to symbols so
> that the kernel is PIE compatible.
> 
> Signed-off-by: Thomas Garnier <thgarnie@chromium.org>

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>  arch/x86/include/asm/processor.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 09705ccc393c..fdf6366c482d 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -746,11 +746,13 @@ static inline void sync_core(void)
>  		"pushfq\n\t"
>  		"mov %%cs, %0\n\t"
>  		"pushq %q0\n\t"
> -		"pushq $1f\n\t"
> +		"leaq 1f(%%rip), %q0\n\t"
> +		"pushq %q0\n\t"
>  		"iretq\n\t"
>  		UNWIND_HINT_RESTORE
>  		"1:"
> -		: "=&r" (tmp), ASM_CALL_CONSTRAINT : : "cc", "memory");
> +		: "=&r" (tmp), ASM_CALL_CONSTRAINT
> +		: : "cc", "memory");
>  #endif
>  }
>  
> -- 
> 2.25.1.481.gfbce0eb801-goog
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v11 11/11] x86/alternatives: Adapt assembly for PIE support
  2020-02-28  0:00 ` [PATCH v11 11/11] x86/alternatives: " Thomas Garnier
@ 2020-03-03  4:59   ` Kees Cook
  0 siblings, 0 replies; 39+ messages in thread
From: Kees Cook @ 2020-03-03  4:59 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: kernel-hardening, kristen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, x86, Miguel Ojeda,
	Rasmus Villemoes, Peter Zijlstra, linux-kernel

On Thu, Feb 27, 2020 at 04:00:56PM -0800, Thomas Garnier wrote:
> Change the assembly constraints to work with pointers instead of
> integers. The generated code is the same; PIE just requires that the
> input be a pointer.
> 
> Signed-off-by: Thomas Garnier <thgarnie@chromium.org>

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>  arch/x86/include/asm/alternative.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index 13adca37c99a..43a148042656 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -243,7 +243,7 @@ static inline int alternatives_text_reserved(void *start, void *end)
>  /* Like alternative_io, but for replacing a direct call with another one. */
>  #define alternative_call(oldfunc, newfunc, feature, output, input...)	\
>  	asm_inline volatile (ALTERNATIVE("call %P[old]", "call %P[new]", feature) \
> -		: output : [old] "i" (oldfunc), [new] "i" (newfunc), ## input)
> +		: output : [old] "X" (oldfunc), [new] "X" (newfunc), ## input)
>  
>  /*
>   * Like alternative_call, but there are two features and respective functions.
> @@ -256,8 +256,8 @@ static inline int alternatives_text_reserved(void *start, void *end)
>  	asm_inline volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", feature1,\
>  		"call %P[new2]", feature2)				      \
>  		: output, ASM_CALL_CONSTRAINT				      \
> -		: [old] "i" (oldfunc), [new1] "i" (newfunc1),		      \
> -		  [new2] "i" (newfunc2), ## input)
> +		: [old] "X" (oldfunc), [new1] "X" (newfunc1),		      \
> +		  [new2] "X" (newfunc2), ## input)
>  
>  /*
>   * use this macro(s) if you need more than one output parameter
> -- 
> 2.25.1.481.gfbce0eb801-goog
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v11 00/11] x86: PIE support to extend KASLR randomization
  2020-02-28  0:00 ` Thomas Garnier
@ 2020-03-03  5:02   ` Kees Cook
  -1 siblings, 0 replies; 39+ messages in thread
From: Kees Cook @ 2020-03-03  5:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Thomas Garnier
  Cc: kernel-hardening, kristen, Herbert Xu, David S. Miller,
	H. Peter Anvin, x86, Andy Lutomirski, Juergen Gross,
	Thomas Hellstrom, VMware, Inc.,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Rasmus Villemoes,
	Peter Zijlstra, Miguel Ojeda, Will Deacon, Ard Biesheuvel,
	Masami Hiramatsu, Jiri Slaby, Boris Ostrovsky, Josh Poimboeuf,
	Cao jin, Allison Randal, linux-crypto, linux-kernel,
	virtualization, linux-pm

On Thu, Feb 27, 2020 at 04:00:45PM -0800, Thomas Garnier wrote:
> Minor changes based on feedback and rebase from v10.
> 
> Splitting the previous serie in two. This part contains assembly code
> changes required for PIE but without any direct dependencies with the
> rest of the patchset.
> 
> Note: Using objtool to detect non-compliant PIE relocations is not yet
> possible as this patchset only includes the simplest PIE changes.
> Additional changes are needed in kvm, xen and percpu code.
> 
> Changes:
>  - patch v11 (assembly);
>    - Fix comments on x86/entry/64.
>    - Remove KASLR PIE explanation on all commits.
>    - Add note on objtool not being possible at this stage of the patchset.

This moves us closer to PIE in a clean first step. I think these patches
look good to go, and unblock the work in kvm, xen, and percpu code. Can
one of the x86 maintainers pick this series up?

Thanks!

-Kees

>  - patch v10 (assembly):
>    - Swap rax for rdx on entry/64 changes based on feedback.
>    - Addressed feedback from Borislav Petkov on boot, paravirt, alternatives
>      and globally.
>    - Rebased the patchset and ensure it works with large kaslr (not included).
>  - patch v9 (assembly):
>    - Moved to relative reference for sync_core based on feedback.
>    - x86/crypto had multiple algorithms deleted, removed PIE changes to them.
>    - fix typo on comment end line.
>  - patch v8 (assembly):
>    - Fix issues in crypto changes (thanks to Eric Biggers).
>    - Remove unnecessary jump table change.
>    - Change author and signoff to chromium email address.
>  - patch v7 (assembly):
>    - Split patchset and reorder changes.
>  - patch v6:
>    - Rebase on latest changes in jump tables and crypto.
>    - Fix wording on couple commits.
>    - Revisit checkpatch warnings.
>    - Moving to @chromium.org.
>  - patch v5:
>    - Adapt new crypto modules for PIE.
>    - Improve per-cpu commit message.
>    - Fix xen 32-bit build error with .quad.
>    - Remove extra code for ftrace.
>  - patch v4:
>    - Simplify early boot by removing global variables.
>    - Modify the mcount location script for __mcount_loc intead of the address
>      read in the ftrace implementation.
>    - Edit commit description to explain better where the kernel can be located.
>    - Streamlined the testing done on each patch proposal. Always testing
>      hibernation, suspend, ftrace and kprobe to ensure no regressions.
>  - patch v3:
>    - Update on message to describe longer term PIE goal.
>    - Minor change on ftrace if condition.
>    - Changed code using xchgq.
>  - patch v2:
>    - Adapt patch to work post KPTI and compiler changes
>    - Redo all performance testing with latest configs and compilers
>    - Simplify mov macro on PIE (MOVABS now)
>    - Reduce GOT footprint
>  - patch v1:
>    - Simplify ftrace implementation.
>    - Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
>  - rfc v3:
>    - Use --emit-relocs instead of -pie to reduce dynamic relocation space on
>      mapped memory. It also simplifies the relocation process.
>    - Move the start the module section next to the kernel. Remove the need for
>      -mcmodel=large on modules. Extends module space from 1 to 2G maximum.
>    - Support for XEN PVH as 32-bit relocations can be ignored with
>      --emit-relocs.
>    - Support for GOT relocations previously done automatically with -pie.
>    - Remove need for dynamic PLT in modules.
>    - Support dymamic GOT for modules.
>  - rfc v2:
>    - Add support for global stack cookie while compiler default to fs without
>      mcmodel=kernel
>    - Change patch 7 to correctly jump out of the identity mapping on kexec load
>      preserve.
> 
> These patches make some of the changes necessary to build the kernel as
> Position Independent Executable (PIE) on x86_64. Another patchset will
> add the PIE option and larger architecture changes. PIE allows the kernel to be
> placed below the 0xffffffff80000000 increasing the range of KASLR.
> 
> The patches:
>  - 1, 3-11: Change in assembly code to be PIE compliant.
>  - 2: Add a new _ASM_MOVABS macro to fetch a symbol address generically.
> 
> diffstat:
>  crypto/aegis128-aesni-asm.S         |    6 +-
>  crypto/aesni-intel_asm.S            |    8 +--
>  crypto/aesni-intel_avx-x86_64.S     |    3 -
>  crypto/camellia-aesni-avx-asm_64.S  |   42 +++++++--------
>  crypto/camellia-aesni-avx2-asm_64.S |   44 ++++++++--------
>  crypto/camellia-x86_64-asm_64.S     |    8 +--
>  crypto/cast5-avx-x86_64-asm_64.S    |   50 ++++++++++--------
>  crypto/cast6-avx-x86_64-asm_64.S    |   44 +++++++++-------
>  crypto/des3_ede-asm_64.S            |   96 ++++++++++++++++++++++++------------
>  crypto/ghash-clmulni-intel_asm.S    |    4 -
>  crypto/glue_helper-asm-avx.S        |    4 -
>  crypto/glue_helper-asm-avx2.S       |    6 +-
>  crypto/sha256-avx2-asm.S            |   18 ++++--
>  entry/entry_64.S                    |   16 ++++--
>  include/asm/alternative.h           |    6 +-
>  include/asm/asm.h                   |    1 
>  include/asm/bug.h                   |    2 
>  include/asm/paravirt_types.h        |   32 ++++++++++--
>  include/asm/pm-trace.h              |    2 
>  include/asm/processor.h             |    6 +-
>  kernel/acpi/wakeup_64.S             |   31 ++++++-----
>  kernel/head_64.S                    |   15 +++--
>  kernel/relocate_kernel_64.S         |    2 
>  power/hibernate_asm_64.S            |    4 -
>  24 files changed, 268 insertions(+), 182 deletions(-)
> 
> Patchset is based on next-20200227.
> 
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v11 00/11] x86: PIE support to extend KASLR randomization
  2020-03-03  5:02   ` Kees Cook
@ 2020-03-03  9:55     ` Peter Zijlstra
  -1 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2020-03-03  9:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Thomas Garnier,
	kernel-hardening, kristen, Herbert Xu, David S. Miller,
	H. Peter Anvin, x86, Andy Lutomirski, Juergen Gross,
	Thomas Hellstrom, VMware, Inc.,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Rasmus Villemoes,
	Miguel Ojeda, Will Deacon, Ard Biesheuvel, Masami Hiramatsu,
	Jiri Slaby, Boris Ostrovsky, Josh Poimboeuf, Cao jin,
	Allison Randal, linux-crypto, linux-kernel, virtualization,
	linux-pm

On Mon, Mar 02, 2020 at 09:02:15PM -0800, Kees Cook wrote:
> On Thu, Feb 27, 2020 at 04:00:45PM -0800, Thomas Garnier wrote:
> > Minor changes based on feedback and rebase from v10.
> > 
> > Splitting the previous serie in two. This part contains assembly code
> > changes required for PIE but without any direct dependencies with the
> > rest of the patchset.
> > 
> > Note: Using objtool to detect non-compliant PIE relocations is not yet
> > possible as this patchset only includes the simplest PIE changes.
> > Additional changes are needed in kvm, xen and percpu code.
> > 
> > Changes:
> >  - patch v11 (assembly);
> >    - Fix comments on x86/entry/64.
> >    - Remove KASLR PIE explanation on all commits.
> >    - Add note on objtool not being possible at this stage of the patchset.
> 
> This moves us closer to PIE in a clean first step. I think these patches
> look good to go, and unblock the work in kvm, xen, and percpu code. Can
> one of the x86 maintainers pick this series up?

But,... do we still need this in the light of that fine-grained kaslr
stuff?

What is the actual value of this PIE crud in the face of that?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v11 00/11] x86: PIE support to extend KASLR randomization
  2020-03-03  9:55     ` Peter Zijlstra
  (?)
@ 2020-03-03 15:43       ` Thomas Garnier
  -1 siblings, 0 replies; 39+ messages in thread
From: Thomas Garnier @ 2020-03-03 15:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Kernel Hardening, Kristen Carlson Accardi, Herbert Xu,
	David S. Miller, H. Peter Anvin, the arch/x86 maintainers,
	Andy Lutomirski, Juergen Gross, Thomas Hellstrom, VMware, Inc.,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Rasmus Villemoes,
	Miguel Ojeda, Will Deacon, Ard Biesheuvel, Masami Hiramatsu,
	Jiri Slaby, Boris Ostrovsky, Josh Poimboeuf, Cao jin,
	Allison Randal, Linux Crypto Mailing List, LKML, virtualization,
	Linux PM list

On Tue, Mar 3, 2020 at 1:55 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Mar 02, 2020 at 09:02:15PM -0800, Kees Cook wrote:
> > On Thu, Feb 27, 2020 at 04:00:45PM -0800, Thomas Garnier wrote:
> > > Minor changes based on feedback and rebase from v10.
> > >
> > > Splitting the previous serie in two. This part contains assembly code
> > > changes required for PIE but without any direct dependencies with the
> > > rest of the patchset.
> > >
> > > Note: Using objtool to detect non-compliant PIE relocations is not yet
> > > possible as this patchset only includes the simplest PIE changes.
> > > Additional changes are needed in kvm, xen and percpu code.
> > >
> > > Changes:
> > >  - patch v11 (assembly);
> > >    - Fix comments on x86/entry/64.
> > >    - Remove KASLR PIE explanation on all commits.
> > >    - Add note on objtool not being possible at this stage of the patchset.
> >
> > This moves us closer to PIE in a clean first step. I think these patches
> > look good to go, and unblock the work in kvm, xen, and percpu code. Can
> > one of the x86 maintainers pick this series up?
>
> But,... do we still need this in the light of that fine-grained kaslr
> stuff?
>
> What is the actual value of this PIE crud in the face of that?

If I remember well, it makes it easier/better but I haven't seen a
recent update on that. Is that accurate Kees?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v11 00/11] x86: PIE support to extend KASLR randomization
  2020-03-03 15:43       ` Thomas Garnier
  (?)
@ 2020-03-03 21:01         ` Kristen Carlson Accardi
  -1 siblings, 0 replies; 39+ messages in thread
From: Kristen Carlson Accardi @ 2020-03-03 21:01 UTC (permalink / raw)
  To: Thomas Garnier, Peter Zijlstra
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Kernel Hardening, Herbert Xu, David S. Miller, H. Peter Anvin,
	the arch/x86 maintainers, Andy Lutomirski, Juergen Gross,
	Thomas Hellstrom, VMware, Inc.,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Rasmus Villemoes,
	Miguel Ojeda, Will Deacon, Ard Biesheuvel, Masami Hiramatsu,
	Jiri Slaby, Boris Ostrovsky, Josh Poimboeuf, Cao jin,
	Allison Randal, Linux Crypto Mailing List, LKML, virtualization,
	Linux PM list

On Tue, 2020-03-03 at 07:43 -0800, Thomas Garnier wrote:
> On Tue, Mar 3, 2020 at 1:55 AM Peter Zijlstra <peterz@infradead.org>
> wrote:
> > On Mon, Mar 02, 2020 at 09:02:15PM -0800, Kees Cook wrote:
> > > On Thu, Feb 27, 2020 at 04:00:45PM -0800, Thomas Garnier wrote:
> > > > Minor changes based on feedback and rebase from v10.
> > > > 
> > > > Splitting the previous serie in two. This part contains
> > > > assembly code
> > > > changes required for PIE but without any direct dependencies
> > > > with the
> > > > rest of the patchset.
> > > > 
> > > > Note: Using objtool to detect non-compliant PIE relocations is
> > > > not yet
> > > > possible as this patchset only includes the simplest PIE
> > > > changes.
> > > > Additional changes are needed in kvm, xen and percpu code.
> > > > 
> > > > Changes:
> > > >  - patch v11 (assembly);
> > > >    - Fix comments on x86/entry/64.
> > > >    - Remove KASLR PIE explanation on all commits.
> > > >    - Add note on objtool not being possible at this stage of
> > > > the patchset.
> > > 
> > > This moves us closer to PIE in a clean first step. I think these
> > > patches
> > > look good to go, and unblock the work in kvm, xen, and percpu
> > > code. Can
> > > one of the x86 maintainers pick this series up?
> > 
> > But,... do we still need this in the light of that fine-grained
> > kaslr
> > stuff?
> > 
> > What is the actual value of this PIE crud in the face of that?
> 
> If I remember well, it makes it easier/better but I haven't seen a
> recent update on that. Is that accurate Kees?

I believe this patchset is valuable if people are trying to brute force
guess the kernel location, but not so awesome in the event of
infoleaks. In the case of the current fgkaslr implementation, we only
randomize within the existing text segment memory area - so with PIE
the text segment base can move around more, but within that it wouldn't
strengthen anything. So, if you have an infoleak, you learn the base
instantly, and are just left with the same extra protection you get
without PIE.




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v11 00/11] x86: PIE support to extend KASLR randomization
  2020-03-03 21:01         ` Kristen Carlson Accardi
@ 2020-03-03 21:19           ` Kees Cook
  -1 siblings, 0 replies; 39+ messages in thread
From: Kees Cook @ 2020-03-03 21:19 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: Thomas Garnier, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Kernel Hardening, Herbert Xu, David S. Miller,
	H. Peter Anvin, the arch/x86 maintainers, Andy Lutomirski,
	Juergen Gross, Thomas Hellstrom, VMware, Inc.,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Rasmus Villemoes,
	Miguel Ojeda, Will Deacon, Ard Biesheuvel, Masami Hiramatsu,
	Jiri Slaby, Boris Ostrovsky, Josh Poimboeuf, Cao jin,
	Allison Randal, Linux Crypto Mailing List, LKML, virtualization,
	Linux PM list

On Tue, Mar 03, 2020 at 01:01:26PM -0800, Kristen Carlson Accardi wrote:
> On Tue, 2020-03-03 at 07:43 -0800, Thomas Garnier wrote:
> > On Tue, Mar 3, 2020 at 1:55 AM Peter Zijlstra <peterz@infradead.org>
> > wrote:
> > > On Mon, Mar 02, 2020 at 09:02:15PM -0800, Kees Cook wrote:
> > > > On Thu, Feb 27, 2020 at 04:00:45PM -0800, Thomas Garnier wrote:
> > > > > Minor changes based on feedback and rebase from v10.
> > > > > 
> > > > > Splitting the previous serie in two. This part contains
> > > > > assembly code
> > > > > changes required for PIE but without any direct dependencies
> > > > > with the
> > > > > rest of the patchset.
> > > > > 
> > > > > Note: Using objtool to detect non-compliant PIE relocations is
> > > > > not yet
> > > > > possible as this patchset only includes the simplest PIE
> > > > > changes.
> > > > > Additional changes are needed in kvm, xen and percpu code.
> > > > > 
> > > > > Changes:
> > > > >  - patch v11 (assembly);
> > > > >    - Fix comments on x86/entry/64.
> > > > >    - Remove KASLR PIE explanation on all commits.
> > > > >    - Add note on objtool not being possible at this stage of
> > > > > the patchset.
> > > > 
> > > > This moves us closer to PIE in a clean first step. I think these
> > > > patches
> > > > look good to go, and unblock the work in kvm, xen, and percpu
> > > > code. Can
> > > > one of the x86 maintainers pick this series up?
> > > 
> > > But,... do we still need this in the light of that fine-grained
> > > kaslr
> > > stuff?
> > > 
> > > What is the actual value of this PIE crud in the face of that?
> > 
> > If I remember well, it makes it easier/better but I haven't seen a
> > recent update on that. Is that accurate Kees?
> 
> I believe this patchset is valuable if people are trying to brute force
> guess the kernel location, but not so awesome in the event of
> infoleaks. In the case of the current fgkaslr implementation, we only
> randomize within the existing text segment memory area - so with PIE
> the text segment base can move around more, but within that it wouldn't
> strengthen anything. So, if you have an infoleak, you learn the base
> instantly, and are just left with the same extra protection you get
> without PIE.

Right -- PIE improves both non- and fg- KASLR similarly, in the sense
that the possible entropy for base offset is expanded. It also opens the
door to doing even more crazy things. (e.g. why keep the kernel text all
in one contiguous chunk?)

And generally speaking, it seems a nice improvement to me, as it gives
the kernel greater addressing flexibility.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v11 00/11] x86: PIE support to extend KASLR randomization
  2020-03-03 21:19           ` Kees Cook
@ 2020-03-04  9:21             ` Peter Zijlstra
  -1 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2020-03-04  9:21 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kristen Carlson Accardi, Thomas Garnier, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Kernel Hardening, Herbert Xu,
	David S. Miller, H. Peter Anvin, the arch/x86 maintainers,
	Andy Lutomirski, Juergen Gross, Thomas Hellstrom, VMware, Inc.,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Rasmus Villemoes,
	Miguel Ojeda, Will Deacon, Ard Biesheuvel, Masami Hiramatsu,
	Jiri Slaby, Boris Ostrovsky, Josh Poimboeuf, Cao jin,
	Allison Randal, Linux Crypto Mailing List, LKML, virtualization,
	Linux PM list

On Tue, Mar 03, 2020 at 01:19:22PM -0800, Kees Cook wrote:
> On Tue, Mar 03, 2020 at 01:01:26PM -0800, Kristen Carlson Accardi wrote:
> > On Tue, 2020-03-03 at 07:43 -0800, Thomas Garnier wrote:
> > > On Tue, Mar 3, 2020 at 1:55 AM Peter Zijlstra <peterz@infradead.org>

> > > > But,... do we still need this in the light of that fine-grained
> > > > kaslr
> > > > stuff?
> > > > 
> > > > What is the actual value of this PIE crud in the face of that?
> > > 
> > > If I remember well, it makes it easier/better but I haven't seen a
> > > recent update on that. Is that accurate Kees?
> > 
> > I believe this patchset is valuable if people are trying to brute force
> > guess the kernel location, but not so awesome in the event of
> > infoleaks. In the case of the current fgkaslr implementation, we only
> > randomize within the existing text segment memory area - so with PIE
> > the text segment base can move around more, but within that it wouldn't
> > strengthen anything. So, if you have an infoleak, you learn the base
> > instantly, and are just left with the same extra protection you get
> > without PIE.
> 
> Right -- PIE improves both non- and fg- KASLR similarly, in the sense
> that the possible entropy for base offset is expanded. It also opens the
> door to doing even more crazy things. 

So I'm really confused. I see it increases the aslr range, but I'm still
not sure why we care in the face of fgkaslr. Current kaslr is completely
broken because the hardware leaks more bits than we currently have, even
without the kernel itself leaking an address.

But leaking a single address is not a problem with fgkaslr.

> (e.g. why keep the kernel text all
> in one contiguous chunk?)

Dear gawd, please no. Also, we're limited to 2G text, that's just not a
lot of room. I'm really going to object when people propose we introduce
direct PLT for x86.

> And generally speaking, it seems a nice improvement to me, as it gives
> the kernel greater addressing flexibility.

But at what cost; it does unspeakable ugly to the asm. And didn't a
kernel compiled with the extended PIE range produce a measurably slower
kernel due to all the ugly?

So maybe I'm slow, but please spell out the benefit, because I'm not
seeing it.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v11 00/11] x86: PIE support to extend KASLR randomization
  2020-03-03 21:19           ` Kees Cook
  (?)
  (?)
@ 2020-03-04  9:40           ` H. Peter Anvin
  -1 siblings, 0 replies; 39+ messages in thread
From: H. Peter Anvin @ 2020-03-04  9:40 UTC (permalink / raw)
  To: Kees Cook, Kristen Carlson Accardi
  Cc: Thomas Garnier, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Kernel Hardening, Herbert Xu, David S. Miller,
	the arch/x86 maintainers, Andy Lutomirski, Juergen Gross,
	Thomas Hellstrom, VMware, Inc.,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Rasmus Villemoes,
	Miguel Ojeda, Will Deacon, Ard Biesheuvel, Masami Hiramatsu,
	Jiri Slaby, Boris Ostrovsky, Josh Poimboeuf, Cao jin,
	Allison Randal, Linux Crypto Mailing List, LKML, virtualization,
	Linux PM list

On March 3, 2020 1:19:22 PM PST, Kees Cook <keescook@chromium.org> wrote:
>On Tue, Mar 03, 2020 at 01:01:26PM -0800, Kristen Carlson Accardi
>wrote:
>> On Tue, 2020-03-03 at 07:43 -0800, Thomas Garnier wrote:
>> > On Tue, Mar 3, 2020 at 1:55 AM Peter Zijlstra
><peterz@infradead.org>
>> > wrote:
>> > > On Mon, Mar 02, 2020 at 09:02:15PM -0800, Kees Cook wrote:
>> > > > On Thu, Feb 27, 2020 at 04:00:45PM -0800, Thomas Garnier wrote:
>> > > > > Minor changes based on feedback and rebase from v10.
>> > > > > 
>> > > > > Splitting the previous serie in two. This part contains
>> > > > > assembly code
>> > > > > changes required for PIE but without any direct dependencies
>> > > > > with the
>> > > > > rest of the patchset.
>> > > > > 
>> > > > > Note: Using objtool to detect non-compliant PIE relocations
>is
>> > > > > not yet
>> > > > > possible as this patchset only includes the simplest PIE
>> > > > > changes.
>> > > > > Additional changes are needed in kvm, xen and percpu code.
>> > > > > 
>> > > > > Changes:
>> > > > >  - patch v11 (assembly);
>> > > > >    - Fix comments on x86/entry/64.
>> > > > >    - Remove KASLR PIE explanation on all commits.
>> > > > >    - Add note on objtool not being possible at this stage of
>> > > > > the patchset.
>> > > > 
>> > > > This moves us closer to PIE in a clean first step. I think
>these
>> > > > patches
>> > > > look good to go, and unblock the work in kvm, xen, and percpu
>> > > > code. Can
>> > > > one of the x86 maintainers pick this series up?
>> > > 
>> > > But,... do we still need this in the light of that fine-grained
>> > > kaslr
>> > > stuff?
>> > > 
>> > > What is the actual value of this PIE crud in the face of that?
>> > 
>> > If I remember well, it makes it easier/better but I haven't seen a
>> > recent update on that. Is that accurate Kees?
>> 
>> I believe this patchset is valuable if people are trying to brute
>force
>> guess the kernel location, but not so awesome in the event of
>> infoleaks. In the case of the current fgkaslr implementation, we only
>> randomize within the existing text segment memory area - so with PIE
>> the text segment base can move around more, but within that it
>wouldn't
>> strengthen anything. So, if you have an infoleak, you learn the base
>> instantly, and are just left with the same extra protection you get
>> without PIE.
>
>Right -- PIE improves both non- and fg- KASLR similarly, in the sense
>that the possible entropy for base offset is expanded. It also opens
>the
>door to doing even more crazy things. (e.g. why keep the kernel text
>all
>in one contiguous chunk?)
>
>And generally speaking, it seems a nice improvement to me, as it gives
>the kernel greater addressing flexibility.

The difference in entropy between fgkaslr and extending the kernel to the PIC memory model (which is what this series is really doing) is immense:

The current kASLR has maybe 9 bits of entropy. The PIC model could extend that by at most 16 bits, at considerable cost in performance and complexity. Fgkaslr would provide many kilobits worth of entropy; the limiting factor would be the random number source used! With a valid RNG, the probability that any two boots, across all the computers in the world and across all time, would ever be the same is infinitesimal; never mind the infoleak issue.
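
(A rough back-of-the-envelope check of the "many kilobits" claim; the
section count below is a made-up illustration, not a measurement. The
entropy of a uniformly random permutation of n sections is log2(n!):

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      double n = 40000.0; /* hypothetical count of randomized sections */

      /* log2(n!) = lgamma(n + 1) / ln(2) */
      printf("log2(%.0f!) ~= %.0f bits\n", n, lgamma(n + 1.0) / log(2.0));
      return 0;
  }

For n = 40000 this prints roughly 550000 bits, comfortably "many
kilobits".)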

In addition to the combinatorics, fgkaslr pushes randomization right as well as left, so even for the address of any one individual function you get a gain of 15-17 bits.

"More is better" is a truism, but so is Amdahl's Law.


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v11 00/11] x86: PIE support to extend KASLR randomization
  2020-03-04  9:21             ` Peter Zijlstra
@ 2020-03-04 18:21               ` Kees Cook
  -1 siblings, 0 replies; 39+ messages in thread
From: Kees Cook @ 2020-03-04 18:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kristen Carlson Accardi, Thomas Garnier, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Kernel Hardening, Herbert Xu,
	David S. Miller, H. Peter Anvin, the arch/x86 maintainers,
	Andy Lutomirski, Juergen Gross, Thomas Hellstrom, VMware, Inc.,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Rasmus Villemoes,
	Miguel Ojeda, Will Deacon, Ard Biesheuvel, Masami Hiramatsu,
	Jiri Slaby, Boris Ostrovsky, Josh Poimboeuf, Cao jin,
	Allison Randal, Linux Crypto Mailing List, LKML, virtualization,
	Linux PM list

On Wed, Mar 04, 2020 at 10:21:36AM +0100, Peter Zijlstra wrote:
> But at what cost; it does unspeakable ugly to the asm. And didn't a
> kernel compiled with the extended PIE range produce a measurably slower
> kernel due to all the ugly?

Was that true? I thought the final results were a wash and that earlier
benchmarks weren't accurate for some reason? I can't find the thread
now. Thomas, do you have numbers on that?

BTW, I totally agree that fgkaslr is the way to go in the future. I
am mostly arguing for this under the assumption that it doesn't
have meaningful performance impact and that it gains the kernel some
flexibility in the kinds of things it can do in the future. If the former
is not true, then I'd agree, the benefit needs to be more clear.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v11 00/11] x86: PIE support to extend KASLR randomization
  2020-03-04 18:21               ` Kees Cook
@ 2020-03-04 18:44                 ` H. Peter Anvin
  -1 siblings, 0 replies; 39+ messages in thread
From: H. Peter Anvin @ 2020-03-04 18:44 UTC (permalink / raw)
  To: Kees Cook, Peter Zijlstra
  Cc: Kristen Carlson Accardi, Thomas Garnier, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Kernel Hardening, Herbert Xu,
	David S. Miller, the arch/x86 maintainers, Andy Lutomirski,
	Juergen Gross, Thomas Hellstrom, VMware, Inc.,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Rasmus Villemoes,
	Miguel Ojeda, Will Deacon, Ard Biesheuvel, Masami Hiramatsu,
	Jiri Slaby, Boris Ostrovsky, Josh Poimboeuf, Cao jin,
	Allison Randal, Linux Crypto Mailing List, LKML, virtualization,
	Linux PM list

On 2020-03-04 10:21, Kees Cook wrote:
> On Wed, Mar 04, 2020 at 10:21:36AM +0100, Peter Zijlstra wrote:
>> But at what cost; it does unspeakable ugly to the asm. And didn't a
>> kernel compiled with the extended PIE range produce a measurably slower
>> kernel due to all the ugly?
> 
> Was that true? I thought the final results were a wash and that earlier
> benchmarks weren't accurate for some reason? I can't find the thread
> now. Thomas, do you have numbers on that?
> 
> BTW, I totally agree that fgkaslr is the way to go in the future. I
> am mostly arguing for this under the assumption that it doesn't
> have meaningful performance impact and that it gains the kernel some
> flexibility in the kinds of things it can do in the future. If the former
> is not true, then I'd agree, the benefit needs to be more clear.
> 

"Making the assembly really ugly" by itself is a reason not to do it, in my
Not So Humble Opinion[TM]; but the reason the kernel and small memory models
exist in the first place is because there is a nonzero performance impact of
the small-PIC memory model. Having modules in separate regions would further
add the cost of a GOT references all over the place (PLT is optional, useless
and deprecated for eager binding) *plus* might introduce at least one new
vector of attack: overwrite a random GOT slot, and just wait until it gets hit
by whatever code path it happens to be in; the exact code path doesn't matter.
From an kASLR perspective this is *very* bad, since you only need to guess the
general region of a GOT rather than an exact address.
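
To make that concrete, compare the two call forms (illustrative
instruction sequences with a placeholder symbol, not code from this
series):

	# fixed-displacement call, as with -mcmodel=kernel:
	call	func			# e8 <rel32>; nothing writable involved

	# PIC call through a GOT slot:
	call	*func@GOTPCREL(%rip)	# ff 15 <rel32>; fetches the target
					# pointer from the GOT at run time

Corrupt that one slot and every caller through it is redirected, which
is why the slot, not any exact code address, becomes the target.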

The huge memory model, required for arbitrary placement, has a very
significant performance impact.
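
For instance (again illustrative, with a placeholder symbol), a single
global load under the different models:

	# small/kernel model: one RIP-relative instruction
	movq	var(%rip), %rax		# 7 bytes

	# large model: a 64-bit immediate plus a scratch register
	# for every symbol reference
	movabsq	$var, %r11		# 10 bytes
	movq	(%r11), %rax		# 3 bytes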

The assembly code is *very* different across memory models.

	-hpa

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v11 00/11] x86: PIE support to extend KASLR randomization
  2020-03-04 18:44                 ` H. Peter Anvin
@ 2020-03-04 19:19                   ` Thomas Garnier
  -1 siblings, 0 replies; 39+ messages in thread
From: Thomas Garnier @ 2020-03-04 19:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Kees Cook, Peter Zijlstra, Kristen Carlson Accardi,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Kernel Hardening,
	Herbert Xu, David S. Miller, the arch/x86 maintainers,
	Andy Lutomirski, Juergen Gross, Thomas Hellstrom, VMware, Inc.,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Rasmus Villemoes,
	Miguel Ojeda, Will Deacon, Ard Biesheuvel, Masami Hiramatsu,
	Jiri Slaby, Boris Ostrovsky, Josh Poimboeuf, Cao jin,
	Allison Randal, Linux Crypto Mailing List, LKML, virtualization,
	Linux PM list

On Wed, Mar 4, 2020 at 10:45 AM H. Peter Anvin <hpa@zytor.com> wrote:
>
> On 2020-03-04 10:21, Kees Cook wrote:
> > On Wed, Mar 04, 2020 at 10:21:36AM +0100, Peter Zijlstra wrote:
> >> But at what cost; it does unspeakable ugly to the asm. And didn't a
> >> kernel compiled with the extended PIE range produce a measurably slower
> >> kernel due to all the ugly?
> >
> > Was that true? I thought the final results were a wash and that earlier
> > benchmarks weren't accurate for some reason? I can't find the thread
> > now. Thomas, do you have numbers on that?

I have never seen a significant performance impact. Performance and
size are better on more recent versions of gcc, which generate better
PIE code (for example, for switch statements).

> >
> > BTW, I totally agree that fgkaslr is the way to go in the future. I
> > am mostly arguing for this under the assumption that it doesn't
> > have meaningful performance impact and that it gains the kernel some
> > flexibility in the kinds of things it can do in the future. If the former
> > is not true, then I'd agree, the benefit needs to be more clear.
> >
>
> "Making the assembly really ugly" by itself is a reason not to do it, in my
> Not So Humble Opinion[TM]; but the reason the kernel and small memory models
> exist in the first place is because there is a nonzero performance impact of
> the small-PIC memory model. Having modules in separate regions would further
> add the cost of a GOT references all over the place (PLT is optional, useless
> and deprecated for eager binding) *plus* might introduce at least one new
> vector of attack: overwrite a random GOT slot, and just wait until it gets hit
> by whatever code path it happens to be in; the exact code path doesn't matter.
> From an kASLR perspective this is *very* bad, since you only need to guess the
> general region of a GOT rather than an exact address.

I agree that it would add GOT references, and I can explore the
performance and size impact of that further. This patchset also makes
the GOT read-only, so I don't think the attack vector applies.

>
> The huge memory model, required for arbitrary placement, has a very
> significant performance impact.

I assume you mean mcmodel=large; this patchset doesn't use it. It uses
-fPIE, removes -mcmodel=kernel, and favors relative references
whenever possible.
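
The typical flavor of change in the assembly is along these lines (a
sketch with a placeholder symbol, not an actual hunk from the series):

-	movq	$sym, %rax		# 32-bit sign-extended absolute;
-					# only valid under -mcmodel=kernel
+	leaq	sym(%rip), %rax		# RIP-relative, position-independent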

>
> The assembly code is *very* different across memory models.
>
>         -hpa

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v11 00/11] x86: PIE support to extend KASLR randomization
  2020-03-04 19:19                   ` Thomas Garnier
@ 2020-03-04 19:22                     ` H. Peter Anvin
  -1 siblings, 0 replies; 39+ messages in thread
From: H. Peter Anvin @ 2020-03-04 19:22 UTC (permalink / raw)
  To: Thomas Garnier
  Cc: Kees Cook, Peter Zijlstra, Kristen Carlson Accardi,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Kernel Hardening,
	Herbert Xu, David S. Miller, the arch/x86 maintainers,
	Andy Lutomirski, Juergen Gross, Thomas Hellstrom, VMware, Inc.,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Rasmus Villemoes,
	Miguel Ojeda, Will Deacon, Ard Biesheuvel, Masami Hiramatsu,
	Jiri Slaby, Boris Ostrovsky, Josh Poimboeuf, Cao jin,
	Allison Randal, Linux Crypto Mailing List, LKML, virtualization,
	Linux PM list

On 2020-03-04 11:19, Thomas Garnier wrote:
>>
>> The huge memory model, required for arbitrary placement, has a very
>> significant performance impact.
> 
> I assume you mean mcmodel=large; this patchset doesn't use it. It uses
> -fPIE, removes -mcmodel=kernel, and favors relative references
> whenever possible.
> 

I know... this was in reference to a comment of Kees'.

	-hpa


^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2020-03-04 19:25 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-28  0:00 [PATCH v11 00/11] x86: PIE support to extend KASLR randomization Thomas Garnier
2020-02-28  0:00 ` [PATCH v11 01/11] x86/crypto: Adapt assembly for PIE support Thomas Garnier
2020-02-28  0:00 ` [PATCH v11 02/11] x86: Add macro to get symbol address " Thomas Garnier
2020-02-28  0:00 ` [PATCH v11 03/11] x86: relocate_kernel - Adapt assembly " Thomas Garnier
2020-02-28  0:00 ` [PATCH v11 04/11] x86/entry/64: " Thomas Garnier
2020-02-28  0:00 ` [PATCH v11 05/11] x86: pm-trace - " Thomas Garnier
2020-02-28  0:00 ` [PATCH v11 06/11] x86/CPU: " Thomas Garnier
2020-03-03  4:58   ` Kees Cook
2020-02-28  0:00 ` [PATCH v11 07/11] x86/acpi: " Thomas Garnier
2020-02-28  0:00 ` [PATCH v11 08/11] x86/boot/64: " Thomas Garnier
2020-02-28  0:00 ` [PATCH v11 09/11] x86/power/64: " Thomas Garnier
2020-02-28  0:00 ` [PATCH v11 10/11] x86/paravirt: " Thomas Garnier
2020-02-28  0:00 ` [PATCH v11 11/11] x86/alternatives: " Thomas Garnier
2020-03-03  4:59   ` Kees Cook
2020-03-03  5:02 ` [PATCH v11 00/11] x86: PIE support to extend KASLR randomization Kees Cook
2020-03-03  9:55   ` Peter Zijlstra
2020-03-03 15:43     ` Thomas Garnier
2020-03-03 21:01       ` Kristen Carlson Accardi
2020-03-03 21:19         ` Kees Cook
2020-03-04  9:21           ` Peter Zijlstra
2020-03-04 18:21             ` Kees Cook
2020-03-04 18:44               ` H. Peter Anvin
2020-03-04 19:19                 ` Thomas Garnier
2020-03-04 19:22                   ` H. Peter Anvin
2020-03-04  9:40           ` H. Peter Anvin
