linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/11 v3] x86: load FPU registers on return to userland
@ 2018-10-04 14:05 Sebastian Andrzej Siewior
  2018-10-04 14:05 ` [PATCH 01/11] x86/entry: remove _TIF_ALLWORK_MASK Sebastian Andrzej Siewior
                   ` (11 more replies)
  0 siblings, 12 replies; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-04 14:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

This is a refurbished series originally started by by Rik van Riel. The
goal is load the FPU registers on return to userland and not on every
context switch. By this optimisation we can:
- avoid loading the registers if the task stays in kernel and does
  not return to userland
- make kernel_fpu_begin() cheaper: it only saves the registers on the
  first invocation. The second invocation does not need save them again.

To access the FPU registers in kernel we need:
- disable preemption to avoid that the scheduler switches tasks. By
  doing so it would set TIF_LOAD_FPU and the FPU registers would be not
  valid.
- disable BH because the softirq might use kernel_fpu_begin() and then
  set TIF_LOAD_FPU instead loading the FPU registers on completion.

v1…v3:
v2 was never posted. I followed the idea to completely decouple PKRU
from xstate. This didn't quite work and made a few things complicated. 
One obvious required fixup is copy_fpstate_to_sigframe() where the PKRU
state needs to be fiddled into xstate. This required another
xfeatures_mask so that the sanity checks were performed and
xstate_offsets would be computed. Additionally ptrace also reads/sets
xstate in order to get/set the register and PKRU is one of them. So this
would need some fiddle, too.
In v3 I dropped that decouple idea. I also learned that the wrpkru
instruction is not privileged and so caching it in kernel does not work.
Instead I keep PKRU in xstate area and load it at context switch time
while the remaining registers are deferred (until return to userland).
The offset of PKRU within xstate is enumerated at boot time so why not
use it.

This seems to work with my in-kernel test case and a userland test case
which use xmm registers. The pkey feature was tested in non kvm
accelerated qemu and it seems to work, too.

Sebastian


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 01/11] x86/entry: remove _TIF_ALLWORK_MASK
  2018-10-04 14:05 [PATCH 00/11 v3] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
@ 2018-10-04 14:05 ` Sebastian Andrzej Siewior
  2018-10-11 16:27   ` Borislav Petkov
  2018-10-04 14:05 ` [PATCH 02/11] x86/fpu: add (__)make_fpregs_active helpers Sebastian Andrzej Siewior
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-04 14:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

There is no user of _TIF_ALLWORK_MASK since commit 21d375b6b34ff
("x86/entry/64: Remove the SYSCALL64 fast path").
Remove unused define _TIF_ALLWORK_MASK.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/thread_info.h | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 2ff2a30a264f4..cd6920674b905 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -136,14 +136,6 @@ struct thread_info {
 	 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT |	\
 	 _TIF_NOHZ)
 
-/* work to do on any return to user space */
-#define _TIF_ALLWORK_MASK						\
-	(_TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME | _TIF_SIGPENDING |	\
-	 _TIF_NEED_RESCHED | _TIF_SINGLESTEP | _TIF_SYSCALL_EMU |	\
-	 _TIF_SYSCALL_AUDIT | _TIF_USER_RETURN_NOTIFY | _TIF_UPROBE |	\
-	 _TIF_PATCH_PENDING | _TIF_NOHZ | _TIF_SYSCALL_TRACEPOINT |	\
-	 _TIF_FSCHECK)
-
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW							\
 	(_TIF_IO_BITMAP|_TIF_NOCPUID|_TIF_NOTSC|_TIF_BLOCKSTEP|_TIF_SSBD)
-- 
2.19.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 02/11] x86/fpu: add (__)make_fpregs_active helpers
  2018-10-04 14:05 [PATCH 00/11 v3] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
  2018-10-04 14:05 ` [PATCH 01/11] x86/entry: remove _TIF_ALLWORK_MASK Sebastian Andrzej Siewior
@ 2018-10-04 14:05 ` Sebastian Andrzej Siewior
  2018-10-04 14:05 ` [PATCH 03/11] x86/fpu: make __raw_xsave_addr() use feature number instead of mask Sebastian Andrzej Siewior
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-04 14:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

Add helper function that ensures the floating point registers for
the current task are active. Use with preemption disabled.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index a38bf5a1e37ad..16c4077ffc945 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -514,6 +514,26 @@ static inline void fpregs_activate(struct fpu *fpu)
 	trace_x86_fpu_regs_activated(fpu);
 }
 
+/*
+ * Load the FPU state for the current task. Call with preemption disabled.
+ */
+static inline void __fpregs_load_activate(struct fpu *fpu, int cpu)
+{
+	if (!fpregs_state_valid(fpu, cpu))
+		copy_kernel_to_fpregs(&fpu->state);
+	fpregs_activate(fpu);
+}
+
+static inline void __fpregs_changes_begin(void)
+{
+	preempt_disable();
+}
+
+static inline void __fpregs_changes_end(void)
+{
+	preempt_enable();
+}
+
 /*
  * FPU state switching for scheduling.
  *
@@ -553,11 +573,8 @@ static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 	bool preload = static_cpu_has(X86_FEATURE_FPU) &&
 		       new_fpu->initialized;
 
-	if (preload) {
-		if (!fpregs_state_valid(new_fpu, cpu))
-			copy_kernel_to_fpregs(&new_fpu->state);
-		fpregs_activate(new_fpu);
-	}
+	if (preload)
+		__fpregs_load_activate(new_fpu, cpu);
 }
 
 /*
-- 
2.19.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 03/11] x86/fpu: make __raw_xsave_addr() use feature number instead of mask
  2018-10-04 14:05 [PATCH 00/11 v3] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
  2018-10-04 14:05 ` [PATCH 01/11] x86/entry: remove _TIF_ALLWORK_MASK Sebastian Andrzej Siewior
  2018-10-04 14:05 ` [PATCH 02/11] x86/fpu: add (__)make_fpregs_active helpers Sebastian Andrzej Siewior
@ 2018-10-04 14:05 ` Sebastian Andrzej Siewior
  2018-10-11 17:30   ` Christophe de Dinechin
                     ` (2 more replies)
  2018-10-04 14:05 ` [PATCH 04/11] x86/fpu: eager switch PKRU state Sebastian Andrzej Siewior
                   ` (8 subsequent siblings)
  11 siblings, 3 replies; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-04 14:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

Most users of __raw_xsave_addr() use a feature number, shift it to a
mask and then __raw_xsave_addr() shifts it back to the feature number.

Make __raw_xsave_addr() use the feature number as argument.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/xstate.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 87a57b7642d36..38d0b5ea40425 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -811,10 +811,8 @@ void fpu__resume_cpu(void)
  *
  * Note: does not work for compacted buffers.
  */
-void *__raw_xsave_addr(struct xregs_state *xsave, int xstate_feature_mask)
+void *__raw_xsave_addr(struct xregs_state *xsave, int feature_nr)
 {
-	int feature_nr = fls64(xstate_feature_mask) - 1;
-
 	if (!xfeature_enabled(feature_nr)) {
 		WARN_ON_FPU(1);
 		return NULL;
@@ -842,6 +840,7 @@ void *__raw_xsave_addr(struct xregs_state *xsave, int xstate_feature_mask)
  */
 void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
 {
+	int feature_nr;
 	/*
 	 * Do we even *have* xsave state?
 	 */
@@ -869,7 +868,8 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
 	if (!(xsave->header.xfeatures & xstate_feature))
 		return NULL;
 
-	return __raw_xsave_addr(xsave, xstate_feature);
+	feature_nr = fls64(xstate_feature) - 1;
+	return __raw_xsave_addr(xsave, feature_nr);
 }
 EXPORT_SYMBOL_GPL(get_xsave_addr);
 
@@ -1018,7 +1018,7 @@ int copy_xstate_to_kernel(void *kbuf, struct xregs_state *xsave, unsigned int of
 		 * Copy only in-use xstates:
 		 */
 		if ((header.xfeatures >> i) & 1) {
-			void *src = __raw_xsave_addr(xsave, 1 << i);
+			void *src = __raw_xsave_addr(xsave, i);
 
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
@@ -1104,7 +1104,7 @@ int copy_xstate_to_user(void __user *ubuf, struct xregs_state *xsave, unsigned i
 		 * Copy only in-use xstates:
 		 */
 		if ((header.xfeatures >> i) & 1) {
-			void *src = __raw_xsave_addr(xsave, 1 << i);
+			void *src = __raw_xsave_addr(xsave, i);
 
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
@@ -1161,7 +1161,7 @@ int copy_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
 		u64 mask = ((u64)1 << i);
 
 		if (hdr.xfeatures & mask) {
-			void *dst = __raw_xsave_addr(xsave, 1 << i);
+			void *dst = __raw_xsave_addr(xsave, i);
 
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
@@ -1215,7 +1215,7 @@ int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)
 		u64 mask = ((u64)1 << i);
 
 		if (hdr.xfeatures & mask) {
-			void *dst = __raw_xsave_addr(xsave, 1 << i);
+			void *dst = __raw_xsave_addr(xsave, i);
 
 			offset = xstate_offsets[i];
 			size = xstate_sizes[i];
-- 
2.19.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 04/11] x86/fpu: eager switch PKRU state
  2018-10-04 14:05 [PATCH 00/11 v3] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2018-10-04 14:05 ` [PATCH 03/11] x86/fpu: make __raw_xsave_addr() use feature number instead of mask Sebastian Andrzej Siewior
@ 2018-10-04 14:05 ` Sebastian Andrzej Siewior
  2018-10-12 17:51   ` Dave Hansen
  2018-10-04 14:05 ` [PATCH 05/11] x86/fpu: set PKRU state for kernel threads Sebastian Andrzej Siewior
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-04 14:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

While most of a task's FPU state is only needed in user space,
the protection keys need to be in place immediately after a
context switch.

The reason is that any accesses to userspace memory while running
in kernel mode also need to abide by the memory permissions
specified in the protection keys.

The "eager switch" is a preparation for loading the FPU state on return
to userland. Instead of decoupling PKRU state from xstate I update PKRU
within xstate on write operations by the kernel.

Signed-off-by: Rik van Riel <riel@surriel.com>
[bigeasy: save pkru to xstate, no cache]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 20 +++++++++++++++----
 arch/x86/include/asm/fpu/xstate.h   |  2 ++
 arch/x86/include/asm/pgtable.h      |  6 +-----
 arch/x86/include/asm/pkeys.h        |  2 +-
 arch/x86/kernel/fpu/core.c          |  2 +-
 arch/x86/mm/pkeys.c                 | 31 ++++++++++++++++++++++-------
 include/linux/pkeys.h               |  2 +-
 7 files changed, 46 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 16c4077ffc945..956d967ca824a 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -570,11 +570,23 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  */
 static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 {
-	bool preload = static_cpu_has(X86_FEATURE_FPU) &&
-		       new_fpu->initialized;
+	bool load_fpu;
 
-	if (preload)
-		__fpregs_load_activate(new_fpu, cpu);
+	load_fpu = static_cpu_has(X86_FEATURE_FPU) && new_fpu->initialized;
+	if (!load_fpu)
+		return;
+
+	__fpregs_load_activate(new_fpu, cpu);
+
+#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+	if (static_cpu_has(X86_FEATURE_OSPKE)) {
+		struct pkru_state *pk;
+
+		pk = __raw_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
+		if (pk->pkru != __read_pkru())
+			__write_pkru(pk->pkru);
+	}
+#endif
 }
 
 /*
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 48581988d78c7..82227e222ca35 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -5,6 +5,7 @@
 #include <linux/types.h>
 #include <asm/processor.h>
 #include <linux/uaccess.h>
+#include <asm/user.h>
 
 /* Bit 63 of XCR0 is reserved for future expansion */
 #define XFEATURE_MASK_EXTEND	(~(XFEATURE_MASK_FPSSE | (1ULL << 63)))
@@ -47,6 +48,7 @@ extern void __init update_regset_xstate_info(unsigned int size,
 
 void fpu__xstate_clear_all_cpu_caps(void);
 void *get_xsave_addr(struct xregs_state *xsave, int xstate);
+void *__raw_xsave_addr(struct xregs_state *xsave, int feature_nr);
 const void *get_xsave_field_ptr(int xstate_field);
 int using_compacted_format(void);
 int copy_xstate_to_kernel(void *kbuf, struct xregs_state *xsave, unsigned int offset, unsigned int size);
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 690c0307afed0..80be15ba656f6 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -132,11 +132,7 @@ static inline u32 read_pkru(void)
 	return 0;
 }
 
-static inline void write_pkru(u32 pkru)
-{
-	if (boot_cpu_has(X86_FEATURE_OSPKE))
-		__write_pkru(pkru);
-}
+void write_pkru(u32 pkru);
 
 static inline int pte_young(pte_t pte)
 {
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 19b137f1b3beb..b184f916319e5 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -119,7 +119,7 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 		unsigned long init_val);
 extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 		unsigned long init_val);
-extern void copy_init_pkru_to_fpregs(void);
+extern void pkru_set_init_value(void);
 
 static inline int vma_pkey(struct vm_area_struct *vma)
 {
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 2ea85b32421a0..72cd2e2a07194 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -373,7 +373,7 @@ static inline void copy_init_fpstate_to_fpregs(void)
 		copy_kernel_to_fregs(&init_fpstate.fsave);
 
 	if (boot_cpu_has(X86_FEATURE_OSPKE))
-		copy_init_pkru_to_fpregs();
+		pkru_set_init_value();
 }
 
 /*
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 6e98e0a7c9231..4409ada551c5e 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -18,6 +18,7 @@
 
 #include <asm/cpufeature.h>             /* boot_cpu_has, ...            */
 #include <asm/mmu_context.h>            /* vma_pkey()                   */
+#include <asm/fpu/xstate.h>
 
 int __execute_only_pkey(struct mm_struct *mm)
 {
@@ -123,6 +124,24 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey
 	return vma_pkey(vma);
 }
 
+void write_pkru(u32 pkru)
+{
+	struct pkru_state *pk;
+
+	if (!boot_cpu_has(X86_FEATURE_OSPKE))
+		return;
+
+	pk = __raw_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
+	/*
+	 * Update the PKRU value in cstate and then in the CPU. A context
+	 * switch between those two operation would load the new value from the
+	 * updated xstate and then we would write (the same value) to the CPU.
+	 */
+	pk->pkru = pkru;
+	__write_pkru(pkru);
+
+}
+
 #define PKRU_AD_KEY(pkey)	(PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY))
 
 /*
@@ -143,20 +162,18 @@ u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) |
  * we know the FPU regstiers are safe for use and we can use PKRU
  * directly.
  */
-void copy_init_pkru_to_fpregs(void)
+void pkru_set_init_value(void)
 {
 	u32 init_pkru_value_snapshot = READ_ONCE(init_pkru_value);
+
 	/*
 	 * Any write to PKRU takes it out of the XSAVE 'init
 	 * state' which increases context switch cost.  Avoid
-	 * writing 0 when PKRU was already 0.
+	 * writing then same value which is already written.
 	 */
-	if (!init_pkru_value_snapshot && !read_pkru())
+	if (init_pkru_value_snapshot == read_pkru())
 		return;
-	/*
-	 * Override the PKRU state that came from 'init_fpstate'
-	 * with the baseline from the process.
-	 */
+
 	write_pkru(init_pkru_value_snapshot);
 }
 
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 2955ba9760489..9a9efecc1388f 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -44,7 +44,7 @@ static inline bool arch_pkeys_enabled(void)
 	return false;
 }
 
-static inline void copy_init_pkru_to_fpregs(void)
+static inline void pkru_set_init_value(void)
 {
 }
 
-- 
2.19.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-04 14:05 [PATCH 00/11 v3] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2018-10-04 14:05 ` [PATCH 04/11] x86/fpu: eager switch PKRU state Sebastian Andrzej Siewior
@ 2018-10-04 14:05 ` Sebastian Andrzej Siewior
  2018-10-12 17:54   ` Dave Hansen
  2018-10-04 14:05 ` [PATCH 06/11] x86/pkeys: make init_pkru_value static Sebastian Andrzej Siewior
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-04 14:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

The PKRU value is not set for kernel threads because they do not have
the ->initialized value set. As a result the kernel thread has a random
PKRU value set which it inherits from the previous task.
It has been suggested by Paolo Bonzini to set it for kernel threads, too
because it might be a fix.
I *think* this is not required because the kernel threads don't copy
data to/from userland and don't have access to any userspace mm in
general.
However there is this use_mm(). If we gain a mm by use_mm() we don't
have a matching PKRU value because those are per thread. It has been
suggested to use 0 as the PKRU value but this would bypass PKRU.

Set the initial (default) PKRU value for kernel threads.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 956d967ca824a..4ecaf4d22954e 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -14,6 +14,7 @@
 #include <linux/compat.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/pkeys.h>
 
 #include <asm/user.h>
 #include <asm/fpu/api.h>
@@ -573,20 +574,23 @@ static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 	bool load_fpu;
 
 	load_fpu = static_cpu_has(X86_FEATURE_FPU) && new_fpu->initialized;
-	if (!load_fpu)
-		return;
-
-	__fpregs_load_activate(new_fpu, cpu);
-
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 	if (static_cpu_has(X86_FEATURE_OSPKE)) {
 		struct pkru_state *pk;
 
-		pk = __raw_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
-		if (pk->pkru != __read_pkru())
-			__write_pkru(pk->pkru);
+		if (!load_fpu) {
+			pkru_set_init_value();
+		} else {
+			pk = __raw_xsave_addr(&new_fpu->state.xsave,
+					      XFEATURE_PKRU);
+			if (pk->pkru != __read_pkru())
+				__write_pkru(pk->pkru);
+		}
 	}
 #endif
+	if (!load_fpu)
+		return;
+	__fpregs_load_activate(new_fpu, cpu);
 }
 
 /*
-- 
2.19.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 06/11] x86/pkeys: make init_pkru_value static
  2018-10-04 14:05 [PATCH 00/11 v3] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2018-10-04 14:05 ` [PATCH 05/11] x86/fpu: set PKRU state for kernel threads Sebastian Andrzej Siewior
@ 2018-10-04 14:05 ` Sebastian Andrzej Siewior
  2018-10-12 17:55   ` Dave Hansen
  2018-10-04 14:05 ` [PATCH 07/11] x86/pkeys: Drop the preempt-disable section Sebastian Andrzej Siewior
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-04 14:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

The variable init_pkru_value isn't used outside of this file.
Make init_pkru_value static.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/mm/pkeys.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 4409ada551c5e..a150984171684 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -150,6 +150,7 @@ void write_pkru(u32 pkru)
  * in the process's lifetime will not accidentally get access
  * to data which is pkey-protected later on.
  */
+static
 u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) |
 		      PKRU_AD_KEY( 4) | PKRU_AD_KEY( 5) | PKRU_AD_KEY( 6) |
 		      PKRU_AD_KEY( 7) | PKRU_AD_KEY( 8) | PKRU_AD_KEY( 9) |
-- 
2.19.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 07/11] x86/pkeys: Drop the preempt-disable section
  2018-10-04 14:05 [PATCH 00/11 v3] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (5 preceding siblings ...)
  2018-10-04 14:05 ` [PATCH 06/11] x86/pkeys: make init_pkru_value static Sebastian Andrzej Siewior
@ 2018-10-04 14:05 ` Sebastian Andrzej Siewior
  2018-10-12 17:58   ` Dave Hansen
  2018-10-04 14:05 ` [PATCH 08/11] x86/fpu: Always store the registers in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-04 14:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

The fpu->initialized flag should not be changed underneath us. This might be a
fallout during the removal of the LazyFPU support. The FPU is marked
initialized as soon as the state has been set to an initial value. It does not
signal if the CPU's FPU registers are loaded.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/mm/pkeys.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index a150984171684..eeb03b0f0f514 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -40,17 +40,13 @@ int __execute_only_pkey(struct mm_struct *mm)
 	 * dance to set PKRU if we do not need to.  Check it
 	 * first and assume that if the execute-only pkey is
 	 * write-disabled that we do not have to set it
-	 * ourselves.  We need preempt off so that nobody
-	 * can make fpregs inactive.
+	 * ourselves.
 	 */
-	preempt_disable();
 	if (!need_to_set_mm_pkey &&
 	    current->thread.fpu.initialized &&
 	    !__pkru_allows_read(read_pkru(), execute_only_pkey)) {
-		preempt_enable();
 		return execute_only_pkey;
 	}
-	preempt_enable();
 
 	/*
 	 * Set up PKRU so that it denies access for everything
-- 
2.19.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 08/11] x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
  2018-10-04 14:05 [PATCH 00/11 v3] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (6 preceding siblings ...)
  2018-10-04 14:05 ` [PATCH 07/11] x86/pkeys: Drop the preempt-disable section Sebastian Andrzej Siewior
@ 2018-10-04 14:05 ` Sebastian Andrzej Siewior
  2018-10-11 17:50   ` Christophe de Dinechin
  2018-10-12 18:15   ` Dave Hansen
  2018-10-04 14:05 ` [PATCH 09/11] x86/entry: add TIF_LOAD_FPU Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-04 14:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

copy_fpstate_to_sigframe() has two callers and both invoke the function only if
fpu->initialized is set. So the check in the function for ->initialized makes
no sense. It might be a relict from the lazy-FPU time: If the FPU registers
were "loaded" then we could save them directly. Otherwise (the FPU
registers are not up to date) they are saved in the FPU struct and
it could be used for memcpy().

Since the registers always loaded at this point, save them and copy
later.
This code is extracted from an earlier version of the patchset while
there still was lazy-FPU on x86. This is a preparation for loading the
FPU registers on return to userland.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 45 ----------------------------
 arch/x86/kernel/fpu/signal.c        | 46 +++++++----------------------
 2 files changed, 11 insertions(+), 80 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 4ecaf4d22954e..df8816be3efdd 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -126,22 +126,6 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
 		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_fprestore)	\
 		     : output : input)
 
-static inline int copy_fregs_to_user(struct fregs_state __user *fx)
-{
-	return user_insn(fnsave %[fx]; fwait,  [fx] "=m" (*fx), "m" (*fx));
-}
-
-static inline int copy_fxregs_to_user(struct fxregs_state __user *fx)
-{
-	if (IS_ENABLED(CONFIG_X86_32))
-		return user_insn(fxsave %[fx], [fx] "=m" (*fx), "m" (*fx));
-	else if (IS_ENABLED(CONFIG_AS_FXSAVEQ))
-		return user_insn(fxsaveq %[fx], [fx] "=m" (*fx), "m" (*fx));
-
-	/* See comment in copy_fxregs_to_kernel() below. */
-	return user_insn(rex64/fxsave (%[fx]), "=m" (*fx), [fx] "R" (fx));
-}
-
 static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
 {
 	if (IS_ENABLED(CONFIG_X86_32)) {
@@ -352,35 +336,6 @@ static inline void copy_kernel_to_xregs(struct xregs_state *xstate, u64 mask)
 	XSTATE_XRESTORE(xstate, lmask, hmask);
 }
 
-/*
- * Save xstate to user space xsave area.
- *
- * We don't use modified optimization because xrstor/xrstors might track
- * a different application.
- *
- * We don't use compacted format xsave area for
- * backward compatibility for old applications which don't understand
- * compacted format of xsave area.
- */
-static inline int copy_xregs_to_user(struct xregs_state __user *buf)
-{
-	int err;
-
-	/*
-	 * Clear the xsave header first, so that reserved fields are
-	 * initialized to zero.
-	 */
-	err = __clear_user(&buf->header, sizeof(buf->header));
-	if (unlikely(err))
-		return -EFAULT;
-
-	stac();
-	XSTATE_OP(XSAVE, buf, -1, -1, err);
-	clac();
-
-	return err;
-}
-
 /*
  * Restore xstate from user space xsave area.
  */
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 23f1691670b66..c8f5ff58578ed 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -117,23 +117,6 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame)
 
 	return err;
 }
-
-static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
-{
-	int err;
-
-	if (use_xsave())
-		err = copy_xregs_to_user(buf);
-	else if (use_fxsr())
-		err = copy_fxregs_to_user((struct fxregs_state __user *) buf);
-	else
-		err = copy_fregs_to_user((struct fregs_state __user *) buf);
-
-	if (unlikely(err) && __clear_user(buf, fpu_user_xstate_size))
-		err = -EFAULT;
-	return err;
-}
-
 /*
  * Save the fpu, extended register state to the user signal frame.
  *
@@ -172,27 +155,20 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 			sizeof(struct user_i387_ia32_struct), NULL,
 			(struct _fpstate_32 __user *) buf) ? -1 : 1;
 
-	if (fpu->initialized || using_compacted_format()) {
-		/* Save the live register state to the user directly. */
-		if (copy_fpregs_to_sigframe(buf_fx))
-			return -1;
-		/* Update the thread's fxstate to save the fsave header. */
-		if (ia32_fxstate)
-			copy_fxregs_to_kernel(fpu);
+	/* Update the thread's fxstate to save the fsave header. */
+	if (ia32_fxstate) {
+		copy_fxregs_to_kernel(fpu);
 	} else {
-		/*
-		 * It is a *bug* if kernel uses compacted-format for xsave
-		 * area and we copy it out directly to a signal frame. It
-		 * should have been handled above by saving the registers
-		 * directly.
-		 */
-		if (boot_cpu_has(X86_FEATURE_XSAVES)) {
-			WARN_ONCE(1, "x86/fpu: saving compacted-format xsave area to a signal frame!\n");
-			return -1;
-		}
+		copy_fpregs_to_fpstate(fpu);
+		fpregs_deactivate(fpu);
+	}
 
+	if (using_compacted_format()) {
+		copy_xstate_to_user(buf_fx, xsave, 0, size);
+	} else {
 		fpstate_sanitize_xstate(fpu);
-		if (__copy_to_user(buf_fx, xsave, fpu_user_xstate_size))
+		size = fpu_user_xstate_size;
+		if (__copy_to_user(buf_fx, xsave, size))
 			return -1;
 	}
 
-- 
2.19.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 09/11] x86/entry: add TIF_LOAD_FPU
  2018-10-04 14:05 [PATCH 00/11 v3] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (7 preceding siblings ...)
  2018-10-04 14:05 ` [PATCH 08/11] x86/fpu: Always store the registers in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
@ 2018-10-04 14:05 ` Sebastian Andrzej Siewior
  2018-10-04 14:05 ` [PATCH 10/11] x86/fpu: prepare copy_fpstate_to_sigframe for TIF_LOAD_FPU Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-04 14:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

Add TIF_LOAD_FPU. This is reserved for loading the FPU registers before
returning to userpace. This flag must not be set for systems without a
FPU.
It is introduced now, so we can add code handling it now before adding
the main feature.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/thread_info.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cd6920674b905..cd910fd8c8c4c 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -83,6 +83,7 @@ struct thread_info {
 #define TIF_SYSCALL_EMU		6	/* syscall emulation active */
 #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
 #define TIF_SECCOMP		8	/* secure computing */
+#define TIF_LOAD_FPU		10	/* load FPU on return to userspace */
 #define TIF_USER_RETURN_NOTIFY	11	/* notify kernel of userspace return */
 #define TIF_UPROBE		12	/* breakpointed or singlestepping */
 #define TIF_PATCH_PENDING	13	/* pending live patching update */
@@ -110,6 +111,7 @@ struct thread_info {
 #define _TIF_SYSCALL_EMU	(1 << TIF_SYSCALL_EMU)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
+#define _TIF_LOAD_FPU		(1 << TIF_LOAD_FPU)
 #define _TIF_USER_RETURN_NOTIFY	(1 << TIF_USER_RETURN_NOTIFY)
 #define _TIF_UPROBE		(1 << TIF_UPROBE)
 #define _TIF_PATCH_PENDING	(1 << TIF_PATCH_PENDING)
-- 
2.19.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 10/11] x86/fpu: prepare copy_fpstate_to_sigframe for TIF_LOAD_FPU
  2018-10-04 14:05 [PATCH 00/11 v3] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (8 preceding siblings ...)
  2018-10-04 14:05 ` [PATCH 09/11] x86/entry: add TIF_LOAD_FPU Sebastian Andrzej Siewior
@ 2018-10-04 14:05 ` Sebastian Andrzej Siewior
  2018-10-12 19:40   ` Dave Hansen
  2018-10-04 14:05 ` [PATCH 11/11] x86/fpu: defer FPU state load until return to userspace Sebastian Andrzej Siewior
  2018-10-04 16:45 ` [PATCH 00/11 v3] x86: load FPU registers on return to userland Rik van Riel
  11 siblings, 1 reply; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-04 14:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

If TIF_LOAD_FPU is set, then the registers are saved (not loaded). In that case
we skip the saving part.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/kernel/fpu/signal.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index c8f5ff58578ed..979dcd1ed82e0 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -155,13 +155,17 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
 			sizeof(struct user_i387_ia32_struct), NULL,
 			(struct _fpstate_32 __user *) buf) ? -1 : 1;
 
-	/* Update the thread's fxstate to save the fsave header. */
-	if (ia32_fxstate) {
-		copy_fxregs_to_kernel(fpu);
-	} else {
-		copy_fpregs_to_fpstate(fpu);
-		fpregs_deactivate(fpu);
+	__fpregs_changes_begin();
+	if (!test_thread_flag(TIF_LOAD_FPU)) {
+		/* Update the thread's fxstate to save the fsave header. */
+		if (ia32_fxstate) {
+			copy_fxregs_to_kernel(fpu);
+		} else {
+			copy_fpregs_to_fpstate(fpu);
+			fpregs_deactivate(fpu);
+		}
 	}
+	__fpregs_changes_end();
 
 	if (using_compacted_format()) {
 		copy_xstate_to_user(buf_fx, xsave, 0, size);
-- 
2.19.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 11/11] x86/fpu: defer FPU state load until return to userspace
  2018-10-04 14:05 [PATCH 00/11 v3] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (9 preceding siblings ...)
  2018-10-04 14:05 ` [PATCH 10/11] x86/fpu: prepare copy_fpstate_to_sigframe for TIF_LOAD_FPU Sebastian Andrzej Siewior
@ 2018-10-04 14:05 ` Sebastian Andrzej Siewior
  2018-10-04 16:14   ` Andy Lutomirski
  2018-10-04 16:45 ` [PATCH 00/11 v3] x86: load FPU registers on return to userland Rik van Riel
  11 siblings, 1 reply; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-04 14:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen,
	Sebastian Andrzej Siewior

From: Rik van Riel <riel@surriel.com>

Defer loading of FPU state until return to userspace. This gives
the kernel the potential to skip loading FPU state for tasks that
stay in kernel mode, or for tasks that end up with repeated
invocations of kernel_fpu_begin.

It also increases the chances that a task's FPU state will remain
valid in the FPU registers until it is scheduled back in, allowing
us to skip restoring that task's FPU state altogether.

The __fpregs_changes_{begin|end}() section ensures that the register
remain unchanged. Otherwise a context switch or a BH could save the
registers to its FPU context and processor's FPU register would remain
random.
fpu__restore() has one user so I pulled that preempt_disable() part into
fpu__restore(). While the function did *load* the registers, it now just
makes sure that they are loaded on return to userland.

KVM swaps the host/guest register on enry/exit path. I kept the flow as
is. First it ensures that the registers are loaded and then saves the
current (host) state before it loads the guest's register. Before
entring the guest, it ensures that the register are still loaded.

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/entry/common.c             |   9 +++
 arch/x86/include/asm/fpu/api.h      |  11 +++
 arch/x86/include/asm/fpu/internal.h |  25 ++++---
 arch/x86/include/asm/trace/fpu.h    |   5 +-
 arch/x86/kernel/fpu/core.c          | 108 ++++++++++++++++++++--------
 arch/x86/kernel/fpu/signal.c        |   3 -
 arch/x86/kernel/process.c           |   2 +-
 arch/x86/kernel/process_32.c        |   7 +-
 arch/x86/kernel/process_64.c        |   7 +-
 arch/x86/kvm/x86.c                  |  18 +++--
 10 files changed, 143 insertions(+), 52 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 3b2490b819181..3dad5c3b335eb 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -31,6 +31,7 @@
 #include <asm/vdso.h>
 #include <linux/uaccess.h>
 #include <asm/cpufeature.h>
+#include <asm/fpu/api.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
@@ -196,6 +197,14 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
 	if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
 		exit_to_usermode_loop(regs, cached_flags);
 
+	/* Reload ti->flags; we may have rescheduled above. */
+	cached_flags = READ_ONCE(ti->flags);
+
+	if (unlikely(cached_flags & _TIF_LOAD_FPU))
+		switch_fpu_return();
+	else
+		fpregs_is_state_consistent();
+
 #ifdef CONFIG_COMPAT
 	/*
 	 * Compat syscalls set TS_COMPAT.  Make sure we clear it before
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index a9caac9d4a729..e3077860f7333 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -27,6 +27,17 @@ extern void kernel_fpu_begin(void);
 extern void kernel_fpu_end(void);
 extern bool irq_fpu_usable(void);
 
+#ifdef CONFIG_X86_DEBUG_FPU
+extern void fpregs_is_state_consistent(void);
+#else
+static inline void fpregs_is_state_consistent(void) { }
+#endif
+
+/*
+ * Load the task FPU state before returning to userspace.
+ */
+extern void switch_fpu_return(void);
+
 /*
  * Query the presence of one or more xfeatures. Works on any legacy CPU as well.
  *
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index df8816be3efdd..346f8057ecd7b 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -32,7 +32,7 @@ extern void fpu__save(struct fpu *fpu);
 extern void fpu__restore(struct fpu *fpu);
 extern int  fpu__restore_sig(void __user *buf, int ia32_frame);
 extern void fpu__drop(struct fpu *fpu);
-extern int  fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu);
+extern int  fpu__copy(struct task_struct *dst, struct task_struct *src);
 extern void fpu__clear(struct fpu *fpu);
 extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);
 extern int  dump_fpu(struct pt_regs *ptregs, struct user_i387_struct *fpstate);
@@ -473,21 +473,30 @@ static inline void fpregs_activate(struct fpu *fpu)
 /*
  * Load the FPU state for the current task. Call with preemption disabled.
  */
-static inline void __fpregs_load_activate(struct fpu *fpu, int cpu)
+static inline void __fpregs_load_activate(void)
 {
+	struct fpu *fpu = &current->thread.fpu;
+	int cpu = smp_processor_id();
+
 	if (!fpregs_state_valid(fpu, cpu))
 		copy_kernel_to_fpregs(&fpu->state);
 	fpregs_activate(fpu);
+	fpu->last_cpu = cpu;
+	clear_thread_flag(TIF_LOAD_FPU);
 }
 
+void fpregs_load_activate(void);
+
 static inline void __fpregs_changes_begin(void)
 {
 	preempt_disable();
+	local_bh_disable();
 }
 
 static inline void __fpregs_changes_end(void)
 {
 	preempt_enable();
+	local_bh_enable();
 }
 
 /*
@@ -498,8 +507,8 @@ static inline void __fpregs_changes_end(void)
  *  - switch_fpu_prepare() saves the old state.
  *    This is done within the context of the old process.
  *
- *  - switch_fpu_finish() restores the new state as
- *    necessary.
+ *  - switch_fpu_finish() sets TIF_LOAD_FPU; the floating point state
+ *    will get loaded on return to userspace, or when the kernel needs it.
  */
 static inline void
 switch_fpu_prepare(struct fpu *old_fpu, int cpu)
@@ -521,10 +530,10 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  */
 
 /*
- * Set up the userspace FPU context for the new task, if the task
- * has used the FPU.
+ * Load PKRU from the FPU context if available. Delay loading the loading of the
+ * complete FPU state until the return to userland.
  */
-static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
+static inline void switch_fpu_finish(struct fpu *new_fpu)
 {
 	bool load_fpu;
 
@@ -545,7 +554,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 #endif
 	if (!load_fpu)
 		return;
-	__fpregs_load_activate(new_fpu, cpu);
+	set_thread_flag(TIF_LOAD_FPU);
 }
 
 /*
diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
index 069c04be15076..ec3be1c9da7ec 100644
--- a/arch/x86/include/asm/trace/fpu.h
+++ b/arch/x86/include/asm/trace/fpu.h
@@ -14,6 +14,7 @@ DECLARE_EVENT_CLASS(x86_fpu,
 	TP_STRUCT__entry(
 		__field(struct fpu *, fpu)
 		__field(bool, initialized)
+		__field(bool, load_fpu)
 		__field(u64, xfeatures)
 		__field(u64, xcomp_bv)
 		),
@@ -21,14 +22,16 @@ DECLARE_EVENT_CLASS(x86_fpu,
 	TP_fast_assign(
 		__entry->fpu		= fpu;
 		__entry->initialized	= fpu->initialized;
+		__entry->load_fpu	= test_thread_flag(TIF_LOAD_FPU);
 		if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
 			__entry->xfeatures = fpu->state.xsave.header.xfeatures;
 			__entry->xcomp_bv  = fpu->state.xsave.header.xcomp_bv;
 		}
 	),
-	TP_printk("x86/fpu: %p initialized: %d xfeatures: %llx xcomp_bv: %llx",
+	TP_printk("x86/fpu: %p initialized: %d load: %d xfeatures: %llx xcomp_bv: %llx",
 			__entry->fpu,
 			__entry->initialized,
+			__entry->load_fpu,
 			__entry->xfeatures,
 			__entry->xcomp_bv
 	)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 72cd2e2a07194..c757fd1a8440d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -101,14 +101,15 @@ void __kernel_fpu_begin(void)
 
 	kernel_fpu_disable();
 
-	if (fpu->initialized) {
+	__cpu_invalidate_fpregs_state();
+
+	if (!test_thread_flag(TIF_LOAD_FPU)) {
+		set_thread_flag(TIF_LOAD_FPU);
 		/*
 		 * Ignore return value -- we don't care if reg state
 		 * is clobbered.
 		 */
 		copy_fpregs_to_fpstate(fpu);
-	} else {
-		__cpu_invalidate_fpregs_state();
 	}
 }
 EXPORT_SYMBOL(__kernel_fpu_begin);
@@ -117,8 +118,7 @@ void __kernel_fpu_end(void)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
-	if (fpu->initialized)
-		copy_kernel_to_fpregs(&fpu->state);
+	switch_fpu_finish(fpu);
 
 	kernel_fpu_enable();
 }
@@ -147,15 +147,15 @@ void fpu__save(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
-	preempt_disable();
+	__fpregs_changes_begin();
 	trace_x86_fpu_before_save(fpu);
-	if (fpu->initialized) {
+	if (fpu->initialized && !test_thread_flag(TIF_LOAD_FPU)) {
 		if (!copy_fpregs_to_fpstate(fpu)) {
 			copy_kernel_to_fpregs(&fpu->state);
 		}
 	}
 	trace_x86_fpu_after_save(fpu);
-	preempt_enable();
+	__fpregs_changes_end();
 }
 EXPORT_SYMBOL_GPL(fpu__save);
 
@@ -188,8 +188,11 @@ void fpstate_init(union fpregs_state *state)
 }
 EXPORT_SYMBOL_GPL(fpstate_init);
 
-int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
+int fpu__copy(struct task_struct *dst, struct task_struct *src)
 {
+	struct fpu *dst_fpu = &dst->thread.fpu;
+	struct fpu *src_fpu = &src->thread.fpu;
+
 	dst_fpu->last_cpu = -1;
 
 	if (!src_fpu->initialized || !static_cpu_has(X86_FEATURE_FPU))
@@ -204,16 +207,23 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 	memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size);
 
 	/*
-	 * Save current FPU registers directly into the child
+	 * If the FPU registers are not loaded just memcpy() the state.
+	 * Otherwise save current FPU registers directly into the child
 	 * FPU context, without any memory-to-memory copying.
 	 *
 	 * ( The function 'fails' in the FNSAVE case, which destroys
-	 *   register contents so we have to copy them back. )
+	 *   register contents so we have to load them back. )
 	 */
-	if (!copy_fpregs_to_fpstate(dst_fpu)) {
-		memcpy(&src_fpu->state, &dst_fpu->state, fpu_kernel_xstate_size);
-		copy_kernel_to_fpregs(&src_fpu->state);
-	}
+	__fpregs_changes_begin();
+	if (test_thread_flag(TIF_LOAD_FPU))
+		memcpy(&dst_fpu->state, &src_fpu->state, fpu_kernel_xstate_size);
+
+	else if (!copy_fpregs_to_fpstate(dst_fpu))
+		copy_kernel_to_fpregs(&dst_fpu->state);
+
+	__fpregs_changes_end();
+
+	set_tsk_thread_flag(dst, TIF_LOAD_FPU);
 
 	trace_x86_fpu_copy_src(src_fpu);
 	trace_x86_fpu_copy_dst(dst_fpu);
@@ -236,6 +246,7 @@ void fpu__initialize(struct fpu *fpu)
 		trace_x86_fpu_activate_state(fpu);
 		/* Safe to do for the current task: */
 		fpu->initialized = 1;
+		set_thread_flag(TIF_LOAD_FPU);
 	}
 }
 EXPORT_SYMBOL_GPL(fpu__initialize);
@@ -306,26 +317,18 @@ void fpu__prepare_write(struct fpu *fpu)
 }
 
 /*
- * 'fpu__restore()' is called to copy FPU registers from
- * the FPU fpstate to the live hw registers and to activate
- * access to the hardware registers, so that FPU instructions
- * can be used afterwards.
- *
- * Must be called with kernel preemption disabled (for example
- * with local interrupts disabled, as it is in the case of
- * do_device_not_available()).
+ * 'fpu__restore()' is called to ensure that the FPU registers in fpstate
+ * are loaded on return to userspace.
  */
 void fpu__restore(struct fpu *fpu)
 {
-	fpu__initialize(fpu);
+	WARN_ON_FPU(fpu != &current->thread.fpu);
 
-	/* Avoid __kernel_fpu_begin() right after fpregs_activate() */
-	kernel_fpu_disable();
 	trace_x86_fpu_before_restore(fpu);
-	fpregs_activate(fpu);
-	copy_kernel_to_fpregs(&fpu->state);
+	__fpu_invalidate_fpregs_state(fpu);
+	fpu->initialized = 1;
+	set_thread_flag(TIF_LOAD_FPU);
 	trace_x86_fpu_after_restore(fpu);
-	kernel_fpu_enable();
 }
 EXPORT_SYMBOL_GPL(fpu__restore);
 
@@ -400,6 +403,53 @@ void fpu__clear(struct fpu *fpu)
 	}
 }
 
+/*
+ * Load FPU context before returning to userspace.
+ */
+void switch_fpu_return(void)
+{
+	if (!static_cpu_has(X86_FEATURE_FPU))
+		return;
+
+	/*
+	 * We should never return to user space without the task's
+	 * own FPU contents loaded into the registers. That makes it
+	 * a bug to not have the task's FPU state set up.
+	 */
+	WARN_ON_FPU(!current->thread.fpu.initialized);
+
+	__fpregs_load_activate();
+}
+EXPORT_SYMBOL_GPL(switch_fpu_return);
+
+#ifdef CONFIG_X86_DEBUG_FPU
+/*
+ * If current FPU state according to its tracking (loaded FPU ctx on this CPU)
+ * is not valid then we must have TIF_LOAD_FPU set so the context is loaded on
+ * return to userland.
+ */
+void fpregs_is_state_consistent(void)
+{
+       struct fpu *fpu = &current->thread.fpu;
+
+       if (!fpu->initialized)
+               return;
+       if (test_thread_flag(TIF_LOAD_FPU))
+               return;
+       WARN_ON_FPU(!fpregs_state_valid(fpu, smp_processor_id()));
+}
+EXPORT_SYMBOL_GPL(fpregs_is_state_consistent);
+#endif
+
+void fpregs_load_activate(void)
+{
+	if (test_thread_flag(TIF_LOAD_FPU))
+		__fpregs_load_activate();
+	else
+		fpregs_is_state_consistent();
+}
+EXPORT_SYMBOL_GPL(fpregs_load_activate);
+
 /*
  * x87 math exception handling:
  */
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 979dcd1ed82e0..45d2f165b47ac 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -325,10 +325,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 			sanitize_restored_xstate(tsk, &env, xfeatures, fx_only);
 		}
 
-		fpu->initialized = 1;
-		preempt_disable();
 		fpu__restore(fpu);
-		preempt_enable();
 
 		return err;
 	} else {
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c93fcfdf16734..cd7105fb92bfc 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -96,7 +96,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 	dst->thread.vm86 = NULL;
 #endif
 
-	return fpu__copy(&dst->thread.fpu, &src->thread.fpu);
+	return fpu__copy(dst, src);
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 5046a3c9dec2f..a65f8ce36379b 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -236,7 +236,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
 
-	switch_fpu_prepare(prev_fpu, cpu);
+	if (prev_fpu->initialized && !test_thread_flag(TIF_LOAD_FPU))
+		switch_fpu_prepare(prev_fpu, cpu);
 
 	/*
 	 * Save away %gs. No need to save %fs, as it was saved on the
@@ -297,10 +298,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	if (prev->gs | next->gs)
 		lazy_load_gs(next->gs);
 
-	switch_fpu_finish(next_fpu, cpu);
-
 	this_cpu_write(current_task, next_p);
 
+	switch_fpu_finish(next_fpu);
+
 	/* Load the Intel cache allocation PQR MSR. */
 	intel_rdt_sched_in();
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ea5ea850348da..66b763f3da6a0 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -427,7 +427,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) &&
 		     this_cpu_read(irq_count) != -1);
 
-	switch_fpu_prepare(prev_fpu, cpu);
+	if (prev_fpu->initialized && !test_thread_flag(TIF_LOAD_FPU))
+		switch_fpu_prepare(prev_fpu, cpu);
 
 	/* We must save %fs and %gs before load_TLS() because
 	 * %fs and %gs may be cleared by load_TLS().
@@ -478,8 +479,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	load_seg_legacy(prev->gsindex, prev->gsbase,
 			next->gsindex, next->gsbase, GS);
 
-	switch_fpu_finish(next_fpu, cpu);
-
 	/*
 	 * Switch the PDA and FPU contexts.
 	 */
@@ -489,6 +488,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/* Reload sp0. */
 	update_task_stack(next_p);
 
+	switch_fpu_finish(next_fpu);
+
 	/*
 	 * Now maybe reload the debug registers and handle I/O bitmaps
 	 */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index edbf00ec56b34..2a9f35e8bb81e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7592,6 +7592,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		wait_lapic_expire(vcpu);
 	guest_enter_irqoff();
 
+	if (test_thread_flag(TIF_LOAD_FPU))
+		switch_fpu_return();
+	else
+		fpregs_is_state_consistent();
+
 	if (unlikely(vcpu->arch.switch_db_regs)) {
 		set_debugreg(0, 7);
 		set_debugreg(vcpu->arch.eff_db[0], 0);
@@ -7851,22 +7856,27 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
 /* Swap (qemu) user FPU context for the guest FPU context. */
 static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	preempt_disable();
+	__fpregs_changes_begin();
+	fpregs_load_activate();
+
 	copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
+
 	/* PKRU is separately restored in kvm_x86_ops->run.  */
 	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
 				~XFEATURE_MASK_PKRU);
-	preempt_enable();
+	__fpregs_changes_end();
 	trace_kvm_fpu(1);
 }
 
 /* When vcpu_run ends, restore user space FPU context. */
 static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	preempt_disable();
+	__fpregs_changes_begin();
+	fpregs_load_activate();
+
 	copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
 	copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
-	preempt_enable();
+	__fpregs_changes_end();
 	++vcpu->stat.fpu_reload;
 	trace_kvm_fpu(0);
 }
-- 
2.19.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 11/11] x86/fpu: defer FPU state load until return to userspace
  2018-10-04 14:05 ` [PATCH 11/11] x86/fpu: defer FPU state load until return to userspace Sebastian Andrzej Siewior
@ 2018-10-04 16:14   ` Andy Lutomirski
  2018-10-12 20:25     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2018-10-04 16:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: LKML, X86 ML, Andrew Lutomirski, Paolo Bonzini, Radim Krcmar,
	kvm list, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Thu, Oct 4, 2018 at 7:06 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> From: Rik van Riel <riel@surriel.com>
>
> Defer loading of FPU state until return to userspace. This gives
> the kernel the potential to skip loading FPU state for tasks that
> stay in kernel mode, or for tasks that end up with repeated
> invocations of kernel_fpu_begin.
>
> It also increases the chances that a task's FPU state will remain
> valid in the FPU registers until it is scheduled back in, allowing
> us to skip restoring that task's FPU state altogether.
>
> The __fpregs_changes_{begin|end}() section ensures that the register
> remain unchanged. Otherwise a context switch or a BH could save the
> registers to its FPU context and processor's FPU register would remain
> random.
> fpu__restore() has one user so I pulled that preempt_disable() part into
> fpu__restore(). While the function did *load* the registers, it now just
> makes sure that they are loaded on return to userland.
>
> KVM swaps the host/guest register on enry/exit path. I kept the flow as
> is. First it ensures that the registers are loaded and then saves the
> current (host) state before it loads the guest's register. Before
> entring the guest, it ensures that the register are still loaded.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/entry/common.c             |   9 +++
>  arch/x86/include/asm/fpu/api.h      |  11 +++
>  arch/x86/include/asm/fpu/internal.h |  25 ++++---
>  arch/x86/include/asm/trace/fpu.h    |   5 +-
>  arch/x86/kernel/fpu/core.c          | 108 ++++++++++++++++++++--------
>  arch/x86/kernel/fpu/signal.c        |   3 -
>  arch/x86/kernel/process.c           |   2 +-
>  arch/x86/kernel/process_32.c        |   7 +-
>  arch/x86/kernel/process_64.c        |   7 +-
>  arch/x86/kvm/x86.c                  |  18 +++--
>  10 files changed, 143 insertions(+), 52 deletions(-)
>
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 3b2490b819181..3dad5c3b335eb 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -31,6 +31,7 @@
>  #include <asm/vdso.h>
>  #include <linux/uaccess.h>
>  #include <asm/cpufeature.h>
> +#include <asm/fpu/api.h>
>
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/syscalls.h>
> @@ -196,6 +197,14 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
>         if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
>                 exit_to_usermode_loop(regs, cached_flags);
>
> +       /* Reload ti->flags; we may have rescheduled above. */
> +       cached_flags = READ_ONCE(ti->flags);
> +
> +       if (unlikely(cached_flags & _TIF_LOAD_FPU))
> +               switch_fpu_return();
> +       else
> +               fpregs_is_state_consistent();

Shouldn't this be:

fpregs_assert_state_consistent();  /* see below */

if (unlikely(cached_flags & _TIF_LOAD_FPU))
  switch_fpu_return();

> diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
> index a9caac9d4a729..e3077860f7333 100644
> --- a/arch/x86/include/asm/fpu/api.h
> +++ b/arch/x86/include/asm/fpu/api.h
> @@ -27,6 +27,17 @@ extern void kernel_fpu_begin(void);
>  extern void kernel_fpu_end(void);
>  extern bool irq_fpu_usable(void);
>
> +#ifdef CONFIG_X86_DEBUG_FPU
> +extern void fpregs_is_state_consistent(void);
> +#else
> +static inline void fpregs_is_state_consistent(void) { }
> +#endif

Can you name this something like fpregs_assert_state_consistent()?
The "is" name makes it sound like it's:

bool fpregs_is_state_consistent();

and you're supposed to do:

WARN_ON(!fpregs_is_state_consistent());

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 00/11 v3] x86: load FPU registers on return to userland
  2018-10-04 14:05 [PATCH 00/11 v3] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
                   ` (10 preceding siblings ...)
  2018-10-04 14:05 ` [PATCH 11/11] x86/fpu: defer FPU state load until return to userspace Sebastian Andrzej Siewior
@ 2018-10-04 16:45 ` Rik van Riel
  2018-10-04 16:50   ` Andy Lutomirski
  2018-10-05 11:55   ` Sebastian Andrzej Siewior
  11 siblings, 2 replies; 57+ messages in thread
From: Rik van Riel @ 2018-10-04 16:45 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Dave Hansen

[-- Attachment #1: Type: text/plain, Size: 547 bytes --]

On Thu, 2018-10-04 at 16:05 +0200, Sebastian Andrzej Siewior wrote:


> In v3 I dropped that decouple idea. I also learned that the wrpkru
> instruction is not privileged and so caching it in kernel does not
> work.

Wait, so any thread can bypass its memory protection
keys, even if there is a seccomp filter preventing
it from calling the PKRU syscalls?

Is that intended?

Is that simply a hardware limitation, or something
where we can set a flag somewhere to force tasks to
go through the kernel?

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 00/11 v3] x86: load FPU registers on return to userland
  2018-10-04 16:45 ` [PATCH 00/11 v3] x86: load FPU registers on return to userland Rik van Riel
@ 2018-10-04 16:50   ` Andy Lutomirski
  2018-10-05 11:55   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2018-10-04 16:50 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Sebastian Andrzej Siewior, linux-kernel, x86, Andy Lutomirski,
	Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Dave Hansen



> On Oct 4, 2018, at 9:45 AM, Rik van Riel <riel@surriel.com> wrote:
> 
> On Thu, 2018-10-04 at 16:05 +0200, Sebastian Andrzej Siewior wrote:
> 
> 
>> In v3 I dropped that decouple idea. I also learned that the wrpkru
>> instruction is not privileged and so caching it in kernel does not
>> work.
> 
> Wait, so any thread can bypass its memory protection
> keys, even if there is a seccomp filter preventing
> it from calling the PKRU syscalls?
> 
> Is that intended?
> 
> Is that simply a hardware limitation, or something
> where we can set a flag somewhere to force tasks to
> go through the kernel?
> 
> 

Hardware limitation.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 00/11 v3] x86: load FPU registers on return to userland
  2018-10-04 16:45 ` [PATCH 00/11 v3] x86: load FPU registers on return to userland Rik van Riel
  2018-10-04 16:50   ` Andy Lutomirski
@ 2018-10-05 11:55   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-05 11:55 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Dave Hansen

On 2018-10-04 12:45:08 [-0400], Rik van Riel wrote:
> Wait, so any thread can bypass its memory protection
> keys, even if there is a seccomp filter preventing
> it from calling the PKRU syscalls?

We have SYS_pkey_alloc +free and SYS_pkey_mprotect. For read/ write of
the register value, libc is using and opcodes.

> Is that intended?

Either that or it ended like that because someone failed to attend a
meeting where this was discussed. Here is something from pkeys(7):

| Protection  keys  have  the  potential  to  add  a  layer  of security and
| reliability to applications.  But they have not been primarily designed as a
| security feature.  For instance, WRPKRU is a completely unprivileged
| instruction, so pkeys are useless in any case that an attacker controls the
| PKRU register or can execute arbitrary instructions.

Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 01/11] x86/entry: remove _TIF_ALLWORK_MASK
  2018-10-04 14:05 ` [PATCH 01/11] x86/entry: remove _TIF_ALLWORK_MASK Sebastian Andrzej Siewior
@ 2018-10-11 16:27   ` Borislav Petkov
  0 siblings, 0 replies; 57+ messages in thread
From: Borislav Petkov @ 2018-10-11 16:27 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Thu, Oct 04, 2018 at 04:05:37PM +0200, Sebastian Andrzej Siewior wrote:
> There is no user of _TIF_ALLWORK_MASK since commit 21d375b6b34ff
> ("x86/entry/64: Remove the SYSCALL64 fast path").
> Remove unused define _TIF_ALLWORK_MASK.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/include/asm/thread_info.h | 8 --------
>  1 file changed, 8 deletions(-)
> 
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index 2ff2a30a264f4..cd6920674b905 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -136,14 +136,6 @@ struct thread_info {
>  	 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT |	\
>  	 _TIF_NOHZ)
>  
> -/* work to do on any return to user space */
> -#define _TIF_ALLWORK_MASK						\
> -	(_TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME | _TIF_SIGPENDING |	\
> -	 _TIF_NEED_RESCHED | _TIF_SINGLESTEP | _TIF_SYSCALL_EMU |	\
> -	 _TIF_SYSCALL_AUDIT | _TIF_USER_RETURN_NOTIFY | _TIF_UPROBE |	\
> -	 _TIF_PATCH_PENDING | _TIF_NOHZ | _TIF_SYSCALL_TRACEPOINT |	\
> -	 _TIF_FSCHECK)
> -
>  /* flags to check in __switch_to() */
>  #define _TIF_WORK_CTXSW							\
>  	(_TIF_IO_BITMAP|_TIF_NOCPUID|_TIF_NOTSC|_TIF_BLOCKSTEP|_TIF_SSBD)
> -- 

Just a nitpick for the next version: pls start subjects with a capital
letter, i.e.,

x86/entry: Remove _TIF_ALLWORK_MASK

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/11] x86/fpu: make __raw_xsave_addr() use feature number instead of mask
  2018-10-04 14:05 ` [PATCH 03/11] x86/fpu: make __raw_xsave_addr() use feature number instead of mask Sebastian Andrzej Siewior
@ 2018-10-11 17:30   ` Christophe de Dinechin
  2018-10-18 11:19     ` Sebastian Andrzej Siewior
  2018-10-12 15:52   ` Dave Hansen
  2018-10-17 10:01   ` Borislav Petkov
  2 siblings, 1 reply; 57+ messages in thread
From: Christophe de Dinechin @ 2018-10-11 17:30 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen


Sebastian Andrzej Siewior writes:

> Most users of __raw_xsave_addr() use a feature number, shift it to a
> mask and then __raw_xsave_addr() shifts it back to the feature number.
>
> Make __raw_xsave_addr() use the feature number as argument.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/kernel/fpu/xstate.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 87a57b7642d36..38d0b5ea40425 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -811,10 +811,8 @@ void fpu__resume_cpu(void)
>   *
>   * Note: does not work for compacted buffers.
>   */
> -void *__raw_xsave_addr(struct xregs_state *xsave, int xstate_feature_mask)
> +void *__raw_xsave_addr(struct xregs_state *xsave, int feature_nr)

It might be clearer to offer both interfaces, since both are used?

>  {
> -	int feature_nr = fls64(xstate_feature_mask) - 1;
> -
>  	if (!xfeature_enabled(feature_nr)) {
>  		WARN_ON_FPU(1);
>  		return NULL;
> @@ -842,6 +840,7 @@ void *__raw_xsave_addr(struct xregs_state *xsave, int xstate_feature_mask)
>   */
>  void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
>  {
> +	int feature_nr;
>  	/*
>  	 * Do we even *have* xsave state?
>  	 */
> @@ -869,7 +868,8 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
>  	if (!(xsave->header.xfeatures & xstate_feature))
>  		return NULL;
>
> -	return __raw_xsave_addr(xsave, xstate_feature);
> +	feature_nr = fls64(xstate_feature) - 1;
> +	return __raw_xsave_addr(xsave, feature_nr);

and then move that to a wrapper function that takes a mask?

>  }
>  EXPORT_SYMBOL_GPL(get_xsave_addr);
>
> @@ -1018,7 +1018,7 @@ int copy_xstate_to_kernel(void *kbuf, struct xregs_state *xsave, unsigned int of
>  		 * Copy only in-use xstates:
>  		 */
>  		if ((header.xfeatures >> i) & 1) {
> -			void *src = __raw_xsave_addr(xsave, 1 << i);
> +			void *src = __raw_xsave_addr(xsave, i);
>
>  			offset = xstate_offsets[i];
>  			size = xstate_sizes[i];
> @@ -1104,7 +1104,7 @@ int copy_xstate_to_user(void __user *ubuf, struct xregs_state *xsave, unsigned i
>  		 * Copy only in-use xstates:
>  		 */
>  		if ((header.xfeatures >> i) & 1) {
> -			void *src = __raw_xsave_addr(xsave, 1 << i);
> +			void *src = __raw_xsave_addr(xsave, i);
>
>  			offset = xstate_offsets[i];
>  			size = xstate_sizes[i];
> @@ -1161,7 +1161,7 @@ int copy_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf)
>  		u64 mask = ((u64)1 << i);
>
>  		if (hdr.xfeatures & mask) {
> -			void *dst = __raw_xsave_addr(xsave, 1 << i);
> +			void *dst = __raw_xsave_addr(xsave, i);
>
>  			offset = xstate_offsets[i];
>  			size = xstate_sizes[i];
> @@ -1215,7 +1215,7 @@ int copy_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf)
>  		u64 mask = ((u64)1 << i);
>
>  		if (hdr.xfeatures & mask) {
> -			void *dst = __raw_xsave_addr(xsave, 1 << i);
> +			void *dst = __raw_xsave_addr(xsave, i);
>
>  			offset = xstate_offsets[i];
>  			size = xstate_sizes[i];


--
Cheers,
Christophe de Dinechin (IRC c3d)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/11] x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
  2018-10-04 14:05 ` [PATCH 08/11] x86/fpu: Always store the registers in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
@ 2018-10-11 17:50   ` Christophe de Dinechin
  2018-10-11 21:18     ` Andy Lutomirski
  2018-10-12 18:15   ` Dave Hansen
  1 sibling, 1 reply; 57+ messages in thread
From: Christophe de Dinechin @ 2018-10-11 17:50 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen


Sebastian Andrzej Siewior writes:

> From: Rik van Riel <riel@surriel.com>
>
> copy_fpstate_to_sigframe() has two callers and both invoke the function only if
> fpu->initialized is set.

One of the callers is in a different file.
Not sure if that warrants some kind of check that
fpu->initialized is set?

> So the check in the function for ->initialized makes
> no sense. It might be a relict from the lazy-FPU time: If the FPU
> registers

s/relict/relic/

> were "loaded" then we could save them directly. Otherwise (the FPU
> registers are not up to date) they are saved in the FPU struct and
> it could be used for memcpy().

>
> Since the registers always loaded at this point, save them and copy
> later.

register [are] always loaded

> This code is extracted from an earlier version of the patchset while
> there still was lazy-FPU on x86. This is a preparation for loading the
> FPU registers on return to userland.

maybe "only on return"?

>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/include/asm/fpu/internal.h | 45 ----------------------------
>  arch/x86/kernel/fpu/signal.c        | 46 +++++++----------------------
>  2 files changed, 11 insertions(+), 80 deletions(-)
>
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 4ecaf4d22954e..df8816be3efdd 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -126,22 +126,6 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
>  		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_fprestore)	\
>  		     : output : input)
>
> -static inline int copy_fregs_to_user(struct fregs_state __user *fx)
> -{
> -	return user_insn(fnsave %[fx]; fwait,  [fx] "=m" (*fx), "m" (*fx));
> -}
> -
> -static inline int copy_fxregs_to_user(struct fxregs_state __user *fx)
> -{
> -	if (IS_ENABLED(CONFIG_X86_32))
> -		return user_insn(fxsave %[fx], [fx] "=m" (*fx), "m" (*fx));
> -	else if (IS_ENABLED(CONFIG_AS_FXSAVEQ))
> -		return user_insn(fxsaveq %[fx], [fx] "=m" (*fx), "m" (*fx));
> -
> -	/* See comment in copy_fxregs_to_kernel() below. */
> -	return user_insn(rex64/fxsave (%[fx]), "=m" (*fx), [fx] "R" (fx));
> -}
> -
>  static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
>  {
>  	if (IS_ENABLED(CONFIG_X86_32)) {
> @@ -352,35 +336,6 @@ static inline void copy_kernel_to_xregs(struct xregs_state *xstate, u64 mask)
>  	XSTATE_XRESTORE(xstate, lmask, hmask);
>  }
>
> -/*
> - * Save xstate to user space xsave area.
> - *
> - * We don't use modified optimization because xrstor/xrstors might track
> - * a different application.
> - *
> - * We don't use compacted format xsave area for
> - * backward compatibility for old applications which don't understand
> - * compacted format of xsave area.
> - */
> -static inline int copy_xregs_to_user(struct xregs_state __user *buf)
> -{
> -	int err;
> -
> -	/*
> -	 * Clear the xsave header first, so that reserved fields are
> -	 * initialized to zero.
> -	 */
> -	err = __clear_user(&buf->header, sizeof(buf->header));
> -	if (unlikely(err))
> -		return -EFAULT;
> -
> -	stac();
> -	XSTATE_OP(XSAVE, buf, -1, -1, err);
> -	clac();
> -
> -	return err;
> -}
> -
>  /*
>   * Restore xstate from user space xsave area.
>   */
> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index 23f1691670b66..c8f5ff58578ed 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -117,23 +117,6 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame)
>
>  	return err;
>  }
> -
> -static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
> -{
> -	int err;
> -
> -	if (use_xsave())
> -		err = copy_xregs_to_user(buf);
> -	else if (use_fxsr())
> -		err = copy_fxregs_to_user((struct fxregs_state __user *) buf);
> -	else
> -		err = copy_fregs_to_user((struct fregs_state __user *) buf);
> -
> -	if (unlikely(err) && __clear_user(buf, fpu_user_xstate_size))
> -		err = -EFAULT;
> -	return err;
> -}
> -
>  /*
>   * Save the fpu, extended register state to the user signal frame.
>   *

I would modify the function comment to add the new expectation that
fpu->initialized must be set on entry.

> @@ -172,27 +155,20 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
>  			sizeof(struct user_i387_ia32_struct), NULL,
>  			(struct _fpstate_32 __user *) buf) ? -1 : 1;
>
> -	if (fpu->initialized || using_compacted_format()) {
> -		/* Save the live register state to the user directly. */
> -		if (copy_fpregs_to_sigframe(buf_fx))
> -			return -1;
> -		/* Update the thread's fxstate to save the fsave header. */
> -		if (ia32_fxstate)
> -			copy_fxregs_to_kernel(fpu);
> +	/* Update the thread's fxstate to save the fsave header. */
> +	if (ia32_fxstate) {
> +		copy_fxregs_to_kernel(fpu);
>  	} else {
> -		/*
> -		 * It is a *bug* if kernel uses compacted-format for xsave
> -		 * area and we copy it out directly to a signal frame. It
> -		 * should have been handled above by saving the registers
> -		 * directly.
> -		 */
> -		if (boot_cpu_has(X86_FEATURE_XSAVES)) {
> -			WARN_ONCE(1, "x86/fpu: saving compacted-format xsave area to a signal frame!\n");
> -			return -1;
> -		}
> +		copy_fpregs_to_fpstate(fpu);
> +		fpregs_deactivate(fpu);
> +	}
>
> +	if (using_compacted_format()) {
> +		copy_xstate_to_user(buf_fx, xsave, 0, size);
> +	} else {
>  		fpstate_sanitize_xstate(fpu);
> -		if (__copy_to_user(buf_fx, xsave, fpu_user_xstate_size))
> +		size = fpu_user_xstate_size;
> +		if (__copy_to_user(buf_fx, xsave, size))
>  			return -1;
>  	}


--
Cheers,
Christophe de Dinechin (IRC c3d)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/11] x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
  2018-10-11 17:50   ` Christophe de Dinechin
@ 2018-10-11 21:18     ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2018-10-11 21:18 UTC (permalink / raw)
  To: christophe.de.dinechin
  Cc: Sebastian Andrzej Siewior, LKML, X86 ML, Andrew Lutomirski,
	Paolo Bonzini, Radim Krcmar, kvm list, Jason A. Donenfeld,
	Rik van Riel, Dave Hansen

On Thu, Oct 11, 2018 at 10:50 AM Christophe de Dinechin
<christophe.de.dinechin@gmail.com> wrote:
>
>
> Sebastian Andrzej Siewior writes:
>
> > From: Rik van Riel <riel@surriel.com>
> >
> > copy_fpstate_to_sigframe() has two callers and both invoke the function only if
> > fpu->initialized is set.
>
> One of the callers is in a different file.
> Not sure if that warrants some kind of check that
> fpu->initialized is set?

I still kind of think that fpu->initialized should be removed entirely
as part of this series.  Keeping unnecessary complication around in
the FPU code increaseds the scope for interesting bugs and makes
understanding the code considerably harder.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/11] x86/fpu: make __raw_xsave_addr() use feature number instead of mask
  2018-10-04 14:05 ` [PATCH 03/11] x86/fpu: make __raw_xsave_addr() use feature number instead of mask Sebastian Andrzej Siewior
  2018-10-11 17:30   ` Christophe de Dinechin
@ 2018-10-12 15:52   ` Dave Hansen
  2018-10-18 11:17     ` Sebastian Andrzej Siewior
  2018-10-17 10:01   ` Borislav Petkov
  2 siblings, 1 reply; 57+ messages in thread
From: Dave Hansen @ 2018-10-12 15:52 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel

On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> Most users of __raw_xsave_addr() use a feature number, shift it to a
> mask and then __raw_xsave_addr() shifts it back to the feature number.
> 
> Make __raw_xsave_addr() use the feature number as argument.

This generally looks like a nice cleanup.  Thanks for taking a look at it!

> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 87a57b7642d36..38d0b5ea40425 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -811,10 +811,8 @@ void fpu__resume_cpu(void)
>   *
>   * Note: does not work for compacted buffers.
>   */
> -void *__raw_xsave_addr(struct xregs_state *xsave, int xstate_feature_mask)
> +void *__raw_xsave_addr(struct xregs_state *xsave, int feature_nr)
>  {

Could we call this 'xfeature_nr' consistently?

> -	int feature_nr = fls64(xstate_feature_mask) - 1;
> -
>  	if (!xfeature_enabled(feature_nr)) {
>  		WARN_ON_FPU(1);
>  		return NULL;
> @@ -842,6 +840,7 @@ void *__raw_xsave_addr(struct xregs_state *xsave, int xstate_feature_mask)
>   */
>  void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
>  {
> +	int feature_nr;
>  	/*
>  	 * Do we even *have* xsave state?
>  	 */
> @@ -869,7 +868,8 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
>  	if (!(xsave->header.xfeatures & xstate_feature))
>  		return NULL;
>  
> -	return __raw_xsave_addr(xsave, xstate_feature);
> +	feature_nr = fls64(xstate_feature) - 1;
> +	return __raw_xsave_addr(xsave, feature_nr);
>  }

Should we also be using a feature number for get_xsave_addr()?  In other
words, could you take a look and see how widely we should be doing this
kind of conversion and not just limit it to __raw_xsave_addr()?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/11] x86/fpu: eager switch PKRU state
  2018-10-04 14:05 ` [PATCH 04/11] x86/fpu: eager switch PKRU state Sebastian Andrzej Siewior
@ 2018-10-12 17:51   ` Dave Hansen
  2018-10-12 18:09     ` Andy Lutomirski
  2018-10-18 16:13     ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 57+ messages in thread
From: Dave Hansen @ 2018-10-12 17:51 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel

On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> From: Rik van Riel <riel@surriel.com>
> 
> While most of a task's FPU state is only needed in user space,
> the protection keys need to be in place immediately after a
> context switch.
> 
> The reason is that any accesses to userspace memory while running
> in kernel mode also need to abide by the memory permissions
> specified in the protection keys.
> 
> The "eager switch" is a preparation for loading the FPU state on return
> to userland. Instead of decoupling PKRU state from xstate I update PKRU
> within xstate on write operations by the kernel.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> [bigeasy: save pkru to xstate, no cache]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/include/asm/fpu/internal.h | 20 +++++++++++++++----
>  arch/x86/include/asm/fpu/xstate.h   |  2 ++
>  arch/x86/include/asm/pgtable.h      |  6 +-----
>  arch/x86/include/asm/pkeys.h        |  2 +-
>  arch/x86/kernel/fpu/core.c          |  2 +-
>  arch/x86/mm/pkeys.c                 | 31 ++++++++++++++++++++++-------
>  include/linux/pkeys.h               |  2 +-
>  7 files changed, 46 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 16c4077ffc945..956d967ca824a 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -570,11 +570,23 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
>   */
>  static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
>  {
> -	bool preload = static_cpu_has(X86_FEATURE_FPU) &&
> -		       new_fpu->initialized;
> +	bool load_fpu;
>  
> -	if (preload)
> -		__fpregs_load_activate(new_fpu, cpu);
> +	load_fpu = static_cpu_has(X86_FEATURE_FPU) && new_fpu->initialized;
> +	if (!load_fpu)
> +		return;

Needs comments, please.  Especially around what an uninitialized new_fpu
means.

> +	__fpregs_load_activate(new_fpu, cpu);
> +
> +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> +	if (static_cpu_has(X86_FEATURE_OSPKE)) {

FWIW, you should be able to use cpu_feature_enabled() instead of an
explicit #ifdef here.

> +		struct pkru_state *pk;
> +
> +		pk = __raw_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
> +		if (pk->pkru != __read_pkru())
> +			__write_pkru(pk->pkru);
> +	}
> +#endif
>  }

Comments here as well, please.

I think the goal is to keep the PKRU state in the 'init state' when
possible and also to save the cost of WRPKRU.  But, it would be really
nice to be explicit.

> -static inline void write_pkru(u32 pkru)
> -{
> -	if (boot_cpu_has(X86_FEATURE_OSPKE))
> -		__write_pkru(pkru);
> -}
> +void write_pkru(u32 pkru);

One reason I inlined this was because it enables the the PK code to be
optimized away entirely.  Putting the checks behind a function call
makes this optimization impossible.

Could you elaborate on why you chose to do this and what you think the
impact is or is not?

> diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
> index 19b137f1b3beb..b184f916319e5 100644
> --- a/arch/x86/include/asm/pkeys.h
> +++ b/arch/x86/include/asm/pkeys.h
> @@ -119,7 +119,7 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
>  		unsigned long init_val);
>  extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
>  		unsigned long init_val);
> -extern void copy_init_pkru_to_fpregs(void);
> +extern void pkru_set_init_value(void);

Could you elaborate on why the name is being changed?

> +void write_pkru(u32 pkru)
> +{
> +	struct pkru_state *pk;
> +
> +	if (!boot_cpu_has(X86_FEATURE_OSPKE))
> +		return;
> +
> +	pk = __raw_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> +	/*
> +	 * Update the PKRU value in cstate and then in the CPU. A context

"cstate"?  Did you mean xstate?

> +	 * switch between those two operation would load the new value from the
> +	 * updated xstate and then we would write (the same value) to the CPU.
> +	 */
> +	pk->pkru = pkru;
> +	__write_pkru(pkru);
> +
> +}

There's an unnecessary line there.

This also needs a lot more high-level context about why it is necessary.
 I think you had that in the changelog, but we also need the function
commented.


> -void copy_init_pkru_to_fpregs(void)
> +void pkru_set_init_value(void)
>  {
>  	u32 init_pkru_value_snapshot = READ_ONCE(init_pkru_value);
> +
>  	/*
>  	 * Any write to PKRU takes it out of the XSAVE 'init
>  	 * state' which increases context switch cost.  Avoid
> -	 * writing 0 when PKRU was already 0.
> +	 * writing then same value which is already written.
>  	 */

s/then/the/

> -	if (!init_pkru_value_snapshot && !read_pkru())
> +	if (init_pkru_value_snapshot == read_pkru())
>  		return;
> -	/*
> -	 * Override the PKRU state that came from 'init_fpstate'
> -	 * with the baseline from the process.
> -	 */
> +
>  	write_pkru(init_pkru_value_snapshot);
>  }

Isn't this doing some of the same work (including rdpkru()) as write_pkru()?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-04 14:05 ` [PATCH 05/11] x86/fpu: set PKRU state for kernel threads Sebastian Andrzej Siewior
@ 2018-10-12 17:54   ` Dave Hansen
  2018-10-12 18:02     ` Andy Lutomirski
  0 siblings, 1 reply; 57+ messages in thread
From: Dave Hansen @ 2018-10-12 17:54 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel

On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> The PKRU value is not set for kernel threads because they do not have
> the ->initialized value set. As a result the kernel thread has a random
> PKRU value set which it inherits from the previous task.
> It has been suggested by Paolo Bonzini to set it for kernel threads, too
> because it might be a fix.
> I *think* this is not required because the kernel threads don't copy
> data to/from userland and don't have access to any userspace mm in
> general.
> However there is this use_mm(). If we gain a mm by use_mm() we don't
> have a matching PKRU value because those are per thread. It has been
> suggested to use 0 as the PKRU value but this would bypass PKRU.
> 
> Set the initial (default) PKRU value for kernel threads.

We might want to do this for cleanliness reasons...  Maybe.

But this *should* have no practical effects.  Kernel threads have no
real 'mm' and no user pages.  They should not have do access to user
mappings.  Protection keys *only* apply to user mappings.  Thus,
logically, they should never be affected by PKRU values.

So I'm kinda missing the point of the patch.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 06/11] x86/pkeys: make init_pkru_value static
  2018-10-04 14:05 ` [PATCH 06/11] x86/pkeys: make init_pkru_value static Sebastian Andrzej Siewior
@ 2018-10-12 17:55   ` Dave Hansen
  0 siblings, 0 replies; 57+ messages in thread
From: Dave Hansen @ 2018-10-12 17:55 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel

On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> The variable init_pkru_value isn't used outside of this file.
> Make init_pkru_value static.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Looks good.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/11] x86/pkeys: Drop the preempt-disable section
  2018-10-04 14:05 ` [PATCH 07/11] x86/pkeys: Drop the preempt-disable section Sebastian Andrzej Siewior
@ 2018-10-12 17:58   ` Dave Hansen
  2018-10-12 18:07     ` Andy Lutomirski
  0 siblings, 1 reply; 57+ messages in thread
From: Dave Hansen @ 2018-10-12 17:58 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel

On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> The fpu->initialized flag should not be changed underneath us. This might be a
> fallout during the removal of the LazyFPU support. The FPU is marked
> initialized as soon as the state has been set to an initial value. It does not
> signal if the CPU's FPU registers are loaded.

If this is true, then the check for fpu->initialized should also go away.

If we were paranoid, we might also add a WARN_ON_ONCE() to make sure our
assumption holds true.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-12 17:54   ` Dave Hansen
@ 2018-10-12 18:02     ` Andy Lutomirski
  2018-10-18 16:26       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2018-10-12 18:02 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sebastian Andrzej Siewior, LKML, X86 ML, Andrew Lutomirski,
	Paolo Bonzini, Radim Krcmar, kvm list, Jason A. Donenfeld,
	Rik van Riel

On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen
<dave.hansen@linux.intel.com> wrote:
>
> On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> > The PKRU value is not set for kernel threads because they do not have
> > the ->initialized value set. As a result the kernel thread has a random
> > PKRU value set which it inherits from the previous task.
> > It has been suggested by Paolo Bonzini to set it for kernel threads, too
> > because it might be a fix.
> > I *think* this is not required because the kernel threads don't copy
> > data to/from userland and don't have access to any userspace mm in
> > general.
> > However there is this use_mm(). If we gain a mm by use_mm() we don't
> > have a matching PKRU value because those are per thread. It has been
> > suggested to use 0 as the PKRU value but this would bypass PKRU.
> >
> > Set the initial (default) PKRU value for kernel threads.
>
> We might want to do this for cleanliness reasons...  Maybe.
>
> But this *should* have no practical effects.  Kernel threads have no
> real 'mm' and no user pages.  They should not have do access to user
> mappings.  Protection keys *only* apply to user mappings.  Thus,
> logically, they should never be affected by PKRU values.
>
> So I'm kinda missing the point of the patch.

use_mm().

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/11] x86/pkeys: Drop the preempt-disable section
  2018-10-12 17:58   ` Dave Hansen
@ 2018-10-12 18:07     ` Andy Lutomirski
  2018-10-12 20:26       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2018-10-12 18:07 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sebastian Andrzej Siewior, LKML, X86 ML, Andrew Lutomirski,
	Paolo Bonzini, Radim Krcmar, kvm list, Jason A. Donenfeld,
	Rik van Riel

On Fri, Oct 12, 2018 at 10:58 AM Dave Hansen
<dave.hansen@linux.intel.com> wrote:
>
> On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> > The fpu->initialized flag should not be changed underneath us. This might be a
> > fallout during the removal of the LazyFPU support. The FPU is marked
> > initialized as soon as the state has been set to an initial value. It does not
> > signal if the CPU's FPU registers are loaded.
>
> If this is true, then the check for fpu->initialized should also go away.
>

All these "if" statements in the FPU code are messy and make
understanding and reviewing this code hard.

Can you prepare a patch for the beginning of your series that removes
fpu->initialized and just ensures that the fpu is always initialized?
This will regress performance, but you should get all that performance
back with TIF_LOAD_FPU.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/11] x86/fpu: eager switch PKRU state
  2018-10-12 17:51   ` Dave Hansen
@ 2018-10-12 18:09     ` Andy Lutomirski
  2018-10-12 19:44       ` Dave Hansen
  2018-10-18 16:13     ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2018-10-12 18:09 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sebastian Andrzej Siewior, LKML, X86 ML, Andrew Lutomirski,
	Paolo Bonzini, Radim Krcmar, kvm list, Jason A. Donenfeld,
	Rik van Riel

On Fri, Oct 12, 2018 at 10:51 AM Dave Hansen
<dave.hansen@linux.intel.com> wrote:
>
> On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> > From: Rik van Riel <riel@surriel.com>
> >
> > While most of a task's FPU state is only needed in user space,
> > the protection keys need to be in place immediately after a
> > context switch.
> >
> > The reason is that any accesses to userspace memory while running
> > in kernel mode also need to abide by the memory permissions
> > specified in the protection keys.
> >
> > The "eager switch" is a preparation for loading the FPU state on return
> > to userland. Instead of decoupling PKRU state from xstate I update PKRU
> > within xstate on write operations by the kernel.
> >
> > Signed-off-by: Rik van Riel <riel@surriel.com>
> > [bigeasy: save pkru to xstate, no cache]
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > ---
> >  arch/x86/include/asm/fpu/internal.h | 20 +++++++++++++++----
> >  arch/x86/include/asm/fpu/xstate.h   |  2 ++
> >  arch/x86/include/asm/pgtable.h      |  6 +-----
> >  arch/x86/include/asm/pkeys.h        |  2 +-
> >  arch/x86/kernel/fpu/core.c          |  2 +-
> >  arch/x86/mm/pkeys.c                 | 31 ++++++++++++++++++++++-------
> >  include/linux/pkeys.h               |  2 +-
> >  7 files changed, 46 insertions(+), 19 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> > index 16c4077ffc945..956d967ca824a 100644
> > --- a/arch/x86/include/asm/fpu/internal.h
> > +++ b/arch/x86/include/asm/fpu/internal.h
> > @@ -570,11 +570,23 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
> >   */
> >  static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
> >  {
> > -     bool preload = static_cpu_has(X86_FEATURE_FPU) &&
> > -                    new_fpu->initialized;
> > +     bool load_fpu;
> >
> > -     if (preload)
> > -             __fpregs_load_activate(new_fpu, cpu);
> > +     load_fpu = static_cpu_has(X86_FEATURE_FPU) && new_fpu->initialized;
> > +     if (!load_fpu)
> > +             return;
>
> Needs comments, please.  Especially around what an uninitialized new_fpu
> means.

See my other comment about getting rid of ->initialized *first*.

>
> > +     __fpregs_load_activate(new_fpu, cpu);
> > +
> > +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> > +     if (static_cpu_has(X86_FEATURE_OSPKE)) {
>
> FWIW, you should be able to use cpu_feature_enabled() instead of an
> explicit #ifdef here.
>
> > +             struct pkru_state *pk;
> > +
> > +             pk = __raw_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
> > +             if (pk->pkru != __read_pkru())
> > +                     __write_pkru(pk->pkru);
> > +     }
> > +#endif
> >  }
>
> Comments here as well, please.
>
> I think the goal is to keep the PKRU state in the 'init state' when
> possible and also to save the cost of WRPKRU.  But, it would be really
> nice to be explicit.

I suspect that this makes basically no difference.  PKRU is almost
never in the init state on Linux.  Also, it's a single word -- I doubt
that the init state optimization is worth much.

But maybe WRPKRU is more expensive than RDPKRU and a branch?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/11] x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
  2018-10-04 14:05 ` [PATCH 08/11] x86/fpu: Always store the registers in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
  2018-10-11 17:50   ` Christophe de Dinechin
@ 2018-10-12 18:15   ` Dave Hansen
  2018-11-02 14:42     ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 57+ messages in thread
From: Dave Hansen @ 2018-10-12 18:15 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel

On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> From: Rik van Riel <riel@surriel.com>
> 
> copy_fpstate_to_sigframe() has two callers and both invoke the function only if
> fpu->initialized is set. So the check in the function for ->initialized makes
> no sense. It might be a relict from the lazy-FPU time: If the FPU registers
> were "loaded" then we could save them directly. Otherwise (the FPU
> registers are not up to date) they are saved in the FPU struct and
> it could be used for memcpy().
> 
> Since the registers always loaded at this point, save them and copy
> later.
> This code is extracted from an earlier version of the patchset while
> there still was lazy-FPU on x86. This is a preparation for loading the
> FPU registers on return to userland.

Could you make a pass through all the changelogs at at least make sure
they look sane and consistent?  The text width and paragraph spacing
here looks rather cobbled together.

> ---
>  arch/x86/include/asm/fpu/internal.h | 45 ----------------------------
>  arch/x86/kernel/fpu/signal.c        | 46 +++++++----------------------
>  2 files changed, 11 insertions(+), 80 deletions(-)

The diffstat here is really nice.

> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 4ecaf4d22954e..df8816be3efdd 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -126,22 +126,6 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
>  		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_fprestore)	\
>  		     : output : input)
>  
> -static inline int copy_fregs_to_user(struct fregs_state __user *fx)
> -{
> -	return user_insn(fnsave %[fx]; fwait,  [fx] "=m" (*fx), "m" (*fx));
> -}
> -
> -static inline int copy_fxregs_to_user(struct fxregs_state __user *fx)
> -{
> -	if (IS_ENABLED(CONFIG_X86_32))
> -		return user_insn(fxsave %[fx], [fx] "=m" (*fx), "m" (*fx));
> -	else if (IS_ENABLED(CONFIG_AS_FXSAVEQ))
> -		return user_insn(fxsaveq %[fx], [fx] "=m" (*fx), "m" (*fx));
> -
> -	/* See comment in copy_fxregs_to_kernel() below. */
> -	return user_insn(rex64/fxsave (%[fx]), "=m" (*fx), [fx] "R" (fx));
> -}
> -
>  static inline void copy_kernel_to_fxregs(struct fxregs_state *fx)
>  {
>  	if (IS_ENABLED(CONFIG_X86_32)) {
> @@ -352,35 +336,6 @@ static inline void copy_kernel_to_xregs(struct xregs_state *xstate, u64 mask)
>  	XSTATE_XRESTORE(xstate, lmask, hmask);
>  }
>  
> -/*
> - * Save xstate to user space xsave area.
> - *
> - * We don't use modified optimization because xrstor/xrstors might track
> - * a different application.
> - *
> - * We don't use compacted format xsave area for
> - * backward compatibility for old applications which don't understand
> - * compacted format of xsave area.
> - */
> -static inline int copy_xregs_to_user(struct xregs_state __user *buf)
> -{
> -	int err;
> -
> -	/*
> -	 * Clear the xsave header first, so that reserved fields are
> -	 * initialized to zero.
> -	 */
> -	err = __clear_user(&buf->header, sizeof(buf->header));
> -	if (unlikely(err))
> -		return -EFAULT;
> -
> -	stac();
> -	XSTATE_OP(XSAVE, buf, -1, -1, err);
> -	clac();
> -
> -	return err;
> -}
> -
>  /*
>   * Restore xstate from user space xsave area.
>   */
> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index 23f1691670b66..c8f5ff58578ed 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -117,23 +117,6 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame)
>  
>  	return err;
>  }
> -
> -static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf)
> -{
> -	int err;
> -
> -	if (use_xsave())
> -		err = copy_xregs_to_user(buf);
> -	else if (use_fxsr())
> -		err = copy_fxregs_to_user((struct fxregs_state __user *) buf);
> -	else
> -		err = copy_fregs_to_user((struct fregs_state __user *) buf);
> -
> -	if (unlikely(err) && __clear_user(buf, fpu_user_xstate_size))
> -		err = -EFAULT;
> -	return err;
> -}
> -
>  /*
>   * Save the fpu, extended register state to the user signal frame.
>   *
> @@ -172,27 +155,20 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
>  			sizeof(struct user_i387_ia32_struct), NULL,
>  			(struct _fpstate_32 __user *) buf) ? -1 : 1;
>  
> -	if (fpu->initialized || using_compacted_format()) {
> -		/* Save the live register state to the user directly. */
> -		if (copy_fpregs_to_sigframe(buf_fx))
> -			return -1;
> -		/* Update the thread's fxstate to save the fsave header. */
> -		if (ia32_fxstate)
> -			copy_fxregs_to_kernel(fpu);
> +	/* Update the thread's fxstate to save the fsave header. */
> +	if (ia32_fxstate) {
> +		copy_fxregs_to_kernel(fpu);
>  	} else {
> -		/*
> -		 * It is a *bug* if kernel uses compacted-format for xsave
> -		 * area and we copy it out directly to a signal frame. It
> -		 * should have been handled above by saving the registers
> -		 * directly.
> -		 */
> -		if (boot_cpu_has(X86_FEATURE_XSAVES)) {
> -			WARN_ONCE(1, "x86/fpu: saving compacted-format xsave area to a signal frame!\n");
> -			return -1;
> -		}
> +		copy_fpregs_to_fpstate(fpu);
> +		fpregs_deactivate(fpu);
> +	}

Could you add a high-level comment for this if{}else{} block that says
something like:

	/* Save the registers to the fpstate. */

I also think it's worthwhile to explain the asymmetry between the
ia32_fxstate case and the other branch.  Why don't we
fpregs_deactivate() in the ia32_fxstate path, for instance?

> +	if (using_compacted_format()) {
> +		copy_xstate_to_user(buf_fx, xsave, 0, size);
> +	} else {
>  		fpstate_sanitize_xstate(fpu);
> -		if (__copy_to_user(buf_fx, xsave, fpu_user_xstate_size))
> +		size = fpu_user_xstate_size;
> +		if (__copy_to_user(buf_fx, xsave, size))
>  			return -1;
>  	}

This seems unnecessary.  Why are you updating 'size' like this?


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/11] x86/fpu: prepare copy_fpstate_to_sigframe for TIF_LOAD_FPU
  2018-10-04 14:05 ` [PATCH 10/11] x86/fpu: prepare copy_fpstate_to_sigframe for TIF_LOAD_FPU Sebastian Andrzej Siewior
@ 2018-10-12 19:40   ` Dave Hansen
  2018-10-15 15:24     ` Borislav Petkov
  2018-11-02 22:55     ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 57+ messages in thread
From: Dave Hansen @ 2018-10-12 19:40 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel

On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> From: Rik van Riel <riel@surriel.com>
> 
> If TIF_LOAD_FPU is set, then the registers are saved (not loaded). In that case
> we skip the saving part.

This sentence hurts my brain.

	"If TIF_LOAD_FPU is set the registers are ... not loaded"

I think that means that something could use some better naming.

Should TIF_LOAD_FPU be TIF_NEED_FPU_LOAD, perhaps?

> Signed-off-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/kernel/fpu/signal.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> index c8f5ff58578ed..979dcd1ed82e0 100644
> --- a/arch/x86/kernel/fpu/signal.c
> +++ b/arch/x86/kernel/fpu/signal.c
> @@ -155,13 +155,17 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
>  			sizeof(struct user_i387_ia32_struct), NULL,
>  			(struct _fpstate_32 __user *) buf) ? -1 : 1;
>  
> -	/* Update the thread's fxstate to save the fsave header. */
> -	if (ia32_fxstate) {
> -		copy_fxregs_to_kernel(fpu);
> -	} else {
> -		copy_fpregs_to_fpstate(fpu);
> -		fpregs_deactivate(fpu);
> +	__fpregs_changes_begin();
> +	if (!test_thread_flag(TIF_LOAD_FPU)) {

This needs commenting, please.

If we do not need to load the FPU at return to userspace, it means the
state is in the the registers, not the buffer.  So, update the buffer to
match the registers.

> +		/* Update the thread's fxstate to save the fsave header. */
> +		if (ia32_fxstate) {
> +			copy_fxregs_to_kernel(fpu);
> +		} else {
> +			copy_fpregs_to_fpstate(fpu);
> +			fpregs_deactivate(fpu);
> +		}
>  	}
> +	__fpregs_changes_end();

Do we really need the __fpregs_changes_*() abstraction for this single
call site?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/11] x86/fpu: eager switch PKRU state
  2018-10-12 18:09     ` Andy Lutomirski
@ 2018-10-12 19:44       ` Dave Hansen
  0 siblings, 0 replies; 57+ messages in thread
From: Dave Hansen @ 2018-10-12 19:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Sebastian Andrzej Siewior, LKML, X86 ML, Paolo Bonzini,
	Radim Krcmar, kvm list, Jason A. Donenfeld, Rik van Riel

On 10/12/2018 11:09 AM, Andy Lutomirski wrote:
> But maybe WRPKRU is more expensive than RDPKRU and a branch?

Yeah, it is more expensive.  It has a higher cycle cost and it's also
practically a (light) speculation barrier.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 11/11] x86/fpu: defer FPU state load until return to userspace
  2018-10-04 16:14   ` Andy Lutomirski
@ 2018-10-12 20:25     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-12 20:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, X86 ML, Paolo Bonzini, Radim Krcmar, kvm list,
	Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2018-10-04 09:14:33 [-0700], Andy Lutomirski wrote:
> > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> > index 3b2490b819181..3dad5c3b335eb 100644
> > --- a/arch/x86/entry/common.c
> > +++ b/arch/x86/entry/common.c
> > @@ -196,6 +197,14 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
> >         if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
> >                 exit_to_usermode_loop(regs, cached_flags);
> >
> > +       /* Reload ti->flags; we may have rescheduled above. */
> > +       cached_flags = READ_ONCE(ti->flags);
> > +
> > +       if (unlikely(cached_flags & _TIF_LOAD_FPU))
> > +               switch_fpu_return();
> > +       else
> > +               fpregs_is_state_consistent();
> 
> Shouldn't this be:
> 
> fpregs_assert_state_consistent();  /* see below */
> 
> if (unlikely(cached_flags & _TIF_LOAD_FPU))
>   switch_fpu_return();

hmm. This should work. 

> > diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
> > index a9caac9d4a729..e3077860f7333 100644
> > --- a/arch/x86/include/asm/fpu/api.h
> > +++ b/arch/x86/include/asm/fpu/api.h
> > @@ -27,6 +27,17 @@ extern void kernel_fpu_begin(void);
> >  extern void kernel_fpu_end(void);
> >  extern bool irq_fpu_usable(void);
> >
> > +#ifdef CONFIG_X86_DEBUG_FPU
> > +extern void fpregs_is_state_consistent(void);
> > +#else
> > +static inline void fpregs_is_state_consistent(void) { }
> > +#endif
> 
> Can you name this something like fpregs_assert_state_consistent()?
sure.

Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 07/11] x86/pkeys: Drop the preempt-disable section
  2018-10-12 18:07     ` Andy Lutomirski
@ 2018-10-12 20:26       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-12 20:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, LKML, X86 ML, Paolo Bonzini, Radim Krcmar, kvm list,
	Jason A. Donenfeld, Rik van Riel

On 2018-10-12 11:07:28 [-0700], Andy Lutomirski wrote:
> All these "if" statements in the FPU code are messy and make
> understanding and reviewing this code hard.
> 
> Can you prepare a patch for the beginning of your series that removes
> fpu->initialized and just ensures that the fpu is always initialized?
> This will regress performance, but you should get all that performance
> back with TIF_LOAD_FPU.

okay, will do.

Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/11] x86/fpu: prepare copy_fpstate_to_sigframe for TIF_LOAD_FPU
  2018-10-12 19:40   ` Dave Hansen
@ 2018-10-15 15:24     ` Borislav Petkov
  2018-11-02 15:44       ` Sebastian Andrzej Siewior
  2018-11-02 22:55     ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 57+ messages in thread
From: Borislav Petkov @ 2018-10-15 15:24 UTC (permalink / raw)
  To: Dave Hansen, Sebastian Andrzej Siewior, Rik van Riel
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld

On Fri, Oct 12, 2018 at 12:40:19PM -0700, Dave Hansen wrote:
> > +	__fpregs_changes_end();
> 
> Do we really need the __fpregs_changes_*() abstraction for this single
> call site?

Yap, I'm staring at those in patch 2, there's no documentation there what
they're supposed to do, only the commit message of patch 11 says:

"The __fpregs_changes_{begin|end}() section ensures that the register
remain unchanged. Otherwise a context switch or a BH could save the
registers to its FPU context and processor's FPU register would remain
random."

So I'd say we should drop that abstraction, use preempt_* and put that
text above the single usage site.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/11] x86/fpu: make __raw_xsave_addr() use feature number instead of mask
  2018-10-04 14:05 ` [PATCH 03/11] x86/fpu: make __raw_xsave_addr() use feature number instead of mask Sebastian Andrzej Siewior
  2018-10-11 17:30   ` Christophe de Dinechin
  2018-10-12 15:52   ` Dave Hansen
@ 2018-10-17 10:01   ` Borislav Petkov
  2018-10-18 11:48     ` Sebastian Andrzej Siewior
  2 siblings, 1 reply; 57+ messages in thread
From: Borislav Petkov @ 2018-10-17 10:01 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On Thu, Oct 04, 2018 at 04:05:39PM +0200, Sebastian Andrzej Siewior wrote:
> Most users of __raw_xsave_addr() use a feature number, shift it to a
> mask and then __raw_xsave_addr() shifts it back to the feature number.
> 
> Make __raw_xsave_addr() use the feature number as argument.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/kernel/fpu/xstate.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 87a57b7642d36..38d0b5ea40425 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -811,10 +811,8 @@ void fpu__resume_cpu(void)
>   *
>   * Note: does not work for compacted buffers.
>   */
> -void *__raw_xsave_addr(struct xregs_state *xsave, int xstate_feature_mask)
> +void *__raw_xsave_addr(struct xregs_state *xsave, int feature_nr)

Don't forget to update the comment above it too - it still says "mask".

Otherwise nice.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/11] x86/fpu: make __raw_xsave_addr() use feature number instead of mask
  2018-10-12 15:52   ` Dave Hansen
@ 2018-10-18 11:17     ` Sebastian Andrzej Siewior
  2018-10-18 11:21       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-18 11:17 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel

On 2018-10-12 08:52:03 [-0700], Dave Hansen wrote:
> > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> > index 87a57b7642d36..38d0b5ea40425 100644
> > --- a/arch/x86/kernel/fpu/xstate.c
> > +++ b/arch/x86/kernel/fpu/xstate.c
> > @@ -811,10 +811,8 @@ void fpu__resume_cpu(void)
> >   *
> >   * Note: does not work for compacted buffers.
> >   */
> > -void *__raw_xsave_addr(struct xregs_state *xsave, int xstate_feature_mask)
> > +void *__raw_xsave_addr(struct xregs_state *xsave, int feature_nr)
> >  {
> 
> Could we call this 'xfeature_nr' consistently?
yes.

> Should we also be using a feature number for get_xsave_addr()?  In other
> words, could you take a look and see how widely we should be doing this
> kind of conversion and not just limit it to __raw_xsave_addr()?

this brings us to:

Subject: [PATCH] x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() use
 feature number instead of mask

After changing the argument of __raw_xsave_addr() from a mask to number
Dave suggested to check if it makes sense to do the same for
get_xsave_addr(). As it turns out it does. Only get_xsave_addr() needs
the mask to check if the requested feature is part of what is
support/saved and then uses the number again. The shift operation is
cheaper compared to "find last bit set". Also, the feature number uses
less opcode space compared to the mask :)

Make get_xsave_addr() argument a xfeature number instead of mask and fix
up its callers.
As part of this use xfeature_nr and xfeature_mask consistently.
This results in changes to the kvm code as:
	feature -> xfeature_mask
	index -> xfeature_nr

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/xstate.h |  2 +-
 arch/x86/kernel/fpu/xstate.c      | 23 +++++++++++------------
 arch/x86/kernel/traps.c           |  2 +-
 arch/x86/kvm/x86.c                | 28 ++++++++++++++--------------
 arch/x86/mm/mpx.c                 |  6 +++---
 5 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 48581988d78c7..a0b6062012ed7 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -47,7 +47,7 @@ extern void __init update_regset_xstate_info(unsigned int size,
 
 void fpu__xstate_clear_all_cpu_caps(void);
 void *get_xsave_addr(struct xregs_state *xsave, int xstate);
-const void *get_xsave_field_ptr(int xstate_field);
+const void *get_xsave_field_ptr(int xfeature_nr);
 int using_compacted_format(void);
 int copy_xstate_to_kernel(void *kbuf, struct xregs_state *xsave, unsigned int offset, unsigned int size);
 int copy_xstate_to_user(void __user *ubuf, struct xregs_state *xsave, unsigned int offset, unsigned int size);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 3dfe3627deaf6..375226055a413 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -832,15 +832,15 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
  *
  * Inputs:
  *	xstate: the thread's storage area for all FPU data
- *	xstate_feature: state which is defined in xsave.h (e.g.
- *	XFEATURE_MASK_FP, XFEATURE_MASK_SSE, etc...)
+ *	xfeature_nr: state which is defined in xsave.h (e.g. XFEATURE_FP,
+ *	XFEATURE_SSE, etc...)
  * Output:
  *	address of the state in the xsave area, or NULL if the
  *	field is not present in the xsave buffer.
  */
-void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
+void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 {
-	int xfeature_nr;
+	u64 xfeature_mask = 1ULL << xfeature_nr;
 	/*
 	 * Do we even *have* xsave state?
 	 */
@@ -852,11 +852,11 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
 	 * have not enabled.  Remember that pcntxt_mask is
 	 * what we write to the XCR0 register.
 	 */
-	WARN_ONCE(!(xfeatures_mask & xstate_feature),
+	WARN_ONCE(!(xfeatures_mask & xfeature_mask),
 		  "get of unsupported state");
 	/*
 	 * This assumes the last 'xsave*' instruction to
-	 * have requested that 'xstate_feature' be saved.
+	 * have requested that 'xfeature_mask' be saved.
 	 * If it did not, we might be seeing and old value
 	 * of the field in the buffer.
 	 *
@@ -865,10 +865,9 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
 	 * or because the "init optimization" caused it
 	 * to not be saved.
 	 */
-	if (!(xsave->header.xfeatures & xstate_feature))
+	if (!(xsave->header.xfeatures & xfeature_mask))
 		return NULL;
 
-	xfeature_nr = fls64(xstate_feature) - 1;
 	return __raw_xsave_addr(xsave, xfeature_nr);
 }
 EXPORT_SYMBOL_GPL(get_xsave_addr);
@@ -884,13 +883,13 @@ EXPORT_SYMBOL_GPL(get_xsave_addr);
  * Note that this only works on the current task.
  *
  * Inputs:
- *	@xsave_state: state which is defined in xsave.h (e.g. XFEATURE_MASK_FP,
- *	XFEATURE_MASK_SSE, etc...)
+ *	@xfeature_nr: state which is defined in xsave.h (e.g. XFEATURE_FP,
+ *	XFEATURE_SSE, etc...)
  * Output:
  *	address of the state in the xsave area or NULL if the state
  *	is not present or is in its 'init state'.
  */
-const void *get_xsave_field_ptr(int xsave_state)
+const void *get_xsave_field_ptr(int xfeature_nr)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
@@ -900,7 +899,7 @@ const void *get_xsave_field_ptr(int xsave_state)
 	 */
 	fpu__save(fpu);
 
-	return get_xsave_addr(&fpu->state.xsave, xsave_state);
+	return get_xsave_addr(&fpu->state.xsave, xfeature_nr);
 }
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index e6db475164ede..aa4703d12e084 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -477,7 +477,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 	 * which is all zeros which indicates MPX was not
 	 * responsible for the exception.
 	 */
-	bndcsr = get_xsave_field_ptr(XFEATURE_MASK_BNDCSR);
+	bndcsr = get_xsave_field_ptr(XFEATURE_BNDCSR);
 	if (!bndcsr)
 		goto exit_trap;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ca717737347e6..78006ea3cf46f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3533,15 +3533,15 @@ static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 	 */
 	valid = xstate_bv & ~XFEATURE_MASK_FPSSE;
 	while (valid) {
-		u64 feature = valid & -valid;
-		int index = fls64(feature) - 1;
-		void *src = get_xsave_addr(xsave, feature);
+		u64 xfeature_mask = valid & -valid;
+		int xfeature_nr = fls64(xfeature_mask) - 1;
+		void *src = get_xsave_addr(xsave, xfeature_nr);
 
 		if (src) {
 			u32 size, offset, ecx, edx;
-			cpuid_count(XSTATE_CPUID, index,
+			cpuid_count(XSTATE_CPUID, xfeature_nr,
 				    &size, &offset, &ecx, &edx);
-			if (feature == XFEATURE_MASK_PKRU)
+			if (xfeature_nr == XFEATURE_PKRU)
 				memcpy(dest + offset, &vcpu->arch.pkru,
 				       sizeof(vcpu->arch.pkru));
 			else
@@ -3549,7 +3549,7 @@ static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
 
 		}
 
-		valid -= feature;
+		valid -= xfeature_mask;
 	}
 }
 
@@ -3576,22 +3576,22 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
 	 */
 	valid = xstate_bv & ~XFEATURE_MASK_FPSSE;
 	while (valid) {
-		u64 feature = valid & -valid;
-		int index = fls64(feature) - 1;
-		void *dest = get_xsave_addr(xsave, feature);
+		u64 xfeature_mask = valid & -valid;
+		int xfeature_nr = fls64(xfeature_mask) - 1;
+		void *dest = get_xsave_addr(xsave, xfeature_nr);
 
 		if (dest) {
 			u32 size, offset, ecx, edx;
-			cpuid_count(XSTATE_CPUID, index,
+			cpuid_count(XSTATE_CPUID, xfeature_nr,
 				    &size, &offset, &ecx, &edx);
-			if (feature == XFEATURE_MASK_PKRU)
+			if (xfeature_nr == XFEATURE_PKRU)
 				memcpy(&vcpu->arch.pkru, src + offset,
 				       sizeof(vcpu->arch.pkru));
 			else
 				memcpy(dest, src + offset, size);
 		}
 
-		valid -= feature;
+		valid -= xfeature_mask;
 	}
 }
 
@@ -8562,11 +8562,11 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 		if (init_event)
 			kvm_put_guest_fpu(vcpu);
 		mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave,
-					XFEATURE_MASK_BNDREGS);
+					XFEATURE_BNDREGS);
 		if (mpx_state_buffer)
 			memset(mpx_state_buffer, 0, sizeof(struct mpx_bndreg_state));
 		mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave,
-					XFEATURE_MASK_BNDCSR);
+					XFEATURE_BNDCSR);
 		if (mpx_state_buffer)
 			memset(mpx_state_buffer, 0, sizeof(struct mpx_bndcsr));
 		if (init_event)
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index e500949bae245..a50c3fe11c6ff 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -145,7 +145,7 @@ siginfo_t *mpx_generate_siginfo(struct pt_regs *regs)
 		goto err_out;
 	}
 	/* get bndregs field from current task's xsave area */
-	bndregs = get_xsave_field_ptr(XFEATURE_MASK_BNDREGS);
+	bndregs = get_xsave_field_ptr(XFEATURE_BNDREGS);
 	if (!bndregs) {
 		err = -EINVAL;
 		goto err_out;
@@ -202,7 +202,7 @@ static __user void *mpx_get_bounds_dir(void)
 	 * The bounds directory pointer is stored in a register
 	 * only accessible if we first do an xsave.
 	 */
-	bndcsr = get_xsave_field_ptr(XFEATURE_MASK_BNDCSR);
+	bndcsr = get_xsave_field_ptr(XFEATURE_BNDCSR);
 	if (!bndcsr)
 		return MPX_INVALID_BOUNDS_DIR;
 
@@ -388,7 +388,7 @@ static int do_mpx_bt_fault(void)
 	const struct mpx_bndcsr *bndcsr;
 	struct mm_struct *mm = current->mm;
 
-	bndcsr = get_xsave_field_ptr(XFEATURE_MASK_BNDCSR);
+	bndcsr = get_xsave_field_ptr(XFEATURE_BNDCSR);
 	if (!bndcsr)
 		return -EINVAL;
 	/*
-- 
2.19.1


Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/11] x86/fpu: make __raw_xsave_addr() use feature number instead of mask
  2018-10-11 17:30   ` Christophe de Dinechin
@ 2018-10-18 11:19     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-18 11:19 UTC (permalink / raw)
  To: Christophe de Dinechin
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2018-10-11 19:30:07 [+0200], Christophe de Dinechin wrote:
> > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> > index 87a57b7642d36..38d0b5ea40425 100644
> > --- a/arch/x86/kernel/fpu/xstate.c
> > +++ b/arch/x86/kernel/fpu/xstate.c
> > @@ -811,10 +811,8 @@ void fpu__resume_cpu(void)
> >   *
> >   * Note: does not work for compacted buffers.
> >   */
> > -void *__raw_xsave_addr(struct xregs_state *xsave, int xstate_feature_mask)
> > +void *__raw_xsave_addr(struct xregs_state *xsave, int feature_nr)
> 
> It might be clearer to offer both interfaces, since both are used?
> 
> > @@ -869,7 +868,8 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
> >  	if (!(xsave->header.xfeatures & xstate_feature))
> >  		return NULL;
> >
> > -	return __raw_xsave_addr(xsave, xstate_feature);
> > +	feature_nr = fls64(xstate_feature) - 1;
> > +	return __raw_xsave_addr(xsave, feature_nr);
> 
> and then move that to a wrapper function that takes a mask?

made another patch this fls is gone now and everyone is using a number
instead of a mask.

Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/11] x86/fpu: make __raw_xsave_addr() use feature number instead of mask
  2018-10-18 11:17     ` Sebastian Andrzej Siewior
@ 2018-10-18 11:21       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-18 11:21 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel

On 2018-10-18 13:17:19 [+0200], To Dave Hansen wrote:
> --- a/arch/x86/include/asm/fpu/xstate.h
> +++ b/arch/x86/include/asm/fpu/xstate.h
> @@ -47,7 +47,7 @@ extern void __init update_regset_xstate_info(unsigned int size,
>  
>  void fpu__xstate_clear_all_cpu_caps(void);
>  void *get_xsave_addr(struct xregs_state *xsave, int xstate);
and now I locally changed xstate to xfeature_nr.

Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 03/11] x86/fpu: make __raw_xsave_addr() use feature number instead of mask
  2018-10-17 10:01   ` Borislav Petkov
@ 2018-10-18 11:48     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-18 11:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

On 2018-10-17 12:01:43 [+0200], Borislav Petkov wrote:
> > -void *__raw_xsave_addr(struct xregs_state *xsave, int xstate_feature_mask)
> > +void *__raw_xsave_addr(struct xregs_state *xsave, int feature_nr)
> 
> Don't forget to update the comment above it too - it still says "mask".
done.

> Otherwise nice.
thanks.

Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/11] x86/fpu: eager switch PKRU state
  2018-10-12 17:51   ` Dave Hansen
  2018-10-12 18:09     ` Andy Lutomirski
@ 2018-10-18 16:13     ` Sebastian Andrzej Siewior
  2018-10-18 17:50       ` Dave Hansen
  1 sibling, 1 reply; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-18 16:13 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel

On 2018-10-12 10:51:34 [-0700], Dave Hansen wrote:
> > diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> > index 16c4077ffc945..956d967ca824a 100644
> > --- a/arch/x86/include/asm/fpu/internal.h
> > +++ b/arch/x86/include/asm/fpu/internal.h
> > @@ -570,11 +570,23 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
> >   */
> >  static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
> >  {
> > -	bool preload = static_cpu_has(X86_FEATURE_FPU) &&
> > -		       new_fpu->initialized;
> > +	bool load_fpu;
> >  
> > -	if (preload)
> > -		__fpregs_load_activate(new_fpu, cpu);
> > +	load_fpu = static_cpu_has(X86_FEATURE_FPU) && new_fpu->initialized;
> > +	if (!load_fpu)
> > +		return;
> 
> Needs comments, please.  Especially around what an uninitialized new_fpu
> means.

that ->initialized field is gone.

> > +	__fpregs_load_activate(new_fpu, cpu);
> > +
> > +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> > +	if (static_cpu_has(X86_FEATURE_OSPKE)) {
> 
> FWIW, you should be able to use cpu_feature_enabled() instead of an
> explicit #ifdef here.
okay.

> > +		struct pkru_state *pk;
> > +
> > +		pk = __raw_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
> > +		if (pk->pkru != __read_pkru())
> > +			__write_pkru(pk->pkru);
> > +	}
> > +#endif
> >  }
> 
> Comments here as well, please.
> 
> I think the goal is to keep the PKRU state in the 'init state' when
> possible and also to save the cost of WRPKRU.  But, it would be really
> nice to be explicit.

added:
     /*
      * Writting PKRU is expensive. Only write the PKRU value if it is
      * different from the current one.
      */

> > -static inline void write_pkru(u32 pkru)
> > -{
> > -	if (boot_cpu_has(X86_FEATURE_OSPKE))
> > -		__write_pkru(pkru);
> > -}
> > +void write_pkru(u32 pkru);
> 
> One reason I inlined this was because it enables the the PK code to be
> optimized away entirely.  Putting the checks behind a function call
> makes this optimization impossible.
> 
> Could you elaborate on why you chose to do this and what you think the
> impact is or is not?

One thing let to another. It gets to an include mess once I tried to do
more than just reading / writting the value.
I kept now write_pkru() and added __write_pkru_if_new() which does the
extra pieces. If you don't like the new version we would need to look on
how to make it simpler :)

> > diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
> > index 19b137f1b3beb..b184f916319e5 100644
> > --- a/arch/x86/include/asm/pkeys.h
> > +++ b/arch/x86/include/asm/pkeys.h
> > @@ -119,7 +119,7 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
> >  		unsigned long init_val);
> >  extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
> >  		unsigned long init_val);
> > -extern void copy_init_pkru_to_fpregs(void);
> > +extern void pkru_set_init_value(void);
> 
> Could you elaborate on why the name is being changed?

The function name read like init_pkru value is copied to fpregs save
area which is not the case. I could revert it if you prefer.

> > +void write_pkru(u32 pkru)
> > +{
> > +	struct pkru_state *pk;
> > +
> > +	if (!boot_cpu_has(X86_FEATURE_OSPKE))
> > +		return;
> > +
> > +	pk = __raw_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> > +	/*
> > +	 * Update the PKRU value in cstate and then in the CPU. A context
> 
> "cstate"?  Did you mean xstate?

indeed.

> > +	 * switch between those two operation would load the new value from the
> > +	 * updated xstate and then we would write (the same value) to the CPU.
> > +	 */
> > +	pk->pkru = pkru;
> > +	__write_pkru(pkru);
> > +
> > +}
> 
> There's an unnecessary line there.
> 
> This also needs a lot more high-level context about why it is necessary.
>  I think you had that in the changelog, but we also need the function
> commented.

What is necessary? The manual update of "pk->pkru?

> > -	if (!init_pkru_value_snapshot && !read_pkru())
> > +	if (init_pkru_value_snapshot == read_pkru())
> >  		return;
> > -	/*
> > -	 * Override the PKRU state that came from 'init_fpstate'
> > -	 * with the baseline from the process.
> > -	 */
> > +
> >  	write_pkru(init_pkru_value_snapshot);
> >  }
> 
> Isn't this doing some of the same work (including rdpkru()) as write_pkru()?

Well, yes. Since my write_pkru() checks if the new value the same as the
current I dropped most of the code here.

Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-12 18:02     ` Andy Lutomirski
@ 2018-10-18 16:26       ` Sebastian Andrzej Siewior
  2018-10-18 16:48         ` Andy Lutomirski
  0 siblings, 1 reply; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-18 16:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, LKML, X86 ML, Paolo Bonzini, Radim Krcmar, kvm list,
	Jason A. Donenfeld, Rik van Riel

On 2018-10-12 11:02:18 [-0700], Andy Lutomirski wrote:
> On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen
> <dave.hansen@linux.intel.com> wrote:
> >
> > On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> > > The PKRU value is not set for kernel threads because they do not have
> > > the ->initialized value set. As a result the kernel thread has a random
> > > PKRU value set which it inherits from the previous task.
> > > It has been suggested by Paolo Bonzini to set it for kernel threads, too
> > > because it might be a fix.
> > > I *think* this is not required because the kernel threads don't copy
> > > data to/from userland and don't have access to any userspace mm in
> > > general.
> > > However there is this use_mm(). If we gain a mm by use_mm() we don't
> > > have a matching PKRU value because those are per thread. It has been
> > > suggested to use 0 as the PKRU value but this would bypass PKRU.
> > >
> > > Set the initial (default) PKRU value for kernel threads.
> >
> > We might want to do this for cleanliness reasons...  Maybe.
> >
> > But this *should* have no practical effects.  Kernel threads have no
> > real 'mm' and no user pages.  They should not have do access to user
> > mappings.  Protection keys *only* apply to user mappings.  Thus,
> > logically, they should never be affected by PKRU values.
> >
> > So I'm kinda missing the point of the patch.
> 
> use_mm().

So. I would drop that patch from queue. Anyone feels different about it?

Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-18 16:26       ` Sebastian Andrzej Siewior
@ 2018-10-18 16:48         ` Andy Lutomirski
  2018-10-18 17:47           ` Dave Hansen
  2018-10-18 18:25           ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 57+ messages in thread
From: Andy Lutomirski @ 2018-10-18 16:48 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Andy Lutomirski, Dave Hansen, LKML, X86 ML, Paolo Bonzini,
	Radim Krcmar, kvm list, Jason A. Donenfeld, Rik van Riel



> On Oct 18, 2018, at 9:26 AM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
>> On 2018-10-12 11:02:18 [-0700], Andy Lutomirski wrote:
>> On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen
>> <dave.hansen@linux.intel.com> wrote:
>>> 
>>>> On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
>>>> The PKRU value is not set for kernel threads because they do not have
>>>> the ->initialized value set. As a result the kernel thread has a random
>>>> PKRU value set which it inherits from the previous task.
>>>> It has been suggested by Paolo Bonzini to set it for kernel threads, too
>>>> because it might be a fix.
>>>> I *think* this is not required because the kernel threads don't copy
>>>> data to/from userland and don't have access to any userspace mm in
>>>> general.
>>>> However there is this use_mm(). If we gain a mm by use_mm() we don't
>>>> have a matching PKRU value because those are per thread. It has been
>>>> suggested to use 0 as the PKRU value but this would bypass PKRU.
>>>> 
>>>> Set the initial (default) PKRU value for kernel threads.
>>> 
>>> We might want to do this for cleanliness reasons...  Maybe.
>>> 
>>> But this *should* have no practical effects.  Kernel threads have no
>>> real 'mm' and no user pages.  They should not have do access to user
>>> mappings.  Protection keys *only* apply to user mappings.  Thus,
>>> logically, they should never be affected by PKRU values.
>>> 
>>> So I'm kinda missing the point of the patch.
>> 
>> use_mm().
> 
> So. I would drop that patch from queue. Anyone feels different about it?
> 

I think we *do* want the patch. It’s a bugfix for use_mm users, right?

> Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-18 16:48         ` Andy Lutomirski
@ 2018-10-18 17:47           ` Dave Hansen
  2018-10-18 18:25           ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 57+ messages in thread
From: Dave Hansen @ 2018-10-18 17:47 UTC (permalink / raw)
  To: Andy Lutomirski, Sebastian Andrzej Siewior
  Cc: Andy Lutomirski, LKML, X86 ML, Paolo Bonzini, Radim Krcmar,
	kvm list, Jason A. Donenfeld, Rik van Riel

On 10/18/2018 09:48 AM, Andy Lutomirski wrote:
>>>> We might want to do this for cleanliness reasons...  Maybe.
>>>>
>>>> But this *should* have no practical effects.  Kernel threads have no
>>>> real 'mm' and no user pages.  They should not have do access to user
>>>> mappings.  Protection keys *only* apply to user mappings.  Thus,
>>>> logically, they should never be affected by PKRU values.
>>>>
>>>> So I'm kinda missing the point of the patch.
>>> use_mm().
>> So. I would drop that patch from queue. Anyone feels different about it?
>>
> I think we *do* want the patch. It’s a bugfix for use_mm users, right?

Yes, we need it.  I was being dense and Andy kindly reminded me of the
point of the patch.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 04/11] x86/fpu: eager switch PKRU state
  2018-10-18 16:13     ` Sebastian Andrzej Siewior
@ 2018-10-18 17:50       ` Dave Hansen
  0 siblings, 0 replies; 57+ messages in thread
From: Dave Hansen @ 2018-10-18 17:50 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel

>>> diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
>>> index 19b137f1b3beb..b184f916319e5 100644
>>> --- a/arch/x86/include/asm/pkeys.h
>>> +++ b/arch/x86/include/asm/pkeys.h
>>> @@ -119,7 +119,7 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
>>>  		unsigned long init_val);
>>>  extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
>>>  		unsigned long init_val);
>>> -extern void copy_init_pkru_to_fpregs(void);
>>> +extern void pkru_set_init_value(void);
>>
>> Could you elaborate on why the name is being changed?
> 
> The function name read like init_pkru value is copied to fpregs save
> area which is not the case. I could revert it if you prefer.

I would just prefer that we find some way of saying *where* pkru is
being set.  Is it fpstate, fpregs, or both?

>>> +	 * switch between those two operation would load the new value from the
>>> +	 * updated xstate and then we would write (the same value) to the CPU.
>>> +	 */
>>> +	pk->pkru = pkru;
>>> +	__write_pkru(pkru);
>>> +
>>> +}
>>
>> There's an unnecessary line there.
>>
>> This also needs a lot more high-level context about why it is necessary.
>>  I think you had that in the changelog, but we also need the function
>> commented.
> 
> What is necessary? The manual update of "pk->pkru?

Why we need to update fpstate *and* the actual register.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-18 16:48         ` Andy Lutomirski
  2018-10-18 17:47           ` Dave Hansen
@ 2018-10-18 18:25           ` Sebastian Andrzej Siewior
  2018-10-18 20:46             ` Andy Lutomirski
  1 sibling, 1 reply; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-18 18:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Dave Hansen, LKML, X86 ML, Paolo Bonzini,
	Radim Krcmar, kvm list, Jason A. Donenfeld, Rik van Riel

On 2018-10-18 09:48:24 [-0700], Andy Lutomirski wrote:
> > On Oct 18, 2018, at 9:26 AM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> >> On 2018-10-12 11:02:18 [-0700], Andy Lutomirski wrote:
> >> On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen
> >>> So I'm kinda missing the point of the patch.
> >> 
> >> use_mm().
> > 
> > So. I would drop that patch from queue. Anyone feels different about it?
> > 
> 
> I think we *do* want the patch. It’s a bugfix for use_mm users, right?

This is the loophole that has been pointed out. I am not convinced what
the correct behaviour should be here (and we have five users of that
interface). For instance f_fs[0].  It reads data from the USB EP and
then writes it to userland task. Due to $circumstances it happens in a
workqueue instead of the task's context.  So it borrows the mm with
use_mm().  The current behaviour random because the PKRU value can not
be predicted. It may or may not work.

Setting it to allow-all/none would let the operation always fail or
succeed which might be an improvement in terms of debugging. However it
is hard to judge what the correct behaviour should be. Should fail or
succeed.
But this is not the only loophole: There is ptrace interface which is
used by gdb (just checked) and also bypasses PKRU. So…

[0] drivers/usb/gadget/function/f_fs.c::ffs_user_copy_worker()

Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-18 18:25           ` Sebastian Andrzej Siewior
@ 2018-10-18 20:46             ` Andy Lutomirski
  2018-10-18 20:56               ` Dave Hansen
  2018-10-19  7:44               ` Paolo Bonzini
  0 siblings, 2 replies; 57+ messages in thread
From: Andy Lutomirski @ 2018-10-18 20:46 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Andrew Lutomirski, Dave Hansen, LKML, X86 ML, Paolo Bonzini,
	Radim Krcmar, kvm list, Jason A. Donenfeld, Rik van Riel

On Thu, Oct 18, 2018 at 11:25 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2018-10-18 09:48:24 [-0700], Andy Lutomirski wrote:
> > > On Oct 18, 2018, at 9:26 AM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > >> On 2018-10-12 11:02:18 [-0700], Andy Lutomirski wrote:
> > >> On Fri, Oct 12, 2018 at 10:54 AM Dave Hansen
> > >>> So I'm kinda missing the point of the patch.
> > >>
> > >> use_mm().
> > >
> > > So. I would drop that patch from queue. Anyone feels different about it?
> > >
> >
> > I think we *do* want the patch. It’s a bugfix for use_mm users, right?
>
> This is the loophole that has been pointed out. I am not convinced what
> the correct behaviour should be here (and we have five users of that
> interface). For instance f_fs[0].  It reads data from the USB EP and
> then writes it to userland task. Due to $circumstances it happens in a
> workqueue instead of the task's context.  So it borrows the mm with
> use_mm().  The current behaviour random because the PKRU value can not
> be predicted. It may or may not work.
>
> Setting it to allow-all/none would let the operation always fail or
> succeed which might be an improvement in terms of debugging. However it
> is hard to judge what the correct behaviour should be. Should fail or
> succeed.
> But this is not the only loophole: There is ptrace interface which is
> used by gdb (just checked) and also bypasses PKRU. So…
>
> [0] drivers/usb/gadget/function/f_fs.c::ffs_user_copy_worker()
>
> Sebastian

I think we need an entirely new API:

user_mm_ctx_t ctx = user_mm_ctx_get();

...

use_user_mm_ctx(ctx);
unuse_user_mm_ctx(ctx);

...

user_mm_ctx_put(ctx);

and ctx will store a copy of mm and PKRU.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-18 20:46             ` Andy Lutomirski
@ 2018-10-18 20:56               ` Dave Hansen
  2018-10-18 21:24                 ` Sebastian Andrzej Siewior
  2018-10-19  7:44               ` Paolo Bonzini
  1 sibling, 1 reply; 57+ messages in thread
From: Dave Hansen @ 2018-10-18 20:56 UTC (permalink / raw)
  To: Andy Lutomirski, Sebastian Andrzej Siewior
  Cc: Andrew Lutomirski, LKML, X86 ML, Paolo Bonzini, Radim Krcmar,
	kvm list, Jason A. Donenfeld, Rik van Riel

On 10/18/2018 01:46 PM, Andy Lutomirski wrote:
> Setting it to allow-all/none would let the operation always fail or
> succeed which might be an improvement in terms of debugging. However it
> is hard to judge what the correct behaviour should be. Should fail or
> succeed.

Succeed. :)

> But this is not the only loophole: There is ptrace interface which is
> used by gdb (just checked) and also bypasses PKRU. So…

Bypassing protection keys is not a big deal IMNHO.  In places where a
sane one is not readily available, I'm totally fine with just
effectively disabling it (PKRU=0) for the length of time it isn't available.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-18 20:56               ` Dave Hansen
@ 2018-10-18 21:24                 ` Sebastian Andrzej Siewior
  2018-10-18 21:58                   ` Andy Lutomirski
  0 siblings, 1 reply; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-10-18 21:24 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, Andrew Lutomirski, LKML, X86 ML, Paolo Bonzini,
	Radim Krcmar, kvm list, Jason A. Donenfeld, Rik van Riel

On 2018-10-18 13:56:24 [-0700], Dave Hansen wrote:
> > But this is not the only loophole: There is ptrace interface which is
> > used by gdb (just checked) and also bypasses PKRU. So…
> 
> Bypassing protection keys is not a big deal IMNHO.  In places where a
> sane one is not readily available, I'm totally fine with just
> effectively disabling it (PKRU=0) for the length of time it isn't available.

Okay, this makes things easier. Let document that for kernel threads we
use PKRU=0. This should be happening in my try right now. I double check
tomorrow just in case…

Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-18 21:24                 ` Sebastian Andrzej Siewior
@ 2018-10-18 21:58                   ` Andy Lutomirski
  0 siblings, 0 replies; 57+ messages in thread
From: Andy Lutomirski @ 2018-10-18 21:58 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Dave Hansen, Andrew Lutomirski, LKML, X86 ML, Paolo Bonzini,
	Radim Krcmar, kvm list, Jason A. Donenfeld, Rik van Riel



> On Oct 18, 2018, at 2:24 PM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> On 2018-10-18 13:56:24 [-0700], Dave Hansen wrote:
>>> But this is not the only loophole: There is ptrace interface which is
>>> used by gdb (just checked) and also bypasses PKRU. So…
>> 
>> Bypassing protection keys is not a big deal IMNHO.  In places where a
>> sane one is not readily available, I'm totally fine with just
>> effectively disabling it (PKRU=0) for the length of time it isn't available.
> 
> Okay, this makes things easier. Let document that for kernel threads we
> use PKRU=0. This should be happening in my try right now. I double check
> tomorrow just in case…
> 
> 

If you document that, please at least document that it’s a bug and not intended behavior.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-18 20:46             ` Andy Lutomirski
  2018-10-18 20:56               ` Dave Hansen
@ 2018-10-19  7:44               ` Paolo Bonzini
  2018-10-19 16:59                 ` Andy Lutomirski
  1 sibling, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2018-10-19  7:44 UTC (permalink / raw)
  To: Andy Lutomirski, Sebastian Andrzej Siewior
  Cc: Andrew Lutomirski, Dave Hansen, LKML, X86 ML, Radim Krcmar,
	kvm list, Jason A. Donenfeld, Rik van Riel

On 18/10/2018 22:46, Andy Lutomirski wrote:
>> [0] drivers/usb/gadget/function/f_fs.c::ffs_user_copy_worker()
>>
>> Sebastian
> I think we need an entirely new API:
> 
> user_mm_ctx_t ctx = user_mm_ctx_get();
> 
> ...
> 
> use_user_mm_ctx(ctx);
> unuse_user_mm_ctx(ctx);
> 
> ...
> 
> user_mm_ctx_put(ctx);
> 
> and ctx will store a copy of mm and PKRU.
> 

That looks like a good API in general.  The ffs_user_copy_worker that
Sebastian mentioned seems to be used by AIO, in which case of course it
has to happen in a kernel thread.

But while the API is good, deciding on the desired semantics is
"interesting".  The submitting thread might be changing PKRU between the
time the I/O operation is submitted and the time it is completed, for
example.  You could look up the task's PKRU at use_mm time, except that
the task might have disappeared...  In the end just using PKRU=0 makes
some sense and it only should be documented that some kernel services
will ignore PKRU.

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-19  7:44               ` Paolo Bonzini
@ 2018-10-19 16:59                 ` Andy Lutomirski
  2018-10-19 17:01                   ` Dave Hansen
  0 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2018-10-19 16:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sebastian Andrzej Siewior, Andrew Lutomirski, Dave Hansen, LKML,
	X86 ML, Radim Krcmar, kvm list, Jason A. Donenfeld, Rik van Riel



> On Oct 19, 2018, at 12:44 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 18/10/2018 22:46, Andy Lutomirski wrote:
>>> [0] drivers/usb/gadget/function/f_fs.c::ffs_user_copy_worker()
>>> 
>>> Sebastian
>> I think we need an entirely new API:
>> 
>> user_mm_ctx_t ctx = user_mm_ctx_get();
>> 
>> ...
>> 
>> use_user_mm_ctx(ctx);
>> unuse_user_mm_ctx(ctx);
>> 
>> ...
>> 
>> user_mm_ctx_put(ctx);
>> 
>> and ctx will store a copy of mm and PKRU.
>> 
> 
> That looks like a good API in general.  The ffs_user_copy_worker that
> Sebastian mentioned seems to be used by AIO, in which case of course it
> has to happen in a kernel thread.
> 
> But while the API is good, deciding on the desired semantics is
> "interesting".  The submitting thread might be changing PKRU between the
> time the I/O operation is submitted and the time it is completed, for
> example.

I think there’s only one sensible answer: capture PKRU at the time of submission.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-19 16:59                 ` Andy Lutomirski
@ 2018-10-19 17:01                   ` Dave Hansen
  2018-10-19 17:37                     ` Andy Lutomirski
  0 siblings, 1 reply; 57+ messages in thread
From: Dave Hansen @ 2018-10-19 17:01 UTC (permalink / raw)
  To: Andy Lutomirski, Paolo Bonzini
  Cc: Sebastian Andrzej Siewior, Andrew Lutomirski, LKML, X86 ML,
	Radim Krcmar, kvm list, Jason A. Donenfeld, Rik van Riel

On 10/19/2018 09:59 AM, Andy Lutomirski wrote:
>> That looks like a good API in general.  The ffs_user_copy_worker that
>> Sebastian mentioned seems to be used by AIO, in which case of course it
>> has to happen in a kernel thread.
>>
>> But while the API is good, deciding on the desired semantics is
>> "interesting".  The submitting thread might be changing PKRU between the
>> time the I/O operation is submitted and the time it is completed, for
>> example.
> I think there’s only one sensible answer: capture PKRU at the time of submission.

I think it's much more straightforward to just not enforce pkeys.
Having this "phantom" value could cause a very odd, nearly undebuggable
I/O failure.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-19 17:01                   ` Dave Hansen
@ 2018-10-19 17:37                     ` Andy Lutomirski
  2018-10-19 18:26                       ` Dave Hansen
  0 siblings, 1 reply; 57+ messages in thread
From: Andy Lutomirski @ 2018-10-19 17:37 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Paolo Bonzini, Sebastian Andrzej Siewior, Andrew Lutomirski,
	LKML, X86 ML, Radim Krcmar, kvm list, Jason A. Donenfeld,
	Rik van Riel



> On Oct 19, 2018, at 10:01 AM, Dave Hansen <dave.hansen@linux.intel.com> wrote:
> 
> On 10/19/2018 09:59 AM, Andy Lutomirski wrote:
>>> That looks like a good API in general.  The ffs_user_copy_worker that
>>> Sebastian mentioned seems to be used by AIO, in which case of course it
>>> has to happen in a kernel thread.
>>> 
>>> But while the API is good, deciding on the desired semantics is
>>> "interesting".  The submitting thread might be changing PKRU between the
>>> time the I/O operation is submitted and the time it is completed, for
>>> example.
>> I think there’s only one sensible answer: capture PKRU at the time of submission.
> 
> I think it's much more straightforward to just not enforce pkeys.
> Having this "phantom" value could cause a very odd, nearly undebuggable
> I/O failure.

But now we have the reverse. The IO can work if it’s truly async but, if the kernel decides to synchronously complete IO (with GUP or copy_to_user), it’ll fail, right. This isn’t exactly friendly either.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 05/11] x86/fpu: set PKRU state for kernel threads
  2018-10-19 17:37                     ` Andy Lutomirski
@ 2018-10-19 18:26                       ` Dave Hansen
  0 siblings, 0 replies; 57+ messages in thread
From: Dave Hansen @ 2018-10-19 18:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Paolo Bonzini, Sebastian Andrzej Siewior, Andrew Lutomirski,
	LKML, X86 ML, Radim Krcmar, kvm list, Jason A. Donenfeld,
	Rik van Riel

On 10/19/2018 10:37 AM, Andy Lutomirski wrote:
>> I think it's much more straightforward to just not enforce pkeys. 
>> Having this "phantom" value could cause a very odd, nearly
>> undebuggable I/O failure.
> But now we have the reverse. The IO can work if it’s truly async but,
> if the kernel decides to synchronously complete IO (with GUP or
> copy_to_user), it’ll fail, right. This isn’t exactly friendly
> either.

Yeah, but a synchronous I/O failure is really straightforward to debug
because you get an immediate error message about it.  This is certainly
not the weirdest behavior or asymmetry that we would see from
synchronous vs. async I/O.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 08/11] x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
  2018-10-12 18:15   ` Dave Hansen
@ 2018-11-02 14:42     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-02 14:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel

On 2018-10-12 11:15:51 [-0700], Dave Hansen wrote:
> > @@ -172,27 +155,20 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
> >  			sizeof(struct user_i387_ia32_struct), NULL,
> >  			(struct _fpstate_32 __user *) buf) ? -1 : 1;
> >  
> > -	if (fpu->initialized || using_compacted_format()) {
> > -		/* Save the live register state to the user directly. */
> > -		if (copy_fpregs_to_sigframe(buf_fx))
> > -			return -1;
> > -		/* Update the thread's fxstate to save the fsave header. */
> > -		if (ia32_fxstate)
> > -			copy_fxregs_to_kernel(fpu);
> > +	/* Update the thread's fxstate to save the fsave header. */
> > +	if (ia32_fxstate) {
> > +		copy_fxregs_to_kernel(fpu);
> >  	} else {
> > -		/*
> > -		 * It is a *bug* if kernel uses compacted-format for xsave
> > -		 * area and we copy it out directly to a signal frame. It
> > -		 * should have been handled above by saving the registers
> > -		 * directly.
> > -		 */
> > -		if (boot_cpu_has(X86_FEATURE_XSAVES)) {
> > -			WARN_ONCE(1, "x86/fpu: saving compacted-format xsave area to a signal frame!\n");
> > -			return -1;
> > -		}
> > +		copy_fpregs_to_fpstate(fpu);
> > +		fpregs_deactivate(fpu);
> > +	}
> 
> Could you add a high-level comment for this if{}else{} block that says
> something like:
> 
> 	/* Save the registers to the fpstate. */
> 
> I also think it's worthwhile to explain the asymmetry between the
> ia32_fxstate case and the other branch.  Why don't we
> fpregs_deactivate() in the ia32_fxstate path, for instance?

Since the ->initialized is gone, the whole hunk here looks differently
and probably easier to understand.

> > +	if (using_compacted_format()) {
> > +		copy_xstate_to_user(buf_fx, xsave, 0, size);
> > +	} else {
> >  		fpstate_sanitize_xstate(fpu);
> > -		if (__copy_to_user(buf_fx, xsave, fpu_user_xstate_size))
> > +		size = fpu_user_xstate_size;
> > +		if (__copy_to_user(buf_fx, xsave, size))
> >  			return -1;
> >  	}

dropped this.

> This seems unnecessary.  Why are you updating 'size' like this?

Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/11] x86/fpu: prepare copy_fpstate_to_sigframe for TIF_LOAD_FPU
  2018-10-15 15:24     ` Borislav Petkov
@ 2018-11-02 15:44       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-02 15:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Rik van Riel, linux-kernel, x86, Andy Lutomirski,
	Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld

On 2018-10-15 17:24:31 [+0200], Borislav Petkov wrote:
> On Fri, Oct 12, 2018 at 12:40:19PM -0700, Dave Hansen wrote:
> > > +	__fpregs_changes_end();
> > 
> > Do we really need the __fpregs_changes_*() abstraction for this single
> > call site?
> 
> Yap, I'm staring at those in patch 2, there's no documentation there what
> they're supposed to do, only the commit message of patch 11 says:
> 
> "The __fpregs_changes_{begin|end}() section ensures that the register
> remain unchanged. Otherwise a context switch or a BH could save the
> registers to its FPU context and processor's FPU register would remain
> random."
> 
> So I'd say we should drop that abstraction, use preempt_* and put that
> text above the single usage site.

There are more than one caller and this function disables preemption and
BH in the end. Therefore I would like to keep it. But as suggested in
the previous patch I will think about renaming it.

> Thx.

Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 10/11] x86/fpu: prepare copy_fpstate_to_sigframe for TIF_LOAD_FPU
  2018-10-12 19:40   ` Dave Hansen
  2018-10-15 15:24     ` Borislav Petkov
@ 2018-11-02 22:55     ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 57+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-11-02 22:55 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, x86, Andy Lutomirski, Paolo Bonzini,
	Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel

On 2018-10-12 12:40:19 [-0700], Dave Hansen wrote:
> On 10/04/2018 07:05 AM, Sebastian Andrzej Siewior wrote:
> > From: Rik van Riel <riel@surriel.com>
> > 
> > If TIF_LOAD_FPU is set, then the registers are saved (not loaded). In that case
> > we skip the saving part.
> 
> This sentence hurts my brain.
> 
> 	"If TIF_LOAD_FPU is set the registers are ... not loaded"
> 
> I think that means that something could use some better naming.
> 
> Should TIF_LOAD_FPU be TIF_NEED_FPU_LOAD, perhaps?

ARM has TIF_FOREIGN_FPSTATE which basically means that the current FP
stat does not belong to current().
Let me try to use TIF_NEED_FPU_LOAD and see how that works out.

> > Signed-off-by: Rik van Riel <riel@surriel.com>
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > ---
> >  arch/x86/kernel/fpu/signal.c | 16 ++++++++++------
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
> > index c8f5ff58578ed..979dcd1ed82e0 100644
> > --- a/arch/x86/kernel/fpu/signal.c
> > +++ b/arch/x86/kernel/fpu/signal.c
> > @@ -155,13 +155,17 @@ int copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size)
> >  			sizeof(struct user_i387_ia32_struct), NULL,
> >  			(struct _fpstate_32 __user *) buf) ? -1 : 1;
> >  
> > -	/* Update the thread's fxstate to save the fsave header. */
> > -	if (ia32_fxstate) {
> > -		copy_fxregs_to_kernel(fpu);
> > -	} else {
> > -		copy_fpregs_to_fpstate(fpu);
> > -		fpregs_deactivate(fpu);
> > +	__fpregs_changes_begin();
> > +	if (!test_thread_flag(TIF_LOAD_FPU)) {
> 
> This needs commenting, please.
> 
> If we do not need to load the FPU at return to userspace, it means the
> state is in the the registers, not the buffer.  So, update the buffer to
> match the registers.

okay. Let me add this then.

Sebastian

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2018-11-02 22:55 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-04 14:05 [PATCH 00/11 v3] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
2018-10-04 14:05 ` [PATCH 01/11] x86/entry: remove _TIF_ALLWORK_MASK Sebastian Andrzej Siewior
2018-10-11 16:27   ` Borislav Petkov
2018-10-04 14:05 ` [PATCH 02/11] x86/fpu: add (__)make_fpregs_active helpers Sebastian Andrzej Siewior
2018-10-04 14:05 ` [PATCH 03/11] x86/fpu: make __raw_xsave_addr() use feature number instead of mask Sebastian Andrzej Siewior
2018-10-11 17:30   ` Christophe de Dinechin
2018-10-18 11:19     ` Sebastian Andrzej Siewior
2018-10-12 15:52   ` Dave Hansen
2018-10-18 11:17     ` Sebastian Andrzej Siewior
2018-10-18 11:21       ` Sebastian Andrzej Siewior
2018-10-17 10:01   ` Borislav Petkov
2018-10-18 11:48     ` Sebastian Andrzej Siewior
2018-10-04 14:05 ` [PATCH 04/11] x86/fpu: eager switch PKRU state Sebastian Andrzej Siewior
2018-10-12 17:51   ` Dave Hansen
2018-10-12 18:09     ` Andy Lutomirski
2018-10-12 19:44       ` Dave Hansen
2018-10-18 16:13     ` Sebastian Andrzej Siewior
2018-10-18 17:50       ` Dave Hansen
2018-10-04 14:05 ` [PATCH 05/11] x86/fpu: set PKRU state for kernel threads Sebastian Andrzej Siewior
2018-10-12 17:54   ` Dave Hansen
2018-10-12 18:02     ` Andy Lutomirski
2018-10-18 16:26       ` Sebastian Andrzej Siewior
2018-10-18 16:48         ` Andy Lutomirski
2018-10-18 17:47           ` Dave Hansen
2018-10-18 18:25           ` Sebastian Andrzej Siewior
2018-10-18 20:46             ` Andy Lutomirski
2018-10-18 20:56               ` Dave Hansen
2018-10-18 21:24                 ` Sebastian Andrzej Siewior
2018-10-18 21:58                   ` Andy Lutomirski
2018-10-19  7:44               ` Paolo Bonzini
2018-10-19 16:59                 ` Andy Lutomirski
2018-10-19 17:01                   ` Dave Hansen
2018-10-19 17:37                     ` Andy Lutomirski
2018-10-19 18:26                       ` Dave Hansen
2018-10-04 14:05 ` [PATCH 06/11] x86/pkeys: make init_pkru_value static Sebastian Andrzej Siewior
2018-10-12 17:55   ` Dave Hansen
2018-10-04 14:05 ` [PATCH 07/11] x86/pkeys: Drop the preempt-disable section Sebastian Andrzej Siewior
2018-10-12 17:58   ` Dave Hansen
2018-10-12 18:07     ` Andy Lutomirski
2018-10-12 20:26       ` Sebastian Andrzej Siewior
2018-10-04 14:05 ` [PATCH 08/11] x86/fpu: Always store the registers in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
2018-10-11 17:50   ` Christophe de Dinechin
2018-10-11 21:18     ` Andy Lutomirski
2018-10-12 18:15   ` Dave Hansen
2018-11-02 14:42     ` Sebastian Andrzej Siewior
2018-10-04 14:05 ` [PATCH 09/11] x86/entry: add TIF_LOAD_FPU Sebastian Andrzej Siewior
2018-10-04 14:05 ` [PATCH 10/11] x86/fpu: prepare copy_fpstate_to_sigframe for TIF_LOAD_FPU Sebastian Andrzej Siewior
2018-10-12 19:40   ` Dave Hansen
2018-10-15 15:24     ` Borislav Petkov
2018-11-02 15:44       ` Sebastian Andrzej Siewior
2018-11-02 22:55     ` Sebastian Andrzej Siewior
2018-10-04 14:05 ` [PATCH 11/11] x86/fpu: defer FPU state load until return to userspace Sebastian Andrzej Siewior
2018-10-04 16:14   ` Andy Lutomirski
2018-10-12 20:25     ` Sebastian Andrzej Siewior
2018-10-04 16:45 ` [PATCH 00/11 v3] x86: load FPU registers on return to userland Rik van Riel
2018-10-04 16:50   ` Andy Lutomirski
2018-10-05 11:55   ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).