* [PATCH v10 00/18] KVM: arm64: Optimise FPSIMD context switching
@ 2018-05-22 16:05 ` Dave Martin
  0 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Christoffer Dall, Marc Zyngier, Ard Biesheuvel,
	Catalin Marinas, Will Deacon, linux-arch, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Steven Rostedt, Oleg Nesterov

Note: Most of these patches are Arm-specific.  People not Cc'd on the
whole series can find it in the linux-arm-kernel archive [2].

This series aims to improve the way FPSIMD context is handled by KVM.
Changes since the previous v9 [1] are mostly minor, but there are some
fixes worthy of closer attention.

In addition to addressing a review comment from Marc on the changes in
v9, this series attempts to fix a NULL-dereference bug observed by Marc
on ESPRESSOBin [5].  A reproducer for a similar bug is documented in
[6], and this series fixes that reproducible bug (in patches 1 and 7).
At the moment this is my best hypothesis for the ESPRESSOBin failure,
though the relationship is unproven and there is no reproducer for the
ESPRESSOBin failure itself.

The changes are summarised in the individual patches.

Reviewers please note:

 * Since v8, patches 10 and 14 have changed.  Reviewer tags have been
   stripped from patch 14, due to non-trivial changes in v9 of the
   series: see the patch for details.

 * Since v9, patches 1 and 7 are also new, and correct a latent bug in
   FPSIMD context handling which is exposed by this series.

If people could take a close look at the above patches, that would be
much appreciated.

Cheers
---Dave

[1] [PATCH v9 00/16] KVM: arm64: Optimise FPSIMD context switching
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/579569.html

[2] linux-arm-kernel archive
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/thread.html

[3] [kvmarm:queue 9/29] arch/arm/kvm/../../../virt/kvm/arm/arm.c:783:3: error: implicit declaration of function 'kvm_arch_vcpu_ctxsync_fp'; did you mean 'kvm_arch_vcpu_put_fp'?
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/579400.html

[4] [kvmarm:queue 13/29] arch/arm/kvm/../../../virt/kvm/arm/arm.c:1598:6: error: implicit declaration of function 'system_supports_sve'
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/579399.html

[5] [PULL v8] KVM: arm64: Optimise FPSIMD context switching
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/579353.html


Christoffer Dall (1):
  KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change

Dave Martin (17):
  arm64: fpsimd: Fix TIF_FOREIGN_FPSTATE after invalidating cpu regs
  thread_info: Add update_thread_flag() helpers
  arm64: Use update{,_tsk}_thread_flag()
  KVM: arm64: Convert lazy FPSIMD context switch trap to C
  arm64: fpsimd: Generalise context saving for non-task contexts
  arm64: fpsimd: Eliminate task->mm checks
  arm64/sve: Refactor user SVE trap maintenance for external use
  KVM: arm64: Repurpose vcpu_arch.debug_flags for general-purpose flags
  KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
  arm64/sve: Move read_zcr_features() out of cpufeature.h
  arm64/sve: Switch sve_pffr() argument from task to thread
  arm64/sve: Move sve_pffr() to fpsimd.h and make inline
  KVM: arm64: Save host SVE context as appropriate
  KVM: arm64: Remove eager host SVE state saving
  KVM: arm64: Remove redundant *exit_code changes in fpsimd_guest_exit()
  KVM: arm64: Fold redundant exit code checks out of fixup_guest_exit()
  KVM: arm64: Invoke FPSIMD context switch trap from C

 arch/arm/include/asm/kvm_host.h      |  10 +-
 arch/arm64/Kconfig                   |   7 ++
 arch/arm64/include/asm/cpufeature.h  |  29 ------
 arch/arm64/include/asm/fpsimd.h      |  21 +++++
 arch/arm64/include/asm/kvm_asm.h     |   3 -
 arch/arm64/include/asm/kvm_host.h    |  45 +++++++--
 arch/arm64/include/asm/processor.h   |   2 +
 arch/arm64/include/asm/thread_info.h |   1 +
 arch/arm64/kernel/fpsimd.c           | 176 +++++++++++++++++------------------
 arch/arm64/kernel/ptrace.c           |   1 +
 arch/arm64/kvm/Kconfig               |   1 +
 arch/arm64/kvm/Makefile              |   2 +-
 arch/arm64/kvm/debug.c               |   8 +-
 arch/arm64/kvm/fpsimd.c              | 110 ++++++++++++++++++++++
 arch/arm64/kvm/hyp/debug-sr.c        |   6 +-
 arch/arm64/kvm/hyp/entry.S           |  43 ---------
 arch/arm64/kvm/hyp/hyp-entry.S       |  19 ----
 arch/arm64/kvm/hyp/switch.c          | 124 ++++++++++++++++--------
 arch/arm64/kvm/hyp/sysreg-sr.c       |   4 +-
 arch/arm64/kvm/sys_regs.c            |   9 +-
 include/linux/kvm_host.h             |   9 ++
 include/linux/sched.h                |   6 ++
 include/linux/thread_info.h          |  11 +++
 virt/kvm/Kconfig                     |   3 +
 virt/kvm/arm/arm.c                   |  14 ++-
 virt/kvm/kvm_main.c                  |   7 +-
 26 files changed, 416 insertions(+), 255 deletions(-)
 create mode 100644 arch/arm64/kvm/fpsimd.c

-- 
2.1.4

* [PATCH v10 01/18] arm64: fpsimd: Fix TIF_FOREIGN_FPSTATE after invalidating cpu regs
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

fpsimd_last_state.st is set to NULL as a way of indicating that
current's FPSIMD registers are no longer loaded in the cpu.  In
particular, this is done when the kernel temporarily uses or
clobbers the FPSIMD registers for its own purposes, as in CPU PM or
kernel-mode NEON, resulting in them being populated with garbage
data not belonging to a task.

Commit 17eed27b02da ("arm64/sve: KVM: Prevent guests from using
SVE") factors this operation out as a new helper
fpsimd_flush_cpu_state() to make it clearer what is being done
here, and on SVE systems this helper is now used, via
kvm_fpsimd_flush_cpu_state(), to invalidate the registers after KVM
has run a vcpu.  The reason for this is that KVM does not yet
understand how to restore the full host SVE registers itself after
loading the guest FPSIMD context into them.

This exposes a particular problem: if fpsimd_last_state.st is set
to NULL without also setting TIF_FOREIGN_FPSTATE, the kernel may
continue to think that current's FPSIMD registers are live even
though they have actually been clobbered.

Prior to the aforementioned commit, the only path where
fpsimd_last_state.st is set to NULL without setting
TIF_FOREIGN_FPSTATE is when kernel_neon_begin() is called by a
kernel thread (where current->mm can be NULL).  This does not
matter, because the only harm is that at context-switch time
fpsimd_thread_switch() may unnecessarily save the FPSIMD registers
back to current's thread_struct (even though kernel threads are not
considered to have any FPSIMD context of their own and the
registers will never be reloaded).

Note that although CPU_PM_ENTER lacks the TIF_FOREIGN_FPSTATE
setting, every CPU passing through that path must subsequently pass
through CPU_PM_EXIT before it can re-enter the kernel proper.
CPU_PM_EXIT sets the flag.

The sve_flush_cpu_state() function added by commit 17eed27b02da
also lacks the proper maintenance of TIF_FOREIGN_FPSTATE.  This may
cause the bits of a host task's SVE registers that do not alias the
FPSIMD register file to spontaneously appear zeroed if a KVM vcpu
runs in the same task in the meantime.  Although this effect is
hidden by the fact that the non-FPSIMD bits of the SVE registers
are zeroed by a syscall anyway, it is doubtless a bad idea to rely
on these different code paths interacting correctly under future
maintenance.

This patch makes setting TIF_FOREIGN_FPSTATE an unconditional
side-effect of fpsimd_flush_cpu_state(), and removes the
set_thread_flag() calls that become redundant as a result.  This
ensures that TIF_FOREIGN_FPSTATE cannot remain clear while the
FPSIMD state in the FPSIMD registers is invalid.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>

---

Changes since v9:

 * New patch (bugfix to subsequent commits).
---
 arch/arm64/kernel/fpsimd.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 87a3536..12e1c96 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1067,6 +1067,7 @@ void fpsimd_flush_task_state(struct task_struct *t)
 static inline void fpsimd_flush_cpu_state(void)
 {
 	__this_cpu_write(fpsimd_last_state.st, NULL);
+	set_thread_flag(TIF_FOREIGN_FPSTATE);
 }
 
 /*
@@ -1121,10 +1122,8 @@ void kernel_neon_begin(void)
 	__this_cpu_write(kernel_neon_busy, true);
 
 	/* Save unsaved task fpsimd state, if any: */
-	if (current->mm) {
+	if (current->mm)
 		task_fpsimd_save();
-		set_thread_flag(TIF_FOREIGN_FPSTATE);
-	}
 
 	/* Invalidate any task state remaining in the fpsimd regs: */
 	fpsimd_flush_cpu_state();
@@ -1251,8 +1250,6 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
 		fpsimd_flush_cpu_state();
 		break;
 	case CPU_PM_EXIT:
-		if (current->mm)
-			set_thread_flag(TIF_FOREIGN_FPSTATE);
 		break;
 	case CPU_PM_ENTER_FAILED:
 	default:
-- 
2.1.4

* [PATCH v10 02/18] thread_info: Add update_thread_flag() helpers
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, Oleg Nesterov, Peter Zijlstra, Ingo Molnar,
	linux-arm-kernel

There are a number of bits of code sprinkled around the kernel to
set a thread flag if a certain condition is true, and clear it
otherwise.

To help make those call sites terser and less cumbersome, this
patch adds a new family of thread flag manipulators

	update*_thread_flag([...,] flag, cond)

which do the equivalent of:

	if (cond)
		set*_thread_flag([...,] flag);
	else
		clear*_thread_flag([...,] flag);
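
As an illustrative example (hypothetical flag name and condition, not a
call site from this series), a caller can collapse the construct above
into a single line:

	update_tsk_thread_flag(tsk, TIF_FOO, cond);

or, when operating on the current task:

	update_thread_flag(TIF_FOO, cond);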

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/sched.h       |  6 ++++++
 include/linux/thread_info.h | 11 +++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b3d697f..c2c3051 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1578,6 +1578,12 @@ static inline void clear_tsk_thread_flag(struct task_struct *tsk, int flag)
 	clear_ti_thread_flag(task_thread_info(tsk), flag);
 }
 
+static inline void update_tsk_thread_flag(struct task_struct *tsk, int flag,
+					  bool value)
+{
+	update_ti_thread_flag(task_thread_info(tsk), flag, value);
+}
+
 static inline int test_and_set_tsk_thread_flag(struct task_struct *tsk, int flag)
 {
 	return test_and_set_ti_thread_flag(task_thread_info(tsk), flag);
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index cf2862b..8d8821b 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -60,6 +60,15 @@ static inline void clear_ti_thread_flag(struct thread_info *ti, int flag)
 	clear_bit(flag, (unsigned long *)&ti->flags);
 }
 
+static inline void update_ti_thread_flag(struct thread_info *ti, int flag,
+					 bool value)
+{
+	if (value)
+		set_ti_thread_flag(ti, flag);
+	else
+		clear_ti_thread_flag(ti, flag);
+}
+
 static inline int test_and_set_ti_thread_flag(struct thread_info *ti, int flag)
 {
 	return test_and_set_bit(flag, (unsigned long *)&ti->flags);
@@ -79,6 +88,8 @@ static inline int test_ti_thread_flag(struct thread_info *ti, int flag)
 	set_ti_thread_flag(current_thread_info(), flag)
 #define clear_thread_flag(flag) \
 	clear_ti_thread_flag(current_thread_info(), flag)
+#define update_thread_flag(flag, value) \
+	update_ti_thread_flag(current_thread_info(), flag, value)
 #define test_and_set_thread_flag(flag) \
 	test_and_set_ti_thread_flag(current_thread_info(), flag)
 #define test_and_clear_thread_flag(flag) \
-- 
2.1.4

* [PATCH v10 03/18] arm64: Use update{,_tsk}_thread_flag()
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

This patch uses the new update_thread_flag() helpers to simplify a
couple of if () set; else clear; constructs.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/kernel/fpsimd.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 12e1c96..9d85373 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -618,10 +618,8 @@ int sve_set_vector_length(struct task_struct *task,
 	task->thread.sve_vl = vl;
 
 out:
-	if (flags & PR_SVE_VL_INHERIT)
-		set_tsk_thread_flag(task, TIF_SVE_VL_INHERIT);
-	else
-		clear_tsk_thread_flag(task, TIF_SVE_VL_INHERIT);
+	update_tsk_thread_flag(task, TIF_SVE_VL_INHERIT,
+			       flags & PR_SVE_VL_INHERIT);
 
 	return 0;
 }
@@ -910,12 +908,12 @@ void fpsimd_thread_switch(struct task_struct *next)
 		 * the TIF_FOREIGN_FPSTATE flag so the state will be loaded
 		 * upon the next return to userland.
 		 */
-		if (__this_cpu_read(fpsimd_last_state.st) ==
-			&next->thread.uw.fpsimd_state
-		    && next->thread.fpsimd_cpu == smp_processor_id())
-			clear_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE);
-		else
-			set_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE);
+		bool wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
+					&next->thread.uw.fpsimd_state;
+		bool wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
+
+		update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
+				       wrong_task || wrong_cpu);
 	}
 }
 
-- 
2.1.4

* [PATCH v10 04/18] KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, Christoffer Dall, linux-arm-kernel

From: Christoffer Dall <christoffer.dall@linaro.org>

KVM/ARM differs from other architectures in having to maintain an
additional virtual address space from that of the host and the
guest, because we split the execution of KVM across both EL1 and
EL2.

This results in a need to explicitly map data structures into EL2
(hyp) which are accessed from the hyp code.  As we are about to be
more clever with our FPSIMD handling on arm64, which stores data in
the task struct and uses thread_info flags, we will have to map
parts of the currently executing task struct into the EL2 virtual
address space.

However, we don't want to do this on every KVM_RUN, because it is a
fairly expensive operation to walk the page tables, and the common
execution mode is to map a single thread to a VCPU.  By introducing
a hook that architectures can select with
HAVE_KVM_VCPU_RUN_PID_CHANGE, we do not introduce overhead for
other architectures, but have a simple way to only map the data we
need when required for arm64.

This patch introduces the framework only, and wires it up in the
arm/arm64 KVM common code.

No functional change.
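
As an illustrative sketch of how an architecture opts in (not part of
this patch; the arm64 wiring appears later in the series, and the body
of the hook below is hypothetical), the architecture selects the new
Kconfig symbol and provides the hook:

	# arch/<arch>/kvm/Kconfig
	config KVM
		...
		select HAVE_KVM_VCPU_RUN_PID_CHANGE

	/* arch KVM code */
	int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
	{
		/*
		 * Map or prepare whatever per-thread data the arch needs
		 * before this thread runs the vcpu for the first time.
		 * A non-zero return aborts the KVM_RUN ioctl.
		 */
		return 0;
	}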

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
---
 include/linux/kvm_host.h | 9 +++++++++
 virt/kvm/Kconfig         | 3 +++
 virt/kvm/kvm_main.c      | 7 ++++++-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6930c63..4268ace 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1276,4 +1276,13 @@ static inline long kvm_arch_vcpu_async_ioctl(struct file *filp,
 void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
 		unsigned long start, unsigned long end);
 
+#ifdef CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE
+int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu);
+#else
+static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+#endif /* CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE */
+
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index cca7e06..72143cf 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -54,3 +54,6 @@ config HAVE_KVM_IRQ_BYPASS
 
 config HAVE_KVM_VCPU_ASYNC_IOCTL
        bool
+
+config HAVE_KVM_VCPU_RUN_PID_CHANGE
+       bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c7b2e92..c32e240 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2550,8 +2550,13 @@ static long kvm_vcpu_ioctl(struct file *filp,
 		oldpid = rcu_access_pointer(vcpu->pid);
 		if (unlikely(oldpid != current->pids[PIDTYPE_PID].pid)) {
 			/* The thread running this VCPU changed. */
-			struct pid *newpid = get_task_pid(current, PIDTYPE_PID);
+			struct pid *newpid;
 
+			r = kvm_arch_vcpu_run_pid_change(vcpu);
+			if (r)
+				break;
+
+			newpid = get_task_pid(current, PIDTYPE_PID);
 			rcu_assign_pointer(vcpu->pid, newpid);
 			if (oldpid)
 				synchronize_rcu();
-- 
2.1.4

* [PATCH v10 05/18] KVM: arm64: Convert lazy FPSIMD context switch trap to C
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

To make the lazy FPSIMD context switch trap code easier to hack on,
this patch converts it to C.

This is not amazingly efficient, but the trap should typically only
be taken once per host context switch.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
 arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
 2 files changed, 46 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index e41a161..40f349b 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -172,40 +172,27 @@ ENTRY(__fpsimd_guest_restore)
 	// x1: vcpu
 	// x2-x29,lr: vcpu regs
 	// vcpu x0-x1 on the stack
-	stp	x2, x3, [sp, #-16]!
-	stp	x4, lr, [sp, #-16]!
-
-alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
-	mrs	x2, cptr_el2
-	bic	x2, x2, #CPTR_EL2_TFP
-	msr	cptr_el2, x2
-alternative_else
-	mrs	x2, cpacr_el1
-	orr	x2, x2, #CPACR_EL1_FPEN
-	msr	cpacr_el1, x2
-alternative_endif
-	isb
-
-	mov	x3, x1
-
-	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
-	kern_hyp_va x0
-	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
-	bl	__fpsimd_save_state
-
-	add	x2, x3, #VCPU_CONTEXT
-	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
-	bl	__fpsimd_restore_state
-
-	// Skip restoring fpexc32 for AArch64 guests
-	mrs	x1, hcr_el2
-	tbnz	x1, #HCR_RW_SHIFT, 1f
-	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
-	msr	fpexc32_el2, x4
-1:
-	ldp	x4, lr, [sp], #16
-	ldp	x2, x3, [sp], #16
-	ldp	x0, x1, [sp], #16
-
+	stp	x2, x3, [sp, #-144]!
+	stp	x4, x5, [sp, #16]
+	stp	x6, x7, [sp, #32]
+	stp	x8, x9, [sp, #48]
+	stp	x10, x11, [sp, #64]
+	stp	x12, x13, [sp, #80]
+	stp	x14, x15, [sp, #96]
+	stp	x16, x17, [sp, #112]
+	stp	x18, lr, [sp, #128]
+
+	bl	__hyp_switch_fpsimd
+
+	ldp	x4, x5, [sp, #16]
+	ldp	x6, x7, [sp, #32]
+	ldp	x8, x9, [sp, #48]
+	ldp	x10, x11, [sp, #64]
+	ldp	x12, x13, [sp, #80]
+	ldp	x14, x15, [sp, #96]
+	ldp	x16, x17, [sp, #112]
+	ldp	x18, lr, [sp, #128]
+	ldp	x0, x1, [sp, #144]
+	ldp	x2, x3, [sp], #160
 	eret
 ENDPROC(__fpsimd_guest_restore)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index d964523..c0796c4 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -318,6 +318,30 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
 	}
 }
 
+void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
+				    struct kvm_vcpu *vcpu)
+{
+	kvm_cpu_context_t *host_ctxt;
+
+	if (has_vhe())
+		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
+			     cpacr_el1);
+	else
+		write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP,
+			     cptr_el2);
+
+	isb();
+
+	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+	__fpsimd_save_state(&host_ctxt->gp_regs.fp_regs);
+	__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
+
+	/* Skip restoring fpexc32 for AArch64 guests */
+	if (!(read_sysreg(hcr_el2) & HCR_RW))
+		write_sysreg(vcpu->arch.ctxt.sys_regs[FPEXC32_EL2],
+			     fpexc32_el2);
+}
+
 /*
  * Return true when we were able to fixup the guest exit and should return to
  * the guest, false when we should restore the host state and return to the
-- 
2.1.4

* [PATCH v10 06/18] arm64: fpsimd: Generalise context saving for non-task contexts
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

In preparation for allowing non-task (i.e., KVM vcpu) FPSIMD
contexts to be handled by the fpsimd common code, this patch adapts
task_fpsimd_save() to save back the currently loaded context,
removing the explicit dependency on current.

The relevant storage to write back to in memory is now found by
examining the fpsimd_last_state percpu struct.

fpsimd_save() does nothing unless TIF_FOREIGN_FPSTATE is clear, and
fpsimd_last_state is updated under local_bh_disable() or
local_irq_disable() everywhere that TIF_FOREIGN_FPSTATE is cleared:
thus, fpsimd_save() will write back to the correct storage for the
loaded context.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/kernel/fpsimd.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 9d85373..3aa100a 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -270,13 +270,15 @@ static void task_fpsimd_load(void)
 }
 
 /*
- * Ensure current's FPSIMD/SVE storage in thread_struct is up to date
- * with respect to the CPU registers.
+ * Ensure FPSIMD/SVE storage in memory for the loaded context is up to
+ * date with respect to the CPU registers.
  *
  * Softirqs (and preemption) must be disabled.
  */
-static void task_fpsimd_save(void)
+static void fpsimd_save(void)
 {
+	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
+
 	WARN_ON(!in_softirq() && !irqs_disabled());
 
 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
@@ -291,10 +293,9 @@ static void task_fpsimd_save(void)
 				return;
 			}
 
-			sve_save_state(sve_pffr(current),
-				       &current->thread.uw.fpsimd_state.fpsr);
+			sve_save_state(sve_pffr(current), &st->fpsr);
 		} else
-			fpsimd_save_state(&current->thread.uw.fpsimd_state);
+			fpsimd_save_state(st);
 	}
 }
 
@@ -598,7 +599,7 @@ int sve_set_vector_length(struct task_struct *task,
 	if (task == current) {
 		local_bh_disable();
 
-		task_fpsimd_save();
+		fpsimd_save();
 		set_thread_flag(TIF_FOREIGN_FPSTATE);
 	}
 
@@ -837,7 +838,7 @@ asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 
 	local_bh_disable();
 
-	task_fpsimd_save();
+	fpsimd_save();
 	fpsimd_to_sve(current);
 
 	/* Force ret_to_user to reload the registers: */
@@ -898,7 +899,7 @@ void fpsimd_thread_switch(struct task_struct *next)
 	 * 'current'.
 	 */
 	if (current->mm)
-		task_fpsimd_save();
+		fpsimd_save();
 
 	if (next->mm) {
 		/*
@@ -980,7 +981,7 @@ void fpsimd_preserve_current_state(void)
 		return;
 
 	local_bh_disable();
-	task_fpsimd_save();
+	fpsimd_save();
 	local_bh_enable();
 }
 
@@ -1121,7 +1122,7 @@ void kernel_neon_begin(void)
 
 	/* Save unsaved task fpsimd state, if any: */
 	if (current->mm)
-		task_fpsimd_save();
+		fpsimd_save();
 
 	/* Invalidate any task state remaining in the fpsimd regs: */
 	fpsimd_flush_cpu_state();
@@ -1244,7 +1245,7 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
 	switch (cmd) {
 	case CPU_PM_ENTER:
 		if (current->mm)
-			task_fpsimd_save();
+			fpsimd_save();
 		fpsimd_flush_cpu_state();
 		break;
 	case CPU_PM_EXIT:
-- 
2.1.4

* [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

Currently the FPSIMD handling code uses the condition task->mm ==
NULL as a hint that the task has no FPSIMD register context.

The ->mm check is only there to filter out tasks that cannot
possibly have FPSIMD context loaded, for optimisation purposes.
Also, TIF_FOREIGN_FPSTATE must always be checked anyway before
saving FPSIMD context back to memory.  For these reasons, the ->mm
checks are not useful, provided that TIF_FOREIGN_FPSTATE is
maintained in a consistent way for kernel threads.

This is true by construction, however: TIF_FOREIGN_FPSTATE is never
cleared except when returning to userspace or returning from a
signal; thus, for a true kernel thread no FPSIMD context is ever
loaded, TIF_FOREIGN_FPSTATE remains set and no context is ever
saved.

This patch removes the redundant checks and special-case code.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>

---

Changes since v9:

 * New patch.  Introduced during debugging, since the ->mm checks
   appear bogus and/or redundant, so are likely to be hiding or
   causing bugs.
---
 arch/arm64/include/asm/thread_info.h |  1 +
 arch/arm64/kernel/fpsimd.c           | 38 ++++++++++++------------------------
 2 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 740aa03c..a2ac914 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -47,6 +47,7 @@ struct thread_info {
 
 #define INIT_THREAD_INFO(tsk)						\
 {									\
+	.flags		= _TIF_FOREIGN_FPSTATE,				\
 	.preempt_count	= INIT_PREEMPT_COUNT,				\
 	.addr_limit	= KERNEL_DS,					\
 }
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 3aa100a..1222491 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -891,31 +891,21 @@ asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
 
 void fpsimd_thread_switch(struct task_struct *next)
 {
+	bool wrong_task, wrong_cpu;
+
 	if (!system_supports_fpsimd())
 		return;
-	/*
-	 * Save the current FPSIMD state to memory, but only if whatever is in
-	 * the registers is in fact the most recent userland FPSIMD state of
-	 * 'current'.
-	 */
-	if (current->mm)
-		fpsimd_save();
 
-	if (next->mm) {
-		/*
-		 * If we are switching to a task whose most recent userland
-		 * FPSIMD state is already in the registers of *this* cpu,
-		 * we can skip loading the state from memory. Otherwise, set
-		 * the TIF_FOREIGN_FPSTATE flag so the state will be loaded
-		 * upon the next return to userland.
-		 */
-		bool wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
+	/* Save unsaved fpsimd state, if any: */
+	fpsimd_save();
+
+	/* Fix up TIF_FOREIGN_FPSTATE to correctly describe next's state: */
+	wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
 					&next->thread.uw.fpsimd_state;
-		bool wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
+	wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
 
-		update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
-				       wrong_task || wrong_cpu);
-	}
+	update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
+			       wrong_task || wrong_cpu);
 }
 
 void fpsimd_flush_thread(void)
@@ -1120,9 +1110,8 @@ void kernel_neon_begin(void)
 
 	__this_cpu_write(kernel_neon_busy, true);
 
-	/* Save unsaved task fpsimd state, if any: */
-	if (current->mm)
-		fpsimd_save();
+	/* Save unsaved fpsimd state, if any: */
+	fpsimd_save();
 
 	/* Invalidate any task state remaining in the fpsimd regs: */
 	fpsimd_flush_cpu_state();
@@ -1244,8 +1233,7 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
 {
 	switch (cmd) {
 	case CPU_PM_ENTER:
-		if (current->mm)
-			fpsimd_save();
+		fpsimd_save();
 		fpsimd_flush_cpu_state();
 		break;
 	case CPU_PM_EXIT:
-- 
2.1.4

* [PATCH v10 08/18] arm64/sve: Refactor user SVE trap maintenance for external use
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

In preparation for optimising the way KVM manages switching the
guest and host FPSIMD state, it is necessary to provide a means for
code outside arch/arm64/kernel/fpsimd.c to restore the user trap
configuration for SVE correctly for the current task.

Rather than requiring external code to duplicate the maintenance
explicitly, this patch moves the trap maintenance to
fpsimd_bind_to_cpu(), since it is logically part of the work of
associating the current task with the cpu.

Because fpsimd_bind_to_cpu() is rather a cryptic name to publish
alongside fpsimd_bind_state_to_cpu(), the former function is
renamed to fpsimd_bind_task_to_cpu() to make its purpose more
explicit.

This patch makes appropriate changes to ensure that
fpsimd_bind_task_to_cpu() is always called alongside
task_fpsimd_load(), so that the trap maintenance continues to be
done in every situation where it was done prior to this patch.

As a side-effect, the metadata updates done by
fpsimd_bind_task_to_cpu() now change from conditional to
unconditional in the "already bound" case of sigreturn.  This is
harmless, and a couple of extra stores on this slow path will not
impact performance.  I consider this a reasonable price to pay for
a slightly cleaner interface.
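
To illustrate the payoff: once a later patch in this series (patch 10) exports fpsimd_bind_task_to_cpu(), an external caller can re-associate current with the cpu and have the EL0 SVE trap configuration restored as a side effect.  A minimal sketch of that pattern follows; the wrapper name is illustrative only, and kvm_arch_vcpu_put_fp() in patch 10 does essentially this on its no-save path:

static void example_rebind_current_fpsimd(void)
{
	local_bh_disable();

	/* Only rebind if current's state is still what is live in the regs: */
	if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
		fpsimd_bind_task_to_cpu();

	local_bh_enable();
}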

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/kernel/fpsimd.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 1222491..ba9e7df 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -257,16 +257,6 @@ static void task_fpsimd_load(void)
 			       sve_vq_from_vl(current->thread.sve_vl) - 1);
 	else
 		fpsimd_load_state(&current->thread.uw.fpsimd_state);
-
-	if (system_supports_sve()) {
-		/* Toggle SVE trapping for userspace if needed */
-		if (test_thread_flag(TIF_SVE))
-			sve_user_enable();
-		else
-			sve_user_disable();
-
-		/* Serialised by exception return to user */
-	}
 }
 
 /*
@@ -991,7 +981,7 @@ void fpsimd_signal_preserve_current_state(void)
  * Associate current's FPSIMD context with this cpu
  * Preemption must be disabled when calling this function.
  */
-static void fpsimd_bind_to_cpu(void)
+static void fpsimd_bind_task_to_cpu(void)
 {
 	struct fpsimd_last_state_struct *last =
 		this_cpu_ptr(&fpsimd_last_state);
@@ -999,6 +989,16 @@ static void fpsimd_bind_to_cpu(void)
 	last->st = &current->thread.uw.fpsimd_state;
 	last->sve_in_use = test_thread_flag(TIF_SVE);
 	current->thread.fpsimd_cpu = smp_processor_id();
+
+	if (system_supports_sve()) {
+		/* Toggle SVE trapping for userspace if needed */
+		if (test_thread_flag(TIF_SVE))
+			sve_user_enable();
+		else
+			sve_user_disable();
+
+		/* Serialised by exception return to user */
+	}
 }
 
 /*
@@ -1015,7 +1015,7 @@ void fpsimd_restore_current_state(void)
 
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		task_fpsimd_load();
-		fpsimd_bind_to_cpu();
+		fpsimd_bind_task_to_cpu();
 	}
 
 	local_bh_enable();
@@ -1038,9 +1038,9 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
 		fpsimd_to_sve(current);
 
 	task_fpsimd_load();
+	fpsimd_bind_task_to_cpu();
 
-	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE))
-		fpsimd_bind_to_cpu();
+	clear_thread_flag(TIF_FOREIGN_FPSTATE);
 
 	local_bh_enable();
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 09/18] KVM: arm64: Repurpose vcpu_arch.debug_flags for general-purpose flags
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

In struct vcpu_arch, the debug_flags field is used to store
debug-related flags about the vcpu state.

Since we are about to add some more flags related to FPSIMD and
SVE, it makes sense to add them to the existing flags field rather
than adding new fields.  Since there is only one debug_flags flag
defined so far, there is plenty of free space for expansion.

In preparation for adding more flags, this patch renames the
debug_flags field to simply "flags", and updates comments
appropriately.

The flag definitions are also moved to <asm/kvm_host.h>, since
their presence in <asm/kvm_asm.h> was for purely historical
reasons:  these definitions are not used from asm any more, and are
unlikely to be in future, as more Hyp asm is migrated to C.

KVM_ARM64_DEBUG_DIRTY_SHIFT has not been used since commit
1ea66d27e7b0 ("arm64: KVM: Move away from the assembly version of
the world switch"), so this patch gets rid of that too.

No functional change.
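
As a quick orientation, the renamed field is consumed exactly as debug_flags was; later patches in this series simply define further bits next to KVM_ARM64_DEBUG_DIRTY (for example KVM_ARM64_FP_ENABLED in patch 10).  A small sketch of the intended usage, where the helper is illustrative and not part of this patch:

/* vcpu_arch flags field values: */
#define KVM_ARM64_DEBUG_DIRTY		(1 << 0)

/* Illustrative helper only: */
static inline bool vcpu_debug_regs_dirty(const struct kvm_vcpu *vcpu)
{
	return vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY;
}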

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
---
 arch/arm64/include/asm/kvm_asm.h  | 3 ---
 arch/arm64/include/asm/kvm_host.h | 7 +++++--
 arch/arm64/kvm/debug.c            | 8 ++++----
 arch/arm64/kvm/hyp/debug-sr.c     | 6 +++---
 arch/arm64/kvm/hyp/sysreg-sr.c    | 4 ++--
 arch/arm64/kvm/sys_regs.c         | 9 ++++-----
 6 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index f6648a3..f62ccbf 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -30,9 +30,6 @@
 /* The hyp-stub will return this for any kvm_call_hyp() call */
 #define ARM_EXCEPTION_HYP_GONE	  HVC_STUB_ERR
 
-#define KVM_ARM64_DEBUG_DIRTY_SHIFT	0
-#define KVM_ARM64_DEBUG_DIRTY		(1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
-
 /* Translate a kernel address of @sym into its equivalent linear mapping */
 #define kvm_ksym_ref(sym)						\
 	({								\
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 469de8a..146c167 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -216,8 +216,8 @@ struct kvm_vcpu_arch {
 	/* Exception Information */
 	struct kvm_vcpu_fault_info fault;
 
-	/* Guest debug state */
-	u64 debug_flags;
+	/* Miscellaneous vcpu state flags */
+	u64 flags;
 
 	/*
 	 * We maintain more than a single set of debug registers to support
@@ -293,6 +293,9 @@ struct kvm_vcpu_arch {
 	bool sysregs_loaded_on_cpu;
 };
 
+/* vcpu_arch flags field values: */
+#define KVM_ARM64_DEBUG_DIRTY		(1 << 0)
+
 #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
 
 /*
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index a1f4ebd..00d4223 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -103,7 +103,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
  *
  * Additionally, KVM only traps guest accesses to the debug registers if
  * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
- * flag on vcpu->arch.debug_flags).  Since the guest must not interfere
+ * flag on vcpu->arch.flags).  Since the guest must not interfere
  * with the hardware state when debugging the guest, we must ensure that
  * trapping is enabled whenever we are debugging the guest using the
  * debug registers.
@@ -111,7 +111,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
 
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 {
-	bool trap_debug = !(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY);
+	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
 	unsigned long mdscr;
 
 	trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
@@ -184,7 +184,7 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 			vcpu_write_sys_reg(vcpu, mdscr, MDSCR_EL1);
 
 			vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
-			vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
+			vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
 			trap_debug = true;
 
 			trace_kvm_arm_set_regset("BKPTS", get_num_brps(),
@@ -206,7 +206,7 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 
 	/* If KDE or MDE are set, perform a full save/restore cycle. */
 	if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE))
-		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
+		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
 
 	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
 	trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_read_sys_reg(vcpu, MDSCR_EL1));
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 3e717f6..5000976 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -163,7 +163,7 @@ void __hyp_text __debug_switch_to_guest(struct kvm_vcpu *vcpu)
 	if (!has_vhe())
 		__debug_save_spe_nvhe(&vcpu->arch.host_debug_state.pmscr_el1);
 
-	if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
+	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
 		return;
 
 	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
@@ -185,7 +185,7 @@ void __hyp_text __debug_switch_to_host(struct kvm_vcpu *vcpu)
 	if (!has_vhe())
 		__debug_restore_spe_nvhe(vcpu->arch.host_debug_state.pmscr_el1);
 
-	if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
+	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
 		return;
 
 	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
@@ -196,7 +196,7 @@ void __hyp_text __debug_switch_to_host(struct kvm_vcpu *vcpu)
 	__debug_save_state(vcpu, guest_dbg, guest_ctxt);
 	__debug_restore_state(vcpu, host_dbg, host_ctxt);
 
-	vcpu->arch.debug_flags &= ~KVM_ARM64_DEBUG_DIRTY;
+	vcpu->arch.flags &= ~KVM_ARM64_DEBUG_DIRTY;
 }
 
 u32 __hyp_text __kvm_get_mdcr_el2(void)
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index b3894df..35bc168 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -196,7 +196,7 @@ void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
 	sysreg[DACR32_EL2] = read_sysreg(dacr32_el2);
 	sysreg[IFSR32_EL2] = read_sysreg(ifsr32_el2);
 
-	if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
+	if (has_vhe() || vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY)
 		sysreg[DBGVCR32_EL2] = read_sysreg(dbgvcr32_el2);
 }
 
@@ -218,7 +218,7 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
 	write_sysreg(sysreg[DACR32_EL2], dacr32_el2);
 	write_sysreg(sysreg[IFSR32_EL2], ifsr32_el2);
 
-	if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
+	if (has_vhe() || vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY)
 		write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2);
 }
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 6e3b969..a436373 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -31,7 +31,6 @@
 #include <asm/debug-monitors.h>
 #include <asm/esr.h>
 #include <asm/kvm_arm.h>
-#include <asm/kvm_asm.h>
 #include <asm/kvm_coproc.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_host.h>
@@ -338,7 +337,7 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
 {
 	if (p->is_write) {
 		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
-		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
+		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
 	} else {
 		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
 	}
@@ -369,7 +368,7 @@ static void reg_to_dbg(struct kvm_vcpu *vcpu,
 	}
 
 	*dbg_reg = val;
-	vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
+	vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
 }
 
 static void dbg_to_reg(struct kvm_vcpu *vcpu,
@@ -1441,7 +1440,7 @@ static bool trap_debug32(struct kvm_vcpu *vcpu,
 {
 	if (p->is_write) {
 		vcpu_cp14(vcpu, r->reg) = p->regval;
-		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
+		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
 	} else {
 		p->regval = vcpu_cp14(vcpu, r->reg);
 	}
@@ -1473,7 +1472,7 @@ static bool trap_xvr(struct kvm_vcpu *vcpu,
 		val |= p->regval << 32;
 		*dbg_reg = val;
 
-		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
+		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
 	} else {
 		p->regval = *dbg_reg >> 32;
 	}
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 10/18] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

This patch refactors KVM to align the host and guest FPSIMD
save/restore logic with each other for arm64.  This reduces the
number of redundant save/restore operations that must occur, and
reduces the common-case IRQ blackout time during guest exit storms
by saving the host state lazily and optimising away the need to
restore the host state before returning to the run loop.

Four hooks are defined in order to enable this:

 * kvm_arch_vcpu_run_map_fp():
   Called on PID change to map necessary bits of current to Hyp.

 * kvm_arch_vcpu_load_fp():
   Set up FP/SIMD for entering the KVM run loop (parse as
   "vcpu_load fp").

 * kvm_arch_vcpu_ctxsync_fp():
   Get FP/SIMD into a safe state for re-enabling interrupts after a
   guest exit back to the run loop.

   For arm64 specifically, this involves updating the host kernel's
   FPSIMD context tracking metadata so that kernel-mode NEON use
   will cause the vcpu's FPSIMD state to be saved back correctly
   into the vcpu struct.  This must be done before re-enabling
   interrupts because kernel-mode NEON may be used by softirqs.

 * kvm_arch_vcpu_put_fp():
   Save guest FP/SIMD state back to memory and dissociate from the
   CPU ("vcpu_put fp").

Also, the arm64 FPSIMD context switch code is updated to enable it
to save back FPSIMD state for a vcpu, not just current.  A few
helpers drive this:

 * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
   mark this CPU as having context fp (which may belong to a vcpu)
   currently loaded in its registers.  This is the non-task
   equivalent of the static function fpsimd_bind_task_to_cpu() in
   fpsimd.c.

 * fpsimd_save():
   exported to allow KVM to save the guest's FPSIMD state back to
   memory on exit from the run loop.

 * fpsimd_flush_cpu_state():
   invalidate any context's FPSIMD state that is currently loaded.
   Used to disassociate the vcpu from the CPU regs on run loop exit.

These changes allow the run loop to enable interrupts (and thus
softirqs that may use kernel-mode NEON) without having to save the
guest's FPSIMD state eagerly.
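
Taken together, the host-side sequencing around a guest run is roughly as sketched below.  This is a simplified illustration only: enter_guest() is a placeholder for the real VHE/non-VHE entry paths, and the real code drives kvm_arch_vcpu_load_fp()/kvm_arch_vcpu_put_fp() from vcpu_load/vcpu_put rather than around every run:

static int example_run_once(struct kvm_vcpu *vcpu)
{
	int ret;

	kvm_arch_vcpu_load_fp(vcpu);	/* vcpu_load: host FP regs may be live */

	local_irq_disable();
	ret = enter_guest(vcpu);	/* hyp lazily switches FP on the first trap */
	kvm_arch_vcpu_ctxsync_fp(vcpu);	/* must run before interrupts are re-enabled */
	local_irq_enable();

	kvm_arch_vcpu_put_fp(vcpu);	/* vcpu_put: write guest FP back if dirty */

	return ret;
}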

Some new vcpu_arch fields are added to make all this work.  Because
host FPSIMD state can now be saved back directly into current's
thread_struct as appropriate, host_cpu_context is no longer used
for preserving the FPSIMD state.  However, it is still needed for
preserving other things such as the host's system registers.  To
avoid ABI churn, the redundant storage space in host_cpu_context is
not removed for now.

arch/arm is not addressed by this patch and continues to use its
current save/restore logic.  It could provide implementations of
the helpers later if desired.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>

---

Reviewers note: tags retained because this delta is straightforward by
itself.  Please shout if you're not happy!

Changes since v9:

 * Remove redundant set_thread_flag(TIF_FOREIGN_FPSTATE) that is now
   implicit in fpsimd_flush_cpu_state().
---
 arch/arm/include/asm/kvm_host.h   |   8 +++
 arch/arm64/include/asm/fpsimd.h   |   6 +++
 arch/arm64/include/asm/kvm_host.h |  21 ++++++++
 arch/arm64/kernel/fpsimd.c        |  17 ++++--
 arch/arm64/kvm/Kconfig            |   1 +
 arch/arm64/kvm/Makefile           |   2 +-
 arch/arm64/kvm/fpsimd.c           | 111 ++++++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/switch.c       |  51 +++++++++---------
 virt/kvm/arm/arm.c                |   4 ++
 9 files changed, 191 insertions(+), 30 deletions(-)
 create mode 100644 arch/arm64/kvm/fpsimd.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index c7c28c8..ac870b2 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -303,6 +303,14 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
 int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 			       struct kvm_device_attr *attr);
 
+/*
+ * VFP/NEON switching is all done by the hyp switch code, so no need to
+ * coordinate with host context handling for this state:
+ */
+static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
+
 /* All host FP/SIMD state is restored on guest exit, so nothing to save: */
 static inline void kvm_fpsimd_flush_cpu_state(void) {}
 
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index aa7162a..3e00f70 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -41,6 +41,8 @@ struct task_struct;
 extern void fpsimd_save_state(struct user_fpsimd_state *state);
 extern void fpsimd_load_state(struct user_fpsimd_state *state);
 
+extern void fpsimd_save(void);
+
 extern void fpsimd_thread_switch(struct task_struct *next);
 extern void fpsimd_flush_thread(void);
 
@@ -49,7 +51,11 @@ extern void fpsimd_preserve_current_state(void);
 extern void fpsimd_restore_current_state(void);
 extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
 
+extern void fpsimd_bind_task_to_cpu(void);
+extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state);
+
 extern void fpsimd_flush_task_state(struct task_struct *target);
+extern void fpsimd_flush_cpu_state(void);
 extern void sve_flush_cpu_state(void);
 
 /* Maximum VL that SVE VL-agnostic software can transparently support */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 146c167..b3fe730 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -30,6 +30,7 @@
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmio.h>
+#include <asm/thread_info.h>
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
 
@@ -238,6 +239,10 @@ struct kvm_vcpu_arch {
 
 	/* Pointer to host CPU context */
 	kvm_cpu_context_t *host_cpu_context;
+
+	struct thread_info *host_thread_info;	/* hyp VA */
+	struct user_fpsimd_state *host_fpsimd_state;	/* hyp VA */
+
 	struct {
 		/* {Break,watch}point registers */
 		struct kvm_guest_debug_arch regs;
@@ -295,6 +300,9 @@ struct kvm_vcpu_arch {
 
 /* vcpu_arch flags field values: */
 #define KVM_ARM64_DEBUG_DIRTY		(1 << 0)
+#define KVM_ARM64_FP_ENABLED		(1 << 1) /* guest FP regs loaded */
+#define KVM_ARM64_FP_HOST		(1 << 2) /* host FP regs loaded */
+#define KVM_ARM64_HOST_SVE_IN_USE	(1 << 3) /* backup for host TIF_SVE */
 
 #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
 
@@ -423,6 +431,19 @@ static inline void __cpu_init_stage2(void)
 		  "PARange is %d bits, unsupported configuration!", parange);
 }
 
+/* Guest/host FPSIMD coordination helpers */
+int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
+
+#ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */
+static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
+{
+	return kvm_arch_vcpu_run_map_fp(vcpu);
+}
+#endif
+
 /*
  * All host FP/SIMD state is restored on guest exit, so nothing needs
  * doing here except in the SVE case:
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ba9e7df..ded7ffd 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -265,7 +265,7 @@ static void task_fpsimd_load(void)
  *
  * Softirqs (and preemption) must be disabled.
  */
-static void fpsimd_save(void)
+void fpsimd_save(void)
 {
 	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
 
@@ -981,7 +981,7 @@ void fpsimd_signal_preserve_current_state(void)
  * Associate current's FPSIMD context with this cpu
  * Preemption must be disabled when calling this function.
  */
-static void fpsimd_bind_task_to_cpu(void)
+void fpsimd_bind_task_to_cpu(void)
 {
 	struct fpsimd_last_state_struct *last =
 		this_cpu_ptr(&fpsimd_last_state);
@@ -1001,6 +1001,17 @@ static void fpsimd_bind_task_to_cpu(void)
 	}
 }
 
+void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
+{
+	struct fpsimd_last_state_struct *last =
+		this_cpu_ptr(&fpsimd_last_state);
+
+	WARN_ON(!in_softirq() && !irqs_disabled());
+
+	last->st = st;
+	last->sve_in_use = false;
+}
+
 /*
  * Load the userland FPSIMD state of 'current' from memory, but only if the
  * FPSIMD state already held in the registers is /not/ the most recent FPSIMD
@@ -1053,7 +1064,7 @@ void fpsimd_flush_task_state(struct task_struct *t)
 	t->thread.fpsimd_cpu = NR_CPUS;
 }
 
-static inline void fpsimd_flush_cpu_state(void)
+void fpsimd_flush_cpu_state(void)
 {
 	__this_cpu_write(fpsimd_last_state.st, NULL);
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index a2e3a5a..47b23bf 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -39,6 +39,7 @@ config KVM
 	select HAVE_KVM_IRQ_ROUTING
 	select IRQ_BYPASS_MANAGER
 	select HAVE_KVM_IRQ_BYPASS
+	select HAVE_KVM_VCPU_RUN_PID_CHANGE
 	---help---
 	  Support hosting virtualized guest machines.
 	  We don't support KVM with 16K page tables yet, due to the multiple
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 93afff9..0f2a135 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -19,7 +19,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
 kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
 kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
 kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
-kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o
+kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o
 
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
new file mode 100644
index 0000000..365933a
--- /dev/null
+++ b/arch/arm64/kvm/fpsimd.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arch/arm64/kvm/fpsimd.c: Guest/host FPSIMD context coordination helpers
+ *
+ * Copyright 2018 Arm Limited
+ * Author: Dave Martin <Dave.Martin@arm.com>
+ */
+#include <linux/bottom_half.h>
+#include <linux/sched.h>
+#include <linux/thread_info.h>
+#include <linux/kvm_host.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_mmu.h>
+
+/*
+ * Called on entry to KVM_RUN unless this vcpu previously ran at least
+ * once and the most recent prior KVM_RUN for this vcpu was called from
+ * the same task as current (highly likely).
+ *
+ * This is guaranteed to execute before kvm_arch_vcpu_load_fp(vcpu),
+ * such that on entering hyp the relevant parts of current are already
+ * mapped.
+ */
+int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	struct thread_info *ti = &current->thread_info;
+	struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
+
+	/*
+	 * Make sure the host task thread flags and fpsimd state are
+	 * visible to hyp:
+	 */
+	ret = create_hyp_mappings(ti, ti + 1, PAGE_HYP);
+	if (ret)
+		goto error;
+
+	ret = create_hyp_mappings(fpsimd, fpsimd + 1, PAGE_HYP);
+	if (ret)
+		goto error;
+
+	vcpu->arch.host_thread_info = kern_hyp_va(ti);
+	vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
+error:
+	return ret;
+}
+
+/*
+ * Prepare vcpu for saving the host's FPSIMD state and loading the guest's.
+ * The actual loading is done by the FPSIMD access trap taken to hyp.
+ *
+ * Here, we just set the correct metadata to indicate that the FPSIMD
+ * state in the cpu regs (if any) belongs to current on the host.
+ *
+ * TIF_SVE is backed up here, since it may get clobbered with guest state.
+ * This flag is restored by kvm_arch_vcpu_put_fp(vcpu).
+ */
+void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
+{
+	BUG_ON(system_supports_sve());
+	BUG_ON(!current->mm);
+
+	vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED | KVM_ARM64_HOST_SVE_IN_USE);
+	vcpu->arch.flags |= KVM_ARM64_FP_HOST;
+	if (test_thread_flag(TIF_SVE))
+		vcpu->arch.flags |= KVM_ARM64_HOST_SVE_IN_USE;
+}
+
+/*
+ * If the guest FPSIMD state was loaded, update the host's context
+ * tracking data to mark the CPU FPSIMD regs as dirty and belonging to vcpu
+ * so that they will be written back if the kernel clobbers them due to
+ * kernel-mode NEON before re-entry into the guest.
+ */
+void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
+{
+	WARN_ON_ONCE(!irqs_disabled());
+
+	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
+		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.gp_regs.fp_regs);
+		clear_thread_flag(TIF_FOREIGN_FPSTATE);
+		clear_thread_flag(TIF_SVE);
+	}
+}
+
+/*
+ * Write back the vcpu FPSIMD regs if they are dirty, and invalidate the
+ * cpu FPSIMD regs so that they can't be spuriously reused if this vcpu
+ * disappears and another task or vcpu appears that recycles the same
+ * struct user_fpsimd_state.
+ */
+void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
+{
+	local_bh_disable();
+
+	update_thread_flag(TIF_SVE,
+			   vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE);
+
+	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
+		/* Clean guest FP state to memory and invalidate cpu view */
+		fpsimd_save();
+		fpsimd_flush_cpu_state();
+	} else if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
+		/* Ensure user trap controls are correctly restored */
+		fpsimd_bind_task_to_cpu();
+	}
+
+	local_bh_enable();
+}
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index c0796c4..118f300 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -23,19 +23,21 @@
 
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_host.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 #include <asm/fpsimd.h>
 #include <asm/debug-monitors.h>
+#include <asm/thread_info.h>
 
-static bool __hyp_text __fpsimd_enabled_nvhe(void)
+/* Check whether the FP regs were dirtied while in the host-side run loop: */
+static bool __hyp_text update_fp_enabled(struct kvm_vcpu *vcpu)
 {
-	return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
-}
+	if (vcpu->arch.host_thread_info->flags & _TIF_FOREIGN_FPSTATE)
+		vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED |
+				      KVM_ARM64_FP_HOST);
 
-static bool fpsimd_enabled_vhe(void)
-{
-	return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
+	return !!(vcpu->arch.flags & KVM_ARM64_FP_ENABLED);
 }
 
 /* Save the 32-bit only FPSIMD system register state */
@@ -92,7 +94,10 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
 
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
-	val &= ~(CPACR_EL1_FPEN | CPACR_EL1_ZEN);
+	val &= ~CPACR_EL1_ZEN;
+	if (!update_fp_enabled(vcpu))
+		val &= ~CPACR_EL1_FPEN;
+
 	write_sysreg(val, cpacr_el1);
 
 	write_sysreg(kvm_get_hyp_vector(), vbar_el1);
@@ -105,7 +110,10 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 	__activate_traps_common(vcpu);
 
 	val = CPTR_EL2_DEFAULT;
-	val |= CPTR_EL2_TTA | CPTR_EL2_TFP | CPTR_EL2_TZ;
+	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
+	if (!update_fp_enabled(vcpu))
+		val |= CPTR_EL2_TFP;
+
 	write_sysreg(val, cptr_el2);
 }
 
@@ -321,8 +329,6 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
 void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
 				    struct kvm_vcpu *vcpu)
 {
-	kvm_cpu_context_t *host_ctxt;
-
 	if (has_vhe())
 		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
 			     cpacr_el1);
@@ -332,14 +338,19 @@ void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
 
 	isb();
 
-	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
-	__fpsimd_save_state(&host_ctxt->gp_regs.fp_regs);
+	if (vcpu->arch.flags & KVM_ARM64_FP_HOST) {
+		__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
+		vcpu->arch.flags &= ~KVM_ARM64_FP_HOST;
+	}
+
 	__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
 
 	/* Skip restoring fpexc32 for AArch64 guests */
 	if (!(read_sysreg(hcr_el2) & HCR_RW))
 		write_sysreg(vcpu->arch.ctxt.sys_regs[FPEXC32_EL2],
 			     fpexc32_el2);
+
+	vcpu->arch.flags |= KVM_ARM64_FP_ENABLED;
 }
 
 /*
@@ -418,7 +429,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_cpu_context *guest_ctxt;
-	bool fp_enabled;
 	u64 exit_code;
 
 	host_ctxt = vcpu->arch.host_cpu_context;
@@ -440,19 +450,14 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 		/* And we're baaack! */
 	} while (fixup_guest_exit(vcpu, &exit_code));
 
-	fp_enabled = fpsimd_enabled_vhe();
-
 	sysreg_save_guest_state_vhe(guest_ctxt);
 
 	__deactivate_traps(vcpu);
 
 	sysreg_restore_host_state_vhe(host_ctxt);
 
-	if (fp_enabled) {
-		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
-		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
+	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
 		__fpsimd_save_fpexc32(vcpu);
-	}
 
 	__debug_switch_to_host(vcpu);
 
@@ -464,7 +469,6 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_cpu_context *guest_ctxt;
-	bool fp_enabled;
 	u64 exit_code;
 
 	vcpu = kern_hyp_va(vcpu);
@@ -496,8 +500,6 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 		/* And we're baaack! */
 	} while (fixup_guest_exit(vcpu, &exit_code));
 
-	fp_enabled = __fpsimd_enabled_nvhe();
-
 	__sysreg_save_state_nvhe(guest_ctxt);
 	__sysreg32_save_state(vcpu);
 	__timer_disable_traps(vcpu);
@@ -508,11 +510,8 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 
 	__sysreg_restore_state_nvhe(host_ctxt);
 
-	if (fp_enabled) {
-		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
-		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
+	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
 		__fpsimd_save_fpexc32(vcpu);
-	}
 
 	/*
 	 * This must come after restoring the host sysregs, since a non-VHE
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index a4c1b76..bee226c 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -363,10 +363,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	kvm_vgic_load(vcpu);
 	kvm_timer_vcpu_load(vcpu);
 	kvm_vcpu_load_sysregs(vcpu);
+	kvm_arch_vcpu_load_fp(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	kvm_arch_vcpu_put_fp(vcpu);
 	kvm_vcpu_put_sysregs(vcpu);
 	kvm_timer_vcpu_put(vcpu);
 	kvm_vgic_put(vcpu);
@@ -778,6 +780,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (static_branch_unlikely(&userspace_irqchip_in_use))
 			kvm_timer_sync_hwstate(vcpu);
 
+		kvm_arch_vcpu_ctxsync_fp(vcpu);
+
 		/*
 		 * We may have taken a host interrupt in HYP mode (ie
 		 * while executing the guest). This interrupt is still
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 10/18] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
@ 2018-05-22 16:05   ` Dave Martin
  0 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: linux-arm-kernel

This patch refactors KVM to align the host and guest FPSIMD
save/restore logic with each other for arm64.  This reduces the
number of redundant save/restore operations that must occur, and
reduces the common-case IRQ blackout time during guest exit storms
by saving the host state lazily and optimising away the need to
restore the host state before returning to the run loop.

Four hooks are defined in order to enable this:

 * kvm_arch_vcpu_run_map_fp():
   Called on PID change to map necessary bits of current to Hyp.

 * kvm_arch_vcpu_load_fp():
   Set up FP/SIMD for entering the KVM run loop (parse as
   "vcpu_load fp").

 * kvm_arch_vcpu_ctxsync_fp():
   Get FP/SIMD into a safe state for re-enabling interrupts after a
   guest exit back to the run loop.

   For arm64 specifically, this involves updating the host kernel's
   FPSIMD context tracking metadata so that kernel-mode NEON use
   will cause the vcpu's FPSIMD state to be saved back correctly
   into the vcpu struct.  This must be done before re-enabling
   interrupts because kernel-mode NEON may be used by softirqs.

 * kvm_arch_vcpu_put_fp():
   Save guest FP/SIMD state back to memory and dissociate from the
   CPU ("vcpu_put fp").

Also, the arm64 FPSIMD context switch code is updated to enable it
to save back FPSIMD state for a vcpu, not just current.  A few
helpers drive this:

 * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
   mark this CPU as having context fp (which may belong to a vcpu)
   currently loaded in its registers.  This is the non-task
   equivalent of the static function fpsimd_bind_to_cpu() in
   fpsimd.c.

 * task_fpsimd_save():
   exported to allow KVM to save the guest's FPSIMD state back to
   memory on exit from the run loop.

 * fpsimd_flush_state():
   invalidate any context's FPSIMD state that is currently loaded.
   Used to disassociate the vcpu from the CPU regs on run loop exit.

These changes allow the run loop to enable interrupts (and thus
softirqs that may use kernel-mode NEON) without having to save the
guest's FPSIMD state eagerly.

Some new vcpu_arch fields are added to make all this work.  Because
host FPSIMD state can now be saved back directly into current's
thread_struct as appropriate, host_cpu_context is no longer used
for preserving the FPSIMD state.  However, it is still needed for
preserving other things such as the host's system registers.  To
avoid ABI churn, the redundant storage space in host_cpu_context is
not removed for now.

arch/arm is not addressed by this patch and continues to use its
current save/restore logic.  It could provide implementations of
the helpers later if desired.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>

---

Reviewers note: tags retained because this delta is straightforward by
itself.  Please shout if you're not happy!

Changes since v9:

 * Remove redundant set_thread_flag(TIF_FOREIGN_FPSTATE) that is now
   implicit in fpsimd_flush_cpu_state().
---
 arch/arm/include/asm/kvm_host.h   |   8 +++
 arch/arm64/include/asm/fpsimd.h   |   6 +++
 arch/arm64/include/asm/kvm_host.h |  21 ++++++++
 arch/arm64/kernel/fpsimd.c        |  17 ++++--
 arch/arm64/kvm/Kconfig            |   1 +
 arch/arm64/kvm/Makefile           |   2 +-
 arch/arm64/kvm/fpsimd.c           | 111 ++++++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/switch.c       |  51 +++++++++---------
 virt/kvm/arm/arm.c                |   4 ++
 9 files changed, 191 insertions(+), 30 deletions(-)
 create mode 100644 arch/arm64/kvm/fpsimd.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index c7c28c8..ac870b2 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -303,6 +303,14 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
 int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 			       struct kvm_device_attr *attr);
 
+/*
+ * VFP/NEON switching is all done by the hyp switch code, so no need to
+ * coordinate with host context handling for this state:
+ */
+static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
+
 /* All host FP/SIMD state is restored on guest exit, so nothing to save: */
 static inline void kvm_fpsimd_flush_cpu_state(void) {}
 
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index aa7162a..3e00f70 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -41,6 +41,8 @@ struct task_struct;
 extern void fpsimd_save_state(struct user_fpsimd_state *state);
 extern void fpsimd_load_state(struct user_fpsimd_state *state);
 
+extern void fpsimd_save(void);
+
 extern void fpsimd_thread_switch(struct task_struct *next);
 extern void fpsimd_flush_thread(void);
 
@@ -49,7 +51,11 @@ extern void fpsimd_preserve_current_state(void);
 extern void fpsimd_restore_current_state(void);
 extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
 
+extern void fpsimd_bind_task_to_cpu(void);
+extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state);
+
 extern void fpsimd_flush_task_state(struct task_struct *target);
+extern void fpsimd_flush_cpu_state(void);
 extern void sve_flush_cpu_state(void);
 
 /* Maximum VL that SVE VL-agnostic software can transparently support */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 146c167..b3fe730 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -30,6 +30,7 @@
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmio.h>
+#include <asm/thread_info.h>
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
 
@@ -238,6 +239,10 @@ struct kvm_vcpu_arch {
 
 	/* Pointer to host CPU context */
 	kvm_cpu_context_t *host_cpu_context;
+
+	struct thread_info *host_thread_info;	/* hyp VA */
+	struct user_fpsimd_state *host_fpsimd_state;	/* hyp VA */
+
 	struct {
 		/* {Break,watch}point registers */
 		struct kvm_guest_debug_arch regs;
@@ -295,6 +300,9 @@ struct kvm_vcpu_arch {
 
 /* vcpu_arch flags field values: */
 #define KVM_ARM64_DEBUG_DIRTY		(1 << 0)
+#define KVM_ARM64_FP_ENABLED		(1 << 1) /* guest FP regs loaded */
+#define KVM_ARM64_FP_HOST		(1 << 2) /* host FP regs loaded */
+#define KVM_ARM64_HOST_SVE_IN_USE	(1 << 3) /* backup for host TIF_SVE */
 
 #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
 
@@ -423,6 +431,19 @@ static inline void __cpu_init_stage2(void)
 		  "PARange is %d bits, unsupported configuration!", parange);
 }
 
+/* Guest/host FPSIMD coordination helpers */
+int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
+
+#ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */
+static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
+{
+	return kvm_arch_vcpu_run_map_fp(vcpu);
+}
+#endif
+
 /*
  * All host FP/SIMD state is restored on guest exit, so nothing needs
  * doing here except in the SVE case:
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ba9e7df..ded7ffd 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -265,7 +265,7 @@ static void task_fpsimd_load(void)
  *
  * Softirqs (and preemption) must be disabled.
  */
-static void fpsimd_save(void)
+void fpsimd_save(void)
 {
 	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
 
@@ -981,7 +981,7 @@ void fpsimd_signal_preserve_current_state(void)
  * Associate current's FPSIMD context with this cpu
  * Preemption must be disabled when calling this function.
  */
-static void fpsimd_bind_task_to_cpu(void)
+void fpsimd_bind_task_to_cpu(void)
 {
 	struct fpsimd_last_state_struct *last =
 		this_cpu_ptr(&fpsimd_last_state);
@@ -1001,6 +1001,17 @@ static void fpsimd_bind_task_to_cpu(void)
 	}
 }
 
+void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
+{
+	struct fpsimd_last_state_struct *last =
+		this_cpu_ptr(&fpsimd_last_state);
+
+	WARN_ON(!in_softirq() && !irqs_disabled());
+
+	last->st = st;
+	last->sve_in_use = false;
+}
+
 /*
  * Load the userland FPSIMD state of 'current' from memory, but only if the
  * FPSIMD state already held in the registers is /not/ the most recent FPSIMD
@@ -1053,7 +1064,7 @@ void fpsimd_flush_task_state(struct task_struct *t)
 	t->thread.fpsimd_cpu = NR_CPUS;
 }
 
-static inline void fpsimd_flush_cpu_state(void)
+void fpsimd_flush_cpu_state(void)
 {
 	__this_cpu_write(fpsimd_last_state.st, NULL);
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index a2e3a5a..47b23bf 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -39,6 +39,7 @@ config KVM
 	select HAVE_KVM_IRQ_ROUTING
 	select IRQ_BYPASS_MANAGER
 	select HAVE_KVM_IRQ_BYPASS
+	select HAVE_KVM_VCPU_RUN_PID_CHANGE
 	---help---
 	  Support hosting virtualized guest machines.
 	  We don't support KVM with 16K page tables yet, due to the multiple
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 93afff9..0f2a135 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -19,7 +19,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
 kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
 kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
 kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
-kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o
+kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o
 
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
new file mode 100644
index 0000000..365933a
--- /dev/null
+++ b/arch/arm64/kvm/fpsimd.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arch/arm64/kvm/fpsimd.c: Guest/host FPSIMD context coordination helpers
+ *
+ * Copyright 2018 Arm Limited
+ * Author: Dave Martin <Dave.Martin@arm.com>
+ */
+#include <linux/bottom_half.h>
+#include <linux/sched.h>
+#include <linux/thread_info.h>
+#include <linux/kvm_host.h>
+#include <asm/kvm_asm.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_mmu.h>
+
+/*
+ * Called on entry to KVM_RUN unless this vcpu previously ran at least
+ * once and the most recent prior KVM_RUN for this vcpu was called from
+ * the same task as current (highly likely).
+ *
+ * This is guaranteed to execute before kvm_arch_vcpu_load_fp(vcpu),
+ * such that on entering hyp the relevant parts of current are already
+ * mapped.
+ */
+int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	struct thread_info *ti = &current->thread_info;
+	struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
+
+	/*
+	 * Make sure the host task thread flags and fpsimd state are
+	 * visible to hyp:
+	 */
+	ret = create_hyp_mappings(ti, ti + 1, PAGE_HYP);
+	if (ret)
+		goto error;
+
+	ret = create_hyp_mappings(fpsimd, fpsimd + 1, PAGE_HYP);
+	if (ret)
+		goto error;
+
+	vcpu->arch.host_thread_info = kern_hyp_va(ti);
+	vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
+error:
+	return ret;
+}
+
+/*
+ * Prepare vcpu for saving the host's FPSIMD state and loading the guest's.
+ * The actual loading is done by the FPSIMD access trap taken to hyp.
+ *
+ * Here, we just set the correct metadata to indicate that the FPSIMD
+ * state in the cpu regs (if any) belongs to current on the host.
+ *
+ * TIF_SVE is backed up here, since it may get clobbered with guest state.
+ * This flag is restored by kvm_arch_vcpu_put_fp(vcpu).
+ */
+void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
+{
+	BUG_ON(system_supports_sve());
+	BUG_ON(!current->mm);
+
+	vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED | KVM_ARM64_HOST_SVE_IN_USE);
+	vcpu->arch.flags |= KVM_ARM64_FP_HOST;
+	if (test_thread_flag(TIF_SVE))
+		vcpu->arch.flags |= KVM_ARM64_HOST_SVE_IN_USE;
+}
+
+/*
+ * If the guest FPSIMD state was loaded, update the host's context
+ * tracking data to mark the CPU FPSIMD regs as dirty and belonging to vcpu
+ * so that they will be written back if the kernel clobbers them due to
+ * kernel-mode NEON before re-entry into the guest.
+ */
+void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
+{
+	WARN_ON_ONCE(!irqs_disabled());
+
+	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
+		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.gp_regs.fp_regs);
+		clear_thread_flag(TIF_FOREIGN_FPSTATE);
+		clear_thread_flag(TIF_SVE);
+	}
+}
+
+/*
+ * Write back the vcpu FPSIMD regs if they are dirty, and invalidate the
+ * cpu FPSIMD regs so that they can't be spuriously reused if this vcpu
+ * disappears and another task or vcpu appears that recycles the same
+ * struct fpsimd_state.
+ */
+void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
+{
+	local_bh_disable();
+
+	update_thread_flag(TIF_SVE,
+			   vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE);
+
+	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
+		/* Clean guest FP state to memory and invalidate cpu view */
+		fpsimd_save();
+		fpsimd_flush_cpu_state();
+	} else if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
+		/* Ensure user trap controls are correctly restored */
+		fpsimd_bind_task_to_cpu();
+	}
+
+	local_bh_enable();
+}
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index c0796c4..118f300 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -23,19 +23,21 @@
 
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_host.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 #include <asm/fpsimd.h>
 #include <asm/debug-monitors.h>
+#include <asm/thread_info.h>
 
-static bool __hyp_text __fpsimd_enabled_nvhe(void)
+/* Check whether the FP regs were dirtied while in the host-side run loop: */
+static bool __hyp_text update_fp_enabled(struct kvm_vcpu *vcpu)
 {
-	return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
-}
+	if (vcpu->arch.host_thread_info->flags & _TIF_FOREIGN_FPSTATE)
+		vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED |
+				      KVM_ARM64_FP_HOST);
 
-static bool fpsimd_enabled_vhe(void)
-{
-	return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
+	return !!(vcpu->arch.flags & KVM_ARM64_FP_ENABLED);
 }
 
 /* Save the 32-bit only FPSIMD system register state */
@@ -92,7 +94,10 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
 
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
-	val &= ~(CPACR_EL1_FPEN | CPACR_EL1_ZEN);
+	val &= ~CPACR_EL1_ZEN;
+	if (!update_fp_enabled(vcpu))
+		val &= ~CPACR_EL1_FPEN;
+
 	write_sysreg(val, cpacr_el1);
 
 	write_sysreg(kvm_get_hyp_vector(), vbar_el1);
@@ -105,7 +110,10 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 	__activate_traps_common(vcpu);
 
 	val = CPTR_EL2_DEFAULT;
-	val |= CPTR_EL2_TTA | CPTR_EL2_TFP | CPTR_EL2_TZ;
+	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
+	if (!update_fp_enabled(vcpu))
+		val |= CPTR_EL2_TFP;
+
 	write_sysreg(val, cptr_el2);
 }
 
@@ -321,8 +329,6 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
 void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
 				    struct kvm_vcpu *vcpu)
 {
-	kvm_cpu_context_t *host_ctxt;
-
 	if (has_vhe())
 		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
 			     cpacr_el1);
@@ -332,14 +338,19 @@ void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
 
 	isb();
 
-	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
-	__fpsimd_save_state(&host_ctxt->gp_regs.fp_regs);
+	if (vcpu->arch.flags & KVM_ARM64_FP_HOST) {
+		__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
+		vcpu->arch.flags &= ~KVM_ARM64_FP_HOST;
+	}
+
 	__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
 
 	/* Skip restoring fpexc32 for AArch64 guests */
 	if (!(read_sysreg(hcr_el2) & HCR_RW))
 		write_sysreg(vcpu->arch.ctxt.sys_regs[FPEXC32_EL2],
 			     fpexc32_el2);
+
+	vcpu->arch.flags |= KVM_ARM64_FP_ENABLED;
 }
 
 /*
@@ -418,7 +429,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_cpu_context *guest_ctxt;
-	bool fp_enabled;
 	u64 exit_code;
 
 	host_ctxt = vcpu->arch.host_cpu_context;
@@ -440,19 +450,14 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 		/* And we're baaack! */
 	} while (fixup_guest_exit(vcpu, &exit_code));
 
-	fp_enabled = fpsimd_enabled_vhe();
-
 	sysreg_save_guest_state_vhe(guest_ctxt);
 
 	__deactivate_traps(vcpu);
 
 	sysreg_restore_host_state_vhe(host_ctxt);
 
-	if (fp_enabled) {
-		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
-		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
+	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
 		__fpsimd_save_fpexc32(vcpu);
-	}
 
 	__debug_switch_to_host(vcpu);
 
@@ -464,7 +469,6 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_cpu_context *guest_ctxt;
-	bool fp_enabled;
 	u64 exit_code;
 
 	vcpu = kern_hyp_va(vcpu);
@@ -496,8 +500,6 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 		/* And we're baaack! */
 	} while (fixup_guest_exit(vcpu, &exit_code));
 
-	fp_enabled = __fpsimd_enabled_nvhe();
-
 	__sysreg_save_state_nvhe(guest_ctxt);
 	__sysreg32_save_state(vcpu);
 	__timer_disable_traps(vcpu);
@@ -508,11 +510,8 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 
 	__sysreg_restore_state_nvhe(host_ctxt);
 
-	if (fp_enabled) {
-		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
-		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
+	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
 		__fpsimd_save_fpexc32(vcpu);
-	}
 
 	/*
 	 * This must come after restoring the host sysregs, since a non-VHE
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index a4c1b76..bee226c 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -363,10 +363,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	kvm_vgic_load(vcpu);
 	kvm_timer_vcpu_load(vcpu);
 	kvm_vcpu_load_sysregs(vcpu);
+	kvm_arch_vcpu_load_fp(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	kvm_arch_vcpu_put_fp(vcpu);
 	kvm_vcpu_put_sysregs(vcpu);
 	kvm_timer_vcpu_put(vcpu);
 	kvm_vgic_put(vcpu);
@@ -778,6 +780,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (static_branch_unlikely(&userspace_irqchip_in_use))
 			kvm_timer_sync_hwstate(vcpu);
 
+		kvm_arch_vcpu_ctxsync_fp(vcpu);
+
 		/*
 		 * We may have taken a host interrupt in HYP mode (ie
 		 * while executing the guest). This interrupt is still
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread
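
For reference, the flag lifecycle introduced by the patch above can be
summarised with a small user-space model.  This is illustrative only:
plain C, no kernel APIs, and the sequencing is simplified (in the real
series the flags are updated on the next guest entry and the guest
state may already have been written back by fpsimd_save()); only the
flag names mirror the patch.

#include <stdbool.h>
#include <stdio.h>

/* Flag values are arbitrary; only the names mirror the patch above. */
#define FP_ENABLED		(1u << 0)	/* guest FP regs live in the cpu */
#define FP_HOST			(1u << 1)	/* host FP regs still live in the cpu */
#define HOST_SVE_IN_USE		(1u << 2)	/* host task had TIF_SVE set */

struct vcpu_model { unsigned int flags; };

/* kvm_arch_vcpu_load_fp(): host state is live, guest state is not (yet). */
static void load_fp(struct vcpu_model *v, bool host_tif_sve)
{
	v->flags &= ~(FP_ENABLED | HOST_SVE_IN_USE);
	v->flags |= FP_HOST;
	if (host_tif_sve)
		v->flags |= HOST_SVE_IN_USE;
}

/* __hyp_switch_fpsimd(): first guest FP access saves the host state lazily. */
static void fp_trap(struct vcpu_model *v)
{
	if (v->flags & FP_HOST) {
		/* the real code saves the host FP regs to memory here */
		v->flags &= ~FP_HOST;
	}
	/* ...and loads the guest FP regs here */
	v->flags |= FP_ENABLED;
}

/* update_fp_enabled(): if TIF_FOREIGN_FPSTATE was set while back in the
 * host run loop, neither the host nor the guest state is live any more. */
static void host_clobbered_fp(struct vcpu_model *v)
{
	v->flags &= ~(FP_ENABLED | FP_HOST);
}

/* kvm_arch_vcpu_put_fp(): write the guest regs back only if they are live. */
static void put_fp(struct vcpu_model *v)
{
	if (v->flags & FP_ENABLED)
		printf("guest owns the FP regs: save them, invalidate cpu view\n");
	else
		printf("guest does not own the FP regs: nothing to write back\n");
}

int main(void)
{
	struct vcpu_model v = { 0 };

	load_fp(&v, false);	/* KVM_RUN: host owns the FP regs */
	fp_trap(&v);		/* guest touches FP: lazy switch to guest state */
	put_fp(&v);		/* vcpu_put: guest regs are dirty, write them back */

	load_fp(&v, true);	/* next run; host task was using SVE this time */
	fp_trap(&v);
	host_clobbered_fp(&v);	/* regs clobbered by kernel-mode NEON in the run loop */
	put_fp(&v);		/* nothing left for vcpu_put to write back */
	return 0;
}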

* [PATCH v10 11/18] arm64/sve: Move read_zcr_features() out of cpufeature.h
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

Having read_zcr_features() inline in cpufeature.h results in that
header requiring #includes which make it hard to include
<asm/fpsimd.h> elsewhere without triggering header inclusion
cycles.

This is not a hot-path function and arguably should not be in
cpufeature.h in the first place, so this patch moves it to
fpsimd.c, compiled conditionally if CONFIG_ARM64_SVE=y.

This allows some SVE-related #includes to be dropped from
cpufeature.h, which will ease future maintenance.

A couple of missing #includes of <asm/fpsimd.h> are exposed by this
change under arch/arm64/.  This patch adds the missing #includes as
necessary.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 29 -----------------------------
 arch/arm64/include/asm/fpsimd.h     |  2 ++
 arch/arm64/include/asm/processor.h  |  1 +
 arch/arm64/kernel/fpsimd.c          | 28 ++++++++++++++++++++++++++++
 arch/arm64/kernel/ptrace.c          |  1 +
 5 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 09b0f2a..0a6b713 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -11,9 +11,7 @@
 
 #include <asm/cpucaps.h>
 #include <asm/cputype.h>
-#include <asm/fpsimd.h>
 #include <asm/hwcap.h>
-#include <asm/sigcontext.h>
 #include <asm/sysreg.h>
 
 /*
@@ -510,33 +508,6 @@ static inline bool system_supports_sve(void)
 		cpus_have_const_cap(ARM64_SVE);
 }
 
-/*
- * Read the pseudo-ZCR used by cpufeatures to identify the supported SVE
- * vector length.
- *
- * Use only if SVE is present.
- * This function clobbers the SVE vector length.
- */
-static inline u64 read_zcr_features(void)
-{
-	u64 zcr;
-	unsigned int vq_max;
-
-	/*
-	 * Set the maximum possible VL, and write zeroes to all other
-	 * bits to see if they stick.
-	 */
-	sve_kernel_enable(NULL);
-	write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL1);
-
-	zcr = read_sysreg_s(SYS_ZCR_EL1);
-	zcr &= ~(u64)ZCR_ELx_LEN_MASK; /* find sticky 1s outside LEN field */
-	vq_max = sve_vq_from_vl(sve_get_vl());
-	zcr |= vq_max - 1; /* set LEN field to maximum effective value */
-
-	return zcr;
-}
-
 #endif /* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 3e00f70..fb60b22 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -69,6 +69,8 @@ extern unsigned int sve_get_vl(void);
 struct arm64_cpu_capabilities;
 extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 
+extern u64 read_zcr_features(void);
+
 extern int __ro_after_init sve_max_vl;
 
 #ifdef CONFIG_ARM64_SVE
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 7675989..f902b6d 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -40,6 +40,7 @@
 
 #include <asm/alternative.h>
 #include <asm/cpufeature.h>
+#include <asm/fpsimd.h>
 #include <asm/hw_breakpoint.h>
 #include <asm/lse.h>
 #include <asm/pgtable-hwdef.h>
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ded7ffd..5152bbc 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -37,6 +37,7 @@
 #include <linux/sched/task_stack.h>
 #include <linux/signal.h>
 #include <linux/slab.h>
+#include <linux/stddef.h>
 #include <linux/sysctl.h>
 
 #include <asm/esr.h>
@@ -754,6 +755,33 @@ void sve_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p)
 	isb();
 }
 
+/*
+ * Read the pseudo-ZCR used by cpufeatures to identify the supported SVE
+ * vector length.
+ *
+ * Use only if SVE is present.
+ * This function clobbers the SVE vector length.
+ */
+u64 read_zcr_features(void)
+{
+	u64 zcr;
+	unsigned int vq_max;
+
+	/*
+	 * Set the maximum possible VL, and write zeroes to all other
+	 * bits to see if they stick.
+	 */
+	sve_kernel_enable(NULL);
+	write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL1);
+
+	zcr = read_sysreg_s(SYS_ZCR_EL1);
+	zcr &= ~(u64)ZCR_ELx_LEN_MASK; /* find sticky 1s outside LEN field */
+	vq_max = sve_vq_from_vl(sve_get_vl());
+	zcr |= vq_max - 1; /* set LEN field to maximum effective value */
+
+	return zcr;
+}
+
 void __init sve_setup(void)
 {
 	u64 zcr;
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 7ff81fe..78889c4 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -44,6 +44,7 @@
 #include <asm/compat.h>
 #include <asm/cpufeature.h>
 #include <asm/debug-monitors.h>
+#include <asm/fpsimd.h>
 #include <asm/pgtable.h>
 #include <asm/stacktrace.h>
 #include <asm/syscall.h>
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 11/18] arm64/sve: Move read_zcr_features() out of cpufeature.h
@ 2018-05-22 16:05   ` Dave Martin
  0 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: linux-arm-kernel

Having read_zcr_features() inline in cpufeature.h results in that
header requiring #includes which make it hard to include
<asm/fpsimd.h> elsewhere without triggering header inclusion
cycles.

This is not a hot-path function and arguably should not be in
cpufeature.h in the first place, so this patch moves it to
fpsimd.c, compiled conditionally if CONFIG_ARM64_SVE=y.

This allows some SVE-related #includes to be dropped from
cpufeature.h, which will ease future maintenance.

A couple of missing #includes of <asm/fpsimd.h> are exposed by this
change under arch/arm64/.  This patch adds the missing #includes as
necessary.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/cpufeature.h | 29 -----------------------------
 arch/arm64/include/asm/fpsimd.h     |  2 ++
 arch/arm64/include/asm/processor.h  |  1 +
 arch/arm64/kernel/fpsimd.c          | 28 ++++++++++++++++++++++++++++
 arch/arm64/kernel/ptrace.c          |  1 +
 5 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 09b0f2a..0a6b713 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -11,9 +11,7 @@
 
 #include <asm/cpucaps.h>
 #include <asm/cputype.h>
-#include <asm/fpsimd.h>
 #include <asm/hwcap.h>
-#include <asm/sigcontext.h>
 #include <asm/sysreg.h>
 
 /*
@@ -510,33 +508,6 @@ static inline bool system_supports_sve(void)
 		cpus_have_const_cap(ARM64_SVE);
 }
 
-/*
- * Read the pseudo-ZCR used by cpufeatures to identify the supported SVE
- * vector length.
- *
- * Use only if SVE is present.
- * This function clobbers the SVE vector length.
- */
-static inline u64 read_zcr_features(void)
-{
-	u64 zcr;
-	unsigned int vq_max;
-
-	/*
-	 * Set the maximum possible VL, and write zeroes to all other
-	 * bits to see if they stick.
-	 */
-	sve_kernel_enable(NULL);
-	write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL1);
-
-	zcr = read_sysreg_s(SYS_ZCR_EL1);
-	zcr &= ~(u64)ZCR_ELx_LEN_MASK; /* find sticky 1s outside LEN field */
-	vq_max = sve_vq_from_vl(sve_get_vl());
-	zcr |= vq_max - 1; /* set LEN field to maximum effective value */
-
-	return zcr;
-}
-
 #endif /* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 3e00f70..fb60b22 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -69,6 +69,8 @@ extern unsigned int sve_get_vl(void);
 struct arm64_cpu_capabilities;
 extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
 
+extern u64 read_zcr_features(void);
+
 extern int __ro_after_init sve_max_vl;
 
 #ifdef CONFIG_ARM64_SVE
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 7675989..f902b6d 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -40,6 +40,7 @@
 
 #include <asm/alternative.h>
 #include <asm/cpufeature.h>
+#include <asm/fpsimd.h>
 #include <asm/hw_breakpoint.h>
 #include <asm/lse.h>
 #include <asm/pgtable-hwdef.h>
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ded7ffd..5152bbc 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -37,6 +37,7 @@
 #include <linux/sched/task_stack.h>
 #include <linux/signal.h>
 #include <linux/slab.h>
+#include <linux/stddef.h>
 #include <linux/sysctl.h>
 
 #include <asm/esr.h>
@@ -754,6 +755,33 @@ void sve_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p)
 	isb();
 }
 
+/*
+ * Read the pseudo-ZCR used by cpufeatures to identify the supported SVE
+ * vector length.
+ *
+ * Use only if SVE is present.
+ * This function clobbers the SVE vector length.
+ */
+u64 read_zcr_features(void)
+{
+	u64 zcr;
+	unsigned int vq_max;
+
+	/*
+	 * Set the maximum possible VL, and write zeroes to all other
+	 * bits to see if they stick.
+	 */
+	sve_kernel_enable(NULL);
+	write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL1);
+
+	zcr = read_sysreg_s(SYS_ZCR_EL1);
+	zcr &= ~(u64)ZCR_ELx_LEN_MASK; /* find sticky 1s outside LEN field */
+	vq_max = sve_vq_from_vl(sve_get_vl());
+	zcr |= vq_max - 1; /* set LEN field to maximum effective value */
+
+	return zcr;
+}
+
 void __init sve_setup(void)
 {
 	u64 zcr;
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 7ff81fe..78889c4 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -44,6 +44,7 @@
 #include <asm/compat.h>
 #include <asm/cpufeature.h>
 #include <asm/debug-monitors.h>
+#include <asm/fpsimd.h>
 #include <asm/pgtable.h>
 #include <asm/stacktrace.h>
 #include <asm/syscall.h>
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 12/18] arm64/sve: Switch sve_pffr() argument from task to thread
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

sve_pffr(), which is used to derive the base address used for
low-level SVE save/restore routines, currently takes the relevant
task_struct as an argument.

The only accessed fields are actually part of thread_struct, so
this patch changes the argument type accordingly.  This is done in
preparation for moving this function to a header, where we do not
want to have to include <linux/sched.h> due to the consequent
circular #include problems.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kernel/fpsimd.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 5152bbc..c4e9762 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -44,6 +44,7 @@
 #include <asm/fpsimd.h>
 #include <asm/cpufeature.h>
 #include <asm/cputype.h>
+#include <asm/processor.h>
 #include <asm/simd.h>
 #include <asm/sigcontext.h>
 #include <asm/sysreg.h>
@@ -167,10 +168,9 @@ static size_t sve_ffr_offset(int vl)
 	return SVE_SIG_FFR_OFFSET(sve_vq_from_vl(vl)) - SVE_SIG_REGS_OFFSET;
 }
 
-static void *sve_pffr(struct task_struct *task)
+static void *sve_pffr(struct thread_struct *thread)
 {
-	return (char *)task->thread.sve_state +
-		sve_ffr_offset(task->thread.sve_vl);
+	return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
 }
 
 static void change_cpacr(u64 val, u64 mask)
@@ -253,7 +253,7 @@ static void task_fpsimd_load(void)
 	WARN_ON(!in_softirq() && !irqs_disabled());
 
 	if (system_supports_sve() && test_thread_flag(TIF_SVE))
-		sve_load_state(sve_pffr(current),
+		sve_load_state(sve_pffr(&current->thread),
 			       &current->thread.uw.fpsimd_state.fpsr,
 			       sve_vq_from_vl(current->thread.sve_vl) - 1);
 	else
@@ -284,7 +284,7 @@ void fpsimd_save(void)
 				return;
 			}
 
-			sve_save_state(sve_pffr(current), &st->fpsr);
+			sve_save_state(sve_pffr(&current->thread), &st->fpsr);
 		} else
 			fpsimd_save_state(st);
 	}
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 12/18] arm64/sve: Switch sve_pffr() argument from task to thread
@ 2018-05-22 16:05   ` Dave Martin
  0 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: linux-arm-kernel

sve_pffr(), which is used to derive the base address used for
low-level SVE save/restore routines, currently takes the relevant
task_struct as an argument.

The only accessed fields are actually part of thread_struct, so
this patch changes the argument type accordingly.  This is done in
preparation for moving this function to a header, where we do not
want to have to include <linux/sched.h> due to the consequent
circular #include problems.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kernel/fpsimd.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 5152bbc..c4e9762 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -44,6 +44,7 @@
 #include <asm/fpsimd.h>
 #include <asm/cpufeature.h>
 #include <asm/cputype.h>
+#include <asm/processor.h>
 #include <asm/simd.h>
 #include <asm/sigcontext.h>
 #include <asm/sysreg.h>
@@ -167,10 +168,9 @@ static size_t sve_ffr_offset(int vl)
 	return SVE_SIG_FFR_OFFSET(sve_vq_from_vl(vl)) - SVE_SIG_REGS_OFFSET;
 }
 
-static void *sve_pffr(struct task_struct *task)
+static void *sve_pffr(struct thread_struct *thread)
 {
-	return (char *)task->thread.sve_state +
-		sve_ffr_offset(task->thread.sve_vl);
+	return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
 }
 
 static void change_cpacr(u64 val, u64 mask)
@@ -253,7 +253,7 @@ static void task_fpsimd_load(void)
 	WARN_ON(!in_softirq() && !irqs_disabled());
 
 	if (system_supports_sve() && test_thread_flag(TIF_SVE))
-		sve_load_state(sve_pffr(current),
+		sve_load_state(sve_pffr(&current->thread),
 			       &current->thread.uw.fpsimd_state.fpsr,
 			       sve_vq_from_vl(current->thread.sve_vl) - 1);
 	else
@@ -284,7 +284,7 @@ void fpsimd_save(void)
 				return;
 			}
 
-			sve_save_state(sve_pffr(current), &st->fpsr);
+			sve_save_state(sve_pffr(&current->thread), &st->fpsr);
 		} else
 			fpsimd_save_state(st);
 	}
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 13/18] arm64/sve: Move sve_pffr() to fpsimd.h and make inline
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

In order to make sve_save_state()/sve_load_state() more easily
reusable and to get rid of a potential branch on context switch
critical paths, this patch makes sve_pffr() inline and moves it to
fpsimd.h.

<asm/processor.h> must be included in fpsimd.h in order to make
this work, and this creates an #include cycle that is tricky to
avoid without modifying core code, due to the way the PR_SVE_*()
prctl helpers are included in the core prctl implementation.

Instead of breaking the cycle, this patch defers inclusion of
<asm/fpsimd.h> in <asm/processor.h> until the point where it is
actually needed: i.e., immediately before the prctl definitions.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/fpsimd.h    | 13 +++++++++++++
 arch/arm64/include/asm/processor.h |  3 ++-
 arch/arm64/kernel/fpsimd.c         | 12 ------------
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index fb60b22..fa92747 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -18,6 +18,8 @@
 
 #include <asm/ptrace.h>
 #include <asm/errno.h>
+#include <asm/processor.h>
+#include <asm/sigcontext.h>
 
 #ifndef __ASSEMBLY__
 
@@ -61,6 +63,17 @@ extern void sve_flush_cpu_state(void);
 /* Maximum VL that SVE VL-agnostic software can transparently support */
 #define SVE_VL_ARCH_MAX 0x100
 
+/* Offset of FFR in the SVE register dump */
+static inline size_t sve_ffr_offset(int vl)
+{
+	return SVE_SIG_FFR_OFFSET(sve_vq_from_vl(vl)) - SVE_SIG_REGS_OFFSET;
+}
+
+static inline void *sve_pffr(struct thread_struct *thread)
+{
+	return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
+}
+
 extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index f902b6d..ebaadb1 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -40,7 +40,6 @@
 
 #include <asm/alternative.h>
 #include <asm/cpufeature.h>
-#include <asm/fpsimd.h>
 #include <asm/hw_breakpoint.h>
 #include <asm/lse.h>
 #include <asm/pgtable-hwdef.h>
@@ -245,6 +244,8 @@ void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused);
 void cpu_enable_cache_maint_trap(const struct arm64_cpu_capabilities *__unused);
 void cpu_clear_disr(const struct arm64_cpu_capabilities *__unused);
 
+#include <asm/fpsimd.h>
+
 /* Userspace interface for PR_SVE_{SET,GET}_VL prctl()s: */
 #define SVE_SET_VL(arg)	sve_set_current_vl(arg)
 #define SVE_GET_VL()	sve_get_current_vl()
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index c4e9762..f39d3b0 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -161,18 +161,6 @@ static void sve_free(struct task_struct *task)
 	__sve_free(task);
 }
 
-
-/* Offset of FFR in the SVE register dump */
-static size_t sve_ffr_offset(int vl)
-{
-	return SVE_SIG_FFR_OFFSET(sve_vq_from_vl(vl)) - SVE_SIG_REGS_OFFSET;
-}
-
-static void *sve_pffr(struct thread_struct *thread)
-{
-	return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
-}
-
 static void change_cpacr(u64 val, u64 mask)
 {
 	u64 cpacr = read_sysreg(CPACR_EL1);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 13/18] arm64/sve: Move sve_pffr() to fpsimd.h and make inline
@ 2018-05-22 16:05   ` Dave Martin
  0 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: linux-arm-kernel

In order to make sve_save_state()/sve_load_state() more easily
reusable and to get rid of a potential branch on context switch
critical paths, this patch makes sve_pffr() inline and moves it to
fpsimd.h.

<asm/processor.h> must be included in fpsimd.h in order to make
this work, and this creates an #include cycle that is tricky to
avoid without modifying core code, due to the way the PR_SVE_*()
prctl helpers are included in the core prctl implementation.

Instead of breaking the cycle, this patch defers inclusion of
<asm/fpsimd.h> in <asm/processor.h> until the point where it is
actually needed: i.e., immediately before the prctl definitions.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/fpsimd.h    | 13 +++++++++++++
 arch/arm64/include/asm/processor.h |  3 ++-
 arch/arm64/kernel/fpsimd.c         | 12 ------------
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index fb60b22..fa92747 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -18,6 +18,8 @@
 
 #include <asm/ptrace.h>
 #include <asm/errno.h>
+#include <asm/processor.h>
+#include <asm/sigcontext.h>
 
 #ifndef __ASSEMBLY__
 
@@ -61,6 +63,17 @@ extern void sve_flush_cpu_state(void);
 /* Maximum VL that SVE VL-agnostic software can transparently support */
 #define SVE_VL_ARCH_MAX 0x100
 
+/* Offset of FFR in the SVE register dump */
+static inline size_t sve_ffr_offset(int vl)
+{
+	return SVE_SIG_FFR_OFFSET(sve_vq_from_vl(vl)) - SVE_SIG_REGS_OFFSET;
+}
+
+static inline void *sve_pffr(struct thread_struct *thread)
+{
+	return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
+}
+
 extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
 			   unsigned long vq_minus_1);
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index f902b6d..ebaadb1 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -40,7 +40,6 @@
 
 #include <asm/alternative.h>
 #include <asm/cpufeature.h>
-#include <asm/fpsimd.h>
 #include <asm/hw_breakpoint.h>
 #include <asm/lse.h>
 #include <asm/pgtable-hwdef.h>
@@ -245,6 +244,8 @@ void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused);
 void cpu_enable_cache_maint_trap(const struct arm64_cpu_capabilities *__unused);
 void cpu_clear_disr(const struct arm64_cpu_capabilities *__unused);
 
+#include <asm/fpsimd.h>
+
 /* Userspace interface for PR_SVE_{SET,GET}_VL prctl()s: */
 #define SVE_SET_VL(arg)	sve_set_current_vl(arg)
 #define SVE_GET_VL()	sve_get_current_vl()
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index c4e9762..f39d3b0 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -161,18 +161,6 @@ static void sve_free(struct task_struct *task)
 	__sve_free(task);
 }
 
-
-/* Offset of FFR in the SVE register dump */
-static size_t sve_ffr_offset(int vl)
-{
-	return SVE_SIG_FFR_OFFSET(sve_vq_from_vl(vl)) - SVE_SIG_REGS_OFFSET;
-}
-
-static void *sve_pffr(struct thread_struct *thread)
-{
-	return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
-}
-
 static void change_cpacr(u64 val, u64 mask)
 {
 	u64 cpacr = read_sysreg(CPACR_EL1);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 14/18] KVM: arm64: Save host SVE context as appropriate
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

This patch adds SVE context saving to the hyp FPSIMD context switch
path.  This means that it is no longer necessary to save the host
SVE state in advance of entering the guest, when in use.

In order to avoid adding pointless complexity to the code, VHE is
assumed if SVE is in use.  VHE is an architectural prerequisite for
SVE, so there is no good reason to turn CONFIG_ARM64_VHE off in
kernels that support both SVE and KVM.

Historically, software models exist that can expose the
architecturally invalid configuration of SVE without VHE, so if
this situation is detected at kvm_init() time then KVM will be
disabled.
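
The hyp-side hunk below also recovers the host's thread_struct from the
stashed fpsimd_state pointer with container_of(), so that sve_pffr() and
the vector length can be found.  A minimal user-space sketch of that
pattern follows; the struct layout here is invented for illustration and
is not the kernel's.

#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Invented stand-ins for the kernel structures. */
struct fpsimd_state { unsigned long fpsr; };

struct thread {
	void *sve_state;		/* stands in for thread_struct::sve_state */
	unsigned int sve_vl;		/* stands in for thread_struct::sve_vl */
	struct fpsimd_state fpsimd;	/* stands in for uw.fpsimd_state */
};

int main(void)
{
	struct thread t = { .sve_vl = 32 };
	struct fpsimd_state *saved = &t.fpsimd;	/* what KVM stashes per vcpu */

	/* Recover the enclosing thread from the member pointer, as the
	 * hyp FPSIMD trap handler does before saving the SVE state. */
	struct thread *owner = container_of(saved, struct thread, fpsimd);

	printf("recovered owner ok: %d, sve_vl = %u\n", owner == &t, owner->sve_vl);
	return 0;
}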

Signed-off-by: Dave Martin <Dave.Martin@arm.com>

---

 * Tags stripped since v8, please reconfirm if possible:

Formerly-Reviewed-by: Christoffer Dall <christoffer.dall at arm.com>
Formerly-Acked-by: Marc Zyngier <marc.zyngier at arm.com>
Formerly-Acked-by: Catalin Marinas <catalin.marinas at arm.com>

Changes since v9:

Requested by Marc Zyngier:

 * Inline check for VHE if SVE is present into kvm_host.h.

   The check has been renamed to the more specific
   kvm_arch_check_sve_has_vhe(), and the kvm_pr_unimpl() moved back to
   arm.c (to avoid circular include issues).

   arm.c is not single-arch code, but it is all Arm-specific, so
   adding a hook like this doesn't seem too unreasonable.

Changes since v8:

 * Add kvm_arch_check_supported() hook, and move arm64-specific check
   for SVE-implies-VHE into arch/arm64/.

   Due to circular header dependency problems, it is difficult to get
   the prototype for kvm_pr_*() functions in <asm/kvm_host.h>, so this
   patch puts arm64's kvm_arch_check_supported() hook out of line.
   This is not a hot function.
---
 arch/arm/include/asm/kvm_host.h   |  1 +
 arch/arm64/Kconfig                |  7 +++++++
 arch/arm64/include/asm/kvm_host.h | 13 +++++++++++++
 arch/arm64/kvm/fpsimd.c           |  1 -
 arch/arm64/kvm/hyp/switch.c       | 20 +++++++++++++++++++-
 virt/kvm/arm/arm.c                |  7 +++++++
 6 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ac870b2..3b85bbb 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -280,6 +280,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 
+static inline bool kvm_arch_check_sve_has_vhe(void) { return true; }
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index eb2cf49..b0d3820 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1130,6 +1130,7 @@ endmenu
 config ARM64_SVE
 	bool "ARM Scalable Vector Extension support"
 	default y
+	depends on !KVM || ARM64_VHE
 	help
 	  The Scalable Vector Extension (SVE) is an extension to the AArch64
 	  execution state which complements and extends the SIMD functionality
@@ -1155,6 +1156,12 @@ config ARM64_SVE
 	  booting the kernel.  If unsure and you are not observing these
 	  symptoms, you should assume that it is safe to say Y.
 
+	  CPUs that support SVE are architecturally required to support the
+	  Virtualization Host Extensions (VHE), so the kernel makes no
+	  provision for supporting SVE alongside KVM without VHE enabled.
+	  Thus, you will need to enable CONFIG_ARM64_VHE if you want to support
+	  KVM in the same kernel image.
+
 config ARM64_MODULE_PLTS
 	bool
 	select HAVE_MOD_ARCH_SPECIFIC
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index b3fe730..06d5a61 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -405,6 +405,19 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 	kvm_call_hyp(__kvm_set_tpidr_el2, tpidr_el2);
 }
 
+static inline bool kvm_arch_check_sve_has_vhe(void)
+{
+	/*
+	 * The Arm architecture specifies that implementation of SVE
+	 * requires VHE also to be implemented.  The KVM code for arm64
+	 * relies on this when SVE is present:
+	 */
+	if (system_supports_sve())
+		return has_vhe();
+	else
+		return true;
+}
+
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 365933a..dc6ecfa 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -59,7 +59,6 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
  */
 void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 {
-	BUG_ON(system_supports_sve());
 	BUG_ON(!current->mm);
 
 	vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED | KVM_ARM64_HOST_SVE_IN_USE);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 118f300..a6a8c7d 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -21,6 +21,7 @@
 
 #include <kvm/arm_psci.h>
 
+#include <asm/cpufeature.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_host.h>
@@ -28,6 +29,7 @@
 #include <asm/kvm_mmu.h>
 #include <asm/fpsimd.h>
 #include <asm/debug-monitors.h>
+#include <asm/processor.h>
 #include <asm/thread_info.h>
 
 /* Check whether the FP regs were dirtied while in the host-side run loop: */
@@ -329,6 +331,8 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
 void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
 				    struct kvm_vcpu *vcpu)
 {
+	struct user_fpsimd_state *host_fpsimd = vcpu->arch.host_fpsimd_state;
+
 	if (has_vhe())
 		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
 			     cpacr_el1);
@@ -339,7 +343,21 @@ void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
 	isb();
 
 	if (vcpu->arch.flags & KVM_ARM64_FP_HOST) {
-		__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
+		/*
+		 * In the SVE case, VHE is assumed: it is enforced by
+		 * Kconfig and kvm_arch_init().
+		 */
+		if (system_supports_sve() &&
+		    (vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE)) {
+			struct thread_struct *thread = container_of(
+				host_fpsimd,
+				struct thread_struct, uw.fpsimd_state);
+
+			sve_save_state(sve_pffr(thread), &host_fpsimd->fpsr);
+		} else {
+			__fpsimd_save_state(host_fpsimd);
+		}
+
 		vcpu->arch.flags &= ~KVM_ARM64_FP_HOST;
 	}
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index bee226c..ce7c6f3 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -16,6 +16,7 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
 
+#include <linux/bug.h>
 #include <linux/cpu_pm.h>
 #include <linux/errno.h>
 #include <linux/err.h>
@@ -41,6 +42,7 @@
 #include <asm/mman.h>
 #include <asm/tlbflush.h>
 #include <asm/cacheflush.h>
+#include <asm/cpufeature.h>
 #include <asm/virt.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
@@ -1574,6 +1576,11 @@ int kvm_arch_init(void *opaque)
 		return -ENODEV;
 	}
 
+	if (!kvm_arch_check_sve_has_vhe()) {
+		kvm_pr_unimpl("SVE system without VHE unsupported.  Broken cpu?");
+		return -ENODEV;
+	}
+
 	for_each_online_cpu(cpu) {
 		smp_call_function_single(cpu, check_kvm_target_cpu, &ret, 1);
 		if (ret < 0) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 14/18] KVM: arm64: Save host SVE context as appropriate
@ 2018-05-22 16:05   ` Dave Martin
  0 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds SVE context saving to the hyp FPSIMD context switch
path.  This means that it is no longer necessary to save the host
SVE state in advance of entering the guest, when in use.

In order to avoid adding pointless complexity to the code, VHE is
assumed if SVE is in use.  VHE is an architectural prerequisite for
SVE, so there is no good reason to turn CONFIG_ARM64_VHE off in
kernels that support both SVE and KVM.

Historically, software models exist that can expose the
architecturally invalid configuration of SVE without VHE, so if
this situation is detected at kvm_init() time then KVM will be
disabled.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>

---

 * Tags stripped since v8, please reconfirm if possible:

Formerly-Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Formerly-Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Formerly-Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Changes since v9:

Requested by Marc Zyngier:

 * Inline check for VHE if SVE is present into kvm_host.h.

   The check has been renamed to the more specific
   kvm_arch_check_sve_has_vhe(), and the kvm_pr_unimpl() moved back to
   arm.c (to avoid circular include issues).

   arm.c is not single-arch code, but it is all Arm-specific, so
   adding a hook like this doesn't seem too unreasonable.

Changes since v8:

 * Add kvm_arch_check_supported() hook, and move arm64-specific check
   for SVE-implies-VHE into arch/arm64/.

   Due to circular header dependency problems, it is difficult to get
   the prototype for kvm_pr_*() functions in <asm/kvm_host.h>, so this
   patch puts arm64's kvm_arch_check_supported() hook out of line.
   This is not a hot function.
---
 arch/arm/include/asm/kvm_host.h   |  1 +
 arch/arm64/Kconfig                |  7 +++++++
 arch/arm64/include/asm/kvm_host.h | 13 +++++++++++++
 arch/arm64/kvm/fpsimd.c           |  1 -
 arch/arm64/kvm/hyp/switch.c       | 20 +++++++++++++++++++-
 virt/kvm/arm/arm.c                |  7 +++++++
 6 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index ac870b2..3b85bbb 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -280,6 +280,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 
+static inline bool kvm_arch_check_sve_has_vhe(void) { return true; }
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index eb2cf49..b0d3820 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1130,6 +1130,7 @@ endmenu
 config ARM64_SVE
 	bool "ARM Scalable Vector Extension support"
 	default y
+	depends on !KVM || ARM64_VHE
 	help
 	  The Scalable Vector Extension (SVE) is an extension to the AArch64
 	  execution state which complements and extends the SIMD functionality
@@ -1155,6 +1156,12 @@ config ARM64_SVE
 	  booting the kernel.  If unsure and you are not observing these
 	  symptoms, you should assume that it is safe to say Y.
 
+	  CPUs that support SVE are architecturally required to support the
+	  Virtualization Host Extensions (VHE), so the kernel makes no
+	  provision for supporting SVE alongside KVM without VHE enabled.
+	  Thus, you will need to enable CONFIG_ARM64_VHE if you want to support
+	  KVM in the same kernel image.
+
 config ARM64_MODULE_PLTS
 	bool
 	select HAVE_MOD_ARCH_SPECIFIC
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index b3fe730..06d5a61 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -405,6 +405,19 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 	kvm_call_hyp(__kvm_set_tpidr_el2, tpidr_el2);
 }
 
+static inline bool kvm_arch_check_sve_has_vhe(void)
+{
+	/*
+	 * The Arm architecture specifies that implementation of SVE
+	 * requires VHE also to be implemented.  The KVM code for arm64
+	 * relies on this when SVE is present:
+	 */
+	if (system_supports_sve())
+		return has_vhe();
+	else
+		return true;
+}
+
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 365933a..dc6ecfa 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -59,7 +59,6 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
  */
 void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 {
-	BUG_ON(system_supports_sve());
 	BUG_ON(!current->mm);
 
 	vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED | KVM_ARM64_HOST_SVE_IN_USE);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 118f300..a6a8c7d 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -21,6 +21,7 @@
 
 #include <kvm/arm_psci.h>
 
+#include <asm/cpufeature.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_host.h>
@@ -28,6 +29,7 @@
 #include <asm/kvm_mmu.h>
 #include <asm/fpsimd.h>
 #include <asm/debug-monitors.h>
+#include <asm/processor.h>
 #include <asm/thread_info.h>
 
 /* Check whether the FP regs were dirtied while in the host-side run loop: */
@@ -329,6 +331,8 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
 void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
 				    struct kvm_vcpu *vcpu)
 {
+	struct user_fpsimd_state *host_fpsimd = vcpu->arch.host_fpsimd_state;
+
 	if (has_vhe())
 		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
 			     cpacr_el1);
@@ -339,7 +343,21 @@ void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
 	isb();
 
 	if (vcpu->arch.flags & KVM_ARM64_FP_HOST) {
-		__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
+		/*
+		 * In the SVE case, VHE is assumed: it is enforced by
+		 * Kconfig and kvm_arch_init().
+		 */
+		if (system_supports_sve() &&
+		    (vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE)) {
+			struct thread_struct *thread = container_of(
+				host_fpsimd,
+				struct thread_struct, uw.fpsimd_state);
+
+			sve_save_state(sve_pffr(thread), &host_fpsimd->fpsr);
+		} else {
+			__fpsimd_save_state(host_fpsimd);
+		}
+
 		vcpu->arch.flags &= ~KVM_ARM64_FP_HOST;
 	}
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index bee226c..ce7c6f3 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -16,6 +16,7 @@
  * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
  */
 
+#include <linux/bug.h>
 #include <linux/cpu_pm.h>
 #include <linux/errno.h>
 #include <linux/err.h>
@@ -41,6 +42,7 @@
 #include <asm/mman.h>
 #include <asm/tlbflush.h>
 #include <asm/cacheflush.h>
+#include <asm/cpufeature.h>
 #include <asm/virt.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
@@ -1574,6 +1576,11 @@ int kvm_arch_init(void *opaque)
 		return -ENODEV;
 	}
 
+	if (!kvm_arch_check_sve_has_vhe()) {
+		kvm_pr_unimpl("SVE system without VHE unsupported.  Broken cpu?");
+		return -ENODEV;
+	}
+
 	for_each_online_cpu(cpu) {
 		smp_call_function_single(cpu, check_kvm_target_cpu, &ret, 1);
 		if (ret < 0) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 15/18] KVM: arm64: Remove eager host SVE state saving
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

Now that the host SVE context can be saved on demand from Hyp,
there is no longer any need to save this state in advance before
entering the guest.

This patch removes the relevant call to
kvm_fpsimd_flush_cpu_state().

Since the problem that function was intended to solve now no longer
exists, the function and its dependencies are also deleted.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/kvm_host.h   |  3 ---
 arch/arm64/include/asm/kvm_host.h | 10 ----------
 arch/arm64/kernel/fpsimd.c        | 21 ---------------------
 virt/kvm/arm/arm.c                |  3 ---
 4 files changed, 37 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 3b85bbb..f079a20 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -312,9 +312,6 @@ static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
 
-/* All host FP/SIMD state is restored on guest exit, so nothing to save: */
-static inline void kvm_fpsimd_flush_cpu_state(void) {}
-
 static inline void kvm_arm_vhe_guest_enter(void) {}
 static inline void kvm_arm_vhe_guest_exit(void) {}
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 06d5a61..ce7ed92 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -457,16 +457,6 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 }
 #endif
 
-/*
- * All host FP/SIMD state is restored on guest exit, so nothing needs
- * doing here except in the SVE case:
-*/
-static inline void kvm_fpsimd_flush_cpu_state(void)
-{
-	if (system_supports_sve())
-		sve_flush_cpu_state();
-}
-
 static inline void kvm_arm_vhe_guest_enter(void)
 {
 	local_daif_mask();
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index f39d3b0..ea5d780 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -120,7 +120,6 @@
  */
 struct fpsimd_last_state_struct {
 	struct user_fpsimd_state *st;
-	bool sve_in_use;
 };
 
 static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
@@ -1003,7 +1002,6 @@ void fpsimd_bind_task_to_cpu(void)
 		this_cpu_ptr(&fpsimd_last_state);
 
 	last->st = &current->thread.uw.fpsimd_state;
-	last->sve_in_use = test_thread_flag(TIF_SVE);
 	current->thread.fpsimd_cpu = smp_processor_id();
 
 	if (system_supports_sve()) {
@@ -1025,7 +1023,6 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
 	WARN_ON(!in_softirq() && !irqs_disabled());
 
 	last->st = st;
-	last->sve_in_use = false;
 }
 
 /*
@@ -1086,24 +1083,6 @@ void fpsimd_flush_cpu_state(void)
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
 }
 
-/*
- * Invalidate any task SVE state currently held in this CPU's regs.
- *
- * This is used to prevent the kernel from trying to reuse SVE register data
- * that is detroyed by KVM guest enter/exit.  This function should go away when
- * KVM SVE support is implemented.  Don't use it for anything else.
- */
-#ifdef CONFIG_ARM64_SVE
-void sve_flush_cpu_state(void)
-{
-	struct fpsimd_last_state_struct const *last =
-		this_cpu_ptr(&fpsimd_last_state);
-
-	if (last->st && last->sve_in_use)
-		fpsimd_flush_cpu_state();
-}
-#endif /* CONFIG_ARM64_SVE */
-
 #ifdef CONFIG_KERNEL_MODE_NEON
 
 DEFINE_PER_CPU(bool, kernel_neon_busy);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index ce7c6f3..39e7771 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -682,9 +682,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		preempt_disable();
 
-		/* Flush FP/SIMD state that can't survive guest entry/exit */
-		kvm_fpsimd_flush_cpu_state();
-
 		kvm_pmu_flush_hwstate(vcpu);
 
 		local_irq_disable();
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 15/18] KVM: arm64: Remove eager host SVE state saving
@ 2018-05-22 16:05   ` Dave Martin
  0 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: linux-arm-kernel

Now that the host SVE context can be saved on demand from Hyp,
there is no longer any need to save this state in advance before
entering the guest.

This patch removes the relevant call to
kvm_fpsimd_flush_cpu_state().

Since the problem that function was intended to solve now no longer
exists, the function and its dependencies are also deleted.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/include/asm/kvm_host.h   |  3 ---
 arch/arm64/include/asm/kvm_host.h | 10 ----------
 arch/arm64/kernel/fpsimd.c        | 21 ---------------------
 virt/kvm/arm/arm.c                |  3 ---
 4 files changed, 37 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 3b85bbb..f079a20 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -312,9 +312,6 @@ static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
 
-/* All host FP/SIMD state is restored on guest exit, so nothing to save: */
-static inline void kvm_fpsimd_flush_cpu_state(void) {}
-
 static inline void kvm_arm_vhe_guest_enter(void) {}
 static inline void kvm_arm_vhe_guest_exit(void) {}
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 06d5a61..ce7ed92 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -457,16 +457,6 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 }
 #endif
 
-/*
- * All host FP/SIMD state is restored on guest exit, so nothing needs
- * doing here except in the SVE case:
-*/
-static inline void kvm_fpsimd_flush_cpu_state(void)
-{
-	if (system_supports_sve())
-		sve_flush_cpu_state();
-}
-
 static inline void kvm_arm_vhe_guest_enter(void)
 {
 	local_daif_mask();
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index f39d3b0..ea5d780 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -120,7 +120,6 @@
  */
 struct fpsimd_last_state_struct {
 	struct user_fpsimd_state *st;
-	bool sve_in_use;
 };
 
 static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
@@ -1003,7 +1002,6 @@ void fpsimd_bind_task_to_cpu(void)
 		this_cpu_ptr(&fpsimd_last_state);
 
 	last->st = &current->thread.uw.fpsimd_state;
-	last->sve_in_use = test_thread_flag(TIF_SVE);
 	current->thread.fpsimd_cpu = smp_processor_id();
 
 	if (system_supports_sve()) {
@@ -1025,7 +1023,6 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
 	WARN_ON(!in_softirq() && !irqs_disabled());
 
 	last->st = st;
-	last->sve_in_use = false;
 }
 
 /*
@@ -1086,24 +1083,6 @@ void fpsimd_flush_cpu_state(void)
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
 }
 
-/*
- * Invalidate any task SVE state currently held in this CPU's regs.
- *
- * This is used to prevent the kernel from trying to reuse SVE register data
- * that is detroyed by KVM guest enter/exit.  This function should go away when
- * KVM SVE support is implemented.  Don't use it for anything else.
- */
-#ifdef CONFIG_ARM64_SVE
-void sve_flush_cpu_state(void)
-{
-	struct fpsimd_last_state_struct const *last =
-		this_cpu_ptr(&fpsimd_last_state);
-
-	if (last->st && last->sve_in_use)
-		fpsimd_flush_cpu_state();
-}
-#endif /* CONFIG_ARM64_SVE */
-
 #ifdef CONFIG_KERNEL_MODE_NEON
 
 DEFINE_PER_CPU(bool, kernel_neon_busy);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index ce7c6f3..39e7771 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -682,9 +682,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		preempt_disable();
 
-		/* Flush FP/SIMD state that can't survive guest entry/exit */
-		kvm_fpsimd_flush_cpu_state();
-
 		kvm_pmu_flush_hwstate(vcpu);
 
 		local_irq_disable();
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 16/18] KVM: arm64: Remove redundant *exit_code changes in fpsimd_guest_exit()
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

In fixup_guest_exit(), there are a couple of cases where, after
checking what the exit code was, we explicitly assign it the value
it already had.

Assuming this is not indicative of a bug, these assignments are not
needed.

This patch removes the redundant assignments, and simplifies some
if-nesting that becomes trivial as a result.

No functional change.
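
To illustrate, here is a throwaway user-space check that the two shapes
agree.  Nothing below is kernel code: __skip_instr() and the exit-code
value are stand-ins, and this path is only reached with *exit_code
already equal to ARM_EXCEPTION_TRAP.

#include <stdbool.h>
#include <assert.h>
#include <stdio.h>

#define ARM_EXCEPTION_TRAP 1	/* arbitrary stand-in value */

/* Old shape: redundant reassignment of *exit_code to the value it had. */
static bool old_shape(int ret, bool skip, int *exit_code)
{
	if (ret == 1) {
		if (skip)
			return true;
		else
			*exit_code = ARM_EXCEPTION_TRAP;
	}
	return false;
}

/* New shape from this patch. */
static bool new_shape(int ret, bool skip, int *exit_code)
{
	(void)exit_code;	/* no longer touched on this path */

	if (ret == 1 && skip)
		return true;
	return false;
}

int main(void)
{
	const int rets[] = { 1, 0, -1 };

	for (unsigned int i = 0; i < 3; i++) {
		for (int skip = 0; skip <= 1; skip++) {
			int ec_old = ARM_EXCEPTION_TRAP;
			int ec_new = ARM_EXCEPTION_TRAP;
			bool r_old = old_shape(rets[i], skip, &ec_old);
			bool r_new = new_shape(rets[i], skip, &ec_new);

			assert(r_old == r_new && ec_old == ec_new);
		}
	}
	printf("old and new control flow agree\n");
	return 0;
}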

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/hyp/switch.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index a6a8c7d..18d0faa 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -403,12 +403,8 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 		if (valid) {
 			int ret = __vgic_v2_perform_cpuif_access(vcpu);
 
-			if (ret == 1) {
-				if (__skip_instr(vcpu))
-					return true;
-				else
-					*exit_code = ARM_EXCEPTION_TRAP;
-			}
+			if (ret == 1 && __skip_instr(vcpu))
+				return true;
 
 			if (ret == -1) {
 				/* Promote an illegal access to an
@@ -430,12 +426,8 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 	     kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
 		int ret = __vgic_v3_perform_cpuif_access(vcpu);
 
-		if (ret == 1) {
-			if (__skip_instr(vcpu))
-				return true;
-			else
-				*exit_code = ARM_EXCEPTION_TRAP;
-		}
+		if (ret == 1 && __skip_instr(vcpu))
+			return true;
 	}
 
 	/* Return to the host kernel and handle the exit */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 17/18] KVM: arm64: Fold redundant exit code checks out of fixup_guest_exit()
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

The entire tail of fixup_guest_exit() is contained in if statements
of the form if (x && *exit_code == ARM_EXCEPTION_TRAP).  As a result,
we can check just once and bail out of the function early, allowing
the remaining if conditions to be simplified.

The only awkward case is where *exit_code is changed to
ARM_EXCEPTION_EL1_SERROR in the case of an illegal GICv2 CPU
interface access: in that case, the GICv3 trap handling code is
skipped using a goto.  This avoids pointlessly evaluating the
static branch check for the GICv3 case, even though we can't have
vgic_v2_cpuif_trap and vgic_v3_cpuif_trap true simultaneously
unless we have a GICv3 and GICv2 on the host: that sounds stupid,
but I haven't satisfied myself that it can't happen.
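
For orientation, the control flow that fixup_guest_exit() ends up with
after this patch looks roughly as follows (condensed sketch only; the
trap handling bodies are elided and the diff below is authoritative):

	if (*exit_code != ARM_EXCEPTION_TRAP)
		goto exit;

	if (!__populate_fault_info(vcpu))
		return true;

	if (static_branch_unlikely(&vgic_v2_cpuif_trap)) {
		/*
		 * GICv2 CPU interface trap handling; an illegal access
		 * sets *exit_code = ARM_EXCEPTION_EL1_SERROR and
		 * branches to exit.
		 */
	}

	if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
	    (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 ||
	     kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
		/* GICv3 CPU interface trap handling */
	}

exit:
	/* Return to the host kernel and handle the exit */
	return false;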

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/hyp/switch.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 18d0faa..4fbee95 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -387,11 +387,13 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 	 * same PC once the SError has been injected, and replay the
 	 * trapping instruction.
 	 */
-	if (*exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
+	if (*exit_code != ARM_EXCEPTION_TRAP)
+		goto exit;
+
+	if (!__populate_fault_info(vcpu))
 		return true;
 
-	if (static_branch_unlikely(&vgic_v2_cpuif_trap) &&
-	    *exit_code == ARM_EXCEPTION_TRAP) {
+	if (static_branch_unlikely(&vgic_v2_cpuif_trap)) {
 		bool valid;
 
 		valid = kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_DABT_LOW &&
@@ -417,11 +419,12 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 					*vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
 				*exit_code = ARM_EXCEPTION_EL1_SERROR;
 			}
+
+			goto exit;
 		}
 	}
 
 	if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
-	    *exit_code == ARM_EXCEPTION_TRAP &&
 	    (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 ||
 	     kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
 		int ret = __vgic_v3_perform_cpuif_access(vcpu);
@@ -430,6 +433,7 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 			return true;
 	}
 
+exit:
 	/* Return to the host kernel and handle the exit */
 	return false;
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v10 18/18] KVM: arm64: Invoke FPSIMD context switch trap from C
  2018-05-22 16:05 ` Dave Martin
@ 2018-05-22 16:05   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-22 16:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel

The conversion of the FPSIMD context switch trap code to C has added
some overhead to calling it, due to the need to save registers that
the procedure call standard defines as caller-saved.

So, perhaps it is no longer worth invoking this trap handler quite
so early.

Instead, we can invoke it from fixup_guest_exit(), with little
likelihood of increasing the overhead much further.

As a convenience, this patch gives __hyp_switch_fpsimd() the same
return semantics as fixup_guest_exit().  For now there is no
possibility of a spurious FPSIMD trap, so the function always
returns true, but this allows it to be tail-called with a single
return statement.
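
Concretely, the dispatch site added to fixup_guest_exit() boils down to a
single tail call (sketch matching the hunk below):

	if (system_supports_fpsimd() &&
	    kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_FP_ASIMD)
		return __hyp_switch_fpsimd(vcpu);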

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
---
 arch/arm64/kvm/hyp/entry.S     | 30 ------------------------------
 arch/arm64/kvm/hyp/hyp-entry.S | 19 -------------------
 arch/arm64/kvm/hyp/switch.c    | 15 +++++++++++++--
 3 files changed, 13 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 40f349b..fad1e16 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -166,33 +166,3 @@ abort_guest_exit_end:
 	orr	x0, x0, x5
 1:	ret
 ENDPROC(__guest_exit)
-
-ENTRY(__fpsimd_guest_restore)
-	// x0: esr
-	// x1: vcpu
-	// x2-x29,lr: vcpu regs
-	// vcpu x0-x1 on the stack
-	stp	x2, x3, [sp, #-144]!
-	stp	x4, x5, [sp, #16]
-	stp	x6, x7, [sp, #32]
-	stp	x8, x9, [sp, #48]
-	stp	x10, x11, [sp, #64]
-	stp	x12, x13, [sp, #80]
-	stp	x14, x15, [sp, #96]
-	stp	x16, x17, [sp, #112]
-	stp	x18, lr, [sp, #128]
-
-	bl	__hyp_switch_fpsimd
-
-	ldp	x4, x5, [sp, #16]
-	ldp	x6, x7, [sp, #32]
-	ldp	x8, x9, [sp, #48]
-	ldp	x10, x11, [sp, #64]
-	ldp	x12, x13, [sp, #80]
-	ldp	x14, x15, [sp, #96]
-	ldp	x16, x17, [sp, #112]
-	ldp	x18, lr, [sp, #128]
-	ldp	x0, x1, [sp, #144]
-	ldp	x2, x3, [sp], #160
-	eret
-ENDPROC(__fpsimd_guest_restore)
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index bffece2..753b9d2 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -113,25 +113,6 @@ el1_hvc_guest:
 
 el1_trap:
 	get_vcpu_ptr	x1, x0
-
-	mrs		x0, esr_el2
-	lsr		x0, x0, #ESR_ELx_EC_SHIFT
-	/*
-	 * x0: ESR_EC
-	 * x1: vcpu pointer
-	 */
-
-	/*
-	 * We trap the first access to the FP/SIMD to save the host context
-	 * and restore the guest context lazily.
-	 * If FP/SIMD is not implemented, handle the trap and inject an
-	 * undefined instruction exception to the guest.
-	 */
-alternative_if_not ARM64_HAS_NO_FPSIMD
-	cmp	x0, #ESR_ELx_EC_FP_ASIMD
-	b.eq	__fpsimd_guest_restore
-alternative_else_nop_endif
-
 	mov	x0, #ARM_EXCEPTION_TRAP
 	b	__guest_exit
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 4fbee95..2d45bd7 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -328,8 +328,7 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
 	}
 }
 
-void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
-				    struct kvm_vcpu *vcpu)
+static bool __hyp_text __hyp_switch_fpsimd(struct kvm_vcpu *vcpu)
 {
 	struct user_fpsimd_state *host_fpsimd = vcpu->arch.host_fpsimd_state;
 
@@ -369,6 +368,8 @@ void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
 			     fpexc32_el2);
 
 	vcpu->arch.flags |= KVM_ARM64_FP_ENABLED;
+
+	return true;
 }
 
 /*
@@ -390,6 +391,16 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 	if (*exit_code != ARM_EXCEPTION_TRAP)
 		goto exit;
 
+	/*
+	 * We trap the first access to the FP/SIMD to save the host context
+	 * and restore the guest context lazily.
+	 * If FP/SIMD is not implemented, handle the trap and inject an
+	 * undefined instruction exception to the guest.
+	 */
+	if (system_supports_fpsimd() &&
+	    kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_FP_ASIMD)
+		return __hyp_switch_fpsimd(vcpu);
+
 	if (!__populate_fault_info(vcpu))
 		return true;
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 01/18] arm64: fpsimd: Fix TIF_FOREIGN_FPSTATE after invalidating cpu regs
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-23 11:33     ` Christoffer Dall
  -1 siblings, 0 replies; 138+ messages in thread
From: Christoffer Dall @ 2018-05-23 11:33 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Tue, May 22, 2018 at 05:05:02PM +0100, Dave Martin wrote:
> fpsimd_last_state.st is set to NULL as a way of indicating that
> current's FPSIMD registers are no longer loaded in the cpu.  In
> particular, this is done when the kernel temporarily uses or
> clobbers the FPSIMD registers for its own purposes, as in CPU PM or
> kernel-mode NEON, resulting in them being populated with garbage
> data not belonging to a task.
> 
> Commit 17eed27b02da ("arm64/sve: KVM: Prevent guests from using
> SVE") factors this operation out as a new helper
> fpsimd_flush_cpu_state() to make it clearer what is being done
> here, and on SVE systems this helper is now used, via
> kvm_fpsimd_flush_cpu_state(), to invalidate the registers after KVM
> has run a vcpu.  The reason for this is that KVM does not yet
> understand how to restore the full host SVE registers itself after
> loading the guest FPSIMD context into them.
> 
> This exposes a particular problem: if fpsimd_last_state.st is set
> to NULL without also setting TIF_FOREIGN_FPSTATE, the kernel may
> continue to think that current's FPSIMD registers are live even
> though they have actually been clobbered.
> 
> Prior to the aforementioned commit, the only path where
> fpsimd_last_state.st is set to NULL without setting
> TIF_FOREIGN_FPSTATE is when kernel_neon_begin() is called by a
> kernel thread (where current->mm can be NULL).  This does not
> matter, because the only harm is that at context-switch time
> fpsimd_thread_switch() may unnecessarily save the FPSIMD registers
> back to current's thread_struct (even though kernel threads are not
> considered to have any FPSIMD context of their own and the
> registers will never be reloaded).
> 
> Note that although CPU_PM_ENTER lacks the TIF_FOREIGN_FPSTATE
> setting, every CPU passing through that path must subsequently pass
> through CPU_PM_EXIT before it can re-enter the kernel proper.
> CPU_PM_EXIT sets the flag.
> 
> The sve_flush_cpu_state() function added by commit 17eed27b02da
> also lacks the proper maintenance of TIF_FOREIGN_FPSTATE.  This may
> cause the bits of a host task's SVE registers that do not alias the
> FPSIMD register file to spontaneously appear zeroed if a KVM vcpu
> runs in the same task in the meantime.  Although this effect is
> hidden by the fact that the non-FPSIMD bits of the SVE registers
> are zeroed by a syscall anyway, it is doubtless a bad idea to rely
> on these different code paths interacting correctly under future
> maintenance.
> 
> This patch makes TIF_FOREIGN_FPSTATE an unconditional side-effect
> of fpsimd_flush_cpu_state(), and removes the set_thread_flag()
> calls that become redunant as a result.  This ensures that

nit: redundant

> TIF_FOREIGN_FPSTATE cannot remain clear if the FPSIMD state in the
> FPSIMD registers is invalid.
> 
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> 


> ---
> 
> Changes since v9:
> 
>  * New patch (bugfix to subsequent commits).
> ---
>  arch/arm64/kernel/fpsimd.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 87a3536..12e1c96 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -1067,6 +1067,7 @@ void fpsimd_flush_task_state(struct task_struct *t)
>  static inline void fpsimd_flush_cpu_state(void)
>  {
>  	__this_cpu_write(fpsimd_last_state.st, NULL);
> +	set_thread_flag(TIF_FOREIGN_FPSTATE);
>  }
>  
>  /*
> @@ -1121,10 +1122,8 @@ void kernel_neon_begin(void)
>  	__this_cpu_write(kernel_neon_busy, true);
>  
>  	/* Save unsaved task fpsimd state, if any: */
> -	if (current->mm) {
> +	if (current->mm)
>  		task_fpsimd_save();
> -		set_thread_flag(TIF_FOREIGN_FPSTATE);
> -	}
>  
>  	/* Invalidate any task state remaining in the fpsimd regs: */
>  	fpsimd_flush_cpu_state();
> @@ -1251,8 +1250,6 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
>  		fpsimd_flush_cpu_state();
>  		break;
>  	case CPU_PM_EXIT:
> -		if (current->mm)
> -			set_thread_flag(TIF_FOREIGN_FPSTATE);
>  		break;
>  	case CPU_PM_ENTER_FAILED:
>  	default:
> -- 
> 2.1.4
> 

Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-23 11:48     ` Christoffer Dall
  -1 siblings, 0 replies; 138+ messages in thread
From: Christoffer Dall @ 2018-05-23 11:48 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Tue, May 22, 2018 at 05:05:08PM +0100, Dave Martin wrote:
> Currently the FPSIMD handling code uses the condition task->mm ==
> NULL as a hint that task has no FPSIMD register context.
> 
> The ->mm check is only there to filter out tasks that cannot
> possibly have FPSIMD context loaded, for optimisation purposes.
> Also, TIF_FOREIGN_FPSTATE must always be checked anyway before
> saving FPSIMD context back to memory.  For these reasons, the ->mm
> checks are not useful, providing that that TIF_FOREIGN_FPSTATE is
> maintained in a consistent way for kernel threads.
> 
> This is true by construction however: TIF_FOREIGN_FPSTATE is never
> cleared except when returning to userspace or returning from a
> signal: thus, for a true kernel thread no FPSIMD context is ever
> loaded, TIF_FOREIGN_FPSTATE will remain set and no context will
> ever be saved.

I don't understand this construction proof; from looking at the patch
below it is not obvious to me why fpsimd_thread_switch() can never have
!wrong_task && !wrong_cpu and therefore clear TIF_FOREIGN_FPSTATE for a
kernel thread?


Thanks,
-Christoffer

> 
> This patch removes the redundant checks and special-case code.
> 
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> 
> ---
> 
> Changes since v9:
> 
>  * New patch.  Introduced during debugging, since the ->mm checks
>    appear bogus and/or redundant, so are likely to be hiding or
>    causing bugs.
> ---
>  arch/arm64/include/asm/thread_info.h |  1 +
>  arch/arm64/kernel/fpsimd.c           | 38 ++++++++++++------------------------
>  2 files changed, 14 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index 740aa03c..a2ac914 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -47,6 +47,7 @@ struct thread_info {
>  
>  #define INIT_THREAD_INFO(tsk)						\
>  {									\
> +	.flags		= _TIF_FOREIGN_FPSTATE,				\
>  	.preempt_count	= INIT_PREEMPT_COUNT,				\
>  	.addr_limit	= KERNEL_DS,					\
>  }
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 3aa100a..1222491 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -891,31 +891,21 @@ asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
>  
>  void fpsimd_thread_switch(struct task_struct *next)
>  {
> +	bool wrong_task, wrong_cpu;
> +
>  	if (!system_supports_fpsimd())
>  		return;
> -	/*
> -	 * Save the current FPSIMD state to memory, but only if whatever is in
> -	 * the registers is in fact the most recent userland FPSIMD state of
> -	 * 'current'.
> -	 */
> -	if (current->mm)
> -		fpsimd_save();
>  
> -	if (next->mm) {
> -		/*
> -		 * If we are switching to a task whose most recent userland
> -		 * FPSIMD state is already in the registers of *this* cpu,
> -		 * we can skip loading the state from memory. Otherwise, set
> -		 * the TIF_FOREIGN_FPSTATE flag so the state will be loaded
> -		 * upon the next return to userland.
> -		 */
> -		bool wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
> +	/* Save unsaved fpsimd state, if any: */
> +	fpsimd_save();
> +
> +	/* Fix up TIF_FOREIGN_FPSTATE to correctly describe next's state: */
> +	wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
>  					&next->thread.uw.fpsimd_state;
> -		bool wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
> +	wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
>  
> -		update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
> -				       wrong_task || wrong_cpu);
> -	}
> +	update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
> +			       wrong_task || wrong_cpu);
>  }
>  
>  void fpsimd_flush_thread(void)
> @@ -1120,9 +1110,8 @@ void kernel_neon_begin(void)
>  
>  	__this_cpu_write(kernel_neon_busy, true);
>  
> -	/* Save unsaved task fpsimd state, if any: */
> -	if (current->mm)
> -		fpsimd_save();
> +	/* Save unsaved fpsimd state, if any: */
> +	fpsimd_save();
>  
>  	/* Invalidate any task state remaining in the fpsimd regs: */
>  	fpsimd_flush_cpu_state();
> @@ -1244,8 +1233,7 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
>  {
>  	switch (cmd) {
>  	case CPU_PM_ENTER:
> -		if (current->mm)
> -			fpsimd_save();
> +		fpsimd_save();
>  		fpsimd_flush_cpu_state();
>  		break;
>  	case CPU_PM_EXIT:
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-23 11:48     ` Christoffer Dall
@ 2018-05-23 13:31       ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-23 13:31 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Wed, May 23, 2018 at 01:48:12PM +0200, Christoffer Dall wrote:
> On Tue, May 22, 2018 at 05:05:08PM +0100, Dave Martin wrote:
> > Currently the FPSIMD handling code uses the condition task->mm ==
> > NULL as a hint that task has no FPSIMD register context.
> > 
> > The ->mm check is only there to filter out tasks that cannot
> > possibly have FPSIMD context loaded, for optimisation purposes.
> > Also, TIF_FOREIGN_FPSTATE must always be checked anyway before
> > saving FPSIMD context back to memory.  For these reasons, the ->mm
> > checks are not useful, providing that that TIF_FOREIGN_FPSTATE is

Hmmm, "that that".  I'll fix that.

> > maintained in a consistent way for kernel threads.
> > 
> > This is true by construction however: TIF_FOREIGN_FPSTATE is never
> > cleared except when returning to userspace or returning from a
> > signal: thus, for a true kernel thread no FPSIMD context is ever
> > loaded, TIF_FOREIGN_FPSTATE will remain set and no context will
> > ever be saved.
> 
> I don't understand this construction proof; from looking at the patch
> below it is not obvious to me why fpsimd_thread_switch() can never have
> !wrong_task && !wrong_cpu and therefore clear TIF_FOREIGN_FPSTATE for a
> kernel thread?

Looking at this again, I think it is poorly worded.  This patch aims to
make it true by construction, but it isn't prior to the patch.

I'm tempted to delete the paragraph: the assertion is both untrue and
not the best way to justify that this patch works.


How about:

-8<-

The context switch logic already isolates user threads from each other.
Thus, it is sufficient for isolating user threads from the kernel,
since the goal either way is to ensure that code executing in userspace
cannot see any FPSIMD state except its own.  Thus, there is no special
property of kernel threads that we care about except that it is
pointless to save or load FPSIMD register state for them.

At worst, the removal of all the kernel thread special cases by this
patch would thus spuriously load and save state for kernel threads when
unnecessary.

But the context switch logic is already deliberately optimised to defer
reloads of the regs until ret_to_user (or sigreturn as a special case),
which kernel threads by definition never reach.

->8-


As an aside, I notice that we allow thread.fpsimd_cpu to be initialised
to 0 for the init task.  This means that the wrong_cpu check may yield
false for the init task when it shouldn't, because the init task's
FPSIMD state (which doesn't logically exist) is never loaded anywhere.
But the wrong_task check will always yield true for the init task for
the same reason, so this is just an inconsistency in the code rather
than a bug AFAICT.

copy_thread() calls fpsimd_flush_task_state() to make sure that
wrong_cpu is initially true for new tasks.  We should do the same for
the init task by adding

	.fpsimd_cpu = NR_CPUS,

to INIT_THREAD.
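
For illustration only, the resulting definition would look something like
this (a sketch: any other initialisers already present in INIT_THREAD
would of course be kept):

	#define INIT_THREAD {						\
		.fpsimd_cpu	= NR_CPUS,				\
	}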


Cheers
---Dave

> 
> 
> Thanks,
> -Christoffer
> 
> > 
> > This patch removes the redundant checks and special-case code.
> > 
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > 
> > ---
> > 
> > Changes since v9:
> > 
> >  * New patch.  Introduced during debugging, since the ->mm checks
> >    appear bogus and/or redundant, so are likely to be hiding or
> >    causing bugs.
> > ---
> >  arch/arm64/include/asm/thread_info.h |  1 +
> >  arch/arm64/kernel/fpsimd.c           | 38 ++++++++++++------------------------
> >  2 files changed, 14 insertions(+), 25 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> > index 740aa03c..a2ac914 100644
> > --- a/arch/arm64/include/asm/thread_info.h
> > +++ b/arch/arm64/include/asm/thread_info.h
> > @@ -47,6 +47,7 @@ struct thread_info {
> >  
> >  #define INIT_THREAD_INFO(tsk)						\
> >  {									\
> > +	.flags		= _TIF_FOREIGN_FPSTATE,				\
> >  	.preempt_count	= INIT_PREEMPT_COUNT,				\
> >  	.addr_limit	= KERNEL_DS,					\
> >  }
> > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> > index 3aa100a..1222491 100644
> > --- a/arch/arm64/kernel/fpsimd.c
> > +++ b/arch/arm64/kernel/fpsimd.c
> > @@ -891,31 +891,21 @@ asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
> >  
> >  void fpsimd_thread_switch(struct task_struct *next)
> >  {
> > +	bool wrong_task, wrong_cpu;
> > +
> >  	if (!system_supports_fpsimd())
> >  		return;
> > -	/*
> > -	 * Save the current FPSIMD state to memory, but only if whatever is in
> > -	 * the registers is in fact the most recent userland FPSIMD state of
> > -	 * 'current'.
> > -	 */
> > -	if (current->mm)
> > -		fpsimd_save();
> >  
> > -	if (next->mm) {
> > -		/*
> > -		 * If we are switching to a task whose most recent userland
> > -		 * FPSIMD state is already in the registers of *this* cpu,
> > -		 * we can skip loading the state from memory. Otherwise, set
> > -		 * the TIF_FOREIGN_FPSTATE flag so the state will be loaded
> > -		 * upon the next return to userland.
> > -		 */
> > -		bool wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
> > +	/* Save unsaved fpsimd state, if any: */
> > +	fpsimd_save();
> > +
> > +	/* Fix up TIF_FOREIGN_FPSTATE to correctly describe next's state: */
> > +	wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
> >  					&next->thread.uw.fpsimd_state;
> > -		bool wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
> > +	wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
> >  
> > -		update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
> > -				       wrong_task || wrong_cpu);
> > -	}
> > +	update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
> > +			       wrong_task || wrong_cpu);
> >  }
> >  
> >  void fpsimd_flush_thread(void)
> > @@ -1120,9 +1110,8 @@ void kernel_neon_begin(void)
> >  
> >  	__this_cpu_write(kernel_neon_busy, true);
> >  
> > -	/* Save unsaved task fpsimd state, if any: */
> > -	if (current->mm)
> > -		fpsimd_save();
> > +	/* Save unsaved fpsimd state, if any: */
> > +	fpsimd_save();
> >  
> >  	/* Invalidate any task state remaining in the fpsimd regs: */
> >  	fpsimd_flush_cpu_state();
> > @@ -1244,8 +1233,7 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
> >  {
> >  	switch (cmd) {
> >  	case CPU_PM_ENTER:
> > -		if (current->mm)
> > -			fpsimd_save();
> > +		fpsimd_save();
> >  		fpsimd_flush_cpu_state();
> >  		break;
> >  	case CPU_PM_EXIT:
> > -- 
> > 2.1.4

[...]


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 01/18] arm64: fpsimd: Fix TIF_FOREIGN_FPSTATE after invalidating cpu regs
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-23 13:44     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-23 13:44 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> fpsimd_last_state.st is set to NULL as a way of indicating that
> current's FPSIMD registers are no longer loaded in the cpu.  In
> particular, this is done when the kernel temporarily uses or
> clobbers the FPSIMD registers for its own purposes, as in CPU PM or
> kernel-mode NEON, resulting in them being populated with garbage
> data not belonging to a task.
<snip>
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 01/18] arm64: fpsimd: Fix TIF_FOREIGN_FPSTATE after invalidating cpu regs
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-23 13:46     ` Catalin Marinas
  -1 siblings, 0 replies; 138+ messages in thread
From: Catalin Marinas @ 2018-05-23 13:46 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Will Deacon,
	kvmarm, linux-arm-kernel

On Tue, May 22, 2018 at 05:05:02PM +0100, Dave P Martin wrote:
> fpsimd_last_state.st is set to NULL as a way of indicating that
> current's FPSIMD registers are no longer loaded in the cpu.  In
> particular, this is done when the kernel temporarily uses or
> clobbers the FPSIMD registers for its own purposes, as in CPU PM or
> kernel-mode NEON, resulting in them being populated with garbage
> data not belonging to a task.
> 
> Commit 17eed27b02da ("arm64/sve: KVM: Prevent guests from using
> SVE") factors this operation out as a new helper
> fpsimd_flush_cpu_state() to make it clearer what is being done
> here, and on SVE systems this helper is now used, via
> kvm_fpsimd_flush_cpu_state(), to invalidate the registers after KVM
> has run a vcpu.  The reason for this is that KVM does not yet
> understand how to restore the full host SVE registers itself after
> loading the guest FPSIMD context into them.
> 
> This exposes a particular problem: if fpsimd_last_state.st is set
> to NULL without also setting TIF_FOREIGN_FPSTATE, the kernel may
> continue to think that current's FPSIMD registers are live even
> though they have actually been clobbered.
> 
> Prior to the aforementioned commit, the only path where
> fpsimd_last_state.st is set to NULL without setting
> TIF_FOREIGN_FPSTATE is when kernel_neon_begin() is called by a
> kernel thread (where current->mm can be NULL).  This does not
> matter, because the only harm is that at context-switch time
> fpsimd_thread_switch() may unnecessarily save the FPSIMD registers
> back to current's thread_struct (even though kernel threads are not
> considered to have any FPSIMD context of their own and the
> registers will never be reloaded).
> 
> Note that although CPU_PM_ENTER lacks the TIF_FOREIGN_FPSTATE
> setting, every CPU passing through that path must subsequently pass
> through CPU_PM_EXIT before it can re-enter the kernel proper.
> CPU_PM_EXIT sets the flag.
> 
> The sve_flush_cpu_state() function added by commit 17eed27b02da
> also lacks the proper maintenance of TIF_FOREIGN_FPSTATE.  This may
> cause the bits of a host task's SVE registers that do not alias the
> FPSIMD register file to spontaneously appear zeroed if a KVM vcpu
> runs in the same task in the meantime.  Although this effect is
> hidden by the fact that the non-FPSIMD bits of the SVE registers
> are zeroed by a syscall anyway, it is doubtless a bad idea to rely
> on these different code paths interacting correctly under future
> maintenance.
> 
> This patch makes TIF_FOREIGN_FPSTATE an unconditional side-effect
> of fpsimd_flush_cpu_state(), and removes the set_thread_flag()
> calls that become redunant as a result.  This ensures that
> TIF_FOREIGN_FPSTATE cannot remain clear if the FPSIMD state in the
> FPSIMD registers is invalid.
> 
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 02/18] thread_info: Add update_thread_flag() helpers
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-23 13:46     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-23 13:46 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, Oleg Nesterov, Peter Zijlstra, Ingo Molnar, kvmarm,
	linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> There are a number of bits of code sprinkled around the kernel to
> set a thread flag if a certain condition is true, and clear it
> otherwise.
>
> To help make those call sites terser and less cumbersome, this
> patch adds a new family of thread flag manipulators
>
> 	update*_thread_flag([...,] flag, cond)
>
> which do the equivalent of:
>
> 	if (cond)
> 		set*_thread_flag([...,] flag);
> 	else
> 		clear*_thread_flag([...,] flag);
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> ---
>  include/linux/sched.h       |  6 ++++++
>  include/linux/thread_info.h | 11 +++++++++++
>  2 files changed, 17 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b3d697f..c2c3051 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1578,6 +1578,12 @@ static inline void clear_tsk_thread_flag(struct task_struct *tsk, int flag)
>  	clear_ti_thread_flag(task_thread_info(tsk), flag);
>  }
>
> +static inline void update_tsk_thread_flag(struct task_struct *tsk, int flag,
> +					  bool value)
> +{
> +	update_ti_thread_flag(task_thread_info(tsk), flag, value);
> +}
> +
>  static inline int test_and_set_tsk_thread_flag(struct task_struct *tsk, int flag)
>  {
>  	return test_and_set_ti_thread_flag(task_thread_info(tsk), flag);
> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index cf2862b..8d8821b 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -60,6 +60,15 @@ static inline void clear_ti_thread_flag(struct thread_info *ti, int flag)
>  	clear_bit(flag, (unsigned long *)&ti->flags);
>  }
>
> +static inline void update_ti_thread_flag(struct thread_info *ti, int flag,
> +					 bool value)
> +{
> +	if (value)
> +		set_ti_thread_flag(ti, flag);
> +	else
> +		clear_ti_thread_flag(ti, flag);
> +}
> +

value does seem a bit of vanilla non-informative name for a condition
flag but whatever:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


>  static inline int test_and_set_ti_thread_flag(struct thread_info *ti, int flag)
>  {
>  	return test_and_set_bit(flag, (unsigned long *)&ti->flags);
> @@ -79,6 +88,8 @@ static inline int test_ti_thread_flag(struct thread_info *ti, int flag)
>  	set_ti_thread_flag(current_thread_info(), flag)
>  #define clear_thread_flag(flag) \
>  	clear_ti_thread_flag(current_thread_info(), flag)
> +#define update_thread_flag(flag, value) \
> +	update_ti_thread_flag(current_thread_info(), flag, value)
>  #define test_and_set_thread_flag(flag) \
>  	test_and_set_ti_thread_flag(current_thread_info(), flag)
>  #define test_and_clear_thread_flag(flag) \


--
Alex Bennée


^ permalink raw reply	[flat|nested] 138+ messages in thread
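
For a concrete sense of how a call site shrinks with these helpers, the conversion done by patch 3 of this series (quoted below in this thread) reduces an open-coded set/clear pair to a single call:

	/* Before: open-coded set/clear based on a condition. */
	if (flags & PR_SVE_VL_INHERIT)
		set_tsk_thread_flag(task, TIF_SVE_VL_INHERIT);
	else
		clear_tsk_thread_flag(task, TIF_SVE_VL_INHERIT);

	/* After: the same logic via the new helper. */
	update_tsk_thread_flag(task, TIF_SVE_VL_INHERIT,
			       flags & PR_SVE_VL_INHERIT);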


* Re: [PATCH v10 03/18] arm64: Use update{,_tsk}_thread_flag()
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-23 13:48     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-23 13:48 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> This patch uses the new update_thread_flag() helpers to simplify a
> couple of if () set; else clear; constructs.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  arch/arm64/kernel/fpsimd.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 12e1c96..9d85373 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -618,10 +618,8 @@ int sve_set_vector_length(struct task_struct *task,
>  	task->thread.sve_vl = vl;
>
>  out:
> -	if (flags & PR_SVE_VL_INHERIT)
> -		set_tsk_thread_flag(task, TIF_SVE_VL_INHERIT);
> -	else
> -		clear_tsk_thread_flag(task, TIF_SVE_VL_INHERIT);
> +	update_tsk_thread_flag(task, TIF_SVE_VL_INHERIT,
> +			       flags & PR_SVE_VL_INHERIT);
>
>  	return 0;
>  }
> @@ -910,12 +908,12 @@ void fpsimd_thread_switch(struct task_struct *next)
>  		 * the TIF_FOREIGN_FPSTATE flag so the state will be loaded
>  		 * upon the next return to userland.
>  		 */
> -		if (__this_cpu_read(fpsimd_last_state.st) ==
> -			&next->thread.uw.fpsimd_state
> -		    && next->thread.fpsimd_cpu == smp_processor_id())
> -			clear_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE);
> -		else
> -			set_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE);
> +		bool wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
> +					&next->thread.uw.fpsimd_state;
> +		bool wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
> +
> +		update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
> +				       wrong_task || wrong_cpu);
>  	}
>  }


--
Alex Bennée


^ permalink raw reply	[flat|nested] 138+ messages in thread


* Re: [PATCH v10 02/18] thread_info: Add update_thread_flag() helpers
  2018-05-23 13:46     ` Alex Bennée
@ 2018-05-23 13:57       ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-23 13:57 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, Oleg Nesterov, Peter Zijlstra, Ingo Molnar, kvmarm,
	linux-arm-kernel

On Wed, May 23, 2018 at 02:46:52PM +0100, Alex Bennée wrote:
> 
> Dave Martin <Dave.Martin@arm.com> writes:
> 
> > There are a number of bits of code sprinkled around the kernel to
> > set a thread flag if a certain condition is true, and clear it
> > otherwise.
> >
> > To help make those call sites terser and less cumbersome, this
> > patch adds a new family of thread flag manipulators
> >
> > 	update*_thread_flag([...,] flag, cond)
> >
> > which do the equivalent of:
> >
> > 	if (cond)
> > 		set*_thread_flag([...,] flag);
> > 	else
> > 		clear*_thread_flag([...,] flag);
> >
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > ---
> >  include/linux/sched.h       |  6 ++++++
> >  include/linux/thread_info.h | 11 +++++++++++
> >  2 files changed, 17 insertions(+)
> >

[...]

> > diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> > index cf2862b..8d8821b 100644
> > --- a/include/linux/thread_info.h
> > +++ b/include/linux/thread_info.h
> > @@ -60,6 +60,15 @@ static inline void clear_ti_thread_flag(struct thread_info *ti, int flag)
> >  	clear_bit(flag, (unsigned long *)&ti->flags);
> >  }
> >
> > +static inline void update_ti_thread_flag(struct thread_info *ti, int flag,
> > +					 bool value)
> > +{
> > +	if (value)
> > +		set_ti_thread_flag(ti, flag);
> > +	else
> > +		clear_ti_thread_flag(ti, flag);
> > +}
> > +
> 
> value does seem a bit of vanilla non-informative name for a condition
> flag but whatever:
> 
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

I guess, though I couldn't come up with an obviously better name.

I suppose "condition" would have worked, but it's more verbose.

Thanks for the review
---Dave

^ permalink raw reply	[flat|nested] 138+ messages in thread


* Re: [PATCH v10 04/18] KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-23 14:34     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-23 14:34 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel, kvmarm, Christoffer Dall


Dave Martin <Dave.Martin@arm.com> writes:

> From: Christoffer Dall <christoffer.dall@linaro.org>
>
> KVM/ARM differs from other architectures in having to maintain an
> additional virtual address space from that of the host and the
> guest, because we split the execution of KVM across both EL1 and
> EL2.
>
> This results in a need to explicitly map data structures into EL2
> (hyp) which are accessed from the hyp code.  As we are about to be
> more clever with our FPSIMD handling on arm64, which stores data in
> the task struct and uses thread_info flags, we will have to map
> parts of the currently executing task struct into the EL2 virtual
> address space.
>
> However, we don't want to do this on every KVM_RUN, because it is a
> fairly expensive operation to walk the page tables, and the common
> execution mode is to map a single thread to a VCPU.  By introducing
> a hook that architectures can select with
> HAVE_KVM_VCPU_RUN_PID_CHANGE, we do not introduce overhead for
> other architectures, but have a simple way to only map the data we
> need when required for arm64.
>
> This patch introduces the framework only, and wires it up in the
> arm/arm64 KVM common code.
>
> No functional change.
>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  include/linux/kvm_host.h | 9 +++++++++
>  virt/kvm/Kconfig         | 3 +++
>  virt/kvm/kvm_main.c      | 7 ++++++-
>  3 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 6930c63..4268ace 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1276,4 +1276,13 @@ static inline long kvm_arch_vcpu_async_ioctl(struct file *filp,
>  void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
>  		unsigned long start, unsigned long end);
>
> +#ifdef CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE
> +int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu);
> +#else
> +static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE */
> +
>  #endif
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index cca7e06..72143cf 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -54,3 +54,6 @@ config HAVE_KVM_IRQ_BYPASS
>
>  config HAVE_KVM_VCPU_ASYNC_IOCTL
>         bool
> +
> +config HAVE_KVM_VCPU_RUN_PID_CHANGE
> +       bool

This almost threw me as I thought you might be able to enable this and
break the build, but apparently not:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c7b2e92..c32e240 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2550,8 +2550,13 @@ static long kvm_vcpu_ioctl(struct file *filp,
>  		oldpid = rcu_access_pointer(vcpu->pid);
>  		if (unlikely(oldpid != current->pids[PIDTYPE_PID].pid)) {
>  			/* The thread running this VCPU changed. */
> -			struct pid *newpid = get_task_pid(current, PIDTYPE_PID);
> +			struct pid *newpid;
>
> +			r = kvm_arch_vcpu_run_pid_change(vcpu);
> +			if (r)
> +				break;
> +
> +			newpid = get_task_pid(current, PIDTYPE_PID);
>  			rcu_assign_pointer(vcpu->pid, newpid);
>  			if (oldpid)
>  				synchronize_rcu();


--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread
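
To make the shape of the hook concrete, an architecture selecting HAVE_KVM_VCPU_RUN_PID_CHANGE would supply its own kvm_arch_vcpu_run_pid_change(); a hypothetical arm64-flavoured sketch (the body below is an assumption for illustration, not the implementation added later in this series) might be:

	int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
	{
		/*
		 * Map the current thread's data into the EL2 (hyp) address
		 * space.  create_hyp_mappings() walks the page tables, which
		 * is why this is only done when the vcpu's thread changes.
		 */
		return create_hyp_mappings(current_thread_info(),
					   current_thread_info() + 1,
					   PAGE_HYP);
	}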


* Re: [PATCH v10 02/18] thread_info: Add update_thread_flag() helpers
  2018-05-23 13:57       ` Dave Martin
@ 2018-05-23 14:35         ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-23 14:35 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, Oleg Nesterov, Peter Zijlstra, Ingo Molnar, kvmarm,
	linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> On Wed, May 23, 2018 at 02:46:52PM +0100, Alex Bennée wrote:
>>
>> Dave Martin <Dave.Martin@arm.com> writes:
>>
>> > There are a number of bits of code sprinkled around the kernel to
>> > set a thread flag if a certain condition is true, and clear it
>> > otherwise.
>> >
>> > To help make those call sites terser and less cumbersome, this
>> > patch adds a new family of thread flag manipulators
>> >
>> > 	update*_thread_flag([...,] flag, cond)
>> >
>> > which do the equivalent of:
>> >
>> > 	if (cond)
>> > 		set*_thread_flag([...,] flag);
>> > 	else
>> > 		clear*_thread_flag([...,] flag);
>> >
>> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>> > Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
>> > Acked-by: Marc Zyngier <marc.zyngier@arm.com>
>> > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
>> > Cc: Ingo Molnar <mingo@redhat.com>
>> > Cc: Peter Zijlstra <peterz@infradead.org>
>> > Cc: Oleg Nesterov <oleg@redhat.com>
>> > ---
>> >  include/linux/sched.h       |  6 ++++++
>> >  include/linux/thread_info.h | 11 +++++++++++
>> >  2 files changed, 17 insertions(+)
>> >
>
> [...]
>
>> > diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
>> > index cf2862b..8d8821b 100644
>> > --- a/include/linux/thread_info.h
>> > +++ b/include/linux/thread_info.h
>> > @@ -60,6 +60,15 @@ static inline void clear_ti_thread_flag(struct thread_info *ti, int flag)
>> >  	clear_bit(flag, (unsigned long *)&ti->flags);
>> >  }
>> >
>> > +static inline void update_ti_thread_flag(struct thread_info *ti, int flag,
>> > +					 bool value)
>> > +{
>> > +	if (value)
>> > +		set_ti_thread_flag(ti, flag);
>> > +	else
>> > +		clear_ti_thread_flag(ti, flag);
>> > +}
>> > +
>>
>> value does seem a bit of vanilla non-informative name for a condition
>> flag but whatever:
>>
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>
> I guess, though I couldn't come up with an obviously better name.
>
> I suppose "condition" would have worked, but it's more verbose.

Well, as you use cond in the text, I think cond would also work as an
abbreviated variable name. But it's a minor nit ;-)

>
> Thanks for the review
> ---Dave


--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread


* Re: [PATCH v10 04/18] KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change
  2018-05-23 14:34     ` Alex Bennée
@ 2018-05-23 14:40       ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-23 14:40 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, Christoffer Dall, kvmarm, linux-arm-kernel

On Wed, May 23, 2018 at 03:34:20PM +0100, Alex Bennée wrote:
> 
> Dave Martin <Dave.Martin@arm.com> writes:
> 
> > From: Christoffer Dall <christoffer.dall@linaro.org>
> >
> > KVM/ARM differs from other architectures in having to maintain an
> > additional virtual address space from that of the host and the
> > guest, because we split the execution of KVM across both EL1 and
> > EL2.
> >
> > This results in a need to explicitly map data structures into EL2
> > (hyp) which are accessed from the hyp code.  As we are about to be
> > more clever with our FPSIMD handling on arm64, which stores data in
> > the task struct and uses thread_info flags, we will have to map
> > parts of the currently executing task struct into the EL2 virtual
> > address space.
> >
> > However, we don't want to do this on every KVM_RUN, because it is a
> > fairly expensive operation to walk the page tables, and the common
> > execution mode is to map a single thread to a VCPU.  By introducing
> > a hook that architectures can select with
> > HAVE_KVM_VCPU_RUN_PID_CHANGE, we do not introduce overhead for
> > other architectures, but have a simple way to only map the data we
> > need when required for arm64.
> >
> > This patch introduces the framework only, and wires it up in the
> > arm/arm64 KVM common code.
> >
> > No functional change.
> >
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> >  include/linux/kvm_host.h | 9 +++++++++
> >  virt/kvm/Kconfig         | 3 +++
> >  virt/kvm/kvm_main.c      | 7 ++++++-
> >  3 files changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 6930c63..4268ace 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1276,4 +1276,13 @@ static inline long kvm_arch_vcpu_async_ioctl(struct file *filp,
> >  void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> >  		unsigned long start, unsigned long end);
> >
> > +#ifdef CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE
> > +int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu);
> > +#else
> > +static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> > +{
> > +	return 0;
> > +}
> > +#endif /* CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE */
> > +
> >  #endif
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index cca7e06..72143cf 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -54,3 +54,6 @@ config HAVE_KVM_IRQ_BYPASS
> >
> >  config HAVE_KVM_VCPU_ASYNC_IOCTL
> >         bool
> > +
> > +config HAVE_KVM_VCPU_RUN_PID_CHANGE
> > +       bool
> 
> This almost threw me as I thought you might be able to enable this and
> break the build, but apparently not:
> 
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

Without a "help", the option seems non-interactive and cannot be true
unless something selects it.  It seems a bit weird to me too, but the
idiom appears widely used...

Christoffer?

[...]

Cheers
---Dave

^ permalink raw reply	[flat|nested] 138+ messages in thread


* Re: [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-23 13:31       ` Dave Martin
@ 2018-05-23 14:56         ` Catalin Marinas
  -1 siblings, 0 replies; 138+ messages in thread
From: Catalin Marinas @ 2018-05-23 14:56 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Will Deacon,
	Christoffer Dall, kvmarm, linux-arm-kernel

On Wed, May 23, 2018 at 02:31:59PM +0100, Dave P Martin wrote:
> On Wed, May 23, 2018 at 01:48:12PM +0200, Christoffer Dall wrote:
> > On Tue, May 22, 2018 at 05:05:08PM +0100, Dave Martin wrote:
> > > This is true by construction however: TIF_FOREIGN_FPSTATE is never
> > > cleared except when returning to userspace or returning from a
> > > signal: thus, for a true kernel thread no FPSIMD context is ever
> > > loaded, TIF_FOREIGN_FPSTATE will remain set and no context will
> > > ever be saved.
> > 
> > I don't understand this construction proof; from looking at the patch
> > below it is not obvious to me why fpsimd_thread_switch() can never have
> > !wrong_task && !wrong_cpu and therefore clear TIF_FOREIGN_FPSTATE for a
> > kernel thread?
> 
> Looking at this again, I think it is poorly worded.  This patch aims to
> make it true by construction, but it isn't prior to the patch.
> 
> I'm tempted to delete the paragraph: the assertion is both untrue and
> not the best way to justify that this patch works.
> 
> 
> How about:
> 
> -8<-
> 
> The context switch logic already isolates user threads from each other.
> Thus, it is sufficient for isolating user threads from the kernel,
> since the goal either way is to ensure that code executing in userspace
> cannot see any FPSIMD state except its own.  Thus, there is no special
> property of kernel threads that we care about except that it is
> pointless to save or load FPSIMD register state for them.
> 
> At worst, the removal of all the kernel thread special cases by this
> patch would thus spuriously load and save state for kernel threads when
> unnecessary.
> 
> But the context switch logic is already deliberately optimised to defer
> reloads of the regs until ret_to_user (or sigreturn as a special case),
> which kernel threads by definition never reach.
> 
> ->8-

The "at worst" paragraph makes it look like it could happen (at least
until you reach the last paragraph). Maybe you can just say that
wrong_task and wrong_cpu (with the fpsimd_cpu = NR_CPUS addition) are
always true for kernel threads. You should probably mention this in a
comment in the code as well.

-- 
Catalin

^ permalink raw reply	[flat|nested] 138+ messages in thread


* Re: [PATCH v10 14/18] KVM: arm64: Save host SVE context as appropriate
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-23 14:59     ` Catalin Marinas
  -1 siblings, 0 replies; 138+ messages in thread
From: Catalin Marinas @ 2018-05-23 14:59 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Will Deacon,
	kvmarm, linux-arm-kernel

On Tue, May 22, 2018 at 05:05:15PM +0100, Dave P Martin wrote:
> This patch adds SVE context saving to the hyp FPSIMD context switch
> path.  This means that it is no longer necessary to save the host
> SVE state in advance of entering the guest, when in use.
> 
> In order to avoid adding pointless complexity to the code, VHE is
> assumed if SVE is in use.  VHE is an architectural prerequisite for
> SVE, so there is no good reason to turn CONFIG_ARM64_VHE off in
> kernels that support both SVE and KVM.
> 
> Historically, software models exist that can expose the
> architecturally invalid configuration of SVE without VHE, so if
> this situation is detected at kvm_init() time then KVM will be
> disabled.
> 
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 138+ messages in thread
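
The init-time check described in the commit message amounts to something like the following (a sketch; the exact wording and call site in the patch may differ):

	/*
	 * SVE without VHE is architecturally invalid, but some software
	 * models can present it; refuse to initialise KVM in that case.
	 */
	if (system_supports_sve() && !has_vhe()) {
		kvm_pr_unimpl("SVE system without VHE unsupported, KVM disabled\n");
		return -ENODEV;
	}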


* Re: [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-23 14:56         ` Catalin Marinas
@ 2018-05-23 15:03           ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-23 15:03 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Will Deacon,
	Christoffer Dall, kvmarm, linux-arm-kernel

On Wed, May 23, 2018 at 03:56:57PM +0100, Catalin Marinas wrote:
> On Wed, May 23, 2018 at 02:31:59PM +0100, Dave P Martin wrote:
> > On Wed, May 23, 2018 at 01:48:12PM +0200, Christoffer Dall wrote:
> > > On Tue, May 22, 2018 at 05:05:08PM +0100, Dave Martin wrote:
> > > > This is true by construction however: TIF_FOREIGN_FPSTATE is never
> > > > cleared except when returning to userspace or returning from a
> > > > signal: thus, for a true kernel thread no FPSIMD context is ever
> > > > loaded, TIF_FOREIGN_FPSTATE will remain set and no context will
> > > > ever be saved.
> > > 
> > > I don't understand this construction proof; from looking at the patch
> > > below it is not obvious to me why fpsimd_thread_switch() can never have
> > > !wrong_task && !wrong_cpu and therefore clear TIF_FOREIGN_FPSTATE for a
> > > kernel thread?
> > 
> > Looking at this again, I think it is poorly worded.  This patch aims to
> > make it true by construction, but it isn't prior to the patch.
> > 
> > I'm tempted to delete the paragraph: the assertion is both untrue and
> > not the best way to justify that this patch works.
> > 
> > 
> > How about:
> > 
> > -8<-
> > 
> > The context switch logic already isolates user threads from each other.
> > Thus, it is sufficient for isolating user threads from the kernel,
> > since the goal either way is to ensure that code executing in userspace
> > cannot see any FPSIMD state except its own.  Thus, there is no special
> > property of kernel threads that we care about except that it is
> > pointless to save or load FPSIMD register state for them.
> > 
> > At worst, the removal of all the kernel thread special cases by this
> > patch would thus spuriously load and save state for kernel threads when
> > unnecessary.
> > 
> > But the context switch logic is already deliberately optimised to defer
> > reloads of the regs until ret_to_user (or sigreturn as a special case),
> > which kernel threads by definition never reach.
> > 
> > ->8-
> 
> The "at worst" paragraph makes it look like it could happen (at least
> until you reach the last paragraph). Maybe you can just say that
> wrong_task and wrong_cpu (with the fpsimd_cpu = NR_CPUS addition) are
> always true for kernel threads. You should probably mention this in a
> comment in the code as well.

What if I just delete the second paragraph, and remove the "But" from
the start of the third, and append:

"As a result, the wrong_task and wrong_cpu tests in
fpsimd_thread_switch() will always yield false for kernel threads."

...with a similar comment in the code?

Cheers
---Dave

^ permalink raw reply	[flat|nested] 138+ messages in thread


* Re: [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-23 15:03           ` Dave Martin
@ 2018-05-23 16:42             ` Catalin Marinas
  -1 siblings, 0 replies; 138+ messages in thread
From: Catalin Marinas @ 2018-05-23 16:42 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Will Deacon,
	Christoffer Dall, kvmarm, linux-arm-kernel

On Wed, May 23, 2018 at 04:03:37PM +0100, Dave P Martin wrote:
> On Wed, May 23, 2018 at 03:56:57PM +0100, Catalin Marinas wrote:
> > On Wed, May 23, 2018 at 02:31:59PM +0100, Dave P Martin wrote:
> > > On Wed, May 23, 2018 at 01:48:12PM +0200, Christoffer Dall wrote:
> > > > On Tue, May 22, 2018 at 05:05:08PM +0100, Dave Martin wrote:
> > > > > This is true by construction however: TIF_FOREIGN_FPSTATE is never
> > > > > cleared except when returning to userspace or returning from a
> > > > > signal: thus, for a true kernel thread no FPSIMD context is ever
> > > > > loaded, TIF_FOREIGN_FPSTATE will remain set and no context will
> > > > > ever be saved.
> > > > 
> > > > I don't understand this construction proof; from looking at the patch
> > > > below it is not obvious to me why fpsimd_thread_switch() can never have
> > > > !wrong_task && !wrong_cpu and therefore clear TIF_FOREIGN_FPSTATE for a
> > > > kernel thread?
> > > 
> > > Looking at this again, I think it is poorly worded.  This patch aims to
> > > make it true by construction, but it isn't prior to the patch.
> > > 
> > > I'm tempted to delete the paragraph: the assertion is both untrue and
> > > not the best way to justify that this patch works.
> > > 
> > > 
> > > How about:
> > > 
> > > -8<-
> > > 
> > > The context switch logic already isolates user threads from each other.
> > > Thus, it is sufficient for isolating user threads from the kernel,
> > > since the goal either way is to ensure that code executing in userspace
> > > cannot see any FPSIMD state except its own.  Thus, there is no special
> > > property of kernel threads that we care about except that it is
> > > pointless to save or load FPSIMD register state for them.
> > > 
> > > At worst, the removal of all the kernel thread special cases by this
> > > patch would thus spuriously load and save state for kernel threads when
> > > unnecessary.
> > > 
> > > But the context switch logic is already deliberately optimised to defer
> > > reloads of the regs until ret_to_user (or sigreturn as a special case),
> > > which kernel threads by definition never reach.
> > > 
> > > ->8-
> > 
> > The "at worst" paragraph makes it look like it could happen (at least
> > until you reach the last paragraph). Maybe you can just say that
> > wrong_task and wrong_cpu (with the fpsimd_cpu = NR_CPUS addition) are
> > always true for kernel threads. You should probably mention this in a
> > comment in the code as well.
> 
> What if I just delete the second paragraph, and remove the "But" from
> the start of the third, and append:
> 
> "As a result, the wrong_task and wrong_cpu tests in
> fpsimd_thread_switch() will always yield false for kernel threads."
> 
> ...with a similar comment in the code?

Sounds fine. With that, feel free to add:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 138+ messages in thread
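
Putting the conclusion of this subthread in code form, the comment being asked for would sit next to the existing checks in fpsimd_thread_switch(), roughly as follows (comment wording assumed, not final):

	/*
	 * Kernel threads never reach ret_to_user, so no FPSIMD state is ever
	 * loaded for them: TIF_FOREIGN_FPSTATE stays set and the checks below
	 * never clear it.
	 */
	bool wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
				&next->thread.uw.fpsimd_state;
	bool wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();

	update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
			       wrong_task || wrong_cpu);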

* [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
@ 2018-05-23 16:42             ` Catalin Marinas
  0 siblings, 0 replies; 138+ messages in thread
From: Catalin Marinas @ 2018-05-23 16:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 23, 2018 at 04:03:37PM +0100, Dave P Martin wrote:
> On Wed, May 23, 2018 at 03:56:57PM +0100, Catalin Marinas wrote:
> > On Wed, May 23, 2018 at 02:31:59PM +0100, Dave P Martin wrote:
> > > On Wed, May 23, 2018 at 01:48:12PM +0200, Christoffer Dall wrote:
> > > > On Tue, May 22, 2018 at 05:05:08PM +0100, Dave Martin wrote:
> > > > > This is true by construction however: TIF_FOREIGN_FPSTATE is never
> > > > > cleared except when returning to userspace or returning from a
> > > > > signal: thus, for a true kernel thread no FPSIMD context is ever
> > > > > loaded, TIF_FOREIGN_FPSTATE will remain set and no context will
> > > > > ever be saved.
> > > > 
> > > > I don't understand this construction proof; from looking at the patch
> > > > below it is not obvious to me why fpsimd_thread_switch() can never have
> > > > !wrong_task && !wrong_cpu and therefore clear TIF_FOREIGN_FPSTATE for a
> > > > kernel thread?
> > > 
> > > Looking at this again, I think it is poorly worded.  This patch aims to
> > > make it true by construction, but it isn't prior to the patch.
> > > 
> > > I'm tempted to delete the paragraph: the assertion of both untrue and
> > > not the best way to justify that this patch works.
> > > 
> > > 
> > > How about:
> > > 
> > > -8<-
> > > 
> > > The context switch logic already isolates user threads from each other.
> > > This, it is sufficient for isolating user threads from the kernel,
> > > since the goal either way is to ensure that code executing in userspace
> > > cannot see any FPSIMD state except its own.  Thus, there is no special
> > > property of kernel threads that we care about except that it is
> > > pointless to save or load FPSIMD register state for them.
> > > 
> > > At worst, the removal of all the kernel thread special cases by this
> > > patch would thus spuriously load and save state for kernel threads when
> > > unnecessary.
> > > 
> > > But the context switch logic is already deliberately optimised to defer
> > > reloads of the regs until ret_to_user (or sigreturn as a special case),
> > > which kernel threads by definition never reach.
> > > 
> > > ->8-
> > 
> > The "at worst" paragraph makes it look like it could happen (at least
> > until you reach the last paragraph). Maybe you can just say that
> > wrong_task and wrong_cpu (with the fpsimd_cpu = NR_CPUS addition) are
> > always true for kernel threads. You should probably mention this in a
> > comment in the code as well.
> 
> What if I just delete the second paragraph, and remove the "But" from
> the start of the third, and append:
> 
> "As a result, the wrong_task and wrong_cpu tests in
> fpsimd_thread_switch() will always yield true for kernel threads."
> 
> ...with a similar comment in the code?

Sounds fine. With that, feel free to add:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 05/18] KVM: arm64: Convert lazy FPSIMD context switch trap to C
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-23 19:35     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-23 19:35 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> To make the lazy FPSIMD context switch trap code easier to hack on,
> this patch converts it to C.
>
> This is not amazingly efficient, but the trap should typically only
> be taken once per host context switch.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
>  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
>  2 files changed, 46 insertions(+), 35 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index e41a161..40f349b 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -172,40 +172,27 @@ ENTRY(__fpsimd_guest_restore)
>  	// x1: vcpu
>  	// x2-x29,lr: vcpu regs
>  	// vcpu x0-x1 on the stack
> -	stp	x2, x3, [sp, #-16]!
> -	stp	x4, lr, [sp, #-16]!
> -
> -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> -	mrs	x2, cptr_el2
> -	bic	x2, x2, #CPTR_EL2_TFP
> -	msr	cptr_el2, x2
> -alternative_else
> -	mrs	x2, cpacr_el1
> -	orr	x2, x2, #CPACR_EL1_FPEN
> -	msr	cpacr_el1, x2
> -alternative_endif
> -	isb
> -
> -	mov	x3, x1
> -
> -	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
> -	kern_hyp_va x0
> -	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> -	bl	__fpsimd_save_state
> -
> -	add	x2, x3, #VCPU_CONTEXT
> -	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> -	bl	__fpsimd_restore_state
> -
> -	// Skip restoring fpexc32 for AArch64 guests
> -	mrs	x1, hcr_el2
> -	tbnz	x1, #HCR_RW_SHIFT, 1f
> -	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
> -	msr	fpexc32_el2, x4
> -1:
> -	ldp	x4, lr, [sp], #16
> -	ldp	x2, x3, [sp], #16
> -	ldp	x0, x1, [sp], #16
> -
> +	stp	x2, x3, [sp, #-144]!
> +	stp	x4, x5, [sp, #16]
> +	stp	x6, x7, [sp, #32]
> +	stp	x8, x9, [sp, #48]
> +	stp	x10, x11, [sp, #64]
> +	stp	x12, x13, [sp, #80]
> +	stp	x14, x15, [sp, #96]
> +	stp	x16, x17, [sp, #112]
> +	stp	x18, lr, [sp, #128]
> +
> +	bl	__hyp_switch_fpsimd
> +
> +	ldp	x4, x5, [sp, #16]
> +	ldp	x6, x7, [sp, #32]
> +	ldp	x8, x9, [sp, #48]
> +	ldp	x10, x11, [sp, #64]
> +	ldp	x12, x13, [sp, #80]
> +	ldp	x14, x15, [sp, #96]
> +	ldp	x16, x17, [sp, #112]
> +	ldp	x18, lr, [sp, #128]
> +	ldp	x0, x1, [sp, #144]
> +	ldp	x2, x3, [sp], #160
>  	eret
>  ENDPROC(__fpsimd_guest_restore)
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index d964523..c0796c4 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -318,6 +318,30 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
>  	}
>  }
>
> +void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
> +				    struct kvm_vcpu *vcpu)
> +{
> +	kvm_cpu_context_t *host_ctxt;
> +
> +	if (has_vhe())
> +		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
> +			     cpacr_el1);
> +	else
> +		write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP,
> +			     cptr_el2);

Is there no way to do alternative() in C or does it always come down to
different inline asms?

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


> +
> +	isb();
> +
> +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +	__fpsimd_save_state(&host_ctxt->gp_regs.fp_regs);
> +	__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
> +
> +	/* Skip restoring fpexc32 for AArch64 guests */
> +	if (!(read_sysreg(hcr_el2) & HCR_RW))
> +		write_sysreg(vcpu->arch.ctxt.sys_regs[FPEXC32_EL2],
> +			     fpexc32_el2);
> +}
> +
>  /*
>   * Return true when we were able to fixup the guest exit and should return to
>   * the guest, false when we should restore the host state and return to the


--
Alex Bennée


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 05/18] KVM: arm64: Convert lazy FPSIMD context switch trap to C
@ 2018-05-23 19:35     ` Alex Bennée
  0 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-23 19:35 UTC (permalink / raw)
  To: linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> To make the lazy FPSIMD context switch trap code easier to hack on,
> this patch converts it to C.
>
> This is not amazingly efficient, but the trap should typically only
> be taken once per host context switch.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
>  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
>  2 files changed, 46 insertions(+), 35 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index e41a161..40f349b 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -172,40 +172,27 @@ ENTRY(__fpsimd_guest_restore)
>  	// x1: vcpu
>  	// x2-x29,lr: vcpu regs
>  	// vcpu x0-x1 on the stack
> -	stp	x2, x3, [sp, #-16]!
> -	stp	x4, lr, [sp, #-16]!
> -
> -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> -	mrs	x2, cptr_el2
> -	bic	x2, x2, #CPTR_EL2_TFP
> -	msr	cptr_el2, x2
> -alternative_else
> -	mrs	x2, cpacr_el1
> -	orr	x2, x2, #CPACR_EL1_FPEN
> -	msr	cpacr_el1, x2
> -alternative_endif
> -	isb
> -
> -	mov	x3, x1
> -
> -	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
> -	kern_hyp_va x0
> -	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> -	bl	__fpsimd_save_state
> -
> -	add	x2, x3, #VCPU_CONTEXT
> -	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> -	bl	__fpsimd_restore_state
> -
> -	// Skip restoring fpexc32 for AArch64 guests
> -	mrs	x1, hcr_el2
> -	tbnz	x1, #HCR_RW_SHIFT, 1f
> -	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
> -	msr	fpexc32_el2, x4
> -1:
> -	ldp	x4, lr, [sp], #16
> -	ldp	x2, x3, [sp], #16
> -	ldp	x0, x1, [sp], #16
> -
> +	stp	x2, x3, [sp, #-144]!
> +	stp	x4, x5, [sp, #16]
> +	stp	x6, x7, [sp, #32]
> +	stp	x8, x9, [sp, #48]
> +	stp	x10, x11, [sp, #64]
> +	stp	x12, x13, [sp, #80]
> +	stp	x14, x15, [sp, #96]
> +	stp	x16, x17, [sp, #112]
> +	stp	x18, lr, [sp, #128]
> +
> +	bl	__hyp_switch_fpsimd
> +
> +	ldp	x4, x5, [sp, #16]
> +	ldp	x6, x7, [sp, #32]
> +	ldp	x8, x9, [sp, #48]
> +	ldp	x10, x11, [sp, #64]
> +	ldp	x12, x13, [sp, #80]
> +	ldp	x14, x15, [sp, #96]
> +	ldp	x16, x17, [sp, #112]
> +	ldp	x18, lr, [sp, #128]
> +	ldp	x0, x1, [sp, #144]
> +	ldp	x2, x3, [sp], #160
>  	eret
>  ENDPROC(__fpsimd_guest_restore)
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index d964523..c0796c4 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -318,6 +318,30 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
>  	}
>  }
>
> +void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
> +				    struct kvm_vcpu *vcpu)
> +{
> +	kvm_cpu_context_t *host_ctxt;
> +
> +	if (has_vhe())
> +		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
> +			     cpacr_el1);
> +	else
> +		write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP,
> +			     cptr_el2);

Is there no way to do alternative() in C or does it always come down to
different inline asms?

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


> +
> +	isb();
> +
> +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +	__fpsimd_save_state(&host_ctxt->gp_regs.fp_regs);
> +	__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
> +
> +	/* Skip restoring fpexc32 for AArch64 guests */
> +	if (!(read_sysreg(hcr_el2) & HCR_RW))
> +		write_sysreg(vcpu->arch.ctxt.sys_regs[FPEXC32_EL2],
> +			     fpexc32_el2);
> +}
> +
>  /*
>   * Return true when we were able to fixup the guest exit and should return to
>   * the guest, false when we should restore the host state and return to the


--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 06/18] arm64: fpsimd: Generalise context saving for non-task contexts
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-23 20:15     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-23 20:15 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> In preparation for allowing non-task (i.e., KVM vcpu) FPSIMD
> contexts to be handled by the fpsimd common code, this patch adapts
> task_fpsimd_save() to save back the currently loaded context,
> removing the explicit dependency on current.
>
> The relevant storage to write back to in memory is now found by
> examining the fpsimd_last_state percpu struct.
>
> fpsimd_save() does nothing unless TIF_FOREIGN_FPSTATE is clear, and
> fpsimd_last_state is updated under local_bh_disable() or
> local_irq_disable() everywhere that TIF_FOREIGN_FPSTATE is cleared:
> thus, fpsimd_save() will write back to the correct storage for the
> loaded context.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  arch/arm64/kernel/fpsimd.c | 25 +++++++++++++------------
>  1 file changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 9d85373..3aa100a 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -270,13 +270,15 @@ static void task_fpsimd_load(void)
>  }
>
>  /*
> - * Ensure current's FPSIMD/SVE storage in thread_struct is up to date
> - * with respect to the CPU registers.
> + * Ensure FPSIMD/SVE storage in memory for the loaded context is up to
> + * date with respect to the CPU registers.
>   *
>   * Softirqs (and preemption) must be disabled.
>   */
> -static void task_fpsimd_save(void)
> +static void fpsimd_save(void)
>  {
> +	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
> +

I thought I was missing something but the only write I saw of this was:

  __this_cpu_write(fpsimd_last_state.st, NULL);

which implied to me it is possible to have an invalid de-reference. I
did figure it out eventually as fpsimd_bind_state_to_cpu uses a more
indirect this_cpu_ptr idiom for tweaking this. I guess a reference to
fpsimd_bind_[task|state]_to_cpu in the comment would have helped my
confusion.

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


>  	WARN_ON(!in_softirq() && !irqs_disabled());
>
>  	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> @@ -291,10 +293,9 @@ static void task_fpsimd_save(void)
>  				return;
>  			}
>
> -			sve_save_state(sve_pffr(current),
> -				       &current->thread.uw.fpsimd_state.fpsr);
> +			sve_save_state(sve_pffr(current), &st->fpsr);
>  		} else
> -			fpsimd_save_state(&current->thread.uw.fpsimd_state);
> +			fpsimd_save_state(st);
>  	}
>  }
>
> @@ -598,7 +599,7 @@ int sve_set_vector_length(struct task_struct *task,
>  	if (task == current) {
>  		local_bh_disable();
>
> -		task_fpsimd_save();
> +		fpsimd_save();
>  		set_thread_flag(TIF_FOREIGN_FPSTATE);
>  	}
>
> @@ -837,7 +838,7 @@ asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
>
>  	local_bh_disable();
>
> -	task_fpsimd_save();
> +	fpsimd_save();
>  	fpsimd_to_sve(current);
>
>  	/* Force ret_to_user to reload the registers: */
> @@ -898,7 +899,7 @@ void fpsimd_thread_switch(struct task_struct *next)
>  	 * 'current'.
>  	 */
>  	if (current->mm)
> -		task_fpsimd_save();
> +		fpsimd_save();
>
>  	if (next->mm) {
>  		/*
> @@ -980,7 +981,7 @@ void fpsimd_preserve_current_state(void)
>  		return;
>
>  	local_bh_disable();
> -	task_fpsimd_save();
> +	fpsimd_save();
>  	local_bh_enable();
>  }
>
> @@ -1121,7 +1122,7 @@ void kernel_neon_begin(void)
>
>  	/* Save unsaved task fpsimd state, if any: */
>  	if (current->mm)
> -		task_fpsimd_save();
> +		fpsimd_save();
>
>  	/* Invalidate any task state remaining in the fpsimd regs: */
>  	fpsimd_flush_cpu_state();
> @@ -1244,7 +1245,7 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
>  	switch (cmd) {
>  	case CPU_PM_ENTER:
>  		if (current->mm)
> -			task_fpsimd_save();
> +			fpsimd_save();
>  		fpsimd_flush_cpu_state();
>  		break;
>  	case CPU_PM_EXIT:


--
Alex Bennée


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 06/18] arm64: fpsimd: Generalise context saving for non-task contexts
@ 2018-05-23 20:15     ` Alex Bennée
  0 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-23 20:15 UTC (permalink / raw)
  To: linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> In preparation for allowing non-task (i.e., KVM vcpu) FPSIMD
> contexts to be handled by the fpsimd common code, this patch adapts
> task_fpsimd_save() to save back the currently loaded context,
> removing the explicit dependency on current.
>
> The relevant storage to write back to in memory is now found by
> examining the fpsimd_last_state percpu struct.
>
> fpsimd_save() does nothing unless TIF_FOREIGN_FPSTATE is clear, and
> fpsimd_last_state is updated under local_bh_disable() or
> local_irq_disable() everywhere that TIF_FOREIGN_FPSTATE is cleared:
> thus, fpsimd_save() will write back to the correct storage for the
> loaded context.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  arch/arm64/kernel/fpsimd.c | 25 +++++++++++++------------
>  1 file changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 9d85373..3aa100a 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -270,13 +270,15 @@ static void task_fpsimd_load(void)
>  }
>
>  /*
> - * Ensure current's FPSIMD/SVE storage in thread_struct is up to date
> - * with respect to the CPU registers.
> + * Ensure FPSIMD/SVE storage in memory for the loaded context is up to
> + * date with respect to the CPU registers.
>   *
>   * Softirqs (and preemption) must be disabled.
>   */
> -static void task_fpsimd_save(void)
> +static void fpsimd_save(void)
>  {
> +	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
> +

I thought I was missing something but the only write I saw of this was:

  __this_cpu_write(fpsimd_last_state.st, NULL);

which implied to me it is possible to have an invalid de-reference. I
did figure it out eventually as fpsimd_bind_state_to_cpu uses a more
indirect this_cpu_ptr idiom for tweaking this. I guess a reference to
fpsimd_bind_[task|state]_to_cpu in the comment would have helped my
confusion.

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


>  	WARN_ON(!in_softirq() && !irqs_disabled());
>
>  	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> @@ -291,10 +293,9 @@ static void task_fpsimd_save(void)
>  				return;
>  			}
>
> -			sve_save_state(sve_pffr(current),
> -				       &current->thread.uw.fpsimd_state.fpsr);
> +			sve_save_state(sve_pffr(current), &st->fpsr);
>  		} else
> -			fpsimd_save_state(&current->thread.uw.fpsimd_state);
> +			fpsimd_save_state(st);
>  	}
>  }
>
> @@ -598,7 +599,7 @@ int sve_set_vector_length(struct task_struct *task,
>  	if (task == current) {
>  		local_bh_disable();
>
> -		task_fpsimd_save();
> +		fpsimd_save();
>  		set_thread_flag(TIF_FOREIGN_FPSTATE);
>  	}
>
> @@ -837,7 +838,7 @@ asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
>
>  	local_bh_disable();
>
> -	task_fpsimd_save();
> +	fpsimd_save();
>  	fpsimd_to_sve(current);
>
>  	/* Force ret_to_user to reload the registers: */
> @@ -898,7 +899,7 @@ void fpsimd_thread_switch(struct task_struct *next)
>  	 * 'current'.
>  	 */
>  	if (current->mm)
> -		task_fpsimd_save();
> +		fpsimd_save();
>
>  	if (next->mm) {
>  		/*
> @@ -980,7 +981,7 @@ void fpsimd_preserve_current_state(void)
>  		return;
>
>  	local_bh_disable();
> -	task_fpsimd_save();
> +	fpsimd_save();
>  	local_bh_enable();
>  }
>
> @@ -1121,7 +1122,7 @@ void kernel_neon_begin(void)
>
>  	/* Save unsaved task fpsimd state, if any: */
>  	if (current->mm)
> -		task_fpsimd_save();
> +		fpsimd_save();
>
>  	/* Invalidate any task state remaining in the fpsimd regs: */
>  	fpsimd_flush_cpu_state();
> @@ -1244,7 +1245,7 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
>  	switch (cmd) {
>  	case CPU_PM_ENTER:
>  		if (current->mm)
> -			task_fpsimd_save();
> +			fpsimd_save();
>  		fpsimd_flush_cpu_state();
>  		break;
>  	case CPU_PM_EXIT:


--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 08/18] arm64/sve: Refactor user SVE trap maintenance for external use
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-23 20:16     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-23 20:16 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> In preparation for optimising the way KVM manages switching the
> guest and host FPSIMD state, it is necessary to provide a means for
> code outside arch/arm64/kernel/fpsimd.c to restore the user trap
> configuration for SVE correctly for the current task.
>
> Rather than requiring external code to duplicate the maintenance
> explicitly, this patch moves the trap maintenance to
> fpsimd_bind_to_cpu(), since it is logically part of the work of
> associating the current task with the cpu.
>
> Because fpsimd_bind_to_cpu() is rather a cryptic name to publish
> alongside fpsimd_bind_state_to_cpu(), the former function is
> renamed to fpsimd_bind_task_to_cpu() to make its purpose more
> explicit.
>
> This patch makes appropriate changes to ensure that
> fpsimd_bind_task_to_cpu() is always called alongside
> task_fpsimd_load(), so that the trap maintenance continues to be
> done in every situation where it was done prior to this patch.
>
> As a side-effect, the metadata updates done by
> fpsimd_bind_task_to_cpu() now change from conditional to
> unconditional in the "already bound" case of sigreturn.  This is
> harmless, and a couple of extra stores on this slow path will not
> impact performance.  I consider this a reasonable price to pay for
> a slightly cleaner interface.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

In fact the comment I alluded to in 6/18 could be applied to this patch as well.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


> ---
>  arch/arm64/kernel/fpsimd.c | 28 ++++++++++++++--------------
>  1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 1222491..ba9e7df 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -257,16 +257,6 @@ static void task_fpsimd_load(void)
>  			       sve_vq_from_vl(current->thread.sve_vl) - 1);
>  	else
>  		fpsimd_load_state(&current->thread.uw.fpsimd_state);
> -
> -	if (system_supports_sve()) {
> -		/* Toggle SVE trapping for userspace if needed */
> -		if (test_thread_flag(TIF_SVE))
> -			sve_user_enable();
> -		else
> -			sve_user_disable();
> -
> -		/* Serialised by exception return to user */
> -	}
>  }
>
>  /*
> @@ -991,7 +981,7 @@ void fpsimd_signal_preserve_current_state(void)
>   * Associate current's FPSIMD context with this cpu
>   * Preemption must be disabled when calling this function.
>   */
> -static void fpsimd_bind_to_cpu(void)
> +static void fpsimd_bind_task_to_cpu(void)
>  {
>  	struct fpsimd_last_state_struct *last =
>  		this_cpu_ptr(&fpsimd_last_state);
> @@ -999,6 +989,16 @@ static void fpsimd_bind_to_cpu(void)
>  	last->st = &current->thread.uw.fpsimd_state;
>  	last->sve_in_use = test_thread_flag(TIF_SVE);
>  	current->thread.fpsimd_cpu = smp_processor_id();
> +
> +	if (system_supports_sve()) {
> +		/* Toggle SVE trapping for userspace if needed */
> +		if (test_thread_flag(TIF_SVE))
> +			sve_user_enable();
> +		else
> +			sve_user_disable();
> +
> +		/* Serialised by exception return to user */
> +	}
>  }
>
>  /*
> @@ -1015,7 +1015,7 @@ void fpsimd_restore_current_state(void)
>
>  	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
>  		task_fpsimd_load();
> -		fpsimd_bind_to_cpu();
> +		fpsimd_bind_task_to_cpu();
>  	}
>
>  	local_bh_enable();
> @@ -1038,9 +1038,9 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
>  		fpsimd_to_sve(current);
>
>  	task_fpsimd_load();
> +	fpsimd_bind_task_to_cpu();
>
> -	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE))
> -		fpsimd_bind_to_cpu();
> +	clear_thread_flag(TIF_FOREIGN_FPSTATE);
>
>  	local_bh_enable();
>  }


--
Alex Bennée


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 08/18] arm64/sve: Refactor user SVE trap maintenance for external use
@ 2018-05-23 20:16     ` Alex Bennée
  0 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-23 20:16 UTC (permalink / raw)
  To: linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> In preparation for optimising the way KVM manages switching the
> guest and host FPSIMD state, it is necessary to provide a means for
> code outside arch/arm64/kernel/fpsimd.c to restore the user trap
> configuration for SVE correctly for the current task.
>
> Rather than requiring external code to duplicate the maintenance
> explicitly, this patch moves the trap maintenance to
> fpsimd_bind_to_cpu(), since it is logically part of the work of
> associating the current task with the cpu.
>
> Because fpsimd_bind_to_cpu() is rather a cryptic name to publish
> alongside fpsimd_bind_state_to_cpu(), the former function is
> renamed to fpsimd_bind_task_to_cpu() to make its purpose more
> explicit.
>
> This patch makes appropriate changes to ensure that
> fpsimd_bind_task_to_cpu() is always called alongside
> task_fpsimd_load(), so that the trap maintenance continues to be
> done in every situation where it was done prior to this patch.
>
> As a side-effect, the metadata updates done by
> fpsimd_bind_task_to_cpu() now change from conditional to
> unconditional in the "already bound" case of sigreturn.  This is
> harmless, and a couple of extra stores on this slow path will not
> impact performance.  I consider this a reasonable price to pay for
> a slightly cleaner interface.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

In fact the comment I alluded to in 6/18 could be applied to this patch as well.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


> ---
>  arch/arm64/kernel/fpsimd.c | 28 ++++++++++++++--------------
>  1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 1222491..ba9e7df 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -257,16 +257,6 @@ static void task_fpsimd_load(void)
>  			       sve_vq_from_vl(current->thread.sve_vl) - 1);
>  	else
>  		fpsimd_load_state(&current->thread.uw.fpsimd_state);
> -
> -	if (system_supports_sve()) {
> -		/* Toggle SVE trapping for userspace if needed */
> -		if (test_thread_flag(TIF_SVE))
> -			sve_user_enable();
> -		else
> -			sve_user_disable();
> -
> -		/* Serialised by exception return to user */
> -	}
>  }
>
>  /*
> @@ -991,7 +981,7 @@ void fpsimd_signal_preserve_current_state(void)
>   * Associate current's FPSIMD context with this cpu
>   * Preemption must be disabled when calling this function.
>   */
> -static void fpsimd_bind_to_cpu(void)
> +static void fpsimd_bind_task_to_cpu(void)
>  {
>  	struct fpsimd_last_state_struct *last =
>  		this_cpu_ptr(&fpsimd_last_state);
> @@ -999,6 +989,16 @@ static void fpsimd_bind_to_cpu(void)
>  	last->st = &current->thread.uw.fpsimd_state;
>  	last->sve_in_use = test_thread_flag(TIF_SVE);
>  	current->thread.fpsimd_cpu = smp_processor_id();
> +
> +	if (system_supports_sve()) {
> +		/* Toggle SVE trapping for userspace if needed */
> +		if (test_thread_flag(TIF_SVE))
> +			sve_user_enable();
> +		else
> +			sve_user_disable();
> +
> +		/* Serialised by exception return to user */
> +	}
>  }
>
>  /*
> @@ -1015,7 +1015,7 @@ void fpsimd_restore_current_state(void)
>
>  	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
>  		task_fpsimd_load();
> -		fpsimd_bind_to_cpu();
> +		fpsimd_bind_task_to_cpu();
>  	}
>
>  	local_bh_enable();
> @@ -1038,9 +1038,9 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
>  		fpsimd_to_sve(current);
>
>  	task_fpsimd_load();
> +	fpsimd_bind_task_to_cpu();
>
> -	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE))
> -		fpsimd_bind_to_cpu();
> +	clear_thread_flag(TIF_FOREIGN_FPSTATE);
>
>  	local_bh_enable();
>  }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 04/18] KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change
  2018-05-23 14:40       ` Dave Martin
@ 2018-05-24  8:11         ` Christoffer Dall
  -1 siblings, 0 replies; 138+ messages in thread
From: Christoffer Dall @ 2018-05-24  8:11 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-arm-kernel, Alex Bennée, kvmarm,
	Christoffer Dall

On Wed, May 23, 2018 at 03:40:26PM +0100, Dave Martin wrote:
> On Wed, May 23, 2018 at 03:34:20PM +0100, Alex Bennée wrote:
> > 
> > Dave Martin <Dave.Martin@arm.com> writes:
> > 
> > > From: Christoffer Dall <christoffer.dall@linaro.org>
> > >
> > > KVM/ARM differs from other architectures in having to maintain an
> > > additional virtual address space from that of the host and the
> > > guest, because we split the execution of KVM across both EL1 and
> > > EL2.
> > >
> > > This results in a need to explicitly map data structures into EL2
> > > (hyp) which are accessed from the hyp code.  As we are about to be
> > > more clever with our FPSIMD handling on arm64, which stores data in
> > > the task struct and uses thread_info flags, we will have to map
> > > parts of the currently executing task struct into the EL2 virtual
> > > address space.
> > >
> > > However, we don't want to do this on every KVM_RUN, because it is a
> > > fairly expensive operation to walk the page tables, and the common
> > > execution mode is to map a single thread to a VCPU.  By introducing
> > > a hook that architectures can select with
> > > HAVE_KVM_VCPU_RUN_PID_CHANGE, we do not introduce overhead for
> > > other architectures, but have a simple way to only map the data we
> > > need when required for arm64.
> > >
> > > This patch introduces the framework only, and wires it up in the
> > > arm/arm64 KVM common code.
> > >
> > > No functional change.
> > >
> > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > > ---
> > >  include/linux/kvm_host.h | 9 +++++++++
> > >  virt/kvm/Kconfig         | 3 +++
> > >  virt/kvm/kvm_main.c      | 7 ++++++-
> > >  3 files changed, 18 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 6930c63..4268ace 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -1276,4 +1276,13 @@ static inline long kvm_arch_vcpu_async_ioctl(struct file *filp,
> > >  void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> > >  		unsigned long start, unsigned long end);
> > >
> > > +#ifdef CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE
> > > +int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu);
> > > +#else
> > > +static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> > > +{
> > > +	return 0;
> > > +}
> > > +#endif /* CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE */
> > > +
> > >  #endif
> > > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > > index cca7e06..72143cf 100644
> > > --- a/virt/kvm/Kconfig
> > > +++ b/virt/kvm/Kconfig
> > > @@ -54,3 +54,6 @@ config HAVE_KVM_IRQ_BYPASS
> > >
> > >  config HAVE_KVM_VCPU_ASYNC_IOCTL
> > >         bool
> > > +
> > > +config HAVE_KVM_VCPU_RUN_PID_CHANGE
> > > +       bool
> > 
> > This almost threw me as I thought you might be able to enable this and
> > break the build, but apparently not:
> > 
> > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> 
> Without a "help", the option seems non-interactive and cannot be true
> unless something selects it.  It seems a bit weird to me too, but the
> idiom appears widely used...
> 
Indeed, I've copied this idiom from other things before and nobody has
complained, so I think it works (without any further deep insights into
the inner workings of Kconfig).
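
For anyone following along: the select only does something once the
architecture provides the out-of-line hook.  A stripped-down,
hypothetical arm64-style implementation (the helper names and the
single mapping below are illustrative only; the real version, which
also maps the FPSIMD state, lands later in this series) might look
like:

	int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
	{
		struct thread_info *ti = current_thread_info();

		/*
		 * Map the new thread's thread_info into the EL2 (hyp)
		 * address space so that hyp code can inspect the TIF_*
		 * flags of the task now backing this vcpu.
		 */
		return create_hyp_mappings(ti, ti + 1, PAGE_HYP);
	}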

Thanks,
-Christoffer


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 04/18] KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change
@ 2018-05-24  8:11         ` Christoffer Dall
  0 siblings, 0 replies; 138+ messages in thread
From: Christoffer Dall @ 2018-05-24  8:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 23, 2018 at 03:40:26PM +0100, Dave Martin wrote:
> On Wed, May 23, 2018 at 03:34:20PM +0100, Alex Bennée wrote:
> > 
> > Dave Martin <Dave.Martin@arm.com> writes:
> > 
> > > From: Christoffer Dall <christoffer.dall@linaro.org>
> > >
> > > KVM/ARM differs from other architectures in having to maintain an
> > > additional virtual address space from that of the host and the
> > > guest, because we split the execution of KVM across both EL1 and
> > > EL2.
> > >
> > > This results in a need to explicitly map data structures into EL2
> > > (hyp) which are accessed from the hyp code.  As we are about to be
> > > more clever with our FPSIMD handling on arm64, which stores data in
> > > the task struct and uses thread_info flags, we will have to map
> > > parts of the currently executing task struct into the EL2 virtual
> > > address space.
> > >
> > > However, we don't want to do this on every KVM_RUN, because it is a
> > > fairly expensive operation to walk the page tables, and the common
> > > execution mode is to map a single thread to a VCPU.  By introducing
> > > a hook that architectures can select with
> > > HAVE_KVM_VCPU_RUN_PID_CHANGE, we do not introduce overhead for
> > > other architectures, but have a simple way to only map the data we
> > > need when required for arm64.
> > >
> > > This patch introduces the framework only, and wires it up in the
> > > arm/arm64 KVM common code.
> > >
> > > No functional change.
> > >
> > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > > ---
> > >  include/linux/kvm_host.h | 9 +++++++++
> > >  virt/kvm/Kconfig         | 3 +++
> > >  virt/kvm/kvm_main.c      | 7 ++++++-
> > >  3 files changed, 18 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 6930c63..4268ace 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -1276,4 +1276,13 @@ static inline long kvm_arch_vcpu_async_ioctl(struct file *filp,
> > >  void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> > >  		unsigned long start, unsigned long end);
> > >
> > > +#ifdef CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE
> > > +int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu);
> > > +#else
> > > +static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> > > +{
> > > +	return 0;
> > > +}
> > > +#endif /* CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE */
> > > +
> > >  #endif
> > > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > > index cca7e06..72143cf 100644
> > > --- a/virt/kvm/Kconfig
> > > +++ b/virt/kvm/Kconfig
> > > @@ -54,3 +54,6 @@ config HAVE_KVM_IRQ_BYPASS
> > >
> > >  config HAVE_KVM_VCPU_ASYNC_IOCTL
> > >         bool
> > > +
> > > +config HAVE_KVM_VCPU_RUN_PID_CHANGE
> > > +       bool
> > 
> > This almost threw me as I thought you might be able to enable this and
> > break the build, but apparently not:
> > 
> > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> 
> Without a "help", the option seems non-interactive and cannot be true
> unless something selects it.  It seems a bit weird to me too, but the
> idiom appears widely used...
> 
Indeed, I've copied this idiom from other things before and nobody has
complained, so I think it works (without any further deep insights into
the inner workings of Kconfig).

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 05/18] KVM: arm64: Convert lazy FPSIMD context switch trap to C
  2018-05-23 19:35     ` Alex Bennée
@ 2018-05-24  8:12       ` Christoffer Dall
  -1 siblings, 0 replies; 138+ messages in thread
From: Christoffer Dall @ 2018-05-24  8:12 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, Dave Martin, linux-arm-kernel

On Wed, May 23, 2018 at 08:35:13PM +0100, Alex Bennée wrote:
> 
> Dave Martin <Dave.Martin@arm.com> writes:
> 
> > To make the lazy FPSIMD context switch trap code easier to hack on,
> > this patch converts it to C.
> >
> > This is not amazingly efficient, but the trap should typically only
> > be taken once per host context switch.
> >
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> >  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
> >  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
> >  2 files changed, 46 insertions(+), 35 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> > index e41a161..40f349b 100644
> > --- a/arch/arm64/kvm/hyp/entry.S
> > +++ b/arch/arm64/kvm/hyp/entry.S
> > @@ -172,40 +172,27 @@ ENTRY(__fpsimd_guest_restore)
> >  	// x1: vcpu
> >  	// x2-x29,lr: vcpu regs
> >  	// vcpu x0-x1 on the stack
> > -	stp	x2, x3, [sp, #-16]!
> > -	stp	x4, lr, [sp, #-16]!
> > -
> > -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> > -	mrs	x2, cptr_el2
> > -	bic	x2, x2, #CPTR_EL2_TFP
> > -	msr	cptr_el2, x2
> > -alternative_else
> > -	mrs	x2, cpacr_el1
> > -	orr	x2, x2, #CPACR_EL1_FPEN
> > -	msr	cpacr_el1, x2
> > -alternative_endif
> > -	isb
> > -
> > -	mov	x3, x1
> > -
> > -	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
> > -	kern_hyp_va x0
> > -	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > -	bl	__fpsimd_save_state
> > -
> > -	add	x2, x3, #VCPU_CONTEXT
> > -	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > -	bl	__fpsimd_restore_state
> > -
> > -	// Skip restoring fpexc32 for AArch64 guests
> > -	mrs	x1, hcr_el2
> > -	tbnz	x1, #HCR_RW_SHIFT, 1f
> > -	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
> > -	msr	fpexc32_el2, x4
> > -1:
> > -	ldp	x4, lr, [sp], #16
> > -	ldp	x2, x3, [sp], #16
> > -	ldp	x0, x1, [sp], #16
> > -
> > +	stp	x2, x3, [sp, #-144]!
> > +	stp	x4, x5, [sp, #16]
> > +	stp	x6, x7, [sp, #32]
> > +	stp	x8, x9, [sp, #48]
> > +	stp	x10, x11, [sp, #64]
> > +	stp	x12, x13, [sp, #80]
> > +	stp	x14, x15, [sp, #96]
> > +	stp	x16, x17, [sp, #112]
> > +	stp	x18, lr, [sp, #128]
> > +
> > +	bl	__hyp_switch_fpsimd
> > +
> > +	ldp	x4, x5, [sp, #16]
> > +	ldp	x6, x7, [sp, #32]
> > +	ldp	x8, x9, [sp, #48]
> > +	ldp	x10, x11, [sp, #64]
> > +	ldp	x12, x13, [sp, #80]
> > +	ldp	x14, x15, [sp, #96]
> > +	ldp	x16, x17, [sp, #112]
> > +	ldp	x18, lr, [sp, #128]
> > +	ldp	x0, x1, [sp, #144]
> > +	ldp	x2, x3, [sp], #160
> >  	eret
> >  ENDPROC(__fpsimd_guest_restore)
> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index d964523..c0796c4 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -318,6 +318,30 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
> >  	}
> >  }
> >
> > +void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
> > +				    struct kvm_vcpu *vcpu)
> > +{
> > +	kvm_cpu_context_t *host_ctxt;
> > +
> > +	if (has_vhe())
> > +		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
> > +			     cpacr_el1);
> > +	else
> > +		write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP,
> > +			     cptr_el2);
> 
> Is there no way to do alternative() in C or does it always come down to
> different inline asms?
> 

has_vhe() should resolve to a static key, and I prefer this over the
previous alternative construct we had for selecting function calls in C,
as that resulted in having to follow too many levels of indirection.
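
For reference, has_vhe() boils down to something like the following
(paraphrased from arch/arm64/include/asm/virt.h around this kernel
version; treat it as a sketch rather than the exact source):

	static inline bool has_vhe(void)
	{
		/*
		 * cpus_have_const_cap() tests a static key once the CPU
		 * capabilities are finalised, so the whole predicate is
		 * patched down to a plain branch at runtime; there is no
		 * memory load and no indirect call on the fast path.
		 */
		if (cpus_have_const_cap(ARM64_HAS_VIRT_HOST_EXTN))
			return true;

		return false;
	}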

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 05/18] KVM: arm64: Convert lazy FPSIMD context switch trap to C
@ 2018-05-24  8:12       ` Christoffer Dall
  0 siblings, 0 replies; 138+ messages in thread
From: Christoffer Dall @ 2018-05-24  8:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 23, 2018 at 08:35:13PM +0100, Alex Bennée wrote:
> 
> Dave Martin <Dave.Martin@arm.com> writes:
> 
> > To make the lazy FPSIMD context switch trap code easier to hack on,
> > this patch converts it to C.
> >
> > This is not amazingly efficient, but the trap should typically only
> > be taken once per host context switch.
> >
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> >  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
> >  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
> >  2 files changed, 46 insertions(+), 35 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> > index e41a161..40f349b 100644
> > --- a/arch/arm64/kvm/hyp/entry.S
> > +++ b/arch/arm64/kvm/hyp/entry.S
> > @@ -172,40 +172,27 @@ ENTRY(__fpsimd_guest_restore)
> >  	// x1: vcpu
> >  	// x2-x29,lr: vcpu regs
> >  	// vcpu x0-x1 on the stack
> > -	stp	x2, x3, [sp, #-16]!
> > -	stp	x4, lr, [sp, #-16]!
> > -
> > -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> > -	mrs	x2, cptr_el2
> > -	bic	x2, x2, #CPTR_EL2_TFP
> > -	msr	cptr_el2, x2
> > -alternative_else
> > -	mrs	x2, cpacr_el1
> > -	orr	x2, x2, #CPACR_EL1_FPEN
> > -	msr	cpacr_el1, x2
> > -alternative_endif
> > -	isb
> > -
> > -	mov	x3, x1
> > -
> > -	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
> > -	kern_hyp_va x0
> > -	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > -	bl	__fpsimd_save_state
> > -
> > -	add	x2, x3, #VCPU_CONTEXT
> > -	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > -	bl	__fpsimd_restore_state
> > -
> > -	// Skip restoring fpexc32 for AArch64 guests
> > -	mrs	x1, hcr_el2
> > -	tbnz	x1, #HCR_RW_SHIFT, 1f
> > -	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
> > -	msr	fpexc32_el2, x4
> > -1:
> > -	ldp	x4, lr, [sp], #16
> > -	ldp	x2, x3, [sp], #16
> > -	ldp	x0, x1, [sp], #16
> > -
> > +	stp	x2, x3, [sp, #-144]!
> > +	stp	x4, x5, [sp, #16]
> > +	stp	x6, x7, [sp, #32]
> > +	stp	x8, x9, [sp, #48]
> > +	stp	x10, x11, [sp, #64]
> > +	stp	x12, x13, [sp, #80]
> > +	stp	x14, x15, [sp, #96]
> > +	stp	x16, x17, [sp, #112]
> > +	stp	x18, lr, [sp, #128]
> > +
> > +	bl	__hyp_switch_fpsimd
> > +
> > +	ldp	x4, x5, [sp, #16]
> > +	ldp	x6, x7, [sp, #32]
> > +	ldp	x8, x9, [sp, #48]
> > +	ldp	x10, x11, [sp, #64]
> > +	ldp	x12, x13, [sp, #80]
> > +	ldp	x14, x15, [sp, #96]
> > +	ldp	x16, x17, [sp, #112]
> > +	ldp	x18, lr, [sp, #128]
> > +	ldp	x0, x1, [sp, #144]
> > +	ldp	x2, x3, [sp], #160
> >  	eret
> >  ENDPROC(__fpsimd_guest_restore)
> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index d964523..c0796c4 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -318,6 +318,30 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
> >  	}
> >  }
> >
> > +void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
> > +				    struct kvm_vcpu *vcpu)
> > +{
> > +	kvm_cpu_context_t *host_ctxt;
> > +
> > +	if (has_vhe())
> > +		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
> > +			     cpacr_el1);
> > +	else
> > +		write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP,
> > +			     cptr_el2);
> 
> Is there no way to do alternative() in C or does it always come down to
> different inline asms?
> 

has_vhe() should resolve to a static key, and I prefer this over the
previous alternative construct we had for selecting function calls in C,
as that resulted in having to follow too many levels of indirection.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-23 15:03           ` Dave Martin
@ 2018-05-24  8:33             ` Christoffer Dall
  -1 siblings, 0 replies; 138+ messages in thread
From: Christoffer Dall @ 2018-05-24  8:33 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Wed, May 23, 2018 at 04:03:37PM +0100, Dave Martin wrote:
> On Wed, May 23, 2018 at 03:56:57PM +0100, Catalin Marinas wrote:
> > On Wed, May 23, 2018 at 02:31:59PM +0100, Dave P Martin wrote:
> > > On Wed, May 23, 2018 at 01:48:12PM +0200, Christoffer Dall wrote:
> > > > On Tue, May 22, 2018 at 05:05:08PM +0100, Dave Martin wrote:
> > > > > This is true by construction however: TIF_FOREIGN_FPSTATE is never
> > > > > cleared except when returning to userspace or returning from a
> > > > > signal: thus, for a true kernel thread no FPSIMD context is ever
> > > > > loaded, TIF_FOREIGN_FPSTATE will remain set and no context will
> > > > > ever be saved.
> > > > 
> > > > I don't understand this construction proof; from looking at the patch
> > > > below it is not obvious to me why fpsimd_thread_switch() can never have
> > > > !wrong_task && !wrong_cpu and therefore clear TIF_FOREIGN_FPSTATE for a
> > > > kernel thread?
> > > 
> > > Looking at this again, I think it is poorly worded.  This patch aims to
> > > make it true by construction, but it isn't prior to the patch.
> > > 
> > > I'm tempted to delete the paragraph: the assertion is both untrue and
> > > not the best way to justify that this patch works.
> > > 
> > > 
> > > How about:
> > > 
> > > -8<-
> > > 
> > > The context switch logic already isolates user threads from each other.
> > > This, it is sufficient for isolating user threads from the kernel,

s/This/Thus/ ?

I don't understand what 'it' refers to here?

> > > since the goal either way is to ensure that code executing in userspace
> > > cannot see any FPSIMD state except its own.  Thus, there is no special
> > > property of kernel threads that we care about except that it is
> > > pointless to save or load FPSIMD register state for them.

Actually, I'm not really sure what this paragraph is getting at.

> > > 
> > > At worst, the removal of all the kernel thread special cases by this
> > > patch would thus spuriously load and save state for kernel threads when
> > > unnecessary.
> > > 
> > > But the context switch logic is already deliberately optimised to defer
> > > reloads of the regs until ret_to_user (or sigreturn as a special case),
> > > which kernel threads by definition never reach.
> > > 
> > > ->8-
> > 
> > The "at worst" paragraph makes it look like it could happen (at least
> > until you reach the last paragraph). Maybe you can just say that
> > wrong_task and wrong_cpu (with the fpsimd_cpu = NR_CPUS addition) are
> > always true for kernel threads. You should probably mention this in a
> > comment in the code as well.
> 
> What if I just delete the second paragraph, and remove the "But" from
> the start of the third, and append:
> 
> "As a result, the wrong_task and wrong_cpu tests in
> fpsimd_thread_switch() will always yield true for kernel threads."
> 
> ...with a similar comment in the code?

...with a risk of being a bit over-pedantic and annoying, may I suggest
the following complete commit text:

------8<------
Currently the FPSIMD handling code uses the condition task->mm ==
NULL as a hint that task has no FPSIMD register context.

The ->mm check is only there to filter out tasks that cannot
possibly have FPSIMD context loaded, for optimisation purposes.
However, TIF_FOREIGN_FPSTATE must always be checked anyway before
saving FPSIMD context back to memory.  For this reason, the ->mm
checks are not useful, providing that that TIF_FOREIGN_FPSTATE is
maintained properly for kernel threads.

FPSIMD context is never preserved for kernel threads across a context
switch and therefore TIF_FOREIGN_FPSTATE should always be true for
kernel threads.  This is indeed the case, as the wrong_task and
wrong_cpu tests in fpsimd_thread_switch() will always yield true for
kernel threads.

Further, the context switch logic is already deliberately optimised to
defer reloads of the FPSIMD context until ret_to_user (or sigreturn as a
special case), which kernel threads by definition never reach, and
therefore this change introduces no additional work in the critical
path.

This patch removes the redundant checks and special-case code.
------8<------
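
To make the "will always yield true" claim concrete, the test being
referred to ends up looking roughly like this after the series (a
sketch assembled from the patches, not the exact hunk):

	void fpsimd_thread_switch(struct task_struct *next)
	{
		bool wrong_task, wrong_cpu;

		/* Save unsaved fpsimd state, if any: */
		fpsimd_save();

		/*
		 * Fix up TIF_FOREIGN_FPSTATE to correctly describe next's
		 * state.  For a kernel thread, fpsimd_last_state.st never
		 * points at its user_fpsimd_state and thread.fpsimd_cpu
		 * stays at NR_CPUS, so both tests are true and the flag
		 * remains set.
		 */
		wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
					&next->thread.uw.fpsimd_state;
		wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();

		update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
				       wrong_task || wrong_cpu);
	}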

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
@ 2018-05-24  8:33             ` Christoffer Dall
  0 siblings, 0 replies; 138+ messages in thread
From: Christoffer Dall @ 2018-05-24  8:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 23, 2018 at 04:03:37PM +0100, Dave Martin wrote:
> On Wed, May 23, 2018 at 03:56:57PM +0100, Catalin Marinas wrote:
> > On Wed, May 23, 2018 at 02:31:59PM +0100, Dave P Martin wrote:
> > > On Wed, May 23, 2018 at 01:48:12PM +0200, Christoffer Dall wrote:
> > > > On Tue, May 22, 2018 at 05:05:08PM +0100, Dave Martin wrote:
> > > > > This is true by construction however: TIF_FOREIGN_FPSTATE is never
> > > > > cleared except when returning to userspace or returning from a
> > > > > signal: thus, for a true kernel thread no FPSIMD context is ever
> > > > > loaded, TIF_FOREIGN_FPSTATE will remain set and no context will
> > > > > ever be saved.
> > > > 
> > > > I don't understand this construction proof; from looking at the patch
> > > > below it is not obvious to me why fpsimd_thread_switch() can never have
> > > > !wrong_task && !wrong_cpu and therefore clear TIF_FOREIGN_FPSTATE for a
> > > > kernel thread?
> > > 
> > > Looking at this again, I think it is poorly worded.  This patch aims to
> > > make it true by construction, but it isn't prior to the patch.
> > > 
> > > I'm tempted to delete the paragraph: the assertion is both untrue and
> > > not the best way to justify that this patch works.
> > > 
> > > 
> > > How about:
> > > 
> > > -8<-
> > > 
> > > The context switch logic already isolates user threads from each other.
> > > This, it is sufficient for isolating user threads from the kernel,

s/This/Thus/ ?

I don't understand what 'it' refers to here?

> > > since the goal either way is to ensure that code executing in userspace
> > > cannot see any FPSIMD state except its own.  Thus, there is no special
> > > property of kernel threads that we care about except that it is
> > > pointless to save or load FPSIMD register state for them.

Actually, I'm not really sure what this paragraph is getting at.

> > > 
> > > At worst, the removal of all the kernel thread special cases by this
> > > patch would thus spuriously load and save state for kernel threads when
> > > unnecessary.
> > > 
> > > But the context switch logic is already deliberately optimised to defer
> > > reloads of the regs until ret_to_user (or sigreturn as a special case),
> > > which kernel threads by definition never reach.
> > > 
> > > ->8-
> > 
> > The "at worst" paragraph makes it look like it could happen (at least
> > until you reach the last paragraph). Maybe you can just say that
> > wrong_task and wrong_cpu (with the fpsimd_cpu = NR_CPUS addition) are
> > always true for kernel threads. You should probably mention this in a
> > comment in the code as well.
> 
> What if I just delete the second paragraph, and remove the "But" from
> the start of the third, and append:
> 
> "As a result, the wrong_task and wrong_cpu tests in
> fpsimd_thread_switch() will always yield true for kernel threads."
> 
> ...with a similar comment in the code?

...with a risk of being a bit over-pedantic and annoying, may I suggest
the following complete commit text:

------8<------
Currently the FPSIMD handling code uses the condition task->mm ==
NULL as a hint that task has no FPSIMD register context.

The ->mm check is only there to filter out tasks that cannot
possibly have FPSIMD context loaded, for optimisation purposes.
However, TIF_FOREIGN_FPSTATE must always be checked anyway before
saving FPSIMD context back to memory.  For this reason, the ->mm
checks are not useful, providing that TIF_FOREIGN_FPSTATE is
maintained properly for kernel threads.

FPSIMD context is never preserved for kernel threads across a context
switch and therefore TIF_FOREIGN_FPSTATE should always be true for
kernel threads.  This is indeed the case, as the wrong_task and
wrong_cpu tests in fpsimd_thread_switch() will always yield true for
kernel threads.

Further, the context switch logic is already deliberately optimised to
defer reloads of the FPSIMD context until ret_to_user (or sigreturn as a
special case), which kernel threads by definition never reach, and
therefore this change introduces no additional work in the critical
path.

This patch removes the redundant checks and special-case code.
------8<------

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 05/18] KVM: arm64: Convert lazy FPSIMD context switch trap to C
  2018-05-24  8:12       ` Christoffer Dall
@ 2018-05-24  8:54         ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-24  8:54 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Thu, May 24, 2018 at 10:12:20AM +0200, Christoffer Dall wrote:
> On Wed, May 23, 2018 at 08:35:13PM +0100, Alex Bennée wrote:
> > 
> > Dave Martin <Dave.Martin@arm.com> writes:
> > 
> > > To make the lazy FPSIMD context switch trap code easier to hack on,
> > > this patch converts it to C.
> > >
> > > This is not amazingly efficient, but the trap should typically only
> > > be taken once per host context switch.
> > >
> > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > > ---
> > >  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
> > >  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
> > >  2 files changed, 46 insertions(+), 35 deletions(-)

[...]

> > > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > > index d964523..c0796c4 100644
> > > --- a/arch/arm64/kvm/hyp/switch.c
> > > +++ b/arch/arm64/kvm/hyp/switch.c
> > > @@ -318,6 +318,30 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
> > >  	}
> > >  }
> > >
> > > +void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
> > > +				    struct kvm_vcpu *vcpu)
> > > +{
> > > +	kvm_cpu_context_t *host_ctxt;
> > > +
> > > +	if (has_vhe())
> > > +		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
> > > +			     cpacr_el1);
> > > +	else
> > > +		write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP,
> > > +			     cptr_el2);
> > 
> > Is there no way to do alternative() in C or does it always come down to
> > different inline asms?
> > 
> 
> has_vhe() should resolve to a static key, and I prefer this over the
> previous alternative construct we had for selecting function calls in C,
> as that resultet in having to follow too many levels of indirection.

I'll defer to Christoffer on that -- I was just following precedent :)

The if (has_vhe()) approach has the benefit of being much more
readable, and the static branch predictor in many CPUs will succeed in
folding a short-range unconditional branch out entirely.  There will be
a small increase in I-cache pressure due to the larger inline code
size, but probably not much beyond that.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 05/18] KVM: arm64: Convert lazy FPSIMD context switch trap to C
@ 2018-05-24  8:54         ` Dave Martin
  0 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-24  8:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 24, 2018 at 10:12:20AM +0200, Christoffer Dall wrote:
> On Wed, May 23, 2018 at 08:35:13PM +0100, Alex Bennée wrote:
> > 
> > Dave Martin <Dave.Martin@arm.com> writes:
> > 
> > > To make the lazy FPSIMD context switch trap code easier to hack on,
> > > this patch converts it to C.
> > >
> > > This is not amazingly efficient, but the trap should typically only
> > > be taken once per host context switch.
> > >
> > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > > ---
> > >  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
> > >  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
> > >  2 files changed, 46 insertions(+), 35 deletions(-)

[...]

> > > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > > index d964523..c0796c4 100644
> > > --- a/arch/arm64/kvm/hyp/switch.c
> > > +++ b/arch/arm64/kvm/hyp/switch.c
> > > @@ -318,6 +318,30 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
> > >  	}
> > >  }
> > >
> > > +void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
> > > +				    struct kvm_vcpu *vcpu)
> > > +{
> > > +	kvm_cpu_context_t *host_ctxt;
> > > +
> > > +	if (has_vhe())
> > > +		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
> > > +			     cpacr_el1);
> > > +	else
> > > +		write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP,
> > > +			     cptr_el2);
> > 
> > Is there no way to do alternative() in C or does it always come down to
> > different inline asms?
> > 
> 
> has_vhe() should resolve to a static key, and I prefer this over the
> previous alternative construct we had for selecting function calls in C,
> as that resultet in having to follow too many levels of indirection.

I'll defer to Christoffer on that -- I was just following precedent :)

The if (has_vhe()) approach has the benefit of being much more
readable, and the static branch predictor in many CPUs will succeed in
folding a short-range unconditional branch out entirely.  There will be
a small increase in I-cache pressure due to the larger inline code
size, but probably not much beyond that.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 06/18] arm64: fpsimd: Generalise context saving for non-task contexts
  2018-05-23 20:15     ` Alex Bennée
@ 2018-05-24  9:03       ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-24  9:03 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Wed, May 23, 2018 at 09:15:11PM +0100, Alex Bennée wrote:
> 
> Dave Martin <Dave.Martin@arm.com> writes:
> 
> > In preparation for allowing non-task (i.e., KVM vcpu) FPSIMD
> > contexts to be handled by the fpsimd common code, this patch adapts
> > task_fpsimd_save() to save back the currently loaded context,
> > removing the explicit dependency on current.
> >
> > The relevant storage to write back to in memory is now found by
> > examining the fpsimd_last_state percpu struct.
> >
> > fpsimd_save() does nothing unless TIF_FOREIGN_FPSTATE is clear, and
> > fpsimd_last_state is updated under local_bh_disable() or
> > local_irq_disable() everywhere that TIF_FOREIGN_FPSTATE is cleared:
> > thus, fpsimd_save() will write back to the correct storage for the
> > loaded context.
> >
> > No functional change.
> >
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> > ---
> >  arch/arm64/kernel/fpsimd.c | 25 +++++++++++++------------
> >  1 file changed, 13 insertions(+), 12 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> > index 9d85373..3aa100a 100644
> > --- a/arch/arm64/kernel/fpsimd.c
> > +++ b/arch/arm64/kernel/fpsimd.c
> > @@ -270,13 +270,15 @@ static void task_fpsimd_load(void)
> >  }
> >
> >  /*
> > - * Ensure current's FPSIMD/SVE storage in thread_struct is up to date
> > - * with respect to the CPU registers.
> > + * Ensure FPSIMD/SVE storage in memory for the loaded context is up to
> > + * date with respect to the CPU registers.
> >   *
> >   * Softirqs (and preemption) must be disabled.
> >   */
> > -static void task_fpsimd_save(void)
> > +static void fpsimd_save(void)
> >  {
> > +	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
> > +
> 
> I thought I was missing something but the only write I saw of this was:
> 
>   __this_cpu_write(fpsimd_last_state.st, NULL);
> 
> which implied to me it is possible to have an invalid de-reference. I
> did figure it out eventually as fpsimd_bind_state_to_cpu uses a more
> indirect this_cpu_ptr idiom for tweaking this. I guess a reference to
> fpsimd_bind_[task|state]_to_cpu in the comment would have helped my
> confusion.

How about:

 static void fpsimd_save(void)
 {
 	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
+	/* set by fpsimd_bind_to_cpu() */
 
 	WARN_ON(!in_softirq() && !irqs_disabled());
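
(For reference, the percpu pointer read here is written by
fpsimd_bind_task_to_cpu() / fpsimd_bind_state_to_cpu().  A rough sketch
of the task variant, with the SVE bookkeeping elided:

	static void fpsimd_bind_task_to_cpu(void)
	{
		struct fpsimd_last_state_struct *last =
			this_cpu_ptr(&fpsimd_last_state);

		/* Record which context now owns this CPU's FPSIMD regs */
		last->st = &current->thread.uw.fpsimd_state;
		current->thread.fpsimd_cpu = smp_processor_id();
	}

The NULL write you spotted lives in fpsimd_flush_cpu_state(), IIRC.)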


> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

Thanks
---Dave

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 14/18] KVM: arm64: Save host SVE context as appropriate
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-24  9:11     ` Christoffer Dall
  -1 siblings, 0 replies; 138+ messages in thread
From: Christoffer Dall @ 2018-05-24  9:11 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Tue, May 22, 2018 at 05:05:15PM +0100, Dave Martin wrote:
> This patch adds SVE context saving to the hyp FPSIMD context switch
> path.  This means that it is no longer necessary to save the host
> SVE state in advance of entering the guest, when in use.
> 
> In order to avoid adding pointless complexity to the code, VHE is
> assumed if SVE is in use.  VHE is an architectural prerequisite for
> SVE, so there is no good reason to turn CONFIG_ARM64_VHE off in
> kernels that support both SVE and KVM.
> 
> Historically, software models exist that can expose the
> architecturally invalid configuration of SVE without VHE, so if
> this situation is detected at kvm_init() time then KVM will be
> disabled.
> 
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> 
> ---
> 
>  * Tags stripped since v8, please reconfirm if possible:
> 
> Formerly-Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
> Formerly-Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Formerly-Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> 
> Changes since v9:
> 
> Requested by Marc Zyngier:
> 
>  * Inline check for VHE if SVE is present into kvm_host.h.
> 
>    The check has been renamed to the more specific
>    kvm_arch_check_sve_has_vhe(), and the kvm_pr_unimpl() moved back to
>    arm.c (to avoid circular include issues).
> 
>    arm.c is not single-arch code, but it is all Arm-specific, so
>    adding a hook like this doesn't seem too unreasonable.
> 
> Changes since v8:
> 
>  * Add kvm_arch_check_supported() hook, and move arm64-specific check
>    for SVE-implies-VHE into arch/arm64/.
> 
>    Due to circular header dependency problems, it is difficult to get
>    the prototype for kvm_pr_*() functions in <asm/kvm_host.h>, so this
>    patch puts arm64's kvm_arch_check_supported() hook out of line.
>    This is not a hot function.
> ---
>  arch/arm/include/asm/kvm_host.h   |  1 +
>  arch/arm64/Kconfig                |  7 +++++++
>  arch/arm64/include/asm/kvm_host.h | 13 +++++++++++++
>  arch/arm64/kvm/fpsimd.c           |  1 -
>  arch/arm64/kvm/hyp/switch.c       | 20 +++++++++++++++++++-
>  virt/kvm/arm/arm.c                |  7 +++++++
>  6 files changed, 47 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index ac870b2..3b85bbb 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -280,6 +280,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>  
> +static inline bool kvm_arch_check_sve_has_vhe(void) { return true; }
>  static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index eb2cf49..b0d3820 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1130,6 +1130,7 @@ endmenu
>  config ARM64_SVE
>  	bool "ARM Scalable Vector Extension support"
>  	default y
> +	depends on !KVM || ARM64_VHE
>  	help
>  	  The Scalable Vector Extension (SVE) is an extension to the AArch64
>  	  execution state which complements and extends the SIMD functionality
> @@ -1155,6 +1156,12 @@ config ARM64_SVE
>  	  booting the kernel.  If unsure and you are not observing these
>  	  symptoms, you should assume that it is safe to say Y.
>  
> +	  CPUs that support SVE are architecturally required to support the
> +	  Virtualization Host Extensions (VHE), so the kernel makes no
> +	  provision for supporting SVE alongside KVM without VHE enabled.
> +	  Thus, you will need to enable CONFIG_ARM64_VHE if you want to support
> +	  KVM in the same kernel image.
> +
>  config ARM64_MODULE_PLTS
>  	bool
>  	select HAVE_MOD_ARCH_SPECIFIC
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index b3fe730..06d5a61 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -405,6 +405,19 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
>  	kvm_call_hyp(__kvm_set_tpidr_el2, tpidr_el2);
>  }
>  
> +static inline bool kvm_arch_check_sve_has_vhe(void)
> +{
> +	/*
> +	 * The Arm architecture specifies that imlpementation of SVE

nit: implementation

> +	 * requires VHE also to be implemented.  The KVM code for arm64
> +	 * relies on this when SVE is present:
> +	 */
> +	if (system_supports_sve())
> +		return has_vhe();
> +	else
> +		return true;
> +}
> +
>  static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> index 365933a..dc6ecfa 100644
> --- a/arch/arm64/kvm/fpsimd.c
> +++ b/arch/arm64/kvm/fpsimd.c
> @@ -59,7 +59,6 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
>   */
>  void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
>  {
> -	BUG_ON(system_supports_sve());
>  	BUG_ON(!current->mm);
>  
>  	vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED | KVM_ARM64_HOST_SVE_IN_USE);
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 118f300..a6a8c7d 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -21,6 +21,7 @@
>  
>  #include <kvm/arm_psci.h>
>  
> +#include <asm/cpufeature.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_host.h>
> @@ -28,6 +29,7 @@
>  #include <asm/kvm_mmu.h>
>  #include <asm/fpsimd.h>
>  #include <asm/debug-monitors.h>
> +#include <asm/processor.h>
>  #include <asm/thread_info.h>
>  
>  /* Check whether the FP regs were dirtied while in the host-side run loop: */
> @@ -329,6 +331,8 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
>  void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
>  				    struct kvm_vcpu *vcpu)
>  {
> +	struct user_fpsimd_state *host_fpsimd = vcpu->arch.host_fpsimd_state;
> +
>  	if (has_vhe())
>  		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
>  			     cpacr_el1);
> @@ -339,7 +343,21 @@ void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
>  	isb();
>  
>  	if (vcpu->arch.flags & KVM_ARM64_FP_HOST) {
> -		__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
> +		/*
> +		 * In the SVE case, VHE is assumed: it is enforced by
> +		 * Kconfig and kvm_arch_init().
> +		 */
> +		if (system_supports_sve() &&
> +		    (vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE)) {
> +			struct thread_struct *thread = container_of(
> +				host_fpsimd,
> +				struct thread_struct, uw.fpsimd_state);
> +
> +			sve_save_state(sve_pffr(thread), &host_fpsimd->fpsr);
> +		} else {
> +			__fpsimd_save_state(host_fpsimd);
> +		}
> +
>  		vcpu->arch.flags &= ~KVM_ARM64_FP_HOST;
>  	}
>  
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index bee226c..ce7c6f3 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -16,6 +16,7 @@
>   * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>   */
>  
> +#include <linux/bug.h>
>  #include <linux/cpu_pm.h>
>  #include <linux/errno.h>
>  #include <linux/err.h>
> @@ -41,6 +42,7 @@
>  #include <asm/mman.h>
>  #include <asm/tlbflush.h>
>  #include <asm/cacheflush.h>
> +#include <asm/cpufeature.h>
>  #include <asm/virt.h>
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
> @@ -1574,6 +1576,11 @@ int kvm_arch_init(void *opaque)
>  		return -ENODEV;
>  	}
>  
> +	if (!kvm_arch_check_sve_has_vhe()) {
> +		kvm_pr_unimpl("SVE system without VHE unsupported.  Broken cpu?");
> +		return -ENODEV;
> +	}
> +
>  	for_each_online_cpu(cpu) {
>  		smp_call_function_single(cpu, check_kvm_target_cpu, &ret, 1);
>  		if (ret < 0) {
> -- 
> 2.1.4
> 
> 

Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 16/18] KVM: arm64: Remove redundant *exit_code changes in fpsimd_guest_exit()
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-24  9:11     ` Christoffer Dall
  -1 siblings, 0 replies; 138+ messages in thread
From: Christoffer Dall @ 2018-05-24  9:11 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Tue, May 22, 2018 at 05:05:17PM +0100, Dave Martin wrote:
> In fixup_guest_exit(), there are a couple of cases where after
> checking what the exit code was, we assign it explicitly with the
> value it already had.
> 
> Assuming this is not indicative of a bug, these assignments are not
> needed.
> 
> This patch removes the redundant assignments, and simplifies some
> if-nesting that becomes trivial as a result.
> 
> No functional change.
> 
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>

Acked-by: Christoffer Dall <christoffer.dall@arm.com>

> ---
>  arch/arm64/kvm/hyp/switch.c | 16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index a6a8c7d..18d0faa 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -403,12 +403,8 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  		if (valid) {
>  			int ret = __vgic_v2_perform_cpuif_access(vcpu);
>  
> -			if (ret == 1) {
> -				if (__skip_instr(vcpu))
> -					return true;
> -				else
> -					*exit_code = ARM_EXCEPTION_TRAP;
> -			}
> +			if (ret ==  1 && __skip_instr(vcpu))
> +				return true;
>  
>  			if (ret == -1) {
>  				/* Promote an illegal access to an
> @@ -430,12 +426,8 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  	     kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
>  		int ret = __vgic_v3_perform_cpuif_access(vcpu);
>  
> -		if (ret == 1) {
> -			if (__skip_instr(vcpu))
> -				return true;
> -			else
> -				*exit_code = ARM_EXCEPTION_TRAP;
> -		}
> +		if (ret == 1 && __skip_instr(vcpu))
> +			return true;
>  	}
>  
>  	/* Return to the host kernel and handle the exit */
> -- 
> 2.1.4
> 
> 

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 17/18] KVM: arm64: Fold redundant exit code checks out of fixup_guest_exit()
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-24  9:12     ` Christoffer Dall
  -1 siblings, 0 replies; 138+ messages in thread
From: Christoffer Dall @ 2018-05-24  9:12 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Tue, May 22, 2018 at 05:05:18PM +0100, Dave Martin wrote:
> The entire tail of fixup_guest_exit() is contained in if statements
> of the form if (x && *exit_code == ARM_EXCEPTION_TRAP).  As a result,
> we can check just once and bail out of the function early, allowing
> the remaining if conditions to be simplified.
> 
> The only awkward case is where *exit_code is changed to
> ARM_EXCEPTION_EL1_SERROR in the case of an illegal GICv2 CPU
> interface access: in that case, the GICv3 trap handling code is
> skipped using a goto.  This avoids pointlessly evaluating the
> static branch check for the GICv3 case, even though we can't have
> vgic_v2_cpuif_trap and vgic_v3_cpuif_trap true simultaneously
> unless we have a GICv3 and GICv2 on the host: that sounds stupid,
> but I haven't satisfied myself that it can't happen.
> 
> No functional change.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>

> 
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/hyp/switch.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 18d0faa..4fbee95 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -387,11 +387,13 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  	 * same PC once the SError has been injected, and replay the
>  	 * trapping instruction.
>  	 */
> -	if (*exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
> +	if (*exit_code != ARM_EXCEPTION_TRAP)
> +		goto exit;
> +
> +	if (!__populate_fault_info(vcpu))
>  		return true;
>  
> -	if (static_branch_unlikely(&vgic_v2_cpuif_trap) &&
> -	    *exit_code == ARM_EXCEPTION_TRAP) {
> +	if (static_branch_unlikely(&vgic_v2_cpuif_trap)) {
>  		bool valid;
>  
>  		valid = kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_DABT_LOW &&
> @@ -417,11 +419,12 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  					*vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
>  				*exit_code = ARM_EXCEPTION_EL1_SERROR;
>  			}
> +
> +			goto exit;
>  		}
>  	}
>  
>  	if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
> -	    *exit_code == ARM_EXCEPTION_TRAP &&
>  	    (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 ||
>  	     kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
>  		int ret = __vgic_v3_perform_cpuif_access(vcpu);
> @@ -430,6 +433,7 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  			return true;
>  	}
>  
> +exit:
>  	/* Return to the host kernel and handle the exit */
>  	return false;
>  }
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 05/18] KVM: arm64: Convert lazy FPSIMD context switch trap to C
  2018-05-24  8:54         ` Dave Martin
@ 2018-05-24  9:14           ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24  9:14 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, Christoffer Dall, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> On Thu, May 24, 2018 at 10:12:20AM +0200, Christoffer Dall wrote:
>> On Wed, May 23, 2018 at 08:35:13PM +0100, Alex Bennée wrote:
>> >
>> > Dave Martin <Dave.Martin@arm.com> writes:
>> >
>> > > To make the lazy FPSIMD context switch trap code easier to hack on,
>> > > this patch converts it to C.
>> > >
>> > > This is not amazingly efficient, but the trap should typically only
>> > > be taken once per host context switch.
>> > >
>> > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>> > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
>> > > ---
>> > >  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++----------------------------
>> > >  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
>> > >  2 files changed, 46 insertions(+), 35 deletions(-)
>
> [...]
>
>> > > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> > > index d964523..c0796c4 100644
>> > > --- a/arch/arm64/kvm/hyp/switch.c
>> > > +++ b/arch/arm64/kvm/hyp/switch.c
>> > > @@ -318,6 +318,30 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
>> > >  	}
>> > >  }
>> > >
>> > > +void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
>> > > +				    struct kvm_vcpu *vcpu)
>> > > +{
>> > > +	kvm_cpu_context_t *host_ctxt;
>> > > +
>> > > +	if (has_vhe())
>> > > +		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
>> > > +			     cpacr_el1);
>> > > +	else
>> > > +		write_sysreg(read_sysreg(cptr_el2) & ~(u64)CPTR_EL2_TFP,
>> > > +			     cptr_el2);
>> >
>> > Is there no way to do alternative() in C or does it always come down to
>> > different inline asms?
>> >
>>
>> has_vhe() should resolve to a static key, and I prefer this over the
>> previous alternative construct we had for selecting function calls in C,
>> as that resulted in having to follow too many levels of indirection.
>
> I'll defer to Christoffer on that -- I was just following precedent :)
>
> The if (has_vhe()) approach has the benefit of being much more
> readable, and the static branch predictor in many CPUs will succeed in
> folding a short-range unconditional branch out entirely.  There will be
> a small increase in I-cache pressure due to the larger inline code
> size, but probably not much beyond that.

Fair enough - it was mostly a curiosity. It seems most uses of
alternative() are at the low-level instruction level anyway.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-24  8:33             ` Christoffer Dall
@ 2018-05-24  9:16               ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24  9:16 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, Dave Martin, linux-arm-kernel


Christoffer Dall <christoffer.dall@arm.com> writes:

> On Wed, May 23, 2018 at 04:03:37PM +0100, Dave Martin wrote:
>> On Wed, May 23, 2018 at 03:56:57PM +0100, Catalin Marinas wrote:
>> > On Wed, May 23, 2018 at 02:31:59PM +0100, Dave P Martin wrote:
>> > > On Wed, May 23, 2018 at 01:48:12PM +0200, Christoffer Dall wrote:
>> > > > On Tue, May 22, 2018 at 05:05:08PM +0100, Dave Martin wrote:
>> > > > > This is true by construction however: TIF_FOREIGN_FPSTATE is never
>> > > > > cleared except when returning to userspace or returning from a
>> > > > > signal: thus, for a true kernel thread no FPSIMD context is ever
>> > > > > loaded, TIF_FOREIGN_FPSTATE will remain set and no context will
>> > > > > ever be saved.
>> > > >
>> > > > I don't understand this construction proof; from looking at the patch
>> > > > below it is not obvious to me why fpsimd_thread_switch() can never have
>> > > > !wrong_task && !wrong_cpu and therefore clear TIF_FOREIGN_FPSTATE for a
>> > > > kernel thread?
>> > >
>> > > Looking at this again, I think it is poorly worded.  This patch aims to
>> > > make it true by construction, but it isn't prior to the patch.
>> > >
>> > > I'm tempted to delete the paragraph: the assertion is both untrue and
>> > > not the best way to justify that this patch works.
>> > >
>> > >
>> > > How about:
>> > >
>> > > -8<-
>> > >
>> > > The context switch logic already isolates user threads from each other.
>> > > This, it is sufficient for isolating user threads from the kernel,
>
> s/This/Thus/ ?
>
> I don't understand what 'it' refers to here?
>
>> > > since the goal either way is to ensure that code executing in userspace
>> > > cannot see any FPSIMD state except its own.  Thus, there is no special
>> > > property of kernel threads that we care about except that it is
>> > > pointless to save or load FPSIMD register state for them.
>
> Actually, I'm not really sure what this paragraph is getting at.
>
>> > >
>> > > At worst, the removal of all the kernel thread special cases by this
>> > > patch would thus spuriously load and save state for kernel threads when
>> > > unnecessary.
>> > >
>> > > But the context switch logic is already deliberately optimised to defer
>> > > reloads of the regs until ret_to_user (or sigreturn as a special case),
>> > > which kernel threads by definition never reach.
>> > >
>> > > ->8-
>> >
>> > The "at worst" paragraph makes it look like it could happen (at least
>> > until you reach the last paragraph). Maybe you can just say that
>> > wrong_task and wrong_cpu (with the fpsimd_cpu = NR_CPUS addition) are
>> > always true for kernel threads. You should probably mention this in a
>> > comment in the code as well.
>>
>> What if I just delete the second paragraph, and remove the "But" from
>> the start of the third, and append:
>>
>> "As a result, the wrong_task and wrong_cpu tests in
>> fpsimd_thread_switch() will always yield false for kernel threads."
>>
>> ...with a similar comment in the code?
>
> ...with a risk of being a bit over-pedantic and annoying, may I suggest
> the following complete commit text:
>
> ------8<------
> Currently the FPSIMD handling code uses the condition task->mm ==
> NULL as a hint that task has no FPSIMD register context.
>
> The ->mm check is only there to filter out tasks that cannot
> possibly have FPSIMD context loaded, for optimisation purposes.
> However, TIF_FOREIGN_FPSTATE must always be checked anyway before
> saving FPSIMD context back to memory.  For this reason, the ->mm
> checks are not useful, providing that TIF_FOREIGN_FPSTATE is
> maintained properly for kernel threads.
>
> FPSIMD context is never preserved for kernel threads across a context
> switch and therefore TIF_FOREIGN_FPSTATE should always be true for
> kernel threads.  This is indeed the case, as the wrong_task and
> wrong_cpu tests in fpsimd_thread_switch() will always yield false for
> kernel threads.
>
> Further, the context switch logic is already deliberately optimised to
> defer reloads of the FPSIMD context until ret_to_user (or sigreturn as a
> special case), which kernel threads by definition never reach, and
> therefore this change introduces no additional work in the critical
> path.
>
> This patch removes the redundant checks and special-case code.
> ------8<------

FWIW I prefer this version for the commit text.
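
For anyone following the "always yield false for kernel threads" claim:
with this series applied the test in fpsimd_thread_switch() ends up
looking roughly like this (paraphrased, not a verbatim quote):

	wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
		     &next->thread.uw.fpsimd_state;
	wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();

	/* Both are always true for a kernel thread, per the above */
	update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
			       wrong_task || wrong_cpu);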

--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 04/18] KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change
  2018-05-24  8:11         ` Christoffer Dall
@ 2018-05-24  9:18           ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24  9:18 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel, Dave Martin,
	Christoffer Dall


Christoffer Dall <christoffer.dall@arm.com> writes:

> On Wed, May 23, 2018 at 03:40:26PM +0100, Dave Martin wrote:
>> On Wed, May 23, 2018 at 03:34:20PM +0100, Alex Bennée wrote:
>> >
>> > Dave Martin <Dave.Martin@arm.com> writes:
>> >
>> > > From: Christoffer Dall <christoffer.dall@linaro.org>
>> > >
>> > > KVM/ARM differs from other architectures in having to maintain an
>> > > additional virtual address space from that of the host and the
>> > > guest, because we split the execution of KVM across both EL1 and
>> > > EL2.
>> > >
>> > > This results in a need to explicitly map data structures into EL2
>> > > (hyp) which are accessed from the hyp code.  As we are about to be
>> > > more clever with our FPSIMD handling on arm64, which stores data in
>> > > the task struct and uses thread_info flags, we will have to map
>> > > parts of the currently executing task struct into the EL2 virtual
>> > > address space.
>> > >
>> > > However, we don't want to do this on every KVM_RUN, because it is a
>> > > fairly expensive operation to walk the page tables, and the common
>> > > execution mode is to map a single thread to a VCPU.  By introducing
>> > > a hook that architectures can select with
>> > > HAVE_KVM_VCPU_RUN_PID_CHANGE, we do not introduce overhead for
>> > > other architectures, but have a simple way to only map the data we
>> > > need when required for arm64.
>> > >
>> > > This patch introduces the framework only, and wires it up in the
>> > > arm/arm64 KVM common code.
>> > >
>> > > No functional change.
>> > >
>> > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>> > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
>> > > ---
>> > >  include/linux/kvm_host.h | 9 +++++++++
>> > >  virt/kvm/Kconfig         | 3 +++
>> > >  virt/kvm/kvm_main.c      | 7 ++++++-
>> > >  3 files changed, 18 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> > > index 6930c63..4268ace 100644
>> > > --- a/include/linux/kvm_host.h
>> > > +++ b/include/linux/kvm_host.h
>> > > @@ -1276,4 +1276,13 @@ static inline long kvm_arch_vcpu_async_ioctl(struct file *filp,
>> > >  void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
>> > >  		unsigned long start, unsigned long end);
>> > >
>> > > +#ifdef CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE
>> > > +int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu);
>> > > +#else
>> > > +static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
>> > > +{
>> > > +	return 0;
>> > > +}
>> > > +#endif /* CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE */
>> > > +
>> > >  #endif
>> > > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> > > index cca7e06..72143cf 100644
>> > > --- a/virt/kvm/Kconfig
>> > > +++ b/virt/kvm/Kconfig
>> > > @@ -54,3 +54,6 @@ config HAVE_KVM_IRQ_BYPASS
>> > >
>> > >  config HAVE_KVM_VCPU_ASYNC_IOCTL
>> > >         bool
>> > > +
>> > > +config HAVE_KVM_VCPU_RUN_PID_CHANGE
>> > > +       bool
>> >
>> > This almost threw me as I thought you might be able to enable this and
>> > break the build, but apparently not:
>> >
>> > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>>
>> Without a "help", the option seems non-interactive and cannot be true
>> unless something selects it.  It seems a bit weird to me too, but the
>> idiom appears widely used...
>>
> Indeed, I've copied this idiom from other things before and nobody has
> complained, so I think it works (without any further deep insights into
> the inner workings of Kconfig).

It's fine. My main worry was breaking bisection with the normal "make
olddefconfig" approach. I tested it and found it to be fine and I don't
think we need to worry about people adding the symbol to .config
manually - they get to keep both pieces ;-)
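
(For context, the kvm_main.c hunk from the diffstat above is not quoted
here; schematically it wires the hook into the KVM_RUN pid-change path,
something like the following rather than the exact code:

	case KVM_RUN: {
		struct pid *oldpid = rcu_access_pointer(vcpu->pid);

		if (unlikely(oldpid != task_pid(current))) {
			/* The thread running this VCPU changed */
			r = kvm_arch_vcpu_run_pid_change(vcpu);
			if (r)
				break;
			/* ... vcpu->pid is then rebound to the new thread ... */
		}

		r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run);
		break;
	}

so arm64 only pays for remapping the task struct into EL2 when the vcpu
thread actually changes.)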

--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 04/18] KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change
@ 2018-05-24  9:18           ` Alex Bennée
  0 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24  9:18 UTC (permalink / raw)
  To: linux-arm-kernel


Christoffer Dall <christoffer.dall@arm.com> writes:

> On Wed, May 23, 2018 at 03:40:26PM +0100, Dave Martin wrote:
>> On Wed, May 23, 2018 at 03:34:20PM +0100, Alex Benn?e wrote:
>> >
>> > Dave Martin <Dave.Martin@arm.com> writes:
>> >
>> > > From: Christoffer Dall <christoffer.dall@linaro.org>
>> > >
>> > > KVM/ARM differs from other architectures in having to maintain an
>> > > additional virtual address space from that of the host and the
>> > > guest, because we split the execution of KVM across both EL1 and
>> > > EL2.
>> > >
>> > > This results in a need to explicitly map data structures into EL2
>> > > (hyp) which are accessed from the hyp code.  As we are about to be
>> > > more clever with our FPSIMD handling on arm64, which stores data in
>> > > the task struct and uses thread_info flags, we will have to map
>> > > parts of the currently executing task struct into the EL2 virtual
>> > > address space.
>> > >
>> > > However, we don't want to do this on every KVM_RUN, because it is a
>> > > fairly expensive operation to walk the page tables, and the common
>> > > execution mode is to map a single thread to a VCPU.  By introducing
>> > > a hook that architectures can select with
>> > > HAVE_KVM_VCPU_RUN_PID_CHANGE, we do not introduce overhead for
>> > > other architectures, but have a simple way to only map the data we
>> > > need when required for arm64.
>> > >
>> > > This patch introduces the framework only, and wires it up in the
>> > > arm/arm64 KVM common code.
>> > >
>> > > No functional change.
>> > >
>> > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>> > > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
>> > > ---
>> > >  include/linux/kvm_host.h | 9 +++++++++
>> > >  virt/kvm/Kconfig         | 3 +++
>> > >  virt/kvm/kvm_main.c      | 7 ++++++-
>> > >  3 files changed, 18 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> > > index 6930c63..4268ace 100644
>> > > --- a/include/linux/kvm_host.h
>> > > +++ b/include/linux/kvm_host.h
>> > > @@ -1276,4 +1276,13 @@ static inline long kvm_arch_vcpu_async_ioctl(struct file *filp,
>> > >  void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
>> > >  		unsigned long start, unsigned long end);
>> > >
>> > > +#ifdef CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE
>> > > +int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu);
>> > > +#else
>> > > +static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
>> > > +{
>> > > +	return 0;
>> > > +}
>> > > +#endif /* CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE */
>> > > +
>> > >  #endif
>> > > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> > > index cca7e06..72143cf 100644
>> > > --- a/virt/kvm/Kconfig
>> > > +++ b/virt/kvm/Kconfig
>> > > @@ -54,3 +54,6 @@ config HAVE_KVM_IRQ_BYPASS
>> > >
>> > >  config HAVE_KVM_VCPU_ASYNC_IOCTL
>> > >         bool
>> > > +
>> > > +config HAVE_KVM_VCPU_RUN_PID_CHANGE
>> > > +       bool
>> >
>> > This almost threw me as I thought you might be able to enable this and
>> > break the build, but apparently not:
>> >
>> > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>>
>> Without a "help", the option seems non-interactive and cannot be true
>> unless something selects it.  It seems a bit weird to me too, but the
>> idiom appears widely used...
>>
> Indeed, I've copied this idiom from other things before and nobody has
> complained, so I think it works (without any further deep insights into
> the inner workings of Kconfig).

It's fine. My main worry was breaking bisection with the normal "make
olddefconfig" approach. I tested it and found it to be fine and I don't
think we need to worry about people adding the symbol to .config
manually - they get to keep both pieces ;-)
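
For readers skimming the thread: the generic kvm_main.c hunk is not quoted
above, but the intent described in the commit message is that the hook only
fires when the vCPU is run from a different thread than last time.  A
condensed, hypothetical sketch of that call site follows (the helper name
kvm_vcpu_check_pid_change() and the exact pid bookkeeping are illustrative
assumptions, not the patch text):

static int kvm_vcpu_check_pid_change(struct kvm_vcpu *vcpu)
{
	struct pid *oldpid = rcu_access_pointer(vcpu->pid);
	struct pid *newpid;
	int r;

	/* Common case: same thread as the previous KVM_RUN, nothing to do. */
	if (likely(oldpid == task_pid(current)))
		return 0;

	/* The thread running this vCPU changed: let the arch remap state. */
	r = kvm_arch_vcpu_run_pid_change(vcpu);
	if (r)
		return r;

	/* Record the new thread for next time. */
	newpid = get_task_pid(current, PIDTYPE_PID);
	rcu_assign_pointer(vcpu->pid, newpid);
	if (oldpid)
		synchronize_rcu();
	put_pid(oldpid);

	return 0;
}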

--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-24  9:19     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24  9:19 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> Currently the FPSIMD handling code uses the condition task->mm ==
> NULL as a hint that task has no FPSIMD register context.
>
> The ->mm check is only there to filter out tasks that cannot
> possibly have FPSIMD context loaded, for optimisation purposes.
> Also, TIF_FOREIGN_FPSTATE must always be checked anyway before
> saving FPSIMD context back to memory.  For these reasons, the ->mm
> checks are not useful, providing that TIF_FOREIGN_FPSTATE is
> maintained in a consistent way for kernel threads.
>
> This is true by construction however: TIF_FOREIGN_FPSTATE is never
> cleared except when returning to userspace or returning from a
> signal: thus, for a true kernel thread no FPSIMD context is ever
> loaded, TIF_FOREIGN_FPSTATE will remain set and no context will
> ever be saved.
>
> This patch removes the redundant checks and special-case code.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>

With Christoffer's commit text:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

>
> ---
>
> Changes since v9:
>
>  * New patch.  Introduced during debugging, since the ->mm checks
>    appear bogus and/or redundant, so are likely to be hiding or
>    causing bugs.
> ---
>  arch/arm64/include/asm/thread_info.h |  1 +
>  arch/arm64/kernel/fpsimd.c           | 38 ++++++++++++------------------------
>  2 files changed, 14 insertions(+), 25 deletions(-)
>
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index 740aa03c..a2ac914 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -47,6 +47,7 @@ struct thread_info {
>
>  #define INIT_THREAD_INFO(tsk)						\
>  {									\
> +	.flags		= _TIF_FOREIGN_FPSTATE,				\
>  	.preempt_count	= INIT_PREEMPT_COUNT,				\
>  	.addr_limit	= KERNEL_DS,					\
>  }
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 3aa100a..1222491 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -891,31 +891,21 @@ asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
>
>  void fpsimd_thread_switch(struct task_struct *next)
>  {
> +	bool wrong_task, wrong_cpu;
> +
>  	if (!system_supports_fpsimd())
>  		return;
> -	/*
> -	 * Save the current FPSIMD state to memory, but only if whatever is in
> -	 * the registers is in fact the most recent userland FPSIMD state of
> -	 * 'current'.
> -	 */
> -	if (current->mm)
> -		fpsimd_save();
>
> -	if (next->mm) {
> -		/*
> -		 * If we are switching to a task whose most recent userland
> -		 * FPSIMD state is already in the registers of *this* cpu,
> -		 * we can skip loading the state from memory. Otherwise, set
> -		 * the TIF_FOREIGN_FPSTATE flag so the state will be loaded
> -		 * upon the next return to userland.
> -		 */
> -		bool wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
> +	/* Save unsaved fpsimd state, if any: */
> +	fpsimd_save();
> +
> +	/* Fix up TIF_FOREIGN_FPSTATE to correctly describe next's state: */
> +	wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
>  					&next->thread.uw.fpsimd_state;
> -		bool wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
> +	wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
>
> -		update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
> -				       wrong_task || wrong_cpu);
> -	}
> +	update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
> +			       wrong_task || wrong_cpu);
>  }
>
>  void fpsimd_flush_thread(void)
> @@ -1120,9 +1110,8 @@ void kernel_neon_begin(void)
>
>  	__this_cpu_write(kernel_neon_busy, true);
>
> -	/* Save unsaved task fpsimd state, if any: */
> -	if (current->mm)
> -		fpsimd_save();
> +	/* Save unsaved fpsimd state, if any: */
> +	fpsimd_save();
>
>  	/* Invalidate any task state remaining in the fpsimd regs: */
>  	fpsimd_flush_cpu_state();
> @@ -1244,8 +1233,7 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
>  {
>  	switch (cmd) {
>  	case CPU_PM_ENTER:
> -		if (current->mm)
> -			fpsimd_save();
> +		fpsimd_save();
>  		fpsimd_flush_cpu_state();
>  		break;
>  	case CPU_PM_EXIT:


--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
@ 2018-05-24  9:19     ` Alex Bennée
  0 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24  9:19 UTC (permalink / raw)
  To: linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> Currently the FPSIMD handling code uses the condition task->mm ==
> NULL as a hint that task has no FPSIMD register context.
>
> The ->mm check is only there to filter out tasks that cannot
> possibly have FPSIMD context loaded, for optimisation purposes.
> Also, TIF_FOREIGN_FPSTATE must always be checked anyway before
> saving FPSIMD context back to memory.  For these reasons, the ->mm
> checks are not useful, providing that TIF_FOREIGN_FPSTATE is
> maintained in a consistent way for kernel threads.
>
> This is true by construction however: TIF_FOREIGN_FPSTATE is never
> cleared except when returning to userspace or returning from a
> signal: thus, for a true kernel thread no FPSIMD context is ever
> loaded, TIF_FOREIGN_FPSTATE will remain set and no context will
> ever be saved.
>
> This patch removes the redundant checks and special-case code.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>

With Christoffer's commit text:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

>
> ---
>
> Changes since v9:
>
>  * New patch.  Introduced during debugging, since the ->mm checks
>    appear bogus and/or redundant, so are likely to be hiding or
>    causing bugs.
> ---
>  arch/arm64/include/asm/thread_info.h |  1 +
>  arch/arm64/kernel/fpsimd.c           | 38 ++++++++++++------------------------
>  2 files changed, 14 insertions(+), 25 deletions(-)
>
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index 740aa03c..a2ac914 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -47,6 +47,7 @@ struct thread_info {
>
>  #define INIT_THREAD_INFO(tsk)						\
>  {									\
> +	.flags		= _TIF_FOREIGN_FPSTATE,				\
>  	.preempt_count	= INIT_PREEMPT_COUNT,				\
>  	.addr_limit	= KERNEL_DS,					\
>  }
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 3aa100a..1222491 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -891,31 +891,21 @@ asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
>
>  void fpsimd_thread_switch(struct task_struct *next)
>  {
> +	bool wrong_task, wrong_cpu;
> +
>  	if (!system_supports_fpsimd())
>  		return;
> -	/*
> -	 * Save the current FPSIMD state to memory, but only if whatever is in
> -	 * the registers is in fact the most recent userland FPSIMD state of
> -	 * 'current'.
> -	 */
> -	if (current->mm)
> -		fpsimd_save();
>
> -	if (next->mm) {
> -		/*
> -		 * If we are switching to a task whose most recent userland
> -		 * FPSIMD state is already in the registers of *this* cpu,
> -		 * we can skip loading the state from memory. Otherwise, set
> -		 * the TIF_FOREIGN_FPSTATE flag so the state will be loaded
> -		 * upon the next return to userland.
> -		 */
> -		bool wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
> +	/* Save unsaved fpsimd state, if any: */
> +	fpsimd_save();
> +
> +	/* Fix up TIF_FOREIGN_FPSTATE to correctly describe next's state: */
> +	wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
>  					&next->thread.uw.fpsimd_state;
> -		bool wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
> +	wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();
>
> -		update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
> -				       wrong_task || wrong_cpu);
> -	}
> +	update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
> +			       wrong_task || wrong_cpu);
>  }
>
>  void fpsimd_flush_thread(void)
> @@ -1120,9 +1110,8 @@ void kernel_neon_begin(void)
>
>  	__this_cpu_write(kernel_neon_busy, true);
>
> -	/* Save unsaved task fpsimd state, if any: */
> -	if (current->mm)
> -		fpsimd_save();
> +	/* Save unsaved fpsimd state, if any: */
> +	fpsimd_save();
>
>  	/* Invalidate any task state remaining in the fpsimd regs: */
>  	fpsimd_flush_cpu_state();
> @@ -1244,8 +1233,7 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
>  {
>  	switch (cmd) {
>  	case CPU_PM_ENTER:
> -		if (current->mm)
> -			fpsimd_save();
> +		fpsimd_save();
>  		fpsimd_flush_cpu_state();
>  		break;
>  	case CPU_PM_EXIT:


--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 09/18] KVM: arm64: Repurpose vcpu_arch.debug_flags for general-purpose flags
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-24  9:21     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24  9:21 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> In struct vcpu_arch, the debug_flags field is used to store
> debug-related flags about the vcpu state.
>
> Since we are about to add some more flags related to FPSIMD and
> SVE, it makes sense to add them to the existing flags field rather
> than adding new fields.  Since there is only one debug_flags flag
> defined so far, there is plenty of free space for expansion.
>
> In preparation for adding more flags, this patch renames the
> debug_flags field to simply "flags", and updates comments
> appropriately.
>
> The flag definitions are also moved to <asm/kvm_host.h>, since
> their presence in <asm/kvm_asm.h> was for purely historical
> reasons:  these definitions are not used from asm any more, and not
> very likely to be as more Hyp asm is migrated to C.
>
> KVM_ARM64_DEBUG_DIRTY_SHIFT has not been used since commit
> 1ea66d27e7b0 ("arm64: KVM: Move away from the assembly version of
> the world switch"), so this patch gets rid of that too.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Acked-by: Christoffer Dall <christoffer.dall@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  arch/arm64/include/asm/kvm_asm.h  | 3 ---
>  arch/arm64/include/asm/kvm_host.h | 7 +++++--
>  arch/arm64/kvm/debug.c            | 8 ++++----
>  arch/arm64/kvm/hyp/debug-sr.c     | 6 +++---
>  arch/arm64/kvm/hyp/sysreg-sr.c    | 4 ++--
>  arch/arm64/kvm/sys_regs.c         | 9 ++++-----
>  6 files changed, 18 insertions(+), 19 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index f6648a3..f62ccbf 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -30,9 +30,6 @@
>  /* The hyp-stub will return this for any kvm_call_hyp() call */
>  #define ARM_EXCEPTION_HYP_GONE	  HVC_STUB_ERR
>
> -#define KVM_ARM64_DEBUG_DIRTY_SHIFT	0
> -#define KVM_ARM64_DEBUG_DIRTY		(1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
> -
>  /* Translate a kernel address of @sym into its equivalent linear mapping */
>  #define kvm_ksym_ref(sym)						\
>  	({								\
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 469de8a..146c167 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -216,8 +216,8 @@ struct kvm_vcpu_arch {
>  	/* Exception Information */
>  	struct kvm_vcpu_fault_info fault;
>
> -	/* Guest debug state */
> -	u64 debug_flags;
> +	/* Miscellaneous vcpu state flags */
> +	u64 flags;
>
>  	/*
>  	 * We maintain more than a single set of debug registers to support
> @@ -293,6 +293,9 @@ struct kvm_vcpu_arch {
>  	bool sysregs_loaded_on_cpu;
>  };
>
> +/* vcpu_arch flags field values: */
> +#define KVM_ARM64_DEBUG_DIRTY		(1 << 0)
> +
>  #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
>
>  /*
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index a1f4ebd..00d4223 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -103,7 +103,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>   *
>   * Additionally, KVM only traps guest accesses to the debug registers if
>   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
> - * flag on vcpu->arch.debug_flags).  Since the guest must not interfere
> + * flag on vcpu->arch.flags).  Since the guest must not interfere
>   * with the hardware state when debugging the guest, we must ensure that
>   * trapping is enabled whenever we are debugging the guest using the
>   * debug registers.
> @@ -111,7 +111,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>  {
> -	bool trap_debug = !(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY);
> +	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
>  	unsigned long mdscr;
>
>  	trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
> @@ -184,7 +184,7 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>  			vcpu_write_sys_reg(vcpu, mdscr, MDSCR_EL1);
>
>  			vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
> -			vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
> +			vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>  			trap_debug = true;
>
>  			trace_kvm_arm_set_regset("BKPTS", get_num_brps(),
> @@ -206,7 +206,7 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>
>  	/* If KDE or MDE are set, perform a full save/restore cycle. */
>  	if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE))
> -		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
> +		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>
>  	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
>  	trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_read_sys_reg(vcpu, MDSCR_EL1));
> diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> index 3e717f6..5000976 100644
> --- a/arch/arm64/kvm/hyp/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/debug-sr.c
> @@ -163,7 +163,7 @@ void __hyp_text __debug_switch_to_guest(struct kvm_vcpu *vcpu)
>  	if (!has_vhe())
>  		__debug_save_spe_nvhe(&vcpu->arch.host_debug_state.pmscr_el1);
>
> -	if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
> +	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
>  		return;
>
>  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> @@ -185,7 +185,7 @@ void __hyp_text __debug_switch_to_host(struct kvm_vcpu *vcpu)
>  	if (!has_vhe())
>  		__debug_restore_spe_nvhe(vcpu->arch.host_debug_state.pmscr_el1);
>
> -	if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
> +	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
>  		return;
>
>  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> @@ -196,7 +196,7 @@ void __hyp_text __debug_switch_to_host(struct kvm_vcpu *vcpu)
>  	__debug_save_state(vcpu, guest_dbg, guest_ctxt);
>  	__debug_restore_state(vcpu, host_dbg, host_ctxt);
>
> -	vcpu->arch.debug_flags &= ~KVM_ARM64_DEBUG_DIRTY;
> +	vcpu->arch.flags &= ~KVM_ARM64_DEBUG_DIRTY;
>  }
>
>  u32 __hyp_text __kvm_get_mdcr_el2(void)
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index b3894df..35bc168 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -196,7 +196,7 @@ void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
>  	sysreg[DACR32_EL2] = read_sysreg(dacr32_el2);
>  	sysreg[IFSR32_EL2] = read_sysreg(ifsr32_el2);
>
> -	if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
> +	if (has_vhe() || vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY)
>  		sysreg[DBGVCR32_EL2] = read_sysreg(dbgvcr32_el2);
>  }
>
> @@ -218,7 +218,7 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
>  	write_sysreg(sysreg[DACR32_EL2], dacr32_el2);
>  	write_sysreg(sysreg[IFSR32_EL2], ifsr32_el2);
>
> -	if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
> +	if (has_vhe() || vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY)
>  		write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2);
>  }
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 6e3b969..a436373 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -31,7 +31,6 @@
>  #include <asm/debug-monitors.h>
>  #include <asm/esr.h>
>  #include <asm/kvm_arm.h>
> -#include <asm/kvm_asm.h>
>  #include <asm/kvm_coproc.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_host.h>
> @@ -338,7 +337,7 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
>  {
>  	if (p->is_write) {
>  		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
> -		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
> +		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>  	} else {
>  		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
>  	}
> @@ -369,7 +368,7 @@ static void reg_to_dbg(struct kvm_vcpu *vcpu,
>  	}
>
>  	*dbg_reg = val;
> -	vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
> +	vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>  }
>
>  static void dbg_to_reg(struct kvm_vcpu *vcpu,
> @@ -1441,7 +1440,7 @@ static bool trap_debug32(struct kvm_vcpu *vcpu,
>  {
>  	if (p->is_write) {
>  		vcpu_cp14(vcpu, r->reg) = p->regval;
> -		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
> +		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>  	} else {
>  		p->regval = vcpu_cp14(vcpu, r->reg);
>  	}
> @@ -1473,7 +1472,7 @@ static bool trap_xvr(struct kvm_vcpu *vcpu,
>  		val |= p->regval << 32;
>  		*dbg_reg = val;
>
> -		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
> +		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>  	} else {
>  		p->regval = *dbg_reg >> 32;
>  	}


--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 09/18] KVM: arm64: Repurpose vcpu_arch.debug_flags for general-purpose flags
@ 2018-05-24  9:21     ` Alex Bennée
  0 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24  9:21 UTC (permalink / raw)
  To: linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> In struct vcpu_arch, the debug_flags field is used to store
> debug-related flags about the vcpu state.
>
> Since we are about to add some more flags related to FPSIMD and
> SVE, it makes sense to add them to the existing flags field rather
> than adding new fields.  Since there is only one debug_flags flag
> defined so far, there is plenty of free space for expansion.
>
> In preparation for adding more flags, this patch renames the
> debug_flags field to simply "flags", and updates comments
> appropriately.
>
> The flag definitions are also moved to <asm/kvm_host.h>, since
> their presence in <asm/kvm_asm.h> was for purely historical
> reasons:  these definitions are not used from asm any more, and not
> very likely to be as more Hyp asm is migrated to C.
>
> KVM_ARM64_DEBUG_DIRTY_SHIFT has not been used since commit
> 1ea66d27e7b0 ("arm64: KVM: Move away from the assembly version of
> the world switch"), so this patch gets rid of that too.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Acked-by: Christoffer Dall <christoffer.dall@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  arch/arm64/include/asm/kvm_asm.h  | 3 ---
>  arch/arm64/include/asm/kvm_host.h | 7 +++++--
>  arch/arm64/kvm/debug.c            | 8 ++++----
>  arch/arm64/kvm/hyp/debug-sr.c     | 6 +++---
>  arch/arm64/kvm/hyp/sysreg-sr.c    | 4 ++--
>  arch/arm64/kvm/sys_regs.c         | 9 ++++-----
>  6 files changed, 18 insertions(+), 19 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index f6648a3..f62ccbf 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -30,9 +30,6 @@
>  /* The hyp-stub will return this for any kvm_call_hyp() call */
>  #define ARM_EXCEPTION_HYP_GONE	  HVC_STUB_ERR
>
> -#define KVM_ARM64_DEBUG_DIRTY_SHIFT	0
> -#define KVM_ARM64_DEBUG_DIRTY		(1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
> -
>  /* Translate a kernel address of @sym into its equivalent linear mapping */
>  #define kvm_ksym_ref(sym)						\
>  	({								\
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 469de8a..146c167 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -216,8 +216,8 @@ struct kvm_vcpu_arch {
>  	/* Exception Information */
>  	struct kvm_vcpu_fault_info fault;
>
> -	/* Guest debug state */
> -	u64 debug_flags;
> +	/* Miscellaneous vcpu state flags */
> +	u64 flags;
>
>  	/*
>  	 * We maintain more than a single set of debug registers to support
> @@ -293,6 +293,9 @@ struct kvm_vcpu_arch {
>  	bool sysregs_loaded_on_cpu;
>  };
>
> +/* vcpu_arch flags field values: */
> +#define KVM_ARM64_DEBUG_DIRTY		(1 << 0)
> +
>  #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
>
>  /*
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index a1f4ebd..00d4223 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -103,7 +103,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>   *
>   * Additionally, KVM only traps guest accesses to the debug registers if
>   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
> - * flag on vcpu->arch.debug_flags).  Since the guest must not interfere
> + * flag on vcpu->arch.flags).  Since the guest must not interfere
>   * with the hardware state when debugging the guest, we must ensure that
>   * trapping is enabled whenever we are debugging the guest using the
>   * debug registers.
> @@ -111,7 +111,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>  {
> -	bool trap_debug = !(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY);
> +	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
>  	unsigned long mdscr;
>
>  	trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
> @@ -184,7 +184,7 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>  			vcpu_write_sys_reg(vcpu, mdscr, MDSCR_EL1);
>
>  			vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
> -			vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
> +			vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>  			trap_debug = true;
>
>  			trace_kvm_arm_set_regset("BKPTS", get_num_brps(),
> @@ -206,7 +206,7 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>
>  	/* If KDE or MDE are set, perform a full save/restore cycle. */
>  	if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE))
> -		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
> +		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>
>  	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
>  	trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_read_sys_reg(vcpu, MDSCR_EL1));
> diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> index 3e717f6..5000976 100644
> --- a/arch/arm64/kvm/hyp/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/debug-sr.c
> @@ -163,7 +163,7 @@ void __hyp_text __debug_switch_to_guest(struct kvm_vcpu *vcpu)
>  	if (!has_vhe())
>  		__debug_save_spe_nvhe(&vcpu->arch.host_debug_state.pmscr_el1);
>
> -	if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
> +	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
>  		return;
>
>  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> @@ -185,7 +185,7 @@ void __hyp_text __debug_switch_to_host(struct kvm_vcpu *vcpu)
>  	if (!has_vhe())
>  		__debug_restore_spe_nvhe(vcpu->arch.host_debug_state.pmscr_el1);
>
> -	if (!(vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY))
> +	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
>  		return;
>
>  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> @@ -196,7 +196,7 @@ void __hyp_text __debug_switch_to_host(struct kvm_vcpu *vcpu)
>  	__debug_save_state(vcpu, guest_dbg, guest_ctxt);
>  	__debug_restore_state(vcpu, host_dbg, host_ctxt);
>
> -	vcpu->arch.debug_flags &= ~KVM_ARM64_DEBUG_DIRTY;
> +	vcpu->arch.flags &= ~KVM_ARM64_DEBUG_DIRTY;
>  }
>
>  u32 __hyp_text __kvm_get_mdcr_el2(void)
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index b3894df..35bc168 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -196,7 +196,7 @@ void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
>  	sysreg[DACR32_EL2] = read_sysreg(dacr32_el2);
>  	sysreg[IFSR32_EL2] = read_sysreg(ifsr32_el2);
>
> -	if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
> +	if (has_vhe() || vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY)
>  		sysreg[DBGVCR32_EL2] = read_sysreg(dbgvcr32_el2);
>  }
>
> @@ -218,7 +218,7 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
>  	write_sysreg(sysreg[DACR32_EL2], dacr32_el2);
>  	write_sysreg(sysreg[IFSR32_EL2], ifsr32_el2);
>
> -	if (has_vhe() || vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
> +	if (has_vhe() || vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY)
>  		write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2);
>  }
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 6e3b969..a436373 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -31,7 +31,6 @@
>  #include <asm/debug-monitors.h>
>  #include <asm/esr.h>
>  #include <asm/kvm_arm.h>
> -#include <asm/kvm_asm.h>
>  #include <asm/kvm_coproc.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_host.h>
> @@ -338,7 +337,7 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
>  {
>  	if (p->is_write) {
>  		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
> -		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
> +		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>  	} else {
>  		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
>  	}
> @@ -369,7 +368,7 @@ static void reg_to_dbg(struct kvm_vcpu *vcpu,
>  	}
>
>  	*dbg_reg = val;
> -	vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
> +	vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>  }
>
>  static void dbg_to_reg(struct kvm_vcpu *vcpu,
> @@ -1441,7 +1440,7 @@ static bool trap_debug32(struct kvm_vcpu *vcpu,
>  {
>  	if (p->is_write) {
>  		vcpu_cp14(vcpu, r->reg) = p->regval;
> -		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
> +		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>  	} else {
>  		p->regval = vcpu_cp14(vcpu, r->reg);
>  	}
> @@ -1473,7 +1472,7 @@ static bool trap_xvr(struct kvm_vcpu *vcpu,
>  		val |= p->regval << 32;
>  		*dbg_reg = val;
>
> -		vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
> +		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>  	} else {
>  		p->regval = *dbg_reg >> 32;
>  	}


--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 06/18] arm64: fpsimd: Generalise context saving for non-task contexts
  2018-05-24  9:03       ` Dave Martin
@ 2018-05-24  9:41         ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24  9:41 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> On Wed, May 23, 2018 at 09:15:11PM +0100, Alex Bennée wrote:
>>
>> Dave Martin <Dave.Martin@arm.com> writes:
>>
>> > In preparation for allowing non-task (i.e., KVM vcpu) FPSIMD
>> > contexts to be handled by the fpsimd common code, this patch adapts
>> > task_fpsimd_save() to save back the currently loaded context,
>> > removing the explicit dependency on current.
>> >
>> > The relevant storage to write back to in memory is now found by
>> > examining the fpsimd_last_state percpu struct.
>> >
>> > fpsimd_save() does nothing unless TIF_FOREIGN_FPSTATE is clear, and
>> > fpsimd_last_state is updated under local_bh_disable() or
>> > local_irq_disable() everywhere that TIF_FOREIGN_FPSTATE is cleared:
>> > thus, fpsimd_save() will write back to the correct storage for the
>> > loaded context.
>> >
>> > No functional change.
>> >
>> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>> > Acked-by: Marc Zyngier <marc.zyngier@arm.com>
>> > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
>> > ---
>> >  arch/arm64/kernel/fpsimd.c | 25 +++++++++++++------------
>> >  1 file changed, 13 insertions(+), 12 deletions(-)
>> >
>> > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
>> > index 9d85373..3aa100a 100644
>> > --- a/arch/arm64/kernel/fpsimd.c
>> > +++ b/arch/arm64/kernel/fpsimd.c
>> > @@ -270,13 +270,15 @@ static void task_fpsimd_load(void)
>> >  }
>> >
>> >  /*
>> > - * Ensure current's FPSIMD/SVE storage in thread_struct is up to date
>> > - * with respect to the CPU registers.
>> > + * Ensure FPSIMD/SVE storage in memory for the loaded context is up to
>> > + * date with respect to the CPU registers.
>> >   *
>> >   * Softirqs (and preemption) must be disabled.
>> >   */
>> > -static void task_fpsimd_save(void)
>> > +static void fpsimd_save(void)
>> >  {
>> > +	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
>> > +
>>
>> I thought I was missing something but the only write I saw of this was:
>>
>>   __this_cpu_write(fpsimd_last_state.st, NULL);
>>
>> which implied to me it is possible to have an invalid de-reference. I
>> did figure it out eventually as fpsimd_bind_state_to_cpu uses a more
>> indirect this_cpu_ptr idiom for tweaking this. I guess a reference to
>> fpsimd_bind_[task|state]_to_cpu in the comment would have helped my
>> confusion.
>
> How about:
>
>  static void fpsimd_save(void)
>  {
>  	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
> +	/* set by fpsimd_bind_to_cpu() */

Great, thanks ;-)
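
For anyone else who trips over the same question: a simplified sketch of the
binding side that writes fpsimd_last_state.st (SVE fields and the KVM variant
fpsimd_bind_state_to_cpu() omitted; this illustrates the idiom rather than
reproducing the exact fpsimd.c code):

struct fpsimd_last_state_struct {
	struct user_fpsimd_state *st;
};

static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);

/* Called with softirqs disabled, after current's regs have been loaded. */
static void fpsimd_bind_task_to_cpu(void)
{
	struct fpsimd_last_state_struct *last =
		this_cpu_ptr(&fpsimd_last_state);

	last->st = &current->thread.uw.fpsimd_state;
	current->thread.fpsimd_cpu = smp_processor_id();
}

/* fpsimd_save() then writes back through whatever pointer is bound: */
static void fpsimd_save(void)
{
	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);

	/* st was set by one of the fpsimd_bind_*_to_cpu() helpers. */
	if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
		fpsimd_save_state(st);
}

Whenever TIF_FOREIGN_FPSTATE is clear, some earlier bind call must have run
on this CPU, so st cannot be NULL at the point of the dereference.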

--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 06/18] arm64: fpsimd: Generalise context saving for non-task contexts
@ 2018-05-24  9:41         ` Alex Bennée
  0 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24  9:41 UTC (permalink / raw)
  To: linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> On Wed, May 23, 2018 at 09:15:11PM +0100, Alex Bennée wrote:
>>
>> Dave Martin <Dave.Martin@arm.com> writes:
>>
>> > In preparation for allowing non-task (i.e., KVM vcpu) FPSIMD
>> > contexts to be handled by the fpsimd common code, this patch adapts
>> > task_fpsimd_save() to save back the currently loaded context,
>> > removing the explicit dependency on current.
>> >
>> > The relevant storage to write back to in memory is now found by
>> > examining the fpsimd_last_state percpu struct.
>> >
>> > fpsimd_save() does nothing unless TIF_FOREIGN_FPSTATE is clear, and
>> > fpsimd_last_state is updated under local_bh_disable() or
>> > local_irq_disable() everywhere that TIF_FOREIGN_FPSTATE is cleared:
>> > thus, fpsimd_save() will write back to the correct storage for the
>> > loaded context.
>> >
>> > No functional change.
>> >
>> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>> > Acked-by: Marc Zyngier <marc.zyngier@arm.com>
>> > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
>> > ---
>> >  arch/arm64/kernel/fpsimd.c | 25 +++++++++++++------------
>> >  1 file changed, 13 insertions(+), 12 deletions(-)
>> >
>> > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
>> > index 9d85373..3aa100a 100644
>> > --- a/arch/arm64/kernel/fpsimd.c
>> > +++ b/arch/arm64/kernel/fpsimd.c
>> > @@ -270,13 +270,15 @@ static void task_fpsimd_load(void)
>> >  }
>> >
>> >  /*
>> > - * Ensure current's FPSIMD/SVE storage in thread_struct is up to date
>> > - * with respect to the CPU registers.
>> > + * Ensure FPSIMD/SVE storage in memory for the loaded context is up to
>> > + * date with respect to the CPU registers.
>> >   *
>> >   * Softirqs (and preemption) must be disabled.
>> >   */
>> > -static void task_fpsimd_save(void)
>> > +static void fpsimd_save(void)
>> >  {
>> > +	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
>> > +
>>
>> I thought I was missing something but the only write I saw of this was:
>>
>>   __this_cpu_write(fpsimd_last_state.st, NULL);
>>
>> which implied to me it is possible to have an invalid de-reference. I
>> did figure it out eventually as fpsimd_bind_state_to_cpu uses a more
>> indirect this_cpu_ptr idiom for tweaking this. I guess a reference to
>> fpsimd_bind_[task|state]_to_cpu in the comment would have helped my
>> confusion.
>
> How about:
>
>  static void fpsimd_save(void)
>  {
>  	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
> +	/* set by fpsimd_bind_to_cpu() */

Great, thanks ;-)

--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-24  8:33             ` Christoffer Dall
@ 2018-05-24  9:50               ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-24  9:50 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Thu, May 24, 2018 at 10:33:50AM +0200, Christoffer Dall wrote:
> On Wed, May 23, 2018 at 04:03:37PM +0100, Dave Martin wrote:
> > On Wed, May 23, 2018 at 03:56:57PM +0100, Catalin Marinas wrote:
> > > On Wed, May 23, 2018 at 02:31:59PM +0100, Dave P Martin wrote:
> > > > On Wed, May 23, 2018 at 01:48:12PM +0200, Christoffer Dall wrote:
> > > > > On Tue, May 22, 2018 at 05:05:08PM +0100, Dave Martin wrote:
> > > > > > This is true by construction however: TIF_FOREIGN_FPSTATE is never
> > > > > > cleared except when returning to userspace or returning from a
> > > > > > signal: thus, for a true kernel thread no FPSIMD context is ever
> > > > > > loaded, TIF_FOREIGN_FPSTATE will remain set and no context will
> > > > > > ever be saved.
> > > > > 
> > > > > I don't understand this construction proof; from looking at the patch
> > > > > below it is not obvious to me why fpsimd_thread_switch() can never have
> > > > > !wrong_task && !wrong_cpu and therefore clear TIF_FOREIGN_FPSTATE for a
> > > > > kernel thread?
> > > > 
> > > > Looking at this again, I think it is poorly worded.  This patch aims to
> > > > make it true by construction, but it isn't prior to the patch.
> > > > 
> > > > I'm tempted to delete the paragraph: the assertion of both untrue and
> > > > not the best way to justify that this patch works.
> > > > 
> > > > 
> > > > How about:
> > > > 
> > > > -8<-
> > > > 
> > > > The context switch logic already isolates user threads from each other.
> > > > This, it is sufficient for isolating user threads from the kernel,
> 
> s/This/Thus/ ?
> 
> I don't understand what 'it' refers to here?
> 
> > > > since the goal either way is to ensure that code executing in userspace
> > > > cannot see any FPSIMD state except its own.  Thus, there is no special
> > > > property of kernel threads that we care about except that it is
> > > > pointless to save or load FPSIMD register state for them.
> 
> Actually, I'm not really sure what this paragraph is getting at.

Reading this again, I don't think the paragraph adds much useful.

So I propose deleting that too.

> > > > 
> > > > At worst, the removal of all the kernel thread special cases by this
> > > > patch would thus spuriously load and save state for kernel threads when
> > > > unnecessary.
> > > > 
> > > > But the context switch logic is already deliberately optimised to defer
> > > > reloads of the regs until ret_to_user (or sigreturn as a special case),
> > > > which kernel threads by definition never reach.
> > > > 
> > > > ->8-
> > > 
> > > The "at worst" paragraph makes it look like it could happen (at least
> > > until you reach the last paragraph). Maybe you can just say that
> > > wrong_task and wrong_cpu (with the fpsimd_cpu = NR_CPUS addition) are
> > > always true for kernel threads. You should probably mention this in a
> > > comment in the code as well.
> > 
> > What if I just delete the second paragraph, and remove the "But" from
> > the start of the third, and append:
> > 
> > "As a result, the wrong_task and wrong_cpu tests in
> > fpsimd_thread_switch() will always yield false for kernel threads."
> > 
> > ...with a similar comment in the code?
> 
> ...with a risk of being a bit over-pedantic and annoying, may I suggest
> the following complete commit text:
> 
> ------8<------
> Currently the FPSIMD handling code uses the condition task->mm ==
> NULL as a hint that task has no FPSIMD register context.
> 
> The ->mm check is only there to filter out tasks that cannot
> possibly have FPSIMD context loaded, for optimisation purposes.
> However, TIF_FOREIGN_FPSTATE must always be checked anyway before
> saving FPSIMD context back to memory.  For this reason, the ->mm
> checks are not useful, providing that TIF_FOREIGN_FPSTATE is
> maintained properly for kernel threads.
> 
> FPSIMD context is never preserved for kernel threads across a context
> switch and therefore TIF_FOREIGN_FPSTATE should always be true for

(This refactoring opens up the interesting possibility of making
kernel-mode NEON in task context preemptible for kernel threads so
that we actually do preserve state... but that's a discussion for
another day.  There may be code around that relies on
kernel_neon_begin() disabling preemption for real.)

> kernel threads.  This is indeed the case, as the wrong_task and

This suggests that TIF_FOREIGN_FPSTATE is always true for kernel
threads today.  This is not quite true, because use_mm() can make mm non-
NULL.

> wrong_cpu tests in fpsimd_thread_switch() will always yield false for
> kernel threads.

("false" -> "true".  My bad.)

> Further, the context switch logic is already deliberately optimised to
> defer reloads of the FPSIMD context until ret_to_user (or sigreturn as a
> special case), which kernel threads by definition never reach, and
> therefore this change introduces no additional work in the critical
> path.
> 
> This patch removes the redundant checks and special-case code.
> ------8<------

Looking at my existing text, I rather reworded it like this.
Does this work any better for you?

--8<--

Currently the FPSIMD handling code uses the condition task->mm ==
NULL as a hint that task has no FPSIMD register context.

The ->mm check is only there to filter out tasks that cannot
possibly have FPSIMD context loaded, for optimisation purposes.
Also, TIF_FOREIGN_FPSTATE must always be checked anyway before
saving FPSIMD context back to memory.  For these reasons, the ->mm
checks are not useful, providing that TIF_FOREIGN_FPSTATE is
maintained in a consistent way for kernel threads.

The context switch logic is already deliberately optimised to defer
reloads of the regs until ret_to_user (or sigreturn as a special
case), and save them only if they have been previously loaded.
Kernel threads by definition never reach these paths.  As a result,
the wrong_task and wrong_cpu tests in fpsimd_thread_switch() will
always yield true for kernel threads.

This patch removes the redundant checks and special-case code,
ensuring that TIF_FOREIGN_FPSTATE is set whenever a kernel thread
is scheduled in, and ensures that this flag is set for the init
task.  The fpsimd_flush_task_state() call already present in
copy_thread() ensures the same for any new task.

With TIF_FOREIGN_FPSTATE always set for kernel threads, this patch
ensures that no extra context save work is added for kernel
threads, and eliminates the redundant context saving that may
currently occur for kernel threads that have acquired an mm via
use_mm().

-->8--
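
To make the "never reach these paths" argument concrete, here is a condensed
sketch of the path where TIF_FOREIGN_FPSTATE is normally cleared and the
per-CPU binding established, i.e. the return-to-user work (sigreturn does the
equivalent); this is simplified from the fpsimd.c/signal.c logic, not the
literal code:

void fpsimd_restore_current_state(void)
{
	if (!system_supports_fpsimd())
		return;

	local_bh_disable();

	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
		task_fpsimd_load();		/* load current's regs...     */
		fpsimd_bind_task_to_cpu();	/* ...and record the binding. */
	}

	local_bh_enable();
}

/* Reached only on the way back to userspace, from do_notify_resume(): */
	if (thread_flags & _TIF_FOREIGN_FPSTATE)
		fpsimd_restore_current_state();

Since a kernel thread never returns to userspace, it never runs this, never
gets bound to the CPU, and so wrong_task/wrong_cpu stay true for it on every
context switch.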

Cheers
---Dave

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
@ 2018-05-24  9:50               ` Dave Martin
  0 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-24  9:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 24, 2018 at 10:33:50AM +0200, Christoffer Dall wrote:
> On Wed, May 23, 2018 at 04:03:37PM +0100, Dave Martin wrote:
> > On Wed, May 23, 2018 at 03:56:57PM +0100, Catalin Marinas wrote:
> > > On Wed, May 23, 2018 at 02:31:59PM +0100, Dave P Martin wrote:
> > > > On Wed, May 23, 2018 at 01:48:12PM +0200, Christoffer Dall wrote:
> > > > > On Tue, May 22, 2018 at 05:05:08PM +0100, Dave Martin wrote:
> > > > > > This is true by construction however: TIF_FOREIGN_FPSTATE is never
> > > > > > cleared except when returning to userspace or returning from a
> > > > > > signal: thus, for a true kernel thread no FPSIMD context is ever
> > > > > > loaded, TIF_FOREIGN_FPSTATE will remain set and no context will
> > > > > > ever be saved.
> > > > > 
> > > > > I don't understand this construction proof; from looking at the patch
> > > > > below it is not obvious to me why fpsimd_thread_switch() can never have
> > > > > !wrong_task && !wrong_cpu and therefore clear TIF_FOREIGN_FPSTATE for a
> > > > > kernel thread?
> > > > 
> > > > Looking at this again, I think it is poorly worded.  This patch aims to
> > > > make it true by construction, but it isn't prior to the patch.
> > > > 
> > > > I'm tempted to delete the paragraph: the assertion of both untrue and
> > > > not the best way to justify that this patch works.
> > > > 
> > > > 
> > > > How about:
> > > > 
> > > > -8<-
> > > > 
> > > > The context switch logic already isolates user threads from each other.
> > > > This, it is sufficient for isolating user threads from the kernel,
> 
> s/This/Thus/ ?
> 
> I don't understand what 'it' refers to here?
> 
> > > > since the goal either way is to ensure that code executing in userspace
> > > > cannot see any FPSIMD state except its own.  Thus, there is no special
> > > > property of kernel threads that we care about except that it is
> > > > pointless to save or load FPSIMD register state for them.
> 
> Actually, I'm not really sure what this paragraph is getting at.

Reading this again, I don't think the paragraph adds much useful.

So I propose deleting that too.

> > > > 
> > > > At worst, the removal of all the kernel thread special cases by this
> > > > patch would thus spuriously load and save state for kernel threads when
> > > > unnecessary.
> > > > 
> > > > But the context switch logic is already deliberately optimised to defer
> > > > reloads of the regs until ret_to_user (or sigreturn as a special case),
> > > > which kernel threads by definition never reach.
> > > > 
> > > > ->8-
> > > 
> > > The "at worst" paragraph makes it look like it could happen (at least
> > > until you reach the last paragraph). Maybe you can just say that
> > > wrong_task and wrong_cpu (with the fpsimd_cpu = NR_CPUS addition) are
> > > always true for kernel threads. You should probably mention this in a
> > > comment in the code as well.
> > 
> > What if I just delete the second paragraph, and remove the "But" from
> > the start of the third, and append:
> > 
> > "As a result, the wrong_task and wrong_cpu tests in
> > fpsimd_thread_switch() will always yield false for kernel threads."
> > 
> > ...with a similar comment in the code?
> 
> ...with a risk of being a bit over-pedantic and annoying, may I suggest
> the following complete commit text:
> 
> ------8<------
> Currently the FPSIMD handling code uses the condition task->mm ==
> NULL as a hint that task has no FPSIMD register context.
> 
> The ->mm check is only there to filter out tasks that cannot
> possibly have FPSIMD context loaded, for optimisation purposes.
> However, TIF_FOREIGN_FPSTATE must always be checked anyway before
> saving FPSIMD context back to memory.  For this reason, the ->mm
> checks are not useful, providing that TIF_FOREIGN_FPSTATE is
> maintained properly for kernel threads.
> 
> FPSIMD context is never preserved for kernel threads across a context
> switch and therefore TIF_FOREIGN_FPSTATE should always be true for

(This refactoring opens up the interesting possibility of making
kernel-mode NEON in task context preemptible for kernel threads so
that we actually do preserve state... but that's a discussion for
another day.  There may be code around that relies on
kernel_neon_begin() disabling preemption for real.)

> kernel threads.  This is indeed the case, as the wrong_task and

This suggests that TIF_FOREIGN_FPSTATE is always true for kernel
threads today.  This is not quite true, because use_mm() can make mm non-
NULL.

> wrong_cpu tests in fpsimd_thread_switch() will always yield false for
> kernel threads.

("false" -> "true".  My bad.)

> Further, the context switch logic is already deliberately optimised to
> defer reloads of the FPSIMD context until ret_to_user (or sigreturn as a
> special case), which kernel threads by definition never reach, and
> therefore this change introduces no additional work in the critical
> path.
> 
> This patch removes the redundant checks and special-case code.
> ------8<------

Looking at my existing text, I rather reworded it like this.
Does this work any better for you?

--8<--

Currently the FPSIMD handling code uses the condition task->mm ==
NULL as a hint that task has no FPSIMD register context.

The ->mm check is only there to filter out tasks that cannot
possibly have FPSIMD context loaded, for optimisation purposes.
Also, TIF_FOREIGN_FPSTATE must always be checked anyway before
saving FPSIMD context back to memory.  For these reasons, the ->mm
checks are not useful, providing that TIF_FOREIGN_FPSTATE is
maintained in a consistent way for kernel threads.

The context switch logic is already deliberately optimised to defer
reloads of the regs until ret_to_user (or sigreturn as a special
case), and save them only if they have been previously loaded.
Kernel threads by definition never reach these paths.  As a result,
the wrong_task and wrong_cpu tests in fpsimd_thread_switch() will
always yield true for kernel threads.

This patch removes the redundant checks and special-case code,                  ensuring that TIF_FOREIGN_FPSTATE is set whenever a kernel thread               is scheduled in, and ensures that this flag is set for the init
task.  The fpsimd_flush_task_state() call already present in                    copy_thread() ensures the same for any new task.

With TIF_FOREIGN_FPSTATE always set for kernel threads, this patch
ensures that no extra context save work is added for kernel
threads, and eliminates the redundant context saving that may
currently occur for kernel threads that have acquired an mm via
use_mm().

-->8--

Cheers
---Dave

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 04/18] KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change
  2018-05-24  9:18           ` Alex Bennée
@ 2018-05-24 10:04             ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-24 10:04 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, Christoffer Dall, Christoffer Dall, kvmarm,
	linux-arm-kernel

On Thu, May 24, 2018 at 10:18:39AM +0100, Alex Bennée wrote:
> 
> Christoffer Dall <christoffer.dall@arm.com> writes:
> 
> > On Wed, May 23, 2018 at 03:40:26PM +0100, Dave Martin wrote:
> >> On Wed, May 23, 2018 at 03:34:20PM +0100, Alex Bennée wrote:
> >> >
> >> > Dave Martin <Dave.Martin@arm.com> writes:

[...]

> >> > > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> >> > > index cca7e06..72143cf 100644
> >> > > --- a/virt/kvm/Kconfig
> >> > > +++ b/virt/kvm/Kconfig
> >> > > @@ -54,3 +54,6 @@ config HAVE_KVM_IRQ_BYPASS
> >> > >
> >> > >  config HAVE_KVM_VCPU_ASYNC_IOCTL
> >> > >         bool
> >> > > +
> >> > > +config HAVE_KVM_VCPU_RUN_PID_CHANGE
> >> > > +       bool
> >> >
> >> > This almost threw me as I thought you might be able to enable this and
> >> > break the build, but apparently not:
> >> >
> >> > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> >>
> >> Without a "help", the option seems non-interactive and cannot be true
> >> unless something selects it.  It seems a bit weird to me too, but the
> >> idiom appears widely used...
> >>
> > Indeed, I've copied this idiom from other things before and nobody has
> > complained, so I think it works (without any further deep insights into
> > the inner workings of Kconfig).
> 
> It's fine. My main worry was breaking bisection with the normal "make
> olddefconfig" approach. I tested it and found it to be fine and I don't
> think we need to worry about people adding the symbol to .config
> manually - they get to keep both pieces ;-)

I wasted a fair amount of time at some point in the past trying to work
out why I couldn't set one of these options by
echo CONFIG_FOO=y >>.config ...

That was fun ;)

Cheers
---Dave

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 04/18] KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change
@ 2018-05-24 10:04             ` Dave Martin
  0 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-24 10:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 24, 2018 at 10:18:39AM +0100, Alex Bennée wrote:
> 
> Christoffer Dall <christoffer.dall@arm.com> writes:
> 
> > On Wed, May 23, 2018 at 03:40:26PM +0100, Dave Martin wrote:
> >> On Wed, May 23, 2018 at 03:34:20PM +0100, Alex Bennée wrote:
> >> >
> >> > Dave Martin <Dave.Martin@arm.com> writes:

[...]

> >> > > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> >> > > index cca7e06..72143cf 100644
> >> > > --- a/virt/kvm/Kconfig
> >> > > +++ b/virt/kvm/Kconfig
> >> > > @@ -54,3 +54,6 @@ config HAVE_KVM_IRQ_BYPASS
> >> > >
> >> > >  config HAVE_KVM_VCPU_ASYNC_IOCTL
> >> > >         bool
> >> > > +
> >> > > +config HAVE_KVM_VCPU_RUN_PID_CHANGE
> >> > > +       bool
> >> >
> >> > This almost threw me as I thought you might be able to enable this and
> >> > break the build, but apparently not:
> >> >
> >> > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> >>
> >> Without a "help", the option seems non-interactive and cannot be true
> >> unless something selects it.  It seems a bit weird to me too, but the
> >> idiom appears widely used...
> >>
> > Indeed, I've copied this idiom from other things before and nobody has
> > complained, so I think it works (without any further deep insights into
> > the inner workings of Kconfig).
> 
> It's fine. My main worry was breaking bisection with the normal "make
> olddefconfig" approach. I tested it and found it to be fine and I don't
> think we need to worry about people adding the symbol to .config
> manually - they get to keep both pieces ;-)

I wasted a fair amount of time at some point in the past trying to work
out why I couldn't set one of these options by
echo CONFIG_FOO=y >>.config ...

That was fun ;)

Cheers
---Dave

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-24  9:50               ` Dave Martin
@ 2018-05-24 10:06                 ` Christoffer Dall
  -1 siblings, 0 replies; 138+ messages in thread
From: Christoffer Dall @ 2018-05-24 10:06 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Thu, May 24, 2018 at 10:50:56AM +0100, Dave Martin wrote:
> On Thu, May 24, 2018 at 10:33:50AM +0200, Christoffer Dall wrote:
> > On Wed, May 23, 2018 at 04:03:37PM +0100, Dave Martin wrote:
> > > On Wed, May 23, 2018 at 03:56:57PM +0100, Catalin Marinas wrote:
> > > > On Wed, May 23, 2018 at 02:31:59PM +0100, Dave P Martin wrote:
> > > > > On Wed, May 23, 2018 at 01:48:12PM +0200, Christoffer Dall wrote:
> > > > > > On Tue, May 22, 2018 at 05:05:08PM +0100, Dave Martin wrote:
> > > > > > > This is true by construction however: TIF_FOREIGN_FPSTATE is never
> > > > > > > cleared except when returning to userspace or returning from a
> > > > > > > signal: thus, for a true kernel thread no FPSIMD context is ever
> > > > > > > loaded, TIF_FOREIGN_FPSTATE will remain set and no context will
> > > > > > > ever be saved.
> > > > > > 
> > > > > > I don't understand this construction proof; from looking at the patch
> > > > > > below it is not obvious to me why fpsimd_thread_switch() can never have
> > > > > > !wrong_task && !wrong_cpu and therefore clear TIF_FOREIGN_FPSTATE for a
> > > > > > kernel thread?
> > > > > 
> > > > > Looking at this again, I think it is poorly worded.  This patch aims to
> > > > > make it true by construction, but it isn't prior to the patch.
> > > > > 
> > > > > I'm tempted to delete the paragraph: the assertion of both untrue and
> > > > > not the best way to justify that this patch works.
> > > > > 
> > > > > 
> > > > > How about:
> > > > > 
> > > > > -8<-
> > > > > 
> > > > > The context switch logic already isolates user threads from each other.
> > > > > This, it is sufficient for isolating user threads from the kernel,
> > 
> > s/This/Thus/ ?
> > 
> > I don't understand what 'it' refers to here?
> > 
> > > > > since the goal either way is to ensure that code executing in userspace
> > > > > cannot see any FPSIMD state except its own.  Thus, there is no special
> > > > > property of kernel threads that we care about except that it is
> > > > > pointless to save or load FPSIMD register state for them.
> > 
> > Actually, I'm not really sure what this paragraph is getting at.
> 
> Reading this again, I don't think the paragraph adds much useful.
> 
> So I propose deleting that too.
> 
> > > > > 
> > > > > At worst, the removal of all the kernel thread special cases by this
> > > > > patch would thus spuriously load and save state for kernel threads when
> > > > > unnecessary.
> > > > > 
> > > > > But the context switch logic is already deliberately optimised to defer
> > > > > reloads of the regs until ret_to_user (or sigreturn as a special case),
> > > > > which kernel threads by definition never reach.
> > > > > 
> > > > > ->8-
> > > > 
> > > > The "at worst" paragraph makes it look like it could happen (at least
> > > > until you reach the last paragraph). Maybe you can just say that
> > > > wrong_task and wrong_cpu (with the fpsimd_cpu = NR_CPUS addition) are
> > > > always true for kernel threads. You should probably mention this in a
> > > > comment in the code as well.
> > > 
> > > What if I just delete the second paragraph, and remove the "But" from
> > > the start of the third, and append:
> > > 
> > > "As a result, the wrong_task and wrong_cpu tests in
> > > fpsimd_thread_switch() will always yield false for kernel threads."
> > > 
> > > ...with a similar comment in the code?
> > 
> > ...with a risk of being a bit over-pedantic and annoying, may I suggest
> > the following complete commit text:
> > 
> > ------8<------
> > Currently the FPSIMD handling code uses the condition task->mm ==
> > NULL as a hint that task has no FPSIMD register context.
> > 
> > The ->mm check is only there to filter out tasks that cannot
> > possibly have FPSIMD context loaded, for optimisation purposes.
> > However, TIF_FOREIGN_FPSTATE must always be checked anyway before
> > saving FPSIMD context back to memory.  For this reason, the ->mm
> > checks are not useful, providing that TIF_FOREIGN_FPSTATE is
> > maintained properly for kernel threads.
> > 
> > FPSIMD context is never preserved for kernel threads across a context
> > switch and therefore TIF_FOREIGN_FPSTATE should always be true for
> 
> (This refactoring opens up the interesting possibility of making
> kernel-mode NEON in task context preemptible for kernel threads so
> that we actually do preserve state... but that's a discussion for
> another day.  There may be code around that relies on
> kernel_neon_begin() disabling preemption for real.)
> 
> > kernel threads.  This is indeed the case, as the wrong_task and
> 
> This suggests that TIF_FOREIGN_FPSTATE is always true for kernel
> threads today.  This is not quite true, because use_mm() can make mm non-
> NULL.
> 

I was suggesting that it's always true after this patch.

> > wrong_cpu tests in fpsimd_thread_switch() will always yield false for
> > kernel threads.
> 
> ("false" -> "true".  My bad.)
> 
> > Further, the context switch logic is already deliberately optimised to
> > defer reloads of the FPSIMD context until ret_to_user (or sigreturn as a
> > special case), which kernel threads by definition never reach, and
> > therefore this change introduces no additional work in the critical
> > path.
> > 
> > This patch removes the redundant checks and special-case code.
> > ------8<------
> 
> Looking at my existing text, I've reworded it like this instead.
> Does this work any better for you?
> 
> --8<--
> 
> Currently the FPSIMD handling code uses the condition task->mm ==
> NULL as a hint that task has no FPSIMD register context.
> 
> The ->mm check is only there to filter out tasks that cannot
> possibly have FPSIMD context loaded, for optimisation purposes.
> Also, TIF_FOREIGN_FPSTATE must always be checked anyway before
> saving FPSIMD context back to memory.  For these reasons, the ->mm
> checks are not useful, providing that TIF_FOREIGN_FPSTATE is
> maintained in a consistent way for kernel threads.

Consistent with what?  Without more context or explanation,
I'm not sure what the reader is to make of that.  Do you not mean that
TIF_FOREIGN_FPSTATE is always true for kernel threads?

> 
> The context switch logic is already deliberately optimised to defer
> reloads of the regs until ret_to_user (or sigreturn as a special
> case), and save them only if they have been previously loaded.
> Kernel threads by definition never reach these paths.  As a result,

I'm struggling with the "As a result," here.  Is this because reloads of
regs in ret_to_user (or sigreturn) are the only places that can make
wrong_cpu or wrong_task be false?

(I'm actually wanting to understand this, not just bikeshedding the
commit message, as new corner cases keep coming up on this logic.)
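(For reference, the check at the end of fpsimd_thread_switch() as of
this series ends up roughly as the sketch below -- simplified, using
the wrong_task/wrong_cpu names from the discussion, so treat it as an
illustration rather than the literal patched code:)

	void fpsimd_thread_switch(struct task_struct *next)
	{
		bool wrong_task, wrong_cpu;

		/* Save unsaved fpsimd state, if any: */
		fpsimd_save();

		/*
		 * Fix up TIF_FOREIGN_FPSTATE to describe next's state.
		 * A kernel thread's state is never loaded into the regs,
		 * so fpsimd_last_state.st can never point at it and its
		 * fpsimd_cpu stays NR_CPUS: both tests below remain true
		 * and the flag stays set.
		 */
		wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
					&next->thread.uw.fpsimd_state;
		wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();

		update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
				       wrong_task || wrong_cpu);
	}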

> the wrong_task and wrong_cpu tests in fpsimd_thread_switch() will
> always yield true for kernel threads.
> 
> This patch removes the redundant checks and special-case code,                  ensuring that TIF_FOREIGN_FPSTATE is set whenever a kernel thread               is scheduled in, and ensures that this flag is set for the init
> task.  The fpsimd_flush_task_state() call already present in                    copy_thread() ensures the same for any new task.

nit: funny formatting

nit: ensuring that TIF_FOREIGN_FPSTATE *remains* set whenever a kernel
thread is scheduled in?
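(For new tasks the guarantee comes via the fpsimd_flush_task_state()
call in copy_thread() mentioned above; at this point in the tree that
helper is essentially the one-liner sketched below.  It does not touch
the flag itself -- it forces the wrong_cpu test to be true at the
task's first switch-in, so fpsimd_thread_switch() sets
TIF_FOREIGN_FPSTATE there:)

	void fpsimd_flush_task_state(struct task_struct *t)
	{
		t->thread.fpsimd_cpu = NR_CPUS;
	}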

> 
> With TIF_FOREIGN_FPSTATE always set for kernel threads, this patch
> ensures that no extra context save work is added for kernel
> threads, and eliminates the redundant context saving that may
> currently occur for kernel threads that have acquired an mm via
> use_mm().
> 
> -->8--

If you can slightly connect the dots with the "As a result" above, I'm
fine with your version of the text.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 10/18] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-24 10:09     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24 10:09 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> This patch refactors KVM to align the host and guest FPSIMD
> save/restore logic with each other for arm64.  This reduces the
> number of redundant save/restore operations that must occur, and
> reduces the common-case IRQ blackout time during guest exit storms
> by saving the host state lazily and optimising away the need to
> restore the host state before returning to the run loop.
>
> Four hooks are defined in order to enable this:
>
>  * kvm_arch_vcpu_run_map_fp():
>    Called on PID change to map necessary bits of current to Hyp.
>
>  * kvm_arch_vcpu_load_fp():
>    Set up FP/SIMD for entering the KVM run loop (parse as
>    "vcpu_load fp").
>
>  * kvm_arch_vcpu_ctxsync_fp():
>    Get FP/SIMD into a safe state for re-enabling interrupts after a
>    guest exit back to the run loop.
>
>    For arm64 specifically, this involves updating the host kernel's
>    FPSIMD context tracking metadata so that kernel-mode NEON use
>    will cause the vcpu's FPSIMD state to be saved back correctly
>    into the vcpu struct.  This must be done before re-enabling
>    interrupts because kernel-mode NEON may be used by softirqs.
>
>  * kvm_arch_vcpu_put_fp():
>    Save guest FP/SIMD state back to memory and dissociate from the
>    CPU ("vcpu_put fp").
>
> Also, the arm64 FPSIMD context switch code is updated to enable it
> to save back FPSIMD state for a vcpu, not just current.  A few
> helpers drive this:
>
>  * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
>    mark this CPU as having context fp (which may belong to a vcpu)
>    currently loaded in its registers.  This is the non-task
>    equivalent of the static function fpsimd_bind_to_cpu() in
>    fpsimd.c.
>
>  * task_fpsimd_save():
>    exported to allow KVM to save the guest's FPSIMD state back to
>    memory on exit from the run loop.
>
>  * fpsimd_flush_state():
>    invalidate any context's FPSIMD state that is currently loaded.
>    Used to disassociate the vcpu from the CPU regs on run loop exit.
>
> These changes allow the run loop to enable interrupts (and thus
> softirqs that may use kernel-mode NEON) without having to save the
> guest's FPSIMD state eagerly.
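(To connect this with the ctxsync hook above: once
kvm_arch_vcpu_ctxsync_fp() has re-bound the cpu regs to the vcpu, a
softirq that uses kernel-mode NEON writes the guest state back before
clobbering the registers, because the relevant part of
kernel_neon_begin() after this series boils down to the sketch below
-- surrounding bookkeeping trimmed:)

	/*
	 * Save unsaved fpsimd state, if any -- this may now be a vcpu's
	 * fp_regs rather than a task's:
	 */
	fpsimd_save();

	/* Invalidate the regs and set TIF_FOREIGN_FPSTATE: */
	fpsimd_flush_cpu_state();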
>
> Some new vcpu_arch fields are added to make all this work.  Because
> host FPSIMD state can now be saved back directly into current's
> thread_struct as appropriate, host_cpu_context is no longer used
> for preserving the FPSIMD state.  However, it is still needed for
> preserving other things such as the host's system registers.  To
> avoid ABI churn, the redundant storage space in host_cpu_context is
> not removed for now.
>
> arch/arm is not addressed by this patch and continues to use its
> current save/restore logic.  It could provide implementations of
> the helpers later if desired.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
>
> ---
>
> Reviewers note: tags retained because this delta is straightforward by
> itself.  Please shout if you're not happy!
>
> Changes since v9:
>
>  * Remove redundant set_thread_flag(TIF_FOREIGN_FPSTATE) that is now
>    implicit in fpsimd_flush_cpu_state().
> ---
>  arch/arm/include/asm/kvm_host.h   |   8 +++
>  arch/arm64/include/asm/fpsimd.h   |   6 +++
>  arch/arm64/include/asm/kvm_host.h |  21 ++++++++
>  arch/arm64/kernel/fpsimd.c        |  17 ++++--
>  arch/arm64/kvm/Kconfig            |   1 +
>  arch/arm64/kvm/Makefile           |   2 +-
>  arch/arm64/kvm/fpsimd.c           | 111 ++++++++++++++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/switch.c       |  51 +++++++++---------
>  virt/kvm/arm/arm.c                |   4 ++
>  9 files changed, 191 insertions(+), 30 deletions(-)
>  create mode 100644 arch/arm64/kvm/fpsimd.c
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index c7c28c8..ac870b2 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -303,6 +303,14 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
>  int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>  			       struct kvm_device_attr *attr);
>
> +/*
> + * VFP/NEON switching is all done by the hyp switch code, so no need to
> + * coordinate with host context handling for this state:
> + */
> +static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
> +
>  /* All host FP/SIMD state is restored on guest exit, so nothing to save: */
>  static inline void kvm_fpsimd_flush_cpu_state(void) {}
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index aa7162a..3e00f70 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -41,6 +41,8 @@ struct task_struct;
>  extern void fpsimd_save_state(struct user_fpsimd_state *state);
>  extern void fpsimd_load_state(struct user_fpsimd_state *state);
>
> +extern void fpsimd_save(void);
> +
>  extern void fpsimd_thread_switch(struct task_struct *next);
>  extern void fpsimd_flush_thread(void);
>
> @@ -49,7 +51,11 @@ extern void fpsimd_preserve_current_state(void);
>  extern void fpsimd_restore_current_state(void);
>  extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
>
> +extern void fpsimd_bind_task_to_cpu(void);
> +extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state);
> +
>  extern void fpsimd_flush_task_state(struct task_struct *target);
> +extern void fpsimd_flush_cpu_state(void);
>  extern void sve_flush_cpu_state(void);
>
>  /* Maximum VL that SVE VL-agnostic software can transparently support */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 146c167..b3fe730 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -30,6 +30,7 @@
>  #include <asm/kvm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmio.h>
> +#include <asm/thread_info.h>
>
>  #define __KVM_HAVE_ARCH_INTC_INITIALIZED
>
> @@ -238,6 +239,10 @@ struct kvm_vcpu_arch {
>
>  	/* Pointer to host CPU context */
>  	kvm_cpu_context_t *host_cpu_context;
> +
> +	struct thread_info *host_thread_info;	/* hyp VA */
> +	struct user_fpsimd_state *host_fpsimd_state;	/* hyp VA */
> +
>  	struct {
>  		/* {Break,watch}point registers */
>  		struct kvm_guest_debug_arch regs;
> @@ -295,6 +300,9 @@ struct kvm_vcpu_arch {
>
>  /* vcpu_arch flags field values: */
>  #define KVM_ARM64_DEBUG_DIRTY		(1 << 0)
> +#define KVM_ARM64_FP_ENABLED		(1 << 1) /* guest FP regs loaded */
> +#define KVM_ARM64_FP_HOST		(1 << 2) /* host FP regs loaded */

I may be descending into bike-shedding territory here but it seems a
little incongruous to have _ENABLED = guest FP state when we have _HOST
for host FP state. Why not KVM_ARM64_FP_GUEST?

> +#define KVM_ARM64_HOST_SVE_IN_USE	(1 << 3) /* backup for host TIF_SVE */
>
>  #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
>
> @@ -423,6 +431,19 @@ static inline void __cpu_init_stage2(void)
>  		  "PARange is %d bits, unsupported configuration!", parange);
>  }
>
> +/* Guest/host FPSIMD coordination helpers */
> +int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
> +
> +#ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */
> +static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> +{
> +	return kvm_arch_vcpu_run_map_fp(vcpu);
> +}
> +#endif
> +
>  /*
>   * All host FP/SIMD state is restored on guest exit, so nothing needs
>   * doing here except in the SVE case:
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index ba9e7df..ded7ffd 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -265,7 +265,7 @@ static void task_fpsimd_load(void)
>   *
>   * Softirqs (and preemption) must be disabled.
>   */
> -static void fpsimd_save(void)
> +void fpsimd_save(void)
>  {
>  	struct user_fpsimd_state *st = __this_cpu_read(fpsimd_last_state.st);
>
> @@ -981,7 +981,7 @@ void fpsimd_signal_preserve_current_state(void)
>   * Associate current's FPSIMD context with this cpu
>   * Preemption must be disabled when calling this function.
>   */
> -static void fpsimd_bind_task_to_cpu(void)
> +void fpsimd_bind_task_to_cpu(void)
>  {
>  	struct fpsimd_last_state_struct *last =
>  		this_cpu_ptr(&fpsimd_last_state);
> @@ -1001,6 +1001,17 @@ static void fpsimd_bind_task_to_cpu(void)
>  	}
>  }
>
> +void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
> +{
> +	struct fpsimd_last_state_struct *last =
> +		this_cpu_ptr(&fpsimd_last_state);
> +
> +	WARN_ON(!in_softirq() && !irqs_disabled());
> +
> +	last->st = st;
> +	last->sve_in_use = false;
> +}
> +
>  /*
>   * Load the userland FPSIMD state of 'current' from memory, but only if the
>   * FPSIMD state already held in the registers is /not/ the most recent FPSIMD
> @@ -1053,7 +1064,7 @@ void fpsimd_flush_task_state(struct task_struct *t)
>  	t->thread.fpsimd_cpu = NR_CPUS;
>  }
>
> -static inline void fpsimd_flush_cpu_state(void)
> +void fpsimd_flush_cpu_state(void)
>  {
>  	__this_cpu_write(fpsimd_last_state.st, NULL);
>  	set_thread_flag(TIF_FOREIGN_FPSTATE);
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index a2e3a5a..47b23bf 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -39,6 +39,7 @@ config KVM
>  	select HAVE_KVM_IRQ_ROUTING
>  	select IRQ_BYPASS_MANAGER
>  	select HAVE_KVM_IRQ_BYPASS
> +	select HAVE_KVM_VCPU_RUN_PID_CHANGE
>  	---help---
>  	  Support hosting virtualized guest machines.
>  	  We don't support KVM with 16K page tables yet, due to the multiple
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 93afff9..0f2a135 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -19,7 +19,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
> -kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o
>
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o
> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> new file mode 100644
> index 0000000..365933a
> --- /dev/null
> +++ b/arch/arm64/kvm/fpsimd.c
> @@ -0,0 +1,111 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * arch/arm64/kvm/fpsimd.c: Guest/host FPSIMD context coordination helpers
> + *
> + * Copyright 2018 Arm Limited
> + * Author: Dave Martin <Dave.Martin@arm.com>
> + */
> +#include <linux/bottom_half.h>
> +#include <linux/sched.h>
> +#include <linux/thread_info.h>
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_asm.h>
> +#include <asm/kvm_host.h>
> +#include <asm/kvm_mmu.h>
> +
> +/*
> + * Called on entry to KVM_RUN unless this vcpu previously ran at least
> + * once and the most recent prior KVM_RUN for this vcpu was called from
> + * the same task as current (highly likely).
> + *
> + * This is guaranteed to execute before kvm_arch_vcpu_load_fp(vcpu),
> + * such that on entering hyp the relevant parts of current are already
> + * mapped.
> + */
> +int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
> +{
> +	int ret;
> +
> +	struct thread_info *ti = &current->thread_info;
> +	struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
> +
> +	/*
> +	 * Make sure the host task thread flags and fpsimd state are
> +	 * visible to hyp:
> +	 */
> +	ret = create_hyp_mappings(ti, ti + 1, PAGE_HYP);
> +	if (ret)
> +		goto error;
> +
> +	ret = create_hyp_mappings(fpsimd, fpsimd + 1, PAGE_HYP);
> +	if (ret)
> +		goto error;
> +
> +	vcpu->arch.host_thread_info = kern_hyp_va(ti);
> +	vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
> +error:
> +	return ret;
> +}
> +
> +/*
> + * Prepare vcpu for saving the host's FPSIMD state and loading the guest's.
> + * The actual loading is done by the FPSIMD access trap taken to hyp.
> + *
> + * Here, we just set the correct metadata to indicate that the FPSIMD
> + * state in the cpu regs (if any) belongs to current on the host.
> + *
> + * TIF_SVE is backed up here, since it may get clobbered with guest state.
> + * This flag is restored by kvm_arch_vcpu_put_fp(vcpu).
> + */
> +void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
> +{
> +	BUG_ON(system_supports_sve());
> +	BUG_ON(!current->mm);
> +
> +	vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED | KVM_ARM64_HOST_SVE_IN_USE);
> +	vcpu->arch.flags |= KVM_ARM64_FP_HOST;
> +	if (test_thread_flag(TIF_SVE))
> +		vcpu->arch.flags |= KVM_ARM64_HOST_SVE_IN_USE;
> +}
> +
> +/*
> + * If the guest FPSIMD state was loaded, update the host's context
> + * tracking data mark the CPU FPSIMD regs as dirty and belonging to vcpu
> + * so that they will be written back if the kernel clobbers them due to
> + * kernel-mode NEON before re-entry into the guest.
> + */
> +void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
> +{
> +	WARN_ON_ONCE(!irqs_disabled());
> +
> +	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
> +		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.gp_regs.fp_regs);
> +		clear_thread_flag(TIF_FOREIGN_FPSTATE);
> +		clear_thread_flag(TIF_SVE);
> +	}
> +}
> +
> +/*
> + * Write back the vcpu FPSIMD regs if they are dirty, and invalidate the
> + * cpu FPSIMD regs so that they can't be spuriously reused if this vcpu
> + * disappears and another task or vcpu appears that recycles the same
> + * struct fpsimd_state.
> + */
> +void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
> +{
> +	local_bh_disable();
> +
> +	update_thread_flag(TIF_SVE,
> +			   vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE);
> +
> +	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
> +		/* Clean guest FP state to memory and invalidate cpu view */
> +		fpsimd_save();
> +		fpsimd_flush_cpu_state();
> +	} else if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> +		/* Ensure user trap controls are correctly restored */
> +		fpsimd_bind_task_to_cpu();
> +	}
> +
> +	local_bh_enable();
> +}
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index c0796c4..118f300 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -23,19 +23,21 @@
>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_host.h>
>  #include <asm/kvm_hyp.h>
>  #include <asm/kvm_mmu.h>
>  #include <asm/fpsimd.h>
>  #include <asm/debug-monitors.h>
> +#include <asm/thread_info.h>
>
> -static bool __hyp_text __fpsimd_enabled_nvhe(void)
> +/* Check whether the FP regs were dirtied while in the host-side run loop: */
> +static bool __hyp_text update_fp_enabled(struct kvm_vcpu *vcpu)
>  {
> -	return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
> -}
> +	if (vcpu->arch.host_thread_info->flags & _TIF_FOREIGN_FPSTATE)
> +		vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED |
> +				      KVM_ARM64_FP_HOST);
>
> -static bool fpsimd_enabled_vhe(void)
> -{
> -	return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN);
> +	return !!(vcpu->arch.flags & KVM_ARM64_FP_ENABLED);
>  }
>
>  /* Save the 32-bit only FPSIMD system register state */
> @@ -92,7 +94,10 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
>
>  	val = read_sysreg(cpacr_el1);
>  	val |= CPACR_EL1_TTA;
> -	val &= ~(CPACR_EL1_FPEN | CPACR_EL1_ZEN);
> +	val &= ~CPACR_EL1_ZEN;
> +	if (!update_fp_enabled(vcpu))
> +		val &= ~CPACR_EL1_FPEN;
> +
>  	write_sysreg(val, cpacr_el1);
>
>  	write_sysreg(kvm_get_hyp_vector(), vbar_el1);
> @@ -105,7 +110,10 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
>  	__activate_traps_common(vcpu);
>
>  	val = CPTR_EL2_DEFAULT;
> -	val |= CPTR_EL2_TTA | CPTR_EL2_TFP | CPTR_EL2_TZ;
> +	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
> +	if (!update_fp_enabled(vcpu))
> +		val |= CPTR_EL2_TFP;
> +
>  	write_sysreg(val, cptr_el2);
>  }
>
> @@ -321,8 +329,6 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
>  void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
>  				    struct kvm_vcpu *vcpu)
>  {
> -	kvm_cpu_context_t *host_ctxt;
> -
>  	if (has_vhe())
>  		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
>  			     cpacr_el1);
> @@ -332,14 +338,19 @@ void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
>
>  	isb();
>
> -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> -	__fpsimd_save_state(&host_ctxt->gp_regs.fp_regs);
> +	if (vcpu->arch.flags & KVM_ARM64_FP_HOST) {
> +		__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
> +		vcpu->arch.flags &= ~KVM_ARM64_FP_HOST;
> +	}
> +
>  	__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
>
>  	/* Skip restoring fpexc32 for AArch64 guests */
>  	if (!(read_sysreg(hcr_el2) & HCR_RW))
>  		write_sysreg(vcpu->arch.ctxt.sys_regs[FPEXC32_EL2],
>  			     fpexc32_el2);
> +
> +	vcpu->arch.flags |= KVM_ARM64_FP_ENABLED;
>  }
>
>  /*
> @@ -418,7 +429,6 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpu_context *host_ctxt;
>  	struct kvm_cpu_context *guest_ctxt;
> -	bool fp_enabled;
>  	u64 exit_code;
>
>  	host_ctxt = vcpu->arch.host_cpu_context;
> @@ -440,19 +450,14 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>  		/* And we're baaack! */
>  	} while (fixup_guest_exit(vcpu, &exit_code));
>
> -	fp_enabled = fpsimd_enabled_vhe();
> -
>  	sysreg_save_guest_state_vhe(guest_ctxt);
>
>  	__deactivate_traps(vcpu);
>
>  	sysreg_restore_host_state_vhe(host_ctxt);
>
> -	if (fp_enabled) {
> -		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> -		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> +	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
>  		__fpsimd_save_fpexc32(vcpu);
> -	}
>
>  	__debug_switch_to_host(vcpu);
>
> @@ -464,7 +469,6 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpu_context *host_ctxt;
>  	struct kvm_cpu_context *guest_ctxt;
> -	bool fp_enabled;
>  	u64 exit_code;
>
>  	vcpu = kern_hyp_va(vcpu);
> @@ -496,8 +500,6 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>  		/* And we're baaack! */
>  	} while (fixup_guest_exit(vcpu, &exit_code));
>
> -	fp_enabled = __fpsimd_enabled_nvhe();
> -
>  	__sysreg_save_state_nvhe(guest_ctxt);
>  	__sysreg32_save_state(vcpu);
>  	__timer_disable_traps(vcpu);
> @@ -508,11 +510,8 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>
>  	__sysreg_restore_state_nvhe(host_ctxt);
>
> -	if (fp_enabled) {
> -		__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
> -		__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
> +	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
>  		__fpsimd_save_fpexc32(vcpu);
> -	}
>
>  	/*
>  	 * This must come after restoring the host sysregs, since a non-VHE
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index a4c1b76..bee226c 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -363,10 +363,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	kvm_vgic_load(vcpu);
>  	kvm_timer_vcpu_load(vcpu);
>  	kvm_vcpu_load_sysregs(vcpu);
> +	kvm_arch_vcpu_load_fp(vcpu);
>  }
>
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  {
> +	kvm_arch_vcpu_put_fp(vcpu);
>  	kvm_vcpu_put_sysregs(vcpu);
>  	kvm_timer_vcpu_put(vcpu);
>  	kvm_vgic_put(vcpu);
> @@ -778,6 +780,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		if (static_branch_unlikely(&userspace_irqchip_in_use))
>  			kvm_timer_sync_hwstate(vcpu);
>
> +		kvm_arch_vcpu_ctxsync_fp(vcpu);
> +
>  		/*
>  		 * We may have taken a host interrupt in HYP mode (ie
>  		 * while executing the guest). This interrupt is still

Minor bike-shedding aside:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 11/18] arm64/sve: Move read_zcr_features() out of cpufeature.h
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-24 10:12     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24 10:12 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> Having read_zcr_features() inline in cpufeature.h results in that
> header requiring #includes which make it hard to include
> <asm/fpsimd.h> elsewhere without triggering header inclusion
> cycles.
>
> This is not a hot-path function and arguably should not be in
> cpufeature.h in the first place, so this patch moves it to
> fpsimd.c, compiled conditionally if CONFIG_ARM64_SVE=y.
>
> This allows some SVE-related #includes to be dropped from
> cpufeature.h, which will ease future maintenance.
>
> A couple of missing #includes of <asm/fpsimd.h> are exposed by this
> change under arch/arm64/.  This patch adds the missing #includes as
> necessary.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  arch/arm64/include/asm/cpufeature.h | 29 -----------------------------
>  arch/arm64/include/asm/fpsimd.h     |  2 ++
>  arch/arm64/include/asm/processor.h  |  1 +
>  arch/arm64/kernel/fpsimd.c          | 28 ++++++++++++++++++++++++++++
>  arch/arm64/kernel/ptrace.c          |  1 +
>  5 files changed, 32 insertions(+), 29 deletions(-)
>
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 09b0f2a..0a6b713 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -11,9 +11,7 @@
>
>  #include <asm/cpucaps.h>
>  #include <asm/cputype.h>
> -#include <asm/fpsimd.h>
>  #include <asm/hwcap.h>
> -#include <asm/sigcontext.h>
>  #include <asm/sysreg.h>
>
>  /*
> @@ -510,33 +508,6 @@ static inline bool system_supports_sve(void)
>  		cpus_have_const_cap(ARM64_SVE);
>  }
>
> -/*
> - * Read the pseudo-ZCR used by cpufeatures to identify the supported SVE
> - * vector length.
> - *
> - * Use only if SVE is present.
> - * This function clobbers the SVE vector length.
> - */
> -static inline u64 read_zcr_features(void)
> -{
> -	u64 zcr;
> -	unsigned int vq_max;
> -
> -	/*
> -	 * Set the maximum possible VL, and write zeroes to all other
> -	 * bits to see if they stick.
> -	 */
> -	sve_kernel_enable(NULL);
> -	write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL1);
> -
> -	zcr = read_sysreg_s(SYS_ZCR_EL1);
> -	zcr &= ~(u64)ZCR_ELx_LEN_MASK; /* find sticky 1s outside LEN field */
> -	vq_max = sve_vq_from_vl(sve_get_vl());
> -	zcr |= vq_max - 1; /* set LEN field to maximum effective value */
> -
> -	return zcr;
> -}
> -
>  #endif /* __ASSEMBLY__ */
>
>  #endif
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 3e00f70..fb60b22 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -69,6 +69,8 @@ extern unsigned int sve_get_vl(void);
>  struct arm64_cpu_capabilities;
>  extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
>
> +extern u64 read_zcr_features(void);
> +
>  extern int __ro_after_init sve_max_vl;
>
>  #ifdef CONFIG_ARM64_SVE
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index 7675989..f902b6d 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -40,6 +40,7 @@
>
>  #include <asm/alternative.h>
>  #include <asm/cpufeature.h>
> +#include <asm/fpsimd.h>
>  #include <asm/hw_breakpoint.h>
>  #include <asm/lse.h>
>  #include <asm/pgtable-hwdef.h>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index ded7ffd..5152bbc 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -37,6 +37,7 @@
>  #include <linux/sched/task_stack.h>
>  #include <linux/signal.h>
>  #include <linux/slab.h>
> +#include <linux/stddef.h>
>  #include <linux/sysctl.h>
>
>  #include <asm/esr.h>
> @@ -754,6 +755,33 @@ void sve_kernel_enable(const struct arm64_cpu_capabilities *__always_unused p)
>  	isb();
>  }
>
> +/*
> + * Read the pseudo-ZCR used by cpufeatures to identify the supported SVE
> + * vector length.
> + *
> + * Use only if SVE is present.
> + * This function clobbers the SVE vector length.
> + */
> +u64 read_zcr_features(void)
> +{
> +	u64 zcr;
> +	unsigned int vq_max;
> +
> +	/*
> +	 * Set the maximum possible VL, and write zeroes to all other
> +	 * bits to see if they stick.
> +	 */
> +	sve_kernel_enable(NULL);
> +	write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL1);
> +
> +	zcr = read_sysreg_s(SYS_ZCR_EL1);
> +	zcr &= ~(u64)ZCR_ELx_LEN_MASK; /* find sticky 1s outside LEN field */
> +	vq_max = sve_vq_from_vl(sve_get_vl());
> +	zcr |= vq_max - 1; /* set LEN field to maximum effective value */
> +
> +	return zcr;
> +}
> +
>  void __init sve_setup(void)
>  {
>  	u64 zcr;
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index 7ff81fe..78889c4 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -44,6 +44,7 @@
>  #include <asm/compat.h>
>  #include <asm/cpufeature.h>
>  #include <asm/debug-monitors.h>
> +#include <asm/fpsimd.h>
>  #include <asm/pgtable.h>
>  #include <asm/stacktrace.h>
>  #include <asm/syscall.h>


--
Alex Bennée


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 12/18] arm64/sve: Switch sve_pffr() argument from task to thread
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-24 10:12     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24 10:12 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> sve_pffr(), which is used to derive the base address used for
> low-level SVE save/restore routines, currently takes the relevant
> task_struct as an argument.
>
> The only accessed fields are actually part of thread_struct, so
> this patch changes the argument type accordingly.  This is done in
> preparation for moving this function to a header, where we do not
> want to have to include <linux/sched.h> due to the consequent
> circular #include problems.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  arch/arm64/kernel/fpsimd.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 5152bbc..c4e9762 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -44,6 +44,7 @@
>  #include <asm/fpsimd.h>
>  #include <asm/cpufeature.h>
>  #include <asm/cputype.h>
> +#include <asm/processor.h>
>  #include <asm/simd.h>
>  #include <asm/sigcontext.h>
>  #include <asm/sysreg.h>
> @@ -167,10 +168,9 @@ static size_t sve_ffr_offset(int vl)
>  	return SVE_SIG_FFR_OFFSET(sve_vq_from_vl(vl)) - SVE_SIG_REGS_OFFSET;
>  }
>
> -static void *sve_pffr(struct task_struct *task)
> +static void *sve_pffr(struct thread_struct *thread)
>  {
> -	return (char *)task->thread.sve_state +
> -		sve_ffr_offset(task->thread.sve_vl);
> +	return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
>  }
>
>  static void change_cpacr(u64 val, u64 mask)
> @@ -253,7 +253,7 @@ static void task_fpsimd_load(void)
>  	WARN_ON(!in_softirq() && !irqs_disabled());
>
>  	if (system_supports_sve() && test_thread_flag(TIF_SVE))
> -		sve_load_state(sve_pffr(current),
> +		sve_load_state(sve_pffr(&current->thread),
>  			       &current->thread.uw.fpsimd_state.fpsr,
>  			       sve_vq_from_vl(current->thread.sve_vl) - 1);
>  	else
> @@ -284,7 +284,7 @@ void fpsimd_save(void)
>  				return;
>  			}
>
> -			sve_save_state(sve_pffr(current), &st->fpsr);
> +			sve_save_state(sve_pffr(&current->thread), &st->fpsr);
>  		} else
>  			fpsimd_save_state(st);
>  	}


--
Alex Bennée


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 10/18] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
  2018-05-24 10:09     ` Alex Bennée
@ 2018-05-24 10:18       ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-24 10:18 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Thu, May 24, 2018 at 11:09:02AM +0100, Alex Bennée wrote:
> 
> Dave Martin <Dave.Martin@arm.com> writes:
> 
> > This patch refactors KVM to align the host and guest FPSIMD
> > save/restore logic with each other for arm64.  This reduces the
> > number of redundant save/restore operations that must occur, and
> > reduces the common-case IRQ blackout time during guest exit storms
> > by saving the host state lazily and optimising away the need to
> > restore the host state before returning to the run loop.
> >
> > Four hooks are defined in order to enable this:
> >
> >  * kvm_arch_vcpu_run_map_fp():
> >    Called on PID change to map necessary bits of current to Hyp.
> >
> >  * kvm_arch_vcpu_load_fp():
> >    Set up FP/SIMD for entering the KVM run loop (parse as
> >    "vcpu_load fp").
> >
> >  * kvm_arch_vcpu_ctxsync_fp():
> >    Get FP/SIMD into a safe state for re-enabling interrupts after a
> >    guest exit back to the run loop.
> >
> >    For arm64 specifically, this involves updating the host kernel's
> >    FPSIMD context tracking metadata so that kernel-mode NEON use
> >    will cause the vcpu's FPSIMD state to be saved back correctly
> >    into the vcpu struct.  This must be done before re-enabling
> >    interrupts because kernel-mode NEON may be used by softirqs.
> >
> >  * kvm_arch_vcpu_put_fp():
> >    Save guest FP/SIMD state back to memory and dissociate from the
> >    CPU ("vcpu_put fp").
> >
> > Also, the arm64 FPSIMD context switch code is updated to enable it
> > to save back FPSIMD state for a vcpu, not just current.  A few
> > helpers drive this:
> >
> >  * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
> >    mark this CPU as having context fp (which may belong to a vcpu)
> >    currently loaded in its registers.  This is the non-task
> >    equivalent of the static function fpsimd_bind_to_cpu() in
> >    fpsimd.c.
> >
> >  * task_fpsimd_save():
> >    exported to allow KVM to save the guest's FPSIMD state back to
> >    memory on exit from the run loop.
> >
> >  * fpsimd_flush_state():
> >    invalidate any context's FPSIMD state that is currently loaded.
> >    Used to disassociate the vcpu from the CPU regs on run loop exit.
> >
> > These changes allow the run loop to enable interrupts (and thus
> > softirqs that may use kernel-mode NEON) without having to save the
> > guest's FPSIMD state eagerly.
> >
> > Some new vcpu_arch fields are added to make all this work.  Because
> > host FPSIMD state can now be saved back directly into current's
> > thread_struct as appropriate, host_cpu_context is no longer used
> > for preserving the FPSIMD state.  However, it is still needed for
> > preserving other things such as the host's system registers.  To
> > avoid ABI churn, the redundant storage space in host_cpu_context is
> > not removed for now.
> >
> > arch/arm is not addressed by this patch and continues to use its
> > current save/restore logic.  It could provide implementations of
> > the helpers later if desired.
> >
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
> > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> >
> > ---
> >
> > Reviewers note: tags retained because this delta is straightforward by
> > itself.  Please shout if you're not happy!
> >
> > Changes since v9:
> >
> >  * Remove redundant set_thread_flag(TIF_FOREIGN_FPSTATE) that is now
> >    implicit in fpsimd_flush_cpu_state().
> > ---
> >  arch/arm/include/asm/kvm_host.h   |   8 +++
> >  arch/arm64/include/asm/fpsimd.h   |   6 +++
> >  arch/arm64/include/asm/kvm_host.h |  21 ++++++++
> >  arch/arm64/kernel/fpsimd.c        |  17 ++++--
> >  arch/arm64/kvm/Kconfig            |   1 +
> >  arch/arm64/kvm/Makefile           |   2 +-
> >  arch/arm64/kvm/fpsimd.c           | 111 ++++++++++++++++++++++++++++++++++++++
> >  arch/arm64/kvm/hyp/switch.c       |  51 +++++++++---------
> >  virt/kvm/arm/arm.c                |   4 ++
> >  9 files changed, 191 insertions(+), 30 deletions(-)
> >  create mode 100644 arch/arm64/kvm/fpsimd.c
> >
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index c7c28c8..ac870b2 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -303,6 +303,14 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
> >  int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
> >  			       struct kvm_device_attr *attr);
> >
> > +/*
> > + * VFP/NEON switching is all done by the hyp switch code, so no need to
> > + * coordinate with host context handling for this state:
> > + */
> > +static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
> > +static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {}
> > +static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
> > +
> >  /* All host FP/SIMD state is restored on guest exit, so nothing to save: */
> >  static inline void kvm_fpsimd_flush_cpu_state(void) {}
> >
> > diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> > index aa7162a..3e00f70 100644
> > --- a/arch/arm64/include/asm/fpsimd.h
> > +++ b/arch/arm64/include/asm/fpsimd.h
> > @@ -41,6 +41,8 @@ struct task_struct;
> >  extern void fpsimd_save_state(struct user_fpsimd_state *state);
> >  extern void fpsimd_load_state(struct user_fpsimd_state *state);
> >
> > +extern void fpsimd_save(void);
> > +
> >  extern void fpsimd_thread_switch(struct task_struct *next);
> >  extern void fpsimd_flush_thread(void);
> >
> > @@ -49,7 +51,11 @@ extern void fpsimd_preserve_current_state(void);
> >  extern void fpsimd_restore_current_state(void);
> >  extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
> >
> > +extern void fpsimd_bind_task_to_cpu(void);
> > +extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state);
> > +
> >  extern void fpsimd_flush_task_state(struct task_struct *target);
> > +extern void fpsimd_flush_cpu_state(void);
> >  extern void sve_flush_cpu_state(void);
> >
> >  /* Maximum VL that SVE VL-agnostic software can transparently support */
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 146c167..b3fe730 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -30,6 +30,7 @@
> >  #include <asm/kvm.h>
> >  #include <asm/kvm_asm.h>
> >  #include <asm/kvm_mmio.h>
> > +#include <asm/thread_info.h>
> >
> >  #define __KVM_HAVE_ARCH_INTC_INITIALIZED
> >
> > @@ -238,6 +239,10 @@ struct kvm_vcpu_arch {
> >
> >  	/* Pointer to host CPU context */
> >  	kvm_cpu_context_t *host_cpu_context;
> > +
> > +	struct thread_info *host_thread_info;	/* hyp VA */
> > +	struct user_fpsimd_state *host_fpsimd_state;	/* hyp VA */
> > +
> >  	struct {
> >  		/* {Break,watch}point registers */
> >  		struct kvm_guest_debug_arch regs;
> > @@ -295,6 +300,9 @@ struct kvm_vcpu_arch {
> >
> >  /* vcpu_arch flags field values: */
> >  #define KVM_ARM64_DEBUG_DIRTY		(1 << 0)
> > +#define KVM_ARM64_FP_ENABLED		(1 << 1) /* guest FP regs loaded */
> > +#define KVM_ARM64_FP_HOST		(1 << 2) /* host FP regs loaded
> >  */
> 
> I may be descending into bike-shedding territory here but it seems a
> little incongruous to have _ENABLED = guest FP state when we have _HOST
> for host FP state. Why not KVM_ARM64_FP_GUEST?

I thought about this, but wanted to retain the clear relationship
between the _ENABLED flag and the state of the FPSIMD trap controls.

The HOST flag has no direct relationship with trap controls, so these
seemed different enough things to justify different names, though the
inconsistency was a bit annoying.
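
To illustrate what I mean, here is a rough sketch of the non-VHE trap
setup as it ends up looking with these flags (abbreviated and
illustrative only -- not the exact hunk from the series):

	u64 val = CPTR_EL2_DEFAULT;

	val |= CPTR_EL2_TTA;
	if (!(vcpu->arch.flags & KVM_ARM64_FP_ENABLED))
		val |= CPTR_EL2_TFP;	/* guest regs not loaded: trap FPSIMD */

	write_sysreg(val, cptr_el2);

Roughly, _ENABLED directly determines whether the guest's first FPSIMD
access traps, whereas _HOST only records whether the host's regs still
need saving before the guest's can be loaded.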

[...]

> Minor bike-shedding aside:
> 
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

Thanks.  I'll probably leave it as is, but shout if you're unhappy with
this.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 13/18] arm64/sve: Move sve_pffr() to fpsimd.h and make inline
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-24 10:20     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24 10:20 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> In order to make sve_save_state()/sve_load_state() more easily
> reusable and to get rid of a potential branch on context switch
> critical paths, this patch makes sve_pffr() inline and moves it to
> fpsimd.h.
>
> <asm/processor.h> must be included in fpsimd.h in order to make
> this work, and this creates an #include cycle that is tricky to
> avoid without modifying core code, due to the way the PR_SVE_*()
> prctl helpers are included in the core prctl implementation.
>
> Instead of breaking the cycle, this patch defers inclusion of
> <asm/fpsimd.h> in <asm/processor.h> until the point where it is
> actually needed: i.e., immediately before the prctl definitions.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/fpsimd.h    | 13 +++++++++++++
>  arch/arm64/include/asm/processor.h |  3 ++-
>  arch/arm64/kernel/fpsimd.c         | 12 ------------
>  3 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index fb60b22..fa92747 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -18,6 +18,8 @@
>
>  #include <asm/ptrace.h>
>  #include <asm/errno.h>
> +#include <asm/processor.h>
> +#include <asm/sigcontext.h>
>
>  #ifndef __ASSEMBLY__
>
> @@ -61,6 +63,17 @@ extern void sve_flush_cpu_state(void);
>  /* Maximum VL that SVE VL-agnostic software can transparently support */
>  #define SVE_VL_ARCH_MAX 0x100
>
> +/* Offset of FFR in the SVE register dump */
> +static inline size_t sve_ffr_offset(int vl)
> +{
> +	return SVE_SIG_FFR_OFFSET(sve_vq_from_vl(vl)) - SVE_SIG_REGS_OFFSET;
> +}
> +
> +static inline void *sve_pffr(struct thread_struct *thread)
> +{
> +	return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
> +}
> +
>  extern void sve_save_state(void *state, u32 *pfpsr);
>  extern void sve_load_state(void const *state, u32 const *pfpsr,
>  			   unsigned long vq_minus_1);
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index f902b6d..ebaadb1 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -40,7 +40,6 @@
>
>  #include <asm/alternative.h>
>  #include <asm/cpufeature.h>
> -#include <asm/fpsimd.h>
>  #include <asm/hw_breakpoint.h>
>  #include <asm/lse.h>
>  #include <asm/pgtable-hwdef.h>
> @@ -245,6 +244,8 @@ void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused);
>  void cpu_enable_cache_maint_trap(const struct arm64_cpu_capabilities *__unused);
>  void cpu_clear_disr(const struct arm64_cpu_capabilities *__unused);
>
> +#include <asm/fpsimd.h>
> +

You really need a one-liner comment to note why the include is in a
funny place to save someone just moving it back and then getting really
confused. Maybe:

  /* included just in time to avoid circular inclusion issues */
  #include <asm/fpsimd.h>

It still seems weird to me though :-/

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 13/18] arm64/sve: Move sve_pffr() to fpsimd.h and make inline
  2018-05-24 10:20     ` Alex Bennée
@ 2018-05-24 11:22       ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-24 11:22 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Thu, May 24, 2018 at 11:20:59AM +0100, Alex Bennée wrote:
> 
> Dave Martin <Dave.Martin@arm.com> writes:
> 
> > In order to make sve_save_state()/sve_load_state() more easily
> > reusable and to get rid of a potential branch on context switch
> > critical paths, this patch makes sve_pffr() inline and moves it to
> > fpsimd.h.
> >
> > <asm/processor.h> must be included in fpsimd.h in order to make
> > this work, and this creates an #include cycle that is tricky to
> > avoid without modifying core code, due to the way the PR_SVE_*()
> > prctl helpers are included in the core prctl implementation.
> >
> > Instead of breaking the cycle, this patch defers inclusion of
> > <asm/fpsimd.h> in <asm/processor.h> until the point where it is
> > actually needed: i.e., immediately before the prctl definitions.
> >
> > No functional change.
> >
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> > Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> >  arch/arm64/include/asm/fpsimd.h    | 13 +++++++++++++
> >  arch/arm64/include/asm/processor.h |  3 ++-
> >  arch/arm64/kernel/fpsimd.c         | 12 ------------
> >  3 files changed, 15 insertions(+), 13 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> > index fb60b22..fa92747 100644
> > --- a/arch/arm64/include/asm/fpsimd.h
> > +++ b/arch/arm64/include/asm/fpsimd.h
> > @@ -18,6 +18,8 @@
> >
> >  #include <asm/ptrace.h>
> >  #include <asm/errno.h>
> > +#include <asm/processor.h>
> > +#include <asm/sigcontext.h>
> >
> >  #ifndef __ASSEMBLY__
> >
> > @@ -61,6 +63,17 @@ extern void sve_flush_cpu_state(void);
> >  /* Maximum VL that SVE VL-agnostic software can transparently support */
> >  #define SVE_VL_ARCH_MAX 0x100
> >
> > +/* Offset of FFR in the SVE register dump */
> > +static inline size_t sve_ffr_offset(int vl)
> > +{
> > +	return SVE_SIG_FFR_OFFSET(sve_vq_from_vl(vl)) - SVE_SIG_REGS_OFFSET;
> > +}
> > +
> > +static inline void *sve_pffr(struct thread_struct *thread)
> > +{
> > +	return (char *)thread->sve_state + sve_ffr_offset(thread->sve_vl);
> > +}
> > +
> >  extern void sve_save_state(void *state, u32 *pfpsr);
> >  extern void sve_load_state(void const *state, u32 const *pfpsr,
> >  			   unsigned long vq_minus_1);
> > diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> > index f902b6d..ebaadb1 100644
> > --- a/arch/arm64/include/asm/processor.h
> > +++ b/arch/arm64/include/asm/processor.h
> > @@ -40,7 +40,6 @@
> >
> >  #include <asm/alternative.h>
> >  #include <asm/cpufeature.h>
> > -#include <asm/fpsimd.h>
> >  #include <asm/hw_breakpoint.h>
> >  #include <asm/lse.h>
> >  #include <asm/pgtable-hwdef.h>
> > @@ -245,6 +244,8 @@ void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused);
> >  void cpu_enable_cache_maint_trap(const struct arm64_cpu_capabilities *__unused);
> >  void cpu_clear_disr(const struct arm64_cpu_capabilities *__unused);
> >
> > +#include <asm/fpsimd.h>
> > +
> 
> You really need a one-liner comment to note why the include is in a
> funny place to save someone just moving it back and then getting really
> confused. Maybe:
> 
>   /* included just in time to avoid circular inclusion issues */
>   #include <asm/fpsimd.h>
> 
> It still seems weird to me though :-/

How about

/*                                                                              
 * Not at the top of the file due to a direct #include cycle between            
 * <asm/fpsimd.h> and <asm/processor.h>.  Deferring this #include               
 * ensures that contents of processor.h are visible to fpsimd.h even if         
 * processor.h is included first.                                               
 *                                                                              
 * These prctl helpers are the only things in this file that require            
 * fpsimd.h.  The core code expects them to be in this header.                  
 */

?

Cheers
---Dave

> 
> Otherwise:
> 
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-24 10:06                 ` Christoffer Dall
@ 2018-05-24 14:37                   ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-24 14:37 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Thu, May 24, 2018 at 12:06:59PM +0200, Christoffer Dall wrote:
> On Thu, May 24, 2018 at 10:50:56AM +0100, Dave Martin wrote:
> > On Thu, May 24, 2018 at 10:33:50AM +0200, Christoffer Dall wrote:

[...]

> > > ...with a risk of being a bit over-pedantic and annoying, may I suggest
> > > the following complete commit text:
> > > 
> > > ------8<------
> > > Currently the FPSIMD handling code uses the condition task->mm ==
> > > NULL as a hint that task has no FPSIMD register context.
> > > 
> > > The ->mm check is only there to filter out tasks that cannot
> > > possibly have FPSIMD context loaded, for optimisation purposes.
> > > However, TIF_FOREIGN_FPSTATE must always be checked anyway before
> > > saving FPSIMD context back to memory.  For this reason, the ->mm
> > > checks are not useful, providing that TIF_FOREIGN_FPSTATE is
> > > maintained properly for kernel threads.
> > > 
> > > FPSIMD context is never preserved for kernel threads across a context
> > > switch and therefore TIF_FOREIGN_FPSTATE should always be true for
> > 
> > (This refactoring opens up the interesting possibility of making
> > kernel-mode NEON in task context preemptible for kernel threads so
> > that we actually do preserve state... but that's a discussion for
> > another day.  There may be code around that relies on
> > kernel_neon_begin() disabling preemption for real.)
> > 
> > > kernel threads.  This is indeed the case, as the wrong_task and
> > 
> > This suggests that TIF_FOREIGN_FPSTATE is always true for kernel
> > threads today.  This is not quite because use_mm() can make mm non-
> > NULL.
> > 
> 
> I was suggesting that it's always true after this patch.

I tend to read the present tense as describing the situation before the
patch, but this convention isn't followed universally.

This was part of the problem with my "true by construction" weasel
words: the described property wasn't true by construction prior to the
patch, and there wasn't sufficient explanation to convince people it's
true afterwards.  If people are bring rigorous, it takes a _lot_ of
explanation...

> 
> > > wrong_cpu tests in fpsimd_thread_switch() will always yield false for
> > > kernel threads.
> > 
> > ("false" -> "true".  My bad.)
> > 
> > > Further, the context switch logic is already deliberately optimised to
> > > defer reloads of the FPSIMD context until ret_to_user (or sigreturn as a
> > > special case), which kernel threads by definition never reach, and
> > > therefore this change introduces no additional work in the critical
> > > path.
> > > 
> > > This patch removes the redundant checks and special-case code.
> > > ------8<------
> > 
> > Looking at my existing text, I rather reworded it like this.
> > Does this work any better for you?
> > 
> > --8<--
> > 
> > Currently the FPSIMD handling code uses the condition task->mm ==
> > NULL as a hint that task has no FPSIMD register context.
> > 
> > The ->mm check is only there to filter out tasks that cannot
> > possibly have FPSIMD context loaded, for optimisation purposes.
> > Also, TIF_FOREIGN_FPSTATE must always be checked anyway before
> > saving FPSIMD context back to memory.  For these reasons, the ->mm
> > checks are not useful, providing that TIF_FOREIGN_FPSTATE is
> > maintained in a consistent way for kernel threads.
> 
> Consistent with what?  Without more context or explanation,

Consistent with the handling of user threads (though I admit it's not
explicit in the text.)

> I'm not sure what the reader is to make of that.  Do you not mean the
> TIF_FOREIGN_FPSTATE is always true for kernel threads?

Again, this is probably a red herring.  TIF_FOREIGN_FPSTATE is always
true for kernel threads prior to the patch, except (randomly) for the
init task.

This change is not really about TIF_FOREIGN_FPSTATE at all, rather
that there is nothing to justify handling kernel threads differently,
or even distinguishing kernel threads from user threads at all in this
code.

Part of the confusion (and I had confused myself) comes from the fact
that TIF_FOREIGN_FPSTATE is really a per-cpu property and doesn't make
sense as a per-task property -- i.e., the flag is meaningless for
scheduled-out tasks and we must explicitly "repair" it when scheduling
a task in anyway.  I think it's a thread flag primarily so that it's
convenient to check alongside other thread flags in the ret_to_user
work loop.  This is somewhat less of a justification now that loop was
ported to C.

> > 
> > The context switch logic is already deliberately optimised to defer
> > reloads of the regs until ret_to_user (or sigreturn as a special
> > case), and save them only if they have been previously loaded.

Does it help to insert the following here?

"These paths are the only places where the wrong_task and wrong_cpu
conditions can be made false, by calling fpsimd_bind_task_to_cpu()."

> > Kernel threads by definition never reach these paths.  As a result,
> 
> I'm struggling with the "As a result," here.  Is this because reloads of
> regs in ret_to_user (or sigreturn) are the only places that can make
> wrong_cpu or wrong_task be false?

See the proposed clarification above.  Is that sufficient?

> (I'm actually wanting to understand this, not just bikeshedding the
> commit message, as new corner cases keep coming up on this logic.)

That's a good thing, and I would really like to explain it in a
concise manner.  See [*] below for the "concise" explanation -- it may
demonstrate why I've been evasive...

> > the wrong_task and wrong_cpu tests in fpsimd_thread_switch() will
> > always yield true for kernel threads.
> > 
> > This patch removes the redundant checks and special-case code,                  ensuring that TIF_FOREIGN_FPSTATE is set whenever a kernel thread               is scheduled in, and ensures that this flag is set for the init
> > task.  The fpsimd_flush_task_state() call already present in                    copy_thread() ensures the same for any new task.
> 
> nit: funny formatting

Dang, I was repeatedly pasting between Mutt and git commit terminals,
which doesn't always work as I'd like...

> nit: ensuring that TIF_FOREIGN_FPSTATE *remains* set whenever a kernel
> thread is scheduled in?

Er, yes.

> > With TIF_FOREIGN_FPSTATE always set for kernel threads, this patch
> > ensures that no extra context save work is added for kernel
> > threads, and eliminates the redundant context saving that may
> > currently occur for kernel threads that have acquired an mm via
> > use_mm().
> > 
> > -->8--
> 
> If you can slightly connect the dots with the "As a result" above, I'm
> fine with your version of the text.


As an aside, the big wall of text before the definition of struct
fpsimd_last_state_struct is looking out of date and could use an
update to cover at least some of what is explained in [*] better.

I'm currently considering that out of scope for this series, but I will
keep it in mind to refresh it in the not too distant future.


Cheers
---Dave

--8<--

[*] The bigger picture:

* Consider a relation (C,T) between cpus C and tasks T, such that
  (C,T) means "T's FPSIMD regs are loaded on cpu C".

  At a given point of execution of some cpu C, there is at most one task
  T for which (C,T) holds.
 
  At a given point of execution of some task T, there is at most one
  cpu C for which (C,T) holds.

* (C,T) becomes true whenever T's registers are loaded into cpu C.

* At sched-out, we must ensure that the registers of current are
  loaded before writing them to current's thread_struct.  Thus, we
  must save the registers if and only if (smp_processor_id(), current)
  holds at this time.

* Before entering userspace, we must ensure that current's regs
  are loaded, and we must only load the regs if they are not loaded
  already (since if so, they might have been dirtied by current in
  userspace since last loaded).

  Thus, when entering userspace, we must load the regs from memory
  if and only if (smp_processor_id(), current) does not hold.
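
  A minimal sketch of what the userspace-entry path ends up doing
  (using the cached TIF_FOREIGN_FPSTATE flag described in the next
  point; names as in this series, illustrative only):

	void fpsimd_restore_current_state(void)
	{
		local_bh_disable();

		if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
			task_fpsimd_load();		/* regs <- current->thread */
			fpsimd_bind_task_to_cpu();	/* establish (C, current) */
		}

		local_bh_enable();
	}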

* Checking this relation involves per-CPU access and inspection of
  current->thread, and was presumably considered too cumbersome for
  implementation in entry.S, particularly in the ret_to_user work
  pending loop (which is where the FPSIMD regs are finally loaded
  before entering userspace, if they weren't loaded already).

  To mitigate this, the status of the check is cached in a thread flag
  TIF_FOREIGN_FPSTATE: with softirqs disabled, (smp_processor_id(),
  current) holds if and only if TIF_FOREIGN_FPSTATE is false.
  TIF_FOREIGN_FPSTATE is corrected on sched-in by the code in
  fpsimd_thread_switch().
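
  For reference, the sched-in fixup amounts to this (sketch; field
  names as used in this series):

	void fpsimd_thread_switch(struct task_struct *next)
	{
		bool wrong_task, wrong_cpu;

		/* Save unsaved fpsimd state of current, if any: */
		fpsimd_save();

		/* Fix up TIF_FOREIGN_FPSTATE to describe next's state: */
		wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
					&next->thread.uw.fpsimd_state;
		wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();

		update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
				       wrong_task || wrong_cpu);
	}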

[2] Anything that changes the state of the relation for current
  requires its TIF_FOREIGN_FPSTATE to be changed to match.

* (smp_processor_id(), current) is established in
  fpsimd_bind_task_to_cpu().  This is the only way the relation can be
  made to hold between a task and a CPU.
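
  i.e. (sketch):

	void fpsimd_bind_task_to_cpu(void)
	{
		struct fpsimd_last_state_struct *last =
			this_cpu_ptr(&fpsimd_last_state);

		/* Record (smp_processor_id(), current) in both places: */
		last->st = &current->thread.uw.fpsimd_state;
		current->thread.fpsimd_cpu = smp_processor_id();
	}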

* (C,T) is broken whenever

[1] T is created;

  * T's regs are loaded onto a different cpu C2, so (C2,T) becomes
    true and (C,T) necessarily becomes false;

  * another task's regs are loaded into C, so (C,T2) becomes true
    and (C,T) necessarily becomes false;

  * the kernel clobbers the regs on C for its own purposes, so
    (C,T) becomes false but there is no T2 for which (C,T2) becomes
    true as a result.  Examples are kernel-mode NEON and loading
    the regs for a KVM vcpu;

  * T's register context changes via a thread_struct update instead
    of running instructions in userspace, requiring the contents of
    the hardware regs to be thrown away.  Examples are exec() (which
    requires the registers to be zeroed), sigreturn (which populates the
    regs from the user signal frame) and modification of the registers
    via PTRACE_SETREGSET;

    As a (probably unnecessary) optimisation, sigreturn immediately
    loads the registers and reestablishes (smp_processor_id(), current)
    in anticipation of the return to userspace which is likely to
    occur soon.  This allows the relation breaking logic to be omitted
    in fpsimd_update_current_state() which does the work.

* In general, these relation breakings involve an unknown: knowing
  either C or T but *not* both, we want to break (C,T).  If the
  relation were recorded in task_struct only, we would need to scan all
  tasks in the "T unknown" case.  If the relation were recorded in a
  percpu variable only, we would need to scan all CPUs in the "C
  unknown" case.  As well as having gnarly synchronisation
  requirements, these would get expensive in many-tasks or many-cpus
  situations.

  This is why the relation is recorded in both places, and is only
  deemed to hold if the two records match up.  This is what
  fpsimd_thread_switch() is checking for the task being scheduled in.

  The invalidation (breaking) operations are now factored as

  fpsimd_flush_task_state(): falsify (C,current) for every cpu C.
  This is done by zapping current->thread.fpsimd_cpu with NR_CPUS
  (chosen because it cannot match smp_processor_id()).

  fpsimd_flush_cpu_state(): falsify (smp_processor_id(),T) for every
  task T.  This is done by zapping this_cpu(fpsimd_last_state.st)
  with NULL (chosen because it cannot match &T->thread.uw.fpsimd_state
  for any task).

  By [2] above, it is necessary to ensure that TIF_FOREIGN_FPSTATE is
  set after calling either of the above functions.  Of the two,
  fpsimd_flush_cpu_state() now does this implicitly, but
  fpsimd_flush_task_state() does not, so the caller must do it
  instead.  I have a vague memory of some refactoring obstacle that
  dissuaded me from pulling the set_thread_flag in, but I can't
  remember it now.  I may review this later.

* Because the (C,T) relation may need to be manipulated by
  kernel_neon_{begin,end}() in softirq context, examining or
  manipulating for current or the running CPU must be done under
  local_bh_disable().  The same goes for TIF_FOREIGN_FPSTATE which is
  supposed to represent the same condition but may spontaneously become
  stale if softirqs are not masked.  (The rule is not quite as strict
  as this, but in order to make the code easier to reason about, I skip
  the local_bh_disable() only where absolutely necessary --
  restore_sve_fpsimd_context() is the only example today.)

Now, imagine that T is a kernel thread, and consider what needs to
be done differently.  The observation of this patch is that nothing
needs to be done differently at all.

There is a single anomaly relating to [1] above, in the form of a task
that can run without ever being scheduled in: the init task.  As a
consequence, kernel_neon_begin() before the first reschedule would spuriously
save the FPSIMD regs into the init_task's thread struct, even though it
is pointless to do so.  This patch fixes those anomalies by updating
INIT_THREAD and INIT_THREAD_INFO to set up the init task so that it
looks the same as some other kernel thread that has been scheduled in.

There is a strong design motivation to avoid unnecessary loads and
saves of the state, so if removing the special-casing of kernel threads
were to add cost it would imply that the code were _already_ suboptimal
for user tasks.  This patch does not attempt to address that at all,
but by assuming that the code is already well-optimised, "unnecessary"
save/restore work will not be added.  If this were not the case, it
could in any case be fixed independently.

The observation of this _series_ is that we don't need to do very
much in order to be able to generalise the logic to accept KVM vcpus
in place of T.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
@ 2018-05-24 14:37                   ` Dave Martin
  0 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-24 14:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 24, 2018 at 12:06:59PM +0200, Christoffer Dall wrote:
> On Thu, May 24, 2018 at 10:50:56AM +0100, Dave Martin wrote:
> > On Thu, May 24, 2018 at 10:33:50AM +0200, Christoffer Dall wrote:

[...]

> > > ...with a risk of being a bit over-pedantic and annoying, may I suggest
> > > the following complete commit text:
> > > 
> > > ------8<------
> > > Currently the FPSIMD handling code uses the condition task->mm ==
> > > NULL as a hint that task has no FPSIMD register context.
> > > 
> > > The ->mm check is only there to filter out tasks that cannot
> > > possibly have FPSIMD context loaded, for optimisation purposes.
> > > However, TIF_FOREIGN_FPSTATE must always be checked anyway before
> > > saving FPSIMD context back to memory.  For this reason, the ->mm
> > > checks are not useful, providing that TIF_FOREIGN_FPSTATE is
> > > maintained properly for kernel threads.
> > > 
> > > FPSIMD context is never preserved for kernel threads across a context
> > > switch and therefore TIF_FOREIGN_FPSTATE should always be true for
> > 
> > (This refactoring opens up the interesting possibility of making
> > kernel-mode NEON in task context preemptible for kernel threads so
> > that we actually do preserve state... but that's a discussion for
> > another day.  There may be code around that relies on
> > kernel_neon_begin() disabling preemption for real.)
> > 
> > > kernel threads.  This is indeed the case, as the wrong_task and
> > 
> > This suggests that TIF_FOREIGN_FPSTATE is always true for kernel
> > threads today.  This is not quite true, because use_mm() can make mm
> > non-NULL.
> > 
> 
> I was suggesting that it's always true after this patch.

I tend to read the present tense as describing the situation before the
patch, but this convention isn't followed universally.

This was part of the problem with my "true by construction" weasel
words: the described property wasn't true by construction prior to the
patch, and there wasn't sufficient explanation to convince people it's
true afterwards.  If people are being rigorous, it takes a _lot_ of
explanation...

> 
> > > wrong_cpu tests in fpsimd_thread_switch() will always yield false for
> > > kernel threads.
> > 
> > ("false" -> "true".  My bad.)
> > 
> > > Further, the context switch logic is already deliberately optimised to
> > > defer reloads of the FPSIMD context until ret_to_user (or sigreturn as a
> > > special case), which kernel threads by definition never reach, and
> > > therefore this change introduces no additional work in the critical
> > > path.
> > > 
> > > This patch removes the redundant checks and special-case code.
> > > ------8<------
> > 
> > Looking at my existing text, I rather reworded it like this.
> > Does this work any better for you?
> > 
> > --8<--
> > 
> > Currently the FPSIMD handling code uses the condition task->mm ==
> > NULL as a hint that task has no FPSIMD register context.
> > 
> > The ->mm check is only there to filter out tasks that cannot
> > possibly have FPSIMD context loaded, for optimisation purposes.
> > Also, TIF_FOREIGN_FPSTATE must always be checked anyway before
> > saving FPSIMD context back to memory.  For these reasons, the ->mm
> > checks are not useful, providing that TIF_FOREIGN_FPSTATE is
> > maintained in a consistent way for kernel threads.
> 
> Consistent with what?  Without more context or explanation,

Consistent with the handling of user threads (though I admit it's not
explicit in the text.)

> I'm not sure what the reader is to make of that.  Do you not mean the
> TIF_FOREIGN_FPSTATE is always true for kernel threads?

Again, this is probably a red herring.  TIF_FOREIGN_FPSTATE is always
true for kernel threads prior to the patch, except (randomly) for the
init task.

This change is not really about TIF_FOREIGN_FPSTATE at all, rather
that there is nothing to justify handling kernel threads differently,
or even distinguishing kernel threads from user threads at all in this
code.

Part of the confusion (and I had confused myself) comes from the fact
that TIF_FOREIGN_FPSTATE is really a per-cpu property and doesn't make
sense as a per-task property -- i.e., the flag is meaningless for
scheduled-out tasks and we must explicitly "repair" it when scheduling
a task in anyway.  I think it's a thread flag primarily so that it's
convenient to check alongside other thread flags in the ret_to_user
work loop.  This is somewhat less of a justification now that the loop
was ported to C.
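
(For illustration, the consumer in the C work loop is roughly just

	if (thread_flags & _TIF_FOREIGN_FPSTATE)
		fpsimd_restore_current_state();

i.e., if the cached check says current's regs are not loaded here, load
them and re-establish the relation before returning to userspace.  The
helper name is the one used in fpsimd.c; treat this as a sketch rather
than the literal code.)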

> > 
> > The context switch logic is already deliberately optimised to defer
> > reloads of the regs until ret_to_user (or sigreturn as a special
> > case), and save them only if they have been previously loaded.

Does it help to insert the following here?

"These paths are the only places where the wrong_task and wrong_cpu
conditions can be made false, by calling fpsimd_bind_task_to_cpu()."

> > Kernel threads by definition never reach these paths.  As a result,
> 
> I'm struggling with the "As a result," here.  Is this because reloads of
> regs in ret_to_user (or sigreturn) are the only places that can make
> wrong_cpu or wrong_task be false?

See the proposed clarification above.  Is that sufficient?

> (I'm actually wanting to understand this, not just bikeshedding the
> commit message, as new corner cases keep coming up on this logic.)

That's a good thing, and I would really like to explain it in a
concise manner.  See [*] below for the "concise" explanation -- it may
demonstrate why I've been evasive...

> > the wrong_task and wrong_cpu tests in fpsimd_thread_switch() will
> > always yield true for kernel threads.
> > 
> > This patch removes the redundant checks and special-case code,                  ensuring that TIF_FOREIGN_FPSTATE is set whenever a kernel thread               is scheduled in, and ensures that this flag is set for the init
> > task.  The fpsimd_flush_task_state() call already present in                    copy_thread() ensures the same for any new task.
> 
> nit: funny formatting

Dang, I was repeatedly pasting between Mutt and git commit terminals,
which doesn't always work as I'd like...

> nit: ensuring that TIF_FOREIGN_FPSTATE *remains* set whenever a kernel
> thread is scheduled in?

Er, yes.

> > With TIF_FOREIGN_FPSTATE always set for kernel threads, this patch
> > ensures that no extra context save work is added for kernel
> > threads, and eliminates the redundant context saving that may
> > currently occur for kernel threads that have acquired an mm via
> > use_mm().
> > 
> > -->8--
> 
> If you can slightly connect the dots with the "As a result" above, I'm
> fine with your version of the text.


As an aside, the big wall of text before the definition of struct
fpsimd_last_state_struct is looking out of date and could use an
update to cover at least some of what is explained in [*] better.

I'm currently considering that out of scope for this series, but I will
keep it in mind to refresh it in the not too distant future.


Cheers
---Dave

--8<--

[*] The bigger picture:

* Consider a relation (C,T) between cpus C and tasks T, such that
  (C,T) means "T's FPSIMD regs are loaded on cpu C".

  At a given point of execution of some cpu C, there is at most one task
  T for which (C,T) holds.
 
  At a given point of execution of some task T, there is at most one
  cpu C for which (C,T) holds.

* (C,T) becomes true whenever T's registers are loaded into cpu C.

* At sched-out, we must ensure that the registers of current are
  loaded before writing them to current's thread_struct.  Thus, we
  must save the registers if and only if (smp_processor_id(), current)
  holds at this time.

* Before entering userspace, we must ensure that current's regs
  are loaded, and we must only load the regs if they are not loaded
  already (since if so, they might have been dirtied by current in
  userspace since last loaded).

  Thus, when entering userspace, we must load the regs from memory
  if and only if (smp_processor_id(), current) does not hold.

* Checking this relation involves per-CPU access and inspection of
  current->thread, and was presumably considered too cumbersome for
  implementation in entry.S, particularly in the ret_to_user work
  pending loop (which is where the FPSIMD regs are finally loaded
  before entering userspace, if they weren't loaded already).

  To mitigate this, the status of the check is cached in a thread flag
  TIF_FOREIGN_FPSTATE: with softirqs disabled, (smp_processor_id(),
  current) holds if and only if TIF_FOREIGN_FPSTATE is false.
  TIF_FOREIGN_FPSTATE is corrected on sched-in by the code in
  fpsimd_thread_switch().

[2] Anything that changes the state of the relation for current
  requires its TIF_FOREIGN_FPSTATE to be changed to match.

* (smp_processor_id(), current) is established in
  fpsimd_bind_task_to_cpu().  This is the only way the relation can be
  made to hold between a task and a CPU.

* (C,T) is broken whenever

[1] T is created;

  * T's regs are loaded onto a different cpu C2, so (C2,T) becomes
    true and (C,T) necessarily becomes false;

  * another task's regs are loaded into C, so (C,T2) becomes true
    and (C,T) necessarily becomes false;

  * the kernel clobbers the regs on C for its own purposes, so
    (C,T) becomes false but there is no T2 for which (C,T2) becomes
    true as a result.  Examples are kernel-mode NEON and loading
    the regs for a KVM vcpu;

  * T's register context changes via a thread_struct update instead
    of running instructions in userspace, requiring the contents of
    the hardware regs to be thrown away.  Examples are exec() (which
    requires the registers to be zeroed), sigreturn (which populates the
    regs from the user signal frame) and modification of the registers
    via PTRACE_SETREGSET;

    As a (probably unnecessary) optimisation, sigreturn immediately
    loads the registers and reestablishes (smp_processor_id(), current)
    in anticipation of the return to userspace which is likely to
    occur soon.  This allows the relation breaking logic to be omitted
    in fpsimd_update_current_state() which does the work.

* In general, these relation breakings involve an unknown: knowing
  either C or T but *not* both, we want to break (C,T).  If the
  relation were recorded in task_struct only, we would need to scan all
  tasks in the "T unknown" case.  If the relation were recorded in a
  percpu variable only, we would need to scan all CPUs in the "C
  unknown" case.  As well as having gnarly synchronisation
  requirements, these would get expensive in many-tasks or many-cpus
  situations.

  This is why the relation is recorded in both places, and is only
  deemed to hold if the two records match up.  This is what
  fpsimd_thread_switch() is checking for the task being scheduled in
  (see the code sketch after this list).

  The invalidation (breaking) operations are now factored as

  fpsimd_flush_task_state(): falsify (C,current) for every cpu C.
  This is done by zapping current->thread.fpsimd_cpu with NR_CPUS
  (chosen because it cannot match smp_processor_id()).

  fpsimd_flush_cpu_state(): falsify (smp_processor_id(),T) for every
  task T.  This is done by zapping this_cpu(fpsimd_last_state.st)
  with NULL (chosen because it cannot match &T->thread.uw.fpsimd_state
  for any task).

  By [2] above, it is necessary to ensure that TIF_FOREIGN_FPSTATE is
  set after calling either of the above functions.  Of the two,
  fpsimd_flush_cpu_state() now does this implicitly, but
  fpsimd_flush_task_state() does not, so the caller must do it
  instead.  I have a vague memory of some refactoring obstacle that
  dissuaded me from pulling the set_thread_flag in, but I can't
  remember it now.  I may review this later.

* Because the (C,T) relation may need to be manipulated by
  kernel_neon_{begin,end}() in softirq context, examining or
  manipulating for current or the running CPU must be done under
  local_bh_disable().  The same goes for TIF_FOREIGN_FPSTATE which is
  supposed to represent the same condition but may spontaneously become
  stale if softirqs are not masked.  (The rule is not quite as strict
  as this, but in order to make the code easier to reason about, I skip
  the local_bh_disable() only where absolutely necessary --
  restore_sve_fpsimd_context() is the only example today.)
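
To make the bookkeeping above a bit more concrete, here is a rough
sketch of the operations involved, in C.  It is simplified from the
real fpsimd.c code and from the patches in this series (helper and
field names are the ones used there); treat it as illustration, not
as the exact implementation:

/* Establish (smp_processor_id(), current): record it on both sides */
void fpsimd_bind_task_to_cpu(void)
{
	struct fpsimd_last_state_struct *last =
		this_cpu_ptr(&fpsimd_last_state);

	last->st = &current->thread.uw.fpsimd_state;
	current->thread.fpsimd_cpu = smp_processor_id();
}

/* Sched-in: recompute the cached TIF_FOREIGN_FPSTATE from the records */
void fpsimd_thread_switch(struct task_struct *next)
{
	bool wrong_task, wrong_cpu;

	/* (C,T) holds for next only if both records agree: */
	wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
			&next->thread.uw.fpsimd_state;
	wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();

	update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
			       wrong_task || wrong_cpu);
	/* (the real function also saves current's regs first, if live) */
}

/* Falsify (C,current) for every cpu C; caller sets TIF_FOREIGN_FPSTATE */
void fpsimd_flush_task_state(struct task_struct *t)
{
	t->thread.fpsimd_cpu = NR_CPUS;		/* matches no cpu */
}

/* Falsify (smp_processor_id(),T) for every task T */
void fpsimd_flush_cpu_state(void)
{
	__this_cpu_write(fpsimd_last_state.st, NULL);	/* matches no task */
	set_thread_flag(TIF_FOREIGN_FPSTATE);
}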

Now, imagine that T is a kernel thread, and consider what needs to
be done differently.  The observation of this patch is that nothing
needs to be done differently at all.

There is a single anomaly relating to [1] above, in the form of a task
that can run without ever being scheduled in: the init task.  As a
consequence, kernel_neon_begin() before the first reschedule would spuriously
save the FPSIMD regs into the init_task's thread struct, even though it
is pointless to do so.  This patch fixes those anomalies by updating
INIT_THREAD and INIT_THREAD_INFO to set up the init task so that it
looks the same as some other kernel thread that has been scheduled in.
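
For reference, the static-initialiser change meant here looks roughly
like the following sketch (assuming the fpsimd_cpu field discussed
above and the _TIF_FOREIGN_FPSTATE flag mask; see the patch itself for
the precise form):

/* arch/arm64/include/asm/processor.h */
#define INIT_THREAD {				\
	.fpsimd_cpu = NR_CPUS,			\
}

/* arch/arm64/include/asm/thread_info.h */
#define INIT_THREAD_INFO(tsk)			\
{						\
	.flags = _TIF_FOREIGN_FPSTATE,		\
}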

There is a strong design motivation to avoid unnecessary loads and
saves of the state, so if removing the special-casing of kernel threads
were to add cost it would imply that the code were _already_ suboptimal
for user tasks.  This patch does not attempt to address that at all,
but by assuming that the code is already well-optimised, "unnecessary"
save/restore work will not be added.  If this were not the case, it
could in any case be fixed independently.

The observation of this _series_ is that we don't need to do very
much in order to be able to generalise the logic to accept KVM vcpus
in place of T.
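
Concretely, the generalisation boils down to being able to bind an
arbitrary user_fpsimd_state (such as a vcpu's guest state) to the cpu
in place of current's, roughly as the fpsimd_bind_state_to_cpu() hunk
later in this series does:

void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
{
	struct fpsimd_last_state_struct *last =
		this_cpu_ptr(&fpsimd_last_state);

	last->st = st;		/* (this cpu, owner of st) now holds */
}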

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 14/18] KVM: arm64: Save host SVE context as appropriate
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-24 14:49     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24 14:49 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> This patch adds SVE context saving to the hyp FPSIMD context switch
> path.  This means that it is no longer necessary to save the host
> SVE state in advance of entering the guest, when in use.
>
> In order to avoid adding pointless complexity to the code, VHE is
> assumed if SVE is in use.  VHE is an architectural prerequisite for
> SVE, so there is no good reason to turn CONFIG_ARM64_VHE off in
> kernels that support both SVE and KVM.
>
> Historically, software models exist that can expose the
> architecturally invalid configuration of SVE without VHE, so if
> this situation is detected at kvm_init() time then KVM will be
> disabled.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>
> ---
>
>  * Tags stripped since v8, please reconfirm if possible:
>
> Formerly-Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
> Formerly-Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Formerly-Acked-by: Catalin Marinas <catalin.marinas@arm.com>
>
> Changes since v9:
>
> Requested by Marc Zyngier:
>
>  * Inline check for VHE if SVE is present into kvm_host.h.
>
>    The check has been renamed to the more specific
>    kvm_arch_check_sve_has_vhe(), and the kvm_pr_unimpl() moved back to
>    arm.c (to avoid circular include issues).
>
>    arm.c is not single-arch code, but it is all Arm-specific, so
>    adding a hook like this doesn't seem too unreasonable.
>
> Changes since v8:
>
>  * Add kvm_arch_check_supported() hook, and move arm64-specific check
>    for SVE-implies-VHE into arch/arm64/.
>
>    Due to circular header dependency problems, it is difficult to get
>    the prototype for kvm_pr_*() functions in <asm/kvm_host.h>, so this
>    patch puts arm64's kvm_arch_check_supported() hook out of line.
>    This is not a hot function.
> ---
>  arch/arm/include/asm/kvm_host.h   |  1 +
>  arch/arm64/Kconfig                |  7 +++++++
>  arch/arm64/include/asm/kvm_host.h | 13 +++++++++++++
>  arch/arm64/kvm/fpsimd.c           |  1 -
>  arch/arm64/kvm/hyp/switch.c       | 20 +++++++++++++++++++-
>  virt/kvm/arm/arm.c                |  7 +++++++
>  6 files changed, 47 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index ac870b2..3b85bbb 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -280,6 +280,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>
> +static inline bool kvm_arch_check_sve_has_vhe(void) { return true; }
>  static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index eb2cf49..b0d3820 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1130,6 +1130,7 @@ endmenu
>  config ARM64_SVE
>  	bool "ARM Scalable Vector Extension support"
>  	default y
> +	depends on !KVM || ARM64_VHE
>  	help
>  	  The Scalable Vector Extension (SVE) is an extension to the AArch64
>  	  execution state which complements and extends the SIMD functionality
> @@ -1155,6 +1156,12 @@ config ARM64_SVE
>  	  booting the kernel.  If unsure and you are not observing these
>  	  symptoms, you should assume that it is safe to say Y.
>
> +	  CPUs that support SVE are architecturally required to support the
> +	  Virtualization Host Extensions (VHE), so the kernel makes no
> +	  provision for supporting SVE alongside KVM without VHE enabled.
> +	  Thus, you will need to enable CONFIG_ARM64_VHE if you want to support
> +	  KVM in the same kernel image.
> +
>  config ARM64_MODULE_PLTS
>  	bool
>  	select HAVE_MOD_ARCH_SPECIFIC
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index b3fe730..06d5a61 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -405,6 +405,19 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
>  	kvm_call_hyp(__kvm_set_tpidr_el2, tpidr_el2);
>  }
>
> +static inline bool kvm_arch_check_sve_has_vhe(void)
> +{
> +	/*
> +	 * The Arm architecture specifies that implementation of SVE
> +	 * requires VHE also to be implemented.  The KVM code for arm64
> +	 * relies on this when SVE is present:
> +	 */
> +	if (system_supports_sve())
> +		return has_vhe();
> +	else
> +		return true;
> +}
> +
>  static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> index 365933a..dc6ecfa 100644
> --- a/arch/arm64/kvm/fpsimd.c
> +++ b/arch/arm64/kvm/fpsimd.c
> @@ -59,7 +59,6 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
>   */
>  void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
>  {
> -	BUG_ON(system_supports_sve());
>  	BUG_ON(!current->mm);
>
>  	vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED | KVM_ARM64_HOST_SVE_IN_USE);
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 118f300..a6a8c7d 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -21,6 +21,7 @@
>
>  #include <kvm/arm_psci.h>
>
> +#include <asm/cpufeature.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_host.h>
> @@ -28,6 +29,7 @@
>  #include <asm/kvm_mmu.h>
>  #include <asm/fpsimd.h>
>  #include <asm/debug-monitors.h>
> +#include <asm/processor.h>
>  #include <asm/thread_info.h>
>
>  /* Check whether the FP regs were dirtied while in the host-side run loop: */
> @@ -329,6 +331,8 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
>  void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
>  				    struct kvm_vcpu *vcpu)
>  {
> +	struct user_fpsimd_state *host_fpsimd = vcpu->arch.host_fpsimd_state;
> +
>  	if (has_vhe())
>  		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
>  			     cpacr_el1);
> @@ -339,7 +343,21 @@ void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
>  	isb();
>
>  	if (vcpu->arch.flags & KVM_ARM64_FP_HOST) {
> -		__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
> +		/*
> +		 * In the SVE case, VHE is assumed: it is enforced by
> +		 * Kconfig and kvm_arch_init().
> +		 */
> +		if (system_supports_sve() &&
> +		    (vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE)) {
> +			struct thread_struct *thread = container_of(
> +				host_fpsimd,
> +				struct thread_struct, uw.fpsimd_state);
> +
> +			sve_save_state(sve_pffr(thread), &host_fpsimd->fpsr);
> +		} else {
> +			__fpsimd_save_state(host_fpsimd);
> +		}
> +
>  		vcpu->arch.flags &= ~KVM_ARM64_FP_HOST;
>  	}
>
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index bee226c..ce7c6f3 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -16,6 +16,7 @@
>   * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>   */
>
> +#include <linux/bug.h>
>  #include <linux/cpu_pm.h>
>  #include <linux/errno.h>
>  #include <linux/err.h>
> @@ -41,6 +42,7 @@
>  #include <asm/mman.h>
>  #include <asm/tlbflush.h>
>  #include <asm/cacheflush.h>
> +#include <asm/cpufeature.h>
>  #include <asm/virt.h>
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
> @@ -1574,6 +1576,11 @@ int kvm_arch_init(void *opaque)
>  		return -ENODEV;
>  	}
>
> +	if (!kvm_arch_check_sve_has_vhe()) {
> +		kvm_pr_unimpl("SVE system without VHE unsupported.  Broken cpu?");
> +		return -ENODEV;
> +	}
> +

Ahh this is going to be a pain when people want to enable system
emulation for SVE in QEMU given our patchy feature implementation (i.e.
we haven't done VHE yet). However that's totally our problem not yours
;-)

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


>  	for_each_online_cpu(cpu) {
>  		smp_call_function_single(cpu, check_kvm_target_cpu, &ret, 1);
>  		if (ret < 0) {


--
Alex Bennée

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 14/18] KVM: arm64: Save host SVE context as appropriate
@ 2018-05-24 14:49     ` Alex Bennée
  0 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24 14:49 UTC (permalink / raw)
  To: linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> This patch adds SVE context saving to the hyp FPSIMD context switch
> path.  This means that it is no longer necessary to save the host
> SVE state in advance of entering the guest, when in use.
>
> In order to avoid adding pointless complexity to the code, VHE is
> assumed if SVE is in use.  VHE is an architectural prerequisite for
> SVE, so there is no good reason to turn CONFIG_ARM64_VHE off in
> kernels that support both SVE and KVM.
>
> Historically, software models exist that can expose the
> architecturally invalid configuration of SVE without VHE, so if
> this situation is detected at kvm_init() time then KVM will be
> disabled.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>
> ---
>
>  * Tags stripped since v8, please reconfirm if possible:
>
> Formerly-Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
> Formerly-Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Formerly-Acked-by: Catalin Marinas <catalin.marinas@arm.com>
>
> Changes since v9:
>
> Requested by Marc Zyngier:
>
>  * Inline check for VHE if SVE is present into kvm_host.h.
>
>    The check has been renamed to the more specific
>    kvm_arch_check_sve_has_vhe(), and the kvm_pr_unimpl() moved back to
>    arm.c (to avoid circular include issues).
>
>    arm.c is not single-arch code, but it is all Arm-specific, so
>    adding a hook like this doesn't seem too unreasonable.
>
> Changes since v8:
>
>  * Add kvm_arch_check_supported() hook, and move arm64-specific check
>    for SVE-implies-VHE into arch/arm64/.
>
>    Due to circular header dependency problems, it is difficult to get
>    the prototype for kvm_pr_*() functions in <asm/kvm_host.h>, so this
>    patch puts arm64's kvm_arch_check_supported() hook out of line.
>    This is not a hot function.
> ---
>  arch/arm/include/asm/kvm_host.h   |  1 +
>  arch/arm64/Kconfig                |  7 +++++++
>  arch/arm64/include/asm/kvm_host.h | 13 +++++++++++++
>  arch/arm64/kvm/fpsimd.c           |  1 -
>  arch/arm64/kvm/hyp/switch.c       | 20 +++++++++++++++++++-
>  virt/kvm/arm/arm.c                |  7 +++++++
>  6 files changed, 47 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index ac870b2..3b85bbb 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -280,6 +280,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>
> +static inline bool kvm_arch_check_sve_has_vhe(void) { return true; }
>  static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index eb2cf49..b0d3820 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1130,6 +1130,7 @@ endmenu
>  config ARM64_SVE
>  	bool "ARM Scalable Vector Extension support"
>  	default y
> +	depends on !KVM || ARM64_VHE
>  	help
>  	  The Scalable Vector Extension (SVE) is an extension to the AArch64
>  	  execution state which complements and extends the SIMD functionality
> @@ -1155,6 +1156,12 @@ config ARM64_SVE
>  	  booting the kernel.  If unsure and you are not observing these
>  	  symptoms, you should assume that it is safe to say Y.
>
> +	  CPUs that support SVE are architecturally required to support the
> +	  Virtualization Host Extensions (VHE), so the kernel makes no
> +	  provision for supporting SVE alongside KVM without VHE enabled.
> +	  Thus, you will need to enable CONFIG_ARM64_VHE if you want to support
> +	  KVM in the same kernel image.
> +
>  config ARM64_MODULE_PLTS
>  	bool
>  	select HAVE_MOD_ARCH_SPECIFIC
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index b3fe730..06d5a61 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -405,6 +405,19 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
>  	kvm_call_hyp(__kvm_set_tpidr_el2, tpidr_el2);
>  }
>
> +static inline bool kvm_arch_check_sve_has_vhe(void)
> +{
> +	/*
> +	 * The Arm architecture specifies that implementation of SVE
> +	 * requires VHE also to be implemented.  The KVM code for arm64
> +	 * relies on this when SVE is present:
> +	 */
> +	if (system_supports_sve())
> +		return has_vhe();
> +	else
> +		return true;
> +}
> +
>  static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> index 365933a..dc6ecfa 100644
> --- a/arch/arm64/kvm/fpsimd.c
> +++ b/arch/arm64/kvm/fpsimd.c
> @@ -59,7 +59,6 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
>   */
>  void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
>  {
> -	BUG_ON(system_supports_sve());
>  	BUG_ON(!current->mm);
>
>  	vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED | KVM_ARM64_HOST_SVE_IN_USE);
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 118f300..a6a8c7d 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -21,6 +21,7 @@
>
>  #include <kvm/arm_psci.h>
>
> +#include <asm/cpufeature.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_host.h>
> @@ -28,6 +29,7 @@
>  #include <asm/kvm_mmu.h>
>  #include <asm/fpsimd.h>
>  #include <asm/debug-monitors.h>
> +#include <asm/processor.h>
>  #include <asm/thread_info.h>
>
>  /* Check whether the FP regs were dirtied while in the host-side run loop: */
> @@ -329,6 +331,8 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
>  void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
>  				    struct kvm_vcpu *vcpu)
>  {
> +	struct user_fpsimd_state *host_fpsimd = vcpu->arch.host_fpsimd_state;
> +
>  	if (has_vhe())
>  		write_sysreg(read_sysreg(cpacr_el1) | CPACR_EL1_FPEN,
>  			     cpacr_el1);
> @@ -339,7 +343,21 @@ void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
>  	isb();
>
>  	if (vcpu->arch.flags & KVM_ARM64_FP_HOST) {
> -		__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
> +		/*
> +		 * In the SVE case, VHE is assumed: it is enforced by
> +		 * Kconfig and kvm_arch_init().
> +		 */
> +		if (system_supports_sve() &&
> +		    (vcpu->arch.flags & KVM_ARM64_HOST_SVE_IN_USE)) {
> +			struct thread_struct *thread = container_of(
> +				host_fpsimd,
> +				struct thread_struct, uw.fpsimd_state);
> +
> +			sve_save_state(sve_pffr(thread), &host_fpsimd->fpsr);
> +		} else {
> +			__fpsimd_save_state(host_fpsimd);
> +		}
> +
>  		vcpu->arch.flags &= ~KVM_ARM64_FP_HOST;
>  	}
>
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index bee226c..ce7c6f3 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -16,6 +16,7 @@
>   * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
>   */
>
> +#include <linux/bug.h>
>  #include <linux/cpu_pm.h>
>  #include <linux/errno.h>
>  #include <linux/err.h>
> @@ -41,6 +42,7 @@
>  #include <asm/mman.h>
>  #include <asm/tlbflush.h>
>  #include <asm/cacheflush.h>
> +#include <asm/cpufeature.h>
>  #include <asm/virt.h>
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
> @@ -1574,6 +1576,11 @@ int kvm_arch_init(void *opaque)
>  		return -ENODEV;
>  	}
>
> +	if (!kvm_arch_check_sve_has_vhe()) {
> +		kvm_pr_unimpl("SVE system without VHE unsupported.  Broken cpu?");
> +		return -ENODEV;
> +	}
> +

Ahh this is going to be a pain when people want to enable system
emulation for SVE in QEMU given our patchy feature implementation (i.e.
we haven't done VHE yet). However that's totally our problem not yours
;-)

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


>  	for_each_online_cpu(cpu) {
>  		smp_call_function_single(cpu, check_kvm_target_cpu, &ret, 1);
>  		if (ret < 0) {


--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 15/18] KVM: arm64: Remove eager host SVE state saving
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-24 14:54     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24 14:54 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> Now that the host SVE context can be saved on demand from Hyp,
> there is no longer any need to save this state in advance before
> entering the guest.
>
> This patch removes the relevant call to
> kvm_fpsimd_flush_cpu_state().
>
> Since the problem that function was intended to solve now no longer
> exists, the function and its dependencies are also deleted.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Christoffer Dall <christoffer.dall@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  arch/arm/include/asm/kvm_host.h   |  3 ---
>  arch/arm64/include/asm/kvm_host.h | 10 ----------
>  arch/arm64/kernel/fpsimd.c        | 21 ---------------------
>  virt/kvm/arm/arm.c                |  3 ---
>  4 files changed, 37 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 3b85bbb..f079a20 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -312,9 +312,6 @@ static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
>
> -/* All host FP/SIMD state is restored on guest exit, so nothing to save: */
> -static inline void kvm_fpsimd_flush_cpu_state(void) {}
> -
>  static inline void kvm_arm_vhe_guest_enter(void) {}
>  static inline void kvm_arm_vhe_guest_exit(void) {}
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 06d5a61..ce7ed92 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -457,16 +457,6 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
>  }
>  #endif
>
> -/*
> - * All host FP/SIMD state is restored on guest exit, so nothing needs
> - * doing here except in the SVE case:
> -*/
> -static inline void kvm_fpsimd_flush_cpu_state(void)
> -{
> -	if (system_supports_sve())
> -		sve_flush_cpu_state();
> -}
> -
>  static inline void kvm_arm_vhe_guest_enter(void)
>  {
>  	local_daif_mask();
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index f39d3b0..ea5d780 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -120,7 +120,6 @@
>   */
>  struct fpsimd_last_state_struct {
>  	struct user_fpsimd_state *st;
> -	bool sve_in_use;
>  };
>
>  static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
> @@ -1003,7 +1002,6 @@ void fpsimd_bind_task_to_cpu(void)
>  		this_cpu_ptr(&fpsimd_last_state);
>
>  	last->st = &current->thread.uw.fpsimd_state;
> -	last->sve_in_use = test_thread_flag(TIF_SVE);
>  	current->thread.fpsimd_cpu = smp_processor_id();
>
>  	if (system_supports_sve()) {
> @@ -1025,7 +1023,6 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
>  	WARN_ON(!in_softirq() && !irqs_disabled());
>
>  	last->st = st;
> -	last->sve_in_use = false;
>  }
>
>  /*
> @@ -1086,24 +1083,6 @@ void fpsimd_flush_cpu_state(void)
>  	set_thread_flag(TIF_FOREIGN_FPSTATE);
>  }
>
> -/*
> - * Invalidate any task SVE state currently held in this CPU's regs.
> - *
> - * This is used to prevent the kernel from trying to reuse SVE register data
> - * that is destroyed by KVM guest enter/exit.  This function should go away when
> - * KVM SVE support is implemented.  Don't use it for anything else.
> - */
> -#ifdef CONFIG_ARM64_SVE
> -void sve_flush_cpu_state(void)
> -{
> -	struct fpsimd_last_state_struct const *last =
> -		this_cpu_ptr(&fpsimd_last_state);
> -
> -	if (last->st && last->sve_in_use)
> -		fpsimd_flush_cpu_state();
> -}
> -#endif /* CONFIG_ARM64_SVE */
> -
>  #ifdef CONFIG_KERNEL_MODE_NEON
>
>  DEFINE_PER_CPU(bool, kernel_neon_busy);
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index ce7c6f3..39e7771 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -682,9 +682,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 */
>  		preempt_disable();
>
> -		/* Flush FP/SIMD state that can't survive guest entry/exit */
> -		kvm_fpsimd_flush_cpu_state();
> -
>  		kvm_pmu_flush_hwstate(vcpu);
>
>  		local_irq_disable();


--
Alex Bennée

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 15/18] KVM: arm64: Remove eager host SVE state saving
@ 2018-05-24 14:54     ` Alex Bennée
  0 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24 14:54 UTC (permalink / raw)
  To: linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> Now that the host SVE context can be saved on demand from Hyp,
> there is no longer any need to save this state in advance before
> entering the guest.
>
> This patch removes the relevant call to
> kvm_fpsimd_flush_cpu_state().
>
> Since the problem that function was intended to solve now no longer
> exists, the function and its dependencies are also deleted.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Christoffer Dall <christoffer.dall@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  arch/arm/include/asm/kvm_host.h   |  3 ---
>  arch/arm64/include/asm/kvm_host.h | 10 ----------
>  arch/arm64/kernel/fpsimd.c        | 21 ---------------------
>  virt/kvm/arm/arm.c                |  3 ---
>  4 files changed, 37 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 3b85bbb..f079a20 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -312,9 +312,6 @@ static inline void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) {}
>
> -/* All host FP/SIMD state is restored on guest exit, so nothing to save: */
> -static inline void kvm_fpsimd_flush_cpu_state(void) {}
> -
>  static inline void kvm_arm_vhe_guest_enter(void) {}
>  static inline void kvm_arm_vhe_guest_exit(void) {}
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 06d5a61..ce7ed92 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -457,16 +457,6 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
>  }
>  #endif
>
> -/*
> - * All host FP/SIMD state is restored on guest exit, so nothing needs
> - * doing here except in the SVE case:
> -*/
> -static inline void kvm_fpsimd_flush_cpu_state(void)
> -{
> -	if (system_supports_sve())
> -		sve_flush_cpu_state();
> -}
> -
>  static inline void kvm_arm_vhe_guest_enter(void)
>  {
>  	local_daif_mask();
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index f39d3b0..ea5d780 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -120,7 +120,6 @@
>   */
>  struct fpsimd_last_state_struct {
>  	struct user_fpsimd_state *st;
> -	bool sve_in_use;
>  };
>
>  static DEFINE_PER_CPU(struct fpsimd_last_state_struct, fpsimd_last_state);
> @@ -1003,7 +1002,6 @@ void fpsimd_bind_task_to_cpu(void)
>  		this_cpu_ptr(&fpsimd_last_state);
>
>  	last->st = &current->thread.uw.fpsimd_state;
> -	last->sve_in_use = test_thread_flag(TIF_SVE);
>  	current->thread.fpsimd_cpu = smp_processor_id();
>
>  	if (system_supports_sve()) {
> @@ -1025,7 +1023,6 @@ void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *st)
>  	WARN_ON(!in_softirq() && !irqs_disabled());
>
>  	last->st = st;
> -	last->sve_in_use = false;
>  }
>
>  /*
> @@ -1086,24 +1083,6 @@ void fpsimd_flush_cpu_state(void)
>  	set_thread_flag(TIF_FOREIGN_FPSTATE);
>  }
>
> -/*
> - * Invalidate any task SVE state currently held in this CPU's regs.
> - *
> - * This is used to prevent the kernel from trying to reuse SVE register data
> - * that is destroyed by KVM guest enter/exit.  This function should go away when
> - * KVM SVE support is implemented.  Don't use it for anything else.
> - */
> -#ifdef CONFIG_ARM64_SVE
> -void sve_flush_cpu_state(void)
> -{
> -	struct fpsimd_last_state_struct const *last =
> -		this_cpu_ptr(&fpsimd_last_state);
> -
> -	if (last->st && last->sve_in_use)
> -		fpsimd_flush_cpu_state();
> -}
> -#endif /* CONFIG_ARM64_SVE */
> -
>  #ifdef CONFIG_KERNEL_MODE_NEON
>
>  DEFINE_PER_CPU(bool, kernel_neon_busy);
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index ce7c6f3..39e7771 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -682,9 +682,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 */
>  		preempt_disable();
>
> -		/* Flush FP/SIMD state that can't survive guest entry/exit */
> -		kvm_fpsimd_flush_cpu_state();
> -
>  		kvm_pmu_flush_hwstate(vcpu);
>
>  		local_irq_disable();


--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 16/18] KVM: arm64: Remove redundant *exit_code changes in fpsimd_guest_exit()
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-24 15:02     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24 15:02 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> In fixup_guest_exit(), there are a couple of cases where after
> checking what the exit code was, we assign it explicitly with the
> value it already had.
>
> Assuming this is not indicative of a bug, these assignments are not
> needed.
>
> This patch removes the redundant assignments, and simplifies some
> if-nesting that becomes trivial as a result.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 16/18] KVM: arm64: Remove redundant *exit_code changes in fpsimd_guest_exit()
@ 2018-05-24 15:02     ` Alex Bennée
  0 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24 15:02 UTC (permalink / raw)
  To: linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> In fixup_guest_exit(), there are a couple of cases where after
> checking what the exit code was, we assign it explicitly with the
> value it already had.
>
> Assuming this is not indicative of a bug, these assignments are not
> needed.
>
> This patch removes the redundant assignments, and simplifies some
> if-nesting that becomes trivial as a result.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Acked-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 17/18] KVM: arm64: Fold redundant exit code checks out of fixup_guest_exit()
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-24 15:06     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24 15:06 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> The entire tail of fixup_guest_exit() is contained in if statements
> of the form if (x && *exit_code == ARM_EXCEPTION_TRAP).  As a result,
> we can check just once and bail out of the function early, allowing
> the remaining if conditions to be simplified.
>
> The only awkward case is where *exit_code is changed to
> ARM_EXCEPTION_EL1_SERROR in the case of an illegal GICv2 CPU
> interface access: in that case, the GICv3 trap handling code is
> skipped using a goto.  This avoids pointlessly evaluating the
> static branch check for the GICv3 case, even though we can't have
> vgic_v2_cpuif_trap and vgic_v3_cpuif_trap true simultaneously
> unless we have a GICv3 and GICv2 on the host: that sounds stupid,
> but I haven't satisfied myself that it can't happen.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  arch/arm64/kvm/hyp/switch.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 18d0faa..4fbee95 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -387,11 +387,13 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  	 * same PC once the SError has been injected, and replay the
>  	 * trapping instruction.
>  	 */
> -	if (*exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
> +	if (*exit_code != ARM_EXCEPTION_TRAP)
> +		goto exit;
> +
> +	if (!__populate_fault_info(vcpu))
>  		return true;
>
> -	if (static_branch_unlikely(&vgic_v2_cpuif_trap) &&
> -	    *exit_code == ARM_EXCEPTION_TRAP) {
> +	if (static_branch_unlikely(&vgic_v2_cpuif_trap)) {
>  		bool valid;
>
>  		valid = kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_DABT_LOW &&
> @@ -417,11 +419,12 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  					*vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
>  				*exit_code = ARM_EXCEPTION_EL1_SERROR;
>  			}
> +
> +			goto exit;
>  		}
>  	}
>
>  	if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
> -	    *exit_code == ARM_EXCEPTION_TRAP &&
>  	    (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 ||
>  	     kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
>  		int ret = __vgic_v3_perform_cpuif_access(vcpu);
> @@ -430,6 +433,7 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  			return true;
>  	}
>
> +exit:
>  	/* Return to the host kernel and handle the exit */
>  	return false;
>  }


--
Alex Bennée

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH v10 17/18] KVM: arm64: Fold redundant exit code checks out of fixup_guest_exit()
@ 2018-05-24 15:06     ` Alex Bennée
  0 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24 15:06 UTC (permalink / raw)
  To: linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> The entire tail of fixup_guest_exit() is contained in if statements
> of the form if (x && *exit_code == ARM_EXCEPTION_TRAP).  As a result,
> we can check just once and bail out of the function early, allowing
> the remaining if conditions to be simplified.
>
> The only awkward case is where *exit_code is changed to
> ARM_EXCEPTION_EL1_SERROR in the case of an illegal GICv2 CPU
> interface access: in that case, the GICv3 trap handling code is
> skipped using a goto.  This avoids pointlessly evaluating the
> static branch check for the GICv3 case, even though we can't have
> vgic_v2_cpuif_trap and vgic_v3_cpuif_trap true simultaneously
> unless we have a GICv3 and GICv2 on the host: that sounds stupid,
> but I haven't satisfied myself that it can't happen.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  arch/arm64/kvm/hyp/switch.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 18d0faa..4fbee95 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -387,11 +387,13 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  	 * same PC once the SError has been injected, and replay the
>  	 * trapping instruction.
>  	 */
> -	if (*exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
> +	if (*exit_code != ARM_EXCEPTION_TRAP)
> +		goto exit;
> +
> +	if (!__populate_fault_info(vcpu))
>  		return true;
>
> -	if (static_branch_unlikely(&vgic_v2_cpuif_trap) &&
> -	    *exit_code == ARM_EXCEPTION_TRAP) {
> +	if (static_branch_unlikely(&vgic_v2_cpuif_trap)) {
>  		bool valid;
>
>  		valid = kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_DABT_LOW &&
> @@ -417,11 +419,12 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  					*vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
>  				*exit_code = ARM_EXCEPTION_EL1_SERROR;
>  			}
> +
> +			goto exit;
>  		}
>  	}
>
>  	if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
> -	    *exit_code == ARM_EXCEPTION_TRAP &&
>  	    (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 ||
>  	     kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
>  		int ret = __vgic_v3_perform_cpuif_access(vcpu);
> @@ -430,6 +433,7 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  			return true;
>  	}
>
> +exit:
>  	/* Return to the host kernel and handle the exit */
>  	return false;
>  }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 18/18] KVM: arm64: Invoke FPSIMD context switch trap from C
  2018-05-22 16:05   ` Dave Martin
@ 2018-05-24 15:09     ` Alex Bennée
  -1 siblings, 0 replies; 138+ messages in thread
From: Alex Bennée @ 2018-05-24 15:09 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel


Dave Martin <Dave.Martin@arm.com> writes:

> The conversion of the FPSIMD context switch trap code to C has added
> some overhead to calling it, due to the need to save registers that
> the procedure call standard defines as caller-saved.
>
> So, perhaps it is no longer worth invoking this trap handler quite
> so early.
>
> Instead, we can invoke it from fixup_guest_exit(), with little
> likelihood of increasing the overhead much further.
>
> As a convenience, this patch gives __hyp_switch_fpsimd() the same
> return semantics as fixup_guest_exit().  For now there is no
> possibility of a spurious FPSIMD trap, so the function always
> returns true, but this allows it to be tail-called with a single
> return statement.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  arch/arm64/kvm/hyp/entry.S     | 30 ------------------------------
>  arch/arm64/kvm/hyp/hyp-entry.S | 19 -------------------
>  arch/arm64/kvm/hyp/switch.c    | 15 +++++++++++++--
>  3 files changed, 13 insertions(+), 51 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index 40f349b..fad1e16 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -166,33 +166,3 @@ abort_guest_exit_end:
>  	orr	x0, x0, x5
>  1:	ret
>  ENDPROC(__guest_exit)
> -
> -ENTRY(__fpsimd_guest_restore)
> -	// x0: esr
> -	// x1: vcpu
> -	// x2-x29,lr: vcpu regs
> -	// vcpu x0-x1 on the stack
> -	stp	x2, x3, [sp, #-144]!
> -	stp	x4, x5, [sp, #16]
> -	stp	x6, x7, [sp, #32]
> -	stp	x8, x9, [sp, #48]
> -	stp	x10, x11, [sp, #64]
> -	stp	x12, x13, [sp, #80]
> -	stp	x14, x15, [sp, #96]
> -	stp	x16, x17, [sp, #112]
> -	stp	x18, lr, [sp, #128]
> -
> -	bl	__hyp_switch_fpsimd
> -
> -	ldp	x4, x5, [sp, #16]
> -	ldp	x6, x7, [sp, #32]
> -	ldp	x8, x9, [sp, #48]
> -	ldp	x10, x11, [sp, #64]
> -	ldp	x12, x13, [sp, #80]
> -	ldp	x14, x15, [sp, #96]
> -	ldp	x16, x17, [sp, #112]
> -	ldp	x18, lr, [sp, #128]
> -	ldp	x0, x1, [sp, #144]
> -	ldp	x2, x3, [sp], #160
> -	eret
> -ENDPROC(__fpsimd_guest_restore)
> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
> index bffece2..753b9d2 100644
> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> @@ -113,25 +113,6 @@ el1_hvc_guest:
>
>  el1_trap:
>  	get_vcpu_ptr	x1, x0
> -
> -	mrs		x0, esr_el2
> -	lsr		x0, x0, #ESR_ELx_EC_SHIFT
> -	/*
> -	 * x0: ESR_EC
> -	 * x1: vcpu pointer
> -	 */
> -
> -	/*
> -	 * We trap the first access to the FP/SIMD to save the host context
> -	 * and restore the guest context lazily.
> -	 * If FP/SIMD is not implemented, handle the trap and inject an
> -	 * undefined instruction exception to the guest.
> -	 */
> -alternative_if_not ARM64_HAS_NO_FPSIMD
> -	cmp	x0, #ESR_ELx_EC_FP_ASIMD
> -	b.eq	__fpsimd_guest_restore
> -alternative_else_nop_endif
> -
>  	mov	x0, #ARM_EXCEPTION_TRAP
>  	b	__guest_exit
>
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 4fbee95..2d45bd7 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -328,8 +328,7 @@ static bool __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
>  	}
>  }
>
> -void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
> -				    struct kvm_vcpu *vcpu)
> +static bool __hyp_text __hyp_switch_fpsimd(struct kvm_vcpu *vcpu)
>  {
>  	struct user_fpsimd_state *host_fpsimd = vcpu->arch.host_fpsimd_state;
>
> @@ -369,6 +368,8 @@ void __hyp_text __hyp_switch_fpsimd(u64 esr __always_unused,
>  			     fpexc32_el2);
>
>  	vcpu->arch.flags |= KVM_ARM64_FP_ENABLED;
> +
> +	return true;
>  }
>
>  /*
> @@ -390,6 +391,16 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  	if (*exit_code != ARM_EXCEPTION_TRAP)
>  		goto exit;
>
> +	/*
> +	 * We trap the first access to the FP/SIMD to save the host context
> +	 * and restore the guest context lazily.
> +	 * If FP/SIMD is not implemented, handle the trap and inject an
> +	 * undefined instruction exception to the guest.
> +	 */
> +	if (system_supports_fpsimd() &&
> +	    kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_FP_ASIMD)
> +		return __hyp_switch_fpsimd(vcpu);
> +
>  	if (!__populate_fault_info(vcpu))
>  		return true;


--
Alex Bennée
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 138+ messages in thread
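
For reference, the C check added by the switch.c hunk above relies on the
vcpu's cached ESR rather than re-reading ESR_EL2 the way the deleted
el1_trap assembly did.  A rough sketch of the helper it leans on follows;
the body is paraphrased for illustration, not copied from the kernel tree:

/*
 * Illustrative sketch only: approximate shape of kvm_vcpu_trap_get_class()
 * as used by the new check in fixup_guest_exit().  The real definition
 * lives in arch/arm64/include/asm/kvm_emulate.h.
 */
static inline u8 kvm_vcpu_trap_get_class(const struct kvm_vcpu *vcpu)
{
	/* EC field of the ESR_EL2 value saved at guest exit */
	return ESR_ELx_EC(vcpu->arch.fault.esr_el2);
}

/*
 * The removed el1_trap assembly performed the same extraction by hand:
 *
 *	mrs	x0, esr_el2
 *	lsr	x0, x0, #ESR_ELx_EC_SHIFT
 */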

* Re: [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-24 14:37                   ` Dave Martin
@ 2018-05-25  9:00                     ` Christoffer Dall
  -1 siblings, 0 replies; 138+ messages in thread
From: Christoffer Dall @ 2018-05-25  9:00 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Thu, May 24, 2018 at 03:37:15PM +0100, Dave Martin wrote:
> On Thu, May 24, 2018 at 12:06:59PM +0200, Christoffer Dall wrote:
> > On Thu, May 24, 2018 at 10:50:56AM +0100, Dave Martin wrote:
> > > On Thu, May 24, 2018 at 10:33:50AM +0200, Christoffer Dall wrote:
> 
> [...]
> 
> > > > ...with a risk of being a bit over-pedantic and annoying, may I suggest
> > > > the following complete commit text:
> > > > 
> > > > ------8<------
> > > > Currently the FPSIMD handling code uses the condition task->mm ==
> > > > NULL as a hint that task has no FPSIMD register context.
> > > > 
> > > > The ->mm check is only there to filter out tasks that cannot
> > > > possibly have FPSIMD context loaded, for optimisation purposes.
> > > > However, TIF_FOREIGN_FPSTATE must always be checked anyway before
> > > > saving FPSIMD context back to memory.  For this reason, the ->mm
> > > > checks are not useful, providing that TIF_FOREIGN_FPSTATE is
> > > > maintained properly for kernel threads.
> > > > 
> > > > FPSIMD context is never preserved for kernel threads across a context
> > > > switch and therefore TIF_FOREIGN_FPSTATE should always be true for
> > > 
> > > (This refactoring opens up the interesting possibility of making
> > > kernel-mode NEON in task context preemptible for kernel threads so
> > > that we actually do preserve state... but that's a discussion for
> > > another day.  There may be code around that relies on
> > > kernel_neon_begin() disabling preemption for real.)
> > > 
> > > > kernel threads.  This is indeed the case, as the wrong_task and
> > > 
> > > This suggests that TIF_FOREIGN_FPSTATE is always true for kernel
> > > threads today.  This is not quite because use_mm() can make mm non-
> > > NULL.
> > > 
> > 
> > I was suggesting that it's always true after this patch.
> 
> I tend to read the present tense as describing the situation before the
> patch, but this convention isn't followed universally.
> 
> This was part of the problem with my "true by construction" weasel
> words: the described property wasn't true by construction prior to the
> patch, and there wasn't sufficient explanation to convince people it's
> true afterwards.  If people are being rigorous, it takes a _lot_ of
> explanation...
> 
> > 
> > > > wrong_cpu tests in fpsimd_thread_switch() will always yield false for
> > > > kernel threads.
> > > 
> > > ("false" -> "true".  My bad.)
> > > 
> > > > Further, the context switch logic is already deliberately optimised to
> > > > defer reloads of the FPSIMD context until ret_to_user (or sigreturn as a
> > > > special case), which kernel threads by definition never reach, and
> > > > therefore this change introduces no additional work in the critical
> > > > path.
> > > > 
> > > > This patch removes the redundant checks and special-case code.
> > > > ------8<------
> > > 
> > > Looking at my existing text, I rather reworded it like this.
> > > Does this work any better for you?
> > > 
> > > --8<--
> > > 
> > > Currently the FPSIMD handling code uses the condition task->mm ==
> > > NULL as a hint that task has no FPSIMD register context.
> > > 
> > > The ->mm check is only there to filter out tasks that cannot
> > > possibly have FPSIMD context loaded, for optimisation purposes.
> > > Also, TIF_FOREIGN_FPSTATE must always be checked anyway before
> > > saving FPSIMD context back to memory.  For these reasons, the ->mm
> > > checks are not useful, providing that TIF_FOREIGN_FPSTATE is
> > > maintained in a consistent way for kernel threads.
> > 
> > Consistent with what?  Without more context or explanation,
> 
> Consistent with the handling of user threads (though I admit it's not
> explicit in the text.)
> 
> > I'm not sure what the reader is to make of that.  Do you not mean the
> > TIF_FOREIGN_FPSTATE is always true for kernel threads?
> 
> Again, this is probably a red herring.  TIF_FOREIGN_FPSTATE is always
> true for kernel threads prior to the patch, except (randomly) for the
> init task.

That was really what my initial question was about, and what I thought
the commit message should make abundantly clear, because that ties the
message together with the code.

> 
> This change is not really about TIF_FOREIGN_FPSTATE at all, rather
> that there is nothing to justify handling kernel threads differently,
> or even distinguishing kernel threads from user threads at all in this
> code.

Understood.

> 
> Part of the confusion (and I had confused myself) comes from the fact
> that TIF_FOREIGN_FPSTATE is really a per-cpu property and doesn't make
> sense as a per-task property -- i.e., the flag is meaningless for
> scheduled-out tasks and we must explicitly "repair" it when scheduling
> a task in anyway.  I think it's a thread flag primarily so that it's
> convenient to check alongside other thread flags in the ret_to_user
> work loop.  This is somewhat less of a justification now that the loop was
> ported to C.
> 
> > > 
> > > The context switch logic is already deliberately optimised to defer
> > > reloads of the regs until ret_to_user (or sigreturn as a special
> > > case), and save them only if they have been previously loaded.
> 
> Does it help to insert the following here?
> 
> "These paths are the only places where the wrong_task and wrong_cpu
> conditions can be made false, by calling fpsimd_bind_task_to_cpu()."
> 

yes it does.

> > > Kernel threads by definition never reach these paths.  As a result,
> > 
> > I'm struggling with the "As a result," here.  Is this because reloads of
> > regs in ret_to_user (or sigreturn) are the only places that can make
> > wrong_cpu or wrong_task be false?
> 
> See the proposed clarification above.  Is that sufficient?
> 

yes.

> > (I'm actually wanting to understand this, not just bikeshedding the
> > commit message, as new corner cases keep coming up on this logic.)
> 
> That's a good thing, and I would really like to explain it in a
> concise manner.  See [*] below for the "concise" explanation -- it may
> demonstrate why I've been evasive...
> 

I don't think you've been evasive at all, I just think we reason about
this in slightly different ways, and I was trying to convince myself why
this change is safe and summarize that concisely.  I think we've
accomplished both :)

> > > the wrong_task and wrong_cpu tests in fpsimd_thread_switch() will
> > > always yield true for kernel threads.
> > > 
> > > This patch removes the redundant checks and special-case code,                  ensuring that TIF_FOREIGN_FPSTATE is set whenever a kernel thread               is scheduled in, and ensures that this flag is set for the init
> > > task.  The fpsimd_flush_task_state() call already present in                    copy_thread() ensures the same for any new task.
> > 
> > nit: funny formatting
> 
> Dang, I was repeatedly pasting between Mutt and git commit terminals,
> which doesn't always work as I'd like...
> 
> > nit: ensuring that TIF_FOREIGN_FPSTATE *remains* set whenever a kernel
> > thread is scheduled in?
> 
> Er, yes.
> 
> > > With TIF_FOREIGN_FPSTATE always set for kernel threads, this patch
> > > ensures that no extra context save work is added for kernel
> > > threads, and eliminates the redundant context saving that may
> > > currently occur for kernel threads that have acquired an mm via
> > > use_mm().
> > > 
> > > -->8--
> > 
> > If you can slightly connect the dots with the "As a result" above, I'm
> > fine with your version of the text.
> 
> 
> As an aside, the big wall of text before the definition of struct
> fpsimd_last_state_struct is looking out of date and could use an
> update to cover at least some of what is explained in [*] better.
> 
> I'm currently considering that out of scope for this series, but I will
> keep it in mind to refresh it in the not too distant future.
> 

Fine with me.

> 
> Cheers
> ---Dave
> 
> --8<--
> 
> [*] The bigger picture:
> 
> * Consider a relation (C,T) between cpus C and tasks T, such that
>   (C,T) means "T's FPSIMD regs are loaded on cpu C".
> 
>   At a given point of execution of some cpu C, there is at most one task
>   T for which (C,T) holds.
>  
>   At a given point of execution of some task T, there is at most one
>   cpu C for which (C,T) holds.
> 
> * (C,T) becomes true whenever T's registers are loaded into cpu C.
> 
> * At sched-out, we must ensure that the registers of current are
>   loaded before writing them to current's thread_struct.  Thus, we
>   must save the registers if and only if (smp_processor_id(), current)
>   holds at this time.
> 
> * Before entering userspace, we must ensure that current's regs
>   are loaded, and we must only load the regs if they are not loaded
>   already (since if so, they might have been dirtied by current in
>   userspace since last loaded).
> 
>   Thus, when entering userspace, we must load the regs from memory
>   if and only if (smp_processor_id(), current) does not hold.
> 
> * Checking this relation involves per-CPU access and inspection of
>   current->thread, and was presumably considered too cumbersome for
>   implementation in entry.S, particularly in the ret_to_user work
>   pending loop (which is where the FPSIMD regs are finally loaded
>   before entering userspace, if they weren't loaded already).
> 
>   To mitigate this, the status of the check is cached in a thread flag
>   TIF_FOREIGN_FPSTATE: with softirqs disabled, (smp_processor_id(),
>   current) holds if and only if TIF_FOREIGN_FPSTATE is false.
>   TIF_FOREIGN_FPSTATE is corrected on sched-in by the code in
>   fpsimd_thread_switch().
> 
> [2] Anything that changes the state of the relation for current
>   requires its TIF_FOREIGN_FPSTATE to be changed to match.
> 
> * (smp_processor_id(), current) is established in
>   fpsimd_bind_task_to_cpu().  This is the only way the relation can be
>   made to hold between a task and a CPU.
> 
> * (C,T) is broken whenever
> 
> [1] T is created;
> 
>   * T's regs are loaded onto a different cpu C2, so (C2,T) becomes
>     true and (C,T) necessarily becomes false;
> 
>   * another task's regs are loaded into C, so (C,T2) becomes true
>     and (C,T) necessarily becomes false;
> 
>   * the kernel clobbers the regs on C for its own purposes, so
>     (C,T) becomes false but there is no T2 for which (C,T2) becomes
>     true as a result.  Examples are kernel-mode NEON and loading
>     the regs for a KVM vcpu;
> 
>   * T's register context changes via a thread_struct update instead
>     of running instructions in userspace, requiring the contents of
>     the hardware regs to be thrown away.  Examples are exec() (which
>     requires the registers to be zeroed), sigreturn (which populates the
>     regs from the user signal frame) and modification of the registers
>     via PTRACE_SETREGSET;
> 
>     As a (probably unnecessary) optimisation, sigreturn immediately
>     loads the registers and reestablishes (smp_processor_id(), current)
>     in anticipation of the return to userspace which is likely to
>     occur soon.  This allows the relation breaking logic to be omitted
>     in fpsimd_update_current_state() which does the work.
> 
> * In general, these relation breakings involve an unknown: knowing
>   either C or T but *not* both, we want to break (C,T).  If the
>   relation were recorded in task_struct only, we would need to scan all
>   tasks in the "T unknown" case.  If the relation were recorded in a
>   percpu variable only, we would need to scan all CPUs in the "C
>   unknown" case.  As well as having gnarly synchronisation
>   requirements, these would get expensive in many-tasks or many-cpus
>   situations.
> 
>   This is why the relation is recorded in both places, and is only
>   deemed to hold if the two records match up.  This is what
>   fpsimd_thread_switch() is checking for the task being scheduled in.
> 
>   The invalidation (breaking) operations are now factored as
> 
>   fpsimd_flush_task_state(): falsify (C,current) for every cpu C.
>   This is done by zapping current->thread.fpsimd_cpu with NR_CPUS
>   (chosen because it cannot match smp_processor_id()).
> 
>   fpsimd_flush_cpu_state(): falsify (smp_processor_id(),T) for every
>   task T.  This is done by zapping this_cpu(fpsimd_last_state.st)
>   with NULL (chosen because it cannot match &T->thread.uw.fpsimd_state
>   for any task).
> 
>   By [2] above, it is necessary to ensure that TIF_FOREIGN_FPSTATE is
>   set after calling either of the above functions.  Of the two,
>   fpsimd_flush_cpu_state() now does this implicitly but
>   fpsimd_flush_task_state() does not: but the caller must do it
>   instead.  I have a vague memory of some refactoring obstacle that
>   dissuaded me from pulling the set_thread_flag in, but I can't
>   remember it now.  I may review this later.
> 
> * Because the (C,T) relation may need to be manipulated by
>   kernel_neon_{begin,end}() in softirq context, examining or
>   manipulating for current or the running CPU must be done under
>   local_bh_disable().  The same goes for TIF_FOREIGN_FPSTATE which is
>   supposed to represent the same condition but may spontaneously become
>   stale if softirqs are not masked.  (The rule is not quite as strict
>   as this, but in order to make the code easier to reason about, I skip
>   the local_bh_disable() only where absolutely necessary --
>   restore_sve_fpsimd_context() is the only example today.)
> 
> Now, imagine that T is a kernel thread, and consider what needs to
> be done differently.  The observation of this patch is that nothing
> needs to be done differently at all.
> 
> There is a single anomaly relating to [1] above, in the form of a task
> that can run without ever being scheduled in: the init task.  Beyond
> that, kernel_neon_begin() before the first reschedule would spuriously
> save the FPSIMD regs into the init_task's thread struct, even though it
> is pointless to do so.  This patch fixes those anomalies by updating
> INIT_THREAD and INIT_THREAD_INFO to set up the init task so that it
> looks the same as some other kernel thread that has been scheduled in.
> 
> There is a strong design motivation to avoid unnecessary loads and
> saves of the state, so if removing the special-casing of kernel threads
> were to add cost it would imply that the code were _already_ suboptimal
> for user tasks.  This patch does not attempt to address that at all,
> but by assuming that the code is already well-optimised, "unnecessary"
> save/restore work will not be added.  If this were not the case, it
> could in any case be fixed independently.
> 
> The observation of this _series_ is that we don't need to do very
> much in order to be able to generalise the logic to accept KVM vcpus
> in place of T.
> 

Thanks for the explanation.
-Christoffer

^ permalink raw reply	[flat|nested] 138+ messages in thread
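
To make the (C,T) bookkeeping described above concrete, here is a
simplified sketch of the sched-in check and the two invalidation helpers
in the shape this series gives them.  Names follow the discussion; guards
such as the system_supports_fpsimd() check are elided, so treat this as
illustrative rather than a verbatim copy of fpsimd.c:

void fpsimd_thread_switch(struct task_struct *next)
{
	bool wrong_task, wrong_cpu;

	/* Save current's regs iff (this cpu, current) still holds. */
	fpsimd_save();

	/*
	 * (this cpu, next) holds only if both records still point at each
	 * other; otherwise next must reload its regs from memory before
	 * returning to userspace.
	 */
	wrong_task = __this_cpu_read(fpsimd_last_state.st) !=
					&next->thread.uw.fpsimd_state;
	wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id();

	update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE,
			       wrong_task || wrong_cpu);
}

/* Falsify (C, t) for every cpu C: t's per-task record can no longer match. */
void fpsimd_flush_task_state(struct task_struct *t)
{
	t->thread.fpsimd_cpu = NR_CPUS;
}

/* Falsify (this cpu, T) for every task T, and fix up current's flag. */
static inline void fpsimd_flush_cpu_state(void)
{
	__this_cpu_write(fpsimd_last_state.st, NULL);
	set_thread_flag(TIF_FOREIGN_FPSTATE);
}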

* Re: [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-25  9:00                     ` Christoffer Dall
@ 2018-05-25  9:45                       ` Dave Martin
  -1 siblings, 0 replies; 138+ messages in thread
From: Dave Martin @ 2018-05-25  9:45 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Fri, May 25, 2018 at 11:00:20AM +0200, Christoffer Dall wrote:
> On Thu, May 24, 2018 at 03:37:15PM +0100, Dave Martin wrote:
> > On Thu, May 24, 2018 at 12:06:59PM +0200, Christoffer Dall wrote:
> > > On Thu, May 24, 2018 at 10:50:56AM +0100, Dave Martin wrote:

[...]

> > > I'm not sure what the reader is to make of that.  Do you not mean the
> > > TIF_FOREIGN_FPSTATE is always true for kernel threads?
> > 
> > Again, this is probably a red herring.  TIF_FOREIGN_FPSTATE is always
> > true for kernel threads prior to the patch, except (randomly) for the
> > init task.
> 
> That was really what my initial question was about, and what I thought
> the commit message should make abundantly clear, because that ties the
> message together with the code.
> 
> > 
> > This change is not really about TIF_FOREIGN_FPSTATE at all, rather
> > that there is nothing to justify handling kernel threads differently,
> > or even distinguishing kernel threads from user threads at all in this
> > code.
> 
> Understood.

And my bad was that I hadn't gone to the effort of understanding my own
argument -- I'm glad to be called out on that.

> > Part of the confusion (and I had confused myself) comes from the fact
> > that TIF_FOREIGN_FPSTATE is really a per-cpu property and doesn't make
> > sense as a per-task property -- i.e., the flag is meaningless for
> > scheduled-out tasks and we must explicitly "repair" it when scheduling
> > a task in anyway.  I think it's a thread flag primarily so that it's
> > convenient to check alongside other thread flags in the ret_to_user
> > work loop.  This is somewhat less of a justification now that loop was
> > ported to C.
> > 
> > > > 
> > > > The context switch logic is already deliberately optimised to defer
> > > > reloads of the regs until ret_to_user (or sigreturn as a special
> > > > case), and save them only if they have been previously loaded.
> > 
> > Does it help to insert the following here?
> > 
> > "These paths are the only places where the wrong_task and wrong_cpu
> > conditions can be made false, by calling fpsimd_bind_task_to_cpu()."
> > 
> 
> yes it does.
> 
> > > > Kernel threads by definition never reach these paths.  As a result,
> > > 
> > > I'm struggling with the "As a result," here.  Is this because reloads of
> > > regs in ret_to_user (or sigreturn) are the only places that can make
> > > wrong_cpu or wrong_task be false?
> > 
> > See the proposed clarification above.  Is that sufficient?
> > 
> 
> yes.
> 
> > > (I'm actually wanting to understand this, not just bikeshedding the
> > > commit message, as new corner cases keep coming up on this logic.)
> > 
> > That's a good thing, and I would really like to explain it in a
> > concise manner.  See [*] below for the "concise" explanation -- it may
> > demonstrate why I've been evasive...
> > 
> 
> I don't think you've been evasive at all, I just think we reason about
> this in slightly different ways, and I was trying to convince myself why
> this change is safe and summarize that concisely.  I think we've
> accomplished both :)

OK, good.  I reposted speculatively on this basis :)

The commit message is in better shape now, and I very much appreciate
you kicking the tyres on my reasoning!

[...]

> > As an aside, the big wall of text before the definition of struct
> > fpsimd_last_state_struct is looking out of date and could use an
> > update to cover at least some of what is explained in [*] better.
> > 
> > I'm currently considering that out of scope for this series, but I will
> > keep it in mind to refresh it in the not too distant future.
> > 
> 
> Fine with me.

OK, good.

[...]

> > [*] The bigger picture:
> > 
> > * Consider a relation (C,T) between cpus C and tasks T, such that

[...]

> > but by assuming that the code is already well-optimised, "unnecessary"
> > save/restore work will not be added.  If this were not the case, it
> > could in any case be fixed independently.
> > 
> > The observation of this _series_ is that we don't need to do very
> > much in order to be able to generalise the logic to accept KVM vcpus
> > in place of T.
> > 
> 
> Thanks for the explanation.
> -Christoffer

Was this reasonably understandable?  If so I could use it as a basis for
improving the comment block in fpsimd.c, but I'd want to squash it down
to the essentials.  It's pretty verbose as it stands.

(What I'd really like to do is take an axe to the logic so that we
end up with something that doesn't require anything like this amount
of explanation ... but that's more of an aspiration right now.)

Cheers
---Dave

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks
  2018-05-25  9:45                       ` Dave Martin
@ 2018-05-25 11:28                         ` Christoffer Dall
  -1 siblings, 0 replies; 138+ messages in thread
From: Christoffer Dall @ 2018-05-25 11:28 UTC (permalink / raw)
  To: Dave Martin
  Cc: Christoffer Dall, Ard Biesheuvel, Marc Zyngier, Catalin Marinas,
	Will Deacon, kvmarm, linux-arm-kernel

On Fri, May 25, 2018 at 10:45:17AM +0100, Dave Martin wrote:
> On Fri, May 25, 2018 at 11:00:20AM +0200, Christoffer Dall wrote:
> > On Thu, May 24, 2018 at 03:37:15PM +0100, Dave Martin wrote:
> > > On Thu, May 24, 2018 at 12:06:59PM +0200, Christoffer Dall wrote:
> > > > On Thu, May 24, 2018 at 10:50:56AM +0100, Dave Martin wrote:
> 
> [...]
> 
> > > > I'm not sure what the reader is to make of that.  Do you not mean the
> > > > TIF_FOREIGN_FPSTATE is always true for kernel threads?
> > > 
> > > Again, this is probably a red herring.  TIF_FOREIGN_FPSTATE is always
> > > true for kernel threads prior to the patch, except (randomly) for the
> > > init task.
> > 
> > That was really what my initial question was about, and what I thought
> > the commit message should make abundantly clear, because that ties the
> > message together with the code.
> > 
> > > 
> > > This change is not really about TIF_FOREIGN_FPSTATE at all, rather
> > > that there is nothing to justify handling kernel threads differently,
> > > or even distinguishing kernel threads from user threads at all in this
> > > code.
> > 
> > Understood.
> 
> And my bad was that I hadn't gone to the effort of understanding my own
> argument -- I'm glad to be called out on that.
> 
> > > Part of the confusion (and I had confused myself) comes from the fact
> > > that TIF_FOREIGN_FPSTATE is really a per-cpu property and doesn't make
> > > sense as a per-task property -- i.e., the flag is meaningless for
> > > scheduled-out tasks and we must explicitly "repair" it when scheduling
> > > a task in anyway.  I think it's a thread flag primarily so that it's
> > > convenient to check alongside other thread flags in the ret_to_user
> > > work loop.  This is somewhat less of a justification now that loop was
> > > ported to C.
> > > 
> > > > > 
> > > > > The context switch logic is already deliberately optimised to defer
> > > > > reloads of the regs until ret_to_user (or sigreturn as a special
> > > > > case), and save them only if they have been previously loaded.
> > > 
> > > Does it help to insert the following here?
> > > 
> > > "These paths are the only places where the wrong_task and wrong_cpu
> > > conditions can be made false, by calling fpsimd_bind_task_to_cpu()."
> > > 
> > 
> > yes it does.
> > 
> > > > > Kernel threads by definition never reach these paths.  As a result,
> > > > 
> > > > I'm struggling with the "As a result," here.  Is this because reloads of
> > > > regs in ret_to_user (or sigreturn) are the only places that can make
> > > > wrong_cpu or wrong_task be false?
> > > 
> > > See the proposed clarification above.  Is that sufficient?
> > > 
> > 
> > yes.
> > 
> > > > (I'm actually wanting to understand this, not just bikeshedding the
> > > > commit message, as new corner cases keep coming up on this logic.)
> > > 
> > > That's a good thing, and I would really like to explain it in a
> > > concise manner.  See [*] below for the "concise" explanation -- it may
> > > demonstrate why I've been evasive...
> > > 
> > 
> > I don't think you've been evasive at all, I just think we reason about
> > this in slightly different ways, and I was trying to convince myself why
> > this change is safe and summarize that concisely.  I think we've
> > accomplished both :)
> 
> OK, good.  I reposted speculatively on this basis :)
> 
> The commit message is in better shape now, and I very much appreciate
> you kicking the tyres on my reasoning!
> 
> [...]
> 
> > > As an aside, the big wall of text before the definition of struct
> > > fpsimd_last_state_struct is looking out of date and could use an
> > > update to cover at least some of what is explained in [*] better.
> > > 
> > > I'm currently considering that out of scope for this series, but I will
> > > keep it in mind to refresh it in the not too distant future.
> > > 
> > 
> > Fine with me.
> 
> OK, good.
> 
> [...]
> 
> > > [*] The bigger picture:
> > > 
> > > * Consider a relation (C,T) between cpus C and tasks T, such that
> 
> [...]
> 
> > > but by assuming that the code is already well-optimised, "unnecessary"
> > > save/restore work will not be added.  If this were not the case, it
> > > could in any case be fixed independently.
> > > 
> > > The observation of this _series_ is that we don't need to do very
> > > much in order to be able to generalise the logic to accept KVM vcpus
> > > in place of T.
> > > 
> > 
> > Thanks for the explanation.
> > -Christoffer
> 
> Was this reasonably understandable?  If so I could use it as a basis for
> improving the comment block in fpsimd.c, but I'd want to squash it down
> to the essentials.  It's pretty verbose as it stands.

Yes, I think that's a reasonable way forward.  The thing that I hadn't
fully appreciated before is that you may have a valid relation (C,T)
which you wish to invalidate whilst T may not be running on C at that
particular time.

> 
> (What I'd really like to do is take an axe to the logic so that we
> end up with something that doesn't require anything like this amount
> of explanation ... but that's more of an aspiration right now.)
> 

I'll be happy to review a potentially simplified design, should you come
up with one at some point in the future.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 138+ messages in thread
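
As a worked example of rule [2] and of breaking (C,T) outside the
ret_to_user path, the exec()-time flush follows the pattern below.  This
is a simplified sketch assuming the field names used in this thread, not
a verbatim copy of fpsimd_flush_thread():

void fpsimd_flush_thread(void)
{
	local_bh_disable();	/* (C,T) and the flag need softirqs masked */

	/* exec() requires the register image to be zeroed... */
	memset(&current->thread.uw.fpsimd_state, 0,
	       sizeof(current->thread.uw.fpsimd_state));

	/* ...so break (C, current) for every cpu C ... */
	fpsimd_flush_task_state(current);

	/* ...and record that current's regs are not loaded on this cpu. */
	set_thread_flag(TIF_FOREIGN_FPSTATE);

	local_bh_enable();
}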

end of thread, other threads:[~2018-05-25 11:28 UTC | newest]

Thread overview: 138+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-22 16:05 [PATCH v10 00/18] KVM: arm64: Optimise FPSIMD context switching Dave Martin
2018-05-22 16:05 ` [PATCH v10 01/18] arm64: fpsimd: Fix TIF_FOREIGN_FPSTATE after invalidating cpu regs Dave Martin
2018-05-23 11:33   ` Christoffer Dall
2018-05-23 13:44   ` Alex Bennée
2018-05-23 13:46   ` Catalin Marinas
2018-05-22 16:05 ` [PATCH v10 02/18] thread_info: Add update_thread_flag() helpers Dave Martin
2018-05-23 13:46   ` Alex Bennée
2018-05-23 13:57     ` Dave Martin
2018-05-23 14:35       ` Alex Bennée
2018-05-22 16:05 ` [PATCH v10 03/18] arm64: Use update{,_tsk}_thread_flag() Dave Martin
2018-05-23 13:48   ` Alex Bennée
2018-05-22 16:05 ` [PATCH v10 04/18] KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change Dave Martin
2018-05-23 14:34   ` Alex Bennée
2018-05-23 14:40     ` Dave Martin
2018-05-24  8:11       ` Christoffer Dall
2018-05-24  9:18         ` Alex Bennée
2018-05-24 10:04           ` Dave Martin
2018-05-22 16:05 ` [PATCH v10 05/18] KVM: arm64: Convert lazy FPSIMD context switch trap to C Dave Martin
2018-05-23 19:35   ` Alex Bennée
2018-05-24  8:12     ` Christoffer Dall
2018-05-24  8:54       ` Dave Martin
2018-05-24  9:14         ` Alex Bennée
2018-05-22 16:05 ` [PATCH v10 06/18] arm64: fpsimd: Generalise context saving for non-task contexts Dave Martin
2018-05-23 20:15   ` Alex Bennée
2018-05-24  9:03     ` Dave Martin
2018-05-24  9:41       ` Alex Bennée
2018-05-22 16:05 ` [PATCH v10 07/18] arm64: fpsimd: Eliminate task->mm checks Dave Martin
2018-05-23 11:48   ` Christoffer Dall
2018-05-23 13:31     ` Dave Martin
2018-05-23 14:56       ` Catalin Marinas
2018-05-23 15:03         ` Dave Martin
2018-05-23 16:42           ` Catalin Marinas
2018-05-24  8:33           ` Christoffer Dall
2018-05-24  9:16             ` Alex Bennée
2018-05-24  9:50             ` Dave Martin
2018-05-24 10:06               ` Christoffer Dall
2018-05-24 14:37                 ` Dave Martin
2018-05-25  9:00                   ` Christoffer Dall
2018-05-25  9:45                     ` Dave Martin
2018-05-25 11:28                       ` Christoffer Dall
2018-05-24  9:19   ` Alex Bennée
2018-05-22 16:05 ` [PATCH v10 08/18] arm64/sve: Refactor user SVE trap maintenance for external use Dave Martin
2018-05-23 20:16   ` Alex Bennée
2018-05-22 16:05 ` [PATCH v10 09/18] KVM: arm64: Repurpose vcpu_arch.debug_flags for general-purpose flags Dave Martin
2018-05-24  9:21   ` Alex Bennée
2018-05-22 16:05 ` [PATCH v10 10/18] KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing Dave Martin
2018-05-24 10:09   ` Alex Bennée
2018-05-24 10:18     ` Dave Martin
2018-05-22 16:05 ` [PATCH v10 11/18] arm64/sve: Move read_zcr_features() out of cpufeature.h Dave Martin
2018-05-24 10:12   ` Alex Bennée
2018-05-22 16:05 ` [PATCH v10 12/18] arm64/sve: Switch sve_pffr() argument from task to thread Dave Martin
2018-05-24 10:12   ` Alex Bennée
2018-05-22 16:05 ` [PATCH v10 13/18] arm64/sve: Move sve_pffr() to fpsimd.h and make inline Dave Martin
2018-05-24 10:20   ` Alex Bennée
2018-05-24 11:22     ` Dave Martin
2018-05-22 16:05 ` [PATCH v10 14/18] KVM: arm64: Save host SVE context as appropriate Dave Martin
2018-05-23 14:59   ` Catalin Marinas
2018-05-24  9:11   ` Christoffer Dall
2018-05-24 14:49   ` Alex Bennée
2018-05-22 16:05 ` [PATCH v10 15/18] KVM: arm64: Remove eager host SVE state saving Dave Martin
2018-05-24 14:54   ` Alex Bennée
2018-05-22 16:05 ` [PATCH v10 16/18] KVM: arm64: Remove redundant *exit_code changes in fpsimd_guest_exit() Dave Martin
2018-05-24  9:11   ` Christoffer Dall
2018-05-24 15:02   ` Alex Bennée
2018-05-22 16:05 ` [PATCH v10 17/18] KVM: arm64: Fold redundant exit code checks out of fixup_guest_exit() Dave Martin
2018-05-24  9:12   ` Christoffer Dall
2018-05-24 15:06   ` Alex Bennée
2018-05-22 16:05 ` [PATCH v10 18/18] KVM: arm64: Invoke FPSIMD context switch trap from C Dave Martin
2018-05-24 15:09   ` Alex Bennée
