Re: [PATCH 2/2] KVM: arm64: Leave vcpu FPSIMD synchronization in host

From: Dave Martin <Dave.Martin@arm.com>
To: Andrew Scull <ascull@google.com>
Cc: maz@kernel.org, kernel-team@android.com, kvmarm@lists.cs.columbia.edu
Subject: Re: [PATCH 2/2] KVM: arm64: Leave vcpu FPSIMD synchronization in host
Date: Mon, 13 Jul 2020 17:04:21 +0100	[thread overview]
Message-ID: <20200713160421.GS10992@arm.com> (raw)
In-Reply-To: <20200710095754.3641976-3-ascull@google.com>

On Fri, Jul 10, 2020 at 10:57:54AM +0100, Andrew Scull wrote:
> The task state can be checked by the host and the vcpu flags updated
> before calling into hyp. This more neatly separates the concerns and
> removes the need to map the task flags to EL2.
> 
> Hyp acts on the state provided to it by the host and updates it when
> switching to the vcpu state.

Can this patch be split up?  We have a few overlapping changes here.

i.e., renaming and refactoring of hooks; moving some code around; and a
couple of other changes that are not directly related (noted below).

Overall this looks like a decent cleanup however.  It was always a bit
nasty to have to map the thread flags into Hyp.

Side question: currently we do fpsimd_save_and_flush_cpu_state() in
kvm_arch_vcpu_put_fp().  Can we remove the flush so that the vcpu state
lingers in the CPU regs and can be reclaimed when switching back to the
KVM thread?

This could remove some overhead when the KVM run loop is preempted by a
kernel thread and subsequently resumed without passing through userspace.

We would need to flush this state when anything else tries to change the
vcpu FP regs, which is one reason I skipped this previously: it would
require a bit of refactoring of fpsimd_flush_task_state() so that a non-
task context can also be flushed.

(This isn't directly related to this series.)

Additional minor comments below.

> 
> No functional change.
> 
> Signed-off-by: Andrew Scull <ascull@google.com>
> ---
>  arch/arm64/include/asm/kvm_host.h       |  5 +--
>  arch/arm64/kvm/arm.c                    |  4 +-
>  arch/arm64/kvm/fpsimd.c                 | 57 ++++++++++++++++++-------
>  arch/arm64/kvm/hyp/include/hyp/switch.h | 19 ---------
>  arch/arm64/kvm/hyp/nvhe/switch.c        |  3 +-
>  arch/arm64/kvm/hyp/vhe/switch.c         |  3 +-
>  6 files changed, 48 insertions(+), 43 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index b06f24b5f443..ca1621eeb9d9 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -24,7 +24,6 @@
>  #include <asm/fpsimd.h>
>  #include <asm/kvm.h>
>  #include <asm/kvm_asm.h>
> -#include <asm/thread_info.h>
>  
>  #define __KVM_HAVE_ARCH_INTC_INITIALIZED
>  
> @@ -320,7 +319,6 @@ struct kvm_vcpu_arch {
>  	struct kvm_guest_debug_arch vcpu_debug_state;
>  	struct kvm_guest_debug_arch external_debug_state;
>  
> -	struct thread_info *host_thread_info;	/* hyp VA */
>  	struct user_fpsimd_state *host_fpsimd_state;	/* hyp VA */
>  
>  	struct {
> @@ -616,7 +614,8 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>  /* Guest/host FPSIMD coordination helpers */
>  int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
>  void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
> -void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_enter_ctxsync_fp(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_exit_ctxsync_fp(struct kvm_vcpu *vcpu);

I find these names a bit confusing.

Maybe

	kvm_arch_vcpu_ctxsync_fp_before_guest_enter()
	kvm_arch_vcpu_ctxsync_fp_after_guest_exit()

(we could probably drop the "ctx" to make these slightly shorter).

>  void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
>  
>  static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 98f05bdac3c1..c7a711ca840e 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -682,6 +682,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  
>  		local_irq_disable();
>  
> +		kvm_arch_vcpu_enter_ctxsync_fp(vcpu);
> +
>  		kvm_vgic_flush_hwstate(vcpu);
>  
>  		/*
> @@ -769,7 +771,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  		if (static_branch_unlikely(&userspace_irqchip_in_use))
>  			kvm_timer_sync_user(vcpu);
>  
> -		kvm_arch_vcpu_ctxsync_fp(vcpu);
> +		kvm_arch_vcpu_exit_ctxsync_fp(vcpu);
>  
>  		/*
>  		 * We may have taken a host interrupt in HYP mode (ie
> diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
> index 3e081d556e81..aec91f43df62 100644
> --- a/arch/arm64/kvm/fpsimd.c
> +++ b/arch/arm64/kvm/fpsimd.c
> @@ -27,22 +27,13 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
>  {
>  	int ret;
>  
> -	struct thread_info *ti = &current->thread_info;
>  	struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
>  
> -	/*
> -	 * Make sure the host task thread flags and fpsimd state are
> -	 * visible to hyp:
> -	 */
> -	ret = create_hyp_mappings(ti, ti + 1, PAGE_HYP);
> -	if (ret)
> -		goto error;
> -
> +	/* Make sure the host task fpsimd state is visible to hyp: */
>  	ret = create_hyp_mappings(fpsimd, fpsimd + 1, PAGE_HYP);
>  	if (ret)
>  		goto error;
>  
> -	vcpu->arch.host_thread_info = kern_hyp_va(ti);
>  	vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
>  error:
>  	return ret;
> @@ -52,7 +43,7 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
>   * Prepare vcpu for saving the host's FPSIMD state and loading the guest's.
>   * The actual loading is done by the FPSIMD access trap taken to hyp.
>   *
> - * Here, we just set the correct metadata to indicate that the FPSIMD
> + * Here, we just set the correct metadata to indicate whether the FPSIMD
>   * state in the cpu regs (if any) belongs to current on the host.
>   *
>   * TIF_SVE is backed up here, since it may get clobbered with guest state.
> @@ -63,15 +54,46 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
>  	BUG_ON(!current->mm);
>  
>  	vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED |
> +			      KVM_ARM64_FP_HOST |
>  			      KVM_ARM64_HOST_SVE_IN_USE |
>  			      KVM_ARM64_HOST_SVE_ENABLED);
> +
> +	if (!system_supports_fpsimd())
> +		return;
> +
> +	/*
> +	 * Having just come from the user task, if any FP state is loaded it
> +	 * will be that of the task. Make a note of this but, just before
> +	 * entering the vcpu, it will be double checked that the loaded FP
> +	 * state isn't transient because things could change between now and
> +	 * then.
> +	 */
>  	vcpu->arch.flags |= KVM_ARM64_FP_HOST;
>  
> -	if (test_thread_flag(TIF_SVE))
> -		vcpu->arch.flags |= KVM_ARM64_HOST_SVE_IN_USE;
> +	if (system_supports_sve()) {

This is a change and should be noted.

Looks reasonable, though.

To ensure that we aren't breaking assumptions here, double-check that we
also have system_supports_sve() in the appropriate places in the Hyp
code.   We almost certainly do, but it doesn't hurt to review it.

> +		if (test_thread_flag(TIF_SVE))
> +			vcpu->arch.flags |= KVM_ARM64_HOST_SVE_IN_USE;
>  
> -	if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN)
> -		vcpu->arch.flags |= KVM_ARM64_HOST_SVE_ENABLED;
> +		if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN)
> +			vcpu->arch.flags |= KVM_ARM64_HOST_SVE_ENABLED;
> +	}
> +}
> +
> +void kvm_arch_vcpu_enter_ctxsync_fp(struct kvm_vcpu *vcpu)
> +{
> +	WARN_ON_ONCE(!irqs_disabled());
> +
> +	if (!system_supports_fpsimd())
> +		return;
> +
> +	/*
> +	 * If the CPU's FP state is transient, there is no need to save the
> +	 * current state. Without further information, it must also be assumed
> +	 * that the vcpu's state is not loaded.
> +	 */
> +	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
> +		vcpu->arch.flags &= ~(KVM_ARM64_FP_ENABLED |
> +				      KVM_ARM64_FP_HOST);
>  }
>  
>  /*
> @@ -80,7 +102,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
>   * so that they will be written back if the kernel clobbers them due to
>   * kernel-mode NEON before re-entry into the guest.
>   */
> -void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
> +void kvm_arch_vcpu_exit_ctxsync_fp(struct kvm_vcpu *vcpu)
>  {
>  	WARN_ON_ONCE(!irqs_disabled());
>  
> @@ -106,6 +128,9 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
>  	bool host_has_sve = system_supports_sve();
>  	bool guest_has_sve = vcpu_has_sve(vcpu);
>  
> +	if (!system_supports_fpsimd())
> +		return;
> +

Is this a bugfix, an optimisation, or what?

[...]

Cheers
---Dave
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm