Re: [PATCH v2] KVM: SVM: use vmsave/vmload for saving/restoring additional host state

From: Michael Roth <michael.roth@amd.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Andy Lutomirski <luto@kernel.org>
Subject: Re: [PATCH v2] KVM: SVM: use vmsave/vmload for saving/restoring additional host state
Date: Mon, 14 Dec 2020 16:02:13 -0600	[thread overview]
Message-ID: <20201214220213.np7ytcxmm6xcyllm@amd.com> (raw)
In-Reply-To: <X9e/L3YTAT/N+ljh@google.com>

On Mon, Dec 14, 2020 at 11:38:23AM -0800, Sean Christopherson wrote:
> +Andy, who provided a lot of feedback on v1.
> 
> On Mon, Dec 14, 2020, Michael Roth wrote:
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> 
> > Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> > v2:
> > * rebase on latest kvm/next
> > * move VMLOAD to just after vmexit so we can use it to handle all FS/GS
> >   host state restoration and rather than relying on loadsegment() and
> >   explicit write to MSR_GS_BASE (Andy)
> > * drop 'host' field from struct vcpu_svm since it is no longer needed
> >   for storing FS/GS/LDT state (Andy)
> > ---
> >  arch/x86/kvm/svm/svm.c | 44 ++++++++++++++++--------------------------
> >  arch/x86/kvm/svm/svm.h | 14 +++-----------
> >  2 files changed, 20 insertions(+), 38 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 0e52fac4f5ae..fb15b7bd461f 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1367,15 +1367,19 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >  		vmcb_mark_all_dirty(svm->vmcb);
> >  	}
> >  
> > -#ifdef CONFIG_X86_64
> > -	rdmsrl(MSR_GS_BASE, to_svm(vcpu)->host.gs_base);
> > -#endif
> > -	savesegment(fs, svm->host.fs);
> > -	savesegment(gs, svm->host.gs);
> > -	svm->host.ldt = kvm_read_ldt();
> > -
> > -	for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
> > +	for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++) {
> >  		rdmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
> > +	}

Hi Sean,

Hopefully I've got my email situation sorted out now...

> 
> Unnecessary change that violates preferred coding style.  Checkpatch explicitly
> complains about this.
> 
> WARNING: braces {} are not necessary for single statement blocks
> #132: FILE: arch/x86/kvm/svm/svm.c:1370:
> +	for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++) {
>  		rdmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
> +

Sorry, that was an artifact from an earlier version of the patch that I
failed to notice. I'll make sure to run everything through checkpatch
going forward.

> 
> > +
> > +	asm volatile(__ex("vmsave")
> > +		     : : "a" (page_to_pfn(sd->save_area) << PAGE_SHIFT)
> 
> I'm pretty sure this can be page_to_phys().
> 
> > +		     : "memory");
> 
> I think we can defer this until we're actually planning on running the guest,
> i.e. put this in svm_prepare_guest_switch().

One downside to that is that we'd need to do the VMSAVE on every
iteration of vcpu_run(), as opposed to just once when we enter from
userspace via KVM_RUN. It ends up being a similar situation to Andy's
earlier suggestion of moving VMLOAD just after vmexit, but in that case
we were able to remove an MSR write to MSR_GS_BASE, which cancelled out
the overhead, but in this case I think it could only cost us extra.

It looks like the SEV-ES patches might land before this one, and those
introduce similar handling of VMSAVE in svm_vcpu_load(), so I think it
might also create some churn there if we take this approach and want to
keep the SEV-ES and non-SEV-ES handling similar.

> 
> > +	/*
> > +	 * Store a pointer to the save area to we can access it after
> > +	 * vmexit for vmload. This is needed since per-cpu accesses
> > +	 * won't be available until GS is restored as part of vmload
> > +	 */
> > +	svm->host_save_area = sd->save_area;
> 
> Unless I'm missing something with how SVM loads guest state, you can avoid
> adding host_save_area by saving the PA of the save area on the stack prior to
> the vmload of guest state.

Yes, that is much nicer. Not sure what I was thinking there.

> 
> >  
> >  	if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
> >  		u64 tsc_ratio = vcpu->arch.tsc_scaling_ratio;
> > @@ -1403,18 +1407,10 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
> >  	avic_vcpu_put(vcpu);
> >  
> >  	++vcpu->stat.host_state_reload;
> > -	kvm_load_ldt(svm->host.ldt);
> > -#ifdef CONFIG_X86_64
> > -	loadsegment(fs, svm->host.fs);
> > -	wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gsbase);
> > -	load_gs_index(svm->host.gs);
> > -#else
> > -#ifdef CONFIG_X86_32_LAZY_GS
> > -	loadsegment(gs, svm->host.gs);
> > -#endif
> > -#endif
> > -	for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
> > +
> > +	for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++) {
> >  		wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
> > +	}
> 
> Same bogus change and checkpatch warning.
> 
> >  }
> >  
> >  static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
> > @@ -3507,14 +3503,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> >  
> >  	__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs);
> 
> Tying in with avoiding svm->host_save_area, what about passing in the PA of the
> save area and doing the vmload in __svm_vcpu_run()?  One less instance of inline
> assembly to stare at...

That sounds like a nice clean-up, I'll give this a shot.

> 
> >  
> > -#ifdef CONFIG_X86_64
> > -	native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
> > -#else
> > -	loadsegment(fs, svm->host.fs);
> > -#ifndef CONFIG_X86_32_LAZY_GS
> > -	loadsegment(gs, svm->host.gs);
> > -#endif
> > -#endif
> > +	asm volatile(__ex("vmload")
> > +		     : : "a" (page_to_pfn(svm->host_save_area) << PAGE_SHIFT));
> >  
> >  	/*
> >  	 * VMEXIT disables interrupts (host state), but tracing and lockdep
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index fdff76eb6ceb..bf01a8c19ec0 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -21,11 +21,6 @@
> >  #include <asm/svm.h>
> >  
> >  static const u32 host_save_user_msrs[] = {
> > -#ifdef CONFIG_X86_64
> > -	MSR_STAR, MSR_LSTAR, MSR_CSTAR, MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE,
> > -	MSR_FS_BASE,
> > -#endif
> > -	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
> >  	MSR_TSC_AUX,
> 
> With this being whittled down to TSC_AUX, a good follow-on series would be to
> add SVM usage of the "user return" MSRs to handle TSC_AUX.  If no one objects,
> I'll plan on doing that in the not-too-distant future as a ramp task of sorts.

No objection here. I mainly held off on removing the list since it might be
nice to have some infrastructure to handle future cases, but if it's already
there then great :)

Thanks for the suggestions. I'll work on a v3 with those applied.

-Mike

> 
> >  };
> >  
> > @@ -117,12 +112,9 @@ struct vcpu_svm {
> >  	u64 next_rip;
> >  
> >  	u64 host_user_msrs[NR_HOST_SAVE_USER_MSRS];
> > -	struct {
> > -		u16 fs;
> > -		u16 gs;
> > -		u16 ldt;
> > -		u64 gs_base;
> > -	} host;
> > +
> > +	/* set by vcpu_load(), for use when per-cpu accesses aren't available */
> > +	struct page *host_save_area;
> >  
> >  	u64 spec_ctrl;
> >  	/*
> > -- 
> > 2.25.1
> >