Subject: Re: RFC: Getting rid of LTR in VMX
From: Paolo Bonzini
To: Andy Lutomirski, kvm list, "linux-kernel@vger.kernel.org", X86 ML,
    "H. Peter Anvin", Borislav Petkov
Date: Mon, 20 Feb 2017 12:05:13 +0100
Message-ID: <38d613dc-5b46-1f4f-04d1-53c01932e6d6@redhat.com>

On 18/02/2017 04:29, Andy Lutomirski wrote:
> There's no code here because the patch is trivial, but I want to run
> the idea by you all first to see if there are any issues.
>
> VMX is silly and forces the TSS limit to the minimum on VM exits. KVM
> wastes lots of cycles bumping it back up to accommodate the io bitmap.

Actually looked at the code now... reload_tss is only invoked for
userspace exits, so removing it is a nice-to-have but it wouldn't show
on most workloads.  Still, it does save 150-200 clock cycles (I just
commented out reload_tss() from __vmx_load_host_state to test).
Another 100-150 could be saved if we could just use rdgsbase/wrgsbase,
instead of rdmsr/wrmsr, to read and write the kernel GS base.  Super
hacky patch after sig.

vmx_save_host_state is really slow... It would be nice if Intel
defined an XSAVES state to do it (just like AMD's vmload/vmsave).
Paolo

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9856b73a21ad..e76bfec463bc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2138,9 +2138,12 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
 #endif
 
 #ifdef CONFIG_X86_64
-	rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
+	local_irq_disable();
+	asm volatile("swapgs; rdgsbase %0" : "=r" (vmx->msr_host_kernel_gs_base));
 	if (is_long_mode(&vmx->vcpu))
-		wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
+		asm volatile("wrgsbase %0" : : "r" (vmx->msr_guest_kernel_gs_base));
+	asm volatile("swapgs");
+	local_irq_enable();
 #endif
 	if (boot_cpu_has(X86_FEATURE_MPX))
 		rdmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
@@ -2152,14 +2155,20 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
 
 static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 {
+	unsigned long flags;
+
 	if (!vmx->host_state.loaded)
 		return;
 
 	++vmx->vcpu.stat.host_state_reload;
 	vmx->host_state.loaded = 0;
 #ifdef CONFIG_X86_64
+	local_irq_save(flags);
+	asm("swapgs");
 	if (is_long_mode(&vmx->vcpu))
-		rdmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
+		asm volatile("rdgsbase %0" : "=r" (vmx->msr_guest_kernel_gs_base));
+	asm volatile("wrgsbase %0; swapgs" : : "r" (vmx->msr_host_kernel_gs_base));
+	local_irq_restore(flags);
 #endif
 	if (vmx->host_state.gs_ldt_reload_needed) {
 		kvm_load_ldt(vmx->host_state.ldt_sel);
@@ -2177,10 +2186,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 		loadsegment(es, vmx->host_state.es_sel);
 	}
 #endif
-	reload_tss();
-#ifdef CONFIG_X86_64
-	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
-#endif
+	//reload_tss();
 	if (vmx->host_state.msr_host_bndcfgs)
 		wrmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
 	load_gdt(this_cpu_ptr(&host_gdt));
@@ -3469,6 +3475,7 @@ static int hardware_enable(void)
 		wrmsrl(MSR_IA32_FEATURE_CONTROL, old | test_bits);
 	}
 	cr4_set_bits(X86_CR4_VMXE);
+	cr4_set_bits(X86_CR4_FSGSBASE);
 
 	if (vmm_exclusive) {
 		kvm_cpu_vmxon(phys_addr);

> I propose that we rework this.  Add a percpu variable that indicates
> whether the TSS limit needs to be refreshed.  On task switch, if the
> new task has TIF_IO_BITMAP set, then check that flag and, if set,
> refresh TR and clear the flag.  On VMX exit, set the flag.
>
> The TSS limit is (phew!) invisible to userspace, so we don't have ABI
> issues to worry about here.  We also shouldn't have security issues
> because a too-low TSS limit just results in unprivileged IO operations
> generating #GP, which is exactly what we want.
>
> What do you all think?  I expect a speedup of a couple hundred cycles
> on each VM exit.