From mboxrd@z Thu Jan 1 00:00:00 1970
From: Radim Krčmář
Subject: Re: [PATCH v2] KVM: nVMX: Track active shadow VMCSs
Date: Wed, 27 Jul 2016 22:45:34 +0200
Message-ID: <20160727204533.GA29177@potion>
References: <1469211903-21781-1-git-send-email-jmattson@google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, bsd@redhat.com,
        dmatlack@google.com, pfeiner@google.com
To: Jim Mattson
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:34290 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S932328AbcG0Upi (ORCPT );
        Wed, 27 Jul 2016 16:45:38 -0400
Content-Disposition: inline
In-Reply-To: <1469211903-21781-1-git-send-email-jmattson@google.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

2016-07-22 11:25-0700, Jim Mattson:
> If a shadow VMCS is referenced by the VMCS link pointer in the
> current VMCS, then VM-entry makes the shadow VMCS active on the
> current processor. No VMCS should ever be active on more than one
> processor. If a VMCS is to be migrated from one processor to
> another, software should execute a VMCLEAR for the VMCS on the
> first processor before the VMCS is made active on the second
> processor.
>
> We already keep track of ordinary VMCSs that are made active by
> VMPTRLD. Extend that tracking to handle shadow VMCSs that are
> made active by VM-entry to their parents.
>
> Signed-off-by: Jim Mattson
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -2113,6 +2113,15 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
>                          new.control) != old.control);
>  }
>
> +static void record_loaded_vmcs(struct loaded_vmcs *loaded_vmcs, int cpu)
> +{
> +        if (loaded_vmcs->vmcs) {
> +                list_add(&loaded_vmcs->loaded_vmcss_on_cpu_link,
> +                         &per_cpu(loaded_vmcss_on_cpu, cpu));
> +                loaded_vmcs->cpu = cpu;
> +        }
> +}
> +
>  /*
>   * Switches to specified vcpu, until a matching vcpu_put(), but assumes
>   * vcpu mutex is already taken.
> @@ -2124,15 +2133,13 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>
>          if (!vmm_exclusive)
>                  kvm_cpu_vmxon(phys_addr);
> -        else if (vmx->loaded_vmcs->cpu != cpu)
> +        else if (vmx->loaded_vmcs->cpu != cpu) {
>                  loaded_vmcs_clear(vmx->loaded_vmcs);
> +                if (vmx->nested.current_shadow_vmcs.vmcs)
> +                        loaded_vmcs_clear(&vmx->nested.current_shadow_vmcs);

loaded_vmcs_clear() uses an expensive IPI, so we'd want to do both in one
call in future patches, as they are always active on the same CPU.

Another possible complication is marking current_shadow_vmcs as active on
a cpu only after a successful vmlaunch.
(We don't seem to ever vmptrld the shadow vmcs explicitly.)

> + }

Incorrect whitespace for indentation.

>
>          if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
> -                per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
> -                vmcs_load(vmx->loaded_vmcs->vmcs);
> -        }
> -
> -        if (vmx->loaded_vmcs->cpu != cpu) {

This condition is nice for performance, because a non-current vmcs is
usually already active on the same CPU, so we skip all the code below.
(This is the only thing that has to be fixed, as it regresses non-nested;
the rest are mostly ideas for simplifications.)
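Completely untested, just to illustrate what I mean: the cpu check could
stay as an inner guard, so the expensive rescheduling work only runs when
the vcpu really migrated.  record_loaded_vmcs() is from your patch; the
"..." comments stand for the existing code around it:

        if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
                per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
                vmcs_load(vmx->loaded_vmcs->vmcs);
        }

        if (vmx->loaded_vmcs->cpu != cpu) {
                /* ... local_irq_disable(), crash_disable_local_vmclear(cpu),
                 *     smp_rmb(), as in the current code ... */

                record_loaded_vmcs(vmx->loaded_vmcs, cpu);
                record_loaded_vmcs(&vmx->nested.current_shadow_vmcs, cpu);

                /* ... crash_enable_local_vmclear(cpu), local_irq_enable(),
                 *     per-cpu GDT/TSS and SYSENTER_ESP reload ... */
        }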
>                  struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
>                  unsigned long sysenter_esp;
>
> @@ -2147,11 +2154,15 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>                   */
>                  smp_rmb();
>
> -                list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link,
> -                         &per_cpu(loaded_vmcss_on_cpu, cpu));
> +                record_loaded_vmcs(vmx->loaded_vmcs, cpu);

Adding an element to a list multiple times seems invalid, which the
removed condition was also guarding against.

> +                record_loaded_vmcs(&vmx->nested.current_shadow_vmcs, cpu);

current_shadow_vmcs is always active on the same cpu as loaded_vmcs ...
I think we could skip the list and just clear current_shadow_vmcs when
clearing its loaded_vmcs.

>                  crash_enable_local_vmclear(cpu);
>                  local_irq_enable();
>
> +                per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
> +                vmcs_load(vmx->loaded_vmcs->vmcs);
> +
>                  /*
>                   * Linux uses per-cpu TSS and GDT, so set these when switching
>                   * processors.
> @@ -2161,8 +2172,6 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>
>                  rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
>                  vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
> -
> -                vmx->loaded_vmcs->cpu = cpu;
>          }
>
>          /* Setup TSC multiplier */
> @@ -6812,6 +6821,34 @@ static int nested_vmx_check_vmptr(struct kvm_vcpu *vcpu, int exit_reason,
>          return 0;
>  }
>
> +static int setup_shadow_vmcs(struct vcpu_vmx *vmx)
> +{
> +        struct vmcs *shadow_vmcs;
> +        int cpu;
> +
> +        shadow_vmcs = alloc_vmcs();
> +        if (!shadow_vmcs)
> +                return -ENOMEM;
> +
> +        /* mark vmcs as shadow */
> +        shadow_vmcs->revision_id |= (1u << 31);
> +        /* init shadow vmcs */
> +        vmx->nested.current_shadow_vmcs.vmcs = shadow_vmcs;
> +        loaded_vmcs_init(&vmx->nested.current_shadow_vmcs);
> +
> +        cpu = get_cpu();
> +        local_irq_disable();
> +        crash_disable_local_vmclear(cpu);
> +
> +        record_loaded_vmcs(&vmx->nested.current_shadow_vmcs, cpu);

This could be avoided if we assumed that the shadow vmcs is always active
on the same cpu as its loaded_vmcs.  The assumption looks rock-solid,
because the shadow vmcs is activated on vmlaunch and its linking vmcs must
be active (and current) on the same CPU.

> +
> +        crash_enable_local_vmclear(cpu);
> +        local_irq_enable();
> +        put_cpu();
> +
> +        return 0;
> +}
> +
>  /*
>   * Emulate the VMXON instruction.
>   * Currently, we just remember that VMX is active, and do not save or even
> @@ -6867,14 +6903,9 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
>          }
>
>          if (enable_shadow_vmcs) {
> -                shadow_vmcs = alloc_vmcs();
> -                if (!shadow_vmcs)
> -                        return -ENOMEM;
> -                /* mark vmcs as shadow */
> -                shadow_vmcs->revision_id |= (1u << 31);
> -                /* init shadow vmcs */
> -                vmcs_clear(shadow_vmcs);
> -                vmx->nested.current_shadow_vmcs = shadow_vmcs;
> +                int ret = setup_shadow_vmcs(vmx);
> +                if (ret < 0)
> +                        return ret;
>          }
>
>          INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
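To make the "one IPI" / "skip the list" ideas above concrete, an untested
sketch with made-up names (vcpu_vmcss_clear/__vcpu_vmcss_clear); it clears
both VMCSs of a vcpu with a single smp_call_function_single():

/*
 * Runs on the cpu where vmcs01 is active; the shadow vmcs is never put on
 * loaded_vmcss_on_cpu in this scheme, so a plain vmcs_clear() is enough.
 */
static void __vcpu_vmcss_clear(void *arg)
{
        struct vcpu_vmx *vmx = arg;

        __loaded_vmcs_clear(vmx->loaded_vmcs);
        if (vmx->nested.current_shadow_vmcs.vmcs)
                vmcs_clear(vmx->nested.current_shadow_vmcs.vmcs);
}

static void vcpu_vmcss_clear(struct vcpu_vmx *vmx)
{
        int cpu = vmx->loaded_vmcs->cpu;

        if (cpu != -1)
                smp_call_function_single(cpu, __vcpu_vmcss_clear, vmx, 1);
}

vmx_vcpu_load() would then do a single vcpu_vmcss_clear(vmx) instead of the
two loaded_vmcs_clear() calls, and neither setup_shadow_vmcs() nor
vmx_vcpu_load() would need to record_loaded_vmcs() the shadow vmcs.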