From: "Tian, Kevin" <kevin.tian@intel.com>
To: Nadav Har'El <nyh@math.technion.ac.il>, Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"gleb@redhat.com" <gleb@redhat.com>,
	"Roedel, Joerg" <Joerg.Roedel@amd.com>
Subject: RE: [PATCH 08/31] nVMX: Fix local_vcpus_link handling
Date: Tue, 24 May 2011 10:22:59 +0800
Message-ID: <625BA99ED14B2D499DC4E29D8138F1505C9BEF07A9@shsmsx502.ccr.corp.intel.com>
In-Reply-To: <20110523185104.GA26899@fermat.math.technion.ac.il>

> From: Nadav Har'El
> Sent: Tuesday, May 24, 2011 2:51 AM
> 
> > >+	vmcs_init(vmx->loaded_vmcs->vmcs);
> > >+	vmx->loaded_vmcs->cpu = -1;
> > >+	vmx->loaded_vmcs->launched = 0;
> >
> > Perhaps a loaded_vmcs_init() to encapsulate initialization of these
> > three fields, you'll probably reuse it later.
> 
> It's good you pointed this out, because it made me suddenly realise that I
> forgot to VMCLEAR the new vmcs02's I allocate. In practice it never made a
> difference, but better safe than sorry.

Yes, that's what the spec requires: you need to VMCLEAR any new VMCS, since
VMCLEAR performs implementation-specific initialization of that VMCS region.
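For example, for each newly allocated vmcs02 I'd expect something roughly like
the following (a sketch only, not your actual code; it just uses alloc_vmcs()
and the loaded_vmcs_init() helper from the patch below):

	struct loaded_vmcs *item;	/* new vmcs02 entry, allocation elided */

	item->vmcs = alloc_vmcs();	/* fresh VMCS region, not yet VMCLEARed */
	if (!item->vmcs)
		return NULL;
	loaded_vmcs_init(item);		/* VMCLEAR + cpu = -1, launched = 0 */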

> 
> I had to restructure some of the code a bit to be able to properly use this
> new function (in 3 places - __loaded_vmcs_clear, nested_get_current_vmcs02,
> vmx_create_cpu).
> 
> > Please repost separately after the fix, I'd like to apply it before the
> > rest of the series.
> 
> I am adding a new version of this patch at the end of this mail.
> 
> > (regarding interrupts, I think we can do that work post-merge.  But I'd
> > like to see Kevin's comments addressed)
> 
> I replied to his comments. Done some of the things he asked, and asked for
> more info on why/where he believes the current code is incorrect where I
> didn't understand what problems he pointed to, and am now waiting for him
> to reply.

As I replied in another thread, I believe this has been explained clearly by Nadav.

> 
> 
> ------- 8< ------ 8< ---------- 8< ---------- 8< ----------- 8< -----------
> 
> Subject: [PATCH 01/31] nVMX: Keep list of loaded VMCSs, instead of vcpus.
> 
> In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
> because (at least in theory) the processor might not have written all of its
> content back to memory. Since a patch from June 26, 2008, this is done using
> a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU.
> 
> The problem is that with nested VMX, we no longer have the concept of a
> vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
> L2s), and each of those may have been last loaded on a different cpu.
> 
> So instead of linking the vcpus, we link the VMCSs, using a new structure
> loaded_vmcs. This structure contains the VMCS, and the information pertaining
> to its loading on a specific cpu (namely, the cpu number, and whether it
> was already launched on this cpu once). In nested we will also use the same
> structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the
> currently active VMCS.
> 
> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
> ---
>  arch/x86/kvm/vmx.c |  150 ++++++++++++++++++++++++-------------------
>  1 file changed, 86 insertions(+), 64 deletions(-)
> 
> --- .before/arch/x86/kvm/vmx.c	2011-05-23 21:46:14.000000000 +0300
> +++ .after/arch/x86/kvm/vmx.c	2011-05-23 21:46:14.000000000 +0300
> @@ -116,6 +116,18 @@ struct vmcs {
>  	char data[0];
>  };
> 
> +/*
> + * Track a VMCS that may be loaded on a certain CPU. If it is (cpu!=-1), also
> + * remember whether it was VMLAUNCHed, and maintain a linked list of all VMCSs
> + * loaded on this CPU (so we can clear them if the CPU goes down).
> + */
> +struct loaded_vmcs {
> +	struct vmcs *vmcs;
> +	int cpu;
> +	int launched;
> +	struct list_head loaded_vmcss_on_cpu_link;
> +};
> +
>  struct shared_msr_entry {
>  	unsigned index;
>  	u64 data;
> @@ -124,9 +136,7 @@ struct shared_msr_entry {
> 
>  struct vcpu_vmx {
>  	struct kvm_vcpu       vcpu;
> -	struct list_head      local_vcpus_link;
>  	unsigned long         host_rsp;
> -	int                   launched;
>  	u8                    fail;
>  	u8                    cpl;
>  	bool                  nmi_known_unmasked;
> @@ -140,7 +150,14 @@ struct vcpu_vmx {
>  	u64 		      msr_host_kernel_gs_base;
>  	u64 		      msr_guest_kernel_gs_base;
>  #endif
> -	struct vmcs          *vmcs;
> +	/*
> +	 * loaded_vmcs points to the VMCS currently used in this vcpu. For a
> +	 * non-nested (L1) guest, it always points to vmcs01. For a nested
> +	 * guest (L2), it points to a different VMCS.
> +	 */
> +	struct loaded_vmcs    vmcs01;
> +	struct loaded_vmcs   *loaded_vmcs;
> +	bool                  __launched; /* temporary, used in vmx_vcpu_run */
>  	struct msr_autoload {
>  		unsigned nr;
>  		struct vmx_msr_entry guest[NR_AUTOLOAD_MSRS];
> @@ -200,7 +217,11 @@ static int vmx_set_tss_addr(struct kvm *
> 
>  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
>  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> -static DEFINE_PER_CPU(struct list_head, vcpus_on_cpu);
> +/*
> + * We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is needed
> + * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
> + */
> +static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
>  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
> 
>  static unsigned long *vmx_io_bitmap_a;
> @@ -501,6 +522,13 @@ static void vmcs_clear(struct vmcs *vmcs
>  		       vmcs, phys_addr);
>  }
> 
> +static inline void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
> +{
> +	vmcs_clear(loaded_vmcs->vmcs);
> +	loaded_vmcs->cpu = -1;
> +	loaded_vmcs->launched = 0;
> +}
> +

How about calling it vmcs_init instead, since you now remove the original
vmcs_init invocation? That name would better reflect why a new vmcs needs a
VMCLEAR.
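Either way, I'd expect the helper to be used at vcpu creation roughly like this
(just a sketch to illustrate; the error label is assumed, this is not code from
the series):

	vmx->loaded_vmcs = &vmx->vmcs01;
	vmx->loaded_vmcs->vmcs = alloc_vmcs();
	if (!vmx->loaded_vmcs->vmcs)
		goto free_msrs;			/* error label assumed */
	loaded_vmcs_init(vmx->loaded_vmcs);	/* VMCLEAR before first use */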

>  static void vmcs_load(struct vmcs *vmcs)
>  {
>  	u64 phys_addr = __pa(vmcs);
> @@ -514,25 +542,24 @@ static void vmcs_load(struct vmcs *vmcs)
>  		       vmcs, phys_addr);
>  }
> 
> -static void __vcpu_clear(void *arg)
> +static void __loaded_vmcs_clear(void *arg)
>  {
> -	struct vcpu_vmx *vmx = arg;
> +	struct loaded_vmcs *loaded_vmcs = arg;
>  	int cpu = raw_smp_processor_id();
> 
> -	if (vmx->vcpu.cpu == cpu)
> -		vmcs_clear(vmx->vmcs);
> -	if (per_cpu(current_vmcs, cpu) == vmx->vmcs)
> +	if (loaded_vmcs->cpu != cpu)
> +		return; /* cpu migration can race with cpu offline */

What do you mean by "cpu migration" here? And why does 'cpu offline' matter
with respect to the cpu change?
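For reference, the two paths that reach __loaded_vmcs_clear() in this patch, as
I read it, are:

	/* cross-cpu IPI from loaded_vmcs_clear(), when the vmcs was last
	 * loaded on another cpu (e.g. from vmx_vcpu_load) */
	smp_call_function_single(
		loaded_vmcs->cpu, __loaded_vmcs_clear, loaded_vmcs, 1);

	/* locally from vmclear_local_loaded_vmcss(), when this cpu goes down */
	list_for_each_entry_safe(v, n, &per_cpu(loaded_vmcss_on_cpu, cpu),
				 loaded_vmcss_on_cpu_link)
		__loaded_vmcs_clear(v);

Is the race you mean between these two paths?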

> +	if (per_cpu(current_vmcs, cpu) == loaded_vmcs->vmcs)
>  		per_cpu(current_vmcs, cpu) = NULL;
> -	list_del(&vmx->local_vcpus_link);
> -	vmx->vcpu.cpu = -1;
> -	vmx->launched = 0;
> +	list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);
> +	loaded_vmcs_init(loaded_vmcs);
>  }
> 
> -static void vcpu_clear(struct vcpu_vmx *vmx)
> +static void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
>  {
> -	if (vmx->vcpu.cpu == -1)
> -		return;
> -	smp_call_function_single(vmx->vcpu.cpu, __vcpu_clear, vmx, 1);
> +	if (loaded_vmcs->cpu != -1)
> +		smp_call_function_single(
> +			loaded_vmcs->cpu, __loaded_vmcs_clear, loaded_vmcs, 1);
>  }
> 
>  static inline void vpid_sync_vcpu_single(struct vcpu_vmx *vmx)
> @@ -971,22 +998,22 @@ static void vmx_vcpu_load(struct kvm_vcp
> 
>  	if (!vmm_exclusive)
>  		kvm_cpu_vmxon(phys_addr);
> -	else if (vcpu->cpu != cpu)
> -		vcpu_clear(vmx);
> +	else if (vmx->loaded_vmcs->cpu != cpu)
> +		loaded_vmcs_clear(vmx->loaded_vmcs);
> 
> -	if (per_cpu(current_vmcs, cpu) != vmx->vmcs) {
> -		per_cpu(current_vmcs, cpu) = vmx->vmcs;
> -		vmcs_load(vmx->vmcs);
> +	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
> +		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
> +		vmcs_load(vmx->loaded_vmcs->vmcs);
>  	}
> 
> -	if (vcpu->cpu != cpu) {
> +	if (vmx->loaded_vmcs->cpu != cpu) {
>  		struct desc_ptr *gdt = &__get_cpu_var(host_gdt);
>  		unsigned long sysenter_esp;
> 
>  		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>  		local_irq_disable();
> -		list_add(&vmx->local_vcpus_link,
> -			 &per_cpu(vcpus_on_cpu, cpu));
> +		list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link,
> +			 &per_cpu(loaded_vmcss_on_cpu, cpu));
>  		local_irq_enable();
> 
>  		/*
> @@ -998,6 +1025,7 @@ static void vmx_vcpu_load(struct kvm_vcp
> 
>  		rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
>  		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
> +		vmx->loaded_vmcs->cpu = cpu;
>  	}
>  }
> 
> @@ -1005,7 +1033,8 @@ static void vmx_vcpu_put(struct kvm_vcpu
>  {
>  	__vmx_load_host_state(to_vmx(vcpu));
>  	if (!vmm_exclusive) {
> -		__vcpu_clear(to_vmx(vcpu));
> +		__loaded_vmcs_clear(to_vmx(vcpu)->loaded_vmcs);
> +		vcpu->cpu = -1;
>  		kvm_cpu_vmxoff();
>  	}
>  }
> @@ -1469,7 +1498,7 @@ static int hardware_enable(void *garbage
>  	if (read_cr4() & X86_CR4_VMXE)
>  		return -EBUSY;
> 
> -	INIT_LIST_HEAD(&per_cpu(vcpus_on_cpu, cpu));
> +	INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
>  	rdmsrl(MSR_IA32_FEATURE_CONTROL, old);
> 
>  	test_bits = FEATURE_CONTROL_LOCKED;
> @@ -1493,14 +1522,14 @@ static int hardware_enable(void *garbage
>  	return 0;
>  }
> 
> -static void vmclear_local_vcpus(void)
> +static void vmclear_local_loaded_vmcss(void)
>  {
>  	int cpu = raw_smp_processor_id();
> -	struct vcpu_vmx *vmx, *n;
> +	struct loaded_vmcs *v, *n;
> 
> -	list_for_each_entry_safe(vmx, n, &per_cpu(vcpus_on_cpu, cpu),
> -				 local_vcpus_link)
> -		__vcpu_clear(vmx);
> +	list_for_each_entry_safe(v, n, &per_cpu(loaded_vmcss_on_cpu, cpu),
> +				 loaded_vmcss_on_cpu_link)
> +		__loaded_vmcs_clear(v);
>  }
> 
> 
> @@ -1515,7 +1544,7 @@ static void kvm_cpu_vmxoff(void)
>  static void hardware_disable(void *garbage)
>  {
>  	if (vmm_exclusive) {
> -		vmclear_local_vcpus();
> +		vmclear_local_loaded_vmcss();
>  		kvm_cpu_vmxoff();
>  	}
>  	write_cr4(read_cr4() & ~X86_CR4_VMXE);
> @@ -1696,6 +1725,18 @@ static void free_vmcs(struct vmcs *vmcs)
>  	free_pages((unsigned long)vmcs, vmcs_config.order);
>  }
> 
> +/*
> + * Free a VMCS, but before that VMCLEAR it on the CPU where it was last loaded
> + */
> +static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
> +{
> +	if (!loaded_vmcs->vmcs)
> +		return;
> +	loaded_vmcs_clear(loaded_vmcs);
> +	free_vmcs(loaded_vmcs->vmcs);
> +	loaded_vmcs->vmcs = NULL;
> +}

I'd prefer not to do the cleanup work through loaded_vmcs, since it's just a
pointer to a loaded vmcs structure. Though you can carefully arrange for the
nested vmcs cleanup to happen before it, that's not very clean and is a bit
error prone when reading this function by itself. It's clearer to directly
clean up vmcs01, and if you want, an assertion could be added to make sure
loaded_vmcs doesn't point to any stale vmcs02 structure after the nested
cleanup step.
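Something like the following rough fragment is what I have in mind (purely
illustrative, assuming the nested vmcs02 cleanup has already run at this point):

	/* after nested cleanup, loaded_vmcs must not point at a stale vmcs02 */
	WARN_ON(vmx->loaded_vmcs != &vmx->vmcs01);
	free_loaded_vmcs(&vmx->vmcs01);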

Thanks,
Kevin 

Thread overview: 118+ messages
2011-05-16 19:43 [PATCH 0/31] nVMX: Nested VMX, v10 Nadav Har'El
2011-05-16 19:44 ` [PATCH 01/31] nVMX: Add "nested" module option to kvm_intel Nadav Har'El
2011-05-16 19:44 ` [PATCH 02/31] nVMX: Implement VMXON and VMXOFF Nadav Har'El
2011-05-20  7:58   ` Tian, Kevin
2011-05-16 19:45 ` [PATCH 03/31] nVMX: Allow setting the VMXE bit in CR4 Nadav Har'El
2011-05-16 19:45 ` [PATCH 04/31] nVMX: Introduce vmcs12: a VMCS structure for L1 Nadav Har'El
2011-05-16 19:46 ` [PATCH 05/31] nVMX: Implement reading and writing of VMX MSRs Nadav Har'El
2011-05-16 19:46 ` [PATCH 06/31] nVMX: Decoding memory operands of VMX instructions Nadav Har'El
2011-05-16 19:47 ` [PATCH 07/31] nVMX: Introduce vmcs02: VMCS used to run L2 Nadav Har'El
2011-05-20  8:04   ` Tian, Kevin
2011-05-20  8:48     ` Tian, Kevin
2011-05-20 20:32       ` Nadav Har'El
2011-05-22  2:00         ` Tian, Kevin
2011-05-22  7:22           ` Nadav Har'El
2011-05-24  0:54             ` Tian, Kevin
2011-05-22  8:29     ` Nadav Har'El
2011-05-24  1:03       ` Tian, Kevin
2011-05-16 19:48 ` [PATCH 08/31] nVMX: Fix local_vcpus_link handling Nadav Har'El
2011-05-17 13:19   ` Marcelo Tosatti
2011-05-17 13:35     ` Avi Kivity
2011-05-17 14:35       ` Nadav Har'El
2011-05-17 14:42         ` Marcelo Tosatti
2011-05-17 17:57           ` Nadav Har'El
2011-05-17 15:11         ` Avi Kivity
2011-05-17 18:11           ` Nadav Har'El
2011-05-17 18:43             ` Marcelo Tosatti
2011-05-17 19:30               ` Nadav Har'El
2011-05-17 19:52                 ` Marcelo Tosatti
2011-05-18  5:52                   ` Nadav Har'El
2011-05-18  8:31                     ` Avi Kivity
2011-05-18  9:02                       ` Nadav Har'El
2011-05-18  9:16                         ` Avi Kivity
2011-05-18 12:08                     ` Marcelo Tosatti
2011-05-18 12:19                       ` Nadav Har'El
2011-05-22  8:57                       ` Nadav Har'El
2011-05-23 15:49                         ` Avi Kivity
2011-05-23 16:17                           ` Gleb Natapov
2011-05-23 18:59                             ` Nadav Har'El
2011-05-23 19:03                               ` Gleb Natapov
2011-05-23 16:43                           ` Roedel, Joerg
2011-05-23 16:51                             ` Avi Kivity
2011-05-24  9:22                               ` Roedel, Joerg
2011-05-24  9:28                                 ` Nadav Har'El
2011-05-24  9:57                                   ` Roedel, Joerg
2011-05-24 10:08                                     ` Avi Kivity
2011-05-24 10:12                                     ` Nadav Har'El
2011-05-23 18:51                           ` Nadav Har'El
2011-05-24  2:22                             ` Tian, Kevin [this message]
2011-05-24  7:56                               ` Nadav Har'El
2011-05-24  8:20                                 ` Tian, Kevin
2011-05-24 11:05                                   ` Avi Kivity
2011-05-24 11:20                                     ` Tian, Kevin
2011-05-24 11:27                                       ` Avi Kivity
2011-05-24 11:30                                         ` Tian, Kevin
2011-05-24 11:36                                           ` Avi Kivity
2011-05-24 11:40                                             ` Tian, Kevin
2011-05-24 11:59                                               ` Nadav Har'El
2011-05-24  0:57                           ` Tian, Kevin
2011-05-18  8:29                   ` Avi Kivity
2011-05-16 19:48 ` [PATCH 09/31] nVMX: Add VMCS fields to the vmcs12 Nadav Har'El
2011-05-20  8:22   ` Tian, Kevin
2011-05-16 19:49 ` [PATCH 10/31] nVMX: Success/failure of VMX instructions Nadav Har'El
2011-05-16 19:49 ` [PATCH 11/31] nVMX: Implement VMCLEAR Nadav Har'El
2011-05-16 19:50 ` [PATCH 12/31] nVMX: Implement VMPTRLD Nadav Har'El
2011-05-16 19:50 ` [PATCH 13/31] nVMX: Implement VMPTRST Nadav Har'El
2011-05-16 19:51 ` [PATCH 14/31] nVMX: Implement VMREAD and VMWRITE Nadav Har'El
2011-05-16 19:51 ` [PATCH 15/31] nVMX: Move host-state field setup to a function Nadav Har'El
2011-05-16 19:52 ` [PATCH 16/31] nVMX: Move control field setup to functions Nadav Har'El
2011-05-16 19:52 ` [PATCH 17/31] nVMX: Prepare vmcs02 from vmcs01 and vmcs12 Nadav Har'El
2011-05-24  8:02   ` Tian, Kevin
2011-05-24  9:19     ` Nadav Har'El
2011-05-24 10:52       ` Tian, Kevin
2011-05-16 19:53 ` [PATCH 18/31] nVMX: Implement VMLAUNCH and VMRESUME Nadav Har'El
2011-05-24  8:45   ` Tian, Kevin
2011-05-24  9:45     ` Nadav Har'El
2011-05-24 10:54       ` Tian, Kevin
2011-05-25  8:00   ` Tian, Kevin
2011-05-25 13:26     ` Nadav Har'El
2011-05-26  0:42       ` Tian, Kevin
2011-05-16 19:53 ` [PATCH 19/31] nVMX: No need for handle_vmx_insn function any more Nadav Har'El
2011-05-16 19:54 ` [PATCH 20/31] nVMX: Exiting from L2 to L1 Nadav Har'El
2011-05-24 12:58   ` Tian, Kevin
2011-05-24 13:43     ` Nadav Har'El
2011-05-25  0:55       ` Tian, Kevin
2011-05-25  8:06         ` Nadav Har'El
2011-05-25  8:23           ` Tian, Kevin
2011-05-25  2:43   ` Tian, Kevin
2011-05-25 13:21     ` Nadav Har'El
2011-05-26  0:41       ` Tian, Kevin
2011-05-16 19:54 ` [PATCH 21/31] nVMX: vmcs12 checks on nested entry Nadav Har'El
2011-05-25  3:01   ` Tian, Kevin
2011-05-25  5:38     ` Nadav Har'El
2011-05-25  7:33       ` Tian, Kevin
2011-05-16 19:55 ` [PATCH 22/31] nVMX: Deciding if L0 or L1 should handle an L2 exit Nadav Har'El
2011-05-25  7:56   ` Tian, Kevin
2011-05-25 13:45     ` Nadav Har'El
2011-05-16 19:55 ` [PATCH 23/31] nVMX: Correct handling of interrupt injection Nadav Har'El
2011-05-25  8:39   ` Tian, Kevin
2011-05-25  8:45     ` Tian, Kevin
2011-05-25 10:56     ` Nadav Har'El
2011-05-25  9:18   ` Tian, Kevin
2011-05-25 12:33     ` Nadav Har'El
2011-05-25 12:55       ` Tian, Kevin
2011-05-16 19:56 ` [PATCH 24/31] nVMX: Correct handling of exception injection Nadav Har'El
2011-05-16 19:56 ` [PATCH 25/31] nVMX: Correct handling of idt vectoring info Nadav Har'El
2011-05-25 10:02   ` Tian, Kevin
2011-05-25 10:13     ` Nadav Har'El
2011-05-25 10:17       ` Tian, Kevin
2011-05-16 19:57 ` [PATCH 26/31] nVMX: Handling of CR0 and CR4 modifying instructions Nadav Har'El
2011-05-16 19:57 ` [PATCH 27/31] nVMX: Further fixes for lazy FPU loading Nadav Har'El
2011-05-16 19:58 ` [PATCH 28/31] nVMX: Additional TSC-offset handling Nadav Har'El
2011-05-16 19:58 ` [PATCH 29/31] nVMX: Add VMX to list of supported cpuid features Nadav Har'El
2011-05-16 19:59 ` [PATCH 30/31] nVMX: Miscellenous small corrections Nadav Har'El
2011-05-16 19:59 ` [PATCH 31/31] nVMX: Documentation Nadav Har'El
2011-05-25 10:33   ` Tian, Kevin
2011-05-25 11:54     ` Nadav Har'El
2011-05-25 12:11       ` Tian, Kevin
2011-05-25 12:13     ` Muli Ben-Yehuda
