From: "Tian, Kevin" <kevin.tian@intel.com>
To: Nadav Har'El <nyh@il.ibm.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Cc: "gleb@redhat.com" <gleb@redhat.com>, "avi@redhat.com" <avi@redhat.com>
Subject: RE: [PATCH 07/31] nVMX: Introduce vmcs02: VMCS used to run L2
Date: Fri, 20 May 2011 16:04:39 +0800
Message-ID: <625BA99ED14B2D499DC4E29D8138F1505C9BEEFE29@shsmsx502.ccr.corp.intel.com>
In-Reply-To: <201105161947.p4GJlUJb001735@rice.haifa.ibm.com>

> From: Nadav Har'El
> Sent: Tuesday, May 17, 2011 3:48 AM
> 
> We saw in a previous patch that L1 controls its L2 guest with a vmcs12.
> L0 needs to create a real VMCS for running L2. We call that "vmcs02".
> A later patch will contain the code, prepare_vmcs02(), for filling the vmcs02
> fields. This patch only contains code for allocating vmcs02.
> 
> In this version, prepare_vmcs02() sets *all* of vmcs02's fields each time we
> enter from L1 to L2, so keeping just one vmcs02 for the vcpu is enough: It can
> be reused even when L1 runs multiple L2 guests. However, in future versions
> we'll probably want to add an optimization where vmcs02 fields that rarely
> change will not be set each time. For that, we may want to keep around several
> vmcs02s of L2 guests that have recently run, so that potentially we could run
> these L2s again more quickly because fewer vmwrites to vmcs02 will be needed.

That would be a neat enhancement and should yield a clear performance gain.
Possibly we could maintain the vmcs02 pool in step with L1's VMCLEAR operations,
which would resemble how the hardware tracks the cleared and launched states.
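
To make the suggestion concrete, here is a minimal user-space sketch of a
most-recently-used-first pool in which an L1 VMCLEAR drops the matching entry.
It is only a model under stated assumptions: POOL_SIZE, struct pool,
pool_get() and pool_vmclear() are hypothetical names, the integer id stands in
for a real vmcs02, and none of this is the patch's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POOL_SIZE 2 /* hypothetical; the patch sets VMCS02_POOL_SIZE to 1 */

struct pool_entry {
	uint64_t vmptr; /* guest-physical address L1 uses for its VMCS */
	int id;         /* stands in for the vmcs02 backing this entry */
};

struct pool {
	struct pool_entry e[POOL_SIZE]; /* e[0] is the most recently used */
	int num;                        /* number of valid entries */
	int next_id;                    /* id given to the next new entry */
};

static void pool_init(struct pool *p)
{
	memset(p, 0, sizeof(*p));
}

/* Move entry i to the front, shifting entries [0, i) back by one slot. */
static void move_to_front(struct pool *p, int i)
{
	struct pool_entry tmp = p->e[i];

	memmove(&p->e[1], &p->e[0], (size_t)i * sizeof(p->e[0]));
	p->e[0] = tmp;
}

/* Return the id for vmptr: reuse a hit, else allocate, else recycle the LRU. */
static int pool_get(struct pool *p, uint64_t vmptr)
{
	int i;

	assert(p->num <= POOL_SIZE);
	for (i = 0; i < p->num; i++)
		if (p->e[i].vmptr == vmptr) {
			move_to_front(p, i);
			return p->e[0].id;
		}

	if (p->num < POOL_SIZE) {
		/* Room left: create a new entry at the tail. */
		p->e[p->num].id = p->next_id++;
		p->num++;
	}
	/* The tail is either the new entry or the least recently used one. */
	i = p->num - 1;
	p->e[i].vmptr = vmptr;
	move_to_front(p, i);
	return p->e[0].id;
}

/* Model of L1 executing VMCLEAR on vmptr: drop the matching entry, if any. */
static void pool_vmclear(struct pool *p, uint64_t vmptr)
{
	int i;

	for (i = 0; i < p->num; i++)
		if (p->e[i].vmptr == vmptr) {
			memmove(&p->e[i], &p->e[i + 1],
				(size_t)(p->num - i - 1) * sizeof(p->e[0]));
			p->num--;
			return;
		}
}
```

In this model an entry leaves the pool either by LRU recycling or because L1
cleared the corresponding VMCS, so the pool never holds state for a VMCS that
L1 considers cleared.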

> 
> This patch adds to each vcpu a vmcs02 pool, vmx->nested.vmcs02_pool,
> which remembers the vmcs02s last used to run up to VMCS02_POOL_SIZE L2s.
> As explained above, in the current version we choose VMCS02_POOL_SIZE=1,
> i.e., one vmcs02 is allocated (and loaded onto the processor), and it is
> reused to enter any L2 guest. In the future, when prepare_vmcs02() is
> optimized not to set all fields every time, VMCS02_POOL_SIZE should be
> increased.
> 
> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
> ---
>  arch/x86/kvm/vmx.c |  139 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 139 insertions(+)
> 
> --- .before/arch/x86/kvm/vmx.c	2011-05-16 22:36:47.000000000 +0300
> +++ .after/arch/x86/kvm/vmx.c	2011-05-16 22:36:47.000000000 +0300
> @@ -117,6 +117,7 @@ static int ple_window = KVM_VMX_DEFAULT_
>  module_param(ple_window, int, S_IRUGO);
> 
>  #define NR_AUTOLOAD_MSRS 1
> +#define VMCS02_POOL_SIZE 1
> 
>  struct vmcs {
>  	u32 revision_id;
> @@ -166,6 +167,30 @@ struct __packed vmcs12 {
>  #define VMCS12_SIZE 0x1000
> 
>  /*
> + * When we temporarily switch a vcpu's VMCS (e.g., stop using an L1's VMCS
> + * while we use L2's VMCS), and we wish to save the previous VMCS, we must also
> + * remember on which CPU it was last loaded (vcpu->cpu), so when we return to
> + * using this VMCS we'll know if we're now running on a different CPU and need
> + * to clear the VMCS on the old CPU, and load it on the new one. Additionally,
> + * we need to remember whether this VMCS was launched (vmx->launched), so when
> + * we return to it we know if to VMLAUNCH or to VMRESUME it (we cannot deduce
> + * this from other state, because it's possible that this VMCS had once been
> + * launched, but has since been cleared after a CPU switch).
> + */
> +struct saved_vmcs {
> +	struct vmcs *vmcs;
> +	int cpu;
> +	int launched;
> +};

"saved" looks a bit misleading here. Isn't this simply a list of all the
active vmcs02s tracked by KVM?
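
For readers following the quoted comment on why the cpu and launched fields
are kept, here is a minimal user-space sketch of the decision they enable.
It assumes a simplified model: struct saved_vmcs_model, model_load() and
model_entered() are hypothetical names, and the real VMCLEAR/VMPTRLD steps
are represented only by field updates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the patch's struct saved_vmcs. */
struct saved_vmcs_model {
	int cpu;      /* CPU where the VMCS was last loaded, -1 if none */
	int launched; /* nonzero if the VMCS was launched since its last clear */
};

enum entry_insn { ENTRY_VMLAUNCH, ENTRY_VMRESUME };

/*
 * Model of switching back to a saved VMCS on this_cpu. Moving to a
 * different CPU implies clearing the VMCS on the old CPU, which resets
 * its launch state, so the next entry must use VMLAUNCH.
 */
static enum entry_insn model_load(struct saved_vmcs_model *s, int this_cpu)
{
	assert(s != NULL);
	if (s->cpu != this_cpu) {
		s->launched = 0;   /* models VMCLEAR on the old CPU */
		s->cpu = this_cpu; /* models VMPTRLD on the new CPU */
	}
	return s->launched ? ENTRY_VMRESUME : ENTRY_VMLAUNCH;
}

/* Record that a guest entry with this VMCS succeeded. */
static void model_entered(struct saved_vmcs_model *s)
{
	s->launched = 1;
}
```

The point of the model is that "launched" cannot be inferred from anything
else: the same VMCS needs VMLAUNCH again after a CPU migration even though it
had been launched before.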

> +
> +/* Used to remember the last vmcs02 used for some recently used vmcs12s */
> +struct vmcs02_list {
> +	struct list_head list;
> +	gpa_t vmcs12_addr;

Please use the name 'vmptr' uniformly, as in the nested_vmx structure:
	/* The guest-physical address of the current VMCS L1 keeps for L2 */
	gpa_t current_vmptr;
	/* The host-usable pointer to the above */
	struct page *current_vmcs12_page;
	struct vmcs12 *current_vmcs12;

You should keep the meaning of vmcs12 consistent: it refers to the arch-neutral
state that only KVM interprets.

> +	struct saved_vmcs vmcs02;
> +};
> +
> +/*
>   * The nested_vmx structure is part of vcpu_vmx, and holds information we need
>   * for correct emulation of VMX (i.e., nested VMX) on this vcpu.
>   */
> @@ -178,6 +203,10 @@ struct nested_vmx {
>  	/* The host-usable pointer to the above */
>  	struct page *current_vmcs12_page;
>  	struct vmcs12 *current_vmcs12;
> +
> +	/* vmcs02_list cache of VMCSs recently used to run L2 guests */
> +	struct list_head vmcs02_pool;
> +	int vmcs02_num;
>  };
> 
>  struct vcpu_vmx {
> @@ -4200,6 +4229,111 @@ static int handle_invalid_op(struct kvm_
>  }
> 
>  /*
> + * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12.
> + * We could reuse a single VMCS for all the L2 guests, but we also want the
> + * option to allocate a separate vmcs02 for each separate loaded vmcs12 - this
> + * allows keeping them loaded on the processor, and in the future will allow
> + * optimizations where prepare_vmcs02 doesn't need to set all the fields on
> + * every entry if they never change.
> + * So we keep, in vmx->nested.vmcs02_pool, a cache of size VMCS02_POOL_SIZE
> + * (>=0) with a vmcs02 for each recently loaded vmcs12, most recent first.
> + *
> + * The following functions allocate and free a vmcs02 in this pool.
> + */
> +
> +static void __nested_free_saved_vmcs(void *arg)
> +{
> +	struct saved_vmcs *saved_vmcs = arg;
> +
> +	vmcs_clear(saved_vmcs->vmcs);
> +	if (per_cpu(current_vmcs, saved_vmcs->cpu) == saved_vmcs->vmcs)
> +		per_cpu(current_vmcs, saved_vmcs->cpu) = NULL;
> +}
> +
> +/*
> + * Free a VMCS, but before that VMCLEAR it on the CPU where it was last loaded
> + * (the necessary information is in the saved_vmcs structure).
> + * See also vcpu_clear() (with different parameters and side-effects)
> + */
> +static void nested_free_saved_vmcs(struct vcpu_vmx *vmx,
> +		struct saved_vmcs *saved_vmcs)
> +{
> +	if (saved_vmcs->cpu != -1)
> +		smp_call_function_single(saved_vmcs->cpu,
> +				__nested_free_saved_vmcs, saved_vmcs, 1);
> +
> +	free_vmcs(saved_vmcs->vmcs);
> +}
> +
> +/* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */
> +static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr)
> +{
> +	struct vmcs02_list *item;
> +	list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
> +		if (item->vmcs12_addr == vmptr) {
> +			nested_free_saved_vmcs(vmx, &item->vmcs02);
> +			list_del(&item->list);
> +			kfree(item);
> +			vmx->nested.vmcs02_num--;
> +			return;
> +		}
> +}
> +
> +/*
> + * Free all VMCSs saved for this vcpu, except the actual vmx->vmcs.
> + * These include the VMCSs in vmcs02_pool (except the one currently used,
> + * if running L2), and saved_vmcs01 when running L2.
> + */
> +static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx)
> +{
> +	struct vmcs02_list *item, *n;
> +	list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) {
> +		if (vmx->vmcs != item->vmcs02.vmcs)
> +			nested_free_saved_vmcs(vmx, &item->vmcs02);
> +		list_del(&item->list);
> +		kfree(item);
> +	}
> +	vmx->nested.vmcs02_num = 0;
> +}
> +
> +/* Get a vmcs02 for the current vmcs12. */
> +static struct saved_vmcs *nested_get_current_vmcs02(struct vcpu_vmx *vmx)
> +{
> +	struct vmcs02_list *item;
> +	list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
> +		if (item->vmcs12_addr == vmx->nested.current_vmptr) {
> +			list_move(&item->list, &vmx->nested.vmcs02_pool);
> +			return &item->vmcs02;
> +		}
> +
> +	if (vmx->nested.vmcs02_num >= max(VMCS02_POOL_SIZE, 1)) {
> +		/* Recycle the least recently used VMCS. */
> +		item = list_entry(vmx->nested.vmcs02_pool.prev,
> +			struct vmcs02_list, list);
> +		item->vmcs12_addr = vmx->nested.current_vmptr;
> +		list_move(&item->list, &vmx->nested.vmcs02_pool);
> +		return &item->vmcs02;
> +	}
> +
> +	/* Create a new vmcs02 */
> +	item = (struct vmcs02_list *)
> +		kmalloc(sizeof(struct vmcs02_list), GFP_KERNEL);
> +	if (!item)
> +		return NULL;
> +	item->vmcs02.vmcs = alloc_vmcs();
> +	if (!item->vmcs02.vmcs) {
> +		kfree(item);
> +		return NULL;
> +	}
> +	item->vmcs12_addr = vmx->nested.current_vmptr;
> +	item->vmcs02.cpu = -1;
> +	item->vmcs02.launched = 0;
> +	list_add(&(item->list), &(vmx->nested.vmcs02_pool));
> +	vmx->nested.vmcs02_num++;
> +	return &item->vmcs02;
> +}
> +
> +/*
>   * Emulate the VMXON instruction.
>   * Currently, we just remember that VMX is active, and do not save or even
>   * inspect the argument to VMXON (the so-called "VMXON pointer") because we
> @@ -4235,6 +4369,9 @@ static int handle_vmon(struct kvm_vcpu *
>  		return 1;
>  	}
> 
> +	INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
> +	vmx->nested.vmcs02_num = 0;
> +
>  	vmx->nested.vmxon = true;
> 
>  	skip_emulated_instruction(vcpu);
> @@ -4286,6 +4423,8 @@ static void free_nested(struct vcpu_vmx
>  		vmx->nested.current_vmptr = -1ull;
>  		vmx->nested.current_vmcs12 = NULL;
>  	}
> +
> +	nested_free_all_saved_vmcss(vmx);
>  }
> 
>  /* Emulate the VMXOFF instruction */
