From: "Nadav Har'El" <nyh@math.technion.ac.il>
To: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org, gleb@redhat.com
Subject: Re: [PATCH 07/29] nVMX: Hold a vmcs02 for each vmcs12
Date: Thu, 3 Feb 2011 14:57:32 +0200	[thread overview]
Message-ID: <20110203125732.GA19503@fermat.math.technion.ac.il> (raw)
In-Reply-To: <4D45372E.2050605@redhat.com>

On Sun, Jan 30, 2011, Avi Kivity wrote about "Re: [PATCH 07/29] nVMX: Hold a vmcs02 for each vmcs12":
>..
> >+static int nested_create_current_vmcs(struct kvm_vcpu *vcpu)
> >+{
>...
> >+	if (vmx->nested.vmcs02_num>= NESTED_MAX_VMCS)
> >+		return -ENOMEM;
> 
> I asked to replace this by dropping the entire vmcs02_list (or perhaps 
> just its tail).

Hi, here is a completely rewritten patch.

Now we make no guarantee to keep one vmcs02 for each vmcs12. Rather, we
keep a limited pool of vmcs02s. When possible, we reuse the same vmcs02
that we previously used for the current vmcs12. Otherwise, we take one of
the others (the least recently used) and use that instead. Of course, if
the pool is not yet full, we can also allocate a new vmcs02.

The current default size of the pool is 1, meaning that we keep just one
vmcs02 (per vcpu) and use it for every L2, of which there may be many.
Because in this version prepare_vmcs02 sets all vmcs02 fields on every
entry, and doesn't try to avoid setting rarely modified fields, there is
nothing to gain by starting from the vmcs02 previously used to run a
particular L2. In the future, when we have an optimized prepare_vmcs02
which doesn't set every field each time, it will make sense to increase
the pool size.
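
To make the intended use of this pool concrete, here is a rough sketch of
how the VMLAUNCH/VMRESUME emulation (added only in a later patch of this
series) might call into it. This is not code from the patch below, the
function name is purely illustrative, and prepare_vmcs02() is the later
patch's function mentioned above:

static int nested_run_l2_sketch(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	struct saved_vmcs *vmcs02;

	/*
	 * Reuse the vmcs02 last used for the current vmcs12, recycle the
	 * least recently used one, or allocate a new one if the pool is
	 * not full yet.
	 */
	vmcs02 = nested_get_current_vmcs02(vmx);
	if (!vmcs02)
		return -ENOMEM;

	/*
	 * A later patch will switch the hardware to vmcs02->vmcs, fill
	 * its fields with prepare_vmcs02(), and use vmcs02->launched to
	 * decide between VMLAUNCH and VMRESUME.
	 */
	return 0;
}

The point is only the call order: look up (or allocate) a vmcs02 for the
current vmcs12 before preparing it and entering L2.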



Subject: [PATCH 07/29] nVMX: Introduce vmcs02: VMCS used to run L2

We saw in a previous patch that L1 controls its L2 guest with a vmcs12.
L0 needs to create a real VMCS for running L2. We call that "vmcs02".
A later patch will contain the code, prepare_vmcs02(), for filling the vmcs02
fields. This patch only contains code for allocating vmcs02.

In this version, prepare_vmcs02() sets *all* of vmcs02's fields each time we
enter from L1 to L2, so keeping just one vmcs02 for the vcpu would have
sufficed: It could be reused even when L1 runs multiple L2 guests.
However, in future versions we'll probably want to add an optimization where
vmcs02 fields that rarely change will not be set each time. For that reason
it is beneficial to keep around several vmcs02s of L2 guests that have
recently run, so that potentially we could run these L2s again more
quickly, because fewer VMWRITEs to the vmcs02 will be needed.

This patch adds to each vcpu a vmcs02 pool, vmx->nested.vmcs02_pool,
which remembers the vmcs02s last used to run up to VMCS02_POOL_SIZE L2s.
Because in the current version prepare_vmcs02() sets all vmcs02 fields no
matter what we start with, we choose VMCS02_POOL_SIZE=1. I.e., one vmcs02
is allocated (and loaded onto the processor), and it is reused to enter any
L2 guest. In the future, when prepare_vmcs02() is optimized not to set all
fields every time, VMCS02_POOL_SIZE should be increased.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |  135 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-02-03 14:46:53.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2011-02-03 14:46:53.000000000 +0200
@@ -117,6 +117,7 @@ static int ple_window = KVM_VMX_DEFAULT_
 module_param(ple_window, int, S_IRUGO);
 
 #define NR_AUTOLOAD_MSRS 1
+#define VMCS02_POOL_SIZE 1
 
 struct vmcs {
 	u32 revision_id;
@@ -159,6 +160,31 @@ struct __packed vmcs12 {
 #define VMCS12_REVISION 0x11e57ed0
 
 /*
+ * When we temporarily switch a vcpu's VMCS (e.g., stop using an L1's VMCS
+ * while we use L2's VMCS), and wish to save the previous VMCS, we must also
+ * remember on which CPU it was last loaded (vcpu->cpu), so when we return to
+ * using this VMCS we'll know if we're now running on a different CPU and need
+ * to clear the VMCS on the old CPU, and load it on the new one. Additionally,
+ * we need to remember whether this VMCS was launched (vmx->launched), so when
+ * we return to it we know whether to VMLAUNCH or to VMRESUME it (we cannot
+ * deduce this from other state, because it's possible that this VMCS had
+ * once been launched, but has since been cleared after a CPU switch, and
+ * now vmx->launched is 0).
+ */
+struct saved_vmcs {
+	struct vmcs *vmcs;
+	int cpu;
+	int launched;
+};
+
+/* Used to remember the last vmcs02 used for some recently used vmcs12s */
+struct vmcs02_list {
+	struct list_head list;
+	gpa_t vmcs12_addr;
+	struct saved_vmcs vmcs02;
+};
+
+/*
  * The nested_vmx structure is part of vcpu_vmx, and holds information we need
  * for correct emulation of VMX (i.e., nested VMX) on this vcpu. For example,
  * the current VMCS set by L1, a list of the VMCSs used to run the active
@@ -173,6 +199,10 @@ struct nested_vmx {
 	/* The host-usable pointer to the above */
 	struct page *current_vmcs12_page;
 	struct vmcs12 *current_vmcs12;
+
+	/* vmcs02_list cache of VMCSs recently used to run L2 guests */
+	struct list_head vmcs02_pool;
+	int vmcs02_num;
 };
 
 struct vcpu_vmx {
@@ -3965,6 +3995,106 @@ static int handle_invalid_op(struct kvm_
 }
 
 /*
+ * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12.
+ * We could reuse a single VMCS for all the L2 guests, but we also want the
+ * option to allocate a separate vmcs02 for each separate loaded vmcs12 - this
+ * allows keeping them loaded on the processor, and in the future will allow
+ * optimizations where prepare_vmcs02 doesn't need to set all the fields on
+ * every entry if they never change.
+ * So we keep, in vmx->nested.vmcs02_pool, a cache of size VMCS02_POOL_SIZE
+ * (>=0) with a vmcs02 for each recently loaded vmcs12, most recent first.
+ *
+ * The following functions allocate and free a vmcs02 in this pool.
+ */
+
+static void __nested_free_saved_vmcs(void *arg)
+{
+	struct saved_vmcs *saved_vmcs = arg;
+
+	vmcs_clear(saved_vmcs->vmcs);
+	if (per_cpu(current_vmcs, saved_vmcs->cpu) == saved_vmcs->vmcs)
+		per_cpu(current_vmcs, saved_vmcs->cpu) = NULL;
+}
+
+/*
+ * Free a VMCS, but before that VMCLEAR it on the CPU where it was last loaded
+ * (the necessary information is in the saved_vmcs structure).
+ * See also vcpu_clear() (with different parameters and side-effects)
+ */
+static void nested_free_saved_vmcs(struct vcpu_vmx *vmx,
+		struct saved_vmcs *saved_vmcs)
+{
+	if (saved_vmcs->cpu != -1)
+		smp_call_function_single(saved_vmcs->cpu,
+				__nested_free_saved_vmcs, saved_vmcs, 1);
+
+	free_vmcs(saved_vmcs->vmcs);
+}
+
+/* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */
+static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr)
+{
+	struct vmcs02_list *item;
+	list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
+		if (item->vmcs12_addr == vmptr) {
+			nested_free_saved_vmcs(vmx, &item->vmcs02);
+			list_del(&item->list);
+			kfree(item);
+			vmx->nested.vmcs02_num--;
+			return;
+		}
+}
+
+/* Free all vmcs02 saved for this vcpu */
+static void nested_free_all_vmcs02(struct vcpu_vmx *vmx)
+{
+	struct vmcs02_list *item, *n;
+	list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) {
+		nested_free_saved_vmcs(vmx, &item->vmcs02);
+		list_del(&item->list);
+		kfree(item);
+	}
+	vmx->nested.vmcs02_num = 0;
+}
+
+/* Get a vmcs02 for the current vmcs12. */
+static struct saved_vmcs *nested_get_current_vmcs02(struct vcpu_vmx *vmx)
+{
+	struct vmcs02_list *item;
+	list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
+		if (item->vmcs12_addr == vmx->nested.current_vmptr) {
+			list_move(&item->list, &vmx->nested.vmcs02_pool);
+			return &item->vmcs02;
+		}
+
+	if (vmx->nested.vmcs02_num >= max(VMCS02_POOL_SIZE, 1)) {
+		/* Recycle the least recently used VMCS. */
+		item = list_entry(vmx->nested.vmcs02_pool.prev,
+			struct vmcs02_list, list);
+		item->vmcs12_addr = vmx->nested.current_vmptr;
+		list_move(&item->list, &vmx->nested.vmcs02_pool);
+		return &item->vmcs02;
+	}
+
+	/* Create a new vmcs02 */
+	item = (struct vmcs02_list *)
+		kmalloc(sizeof(struct vmcs02_list), GFP_KERNEL);
+	if (!item)
+		return NULL;
+	item->vmcs02.vmcs = alloc_vmcs();
+	if (!item->vmcs02.vmcs) {
+		kfree(item);
+		return NULL;
+	}
+	item->vmcs12_addr = vmx->nested.current_vmptr;
+	item->vmcs02.cpu = -1;
+	item->vmcs02.launched = 0;
+	list_add(&(item->list), &(vmx->nested.vmcs02_pool));
+	vmx->nested.vmcs02_num++;
+	return &item->vmcs02;
+}
+
+/*
  * Emulate the VMXON instruction.
  * Currently, we just remember that VMX is active, and do not save or even
  * inspect the argument to VMXON (the so-called "VMXON pointer") because we
@@ -4000,6 +4130,9 @@ static int handle_vmon(struct kvm_vcpu *
 		return 1;
 	}
 
+	INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
+	vmx->nested.vmcs02_num = 0;
+
 	vmx->nested.vmxon = true;
 
 	skip_emulated_instruction(vcpu);
@@ -4050,6 +4183,8 @@ static void free_nested(struct vcpu_vmx 
 		nested_release_page(vmx->nested.current_vmcs12_page);
 		vmx->nested.current_vmptr = -1ull;
 	}
+
+	nested_free_all_vmcs02(vmx);
 }
 
 /* Emulate the VMXOFF instruction */

-- 
Nadav Har'El                        |    Thursday, Feb  3 2011, 29 Shevat 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Boat: A hole in the water surrounded by
http://nadav.harel.org.il           |wood into which one pours money.

