* [PATCH 0/30] nVMX: Nested VMX, v9
@ 2011-05-08  8:15 Nadav Har'El
  2011-05-08  8:15 ` [PATCH 01/30] nVMX: Add "nested" module option to kvm_intel Nadav Har'El
                   ` (30 more replies)
  0 siblings, 31 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:15 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

Hi,

This is the ninth iteration of the nested VMX patch set. This iteration
addresses all of the comments and requests that were raised by reviewers in
the previous rounds, with only a few exceptions, listed below.

Some of the issues which were solved in this version include:

 * Overhauled the hardware VMCS (vmcs02) allocation. Previously we had up to
   256 vmcs02s, one for each L2. Now we only have one, which is reused.
   We also have a compile-time option VMCS02_POOL_SIZE to keep a bigger pool
   of vmcs02s. This option will be useful in the future if vmcs02 won't be
   filled from scratch on each entry from L1 to L2 (currently, it is).

 * The vmcs01 structure, containing a copy of all fields from L1's VMCS, was
   unnecessary, as all the necessary values are either known to KVM or appear
   in vmcs12. This structure is now gone for good.

 * There is no longer a "vmcs_fields" sub-structure that everyone disliked.
   All the VMCS fields appear directly in the vmcs12 structure, which makes
   the code simpler and more readable.

 * Make sure that the vmcs12 fields have fixed sizes and locations, and add
   some extra padding, to support live migration and improve future-proofing.

 * For some fields, nested exit used to fail to return the host-state as set
   by L1. Fixed that.

 * nested_vmx_exit_handled (deciding whether to let L1 handle an exit, or handle it
   in L0 and return to L2) is now more correct, and handles more exit reasons.

 * Complete overhaul of the cr0, exception bitmap, cr3 and cr4 handling code.
   The code is now shorter (uses existing functions like kvm_set_cr3, etc.),
   more readable, and more uniform (no pieces of code for enable_ept and not,
   less special code for cr0.TS, and none of that ugly cr0.PG monkey-business).

 * Use kvm_register_write(), kvm_rip_read(), etc. Got rid of the new and now
   unneeded function sync_cached_regs_to_vmcs().

 * Fixed the values returned by the VMX MSRs to be more correct, and more
   constant (i.e., not to vary needlessly across different hosts).

 * Added some more missing verifications to vmcs12's fields (cleanly failing
   the nested entry if these verifications fail).

 * Expose the MSR-bitmap feature to L1. Every MSR access still exits to L0,
   but slow exits to L1 are avoided when L1's MSR bitmap doesn't want it.

 * Removed or rate limited printouts which could be exploited by guests.

 * Fix VM_ENTRY_LOAD_IA32_PAT feature handling.

 * Fixed a potential bug and verified that nested VMX now works with both
   CONFIG_PREEMPT and CONFIG_SMP enabled.

 * Dozens of other code cleanups and bug fixes.

Only a few issues from previous reviews remain unaddressed. These are:

 * The interrupt injection and IDT_VECTORING_INFO_FIELD handling code was
   still not rewritten. It works, though ;-)

 * No KVM autotests for nested VMX yet.

 * Merging of L0's and L1's MSR bitmaps (and IO bitmaps) is still not
   supported. As explained above, the current code uses L1's MSR bitmap
   to avoid costly exits to L1, but still suffers exits to L0 on each
   MSR access in L2.

 * Still no option for disabling some capabilities advertised to L1.

 * No support for TPR_SHADOW feature for L1.

This new set of patches applies to the current KVM trunk (I checked with
082f9eced53d50c136e42d072598da4be4b9ba23).
If you wish, you can also check out an already-patched version of KVM from
branch "nvmx9" of the repository:
	 git://github.com/nyh/kvm-nested-vmx.git
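
For example, to clone just that branch:

	 git clone -b nvmx9 git://github.com/nyh/kvm-nested-vmx.git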


About nested VMX:
-----------------

The following 30 patches implement nested VMX support. This feature enables
a guest to use the VMX APIs in order to run its own nested guests.
In other words, it allows running hypervisors (that use VMX) under KVM.
Multiple guest hypervisors can be run concurrently, and each of those can
in turn host multiple guests.

The theory behind this work, our implementation, and its performance
characteristics were presented in OSDI 2010 (the USENIX Symposium on
Operating Systems Design and Implementation). Our paper was titled
"The Turtles Project: Design and Implementation of Nested Virtualization",
and was awarded "Jay Lepreau Best Paper". The paper is available online, at:

	http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf

This patch set does not include all the features described in the paper.
In particular, this patch set is missing nested EPT (L1 can't use EPT and
must use shadow page tables). It is also missing some features required to
run VMware hypervisors as guests. These missing features will be sent as
follow-on patches.

Running nested VMX:
-------------------

The nested VMX feature is currently disabled by default. It must be
explicitly enabled with the "nested=1" option to the kvm-intel module.

No modifications are required to user space (qemu). However, qemu's default
emulated CPU type (qemu64) does not list the "VMX" CPU feature, so it must be
explicitly enabled by giving qemu one of the following options:

     -cpu host              (emulated CPU has all features of the real CPU)

     -cpu qemu64,+vmx       (add just the vmx feature to a named CPU type)
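
For example, one possible way to start an L1 guest hypervisor on the L0 host
(the disk image name is just a placeholder, and the exact qemu binary name may
differ between distributions):

     modprobe kvm-intel nested=1
     qemu-system-x86_64 -enable-kvm -cpu host -m 2048 l1-guest-image.qcow2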


This version was only tested with KVM (64-bit) as a guest hypervisor, and
Linux as a nested guest.


Patch statistics:
-----------------

 Documentation/kvm/nested-vmx.txt |  243 ++
 arch/x86/include/asm/kvm_host.h  |    2 
 arch/x86/include/asm/msr-index.h |   12 
 arch/x86/include/asm/vmx.h       |   31 
 arch/x86/kvm/svm.c               |    6 
 arch/x86/kvm/vmx.c               | 2558 +++++++++++++++++++++++++++--
 arch/x86/kvm/x86.c               |   11 
 arch/x86/kvm/x86.h               |    8 
 8 files changed, 2773 insertions(+), 98 deletions(-)

--
Nadav Har'El
IBM Haifa Research Lab


* [PATCH 01/30] nVMX: Add "nested" module option to kvm_intel
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
@ 2011-05-08  8:15 ` Nadav Har'El
  2011-05-08  8:16 ` [PATCH 02/30] nVMX: Implement VMXON and VMXOFF Nadav Har'El
                   ` (29 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:15 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

This patch adds to kvm_intel a module option "nested". This option controls
whether the guest can use VMX instructions, i.e., whether we allow nested
virtualization. A similar, but separate, option already exists for the
SVM module.

This option currently defaults to 0, meaning that nested VMX must be
explicitly enabled by giving nested=1. When nested VMX matures, the default
should probably be changed to enable nested VMX by default - just like
nested SVM is currently enabled by default.
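
For example (an illustration only, not part of the patch), the option can be
enabled when loading the module, and inspected later through sysfs:

	modprobe kvm-intel nested=1
	cat /sys/module/kvm_intel/parameters/nested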

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:17.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:17.000000000 +0300
@@ -72,6 +72,14 @@ module_param(vmm_exclusive, bool, S_IRUG
 static int __read_mostly yield_on_hlt = 1;
 module_param(yield_on_hlt, bool, S_IRUGO);
 
+/*
+ * If nested=1, nested virtualization is supported, i.e., guests may use
+ * VMX and be a hypervisor for its own guests. If nested=0, guests may not
+ * use VMX instructions.
+ */
+static int __read_mostly nested = 0;
+module_param(nested, bool, S_IRUGO);
+
 #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST				\
 	(X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
 #define KVM_GUEST_CR0_MASK						\
@@ -1261,6 +1269,23 @@ static u64 vmx_compute_tsc_offset(struct
 	return target_tsc - native_read_tsc();
 }
 
+static bool guest_cpuid_has_vmx(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best = kvm_find_cpuid_entry(vcpu, 1, 0);
+	return best && (best->ecx & (1 << (X86_FEATURE_VMX & 31)));
+}
+
+/*
+ * nested_vmx_allowed() checks whether a guest should be allowed to use VMX
+ * instructions and MSRs (i.e., nested VMX). Nested VMX is disabled for
+ * all guests if the "nested" module option is off, and can also be disabled
+ * for a single guest by disabling its VMX cpuid bit.
+ */
+static inline bool nested_vmx_allowed(struct kvm_vcpu *vcpu)
+{
+	return nested && guest_cpuid_has_vmx(vcpu);
+}
+
 /*
  * Reads an msr value (of 'msr_index') into 'pdata'.
  * Returns 0 on success, non-0 otherwise.


* [PATCH 02/30] nVMX: Implement VMXON and VMXOFF
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
  2011-05-08  8:15 ` [PATCH 01/30] nVMX: Add "nested" module option to kvm_intel Nadav Har'El
@ 2011-05-08  8:16 ` Nadav Har'El
  2011-05-08  8:16 ` [PATCH 03/30] nVMX: Allow setting the VMXE bit in CR4 Nadav Har'El
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:16 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

This patch allows a guest to use the VMXON and VMXOFF instructions, and
emulates them accordingly. Basically this amounts to checking some
prerequisites, and then remembering whether the guest has enabled or disabled
VMX operation.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |  110 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 108 insertions(+), 2 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:17.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:17.000000000 +0300
@@ -130,6 +130,15 @@ struct shared_msr_entry {
 	u64 mask;
 };
 
+/*
+ * The nested_vmx structure is part of vcpu_vmx, and holds information we need
+ * for correct emulation of VMX (i.e., nested VMX) on this vcpu.
+ */
+struct nested_vmx {
+	/* Has the level1 guest done vmxon? */
+	bool vmxon;
+};
+
 struct vcpu_vmx {
 	struct kvm_vcpu       vcpu;
 	struct list_head      local_vcpus_link;
@@ -184,6 +193,9 @@ struct vcpu_vmx {
 	u32 exit_reason;
 
 	bool rdtscp_enabled;
+
+	/* Support for a guest hypervisor (nested VMX) */
+	struct nested_vmx nested;
 };
 
 enum segment_cache_field {
@@ -3890,6 +3902,99 @@ static int handle_invalid_op(struct kvm_
 }
 
 /*
+ * Emulate the VMXON instruction.
+ * Currently, we just remember that VMX is active, and do not save or even
+ * inspect the argument to VMXON (the so-called "VMXON pointer") because we
+ * do not currently need to store anything in that guest-allocated memory
+ * region. Consequently, VMCLEAR and VMPTRLD also do not verify that their
+ * argument is different from the VMXON pointer (which the spec says they do).
+ */
+static int handle_vmon(struct kvm_vcpu *vcpu)
+{
+	struct kvm_segment cs;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	/* The Intel VMX Instruction Reference lists a bunch of bits that
+	 * are prerequisite to running VMXON, most notably cr4.VMXE must be
+	 * set to 1 (see vmx_set_cr4() for when we allow the guest to set this).
+	 * Otherwise, we should fail with #UD. We test these now:
+	 */
+	if (!kvm_read_cr4_bits(vcpu, X86_CR4_VMXE) ||
+	    !kvm_read_cr0_bits(vcpu, X86_CR0_PE) ||
+	    (vmx_get_rflags(vcpu) & X86_EFLAGS_VM)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
+	if (is_long_mode(vcpu) && !cs.l) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	if (vmx_get_cpl(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	vmx->nested.vmxon = true;
+
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
+/*
+ * Intel's VMX Instruction Reference specifies a common set of prerequisites
+ * for running VMX instructions (except VMXON, whose prerequisites are
+ * slightly different). It also specifies what exception to inject otherwise.
+ */
+static int nested_vmx_check_permission(struct kvm_vcpu *vcpu)
+{
+	struct kvm_segment cs;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (!vmx->nested.vmxon) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 0;
+	}
+
+	vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
+	if ((vmx_get_rflags(vcpu) & X86_EFLAGS_VM) ||
+	    (is_long_mode(vcpu) && !cs.l)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 0;
+	}
+
+	if (vmx_get_cpl(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+		return 0;
+	}
+
+	return 1;
+}
+
+/*
+ * Free whatever needs to be freed from vmx->nested when L1 goes down, or
+ * just stops using VMX.
+ */
+static void free_nested(struct vcpu_vmx *vmx)
+{
+	if (!vmx->nested.vmxon)
+		return;
+	vmx->nested.vmxon = false;
+}
+
+/* Emulate the VMXOFF instruction */
+static int handle_vmoff(struct kvm_vcpu *vcpu)
+{
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+	free_nested(to_vmx(vcpu));
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
+/*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
  * to be done to userspace and return 0.
@@ -3917,8 +4022,8 @@ static int (*kvm_vmx_exit_handlers[])(st
 	[EXIT_REASON_VMREAD]                  = handle_vmx_insn,
 	[EXIT_REASON_VMRESUME]                = handle_vmx_insn,
 	[EXIT_REASON_VMWRITE]                 = handle_vmx_insn,
-	[EXIT_REASON_VMOFF]                   = handle_vmx_insn,
-	[EXIT_REASON_VMON]                    = handle_vmx_insn,
+	[EXIT_REASON_VMOFF]                   = handle_vmoff,
+	[EXIT_REASON_VMON]                    = handle_vmon,
 	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
 	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
 	[EXIT_REASON_WBINVD]                  = handle_wbinvd,
@@ -4329,6 +4434,7 @@ static void vmx_free_vcpu(struct kvm_vcp
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
 	free_vpid(vmx);
+	free_nested(vmx);
 	vmx_free_vmcs(vcpu);
 	kfree(vmx->guest_msrs);
 	kvm_vcpu_uninit(vcpu);


* [PATCH 03/30] nVMX: Allow setting the VMXE bit in CR4
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
  2011-05-08  8:15 ` [PATCH 01/30] nVMX: Add "nested" module option to kvm_intel Nadav Har'El
  2011-05-08  8:16 ` [PATCH 02/30] nVMX: Implement VMXON and VMXOFF Nadav Har'El
@ 2011-05-08  8:16 ` Nadav Har'El
  2011-05-08  8:17 ` [PATCH 04/30] nVMX: Introduce vmcs12: a VMCS structure for L1 Nadav Har'El
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:16 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

This patch allows the guest to enable the VMXE bit in CR4, which is a
prerequisite to running VMXON.

Whether to allow setting the VMXE bit now depends on the architecture (svm
or vmx), so the check has moved into kvm_x86_ops->set_cr4(). This function
now returns an int: If kvm_x86_ops->set_cr4() returns 1, __kvm_set_cr4()
will also return 1, and this will cause kvm_set_cr4() to throw a #GP.

Turning on the VMXE bit is allowed only when the nested VMX feature is
enabled, and turning it off is forbidden after a vmxon.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/include/asm/kvm_host.h |    2 +-
 arch/x86/kvm/svm.c              |    6 +++++-
 arch/x86/kvm/vmx.c              |   17 +++++++++++++++--
 arch/x86/kvm/x86.c              |    4 +---
 4 files changed, 22 insertions(+), 7 deletions(-)

--- .before/arch/x86/include/asm/kvm_host.h	2011-05-08 10:43:17.000000000 +0300
+++ .after/arch/x86/include/asm/kvm_host.h	2011-05-08 10:43:17.000000000 +0300
@@ -559,7 +559,7 @@ struct kvm_x86_ops {
 	void (*decache_cr4_guest_bits)(struct kvm_vcpu *vcpu);
 	void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0);
 	void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
-	void (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4);
+	int (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4);
 	void (*set_efer)(struct kvm_vcpu *vcpu, u64 efer);
 	void (*get_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 	void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
--- .before/arch/x86/kvm/svm.c	2011-05-08 10:43:17.000000000 +0300
+++ .after/arch/x86/kvm/svm.c	2011-05-08 10:43:17.000000000 +0300
@@ -1496,11 +1496,14 @@ static void svm_set_cr0(struct kvm_vcpu 
 	update_cr0_intercept(svm);
 }
 
-static void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+static int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
 	unsigned long host_cr4_mce = read_cr4() & X86_CR4_MCE;
 	unsigned long old_cr4 = to_svm(vcpu)->vmcb->save.cr4;
 
+	if (cr4 & X86_CR4_VMXE)
+		return 1;
+
 	if (npt_enabled && ((old_cr4 ^ cr4) & X86_CR4_PGE))
 		svm_flush_tlb(vcpu);
 
@@ -1510,6 +1513,7 @@ static void svm_set_cr4(struct kvm_vcpu 
 	cr4 |= host_cr4_mce;
 	to_svm(vcpu)->vmcb->save.cr4 = cr4;
 	mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR);
+	return 0;
 }
 
 static void svm_set_segment(struct kvm_vcpu *vcpu,
--- .before/arch/x86/kvm/x86.c	2011-05-08 10:43:18.000000000 +0300
+++ .after/arch/x86/kvm/x86.c	2011-05-08 10:43:18.000000000 +0300
@@ -615,11 +615,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, u
 				   kvm_read_cr3(vcpu)))
 		return 1;
 
-	if (cr4 & X86_CR4_VMXE)
+	if (kvm_x86_ops->set_cr4(vcpu, cr4))
 		return 1;
 
-	kvm_x86_ops->set_cr4(vcpu, cr4);
-
 	if ((cr4 ^ old_cr4) & pdptr_bits)
 		kvm_mmu_reset_context(vcpu);
 
--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
@@ -2078,7 +2078,7 @@ static void ept_save_pdptrs(struct kvm_v
 		  (unsigned long *)&vcpu->arch.regs_dirty);
 }
 
-static void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
+static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 
 static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
 					unsigned long cr0,
@@ -2175,11 +2175,23 @@ static void vmx_set_cr3(struct kvm_vcpu 
 	vmcs_writel(GUEST_CR3, guest_cr3);
 }
 
-static void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
 	unsigned long hw_cr4 = cr4 | (to_vmx(vcpu)->rmode.vm86_active ?
 		    KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
 
+	if (cr4 & X86_CR4_VMXE) {
+		/*
+		 * To use VMXON (and later other VMX instructions), a guest
+		 * must first be able to turn on cr4.VMXE (see handle_vmon()).
+		 * So basically the check on whether to allow nested VMX
+		 * is here.
+		 */
+		if (!nested_vmx_allowed(vcpu))
+			return 1;
+	} else if (to_vmx(vcpu)->nested.vmxon)
+		return 1;
+
 	vcpu->arch.cr4 = cr4;
 	if (enable_ept) {
 		if (!is_paging(vcpu)) {
@@ -2192,6 +2204,7 @@ static void vmx_set_cr4(struct kvm_vcpu 
 
 	vmcs_writel(CR4_READ_SHADOW, cr4);
 	vmcs_writel(GUEST_CR4, hw_cr4);
+	return 0;
 }
 
 static void vmx_get_segment(struct kvm_vcpu *vcpu,


* [PATCH 04/30] nVMX: Introduce vmcs12: a VMCS structure for L1
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (2 preceding siblings ...)
  2011-05-08  8:16 ` [PATCH 03/30] nVMX: Allow setting the VMXE bit in CR4 Nadav Har'El
@ 2011-05-08  8:17 ` Nadav Har'El
  2011-05-08  8:17 ` [PATCH 05/30] nVMX: Implement reading and writing of VMX MSRs Nadav Har'El
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:17 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

An implementation of VMX needs to define a VMCS structure. This structure
is kept in guest memory, but is opaque to the guest (who can only read or
write it with VMX instructions).

This patch starts to define the VMCS structure which our nested VMX
implementation will present to L1. We call it "vmcs12", as it is the VMCS
that L1 keeps for its L2 guest. We will add more content to this structure
in later patches.

This patch also adds the notion (as required by the VMX spec) of L1's "current
VMCS", and finally includes utility functions for mapping the guest-allocated
VMCSs in host memory.
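
The following rough sketch (illustrative only, not part of this patch) shows
how an L1 guest hypervisor is expected to prepare such a region before handing
its physical address to VMPTRLD, given the conventions defined here:

	/* L1 (guest hypervisor) side - illustration only */
	u64 basic;
	u32 *region;

	rdmsrl(MSR_IA32_VMX_BASIC, basic);
	region = (u32 *)get_zeroed_page(GFP_KERNEL); /* 4K, page-aligned */
	region[0] = (u32)basic;  /* revision id; will read back VMCS12_REVISION */
	/* ... then hand __pa(region) to VMPTRLD */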

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   75 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
@@ -131,12 +131,53 @@ struct shared_msr_entry {
 };
 
 /*
+ * struct vmcs12 describes the state that our guest hypervisor (L1) keeps for a
+ * single nested guest (L2), hence the name vmcs12. Any VMX implementation has
+ * a VMCS structure, and vmcs12 is our emulated VMX's VMCS. This structure is
+ * stored in guest memory specified by VMPTRLD, but is opaque to the guest,
+ * which must access it using VMREAD/VMWRITE/VMCLEAR instructions.
+ * More than one of these structures may exist, if L1 runs multiple L2 guests.
+ * nested_vmx_run() will use the data here to build a vmcs02: a VMCS for the
+ * underlying hardware which will be used to run L2.
+ * This structure is packed to ensure that its layout is identical across
+ * machines (necessary for live migration).
+ * If there are changes in this struct, VMCS12_REVISION must be changed.
+ */
+struct __packed vmcs12 {
+	/* According to the Intel spec, a VMCS region must start with the
+	 * following two fields. Then follow implementation-specific data.
+	 */
+	u32 revision_id;
+	u32 abort;
+};
+
+/*
+ * VMCS12_REVISION is an arbitrary id that should be changed if the content or
+ * layout of struct vmcs12 is changed. MSR_IA32_VMX_BASIC returns this id, and
+ * VMPTRLD verifies that the VMCS region that L1 is loading contains this id.
+ */
+#define VMCS12_REVISION 0x11e57ed0
+
+/*
+ * VMCS12_SIZE is the number of bytes L1 should allocate for the VMXON region
+ * and any VMCS region. Although only sizeof(struct vmcs12) are used by the
+ * current implementation, 4K are reserved to avoid future complications.
+ */
+#define VMCS12_SIZE 0x1000
+
+/*
  * The nested_vmx structure is part of vcpu_vmx, and holds information we need
  * for correct emulation of VMX (i.e., nested VMX) on this vcpu.
  */
 struct nested_vmx {
 	/* Has the level1 guest done vmxon? */
 	bool vmxon;
+
+	/* The guest-physical address of the current VMCS L1 keeps for L2 */
+	gpa_t current_vmptr;
+	/* The host-usable pointer to the above */
+	struct page *current_vmcs12_page;
+	struct vmcs12 *current_vmcs12;
 };
 
 struct vcpu_vmx {
@@ -212,6 +253,31 @@ static inline struct vcpu_vmx *to_vmx(st
 	return container_of(vcpu, struct vcpu_vmx, vcpu);
 }
 
+static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)
+{
+	return to_vmx(vcpu)->nested.current_vmcs12;
+}
+
+static struct page *nested_get_page(struct kvm_vcpu *vcpu, gpa_t addr)
+{
+	struct page *page = gfn_to_page(vcpu->kvm, addr >> PAGE_SHIFT);
+	if (is_error_page(page)) {
+		kvm_release_page_clean(page);
+		return NULL;
+	}
+	return page;
+}
+
+static void nested_release_page(struct page *page)
+{
+	kvm_release_page_dirty(page);
+}
+
+static void nested_release_page_clean(struct page *page)
+{
+	kvm_release_page_clean(page);
+}
+
 static u64 construct_eptp(unsigned long root_hpa);
 static void kvm_cpu_vmxon(u64 addr);
 static void kvm_cpu_vmxoff(void);
@@ -3995,6 +4061,12 @@ static void free_nested(struct vcpu_vmx 
 	if (!vmx->nested.vmxon)
 		return;
 	vmx->nested.vmxon = false;
+	if (vmx->nested.current_vmptr != -1ull) {
+		kunmap(vmx->nested.current_vmcs12_page);
+		nested_release_page(vmx->nested.current_vmcs12_page);
+		vmx->nested.current_vmptr = -1ull;
+		vmx->nested.current_vmcs12 = NULL;
+	}
 }
 
 /* Emulate the VMXOFF instruction */
@@ -4518,6 +4590,9 @@ static struct kvm_vcpu *vmx_create_vcpu(
 			goto free_vmcs;
 	}
 
+	vmx->nested.current_vmptr = -1ull;
+	vmx->nested.current_vmcs12 = NULL;
+
 	return &vmx->vcpu;
 
 free_vmcs:


* [PATCH 05/30] nVMX: Implement reading and writing of VMX MSRs
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (3 preceding siblings ...)
  2011-05-08  8:17 ` [PATCH 04/30] nVMX: Introduce vmcs12: a VMCS structure for L1 Nadav Har'El
@ 2011-05-08  8:17 ` Nadav Har'El
  2011-05-08  8:18 ` [PATCH 06/30] nVMX: Decoding memory operands of VMX instructions Nadav Har'El
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:17 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

When the guest can use VMX instructions (when the "nested" module option is
on), it should also be able to read and write VMX MSRs, e.g., to query about
VMX capabilities. This patch adds this support.
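
As a small worked illustration of the "allowed-0"/"allowed-1" (low/high)
convention these capability MSRs use, and which the vmx_control_verify()
helper added below checks (the values here are only an example): with
low = 0x16 (bits 1, 2 and 4 must be 1) and high = 0x16 | PIN_BASED_EXT_INTR_MASK
(only these bits may be 1), we get:

	vmx_control_verify(0x16, low, high)                           -> true
	vmx_control_verify(0x16 | PIN_BASED_EXT_INTR_MASK, low, high) -> true
	vmx_control_verify(0x06, low, high) -> false (bit 4 must be 1 but is 0)
	vmx_control_verify(0x36, low, high) -> false (bit 5 set but not allowed)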

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/include/asm/msr-index.h |   12 ++
 arch/x86/kvm/vmx.c               |  174 +++++++++++++++++++++++++++++
 2 files changed, 186 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
@@ -1365,6 +1365,176 @@ static inline bool nested_vmx_allowed(st
 }
 
 /*
+ * nested_vmx_pinbased_ctls() returns the value which is to be returned
+ * for MSR_IA32_VMX_PINBASED_CTLS, and also determines the legal setting of
+ * vmcs12->pin_based_vm_exec_control. See the spec and vmx_control_verify() for
+ * the meaning of the low and high halves of this MSR.
+ * TODO: allow the return value to be modified (downgraded) by module options
+ * or other means.
+ */
+static inline void nested_vmx_pinbased_ctls(u32 *low, u32 *high)
+{
+	/*
+	 * According to the Intel spec, if bit 55 of VMX_BASIC is off (as it is
+	 * in our case), bits 1, 2 and 4 (i.e., 0x16) must be 1 in this MSR.
+	 */
+	*low = 0x16 ;
+	/* Allow only these bits to be 1 */
+	*high = 0x16 | PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING
+		     | PIN_BASED_VIRTUAL_NMIS;
+}
+
+static inline bool vmx_control_verify(u32 control, u32 low, u32 high)
+{
+	/*
+	 * Bits 0 in high must be 0, and bits 1 in low must be 1.
+	 */
+	return ((control & high) | low) == control;
+}
+
+/*
+ * If we allow our guest to use VMX instructions (i.e., nested VMX), we should
+ * also let it use VMX-specific MSRs.
+ * vmx_get_vmx_msr() and vmx_set_vmx_msr() return 1 when we handled a
+ * VMX-specific MSR, or 0 when we haven't (and the caller should handle it
+ * like all other MSRs).
+ */
+static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
+{
+	u32 vmx_msr_high, vmx_msr_low;
+
+	if (!nested_vmx_allowed(vcpu) && msr_index >= MSR_IA32_VMX_BASIC &&
+		     msr_index <= MSR_IA32_VMX_TRUE_ENTRY_CTLS) {
+		/*
+		 * According to the spec, processors which do not support VMX
+		 * should throw a #GP(0) when VMX capability MSRs are read.
+		 */
+		kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
+		return 1;
+	}
+
+	switch (msr_index) {
+	case MSR_IA32_FEATURE_CONTROL:
+		*pdata = 0;
+		break;
+	case MSR_IA32_VMX_BASIC:
+		/*
+		 * This MSR reports some information about VMX support. We
+		 * should return information about the VMX we emulate for the
+		 * guest, and the VMCS structure we give it - not about the
+		 * VMX support of the underlying hardware.
+		 */
+		*pdata = VMCS12_REVISION |
+			   ((u64)VMCS12_SIZE << VMX_BASIC_VMCS_SIZE_SHIFT) |
+			   (VMX_BASIC_MEM_TYPE_WB << VMX_BASIC_MEM_TYPE_SHIFT);
+		break;
+	case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
+	case MSR_IA32_VMX_PINBASED_CTLS:
+		nested_vmx_pinbased_ctls(&vmx_msr_low, &vmx_msr_high);
+		*pdata = vmx_msr_low | ((u64)vmx_msr_high << 32);
+		break;
+	case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
+	case MSR_IA32_VMX_PROCBASED_CTLS:
+		/* This MSR determines which vm-execution controls the L1
+		 * hypervisor may ask, or may not ask, to enable. Normally we
+		 * can only allow enabling features which the hardware can
+		 * support, but we limit ourselves to allowing only known
+		 * features that were tested nested. We can allow disabling any
+		 * feature (even if the hardware can't disable it) - we just
+		 * need to enable this feature and hide the extra exits from L1
+		 */
+		rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
+		vmx_msr_low = 0; /* allow disabling any feature */
+		vmx_msr_high &= /* do not expose new untested features */
+			CPU_BASED_HLT_EXITING | CPU_BASED_CR3_LOAD_EXITING |
+			CPU_BASED_CR3_STORE_EXITING | CPU_BASED_USE_IO_BITMAPS |
+			CPU_BASED_MOV_DR_EXITING | CPU_BASED_USE_TSC_OFFSETING |
+			CPU_BASED_MWAIT_EXITING | CPU_BASED_MONITOR_EXITING |
+			CPU_BASED_INVLPG_EXITING |
+#ifdef CONFIG_X86_64
+			CPU_BASED_CR8_LOAD_EXITING |
+			CPU_BASED_CR8_STORE_EXITING |
+#endif
+			CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+		/*
+		 * We can allow some features even when not supported by the
+		 * hardware. For example, L1 can specify an MSR bitmap - and we
+		 * can use it to avoid exits to L1 - even when L0 runs L2
+		 * without MSR bitmaps.
+		 */
+		vmx_msr_high |= CPU_BASED_USE_MSR_BITMAPS;
+		*pdata = vmx_msr_low | ((u64)vmx_msr_high << 32);
+		break;
+	case MSR_IA32_VMX_TRUE_EXIT_CTLS:
+	case MSR_IA32_VMX_EXIT_CTLS:
+		*pdata = 0;
+#ifdef CONFIG_X86_64
+		*pdata |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
+#endif
+		break;
+	case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
+	case MSR_IA32_VMX_ENTRY_CTLS:
+		*pdata = 0;
+		break;
+	case MSR_IA32_VMX_MISC:
+		*pdata = 0;
+		break;
+	/*
+	 * These MSRs specify bits which the guest must keep fixed (on or off)
+	 * while L1 is in VMXON mode (in L1's root mode, or running an L2).
+	 * We picked the standard core2 setting.
+	 */
+#define VMXON_CR0_ALWAYSON	(X86_CR0_PE | X86_CR0_PG | X86_CR0_NE)
+#define VMXON_CR4_ALWAYSON	X86_CR4_VMXE
+	case MSR_IA32_VMX_CR0_FIXED0:
+		*pdata = VMXON_CR0_ALWAYSON;
+		break;
+	case MSR_IA32_VMX_CR0_FIXED1:
+		*pdata = -1ULL;
+		break;
+	case MSR_IA32_VMX_CR4_FIXED0:
+		*pdata = VMXON_CR4_ALWAYSON;
+		break;
+	case MSR_IA32_VMX_CR4_FIXED1:
+		*pdata = -1ULL;
+		break;
+	case MSR_IA32_VMX_VMCS_ENUM:
+		*pdata = 0x1f;
+		break;
+	case MSR_IA32_VMX_PROCBASED_CTLS2:
+		rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2, vmx_msr_low, vmx_msr_high);
+		vmx_msr_low = 0; /* allow disabling any feature */
+		vmx_msr_high &= /* do not expose new untested features */
+			SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+		*pdata = vmx_msr_low | ((u64)vmx_msr_high << 32);
+		break;
+	case MSR_IA32_VMX_EPT_VPID_CAP:
+		/* Currently, no nested ept or nested vpid */
+		*pdata = 0;
+		break;
+	default:
+		return 0;
+	}
+
+	return 1;
+}
+
+static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
+{
+	if (!nested_vmx_allowed(vcpu))
+		return 0;
+
+	if (msr_index == MSR_IA32_FEATURE_CONTROL)
+		/* TODO: the right thing. */
+		return 1;
+	/*
+	 * No need to treat VMX capability MSRs specially: If we don't handle
+	 * them, handle_wrmsr will #GP(0), which is correct (they are readonly)
+	 */
+	return 0;
+}
+
+/*
  * Reads an msr value (of 'msr_index') into 'pdata'.
  * Returns 0 on success, non-0 otherwise.
  * Assumes vcpu_load() was already called.
@@ -1412,6 +1582,8 @@ static int vmx_get_msr(struct kvm_vcpu *
 		/* Otherwise falls through */
 	default:
 		vmx_load_host_state(to_vmx(vcpu));
+		if (vmx_get_vmx_msr(vcpu, msr_index, pdata))
+			return 0;
 		msr = find_msr_entry(to_vmx(vcpu), msr_index);
 		if (msr) {
 			vmx_load_host_state(to_vmx(vcpu));
@@ -1483,6 +1655,8 @@ static int vmx_set_msr(struct kvm_vcpu *
 			return 1;
 		/* Otherwise falls through */
 	default:
+		if (vmx_set_vmx_msr(vcpu, msr_index, data))
+			break;
 		msr = find_msr_entry(vmx, msr_index);
 		if (msr) {
 			vmx_load_host_state(vmx);
--- .before/arch/x86/include/asm/msr-index.h	2011-05-08 10:43:18.000000000 +0300
+++ .after/arch/x86/include/asm/msr-index.h	2011-05-08 10:43:18.000000000 +0300
@@ -434,6 +434,18 @@
 #define MSR_IA32_VMX_VMCS_ENUM          0x0000048a
 #define MSR_IA32_VMX_PROCBASED_CTLS2    0x0000048b
 #define MSR_IA32_VMX_EPT_VPID_CAP       0x0000048c
+#define MSR_IA32_VMX_TRUE_PINBASED_CTLS  0x0000048d
+#define MSR_IA32_VMX_TRUE_PROCBASED_CTLS 0x0000048e
+#define MSR_IA32_VMX_TRUE_EXIT_CTLS      0x0000048f
+#define MSR_IA32_VMX_TRUE_ENTRY_CTLS     0x00000490
+
+/* VMX_BASIC bits and bitmasks */
+#define VMX_BASIC_VMCS_SIZE_SHIFT	32
+#define VMX_BASIC_64		0x0001000000000000LLU
+#define VMX_BASIC_MEM_TYPE_SHIFT	50
+#define VMX_BASIC_MEM_TYPE_MASK	0x003c000000000000LLU
+#define VMX_BASIC_MEM_TYPE_WB	6LLU
+#define VMX_BASIC_INOUT		0x0040000000000000LLU
 
 /* AMD-V MSRs */
 


* [PATCH 06/30] nVMX: Decoding memory operands of VMX instructions
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (4 preceding siblings ...)
  2011-05-08  8:17 ` [PATCH 05/30] nVMX: Implement reading and writing of VMX MSRs Nadav Har'El
@ 2011-05-08  8:18 ` Nadav Har'El
  2011-05-09  9:47   ` Avi Kivity
  2011-05-08  8:18 ` [PATCH 07/30] nVMX: Introduce vmcs02: VMCS used to run L2 Nadav Har'El
                   ` (24 subsequent siblings)
  30 siblings, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:18 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

This patch includes a utility function for decoding pointer operands of VMX
instructions issued by L1 (a guest hypervisor).
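
As an illustration of the bit layout decoded below (the value is hypothetical,
chosen only for this example), vmx_instruction_info = 0x418100 would decode as:

	scaling   = 0            (bits 1:0)
	addr_size = 2 (64-bit)   (bits 9:7)
	is_reg    = 0            (bit 10: memory, not register, operand)
	seg_reg   = 3 (DS)       (bits 17:15)
	index     = invalid      (bit 22 set)
	base_reg  = 0 (RAX)      (bits 26:23, valid since bit 27 is clear)

so the operand address is DS.base + RAX + displacement (exit_qualification).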

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   53 +++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c |    3 +-
 arch/x86/kvm/x86.h |    4 +++
 3 files changed, 59 insertions(+), 1 deletion(-)

--- .before/arch/x86/kvm/x86.c	2011-05-08 10:43:18.000000000 +0300
+++ .after/arch/x86/kvm/x86.c	2011-05-08 10:43:18.000000000 +0300
@@ -3815,7 +3815,7 @@ static int kvm_fetch_guest_virt(struct x
 					  exception);
 }
 
-static int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt,
+int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt,
 			       gva_t addr, void *val, unsigned int bytes,
 			       struct x86_exception *exception)
 {
@@ -3825,6 +3825,7 @@ static int kvm_read_guest_virt(struct x8
 	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access,
 					  exception);
 }
+EXPORT_SYMBOL_GPL(kvm_read_guest_virt);
 
 static int kvm_read_guest_virt_system(struct x86_emulate_ctxt *ctxt,
 				      gva_t addr, void *val, unsigned int bytes,
--- .before/arch/x86/kvm/x86.h	2011-05-08 10:43:18.000000000 +0300
+++ .after/arch/x86/kvm/x86.h	2011-05-08 10:43:18.000000000 +0300
@@ -81,4 +81,8 @@ int kvm_inject_realmode_interrupt(struct
 
 void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data);
 
+int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt,
+	gva_t addr, void *val, unsigned int bytes,
+	struct x86_exception *exception);
+
 #endif
--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
@@ -4254,6 +4254,59 @@ static int handle_vmoff(struct kvm_vcpu 
 }
 
 /*
+ * Decode the memory-address operand of a vmx instruction, as recorded on an
+ * exit caused by such an instruction (run by a guest hypervisor).
+ * On success, returns 0. When the operand is invalid, returns 1 and throws
+ * #UD or #GP.
+ */
+static int get_vmx_mem_address(struct kvm_vcpu *vcpu,
+				 unsigned long exit_qualification,
+				 u32 vmx_instruction_info, gva_t *ret)
+{
+	/*
+	 * According to Vol. 3B, "Information for VM Exits Due to Instruction
+	 * Execution", on an exit, vmx_instruction_info holds most of the
+	 * addressing components of the operand. Only the displacement part
+	 * is put in exit_qualification (see 3B, "Basic VM-Exit Information").
+	 * For how an actual address is calculated from all these components,
+	 * refer to Vol. 1, "Operand Addressing".
+	 */
+	int  scaling = vmx_instruction_info & 3;
+	int  addr_size = (vmx_instruction_info >> 7) & 7;
+	bool is_reg = vmx_instruction_info & (1u << 10);
+	int  seg_reg = (vmx_instruction_info >> 15) & 7;
+	int  index_reg = (vmx_instruction_info >> 18) & 0xf;
+	bool index_is_valid = !(vmx_instruction_info & (1u << 22));
+	int  base_reg       = (vmx_instruction_info >> 23) & 0xf;
+	bool base_is_valid  = !(vmx_instruction_info & (1u << 27));
+
+	if (is_reg) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	/* Addr = segment_base + offset */
+	/* offset = base + [index * scale] + displacement */
+	*ret = vmx_get_segment_base(vcpu, seg_reg);
+	if (base_is_valid)
+		*ret += kvm_register_read(vcpu, base_reg);
+	if (index_is_valid)
+		*ret += kvm_register_read(vcpu, index_reg)<<scaling;
+	*ret += exit_qualification; /* holds the displacement */
+
+	if (addr_size == 1) /* 32 bit */
+		*ret &= 0xffffffff;
+
+	/*
+	 * TODO: throw #GP (and return 1) in various cases that the VM*
+	 * instructions require it - e.g., offset beyond segment limit,
+	 * unusable or unreadable/unwritable segment, non-canonical 64-bit
+	 * address, and so on. Currently these are not checked.
+	 */
+	return 0;
+}
+
+/*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
  * to be done to userspace and return 0.


* [PATCH 07/30] nVMX: Introduce vmcs02: VMCS used to run L2
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (5 preceding siblings ...)
  2011-05-08  8:18 ` [PATCH 06/30] nVMX: Decoding memory operands of VMX instructions Nadav Har'El
@ 2011-05-08  8:18 ` Nadav Har'El
  2011-05-16 15:30   ` Marcelo Tosatti
  2011-05-08  8:19 ` [PATCH 08/30] nVMX: Fix local_vcpus_link handling Nadav Har'El
                   ` (23 subsequent siblings)
  30 siblings, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:18 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

We saw in a previous patch that L1 controls its L2 guest with a vmcs12.
L0 needs to create a real VMCS for running L2. We call that "vmcs02".
A later patch will contain the code, prepare_vmcs02(), for filling the vmcs02
fields. This patch only contains code for allocating vmcs02.

In this version, prepare_vmcs02() sets *all* of vmcs02's fields each time we
enter from L1 to L2, so keeping just one vmcs02 for the vcpu is enough: It can
be reused even when L1 runs multiple L2 guests. However, in future versions
we'll probably want to add an optimization where vmcs02 fields that rarely
change will not be set each time. For that, we may want to keep around several
vmcs02s of L2 guests that have recently run, so that potentially we could run
these L2s again more quickly because fewer vmwrites to vmcs02 will be needed.

This patch adds to each vcpu a vmcs02 pool, vmx->nested.vmcs02_pool,
which remembers the vmcs02s last used to run up to VMCS02_POOL_SIZE L2s.
As explained above, in the current version we choose VMCS02_POOL_SIZE=1,
i.e., one vmcs02 is allocated (and loaded onto the processor), and it is
reused to enter any L2 guest. In the future, when prepare_vmcs02() is
optimized not to set all fields every time, VMCS02_POOL_SIZE should be
increased.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |  134 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
@@ -117,6 +117,7 @@ static int ple_window = KVM_VMX_DEFAULT_
 module_param(ple_window, int, S_IRUGO);
 
 #define NR_AUTOLOAD_MSRS 1
+#define VMCS02_POOL_SIZE 1
 
 struct vmcs {
 	u32 revision_id;
@@ -166,6 +167,30 @@ struct __packed vmcs12 {
 #define VMCS12_SIZE 0x1000
 
 /*
+ * When we temporarily switch a vcpu's VMCS (e.g., stop using an L1's VMCS
+ * while we use L2's VMCS), and we wish to save the previous VMCS, we must also
+ * remember on which CPU it was last loaded (vcpu->cpu), so when we return to
+ * using this VMCS we'll know if we're now running on a different CPU and need
+ * to clear the VMCS on the old CPU, and load it on the new one. Additionally,
+ * we need to remember whether this VMCS was launched (vmx->launched), so when
+ * we return to it we know if to VMLAUNCH or to VMRESUME it (we cannot deduce
+ * this from other state, because it's possible that this VMCS had once been
+ * launched, but has since been cleared after a CPU switch).
+ */
+struct saved_vmcs {
+	struct vmcs *vmcs;
+	int cpu;
+	int launched;
+};
+
+/* Used to remember the last vmcs02 used for some recently used vmcs12s */
+struct vmcs02_list {
+	struct list_head list;
+	gpa_t vmcs12_addr;
+	struct saved_vmcs vmcs02;
+};
+
+/*
  * The nested_vmx structure is part of vcpu_vmx, and holds information we need
  * for correct emulation of VMX (i.e., nested VMX) on this vcpu.
  */
@@ -178,6 +203,10 @@ struct nested_vmx {
 	/* The host-usable pointer to the above */
 	struct page *current_vmcs12_page;
 	struct vmcs12 *current_vmcs12;
+
+	/* vmcs02_list cache of VMCSs recently used to run L2 guests */
+	struct list_head vmcs02_pool;
+	int vmcs02_num;
 };
 
 struct vcpu_vmx {
@@ -4155,6 +4184,106 @@ static int handle_invalid_op(struct kvm_
 }
 
 /*
+ * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12.
+ * We could reuse a single VMCS for all the L2 guests, but we also want the
+ * option to allocate a separate vmcs02 for each separate loaded vmcs12 - this
+ * allows keeping them loaded on the processor, and in the future will allow
+ * optimizations where prepare_vmcs02 doesn't need to set all the fields on
+ * every entry if they never change.
+ * So we keep, in vmx->nested.vmcs02_pool, a cache of size VMCS02_POOL_SIZE
+ * (>=0) with a vmcs02 for each recently loaded vmcs12s, most recent first.
+ *
+ * The following functions allocate and free a vmcs02 in this pool.
+ */
+
+static void __nested_free_saved_vmcs(void *arg)
+{
+	struct saved_vmcs *saved_vmcs = arg;
+
+	vmcs_clear(saved_vmcs->vmcs);
+	if (per_cpu(current_vmcs, saved_vmcs->cpu) == saved_vmcs->vmcs)
+		per_cpu(current_vmcs, saved_vmcs->cpu) = NULL;
+}
+
+/*
+ * Free a VMCS, but before that VMCLEAR it on the CPU where it was last loaded
+ * (the necessary information is in the saved_vmcs structure).
+ * See also vcpu_clear() (with different parameters and side-effects)
+ */
+static void nested_free_saved_vmcs(struct vcpu_vmx *vmx,
+		struct saved_vmcs *saved_vmcs)
+{
+	if (saved_vmcs->cpu != -1)
+		smp_call_function_single(saved_vmcs->cpu,
+				__nested_free_saved_vmcs, saved_vmcs, 1);
+
+	free_vmcs(saved_vmcs->vmcs);
+}
+
+/* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */
+static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr)
+{
+	struct vmcs02_list *item;
+	list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
+		if (item->vmcs12_addr == vmptr) {
+			nested_free_saved_vmcs(vmx, &item->vmcs02);
+			list_del(&item->list);
+			kfree(item);
+			vmx->nested.vmcs02_num--;
+			return;
+		}
+}
+
+/* Free all vmcs02 saved for this vcpu */
+static void nested_free_all_vmcs02(struct vcpu_vmx *vmx)
+{
+	struct vmcs02_list *item, *n;
+	list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) {
+		nested_free_saved_vmcs(vmx, &item->vmcs02);
+		list_del(&item->list);
+		kfree(item);
+	}
+	vmx->nested.vmcs02_num = 0;
+}
+
+/* Get a vmcs02 for the current vmcs12. */
+static struct saved_vmcs *nested_get_current_vmcs02(struct vcpu_vmx *vmx)
+{
+	struct vmcs02_list *item;
+	list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
+		if (item->vmcs12_addr == vmx->nested.current_vmptr) {
+			list_move(&item->list, &vmx->nested.vmcs02_pool);
+			return &item->vmcs02;
+		}
+
+	if (vmx->nested.vmcs02_num >= max(VMCS02_POOL_SIZE, 1)) {
+		/* Recycle the least recently used VMCS. */
+		item = list_entry(vmx->nested.vmcs02_pool.prev,
+			struct vmcs02_list, list);
+		item->vmcs12_addr = vmx->nested.current_vmptr;
+		list_move(&item->list, &vmx->nested.vmcs02_pool);
+		return &item->vmcs02;
+	}
+
+	/* Create a new vmcs02 */
+	item = (struct vmcs02_list *)
+		kmalloc(sizeof(struct vmcs02_list), GFP_KERNEL);
+	if (!item)
+		return NULL;
+	item->vmcs02.vmcs = alloc_vmcs();
+	if (!item->vmcs02.vmcs) {
+		kfree(item);
+		return NULL;
+	}
+	item->vmcs12_addr = vmx->nested.current_vmptr;
+	item->vmcs02.cpu = -1;
+	item->vmcs02.launched = 0;
+	list_add(&(item->list), &(vmx->nested.vmcs02_pool));
+	vmx->nested.vmcs02_num++;
+	return &item->vmcs02;
+}
+
+/*
  * Emulate the VMXON instruction.
  * Currently, we just remember that VMX is active, and do not save or even
  * inspect the argument to VMXON (the so-called "VMXON pointer") because we
@@ -4190,6 +4319,9 @@ static int handle_vmon(struct kvm_vcpu *
 		return 1;
 	}
 
+	INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
+	vmx->nested.vmcs02_num = 0;
+
 	vmx->nested.vmxon = true;
 
 	skip_emulated_instruction(vcpu);
@@ -4241,6 +4373,8 @@ static void free_nested(struct vcpu_vmx 
 		vmx->nested.current_vmptr = -1ull;
 		vmx->nested.current_vmcs12 = NULL;
 	}
+
+	nested_free_all_vmcs02(vmx);
 }
 
 /* Emulate the VMXOFF instruction */


* [PATCH 08/30] nVMX: Fix local_vcpus_link handling
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (6 preceding siblings ...)
  2011-05-08  8:18 ` [PATCH 07/30] nVMX: Introduce vmcs02: VMCS used to run L2 Nadav Har'El
@ 2011-05-08  8:19 ` Nadav Har'El
  2011-05-08  8:19 ` [PATCH 09/30] nVMX: Add VMCS fields to the vmcs12 Nadav Har'El
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:19 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU.

The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, others for
each L2), and each of those may have been last loaded on a different CPU.

This trivial patch changes the code to keep only L1 VMCSs on vcpus_on_cpu.
This fixes crashes on L1 shutdown caused by incorrectly maintaining the linked
lists.

It is not a complete solution, though. It doesn't flush the inactive L1 or L2
VMCSs loaded on a CPU which is being shut down. Doing this correctly will
probably require replacing the vcpu linked list with a linked list of
"saved_vmcs" objects (VMCS, cpu and launched), and it is left as a TODO.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
@@ -638,7 +638,9 @@ static void __vcpu_clear(void *arg)
 		vmcs_clear(vmx->vmcs);
 	if (per_cpu(current_vmcs, cpu) == vmx->vmcs)
 		per_cpu(current_vmcs, cpu) = NULL;
-	list_del(&vmx->local_vcpus_link);
+	/* TODO: currently, local_vcpus_link is just for L1 VMCSs */
+	if (!is_guest_mode(&vmx->vcpu))
+		list_del(&vmx->local_vcpus_link);
 	vmx->vcpu.cpu = -1;
 	vmx->launched = 0;
 }
@@ -1100,8 +1102,10 @@ static void vmx_vcpu_load(struct kvm_vcp
 
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 		local_irq_disable();
-		list_add(&vmx->local_vcpus_link,
-			 &per_cpu(vcpus_on_cpu, cpu));
+		/* TODO: currently, local_vcpus_link is just for L1 VMCSs */
+		if (!is_guest_mode(&vmx->vcpu))
+			list_add(&vmx->local_vcpus_link,
+				 &per_cpu(vcpus_on_cpu, cpu));
 		local_irq_enable();
 
 		/*
@@ -1806,7 +1810,9 @@ static void vmclear_local_vcpus(void)
 
 	list_for_each_entry_safe(vmx, n, &per_cpu(vcpus_on_cpu, cpu),
 				 local_vcpus_link)
-		__vcpu_clear(vmx);
+		/* TODO: currently, local_vcpus_link is just for L1 VMCSs */
+		if (!is_guest_mode(&vmx->vcpu))
+			__vcpu_clear(vmx);
 }
 
 


* [PATCH 09/30] nVMX: Add VMCS fields to the vmcs12
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (7 preceding siblings ...)
  2011-05-08  8:19 ` [PATCH 08/30] nVMX: Fix local_vcpus_link handling Nadav Har'El
@ 2011-05-08  8:19 ` Nadav Har'El
  2011-05-08  8:20 ` [PATCH 10/30] nVMX: Success/failure of VMX instructions Nadav Har'El
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:19 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

In this patch we add to vmcs12 (the VMCS that L1 keeps for L2) all the
standard VMCS fields.

Later patches will enable L1 to read and write these fields using VMREAD/
VMWRITE, and they will be used during a VMLAUNCH/VMRESUME in preparing vmcs02,
a hardware VMCS for running L2.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |  275 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 275 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
@@ -144,12 +144,148 @@ struct shared_msr_entry {
  * machines (necessary for live migration).
  * If there are changes in this struct, VMCS12_REVISION must be changed.
  */
+typedef u64 natural_width;
 struct __packed vmcs12 {
 	/* According to the Intel spec, a VMCS region must start with the
 	 * following two fields. Then follow implementation-specific data.
 	 */
 	u32 revision_id;
 	u32 abort;
+
+	u64 io_bitmap_a;
+	u64 io_bitmap_b;
+	u64 msr_bitmap;
+	u64 vm_exit_msr_store_addr;
+	u64 vm_exit_msr_load_addr;
+	u64 vm_entry_msr_load_addr;
+	u64 tsc_offset;
+	u64 virtual_apic_page_addr;
+	u64 apic_access_addr;
+	u64 ept_pointer;
+	u64 guest_physical_address;
+	u64 vmcs_link_pointer;
+	u64 guest_ia32_debugctl;
+	u64 guest_ia32_pat;
+	u64 guest_ia32_efer;
+	u64 guest_pdptr0;
+	u64 guest_pdptr1;
+	u64 guest_pdptr2;
+	u64 guest_pdptr3;
+	u64 host_ia32_pat;
+	u64 host_ia32_efer;
+	u64 padding64[8]; /* room for future expansion */
+	/*
+	 * To allow migration of L1 (complete with its L2 guests) between
+	 * machines of different natural widths (32 or 64 bit), we cannot have
+	 * unsigned long fields with no explict size. We use u64 (aliased
+	 * natural_width) instead. Luckily, x86 is little-endian.
+	 */
+	natural_width cr0_guest_host_mask;
+	natural_width cr4_guest_host_mask;
+	natural_width cr0_read_shadow;
+	natural_width cr4_read_shadow;
+	natural_width cr3_target_value0;
+	natural_width cr3_target_value1;
+	natural_width cr3_target_value2;
+	natural_width cr3_target_value3;
+	natural_width exit_qualification;
+	natural_width guest_linear_address;
+	natural_width guest_cr0;
+	natural_width guest_cr3;
+	natural_width guest_cr4;
+	natural_width guest_es_base;
+	natural_width guest_cs_base;
+	natural_width guest_ss_base;
+	natural_width guest_ds_base;
+	natural_width guest_fs_base;
+	natural_width guest_gs_base;
+	natural_width guest_ldtr_base;
+	natural_width guest_tr_base;
+	natural_width guest_gdtr_base;
+	natural_width guest_idtr_base;
+	natural_width guest_dr7;
+	natural_width guest_rsp;
+	natural_width guest_rip;
+	natural_width guest_rflags;
+	natural_width guest_pending_dbg_exceptions;
+	natural_width guest_sysenter_esp;
+	natural_width guest_sysenter_eip;
+	natural_width host_cr0;
+	natural_width host_cr3;
+	natural_width host_cr4;
+	natural_width host_fs_base;
+	natural_width host_gs_base;
+	natural_width host_tr_base;
+	natural_width host_gdtr_base;
+	natural_width host_idtr_base;
+	natural_width host_ia32_sysenter_esp;
+	natural_width host_ia32_sysenter_eip;
+	natural_width host_rsp;
+	natural_width host_rip;
+	natural_width paddingl[8]; /* room for future expansion */
+	u32 pin_based_vm_exec_control;
+	u32 cpu_based_vm_exec_control;
+	u32 exception_bitmap;
+	u32 page_fault_error_code_mask;
+	u32 page_fault_error_code_match;
+	u32 cr3_target_count;
+	u32 vm_exit_controls;
+	u32 vm_exit_msr_store_count;
+	u32 vm_exit_msr_load_count;
+	u32 vm_entry_controls;
+	u32 vm_entry_msr_load_count;
+	u32 vm_entry_intr_info_field;
+	u32 vm_entry_exception_error_code;
+	u32 vm_entry_instruction_len;
+	u32 tpr_threshold;
+	u32 secondary_vm_exec_control;
+	u32 vm_instruction_error;
+	u32 vm_exit_reason;
+	u32 vm_exit_intr_info;
+	u32 vm_exit_intr_error_code;
+	u32 idt_vectoring_info_field;
+	u32 idt_vectoring_error_code;
+	u32 vm_exit_instruction_len;
+	u32 vmx_instruction_info;
+	u32 guest_es_limit;
+	u32 guest_cs_limit;
+	u32 guest_ss_limit;
+	u32 guest_ds_limit;
+	u32 guest_fs_limit;
+	u32 guest_gs_limit;
+	u32 guest_ldtr_limit;
+	u32 guest_tr_limit;
+	u32 guest_gdtr_limit;
+	u32 guest_idtr_limit;
+	u32 guest_es_ar_bytes;
+	u32 guest_cs_ar_bytes;
+	u32 guest_ss_ar_bytes;
+	u32 guest_ds_ar_bytes;
+	u32 guest_fs_ar_bytes;
+	u32 guest_gs_ar_bytes;
+	u32 guest_ldtr_ar_bytes;
+	u32 guest_tr_ar_bytes;
+	u32 guest_interruptibility_info;
+	u32 guest_activity_state;
+	u32 guest_sysenter_cs;
+	u32 host_ia32_sysenter_cs;
+	u32 padding32[8]; /* room for future expansion */
+	u16 virtual_processor_id;
+	u16 guest_es_selector;
+	u16 guest_cs_selector;
+	u16 guest_ss_selector;
+	u16 guest_ds_selector;
+	u16 guest_fs_selector;
+	u16 guest_gs_selector;
+	u16 guest_ldtr_selector;
+	u16 guest_tr_selector;
+	u16 host_es_selector;
+	u16 host_cs_selector;
+	u16 host_ss_selector;
+	u16 host_ds_selector;
+	u16 host_fs_selector;
+	u16 host_gs_selector;
+	u16 host_tr_selector;
 };
 
 /*
@@ -282,6 +418,145 @@ static inline struct vcpu_vmx *to_vmx(st
 	return container_of(vcpu, struct vcpu_vmx, vcpu);
 }
 
+#define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
+#define FIELD(number, name)	[number] = VMCS12_OFFSET(name)
+#define FIELD64(number, name)	[number] = VMCS12_OFFSET(name), \
+				[number##_HIGH] = VMCS12_OFFSET(name)+4
+
+static unsigned short vmcs_field_to_offset_table[] = {
+	FIELD(VIRTUAL_PROCESSOR_ID, virtual_processor_id),
+	FIELD(GUEST_ES_SELECTOR, guest_es_selector),
+	FIELD(GUEST_CS_SELECTOR, guest_cs_selector),
+	FIELD(GUEST_SS_SELECTOR, guest_ss_selector),
+	FIELD(GUEST_DS_SELECTOR, guest_ds_selector),
+	FIELD(GUEST_FS_SELECTOR, guest_fs_selector),
+	FIELD(GUEST_GS_SELECTOR, guest_gs_selector),
+	FIELD(GUEST_LDTR_SELECTOR, guest_ldtr_selector),
+	FIELD(GUEST_TR_SELECTOR, guest_tr_selector),
+	FIELD(HOST_ES_SELECTOR, host_es_selector),
+	FIELD(HOST_CS_SELECTOR, host_cs_selector),
+	FIELD(HOST_SS_SELECTOR, host_ss_selector),
+	FIELD(HOST_DS_SELECTOR, host_ds_selector),
+	FIELD(HOST_FS_SELECTOR, host_fs_selector),
+	FIELD(HOST_GS_SELECTOR, host_gs_selector),
+	FIELD(HOST_TR_SELECTOR, host_tr_selector),
+	FIELD64(IO_BITMAP_A, io_bitmap_a),
+	FIELD64(IO_BITMAP_B, io_bitmap_b),
+	FIELD64(MSR_BITMAP, msr_bitmap),
+	FIELD64(VM_EXIT_MSR_STORE_ADDR, vm_exit_msr_store_addr),
+	FIELD64(VM_EXIT_MSR_LOAD_ADDR, vm_exit_msr_load_addr),
+	FIELD64(VM_ENTRY_MSR_LOAD_ADDR, vm_entry_msr_load_addr),
+	FIELD64(TSC_OFFSET, tsc_offset),
+	FIELD64(VIRTUAL_APIC_PAGE_ADDR, virtual_apic_page_addr),
+	FIELD64(APIC_ACCESS_ADDR, apic_access_addr),
+	FIELD64(EPT_POINTER, ept_pointer),
+	FIELD64(GUEST_PHYSICAL_ADDRESS, guest_physical_address),
+	FIELD64(VMCS_LINK_POINTER, vmcs_link_pointer),
+	FIELD64(GUEST_IA32_DEBUGCTL, guest_ia32_debugctl),
+	FIELD64(GUEST_IA32_PAT, guest_ia32_pat),
+	FIELD64(GUEST_PDPTR0, guest_pdptr0),
+	FIELD64(GUEST_PDPTR1, guest_pdptr1),
+	FIELD64(GUEST_PDPTR2, guest_pdptr2),
+	FIELD64(GUEST_PDPTR3, guest_pdptr3),
+	FIELD64(HOST_IA32_PAT, host_ia32_pat),
+	FIELD(PIN_BASED_VM_EXEC_CONTROL, pin_based_vm_exec_control),
+	FIELD(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control),
+	FIELD(EXCEPTION_BITMAP, exception_bitmap),
+	FIELD(PAGE_FAULT_ERROR_CODE_MASK, page_fault_error_code_mask),
+	FIELD(PAGE_FAULT_ERROR_CODE_MATCH, page_fault_error_code_match),
+	FIELD(CR3_TARGET_COUNT, cr3_target_count),
+	FIELD(VM_EXIT_CONTROLS, vm_exit_controls),
+	FIELD(VM_EXIT_MSR_STORE_COUNT, vm_exit_msr_store_count),
+	FIELD(VM_EXIT_MSR_LOAD_COUNT, vm_exit_msr_load_count),
+	FIELD(VM_ENTRY_CONTROLS, vm_entry_controls),
+	FIELD(VM_ENTRY_MSR_LOAD_COUNT, vm_entry_msr_load_count),
+	FIELD(VM_ENTRY_INTR_INFO_FIELD, vm_entry_intr_info_field),
+	FIELD(VM_ENTRY_EXCEPTION_ERROR_CODE, vm_entry_exception_error_code),
+	FIELD(VM_ENTRY_INSTRUCTION_LEN, vm_entry_instruction_len),
+	FIELD(TPR_THRESHOLD, tpr_threshold),
+	FIELD(SECONDARY_VM_EXEC_CONTROL, secondary_vm_exec_control),
+	FIELD(VM_INSTRUCTION_ERROR, vm_instruction_error),
+	FIELD(VM_EXIT_REASON, vm_exit_reason),
+	FIELD(VM_EXIT_INTR_INFO, vm_exit_intr_info),
+	FIELD(VM_EXIT_INTR_ERROR_CODE, vm_exit_intr_error_code),
+	FIELD(IDT_VECTORING_INFO_FIELD, idt_vectoring_info_field),
+	FIELD(IDT_VECTORING_ERROR_CODE, idt_vectoring_error_code),
+	FIELD(VM_EXIT_INSTRUCTION_LEN, vm_exit_instruction_len),
+	FIELD(VMX_INSTRUCTION_INFO, vmx_instruction_info),
+	FIELD(GUEST_ES_LIMIT, guest_es_limit),
+	FIELD(GUEST_CS_LIMIT, guest_cs_limit),
+	FIELD(GUEST_SS_LIMIT, guest_ss_limit),
+	FIELD(GUEST_DS_LIMIT, guest_ds_limit),
+	FIELD(GUEST_FS_LIMIT, guest_fs_limit),
+	FIELD(GUEST_GS_LIMIT, guest_gs_limit),
+	FIELD(GUEST_LDTR_LIMIT, guest_ldtr_limit),
+	FIELD(GUEST_TR_LIMIT, guest_tr_limit),
+	FIELD(GUEST_GDTR_LIMIT, guest_gdtr_limit),
+	FIELD(GUEST_IDTR_LIMIT, guest_idtr_limit),
+	FIELD(GUEST_ES_AR_BYTES, guest_es_ar_bytes),
+	FIELD(GUEST_CS_AR_BYTES, guest_cs_ar_bytes),
+	FIELD(GUEST_SS_AR_BYTES, guest_ss_ar_bytes),
+	FIELD(GUEST_DS_AR_BYTES, guest_ds_ar_bytes),
+	FIELD(GUEST_FS_AR_BYTES, guest_fs_ar_bytes),
+	FIELD(GUEST_GS_AR_BYTES, guest_gs_ar_bytes),
+	FIELD(GUEST_LDTR_AR_BYTES, guest_ldtr_ar_bytes),
+	FIELD(GUEST_TR_AR_BYTES, guest_tr_ar_bytes),
+	FIELD(GUEST_INTERRUPTIBILITY_INFO, guest_interruptibility_info),
+	FIELD(GUEST_ACTIVITY_STATE, guest_activity_state),
+	FIELD(GUEST_SYSENTER_CS, guest_sysenter_cs),
+	FIELD(HOST_IA32_SYSENTER_CS, host_ia32_sysenter_cs),
+	FIELD(CR0_GUEST_HOST_MASK, cr0_guest_host_mask),
+	FIELD(CR4_GUEST_HOST_MASK, cr4_guest_host_mask),
+	FIELD(CR0_READ_SHADOW, cr0_read_shadow),
+	FIELD(CR4_READ_SHADOW, cr4_read_shadow),
+	FIELD(CR3_TARGET_VALUE0, cr3_target_value0),
+	FIELD(CR3_TARGET_VALUE1, cr3_target_value1),
+	FIELD(CR3_TARGET_VALUE2, cr3_target_value2),
+	FIELD(CR3_TARGET_VALUE3, cr3_target_value3),
+	FIELD(EXIT_QUALIFICATION, exit_qualification),
+	FIELD(GUEST_LINEAR_ADDRESS, guest_linear_address),
+	FIELD(GUEST_CR0, guest_cr0),
+	FIELD(GUEST_CR3, guest_cr3),
+	FIELD(GUEST_CR4, guest_cr4),
+	FIELD(GUEST_ES_BASE, guest_es_base),
+	FIELD(GUEST_CS_BASE, guest_cs_base),
+	FIELD(GUEST_SS_BASE, guest_ss_base),
+	FIELD(GUEST_DS_BASE, guest_ds_base),
+	FIELD(GUEST_FS_BASE, guest_fs_base),
+	FIELD(GUEST_GS_BASE, guest_gs_base),
+	FIELD(GUEST_LDTR_BASE, guest_ldtr_base),
+	FIELD(GUEST_TR_BASE, guest_tr_base),
+	FIELD(GUEST_GDTR_BASE, guest_gdtr_base),
+	FIELD(GUEST_IDTR_BASE, guest_idtr_base),
+	FIELD(GUEST_DR7, guest_dr7),
+	FIELD(GUEST_RSP, guest_rsp),
+	FIELD(GUEST_RIP, guest_rip),
+	FIELD(GUEST_RFLAGS, guest_rflags),
+	FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions),
+	FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp),
+	FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip),
+	FIELD(HOST_CR0, host_cr0),
+	FIELD(HOST_CR3, host_cr3),
+	FIELD(HOST_CR4, host_cr4),
+	FIELD(HOST_FS_BASE, host_fs_base),
+	FIELD(HOST_GS_BASE, host_gs_base),
+	FIELD(HOST_TR_BASE, host_tr_base),
+	FIELD(HOST_GDTR_BASE, host_gdtr_base),
+	FIELD(HOST_IDTR_BASE, host_idtr_base),
+	FIELD(HOST_IA32_SYSENTER_ESP, host_ia32_sysenter_esp),
+	FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
+	FIELD(HOST_RSP, host_rsp),
+	FIELD(HOST_RIP, host_rip),
+};
+static const int max_vmcs_field = ARRAY_SIZE(vmcs_field_to_offset_table);
+
+static inline short vmcs_field_to_offset(unsigned long field)
+{
+	if (field >= max_vmcs_field || vmcs_field_to_offset_table[field] == 0)
+		return -1;
+	return vmcs_field_to_offset_table[field];
+}
+
 static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)
 {
 	return to_vmx(vcpu)->nested.current_vmcs12;


* [PATCH 10/30] nVMX: Success/failure of VMX instructions.
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (8 preceding siblings ...)
  2011-05-08  8:19 ` [PATCH 09/30] nVMX: Add VMCS fields to the vmcs12 Nadav Har'El
@ 2011-05-08  8:20 ` Nadav Har'El
  2011-05-08  8:20 ` [PATCH 11/30] nVMX: Implement VMCLEAR Nadav Har'El
                   ` (20 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:20 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

VMX instructions report success or failure by setting certain RFLAGS bits.
This patch adds common helper functions that set these flags; they will be
used by the following patches, which emulate the various VMX instructions.
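
For reference, the convention (from the SDM) is: VMsucceed clears CF, PF, AF,
ZF, SF and OF; VMfailInvalid sets CF (there is no current VMCS to hold an
error number); VMfailValid sets ZF and stores an error number in the
VM-instruction error field. A rough sketch of how a later handler is expected
to use these helpers (the handler name and the condition are made up for
illustration):

	/* Illustrative sketch only, not part of this patch */
	static int handle_some_vmx_insn(struct kvm_vcpu *vcpu)
	{
		if (!operand_is_valid)		/* placeholder check */
			nested_vmx_failValid(vcpu,
				VMXERR_VMCLEAR_INVALID_ADDRESS);
		else
			nested_vmx_succeed(vcpu);
		skip_emulated_instruction(vcpu);
		return 1;
	}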

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/include/asm/vmx.h |   31 +++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx.c         |   30 ++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
@@ -4722,6 +4722,36 @@ static int get_vmx_mem_address(struct kv
 }
 
 /*
+ * The following 3 functions, nested_vmx_succeed()/failValid()/failInvalid(),
+ * set the success or error code of an emulated VMX instruction, as specified
+ * by Vol 2B, VMX Instruction Reference, "Conventions".
+ */
+static void nested_vmx_succeed(struct kvm_vcpu *vcpu)
+{
+	vmx_set_rflags(vcpu, vmx_get_rflags(vcpu)
+			& ~(X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF |
+			    X86_EFLAGS_ZF | X86_EFLAGS_SF | X86_EFLAGS_OF));
+}
+
+static void nested_vmx_failInvalid(struct kvm_vcpu *vcpu)
+{
+	vmx_set_rflags(vcpu, (vmx_get_rflags(vcpu)
+			& ~(X86_EFLAGS_PF | X86_EFLAGS_AF | X86_EFLAGS_ZF |
+			    X86_EFLAGS_SF | X86_EFLAGS_OF))
+			| X86_EFLAGS_CF);
+}
+
+static void nested_vmx_failValid(struct kvm_vcpu *vcpu,
+					u32 vm_instruction_error)
+{
+	vmx_set_rflags(vcpu, (vmx_get_rflags(vcpu)
+			& ~(X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF |
+			    X86_EFLAGS_SF | X86_EFLAGS_OF))
+			| X86_EFLAGS_ZF);
+	get_vmcs12(vcpu)->vm_instruction_error = vm_instruction_error;
+}
+
+/*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
  * to be done to userspace and return 0.
--- .before/arch/x86/include/asm/vmx.h	2011-05-08 10:43:19.000000000 +0300
+++ .after/arch/x86/include/asm/vmx.h	2011-05-08 10:43:19.000000000 +0300
@@ -426,4 +426,35 @@ struct vmx_msr_entry {
 	u64 value;
 } __aligned(16);
 
+/*
+ * VM-instruction error numbers
+ */
+enum vm_instruction_error_number {
+	VMXERR_VMCALL_IN_VMX_ROOT_OPERATION = 1,
+	VMXERR_VMCLEAR_INVALID_ADDRESS = 2,
+	VMXERR_VMCLEAR_VMXON_POINTER = 3,
+	VMXERR_VMLAUNCH_NONCLEAR_VMCS = 4,
+	VMXERR_VMRESUME_NONLAUNCHED_VMCS = 5,
+	VMXERR_VMRESUME_AFTER_VMXOFF = 6,
+	VMXERR_ENTRY_INVALID_CONTROL_FIELD = 7,
+	VMXERR_ENTRY_INVALID_HOST_STATE_FIELD = 8,
+	VMXERR_VMPTRLD_INVALID_ADDRESS = 9,
+	VMXERR_VMPTRLD_VMXON_POINTER = 10,
+	VMXERR_VMPTRLD_INCORRECT_VMCS_REVISION_ID = 11,
+	VMXERR_UNSUPPORTED_VMCS_COMPONENT = 12,
+	VMXERR_VMWRITE_READ_ONLY_VMCS_COMPONENT = 13,
+	VMXERR_VMXON_IN_VMX_ROOT_OPERATION = 15,
+	VMXERR_ENTRY_INVALID_EXECUTIVE_VMCS_POINTER = 16,
+	VMXERR_ENTRY_NONLAUNCHED_EXECUTIVE_VMCS = 17,
+	VMXERR_ENTRY_EXECUTIVE_VMCS_POINTER_NOT_VMXON_POINTER = 18,
+	VMXERR_VMCALL_NONCLEAR_VMCS = 19,
+	VMXERR_VMCALL_INVALID_VM_EXIT_CONTROL_FIELDS = 20,
+	VMXERR_VMCALL_INCORRECT_MSEG_REVISION_ID = 22,
+	VMXERR_VMXOFF_UNDER_DUAL_MONITOR_TREATMENT_OF_SMIS_AND_SMM = 23,
+	VMXERR_VMCALL_INVALID_SMM_MONITOR_FEATURES = 24,
+	VMXERR_ENTRY_INVALID_VM_EXECUTION_CONTROL_FIELDS_IN_EXECUTIVE_VMCS = 25,
+	VMXERR_ENTRY_EVENTS_BLOCKED_BY_MOV_SS = 26,
+	VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID = 28,
+};
+
 #endif


* [PATCH 11/30] nVMX: Implement VMCLEAR
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (9 preceding siblings ...)
  2011-05-08  8:20 ` [PATCH 10/30] nVMX: Success/failure of VMX instructions Nadav Har'El
@ 2011-05-08  8:20 ` Nadav Har'El
  2011-05-08  8:21 ` [PATCH 12/30] nVMX: Implement VMPTRLD Nadav Har'El
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:20 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

This patch implements the VMCLEAR instruction.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   65 ++++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c |    1 
 2 files changed, 65 insertions(+), 1 deletion(-)

--- .before/arch/x86/kvm/x86.c	2011-05-08 10:43:19.000000000 +0300
+++ .after/arch/x86/kvm/x86.c	2011-05-08 10:43:19.000000000 +0300
@@ -347,6 +347,7 @@ void kvm_inject_page_fault(struct kvm_vc
 	vcpu->arch.cr2 = fault->address;
 	kvm_queue_exception_e(vcpu, PF_VECTOR, fault->error_code);
 }
+EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
 
 void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 {
--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
@@ -152,6 +152,9 @@ struct __packed vmcs12 {
 	u32 revision_id;
 	u32 abort;
 
+	u32 launch_state; /* set to 0 by VMCLEAR, to 1 by VMLAUNCH */
+	u32 padding[7]; /* room for future expansion */
+
 	u64 io_bitmap_a;
 	u64 io_bitmap_b;
 	u64 msr_bitmap;
@@ -4751,6 +4754,66 @@ static void nested_vmx_failValid(struct 
 	get_vmcs12(vcpu)->vm_instruction_error = vm_instruction_error;
 }
 
+/* Emulate the VMCLEAR instruction */
+static int handle_vmclear(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	gva_t gva;
+	gpa_t vmcs12_addr;
+	struct vmcs12 *vmcs12;
+	struct page *page;
+	struct x86_exception e;
+
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
+			vmcs_read32(VMX_INSTRUCTION_INFO), &gva))
+		return 1;
+
+	if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &vmcs12_addr,
+				sizeof(vmcs12_addr), &e)) {
+		kvm_inject_page_fault(vcpu, &e);
+		return 1;
+	}
+
+	if (!IS_ALIGNED(vmcs12_addr, PAGE_SIZE)) {
+		nested_vmx_failValid(vcpu, VMXERR_VMCLEAR_INVALID_ADDRESS);
+		skip_emulated_instruction(vcpu);
+		return 1;
+	}
+
+	if (vmcs12_addr == vmx->nested.current_vmptr) {
+		kunmap(vmx->nested.current_vmcs12_page);
+		nested_release_page(vmx->nested.current_vmcs12_page);
+		vmx->nested.current_vmptr = -1ull;
+		vmx->nested.current_vmcs12 = NULL;
+	}
+
+	page = nested_get_page(vcpu, vmcs12_addr);
+	if (page == NULL) {
+		/*
+		 * For accurate processor emulation, VMCLEAR beyond available
+		 * physical memory should do nothing at all. However, it is
+		 * possible that a nested vmx bug, not a guest hypervisor bug,
+		 * resulted in this case, so let's shut down before doing any
+		 * more damage:
+		 */
+		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+		return 1;
+	}
+	vmcs12 = kmap(page);
+	vmcs12->launch_state = 0;
+	kunmap(page);
+	nested_release_page(page);
+
+	nested_free_vmcs02(vmx, vmcs12_addr);
+
+	skip_emulated_instruction(vcpu);
+	nested_vmx_succeed(vcpu);
+	return 1;
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -4772,7 +4835,7 @@ static int (*kvm_vmx_exit_handlers[])(st
 	[EXIT_REASON_INVD]		      = handle_invd,
 	[EXIT_REASON_INVLPG]		      = handle_invlpg,
 	[EXIT_REASON_VMCALL]                  = handle_vmcall,
-	[EXIT_REASON_VMCLEAR]	              = handle_vmx_insn,
+	[EXIT_REASON_VMCLEAR]	              = handle_vmclear,
 	[EXIT_REASON_VMLAUNCH]                = handle_vmx_insn,
 	[EXIT_REASON_VMPTRLD]                 = handle_vmx_insn,
 	[EXIT_REASON_VMPTRST]                 = handle_vmx_insn,


* [PATCH 12/30] nVMX: Implement VMPTRLD
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (10 preceding siblings ...)
  2011-05-08  8:20 ` [PATCH 11/30] nVMX: Implement VMCLEAR Nadav Har'El
@ 2011-05-08  8:21 ` Nadav Har'El
  2011-05-16 14:34   ` Marcelo Tosatti
  2011-05-08  8:21 ` [PATCH 13/30] nVMX: Implement VMPTRST Nadav Har'El
                   ` (18 subsequent siblings)
  30 siblings, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:21 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

This patch implements the VMPTRLD instruction.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   62 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
@@ -4814,6 +4814,66 @@ static int handle_vmclear(struct kvm_vcp
 	return 1;
 }
 
+/* Emulate the VMPTRLD instruction */
+static int handle_vmptrld(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	gva_t gva;
+	gpa_t vmcs12_addr;
+	struct x86_exception e;
+
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
+			vmcs_read32(VMX_INSTRUCTION_INFO), &gva))
+		return 1;
+
+	if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &vmcs12_addr,
+				sizeof(vmcs12_addr), &e)) {
+		kvm_inject_page_fault(vcpu, &e);
+		return 1;
+	}
+
+	if (!IS_ALIGNED(vmcs12_addr, PAGE_SIZE)) {
+		nested_vmx_failValid(vcpu, VMXERR_VMPTRLD_INVALID_ADDRESS);
+		skip_emulated_instruction(vcpu);
+		return 1;
+	}
+
+	if (vmx->nested.current_vmptr != vmcs12_addr) {
+		struct vmcs12 *new_vmcs12;
+		struct page *page;
+		page = nested_get_page(vcpu, vmcs12_addr);
+		if (page == NULL) {
+			nested_vmx_failInvalid(vcpu);
+			skip_emulated_instruction(vcpu);
+			return 1;
+		}
+		new_vmcs12 = kmap(page);
+		if (new_vmcs12->revision_id != VMCS12_REVISION) {
+			kunmap(page);
+			nested_release_page_clean(page);
+			nested_vmx_failValid(vcpu,
+				VMXERR_VMPTRLD_INCORRECT_VMCS_REVISION_ID);
+			skip_emulated_instruction(vcpu);
+			return 1;
+		}
+		if (vmx->nested.current_vmptr != -1ull) {
+			kunmap(vmx->nested.current_vmcs12_page);
+			nested_release_page(vmx->nested.current_vmcs12_page);
+		}
+
+		vmx->nested.current_vmptr = vmcs12_addr;
+		vmx->nested.current_vmcs12 = new_vmcs12;
+		vmx->nested.current_vmcs12_page = page;
+	}
+
+	nested_vmx_succeed(vcpu);
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -4837,7 +4897,7 @@ static int (*kvm_vmx_exit_handlers[])(st
 	[EXIT_REASON_VMCALL]                  = handle_vmcall,
 	[EXIT_REASON_VMCLEAR]	              = handle_vmclear,
 	[EXIT_REASON_VMLAUNCH]                = handle_vmx_insn,
-	[EXIT_REASON_VMPTRLD]                 = handle_vmx_insn,
+	[EXIT_REASON_VMPTRLD]                 = handle_vmptrld,
 	[EXIT_REASON_VMPTRST]                 = handle_vmx_insn,
 	[EXIT_REASON_VMREAD]                  = handle_vmx_insn,
 	[EXIT_REASON_VMRESUME]                = handle_vmx_insn,


* [PATCH 13/30] nVMX: Implement VMPTRST
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (11 preceding siblings ...)
  2011-05-08  8:21 ` [PATCH 12/30] nVMX: Implement VMPTRLD Nadav Har'El
@ 2011-05-08  8:21 ` Nadav Har'El
  2011-05-08  8:22 ` [PATCH 14/30] nVMX: Implement VMREAD and VMWRITE Nadav Har'El
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:21 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

This patch implements the VMPTRST instruction. 

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   28 +++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c |    3 ++-
 arch/x86/kvm/x86.h |    4 ++++
 3 files changed, 33 insertions(+), 2 deletions(-)

--- .before/arch/x86/kvm/x86.c	2011-05-08 10:43:19.000000000 +0300
+++ .after/arch/x86/kvm/x86.c	2011-05-08 10:43:19.000000000 +0300
@@ -3836,7 +3836,7 @@ static int kvm_read_guest_virt_system(st
 	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, exception);
 }
 
-static int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
+int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
 				       gva_t addr, void *val,
 				       unsigned int bytes,
 				       struct x86_exception *exception)
@@ -3868,6 +3868,7 @@ static int kvm_write_guest_virt_system(s
 out:
 	return r;
 }
+EXPORT_SYMBOL_GPL(kvm_write_guest_virt_system);
 
 static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
 				  unsigned long addr,
--- .before/arch/x86/kvm/x86.h	2011-05-08 10:43:19.000000000 +0300
+++ .after/arch/x86/kvm/x86.h	2011-05-08 10:43:19.000000000 +0300
@@ -85,4 +85,8 @@ int kvm_read_guest_virt(struct x86_emula
 	gva_t addr, void *val, unsigned int bytes,
 	struct x86_exception *exception);
 
+int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
+	gva_t addr, void *val, unsigned int bytes,
+	struct x86_exception *exception);
+
 #endif
--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
@@ -4874,6 +4874,32 @@ static int handle_vmptrld(struct kvm_vcp
 	return 1;
 }
 
+/* Emulate the VMPTRST instruction */
+static int handle_vmptrst(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+	gva_t vmcs_gva;
+	struct x86_exception e;
+
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	if (get_vmx_mem_address(vcpu, exit_qualification,
+			vmx_instruction_info, &vmcs_gva))
+		return 1;
+	/* ok to use *_system, as nested_vmx_check_permission verified cpl=0 */
+	if (kvm_write_guest_virt_system(&vcpu->arch.emulate_ctxt, vmcs_gva,
+				 (void *)&to_vmx(vcpu)->nested.current_vmptr,
+				 sizeof(u64), &e)) {
+		kvm_inject_page_fault(vcpu, &e);
+		return 1;
+	}
+	nested_vmx_succeed(vcpu);
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -4898,7 +4924,7 @@ static int (*kvm_vmx_exit_handlers[])(st
 	[EXIT_REASON_VMCLEAR]	              = handle_vmclear,
 	[EXIT_REASON_VMLAUNCH]                = handle_vmx_insn,
 	[EXIT_REASON_VMPTRLD]                 = handle_vmptrld,
-	[EXIT_REASON_VMPTRST]                 = handle_vmx_insn,
+	[EXIT_REASON_VMPTRST]                 = handle_vmptrst,
 	[EXIT_REASON_VMREAD]                  = handle_vmx_insn,
 	[EXIT_REASON_VMRESUME]                = handle_vmx_insn,
 	[EXIT_REASON_VMWRITE]                 = handle_vmx_insn,


* [PATCH 14/30] nVMX: Implement VMREAD and VMWRITE
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (12 preceding siblings ...)
  2011-05-08  8:21 ` [PATCH 13/30] nVMX: Implement VMPTRST Nadav Har'El
@ 2011-05-08  8:22 ` Nadav Har'El
  2011-05-08  8:22 ` [PATCH 15/30] nVMX: Move host-state field setup to a function Nadav Har'El
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:22 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

Implement the VMREAD and VMWRITE instructions. With these instructions, L1
can read from and write to the VMCS it is holding; the values are read from
or written to the fields of the vmcs12 structure introduced in a previous patch.
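
The field encoding itself carries the field's width and whether it is
read-only, which the new vmcs_field_type() and vmcs_field_readonly() helpers
below decode. A standalone illustration (plain userspace C, not part of the
patch; the field number is from the SDM):

	#include <stdio.h>

	int main(void)
	{
		unsigned long field = 0x4402;	/* VM_EXIT_REASON */

		/*
		 * bits 14:13 = width (0=u16, 1=u64, 2=u32, 3=natural width),
		 * bits 11:10 = type (1 means a read-only data field),
		 * bit 0      = the *_HIGH half of a 64-bit field.
		 */
		printf("width=%lu type=%lu high=%lu\n",
		       (field >> 13) & 0x3, (field >> 10) & 0x3, field & 0x1);
		return 0;	/* prints: width=2 type=1 high=0 */
	}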

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |  176 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 174 insertions(+), 2 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
@@ -4814,6 +4814,178 @@ static int handle_vmclear(struct kvm_vcp
 	return 1;
 }
 
+enum vmcs_field_type {
+	VMCS_FIELD_TYPE_U16 = 0,
+	VMCS_FIELD_TYPE_U64 = 1,
+	VMCS_FIELD_TYPE_U32 = 2,
+	VMCS_FIELD_TYPE_NATURAL_WIDTH = 3
+};
+
+static inline int vmcs_field_type(unsigned long field)
+{
+	if (0x1 & field)	/* the *_HIGH fields are all 32 bit */
+		return VMCS_FIELD_TYPE_U32;
+	return (field >> 13) & 0x3 ;
+}
+
+static inline int vmcs_field_readonly(unsigned long field)
+{
+	return (((field >> 10) & 0x3) == 1);
+}
+
+/*
+ * Read a vmcs12 field. Since these can have varying lengths and we return
+ * one type, we choose the biggest type (u64) and zero-extend the return value
+ * to that size. Note that the caller, handle_vmread, might need to use only
+ * some of the bits we return here (e.g., on 32-bit guests, only 32 bits of
+ * 64-bit fields are to be returned).
+ */
+static inline bool vmcs12_read_any(struct kvm_vcpu *vcpu,
+					unsigned long field, u64 *ret)
+{
+	short offset = vmcs_field_to_offset(field);
+	char *p;
+
+	if (offset < 0)
+		return 0;
+
+	p = ((char *)(get_vmcs12(vcpu))) + offset;
+
+	switch (vmcs_field_type(field)) {
+	case VMCS_FIELD_TYPE_NATURAL_WIDTH:
+		*ret = *((natural_width *)p);
+		return 1;
+	case VMCS_FIELD_TYPE_U16:
+		*ret = *((u16 *)p);
+		return 1;
+	case VMCS_FIELD_TYPE_U32:
+		*ret = *((u32 *)p);
+		return 1;
+	case VMCS_FIELD_TYPE_U64:
+		*ret = *((u64 *)p);
+		return 1;
+	default:
+		return 0; /* can never happen. */
+	}
+}
+
+static int handle_vmread(struct kvm_vcpu *vcpu)
+{
+	unsigned long field;
+	u64 field_value;
+	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+	gva_t gva = 0;
+
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	/* Decode instruction info and find the field to read */
+	field = kvm_register_read(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
+	/* Read the field, zero-extended to a u64 field_value */
+	if (!vmcs12_read_any(vcpu, field, &field_value)) {
+		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
+		skip_emulated_instruction(vcpu);
+		return 1;
+	}
+	/*
+	 * Now copy part of this value to register or memory, as requested.
+	 * Note that the number of bits actually copied is 32 or 64 depending
+	 * on the guest's mode (32 or 64 bit), not on the given field's length.
+	 */
+	if (vmx_instruction_info & (1u << 10)) {
+		kvm_register_write(vcpu, (((vmx_instruction_info) >> 3) & 0xf),
+			field_value);
+	} else {
+		if (get_vmx_mem_address(vcpu, exit_qualification,
+				vmx_instruction_info, &gva))
+			return 1;
+		/* _system ok, as nested_vmx_check_permission verified cpl=0 */
+		kvm_write_guest_virt_system(&vcpu->arch.emulate_ctxt, gva,
+			     &field_value, (is_long_mode(vcpu) ? 8 : 4), NULL);
+	}
+
+	nested_vmx_succeed(vcpu);
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
+
+static int handle_vmwrite(struct kvm_vcpu *vcpu)
+{
+	unsigned long field;
+	gva_t gva;
+	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	u32 vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+	char *p;
+	short offset;
+	/* The value to write might be 32 or 64 bits, depending on L1's long
+	 * mode, and eventually we need to write that into a field of several
+	 * possible lengths. The code below first zero-extends the value to 64
+	 * bit (field_value), and then copies only the appropriate number of
+	 * bits into the vmcs12 field.
+	 */
+	u64 field_value = 0;
+	struct x86_exception e;
+
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	if (vmx_instruction_info & (1u << 10))
+		field_value = kvm_register_read(vcpu,
+			(((vmx_instruction_info) >> 3) & 0xf));
+	else {
+		if (get_vmx_mem_address(vcpu, exit_qualification,
+				vmx_instruction_info, &gva))
+			return 1;
+		if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva,
+			   &field_value, (is_long_mode(vcpu) ? 8 : 4), &e)) {
+			kvm_inject_page_fault(vcpu, &e);
+			return 1;
+		}
+	}
+
+
+	field = kvm_register_read(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
+	if (vmcs_field_readonly(field)) {
+		nested_vmx_failValid(vcpu,
+			VMXERR_VMWRITE_READ_ONLY_VMCS_COMPONENT);
+		skip_emulated_instruction(vcpu);
+		return 1;
+	}
+
+	offset = vmcs_field_to_offset(field);
+	if (offset < 0) {
+		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
+		skip_emulated_instruction(vcpu);
+		return 1;
+	}
+	p = ((char *) get_vmcs12(vcpu)) + offset;
+
+	switch (vmcs_field_type(field)) {
+	case VMCS_FIELD_TYPE_U16:
+		*(u16 *)p = field_value;
+		break;
+	case VMCS_FIELD_TYPE_U32:
+		*(u32 *)p = field_value;
+		break;
+	case VMCS_FIELD_TYPE_U64:
+		*(u64 *)p = field_value;
+		break;
+	case VMCS_FIELD_TYPE_NATURAL_WIDTH:
+		*(natural_width *)p = field_value;
+		break;
+	default:
+		nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
+		skip_emulated_instruction(vcpu);
+		return 1;
+	}
+
+	nested_vmx_succeed(vcpu);
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
 /* Emulate the VMPTRLD instruction */
 static int handle_vmptrld(struct kvm_vcpu *vcpu)
 {
@@ -4925,9 +5097,9 @@ static int (*kvm_vmx_exit_handlers[])(st
 	[EXIT_REASON_VMLAUNCH]                = handle_vmx_insn,
 	[EXIT_REASON_VMPTRLD]                 = handle_vmptrld,
 	[EXIT_REASON_VMPTRST]                 = handle_vmptrst,
-	[EXIT_REASON_VMREAD]                  = handle_vmx_insn,
+	[EXIT_REASON_VMREAD]                  = handle_vmread,
 	[EXIT_REASON_VMRESUME]                = handle_vmx_insn,
-	[EXIT_REASON_VMWRITE]                 = handle_vmx_insn,
+	[EXIT_REASON_VMWRITE]                 = handle_vmwrite,
 	[EXIT_REASON_VMOFF]                   = handle_vmoff,
 	[EXIT_REASON_VMON]                    = handle_vmon,
 	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,


* [PATCH 15/30] nVMX: Move host-state field setup to a function
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (13 preceding siblings ...)
  2011-05-08  8:22 ` [PATCH 14/30] nVMX: Implement VMREAD and VMWRITE Nadav Har'El
@ 2011-05-08  8:22 ` Nadav Har'El
  2011-05-09  9:56   ` Avi Kivity
  2011-05-08  8:23 ` [PATCH 16/30] nVMX: Move control field setup to functions Nadav Har'El
                   ` (15 subsequent siblings)
  30 siblings, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:22 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

Move the setting of constant host-state fields (fields that do not change
throughout the life of the guest) from vmx_vcpu_setup to a new common function
vmx_set_constant_host_state(). This function will also be used to set the
host state when running L2 guests.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   72 ++++++++++++++++++++++++-------------------
 1 file changed, 41 insertions(+), 31 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
@@ -3323,17 +3323,53 @@ static void vmx_disable_intercept_for_ms
 }
 
 /*
+ * Set up the vmcs's constant host-state fields, i.e., host-state fields that
+ * will not change in the lifetime of the guest.
+ * Note that host-state that does change is set elsewhere. E.g., host-state
+ * that is set differently for each CPU is set in vmx_vcpu_load(), not here.
+ */
+static void vmx_set_constant_host_state(void)
+{
+	u32 low32, high32;
+	unsigned long tmpl;
+	struct desc_ptr dt;
+
+	vmcs_writel(HOST_CR0, read_cr0() | X86_CR0_TS);  /* 22.2.3 */
+	vmcs_writel(HOST_CR4, read_cr4());  /* 22.2.3, 22.2.5 */
+	vmcs_writel(HOST_CR3, read_cr3());  /* 22.2.3  FIXME: shadow tables */
+
+	vmcs_write16(HOST_CS_SELECTOR, __KERNEL_CS);  /* 22.2.4 */
+	vmcs_write16(HOST_DS_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
+	vmcs_write16(HOST_ES_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
+	vmcs_write16(HOST_SS_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
+	vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8);  /* 22.2.4 */
+
+	native_store_idt(&dt);
+	vmcs_writel(HOST_IDTR_BASE, dt.address);   /* 22.2.4 */
+
+	asm("mov $.Lkvm_vmx_return, %0" : "=r"(tmpl));
+	vmcs_writel(HOST_RIP, tmpl); /* 22.2.5 */
+
+	rdmsr(MSR_IA32_SYSENTER_CS, low32, high32);
+	vmcs_write32(HOST_IA32_SYSENTER_CS, low32);
+	rdmsrl(MSR_IA32_SYSENTER_EIP, tmpl);
+	vmcs_writel(HOST_IA32_SYSENTER_EIP, tmpl);   /* 22.2.3 */
+
+	if (vmcs_config.vmexit_ctrl & VM_EXIT_LOAD_IA32_PAT) {
+		rdmsr(MSR_IA32_CR_PAT, low32, high32);
+		vmcs_write64(HOST_IA32_PAT, low32 | ((u64) high32 << 32));
+	}
+}
+
+/*
  * Sets up the vmcs for emulated real mode.
  */
 static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 {
-	u32 host_sysenter_cs, msr_low, msr_high;
-	u32 junk;
+	u32 msr_low, msr_high;
 	u64 host_pat;
 	unsigned long a;
-	struct desc_ptr dt;
 	int i;
-	unsigned long kvm_vmx_return;
 	u32 exec_control;
 
 	/* I/O */
@@ -3390,16 +3426,9 @@ static int vmx_vcpu_setup(struct vcpu_vm
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, !!bypass_guest_pf);
 	vmcs_write32(CR3_TARGET_COUNT, 0);           /* 22.2.1 */
 
-	vmcs_writel(HOST_CR0, read_cr0() | X86_CR0_TS);  /* 22.2.3 */
-	vmcs_writel(HOST_CR4, read_cr4());  /* 22.2.3, 22.2.5 */
-	vmcs_writel(HOST_CR3, read_cr3());  /* 22.2.3  FIXME: shadow tables */
-
-	vmcs_write16(HOST_CS_SELECTOR, __KERNEL_CS);  /* 22.2.4 */
-	vmcs_write16(HOST_DS_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
-	vmcs_write16(HOST_ES_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
 	vmcs_write16(HOST_FS_SELECTOR, 0);            /* 22.2.4 */
 	vmcs_write16(HOST_GS_SELECTOR, 0);            /* 22.2.4 */
-	vmcs_write16(HOST_SS_SELECTOR, __KERNEL_DS);  /* 22.2.4 */
+	vmx_set_constant_host_state();
 #ifdef CONFIG_X86_64
 	rdmsrl(MSR_FS_BASE, a);
 	vmcs_writel(HOST_FS_BASE, a); /* 22.2.4 */
@@ -3410,31 +3439,12 @@ static int vmx_vcpu_setup(struct vcpu_vm
 	vmcs_writel(HOST_GS_BASE, 0); /* 22.2.4 */
 #endif
 
-	vmcs_write16(HOST_TR_SELECTOR, GDT_ENTRY_TSS*8);  /* 22.2.4 */
-
-	native_store_idt(&dt);
-	vmcs_writel(HOST_IDTR_BASE, dt.address);   /* 22.2.4 */
-
-	asm("mov $.Lkvm_vmx_return, %0" : "=r"(kvm_vmx_return));
-	vmcs_writel(HOST_RIP, kvm_vmx_return); /* 22.2.5 */
 	vmcs_write32(VM_EXIT_MSR_STORE_COUNT, 0);
 	vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, 0);
 	vmcs_write64(VM_EXIT_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.host));
 	vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 0);
 	vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, __pa(vmx->msr_autoload.guest));
 
-	rdmsr(MSR_IA32_SYSENTER_CS, host_sysenter_cs, junk);
-	vmcs_write32(HOST_IA32_SYSENTER_CS, host_sysenter_cs);
-	rdmsrl(MSR_IA32_SYSENTER_ESP, a);
-	vmcs_writel(HOST_IA32_SYSENTER_ESP, a);   /* 22.2.3 */
-	rdmsrl(MSR_IA32_SYSENTER_EIP, a);
-	vmcs_writel(HOST_IA32_SYSENTER_EIP, a);   /* 22.2.3 */
-
-	if (vmcs_config.vmexit_ctrl & VM_EXIT_LOAD_IA32_PAT) {
-		rdmsr(MSR_IA32_CR_PAT, msr_low, msr_high);
-		host_pat = msr_low | ((u64) msr_high << 32);
-		vmcs_write64(HOST_IA32_PAT, host_pat);
-	}
 	if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
 		rdmsr(MSR_IA32_CR_PAT, msr_low, msr_high);
 		host_pat = msr_low | ((u64) msr_high << 32);


* [PATCH 16/30] nVMX: Move control field setup to functions
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (14 preceding siblings ...)
  2011-05-08  8:22 ` [PATCH 15/30] nVMX: Move host-state field setup to a function Nadav Har'El
@ 2011-05-08  8:23 ` Nadav Har'El
  2011-05-08  8:23 ` [PATCH 17/30] nVMX: Prepare vmcs02 from vmcs01 and vmcs12 Nadav Har'El
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:23 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

Move some of the control field setup to common functions. These functions will
also be needed for running L2 guests - L0's desires (expressed in these
functions) will be appropriately merged with L1's desires.
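
For the CPU-based execution controls, the merge will roughly look like the
sketch below (illustrative only; the real code is prepare_vmcs02(), added
later in this series, which also masks off controls that cannot be honored
yet):

	u32 exec_control = vmx_exec_control(vmx);	/* L0's requirements */

	/* add everything L1 asked to intercept for its L2 guest */
	exec_control |= vmcs12->cpu_based_vm_exec_control;

	/* e.g. MSR/IO bitmap merging is not supported yet, so exit always */
	exec_control &= ~(CPU_BASED_USE_MSR_BITMAPS | CPU_BASED_USE_IO_BITMAPS);

	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_control);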

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   80 +++++++++++++++++++++++++------------------
 1 file changed, 47 insertions(+), 33 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
@@ -3361,6 +3361,49 @@ static void vmx_set_constant_host_state(
 	}
 }
 
+static void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
+{
+	vmx->vcpu.arch.cr4_guest_owned_bits = KVM_CR4_GUEST_OWNED_BITS;
+	if (enable_ept)
+		vmx->vcpu.arch.cr4_guest_owned_bits |= X86_CR4_PGE;
+	vmcs_writel(CR4_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr4_guest_owned_bits);
+}
+
+static u32 vmx_exec_control(struct vcpu_vmx *vmx)
+{
+	u32 exec_control = vmcs_config.cpu_based_exec_ctrl;
+	if (!vm_need_tpr_shadow(vmx->vcpu.kvm)) {
+		exec_control &= ~CPU_BASED_TPR_SHADOW;
+#ifdef CONFIG_X86_64
+		exec_control |= CPU_BASED_CR8_STORE_EXITING |
+				CPU_BASED_CR8_LOAD_EXITING;
+#endif
+	}
+	if (!enable_ept)
+		exec_control |= CPU_BASED_CR3_STORE_EXITING |
+				CPU_BASED_CR3_LOAD_EXITING  |
+				CPU_BASED_INVLPG_EXITING;
+	return exec_control;
+}
+
+static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
+{
+	u32 exec_control = vmcs_config.cpu_based_2nd_exec_ctrl;
+	if (!vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
+		exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+	if (vmx->vpid == 0)
+		exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
+	if (!enable_ept) {
+		exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
+		enable_unrestricted_guest = 0;
+	}
+	if (!enable_unrestricted_guest)
+		exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
+	if (!ple_gap)
+		exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+	return exec_control;
+}
+
 /*
  * Sets up the vmcs for emulated real mode.
  */
@@ -3370,7 +3413,6 @@ static int vmx_vcpu_setup(struct vcpu_vm
 	u64 host_pat;
 	unsigned long a;
 	int i;
-	u32 exec_control;
 
 	/* I/O */
 	vmcs_write64(IO_BITMAP_A, __pa(vmx_io_bitmap_a));
@@ -3385,36 +3427,11 @@ static int vmx_vcpu_setup(struct vcpu_vm
 	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL,
 		vmcs_config.pin_based_exec_ctrl);
 
-	exec_control = vmcs_config.cpu_based_exec_ctrl;
-	if (!vm_need_tpr_shadow(vmx->vcpu.kvm)) {
-		exec_control &= ~CPU_BASED_TPR_SHADOW;
-#ifdef CONFIG_X86_64
-		exec_control |= CPU_BASED_CR8_STORE_EXITING |
-				CPU_BASED_CR8_LOAD_EXITING;
-#endif
-	}
-	if (!enable_ept)
-		exec_control |= CPU_BASED_CR3_STORE_EXITING |
-				CPU_BASED_CR3_LOAD_EXITING  |
-				CPU_BASED_INVLPG_EXITING;
-	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_control);
+	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, vmx_exec_control(vmx));
 
 	if (cpu_has_secondary_exec_ctrls()) {
-		exec_control = vmcs_config.cpu_based_2nd_exec_ctrl;
-		if (!vm_need_virtualize_apic_accesses(vmx->vcpu.kvm))
-			exec_control &=
-				~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
-		if (vmx->vpid == 0)
-			exec_control &= ~SECONDARY_EXEC_ENABLE_VPID;
-		if (!enable_ept) {
-			exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
-			enable_unrestricted_guest = 0;
-		}
-		if (!enable_unrestricted_guest)
-			exec_control &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;
-		if (!ple_gap)
-			exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
-		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
+		vmcs_write32(SECONDARY_VM_EXEC_CONTROL,
+				vmx_secondary_exec_control(vmx));
 	}
 
 	if (ple_gap) {
@@ -3475,10 +3492,7 @@ static int vmx_vcpu_setup(struct vcpu_vm
 	vmcs_write32(VM_ENTRY_CONTROLS, vmcs_config.vmentry_ctrl);
 
 	vmcs_writel(CR0_GUEST_HOST_MASK, ~0UL);
-	vmx->vcpu.arch.cr4_guest_owned_bits = KVM_CR4_GUEST_OWNED_BITS;
-	if (enable_ept)
-		vmx->vcpu.arch.cr4_guest_owned_bits |= X86_CR4_PGE;
-	vmcs_writel(CR4_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr4_guest_owned_bits);
+	set_cr4_guest_host_mask(vmx);
 
 	kvm_write_tsc(&vmx->vcpu, 0);
 


* [PATCH 17/30] nVMX: Prepare vmcs02 from vmcs01 and vmcs12
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (15 preceding siblings ...)
  2011-05-08  8:23 ` [PATCH 16/30] nVMX: Move control field setup to functions Nadav Har'El
@ 2011-05-08  8:23 ` Nadav Har'El
  2011-05-09 10:12   ` Avi Kivity
  2011-05-08  8:24 ` [PATCH 18/30] nVMX: Implement VMLAUNCH and VMRESUME Nadav Har'El
                   ` (13 subsequent siblings)
  30 siblings, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:23 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

This patch adds the code that prepares vmcs02, the VMCS used to actually run
the L2 guest. prepare_vmcs02() appropriately merges the information in vmcs12
(the VMCS that L1 built for L2) with vmcs01 (our own requirements for our
guests).
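
One piece worth a worked example (the numbers are made up for illustration):
the cr0 value L2 observes is assembled from guest_cr0 and cr0_read_shadow
according to cr0_guest_host_mask, as done by guest_readable_cr0() in this
patch:

	/*
	 * Bits set in the mask are owned by L1, so L2 reads them from the
	 * read shadow; bits clear in the mask come straight from guest_cr0:
	 *
	 *	read = (guest_cr0 & ~mask) | (cr0_read_shadow & mask)
	 *
	 * E.g. with only CR0.TS (bit 3) in the mask:
	 *	guest_cr0       = 0x80000031	(PG|NE|ET|PE, TS clear)
	 *	cr0_read_shadow = 0x00000008	(TS set)
	 *	mask            = 0x00000008
	 *	=> L2 reads (0x80000031 & ~0x8) | (0x8 & 0x8) = 0x80000039
	 */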

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |  272 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 272 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:20.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:20.000000000 +0300
@@ -346,6 +346,12 @@ struct nested_vmx {
 	/* vmcs02_list cache of VMCSs recently used to run L2 guests */
 	struct list_head vmcs02_pool;
 	int vmcs02_num;
+	u64 vmcs01_tsc_offset;
+	/*
+	 * Guest pages referred to in vmcs02 with host-physical pointers, so
+	 * we must keep them pinned while L2 runs.
+	 */
+	struct page *apic_access_page;
 };
 
 struct vcpu_vmx {
@@ -835,6 +841,18 @@ static inline bool report_flexpriority(v
 	return flexpriority_enabled;
 }
 
+static inline bool nested_cpu_has(struct vmcs12 *vmcs12, u32 bit)
+{
+	return vmcs12->cpu_based_vm_exec_control & bit;
+}
+
+static inline bool nested_cpu_has2(struct vmcs12 *vmcs12, u32 bit)
+{
+	return (vmcs12->cpu_based_vm_exec_control &
+			CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) &&
+		(vmcs12->secondary_vm_exec_control & bit);
+}
+
 static int __find_msr_index(struct vcpu_vmx *vmx, u32 msr)
 {
 	int i;
@@ -1425,6 +1443,22 @@ static void vmx_fpu_activate(struct kvm_
 
 static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu);
 
+/*
+ * Return the cr0 value that a nested guest would read. This is a combination
+ * of the real cr0 used to run the guest (guest_cr0), and the bits shadowed by
+ * its hypervisor (cr0_read_shadow).
+ */
+static inline unsigned long guest_readable_cr0(struct vmcs12 *fields)
+{
+	return (fields->guest_cr0 & ~fields->cr0_guest_host_mask) |
+		(fields->cr0_read_shadow & fields->cr0_guest_host_mask);
+}
+static inline unsigned long guest_readable_cr4(struct vmcs12 *fields)
+{
+	return (fields->guest_cr4 & ~fields->cr4_guest_host_mask) |
+		(fields->cr4_read_shadow & fields->cr4_guest_host_mask);
+}
+
 static void vmx_fpu_deactivate(struct kvm_vcpu *vcpu)
 {
 	vmx_decache_cr0_guest_bits(vcpu);
@@ -3366,6 +3400,9 @@ static void set_cr4_guest_host_mask(stru
 	vmx->vcpu.arch.cr4_guest_owned_bits = KVM_CR4_GUEST_OWNED_BITS;
 	if (enable_ept)
 		vmx->vcpu.arch.cr4_guest_owned_bits |= X86_CR4_PGE;
+	if (is_guest_mode(&vmx->vcpu))
+		vmx->vcpu.arch.cr4_guest_owned_bits &=
+			~get_vmcs12(&vmx->vcpu)->cr4_guest_host_mask;
 	vmcs_writel(CR4_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr4_guest_owned_bits);
 }
 
@@ -4681,6 +4718,11 @@ static void free_nested(struct vcpu_vmx 
 		vmx->nested.current_vmptr = -1ull;
 		vmx->nested.current_vmcs12 = NULL;
 	}
+	/* Unpin physical memory we referred to in current vmcs02 */
+	if (vmx->nested.apic_access_page) {
+		nested_release_page(vmx->nested.apic_access_page);
+		vmx->nested.apic_access_page = 0;
+	}
 
 	nested_free_all_vmcs02(vmx);
 }
@@ -5749,6 +5791,236 @@ static void vmx_set_supported_cpuid(u32 
 {
 }
 
+/*
+ * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
+ * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
+ * with L0's requirements for its guest (a.k.a. vmcs01), so we can run the L2
+ * guest in a way that is appropriate both to L1's requests and to our own
+ * needs. In addition to modifying the active vmcs (which is vmcs02), this
+ * function also has additional necessary side-effects, like setting various
+ * vcpu->arch fields.
+ */
+static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	u32 exec_control;
+
+	vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
+	vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
+	vmcs_write16(GUEST_SS_SELECTOR, vmcs12->guest_ss_selector);
+	vmcs_write16(GUEST_DS_SELECTOR, vmcs12->guest_ds_selector);
+	vmcs_write16(GUEST_FS_SELECTOR, vmcs12->guest_fs_selector);
+	vmcs_write16(GUEST_GS_SELECTOR, vmcs12->guest_gs_selector);
+	vmcs_write16(GUEST_LDTR_SELECTOR, vmcs12->guest_ldtr_selector);
+	vmcs_write16(GUEST_TR_SELECTOR, vmcs12->guest_tr_selector);
+
+	vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
+
+	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
+		vmcs12->vm_entry_intr_info_field);
+	vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
+		vmcs12->vm_entry_exception_error_code);
+	vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
+		vmcs12->vm_entry_instruction_len);
+
+	vmcs_write32(GUEST_ES_LIMIT, vmcs12->guest_es_limit);
+	vmcs_write32(GUEST_CS_LIMIT, vmcs12->guest_cs_limit);
+	vmcs_write32(GUEST_SS_LIMIT, vmcs12->guest_ss_limit);
+	vmcs_write32(GUEST_DS_LIMIT, vmcs12->guest_ds_limit);
+	vmcs_write32(GUEST_FS_LIMIT, vmcs12->guest_fs_limit);
+	vmcs_write32(GUEST_GS_LIMIT, vmcs12->guest_gs_limit);
+	vmcs_write32(GUEST_LDTR_LIMIT, vmcs12->guest_ldtr_limit);
+	vmcs_write32(GUEST_TR_LIMIT, vmcs12->guest_tr_limit);
+	vmcs_write32(GUEST_GDTR_LIMIT, vmcs12->guest_gdtr_limit);
+	vmcs_write32(GUEST_IDTR_LIMIT, vmcs12->guest_idtr_limit);
+	vmcs_write32(GUEST_ES_AR_BYTES, vmcs12->guest_es_ar_bytes);
+	vmcs_write32(GUEST_CS_AR_BYTES, vmcs12->guest_cs_ar_bytes);
+	vmcs_write32(GUEST_SS_AR_BYTES, vmcs12->guest_ss_ar_bytes);
+	vmcs_write32(GUEST_DS_AR_BYTES, vmcs12->guest_ds_ar_bytes);
+	vmcs_write32(GUEST_FS_AR_BYTES, vmcs12->guest_fs_ar_bytes);
+	vmcs_write32(GUEST_GS_AR_BYTES, vmcs12->guest_gs_ar_bytes);
+	vmcs_write32(GUEST_LDTR_AR_BYTES, vmcs12->guest_ldtr_ar_bytes);
+	vmcs_write32(GUEST_TR_AR_BYTES, vmcs12->guest_tr_ar_bytes);
+	vmcs_write32(GUEST_INTERRUPTIBILITY_INFO,
+		vmcs12->guest_interruptibility_info);
+	vmcs_write32(GUEST_ACTIVITY_STATE, vmcs12->guest_activity_state);
+	vmcs_write32(GUEST_SYSENTER_CS, vmcs12->guest_sysenter_cs);
+
+	vmcs_writel(GUEST_ES_BASE, vmcs12->guest_es_base);
+	vmcs_writel(GUEST_CS_BASE, vmcs12->guest_cs_base);
+	vmcs_writel(GUEST_SS_BASE, vmcs12->guest_ss_base);
+	vmcs_writel(GUEST_DS_BASE, vmcs12->guest_ds_base);
+	vmcs_writel(GUEST_FS_BASE, vmcs12->guest_fs_base);
+	vmcs_writel(GUEST_GS_BASE, vmcs12->guest_gs_base);
+	vmcs_writel(GUEST_LDTR_BASE, vmcs12->guest_ldtr_base);
+	vmcs_writel(GUEST_TR_BASE, vmcs12->guest_tr_base);
+	vmcs_writel(GUEST_GDTR_BASE, vmcs12->guest_gdtr_base);
+	vmcs_writel(GUEST_IDTR_BASE, vmcs12->guest_idtr_base);
+	vmcs_writel(GUEST_DR7, vmcs12->guest_dr7);
+	vmcs_writel(GUEST_RFLAGS, vmcs12->guest_rflags);
+	vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS,
+		vmcs12->guest_pending_dbg_exceptions);
+	vmcs_writel(GUEST_SYSENTER_ESP, vmcs12->guest_sysenter_esp);
+	vmcs_writel(GUEST_SYSENTER_EIP, vmcs12->guest_sysenter_eip);
+
+	vmcs_write64(VMCS_LINK_POINTER, vmcs12->vmcs_link_pointer);
+
+	if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
+		struct page *page =
+			nested_get_page(vcpu, vmcs12->apic_access_addr);
+		if (!page)
+			return 1;
+		vmcs_write64(APIC_ACCESS_ADDR, page_to_phys(page));
+		/*
+		 * Keep the page pinned, so the physical address we just wrote
+		 * remains valid. We keep a reference to it so we can release
+		 * it later.
+		 */
+		if (vmx->nested.apic_access_page) /* shouldn't happen... */
+			nested_release_page(vmx->nested.apic_access_page);
+		vmx->nested.apic_access_page = page;
+	}
+
+	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL,
+		(vmcs_config.pin_based_exec_ctrl |
+		 vmcs12->pin_based_vm_exec_control));
+
+	/*
+	 * Whether page-faults are trapped is determined by a combination of
+	 * 3 settings: PFEC_MASK, PFEC_MATCH and EXCEPTION_BITMAP.PF.
+	 * If enable_ept, L0 doesn't care about page faults and we should
+	 * set all of these to L1's desires. However, if !enable_ept, L0 does
+	 * care about (at least some) page faults, and because it is not easy
+	 * (if at all possible?) to merge L0 and L1's desires, we simply ask
+	 * to exit on each and every L2 page fault. This is done by setting
+	 * MASK=MATCH=0 and (see below) EB.PF=1.
+	 * Note that below we don't need special code to set EB.PF beyond the
+	 * "or"ing of the EB of vmcs01 and vmcs12, because when enable_ept,
+	 * vmcs01's EB.PF is 0 so the "or" will take vmcs12's value, and when
+	 * !enable_ept, EB.PF is 1, so the "or" will always be 1.
+	 *
+	 * A problem with this approach (when !enable_ept) is that L1 may be
+	 * injected with more page faults than it asked for. This could have
+	 * caused problems, but in practice existing hypervisors don't care.
+	 * To fix this, we will need to emulate the PFEC checking (on the L1
+	 * page tables), using walk_addr(), when injecting PFs to L1.
+	 */
+	vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK,
+		enable_ept ? vmcs12->page_fault_error_code_mask : 0);
+	vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH,
+		enable_ept ? vmcs12->page_fault_error_code_match : 0);
+
+	if (cpu_has_secondary_exec_ctrls()) {
+		u32 exec_control = vmx_secondary_exec_control(vmx);
+		if (!vmx->rdtscp_enabled)
+			exec_control &= ~SECONDARY_EXEC_RDTSCP;
+		/* Take the following fields only from vmcs12 */
+		exec_control &= ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+		if (nested_cpu_has(vmcs12,
+				CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
+			exec_control |= vmcs12->secondary_vm_exec_control;
+		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
+	}
+
+	/*
+	 * Set host-state according to L0's settings (vmcs12 is irrelevant here)
+	 * Some constant fields are set here by vmx_set_constant_host_state().
+	 * Other fields are different per CPU, and will be set later when
+	 * vmx_vcpu_load() is called, and when vmx_save_host_state() is called.
+	 */
+	vmx_set_constant_host_state();
+
+	/*
+	 * HOST_RSP is normally set correctly in vmx_vcpu_run() just before
+	 * entry, but only if the current (host) sp changed from the value
+	 * we wrote last (vmx->host_rsp). This cache is no longer relevant
+	 * if we switch vmcs, and rather than hold a separate cache per vmcs,
+	 * here we just force the write to happen on entry.
+	 */
+	vmx->host_rsp = 0;
+
+	exec_control = vmx_exec_control(vmx); /* L0's desires */
+	exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
+	exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
+	exec_control &= ~CPU_BASED_TPR_SHADOW;
+	exec_control |= vmcs12->cpu_based_vm_exec_control;
+	/*
+	 * Merging of IO and MSR bitmaps not currently supported.
+	 * Rather, exit every time.
+	 */
+	exec_control &= ~CPU_BASED_USE_MSR_BITMAPS;
+	exec_control &= ~CPU_BASED_USE_IO_BITMAPS;
+	exec_control |= CPU_BASED_UNCOND_IO_EXITING;
+
+	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, exec_control);
+
+	/* EXCEPTION_BITMAP and CR0_GUEST_HOST_MASK should basically be the
+	 * bitwise-or of what L1 wants to trap for L2, and what we want to
+	 * trap. Note that CR0.TS also needs updating - we do this later.
+	 */
+	update_exception_bitmap(vcpu);
+	vcpu->arch.cr0_guest_owned_bits &= ~vmcs12->cr0_guest_host_mask;
+	vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);
+
+	/* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer below */
+	vmcs_write32(VM_EXIT_CONTROLS,
+		vmcs12->vm_exit_controls | vmcs_config.vmexit_ctrl);
+	vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls |
+		(vmcs_config.vmentry_ctrl & ~VM_ENTRY_IA32E_MODE));
+
+	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT)
+		vmcs_write64(GUEST_IA32_PAT, vmcs12->guest_ia32_pat);
+	else if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT)
+		vmcs_write64(GUEST_IA32_PAT, vmx->vcpu.arch.pat);
+
+
+	set_cr4_guest_host_mask(vmx);
+
+	vmcs_write64(TSC_OFFSET,
+		vmx->nested.vmcs01_tsc_offset + vmcs12->tsc_offset);
+
+	if (enable_vpid) {
+		/*
+		 * Trivially support vpid by letting L2s share their parent
+		 * L1's vpid. TODO: move to a more elaborate solution, giving
+		 * each L2 its own vpid and exposing the vpid feature to L1.
+		 */
+		vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
+		vmx_flush_tlb(vcpu);
+	}
+
+	if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
+		vcpu->arch.efer = vmcs12->guest_ia32_efer;
+	if (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE)
+		vcpu->arch.efer |= (EFER_LMA | EFER_LME);
+	else
+		vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
+	/* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
+	vmx_set_efer(vcpu, vcpu->arch.efer);
+
+	/*
+	 * This sets GUEST_CR0 to vmcs12->guest_cr0, with possibly a modified
+	 * TS bit (for lazy fpu) and bits which we consider mandatory enabled.
+	 * The CR0_READ_SHADOW is what L2 should have expected to read given
+	 * the specifications by L1; it's not enough to take
+	 * vmcs12->cr0_read_shadow because our cr0_guest_host_mask may have
+	 * more bits than L1 expected.
+	 */
+	vmx_set_cr0(vcpu, vmcs12->guest_cr0);
+	vmcs_writel(CR0_READ_SHADOW, guest_readable_cr0(vmcs12));
+
+	vmx_set_cr4(vcpu, vmcs12->guest_cr4);
+	vmcs_writel(CR4_READ_SHADOW, guest_readable_cr4(vmcs12));
+
+	/* shadow page tables on either EPT or shadow page tables */
+	kvm_set_cr3(vcpu, vmcs12->guest_cr3);
+	kvm_mmu_reset_context(vcpu);
+
+	kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->guest_rsp);
+	kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12->guest_rip);
+	return 0;
+}
+
 static int vmx_check_intercept(struct kvm_vcpu *vcpu,
 			       struct x86_instruction_info *info,
 			       enum x86_intercept_stage stage)


* [PATCH 18/30] nVMX: Implement VMLAUNCH and VMRESUME
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (16 preceding siblings ...)
  2011-05-08  8:23 ` [PATCH 17/30] nVMX: Prepare vmcs02 from vmcs01 and vmcs12 Nadav Har'El
@ 2011-05-08  8:24 ` Nadav Har'El
  2011-05-08  8:24 ` [PATCH 19/30] nVMX: No need for handle_vmx_insn function any more Nadav Har'El
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:24 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

Implement the VMLAUNCH and VMRESUME instructions, allowing a guest
hypervisor to run its own guests.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |  139 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 137 insertions(+), 2 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:20.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:20.000000000 +0300
@@ -346,6 +346,9 @@ struct nested_vmx {
 	/* vmcs02_list cache of VMCSs recently used to run L2 guests */
 	struct list_head vmcs02_pool;
 	int vmcs02_num;
+
+	/* Saving the VMCS that we used for running L1 */
+	struct saved_vmcs saved_vmcs01;
 	u64 vmcs01_tsc_offset;
 	/*
 	 * Guest pages referred to in vmcs02 with host-physical pointers, so
@@ -4880,6 +4883,21 @@ static int handle_vmclear(struct kvm_vcp
 	return 1;
 }
 
+static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch);
+
+/* Emulate the VMLAUNCH instruction */
+static int handle_vmlaunch(struct kvm_vcpu *vcpu)
+{
+	return nested_vmx_run(vcpu, true);
+}
+
+/* Emulate the VMRESUME instruction */
+static int handle_vmresume(struct kvm_vcpu *vcpu)
+{
+
+	return nested_vmx_run(vcpu, false);
+}
+
 enum vmcs_field_type {
 	VMCS_FIELD_TYPE_U16 = 0,
 	VMCS_FIELD_TYPE_U64 = 1,
@@ -5160,11 +5178,11 @@ static int (*kvm_vmx_exit_handlers[])(st
 	[EXIT_REASON_INVLPG]		      = handle_invlpg,
 	[EXIT_REASON_VMCALL]                  = handle_vmcall,
 	[EXIT_REASON_VMCLEAR]	              = handle_vmclear,
-	[EXIT_REASON_VMLAUNCH]                = handle_vmx_insn,
+	[EXIT_REASON_VMLAUNCH]                = handle_vmlaunch,
 	[EXIT_REASON_VMPTRLD]                 = handle_vmptrld,
 	[EXIT_REASON_VMPTRST]                 = handle_vmptrst,
 	[EXIT_REASON_VMREAD]                  = handle_vmread,
-	[EXIT_REASON_VMRESUME]                = handle_vmx_insn,
+	[EXIT_REASON_VMRESUME]                = handle_vmresume,
 	[EXIT_REASON_VMWRITE]                 = handle_vmwrite,
 	[EXIT_REASON_VMOFF]                   = handle_vmoff,
 	[EXIT_REASON_VMON]                    = handle_vmon,
@@ -6021,6 +6039,123 @@ static int prepare_vmcs02(struct kvm_vcp
 	return 0;
 }
 
+/*
+ * nested_vmx_run() handles a nested entry, i.e., a VMLAUNCH or VMRESUME on L1
+ * for running an L2 nested guest.
+ */
+static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
+{
+	struct vmcs12 *vmcs12;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int cpu;
+	struct saved_vmcs *saved_vmcs02;
+	u32 low, high;
+
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+	skip_emulated_instruction(vcpu);
+
+	/*
+	 * The nested entry process starts with enforcing various prerequisites
+	 * on vmcs12 as required by the Intel SDM, and acting appropriately when
+	 * they fail: As the SDM explains, some conditions should cause the
+	 * instruction to fail, while others will cause the instruction to seem
+	 * to succeed, but return an EXIT_REASON_INVALID_STATE.
+	 * To speed up the normal (success) code path, we should avoid checking
+	 * for misconfigurations which will anyway be caught by the processor
+	 * when using the merged vmcs02.
+	 */
+
+	vmcs12 = get_vmcs12(vcpu);
+	if (vmcs12->launch_state == launch) {
+		nested_vmx_failValid(vcpu,
+			launch ? VMXERR_VMLAUNCH_NONCLEAR_VMCS
+			       : VMXERR_VMRESUME_NONLAUNCHED_VMCS);
+		return 1;
+	}
+
+	if (vmcs12->guest_interruptibility_info & GUEST_INTR_STATE_MOV_SS) {
+		nested_vmx_failValid(vcpu,
+			VMXERR_ENTRY_EVENTS_BLOCKED_BY_MOV_SS);
+		return 1;
+	}
+
+	if ((vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_MSR_BITMAPS) &&
+			!IS_ALIGNED(vmcs12->msr_bitmap, PAGE_SIZE)) {
+		/*TODO: Also verify bits beyond physical address width are 0*/
+		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
+		return 1;
+	}
+
+	if (vmcs12->vm_entry_msr_load_count > 0 ||
+	    vmcs12->vm_exit_msr_load_count > 0 ||
+	    vmcs12->vm_exit_msr_store_count > 0) {
+		if (printk_ratelimit())
+			printk(KERN_WARNING
+			  "%s: VMCS MSR_{LOAD,STORE} unsupported\n", __func__);
+		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
+		return 1;
+	}
+
+	nested_vmx_pinbased_ctls(&low, &high);
+	if (!vmx_control_verify(vmcs12->pin_based_vm_exec_control, low, high)) {
+		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
+		return 1;
+	}
+
+	if (((vmcs12->host_cr0 & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON) ||
+	    ((vmcs12->host_cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)) {
+		nested_vmx_failValid(vcpu,
+			VMXERR_ENTRY_INVALID_HOST_STATE_FIELD);
+		return 1;
+	}
+
+	/*
+	 * We're finally done with prerequisite checking, and can start with
+	 * the nested entry.
+	 */
+
+	enter_guest_mode(vcpu);
+
+	vmx->nested.vmcs01_tsc_offset = vmcs_read64(TSC_OFFSET);
+
+	/*
+	 * Switch from L1's VMCS (vmcs01) to L2's VMCS (vmcs02). Remember
+	 * vmcs01, on which CPU it was last loaded, and whether it was launched
+	 * (we need all these values the next time we use L1). Then recall
+	 * these values from the last time vmcs02 was used.
+	 */
+	saved_vmcs02 = nested_get_current_vmcs02(vmx);
+	if (!saved_vmcs02)
+		return -ENOMEM;
+
+	cpu = get_cpu();
+	vmx->nested.saved_vmcs01.vmcs = vmx->vmcs;
+	vmx->nested.saved_vmcs01.cpu = vcpu->cpu;
+	vmx->nested.saved_vmcs01.launched = vmx->launched;
+
+	vmx->vmcs = saved_vmcs02->vmcs;
+	vcpu->cpu = saved_vmcs02->cpu;
+	vmx->launched = saved_vmcs02->launched;
+
+	vmx_vcpu_put(vcpu);
+	vmx_vcpu_load(vcpu, cpu);
+	vcpu->cpu = cpu;
+	put_cpu();
+
+	vmcs12->launch_state = 1;
+
+	prepare_vmcs02(vcpu, vmcs12);
+
+	/*
+	 * Note no nested_vmx_succeed or nested_vmx_fail here. At this point
+	 * we are no longer running L1, and VMLAUNCH/VMRESUME has not yet
+	 * returned as far as L1 is concerned. It will only return (and set
+	 * the success flag) when L2 exits (see nested_vmx_vmexit()).
+	 */
+	return 1;
+}
+
 static int vmx_check_intercept(struct kvm_vcpu *vcpu,
 			       struct x86_instruction_info *info,
 			       enum x86_intercept_stage stage)


* [PATCH 19/30] nVMX: No need for handle_vmx_insn function any more
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (17 preceding siblings ...)
  2011-05-08  8:24 ` [PATCH 18/30] nVMX: Implement VMLAUNCH and VMRESUME Nadav Har'El
@ 2011-05-08  8:24 ` Nadav Har'El
  2011-05-08  8:25 ` [PATCH 20/30] nVMX: Exiting from L2 to L1 Nadav Har'El
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:24 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

Before nested VMX support, the exit handler for a guest executing a VMX
instruction (vmclear, vmlaunch, vmptrld, vmptrst, vmread, vmresume, vmwrite,
vmon, vmoff) was handle_vmx_insn(). This handler simply threw a #UD
exception. Now that all these exit reasons are properly handled (each emulating
the respective VMX instruction), nothing calls this dummy handler and it can
be removed.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |    6 ------
 1 file changed, 6 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:20.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:20.000000000 +0300
@@ -4240,12 +4240,6 @@ static int handle_vmcall(struct kvm_vcpu
 	return 1;
 }
 
-static int handle_vmx_insn(struct kvm_vcpu *vcpu)
-{
-	kvm_queue_exception(vcpu, UD_VECTOR);
-	return 1;
-}
-
 static int handle_invd(struct kvm_vcpu *vcpu)
 {
 	return emulate_instruction(vcpu, 0) == EMULATE_DONE;


* [PATCH 20/30] nVMX: Exiting from L2 to L1
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (18 preceding siblings ...)
  2011-05-08  8:24 ` [PATCH 19/30] nVMX: No need for handle_vmx_insn function any more Nadav Har'El
@ 2011-05-08  8:25 ` Nadav Har'El
  2011-05-09 10:45   ` Avi Kivity
  2011-05-08  8:25 ` [PATCH 21/30] nVMX: Deciding if L0 or L1 should handle an L2 exit Nadav Har'El
                   ` (10 subsequent siblings)
  30 siblings, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:25 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

This patch implements nested_vmx_vmexit(), called when the nested L2 guest
exits and we want to run its L1 parent and let it handle this exit.

Note that this will not necessarily be called on every L2 exit: L0 may decide
to handle a particular exit on its own, without L1's involvement. In that
case, L0 handles the exit and resumes running L2, without running L1 and
without calling nested_vmx_vmexit(). The logic for deciding whether to handle
a particular exit in L1 or in L0, i.e., whether to call nested_vmx_vmexit(),
appears in the next patch.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |  288 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 288 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:20.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:20.000000000 +0300
@@ -6150,6 +6150,294 @@ static int nested_vmx_run(struct kvm_vcp
 	return 1;
 }
 
+/*
+ * On a nested exit from L2 to L1, vmcs12.guest_cr0 might not be up-to-date
+ * because L2 may have changed some cr0 bits directly (see CR0_GUEST_HOST_MASK)
+ * without L0 trapping the change and updating vmcs12.
+ * This function returns the value we should put in vmcs12.guest_cr0. It's not
+ * enough to just return the current (vmcs02) GUEST_CR0 - that may not be the
+ * guest cr0 that L1 thought it was giving its L2 guest; It is possible that
+ * L1 wished to allow its guest to set some cr0 bit directly, but we (L0) asked
+ * to trap this change and instead set just the read shadow bit. If this is the
+ * case, we need to copy these read-shadow bits back to vmcs12.guest_cr0, where
+ * L1 believes they already are.
+ */
+static inline unsigned long
+vmcs12_guest_cr0(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
+{
+	/*
+	 * As explained above, we take a bit from GUEST_CR0 if we allowed the
+	 * guest to modify it untrapped (vcpu->arch.cr0_guest_owned_bits), or
+	 * if we did trap it - if we did so because L1 asked to trap this bit
+	 * (vmcs12->cr0_guest_host_mask). Otherwise (bits we trapped but L1
+	 * didn't expect us to trap) we read from CR0_READ_SHADOW.
+	 */
+	unsigned long guest_cr0_bits =
+		vcpu->arch.cr0_guest_owned_bits | vmcs12->cr0_guest_host_mask;
+	return (vmcs_readl(GUEST_CR0) & guest_cr0_bits) |
+	       (vmcs_readl(CR0_READ_SHADOW) & ~guest_cr0_bits);
+}
+
+static inline unsigned long
+vmcs12_guest_cr4(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
+{
+	unsigned long guest_cr4_bits =
+		vcpu->arch.cr4_guest_owned_bits | vmcs12->cr4_guest_host_mask;
+	return (vmcs_readl(GUEST_CR4) & guest_cr4_bits) |
+	       (vmcs_readl(CR4_READ_SHADOW) & ~guest_cr4_bits);
+}
+
+/*
+ * prepare_vmcs12 is part of what we need to do when the nested L2 guest exits
+ * and we want to prepare to run its L1 parent. L1 keeps a vmcs for L2 (vmcs12),
+ * and this function updates it to reflect the changes to the guest state while
+ * L2 was running (and perhaps made some exits which were handled directly by L0
+ * without going back to L1), and to reflect the exit reason.
+ * Note that we do not have to copy here all VMCS fields, just those that
+ * could have changed by the L2 guest or the exit - i.e., the guest-state and
+ * exit-information fields only. Other fields are modified by L1 with VMWRITE,
+ * which already writes to vmcs12 directly.
+ */
+void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
+{
+	/* update guest state fields: */
+	vmcs12->guest_cr0 = vmcs12_guest_cr0(vcpu, vmcs12);
+	vmcs12->guest_cr4 = vmcs12_guest_cr4(vcpu, vmcs12);
+
+	kvm_get_dr(vcpu, 7, (unsigned long *)&vmcs12->guest_dr7);
+	vmcs12->guest_rsp = kvm_register_read(vcpu, VCPU_REGS_RSP);
+	vmcs12->guest_rip = kvm_register_read(vcpu, VCPU_REGS_RIP);
+	vmcs12->guest_rflags = vmcs_readl(GUEST_RFLAGS);
+
+	vmcs12->guest_es_selector = vmcs_read16(GUEST_ES_SELECTOR);
+	vmcs12->guest_cs_selector = vmcs_read16(GUEST_CS_SELECTOR);
+	vmcs12->guest_ss_selector = vmcs_read16(GUEST_SS_SELECTOR);
+	vmcs12->guest_ds_selector = vmcs_read16(GUEST_DS_SELECTOR);
+	vmcs12->guest_fs_selector = vmcs_read16(GUEST_FS_SELECTOR);
+	vmcs12->guest_gs_selector = vmcs_read16(GUEST_GS_SELECTOR);
+	vmcs12->guest_ldtr_selector = vmcs_read16(GUEST_LDTR_SELECTOR);
+	vmcs12->guest_tr_selector = vmcs_read16(GUEST_TR_SELECTOR);
+	vmcs12->guest_es_limit = vmcs_read32(GUEST_ES_LIMIT);
+	vmcs12->guest_cs_limit = vmcs_read32(GUEST_CS_LIMIT);
+	vmcs12->guest_ss_limit = vmcs_read32(GUEST_SS_LIMIT);
+	vmcs12->guest_ds_limit = vmcs_read32(GUEST_DS_LIMIT);
+	vmcs12->guest_fs_limit = vmcs_read32(GUEST_FS_LIMIT);
+	vmcs12->guest_gs_limit = vmcs_read32(GUEST_GS_LIMIT);
+	vmcs12->guest_ldtr_limit = vmcs_read32(GUEST_LDTR_LIMIT);
+	vmcs12->guest_tr_limit = vmcs_read32(GUEST_TR_LIMIT);
+	vmcs12->guest_gdtr_limit = vmcs_read32(GUEST_GDTR_LIMIT);
+	vmcs12->guest_idtr_limit = vmcs_read32(GUEST_IDTR_LIMIT);
+	vmcs12->guest_es_ar_bytes = vmcs_read32(GUEST_ES_AR_BYTES);
+	vmcs12->guest_cs_ar_bytes = vmcs_read32(GUEST_CS_AR_BYTES);
+	vmcs12->guest_ss_ar_bytes = vmcs_read32(GUEST_SS_AR_BYTES);
+	vmcs12->guest_ds_ar_bytes = vmcs_read32(GUEST_DS_AR_BYTES);
+	vmcs12->guest_fs_ar_bytes = vmcs_read32(GUEST_FS_AR_BYTES);
+	vmcs12->guest_gs_ar_bytes = vmcs_read32(GUEST_GS_AR_BYTES);
+	vmcs12->guest_ldtr_ar_bytes = vmcs_read32(GUEST_LDTR_AR_BYTES);
+	vmcs12->guest_tr_ar_bytes = vmcs_read32(GUEST_TR_AR_BYTES);
+	vmcs12->guest_es_base = vmcs_readl(GUEST_ES_BASE);
+	vmcs12->guest_cs_base = vmcs_readl(GUEST_CS_BASE);
+	vmcs12->guest_ss_base = vmcs_readl(GUEST_SS_BASE);
+	vmcs12->guest_ds_base = vmcs_readl(GUEST_DS_BASE);
+	vmcs12->guest_fs_base = vmcs_readl(GUEST_FS_BASE);
+	vmcs12->guest_gs_base = vmcs_readl(GUEST_GS_BASE);
+	vmcs12->guest_ldtr_base = vmcs_readl(GUEST_LDTR_BASE);
+	vmcs12->guest_tr_base = vmcs_readl(GUEST_TR_BASE);
+	vmcs12->guest_gdtr_base = vmcs_readl(GUEST_GDTR_BASE);
+	vmcs12->guest_idtr_base = vmcs_readl(GUEST_IDTR_BASE);
+
+	vmcs12->guest_activity_state = vmcs_read32(GUEST_ACTIVITY_STATE);
+	vmcs12->guest_interruptibility_info =
+		vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
+	vmcs12->guest_pending_dbg_exceptions =
+		vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
+	vmcs12->vmcs_link_pointer = vmcs_read64(VMCS_LINK_POINTER);
+
+	/* TODO: These cannot have changed unless we have MSR bitmaps and
+	 * the relevant bit asks not to trap the change */
+	vmcs12->guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
+	if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_IA32_PAT)
+		vmcs12->guest_ia32_pat = vmcs_read64(GUEST_IA32_PAT);
+	vmcs12->guest_sysenter_cs = vmcs_read32(GUEST_SYSENTER_CS);
+	vmcs12->guest_sysenter_esp = vmcs_readl(GUEST_SYSENTER_ESP);
+	vmcs12->guest_sysenter_eip = vmcs_readl(GUEST_SYSENTER_EIP);
+
+	/* update exit information fields: */
+
+	vmcs12->vm_exit_reason  = vmcs_read32(VM_EXIT_REASON);
+	vmcs12->exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+
+	vmcs12->vm_exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+	vmcs12->vm_exit_intr_error_code = vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
+	vmcs12->idt_vectoring_info_field =
+		vmcs_read32(IDT_VECTORING_INFO_FIELD);
+	vmcs12->idt_vectoring_error_code =
+		vmcs_read32(IDT_VECTORING_ERROR_CODE);
+	vmcs12->vm_exit_instruction_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+	vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+
+	/* clear vm-entry fields which are to be cleared on exit */
+	if (!(vmcs12->vm_exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
+		vmcs12->vm_entry_intr_info_field &= ~INTR_INFO_VALID_MASK;
+}
+
+/*
+ * A part of what we need to do when the nested L2 guest exits and we want to
+ * run its L1 parent, is to reset L1's guest state to the host state specified
+ * in vmcs12.
+ * This function is to be called not only on normal nested exit, but also on
+ * a nested entry failure, as explained in Intel's spec, 3B.23.7 ("VM-Entry
+ * Failures During or After Loading Guest State").
+ * This function should be called when the active VMCS is L1's (vmcs01).
+ */
+void load_vmcs12_host_state(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
+{
+	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
+		vcpu->arch.efer = vmcs12->host_ia32_efer;
+	if (vmcs12->vm_exit_controls & VM_EXIT_HOST_ADDR_SPACE_SIZE)
+		vcpu->arch.efer |= (EFER_LMA | EFER_LME);
+	else
+		vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
+	vmx_set_efer(vcpu, vcpu->arch.efer);
+
+	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT)
+		vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat);
+
+	kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->host_rsp);
+	kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12->host_rip);
+	/*
+	 * Note that calling vmx_set_cr0 is important, even if cr0 hasn't
+	 * actually changed, because it depends on the current state of
+	 * fpu_active (which may have changed).
+	 * Note that vmx_set_cr0 refers to efer set above.
+	 */
+	kvm_set_cr0(vcpu, vmcs12->host_cr0);
+	/*
+	 * If we did fpu_activate()/fpu_deactivate() during L2's run, we need
+	 * to apply the same changes to L1's vmcs. We just set cr0 correctly,
+	 * but we also need to update cr0_guest_host_mask and exception_bitmap.
+	 */
+	update_exception_bitmap(vcpu);
+	vcpu->arch.cr0_guest_owned_bits = (vcpu->fpu_active ? X86_CR0_TS : 0);
+	vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);
+
+	/*
+	 * Note that CR4_GUEST_HOST_MASK is already set in the original vmcs01
+	 * (KVM doesn't change it)- no reason to call set_cr4_guest_host_mask();
+	 */
+	vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK);
+	kvm_set_cr4(vcpu, vmcs12->host_cr4);
+
+	/* shadow page tables on either EPT or shadow page tables */
+	kvm_set_cr3(vcpu, vmcs12->host_cr3);
+	kvm_mmu_reset_context(vcpu);
+
+	if (enable_vpid) {
+		/*
+		 * Trivially support vpid by letting L2s share their parent
+		 * L1's vpid. TODO: move to a more elaborate solution, giving
+		 * each L2 its own vpid and exposing the vpid feature to L1.
+		 */
+		vmx_flush_tlb(vcpu);
+	}
+}
+
+/*
+ * Emulate an exit from nested guest (L2) to L1, i.e., prepare to run L1
+ * and modify vmcs12 to make it see what it would expect to see there if
+ * L2 was its real guest. Must only be called when in L2 (is_guest_mode())
+ */
+static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, bool is_interrupt)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int cpu;
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+	leave_guest_mode(vcpu);
+
+	prepare_vmcs12(vcpu, vmcs12);
+
+	/*
+	 * Usually, nested_vmx_vmexit() is called after an exit from L2 that
+	 * we wish to pass to L1, so we can pass this exit reason. However,
+	 * when the ad-hoc is_interrupt flag is on, it means there was no
+	 * real exit reason: The caller wanted to exit to L1 just to inject
+	 * an interrupt to it, and we set here a fictitious exit reason.
+	 * In the future, this call option will be eliminated: Instead of
+	 * exiting to L1 and later injecting to it, the better solution would
+	 * be to exit to L1 with the injected interrupt as the exit reason.
+	 */
+	if (is_interrupt)
+		vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
+
+	/*
+	 * Switch from L2's VMCS, to L1's VMCS. Remember on which CPU the L2
+	 * VMCS was last loaded, and whether it was launched (we need to know
+	 * this next time we use L2), and recall these values as they were for
+	 * L1's VMCS.
+	 */
+	cpu = get_cpu();
+	if (VMCS02_POOL_SIZE > 0) {
+		struct saved_vmcs *saved_vmcs02 =
+			nested_get_current_vmcs02(vmx);
+		saved_vmcs02->cpu = vcpu->cpu;
+		saved_vmcs02->launched = vmx->launched;
+	} else {
+		/* no vmcs02 cache requested, so free the one we used */
+		nested_free_vmcs02(vmx, vmx->nested.current_vmptr);
+	}
+	vmx->vmcs = vmx->nested.saved_vmcs01.vmcs;
+	vcpu->cpu = vmx->nested.saved_vmcs01.cpu;
+	vmx->launched = vmx->nested.saved_vmcs01.launched;
+
+	vmx_vcpu_put(vcpu);
+	vmx_vcpu_load(vcpu, cpu);
+	vcpu->cpu = cpu;
+	put_cpu();
+
+	load_vmcs12_host_state(vcpu, vmcs12);
+
+	/* Update TSC_OFFSET if vmx_adjust_tsc_offset() was used while L2 ran */
+	vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
+
+	/* This is needed for the same reason as it was needed in prepare_vmcs02 */
+	vmx->host_rsp = 0;
+
+	/* Unpin physical memory we referred to in vmcs02 */
+	if (vmx->nested.apic_access_page) {
+		nested_release_page(vmx->nested.apic_access_page);
+		vmx->nested.apic_access_page = 0;
+	}
+
+	/*
+	 * Exiting from L2 to L1, we're now back to L1 which thinks it just
+	 * finished a VMLAUNCH or VMRESUME instruction, so we need to set the
+	 * success or failure flag accordingly.
+	 */
+	if (unlikely(vmx->fail)) {
+		vmx->fail = 0;
+		nested_vmx_failValid(vcpu, vmcs_read32(VM_INSTRUCTION_ERROR));
+	} else
+		nested_vmx_succeed(vcpu);
+}
+
+/*
+ * L1's failure to enter L2 is a subset of a normal exit, as explained in
+ * 23.7 "VM-entry failures during or after loading guest state". This function
+ * should only be called before L2 has actually started to run, and while
+ * vmcs01 is current (it doesn't leave_guest_mode() or switch VMCSs).
+ */
+static void nested_vmx_entry_failure(struct kvm_vcpu *vcpu,
+					struct vmcs12 *vmcs12)
+{
+	load_vmcs12_host_state(vcpu, vmcs12);
+	/* TODO: there are more possible types of failures - see 23.7 */
+	vmcs12->vm_exit_reason = EXIT_REASON_INVALID_STATE |
+		VMX_EXIT_REASONS_FAILED_VMENTRY;
+	vmcs12->exit_qualification = 0;
+	nested_vmx_succeed(vcpu);
+}
+
 static int vmx_check_intercept(struct kvm_vcpu *vcpu,
 			       struct x86_instruction_info *info,
 			       enum x86_intercept_stage stage)


* [PATCH 21/30] nVMX: Deciding if L0 or L1 should handle an L2 exit
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (19 preceding siblings ...)
  2011-05-08  8:25 ` [PATCH 20/30] nVMX: Exiting from L2 to L1 Nadav Har'El
@ 2011-05-08  8:25 ` Nadav Har'El
  2011-05-08  8:26 ` [PATCH 22/30] nVMX: Correct handling of interrupt injection Nadav Har'El
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:25 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

This patch contains the logic of whether an L2 exit should be handled by L0
and then L2 should be resumed, or whether L1 should be run to handle this
exit (using the nested_vmx_vmexit() function of the previous patch).

The basic idea is to let L1 handle the exit only if it actually asked to
trap this sort of event. For example, when L2 exits on a change to CR0,
we check L1's CR0_GUEST_HOST_MASK to see whether L1 expressed interest in any
bit which changed; if it did, we exit to L1. If it didn't, it means that it
was we (L0) who wished to trap this event, so we handle it ourselves.

The next two patches add additional logic of what to do when an interrupt or
exception is injected: Does L0 need to do it, should we exit to L1 to do it,
or should we resume L2 and keep the exception to be injected later.

We keep a new flag, "nested_run_pending", which can override the decision of
which should run next, L1 or L2. nested_run_pending=1 means that we *must* run
L2 next, not L1. This is necessary in particular when L1 did a VMLAUNCH of L2
and therefore expects L2 to be run (and perhaps be injected with an event it
specified, etc.). nested_run_pending is especially intended to avoid switching
to L1 at the injection decision point described above.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |  265 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 264 insertions(+), 1 deletion(-)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:20.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:20.000000000 +0300
@@ -350,6 +350,8 @@ struct nested_vmx {
 	/* Saving the VMCS that we used for running L1 */
 	struct saved_vmcs saved_vmcs01;
 	u64 vmcs01_tsc_offset;
+	/* L2 must run next, and mustn't decide to exit to L1. */
+	bool nested_run_pending;
 	/*
 	 * Guest pages referred to in vmcs02 with host-physical pointers, so
 	 * we must keep them pinned while L2 runs.
@@ -856,6 +858,23 @@ static inline bool nested_cpu_has2(struc
 		(vmcs12->secondary_vm_exec_control & bit);
 }
 
+static inline bool nested_cpu_has_virtual_nmis(struct kvm_vcpu *vcpu)
+{
+	return is_guest_mode(vcpu) &&
+		(get_vmcs12(vcpu)->pin_based_vm_exec_control &
+			PIN_BASED_VIRTUAL_NMIS);
+}
+
+static inline bool is_exception(u32 intr_info)
+{
+	return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
+		== (INTR_TYPE_HARD_EXCEPTION | INTR_INFO_VALID_MASK);
+}
+
+static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, bool is_interrupt);
+static void nested_vmx_entry_failure(struct kvm_vcpu *vcpu,
+					struct vmcs12 *vmcs12);
+
 static int __find_msr_index(struct vcpu_vmx *vmx, u32 msr)
 {
 	int i;
@@ -5196,6 +5215,232 @@ static int (*kvm_vmx_exit_handlers[])(st
 static const int kvm_vmx_max_exit_handlers =
 	ARRAY_SIZE(kvm_vmx_exit_handlers);
 
+/*
+ * Return 1 if we should exit from L2 to L1 to handle an MSR access,
+ * rather than handle it ourselves in L0. I.e., check whether L1 expressed
+ * interest in the current event (a read or write of a specific MSR) via its
+ * MSR bitmap. L1 may use an MSR bitmap even when L0 doesn't use one itself.
+ */
+static bool nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu,
+	struct vmcs12 *vmcs12, u32 exit_reason)
+{
+	u32 msr_index = vcpu->arch.regs[VCPU_REGS_RCX];
+	gpa_t bitmap;
+
+	if (!nested_cpu_has(get_vmcs12(vcpu), CPU_BASED_USE_MSR_BITMAPS))
+		return 1;
+
+	/*
+	 * The MSR_BITMAP page is divided into four 1024-byte bitmaps,
+	 * for the four combinations of read/write and low/high MSR numbers.
+	 * First we need to figure out which of the four to use:
+	 */
+	bitmap = vmcs12->msr_bitmap;
+	if (exit_reason == EXIT_REASON_MSR_WRITE)
+		bitmap += 2048;
+	if (msr_index >= 0xc0000000) {
+		msr_index -= 0xc0000000;
+		bitmap += 1024;
+	}
+
+	/* Then read the msr_index'th bit from this bitmap: */
+	if (msr_index < 1024*8) {
+		unsigned char b;
+		kvm_read_guest(vcpu->kvm, bitmap + msr_index/8, &b, 1);
+		return 1 & (b >> (msr_index & 7));
+	} else
+		return 1; /* let L1 handle the wrong parameter */
+}
+
+/*
+ * Return 1 if we should exit from L2 to L1 to handle a CR access exit,
+ * rather than handle it ourselves in L0. I.e., check if L1 wanted to
+ * intercept (via guest_host_mask etc.) the current event.
+ */
+static bool nested_vmx_exit_handled_cr(struct kvm_vcpu *vcpu,
+	struct vmcs12 *vmcs12)
+{
+	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	int cr = exit_qualification & 15;
+	int reg = (exit_qualification >> 8) & 15;
+	unsigned long val = kvm_register_read(vcpu, reg);
+
+	switch ((exit_qualification >> 4) & 3) {
+	case 0: /* mov to cr */
+		switch (cr) {
+		case 0:
+			if (vmcs12->cr0_guest_host_mask &
+			    (val ^ vmcs12->cr0_read_shadow))
+				return 1;
+			break;
+		case 3:
+			if ((vmcs12->cr3_target_count >= 1 &&
+					vmcs12->cr3_target_value0 == val) ||
+				(vmcs12->cr3_target_count >= 2 &&
+					vmcs12->cr3_target_value1 == val) ||
+				(vmcs12->cr3_target_count >= 3 &&
+					vmcs12->cr3_target_value2 == val) ||
+				(vmcs12->cr3_target_count >= 4 &&
+					vmcs12->cr3_target_value3 == val))
+				return 0;
+			if (nested_cpu_has(vmcs12, CPU_BASED_CR3_LOAD_EXITING))
+				return 1;
+			break;
+		case 4:
+			if (vmcs12->cr4_guest_host_mask &
+			    (vmcs12->cr4_read_shadow ^ val))
+				return 1;
+			break;
+		case 8:
+			if (nested_cpu_has(vmcs12, CPU_BASED_CR8_LOAD_EXITING))
+				return 1;
+			break;
+		}
+		break;
+	case 2: /* clts */
+		if ((vmcs12->cr0_guest_host_mask & X86_CR0_TS) &&
+		    (vmcs12->cr0_read_shadow & X86_CR0_TS))
+			return 1;
+		break;
+	case 1: /* mov from cr */
+		switch (cr) {
+		case 3:
+			if (vmcs12->cpu_based_vm_exec_control &
+			    CPU_BASED_CR3_STORE_EXITING)
+				return 1;
+			break;
+		case 8:
+			if (vmcs12->cpu_based_vm_exec_control &
+			    CPU_BASED_CR8_STORE_EXITING)
+				return 1;
+			break;
+		}
+		break;
+	case 3: /* lmsw */
+		/*
+		 * lmsw can change bits 1..3 of cr0, and only set bit 0 of
+		 * cr0. Other attempted changes are ignored, with no exit.
+		 */
+		if (vmcs12->cr0_guest_host_mask & 0xe &
+		    (val ^ vmcs12->cr0_read_shadow))
+			return 1;
+		if ((vmcs12->cr0_guest_host_mask & 0x1) &&
+		    !(vmcs12->cr0_read_shadow & 0x1) &&
+		    (val & 0x1))
+			return 1;
+		break;
+	}
+	return 0;
+}
+
+/*
+ * Return 1 if we should exit from L2 to L1 to handle an exit, or 0 if we
+ * should handle it ourselves in L0 (and then continue L2). Only call this
+ * when in is_guest_mode (L2).
+ */
+static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
+{
+	u32 exit_reason = vmcs_read32(VM_EXIT_REASON);
+	u32 intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+	if (vmx->nested.nested_run_pending)
+		return 0;
+
+	if (unlikely(vmx->fail)) {
+		printk(KERN_INFO "%s failed vm entry %x\n",
+		       __func__, vmcs_read32(VM_INSTRUCTION_ERROR));
+		return 1;
+	}
+
+	switch (exit_reason) {
+	case EXIT_REASON_EXCEPTION_NMI:
+		if (!is_exception(intr_info))
+			return 0;
+		else if (is_page_fault(intr_info))
+			return enable_ept;
+		return vmcs12->exception_bitmap &
+				(1u << (intr_info & INTR_INFO_VECTOR_MASK));
+	case EXIT_REASON_EXTERNAL_INTERRUPT:
+		return 0;
+	case EXIT_REASON_TRIPLE_FAULT:
+		return 1;
+	case EXIT_REASON_PENDING_INTERRUPT:
+	case EXIT_REASON_NMI_WINDOW:
+		/*
+		 * prepare_vmcs02() set the CPU_BASED_VIRTUAL_INTR_PENDING bit
+		 * (aka Interrupt Window Exiting) only when L1 turned it on,
+		 * so if we got a PENDING_INTERRUPT exit, this must be for L1.
+		 * Same for NMI Window Exiting.
+		 */
+		return 1;
+	case EXIT_REASON_TASK_SWITCH:
+		return 1;
+	case EXIT_REASON_CPUID:
+		return 1;
+	case EXIT_REASON_HLT:
+		return nested_cpu_has(vmcs12, CPU_BASED_HLT_EXITING);
+	case EXIT_REASON_INVD:
+		return 1;
+	case EXIT_REASON_INVLPG:
+		return vmcs12->cpu_based_vm_exec_control &
+				CPU_BASED_INVLPG_EXITING;
+	case EXIT_REASON_RDPMC:
+		return vmcs12->cpu_based_vm_exec_control &
+				CPU_BASED_RDPMC_EXITING;
+	case EXIT_REASON_RDTSC:
+		return vmcs12->cpu_based_vm_exec_control &
+				CPU_BASED_RDTSC_EXITING;
+	case EXIT_REASON_VMCALL: case EXIT_REASON_VMCLEAR:
+	case EXIT_REASON_VMLAUNCH: case EXIT_REASON_VMPTRLD:
+	case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD:
+	case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE:
+	case EXIT_REASON_VMOFF: case EXIT_REASON_VMON:
+		/*
+		 * VMX instructions trap unconditionally. This allows L1 to
+		 * emulate them for its L2 guest, i.e., allows 3-level nesting!
+		 */
+		return 1;
+	case EXIT_REASON_CR_ACCESS:
+		return nested_vmx_exit_handled_cr(vcpu, vmcs12);
+	case EXIT_REASON_DR_ACCESS:
+		return nested_cpu_has(vmcs12, CPU_BASED_MOV_DR_EXITING);
+	case EXIT_REASON_IO_INSTRUCTION:
+		/* TODO: support IO bitmaps */
+		return 1;
+	case EXIT_REASON_MSR_READ:
+	case EXIT_REASON_MSR_WRITE:
+		return nested_vmx_exit_handled_msr(vcpu, vmcs12, exit_reason);
+	case EXIT_REASON_INVALID_STATE:
+		return 1;
+	case EXIT_REASON_MWAIT_INSTRUCTION:
+		return nested_cpu_has(vmcs12, CPU_BASED_MWAIT_EXITING);
+	case EXIT_REASON_MONITOR_INSTRUCTION:
+		return nested_cpu_has(vmcs12, CPU_BASED_MONITOR_EXITING);
+	case EXIT_REASON_PAUSE_INSTRUCTION:
+		return nested_cpu_has(vmcs12, CPU_BASED_PAUSE_EXITING) ||
+			nested_cpu_has2(vmcs12,
+				SECONDARY_EXEC_PAUSE_LOOP_EXITING);
+	case EXIT_REASON_MCE_DURING_VMENTRY:
+		return 0;
+	case EXIT_REASON_TPR_BELOW_THRESHOLD:
+		return 1;
+	case EXIT_REASON_APIC_ACCESS:
+		return nested_cpu_has2(vmcs12,
+			SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
+	case EXIT_REASON_EPT_VIOLATION:
+	case EXIT_REASON_EPT_MISCONFIG:
+		return 0;
+	case EXIT_REASON_WBINVD:
+		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_WBINVD_EXITING);
+	case EXIT_REASON_XSETBV:
+		return 1;
+	default:
+		return 1;
+	}
+}
+
 static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
 {
 	*info1 = vmcs_readl(EXIT_QUALIFICATION);
@@ -5218,6 +5463,17 @@ static int vmx_handle_exit(struct kvm_vc
 	if (vmx->emulation_required && emulate_invalid_guest_state)
 		return handle_invalid_guest_state(vcpu);
 
+	if (exit_reason == EXIT_REASON_VMLAUNCH ||
+	    exit_reason == EXIT_REASON_VMRESUME)
+		vmx->nested.nested_run_pending = 1;
+	else
+		vmx->nested.nested_run_pending = 0;
+
+	if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
+		nested_vmx_vmexit(vcpu, false);
+		return 1;
+	}
+
 	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
 		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
 		vcpu->run->fail_entry.hardware_entry_failure_reason
@@ -5240,7 +5496,8 @@ static int vmx_handle_exit(struct kvm_vc
 		       "(0x%x) and exit reason is 0x%x\n",
 		       __func__, vectoring_info, exit_reason);
 
-	if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked)) {
+	if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked &&
+			!nested_cpu_has_virtual_nmis(vcpu))) {
 		if (vmx_interrupt_allowed(vcpu)) {
 			vmx->soft_vnmi_blocked = 0;
 		} else if (vmx->vnmi_blocked_time > 1000000000LL &&
@@ -6104,6 +6361,12 @@ static int nested_vmx_run(struct kvm_vcp
 		return 1;
 	}
 
+	if (((vmcs12->guest_cr0 & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON) ||
+	    ((vmcs12->guest_cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)) {
+		nested_vmx_entry_failure(vcpu, vmcs12);
+		return 1;
+	}
+
 	/*
 	 * We're finally done with prerequisite checking, and can start with
 	 * the nested entry.


* [PATCH 22/30] nVMX: Correct handling of interrupt injection
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (20 preceding siblings ...)
  2011-05-08  8:25 ` [PATCH 21/30] nVMX: Deciding if L0 or L1 should handle an L2 exit Nadav Har'El
@ 2011-05-08  8:26 ` Nadav Har'El
  2011-05-09 10:57   ` Avi Kivity
  2011-05-08  8:27 ` [PATCH 23/30] nVMX: Correct handling of exception injection Nadav Har'El
                   ` (8 subsequent siblings)
  30 siblings, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:26 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

When KVM wants to inject an interrupt, the guest should think a real interrupt
has happened. Normally (in the non-nested case) this means checking that the
guest doesn't block interrupts (and if it does, inject when it doesn't - using
the "interrupt window" VMX mechanism), and setting up the appropriate VMCS
fields for the guest to receive the interrupt.

However, when we are running a nested guest (L2) and its hypervisor (L1)
requested exits on interrupts (as most hypervisors do), the most efficient
thing to do is to exit L2, telling L1 that the exit was caused by an
interrupt, namely the one we were injecting. Only when L1 asked not to be
notified of interrupts should we inject directly into the running L2 guest
(i.e., take the normal code path).

However, properly doing what is described above requires invasive changes to
the flow of the existing code, which we elected not to make at this stage.
Instead we do something simpler and less efficient: we modify
vmx_interrupt_allowed(), which kvm calls to see if it can inject the interrupt
now, to exit from L2 to L1 before continuing the normal code. The normal kvm
code then notices that L1 is blocking interrupts, and sets the interrupt
window to inject the interrupt later to L1. Shortly after, L1 gets the
interrupt while it is itself running, not as an exit from L2. The cost is an
extra L1 exit (the interrupt window).
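
As a rough illustration of that decision point, here is a stand-alone
user-space model with hypothetical names (not the kvm code; the real logic is
in vmx_interrupt_allowed() and enable_irq_window() in the patch below):

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical, simplified view of the state consulted at the decision
     * point. */
    struct demo_state {
        bool in_l2;               /* is_guest_mode(): currently running L2 */
        bool l1_wants_intr_exits; /* L1 set PIN_BASED_EXT_INTR_MASK in vmcs12 */
        bool nested_run_pending;  /* L2 must run next; may not switch to L1 */
    };

    /* Returns true if the pending external interrupt may be injected now.
     * When L1 wants interrupt exits, we first switch from L2 to L1 (modelled
     * here by clearing in_l2) and then let the normal injection path run
     * with L1 as the current guest. */
    static bool demo_interrupt_allowed(struct demo_state *s)
    {
        if (s->in_l2 && s->l1_wants_intr_exits) {
            if (s->nested_run_pending)
                return false;   /* retry after the forced entry into L2 */
            s->in_l2 = false;   /* models nested_vmx_vmexit(vcpu, true) */
        }
        /* The usual RFLAGS.IF / interruptibility checks would follow here. */
        return true;
    }

    int main(void)
    {
        struct demo_state s = { true, true, false };
        printf("inject now: %d, still in L2: %d\n",
               demo_interrupt_allowed(&s), s.in_l2);
        return 0;
    }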

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:20.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:20.000000000 +0300
@@ -3675,9 +3675,25 @@ out:
 	return ret;
 }
 
+/*
+ * In nested virtualization, check if L1 asked to exit on external interrupts.
+ * For most existing hypervisors, this will always return true.
+ */
+static bool nested_exit_on_intr(struct kvm_vcpu *vcpu)
+{
+	return get_vmcs12(vcpu)->pin_based_vm_exec_control &
+		PIN_BASED_EXT_INTR_MASK;
+}
+
 static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
 	u32 cpu_based_vm_exec_control;
+	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
+		/* We can get here when nested_run_pending caused
+		 * vmx_interrupt_allowed() to return false. In this case, do
+		 * nothing - the interrupt will be injected later.
+		 */
+		return;
 
 	cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
 	cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
@@ -3800,6 +3816,13 @@ static void vmx_set_nmi_mask(struct kvm_
 
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
+	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) {
+		if (to_vmx(vcpu)->nested.nested_run_pending)
+			return 0;
+		nested_vmx_vmexit(vcpu, true);
+		/* fall through to normal code, but now in L1, not L2 */
+	}
+
 	return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
 		!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
 			(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
@@ -5463,6 +5486,14 @@ static int vmx_handle_exit(struct kvm_vc
 	if (vmx->emulation_required && emulate_invalid_guest_state)
 		return handle_invalid_guest_state(vcpu);
 
+	/*
+	 * the KVM_REQ_EVENT optimization bit is only on for one entry, and if
+	 * we did not inject a still-pending event to L1 now because of
+	 * nested_run_pending, we need to re-enable this bit.
+	 */
+	if (vmx->nested.nested_run_pending)
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+
 	if (exit_reason == EXIT_REASON_VMLAUNCH ||
 	    exit_reason == EXIT_REASON_VMRESUME)
 		vmx->nested.nested_run_pending = 1;
@@ -5660,6 +5691,8 @@ static void __vmx_complete_interrupts(st
 
 static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 {
+	if (is_guest_mode(&vmx->vcpu))
+		return;
 	__vmx_complete_interrupts(vmx, vmx->idt_vectoring_info,
 				  VM_EXIT_INSTRUCTION_LEN,
 				  IDT_VECTORING_ERROR_CODE);
@@ -5667,6 +5700,8 @@ static void vmx_complete_interrupts(stru
 
 static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
 {
+	if (is_guest_mode(vcpu))
+		return;
 	__vmx_complete_interrupts(to_vmx(vcpu),
 				  vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
 				  VM_ENTRY_INSTRUCTION_LEN,


* [PATCH 23/30] nVMX: Correct handling of exception injection
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (21 preceding siblings ...)
  2011-05-08  8:26 ` [PATCH 22/30] nVMX: Correct handling of interrupt injection Nadav Har'El
@ 2011-05-08  8:27 ` Nadav Har'El
  2011-05-08  8:27 ` [PATCH 24/30] nVMX: Correct handling of idt vectoring info Nadav Har'El
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:27 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

Similar to the previous patch, but concerning injection of exceptions rather
than external interrupts.
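
For reference, the check that decides whether the #PF must go to L1 is a
single-bit lookup in L1's exception bitmap. A minimal stand-alone sketch (the
struct is illustrative; the vector number is the architectural one):

    #include <stdbool.h>
    #include <stdio.h>

    #define PF_VECTOR 14  /* architectural #PF vector number */

    struct demo_vmcs12 {
        unsigned int exception_bitmap; /* bit n set => L1 wants exits on n */
    };

    /* True if L1 asked to intercept page faults, so the #PF goes to L1. */
    static bool l1_intercepts_pf(const struct demo_vmcs12 *vmcs12)
    {
        return vmcs12->exception_bitmap & (1u << PF_VECTOR);
    }

    int main(void)
    {
        struct demo_vmcs12 v = { .exception_bitmap = 1u << PF_VECTOR };
        printf("deliver #PF to L1: %d\n", l1_intercepts_pf(&v));
        return 0;
    }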

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:21.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:21.000000000 +0300
@@ -1572,6 +1572,25 @@ static void vmx_clear_hlt(struct kvm_vcp
 		vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
 }
 
+/*
+ * KVM wants to inject page-faults which it got to the guest. This function
+ * checks whether in a nested guest, we need to inject them to L1 or L2.
+ * This function assumes it is called with the exit reason in vmcs02 being
+ * a #PF exception (this is the only case in which KVM injects a #PF when L2
+ * is running).
+ */
+static int nested_pf_handled(struct kvm_vcpu *vcpu)
+{
+	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */
+	if (!(vmcs12->exception_bitmap & (1u << PF_VECTOR)))
+		return 0;
+
+	nested_vmx_vmexit(vcpu, false);
+	return 1;
+}
+
 static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 				bool has_error_code, u32 error_code,
 				bool reinject)
@@ -1579,6 +1598,10 @@ static void vmx_queue_exception(struct k
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
+	if (nr == PF_VECTOR && is_guest_mode(vcpu) &&
+		nested_pf_handled(vcpu))
+		return;
+
 	if (has_error_code) {
 		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, error_code);
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
@@ -3750,6 +3773,9 @@ static void vmx_inject_nmi(struct kvm_vc
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	if (is_guest_mode(vcpu))
+		return;
+
 	if (!cpu_has_virtual_nmis()) {
 		/*
 		 * Tracking the NMI-blocked state in software is built upon


* [PATCH 24/30] nVMX: Correct handling of idt vectoring info
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (22 preceding siblings ...)
  2011-05-08  8:27 ` [PATCH 23/30] nVMX: Correct handling of exception injection Nadav Har'El
@ 2011-05-08  8:27 ` Nadav Har'El
  2011-05-09 11:04   ` Avi Kivity
  2011-05-08  8:28 ` [PATCH 25/30] nVMX: Handling of CR0 and CR4 modifying instructions Nadav Har'El
                   ` (6 subsequent siblings)
  30 siblings, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:27 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

This patch adds correct handling of IDT_VECTORING_INFO_FIELD for the nested
case.

When a guest exits while handling an interrupt or exception, we get this
information in IDT_VECTORING_INFO_FIELD in the VMCS. When L2 exits to L1,
there's nothing we need to do, because L1 will see this field in vmcs12, and
handle it itself. However, when L2 exits and L0 handles the exit itself and
plans to return to L2, L0 must inject this event to L2.

In the normal non-nested case, the idt_vectoring_info is discovered after the
exit, and the decision to inject (though not the injection itself) is made at
that point. However, in the nested case the decision whether to return to L2
or to L1 also happens during the injection phase (see the previous patches),
so in the nested case we can only decide what to do about the
idt_vectoring_info right after the injection, i.e., at the beginning of
vmx_vcpu_run, which is the first time we know for sure that we're staying in
L2 (i.e., is_guest_mode() is true).
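
The IDT_VECTORING_INFO_FIELD word that has to be re-injected uses the usual
VMX event-information layout; below is a small stand-alone sketch of pulling
apart the fields this patch cares about (the mask names are ours, the bit
positions follow the architectural layout):

    #include <stdio.h>

    /* Architectural layout of the 32-bit event-information word. */
    #define DEMO_VECTOR_MASK        0x000000ffu  /* bits 7:0  - vector      */
    #define DEMO_TYPE_MASK          0x00000700u  /* bits 10:8 - event type  */
    #define DEMO_DELIVER_CODE_MASK  0x00000800u  /* bit 11    - error code? */
    #define DEMO_VALID_MASK         0x80000000u  /* bit 31    - info valid  */

    int main(void)
    {
        /* Example: a valid hardware exception (type 3), vector 14 (#PF),
         * with an error code to deliver. */
        unsigned int info = DEMO_VALID_MASK | DEMO_DELIVER_CODE_MASK |
                            (3u << 8) | 14u;

        if (info & DEMO_VALID_MASK) {
            unsigned int vector = info & DEMO_VECTOR_MASK;
            unsigned int type = info & DEMO_TYPE_MASK;
            printf("re-inject vector %u, type %u, error code valid: %d\n",
                   vector, type >> 8,
                   (info & DEMO_DELIVER_CODE_MASK) != 0);
        }
        return 0;
    }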

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:21.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:21.000000000 +0300
@@ -352,6 +352,10 @@ struct nested_vmx {
 	u64 vmcs01_tsc_offset;
 	/* L2 must run next, and mustn't decide to exit to L1. */
 	bool nested_run_pending;
+	/* true if last exit was of L2, and had a valid idt_vectoring_info */
+	bool valid_idt_vectoring_info;
+	/* These are saved if valid_idt_vectoring_info */
+	u32 vm_exit_instruction_len, idt_vectoring_error_code;
 	/*
 	 * Guest pages referred to in vmcs02 with host-physical pointers, so
 	 * we must keep them pinned while L2 runs.
@@ -5736,6 +5740,22 @@ static void vmx_cancel_injection(struct 
 	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
 }
 
+static void nested_handle_valid_idt_vectoring_info(struct vcpu_vmx *vmx)
+{
+	int irq  = vmx->idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
+	int type = vmx->idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
+	int errCodeValid = vmx->idt_vectoring_info &
+		VECTORING_INFO_DELIVER_CODE_MASK;
+	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
+		irq | type | INTR_INFO_VALID_MASK | errCodeValid);
+
+	vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
+		vmx->nested.vm_exit_instruction_len);
+	if (errCodeValid)
+		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
+			vmx->nested.idt_vectoring_error_code);
+}
+
 #ifdef CONFIG_X86_64
 #define R "r"
 #define Q "q"
@@ -5748,6 +5768,9 @@ static void __noclone vmx_vcpu_run(struc
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	if (is_guest_mode(vcpu) && vmx->nested.valid_idt_vectoring_info)
+		nested_handle_valid_idt_vectoring_info(vmx);
+
 	/* Record the guest's net vcpu time for enforced NMI injections. */
 	if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
 		vmx->entry_time = ktime_get();
@@ -5879,6 +5902,15 @@ static void __noclone vmx_vcpu_run(struc
 
 	vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
 
+	vmx->nested.valid_idt_vectoring_info = is_guest_mode(vcpu) &&
+		(vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK);
+	if (vmx->nested.valid_idt_vectoring_info) {
+		vmx->nested.vm_exit_instruction_len =
+			vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+		vmx->nested.idt_vectoring_error_code =
+			vmcs_read32(IDT_VECTORING_ERROR_CODE);
+	}
+
 	asm("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
 	vmx->launched = 1;
 


* [PATCH 25/30] nVMX: Handling of CR0 and CR4 modifying instructions
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (23 preceding siblings ...)
  2011-05-08  8:27 ` [PATCH 24/30] nVMX: Correct handling of idt vectoring info Nadav Har'El
@ 2011-05-08  8:28 ` Nadav Har'El
  2011-05-08  8:28 ` [PATCH 26/30] nVMX: Further fixes for lazy FPU loading Nadav Har'El
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:28 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

When L2 tries to modify CR0 or CR4 (with mov or clts), and modifies a bit
which L1 asked to shadow (via CR[04]_GUEST_HOST_MASK), we already do the right
thing: we let L1 handle the trap (see nested_vmx_exit_handled_cr() in a
previous patch).
When L2 modifies bits that L1 doesn't care about, we let it think (via
CR[04]_READ_SHADOW) that it did these modifications, while only changing
(in GUEST_CR[04]) the bits that L0 doesn't shadow.

This is needed for correct handling of CR0.TS for lazy FPU loading: L0 may
want to leave TS on, while pretending to allow the guest to change it.
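
In other words, on such an exit the new GUEST_CR0 keeps the L0-shadowed bits
as L0 set them and takes the rest from L2's value, while CR0_READ_SHADOW
receives L2's value verbatim. A tiny stand-alone sketch of the merge
(illustrative names and values, not the kvm code):

    #include <stdio.h>

    #define DEMO_CR0_TS (1ul << 3)   /* CR0.TS */
    #define DEMO_CR0_MP (1ul << 1)   /* CR0.MP */

    /* 'owned' are the bits the guest may change directly (not shadowed by
     * L0); the remaining bits keep the value L0 chose, whatever the guest
     * wrote. */
    static unsigned long merge_cr0(unsigned long guest_val,
                                   unsigned long current_cr0,
                                   unsigned long owned)
    {
        return (guest_val & owned) | (current_cr0 & ~owned);
    }

    int main(void)
    {
        unsigned long current_cr0 = DEMO_CR0_TS; /* L0 keeps TS set (lazy FPU) */
        unsigned long owned = ~DEMO_CR0_TS;      /* guest owns everything else */
        unsigned long guest_val = DEMO_CR0_MP;   /* guest clears TS, sets MP   */

        printf("effective cr0 = %#lx (TS stays set, MP taken from the guest)\n",
               merge_cr0(guest_val, current_cr0, owned));
        return 0;
    }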

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   58 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 3 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:21.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:21.000000000 +0300
@@ -4094,6 +4094,58 @@ vmx_patch_hypercall(struct kvm_vcpu *vcp
 	hypercall[2] = 0xc1;
 }
 
+/* called to set cr0 as appropriate for a mov-to-cr0 exit. */
+static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
+{
+	if (to_vmx(vcpu)->nested.vmxon &&
+	    ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
+		return 1;
+
+	if (is_guest_mode(vcpu)) {
+		/*
+		 * We get here when L2 changed cr0 in a way that did not change
+		 * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
+		 * but did change L0 shadowed bits. This can currently happen
+		 * with the TS bit: L0 may want to leave TS on (for lazy fpu
+		 * loading) while pretending to allow the guest to change it.
+		 */
+		if (kvm_set_cr0(vcpu, (val & vcpu->arch.cr0_guest_owned_bits) |
+			 (vcpu->arch.cr0 & ~vcpu->arch.cr0_guest_owned_bits)))
+			return 1;
+		vmcs_writel(CR0_READ_SHADOW, val);
+		return 0;
+	} else
+		return kvm_set_cr0(vcpu, val);
+}
+
+static int handle_set_cr4(struct kvm_vcpu *vcpu, unsigned long val)
+{
+	if (is_guest_mode(vcpu)) {
+		if (kvm_set_cr4(vcpu, (val & vcpu->arch.cr4_guest_owned_bits) |
+			 (vcpu->arch.cr4 & ~vcpu->arch.cr4_guest_owned_bits)))
+			return 1;
+		vmcs_writel(CR4_READ_SHADOW, val);
+		return 0;
+	} else
+		return kvm_set_cr4(vcpu, val);
+}
+
+/* called to set cr0 as appropriate for a clts instruction exit. */
+static void handle_clts(struct kvm_vcpu *vcpu)
+{
+	if (is_guest_mode(vcpu)) {
+		/*
+		 * We get here when L2 did CLTS, and L1 didn't shadow CR0.TS
+		 * but we did (!fpu_active). We need to keep GUEST_CR0.TS on,
+		 * just pretend it's off (also in arch.cr0 for fpu_activate).
+		 */
+		vmcs_writel(CR0_READ_SHADOW,
+			vmcs_readl(CR0_READ_SHADOW) & ~X86_CR0_TS);
+		vcpu->arch.cr0 &= ~X86_CR0_TS;
+	} else
+		vmx_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS));
+}
+
 static int handle_cr(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification, val;
@@ -4110,7 +4162,7 @@ static int handle_cr(struct kvm_vcpu *vc
 		trace_kvm_cr_write(cr, val);
 		switch (cr) {
 		case 0:
-			err = kvm_set_cr0(vcpu, val);
+			err = handle_set_cr0(vcpu, val);
 			kvm_complete_insn_gp(vcpu, err);
 			return 1;
 		case 3:
@@ -4118,7 +4170,7 @@ static int handle_cr(struct kvm_vcpu *vc
 			kvm_complete_insn_gp(vcpu, err);
 			return 1;
 		case 4:
-			err = kvm_set_cr4(vcpu, val);
+			err = handle_set_cr4(vcpu, val);
 			kvm_complete_insn_gp(vcpu, err);
 			return 1;
 		case 8: {
@@ -4136,7 +4188,7 @@ static int handle_cr(struct kvm_vcpu *vc
 		};
 		break;
 	case 2: /* clts */
-		vmx_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS));
+		handle_clts(vcpu);
 		trace_kvm_cr_write(0, kvm_read_cr0(vcpu));
 		skip_emulated_instruction(vcpu);
 		vmx_fpu_activate(vcpu);


* [PATCH 26/30] nVMX: Further fixes for lazy FPU loading
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (24 preceding siblings ...)
  2011-05-08  8:28 ` [PATCH 25/30] nVMX: Handling of CR0 and CR4 modifying instructions Nadav Har'El
@ 2011-05-08  8:28 ` Nadav Har'El
  2011-05-08  8:29 ` [PATCH 27/30] nVMX: Additional TSC-offset handling Nadav Har'El
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:28 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

KVM's "Lazy FPU loading" means that sometimes L0 needs to set CR0.TS, even
if a guest didn't set it. Moreover, L0 must also trap CR0.TS changes and
NM exceptions, even if we have a guest hypervisor (L1) who didn't want these
traps. And of course, conversely: If L1 wanted to trap these events, we
must let it, even if L0 is not interested in them.

This patch fixes some existing KVM code (in update_exception_bitmap(),
vmx_fpu_activate(), vmx_fpu_deactivate()) to do the correct merging of L0's
and L1's needs. Note that handle_cr() was already fixed in the above patch,
and that new code introduced in previous patches already handles CR0
correctly (see prepare_vmcs02(), prepare_vmcs12(), and nested_vmx_vmexit()).
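
The merging rule can be summarized as: the hardware exception bitmap is the
union of what L0 and L1 want trapped, and a CR0 bit is guest-owned only if
neither L0 nor L1 shadows it. A minimal stand-alone sketch with illustrative
values (not the kvm code):

    #include <stdio.h>

    #define NM_VECTOR 7   /* #NM: device-not-available, used for lazy FPU */
    #define DB_VECTOR 1   /* #DB: debug exception, as an example L1 trap  */
    #define CR0_TS    (1ul << 3)

    int main(void)
    {
        unsigned int l0_eb = 1u << NM_VECTOR; /* L0 needs #NM for lazy FPU    */
        unsigned int l1_eb = 1u << DB_VECTOR; /* L1 asked (in vmcs12) for #DB */
        unsigned int hw_eb = l0_eb | l1_eb;   /* what vmcs02 must trap        */

        unsigned long l0_owned = CR0_TS;  /* bits L0 would let a guest own    */
        unsigned long l1_mask  = CR0_TS;  /* L1 shadows TS too (vmcs12 mask)  */
        /* guest-owned only if neither L0 nor L1 wants to shadow the bit */
        unsigned long owned = l0_owned & ~l1_mask;

        printf("hardware exception bitmap: %#x, guest-owned cr0 bits: %#lx\n",
               hw_eb, owned);
        return 0;
    }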

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:21.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:21.000000000 +0300
@@ -1170,6 +1170,15 @@ static void update_exception_bitmap(stru
 		eb &= ~(1u << PF_VECTOR); /* bypass_guest_pf = 0 */
 	if (vcpu->fpu_active)
 		eb &= ~(1u << NM_VECTOR);
+
+	/* When we are running a nested L2 guest and L1 specified for it a
+	 * certain exception bitmap, we must trap the same exceptions and pass
+	 * them to L1. When running L2, we will only handle the exceptions
+	 * specified above if L1 did not want them.
+	 */
+	if (is_guest_mode(vcpu))
+		eb |= get_vmcs12(vcpu)->exception_bitmap;
+
 	vmcs_write32(EXCEPTION_BITMAP, eb);
 }
 
@@ -1464,6 +1473,9 @@ static void vmx_fpu_activate(struct kvm_
 	vmcs_writel(GUEST_CR0, cr0);
 	update_exception_bitmap(vcpu);
 	vcpu->arch.cr0_guest_owned_bits = X86_CR0_TS;
+	if (is_guest_mode(vcpu))
+		vcpu->arch.cr0_guest_owned_bits &=
+			~get_vmcs12(vcpu)->cr0_guest_host_mask;
 	vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);
 }
 
@@ -1487,12 +1499,29 @@ static inline unsigned long guest_readab
 
 static void vmx_fpu_deactivate(struct kvm_vcpu *vcpu)
 {
+	/* Note that there is no vcpu->fpu_active = 0 here. The caller must
+	 * set this *before* calling this function.
+	 */
 	vmx_decache_cr0_guest_bits(vcpu);
 	vmcs_set_bits(GUEST_CR0, X86_CR0_TS | X86_CR0_MP);
 	update_exception_bitmap(vcpu);
 	vcpu->arch.cr0_guest_owned_bits = 0;
 	vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);
-	vmcs_writel(CR0_READ_SHADOW, vcpu->arch.cr0);
+	if (is_guest_mode(vcpu)) {
+		/*
+		 * L1's specified read shadow might not contain the TS bit,
+		 * so now that we turned on shadowing of this bit, we need to
+		 * set this bit of the shadow. Like in nested_vmx_run we need
+		 * guest_readable_cr0(vmcs12), but vmcs12->guest_cr0 is not
+		 * yet up-to-date here because we just decached cr0.TS (and
+		 * we'll only update vmcs12->guest_cr0 on nested exit).
+		 */
+		struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+		vmcs12->guest_cr0 = (vmcs12->guest_cr0 & ~X86_CR0_TS) |
+			(vcpu->arch.cr0 & X86_CR0_TS);
+		vmcs_writel(CR0_READ_SHADOW, guest_readable_cr0(vmcs12));
+	} else
+		vmcs_writel(CR0_READ_SHADOW, vcpu->arch.cr0);
 }
 
 static unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)


* [PATCH 27/30] nVMX: Additional TSC-offset handling
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (25 preceding siblings ...)
  2011-05-08  8:28 ` [PATCH 26/30] nVMX: Further fixes for lazy FPU loading Nadav Har'El
@ 2011-05-08  8:29 ` Nadav Har'El
  2011-05-09 17:27   ` Zachary Amsden
  2011-05-08  8:29 ` [PATCH 28/30] nVMX: Add VMX to list of supported cpuid features Nadav Har'El
                   ` (3 subsequent siblings)
  30 siblings, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:29 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

In the unlikely case that L1 does not capture MSR_IA32_TSC, L0 needs to
emulate this MSR write by L2 by modifying vmcs02.tsc_offset. We also need to
set vmcs12.tsc_offset, for this change to survive the next nested entry (see
prepare_vmcs02()).
Additionally, we need to modify vmx_adjust_tsc_offset: the semantics of this
function are that the TSCs of all guests on this vcpu, L1 and possibly
several L2s, need to be adjusted. To do this, we need to adjust vmcs01's
tsc_offset (this offset will also apply to each L2s we enter). We can't set
vmcs01 now, so we have to remember this adjustment and apply it when we
later exit to L1.
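
The offsets compose additively: while L2 runs, the hardware offset is vmcs01's
offset plus vmcs12's offset. A small stand-alone sketch of the bookkeeping
described above (illustrative numbers only, not the kvm code):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int64_t vmcs01_tsc_offset = 1000;   /* offset L0 gives L1       */
        int64_t vmcs12_tsc_offset = 500;    /* extra offset L1 gives L2 */

        /* While L2 runs, the hardware (vmcs02) offset is the sum of the two. */
        int64_t hw_offset = vmcs01_tsc_offset + vmcs12_tsc_offset;
        printf("vmcs02 offset while L2 runs: %lld\n", (long long)hw_offset);

        /* L2 writes MSR_IA32_TSC and L1 chose not to trap it: L0 picks a new
         * hardware offset and records the delta in vmcs12 so the change
         * survives the next L1->L2 entry. */
        hw_offset = 2000;
        vmcs12_tsc_offset = hw_offset - vmcs01_tsc_offset;

        /* An adjustment that must apply to L1 and all its L2s: apply it to
         * the current hardware offset now, and remember it in the saved
         * vmcs01 offset so it also takes effect after the next exit to L1. */
        int64_t adjustment = 100;
        hw_offset += adjustment;
        vmcs01_tsc_offset += adjustment;

        printf("after: hw=%lld vmcs01=%lld vmcs12=%lld\n",
               (long long)hw_offset, (long long)vmcs01_tsc_offset,
               (long long)vmcs12_tsc_offset);
        return 0;
    }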

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:21.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:21.000000000 +0300
@@ -1757,12 +1757,24 @@ static void vmx_set_tsc_khz(struct kvm_v
 static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	vmcs_write64(TSC_OFFSET, offset);
+	if (is_guest_mode(vcpu))
+		/*
+		 * We're here if L1 chose not to trap the TSC MSR. Since
+		 * prepare_vmcs12() does not copy tsc_offset, we need to also
+		 * set the vmcs12 field here.
+		 */
+		get_vmcs12(vcpu)->tsc_offset = offset -
+			to_vmx(vcpu)->nested.vmcs01_tsc_offset;
 }
 
 static void vmx_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment)
 {
 	u64 offset = vmcs_read64(TSC_OFFSET);
 	vmcs_write64(TSC_OFFSET, offset + adjustment);
+	if (is_guest_mode(vcpu)) {
+		/* Even when running L2, the adjustment needs to apply to L1 */
+		to_vmx(vcpu)->nested.vmcs01_tsc_offset += adjustment;
+	}
 }
 
 static u64 vmx_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)


* [PATCH 28/30] nVMX: Add VMX to list of supported cpuid features
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (26 preceding siblings ...)
  2011-05-08  8:29 ` [PATCH 27/30] nVMX: Additional TSC-offset handling Nadav Har'El
@ 2011-05-08  8:29 ` Nadav Har'El
  2011-05-08  8:30 ` [PATCH 29/30] nVMX: Miscellaneous small corrections Nadav Har'El
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:29 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

If the "nested" module option is enabled, add the "VMX" CPU feature to the
list of CPU features KVM advertises with the KVM_GET_SUPPORTED_CPUID ioctl.

Qemu uses this ioctl, and intersects KVM's list with its own list of desired
cpu features (depending on the -cpu option given to qemu) to determine the
final list of features presented to the guest.
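
For completeness, the feature in question is CPUID leaf 1, ECX bit 5 (VMX). A
stand-alone user-space sketch of checking it from inside a guest, using the
compiler-provided <cpuid.h> (the macro name below is our own):

    #include <stdio.h>
    #include <cpuid.h>

    #define DEMO_CPUID1_ECX_VMX (1u << 5)   /* CPUID.01H:ECX.VMX[bit 5] */

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            fprintf(stderr, "CPUID leaf 1 not supported\n");
            return 1;
        }
        printf("VMX advertised to this (guest) CPU: %s\n",
               (ecx & DEMO_CPUID1_ECX_VMX) ? "yes" : "no");
        return 0;
    }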

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |    2 ++
 1 file changed, 2 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:21.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:21.000000000 +0300
@@ -6244,6 +6244,8 @@ static void vmx_cpuid_update(struct kvm_
 
 static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
 {
+	if (func == 1 && nested)
+		entry->ecx |= bit(X86_FEATURE_VMX);
 }
 
 /*


* [PATCH 29/30] nVMX: Miscellaneous small corrections
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (27 preceding siblings ...)
  2011-05-08  8:29 ` [PATCH 28/30] nVMX: Add VMX to list of supported cpuid features Nadav Har'El
@ 2011-05-08  8:30 ` Nadav Har'El
  2011-05-08  8:30 ` [PATCH 30/30] nVMX: Documentation Nadav Har'El
  2011-05-09 11:18 ` [PATCH 0/30] nVMX: Nested VMX, v9 Avi Kivity
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:30 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

Small corrections of KVM (spelling, etc.) not directly related to nested VMX.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:21.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:21.000000000 +0300
@@ -947,7 +947,7 @@ static void vmcs_load(struct vmcs *vmcs)
 			: "=qm"(error) : "a"(&phys_addr), "m"(phys_addr)
 			: "cc", "memory");
 	if (error)
-		printk(KERN_ERR "kvm: vmptrld %p/%llx fail\n",
+		printk(KERN_ERR "kvm: vmptrld %p/%llx failed\n",
 		       vmcs, phys_addr);
 }
 


* [PATCH 30/30] nVMX: Documentation
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (28 preceding siblings ...)
  2011-05-08  8:30 ` [PATCH 29/30] nVMX: Miscellaneous small corrections Nadav Har'El
@ 2011-05-08  8:30 ` Nadav Har'El
  2011-05-09 11:18 ` [PATCH 0/30] nVMX: Nested VMX, v9 Avi Kivity
  30 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-08  8:30 UTC (permalink / raw)
  To: kvm; +Cc: gleb, avi

This patch includes a brief introduction to the nested vmx feature in the
Documentation/kvm directory. The document also includes a copy of the
vmcs12 structure, as requested by Avi Kivity.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 Documentation/kvm/nested-vmx.txt |  243 +++++++++++++++++++++++++++++
 1 file changed, 243 insertions(+)

--- .before/Documentation/kvm/nested-vmx.txt	2011-05-08 10:43:22.000000000 +0300
+++ .after/Documentation/kvm/nested-vmx.txt	2011-05-08 10:43:22.000000000 +0300
@@ -0,0 +1,243 @@
+Nested VMX
+==========
+
+Overview
+---------
+
+On Intel processors, KVM uses Intel's VMX (Virtual-Machine eXtensions)
+to easily and efficiently run guest operating systems. Normally, these guests
+*cannot* themselves be hypervisors running their own guests, because in VMX,
+guests cannot use VMX instructions.
+
+The "Nested VMX" feature adds this missing capability - of running guest
+hypervisors (which use VMX) with their own nested guests. It does so by
+allowing a guest to use VMX instructions, and correctly and efficiently
+emulating them using the single level of VMX available in the hardware.
+
+We describe in much greater detail the theory behind the nested VMX feature,
+its implementation and its performance characteristics, in the OSDI 2010 paper
+"The Turtles Project: Design and Implementation of Nested Virtualization",
+available at:
+
+	http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf
+
+
+Terminology
+-----------
+
+Single-level virtualization has two levels - the host (KVM) and the guests.
+In nested virtualization, we have three levels: The host (KVM), which we call
+L0, the guest hypervisor, which we call L1, and its nested guest, which we
+call L2.
+
+
+Known limitations
+-----------------
+
+The current code supports running Linux guests under KVM guests.
+Only 64-bit guest hypervisors are supported.
+
+Additional patches for running Windows under guest KVM, and Linux under
+guest VMware Server, as well as support for nested EPT, are currently being
+tested in the lab, and will be sent as follow-on patchsets.
+
+
+Running nested VMX
+------------------
+
+The nested VMX feature is disabled by default. It can be enabled by giving
+the "nested=1" option to the kvm-intel module.
+
+No modifications are required to user space (qemu). However, qemu's default
+emulated CPU type (qemu64) does not list the "VMX" CPU feature, so it must be
+explicitly enabled by giving qemu one of the following options:
+
+     -cpu host              (emulated CPU has all features of the real CPU)
+
+     -cpu qemu64,+vmx       (add just the vmx feature to a named CPU type)
+
+
+ABIs
+----
+
+Nested VMX aims to present a standard and (eventually) fully-functional VMX
+implementation for a guest hypervisor to use. As such, the official
+specification of the ABI that it provides is Intel's VMX specification,
+namely volume 3B of their "Intel 64 and IA-32 Architectures Software
+Developer's Manual". Not all of VMX's features are currently fully supported,
+but the goal is to eventually support them all, starting with the VMX features
+which are used in practice by popular hypervisors (KVM and others).
+
+As a VMX implementation, nested VMX presents a VMCS structure to L1.
+As mandated by the spec, other than the two fields revision_id and abort,
+this structure is *opaque* to its user, who is not supposed to know or care
+about its internal structure. Rather, the structure is accessed through the
+VMREAD and VMWRITE instructions.
+Still, for debugging purposes, KVM developers might be interested in the
+internals of this structure; this is struct vmcs12 from arch/x86/kvm/vmx.c.
+For convenience, we repeat its content here. If the internals of this
+structure change, live migration across KVM versions can break.
+VMCS12_REVISION (from vmx.c) should be changed if the layout of struct vmcs12
+is ever changed.
+
+	typedef u64 natural_width;
+	struct __packed vmcs12 {
+		/* According to the Intel spec, a VMCS region must start with
+		 * these two user-visible fields */
+		u32 revision_id;
+		u32 abort;
+
+		u32 launch_state; /* set to 0 by VMCLEAR, to 1 by VMLAUNCH */
+		u32 padding[7]; /* room for future expansion */
+
+		u64 io_bitmap_a;
+		u64 io_bitmap_b;
+		u64 msr_bitmap;
+		u64 vm_exit_msr_store_addr;
+		u64 vm_exit_msr_load_addr;
+		u64 vm_entry_msr_load_addr;
+		u64 tsc_offset;
+		u64 virtual_apic_page_addr;
+		u64 apic_access_addr;
+		u64 ept_pointer;
+		u64 guest_physical_address;
+		u64 vmcs_link_pointer;
+		u64 guest_ia32_debugctl;
+		u64 guest_ia32_pat;
+		u64 guest_ia32_efer;
+		u64 guest_pdptr0;
+		u64 guest_pdptr1;
+		u64 guest_pdptr2;
+		u64 guest_pdptr3;
+		u64 host_ia32_pat;
+		u64 host_ia32_efer;
+		u64 padding64[8]; /* room for future expansion */
+		natural_width cr0_guest_host_mask;
+		natural_width cr4_guest_host_mask;
+		natural_width cr0_read_shadow;
+		natural_width cr4_read_shadow;
+		natural_width cr3_target_value0;
+		natural_width cr3_target_value1;
+		natural_width cr3_target_value2;
+		natural_width cr3_target_value3;
+		natural_width exit_qualification;
+		natural_width guest_linear_address;
+		natural_width guest_cr0;
+		natural_width guest_cr3;
+		natural_width guest_cr4;
+		natural_width guest_es_base;
+		natural_width guest_cs_base;
+		natural_width guest_ss_base;
+		natural_width guest_ds_base;
+		natural_width guest_fs_base;
+		natural_width guest_gs_base;
+		natural_width guest_ldtr_base;
+		natural_width guest_tr_base;
+		natural_width guest_gdtr_base;
+		natural_width guest_idtr_base;
+		natural_width guest_dr7;
+		natural_width guest_rsp;
+		natural_width guest_rip;
+		natural_width guest_rflags;
+		natural_width guest_pending_dbg_exceptions;
+		natural_width guest_sysenter_esp;
+		natural_width guest_sysenter_eip;
+		natural_width host_cr0;
+		natural_width host_cr3;
+		natural_width host_cr4;
+		natural_width host_fs_base;
+		natural_width host_gs_base;
+		natural_width host_tr_base;
+		natural_width host_gdtr_base;
+		natural_width host_idtr_base;
+		natural_width host_ia32_sysenter_esp;
+		natural_width host_ia32_sysenter_eip;
+		natural_width host_rsp;
+		natural_width host_rip;
+		natural_width paddingl[8]; /* room for future expansion */
+		u32 pin_based_vm_exec_control;
+		u32 cpu_based_vm_exec_control;
+		u32 exception_bitmap;
+		u32 page_fault_error_code_mask;
+		u32 page_fault_error_code_match;
+		u32 cr3_target_count;
+		u32 vm_exit_controls;
+		u32 vm_exit_msr_store_count;
+		u32 vm_exit_msr_load_count;
+		u32 vm_entry_controls;
+		u32 vm_entry_msr_load_count;
+		u32 vm_entry_intr_info_field;
+		u32 vm_entry_exception_error_code;
+		u32 vm_entry_instruction_len;
+		u32 tpr_threshold;
+		u32 secondary_vm_exec_control;
+		u32 vm_instruction_error;
+		u32 vm_exit_reason;
+		u32 vm_exit_intr_info;
+		u32 vm_exit_intr_error_code;
+		u32 idt_vectoring_info_field;
+		u32 idt_vectoring_error_code;
+		u32 vm_exit_instruction_len;
+		u32 vmx_instruction_info;
+		u32 guest_es_limit;
+		u32 guest_cs_limit;
+		u32 guest_ss_limit;
+		u32 guest_ds_limit;
+		u32 guest_fs_limit;
+		u32 guest_gs_limit;
+		u32 guest_ldtr_limit;
+		u32 guest_tr_limit;
+		u32 guest_gdtr_limit;
+		u32 guest_idtr_limit;
+		u32 guest_es_ar_bytes;
+		u32 guest_cs_ar_bytes;
+		u32 guest_ss_ar_bytes;
+		u32 guest_ds_ar_bytes;
+		u32 guest_fs_ar_bytes;
+		u32 guest_gs_ar_bytes;
+		u32 guest_ldtr_ar_bytes;
+		u32 guest_tr_ar_bytes;
+		u32 guest_interruptibility_info;
+		u32 guest_activity_state;
+		u32 guest_sysenter_cs;
+		u32 host_ia32_sysenter_cs;
+		u32 padding32[8]; /* room for future expansion */
+		u16 virtual_processor_id;
+		u16 guest_es_selector;
+		u16 guest_cs_selector;
+		u16 guest_ss_selector;
+		u16 guest_ds_selector;
+		u16 guest_fs_selector;
+		u16 guest_gs_selector;
+		u16 guest_ldtr_selector;
+		u16 guest_tr_selector;
+		u16 host_es_selector;
+		u16 host_cs_selector;
+		u16 host_ss_selector;
+		u16 host_ds_selector;
+		u16 host_fs_selector;
+		u16 host_gs_selector;
+		u16 host_tr_selector;
+	};
+
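+As an aside (an illustrative sketch, not code from this patch set), the
+layout/revision coupling described above could also be enforced at compile
+time, for example by adding checks along these lines to the module init code:
+
+	/* hypothetical layout guards; only the first few offsets are shown */
+	BUILD_BUG_ON(offsetof(struct vmcs12, revision_id)  != 0);
+	BUILD_BUG_ON(offsetof(struct vmcs12, abort)        != 4);
+	BUILD_BUG_ON(offsetof(struct vmcs12, launch_state) != 8);
+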
+
+Authors
+-------
+
+These patches were written by:
+     Abel Gordon, abelg <at> il.ibm.com
+     Nadav Har'El, nyh <at> il.ibm.com
+     Orit Wasserman, oritw <at> il.ibm.com
+     Ben-Ami Yassour, benami <at> il.ibm.com
+     Muli Ben-Yehuda, muli <at> il.ibm.com
+
+With contributions by:
+     Anthony Liguori, aliguori <at> us.ibm.com
+     Mike Day, mdday <at> us.ibm.com
+     Michael Factor, factor <at> il.ibm.com
+     Zvi Dubitzky, dubi <at> il.ibm.com
+
+And valuable reviews by:
+     Avi Kivity, avi <at> redhat.com
+     Gleb Natapov, gleb <at> redhat.com
+     and others.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 06/30] nVMX: Decoding memory operands of VMX instructions
  2011-05-08  8:18 ` [PATCH 06/30] nVMX: Decoding memory operands of VMX instructions Nadav Har'El
@ 2011-05-09  9:47   ` Avi Kivity
  0 siblings, 0 replies; 83+ messages in thread
From: Avi Kivity @ 2011-05-09  9:47 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: kvm, gleb

On 05/08/2011 11:18 AM, Nadav Har'El wrote:
> This patch includes a utility function for decoding pointer operands of VMX
> instructions issued by L1 (a guest hypervisor)
>
> +	/*
> +	 * TODO: throw #GP (and return 1) in various cases that the VM*
> +	 * instructions require it - e.g., offset beyond segment limit,
> +	 * unusable or unreadable/unwritable segment, non-canonical 64-bit
> +	 * address, and so on. Currently these are not checked.
> +	 */
> +	return 0;
> +}
> +

Note: emulate.c now contains a function (linearize()) which does these 
calculations.  We need to generalize it and expose it so nvmx can make 
use of it.
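
For reference, the kind of calculation involved looks roughly like this (an
illustrative sketch only; the bit layout follows the SDM's VM-exit
instruction-information format, and the function name and segment handling
are simplified):

	/* sketch: compute the operand address of a VMX instruction from the
	 * VM-exit instruction-information field and the displacement that is
	 * reported in the exit qualification */
	static gva_t vmx_instr_mem_operand(struct kvm_vcpu *vcpu, u32 instr_info,
					   unsigned long exit_qual)
	{
		int  scaling   = instr_info & 3;		/* bits 1:0   */
		int  index_reg = (instr_info >> 18) & 0xf;	/* bits 21:18 */
		bool index_ok  = !(instr_info & (1u << 22));
		int  base_reg  = (instr_info >> 23) & 0xf;	/* bits 26:23 */
		bool base_ok   = !(instr_info & (1u << 27));
		gva_t addr     = exit_qual;			/* displacement */

		if (base_ok)
			addr += kvm_register_read(vcpu, base_reg);
		if (index_ok)
			addr += kvm_register_read(vcpu, index_reg) << scaling;
		/* the segment register is in bits 17:15; its base must be added
		 * here, and the limit/usability/canonicality checks from the
		 * TODO above applied before using the address */
		return addr;
	}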

There is no real security concern since these instructions are only 
allowed from cpl 0 anyway.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 15/30] nVMX: Move host-state field setup to a function
  2011-05-08  8:22 ` [PATCH 15/30] nVMX: Move host-state field setup to a function Nadav Har'El
@ 2011-05-09  9:56   ` Avi Kivity
  2011-05-09 10:40     ` Nadav Har'El
  0 siblings, 1 reply; 83+ messages in thread
From: Avi Kivity @ 2011-05-09  9:56 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: kvm, gleb

On 05/08/2011 11:22 AM, Nadav Har'El wrote:
> Move the setting of constant host-state fields (fields that do not change
> throughout the life of the guest) from vmx_vcpu_setup to a new common function
> vmx_set_constant_host_state(). This function will also be used to set the
> host state when running L2 guests.

>    */
>   static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>   {
> -	u32 host_sysenter_cs, msr_low, msr_high;
> -	u32 junk;
> +	u32 msr_low, msr_high;


Unused?

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 17/30] nVMX: Prepare vmcs02 from vmcs01 and vmcs12
  2011-05-08  8:23 ` [PATCH 17/30] nVMX: Prepare vmcs02 from vmcs01 and vmcs12 Nadav Har'El
@ 2011-05-09 10:12   ` Avi Kivity
  2011-05-09 10:27     ` Nadav Har'El
  0 siblings, 1 reply; 83+ messages in thread
From: Avi Kivity @ 2011-05-09 10:12 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: kvm, gleb

On 05/08/2011 11:23 AM, Nadav Har'El wrote:
> This patch contains code to prepare the VMCS which can be used to actually
> run the L2 guest, vmcs02. prepare_vmcs02 appropriately merges the information
> in vmcs12 (the vmcs that L1 built for L2) and in vmcs01 (our desires for our
> own guests).
> +/*
> + * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
> + * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
> + * with L0's requirements for its guest (a.k.a. vmcs01), so we can run the L2
> + * guest in a way that will both be appropriate to L1's requests, and our
> + * needs. In addition to modifying the active vmcs (which is vmcs02), this
> + * function also has additional necessary side-effects, like setting various
> + * vcpu->arch fields.
> + */
> +static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> +{

<snip>

> +	vmcs_write64(VMCS_LINK_POINTER, vmcs12->vmcs_link_pointer);

I think this is wrong - anything having to do with vmcs linking will
need to be emulated; we can't let the cpu see the real value (and even
if we don't emulate, we have to translate addresses like you do for the
apic access page).

> +	vmcs_write64(TSC_OFFSET,
> +		vmx->nested.vmcs01_tsc_offset + vmcs12->tsc_offset);

This is probably wrong (everything with time is probably wrong), but we 
can deal with it (much) later.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 17/30] nVMX: Prepare vmcs02 from vmcs01 and vmcs12
  2011-05-09 10:12   ` Avi Kivity
@ 2011-05-09 10:27     ` Nadav Har'El
  2011-05-09 10:45       ` Avi Kivity
  0 siblings, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-09 10:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, gleb

Hi, and thanks again for the reviews.

On Mon, May 09, 2011, Avi Kivity wrote about "Re: [PATCH 17/30] nVMX: Prepare vmcs02 from vmcs01 and vmcs12":
> >+	vmcs_write64(TSC_OFFSET,
> >+		vmx->nested.vmcs01_tsc_offset + vmcs12->tsc_offset);
> 
> This is probably wrong (everything with time is probably wrong), but we 
> can deal with it (much) later.

I thought this was right :-) Why do you believe it to be wrong?

L1 wants to add vmcs12->tsc_offset to its own TSC to generate L2's TSC.
But L1's TSC is itself with vmx->nested.vmcs01_tsc_offset from L0's TSC.
So their sum, vmx->nested.vmcs01_tsc_offset + vmcs12->tsc_offset, is the
offset of L2's TSC from L0's TSC. Am I missing something?
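
Written out, with the two offsets named above (using TSC_Lx to denote the TSC
value as observed by Lx):

    TSC_L1 = TSC_L0 + vmcs01_tsc_offset
    TSC_L2 = TSC_L1 + vmcs12->tsc_offset
           = TSC_L0 + (vmcs01_tsc_offset + vmcs12->tsc_offset)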

Thanks,
Nadav.

-- 
Nadav Har'El                        |        Monday, May  9 2011, 5 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |I couldn't think of an interesting
http://nadav.harel.org.il           |signature to put here... Maybe next time.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 15/30] nVMX: Move host-state field setup to a function
  2011-05-09  9:56   ` Avi Kivity
@ 2011-05-09 10:40     ` Nadav Har'El
  0 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-09 10:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, gleb

On Mon, May 09, 2011, Avi Kivity wrote about "Re: [PATCH 15/30] nVMX: Move host-state field setup to a function":
> >  static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
> >  {
> >-	u32 host_sysenter_cs, msr_low, msr_high;
> >-	u32 junk;
> >+	u32 msr_low, msr_high;
> 
> 
> Unused?

Well, it actually is used, because I left the GUEST_IA32_PAT setting in
vmx_vcpu_setup. I guess I could have moved these two variables inside
the if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) block, but I
didn't. Similarly, the host_pat variable can also move inside the if().

I'll make these changes.

-- 
Nadav Har'El                        |        Monday, May  9 2011, 5 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Shortening Year-2000 to Y2K was just the
http://nadav.harel.org.il           |kind of thinking that caused that problem!

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 20/30] nVMX: Exiting from L2 to L1
  2011-05-08  8:25 ` [PATCH 20/30] nVMX: Exiting from L2 to L1 Nadav Har'El
@ 2011-05-09 10:45   ` Avi Kivity
  0 siblings, 0 replies; 83+ messages in thread
From: Avi Kivity @ 2011-05-09 10:45 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: kvm, gleb

On 05/08/2011 11:25 AM, Nadav Har'El wrote:
> This patch implements nested_vmx_vmexit(), called when the nested L2 guest
> exits and we want to run its L1 parent and let it handle this exit.
>
> Note that this will not necessarily be called on every L2 exit. L0 may decide
> to handle a particular exit on its own, without L1's involvement; In that
> case, L0 will handle the exit, and resume running L2, without running L1 and
> without calling nested_vmx_vmexit(). The logic for deciding whether to handle
> a particular exit in L1 or in L0, i.e., whether to call nested_vmx_vmexit(),
> will appear in the next patch.
>
>
> /*
> + * prepare_vmcs12 is part of what we need to do when the nested L2 guest exits
> + * and we want to prepare to run its L1 parent. L1 keeps a vmcs for L2 (vmcs12),
> + * and this function updates it to reflect the changes to the guest state while
> + * L2 was running (and perhaps made some exits which were handled directly by L0
> + * without going back to L1), and to reflect the exit reason.
> + * Note that we do not have to copy here all VMCS fields, just those that
> + * could have changed by the L2 guest or the exit - i.e., the guest-state and
> + * exit-information fields only. Other fields are modified by L1 with VMWRITE,
> + * which already writes to vmcs12 directly.
> + */
> +void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> +{
<snip>

> +	vmcs12->vmcs_link_pointer = vmcs_read64(VMCS_LINK_POINTER);

Again, this should be emulated, not assigned to the guest.
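
One illustrative way to do that (assuming shadow-VMCS support is not exposed
to L1, so L1's value only ever lives in vmcs12 and is never handed to the
CPU):

	/* sketch: the all-ones value is architecturally defined as
	 * "no shadow VMCS", so the hardware never sees L1's link pointer */
	vmcs_write64(VMCS_LINK_POINTER, -1ull);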

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 17/30] nVMX: Prepare vmcs02 from vmcs01 and vmcs12
  2011-05-09 10:27     ` Nadav Har'El
@ 2011-05-09 10:45       ` Avi Kivity
  0 siblings, 0 replies; 83+ messages in thread
From: Avi Kivity @ 2011-05-09 10:45 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: kvm, gleb

On 05/09/2011 01:27 PM, Nadav Har'El wrote:
> Hi, and thanks again for the reviews.
>
> On Mon, May 09, 2011, Avi Kivity wrote about "Re: [PATCH 17/30] nVMX: Prepare vmcs02 from vmcs01 and vmcs12":
> >  >+	vmcs_write64(TSC_OFFSET,
> >  >+		vmx->nested.vmcs01_tsc_offset + vmcs12->tsc_offset);
> >
> >  This is probably wrong (everything with time is probably wrong), but we
> >  can deal with it (much) later.
>
> I thought this was right :-) Why do you believe it to be wrong?

Just out of principle, everything to do with time is wrong.

> L1 wants to add vmcs12->tsc_offset to its own TSC to generate L2's TSC.
> But L1's TSC is itself with vmx->nested.vmcs01_tsc_offset from L0's TSC.
> So their sum, vmx->nested.vmcs01_tsc_offset + vmcs12->tsc_offset, is the
> offset of L2's TSC from L0's TSC. Am I missing something?

Only Zach knows.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 22/30] nVMX: Correct handling of interrupt injection
  2011-05-08  8:26 ` [PATCH 22/30] nVMX: Correct handling of interrupt injection Nadav Har'El
@ 2011-05-09 10:57   ` Avi Kivity
  0 siblings, 0 replies; 83+ messages in thread
From: Avi Kivity @ 2011-05-09 10:57 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: kvm, gleb

On 05/08/2011 11:26 AM, Nadav Har'El wrote:
> When KVM wants to inject an interrupt, the guest should think a real interrupt
> has happened. Normally (in the non-nested case) this means checking that the
> guest doesn't block interrupts (and if it does, inject when it doesn't - using
> the "interrupt window" VMX mechanism), and setting up the appropriate VMCS
> fields for the guest to receive the interrupt.
>
> However, when we are running a nested guest (L2) and its hypervisor (L1)
> requested exits on interrupts (as most hypervisors do), the most efficient
> thing to do is to exit L2, telling L1 that the exit was caused by an
> interrupt, the one we were injecting; only when L1 asked not to be notified
> of interrupts should we inject directly to the running L2 guest (i.e.,
> the normal code path).
>
> However, properly doing what is described above requires invasive changes to
> the flow of the existing code, which we elected not to do in this stage.
> Instead we do something more simplistic and less efficient: we modify
> vmx_interrupt_allowed(), which kvm calls to see if it can inject the interrupt
> now, to exit from L2 to L1 before continuing the normal code. The normal kvm
> code then notices that L1 is blocking interrupts, and sets the interrupt
> window to inject the interrupt later to L1. Shortly after, L1 gets the
> interrupt while it is itself running, not as an exit from L2. The cost is an
> extra L1 exit (the interrupt window).
>
> Signed-off-by: Nadav Har'El<nyh@il.ibm.com>
> ---
>   arch/x86/kvm/vmx.c |   35 +++++++++++++++++++++++++++++++++++
>   1 file changed, 35 insertions(+)
>
> --- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:20.000000000 +0300
> +++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:20.000000000 +0300
> @@ -3675,9 +3675,25 @@ out:
>   	return ret;
>   }
>
> +/*
> + * In nested virtualization, check if L1 asked to exit on external interrupts.
> + * For most existing hypervisors, this will always return true.
> + */
> +static bool nested_exit_on_intr(struct kvm_vcpu *vcpu)
> +{
> +	return get_vmcs12(vcpu)->pin_based_vm_exec_control &
> +		PIN_BASED_EXT_INTR_MASK;
> +}
> +
>   static void enable_irq_window(struct kvm_vcpu *vcpu)
>   {
>   	u32 cpu_based_vm_exec_control;
> +	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
> +		/* We can get here when nested_run_pending caused
> +		 * vmx_interrupt_allowed() to return false. In this case, do
> +		 * nothing - the interrupt will be injected later.
> +		 */
> +		return;

Why not do (or schedule) the nested vmexit here?  It's more natural than 
in vmx_interrupt_allowed() which from its name you'd expect to only read 
stuff.

I guess it can live for now if there's some unexpected complexity there.

>
>   	cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
>   	cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
> @@ -3800,6 +3816,13 @@ static void vmx_set_nmi_mask(struct kvm_
>
>   static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
>   {
> +	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) {
> +		if (to_vmx(vcpu)->nested.nested_run_pending)
> +			return 0;
> +		nested_vmx_vmexit(vcpu, true);
> +		/* fall through to normal code, but now in L1, not L2 */
> +	}
> +
>   	return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
>   		!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
>   			(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
> @@ -5463,6 +5486,14 @@ static int vmx_handle_exit(struct kvm_vc
>   	if (vmx->emulation_required && emulate_invalid_guest_state)
>   		return handle_invalid_guest_state(vcpu);
>
> +	/*
> +	 * the KVM_REQ_EVENT optimization bit is only on for one entry, and if
> +	 * we did not inject a still-pending event to L1 now because of
> +	 * nested_run_pending, we need to re-enable this bit.
> +	 */
> +	if (vmx->nested.nested_run_pending)
> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +
>   	if (exit_reason == EXIT_REASON_VMLAUNCH ||
>   	    exit_reason == EXIT_REASON_VMRESUME)
>   		vmx->nested.nested_run_pending = 1;
> @@ -5660,6 +5691,8 @@ static void __vmx_complete_interrupts(st
>
>   static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
>   {
> +	if (is_guest_mode(&vmx->vcpu))
> +		return;
>   	__vmx_complete_interrupts(vmx, vmx->idt_vectoring_info,
>   				  VM_EXIT_INSTRUCTION_LEN,
>   				  IDT_VECTORING_ERROR_CODE);
> @@ -5667,6 +5700,8 @@ static void vmx_complete_interrupts(stru
>
>   static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
>   {
> +	if (is_guest_mode(vcpu))
> +		return;

Hmm.  What if L0 injected something into L2?

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 24/30] nVMX: Correct handling of idt vectoring info
  2011-05-08  8:27 ` [PATCH 24/30] nVMX: Correct handling of idt vectoring info Nadav Har'El
@ 2011-05-09 11:04   ` Avi Kivity
  0 siblings, 0 replies; 83+ messages in thread
From: Avi Kivity @ 2011-05-09 11:04 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: kvm, gleb

On 05/08/2011 11:27 AM, Nadav Har'El wrote:
> This patch adds correct handling of IDT_VECTORING_INFO_FIELD for the nested
> case.
>
> When a guest exits while handling an interrupt or exception, we get this
> information in IDT_VECTORING_INFO_FIELD in the VMCS. When L2 exits to L1,
> there's nothing we need to do, because L1 will see this field in vmcs12, and
> handle it itself. However, when L2 exits and L0 handles the exit itself and
> plans to return to L2, L0 must inject this event to L2.
>
> In the normal non-nested case, the idt_vectoring_info case is discovered after
> the exit, and the decision to inject (though not the injection itself) is made
> at that point. However, in the nested case a decision of whether to return
> to L2 or L1 also happens during the injection phase (see the previous
> patches), so in the nested case we can only decide what to do about the
> idt_vectoring_info right after the injection, i.e., in the beginning of
> vmx_vcpu_run, which is the first time we know for sure if we're staying in
> L2 (i.e., nested_mode is true).
>
> +static void nested_handle_valid_idt_vectoring_info(struct vcpu_vmx *vmx)
> +{
> +	int irq  = vmx->idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
> +	int type = vmx->idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
> +	int errCodeValid = vmx->idt_vectoring_info &
> +		VECTORING_INFO_DELIVER_CODE_MASK;

Innovative coding style.

> +	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
> +		irq | type | INTR_INFO_VALID_MASK | errCodeValid);
> +

Why not do a 1:1 copy?

> +	vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
> +		vmx->nested.vm_exit_instruction_len);
> +	if (errCodeValid)
> +		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
> +			vmx->nested.idt_vectoring_error_code);
> +}
> +
>   #ifdef CONFIG_X86_64
>   #define R "r"
>   #define Q "q"
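
For reference, the 1:1 copy suggested above could look like this (an
illustrative sketch; it relies on the IDT-vectoring information field having
the same layout as the VM-entry interruption-information field):

	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, vmx->idt_vectoring_info);
	vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
		     vmx->nested.vm_exit_instruction_len);
	if (vmx->idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK)
		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
			     vmx->nested.idt_vectoring_error_code);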

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
                   ` (29 preceding siblings ...)
  2011-05-08  8:30 ` [PATCH 30/30] nVMX: Documentation Nadav Har'El
@ 2011-05-09 11:18 ` Avi Kivity
  2011-05-09 11:37   ` Nadav Har'El
  2011-05-11  8:20   ` Gleb Natapov
  30 siblings, 2 replies; 83+ messages in thread
From: Avi Kivity @ 2011-05-09 11:18 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: kvm, gleb

On 05/08/2011 11:15 AM, Nadav Har'El wrote:
> Hi,
>
> This is the ninth iteration of the nested VMX patch set. This iteration
> addresses all of the comments and requests that were raised by reviewers in
> the previous rounds, with only a few exception listed below.
>
> Some of the issues which were solved in this version include:
>
>   * Overhauled the hardware VMCS (vmcs02) allocation. Previously we had up to
>     256 vmcs02s, one for each L2. Now we only have one, which is reused.
>     We also have a compile-time option VMCS02_POOL_SIZE to keep a bigger pool
>     of vmcs02s. This option will be useful in the future if vmcs02 won't be
>     filled from scratch on each entry from L1 to L2 (currently, it is).
>
>   * The vmcs01 structure, containing a copy of all fields from L1's VMCS, was
>     unnecessary, as all the necessary values are either known to KVM or appear
>     in vmcs12. This structure is now gone for good.
>
>   * There is no longer a "vmcs_fields" sub-structure that everyone disliked.
>     All the VMCS fields appear directly in the vmcs12 structure, which makes
>     the code simpler and more readable.
>
>   * Make sure that the vmcs12 fields have fixed sizes and location, and add
>     some extra padding, to support live migration and improve future-proofing.
>
>   * For some fields, nested exit used to fail to return the host-state as set
>     by L1. Fixed that.
>
>   * nested_vmx_exit_handled (deciding if to let L1 handle an exit, or handle it
>     in L0 and return to L2) is now more correct, and handles more exit reasons.
>
>   * Complete overhaul of the cr0, exception bitmap, cr3 and cr4 handling code.
>     The code is now shorter (uses existing functions like kvm_set_cr3, etc.),
>     more readable, and more uniform (no pieces of code for enable_ept and not,
>     less special code for cr0.TS, and none of that ugly cr0.PG monkey-business).
>
>   * Use kvm_register_write(), kvm_rip_read(), etc. Got rid of new and now
>     unneeded function sync_cached_regs_to_vcms().
>
>   * Fix return value of the VMX msrs to be more correct, and more constant
>     (not to needlessly vary on different hosts).
>
>   * Added some more missing verifications to vmcs12's fields (cleanly failing
>     the nested entry if these verifications fail).
>
>   * Expose the MSR-bitmap feature to L1. Every MSR access still exits to L0,
>     but slow exits to L1 are avoided when L1's MSR bitmap doesn't want it.
>
>   * Removed or rate limited printouts which could be exploited by guests.
>
>   * Fix VM_ENTRY_LOAD_IA32_PAT feature handling.
>
>   * Fixed potential bug and verified that nested vmx now works with both
>     CONFIG_PREEMPT and CONFIG_SMP enabled.
>
>   * Dozens of other code cleanups and bug fixes.
>
> Only a few issues from previous reviews remain unaddressed. These are:
>
>   * The interrupt injection and IDT_VECTORING_INFO_FIELD handling code was
>     still not rewritten. It works, though ;-)
>
>   * No KVM autotests for nested VMX yet.
>
>   * Merging of L0's and L1's MSR bitmaps (and IO bitmaps) is still not
>     supported. As explained above, the current code uses L1's MSR bitmap
>     to avoid costly exits to L1, but still suffers exits to L0 on each
>     MSR access in L2.
>
>   * Still no option for disabling some capabilities advertised to L1.
>
>   * No support for TPR_SHADOW feature for L1.
>
> This new set of patches applies to the current KVM trunk (I checked with
> 082f9eced53d50c136e42d072598da4be4b9ba23).
> If you wish, you can also check out an already-patched version of KVM from
> branch "nvmx9" of the repository:
> 	 git://github.com/nyh/kvm-nested-vmx.git
>
>
> About nested VMX:
> -----------------
>
> The following 30 patches implement nested VMX support. This feature enables
> a guest to use the VMX APIs in order to run its own nested guests.
> In other words, it allows running hypervisors (that use VMX) under KVM.
> Multiple guest hypervisors can be run concurrently, and each of those can
> in turn host multiple guests.
>
> The theory behind this work, our implementation, and its performance
> characteristics were presented in OSDI 2010 (the USENIX Symposium on
> Operating Systems Design and Implementation). Our paper was titled
> "The Turtles Project: Design and Implementation of Nested Virtualization",
> and was awarded "Jay Lepreau Best Paper". The paper is available online, at:
>
> 	http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf
>
> This patch set does not include all the features described in the paper.
> In particular, this patch set is missing nested EPT (L1 can't use EPT and
> must use shadow page tables). It is also missing some features required to
> run VMware hypervisors as a guest. These missing features will be sent as
> follow-on patches.
>
> Running nested VMX:
> ------------------
>
> The nested VMX feature is currently disabled by default. It must be
> explicitly enabled with the "nested=1" option to the kvm-intel module.
>
> No modifications are required to user space (qemu). However, qemu's default
> emulated CPU type (qemu64) does not list the "VMX" CPU feature, so it must be
> explicitly enabled, by giving qemu one of the following options:
>
>       -cpu host              (emulated CPU has all features of the real CPU)
>
>       -cpu qemu64,+vmx       (add just the vmx feature to a named CPU type)
>
>
> This version was only tested with KVM (64-bit) as a guest hypervisor, and
> Linux as a nested guest.

Okay, truly excellent.  The code is now a lot more readable, and I'm 
almost beginning to understand it.  The code comments are also very 
good, I wish we had the same quality comments in the rest of kvm.  We 
can probably merge the next iteration if there aren't significant 
comments from others.

The only worrying thing is the issue you raise in patch 8.  Is there a 
simple fix you can push that addresses correctness?

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-09 11:18 ` [PATCH 0/30] nVMX: Nested VMX, v9 Avi Kivity
@ 2011-05-09 11:37   ` Nadav Har'El
  2011-05-11  8:20   ` Gleb Natapov
  1 sibling, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-09 11:37 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, gleb

On Mon, May 09, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> Okay, truly excellent.  The code is now a lot more readable, and I'm 
> almost beginning to understand it.  The code comments are also very 
> good, I wish we had the same quality comments in the rest of kvm.  We 
> can probably merge the next iteration if there aren't significant 
> comments from others.

Thanks!

> The only worrying thing is the issue you raise in patch 8.  Is there a 
> simple fix you can push that addresses correctness?

I'll fix this for the next iteration.
I wanted to avoid changing the existing vcpus_on_cpu machinery, but you're
probably right - it's better to just do this correctly once and for all than
to try to explain the problem away, or to pray that future processors also
continue to work properly if you "forget" to vmclear a vmcs...

-- 
Nadav Har'El                        |        Monday, May  9 2011, 5 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |A diplomat thinks twice before saying
http://nadav.harel.org.il           |nothing.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 27/30] nVMX: Additional TSC-offset handling
  2011-05-08  8:29 ` [PATCH 27/30] nVMX: Additional TSC-offset handling Nadav Har'El
@ 2011-05-09 17:27   ` Zachary Amsden
  0 siblings, 0 replies; 83+ messages in thread
From: Zachary Amsden @ 2011-05-09 17:27 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: kvm, gleb, avi

On 05/08/2011 01:29 AM, Nadav Har'El wrote:
> In the unlikely case that L1 does not capture MSR_IA32_TSC, L0 needs to
> emulate this MSR write by L2 by modifying vmcs02.tsc_offset. We also need to
> set vmcs12.tsc_offset, for this change to survive the next nested entry (see
> prepare_vmcs02()).
>    

Both changes look correct to me.

Zach

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-09 11:18 ` [PATCH 0/30] nVMX: Nested VMX, v9 Avi Kivity
  2011-05-09 11:37   ` Nadav Har'El
@ 2011-05-11  8:20   ` Gleb Natapov
  2011-05-12 15:42     ` Nadav Har'El
  1 sibling, 1 reply; 83+ messages in thread
From: Gleb Natapov @ 2011-05-11  8:20 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Nadav Har'El, kvm

On Mon, May 09, 2011 at 02:18:25PM +0300, Avi Kivity wrote:
> Okay, truly excellent.  The code is now a lot more readable, and I'm
> almost beginning to understand it.  The code comments are also very
> good, I wish we had the same quality comments in the rest of kvm.
> We can probably merge the next iteration if there aren't significant
> comments from others.
> 
I still feel that the interrupt injection path should be reworked to be
SVM-like before merging the code.

> The only worrying thing is the issue you raise in patch 8.  Is there
> a simple fix you can push that addresses correctness?
> 
> -- 
> I have a truly marvellous patch that fixes the bug which this
> signature is too narrow to contain.

--
			Gleb.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-11  8:20   ` Gleb Natapov
@ 2011-05-12 15:42     ` Nadav Har'El
  2011-05-12 15:57       ` Gleb Natapov
  2011-05-12 16:18       ` Avi Kivity
  0 siblings, 2 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-12 15:42 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, kvm, abelg

On Wed, May 11, 2011, Gleb Natapov wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> On Mon, May 09, 2011 at 02:18:25PM +0300, Avi Kivity wrote:
>..
> > We can probably merge the next iteration if there aren't significant
> > comments from others.
>..
> > The only worrying thing is the issue you raise in patch 8.  Is there
> > a simple fix you can push that addresses correctness?

Hi Avi, I have fixed the few issues you raised in this round, including
a rewritten patch 8 that doesn't leave anything as a TODO. I am ready
to send another round of patches once we decide what to do about the issue
raised by Gleb yesterday:

> I still feel that interrupt injection path should be reworked to be
> SVM like before merging the code.

Both I and my colleague Abel Gordon reviewed the relevant code-path in nested
VMX and nested SVM, and carefully (re)read the relevant parts of the VMX spec,
and we have come to several conclusions. Avi and Gleb, I'd love to hear your
comments on these issues:

Our first conclusion is that my description of our interrupt injection patch
was unduly negative. It wrongly suggested that our solution of using the
interrupt window was inefficient, and was just an interim plug until a
cleaner solution could be written. This was wrong. In fact, the somewhat
strange exit and interrupt window that happen when injecting an interrupt
into L2 are *necessary* for accurate emulation. I explain why this is so in
the new version of this patch, which is included below. Please take a look.

Our second conclusion (and I hope that I'm not offending anyone here)
is that the changes for L2 interrupt injection in both SVM and VMX are both
ugly - they are just ugly in different ways. Both modified the non-nested
code in strange places in strange and unexpected ways, and tried to circumvent
the usual code path in x86.c without touching x86.c. They just did this in
two slightly different ways, neither (I think) is inherently uglier than the
other:

For accurate emulation (as I explain in the patch below), both codes need to
cause x86.c to change its normal behavior: It checks for interrupt_allowed()
and then (discovering that it isn't) enable_irq_window(). We want it to
instead exit to L1, and then enable the irq window on that. In the SVM code,
interrupt_allowed() is modified to always return false if nested, and
enable_irq_window() is modified to flag for an exit to L1 (which is performed
later) and turn on the interrupt window. In VMX, we modify the same places
but differently: In interrupt_allowed() we exit to L1 immediately (it's a
short operation, so we didn't mind doing it in atomic context), and
enable_irq_window() doesn't need to be changed (it already runs in L1).

Continuing to survey the differences between nested VMX and SVM, there
were other different choices made besides the ones mentioned above. Nested SVM
uses an additional trick, of skipping one round of running the guest, when
it discovers the need for an exit in the "wrong" place, so that it can get to
the "right" place again. Nested VMX solved the same problems with other
mechanisms, like a separate piece of code for handling IDT_VECTORING_INFO,
and nested_run_pending. Some differences can also be explained by the different
design of (non-nested) vmx.c vs svm.c - e.g., svm_complete_interrupts() is
called during the handle_exit(), while vmx_complete_interrupts() is called
after handle_exit() has completed (in atomic context) - this is one of the
reasons the nested IDT_VECTORING_INFO path is different.

I think that both solutions are far from being beautiful or easy to understand.
Nested SVM is perhaps slightly less ugly but also has a small performance cost
(with the extra vcpu_run iteration doing nothing) - and I think neither is
inherently better than the other.

So I guess my question is, and Avi and Gleb I'd love your comments about this
question: Is it really beneficial that I rewrite the "ugly" nested-VMX
injection code to be somewhat-ugly in exactly the same way as the nested-SVM
injection code? Won't it be more beneficial to rewrite *both* codes to
be cleaner? This would probably mean changes to the common x86.c, that both
will use. For example, x86.c's injection code could check the nested case
itself, perhaps calling a special x86_op to handle the nested injection (exit,
set interrupt window, etc.) instead of calling the regular
interrupt_allowed/enable_irq_window and forcing those to be modified in
mysterious ways.

Now that there's a is_guest_mode(vcpu) function, more nested-related code
can be moved to x86.c, to make both nested VMX and nested SVM code cleaner.

Waiting to hear your opinions and suggestions,

Thanks,
Nadav.


=============================================================
Subject: [PATCH 23/31] nVMX: Correct handling of interrupt injection

The code in this patch correctly emulates external-interrupt injection
while a nested guest L2 is running. Because of this code's relative
oddity and un-obviousness, I include here a longer-than-usual justification
for what it does - longer than the code itself ;-)

To understand how to correctly emulate interrupt injection while L2 is
running, let's look first at what we need to emulate: How would things look
if the extra L0 hypervisor layer were removed, and instead of L0 injecting
an interrupt we had hardware delivering an interrupt?

Now L1 runs on bare metal, with a guest L2 and the hardware generates an
interrupt. Assuming that L1 set PIN_BASED_EXT_INTR_MASK to 1, and 
VM_EXIT_ACK_INTR_ON_EXIT to 0 (we'll revisit these assumptions below), what
happens now is this: The processor exits from L2 to L1, with an
external-interrupt exit reason but without an interrupt vector. L1 runs,
with interrupts disabled, and it doesn't yet know what the interrupt was.
Soon after, it enables interrupts and only at that moment does it get the
interrupt from the processor. When L1 is KVM, Linux handles this interrupt.

Now we need exactly the same thing to happen when that L1->L2 system runs
on top of L0, instead of real hardware. This is how we do this:

When L0 wants to inject an interrupt, it needs to exit from L2 to L1, with
external-interrupt exit reason (without an interrupt vector), and run L1.
Just like in the bare metal case, L0 likely can't deliver the interrupt to
L1 now because L1 is running with interrupts disabled, in which case L0 turns
on the interrupt window when running L1 after the exit. L1 will soon enable
interrupts, and at that point L0 will gain control again and inject the
interrupt to L1.

Finally, there is an extra complication in the code: when nested_run_pending
is set, we cannot return to L1 now and must launch L2. We need to remember the
interrupt we wanted to inject (and not clear it now), and do it on the
next exit.

The above explanation shows that the relative strangeness of the nested
interrupt injection code in this patch, and the extra interrupt-window
exit incurred, are in fact necessary for accurate emulation, and are not
just an unoptimized implementation.

Let's revisit now the two assumptions made above:

If L1 turns off PIN_BASED_EXT_INTR_MASK (no hypervisor that I know
does, by the way), things are simple: L0 may inject the interrupt directly
to the L2 guest - using the normal code path that injects to any guest.
We support this case in the code below.

If L1 turns on VM_EXIT_ACK_INTR_ON_EXIT (again, no hypervisor that I know
does), things look very different from the description above: L1 expects
to see an exit from L2 with the interrupt vector already filled in the exit
information, and does not expect to be interrupted again with this interrupt.
The current code does not (yet) support this case, so we do not allow the
VM_EXIT_ACK_INTR_ON_EXIT exit-control to be turned on by L1.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/kvm/vmx.c |   37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

--- .before/arch/x86/kvm/vmx.c	2011-05-12 17:39:33.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-12 17:39:33.000000000 +0300
@@ -3703,9 +3703,25 @@ out:
 	return ret;
 }
 
+/*
+ * In nested virtualization, check if L1 asked to exit on external interrupts.
+ * For most existing hypervisors, this will always return true.
+ */
+static bool nested_exit_on_intr(struct kvm_vcpu *vcpu)
+{
+	return get_vmcs12(vcpu)->pin_based_vm_exec_control &
+		PIN_BASED_EXT_INTR_MASK;
+}
+
 static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
 	u32 cpu_based_vm_exec_control;
+	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
+		/* We can get here when nested_run_pending caused
+		 * vmx_interrupt_allowed() to return false. In this case, do
+		 * nothing - the interrupt will be injected later.
+		 */
+		return;
 
 	cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
 	cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
@@ -3828,6 +3844,15 @@ static void vmx_set_nmi_mask(struct kvm_
 
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
+	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) {
+		if (to_vmx(vcpu)->nested.nested_run_pending)
+			return 0;
+		nested_vmx_vmexit(vcpu);
+		get_vmcs12(vcpu)->vm_exit_reason =
+			EXIT_REASON_EXTERNAL_INTERRUPT;
+		/* fall through to normal code, but now in L1, not L2 */
+	}
+
 	return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
 		!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
 			(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
@@ -5515,6 +5540,14 @@ static int vmx_handle_exit(struct kvm_vc
 	if (vmx->emulation_required && emulate_invalid_guest_state)
 		return handle_invalid_guest_state(vcpu);
 
+	/*
+	 * the KVM_REQ_EVENT optimization bit is only on for one entry, and if
+	 * we did not inject a still-pending event to L1 now because of
+	 * nested_run_pending, we need to re-enable this bit.
+	 */
+	if (vmx->nested.nested_run_pending)
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+
 	if (exit_reason == EXIT_REASON_VMLAUNCH ||
 	    exit_reason == EXIT_REASON_VMRESUME)
 		vmx->nested.nested_run_pending = 1;
@@ -5712,6 +5745,8 @@ static void __vmx_complete_interrupts(st
 
 static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 {
+	if (is_guest_mode(&vmx->vcpu))
+		return;
 	__vmx_complete_interrupts(vmx, vmx->idt_vectoring_info,
 				  VM_EXIT_INSTRUCTION_LEN,
 				  IDT_VECTORING_ERROR_CODE);
@@ -5719,6 +5754,8 @@ static void vmx_complete_interrupts(stru
 
 static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
 {
+	if (is_guest_mode(vcpu))
+		return;
 	__vmx_complete_interrupts(to_vmx(vcpu),
 				  vmcs_read32(VM_ENTRY_INTR_INFO_FIELD),
 				  VM_ENTRY_INSTRUCTION_LEN,

-- 
Nadav Har'El                        |      Thursday, May 12 2011, 8 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Linux: Because rebooting is for adding
http://nadav.harel.org.il           |new hardware.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-12 15:42     ` Nadav Har'El
@ 2011-05-12 15:57       ` Gleb Natapov
  2011-05-12 16:08         ` Avi Kivity
  2011-05-12 16:31         ` Nadav Har'El
  2011-05-12 16:18       ` Avi Kivity
  1 sibling, 2 replies; 83+ messages in thread
From: Gleb Natapov @ 2011-05-12 15:57 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: Avi Kivity, kvm, abelg

On Thu, May 12, 2011 at 06:42:28PM +0300, Nadav Har'El wrote:
> So I guess my question is, and Avi and Gleb I'd love your comments about this
> question: Is it really beneficial that I rewrite the "ugly" nested-VMX
> injection code to be somewhat-ugly in exactly the same way that nested-SVM
> injection code? Won't it be more beneficial to rewrite *both* codes to
> be cleaner? This would probably mean changes to the common x86.c, that both
> will use. For example, x86.c's injection code could check the nested case
> itself, perhaps calling a special x86_op to handle the nested injection (exit,
> set interrupt window, etc.) instead of calling the regular
> interrupt_allowed/enable_irq_window and forcing those to be modified in
> mysterious ways.
> 
That is exactly what should be done and what I have in mind when I am
asking to change the VMX code to be SVM-like. To achieve what you outlined
above gradually, we need to move common VMX and SVM logic into x86.c
and then change the logic to be more nested friendly.  If VMX has
different interrupt handling logic, we will have to add an additional step:
making the SVM and VMX code similar (so it will be possible to move it
into x86.c).  All I am asking is to make this step now, before the merge,
while the code is still actively developed.
 
> Now that there's a is_guest_mode(vcpu) function, more nested-related code
> can be moved to x86.c, to make both nested VMX and nested SVM code cleaner.
> 
> Waiting to hear your opinions and suggestions,
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-12 15:57       ` Gleb Natapov
@ 2011-05-12 16:08         ` Avi Kivity
  2011-05-12 16:14           ` Gleb Natapov
  2011-05-12 16:31         ` Nadav Har'El
  1 sibling, 1 reply; 83+ messages in thread
From: Avi Kivity @ 2011-05-12 16:08 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Nadav Har'El, kvm, abelg

On 05/12/2011 06:57 PM, Gleb Natapov wrote:
> On Thu, May 12, 2011 at 06:42:28PM +0300, Nadav Har'El wrote:
> >  So I guess my question is, and Avi and Gleb I'd love your comments about this
> >  question: Is it really beneficial that I rewrite the "ugly" nested-VMX
> >  injection code to be somewhat-ugly in exactly the same way that nested-SVM
> >  injection code? Won't it be more beneficial to rewrite *both* codes to
> >  be cleaner? This would probably mean changes to the common x86.c, that both
> >  will use. For example, x86.c's injection code could check the nested case
> >  itself, perhaps calling a special x86_op to handle the nested injection (exit,
> >  set interrupt window, etc.) instead of calling the regular
> >  interrupt_allowed/enable_irq_window and forcing those to be modified in
> >  mysterious ways.
> >
> That is exactly what should be done and what I have in mind when I am
> asking to change VMX code to be SVM like. To achieve what you outlined
> above gradually we need to move common VMX and SVM logic into x86.c
> and then change the logic to be more nested friendly.  If VMX will have
> different interrupt handling logic we will have to have additional step:
> making SVM and VMX code similar (so it will be possible to move it
> into x86.c).  All I am asking is to make this step now, before merge,
> while the code is still actively developed.
>

I don't think it's fair to ask Nadav to do a unification right now.  Or 
productive - there's a limit to the size of a patchset that can be 
carried outside.  Also it needs to be done with future changes to interrupt
injection in mind, like using the svm interrupt queue to avoid an interrupt
window exit.

Are there vmx-only changes that you think can help?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-12 16:08         ` Avi Kivity
@ 2011-05-12 16:14           ` Gleb Natapov
  0 siblings, 0 replies; 83+ messages in thread
From: Gleb Natapov @ 2011-05-12 16:14 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Nadav Har'El, kvm, abelg

On Thu, May 12, 2011 at 07:08:59PM +0300, Avi Kivity wrote:
> On 05/12/2011 06:57 PM, Gleb Natapov wrote:
> >On Thu, May 12, 2011 at 06:42:28PM +0300, Nadav Har'El wrote:
> >>  So I guess my question is, and Avi and Gleb I'd love your comments about this
> >>  question: Is it really beneficial that I rewrite the "ugly" nested-VMX
> >>  injection code to be somewhat-ugly in exactly the same way that nested-SVM
> >>  injection code? Won't it be more beneficial to rewrite *both* codes to
> >>  be cleaner? This would probably mean changes to the common x86.c, that both
> >>  will use. For example, x86.c's injection code could check the nested case
> >>  itself, perhaps calling a special x86_op to handle the nested injection (exit,
> >>  set interrupt window, etc.) instead of calling the regular
> >>  interrupt_allowed/enable_irq_window and forcing those to be modified in
> >>  mysterious ways.
> >>
> >That is exactly what should be done and what I have in mind when I am
> >asking to change VMX code to be SVM like. To achieve what you outlined
> >above gradually we need to move common VMX and SVM logic into x86.c
> >and then change the logic to be more nested friendly.  If VMX will have
> >different interrupt handling logic we will have to have additional step:
> >making SVM and VMX code similar (so it will be possible to move it
> >into x86.c).  All I am asking is to make this step now, before merge,
> >while the code is still actively developed.
> >
> 
> I don't think it's fair to ask Nadav to do a unification right now.
Definitely. And I am not asking for it!

> Or productive - there's a limit to the size of a patchset that can
> be carried outside.  Also it needs to be done in consideration with
> future changes to interrupt injection, like using the svm interrupt
> queue to avoid an interrupt window exit.
> 
> Are there vmx-only changes that you think can help?
> 
I am asking for a vmx-only change, actually: to make the interrupt handling
logic the same as SVM's. This will allow me or you or someone else to handle
the unification part later without rewriting VMX.
 
--
			Gleb.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-12 15:42     ` Nadav Har'El
  2011-05-12 15:57       ` Gleb Natapov
@ 2011-05-12 16:18       ` Avi Kivity
  1 sibling, 0 replies; 83+ messages in thread
From: Avi Kivity @ 2011-05-12 16:18 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: Gleb Natapov, kvm, abelg

On 05/12/2011 06:42 PM, Nadav Har'El wrote:
> Our second conclusion (and I hope that I'm not offending anyone here)
> is that the changes for L2 interrupt injection in both SVM and VMX are both
> ugly - they are just ugly in different ways. Both modified the non-nested
> code in strange places in strange and unexpected ways, and tried to circumvent
> the usual code path in x86.c without touching x86.c. They just did this in
> two slightly different ways, neither (I think) is inherently uglier than the
> other:
>
> For accurate emulation (as I explain in the patch below), both codes need to
> cause x86.c to change its normal behavior: It checks for interrupt_allowed()
> and then (discovering that it isn't) enable_irq_window(). We want it to
> instead exit to L1, and then enable the irq window on that. In the SVM code,
> interrupt_allowed() is modified to always return false if nested, and
> enable_irq_window() is modified to flag for an exit to L1 (which is performed
> later) and turn on the interrupt window. In VMX, we modify the same places
> but differently: In interrupt_allowed() we exit to L1 immediately (it's a
> short operation, we didn't mind to do it in atomic context), and
> enable_irq_window() doesn't need to be changed (it already runs in L1).

I think that interrupt_allowed() should return true in L2 (if L1 has 
configured external interrupts to be trapped), and interrupt injection 
modified to cause an exit instead of queueing an interrupt.  Note that 
on vmx, intercepted interrupt injection can take two different paths 
depending on whether the L1 wants interrupts acked or not.
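
A rough sketch of that alternative (hypothetical, not part of these patches;
it reuses nested_exit_on_intr() from the patch quoted earlier in the thread)
would be to report interrupts as always "allowed" while L2 runs with
external-interrupt exiting, and to turn the actual injection into a nested VM
exit instead of a write to VM_ENTRY_INTR_INFO_FIELD:

	static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
	{
		/* while L2 runs and L1 traps external interrupts, an
		 * interrupt can always be "taken" - as an exit to L1 */
		if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
			return 1;

		return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
			!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
			  (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
	}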

> Continuing to survey the difference between nested VMX and and SVM, there
> were other different choices made besides the ones mentioned above. nested SVM
> uses an additional trick, of skipping one round of running the guest, when
> it discovered the need for an exit in the "wrong" place, so it can get to
> the "right" place again. Nested VMX solved the same problems with other
> mechanisms, like a separate piece of code for handling IDT_VECTORING_INFO,
> and nested_run_pending. Some differences can also be explained by the different
> design of (non-nested) vmx.c vs svm.c - e.g., svm_complete_interrupts() is
> called during the handle_exit(), while vmx_complete_interrupts() is called
> after handle_exit() has completed (in atomic context) - this is one of the
> reasons the nested IDT_VECTORING_INFO path is different.
>
> I think that both solutions are far from being beautiful or easy to understand.
> Nested SVM is perhaps slightly less ugly but also has a small performance cost
> (with the extra vcpu_run iteration doing nothing) - and I think neither is
> inherently better than the other.
>
> So I guess my question is, and Avi and Gleb I'd love your comments about this
> question: Is it really beneficial that I rewrite the "ugly" nested-VMX
> injection code to be somewhat-ugly in exactly the same way that nested-SVM
> injection code? Won't it be more beneficial to rewrite *both* codes to
> be cleaner? This would probably mean changes to the common x86.c, that both
> will use. For example, x86.c's injection code could check the nested case
> itself, perhaps calling a special x86_op to handle the nested injection (exit,
> set interrupt window, etc.) instead of calling the regular
> interrupt_allowed/enable_irq_window and forcing those to be modified in
> mysterious ways.
>
> Now that there's a is_guest_mode(vcpu) function, more nested-related code
> can be moved to x86.c, to make both nested VMX and nested SVM code cleaner.

I am fine with committing as is.  Later we can modify both vmx and svm 
to do the right thing (whatever that is), and later merge them into x86.c.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-12 15:57       ` Gleb Natapov
  2011-05-12 16:08         ` Avi Kivity
@ 2011-05-12 16:31         ` Nadav Har'El
  2011-05-12 16:51           ` Gleb Natapov
  1 sibling, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-12 16:31 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, kvm, abelg

Hi,

On Thu, May 12, 2011, Gleb Natapov wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> That is exactly what should be done and what I have in mind when I am
> asking to change VMX code to be SVM like. To achieve what you outlined
> above gradually we need to move common VMX and SVM logic into x86.c
> and then change the logic to be more nested friendly.  If VMX will have
> different interrupt handling logic we will have to have additional step:
> making SVM and VMX code similar (so it will be possible to move it
> into x86.c).

But if my interpretation of the code is correct, SVM isn't much closer
than VMX to the goal of moving this logic to x86.c. When some logic is
moved there, both SVM and VMX code will need to change - perhaps even
considerably. So how will it be helpful to make VMX behave exactly like
SVM does now, when the latter will also need to change considerably?

It sounds to me that working to move some nested-interrupt-injection related
logic to x86.c is a worthy effort (and I'd be happy to start some discussion
on how to best design it), but working to duplicate the exact idiosyncrasies
of the current SVM implementation in the VMX code is not as productive.
But as usual, I'm open to arguments (or dictums ;-)) that I'm wrong here.

By the way, I hope that I'm being fair to the nested SVM implementation when,
after only a short review, I call some of the code there idiosyncratic.
Basically I am working under the assumption that some of the modifications
there (I gave examples in my previous post) were done in the way they were
just to fit the mold of x86.c, and that it would have been possible to alter
x86.c in a way that could make the nested SVM code simpler - and quite
different (in the area of interrupt injection).

> All I am asking is to make this step now, before merge,
> while the code is still actively developed.

The code will continue to be actively developed even after the merge :-)

Nadav.

-- 
Nadav Har'El                        |      Thursday, May 12 2011, 9 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |A city is a large community where people
http://nadav.harel.org.il           |are lonesome together.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-12 16:31         ` Nadav Har'El
@ 2011-05-12 16:51           ` Gleb Natapov
  2011-05-12 17:00             ` Avi Kivity
  2011-05-22 19:32             ` Nadav Har'El
  0 siblings, 2 replies; 83+ messages in thread
From: Gleb Natapov @ 2011-05-12 16:51 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: Avi Kivity, kvm, abelg

On Thu, May 12, 2011 at 07:31:15PM +0300, Nadav Har'El wrote:
> Hi,
> 
> On Thu, May 12, 2011, Gleb Natapov wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> > That is exactly what should be done and what I have in mind when I am
> > asking to change VMX code to be SVM like. To achieve what you outlined
> > above gradually we need to move common VMX and SVM logic into x86.c
> > and then change the logic to be more nested friendly.  If VMX will have
> > different interrupt handling logic we will have to have additional step:
> > making SVM and VMX code similar (so it will be possible to move it
> > into x86.c).
> 
> But if my interpretation of the code is correct, SVM isn't much closer
> than VMX to the goal of moving this logic to x86.c. When some logic is
> moved there, both SVM and VMX code will need to change - perhaps even
> considerably. So how will it be helpful to make VMX behave exactly like
> SVM does now, when the latter will also need to change considerably?
> 
The SVM design is much closer to the goal of moving the logic into x86.c
because IIRC it does not bypass parsing of the IDT vectoring info into an
arch-independent structure, while the VMX code uses vmx->idt_vectoring_info
directly. SVM is much closer to working migration with nested guests for the
same reason.

> It sounds to me that working to move some nested-interrupt-injection related
> logic to x86.c is a worthy effort (and I'd be happy to start some discussion
> on how to best design it), but working to duplicate the exact idiosyncrasies
> of the current SVM implementation in the VMX code is not as productive.
> But as usual, I'm open to arguments (or dictums ;-)) that I'm wrong here.
> 
> By the way, I hope that I'm being fair to the nested SVM implementation when
> I call some of the code there, after only a short review, idiosyncrasies.
> Basically I am working under the assumption that some of the modifications
> there (I gave examples in my previous post) were done in the way they were
> just to fit the mold of x86.c, and that it would have been possible to alter
> x86.c in a way that could make the nested SVM code simpler - and quite
> different (in the area of interrupt injection).
I think (I haven't looked at the code for a long time) it can benefit
from additional x86 ops callbacks, but it would be silly to add one set
of callbacks to support the SVM way of doing things and another set for
VMX - not because the archs are so different that unification is
impossible (they are not; they are in fact very close in this area), but
because the two implementations are different.

> 
> > All I am asking is to make this step now, before merge,
> > while the code is still actively developed.
> 
> The code will continue to be actively developed even after the merge :-)
> 
Amen! :)

--
			Gleb.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-12 16:51           ` Gleb Natapov
@ 2011-05-12 17:00             ` Avi Kivity
  2011-05-15 23:11               ` Nadav Har'El
  2011-05-22 19:32             ` Nadav Har'El
  1 sibling, 1 reply; 83+ messages in thread
From: Avi Kivity @ 2011-05-12 17:00 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Nadav Har'El, kvm, abelg

On 05/12/2011 07:51 PM, Gleb Natapov wrote:
> >
> >  But if my interpretation of the code is correct, SVM isn't much closer
> >  than VMX to the goal of moving this logic to x86.c. When some logic is
> >  moved there, both SVM and VMX code will need to change - perhaps even
> >  considerably. So how will it be helpful to make VMX behave exactly like
> >  SVM does now, when the latter will also need to change considerably?
> >
> SVM design is much close to the goal of moving the logic into x86.c
> because IIRC it does not bypass parsing of IDT vectoring info into arch
> independent structure. VMX code uses vmx->idt_vectoring_info directly.
> SVM is much close to working migration with nested guests for the same
> reason.

Ah, yes.  For live migration to work, all vmcb state must be accessible 
via vendor-independent accessors once an exit is completely handled.  
For example, GPRs are accessible via kvm_register_read(), and without 
nesting, interrupt state is stowed in the interrupt queue, but if you 
keep IDT_VECTORING_INFO live between exit and entry, you can lose it if 
you migrate at this point.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-12 17:00             ` Avi Kivity
@ 2011-05-15 23:11               ` Nadav Har'El
  2011-05-16  6:38                 ` Gleb Natapov
  2011-05-16  9:50                 ` Avi Kivity
  0 siblings, 2 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-15 23:11 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gleb Natapov, kvm, abelg

On Thu, May 12, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> Ah, yes.  For live migration to work, all vmcb state must be accessible 
> via vendor-independent accessors once an exit is completely handled.  
> For example, GPRs are accessible via kvm_register_read(), and without 
> nesting, interrupt state is stowed in the interrupt queue, but if you 
> keep IDT_VECTORING_INFO live between exit and entry, you can lose it if 
> you migrate at this point.

Hi, I can quite easily save this state in a different place that is saved -
the easiest would be to use vmcs12, which has room for exactly the fields
we want to save (and they are rewritten anyway when we exit to L1).

Avi, would you like me to use this sort of solution to avoid the extra
state? Of course, considering that live migration with nested VMX
probably still doesn't work for a dozen other reasons anyway :(

Or do you consider this not enough, and rather that it is necessary that
nested VMX should use exactly the same logic as nested SVM does - namely,
use tricks like SVM's "exit_required" instead of our different tricks?

Nadav.

-- 
Nadav Har'El                        |       Monday, May 16 2011, 12 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Corduroy pillows - they're making
http://nadav.harel.org.il           |headlines!

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-15 23:11               ` Nadav Har'El
@ 2011-05-16  6:38                 ` Gleb Natapov
  2011-05-16  7:44                   ` Nadav Har'El
  2011-05-16  9:50                 ` Avi Kivity
  1 sibling, 1 reply; 83+ messages in thread
From: Gleb Natapov @ 2011-05-16  6:38 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: Avi Kivity, kvm, abelg

On Mon, May 16, 2011 at 02:11:40AM +0300, Nadav Har'El wrote:
> On Thu, May 12, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> > Ah, yes.  For live migration to work, all vmcb state must be accessible 
> > via vendor-independent accessors once an exit is completely handled.  
> > For example, GPRs are accessible via kvm_register_read(), and without 
> > nesting, interrupt state is stowed in the interrupt queue, but if you 
> > keep IDT_VECTORING_INFO live between exit and entry, you can lose it if 
> > you migrate at this point.
> 
> Hi, I can quite easily save this state in a different place which is saved -
> The easiest will just be to use vmcs12, which has place for exactly the fields
> we want to save (and they are rewritten anyway when we exit to L1).
> 
This will not address the problem that the state will not be visible to
generic logic in x86.c.

> Avi, would you you like me use this sort of solution to avoid the extra
> state? Of course, considering that anyway, live migration with nested VMX
> probably still doesn't work for a dozen other reasons :(
> 
> Or do you consider this not enough, and rather that it is necessary that
> nested VMX should use exactly the same logic as nested SVM does - namely,
> use tricks like SVM's "exit_required" instead of our different tricks?
> 
Given the two solutions, I prefer the SVM one. Yes, I know that you asked Avi :)

--
			Gleb.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-16  6:38                 ` Gleb Natapov
@ 2011-05-16  7:44                   ` Nadav Har'El
  2011-05-16  7:57                     ` Gleb Natapov
  0 siblings, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-16  7:44 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, kvm, abelg

On Mon, May 16, 2011, Gleb Natapov wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> > Hi, I can quite easily save this state in a different place which is saved -
> > The easiest will just be to use vmcs12, which has place for exactly the fields
> > we want to save (and they are rewritten anyway when we exit to L1).
> > 
> This will not address the problem that the state will not be visible to
> generic logic in x86.c.

Maybe I misunderstood your intention, but given that vmcs12 is in guest
memory, which is migrated as well, isn't that enough (for the live migration
issue)?


-- 
Nadav Har'El                        |       Monday, May 16 2011, 12 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |The trouble with political jokes is they
http://nadav.harel.org.il           |get elected.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-16  7:44                   ` Nadav Har'El
@ 2011-05-16  7:57                     ` Gleb Natapov
  0 siblings, 0 replies; 83+ messages in thread
From: Gleb Natapov @ 2011-05-16  7:57 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: Avi Kivity, kvm, abelg

On Mon, May 16, 2011 at 10:44:28AM +0300, Nadav Har'El wrote:
> On Mon, May 16, 2011, Gleb Natapov wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> > > Hi, I can quite easily save this state in a different place which is saved -
> > > The easiest will just be to use vmcs12, which has place for exactly the fields
> > > we want to save (and they are rewritten anyway when we exit to L1).
> > > 
> > This will not address the problem that the state will not be visible to
> > generic logic in x86.c.
> 
> Maybe I misunderstood your intention, but given that vmcs12 is in guest
> memory, which is migrated as well, isn't that enough (for the live migration
> issue)?
> 
I pointed out two issues. Migration was the second and minor one, since
there is a long way to go before migration will work with nested guests
anyway. The first one was much more important, so let me repeat it. To
move nested event handling into generic code, the IDT vectoring info has
to be parsed into the data structure that the event injection code in
x86.c actually works with. That code does not manipulate
vmx->idt_vectoring_info or its SVM analog directly; it works with the
event queue instead. SVM does this right, and there is nothing I can see
that prevents moving the SVM logic into x86.c. I don't see how your VMX
logic can be moved into x86.c as-is, since it works on internal VMX
fields directly.
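
For illustration, a simplified sketch of the SVM-style pattern being
described (modeled loosely on svm_complete_interrupts(); error-code and
soft-interrupt handling omitted):

	/* After an exit, decode the hardware's exit interrupt info into the
	 * arch-independent queues that x86.c's injection code works with. */
	static void complete_interrupts_sketch(struct vcpu_svm *svm)
	{
		u32 exitintinfo = svm->vmcb->control.exit_int_info;
		u8 vector = exitintinfo & SVM_EXITINTINFO_VEC_MASK;

		if (!(exitintinfo & SVM_EXITINTINFO_VALID))
			return;

		switch (exitintinfo & SVM_EXITINTINFO_TYPE_MASK) {
		case SVM_EXITINTINFO_TYPE_NMI:
			svm->vcpu.arch.nmi_injected = true;
			break;
		case SVM_EXITINTINFO_TYPE_EXEPT:
			kvm_requeue_exception(&svm->vcpu, vector);
			break;
		case SVM_EXITINTINFO_TYPE_INTR:
			kvm_queue_interrupt(&svm->vcpu, vector, false);
			break;
		}
	}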

--
			Gleb.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-15 23:11               ` Nadav Har'El
  2011-05-16  6:38                 ` Gleb Natapov
@ 2011-05-16  9:50                 ` Avi Kivity
  2011-05-16 10:20                   ` Avi Kivity
  1 sibling, 1 reply; 83+ messages in thread
From: Avi Kivity @ 2011-05-16  9:50 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: Gleb Natapov, kvm, abelg

On 05/16/2011 02:11 AM, Nadav Har'El wrote:
> On Thu, May 12, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> >  Ah, yes.  For live migration to work, all vmcb state must be accessible
> >  via vendor-independent accessors once an exit is completely handled.
> >  For example, GPRs are accessible via kvm_register_read(), and without
> >  nesting, interrupt state is stowed in the interrupt queue, but if you
> >  keep IDT_VECTORING_INFO live between exit and entry, you can lose it if
> >  you migrate at this point.
>
> Hi, I can quite easily save this state in a different place which is saved -
> The easiest will just be to use vmcs12, which has place for exactly the fields
> we want to save (and they are rewritten anyway when we exit to L1).

You would still need ->valid_idt_vectoring_info to know you need special 
handling, no?

> Avi, would you you like me use this sort of solution to avoid the extra
> state? Of course, considering that anyway, live migration with nested VMX
> probably still doesn't work for a dozen other reasons :(
>
> Or do you consider this not enough, and rather that it is necessary that
> nested VMX should use exactly the same logic as nested SVM does - namely,
> use tricks like SVM's "exit_required" instead of our different tricks?

I think svm is rather simple here, using svm_complete_interrupts() to 
decode exit_int_info into the arch-independent structures.  I don't 
think ->exit_required is a hack - it could probably be improved, but I 
think it essentially does the right thing.  For example, 
svm_nmi_allowed() will return true if in guest mode and NMI interception 
is enabled.
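
Roughly, the behaviour described above would look like this (a sketch,
not the actual svm.c code; nested_exit_on_nmi() stands in for the check
that L1 intercepts NMIs):

	static int svm_nmi_allowed_sketch(struct kvm_vcpu *vcpu)
	{
		struct vcpu_svm *svm = to_svm(vcpu);

		/* While L2 runs and L1 intercepts NMIs, report "allowed":
		 * the pending NMI becomes a nested vmexit instead of being
		 * injected into L2. */
		if (is_guest_mode(vcpu) && nested_exit_on_nmi(svm))
			return 1;

		return !(svm->vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) &&
		       !(vcpu->arch.hflags & HF_NMI_MASK);
	}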

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-16  9:50                 ` Avi Kivity
@ 2011-05-16 10:20                   ` Avi Kivity
  0 siblings, 0 replies; 83+ messages in thread
From: Avi Kivity @ 2011-05-16 10:20 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: Gleb Natapov, kvm, abelg

On 05/16/2011 12:50 PM, Avi Kivity wrote:
>> Or do you consider this not enough, and rather that it is necessary that
>> nested VMX should use exactly the same logic as nested SVM does - 
>> namely,
>> use tricks like SVM's "exit_required" instead of our different tricks?
>
>
> I think svm is rather simple here using svm_complete_interrupts() to 
> decode exit_int_info into the arch independent structures.  I don't 
> think ->exit_required is a hack - it could probably be improved but I 
> think it does the right thing essentially.  For example 
> svm_nmi_allowed() will return true if in guest mode and NMI 
> interception is enabled.
>

It would be better if it just returned true and let svm_inject_nmi() do 
the vmexit.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 12/30] nVMX: Implement VMPTRLD
  2011-05-08  8:21 ` [PATCH 12/30] nVMX: Implement VMPTRLD Nadav Har'El
@ 2011-05-16 14:34   ` Marcelo Tosatti
  2011-05-16 18:58     ` Nadav Har'El
  0 siblings, 1 reply; 83+ messages in thread
From: Marcelo Tosatti @ 2011-05-16 14:34 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: kvm, gleb, avi

On Sun, May 08, 2011 at 11:21:22AM +0300, Nadav Har'El wrote:
> This patch implements the VMPTRLD instruction.
> 
> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
> ---
>  arch/x86/kvm/vmx.c |   62 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 61 insertions(+), 1 deletion(-)
> 
> --- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
> +++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:19.000000000 +0300
> @@ -4814,6 +4814,66 @@ static int handle_vmclear(struct kvm_vcp
>  	return 1;
>  }
>  
> +/* Emulate the VMPTRLD instruction */
> +static int handle_vmptrld(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	gva_t gva;
> +	gpa_t vmcs12_addr;
> +	struct x86_exception e;
> +
> +	if (!nested_vmx_check_permission(vcpu))
> +		return 1;
> +
> +	if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
> +			vmcs_read32(VMX_INSTRUCTION_INFO), &gva))
> +		return 1;
> +
> +	if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &vmcs12_addr,
> +				sizeof(vmcs12_addr), &e)) {
> +		kvm_inject_page_fault(vcpu, &e);
> +		return 1;
> +	}
> +
> +	if (!IS_ALIGNED(vmcs12_addr, PAGE_SIZE)) {
> +		nested_vmx_failValid(vcpu, VMXERR_VMPTRLD_INVALID_ADDRESS);
> +		skip_emulated_instruction(vcpu);
> +		return 1;
> +	}
> +
> +	if (vmx->nested.current_vmptr != vmcs12_addr) {
> +		struct vmcs12 *new_vmcs12;
> +		struct page *page;
> +		page = nested_get_page(vcpu, vmcs12_addr);
> +		if (page == NULL) {
> +			nested_vmx_failInvalid(vcpu);

This can access a NULL current_vmcs12 pointer, no? Apparently other
code paths are vulnerable to the same issue (as in, allowed to execute
before VMPTRLD maps the guest VMCS). Perhaps a BUG_ON in get_vmcs12
could be helpful.
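
Something along these lines (a sketch of the suggestion; current_vmcs12
is the field introduced by this series):

	static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)
	{
		/* Catch code paths that run before VMPTRLD has mapped a
		 * guest VMCS. */
		BUG_ON(!to_vmx(vcpu)->nested.current_vmcs12);
		return to_vmx(vcpu)->nested.current_vmcs12;
	}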

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 07/30] nVMX: Introduce vmcs02: VMCS used to run L2
  2011-05-08  8:18 ` [PATCH 07/30] nVMX: Introduce vmcs02: VMCS used to run L2 Nadav Har'El
@ 2011-05-16 15:30   ` Marcelo Tosatti
  2011-05-16 18:32     ` Nadav Har'El
  0 siblings, 1 reply; 83+ messages in thread
From: Marcelo Tosatti @ 2011-05-16 15:30 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: kvm, gleb, avi

On Sun, May 08, 2011 at 11:18:47AM +0300, Nadav Har'El wrote:
> We saw in a previous patch that L1 controls its L2 guest with a vcms12.
> L0 needs to create a real VMCS for running L2. We call that "vmcs02".
> A later patch will contain the code, prepare_vmcs02(), for filling the vmcs02
> fields. This patch only contains code for allocating vmcs02.
> 
> In this version, prepare_vmcs02() sets *all* of vmcs02's fields each time we
> enter from L1 to L2, so keeping just one vmcs02 for the vcpu is enough: It can
> be reused even when L1 runs multiple L2 guests. However, in future versions
> we'll probably want to add an optimization where vmcs02 fields that rarely
> change will not be set each time. For that, we may want to keep around several
> vmcs02s of L2 guests that have recently run, so that potentially we could run
> these L2s again more quickly because less vmwrites to vmcs02 will be needed.
> 
> This patch adds to each vcpu a vmcs02 pool, vmx->nested.vmcs02_pool,
> which remembers the vmcs02s last used to run up to VMCS02_POOL_SIZE L2s.
> As explained above, in the current version we choose VMCS02_POOL_SIZE=1,
> I.e., one vmcs02 is allocated (and loaded onto the processor), and it is
> reused to enter any L2 guest. In the future, when prepare_vmcs02() is
> optimized not to set all fields every time, VMCS02_POOL_SIZE should be
> increased.
> 
> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
> ---
>  arch/x86/kvm/vmx.c |  134 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 134 insertions(+)
> 
> --- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
> +++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
> @@ -117,6 +117,7 @@ static int ple_window = KVM_VMX_DEFAULT_
>  module_param(ple_window, int, S_IRUGO);
>  
>  #define NR_AUTOLOAD_MSRS 1
> +#define VMCS02_POOL_SIZE 1
>  
>  struct vmcs {
>  	u32 revision_id;
> @@ -166,6 +167,30 @@ struct __packed vmcs12 {
>  #define VMCS12_SIZE 0x1000
>  
>  /*
> + * When we temporarily switch a vcpu's VMCS (e.g., stop using an L1's VMCS
> + * while we use L2's VMCS), and we wish to save the previous VMCS, we must also
> + * remember on which CPU it was last loaded (vcpu->cpu), so when we return to
> + * using this VMCS we'll know if we're now running on a different CPU and need
> + * to clear the VMCS on the old CPU, and load it on the new one. Additionally,
> + * we need to remember whether this VMCS was launched (vmx->launched), so when
> + * we return to it we know if to VMLAUNCH or to VMRESUME it (we cannot deduce
> + * this from other state, because it's possible that this VMCS had once been
> + * launched, but has since been cleared after a CPU switch).
> + */
> +struct saved_vmcs {
> +	struct vmcs *vmcs;
> +	int cpu;
> +	int launched;
> +};
> +
> +/* Used to remember the last vmcs02 used for some recently used vmcs12s */
> +struct vmcs02_list {
> +	struct list_head list;
> +	gpa_t vmcs12_addr;
> +	struct saved_vmcs vmcs02;
> +};
> +
> +/*
>   * The nested_vmx structure is part of vcpu_vmx, and holds information we need
>   * for correct emulation of VMX (i.e., nested VMX) on this vcpu.
>   */
> @@ -178,6 +203,10 @@ struct nested_vmx {
>  	/* The host-usable pointer to the above */
>  	struct page *current_vmcs12_page;
>  	struct vmcs12 *current_vmcs12;
> +
> +	/* vmcs02_list cache of VMCSs recently used to run L2 guests */
> +	struct list_head vmcs02_pool;
> +	int vmcs02_num;
>  };
>  
>  struct vcpu_vmx {
> @@ -4155,6 +4184,106 @@ static int handle_invalid_op(struct kvm_
>  }
>  
>  /*
> + * To run an L2 guest, we need a vmcs02 based the L1-specified vmcs12.
> + * We could reuse a single VMCS for all the L2 guests, but we also want the
> + * option to allocate a separate vmcs02 for each separate loaded vmcs12 - this
> + * allows keeping them loaded on the processor, and in the future will allow
> + * optimizations where prepare_vmcs02 doesn't need to set all the fields on
> + * every entry if they never change.
> + * So we keep, in vmx->nested.vmcs02_pool, a cache of size VMCS02_POOL_SIZE
> + * (>=0) with a vmcs02 for each recently loaded vmcs12s, most recent first.
> + *
> + * The following functions allocate and free a vmcs02 in this pool.
> + */
> +
> +static void __nested_free_saved_vmcs(void *arg)
> +{
> +	struct saved_vmcs *saved_vmcs = arg;
> +
> +	vmcs_clear(saved_vmcs->vmcs);
> +	if (per_cpu(current_vmcs, saved_vmcs->cpu) == saved_vmcs->vmcs)
> +		per_cpu(current_vmcs, saved_vmcs->cpu) = NULL;
> +}

Should use raw_smp_processor_id instead of saved_vmcs->cpu.
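
I.e., something like this (a sketch of the suggested change, assuming the
function is only ever invoked on the owning CPU):

	static void __nested_free_saved_vmcs(void *arg)
	{
		struct saved_vmcs *saved_vmcs = arg;

		vmcs_clear(saved_vmcs->vmcs);
		/* We are running on saved_vmcs->cpu, so raw_smp_processor_id()
		 * is equivalent but makes that assumption explicit. */
		if (per_cpu(current_vmcs, raw_smp_processor_id()) == saved_vmcs->vmcs)
			per_cpu(current_vmcs, raw_smp_processor_id()) = NULL;
	}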


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 07/30] nVMX: Introduce vmcs02: VMCS used to run L2
  2011-05-16 15:30   ` Marcelo Tosatti
@ 2011-05-16 18:32     ` Nadav Har'El
  2011-05-17 13:20       ` Marcelo Tosatti
  0 siblings, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-16 18:32 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, gleb, avi

On Mon, May 16, 2011, Marcelo Tosatti wrote about "Re: [PATCH 07/30] nVMX: Introduce vmcs02: VMCS used to run L2":
> > +static void __nested_free_saved_vmcs(void *arg)
> > +{
> > +	struct saved_vmcs *saved_vmcs = arg;
> > +
> > +	vmcs_clear(saved_vmcs->vmcs);
> > +	if (per_cpu(current_vmcs, saved_vmcs->cpu) == saved_vmcs->vmcs)
> > +		per_cpu(current_vmcs, saved_vmcs->cpu) = NULL;
> > +}
> 
> Should use raw_smp_processor_id instead of saved_vmcs->cpu.

Hi,

__nested_free_saved_vmcs is designed to be called only when
saved_vmcs->cpu is equal to the current CPU. E.g., it is called as:

        if (saved_vmcs->cpu != -1)
                smp_call_function_single(saved_vmcs->cpu,
                                __nested_free_saved_vmcs, saved_vmcs, 1);

So the current code should be just as correct.

The similar __vcpu_clear has an obscure case where (vmx->vcpu.cpu != cpu),
but in this new function, there isn't.

-- 
Nadav Har'El                        |       Monday, May 16 2011, 13 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Creativity consists of coming up with
http://nadav.harel.org.il           |many ideas, not just that one great idea.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 12/30] nVMX: Implement VMPTRLD
  2011-05-16 14:34   ` Marcelo Tosatti
@ 2011-05-16 18:58     ` Nadav Har'El
  2011-05-16 19:09       ` Nadav Har'El
  0 siblings, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-16 18:58 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, gleb, avi

Hi,

On Mon, May 16, 2011, Marcelo Tosatti wrote about "Re: [PATCH 12/30] nVMX: Implement VMPTRLD":
> > +	if (vmx->nested.current_vmptr != vmcs12_addr) {
> > +		struct vmcs12 *new_vmcs12;
> > +		struct page *page;
> > +		page = nested_get_page(vcpu, vmcs12_addr);
> > +		if (page == NULL) {
> > +			nested_vmx_failInvalid(vcpu);
> 
> This can access a NULL current_vmcs12 pointer, no?

I'm afraid I didn't understand where in this specific code you can access a
NULL current_vmcs12 pointer...

Looking at the rest of this function, the code that frees the previous
current_vmcs12_page, for example, first makes sure that
vmx->nested.current_vmptr != -1ull. current_vmptr, current_vmcs12 and
current_vmcs12_page are all set together (when everything was successful)
and we never release the old page before we test the new one, so we can be
sure that whenever current_vmptr != -1, we have a valid current_vmcs12 as well.

But maybe I'm missing something?

> Apparently other
> code paths are vulnerable to the same issue (as in allowed to execute
> before vmtprld maps guest VMCS). Perhaps a BUG_ON on get_vmcs12 could be
> helpful.

The call to get_vmcs12() should typically be inside an
if (is_guest_mode(vcpu)) block (or in places where we know this to be true),
and to enter L2, we should have already verified that we have a working
vmcs12. This is why I thought it unnecessary to add any assertions to the
trivial inline function get_vmcs12().

But now that I think about it, there does appear to be a problem in
nested_vmx_run(): this is where we should have verified that there is a
current VMCS - i.e., that VMPTRLD was previously used! And it seems I forgot
to test this... :( I'll need to add such a test - not as a BUG_ON but as
a real test that causes the VMLAUNCH instruction to fail (I have to look at
the spec to see exactly how) if VMPTRLD hadn't previously been done.
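
The missing check could look roughly like this (a sketch; the exact
failure mode still needs to be taken from the spec, as noted above):

	/* In nested_vmx_run(), before using the current vmcs12: */
	if (vmx->nested.current_vmptr == -1ull) {
		nested_vmx_failInvalid(vcpu);
		skip_emulated_instruction(vcpu);
		return 1;
	}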


-- 
Nadav Har'El                        |       Monday, May 16 2011, 13 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |I have a great signature, but it won't
http://nadav.harel.org.il           |fit at the end of this message -- Fermat

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 12/30] nVMX: Implement VMPTRLD
  2011-05-16 18:58     ` Nadav Har'El
@ 2011-05-16 19:09       ` Nadav Har'El
  0 siblings, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-16 19:09 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, gleb, avi

On Mon, May 16, 2011, Nadav Har'El wrote about "Re: [PATCH 12/30] nVMX: Implement VMPTRLD":
> But now that I think about it, there does appear to be a problem in
> nested_vmx_run(): This is where we should have verified that there is a
> current VMCS - i.e., that VMPTRLD was previously used! And it seems I forgot
> testing this... :( I'll need to add such a test - not as a BUG_ON but as
> a real test that causes the VMLAUNCH instruction to fail (I have to look at
> the spec to see exactly how) if VMPTRLD hadn't been previously done.

Oh, and there appears to be a similar problem with VMWRITE/VMREAD - they can
also be called before VMPTRLD was ever used, causing us to dereference
bogus pointers.

Thanks for spotting this.

Nadav.

-- 
Nadav Har'El                        |       Monday, May 16 2011, 13 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |If I were two-faced, would I be wearing
http://nadav.harel.org.il           |this one?.... Abraham Lincoln

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 07/30] nVMX: Introduce vmcs02: VMCS used to run L2
  2011-05-16 18:32     ` Nadav Har'El
@ 2011-05-17 13:20       ` Marcelo Tosatti
  0 siblings, 0 replies; 83+ messages in thread
From: Marcelo Tosatti @ 2011-05-17 13:20 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: kvm, gleb, avi

On Mon, May 16, 2011 at 09:32:53PM +0300, Nadav Har'El wrote:
> On Mon, May 16, 2011, Marcelo Tosatti wrote about "Re: [PATCH 07/30] nVMX: Introduce vmcs02: VMCS used to run L2":
> > > +static void __nested_free_saved_vmcs(void *arg)
> > > +{
> > > +	struct saved_vmcs *saved_vmcs = arg;
> > > +
> > > +	vmcs_clear(saved_vmcs->vmcs);
> > > +	if (per_cpu(current_vmcs, saved_vmcs->cpu) == saved_vmcs->vmcs)
> > > +		per_cpu(current_vmcs, saved_vmcs->cpu) = NULL;
> > > +}
> > 
> > Should use raw_smp_processor_id instead of saved_vmcs->cpu.
> 
> Hi,
> 
> __nested_free_saved_vmcs is designed to be called only on the when
> saved_vmcs->cpu is equal to the current CPU. E.g., it is called as:
> 
>         if (saved_vmcs->cpu != -1)
>                 smp_call_function_single(saved_vmcs->cpu,
>                                 __nested_free_saved_vmcs, saved_vmcs, 1);

Yes, using raw_smp_processor_id makes that fact explicit to the reader.

> So the current code should be just as correct.

It is correct.

> 
> The similar __vcpu_clear has an obscure case where (vmx->vcpu.cpu != cpu),
> but in this new function, there isn't.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-12 16:51           ` Gleb Natapov
  2011-05-12 17:00             ` Avi Kivity
@ 2011-05-22 19:32             ` Nadav Har'El
  2011-05-23  9:37               ` Joerg Roedel
  2011-05-23  9:52               ` Avi Kivity
  1 sibling, 2 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-22 19:32 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, kvm, abelg

On Thu, May 12, 2011, Gleb Natapov wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> > But if my interpretation of the code is correct, SVM isn't much closer
> > than VMX to the goal of moving this logic to x86.c. When some logic is
> > moved there, both SVM and VMX code will need to change - perhaps even
> > considerably. So how will it be helpful to make VMX behave exactly like
> > SVM does now, when the latter will also need to change considerably?
> > 
> SVM design is much close to the goal of moving the logic into x86.c
> because IIRC it does not bypass parsing of IDT vectoring info into arch
> independent structure. VMX code uses vmx->idt_vectoring_info directly.

At the risk of sounding blasphemous, I'd like to make the case that perhaps
the current nested-VMX design - regarding the IDT-vectoring-info-field
handling - is actually closer than nested-SVM to the goal of moving clean
nested-supporting logic into x86.c, instead of having ad-hoc, unnatural,
workarounds.

Let me explain, and see if you agree with my logic:

We discover at exit time whether the virtualization hardware (VMX or SVM)
exited while *delivering* an interrupt or exception to the current guest.
This is known as "idt-vectoring-information" in VMX.

What do we need to do with this idt-vectoring-information? In regular (non-
nested) guests, the answer is simple: On the next entry, we need to inject
this event again into the guest, so it can resume the delivery of the
same event it was trying to deliver. This is why the nested-unaware code
has a vmx_complete_interrupts which basically adds this idt-vectoring-info
into KVM's event queue, which on the next entry will be injected similarly
to the way virtual interrupts from userspace are injected, and so on.

But with nested virtualization, this is *not* what is supposed to happen -
we do not *always* need to inject the event to the guest. We will only need
to inject the event if the next entry will be again to the same guest, i.e.,
L1 after L1, or L2 after L2. If the idt-vectoring-info came from L2, but
our next entry will be into L1 (i.e., a nested exit), we *shouldn't* inject
the event as usual, but should rather pass this idt-vectoring-info field
as the exit information that L1 gets (in nested vmx terminology, in vmcs12).

However, at the time of exit, we cannot know for sure whether L2 will actually
run next, because it is still possible that an injection from user space,
before the next entry, will cause us to decide to exit to L1.

Therefore, I believe that the clean solution isn't to leave the original
non-nested logic that always queues the idt-vectoring-info assuming it will
be injected, and then, when it shouldn't be (because we want to exit to L1
during entry), to skip the entry once as a "trick" to avoid this wrong
injection.

Rather, a clean solution is, I think, to recognize that in nested
virtualization, idt-vectoring-info is a different kind of beast from regular
injected events, and it needs to be saved at exit time in a different field
(which will of course be common to SVM and VMX). Only at entry time, after
the regular injection code has run (which may cause a nested exit), would we
call an x86_op to handle this special injection.
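
In x86.c this could look something like the sketch below, where
complete_nested_idt_vectoring is a hypothetical new x86_op (no such
callback exists today):

	/* Called from the entry path, after the regular injection code
	 * (which may have decided to do a nested exit instead of
	 * entering L2). */
	static void complete_nested_injection(struct kvm_vcpu *vcpu)
	{
		if (!is_guest_mode(vcpu))
			return;
		/* Still entering the same guest (L2 after L2): let the vendor
		 * code turn the saved idt-vectoring info back into an
		 * injection. If a nested exit was decided above, the vendor
		 * code has instead passed it to L1 as exit information. */
		kvm_x86_ops->complete_nested_idt_vectoring(vcpu);
	}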

The benefit of this approach, which is closer to the current vmx code,
is, I think, that x86.c will contain clear, self-explanatory nested logic,
instead of relying on vmx.c or svm.c circumventing various x86.c functions
and mechanisms to do something different from what they were meant to do.

What do you think?


-- 
Nadav Har'El                        |       Sunday, May 22 2011, 19 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |If I were two-faced, would I be wearing
http://nadav.harel.org.il           |this one?.... Abraham Lincoln

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-22 19:32             ` Nadav Har'El
@ 2011-05-23  9:37               ` Joerg Roedel
  2011-05-23  9:52               ` Avi Kivity
  1 sibling, 0 replies; 83+ messages in thread
From: Joerg Roedel @ 2011-05-23  9:37 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: Gleb Natapov, Avi Kivity, kvm, abelg

On Sun, May 22, 2011 at 10:32:39PM +0300, Nadav Har'El wrote:

> At the risk of sounding blasphemous, I'd like to make the case that perhaps
> the current nested-VMX design - regarding the IDT-vectoring-info-field
> handling - is actually closer than nested-SVM to the goal of moving clean
> nested-supporting logic into x86.c, instead of having ad-hoc, unnatural,
> workarounds.

Well, the nested SVM implementation is certainly not perfect in this
regard :)

> Therefore, I believe that the clean solution isn't to leave the original
> non-nested logic that always queues the idt-vectoring-info assuming it will
> be injected, and then if it shouldn't (because we want to exit during entry)
> we need to skip the entry once as a "trick" to avoid this wrong injection.
> 
> Rather, a clean solution is, I think, to recognize that in nested
> virtualization, idt-vectoring-info is a different kind of beast than regular
> injected events, and it needs to be saved at exit time in a different field
> (which will of course be common to SVM and VMX). Only at entry time, after
> the regular injection code (which may cause a nested exit), we can call a
> x86_op to handle this special injection.

Things are complicated either way. If you keep the vectoring-info
separate from the kvm exception queue you need special logic to combine
the vectoring-info and the queue. For example, imagine something is
pending in idt-vectoring info and the intercept causes another
exception for the guest. KVM then needs to turn this into a #DF. When
we just queue the vectoring-info into the exception queue we get this
implicitly, without extra code. This is a cleaner way imho.
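
The implicit #DF handling referred to here comes from the exception
queue logic in x86.c; very roughly (simplified from
kvm_multiple_exception(), ignoring contributory-exception and
triple-fault details):

	static void queue_exception_sketch(struct kvm_vcpu *vcpu, unsigned nr)
	{
		if (!vcpu->arch.exception.pending) {
			vcpu->arch.exception.pending = true;
			vcpu->arch.exception.has_error_code = false;
			vcpu->arch.exception.nr = nr;
			return;
		}
		/* A second exception while one is already queued: promote
		 * the pair to a double fault. */
		vcpu->arch.exception.nr = DF_VECTOR;
		vcpu->arch.exception.has_error_code = true;
		vcpu->arch.exception.error_code = 0;
	}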

On the other hand, when using the exception queue we need to keep extra
information for nesting in the queue, because an event which is just
re-injected into L2 must not cause a nested vmexit, even if the
exception vector is intercepted by L1. But this is the same for SVM and
VMX, so we can do it in generic x86 code. This is not the case when
keeping track of idt-vectoring info separately in architecture code.

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-22 19:32             ` Nadav Har'El
  2011-05-23  9:37               ` Joerg Roedel
@ 2011-05-23  9:52               ` Avi Kivity
  2011-05-23 13:02                 ` Joerg Roedel
  1 sibling, 1 reply; 83+ messages in thread
From: Avi Kivity @ 2011-05-23  9:52 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: Gleb Natapov, kvm, abelg

On 05/22/2011 10:32 PM, Nadav Har'El wrote:
> On Thu, May 12, 2011, Gleb Natapov wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> >  >  But if my interpretation of the code is correct, SVM isn't much closer
> >  >  than VMX to the goal of moving this logic to x86.c. When some logic is
> >  >  moved there, both SVM and VMX code will need to change - perhaps even
> >  >  considerably. So how will it be helpful to make VMX behave exactly like
> >  >  SVM does now, when the latter will also need to change considerably?
> >  >
> >  SVM design is much close to the goal of moving the logic into x86.c
> >  because IIRC it does not bypass parsing of IDT vectoring info into arch
> >  independent structure. VMX code uses vmx->idt_vectoring_info directly.
>
> At the risk of sounding blasphemous, I'd like to make the case that perhaps
> the current nested-VMX design - regarding the IDT-vectoring-info-field
> handling - is actually closer than nested-SVM to the goal of moving clean
> nested-supporting logic into x86.c, instead of having ad-hoc, unnatural,
> workarounds.
>
> Let me explain, and see if you agree with my logic:
>
> We discover at exit time whether the virtualization hardware (VMX or SVM)
> exited while *delivering* an interrupt or exception to the current guest.
> This is known as "idt-vectoring-information" in VMX.
>
> What do we need to do with this idt-vectoring-information? In regular (non-
> nested) guests, the answer is simple: On the next entry, we need to inject
> this event again into the guest, so it can resume the delivery of the
> same event it was trying to deliver. This is why the nested-unaware code
> has a vmx_complete_interrupts which basically adds this idt-vectoring-info
> into KVM's event queue, which on the next entry will be injected similarly
> to the way virtual interrupts from userspace are injected, and so on.

The other thing we may need to do is to expose it to userspace in case 
we're live migrating at exactly this point in time.

> But with nested virtualization, this is *not* what is supposed to happen -
> we do not *always* need to inject the event to the guest. We will only need
> to inject the event if the next entry will be again to the same guest, i.e.,
> L1 after L1, or L2 after L2. If the idt-vectoring-info came from L2, but
> our next entry will be into L1 (i.e., a nested exit), we *shouldn't* inject
> the event as usual, but should rather pass this idt-vectoring-info field
> as the exit information that L1 gets (in nested vmx terminology, in vmcs12).
>
> However, at the time of exit, we cannot know for sure whether L2 will actually
> run next, because it is still possible that an injection from user space,
> before the next entry, will cause us to decide to exit to L1.
>
> Therefore, I believe that the clean solution isn't to leave the original
> non-nested logic that always queues the idt-vectoring-info assuming it will
> be injected, and then if it shouldn't (because we want to exit during entry)
> we need to skip the entry once as a "trick" to avoid this wrong injection.
>
> Rather, a clean solution is, I think, to recognize that in nested
> virtualization, idt-vectoring-info is a different kind of beast than regular
> injected events, and it needs to be saved at exit time in a different field
> (which will of course be common to SVM and VMX). Only at entry time, after
> the regular injection code (which may cause a nested exit), we can call a
> x86_op to handle this special injection.
>
> The benefit of this approach, which is closer to the current vmx code,
> is, I think, that x86.c will contain clear, self-explanatory nested logic,
> instead of relying on vmx.c or svm.c circumventing various x86.c functions
> and mechanisms to do something different from what they were meant to do.
>

IMO this will cause confusion, especially with the user interfaces used 
to read/write pending events.

I think what we need to do is the following (see the sketch after the list):

1. change ->interrupt_allowed() to return true if the interrupt flag is 
unmasked OR if in a nested guest, and we're intercepting interrupts
2. change ->set_irq() to cause a nested vmexit if in a nested guest and 
we're intercepting interrupts
3. change ->nmi_allowed() and ->set_nmi() in a similar way
4. add a .injected flag to the interrupt queue which overrides the 
nested vmexit for VM_ENTRY_INTR_INFO_FIELD and the svm equivalent; if 
present normal injection takes place (or an error vmexit if the 
interrupt flag is clear and we cannot inject)
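
A rough sketch of how points 1, 2 and 4 could look on the VMX side, using
hypothetical helpers (nested_exit_on_intr(), nested_vmx_vmexit_on_interrupt())
and the proposed .injected flag, none of which exist in the current code:

	static int vmx_interrupt_allowed_sketch(struct kvm_vcpu *vcpu)
	{
		/* 1: interrupts are "allowed" while in a nested guest that
		 * intercepts them - injection then becomes a nested vmexit. */
		if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
			return 1;
		return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
			!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
			  (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
	}

	static void vmx_inject_irq_sketch(struct kvm_vcpu *vcpu)
	{
		/* 2 + 4: a freshly injected interrupt causes a nested vmexit;
		 * a re-injected one (.injected set) goes into
		 * VM_ENTRY_INTR_INFO_FIELD as usual. */
		if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu) &&
		    !vcpu->arch.interrupt.injected) {
			nested_vmx_vmexit_on_interrupt(vcpu);
			return;
		}
		/* ... normal VM_ENTRY_INTR_INFO_FIELD injection ... */
	}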


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23  9:52               ` Avi Kivity
@ 2011-05-23 13:02                 ` Joerg Roedel
  2011-05-23 13:08                   ` Avi Kivity
  2011-05-23 13:18                   ` Nadav Har'El
  0 siblings, 2 replies; 83+ messages in thread
From: Joerg Roedel @ 2011-05-23 13:02 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Nadav Har'El, Gleb Natapov, kvm, abelg

On Mon, May 23, 2011 at 12:52:50PM +0300, Avi Kivity wrote:
> On 05/22/2011 10:32 PM, Nadav Har'El wrote:
>> What do we need to do with this idt-vectoring-information? In regular (non-
>> nested) guests, the answer is simple: On the next entry, we need to inject
>> this event again into the guest, so it can resume the delivery of the
>> same event it was trying to deliver. This is why the nested-unaware code
>> has a vmx_complete_interrupts which basically adds this idt-vectoring-info
>> into KVM's event queue, which on the next entry will be injected similarly
>> to the way virtual interrupts from userspace are injected, and so on.
>
> The other thing we may need to do, is to expose it to userspace in case  
> we're live migrating at exactly this point in time.

About live migration with nesting: we had discussed the idea of just
doing a VMEXIT(INTR) if the vcpu runs nested and we want to migrate.
The problem was that the hypervisor may not expect an INTR intercept.

How about doing an implicit VMEXIT in this case and an implicit VMRUN
after the vcpu is migrated? The nested hypervisor will not see the
vmexit and the vcpu will be in a state where it is safe to migrate. This
should work for nested-vmx too if the guest-state is written back to
guest memory on VMEXIT. Is this the case?

	Joerg

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23 13:02                 ` Joerg Roedel
@ 2011-05-23 13:08                   ` Avi Kivity
  2011-05-23 13:40                     ` Joerg Roedel
  2011-05-23 13:18                   ` Nadav Har'El
  1 sibling, 1 reply; 83+ messages in thread
From: Avi Kivity @ 2011-05-23 13:08 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Nadav Har'El, Gleb Natapov, kvm, abelg

On 05/23/2011 04:02 PM, Joerg Roedel wrote:
> On Mon, May 23, 2011 at 12:52:50PM +0300, Avi Kivity wrote:
> >  On 05/22/2011 10:32 PM, Nadav Har'El wrote:
> >>  What do we need to do with this idt-vectoring-information? In regular (non-
> >>  nested) guests, the answer is simple: On the next entry, we need to inject
> >>  this event again into the guest, so it can resume the delivery of the
> >>  same event it was trying to deliver. This is why the nested-unaware code
> >>  has a vmx_complete_interrupts which basically adds this idt-vectoring-info
> >>  into KVM's event queue, which on the next entry will be injected similarly
> >>  to the way virtual interrupts from userspace are injected, and so on.
> >
> >  The other thing we may need to do, is to expose it to userspace in case
> >  we're live migrating at exactly this point in time.
>
> About live-migration with nesting, we had discussed the idea of just
> doing an VMEXIT(INTR) if the vcpu runs nested and we want to migrate.
> The problem was that the hypervisor may not expect an INTR intercept.
>
> How about doing an implicit VMEXIT in this case and an implicit VMRUN
> after the vcpu is migrated?

What if there's something in EXIT_INT_INFO?

>   The nested hypervisor will not see the
> vmexit and the vcpu will be in a state where it is safe to migrate. This
> should work for nested-vmx too if the guest-state is written back to
> guest memory on VMEXIT. Is this the case?

It is the case with the current implementation, and we can/should make 
it so in future implementations, just before exit to userspace.  Or at 
least provide an ABI to sync memory.

But I don't see why we shouldn't just migrate all the hidden state (the 
in-guest-mode flag, svm host paging mode, svm host interrupt state, vmcb 
address/vmptr, etc.).  It's more state, but no thinking is involved, so 
it's clearly superior.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23 13:02                 ` Joerg Roedel
  2011-05-23 13:08                   ` Avi Kivity
@ 2011-05-23 13:18                   ` Nadav Har'El
  1 sibling, 0 replies; 83+ messages in thread
From: Nadav Har'El @ 2011-05-23 13:18 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Avi Kivity, Gleb Natapov, kvm, abelg

On Mon, May 23, 2011, Joerg Roedel wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> About live-migration with nesting, we had discussed the idea of just
> doing an VMEXIT(INTR) if the vcpu runs nested and we want to migrate.
> The problem was that the hypervisor may not expect an INTR intercept.
> 
> How about doing an implicit VMEXIT in this case and an implicit VMRUN
> after the vcpu is migrated? The nested hypervisor will not see the
> vmexit and the vcpu will be in a state where it is safe to migrate. This
> should work for nested-vmx too if the guest-state is written back to
> guest memory on VMEXIT. Is this the case?

Indeed, on nested exit (L2 to L1), the L2 guest state is written back to
vmcs12 (in guest memory). In theory, at that point, the vmcs02 (the vmcs
used by L0 to actually run L2) can be discarded, without risking losing
anything.

The receiving hypervisor will need to remember to do that implicit VMRUN
when it starts the guest; it also needs to know what the current L2 guest
is - in VMX this would be vmx->nested.current_vmptr, which needs to be
migrated as well (on the other hand, other variables, like
vmx->nested.current_vmcs12, will need to be recalculated by the receiver and
not migrated as-is). I haven't started considering how to wrap up all these
pieces into a complete working solution - it is one of the things on my TODO
list after the basic nested VMX is merged.

-- 
Nadav Har'El                        |       Monday, May 23 2011, 19 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Live as if you were to die tomorrow,
http://nadav.harel.org.il           |learn as if you were to live forever.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23 13:08                   ` Avi Kivity
@ 2011-05-23 13:40                     ` Joerg Roedel
  2011-05-23 13:52                       ` Avi Kivity
  0 siblings, 1 reply; 83+ messages in thread
From: Joerg Roedel @ 2011-05-23 13:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Nadav Har'El, Gleb Natapov, kvm, abelg

On Mon, May 23, 2011 at 04:08:00PM +0300, Avi Kivity wrote:
> On 05/23/2011 04:02 PM, Joerg Roedel wrote:

>> About live-migration with nesting, we had discussed the idea of just
>> doing an VMEXIT(INTR) if the vcpu runs nested and we want to migrate.
>> The problem was that the hypervisor may not expect an INTR intercept.
>>
>> How about doing an implicit VMEXIT in this case and an implicit VMRUN
>> after the vcpu is migrated?
>
> What if there's something in EXIT_INT_INFO?

On real SVM hardware EXIT_INT_INFO should only contain something for
exception and npt intercepts. These are all handled in the kernel and do
not cause an exit to user-space, so no valid EXIT_INT_INFO should be
around when we actually go back to user-space (which is when migration
can happen).

The exception might be the #PF/NPT intercept when the guest is doing
very obscure things like putting an exception/interrupt handler on mmio
memory, but that isn't really supported by KVM anyway so I doubt we
should care.

Unless I miss something here we should be safe by just not looking at
EXIT_INT_INFO while migrating.

>>   The nested hypervisor will not see the
>> vmexit and the vcpu will be in a state where it is safe to migrate. This
>> should work for nested-vmx too if the guest-state is written back to
>> guest memory on VMEXIT. Is this the case?
>
> It is the case with the current implementation, and we can/should make  
> it so in future implementations, just before exit to userspace.  Or at  
> least provide an ABI to sync memory.
>
> But I don't see why we shouldn't just migrate all the hidden state (in  
> guest mode flag, svm host paging mode, svm host interrupt state, vmcb  
> address/vmptr, etc.).  It's more state, but no thinking is involved, so  
> it's clearly superior.

An issue is that there is different state to migrate for Intel and AMD
hosts. If we keep all that information in guest memory, the kvm kernel
module can handle those details, and all KVM needs to migrate is the
in-guest-mode flag and the gpa of the currently executing vmcb/vmcs.
This state should be enough for Intel and AMD nesting.

The next benefit is that it works seamlessly even if the state that
needs to be transferred is extended (e.g. by emulating a new
virtualization hardware feature). This support can be implemented in the
kernel module, and no changes to qemu are required.


	Joerg


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23 13:40                     ` Joerg Roedel
@ 2011-05-23 13:52                       ` Avi Kivity
  2011-05-23 14:10                         ` Nadav Har'El
  2011-05-23 14:28                         ` Joerg Roedel
  0 siblings, 2 replies; 83+ messages in thread
From: Avi Kivity @ 2011-05-23 13:52 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Nadav Har'El, Gleb Natapov, kvm, abelg

On 05/23/2011 04:40 PM, Joerg Roedel wrote:
> On Mon, May 23, 2011 at 04:08:00PM +0300, Avi Kivity wrote:
> >  On 05/23/2011 04:02 PM, Joerg Roedel wrote:
>
> >>  About live-migration with nesting, we had discussed the idea of just
> >>  doing an VMEXIT(INTR) if the vcpu runs nested and we want to migrate.
> >>  The problem was that the hypervisor may not expect an INTR intercept.
> >>
> >>  How about doing an implicit VMEXIT in this case and an implicit VMRUN
> >>  after the vcpu is migrated?
> >
> >  What if there's something in EXIT_INT_INFO?
>
> On real SVM hardware EXIT_INT_INFO should only contain something for
> exception and npt intercepts. These are all handled in the kernel and do
> not cause an exit to user-space so that no valid EXIT_INT_INFO should be
> around when we actually go back to user-space (so that migration can
> happen).
>
> The exception might be the #PF/NPT intercept when the guest is doing
> very obscure things like putting an exception/interrupt handler on mmio
> memory, but that isn't really supported by KVM anyway so I doubt we
> should care.
>
> Unless I miss something here we should be safe by just not looking at
> EXIT_INT_INFO while migrating.

Agree.

> >>    The nested hypervisor will not see the
> >>  vmexit and the vcpu will be in a state where it is safe to migrate. This
> >>  should work for nested-vmx too if the guest-state is written back to
> >>  guest memory on VMEXIT. Is this the case?
> >
> >  It is the case with the current implementation, and we can/should make
> >  it so in future implementations, just before exit to userspace.  Or at
> >  least provide an ABI to sync memory.
> >
> >  But I don't see why we shouldn't just migrate all the hidden state (in
> >  guest mode flag, svm host paging mode, svm host interrupt state, vmcb
> >  address/vmptr, etc.).  It's more state, but no thinking is involved, so
> >  it's clearly superior.
>
> An issue is that there is different state to migrate for Intel and AMD
> hosts. If we keep all that information in guest memory the kvm kernel
> module can handle those details and all KVM needs to migrate is the
> in-guest-mode flag and the gpa of the vmcb/vmcs which is currently
> executed. This state should be enough for Intel and AMD nesting.

I think for Intel there is no hidden state apart from in-guest-mode 
(there is the VMPTR, but it is an actual register accessible via 
instructions).  For svm we can keep the hidden state in the host 
state-save area (including the vmcb pointer).  The only risk is that svm 
will gain hardware support for nesting, and will choose a different 
format than ours.

An alternative is a fake MSR for storing this data, or just another 
get/set ioctl pair.  We'll have a flags field that says which fields are 
filled in.
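
Such an ioctl payload might look something like this (purely
illustrative; no such ABI exists):

	struct kvm_nested_state_sketch {
		__u32 flags;           /* which of the fields below are valid */
		__u32 format;          /* 0 = VMX, 1 = SVM */
		__u64 vmxon_ptr;       /* VMX: the VMXON region */
		__u64 current_vmptr;   /* VMX: current vmcs12 gpa / SVM: vmcb gpa */
		__u8  in_guest_mode;   /* was the vcpu running L2? */
		__u8  pad[7];
	};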

> The next benefit is that it works seemlessly even if the state that
> needs to be transfered is extended (e.g. by emulating a new
> virtualization hardware feature). This support can be implemented in the
> kernel module and no changes to qemu are required.

I agree it's a benefit.  But I don't like making the fake vmexit part of 
live migration; if it turns out to be the wrong choice, it's hard to undo.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23 13:52                       ` Avi Kivity
@ 2011-05-23 14:10                         ` Nadav Har'El
  2011-05-23 14:32                           ` Avi Kivity
  2011-05-23 14:28                         ` Joerg Roedel
  1 sibling, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-23 14:10 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Joerg Roedel, Gleb Natapov, kvm, abelg

On Mon, May 23, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> I think for Intel there is no hidden state apart from in-guest-mode 
> (there is the VMPTR, but it is an actual register accessible via 
> instructions).

is_guest_mode(vcpu), vmx->nested.vmxon, vmx->nested.current_vmptr are the
only three things I can think of. Vmxon is actually more than a boolean
(there's also a vmxon pointer).

What do you mean by the current_vmptr being available through an instruction?
It is (VMPTRST), but this would be an instruction run on L1 (emulated by L0).
How would L0's user space use that instruction?

> I agree it's a benefit.  But I don't like making the fake vmexit part of 
> live migration, if it turns out the wrong choice it's hard to undo it.

If you don't do this "fake vmexit", you'll need to migrate both vmcs01 and
the current vmcs02 - the fact that vmcs12 is in guest memory will not be
enough, because vmcs02 isn't copied back to vmcs12 until the nested exit.


-- 
Nadav Har'El                        |       Monday, May 23 2011, 19 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |The world is coming to an end ... SAVE
http://nadav.harel.org.il           |YOUR BUFFERS!!!

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23 13:52                       ` Avi Kivity
  2011-05-23 14:10                         ` Nadav Har'El
@ 2011-05-23 14:28                         ` Joerg Roedel
  2011-05-23 14:34                           ` Avi Kivity
  1 sibling, 1 reply; 83+ messages in thread
From: Joerg Roedel @ 2011-05-23 14:28 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Nadav Har'El, Gleb Natapov, kvm, abelg

On Mon, May 23, 2011 at 04:52:47PM +0300, Avi Kivity wrote:
> On 05/23/2011 04:40 PM, Joerg Roedel wrote:

>> The next benefit is that it works seemlessly even if the state that
>> needs to be transfered is extended (e.g. by emulating a new
>> virtualization hardware feature). This support can be implemented in the
>> kernel module and no changes to qemu are required.
>
> I agree it's a benefit.  But I don't like making the fake vmexit part of  
> live migration, if it turns out the wrong choice it's hard to undo it.

Well, saving the state to the host-save-area and doing a fake-vmexit is
logically the same, only the memory where the information is stored
differs.

To user-space we can provide a VCPU_FREEZE/VCPU_UNFREEZE ioctl which
does all the necessary things.
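
On the kernel side such an ioctl might boil down to something like this
(a sketch only; the nested_frozen flag and the leave_nested/reenter_nested
callbacks are hypothetical):

	static int kvm_vcpu_freeze_sketch(struct kvm_vcpu *vcpu)
	{
		vcpu->arch.nested_frozen = is_guest_mode(vcpu);
		if (vcpu->arch.nested_frozen)
			/* Implicit vmexit: write L2 state back to guest memory
			 * (vmcb / vmcs12), so only the "was in guest mode" flag
			 * and the vmcb/vmcs pointer need to be migrated. */
			kvm_x86_ops->leave_nested(vcpu);
		return 0;
	}

	static int kvm_vcpu_unfreeze_sketch(struct kvm_vcpu *vcpu)
	{
		if (vcpu->arch.nested_frozen)
			/* Implicit VMRUN / vmentry on the destination. */
			kvm_x86_ops->reenter_nested(vcpu);
		return 0;
	}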

	Joerg


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23 14:10                         ` Nadav Har'El
@ 2011-05-23 14:32                           ` Avi Kivity
  2011-05-23 14:44                             ` Nadav Har'El
  0 siblings, 1 reply; 83+ messages in thread
From: Avi Kivity @ 2011-05-23 14:32 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: Joerg Roedel, Gleb Natapov, kvm, abelg

On 05/23/2011 05:10 PM, Nadav Har'El wrote:
> On Mon, May 23, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> >  I think for Intel there is no hidden state apart from in-guest-mode
> >  (there is the VMPTR, but it is an actual register accessible via
> >  instructions).
>
> is_guest_mode(vcpu), vmx->nested.vmxon, vmx->nested.current_vmptr are the
> only three things I can think of. Vmxon is actually more than a boolean
> (there's also a vmxon pointer).
>
> What do you mean by the current_vmptr being available through an instruction?
> It is (VMPTRST), but this would be an instruction run on L1 (emulated by L0).
> How would L0's user space use that instruction?

I mean that it is an architectural register rather than "hidden state".  
It doesn't mean that L0 user space can use it.


> >  I agree it's a benefit.  But I don't like making the fake vmexit part of
> >  live migration, if it turns out the wrong choice it's hard to undo it.
>
> If you don't do this "fake vmexit", you'll need to migrate both vmcs01 and
> the current vmcs02 - the fact that vmcs12 is in guest memory will not be
> enough, because vmcs02 isn't copied back to vmcs12 until the nested exit.
>

vmcs01 and vmcs02 will both be generated from vmcs12.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23 14:28                         ` Joerg Roedel
@ 2011-05-23 14:34                           ` Avi Kivity
  2011-05-23 14:58                             ` Joerg Roedel
  0 siblings, 1 reply; 83+ messages in thread
From: Avi Kivity @ 2011-05-23 14:34 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Nadav Har'El, Gleb Natapov, kvm, abelg

On 05/23/2011 05:28 PM, Joerg Roedel wrote:
> On Mon, May 23, 2011 at 04:52:47PM +0300, Avi Kivity wrote:
> >  On 05/23/2011 04:40 PM, Joerg Roedel wrote:
>
> >>  The next benefit is that it works seamlessly even if the state that
> >>  needs to be transferred is extended (e.g. by emulating a new
> >>  virtualization hardware feature). This support can be implemented in the
> >>  kernel module and no changes to qemu are required.
> >
> >  I agree it's a benefit.  But I don't like making the fake vmexit part of
> >  live migration, if it turns out the wrong choice it's hard to undo it.
>
> Well, saving the state to the host-save-area and doing a fake-vmexit is
> logically the same, only the memory where the information is stored
> differs.

Right.  I guess the main difference is "info registers" after a stop.

> To user-space we can provide a VCPU_FREEZE/VCPU_UNFREEZE ioctl which
> does all the necessary things.
>

Or we can automatically flush things on any exit to userspace.  They 
should be very rare in guest mode.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23 14:32                           ` Avi Kivity
@ 2011-05-23 14:44                             ` Nadav Har'El
  2011-05-23 15:23                               ` Avi Kivity
  0 siblings, 1 reply; 83+ messages in thread
From: Nadav Har'El @ 2011-05-23 14:44 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Joerg Roedel, Gleb Natapov, kvm, abelg

On Mon, May 23, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> vmcs01 and vmcs02 will both be generated from vmcs12.

If you don't do a clean nested exit (from L2 to L1), vmcs02 can't be generated
from vmcs12... while L2 runs, it is possible that it modifies vmcs02 (e.g.,
non-trapped bits of guest_cr0), and these modifications are not copied back
to vmcs12 until the nested exit (when prepare_vmcs12() is called to perform
this task).

If you do a nested exit (a "fake" one), vmcs12 is made up to date, and then
indeed vmcs02 can be thrown away and regenerated.
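
In other words, the migration-time flush is just a stripped-down nested
exit. A sketch (the function name nested_vmx_flush_for_migration() is
made up, and prepare_vmcs12() is assumed here to take just the vcpu -
the real signature is whatever the patch series defines):

  /* Sketch only, not actual patch code.  Before vcpu state is read out
   * for migration, bring vmcs12 - which lives in L1 guest memory and
   * therefore migrates for free - up to date with the live vmcs02. */
  static void nested_vmx_flush_for_migration(struct kvm_vcpu *vcpu)
  {
          if (!is_guest_mode(vcpu))
                  return;         /* nothing volatile to flush */

          /* Copy the fields L2 may have modified (non-trapped guest_cr0
           * bits, rip, ...) from vmcs02 back into vmcs12, exactly as a
           * real nested exit would. */
          prepare_vmcs12(vcpu);

          /* vmcs02 now contains nothing that cannot be rebuilt from
           * vmcs12 on the next entry into L2, so it does not need to be
           * migrated at all. */
  }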

Nadav.

-- 
Nadav Har'El                        |       Monday, May 23 2011, 19 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Jury: Twelve people who determine which
http://nadav.harel.org.il           |client has the better lawyer.


* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23 14:34                           ` Avi Kivity
@ 2011-05-23 14:58                             ` Joerg Roedel
  2011-05-23 15:19                               ` Avi Kivity
  0 siblings, 1 reply; 83+ messages in thread
From: Joerg Roedel @ 2011-05-23 14:58 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Nadav Har'El, Gleb Natapov, kvm, abelg

On Mon, May 23, 2011 at 05:34:20PM +0300, Avi Kivity wrote:
> On 05/23/2011 05:28 PM, Joerg Roedel wrote:

>> To user-space we can provide a VCPU_FREEZE/VCPU_UNFREEZE ioctl which
>> does all the necessary things.
>
> Or we can automatically flush things on any exit to userspace.  They  
> should be very rare in guest mode.

This would make nesting mostly transparent to migration, so it sounds
good in this regard.

I do not completely agree that user-space exits in guest-mode are rare;
this depends on the hypervisor in L1. In Hyper-V, for example, the
root-domain uses hardware virtualization too and has direct access to
devices (at least to some degree). IOIO is not intercepted in the
root-domain, for example. Not sure about the MMIO regions.


	Joerg



* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23 14:58                             ` Joerg Roedel
@ 2011-05-23 15:19                               ` Avi Kivity
  0 siblings, 0 replies; 83+ messages in thread
From: Avi Kivity @ 2011-05-23 15:19 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Nadav Har'El, Gleb Natapov, kvm, abelg

On 05/23/2011 05:58 PM, Joerg Roedel wrote:
> On Mon, May 23, 2011 at 05:34:20PM +0300, Avi Kivity wrote:
> >  On 05/23/2011 05:28 PM, Joerg Roedel wrote:
>
> >>  To user-space we can provide a VCPU_FREEZE/VCPU_UNFREEZE ioctl which
> >>  does all the necessary things.
> >
> >  Or we can automatically flush things on any exit to userspace.  They
> >  should be very rare in guest mode.
>
> This would make nesting mostly transparent to migration, so it sounds
> good in this regard.
>
> I do not completely agree that user-space exits in guest-mode are rare;
> this depends on the hypervisor in L1. In Hyper-V, for example, the
> root-domain uses hardware virtualization too and has direct access to
> devices (at least to some degree). IOIO is not intercepted in the
> root-domain, for example. Not sure about the MMIO regions.

Good point.  We were also talking about passing through virtio (or even 
host) devices to the guest.

So an ioctl to flush volatile state to memory would be a good idea.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23 14:44                             ` Nadav Har'El
@ 2011-05-23 15:23                               ` Avi Kivity
  2011-05-23 18:06                                 ` Alexander Graf
  0 siblings, 1 reply; 83+ messages in thread
From: Avi Kivity @ 2011-05-23 15:23 UTC (permalink / raw)
  To: Nadav Har'El; +Cc: Joerg Roedel, Gleb Natapov, kvm, abelg

On 05/23/2011 05:44 PM, Nadav Har'El wrote:
> On Mon, May 23, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> >  vmcs01 and vmcs02 will both be generated from vmcs12.
>
> If you don't do a clean nested exit (from L2 to L1), vmcs02 can't be generated
> from vmcs12... while L2 runs, it is possible that it modifies vmcs02 (e.g.,
> non-trapped bits of guest_cr0), and these modifications are not copied back
> to vmcs12 until the nested exit (when prepare_vmcs12() is called to perform
> this task).
>
> If you do a nested exit (a "fake" one), vmcs12 is made up to date, and then
> indeed vmcs02 can be thrown away and regenerated.

You would flush this state back to the vmcs.  But that just confirms 
Joerg's statement that a fake vmexit/vmrun is more or less equivalent.

The question is whether %rip points to the VMRUN/VMLAUNCH instruction, 
HOST_RIP (or the next instruction for svm), or to guest code.  But the 
actual things we need to do are all very similar subsets of a vmexit.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23 15:23                               ` Avi Kivity
@ 2011-05-23 18:06                                 ` Alexander Graf
  2011-05-24 11:09                                   ` Avi Kivity
  0 siblings, 1 reply; 83+ messages in thread
From: Alexander Graf @ 2011-05-23 18:06 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Nadav Har'El, Joerg Roedel, Gleb Natapov, kvm, abelg


On 23.05.2011, at 17:23, Avi Kivity wrote:

> On 05/23/2011 05:44 PM, Nadav Har'El wrote:
>> On Mon, May 23, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
>> >  vmcs01 and vmcs02 will both be generated from vmcs12.
>> 
>> If you don't do a clean nested exit (from L2 to L1), vmcs02 can't be generated
>> from vmcs12... while L2 runs, it is possible that it modifies vmcs02 (e.g.,
>> non-trapped bits of guest_cr0), and these modifications are not copied back
>> to vmcs12 until the nested exit (when prepare_vmcs12() is called to perform
>> this task).
>> 
>> If you do a nested exit (a "fake" one), vmcs12 is made up to date, and then
>> indeed vmcs02 can be thrown away and regenerated.
> 
> You would flush this state back to the vmcs.  But that just confirms Joerg's statement that a fake vmexit/vmrun is more or less equivalent.
> 
> The question is whether %rip points to the VMRUN/VMLAUNCH instruction, HOST_RIP (or the next instruction for svm), or to guest code.  But the actual things we need to do are all very similar subsets of a vmexit.

%rip should certainly point to VMRUN. That way there is no need to save any information whatsoever, as the VMCB is already in sane state and nothing needs to be special cased, as the next VCPU_RUN would simply go back into guest mode - which is exactly what we want.

The only tricky part is how we distinguish between "I need to live migrate" and "info registers". In the former case, %rip should be on VMRUN. In the latter, on the guest rip.


Alex



* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-23 18:06                                 ` Alexander Graf
@ 2011-05-24 11:09                                   ` Avi Kivity
  2011-05-24 13:07                                     ` Joerg Roedel
  0 siblings, 1 reply; 83+ messages in thread
From: Avi Kivity @ 2011-05-24 11:09 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Nadav Har'El, Joerg Roedel, Gleb Natapov, kvm, abelg

On 05/23/2011 09:06 PM, Alexander Graf wrote:
> On 23.05.2011, at 17:23, Avi Kivity wrote:
>
> >  On 05/23/2011 05:44 PM, Nadav Har'El wrote:
> >>  On Mon, May 23, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
> >>  >   vmcs01 and vmcs02 will both be generated from vmcs12.
> >>
> >>  If you don't do a clean nested exit (from L2 to L1), vmcs02 can't be generated
> >>  from vmcs12... while L2 runs, it is possible that it modifies vmcs02 (e.g.,
> >>  non-trapped bits of guest_cr0), and these modifications are not copied back
> >>  to vmcs12 until the nested exit (when prepare_vmcs12() is called to perform
> >>  this task).
> >>
> >>  If you do a nested exit (a "fake" one), vmcs12 is made up to date, and then
> >>  indeed vmcs02 can be thrown away and regenerated.
> >
> >  You would flush this state back to the vmcs.  But that just confirms Joerg's statement that a fake vmexit/vmrun is more or less equivalent.
> >
> >  The question is whether %rip points to the VMRUN/VMLAUNCH instruction, HOST_RIP (or the next instruction for svm), or to guest code.  But the actual things we need to do are all very similar subsets of a vmexit.
>
> %rip should certainly point to VMRUN. That way there is no need to save any information whatsoever, as the VMCB is already in sane state and nothing needs to be special cased, as the next VCPU_RUN would simply go back into guest mode - which is exactly what we want.
>
> The only tricky part is how we distinguish between "I need to live migrate" and "info registers". In the former case, %rip should be on VMRUN. In the latter, on the guest rip.

We can split vmrun emulation into "save host state, load guest state" 
and "prepare nested vmcb".  Then, when we load registers, if we see that 
we're in guest mode, we do just the "prepare nested vmcb" bit.

This way register state is always nested guest state.
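
Something along these lines - a sketch only, with placeholder names
rather than the actual svm.c functions:

  /* Split vmrun emulation so the second half can be re-run on its own. */
  static void nested_svm_vmrun_save_host(struct vcpu_svm *svm)
  {
          /* Save L1 (host) state to the host save area and switch the
           * architectural registers over to the nested guest.  Done only
           * when a real VMRUN is emulated. */
  }

  static void nested_svm_vmrun_prepare_vmcb(struct vcpu_svm *svm)
  {
          /* Build the shadow VMCB (the one the hardware actually runs)
           * from the guest's VMCB in L1 memory.  Can be repeated at
           * will, e.g. after migration. */
  }

  static void nested_svm_load_registers(struct vcpu_svm *svm)
  {
          /* When userspace loads register state and we are already in
           * guest mode, the registers are nested-guest registers, so
           * only the shadow VMCB needs to be rebuilt. */
          if (is_guest_mode(&svm->vcpu))
                  nested_svm_vmrun_prepare_vmcb(svm);
  }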

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 0/30] nVMX: Nested VMX, v9
  2011-05-24 11:09                                   ` Avi Kivity
@ 2011-05-24 13:07                                     ` Joerg Roedel
  0 siblings, 0 replies; 83+ messages in thread
From: Joerg Roedel @ 2011-05-24 13:07 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexander Graf, Nadav Har'El, Gleb Natapov, kvm, abelg

On Tue, May 24, 2011 at 02:09:00PM +0300, Avi Kivity wrote:
> On 05/23/2011 09:06 PM, Alexander Graf wrote:
>> On 23.05.2011, at 17:23, Avi Kivity wrote:
>>
>> >  On 05/23/2011 05:44 PM, Nadav Har'El wrote:
>> >>  On Mon, May 23, 2011, Avi Kivity wrote about "Re: [PATCH 0/30] nVMX: Nested VMX, v9":
>> >>  >   vmcs01 and vmcs02 will both be generated from vmcs12.
>> >>
>> >>  If you don't do a clean nested exit (from L2 to L1), vmcs02 can't be generated
>> >>  from vmcs12... while L2 runs, it is possible that it modifies vmcs02 (e.g.,
>> >>  non-trapped bits of guest_cr0), and these modifications are not copied back
>> >>  to vmcs12 until the nested exit (when prepare_vmcs12() is called to perform
>> >>  this task).
>> >>
>> >>  If you do a nested exit (a "fake" one), vmcs12 is made up to date, and then
>> >>  indeed vmcs02 can be thrown away and regenerated.
>> >
>> >  You would flush this state back to the vmcs.  But that just confirms Joerg's statement that a fake vmexit/vmrun is more or less equivalent.
>> >
>> >  The question is whether %rip points to the VMRUN/VMLAUNCH instruction, HOST_RIP (or the next instruction for svm), or to guest code.  But the actual things we need to do are all very similar subsets of a vmexit.
>>
>> %rip should certainly point to VMRUN. That way there is no need to save any information whatsoever, as the VMCB is already in sane state and nothing needs to be special cased, as the next VCPU_RUN would simply go back into guest mode - which is exactly what we want.
>>
>> The only tricky part is how we distinguish between "I need to live migrate" and "info registers". In the former case, %rip should be on VMRUN. In the latter, on the guest rip.
>
> We can split vmrun emulation into "save host state, load guest state"  
> and "prepare nested vmcb".  Then, when we load registers, if we see that  
> we're in guest mode, we do just the "prepare nested vmcb" bit.

Or we just emulate a VMEXIT in the VCPU_FREEZE ioctl and set %rip back
to the VMRUN that entered the L2 guest. For 'info registers' the
VCPU_FREEZE ioctl will not be issued, and the guest registers will be
displayed. That way we don't need to migrate any additional state for
SVM.
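
For SVM that could be as small as the sketch below (nested_svm_vmexit()
is the existing emulated-#VMEXIT path; svm->nested.vmrun_rip is an
assumed field remembering the address of the VMRUN that entered L2):

  /* Sketch of a VCPU_FREEZE implementation for SVM. */
  static int svm_vcpu_freeze(struct kvm_vcpu *vcpu)
  {
          struct vcpu_svm *svm = to_svm(vcpu);

          if (!is_guest_mode(vcpu))
                  return 0;

          /* Flush all L2 state back into the guest's VMCB in L1 memory,
           * as if L1 had intercepted something right now. */
          nested_svm_vmexit(svm);

          /* Point %rip back at the VMRUN instruction, so the next
           * KVM_RUN on the target simply re-enters L2, with no extra
           * state having been transferred. */
          kvm_rip_write(vcpu, svm->nested.vmrun_rip);
          return 0;
  }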

	Joerg



Thread overview: 83+ messages
2011-05-08  8:15 [PATCH 0/30] nVMX: Nested VMX, v9 Nadav Har'El
2011-05-08  8:15 ` [PATCH 01/30] nVMX: Add "nested" module option to kvm_intel Nadav Har'El
2011-05-08  8:16 ` [PATCH 02/30] nVMX: Implement VMXON and VMXOFF Nadav Har'El
2011-05-08  8:16 ` [PATCH 03/30] nVMX: Allow setting the VMXE bit in CR4 Nadav Har'El
2011-05-08  8:17 ` [PATCH 04/30] nVMX: Introduce vmcs12: a VMCS structure for L1 Nadav Har'El
2011-05-08  8:17 ` [PATCH 05/30] nVMX: Implement reading and writing of VMX MSRs Nadav Har'El
2011-05-08  8:18 ` [PATCH 06/30] nVMX: Decoding memory operands of VMX instructions Nadav Har'El
2011-05-09  9:47   ` Avi Kivity
2011-05-08  8:18 ` [PATCH 07/30] nVMX: Introduce vmcs02: VMCS used to run L2 Nadav Har'El
2011-05-16 15:30   ` Marcelo Tosatti
2011-05-16 18:32     ` Nadav Har'El
2011-05-17 13:20       ` Marcelo Tosatti
2011-05-08  8:19 ` [PATCH 08/30] nVMX: Fix local_vcpus_link handling Nadav Har'El
2011-05-08  8:19 ` [PATCH 09/30] nVMX: Add VMCS fields to the vmcs12 Nadav Har'El
2011-05-08  8:20 ` [PATCH 10/30] nVMX: Success/failure of VMX instructions Nadav Har'El
2011-05-08  8:20 ` [PATCH 11/30] nVMX: Implement VMCLEAR Nadav Har'El
2011-05-08  8:21 ` [PATCH 12/30] nVMX: Implement VMPTRLD Nadav Har'El
2011-05-16 14:34   ` Marcelo Tosatti
2011-05-16 18:58     ` Nadav Har'El
2011-05-16 19:09       ` Nadav Har'El
2011-05-08  8:21 ` [PATCH 13/30] nVMX: Implement VMPTRST Nadav Har'El
2011-05-08  8:22 ` [PATCH 14/30] nVMX: Implement VMREAD and VMWRITE Nadav Har'El
2011-05-08  8:22 ` [PATCH 15/30] nVMX: Move host-state field setup to a function Nadav Har'El
2011-05-09  9:56   ` Avi Kivity
2011-05-09 10:40     ` Nadav Har'El
2011-05-08  8:23 ` [PATCH 16/30] nVMX: Move control field setup to functions Nadav Har'El
2011-05-08  8:23 ` [PATCH 17/30] nVMX: Prepare vmcs02 from vmcs01 and vmcs12 Nadav Har'El
2011-05-09 10:12   ` Avi Kivity
2011-05-09 10:27     ` Nadav Har'El
2011-05-09 10:45       ` Avi Kivity
2011-05-08  8:24 ` [PATCH 18/30] nVMX: Implement VMLAUNCH and VMRESUME Nadav Har'El
2011-05-08  8:24 ` [PATCH 19/30] nVMX: No need for handle_vmx_insn function any more Nadav Har'El
2011-05-08  8:25 ` [PATCH 20/30] nVMX: Exiting from L2 to L1 Nadav Har'El
2011-05-09 10:45   ` Avi Kivity
2011-05-08  8:25 ` [PATCH 21/30] nVMX: Deciding if L0 or L1 should handle an L2 exit Nadav Har'El
2011-05-08  8:26 ` [PATCH 22/30] nVMX: Correct handling of interrupt injection Nadav Har'El
2011-05-09 10:57   ` Avi Kivity
2011-05-08  8:27 ` [PATCH 23/30] nVMX: Correct handling of exception injection Nadav Har'El
2011-05-08  8:27 ` [PATCH 24/30] nVMX: Correct handling of idt vectoring info Nadav Har'El
2011-05-09 11:04   ` Avi Kivity
2011-05-08  8:28 ` [PATCH 25/30] nVMX: Handling of CR0 and CR4 modifying instructions Nadav Har'El
2011-05-08  8:28 ` [PATCH 26/30] nVMX: Further fixes for lazy FPU loading Nadav Har'El
2011-05-08  8:29 ` [PATCH 27/30] nVMX: Additional TSC-offset handling Nadav Har'El
2011-05-09 17:27   ` Zachary Amsden
2011-05-08  8:29 ` [PATCH 28/30] nVMX: Add VMX to list of supported cpuid features Nadav Har'El
2011-05-08  8:30 ` [PATCH 29/30] nVMX: Miscellenous small corrections Nadav Har'El
2011-05-08  8:30 ` [PATCH 30/30] nVMX: Documentation Nadav Har'El
2011-05-09 11:18 ` [PATCH 0/30] nVMX: Nested VMX, v9 Avi Kivity
2011-05-09 11:37   ` Nadav Har'El
2011-05-11  8:20   ` Gleb Natapov
2011-05-12 15:42     ` Nadav Har'El
2011-05-12 15:57       ` Gleb Natapov
2011-05-12 16:08         ` Avi Kivity
2011-05-12 16:14           ` Gleb Natapov
2011-05-12 16:31         ` Nadav Har'El
2011-05-12 16:51           ` Gleb Natapov
2011-05-12 17:00             ` Avi Kivity
2011-05-15 23:11               ` Nadav Har'El
2011-05-16  6:38                 ` Gleb Natapov
2011-05-16  7:44                   ` Nadav Har'El
2011-05-16  7:57                     ` Gleb Natapov
2011-05-16  9:50                 ` Avi Kivity
2011-05-16 10:20                   ` Avi Kivity
2011-05-22 19:32             ` Nadav Har'El
2011-05-23  9:37               ` Joerg Roedel
2011-05-23  9:52               ` Avi Kivity
2011-05-23 13:02                 ` Joerg Roedel
2011-05-23 13:08                   ` Avi Kivity
2011-05-23 13:40                     ` Joerg Roedel
2011-05-23 13:52                       ` Avi Kivity
2011-05-23 14:10                         ` Nadav Har'El
2011-05-23 14:32                           ` Avi Kivity
2011-05-23 14:44                             ` Nadav Har'El
2011-05-23 15:23                               ` Avi Kivity
2011-05-23 18:06                                 ` Alexander Graf
2011-05-24 11:09                                   ` Avi Kivity
2011-05-24 13:07                                     ` Joerg Roedel
2011-05-23 14:28                         ` Joerg Roedel
2011-05-23 14:34                           ` Avi Kivity
2011-05-23 14:58                             ` Joerg Roedel
2011-05-23 15:19                               ` Avi Kivity
2011-05-23 13:18                   ` Nadav Har'El
2011-05-12 16:18       ` Avi Kivity
