All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Nadav Har'El" <nyh@il.ibm.com>
To: kvm@vger.kernel.org
Cc: gleb@redhat.com, avi@redhat.com
Subject: [PATCH 29/29] nVMX: Documentation
Date: Thu, 27 Jan 2011 10:44:43 +0200	[thread overview]
Message-ID: <201101270844.p0R8ihiW002777@rice.haifa.ibm.com> (raw)
In-Reply-To: 1296116987-nyh@il.ibm.com

This patch includes a brief introduction to the nested vmx feature in the
Documentation/kvm directory. The document also includes a copy of the
vmcs12 structure, as requested by Avi Kivity.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 Documentation/kvm/nested-vmx.txt |  241 +++++++++++++++++++++++++++++
 1 file changed, 241 insertions(+)

--- .before/Documentation/kvm/nested-vmx.txt	2011-01-26 18:06:06.000000000 +0200
+++ .after/Documentation/kvm/nested-vmx.txt	2011-01-26 18:06:06.000000000 +0200
@@ -0,0 +1,241 @@
+Nested VMX
+==========
+
+Overview
+---------
+
+On Intel processors, KVM uses Intel's VMX (Virtual-Machine eXtensions)
+to easily and efficiently run guest operating systems. Normally, these guests
+*cannot* themselves be hypervisors running their own guests, because in VMX,
+guests cannot use VMX instructions.
+
+The "Nested VMX" feature adds this missing capability - of running guest
+hypervisors (which use VMX) with their own nested guests. It does so by
+allowing a guest to use VMX instructions, and correctly and efficiently
+emulating them using the single level of VMX available in the hardware.
+
+We describe in much greater detail the theory behind the nested VMX feature,
+its implementation and its performance characteristics, in the OSDI 2010 paper
+"The Turtles Project: Design and Implementation of Nested Virtualization",
+available at:
+
+	http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf
+
+
+Terminology
+-----------
+
+Single-level virtualization has two levels - the host (KVM) and the guests.
+In nested virtualization, we have three levels: The host (KVM), which we call
+L0, the guest hypervisor, which we call L1, and its nested guest, which we
+call L2.
+
+
+Known limitations
+-----------------
+
+The current code supports running Linux guests under KVM guests.
+Only 64-bit guest hypervisors are supported.
+
+Additional patches for running Windows under guest KVM, and Linux under
+guest VMware server, and support for nested EPT, are currently running in
+the lab, and will be sent as follow-on patchsets.
+
+
+Running nested VMX
+------------------
+
+The nested VMX feature is disabled by default. It can be enabled by giving
+the "nested=1" option to the kvm-intel module.
+
+No modifications are required to user space (qemu). However, qemu's default
+emulated CPU type (qemu64) does not list the "VMX" CPU feature, so it must be
+explicitly enabled, by giving qemu one of the following options:
+
+     -cpu host              (emulated CPU has all features of the real CPU)
+
+     -cpu qemu64,+vmx       (add just the vmx feature to a named CPU type)
+
+
+ABIs
+----
+
+Nested VMX aims to present a standard and (eventually) fully-functional VMX
+implementation for the a guest hypervisor to use. As such, the official
+specification of the ABI that it provides is Intel's VMX specification,
+namely volume 3B of their "Intel 64 and IA-32 Architectures Software
+Developer's Manual". Not all of VMX's features are currently fully supported,
+but the goal is to eventually support them all, starting with the VMX features
+which are used in practice by popular hypervisors (KVM and others).
+
+As a VMX implementation, nested VMX presents a VMCS structure to L1.
+As mandated by the spec, other than the two fields revision_id and abort,
+this structure is *opaque* to its user, who is not supposed to know or care
+about its internal structure. Rather, the structure is accessed through the
+VMREAD and VMWRITE instructions.
+Still, for debugging purposes, KVM developers might be interested to know the
+internals of this structure; This is struct vmcs12 from arch/x86/kvm/vmx.c.
+For convenience, we repeat its content here. If the internals of this structure
+changes, this can break live migration across KVM versions. VMCS12_REVISION
+(from vmx.c) should be changed if struct vmcs12 or its inner struct shadow_vmcs
+is ever changed.
+
+struct __packed vmcs12 {
+	/* According to the Intel spec, a VMCS region must start with the
+	 * following two fields. Then follow implementation-specific data.
+	 */
+	u32 revision_id;
+	u32 abort;
+
+	struct vmcs_fields fields;
+
+	bool launch_state; /* set to 0 by VMCLEAR, to 1 by VMLAUNCH */
+}
+
+struct __packed vmcs_fields {
+	u16 virtual_processor_id;
+	u16 guest_es_selector;
+	u16 guest_cs_selector;
+	u16 guest_ss_selector;
+	u16 guest_ds_selector;
+	u16 guest_fs_selector;
+	u16 guest_gs_selector;
+	u16 guest_ldtr_selector;
+	u16 guest_tr_selector;
+	u16 host_es_selector;
+	u16 host_cs_selector;
+	u16 host_ss_selector;
+	u16 host_ds_selector;
+	u16 host_fs_selector;
+	u16 host_gs_selector;
+	u16 host_tr_selector;
+	u64 io_bitmap_a;
+	u64 io_bitmap_b;
+	u64 msr_bitmap;
+	u64 vm_exit_msr_store_addr;
+	u64 vm_exit_msr_load_addr;
+	u64 vm_entry_msr_load_addr;
+	u64 tsc_offset;
+	u64 virtual_apic_page_addr;
+	u64 apic_access_addr;
+	u64 ept_pointer;
+	u64 guest_physical_address;
+	u64 vmcs_link_pointer;
+	u64 guest_ia32_debugctl;
+	u64 guest_ia32_pat;
+	u64 guest_pdptr0;
+	u64 guest_pdptr1;
+	u64 guest_pdptr2;
+	u64 guest_pdptr3;
+	u64 host_ia32_pat;
+	u32 pin_based_vm_exec_control;
+	u32 cpu_based_vm_exec_control;
+	u32 exception_bitmap;
+	u32 page_fault_error_code_mask;
+	u32 page_fault_error_code_match;
+	u32 cr3_target_count;
+	u32 vm_exit_controls;
+	u32 vm_exit_msr_store_count;
+	u32 vm_exit_msr_load_count;
+	u32 vm_entry_controls;
+	u32 vm_entry_msr_load_count;
+	u32 vm_entry_intr_info_field;
+	u32 vm_entry_exception_error_code;
+	u32 vm_entry_instruction_len;
+	u32 tpr_threshold;
+	u32 secondary_vm_exec_control;
+	u32 vm_instruction_error;
+	u32 vm_exit_reason;
+	u32 vm_exit_intr_info;
+	u32 vm_exit_intr_error_code;
+	u32 idt_vectoring_info_field;
+	u32 idt_vectoring_error_code;
+	u32 vm_exit_instruction_len;
+	u32 vmx_instruction_info;
+	u32 guest_es_limit;
+	u32 guest_cs_limit;
+	u32 guest_ss_limit;
+	u32 guest_ds_limit;
+	u32 guest_fs_limit;
+	u32 guest_gs_limit;
+	u32 guest_ldtr_limit;
+	u32 guest_tr_limit;
+	u32 guest_gdtr_limit;
+	u32 guest_idtr_limit;
+	u32 guest_es_ar_bytes;
+	u32 guest_cs_ar_bytes;
+	u32 guest_ss_ar_bytes;
+	u32 guest_ds_ar_bytes;
+	u32 guest_fs_ar_bytes;
+	u32 guest_gs_ar_bytes;
+	u32 guest_ldtr_ar_bytes;
+	u32 guest_tr_ar_bytes;
+	u32 guest_interruptibility_info;
+	u32 guest_activity_state;
+	u32 guest_sysenter_cs;
+	u32 host_ia32_sysenter_cs;
+	unsigned long cr0_guest_host_mask;
+	unsigned long cr4_guest_host_mask;
+	unsigned long cr0_read_shadow;
+	unsigned long cr4_read_shadow;
+	unsigned long cr3_target_value0;
+	unsigned long cr3_target_value1;
+	unsigned long cr3_target_value2;
+	unsigned long cr3_target_value3;
+	unsigned long exit_qualification;
+	unsigned long guest_linear_address;
+	unsigned long guest_cr0;
+	unsigned long guest_cr3;
+	unsigned long guest_cr4;
+	unsigned long guest_es_base;
+	unsigned long guest_cs_base;
+	unsigned long guest_ss_base;
+	unsigned long guest_ds_base;
+	unsigned long guest_fs_base;
+	unsigned long guest_gs_base;
+	unsigned long guest_ldtr_base;
+	unsigned long guest_tr_base;
+	unsigned long guest_gdtr_base;
+	unsigned long guest_idtr_base;
+	unsigned long guest_dr7;
+	unsigned long guest_rsp;
+	unsigned long guest_rip;
+	unsigned long guest_rflags;
+	unsigned long guest_pending_dbg_exceptions;
+	unsigned long guest_sysenter_esp;
+	unsigned long guest_sysenter_eip;
+	unsigned long host_cr0;
+	unsigned long host_cr3;
+	unsigned long host_cr4;
+	unsigned long host_fs_base;
+	unsigned long host_gs_base;
+	unsigned long host_tr_base;
+	unsigned long host_gdtr_base;
+	unsigned long host_idtr_base;
+	unsigned long host_ia32_sysenter_esp;
+	unsigned long host_ia32_sysenter_eip;
+	unsigned long host_rsp;
+	unsigned long host_rip;
+};
+
+
+Authors
+-------
+
+These patches were written by:
+     Abel Gordon, abelg <at> il.ibm.com
+     Nadav Har'El, nyh <at> il.ibm.com
+     Orit Wasserman, oritw <at> il.ibm.com
+     Ben-Ami Yassor, benami <at> il.ibm.com
+     Muli Ben-Yehuda, muli <at> il.ibm.com
+
+With contributions by:
+     Anthony Liguori, aliguori <at> us.ibm.com
+     Mike Day, mdday <at> us.ibm.com
+     Michael Factor, factor <at> il.ibm.com
+     Zvi Dubitzky, dubi <at> il.ibm.com
+
+And valuable reviews by:
+     Avi Kivity, avi <at> redhat.com
+     Gleb Natapov, gleb <at> redhat.com
+     and others.

  parent reply	other threads:[~2011-01-27  8:44 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-27  8:29 [PATCH 0/29] nVMX: Nested VMX, v8 Nadav Har'El
2011-01-27  8:30 ` [PATCH 01/29] nVMX: Add "nested" module option to vmx.c Nadav Har'El
2011-01-27  8:30 ` [PATCH 02/29] nVMX: Implement VMXON and VMXOFF Nadav Har'El
2011-01-27  8:31 ` [PATCH 03/29] nVMX: Allow setting the VMXE bit in CR4 Nadav Har'El
2011-01-27  8:31 ` [PATCH 04/29] nVMX: Introduce vmcs12: a VMCS structure for L1 Nadav Har'El
2011-01-27  8:32 ` [PATCH 05/29] nVMX: Implement reading and writing of VMX MSRs Nadav Har'El
2011-01-30  9:52   ` Avi Kivity
2011-01-31  8:57     ` Nadav Har'El
2011-01-31  9:01       ` Avi Kivity
2011-01-27  8:32 ` [PATCH 06/29] nVMX: Decoding memory operands of VMX instructions Nadav Har'El
2011-01-27  8:33 ` [PATCH 07/29] nVMX: Hold a vmcs02 for each vmcs12 Nadav Har'El
2011-01-30 10:02   ` Avi Kivity
2011-01-31  9:26     ` Nadav Har'El
2011-01-31  9:41       ` Avi Kivity
2011-02-03 12:57     ` Nadav Har'El
2011-02-06  9:16       ` Avi Kivity
2011-02-13 13:04         ` Nadav Har'El
2011-02-13 14:58           ` Avi Kivity
2011-02-13 20:07             ` Nadav Har'El
2011-01-27  8:33 ` [PATCH 08/29] nVMX: Fix local_vcpus_link handling Nadav Har'El
2011-01-30 10:08   ` Avi Kivity
2011-01-27  8:34 ` [PATCH 09/29] nVMX: Add VMCS fields to the vmcs12 Nadav Har'El
2011-01-30 10:10   ` Avi Kivity
2011-01-27  8:34 ` [PATCH 10/29] nVMX: Success/failure of VMX instructions Nadav Har'El
2011-01-27  8:35 ` [PATCH 11/29] nVMX: Implement VMCLEAR Nadav Har'El
2011-01-30 12:07   ` Avi Kivity
2011-01-27  8:35 ` [PATCH 12/29] nVMX: Implement VMPTRLD Nadav Har'El
2011-01-27  8:36 ` [PATCH 13/29] nVMX: Implement VMPTRST Nadav Har'El
2011-01-27  8:37 ` [PATCH 14/29] nVMX: Implement VMREAD and VMWRITE Nadav Har'El
2011-01-27  8:37 ` [PATCH 15/29] nVMX: Prepare vmcs02 from vmcs01 and vmcs12 Nadav Har'El
2011-01-27  8:38 ` [PATCH 16/29] nVMX: Move register-syncing to a function Nadav Har'El
2011-01-27  8:38 ` [PATCH 17/29] nVMX: Implement VMLAUNCH and VMRESUME Nadav Har'El
2011-01-27  8:39 ` [PATCH 18/29] nVMX: No need for handle_vmx_insn function any more Nadav Har'El
2011-01-27  8:39 ` [PATCH 19/29] nVMX: Exiting from L2 to L1 Nadav Har'El
2011-01-27  8:40 ` [PATCH 20/29] nVMX: Deciding if L0 or L1 should handle an L2 exit Nadav Har'El
2011-01-27  8:40 ` [PATCH 21/29] nVMX: Correct handling of interrupt injection Nadav Har'El
2011-01-27  8:41 ` [PATCH 22/29] nVMX: Correct handling of exception injection Nadav Har'El
2011-01-27  8:41 ` [PATCH 23/29] nVMX: Correct handling of idt vectoring info Nadav Har'El
2011-01-27  8:42 ` [PATCH 24/29] nVMX: Handling of CR0 and CR4 modifying instructions Nadav Har'El
2011-01-27  8:42 ` [PATCH 25/29] nVMX: Further fixes for lazy FPU loading Nadav Har'El
2011-01-27  8:43 ` [PATCH 26/29] nVMX: Additional TSC-offset handling Nadav Har'El
2011-01-27  8:43 ` [PATCH 27/29] nVMX: Add VMX to list of supported cpuid features Nadav Har'El
2011-01-27  8:44 ` [PATCH 28/29] nVMX: Miscellenous small corrections Nadav Har'El
2011-01-27  8:44 ` Nadav Har'El [this message]
2011-01-28  8:41 ` [PATCH 0/29] nVMX: Nested VMX, v8 Juerg Haefliger
2011-01-28 17:16   ` Nadav Har'El
2011-01-31 10:07   ` Nadav Har'El

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201101270844.p0R8ihiW002777@rice.haifa.ibm.com \
    --to=nyh@il.ibm.com \
    --cc=avi@redhat.com \
    --cc=gleb@redhat.com \
    --cc=kvm@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.