From: "Dong, Eddie"
Subject: RE: [PATCH 0/24] Nested VMX, v5
Date: Fri, 9 Jul 2010 16:59:48 +0800
Message-ID: <1A42CE6F5F474C41B63392A5F80372B21F70B7B1@shsmsx501.ccr.corp.intel.com>
References: <1276431753-nyh@il.ibm.com>
In-Reply-To: <1276431753-nyh@il.ibm.com>
To: Nadav Har'El, "avi@redhat.com"
Cc: "kvm@vger.kernel.org", "Dong, Eddie"

Nadav Har'El wrote:
> Hi Avi,
>
> This is a followup of our nested VMX patches that Orit Wasserman
> posted in December. We've addressed most of the comments and concerns
> that you and others on the mailing list had with the previous patch
> set. We hope you'll find these patches easier to understand, and
> suitable for applying to KVM.
>
> The following 24 patches implement nested VMX support. The patches
> enable a guest to use the VMX APIs in order to run its own nested
> guests. I.e., it allows running hypervisors (that use VMX) under KVM.
> We describe the theory behind this work, our implementation, and its
> performance characteristics, in IBM Research report H-0282, "The
> Turtles Project: Design and Implementation of Nested Virtualization",
> available at:
>
> http://bit.ly/a0o9te
>
> The current patches support running Linux under a nested KVM using
> shadow page table (with bypass_guest_pf disabled). They support
> multiple nested hypervisors, which can run multiple guests. Only
> 64-bit nested hypervisors are supported. SMP is supported. Additional
> patches for running Windows under nested KVM, and Linux under nested
> VMware server, and support for nested EPT, are currently running in
> the lab, and will be sent as follow-on patchsets.
>

Nadav & All:
	Thanks for the posting; in general the patches are well written. I like the concept of VMCSxy and feel it is pretty clear (better than my previous naming as well), but there is still some confusion, especially around the term "shadow", which I find quite hard to follow.

Comments from me:

1: Basically there are two different VMCS types. One is defined by hardware, whose layout is unknown to the VMM; the other is defined by the VMM (this patch) and used for vmcs12. The former uses "struct vmcs" to describe its data instances, but the latter doesn't have a clear definition (or is it struct vmcs12?). I suggest we introduce a distinct struct for it, for example "struct sw_vmcs" (software vmcs) or "struct vvmcs" (virtual vmcs).

2: vmcsxy (vmcs12, vmcs02, vmcs01) should name instances of either "struct vmcs" or "struct sw_vmcs", but not the structs themselves. A clear distinction between data structure and instance helps, IMO.

3: We may use a prefix or suffix in addition to vmcsxy to explicitly state the format of that instance. For example, vmcs02 in the current patch is for hardware use, hence it is an instance of "struct vmcs", while vmcs01 is an instance of "struct sw_vmcs". A prefix or postfix helps make this easier to understand.

4: Rename l2_vmcs to vmcs02, l1_shadow_vmcs to vmcs01, and l1_vmcs to vmcs01 as well; combined with the prefix/postfix, this strengthens the vmcsxy concept above.
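
To illustrate what I mean in 1-4, a sketch of how the naming could look. All names below are only suggestions for discussion, not what the patch currently has, and the sw_vmcs fields are just examples:

#include <linux/types.h>

/*
 * Hardware-defined VMCS: the layout is opaque to software, only the
 * revision id and abort code are architecturally visible.
 */
struct vmcs {
	u32 revision_id;
	u32 abort;
	char data[0];		/* processor-private format */
};

/*
 * Software-defined VMCS ("sw_vmcs", or "vvmcs"): the layout the VMM
 * chooses for the VMCS that L1 maintains for L2, i.e. vmcs12.  Only a
 * few example fields are shown.
 */
struct sw_vmcs {
	u64 guest_rip;
	u64 guest_cr3;
	u32 pin_based_vm_exec_control;
	u32 cpu_based_vm_exec_control;
	/* ... every field L1 may read or write ... */
};

/*
 * Per-vcpu nested state: the hw_/sw_ prefix states which struct the
 * instance is, the xy suffix states which level it belongs to.
 */
struct nested_vmx_state {
	struct vmcs	*hw_vmcs02;	/* hardware VMCS L0 loads to run L2    */
	struct sw_vmcs	*sw_vmcs01;	/* software copy of L1 state           */
	struct sw_vmcs	*sw_vmcs12;	/* software VMCS L1 keeps for L2       */
	u64		vmcs12_gpa;	/* guest physical address from VMPTRLD */
};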
5: Guest VMPTRLD emulation. The current patch creates a vmcs02 instance each time the guest executes VMPTRLD, and frees the instance at VMCLEAR. The code may fail if the number of un-VMCLEARed VMCSs exceeds a certain threshold, in order to avoid denial of service. That is fine, but it brings additional complexity and may cost a lot of memory. I think we can instead use the concept of a "cached vmcs02" here, for the case where the L1 VMM doesn't VMCLEAR in time: L0 can simply flush those vmcs02 instances back to guest memory, i.e. to vmcs12, as needed. For example, if the number of cached vmcs02 instances exceeds 10, we can flush automatically.

Thx, Eddie
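
P.S. A very rough sketch of the "cached vmcs02" idea in 5, in case it helps. The struct layout, the helper names (flush_vmcs02_to_vmcs12, alloc_cached_vmcs02) and the threshold of 10 are all made up for illustration, not code from the patch:

#include <linux/types.h>
#include <linux/list.h>

#define NESTED_MAX_CACHED_VMCS02	10	/* flush threshold, as suggested above */

/* One cached mapping from a guest vmcs12 address to a hardware vmcs02. */
struct cached_vmcs02 {
	struct list_head list;		/* MRU-ordered list membership   */
	u64 vmcs12_gpa;			/* gpa L1 used in its VMPTRLD    */
	struct vmcs *vmcs02;		/* hardware VMCS L0 runs L2 with */
};

struct nested_vmx {
	struct list_head vmcs02_cache;	/* most recently used entry first */
	int vmcs02_count;
};

/* Hypothetical helpers, not in the current patch: */
static void flush_vmcs02_to_vmcs12(struct nested_vmx *nested,
				   struct cached_vmcs02 *c);
static struct cached_vmcs02 *alloc_cached_vmcs02(void);

/*
 * On guest VMPTRLD: reuse the cached vmcs02 for this vmcs12 if there is
 * one; otherwise, if the cache is full, flush the least recently used
 * vmcs02 back to its vmcs12 in guest memory and recycle it instead of
 * failing the VMPTRLD.
 */
static struct vmcs *nested_get_vmcs02(struct nested_vmx *nested, u64 vmcs12_gpa)
{
	struct cached_vmcs02 *c;

	list_for_each_entry(c, &nested->vmcs02_cache, list)
		if (c->vmcs12_gpa == vmcs12_gpa) {
			list_move(&c->list, &nested->vmcs02_cache);
			return c->vmcs02;
		}

	if (nested->vmcs02_count >= NESTED_MAX_CACHED_VMCS02) {
		/* Evict the LRU entry: write its state back to vmcs12. */
		c = list_entry(nested->vmcs02_cache.prev,
			       struct cached_vmcs02, list);
		flush_vmcs02_to_vmcs12(nested, c);
		list_del(&c->list);
	} else {
		c = alloc_cached_vmcs02();
		nested->vmcs02_count++;
	}

	c->vmcs12_gpa = vmcs12_gpa;
	list_add(&c->list, &nested->vmcs02_cache);
	return c->vmcs02;
}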