From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH 00/11] Alternate p2m: support multiple
 copies of host p2m
Date: Tue, 13 Jan 2015 19:01:13 +0000
Message-ID: <54B56B79.3010109@citrix.com>
References: <1420838801-11704-1-git-send-email-edmund.h.white@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <1420838801-11704-1-git-send-email-edmund.h.white@intel.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ed White <edmund.h.white@intel.com>, xen-devel@lists.xen.org
Cc: keir@xen.org, ian.campbell@citrix.com, tim@xen.org, ian.jackson@eu.citrix.com, jbeulich@suse.com, Tamas K Lengyel <tamas.lengyel@zentific.com>
List-Id: xen-devel@lists.xenproject.org

On 09/01/15 21:26, Ed White wrote:
> This set of patches adds support to hvm domains for EPTP switching by creating
> multiple copies of the host p2m (currently limited to 10 copies).
>
> The primary use of this capability is expected to be in scenarios where access
> to memory needs to be monitored and/or restricted below the level at which the
> guest OS page tables operate. Two examples that were discussed at the 2014 Xen
> developer summit are:
>
>     VM introspection: 
>         http://www.slideshare.net/xen_com_mgr/
>         zero-footprint-guest-memory-introspection-from-xen
>
>     Secure inter-VM communication:
>         http://www.slideshare.net/xen_com_mgr/nakajima-nvf
>
> Each p2m copy is populated lazily on EPT violations, and only contains entries for
> ram p2m types. Permissions for pages in alternate p2m's can be changed in a similar
> way to the existing memory access interface, and gfn->mfn mappings can be changed.
>
> All this is done through extra HVMOP types.
>
> The cross-domain HVMOP code has been compile-tested only. Also, the cross-domain
> code is hypervisor-only, the toolstack has not been modified.
>
> The intra-domain code has been tested. Violation notifications can only be received
> for pages that have been modified (access permissions and/or gfn->mfn mapping) 
> intra-domain, and only on VCPU's that have enabled notification.
>
> VMFUNC and #VE will both be emulated on hardware without native support.
>
> This code is not compatible with nested hvm functionality and will refuse to work
> with nested hvm active. It is also not compatible with migration. It should be
> considered experimental.

Having reviewed most of the series, I believe I now have a feeling for
what you are trying to achieve, but I would like to discuss some of the
design implications.

The following is my understanding of the situation.  Please correct me
if I have made a mistake.


Currently, a domain has a single host p2m.  This contains the guest
physical address mappings, and a combination of p2m types which are used
by existing components to allow certain actions to happen.  All vcpus
run with the same host p2m.

A domain may have a number of nested p2ms (currently an arbitrary limit
of 10).  These are used for nested-virt and are translated by the host
p2m.  Vcpus in guest mode run under a nested p2m.

This new altp2m infrastructure adds the ability to use a different set
of tables in place of the host p2m.  This, in practice, allows for
different translations, different p2m types and different access
permissions.

One use case of alternate p2ms is to provide introspection information to
out-of-guest entities (via the mem_event interface) or to in-guest
entities (via #VE).


Now for some observations and assumptions.

It occurs to me that the altp2m mechanism is generic.  From the look of
the series, it is mostly implemented in a generic way, which is great.
The only Intel-specific bits appear to be the EPT handling itself,
'vmfunc' instruction support and #VE injection to in-guest entities.

I can't think of any reasonable case where the alternate p2m would want
mappings different to the host p2m.  That is to say, an altp2m will map
the same set of mfns to make a guest physical address space, but may
differ in page permissions and possibly p2m types.
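
To make that restriction concrete, here is a minimal sketch in C of the
invariant I have in mind.  The types and names (sketch_p2m_entry, ACC_*,
altp2m_matches_host) are made up for illustration and are not taken from
the series or from Xen's real p2m code.  It only shows the rule that any
populated altp2m entry must map the same mfn as the host p2m, while the
access bits are free to differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access bits; not Xen's p2m_access_t. */
enum { ACC_R = 1u << 0, ACC_W = 1u << 1, ACC_X = 1u << 2 };

/* Hypothetical, heavily simplified p2m entry, indexed by gfn. */
typedef struct {
    uint64_t mfn;     /* machine frame backing this gfn */
    bool     valid;   /* false if the entry has not been populated (yet) */
    unsigned access;  /* bitmask of ACC_R / ACC_W / ACC_X */
} sketch_p2m_entry;

/*
 * Return true if every populated altp2m entry maps the same mfn as the
 * corresponding host p2m entry.  Unpopulated altp2m entries are skipped,
 * since the altp2m is filled in lazily.  Access bits are deliberately
 * not compared: they are allowed to differ.
 */
static bool altp2m_matches_host(const sketch_p2m_entry *host,
                                const sketch_p2m_entry *alt,
                                size_t nr_gfns)
{
    for (size_t gfn = 0; gfn < nr_gfns; gfn++) {
        if (!alt[gfn].valid)
            continue;
        if (!host[gfn].valid || alt[gfn].mfn != host[gfn].mfn)
            return false;
    }
    return true;
}
```

An altp2m which restricts a page to read-only but keeps the host's mfn
passes this check; one which remaps a gfn to a different mfn does not.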

Given the above restriction, I believe a lot of the existing features
can continue to work and coexist.  For generating mem_events, the
permissions can be altered in the altp2m.  For injecting #VE, the altp2m
type can change to the new p2m_ram_rw, so long as the host p2m type is
compatible.  For both, a vmexit can occur.  Xen can do the appropriate
action and also inject a #VE on its way back into the guest.

One thing I have noticed while looking at the #VE stuff is that EPT also
supports A/D tracking, which might be quite a nice optimisation and
remove the need for p2m_ram_logdirty, but I think this should be treated
as an orthogonal item.

When shared ept/iommu is not in use, altp2m can safely be used by vcpus,
as this will not interfere with the IOMMU permissions.

Furthermore, I can't conceptually think of an issue with the idea of
nestedp2m alternatives, following the same rule that the mapped mfns
match up.  That should allow all existing nestedvirt infrastructure to
continue to work.

Does the above look sensible, or have I overlooked something?

~Andrew