* [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
From: Ed White @ 2015-01-09 21:26 UTC
  To: xen-devel; +Cc: keir, ian.campbell, tim, ian.jackson, Ed White, jbeulich

This set of patches adds EPTP-switching support to HVM domains by creating
multiple copies of the host p2m (currently limited to 10 copies per domain).

The primary use of this capability is expected to be in scenarios where access
to memory needs to be monitored and/or restricted below the level at which the
guest OS page tables operate. Two examples that were discussed at the 2014 Xen
developer summit are:

    VM introspection: 
        http://www.slideshare.net/xen_com_mgr/
        zero-footprint-guest-memory-introspection-from-xen

    Secure inter-VM communication:
        http://www.slideshare.net/xen_com_mgr/nakajima-nvf

Each p2m copy is populated lazily, on EPT violations, and contains entries
only for ram p2m types. Permissions for pages in alternate p2m's can be
changed in a way similar to the existing memory access interface, and
gfn->mfn mappings can be changed.

All this is done through extra HVMOP types.

The cross-domain HVMOP code has been compile-tested only. It is also
hypervisor-only; the toolstack has not been modified.

The intra-domain code has been tested. Violation notifications can be
received only for pages whose access permissions and/or gfn->mfn mapping
have been modified intra-domain, and only on VCPU's that have enabled
notification.

VMFUNC and #VE will both be emulated on hardware without native support.

This code is not compatible with nested HVM functionality and will refuse
to run with nested HVM active. It is also not compatible with migration,
and should be considered experimental.
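
For context (no guest-side code is part of this series): a guest that has
been enlightened to use alternate views switches between them with VMFUNC
leaf 0, passing the EPTP-list index in ECX. On hardware without VMFUNC the
resulting #UD is intercepted and the switch is emulated by Xen (patch 06).
A minimal guest-side sketch:

    static inline void altp2m_switch_view(unsigned long idx)
    {
        /* VMFUNC (0f 01 d4); EAX = 0 selects leaf 0, EPTP switching. */
        asm volatile ( ".byte 0x0f,0x01,0xd4"
                       :: "a" (0UL), "c" (idx) : "memory" );
    }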

Ed White (11):
  VMX: VMFUNC and #VE definitions and detection.
  VMX: implement suppress #VE.
  x86/HVM: Hardware alternate p2m support detection.
  x86/MM: Improve p2m type checks.
  x86/altp2m: basic data structures and support routines.
  VMX/altp2m: add code to support EPTP switching and #VE.
  x86/altp2m: introduce p2m_ram_rw_ve type.
  x86/altp2m: add remaining support routines.
  x86/altp2m: define and implement alternate p2m HVMOP types.
  x86/altp2m: fix log-dirty handling.
  x86/altp2m: alternate p2m memory events.

 docs/misc/xen-command-line.markdown |   7 +
 xen/arch/x86/hvm/Makefile           |   3 +-
 xen/arch/x86/hvm/altp2mhvm.c        |  77 ++++++
 xen/arch/x86/hvm/hvm.c              | 264 +++++++++++++++++++-
 xen/arch/x86/hvm/vmx/vmcs.c         |  40 +++
 xen/arch/x86/hvm/vmx/vmx.c          | 139 +++++++++++
 xen/arch/x86/mm/guest_walk.c        |   2 +-
 xen/arch/x86/mm/hap/Makefile        |   1 +
 xen/arch/x86/mm/hap/altp2m_hap.c    | 191 +++++++++++++++
 xen/arch/x86/mm/hap/guest_walk.c    |   4 +-
 xen/arch/x86/mm/hap/hap.c           |  30 ++-
 xen/arch/x86/mm/mm-locks.h          |   4 +
 xen/arch/x86/mm/p2m-ept.c           |  40 ++-
 xen/arch/x86/mm/p2m.c               | 472 +++++++++++++++++++++++++++++++++++-
 xen/arch/x86/mm/paging.c            |   5 -
 xen/common/mem_access.c             |   1 +
 xen/include/asm-arm/p2m.h           |   7 +
 xen/include/asm-x86/domain.h        |   7 +
 xen/include/asm-x86/hvm/altp2mhvm.h |  42 ++++
 xen/include/asm-x86/hvm/hvm.h       |  23 ++
 xen/include/asm-x86/hvm/vcpu.h      |   9 +
 xen/include/asm-x86/hvm/vmx/vmcs.h  |  16 ++
 xen/include/asm-x86/hvm/vmx/vmx.h   |  14 +-
 xen/include/asm-x86/msr-index.h     |   1 +
 xen/include/asm-x86/p2m.h           |  61 ++++-
 xen/include/public/hvm/hvm_op.h     |  68 ++++++
 xen/include/public/mem_event.h      |   9 +
 27 files changed, 1513 insertions(+), 24 deletions(-)
 create mode 100644 xen/arch/x86/hvm/altp2mhvm.c
 create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c
 create mode 100644 xen/include/asm-x86/hvm/altp2mhvm.h

-- 
1.9.1


* [PATCH 01/11] VMX: VMFUNC and #VE definitions and detection.
From: Ed White @ 2015-01-09 21:26 UTC
  To: xen-devel; +Cc: keir, ian.campbell, tim, ian.jackson, Ed White, jbeulich

Currently, neither VMFUNC nor #VE is enabled globally, but both may be
enabled on a per-VCPU basis by the altp2m code.

Everything can be force-disabled globally by specifying vmfunc=0 on the
Xen command line.

Remove the check for EPTE bit 63 == zero in ept_split_super_page(), as
that bit is now hardware-defined.
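
In brief, the detection chain below is: the secondary exec control must
advertise VM functions; if it does, the IA32_VMX_VMFUNC MSR must advertise
leaf 0 (EPTP switching) or VMFUNC is not used; and #VE is kept only when
VMFUNC is. As a sketch (sec_ctl and vmfunc_msr stand in for the values
read in vmx_init_vmcs_config()):

    bool_t vmfunc_ok = (sec_ctl & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) &&
                       (vmfunc_msr & VMX_VMFUNC_EPTP_SWITCHING);
    bool_t ve_ok = vmfunc_ok &&
                   (sec_ctl & SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);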

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 docs/misc/xen-command-line.markdown |  7 +++++++
 xen/arch/x86/hvm/vmx/vmcs.c         | 40 +++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/p2m-ept.c           |  1 -
 xen/include/asm-x86/hvm/vmx/vmcs.h  | 16 +++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmx.h   | 13 +++++++++++-
 xen/include/asm-x86/msr-index.h     |  1 +
 6 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 152ae03..00fbae7 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1305,6 +1305,13 @@ The optional `keep` parameter causes Xen to continue using the vga
 console even after dom0 has been started.  The default behaviour is to
 relinquish control to dom0.
 
+### vmfunc (Intel)
+> `= <boolean>`
+
+> Default: `true`
+
+Use VMFUNC and #VE support if available.
+
 ### vpid (Intel)
 > `= <boolean>`
 
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 9d8033e..4274e92 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -50,6 +50,9 @@ boolean_param("unrestricted_guest", opt_unrestricted_guest_enabled);
 static bool_t __read_mostly opt_apicv_enabled = 1;
 boolean_param("apicv", opt_apicv_enabled);
 
+static bool_t __read_mostly opt_vmfunc_enabled = 1;
+boolean_param("vmfunc", opt_vmfunc_enabled);
+
 /*
  * These two parameters are used to config the controls for Pause-Loop Exiting:
  * ple_gap:    upper bound on the amount of time between two successive
@@ -71,6 +74,8 @@ u32 vmx_secondary_exec_control __read_mostly;
 u32 vmx_vmexit_control __read_mostly;
 u32 vmx_vmentry_control __read_mostly;
 u64 vmx_ept_vpid_cap __read_mostly;
+u64 vmx_vmfunc __read_mostly;
+bool_t vmx_virt_exception __read_mostly;
 
 const u32 vmx_introspection_force_enabled_msrs[] = {
     MSR_IA32_SYSENTER_EIP,
@@ -110,6 +115,8 @@ static void __init vmx_display_features(void)
     P(cpu_has_vmx_virtual_intr_delivery, "Virtual Interrupt Delivery");
     P(cpu_has_vmx_posted_intr_processing, "Posted Interrupt Processing");
     P(cpu_has_vmx_vmcs_shadowing, "VMCS shadowing");
+    P(cpu_has_vmx_vmfunc, "VM Functions");
+    P(cpu_has_vmx_virt_exceptions, "Virtualization Exceptions");
 #undef P
 
     if ( !printed )
@@ -154,6 +161,7 @@ static int vmx_init_vmcs_config(void)
     u64 _vmx_misc_cap = 0;
     u32 _vmx_vmexit_control;
     u32 _vmx_vmentry_control;
+    u64 _vmx_vmfunc = 0;
     bool_t mismatch = 0;
 
     rdmsr(MSR_IA32_VMX_BASIC, vmx_basic_msr_low, vmx_basic_msr_high);
@@ -207,6 +215,9 @@ static int vmx_init_vmcs_config(void)
             opt |= SECONDARY_EXEC_ENABLE_VPID;
         if ( opt_unrestricted_guest_enabled )
             opt |= SECONDARY_EXEC_UNRESTRICTED_GUEST;
+        if ( opt_vmfunc_enabled )
+            opt |= SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
+                   SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
 
         /*
          * "APIC Register Virtualization" and "Virtual Interrupt Delivery"
@@ -296,6 +307,24 @@ static int vmx_init_vmcs_config(void)
           || !(_vmx_vmexit_control & VM_EXIT_ACK_INTR_ON_EXIT) )
         _vmx_pin_based_exec_control  &= ~ PIN_BASED_POSTED_INTERRUPT;
 
+    /* The IA32_VMX_VMFUNC MSR exists only when VMFUNC is available */
+    if ( _vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS )
+    {
+        rdmsrl(MSR_IA32_VMX_VMFUNC, _vmx_vmfunc);
+
+        /*
+         * VMFUNC leaf 0 (EPTP switching) must be supported.
+         *
+         * Or we just don't use VMFUNC.
+         */
+        if ( !(_vmx_vmfunc & VMX_VMFUNC_EPTP_SWITCHING) )
+            _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VM_FUNCTIONS;
+    }
+
+    /* Virtualization exceptions are only enabled if VMFUNC is enabled */
+    if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
+        _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
+
     min = 0;
     opt = VM_ENTRY_LOAD_GUEST_PAT | VM_ENTRY_LOAD_BNDCFGS;
     _vmx_vmentry_control = adjust_vmx_controls(
@@ -316,6 +345,9 @@ static int vmx_init_vmcs_config(void)
         vmx_vmentry_control        = _vmx_vmentry_control;
         vmx_basic_msr              = ((u64)vmx_basic_msr_high << 32) |
                                      vmx_basic_msr_low;
+        vmx_vmfunc                 = _vmx_vmfunc;
+        vmx_virt_exception         = !!(_vmx_secondary_exec_control &
+                                       SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
         vmx_display_features();
 
         /* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
@@ -352,6 +384,9 @@ static int vmx_init_vmcs_config(void)
         mismatch |= cap_check(
             "EPT and VPID Capability",
             vmx_ept_vpid_cap, _vmx_ept_vpid_cap);
+        mismatch |= cap_check(
+            "VMFUNC Capability",
+            vmx_vmfunc, _vmx_vmfunc);
         if ( cpu_has_vmx_ins_outs_instr_info !=
              !!(vmx_basic_msr_high & (VMX_BASIC_INS_OUT_INFO >> 32)) )
         {
@@ -921,6 +956,11 @@ static int construct_vmcs(struct vcpu *v)
     /* Do not enable Monitor Trap Flag unless start single step debug */
     v->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG;
 
+    /* Disable VMFUNC and #VE for now: they may be enabled later by altp2m. */
+    v->arch.hvm_vmx.secondary_exec_control &=
+        ~(SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
+          SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
+
     if ( is_pvh_domain(d) )
     {
         /* Disable virtual apics, TPR */
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 15c6e83..eb8b5f9 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -244,7 +244,6 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
         epte->mfn += i * trunk;
         epte->snp = (iommu_enabled && iommu_snoop);
         ASSERT(!epte->rsvd1);
-        ASSERT(!epte->avail3);
 
         ept_p2m_type_to_flags(epte, epte->sa_p2mt, epte->access);
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 6a99dca..c6db13f 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -213,7 +213,9 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING       0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID           0x00001000
+#define SECONDARY_EXEC_ENABLE_VM_FUNCTIONS      0x00002000
 #define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
+#define SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS   0x00040000
 extern u32 vmx_secondary_exec_control;
 
 #define VMX_EPT_EXEC_ONLY_SUPPORTED             0x00000001
@@ -273,6 +275,10 @@ extern u32 vmx_secondary_exec_control;
     (vmx_pin_based_exec_control & PIN_BASED_POSTED_INTERRUPT)
 #define cpu_has_vmx_vmcs_shadowing \
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VMCS_SHADOWING)
+#define cpu_has_vmx_vmfunc \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS)
+#define cpu_has_vmx_virt_exceptions \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS)
 
 #define VMCS_RID_TYPE_MASK              0x80000000
 
@@ -302,10 +308,14 @@ extern u64 vmx_basic_msr;
 #define VMX_GUEST_INTR_STATUS_SUBFIELD_BITMASK  0x0FF
 #define VMX_GUEST_INTR_STATUS_SVI_OFFSET        8
 
+/* VMFUNC leaf definitions */
+#define VMX_VMFUNC_EPTP_SWITCHING   (1ULL << 0)
+
 /* VMCS field encodings. */
 enum vmcs_field {
     VIRTUAL_PROCESSOR_ID            = 0x00000000,
     POSTED_INTR_NOTIFICATION_VECTOR = 0x00000002,
+    EPTP_INDEX                      = 0x00000004,
     GUEST_ES_SELECTOR               = 0x00000800,
     GUEST_CS_SELECTOR               = 0x00000802,
     GUEST_SS_SELECTOR               = 0x00000804,
@@ -342,14 +352,20 @@ enum vmcs_field {
     APIC_ACCESS_ADDR_HIGH           = 0x00002015,
     PI_DESC_ADDR                    = 0x00002016,
     PI_DESC_ADDR_HIGH               = 0x00002017,
+    VM_FUNCTION_CONTROL             = 0x00002018,
+    VM_FUNCTION_CONTROL_HIGH        = 0x00002019,
     EPT_POINTER                     = 0x0000201a,
     EPT_POINTER_HIGH                = 0x0000201b,
     EOI_EXIT_BITMAP0                = 0x0000201c,
 #define EOI_EXIT_BITMAP(n) (EOI_EXIT_BITMAP0 + (n) * 2) /* n = 0...3 */
+    EPTP_LIST_ADDR                  = 0x00002024,
+    EPTP_LIST_ADDR_HIGH             = 0x00002025,
     VMREAD_BITMAP                   = 0x00002026,
     VMREAD_BITMAP_HIGH              = 0x00002027,
     VMWRITE_BITMAP                  = 0x00002028,
     VMWRITE_BITMAP_HIGH             = 0x00002029,
+    VIRT_EXCEPTION_INFO             = 0x0000202a,
+    VIRT_EXCEPTION_INFO_HIGH        = 0x0000202b,
     GUEST_PHYSICAL_ADDRESS          = 0x00002400,
     GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
     VMCS_LINK_POINTER               = 0x00002800,
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index c8bb548..8bae195 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -46,7 +46,7 @@ typedef union {
         access      :   4,  /* bits 61:58 - p2m_access_t */
         tm          :   1,  /* bit 62 - VT-d transient-mapping hint in
                                shared EPT/VT-d usage */
-        avail3      :   1;  /* bit 63 - Software available 3 */
+        suppress_ve :   1;  /* bit 63 - suppress #VE */
     };
     u64 epte;
 } ept_entry_t;
@@ -185,6 +185,7 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
 #define EXIT_REASON_XSETBV              55
 #define EXIT_REASON_APIC_WRITE          56
 #define EXIT_REASON_INVPCID             58
+#define EXIT_REASON_VMFUNC              59
 
 /*
  * Interruption-information format
@@ -550,4 +551,14 @@ void p2m_init_hap_data(struct p2m_domain *p2m);
 #define EPT_L4_PAGETABLE_SHIFT      39
 #define EPT_PAGETABLE_ENTRIES       512
 
+/* #VE information page */
+typedef struct {
+    u32 exit_reason;
+    u32 semaphore;
+    u64 exit_qualification;
+    u64 gla;
+    u64 gpa;
+    u16 eptp_index;
+} ve_info_t;
+
 #endif /* __ASM_X86_HVM_VMX_VMX_H__ */
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 83f2f70..8069d60 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -130,6 +130,7 @@
 #define MSR_IA32_VMX_TRUE_PROCBASED_CTLS        0x48e
 #define MSR_IA32_VMX_TRUE_EXIT_CTLS             0x48f
 #define MSR_IA32_VMX_TRUE_ENTRY_CTLS            0x490
+#define MSR_IA32_VMX_VMFUNC                     0x491
 #define IA32_FEATURE_CONTROL_MSR                0x3a
 #define IA32_FEATURE_CONTROL_MSR_LOCK                     0x0001
 #define IA32_FEATURE_CONTROL_MSR_ENABLE_VMXON_INSIDE_SMX  0x0002
-- 
1.9.1


* [PATCH 02/11] VMX: implement suppress #VE.
From: Ed White @ 2015-01-09 21:26 UTC
  To: xen-devel; +Cc: keir, ian.campbell, tim, ian.jackson, Ed White, jbeulich

In preparation for selectively enabling hardware #VE in a later patch,
set suppress #VE on all EPTE's on #VE-capable hardware.

Suppress #VE should always be the default state, for two reasons: it is
generally not safe to deliver #VE into a guest unless that guest has been
modified to receive it; and even then, most EPT violations can only be
handled by the hypervisor.
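
In other words, the delivery rule this default produces can be sketched as
follows (ept_entry_t and ve_info_t are from patch 01; the semaphore
convention is established by the #VE emulation in patch 06):

    /* #VE reaches the guest only if the violating EPTE opts in
     * (suppress_ve clear) and the guest's #VE info page is not busy. */
    static bool_t ve_deliverable(const ept_entry_t *e, const ve_info_t *vi)
    {
        return !e->suppress_ve && vi != NULL && vi->semaphore == 0;
    }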

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 xen/arch/x86/mm/p2m-ept.c         | 34 +++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index eb8b5f9..2b9f07c 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -41,7 +41,7 @@
 #define is_epte_superpage(ept_entry)    ((ept_entry)->sp)
 static inline bool_t is_epte_valid(ept_entry_t *e)
 {
-    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
+    return (e->valid != 0 && e->sa_p2mt != p2m_invalid);
 }
 
 /* returns : 0 for success, -errno otherwise */
@@ -194,6 +194,19 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
 
     ept_entry->r = ept_entry->w = ept_entry->x = 1;
 
+    /* Disable #VE on all entries */ 
+    if ( cpu_has_vmx_virt_exceptions )
+    {
+        ept_entry_t *table = __map_domain_page(pg);
+
+        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
+            table[i].suppress_ve = 1;
+
+        unmap_domain_page(table);
+
+        ept_entry->suppress_ve = 1;
+    }
+
     return 1;
 }
 
@@ -243,6 +256,10 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
         epte->sp = (level > 1);
         epte->mfn += i * trunk;
         epte->snp = (iommu_enabled && iommu_snoop);
+
+        if ( cpu_has_vmx_virt_exceptions )
+            epte->suppress_ve = 1;
+
         ASSERT(!epte->rsvd1);
 
         ept_p2m_type_to_flags(epte, epte->sa_p2mt, epte->access);
@@ -753,6 +770,9 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         ept_p2m_type_to_flags(&new_entry, p2mt, p2ma);
     }
 
+    if ( cpu_has_vmx_virt_exceptions )
+        new_entry.suppress_ve = 1;
+
     rc = atomic_write_ept_entry(ept_entry, new_entry, target);
     if ( unlikely(rc) )
         old_entry.epte = 0;
@@ -1069,6 +1089,18 @@ int ept_p2m_init(struct p2m_domain *p2m)
     /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
     ept->ept_wl = 3;
 
+    /* Disable #VE on all entries */
+    if ( cpu_has_vmx_virt_exceptions )
+    {
+        ept_entry_t *table =
+            map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
+
+        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
+            table[i].suppress_ve = 1;
+
+        unmap_domain_page(table);
+    }
+
     if ( !zalloc_cpumask_var(&ept->synced_mask) )
         return -ENOMEM;
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 8bae195..70fee74 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -49,6 +49,7 @@ typedef union {
         suppress_ve :   1;  /* bit 63 - suppress #VE */
     };
     u64 epte;
+    u64 valid       :   63; /* entire EPTE except suppress #VE bit */
 } ept_entry_t;
 
 typedef struct {
-- 
1.9.1


* [PATCH 03/11] x86/HVM: Hardware alternate p2m support detection.
From: Ed White @ 2015-01-09 21:26 UTC
  To: xen-devel; +Cc: keir, ian.campbell, tim, ian.jackson, Ed White, jbeulich

As implemented here, alternate p2m is only supported on platforms with
VMX HAP.
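
The new predicate is intended to gate altp2m operations; a sketch of the
expected use in later patches:

    if ( !hvm_altp2m_supported() )
        return -EOPNOTSUPP;   /* e.g. reject an altp2m HVMOP */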

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 xen/arch/x86/hvm/hvm.c        | 8 ++++++++
 xen/arch/x86/hvm/vmx/vmx.c    | 1 +
 xen/include/asm-x86/hvm/hvm.h | 6 ++++++
 3 files changed, 15 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bc414ff..3a7367c 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -157,6 +157,9 @@ static int __init hvm_enable(void)
     if ( !fns->pvh_supported )
         printk(XENLOG_INFO "HVM: PVH mode not supported on this platform\n");
 
+    if ( !fns->altp2m_supported )
+        printk(XENLOG_INFO "HVM: Alternate p2m mode not supported on this platform\n");
+
     /*
      * Allow direct access to the PC debug ports 0x80 and 0xed (they are
      * often used for I/O delays, but the vmexits simply slow things down).
@@ -6369,6 +6372,11 @@ enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v)
     return hvm_funcs.nhvm_intr_blocked(v);
 }
 
+bool_t hvm_altp2m_supported(void)
+{
+    return hvm_funcs.altp2m_supported;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index f2554d6..931709b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1796,6 +1796,7 @@ const struct hvm_function_table * __init start_vmx(void)
     if ( cpu_has_vmx_ept && (cpu_has_vmx_pat || opt_force_ept) )
     {
         vmx_function_table.hap_supported = 1;
+        vmx_function_table.altp2m_supported = 1;
 
         vmx_function_table.hap_capabilities = 0;
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index e3d2d9a..7115a68 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -94,6 +94,9 @@ struct hvm_function_table {
     /* Necessary hardware support for PVH mode? */
     int pvh_supported;
 
+    /* Necessary hardware support for alternate p2m's? */
+    int altp2m_supported;
+
     /* Indicate HAP capabilities. */
     int hap_capabilities;
 
@@ -518,6 +521,9 @@ bool_t nhvm_vmcx_hap_enabled(struct vcpu *v);
 /* interrupt */
 enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v);
 
+/* returns true if hardware supports alternate p2m's */
+bool_t hvm_altp2m_supported(void);
+
 #ifndef NDEBUG
 /* Permit use of the Forced Emulation Prefix in HVM guests */
 extern bool_t opt_hvm_fep;
-- 
1.9.1


* [PATCH 04/11] x86/MM: Improve p2m type checks.
From: Ed White @ 2015-01-09 21:26 UTC
  To: xen-devel; +Cc: keir, ian.campbell, tim, ian.jackson, Ed White, jbeulich

The alternate p2m code will introduce a new p2m type. In preparation for using
that new type, introduce the type indicator here and fix all the checks
that assume !nestedp2m == hostp2m to explicitly check for hostp2m.
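
After this change, exactly one of three predicates holds for any p2m
(p2m_is_altp2m() becomes meaningful in the next patch):

    p2m_is_hostp2m(p2m)     /* the domain's real p2m */
    p2m_is_altp2m(p2m)      /* an alternate view */
    p2m_is_nestedp2m(p2m)   /* neither of the above */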

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 xen/arch/x86/hvm/hvm.c           | 2 +-
 xen/arch/x86/mm/guest_walk.c     | 2 +-
 xen/arch/x86/mm/hap/guest_walk.c | 4 ++--
 xen/arch/x86/mm/p2m-ept.c        | 4 ++--
 xen/arch/x86/mm/p2m.c            | 9 +++++----
 xen/include/asm-x86/p2m.h        | 7 ++++++-
 6 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3a7367c..b89e9d2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2861,7 +2861,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     /* Mem sharing: unshare the page and try again */
     if ( npfec.write_access && (p2mt == p2m_ram_shared) )
     {
-        ASSERT(!p2m_is_nestedp2m(p2m));
+        ASSERT(p2m_is_hostp2m(p2m));
         sharing_enomem = 
             (mem_sharing_unshare_page(p2m->domain, gfn, 0) < 0);
         rc = 1;
diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 1b26175..d8f5a35 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -99,7 +99,7 @@ void *map_domain_gfn(struct p2m_domain *p2m, gfn_t gfn, mfn_t *mfn,
                                  q);
     if ( p2m_is_paging(*p2mt) )
     {
-        ASSERT(!p2m_is_nestedp2m(p2m));
+        ASSERT(p2m_is_hostp2m(p2m));
         if ( page )
             put_page(page);
         p2m_mem_paging_populate(p2m->domain, gfn_x(gfn));
diff --git a/xen/arch/x86/mm/hap/guest_walk.c b/xen/arch/x86/mm/hap/guest_walk.c
index 25d9792..381a196 100644
--- a/xen/arch/x86/mm/hap/guest_walk.c
+++ b/xen/arch/x86/mm/hap/guest_walk.c
@@ -64,7 +64,7 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
                                      &p2mt, NULL, P2M_ALLOC | P2M_UNSHARE);
     if ( p2m_is_paging(p2mt) )
     {
-        ASSERT(!p2m_is_nestedp2m(p2m));
+        ASSERT(p2m_is_hostp2m(p2m));
         pfec[0] = PFEC_page_paged;
         if ( top_page )
             put_page(top_page);
@@ -106,7 +106,7 @@ unsigned long hap_p2m_ga_to_gfn(GUEST_PAGING_LEVELS)(
             put_page(page);
         if ( p2m_is_paging(p2mt) )
         {
-            ASSERT(!p2m_is_nestedp2m(p2m));
+            ASSERT(p2m_is_hostp2m(p2m));
             pfec[0] = PFEC_page_paged;
             p2m_mem_paging_populate(p2m->domain, gfn_x(gfn));
             return INVALID_GFN;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 2b9f07c..255b681 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -787,8 +787,8 @@ out:
     if ( needs_sync != sync_off )
         ept_sync_domain(p2m);
 
-    /* For non-nested p2m, may need to change VT-d page table.*/
-    if ( rc == 0 && !p2m_is_nestedp2m(p2m) && need_iommu(d) &&
+    /* For host p2m, may need to change VT-d page table.*/
+    if ( rc == 0 && p2m_is_hostp2m(p2m) && need_iommu(d) &&
          need_modify_vtd_table )
     {
         if ( iommu_hap_pt_share )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index efa49dd..49b66fb 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -73,6 +73,7 @@ static int p2m_initialise(struct domain *d, struct p2m_domain *p2m)
     p2m->default_access = p2m_access_rwx;
 
     p2m->np2m_base = P2M_BASE_EADDR;
+    p2m->alternate = 0;
 
     if ( hap_enabled(d) && cpu_has_vmx )
         ret = ept_p2m_init(p2m);
@@ -202,7 +203,7 @@ int p2m_init(struct domain *d)
 int p2m_is_logdirty_range(struct p2m_domain *p2m, unsigned long start,
                           unsigned long end)
 {
-    ASSERT(!p2m_is_nestedp2m(p2m));
+    ASSERT(p2m_is_hostp2m(p2m));
     if ( p2m->global_logdirty ||
          rangeset_contains_range(p2m->logdirty_ranges, start, end) )
         return 1;
@@ -263,7 +264,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn,
 
     if ( (q & P2M_UNSHARE) && p2m_is_shared(*t) )
     {
-        ASSERT(!p2m_is_nestedp2m(p2m));
+        ASSERT(p2m_is_hostp2m(p2m));
         /* Try to unshare. If we fail, communicate ENOMEM without
          * sleeping. */
         if ( mem_sharing_unshare_page(p2m->domain, gfn, 0) < 0 )
@@ -431,7 +432,7 @@ int p2m_alloc_table(struct p2m_domain *p2m)
 
     p2m_lock(p2m);
 
-    if ( !p2m_is_nestedp2m(p2m)
+    if ( p2m_is_hostp2m(p2m)
          && !page_list_empty(&d->page_list) )
     {
         P2M_ERROR("dom %d already has memory allocated\n", d->domain_id);
@@ -1708,7 +1709,7 @@ p2m_flush_table(struct p2m_domain *p2m)
 
     /* "Host" p2m tables can have shared entries &c that need a bit more 
      * care when discarding them */
-    ASSERT(p2m_is_nestedp2m(p2m));
+    ASSERT(!p2m_is_hostp2m(p2m));
     /* Nested p2m's do not do pod, hence the asserts (and no pod lock)*/
     ASSERT(page_list_empty(&p2m->pod.super));
     ASSERT(page_list_empty(&p2m->pod.single));
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 5f7fe71..8193901 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -193,6 +193,9 @@ struct p2m_domain {
      * threaded on in LRU order. */
     struct list_head   np2m_list;
 
+    /* Does this p2m belong to the altp2m code? */
+    bool_t alternate;
+
     /* Host p2m: Log-dirty ranges registered for the domain. */
     struct rangeset   *logdirty_ranges;
 
@@ -290,7 +293,9 @@ struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base);
  */
 struct p2m_domain *p2m_get_p2m(struct vcpu *v);
 
-#define p2m_is_nestedp2m(p2m)   ((p2m) != p2m_get_hostp2m((p2m->domain)))
+#define p2m_is_hostp2m(p2m)   ((p2m) == p2m_get_hostp2m((p2m->domain)))
+#define p2m_is_altp2m(p2m)    ((p2m)->alternate)
+#define p2m_is_nestedp2m(p2m) (!p2m_is_altp2m(p2m) && !p2m_is_hostp2m(p2m))
 
 #define p2m_get_pagetable(p2m)  ((p2m)->phys_table)
 
-- 
1.9.1


* [PATCH 05/11] x86/altp2m: basic data structures and support routines.
From: Ed White @ 2015-01-09 21:26 UTC
  To: xen-devel; +Cc: keir, ian.campbell, tim, ian.jackson, Ed White, jbeulich

Add the basic data structures needed to support alternate p2m's and
the functions to initialise them and tear them down.

Although Intel hardware can handle 512 EPTP's per hardware thread
concurrently, only 10 per domain are supported in this patch for
performance reasons.

The iterator in hap_enable() does need to handle 512, so that is now
uint16_t.
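
Once altp2m is active, these structures maintain a simple invariant: a
VCPU's current view index selects both the p2m that Xen walks and the
EPTP slot that the hardware sees, and unused slots in the 512-entry EPTP
page hold ~0ul. As a sketch (v and d being the usual vcpu and domain):

    uint16_t idx = vcpu_altp2mhvm(v).p2midx;

    ASSERT(idx < MAX_ALTP2M);
    ASSERT(d->arch.altp2m_eptp[idx] != ~0ul);    /* slot is in use */
    ASSERT(p2m_get_altp2m(v) == d->arch.altp2m_p2m[idx]);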

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 xen/arch/x86/hvm/Makefile           |   3 +-
 xen/arch/x86/hvm/altp2mhvm.c        |  77 +++++++++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c              |  21 ++++++++
 xen/arch/x86/mm/hap/Makefile        |   1 +
 xen/arch/x86/mm/hap/altp2m_hap.c    |  66 +++++++++++++++++++++++
 xen/arch/x86/mm/hap/hap.c           |  30 ++++++++++-
 xen/arch/x86/mm/mm-locks.h          |   4 ++
 xen/arch/x86/mm/p2m.c               | 102 ++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/domain.h        |   7 +++
 xen/include/asm-x86/hvm/altp2mhvm.h |  36 +++++++++++++
 xen/include/asm-x86/hvm/hvm.h       |  17 ++++++
 xen/include/asm-x86/hvm/vcpu.h      |   9 ++++
 xen/include/asm-x86/p2m.h           |  22 ++++++++
 13 files changed, 393 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/x86/hvm/altp2mhvm.c
 create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c
 create mode 100644 xen/include/asm-x86/hvm/altp2mhvm.h

diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index eea5555..5bf8b4f 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -22,4 +22,5 @@ obj-y += vlapic.o
 obj-y += vmsi.o
 obj-y += vpic.o
 obj-y += vpt.o
-obj-y += vpmu.o
\ No newline at end of file
+obj-y += vpmu.o
+obj-y += altp2mhvm.o
diff --git a/xen/arch/x86/hvm/altp2mhvm.c b/xen/arch/x86/hvm/altp2mhvm.c
new file mode 100644
index 0000000..fa0af0c
--- /dev/null
+++ b/xen/arch/x86/hvm/altp2mhvm.c
@@ -0,0 +1,77 @@
+/*
+ * Alternate p2m HVM
+ * Copyright (c) 2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <asm/hvm/support.h>
+#include <asm/hvm/hvm.h>
+#include <asm/p2m.h>
+#include <asm/hvm/altp2mhvm.h>
+
+void
+altp2mhvm_vcpu_reset(struct vcpu *v)
+{
+    struct altp2mvcpu *av = &vcpu_altp2mhvm(v);
+
+    av->p2midx = 0;
+    av->veinfo = 0;
+
+    if ( hvm_funcs.ahvm_vcpu_reset )
+        hvm_funcs.ahvm_vcpu_reset(v);
+}
+
+int
+altp2mhvm_vcpu_initialise(struct vcpu *v)
+{
+    int rc = -EOPNOTSUPP;
+
+    if ( v != current )
+        vcpu_pause(v);
+
+    if ( !hvm_funcs.ahvm_vcpu_initialise ||
+         (hvm_funcs.ahvm_vcpu_initialise(v) == 0) )
+    {
+        rc = 0;
+        altp2mhvm_vcpu_reset(v);
+        cpumask_set_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
+
+        ahvm_vcpu_update_eptp(v);
+    }
+
+    if ( v != current )
+        vcpu_unpause(v);
+
+    return rc;
+}
+
+void
+altp2mhvm_vcpu_destroy(struct vcpu *v)
+{
+    if ( v != current )
+        vcpu_pause(v);
+
+    if ( hvm_funcs.ahvm_vcpu_destroy )
+        hvm_funcs.ahvm_vcpu_destroy(v);
+
+    cpumask_clear_cpu(v->processor, p2m_get_altp2m(v)->dirty_cpumask);
+    altp2mhvm_vcpu_reset(v);
+
+    ahvm_vcpu_update_eptp(v);
+    ahvm_vcpu_update_vmfunc_ve(v);
+
+    if ( v != current )
+        vcpu_unpause(v);
+}
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index b89e9d2..e8787cc 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -60,6 +60,7 @@
 #include <asm/hvm/cacheattr.h>
 #include <asm/hvm/trace.h>
 #include <asm/hvm/nestedhvm.h>
+#include <asm/hvm/altp2mhvm.h>
 #include <asm/mtrr.h>
 #include <asm/apic.h>
 #include <public/sched.h>
@@ -2290,6 +2291,7 @@ void hvm_vcpu_destroy(struct vcpu *v)
 
     hvm_all_ioreq_servers_remove_vcpu(d, v);
 
+    altp2mhvm_vcpu_destroy(v);
     nestedhvm_vcpu_destroy(v);
 
     free_compat_arg_xlat(v);
@@ -6377,6 +6379,25 @@ bool_t hvm_altp2m_supported(void)
     return hvm_funcs.altp2m_supported;
 }
 
+void ahvm_vcpu_update_eptp(struct vcpu *v)
+{
+    if ( hvm_funcs.ahvm_vcpu_update_eptp )
+        hvm_funcs.ahvm_vcpu_update_eptp(v);
+}
+
+void ahvm_vcpu_update_vmfunc_ve(struct vcpu *v)
+{
+    if ( hvm_funcs.ahvm_vcpu_update_vmfunc_ve )
+        hvm_funcs.ahvm_vcpu_update_vmfunc_ve(v);
+}
+
+bool_t ahvm_vcpu_emulate_ve(struct vcpu *v)
+{
+    if ( hvm_funcs.ahvm_vcpu_emulate_ve )
+        return hvm_funcs.ahvm_vcpu_emulate_ve(v);
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm/hap/Makefile b/xen/arch/x86/mm/hap/Makefile
index 68f2bb5..216cd90 100644
--- a/xen/arch/x86/mm/hap/Makefile
+++ b/xen/arch/x86/mm/hap/Makefile
@@ -4,6 +4,7 @@ obj-y += guest_walk_3level.o
 obj-$(x86_64) += guest_walk_4level.o
 obj-y += nested_hap.o
 obj-y += nested_ept.o
+obj-y += altp2m_hap.o
 
 guest_walk_%level.o: guest_walk.c Makefile
 	$(CC) $(CFLAGS) -DGUEST_PAGING_LEVELS=$* -c $< -o $@
diff --git a/xen/arch/x86/mm/hap/altp2m_hap.c b/xen/arch/x86/mm/hap/altp2m_hap.c
new file mode 100644
index 0000000..c2cdc42
--- /dev/null
+++ b/xen/arch/x86/mm/hap/altp2m_hap.c
@@ -0,0 +1,66 @@
+/******************************************************************************
+ * arch/x86/mm/hap/altp2m_hap.c
+ *
+ * Copyright (c) 2014 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include <xen/mem_event.h>
+#include <xen/event.h>
+#include <public/mem_event.h>
+#include <asm/domain.h>
+#include <asm/page.h>
+#include <asm/paging.h>
+#include <asm/p2m.h>
+#include <asm/mem_sharing.h>
+#include <asm/hap.h>
+#include <asm/hvm/support.h>
+
+#include "private.h"
+
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_valid
+#define mfn_valid(_mfn) __mfn_valid(mfn_x(_mfn))
+#undef page_to_mfn
+#define page_to_mfn(_pg) _mfn(__page_to_mfn(_pg))
+
+void
+altp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
+    l1_pgentry_t *p, l1_pgentry_t new, unsigned int level)
+{
+    struct domain *d = p2m->domain;
+    uint32_t old_flags;
+
+    paging_lock(d);
+
+    old_flags = l1e_get_flags(*p);
+    safe_write_pte(p, new);
+
+    if ( old_flags & _PAGE_PRESENT )
+        flush_tlb_mask(p2m->dirty_cpumask);
+
+    paging_unlock(d);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index abf3d7a..8fe0650 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -439,7 +439,7 @@ void hap_domain_init(struct domain *d)
 int hap_enable(struct domain *d, u32 mode)
 {
     unsigned int old_pages;
-    uint8_t i;
+    uint16_t i;
     int rv = 0;
 
     domain_pause(d);
@@ -485,6 +485,23 @@ int hap_enable(struct domain *d, u32 mode)
            goto out;
     }
 
+    /* Init alternate p2m data */
+    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )
+    {
+        rv = -ENOMEM;
+        goto out;
+    }
+    for ( i = 0; i < 512; i++ )
+        d->arch.altp2m_eptp[i] = ~0ul;
+
+    for ( i = 0; i < MAX_ALTP2M; i++ ) {
+        rv = p2m_alloc_table(d->arch.altp2m_p2m[i]);
+        if ( rv != 0 )
+           goto out;
+    }
+
+    d->arch.altp2m_active = 0;
+
     /* Now let other users see the new mode */
     d->arch.paging.mode = mode | PG_HAP_enable;
 
@@ -497,6 +514,17 @@ void hap_final_teardown(struct domain *d)
 {
     uint8_t i;
 
+    d->arch.altp2m_active = 0;
+
+    if ( d->arch.altp2m_eptp ) {
+        free_xenheap_page(d->arch.altp2m_eptp);
+        d->arch.altp2m_eptp = NULL;
+    }
+
+    for ( i = 0; i < MAX_ALTP2M; i++ ) {
+        p2m_teardown(d->arch.altp2m_p2m[i]);
+    }
+
     /* Destroy nestedp2m's first */
     for (i = 0; i < MAX_NESTEDP2M; i++) {
         p2m_teardown(d->arch.nested_p2m[i]);
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 769f7bc..a0faca3 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -209,6 +209,10 @@ declare_mm_lock(nestedp2m)
 #define nestedp2m_lock(d)   mm_lock(nestedp2m, &(d)->arch.nested_p2m_lock)
 #define nestedp2m_unlock(d) mm_unlock(&(d)->arch.nested_p2m_lock)
 
+declare_mm_lock(altp2m)
+#define altp2m_lock(d)   mm_lock(altp2m, &(d)->arch.altp2m_lock)
+#define altp2m_unlock(d) mm_unlock(&(d)->arch.altp2m_lock)
+
 /* P2M lock (per-p2m-table)
  *
  * This protects all queries and updates to the p2m table.
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 49b66fb..3c6049b 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -35,6 +35,7 @@
 #include <asm/hvm/vmx/vmx.h> /* ept_p2m_init() */
 #include <asm/mem_sharing.h>
 #include <asm/hvm/nestedhvm.h>
+#include <asm/hvm/altp2mhvm.h>
 #include <asm/hvm/svm/amd-iommu-proto.h>
 #include <xsm/xsm.h>
 
@@ -182,6 +183,44 @@ static void p2m_teardown_nestedp2m(struct domain *d)
     }
 }
 
+static void p2m_teardown_altp2m(struct domain *d);
+
+static int p2m_init_altp2m(struct domain *d)
+{
+    uint8_t i;
+    struct p2m_domain *p2m;
+
+    mm_lock_init(&d->arch.altp2m_lock);
+    for ( i = 0; i < MAX_ALTP2M; i++ )
+    {
+        d->arch.altp2m_p2m[i] = p2m = p2m_init_one(d);
+        if ( p2m == NULL )
+        {
+            p2m_teardown_altp2m(d);
+            return -ENOMEM;
+        }
+        p2m->write_p2m_entry = altp2m_write_p2m_entry;
+        p2m->alternate = 1;
+    }
+
+    return 0;
+}
+
+static void p2m_teardown_altp2m(struct domain *d)
+{
+    uint8_t i;
+    struct p2m_domain *p2m;
+
+    for ( i = 0; i < MAX_ALTP2M; i++ )
+    {
+        if ( !d->arch.altp2m_p2m[i] )
+            continue;
+        p2m = d->arch.altp2m_p2m[i];
+        p2m_free_one(p2m);
+        d->arch.altp2m_p2m[i] = NULL;
+    }
+}
+
 int p2m_init(struct domain *d)
 {
     int rc;
@@ -195,7 +234,14 @@ int p2m_init(struct domain *d)
      * (p2m_init runs too early for HVM_PARAM_* options) */
     rc = p2m_init_nestedp2m(d);
     if ( rc )
+    {
         p2m_teardown_hostp2m(d);
+        return rc;
+    }
+
+    rc = p2m_init_altp2m(d);
+    if ( rc )
+        p2m_teardown_altp2m(d);
 
     return rc;
 }
@@ -1891,6 +1937,62 @@ int unmap_mmio_regions(struct domain *d,
     return err;
 }
 
+bool_t p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp, unsigned long *idx)
+{
+    struct p2m_domain *p2m;
+    struct ept_data *ept;
+    bool_t rc = 0;
+    uint16_t i;
+
+    altp2m_lock(d);
+
+    for ( i = 0; i < MAX_ALTP2M; i++ )
+    {
+        if ( d->arch.altp2m_eptp[i] == ~0ul )
+            continue;
+
+        p2m = d->arch.altp2m_p2m[i];
+        ept = &p2m->ept;
+
+        if ( eptp != ept_get_eptp(ept) )
+            continue;
+
+        *idx = i;
+        rc = 1;
+
+        break;
+    }
+
+    altp2m_unlock(d);
+    return rc;
+}
+
+bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx)
+{
+    struct domain *d = v->domain;
+    bool_t rc = 0;
+
+    if ( idx >= MAX_ALTP2M )
+        return rc;
+
+    altp2m_lock(d);
+
+    if ( d->arch.altp2m_eptp[idx] != ~0ul )
+    {
+        if ( idx != vcpu_altp2mhvm(v).p2midx )
+        {
+            cpumask_clear_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
+            vcpu_altp2mhvm(v).p2midx = idx;
+            cpumask_set_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
+            ahvm_vcpu_update_eptp(v);
+        }
+        rc = 1;
+    }
+
+    altp2m_unlock(d);
+    return rc;
+}
+
 /*** Audit ***/
 
 #if P2M_AUDIT
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 6a77a93..a0e9e90 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -230,6 +230,7 @@ struct paging_vcpu {
 typedef xen_domctl_cpuid_t cpuid_input_t;
 
 #define MAX_NESTEDP2M 10
+#define MAX_ALTP2M    10
 struct p2m_domain;
 struct time_scale {
     int shift;
@@ -274,6 +275,12 @@ struct arch_domain
     struct p2m_domain *nested_p2m[MAX_NESTEDP2M];
     mm_lock_t nested_p2m_lock;
 
+    /* altp2m: allow multiple copies of host p2m */
+    bool_t altp2m_active;
+    struct p2m_domain *altp2m_p2m[MAX_ALTP2M];
+    mm_lock_t altp2m_lock;
+    uint64_t *altp2m_eptp;
+
     /* NB. protected by d->event_lock and by irq_desc[irq].lock */
     struct radix_tree_root irq_pirq;
 
diff --git a/xen/include/asm-x86/hvm/altp2mhvm.h b/xen/include/asm-x86/hvm/altp2mhvm.h
new file mode 100644
index 0000000..919986e
--- /dev/null
+++ b/xen/include/asm-x86/hvm/altp2mhvm.h
@@ -0,0 +1,36 @@
+/*
+ * Alternate p2m HVM
+ * Copyright (c) 2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#ifndef _HVM_ALTP2M_H
+#define _HVM_ALTP2M_H
+
+#include <xen/types.h>         /* for uintNN_t */
+#include <xen/sched.h>         /* for struct vcpu, struct domain */
+#include <asm/hvm/vcpu.h>      /* for vcpu_altp2mhvm */
+
+/* Alternate p2m HVM on/off per domain */
+#define altp2mhvm_active(d) \
+    d->arch.altp2m_active
+
+/* Alternate p2m VCPU */
+int altp2mhvm_vcpu_initialise(struct vcpu *v);
+void altp2mhvm_vcpu_destroy(struct vcpu *v);
+void altp2mhvm_vcpu_reset(struct vcpu *v);
+
+#endif /* _HVM_ALTP2M_H */
+
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 7115a68..32d1d02 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -210,6 +210,14 @@ struct hvm_function_table {
                                   uint32_t *ecx, uint32_t *edx);
 
     void (*enable_msr_exit_interception)(struct domain *d);
+
+    /* Alternate p2m */
+    int (*ahvm_vcpu_initialise)(struct vcpu *v);
+    void (*ahvm_vcpu_destroy)(struct vcpu *v);
+    int (*ahvm_vcpu_reset)(struct vcpu *v);
+    void (*ahvm_vcpu_update_eptp)(struct vcpu *v);
+    void (*ahvm_vcpu_update_vmfunc_ve)(struct vcpu *v);
+    bool_t (*ahvm_vcpu_emulate_ve)(struct vcpu *v);
 };
 
 extern struct hvm_function_table hvm_funcs;
@@ -531,6 +539,15 @@ extern bool_t opt_hvm_fep;
 #define opt_hvm_fep 0
 #endif
 
+/* updates the current EPTP in VMCS */
+void ahvm_vcpu_update_eptp(struct vcpu *v);
+
+/* updates VMCS fields related to VMFUNC and #VE */
+void ahvm_vcpu_update_vmfunc_ve(struct vcpu *v);
+
+/* emulates #VE */
+bool_t ahvm_vcpu_emulate_ve(struct vcpu *v);
+
 #endif /* __ASM_X86_HVM_HVM_H__ */
 
 /*
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 01e0665..9302d40 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -118,6 +118,13 @@ struct nestedvcpu {
 
 #define vcpu_nestedhvm(v) ((v)->arch.hvm_vcpu.nvcpu)
 
+struct altp2mvcpu {
+    uint16_t    p2midx;         /* alternate p2m index */
+    uint64_t    veinfo;         /* #VE information page guest pfn */
+};
+
+#define vcpu_altp2mhvm(v) ((v)->arch.hvm_vcpu.avcpu)
+
 struct hvm_vcpu {
     /* Guest control-register and EFER values, just as the guest sees them. */
     unsigned long       guest_cr[5];
@@ -163,6 +170,8 @@ struct hvm_vcpu {
 
     struct nestedvcpu   nvcpu;
 
+    struct altp2mvcpu   avcpu;
+
     struct mtrr_state   mtrr;
     u64                 pat_cr;
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 8193901..9fb5ba0 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -688,6 +688,28 @@ void nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
     l1_pgentry_t *p, l1_pgentry_t new, unsigned int level);
 
 /*
+ * Alternate p2m: shadow p2m tables used for alternate memory views
+ */
+
+void altp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
+    l1_pgentry_t *p, l1_pgentry_t new, unsigned int level);
+
+/* get current alternate p2m table */
+static inline struct p2m_domain *p2m_get_altp2m(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    uint16_t index = vcpu_altp2mhvm(v).p2midx;
+
+    return d->arch.altp2m_p2m[index];
+}
+
+/* Locate an alternate p2m by its EPTP */
+bool_t p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp, unsigned long *idx);
+
+/* Switch alternate p2m for a single vcpu */
+bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx);
+
+/*
  * p2m type to IOMMU flags
  */
 static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt)
-- 
1.9.1


* [PATCH 06/11] VMX/altp2m: add code to support EPTP switching and #VE.
From: Ed White @ 2015-01-09 21:26 UTC
  To: xen-devel; +Cc: keir, ian.campbell, tim, ian.jackson, Ed White, jbeulich

Implement and hook up the code to enable VMX support of VMFUNC and #VE.

VMFUNC leaf 0 (EPTP switching) and #VE are emulated on hardware that
doesn't support them.
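
For context, a cooperating guest's #VE handler consumes the information
page roughly as follows (a sketch; ve_info_t is from patch 01,
handle_violation() stands in for guest policy, and both hardware and the
emulation below refuse to deliver another #VE until the guest clears the
semaphore):

    void guest_ve_handler(ve_info_t *vi)
    {
        /* Guest policy: fix up the access, request a view switch, etc. */
        handle_violation(vi->gpa, vi->gla, vi->eptp_index);

        vi->semaphore = 0;    /* re-arm: allow the next #VE */
    }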

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 138 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 931709b..a0a2d02 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -56,6 +56,7 @@
 #include <asm/debugger.h>
 #include <asm/apic.h>
 #include <asm/hvm/nestedhvm.h>
+#include <asm/hvm/altp2mhvm.h>
 #include <asm/event.h>
 #include <public/arch-x86/cpuid.h>
 
@@ -1718,6 +1719,91 @@ static void vmx_enable_msr_exit_interception(struct domain *d)
                                          MSR_TYPE_W);
 }
 
+static void vmx_vcpu_update_eptp(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    struct p2m_domain *p2m = altp2mhvm_active(d) ?
+        p2m_get_altp2m(v) : p2m_get_hostp2m(d);
+    struct ept_data *ept = &p2m->ept;
+
+    ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
+
+    vmx_vmcs_enter(v);
+
+    __vmwrite(EPT_POINTER, ept_get_eptp(ept));
+
+    vmx_vmcs_exit(v);
+}
+
+static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    u32 mask = SECONDARY_EXEC_ENABLE_VM_FUNCTIONS;
+
+    if ( !cpu_has_vmx_vmfunc )
+        return;
+
+    if ( cpu_has_vmx_virt_exceptions )
+        mask |= SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
+
+    vmx_vmcs_enter(v);
+
+    if ( !d->is_dying && altp2mhvm_active(d) )
+    {
+        v->arch.hvm_vmx.secondary_exec_control |= mask;
+        __vmwrite(VM_FUNCTION_CONTROL, VMX_VMFUNC_EPTP_SWITCHING);
+        __vmwrite(EPTP_LIST_ADDR, virt_to_maddr(d->arch.altp2m_eptp));
+
+        if ( cpu_has_vmx_virt_exceptions )
+        {
+            p2m_type_t t;
+            mfn_t mfn;
+
+            mfn = get_gfn_query_unlocked(d, vcpu_altp2mhvm(v).veinfo, &t);
+            __vmwrite(VIRT_EXCEPTION_INFO, mfn_x(mfn) << PAGE_SHIFT);
+        }
+    }
+    else
+        v->arch.hvm_vmx.secondary_exec_control &= ~mask;
+
+    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
+        v->arch.hvm_vmx.secondary_exec_control);
+
+    vmx_vmcs_exit(v);
+}
+
+static bool_t vmx_vcpu_emulate_ve(struct vcpu *v)
+{
+    bool_t rc = 0;
+    ve_info_t *veinfo = vcpu_altp2mhvm(v).veinfo ?
+        hvm_map_guest_frame_rw(vcpu_altp2mhvm(v).veinfo, 0) : NULL;
+
+    if ( !veinfo )
+        return 0;
+
+    if ( veinfo->semaphore != 0 )
+        goto out;
+
+    rc = 1;
+
+    veinfo->exit_reason = EXIT_REASON_EPT_VIOLATION;
+    veinfo->semaphore = ~0l;
+    veinfo->eptp_index = vcpu_altp2mhvm(v).p2midx;
+
+    vmx_vmcs_enter(v);
+    __vmread(EXIT_QUALIFICATION, &veinfo->exit_qualification);
+    __vmread(GUEST_LINEAR_ADDRESS, &veinfo->gla);
+    __vmread(GUEST_PHYSICAL_ADDRESS, &veinfo->gpa);
+    vmx_vmcs_exit(v);
+
+    hvm_inject_hw_exception(TRAP_virtualisation,
+                            HVM_DELIVER_NO_ERROR_CODE);
+
+out:
+    hvm_unmap_guest_frame(veinfo, 0);
+    return rc;
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
     .name                 = "VMX",
     .cpu_up_prepare       = vmx_cpu_up_prepare,
@@ -1777,6 +1863,9 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
     .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
     .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
+    .ahvm_vcpu_update_eptp = vmx_vcpu_update_eptp,
+    .ahvm_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
+    .ahvm_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
@@ -2551,6 +2640,17 @@ static void vmx_vmexit_ud_intercept(struct cpu_user_regs *regs)
         hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
         break;
     case X86EMUL_EXCEPTION:
+        /* check for a VMFUNC that should be emulated */
+        if ( !cpu_has_vmx_vmfunc && altp2mhvm_active(current->domain) &&
+             ctxt.insn_buf_bytes >= 3 && ctxt.insn_buf[0] == 0x0f &&
+             ctxt.insn_buf[1] == 0x01 && ctxt.insn_buf[2] == 0xd4 &&
+             regs->eax == 0 &&
+             p2m_switch_vcpu_altp2m_by_id(current, (uint16_t)regs->ecx) )
+        {
+            regs->eip += 3;
+            return;
+        }
+
         if ( ctxt.exn_pending )
             hvm_inject_trap(&ctxt.trap);
         /* fall through */
@@ -2698,6 +2798,40 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
     /* Now enable interrupts so it's safe to take locks. */
     local_irq_enable();
+
+    /*
+     * If the guest has the ability to switch EPTP without an exit,
+     * figure out whether it has done so and update the altp2m data.
+     */
+    if ( altp2mhvm_active(v->domain) &&
+        (v->arch.hvm_vmx.secondary_exec_control &
+        SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
+    {
+        unsigned long idx;
+
+        if ( v->arch.hvm_vmx.secondary_exec_control &
+            SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS )
+            __vmread(EPTP_INDEX, &idx);
+        else
+        {
+            unsigned long eptp;
+
+            __vmread(EPT_POINTER, &eptp);
+
+            if ( !p2m_find_altp2m_by_eptp(v->domain, eptp, &idx) )
+            {
+                gdprintk(XENLOG_ERR, "EPTP not found in alternate p2m list\n");
+                domain_crash(v->domain);
+            }
+        }
+
+        if ( (uint16_t)idx != vcpu_altp2mhvm(v).p2midx )
+        {
+            cpumask_clear_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
+            vcpu_altp2mhvm(v).p2midx = (uint16_t)idx;
+            cpumask_set_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
+        }
+    }
 
     /* XXX: This looks ugly, but we need a mechanism to ensure
      * any pending vmresume has really happened
@@ -3041,6 +3175,10 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             update_guest_eip();
         break;
 
+    case EXIT_REASON_VMFUNC:
+        vmx_vmexit_ud_intercept(regs);
+        break;
+
     case EXIT_REASON_INVEPT:
         if ( nvmx_handle_invept(regs) == X86EMUL_OKAY )
             update_guest_eip();
-- 
1.9.1


* [PATCH 07/11] x86/altp2m: introduce p2m_ram_rw_ve type.
From: Ed White @ 2015-01-09 21:26 UTC
  To: xen-devel; +Cc: keir, ian.campbell, tim, ian.jackson, Ed White, jbeulich

This is treated exactly like p2m_ram_rw, except that suppress_ve is not
set in the EPTE.
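
A sketch of the intended use (assuming the generic type-change helper
p2m_change_type_one(); the real callers arrive later in the series):

    /* Let EPT violations on this gfn raise #VE in the guest, rather
     * than requiring a vmexit for every fix-up. */
    p2m_change_type_one(d, gfn, p2m_ram_rw, p2m_ram_rw_ve);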

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 xen/arch/x86/mm/p2m-ept.c | 3 ++-
 xen/include/asm-x86/p2m.h | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 255b681..d227cbb 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -117,6 +117,7 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
             entry->r = entry->w = entry->x = 0;
             break;
         case p2m_ram_rw:
+        case p2m_ram_rw_ve:
             entry->r = entry->w = entry->x = 1;
             break;
         case p2m_mmio_direct:
@@ -771,7 +772,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
     }
 
     if ( cpu_has_vmx_virt_exceptions )
-        new_entry.suppress_ve = 1;
+        new_entry.suppress_ve = (p2mt != p2m_ram_rw_ve);
 
     rc = atomic_write_ept_entry(ept_entry, new_entry, target);
     if ( unlikely(rc) )
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 9fb5ba0..68a5f80 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -72,6 +72,7 @@ typedef enum {
     p2m_ram_shared = 12,          /* Shared or sharable memory */
     p2m_ram_broken = 13,          /* Broken page, access cause domain crash */
     p2m_map_foreign  = 14,        /* ram pages from foreign domain */
+    p2m_ram_rw_ve = 15,           /* Same as p2m_ram_rw, but used to allow #VE */
 } p2m_type_t;
 
 /* Modifiers to the query */
@@ -84,6 +85,7 @@ typedef unsigned int p2m_query_t;
 
 /* RAM types, which map to real machine frames */
 #define P2M_RAM_TYPES (p2m_to_mask(p2m_ram_rw)                \
+                       | p2m_to_mask(p2m_ram_rw_ve)           \
                        | p2m_to_mask(p2m_ram_logdirty)        \
                        | p2m_to_mask(p2m_ram_ro)              \
                        | p2m_to_mask(p2m_ram_paging_out)      \
-- 
1.9.1

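For context on how a guest consumes this type: pages mapped p2m_ram_rw_ve
have suppress_ve clear, so a qualifying EPT violation is delivered to the
guest as #VE instead of causing a VM exit, and the guest handler reads the
details from its #VE information area. A sketch of that layout as the SDM
describes it (the struct and field names are illustrative, not part of
this patch):

    /* #VE information area, filled in by hardware on #VE delivery. */
    struct ve_info {
        uint32_t exit_reason;         /* EXIT_REASON_EPT_VIOLATION (48) */
        uint32_t semaphore;           /* #VE is delivered only when 0 */
        uint64_t exit_qualification;
        uint64_t gla;                 /* guest linear address */
        uint64_t gpa;                 /* guest physical address */
        uint16_t eptp_index;          /* view the violation occurred in */
    };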

* [PATCH 08/11] x86/altp2m: add remaining support routines.
  2015-01-09 21:26 [PATCH 00/11] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (6 preceding siblings ...)
  2015-01-09 21:26 ` [PATCH 07/11] x86/altp2m: introduce p2m_ram_rw_ve type Ed White
@ 2015-01-09 21:26 ` Ed White
  2015-01-15 17:25   ` Tim Deegan
  2015-01-15 17:33   ` Tim Deegan
  2015-01-09 21:26 ` [PATCH 09/11] x86/altp2m: define and implement alternate p2m HVMOP types Ed White
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 135+ messages in thread
From: Ed White @ 2015-01-09 21:26 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, ian.campbell, tim, ian.jackson, Ed White, jbeulich

Add the remaining routines required to support enabling the alternate
p2m functionality.

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 xen/arch/x86/hvm/hvm.c              |  12 ++
 xen/arch/x86/mm/hap/altp2m_hap.c    |  76 ++++++++
 xen/arch/x86/mm/p2m.c               | 339 ++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/altp2mhvm.h |   6 +
 xen/include/asm-x86/p2m.h           |  26 +++
 5 files changed, 459 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e8787cc..e6f64a3 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2782,6 +2782,18 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
         goto out;
     }
 
+    if ( altp2mhvm_active(v->domain) )
+    {
+        int rv = altp2mhvm_hap_nested_page_fault(v, gpa, gla, npfec);
+
+        switch ( rv ) {
+        case ALTP2MHVM_PAGEFAULT_DONE:
+            return 1;
+        case ALTP2MHVM_PAGEFAULT_CONTINUE:
+            break;
+        }
+    }
+
     p2m = p2m_get_hostp2m(v->domain);
     mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 
                               P2M_ALLOC | (npfec.write_access ? P2M_UNSHARE : 0),
diff --git a/xen/arch/x86/mm/hap/altp2m_hap.c b/xen/arch/x86/mm/hap/altp2m_hap.c
index c2cdc42..b889626 100644
--- a/xen/arch/x86/mm/hap/altp2m_hap.c
+++ b/xen/arch/x86/mm/hap/altp2m_hap.c
@@ -29,6 +29,8 @@
 #include <asm/hap.h>
 #include <asm/hvm/support.h>
 
+#include <asm/hvm/altp2mhvm.h>
+
 #include "private.h"
 
 /* Override macros from asm/page.h to make them work with mfn_t */
@@ -56,6 +58,80 @@ altp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
 }
 
 /*
+ * If the fault is for a not present entry:
+ *     if the entry is present in the host p2m and is ram, copy it and retry
+ *     else indicate that outer handler should handle fault
+ *
+ * If the fault is for a present entry:
+ *     if the page type is not p2m_ram_rw_ve, crash domain
+ *     else if hardware does not support #VE, emulate it and retry
+ *     else crash domain
+ */
+
+int
+altp2mhvm_hap_nested_page_fault(struct vcpu *v, paddr_t gpa,
+                                unsigned long gla, struct npfec npfec)
+{
+    struct domain *d = v->domain;
+    struct p2m_domain *hp2m = p2m_get_hostp2m(d);
+    struct p2m_domain *ap2m;
+    p2m_type_t p2mt;
+    p2m_access_t p2ma;
+    unsigned int page_order;
+    unsigned long gfn, mask;
+    mfn_t mfn;
+    int rv;
+
+    ap2m = p2m_get_altp2m(v);
+
+    mfn = get_gfn_type_access(ap2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
+                              0, &page_order);
+    __put_gfn(ap2m, gpa >> PAGE_SHIFT);
+
+    if ( mfn_valid(mfn) )
+    {
+        /* Should #VE be emulated for this fault? */
+        if ( p2mt == p2m_ram_rw_ve && !cpu_has_vmx_virt_exceptions &&
+             ahvm_vcpu_emulate_ve(v) )
+            return ALTP2MHVM_PAGEFAULT_DONE;
+
+        /* Could not handle fault here */
+        gdprintk(XENLOG_INFO, "Altp2m memory access permissions failure, "
+                              "no mem_event listener VCPU %d, dom %d\n",
+                              v->vcpu_id, d->domain_id);
+        domain_crash(v->domain);
+        return ALTP2MHVM_PAGEFAULT_CONTINUE;
+    }
+
+    mfn = get_gfn_type_access(hp2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
+                              0, &page_order);
+    put_gfn(hp2m->domain, gpa >> PAGE_SHIFT);
+
+    if ( p2mt != p2m_ram_rw || p2ma != p2m_access_rwx )
+        return ALTP2MHVM_PAGEFAULT_CONTINUE;
+
+    p2m_lock(ap2m);
+
+    /* If this is a superpage mapping, round down both frame numbers
+     * to the start of the superpage. */
+    mask = ~((1UL << page_order) - 1);
+    gfn = (gpa >> PAGE_SHIFT) & mask;
+    mfn = _mfn(mfn_x(mfn) & mask);
+
+    rv = p2m_set_entry(ap2m, gfn, mfn, page_order, p2mt, p2ma);
+    p2m_unlock(ap2m);
+
+    if ( rv )
+    {
+        gdprintk(XENLOG_ERR, "failed to set entry for %#"PRIx64" -> %#"
+                 PRIx64"\n", gpa, mfn_x(mfn));
+        domain_crash(hp2m->domain);
+    }
+
+    return ALTP2MHVM_PAGEFAULT_DONE;
+}
+
+/*
  * Local variables:
  * mode: C
  * c-file-style: "BSD"
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 3c6049b..44bf1ad 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1993,6 +1993,345 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx)
     return rc;
 }
 
+void p2m_flush_altp2m(struct domain *d)
+{
+    uint16_t i;
+
+    altp2m_lock(d);
+
+    for ( i = 0; i < MAX_ALTP2M; i++ )
+    {
+        p2m_flush_table(d->arch.altp2m_p2m[i]);
+        d->arch.altp2m_eptp[i] = ~0ul;
+    }
+
+    altp2m_unlock(d);
+}
+
+bool_t p2m_init_altp2m_by_id(struct domain *d, uint16_t idx)
+{
+    struct p2m_domain *p2m;
+    struct ept_data *ept;
+    bool_t rc = 0;
+
+    if ( idx >= MAX_ALTP2M )
+        return rc;
+
+    altp2m_lock(d);
+
+    if ( d->arch.altp2m_eptp[idx] == ~0ul )
+    {
+        p2m = d->arch.altp2m_p2m[idx];
+        ept = &p2m->ept;
+        ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
+        d->arch.altp2m_eptp[idx] = ept_get_eptp(ept);
+        rc = 1;
+    }
+
+    altp2m_unlock(d);
+    return rc;
+}
+
+bool_t p2m_init_next_altp2m(struct domain *d, uint16_t *idx)
+{
+    struct p2m_domain *p2m;
+    struct ept_data *ept;
+    bool_t rc = 0;
+    uint16_t i;
+
+    altp2m_lock(d);
+
+    for ( i = 0; i < MAX_ALTP2M; i++ )
+    {
+        if ( d->arch.altp2m_eptp[i] != ~0ul )
+            continue;
+
+        p2m = d->arch.altp2m_p2m[i];
+        ept = &p2m->ept;
+        ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
+        d->arch.altp2m_eptp[i] = ept_get_eptp(ept);
+        *idx = i;
+        rc = 1;
+
+        break;
+    }
+
+    altp2m_unlock(d);
+    return rc;
+}
+
+bool_t p2m_destroy_altp2m_by_id(struct domain *d, uint16_t idx)
+{
+    struct p2m_domain *p2m;
+    struct vcpu *curr = current;
+    struct vcpu *v;
+    bool_t rc = 0;
+
+    if ( !idx || idx >= MAX_ALTP2M )
+        return rc;
+
+    if ( curr->domain != d )
+        domain_pause(d);
+    else
+        for_each_vcpu( d, v )
+            if ( curr != v )
+                vcpu_pause(v);
+
+    altp2m_lock(d);
+
+    if ( d->arch.altp2m_eptp[idx] != ~0ul )
+    {
+        p2m = d->arch.altp2m_p2m[idx];
+
+        if ( !cpumask_weight(p2m->dirty_cpumask) )
+        {
+            p2m_flush_table(d->arch.altp2m_p2m[idx]);
+            d->arch.altp2m_eptp[idx] = ~0ul;
+            rc = 1;
+        }
+    }
+
+    altp2m_unlock(d);
+
+    if ( curr->domain != d )
+        domain_unpause(d);
+    else
+        for_each_vcpu( d, v )
+            if ( curr != v )
+                vcpu_unpause(v);
+
+    return rc;
+}
+
+bool_t p2m_switch_domain_altp2m_by_id(struct domain *d, uint16_t idx)
+{
+    struct vcpu *curr = current;
+    struct vcpu *v;
+    bool_t rc = 0;
+
+    if ( idx >= MAX_ALTP2M )
+        return rc;
+
+    if ( curr->domain != d )
+        domain_pause(d);
+    else
+        for_each_vcpu( d, v )
+            if ( curr != v )
+                vcpu_pause(v);
+
+    altp2m_lock(d);
+
+    if ( d->arch.altp2m_eptp[idx] != ~0ul )
+    {
+        for_each_vcpu( d, v )
+            if ( idx != vcpu_altp2mhvm(v).p2midx )
+            {
+                cpumask_clear_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
+                vcpu_altp2mhvm(v).p2midx = idx;
+                cpumask_set_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
+                ahvm_vcpu_update_eptp(v);
+            }
+
+        rc = 1;
+    }
+
+    altp2m_unlock(d);
+
+    if ( curr->domain != d )
+        domain_unpause(d);
+    else
+        for_each_vcpu( d, v )
+            if ( curr != v )
+                vcpu_unpause(v);
+
+    return rc;
+}
+
+bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
+                                 unsigned long pfn, xenmem_access_t access)
+{
+    struct p2m_domain *hp2m, *ap2m;
+    p2m_access_t a, _a;
+    p2m_type_t t;
+    mfn_t mfn;
+    unsigned int page_order;
+    bool_t rc = 0;
+
+    static const p2m_access_t memaccess[] = {
+#define ACCESS(ac) [XENMEM_access_##ac] = p2m_access_##ac
+        ACCESS(n),
+        ACCESS(r),
+        ACCESS(w),
+        ACCESS(rw),
+        ACCESS(x),
+        ACCESS(rx),
+        ACCESS(wx),
+        ACCESS(rwx),
+#undef ACCESS
+    };
+
+    if ( idx >= MAX_ALTP2M || d->arch.altp2m_eptp[idx] == ~0ul )
+        return 0;
+
+    ap2m = d->arch.altp2m_p2m[idx];
+
+    switch ( access )
+    {
+    case 0 ... ARRAY_SIZE(memaccess) - 1:
+        a = memaccess[access];
+        break;
+    case XENMEM_access_default:
+        a = ap2m->default_access;
+        break;
+    default:
+        return 0;
+    }
+
+    /* If request to set default access */
+    if ( pfn == ~0ul )
+    {
+        ap2m->default_access = a;
+        return 1;
+    }
+
+    hp2m = p2m_get_hostp2m(d);
+
+    p2m_lock(ap2m);
+
+    mfn = ap2m->get_entry(ap2m, pfn, &t, &_a, 0, NULL);
+
+    /* Check host p2m if no valid entry in alternate */
+    if ( !mfn_valid(mfn) )
+    {
+        mfn = hp2m->get_entry(hp2m, pfn, &t, &_a, 0, &page_order);
+
+        if ( !mfn_valid(mfn) || t != p2m_ram_rw )
+            goto out;
+
+        /* If this is a superpage, copy that first */
+        if ( page_order != PAGE_ORDER_4K )
+        {
+            unsigned long gfn, mask;
+            mfn_t mfn2;
+
+            mask = ~((1UL << page_order) - 1);
+            gfn = pfn & mask;
+            mfn2 = _mfn(mfn_x(mfn) & mask);
+
+            if ( ap2m->set_entry(ap2m, gfn, mfn2, page_order, t, _a) )
+                goto out;
+        }
+    }
+
+    /* Use special ram type to enable #VE if setting for current domain */
+    if ( current->domain == d )
+        t = p2m_ram_rw_ve;
+
+    if ( !ap2m->set_entry(ap2m, pfn, mfn, PAGE_ORDER_4K, t, a) )
+        rc = 1;
+
+ out:
+    p2m_unlock(ap2m);
+    return rc;
+}
+
+bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
+                             unsigned long old_pfn, unsigned long new_pfn)
+{
+    struct p2m_domain *hp2m, *ap2m;
+    p2m_access_t a;
+    p2m_type_t t;
+    mfn_t mfn;
+    unsigned int page_order;
+    bool_t rc = 0;
+
+    if ( idx >= MAX_ALTP2M || d->arch.altp2m_eptp[idx] == ~0ul )
+        return 0;
+
+    hp2m = p2m_get_hostp2m(d);
+    ap2m = d->arch.altp2m_p2m[idx];
+
+    p2m_lock(ap2m);
+
+    mfn = ap2m->get_entry(ap2m, old_pfn, &t, &a, 0, NULL);
+
+    if ( new_pfn == ~0ul )
+    {
+        if ( mfn_valid(mfn) )
+            p2m_remove_page(ap2m, old_pfn, mfn_x(mfn), PAGE_ORDER_4K);
+        rc = 1;
+        goto out;
+    }
+
+    /* Check host p2m if no valid entry in alternate */
+    if ( !mfn_valid(mfn) )
+    {
+        mfn = hp2m->get_entry(hp2m, old_pfn, &t, &a, 0, &page_order);
+
+        if ( !mfn_valid(mfn) || t != p2m_ram_rw )
+            goto out;
+
+        /* If this is a superpage, copy that first */
+        if ( page_order != PAGE_ORDER_4K )
+        {
+            unsigned long gfn, mask;
+
+            mask = ~((1UL << page_order) - 1);
+            gfn = old_pfn & mask;
+            mfn = _mfn(mfn_x(mfn) & mask);
+
+            if ( ap2m->set_entry(ap2m, gfn, mfn, page_order, t, a) )
+                goto out;
+        }
+    }
+
+    mfn = ap2m->get_entry(ap2m, new_pfn, &t, &a, 0, NULL);
+
+    if ( !mfn_valid(mfn) )
+        mfn = hp2m->get_entry(hp2m, new_pfn, &t, &a, 0, NULL);
+
+    if ( !mfn_valid(mfn) || !(t == p2m_ram_rw || t == p2m_ram_rw_ve) )
+        goto out;
+
+    /* Use special ram type to enable #VE if setting for current domain */
+    if ( current->domain == d )
+        t = p2m_ram_rw_ve;
+
+    if ( !ap2m->set_entry(ap2m, old_pfn, mfn, PAGE_ORDER_4K, t, a) )
+        rc = 1;
+
+ out:
+    p2m_unlock(ap2m);
+    return rc;
+}
+
+void p2m_remove_altp2m_page(struct domain *d, unsigned long gfn)
+{
+    struct p2m_domain *p2m;
+    p2m_access_t a;
+    p2m_type_t t;
+    mfn_t mfn;
+    uint16_t i;
+
+    altp2m_lock(d);
+
+    for ( i = 0; i < MAX_ALTP2M; i++ )
+    {
+        if ( d->arch.altp2m_eptp[i] == ~0ul )
+            continue;
+
+        p2m = d->arch.altp2m_p2m[i];
+        mfn = get_gfn_type_access(p2m, gfn, &t, &a, 0, NULL);
+
+        if ( mfn_valid(mfn) )
+            p2m_remove_page(p2m, gfn, mfn_x(mfn), PAGE_ORDER_4K);
+
+        __put_gfn(p2m, gfn);
+    }
+
+    altp2m_unlock(d);
+}
+
 /*** Audit ***/
 
 #if P2M_AUDIT
diff --git a/xen/include/asm-x86/hvm/altp2mhvm.h b/xen/include/asm-x86/hvm/altp2mhvm.h
index 919986e..f752815 100644
--- a/xen/include/asm-x86/hvm/altp2mhvm.h
+++ b/xen/include/asm-x86/hvm/altp2mhvm.h
@@ -32,5 +32,11 @@ int altp2mhvm_vcpu_initialise(struct vcpu *v);
 void altp2mhvm_vcpu_destroy(struct vcpu *v);
 void altp2mhvm_vcpu_reset(struct vcpu *v);
 
+/* Alternate p2m paging */
+#define ALTP2MHVM_PAGEFAULT_DONE       0
+#define ALTP2MHVM_PAGEFAULT_CONTINUE   1
+int altp2mhvm_hap_nested_page_fault(struct vcpu *v, paddr_t gpa,
+    unsigned long gla, struct npfec npfec);
+
 #endif /* _HVM_ALTP2M_H */
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 68a5f80..52588ed 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -711,6 +711,32 @@ bool_t p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp, unsigned long *i
 /* Switch alternate p2m for a single vcpu */
 bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx);
 
+/* Flush all the alternate p2m's for a domain */
+void p2m_flush_altp2m(struct domain *d);
+
+/* Make a specific alternate p2m valid */
+bool_t p2m_init_altp2m_by_id(struct domain *d, uint16_t idx);
+
+/* Find an available alternate p2m and make it valid */
+bool_t p2m_init_next_altp2m(struct domain *d, uint16_t *idx);
+
+/* Make a specific alternate p2m invalid */
+bool_t p2m_destroy_altp2m_by_id(struct domain *d, uint16_t idx);
+
+/* Switch alternate p2m for entire domain */
+bool_t p2m_switch_domain_altp2m_by_id(struct domain *d, uint16_t idx);
+
+/* Set access type for a pfn */
+bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
+                                 unsigned long pfn, xenmem_access_t access);
+
+/* Replace a pfn with a different pfn */
+bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
+                             unsigned long old_pfn, unsigned long new_pfn);
+
+/* Invalidate a page in all alternate p2m's */
+void p2m_remove_altp2m_page(struct domain *d, unsigned long gfn);
+
 /*
  * p2m type to IOMMU flags
  */
-- 
1.9.1

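Both altp2mhvm_hap_nested_page_fault() and the copy-before-modify paths in
p2m.c above round superpage mappings down before copying them into the
alternate p2m. A worked example of the mask arithmetic, assuming a 2M
mapping (page_order == 9) and an illustrative gfn:

    unsigned long page_order = 9;                     /* 2M mapping */
    unsigned long mask = ~((1UL << page_order) - 1);  /* ...fffffe00 */
    unsigned long gfn  = 0x12345 & mask;              /* rounds to 0x12200 */
    /* A single p2m_set_entry() at page_order then copies gfns
     * 0x12200-0x123ff into the alternate p2m, after which the one 4K
     * entry of interest can be rewritten at PAGE_ORDER_4K. */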

* [PATCH 09/11] x86/altp2m: define and implement alternate p2m HVMOP types.
  2015-01-09 21:26 [PATCH 00/11] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (7 preceding siblings ...)
  2015-01-09 21:26 ` [PATCH 08/11] x86/altp2m: add remaining support routines Ed White
@ 2015-01-09 21:26 ` Ed White
  2015-01-15 17:09   ` Tim Deegan
  2015-01-09 21:26 ` [PATCH 10/11] x86/altp2m: fix log-dirty handling Ed White
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-09 21:26 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, ian.campbell, tim, ian.jackson, Ed White, jbeulich

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 xen/arch/x86/hvm/hvm.c          | 217 ++++++++++++++++++++++++++++++++++++++++
 xen/include/public/hvm/hvm_op.h |  68 +++++++++++++
 2 files changed, 285 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e6f64a3..afe16bf 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -6145,6 +6145,223 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case HVMOP_altp2m_get_domain_state:
+    {
+        struct xen_hvm_altp2m_domain_state a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = -EINVAL;
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() )
+            goto param_fail9;
+
+        a.state = altp2mhvm_active(d);
+        rc = copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
+
+    param_fail9:
+        rcu_unlock_domain(d);
+        break;
+    }
+
+    case HVMOP_altp2m_set_domain_state:
+    {
+        struct xen_hvm_altp2m_domain_state a;
+        struct domain *d;
+        struct vcpu *v;
+        bool_t ostate;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = -EINVAL;
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+             nestedhvm_enabled(d) )
+            goto param_fail10;
+
+        ostate = d->arch.altp2m_active;
+        d->arch.altp2m_active = !!a.state;
+
+        /* If the alternate p2m state has changed, handle appropriately */
+        if ( d->arch.altp2m_active != ostate )
+        {
+            if ( !ostate && !p2m_init_altp2m_by_id(d, 0) )
+                goto param_fail10;
+
+            for_each_vcpu( d, v )
+                if ( !ostate )
+                    altp2mhvm_vcpu_initialise(v);
+                else
+                    altp2mhvm_vcpu_destroy(v);
+
+            if ( ostate )
+                p2m_flush_altp2m(d);
+        }
+
+        rc = 0;
+
+    param_fail10:
+        rcu_unlock_domain(d);
+        break;
+    }
+
+    case HVMOP_altp2m_vcpu_enable_notify:
+    {
+        struct vcpu *curr = current;
+        struct xen_hvm_altp2m_vcpu_enable_notify a;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        if ( !is_hvm_domain(curr_d) || !hvm_altp2m_supported() ||
+             !curr_d->arch.altp2m_active || vcpu_altp2mhvm(curr).veinfo )
+            return -EINVAL;
+
+        vcpu_altp2mhvm(curr).veinfo = a.pfn;
+        ahvm_vcpu_update_vmfunc_ve(curr);
+        rc = 0;
+
+        break;
+    }
+
+    case HVMOP_altp2m_create_p2m:
+    {
+        struct xen_hvm_altp2m_view a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = -EINVAL;
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+             !d->arch.altp2m_active )
+            goto param_fail11;
+
+        if ( !p2m_init_next_altp2m(d, &a.view) )
+            goto param_fail11;
+
+        p2m_set_altp2m_mem_access(d, a.view, ~0ul, a.hvmmem_default_access);
+
+        rc = copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
+
+    param_fail11:
+        rcu_unlock_domain(d);
+        break;
+    }
+
+    case HVMOP_altp2m_destroy_p2m:
+    {
+        struct xen_hvm_altp2m_view a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = -EINVAL;
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+             !d->arch.altp2m_active )
+            goto param_fail12;
+
+        if ( p2m_destroy_altp2m_by_id(d, a.view) )
+            rc = 0;
+
+    param_fail12:
+        rcu_unlock_domain(d);
+        break;
+    }
+
+    case HVMOP_altp2m_switch_p2m:
+    {
+        struct xen_hvm_altp2m_view a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = -EINVAL;
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+             !d->arch.altp2m_active )
+            goto param_fail13;
+
+        if ( p2m_switch_domain_altp2m_by_id(d, a.view) )
+            rc = 0;
+
+    param_fail13:
+        rcu_unlock_domain(d);
+        break;
+    }
+
+    case HVMOP_altp2m_set_mem_access:
+    {
+        struct xen_hvm_altp2m_set_mem_access a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = -EINVAL;
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+             !d->arch.altp2m_active )
+            goto param_fail14;
+
+        if ( p2m_set_altp2m_mem_access(d, a.view, a.pfn, a.hvmmem_access) )
+            rc = 0;
+
+    param_fail14:
+        rcu_unlock_domain(d);
+        break;
+    }
+
+    case HVMOP_altp2m_change_pfn:
+    {
+        struct xen_hvm_altp2m_change_pfn a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = -EINVAL;
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+             !d->arch.altp2m_active )
+            goto param_fail15;
+
+        if ( p2m_change_altp2m_pfn(d, a.view, a.old_pfn, a.new_pfn) )
+            rc = 0;
+
+    param_fail15:
+        rcu_unlock_domain(d);
+        break;
+    }
+
     default:
     {
         gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index eeb0a60..ea542ec 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -369,6 +369,74 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t);
 
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
+/* Set/get the altp2m state for a domain */
+#define HVMOP_altp2m_set_domain_state     23
+#define HVMOP_altp2m_get_domain_state     24
+struct xen_hvm_altp2m_domain_state {
+    /* Domain to be updated or queried */
+    domid_t domid;
+    /* IN or OUT variable on/off */
+    uint8_t state;
+};
+typedef struct xen_hvm_altp2m_domain_state xen_hvm_altp2m_domain_state_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_domain_state_t);
+
+/* Set the current VCPU to receive altp2m event notifications */
+#define HVMOP_altp2m_vcpu_enable_notify   25
+struct xen_hvm_altp2m_vcpu_enable_notify {
+    /* #VE info area pfn */
+    uint64_t pfn;
+};
+typedef struct xen_hvm_altp2m_vcpu_enable_notify xen_hvm_altp2m_vcpu_enable_notify_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_vcpu_enable_notify_t);
+
+/* Create a new view */
+#define HVMOP_altp2m_create_p2m   26
+/* Destroy a view */
+#define HVMOP_altp2m_destroy_p2m  27
+/* Switch view for an entire domain */
+#define HVMOP_altp2m_switch_p2m   28
+struct xen_hvm_altp2m_view {
+    /* Domain to be updated */
+    domid_t domid;
+    /* IN/OUT variable */
+    uint16_t view;
+    /* Create view only: default access type */
+    uint16_t hvmmem_default_access; /* xenmem_access_t */
+};
+typedef struct xen_hvm_altp2m_view xen_hvm_altp2m_view_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_view_t);
+
+/* Notify that a page of memory is to have specific access types */
+#define HVMOP_altp2m_set_mem_access 29
+struct xen_hvm_altp2m_set_mem_access {
+    /* Domain to be updated. */
+    domid_t domid;
+    /* view */
+    uint16_t view;
+    /* Memory type */
+    uint16_t hvmmem_access; /* xenmem_access_t */
+    /* pfn */
+    uint64_t pfn;
+};
+typedef struct xen_hvm_altp2m_set_mem_access xen_hvm_altp2m_set_mem_access_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_set_mem_access_t);
+
+/* Change a p2m entry to map a different pfn */
+#define HVMOP_altp2m_change_pfn 30
+struct xen_hvm_altp2m_change_pfn {
+    /* Domain to be updated. */
+    domid_t domid;
+    /* view */
+    uint16_t view;
+    /* old pfn */
+    uint64_t old_pfn;
+    /* new pfn, -1 means revert */
+    uint64_t new_pfn;
+};
+typedef struct xen_hvm_altp2m_change_pfn xen_hvm_altp2m_change_pfn_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_change_pfn_t);
+
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
 
 /*
-- 
1.9.1

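Taken together, a typical intra-domain sequence over these ops would look
roughly like the sketch below; hvmop() stands in for the guest's HVMOP
hypercall wrapper, error handling is omitted, and the pfn is illustrative:

    struct xen_hvm_altp2m_domain_state s = { .domid = DOMID_SELF, .state = 1 };
    struct xen_hvm_altp2m_view view = { .domid = DOMID_SELF,
                                        .hvmmem_default_access = XENMEM_access_rwx };
    struct xen_hvm_altp2m_set_mem_access ma = { .domid = DOMID_SELF,
                                                .hvmmem_access = XENMEM_access_r,
                                                .pfn = 0x1234 };

    hvmop(HVMOP_altp2m_set_domain_state, &s);   /* enable altp2m */
    hvmop(HVMOP_altp2m_create_p2m, &view);      /* view.view holds the new id */
    ma.view = view.view;
    hvmop(HVMOP_altp2m_set_mem_access, &ma);    /* restrict one pfn to read */
    hvmop(HVMOP_altp2m_switch_p2m, &view);      /* run the domain on that view */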

* [PATCH 10/11] x86/altp2m: fix log-dirty handling.
  2015-01-09 21:26 [PATCH 00/11] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (8 preceding siblings ...)
  2015-01-09 21:26 ` [PATCH 09/11] x86/altp2m: define and implement alternate p2m HVMOP types Ed White
@ 2015-01-09 21:26 ` Ed White
  2015-01-15 17:20   ` Tim Deegan
  2015-01-09 21:26 ` [PATCH 11/11] x86/altp2m: alternate p2m memory events Ed White
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-09 21:26 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, ian.campbell, tim, ian.jackson, Ed White, jbeulich

Log-dirty, as used to track vram changes, works exclusively on the host p2m.
As a result, when running on any other p2m, vram changes aren't tracked
properly and the domain's console display is corrupted.

To fix this, log-dirty pages are never valid in the alternate p2m's, and
if the type of any page in the host p2m is changed, that page is immediately
removed from any alternate p2m in which it was previously valid.

This requires taking the alternate p2m list lock, so to avoid a locking
order violation p2m_change_type_one() must not be called with the host p2m
lock held. This requires a minor change to the exit code flow in the
nested page fault handler, and removing the p2m locking code in
paging_log_dirty_range().

As far as I can tell, removing the latter code is safe since
p2m_change_type_one() acquires a gfn lock on the page before changing it.

With these changes, the alternate p2m nested page fault handler can safely
ignore log-dirty and leave it to be handled in the host p2m nested page
fault handler.

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 xen/arch/x86/hvm/hvm.c   | 4 +++-
 xen/arch/x86/mm/p2m.c    | 4 ++++
 xen/arch/x86/mm/paging.c | 5 -----
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index afe16bf..18d5987 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2885,6 +2885,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     /* Spurious fault? PoD and log-dirty also take this path. */
     if ( p2m_is_ram(p2mt) )
     {
+        rc = 1;
         /*
          * Page log dirty is always done with order 0. If this mfn resides in
          * a large page, we do not change other pages type within that large
@@ -2893,9 +2894,10 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
         if ( npfec.write_access )
         {
             paging_mark_dirty(v->domain, mfn_x(mfn));
+            put_gfn(p2m->domain, gfn);
             p2m_change_type_one(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
+            goto out;
         }
-        rc = 1;
         goto out_put_gfn;
     }
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 44bf1ad..843a433 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -793,6 +793,10 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn,
 
     gfn_unlock(p2m, gfn, 0);
 
+    if ( pt == ot && altp2mhvm_active(d) )
+        /* make sure this page isn't valid in any alternate p2m */
+        p2m_remove_altp2m_page(d, gfn);
+
     return rc;
 }
 
diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index 6b788f7..2be68ae 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -574,7 +574,6 @@ void paging_log_dirty_range(struct domain *d,
                            unsigned long nr,
                            uint8_t *dirty_bitmap)
 {
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
     int i;
     unsigned long pfn;
 
@@ -588,14 +587,10 @@ void paging_log_dirty_range(struct domain *d,
      * switched to read-write.
      */
 
-    p2m_lock(p2m);
-
     for ( i = 0, pfn = begin_pfn; pfn < begin_pfn + nr; i++, pfn++ )
         if ( !p2m_change_type_one(d, pfn, p2m_ram_rw, p2m_ram_logdirty) )
             dirty_bitmap[i >> 3] |= (1 << (i & 7));
 
-    p2m_unlock(p2m);
-
     flush_tlb_mask(d->domain_dirty_cpumask);
 }
 
-- 
1.9.1


* [PATCH 11/11] x86/altp2m: alternate p2m memory events.
  2015-01-09 21:26 [PATCH 00/11] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (9 preceding siblings ...)
  2015-01-09 21:26 ` [PATCH 10/11] x86/altp2m: fix log-dirty handling Ed White
@ 2015-01-09 21:26 ` Ed White
  2015-01-09 22:06 ` [PATCH 00/11] Alternate p2m: support multiple copies of host p2m Andrew Cooper
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-01-09 21:26 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, ian.campbell, tim, ian.jackson, Ed White, jbeulich

Add a flag to indicate that a memory event occurred in an alternate p2m
and a field containing the p2m index. Allow the response to switch to
a different p2m using the same flag and field.

Modify p2m_mem_access_check() to handle alternate p2m's. access_required is
always assumed for an alternate p2m.

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 xen/arch/x86/mm/hap/altp2m_hap.c | 53 ++++++++++++++++++++++++++++++++++++++--
 xen/arch/x86/mm/p2m.c            | 18 ++++++++++++--
 xen/common/mem_access.c          |  1 +
 xen/include/asm-arm/p2m.h        |  7 ++++++
 xen/include/asm-x86/p2m.h        |  4 +++
 xen/include/public/mem_event.h   |  9 +++++++
 6 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm/hap/altp2m_hap.c b/xen/arch/x86/mm/hap/altp2m_hap.c
index b889626..dd56bbc 100644
--- a/xen/arch/x86/mm/hap/altp2m_hap.c
+++ b/xen/arch/x86/mm/hap/altp2m_hap.c
@@ -19,6 +19,7 @@
  */
 
 #include <xen/mem_event.h>
+#include <xen/mem_access.h>
 #include <xen/event.h>
 #include <public/mem_event.h>
 #include <asm/domain.h>
@@ -63,8 +64,9 @@ altp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
  *     else indicate that outer handler should handle fault
  *
  * If the fault is for a present entry:
- *     if the page type is not p2m_ram_rw_ve, crash domain
- *     else if hardware does not support #VE, emulate it and retry
+ *     if the page type is p2m_ram_rw_ve and hardware does not support #VE,
+ *         emulate #VE and retry if the emulation succeeds
+ *     else try to send a memory event
  *     else crash domain
  */
 
@@ -90,11 +92,58 @@ altp2mhvm_hap_nested_page_fault(struct vcpu *v, paddr_t gpa,
 
     if ( mfn_valid(mfn) )
     {
+        bool_t violation;
+        mem_event_request_t *req_ptr = NULL;
+
         /* Should #VE be emulated for this fault? */
         if ( p2mt == p2m_ram_rw_ve && !cpu_has_vmx_virt_exceptions &&
              ahvm_vcpu_emulate_ve(v) )
             return ALTP2MHVM_PAGEFAULT_DONE;
 
+        /* Fault not handled yet, so try for mem_event */
+        switch ( p2ma )
+        {
+        case p2m_access_n:
+        case p2m_access_n2rwx:
+        default:
+            violation = npfec.read_access || npfec.write_access || npfec.insn_fetch;
+            break;
+        case p2m_access_r:
+            violation = npfec.write_access || npfec.insn_fetch;
+            break;
+        case p2m_access_w:
+            violation = npfec.read_access || npfec.insn_fetch;
+            break;
+        case p2m_access_x:
+            violation = npfec.read_access || npfec.write_access;
+            break;
+        case p2m_access_rx:
+        case p2m_access_rx2rw:
+            violation = npfec.write_access;
+            break;
+        case p2m_access_wx:
+            violation = npfec.read_access;
+            break;
+        case p2m_access_rw:
+            violation = npfec.insn_fetch;
+            break;
+        case p2m_access_rwx:
+            violation = 0;
+            break;
+        }
+
+        if ( violation )
+        {
+            p2m_mem_access_check(gpa, gla, npfec, &req_ptr);
+
+            if ( req_ptr )
+            {
+                mem_access_send_req(v->domain, req_ptr);
+                xfree(req_ptr);
+                return ALTP2MHVM_PAGEFAULT_DONE;
+            }
+        }
+
         /* Could not handle fault here */
         gdprintk(XENLOG_INFO, "Altp2m memory access permissions failure, "
                               "no mem_event listener VCPU %d, dom %d\n",
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 843a433..d296c8f 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1486,6 +1486,13 @@ void p2m_mem_event_emulate_check(struct vcpu *v, const mem_event_response_t *rsp
     }
 }
 
+void p2m_mem_event_altp2m_check(struct vcpu *v, const mem_event_response_t *rsp)
+{
+    if ( (rsp->flags & MEM_EVENT_FLAG_ALTERNATE_P2M) &&
+         altp2mhvm_active(v->domain) )
+        p2m_switch_vcpu_altp2m_by_id(v, rsp->altp2m_idx);
+}
+
 void p2m_setup_introspection(struct domain *d)
 {
     if ( hvm_funcs.enable_msr_exit_interception )
@@ -1502,7 +1509,8 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
     struct vcpu *v = current;
     unsigned long gfn = gpa >> PAGE_SHIFT;
     struct domain *d = v->domain;    
-    struct p2m_domain* p2m = p2m_get_hostp2m(d);
+    struct p2m_domain *p2m = altp2mhvm_active(d) ?
+        p2m_get_altp2m(v) : p2m_get_hostp2m(d);
     mfn_t mfn;
     p2m_type_t p2mt;
     p2m_access_t p2ma;
@@ -1536,7 +1544,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
     if ( !mem_event_check_ring(&d->mem_event->access) || !req_ptr ) 
     {
         /* No listener */
-        if ( p2m->access_required ) 
+        if ( p2m->access_required || altp2mhvm_active(d) )
         {
             gdprintk(XENLOG_INFO, "Memory access permissions failure, "
                                   "no mem_event listener VCPU %d, dom %d\n",
@@ -1612,6 +1620,12 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
         req->vcpu_id = v->vcpu_id;
 
         p2m_mem_event_fill_regs(req);
+
+        if ( altp2mhvm_active(v->domain) )
+        {
+            req->flags |= MEM_EVENT_FLAG_ALTERNATE_P2M;
+            req->altp2m_idx = vcpu_altp2mhvm(v).p2midx;
+        }
     }
 
     /* Pause the current VCPU */
diff --git a/xen/common/mem_access.c b/xen/common/mem_access.c
index d8aac5f..223d048 100644
--- a/xen/common/mem_access.c
+++ b/xen/common/mem_access.c
@@ -48,6 +48,7 @@ void mem_access_resume(struct domain *d)
         v = d->vcpu[rsp.vcpu_id];
 
         p2m_mem_event_emulate_check(v, &rsp);
+        p2m_mem_event_altp2m_check(v, &rsp);
 
         /* Unpause domain. */
         if ( rsp.flags & MEM_EVENT_FLAG_VCPU_PAUSED )
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index da36504..c838f26 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -78,6 +78,13 @@ void p2m_mem_event_emulate_check(struct vcpu *v,
 };
 
 static inline
+void p2m_mem_event_altp2m_check(struct vcpu *v,
+                                 const mem_event_response_t *rsp)
+{
+    /* Not supported on ARM. */
+};
+
+static inline
 void p2m_setup_introspection(struct domain *d)
 {
     /* No special setup on ARM. */
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 52588ed..e4bc64f 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -737,6 +737,10 @@ bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
 /* Invalidate a page in all alternate p2m's */
 void p2m_remove_altp2m_page(struct domain *d, unsigned long gfn);
 
+/* Check to see if vcpu should be switched to a different p2m. */
+void p2m_mem_event_altp2m_check(struct vcpu *v,
+                                 const mem_event_response_t *rsp);
+
 /*
  * p2m type to IOMMU flags
  */
diff --git a/xen/include/public/mem_event.h b/xen/include/public/mem_event.h
index 599f9e8..b877899 100644
--- a/xen/include/public/mem_event.h
+++ b/xen/include/public/mem_event.h
@@ -47,6 +47,14 @@
  * potentially having side effects (like memory mapped or port I/O) disabled.
  */
 #define MEM_EVENT_FLAG_EMULATE_NOWRITE (1 << 6)
+/*
+ * On a request, indicates that the event occurred in the alternate p2m specified by
+ * the altp2m_idx request field.
+ *
+ * On a response, indicates that the VCPU should resume in the alternate p2m specified
+ * by the altp2m_idx response field if possible.
+ */
+#define MEM_EVENT_FLAG_ALTERNATE_P2M   (1 << 7)
 
 /* Reasons for the memory event request */
 #define MEM_EVENT_REASON_UNKNOWN     0    /* typical reason */
@@ -117,6 +125,7 @@ typedef struct mem_event_st {
 
     uint16_t reason;
     struct mem_event_regs_x86 x86_regs;
+    uint16_t altp2m_idx;
 } mem_event_request_t, mem_event_response_t;
 
 DEFINE_RING_TYPES(mem_event, mem_event_request_t, mem_event_response_t);
-- 
1.9.1

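On the listener side the flag and field are symmetric: a request arrives
carrying the view the violation occurred in, and the response can name the
view to resume in. A sketch of a response that puts the faulting vcpu back
in view 0 (the ring variable and put_response() helper are assumed, not
part of this patch):

    mem_event_response_t rsp = {
        .vcpu_id    = req.vcpu_id,
        .flags      = MEM_EVENT_FLAG_VCPU_PAUSED | MEM_EVENT_FLAG_ALTERNATE_P2M,
        .altp2m_idx = 0,              /* resume in view 0 */
    };
    put_response(&mem_event_ring, &rsp);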

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-09 21:26 [PATCH 00/11] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (10 preceding siblings ...)
  2015-01-09 21:26 ` [PATCH 11/11] x86/altp2m: alternate p2m memory events Ed White
@ 2015-01-09 22:06 ` Andrew Cooper
  2015-01-09 22:21   ` Ed White
  2015-01-12 12:17 ` Ian Jackson
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 135+ messages in thread
From: Andrew Cooper @ 2015-01-09 22:06 UTC (permalink / raw)
  To: Ed White, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 09/01/2015 21:26, Ed White wrote:
> This set of patches adds support to hvm domains for EPTP switching by creating
> multiple copies of the host p2m (currently limited to 10 copies).
>
> The primary use of this capability is expected to be in scenarios where access
> to memory needs to be monitored and/or restricted below the level at which the
> guest OS page tables operate. Two examples that were discussed at the 2014 Xen
> developer summit are:

Given the nature of VMFUNC, I presume this series is looking to add
support for in-guest entities to be able to monitor/tweak/swap the EPT
tables under the feet of operating system around it?

>
>     VM introspection: 
>         http://www.slideshare.net/xen_com_mgr/
>         zero-footprint-guest-memory-introspection-from-xen
>
>     Secure inter-VM communication:
>         http://www.slideshare.net/xen_com_mgr/nakajima-nvf
>
> Each p2m copy is populated lazily on EPT violations, and only contains entries for
> ram p2m types. Permissions for pages in alternate p2m's can be changed in a similar
> way to the existing memory access interface, and gfn->mfn mappings can be changed.
>
> All this is done through extra HVMOP types.
>
> The cross-domain HVMOP code has been compile-tested only. Also, the cross-domain
> code is hypervisor-only, the toolstack has not been modified.
>
> The intra-domain code has been tested. Violation notifications can only be received
> for pages that have been modified (access permissions and/or gfn->mfn mapping) 
> intra-domain, and only on VCPU's that have enabled notification.
>
> VMFUNC and #VE will both be emulated on hardware without native support.
>
> This code is not compatible with nested hvm functionality and will refuse to work
> with nested hvm active. It is also not compatible with migration. It should be
> considered experimental.

What about shared ept, pci passthrough, ballooning, PoD or any other
mechanisms which involve playing games with the EPT tables behind the
back of the guest?

It appears that this feature only makes sense for a plain, RAM-only VM
with no bells or whistles.

~Andrew


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-09 22:06 ` [PATCH 00/11] Alternate p2m: support multiple copies of host p2m Andrew Cooper
@ 2015-01-09 22:21   ` Ed White
  2015-01-09 22:41     ` Andrew Cooper
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-09 22:21 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 01/09/2015 02:06 PM, Andrew Cooper wrote:
> On 09/01/2015 21:26, Ed White wrote:
>> This set of patches adds support to hvm domains for EPTP switching by creating
>> multiple copies of the host p2m (currently limited to 10 copies).
>>
>> The primary use of this capability is expected to be in scenarios where access
>> to memory needs to be monitored and/or restricted below the level at which the
>> guest OS page tables operate. Two examples that were discussed at the 2014 Xen
>> developer summit are:
> 
> Given the nature of VMFUNC, I presume this series is looking to add
> support for in-guest entities to be able to monitor/tweak/swap the EPT
> tables under the feet of operating system around it?
> 

Primarily, yes. There is (untested) support for using these capabilities
cross-domain, because we have been contacted by multiple people who are
interested in using them that way, but the hardware acceleration provided
by VMFUNC and #VE aren't available cross-domain.

>>
>>     VM introspection: 
>>         http://www.slideshare.net/xen_com_mgr/
>>         zero-footprint-guest-memory-introspection-from-xen
>>
>>     Secure inter-VM communication:
>>         http://www.slideshare.net/xen_com_mgr/nakajima-nvf
>>
>> Each p2m copy is populated lazily on EPT violations, and only contains entries for
>> ram p2m types. Permissions for pages in alternate p2m's can be changed in a similar
>> way to the existing memory access interface, and gfn->mfn mappings can be changed.
>>
>> All this is done through extra HVMOP types.
>>
>> The cross-domain HVMOP code has been compile-tested only. Also, the cross-domain
>> code is hypervisor-only, the toolstack has not been modified.
>>
>> The intra-domain code has been tested. Violation notifications can only be received
>> for pages that have been modified (access permissions and/or gfn->mfn mapping) 
>> intra-domain, and only on VCPU's that have enabled notification.
>>
>> VMFUNC and #VE will both be emulated on hardware without native support.
>>
>> This code is not compatible with nested hvm functionality and will refuse to work
>> with nested hvm active. It is also not compatible with migration. It should be
>> considered experimental.
> 
> What about shared ept, pci passthrough, ballooning, PoD or any other
> mechanisms which involve playing games with the EPT tables behind the
> back of the guest?
> 
> It appears that this feature only makes sense for a plain, RAM-only VM
> with no bells or whistles.
> 

The intention is that only RAM will be tweaked, which is why only RAM entries
exist in the alternate p2m's. All those other page types are still valid and
still handled in the primary nested page fault handler, but these mechanisms
can't modify them.

Also, if any page that was previously plain old RAM is changed in the host p2m,
one of the later patches in the series immediately removes it from all the
alternates, so 'stale' copies in the alternates shouldn't be an issue.

> ~Andrew
> 

Ed


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-09 22:21   ` Ed White
@ 2015-01-09 22:41     ` Andrew Cooper
  2015-01-09 23:04       ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Andrew Cooper @ 2015-01-09 22:41 UTC (permalink / raw)
  To: Ed White, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 09/01/2015 22:21, Ed White wrote:
> On 01/09/2015 02:06 PM, Andrew Cooper wrote:
>> On 09/01/2015 21:26, Ed White wrote:
>>> This set of patches adds support to hvm domains for EPTP switching by creating
>>> multiple copies of the host p2m (currently limited to 10 copies).
>>>
>>> The primary use of this capability is expected to be in scenarios where access
>>> to memory needs to be monitored and/or restricted below the level at which the
>>> guest OS page tables operate. Two examples that were discussed at the 2014 Xen
>>> developer summit are:
>> Given the nature of VMFUNC, I presume this series is looking to add
>> support for in-guest entities to be able to monitor/tweak/swap the EPT
>> tables under the feet of operating system around it?
>>
> Primarily, yes.

Cool!  It might be helpful to add a sentence or two to this effect in
the description.

> There is (untested) support for using these capabilities
> cross-domain, because we have been contacted by multiple people who are
> interested in using them that way, but the hardware acceleration provided
> by VMFUNC and #VE aren't available cross-domain.
>
>>>     VM introspection: 
>>>         http://www.slideshare.net/xen_com_mgr/
>>>         zero-footprint-guest-memory-introspection-from-xen
>>>
>>>     Secure inter-VM communication:
>>>         http://www.slideshare.net/xen_com_mgr/nakajima-nvf
>>>
>>> Each p2m copy is populated lazily on EPT violations, and only contains entries for
>>> ram p2m types. Permissions for pages in alternate p2m's can be changed in a similar
>>> way to the existing memory access interface, and gfn->mfn mappings can be changed.
>>>
>>> All this is done through extra HVMOP types.
>>>
>>> The cross-domain HVMOP code has been compile-tested only. Also, the cross-domain
>>> code is hypervisor-only, the toolstack has not been modified.
>>>
>>> The intra-domain code has been tested. Violation notifications can only be received
>>> for pages that have been modified (access permissions and/or gfn->mfn mapping) 
>>> intra-domain, and only on VCPU's that have enabled notification.
>>>
>>> VMFUNC and #VE will both be emulated on hardware without native support.
>>>
>>> This code is not compatible with nested hvm functionality and will refuse to work
>>> with nested hvm active. It is also not compatible with migration. It should be
>>> considered experimental.
>> What about shared ept, pci passthrough, ballooning, PoD or any other
>> mechanisms which involve playing games with the EPT tables behind the
>> back of the guest?
>>
>> It appears that this feature only makes sense for a plain, RAM-only VM
>> with no bells or whistles.
>>
> The intention is that only RAM will be tweaked, which is why only RAM entries
> exist in the alternate p2m's. All those other page types are still valid and
> still handled in the primary nested page fault handler, but these mechanisms
> can't modify them.
>
> Also, if any page that was previously plain old RAM is changed in the host p2m,
> one of the later patches in the series immediately removes it from all the
> alternates, so 'stale' copies in the alternates shouldn't be an issue.

Ok so the plain old ram tricks are possibly ok, but it still sounds
like shared ept and pci passthrough are to be avoided?

Having some non-OS part of the guest swap the EPT tables and
accidentally turn a DMA buffer read-only is not going to end well.

~Andrew


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-09 22:41     ` Andrew Cooper
@ 2015-01-09 23:04       ` Ed White
  2015-01-12 10:00         ` Jan Beulich
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-09 23:04 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 01/09/2015 02:41 PM, Andrew Cooper wrote:
> On 09/01/2015 22:21, Ed White wrote:
>> On 01/09/2015 02:06 PM, Andrew Cooper wrote:
>>> On 09/01/2015 21:26, Ed White wrote:
>>>> This set of patches adds support to hvm domains for EPTP switching by creating
>>>> multiple copies of the host p2m (currently limited to 10 copies).
>>>>
>>>> The primary use of this capability is expected to be in scenarios where access
>>>> to memory needs to be monitored and/or restricted below the level at which the
>>>> guest OS page tables operate. Two examples that were discussed at the 2014 Xen
>>>> developer summit are:
>>> Given the nature of VMFUNC, I presume this series is looking to add
>>> support for in-guest entities to be able to monitor/tweak/swap the EPT
>>> tables under the feet of operating system around it?
>>>
>> Primarily, yes.
> 
> Cool!  It might be helpful to add a sentence or two to this effect in
> the description.
> 

Point taken. The first slide deck linked below describes the intra-domain
use case in some detail, but I should at least mention it here. I'll add
something next time round.

>> There is (untested) support for using these capabilities
>> cross-domain, because we have been contacted by multiple people who are
>> interested in using them that way, but the hardware acceleration provided
>> by VMFUNC and #VE aren't available cross-domain.
>>
>>>>     VM introspection: 
>>>>         http://www.slideshare.net/xen_com_mgr/
>>>>         zero-footprint-guest-memory-introspection-from-xen
>>>>
>>>>     Secure inter-VM communication:
>>>>         http://www.slideshare.net/xen_com_mgr/nakajima-nvf
>>>>
>>>> Each p2m copy is populated lazily on EPT violations, and only contains entries for
>>>> ram p2m types. Permissions for pages in alternate p2m's can be changed in a similar
>>>> way to the existing memory access interface, and gfn->mfn mappings can be changed.
>>>>
>>>> All this is done through extra HVMOP types.
>>>>
>>>> The cross-domain HVMOP code has been compile-tested only. Also, the cross-domain
>>>> code is hypervisor-only, the toolstack has not been modified.
>>>>
>>>> The intra-domain code has been tested. Violation notifications can only be received
>>>> for pages that have been modified (access permissions and/or gfn->mfn mapping) 
>>>> intra-domain, and only on VCPU's that have enabled notification.
>>>>
>>>> VMFUNC and #VE will both be emulated on hardware without native support.
>>>>
>>>> This code is not compatible with nested hvm functionality and will refuse to work
>>>> with nested hvm active. It is also not compatible with migration. It should be
>>>> considered experimental.
>>> What about shared ept, pci passthrough, ballooning, PoD or any other
>>> mechanisms which involve playing games with the EPT tables behind the
>>> back of the guest?
>>>
>>> It appears that this feature only makes sense for a plain, RAM-only VM
>>> with no bells or whistles.
>>>
>> The intention is that only RAM will be tweaked, which is why only RAM entries
>> exist in the alternate p2m's. All those other page types are still valid and
>> still handled in the primary nested page fault handler, but these mechanisms
>> can't modify them.
>>
>> Also, if any page that was previously plain old RAM is changed in the host p2m,
>> one of the later patches in the series immediately removes it from all the
>> alternates, so 'stale' copies in the alternates shouldn't be an issue.
> 
> Ok so the plain old ram tricks are possibly ok, but it still sounds
> like shared ept and pci passthrough are to be avoided?
> 
> Having some non-OS part of the guest swap the EPT tables and
> accidentally turn a DMA buffer read-only is not going to end well.
> 

The agent can certainly do bad things, and at some level you have to assume it
is sensible enough not to. However, I'm not sure this is fundamentally more
dangerous than what a privileged domain can do today using the MEMOP...
operations, and people are already using those for very similar purposes.

Ed


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-09 23:04       ` Ed White
@ 2015-01-12 10:00         ` Jan Beulich
  2015-01-12 17:36           ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Jan Beulich @ 2015-01-12 10:00 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.campbell, Andrew Cooper, tim, xen-devel, ian.jackson

>>> On 10.01.15 at 00:04, <edmund.h.white@intel.com> wrote:
> On 01/09/2015 02:41 PM, Andrew Cooper wrote:
>> Having some non-OS part of the guest swap the EPT tables and
>> accidentally turn a DMA buffer read-only is not going to end well.
>> 
> 
> The agent can certainly do bad things, and at some level you have to assume it
> is sensible enough not to. However, I'm not sure this is fundamentally more
> dangerous than what a privileged domain can do today using the MEMOP...
> operations, and people are already using those for very similar purposes.

I don't follow - how is what a privileged domain can do related to the
proposed changes here (which are - via VMFUNC - at least partially
guest controllable, and that's also the case Andrew mentioned in his
reply)? I'm having a hard time understanding how a P2M stripped of
anything that's not plain RAM can be very useful to a guest. IOW
without such fundamental aspects clarified I don't see a point in
looking at the individual patches (which btw, according to your
wording elsewhere, should have been marked RFC).

Jan


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-09 21:26 [PATCH 00/11] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (11 preceding siblings ...)
  2015-01-09 22:06 ` [PATCH 00/11] Alternate p2m: support multiple copies of host p2m Andrew Cooper
@ 2015-01-12 12:17 ` Ian Jackson
  2015-01-12 17:39   ` Ed White
  2015-01-13 19:01 ` Andrew Cooper
  2015-01-15 16:15 ` Tim Deegan
  14 siblings, 1 reply; 135+ messages in thread
From: Ian Jackson @ 2015-01-12 12:17 UTC (permalink / raw)
  To: Ed White; +Cc: tim, keir, ian.campbell, jbeulich, xen-devel

Ed White writes ("[PATCH 00/11] Alternate p2m: support multiple copies of host p2m"):
> This set of patches adds support to hvm domains for EPTP switching
> by creating multiple copies of the host p2m (currently limited to 10
> copies).

Thanks for this.  Did you CC me in my capacity as tools maintainer ?
I can't see anything in this series which deals with any necessary
tools changes.

Are there tools parts to come later ?

Thanks,
Ian.


* Re: [PATCH 01/11] VMX: VMFUNC and #VE definitions and detection.
  2015-01-09 21:26 ` [PATCH 01/11] VMX: VMFUNC and #VE definitions and detection Ed White
@ 2015-01-12 13:06   ` Andrew Cooper
  2015-01-13 18:50     ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Andrew Cooper @ 2015-01-12 13:06 UTC (permalink / raw)
  To: Ed White, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 09/01/15 21:26, Ed White wrote:
> Currently, neither is enabled globally but may be enabled on a per-VCPU
> basis by the altp2m code.
>
> Everything can be force-disabled globally by specifying vmfunc=0 on the
> Xen command line.
>
> Remove the check for EPTE bit 63 == zero in ept_split_super_page(), as
> that bit is now hardware-defined.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  docs/misc/xen-command-line.markdown |  7 +++++++
>  xen/arch/x86/hvm/vmx/vmcs.c         | 40 +++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/mm/p2m-ept.c           |  1 -
>  xen/include/asm-x86/hvm/vmx/vmcs.h  | 16 +++++++++++++++
>  xen/include/asm-x86/hvm/vmx/vmx.h   | 13 +++++++++++-
>  xen/include/asm-x86/msr-index.h     |  1 +
>  6 files changed, 76 insertions(+), 2 deletions(-)
>
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index 152ae03..00fbae7 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1305,6 +1305,13 @@ The optional `keep` parameter causes Xen to continue using the vga
>  console even after dom0 has been started.  The default behaviour is to
>  relinquish control to dom0.
>  
> +### vmfunc (Intel)
> +> `= <boolean>`
> +
> +> Default: `true`
> +
> +Use VMFUNC and #VE support if available.
> +
>  ### vpid (Intel)
>  > `= <boolean>`
>  
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 9d8033e..4274e92 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -50,6 +50,9 @@ boolean_param("unrestricted_guest", opt_unrestricted_guest_enabled);
>  static bool_t __read_mostly opt_apicv_enabled = 1;
>  boolean_param("apicv", opt_apicv_enabled);
>  
> +static bool_t __read_mostly opt_vmfunc_enabled = 1;
> +boolean_param("vmfunc", opt_vmfunc_enabled);

Please can experimental features be off by default.  (I am specifically
looking to avoid the issues we had with apicv getting into stable
releases despite reliably causing problems for migration).

I suspect you will have many interested testers for this featureset, and
it is fine to patch the default later when the feature gets declared stable.

I also wonder whether it might be better to have a "vmx=" command line
parameter with "vmfunc" as a subopt, to avoid gaining an ever-increasing
set of related top-level parameters?
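
For illustration, a rough sketch of the kind of sub-option handling I
mean, modelled on the existing custom_param() users (the names here are
invented and this is untested):

static bool_t __read_mostly opt_vmfunc_enabled; /* off while experimental */

/* Hypothetical "vmx=" handler, accepting e.g. vmx=vmfunc / vmx=no-vmfunc. */
static void __init parse_vmx_param(const char *s)
{
    const char *ss;

    do {
        /* Each sub-option is delimited by ',' or the end of the string. */
        ss = strchr(s, ',');
        if ( !ss )
            ss = strchr(s, '\0');

        if ( (ss - s) == 6 && !strncmp(s, "vmfunc", 6) )
            opt_vmfunc_enabled = 1;
        else if ( (ss - s) == 9 && !strncmp(s, "no-vmfunc", 9) )
            opt_vmfunc_enabled = 0;

        s = ss + 1;
    } while ( *ss );
}
custom_param("vmx", parse_vmx_param);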

Other than this, the content of the rest of the patch appears fine.

~Andrew


* Re: [PATCH 02/11] VMX: implement suppress #VE.
  2015-01-09 21:26 ` [PATCH 02/11] VMX: implement suppress #VE Ed White
@ 2015-01-12 16:43   ` Andrew Cooper
  2015-01-12 17:45     ` Ed White
  2015-01-15 16:25   ` Tim Deegan
  1 sibling, 1 reply; 135+ messages in thread
From: Andrew Cooper @ 2015-01-12 16:43 UTC (permalink / raw)
  To: Ed White, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 09/01/15 21:26, Ed White wrote:
> In preparation for selectively enabling hardware #VE in a later patch,
> set suppress #VE on all EPTE's on #VE-capable hardware.
>
> Suppress #VE should always be the default condition for two reasons:
> it is generally not safe to deliver #VE into a guest unless that guest
> has been modified to receive it; and even then for most EPT violations only
> the hypervisor is able to handle the violation.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/mm/p2m-ept.c         | 34 +++++++++++++++++++++++++++++++++-
>  xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
>  2 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index eb8b5f9..2b9f07c 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -41,7 +41,7 @@
>  #define is_epte_superpage(ept_entry)    ((ept_entry)->sp)
>  static inline bool_t is_epte_valid(ept_entry_t *e)
>  {
> -    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
> +    return (e->valid != 0 && e->sa_p2mt != p2m_invalid);
>  }
>  
>  /* returns : 0 for success, -errno otherwise */
> @@ -194,6 +194,19 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
>  
>      ept_entry->r = ept_entry->w = ept_entry->x = 1;
>  
> +    /* Disable #VE on all entries */ 
> +    if ( cpu_has_vmx_virt_exceptions )
> +    {
> +        ept_entry_t *table = __map_domain_page(pg);
> +
> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )

Style - please declare i in the upper scope, and it should be unsigned.

> +            table[i].suppress_ve = 1;
> +
> +        unmap_domain_page(table);
> +
> +        ept_entry->suppress_ve = 1;
> +    }
> +
>      return 1;
>  }
>  
> @@ -243,6 +256,10 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>          epte->sp = (level > 1);
>          epte->mfn += i * trunk;
>          epte->snp = (iommu_enabled && iommu_snoop);
> +
> +        if ( cpu_has_vmx_virt_exceptions )
> +            epte->suppress_ve = 1;
> +
>          ASSERT(!epte->rsvd1);
>  
>          ept_p2m_type_to_flags(epte, epte->sa_p2mt, epte->access);
> @@ -753,6 +770,9 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>          ept_p2m_type_to_flags(&new_entry, p2mt, p2ma);
>      }
>  
> +    if ( cpu_has_vmx_virt_exceptions )
> +        new_entry.suppress_ve = 1;
> +
>      rc = atomic_write_ept_entry(ept_entry, new_entry, target);
>      if ( unlikely(rc) )
>          old_entry.epte = 0;
> @@ -1069,6 +1089,18 @@ int ept_p2m_init(struct p2m_domain *p2m)
>      /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
>      ept->ept_wl = 3;
>  
> +    /* Disable #VE on all entries */
> +    if ( cpu_has_vmx_virt_exceptions )
> +    {
> +        ept_entry_t *table =
> +            map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
> +
> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
> +            table[i].suppress_ve = 1;

Is it safe to set SVE on an entry which is not known to be a superpage
or not present?  The manual states that the bit is ignored in this case,
but I am concerned that, as happened with SVE itself, this bit will
suddenly gain meaning in the future.

> +
> +        unmap_domain_page(table);
> +    }
> +
>      if ( !zalloc_cpumask_var(&ept->synced_mask) )
>          return -ENOMEM;
>  
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index 8bae195..70fee74 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -49,6 +49,7 @@ typedef union {
>          suppress_ve :   1;  /* bit 63 - suppress #VE */
>      };
>      u64 epte;
> +    u64 valid       :   63; /* entire EPTE except suppress #VE bit */

I am not sure 'valid' is a sensible name here.  As it is only used in
is_epte_valid(), might it be better to just use ->epte and a bitmask for
everything other than the #VE bit?
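
I.e. something like (the mask name here is invented):

/* Everything in the EPTE except the suppress-#VE bit (bit 63). */
#define EPTE_NON_VE_MASK (~(1ul << 63))

static inline bool_t is_epte_valid(ept_entry_t *e)
{
    return ((e->epte & EPTE_NON_VE_MASK) != 0 && e->sa_p2mt != p2m_invalid);
}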

~Andrew

>  } ept_entry_t;
>  
>  typedef struct {


* Re: [PATCH 03/11] x86/HVM: Hardware alternate p2m support detection.
  2015-01-09 21:26 ` [PATCH 03/11] x86/HVM: Hardware alternate p2m support detection Ed White
@ 2015-01-12 17:08   ` Andrew Cooper
  2015-01-12 17:46     ` Ed White
  2015-01-15 16:32   ` Tim Deegan
  1 sibling, 1 reply; 135+ messages in thread
From: Andrew Cooper @ 2015-01-12 17:08 UTC (permalink / raw)
  To: Ed White, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 09/01/15 21:26, Ed White wrote:
> As implemented here, only supported on platforms with VMX HAP.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/hvm/hvm.c        | 8 ++++++++
>  xen/arch/x86/hvm/vmx/vmx.c    | 1 +
>  xen/include/asm-x86/hvm/hvm.h | 6 ++++++
>  3 files changed, 15 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index bc414ff..3a7367c 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -157,6 +157,9 @@ static int __init hvm_enable(void)
>      if ( !fns->pvh_supported )
>          printk(XENLOG_INFO "HVM: PVH mode not supported on this platform\n");
>  
> +    if ( !fns->altp2m_supported )
> +        printk(XENLOG_INFO "HVM: Alternate p2m mode not supported on this platform\n");
> +

I am not sure this message is particularly useful.  The PVH message
above is just transitory until PVH loses some of its restrictions.

>      /*
>       * Allow direct access to the PC debug ports 0x80 and 0xed (they are
>       * often used for I/O delays, but the vmexits simply slow things down).
> @@ -6369,6 +6372,11 @@ enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v)
>      return hvm_funcs.nhvm_intr_blocked(v);
>  }
>  
> +bool_t hvm_altp2m_supported()

I have to admit that I am somewhat uneasy about the name "altp2m", but I
can't suggest anything better at the moment.  I am all ears if anyone
else has any other suggestions.

> +{
> +    return hvm_funcs.altp2m_supported;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index f2554d6..931709b 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1796,6 +1796,7 @@ const struct hvm_function_table * __init start_vmx(void)
>      if ( cpu_has_vmx_ept && (cpu_has_vmx_pat || opt_force_ept) )
>      {
>          vmx_function_table.hap_supported = 1;
> +        vmx_function_table.altp2m_supported = 1;
>  
>          vmx_function_table.hap_capabilities = 0;
>  
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index e3d2d9a..7115a68 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -94,6 +94,9 @@ struct hvm_function_table {
>      /* Necessary hardware support for PVH mode? */
>      int pvh_supported;
>  
> +    /* Necessary hardware support for alternate p2m's? */
> +    int altp2m_supported;

bool_t please.  (The adjacent examples are poor)

~Andrew

> +
>      /* Indicate HAP capabilities. */
>      int hap_capabilities;
>  
> @@ -518,6 +521,9 @@ bool_t nhvm_vmcx_hap_enabled(struct vcpu *v);
>  /* interrupt */
>  enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v);
>  
> +/* returns true if hardware supports alternate p2m's */
> +bool_t hvm_altp2m_supported(void);
> +
>  #ifndef NDEBUG
>  /* Permit use of the Forced Emulation Prefix in HVM guests */
>  extern bool_t opt_hvm_fep;


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-12 10:00         ` Jan Beulich
@ 2015-01-12 17:36           ` Ed White
  2015-01-13  8:56             ` Jan Beulich
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-12 17:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, ian.campbell, Andrew Cooper, tim, xen-devel, ian.jackson

On 01/12/2015 02:00 AM, Jan Beulich wrote:
>>>> On 10.01.15 at 00:04, <edmund.h.white@intel.com> wrote:
>> On 01/09/2015 02:41 PM, Andrew Cooper wrote:
>>> Having some non-OS part of the guest swap the EPT tables and
>>> accidentally turn a DMA buffer read-only is not going to end well.
>>>
>>
>> The agent can certainly do bad things, and at some level you have to assume it
>> is sensible enough not to. However, I'm not sure this is fundamentally more
>> dangerous than what a privileged domain can do today using the MEMOP...
>> operations, and people are already using those for very similar purposes.
> 
>> I don't follow - how is what a privileged domain can do related to the
> proposed changes here (which are - via VMFUNC - at least partially
> guest controllable, and that's also the case Andrew mentioned in his
> reply)? I'm having a hard time understanding how a P2M stripped of
> anything that's not plain RAM can be very useful to a guest. IOW
> without such fundamental aspects clarified I don't see a point in
> looking at the individual patches (which btw, according to your
> wording elsewhere, should have been marked RFC).
> 
> Jan
> 
In this patch series, none of the new hypercalls are protected by xsm
policies. Earlier in the process of working on this code, I added such
a check to all the hypercalls, but then removed them all because it
dawned on me that I didn't actually understand what I was doing and
my code only worked because I only ever built the dummy permit-everything
policy.

Should some version of this patch series be accepted, my hope is that
someone who does understand xsm policies would put the appropriate checks
in place, and at that point I maintain that these extra capabilities
would not be fundamentally more dangerous than existing mechanisms
available to privileged domains, because policy can prevent the guest
using vmfunc. That's obviously not true today.

The alternate p2m's only contain entries for ram pages with valid mfn's.
All other page types are still handled in the nested page fault handler
for the host p2m. Those pages (at least the ones I've encountered) don't
require the hardware to have a valid EPTE for the page.
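
Schematically, the lazy fill on an EPT violation looks like this (a
sketch of the idea only, with approximate variable names, not the
actual patch code):

    /* Look the faulting gfn up in the host p2m. */
    mfn = get_gfn_type_access(hostp2m, gfn, &t, &a, 0, NULL);

    if ( p2m_is_ram(t) && mfn_valid(mfn) )
        /* Plain RAM with a valid mfn: copy the mapping into the view. */
        p2m_set_entry(altp2m, gfn, mfn, PAGE_ORDER_4K, t, a);
    /* else: leave it to the host p2m's nested page fault handler. */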

Ed


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-12 12:17 ` Ian Jackson
@ 2015-01-12 17:39   ` Ed White
  2015-01-12 17:43     ` Ian Jackson
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-12 17:39 UTC (permalink / raw)
  To: Ian Jackson; +Cc: tim, keir, ian.campbell, jbeulich, xen-devel

On 01/12/2015 04:17 AM, Ian Jackson wrote:
> Ed White writes ("[PATCH 00/11] Alternate p2m: support multiple copies of host p2m"):
>> This set of patches adds support to hvm domains for EPTP switching
>> by creating multiple copies of the host p2m (currently limited to 10
>> copies).
> 
> Thanks for this.  Did you CC me in my capacity as tools maintainer ?
> I can't see anything in this series which deals with any necessary
> tools changes.
> 
> Are there tools parts to come later ?
> 
I copied you because get_maintainer picked you. There is scope for tools
parts, but I don't have the relevant subject knowledge to add them.

Ed


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-12 17:39   ` Ed White
@ 2015-01-12 17:43     ` Ian Jackson
  2015-01-12 17:50       ` Ed White
  2015-01-12 17:51       ` Andrew Cooper
  0 siblings, 2 replies; 135+ messages in thread
From: Ian Jackson @ 2015-01-12 17:43 UTC (permalink / raw)
  To: Ed White; +Cc: tim, keir, ian.campbell, jbeulich, xen-devel

Ed White writes ("Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m"):
> On 01/12/2015 04:17 AM, Ian Jackson wrote:
> > Are there tools parts to come later ?
> 
> I copied you because get_maintainer picked you. There is scope for tools
> parts, but I don't have the relevant subject knowledge to add them.

I see.  Without tools parts, how is this new functionality to be
exercised ?  I guess I'm missing part (or maybe most) of the picture.

(I wonder why get_maintainer picked me.  I may investigate.)

Ian.


* Re: [PATCH 02/11] VMX: implement suppress #VE.
  2015-01-12 16:43   ` Andrew Cooper
@ 2015-01-12 17:45     ` Ed White
  2015-01-13 18:36       ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-12 17:45 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 01/12/2015 08:43 AM, Andrew Cooper wrote:
> On 09/01/15 21:26, Ed White wrote:
>> In preparation for selectively enabling hardware #VE in a later patch,
>> set suppress #VE on all EPTE's on #VE-capable hardware.
>>
>> Suppress #VE should always be the default condition for two reasons:
>> it is generally not safe to deliver #VE into a guest unless that guest
>> has been modified to receive it; and even then for most EPT violations only
>> the hypervisor is able to handle the violation.
>>
>> Signed-off-by: Ed White <edmund.h.white@intel.com>
>> ---
>>  xen/arch/x86/mm/p2m-ept.c         | 34 +++++++++++++++++++++++++++++++++-
>>  xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
>>  2 files changed, 34 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>> index eb8b5f9..2b9f07c 100644
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -41,7 +41,7 @@
>>  #define is_epte_superpage(ept_entry)    ((ept_entry)->sp)
>>  static inline bool_t is_epte_valid(ept_entry_t *e)
>>  {
>> -    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
>> +    return (e->valid != 0 && e->sa_p2mt != p2m_invalid);
>>  }
>>  
>>  /* returns : 0 for success, -errno otherwise */
>> @@ -194,6 +194,19 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
>>  
>>      ept_entry->r = ept_entry->w = ept_entry->x = 1;
>>  
>> +    /* Disable #VE on all entries */ 
>> +    if ( cpu_has_vmx_virt_exceptions )
>> +    {
>> +        ept_entry_t *table = __map_domain_page(pg);
>> +
>> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
> 
> Style - please declare i in the upper scope, and it should be unsigned.
> 
>> +            table[i].suppress_ve = 1;
>> +
>> +        unmap_domain_page(table);
>> +
>> +        ept_entry->suppress_ve = 1;
>> +    }
>> +
>>      return 1;
>>  }
>>  
>> @@ -243,6 +256,10 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>>          epte->sp = (level > 1);
>>          epte->mfn += i * trunk;
>>          epte->snp = (iommu_enabled && iommu_snoop);
>> +
>> +        if ( cpu_has_vmx_virt_exceptions )
>> +            epte->suppress_ve = 1;
>> +
>>          ASSERT(!epte->rsvd1);
>>  
>>          ept_p2m_type_to_flags(epte, epte->sa_p2mt, epte->access);
>> @@ -753,6 +770,9 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>>          ept_p2m_type_to_flags(&new_entry, p2mt, p2ma);
>>      }
>>  
>> +    if ( cpu_has_vmx_virt_exceptions )
>> +        new_entry.suppress_ve = 1;
>> +
>>      rc = atomic_write_ept_entry(ept_entry, new_entry, target);
>>      if ( unlikely(rc) )
>>          old_entry.epte = 0;
>> @@ -1069,6 +1089,18 @@ int ept_p2m_init(struct p2m_domain *p2m)
>>      /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
>>      ept->ept_wl = 3;
>>  
>> +    /* Disable #VE on all entries */
>> +    if ( cpu_has_vmx_virt_exceptions )
>> +    {
>> +        ept_entry_t *table =
>> +            map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
>> +
>> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
>> +            table[i].suppress_ve = 1;
> 
> Is it safe to set SVE on an entry which is not known to be a superpage
> or not present?  The manual states that the bit is ignored in this case,
> but I am concerned that, as happened with SVE itself, this bit will
> suddenly gain meaning in the future.
> 

It is safe to do this. Never say never, but I am aware of no plans to
overload this bit, and I would know. Unless you feel strongly about it,
I would prefer to leave this as-is, since changing it would make the code
more complex.

>> +
>> +        unmap_domain_page(table);
>> +    }
>> +
>>      if ( !zalloc_cpumask_var(&ept->synced_mask) )
>>          return -ENOMEM;
>>  
>> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
>> index 8bae195..70fee74 100644
>> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
>> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
>> @@ -49,6 +49,7 @@ typedef union {
>>          suppress_ve :   1;  /* bit 63 - suppress #VE */
>>      };
>>      u64 epte;
>> +    u64 valid       :   63; /* entire EPTE except suppress #VE bit */
> 
> I am not sure 'valid' is a sensible name here.  As it is only used in
> is_epte_valid(), might it be better to just use ->epte and a bitmask for
> everything other than the #VE bit?
> 

This seemed more in the style of the code I was changing, but I can do it
as you suggest.

Ed

>>  } ept_entry_t;
>>  
>>  typedef struct {
> 
> 


* Re: [PATCH 03/11] x86/HVM: Hardware alternate p2m support detection.
  2015-01-12 17:08   ` Andrew Cooper
@ 2015-01-12 17:46     ` Ed White
  0 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-01-12 17:46 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 01/12/2015 09:08 AM, Andrew Cooper wrote:
> On 09/01/15 21:26, Ed White wrote:
>> As implemented here, only supported on platforms with VMX HAP.
>>
>> Signed-off-by: Ed White <edmund.h.white@intel.com>
>> ---
>>  xen/arch/x86/hvm/hvm.c        | 8 ++++++++
>>  xen/arch/x86/hvm/vmx/vmx.c    | 1 +
>>  xen/include/asm-x86/hvm/hvm.h | 6 ++++++
>>  3 files changed, 15 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index bc414ff..3a7367c 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -157,6 +157,9 @@ static int __init hvm_enable(void)
>>      if ( !fns->pvh_supported )
>>          printk(XENLOG_INFO "HVM: PVH mode not supported on this platform\n");
>>  
>> +    if ( !fns->altp2m_supported )
>> +        printk(XENLOG_INFO "HVM: Alternate p2m mode not supported on this platform\n");
>> +
> 
> I am not sure this message is particularly useful.  The PVH message
> above is just transitory until PVH loses some of its restrictions.
> 
>>      /*
>>       * Allow direct access to the PC debug ports 0x80 and 0xed (they are
>>       * often used for I/O delays, but the vmexits simply slow things down).
>> @@ -6369,6 +6372,11 @@ enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v)
>>      return hvm_funcs.nhvm_intr_blocked(v);
>>  }
>>  
>> +bool_t hvm_altp2m_supported()
> 
> I have to admit that I am somewhat uneasy about the name "altp2m", but I
> can't suggest anything better at the moment.  I am all ears if anyone
> else has any other suggestions.
> 

Me too. The first name I used was even worse.

Ed

>> +{
>> +    return hvm_funcs.altp2m_supported;
>> +}
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index f2554d6..931709b 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -1796,6 +1796,7 @@ const struct hvm_function_table * __init start_vmx(void)
>>      if ( cpu_has_vmx_ept && (cpu_has_vmx_pat || opt_force_ept) )
>>      {
>>          vmx_function_table.hap_supported = 1;
>> +        vmx_function_table.altp2m_supported = 1;
>>  
>>          vmx_function_table.hap_capabilities = 0;
>>  
>> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
>> index e3d2d9a..7115a68 100644
>> --- a/xen/include/asm-x86/hvm/hvm.h
>> +++ b/xen/include/asm-x86/hvm/hvm.h
>> @@ -94,6 +94,9 @@ struct hvm_function_table {
>>      /* Necessary hardware support for PVH mode? */
>>      int pvh_supported;
>>  
>> +    /* Necessary hardware support for alternate p2m's? */
>> +    int altp2m_supported;
> 
> bool_t please.  (The adjacent examples are poor)
> 
> ~Andrew
> 
>> +
>>      /* Indicate HAP capabilities. */
>>      int hap_capabilities;
>>  
>> @@ -518,6 +521,9 @@ bool_t nhvm_vmcx_hap_enabled(struct vcpu *v);
>>  /* interrupt */
>>  enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v);
>>  
>> +/* returns true if hardware supports alternate p2m's */
>> +bool_t hvm_altp2m_supported(void);
>> +
>>  #ifndef NDEBUG
>>  /* Permit use of the Forced Emulation Prefix in HVM guests */
>>  extern bool_t opt_hvm_fep;
> 
> 


* Re: [PATCH 04/11] x86/MM: Improve p2m type checks.
  2015-01-09 21:26 ` [PATCH 04/11] x86/MM: Improve p2m type checks Ed White
@ 2015-01-12 17:48   ` Andrew Cooper
  2015-01-13 19:39     ` Ed White
  2015-01-15 16:36   ` Tim Deegan
  1 sibling, 1 reply; 135+ messages in thread
From: Andrew Cooper @ 2015-01-12 17:48 UTC (permalink / raw)
  To: Ed White, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 09/01/15 21:26, Ed White wrote:
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 5f7fe71..8193901 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -193,6 +193,9 @@ struct p2m_domain {
>       * threaded on in LRU order. */
>      struct list_head   np2m_list;
>  
> +    /* Does this p2m belong to the altp2m code? */
> +    bool_t alternate;
> +
>      /* Host p2m: Log-dirty ranges registered for the domain. */
>      struct rangeset   *logdirty_ranges;
>  
> @@ -290,7 +293,9 @@ struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base);
>   */
>  struct p2m_domain *p2m_get_p2m(struct vcpu *v);
>  
> -#define p2m_is_nestedp2m(p2m)   ((p2m) != p2m_get_hostp2m((p2m->domain)))
> +#define p2m_is_hostp2m(p2m)   ((p2m) == p2m_get_hostp2m((p2m->domain)))
> +#define p2m_is_altp2m(p2m)    ((p2m)->alternate)
> +#define p2m_is_nestedp2m(p2m) (!p2m_is_altp2m(p2m) && !p2m_is_hostp2m(p2m))

Might this be better expressed as a p2m type, currently of the set
{host, alt, nested} ?  p2m_is_nestedp2m() is starting to hide some
moderately complicated calculations.
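
Something like this, say (sketch):

enum p2m_class {
    p2m_host,
    p2m_nested,
    p2m_alternate,
};

/* ... with an "enum p2m_class p2m_class;" field in struct p2m_domain ... */

#define p2m_is_hostp2m(p2m)   ((p2m)->p2m_class == p2m_host)
#define p2m_is_nestedp2m(p2m) ((p2m)->p2m_class == p2m_nested)
#define p2m_is_altp2m(p2m)    ((p2m)->p2m_class == p2m_alternate)

That would keep each predicate a simple, self-documenting field
comparison rather than a chain of negations.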

~Andrew


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-12 17:43     ` Ian Jackson
@ 2015-01-12 17:50       ` Ed White
  2015-01-12 18:00         ` Ian Jackson
  2015-01-12 17:51       ` Andrew Cooper
  1 sibling, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-12 17:50 UTC (permalink / raw)
  To: Ian Jackson; +Cc: tim, keir, ian.campbell, jbeulich, xen-devel

On 01/12/2015 09:43 AM, Ian Jackson wrote:
> Ed White writes ("Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m"):
>> On 01/12/2015 04:17 AM, Ian Jackson wrote:
>>> Are there tools parts to come later ?
>>
>> I copied you because get_maintainer picked you. There is scope for tools
>> parts, but I don't have the relevant subject knowledge to add them.
> 
> I see.  Without tools parts, how is this new functionality to be
> exercised ?  I guess I'm missing part (or maybe most) of the picture.
> 
> (I wonder why get_maintainer picked me.  I may investigate.)
> 

The hypercalls are all there. My testing is all done in a Windows
domU with the tests running inside that domain, so I couldn't use
tools support even if I had it.

Ed

> Ian.
> 


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-12 17:43     ` Ian Jackson
  2015-01-12 17:50       ` Ed White
@ 2015-01-12 17:51       ` Andrew Cooper
  1 sibling, 0 replies; 135+ messages in thread
From: Andrew Cooper @ 2015-01-12 17:51 UTC (permalink / raw)
  To: Ian Jackson, Ed White; +Cc: keir, tim, ian.campbell, jbeulich, xen-devel

On 12/01/15 17:43, Ian Jackson wrote:
> Ed White writes ("Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m"):
>> On 01/12/2015 04:17 AM, Ian Jackson wrote:
>>> Are there tools parts to come later ?
>> I copied you because get_maintainer picked you. There is scope for tools
>> parts, but I don't have the relevant subject knowledge to add them.
> I see.  Without tools parts, how is this new functionality to be
> exercised ?  I guess I'm missing part (or maybe most) of the picture.
>
> (I wonder why get_maintainer picked me.  I may investigate.)

You were picked because this touches common hypervisor code, and you
fall into "the rest" category.

~Andrew


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-12 17:50       ` Ed White
@ 2015-01-12 18:00         ` Ian Jackson
  2015-01-12 18:31           ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Ian Jackson @ 2015-01-12 18:00 UTC (permalink / raw)
  To: Ed White; +Cc: tim, keir, ian.campbell, jbeulich, xen-devel

Ed White writes ("Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m"):
> The hypercalls are all there. My testing is all done in a Windows
> domU with the tests running inside that domain, so I couldn't use
> tools support even if I had it.

To support this code in-tree, I think we will need Open Source code
for exercising it, surely ?

Ian.


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-12 18:00         ` Ian Jackson
@ 2015-01-12 18:31           ` Ed White
  2015-01-13 10:21             ` Tamas K Lengyel
  2015-01-13 11:16             ` Ian Jackson
  0 siblings, 2 replies; 135+ messages in thread
From: Ed White @ 2015-01-12 18:31 UTC (permalink / raw)
  To: Ian Jackson; +Cc: tim, keir, ian.campbell, jbeulich, xen-devel

On 01/12/2015 10:00 AM, Ian Jackson wrote:
> Ed White writes ("Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m"):
>> The hypercalls are all there. My testing is all done in a Windows
>> domU with the tests running inside that domain, so I couldn't use
>> tools support even if I had it.
> 
> To support this code in-tree, I think we will need Open Source code
> for exercising it, surely ?
> 

I'm hoping that, as Andrew says, there will be people interested
in using these capabilities, and that some of them will be prepared
to help fill in the gaps. That's why I wanted to send the series to
the list very early in the 4.6 development cycle.

If that doesn't turn out to be the case, I'll see if I can find some
help internally, but I have neither the bandwidth nor the expertise
to do everything myself.

Ed


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-12 17:36           ` Ed White
@ 2015-01-13  8:56             ` Jan Beulich
  2015-01-13 11:28               ` Ian Jackson
  2015-01-13 17:42               ` Ed White
  0 siblings, 2 replies; 135+ messages in thread
From: Jan Beulich @ 2015-01-13  8:56 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.campbell, Andrew Cooper, tim, xen-devel, ian.jackson

>>> On 12.01.15 at 18:36, <edmund.h.white@intel.com> wrote:
> On 01/12/2015 02:00 AM, Jan Beulich wrote:
>>>>> On 10.01.15 at 00:04, <edmund.h.white@intel.com> wrote:
>>> On 01/09/2015 02:41 PM, Andrew Cooper wrote:
>>>> Having some non-OS part of the guest swap the EPT tables and
>>>> accidentally turn a DMA buffer read-only is not going to end well.
>>>>
>>>
>>> The agent can certainly do bad things, and at some level you have to assume it
>>> is sensible enough not to. However, I'm not sure this is fundamentally more
>>> dangerous than what a privileged domain can do today using the MEMOP...
>>> operations, and people are already using those for very similar purposes.
>> 
>> I don't follow - how is what a privileged domain can do related to the
>> proposed changes here (which are - via VMFUNC - at least partially
>> guest controllable, and that's also the case Andrew mentioned in his
>> reply)? I'm having a hard time understanding how a P2M stripped of
>> anything that's not plain RAM can be very useful to a guest. IOW
>> without such fundamental aspects clarified I don't see a point in
>> looking at the individual patches (which btw, according to your
>> wording elsewhere, should have been marked RFC).
>> 
> In this patch series, none of the new hypercalls are protected by xsm
> policies. Earlier in the process of working on this code, I added such
> a check to all the hypercalls, but then removed them all because it
> dawned on me that I didn't actually understand what I was doing and
> my code only worked because I only ever built the dummy permit-everything
> policy.
> 
> Should some version of this patch series be accepted, my hope is that
> someone who does understand xsm policies would put the appropriate checks
> in place, and at that point I maintain that these extra capabilities
> would not be fundamentally more dangerous than existing mechanisms
> available to privileged domains, because policy can prevent the guest
> using vmfunc. That's obviously not true today.

Please simply consult with the XSM maintainer on questions/issues
like this. Proposing a partial (insecure) patch set isn't appropriate.

> The alternate p2m's only contain entries for ram pages with valid mfn's.
> All other page types are still handled in the nested page fault handler
> for the host p2m. Those pages (at least the ones I've encountered) don't
> require the hardware to have a valid EPTE for the page.

I.e. the functionality requiring e.g. p2m_ram_logdirty and
p2m_mmio_direct is then incompatible with your proposed additions
(which I think was also already noted by Andrew). That's imo not
a basis to think about accepting (or even reviewing) the series.

Jan


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-12 18:31           ` Ed White
@ 2015-01-13 10:21             ` Tamas K Lengyel
  2015-01-13 18:25               ` Ed White
  2015-01-13 11:16             ` Ian Jackson
  1 sibling, 1 reply; 135+ messages in thread
From: Tamas K Lengyel @ 2015-01-13 10:21 UTC (permalink / raw)
  To: Ed White
  Cc: Keir Fraser, Ian Campbell, Ian Jackson, Tim Deegan, xen-devel,
	Jan Beulich

On Mon, Jan 12, 2015 at 7:31 PM, Ed White <edmund.h.white@intel.com> wrote:
> On 01/12/2015 10:00 AM, Ian Jackson wrote:
>> Ed White writes ("Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m"):
>>> The hypercalls are all there. My testing is all done in a Windows
>>> domU with the tests running inside that domain, so I couldn't use
>>> tools support even if I had it.
>>
>> To support this code in-tree, I think we will need Open Source code
>> for exercising it, surely ?
>>
>
> I'm hoping that, as Andrew says, there will be people interested
> in using these capabilities, and that some of them will be prepared
> to help fill in the gaps. That's why I wanted to send the series to
> the list very early in the 4.6 development cycle.
>
> If that doesn't turn out to be the case, I'll see if I can find some
> help internally, but I have neither the bandwidth nor the expertise
> to do everything myself.
>
> Ed

Hi Ed,
we are certainly very interested in this feature so thanks for posting
this series!

I also see a use case for multiple copies of the host p2m: enabling
better performance for monitoring with the existing memaccess API.
Currently the problem is that if a memaccess violation occurs on one
vCPU, the memaccess settings need to be cleared, then re-applied after
the operation has passed (usually done via singlestepping). With
multiple vCPUs there is a potential race condition here, unless all
other vCPUs are paused while the memaccess settings are cleared. With
multiple copies of the host p2m, we could simply swap in a table for
the violating vCPU where the permissions are clear, without affecting
any of the other vCPUs. This could be exercised by extending the
xen-access test tool!
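
In hypervisor terms, using p2m_switch_vcpu_altp2m_by_id() from this
series, I imagine something like (untested sketch, assuming view 1 has
been prepared in advance as a copy with the restrictions cleared):

    /* On a memaccess violation on vcpu v: */
    p2m_switch_vcpu_altp2m_by_id(v, 1);   /* switch to the clean view */
    /* ... single-step over the faulting instruction ... */
    p2m_switch_vcpu_altp2m_by_id(v, 0);   /* back to the restricted view */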

Is this something you think would be within scope for the envisioned
use-case for this series?

Tamas


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-12 18:31           ` Ed White
  2015-01-13 10:21             ` Tamas K Lengyel
@ 2015-01-13 11:16             ` Ian Jackson
  1 sibling, 0 replies; 135+ messages in thread
From: Ian Jackson @ 2015-01-13 11:16 UTC (permalink / raw)
  To: Ed White; +Cc: tim, keir, ian.campbell, jbeulich, xen-devel

Ed White writes ("Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m"):
> On 01/12/2015 10:00 AM, Ian Jackson wrote:
> > To support this code in-tree, I think we will need Open Source code
> > for exercising it, surely ?
> 
> I'm hoping that, as Andrew says, there will be people interested
> in using these capabilities, and that some of them will be prepared
> to help fill in the gaps. That's why I wanted to send the series to
> the list very early in the 4.6 development cycle.

That makes perfect sense, thanks.  There's absolutely nothing wrong
with posting a patch series early.

> If that doesn't turn out to be the case, I'll see if I can find some
> help internally, but I have neither the bandwidth nor the expertise
> to do everything myself.

Right.

Ian.


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-13  8:56             ` Jan Beulich
@ 2015-01-13 11:28               ` Ian Jackson
  2015-01-13 17:42               ` Ed White
  1 sibling, 0 replies; 135+ messages in thread
From: Ian Jackson @ 2015-01-13 11:28 UTC (permalink / raw)
  To: Jan Beulich; +Cc: keir, ian.campbell, Andrew Cooper, tim, Ed White, xen-devel

Jan Beulich writes ("Re: [Xen-devel] [PATCH 00/11] Alternate p2m: support multiple copies of host p2m"):
> On 12.01.15 at 18:36, <edmund.h.white@intel.com> wrote:
> > Should some version of this patch series be accepted, my hope is that
> > someone who does understand xsm policies would put the appropriate checks
> > in place, and at that point I maintain that these extra capabilities
> > would not be fundamentally more dangerous than existing mechanisms
> > available to privileged domains, because policy can prevent the guest
> > using vmfunc. That's obviously not true today.
> 
> Please simply consult with the XSM maintainer on questions/issues
> like this. Proposing a partial (insecure) patch set isn't appropriate.

I think a better way to phrase this criticism would be to say "please
next time mark your series as RFC and mention in the 00 covering note
the issues which mean that the series should not be applied".

It is definitely appropriate to post RFC patches even if some
important parts are missing.  What is necessary is to explicitly
discuss the problems, so that they don't get overlooked.

Ian.


* Re: [PATCH 05/11] x86/altp2m: basic data structures and support routines.
  2015-01-09 21:26 ` [PATCH 05/11] x86/altp2m: basic data structures and support routines Ed White
@ 2015-01-13 11:28   ` Andrew Cooper
  2015-01-13 19:49     ` Ed White
  2015-01-15 16:48   ` Tim Deegan
  1 sibling, 1 reply; 135+ messages in thread
From: Andrew Cooper @ 2015-01-13 11:28 UTC (permalink / raw)
  To: Ed White, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 09/01/15 21:26, Ed White wrote:
> Add the basic data structures needed to support alternate p2m's and
> the functions to initialise them and tear them down.
>
> Although Intel hardware can handle 512 EPTP's per hardware thread
> concurrently, only 10 per domain are supported in this patch for
> performance reasons.
>
> The iterator in hap_enable() does need to handle 512, so that is now
> uint16_t.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/hvm/Makefile           |   3 +-
>  xen/arch/x86/hvm/altp2mhvm.c        |  77 +++++++++++++++++++++++++++
>  xen/arch/x86/hvm/hvm.c              |  21 ++++++++
>  xen/arch/x86/mm/hap/Makefile        |   1 +
>  xen/arch/x86/mm/hap/altp2m_hap.c    |  66 +++++++++++++++++++++++
>  xen/arch/x86/mm/hap/hap.c           |  30 ++++++++++-
>  xen/arch/x86/mm/mm-locks.h          |   4 ++
>  xen/arch/x86/mm/p2m.c               | 102 ++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/domain.h        |   7 +++
>  xen/include/asm-x86/hvm/altp2mhvm.h |  36 +++++++++++++
>  xen/include/asm-x86/hvm/hvm.h       |  17 ++++++
>  xen/include/asm-x86/hvm/vcpu.h      |   9 ++++
>  xen/include/asm-x86/p2m.h           |  22 ++++++++
>  13 files changed, 393 insertions(+), 2 deletions(-)
>  create mode 100644 xen/arch/x86/hvm/altp2mhvm.c
>  create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c
>  create mode 100644 xen/include/asm-x86/hvm/altp2mhvm.h
>
> diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
> index eea5555..5bf8b4f 100644
> --- a/xen/arch/x86/hvm/Makefile
> +++ b/xen/arch/x86/hvm/Makefile
> @@ -22,4 +22,5 @@ obj-y += vlapic.o
>  obj-y += vmsi.o
>  obj-y += vpic.o
>  obj-y += vpt.o
> -obj-y += vpmu.o
> \ No newline at end of file
> +obj-y += vpmu.o
> +obj-y += altp2mhvm.o
> diff --git a/xen/arch/x86/hvm/altp2mhvm.c b/xen/arch/x86/hvm/altp2mhvm.c
> new file mode 100644
> index 0000000..fa0af0c
> --- /dev/null
> +++ b/xen/arch/x86/hvm/altp2mhvm.c
> @@ -0,0 +1,77 @@
> +/*
> + * Alternate p2m HVM
> + * Copyright (c) 2014, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + */
> +
> +#include <asm/hvm/support.h>
> +#include <asm/hvm/hvm.h>
> +#include <asm/p2m.h>
> +#include <asm/hvm/altp2mhvm.h>
> +
> +void
> +altp2mhvm_vcpu_reset(struct vcpu *v)
> +{
> +    struct altp2mvcpu *av = &vcpu_altp2mhvm(v);
> +
> +    av->p2midx = 0;
> +    av->veinfo = 0;
> +
> +    if ( hvm_funcs.ahvm_vcpu_reset )
> +        hvm_funcs.ahvm_vcpu_reset(v);
> +}
> +
> +int
> +altp2mhvm_vcpu_initialise(struct vcpu *v)
> +{
> +    int rc = -EOPNOTSUPP;
> +
> +    if ( v != current )
> +        vcpu_pause(v);

Under what circumstances would a vcpu initialisation happen on current? 
All initialisation should happen during domain creation.

> +
> +    if ( !hvm_funcs.ahvm_vcpu_initialise ||
> +         (hvm_funcs.ahvm_vcpu_initialise(v) == 0) )
> +    {
> +        rc = 0;
> +        altp2mhvm_vcpu_reset(v);
> +        cpumask_set_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
> +
> +        ahvm_vcpu_update_eptp(v);
> +    }
> +
> +    if ( v != current )
> +        vcpu_unpause(v);
> +
> +    return rc;
> +}
> +
> +void
> +altp2mhvm_vcpu_destroy(struct vcpu *v)
> +{
> +    if ( v != current )
> +        vcpu_pause(v);
> +
> +    if ( hvm_funcs.ahvm_vcpu_destroy )
> +        hvm_funcs.ahvm_vcpu_destroy(v);
> +
> +    cpumask_clear_cpu(v->processor, p2m_get_altp2m(v)->dirty_cpumask);
> +    altp2mhvm_vcpu_reset(v);
> +
> +    ahvm_vcpu_update_eptp(v);
> +    ahvm_vcpu_update_vmfunc_ve(v);
> +
> +    if ( v != current )
> +        vcpu_unpause(v);
> +}
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index b89e9d2..e8787cc 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -60,6 +60,7 @@
>  #include <asm/hvm/cacheattr.h>
>  #include <asm/hvm/trace.h>
>  #include <asm/hvm/nestedhvm.h>
> +#include <asm/hvm/altp2mhvm.h>
>  #include <asm/mtrr.h>
>  #include <asm/apic.h>
>  #include <public/sched.h>
> @@ -2290,6 +2291,7 @@ void hvm_vcpu_destroy(struct vcpu *v)
>  
>      hvm_all_ioreq_servers_remove_vcpu(d, v);
>  
> +    altp2mhvm_vcpu_destroy(v);
>      nestedhvm_vcpu_destroy(v);
>  
>      free_compat_arg_xlat(v);
> @@ -6377,6 +6379,25 @@ bool_t hvm_altp2m_supported()
>      return hvm_funcs.altp2m_supported;
>  }
>  
> +void ahvm_vcpu_update_eptp(struct vcpu *v)
> +{
> +    if (hvm_funcs.ahvm_vcpu_update_eptp)
> +        hvm_funcs.ahvm_vcpu_update_eptp(v);
> +}
> +
> +void ahvm_vcpu_update_vmfunc_ve(struct vcpu *v)
> +{
> +    if (hvm_funcs.ahvm_vcpu_update_vmfunc_ve)
> +        hvm_funcs.ahvm_vcpu_update_vmfunc_ve(v);
> +}
> +
> +bool_t ahvm_vcpu_emulate_ve(struct vcpu *v)
> +{
> +    if (hvm_funcs.ahvm_vcpu_emulate_ve)
> +        return hvm_funcs.ahvm_vcpu_emulate_ve(v);
> +    return 0;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/mm/hap/Makefile b/xen/arch/x86/mm/hap/Makefile
> index 68f2bb5..216cd90 100644
> --- a/xen/arch/x86/mm/hap/Makefile
> +++ b/xen/arch/x86/mm/hap/Makefile
> @@ -4,6 +4,7 @@ obj-y += guest_walk_3level.o
>  obj-$(x86_64) += guest_walk_4level.o
>  obj-y += nested_hap.o
>  obj-y += nested_ept.o
> +obj-y += altp2m_hap.o
>  
>  guest_walk_%level.o: guest_walk.c Makefile
>  	$(CC) $(CFLAGS) -DGUEST_PAGING_LEVELS=$* -c $< -o $@
> diff --git a/xen/arch/x86/mm/hap/altp2m_hap.c b/xen/arch/x86/mm/hap/altp2m_hap.c
> new file mode 100644
> index 0000000..c2cdc42
> --- /dev/null
> +++ b/xen/arch/x86/mm/hap/altp2m_hap.c
> @@ -0,0 +1,66 @@
> +/******************************************************************************
> + * arch/x86/mm/hap/altp2m_hap.c
> + *
> + * Copyright (c) 2014 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> + */
> +
> +#include <xen/mem_event.h>
> +#include <xen/event.h>
> +#include <public/mem_event.h>
> +#include <asm/domain.h>
> +#include <asm/page.h>
> +#include <asm/paging.h>
> +#include <asm/p2m.h>
> +#include <asm/mem_sharing.h>
> +#include <asm/hap.h>
> +#include <asm/hvm/support.h>
> +
> +#include "private.h"
> +
> +/* Override macros from asm/page.h to make them work with mfn_t */
> +#undef mfn_valid
> +#define mfn_valid(_mfn) __mfn_valid(mfn_x(_mfn))
> +#undef page_to_mfn
> +#define page_to_mfn(_pg) _mfn(__page_to_mfn(_pg))
> +
> +void
> +altp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
> +    l1_pgentry_t *p, l1_pgentry_t new, unsigned int level)
> +{
> +    struct domain *d = p2m->domain;
> +    uint32_t old_flags;
> +
> +    paging_lock(d);
> +
> +    old_flags = l1e_get_flags(*p);
> +    safe_write_pte(p, new);
> +
> +    if (old_flags & _PAGE_PRESENT)
> +        flush_tlb_mask(p2m->dirty_cpumask);
> +
> +    paging_unlock(d);
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index abf3d7a..8fe0650 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -439,7 +439,7 @@ void hap_domain_init(struct domain *d)
>  int hap_enable(struct domain *d, u32 mode)
>  {
>      unsigned int old_pages;
> -    uint8_t i;
> +    uint16_t i;
>      int rv = 0;
>  
>      domain_pause(d);
> @@ -485,6 +485,23 @@ int hap_enable(struct domain *d, u32 mode)
>             goto out;
>      }
>  
> +    /* Init alternate p2m data */
> +    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )

This memory should be allocated from some domain-accounted pool,
probably the paging pool (d->arch.paging.alloc_page()).  You can use
map_domain_page_global() to get a safe pointer to anchor in
d->arch.altp2m_eptp for hardware.
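
I.e. roughly (untested sketch):

    struct page_info *pg = d->arch.paging.alloc_page(d);

    if ( pg == NULL )
    {
        rv = -ENOMEM;
        goto out;
    }
    d->arch.altp2m_eptp = __map_domain_page_global(pg);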

> +    {
> +        rv = -ENOMEM;
> +        goto out;
> +    }
> +    for (i = 0; i < 512; i++)
> +        d->arch.altp2m_eptp[i] = ~0ul;
> +
> +    for (i = 0; i < MAX_ALTP2M; i++) {
> +        rv = p2m_alloc_table(d->arch.altp2m_p2m[i]);
> +        if ( rv != 0 )
> +           goto out;
> +    }
> +
> +    d->arch.altp2m_active = 0;
> +
>      /* Now let other users see the new mode */
>      d->arch.paging.mode = mode | PG_HAP_enable;
>  
> @@ -497,6 +514,17 @@ void hap_final_teardown(struct domain *d)
>  {
>      uint8_t i;
>  
> +    d->arch.altp2m_active = 0;
> +
> +    if ( d->arch.altp2m_eptp ) {
> +        free_xenheap_page(d->arch.altp2m_eptp);
> +        d->arch.altp2m_eptp = NULL;
> +    }
> +
> +    for (i = 0; i < MAX_ALTP2M; i++) {
> +        p2m_teardown(d->arch.altp2m_p2m[i]);
> +    }
> +
>      /* Destroy nestedp2m's first */
>      for (i = 0; i < MAX_NESTEDP2M; i++) {
>          p2m_teardown(d->arch.nested_p2m[i]);
> diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
> index 769f7bc..a0faca3 100644
> --- a/xen/arch/x86/mm/mm-locks.h
> +++ b/xen/arch/x86/mm/mm-locks.h
> @@ -209,6 +209,10 @@ declare_mm_lock(nestedp2m)
>  #define nestedp2m_lock(d)   mm_lock(nestedp2m, &(d)->arch.nested_p2m_lock)
>  #define nestedp2m_unlock(d) mm_unlock(&(d)->arch.nested_p2m_lock)
>  
> +declare_mm_lock(altp2m)
> +#define altp2m_lock(d)   mm_lock(altp2m, &(d)->arch.altp2m_lock)
> +#define altp2m_unlock(d) mm_unlock(&(d)->arch.altp2m_lock)
> +
>  /* P2M lock (per-p2m-table)
>   *
>   * This protects all queries and updates to the p2m table.
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 49b66fb..3c6049b 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -35,6 +35,7 @@
>  #include <asm/hvm/vmx/vmx.h> /* ept_p2m_init() */
>  #include <asm/mem_sharing.h>
>  #include <asm/hvm/nestedhvm.h>
> +#include <asm/hvm/altp2mhvm.h>
>  #include <asm/hvm/svm/amd-iommu-proto.h>
>  #include <xsm/xsm.h>
>  
> @@ -182,6 +183,44 @@ static void p2m_teardown_nestedp2m(struct domain *d)
>      }
>  }
>  
> +static void p2m_teardown_altp2m(struct domain *d);
> +
> +static int p2m_init_altp2m(struct domain *d)
> +{
> +    uint8_t i;
> +    struct p2m_domain *p2m;
> +
> +    mm_lock_init(&d->arch.altp2m_lock);
> +    for (i = 0; i < MAX_ALTP2M; i++)
> +    {
> +        d->arch.altp2m_p2m[i] = p2m = p2m_init_one(d);
> +        if ( p2m == NULL )
> +        {
> +            p2m_teardown_altp2m(d);
> +            return -ENOMEM;
> +        }
> +        p2m->write_p2m_entry = altp2m_write_p2m_entry;
> +        p2m->alternate = 1;
> +    }
> +
> +    return 0;
> +}
> +
> +static void p2m_teardown_altp2m(struct domain *d)
> +{
> +    uint8_t i;
> +    struct p2m_domain *p2m;
> +
> +    for (i = 0; i < MAX_ALTP2M; i++)
> +    {
> +        if ( !d->arch.altp2m_p2m[i] )
> +            continue;
> +        p2m = d->arch.altp2m_p2m[i];
> +        p2m_free_one(p2m);
> +        d->arch.altp2m_p2m[i] = NULL;
> +    }
> +}
> +
>  int p2m_init(struct domain *d)
>  {
>      int rc;
> @@ -195,7 +234,14 @@ int p2m_init(struct domain *d)
>       * (p2m_init runs too early for HVM_PARAM_* options) */
>      rc = p2m_init_nestedp2m(d);
>      if ( rc )
> +    {
>          p2m_teardown_hostp2m(d);
> +        return rc;
> +    }
> +
> +    rc = p2m_init_altp2m(d);
> +    if ( rc )
> +        p2m_teardown_altp2m(d);
>  
>      return rc;
>  }
> @@ -1891,6 +1937,62 @@ int unmap_mmio_regions(struct domain *d,
>      return err;
>  }
>  
> +bool_t p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp, unsigned long *idx)
> +{
> +    struct p2m_domain *p2m;
> +    struct ept_data *ept;
> +    bool_t rc = 0;
> +    uint16_t i;
> +
> +    altp2m_lock(d);
> +
> +    for ( i = 0; i < MAX_ALTP2M; i++ )
> +    {
> +        if ( d->arch.altp2m_eptp[i] == ~0ul )
> +            continue;
> +
> +        p2m = d->arch.altp2m_p2m[i];
> +        ept = &p2m->ept;
> +
> +        if ( eptp != ept_get_eptp(ept) )
> +            continue;
> +
> +        *idx = i;
> +        rc = 1;
> +
> +        break;
> +    }
> +
> +    altp2m_unlock(d);
> +    return rc;
> +}
> +
> +bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx)
> +{
> +    struct domain *d = v->domain;
> +    bool_t rc = 0;
> +
> +    if ( idx >= MAX_ALTP2M )
> +        return rc;
> +
> +    altp2m_lock(d);
> +
> +    if ( d->arch.altp2m_eptp[idx] != ~0ul )
> +    {
> +        if ( idx != vcpu_altp2mhvm(v).p2midx )
> +        {
> +            cpumask_clear_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
> +            vcpu_altp2mhvm(v).p2midx = idx;
> +            cpumask_set_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
> +            ahvm_vcpu_update_eptp(v);
> +        }
> +        rc = 1;
> +    }
> +
> +    altp2m_unlock(d);
> +    return rc;
> +}
> +
>  /*** Audit ***/
>  
>  #if P2M_AUDIT
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 6a77a93..a0e9e90 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -230,6 +230,7 @@ struct paging_vcpu {
>  typedef xen_domctl_cpuid_t cpuid_input_t;
>  
>  #define MAX_NESTEDP2M 10
> +#define MAX_ALTP2M    10
>  struct p2m_domain;
>  struct time_scale {
>      int shift;
> @@ -274,6 +275,12 @@ struct arch_domain
>      struct p2m_domain *nested_p2m[MAX_NESTEDP2M];
>      mm_lock_t nested_p2m_lock;
>  
> +    /* altp2m: allow multiple copies of host p2m */
> +    bool_t altp2m_active;
> +    struct p2m_domain *altp2m_p2m[MAX_ALTP2M];
> +    mm_lock_t altp2m_lock;
> +    uint64_t *altp2m_eptp;
> +
>      /* NB. protected by d->event_lock and by irq_desc[irq].lock */
>      struct radix_tree_root irq_pirq;
>  
> diff --git a/xen/include/asm-x86/hvm/altp2mhvm.h b/xen/include/asm-x86/hvm/altp2mhvm.h
> new file mode 100644
> index 0000000..919986e
> --- /dev/null
> +++ b/xen/include/asm-x86/hvm/altp2mhvm.h
> @@ -0,0 +1,36 @@
> +/*
> + * Alternate p2m HVM
> + * Copyright (c) 2014, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + */
> +
> +#ifndef _HVM_ALTP2M_H
> +#define _HVM_ALTP2M_H
> +
> +#include <xen/types.h>         /* for uintNN_t */
> +#include <xen/sched.h>         /* for struct vcpu, struct domain */
> +#include <asm/hvm/vcpu.h>      /* for vcpu_altp2mhvm */
> +
> +/* Alternate p2m HVM on/off per domain */
> +#define altp2mhvm_active(d) \
> +    d->arch.altp2m_active

static inline bool_t altp2mhvm_active(const struct domain *d)
{
    return d->arch.altp2m_active;
}

Type systems are nice.  We should use them where possible.

> +
> +/* Alternate p2m VCPU */
> +int altp2mhvm_vcpu_initialise(struct vcpu *v);
> +void altp2mhvm_vcpu_destroy(struct vcpu *v);
> +void altp2mhvm_vcpu_reset(struct vcpu *v);
> +
> +#endif /* _HVM_ALTP2M_H */
> +
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index 7115a68..32d1d02 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -210,6 +210,14 @@ struct hvm_function_table {
>                                    uint32_t *ecx, uint32_t *edx);
>  
>      void (*enable_msr_exit_interception)(struct domain *d);
> +
> +    /* Alternate p2m */
> +    int (*ahvm_vcpu_initialise)(struct vcpu *v);
> +    void (*ahvm_vcpu_destroy)(struct vcpu *v);
> +    int (*ahvm_vcpu_reset)(struct vcpu *v);
> +    void (*ahvm_vcpu_update_eptp)(struct vcpu *v);
> +    void (*ahvm_vcpu_update_vmfunc_ve)(struct vcpu *v);
> +    bool_t (*ahvm_vcpu_emulate_ve)(struct vcpu *v);
>  };
>  
>  extern struct hvm_function_table hvm_funcs;
> @@ -531,6 +539,15 @@ extern bool_t opt_hvm_fep;
>  #define opt_hvm_fep 0
>  #endif
>  
> +/* updates the current EPTP in VMCS */
> +void ahvm_vcpu_update_eptp(struct vcpu *v);
> +
> +/* updates VMCS fields related to VMFUNC and #VE */
> +void ahvm_vcpu_update_vmfunc_ve(struct vcpu *v);
> +
> +/* emulates #VE */
> +bool_t ahvm_vcpu_emulate_ve(struct vcpu *v);

These should be added in the patch which introduces them.

> +
>  #endif /* __ASM_X86_HVM_HVM_H__ */
>  
>  /*
> diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
> index 01e0665..9302d40 100644
> --- a/xen/include/asm-x86/hvm/vcpu.h
> +++ b/xen/include/asm-x86/hvm/vcpu.h
> @@ -118,6 +118,13 @@ struct nestedvcpu {
>  
>  #define vcpu_nestedhvm(v) ((v)->arch.hvm_vcpu.nvcpu)
>  
> +struct altp2mvcpu {
> +    uint16_t    p2midx ;        /* alternate p2m index */

Stray space.

> +    uint64_t    veinfo;         /* #VE information page guest pfn */
> +};
> +
> +#define vcpu_altp2mhvm(v) ((v)->arch.hvm_vcpu.avcpu)

static inline as well please.
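
E.g. (call sites would then use '->' rather than '.'):

static inline struct altp2mvcpu *vcpu_altp2mhvm(struct vcpu *v)
{
    return &v->arch.hvm_vcpu.avcpu;
}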

~Andrew

> +
>  struct hvm_vcpu {
>      /* Guest control-register and EFER values, just as the guest sees them. */
>      unsigned long       guest_cr[5];
> @@ -163,6 +170,8 @@ struct hvm_vcpu {
>  
>      struct nestedvcpu   nvcpu;
>  
> +    struct altp2mvcpu   avcpu;
> +
>      struct mtrr_state   mtrr;
>      u64                 pat_cr;
>  
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 8193901..9fb5ba0 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -688,6 +688,28 @@ void nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
>      l1_pgentry_t *p, l1_pgentry_t new, unsigned int level);
>  
>  /*
> + * Alternate p2m: shadow p2m tables used for alternate memory views
> + */
> +
> +void altp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
> +    l1_pgentry_t *p, l1_pgentry_t new, unsigned int level);
> +
> +/* get current alternate p2m table */
> +static inline struct p2m_domain *p2m_get_altp2m(struct vcpu *v)
> +{
> +    struct domain *d = v->domain;
> +    uint16_t index = vcpu_altp2mhvm(v).p2midx;
> +
> +    return d->arch.altp2m_p2m[index];
> +}
> +
> +/* Locate an alternate p2m by its EPTP */
> +bool_t p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp, unsigned long *idx);
> +
> +/* Switch alternate p2m for a single vcpu */
> +bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx);
> +
> +/*
>   * p2m type to IOMMU flags
>   */
>  static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt)


* Re: [PATCH 06/11] VMX/altp2m: add code to support EPTP switching and #VE.
  2015-01-09 21:26 ` [PATCH 06/11] VMX/altp2m: add code to support EPTP switching and #VE Ed White
@ 2015-01-13 11:58   ` Andrew Cooper
  2015-01-15 16:56   ` Tim Deegan
  1 sibling, 0 replies; 135+ messages in thread
From: Andrew Cooper @ 2015-01-13 11:58 UTC (permalink / raw)
  To: Ed White, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 09/01/15 21:26, Ed White wrote:
> Implement and hook up the code to enable VMX support of VMFUNC and #VE.
>
> VMFUNC leaf 0 (EPTP switching) and #VE are emulated on hardware that
> doesn't support them.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 138 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 138 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 931709b..a0a2d02 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -56,6 +56,7 @@
>  #include <asm/debugger.h>
>  #include <asm/apic.h>
>  #include <asm/hvm/nestedhvm.h>
> +#include <asm/hvm/altp2mhvm.h>
>  #include <asm/event.h>
>  #include <public/arch-x86/cpuid.h>
>  
> @@ -1718,6 +1719,91 @@ static void vmx_enable_msr_exit_interception(struct domain *d)
>                                           MSR_TYPE_W);
>  }
>  
> +static void vmx_vcpu_update_eptp(struct vcpu *v)
> +{
> +    struct domain *d = v->domain;
> +    struct p2m_domain *p2m = altp2mhvm_active(d) ?
> +        p2m_get_altp2m(v) : p2m_get_hostp2m(d);
> +    struct ept_data *ept = &p2m->ept;
> +
> +    ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
> +
> +    vmx_vmcs_enter(v);
> +
> +    __vmwrite(EPT_POINTER, ept_get_eptp(ept));
> +
> +    vmx_vmcs_exit(v);
> +}
> +
> +static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v)
> +{
> +    struct domain *d = v->domain;
> +    u32 mask = SECONDARY_EXEC_ENABLE_VM_FUNCTIONS;
> +
> +    if ( !cpu_has_vmx_vmfunc )
> +        return;
> +
> +    if ( cpu_has_vmx_virt_exceptions )
> +        mask |= SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
> +
> +    vmx_vmcs_enter(v);
> +
> +    if ( !d->is_dying && altp2mhvm_active(d) )
> +    {
> +        v->arch.hvm_vmx.secondary_exec_control |= mask;
> +        __vmwrite(VM_FUNCTION_CONTROL, VMX_VMFUNC_EPTP_SWITCHING);
> +        __vmwrite(EPTP_LIST_ADDR, virt_to_maddr(d->arch.altp2m_eptp));
> +
> +        if ( cpu_has_vmx_virt_exceptions )
> +        {
> +            p2m_type_t t;
> +            mfn_t mfn;
> +
> +            mfn = get_gfn_query_unlocked(d, vcpu_altp2mhvm(v).veinfo, &t);
> +            __vmwrite(VIRT_EXCEPTION_INFO, mfn_x(mfn) << PAGE_SHIFT);
> +        }
> +    }
> +    else
> +        v->arch.hvm_vmx.secondary_exec_control &= ~mask;
> +
> +    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
> +        v->arch.hvm_vmx.secondary_exec_control);
> +
> +    vmx_vmcs_exit(v);
> +}
> +
> +static bool_t vmx_vcpu_emulate_ve(struct vcpu *v)
> +{
> +    bool_t rc = 0;
> +    ve_info_t *veinfo = vcpu_altp2mhvm(v).veinfo ?
> +        hvm_map_guest_frame_rw(vcpu_altp2mhvm(v).veinfo, 0) : NULL;
> +
> +    if ( !veinfo )
> +        return 0;
> +
> +    if ( veinfo->semaphore != 0 )
> +        goto out;
> +
> +    rc = 1;
> +
> +    veinfo->exit_reason = EXIT_REASON_EPT_VIOLATION;
> +    veinfo->semaphore = ~0l;
> +    veinfo->eptp_index = vcpu_altp2mhvm(v).p2midx;
> +
> +    vmx_vmcs_enter(v);
> +    __vmread(EXIT_QUALIFICATION, &veinfo->exit_qualification);
> +    __vmread(GUEST_LINEAR_ADDRESS, &veinfo->gla);
> +    __vmread(GUEST_PHYSICAL_ADDRESS, &veinfo->gpa);
> +    vmx_vmcs_exit(v);
> +
> +    hvm_inject_hw_exception(TRAP_virtualisation,
> +                            HVM_DELIVER_NO_ERROR_CODE);
> +
> +out:
> +    hvm_unmap_guest_frame(veinfo, 0);
> +    return rc;
> +}
> +
>  static struct hvm_function_table __initdata vmx_function_table = {
>      .name                 = "VMX",
>      .cpu_up_prepare       = vmx_cpu_up_prepare,
> @@ -1777,6 +1863,9 @@ static struct hvm_function_table __initdata vmx_function_table = {
>      .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
>      .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
>      .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
> +    .ahvm_vcpu_update_eptp = vmx_vcpu_update_eptp,
> +    .ahvm_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
> +    .ahvm_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
>  };
>  
>  const struct hvm_function_table * __init start_vmx(void)
> @@ -2551,6 +2640,17 @@ static void vmx_vmexit_ud_intercept(struct cpu_user_regs *regs)
>          hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>          break;
>      case X86EMUL_EXCEPTION:
> +        /* check for a VMFUNC that should be emulated */
> +        if ( !cpu_has_vmx_vmfunc && altp2mhvm_active(current->domain) &&
> +             ctxt.insn_buf_bytes >= 3 && ctxt.insn_buf[0] == 0x0f &&
> +             ctxt.insn_buf[1] == 0x01 && ctxt.insn_buf[2] == 0xd4 &&
> +             regs->eax == 0 &&
> +             p2m_switch_vcpu_altp2m_by_id(current, (uint16_t)regs->ecx) )
> +        {
> +            regs->eip += 3;
> +            return;
> +        }
> +

This is not appropriate.  You should extend x86_emulate to decode
and act on the VMFUNC instruction.
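
i.e. something along these lines in x86_emulate's 0f 01 opcode group (a
sketch only -- the ->vmfunc hook and its wiring are hypothetical, to show
the shape rather than a final interface):

    case 0xd4: /* vmfunc */
        /* Delegate to a hypothetical emulation hook instead of
         * pattern-matching instruction bytes in the #UD intercept. */
        fail_if(ops->vmfunc == NULL);
        if ( (rc = ops->vmfunc(ctxt)) != X86EMUL_OKAY )
            goto done;
        break;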

>          if ( ctxt.exn_pending )
>              hvm_inject_trap(&ctxt.trap);
>          /* fall through */
> @@ -2698,6 +2798,40 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>  
>      /* Now enable interrupts so it's safe to take locks. */
>      local_irq_enable();
> + 
> +    /*
> +     * If the guest has the ability to switch EPTP without an exit,
> +     * figure out whether it has done so and update the altp2m data.
> +     */
> +    if ( altp2mhvm_active(v->domain) &&
> +        (v->arch.hvm_vmx.secondary_exec_control &
> +        SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
> +    {
> +        unsigned long idx;
> +
> +        if ( v->arch.hvm_vmx.secondary_exec_control &
> +            SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS )
> +            __vmread(EPTP_INDEX, &idx);
> +        else
> +        {
> +            unsigned long eptp;
> +
> +            __vmread(EPT_POINTER, &eptp);
> +
> +            if ( !p2m_find_altp2m_by_eptp(v->domain, eptp, &idx) )
> +            {
> +                gdprintk(XENLOG_ERR, "EPTP not found in alternate p2m list\n");
> +                domain_crash(v->domain);
> +            }
> +        }
> +
> +        if ( (uint16_t)idx != vcpu_altp2mhvm(v).p2midx )
> +        {
> +            cpumask_clear_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
> +            vcpu_altp2mhvm(v).p2midx = (uint16_t)idx;
> +            cpumask_set_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
> +        }
> +    }
>  
>      /* XXX: This looks ugly, but we need a mechanism to ensure
>       * any pending vmresume has really happened
> @@ -3041,6 +3175,10 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>              update_guest_eip();
>          break;
>  
> +    case EXIT_REASON_VMFUNC:
> +        vmx_vmexit_ud_intercept(regs);

This should be a dedicated function to handle vmfunc, which is invoked
from here and from x86_emulate when emulating a 'VMFUNC' instruction.
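
Something like the below, callable from both places (a sketch -- the name
and return convention are illustrative):

    static int vmx_vmfunc_intercept(struct cpu_user_regs *regs)
    {
        /* Leaf 0 is EPTP switching; ECX selects the view.  The caller
         * advances EIP on X86EMUL_OKAY and injects #UD otherwise. */
        if ( regs->eax == 0 &&
             p2m_switch_vcpu_altp2m_by_id(current, (uint16_t)regs->ecx) )
            return X86EMUL_OKAY;

        return X86EMUL_EXCEPTION;
    }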

~Andrew

> +        break;
> +
>      case EXIT_REASON_INVEPT:
>          if ( nvmx_handle_invept(regs) == X86EMUL_OKAY )
>              update_guest_eip();


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-13  8:56             ` Jan Beulich
  2015-01-13 11:28               ` Ian Jackson
@ 2015-01-13 17:42               ` Ed White
  1 sibling, 0 replies; 135+ messages in thread
From: Ed White @ 2015-01-13 17:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, ian.campbell, Andrew Cooper, tim, xen-devel, ian.jackson

On 01/13/2015 12:56 AM, Jan Beulich wrote:
>>>> On 12.01.15 at 18:36, <edmund.h.white@intel.com> wrote:
>> On 01/12/2015 02:00 AM, Jan Beulich wrote:
>>>>>> On 10.01.15 at 00:04, <edmund.h.white@intel.com> wrote:
>>>> On 01/09/2015 02:41 PM, Andrew Cooper wrote:
>>>>> Having some non-OS part of the guest swap the EPT tables and
>>>>> accidentally turn a DMA buffer read-only is not going to end well.
>>>>>
>>>>
>>>> The agent can certainly do bad things, and at some level you have to assume
>>>> it is sensible enough not to. However, I'm not sure this is fundamentally more
>>>> dangerous than what a privileged domain can do today using the MEMOP...
>>>> operations, and people are already using those for very similar purposes.
>>>
>>> I don't follow - how is what privileged domain can do related to the
>>> proposed changes here (which are - via VMFUNC - at least partially
>>> guest controllable, and that's also the case Andrew mentioned in his
>>> reply)? I'm having a hard time understanding how a P2M stripped of
>>> anything that's not plain RAM can be very useful to a guest. IOW
>>> without such fundamental aspects clarified I don't see a point in
>>> looking at the individual patches (which btw, according to your
>>> wording elsewhere, should have been marked RFC).
>>>
>> In this patch series, none of the new hypercalls are protected by xsm
>> policies. Earlier in the process of working on this code, I added such
>> a check to all the hypercalls, but then removed them all because it
>> dawned on me that I didn't actually understand what I was doing and
>> my code only worked because I only ever built the dummy permit everything
>> policy.
>>
>> Should some version of this patch series be accepted, my hope is that
>> someone who does understand xsm policies would put the appropriate checks
>> in place, and at that point I maintain that these extra capabilities
>> would not be fundamentally more dangerous than existing mechanisms
>> available to privileged domains, because policy can prevent the guest
>> using vmfunc. That's obviously not true today.
> 
> Please simply consult with the XSM maintainer on questions/issues
> like this. Proposing a partial (insecure) patch set isn't appropriate.
> 
>> The alternate p2m's only contain entries for ram pages with valid mfn's.
>> All other page types are still handled in the nested page fault handler
>> for the host p2m. Those pages (at least the ones I've encountered) don't
>> require the hardware to have a valid EPTE for the page.
> 
> I.e. the functionality requiring e.g. p2m_ram_logdirty and
> p2m_mmio_direct is then incompatible with your proposed additions
> (which I think was also already noted by Andrew). That's imo not
> a basis to think about accepting (or even reviewing) the series.

Andrew raised that question, and I answered that pages needing
special handling are compatible with these changes. Unless I
misunderstood him, he accepted that.

If the hardware is never intended to be able to satisfy an access to
a page without generating an EPT violation, then all the hardware
needs is a set of EPT's that guarantee that behaviour. These changes
take advantage of that to avoid copying any of the EPTE's for special
pages into the alternate p2m's. Instead, the nested page fault handler
for the alternate p2m returns a status to indicate that the host p2m
nested page fault handler should handle the violation using the data
in the host p2m.

If the result is that the page becomes ram in the host p2m and the
instruction is restarted, the hardware will generate another violation
and this time the EPTE will be copied.
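
Roughly, the alternate handler behaves like this (a sketch -- the names
and return values are illustrative, not the actual patch interface):

    static int altp2m_nested_page_fault(struct vcpu *v, unsigned long gfn)
    {
        struct p2m_domain *ap2m = p2m_get_altp2m(v);
        p2m_type_t t;
        p2m_access_t a;
        mfn_t mfn = ap2m->get_entry(ap2m, gfn, &t, &a, 0, NULL);

        if ( mfn_valid(mfn) )
            return ALTP2M_HANDLED;      /* fault against this view's EPTE */

        /* No EPTE in this view: defer to the host p2m handler.  If that
         * turns the page into plain ram and restarts the instruction,
         * the second violation copies the EPTE into this view lazily. */
        return ALTP2M_PASS_TO_HOST;
    }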

This works. I have vram log-dirty working, something that does not work
with the nestedhvm nested EPT code.

Ed


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-13 10:21             ` Tamas K Lengyel
@ 2015-01-13 18:25               ` Ed White
  0 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-01-13 18:25 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Keir Fraser, Ian Campbell, Ian Jackson, Tim Deegan, xen-devel,
	Jan Beulich

On 01/13/2015 02:21 AM, Tamas K Lengyel wrote:
> On Mon, Jan 12, 2015 at 7:31 PM, Ed White <edmund.h.white@intel.com> wrote:
>> On 01/12/2015 10:00 AM, Ian Jackson wrote:
>>> Ed White writes ("Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m"):
>>>> The hypercalls are all there. My testing is all done in a Windows
>>>> domU with the tests running inside that domain, so I couldn't use
>>>> tools support even if I had it.
>>>
>>> To support this code in-tree, I think we will need Open Source code
>>> for exercising it, surely ?
>>>
>>
>> I'm hoping that, as Andrew says, there will be people interested
>> in using these capabilities, and that some of them will be prepared
>> to help fill in the gaps. That's why I wanted to send the series to
>> the list very early in the 4.6 development cycle.
>>
>> If that doesn't turn out to be the case, I'll see if I can find some
>> help internally, but I have neither the bandwidth nor the expertise
>> to do everything myself.
>>
>> Ed
> 
> Hi Ed,
> we are certainly very interested in this feature, so thanks for posting
> this series!
> 
> I also see a usecase for multiple copies of host p2m by enabling
> better performance for monitoring with the existing memaccess API.
> Currently the problem is that if a memaccess violation occurs on one
> vcpu, the memaccess settings need to be cleared, then re-applied
> after the operation has passed (usually done via singlestepping). With
> multiple vCPUs there is a potential race condition here, unless all
> other vCPUs are paused while the memaccess settings are cleared. With
> multiple copies of the host p2m, we could easily just swap in a table
> for the violating vCPU where the permissions are clear, without
> affecting any of the other vCPUs. This could be exercised by extending
> the xen-access test tool!
> 
> Is this something you think would be within scope for the envisioned
> use-case for this series?
> 
> Tamas
> 

Yes, this is definitely within scope. The last patch in the series
adds a request flag to indicate that a memory access violation occurred
in an alternate p2m and a field containing the index of the p2m, and
the response can use the same flag and field to cause a p2m switch
for the violating vcpu.
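
In xen-access terms the response path would then look roughly like this
(flag and field names are placeholders, not the actual names from the
last patch):

    /* Placeholder names throughout -- illustrative only. */
    static void respond_to_violation(mem_event_request_t *req,
                                     mem_event_response_t *rsp,
                                     uint16_t relaxed_view_idx)
    {
        if ( req->flags & MEM_EVENT_FLAG_ALT_P2M )
        {
            /* The violation came from alternate view req->altp2m_idx.
             * Rather than clearing and re-applying the memaccess
             * settings, switch the faulting vcpu to a relaxed view. */
            rsp->flags |= MEM_EVENT_FLAG_ALT_P2M;
            rsp->altp2m_idx = relaxed_view_idx;
        }
    }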

Ed


* Re: [PATCH 02/11] VMX: implement suppress #VE.
  2015-01-12 17:45     ` Ed White
@ 2015-01-13 18:36       ` Ed White
  0 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-01-13 18:36 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 01/12/2015 09:45 AM, Ed White wrote:
> On 01/12/2015 08:43 AM, Andrew Cooper wrote:
>> On 09/01/15 21:26, Ed White wrote:
>>> In preparation for selectively enabling hardware #VE in a later patch,
>>> set suppress #VE on all EPTE's on #VE-capable hardware.
>>>
>>> Suppress #VE should always be the default condition for two reasons:
>>> it is generally not safe to deliver #VE into a guest unless that guest
>>> has been modified to receive it; and even then for most EPT violations only
>>> the hypervisor is able to handle the violation.
>>>
>>> Signed-off-by: Ed White <edmund.h.white@intel.com>
>>> ---
>>>  xen/arch/x86/mm/p2m-ept.c         | 34 +++++++++++++++++++++++++++++++++-
>>>  xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
>>>  2 files changed, 34 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>>> index eb8b5f9..2b9f07c 100644
>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -41,7 +41,7 @@
>>>  #define is_epte_superpage(ept_entry)    ((ept_entry)->sp)
>>>  static inline bool_t is_epte_valid(ept_entry_t *e)
>>>  {
>>> -    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
>>> +    return (e->valid != 0 && e->sa_p2mt != p2m_invalid);
>>>  }
>>>  
>>>  /* returns : 0 for success, -errno otherwise */
>>> @@ -194,6 +194,19 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
>>>  
>>>      ept_entry->r = ept_entry->w = ept_entry->x = 1;
>>>  
>>> +    /* Disable #VE on all entries */ 
>>> +    if ( cpu_has_vmx_virt_exceptions )
>>> +    {
>>> +        ept_entry_t *table = __map_domain_page(pg);
>>> +
>>> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
>>
>> Style - please declare i in the upper scope, and it should be unsigned.
>>
>>> +            table[i].suppress_ve = 1;
>>> +
>>> +        unmap_domain_page(table);
>>> +
>>> +        ept_entry->suppress_ve = 1;
>>> +    }
>>> +
>>>      return 1;
>>>  }
>>>  
>>> @@ -243,6 +256,10 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>>>          epte->sp = (level > 1);
>>>          epte->mfn += i * trunk;
>>>          epte->snp = (iommu_enabled && iommu_snoop);
>>> +
>>> +        if ( cpu_has_vmx_virt_exceptions )
>>> +            epte->suppress_ve = 1;
>>> +
>>>          ASSERT(!epte->rsvd1);
>>>  
>>>          ept_p2m_type_to_flags(epte, epte->sa_p2mt, epte->access);
>>> @@ -753,6 +770,9 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>>>          ept_p2m_type_to_flags(&new_entry, p2mt, p2ma);
>>>      }
>>>  
>>> +    if ( cpu_has_vmx_virt_exceptions )
>>> +        new_entry.suppress_ve = 1;
>>> +
>>>      rc = atomic_write_ept_entry(ept_entry, new_entry, target);
>>>      if ( unlikely(rc) )
>>>          old_entry.epte = 0;
>>> @@ -1069,6 +1089,18 @@ int ept_p2m_init(struct p2m_domain *p2m)
>>>      /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
>>>      ept->ept_wl = 3;
>>>  
>>> +    /* Disable #VE on all entries */
>>> +    if ( cpu_has_vmx_virt_exceptions )
>>> +    {
>>> +        ept_entry_t *table =
>>> +            map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
>>> +
>>> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
>>> +            table[i].suppress_ve = 1;
>>
>> Is it safe setting SVE on an entry which is not known to be a superpage
>> or not present?  The manual states that the bit is ignored in this case,
>> but I am concerned that, as with SVE, this bit will suddenly gain
>> meaning in the future.
>>
> 
> It is safe to do this. Never say never, but I am aware of no plans to
> overload this bit, and I would know. Unless you feel strongly about it,
> I would prefer to leave this as-is, since changing it would make the code
> more complex.
> 

One point that I should have clarified yesterday: the SDM says the bit is
ignored for a non-terminal present entry; the bit is not ignored for
non-present entries, which is why I have to set all the SVE bits in a new
page -- my lazy EPTE copying algorithm wouldn't work otherwise because all
the zero entries would generate #VE.
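
(On Andrew's ->epte suggestion quoted further down, the result would be
something like this -- a sketch, masking only bit 63:)

    static inline bool_t is_epte_valid(ept_entry_t *e)
    {
        /* Ignore only the suppress #VE bit (bit 63). */
        return ((e->epte & ~(1ul << 63)) != 0 && e->sa_p2mt != p2m_invalid);
    }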

Ed

>>> +
>>> +        unmap_domain_page(table);
>>> +    }
>>> +
>>>      if ( !zalloc_cpumask_var(&ept->synced_mask) )
>>>          return -ENOMEM;
>>>  
>>> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
>>> index 8bae195..70fee74 100644
>>> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
>>> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
>>> @@ -49,6 +49,7 @@ typedef union {
>>>          suppress_ve :   1;  /* bit 63 - suppress #VE */
>>>      };
>>>      u64 epte;
>>> +    u64 valid       :   63; /* entire EPTE except suppress #VE bit */
>>
>> I am not sure 'valid' is a sensible name here.  As it is only used in
>> is_epte_valid(), might it be better to just use ->epte and a bitmask for
>> everything other than the #VE bit?
>>
> 
> This seemed more in the style of the code I was changing, but I can do it
> as you suggest.
> 
> Ed
> 
>>>  } ept_entry_t;
>>>  
>>>  typedef struct {
>>
>>


* Re: [PATCH 01/11] VMX: VMFUNC and #VE definitions and detection.
  2015-01-12 13:06   ` Andrew Cooper
@ 2015-01-13 18:50     ` Ed White
  2015-01-14 14:38       ` Andrew Cooper
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-13 18:50 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 01/12/2015 05:06 AM, Andrew Cooper wrote:
> On 09/01/15 21:26, Ed White wrote:
>> Currently, neither is enabled globally but may be enabled on a per-VCPU
>> basis by the altp2m code.
>>
>> Everything can be force-disabled globally by specifying vmfunc=0 on the
>> Xen command line.
>>
>> Remove the check for EPTE bit 63 == zero in ept_split_super_page(), as
>> that bit is now hardware-defined.
>>
>> Signed-off-by: Ed White <edmund.h.white@intel.com>
>> ---
>>  docs/misc/xen-command-line.markdown |  7 +++++++
>>  xen/arch/x86/hvm/vmx/vmcs.c         | 40 +++++++++++++++++++++++++++++++++++++
>>  xen/arch/x86/mm/p2m-ept.c           |  1 -
>>  xen/include/asm-x86/hvm/vmx/vmcs.h  | 16 +++++++++++++++
>>  xen/include/asm-x86/hvm/vmx/vmx.h   | 13 +++++++++++-
>>  xen/include/asm-x86/msr-index.h     |  1 +
>>  6 files changed, 76 insertions(+), 2 deletions(-)
>>
>> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
>> index 152ae03..00fbae7 100644
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -1305,6 +1305,13 @@ The optional `keep` parameter causes Xen to continue using the vga
>>  console even after dom0 has been started.  The default behaviour is to
>>  relinquish control to dom0.
>>  
>> +### vmfunc (Intel)
>> +> `= <boolean>`
>> +
>> +> Default: `true`
>> +
>> +Use VMFUNC and #VE support if available.
>> +
>>  ### vpid (Intel)
>>  > `= <boolean>`
>>  
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>> index 9d8033e..4274e92 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -50,6 +50,9 @@ boolean_param("unrestricted_guest", opt_unrestricted_guest_enabled);
>>  static bool_t __read_mostly opt_apicv_enabled = 1;
>>  boolean_param("apicv", opt_apicv_enabled);
>>  
>> +static bool_t __read_mostly opt_vmfunc_enabled = 1;
>> +boolean_param("vmfunc", opt_vmfunc_enabled);
> 
> Please can experimental features be off by default.  (I am specifically
> looking to avoid the issues we had with apicv getting into stable
> releases despite reliably causing problems for migration).
> 
> I suspect you will have many interested testers for this featureset, and
> it is fine to patch the default later when the feature gets declared stable.
> 
> I also wonder whether it might be better to have a "vmx=" command line
> parameter with "vmfunc" as a subopt, to save gaining an ever increasing
> set of related top level parameters?
> 
> Other than this, the content of the rest of the patch appears fine.
> 

I definitely can change the default to off, but I don't think it will
have the effect you're expecting.

This patch simply determines whether the hardware supports enabling
VMFUNC and #VE, but does not enable them. If a domain enters
alternate p2m mode through the relevant hypercall, at that point
VMFUNC will be enabled for vcpu's in that domain; and if a vcpu in
that domain subsequently registers itself to receive #VE through
another hypercall, #VE will be enabled for that vcpu. Since both
features are emulated if the hardware doesn't support them, changing
the default to off will simply force emulation.

In its current state, I suspect that this patch series will cause
problems for migration either way, as I noted in the cover letter.

As regards making vmfunc a subopt of vmx: I can look into that, but
then what happens if AMD implements vmfunc?
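
(For reference, Andrew's suggestion would presumably take a shape like
this -- a sketch, with the parsing details purely illustrative:)

    static void __init parse_vmx_param(char *s)
    {
        char *ss;

        do {
            ss = strchr(s, ',');
            if ( ss )
                *ss = '\0';

            if ( !strcmp(s, "vmfunc") )
                opt_vmfunc_enabled = 1;
            else if ( !strcmp(s, "no-vmfunc") )
                opt_vmfunc_enabled = 0;

            s = ss + 1;
        } while ( ss );
    }
    custom_param("vmx", parse_vmx_param);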

Ed


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-09 21:26 [PATCH 00/11] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (12 preceding siblings ...)
  2015-01-12 12:17 ` Ian Jackson
@ 2015-01-13 19:01 ` Andrew Cooper
  2015-01-13 20:02   ` Ed White
  2015-01-15 16:15 ` Tim Deegan
  14 siblings, 1 reply; 135+ messages in thread
From: Andrew Cooper @ 2015-01-13 19:01 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: keir, ian.campbell, tim, ian.jackson, jbeulich, Tamas K Lengyel

On 09/01/15 21:26, Ed White wrote:
> This set of patches adds support to hvm domains for EPTP switching by creating
> multiple copies of the host p2m (currently limited to 10 copies).
>
> The primary use of this capability is expected to be in scenarios where access
> to memory needs to be monitored and/or restricted below the level at which the
> guest OS page tables operate. Two examples that were discussed at the 2014 Xen
> developer summit are:
>
>     VM introspection: 
>         http://www.slideshare.net/xen_com_mgr/
>         zero-footprint-guest-memory-introspection-from-xen
>
>     Secure inter-VM communication:
>         http://www.slideshare.net/xen_com_mgr/nakajima-nvf
>
> Each p2m copy is populated lazily on EPT violations, and only contains entries for
> ram p2m types. Permissions for pages in alternate p2m's can be changed in a similar
> way to the existing memory access interface, and gfn->mfn mappings can be changed.
>
> All this is done through extra HVMOP types.
>
> The cross-domain HVMOP code has been compile-tested only. Also, the cross-domain
> code is hypervisor-only, the toolstack has not been modified.
>
> The intra-domain code has been tested. Violation notifications can only be received
> for pages that have been modified (access permissions and/or gfn->mfn mapping) 
> intra-domain, and only on VCPU's that have enabled notification.
>
> VMFUNC and #VE will both be emulated on hardware without native support.
>
> This code is not compatible with nested hvm functionality and will refuse to work
> with nested hvm active. It is also not compatible with migration. It should be
> considered experimental.

Having reviewed most of the series, I believe I now have a feeling for
what you are trying to achieve, but I would like to discuss some of the
design implications.

The following is my understanding of the situation.  Please correct me
if I have made a mistake.


Currently, a domain has a single host p2m.  This contains the guest
physical address mappings, and a combination of p2m types which are used
by existing components to allow certain actions to happen.  All vcpus
run with the same host p2m.

A domain may have a number of nested p2ms (currently an arbitrary limit
of 10).  These are used for nested-virt and are translated by the host
p2m.  Vcpus in guest mode run under a nested p2m.

This new altp2m infrastructure adds the ability to use a different set
of tables in the place of the host p2m.  This, in practice, allows for
different translations, different p2m types, different access permissions. 

One usecase of alternate p2ms is to provide introspection information to
out-of-guest entities (via the mem_event interface) or to in-guest
entities (via #VE).


Now for some observations and assumptions.

It occurs to me that the altp2m mechanism is generic.  From the look of
the series, it is mostly implemented in a generic way, which is great. 
The only Intel specific bits appear to be the ept handling itself,
'vmfunc' instruction support and #VE injection to in-guest entities. 

I can't think of any reasonable case where the alternate p2m would want
mappings different to the host p2m.  That is to say, an altp2m will map
the same set of mfns to make a guest physical address space, but may
differ in page permissions and possibly p2m types.

Given the above restriction, I believe a lot of the existing features
can continue to work and coexist.  For generating mem_events, the
permissions can be altered in the altp2m.  For injecting #VE, the altp2m
type can change to the new p2m_ram_rw, so long as the host p2m type is
compatible.  For both, a vmexit can occur.  Xen can do the appropriate
action and also inject a #VE on its way back into the guest.

One thing I have noticed while looking at the #VE stuff is that EPT also
supports A/D tracking, which might be quite a nice optimisation and
forgo the need for p2m_ram_logdirty, but I think this should be treated
as an orthogonal item.

When shared ept/iommu is not in use, altp2m can safely be used by vcpus,
as this will not interfere with the IOMMU permissions.

Furthermore, I can't conceptually think of an issue against the idea of
nestedp2m alternatives, following the same rule that the mapped mfns
match up.  That should allow all existing nestedvirt infrastructure
to continue to work.

Does the above look sensible, or have I overlooked something?

~Andrew


* Re: [PATCH 04/11] x86/MM: Improve p2m type checks.
  2015-01-12 17:48   ` Andrew Cooper
@ 2015-01-13 19:39     ` Ed White
  0 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-01-13 19:39 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 01/12/2015 09:48 AM, Andrew Cooper wrote:
> On 09/01/15 21:26, Ed White wrote:
>> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
>> index 5f7fe71..8193901 100644
>> --- a/xen/include/asm-x86/p2m.h
>> +++ b/xen/include/asm-x86/p2m.h
>> @@ -193,6 +193,9 @@ struct p2m_domain {
>>       * threaded on in LRU order. */
>>      struct list_head   np2m_list;
>>  
>> +    /* Does this p2m belong to the altp2m code? */
>> +    bool_t alternate;
>> +
>>      /* Host p2m: Log-dirty ranges registered for the domain. */
>>      struct rangeset   *logdirty_ranges;
>>  
>> @@ -290,7 +293,9 @@ struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base);
>>   */
>>  struct p2m_domain *p2m_get_p2m(struct vcpu *v);
>>  
>> -#define p2m_is_nestedp2m(p2m)   ((p2m) != p2m_get_hostp2m((p2m->domain)))
>> +#define p2m_is_hostp2m(p2m)   ((p2m) == p2m_get_hostp2m((p2m->domain)))
>> +#define p2m_is_altp2m(p2m)    ((p2m)->alternate)
>> +#define p2m_is_nestedp2m(p2m) (!p2m_is_altp2m(p2m) && !p2m_is_hostp2m(p2m))
> 
> Might this be better expressed as a p2m type, currently of the set
> {host, alt, nested} ?  p2m_is_nestedp2m() is starting to hide some
> moderately complicated calculations.
> 

Any suggestions for the name? Unfortunately, p2m_type is already
taken, and I can't think of a good alternative.
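
(Perhaps an explicit class enum -- the name below is purely a
suggestion:)

    typedef enum {
        p2m_host,
        p2m_nested,
        p2m_alternate,
    } p2m_class_t;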

Ed


* Re: [PATCH 05/11] x86/altp2m: basic data structures and support routines.
  2015-01-13 11:28   ` Andrew Cooper
@ 2015-01-13 19:49     ` Ed White
  2015-03-25 20:59       ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-13 19:49 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 01/13/2015 03:28 AM, Andrew Cooper wrote:
> On 09/01/15 21:26, Ed White wrote:
>> Add the basic data structures needed to support alternate p2m's and
>> the functions to initialise them and tear them down.
>>
>> Although Intel hardware can handle 512 EPTP's per hardware thread
>> concurrently, only 10 per domain are supported in this patch for
>> performance reasons.
>>
>> The iterator in hap_enable() does need to handle 512, so that is now
>> uint16_t.
>>
>> Signed-off-by: Ed White <edmund.h.white@intel.com>
>> ---
>>  xen/arch/x86/hvm/Makefile           |   3 +-
>>  xen/arch/x86/hvm/altp2mhvm.c        |  77 +++++++++++++++++++++++++++
>>  xen/arch/x86/hvm/hvm.c              |  21 ++++++++
>>  xen/arch/x86/mm/hap/Makefile        |   1 +
>>  xen/arch/x86/mm/hap/altp2m_hap.c    |  66 +++++++++++++++++++++++
>>  xen/arch/x86/mm/hap/hap.c           |  30 ++++++++++-
>>  xen/arch/x86/mm/mm-locks.h          |   4 ++
>>  xen/arch/x86/mm/p2m.c               | 102 ++++++++++++++++++++++++++++++++++++
>>  xen/include/asm-x86/domain.h        |   7 +++
>>  xen/include/asm-x86/hvm/altp2mhvm.h |  36 +++++++++++++
>>  xen/include/asm-x86/hvm/hvm.h       |  17 ++++++
>>  xen/include/asm-x86/hvm/vcpu.h      |   9 ++++
>>  xen/include/asm-x86/p2m.h           |  22 ++++++++
>>  13 files changed, 393 insertions(+), 2 deletions(-)
>>  create mode 100644 xen/arch/x86/hvm/altp2mhvm.c
>>  create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c
>>  create mode 100644 xen/include/asm-x86/hvm/altp2mhvm.h
>>
>> diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
>> index eea5555..5bf8b4f 100644
>> --- a/xen/arch/x86/hvm/Makefile
>> +++ b/xen/arch/x86/hvm/Makefile
>> @@ -22,4 +22,5 @@ obj-y += vlapic.o
>>  obj-y += vmsi.o
>>  obj-y += vpic.o
>>  obj-y += vpt.o
>> -obj-y += vpmu.o
>> \ No newline at end of file
>> +obj-y += vpmu.o
>> +obj-y += altp2mhvm.o
>> diff --git a/xen/arch/x86/hvm/altp2mhvm.c b/xen/arch/x86/hvm/altp2mhvm.c
>> new file mode 100644
>> index 0000000..fa0af0c
>> --- /dev/null
>> +++ b/xen/arch/x86/hvm/altp2mhvm.c
>> @@ -0,0 +1,77 @@
>> +/*
>> + * Alternate p2m HVM
>> + * Copyright (c) 2014, Intel Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
>> + * Place - Suite 330, Boston, MA 02111-1307 USA.
>> + */
>> +
>> +#include <asm/hvm/support.h>
>> +#include <asm/hvm/hvm.h>
>> +#include <asm/p2m.h>
>> +#include <asm/hvm/altp2mhvm.h>
>> +
>> +void
>> +altp2mhvm_vcpu_reset(struct vcpu *v)
>> +{
>> +    struct altp2mvcpu *av = &vcpu_altp2mhvm(v);
>> +
>> +    av->p2midx = 0;
>> +    av->veinfo = 0;
>> +
>> +    if ( hvm_funcs.ahvm_vcpu_reset )
>> +        hvm_funcs.ahvm_vcpu_reset(v);
>> +}
>> +
>> +int
>> +altp2mhvm_vcpu_initialise(struct vcpu *v)
>> +{
>> +    int rc = -EOPNOTSUPP;
>> +
>> +    if ( v != current )
>> +        vcpu_pause(v);
> 
> Under what circumstances would a vcpu initialisation happen on current? 
> All initialisation should happen during domain creation.
> 

A domain never starts in alternate p2m mode, much like it never starts in
nestedhvm guest mode. In the in-domain agent case, where the hypercall to
enter alternate p2m mode is issued by a vcpu in the target domain, this
check is necessary.

In my testing, the same domain often enters and leaves alternate p2m mode
multiple times, and these initialise and destroy functions are called
multiple times.

Ed

>> +
>> +    if ( !hvm_funcs.ahvm_vcpu_initialise ||
>> +         (hvm_funcs.ahvm_vcpu_initialise(v) == 0) )
>> +    {
>> +        rc = 0;
>> +        altp2mhvm_vcpu_reset(v);
>> +        cpumask_set_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
>> +
>> +        ahvm_vcpu_update_eptp(v);
>> +    }
>> +
>> +    if ( v != current )
>> +        vcpu_unpause(v);
>> +
>> +    return rc;
>> +}
>> +
>> +void
>> +altp2mhvm_vcpu_destroy(struct vcpu *v)
>> +{
>> +    if ( v != current )
>> +        vcpu_pause(v);
>> +
>> +    if ( hvm_funcs.ahvm_vcpu_destroy )
>> +        hvm_funcs.ahvm_vcpu_destroy(v);
>> +
>> +    cpumask_clear_cpu(v->processor, p2m_get_altp2m(v)->dirty_cpumask);
>> +    altp2mhvm_vcpu_reset(v);
>> +
>> +    ahvm_vcpu_update_eptp(v);
>> +    ahvm_vcpu_update_vmfunc_ve(v);
>> +
>> +    if ( v != current )
>> +        vcpu_unpause(v);
>> +}
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index b89e9d2..e8787cc 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -60,6 +60,7 @@
>>  #include <asm/hvm/cacheattr.h>
>>  #include <asm/hvm/trace.h>
>>  #include <asm/hvm/nestedhvm.h>
>> +#include <asm/hvm/altp2mhvm.h>
>>  #include <asm/mtrr.h>
>>  #include <asm/apic.h>
>>  #include <public/sched.h>
>> @@ -2290,6 +2291,7 @@ void hvm_vcpu_destroy(struct vcpu *v)
>>  
>>      hvm_all_ioreq_servers_remove_vcpu(d, v);
>>  
>> +    altp2mhvm_vcpu_destroy(v);
>>      nestedhvm_vcpu_destroy(v);
>>  
>>      free_compat_arg_xlat(v);
>> @@ -6377,6 +6379,25 @@ bool_t hvm_altp2m_supported()
>>      return hvm_funcs.altp2m_supported;
>>  }
>>  
>> +void ahvm_vcpu_update_eptp(struct vcpu *v)
>> +{
>> +    if (hvm_funcs.ahvm_vcpu_update_eptp)
>> +        hvm_funcs.ahvm_vcpu_update_eptp(v);
>> +}
>> +
>> +void ahvm_vcpu_update_vmfunc_ve(struct vcpu *v)
>> +{
>> +    if (hvm_funcs.ahvm_vcpu_update_vmfunc_ve)
>> +        hvm_funcs.ahvm_vcpu_update_vmfunc_ve(v);
>> +}
>> +
>> +bool_t ahvm_vcpu_emulate_ve(struct vcpu *v)
>> +{
>> +    if (hvm_funcs.ahvm_vcpu_emulate_ve)
>> +        return hvm_funcs.ahvm_vcpu_emulate_ve(v);
>> +    return 0;
>> +}
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/xen/arch/x86/mm/hap/Makefile b/xen/arch/x86/mm/hap/Makefile
>> index 68f2bb5..216cd90 100644
>> --- a/xen/arch/x86/mm/hap/Makefile
>> +++ b/xen/arch/x86/mm/hap/Makefile
>> @@ -4,6 +4,7 @@ obj-y += guest_walk_3level.o
>>  obj-$(x86_64) += guest_walk_4level.o
>>  obj-y += nested_hap.o
>>  obj-y += nested_ept.o
>> +obj-y += altp2m_hap.o
>>  
>>  guest_walk_%level.o: guest_walk.c Makefile
>>  	$(CC) $(CFLAGS) -DGUEST_PAGING_LEVELS=$* -c $< -o $@
>> diff --git a/xen/arch/x86/mm/hap/altp2m_hap.c b/xen/arch/x86/mm/hap/altp2m_hap.c
>> new file mode 100644
>> index 0000000..c2cdc42
>> --- /dev/null
>> +++ b/xen/arch/x86/mm/hap/altp2m_hap.c
>> @@ -0,0 +1,66 @@
>> +/******************************************************************************
>> + * arch/x86/mm/hap/altp2m_hap.c
>> + *
>> + * Copyright (c) 2014 Intel Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
>> + */
>> +
>> +#include <xen/mem_event.h>
>> +#include <xen/event.h>
>> +#include <public/mem_event.h>
>> +#include <asm/domain.h>
>> +#include <asm/page.h>
>> +#include <asm/paging.h>
>> +#include <asm/p2m.h>
>> +#include <asm/mem_sharing.h>
>> +#include <asm/hap.h>
>> +#include <asm/hvm/support.h>
>> +
>> +#include "private.h"
>> +
>> +/* Override macros from asm/page.h to make them work with mfn_t */
>> +#undef mfn_valid
>> +#define mfn_valid(_mfn) __mfn_valid(mfn_x(_mfn))
>> +#undef page_to_mfn
>> +#define page_to_mfn(_pg) _mfn(__page_to_mfn(_pg))
>> +
>> +void
>> +altp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
>> +    l1_pgentry_t *p, l1_pgentry_t new, unsigned int level)
>> +{
>> +    struct domain *d = p2m->domain;
>> +    uint32_t old_flags;
>> +
>> +    paging_lock(d);
>> +
>> +    old_flags = l1e_get_flags(*p);
>> +    safe_write_pte(p, new);
>> +
>> +    if (old_flags & _PAGE_PRESENT)
>> +        flush_tlb_mask(p2m->dirty_cpumask);
>> +
>> +    paging_unlock(d);
>> +}
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * tab-width: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
>> index abf3d7a..8fe0650 100644
>> --- a/xen/arch/x86/mm/hap/hap.c
>> +++ b/xen/arch/x86/mm/hap/hap.c
>> @@ -439,7 +439,7 @@ void hap_domain_init(struct domain *d)
>>  int hap_enable(struct domain *d, u32 mode)
>>  {
>>      unsigned int old_pages;
>> -    uint8_t i;
>> +    uint16_t i;
>>      int rv = 0;
>>  
>>      domain_pause(d);
>> @@ -485,6 +485,23 @@ int hap_enable(struct domain *d, u32 mode)
>>             goto out;
>>      }
>>  
>> +    /* Init alternate p2m data */
>> +    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )
> 
> This memory should be allocated from some domain-accounted pool,
> probably the paging pool (d->arch.paging.alloc_page()).  You can use
> map_domain_page_global() to get a safe pointer to anchor in
> d->arch.altp2m_eptp for hardware.
> 
>> +    {
>> +        rv = -ENOMEM;
>> +        goto out;
>> +    }
>> +    for (i = 0; i < 512; i++)
>> +        d->arch.altp2m_eptp[i] = ~0ul;
>> +
>> +    for (i = 0; i < MAX_ALTP2M; i++) {
>> +        rv = p2m_alloc_table(d->arch.altp2m_p2m[i]);
>> +        if ( rv != 0 )
>> +           goto out;
>> +    }
>> +
>> +    d->arch.altp2m_active = 0;
>> +
>>      /* Now let other users see the new mode */
>>      d->arch.paging.mode = mode | PG_HAP_enable;
>>  
>> @@ -497,6 +514,17 @@ void hap_final_teardown(struct domain *d)
>>  {
>>      uint8_t i;
>>  
>> +    d->arch.altp2m_active = 0;
>> +
>> +    if ( d->arch.altp2m_eptp ) {
>> +        free_xenheap_page(d->arch.altp2m_eptp);
>> +        d->arch.altp2m_eptp = NULL;
>> +    }
>> +
>> +    for (i = 0; i < MAX_ALTP2M; i++) {
>> +        p2m_teardown(d->arch.altp2m_p2m[i]);
>> +    }
>> +
>>      /* Destroy nestedp2m's first */
>>      for (i = 0; i < MAX_NESTEDP2M; i++) {
>>          p2m_teardown(d->arch.nested_p2m[i]);
>> diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
>> index 769f7bc..a0faca3 100644
>> --- a/xen/arch/x86/mm/mm-locks.h
>> +++ b/xen/arch/x86/mm/mm-locks.h
>> @@ -209,6 +209,10 @@ declare_mm_lock(nestedp2m)
>>  #define nestedp2m_lock(d)   mm_lock(nestedp2m, &(d)->arch.nested_p2m_lock)
>>  #define nestedp2m_unlock(d) mm_unlock(&(d)->arch.nested_p2m_lock)
>>  
>> +declare_mm_lock(altp2m)
>> +#define altp2m_lock(d)   mm_lock(altp2m, &(d)->arch.altp2m_lock)
>> +#define altp2m_unlock(d) mm_unlock(&(d)->arch.altp2m_lock)
>> +
>>  /* P2M lock (per-p2m-table)
>>   *
>>   * This protects all queries and updates to the p2m table.
>> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
>> index 49b66fb..3c6049b 100644
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -35,6 +35,7 @@
>>  #include <asm/hvm/vmx/vmx.h> /* ept_p2m_init() */
>>  #include <asm/mem_sharing.h>
>>  #include <asm/hvm/nestedhvm.h>
>> +#include <asm/hvm/altp2mhvm.h>
>>  #include <asm/hvm/svm/amd-iommu-proto.h>
>>  #include <xsm/xsm.h>
>>  
>> @@ -182,6 +183,44 @@ static void p2m_teardown_nestedp2m(struct domain *d)
>>      }
>>  }
>>  
>> +static void p2m_teardown_altp2m(struct domain *d);
>> +
>> +static int p2m_init_altp2m(struct domain *d)
>> +{
>> +    uint8_t i;
>> +    struct p2m_domain *p2m;
>> +
>> +    mm_lock_init(&d->arch.altp2m_lock);
>> +    for (i = 0; i < MAX_ALTP2M; i++)
>> +    {
>> +        d->arch.altp2m_p2m[i] = p2m = p2m_init_one(d);
>> +        if ( p2m == NULL )
>> +        {
>> +            p2m_teardown_altp2m(d);
>> +            return -ENOMEM;
>> +        }
>> +        p2m->write_p2m_entry = altp2m_write_p2m_entry;
>> +        p2m->alternate = 1;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static void p2m_teardown_altp2m(struct domain *d)
>> +{
>> +    uint8_t i;
>> +    struct p2m_domain *p2m;
>> +
>> +    for (i = 0; i < MAX_ALTP2M; i++)
>> +    {
>> +        if ( !d->arch.altp2m_p2m[i] )
>> +            continue;
>> +        p2m = d->arch.altp2m_p2m[i];
>> +        p2m_free_one(p2m);
>> +        d->arch.altp2m_p2m[i] = NULL;
>> +    }
>> +}
>> +
>>  int p2m_init(struct domain *d)
>>  {
>>      int rc;
>> @@ -195,7 +234,14 @@ int p2m_init(struct domain *d)
>>       * (p2m_init runs too early for HVM_PARAM_* options) */
>>      rc = p2m_init_nestedp2m(d);
>>      if ( rc )
>> +    {
>>          p2m_teardown_hostp2m(d);
>> +        return rc;
>> +    }
>> +
>> +    rc = p2m_init_altp2m(d);
>> +    if ( rc )
>> +        p2m_teardown_altp2m(d);
>>  
>>      return rc;
>>  }
>> @@ -1891,6 +1937,62 @@ int unmap_mmio_regions(struct domain *d,
>>      return err;
>>  }
>>  
>> +bool_t p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp, unsigned long *idx)
>> +{
>> +    struct p2m_domain *p2m;
>> +    struct ept_data *ept;
>> +    bool_t rc = 0;
>> +    uint16_t i;
>> +
>> +    altp2m_lock(d);
>> +
>> +    for ( i = 0; i < MAX_ALTP2M; i++ )
>> +    {
>> +        if ( d->arch.altp2m_eptp[i] == ~0ul )
>> +            continue;
>> +
>> +        p2m = d->arch.altp2m_p2m[i];
>> +        ept = &p2m->ept;
>> +
>> +        if ( eptp != ept_get_eptp(ept) )
>> +            continue;
>> +
>> +        *idx = i;
>> +        rc = 1;
>> +
>> +        break;
>> +    }
>> +
>> +    altp2m_unlock(d);
>> +    return rc;
>> +}
>> +
>> +bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx)
>> +{
>> +    struct domain *d = v->domain;
>> +    bool_t rc = 0;
>> +
>> +    if ( idx > MAX_ALTP2M )
>> +        return rc;
>> +
>> +    altp2m_lock(d);
>> +
>> +    if ( d->arch.altp2m_eptp[idx] != ~0ul )
>> +    {
>> +        if ( idx != vcpu_altp2mhvm(v).p2midx )
>> +        {
>> +            cpumask_clear_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
>> +            vcpu_altp2mhvm(v).p2midx = idx;
>> +            cpumask_set_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
>> +            ahvm_vcpu_update_eptp(v);
>> +        }
>> +        rc = 1;
>> +    }
>> +
>> +    altp2m_unlock(d);
>> +    return rc;
>> +}
>> +
>>  /*** Audit ***/
>>  
>>  #if P2M_AUDIT
>> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
>> index 6a77a93..a0e9e90 100644
>> --- a/xen/include/asm-x86/domain.h
>> +++ b/xen/include/asm-x86/domain.h
>> @@ -230,6 +230,7 @@ struct paging_vcpu {
>>  typedef xen_domctl_cpuid_t cpuid_input_t;
>>  
>>  #define MAX_NESTEDP2M 10
>> +#define MAX_ALTP2M    10
>>  struct p2m_domain;
>>  struct time_scale {
>>      int shift;
>> @@ -274,6 +275,12 @@ struct arch_domain
>>      struct p2m_domain *nested_p2m[MAX_NESTEDP2M];
>>      mm_lock_t nested_p2m_lock;
>>  
>> +    /* altp2m: allow multiple copies of host p2m */
>> +    bool_t altp2m_active;
>> +    struct p2m_domain *altp2m_p2m[MAX_ALTP2M];
>> +    mm_lock_t altp2m_lock;
>> +    uint64_t *altp2m_eptp;
>> +
>>      /* NB. protected by d->event_lock and by irq_desc[irq].lock */
>>      struct radix_tree_root irq_pirq;
>>  
>> diff --git a/xen/include/asm-x86/hvm/altp2mhvm.h b/xen/include/asm-x86/hvm/altp2mhvm.h
>> new file mode 100644
>> index 0000000..919986e
>> --- /dev/null
>> +++ b/xen/include/asm-x86/hvm/altp2mhvm.h
>> @@ -0,0 +1,36 @@
>> +/*
>> + * Alternate p2m HVM
>> + * Copyright (c) 2014, Intel Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
>> + * Place - Suite 330, Boston, MA 02111-1307 USA.
>> + */
>> +
>> +#ifndef _HVM_ALTP2M_H
>> +#define _HVM_ALTP2M_H
>> +
>> +#include <xen/types.h>         /* for uintNN_t */
>> +#include <xen/sched.h>         /* for struct vcpu, struct domain */
>> +#include <asm/hvm/vcpu.h>      /* for vcpu_altp2mhvm */
>> +
>> +/* Alternate p2m HVM on/off per domain */
>> +#define altp2mhvm_active(d) \
>> +    d->arch.altp2m_active
> 
> static inline bool_t altp2mhvm_active(const struct domain *d)
> {
>     return d->arch.altp2m_active;
> }
> 
> Type systems are nice.  We should use them where possible.
> 
>> +
>> +/* Alternate p2m VCPU */
>> +int altp2mhvm_vcpu_initialise(struct vcpu *v);
>> +void altp2mhvm_vcpu_destroy(struct vcpu *v);
>> +void altp2mhvm_vcpu_reset(struct vcpu *v);
>> +
>> +#endif /* _HVM_ALTP2M_H */
>> +
>> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
>> index 7115a68..32d1d02 100644
>> --- a/xen/include/asm-x86/hvm/hvm.h
>> +++ b/xen/include/asm-x86/hvm/hvm.h
>> @@ -210,6 +210,14 @@ struct hvm_function_table {
>>                                    uint32_t *ecx, uint32_t *edx);
>>  
>>      void (*enable_msr_exit_interception)(struct domain *d);
>> +
>> +    /* Alternate p2m */
>> +    int (*ahvm_vcpu_initialise)(struct vcpu *v);
>> +    void (*ahvm_vcpu_destroy)(struct vcpu *v);
>> +    int (*ahvm_vcpu_reset)(struct vcpu *v);
>> +    void (*ahvm_vcpu_update_eptp)(struct vcpu *v);
>> +    void (*ahvm_vcpu_update_vmfunc_ve)(struct vcpu *v);
>> +    bool_t (*ahvm_vcpu_emulate_ve)(struct vcpu *v);
>>  };
>>  
>>  extern struct hvm_function_table hvm_funcs;
>> @@ -531,6 +539,15 @@ extern bool_t opt_hvm_fep;
>>  #define opt_hvm_fep 0
>>  #endif
>>  
>> +/* updates the current EPTP in VMCS */
>> +void ahvm_vcpu_update_eptp(struct vcpu *v);
>> +
>> +/* updates VMCS fields related to VMFUNC and #VE */
>> +void ahvm_vcpu_update_vmfunc_ve(struct vcpu *v);
>> +
>> +/* emulates #VE */
>> +bool_t ahvm_vcpu_emulate_ve(struct vcpu *v);
> 
> These should be added in the patch which introduces them.
> 
>> +
>>  #endif /* __ASM_X86_HVM_HVM_H__ */
>>  
>>  /*
>> diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
>> index 01e0665..9302d40 100644
>> --- a/xen/include/asm-x86/hvm/vcpu.h
>> +++ b/xen/include/asm-x86/hvm/vcpu.h
>> @@ -118,6 +118,13 @@ struct nestedvcpu {
>>  
>>  #define vcpu_nestedhvm(v) ((v)->arch.hvm_vcpu.nvcpu)
>>  
>> +struct altp2mvcpu {
>> +    uint16_t    p2midx ;        /* alternate p2m index */
> 
> Stray space.
> 
>> +    uint64_t    veinfo;         /* #VE information page guest pfn */
>> +};
>> +
>> +#define vcpu_altp2mhvm(v) ((v)->arch.hvm_vcpu.avcpu)
> 
> static inline as well please.
> 
> ~Andrew
> 
>> +
>>  struct hvm_vcpu {
>>      /* Guest control-register and EFER values, just as the guest sees them. */
>>      unsigned long       guest_cr[5];
>> @@ -163,6 +170,8 @@ struct hvm_vcpu {
>>  
>>      struct nestedvcpu   nvcpu;
>>  
>> +    struct altp2mvcpu   avcpu;
>> +
>>      struct mtrr_state   mtrr;
>>      u64                 pat_cr;
>>  
>> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
>> index 8193901..9fb5ba0 100644
>> --- a/xen/include/asm-x86/p2m.h
>> +++ b/xen/include/asm-x86/p2m.h
>> @@ -688,6 +688,28 @@ void nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
>>      l1_pgentry_t *p, l1_pgentry_t new, unsigned int level);
>>  
>>  /*
>> + * Alternate p2m: shadow p2m tables used for alternate memory views
>> + */
>> +
>> +void altp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
>> +    l1_pgentry_t *p, l1_pgentry_t new, unsigned int level);
>> +
>> +/* get current alternate p2m table */
>> +static inline struct p2m_domain *p2m_get_altp2m(struct vcpu *v)
>> +{
>> +    struct domain *d = v->domain;
>> +    uint16_t index = vcpu_altp2mhvm(v).p2midx;
>> +
>> +    return d->arch.altp2m_p2m[index];
>> +}
>> +
>> +/* Locate an alternate p2m by its EPTP */
>> +bool_t p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp, unsigned long *idx);
>> +
>> +/* Switch alternate p2m for a single vcpu */
>> +bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx);
>> +
>> +/*
>>   * p2m type to IOMMU flags
>>   */
>>  static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt)
> 
> 


* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-13 19:01 ` Andrew Cooper
@ 2015-01-13 20:02   ` Ed White
  2015-01-13 20:45     ` Andrew Cooper
  2015-01-14  7:01     ` Jan Beulich
  0 siblings, 2 replies; 135+ messages in thread
From: Ed White @ 2015-01-13 20:02 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: keir, ian.campbell, tim, ian.jackson, jbeulich, Tamas K Lengyel

On 01/13/2015 11:01 AM, Andrew Cooper wrote:
> On 09/01/15 21:26, Ed White wrote:
>> This set of patches adds support to hvm domains for EPTP switching by creating
>> multiple copies of the host p2m (currently limited to 10 copies).
>>
>> The primary use of this capability is expected to be in scenarios where access
>> to memory needs to be monitored and/or restricted below the level at which the
>> guest OS page tables operate. Two examples that were discussed at the 2014 Xen
>> developer summit are:
>>
>>     VM introspection: 
>>         http://www.slideshare.net/xen_com_mgr/
>>         zero-footprint-guest-memory-introspection-from-xen
>>
>>     Secure inter-VM communication:
>>         http://www.slideshare.net/xen_com_mgr/nakajima-nvf
>>
>> Each p2m copy is populated lazily on EPT violations, and only contains entries for
>> ram p2m types. Permissions for pages in alternate p2m's can be changed in a similar
>> way to the existing memory access interface, and gfn->mfn mappings can be changed.
>>
>> All this is done through extra HVMOP types.
>>
>> The cross-domain HVMOP code has been compile-tested only. Also, the cross-domain
>> code is hypervisor-only, the toolstack has not been modified.
>>
>> The intra-domain code has been tested. Violation notifications can only be received
>> for pages that have been modified (access permissions and/or gfn->mfn mapping) 
>> intra-domain, and only on VCPU's that have enabled notification.
>>
>> VMFUNC and #VE will both be emulated on hardware without native support.
>>
>> This code is not compatible with nested hvm functionality and will refuse to work
>> with nested hvm active. It is also not compatible with migration. It should be
>> considered experimental.
> 
> Having reviewed most of the series, I believe I now have a feeling for
> what you are trying to achieve, but I would like to discuss some of the
> design implications.
> 
> The following is my understanding of the situation.  Please correct me
> if I have made a mistake.
> 
> 

Thanks for investing the time to do this. Maybe the first couple of days
would have gone more smoothly if something like this had been in the cover letter.

With the exception of a couple of minor points, you are spot on.

> Currently, a domain has a single host p2m.  This contains the guest
> physical address mappings, and a combination of p2m types which are used
> by existing components to allow certain actions to happen.  All vcpus
> run with the same host p2m.
> 
> A domain may have a number of nested p2ms (currently an arbitrary limit
> of 10).  These are used for nested-virt and are translated by the host
> p2m.  Vcpus in guest mode run under a nested p2m.
> 
> This new altp2m infrastructure adds the ability to use a different set
> of tables in the place of the host p2m.  This, in practice, allows for
> different translations, different p2m types, different access permissions. 
> 
> One usecase of alternate p2ms is to provide introspection information to
> out-of-guest entities (via the mem_event interface) or to in-guest
> entities (via #VE).
> 
> 
> Now for some observations and assumptions.
> 
> It occurs to me that the altp2m mechanism is generic.  From the look of
> the series, it is mostly implemented in a generic way, which is great. 
> The only Intel specific bits appear to be the ept handling itself,
> 'vmfunc' instruction support and #VE injection to in-guest entities. 
> 

That was my intention. I don't know enough about the state of AMD
virtualization to know if it can support these patches by emulating
vmfunc and #VE, but that was my target.

> I can't think of any reasonable case where the alternate p2m would want
> mappings different to the host p2m.  That is to say, an altp2m will map
> the same set of mfns to make a guest physical address space, but may
> differ in page permissions and possibly p2m types.
> 

The set of mfn's is the same, but I do allow gfn->mfn mappings to be
modified under certain circumstances. One use of this is to point the
same VA to different physical pages (with different access permissions)
in different p2m's to hide memory changes.
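
To make that concrete: two views can translate the same gfn to different
mfns, so an instruction fetch (running under one view) and a data read
(trapped and serviced under another) observe different bytes. A minimal
sketch, using made-up helper names rather than the series' actual HVMOP
interface:

    /* Illustrative sketch only -- helper names are hypothetical. */
    unsigned long gfn = watched_code_gfn;  /* page being monitored */
    mfn_t mfn_patched = page_with_hooks;   /* frame holding the hooks */
    mfn_t mfn_clean   = pristine_copy;     /* unmodified copy */

    /* View 0: execute-only mapping of the patched frame. */
    altp2m_set_mapping(d, view0, gfn, mfn_patched, ACCESS_X);

    /* View 1: readable mapping of the clean copy. */
    altp2m_set_mapping(d, view1, gfn, mfn_clean, ACCESS_R);

A data read under view 0 then faults, and the handler switches the vcpu
to view 1 for the read, so reads see the clean copy while fetches keep
executing the patched frame.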

> Given the above restriction, I believe a lot of the existing features
> can continue to work and coexist.  For generating mem_events, the
> permissions can be altered in the altp2m.  For injecting #VE, the altp2m
> type can change to the new p2m_ram_rw, so long as the host p2m type is
> compatible.  For both, a vmexit can occur.  Xen can do the appropriate
> action and also inject a #VE on its way back into the guest.
> 
> One thing I have noticed while looking at the #VE stuff that EPT also
> supports A/D tracking, which might be quite a nice optimisation and
> forgo the need for p2m_ram_logdirty, but I think this should be treated
> as an orthogonal item.
> 

This is far from my area of expertise, but I believe there is code in Xen
to use EPT D bits in migration.

Ed

> When shared ept/iommu is not in use, altp2m can safely be used by vcpus,
> as this will not interfere with the IOMMU permissions.
> 
> Furthermore, I can't conceptually think of an issue against the idea of
> nestedp2m alternatives, following the same rule that the mapped mfns
> match up.  That should allow all existing nestedvirt infrastructure
> continue to work.
> 
> Does the above look sensible, or have I overlooked something?
> 
> ~Andrew
> 

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-13 20:02   ` Ed White
@ 2015-01-13 20:45     ` Andrew Cooper
  2015-01-13 21:30       ` Ed White
  2015-03-05 13:45       ` Egger, Christoph
  2015-01-14  7:01     ` Jan Beulich
  1 sibling, 2 replies; 135+ messages in thread
From: Andrew Cooper @ 2015-01-13 20:45 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: keir, ian.campbell, tim, ian.jackson, jbeulich, Tamas K Lengyel

On 13/01/15 20:02, Ed White wrote:
> On 01/13/2015 11:01 AM, Andrew Cooper wrote:
>> On 09/01/15 21:26, Ed White wrote:
>>> This set of patches adds support to hvm domains for EPTP switching by creating
>>> multiple copies of the host p2m (currently limited to 10 copies).
>>>
>>> The primary use of this capability is expected to be in scenarios where access
>>> to memory needs to be monitored and/or restricted below the level at which the
>>> guest OS page tables operate. Two examples that were discussed at the 2014 Xen
>>> developer summit are:
>>>
>>>     VM introspection: 
>>>         http://www.slideshare.net/xen_com_mgr/
>>>         zero-footprint-guest-memory-introspection-from-xen
>>>
>>>     Secure inter-VM communication:
>>>         http://www.slideshare.net/xen_com_mgr/nakajima-nvf
>>>
>>> Each p2m copy is populated lazily on EPT violations, and only contains entries for
>>> ram p2m types. Permissions for pages in alternate p2m's can be changed in a similar
>>> way to the existing memory access interface, and gfn->mfn mappings can be changed.
>>>
>>> All this is done through extra HVMOP types.
>>>
>>> The cross-domain HVMOP code has been compile-tested only. Also, the cross-domain
>>> code is hypervisor-only, the toolstack has not been modified.
>>>
>>> The intra-domain code has been tested. Violation notifications can only be received
>>> for pages that have been modified (access permissions and/or gfn->mfn mapping) 
>>> intra-domain, and only on VCPU's that have enabled notification.
>>>
>>> VMFUNC and #VE will both be emulated on hardware without native support.
>>>
>>> This code is not compatible with nested hvm functionality and will refuse to work
>>> with nested hvm active. It is also not compatible with migration. It should be
>>> considered experimental.
>> Having reviewed most of the series, I believe I now have a feeling for
>> what you are trying to achieve, but I would like to discuss some of the
>> design implications.
>>
>> The following is my understanding of the situation.  Please correct me
>> if I have made a mistake.
>>
>>
> Thanks for investing the time to do this. Maybe the first couple of days
> would have gone more smoothly if something like this had been in the cover letter.

No problem.  (I tend to find that things like this save time in the long
run)

>
> With the exception of a couple of minor points, you are spot on.

Cool!

>
>> Currently, a domain has a single host p2m.  This contains the guest
>> physical address mappings, and a combination of p2m types which are used
>> by existing components to allow certain actions to happen.  All vcpus
>> run with the same host p2m.
>>
>> A domain may have a number of nested p2ms (currently an arbitrary limit
>> of 10).  These are used for nested-virt and are translated by the host
>> p2m.  Vcpus in guest mode run under a nested p2m.
>>
>> This new altp2m infrastructure adds the ability to use a different set
>> of tables in the place of the host p2m.  This, in practice, allows for
>> different translations, different p2m types, different access permissions. 
>>
>> One usecase of alternate p2ms is to provide introspection information to
>> out-of-guest entities (via the mem_event interface) or to in-guest
>> entities (via #VE).
>>
>>
>> Now for some observations and assumptions.
>>
>> It occurs to me that the altp2m mechanism is generic.  From the look of
>> the series, it is mostly implemented in a generic way, which is great. 
>> The only Intel specific bits appear to be the ept handling itself,
>> 'vmfunc' instruction support and #VE injection to in-guest entities. 
>>
> That was my intention. I don't know enough about the state of AMD
> virtualization to know if it can support these patches by emulating
> vmfunc and #VE, but that was my target.

As far as I am aware, AMD SVM has no similar concept to vmfunc, nor
#VE.  However, the same kinds of introspection are certainly possible by
playing with the read/write bits on the NPT tables and causing a vmexit.

>
>> I can't think of any reasonable case where the alternate p2m would want
>> mappings different to the host p2m.  That is to say, an altp2m will map
>> the same set of mfns to make a guest physical address space, but may
>> differ in page permissions and possibly p2m types.
>>
> The set of mfn's is the same, but I do allow gfn->mfn mappings to be
> modified under certain circumstances. One use of this is to point the
> same VA to different physical pages (with different access permissions)
> in different p2m's to hide memory changes.

What is the practical use of being able to play paging tricks like this
behind a VM's back?

>
>> Given the above restriction, I believe a lot of the existing features
>> can continue to work and coexist.  For generating mem_events, the
>> permissions can be altered in the altp2m.  For injecting #VE, the altp2m
>> type can change to the new p2m_ram_rw, so long as the host p2m type is
>> compatible.  For both, a vmexit can occur.  Xen can do the appropriate
>> action and also inject a #VE on its way back into the guest.
>>
>> One thing I have noticed while looking at the #VE stuff that EPT also
>> supports A/D tracking, which might be quite a nice optimisation and
>> forgo the need for p2m_ram_logdirty, but I think this should be treated
>> as an orthogonal item.
>>
> This is far from my area of expertise, but I believe there is code in Xen
> to use EPT D bits in migration.

Not that I can spot, although I seem to remember some talk about it. All
logdirty code still appears to rely on the logdirty bitmap being
filled, which is done from vmexits for p2m_ram_logdirty regions.
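
For reference, the shape of that existing path, much simplified: a write
to a gfn of type p2m_ram_logdirty takes an EPT violation, and the fault
handler records the write in the bitmap before restoring write access,
roughly:

    /* Simplified sketch of the existing logdirty fault path. */
    paging_mark_dirty(d, mfn_x(mfn));             /* set the bitmap bit */
    p2m_change_type_one(d, gfn, p2m_ram_logdirty, p2m_ram_rw);

so the dirty information lives in the bitmap, not in the EPT D bits.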

~Andrew

>
> Ed
>
>> When shared ept/iommu is not in use, altp2m can safely be used by vcpus,
>> as this will not interfere with the IOMMU permissions.
>>
>> Furthermore, I can't conceptually think of an issue against the idea of
>> nestedp2m alternatives, following the same rule that the mapped mfns
>> match up.  That should allow all existing nestedvirt infrastructure
>> continue to work.
>>
>> Does the above look sensible, or have I overlooked something?
>>
>> ~Andrew
>>

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-13 20:45     ` Andrew Cooper
@ 2015-01-13 21:30       ` Ed White
  2015-01-14  7:04         ` Jan Beulich
  2015-03-05 13:45       ` Egger, Christoph
  1 sibling, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-13 21:30 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: keir, ian.campbell, tim, ian.jackson, jbeulich, Tamas K Lengyel

On 01/13/2015 12:45 PM, Andrew Cooper wrote:
> On 13/01/15 20:02, Ed White wrote:
>> On 01/13/2015 11:01 AM, Andrew Cooper wrote:
>>> On 09/01/15 21:26, Ed White wrote:
>>>> This set of patches adds support to hvm domains for EPTP switching by creating
>>>> multiple copies of the host p2m (currently limited to 10 copies).
>>>>
>>>> The primary use of this capability is expected to be in scenarios where access
>>>> to memory needs to be monitored and/or restricted below the level at which the
>>>> guest OS page tables operate. Two examples that were discussed at the 2014 Xen
>>>> developer summit are:
>>>>
>>>>     VM introspection: 
>>>>         http://www.slideshare.net/xen_com_mgr/
>>>>         zero-footprint-guest-memory-introspection-from-xen
>>>>
>>>>     Secure inter-VM communication:
>>>>         http://www.slideshare.net/xen_com_mgr/nakajima-nvf
>>>>
>>>> Each p2m copy is populated lazily on EPT violations, and only contains entries for
>>>> ram p2m types. Permissions for pages in alternate p2m's can be changed in a similar
>>>> way to the existing memory access interface, and gfn->mfn mappings can be changed.
>>>>
>>>> All this is done through extra HVMOP types.
>>>>
>>>> The cross-domain HVMOP code has been compile-tested only. Also, the cross-domain
>>>> code is hypervisor-only, the toolstack has not been modified.
>>>>
>>>> The intra-domain code has been tested. Violation notifications can only be received
>>>> for pages that have been modified (access permissions and/or gfn->mfn mapping) 
>>>> intra-domain, and only on VCPU's that have enabled notification.
>>>>
>>>> VMFUNC and #VE will both be emulated on hardware without native support.
>>>>
>>>> This code is not compatible with nested hvm functionality and will refuse to work
>>>> with nested hvm active. It is also not compatible with migration. It should be
>>>> considered experimental.
>>> Having reviewed most of the series, I believe I now have a feeling for
>>> what you are trying to achieve, but I would like to discuss some of the
>>> design implications.
>>>
>>> The following is my understanding of the situation.  Please correct me
>>> if I have made a mistake.
>>>
>>>
>> Thanks for investing the time to do this. Maybe the first couple of days
>> would have gone more smoothly if something like this had been in the cover letter.
> 
> No problem.  (I tend to find that things like this save time in the long
> run)
> 
>>
>> With the exception of a couple of minor points, you are spot on.
> 
> Cool!
> 
>>
>>> Currently, a domain has a single host p2m.  This contains the guest
>>> physical address mappings, and a combination of p2m types which are used
>>> by existing components to allow certain actions to happen.  All vcpus
>>> run with the same host p2m.
>>>
>>> A domain may have a number of nested p2ms (currently an arbitrary limit
>>> of 10).  These are used for nested-virt and are translated by the host
>>> p2m.  Vcpus in guest mode run under a nested p2m.
>>>
>>> This new altp2m infrastructure adds the ability to use a different set
>>> of tables in the place of the host p2m.  This, in practice, allows for
>>> different translations, different p2m types, different access permissions. 
>>>
>>> One usecase of alternate p2ms is to provide introspection information to
>>> out-of-guest entities (via the mem_event interface) or to in-guest
>>> entities (via #VE).
>>>
>>>
>>> Now for some observations and assumptions.
>>>
>>> It occurs to me that the altp2m mechanism is generic.  From the look of
>>> the series, it is mostly implemented in a generic way, which is great. 
>>> The only Intel specific bits appear to be the ept handling itself,
>>> 'vmfunc' instruction support and #VE injection to in-guest entities. 
>>>
>> That was my intention. I don't know enough about the state of AMD
>> virtualization to know if it can support these patches by emulating
>> vmfunc and #VE, but that was my target.
> 
> As far as I am aware, AMD SVM has no similar concept to vmfunc, nor
> #VE.  However, the same kinds of introspection are certainly possible by
> playing with the read/write bits on the NPT tables and causing a vmexit.
> 
>>
>>> I can't think of any reasonable case where the alternate p2m would want
>>> mappings different to the host p2m.  That is to say, an altp2m will map
>>> the same set of mfns to make a guest physical address space, but may
>>> differ in page permissions and possibly p2m types.
>>>
>> The set of mfn's is the same, but I do allow gfn->mfn mappings to be
>> modified under certain circumstances. One use of this is to point the
>> same VA to different physical pages (with different access permissions)
>> in different p2m's to hide memory changes.
> 
> What is the practical use of being able to play paging tricks like this
> behind a VM's back?
> 

I'm restricted in how much detail I can go into on a public mailing list,
but imagine that you want a data read to see one thing and an instruction
fetch to see something else.

If you need more than that we'll have to go off-list, and even then I'll
have to check what I can say.

Ed

>>
>>> Given the above restriction, I believe a lot of the existing features
>>> can continue to work and coexist.  For generating mem_events, the
>>> permissions can be altered in the altp2m.  For injecting #VE, the altp2m
>>> type can change to the new p2m_ram_rw, so long as the host p2m type is
>>> compatible.  For both, a vmexit can occur.  Xen can do the appropriate
>>> action and also inject a #VE on its way back into the guest.
>>>
>>> One thing I have noticed while looking at the #VE stuff that EPT also
>>> supports A/D tracking, which might be quite a nice optimisation and
>>> forgo the need for p2m_ram_logdirty, but I think this should be treated
>>> as an orthogonal item.
>>>
>> This is far from my area of expertise, but I believe there is code in Xen
>> to use EPT D bits in migration.
> 
> Not that I can spot, although I seem to remember some talk about it. All
> logdirty code still appears to rely on the logdirty bitmap being
> filled, which is done from vmexits for p2m_ram_logdirty regions.
> 
> ~Andrew
> 
>>
>> Ed
>>
>>> When shared ept/iommu is not in use, altp2m can safely be used by vcpus,
>>> as this will not interfere with the IOMMU permissions.
>>>
>>> Furthermore, I can't conceptually think of an issue against the idea of
>>> nestedp2m alternatives, following the same rule that the mapped mfns
>>> match up.  That should allow all existing nestedvirt infrastructure
>>> continue to work.
>>>
>>> Does the above look sensible, or have I overlooked something?
>>>
>>> ~Andrew
>>>
> 
> 

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-13 20:02   ` Ed White
  2015-01-13 20:45     ` Andrew Cooper
@ 2015-01-14  7:01     ` Jan Beulich
  1 sibling, 0 replies; 135+ messages in thread
From: Jan Beulich @ 2015-01-14  7:01 UTC (permalink / raw)
  To: andrew.cooper3, edmund.h.white
  Cc: keir, ian.campbell, tamas.lengyel, tim, xen-devel, ian.jackson

>>> Ed White <edmund.h.white@intel.com> 01/13/15 9:03 PM >>>
>On 01/13/2015 11:01 AM, Andrew Cooper wrote:
>> One thing I have noticed while looking at the #VE stuff that EPT also
>> supports A/D tracking, which might be quite a nice optimisation and
>> forgo the need for p2m_ram_logdirty, but I think this should be treated
>> as an orthogonal item.
> 
>This is far from my area of expertise, but I believe there is code in Xen
>to use EPT D bits in migration.

There once was a patch series, but when asked about the (performance)
benefits, the submitting engineer stated that there was no measurable
improvement, and hence the series never got applied. Right now PML
is being worked on afaik, which from what I can tell will make it a lot
easier (compared to scanning the whole tree for set D bits) to collect
the modified bitmap when the tool stack asks for it.

Jan

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-13 21:30       ` Ed White
@ 2015-01-14  7:04         ` Jan Beulich
  2015-01-14 10:31           ` Tamas K Lengyel
  0 siblings, 1 reply; 135+ messages in thread
From: Jan Beulich @ 2015-01-14  7:04 UTC (permalink / raw)
  To: edmund.h.white
  Cc: keir, ian.campbell, andrew.cooper3, tim, xen-devel,
	tamas.lengyel, ian.jackson

>>> Ed White <edmund.h.white@intel.com> 01/13/15 10:32 PM >>>
>On 01/13/2015 12:45 PM, Andrew Cooper wrote:
>> On 13/01/15 20:02, Ed White wrote:
>>> The set of mfn's is the same, but I do allow gfn->mfn mappings to be
>>> modified under certain circumstances. One use of this is to point the
>>> same VA to different physical pages (with different access permissions)
>>> in different p2m's to hide memory changes.
>> 
>> What is the practical use of being able to play paging tricks like this
>> behind a VM's back?
> 
>I'm restricted in how much detail I can go into on a public mailing list,
>but imagine that you want a data read to see one thing and an instruction
>fetch to see something else.

How would that work? There can only be one P2M in use at a time, and that's
used for both translations. Or are you saying at least one of the two accesses
would be emulated nevertheless?

Jan

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-14  7:04         ` Jan Beulich
@ 2015-01-14 10:31           ` Tamas K Lengyel
  2015-01-14 11:09             ` Jan Beulich
  0 siblings, 1 reply; 135+ messages in thread
From: Tamas K Lengyel @ 2015-01-14 10:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Keir Fraser, Ian Campbell, Andrew Cooper,
	Ian Jackson, Ed White, xen-devel

On Wed, Jan 14, 2015 at 8:04 AM, Jan Beulich <jbeulich@suse.com> wrote:
>>>> Ed White <edmund.h.white@intel.com> 01/13/15 10:32 PM >>>
>>On 01/13/2015 12:45 PM, Andrew Cooper wrote:
>>> On 13/01/15 20:02, Ed White wrote:
>>>> The set of mfn's is the same, but I do allow gfn->mfn mappings to be
>>>> modified under certain circumstances. One use of this is to point the
>>>> same VA to different physical pages (with different access permissions)
>>>> in different p2m's to hide memory changes.
>>>
>>> What is the practical use of being able to play paging tricks like this
>>> behind a VM's back?
>>
>>I'm restricted in how much detail I can go into on a public mailing list,
>>but imagine that you want a data read to see one thing and an instruction
>>fetch to see something else.
>
> How would that work? There can only be one P2M in use at a time, and that's
> used for both translations. Or are you saying at least one of the two accesses
> would be emulated nevertheless?
>
> Jan

I can see it working by having data fetch access to a page trapped via
mem_access, while instruction fetch is not. This would be very handy
when doing stealthy debugging where the presence of breakpoints should
be hidden from the guest. With this technique it is possible to
present a copy of the page to the data fetch that has no breakpoints
in it, as done for example in this paper:
http://friends.cs.purdue.edu/pubs/ACSAC13.pdf.

Tamas

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-14 10:31           ` Tamas K Lengyel
@ 2015-01-14 11:09             ` Jan Beulich
  2015-01-14 11:28               ` Tamas K Lengyel
  0 siblings, 1 reply; 135+ messages in thread
From: Jan Beulich @ 2015-01-14 11:09 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tim Deegan, Keir Fraser, Ian Campbell, Andrew Cooper,
	Ian Jackson, Ed White, xen-devel

>>> On 14.01.15 at 11:31, <tamas.lengyel@zentific.com> wrote:
> On Wed, Jan 14, 2015 at 8:04 AM, Jan Beulich <jbeulich@suse.com> wrote:
>>>>> Ed White <edmund.h.white@intel.com> 01/13/15 10:32 PM >>>
>>>On 01/13/2015 12:45 PM, Andrew Cooper wrote:
>>>> On 13/01/15 20:02, Ed White wrote:
>>>>> The set of mfn's is the same, but I do allow gfn->mfn mappings to be
>>>>> modified under certain circumstances. One use of this is to point the
>>>>> same VA to different physical pages (with different access permissions)
>>>>> in different p2m's to hide memory changes.
>>>>
>>>> What is the practical use of being able to play paging tricks like this
>>>> behind a VM's back?
>>>
>>>I'm restricted in how much detail I can go into on a public mailing list,
>>>but imagine that you want a data read to see one thing and an instruction
>>>fetch to see something else.
>>
>> How would that work? There can only be one P2M in use at a time, and that's
>> used for both translations. Or are you saying at least one of the two accesses
>> would be emulated nevertheless?
> 
> I can see it working by having data fetch access to a page trapped via
> mem_access, while instruction fetch is not.

Understood, but how do you then carry out the data access? The
question I raised was whether that would then involve emulation.

Jan

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-14 11:09             ` Jan Beulich
@ 2015-01-14 11:28               ` Tamas K Lengyel
  2015-01-14 17:35                 ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Tamas K Lengyel @ 2015-01-14 11:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Keir Fraser, Ian Campbell, Andrew Cooper,
	Ian Jackson, Ed White, xen-devel

On Wed, Jan 14, 2015 at 12:09 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 14.01.15 at 11:31, <tamas.lengyel@zentific.com> wrote:
>> On Wed, Jan 14, 2015 at 8:04 AM, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>> Ed White <edmund.h.white@intel.com> 01/13/15 10:32 PM >>>
>>>>On 01/13/2015 12:45 PM, Andrew Cooper wrote:
>>>>> On 13/01/15 20:02, Ed White wrote:
>>>>>> The set of mfn's is the same, but I do allow gfn->mfn mappings to be
>>>>>> modified under certain circumstances. One use of this is to point the
>>>>>> same VA to different physical pages (with different access permissions)
>>>>>> in different p2m's to hide memory changes.
>>>>>
>>>>> What is the practical use of being able to play paging tricks like this
>>>>> behind a VM's back?
>>>>
>>>>I'm restricted in how much detail I can go into on a public mailing list,
>>>>but imagine that you want a data read to see one thing and an instruction
>>>>fetch to see something else.
>>>
>>> How would that work? There can only be one P2M in use at a time, and that's
>>> used for both translations. Or are you saying at least one of the two accesses
>>> would be emulated nevertheless?
>>
>> I can see it working by having data fetch access to a page trapped via
>> mem_access, while instruction fetch is not.
>
> Understood, but how do you then carry out the data access? The
> question I raised was whether that would then involve emulation.
>
> Jan

At the mem_access trap point you can swap in an altp2m where the
gfn->mfn mapping is the one where the breakpoints are hidden,
singlestep, then swap the original p2m back. While this approach still
has some overhead because of the use of singlestepping, it is going to
be faster than what you currently have to do, which is removing all
breakpoints, singlestep, then put breakpoints back. Now it would just
be a matter of swapping a single pointer.
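
In pseudo-C, the listener side of that loop might look like this (the
helper names here are invented for illustration; they are not the actual
mem_event or altp2m interfaces):

    /* Sketch of the swap / singlestep / swap-back flow. */
    void on_mem_access_event(struct event *ev)
    {
        /* A data access hit the guarded page: show the clean view. */
        altp2m_switch_vcpu(dom, ev->vcpu_id, view_clean);
        singlestep_enable(dom, ev->vcpu_id);
    }

    void on_singlestep_event(struct event *ev)
    {
        /* One instruction retired: back to the instrumented view. */
        altp2m_switch_vcpu(dom, ev->vcpu_id, view_instrumented);
        singlestep_disable(dom, ev->vcpu_id);
    }

Each transition is a single view-pointer change rather than a rewrite of
guest memory, which is where the saving over the remove/restore-breakpoints
approach comes from.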

Tamas

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 01/11] VMX: VMFUNC and #VE definitions and detection.
  2015-01-13 18:50     ` Ed White
@ 2015-01-14 14:38       ` Andrew Cooper
  0 siblings, 0 replies; 135+ messages in thread
From: Andrew Cooper @ 2015-01-14 14:38 UTC (permalink / raw)
  To: Ed White, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

On 13/01/15 18:50, Ed White wrote:
> On 01/12/2015 05:06 AM, Andrew Cooper wrote:
>> On 09/01/15 21:26, Ed White wrote:
>>> Currently, neither is enabled globally but may be enabled on a per-VCPU
>>> basis by the altp2m code.
>>>
>>> Everything can be force-disabled globally by specifying vmfunc=0 on the
>>> Xen command line.
>>>
>>> Remove the check for EPTE bit 63 == zero in ept_split_super_page(), as
>>> that bit is now hardware-defined.
>>>
>>> Signed-off-by: Ed White <edmund.h.white@intel.com>
>>> ---
>>>  docs/misc/xen-command-line.markdown |  7 +++++++
>>>  xen/arch/x86/hvm/vmx/vmcs.c         | 40 +++++++++++++++++++++++++++++++++++++
>>>  xen/arch/x86/mm/p2m-ept.c           |  1 -
>>>  xen/include/asm-x86/hvm/vmx/vmcs.h  | 16 +++++++++++++++
>>>  xen/include/asm-x86/hvm/vmx/vmx.h   | 13 +++++++++++-
>>>  xen/include/asm-x86/msr-index.h     |  1 +
>>>  6 files changed, 76 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
>>> index 152ae03..00fbae7 100644
>>> --- a/docs/misc/xen-command-line.markdown
>>> +++ b/docs/misc/xen-command-line.markdown
>>> @@ -1305,6 +1305,13 @@ The optional `keep` parameter causes Xen to continue using the vga
>>>  console even after dom0 has been started.  The default behaviour is to
>>>  relinquish control to dom0.
>>>  
>>> +### vmfunc (Intel)
>>> +> `= <boolean>`
>>> +
>>> +> Default: `true`
>>> +
>>> +Use VMFUNC and #VE support if available.
>>> +
>>>  ### vpid (Intel)
>>>  > `= <boolean>`
>>>  
>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
>>> index 9d8033e..4274e92 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>> @@ -50,6 +50,9 @@ boolean_param("unrestricted_guest", opt_unrestricted_guest_enabled);
>>>  static bool_t __read_mostly opt_apicv_enabled = 1;
>>>  boolean_param("apicv", opt_apicv_enabled);
>>>  
>>> +static bool_t __read_mostly opt_vmfunc_enabled = 1;
>>> +boolean_param("vmfunc", opt_vmfunc_enabled);
>> Please can experimental features be off by default.  (I am specifically
>> looking to avoid the issues we had with apicv getting into stable
>> releases despite reliably causing problems for migration).
>>
>> I suspect you will have many interested testers for this featureset, and
>> it is fine to patch the default later when the feature gets declared stable.
>>
>> I also wonder whether it might be better to have a "vmx=" command line
>> parameter with "vmfunc" as a subopt, to save gaining an ever increasing
>> set of related top level parameters?
>>
>> Other than this, the content of the rest of the patch appears fine.
>>
> I definitely can change the default to off, but I don't think it will
> have the effect you're expecting.
>
> This patch simply determines whether the hardware supports enabling
> VMFUNC and #VE, but does not enable them. If a domain enters
> alternate p2m mode through the relevant hypercall, at that point
> VMFUNC will be enabled for vcpu's in that domain; and if a vcpu in
> that domain subsequently registers itself to receive #VE through
> another hypercall, #VE will be enabled for that vcpu. Since both
> features are emulated if the hardware doesn't support them, changing
> the default to off will simply force emulation.

Now you mention this, what feature flag should a VM look for to indicate
the availability of vmfunc?

Looking at the manual, it would appear that guest software's only method
of detecting the absence of support is to attempt the instruction and
catch a #UD.  (I also observe that vmfunc 0 has no cpl0 requirements as
described by its pseudocode.)

One way or another the domain needs something akin to a feature flag. 
While I am loath to suggest it, I think you need two HVM params to
control this.

One HVM param should probably match the vm-function controls, and
identify which functions are permitted for use, independent of
hardware support vs emulation.  A missing bit here will cause emulated
attempts to fail with #UD.

A second HVM param should identify the altp2m mode, as one of {off,
identity, unrestricted}

This allows the host admin quite fine-grained control over the options
available, including absolutely nothing, out-of-guest-only altp2m,
identity altp2m (which plays nicely with most other Xen features), or the
full set.  By explicitly choosing the full set, the host admin has to
take a conscious choice to be incompatible with features such as
passthrough and migration.
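
As a sketch of what that could look like (the parameter names and mode
constants below are placeholders for the proposal, not an existing Xen
interface):

    /* Hypothetical params mirroring the proposal above. */
    uint64_t funcs = d->arch.hvm_domain.params[HVM_PARAM_ALTP2M_VMFUNC];
    uint64_t mode  = d->arch.hvm_domain.params[HVM_PARAM_ALTP2M_MODE];

    /* In the vmfunc emulation / vmexit path: */
    if ( mode == ALTP2M_MODE_OFF || !(funcs & (1ULL << function)) )
        return X86EMUL_EXCEPTION;    /* the attempt surfaces as #UD */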

~Andrew

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-14 11:28               ` Tamas K Lengyel
@ 2015-01-14 17:35                 ` Ed White
  2015-01-15  8:16                   ` Jan Beulich
  2015-01-15 10:39                   ` Tamas K Lengyel
  0 siblings, 2 replies; 135+ messages in thread
From: Ed White @ 2015-01-14 17:35 UTC (permalink / raw)
  To: Tamas K Lengyel, Jan Beulich
  Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan, xen-devel,
	Ian Jackson

On 01/14/2015 03:28 AM, Tamas K Lengyel wrote:
> On Wed, Jan 14, 2015 at 12:09 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 14.01.15 at 11:31, <tamas.lengyel@zentific.com> wrote:
>>> On Wed, Jan 14, 2015 at 8:04 AM, Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>> Ed White <edmund.h.white@intel.com> 01/13/15 10:32 PM >>>
>>>>> On 01/13/2015 12:45 PM, Andrew Cooper wrote:
>>>>>> On 13/01/15 20:02, Ed White wrote:
>>>>>>> The set of mfn's is the same, but I do allow gfn->mfn mappings to be
>>>>>>> modified under certain circumstances. One use of this is to point the
>>>>>>> same VA to different physical pages (with different access permissions)
>>>>>>> in different p2m's to hide memory changes.
>>>>>>
>>>>>> What is the practical use of being able to play paging tricks like this
>>>>>> behind a VM's back?
>>>>>
>>>>> I'm restricted in how much detail I can go into on a public mailing list,
>>>>> but imagine that you want a data read to see one thing and an instruction
>>>>> fetch to see something else.
>>>>
>>>> How would that work? There can only be one P2M in use at a time, and that's
>>>> used for both translations. Or are you saying at least one of the two accesses
>>>> would be emulated nevertheless?
>>>
>>> I can see it working by having data fetch access to a page trapped via
>>> mem_access, while instruction fetch is not.
>>
>> Understood, but how do you then carry out the data access? The
>> question I raised was whether that would then involve emulation.
>>
>> Jan
> 
> At the mem_access trap point you can swap in an altp2m where the
> gfn->mfn mapping is the one where the breakpoints are hidden,
> singlestep, then swap the original p2m back. While this approach still
> has some overhead because of the use of singlestepping, it is going to
> be faster than what you currently have to do, which is removing all
> breakpoints, singlestep, then put breakpoints back. Now it would just
> be a matter of swapping a single pointer.
> 

Right. The key observation is that at any single point in time, a given
hardware thread can be fetching an instruction or reading data, but not
both. These patches add a low-overhead way of switching p2m's for a single
vcpu between any two such operations. There are ways of avoiding the
single-step too, although I don't think that falls within the scope
of this conversation.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-14 17:35                 ` Ed White
@ 2015-01-15  8:16                   ` Jan Beulich
  2015-01-15 17:28                     ` Ed White
  2015-01-15 10:39                   ` Tamas K Lengyel
  1 sibling, 1 reply; 135+ messages in thread
From: Jan Beulich @ 2015-01-15  8:16 UTC (permalink / raw)
  To: Ed White, Tamas K Lengyel
  Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan, xen-devel,
	Ian Jackson

>>> On 14.01.15 at 18:35, <edmund.h.white@intel.com> wrote:
> On 01/14/2015 03:28 AM, Tamas K Lengyel wrote:
>> At the mem_access trap point you can swap in an altp2m where the
>> gfn->mfn mapping is the one where the breakpoints are hidden,
>> singlestep, then swap the original p2m back. While this approach still
>> has some overhead because of the use of singlestepping, it is going to
>> be faster than what you currently have to do, which is removing all
>> breakpoints, singlestep, then put breakpoints back. Now it would just
>> be a matter of swapping a single pointer.
> 
> Right. The key observation is that at any single point in time, a given
> hardware thread can be fetching an instruction or reading data, but not
> both.

Fine, as long as an instruction reading itself isn't going to lead to
a live lock.

Jan

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-14 17:35                 ` Ed White
  2015-01-15  8:16                   ` Jan Beulich
@ 2015-01-15 10:39                   ` Tamas K Lengyel
  2015-01-15 17:31                     ` Ed White
  1 sibling, 1 reply; 135+ messages in thread
From: Tamas K Lengyel @ 2015-01-15 10:39 UTC (permalink / raw)
  To: Ed White
  Cc: Tim Deegan, Keir Fraser, Ian Campbell, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich

> There are ways of avoiding the
> single-step too, although I don't think that falls within the scope
> of this conversation.
>
> Ed

I would be very interested in knowing how we can avoid the singlestep
phase. Are you envisioning using this with a split-TLB? IMHO this is a
pretty critical component of effectively using the new feature so
should be within scope of this discussion.

Tamas

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-09 21:26 [PATCH 00/11] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (13 preceding siblings ...)
  2015-01-13 19:01 ` Andrew Cooper
@ 2015-01-15 16:15 ` Tim Deegan
  2015-01-15 18:23   ` Ed White
  14 siblings, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-15 16:15 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

Hello,

Thanks for sending this series - in particular, thank you for sending
it early in the release cycle!  I'll review some of the patches
individually but since I expect there will be some changes to come in
future versions I'm not going to go into too much detail.

I see there's been some discussion of how this would be useful for an
out-of-domain inspection tool, but could you talk some more about the
usefulness of the in-VM callers?  I'm not sure what advantage it
brings over a kernel playing the same tricks in normal pagetables --
after all an attacker in the kernel can make hypercalls and so get
around any p2m restrictions.


Looking at the code, the first thing that strikes me about it is that
you've tried to make a split between common code and VMX-specific
implementation details.  Thank you for that!  My only reservations
around that are that some of the naming of things in common code are
too vmx-specific.

I think it's probably OK to use 've', though I think that the term
'notify' might be better (which you use in the hypercall interface).
Using 'eptp' in common code is less good, though I think almost
everywhere in common code, you could just use 'p2m' instead.
Also, functions like 'find_by_eptp' that are only ever called
from vmx code don't need to be plumbed through common wrappers.
Also, I think you can probably s/altp2mhvm/altp2m/ throughout.


The second thing is how similar some of this is to nested p2m code,
making me wonder whether it could share more code with that.  It's not
as much duplication as I had feared, but e.g. altp2m_write_p2m_entry()
is _identical_ to nestedp2m_write_p2m_entry() (making the
copyright claim at the top of the file quite dubious, BTW).


In order to work towards getting this series merged, I think we have
four things to consider:

- General design.  I've made some comments above and some of the other
  maintainers have replied separately.  Assuming that the case can be
  made for needing this in the hypervisor at all, I think the overall
  direction is probably a good one.

- Feature compatibilty/completeness.  You pointed out yourself that
  it doesn't work with nested HVM or migration.  I think I'd have to
  add mem_event/access/paging and PCI passthrough to the list of
  features that ought to still work.  I'm resigned to the idea that
  many new features don't work with shadow pagetables. :)

- Testing and sample code.  If we're to carry this feature in the
  tree, we'll need at least some code to make use of it; ideally
  some sort of test we can run to find regressions later.

- Issues of coding style and patch hygiene.  I don't think it's
  particularly useful to go into that in detail at this stage, but I
  did see some missing spaces in parentheses, e.g. for ( <-here-> ),
  and the patch series should be ordered so that the new feature is
  enabled in the last patch (in particular after 'fix log-dirty handling'!)

OK.  I have a few comments about the code itself; I'll reply to
individual patches with that.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 02/11] VMX: implement suppress #VE.
  2015-01-09 21:26 ` [PATCH 02/11] VMX: implement suppress #VE Ed White
  2015-01-12 16:43   ` Andrew Cooper
@ 2015-01-15 16:25   ` Tim Deegan
  2015-01-15 18:46     ` Ed White
  1 sibling, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-15 16:25 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

Hi,

At 13:26 -0800 on 09 Jan (1420806392), Ed White wrote:
>  static inline bool_t is_epte_valid(ept_entry_t *e)
>  {
> -    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
> +    return (e->valid != 0 && e->sa_p2mt != p2m_invalid);

This test for 0 is just catching uninitialised entries in freshly
allocated pages.  Rather than changing it to ignore bit 63, this loop...

>  }
>  
>  /* returns : 0 for success, -errno otherwise */
> @@ -194,6 +194,19 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
>  
>      ept_entry->r = ept_entry->w = ept_entry->x = 1;
>  
> +    /* Disable #VE on all entries */ 
> +    if ( cpu_has_vmx_virt_exceptions )
> +    {
> +        ept_entry_t *table = __map_domain_page(pg);
> +
> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
> +            table[i].suppress_ve = 1;

...should set the type of the empty entries to p2m_invalid as it goes.
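
I.e. something like (an untested sketch of that suggestion):

    for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
    {
        table[i].sa_p2mt = p2m_invalid;   /* mark empty entries invalid */
        table[i].suppress_ve = 1;
    }

so that is_epte_valid() can continue to key off sa_p2mt and need not
care about bit 63 at all.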

> +    /* Disable #VE on all entries */
> +    if ( cpu_has_vmx_virt_exceptions )
> +    {
> +        ept_entry_t *table =
> +            map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
> +
> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
> +            table[i].suppress_ve = 1;

And the same here.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 03/11] x86/HVM: Hardware alternate p2m support detection.
  2015-01-09 21:26 ` [PATCH 03/11] x86/HVM: Hardware alternate p2m support detection Ed White
  2015-01-12 17:08   ` Andrew Cooper
@ 2015-01-15 16:32   ` Tim Deegan
  1 sibling, 0 replies; 135+ messages in thread
From: Tim Deegan @ 2015-01-15 16:32 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

At 13:26 -0800 on 09 Jan (1420806393), Ed White wrote:
> As implemented here, only supported on platforms with VMX HAP.

This patch, I think, is where we could have an off-by-default feature
option to disable all this new code until it's stable.

There must also be some way for the toolstack to find out whether a
given host supports altp2m, so that it can plan for VM migration, and
some way (in a later patch) to enable or disable the feature for a
given VM.  Typically we use a HVM_PARAM for that, with appropriate
checks on who can change it and when.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 04/11] x86/MM: Improve p2m type checks.
  2015-01-09 21:26 ` [PATCH 04/11] x86/MM: Improve p2m type checks Ed White
  2015-01-12 17:48   ` Andrew Cooper
@ 2015-01-15 16:36   ` Tim Deegan
  1 sibling, 0 replies; 135+ messages in thread
From: Tim Deegan @ 2015-01-15 16:36 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

Hi,

At 13:26 -0800 on 09 Jan (1420806394), Ed White wrote:
> The alternate p2m code will introduce a new p2m type. In preparation for using
> that new type, introduce the type indicator here and fix all the checks
> that assume !nestedp2m == hostp2m to explicitly check for hostp2m.
> 
> Signed-off-by: Ed White <edmund.h.white@intel.com>

This looks like a good idea regardless of altp2m.  If you move the
altp2m bits out (into patch #5, I think), I'll just check this in.
Maybe also s/p2m type/p2m class/ to avoid confusion with p2m_type_*. 

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 05/11] x86/altp2m: basic data structures and support routines.
  2015-01-09 21:26 ` [PATCH 05/11] x86/altp2m: basic data structures and support routines Ed White
  2015-01-13 11:28   ` Andrew Cooper
@ 2015-01-15 16:48   ` Tim Deegan
  2015-01-15 16:53     ` Jan Beulich
  1 sibling, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-15 16:48 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

Hi,

At 13:26 -0800 on 09 Jan (1420806395), Ed White wrote:
> --- a/xen/arch/x86/hvm/Makefile
> +++ b/xen/arch/x86/hvm/Makefile
> @@ -22,4 +22,5 @@ obj-y += vlapic.o
>  obj-y += vmsi.o
>  obj-y += vpic.o
>  obj-y += vpt.o
> -obj-y += vpmu.o
> \ No newline at end of file
> +obj-y += vpmu.o
> +obj-y += altp2mhvm.o

This list is in alphabetical order; please add this at the top. :)

> +    /* Init alternate p2m data */
> +    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )
> +    {
> +        rv = -ENOMEM;
> +        goto out;
> +    }
> +    for (i = 0; i < 512; i++)
> +        d->arch.altp2m_eptp[i] = ~0ul;

This 512 is architectural, I guess?  It should have a named constant.

> --- a/xen/arch/x86/mm/mm-locks.h
> +++ b/xen/arch/x86/mm/mm-locks.h
> @@ -209,6 +209,10 @@ declare_mm_lock(nestedp2m)
>  #define nestedp2m_lock(d)   mm_lock(nestedp2m, &(d)->arch.nested_p2m_lock)
>  #define nestedp2m_unlock(d) mm_unlock(&(d)->arch.nested_p2m_lock)
>  
> +declare_mm_lock(altp2m)
> +#define altp2m_lock(d)   mm_lock(altp2m, &(d)->arch.altp2m_lock)
> +#define altp2m_unlock(d) mm_unlock(&(d)->arch.altp2m_lock)

This needs a nice big block comment describing what it protects, like
the other locks in this file.  (Urgh, I see that the nested-p2m lock
has come unmoored from its comment; I'll fix that now).

> +struct altp2mvcpu {
> +    uint16_t    p2midx ;        /* alternate p2m index */
> +    uint64_t    veinfo;         /* #VE information page guest pfn */

This is a gfn, I think?  I think I'd prefer 'unsigned long veinfo_gfn'
for clarity.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 05/11] x86/altp2m: basic data structures and support routines.
  2015-01-15 16:48   ` Tim Deegan
@ 2015-01-15 16:53     ` Jan Beulich
  2015-01-15 18:49       ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Jan Beulich @ 2015-01-15 16:53 UTC (permalink / raw)
  To: Ed White, Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, xen-devel

>>> On 15.01.15 at 17:48, <tim@xen.org> wrote:
> At 13:26 -0800 on 09 Jan (1420806395), Ed White wrote:
>> +    /* Init alternate p2m data */
>> +    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )
>> +    {
>> +        rv = -ENOMEM;
>> +        goto out;
>> +    }
>> +    for (i = 0; i < 512; i++)
>> +        d->arch.altp2m_eptp[i] = ~0ul;
> 
> This 512 is architectural, I guess?  It should have a named constant.

Perhaps even calculated rather than just defined as a plain
number constant, e.g. (PAGE_SIZE / sizeof(something)).

Jan

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 06/11] VMX/altp2m: add code to support EPTP switching and #VE.
  2015-01-09 21:26 ` [PATCH 06/11] VMX/altp2m: add code to support EPTP switching and #VE Ed White
  2015-01-13 11:58   ` Andrew Cooper
@ 2015-01-15 16:56   ` Tim Deegan
  2015-01-15 18:55     ` Ed White
  1 sibling, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-15 16:56 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

Hi,

At 13:26 -0800 on 09 Jan (1420806396), Ed White wrote:
> @@ -2551,6 +2640,17 @@ static void vmx_vmexit_ud_intercept(struct cpu_user_regs *regs)
>          hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>          break;
>      case X86EMUL_EXCEPTION:
> +        /* check for a VMFUNC that should be emulated */
> +        if ( !cpu_has_vmx_vmfunc && altp2mhvm_active(current->domain) &&
> +             ctxt.insn_buf_bytes >= 3 && ctxt.insn_buf[0] == 0x0f &&
> +             ctxt.insn_buf[1] == 0x01 && ctxt.insn_buf[2] == 0xd4 &&
> +             regs->eax == 0 &&
> +             p2m_switch_vcpu_altp2m_by_id(current, (uint16_t)regs->ecx) )
> +        {
> +            regs->eip += 3;
> +            return;
> +        }
> +

I think Andrew already pointed out that this needs to be done by
adding VMFUNC to the emulator itself with a callback.  Apart from
anything else that will DTRT with prefix bytes &c.

> +        if ( (uint16_t)idx != vcpu_altp2mhvm(v).p2midx )
> +        {
> +            cpumask_clear_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
> +            vcpu_altp2mhvm(v).p2midx = (uint16_t)idx;
> +            cpumask_set_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);

This looks wrong -- you need to do a TLB flush before you can remove
this CPU from the dirty_cpumask.

> +        }
> +    }
>  
>      /* XXX: This looks ugly, but we need a mechanism to ensure
>       * any pending vmresume has really happened
> @@ -3041,6 +3175,10 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>              update_guest_eip();
>          break;
>  
> +    case EXIT_REASON_VMFUNC:
> +        vmx_vmexit_ud_intercept(regs);

I think vmx_vmexit_ud_intercept() should probably be renamed, and
perhaps split into two since this new caller won't want the
opt_hvm_fep stuff.

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 07/11] x86/altp2m: introduce p2m_ram_rw_ve type.
  2015-01-09 21:26 ` [PATCH 07/11] x86/altp2m: introduce p2m_ram_rw_ve type Ed White
@ 2015-01-15 17:03   ` Tim Deegan
  2015-01-15 20:38     ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-15 17:03 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

At 13:26 -0800 on 09 Jan (1420806397), Ed White wrote:
> This is treated exactly like p2m_ram_rw, except that suppress_ve is not
> set in the EPTE.

I don't think this is going to work -- you probably want to support
p2m_ram_ro at least, and maybe other types, but duplicating each of
them as a 'type foo with #VE' doesn't seem right.

Since the default is to set the suppress-#ve flag everywhere, how about
having an operation to enable #ve for a frame that just clears that
bit, and then having all other updates to altp2m entries preserve it?
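
In other words, instead of p2m_ram_rw_ve, something shaped like this
(a sketch; the function name is invented, and it assumes set_entry gains
a way to carry the suppress_ve bit through):

    /* Clear the suppress-#VE bit on one frame of one altp2m view,
     * leaving the type and access untouched. */
    int p2m_enable_ve(struct p2m_domain *ap2m, unsigned long gfn)
    {
        p2m_type_t t;
        p2m_access_t a;
        mfn_t mfn = ap2m->get_entry(ap2m, gfn, &t, &a, 0, NULL);

        if ( !mfn_valid(mfn) )
            return -EINVAL;

        return ap2m->set_entry(ap2m, gfn, mfn, PAGE_ORDER_4K, t, a);
    }

with every other writer of altp2m entries preserving whatever value of
the bit it finds.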

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 09/11] x86/altp2m: define and implement alternate p2m HVMOP types.
  2015-01-09 21:26 ` [PATCH 09/11] x86/altp2m: define and implement alternate p2m HVMOP types Ed White
@ 2015-01-15 17:09   ` Tim Deegan
  2015-01-15 20:43     ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-15 17:09 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

Hi,

These _definitely_ need XSM checks, otherwise any domain can call them
on any other!  I think you can probably copy the other p2m-munging
operations to see how to make a sensible default policy.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 10/11] x86/altp2m: fix log-dirty handling.
  2015-01-09 21:26 ` [PATCH 10/11] x86/altp2m: fix log-dirty handling Ed White
@ 2015-01-15 17:20   ` Tim Deegan
  2015-01-15 20:49     ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-15 17:20 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

Hi,

The locking changes look OK at first glance, but...

At 13:26 -0800 on 09 Jan (1420806400), Ed White wrote:
> @@ -793,6 +793,10 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn,
>  
>      gfn_unlock(p2m, gfn, 0);
>  
> +    if ( pt == ot && altp2mhvm_active(d) )
> +        /* make sure this page isn't valid in any alternate p2m */
> +        p2m_remove_altp2m_page(d, gfn);
> +
>      return rc;
>  }

...this is the wrong level to be making this change at.  The hook needs
to be right at the bottom, in atomic_write_ept_entry() (and
hap_write_p2m_entry() for AMD, I think), to catch _every_ update of a
p2m entry in the host p2m.

Otherwise a guest frame could be removed entirely and the altp2m would
still map it.  Or am I missing some other path that handles that case?
nested-p2m handles this by fairly aggressively flushing nested p2m
tables, but that doesn't sound suitable here since there's state
in the alt-p2m that needs to be retained.
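
Concretely, the hook might amount to something like this at the bottom
of atomic_write_ept_entry() (a sketch; the propagation helper is a
hypothetical name, and the domain/gfn may need plumbing through to that
point):

    /* After the host p2m entry has been replaced: let the altp2m
     * code drop or update any entries derived from this gfn. */
    if ( unlikely(altp2mhvm_active(d)) && new_entry.mfn != old_entry.mfn )
        altp2m_propagate_change(d, gfn, &new_entry);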

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 08/11] x86/altp2m: add remaining support routines.
  2015-01-09 21:26 ` [PATCH 08/11] x86/altp2m: add remaining support routines Ed White
@ 2015-01-15 17:25   ` Tim Deegan
  2015-01-15 20:57     ` Ed White
  2015-01-15 17:33   ` Tim Deegan
  1 sibling, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-15 17:25 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

Hi,

At 13:26 -0800 on 09 Jan (1420806398), Ed White wrote:
> +int
> +altp2mhvm_hap_nested_page_fault(struct vcpu *v, paddr_t gpa,
> +                                unsigned long gla, struct npfec npfec)
> +{
> +    struct domain *d = v->domain;
> +    struct p2m_domain *hp2m = p2m_get_hostp2m(d);
> +    struct p2m_domain *ap2m;
> +    p2m_type_t p2mt;
> +    p2m_access_t p2ma;
> +    unsigned int page_order;
> +    unsigned long gfn, mask;
> +    mfn_t mfn;
> +    int rv;
> +
> +    ap2m = p2m_get_altp2m(v);
> +
> +    mfn = get_gfn_type_access(ap2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
> +                              0, &page_order);
> +    __put_gfn(ap2m, gpa >> PAGE_SHIFT);
> +
> +    if ( mfn_valid(mfn) )
> +    {
> +        /* Should #VE be emulated for this fault? */
> +        if ( p2mt == p2m_ram_rw_ve && !cpu_has_vmx_virt_exceptions &&
> +             ahvm_vcpu_emulate_ve(v) )
> +            return ALTP2MHVM_PAGEFAULT_DONE;
> +
> +        /* Could not handle fault here */
> +        gdprintk(XENLOG_INFO, "Altp2m memory access permissions failure, "
> +                              "no mem_event listener VCPU %d, dom %d\n",
> +                              v->vcpu_id, d->domain_id);
> +        domain_crash(v->domain);
> +        return ALTP2MHVM_PAGEFAULT_CONTINUE;
> +    }
> +
> +    mfn = get_gfn_type_access(hp2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
> +                              0, &page_order);
> +    put_gfn(hp2m->domain, gpa >> PAGE_SHIFT);
> +
> +    if ( p2mt != p2m_ram_rw || p2ma != p2m_access_rwx )
> +        return ALTP2MHVM_PAGEFAULT_CONTINUE;

I don't follow -- surely the altp2m ought to contain everything that's
in the host p2m except for deliberate extra changes.  But it looks
like here you just bail on anything other than rwx RAM.  That's going
to livelock on anything that the main fault handler thinks is OK to retry.
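
That is, on an altp2m miss the handler would be expected to mirror the
host entry and retry, roughly (a sketch that glosses over types needing
special handling, e.g. paged-out or mmio):

    /* Lazily copy the host p2m entry, whatever its type, into the
     * altp2m and let the guest retry the access. */
    if ( mfn_valid(mfn) &&
         !ap2m->set_entry(ap2m, gpa >> PAGE_SHIFT, mfn,
                          PAGE_ORDER_4K, p2mt, p2ma) )
        return ALTP2MHVM_PAGEFAULT_DONE;

    return ALTP2MHVM_PAGEFAULT_CONTINUE;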

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-15  8:16                   ` Jan Beulich
@ 2015-01-15 17:28                     ` Ed White
  2015-01-15 17:45                       ` Tim Deegan
  2015-01-16  7:35                       ` Jan Beulich
  0 siblings, 2 replies; 135+ messages in thread
From: Ed White @ 2015-01-15 17:28 UTC (permalink / raw)
  To: Jan Beulich, Tamas K Lengyel
  Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan, xen-devel,
	Ian Jackson

On 01/15/2015 12:16 AM, Jan Beulich wrote:
>>>> On 14.01.15 at 18:35, <edmund.h.white@intel.com> wrote:
>> On 01/14/2015 03:28 AM, Tamas K Lengyel wrote:
>>> At the mem_access trap point you can swap in an altp2m where the
>>> gfn->mfn mapping is the one where the breakpoints are hidden,
>>> singlestep, then swap the original p2m back. While this approach still
>>> has some overhead because of the use of singlestepping, it is going to
>>> be faster than what you currently have to do, which is removing all
>>> breakpoints, singlestep, then put breakpoints back. Now it would just
>>> be a matter of swapping a single pointer.
>>
>> Right. The key observation is that at any single point in time, a given
>> hardware thread can be fetching an instruction or reading data, but not
>> both.
> 
> Fine, as long as an instruction reading itself isn't going to lead to
> a live lock.
> 

That's not how the hardware works. By the time you figure out that the
instruction you are executing reads memory, the instruction itself has
been fetched and decoded. That won't happen again during this execution.

That's true for every CPU I've ever seen, not just Intel ones.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-15 10:39                   ` Tamas K Lengyel
@ 2015-01-15 17:31                     ` Ed White
  2015-01-16 10:43                       ` Tamas K Lengyel
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-15 17:31 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tim Deegan, Keir Fraser, Ian Campbell, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich

On 01/15/2015 02:39 AM, Tamas K Lengyel wrote:
>> There are ways of avoiding the
>> single-step too, although I don't think that falls within the scope
>> of this conversation.
>>
>> Ed
> 
> I would be very interested in knowing how we can avoid the singlestep
> phase. Are you envisioning using this with a split-TLB? IMHO this is a
> pretty critical component of effectively using the new feature so
> should be within scope of this discussion.
> 

It's an optimization certainly, but it's not required, and it's not
a technique we have placed in the public domain. You could try talking
to us under NDA or figure it out for yourself, but I can't detail
it here.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 08/11] x86/altp2m: add remaining support routines.
  2015-01-09 21:26 ` [PATCH 08/11] x86/altp2m: add remaining support routines Ed White
  2015-01-15 17:25   ` Tim Deegan
@ 2015-01-15 17:33   ` Tim Deegan
  2015-01-15 21:00     ` Ed White
  1 sibling, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-15 17:33 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

Hi,

Sorry for the fractured replies - my notes are confused about which
functions were defined where.

At 13:26 -0800 on 09 Jan (1420806398), Ed White wrote:
> +bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
> +                             unsigned long old_pfn, unsigned long new_pfn)
> +{
[...]
> +    mfn = ap2m->get_entry(ap2m, new_pfn, &t, &a, 0, NULL);
> +
> +    if ( !mfn_valid(mfn) )
> +        mfn = hp2m->get_entry(hp2m, new_pfn, &t, &a, 0, NULL);
> +
> +    if ( !mfn_valid(mfn) || !(t == p2m_ram_rw || t == p2m_ram_rw) )
> +        goto out;
> +
> +    /* Use special ram type to enable #VE if setting for current domain */
> +    if ( current->domain == d )
> +        t = p2m_ram_rw_ve;
> +
> +    if ( !ap2m->set_entry(ap2m, old_pfn, mfn, PAGE_ORDER_4K, t, a) )
> +        rc = 1;

I'm afraid this is Terribly Unsafe[tm].  Following on from my point on
the log-dirty patch, if the original gfn gets removed from the guest,
for any reason, we need a way to find and remove this mapping too.

That will be non-trivial, since you can't do it by exhaustive search.
Maybe some sort of reverse mapping?

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-15 17:28                     ` Ed White
@ 2015-01-15 17:45                       ` Tim Deegan
  2015-01-15 18:44                         ` Ed White
  2015-01-16  7:35                       ` Jan Beulich
  1 sibling, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-15 17:45 UTC (permalink / raw)
  To: Ed White
  Cc: Keir Fraser, Ian Campbell, Tamas K Lengyel, Ian Jackson,
	xen-devel, Jan Beulich, Andrew Cooper

At 09:28 -0800 on 15 Jan (1421310487), Ed White wrote:
> On 01/15/2015 12:16 AM, Jan Beulich wrote:
> >>>> On 14.01.15 at 18:35, <edmund.h.white@intel.com> wrote:
> >> On 01/14/2015 03:28 AM, Tamas K Lengyel wrote:
> >>> At the mem_access trap point you can swap in an altp2m where the
> >>> gfn->mfn mapping is the one where the breakpoints are hidden,
> >>> singlestep, then swap the original p2m back. While this approach still
> >>> has some overhead because of the use of singlestepping, it is going to
> >>> be faster than what you currently have to do, which is removing all
> >>> breakpoints, singlestep, then put breakpoints back. Now it would just
> >>> be a matter of swapping a single pointer.
> >>
> >> Right. The key observation is that at any single point in time, a given
> >> hardware thread can be fetching an instruction or reading data, but not
> >> both.
> > 
> > Fine, as long as an instruction reading itself isn't going to lead to
> > a live lock.
> > 
> 
> That's not how the hardware works. By the time you figure out that the
> instruction you are executing reads memory, the instruction itself has
> been fetched and decoded. That won't happen again during this execution.

Can you explain?  If the instruction faults and is returned to,
execution starts again, right?  So for an instruction that reads itself:

- the fetch succeeds;
- the read fails, and we fault;
- the hypervisor switches from mapping MFN 1 (--x) to MFN 2 (r--);
- the hypervisor returns to the guest.

Are you relying on the icache/trace cache/whatever to restart
the instruction from a cached value rather than fault immediately?
(Because the hypervisor didn't flush the TLB when it changed the mapping)?

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-15 16:15 ` Tim Deegan
@ 2015-01-15 18:23   ` Ed White
  2015-01-16  8:12     ` Jan Beulich
                       ` (2 more replies)
  0 siblings, 3 replies; 135+ messages in thread
From: Ed White @ 2015-01-15 18:23 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

On 01/15/2015 08:15 AM, Tim Deegan wrote:
> Hello,
> 
> Thanks for sending this series - in particular, thank you for sending
> it early in the release cycle!  I'll review some of the patches
> individually but since I expect there will be some changes to come in
> future versions I'm not going to go into too much detail.
> 
> I see there's been some discussion of how this would be useful for an
> out-of-domain inspection tool, but could you talk some more about the
> usefulness of the in-VM callers?  I'm not sure what advantage it
> brings over a kernel playing the same tricks in normal pagetables --
> after all an attacker in the kernel can make hypercalls and so get
> around any p2m restrictions.
> 

Our original motivation for this work is that we have a requirement
internally to enable in-guest, agent-based memory partitioning for
Windows domains.

There is commercially available software that uses a security
hypervisor to partition memory for Windows running on physical
hardware, and we want to make the same thing possible on
virtualized hardware.

Provided the in-domain agent loads early enough, there are ways to
prevent other code using the hypercalls. In the Windows world,
which is where we see an immediate use for these features, there
are OS features we can use to make sure the agent loads early.
In the context of a Xen domain, we might then need some additions via
extra patches to help prevent rogue actors using the hypercalls, but
it can be done.

We're aware that others are interested in an out-of-domain agent model,
and not necessarily with Windows as the target OS, and we're trying
to accommodate that usage, but we'll need help.

> 
> Looking at the code, the first thing that strikes me about it is that
> you've tried to make a split between common code and VMX-specific
> implementation details.  Thank you for that!  My only reservations
> around that are that some of the names used in common code are
> too vmx-specific.
> 
> I think it's probably OK to use 've', though I think that the term
> 'notify' might be better (which you use in the hypercall interface).
> Using 'eptp' in common code is less good, though I think almost
> everywhere in common code, you could just use 'p2m' instead.
> Also, functions like 'find_by_eptp' that are only ever called
> from vmx code don't need to be plumbed through common wrappers.
> Also, I think you can probably s/altp2mhvm/altp2m/ throughout.
>

As I said in discussion with Andrew, my aim was to make these same
changes extensible to AMD processors, should they ever support multiple
copies of whatever their EPT equivalent is, simply by emulating VMFUNC
and #VE. That's why there are some wrappers
in the implementation that appear redundant.
 
> 
> The second thing is how similar some of this is to nested p2m code,
> making me wonder whether it could share more code with that.  It's not
> as much duplication as I had feared, but e.g. altp2m_write_p2m_entry()
> is _identical_ to nestedp2m_write_p2m_entry(), (making the
> copyright claim at the top of the file quite dubious, BTW).
> 

I did initially use nestedp2m_write_p2m_entry directly, but I knew
that wouldn't be acceptable! On this specific point, I would be more
inclined to refactor the normal write entry routine so you can call
it everywhere, since both the nested and alternate ones are simply
a copy of a part of the normal one.

I'm well aware that there are similarities with the nested p2m code.
There are significant issues with that code once you start using it
in earnest.

I've spoken to the maintainers of the nested p2m code on a number of
occasions to discuss my concerns, but I need to be clear that I can not
and will not submit patches that touch that code.

> 
> In order to work towards getting this series merged, I think we have
> four things to consider:
> 
> - General design.  I've made some comments above and some of the other
>   maintainers have replied separately.  Assuming that the case can be
>   made for needing this in the hypervisor at all, I think the overall
>   direction is probably a good one.
> 
> - Feature compatibility/completeness.  You pointed out yourself that
>   it doesn't work with nested HVM or migration.  I think I'd have to
>   add mem_event/access/paging and PCI passthrough to the list of
>   features that ought to still work.  I'm resigned to the idea that
>   many new features don't work with shadow pagetables. :)
> 

The intention is that mem_event/access should still work. I haven't
specifically looked at paging, but I don't see any fundamental reason
why it shouldn't. PCI passthrough I suspect won't. Does nested HVM
work with migration? Is it simply not acceptable to submit a feature
as experimental, with known compatibility issues? I had assumed that
it was, based on the nested HVM status as documented in the release
notes.

> - Testing and sample code.  If we're to carry this feature in the
>   tree, we'll need at least some code to make use of it; ideally
>   some sort of test we can run to find regressions later.
> 

Understood. As I said in another thread, I'm hoping the community
will be interested enough to help with that. If not, I'll try to
figure something out.

> - Issues of coding style and patch hygiene.  I don't think it's
>   particularly useful to go into that in detail at this stage, but I
>   did see some missing spaces in parentheses, e.g. for ( <-here-> ),
>   and the patch series should be ordered so that the new feature is
>   enabled in the last patch (in particular after 'fix log-dirty handling'!)
> 

The reason I put the 'fix log-dirty handling' patch there is that I wasn't
convinced it would be acceptable, and it fixes a bug that already exists
and remains unfixed in the nested p2m code. IOW, alternate p2m would
work as well as nested p2m without that fix.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-15 17:45                       ` Tim Deegan
@ 2015-01-15 18:44                         ` Ed White
  2015-03-04 23:06                           ` Tamas K Lengyel
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-15 18:44 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Keir Fraser, Ian Campbell, Tamas K Lengyel, Ian Jackson,
	xen-devel, Jan Beulich, Andrew Cooper

On 01/15/2015 09:45 AM, Tim Deegan wrote:
> At 09:28 -0800 on 15 Jan (1421310487), Ed White wrote:
>> On 01/15/2015 12:16 AM, Jan Beulich wrote:
>>>>>> On 14.01.15 at 18:35, <edmund.h.white@intel.com> wrote:
>>>> On 01/14/2015 03:28 AM, Tamas K Lengyel wrote:
>>>>> At the mem_access trap point you can swap in an altp2m where the
>>>>> gfn->mfn mapping is the one where the breakpoints are hidden,
>>>>> singlestep, then swap the original p2m back. While this approach still
>>>>> has some overhead because of the use of singlestepping, it is going to
>>>>> be faster than what you currently have to do, which is removing all
>>>>> breakpoints, singlestep, then put breakpoints back. Now it would just
>>>>> be a matter of swapping a single pointer.
>>>>
>>>> Right. The key observation is that at any single point in time, a given
>>>> hardware thread can be fetching an instruction or reading data, but not
>>>> both.
>>>
>>> Fine, as long as an instruction reading itself isn't going to lead to
>>> a live lock.
>>>
>>
>> That's not how the hardware works. By the time you figure out that the
>> instruction you are executing reads memory, the instruction itself has
>> been fetched and decoded. That won't happen again during this execution.
> 
> Can you explain?  If the instruction faults and is returned to,
> execution starts again, right?  So for an instruction that reads itself:
> 
> - the fetch succeeds;
> - the read fails, and we fault;
> - the hypervisor switches from mapping MFN 1 (--x) to MFN 2 (r--);
> - the hypervisor returns to the guest.
> 
> Are you relying on the icache/trace cache/whatever to restart
> the instruction from a cached value rather than fault immediately?
> (Because the hypervisor didn't flush the TLB when it changed the mapping)?
> 

Nope. I just typed before drinking enough coffee. That whole answer was bogus.

Of course, if an instruction reads itself you can get a live lock using
these techniques, but it's a software-induced live lock and software can
avoid it. One way is to compare the address being read with the instruction
pointer, and if they are on the same page, emulate instead of switching p2m's.
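
Something along these lines, purely as a sketch -- emulate_current_insn()
is a stand-in for whatever emulation path the caller uses, and view_idx
for the target view:

    /* gla is the guest linear address being read; assume a flat CS. */
    if ( (gla & PAGE_MASK) == (regs->eip & PAGE_MASK) )
        emulate_current_insn(v);    /* same page: emulate, don't switch */
    else
        p2m_switch_vcpu_altp2m_by_id(current, view_idx); /* then singlestep */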

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 02/11] VMX: implement suppress #VE.
  2015-01-15 16:25   ` Tim Deegan
@ 2015-01-15 18:46     ` Ed White
  2015-01-16 17:22       ` Tim Deegan
  2015-03-25 17:30       ` Ed White
  0 siblings, 2 replies; 135+ messages in thread
From: Ed White @ 2015-01-15 18:46 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

On 01/15/2015 08:25 AM, Tim Deegan wrote:
> Hi,
> 
> At 13:26 -0800 on 09 Jan (1420806392), Ed White wrote:
>>  static inline bool_t is_epte_valid(ept_entry_t *e)
>>  {
>> -    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
>> +    return (e->valid != 0 && e->sa_p2mt != p2m_invalid);
> 
> This test for 0 is just catching uninitialised entries in freshly
> allocated pages.  Rather than changing it to ignore bit 63, this loop...
> 
>>  }
>>  
>>  /* returns : 0 for success, -errno otherwise */
>> @@ -194,6 +194,19 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
>>  
>>      ept_entry->r = ept_entry->w = ept_entry->x = 1;
>>  
>> +    /* Disable #VE on all entries */ 
>> +    if ( cpu_has_vmx_virt_exceptions )
>> +    {
>> +        ept_entry_t *table = __map_domain_page(pg);
>> +
>> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
>> +            table[i].suppress_ve = 1;
> 
> ...should set the type of the empty entries to p2m_invalid as it goes.
> 
>> +    /* Disable #VE on all entries */
>> +    if ( cpu_has_vmx_virt_exceptions )
>> +    {
>> +        ept_entry_t *table =
>> +            map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
>> +
>> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
>> +            table[i].suppress_ve = 1;
> 
> And the same here.

You mean do that unconditionally, regardless of hardware #VE?

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 05/11] x86/altp2m: basic data structures and support routines.
  2015-01-15 16:53     ` Jan Beulich
@ 2015-01-15 18:49       ` Ed White
  2015-01-16  7:37         ` Jan Beulich
  2015-01-16 17:23         ` Tim Deegan
  0 siblings, 2 replies; 135+ messages in thread
From: Ed White @ 2015-01-15 18:49 UTC (permalink / raw)
  To: Jan Beulich, Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, xen-devel

On 01/15/2015 08:53 AM, Jan Beulich wrote:
>>>> On 15.01.15 at 17:48, <tim@xen.org> wrote:
>> At 13:26 -0800 on 09 Jan (1420806395), Ed White wrote:
>>> +    /* Init alternate p2m data */
>>> +    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )
>>> +    {
>>> +        rv = -ENOMEM;
>>> +        goto out;
>>> +    }
>>> +    for (i = 0; i < 512; i++)
>>> +        d->arch.altp2m_eptp[i] = ~0ul;
>>
>> This 512 is architectural, I guess?  It should have a named constant.
> 
> Perhaps even calculated rather than just defined as a plain
> number constant, e.g. (PAGE_SIZE / sizeof(something)).

There are architectural reasons why it's very unlikely to change
from 512, so I'll just name it if that's OK.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 06/11] VMX/altp2m: add code to support EPTP switching and #VE.
  2015-01-15 16:56   ` Tim Deegan
@ 2015-01-15 18:55     ` Ed White
  2015-01-16 17:50       ` Tim Deegan
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-15 18:55 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

On 01/15/2015 08:56 AM, Tim Deegan wrote:
> Hi,
> 
> At 13:26 -0800 on 09 Jan (1420806396), Ed White wrote:
>> @@ -2551,6 +2640,17 @@ static void vmx_vmexit_ud_intercept(struct cpu_user_regs *regs)
>>          hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>          break;
>>      case X86EMUL_EXCEPTION:
>> +        /* check for a VMFUNC that should be emulated */
>> +        if ( !cpu_has_vmx_vmfunc && altp2mhvm_active(current->domain) &&
>> +             ctxt.insn_buf_bytes >= 3 && ctxt.insn_buf[0] == 0x0f &&
>> +             ctxt.insn_buf[1] == 0x01 && ctxt.insn_buf[2] == 0xd4 &&
>> +             regs->eax == 0 &&
>> +             p2m_switch_vcpu_altp2m_by_id(current, (uint16_t)regs->ecx) )
>> +        {
>> +            regs->eip += 3;
>> +            return;
>> +        }
>> +
> 
> I think Andrew already pointed out that this needs to be done by
> adding VMFUNC to the emulator itself with a callback.  Apart from
> anything else that will DTRT with prefix bytes &c.
> 
>> +        if ( (uint16_t)idx != vcpu_altp2mhvm(v).p2midx )
>> +        {
>> +            cpumask_clear_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
>> +            vcpu_altp2mhvm(v).p2midx = (uint16_t)idx;
>> +            cpumask_set_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
> 
> This looks wrong -- you need to do a TLB flush before you can remove
> this CPU from the dirty_cpumask.
> 

No, the whole point of multiple EPTP's is that you can switch between them
without a flush. The EPTP is part of the TLB tag, and you want that entry
to stay in the TLB because you're probably going to switch back and use
it again.

If you tear the whole table down you need a flush, but I think the existing
EPT code handles that. I only use the mask to make sure I don't tear down a
table that is the current table for a vcpu.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 07/11] x86/altp2m: introduce p2m_ram_rw_ve type.
  2015-01-15 17:03   ` Tim Deegan
@ 2015-01-15 20:38     ` Ed White
  2015-01-16  8:20       ` Jan Beulich
  2015-01-16 17:52       ` Tim Deegan
  0 siblings, 2 replies; 135+ messages in thread
From: Ed White @ 2015-01-15 20:38 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

On 01/15/2015 09:03 AM, Tim Deegan wrote:
> At 13:26 -0800 on 09 Jan (1420806397), Ed White wrote:
>> This is treated exactly like p2m_ram_rw, except that suppress_ve is not
>> set in the EPTE.
> 
> I don't think this is going to work -- you probably want to support
> p2m_ram_ro at least, and maybe other types, but duplicating each of
> them as a 'type foo with #VE' doesn't seem right.
> 
> Since the default is to set the ignore-#ve flag everywhere, how about
> having an operation to enable #ve for a frame that just clears that
> bit, and then having all other updates to altp2m entries preserve it?

I hear you, but #VE is only even relevant for the in-domain agent
model, and as the only current user of that model we not only don't
want #VE to work on other page types, we specifically want it to be
prohibited.

Can we do it this way, and then change it later if required?

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 09/11] x86/altp2m: define and implement alternate p2m HVMOP types.
  2015-01-15 17:09   ` Tim Deegan
@ 2015-01-15 20:43     ` Ed White
  2015-01-16 17:57       ` Tim Deegan
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-15 20:43 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

On 01/15/2015 09:09 AM, Tim Deegan wrote:
> Hi,
> 
> These _definitely_ need XSM checks, otherwise any domain can call them
> on any other!  I think you can probably copy the other p2m-munging
> operations to see how to make a sensible default policy.

Understood. I'll look at this subject again, but it's an area where
I could really use some help. There aren't any similar hypercalls that
I could find where the target domain and source domain may or may not
be the same, and the processing required varies depending on which is
the case.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 10/11] x86/altp2m: fix log-dirty handling.
  2015-01-15 17:20   ` Tim Deegan
@ 2015-01-15 20:49     ` Ed White
  2015-01-16 17:59       ` Tim Deegan
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-15 20:49 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

On 01/15/2015 09:20 AM, Tim Deegan wrote:
> Hi,
> 
> The locking chages look OK at first glance, but...
> 
> At 13:26 -0800 on 09 Jan (1420806400), Ed White wrote:
>> @@ -793,6 +793,10 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn,
>>  
>>      gfn_unlock(p2m, gfn, 0);
>>  
>> +    if ( pt == ot && altp2mhvm_active(d) )
>> +        /* make sure this page isn't valid in any alternate p2m */
>> +        p2m_remove_altp2m_page(d, gfn);
>> +
>>      return rc;
>>  }
> 
> ...this is the wrong level to be making this change at.  The hook needs
> to be right at the bottom, in atomic_write_ept_entry() (and
> hap_write_p2m_entry() for AMD, I think), to catch _every_ update of a
> p2m entry in the host p2m.
> 
> Otherwise a guest frame could be removed entirely and the altp2m would
> still map it.  Or am I missing some other path that handles that case?
> nested-p2m handles this by fairly aggressively flushing nested p2m
> tables, but that doesn't sound suitable for this since there's state
> in the alt-p2m that needs to be retained.

Hmm. Is that going to give me even more locking order problems?

I don't want to go down the nested p2m route. That is seriously bad
for performance, and I've also seen plenty of cases where a flush on
one vcpu breaks instruction emulation on another. Also, as you say,
I don't have enough information to rebuild the alt p2m.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 08/11] x86/altp2m: add remaining support routines.
  2015-01-15 17:25   ` Tim Deegan
@ 2015-01-15 20:57     ` Ed White
  2015-01-16 18:04       ` Tim Deegan
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-15 20:57 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

On 01/15/2015 09:25 AM, Tim Deegan wrote:
> Hi,
> 
> At 13:26 -0800 on 09 Jan (1420806398), Ed White wrote:
>> +int
>> +altp2mhvm_hap_nested_page_fault(struct vcpu *v, paddr_t gpa,
>> +                                unsigned long gla, struct npfec npfec)
>> +{
>> +    struct domain *d = v->domain;
>> +    struct p2m_domain *hp2m = p2m_get_hostp2m(d);
>> +    struct p2m_domain *ap2m;
>> +    p2m_type_t p2mt;
>> +    p2m_access_t p2ma;
>> +    unsigned int page_order;
>> +    unsigned long gfn, mask;
>> +    mfn_t mfn;
>> +    int rv;
>> +
>> +    ap2m = p2m_get_altp2m(v);
>> +
>> +    mfn = get_gfn_type_access(ap2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
>> +                              0, &page_order);
>> +    __put_gfn(ap2m, gpa >> PAGE_SHIFT);
>> +
>> +    if ( mfn_valid(mfn) )
>> +    {
>> +        /* Should #VE be emulated for this fault? */
>> +        if ( p2mt == p2m_ram_rw_ve && !cpu_has_vmx_virt_exceptions &&
>> +             ahvm_vcpu_emulate_ve(v) )
>> +            return ALTP2MHVM_PAGEFAULT_DONE;
>> +
>> +        /* Could not handle fault here */
>> +        gdprintk(XENLOG_INFO, "Altp2m memory access permissions failure, "
>> +                              "no mem_event listener VCPU %d, dom %d\n",
>> +                              v->vcpu_id, d->domain_id);
>> +        domain_crash(v->domain);
>> +        return ALTP2MHVM_PAGEFAULT_CONTINUE;
>> +    }
>> +
>> +    mfn = get_gfn_type_access(hp2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
>> +                              0, &page_order);
>> +    put_gfn(hp2m->domain, gpa >> PAGE_SHIFT);
>> +
>> +    if ( p2mt != p2m_ram_rw || p2ma != p2m_access_rwx )
>> +        return ALTP2MHVM_PAGEFAULT_CONTINUE;
> 
> I don't follow -- surely the altp2m ought to contain everything that's
> in the host p2m except for deliberate extra changes.  But it looks
> like here you just bail on anything other than rwx RAM.  That's going
> to livelock on anything that the main fault handler thinks is OK to retry.

It sounds like there must be some cases that I'm not aware of.

I've only tested Windows, and I've never seen anything other than rwx ram
where the hardware is ever able to retry and succeed. That's why I don't
copy anything else to the alternate p2m, because my experience has been
that everything else can be resolved in the host p2m and its nested page
fault handler. As I explained to Jan, if the host p2m page becomes rwx ram
and *then* there's a retry, I copy that EPTE on the violation that the retry
triggers and retry again.
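
Concretely, the copy step amounts to something like this (a sketch only,
using the set_entry convention from the patch, where 0 means success):

    /* Host entry is now ordinary rwx ram: mirror it into the altp2m
     * at 4K granularity, then let the guest retry the access. */
    if ( ap2m->set_entry(ap2m, gpa >> PAGE_SHIFT, mfn, PAGE_ORDER_4K,
                         p2mt, p2ma) == 0 )
        return ALTP2MHVM_PAGEFAULT_DONE;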

I want to get this right. Can you tell me what I'm missing?

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 08/11] x86/altp2m: add remaining support routines.
  2015-01-15 17:33   ` Tim Deegan
@ 2015-01-15 21:00     ` Ed White
  2015-01-16  8:24       ` Jan Beulich
  2015-01-16 18:09       ` Tim Deegan
  0 siblings, 2 replies; 135+ messages in thread
From: Ed White @ 2015-01-15 21:00 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

On 01/15/2015 09:33 AM, Tim Deegan wrote:
> Hi,
> 
> Sorry for the fractured replies - my notes are confused about which
> functions were defined where.
> 
> At 13:26 -0800 on 09 Jan (1420806398), Ed White wrote:
>> +bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
>> +                             unsigned long old_pfn, unsigned long new_pfn)
>> +{
> [...]
>> +    mfn = ap2m->get_entry(ap2m, new_pfn, &t, &a, 0, NULL);
>> +
>> +    if ( !mfn_valid(mfn) )
>> +        mfn = hp2m->get_entry(hp2m, new_pfn, &t, &a, 0, NULL);
>> +
>> +    if ( !mfn_valid(mfn) || !(t == p2m_ram_rw || t == p2m_ram_rw) )
>> +        goto out;
>> +
>> +    /* Use special ram type to enable #VE if setting for current domain */
>> +    if ( current->domain == d )
>> +        t = p2m_ram_rw_ve;
>> +
>> +    if ( !ap2m->set_entry(ap2m, old_pfn, mfn, PAGE_ORDER_4K, t, a) )
>> +        rc = 1;
> 
> I'm afraid this is Terribly Unsafe[tm].  Following on from my point on
> the log-dirty patch, if the original gfn gets removed from the guest,
> for any reason, we need a way to find and remove this mapping too.
> 
> That will be non-trivial, since you can't do it by exhaustive search.
> Maybe some sort of reverse mapping?

How often is it likely that a page will be removed? If it's
infrequent, maybe an exhaustive search will suffice. I don't
expect there to be anywhere near 10 alternates in use in most
cases, and they are sparsely populated.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-15 17:28                     ` Ed White
  2015-01-15 17:45                       ` Tim Deegan
@ 2015-01-16  7:35                       ` Jan Beulich
  2015-01-16 16:54                         ` Ed White
  1 sibling, 1 reply; 135+ messages in thread
From: Jan Beulich @ 2015-01-16  7:35 UTC (permalink / raw)
  To: Ed White, Tamas K Lengyel
  Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan, xen-devel,
	Ian Jackson

>>> On 15.01.15 at 18:28, <edmund.h.white@intel.com> wrote:
> On 01/15/2015 12:16 AM, Jan Beulich wrote:
>>>>> On 14.01.15 at 18:35, <edmund.h.white@intel.com> wrote:
>>> Right. The key observation is that at any single point in time, a given
>>> hardware thread can be fetching an instruction or reading data, but not
>>> both.
>> 
>> Fine, as long as an instruction reading itself isn't going to lead to
>> a live lock.
> 
> That's not how the hardware works. By the time you figure out that the
> instruction you are executing reads memory, the instruction itself has
> been fetched and decoded. That won't happen again during this execution.
> 
> That's true for every CPU I've ever seen, not just Intel ones.

Certainly not if either the fetch or the data read faults/vmexits.

Jan

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 05/11] x86/altp2m: basic data structures and support routines.
  2015-01-15 18:49       ` Ed White
@ 2015-01-16  7:37         ` Jan Beulich
  2015-01-16 17:23         ` Tim Deegan
  1 sibling, 0 replies; 135+ messages in thread
From: Jan Beulich @ 2015-01-16  7:37 UTC (permalink / raw)
  To: Ed White; +Cc: keir, Tim Deegan, ian.jackson, ian.campbell, xen-devel

>>> On 15.01.15 at 19:49, <edmund.h.white@intel.com> wrote:
> On 01/15/2015 08:53 AM, Jan Beulich wrote:
>>>>> On 15.01.15 at 17:48, <tim@xen.org> wrote:
>>> At 13:26 -0800 on 09 Jan (1420806395), Ed White wrote:
>>>> +    /* Init alternate p2m data */
>>>> +    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )
>>>> +    {
>>>> +        rv = -ENOMEM;
>>>> +        goto out;
>>>> +    }
>>>> +    for (i = 0; i < 512; i++)
>>>> +        d->arch.altp2m_eptp[i] = ~0ul;
>>>
>>> This 512 is architectural, I guess?  It should have a named constant.
>> 
>> Perhaps even calculated rather than just defined as a plain
>> number constant, e.g. (PAGE_SIZE / sizeof(something)).
> 
> There are architectural reasons why it's very unlikely to change
> from 512, so I'll just name it if that's OK.

Imo if it's a calculated value (like the number of page table entries
in a page table page is), it should be defined like this. PAGE_SIZE
and the appropriate sizeof() won't change either after all.
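
E.g. (the name being purely illustrative):

    /* One EPTP per 8-byte slot in the page - 512 with 4K pages. */
    #define MAX_ALTP2M_EPTP (PAGE_SIZE / sizeof(uint64_t))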

Jan

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-15 18:23   ` Ed White
@ 2015-01-16  8:12     ` Jan Beulich
  2015-01-16 17:01       ` Ed White
  2015-01-16 18:33     ` Tim Deegan
  2015-03-25 17:41     ` Ed White
  2 siblings, 1 reply; 135+ messages in thread
From: Jan Beulich @ 2015-01-16  8:12 UTC (permalink / raw)
  To: Ed White; +Cc: keir, Tim Deegan, ian.jackson, ian.campbell, xen-devel

>>> On 15.01.15 at 19:23, <edmund.h.white@intel.com> wrote:
> On 01/15/2015 08:15 AM, Tim Deegan wrote:
>> - Feature compatibility/completeness.  You pointed out yourself that
>>   it doesn't work with nested HVM or migration.  I think I'd have to
>>   add mem_event/access/paging and PCI passthrough to the list of
>>   features that ought to still work.  I'm resigned to the idea that
>>   many new features don't work with shadow pagetables. :)
>> 
> 
> The intention is that mem_event/access should still work. I haven't
> specifically looked at paging, but I don't see any fundamental reason
> why it shouldn't. PCI passthrough I suspect won't. Does nested HVM
> work with migration? Is it simply not acceptable to submit a feature
> as experimental, with known compatibility issues? I had assumed that
> it was, based on the nested HVM status as documented in the release
> notes.

It is generally acceptable, sure. But the sad thing here is that
particularly with such code coming from Intel (with the nested
VMX code being a good example, and certain IOMMU/pass-
through things being another) my experience is that once that
initial code drop happened, interest from the original authors is
lost (possibly not maliciously, but because of being assigned other
tasks) and the code therefore has to remain experimental for
extended periods of time (often until someone else gets
frustrated enough about its half baked state to spend non-
negligible amounts of time to deal with the loose ends).

Jan

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 07/11] x86/altp2m: introduce p2m_ram_rw_ve type.
  2015-01-15 20:38     ` Ed White
@ 2015-01-16  8:20       ` Jan Beulich
  2015-01-16 17:14         ` Ed White
  2015-01-16 17:52       ` Tim Deegan
  1 sibling, 1 reply; 135+ messages in thread
From: Jan Beulich @ 2015-01-16  8:20 UTC (permalink / raw)
  To: Ed White; +Cc: keir, Tim Deegan, ian.jackson, ian.campbell, xen-devel

>>> On 15.01.15 at 21:38, <edmund.h.white@intel.com> wrote:
> On 01/15/2015 09:03 AM, Tim Deegan wrote:
>> At 13:26 -0800 on 09 Jan (1420806397), Ed White wrote:
>>> This is treated exactly like p2m_ram_rw, except that suppress_ve is not
>>> set in the EPTE.
>> 
>> I don't think this is going to work -- you probably want to support
>> p2m_ram_ro at least, and maybe other types, but duplicating each of
>> them as a 'type foo with #VE' doesn't seem right.
>> 
>> Since the default is to set the ignore-#ve flag everywhere, how about
>> having an operation to enable #ve for a frame that just clears that
>> bit, and then having all other updates to altp2m entries preserve it?
> 
> I hear you, but #VE is only even relevant for the in-domain agent
> model, and as the only current user of that model we not only don't
> want #VE to work on other page types, we specifically want it to be
> prohibited.
> 
> Can we do it this way, and then change it later if required?

Without you explaining to us the full details of the in-domain
agent model, I'm afraid this is going to remain dubious and the
question hard to answer. In particular, if you indeed want to
prohibit that behavior on _all_ other p2m types, how would
subsequently changing the implementation then be compatible
(if it can't be done this way right from the beginning)?

Jan

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 08/11] x86/altp2m: add remaining support routines.
  2015-01-15 21:00     ` Ed White
@ 2015-01-16  8:24       ` Jan Beulich
  2015-01-16 17:17         ` Ed White
  2015-01-16 18:09       ` Tim Deegan
  1 sibling, 1 reply; 135+ messages in thread
From: Jan Beulich @ 2015-01-16  8:24 UTC (permalink / raw)
  To: Ed White; +Cc: keir, Tim Deegan, ian.jackson, ian.campbell, xen-devel

>>> On 15.01.15 at 22:00, <edmund.h.white@intel.com> wrote:
> On 01/15/2015 09:33 AM, Tim Deegan wrote:
>> Hi,
>> 
>> Sorry for the fractured replies - my notes are confused about which
>> functions were defined where.
>> 
>> At 13:26 -0800 on 09 Jan (1420806398), Ed White wrote:
>>> +bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
>>> +                             unsigned long old_pfn, unsigned long new_pfn)
>>> +{
>> [...]
>>> +    mfn = ap2m->get_entry(ap2m, new_pfn, &t, &a, 0, NULL);
>>> +
>>> +    if ( !mfn_valid(mfn) )
>>> +        mfn = hp2m->get_entry(hp2m, new_pfn, &t, &a, 0, NULL);
>>> +
>>> +    if ( !mfn_valid(mfn) || !(t == p2m_ram_rw || t == p2m_ram_rw) )
>>> +        goto out;
>>> +
>>> +    /* Use special ram type to enable #VE if setting for current domain */
>>> +    if ( current->domain == d )
>>> +        t = p2m_ram_rw_ve;
>>> +
>>> +    if ( !ap2m->set_entry(ap2m, old_pfn, mfn, PAGE_ORDER_4K, t, a) )
>>> +        rc = 1;
>> 
>> I'm afraid this is Terribly Unsafe[tm].  Following on from my point on
>> the log-dirty patch, if the original gfn gets removed from the guest,
>> for any reason, we need a way to find and remove this mapping too.
>> 
>> That will be non-trivial, since you can't do it by exhaustive search.
>> Maybe some sort of reverse mapping?
> 
> How often is it likely that a page will be removed? If it's
> infrequent, maybe an exhaustive search will suffice. I don't
> expect there to be anywhere near 10 alternates in use in most
> cases, and they are sparsely populated.

A fundamental thing you need to keep in mind when considering
exhaustive searches is that these need to be preemptible, no
matter how infrequent they are. Which may be difficult to
arrange for, based on experience with code where we needed
to add such preemption later on (as security fixes).
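
The usual shape is something like this (a sketch only - the continuation
plumbing is the part that takes the work):

    for ( ; gfn <= max_gfn; gfn++ )
    {
        /* ... examine/clear one altp2m mapping ... */
        if ( !(gfn & 0xff) && hypercall_preempt_check() )
            /* Stash gfn and arrange a hypercall continuation. */
            return -ERESTART;
    }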

Jan

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-15 17:31                     ` Ed White
@ 2015-01-16 10:43                       ` Tamas K Lengyel
  2015-01-16 17:21                         ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Tamas K Lengyel @ 2015-01-16 10:43 UTC (permalink / raw)
  To: Ed White
  Cc: Tim Deegan, Keir Fraser, Ian Campbell, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich

On Thu, Jan 15, 2015 at 6:31 PM, Ed White <edmund.h.white@intel.com> wrote:
> On 01/15/2015 02:39 AM, Tamas K Lengyel wrote:
>>> There are ways of avoiding the
>>> single-step too, although I don't think that falls within the scope
>>> of this conversation.
>>>
>>> Ed
>>
>> I would be very interested in knowing how we can avoid the singlestep
>> phase. Are you envisioning using this with a split-TLB? IMHO this is a
>> pretty critical component of effectively using the new feature so
>> should be within scope of this discussion.
>>
>
> It's an optimization certainly, but it's not required, and it's not
> a technique we have placed in the public domain. You could try talking
> to us under NDA or figure it out for yourself, but I can't detail
> it here.
>
> Ed

That's unfortunate. I would have been happy to extend the xen-access
test tool to exercise this feature, but contributing code that is known
to be sub-optimal from the beginning just doesn't feel right.

Tamas

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-16  7:35                       ` Jan Beulich
@ 2015-01-16 16:54                         ` Ed White
  0 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-01-16 16:54 UTC (permalink / raw)
  To: Jan Beulich, Tamas K Lengyel
  Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Tim Deegan, xen-devel,
	Ian Jackson

On 01/15/2015 11:35 PM, Jan Beulich wrote:
>>>> On 15.01.15 at 18:28, <edmund.h.white@intel.com> wrote:
>> On 01/15/2015 12:16 AM, Jan Beulich wrote:
>>>>>> On 14.01.15 at 18:35, <edmund.h.white@intel.com> wrote:
>>>> Right. The key observation is that at any single point in time, a given
>>>> hardware thread can be fetching an instruction or reading data, but not
>>>> both.
>>>
>>> Fine, as long as an instruction reading itself isn't going to lead to
>>> a live lock.
>>
>> That's not how the hardware works. By the time you figure out that the
>> instruction you are executing reads memory, the instruction itself has
>> been fetched and decoded. That won't happen again during this execution.
>>
>> That's true for every CPU I've ever seen, not just Intel ones.
> 
> Certainly not if either the fetch or the data read faults/vmexits.

Sorry about that. As I explained, I rushed off a response before
actually engaging my brain.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-16  8:12     ` Jan Beulich
@ 2015-01-16 17:01       ` Ed White
  0 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-01-16 17:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: keir, Tim Deegan, ian.jackson, ian.campbell, xen-devel

On 01/16/2015 12:12 AM, Jan Beulich wrote:
>>>> On 15.01.15 at 19:23, <edmund.h.white@intel.com> wrote:
>> On 01/15/2015 08:15 AM, Tim Deegan wrote:
>>> - Feature compatibility/completeness.  You pointed out yourself that
>>>   it doesn't work with nested HVM or migration.  I think I'd have to
>>>   add mem_event/access/paging and PCI passthrough to the list of
>>>   features that ought to still work.  I'm resigned to the idea that
>>>   many new features don't work with shadow pagetables. :)
>>>
>>
>> The intention is that mem_event/access should still work. I haven't
>> specifically looked at paging, but I don't see any fundamental reason
>> why it shouldn't. PCI passthrough I suspect won't. Does nested HVM
>> work with migration? Is it simply not acceptable to submit a feature
>> as experimental, with known compatibility issues? I had assumed that
>> it was, based on the nested HVM status as documented in the release
>> notes.
> 
> It is generally acceptable, sure. But the sad thing here is that
> particularly with such code coming from Intel (with the nested
> VMX code being a good example, and certain IOMMU/pass-
> through things being another) my experience is that once that
> initial code drop happened, interest from the original authors is
> lost (possibly not maliciously, but because of being assigned other
> tasks) and the code therefore has to remain experimental for
> extended periods of time (often until someone else gets
> frustrated enough about its half-baked state to spend non-
> negligible amounts of time to deal with the loose ends).

I can't make any guarantees, I'm just one person and I do what
the people who pay me tell me to do.

What I can tell you is that our primary motivation for producing
this patch series is that we have an internal customer already
shipping Xen-based products who wants to use these capabilities.
That customer isn't going to go away, and isn't going to want to
support something half-baked.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 07/11] x86/altp2m: introduce p2m_ram_rw_ve type.
  2015-01-16  8:20       ` Jan Beulich
@ 2015-01-16 17:14         ` Ed White
  2015-01-19  8:49           ` Jan Beulich
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-16 17:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: keir, Tim Deegan, ian.jackson, ian.campbell, xen-devel

On 01/16/2015 12:20 AM, Jan Beulich wrote:
>>>> On 15.01.15 at 21:38, <edmund.h.white@intel.com> wrote:
>> On 01/15/2015 09:03 AM, Tim Deegan wrote:
>>> At 13:26 -0800 on 09 Jan (1420806397), Ed White wrote:
>>>> This is treated exactly like p2m_ram_rw, except that suppress_ve is not
>>>> set in the EPTE.
>>>
>>> I don't think this is going to work -- you probably want to support
>>> p2m_ram_ro at least, and maybe other types, but duplicating each of
>>> them as a 'type foo with #VE' doesn't seem right.
>>>
>>> Since the default is to set the ignore-#ve flag everywhere, how about
>>> having an operation to enable #ve for a frame that just clears that
>>> bit, and then having all other updates to altp2m entries preserve it?
>>
>> I hear you, but #VE is only even relevant for the in-domain agent
>> model, and as the only current user of that model we not only don't
>> want #VE to work on other page types, we specifically want it to be
>> prohibited.
>>
>> Can we do it this way, and then change it later if required?
> 
> Without you explaining to us the full details of the in-domain
> agent model, I'm afraid this is going to remain dubious and the
> question hard to answer. In particular, if you indeed want to
> prohibit that behavior on _all_ other p2m types, how would
> subsequently changing the implementation then be compatible
> (if it can't be done this way right from the beginning)?

I think I have explained it. There is software already commercially
available that uses a security hypervisor to partition memory at a
level below the OS page tables for Windows running on physical
hardware. We want to make that possible for Windows in a Xen domU.
The security hypervisor uses multiple EPTP's to apply different
access permissions to some guest physical addresses in different
views (p2m's) and in certain cases applies different
guest physical->host physical (gfn->mfn) mappings to some pages
between different views. The only pages to which any of these
operations is applied are pages which are rwx ram at a hardware level.
The security hypervisor works in concert with a protected agent that
runs inside the OS.

I don't see any great difficulty at all in implementing the #VE
functionality through a p2m type initially and subsequently adding
a more generic facility. What do you see as the problem there?

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 08/11] x86/altp2m: add remaining support routines.
  2015-01-16  8:24       ` Jan Beulich
@ 2015-01-16 17:17         ` Ed White
  2015-01-19  8:52           ` Jan Beulich
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-16 17:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: keir, Tim Deegan, ian.jackson, ian.campbell, xen-devel

On 01/16/2015 12:24 AM, Jan Beulich wrote:
>>>> On 15.01.15 at 22:00, <edmund.h.white@intel.com> wrote:
>> On 01/15/2015 09:33 AM, Tim Deegan wrote:
>>> Hi,
>>>
>>> Sorry for the fractured replies - my notes are confused about which
>>> functions were defined where.
>>>
>>> At 13:26 -0800 on 09 Jan (1420806398), Ed White wrote:
>>>> +bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
>>>> +                             unsigned long old_pfn, unsigned long new_pfn)
>>>> +{
>>> [...]
>>>> +    mfn = ap2m->get_entry(ap2m, new_pfn, &t, &a, 0, NULL);
>>>> +
>>>> +    if ( !mfn_valid(mfn) )
>>>> +        mfn = hp2m->get_entry(hp2m, new_pfn, &t, &a, 0, NULL);
>>>> +
>>>> +    if ( !mfn_valid(mfn) || !(t == p2m_ram_rw || t == p2m_ram_rw) )
>>>> +        goto out;
>>>> +
>>>> +    /* Use special ram type to enable #VE if setting for current domain */
>>>> +    if ( current->domain == d )
>>>> +        t = p2m_ram_rw_ve;
>>>> +
>>>> +    if ( !ap2m->set_entry(ap2m, old_pfn, mfn, PAGE_ORDER_4K, t, a) )
>>>> +        rc = 1;
>>>
>>> I'm afraid this is Terribly Unsafe[tm].  Following on from my point on
>>> the log-dirty patch, if the original gfn gets removed from the guest,
>>> for any reason, we need a way to find and remove this mapping too.
>>>
>>> That will be non-trivial, since you can't do it by exhaustive search.
>>> Maybe some sort of reverse mapping?
>>
>> How often is it likely that a page will be removed? If it's
>> infrequent, maybe an exhaustive search will suffice. I don't
>> expect there to be anywhere near 10 alternates in use in most
>> cases, and they are sparsely populated.
> 
> A fundamental thing you need to keep in mind when considering
> exhaustive searches is that these need to be preemptible, no
> matter how infrequent they are. Which may be difficult to
> arrange for, based on experience with code where we needed
> to add such preemption later on (as security fixes).

I've seen plenty of activity on the list regarding pre-emptible
hypercalls, but I'm not clear what level of DoS this prevents. Is
it to guard against a DoS on a single domain, or on the whole system?

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-16 10:43                       ` Tamas K Lengyel
@ 2015-01-16 17:21                         ` Ed White
  0 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-01-16 17:21 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Tim Deegan, Keir Fraser, Ian Campbell, Andrew Cooper,
	Ian Jackson, xen-devel, Jan Beulich

On 01/16/2015 02:43 AM, Tamas K Lengyel wrote:
> On Thu, Jan 15, 2015 at 6:31 PM, Ed White <edmund.h.white@intel.com> wrote:
>> On 01/15/2015 02:39 AM, Tamas K Lengyel wrote:
>>>> There are ways of avoiding the
>>>> single-step too, although I don't think that falls within the scope
>>>> of this conversation.
>>>>
>>>> Ed
>>>
>>> I would be very interested in knowing how we can avoid the singlestep
>>> phase. Are you envisioning using this with a split-TLB? IMHO this is a
>>> pretty critical component of effectively using the new feature so
>>> should be within scope of this discussion.
>>>
>>
>> It's an optimization certainly, but it's not required, and it's not
>> a technique we have placed in the public domain. You could try talking
>> to us under NDA or figure it out for yourself, but I can't detail
>> it here.
>>
>> Ed
> 
> That's unfortunate. I would have been happy to extend the xen-access
> test tool to exercise this feature but contributing code that is known
> to be sub-optimal from the beginning just doesn't feel right.

If you don't feel it's worth improving the existing solution unless
you can make it perfect, that's your decision. I don't get to decide
what Intel makes public and what it doesn't.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 02/11] VMX: implement suppress #VE.
  2015-01-15 18:46     ` Ed White
@ 2015-01-16 17:22       ` Tim Deegan
  2015-03-25 17:30       ` Ed White
  1 sibling, 0 replies; 135+ messages in thread
From: Tim Deegan @ 2015-01-16 17:22 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

At 10:46 -0800 on 15 Jan (1421315210), Ed White wrote:
> On 01/15/2015 08:25 AM, Tim Deegan wrote:
> > Hi,
> > 
> > At 13:26 -0800 on 09 Jan (1420806392), Ed White wrote:
> >>  static inline bool_t is_epte_valid(ept_entry_t *e)
> >>  {
> >> -    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
> >> +    return (e->valid != 0 && e->sa_p2mt != p2m_invalid);
> > 
> > This test for 0 is just catching uninitialised entries in freshly
> > allocated pages.  Rather than changing it to ignore bit 63, this loop...
> > 
> >>  }
> >>  
> >>  /* returns : 0 for success, -errno otherwise */
> >> @@ -194,6 +194,19 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
> >>  
> >>      ept_entry->r = ept_entry->w = ept_entry->x = 1;
> >>  
> >> +    /* Disable #VE on all entries */ 
> >> +    if ( cpu_has_vmx_virt_exceptions )
> >> +    {
> >> +        ept_entry_t *table = __map_domain_page(pg);
> >> +
> >> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
> >> +            table[i].suppress_ve = 1;
> > 
> > ...should set the type of the empty entries to p2m_invalid as it goes.
> > 
> >> +    /* Disable #VE on all entries */
> >> +    if ( cpu_has_vmx_virt_exceptions )
> >> +    {
> >> +        ept_entry_t *table =
> >> +            map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
> >> +
> >> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
> >> +            table[i].suppress_ve = 1;
> > 
> > And the same here.
> 
> You mean do that unconditionally, regardless of hardware #VE?

Yes, I think that would be OK, and make for slightly cleaner code.  If
you prefer, you can set the type only for #VE hardware and leave the
'e->epte != 0' test as it is.
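
I.e., the loop would become something like (sketch):

    for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
    {
        table[i].suppress_ve = 1;
        table[i].sa_p2mt = p2m_invalid;
    }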

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 05/11] x86/altp2m: basic data structures and support routines.
  2015-01-15 18:49       ` Ed White
  2015-01-16  7:37         ` Jan Beulich
@ 2015-01-16 17:23         ` Tim Deegan
  1 sibling, 0 replies; 135+ messages in thread
From: Tim Deegan @ 2015-01-16 17:23 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, Jan Beulich, xen-devel

At 10:49 -0800 on 15 Jan (1421315382), Ed White wrote:
> On 01/15/2015 08:53 AM, Jan Beulich wrote:
> >>>> On 15.01.15 at 17:48, <tim@xen.org> wrote:
> >> At 13:26 -0800 on 09 Jan (1420806395), Ed White wrote:
> >>> +    /* Init alternate p2m data */
> >>> +    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )
> >>> +    {
> >>> +        rv = -ENOMEM;
> >>> +        goto out;
> >>> +    }
> >>> +    for (i = 0; i < 512; i++)
> >>> +        d->arch.altp2m_eptp[i] = ~0ul;
> >>
> >> This 512 is architectural, I guess?  It should have a named constant.
> > 
> > Perhaps even calculated rather than just defined as a plain
> > number constant, e.g. (PAGE_SIZE / sizeof(something)).
> 
> There are architectural reasons why it's very unlikely to change
> from 512, so I'll just name it if that's OK.

Either is fine by me -- I'll leave that beween you and Jan.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 06/11] VMX/altp2m: add code to support EPTP switching and #VE.
  2015-01-15 18:55     ` Ed White
@ 2015-01-16 17:50       ` Tim Deegan
  2015-01-16 17:57         ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-16 17:50 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

At 10:55 -0800 on 15 Jan (1421315724), Ed White wrote:
> On 01/15/2015 08:56 AM, Tim Deegan wrote:
> > Hi,
> > 
> > At 13:26 -0800 on 09 Jan (1420806396), Ed White wrote:
> >> @@ -2551,6 +2640,17 @@ static void vmx_vmexit_ud_intercept(struct cpu_user_regs *regs)
> >>          hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
> >>          break;
> >>      case X86EMUL_EXCEPTION:
> >> +        /* check for a VMFUNC that should be emulated */
> >> +        if ( !cpu_has_vmx_vmfunc && altp2mhvm_active(current->domain) &&
> >> +             ctxt.insn_buf_bytes >= 3 && ctxt.insn_buf[0] == 0x0f &&
> >> +             ctxt.insn_buf[1] == 0x01 && ctxt.insn_buf[2] == 0xd4 &&
> >> +             regs->eax == 0 &&
> >> +             p2m_switch_vcpu_altp2m_by_id(current, (uint16_t)regs->ecx) )
> >> +        {
> >> +            regs->eip += 3;
> >> +            return;
> >> +        }
> >> +
> > 
> > I think Andrew already pointed out that this needs to be done by
> > adding VMFUNC to the emulator itself with a callback.  Apart from
> > anything else that will DTRT with prefix bytes &c.
> > 
> >> +        if ( (uint16_t)idx != vcpu_altp2mhvm(v).p2midx )
> >> +        {
> >> +            cpumask_clear_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
> >> +            vcpu_altp2mhvm(v).p2midx = (uint16_t)idx;
> >> +            cpumask_set_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
> > 
> > This looks wrong -- you need to do a TLB flush before you can remove
> > this CPU from the dirty_cpumask.
> > 
> 
> No, the whole point of multiple EPTP's is that you can switch between them
> without a flush. The EPTP is part of the TLB tag, and you want that entry
> to stay in the TLB because you're probably going to switch back and use
> it again.

That's actually what I was worried about...

> If you tear the whole table down you need a flush, but I think the
> existing EPT code handles that.  I only use the mask to make sure I
> don't tear down a table that is the current table for a vcpu.

and this is why I was confused.  The meaning of 'dirty_cpumask' in Xen
generally is 'all CPUs that might hold state derived from this',
i.e. all the CPUs you'd have to IPI if you wanted to be sure that a
mapping you removed from this table wasn't still cached.  IOW, this
could be used to mask down flush IPIs when p2m updates happen to this
table.

Looking at the code, the current (non-nested) HAP code uses the
_domain_'s dirty_cpumask for all flushes, so for altp2m this field is
not needed.

I'm not comfortable with it being reused for something
almost-but-not-quite like the usual semantics, though.  Can you please
use a simple counter for this instead?
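
E.g. (a sketch; the field name is illustrative):

    /* Per-altp2m count of vcpus currently running on this table. */
    atomic_t active_vcpus;

    /* On a view switch: */
    atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
    vcpu_altp2mhvm(v).p2midx = idx;
    atomic_inc(&p2m_get_altp2m(v)->active_vcpus);

    /* Teardown is only safe once active_vcpus has dropped to zero. */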

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 07/11] x86/altp2m: introduce p2m_ram_rw_ve type.
  2015-01-15 20:38     ` Ed White
  2015-01-16  8:20       ` Jan Beulich
@ 2015-01-16 17:52       ` Tim Deegan
  2015-01-16 18:35         ` Ed White
  1 sibling, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-16 17:52 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

At 12:38 -0800 on 15 Jan (1421321902), Ed White wrote:
> On 01/15/2015 09:03 AM, Tim Deegan wrote:
> > At 13:26 -0800 on 09 Jan (1420806397), Ed White wrote:
> >> This is treated exactly like p2m_ram_rw, except that suppress_ve is not
> >> set in the EPTE.
> > 
> > I don't think this is going to work -- you probably want to support
> > p2m_ram_ro at least, and maybe other types, but duplicating each of
> > them as a 'type foo with #VE' doesn't seem right.
> > 
> > Since the default is to set the ignore-#ve flag everywhere, how about
> > having an operation to enable #ve for a frame that just clears that
> > bit, and then having all other updates to altp2m entries preserve it?
> 
> I hear you, but #VE is only even relevant for the in-domain agent
> model, and as the only current user of that model we not only don't
> want #VE to work on other page types, we specifically want it to be
> prohibited.

I see.  I think it would be very useful if you could add some
documentation of the new feature, covering this sort of thing, as well
as the exact semantics of the hypercalls.

> Can we do it this way, and then change it later if required?

No thank you.  It shouldn't be hard to do it the clean way from the
start.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 09/11] x86/altp2m: define and implement alternate p2m HVMOP types.
  2015-01-15 20:43     ` Ed White
@ 2015-01-16 17:57       ` Tim Deegan
  0 siblings, 0 replies; 135+ messages in thread
From: Tim Deegan @ 2015-01-16 17:57 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

At 12:43 -0800 on 15 Jan (1421322197), Ed White wrote:
> On 01/15/2015 09:09 AM, Tim Deegan wrote:
> > Hi,
> > 
> > These _definitely_ need XSM checks, otherwise any domain can call them
> > on any other!  I think you can probably copy the other p2m-munging
> > operations to see how to make a sensible default policy.
> 
> Understood. I'll look at this subject again, but it's an area where
> I could really use some help. There aren't any similar hypercalls that
> I could find where the target domain and source domain may or may not
> be the same, and the processing required varies depending on which is
> the case.

Yeah, this stuff is a bit non-obvious.  IIUC what you want is
basically for all these operations to be available to either the VM
itself or a privileged helper in another domain.  The shorthand for
that is XSM_TARGET, so you should be ok with something like

rc = xsm_hvm_control(XSM_TARGET, d, op);
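
In context, the whole handler might look something like this (the
argument structure and the dispatch helper are invented purely for
illustration):

    static int hvmop_altp2m(XEN_GUEST_HANDLE_PARAM(xen_hvm_altp2m_op_t) arg)
    {
        struct xen_hvm_altp2m_op a;   /* hypothetical op structure */
        struct domain *d;
        int rc;

        if ( copy_from_guest(&a, arg, 1) )
            return -EFAULT;

        /* Resolves DOMID_SELF too, so both callers use the same path. */
        rc = rcu_lock_domain_by_any_id(a.domid, &d);
        if ( rc != 0 )
            return rc;

        /* XSM_TARGET: the domain itself or a privileged helper. */
        rc = xsm_hvm_control(XSM_TARGET, d, a.cmd);
        if ( rc == 0 )
            rc = altp2m_do_op(d, &a);   /* hypothetical dispatcher */

        rcu_unlock_domain(d);
        return rc;
    }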

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 06/11] VMX/altp2m: add code to support EPTP switching and #VE.
  2015-01-16 17:50       ` Tim Deegan
@ 2015-01-16 17:57         ` Ed White
  0 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-01-16 17:57 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

On 01/16/2015 09:50 AM, Tim Deegan wrote:
> At 10:55 -0800 on 15 Jan (1421315724), Ed White wrote:
>> On 01/15/2015 08:56 AM, Tim Deegan wrote:
>>> Hi,
>>>
>>> At 13:26 -0800 on 09 Jan (1420806396), Ed White wrote:
>>>> @@ -2551,6 +2640,17 @@ static void vmx_vmexit_ud_intercept(struct cpu_user_regs *regs)
>>>>          hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
>>>>          break;
>>>>      case X86EMUL_EXCEPTION:
>>>> +        /* check for a VMFUNC that should be emulated */
>>>> +        if ( !cpu_has_vmx_vmfunc && altp2mhvm_active(current->domain) &&
>>>> +             ctxt.insn_buf_bytes >= 3 && ctxt.insn_buf[0] == 0x0f &&
>>>> +             ctxt.insn_buf[1] == 0x01 && ctxt.insn_buf[2] == 0xd4 &&
>>>> +             regs->eax == 0 &&
>>>> +             p2m_switch_vcpu_altp2m_by_id(current, (uint16_t)regs->ecx) )
>>>> +        {
>>>> +            regs->eip += 3;
>>>> +            return;
>>>> +        }
>>>> +
>>>
>>> I think Andrew already pointed out that this needs to be done by
>>> adding VMFUNC to the emulator itself with a callback.  Apart from
>>> anything else that will DTRT with prefix bytes &c.
>>>
>>>> +        if ( (uint16_t)idx != vcpu_altp2mhvm(v).p2midx )
>>>> +        {
>>>> +            cpumask_clear_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
>>>> +            vcpu_altp2mhvm(v).p2midx = (uint16_t)idx;
>>>> +            cpumask_set_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
>>>
>>> This looks wrong -- you need to do a TLB flush before you can remove
>>> this CPU from the dirty_cpumask.
>>>
>>
>> No, the whole point of multiple EPTP's is that you can switch between them
>> without a flush. The EPTP is part of the TLB tag, and you want that entry
>> to stay in the TLB because you're probably going to switch back and use
>> it again.
> 
> That's actually what I was worried about...
> 
>> If you tear the whole table down you need a flush, but I think the
>> existing EPT code handles that.  I only use the mask to make sure I
>> don't tear down a table that is the current table for a vcpu.
> 
> and this is why I was confused.  The meaning of 'dirty_cpumask' in Xen
> generally is 'all CPUs that might hold state derived from this',
> i.e. all the CPUs you'd have to IPI if you wanted to be sure that a
> mapping you removed from this table wasn't still cached.  IOW, this
> could be used to mask down flush IPIs when p2m updates happen to this
> table.
> 
> Looking at the code, the current (non-nested) HAP code uses the
> _domain_'s dirty_cpumask for all flushes, so for altp2m this field is
> not needed.
> 
> I'm not comfortable with it being reused for something
> almost-but-not-quite like the usual semantics, though.  Can you please
> use a simple counter for this instead?

Will do. Since the mask is already there and not needed for anything
else in the non-nested case, I thought it was useful in case there
was a future need to know which vcpu's were using a given alt p2m,
but there is no such need currently.
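
Something like this, perhaps -- assuming an active_vcpus counter added
to the altp2m's struct p2m_domain (the field name is invented):

    /* On a VMFUNC/HVMOP view switch, move this vcpu's reference. */
    if ( (uint16_t)idx != vcpu_altp2mhvm(v).p2midx )
    {
        atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
        vcpu_altp2mhvm(v).p2midx = (uint16_t)idx;
        atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
    }

    /* On teardown, refuse to destroy a table that is still in use. */
    if ( atomic_read(&ap2m->active_vcpus) != 0 )
        return -EBUSY;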

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 10/11] x86/altp2m: fix log-dirty handling.
  2015-01-15 20:49     ` Ed White
@ 2015-01-16 17:59       ` Tim Deegan
  0 siblings, 0 replies; 135+ messages in thread
From: Tim Deegan @ 2015-01-16 17:59 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

At 12:49 -0800 on 15 Jan (1421322565), Ed White wrote:
> On 01/15/2015 09:20 AM, Tim Deegan wrote:
> > Hi,
> > 
> > The locking chages look OK at first glance, but...
> > 
> > At 13:26 -0800 on 09 Jan (1420806400), Ed White wrote:
> >> @@ -793,6 +793,10 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn,
> >>  
> >>      gfn_unlock(p2m, gfn, 0);
> >>  
> >> +    if ( pt == ot && altp2mhvm_active(d) )
> >> +        /* make sure this page isn't valid in any alternate p2m */
> >> +        p2m_remove_altp2m_page(d, gfn);
> >> +
> >>      return rc;
> >>  }
> > 
> > ...this is the wrong level to be making this change at.  The hook needs
> > to be right at the bottom, in atomic_write_ept_entry() (and
> > hap_write_p2m_entry() for AMD, I think), to catch _every_ update of a
> > p2m entry in the host p2m.
> > 
> > Otherwise a guest frame could be removed entirely and the altp2m would
> > still map it.  Or am I missing some other path that handles that case?
> > nested-p2m handles this by fairly aggressively flushing nested p2m
> > tables but that doesn't sound suitable for this since there's state
> > in the alt-p2m that needs to be retained.
> 
> Hmm. Is that going to give me even more locking order problems?

Potentially.  Having given yourself a separate altp2m lock, you might
be able to nest it the other way around, so the p2m lock is always
taken first.

> I don't want to go down the nested p2m route. That is seriously bad
> for performance, and I've also seen plenty of cases where a flush on
> one vcpu breaks instruction emulation on another. Also, as you say,
> I don't have enough information to rebuild the alt p2m.

Yep, it's clearly not going to work for you.

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 08/11] x86/altp2m: add remaining support routines.
  2015-01-15 20:57     ` Ed White
@ 2015-01-16 18:04       ` Tim Deegan
  0 siblings, 0 replies; 135+ messages in thread
From: Tim Deegan @ 2015-01-16 18:04 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

At 12:57 -0800 on 15 Jan (1421323048), Ed White wrote:
> On 01/15/2015 09:25 AM, Tim Deegan wrote:
> > Hi,
> > 
> > At 13:26 -0800 on 09 Jan (1420806398), Ed White wrote:
> >> +int
> >> +altp2mhvm_hap_nested_page_fault(struct vcpu *v, paddr_t gpa,
> >> +                                unsigned long gla, struct npfec npfec)
> >> +{
> >> +    struct domain *d = v->domain;
> >> +    struct p2m_domain *hp2m = p2m_get_hostp2m(d);
> >> +    struct p2m_domain *ap2m;
> >> +    p2m_type_t p2mt;
> >> +    p2m_access_t p2ma;
> >> +    unsigned int page_order;
> >> +    unsigned long gfn, mask;
> >> +    mfn_t mfn;
> >> +    int rv;
> >> +
> >> +    ap2m = p2m_get_altp2m(v);
> >> +
> >> +    mfn = get_gfn_type_access(ap2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
> >> +                              0, &page_order);
> >> +    __put_gfn(ap2m, gpa >> PAGE_SHIFT);
> >> +
> >> +    if ( mfn_valid(mfn) )
> >> +    {
> >> +        /* Should #VE be emulated for this fault? */
> >> +        if ( p2mt == p2m_ram_rw_ve && !cpu_has_vmx_virt_exceptions &&
> >> +             ahvm_vcpu_emulate_ve(v) )
> >> +            return ALTP2MHVM_PAGEFAULT_DONE;
> >> +
> >> +        /* Could not handle fault here */
> >> +        gdprintk(XENLOG_INFO, "Altp2m memory access permissions failure, "
> >> +                              "no mem_event listener VCPU %d, dom %d\n",
> >> +                              v->vcpu_id, d->domain_id);
> >> +        domain_crash(v->domain);
> >> +        return ALTP2MHVM_PAGEFAULT_CONTINUE;
> >> +    }
> >> +
> >> +    mfn = get_gfn_type_access(hp2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
> >> +                              0, &page_order);
> >> +    put_gfn(hp2m->domain, gpa >> PAGE_SHIFT);
> >> +
> >> +    if ( p2mt != p2m_ram_rw || p2ma != p2m_access_rwx )
> >> +        return ALTP2MHVM_PAGEFAULT_CONTINUE;
> > 
> > I don't follow -- surely the altp2m ought to contain everything that's
> > in the host p2m except for deliberate extra changes.  But it looks
> > like here you just bail on anything other than rwx RAM.  That's going
> > to livelock on anything that the main fault handler thinks is OK to retry.
> 
> It sounds like there must be some cases that I'm not aware of.
> 
> I've only tested Windows, and I've never seen anything other than rwx ram
> where the hardware is ever able to retry and succeed. That's why I don't
> copy anything else to the alternate p2m, because my experience has been
> that everything else can be resolved in the host p2m and its nested page
> fault handler. As I explained to Jan, if the host p2m page becomes rwx ram
> and *then* there's a retry, I copy that EPTE on the violation that the retry
> triggers and retry again.
> 
> I want to get this right. Can you tell me what I'm missing?

I think the easiest case to test would be read access to a read-only
area (e.g., I think, the in-VM BIOS).  PCI passthrough will have the
same problem with p2m_mmio_direct mappings of BARs.

In those cases the hostp2m has a valid mapping that it expects the
guest can use, but it's not rwx.

I think there would be a similar problem with anything that a mem_event
caller had marked non-execute or read-only in the host p2m.
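
One way to cover all of those (a sketch only, reusing the names from
your patch) would be to propagate whatever valid entry the host p2m
holds, rather than insisting on rwx ram:

    mfn = get_gfn_type_access(hp2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
                              0, &page_order);
    put_gfn(hp2m->domain, gpa >> PAGE_SHIFT);

    if ( !mfn_valid(mfn) )
        return ALTP2MHVM_PAGEFAULT_CONTINUE;  /* host handler's problem */

    /* Copy the entry as-is so p2m_ram_ro, p2m_mmio_direct and any
     * mem_event restrictions behave exactly as in the host p2m. */
    p2m_lock(ap2m);
    rv = ap2m->set_entry(ap2m, gpa >> PAGE_SHIFT, mfn, PAGE_ORDER_4K,
                         p2mt, p2ma);
    p2m_unlock(ap2m);

    /* set_entry() returning 0 is success, as elsewhere in the series. */
    return rv ? ALTP2MHVM_PAGEFAULT_CONTINUE : ALTP2MHVM_PAGEFAULT_DONE;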

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 08/11] x86/altp2m: add remaining support routines.
  2015-01-15 21:00     ` Ed White
  2015-01-16  8:24       ` Jan Beulich
@ 2015-01-16 18:09       ` Tim Deegan
  1 sibling, 0 replies; 135+ messages in thread
From: Tim Deegan @ 2015-01-16 18:09 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

At 13:00 -0800 on 15 Jan (1421323248), Ed White wrote:
> On 01/15/2015 09:33 AM, Tim Deegan wrote:
> > Hi,
> > 
> > Sorry for the fractured replies - my notes are confused about which
> > functions were defined where.
> > 
> > At 13:26 -0800 on 09 Jan (1420806398), Ed White wrote:
> >> +bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
> >> +                             unsigned long old_pfn, unsigned long new_pfn)
> >> +{
> > [...]
> >> +    mfn = ap2m->get_entry(ap2m, new_pfn, &t, &a, 0, NULL);
> >> +
> >> +    if ( !mfn_valid(mfn) )
> >> +        mfn = hp2m->get_entry(hp2m, new_pfn, &t, &a, 0, NULL);
> >> +
>>>> +    if ( !mfn_valid(mfn) || !(t == p2m_ram_rw || t == p2m_ram_rw_ve) )
> >> +        goto out;
> >> +
> >> +    /* Use special ram type to enable #VE if setting for current domain */
> >> +    if ( current->domain == d )
> >> +        t = p2m_ram_rw_ve;
> >> +
> >> +    if ( !ap2m->set_entry(ap2m, old_pfn, mfn, PAGE_ORDER_4K, t, a) )
> >> +        rc = 1;
> > 
> > I'm afraid this is Terribly Unsafe[tm].  Following on from my point on
> > the log-dirty patch, if the original gfn gets removed from the guest,
> > for any reason, we need a way to find and remove this mapping too.
> > 
> > That will be non-trivial, since you can't do it by exhaustive search.
> > Maybe some sort of reverse mapping?
> 
> How often is it likely that a page will be removed? If it's
> infrequent, maybe an exhaustive search will suffice. I don't
> expect there to be anywhere near 10 alternates in use in most
> cases, and they are sparsely populated.

The worry is that an exhaustive search could take long enough to cause
watchdogs to fire (either in Xen itself or in the vCPU that's
scheduled on the CPU doing the work).  Also, Xen needs to defend
against the worst that a malicious guest could do, which is to make
all 10 densely populated.

The options to avoid that are either to make the whole operation
restartable (which is probably a lot of work, given that it would mean
changing every operation that makes a p2m update!) or to find a way of
avoiding the exhaustive search in the first place.
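
E.g. a per-domain list of just the divergent entries (all names
invented):

    struct altp2m_rmap {
        struct list_head list;
        unsigned long gfn;  /* gfn whose altp2m entry diverges */
        uint16_t idx;       /* which altp2m holds it */
    };

    /* On host-p2m removal of 'gfn', visit only recorded divergences,
     * so the cost scales with deliberate altp2m changes, not with
     * the (possibly guest-controlled) size of the tables. */
    struct altp2m_rmap *e, *tmp;
    list_for_each_entry_safe ( e, tmp, &d->altp2m_rmaps, list )
    {
        if ( e->gfn != gfn )
            continue;
        altp2m_reset_entry(d, e->idx, e->gfn);  /* invented helper */
        list_del(&e->list);
        xfree(e);
    }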

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-15 18:23   ` Ed White
  2015-01-16  8:12     ` Jan Beulich
@ 2015-01-16 18:33     ` Tim Deegan
  2015-01-16 20:32       ` Ed White
                         ` (2 more replies)
  2015-03-25 17:41     ` Ed White
  2 siblings, 3 replies; 135+ messages in thread
From: Tim Deegan @ 2015-01-16 18:33 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

Hi,

At 10:23 -0800 on 15 Jan (1421313824), Ed White wrote:
> On 01/15/2015 08:15 AM, Tim Deegan wrote:
> > I see there's been some discussion of how this would be useful for an
> > out-of-domain inspection tool, but could you talk some more about the
> > usefulness of the in-VM callers?  I'm not sure what advantage it
> > brings over a kernel playing the same tricks in normal pagetables --
> > after all an attacker in the kernel can make hypercalls and so get
> > around any p2m restrictions.
> > 
> 
> Our original motivation for this work is that we have a requirement
> internally to enable in-guest agent based memory partitioning for
> Windows domains.
> 
> There is commercially available software that uses a security
> hypervisor to partition memory for Windows running on physical
> hardware, and we want to make the same thing possible on
> virtualized hardware.

Righto, thanks.  It sounds like you're not the only people interested
in this feature, which is encouraging.

> As I said in discussion with Andrew, my aim was to make it possible
> for these same changes to be extensible to AMD processors if they
> support multiple copies of whatever their EPT equivalent is, by
> simply emulating VMFUNC and #VE. That's why there are some wrappers
> in the implementation that appear redundant.

Yep, understood, and thank you for that.  But I think there was one
function (to find a p2m by eptp) that's defined in vmx.c and only ever
called from there -- that doesn't need an arch-specific wrapper,
because an eventual AMD equivalent would also be entirely contained
within svm.c.

> > The second thing is how similar some of this is to nested p2m code,
> > making me wonder whether it could share more code with that.  It's not
> > as much duplication as I had feared, but e.g. altp2m_write_p2m_entry()
> > is _identical_ to nestedp2m_write_p2m_entry(), (making the
> > copyright claim at the top of the file quite dubious, BTW).
> > 
> 
> I did initially use nestedp2m_write_p2m_entry directly, but I knew
> that wouldn't be acceptable! On this specific point, I would be more
> inclined to refactor the normal write entry routine so you can call
> it everywhere, since both the nested and alternate ones are simply
> a copy of a part of the normal one.

That sounds like an excellent idea.

> > - Feature compatibilty/completeness.  You pointed out yourself that
> >   it doesn't work with nested HVM or migration.  I think I'd have to
> >   add mem_event/access/paging and PCI passthrough to the list of
> >   features that ought to still work.  I'm resigned to the idea that
> >   many new features don't work with shadow pagetables. :)
> > 
> 
> The intention is that mem_event/access should still work. I haven't
> specifically looked at paging, but I don't see any fundamental reason
> why it shouldn't. PCI passthrough I suspect won't. Does nested HVM
> work with migration? Is it simply not acceptable to submit a feature
> as experimental, with known compatibility issues? I had assumed that
> it was, based on the nested HVM status as documented in the release
> notes.

Potentially, yes, if we have reasonable confidence that you (or
someone else) will work towards fixing those things.  If you can't
make promises yourself, perhaps you can talk to someone who can.

> > - Testing and sample code.  If we're to carry this feature in the
> >   tree, we'll need at least some code to make use of it; ideally
> >   some sort of test we can run to find regressions later.
> 
> Understood. As I said in another thread, I'm hoping the community
> will be interested enough to help with that. If not, I'll try to
> figure something out.

Good, thanks.

> > - Issues of coding style and patch hygiene.  I don't think it's
> >   particularly useful to go into that in detail at this stage, but I
> >   did see some missing spaces in parentheses, e.g. for ( <-here-> ),
> >   and the patch series should be ordered so that the new feature is
> >   enabled in the last patch (in particular after 'fix log-dirty handling'!)
> 
> The reason I put the fix log-dirty handling patch there is because I wasn't
> convinced it would be acceptable, and it fixes a bug that already exists
> and remains unfixed in the nested p2m code. IOW, alternate p2m would
> work as well as nested p2m without that fix.

I would have thought, from the tone of your earlier comments, that
you were aiming for a bar somewhat higher than "as good as
nestedp2m". :)  I hope you'll also understand that given how well that 
has turned out, we shouldn't necessarily apply the same standard to
new code as we did there.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 07/11] x86/altp2m: introduce p2m_ram_rw_ve type.
  2015-01-16 17:52       ` Tim Deegan
@ 2015-01-16 18:35         ` Ed White
  2015-01-17  9:37           ` Tim Deegan
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-16 18:35 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

On 01/16/2015 09:52 AM, Tim Deegan wrote:
> At 12:38 -0800 on 15 Jan (1421321902), Ed White wrote:
>> On 01/15/2015 09:03 AM, Tim Deegan wrote:
>>> At 13:26 -0800 on 09 Jan (1420806397), Ed White wrote:
>>>> This is treated exactly like p2m_ram_rw, except that suppress_ve is not
>>>> set in the EPTE.
>>>
>>> I don't think this is going to work -- you probably want to support
>>> p2m_ram_ro at least, and maybe other types, but duplicating each of
>>> them as a 'type foo with #VE' doesn't seem right.
>>>
>>> Since the default is to set the ignore-#ve flag everywhere, how about
>>> having an operation to enable #ve for a frame that just clears that
>>> bit, and then having all other updates to altp2m entries preserve it?
>>
>> I hear you, but #VE is only even relevant for the in-domain agent
>> model, and as the only current user of that model we not only don't
>> want #VE to work on other page types, we specifically want it to be
>> prohibited.
> 
> I see.  I think it would be very useful if you could add some
> documentation of the new feature, covering this sort of thing, as well
> as the exact semantics of the hypercalls.
> 
>> Can we do it this way, and then change it later if required?
> 
> No thank you.  It shouldn't be hard to do it the clean way from the
> start.

The problem with doing it the clean way is that I have to use EPTE
bit 63 even on hardware that doesn't support it. That's not a
problem hardware-wise, because, at least for Intel, bit 63 is don't
care for non-#VE hardware. It does mean Xen can't use it for
anything else though.

If you look at the code in the current patch series, for non-#VE
hardware I don't use that bit, the nested page fault handler
decides whether to emulate #VE based on the p2m_type value.
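
For concreteness, the clean way amounts to this (the bit position is
architectural; everything else is a sketch):

    #define EPT_SUPPRESS_VE (1ull << 63)  /* don't-care on pre-#VE hardware */

    /* Default for every altp2m EPTE, set when the entry is created: */
    epte |= EPT_SUPPRESS_VE;

    /* A dedicated enable-#VE-on-this-frame op clears just that bit,
     * and every other update to an altp2m entry preserves it: */
    epte &= ~EPT_SUPPRESS_VE;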

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-16 18:33     ` Tim Deegan
@ 2015-01-16 20:32       ` Ed White
  2015-01-17  9:34         ` Tim Deegan
  2015-01-16 21:43       ` Ed White
  2015-01-17  9:31       ` Tim Deegan
  2 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-16 20:32 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

> 
>> As I said in discussion with Andrew, my aim was to make it possible
>> for these same changes to be extensible to AMD processors if they
>> support multiple copies of whatever their EPT equivalent is, by
>> simply emulating VMFUNC and #VE. That's why there are some wrappers
>> in the implementation that appear redundant.
> 
> Yep, understood, and thank you for that.  But I think there was one
> function (to find a p2m by eptp) that's defined in vmx.c and only ever
> called from there -- that doesn't need an arch-specific wrapper,
> because an eventual AMD equivalent would also be entirely contained
> within svm.c.
> 

I think you're referring to p2m_find_altp2m_by_eptp(), defined in
p2m.c but only called from vmx.c. That might more properly be
p2m_find_altp2m_by_whatever_the_hardware_uses(), which is why I
put it there. I assumed it would have value for AMD.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-16 18:33     ` Tim Deegan
  2015-01-16 20:32       ` Ed White
@ 2015-01-16 21:43       ` Ed White
  2015-01-17  9:49         ` Tim Deegan
  2015-01-17  9:31       ` Tim Deegan
  2 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-01-16 21:43 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

> 
> I would have thought, from the tone of your earlier comments, that
> you were aiming for a bar somewhat higher than "as good as
> nestedp2m". :)  I hope you'll also understand that given how well that 
> has turned out, we shouldn't necessarily apply the same standard to
> new code as we did there.
> 

I absolutely am aiming for a higher bar. However, I'm mindful of
the fact that, at least to me, this seems to be a fairly significant
patch series, and although I have some minor patches in 4.5 I'm new
around here.

I've tried to make all my work consistent with existing code
and design in the same vein or same source file, regardless of my
opinion of that existing content, unless I had some compelling reason
not to. I've also tried not to touch anything that I didn't absolutely
have to touch (and as I've hinted, I'm not allowed to touch nested p2m).

I'm sure I haven't always succeeded, but that has been my intent.
My view was that this approach gave me the best chance of getting
the work accepted.

If I need to make more far-reaching changes (it occurs to me that
following the nested p2m model may have resulted in putting code
in p2m.c/h that should really be in p2m_ept.c/h, for example),
then I'm open to that, I'm just concerned that I'll be
criticized for not following existing style or being too
ambitious.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-16 18:33     ` Tim Deegan
  2015-01-16 20:32       ` Ed White
  2015-01-16 21:43       ` Ed White
@ 2015-01-17  9:31       ` Tim Deegan
  2015-01-17 15:01         ` Andrew Cooper
  2 siblings, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-17  9:31 UTC (permalink / raw)
  To: Ed White; +Cc: ian.jackson, keir, ian.campbell, jbeulich, xen-devel

Hi,

At 19:33 +0100 on 16 Jan (1421433186), Tim Deegan wrote:
> > > - Feature compatibilty/completeness.  You pointed out yourself that
> > >   it doesn't work with nested HVM or migration.  I think I'd have to
> > >   add mem_event/access/paging and PCI passthrough to the list of
> > >   features that ought to still work.  I'm resigned to the idea that
> > >   many new features don't work with shadow pagetables. :)
> > > 
> > 
> > The intention is that mem_event/access should still work. I haven't
> > specifically looked at paging, but I don't see any fundamental reason
> > why it shouldn't. PCI passthrough I suspect won't. Does nested HVM
> > work with migration? Is it simply not acceptable to submit a feature
> > as experimental, with known compatibility issues? I had assumed that
> > it was, based on the nested HVM status as documented in the release
> > notes.
> 
> Potentially, yes, if we have reasonable confidence that you (or
> someone else) will work towards fixing those things.  If you can't
> make promises yourself, perhaps you can talk to someone who can.

It occurs to me that I should make the distinction between migration
and passthrough, which are first-class features, and the others, which
are 'preview'.  So migration and passthrough are hard requirements,
and the others should have a bit more room for negotiation.

Our process around all this is far from clear, which I can see must be
frustrating to work with.  I wonder whether we can make some clearer
guidelines.

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-16 20:32       ` Ed White
@ 2015-01-17  9:34         ` Tim Deegan
  0 siblings, 0 replies; 135+ messages in thread
From: Tim Deegan @ 2015-01-17  9:34 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

At 12:32 -0800 on 16 Jan (1421407932), Ed White wrote:
> > 
> >> As I said in discussion with Andrew, my aim was to make it possible
> >> for these same changes to be extensible to AMD processors if they
> >> support multiple copies of whatever their EPT equivalent is, by
> >> simply emulating VMFUNC and #VE. That's why there are some wrappers
> >> in the implementation that appear redundant.
> > 
> > Yep, understood, and thank you for that.  But I think there was one
> > function (to find a p2m by eptp) that's defined in vmx.c and only ever
> > called from there -- that doesn't need an arch-specific wrapper,
> > because an eventual AMD equivalent would also be entirely contained
> > within svm.c.
> > 
> 
> I think you're referring to p2m_find_altp2m_by_eptp(), defined in
> p2m.c but only called from vmx.c. That might more properly be
> p2m_find_altp2m_by_whatever_the_hardware_uses(), which is why I
> put it there. I assumed it would have value for AMD.

Ah, I had misread it, sorry.  Yes, that path is fine, I think.  It
would be nice to have a more arch-neutral name but I can't think of
anything particularly good right now.  p2m_find_altp2m_by_table()
seems close to other similar things, but it's not exactly inspiring.

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 07/11] x86/altp2m: introduce p2m_ram_rw_ve type.
  2015-01-16 18:35         ` Ed White
@ 2015-01-17  9:37           ` Tim Deegan
  0 siblings, 0 replies; 135+ messages in thread
From: Tim Deegan @ 2015-01-17  9:37 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

At 10:35 -0800 on 16 Jan (1421400901), Ed White wrote:
> On 01/16/2015 09:52 AM, Tim Deegan wrote:
> > At 12:38 -0800 on 15 Jan (1421321902), Ed White wrote:
> >> On 01/15/2015 09:03 AM, Tim Deegan wrote:
> >>> At 13:26 -0800 on 09 Jan (1420806397), Ed White wrote:
> >>>> This is treated exactly like p2m_ram_rw, except that suppress_ve is not
> >>>> set in the EPTE.
> >>>
> >>> I don't think this is going to work -- you probably want to support
> >>> p2m_ram_ro at least, and maybe other types, but duplicating each of
> >>> them as a 'type foo with #VE' doesn't seem right.
> >>>
> >>> Since the default is to set the ignore-#ve flag everywhere, how about
> >>> having an operation to enable #ve for a frame that just clears that
> >>> bit, and then having all other updates to altp2m entries preserve it?
> >>
> >> I hear you, but #VE is only even relevant for the in-domain agent
> >> model, and as the only current user of that model we not only don't
> >> want #VE to work on other page types, we specifically want it to be
> >> prohibited.
> > 
> > I see.  I think it would be very useful if you could add some
> > documentation of the new feature, covering this sort of thing, as well
> > as the exact semantics of the hypercalls.
> > 
> >> Can we do it this way, and then change it later if required?
> > 
> > No thank you.  It shouldn't be hard to do it the clean way from the
> > start.
> 
> The problem with doing it the clean way is that I have to use EPTE
> bit 63 even on hardware that doesn't support it. That's not a
> problem hardware-wise, because, at least for Intel, bit 63 is don't
> care for non-#VE hardware. It does mean Xen can't use it for
> anything else though.

That's fine for Xen.  In particular I can't imagine any other use that
Xen would have for that bit where we'd want it to be compatible with
emulated #VE but not real #VE.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-16 21:43       ` Ed White
@ 2015-01-17  9:49         ` Tim Deegan
  2015-01-19 19:35           ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-17  9:49 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

At 13:43 -0800 on 16 Jan (1421412191), Ed White wrote:
> I've tried to make all my work consistent with existing code
> and design in the same vein or same source file, regardless of my
> opinion of that existing content, unless I had some compelling reason
> not to. I've also tried not to touch anything that I didn't absolutely
> have to touch (and as I've hinted, I'm not allowed to touch nested p2m).
> 
> I'm sure I haven't always succeeded, but that has been my intent.
> My view was that this approach gave me the best chance of getting
> the work accepted.

Thank you for that, by the way.  I realise that this kind of review
can be terribly off-putting because every negative thing has to be
gone over in detail whereas the positives are pretty well taken for
granted.  And there is basically no way to get everything right first
time through -- you'll see that even code from long-time contributors
goes through the same process.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-17  9:31       ` Tim Deegan
@ 2015-01-17 15:01         ` Andrew Cooper
  2015-01-19 12:17           ` Tim Deegan
  0 siblings, 1 reply; 135+ messages in thread
From: Andrew Cooper @ 2015-01-17 15:01 UTC (permalink / raw)
  To: Tim Deegan, Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

On 17/01/2015 09:31, Tim Deegan wrote:
> Hi,
>
> At 19:33 +0100 on 16 Jan (1421433186), Tim Deegan wrote:
>>>> - Feature compatibilty/completeness.  You pointed out yourself that
>>>>   it doesn't work with nested HVM or migration.  I think I'd have to
>>>>   add mem_event/access/paging and PCI passthrough to the list of
>>>>   features that ought to still work.  I'm resigned to the idea that
>>>>   many new features don't work with shadow pagetables. :)
>>>>
>>> The intention is that mem_event/access should still work. I haven't
>>> specifically looked at paging, but I don't see any fundamental reason
>>> why it shouldn't. PCI passthrough I suspect won't. Does nested HVM
>>> work with migration? Is it simply not acceptable to submit a feature
>>> as experimental, with known compatibility issues? I had assumed that
>>> it was, based on the nested HVM status as documented in the release
>>> notes.
>> Potentially, yes, if we have reasonable confidence that you (or
>> someone else) will work towards fixing those things.  If you can't
>> make promises yourself, perhaps you can talk to someone who can.
> It occurs to me that I should make the distinction between migration
> and passthrough, which are first-class features, and the others, which
> are 'preview'.  So migration and passthrough are hard requirements,
> and the others should have a bit more room for negotiation.
>
> Our process around all this is far from clear, which I can see must be
> frustrating to work with.  I wonder whether we can make some clearer
> guidelines.

I think a useful starting point would be an in-tree document stating the
hypervisor features, their support status, a short description (of a
technical orientation, including interaction with other features), a
list of remaining work to do (nice to have, or to advance to a higher
support status).

This will allow new features to more easily identify the other areas
they need to work with, and be useful as a completely unambiguous
statement as to what status a feature has, and what further is required
for it to develop.


With this altp2m code, I have been thinking about migration and passthrough.

Migration and passthrough are themselves mutually exclusive features, as
logdirty can't identify DMA writes (and the toolstack can probably be
forgiven for not being able to wedge a real bit of hardware into the
migration stream :) ).

I can't see any problem between the existing altp2m code and passthrough,
in the case that shared ept is not in use.  Given the number of other
issues we have with shared ept, I don't think this is an impediment.

Migration on the other hand poses a number of challenges.  The altp2m
work would require sending multiple p2ms in the stream which is
completely incompatible with legacy migration, but will be fine in
migration v2 where extensions can easily be made.  Logdirty would need
to be extended to cover the p2ms themselves, and p2m permissions would
also need to be sent.

As such, I do not believe migration support should be an impediment to
the altp2m series gaining experimental status in tree.  It is not
reasonable to delay inclusion for migration v2 to be accepted, and
frankly, the extra work to get migration working is probably as large a
task as this basic support.


In some copious free time, I will see about drafting a start to the
document.

~Andrew

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 07/11] x86/altp2m: introduce p2m_ram_rw_ve type.
  2015-01-16 17:14         ` Ed White
@ 2015-01-19  8:49           ` Jan Beulich
  2015-01-19 19:53             ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Jan Beulich @ 2015-01-19  8:49 UTC (permalink / raw)
  To: Ed White; +Cc: keir, Tim Deegan, ian.jackson, ian.campbell, xen-devel

>>> On 16.01.15 at 18:14, <edmund.h.white@intel.com> wrote:
> On 01/16/2015 12:20 AM, Jan Beulich wrote:
>>>>> On 15.01.15 at 21:38, <edmund.h.white@intel.com> wrote:
>>> On 01/15/2015 09:03 AM, Tim Deegan wrote:
>>>> At 13:26 -0800 on 09 Jan (1420806397), Ed White wrote:
>>>>> This is treated exactly like p2m_ram_rw, except that suppress_ve is not
>>>>> set in the EPTE.
>>>>
>>>> I don't think this is going to work -- you probably want to support
>>>> p2m_ram_ro at least, and maybe other types, but duplicating each of
>>>> them as a 'type foo with #VE' doesn't seem right.
>>>>
>>>> Since the default is to set the ignore-#ve flag everywhere, how about
>>>> having an operation to enable #ve for a frame that just clears that
>>>> bit, and then having all other updates to altp2m entries preserve it?
>>>
>>> I hear you, but #VE is only even relevant for the in-domain agent
>>> model, and as the only current user of that model we not only don't
>>> want #VE to work on other page types, we specifically want it to be
>>> prohibited.
>>>
>>> Can we do it this way, and then change it later if required?
>> 
>> Without you explaining to us the full details of the in-domain
>> agent model, I'm afraid this is going to remain dubious and the
>> question hard to answer. In particular, if you indeed want to
>> prohibit that behavior on _all_ other p2m types, how would
>> subsequently changing the implementation then be compatible
>> (if it can't be done this way right from the beginning)?
> 
> I think I have explained it. There is software already commercially
> available that uses a security hypervisor to partition memory at a
> level below the OS page tables for Windows running on physical
> hardware. We want to make that possible for Windows in a Xen domU.
> The security hypervisor uses multiple EPTP's to apply different
> access permissions to some guest physical addresses in different
> views (p2m's) and in certain cases applies different
> guest physical->host physical (gfn->mfn) mappings to some pages
> between different views. The only pages to which any of these
> operations is applied are pages which are rwx ram at a hardware level.
> The security hypervisor works in concert with a protected agent that
> runs inside the OS.
> 
> I don't see any great difficulty at all in implementing the #VE
> functionality through a p2m type initially and subsequently adding
> a more generic facility. What do you see as the problem there?

You said above "and as the only current user of that model we not
only don't want #VE to work on other page types, we specifically
want it to be prohibited" - if in a first implementation you enforce
this, and a later implementation relaxes it, the guest relying on the
first implementation's behavior may break. I.e. I'm not certain
whether with a later more complete implementation a guest
wouldn't be required to actively request what behavior it wants,
and whether making the default be how your first implementation
is intended to behave is (design-/architecture-wise) reasonable.

Jan

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 08/11] x86/altp2m: add remaining support routines.
  2015-01-16 17:17         ` Ed White
@ 2015-01-19  8:52           ` Jan Beulich
  0 siblings, 0 replies; 135+ messages in thread
From: Jan Beulich @ 2015-01-19  8:52 UTC (permalink / raw)
  To: Ed White; +Cc: keir, Tim Deegan, ian.jackson, ian.campbell, xen-devel

>>> On 16.01.15 at 18:17, <edmund.h.white@intel.com> wrote:
> On 01/16/2015 12:24 AM, Jan Beulich wrote:
>>>>> On 15.01.15 at 22:00, <edmund.h.white@intel.com> wrote:
>>> On 01/15/2015 09:33 AM, Tim Deegan wrote:
>>>> Hi,
>>>>
>>>> Sorry for the fractured replies - my notes are confused about which
>>>> functions were defined where.
>>>>
>>>> At 13:26 -0800 on 09 Jan (1420806398), Ed White wrote:
>>>>> +bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
>>>>> +                             unsigned long old_pfn, unsigned long new_pfn)
>>>>> +{
>>>> [...]
>>>>> +    mfn = ap2m->get_entry(ap2m, new_pfn, &t, &a, 0, NULL);
>>>>> +
>>>>> +    if ( !mfn_valid(mfn) )
>>>>> +        mfn = hp2m->get_entry(hp2m, new_pfn, &t, &a, 0, NULL);
>>>>> +
>>>>> +    if ( !mfn_valid(mfn) || !(t == p2m_ram_rw || t == p2m_ram_rw_ve) )
>>>>> +        goto out;
>>>>> +
>>>>> +    /* Use special ram type to enable #VE if setting for current domain */
>>>>> +    if ( current->domain == d )
>>>>> +        t = p2m_ram_rw_ve;
>>>>> +
>>>>> +    if ( !ap2m->set_entry(ap2m, old_pfn, mfn, PAGE_ORDER_4K, t, a) )
>>>>> +        rc = 1;
>>>>
>>>> I'm afraid this is Terribly Unsafe[tm].  Following on from my point on
>>>> the log-dirty patch, if the original gfn gets removed from the guest,
>>>> for any reason, we need a way to find and remove this mapping too.
>>>>
>>>> That will be non-trivial, since you can't do it by exhaustive search.
>>>> Maybe some sort of reverse mapping?
>>>
>>> How often is it likely that a page will be removed? If it's
>>> infrequent, maybe an exhaustive search will suffice. I don't
>>> expect there to be anywhere near 10 alternates in use in most
>>> cases, and they are sparsely populated.
>> 
>> A fundamental thing you need to keep in mind when considering
>> exhaustive searches is that these need to be preemptible, no
>> matter how infrequent they are. Which may be difficult to
>> arrange for, based on experience with code where we needed
>> to add such preemption later on (as security fixes).
> 
> I've seen plenty of activity on the list regarding pre-emptible
> hypercalls, but I'm not clear what level of DOS this prevents. Is
> it to guard against a DOS against a domain, or against the whole system?

The hypervisor itself is not preemptible. Hence any code running
for long enough without allowing softirqs to be processed puts the
system as a whole at risk. Furthermore, anything not allowing
events to be processed inside the calling guest for an extended
period of time is going to put the guest at risk (and again the whole
system if that guest is the hardware or control domain).
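
The canonical pattern is to check for pending work on each iteration
and arrange a continuation (the loop body here is elided, and the op
structure is assumed to record its progress):

    for ( gfn = start; gfn < max_gfn; gfn++ )
    {
        /* ... examine / invalidate one altp2m entry ... */

        if ( hypercall_preempt_check() )
        {
            /* Re-issue the hypercall, picking up where we left off. */
            rc = hypercall_create_continuation(__HYPERVISOR_hvm_op,
                                               "lh", op, arg);
            break;
        }
    }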

Jan

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-17 15:01         ` Andrew Cooper
@ 2015-01-19 12:17           ` Tim Deegan
  2015-01-19 21:54             ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-19 12:17 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: keir, ian.campbell, ian.jackson, Ed White, xen-devel, jbeulich

Hi,

At 15:01 +0000 on 17 Jan (1421503283), Andrew Cooper wrote:
> With this altp2m code, I have been thinking about migration and passthrough.
> 
> Migration and passthrough are themselves mutually exclusive features, as
> logdirty can't identify DMA writes (and the toolstack can probably be
> forgiven for not being able to wedge a real bit of hardware into the
> migration stream :) ).

Yep.

> I can't see any problem between the existing altp2m code and passthrough,
> in the case that shared ept is not in use.  Given the number of other
> issues we have with shared ept, I don't think this is an impediment.

Indeed.  I can't see a problem even if shared EPT is in use, as the
IOMMU will use the 'host' p2m, and the no-#VE bit is available in EPTEs.

Malicious code in the guest could circumvent the altp2m restrictions
by issuing DMA, but that's independent of shared EPT.  The current
altp2m will never map any BARs in the altp2m, but there's always IOIO
and shared-memory queues to talk to the hardware.

For that matter there are the emulated devices, whose BAR accesses
will be trapped and emulated.  I wonder whether the altp2m feature
wants to have a matching restriction on emulated I/O?

> Migration on the other hand poses a number of challenges.  The altp2m
> work would require sending multiple p2ms in the stream which is
> completely incompatible with legacy migration, but will be fine in
> migration v2 where extensions can easily be made.

Extending v1 is easy enough, just not as clean.  Grab another negative
integer to tag the new data and have at it.  Agreed that it would be
a pain to have to redo some of the framing work for v2, but the actual
saving and restoring ought to be the same.

> Logdirty would need to be extended to cover the p2ms themselves, and
> p2m permissions would also need to be sent.

I'm not sure I follow you about log-dirty.  Log-dirty tracking of
guest state is already in hand; I think the altp2m state could be
pickled after the domain is paused.

How much of the altp2ms would need to be sent is not really clear --
it would be nice to have a description of the interface's intended
semantics for this discussion.  Presumably it would be enough to
normalize the altp2ms against the real one: rather than sending the
actual contents, send only the access restrictions and #VE bit, where
they differ from the main p2m, and a gfn->gfn list of remappings.
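
i.e. one record per divergent frame, something like (layout invented
purely for discussion):

    struct altp2m_view_record {
        uint16_t view_idx;      /* which altp2m the entry belongs to */
        uint64_t gfn;           /* guest frame being described */
        uint64_t remapped_gfn;  /* gfn->gfn remap target, or ~0 if none */
        uint8_t  access;        /* p2m_access_t, where it differs */
        uint8_t  suppress_ve;   /* the #VE bit for this frame */
    };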

Or: declare in the interface that the altp2ms are soft state that can
be dropped on migration, with some suitable callback (#VE injection?)
to the guest when an altp2m 'view' is not available.  That depends on
whether the in-guest agent can reconstruct the state it needs from
scratch.

> As such, I do not believe migration support should be an impediment to
> the altp2m series gaining experimental status in tree. It is not
> reasonable to delay inclusion for migration v2 to be accepted, and
> frankly, the extra work to get migration working is probably as large a
> task as this basic support.

Most of the work would be the same for v1 and v2 migration, and the
fact that it's hard doesn't mean it doesn't need to happen.
I'm willing to be persuaded on this, but we ought to have explicit
agreement in advance about:
 - what needs to be fixed;
 - who is going to do the fixing and when; and
 - what happens if it's not fixed (e.g. feature gets reverted).

> In some copious free time, I will see about drafting a start to the
> document.

Thanks!

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-17  9:49         ` Tim Deegan
@ 2015-01-19 19:35           ` Ed White
  0 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-01-19 19:35 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

On 01/17/2015 01:49 AM, Tim Deegan wrote:
> At 13:43 -0800 on 16 Jan (1421412191), Ed White wrote:
>> I've tried to make all my work consistent with existing code
>> and design in the same vein or same source file, regardless of my
>> opinion of that existing content, unless I had some compelling reason
>> not to. I've also tried not to touch anything that I didn't absolutely
>> have to touch (and as I've hinted, I'm not allowed to touch nested p2m).
>>
>> I'm sure I haven't always succeeded, but that has been my intent.
>> My view was that this approach gave me the best chance of getting
>> the work accepted.
> 
> Thank you for that, by the way.  I realise that this kind of review
> can be terribly off-putting because every negative thing has to be
> gone over in detail whereas the positives are pretty well taken for
> granted.  And there is basically no way to get everything right first
> time through -- you'll see that even code from long-time contributors
> goes through the same process.

I'm not upset or discouraged. Now that you all understand at least
in broad terms what I'm trying to achieve, no-one is telling me to
go away and forget the whole idea.

Late last week some of our discussion had become philosophical, so
I thought it was worth making my philosophy clear: I want to make
Xen support these capabilities, and I want to do it properly, and
I want to make it as easy as possible for you to accept the
changes.

Righting all the wrongs I perceive in Xen can wait for another
day.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 07/11] x86/altp2m: introduce p2m_ram_rw_ve type.
  2015-01-19  8:49           ` Jan Beulich
@ 2015-01-19 19:53             ` Ed White
  0 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-01-19 19:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: keir, Tim Deegan, ian.jackson, ian.campbell, xen-devel

>>> Without you explaining to us the full details of the in-domain
>>> agent model, I'm afraid this is going to remain dubious and the
>>> question hard to answer. In particular, if you indeed want to
>>> prohibit that behavior on _all_ other p2m types, how would
>>> subsequently changing the implementation then be compatible
>>> (if it can't be done this way right from the beginning)?
>>
>> I think I have explained it. There is software already commercially
>> available that uses a security hypervisor to partition memory at a
>> level below the OS page tables for Windows running on physical
>> hardware. We want to make that possible for Windows in a Xen domU.
>> The security hypervisor uses multiple EPTP's to apply different
>> access permissions to some guest physical addresses in different
>> views (p2m's) and in certain cases applies different
>> guest physical->host physical (gfn->mfn) mappings to some pages
>> between different views. The only pages to which any of these
>> operations is applied are pages which are rwx ram at a hardware level.
>> The security hypervisor works in concert with a protected agent that
>> runs inside the OS.
>>
>> I don't see any great difficulty at all in implementing the #VE
>> functionality through a p2m type initially and subsequently adding
>> a more generic facility. What do you see as the problem there?
> 
> You said above "and as the only current user of that model we not
> only don't want #VE to work on other page types, we specifically
> want it to be prohibited" - if in a first implementation you enforce
> this, and a later implementation relaxes it, the guest relying on the
> first implementation's behavior may break. I.e. I'm not certain
> whether with a later more complete implementation a guest
> wouldn't be required to actively request what behavior it wants,
> and whether making the default be how your first implementation
> is intended to behave is (design-/architecture-wise) reasonable.

You are right, that was a poor explanation. Our in-domain agent
should never be manipulating pages that are not hardware rwx ram,
and so if we inadvertently try to manipulate another page type
it would help us if rather than attempting to do what we asked Xen
returned an error. So it's effectively a debugging aid. I wasn't
suggesting that we would rely on that behaviour.

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-19 12:17           ` Tim Deegan
@ 2015-01-19 21:54             ` Ed White
  2015-01-20  8:47               ` Jan Beulich
  2015-01-22 15:42               ` Tim Deegan
  0 siblings, 2 replies; 135+ messages in thread
From: Ed White @ 2015-01-19 21:54 UTC (permalink / raw)
  To: Tim Deegan, Andrew Cooper
  Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

> Or: declare in the interface that the altp2ms are soft state that can
> be dropped on migration, with some suitable callback (#VE injection?)
> to the guest when an altp2m 'view' is not available.  That depends on
> whether the in-guest agent can reconstruct the state it needs from
> scratch.
> 

I've been wondering about this too, although it's not something the
existing software that we want to enable on Xen has to cope with.

I still don't understand under what circumstances a machine page
can be removed from a guest, especially given that we are only
going to use remapping when we are actively using both copies of
the page. However, if such a page can be removed, the agent
(in-domain or out-of-domain) has to be able to know about that
and handle it. There's also the issue that access permissions
are soft state and can be reverted to default in certain cases.

Maybe the solution to all of this is a mechanism by which the
agent can be notified that some of its modifications have been
invalidated, so it can rebuild state as appropriate.

We could then add a piece of state to an alternate p2m to indicate
that it contains remapped gfn->mfn translations, and invalidate
the entire p2m if any mfn is removed from the guest.

Migration could invalidate everything.
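
Roughly (the flag, array and notification helper are invented):

    /* On removal of any mfn from the guest: */
    for ( i = 0; i < MAX_ALTP2M; i++ )       /* 10 in this series */
    {
        struct p2m_domain *ap2m = d->arch.altp2m_p2m[i];

        if ( ap2m == NULL || !ap2m->has_remappings )
            continue;

        p2m_flush_table(ap2m);         /* drop the whole view */
        altp2m_notify_agent(d, i);     /* tell the agent to rebuild */
    }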

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-19 21:54             ` Ed White
@ 2015-01-20  8:47               ` Jan Beulich
  2015-01-20 18:43                 ` Ed White
  2015-01-22 15:42               ` Tim Deegan
  1 sibling, 1 reply; 135+ messages in thread
From: Jan Beulich @ 2015-01-20  8:47 UTC (permalink / raw)
  To: Ed White
  Cc: keir, ian.campbell, Andrew Cooper, Tim Deegan, xen-devel, ian.jackson

>>> On 19.01.15 at 22:54, <edmund.h.white@intel.com> wrote:
> There's also the issue that access permissions
> are soft state and can be reverted to default in certain cases.

Some instances of which have got removed during the 4.5 cycle,
and at least some of the remaining ones are deemed at least
questionable too. I.e. if this reverting to default permissions
stands in the way, propose (separate!) patches to eliminate it.

Jan

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-20  8:47               ` Jan Beulich
@ 2015-01-20 18:43                 ` Ed White
  0 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-01-20 18:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, ian.campbell, Andrew Cooper, Tim Deegan, xen-devel, ian.jackson

On 01/20/2015 12:47 AM, Jan Beulich wrote:
>>>> On 19.01.15 at 22:54, <edmund.h.white@intel.com> wrote:
>> There's also the issue that access permissions
>> are soft state and can be reverted to default in certain cases.
> 
> Some instances of which have got removed during the 4.5 cycle,
> and at least some of the remaining ones are deemed at least
> questionable too. I.e. if this reverting to default permissions
> stands in the way, propose (separate!) patches to eliminate it.

The less frequently such state disappears, the better; but it
seems there isn't any way to prevent all such events, which is
a new problem for the software these patches are meant to
enable.

So, there has to be a mechanism to notify the agent that
a piece of its state is no longer valid. I think the
challenge is figuring out how fine-grained the notification
should be.

One idea I had since yesterday is to track the minimum and
maximum mfn's ever subject to any altp2m-type modification
in a p2m. That shouldn't be very expensive, and would likely
avoid a lot of unnecessary p2m invalidation.
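
Roughly (fields and helper invented):

    /* On any altp2m-type modification involving 'mfn': */
    ap2m->min_modified_mfn = min(ap2m->min_modified_mfn, mfn_x(mfn));
    ap2m->max_modified_mfn = max(ap2m->max_modified_mfn, mfn_x(mfn));

    /* On removal of 'mfn' from the guest, only invalidate views that
     * could possibly have touched it: */
    if ( mfn_x(mfn) >= ap2m->min_modified_mfn &&
         mfn_x(mfn) <= ap2m->max_modified_mfn )
        altp2m_invalidate(d, idx);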

Ed

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-19 21:54             ` Ed White
  2015-01-20  8:47               ` Jan Beulich
@ 2015-01-22 15:42               ` Tim Deegan
  2015-01-22 19:15                 ` Ed White
  1 sibling, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-01-22 15:42 UTC (permalink / raw)
  To: Ed White
  Cc: keir, ian.campbell, Andrew Cooper, ian.jackson, xen-devel, jbeulich

At 13:54 -0800 on 19 Jan (1421672054), Ed White wrote:
> > Or: declare in the interface that the altp2ms are soft state that can
> > be dropped on migration, with some suitable callback (#VE injection?)
> > to the guest when an altp2m 'view' is not available.  That depends on
> > whether the in-guest agent can reconstruct the state it needs from
> > scratch.
> > 
> 
> I've been wondering about this too, although it's not something the
> existing software that we want to enable on Xen has to cope with.
> 
> I still don't understand under what circumstances a machine page
> can be removed from a guest, especially given that we are only
> going to use remapping when we are actively using both copies of
> the page.

The guest itself can choose to relinquish a page (e.g. for
ballooning), or any privileged tool can remove one.  From the
hypervisor's point of view, we need to do the right (i.e. safe) thing
when this happens, even if it's a strange/pointless thing to happen.

Log-dirty has a similar issue wrt remappings -- we need to make sure
that all remapped entries in altp2ms get reset to read-only or we
might miss a write. 

> However, if such a page can be removed, the agent
> (in-domain or out-of-domain) has to be able to know about that
> and handle it. There's also the issue that access permissions
> are soft state and can be reverted to default in certain cases.
> 
> Maybe the solution to all of this is a mechanism by which the
> agent can be notified that some of its modifications have been
> invalidated, so it can rebuild state as appropriate.
> 
> We could then add a piece of state to an alternate p2m to indicate
> that it contains remapped gfn->mfn translations, and invalidate
> the entire p2m if any mfn is removed from the guest.

Yep.  That would be safe, but potentially slow -- heading towards the
flush-everything model that nested-p2m uses.  It would be nice to be
able to just invalidate the relevant entries, if we can keep enough
state to find them.

> Migration could invalidate everything.

Yep.  That certainly sounds a lot easier than trying to transport all
that state to a new machine!

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-22 15:42               ` Tim Deegan
@ 2015-01-22 19:15                 ` Ed White
  0 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-01-22 19:15 UTC (permalink / raw)
  To: Tim Deegan
  Cc: keir, ian.campbell, Andrew Cooper, ian.jackson, xen-devel, jbeulich

On 01/22/2015 07:42 AM, Tim Deegan wrote:
> At 13:54 -0800 on 19 Jan (1421672054), Ed White wrote:
>>> Or: declare in the interface that the altp2ms are soft state that can
>>> be dropped on migration, with some suitable callback (#VE injection?)
>>> to the guest when an altp2m 'view' is not available.  That depends on
>>> whether the in-guest agent can reconstruct the state it needs from
>>> scratch.
>>>
>>
>> I've been wondering about this too, although it's not something the
>> existing software that we want to enable on Xen has to cope with.
>>
>> I still don't understand under what circumstances a machine page
>> can be removed from a guest, especially given that we are only
>> going to use remapping when we are actively using both copies of
>> the page.
> 
> The guest itself can choose to relinquish a page (e.g. for
> ballooning), or any privileged tool can remove one.  From the
> hypervisor's point of view, we need to do the right (i.e. safe) thing
> when this happens, even if it's a strange/pointless thing to happen.
> 
> Log-dirty has a similar issue wrt remappings -- we need to make sure
> that all remapped entries in altp2ms get reset to read-only or we
> might miss a write. 
> 
>> However, if such a page can be removed, the agent
>> (in-domain or out-of-domain) has to be able to know about that
>> and handle it. There's also the issue that access permissions
>> are soft state and can be reverted to default in certain cases.
>>
>> Maybe the solution to all of this is a mechanism by which the
>> agent can be notified that some of its modifications have been
>> invalidated, so it can rebuild state as appropriate.
>>
>> We could then add a piece of state to an alternate p2m to indicate
>> that it contains remapped gfn->mfn translations, and invalidate
>> the entire p2m if any mfn is removed from the guest.
> 
> Yep.  That would be safe, but potentially slow -- heading towards the
> flush-everything model that nested-p2m uses.  It would be nice to be
> able to just invalidate the relevant entries, if we can keep enough
> state to find them.
> 
>> Migration could invalidate everything.
> 
> Yep.  That certainly sounds a lot easier than trying to transport all
> that state to a new machine!

IMO, invalidating an entire altp2m seems like the right level of
granularity to expect the agent to handle, in which case it will
be possible to fine-tune the algorithm that determines when to
invalidate without changing the notification interface.

My thought is to start by tracking the upper and lower bound of
modified mfn's (modified in any way) in a p2m, and see how
effective that is. If it's good enough in a non-pathological
case, why do something more complex?
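
To illustrate (purely a sketch, with made-up names): keep the lowest
and highest modified gfn per view, and sweep only that range when the
view has to be invalidated.

#include <limits.h>

struct altp2m_dirty_range {
    unsigned long min_gfn;   /* ULONG_MAX while nothing is modified */
    unsigned long max_gfn;   /* 0 while nothing is modified */
};

static void range_reset(struct altp2m_dirty_range *r)
{
    r->min_gfn = ULONG_MAX;
    r->max_gfn = 0;
}

/* Record any modification (permissions or gfn->mfn remapping). */
static void range_note_change(struct altp2m_dirty_range *r,
                              unsigned long gfn)
{
    if ( gfn < r->min_gfn )
        r->min_gfn = gfn;
    if ( gfn > r->max_gfn )
        r->max_gfn = gfn;
}

/* On invalidation, only [min_gfn, max_gfn] needs sweeping; a range
 * with min_gfn > max_gfn means nothing was modified. */
static int range_is_empty(const struct altp2m_dirty_range *r)
{
    return r->min_gfn > r->max_gfn;
}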

Changing tack somewhat, in the short term I intend to submit
the p2m class tracking changes (patch 4 in the series) as a
standalone patch, combining your suggestions and Andrew's, as
that doesn't seem to be controversial and lays the groundwork
for altp2m. If I think I can do it without treading on toes
inside Intel, I might change p2m_write_entry() and submit a
standalone patch for that too.

Then I'll probably disappear for a while. I completely
understand that you can't accept these changes without the
ability to test them automatically, but I don't have an answer
as to how I can provide test code. I'll return once I've
figured that out.

Ed

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-15 18:44                         ` Ed White
@ 2015-03-04 23:06                           ` Tamas K Lengyel
  2015-03-04 23:41                             ` Ed White
  2015-03-05 10:36                             ` Tim Deegan
  0 siblings, 2 replies; 135+ messages in thread
From: Tamas K Lengyel @ 2015-03-04 23:06 UTC (permalink / raw)
  To: Ed White
  Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Ian Jackson,
	Tim Deegan, Jan Beulich, xen-devel

>>>>> Right. The key observation is that at any single point in time, a given
>>>>> hardware thread can be fetching an instruction or reading data, but not
>>>>> both.
>>>>
>>>> Fine, as long as an instruction reading itself isn't going to lead to
>>>> a live lock.
>>>>
>>>
>>> That's not how the hardware works. By the time you figure out that the
>>> instruction you are executing reads memory, the instruction itself has
>>> been fetched and decoded. That won't happen again during this execution.
>>
>> Can you explain?  If the instruction faults and is returned to,
>> execution starts again, right?  So for an instruction that reads itself:
>>
>> - the fetch succeeds;
>> - the read fails, and we fault;
>> - the hypervisor switches from mapping MFN 1 (--x) to MFN 2 (r--);
>> - the hypervisor returns to the guest.
>>
>> Are you relying on the icache/trace cache/whatever to restart
>> the instruction from a cached value rather than fault immediately?
>> (Because the hypervisor didn't flush the TLB when it changed the mapping)?
>>
>
> Nope. I just typed before drinking enough coffee. That whole answer was bogus.
>
> Of course, if an instruction reads itself you can get a live lock using
> these techniques, but it's a software-induced live lock and software can
> avoid it. One way is to compare the address being read with the instruction
> pointer, and if they are on the same page emulate instead of switching p2m's.
>
> Ed

Hi Ed,
we have been poking at this idea of achieving singlestepping through
altp2m view-switching (which would be supported by the VMFUNC
EPTP-switching) and the problem discussed above is not limited to
instructions that perform data accesses on the same page the
executing instruction was fetched from. In order to achieve true
single-stepping, the immediately following instruction would have to
cause an EPT violation.

Let's assume we trap an instruction that only performs data accesses
on pages other than the one the instruction was fetched from. Since
the instruction fetch is repeated after a failed data access due to
EPT violation, the page containing the instruction has to be at least
--x and the pages that will be touched by it rw- (or the proper
combination of r-- and rw-) simultaneously in order to avoid getting
into a live-lock. This results in all subsequent instruction fetches
succeeding from the original page. Furthermore, as long as all such
subsequent instructions keep accessing only the pages touched by the
first instruction, we could end up missing a good chunk of code
execution. Is there something we are missing here or is this a known
limitation to the EPT-based singlestepping mechanism? Or is there
something in the way VMFUNC is implemented that will avoid this
limitation?

Thanks,
Tamas

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-03-04 23:06                           ` Tamas K Lengyel
@ 2015-03-04 23:41                             ` Ed White
  2015-03-05 10:51                               ` Tamas K Lengyel
  2015-03-05 10:36                             ` Tim Deegan
  1 sibling, 1 reply; 135+ messages in thread
From: Ed White @ 2015-03-04 23:41 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Ian Jackson,
	Tim Deegan, Jan Beulich, xen-devel

On 03/04/2015 03:06 PM, Tamas K Lengyel wrote:
>>>>>> Right. The key observation is that at any single point in time, a given
>>>>>> hardware thread can be fetching an instruction or reading data, but not
>>>>>> both.
>>>>>
>>>>> Fine, as long as an instruction reading itself isn't going to lead to
>>>>> a live lock.
>>>>>
>>>>
>>>> That's not how the hardware works. By the time you figure out that the
>>>> instruction you are executing reads memory, the instruction itself has
>>>> been fetched and decoded. That won't happen again during this execution.
>>>
>>> Can you explain?  If the instruction faults and is returned to,
>>> execution starts again, right?  So for an instruction that reads itself:
>>>
>>> - the fetch succeeds;
>>> - the read fails, and we fault;
>>> - the hypervisor switches from mapping MFN 1 (--x) to MFN 2 (r--);
>>> - the hypervisor returns to the guest.
>>>
>>> Are you relying on the icache/trace cache/whatever to restart
>>> the instruction from a cached value rather than fault immediately?
>>> (Because the hypervisor didn't flush the TLB when it changed the mapping)?
>>>
>>
>> Nope. I just typed before drinking enough coffee. That whole answer was bogus.
>>
>> Of course, if an instruction reads itself you can get a live lock using
>> these techniques, but it's a software-induced live lock and software can
>> avoid it. One way is to compare the address being read with the instruction
>> pointer, and if they are on the same page emulate instead of switching p2m's.
>>
>> Ed
> 
> Hi Ed,
> we have been poking at this idea of achieving singlestepping through
> altp2m view-switching (which would be supported by the VMFUNC
> EPTP-switching) and the problem discussed above is not limited to
> instructions that perform data accesses on the same page the
> executing instruction was fetched from. In order to achieve true
> single-stepping, the immediately following instruction would have to
> cause an EPT violation.
> 
> Let's assume we trap an instruction that only performs data accesses
> on pages other than the one the instruction was fetched from. Since
> the instruction fetch is repeated after a failed data access due to
> EPT violation, the page containing the instruction has to be at least
> --x and the pages that will be touched by it rw- (or the proper
> combination of r-- and rw-) simultaneously in order to avoid getting
> into a live-lock. This results in all subsequent instruction fetches
> succeeding from the original page. Furthermore, as long as all such
> subsequent instructions keep accessing only the pages touched by the
> first instruction, we could end up missing a good chunk of code
> execution. Is there something we are missing here or is this a known
> limitation to the EPT-based singlestepping mechanism? Or is there
> something in the way VMFUNC is implemented that will avoid this
> limitation?
> 
> Thanks,
> Tamas
> 

If you truly need single-step, then there is no alternative to doing
that the traditional way using TF. What I was hinting at before (and
I seem to have offended you by doing so) is that if your only reason
for single-stepping is to revert a view switch, then depending on your
use-case the single-step may be avoidable. At the risk of offending
you again, I still can't talk about that in more detail.
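
For reference, here is a minimal sketch of the same-page check from
the live-lock discussion quoted above (the helpers are hypothetical,
not actual Xen functions):

#define PAGE_SHIFT 12

void emulate_current_insn(void);     /* hypothetical helper */
void switch_to_readable_view(void);  /* hypothetical helper */

/* On a read violation in the execute-only view: if the address being
 * read is on the same page as the instruction pointer, emulate the
 * one instruction instead of switching views, since the readable
 * view would fault again on the very next fetch. */
static void handle_read_violation(unsigned long rip, unsigned long gla)
{
    if ( (rip >> PAGE_SHIFT) == (gla >> PAGE_SHIFT) )
        emulate_current_insn();
    else
        switch_to_readable_view();
}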

Is there any chance you might reconsider your decision not to help
with toolstack support of the patch series? I'm still trying to find
an internal resource to do that work, but right now it's the biggest
risk I see to getting the series into 4.6.

Since this discussion has started up again, I should tell you that
after today I probably won't be able to post to the list until next
week.

Ed

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-03-04 23:06                           ` Tamas K Lengyel
  2015-03-04 23:41                             ` Ed White
@ 2015-03-05 10:36                             ` Tim Deegan
  2015-03-05 10:58                               ` Tamas K Lengyel
  1 sibling, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-03-05 10:36 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Ian Jackson, Ed White,
	xen-devel, Jan Beulich

At 00:06 +0100 on 05 Mar (1425510383), Tamas K Lengyel wrote:
> Let's assume we trap an instruction that only performs data accesses
> on pages other than the one the instruction was fetched from. Since
> the instruction fetch is repeated after a failed data access due to
> EPT violation, the page containing the instruction has to be at least
> --x and the pages that will be touched by it rw- (or the proper
> combination of r-- and rw-) simultaneously in order to avoid getting
> into a live-lock. This results in all subsequent instruction fetches
> succeeding from the original page. Furthermore, as long as all such
> subsequent instructions keep accessing only the pages touched by the
> first instruction, we could end up missing a good chunk of code
> execution.

If all you want is to audit the changes that were made to the target
page before making them visible (e.g. before marking the target page
executable or before undoing a private redirection of the page) then
perhaps you don't care how many instructions have executed.  You can
just treat that chunk of execution as if it were one really complex
instruction.

Tim.

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-03-04 23:41                             ` Ed White
@ 2015-03-05 10:51                               ` Tamas K Lengyel
  2015-03-13 17:38                                 ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Tamas K Lengyel @ 2015-03-05 10:51 UTC (permalink / raw)
  To: Ed White
  Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Ian Jackson,
	Tim Deegan, Jan Beulich, xen-devel

On Thu, Mar 5, 2015 at 12:41 AM, Ed White <edmund.h.white@intel.com> wrote:
> On 03/04/2015 03:06 PM, Tamas K Lengyel wrote:
>>>>>>> Right. The key observation is that at any single point in time, a given
>>>>>>> hardware thread can be fetching an instruction or reading data, but not
>>>>>>> both.
>>>>>>
>>>>>> Fine, as long as an instruction reading itself isn't going to lead to
>>>>>> a live lock.
>>>>>>
>>>>>
>>>>> That's not how the hardware works. By the time you figure out that the
>>>>> instruction you are executing reads memory, the instruction itself has
>>>>> been fetched and decoded. That won't happen again during this execution.
>>>>
>>>> Can you explain?  If the instruction faults and is returned to,
>>>> execution starts again, right?  So for an instruction that reads itself:
>>>>
>>>> - the fetch succeeds;
>>>> - the read fails, and we fault;
>>>> - the hypervisor switches from mapping MFN 1 (--x) to MFN 2 (r--);
>>>> - the hypervisor returns to the guest.
>>>>
>>>> Are you relying on the icache/trace cache/whatever to restart
>>>> the instruction from a cached value rather than fault immediately?
>>>> (Because the hypervisor didn't flush the TLB when it changed the mapping)?
>>>>
>>>
>>> Nope. I just typed before drinking enough coffee. That whole answer was bogus.
>>>
>>> Of course, if an instruction reads itself you can get a live lock using
>>> these techniques, but it's a software-induced live lock and software can
>>> avoid it. One way is to compare the address being read with the instruction
>>> pointer, and if they are on the same page emulate instead of switching p2m's.
>>>
>>> Ed
>>
>> Hi Ed,
>> we have been poking at this idea of achieving singlestepping through
>> altp2m view-switching (which would be supported by the VMFUNC
>> EPTP-switching) and the problem discussed above is not limited to
>> instructions that perform data accesses on the same page the
>> executing instruction was fetched from. In order to achieve true
>> single-stepping, the immediately following instruction would have to
>> cause an EPT violation.
>>
>> Let's assume we trap an instruction that only performs data accesses
>> on pages other than the one the instruction was fetched from. Since
>> the instruction fetch is repeated after a failed data access due to
>> EPT violation, the page containing the instruction has to be at least
>> --x and the pages that will be touched by it rw- (or the proper
>> combination of r-- and rw-) simultaneously in order to avoid getting
>> into a live-lock. This results in all subsequent instruction fetches
>> succeeding from the original page. Furthermore, as long as all such
>> subsequent instructions keep accessing only the pages touched by the
>> first instruction, we could end up missing a good chunk of code
>> execution. Is there something we are missing here or is this a known
>> limitation to the EPT-based singlestepping mechanism? Or is there
>> something in the way VMFUNC is implemented that will avoid this
>> limitation?
>>
>> Thanks,
>> Tamas
>>
>
> If you truly need single-step, then there is no alternative to doing
> that the traditional way using TF. What I was hinting at before (and
> I seem to have offended you by doing so) is that if your only reason
> for single-stepping is to revert a view switch, then depending on your
> use-case the single-step may be avoidable. At the risk of offending
> you again, I still can't talk about that in more detail.

I see, that is indeed different from actual singlestepping - which the
Zero-footprint slides also implied to be possible via VMFUNC. Maybe
not calling it singlestepping in the future would avoid confusion.
Sorry if I came off as offended; I understand it's not your decision
what Intel decides to share. What you have started to contribute is
already very valuable and appreciated. I just see too many
undocumented subsystems within Xen that were built to support
proprietary systems, whose original authors refuse to share
information on how to actually use them (mem sharing, for example).
For people outside that loop it's frustrating to have to
reverse-engineer the usecase from the source.

>
> Is there any chance you might reconsider your decision not to help
> with toolstack support of the patch series? I'm still trying to find
> an internal resource to do that work, but right now it's the biggest
> risk I see to getting the series into 4.6.

My comment regarding hesitation in committing toolstack code that is
suboptimal still stands. Many people look at these as reference
implementations, so a faulty or suboptimal contribution here can
have highly counterproductive effects. IMHO that's not how open
source should work. Of course, if that's not the case for at least
some usecases, I would be happy to help with those.

> Since this discussion has started up again, I should tell you that
> after today I probably won't be able to post to the list until next
> week.
>
> Ed

Thanks,
Tamas

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-03-05 10:36                             ` Tim Deegan
@ 2015-03-05 10:58                               ` Tamas K Lengyel
  2015-03-05 11:13                                 ` Tim Deegan
  0 siblings, 1 reply; 135+ messages in thread
From: Tamas K Lengyel @ 2015-03-05 10:58 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Ian Jackson, Ed White,
	xen-devel, Jan Beulich

On Thu, Mar 5, 2015 at 11:36 AM, Tim Deegan <tim@xen.org> wrote:
> At 00:06 +0100 on 05 Mar (1425510383), Tamas K Lengyel wrote:
>> Let's assume we trap an instruction that only performs data accesses
>> on pages other than the one the instruction was fetched from. Since
>> the instruction fetch is repeated after a failed data access due to
>> EPT violation, the page containing the instruction has to be at least
>> --x and the pages that will be touched by it rw- (or the proper
>> combination of r-- and rw-) simultaneously in order to avoid getting
>> into a live-lock. This results in all subsequent instruction fetches
>> succeeding from the original page. Furthermore, as long as all such
>> subsequent instructions keep accessing only the pages touched by the
>> first instruction, we could end up missing a good chunk of code
>> execution.
>
> If all you want is to audit the changes that were made to the target
> page before making them visible (e.g. before marking the target page
> executable or before undoing a private redirection of the page) then
> perhaps you don't care how many instructions have executed.  You can
> just treat that chunk of execution as if it were one really complex
> instruction.
>
> Tim.

Thanks Tim, that indeed seems to have been the intended usecase for
this subsystem. The usecase I was thinking of is API call tracing via
instruction fetch violations (stealthy debugging). Unfortunately that
doesn't seem to be possible, and the terminology used in the
slides/discussion has been somewhat misleading regarding this
possibility.

Thanks,
Tamas

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-03-05 10:58                               ` Tamas K Lengyel
@ 2015-03-05 11:13                                 ` Tim Deegan
  0 siblings, 0 replies; 135+ messages in thread
From: Tim Deegan @ 2015-03-05 11:13 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Ian Jackson, Ed White,
	xen-devel, Jan Beulich

At 11:58 +0100 on 05 Mar (1425553138), Tamas K Lengyel wrote:
> On Thu, Mar 5, 2015 at 11:36 AM, Tim Deegan <tim@xen.org> wrote:
> > At 00:06 +0100 on 05 Mar (1425510383), Tamas K Lengyel wrote:
> >> Let's assume we trap an instruction that only performs data accesses
> >> on pages other than the one the instruction was fetched from. Since
> >> the instruction fetch is repeated after a failed data access due to
> >> EPT violation, the page containing the instruction has to be at least
> >> --x and the pages that will be touched by it rw- (or the proper
> >> combination of r-- and rw-) simultaneously in order to avoid getting
> >> into a live-lock. This results in all subsequent instruction fetches
> >> succeeding from the original page. Furthermore, as long as all such
> >> subsequent instructions keep accessing only the pages touched by the
> >> first instruction, we could end up missing a good chunk of code
> >> execution.
> >
> > If all you want is to audit the changes that were made to the target
> > page before making them visible (e.g. before marking the target page
> > executable or before undoing a private redirection of the page) then
> > perhaps you don't care how many instructions have executed.  You can
> > just treat that chunk of execution as if it were one really complex
> > instruction.
> >
> > Tim.
> 
> Thanks Tim, that indeed seems to have been the intended usecase for
> this subsystem. The usecase I was thinking is API call tracing via
> instruction fetch violations (stealthy debugging).

Ah, yes, for that you'd need all the functions you care about to be on
different pages from their callers.  That's probably true in many
interesting cases (e.g. tracing all calls into a dll).

Cheers,

Tim.

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-13 20:45     ` Andrew Cooper
  2015-01-13 21:30       ` Ed White
@ 2015-03-05 13:45       ` Egger, Christoph
  1 sibling, 0 replies; 135+ messages in thread
From: Egger, Christoph @ 2015-03-05 13:45 UTC (permalink / raw)
  To: Andrew Cooper, Ed White, xen-devel
  Cc: keir, ian.campbell, tim, ian.jackson, jbeulich, Tamas K Lengyel

On 2015/01/13 21:45, Andrew Cooper wrote:
> On 13/01/15 20:02, Ed White wrote:
>> On 01/13/2015 11:01 AM, Andrew Cooper wrote:
>>
>>> I can't think of any reasonable case where the alternate p2m would want
>>> mappings different to the host p2m.  That is to say, an altp2m will map
>>> the same set of mfns to make a guest physical address space, but may
>>> differ in page permissions and possibly p2m types.
>>>
>> The set of mfn's is the same, but I do allow gfn->mfn mappings to be
>> modified under certain circumstances. One use of this is to point the
>> same VA to different physical pages (with different access permissions)
>> in different p2m's to hide memory changes.
> 
> What is the practical use of being able to play paging tricks like this
> behind a VM's back?
> 

I can imagine using this to reduce overhead for PV frontend/backend
driver communication by saying the "mfn" is dom0's "gmfn".

Christoph

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-03-05 10:51                               ` Tamas K Lengyel
@ 2015-03-13 17:38                                 ` Ed White
  0 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-03-13 17:38 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Ian Jackson,
	Tim Deegan, Jan Beulich, xen-devel

>>
>> Is there any chance you might reconsider your decision not to help
>> with toolstack support of the patch series? I'm still trying to find
>> an internal resource to do that work, but right now it's the biggest
>> risk I see to getting the series into 4.6.
> 
> My comment regarding hesitation in committing toolstack code that is
> suboptimal still stands. Many people look at these as reference
> implementations, so a faulty or suboptimal contribution here can
> have highly counterproductive effects. IMHO that's not how open
> source should work. Of course, if that's not the case for at least
> some usecases, I would be happy to help with those.
> 

You are more familiar with the tools in question than I am, but
I wonder if anything would be suboptimal in the sense that you
mean it. You could argue that there may be closer to optimal ways
of intercepting an OS call for instance, but that depends on your
use-case, and I suspect that is at a higher level of abstraction than
the tools provide.

Single-stepping, OTOH, would not be suboptimal -- that was a
misunderstanding.

Ed

* Re: [PATCH 02/11] VMX: implement suppress #VE.
  2015-01-15 18:46     ` Ed White
  2015-01-16 17:22       ` Tim Deegan
@ 2015-03-25 17:30       ` Ed White
  2015-03-26 10:15         ` Tim Deegan
  1 sibling, 1 reply; 135+ messages in thread
From: Ed White @ 2015-03-25 17:30 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

On 01/15/2015 10:46 AM, Ed White wrote:
> On 01/15/2015 08:25 AM, Tim Deegan wrote:
>> Hi,
>>
>> At 13:26 -0800 on 09 Jan (1420806392), Ed White wrote:
>>>  static inline bool_t is_epte_valid(ept_entry_t *e)
>>>  {
>>> -    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
>>> +    return (e->valid != 0 && e->sa_p2mt != p2m_invalid);
>>
>> This test for 0 is just catching uninitialised entries in freshly
>> allocated pages.  Rather than changing it to ignore bit 63, this loop...
>>
>>>  }
>>>  
>>>  /* returns : 0 for success, -errno otherwise */
>>> @@ -194,6 +194,19 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
>>>  
>>>      ept_entry->r = ept_entry->w = ept_entry->x = 1;
>>>  
>>> +    /* Disable #VE on all entries */ 
>>> +    if ( cpu_has_vmx_virt_exceptions )
>>> +    {
>>> +        ept_entry_t *table = __map_domain_page(pg);
>>> +
>>> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
>>> +            table[i].suppress_ve = 1;
>>
>> ...should set the type of the empty entries to p2m_invalid as it goes.
>>
>>> +    /* Disable #VE on all entries */
>>> +    if ( cpu_has_vmx_virt_exceptions )
>>> +    {
>>> +        ept_entry_t *table =
>>> +            map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
>>> +
>>> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
>>> +            table[i].suppress_ve = 1;
>>
>> And the same here.

I have some time to work on this patch series again. Although I tried
this, it doesn't eliminate all the instances of the epte being zero
(with the possible exception of suppress_ve). I spent some time trying
to find all the cases where that happens, without success, so I've
used Andrew's suggestion of using a mask.
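
For illustration, a minimal sketch of the mask approach, assuming the
suppress-#VE bit is bit 63 of the EPT entry (as per the SDM); this is
a sketch, not the patch itself:

#include <stdint.h>

#define EPT_SUPPRESS_VE (1ULL << 63)

/* Treat an entry as uninitialised only if everything apart from the
 * suppress-#VE bit is zero. */
static inline int epte_is_blank(uint64_t epte)
{
    return (epte & ~EPT_SUPPRESS_VE) == 0;
}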

Ed

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-01-15 18:23   ` Ed White
  2015-01-16  8:12     ` Jan Beulich
  2015-01-16 18:33     ` Tim Deegan
@ 2015-03-25 17:41     ` Ed White
  2015-03-26 10:40       ` Tim Deegan
  2 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-03-25 17:41 UTC (permalink / raw)
  To: Tim Deegan; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

>>
>> The second thing is how similar some of this is to nested p2m code,
>> making me wonder whether it could share more code with that.  It's not
>> as much duplication as I had feared, but e.g. altp2m_write_p2m_entry()
>> is _identical_ to nestedp2m_write_p2m_entry(), (making the
>> copyright claim at the top of the file quite dubious, BTW).
>>
> 
> I did initially use nestedp2m_write_p2m_entry directly, but I knew
> that wouldn't be acceptable! On this specific point, I would be more
> inclined to refactor the normal write entry routine so you can call
> it everywhere, since both the nested and alternate ones are simply
> a copy of a part of the normal one.
> 

I started to look into this again. What I described as the normal write
routine is the one for HAP, and that is actually a domain-level routine
(akin to the paging mode-specific ones), and takes a domain as its first
parameter. The nestedp2m routine is a p2m-level routine that takes a p2m
as its first parameter, and that's what I had copied.

However, looking at the code, the p2m-level write routine is only ever
called from the domain-level one, but the HAP routine doesn't make such
calls, and nestedp2m and altp2m are only available in HAP mode.

Therefore, I have dropped the altp2m write routine with no ill effects.
I don't think the nestedp2m one can ever be called.

Ed

* Re: [PATCH 05/11] x86/altp2m: basic data structures and support routines.
  2015-01-13 19:49     ` Ed White
@ 2015-03-25 20:59       ` Ed White
  2015-03-26 10:48         ` Tim Deegan
  0 siblings, 1 reply; 135+ messages in thread
From: Ed White @ 2015-03-25 20:59 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: ian.jackson, tim, keir, ian.campbell, jbeulich

>>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
>>> index abf3d7a..8fe0650 100644
>>> --- a/xen/arch/x86/mm/hap/hap.c
>>> +++ b/xen/arch/x86/mm/hap/hap.c
>>> @@ -439,7 +439,7 @@ void hap_domain_init(struct domain *d)
>>>  int hap_enable(struct domain *d, u32 mode)
>>>  {
>>>      unsigned int old_pages;
>>> -    uint8_t i;
>>> +    uint16_t i;
>>>      int rv = 0;
>>>  
>>>      domain_pause(d);
>>> @@ -485,6 +485,23 @@ int hap_enable(struct domain *d, u32 mode)
>>>             goto out;
>>>      }
>>>  
>>> +    /* Init alternate p2m data */
>>> +    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )
>>
>> This memory should be allocated from some domain-accounted pool,
>> probably the paging pool (d->->arch.paging.alloc_page()).  You can use
>> map_domain_page_global() to get a safe pointer to anchor in
>> d->arch.altp2m_eptp for hardware.

I tried this but could not get it to work due to panics in Xen.
Looking at the current VMX code, all the existing structures
shared with hardware (VMCS, exception bitmap, etc.) are allocated
using alloc_xenheap_page(), which is what induced me to write the
code this way.
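
For context, a simplified sketch of the setup being discussed (the
slot count and invalid-slot marker are assumptions for illustration,
not the actual patch):

#include <stdint.h>

#define EPTP_LIST_SLOTS 512     /* one 4K page of 8-byte pointers */
#define INVALID_EPTP    (~0ULL)

/* Initialise the per-domain EPTP-list page: mark every slot invalid
 * so VMFUNC cannot switch to an unconfigured view; valid EPTPs are
 * filled in as altp2m views are created. */
static void init_eptp_list(uint64_t eptp_page[EPTP_LIST_SLOTS])
{
    for ( unsigned int i = 0; i < EPTP_LIST_SLOTS; i++ )
        eptp_page[i] = INVALID_EPTP;
}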

Ed

* Re: [PATCH 02/11] VMX: implement suppress #VE.
  2015-03-25 17:30       ` Ed White
@ 2015-03-26 10:15         ` Tim Deegan
  0 siblings, 0 replies; 135+ messages in thread
From: Tim Deegan @ 2015-03-26 10:15 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

At 10:30 -0700 on 25 Mar (1427279417), Ed White wrote:
> On 01/15/2015 10:46 AM, Ed White wrote:
> > On 01/15/2015 08:25 AM, Tim Deegan wrote:
> >> Hi,
> >>
> >> At 13:26 -0800 on 09 Jan (1420806392), Ed White wrote:
> >>>  static inline bool_t is_epte_valid(ept_entry_t *e)
> >>>  {
> >>> -    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
> >>> +    return (e->valid != 0 && e->sa_p2mt != p2m_invalid);
> >>
> >> This test for 0 is just catching uninitialised entries in freshly
> >> allocated pages.  Rather than changing it to ignore bit 63, this loop...
> >>
> >>>  }
> >>>  
> >>>  /* returns : 0 for success, -errno otherwise */
> >>> @@ -194,6 +194,19 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
> >>>  
> >>>      ept_entry->r = ept_entry->w = ept_entry->x = 1;
> >>>  
> >>> +    /* Disable #VE on all entries */ 
> >>> +    if ( cpu_has_vmx_virt_exceptions )
> >>> +    {
> >>> +        ept_entry_t *table = __map_domain_page(pg);
> >>> +
> >>> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
> >>> +            table[i].suppress_ve = 1;
> >>
> >> ...should set the type of the empty entries to p2m_invalid as it goes.
> >>
> >>> +    /* Disable #VE on all entries */
> >>> +    if ( cpu_has_vmx_virt_exceptions )
> >>> +    {
> >>> +        ept_entry_t *table =
> >>> +            map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
> >>> +
> >>> +        for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
> >>> +            table[i].suppress_ve = 1;
> >>
> >> And the same here.
> 
> I have some time to work on this patch series again, and although I tried
> this it doesn't eliminate all the instances of epte being zero with the
> possible exception of suppress_ve. I spent some time trying to find all
> cases where that happens without success, so I've used Andrew's suggestion
> of using a mask.

Fair enough.  Thanks for trying it.

Tim.

* Re: [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
  2015-03-25 17:41     ` Ed White
@ 2015-03-26 10:40       ` Tim Deegan
  0 siblings, 0 replies; 135+ messages in thread
From: Tim Deegan @ 2015-03-26 10:40 UTC (permalink / raw)
  To: Ed White; +Cc: keir, ian.jackson, ian.campbell, jbeulich, xen-devel

At 10:41 -0700 on 25 Mar (1427280115), Ed White wrote:
> >>
> >> The second thing is how similar some of this is to nested p2m code,
> >> making me wonder whether it could share more code with that.  It's not
> >> as much duplication as I had feared, but e.g. altp2m_write_p2m_entry()
> >> is _identical_ to nestedp2m_write_p2m_entry(), (making the
> >> copyright claim at the top of the file quite dubious, BTW).
> >>
> > 
> > I did initially use nestedp2m_write_p2m_entry directly, but I knew
> > that wouldn't be acceptable! On this specific point, I would be more
> > inclined to refactor the normal write entry routine so you can call
> > it everywhere, since both the nested and alternate ones are simply
> > a copy of a part of the normal one.
> > 
> 
> I started to look into this again. What I described as the normal write
> routine is the one for HAP, and that is actually a domain-level routine
> (akin to the paging mode-specific ones), and takes a domain as its first
> parameter. The nestedp2m routine is a p2m-level routine that takes a p2m
> as its first parameter, and that's what I had copied.
> 
> However, looking at the code, the p2m-level write routine is only ever
> called from the domain-level one, but the HAP routine doesn't make such
> calls, and nestedp2m and altp2m are only available in HAP mode.
> 
> Therefore, I have dropped the altp2m write routine with no ill effects.
> I don't think the nestedp2m one can ever be called.

The write_entry() hook is called from the p2m-pt.c implementation (i.e. on
AMD SVM), so with s/HAP/EPT/g your explanation is correct.  In EPT,
the equivalent is the atomic_write_p2m_entry() function.

Cheers,

Tim.

* Re: [PATCH 05/11] x86/altp2m: basic data structures and support routines.
  2015-03-25 20:59       ` Ed White
@ 2015-03-26 10:48         ` Tim Deegan
  2015-03-26 18:00           ` Ed White
  0 siblings, 1 reply; 135+ messages in thread
From: Tim Deegan @ 2015-03-26 10:48 UTC (permalink / raw)
  To: Ed White
  Cc: keir, ian.campbell, Andrew Cooper, ian.jackson, xen-devel, jbeulich

At 13:59 -0700 on 25 Mar (1427291983), Ed White wrote:
> >>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> >>> index abf3d7a..8fe0650 100644
> >>> --- a/xen/arch/x86/mm/hap/hap.c
> >>> +++ b/xen/arch/x86/mm/hap/hap.c
> >>> @@ -439,7 +439,7 @@ void hap_domain_init(struct domain *d)
> >>>  int hap_enable(struct domain *d, u32 mode)
> >>>  {
> >>>      unsigned int old_pages;
> >>> -    uint8_t i;
> >>> +    uint16_t i;
> >>>      int rv = 0;
> >>>  
> >>>      domain_pause(d);
> >>> @@ -485,6 +485,23 @@ int hap_enable(struct domain *d, u32 mode)
> >>>             goto out;
> >>>      }
> >>>  
> >>> +    /* Init alternate p2m data */
> >>> +    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )
> >>
> >> This memory should be allocated from some domain-accounted pool,
> >> probably the paging pool (d->arch.paging.alloc_page()).  You can use
> >> map_domain_page_global() to get a safe pointer to anchor in
> >> d->arch.altp2m_eptp for hardware.
> 
> I tried this but could not get it to work due to panics in Xen.
> Looking at the current VMX code, all the existing structures
> shared with hardware (VMCS, exception bitmap, etc.) are allocated
> using alloc_xenheap_page(), which is what induced me to write the
> code this way.

This page is a per-domain page of pointers, right, and not the actual
top-level page of an alt-EPT table?  In that case, alloc_xenheap_page()
should be OK.

Cheers,

Tim.

* Re: [PATCH 05/11] x86/altp2m: basic data structures and support routines.
  2015-03-26 10:48         ` Tim Deegan
@ 2015-03-26 18:00           ` Ed White
  0 siblings, 0 replies; 135+ messages in thread
From: Ed White @ 2015-03-26 18:00 UTC (permalink / raw)
  To: Tim Deegan
  Cc: keir, ian.campbell, Andrew Cooper, ian.jackson, xen-devel, jbeulich

On 03/26/2015 03:48 AM, Tim Deegan wrote:
> At 13:59 -0700 on 25 Mar (1427291983), Ed White wrote:
>>>>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
>>>>> index abf3d7a..8fe0650 100644
>>>>> --- a/xen/arch/x86/mm/hap/hap.c
>>>>> +++ b/xen/arch/x86/mm/hap/hap.c
>>>>> @@ -439,7 +439,7 @@ void hap_domain_init(struct domain *d)
>>>>>  int hap_enable(struct domain *d, u32 mode)
>>>>>  {
>>>>>      unsigned int old_pages;
>>>>> -    uint8_t i;
>>>>> +    uint16_t i;
>>>>>      int rv = 0;
>>>>>  
>>>>>      domain_pause(d);
>>>>> @@ -485,6 +485,23 @@ int hap_enable(struct domain *d, u32 mode)
>>>>>             goto out;
>>>>>      }
>>>>>  
>>>>> +    /* Init alternate p2m data */
>>>>> +    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )
>>>>
>>>> This memory should be allocated from some domain-accounted pool,
>>>> probably the paging pool (d->arch.paging.alloc_page()).  You can use
>>>> map_domain_page_global() to get a safe pointer to anchor in
>>>> d->arch.altp2m_eptp for hardware.
>>
>> I tried this but could not get it to work due to panics in Xen.
>> Looking at the current VMX code, all the existing structures
>> shared with hardware (VMCS, exception bitmap, etc.) are allocated
>> using alloc_xenheap_page(), which is what induced me to write the
>> code this way.
> 
> This page is a per-domain page of pointers, right, and not the actual
> top-level page of an alt-EPT table?  In that case, alloc_xenheap_page()
> should be OK.
> 
> Cheers,
> 
> Tim.
> 

Yep, one page of pointers per domain. Thanks, I'll leave this code
as I originally wrote it then.

Ed
