* [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
From: Ed White @ 2015-06-22 18:56 UTC
  To: xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Ed White,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

This set of patches adds support to HVM domains for EPTP switching by
creating multiple copies of the host p2m (currently limited to 10 copies).

The primary use of this capability is expected to be in scenarios where access
to memory needs to be monitored and/or restricted below the level at which the
guest OS page tables operate. Two examples that were discussed at the 2014 Xen
developer summit are:

    VM introspection: 
        http://www.slideshare.net/xen_com_mgr/
        zero-footprint-guest-memory-introspection-from-xen

    Secure inter-VM communication:
        http://www.slideshare.net/xen_com_mgr/nakajima-nvf

A more detailed design specification can be found at:
    http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg01319.html

Each p2m copy is populated lazily on EPT violations.
Permissions for pages in alternate p2ms can be changed in a way similar
to the existing memory access interface, and gfn->mfn mappings can be changed.

All this is done through extra HVMOP types.
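
As a sketch of the lazy-copy step (helper names and exact signatures here
are illustrative, not the literal routines in p2m.c): on an EPT violation
in an active alternate view, the host p2m entry is propagated on demand,
permissions included:

    static bool_t altp2m_lazy_copy_sketch(struct p2m_domain *hp2m,
                                          struct p2m_domain *ap2m,
                                          unsigned long gfn)
    {
        p2m_type_t t;
        p2m_access_t a;
        mfn_t mfn = hp2m->get_entry(hp2m, gfn, &t, &a, 0, NULL);

        /* Per the v2 change noted below, any page with a valid mfn
         * is copied; a miss here is a genuine fault. */
        if ( mfn_x(mfn) == INVALID_MFN )
            return 0;

        return ap2m->set_entry(ap2m, gfn, mfn, PAGE_ORDER_4K, t, a) == 0;
    }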

The cross-domain HVMOP code has been compile-tested only. It is also
hypervisor-only; the toolstack has not been modified.

The intra-domain code has been tested. Violation notifications can only be
received for pages that have been modified (access permissions and/or
gfn->mfn mapping) intra-domain, and only on VCPUs that have enabled
notification.

VMFUNC and #VE will both be emulated on hardware without native support.
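
Either way, a guest that has been modified to use this support switches
views with a single 3-byte instruction (opcode 0f 01 d4). Illustrative
GCC inline assembly, not part of this series:

    /* VMFUNC leaf 0 (EPTP switching): EAX = 0, ECX = view index. */
    static inline void vmfunc_switch_view(uint32_t idx)
    {
        asm volatile ( ".byte 0x0f, 0x01, 0xd4"
                       :: "a" (0), "c" (idx)
                       : "memory" );
    }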

This code is not compatible with nested HVM functionality and will refuse
to work with nested HVM active. It is also not compatible with migration.
It should be considered experimental.


Changes since v1:

Many changes since v1 in response to maintainer feedback, including:

    Suppress_ve state is now decoupled from memory type
    VMFUNC emulation handled in x86 emulator
    Lazy-copy algorithm copies any page where mfn != INVALID_MFN
    All nested page fault handling except lazy-copy is now in
        top-level (hvm.c) nested page fault handler
    Split p2m lock type (as suggested by Tim) to avoid lock order violations
    XSM hooks
    Xen command-line parameter to globally enable altp2m (default disabled),
        plus a per-domain HVM parameter
    Altp2m reference counting no longer uses dirty_cpu bitmap
    Remapped page tracking to invalidate altp2ms where needed to protect Xen
    Many other minor changes

The altp2m invalidation is implemented to a level that I believe satisfies
the requirements of protecting Xen. Invalidation notification is not yet
implemented, and there may be other cases where invalidation is warranted to
protect the integrity of the restrictions placed through altp2m. We may add
further patches in this area.

Testability is still a potential issue. We have offered to make our internal
Windows test binaries available for intra-domain testing. Tamas has
been working on toolstack support for cross-domain testing with a slightly
earlier patch series, and we hope he will submit that support.

Not all of the patches will be of interest to everyone copied here. I've
copied everyone on this initial mailing to give context.
   
Ed White (10):
  VMX: VMFUNC and #VE definitions and detection.
  VMX: implement suppress #VE.
  x86/HVM: Hardware alternate p2m support detection.
  x86/altp2m: basic data structures and support routines.
  VMX/altp2m: add code to support EPTP switching and #VE.
  x86/altp2m: add control of suppress_ve.
  x86/altp2m: alternate p2m memory events.
  x86/altp2m: add remaining support routines.
  x86/altp2m: define and implement alternate p2m HVMOP types.
  x86/altp2m: Add altp2mhvm HVM domain parameter.

Ravi Sahita (2):
  VMX: add VMFUNC leaf 0 (EPTP switching) to emulator.
  x86/altp2m: XSM hooks for altp2m HVM ops

 docs/man/xl.cfg.pod.5                        |  12 +
 docs/misc/xen-command-line.markdown          |   7 +
 tools/flask/policy/policy/modules/xen/xen.if |   4 +-
 tools/libxl/libxl_create.c                   |   1 +
 tools/libxl/libxl_dom.c                      |   2 +
 tools/libxl/libxl_types.idl                  |   1 +
 tools/libxl/xl_cmdimpl.c                     |   8 +
 xen/arch/x86/hvm/Makefile                    |   2 +
 xen/arch/x86/hvm/altp2mhvm.c                 |  82 +++++
 xen/arch/x86/hvm/emulate.c                   |  13 +-
 xen/arch/x86/hvm/hvm.c                       | 357 +++++++++++++++++-
 xen/arch/x86/hvm/vmx/vmcs.c                  |  42 ++-
 xen/arch/x86/hvm/vmx/vmx.c                   | 163 +++++++++
 xen/arch/x86/mm/hap/Makefile                 |   1 +
 xen/arch/x86/mm/hap/altp2m_hap.c             | 103 ++++++
 xen/arch/x86/mm/hap/hap.c                    |  31 +-
 xen/arch/x86/mm/mm-locks.h                   |  33 +-
 xen/arch/x86/mm/p2m-ept.c                    |  67 +++-
 xen/arch/x86/mm/p2m.c                        | 528 ++++++++++++++++++++++++++-
 xen/arch/x86/x86_emulate/x86_emulate.c       |   8 +
 xen/arch/x86/x86_emulate/x86_emulate.h       |   4 +
 xen/include/asm-arm/p2m.h                    |   7 +
 xen/include/asm-x86/domain.h                 |  10 +
 xen/include/asm-x86/hvm/altp2mhvm.h          |  42 +++
 xen/include/asm-x86/hvm/hvm.h                |  25 ++
 xen/include/asm-x86/hvm/vcpu.h               |   9 +
 xen/include/asm-x86/hvm/vmx/vmcs.h           |  14 +-
 xen/include/asm-x86/hvm/vmx/vmx.h            |  13 +-
 xen/include/asm-x86/msr-index.h              |   1 +
 xen/include/asm-x86/p2m.h                    |  80 +++-
 xen/include/public/hvm/hvm_op.h              |  69 ++++
 xen/include/public/hvm/params.h              |   5 +-
 xen/include/public/vm_event.h                |  13 +-
 xen/include/xen/mem_access.h                 |   1 +
 xen/include/xsm/dummy.h                      |  12 +
 xen/include/xsm/xsm.h                        |  12 +
 xen/xsm/dummy.c                              |   2 +
 xen/xsm/flask/hooks.c                        |  12 +
 xen/xsm/flask/policy/access_vectors          |   7 +
 39 files changed, 1770 insertions(+), 33 deletions(-)
 create mode 100644 xen/arch/x86/hvm/altp2mhvm.c
 create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c
 create mode 100644 xen/include/asm-x86/hvm/altp2mhvm.h

-- 
1.9.1


* [PATCH v2 01/12] VMX: VMFUNC and #VE definitions and detection.
From: Ed White @ 2015-06-22 18:56 UTC
  To: xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Ed White,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

Currently, neither VMFUNC nor #VE is enabled globally, but both may be
enabled on a per-VCPU basis by the altp2m code.

Remove the check for EPTE bit 63 == zero in ept_split_super_page(), as
that bit is now hardware-defined.
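
The dependency chain enforced in vmx_init_vmcs_config() can be modelled
in isolation as follows (a self-contained sketch, not the Xen code;
vmfunc_msr stands in for a read of MSR_IA32_VMX_VMFUNC):

    #include <stdint.h>

    #define SECONDARY_EXEC_ENABLE_VM_FUNCTIONS      0x00002000
    #define SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS   0x00040000
    #define VMX_VMFUNC_EPTP_SWITCHING               (1ULL << 0)

    static uint32_t gate_controls(uint32_t sec_ctl, uint64_t vmfunc_msr)
    {
        /* Keep VMFUNC only if leaf 0 (EPTP switching) is present... */
        if ( (sec_ctl & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) &&
             !(vmfunc_msr & VMX_VMFUNC_EPTP_SWITCHING) )
            sec_ctl &= ~SECONDARY_EXEC_ENABLE_VM_FUNCTIONS;

        /* ...and keep #VE only if VMFUNC survived. */
        if ( !(sec_ctl & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
            sec_ctl &= ~SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;

        return sec_ctl;
    }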

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 42 +++++++++++++++++++++++++++++++++++---
 xen/arch/x86/mm/p2m-ept.c          |  1 -
 xen/include/asm-x86/hvm/vmx/vmcs.h | 14 +++++++++++--
 xen/include/asm-x86/hvm/vmx/vmx.h  | 13 +++++++++++-
 xen/include/asm-x86/msr-index.h    |  1 +
 5 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 4c5ceb5..bc1cabd 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -101,6 +101,8 @@ u32 vmx_secondary_exec_control __read_mostly;
 u32 vmx_vmexit_control __read_mostly;
 u32 vmx_vmentry_control __read_mostly;
 u64 vmx_ept_vpid_cap __read_mostly;
+u64 vmx_vmfunc __read_mostly;
+bool_t vmx_virt_exception __read_mostly;
 
 const u32 vmx_introspection_force_enabled_msrs[] = {
     MSR_IA32_SYSENTER_EIP,
@@ -140,6 +142,8 @@ static void __init vmx_display_features(void)
     P(cpu_has_vmx_virtual_intr_delivery, "Virtual Interrupt Delivery");
     P(cpu_has_vmx_posted_intr_processing, "Posted Interrupt Processing");
     P(cpu_has_vmx_vmcs_shadowing, "VMCS shadowing");
+    P(cpu_has_vmx_vmfunc, "VM Functions");
+    P(cpu_has_vmx_virt_exceptions, "Virtualisation Exceptions");
     P(cpu_has_vmx_pml, "Page Modification Logging");
 #undef P
 
@@ -185,6 +189,7 @@ static int vmx_init_vmcs_config(void)
     u64 _vmx_misc_cap = 0;
     u32 _vmx_vmexit_control;
     u32 _vmx_vmentry_control;
+    u64 _vmx_vmfunc = 0;
     bool_t mismatch = 0;
 
     rdmsr(MSR_IA32_VMX_BASIC, vmx_basic_msr_low, vmx_basic_msr_high);
@@ -230,7 +235,9 @@ static int vmx_init_vmcs_config(void)
                SECONDARY_EXEC_ENABLE_EPT |
                SECONDARY_EXEC_ENABLE_RDTSCP |
                SECONDARY_EXEC_PAUSE_LOOP_EXITING |
-               SECONDARY_EXEC_ENABLE_INVPCID);
+               SECONDARY_EXEC_ENABLE_INVPCID |
+               SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
+               SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
         rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
         if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
             opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
@@ -341,6 +348,24 @@ static int vmx_init_vmcs_config(void)
           || !(_vmx_vmexit_control & VM_EXIT_ACK_INTR_ON_EXIT) )
         _vmx_pin_based_exec_control  &= ~ PIN_BASED_POSTED_INTERRUPT;
 
+    /* The IA32_VMX_VMFUNC MSR exists only when VMFUNC is available */
+    if ( _vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS )
+    {
+        rdmsrl(MSR_IA32_VMX_VMFUNC, _vmx_vmfunc);
+
+        /*
+         * VMFUNC leaf 0 (EPTP switching) must be supported.
+         *
+         * Or we just don't use VMFUNC.
+         */
+        if ( !(_vmx_vmfunc & VMX_VMFUNC_EPTP_SWITCHING) )
+            _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VM_FUNCTIONS;
+    }
+
+    /* Virtualization exceptions are only enabled if VMFUNC is enabled */
+    if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
+        _vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
+
     min = 0;
     opt = VM_ENTRY_LOAD_GUEST_PAT | VM_ENTRY_LOAD_BNDCFGS;
     _vmx_vmentry_control = adjust_vmx_controls(
@@ -361,6 +386,9 @@ static int vmx_init_vmcs_config(void)
         vmx_vmentry_control        = _vmx_vmentry_control;
         vmx_basic_msr              = ((u64)vmx_basic_msr_high << 32) |
                                      vmx_basic_msr_low;
+        vmx_vmfunc                 = _vmx_vmfunc;
+        vmx_virt_exception         = !!(_vmx_secondary_exec_control &
+                                       SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
         vmx_display_features();
 
         /* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
@@ -397,6 +425,9 @@ static int vmx_init_vmcs_config(void)
         mismatch |= cap_check(
             "EPT and VPID Capability",
             vmx_ept_vpid_cap, _vmx_ept_vpid_cap);
+        mismatch |= cap_check(
+            "VMFUNC Capability",
+            vmx_vmfunc, _vmx_vmfunc);
         if ( cpu_has_vmx_ins_outs_instr_info !=
              !!(vmx_basic_msr_high & (VMX_BASIC_INS_OUT_INFO >> 32)) )
         {
@@ -967,6 +998,11 @@ static int construct_vmcs(struct vcpu *v)
     /* Do not enable Monitor Trap Flag unless start single step debug */
     v->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG;
 
+    /* Disable VMFUNC and #VE for now: they may be enabled later by altp2m. */
+    v->arch.hvm_vmx.secondary_exec_control &=
+        ~(SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
+          SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
+
     if ( is_pvh_domain(d) )
     {
         /* Disable virtual apics, TPR */
@@ -1790,9 +1826,9 @@ void vmcs_dump_vcpu(struct vcpu *v)
         printk("PLE Gap=%08x Window=%08x\n",
                vmr32(PLE_GAP), vmr32(PLE_WINDOW));
     if ( v->arch.hvm_vmx.secondary_exec_control &
-         (SECONDARY_EXEC_ENABLE_VPID | SECONDARY_EXEC_ENABLE_VMFUNC) )
+         (SECONDARY_EXEC_ENABLE_VPID | SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
         printk("Virtual processor ID = 0x%04x VMfunc controls = %016lx\n",
-               vmr16(VIRTUAL_PROCESSOR_ID), vmr(VMFUNC_CONTROL));
+               vmr16(VIRTUAL_PROCESSOR_ID), vmr(VM_FUNCTION_CONTROL));
 
     vmx_vmcs_exit(v);
 }
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 5133eb6..a6c9adf 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -281,7 +281,6 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
         epte->sp = (level > 1);
         epte->mfn += i * trunk;
         epte->snp = (iommu_enabled && iommu_snoop);
-        ASSERT(!epte->avail3);
 
         ept_p2m_type_to_flags(p2m, epte, epte->sa_p2mt, epte->access);
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 1104bda..cb0ee6c 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -222,9 +222,10 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING       0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID           0x00001000
-#define SECONDARY_EXEC_ENABLE_VMFUNC            0x00002000
+#define SECONDARY_EXEC_ENABLE_VM_FUNCTIONS      0x00002000
 #define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
 #define SECONDARY_EXEC_ENABLE_PML               0x00020000
+#define SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS   0x00040000
 extern u32 vmx_secondary_exec_control;
 
 #define VMX_EPT_EXEC_ONLY_SUPPORTED             0x00000001
@@ -285,6 +286,10 @@ extern u32 vmx_secondary_exec_control;
     (vmx_pin_based_exec_control & PIN_BASED_POSTED_INTERRUPT)
 #define cpu_has_vmx_vmcs_shadowing \
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VMCS_SHADOWING)
+#define cpu_has_vmx_vmfunc \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS)
+#define cpu_has_vmx_virt_exceptions \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS)
 #define cpu_has_vmx_pml \
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML)
 
@@ -316,6 +321,9 @@ extern u64 vmx_basic_msr;
 #define VMX_GUEST_INTR_STATUS_SUBFIELD_BITMASK  0x0FF
 #define VMX_GUEST_INTR_STATUS_SVI_OFFSET        8
 
+/* VMFUNC leaf definitions */
+#define VMX_VMFUNC_EPTP_SWITCHING   (1ULL << 0)
+
 /* VMCS field encodings. */
 #define VMCS_HIGH(x) ((x) | 1)
 enum vmcs_field {
@@ -350,12 +358,14 @@ enum vmcs_field {
     VIRTUAL_APIC_PAGE_ADDR          = 0x00002012,
     APIC_ACCESS_ADDR                = 0x00002014,
     PI_DESC_ADDR                    = 0x00002016,
-    VMFUNC_CONTROL                  = 0x00002018,
+    VM_FUNCTION_CONTROL             = 0x00002018,
     EPT_POINTER                     = 0x0000201a,
     EOI_EXIT_BITMAP0                = 0x0000201c,
 #define EOI_EXIT_BITMAP(n) (EOI_EXIT_BITMAP0 + (n) * 2) /* n = 0...3 */
+    EPTP_LIST_ADDR                  = 0x00002024,
     VMREAD_BITMAP                   = 0x00002026,
     VMWRITE_BITMAP                  = 0x00002028,
+    VIRT_EXCEPTION_INFO             = 0x0000202a,
     GUEST_PHYSICAL_ADDRESS          = 0x00002400,
     VMCS_LINK_POINTER               = 0x00002800,
     GUEST_IA32_DEBUGCTL             = 0x00002802,
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 35f804a..5b59d3c 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -47,7 +47,7 @@ typedef union {
         access      :   4,  /* bits 61:58 - p2m_access_t */
         tm          :   1,  /* bit 62 - VT-d transient-mapping hint in
                                shared EPT/VT-d usage */
-        avail3      :   1;  /* bit 63 - Software available 3 */
+        suppress_ve :   1;  /* bit 63 - suppress #VE */
     };
     u64 epte;
 } ept_entry_t;
@@ -186,6 +186,7 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
 #define EXIT_REASON_XSETBV              55
 #define EXIT_REASON_APIC_WRITE          56
 #define EXIT_REASON_INVPCID             58
+#define EXIT_REASON_VMFUNC              59
 #define EXIT_REASON_PML_FULL            62
 
 /*
@@ -554,4 +555,14 @@ void p2m_init_hap_data(struct p2m_domain *p2m);
 #define EPT_L4_PAGETABLE_SHIFT      39
 #define EPT_PAGETABLE_ENTRIES       512
 
+/* #VE information page */
+typedef struct {
+    u32 exit_reason;
+    u32 semaphore;
+    u64 exit_qualification;
+    u64 gla;
+    u64 gpa;
+    u16 eptp_index;
+} ve_info_t;
+
 #endif /* __ASM_X86_HVM_VMX_VMX_H__ */
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 83f2f70..8069d60 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -130,6 +130,7 @@
 #define MSR_IA32_VMX_TRUE_PROCBASED_CTLS        0x48e
 #define MSR_IA32_VMX_TRUE_EXIT_CTLS             0x48f
 #define MSR_IA32_VMX_TRUE_ENTRY_CTLS            0x490
+#define MSR_IA32_VMX_VMFUNC                     0x491
 #define IA32_FEATURE_CONTROL_MSR                0x3a
 #define IA32_FEATURE_CONTROL_MSR_LOCK                     0x0001
 #define IA32_FEATURE_CONTROL_MSR_ENABLE_VMXON_INSIDE_SMX  0x0002
-- 
1.9.1


* [PATCH v2 02/12] VMX: implement suppress #VE.
From: Ed White @ 2015-06-22 18:56 UTC
  To: xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Ed White,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

In preparation for selectively enabling #VE in a later patch, set
suppress #VE on all EPTEs.

Suppress #VE should always be the default condition for two reasons:
it is generally not safe to deliver #VE into a guest unless that guest
has been modified to receive it; and even then, for most EPT violations,
only the hypervisor is able to handle the violation.
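
A side effect is that an otherwise-empty EPTE now has bit 63 set, so the
validity check must mask that bit out. A self-contained sketch of the
is_epte_valid() change:

    #include <stdint.h>

    /* An entry whose only set bit is suppress #VE (bit 63) is still
     * "empty" and must not be treated as a valid mapping. */
    static int epte_valid_sketch(uint64_t epte, int sa_p2mt, int p2m_invalid)
    {
        return (epte & ~(1ULL << 63)) != 0 && sa_p2mt != p2m_invalid;
    }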

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 xen/arch/x86/mm/p2m-ept.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index a6c9adf..5de3387 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -41,7 +41,7 @@
 #define is_epte_superpage(ept_entry)    ((ept_entry)->sp)
 static inline bool_t is_epte_valid(ept_entry_t *e)
 {
-    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
+    return ((e->epte & ~(1ul << 63)) != 0 && e->sa_p2mt != p2m_invalid);
 }
 
 /* returns : 0 for success, -errno otherwise */
@@ -219,6 +219,8 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
 static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
 {
     struct page_info *pg;
+    ept_entry_t *table;
+    unsigned int i;
 
     pg = p2m_alloc_ptp(p2m, 0);
     if ( pg == NULL )
@@ -232,6 +234,15 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
     /* Manually set A bit to avoid overhead of MMU having to write it later. */
     ept_entry->a = 1;
 
+    ept_entry->suppress_ve = 1;
+
+    table = __map_domain_page(pg);
+
+    for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
+        table[i].suppress_ve = 1;
+
+    unmap_domain_page(table);
+
     return 1;
 }
 
@@ -281,6 +292,7 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
         epte->sp = (level > 1);
         epte->mfn += i * trunk;
         epte->snp = (iommu_enabled && iommu_snoop);
+        epte->suppress_ve = 1;
 
         ept_p2m_type_to_flags(p2m, epte, epte->sa_p2mt, epte->access);
 
@@ -790,6 +802,8 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         ept_p2m_type_to_flags(p2m, &new_entry, p2mt, p2ma);
     }
 
+    new_entry.suppress_ve = 1;
+
     rc = atomic_write_ept_entry(ept_entry, new_entry, target);
     if ( unlikely(rc) )
         old_entry.epte = 0;
@@ -1111,6 +1125,8 @@ static void ept_flush_pml_buffers(struct p2m_domain *p2m)
 int ept_p2m_init(struct p2m_domain *p2m)
 {
     struct ept_data *ept = &p2m->ept;
+    ept_entry_t *table;
+    unsigned int i;
 
     p2m->set_entry = ept_set_entry;
     p2m->get_entry = ept_get_entry;
@@ -1134,6 +1150,13 @@ int ept_p2m_init(struct p2m_domain *p2m)
         p2m->flush_hardware_cached_dirty = ept_flush_pml_buffers;
     }
 
+    table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
+
+    for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
+        table[i].suppress_ve = 1;
+
+    unmap_domain_page(table);
+
     if ( !zalloc_cpumask_var(&ept->synced_mask) )
         return -ENOMEM;
 
-- 
1.9.1


* [PATCH v2 03/12] x86/HVM: Hardware alternate p2m support detection.
From: Ed White @ 2015-06-22 18:56 UTC
  To: xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Ed White,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

As implemented here, altp2m is only supported on platforms with VMX HAP.

By default this functionality is force-disabled; it can be enabled
by specifying altp2m=1 on the Xen command line.
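
For example (an illustrative GRUB entry; everything other than altp2m=1
is a placeholder):

    multiboot /boot/xen.gz dom0_mem=2048M altp2m=1
    module /boot/vmlinuz root=/dev/xvda1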

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 docs/misc/xen-command-line.markdown |  7 +++++++
 xen/arch/x86/hvm/hvm.c              | 12 ++++++++++++
 xen/arch/x86/hvm/vmx/vmx.c          |  1 +
 xen/include/asm-x86/hvm/hvm.h       |  6 ++++++
 4 files changed, 26 insertions(+)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index aa684c0..3391c66 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -139,6 +139,13 @@ mode during S3 resume.
 > Default: `true`
 
 Permit Xen to use superpages when performing memory management.
+
+### altp2m (Intel)
+> `= <boolean>`
+
+> Default: `false`
+
+Permit multiple copies of host p2m.
 
 ### apic
 > `= bigsmp | default`
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 65baa1b..af68d44 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -94,6 +94,10 @@ bool_t opt_hvm_fep;
 boolean_param("hvm_fep", opt_hvm_fep);
 #endif
 
+/* Xen command-line option to enable altp2m */
+static bool_t __initdata opt_altp2m_enabled = 0;
+boolean_param("altp2m", opt_altp2m_enabled);
+
 static int cpu_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
@@ -160,6 +164,9 @@ static int __init hvm_enable(void)
     if ( !fns->pvh_supported )
         printk(XENLOG_INFO "HVM: PVH mode not supported on this platform\n");
 
+    if ( !opt_altp2m_enabled )
+        hvm_funcs.altp2m_supported = 0;
+
     /*
      * Allow direct access to the PC debug ports 0x80 and 0xed (they are
      * often used for I/O delays, but the vmexits simply slow things down).
@@ -6474,6 +6481,11 @@ enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v)
     return hvm_funcs.nhvm_intr_blocked(v);
 }
 
+bool_t hvm_altp2m_supported(void)
+{
+    return hvm_funcs.altp2m_supported;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 0837627..2d3ad63 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1841,6 +1841,7 @@ const struct hvm_function_table * __init start_vmx(void)
     if ( cpu_has_vmx_ept && (cpu_has_vmx_pat || opt_force_ept) )
     {
         vmx_function_table.hap_supported = 1;
+        vmx_function_table.altp2m_supported = 1;
 
         vmx_function_table.hap_capabilities = 0;
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 77eeac5..07d8e8e 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -94,6 +94,9 @@ struct hvm_function_table {
     /* Necessary hardware support for PVH mode? */
     int pvh_supported;
 
+    /* Necessary hardware support for alternate p2m's? */
+    bool_t altp2m_supported;
+
     /* Indicate HAP capabilities. */
     int hap_capabilities;
 
@@ -509,6 +512,9 @@ bool_t nhvm_vmcx_hap_enabled(struct vcpu *v);
 /* interrupt */
 enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v);
 
+/* returns true if hardware supports alternate p2m's */
+bool_t hvm_altp2m_supported(void);
+
 #ifndef NDEBUG
 /* Permit use of the Forced Emulation Prefix in HVM guests */
 extern bool_t opt_hvm_fep;
-- 
1.9.1


* [PATCH v2 04/12] x86/altp2m: basic data structures and support routines.
From: Ed White @ 2015-06-22 18:56 UTC
  To: xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Ed White,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

Add the basic data structures needed to support alternate p2ms and
the functions to initialise them and tear them down.

Although Intel hardware can handle 512 EPTPs per hardware thread
concurrently, only 10 per domain are supported in this patch for
performance reasons.

The iterator in hap_enable() does need to handle 512, so that is now
uint16_t.

This change also splits the p2m lock into one lock type for altp2ms
and another type for all other p2ms. The purpose of this is to place
the altp2m list lock between the two types, so the list lock can be
acquired whilst holding the host p2m lock.
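
The intended nesting order, using the wrappers added in mm-locks.h, is
therefore (a sketch):

    /* Legal nesting after this change: host p2m lock first, then the
     * per-domain altp2m list lock, then an individual altp2m's lock. */
    p2m_lock(hostp2m);      /* host p2m (ordinary p2m lock type) */
    altp2m_lock(d);         /* alternate p2m list lock           */
    p2m_lock(altp2m);       /* alternate p2m lock type           */
    /* ... walk/update the alternate view ... */
    p2m_unlock(altp2m);
    altp2m_unlock(d);
    p2m_unlock(hostp2m);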

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 xen/arch/x86/hvm/Makefile           |   2 +
 xen/arch/x86/hvm/altp2mhvm.c        |  82 ++++++++++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c              |  21 ++++++++
 xen/arch/x86/mm/hap/hap.c           |  31 ++++++++++-
 xen/arch/x86/mm/mm-locks.h          |  33 +++++++++++-
 xen/arch/x86/mm/p2m.c               | 103 ++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/domain.h        |  10 ++++
 xen/include/asm-x86/hvm/altp2mhvm.h |  38 +++++++++++++
 xen/include/asm-x86/hvm/hvm.h       |  17 ++++++
 xen/include/asm-x86/hvm/vcpu.h      |   9 ++++
 xen/include/asm-x86/p2m.h           |  30 ++++++++++-
 11 files changed, 372 insertions(+), 4 deletions(-)
 create mode 100644 xen/arch/x86/hvm/altp2mhvm.c
 create mode 100644 xen/include/asm-x86/hvm/altp2mhvm.h

diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index 69af47f..da4475d 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -1,6 +1,7 @@
 subdir-y += svm
 subdir-y += vmx
 
+obj-y += altp2mhvm.o
 obj-y += asid.o
 obj-y += emulate.o
 obj-y += event.o
@@ -24,3 +25,4 @@ obj-y += vmsi.o
 obj-y += vpic.o
 obj-y += vpt.o
 obj-y += vpmu.o
+
diff --git a/xen/arch/x86/hvm/altp2mhvm.c b/xen/arch/x86/hvm/altp2mhvm.c
new file mode 100644
index 0000000..802fe5b
--- /dev/null
+++ b/xen/arch/x86/hvm/altp2mhvm.c
@@ -0,0 +1,82 @@
+/*
+ * Alternate p2m HVM
+ * Copyright (c) 2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <asm/hvm/support.h>
+#include <asm/hvm/hvm.h>
+#include <asm/p2m.h>
+#include <asm/hvm/altp2mhvm.h>
+
+void
+altp2mhvm_vcpu_reset(struct vcpu *v)
+{
+    struct altp2mvcpu *av = &vcpu_altp2mhvm(v);
+
+    av->p2midx = INVALID_ALTP2M;
+    av->veinfo_gfn = 0;
+
+    if ( hvm_funcs.ahvm_vcpu_reset )
+        hvm_funcs.ahvm_vcpu_reset(v);
+}
+
+int
+altp2mhvm_vcpu_initialise(struct vcpu *v)
+{
+    int rc = -EOPNOTSUPP;
+
+    if ( v != current )
+        vcpu_pause(v);
+
+    if ( !hvm_funcs.ahvm_vcpu_initialise ||
+         (hvm_funcs.ahvm_vcpu_initialise(v) == 0) )
+    {
+        rc = 0;
+        altp2mhvm_vcpu_reset(v);
+        vcpu_altp2mhvm(v).p2midx = 0;
+        atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
+
+        ahvm_vcpu_update_eptp(v);
+    }
+
+    if ( v != current )
+        vcpu_unpause(v);
+
+    return rc;
+}
+
+void
+altp2mhvm_vcpu_destroy(struct vcpu *v)
+{
+    struct p2m_domain *p2m;
+
+    if ( v != current )
+        vcpu_pause(v);
+
+    if ( hvm_funcs.ahvm_vcpu_destroy )
+        hvm_funcs.ahvm_vcpu_destroy(v);
+
+    if ( (p2m = p2m_get_altp2m(v)) )
+        atomic_dec(&p2m->active_vcpus);
+
+    altp2mhvm_vcpu_reset(v);
+
+    ahvm_vcpu_update_eptp(v);
+    ahvm_vcpu_update_vmfunc_ve(v);
+
+    if ( v != current )
+        vcpu_unpause(v);
+}
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index af68d44..d75c12d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -58,6 +58,7 @@
 #include <asm/hvm/cacheattr.h>
 #include <asm/hvm/trace.h>
 #include <asm/hvm/nestedhvm.h>
+#include <asm/hvm/altp2mhvm.h>
 #include <asm/hvm/event.h>
 #include <asm/mtrr.h>
 #include <asm/apic.h>
@@ -2373,6 +2374,7 @@ void hvm_vcpu_destroy(struct vcpu *v)
 {
     hvm_all_ioreq_servers_remove_vcpu(v->domain, v);
 
+    altp2mhvm_vcpu_destroy(v);
     nestedhvm_vcpu_destroy(v);
 
     free_compat_arg_xlat(v);
@@ -6486,6 +6488,25 @@ bool_t hvm_altp2m_supported(void)
     return hvm_funcs.altp2m_supported;
 }
 
+void ahvm_vcpu_update_eptp(struct vcpu *v)
+{
+    if (hvm_funcs.ahvm_vcpu_update_eptp)
+        hvm_funcs.ahvm_vcpu_update_eptp(v);
+}
+
+void ahvm_vcpu_update_vmfunc_ve(struct vcpu *v)
+{
+    if (hvm_funcs.ahvm_vcpu_update_vmfunc_ve)
+        hvm_funcs.ahvm_vcpu_update_vmfunc_ve(v);
+}
+
+bool_t ahvm_vcpu_emulate_ve(struct vcpu *v)
+{
+    if (hvm_funcs.ahvm_vcpu_emulate_ve)
+        return hvm_funcs.ahvm_vcpu_emulate_ve(v);
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index d0d3f1e..202aa42 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -459,7 +459,7 @@ void hap_domain_init(struct domain *d)
 int hap_enable(struct domain *d, u32 mode)
 {
     unsigned int old_pages;
-    uint8_t i;
+    uint16_t i;
     int rv = 0;
 
     domain_pause(d);
@@ -498,6 +498,24 @@ int hap_enable(struct domain *d, u32 mode)
            goto out;
     }
 
+    /* Init alternate p2m data */
+    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )
+    {
+        rv = -ENOMEM;
+        goto out;
+    }
+
+    for (i = 0; i < MAX_EPTP; i++)
+        d->arch.altp2m_eptp[i] = ~0ul;
+
+    for (i = 0; i < MAX_ALTP2M; i++) {
+        rv = p2m_alloc_table(d->arch.altp2m_p2m[i]);
+        if ( rv != 0 )
+           goto out;
+    }
+
+    d->arch.altp2m_active = 0;
+
     /* Now let other users see the new mode */
     d->arch.paging.mode = mode | PG_HAP_enable;
 
@@ -510,6 +528,17 @@ void hap_final_teardown(struct domain *d)
 {
     uint8_t i;
 
+    d->arch.altp2m_active = 0;
+
+    if ( d->arch.altp2m_eptp ) {
+        free_xenheap_page(d->arch.altp2m_eptp);
+        d->arch.altp2m_eptp = NULL;
+    }
+
+    for (i = 0; i < MAX_ALTP2M; i++) {
+        p2m_teardown(d->arch.altp2m_p2m[i]);
+    }
+
     /* Destroy nestedp2m's first */
     for (i = 0; i < MAX_NESTEDP2M; i++) {
         p2m_teardown(d->arch.nested_p2m[i]);
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index b4f035e..954b345 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -217,7 +217,7 @@ declare_mm_lock(nestedp2m)
 #define nestedp2m_lock(d)   mm_lock(nestedp2m, &(d)->arch.nested_p2m_lock)
 #define nestedp2m_unlock(d) mm_unlock(&(d)->arch.nested_p2m_lock)
 
-/* P2M lock (per-p2m-table)
+/* P2M lock (per-non-alt-p2m-table)
  *
  * This protects all queries and updates to the p2m table.
  * Queries may be made under the read lock but all modifications
@@ -228,7 +228,36 @@ declare_mm_lock(nestedp2m)
  */
 
 declare_mm_rwlock(p2m);
-#define p2m_lock(p)           mm_write_lock(p2m, &(p)->lock);
+
+/* Alternate P2M list lock (per-domain)
+ *
+ * A per-domain lock that protects the list of alternate p2m's.
+ * Any operation that walks the list needs to acquire this lock.
+ * Additionally, before destroying an alternate p2m all VCPU's
+ * in the target domain must be paused.  */
+
+declare_mm_lock(altp2mlist)
+#define altp2m_lock(d)   mm_lock(altp2mlist, &(d)->arch.altp2m_lock)
+#define altp2m_unlock(d) mm_unlock(&(d)->arch.altp2m_lock)
+
+/* P2M lock (per-altp2m-table)
+ *
+ * This protects all queries and updates to the p2m table.
+ * Queries may be made under the read lock but all modifications
+ * need the main (write) lock.
+ *
+ * The write lock is recursive as it is common for a code path to look
+ * up a gfn and later mutate it.
+ */
+
+declare_mm_rwlock(altp2m);
+#define p2m_lock(p)                         \
+{                                           \
+    if ( p2m_is_altp2m(p) )                 \
+        mm_write_lock(altp2m, &(p)->lock);  \
+    else                                    \
+        mm_write_lock(p2m, &(p)->lock);     \
+}
 #define p2m_unlock(p)         mm_write_unlock(&(p)->lock);
 #define gfn_lock(p,g,o)       p2m_lock(p)
 #define gfn_unlock(p,g,o)     p2m_unlock(p)
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 1fd1194..87b4b75 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -35,6 +35,7 @@
 #include <asm/hvm/vmx/vmx.h> /* ept_p2m_init() */
 #include <asm/mem_sharing.h>
 #include <asm/hvm/nestedhvm.h>
+#include <asm/hvm/altp2mhvm.h>
 #include <asm/hvm/svm/amd-iommu-proto.h>
 #include <xsm/xsm.h>
 
@@ -183,6 +184,45 @@ static void p2m_teardown_nestedp2m(struct domain *d)
     }
 }
 
+static void p2m_teardown_altp2m(struct domain *d);
+
+static int p2m_init_altp2m(struct domain *d)
+{
+    uint8_t i;
+    struct p2m_domain *p2m;
+
+    mm_lock_init(&d->arch.altp2m_lock);
+    for (i = 0; i < MAX_ALTP2M; i++)
+    {
+        d->arch.altp2m_p2m[i] = p2m = p2m_init_one(d);
+        if ( p2m == NULL )
+        {
+            p2m_teardown_altp2m(d);
+            return -ENOMEM;
+        }
+        p2m->p2m_class = p2m_alternate;
+        p2m->access_required = 1;
+        _atomic_set(&p2m->active_vcpus, 0);
+    }
+
+    return 0;
+}
+
+static void p2m_teardown_altp2m(struct domain *d)
+{
+    uint8_t i;
+    struct p2m_domain *p2m;
+
+    for (i = 0; i < MAX_ALTP2M; i++)
+    {
+        if ( !d->arch.altp2m_p2m[i] )
+            continue;
+        p2m = d->arch.altp2m_p2m[i];
+        p2m_free_one(p2m);
+        d->arch.altp2m_p2m[i] = NULL;
+    }
+}
+
 int p2m_init(struct domain *d)
 {
     int rc;
@@ -196,7 +236,14 @@ int p2m_init(struct domain *d)
      * (p2m_init runs too early for HVM_PARAM_* options) */
     rc = p2m_init_nestedp2m(d);
     if ( rc )
+    {
         p2m_teardown_hostp2m(d);
+        return rc;
+    }
+
+    rc = p2m_init_altp2m(d);
+    if ( rc )
+        p2m_teardown_altp2m(d);
 
     return rc;
 }
@@ -1920,6 +1967,62 @@ int unmap_mmio_regions(struct domain *d,
     return err;
 }
 
+bool_t p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp, unsigned long *idx)
+{
+    struct p2m_domain *p2m;
+    struct ept_data *ept;
+    bool_t rc = 0;
+    uint16_t i;
+
+    altp2m_lock(d);
+
+    for ( i = 0; i < MAX_ALTP2M; i++ )
+    {
+        if ( d->arch.altp2m_eptp[i] == ~0ul )
+            continue;
+
+        p2m = d->arch.altp2m_p2m[i];
+        ept = &p2m->ept;
+
+        if ( eptp != ept_get_eptp(ept) )
+            continue;
+
+        *idx = i;
+        rc = 1;
+
+        break;
+    }
+
+    altp2m_unlock(d);
+    return rc;
+}
+
+bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx)
+{
+    struct domain *d = v->domain;
+    bool_t rc = 0;
+
+    if ( idx >= MAX_ALTP2M )
+        return rc;
+
+    altp2m_lock(d);
+
+    if ( d->arch.altp2m_eptp[idx] != ~0ul )
+    {
+        if ( idx != vcpu_altp2mhvm(v).p2midx )
+        {
+            atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
+            vcpu_altp2mhvm(v).p2midx = idx;
+            atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
+            ahvm_vcpu_update_eptp(v);
+        }
+        rc = 1;
+    }
+
+    altp2m_unlock(d);
+    return rc;
+}
+
 /*** Audit ***/
 
 #if P2M_AUDIT
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index a3c117f..6275151 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -235,6 +235,10 @@ struct paging_vcpu {
 typedef xen_domctl_cpuid_t cpuid_input_t;
 
 #define MAX_NESTEDP2M 10
+
+#define MAX_ALTP2M      (uint16_t)10
+#define INVALID_ALTP2M  (uint16_t)~0
+#define MAX_EPTP        (PAGE_SIZE / sizeof(uint64_t))
 struct p2m_domain;
 struct time_scale {
     int shift;
@@ -294,6 +298,12 @@ struct arch_domain
     struct p2m_domain *nested_p2m[MAX_NESTEDP2M];
     mm_lock_t nested_p2m_lock;
 
+    /* altp2m: allow multiple copies of host p2m */
+    bool_t altp2m_active;
+    struct p2m_domain *altp2m_p2m[MAX_ALTP2M];
+    mm_lock_t altp2m_lock;
+    uint64_t *altp2m_eptp;
+
     /* NB. protected by d->event_lock and by irq_desc[irq].lock */
     struct radix_tree_root irq_pirq;
 
diff --git a/xen/include/asm-x86/hvm/altp2mhvm.h b/xen/include/asm-x86/hvm/altp2mhvm.h
new file mode 100644
index 0000000..a4b8e24
--- /dev/null
+++ b/xen/include/asm-x86/hvm/altp2mhvm.h
@@ -0,0 +1,38 @@
+/*
+ * Alternate p2m HVM
+ * Copyright (c) 2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#ifndef _HVM_ALTP2M_H
+#define _HVM_ALTP2M_H
+
+#include <xen/types.h>         /* for uintNN_t */
+#include <xen/sched.h>         /* for struct vcpu, struct domain */
+#include <asm/hvm/vcpu.h>      /* for vcpu_altp2mhvm */
+
+/* Alternate p2m HVM on/off per domain */
+static inline bool_t altp2mhvm_active(const struct domain *d)
+{
+    return d->arch.altp2m_active;
+}
+
+/* Alternate p2m VCPU */
+int altp2mhvm_vcpu_initialise(struct vcpu *v);
+void altp2mhvm_vcpu_destroy(struct vcpu *v);
+void altp2mhvm_vcpu_reset(struct vcpu *v);
+
+#endif /* _HVM_ALTP2M_H */
+
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 07d8e8e..9cd674f 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -210,6 +210,14 @@ struct hvm_function_table {
                                   uint32_t *ecx, uint32_t *edx);
 
     void (*enable_msr_exit_interception)(struct domain *d);
+
+    /* Alternate p2m */
+    int (*ahvm_vcpu_initialise)(struct vcpu *v);
+    void (*ahvm_vcpu_destroy)(struct vcpu *v);
+    int (*ahvm_vcpu_reset)(struct vcpu *v);
+    void (*ahvm_vcpu_update_eptp)(struct vcpu *v);
+    void (*ahvm_vcpu_update_vmfunc_ve)(struct vcpu *v);
+    bool_t (*ahvm_vcpu_emulate_ve)(struct vcpu *v);
 };
 
 extern struct hvm_function_table hvm_funcs;
@@ -522,6 +530,15 @@ extern bool_t opt_hvm_fep;
 #define opt_hvm_fep 0
 #endif
 
+/* updates the current EPTP in VMCS */
+void ahvm_vcpu_update_eptp(struct vcpu *v);
+
+/* updates VMCS fields related to VMFUNC and #VE */
+void ahvm_vcpu_update_vmfunc_ve(struct vcpu *v);
+
+/* emulates #VE */
+bool_t ahvm_vcpu_emulate_ve(struct vcpu *v);
+
 #endif /* __ASM_X86_HVM_HVM_H__ */
 
 /*
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 3d8f4dc..a1529c0 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -118,6 +118,13 @@ struct nestedvcpu {
 
 #define vcpu_nestedhvm(v) ((v)->arch.hvm_vcpu.nvcpu)
 
+struct altp2mvcpu {
+    uint16_t    p2midx;         /* alternate p2m index */
+    uint64_t    veinfo_gfn;     /* #VE information page guest pfn */
+};
+
+#define vcpu_altp2mhvm(v) ((v)->arch.hvm_vcpu.avcpu)
+
 struct hvm_vcpu {
     /* Guest control-register and EFER values, just as the guest sees them. */
     unsigned long       guest_cr[5];
@@ -163,6 +170,8 @@ struct hvm_vcpu {
 
     struct nestedvcpu   nvcpu;
 
+    struct altp2mvcpu   avcpu;
+
     struct mtrr_state   mtrr;
     u64                 pat_cr;
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index b49c09b..d916891 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -175,6 +175,7 @@ typedef unsigned int p2m_query_t;
 typedef enum {
     p2m_host,
     p2m_nested,
+    p2m_alternate,
 } p2m_class_t;
 
 /* Per-p2m-table state */
@@ -193,7 +194,7 @@ struct p2m_domain {
 
     struct domain     *domain;   /* back pointer to domain */
 
-    p2m_class_t       p2m_class; /* host/nested/? */
+    p2m_class_t       p2m_class; /* host/nested/alternate */
 
     /* Nested p2ms only: nested p2m base value that this p2m shadows.
      * This can be cleared to P2M_BASE_EADDR under the per-p2m lock but
@@ -219,6 +220,9 @@ struct p2m_domain {
      * host p2m's lock. */
     int                defer_nested_flush;
 
+    /* Alternate p2m: count of vcpu's currently using this p2m. */
+    atomic_t           active_vcpus;
+
     /* Pages used to construct the p2m */
     struct page_list_head pages;
 
@@ -317,6 +321,11 @@ static inline bool_t p2m_is_nestedp2m(const struct p2m_domain *p2m)
     return p2m->p2m_class == p2m_nested;
 }
 
+static inline bool_t p2m_is_altp2m(const struct p2m_domain *p2m)
+{
+    return p2m->p2m_class == p2m_alternate;
+}
+
 #define p2m_get_pagetable(p2m)  ((p2m)->phys_table)
 
 /**** p2m query accessors. They lock p2m_lock, and thus serialize
@@ -722,6 +731,25 @@ void nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
     l1_pgentry_t *p, l1_pgentry_t new, unsigned int level);
 
 /*
+ * Alternate p2m: shadow p2m tables used for alternate memory views
+ */
+
+/* get current alternate p2m table */
+static inline struct p2m_domain *p2m_get_altp2m(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    uint16_t index = vcpu_altp2mhvm(v).p2midx;
+
+    return (index == INVALID_ALTP2M) ? NULL : d->arch.altp2m_p2m[index];
+}
+
+/* Locate an alternate p2m by its EPTP */
+bool_t p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp, unsigned long *idx);
+
+/* Switch alternate p2m for a single vcpu */
+bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx);
+
+/*
  * p2m type to IOMMU flags
  */
 static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt)
-- 
1.9.1


* [PATCH v2 05/12] VMX/altp2m: add code to support EPTP switching and #VE.
From: Ed White @ 2015-06-22 18:56 UTC
  To: xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Ed White,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

Implement and hook up the code to enable VMX support of VMFUNC and #VE.

VMFUNC leaf 0 (EPTP switching) emulation is added in a later patch.
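
For reference, a guest #VE handler would consume the information page
(the ve_info_t layout from patch 01, installed on guest vector 20)
roughly as follows. Whether delivered natively or via the emulation
below, #VE is only raised while the semaphore field is clear, so the
handler must zero it again to re-arm delivery. A sketch, assuming the
guest mirrors the ve_info_t layout:

    void guest_ve_handler(volatile ve_info_t *ve)
    {
        uint64_t gpa  = ve->gpa;         /* faulting guest-physical address  */
        uint16_t view = ve->eptp_index;  /* view in which the fault occurred */

        /* ... apply policy; possibly VMFUNC to another view and retry ... */

        ve->semaphore = 0;               /* re-arm #VE delivery */
    }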

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 132 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 132 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 2d3ad63..e8d9c82 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -56,6 +56,7 @@
 #include <asm/debugger.h>
 #include <asm/apic.h>
 #include <asm/hvm/nestedhvm.h>
+#include <asm/hvm/altp2mhvm.h>
 #include <asm/event.h>
 #include <asm/monitor.h>
 #include <public/arch-x86/cpuid.h>
@@ -1763,6 +1764,100 @@ static void vmx_enable_msr_exit_interception(struct domain *d)
                                          MSR_TYPE_W);
 }
 
+static void vmx_vcpu_update_eptp(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    struct p2m_domain *p2m = NULL;
+    struct ept_data *ept;
+
+    if ( altp2mhvm_active(d) )
+        p2m = p2m_get_altp2m(v);
+    if ( !p2m )
+        p2m = p2m_get_hostp2m(d);
+
+    ept = &p2m->ept;
+    ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
+
+    vmx_vmcs_enter(v);
+
+    __vmwrite(EPT_POINTER, ept_get_eptp(ept));
+
+    if ( v->arch.hvm_vmx.secondary_exec_control &
+        SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS )
+        __vmwrite(EPTP_INDEX, vcpu_altp2mhvm(v).p2midx);
+
+    vmx_vmcs_exit(v);
+}
+
+static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    u32 mask = SECONDARY_EXEC_ENABLE_VM_FUNCTIONS;
+
+    if ( !cpu_has_vmx_vmfunc )
+        return;
+
+    if ( cpu_has_vmx_virt_exceptions )
+        mask |= SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
+
+    vmx_vmcs_enter(v);
+
+    if ( !d->is_dying && altp2mhvm_active(d) )
+    {
+        v->arch.hvm_vmx.secondary_exec_control |= mask;
+        __vmwrite(VM_FUNCTION_CONTROL, VMX_VMFUNC_EPTP_SWITCHING);
+        __vmwrite(EPTP_LIST_ADDR, virt_to_maddr(d->arch.altp2m_eptp));
+
+        if ( cpu_has_vmx_virt_exceptions )
+        {
+            p2m_type_t t;
+            mfn_t mfn;
+
+            mfn = get_gfn_query_unlocked(d, vcpu_altp2mhvm(v).veinfo_gfn, &t);
+            __vmwrite(VIRT_EXCEPTION_INFO, mfn_x(mfn) << PAGE_SHIFT);
+        }
+    }
+    else
+        v->arch.hvm_vmx.secondary_exec_control &= ~mask;
+
+    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
+        v->arch.hvm_vmx.secondary_exec_control);
+
+    vmx_vmcs_exit(v);
+}
+
+static bool_t vmx_vcpu_emulate_ve(struct vcpu *v)
+{
+    bool_t rc = 0;
+    ve_info_t *veinfo = vcpu_altp2mhvm(v).veinfo_gfn ?
+        hvm_map_guest_frame_rw(vcpu_altp2mhvm(v).veinfo_gfn, 0) : NULL;
+
+    if ( !veinfo )
+        return 0;
+
+    if ( veinfo->semaphore != 0 )
+        goto out;
+
+    rc = 1;
+
+    veinfo->exit_reason = EXIT_REASON_EPT_VIOLATION;
+    veinfo->semaphore = ~0l;
+    veinfo->eptp_index = vcpu_altp2mhvm(v).p2midx;
+
+    vmx_vmcs_enter(v);
+    __vmread(EXIT_QUALIFICATION, &veinfo->exit_qualification);
+    __vmread(GUEST_LINEAR_ADDRESS, &veinfo->gla);
+    __vmread(GUEST_PHYSICAL_ADDRESS, &veinfo->gpa);
+    vmx_vmcs_exit(v);
+
+    hvm_inject_hw_exception(TRAP_virtualisation,
+                            HVM_DELIVER_NO_ERROR_CODE);
+
+out:
+    hvm_unmap_guest_frame(veinfo, 0);
+    return rc;
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
     .name                 = "VMX",
     .cpu_up_prepare       = vmx_cpu_up_prepare,
@@ -1822,6 +1917,9 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
     .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
     .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
+    .ahvm_vcpu_update_eptp = vmx_vcpu_update_eptp,
+    .ahvm_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
+    .ahvm_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
@@ -2754,6 +2852,40 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
     /* Now enable interrupts so it's safe to take locks. */
     local_irq_enable();
+ 
+    /*
+     * If the guest has the ability to switch EPTP without an exit,
+     * figure out whether it has done so and update the altp2m data.
+     */
+    if ( altp2mhvm_active(v->domain) &&
+        (v->arch.hvm_vmx.secondary_exec_control &
+        SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
+    {
+        unsigned long idx;
+
+        if ( v->arch.hvm_vmx.secondary_exec_control &
+            SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS )
+            __vmread(EPTP_INDEX, &idx);
+        else
+        {
+            unsigned long eptp;
+
+            __vmread(EPT_POINTER, &eptp);
+
+            if ( !p2m_find_altp2m_by_eptp(v->domain, eptp, &idx) )
+            {
+                gdprintk(XENLOG_ERR, "EPTP not found in alternate p2m list\n");
+                domain_crash(v->domain);
+            }
+        }
+
+        if ( (uint16_t)idx != vcpu_altp2mhvm(v).p2midx )
+        {
+            atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
+            vcpu_altp2mhvm(v).p2midx = (uint16_t)idx;
+            atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
+        }
+    }
 
     /* XXX: This looks ugly, but we need a mechanism to ensure
      * any pending vmresume has really happened
-- 
1.9.1


* [PATCH v2 06/12] VMX: add VMFUNC leaf 0 (EPTP switching) to emulator.
From: Ed White @ 2015-06-22 18:56 UTC
  To: xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	Andrew Cooper, tlengyel, Daniel De Graaf

From: Ravi Sahita <ravi.sahita@intel.com>

Signed-off-by: Ravi Sahita <ravi.sahita@intel.com>
---
 xen/arch/x86/hvm/emulate.c             | 13 +++++++++++--
 xen/arch/x86/hvm/vmx/vmx.c             | 30 ++++++++++++++++++++++++++++++
 xen/arch/x86/x86_emulate/x86_emulate.c |  8 ++++++++
 xen/arch/x86/x86_emulate/x86_emulate.h |  4 ++++
 xen/include/asm-x86/hvm/hvm.h          |  2 ++
 5 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index ac9c9d6..e38a2fe 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -1356,6 +1356,13 @@ static int hvmemul_invlpg(
     return rc;
 }
 
+static int hvmemul_vmfunc(
+    struct x86_emulate_ctxt *ctxt)
+{
+    return hvm_funcs.ahvm_vcpu_emulate_vmfunc(ctxt->regs) ?
+           X86EMUL_OKAY : X86EMUL_UNHANDLEABLE;
+}
+
 static const struct x86_emulate_ops hvm_emulate_ops = {
     .read          = hvmemul_read,
     .insn_fetch    = hvmemul_insn_fetch,
@@ -1379,7 +1386,8 @@ static const struct x86_emulate_ops hvm_emulate_ops = {
     .inject_sw_interrupt = hvmemul_inject_sw_interrupt,
     .get_fpu       = hvmemul_get_fpu,
     .put_fpu       = hvmemul_put_fpu,
-    .invlpg        = hvmemul_invlpg
+    .invlpg        = hvmemul_invlpg,
+    .vmfunc        = hvmemul_vmfunc,
 };
 
 static const struct x86_emulate_ops hvm_emulate_ops_no_write = {
@@ -1405,7 +1413,8 @@ static const struct x86_emulate_ops hvm_emulate_ops_no_write = {
     .inject_sw_interrupt = hvmemul_inject_sw_interrupt,
     .get_fpu       = hvmemul_get_fpu,
     .put_fpu       = hvmemul_put_fpu,
-    .invlpg        = hvmemul_invlpg
+    .invlpg        = hvmemul_invlpg,
+    .vmfunc        = hvmemul_vmfunc,
 };
 
 static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e8d9c82..ad9e9e4 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -82,6 +82,7 @@ static void vmx_fpu_dirty_intercept(void);
 static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content);
 static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg_intercept(unsigned long vaddr);
+static int vmx_vmfunc_intercept(struct cpu_user_regs *regs);
 
 uint8_t __read_mostly posted_intr_vector;
 
@@ -1826,6 +1827,20 @@ static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v)
     vmx_vmcs_exit(v);
 }
 
+static bool_t vmx_vcpu_emulate_vmfunc(struct cpu_user_regs *regs)
+{
+    bool_t rc = 0;
+
+    if ( !cpu_has_vmx_vmfunc && altp2mhvm_active(current->domain) &&
+         regs->eax == 0 &&
+         p2m_switch_vcpu_altp2m_by_id(current, (uint16_t)regs->ecx) )
+    {
+        regs->eip += 3;
+        rc = 1;
+    }
+    return rc;
+}
+
 static bool_t vmx_vcpu_emulate_ve(struct vcpu *v)
 {
     bool_t rc = 0;
@@ -1894,6 +1909,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .msr_read_intercept   = vmx_msr_read_intercept,
     .msr_write_intercept  = vmx_msr_write_intercept,
     .invlpg_intercept     = vmx_invlpg_intercept,
+    .vmfunc_intercept     = vmx_vmfunc_intercept,
     .handle_cd            = vmx_handle_cd,
     .set_info_guest       = vmx_set_info_guest,
     .set_rdtsc_exiting    = vmx_set_rdtsc_exiting,
@@ -1920,6 +1936,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .ahvm_vcpu_update_eptp = vmx_vcpu_update_eptp,
     .ahvm_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
     .ahvm_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
+    .ahvm_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
@@ -2091,6 +2108,13 @@ static void vmx_invlpg_intercept(unsigned long vaddr)
         vpid_sync_vcpu_gva(curr, vaddr);
 }
 
+static int vmx_vmfunc_intercept(struct cpu_user_regs *regs)
+{
+    gdprintk(XENLOG_ERR, "Failed guest VMFUNC execution\n");
+    domain_crash(current->domain);
+    return X86EMUL_OKAY;
+}
+
 static int vmx_cr_access(unsigned long exit_qualification)
 {
     struct vcpu *curr = current;
@@ -2675,6 +2699,7 @@ void vmx_enter_realmode(struct cpu_user_regs *regs)
     regs->eflags |= (X86_EFLAGS_VM | X86_EFLAGS_IOPL);
 }
 
+
 static void vmx_vmexit_ud_intercept(struct cpu_user_regs *regs)
 {
     struct hvm_emulate_ctxt ctxt;
@@ -3239,6 +3264,11 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             update_guest_eip();
         break;
 
+    case EXIT_REASON_VMFUNC:
+        if ( vmx_vmfunc_intercept(regs) == X86EMUL_OKAY )
+            update_guest_eip();
+        break;
+
     case EXIT_REASON_MWAIT_INSTRUCTION:
     case EXIT_REASON_MONITOR_INSTRUCTION:
     case EXIT_REASON_GETSEC:
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
index c017c69..4ae95ce 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3837,6 +3837,14 @@ x86_emulate(
             goto rdtsc;
         }
 
+        if ( modrm == 0xd4 ) /* vmfunc */
+        {
+            fail_if(ops->vmfunc == NULL);
+            if ( (rc = ops->vmfunc(ctxt)) != 0 )
+                goto done;
+            break;
+        }
+
         switch ( modrm_reg & 7 )
         {
         case 0: /* sgdt */
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
index 064b8f4..a4d4ec8 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.h
+++ b/xen/arch/x86/x86_emulate/x86_emulate.h
@@ -397,6 +397,10 @@ struct x86_emulate_ops
         enum x86_segment seg,
         unsigned long offset,
         struct x86_emulate_ctxt *ctxt);
+
+    /* vmfunc: Emulate VMFUNC via the given EAX/ECX inputs. */
+    int (*vmfunc)(
+        struct x86_emulate_ctxt *ctxt);
 };
 
 struct cpu_user_regs;
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 9cd674f..2e33b4f 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -167,6 +167,7 @@ struct hvm_function_table {
     int (*msr_read_intercept)(unsigned int msr, uint64_t *msr_content);
     int (*msr_write_intercept)(unsigned int msr, uint64_t msr_content);
     void (*invlpg_intercept)(unsigned long vaddr);
+    int (*vmfunc_intercept)(struct cpu_user_regs *regs);
     void (*handle_cd)(struct vcpu *v, unsigned long value);
     void (*set_info_guest)(struct vcpu *v);
     void (*set_rdtsc_exiting)(struct vcpu *v, bool_t);
@@ -218,6 +219,7 @@ struct hvm_function_table {
     void (*ahvm_vcpu_update_eptp)(struct vcpu *v);
     void (*ahvm_vcpu_update_vmfunc_ve)(struct vcpu *v);
     bool_t (*ahvm_vcpu_emulate_ve)(struct vcpu *v);
+    bool_t (*ahvm_vcpu_emulate_vmfunc)(struct cpu_user_regs *regs);
 };
 
 extern struct hvm_function_table hvm_funcs;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 116+ messages in thread
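
A usage illustration (not part of the series): VMFUNC takes the leaf in
EAX and the EPTP index in ECX. On hardware without VMFUNC the resulting
#UD is intercepted and routed through the emulation hooks added above,
which is why the emulation path advances EIP by 3 -- the VMFUNC opcode
(0f 01 d4) is three bytes long. A minimal in-guest sketch:

    /* Hypothetical guest-side helper; assumes altp2m is active. */
    static inline void vmfunc_switch_view(unsigned int idx)
    {
        asm volatile ( ".byte 0x0f,0x01,0xd4"      /* vmfunc */
                       :: "a" (0),   /* EAX = 0: EPTP-switching leaf */
                          "c" (idx)  /* ECX = altp2m view index */
                       : "memory" );
    }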

* [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-06-22 18:56 [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (5 preceding siblings ...)
  2015-06-22 18:56 ` [PATCH v2 06/12] VMX: add VMFUNC leaf 0 (EPTP switching) to emulator Ed White
@ 2015-06-22 18:56 ` Ed White
  2015-06-24 13:05   ` Andrew Cooper
  2015-06-24 14:38   ` Jan Beulich
  2015-06-22 18:56 ` [PATCH v2 08/12] x86/altp2m: alternate p2m memory events Ed White
                   ` (6 subsequent siblings)
  13 siblings, 2 replies; 116+ messages in thread
From: Ed White @ 2015-06-22 18:56 UTC (permalink / raw)
  To: xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Ed White,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

The existing ept_set_entry() and ept_get_entry() routines are extended
to optionally set/get suppress_ve, and renamed. New ept_set_entry() and
ept_get_entry() routines are provided as wrappers; the set wrapper
preserves suppress_ve for an existing entry and sets it for a new one.

Additional function pointers are added to p2m_domain to allow direct
access to the extended routines.

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
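As an illustration of how the new hooks compose (a sketch, not part of
the patch): a caller that wants to clear suppress_ve on a single 4K
mapping while leaving the type and access untouched could read the
entry through the plain hook and write it back through the extended
one, passing sve explicitly:

    /* Hypothetical caller; assumes the p2m lock is already held. */
    static int clear_sve(struct p2m_domain *p2m, unsigned long gfn)
    {
        p2m_type_t t;
        p2m_access_t a;
        mfn_t mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);

        if ( mfn_x(mfn) == INVALID_MFN )
            return -ESRCH;

        /* sve == 0 clears the bit; passing ~0 would preserve it. */
        return p2m->set_entry_full(p2m, gfn, mfn, PAGE_ORDER_4K, t, a, 0);
    }
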
 xen/arch/x86/mm/p2m-ept.c | 40 +++++++++++++++++++++++++++++++++-------
 xen/include/asm-x86/p2m.h | 13 +++++++++++++
 2 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 5de3387..e7719cf 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -649,14 +649,15 @@ bool_t ept_handle_misconfig(uint64_t gpa)
 }
 
 /*
- * ept_set_entry() computes 'need_modify_vtd_table' for itself,
+ * ept_set_entry_sve() computes 'need_modify_vtd_table' for itself,
  * by observing whether any gfn->mfn translations are modified.
  *
  * Returns: 0 for success, -errno for failure
  */
 static int
-ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, 
-              unsigned int order, p2m_type_t p2mt, p2m_access_t p2ma)
+ept_set_entry_sve(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, 
+                  unsigned int order, p2m_type_t p2mt, p2m_access_t p2ma,
+                  unsigned int sve)
 {
     ept_entry_t *table, *ept_entry = NULL;
     unsigned long gfn_remainder = gfn;
@@ -802,7 +803,11 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         ept_p2m_type_to_flags(p2m, &new_entry, p2mt, p2ma);
     }
 
-    new_entry.suppress_ve = 1;
+    if ( sve != ~0 )
+        new_entry.suppress_ve = !!sve;
+    else
+        new_entry.suppress_ve = is_epte_valid(&old_entry) ?
+                                    old_entry.suppress_ve : 1;
 
     rc = atomic_write_ept_entry(ept_entry, new_entry, target);
     if ( unlikely(rc) )
@@ -847,10 +852,18 @@ out:
     return rc;
 }
 
+static int
+ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, 
+              unsigned int order, p2m_type_t p2mt, p2m_access_t p2ma)
+{
+    return ept_set_entry_sve(p2m, gfn, mfn, order, p2mt, p2ma, ~0);
+}
+
 /* Read ept p2m entries */
-static mfn_t ept_get_entry(struct p2m_domain *p2m,
-                           unsigned long gfn, p2m_type_t *t, p2m_access_t* a,
-                           p2m_query_t q, unsigned int *page_order)
+static mfn_t ept_get_entry_sve(struct p2m_domain *p2m,
+                               unsigned long gfn, p2m_type_t *t, p2m_access_t* a,
+                               p2m_query_t q, unsigned int *page_order,
+                               unsigned int *sve)
 {
     ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
     unsigned long gfn_remainder = gfn;
@@ -864,6 +877,8 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
 
     *t = p2m_mmio_dm;
     *a = p2m_access_n;
+    if ( sve )
+        *sve = 1;
 
     /* This pfn is higher than the highest the p2m map currently holds */
     if ( gfn > p2m->max_mapped_pfn )
@@ -929,6 +944,8 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
         else
             *t = ept_entry->sa_p2mt;
         *a = ept_entry->access;
+        if ( sve )
+            *sve = ept_entry->suppress_ve;
 
         mfn = _mfn(ept_entry->mfn);
         if ( i )
@@ -952,6 +969,13 @@ out:
     return mfn;
 }
 
+static mfn_t ept_get_entry(struct p2m_domain *p2m,
+                           unsigned long gfn, p2m_type_t *t, p2m_access_t* a,
+                           p2m_query_t q, unsigned int *page_order)
+{
+    return ept_get_entry_sve(p2m, gfn, t, a, q, page_order, NULL);
+}
+
 void ept_walk_table(struct domain *d, unsigned long gfn)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
@@ -1130,6 +1154,8 @@ int ept_p2m_init(struct p2m_domain *p2m)
 
     p2m->set_entry = ept_set_entry;
     p2m->get_entry = ept_get_entry;
+    p2m->set_entry_full = ept_set_entry_sve;
+    p2m->get_entry_full = ept_get_entry_sve;
     p2m->change_entry_type_global = ept_change_entry_type_global;
     p2m->change_entry_type_range = ept_change_entry_type_range;
     p2m->memory_type_changed = ept_memory_type_changed;
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index d916891..16fd523 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -237,6 +237,19 @@ struct p2m_domain {
                                        p2m_access_t *p2ma,
                                        p2m_query_t q,
                                        unsigned int *page_order);
+    int                (*set_entry_full)(struct p2m_domain *p2m,
+                                         unsigned long gfn,
+                                         mfn_t mfn, unsigned int page_order,
+                                         p2m_type_t p2mt,
+                                         p2m_access_t p2ma,
+                                         unsigned int sve);
+    mfn_t              (*get_entry_full)(struct p2m_domain *p2m,
+                                         unsigned long gfn,
+                                         p2m_type_t *p2mt,
+                                         p2m_access_t *p2ma,
+                                         p2m_query_t q,
+                                         unsigned int *page_order,
+                                         unsigned int *sve);
     void               (*enable_hardware_log_dirty)(struct p2m_domain *p2m);
     void               (*disable_hardware_log_dirty)(struct p2m_domain *p2m);
     void               (*flush_hardware_cached_dirty)(struct p2m_domain *p2m);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH v2 08/12] x86/altp2m: alternate p2m memory events.
  2015-06-22 18:56 [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (6 preceding siblings ...)
  2015-06-22 18:56 ` [PATCH v2 07/12] x86/altp2m: add control of suppress_ve Ed White
@ 2015-06-22 18:56 ` Ed White
  2015-06-24 13:09   ` Andrew Cooper
  2015-06-24 16:01   ` Lengyel, Tamas
  2015-06-22 18:56 ` [PATCH v2 09/12] x86/altp2m: add remaining support routines Ed White
                   ` (5 subsequent siblings)
  13 siblings, 2 replies; 116+ messages in thread
From: Ed White @ 2015-06-22 18:56 UTC (permalink / raw)
  To: xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Ed White,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

Add a flag to indicate that a memory event occurred in an alternate p2m
and a field containing the p2m index. Allow the response to switch to
a different p2m using the same flag and field.

Modify p2m_access_check() to handle alternate p2m's.

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
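For context, a consumer of these events (a sketch of toolstack-side
code, not part of the patch; the flag and altp2m_idx field are as
defined below, the other fields are assumed from the existing vm_event
interface) could use the same flag on the response to move the faulting
VCPU into a different view:

    /* Hypothetical responder: resume the VCPU in altp2m view 0. */
    static void fill_switch_response(const vm_event_request_t *req,
                                     vm_event_response_t *rsp)
    {
        rsp->vcpu_id = req->vcpu_id;
        rsp->flags = MEM_ACCESS_ALTERNATE_P2M;
        rsp->u.mem_access.gfn = req->u.mem_access.gfn;
        rsp->u.mem_access.altp2m_idx = 0;   /* view to resume in */
    }
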
 xen/arch/x86/mm/p2m.c         | 20 +++++++++++++++++++-
 xen/include/asm-arm/p2m.h     |  7 +++++++
 xen/include/asm-x86/p2m.h     |  4 ++++
 xen/include/public/vm_event.h | 13 ++++++++++++-
 xen/include/xen/mem_access.h  |  1 +
 5 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 87b4b75..389360a 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1516,6 +1516,13 @@ void p2m_mem_access_emulate_check(struct vcpu *v,
     }
 }
 
+void p2m_mem_access_altp2m_check(struct vcpu *v, const vm_event_response_t *rsp)
+{
+    if ( (rsp->flags & MEM_ACCESS_ALTERNATE_P2M) &&
+         altp2mhvm_active(v->domain) )
+        p2m_switch_vcpu_altp2m_by_id(v, rsp->u.mem_access.altp2m_idx);
+}
+
 bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
                             struct npfec npfec,
                             vm_event_request_t **req_ptr)
@@ -1523,7 +1530,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
     struct vcpu *v = current;
     unsigned long gfn = gpa >> PAGE_SHIFT;
     struct domain *d = v->domain;    
-    struct p2m_domain* p2m = p2m_get_hostp2m(d);
+    struct p2m_domain *p2m = NULL;
     mfn_t mfn;
     p2m_type_t p2mt;
     p2m_access_t p2ma;
@@ -1531,6 +1538,11 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
     int rc;
     unsigned long eip = guest_cpu_user_regs()->eip;
 
+    if ( altp2mhvm_active(d) )
+        p2m = p2m_get_altp2m(v);
+    if ( !p2m )
+        p2m = p2m_get_hostp2m(d);
+
     /* First, handle rx2rw conversion automatically.
      * These calls to p2m->set_entry() must succeed: we have the gfn
      * locked and just did a successful get_entry(). */
@@ -1637,6 +1649,12 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
         req->vcpu_id = v->vcpu_id;
 
         p2m_vm_event_fill_regs(req);
+
+        if ( altp2mhvm_active(v->domain) )
+        {
+            req->flags |= MEM_ACCESS_ALTERNATE_P2M;
+            req->u.mem_access.altp2m_idx = vcpu_altp2mhvm(v).p2midx;
+        }
     }
 
     /* Pause the current VCPU */
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 63748ef..b31dd6f 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -109,6 +109,13 @@ void p2m_mem_access_emulate_check(struct vcpu *v,
     /* Not supported on ARM. */
 }
 
+static inline
+void p2m_mem_access_altp2m_check(struct vcpu *v,
+                                 const vm_event_response_t *rsp)
+{
+    /* Not supported on ARM. */
+}
+
 #define p2m_is_foreign(_t)  ((_t) == p2m_map_foreign)
 #define p2m_is_ram(_t)      ((_t) == p2m_ram_rw || (_t) == p2m_ram_ro)
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 16fd523..d84da33 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -762,6 +762,10 @@ bool_t p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp, unsigned long *i
 /* Switch alternate p2m for a single vcpu */
 bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx);
 
+/* Check to see if vcpu should be switched to a different p2m. */
+void p2m_mem_access_altp2m_check(struct vcpu *v,
+                                 const vm_event_response_t *rsp);
+
 /*
  * p2m type to IOMMU flags
  */
diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
index 577e971..b492f65 100644
--- a/xen/include/public/vm_event.h
+++ b/xen/include/public/vm_event.h
@@ -149,13 +149,24 @@ struct vm_event_regs_x86 {
  * potentially having side effects (like memory mapped or port I/O) disabled.
  */
 #define MEM_ACCESS_EMULATE_NOWRITE      (1 << 7)
+/*
+ * This flag can be set in a request or a response.
+ *
+ * On a request, indicates that the event occurred in the alternate p2m
+ * specified by the altp2m_idx request field.
+ *
+ * On a response, indicates that the VCPU should resume in the alternate p2m
+ * specified by the altp2m_idx response field, if possible.
+ */
+#define MEM_ACCESS_ALTERNATE_P2M        (1 << 8)
 
 struct vm_event_mem_access {
     uint64_t gfn;
     uint64_t offset;
     uint64_t gla;   /* if flags has MEM_ACCESS_GLA_VALID set */
     uint32_t flags; /* MEM_ACCESS_* */
-    uint32_t _pad;
+    uint16_t altp2m_idx; /* may be used during request and response */
+    uint16_t _pad;
 };
 
 struct vm_event_write_ctrlreg {
diff --git a/xen/include/xen/mem_access.h b/xen/include/xen/mem_access.h
index f60b727..4d3d5ca 100644
--- a/xen/include/xen/mem_access.h
+++ b/xen/include/xen/mem_access.h
@@ -36,6 +36,7 @@ static inline
 void mem_access_resume(struct vcpu *v, vm_event_response_t *rsp)
 {
     p2m_mem_access_emulate_check(v, rsp);
+    p2m_mem_access_altp2m_check(v, rsp);
 }
 
 #else
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-22 18:56 [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (7 preceding siblings ...)
  2015-06-22 18:56 ` [PATCH v2 08/12] x86/altp2m: alternate p2m memory events Ed White
@ 2015-06-22 18:56 ` Ed White
  2015-06-23 18:15   ` Lengyel, Tamas
                     ` (3 more replies)
  2015-06-22 18:56 ` [PATCH v2 10/12] x86/altp2m: define and implement alternate p2m HVMOP types Ed White
                   ` (4 subsequent siblings)
  13 siblings, 4 replies; 116+ messages in thread
From: Ed White @ 2015-06-22 18:56 UTC (permalink / raw)
  To: xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Ed White,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

Add the remaining routines required to support enabling the alternate
p2m functionality.

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
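One detail worth calling out (illustrative sketch, not part of the
patch): both the lazy-copy fault handler and the support routines below
round the guest and machine frame numbers down to the start of a
superpage before copying a mapping, so the copied entry stays congruent
with the host p2m:

    /* For a 2M superpage (page_order == 9), mask == ~0x1ffUL,
     * so e.g. gfn 0x12345 rounds down to 0x12200. */
    static void round_to_superpage(unsigned long *gfn, mfn_t *mfn,
                                   unsigned int page_order)
    {
        unsigned long mask = ~((1UL << page_order) - 1);

        *gfn &= mask;
        *mfn = _mfn(mfn_x(*mfn) & mask);
    }
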
 xen/arch/x86/hvm/hvm.c              |  60 +++++-
 xen/arch/x86/mm/hap/Makefile        |   1 +
 xen/arch/x86/mm/hap/altp2m_hap.c    | 103 +++++++++
 xen/arch/x86/mm/p2m-ept.c           |   3 +
 xen/arch/x86/mm/p2m.c               | 405 ++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/altp2mhvm.h |   4 +
 xen/include/asm-x86/p2m.h           |  33 +++
 7 files changed, 601 insertions(+), 8 deletions(-)
 create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index d75c12d..b758ee1 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2786,10 +2786,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     p2m_access_t p2ma;
     mfn_t mfn;
     struct vcpu *v = current;
-    struct p2m_domain *p2m;
+    struct p2m_domain *p2m, *hostp2m;
     int rc, fall_through = 0, paged = 0;
     int sharing_enomem = 0;
     vm_event_request_t *req_ptr = NULL;
+    int altp2m_active = 0;
 
     /* On Nested Virtualization, walk the guest page table.
      * If this succeeds, all is fine.
@@ -2845,15 +2846,33 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     {
         if ( !handle_mmio_with_translation(gla, gpa >> PAGE_SHIFT, npfec) )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
-        rc = 1;
-        goto out;
+        return 1;
     }
 
-    p2m = p2m_get_hostp2m(v->domain);
-    mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 
+    altp2m_active = altp2mhvm_active(v->domain);
+
+    /* Take a lock on the host p2m speculatively, to avoid potential
+     * locking order problems later and to handle unshare etc.
+     */
+    hostp2m = p2m_get_hostp2m(v->domain);
+    mfn = get_gfn_type_access(hostp2m, gfn, &p2mt, &p2ma,
                               P2M_ALLOC | (npfec.write_access ? P2M_UNSHARE : 0),
                               NULL);
 
+    if ( altp2m_active )
+    {
+        if ( altp2mhvm_hap_nested_page_fault(v, gpa, gla, npfec, &p2m) == 1 )
+        {
+            /* entry was lazily copied from host -- retry */
+            __put_gfn(hostp2m, gfn);
+            return 1;
+        }
+
+        mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 0, NULL);
+    }
+    else
+        p2m = hostp2m;
+
     /* Check access permissions first, then handle faults */
     if ( mfn_x(mfn) != INVALID_MFN )
     {
@@ -2893,6 +2912,20 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
 
         if ( violation )
         {
+            /* Should #VE be emulated for this fault? */
+            if ( p2m_is_altp2m(p2m) && !cpu_has_vmx_virt_exceptions )
+            {
+                unsigned int sve;
+
+                p2m->get_entry_full(p2m, gfn, &p2mt, &p2ma, 0, NULL, &sve);
+
+                if ( !sve && ahvm_vcpu_emulate_ve(v) )
+                {
+                    rc = 1;
+                    goto out_put_gfn;
+                }
+            }
+
             if ( p2m_mem_access_check(gpa, gla, npfec, &req_ptr) )
             {
                 fall_through = 1;
@@ -2912,7 +2945,9 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
          (npfec.write_access &&
           (p2m_is_discard_write(p2mt) || (p2mt == p2m_mmio_write_dm))) )
     {
-        put_gfn(p2m->domain, gfn);
+        __put_gfn(p2m, gfn);
+        if ( altp2m_active )
+            __put_gfn(hostp2m, gfn);
 
         rc = 0;
         if ( unlikely(is_pvh_vcpu(v)) )
@@ -2941,6 +2976,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     /* Spurious fault? PoD and log-dirty also take this path. */
     if ( p2m_is_ram(p2mt) )
     {
+        rc = 1;
         /*
          * Page log dirty is always done with order 0. If this mfn resides in
          * a large page, we do not change other pages type within that large
@@ -2949,9 +2985,15 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
         if ( npfec.write_access )
         {
             paging_mark_dirty(v->domain, mfn_x(mfn));
+            /* If p2m is really an altp2m, unlock here to avoid lock ordering
+             * violation when the change below is propagated from host p2m */
+            if ( altp2m_active )
+                __put_gfn(p2m, gfn);
             p2m_change_type_one(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
+            __put_gfn(altp2m_active ? hostp2m : p2m, gfn);
+
+            goto out;
         }
-        rc = 1;
         goto out_put_gfn;
     }
 
@@ -2961,7 +3003,9 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
     rc = fall_through;
 
 out_put_gfn:
-    put_gfn(p2m->domain, gfn);
+    __put_gfn(p2m, gfn);
+    if ( altp2m_active )
+        __put_gfn(hostp2m, gfn);
 out:
     /* All of these are delayed until we exit, since we might 
      * sleep on event ring wait queues, and we must not hold
diff --git a/xen/arch/x86/mm/hap/Makefile b/xen/arch/x86/mm/hap/Makefile
index 68f2bb5..216cd90 100644
--- a/xen/arch/x86/mm/hap/Makefile
+++ b/xen/arch/x86/mm/hap/Makefile
@@ -4,6 +4,7 @@ obj-y += guest_walk_3level.o
 obj-$(x86_64) += guest_walk_4level.o
 obj-y += nested_hap.o
 obj-y += nested_ept.o
+obj-y += altp2m_hap.o
 
 guest_walk_%level.o: guest_walk.c Makefile
 	$(CC) $(CFLAGS) -DGUEST_PAGING_LEVELS=$* -c $< -o $@
diff --git a/xen/arch/x86/mm/hap/altp2m_hap.c b/xen/arch/x86/mm/hap/altp2m_hap.c
new file mode 100644
index 0000000..899b636
--- /dev/null
+++ b/xen/arch/x86/mm/hap/altp2m_hap.c
@@ -0,0 +1,103 @@
+/******************************************************************************
+ * arch/x86/mm/hap/altp2m_hap.c
+ *
+ * Copyright (c) 2014 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include <asm/domain.h>
+#include <asm/page.h>
+#include <asm/paging.h>
+#include <asm/p2m.h>
+#include <asm/hap.h>
+#include <asm/hvm/altp2mhvm.h>
+
+#include "private.h"
+
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_valid
+#define mfn_valid(_mfn) __mfn_valid(mfn_x(_mfn))
+#undef page_to_mfn
+#define page_to_mfn(_pg) _mfn(__page_to_mfn(_pg))
+
+/*
+ * If the fault is for a not present entry:
+ *     if the entry in the host p2m has a valid mfn, copy it and retry
+ *     else indicate that outer handler should handle fault
+ *
+ * If the fault is for a present entry:
+ *     indicate that outer handler should handle fault
+ */
+
+int
+altp2mhvm_hap_nested_page_fault(struct vcpu *v, paddr_t gpa,
+                                unsigned long gla, struct npfec npfec,
+                                struct p2m_domain **ap2m)
+{
+    struct p2m_domain *hp2m = p2m_get_hostp2m(v->domain);
+    p2m_type_t p2mt;
+    p2m_access_t p2ma;
+    unsigned int page_order;
+    unsigned long gfn, mask;
+    mfn_t mfn;
+    int rv;
+
+    *ap2m = p2m_get_altp2m(v);
+
+    mfn = get_gfn_type_access(*ap2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
+                              0, &page_order);
+    __put_gfn(*ap2m, gpa >> PAGE_SHIFT);
+
+    if ( mfn_x(mfn) != INVALID_MFN )
+        return 0;
+
+    mfn = get_gfn_type_access(hp2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
+                              0, &page_order);
+    put_gfn(hp2m->domain, gpa >> PAGE_SHIFT);
+
+    if ( mfn_x(mfn) == INVALID_MFN )
+        return 0;
+
+    p2m_lock(*ap2m);
+
+    /* If this is a superpage mapping, round down both frame numbers
+     * to the start of the superpage. */
+    mask = ~((1UL << page_order) - 1);
+    gfn = (gpa >> PAGE_SHIFT) & mask;
+    mfn = _mfn(mfn_x(mfn) & mask);
+
+    rv = p2m_set_entry(*ap2m, gfn, mfn, page_order, p2mt, p2ma);
+    p2m_unlock(*ap2m);
+
+    if ( rv ) {
+        gdprintk(XENLOG_ERR,
+                 "failed to set entry for %#"PRIx64" -> %#"PRIx64"\n",
+                 gpa, mfn_x(mfn));
+        domain_crash(hp2m->domain);
+    }
+
+    return 1;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index e7719cf..4411b36 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -849,6 +849,9 @@ out:
     if ( is_epte_present(&old_entry) )
         ept_free_entry(p2m, &old_entry, target);
 
+    if ( rc == 0 && p2m_is_hostp2m(p2m) )
+        p2m_altp2m_propagate_change(d, gfn, mfn, order, p2mt, p2ma);
+
     return rc;
 }
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 389360a..588acd5 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2041,6 +2041,411 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx)
     return rc;
 }
 
+void p2m_flush_altp2m(struct domain *d)
+{
+    uint16_t i;
+
+    altp2m_lock(d);
+
+    for ( i = 0; i < MAX_ALTP2M; i++ )
+    {
+        p2m_flush_table(d->arch.altp2m_p2m[i]);
+        /* Uninit and reinit ept to force TLB shootdown */
+        ept_p2m_uninit(d->arch.altp2m_p2m[i]);
+        ept_p2m_init(d->arch.altp2m_p2m[i]);
+        d->arch.altp2m_eptp[i] = ~0ul;
+    }
+
+    altp2m_unlock(d);
+}
+
+bool_t p2m_init_altp2m_by_id(struct domain *d, uint16_t idx)
+{
+    struct p2m_domain *p2m;
+    struct ept_data *ept;
+    bool_t rc = 0;
+
+    if ( idx >= MAX_ALTP2M )
+        return rc;
+
+    altp2m_lock(d);
+
+    if ( d->arch.altp2m_eptp[idx] == ~0ul )
+    {
+        p2m = d->arch.altp2m_p2m[idx];
+        p2m->min_remapped_pfn = ~0ul;
+        p2m->max_remapped_pfn = ~0ul;
+        ept = &p2m->ept;
+        ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
+        d->arch.altp2m_eptp[idx] = ept_get_eptp(ept);
+        rc = 1;
+    }
+
+    altp2m_unlock(d);
+    return rc;
+}
+
+bool_t p2m_init_next_altp2m(struct domain *d, uint16_t *idx)
+{
+    struct p2m_domain *p2m;
+    struct ept_data *ept;
+    bool_t rc = 0;
+    uint16_t i;
+
+    altp2m_lock(d);
+
+    for ( i = 0; i < MAX_ALTP2M; i++ )
+    {
+        if ( d->arch.altp2m_eptp[i] != ~0ul )
+            continue;
+
+        p2m = d->arch.altp2m_p2m[i];
+        p2m->min_remapped_pfn = ~0ul;
+        p2m->max_remapped_pfn = ~0ul;
+        ept = &p2m->ept;
+        ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
+        d->arch.altp2m_eptp[i] = ept_get_eptp(ept);
+        *idx = i;
+        rc = 1;
+
+        break;
+    }
+
+    altp2m_unlock(d);
+    return rc;
+}
+
+bool_t p2m_destroy_altp2m_by_id(struct domain *d, uint16_t idx)
+{
+    struct p2m_domain *p2m;
+    struct vcpu *curr = current;
+    struct vcpu *v;
+    bool_t rc = 0;
+
+    if ( !idx || idx >= MAX_ALTP2M )
+        return rc;
+
+    if ( curr->domain != d )
+        domain_pause(d);
+    else
+        for_each_vcpu( d, v )
+            if ( curr != v )
+                vcpu_pause(v);
+
+    altp2m_lock(d);
+
+    if ( d->arch.altp2m_eptp[idx] != ~0ul )
+    {
+        p2m = d->arch.altp2m_p2m[idx];
+
+        if ( !_atomic_read(p2m->active_vcpus) )
+        {
+            p2m_flush_table(d->arch.altp2m_p2m[idx]);
+            /* Uninit and reinit ept to force TLB shootdown */
+            ept_p2m_uninit(d->arch.altp2m_p2m[idx]);
+            ept_p2m_init(d->arch.altp2m_p2m[idx]);
+            d->arch.altp2m_eptp[idx] = ~0ul;
+            rc = 1;
+        }
+    }
+
+    altp2m_unlock(d);
+
+    if ( curr->domain != d )
+        domain_unpause(d);
+    else
+        for_each_vcpu( d, v )
+            if ( curr != v )
+                vcpu_unpause(v);
+
+    return rc;
+}
+
+bool_t p2m_switch_domain_altp2m_by_id(struct domain *d, uint16_t idx)
+{
+    struct vcpu *curr = current;
+    struct vcpu *v;
+    bool_t rc = 0;
+
+    if ( idx >= MAX_ALTP2M )
+        return rc;
+
+    if ( curr->domain != d )
+        domain_pause(d);
+    else
+        for_each_vcpu( d, v )
+            if ( curr != v )
+                vcpu_pause(v);
+
+    altp2m_lock(d);
+
+    if ( d->arch.altp2m_eptp[idx] != ~0ul )
+    {
+        for_each_vcpu( d, v )
+            if ( idx != vcpu_altp2mhvm(v).p2midx )
+            {
+                atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
+                vcpu_altp2mhvm(v).p2midx = idx;
+                atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
+                ahvm_vcpu_update_eptp(v);
+            }
+
+        rc = 1;
+    }
+
+    altp2m_unlock(d);
+
+    if ( curr->domain != d )
+        domain_unpause(d);
+    else
+        for_each_vcpu( d, v )
+            if ( curr != v )
+                vcpu_unpause(v);
+
+    return rc;
+}
+
+bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
+                                 unsigned long pfn, xenmem_access_t access)
+{
+    struct p2m_domain *hp2m, *ap2m;
+    p2m_access_t a, _a;
+    p2m_type_t t;
+    mfn_t mfn;
+    unsigned int page_order;
+    bool_t rc = 0;
+
+    static const p2m_access_t memaccess[] = {
+#define ACCESS(ac) [XENMEM_access_##ac] = p2m_access_##ac
+        ACCESS(n),
+        ACCESS(r),
+        ACCESS(w),
+        ACCESS(rw),
+        ACCESS(x),
+        ACCESS(rx),
+        ACCESS(wx),
+        ACCESS(rwx),
+#undef ACCESS
+    };
+
+    if ( idx >= MAX_ALTP2M || d->arch.altp2m_eptp[idx] == ~0ul )
+        return 0;
+
+    ap2m = d->arch.altp2m_p2m[idx];
+
+    switch ( access )
+    {
+    case 0 ... ARRAY_SIZE(memaccess) - 1:
+        a = memaccess[access];
+        break;
+    case XENMEM_access_default:
+        a = ap2m->default_access;
+        break;
+    default:
+        return 0;
+    }
+
+    /* If request to set default access */
+    if ( pfn == ~0ul )
+    {
+        ap2m->default_access = a;
+        return 1;
+    }
+
+    hp2m = p2m_get_hostp2m(d);
+
+    p2m_lock(ap2m);
+
+    mfn = ap2m->get_entry(ap2m, pfn, &t, &_a, 0, NULL);
+
+    /* Check host p2m if no valid entry in alternate */
+    if ( !mfn_valid(mfn) )
+    {
+        mfn = hp2m->get_entry(hp2m, pfn, &t, &_a, 0, &page_order);
+
+        if ( !mfn_valid(mfn) || t != p2m_ram_rw )
+            goto out;
+
+        /* If this is a superpage, copy that first */
+        if ( page_order != PAGE_ORDER_4K )
+        {
+            unsigned long gfn, mask;
+            mfn_t mfn2;
+
+            mask = ~((1UL << page_order) - 1);
+            gfn = pfn & mask;
+            mfn2 = _mfn(mfn_x(mfn) & mask);
+
+            if ( ap2m->set_entry(ap2m, gfn, mfn2, page_order, t, _a) )
+                goto out;
+        }
+    }
+
+    if ( !ap2m->set_entry_full(ap2m, pfn, mfn, PAGE_ORDER_4K, t, a,
+                               (current->domain != d)) )
+        rc = 1;
+
+out:
+    p2m_unlock(ap2m);
+    return rc;
+}
+
+bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
+                             unsigned long old_pfn, unsigned long new_pfn)
+{
+    struct p2m_domain *hp2m, *ap2m;
+    p2m_access_t a;
+    p2m_type_t t;
+    mfn_t mfn;
+    unsigned int page_order;
+    bool_t rc = 0;
+
+    if ( idx >= MAX_ALTP2M || d->arch.altp2m_eptp[idx] == ~0ul )
+        return 0;
+
+    hp2m = p2m_get_hostp2m(d);
+    ap2m = d->arch.altp2m_p2m[idx];
+
+    p2m_lock(ap2m);
+
+    mfn = ap2m->get_entry(ap2m, old_pfn, &t, &a, 0, NULL);
+
+    if ( new_pfn == ~0ul )
+    {
+        if ( mfn_valid(mfn) )
+            p2m_remove_page(ap2m, old_pfn, mfn_x(mfn), PAGE_ORDER_4K);
+        rc = 1;
+        goto out;
+    }
+
+    /* Check host p2m if no valid entry in alternate */
+    if ( !mfn_valid(mfn) )
+    {
+        mfn = hp2m->get_entry(hp2m, old_pfn, &t, &a, 0, &page_order);
+
+        if ( !mfn_valid(mfn) || t != p2m_ram_rw )
+            goto out;
+
+        /* If this is a superpage, copy that first */
+        if ( page_order != PAGE_ORDER_4K )
+        {
+            unsigned long gfn, mask;
+
+            mask = ~((1UL << page_order) - 1);
+            gfn = old_pfn & mask;
+            mfn = _mfn(mfn_x(mfn) & mask);
+
+            if ( ap2m->set_entry(ap2m, gfn, mfn, page_order, t, a) )
+                goto out;
+        }
+    }
+
+    mfn = ap2m->get_entry(ap2m, new_pfn, &t, &a, 0, NULL);
+
+    if ( !mfn_valid(mfn) )
+        mfn = hp2m->get_entry(hp2m, new_pfn, &t, &a, 0, NULL);
+
+    if ( !mfn_valid(mfn) || (t != p2m_ram_rw) )
+        goto out;
+
+    if ( !ap2m->set_entry_full(ap2m, old_pfn, mfn, PAGE_ORDER_4K, t, a,
+                               (current->domain != d)) )
+    {
+        rc = 1;
+
+        if ( ap2m->min_remapped_pfn == ~0ul ||
+             new_pfn < ap2m->min_remapped_pfn )
+            ap2m->min_remapped_pfn = new_pfn;
+        if ( ap2m->max_remapped_pfn == ~0ul ||
+             new_pfn > ap2m->max_remapped_pfn )
+            ap2m->max_remapped_pfn = new_pfn;
+    }
+
+out:
+    p2m_unlock(ap2m);
+    return rc;
+}
+
+static inline void p2m_reset_altp2m(struct p2m_domain *p2m)
+{
+    p2m_flush_table(p2m);
+    /* Uninit and reinit ept to force TLB shootdown */
+    ept_p2m_uninit(p2m);
+    ept_p2m_init(p2m);
+    p2m->min_remapped_pfn = ~0ul;
+    p2m->max_remapped_pfn = ~0ul;
+}
+
+void p2m_altp2m_propagate_change(struct domain *d, unsigned long gfn,
+                                 mfn_t mfn, unsigned int page_order,
+                                 p2m_type_t p2mt, p2m_access_t p2ma)
+{
+    struct p2m_domain *p2m;
+    p2m_access_t a;
+    p2m_type_t t;
+    mfn_t m;
+    uint16_t i;
+    bool_t reset_p2m;
+    unsigned int reset_count = 0;
+    uint16_t last_reset_idx = ~0;
+
+    if ( !altp2mhvm_active(d) )
+        return;
+
+    altp2m_lock(d);
+
+    for ( i = 0; i < MAX_ALTP2M; i++ )
+    {
+        if ( d->arch.altp2m_eptp[i] == ~0ul )
+            continue;
+
+        p2m = d->arch.altp2m_p2m[i];
+        m = get_gfn_type_access(p2m, gfn, &t, &a, 0, NULL);
+
+        reset_p2m = 0;
+
+        /* Check for a dropped page that may impact this altp2m */
+        if ( mfn_x(mfn) == INVALID_MFN &&
+             gfn >= p2m->min_remapped_pfn && gfn <= p2m->max_remapped_pfn )
+            reset_p2m = 1;
+
+        if ( reset_p2m )
+        {
+            if ( !reset_count++ )
+            {
+                p2m_reset_altp2m(p2m);
+                last_reset_idx = i;
+            }
+            else
+            {
+                /* At least 2 altp2m's impacted, so reset everything */
+                __put_gfn(p2m, gfn);
+
+                for ( i = 0; i < MAX_ALTP2M; i++ )
+                {
+                    if ( i == last_reset_idx ||
+                         d->arch.altp2m_eptp[i] == ~0ul )
+                        continue;
+
+                    p2m = d->arch.altp2m_p2m[i];
+                    p2m_lock(p2m);
+                    p2m_reset_altp2m(p2m);
+                    p2m_unlock(p2m);
+                }
+
+                goto out;
+            }
+        }
+        else if ( mfn_x(m) != INVALID_MFN )
+           p2m_set_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
+
+        __put_gfn(p2m, gfn);
+    }
+
+out:
+    altp2m_unlock(d);
+}
+
 /*** Audit ***/
 
 #if P2M_AUDIT
diff --git a/xen/include/asm-x86/hvm/altp2mhvm.h b/xen/include/asm-x86/hvm/altp2mhvm.h
index a4b8e24..08ff79b 100644
--- a/xen/include/asm-x86/hvm/altp2mhvm.h
+++ b/xen/include/asm-x86/hvm/altp2mhvm.h
@@ -34,5 +34,9 @@ int altp2mhvm_vcpu_initialise(struct vcpu *v);
 void altp2mhvm_vcpu_destroy(struct vcpu *v);
 void altp2mhvm_vcpu_reset(struct vcpu *v);
 
+/* Alternate p2m paging */
+int altp2mhvm_hap_nested_page_fault(struct vcpu *v, paddr_t gpa,
+    unsigned long gla, struct npfec npfec, struct p2m_domain **ap2m);
+
 #endif /* _HVM_ALTP2M_H */
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index d84da33..3f17211 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -279,6 +279,11 @@ struct p2m_domain {
     /* Highest guest frame that's ever been mapped in the p2m */
     unsigned long max_mapped_pfn;
 
+    /* Alternate p2m's only: range of pfn's for which underlying
+     * mfn may have duplicate mappings */
+    unsigned long min_remapped_pfn;
+    unsigned long max_remapped_pfn;
+
     /* When releasing shared gfn's in a preemptible manner, recall where
      * to resume the search */
     unsigned long next_shared_gfn_to_relinquish;
@@ -766,6 +771,34 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx);
 void p2m_mem_access_altp2m_check(struct vcpu *v,
                                  const vm_event_response_t *rsp);
 
+/* Flush all the alternate p2m's for a domain */
+void p2m_flush_altp2m(struct domain *d);
+
+/* Make a specific alternate p2m valid */
+bool_t p2m_init_altp2m_by_id(struct domain *d, uint16_t idx);
+
+/* Find an available alternate p2m and make it valid */
+bool_t p2m_init_next_altp2m(struct domain *d, uint16_t *idx);
+
+/* Make a specific alternate p2m invalid */
+bool_t p2m_destroy_altp2m_by_id(struct domain *d, uint16_t idx);
+
+/* Switch alternate p2m for entire domain */
+bool_t p2m_switch_domain_altp2m_by_id(struct domain *d, uint16_t idx);
+
+/* Set access type for a pfn */
+bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
+                                 unsigned long pfn, xenmem_access_t access);
+
+/* Replace a pfn with a different pfn */
+bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
+                             unsigned long old_pfn, unsigned long new_pfn);
+
+/* Propagate a host p2m change to all alternate p2m's */
+void p2m_altp2m_propagate_change(struct domain *d, unsigned long gfn,
+                                 mfn_t mfn, unsigned int page_order,
+                                 p2m_type_t p2mt, p2m_access_t p2ma);
+
 /*
  * p2m type to IOMMU flags
  */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH v2 10/12] x86/altp2m: define and implement alternate p2m HVMOP types.
  2015-06-22 18:56 [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (8 preceding siblings ...)
  2015-06-22 18:56 ` [PATCH v2 09/12] x86/altp2m: add remaining support routines Ed White
@ 2015-06-22 18:56 ` Ed White
  2015-06-24 13:58   ` Andrew Cooper
  2015-06-24 14:53   ` Jan Beulich
  2015-06-22 18:56 ` [PATCH v2 11/12] x86/altp2m: Add altp2mhvm HVM domain parameter Ed White
                   ` (3 subsequent siblings)
  13 siblings, 2 replies; 116+ messages in thread
From: Ed White @ 2015-06-22 18:56 UTC (permalink / raw)
  To: xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Ed White,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
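A usage sketch (not part of the patch; assumes a Linux-style
HYPERVISOR_hvm_op() hypercall wrapper is available to the caller):
switching the whole domain to a given view from inside the guest
amounts to:

    /* Hypothetical in-guest helper built on the HVMOPs defined below. */
    static int altp2m_switch_view(uint16_t view)
    {
        struct xen_hvm_altp2m_view a = {
            .domid = DOMID_SELF,
            .view  = view,
        };

        return HYPERVISOR_hvm_op(HVMOP_altp2m_switch_p2m, &a);
    }
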
 xen/arch/x86/hvm/hvm.c          | 216 ++++++++++++++++++++++++++++++++++++++++
 xen/include/public/hvm/hvm_op.h |  69 +++++++++++++
 2 files changed, 285 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index b758ee1..b3e74ce 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -6424,6 +6424,222 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case HVMOP_altp2m_get_domain_state:
+    {
+        struct xen_hvm_altp2m_domain_state a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = -EINVAL;
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() )
+            goto param_fail9;
+
+        a.state = altp2mhvm_active(d);
+        rc = copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
+
+    param_fail9:
+        rcu_unlock_domain(d);
+        break;
+    }
+
+    case HVMOP_altp2m_set_domain_state:
+    {
+        struct xen_hvm_altp2m_domain_state a;
+        struct domain *d;
+        struct vcpu *v;
+        bool_t ostate;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = -EINVAL;
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+             nestedhvm_enabled(d) )
+            goto param_fail10;
+
+        ostate = d->arch.altp2m_active;
+        d->arch.altp2m_active = !!a.state;
+
+        /* If the alternate p2m state has changed, handle appropriately */
+        if ( d->arch.altp2m_active != ostate )
+        {
+            if ( !ostate && !p2m_init_altp2m_by_id(d, 0) )
+                goto param_fail10;
+
+            for_each_vcpu( d, v )
+                if ( !ostate )
+                    altp2mhvm_vcpu_initialise(v);
+                else
+                    altp2mhvm_vcpu_destroy(v);
+
+            if ( ostate )
+                p2m_flush_altp2m(d);
+        }
+
+        rc = 0;
+
+    param_fail10:
+        rcu_unlock_domain(d);
+        break;
+    }
+
+    case HVMOP_altp2m_vcpu_enable_notify:
+    {
+        struct domain *curr_d = current->domain;
+        struct vcpu *curr = current;
+        struct xen_hvm_altp2m_vcpu_enable_notify a;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        if ( !is_hvm_domain(curr_d) || !hvm_altp2m_supported() ||
+             !curr_d->arch.altp2m_active || vcpu_altp2mhvm(curr).veinfo_gfn )
+            return -EINVAL;
+
+        vcpu_altp2mhvm(curr).veinfo_gfn = a.pfn;
+        ahvm_vcpu_update_vmfunc_ve(curr);
+        rc = 0;
+
+        break;
+    }
+
+    case HVMOP_altp2m_create_p2m:
+    {
+        struct xen_hvm_altp2m_view a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = -EINVAL;
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+             !d->arch.altp2m_active )
+            goto param_fail11;
+
+        if ( !p2m_init_next_altp2m(d, &a.view) )
+            goto param_fail11;
+
+        rc = copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
+
+    param_fail11:
+        rcu_unlock_domain(d);
+        break;
+    }
+
+    case HVMOP_altp2m_destroy_p2m:
+    {
+        struct xen_hvm_altp2m_view a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = -EINVAL;
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+             !d->arch.altp2m_active )
+            goto param_fail12;
+
+        if ( p2m_destroy_altp2m_by_id(d, a.view) )
+            rc = 0;
+
+    param_fail12:
+        rcu_unlock_domain(d);
+        break;
+    }
+
+    case HVMOP_altp2m_switch_p2m:
+    {
+        struct xen_hvm_altp2m_view a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = -EINVAL;
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+             !d->arch.altp2m_active )
+            goto param_fail13;
+
+        if ( p2m_switch_domain_altp2m_by_id(d, a.view) )
+            rc = 0;
+
+    param_fail13:
+        rcu_unlock_domain(d);
+        break;
+    }
+
+    case HVMOP_altp2m_set_mem_access:
+    {
+        struct xen_hvm_altp2m_set_mem_access a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = -EINVAL;
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+             !d->arch.altp2m_active )
+            goto param_fail14;
+
+        if ( p2m_set_altp2m_mem_access(d, a.view, a.pfn, a.hvmmem_access) )
+            rc = 0;
+
+    param_fail14:
+        rcu_unlock_domain(d);
+        break;
+    }
+
+    case HVMOP_altp2m_change_pfn:
+    {
+        struct xen_hvm_altp2m_change_pfn a;
+        struct domain *d;
+
+        if ( copy_from_guest(&a, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(a.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = -EINVAL;
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+             !d->arch.altp2m_active )
+            goto param_fail15;
+
+        if ( p2m_change_altp2m_pfn(d, a.view, a.old_pfn, a.new_pfn) )
+            rc = 0;
+
+    param_fail15:
+        rcu_unlock_domain(d);
+        break;
+    }
+
     default:
     {
         gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index cde3571..f6abce9 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -389,6 +389,75 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_evtchn_upcall_vector_t);
 
 #endif /* defined(__i386__) || defined(__x86_64__) */
 
+/* Set/get the altp2m state for a domain */
+#define HVMOP_altp2m_set_domain_state     24
+#define HVMOP_altp2m_get_domain_state     25
+struct xen_hvm_altp2m_domain_state {
+    /* Domain to be updated or queried */
+    domid_t domid;
+    /* IN or OUT variable on/off */
+    uint8_t state;
+};
+typedef struct xen_hvm_altp2m_domain_state xen_hvm_altp2m_domain_state_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_domain_state_t);
+
+/* Set the current VCPU to receive altp2m event notifications */
+#define HVMOP_altp2m_vcpu_enable_notify   26
+struct xen_hvm_altp2m_vcpu_enable_notify {
+    /* #VE info area pfn */
+    uint64_t pfn;
+};
+typedef struct xen_hvm_altp2m_vcpu_enable_notify xen_hvm_altp2m_vcpu_enable_notify_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_vcpu_enable_notify_t);
+
+/* Create a new view */
+#define HVMOP_altp2m_create_p2m   27
+/* Destroy a view */
+#define HVMOP_altp2m_destroy_p2m  28
+/* Switch view for an entire domain */
+#define HVMOP_altp2m_switch_p2m   29
+struct xen_hvm_altp2m_view {
+    /* Domain to be updated */
+    domid_t domid;
+    /* IN/OUT variable */
+    uint16_t view;
+    /* Create view only: default access type
+     * NOTE: currently ignored */
+    uint16_t hvmmem_default_access; /* xenmem_access_t */
+};
+typedef struct xen_hvm_altp2m_view xen_hvm_altp2m_view_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_view_t);
+
+/* Notify that a page of memory is to have specific access types */
+#define HVMOP_altp2m_set_mem_access 30
+struct xen_hvm_altp2m_set_mem_access {
+    /* Domain to be updated. */
+    domid_t domid;
+    /* view */
+    uint16_t view;
+    /* Memory type */
+    uint16_t hvmmem_access; /* xenmem_access_t */
+    /* pfn */
+    uint64_t pfn;
+};
+typedef struct xen_hvm_altp2m_set_mem_access xen_hvm_altp2m_set_mem_access_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_set_mem_access_t);
+
+/* Change a p2m entry to map a different pfn */
+#define HVMOP_altp2m_change_pfn 31
+struct xen_hvm_altp2m_change_pfn {
+    /* Domain to be updated. */
+    domid_t domid;
+    /* view */
+    uint16_t view;
+    /* old pfn */
+    uint64_t old_pfn;
+    /* new pfn, -1 means revert */
+    uint64_t new_pfn;
+};
+typedef struct xen_hvm_altp2m_change_pfn xen_hvm_altp2m_change_pfn_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_change_pfn_t);
+
 #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
 
 /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH v2 11/12] x86/altp2m: Add altp2mhvm HVM domain parameter.
  2015-06-22 18:56 [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (9 preceding siblings ...)
  2015-06-22 18:56 ` [PATCH v2 10/12] x86/altp2m: define and implement alternate p2m HVMOP types Ed White
@ 2015-06-22 18:56 ` Ed White
  2015-06-24 14:06   ` Andrew Cooper
  2015-06-24 14:59   ` Jan Beulich
  2015-06-22 18:56 ` [PATCH v2 12/12] x86/altp2m: XSM hooks for altp2m HVM ops Ed White
                   ` (2 subsequent siblings)
  13 siblings, 2 replies; 116+ messages in thread
From: Ed White @ 2015-06-22 18:56 UTC (permalink / raw)
  To: xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Ed White,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

The altp2mhvm and nestedhvm parameters are mutually
exclusive and cannot be set together.

Signed-off-by: Ed White <edmund.h.white@intel.com>
---
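A hypothetical guest configuration fragment using the new option
(remember that altp2mhvm and nestedhvm are mutually exclusive):

    builder = "hvm"
    # Enable the alternate-p2m capability for this domain.
    altp2mhvm = 1
    # Must not be enabled together with altp2mhvm.
    nestedhvm = 0
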
 docs/man/xl.cfg.pod.5           | 12 ++++++++++++
 tools/libxl/libxl_create.c      |  1 +
 tools/libxl/libxl_dom.c         |  2 ++
 tools/libxl/libxl_types.idl     |  1 +
 tools/libxl/xl_cmdimpl.c        |  8 ++++++++
 xen/arch/x86/hvm/hvm.c          | 15 ++++++++++++++-
 xen/include/public/hvm/params.h |  5 ++++-
 7 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..18afd46 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1035,6 +1035,18 @@ enabled by default and you should usually omit it. It may be necessary
 to disable the HPET in order to improve compatibility with guest
 Operating Systems (X86 only)
 
+=item B<altp2mhvm=BOOLEAN>
+
+Enables or disables HVM guest access to the alternate-p2m capability.
+Alternate-p2m allows a guest to manage multiple guest physical
+"memory views" (as opposed to a single p2m). This option is
+disabled by default and is available only to HVM domains.
+Enable it when you need to restrict or isolate access to
+specific guest physical memory pages, e.g. for HVM domain
+memory introspection, or for isolation/access-control of
+memory between components within a single guest HVM domain.
+This option cannot be enabled together with B<nestedhvm>.
+
 =item B<nestedhvm=BOOLEAN>
 
 Enable or disables guest access to hardware virtualisation features,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 86384d2..35e322e 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -329,6 +329,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
         libxl_defbool_setdefault(&b_info->u.hvm.hpet,               true);
         libxl_defbool_setdefault(&b_info->u.hvm.vpt_align,          true);
         libxl_defbool_setdefault(&b_info->u.hvm.nested_hvm,         false);
+        libxl_defbool_setdefault(&b_info->u.hvm.altp2mhvm,          false);
         libxl_defbool_setdefault(&b_info->u.hvm.usb,                false);
         libxl_defbool_setdefault(&b_info->u.hvm.xen_platform_pci,   true);
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 867172a..c925fec 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -300,6 +300,8 @@ static void hvm_set_conf_params(xc_interface *handle, uint32_t domid,
                     libxl_defbool_val(info->u.hvm.vpt_align));
     xc_hvm_param_set(handle, domid, HVM_PARAM_NESTEDHVM,
                     libxl_defbool_val(info->u.hvm.nested_hvm));
+    xc_hvm_param_set(handle, domid, HVM_PARAM_ALTP2MHVM,
+                    libxl_defbool_val(info->u.hvm.altp2mhvm));
 }
 
 int libxl__build_pre(libxl__gc *gc, uint32_t domid,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 23f27d4..66a89cf 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -437,6 +437,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("mmio_hole_memkb",  MemKB),
                                        ("timer_mode",       libxl_timer_mode),
                                        ("nested_hvm",       libxl_defbool),
+                                       ("altp2mhvm",        libxl_defbool),
                                        ("smbios_firmware",  string),
                                        ("acpi_firmware",    string),
                                        ("nographic",        libxl_defbool),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c858068..ccb0de9 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1500,6 +1500,14 @@ static void parse_config_data(const char *config_source,
 
         xlu_cfg_get_defbool(config, "nestedhvm", &b_info->u.hvm.nested_hvm, 0);
 
+        xlu_cfg_get_defbool(config, "altp2mhvm", &b_info->u.hvm.altp2mhvm, 0);
+
+        if (strcmp(libxl_defbool_to_string(b_info->u.hvm.nested_hvm), "True") == 0 &&
+            strcmp(libxl_defbool_to_string(b_info->u.hvm.altp2mhvm), "True") == 0) {
+            fprintf(stderr, "ERROR: nestedhvm and altp2mhvm cannot be used together\n");
+            exit(1);
+        }
+
         xlu_cfg_replace_string(config, "smbios_firmware",
                                &b_info->u.hvm.smbios_firmware, 0);
         xlu_cfg_replace_string(config, "acpi_firmware",
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index b3e74ce..8453489 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5732,6 +5732,7 @@ static int hvm_allow_set_param(struct domain *d,
     case HVM_PARAM_VIRIDIAN:
     case HVM_PARAM_IOREQ_SERVER_PFN:
     case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
+    case HVM_PARAM_ALTP2MHVM:
         if ( value != 0 && a->value != value )
             rc = -EEXIST;
         break;
@@ -5854,6 +5855,9 @@ static int hvmop_set_param(
          */
         if ( cpu_has_svm && !paging_mode_hap(d) && a.value )
             rc = -EINVAL;
+        if ( a.value &&
+             d->arch.hvm_domain.params[HVM_PARAM_ALTP2MHVM] )
+            rc = -EINVAL;
         /* Set up NHVM state for any vcpus that are already up. */
         if ( a.value &&
              !d->arch.hvm_domain.params[HVM_PARAM_NESTEDHVM] )
@@ -5864,6 +5868,13 @@ static int hvmop_set_param(
             for_each_vcpu(d, v)
                 nestedhvm_vcpu_destroy(v);
         break;
+    case HVM_PARAM_ALTP2MHVM:
+        if ( a.value > 1 )
+            rc = -EINVAL;
+        if ( a.value &&
+             d->arch.hvm_domain.params[HVM_PARAM_NESTEDHVM] )
+            rc = -EINVAL;
+        break;
     case HVM_PARAM_BUFIOREQ_EVTCHN:
         rc = -EINVAL;
         break;
@@ -6437,7 +6448,8 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -ESRCH;
 
         rc = -EINVAL;
-        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() )
+        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+             !d->arch.hvm_domain.params[HVM_PARAM_ALTP2MHVM] )
             goto param_fail9;
 
         a.state = altp2mhvm_active(d);
@@ -6464,6 +6476,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
         rc = -EINVAL;
         if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+             !d->arch.hvm_domain.params[HVM_PARAM_ALTP2MHVM] ||
              nestedhvm_enabled(d) )
             goto param_fail10;
 
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 7c73089..1b5f840 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -187,6 +187,9 @@
 /* Location of the VM Generation ID in guest physical address space. */
 #define HVM_PARAM_VM_GENERATION_ID_ADDR 34
 
-#define HVM_NR_PARAMS          35
+/* Boolean: Enable altp2m (hvm only) */
+#define HVM_PARAM_ALTP2MHVM    35
+
+#define HVM_NR_PARAMS          36
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH v2 12/12] x86/altp2m: XSM hooks for altp2m HVM ops
  2015-06-22 18:56 [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (10 preceding siblings ...)
  2015-06-22 18:56 ` [PATCH v2 11/12] x86/altp2m: Add altp2mhvm HVM domain parameter Ed White
@ 2015-06-22 18:56 ` Ed White
  2015-06-26 19:24   ` Daniel De Graaf
  2015-06-23 21:27 ` [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m Lengyel, Tamas
  2015-06-24 14:10 ` Andrew Cooper
  13 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-22 18:56 UTC (permalink / raw)
  To: xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	Andrew Cooper, tlengyel, Daniel De Graaf

From: Ravi Sahita <ravi.sahita@intel.com>

Signed-off-by: Ravi Sahita <ravi.sahita@intel.com>
---
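For deployments with FLASK enabled, a policy fragment along these
lines (hypothetical type names) grants a monitoring domain the new
vectors for its target:

    # Allow monitor_t to flip the altp2m parameter and issue altp2m ops
    # against domU_t, matching the vectors added below.
    allow monitor_t domU_t:hvm { altp2mhvm altp2mhvm_op };
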
 tools/flask/policy/policy/modules/xen/xen.if |  4 ++--
 xen/arch/x86/hvm/hvm.c                       | 35 ++++++++++++++++++++++++++++
 xen/include/xsm/dummy.h                      | 12 ++++++++++
 xen/include/xsm/xsm.h                        | 12 ++++++++++
 xen/xsm/dummy.c                              |  2 ++
 xen/xsm/flask/hooks.c                        | 12 ++++++++++
 xen/xsm/flask/policy/access_vectors          |  7 ++++++
 7 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
index f4cde11..c95109f 100644
--- a/tools/flask/policy/policy/modules/xen/xen.if
+++ b/tools/flask/policy/policy/modules/xen/xen.if
@@ -8,7 +8,7 @@
 define(`declare_domain_common', `
 	allow $1 $2:grant { query setup };
 	allow $1 $2:mmu { adjust physmap map_read map_write stat pinpage updatemp mmuext_op };
-	allow $1 $2:hvm { getparam setparam };
+	allow $1 $2:hvm { getparam setparam altp2mhvm altp2mhvm_op };
 	allow $1 $2:domain2 get_vnumainfo;
 ')
 
@@ -58,7 +58,7 @@ define(`create_domain_common', `
 	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp };
 	allow $1 $2:grant setup;
 	allow $1 $2:hvm { cacheattr getparam hvmctl irqlevel pciroute sethvmc
-			setparam pcilevel trackdirtyvram nested };
+			setparam pcilevel trackdirtyvram nested altp2mhvm altp2mhvm_op };
 ')
 
 # create_domain(priv, target)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 8453489..8210ede 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5869,6 +5869,9 @@ static int hvmop_set_param(
                 nestedhvm_vcpu_destroy(v);
         break;
     case HVM_PARAM_ALTP2MHVM:
+        rc = xsm_hvm_param_altp2mhvm(XSM_PRIV, d);
+        if ( rc )
+            break;
         if ( a.value > 1 )
             rc = -EINVAL;
         if ( a.value &&
@@ -6447,6 +6450,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( d == NULL )
             return -ESRCH;
 
+        rc = xsm_hvm_altp2mhvm_op(XSM_TARGET, d);
+        if ( rc )
+            goto param_fail9;
+
         rc = -EINVAL;
         if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
              !d->arch.hvm_domain.params[HVM_PARAM_ALTP2MHVM] )
@@ -6474,6 +6481,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( d == NULL )
             return -ESRCH;
 
+        rc = xsm_hvm_altp2mhvm_op(XSM_TARGET, d);
+        if ( rc )
+            goto param_fail10;
+
         rc = -EINVAL;
         if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
              !d->arch.hvm_domain.params[HVM_PARAM_ALTP2MHVM] ||
@@ -6515,6 +6526,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&a, arg, 1) )
             return -EFAULT;
 
+        rc = xsm_hvm_altp2mhvm_op(XSM_TARGET, curr_d);
+        if ( rc )
+            break;
+
         if ( !is_hvm_domain(curr_d) || !hvm_altp2m_supported() ||
              !curr_d->arch.altp2m_active || vcpu_altp2mhvm(curr).veinfo_gfn )
             return -EINVAL;
@@ -6538,6 +6553,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( d == NULL )
             return -ESRCH;
 
+        rc = xsm_hvm_altp2mhvm_op(XSM_TARGET, d);
+        if ( rc )
+            goto param_fail11;
+
         rc = -EINVAL;
         if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
              !d->arch.altp2m_active )
@@ -6565,6 +6584,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( d == NULL )
             return -ESRCH;
 
+        rc = xsm_hvm_altp2mhvm_op(XSM_TARGET, d);
+        if ( rc )
+            goto param_fail12;
+
         rc = -EINVAL;
         if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
              !d->arch.altp2m_active )
@@ -6590,6 +6613,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( d == NULL )
             return -ESRCH;
 
+        rc = xsm_hvm_altp2mhvm_op(XSM_TARGET, d);
+        if ( rc )
+            goto param_fail13;
+
         rc = -EINVAL;
         if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
              !d->arch.altp2m_active )
@@ -6615,6 +6642,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( d == NULL )
             return -ESRCH;
 
+        rc = xsm_hvm_altp2mhvm_op(XSM_TARGET, d);
+        if ( rc )
+            goto param_fail14;
+
         rc = -EINVAL;
         if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
              !d->arch.altp2m_active )
@@ -6640,6 +6671,10 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( d == NULL )
             return -ESRCH;
 
+        rc = xsm_hvm_altp2mhvm_op(XSM_TARGET, d);
+        if ( rc )
+            goto param_fail15;
+
         rc = -EINVAL;
         if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
              !d->arch.altp2m_active )
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index f044c0f..e0b561d 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -548,6 +548,18 @@ static XSM_INLINE int xsm_hvm_param_nested(XSM_DEFAULT_ARG struct domain *d)
     return xsm_default_action(action, current->domain, d);
 }
 
+static XSM_INLINE int xsm_hvm_param_altp2mhvm(XSM_DEFAULT_ARG struct domain *d)
+{
+    XSM_ASSERT_ACTION(XSM_PRIV);
+    return xsm_default_action(action, current->domain, d);
+}
+
+static XSM_INLINE int xsm_hvm_altp2mhvm_op(XSM_DEFAULT_ARG struct domain *d)
+{
+    XSM_ASSERT_ACTION(XSM_TARGET);
+    return xsm_default_action(action, current->domain, d);
+}
+
 static XSM_INLINE int xsm_vm_event_control(XSM_DEFAULT_ARG struct domain *d, int mode, int op)
 {
     XSM_ASSERT_ACTION(XSM_PRIV);
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index c872d44..dc48d23 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -147,6 +147,8 @@ struct xsm_operations {
     int (*hvm_param) (struct domain *d, unsigned long op);
     int (*hvm_control) (struct domain *d, unsigned long op);
     int (*hvm_param_nested) (struct domain *d);
+    int (*hvm_param_altp2mhvm) (struct domain *d);
+    int (*hvm_altp2mhvm_op) (struct domain *d);
     int (*get_vnumainfo) (struct domain *d);
 
     int (*vm_event_control) (struct domain *d, int mode, int op);
@@ -586,6 +588,16 @@ static inline int xsm_hvm_param_nested (xsm_default_t def, struct domain *d)
     return xsm_ops->hvm_param_nested(d);
 }
 
+static inline int xsm_hvm_param_altp2mhvm (xsm_default_t def, struct domain *d)
+{
+    return xsm_ops->hvm_param_altp2mhvm(d);
+}
+
+static inline int xsm_hvm_altp2mhvm_op (xsm_default_t def, struct domain *d)
+{
+    return xsm_ops->hvm_altp2mhvm_op(d);
+}
+
 static inline int xsm_get_vnumainfo (xsm_default_t def, struct domain *d)
 {
     return xsm_ops->get_vnumainfo(d);
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index e84b0e4..3461d4f 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -116,6 +116,8 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, hvm_param);
     set_to_dummy_if_null(ops, hvm_control);
     set_to_dummy_if_null(ops, hvm_param_nested);
+    set_to_dummy_if_null(ops, hvm_param_altp2mhvm);
+    set_to_dummy_if_null(ops, hvm_altp2mhvm_op);
 
     set_to_dummy_if_null(ops, do_xsm_op);
 #ifdef CONFIG_COMPAT
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 6e37d29..2b998c9 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1170,6 +1170,16 @@ static int flask_hvm_param_nested(struct domain *d)
     return current_has_perm(d, SECCLASS_HVM, HVM__NESTED);
 }
 
+static int flask_hvm_param_altp2mhvm(struct domain *d)
+{
+    return current_has_perm(d, SECCLASS_HVM, HVM__ALTP2MHVM);
+}
+
+static int flask_hvm_altp2mhvm_op(struct domain *d)
+{
+    return current_has_perm(d, SECCLASS_HVM, HVM__ALTP2MHVM_OP);
+}
+
 static int flask_vm_event_control(struct domain *d, int mode, int op)
 {
     return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__VM_EVENT);
@@ -1654,6 +1664,8 @@ static struct xsm_operations flask_ops = {
     .hvm_param = flask_hvm_param,
     .hvm_control = flask_hvm_param,
     .hvm_param_nested = flask_hvm_param_nested,
+    .hvm_param_altp2mhvm = flask_hvm_param_altp2mhvm,
+    .hvm_altp2mhvm_op = flask_hvm_altp2mhvm_op,
 
     .do_xsm_op = do_flask_op,
     .get_vnumainfo = flask_get_vnumainfo,
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 68284d5..ca7982f 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -272,6 +272,13 @@ class hvm
     share_mem
 # HVMOP_set_param setting HVM_PARAM_NESTEDHVM
     nested
+# HVMOP_set_param setting HVM_PARAM_ALTP2MHVM
+    altp2mhvm
+# HVMOP_altp2m_set_domain_state HVMOP_altp2m_get_domain_state
+# HVMOP_altp2m_vcpu_enable_notify HVMOP_altp2m_create_p2m
+# HVMOP_altp2m_destroy_p2m HVMOP_altp2m_switch_p2m
+# HVMOP_altp2m_set_mem_access HVMOP_altp2m_change_pfn
+    altp2mhvm_op
 }
 
 # Class event describes event channels.  Interdomain event channels have their
-- 
1.9.1


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-22 18:56 ` [PATCH v2 09/12] x86/altp2m: add remaining support routines Ed White
@ 2015-06-23 18:15   ` Lengyel, Tamas
  2015-06-23 18:52     ` Ed White
  2015-06-24 13:46   ` Andrew Cooper
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-23 18:15 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf


On Mon, Jun 22, 2015 at 2:56 PM, Ed White <edmund.h.white@intel.com> wrote:

> Add the remaining routines required to support enabling the alternate
> p2m functionality.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/hvm/hvm.c              |  60 +++++-
>  xen/arch/x86/mm/hap/Makefile        |   1 +
>  xen/arch/x86/mm/hap/altp2m_hap.c    | 103 +++++++++
>  xen/arch/x86/mm/p2m-ept.c           |   3 +
>  xen/arch/x86/mm/p2m.c               | 405
> ++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/altp2mhvm.h |   4 +
>  xen/include/asm-x86/p2m.h           |  33 +++
>  7 files changed, 601 insertions(+), 8 deletions(-)
>  create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index d75c12d..b758ee1 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2786,10 +2786,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
> unsigned long gla,
>      p2m_access_t p2ma;
>      mfn_t mfn;
>      struct vcpu *v = current;
> -    struct p2m_domain *p2m;
> +    struct p2m_domain *p2m, *hostp2m;
>      int rc, fall_through = 0, paged = 0;
>      int sharing_enomem = 0;
>      vm_event_request_t *req_ptr = NULL;
> +    int altp2m_active = 0;
>
>      /* On Nested Virtualization, walk the guest page table.
>       * If this succeeds, all is fine.
> @@ -2845,15 +2846,33 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
> unsigned long gla,
>      {
>          if ( !handle_mmio_with_translation(gla, gpa >> PAGE_SHIFT, npfec)
> )
>              hvm_inject_hw_exception(TRAP_gp_fault, 0);
> -        rc = 1;
> -        goto out;
> +        return 1;
>      }
>
> -    p2m = p2m_get_hostp2m(v->domain);
> -    mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma,
> +    altp2m_active = altp2mhvm_active(v->domain);
> +
> +    /* Take a lock on the host p2m speculatively, to avoid potential
> +     * locking order problems later and to handle unshare etc.
> +     */
> +    hostp2m = p2m_get_hostp2m(v->domain);
> +    mfn = get_gfn_type_access(hostp2m, gfn, &p2mt, &p2ma,
>                                P2M_ALLOC | (npfec.write_access ?
> P2M_UNSHARE : 0),
>                                NULL);
>
> +    if ( altp2m_active )
> +    {
> +        if ( altp2mhvm_hap_nested_page_fault(v, gpa, gla, npfec, &p2m) ==
> 1 )
> +        {
> +            /* entry was lazily copied from host -- retry */
> +            __put_gfn(hostp2m, gfn);
> +            return 1;
> +        }
> +
> +        mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 0, NULL);
> +    }
> +    else
> +        p2m = hostp2m;
> +
>      /* Check access permissions first, then handle faults */
>      if ( mfn_x(mfn) != INVALID_MFN )
>      {
> @@ -2893,6 +2912,20 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned
> long gla,
>
>          if ( violation )
>          {
> +            /* Should #VE be emulated for this fault? */
> +            if ( p2m_is_altp2m(p2m) && !cpu_has_vmx_virt_exceptions )
> +            {
> +                unsigned int sve;
> +
> +                p2m->get_entry_full(p2m, gfn, &p2mt, &p2ma, 0, NULL,
> &sve);
> +
> +                if ( !sve && ahvm_vcpu_emulate_ve(v) )
>

This line generates the following compile-time error: "hvm.c:2923:51:
error: ‘v’ undeclared (first use in this function)". Did you mean to pass
curr to ahvm_vcpu_emulate_ve instead of v?
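
i.e., presumably:

    if ( !sve && ahvm_vcpu_emulate_ve(curr) )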

I would recommend doing a compile-test on each patch of the series to catch
small things like this. Travis-ci has been working really great for me to
automate that process (https://github.com/tklengyel/xen/compare/travis) ;)

-- 

Tamas K Lengyel
Senior Security Researcher
7921 Jones Branch Drive
McLean VA 22102
Email  tlengyel@novetta.com


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-23 18:15   ` Lengyel, Tamas
@ 2015-06-23 18:52     ` Ed White
  2015-06-23 19:35       ` Lengyel, Tamas
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-23 18:52 UTC (permalink / raw)
  To: Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/23/2015 11:15 AM, Lengyel, Tamas wrote:
> On Mon, Jun 22, 2015 at 2:56 PM, Ed White <edmund.h.white@intel.com> wrote:
> 
>> Add the remaining routines required to support enabling the alternate
>> p2m functionality.
>>
>> Signed-off-by: Ed White <edmund.h.white@intel.com>
>> ---
>>  xen/arch/x86/hvm/hvm.c              |  60 +++++-
>>  xen/arch/x86/mm/hap/Makefile        |   1 +
>>  xen/arch/x86/mm/hap/altp2m_hap.c    | 103 +++++++++
>>  xen/arch/x86/mm/p2m-ept.c           |   3 +
>>  xen/arch/x86/mm/p2m.c               | 405
>> ++++++++++++++++++++++++++++++++++++
>>  xen/include/asm-x86/hvm/altp2mhvm.h |   4 +
>>  xen/include/asm-x86/p2m.h           |  33 +++
>>  7 files changed, 601 insertions(+), 8 deletions(-)
>>  create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c
>>
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index d75c12d..b758ee1 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -2786,10 +2786,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>> unsigned long gla,
>>      p2m_access_t p2ma;
>>      mfn_t mfn;
>>      struct vcpu *v = current;
>> -    struct p2m_domain *p2m;
>> +    struct p2m_domain *p2m, *hostp2m;
>>      int rc, fall_through = 0, paged = 0;
>>      int sharing_enomem = 0;
>>      vm_event_request_t *req_ptr = NULL;
>> +    int altp2m_active = 0;
>>
>>      /* On Nested Virtualization, walk the guest page table.
>>       * If this succeeds, all is fine.
>> @@ -2845,15 +2846,33 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>> unsigned long gla,
>>      {
>>          if ( !handle_mmio_with_translation(gla, gpa >> PAGE_SHIFT, npfec)
>> )
>>              hvm_inject_hw_exception(TRAP_gp_fault, 0);
>> -        rc = 1;
>> -        goto out;
>> +        return 1;
>>      }
>>
>> -    p2m = p2m_get_hostp2m(v->domain);
>> -    mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma,
>> +    altp2m_active = altp2mhvm_active(v->domain);
>> +
>> +    /* Take a lock on the host p2m speculatively, to avoid potential
>> +     * locking order problems later and to handle unshare etc.
>> +     */
>> +    hostp2m = p2m_get_hostp2m(v->domain);
>> +    mfn = get_gfn_type_access(hostp2m, gfn, &p2mt, &p2ma,
>>                                P2M_ALLOC | (npfec.write_access ?
>> P2M_UNSHARE : 0),
>>                                NULL);
>>
>> +    if ( altp2m_active )
>> +    {
>> +        if ( altp2mhvm_hap_nested_page_fault(v, gpa, gla, npfec, &p2m) ==
>> 1 )
>> +        {
>> +            /* entry was lazily copied from host -- retry */
>> +            __put_gfn(hostp2m, gfn);
>> +            return 1;
>> +        }
>> +
>> +        mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 0, NULL);
>> +    }
>> +    else
>> +        p2m = hostp2m;
>> +
>>      /* Check access permissions first, then handle faults */
>>      if ( mfn_x(mfn) != INVALID_MFN )
>>      {
>> @@ -2893,6 +2912,20 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned
>> long gla,
>>
>>          if ( violation )
>>          {
>> +            /* Should #VE be emulated for this fault? */
>> +            if ( p2m_is_altp2m(p2m) && !cpu_has_vmx_virt_exceptions )
>> +            {
>> +                unsigned int sve;
>> +
>> +                p2m->get_entry_full(p2m, gfn, &p2mt, &p2ma, 0, NULL,
>> &sve);
>> +
>> +                if ( !sve && ahvm_vcpu_emulate_ve(v) )
>>
> 
> This line generates the following compile-time error: "hvm.c:2923:51:
> error: ‘v’ undeclared (first use in this function)". Did you mean to pass
> curr to ahvm_vcpu_emulate_ve instead of v?
> 
> I would recommend doing a compile-test on each patch of the series to catch
> small things like this. Travis-ci has been working really great for me to
> automate that process (https://github.com/tklengyel/xen/compare/travis) ;)
> 

I don't know why you are seeing that error; you can clearly see that v is
defined and initialised at the start of the containing function, and has
been used earlier.

I always compile test every patch individually.

Ed



* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-23 18:52     ` Ed White
@ 2015-06-23 19:35       ` Lengyel, Tamas
  0 siblings, 0 replies; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-23 19:35 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf


On Tue, Jun 23, 2015 at 2:52 PM, Ed White <edmund.h.white@intel.com> wrote:

> On 06/23/2015 11:15 AM, Lengyel, Tamas wrote:
> > On Mon, Jun 22, 2015 at 2:56 PM, Ed White <edmund.h.white@intel.com>
> wrote:
> >
> >> Add the remaining routines required to support enabling the alternate
> >> p2m functionality.
> >>
> >> Signed-off-by: Ed White <edmund.h.white@intel.com>
> >> ---
> >>  xen/arch/x86/hvm/hvm.c              |  60 +++++-
> >>  xen/arch/x86/mm/hap/Makefile        |   1 +
> >>  xen/arch/x86/mm/hap/altp2m_hap.c    | 103 +++++++++
> >>  xen/arch/x86/mm/p2m-ept.c           |   3 +
> >>  xen/arch/x86/mm/p2m.c               | 405
> >> ++++++++++++++++++++++++++++++++++++
> >>  xen/include/asm-x86/hvm/altp2mhvm.h |   4 +
> >>  xen/include/asm-x86/p2m.h           |  33 +++
> >>  7 files changed, 601 insertions(+), 8 deletions(-)
> >>  create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c
> >>
> >> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> >> index d75c12d..b758ee1 100644
> >> --- a/xen/arch/x86/hvm/hvm.c
> >> +++ b/xen/arch/x86/hvm/hvm.c
> >> @@ -2786,10 +2786,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
> >> unsigned long gla,
> >>      p2m_access_t p2ma;
> >>      mfn_t mfn;
> >>      struct vcpu *v = current;
> >> -    struct p2m_domain *p2m;
> >> +    struct p2m_domain *p2m, *hostp2m;
> >>      int rc, fall_through = 0, paged = 0;
> >>      int sharing_enomem = 0;
> >>      vm_event_request_t *req_ptr = NULL;
> >> +    int altp2m_active = 0;
> >>
> >>      /* On Nested Virtualization, walk the guest page table.
> >>       * If this succeeds, all is fine.
> >> @@ -2845,15 +2846,33 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
> >> unsigned long gla,
> >>      {
> >>          if ( !handle_mmio_with_translation(gla, gpa >> PAGE_SHIFT,
> npfec)
> >> )
> >>              hvm_inject_hw_exception(TRAP_gp_fault, 0);
> >> -        rc = 1;
> >> -        goto out;
> >> +        return 1;
> >>      }
> >>
> >> -    p2m = p2m_get_hostp2m(v->domain);
> >> -    mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma,
> >> +    altp2m_active = altp2mhvm_active(v->domain);
> >> +
> >> +    /* Take a lock on the host p2m speculatively, to avoid potential
> >> +     * locking order problems later and to handle unshare etc.
> >> +     */
> >> +    hostp2m = p2m_get_hostp2m(v->domain);
> >> +    mfn = get_gfn_type_access(hostp2m, gfn, &p2mt, &p2ma,
> >>                                P2M_ALLOC | (npfec.write_access ?
> >> P2M_UNSHARE : 0),
> >>                                NULL);
> >>
> >> +    if ( altp2m_active )
> >> +    {
> >> +        if ( altp2mhvm_hap_nested_page_fault(v, gpa, gla, npfec, &p2m)
> ==
> >> 1 )
> >> +        {
> >> +            /* entry was lazily copied from host -- retry */
> >> +            __put_gfn(hostp2m, gfn);
> >> +            return 1;
> >> +        }
> >> +
> >> +        mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 0, NULL);
> >> +    }
> >> +    else
> >> +        p2m = hostp2m;
> >> +
> >>      /* Check access permissions first, then handle faults */
> >>      if ( mfn_x(mfn) != INVALID_MFN )
> >>      {
> >> @@ -2893,6 +2912,20 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
> unsigned
> >> long gla,
> >>
> >>          if ( violation )
> >>          {
> >> +            /* Should #VE be emulated for this fault? */
> >> +            if ( p2m_is_altp2m(p2m) && !cpu_has_vmx_virt_exceptions )
> >> +            {
> >> +                unsigned int sve;
> >> +
> >> +                p2m->get_entry_full(p2m, gfn, &p2mt, &p2ma, 0, NULL,
> >> &sve);
> >> +
> >> +                if ( !sve && ahvm_vcpu_emulate_ve(v) )
> >>
> >
> > This line generates the following compile-time error: "hvm.c:2923:51:
> > error: 'v' undeclared (first use in this function)". Did you mean to pass
> > curr to ahvm_vcpu_emulate_ve instead of v?
> >
> > I would recommend doing a compile-test on each patch of the series to
> catch
> > small things like this. Travis-ci has been working really great for me to
> > automate that process (https://github.com/tklengyel/xen/compare/travis)
> ;)
> >
>
> I don't know why you are seeing that error; you can clearly see that v is
> defined and initialised at the start of the containing function, and has
> been used earlier.
>
> I always compile test every patch individually.
>
> Ed
>

I was applying the series to the latest master branch. Things have apparently
changed enough in the meantime to cause breakage =) Sorry for the noise.



-- 

Tamas K Lengyel
Senior Security Researcher
7921 Jones Branch Drive
McLean VA 22102
Email  tlengyel@novetta.com


* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-22 18:56 [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (11 preceding siblings ...)
  2015-06-22 18:56 ` [PATCH v2 12/12] x86/altp2m: XSM hooks for altp2m HVM ops Ed White
@ 2015-06-23 21:27 ` Lengyel, Tamas
  2015-06-23 22:25   ` Ed White
  2015-06-24  5:39   ` Razvan Cojocaru
  2015-06-24 14:10 ` Andrew Cooper
  13 siblings, 2 replies; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-23 21:27 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf


> Testability is still a potential issue. We have offered to make our
> internal
> Windows test binaries available for intra-domain testing. Tamas has
> been working on toolstack support for cross-domain testing with a slightly
> earlier patch series, and we hope he will submit that support.
>

Hi Ed,
the toolstack support for external monitoring seems to be functioning now.
I can post it separately but IMHO it would make most sense to just append
it to the series (if you plan to submit it again), or wait till your side
gets merged. My branch can be found at
https://github.com/tklengyel/xen/tree/altp2m_mine.

I've extended xen-access to exercise this new feature taking into account
some of the current limitations. Using the altp2m_write|exec options we
create a duplicate view of the default hostp2m, and instead of relaxing the
mem_access permissions when we encounter a violation, we swap the view on
the violating vCPU while also enabling MTF singlestepping. When the
singlestep event fires, we use the response to that event to swap the view
back to the restricted altp2m view.
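
In rough outline, the event-response logic is (a sketch; the flag and
field names follow my branch and may still change before merging):

    vm_event_request_t req;   /* populated from the event ring */
    vm_event_response_t rsp;  /* to be put back on the ring */

    switch ( req.reason )
    {
    case VM_EVENT_REASON_MEM_ACCESS:
        /* Violation in the restricted view: switch the vCPU to the
         * unrestricted default view and enable MTF single-stepping. */
        rsp.flags |= VM_EVENT_FLAG_ALTERNATE_P2M | VM_EVENT_FLAG_TOGGLE_SINGLESTEP;
        rsp.altp2m_idx = 0;
        break;
    case VM_EVENT_REASON_SINGLESTEP:
        /* The faulting instruction has retired: swap back to the
         * restricted altp2m view and stop single-stepping. */
        rsp.flags |= VM_EVENT_FLAG_ALTERNATE_P2M | VM_EVENT_FLAG_TOGGLE_SINGLESTEP;
        rsp.altp2m_idx = view_id;
        break;
    }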

# ./xen-access 6 altp2m_write
xenaccess init
max_gpfn = ff000
starting altp2m_write 6
altp2m view created with id 1
Setting altp2m mem_access permissions.. done! Permissions set on 260171
pages.
Got event from Xen
Got event from Xen
PAGE ACCESS: rw- for GFN 272e (offset 000b98) gla 000000008272eb98 (valid:
y; fault in gpt: n; fault with gla: y) (vcpu 0, altp2m view 1)
    Switching back to hostp2m default view!
Got event from Xen
Singlestep: rip=0000000082a1a634, vcpu 0
    Switching altp2m to view 1!
Got event from Xen
PAGE ACCESS: rw- for GFN 272e (offset 000b8c) gla 000000008272eb8c (valid:
y; fault in gpt: n; fault with gla: y) (vcpu 0, altp2m view 1)
    Switching back to hostp2m default view!

Some of the more exotic features, such as the gfn remapping, are left as
future work for now. We definitely have plans to utilize it in the near
future though; it is exposed via libxc, but no tool-side test exercises it
at the moment.
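
For the record, the remap op is reachable in my branch via a wrapper
along these lines (name and signature may still change before merging):

    rc = xc_altp2m_change_gfn(xch, domid, view_id, old_gfn, new_gfn);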

Cheers!

-- 

Tamas K Lengyel
Senior Security Researcher
7921 Jones Branch Drive
McLean VA 22102
Email  tlengyel@novetta.com


* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-23 21:27 ` [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m Lengyel, Tamas
@ 2015-06-23 22:25   ` Ed White
  2015-06-24  5:39   ` Razvan Cojocaru
  1 sibling, 0 replies; 116+ messages in thread
From: Ed White @ 2015-06-23 22:25 UTC (permalink / raw)
  To: Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/23/2015 02:27 PM, Lengyel, Tamas wrote:
>> Testability is still a potential issue. We have offered to make our
>> internal
>> Windows test binaries available for intra-domain testing. Tamas has
>> been working on toolstack support for cross-domain testing with a slightly
>> earlier patch series, and we hope he will submit that support.
>>
> 
> Hi Ed,
> the toolstack support for external monitoring seems to be functioning now.
> I can post it separately but IMHO it would make most sense to just append
> it to the series (if you plan to submit it again), or wait till your side
> gets merged. My branch can be found at
> https://github.com/tklengyel/xen/tree/altp2m_mine.
> 
> I've extended xen-access to exercise this new feature taking into account
> some of the current limitations. Using the altp2m_write|exec options we
> create a duplicate view of the default hostp2m, and instead of relaxing the
> mem_access permissions when we encounter a violation, we swap the view on
> the violating vCPU while also enabling MTF singlestepping. When the
> singlestep event fires, we use the response to that event to swap the view
> back to the restricted altp2m view.
> 
> # ./xen-access 6 altp2m_write
> xenaccess init
> max_gpfn = ff000
> starting altp2m_write 6
> altp2m view created with id 1
> Setting altp2m mem_access permissions.. done! Permissions set on 260171
> pages.
> Got event from Xen
> Got event from Xen
> PAGE ACCESS: rw- for GFN 272e (offset 000b98) gla 000000008272eb98 (valid:
> y; fault in gpt: n; fault with gla: y) (vcpu 0, altp2m view 1)
>     Switching back to hostp2m default view!
> Got event from Xen
> Singlestep: rip=0000000082a1a634, vcpu 0
>     Switching altp2m to view 1!
> Got event from Xen
> PAGE ACCESS: rw- for GFN 272e (offset 000b8c) gla 000000008272eb8c (valid:
> y; fault in gpt: n; fault with gla: y) (vcpu 0, altp2m view 1)
>     Switching back to hostp2m default view!
> 
> Some of the more exotic features, such as the gfn remapping, are left as
> future work for now. We definitely have plans to utilize it in the near
> future though; it is exposed via libxc, but no tool-side test exercises it
> at the moment.
> 
> Cheers!
> 

That's great news! Thanks for doing this. I agree that it makes sense to
wait and see how the patch series progresses before deciding how best to
submit your changes. I'm pleased but slightly surprised that you didn't
find any show-stopping bugs, since not one line of the cross-domain code
had ever been tested.

Incidentally, the algorithm you've implemented is very similar to what our
Windows tests are currently using.

Ed


* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-23 21:27 ` [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m Lengyel, Tamas
  2015-06-23 22:25   ` Ed White
@ 2015-06-24  5:39   ` Razvan Cojocaru
  2015-06-24 13:32     ` Lengyel, Tamas
  1 sibling, 1 reply; 116+ messages in thread
From: Razvan Cojocaru @ 2015-06-24  5:39 UTC (permalink / raw)
  To: Lengyel, Tamas, Ed White
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/24/2015 12:27 AM, Lengyel, Tamas wrote:
> I've extended xen-access to exercise this new feature taking into
> account some of the current limitations. Using the altp2m_write|exec
> options we create a duplicate view of the default hostp2m, and instead
> of relaxing the mem_access permissions when we encounter a violation, we
> swap the view on the violating vCPU while also enabling MTF
> singlestepping. When the singlestep event fires, we use the response to
> that event to swap the view back to the restricted altp2m view.

That's certainly very interesting. I wonder what the benefits are in
this case over emulating the fault-causing instruction (other than
obviously not going through the emulator)? The altp2m method would
certainly be slower, since you need more round-trips from userspace to
the hypervisor (the EPT vm_event handling + the singlestep event),
whereas with emulation you just reply to the original vm_event.


Regards,
Razvan


* Re: [PATCH v2 01/12] VMX: VMFUNC and #VE definitions and detection.
  2015-06-22 18:56 ` [PATCH v2 01/12] VMX: VMFUNC and #VE definitions and detection Ed White
@ 2015-06-24  8:45   ` Andrew Cooper
  0 siblings, 0 replies; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24  8:45 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 22/06/15 19:56, Ed White wrote:
> Currently, neither is enabled globally but may be enabled on a per-VCPU
> basis by the altp2m code.
>
> Remove the check for EPTE bit 63 == zero in ept_split_super_page(), as
> that bit is now hardware-defined.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v2 02/12] VMX: implement suppress #VE.
  2015-06-22 18:56 ` [PATCH v2 02/12] VMX: implement suppress #VE Ed White
@ 2015-06-24  9:35   ` Andrew Cooper
  2015-06-29 14:20   ` George Dunlap
  1 sibling, 0 replies; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24  9:35 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Jan Beulich,
	tlengyel, Daniel De Graaf

On 22/06/15 19:56, Ed White wrote:
> In preparation for selectively enabling #VE in a later patch, set
> suppress #VE on all EPTE's.
>
> Suppress #VE should always be the default condition for two reasons:
> it is generally not safe to deliver #VE into a guest unless that guest
> has been modified to receive it; and even then for most EPT violations only
> the hypervisor is able to handle the violation.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/mm/p2m-ept.c | 25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index a6c9adf..5de3387 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -41,7 +41,7 @@
>  #define is_epte_superpage(ept_entry)    ((ept_entry)->sp)
>  static inline bool_t is_epte_valid(ept_entry_t *e)
>  {
> -    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
> +    return ((e->epte & ~(1ul << 63)) != 0 && e->sa_p2mt != p2m_invalid);

It might be nice to leave a comment explaining that epte.suppress_ve is
not considered as part of validity.  This avoids a rather opaque mask
against a magic number.
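
e.g. something like:

    static inline bool_t is_epte_valid(ept_entry_t *e)
    {
        /* suppress_ve (bit 63) is set even in otherwise-empty entries,
         * so it is deliberately ignored when checking validity. */
        return ((e->epte & ~(1ul << 63)) != 0 && e->sa_p2mt != p2m_invalid);
    }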

Otherwise, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

>  }
>  
>  /* returns : 0 for success, -errno otherwise */
> @@ -219,6 +219,8 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
>  static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
>  {
>      struct page_info *pg;
> +    ept_entry_t *table;
> +    unsigned int i;
>  
>      pg = p2m_alloc_ptp(p2m, 0);
>      if ( pg == NULL )
> @@ -232,6 +234,15 @@ static int ept_set_middle_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry)
>      /* Manually set A bit to avoid overhead of MMU having to write it later. */
>      ept_entry->a = 1;
>  
> +    ept_entry->suppress_ve = 1;
> +
> +    table = __map_domain_page(pg);
> +
> +    for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
> +        table[i].suppress_ve = 1;
> +
> +    unmap_domain_page(table);
> +
>      return 1;
>  }
>  
> @@ -281,6 +292,7 @@ static int ept_split_super_page(struct p2m_domain *p2m, ept_entry_t *ept_entry,
>          epte->sp = (level > 1);
>          epte->mfn += i * trunk;
>          epte->snp = (iommu_enabled && iommu_snoop);
> +        epte->suppress_ve = 1;
>  
>          ept_p2m_type_to_flags(p2m, epte, epte->sa_p2mt, epte->access);
>  
> @@ -790,6 +802,8 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>          ept_p2m_type_to_flags(p2m, &new_entry, p2mt, p2ma);
>      }
>  
> +    new_entry.suppress_ve = 1;
> +
>      rc = atomic_write_ept_entry(ept_entry, new_entry, target);
>      if ( unlikely(rc) )
>          old_entry.epte = 0;
> @@ -1111,6 +1125,8 @@ static void ept_flush_pml_buffers(struct p2m_domain *p2m)
>  int ept_p2m_init(struct p2m_domain *p2m)
>  {
>      struct ept_data *ept = &p2m->ept;
> +    ept_entry_t *table;
> +    unsigned int i;
>  
>      p2m->set_entry = ept_set_entry;
>      p2m->get_entry = ept_get_entry;
> @@ -1134,6 +1150,13 @@ int ept_p2m_init(struct p2m_domain *p2m)
>          p2m->flush_hardware_cached_dirty = ept_flush_pml_buffers;
>      }
>  
> +    table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
> +
> +    for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
> +        table[i].suppress_ve = 1;
> +
> +    unmap_domain_page(table);
> +
>      if ( !zalloc_cpumask_var(&ept->synced_mask) )
>          return -ENOMEM;
>  


* Re: [PATCH v2 03/12] x86/HVM: Hardware alternate p2m support detection.
  2015-06-22 18:56 ` [PATCH v2 03/12] x86/HVM: Hardware alternate p2m support detection Ed White
@ 2015-06-24  9:44   ` Andrew Cooper
  2015-06-24 10:07     ` Jan Beulich
  0 siblings, 1 reply; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24  9:44 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 22/06/15 19:56, Ed White wrote:
> As implemented here, only supported on platforms with VMX HAP.
>
> By default this functionality is force-disabled, it can be enabled
> by specifying altp2m=1 on the Xen command line.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  docs/misc/xen-command-line.markdown |  7 +++++++
>  xen/arch/x86/hvm/hvm.c              | 12 ++++++++++++
>  xen/arch/x86/hvm/vmx/vmx.c          |  1 +
>  xen/include/asm-x86/hvm/hvm.h       |  6 ++++++
>  4 files changed, 26 insertions(+)
>
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index aa684c0..3391c66 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -139,6 +139,13 @@ mode during S3 resume.
>  > Default: `true`
>  
>  Permit Xen to use superpages when performing memory management.
> + 
> +### altp2m (Intel)
> +> `= <boolean>`
> +
> +> Default: `false`
> +
> +Permit multiple copies of host p2m.
>  
>  ### apic
>  > `= bigsmp | default`
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 65baa1b..af68d44 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -94,6 +94,10 @@ bool_t opt_hvm_fep;
>  boolean_param("hvm_fep", opt_hvm_fep);
>  #endif
>  
> +/* Xen command-line option to enable altp2m */
> +static bool_t __initdata opt_altp2m_enabled = 0;
> +boolean_param("altp2m", opt_altp2m_enabled);
> +
>  static int cpu_callback(
>      struct notifier_block *nfb, unsigned long action, void *hcpu)
>  {
> @@ -160,6 +164,9 @@ static int __init hvm_enable(void)
>      if ( !fns->pvh_supported )
>          printk(XENLOG_INFO "HVM: PVH mode not supported on this platform\n");
>  
> +    if ( !opt_altp2m_enabled )
> +        hvm_funcs.altp2m_supported = 0;
> +
>      /*
>       * Allow direct access to the PC debug ports 0x80 and 0xed (they are
>       * often used for I/O delays, but the vmexits simply slow things down).
> @@ -6474,6 +6481,11 @@ enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v)
>      return hvm_funcs.nhvm_intr_blocked(v);
>  }
>  
> +bool_t hvm_altp2m_supported()
> +{
> +    return hvm_funcs.altp2m_supported;
> +}

I would put this as a static inline in hvm.h, as opposed to forcing a
call into a different translation unit to retrieve a global boolean.
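
i.e., as a sketch:

    static inline bool_t hvm_altp2m_supported(void)
    {
        return hvm_funcs.altp2m_supported;
    }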

Otherwise, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v2 04/12] x86/altp2m: basic data structures and support routines.
  2015-06-22 18:56 ` [PATCH v2 04/12] x86/altp2m: basic data structures and support routines Ed White
@ 2015-06-24 10:06   ` Andrew Cooper
  2015-06-24 10:23     ` Jan Beulich
  2015-06-24 17:20     ` Ed White
  2015-06-24 10:29   ` Andrew Cooper
  2015-06-24 14:44   ` Jan Beulich
  2 siblings, 2 replies; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24 10:06 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 22/06/15 19:56, Ed White wrote:
> Add the basic data structures needed to support alternate p2m's and
> the functions to initialise them and tear them down.
>
> Although Intel hardware can handle 512 EPTP's per hardware thread
> concurrently, only 10 per domain are supported in this patch for
> performance reasons.
>
> The iterator in hap_enable() does need to handle 512, so that is now
> uint16_t.
>
> This change also splits the p2m lock into one lock type for altp2m's
> and another type for all other p2m's. The purpose of this is to place
> the altp2m list lock between the types, so the list lock can be
> acquired whilst holding the host p2m lock.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/hvm/Makefile           |   2 +
>  xen/arch/x86/hvm/altp2mhvm.c        |  82 ++++++++++++++++++++++++++++
>  xen/arch/x86/hvm/hvm.c              |  21 ++++++++
>  xen/arch/x86/mm/hap/hap.c           |  31 ++++++++++-
>  xen/arch/x86/mm/mm-locks.h          |  33 +++++++++++-
>  xen/arch/x86/mm/p2m.c               | 103 ++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/domain.h        |  10 ++++
>  xen/include/asm-x86/hvm/altp2mhvm.h |  38 +++++++++++++
>  xen/include/asm-x86/hvm/hvm.h       |  17 ++++++
>  xen/include/asm-x86/hvm/vcpu.h      |   9 ++++
>  xen/include/asm-x86/p2m.h           |  30 ++++++++++-
>  11 files changed, 372 insertions(+), 4 deletions(-)
>  create mode 100644 xen/arch/x86/hvm/altp2mhvm.c
>  create mode 100644 xen/include/asm-x86/hvm/altp2mhvm.h
>
> diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
> index 69af47f..da4475d 100644
> --- a/xen/arch/x86/hvm/Makefile
> +++ b/xen/arch/x86/hvm/Makefile
> @@ -1,6 +1,7 @@
>  subdir-y += svm
>  subdir-y += vmx
>  
> +obj-y += altp2mhvm.o

This file is already in a directory named hvm.  I would name it simply
altp2m.c

Similarly, altp2mhvm in function names seems redundant, where altp2m
would suffice.  We will never be in a position of offering altp2m to PV
domains.

>  obj-y += asid.o
>  obj-y += emulate.o
>  obj-y += event.o
> @@ -24,3 +25,4 @@ obj-y += vmsi.o
>  obj-y += vpic.o
>  obj-y += vpt.o
>  obj-y += vpmu.o
> +

Spurious whitespace change.

> diff --git a/xen/arch/x86/hvm/altp2mhvm.c b/xen/arch/x86/hvm/altp2mhvm.c
> new file mode 100644
> index 0000000..802fe5b
> --- /dev/null
> +++ b/xen/arch/x86/hvm/altp2mhvm.c
> @@ -0,0 +1,82 @@
> +/*
> + * Alternate p2m HVM
> + * Copyright (c) 2014, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + */
> +
> +#include <asm/hvm/support.h>
> +#include <asm/hvm/hvm.h>
> +#include <asm/p2m.h>
> +#include <asm/hvm/altp2mhvm.h>
> +
> +void
> +altp2mhvm_vcpu_reset(struct vcpu *v)
> +{
> +    struct altp2mvcpu *av = &vcpu_altp2mhvm(v);
> +
> +    av->p2midx = INVALID_ALTP2M;
> +    av->veinfo_gfn = 0;
> +
> +    if ( hvm_funcs.ahvm_vcpu_reset )
> +        hvm_funcs.ahvm_vcpu_reset(v);
> +}
> +
> +int
> +altp2mhvm_vcpu_initialise(struct vcpu *v)
> +{
> +    int rc = -EOPNOTSUPP;
> +
> +    if ( v != current )
> +        vcpu_pause(v);
> +
> +    if ( !hvm_funcs.ahvm_vcpu_initialise ||
> +         (hvm_funcs.ahvm_vcpu_initialise(v) == 0) )
> +    {
> +        rc = 0;
> +        altp2mhvm_vcpu_reset(v);
> +        vcpu_altp2mhvm(v).p2midx = 0;
> +        atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
> +
> +        ahvm_vcpu_update_eptp(v);
> +    }
> +
> +    if ( v != current )
> +        vcpu_unpause(v);
> +
> +    return rc;
> +}
> +
> +void
> +altp2mhvm_vcpu_destroy(struct vcpu *v)
> +{
> +    struct p2m_domain *p2m;
> +
> +    if ( v != current )
> +        vcpu_pause(v);
> +
> +    if ( hvm_funcs.ahvm_vcpu_destroy )
> +        hvm_funcs.ahvm_vcpu_destroy(v);
> +
> +    if ( (p2m = p2m_get_altp2m(v)) )
> +        atomic_dec(&p2m->active_vcpus);
> +
> +    altp2mhvm_vcpu_reset(v);
> +
> +    ahvm_vcpu_update_eptp(v);
> +    ahvm_vcpu_update_vmfunc_ve(v);
> +
> +    if ( v != current )
> +        vcpu_unpause(v);
> +}

Please put an Emacs local variables block at the bottom of the file.

/*
 * Local variables:
 * mode: C
 * c-file-style: "BSD"
 * c-basic-offset: 4
 * tab-width: 4
 * indent-tabs-mode: nil
 * End:
 */

> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index af68d44..d75c12d 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -58,6 +58,7 @@
>  #include <asm/hvm/cacheattr.h>
>  #include <asm/hvm/trace.h>
>  #include <asm/hvm/nestedhvm.h>
> +#include <asm/hvm/altp2mhvm.h>
>  #include <asm/hvm/event.h>
>  #include <asm/mtrr.h>
>  #include <asm/apic.h>
> @@ -2373,6 +2374,7 @@ void hvm_vcpu_destroy(struct vcpu *v)
>  {
>      hvm_all_ioreq_servers_remove_vcpu(v->domain, v);
>  
> +    altp2mhvm_vcpu_destroy(v);
>      nestedhvm_vcpu_destroy(v);
>  
>      free_compat_arg_xlat(v);
> @@ -6486,6 +6488,25 @@ bool_t hvm_altp2m_supported()
>      return hvm_funcs.altp2m_supported;
>  }
>  
> +void ahvm_vcpu_update_eptp(struct vcpu *v)
> +{
> +    if (hvm_funcs.ahvm_vcpu_update_eptp)
> +        hvm_funcs.ahvm_vcpu_update_eptp(v);
> +}
> +
> +void ahvm_vcpu_update_vmfunc_ve(struct vcpu *v)
> +{
> +    if (hvm_funcs.ahvm_vcpu_update_vmfunc_ve)
> +        hvm_funcs.ahvm_vcpu_update_vmfunc_ve(v);
> +}
> +
> +bool_t ahvm_vcpu_emulate_ve(struct vcpu *v)
> +{
> +    if (hvm_funcs.ahvm_vcpu_emulate_ve)
> +        return hvm_funcs.ahvm_vcpu_emulate_ve(v);
> +    return 0;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index d0d3f1e..202aa42 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -459,7 +459,7 @@ void hap_domain_init(struct domain *d)
>  int hap_enable(struct domain *d, u32 mode)
>  {
>      unsigned int old_pages;
> -    uint8_t i;
> +    uint16_t i;
>      int rv = 0;
>  
>      domain_pause(d);
> @@ -498,6 +498,24 @@ int hap_enable(struct domain *d, u32 mode)
>             goto out;
>      }
>  
> +    /* Init alternate p2m data */
> +    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )

Please use alloc_domheap_page() and map_domain_page_global() so the
allocation is accounted against the domain.

> +    {
> +        rv = -ENOMEM;
> +        goto out;
> +    }
> +
> +    for (i = 0; i < MAX_EPTP; i++)
> +        d->arch.altp2m_eptp[i] = ~0ul;
> +
> +    for (i = 0; i < MAX_ALTP2M; i++) {
> +        rv = p2m_alloc_table(d->arch.altp2m_p2m[i]);
> +        if ( rv != 0 )
> +           goto out;
> +    }
> +
> +    d->arch.altp2m_active = 0;
> +
>      /* Now let other users see the new mode */
>      d->arch.paging.mode = mode | PG_HAP_enable;
>  
> @@ -510,6 +528,17 @@ void hap_final_teardown(struct domain *d)
>  {
>      uint8_t i;
>  
> +    d->arch.altp2m_active = 0;
> +
> +    if ( d->arch.altp2m_eptp ) {
> +        free_xenheap_page(d->arch.altp2m_eptp);
> +        d->arch.altp2m_eptp = NULL;
> +    }
> +
> +    for (i = 0; i < MAX_ALTP2M; i++) {
> +        p2m_teardown(d->arch.altp2m_p2m[i]);
> +    }
> +
>      /* Destroy nestedp2m's first */
>      for (i = 0; i < MAX_NESTEDP2M; i++) {
>          p2m_teardown(d->arch.nested_p2m[i]);
> diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
> index b4f035e..954b345 100644
> --- a/xen/arch/x86/mm/mm-locks.h
> +++ b/xen/arch/x86/mm/mm-locks.h
> @@ -217,7 +217,7 @@ declare_mm_lock(nestedp2m)
>  #define nestedp2m_lock(d)   mm_lock(nestedp2m, &(d)->arch.nested_p2m_lock)
>  #define nestedp2m_unlock(d) mm_unlock(&(d)->arch.nested_p2m_lock)
>  
> -/* P2M lock (per-p2m-table)
> +/* P2M lock (per-non-alt-p2m-table)
>   *
>   * This protects all queries and updates to the p2m table.
>   * Queries may be made under the read lock but all modifications
> @@ -228,7 +228,36 @@ declare_mm_lock(nestedp2m)
>   */
>  
>  declare_mm_rwlock(p2m);

Normally, we expect the declarations to be in the nesting order, and
so far, this is the case.  Please leave a comment here explaining why the
p2m lock declaration is now removed from the rest of its implementation.

> -#define p2m_lock(p)           mm_write_lock(p2m, &(p)->lock);
> +
> +/* Alternate P2M list lock (per-domain)
> + *
> + * A per-domain lock that protects the list of alternate p2m's.
> + * Any operation that walks the list needs to acquire this lock.
> + * Additionally, before destroying an alternate p2m all VCPU's
> + * in the target domain must be paused.  */
> +
> +declare_mm_lock(altp2mlist)
> +#define altp2m_lock(d)   mm_lock(altp2mlist, &(d)->arch.altp2m_lock)
> +#define altp2m_unlock(d) mm_unlock(&(d)->arch.altp2m_lock)
> +
> +/* P2M lock (per-altp2m-table)
> + *
> + * This protects all queries and updates to the p2m table.
> + * Queries may be made under the read lock but all modifications
> + * need the main (write) lock.
> + *
> + * The write lock is recursive as it is common for a code path to look
> + * up a gfn and later mutate it.
> + */
> +
> +declare_mm_rwlock(altp2m);
> +#define p2m_lock(p)                         \
> +{                                           \
> +    if ( p2m_is_altp2m(p) )                 \
> +        mm_write_lock(altp2m, &(p)->lock);  \
> +    else                                    \
> +        mm_write_lock(p2m, &(p)->lock);     \
> +}
>  #define p2m_unlock(p)         mm_write_unlock(&(p)->lock);
>  #define gfn_lock(p,g,o)       p2m_lock(p)
>  #define gfn_unlock(p,g,o)     p2m_unlock(p)
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 1fd1194..87b4b75 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -35,6 +35,7 @@
>  #include <asm/hvm/vmx/vmx.h> /* ept_p2m_init() */
>  #include <asm/mem_sharing.h>
>  #include <asm/hvm/nestedhvm.h>
> +#include <asm/hvm/altp2mhvm.h>
>  #include <asm/hvm/svm/amd-iommu-proto.h>
>  #include <xsm/xsm.h>
>  
> @@ -183,6 +184,45 @@ static void p2m_teardown_nestedp2m(struct domain *d)
>      }
>  }
>  
> +static void p2m_teardown_altp2m(struct domain *d);

You can avoid this forward declaration by moving the implementation of
p2m_teardown_altp2m() up here.

> +
> +static int p2m_init_altp2m(struct domain *d)
> +{
> +    uint8_t i;
> +    struct p2m_domain *p2m;
> +
> +    mm_lock_init(&d->arch.altp2m_lock);
> +    for (i = 0; i < MAX_ALTP2M; i++)
> +    {
> +        d->arch.altp2m_p2m[i] = p2m = p2m_init_one(d);
> +        if ( p2m == NULL )
> +        {
> +            p2m_teardown_altp2m(d);
> +            return -ENOMEM;
> +        }
> +        p2m->p2m_class = p2m_alternate;
> +        p2m->access_required = 1;
> +        _atomic_set(&p2m->active_vcpus, 0);
> +    }
> +
> +    return 0;
> +}
> +
> +static void p2m_teardown_altp2m(struct domain *d)
> +{
> +    uint8_t i;
> +    struct p2m_domain *p2m;
> +
> +    for (i = 0; i < MAX_ALTP2M; i++)
> +    {
> +        if ( !d->arch.altp2m_p2m[i] )
> +            continue;
> +        p2m = d->arch.altp2m_p2m[i];
> +        p2m_free_one(p2m);
> +        d->arch.altp2m_p2m[i] = NULL;
> +    }
> +}
> +
>  int p2m_init(struct domain *d)
>  {
>      int rc;
> @@ -196,7 +236,14 @@ int p2m_init(struct domain *d)
>       * (p2m_init runs too early for HVM_PARAM_* options) */
>      rc = p2m_init_nestedp2m(d);
>      if ( rc )
> +    {
>          p2m_teardown_hostp2m(d);
> +        return rc;
> +    }
> +
> +    rc = p2m_init_altp2m(d);
> +    if ( rc )
> +        p2m_teardown_altp2m(d);
>  
>      return rc;
>  }
> @@ -1920,6 +1967,62 @@ int unmap_mmio_regions(struct domain *d,
>      return err;
>  }
>  
> +bool_t p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp, unsigned long *idx)

Wouldn't it be better to return the index directly, using INVALID_ALTP2M as
a sentinel, rather than returning idx via pointer?
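
i.e. something like (sketch):

    /* Return the altp2m index matching eptp, or INVALID_ALTP2M if none. */
    unsigned int p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp);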

~Andrew


* Re: [PATCH v2 03/12] x86/HVM: Hardware alternate p2m support detection.
  2015-06-24  9:44   ` Andrew Cooper
@ 2015-06-24 10:07     ` Jan Beulich
  0 siblings, 0 replies; 116+ messages in thread
From: Jan Beulich @ 2015-06-24 10:07 UTC (permalink / raw)
  To: Andrew Cooper, Ed White, xen-devel
  Cc: RaviSahita, Wei Liu, Tim Deegan, Ian Jackson, tlengyel, Daniel De Graaf

>>> On 24.06.15 at 11:44, <andrew.cooper3@citrix.com> wrote:
> On 22/06/15 19:56, Ed White wrote:
>> @@ -6474,6 +6481,11 @@ enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v)
>>      return hvm_funcs.nhvm_intr_blocked(v);
>>  }
>>  
>> +bool_t hvm_altp2m_supported()
>> +{
>> +    return hvm_funcs.altp2m_supported;
>> +}
> 
> I would put this as a static inline in hvm.h, as opposed to forcing a
> call into a different translation unit to retrieve a global boolean.

+1

Jan


* Re: [PATCH v2 04/12] x86/altp2m: basic data structures and support routines.
  2015-06-24 10:06   ` Andrew Cooper
@ 2015-06-24 10:23     ` Jan Beulich
  2015-06-24 17:20     ` Ed White
  1 sibling, 0 replies; 116+ messages in thread
From: Jan Beulich @ 2015-06-24 10:23 UTC (permalink / raw)
  To: Andrew Cooper, Ed White
  Cc: RaviSahita, Wei Liu, Tim Deegan, Ian Jackson, xen-devel,
	tlengyel, Daniel De Graaf

>>> On 24.06.15 at 12:06, <andrew.cooper3@citrix.com> wrote:
> On 22/06/15 19:56, Ed White wrote:
>> @@ -498,6 +498,24 @@ int hap_enable(struct domain *d, u32 mode)
>>             goto out;
>>      }
>>  
>> +    /* Init alternate p2m data */
>> +    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )
> 
> Please use alloc_domheap_page() and map_domain_page_global() so the
> allocation is accounted against the domain.

The suggestion is okay, but it would be NULL that gets passed as
domain pointer here, i.e. there wouldn't be any accounting anyway
(we'd gain a wider pool to allocate from, but now that we have
vmalloc(), that should be considered as an option here too). Or
shouldn't this memory come from the P2M pool anyway?

What I also dislike here is an EPT specific allocation in hap.c.

Jan


* Re: [PATCH v2 04/12] x86/altp2m: basic data structures and support routines.
  2015-06-22 18:56 ` [PATCH v2 04/12] x86/altp2m: basic data structures and support routines Ed White
  2015-06-24 10:06   ` Andrew Cooper
@ 2015-06-24 10:29   ` Andrew Cooper
  2015-06-24 11:14     ` Andrew Cooper
  2015-06-26 21:17     ` Ed White
  2015-06-24 14:44   ` Jan Beulich
  2 siblings, 2 replies; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24 10:29 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 22/06/15 19:56, Ed White wrote:
> diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
> index 3d8f4dc..a1529c0 100644
> --- a/xen/include/asm-x86/hvm/vcpu.h
> +++ b/xen/include/asm-x86/hvm/vcpu.h
> @@ -118,6 +118,13 @@ struct nestedvcpu {
>  
>  #define vcpu_nestedhvm(v) ((v)->arch.hvm_vcpu.nvcpu)
>  
> +struct altp2mvcpu {
> +    uint16_t    p2midx;         /* alternate p2m index */
> +    uint64_t    veinfo_gfn;     /* #VE information page guest pfn */

Please use the recently-introduced pfn_t here.  pfn is a more
appropriate term than gfn in this case.

~Andrew


* Re: [PATCH v2 04/12] x86/altp2m: basic data structures and support routines.
  2015-06-24 10:29   ` Andrew Cooper
@ 2015-06-24 11:14     ` Andrew Cooper
  2015-06-26 21:17     ` Ed White
  1 sibling, 0 replies; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24 11:14 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Jan Beulich,
	tlengyel, Daniel De Graaf

On 24/06/15 11:29, Andrew Cooper wrote:
> On 22/06/15 19:56, Ed White wrote:
>> diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
>> index 3d8f4dc..a1529c0 100644
>> --- a/xen/include/asm-x86/hvm/vcpu.h
>> +++ b/xen/include/asm-x86/hvm/vcpu.h
>> @@ -118,6 +118,13 @@ struct nestedvcpu {
>>  
>>  #define vcpu_nestedhvm(v) ((v)->arch.hvm_vcpu.nvcpu)
>>  
>> +struct altp2mvcpu {
>> +    uint16_t    p2midx;         /* alternate p2m index */
>> +    uint64_t    veinfo_gfn;     /* #VE information page guest pfn */
> Please use the recently-introduced pfn_t here.  pfn is a more
> appropriate term than gfn in this case.

And of course, I meant to say that gfn should be the term to use, since
this is a guest physical address, from its point of view.

~Andrew


* Re: [PATCH v2 05/12] VMX/altp2m: add code to support EPTP switching and #VE.
  2015-06-22 18:56 ` [PATCH v2 05/12] VMX/altp2m: add code to support EPTP switching and #VE Ed White
@ 2015-06-24 11:59   ` Andrew Cooper
  2015-06-24 17:31     ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24 11:59 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 22/06/15 19:56, Ed White wrote:
> Implement and hook up the code to enable VMX support of VMFUNC and #VE.
>
> VMFUNC leaf 0 (EPTP switching) emulation is added in a later patch.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 132 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 132 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 2d3ad63..e8d9c82 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -56,6 +56,7 @@
>  #include <asm/debugger.h>
>  #include <asm/apic.h>
>  #include <asm/hvm/nestedhvm.h>
> +#include <asm/hvm/altp2mhvm.h>
>  #include <asm/event.h>
>  #include <asm/monitor.h>
>  #include <public/arch-x86/cpuid.h>
> @@ -1763,6 +1764,100 @@ static void vmx_enable_msr_exit_interception(struct domain *d)
>                                           MSR_TYPE_W);
>  }
>  
> +static void vmx_vcpu_update_eptp(struct vcpu *v)
> +{
> +    struct domain *d = v->domain;
> +    struct p2m_domain *p2m = NULL;
> +    struct ept_data *ept;
> +
> +    if ( altp2mhvm_active(d) )
> +        p2m = p2m_get_altp2m(v);
> +    if ( !p2m )
> +        p2m = p2m_get_hostp2m(d);
> +
> +    ept = &p2m->ept;
> +    ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
> +
> +    vmx_vmcs_enter(v);
> +
> +    __vmwrite(EPT_POINTER, ept_get_eptp(ept));
> +
> +    if ( v->arch.hvm_vmx.secondary_exec_control &
> +        SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS )
> +        __vmwrite(EPTP_INDEX, vcpu_altp2mhvm(v).p2midx);
> +
> +    vmx_vmcs_exit(v);
> +}
> +
> +static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v)
> +{
> +    struct domain *d = v->domain;
> +    u32 mask = SECONDARY_EXEC_ENABLE_VM_FUNCTIONS;
> +
> +    if ( !cpu_has_vmx_vmfunc )
> +        return;
> +
> +    if ( cpu_has_vmx_virt_exceptions )
> +        mask |= SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
> +
> +    vmx_vmcs_enter(v);
> +
> +    if ( !d->is_dying && altp2mhvm_active(d) )
> +    {
> +        v->arch.hvm_vmx.secondary_exec_control |= mask;
> +        __vmwrite(VM_FUNCTION_CONTROL, VMX_VMFUNC_EPTP_SWITCHING);
> +        __vmwrite(EPTP_LIST_ADDR, virt_to_maddr(d->arch.altp2m_eptp));
> +
> +        if ( cpu_has_vmx_virt_exceptions )
> +        {
> +            p2m_type_t t;
> +            mfn_t mfn;
> +
> +            mfn = get_gfn_query_unlocked(d, vcpu_altp2mhvm(v).veinfo_gfn, &t);

get_gfn_query_unlocked() returns _mfn(INVALID_MFN) in the failure case,
which you must not blindly write back.

> +            __vmwrite(VIRT_EXCEPTION_INFO, mfn_x(mfn) << PAGE_SHIFT);

pfn_to_paddr() please, rather than opencoding it.  (This is a helper
which needs cleaning up, name-wise).
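
Combining the two points above, roughly (an untested sketch; the error
path is illustrative only):

    mfn = get_gfn_query_unlocked(d, vcpu_altp2mhvm(v).veinfo_gfn, &t);

    if ( mfn_x(mfn) != INVALID_MFN )
        __vmwrite(VIRT_EXCEPTION_INFO, pfn_to_paddr(mfn_x(mfn)));
    else
        /* Don't let an INVALID_MFN-derived address reach the VMCS. */
        domain_crash(d);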

> +        }
> +    }
> +    else
> +        v->arch.hvm_vmx.secondary_exec_control &= ~mask;
> +
> +    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
> +        v->arch.hvm_vmx.secondary_exec_control);
> +
> +    vmx_vmcs_exit(v);
> +}
> +
> +static bool_t vmx_vcpu_emulate_ve(struct vcpu *v)
> +{
> +    bool_t rc = 0;
> +    ve_info_t *veinfo = vcpu_altp2mhvm(v).veinfo_gfn ?
> +        hvm_map_guest_frame_rw(vcpu_altp2mhvm(v).veinfo_gfn, 0) : NULL;

gfn 0 is a valid (albeit unlikely) location to request the veinfo page. 
Use GFN_INVALID as the sentinel.
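
For instance (sketch only, assuming such a GFN_INVALID sentinel is
introduced and veinfo_gfn is initialised to it):

    ve_info_t *veinfo = vcpu_altp2mhvm(v).veinfo_gfn != GFN_INVALID ?
        hvm_map_guest_frame_rw(vcpu_altp2mhvm(v).veinfo_gfn, 0) : NULL;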

> +
> +    if ( !veinfo )
> +        return 0;
> +
> +    if ( veinfo->semaphore != 0 )
> +        goto out;

The semantics of this semaphore are not clearly spelled out in the
manual.  The only information I can locate concerning this field is in a
note in 25.5.6.1, which says:

"Delivery of virtualization exceptions writes the value FFFFFFFFH to
offset 4 in the virtualization-exception informa-
tion area (see Section 25.5.6.2). Thus, once a virtualization exception
occurs, another can occur only if software
clears this field."

I presume this should be taken to mean "software writes 0 to this
field", but some clarification would be nice.

> +
> +    rc = 1;
> +
> +    veinfo->exit_reason = EXIT_REASON_EPT_VIOLATION;
> +    veinfo->semaphore = ~0l;

semaphore is declared as an unsigned field, so should use ~0u.

> +    veinfo->eptp_index = vcpu_altp2mhvm(v).p2midx;
> +
> +    vmx_vmcs_enter(v);
> +    __vmread(EXIT_QUALIFICATION, &veinfo->exit_qualification);
> +    __vmread(GUEST_LINEAR_ADDRESS, &veinfo->gla);
> +    __vmread(GUEST_PHYSICAL_ADDRESS, &veinfo->gpa);
> +    vmx_vmcs_exit(v);
> +
> +    hvm_inject_hw_exception(TRAP_virtualisation,
> +                            HVM_DELIVER_NO_ERROR_CODE);
> +
> +out:
> +    hvm_unmap_guest_frame(veinfo, 0);
> +    return rc;
> +}
> +
>  static struct hvm_function_table __initdata vmx_function_table = {
>      .name                 = "VMX",
>      .cpu_up_prepare       = vmx_cpu_up_prepare,
> @@ -1822,6 +1917,9 @@ static struct hvm_function_table __initdata vmx_function_table = {
>      .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
>      .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
>      .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
> +    .ahvm_vcpu_update_eptp = vmx_vcpu_update_eptp,
> +    .ahvm_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
> +    .ahvm_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
>  };
>  
>  const struct hvm_function_table * __init start_vmx(void)
> @@ -2754,6 +2852,40 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>  
>      /* Now enable interrupts so it's safe to take locks. */
>      local_irq_enable();
> + 
> +    /*
> +     * If the guest has the ability to switch EPTP without an exit,
> +     * figure out whether it has done so and update the altp2m data.
> +     */
> +    if ( altp2mhvm_active(v->domain) &&
> +        (v->arch.hvm_vmx.secondary_exec_control &
> +        SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
> +    {
> +        unsigned long idx;
> +
> +        if ( v->arch.hvm_vmx.secondary_exec_control &
> +            SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS )
> +            __vmread(EPTP_INDEX, &idx);
> +        else
> +        {
> +            unsigned long eptp;
> +
> +            __vmread(EPT_POINTER, &eptp);
> +
> +            if ( !p2m_find_altp2m_by_eptp(v->domain, eptp, &idx) )
> +            {
> +                gdprintk(XENLOG_ERR, "EPTP not found in alternate p2m list\n");
> +                domain_crash(v->domain);
> +            }
> +        }
> +

Is it worth checking that idx is plausible at this point, before blindly
writing it back into the vcpu structure?
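
Something along these lines, perhaps (untested sketch, reusing
MAX_ALTP2M and the altp2m_eptp table from earlier in the series):

    if ( idx >= MAX_ALTP2M ||
         v->domain->arch.altp2m_eptp[idx] == ~0ul )
    {
        gdprintk(XENLOG_ERR, "bogus EPTP index %#lx\n", idx);
        domain_crash(v->domain);
    }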

~Andrew

> +        if ( (uint16_t)idx != vcpu_altp2mhvm(v).p2midx )
> +        {
> +            atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
> +            vcpu_altp2mhvm(v).p2midx = (uint16_t)idx;
> +            atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
> +        }
> +    }
>  
>      /* XXX: This looks ugly, but we need a mechanism to ensure
>       * any pending vmresume has really happened

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 06/12] VMX: add VMFUNC leaf 0 (EPTP switching) to emulator.
  2015-06-22 18:56 ` [PATCH v2 06/12] VMX: add VMFUNC leaf 0 (EPTP switching) to emulator Ed White
@ 2015-06-24 12:47   ` Andrew Cooper
  2015-06-24 20:29     ` Ed White
  2015-06-24 14:26   ` Jan Beulich
  1 sibling, 1 reply; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24 12:47 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 22/06/15 19:56, Ed White wrote:
> From: Ravi Sahita <ravi.sahita@intel.com>
>
> Signed-off-by: Ravi Sahita <ravi.sahita@intel.com>
> ---
>  xen/arch/x86/hvm/emulate.c             | 13 +++++++++++--
>  xen/arch/x86/hvm/vmx/vmx.c             | 30 ++++++++++++++++++++++++++++++
>  xen/arch/x86/x86_emulate/x86_emulate.c |  8 ++++++++
>  xen/arch/x86/x86_emulate/x86_emulate.h |  4 ++++
>  xen/include/asm-x86/hvm/hvm.h          |  2 ++
>  5 files changed, 55 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index ac9c9d6..e38a2fe 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -1356,6 +1356,13 @@ static int hvmemul_invlpg(
>      return rc;
>  }
>  
> +static int hvmemul_vmfunc(
> +    struct x86_emulate_ctxt *ctxt)
> +{
> +    hvm_funcs.ahvm_vcpu_emulate_vmfunc(ctxt->regs);
> +    return X86EMUL_OKAY;
> +}

ahvm_vcpu_emulate_vmfunc() should return an X86EMUL code.
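
i.e. something like this (sketch only, assuming the hook itself is
changed to return an X86EMUL_* value rather than a bool_t):

    static int hvmemul_vmfunc(
        struct x86_emulate_ctxt *ctxt)
    {
        /* Propagate the hook's X86EMUL_* return value directly. */
        return hvm_funcs.ahvm_vcpu_emulate_vmfunc(ctxt->regs);
    }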

> +
>  static const struct x86_emulate_ops hvm_emulate_ops = {
>      .read          = hvmemul_read,
>      .insn_fetch    = hvmemul_insn_fetch,
> @@ -1379,7 +1386,8 @@ static const struct x86_emulate_ops hvm_emulate_ops = {
>      .inject_sw_interrupt = hvmemul_inject_sw_interrupt,
>      .get_fpu       = hvmemul_get_fpu,
>      .put_fpu       = hvmemul_put_fpu,
> -    .invlpg        = hvmemul_invlpg
> +    .invlpg        = hvmemul_invlpg,
> +    .vmfunc        = hvmemul_vmfunc,
>  };
>  
>  static const struct x86_emulate_ops hvm_emulate_ops_no_write = {
> @@ -1405,7 +1413,8 @@ static const struct x86_emulate_ops hvm_emulate_ops_no_write = {
>      .inject_sw_interrupt = hvmemul_inject_sw_interrupt,
>      .get_fpu       = hvmemul_get_fpu,
>      .put_fpu       = hvmemul_put_fpu,
> -    .invlpg        = hvmemul_invlpg
> +    .invlpg        = hvmemul_invlpg,
> +    .vmfunc        = hvmemul_vmfunc,
>  };
>  
>  static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index e8d9c82..ad9e9e4 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -82,6 +82,7 @@ static void vmx_fpu_dirty_intercept(void);
>  static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content);
>  static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
>  static void vmx_invlpg_intercept(unsigned long vaddr);
> +static int vmx_vmfunc_intercept(struct cpu_user_regs* regs);

s/* / */

>  
>  uint8_t __read_mostly posted_intr_vector;
>  
> @@ -1826,6 +1827,20 @@ static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v)
>      vmx_vmcs_exit(v);
>  }
>  
> +static bool_t vmx_vcpu_emulate_vmfunc(struct cpu_user_regs *regs)
> +{
> +    bool_t rc = 0;
> +
> +    if ( !cpu_has_vmx_vmfunc && altp2mhvm_active(current->domain) &&
> +         regs->eax == 0 &&
> +         p2m_switch_vcpu_altp2m_by_id(current, (uint16_t)regs->ecx) )

Please latch current at the top of the function.  It is inefficient to
access it repeatedly like this.
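
e.g. (the same logic as the hunk above, with current latched; untested
sketch):

    static bool_t vmx_vcpu_emulate_vmfunc(struct cpu_user_regs *regs)
    {
        struct vcpu *curr = current;
        bool_t rc = 0;

        if ( !cpu_has_vmx_vmfunc && altp2mhvm_active(curr->domain) &&
             regs->eax == 0 &&
             p2m_switch_vcpu_altp2m_by_id(curr, (uint16_t)regs->ecx) )
        {
            regs->eip += 3;    /* VMFUNC is a 3-byte instruction. */
            rc = 1;
        }

        return rc;
    }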

> +    {
> +        regs->eip += 3;
> +        rc = 1;
> +    }
> +    return rc;
> +}
> +
>  static bool_t vmx_vcpu_emulate_ve(struct vcpu *v)
>  {
>      bool_t rc = 0;
> @@ -1894,6 +1909,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
>      .msr_read_intercept   = vmx_msr_read_intercept,
>      .msr_write_intercept  = vmx_msr_write_intercept,
>      .invlpg_intercept     = vmx_invlpg_intercept,
> +    .vmfunc_intercept     = vmx_vmfunc_intercept,
>      .handle_cd            = vmx_handle_cd,
>      .set_info_guest       = vmx_set_info_guest,
>      .set_rdtsc_exiting    = vmx_set_rdtsc_exiting,
> @@ -1920,6 +1936,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
>      .ahvm_vcpu_update_eptp = vmx_vcpu_update_eptp,
>      .ahvm_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
>      .ahvm_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
> +    .ahvm_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
>  };
>  
>  const struct hvm_function_table * __init start_vmx(void)
> @@ -2091,6 +2108,13 @@ static void vmx_invlpg_intercept(unsigned long vaddr)
>          vpid_sync_vcpu_gva(curr, vaddr);
>  }
>  
> +static int vmx_vmfunc_intercept(struct cpu_user_regs *regs)
> +{
> +    gdprintk(XENLOG_ERR, "Failed guest VMFUNC execution\n");
> +    domain_crash(current->domain);
> +    return X86EMUL_OKAY;
> +}
> +
>  static int vmx_cr_access(unsigned long exit_qualification)
>  {
>      struct vcpu *curr = current;
> @@ -2675,6 +2699,7 @@ void vmx_enter_realmode(struct cpu_user_regs *regs)
>      regs->eflags |= (X86_EFLAGS_VM | X86_EFLAGS_IOPL);
>  }
>  
> +

Spurious whitespace change.

>  static void vmx_vmexit_ud_intercept(struct cpu_user_regs *regs)
>  {
>      struct hvm_emulate_ctxt ctxt;
> @@ -3239,6 +3264,11 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>              update_guest_eip();
>          break;
>  
> +    case EXIT_REASON_VMFUNC:
> +        if ( vmx_vmfunc_intercept(regs) == X86EMUL_OKAY )

This is currently an unconditional failure, and I don't see subsequent
patches which alter vmx_vmfunc_intercept().  Shouldn't
vmx_vmfunc_intercept() switch on eax and optionally call
p2m_switch_vcpu_altp2m_by_id()?
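
Following that suggestion, a rough sketch of the shape I would expect
(hypothetical; the failure handling mirrors the hunk above):

    static int vmx_vmfunc_intercept(struct cpu_user_regs *regs)
    {
        struct vcpu *curr = current;

        if ( altp2mhvm_active(curr->domain) && regs->eax == 0 &&
             p2m_switch_vcpu_altp2m_by_id(curr, (uint16_t)regs->ecx) )
            return X86EMUL_OKAY;

        gdprintk(XENLOG_ERR, "Failed guest VMFUNC execution\n");
        domain_crash(curr->domain);
        return X86EMUL_EXCEPTION;
    }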

> +            update_guest_eip();
> +        break;
> +
>      case EXIT_REASON_MWAIT_INSTRUCTION:
>      case EXIT_REASON_MONITOR_INSTRUCTION:
>      case EXIT_REASON_GETSEC:
> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
> index c017c69..4ae95ce 100644
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -3837,6 +3837,14 @@ x86_emulate(
>              goto rdtsc;
>          }
>  
> +        if (modrm == 0xd4) /* vmfunc */

Style (spaces inside brackets).

~Andrew

> +        {
> +            fail_if(ops->vmfunc == NULL);
> +            if ( (rc = ops->vmfunc(ctxt) != 0) )
> +                goto done;
> +            break;
> +        }
> +
>          switch ( modrm_reg & 7 )
>          {
>          case 0: /* sgdt */
> diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
> index 064b8f4..a4d4ec8 100644
> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
> @@ -397,6 +397,10 @@ struct x86_emulate_ops
>          enum x86_segment seg,
>          unsigned long offset,
>          struct x86_emulate_ctxt *ctxt);
> +
> +    /* vmfunc: Emulate VMFUNC via given set of EAX ECX inputs */
> +    int (*vmfunc)(
> +        struct x86_emulate_ctxt *ctxt);
>  };
>  
>  struct cpu_user_regs;
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index 9cd674f..2e33b4f 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -167,6 +167,7 @@ struct hvm_function_table {
>      int (*msr_read_intercept)(unsigned int msr, uint64_t *msr_content);
>      int (*msr_write_intercept)(unsigned int msr, uint64_t msr_content);
>      void (*invlpg_intercept)(unsigned long vaddr);
> +    int (*vmfunc_intercept)(struct cpu_user_regs *regs);
>      void (*handle_cd)(struct vcpu *v, unsigned long value);
>      void (*set_info_guest)(struct vcpu *v);
>      void (*set_rdtsc_exiting)(struct vcpu *v, bool_t);
> @@ -218,6 +219,7 @@ struct hvm_function_table {
>      void (*ahvm_vcpu_update_eptp)(struct vcpu *v);
>      void (*ahvm_vcpu_update_vmfunc_ve)(struct vcpu *v);
>      bool_t (*ahvm_vcpu_emulate_ve)(struct vcpu *v);
> +    bool_t (*ahvm_vcpu_emulate_vmfunc)(struct cpu_user_regs *regs);
>  };
>  
>  extern struct hvm_function_table hvm_funcs;

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-06-22 18:56 ` [PATCH v2 07/12] x86/altp2m: add control of suppress_ve Ed White
@ 2015-06-24 13:05   ` Andrew Cooper
  2015-06-24 14:38   ` Jan Beulich
  1 sibling, 0 replies; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24 13:05 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 22/06/15 19:56, Ed White wrote:
> The existing ept_set_entry() and ept_get_entry() routines are extended
> to optionally set/get suppress_ve and renamed. New ept_set_entry() and
> ept_get_entry() routines are provided as wrappers, where set preserves
> suppress_ve for an existing entry and sets it for a new entry.
>
> Additional function pointers are added to p2m_domain to allow direct
> access to the extended routines.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/mm/p2m-ept.c | 40 +++++++++++++++++++++++++++++++++-------
>  xen/include/asm-x86/p2m.h | 13 +++++++++++++
>  2 files changed, 46 insertions(+), 7 deletions(-)
>
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index 5de3387..e7719cf 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -649,14 +649,15 @@ bool_t ept_handle_misconfig(uint64_t gpa)
>  }
>  
>  /*
> - * ept_set_entry() computes 'need_modify_vtd_table' for itself,
> + * ept_set_entry_sve() computes 'need_modify_vtd_table' for itself,
>   * by observing whether any gfn->mfn translations are modified.
>   *
>   * Returns: 0 for success, -errno for failure
>   */
>  static int
> -ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, 
> -              unsigned int order, p2m_type_t p2mt, p2m_access_t p2ma)
> +ept_set_entry_sve(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, 
> +                  unsigned int order, p2m_type_t p2mt, p2m_access_t p2ma,
> +                  unsigned int sve)

I would be tempted to name this _ept_set_entry() rather than using a
_sve suffix.

unsigned int sve would be better as int sve using -1/0/1, as it is
logically a ternary value (copy existing, clear, set).
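
i.e. (sketch of the hunk below, reworked for an int parameter):

    if ( sve < 0 )
        /* Copy existing suppress_ve; default to set for new entries. */
        new_entry.suppress_ve = is_epte_valid(&old_entry) ?
                                    old_entry.suppress_ve : 1;
    else
        new_entry.suppress_ve = !!sve;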

>  {
>      ept_entry_t *table, *ept_entry = NULL;
>      unsigned long gfn_remainder = gfn;
> @@ -802,7 +803,11 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>          ept_p2m_type_to_flags(p2m, &new_entry, p2mt, p2ma);
>      }
>  
> -    new_entry.suppress_ve = 1;
> +    if ( sve != ~0 )
> +        new_entry.suppress_ve = !!sve;
> +    else
> +        new_entry.suppress_ve = is_epte_valid(&old_entry) ?
> +                                    old_entry.suppress_ve : 1;
>  
>      rc = atomic_write_ept_entry(ept_entry, new_entry, target);
>      if ( unlikely(rc) )
> @@ -847,10 +852,18 @@ out:
>      return rc;
>  }
>  
> +static int
> +ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, 
> +              unsigned int order, p2m_type_t p2mt, p2m_access_t p2ma)
> +{
> +    return ept_set_entry_sve(p2m, gfn, mfn, order, p2mt, p2ma, ~0);
> +}
> +
>  /* Read ept p2m entries */
> -static mfn_t ept_get_entry(struct p2m_domain *p2m,
> -                           unsigned long gfn, p2m_type_t *t, p2m_access_t* a,
> -                           p2m_query_t q, unsigned int *page_order)
> +static mfn_t ept_get_entry_sve(struct p2m_domain *p2m,
> +                               unsigned long gfn, p2m_type_t *t, p2m_access_t* a,
> +                               p2m_query_t q, unsigned int *page_order,
> +                               unsigned int *sve)

This should be a bool_t * as it is very definitely a boolean output.

~Andrew

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 08/12] x86/altp2m: alternate p2m memory events.
  2015-06-22 18:56 ` [PATCH v2 08/12] x86/altp2m: alternate p2m memory events Ed White
@ 2015-06-24 13:09   ` Andrew Cooper
  2015-06-24 16:01   ` Lengyel, Tamas
  1 sibling, 0 replies; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24 13:09 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 22/06/15 19:56, Ed White wrote:
> Add a flag to indicate that a memory event occurred in an alternate p2m
> and a field containing the p2m index. Allow the response to switch to
> a different p2m using the same flag and field.
>
> Modify p2m_access_check() to handle alternate p2m's.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-24  5:39   ` Razvan Cojocaru
@ 2015-06-24 13:32     ` Lengyel, Tamas
  2015-06-24 13:37       ` Razvan Cojocaru
  0 siblings, 1 reply; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-24 13:32 UTC (permalink / raw)
  To: Razvan Cojocaru
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Ed White,
	Xen-devel, Jan Beulich, Andrew Cooper, Daniel De Graaf


On Wed, Jun 24, 2015 at 1:39 AM, Razvan Cojocaru <rcojocaru@bitdefender.com>
wrote:

> On 06/24/2015 12:27 AM, Lengyel, Tamas wrote:
> > I've extended xen-access to exercise this new feature taking into
> > account some of the current limitations. Using the altp2m_write|exec
> > options we create a duplicate view of the default hostp2m, and instead
> > of relaxing the mem_access permissions when we encounter a violation, we
> > swap the view on the violating vCPU while also enabling MTF
> > singlestepping. When the singlestep event fires, we use the response to
> > that event to swap the view back to the restricted altp2m view.
>
> That's certainly very interesting. I wonder what the benefits are in
> this case over emulating the fault-causing instruction (other than
> obviously not going through the emulator)? The altp2m method would
> certainly be slower, since you need more round-trips from userspace to
> the hypervisor (the EPT vm_event handling + the singlestep event,
> whereas with emulation you just reply to the original vm_event).
>
>
> Regards,
> Razvan
>

Certainly, this is pretty slow right now, especially for the altp2m_exec
case. However, sometimes you simply cannot emulate. For example if you
write breakpoints into target locations, the original instruction has been
overwritten with 0xCC. If you have a duplicate of the page without the
breakpoint, this is an easy way to make the guest fetch the original
instruction. Of course, if you extend the emulation routine so that you can
provide the instruction to emulate, instead of it being fetched from guest
memory, that would be equally useful ;)

-- 
Tamas K Lengyel
Senior Security Researcher, Novetta
7921 Jones Branch Drive, McLean VA 22102
Email: tlengyel@novetta.com

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-24 13:32     ` Lengyel, Tamas
@ 2015-06-24 13:37       ` Razvan Cojocaru
  2015-06-24 16:43         ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: Razvan Cojocaru @ 2015-06-24 13:37 UTC (permalink / raw)
  To: Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Ed White,
	Xen-devel, Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/24/2015 04:32 PM, Lengyel, Tamas wrote:
> 
> 
> On Wed, Jun 24, 2015 at 1:39 AM, Razvan Cojocaru
> <rcojocaru@bitdefender.com> wrote:
> 
>     On 06/24/2015 12:27 AM, Lengyel, Tamas wrote:
>     > I've extended xen-access to exercise this new feature taking into
>     > account some of the current limitations. Using the altp2m_write|exec
>     > options we create a duplicate view of the default hostp2m, and instead
>     > of relaxing the mem_access permissions when we encounter a violation, we
>     > swap the view on the violating vCPU while also enabling MTF
>     > singlestepping. When the singlestep event fires, we use the response to
>     > that event to swap the view back to the restricted altp2m view.
> 
>     That's certainly very interesting. I wonder what the benefits are in
>     this case over emulating the fault-causing instruction (other than
>     obviously not going through the emulator)? The altp2m method would
>     certainly be slower, since you need more round-trips from userspace to
>     the hypervisor (the EPT vm_event handling + the singlestep event,
>     whereas with emulation you just reply to the original vm_event).
> 
> 
>     Regards,
>     Razvan
> 
> 
> Certainly, this is pretty slow right now, especially for the altp2m_exec
> case. However, sometimes you simply cannot emulate. For example if you
> write breakpoints into target locations, the original instruction has
> been overwritten with 0xCC. If you have a duplicate of the page without
> the breakpoint, this is an easy way to make the guest fetch the original
> instruction. Of course, if you extend the emulation routine so that you
> can provide the instruction to emulate, instead of it being fetched from
> guest memory, that would be equally useful ;)

Makes sense, thanks for the explanation! Sure, sending back the
instruction to emulate could be something to consider for the future.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-22 18:56 ` [PATCH v2 09/12] x86/altp2m: add remaining support routines Ed White
  2015-06-23 18:15   ` Lengyel, Tamas
@ 2015-06-24 13:46   ` Andrew Cooper
  2015-06-24 17:47     ` Ed White
  2015-06-24 16:15   ` Lengyel, Tamas
  2015-06-25  2:44   ` Lengyel, Tamas
  3 siblings, 1 reply; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24 13:46 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 22/06/15 19:56, Ed White wrote:
> Add the remaining routines required to support enabling the alternate
> p2m functionality.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/hvm/hvm.c              |  60 +++++-
>  xen/arch/x86/mm/hap/Makefile        |   1 +
>  xen/arch/x86/mm/hap/altp2m_hap.c    | 103 +++++++++
>  xen/arch/x86/mm/p2m-ept.c           |   3 +
>  xen/arch/x86/mm/p2m.c               | 405 ++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/altp2mhvm.h |   4 +
>  xen/include/asm-x86/p2m.h           |  33 +++
>  7 files changed, 601 insertions(+), 8 deletions(-)
>  create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index d75c12d..b758ee1 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2786,10 +2786,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>      p2m_access_t p2ma;
>      mfn_t mfn;
>      struct vcpu *v = current;
> -    struct p2m_domain *p2m;
> +    struct p2m_domain *p2m, *hostp2m;
>      int rc, fall_through = 0, paged = 0;
>      int sharing_enomem = 0;
>      vm_event_request_t *req_ptr = NULL;
> +    int altp2m_active = 0;

bool_t

>  
>      /* On Nested Virtualization, walk the guest page table.
>       * If this succeeds, all is fine.
> @@ -2845,15 +2846,33 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>      {
>          if ( !handle_mmio_with_translation(gla, gpa >> PAGE_SHIFT, npfec) )
>              hvm_inject_hw_exception(TRAP_gp_fault, 0);
> -        rc = 1;
> -        goto out;
> +        return 1;

What is the justification for skipping the normal out: processing?

>      }
>  
> -    p2m = p2m_get_hostp2m(v->domain);
> -    mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 
> +    altp2m_active = altp2mhvm_active(v->domain);
> +
> +    /* Take a lock on the host p2m speculatively, to avoid potential
> +     * locking order problems later and to handle unshare etc.
> +     */
> +    hostp2m = p2m_get_hostp2m(v->domain);
> +    mfn = get_gfn_type_access(hostp2m, gfn, &p2mt, &p2ma,
>                                P2M_ALLOC | (npfec.write_access ? P2M_UNSHARE : 0),
>                                NULL);
>  
> +    if ( altp2m_active )
> +    {
> +        if ( altp2mhvm_hap_nested_page_fault(v, gpa, gla, npfec, &p2m) == 1 )
> +        {
> +            /* entry was lazily copied from host -- retry */
> +            __put_gfn(hostp2m, gfn);
> +            return 1;

Again, please don't skip the out: processing.

> +        }
> +
> +        mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 0, NULL);
> +    }
> +    else
> +        p2m = hostp2m;
> +
>      /* Check access permissions first, then handle faults */
>      if ( mfn_x(mfn) != INVALID_MFN )
>      {
> @@ -2893,6 +2912,20 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>  
>          if ( violation )
>          {
> +            /* Should #VE be emulated for this fault? */
> +            if ( p2m_is_altp2m(p2m) && !cpu_has_vmx_virt_exceptions )
> +            {
> +                unsigned int sve;
> +
> +                p2m->get_entry_full(p2m, gfn, &p2mt, &p2ma, 0, NULL, &sve);
> +
> +                if ( !sve && ahvm_vcpu_emulate_ve(v) )
> +                {
> +                    rc = 1;
> +                    goto out_put_gfn;
> +                }
> +            }
> +
>              if ( p2m_mem_access_check(gpa, gla, npfec, &req_ptr) )
>              {
>                  fall_through = 1;
> @@ -2912,7 +2945,9 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>           (npfec.write_access &&
>            (p2m_is_discard_write(p2mt) || (p2mt == p2m_mmio_write_dm))) )
>      {
> -        put_gfn(p2m->domain, gfn);
> +        __put_gfn(p2m, gfn);
> +        if ( altp2m_active )
> +            __put_gfn(hostp2m, gfn);
>  
>          rc = 0;
>          if ( unlikely(is_pvh_vcpu(v)) )
> @@ -2941,6 +2976,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>      /* Spurious fault? PoD and log-dirty also take this path. */
>      if ( p2m_is_ram(p2mt) )
>      {
> +        rc = 1;
>          /*
>           * Page log dirty is always done with order 0. If this mfn resides in
>           * a large page, we do not change other pages type within that large
> @@ -2949,9 +2985,15 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>          if ( npfec.write_access )
>          {
>              paging_mark_dirty(v->domain, mfn_x(mfn));
> +            /* If p2m is really an altp2m, unlock here to avoid lock ordering
> +             * violation when the change below is propagated from host p2m */
> +            if ( altp2m_active )
> +                __put_gfn(p2m, gfn);
>              p2m_change_type_one(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
> +            __put_gfn(altp2m_active ? hostp2m : p2m, gfn);
> +
> +            goto out;
>          }
> -        rc = 1;
>          goto out_put_gfn;
>      }
>  
> @@ -2961,7 +3003,9 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>      rc = fall_through;
>  
>  out_put_gfn:
> -    put_gfn(p2m->domain, gfn);
> +    __put_gfn(p2m, gfn);
> +    if ( altp2m_active )
> +        __put_gfn(hostp2m, gfn);
>  out:
>      /* All of these are delayed until we exit, since we might 
>       * sleep on event ring wait queues, and we must not hold
> diff --git a/xen/arch/x86/mm/hap/Makefile b/xen/arch/x86/mm/hap/Makefile
> index 68f2bb5..216cd90 100644
> --- a/xen/arch/x86/mm/hap/Makefile
> +++ b/xen/arch/x86/mm/hap/Makefile
> @@ -4,6 +4,7 @@ obj-y += guest_walk_3level.o
>  obj-$(x86_64) += guest_walk_4level.o
>  obj-y += nested_hap.o
>  obj-y += nested_ept.o
> +obj-y += altp2m_hap.o
>  
>  guest_walk_%level.o: guest_walk.c Makefile
>  	$(CC) $(CFLAGS) -DGUEST_PAGING_LEVELS=$* -c $< -o $@
> diff --git a/xen/arch/x86/mm/hap/altp2m_hap.c b/xen/arch/x86/mm/hap/altp2m_hap.c
> new file mode 100644
> index 0000000..899b636
> --- /dev/null
> +++ b/xen/arch/x86/mm/hap/altp2m_hap.c
> @@ -0,0 +1,103 @@
> +/******************************************************************************
> + * arch/x86/mm/hap/altp2m_hap.c
> + *
> + * Copyright (c) 2014 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> + */
> +
> +#include <asm/domain.h>
> +#include <asm/page.h>
> +#include <asm/paging.h>
> +#include <asm/p2m.h>
> +#include <asm/hap.h>
> +#include <asm/hvm/altp2mhvm.h>
> +
> +#include "private.h"
> +
> +/* Override macros from asm/page.h to make them work with mfn_t */
> +#undef mfn_valid
> +#define mfn_valid(_mfn) __mfn_valid(mfn_x(_mfn))
> +#undef page_to_mfn
> +#define page_to_mfn(_pg) _mfn(__page_to_mfn(_pg))
> +
> +/*
> + * If the fault is for a not present entry:
> + *     if the entry in the host p2m has a valid mfn, copy it and retry
> + *     else indicate that outer handler should handle fault
> + *
> + * If the fault is for a present entry:
> + *     indicate that outer handler should handle fault
> + */
> +
> +int
> +altp2mhvm_hap_nested_page_fault(struct vcpu *v, paddr_t gpa,
> +                                unsigned long gla, struct npfec npfec,
> +                                struct p2m_domain **ap2m)
> +{
> +    struct p2m_domain *hp2m = p2m_get_hostp2m(v->domain);
> +    p2m_type_t p2mt;
> +    p2m_access_t p2ma;
> +    unsigned int page_order;
> +    unsigned long gfn, mask;

gfn_t gfn please, and probably better to initialise with paddr_to_pfn()
rather than to opencode "gpa >> PAGE_SHIFT" repeatedly below.
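
e.g. (sketch; assumes the gfn_t accessors _gfn()/gfn_x() are available
here):

    gfn_t gfn = _gfn(paddr_to_pfn(gpa));

    mfn = get_gfn_type_access(*ap2m, gfn_x(gfn), &p2mt, &p2ma,
                              0, &page_order);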

> +    mfn_t mfn;
> +    int rv;
> +
> +    *ap2m = p2m_get_altp2m(v);
> +
> +    mfn = get_gfn_type_access(*ap2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
> +                              0, &page_order);
> +    __put_gfn(*ap2m, gpa >> PAGE_SHIFT);
> +
> +    if ( mfn_x(mfn) != INVALID_MFN )
> +        return 0;
> +
> +    mfn = get_gfn_type_access(hp2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
> +                              0, &page_order);
> +    put_gfn(hp2m->domain, gpa >> PAGE_SHIFT);
> +
> +    if ( mfn_x(mfn) == INVALID_MFN )
> +        return 0;
> +
> +    p2m_lock(*ap2m);
> +
> +    /* If this is a superpage mapping, round down both frame numbers
> +     * to the start of the superpage. */
> +    mask = ~((1UL << page_order) - 1);
> +    gfn = (gpa >> PAGE_SHIFT) & mask;
> +    mfn = _mfn(mfn_x(mfn) & mask);
> +
> +    rv = p2m_set_entry(*ap2m, gfn, mfn, page_order, p2mt, p2ma);
> +    p2m_unlock(*ap2m);
> +
> +    if ( rv ) {

Style (brace on new line)

> +        gdprintk(XENLOG_ERR,
> +	    "failed to set entry for %#"PRIx64" -> %#"PRIx64"\n",

It would be useful to know more information, (which altp2m), and to
prefer gfn over gpa to avoid mixing unqualified linear and frame numbers.

> +	    gpa, mfn_x(mfn));
> +        domain_crash(hp2m->domain);
> +    }
> +
> +    return 1;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index e7719cf..4411b36 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -849,6 +849,9 @@ out:
>      if ( is_epte_present(&old_entry) )
>          ept_free_entry(p2m, &old_entry, target);
>  
> +    if ( rc == 0 && p2m_is_hostp2m(p2m) )
> +        p2m_altp2m_propagate_change(d, gfn, mfn, order, p2mt, p2ma);
> +
>      return rc;
>  }
>  
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 389360a..588acd5 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -2041,6 +2041,411 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx)
>      return rc;
>  }
>  
> +void p2m_flush_altp2m(struct domain *d)
> +{
> +    uint16_t i;
> +
> +    altp2m_lock(d);
> +
> +    for ( i = 0; i < MAX_ALTP2M; i++ )
> +    {
> +        p2m_flush_table(d->arch.altp2m_p2m[i]);
> +        /* Uninit and reinit ept to force TLB shootdown */
> +        ept_p2m_uninit(d->arch.altp2m_p2m[i]);
> +        ept_p2m_init(d->arch.altp2m_p2m[i]);
> +        d->arch.altp2m_eptp[i] = ~0ul;

INVALID_MFN (elsewhere through the series as well).

> +    }
> +
> +    altp2m_unlock(d);
> +}
> +
> +bool_t p2m_init_altp2m_by_id(struct domain *d, uint16_t idx)
> +{
> +    struct p2m_domain *p2m;
> +    struct ept_data *ept;
> +    bool_t rc = 0;
> +
> +    if ( idx > MAX_ALTP2M )
> +        return rc;
> +
> +    altp2m_lock(d);
> +
> +    if ( d->arch.altp2m_eptp[idx] == ~0ul )
> +    {
> +        p2m = d->arch.altp2m_p2m[idx];
> +        p2m->min_remapped_pfn = ~0ul;
> +        p2m->max_remapped_pfn = ~0ul;
> +        ept = &p2m->ept;
> +        ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
> +        d->arch.altp2m_eptp[idx] = ept_get_eptp(ept);
> +        rc = 1;
> +    }
> +
> +    altp2m_unlock(d);
> +    return rc;
> +}
> +
> +bool_t p2m_init_next_altp2m(struct domain *d, uint16_t *idx)
> +{
> +    struct p2m_domain *p2m;
> +    struct ept_data *ept;
> +    bool_t rc = 0;
> +    uint16_t i;
> +
> +    altp2m_lock(d);
> +
> +    for ( i = 0; i < MAX_ALTP2M; i++ )
> +    {
> +        if ( d->arch.altp2m_eptp[i] != ~0ul )
> +            continue;
> +
> +        p2m = d->arch.altp2m_p2m[i];
> +        p2m->min_remapped_pfn = ~0ul;
> +        p2m->max_remapped_pfn = ~0ul;
> +        ept = &p2m->ept;
> +        ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
> +        d->arch.altp2m_eptp[i] = ept_get_eptp(ept);
> +        *idx = i;
> +        rc = 1;

This function, and the one above, look like they could do with a common
__init_altp2m() helper to avoid duplicating the reset state for an altp2m.
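
e.g. (sketch of such a hypothetical helper; the caller would hold
altp2m_lock):

    static void __init_altp2m(struct domain *d, unsigned int idx)
    {
        struct p2m_domain *p2m = d->arch.altp2m_p2m[idx];
        struct ept_data *ept = &p2m->ept;

        p2m->min_remapped_pfn = ~0ul;
        p2m->max_remapped_pfn = ~0ul;
        ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
        d->arch.altp2m_eptp[idx] = ept_get_eptp(ept);
    }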

> +
> +        break;
> +    }
> +
> +    altp2m_unlock(d);
> +    return rc;
> +}
> +
> +bool_t p2m_destroy_altp2m_by_id(struct domain *d, uint16_t idx)
> +{
> +    struct p2m_domain *p2m;
> +    struct vcpu *curr = current;
> +    struct vcpu *v;
> +    bool_t rc = 0;
> +
> +    if ( !idx || idx > MAX_ALTP2M )
> +        return rc;
> +
> +    if ( curr->domain != d )
> +        domain_pause(d);
> +    else
> +        for_each_vcpu( d, v )
> +            if ( curr != v )
> +                vcpu_pause(v);

This looks like some hoop jumping around the assertions in
domain_pause() and vcpu_pause().

We should probably have some new helpers where the domain needs to be
paused, possibly while in context.  The current domain/vcpu_pause() are
almost always used where it is definitely not safe to pause in context,
hence the assertions.
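
e.g. a hypothetical helper of this shape (sketch only), plus a matching
unpause variant:

    static void altp2m_domain_pause_except_self(struct domain *d)
    {
        struct vcpu *curr = current, *v;

        if ( curr->domain != d )
            domain_pause(d);
        else
            for_each_vcpu ( d, v )
                if ( v != curr )
                    vcpu_pause(v);
    }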

> +
> +    altp2m_lock(d);
> +
> +    if ( d->arch.altp2m_eptp[idx] != ~0ul )
> +    {
> +        p2m = d->arch.altp2m_p2m[idx];
> +
> +        if ( !_atomic_read(p2m->active_vcpus) )
> +        {
> +            p2m_flush_table(d->arch.altp2m_p2m[idx]);
> +            /* Uninit and reinit ept to force TLB shootdown */
> +            ept_p2m_uninit(d->arch.altp2m_p2m[idx]);
> +            ept_p2m_init(d->arch.altp2m_p2m[idx]);
> +            d->arch.altp2m_eptp[idx] = ~0ul;
> +            rc = 1;
> +        }
> +    }
> +
> +    altp2m_unlock(d);
> +
> +    if ( curr->domain != d )
> +        domain_unpause(d);
> +    else
> +        for_each_vcpu( d, v )
> +            if ( curr != v )
> +                vcpu_unpause(v);
> +
> +    return rc;
> +}
> +
> +bool_t p2m_switch_domain_altp2m_by_id(struct domain *d, uint16_t idx)
> +{
> +    struct vcpu *curr = current;
> +    struct vcpu *v;
> +    bool_t rc = 0;
> +
> +    if ( idx > MAX_ALTP2M )
> +        return rc;
> +
> +    if ( curr->domain != d )
> +        domain_pause(d);
> +    else
> +        for_each_vcpu( d, v )
> +            if ( curr != v )
> +                vcpu_pause(v);
> +
> +    altp2m_lock(d);
> +
> +    if ( d->arch.altp2m_eptp[idx] != ~0ul )
> +    {
> +        for_each_vcpu( d, v )
> +            if ( idx != vcpu_altp2mhvm(v).p2midx )
> +            {
> +                atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
> +                vcpu_altp2mhvm(v).p2midx = idx;
> +                atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
> +                ahvm_vcpu_update_eptp(v);
> +            }
> +
> +        rc = 1;
> +    }
> +
> +    altp2m_unlock(d);
> +
> +    if ( curr->domain != d )
> +        domain_unpause(d);
> +    else
> +        for_each_vcpu( d, v )
> +            if ( curr != v )
> +                vcpu_unpause(v);
> +
> +    return rc;
> +}
> +
> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
> +                                 unsigned long pfn, xenmem_access_t access)

gfn_t gfn please.

> +{
> +    struct p2m_domain *hp2m, *ap2m;
> +    p2m_access_t a, _a;

{host,alt}_access? to save having two variables differing by just an
underscore.

> +    p2m_type_t t;
> +    mfn_t mfn;
> +    unsigned int page_order;
> +    bool_t rc = 0;
> +
> +    static const p2m_access_t memaccess[] = {
> +#define ACCESS(ac) [XENMEM_access_##ac] = p2m_access_##ac
> +        ACCESS(n),
> +        ACCESS(r),
> +        ACCESS(w),
> +        ACCESS(rw),
> +        ACCESS(x),
> +        ACCESS(rx),
> +        ACCESS(wx),
> +        ACCESS(rwx),
> +#undef ACCESS
> +    };
> +
> +    if ( idx > MAX_ALTP2M || d->arch.altp2m_eptp[idx] == ~0ul )
> +        return 0;
> +
> +    ap2m = d->arch.altp2m_p2m[idx];
> +
> +    switch ( access )
> +    {
> +    case 0 ... ARRAY_SIZE(memaccess) - 1:
> +        a = memaccess[access];
> +        break;
> +    case XENMEM_access_default:
> +        a = ap2m->default_access;
> +        break;
> +    default:
> +        return 0;
> +    }
> +
> +    /* If request to set default access */
> +    if ( pfn == ~0ul )
> +    {
> +        ap2m->default_access = a;
> +        return 1;
> +    }
> +
> +    hp2m = p2m_get_hostp2m(d);
> +
> +    p2m_lock(ap2m);
> +
> +    mfn = ap2m->get_entry(ap2m, pfn, &t, &_a, 0, NULL);
> +
> +    /* Check host p2m if no valid entry in alternate */
> +    if ( !mfn_valid(mfn) )
> +    {
> +        mfn = hp2m->get_entry(hp2m, pfn, &t, &_a, 0, &page_order);
> +
> +        if ( !mfn_valid(mfn) || t != p2m_ram_rw )
> +            goto out;
> +
> +        /* If this is a superpage, copy that first */
> +        if ( page_order != PAGE_ORDER_4K )
> +        {
> +            unsigned long gfn, mask;
> +            mfn_t mfn2;
> +
> +            mask = ~((1UL << page_order) - 1);
> +            gfn = pfn & mask;
> +            mfn2 = _mfn(mfn_x(mfn) & mask);
> +
> +            if ( ap2m->set_entry(ap2m, gfn, mfn2, page_order, t, _a) )
> +                goto out;
> +        }
> +    }
> +
> +    if ( !ap2m->set_entry_full(ap2m, pfn, mfn, PAGE_ORDER_4K, t, a,
> +                               (current->domain != d)) )
> +        rc = 1;
> +
> +out:
> +    p2m_unlock(ap2m);
> +    return rc;
> +}
> +
> +bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
> +                             unsigned long old_pfn, unsigned long new_pfn)

gfns all the way through.

> +{
> +    struct p2m_domain *hp2m, *ap2m;
> +    p2m_access_t a;
> +    p2m_type_t t;
> +    mfn_t mfn;
> +    unsigned int page_order;
> +    bool_t rc = 0;
> +
> +    if ( idx > MAX_ALTP2M || d->arch.altp2m_eptp[idx] == ~0ul )
> +        return 0;
> +
> +    hp2m = p2m_get_hostp2m(d);
> +    ap2m = d->arch.altp2m_p2m[idx];
> +
> +    p2m_lock(ap2m);
> +
> +    mfn = ap2m->get_entry(ap2m, old_pfn, &t, &a, 0, NULL);
> +
> +    if ( new_pfn == ~0ul )
> +    {
> +        if ( mfn_valid(mfn) )
> +            p2m_remove_page(ap2m, old_pfn, mfn_x(mfn), PAGE_ORDER_4K);
> +        rc = 1;
> +        goto out;
> +    }
> +
> +    /* Check host p2m if no valid entry in alternate */
> +    if ( !mfn_valid(mfn) )
> +    {
> +        mfn = hp2m->get_entry(hp2m, old_pfn, &t, &a, 0, &page_order);
> +
> +        if ( !mfn_valid(mfn) || t != p2m_ram_rw )
> +            goto out;
> +
> +        /* If this is a superpage, copy that first */
> +        if ( page_order != PAGE_ORDER_4K )
> +        {
> +            unsigned long gfn, mask;
> +
> +            mask = ~((1UL << page_order) - 1);
> +            gfn = old_pfn & mask;
> +            mfn = _mfn(mfn_x(mfn) & mask);
> +
> +            if ( ap2m->set_entry(ap2m, gfn, mfn, page_order, t, a) )
> +                goto out;
> +        }
> +    }
> +
> +    mfn = ap2m->get_entry(ap2m, new_pfn, &t, &a, 0, NULL);
> +
> +    if ( !mfn_valid(mfn) )
> +        mfn = hp2m->get_entry(hp2m, new_pfn, &t, &a, 0, NULL);
> +
> +    if ( !mfn_valid(mfn) || (t != p2m_ram_rw) )
> +        goto out;
> +
> +    if ( !ap2m->set_entry_full(ap2m, old_pfn, mfn, PAGE_ORDER_4K, t, a,
> +                               (current->domain != d)) )
> +    {
> +        rc = 1;
> +
> +        if ( ap2m->min_remapped_pfn == ~0ul ||
> +             new_pfn < ap2m->min_remapped_pfn )
> +            ap2m->min_remapped_pfn = new_pfn;
> +        if ( ap2m->max_remapped_pfn == ~0ul ||
> +             new_pfn > ap2m->max_remapped_pfn )
> +            ap2m->max_remapped_pfn = new_pfn;
> +    }
> +
> +out:
> +    p2m_unlock(ap2m);
> +    return rc;
> +}
> +
> +static inline void p2m_reset_altp2m(struct p2m_domain *p2m)

inline is not useful here.  The compiler will have a better idea as to
whether inlining it is a good idea or not.

> +{
> +    p2m_flush_table(p2m);
> +    /* Uninit and reinit ept to force TLB shootdown */
> +    ept_p2m_uninit(p2m);
> +    ept_p2m_init(p2m);
> +    p2m->min_remapped_pfn = ~0ul;
> +    p2m->max_remapped_pfn = ~0ul;
> +}
> +
> +void p2m_altp2m_propagate_change(struct domain *d, unsigned long gfn,

gfn_t.

> +                                 mfn_t mfn, unsigned int page_order,
> +                                 p2m_type_t p2mt, p2m_access_t p2ma)
> +{
> +    struct p2m_domain *p2m;
> +    p2m_access_t a;
> +    p2m_type_t t;
> +    mfn_t m;
> +    uint16_t i;
> +    bool_t reset_p2m;
> +    unsigned int reset_count = 0;
> +    uint16_t last_reset_idx = ~0;
> +
> +    if ( !altp2mhvm_active(d) )
> +        return;
> +
> +    altp2m_lock(d);
> +
> +    for ( i = 0; i < MAX_ALTP2M; i++ )
> +    {
> +        if ( d->arch.altp2m_eptp[i] == ~0ul )
> +            continue;
> +
> +        p2m = d->arch.altp2m_p2m[i];
> +        m = get_gfn_type_access(p2m, gfn, &t, &a, 0, NULL);
> +
> +        reset_p2m = 0;
> +
> +        /* Check for a dropped page that may impact this altp2m */
> +        if ( mfn_x(mfn) == INVALID_MFN &&
> +             gfn >= p2m->min_remapped_pfn && gfn <= p2m->max_remapped_pfn )
> +            reset_p2m = 1;
> +
> +        if ( reset_p2m )
> +        {
> +            if ( !reset_count++ )
> +            {
> +                p2m_reset_altp2m(p2m);
> +                last_reset_idx = i;
> +            }
> +            else
> +            {
> +                /* At least 2 altp2m's impacted, so reset everything */
> +                __put_gfn(p2m, gfn);
> +
> +                for ( i = 0; i < MAX_ALTP2M; i++ )
> +                {
> +                    if ( i == last_reset_idx ||
> +                         d->arch.altp2m_eptp[i] == ~0ul )
> +                        continue;
> +
> +                    p2m = d->arch.altp2m_p2m[i];
> +                    p2m_lock(p2m);
> +                    p2m_reset_altp2m(p2m);
> +                    p2m_unlock(p2m);
> +                }
> +
> +                goto out;
> +            }
> +        }
> +        else if ( mfn_x(m) != INVALID_MFN )
> +           p2m_set_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
> +
> +        __put_gfn(p2m, gfn);
> +    }
> +
> +out:
> +    altp2m_unlock(d);
> +}
> +
>  /*** Audit ***/
>  
>  #if P2M_AUDIT
> diff --git a/xen/include/asm-x86/hvm/altp2mhvm.h b/xen/include/asm-x86/hvm/altp2mhvm.h
> index a4b8e24..08ff79b 100644
> --- a/xen/include/asm-x86/hvm/altp2mhvm.h
> +++ b/xen/include/asm-x86/hvm/altp2mhvm.h
> @@ -34,5 +34,9 @@ int altp2mhvm_vcpu_initialise(struct vcpu *v);
>  void altp2mhvm_vcpu_destroy(struct vcpu *v);
>  void altp2mhvm_vcpu_reset(struct vcpu *v);
>  
> +/* Alternate p2m paging */
> +int altp2mhvm_hap_nested_page_fault(struct vcpu *v, paddr_t gpa,
> +    unsigned long gla, struct npfec npfec, struct p2m_domain **ap2m);
> +
>  #endif /* _HVM_ALTP2M_H */
>  
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index d84da33..3f17211 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -279,6 +279,11 @@ struct p2m_domain {
>      /* Highest guest frame that's ever been mapped in the p2m */
>      unsigned long max_mapped_pfn;
>  
> +    /* Alternate p2m's only: range of pfn's for which underlying
> +     * mfn may have duplicate mappings */
> +    unsigned long min_remapped_pfn;
> +    unsigned long max_remapped_pfn;

These are gfns.

~Andrew

> +
>      /* When releasing shared gfn's in a preemptible manner, recall where
>       * to resume the search */
>      unsigned long next_shared_gfn_to_relinquish;
> @@ -766,6 +771,34 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx);
>  void p2m_mem_access_altp2m_check(struct vcpu *v,
>                                   const vm_event_response_t *rsp);
>  
> +/* Flush all the alternate p2m's for a domain */
> +void p2m_flush_altp2m(struct domain *d);
> +
> +/* Make a specific alternate p2m valid */
> +bool_t p2m_init_altp2m_by_id(struct domain *d, uint16_t idx);
> +
> +/* Find an available alternate p2m and make it valid */
> +bool_t p2m_init_next_altp2m(struct domain *d, uint16_t *idx);
> +
> +/* Make a specific alternate p2m invalid */
> +bool_t p2m_destroy_altp2m_by_id(struct domain *d, uint16_t idx);
> +
> +/* Switch alternate p2m for entire domain */
> +bool_t p2m_switch_domain_altp2m_by_id(struct domain *d, uint16_t idx);
> +
> +/* Set access type for a pfn */
> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
> +                                 unsigned long pfn, xenmem_access_t access);
> +
> +/* Replace a pfn with a different pfn */
> +bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
> +                             unsigned long old_pfn, unsigned long new_pfn);
> +
> +/* Propagate a host p2m change to all alternate p2m's */
> +void p2m_altp2m_propagate_change(struct domain *d, unsigned long gfn,
> +                                 mfn_t mfn, unsigned int page_order,
> +                                 p2m_type_t p2mt, p2m_access_t p2ma);
> +
>  /*
>   * p2m type to IOMMU flags
>   */

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 10/12] x86/altp2m: define and implement alternate p2m HVMOP types.
  2015-06-22 18:56 ` [PATCH v2 10/12] x86/altp2m: define and implement alternate p2m HVMOP types Ed White
@ 2015-06-24 13:58   ` Andrew Cooper
  2015-06-24 14:53   ` Jan Beulich
  1 sibling, 0 replies; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24 13:58 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 22/06/15 19:56, Ed White wrote:
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/hvm/hvm.c          | 216 ++++++++++++++++++++++++++++++++++++++++
>  xen/include/public/hvm/hvm_op.h |  69 +++++++++++++
>  2 files changed, 285 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index b758ee1..b3e74ce 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -6424,6 +6424,222 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +    case HVMOP_altp2m_get_domain_state:
> +    {
> +        struct xen_hvm_altp2m_domain_state a;
> +        struct domain *d;
> +
> +        if ( copy_from_guest(&a, arg, 1) )
> +            return -EFAULT;
> +
> +        d = rcu_lock_domain_by_any_id(a.domid);
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        rc = -EINVAL;
> +        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() )
> +            goto param_fail9;
> +
> +        a.state = altp2mhvm_active(d);
> +        rc = copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
> +
> +    param_fail9:
> +        rcu_unlock_domain(d);
> +        break;
> +    }
> +
> +    case HVMOP_altp2m_set_domain_state:
> +    {
> +        struct xen_hvm_altp2m_domain_state a;
> +        struct domain *d;
> +        struct vcpu *v;
> +        bool_t ostate;
> +
> +        if ( copy_from_guest(&a, arg, 1) )
> +            return -EFAULT;
> +
> +        d = rcu_lock_domain_by_any_id(a.domid);
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        rc = -EINVAL;
> +        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
> +             nestedhvm_enabled(d) )
> +            goto param_fail10;
> +
> +        ostate = d->arch.altp2m_active;
> +        d->arch.altp2m_active = !!a.state;
> +
> +        /* If the alternate p2m state has changed, handle appropriately */
> +        if ( d->arch.altp2m_active != ostate )
> +        {
> +            if ( !ostate && !p2m_init_altp2m_by_id(d, 0) )
> +                    goto param_fail10;

Indentation.

> +
> +            for_each_vcpu( d, v )
> +                if (!ostate)
> +                    altp2mhvm_vcpu_initialise(v);
> +                else
> +                    altp2mhvm_vcpu_destroy(v);

Although strictly speaking this is (almost) ok by the style guidelines,
it would probably be better to have braces for the for_each_vcpu()
loop.  Also, spaces inside the brackets around !ostate.

> +
> +            if ( ostate )
> +                p2m_flush_altp2m(d);
> +        }
> +
> +        rc = 0;
> +
> +    param_fail10:
> +        rcu_unlock_domain(d);
> +        break;
> +    }
> +
> +    case HVMOP_altp2m_vcpu_enable_notify:
> +    {
> +        struct domain *curr_d = current->domain;
> +        struct vcpu *curr = current;
> +        struct xen_hvm_altp2m_vcpu_enable_notify a;
> +
> +        if ( copy_from_guest(&a, arg, 1) )
> +            return -EFAULT;
> +
> +        if ( !is_hvm_domain(curr_d) || !hvm_altp2m_supported() ||
> +             !curr_d->arch.altp2m_active || vcpu_altp2mhvm(curr).veinfo_gfn )
> +            return -EINVAL;
> +
> +        vcpu_altp2mhvm(curr).veinfo_gfn = a.pfn;
> +        ahvm_vcpu_update_vmfunc_ve(curr);

You need a gfn bounds check against the host p2m here.
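
e.g. (sketch, using the host p2m's max_mapped_pfn):

        if ( a.pfn > p2m_get_hostp2m(curr_d)->max_mapped_pfn )
            return -EINVAL;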

> +        rc = 0;
> +
> +        break;
> +    }
> +
> +    case HVMOP_altp2m_create_p2m:
> +    {
> +        struct xen_hvm_altp2m_view a;
> +        struct domain *d;
> +
> +        if ( copy_from_guest(&a, arg, 1) )
> +            return -EFAULT;
> +
> +        d = rcu_lock_domain_by_any_id(a.domid);
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        rc = -EINVAL;
> +        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
> +             !d->arch.altp2m_active )
> +            goto param_fail11;
> +
> +        if ( !p2m_init_next_altp2m(d, &a.view) )
> +            goto param_fail11;
> +
> +        rc = copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
> +
> +    param_fail11:
> +        rcu_unlock_domain(d);
> +        break;
> +    }
> +
> +    case HVMOP_altp2m_destroy_p2m:
> +    {
> +        struct xen_hvm_altp2m_view a;
> +        struct domain *d;
> +
> +        if ( copy_from_guest(&a, arg, 1) )
> +            return -EFAULT;
> +
> +        d = rcu_lock_domain_by_any_id(a.domid);
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        rc = -EINVAL;
> +        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
> +             !d->arch.altp2m_active )
> +            goto param_fail12;
> +
> +        if ( p2m_destroy_altp2m_by_id(d, a.view) )
> +            rc = 0;
> +
> +    param_fail12:
> +        rcu_unlock_domain(d);
> +        break;
> +    }
> +
> +    case HVMOP_altp2m_switch_p2m:
> +    {
> +        struct xen_hvm_altp2m_view a;
> +        struct domain *d;
> +
> +        if ( copy_from_guest(&a, arg, 1) )
> +            return -EFAULT;
> +
> +        d = rcu_lock_domain_by_any_id(a.domid);
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        rc = -EINVAL;
> +        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
> +             !d->arch.altp2m_active )
> +            goto param_fail13;
> +
> +        if ( p2m_switch_domain_altp2m_by_id(d, a.view) )
> +            rc = 0;
> +
> +    param_fail13:
> +        rcu_unlock_domain(d);
> +        break;
> +    }
> +
> +    case HVMOP_altp2m_set_mem_access:
> +    {
> +        struct xen_hvm_altp2m_set_mem_access a;
> +        struct domain *d;
> +
> +        if ( copy_from_guest(&a, arg, 1) )
> +            return -EFAULT;
> +
> +        d = rcu_lock_domain_by_any_id(a.domid);
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        rc = -EINVAL;
> +        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
> +             !d->arch.altp2m_active )
> +            goto param_fail14;
> +
> +        if ( p2m_set_altp2m_mem_access(d, a.view, a.pfn, a.hvmmem_access) )
> +            rc = 0;
> +
> +    param_fail14:
> +        rcu_unlock_domain(d);
> +        break;
> +    }
> +
> +    case HVMOP_altp2m_change_pfn:
> +    {
> +        struct xen_hvm_altp2m_change_pfn a;
> +        struct domain *d;
> +
> +        if ( copy_from_guest(&a, arg, 1) )
> +            return -EFAULT;
> +
> +        d = rcu_lock_domain_by_any_id(a.domid);
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        rc = -EINVAL;
> +        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
> +             !d->arch.altp2m_active )
> +            goto param_fail15;
> +
> +        if ( p2m_change_altp2m_pfn(d, a.view, a.old_pfn, a.new_pfn) )
> +            rc = 0;
> +
> +    param_fail15:
> +        rcu_unlock_domain(d);
> +        break;
> +    }
> +
>      default:
>      {
>          gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op);
> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index cde3571..f6abce9 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -389,6 +389,75 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_evtchn_upcall_vector_t);
>  
>  #endif /* defined(__i386__) || defined(__x86_64__) */
>  

We have an upper ABI limit of 255 HVMOPs.  As such, I would recommend
having a single HVMOP_altp2m and a subop which lives as the first
parameter in any structure.
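
Something along these lines, purely as a sketch (the op number, subop
field and union members here are invented, reusing the structures below):

    #define HVMOP_altp2m 25
    struct xen_hvm_altp2m_op {
        uint32_t cmd;       /* HVMOP_altp2m_* subop */
        domid_t  domid;
        union {
            struct xen_hvm_altp2m_domain_state       domain_state;
            struct xen_hvm_altp2m_vcpu_enable_notify enable_notify;
            struct xen_hvm_altp2m_view               view;
            struct xen_hvm_altp2m_set_mem_access     set_mem_access;
            struct xen_hvm_altp2m_change_pfn         change_pfn;
        } u;
    };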

~Andrew

> +/* Set/get the altp2m state for a domain */
> +#define HVMOP_altp2m_set_domain_state     24
> +#define HVMOP_altp2m_get_domain_state     25
> +struct xen_hvm_altp2m_domain_state {
> +    /* Domain to be updated or queried */
> +    domid_t domid;
> +    /* IN or OUT variable on/off */
> +    uint8_t state;
> +};
> +typedef struct xen_hvm_altp2m_domain_state xen_hvm_altp2m_domain_state_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_domain_state_t);
> +
> +/* Set the current VCPU to receive altp2m event notifications */
> +#define HVMOP_altp2m_vcpu_enable_notify   26
> +struct xen_hvm_altp2m_vcpu_enable_notify {
> +    /* #VE info area pfn */
> +    uint64_t pfn;
> +};
> +typedef struct xen_hvm_altp2m_vcpu_enable_notify xen_hvm_altp2m_vcpu_enable_notify_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_vcpu_enable_notify_t);
> +
> +/* Create a new view */
> +#define HVMOP_altp2m_create_p2m   27
> +/* Destroy a view */
> +#define HVMOP_altp2m_destroy_p2m  28
> +/* Switch view for an entire domain */
> +#define HVMOP_altp2m_switch_p2m   29
> +struct xen_hvm_altp2m_view {
> +    /* Domain to be updated */
> +    domid_t domid;
> +    /* IN/OUT variable */
> +    uint16_t view;
> +    /* Create view only: default access type
> +     * NOTE: currently ignored */
> +    uint16_t hvmmem_default_access; /* xenmem_access_t */
> +};
> +typedef struct xen_hvm_altp2m_view xen_hvm_altp2m_view_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_view_t);
> +
> +/* Notify that a page of memory is to have specific access types */
> +#define HVMOP_altp2m_set_mem_access 30
> +struct xen_hvm_altp2m_set_mem_access {
> +    /* Domain to be updated. */
> +    domid_t domid;
> +    /* view */
> +    uint16_t view;
> +    /* Memory type */
> +    uint16_t hvmmem_access; /* xenmem_access_t */
> +    /* pfn */
> +    uint64_t pfn;
> +};
> +typedef struct xen_hvm_altp2m_set_mem_access xen_hvm_altp2m_set_mem_access_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_set_mem_access_t);
> +
> +/* Change a p2m entry to map a different pfn */
> +#define HVMOP_altp2m_change_pfn 31
> +struct xen_hvm_altp2m_change_pfn {
> +    /* Domain to be updated. */
> +    domid_t domid;
> +    /* view */
> +    uint16_t view;
> +    /* old pfn */
> +    uint64_t old_pfn;
> +    /* new pfn, -1 means revert */
> +    uint64_t new_pfn;
> +};
> +typedef struct xen_hvm_altp2m_change_pfn xen_hvm_altp2m_change_pfn_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_change_pfn_t);
> +
>  #endif /* __XEN_PUBLIC_HVM_HVM_OP_H__ */
>  
>  /*

* Re: [PATCH v2 11/12] x86/altp2m: Add altp2mhvm HVM domain parameter.
  2015-06-22 18:56 ` [PATCH v2 11/12] x86/altp2m: Add altp2mhvm HVM domain parameter Ed White
@ 2015-06-24 14:06   ` Andrew Cooper
  2015-06-24 14:59   ` Jan Beulich
  1 sibling, 0 replies; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24 14:06 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 22/06/15 19:56, Ed White wrote:
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index b3e74ce..8453489 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -5732,6 +5732,7 @@ static int hvm_allow_set_param(struct domain *d,
>      case HVM_PARAM_VIRIDIAN:
>      case HVM_PARAM_IOREQ_SERVER_PFN:
>      case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
> +    case HVM_PARAM_ALTP2MHVM:
>          if ( value != 0 && a->value != value )
>              rc = -EEXIST;
>          break;
> @@ -5854,6 +5855,9 @@ static int hvmop_set_param(
>           */
>          if ( cpu_has_svm && !paging_mode_hap(d) && a.value )
>              rc = -EINVAL;
> +        if ( a.value &&
> +             d->arch.hvm_domain.params[HVM_PARAM_ALTP2MHVM] )
> +            rc = -EINVAL;
>          /* Set up NHVM state for any vcpus that are already up. */
>          if ( a.value &&
>               !d->arch.hvm_domain.params[HVM_PARAM_NESTEDHVM] )
> @@ -5864,6 +5868,13 @@ static int hvmop_set_param(
>              for_each_vcpu(d, v)
>                  nestedhvm_vcpu_destroy(v);
>          break;
> +    case HVM_PARAM_ALTP2MHVM:
> +        if ( a.value > 1 )
> +            rc = -EINVAL;
> +        if ( a.value &&
> +             d->arch.hvm_domain.params[HVM_PARAM_NESTEDHVM] )
> +            rc = -EINVAL;
> +        break;
>      case HVM_PARAM_BUFIOREQ_EVTCHN:
>          rc = -EINVAL;
>          break;

As altp2m is as useful for an in-guest agent as for an external one, it
might be an idea to add HVM_PARAM_ALTP2MHVM to the whitelist of
guest-readable params, unless we intend to support an "external agent
only" XSM mode and want to avoid the guest snooping on its own settings.

~Andrew

* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-22 18:56 [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m Ed White
                   ` (12 preceding siblings ...)
  2015-06-23 21:27 ` [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m Lengyel, Tamas
@ 2015-06-24 14:10 ` Andrew Cooper
  13 siblings, 0 replies; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24 14:10 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 22/06/15 19:56, Ed White wrote:
> This set of patches adds support to hvm domains for EPTP switching by creating
> multiple copies of the host p2m (currently limited to 10 copies).
>
> The primary use of this capability is expected to be in scenarios where access
> to memory needs to be monitored and/or restricted below the level at which the
> guest OS page tables operate. Two examples that were discussed at the 2014 Xen
> developer summit are:
>
>     VM introspection: 
>         http://www.slideshare.net/xen_com_mgr/
>         zero-footprint-guest-memory-introspection-from-xen
>
>     Secure inter-VM communication:
>         http://www.slideshare.net/xen_com_mgr/nakajima-nvf
>
> A more detailed design specification can be found at:
>     http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg01319.html
>
> Each p2m copy is populated lazily on EPT violations.
> Permissions for pages in alternate p2m's can be changed in a similar
> way to the existing memory access interface, and gfn->mfn mappings can be changed.
>
> All this is done through extra HVMOP types.
>
> The cross-domain HVMOP code has been compile-tested only. Also, the cross-domain
> code is hypervisor-only, the toolstack has not been modified.
>
> The intra-domain code has been tested. Violation notifications can only be received
> for pages that have been modified (access permissions and/or gfn->mfn mapping) 
> intra-domain, and only on VCPU's that have enabled notification.
>
> VMFUNC and #VE will both be emulated on hardware without native support.
>
> This code is not compatible with nested hvm functionality and will refuse to work
> with nested hvm active. It is also not compatible with migration. It should be
> considered experimental.

Overall, this patch series is looking very good, and it would seem that
3rd party testing agrees!

~Andrew

* Re: [PATCH v2 06/12] VMX: add VMFUNC leaf 0 (EPTP switching) to emulator.
  2015-06-22 18:56 ` [PATCH v2 06/12] VMX: add VMFUNC leaf 0 (EPTP switching) to emulator Ed White
  2015-06-24 12:47   ` Andrew Cooper
@ 2015-06-24 14:26   ` Jan Beulich
  1 sibling, 0 replies; 116+ messages in thread
From: Jan Beulich @ 2015-06-24 14:26 UTC (permalink / raw)
  To: Ed White
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson,
	xen-devel, tlengyel, Daniel De Graaf

>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
> @@ -1826,6 +1827,20 @@ static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v)
>      vmx_vmcs_exit(v);
>  }
>  
> +static bool_t vmx_vcpu_emulate_vmfunc(struct cpu_user_regs *regs)
> +{
> +    bool_t rc = 0;
> +
> +    if ( !cpu_has_vmx_vmfunc && altp2mhvm_active(current->domain) &&
> +         regs->eax == 0 &&
> +         p2m_switch_vcpu_altp2m_by_id(current, (uint16_t)regs->ecx) )
> +    {
> +        regs->eip += 3;

What if the instruction has some (bogus but not invalid) opcode
prefix?
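
For instance (VMFUNC proper is 0f 01 d4):

    0f 01 d4        vmfunc    ; 3 bytes, eip += 3 is right
    66 0f 01 d4     vmfunc    ; 4 bytes, eip += 3 leaves eip mid-instruction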

> @@ -2091,6 +2108,13 @@ static void vmx_invlpg_intercept(unsigned long vaddr)
>          vpid_sync_vcpu_gva(curr, vaddr);
>  }
>  
> +static int vmx_vmfunc_intercept(struct cpu_user_regs *regs)
> +{
> +    gdprintk(XENLOG_ERR, "Failed guest VMFUNC execution\n");
> +    domain_crash(current->domain);
> +    return X86EMUL_OKAY;
> +}

What is this unconditional crashing of the guest good for?

> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -3837,6 +3837,14 @@ x86_emulate(
>              goto rdtsc;
>          }
>  
> +        if (modrm == 0xd4) /* vmfunc */
> +        {
> +            fail_if(ops->vmfunc == NULL);
> +            if ( (rc = ops->vmfunc(ctxt) != 0) )
> +                goto done;
> +            break;
> +        }

Together with the two preceding if()-s this is now finally the point
where switch() should be used instead.
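
i.e. roughly (sketch only; note this also fixes the misplaced closing
parenthesis above, which as written assigns the result of the !=
comparison to rc rather than the return value):

        switch ( modrm )
        {
        case 0xd4: /* vmfunc */
            fail_if(ops->vmfunc == NULL);
            if ( (rc = ops->vmfunc(ctxt)) != 0 )
                goto done;
            break;
        /* ... plus the two preceding if()-handled encodings as cases ... */
        }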

Jan

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-06-22 18:56 ` [PATCH v2 07/12] x86/altp2m: add control of suppress_ve Ed White
  2015-06-24 13:05   ` Andrew Cooper
@ 2015-06-24 14:38   ` Jan Beulich
  2015-06-24 17:53     ` Ed White
  1 sibling, 1 reply; 116+ messages in thread
From: Jan Beulich @ 2015-06-24 14:38 UTC (permalink / raw)
  To: Ed White
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson,
	xen-devel, tlengyel, Daniel De Graaf

>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -237,6 +237,19 @@ struct p2m_domain {
>                                         p2m_access_t *p2ma,
>                                         p2m_query_t q,
>                                         unsigned int *page_order);
> +    int                (*set_entry_full)(struct p2m_domain *p2m,
> +                                         unsigned long gfn,
> +                                         mfn_t mfn, unsigned int page_order,
> +                                         p2m_type_t p2mt,
> +                                         p2m_access_t p2ma,
> +                                         unsigned int sve);
> +    mfn_t              (*get_entry_full)(struct p2m_domain *p2m,
> +                                         unsigned long gfn,
> +                                         p2m_type_t *p2mt,
> +                                         p2m_access_t *p2ma,
> +                                         p2m_query_t q,
> +                                         unsigned int *page_order,
> +                                         unsigned int *sve);

I have to admit that I find the _full suffixes here pretty odd. Based
on the functionality, they should be _sve. But then it seems
questionable how they could be useful to the generic p2m layer
anyway, i.e. why there would need to be such hooks in the first
place.

Jan

* Re: [PATCH v2 04/12] x86/altp2m: basic data structures and support routines.
  2015-06-22 18:56 ` [PATCH v2 04/12] x86/altp2m: basic data structures and support routines Ed White
  2015-06-24 10:06   ` Andrew Cooper
  2015-06-24 10:29   ` Andrew Cooper
@ 2015-06-24 14:44   ` Jan Beulich
  2 siblings, 0 replies; 116+ messages in thread
From: Jan Beulich @ 2015-06-24 14:44 UTC (permalink / raw)
  To: Ed White
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson,
	xen-devel, tlengyel, Daniel De Graaf

>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -210,6 +210,14 @@ struct hvm_function_table {
>                                    uint32_t *ecx, uint32_t *edx);
>  
>      void (*enable_msr_exit_interception)(struct domain *d);
> +
> +    /* Alternate p2m */
> +    int (*ahvm_vcpu_initialise)(struct vcpu *v);
> +    void (*ahvm_vcpu_destroy)(struct vcpu *v);
> +    int (*ahvm_vcpu_reset)(struct vcpu *v);
> +    void (*ahvm_vcpu_update_eptp)(struct vcpu *v);
> +    void (*ahvm_vcpu_update_vmfunc_ve)(struct vcpu *v);
> +    bool_t (*ahvm_vcpu_emulate_ve)(struct vcpu *v);
>  };

These ahvm_ prefixes are pretty strange - this isn't about
alternate HVM after all.

Jan

* Re: [PATCH v2 10/12] x86/altp2m: define and implement alternate p2m HVMOP types.
  2015-06-22 18:56 ` [PATCH v2 10/12] x86/altp2m: define and implement alternate p2m HVMOP types Ed White
  2015-06-24 13:58   ` Andrew Cooper
@ 2015-06-24 14:53   ` Jan Beulich
  1 sibling, 0 replies; 116+ messages in thread
From: Jan Beulich @ 2015-06-24 14:53 UTC (permalink / raw)
  To: Ed White
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson,
	xen-devel, tlengyel, Daniel De Graaf

>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -6424,6 +6424,222 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +    case HVMOP_altp2m_get_domain_state:
> +    {
> +        struct xen_hvm_altp2m_domain_state a;
> +        struct domain *d;
> +
> +        if ( copy_from_guest(&a, arg, 1) )
> +            return -EFAULT;
> +
> +        d = rcu_lock_domain_by_any_id(a.domid);
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        rc = -EINVAL;
> +        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() )
> +            goto param_fail9;
> +
> +        a.state = altp2mhvm_active(d);
> +        rc = copy_to_guest(arg, &a, 1) ? -EFAULT : 0;

__copy_to_guest()

> +
> +    param_fail9:

Can you please avoid introducing further numbered "param_fail"
labels? In the case here I think you could easily get away without
any label.
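
E.g. (sketch only, folding in the __copy_to_guest() point above):

        rc = -EINVAL;
        if ( is_hvm_domain(d) && hvm_altp2m_supported() )
        {
            a.state = altp2mhvm_active(d);
            rc = __copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
        }

        rcu_unlock_domain(d);
        break;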

> +    case HVMOP_altp2m_destroy_p2m:
> +    {
> +        struct xen_hvm_altp2m_view a;
> +        struct domain *d;
> +
> +        if ( copy_from_guest(&a, arg, 1) )
> +            return -EFAULT;
> +
> +        d = rcu_lock_domain_by_any_id(a.domid);
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        rc = -EINVAL;
> +        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
> +             !d->arch.altp2m_active )
> +            goto param_fail12;
> +
> +        if ( p2m_destroy_altp2m_by_id(d, a.view) )
> +            rc = 0;

This function should have its own return code, which should be
assigned to rc (avoiding all sorts of failures being reported as
-EINVAL).
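
i.e. (sketch, assuming the function is converted to return 0 or -errno):

        rc = p2m_destroy_altp2m_by_id(d, a.view);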

> +    case HVMOP_altp2m_switch_p2m:
> +    {
> +        struct xen_hvm_altp2m_view a;
> +        struct domain *d;
> +
> +        if ( copy_from_guest(&a, arg, 1) )
> +            return -EFAULT;
> +
> +        d = rcu_lock_domain_by_any_id(a.domid);
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        rc = -EINVAL;
> +        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
> +             !d->arch.altp2m_active )
> +            goto param_fail13;
> +
> +        if ( p2m_switch_domain_altp2m_by_id(d, a.view) )
> +            rc = 0;

Same here.

> +    case HVMOP_altp2m_set_mem_access:
> +    {
> +        struct xen_hvm_altp2m_set_mem_access a;
> +        struct domain *d;
> +
> +        if ( copy_from_guest(&a, arg, 1) )
> +            return -EFAULT;
> +
> +        d = rcu_lock_domain_by_any_id(a.domid);
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        rc = -EINVAL;
> +        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
> +             !d->arch.altp2m_active )
> +            goto param_fail14;
> +
> +        if ( p2m_set_altp2m_mem_access(d, a.view, a.pfn, a.hvmmem_access) )
> +            rc = 0;

And here.

> +    case HVMOP_altp2m_change_pfn:
> +    {
> +        struct xen_hvm_altp2m_change_pfn a;
> +        struct domain *d;
> +
> +        if ( copy_from_guest(&a, arg, 1) )
> +            return -EFAULT;
> +
> +        d = rcu_lock_domain_by_any_id(a.domid);
> +        if ( d == NULL )
> +            return -ESRCH;
> +
> +        rc = -EINVAL;
> +        if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
> +             !d->arch.altp2m_active )
> +            goto param_fail15;
> +
> +        if ( p2m_change_altp2m_pfn(d, a.view, a.old_pfn, a.new_pfn) )
> +            rc = 0;

And again.

> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -389,6 +389,75 @@ DEFINE_XEN_GUEST_HANDLE(xen_hvm_evtchn_upcall_vector_t);
>  
>  #endif /* defined(__i386__) || defined(__x86_64__) */
>  
> +/* Set/get the altp2m state for a domain */

All of the below is being added outside any __XEN__/__XEN_TOOLS__
section, yet as Andrew noted you don't whitelist the ops for guest
access. This needs to be consistent.
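
i.e. either whitelist them for guest access as well, or wrap them in
the usual guard (sketch):

    #if defined(__XEN__) || defined(__XEN_TOOLS__)
    /* HVMOP_altp2m_* definitions and structures */
    #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */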

> +#define HVMOP_altp2m_set_domain_state     24
> +#define HVMOP_altp2m_get_domain_state     25
> +struct xen_hvm_altp2m_domain_state {
> +    /* Domain to be updated or queried */
> +    domid_t domid;
> +    /* IN or OUT variable on/off */
> +    uint8_t state;
> +};

And if any of these are to be guest accessible, padding fields should
be made explicit, checked to be zero on input, and cleared to zero on
output (unless simply copying back what the guest provided anyway).
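
One possible layout (illustrative only, not from the patch):

    struct xen_hvm_altp2m_domain_state {
        domid_t domid;   /* IN: domain to be updated or queried */
        uint8_t state;   /* IN or OUT: on/off */
        uint8_t pad;     /* IN: must be zero */
    };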

Jan

* Re: [PATCH v2 11/12] x86/altp2m: Add altp2mhvm HVM domain parameter.
  2015-06-22 18:56 ` [PATCH v2 11/12] x86/altp2m: Add altp2mhvm HVM domain parameter Ed White
  2015-06-24 14:06   ` Andrew Cooper
@ 2015-06-24 14:59   ` Jan Beulich
  2015-06-24 17:57     ` Ed White
  1 sibling, 1 reply; 116+ messages in thread
From: Jan Beulich @ 2015-06-24 14:59 UTC (permalink / raw)
  To: Ed White
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson,
	xen-devel, tlengyel, Daniel De Graaf

>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
> +    case HVM_PARAM_ALTP2MHVM:
> +        if ( a.value > 1 )
> +            rc = -EINVAL;
> +        if ( a.value &&
> +             d->arch.hvm_domain.params[HVM_PARAM_NESTEDHVM] )
> +            rc = -EINVAL;
> +        break;

As you added the new param to the change-once section of
hvm_allow_set_param() - what is this code good for?

Jan

* Re: [PATCH v2 08/12] x86/altp2m: alternate p2m memory events.
  2015-06-22 18:56 ` [PATCH v2 08/12] x86/altp2m: alternate p2m memory events Ed White
  2015-06-24 13:09   ` Andrew Cooper
@ 2015-06-24 16:01   ` Lengyel, Tamas
  2015-06-24 18:02     ` Ed White
  1 sibling, 1 reply; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-24 16:01 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf


On Mon, Jun 22, 2015 at 2:56 PM, Ed White <edmund.h.white@intel.com> wrote:

> Add a flag to indicate that a memory event occurred in an alternate p2m
> and a field containing the p2m index. Allow the response to switch to
> a different p2m using the same flag and field.
>
> Modify p2m_access_check() to handle alternate p2m's.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/mm/p2m.c         | 20 +++++++++++++++++++-
>  xen/include/asm-arm/p2m.h     |  7 +++++++
>  xen/include/asm-x86/p2m.h     |  4 ++++
>  xen/include/public/vm_event.h | 13 ++++++++++++-
>  xen/include/xen/mem_access.h  |  1 +
>  5 files changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 87b4b75..389360a 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1516,6 +1516,13 @@ void p2m_mem_access_emulate_check(struct vcpu *v,
>      }
>  }
>
> +void p2m_mem_access_altp2m_check(struct vcpu *v, const
> vm_event_response_t *rsp)
> +{
> +    if ( (rsp->flags & MEM_ACCESS_ALTERNATE_P2M) &&
> +         altp2mhvm_active(v->domain) )
> +        p2m_switch_vcpu_altp2m_by_id(v, rsp->u.mem_access.altp2m_idx);
> +}
>

The function should be renamed p2m_altp2m_check, as it is not really
required to use mem_access at all to be able to use altp2m. See my comment
below.


> +
>  bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
>                              struct npfec npfec,
>                              vm_event_request_t **req_ptr)
> @@ -1523,7 +1530,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned
> long gla,
>      struct vcpu *v = current;
>      unsigned long gfn = gpa >> PAGE_SHIFT;
>      struct domain *d = v->domain;
> -    struct p2m_domain* p2m = p2m_get_hostp2m(d);
> +    struct p2m_domain *p2m = NULL;
>      mfn_t mfn;
>      p2m_type_t p2mt;
>      p2m_access_t p2ma;
> @@ -1531,6 +1538,11 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned
> long gla,
>      int rc;
>      unsigned long eip = guest_cpu_user_regs()->eip;
>
> +    if ( altp2mhvm_active(d) )
> +        p2m = p2m_get_altp2m(v);
> +    if ( !p2m )
> +        p2m = p2m_get_hostp2m(d);
> +
>      /* First, handle rx2rw conversion automatically.
>       * These calls to p2m->set_entry() must succeed: we have the gfn
>       * locked and just did a successful get_entry(). */
> @@ -1637,6 +1649,12 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned
> long gla,
>          req->vcpu_id = v->vcpu_id;
>
>          p2m_vm_event_fill_regs(req);
> +
> +        if ( altp2mhvm_active(v->domain) )
> +        {
> +            req->flags |= MEM_ACCESS_ALTERNATE_P2M;
> +            req->u.mem_access.altp2m_idx = vcpu_altp2mhvm(v).p2midx;
> +        }
>      }
>
>      /* Pause the current VCPU */
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index 63748ef..b31dd6f 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -109,6 +109,13 @@ void p2m_mem_access_emulate_check(struct vcpu *v,
>      /* Not supported on ARM. */
>  }
>
> +static inline
> +void p2m_mem_access_altp2m_check(struct vcpu *v,
> +                                const mem_event_response_t *rsp)
> +{
> +    /* Not supported on ARM. */
> +}
> +
>  #define p2m_is_foreign(_t)  ((_t) == p2m_map_foreign)
>  #define p2m_is_ram(_t)      ((_t) == p2m_ram_rw || (_t) == p2m_ram_ro)
>
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 16fd523..d84da33 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -762,6 +762,10 @@ bool_t p2m_find_altp2m_by_eptp(struct domain *d,
> uint64_t eptp, unsigned long *i
>  /* Switch alternate p2m for a single vcpu */
>  bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx);
>
> +/* Check to see if vcpu should be switched to a different p2m. */
> +void p2m_mem_access_altp2m_check(struct vcpu *v,
> +                                 const vm_event_response_t *rsp);
> +
>  /*
>   * p2m type to IOMMU flags
>   */
> diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
> index 577e971..b492f65 100644
> --- a/xen/include/public/vm_event.h
> +++ b/xen/include/public/vm_event.h
> @@ -149,13 +149,24 @@ struct vm_event_regs_x86 {
>   * potentially having side effects (like memory mapped or port I/O)
> disabled.
>   */
>  #define MEM_ACCESS_EMULATE_NOWRITE      (1 << 7)
> +/*
> + * This flag can be set in a request or a response
> + *
> + * On a request, indicates that the event occurred in the alternate p2m
> specified by
> + * the altp2m_idx request field.
> + *
> + * On a response, indicates that the VCPU should resume in the alternate
> p2m specified
> + * by the altp2m_idx response field if possible.
> + */
> +#define MEM_ACCESS_ALTERNATE_P2M        (1 << 8)
>

This definition should be renamed VM_EVENT_FLAG_ALTERNATE_P2M and moved to
the appropriate location. It should also be checked for all events, not
just for mem_access, similar to how VM_EVENT_FLAG_VCPU_PAUSED is checked
for, as we might want to switch views in response to a variety of events.
Right now I worked around this by specifying the response to a singlestep
event as if it were a response to a mem_access one, but that's very hackish.
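
i.e. in the generic response path, roughly (sketch, using the suggested
names and assuming altp2m_idx also moves out of the mem_access union):

    if ( rsp.flags & VM_EVENT_FLAG_ALTERNATE_P2M )
        p2m_altp2m_check(v, rsp.altp2m_idx);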


>
>  struct vm_event_mem_access {
>      uint64_t gfn;
>      uint64_t offset;
>      uint64_t gla;   /* if flags has MEM_ACCESS_GLA_VALID set */
>      uint32_t flags; /* MEM_ACCESS_* */
> -    uint32_t _pad;
> +    uint16_t altp2m_idx; /* may be used during request and response */
> +    uint16_t _pad;
>  };
>
>  struct vm_event_write_ctrlreg {
> diff --git a/xen/include/xen/mem_access.h b/xen/include/xen/mem_access.h
> index f60b727..4d3d5ca 100644
> --- a/xen/include/xen/mem_access.h
> +++ b/xen/include/xen/mem_access.h
> @@ -36,6 +36,7 @@ static inline
>  void mem_access_resume(struct vcpu *v, vm_event_response_t *rsp)
>  {
>      p2m_mem_access_emulate_check(v, rsp);
> +    p2m_mem_access_altp2m_check(v, rsp);
>  }
>
>  #else
> --
> 1.9.1
>
>


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-22 18:56 ` [PATCH v2 09/12] x86/altp2m: add remaining support routines Ed White
  2015-06-23 18:15   ` Lengyel, Tamas
  2015-06-24 13:46   ` Andrew Cooper
@ 2015-06-24 16:15   ` Lengyel, Tamas
  2015-06-24 18:06     ` Ed White
  2015-06-25  2:44   ` Lengyel, Tamas
  3 siblings, 1 reply; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-24 16:15 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf


On Mon, Jun 22, 2015 at 2:56 PM, Ed White <edmund.h.white@intel.com> wrote:

> Add the remaining routines required to support enabling the alternate
> p2m functionality.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/hvm/hvm.c              |  60 +++++-
>  xen/arch/x86/mm/hap/Makefile        |   1 +
>  xen/arch/x86/mm/hap/altp2m_hap.c    | 103 +++++++++
>  xen/arch/x86/mm/p2m-ept.c           |   3 +
>  xen/arch/x86/mm/p2m.c               | 405
> ++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/altp2mhvm.h |   4 +
>  xen/include/asm-x86/p2m.h           |  33 +++
>  7 files changed, 601 insertions(+), 8 deletions(-)
>  create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index d75c12d..b758ee1 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2786,10 +2786,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
> unsigned long gla,
>      p2m_access_t p2ma;
>      mfn_t mfn;
>      struct vcpu *v = current;
> -    struct p2m_domain *p2m;
> +    struct p2m_domain *p2m, *hostp2m;
>      int rc, fall_through = 0, paged = 0;
>      int sharing_enomem = 0;
>      vm_event_request_t *req_ptr = NULL;
> +    int altp2m_active = 0;
>
>      /* On Nested Virtualization, walk the guest page table.
>       * If this succeeds, all is fine.
> @@ -2845,15 +2846,33 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
> unsigned long gla,
>      {
>          if ( !handle_mmio_with_translation(gla, gpa >> PAGE_SHIFT, npfec)
> )
>              hvm_inject_hw_exception(TRAP_gp_fault, 0);
> -        rc = 1;
> -        goto out;
> +        return 1;
>      }
>
> -    p2m = p2m_get_hostp2m(v->domain);
> -    mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma,
> +    altp2m_active = altp2mhvm_active(v->domain);
> +
> +    /* Take a lock on the host p2m speculatively, to avoid potential
> +     * locking order problems later and to handle unshare etc.
> +     */
> +    hostp2m = p2m_get_hostp2m(v->domain);
> +    mfn = get_gfn_type_access(hostp2m, gfn, &p2mt, &p2ma,
>                                P2M_ALLOC | (npfec.write_access ?
> P2M_UNSHARE : 0),
>                                NULL);
>
> +    if ( altp2m_active )
> +    {
> +        if ( altp2mhvm_hap_nested_page_fault(v, gpa, gla, npfec, &p2m) ==
> 1 )
> +        {
> +            /* entry was lazily copied from host -- retry */
> +            __put_gfn(hostp2m, gfn);
> +            return 1;
> +        }
> +
> +        mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 0, NULL);
> +    }
> +    else
> +        p2m = hostp2m;
> +
>      /* Check access permissions first, then handle faults */
>      if ( mfn_x(mfn) != INVALID_MFN )
>      {
> @@ -2893,6 +2912,20 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned
> long gla,
>
>          if ( violation )
>          {
> +            /* Should #VE be emulated for this fault? */
> +            if ( p2m_is_altp2m(p2m) && !cpu_has_vmx_virt_exceptions )
> +            {
> +                unsigned int sve;
> +
> +                p2m->get_entry_full(p2m, gfn, &p2mt, &p2ma, 0, NULL,
> &sve);
> +
> +                if ( !sve && ahvm_vcpu_emulate_ve(v) )
> +                {
> +                    rc = 1;
> +                    goto out_put_gfn;
> +                }
> +            }
> +
>              if ( p2m_mem_access_check(gpa, gla, npfec, &req_ptr) )
>              {
>                  fall_through = 1;
> @@ -2912,7 +2945,9 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned
> long gla,
>           (npfec.write_access &&
>            (p2m_is_discard_write(p2mt) || (p2mt == p2m_mmio_write_dm))) )
>      {
> -        put_gfn(p2m->domain, gfn);
> +        __put_gfn(p2m, gfn);
> +        if ( altp2m_active )
> +            __put_gfn(hostp2m, gfn);
>
>          rc = 0;
>          if ( unlikely(is_pvh_vcpu(v)) )
> @@ -2941,6 +2976,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned
> long gla,
>      /* Spurious fault? PoD and log-dirty also take this path. */
>      if ( p2m_is_ram(p2mt) )
>      {
> +        rc = 1;
>          /*
>           * Page log dirty is always done with order 0. If this mfn
> resides in
>           * a large page, we do not change other pages type within that
> large
> @@ -2949,9 +2985,15 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned
> long gla,
>          if ( npfec.write_access )
>          {
>              paging_mark_dirty(v->domain, mfn_x(mfn));
> +            /* If p2m is really an altp2m, unlock here to avoid lock
> ordering
> +             * violation when the change below is propagated from host
> p2m */
> +            if ( altp2m_active )
> +                __put_gfn(p2m, gfn);
>              p2m_change_type_one(v->domain, gfn, p2m_ram_logdirty,
> p2m_ram_rw);
> +            __put_gfn(altp2m_active ? hostp2m : p2m, gfn);
> +
> +            goto out;
>          }
> -        rc = 1;
>          goto out_put_gfn;
>      }
>
> @@ -2961,7 +3003,9 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned
> long gla,
>      rc = fall_through;
>
>  out_put_gfn:
> -    put_gfn(p2m->domain, gfn);
> +    __put_gfn(p2m, gfn);
> +    if ( altp2m_active )
> +        __put_gfn(hostp2m, gfn);
>  out:
>      /* All of these are delayed until we exit, since we might
>       * sleep on event ring wait queues, and we must not hold
> diff --git a/xen/arch/x86/mm/hap/Makefile b/xen/arch/x86/mm/hap/Makefile
> index 68f2bb5..216cd90 100644
> --- a/xen/arch/x86/mm/hap/Makefile
> +++ b/xen/arch/x86/mm/hap/Makefile
> @@ -4,6 +4,7 @@ obj-y += guest_walk_3level.o
>  obj-$(x86_64) += guest_walk_4level.o
>  obj-y += nested_hap.o
>  obj-y += nested_ept.o
> +obj-y += altp2m_hap.o
>
>  guest_walk_%level.o: guest_walk.c Makefile
>         $(CC) $(CFLAGS) -DGUEST_PAGING_LEVELS=$* -c $< -o $@
> diff --git a/xen/arch/x86/mm/hap/altp2m_hap.c
> b/xen/arch/x86/mm/hap/altp2m_hap.c
> new file mode 100644
> index 0000000..899b636
> --- /dev/null
> +++ b/xen/arch/x86/mm/hap/altp2m_hap.c
> @@ -0,0 +1,103 @@
>
> +/******************************************************************************
> + * arch/x86/mm/hap/altp2m_hap.c
> + *
> + * Copyright (c) 2014 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
> USA
> + */
> +
> +#include <asm/domain.h>
> +#include <asm/page.h>
> +#include <asm/paging.h>
> +#include <asm/p2m.h>
> +#include <asm/hap.h>
> +#include <asm/hvm/altp2mhvm.h>
> +
> +#include "private.h"
> +
> +/* Override macros from asm/page.h to make them work with mfn_t */
> +#undef mfn_valid
> +#define mfn_valid(_mfn) __mfn_valid(mfn_x(_mfn))
> +#undef page_to_mfn
> +#define page_to_mfn(_pg) _mfn(__page_to_mfn(_pg))
> +
> +/*
> + * If the fault is for a not present entry:
> + *     if the entry in the host p2m has a valid mfn, copy it and retry
> + *     else indicate that outer handler should handle fault
> + *
> + * If the fault is for a present entry:
> + *     indicate that outer handler should handle fault
> + */
> +
> +int
> +altp2mhvm_hap_nested_page_fault(struct vcpu *v, paddr_t gpa,
> +                                unsigned long gla, struct npfec npfec,
> +                                struct p2m_domain **ap2m)
> +{
> +    struct p2m_domain *hp2m = p2m_get_hostp2m(v->domain);
> +    p2m_type_t p2mt;
> +    p2m_access_t p2ma;
> +    unsigned int page_order;
> +    unsigned long gfn, mask;
> +    mfn_t mfn;
> +    int rv;
> +
> +    *ap2m = p2m_get_altp2m(v);
> +
> +    mfn = get_gfn_type_access(*ap2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
> +                              0, &page_order);
> +    __put_gfn(*ap2m, gpa >> PAGE_SHIFT);
> +
> +    if ( mfn_x(mfn) != INVALID_MFN )
> +        return 0;
> +
> +    mfn = get_gfn_type_access(hp2m, gpa >> PAGE_SHIFT, &p2mt, &p2ma,
> +                              0, &page_order);
> +    put_gfn(hp2m->domain, gpa >> PAGE_SHIFT);
> +
> +    if ( mfn_x(mfn) == INVALID_MFN )
> +        return 0;
> +
> +    p2m_lock(*ap2m);
> +
> +    /* If this is a superpage mapping, round down both frame numbers
> +     * to the start of the superpage. */
> +    mask = ~((1UL << page_order) - 1);
> +    gfn = (gpa >> PAGE_SHIFT) & mask;
> +    mfn = _mfn(mfn_x(mfn) & mask);
> +
> +    rv = p2m_set_entry(*ap2m, gfn, mfn, page_order, p2mt, p2ma);
> +    p2m_unlock(*ap2m);
> +
> +    if ( rv ) {
> +        gdprintk(XENLOG_ERR,
> +           "failed to set entry for %#"PRIx64" -> %#"PRIx64"\n",
> +           gpa, mfn_x(mfn));
> +        domain_crash(hp2m->domain);
> +    }
> +
> +    return 1;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index e7719cf..4411b36 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -849,6 +849,9 @@ out:
>      if ( is_epte_present(&old_entry) )
>          ept_free_entry(p2m, &old_entry, target);
>
> +    if ( rc == 0 && p2m_is_hostp2m(p2m) )
> +        p2m_altp2m_propagate_change(d, gfn, mfn, order, p2mt, p2ma);
> +
>      return rc;
>  }
>
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 389360a..588acd5 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -2041,6 +2041,411 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu
> *v, uint16_t idx)
>      return rc;
>  }
>
> +void p2m_flush_altp2m(struct domain *d)
> +{
> +    uint16_t i;
> +
> +    altp2m_lock(d);
> +
> +    for ( i = 0; i < MAX_ALTP2M; i++ )
> +    {
> +        p2m_flush_table(d->arch.altp2m_p2m[i]);
> +        /* Uninit and reinit ept to force TLB shootdown */
> +        ept_p2m_uninit(d->arch.altp2m_p2m[i]);
> +        ept_p2m_init(d->arch.altp2m_p2m[i]);
> +        d->arch.altp2m_eptp[i] = ~0ul;
> +    }
> +
> +    altp2m_unlock(d);
> +}
> +
> +bool_t p2m_init_altp2m_by_id(struct domain *d, uint16_t idx)
> +{
> +    struct p2m_domain *p2m;
> +    struct ept_data *ept;
> +    bool_t rc = 0;
> +
> +    if ( idx > MAX_ALTP2M )
> +        return rc;
> +
> +    altp2m_lock(d);
> +
> +    if ( d->arch.altp2m_eptp[idx] == ~0ul )
> +    {
> +        p2m = d->arch.altp2m_p2m[idx];
> +        p2m->min_remapped_pfn = ~0ul;
> +        p2m->max_remapped_pfn = ~0ul;
> +        ept = &p2m->ept;
> +        ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
> +        d->arch.altp2m_eptp[idx] = ept_get_eptp(ept);
> +        rc = 1;
> +    }
> +
> +    altp2m_unlock(d);
> +    return rc;
> +}
> +
> +bool_t p2m_init_next_altp2m(struct domain *d, uint16_t *idx)
> +{
> +    struct p2m_domain *p2m;
> +    struct ept_data *ept;
> +    bool_t rc = 0;
> +    uint16_t i;
> +
> +    altp2m_lock(d);
> +
> +    for ( i = 0; i < MAX_ALTP2M; i++ )
> +    {
> +        if ( d->arch.altp2m_eptp[i] != ~0ul )
> +            continue;
> +
> +        p2m = d->arch.altp2m_p2m[i];
> +        p2m->min_remapped_pfn = ~0ul;
> +        p2m->max_remapped_pfn = ~0ul;
> +        ept = &p2m->ept;
> +        ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
> +        d->arch.altp2m_eptp[i] = ept_get_eptp(ept);
> +        *idx = i;
> +        rc = 1;
> +
> +        break;
> +    }
> +
> +    altp2m_unlock(d);
> +    return rc;
> +}
> +
> +bool_t p2m_destroy_altp2m_by_id(struct domain *d, uint16_t idx)
> +{
> +    struct p2m_domain *p2m;
> +    struct vcpu *curr = current;
> +    struct vcpu *v;
> +    bool_t rc = 0;
> +
> +    if ( !idx || idx > MAX_ALTP2M )
> +        return rc;
> +
> +    if ( curr->domain != d )
> +        domain_pause(d);
> +    else
> +        for_each_vcpu( d, v )
> +            if ( curr != v )
> +                vcpu_pause(v);
> +
> +    altp2m_lock(d);
> +
> +    if ( d->arch.altp2m_eptp[idx] != ~0ul )
> +    {
> +        p2m = d->arch.altp2m_p2m[idx];
> +
> +        if ( !_atomic_read(p2m->active_vcpus) )
> +        {
> +            p2m_flush_table(d->arch.altp2m_p2m[idx]);
> +            /* Uninit and reinit ept to force TLB shootdown */
> +            ept_p2m_uninit(d->arch.altp2m_p2m[idx]);
> +            ept_p2m_init(d->arch.altp2m_p2m[idx]);
> +            d->arch.altp2m_eptp[idx] = ~0ul;
> +            rc = 1;
> +        }
> +    }
> +
> +    altp2m_unlock(d);
> +
> +    if ( curr->domain != d )
> +        domain_unpause(d);
> +    else
> +        for_each_vcpu( d, v )
> +            if ( curr != v )
> +                vcpu_unpause(v);
> +
> +    return rc;
> +}
> +
> +bool_t p2m_switch_domain_altp2m_by_id(struct domain *d, uint16_t idx)
> +{
> +    struct vcpu *curr = current;
> +    struct vcpu *v;
> +    bool_t rc = 0;
> +
> +    if ( idx > MAX_ALTP2M )
> +        return rc;
> +
> +    if ( curr->domain != d )
> +        domain_pause(d);
> +    else
> +        for_each_vcpu( d, v )
> +            if ( curr != v )
> +                vcpu_pause(v);
> +
> +    altp2m_lock(d);
> +
> +    if ( d->arch.altp2m_eptp[idx] != ~0ul )
> +    {
> +        for_each_vcpu( d, v )
> +            if ( idx != vcpu_altp2mhvm(v).p2midx )
> +            {
> +                atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
> +                vcpu_altp2mhvm(v).p2midx = idx;
> +                atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
> +                ahvm_vcpu_update_eptp(v);
> +            }
> +
> +        rc = 1;
> +    }
> +
> +    altp2m_unlock(d);
> +
> +    if ( curr->domain != d )
> +        domain_unpause(d);
> +    else
> +        for_each_vcpu( d, v )
> +            if ( curr != v )
> +                vcpu_unpause(v);
> +
> +    return rc;
> +}
> +
> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
> +                                 unsigned long pfn, xenmem_access_t
> access)
> +{
>

This function IMHO should be merged with p2m_set_mem_access and should be
triggerable with the same memop (XENMEM_access_op) hypercall instead of
introducing a new hvmop one.
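
E.g. by growing the existing interface with a view index (sketch; the
exact signature here is invented):

    long p2m_set_mem_access(struct domain *d, unsigned long pfn,
                            uint32_t nr, uint32_t start, uint32_t mask,
                            xenmem_access_t access,
                            uint16_t altp2m_idx /* 0 = host p2m */);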


> +    struct p2m_domain *hp2m, *ap2m;
> +    p2m_access_t a, _a;
> +    p2m_type_t t;
> +    mfn_t mfn;
> +    unsigned int page_order;
> +    bool_t rc = 0;
> +
> +    static const p2m_access_t memaccess[] = {
> +#define ACCESS(ac) [XENMEM_access_##ac] = p2m_access_##ac
> +        ACCESS(n),
> +        ACCESS(r),
> +        ACCESS(w),
> +        ACCESS(rw),
> +        ACCESS(x),
> +        ACCESS(rx),
> +        ACCESS(wx),
> +        ACCESS(rwx),
> +#undef ACCESS
> +    };
> +
> +    if ( idx > MAX_ALTP2M || d->arch.altp2m_eptp[idx] == ~0ul )
> +        return 0;
> +
> +    ap2m = d->arch.altp2m_p2m[idx];
> +
> +    switch ( access )
> +    {
> +    case 0 ... ARRAY_SIZE(memaccess) - 1:
> +        a = memaccess[access];
> +        break;
> +    case XENMEM_access_default:
> +        a = ap2m->default_access;
> +        break;
> +    default:
> +        return 0;
> +    }
> +
> +    /* If request to set default access */
> +    if ( pfn == ~0ul )
> +    {
> +        ap2m->default_access = a;
> +        return 1;
> +    }
> +
> +    hp2m = p2m_get_hostp2m(d);
> +
> +    p2m_lock(ap2m);
> +
> +    mfn = ap2m->get_entry(ap2m, pfn, &t, &_a, 0, NULL);
> +
> +    /* Check host p2m if no valid entry in alternate */
> +    if ( !mfn_valid(mfn) )
> +    {
> +        mfn = hp2m->get_entry(hp2m, pfn, &t, &_a, 0, &page_order);
> +
> +        if ( !mfn_valid(mfn) || t != p2m_ram_rw )
> +            goto out;
> +
> +        /* If this is a superpage, copy that first */
> +        if ( page_order != PAGE_ORDER_4K )
> +        {
> +            unsigned long gfn, mask;
> +            mfn_t mfn2;
> +
> +            mask = ~((1UL << page_order) - 1);
> +            gfn = pfn & mask;
> +            mfn2 = _mfn(mfn_x(mfn) & mask);
> +
> +            if ( ap2m->set_entry(ap2m, gfn, mfn2, page_order, t, _a) )
> +                goto out;
> +        }
> +    }
> +
> +    if ( !ap2m->set_entry_full(ap2m, pfn, mfn, PAGE_ORDER_4K, t, a,
> +                               (current->domain != d)) )
> +        rc = 1;
> +
> +out:
> +    p2m_unlock(ap2m);
> +    return rc;
> +}
> +
> +bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
> +                             unsigned long old_pfn, unsigned long new_pfn)
> +{
> +    struct p2m_domain *hp2m, *ap2m;
> +    p2m_access_t a;
> +    p2m_type_t t;
> +    mfn_t mfn;
> +    unsigned int page_order;
> +    bool_t rc = 0;
> +
> +    if ( idx > MAX_ALTP2M || d->arch.altp2m_eptp[idx] == ~0ul )
> +        return 0;
> +
> +    hp2m = p2m_get_hostp2m(d);
> +    ap2m = d->arch.altp2m_p2m[idx];
> +
> +    p2m_lock(ap2m);
> +
> +    mfn = ap2m->get_entry(ap2m, old_pfn, &t, &a, 0, NULL);
> +
> +    if ( new_pfn == ~0ul )
> +    {
> +        if ( mfn_valid(mfn) )
> +            p2m_remove_page(ap2m, old_pfn, mfn_x(mfn), PAGE_ORDER_4K);
> +        rc = 1;
> +        goto out;
> +    }
> +
> +    /* Check host p2m if no valid entry in alternate */
> +    if ( !mfn_valid(mfn) )
> +    {
> +        mfn = hp2m->get_entry(hp2m, old_pfn, &t, &a, 0, &page_order);
> +
> +        if ( !mfn_valid(mfn) || t != p2m_ram_rw )
> +            goto out;
> +
> +        /* If this is a superpage, copy that first */
> +        if ( page_order != PAGE_ORDER_4K )
> +        {
> +            unsigned long gfn, mask;
> +
> +            mask = ~((1UL << page_order) - 1);
> +            gfn = old_pfn & mask;
> +            mfn = _mfn(mfn_x(mfn) & mask);
> +
> +            if ( ap2m->set_entry(ap2m, gfn, mfn, page_order, t, a) )
> +                goto out;
> +        }
> +    }
> +
> +    mfn = ap2m->get_entry(ap2m, new_pfn, &t, &a, 0, NULL);
> +
> +    if ( !mfn_valid(mfn) )
> +        mfn = hp2m->get_entry(hp2m, new_pfn, &t, &a, 0, NULL);
> +
> +    if ( !mfn_valid(mfn) || (t != p2m_ram_rw) )
> +        goto out;
> +
> +    if ( !ap2m->set_entry_full(ap2m, old_pfn, mfn, PAGE_ORDER_4K, t, a,
> +                               (current->domain != d)) )
> +    {
> +        rc = 1;
> +
> +        if ( ap2m->min_remapped_pfn == ~0ul ||
> +             new_pfn < ap2m->min_remapped_pfn )
> +            ap2m->min_remapped_pfn = new_pfn;
> +        if ( ap2m->max_remapped_pfn == ~0ul ||
> +             new_pfn > ap2m->max_remapped_pfn )
> +            ap2m->max_remapped_pfn = new_pfn;
> +    }
> +
> +out:
> +    p2m_unlock(ap2m);
> +    return rc;
> +}
> +
> +static inline void p2m_reset_altp2m(struct p2m_domain *p2m)
> +{
> +    p2m_flush_table(p2m);
> +    /* Uninit and reinit ept to force TLB shootdown */
> +    ept_p2m_uninit(p2m);
> +    ept_p2m_init(p2m);
> +    p2m->min_remapped_pfn = ~0ul;
> +    p2m->max_remapped_pfn = ~0ul;
> +}
> +
> +void p2m_altp2m_propagate_change(struct domain *d, unsigned long gfn,
> +                                 mfn_t mfn, unsigned int page_order,
> +                                 p2m_type_t p2mt, p2m_access_t p2ma)
> +{
> +    struct p2m_domain *p2m;
> +    p2m_access_t a;
> +    p2m_type_t t;
> +    mfn_t m;
> +    uint16_t i;
> +    bool_t reset_p2m;
> +    unsigned int reset_count = 0;
> +    uint16_t last_reset_idx = ~0;
> +
> +    if ( !altp2mhvm_active(d) )
> +        return;
> +
> +    altp2m_lock(d);
> +
> +    for ( i = 0; i < MAX_ALTP2M; i++ )
> +    {
> +        if ( d->arch.altp2m_eptp[i] == ~0ul )
> +            continue;
> +
> +        p2m = d->arch.altp2m_p2m[i];
> +        m = get_gfn_type_access(p2m, gfn, &t, &a, 0, NULL);
> +
> +        reset_p2m = 0;
> +
> +        /* Check for a dropped page that may impact this altp2m */
> +        if ( mfn_x(mfn) == INVALID_MFN &&
> +             gfn >= p2m->min_remapped_pfn && gfn <= p2m->max_remapped_pfn
> )
> +            reset_p2m = 1;
> +
> +        if ( reset_p2m )
> +        {
> +            if ( !reset_count++ )
> +            {
> +                p2m_reset_altp2m(p2m);
> +                last_reset_idx = i;
> +            }
> +            else
> +            {
> +                /* At least 2 altp2m's impacted, so reset everything */
> +                __put_gfn(p2m, gfn);
> +
> +                for ( i = 0; i < MAX_ALTP2M; i++ )
> +                {
> +                    if ( i == last_reset_idx ||
> +                         d->arch.altp2m_eptp[i] == ~0ul )
> +                        continue;
> +
> +                    p2m = d->arch.altp2m_p2m[i];
> +                    p2m_lock(p2m);
> +                    p2m_reset_altp2m(p2m);
> +                    p2m_unlock(p2m);
> +                }
> +
> +                goto out;
> +            }
> +        }
> +        else if ( mfn_x(m) != INVALID_MFN )
> +           p2m_set_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
> +
> +        __put_gfn(p2m, gfn);
> +    }
> +
> +out:
> +    altp2m_unlock(d);
> +}
> +
>  /*** Audit ***/
>
>  #if P2M_AUDIT
> diff --git a/xen/include/asm-x86/hvm/altp2mhvm.h
> b/xen/include/asm-x86/hvm/altp2mhvm.h
> index a4b8e24..08ff79b 100644
> --- a/xen/include/asm-x86/hvm/altp2mhvm.h
> +++ b/xen/include/asm-x86/hvm/altp2mhvm.h
> @@ -34,5 +34,9 @@ int altp2mhvm_vcpu_initialise(struct vcpu *v);
>  void altp2mhvm_vcpu_destroy(struct vcpu *v);
>  void altp2mhvm_vcpu_reset(struct vcpu *v);
>
> +/* Alternate p2m paging */
> +int altp2mhvm_hap_nested_page_fault(struct vcpu *v, paddr_t gpa,
> +    unsigned long gla, struct npfec npfec, struct p2m_domain **ap2m);
> +
>  #endif /* _HVM_ALTP2M_H */
>
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index d84da33..3f17211 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -279,6 +279,11 @@ struct p2m_domain {
>      /* Highest guest frame that's ever been mapped in the p2m */
>      unsigned long max_mapped_pfn;
>
> +    /* Alternate p2m's only: range of pfn's for which underlying
> +     * mfn may have duplicate mappings */
> +    unsigned long min_remapped_pfn;
> +    unsigned long max_remapped_pfn;
> +
>      /* When releasing shared gfn's in a preemptible manner, recall where
>       * to resume the search */
>      unsigned long next_shared_gfn_to_relinquish;
> @@ -766,6 +771,34 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v,
> uint16_t idx);
>  void p2m_mem_access_altp2m_check(struct vcpu *v,
>                                   const vm_event_response_t *rsp);
>
> +/* Flush all the alternate p2m's for a domain */
> +void p2m_flush_altp2m(struct domain *d);
> +
> +/* Make a specific alternate p2m valid */
> +bool_t p2m_init_altp2m_by_id(struct domain *d, uint16_t idx);
> +
> +/* Find an available alternate p2m and make it valid */
> +bool_t p2m_init_next_altp2m(struct domain *d, uint16_t *idx);
> +
> +/* Make a specific alternate p2m invalid */
> +bool_t p2m_destroy_altp2m_by_id(struct domain *d, uint16_t idx);
> +
> +/* Switch alternate p2m for entire domain */
> +bool_t p2m_switch_domain_altp2m_by_id(struct domain *d, uint16_t idx);
> +
> +/* Set access type for a pfn */
> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
> +                                 unsigned long pfn, xenmem_access_t
> access);
> +
> +/* Replace a pfn with a different pfn */
> +bool_t p2m_change_altp2m_pfn(struct domain *d, uint16_t idx,
> +                             unsigned long old_pfn, unsigned long
> new_pfn);
> +
> +/* Propagate a host p2m change to all alternate p2m's */
> +void p2m_altp2m_propagate_change(struct domain *d, unsigned long gfn,
> +                                 mfn_t mfn, unsigned int page_order,
> +                                 p2m_type_t p2mt, p2m_access_t p2ma);
> +
>  /*
>   * p2m type to IOMMU flags
>   */
> --
> 1.9.1
>
>


* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-24 13:37       ` Razvan Cojocaru
@ 2015-06-24 16:43         ` Ed White
  2015-06-24 21:34           ` Lengyel, Tamas
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-24 16:43 UTC (permalink / raw)
  To: Razvan Cojocaru, Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/24/2015 06:37 AM, Razvan Cojocaru wrote:
> On 06/24/2015 04:32 PM, Lengyel, Tamas wrote:
>>
>>
>> On Wed, Jun 24, 2015 at 1:39 AM, Razvan Cojocaru
>> <rcojocaru@bitdefender.com <mailto:rcojocaru@bitdefender.com>> wrote:
>>
>>     On 06/24/2015 12:27 AM, Lengyel, Tamas wrote:
>>     > I've extended xen-access to exercise this new feature taking into
>>     > account some of the current limitations. Using the altp2m_write|exec
>>     > options we create a duplicate view of the default hostp2m, and instead
>>     > of relaxing the mem_access permissions when we encounter a violation, we
>>     > swap the view on the violating vCPU while also enabling MTF
>>     > singlestepping. When the singlestep event fires, we use the response to
>>     > that event to swap the view back to the restricted altp2m view.
>>
>>     That's certainly very interesting. I wonder what the benefits are in
>>     this case over emulating the fault-causing instruction (other than
>>     obviously not going through the emulator)? The altp2m method would
>>     certainly be slower, since you need more round-trips from userspace to
>>     the hypervisor (the EPT vm_event handling + the singlestep event,
>>     whereas with emulation you just reply to the original vm_event).
>>
>>
>>     Regards,
>>     Razvan
>>
>>
>> Certainly, this is pretty slow right now, especially for the altp2m_exec
>> case. However, sometimes you simply cannot emulate. For example, if you
>> write breakpoints into target locations, the original instruction has
>> been overwritten with 0xCC. If you have a duplicate of the page without
>> the breakpoint, this is an easy way to make the guest fetch the original
>> instruction. Of course, if you extended the emulation routine so that you
>> could provide the instruction to emulate, instead of it being fetched from
>> guest memory, that would be equally useful ;)
> 
> Makes sense, thanks for the explanation! Sure, sending back the
> instruction to emulate could be something to consider for the future.
> 
> 
> Thanks,
> Razvan
> 

One thing I'd add is that what Tamas has done provides a valuable test that
the cross-domain functionality works, even if it might not be a recommended
design pattern. Our primary use case at Intel is intra-domain, and there
the advantages of avoiding many exits are clear.

Also, even cross-domain usage allows for different views of, and levels of
access to, memory concurrently on different vcpus.

Ed

* Re: [PATCH v2 04/12] x86/altp2m: basic data structures and support routines.
  2015-06-24 10:06   ` Andrew Cooper
  2015-06-24 10:23     ` Jan Beulich
@ 2015-06-24 17:20     ` Ed White
  1 sibling, 0 replies; 116+ messages in thread
From: Ed White @ 2015-06-24 17:20 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 06/24/2015 03:06 AM, Andrew Cooper wrote:
>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
>> index d0d3f1e..202aa42 100644
>> --- a/xen/arch/x86/mm/hap/hap.c
>> +++ b/xen/arch/x86/mm/hap/hap.c
>> @@ -459,7 +459,7 @@ void hap_domain_init(struct domain *d)
>>  int hap_enable(struct domain *d, u32 mode)
>>  {
>>      unsigned int old_pages;
>> -    uint8_t i;
>> +    uint16_t i;
>>      int rv = 0;
>>  
>>      domain_pause(d);
>> @@ -498,6 +498,24 @@ int hap_enable(struct domain *d, u32 mode)
>>             goto out;
>>      }
>>  
>> +    /* Init alternate p2m data */
>> +    if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL )
> 
> Please use alloc_domheap_page() and map_domain_page_global() so the
> allocation is accounted against the domain.

You raised this back in January too, and I did try it. Unfortunately,
allocating that way caused repeated Xen panics, and when I reported that
to you and Tim on the list 2 or 3 months ago, Tim said he thought the
existing code was acceptable in this instance, since it is only one page
per domain.

Ed

* Re: [PATCH v2 05/12] VMX/altp2m: add code to support EPTP switching and #VE.
  2015-06-24 11:59   ` Andrew Cooper
@ 2015-06-24 17:31     ` Ed White
  2015-06-24 17:40       ` Andrew Cooper
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-24 17:31 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 06/24/2015 04:59 AM, Andrew Cooper wrote:
>> +
>> +    if ( !veinfo )
>> +        return 0;
>> +
>> +    if ( veinfo->semaphore != 0 )
>> +        goto out;
> 
> The semantics of this semaphore are not clearly spelled out in the
> manual.  The only information I can locate concerning this field is in
> note in 25.5.6.1 which says:
> 
> "Delivery of virtualization exceptions writes the value FFFFFFFFH to
> offset 4 in the virtualization-exception informa-
> tion area (see Section 25.5.6.2). Thus, once a virtualization exception
> occurs, another can occur only if software
> clears this field."
> 
> I presume this should be taken to mean "software writes 0 to this
> field", but some clarification would be nice.
> 

Immediately above that, where the conditions required to deliver #VE
are discussed, it says "the 32 bits at offset 4 in the
virtualization-exception information area are all 0". Hardware never
writes anything other than FFFFFFFFH there, so only software can make
that be so.
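
For concreteness, a guest #VE handler would re-arm delivery roughly like
this (minimal sketch; the layout follows the SDM's description of the
virtualization-exception information area, and handle_ve is a made-up name):

    struct ve_info {
        uint32_t exit_reason;        /* offset 0: EPT-violation reason (48) */
        uint32_t semaphore;          /* offset 4: FFFFFFFFH while #VE pending */
        uint64_t exit_qualification; /* offset 8 */
        uint64_t gla;                /* offset 16: guest-linear address */
        uint64_t gpa;                /* offset 24: guest-physical address */
        uint16_t eptp_index;         /* offset 32: view in use at the fault */
    };

    static void handle_ve(struct ve_info *ve)
    {
        /* ... act on ve->gpa / ve->eptp_index, e.g. VMFUNC to another view ... */

        ve->semaphore = 0;  /* hardware delivers the next #VE only once this is 0 */
    }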

>> +
>> +            if ( !p2m_find_altp2m_by_eptp(v->domain, eptp, &idx) )
>> +            {
>> +                gdprintk(XENLOG_ERR, "EPTP not found in alternate p2m list\n");
>> +                domain_crash(v->domain);
>> +            }
>> +        }
>> +
> 
> Is it worth checking that idx is plausible at this point, before blindly
> writing it back into the vcpu structure?

I'm not sure I follow your logic. In the case where the hardware supports
EPTP_INDEX, the hardware itself is asserting that the index is valid. In the
case quoted above, if the index isn't valid p2m_find_altp2m_by_eptp() will
fail.

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 05/12] VMX/altp2m: add code to support EPTP switching and #VE.
  2015-06-24 17:31     ` Ed White
@ 2015-06-24 17:40       ` Andrew Cooper
  0 siblings, 0 replies; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24 17:40 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Jan Beulich,
	tlengyel, Daniel De Graaf

On 24/06/15 18:31, Ed White wrote:
> On 06/24/2015 04:59 AM, Andrew Cooper wrote:
>>> +
>>> +    if ( !veinfo )
>>> +        return 0;
>>> +
>>> +    if ( veinfo->semaphore != 0 )
>>> +        goto out;
>> The semantics of this semaphore are not clearly spelled out in the
>> manual.  The only information I can locate concerning this field is in
>> note in 25.5.6.1 which says:
>>
>> "Delivery of virtualization exceptions writes the value FFFFFFFFH to
>> offset 4 in the virtualization-exception informa-
>> tion area (see Section 25.5.6.2). Thus, once a virtualization exception
>> occurs, another can occur only if software
>> clears this field."
>>
>> I presume this should be taken to mean "software writes 0 to this
>> field", but some clarification would be nice.
>>
> Immediately above that, where the conditions required to deliver #VE
> are discussed, it says "the 32 bits at offset 4 in the
> virtualization-exception information area are all 0". Hardware never
> writes anything other than FFFFFFFFH there, so only software can make
> that be so.

So it does.  Sorry for missing that.

>
>>> +
>>> +            if ( !p2m_find_altp2m_by_eptp(v->domain, eptp, &idx) )
>>> +            {
>>> +                gdprintk(XENLOG_ERR, "EPTP not found in alternate p2m list\n");
>>> +                domain_crash(v->domain);
>>> +            }
>>> +        }
>>> +
>> Is it worth checking that idx is plausible at this point, before blindly
>> writing it back into the vcpu structure?
> I'm not sure I follow your logic. In the case where the hardware supports
> EPTP_INDEX, the hardware itself is asserting that the index is valid. In the
> case quoted above, if the index isn't valid p2m_find_altp2m_by_eptp() will
> fail.

I tend to be somewhat more pessimistic when coding.

It is possible for __vmread(EPTP_INDEX, &idx); to return any index
between 0 and 511 (including via some other software bug which has gone
and accidentally set it to, say, altp2m 11).

I would put a BUG_ON(idx >= MAX_ALTP2M) here just for safety.  Nothing
good can come of attempting to continue in such a state, but it is
certainly not an impossible situation to get into.
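
I.e. something along these lines (paraphrasing the code under review, not a
tested change):

    __vmread(EPTP_INDEX, &idx);
    BUG_ON(idx >= MAX_ALTP2M);  /* a stray index here means corrupted state */
    vcpu_altp2mhvm(v).p2midx = idx;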

~Andrew

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-24 13:46   ` Andrew Cooper
@ 2015-06-24 17:47     ` Ed White
  2015-06-24 18:19       ` Andrew Cooper
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-24 17:47 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 06/24/2015 06:46 AM, Andrew Cooper wrote:
> On 22/06/15 19:56, Ed White wrote:
>> Add the remaining routines required to support enabling the alternate
>> p2m functionality.
>>
>> Signed-off-by: Ed White <edmund.h.white@intel.com>
>> ---
>>  xen/arch/x86/hvm/hvm.c              |  60 +++++-
>>  xen/arch/x86/mm/hap/Makefile        |   1 +
>>  xen/arch/x86/mm/hap/altp2m_hap.c    | 103 +++++++++
>>  xen/arch/x86/mm/p2m-ept.c           |   3 +
>>  xen/arch/x86/mm/p2m.c               | 405 ++++++++++++++++++++++++++++++++++++
>>  xen/include/asm-x86/hvm/altp2mhvm.h |   4 +
>>  xen/include/asm-x86/p2m.h           |  33 +++
>>  7 files changed, 601 insertions(+), 8 deletions(-)
>>  create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c
>>
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index d75c12d..b758ee1 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -2786,10 +2786,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>>      p2m_access_t p2ma;
>>      mfn_t mfn;
>>      struct vcpu *v = current;
>> -    struct p2m_domain *p2m;
>> +    struct p2m_domain *p2m, *hostp2m;
>>      int rc, fall_through = 0, paged = 0;
>>      int sharing_enomem = 0;
>>      vm_event_request_t *req_ptr = NULL;
>> +    int altp2m_active = 0;
> 
> bool_t
> 
>>  
>>      /* On Nested Virtualization, walk the guest page table.
>>       * If this succeeds, all is fine.
>> @@ -2845,15 +2846,33 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>>      {
>>          if ( !handle_mmio_with_translation(gla, gpa >> PAGE_SHIFT, npfec) )
>>              hvm_inject_hw_exception(TRAP_gp_fault, 0);
>> -        rc = 1;
>> -        goto out;
>> +        return 1;
> 
> What is the justification for skipping the normal out: processing?
> 

At one point in the development of this patch, I had some code after
the out label that assumed at least 1 p2m lock was held. I observed
that at the point above, none of the conditions that require extra
processing after out could be true, so to avoid even more complication
I made the above change. Since the change after out: was later factored
out, the above change is no longer relevant, but it remains true that
none of the conditions requiring extra out: processing can be true here.

>>      }
>>  
>> -    p2m = p2m_get_hostp2m(v->domain);
>> -    mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 
>> +    altp2m_active = altp2mhvm_active(v->domain);
>> +
>> +    /* Take a lock on the host p2m speculatively, to avoid potential
>> +     * locking order problems later and to handle unshare etc.
>> +     */
>> +    hostp2m = p2m_get_hostp2m(v->domain);
>> +    mfn = get_gfn_type_access(hostp2m, gfn, &p2mt, &p2ma,
>>                                P2M_ALLOC | (npfec.write_access ? P2M_UNSHARE : 0),
>>                                NULL);
>>  
>> +    if ( altp2m_active )
>> +    {
>> +        if ( altp2mhvm_hap_nested_page_fault(v, gpa, gla, npfec, &p2m) == 1 )
>> +        {
>> +            /* entry was lazily copied from host -- retry */
>> +            __put_gfn(hostp2m, gfn);
>> +            return 1;
> 
> Again, please don't skip the out: processing.

Same thing. There is no possibility of extra out processing being
required. There is precedent for this: the MMIO bypass skips out
processing.

>> +    if ( rv ) {
> 
> Style (brace on new line)
> 
>> +        gdprintk(XENLOG_ERR,
>> +	    "failed to set entry for %#"PRIx64" -> %#"PRIx64"\n",
> 
> It would be useful to know more information, (which altp2m), and to
> prefer gfn over gpa to avoid mixing unqualified linear and frame numbers.

Ack on both counts. I copied this from somewhere else, and in my
private branch I carry a patch which logs much more info.

>> +bool_t p2m_destroy_altp2m_by_id(struct domain *d, uint16_t idx)
>> +{
>> +    struct p2m_domain *p2m;
>> +    struct vcpu *curr = current;
>> +    struct vcpu *v;
>> +    bool_t rc = 0;
>> +
>> +    if ( !idx || idx > MAX_ALTP2M )
>> +        return rc;
>> +
>> +    if ( curr->domain != d )
>> +        domain_pause(d);
>> +    else
>> +        for_each_vcpu( d, v )
>> +            if ( curr != v )
>> +                vcpu_pause(v);
> 
> This looks like some hoop jumping around the assertions in
> domain_pause() and vcpu_pause().
> 
> We should probably have some new helpers where the domain needs to be
> paused, possibly while in context.  The current domain/vcpu_pause() are
> almost always used where it is definitely not safe to pause in context,
> hence the assertions.
> 

It is. I'd be happy to use new helpers, but I don't feel qualified to
write them.

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-06-24 14:38   ` Jan Beulich
@ 2015-06-24 17:53     ` Ed White
  2015-06-25  8:12       ` Jan Beulich
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-24 17:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson,
	xen-devel, tlengyel, Daniel De Graaf

On 06/24/2015 07:38 AM, Jan Beulich wrote:
>>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
>> --- a/xen/include/asm-x86/p2m.h
>> +++ b/xen/include/asm-x86/p2m.h
>> @@ -237,6 +237,19 @@ struct p2m_domain {
>>                                         p2m_access_t *p2ma,
>>                                         p2m_query_t q,
>>                                         unsigned int *page_order);
>> +    int                (*set_entry_full)(struct p2m_domain *p2m,
>> +                                         unsigned long gfn,
>> +                                         mfn_t mfn, unsigned int page_order,
>> +                                         p2m_type_t p2mt,
>> +                                         p2m_access_t p2ma,
>> +                                         unsigned int sve);
>> +    mfn_t              (*get_entry_full)(struct p2m_domain *p2m,
>> +                                         unsigned long gfn,
>> +                                         p2m_type_t *p2mt,
>> +                                         p2m_access_t *p2ma,
>> +                                         p2m_query_t q,
>> +                                         unsigned int *page_order,
>> +                                         unsigned int *sve);
> 
> I have to admit that I find the _full suffixes here pretty odd. Based
> on the functionality, they should be _sve. But then it seems
> questionable how they could be useful to the generic p2m layer
> anyway, i.e. why there would need to be such hooks in the first
> place.

I did originally use _sve suffixes. I changed them because there
may be some future case where these routines control some other
EPTE bit too. I made them hooks because I thought calling ept...
functions directly would be a layering violation.

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 11/12] x86/altp2m: Add altp2mhvm HVM domain parameter.
  2015-06-24 14:59   ` Jan Beulich
@ 2015-06-24 17:57     ` Ed White
  2015-06-24 18:08       ` Andrew Cooper
  2015-06-25  8:33       ` Jan Beulich
  0 siblings, 2 replies; 116+ messages in thread
From: Ed White @ 2015-06-24 17:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson,
	xen-devel, tlengyel, Daniel De Graaf

On 06/24/2015 07:59 AM, Jan Beulich wrote:
>>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
>> +    case HVM_PARAM_ALTP2MHVM:
>> +        if ( a.value > 1 )
>> +            rc = -EINVAL;
>> +        if ( a.value &&
>> +             d->arch.hvm_domain.params[HVM_PARAM_NESTEDHVM] )
>> +            rc = -EINVAL;
>> +        break;
> 
> As you added the new param to the change-once section of
> hvm_allow_set_param() - what is this code good for?

I don't understand. How does change-once invalidate this code?

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 08/12] x86/altp2m: alternate p2m memory events.
  2015-06-24 16:01   ` Lengyel, Tamas
@ 2015-06-24 18:02     ` Ed White
  0 siblings, 0 replies; 116+ messages in thread
From: Ed White @ 2015-06-24 18:02 UTC (permalink / raw)
  To: Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/24/2015 09:01 AM, Lengyel, Tamas wrote:
> On Mon, Jun 22, 2015 at 2:56 PM, Ed White <edmund.h.white@intel.com> wrote:
> 
>> Add a flag to indicate that a memory event occurred in an alternate p2m
>> and a field containing the p2m index. Allow the response to switch to
>> a different p2m using the same flag and field.
>>
>> Modify p2m_access_check() to handle alternate p2m's.
>>
>> Signed-off-by: Ed White <edmund.h.white@intel.com>
>> ---
>>  xen/arch/x86/mm/p2m.c         | 20 +++++++++++++++++++-
>>  xen/include/asm-arm/p2m.h     |  7 +++++++
>>  xen/include/asm-x86/p2m.h     |  4 ++++
>>  xen/include/public/vm_event.h | 13 ++++++++++++-
>>  xen/include/xen/mem_access.h  |  1 +
>>  5 files changed, 43 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
>> index 87b4b75..389360a 100644
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -1516,6 +1516,13 @@ void p2m_mem_access_emulate_check(struct vcpu *v,
>>      }
>>  }
>>
>> +void p2m_mem_access_altp2m_check(struct vcpu *v, const vm_event_response_t *rsp)
>> +{
>> +    if ( (rsp->flags & MEM_ACCESS_ALTERNATE_P2M) &&
>> +         altp2mhvm_active(v->domain) )
>> +        p2m_switch_vcpu_altp2m_by_id(v, rsp->u.mem_access.altp2m_idx);
>> +}
>>
> 
> The function should be renamed p2m_altp2m_check as it is not really
> required to use mem_access at all to be able use altp2m. See my comment
> below.
> 
> 
>> +
>>  bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
>>                              struct npfec npfec,
>>                              vm_event_request_t **req_ptr)
>> @@ -1523,7 +1530,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
>>      struct vcpu *v = current;
>>      unsigned long gfn = gpa >> PAGE_SHIFT;
>>      struct domain *d = v->domain;
>> -    struct p2m_domain* p2m = p2m_get_hostp2m(d);
>> +    struct p2m_domain *p2m = NULL;
>>      mfn_t mfn;
>>      p2m_type_t p2mt;
>>      p2m_access_t p2ma;
>> @@ -1531,6 +1538,11 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
>>      int rc;
>>      unsigned long eip = guest_cpu_user_regs()->eip;
>>
>> +    if ( altp2mhvm_active(d) )
>> +        p2m = p2m_get_altp2m(v);
>> +    if ( !p2m )
>> +        p2m = p2m_get_hostp2m(d);
>> +
>>      /* First, handle rx2rw conversion automatically.
>>       * These calls to p2m->set_entry() must succeed: we have the gfn
>>       * locked and just did a successful get_entry(). */
>> @@ -1637,6 +1649,12 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
>>          req->vcpu_id = v->vcpu_id;
>>
>>          p2m_vm_event_fill_regs(req);
>> +
>> +        if ( altp2mhvm_active(v->domain) )
>> +        {
>> +            req->flags |= MEM_ACCESS_ALTERNATE_P2M;
>> +            req->u.mem_access.altp2m_idx = vcpu_altp2mhvm(v).p2midx;
>> +        }
>>      }
>>
>>      /* Pause the current VCPU */
>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>> index 63748ef..b31dd6f 100644
>> --- a/xen/include/asm-arm/p2m.h
>> +++ b/xen/include/asm-arm/p2m.h
>> @@ -109,6 +109,13 @@ void p2m_mem_access_emulate_check(struct vcpu *v,
>>      /* Not supported on ARM. */
>>  }
>>
>> +static inline
>> +void p2m_mem_access_altp2m_check(struct vcpu *v,
>> +                                const mem_event_response_t *rsp)
>> +{
>> +    /* Not supported on ARM. */
>> +}
>> +
>>  #define p2m_is_foreign(_t)  ((_t) == p2m_map_foreign)
>>  #define p2m_is_ram(_t)      ((_t) == p2m_ram_rw || (_t) == p2m_ram_ro)
>>
>> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
>> index 16fd523..d84da33 100644
>> --- a/xen/include/asm-x86/p2m.h
>> +++ b/xen/include/asm-x86/p2m.h
>> @@ -762,6 +762,10 @@ bool_t p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp, unsigned long *i
>>  /* Switch alternate p2m for a single vcpu */
>>  bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, uint16_t idx);
>>
>> +/* Check to see if vcpu should be switched to a different p2m. */
>> +void p2m_mem_access_altp2m_check(struct vcpu *v,
>> +                                 const vm_event_response_t *rsp);
>> +
>>  /*
>>   * p2m type to IOMMU flags
>>   */
>> diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
>> index 577e971..b492f65 100644
>> --- a/xen/include/public/vm_event.h
>> +++ b/xen/include/public/vm_event.h
>> @@ -149,13 +149,24 @@ struct vm_event_regs_x86 {
>>   * potentially having side effects (like memory mapped or port I/O) disabled.
>>   */
>>  #define MEM_ACCESS_EMULATE_NOWRITE      (1 << 7)
>> +/*
>> + * This flag can be set in a request or a response
>> + *
>> + * On a request, indicates that the event occurred in the alternate p2m specified by
>> + * the altp2m_idx request field.
>> + *
>> + * On a response, indicates that the VCPU should resume in the alternate p2m specified
>> + * by the altp2m_idx response field if possible.
>> + */
>> +#define MEM_ACCESS_ALTERNATE_P2M        (1 << 8)
>>
> 
> This definition should be renamed VM_EVENT_FLAG_ALTERNATE_P2M and moved to
> the appropriate location. It should also be checked for all events, not
> just for mem_access, similar to how VM_EVENT_FLAG_VCPU_PAUSED is checked
> for, as we might want to switch views in response to a variety of events.
> Right now I worked around this be specifying the response to a singlestep
> event as if it was a response to a mem_access one, but that's very hackish.

Good points. I'll do as you suggest.
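
From the monitor's side, a response would then look something like this
(sketch; VM_EVENT_FLAG_ALTERNATE_P2M is the proposed name, not an existing
definition, and restricted_view_idx is a placeholder for the view to resume
in):

    vm_event_response_t rsp = {
        .version = VM_EVENT_INTERFACE_VERSION,
        .vcpu_id = req.vcpu_id,
        .flags   = VM_EVENT_FLAG_VCPU_PAUSED | VM_EVENT_FLAG_ALTERNATE_P2M,
        .u.mem_access.altp2m_idx = restricted_view_idx,
    };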

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-24 16:15   ` Lengyel, Tamas
@ 2015-06-24 18:06     ` Ed White
  2015-06-25  8:52       ` Ian Campbell
  2015-06-25 12:44       ` Lengyel, Tamas
  0 siblings, 2 replies; 116+ messages in thread
From: Ed White @ 2015-06-24 18:06 UTC (permalink / raw)
  To: Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/24/2015 09:15 AM, Lengyel, Tamas wrote:
>> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
>> +                                 unsigned long pfn, xenmem_access_t access)
>> +{
>>
> 
> This function IMHO should be merged with p2m_set_mem_access and should be
> triggerable with the same memop (XENMEM_access_op) hypercall instead of
> introducing a new hvmop one.

I think we should vote on this. My view is that it makes XENMEM_access_op
too complicated to use. It also makes using this one specific altp2m
capability different to using any of the others -- especially if we adopt
Andrew's suggestion and make all the altp2m ops subops.

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 11/12] x86/altp2m: Add altp2mhvm HVM domain parameter.
  2015-06-24 17:57     ` Ed White
@ 2015-06-24 18:08       ` Andrew Cooper
  2015-06-25  8:34         ` Jan Beulich
  2015-06-25  8:33       ` Jan Beulich
  1 sibling, 1 reply; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24 18:08 UTC (permalink / raw)
  To: Ed White, Jan Beulich
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, xen-devel,
	tlengyel, Daniel De Graaf

On 24/06/15 18:57, Ed White wrote:
> On 06/24/2015 07:59 AM, Jan Beulich wrote:
>>>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
>>> +    case HVM_PARAM_ALTP2MHVM:
>>> +        if ( a.value > 1 )
>>> +            rc = -EINVAL;
>>> +        if ( a.value &&
>>> +             d->arch.hvm_domain.params[HVM_PARAM_NESTEDHVM] )
>>> +            rc = -EINVAL;
>>> +        break;
>> As you added the new param to the change-once section of
>> hvm_allow_set_param() - what is this code good for?
> I don't understand. How does change-once invalidate this code?

I don't believe it does.  This code (appears to) enforce the param being
a boolean, and exclusive against HVM_PARAM_NESTEDHVM.

~Andrew

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-24 17:47     ` Ed White
@ 2015-06-24 18:19       ` Andrew Cooper
  2015-06-26 16:30         ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: Andrew Cooper @ 2015-06-24 18:19 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Jan Beulich,
	tlengyel, Daniel De Graaf

On 24/06/15 18:47, Ed White wrote:
>> > This looks like some hoop jumping around the assertions in
>> > domain_pause() and vcpu_pause().
>> > 
>> > We should probably have some new helpers where the domain needs to be
>> > paused, possibly while in context.  The current domain/vcpu_pause() are
>> > almost always used where it is definitely not safe to pause in context,
>> > hence the assertions.
>> > 
> It is. I'd be happy to use new helpers, I don't feel qualified to
> write them.
>
> Ed

Something like this?  Only compile-tested.  In the meantime, I have an
optimisation in mind for domain_pause() on domains with large numbers of
vcpus, but that will have to wait a while.

From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Wed, 24 Jun 2015 19:06:14 +0100
Subject: [PATCH] common/domain: Helpers to pause a domain while in context

For use on codepaths which would need to use domain_pause() but might be in
the target domain's context.  In the case that the target domain is in
context, all other vcpus are paused.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/common/domain.c     |   28 ++++++++++++++++++++++++++++
 xen/include/xen/sched.h |    5 +++++
 2 files changed, 33 insertions(+)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 3bc52e6..a1d27e3 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1010,6 +1010,34 @@ int domain_unpause_by_systemcontroller(struct domain *d)
     return 0;
 }
 
+void domain_pause_except_self(struct domain *d)
+{
+    struct vcpu *v, *curr = current;
+
+    if ( curr->domain == d )
+    {
+        for_each_vcpu( d, v )
+            if ( likely(v != curr) )
+                vcpu_pause(v);
+    }
+    else
+        domain_pause(d);
+}
+
+void domain_unpause_except_self(struct domain *d)
+{
+    struct vcpu *v, *curr = current;
+
+    if ( curr->domain == d )
+    {
+        for_each_vcpu( d, v )
+            if ( likely(v != curr) )
+                vcpu_unpause(v);
+    }
+    else
+        domain_unpause(d);
+}
+
 int vcpu_reset(struct vcpu *v)
 {
     struct domain *d = v->domain;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index b29d9e7..8e1345a 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -804,6 +804,11 @@ static inline int domain_pause_by_systemcontroller_nosync(struct domain *d)
 {
     return __domain_pause_by_systemcontroller(d, domain_pause_nosync);
 }
+
+/* domain_pause() but safe against trying to pause current. */
+void domain_pause_except_self(struct domain *d);
+void domain_unpause_except_self(struct domain *d);
+
 void cpu_init(void);
 
 struct scheduler;
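
With these in place, the destroy path quoted earlier would reduce to
something like (sketch):

    domain_pause_except_self(d);
    /* ... tear down the altp2m ... */
    domain_unpause_except_self(d);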

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 06/12] VMX: add VMFUNC leaf 0 (EPTP switching) to emulator.
  2015-06-24 12:47   ` Andrew Cooper
@ 2015-06-24 20:29     ` Ed White
  2015-06-25  8:26       ` Jan Beulich
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-24 20:29 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 06/24/2015 05:47 AM, Andrew Cooper wrote:
>> +    case EXIT_REASON_VMFUNC:
>> +        if ( vmx_vmfunc_intercept(regs) == X86EMUL_OKAY )
> 
> This is currently an unconditional failure, and I don't see subsequent
> patches which alter vmx_vmfunc_intercept().  Shouldn't
> vmx_vmfunc_intercept() switch on eax and optionally call
> p2m_switch_vcpu_altp2m_by_id()?

If the VMFUNC instruction was valid, the hardware would have executed it.
The only time a VMFUNC exit occurs is if the hardware supports VMFUNC
and the hypervisor has enabled it, but the VMFUNC instruction is
invalid in some way and can't be executed (because EAX != 0, for example).

There are only two choices: crash the domain or inject #UD (which is the
closest analogue to what happens in the absence of a hypervisor and will
probably crash the OS in the domain). I chose the latter in the code I
originally wrote; Ravi chose the former in his patch. I don't have a
strong opinion either way, but I think these are the only two choices.

I hope this answers Jan's question in another email on the same subject.

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-24 16:43         ` Ed White
@ 2015-06-24 21:34           ` Lengyel, Tamas
  2015-06-24 22:02             ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-24 21:34 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Razvan Cojocaru, Tim Deegan, Ian Jackson,
	Xen-devel, Jan Beulich, Andrew Cooper, Daniel De Graaf


On Wed, Jun 24, 2015 at 12:43 PM, Ed White <edmund.h.white@intel.com> wrote:

> On 06/24/2015 06:37 AM, Razvan Cojocaru wrote:
> > On 06/24/2015 04:32 PM, Lengyel, Tamas wrote:
> >>
> >>
> >> On Wed, Jun 24, 2015 at 1:39 AM, Razvan Cojocaru
> >> <rcojocaru@bitdefender.com> wrote:
> >>
> >>     On 06/24/2015 12:27 AM, Lengyel, Tamas wrote:
> >>     > I've extended xen-access to exercise this new feature taking into
> >>     > account some of the current limitations. Using the altp2m_write|exec
> >>     > options we create a duplicate view of the default hostp2m, and instead
> >>     > of relaxing the mem_access permissions when we encounter a violation, we
> >>     > swap the view on the violating vCPU while also enabling MTF
> >>     > singlestepping. When the singlestep event fires, we use the response to
> >>     > that event to swap the view back to the restricted altp2m view.
> >>
> >>     That's certainly very interesting. I wonder what the benefits are in
> >>     this case over emulating the fault-causing instruction (other than
> >>     obviously not going through the emulator)? The altp2m method would
> >>     certainly be slower, since you need more round-trips from userspace to
> >>     the hypervisor (the EPT vm_event handling + the singlestep event,
> >>     whereas with emulation you just reply to the original vm_event).
> >>
> >>
> >>     Regards,
> >>     Razvan
> >>
> >>
> >> Certainly, this is pretty slow right now, especially for the altp2m_exec
> >> case. However, sometimes you simply cannot emulate. For example if you
> >> write breakpoints into target locations, the original instruction has
> >> been overwritten with 0xCC. If you have a duplicate of the page without
> >> the breakpoint, this is an easy way to make the guest fetch the original
> >> instruction. Of course, if you extend the emulation routine where you
> >> can provide the instruction to emulate, instead of it being fetched from
> >> guest memory, that would be equally useful ;)
> >
> > Makes sense, thanks for the explanation! Sure, sending back the
> > instruction to emulate could be something to consider for the future.
> >
> >
> > Thanks,
> > Razvan
> >
>
> One thing I'd add is that what Tamas has done provides a valuable test that
> the cross-domain functionality works, even if it might not be a recommended
> design pattern. Our primary use case at Intel is intra-domain, and there
> the advantages of avoiding many exits are clear.
>
> Also, even cross-domain usage allows for different views of, and levels of
> access to, memory concurrently on different vcpus.
>
> Ed
>


Hi Ed,
I tried the system using memsharing and I collected the following crash
log. In this test I ran memsharing on all pages of the domain before
activating altp2m and creating the view. Afterwards I used my updated
xen-access to create a copy of this p2m with only R/X permissions. The idea
would be that the altp2m view remains completely shared, while the hostp2m
would be able to do its CoW propagation as the domain is executing.

(XEN) mm locking order violation: 278 > 239
(XEN) Xen BUG at mm-locks.h:68
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff82d0801f8768>] p2m_altp2m_propagate_change+0x85/0x4a9
(XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor (d6v0)
(XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000000
(XEN) rdx: ffff8302163a8000   rsi: 000000000000000a   rdi: ffff82d0802a069c
(XEN) rbp: ffff8302163afa68   rsp: ffff8302163af9e8   r8:  ffff83021c000000
(XEN) r9:  0000000000000003   r10: 00000000000000ef   r11: 0000000000000003
(XEN) r12: ffff83010cc51820   r13: 0000000000000000   r14: ffff830158d90000
(XEN) r15: 0000000000025697   cr0: 0000000080050033   cr4: 00000000001526f0
(XEN) cr3: 00000000dbba3000   cr2: 00000000778c9714
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff8302163af9e8:
(XEN)    ffff8302163af9f8 00000000803180f8 000000000000000c ffff82d0801892ee
(XEN)    ffff82d0801fb4d1 ffff83010cc51de0 000000000008ff49 ffff82d08012f86a
(XEN)    ffff83010cc51820 ffff83010cc51820 0000000000000000 0000000000000000
(XEN)    ffff83010cc51820 0000000000000000 ffff8300dbb334b8 ffff8302163afa00
(XEN)    ffff8302163afb18 ffff82d0801fd549 0000000500000009 ffff830200000001
(XEN)    0000000000000001 ffff830158d90000 0000000000000002 000000000008ff49
(XEN)    0000000000025697 000000000000000c ffff8302163afae8 80c000008ff49175
(XEN)    80c00000d0a97175 01ff83010cc51820 0000000000000097 ffff8300dbb33000
(XEN)    ffff8302163afb78 000000000008ff49 0000000000000000 0000000000000001
(XEN)    0000000000025697 ffff83010cc51820 ffff8302163afb38 ffff82d0801fd644
(XEN)    ffffffffffffffff 00000000000d0a97 ffff8302163afb98 ffff82d0801f23c5
(XEN)    ffff830158d90000 000000000cc51820 ffff830158d90000 000000000000000c
(XEN)    000000000008ff49 ffff83010cc51820 0000000000025697 00000000000d0a97
(XEN)    000000000008ff49 ffff830158d90000 ffff8302163afbd8 ffff82d0801f45c8
(XEN)    ffff83010cc51820 000000000000000c ffff83008fd41170 000000000008ff49
(XEN)    0000000000025697 ffff82e001a152e0 ffff8302163afc58 ffff82d080205b51
(XEN)    0000000000000009 000000000008ff49 ffff8300d0a97000 ffff83008fd41160
(XEN)    ffff82e001a152f0 ffff82e0011fe920 ffff83010cc51820 0000000c00000000
(XEN)    0000000000025697 0000000000000003 ffff83010cc51820 ffff8302163afd34
(XEN)    0000000000025697 0000000000000000 ffff8302163afca8 ffff82d0801f1f7d
(XEN) Xen call trace:
(XEN)    [<ffff82d0801f8768>] p2m_altp2m_propagate_change+0x85/0x4a9
(XEN)    [<ffff82d0801fd549>] ept_set_entry_sve+0x5fa/0x6e6
(XEN)    [<ffff82d0801fd644>] ept_set_entry+0xf/0x11
(XEN)    [<ffff82d0801f23c5>] p2m_set_entry+0xd4/0x112
(XEN)    [<ffff82d0801f45c8>] set_shared_p2m_entry+0x2d0/0x39b
(XEN)    [<ffff82d080205b51>] __mem_sharing_unshare_page+0x83f/0xbd6
(XEN)    [<ffff82d0801f1f7d>] __get_gfn_type_access+0x224/0x2b0
(XEN)    [<ffff82d0801c6df5>] hvm_hap_nested_page_fault+0x21f/0x795
(XEN)    [<ffff82d0801e86ae>] vmx_vmexit_handler+0x1764/0x1af3
(XEN)    [<ffff82d0801ee891>] vmx_asm_vmexit_handler+0x41/0xc0

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-24 21:34           ` Lengyel, Tamas
@ 2015-06-24 22:02             ` Ed White
  2015-06-24 22:45               ` Lengyel, Tamas
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-24 22:02 UTC (permalink / raw)
  To: Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Razvan Cojocaru, Tim Deegan, Ian Jackson,
	Xen-devel, Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/24/2015 02:34 PM, Lengyel, Tamas wrote:
> Hi Ed,
> I tried the system using memsharing and I collected the following crash
> log. In this test I ran memsharing on all pages of the domain before
> activating altp2m and creating the view. Afterwards I used my updated
> xen-access to create a copy of this p2m with only R/X permissions. The idea
> would be that the altp2m view remains completely shared, while the hostp2m
> would be able to do its CoW propagation as the domain is executing.
> 
> (XEN) mm locking order violation: 278 > 239
> (XEN) Xen BUG at mm-locks.h:68
> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    2
> (XEN) RIP:    e008:[<ffff82d0801f8768>] p2m_altp2m_propagate_change+0x85/0x4a9
> (XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor (d6v0)
> (XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000000
> (XEN) rdx: ffff8302163a8000   rsi: 000000000000000a   rdi: ffff82d0802a069c
> (XEN) rbp: ffff8302163afa68   rsp: ffff8302163af9e8   r8:  ffff83021c000000
> (XEN) r9:  0000000000000003   r10: 00000000000000ef   r11: 0000000000000003
> (XEN) r12: ffff83010cc51820   r13: 0000000000000000   r14: ffff830158d90000
> (XEN) r15: 0000000000025697   cr0: 0000000080050033   cr4: 00000000001526f0
> (XEN) cr3: 00000000dbba3000   cr2: 00000000778c9714
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen stack trace from rsp=ffff8302163af9e8:
> (XEN)    ffff8302163af9f8 00000000803180f8 000000000000000c ffff82d0801892ee
> (XEN)    ffff82d0801fb4d1 ffff83010cc51de0 000000000008ff49 ffff82d08012f86a
> (XEN)    ffff83010cc51820 ffff83010cc51820 0000000000000000 0000000000000000
> (XEN)    ffff83010cc51820 0000000000000000 ffff8300dbb334b8 ffff8302163afa00
> (XEN)    ffff8302163afb18 ffff82d0801fd549 0000000500000009 ffff830200000001
> (XEN)    0000000000000001 ffff830158d90000 0000000000000002 000000000008ff49
> (XEN)    0000000000025697 000000000000000c ffff8302163afae8 80c000008ff49175
> (XEN)    80c00000d0a97175 01ff83010cc51820 0000000000000097 ffff8300dbb33000
> (XEN)    ffff8302163afb78 000000000008ff49 0000000000000000 0000000000000001
> (XEN)    0000000000025697 ffff83010cc51820 ffff8302163afb38 ffff82d0801fd644
> (XEN)    ffffffffffffffff 00000000000d0a97 ffff8302163afb98 ffff82d0801f23c5
> (XEN)    ffff830158d90000 000000000cc51820 ffff830158d90000 000000000000000c
> (XEN)    000000000008ff49 ffff83010cc51820 0000000000025697 00000000000d0a97
> (XEN)    000000000008ff49 ffff830158d90000 ffff8302163afbd8 ffff82d0801f45c8
> (XEN)    ffff83010cc51820 000000000000000c ffff83008fd41170 000000000008ff49
> (XEN)    0000000000025697 ffff82e001a152e0 ffff8302163afc58 ffff82d080205b51
> (XEN)    0000000000000009 000000000008ff49 ffff8300d0a97000 ffff83008fd41160
> (XEN)    ffff82e001a152f0 ffff82e0011fe920 ffff83010cc51820 0000000c00000000
> (XEN)    0000000000025697 0000000000000003 ffff83010cc51820 ffff8302163afd34
> (XEN)    0000000000025697 0000000000000000 ffff8302163afca8 ffff82d0801f1f7d
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0801f8768>] p2m_altp2m_propagate_change+0x85/0x4a9
> (XEN)    [<ffff82d0801fd549>] ept_set_entry_sve+0x5fa/0x6e6
> (XEN)    [<ffff82d0801fd644>] ept_set_entry+0xf/0x11
> (XEN)    [<ffff82d0801f23c5>] p2m_set_entry+0xd4/0x112
> (XEN)    [<ffff82d0801f45c8>] set_shared_p2m_entry+0x2d0/0x39b
> (XEN)    [<ffff82d080205b51>] __mem_sharing_unshare_page+0x83f/0xbd6
> (XEN)    [<ffff82d0801f1f7d>] __get_gfn_type_access+0x224/0x2b0
> (XEN)    [<ffff82d0801c6df5>] hvm_hap_nested_page_fault+0x21f/0x795
> (XEN)    [<ffff82d0801e86ae>] vmx_vmexit_handler+0x1764/0x1af3
> (XEN)    [<ffff82d0801ee891>] vmx_asm_vmexit_handler+0x41/0xc0

The crash here is because I haven't successfully forced all the shared
pages in the host p2m to become unshared before copying,
which is the intended behaviour.

I think I know how that has happened and how to fix it, but what you're
trying to do won't work by design. By the time a copy from host p2m to
altp2m occurs, the sharing is supposed to be broken.

You're coming up with some ways of attempting to use altp2m that we
hadn't thought of. That's a good thing, and just what we want, but
there are limits to what we can support without more far-reaching
changes to existing parts of Xen. This isn't going to be do-able for
4.6.

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-24 22:02             ` Ed White
@ 2015-06-24 22:45               ` Lengyel, Tamas
  2015-06-24 22:55                 ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-24 22:45 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Razvan Cojocaru, Tim Deegan, Ian Jackson,
	Xen-devel, Jan Beulich, Andrew Cooper, Daniel De Graaf


On Wed, Jun 24, 2015 at 6:02 PM, Ed White <edmund.h.white@intel.com> wrote:

> On 06/24/2015 02:34 PM, Lengyel, Tamas wrote:
> > Hi Ed,
> > I tried the system using memsharing and I collected the following crash
> > log. In this test I ran memsharing on all pages of the domain before
> > activating altp2m and creating the view. Afterwards I used my updated
> > xen-access to create a copy of this p2m with only R/X permissions. The idea
> > would be that the altp2m view remains completely shared, while the hostp2m
> > would be able to do its CoW propagation as the domain is executing.
> >
> > (XEN) mm locking order violation: 278 > 239
> > (XEN) Xen BUG at mm-locks.h:68
> > (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> > (XEN) CPU:    2
> > (XEN) RIP:    e008:[<ffff82d0801f8768>] p2m_altp2m_propagate_change+0x85/0x4a9
> > (XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor (d6v0)
> > (XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000000
> > (XEN) rdx: ffff8302163a8000   rsi: 000000000000000a   rdi: ffff82d0802a069c
> > (XEN) rbp: ffff8302163afa68   rsp: ffff8302163af9e8   r8:  ffff83021c000000
> > (XEN) r9:  0000000000000003   r10: 00000000000000ef   r11: 0000000000000003
> > (XEN) r12: ffff83010cc51820   r13: 0000000000000000   r14: ffff830158d90000
> > (XEN) r15: 0000000000025697   cr0: 0000000080050033   cr4: 00000000001526f0
> > (XEN) cr3: 00000000dbba3000   cr2: 00000000778c9714
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > (XEN) Xen stack trace from rsp=ffff8302163af9e8:
> > (XEN)    ffff8302163af9f8 00000000803180f8 000000000000000c ffff82d0801892ee
> > (XEN)    ffff82d0801fb4d1 ffff83010cc51de0 000000000008ff49 ffff82d08012f86a
> > (XEN)    ffff83010cc51820 ffff83010cc51820 0000000000000000 0000000000000000
> > (XEN)    ffff83010cc51820 0000000000000000 ffff8300dbb334b8 ffff8302163afa00
> > (XEN)    ffff8302163afb18 ffff82d0801fd549 0000000500000009 ffff830200000001
> > (XEN)    0000000000000001 ffff830158d90000 0000000000000002 000000000008ff49
> > (XEN)    0000000000025697 000000000000000c ffff8302163afae8 80c000008ff49175
> > (XEN)    80c00000d0a97175 01ff83010cc51820 0000000000000097 ffff8300dbb33000
> > (XEN)    ffff8302163afb78 000000000008ff49 0000000000000000 0000000000000001
> > (XEN)    0000000000025697 ffff83010cc51820 ffff8302163afb38 ffff82d0801fd644
> > (XEN)    ffffffffffffffff 00000000000d0a97 ffff8302163afb98 ffff82d0801f23c5
> > (XEN)    ffff830158d90000 000000000cc51820 ffff830158d90000 000000000000000c
> > (XEN)    000000000008ff49 ffff83010cc51820 0000000000025697 00000000000d0a97
> > (XEN)    000000000008ff49 ffff830158d90000 ffff8302163afbd8 ffff82d0801f45c8
> > (XEN)    ffff83010cc51820 000000000000000c ffff83008fd41170 000000000008ff49
> > (XEN)    0000000000025697 ffff82e001a152e0 ffff8302163afc58 ffff82d080205b51
> > (XEN)    0000000000000009 000000000008ff49 ffff8300d0a97000 ffff83008fd41160
> > (XEN)    ffff82e001a152f0 ffff82e0011fe920 ffff83010cc51820 0000000c00000000
> > (XEN)    0000000000025697 0000000000000003 ffff83010cc51820 ffff8302163afd34
> > (XEN)    0000000000025697 0000000000000000 ffff8302163afca8 ffff82d0801f1f7d
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d0801f8768>] p2m_altp2m_propagate_change+0x85/0x4a9
> > (XEN)    [<ffff82d0801fd549>] ept_set_entry_sve+0x5fa/0x6e6
> > (XEN)    [<ffff82d0801fd644>] ept_set_entry+0xf/0x11
> > (XEN)    [<ffff82d0801f23c5>] p2m_set_entry+0xd4/0x112
> > (XEN)    [<ffff82d0801f45c8>] set_shared_p2m_entry+0x2d0/0x39b
> > (XEN)    [<ffff82d080205b51>] __mem_sharing_unshare_page+0x83f/0xbd6
> > (XEN)    [<ffff82d0801f1f7d>] __get_gfn_type_access+0x224/0x2b0
> > (XEN)    [<ffff82d0801c6df5>] hvm_hap_nested_page_fault+0x21f/0x795
> > (XEN)    [<ffff82d0801e86ae>] vmx_vmexit_handler+0x1764/0x1af3
> > (XEN)    [<ffff82d0801ee891>] vmx_asm_vmexit_handler+0x41/0xc0
>
> The crash here is because I haven't successfully forced all the shared
> pages in the host p2m to become unshared before copying,
> which is the intended behaviour.
>
> I think I know how that has happened and how to fix it, but what you're
> trying to do won't work by design. By the time a copy from host p2m to
> altp2m occurs, the sharing is supposed to be broken.
>

Hm. If the sharing gets broken before the hostp2m->altp2m copy, maybe doing
sharing after the view has been created is a better route? I guess the
sharing code would need to be adapted to check if altp2m is enabled for
that to work..


>
> You're coming up with some ways of attempting to use altp2m that we
> hadn't thought of. That's a good thing, and just what we want, but
> there are limits to what we can support without more far-reaching
> changes to existing parts of Xen. This isn't going to be do-able for
> 4.6.
>

My main concern is just getting it to work; hitting 4.6 is not a priority.
I understand that my stuff is highly experimental ;) While the gfn
remapping feature is intriguing, in my setup I already have a copy of the
page I would want to present during a singlestep-altp2m-switch - in the
origin domain's memory. AFAIU the gfn remapping would work only within the
domain's existing p2m space.



-- 

Tamas K Lengyel

Senior Security Researcher

7921 Jones Branch Drive

McLean VA 22102

Email  tlengyel@novetta.com


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-24 22:45               ` Lengyel, Tamas
@ 2015-06-24 22:55                 ` Ed White
  2015-06-25  9:00                   ` Andrew Cooper
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-24 22:55 UTC (permalink / raw)
  To: Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Razvan Cojocaru, Tim Deegan, Ian Jackson,
	Xen-devel, Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/24/2015 03:45 PM, Lengyel, Tamas wrote:
> On Wed, Jun 24, 2015 at 6:02 PM, Ed White <edmund.h.white@intel.com> wrote:
> 
>> On 06/24/2015 02:34 PM, Lengyel, Tamas wrote:
>>> Hi Ed,
>>> I tried the system using memsharing and I collected the following crash
>>> log. In this test I ran memsharing on all pages of the domain before
>>> activating altp2m and creating the view. Afterwards I used my updated
>>> xen-access to create a copy of this p2m with only R/X permissions. The idea
>>> would be that the altp2m view remains completely shared, while the hostp2m
>>> would be able to do its CoW propagation as the domain is executing.
>>>
>>> (XEN) mm locking order violation: 278 > 239
>>> (XEN) Xen BUG at mm-locks.h:68
>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>>> (XEN) CPU:    2
>>> (XEN) RIP:    e008:[<ffff82d0801f8768>] p2m_altp2m_propagate_change+0x85/0x4a9
>>> (XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor (d6v0)
>>> (XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000000
>>> (XEN) rdx: ffff8302163a8000   rsi: 000000000000000a   rdi: ffff82d0802a069c
>>> (XEN) rbp: ffff8302163afa68   rsp: ffff8302163af9e8   r8:  ffff83021c000000
>>> (XEN) r9:  0000000000000003   r10: 00000000000000ef   r11: 0000000000000003
>>> (XEN) r12: ffff83010cc51820   r13: 0000000000000000   r14: ffff830158d90000
>>> (XEN) r15: 0000000000025697   cr0: 0000000080050033   cr4: 00000000001526f0
>>> (XEN) cr3: 00000000dbba3000   cr2: 00000000778c9714
>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>> (XEN) Xen stack trace from rsp=ffff8302163af9e8:
>>> (XEN)    ffff8302163af9f8 00000000803180f8 000000000000000c ffff82d0801892ee
>>> (XEN)    ffff82d0801fb4d1 ffff83010cc51de0 000000000008ff49 ffff82d08012f86a
>>> (XEN)    ffff83010cc51820 ffff83010cc51820 0000000000000000 0000000000000000
>>> (XEN)    ffff83010cc51820 0000000000000000 ffff8300dbb334b8 ffff8302163afa00
>>> (XEN)    ffff8302163afb18 ffff82d0801fd549 0000000500000009 ffff830200000001
>>> (XEN)    0000000000000001 ffff830158d90000 0000000000000002 000000000008ff49
>>> (XEN)    0000000000025697 000000000000000c ffff8302163afae8 80c000008ff49175
>>> (XEN)    80c00000d0a97175 01ff83010cc51820 0000000000000097 ffff8300dbb33000
>>> (XEN)    ffff8302163afb78 000000000008ff49 0000000000000000 0000000000000001
>>> (XEN)    0000000000025697 ffff83010cc51820 ffff8302163afb38 ffff82d0801fd644
>>> (XEN)    ffffffffffffffff 00000000000d0a97 ffff8302163afb98 ffff82d0801f23c5
>>> (XEN)    ffff830158d90000 000000000cc51820 ffff830158d90000 000000000000000c
>>> (XEN)    000000000008ff49 ffff83010cc51820 0000000000025697 00000000000d0a97
>>> (XEN)    000000000008ff49 ffff830158d90000 ffff8302163afbd8 ffff82d0801f45c8
>>> (XEN)    ffff83010cc51820 000000000000000c ffff83008fd41170 000000000008ff49
>>> (XEN)    0000000000025697 ffff82e001a152e0 ffff8302163afc58 ffff82d080205b51
>>> (XEN)    0000000000000009 000000000008ff49 ffff8300d0a97000 ffff83008fd41160
>>> (XEN)    ffff82e001a152f0 ffff82e0011fe920 ffff83010cc51820 0000000c00000000
>>> (XEN)    0000000000025697 0000000000000003 ffff83010cc51820 ffff8302163afd34
>>> (XEN)    0000000000025697 0000000000000000 ffff8302163afca8 ffff82d0801f1f7d
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d0801f8768>] p2m_altp2m_propagate_change+0x85/0x4a9
>>> (XEN)    [<ffff82d0801fd549>] ept_set_entry_sve+0x5fa/0x6e6
>>> (XEN)    [<ffff82d0801fd644>] ept_set_entry+0xf/0x11
>>> (XEN)    [<ffff82d0801f23c5>] p2m_set_entry+0xd4/0x112
>>> (XEN)    [<ffff82d0801f45c8>] set_shared_p2m_entry+0x2d0/0x39b
>>> (XEN)    [<ffff82d080205b51>] __mem_sharing_unshare_page+0x83f/0xbd6
>>> (XEN)    [<ffff82d0801f1f7d>] __get_gfn_type_access+0x224/0x2b0
>>> (XEN)    [<ffff82d0801c6df5>] hvm_hap_nested_page_fault+0x21f/0x795
>>> (XEN)    [<ffff82d0801e86ae>] vmx_vmexit_handler+0x1764/0x1af3
>>> (XEN)    [<ffff82d0801ee891>] vmx_asm_vmexit_handler+0x41/0xc0
>>
>> The crash here is because I haven't successfully forced all the shared
>> pages in the host p2m to become unshared before copying,
>> which is the intended behaviour.
>>
>> I think I know how that has happened and how to fix it, but what you're
>> trying to do won't work by design. By the time a copy from host p2m to
>> altp2m occurs, the sharing is supposed to be broken.
>>
> 
> Hm. If the sharing gets broken before the hostp2m->altp2m copy, maybe doing
> sharing after the view has been created is a better route? I guess the
> sharing code would need to be adapted to check if altp2m is enabled for
> that to work..
> 
> 
>>
>> You're coming up with some ways of attempting to use altp2m that we
>> hadn't thought of. That's a good thing, and just what we want, but
>> there are limits to what we can support without more far-reaching
>> changes to existing parts of Xen. This isn't going to be do-able for
>> 4.6.
>>
> 
> My main concern is just getting it to work, hitting 4.6 is not a priority.
> I understand that my stuff is highly experimental ;) While the gfn
> remapping feature is intriguing, in my setup I already have a copy of the
> page I would want to present during a singlestep-altp2mswitch - in the
> origin domains memory. AFAIU the gfn remapping would work only within the
> domains existing p2m space.

Understood, but for us hitting 4.6 with the initial version of altp2m
is *the* priority. And yes, remapping is restricted to pages from the
same host p2m.

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-22 18:56 ` [PATCH v2 09/12] x86/altp2m: add remaining support routines Ed White
                     ` (2 preceding siblings ...)
  2015-06-24 16:15   ` Lengyel, Tamas
@ 2015-06-25  2:44   ` Lengyel, Tamas
  2015-06-25 16:31     ` Ed White
  3 siblings, 1 reply; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-25  2:44 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf


On Mon, Jun 22, 2015 at 2:56 PM, Ed White <edmund.h.white@intel.com> wrote:

> Add the remaining routines required to support enabling the alternate
> p2m functionality.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/hvm/hvm.c              |  60 +++++-
>  xen/arch/x86/mm/hap/Makefile        |   1 +
>  xen/arch/x86/mm/hap/altp2m_hap.c    | 103 +++++++++
>  xen/arch/x86/mm/p2m-ept.c           |   3 +
>  xen/arch/x86/mm/p2m.c               | 405 ++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/altp2mhvm.h |   4 +
>  xen/include/asm-x86/p2m.h           |  33 +++
>  7 files changed, 601 insertions(+), 8 deletions(-)
>  create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index d75c12d..b758ee1 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2786,10 +2786,11 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>      p2m_access_t p2ma;
>      mfn_t mfn;
>      struct vcpu *v = current;
> -    struct p2m_domain *p2m;
> +    struct p2m_domain *p2m, *hostp2m;
>      int rc, fall_through = 0, paged = 0;
>      int sharing_enomem = 0;
>      vm_event_request_t *req_ptr = NULL;
> +    int altp2m_active = 0;
>
>      /* On Nested Virtualization, walk the guest page table.
>       * If this succeeds, all is fine.
> @@ -2845,15 +2846,33 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
>      {
>          if ( !handle_mmio_with_translation(gla, gpa >> PAGE_SHIFT, npfec) )
>              hvm_inject_hw_exception(TRAP_gp_fault, 0);
> -        rc = 1;
> -        goto out;
> +        return 1;
>      }
>
> -    p2m = p2m_get_hostp2m(v->domain);
> -    mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma,
> +    altp2m_active = altp2mhvm_active(v->domain);
> +
> +    /* Take a lock on the host p2m speculatively, to avoid potential
> +     * locking order problems later and to handle unshare etc.
> +     */
> +    hostp2m = p2m_get_hostp2m(v->domain);
> +    mfn = get_gfn_type_access(hostp2m, gfn, &p2mt, &p2ma,
>                                P2M_ALLOC | (npfec.write_access ? P2M_UNSHARE : 0),
>                                NULL);
>
> +    if ( altp2m_active )
> +    {
> +        if ( altp2mhvm_hap_nested_page_fault(v, gpa, gla, npfec, &p2m) == 1 )
> +        {
> +            /* entry was lazily copied from host -- retry */
>

So I'm not fully following this logic here. I can see that the altp2m entry
got copied from the host. Why is there a need for the retry; why not just
continue?


> +            __put_gfn(hostp2m, gfn);
> +            return 1;
> +        }
> +
> +        mfn = get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 0, NULL);
> +    }
> +    else
> +        p2m = hostp2m;
> +
>      /* Check access permissions first, then handle faults */
>      if ( mfn_x(mfn) != INVALID_MFN )
>      {
>


-- 

Tamas K Lengyel

Senior Security Researcher

7921 Jones Branch Drive

McLean VA 22102

Email  tlengyel@novetta.com


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-06-24 17:53     ` Ed White
@ 2015-06-25  8:12       ` Jan Beulich
  2015-06-25 16:36         ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: Jan Beulich @ 2015-06-25  8:12 UTC (permalink / raw)
  To: Ed White
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson,
	xen-devel, tlengyel, Daniel De Graaf

>>> On 24.06.15 at 19:53, <edmund.h.white@intel.com> wrote:
> On 06/24/2015 07:38 AM, Jan Beulich wrote:
>>>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
>>> --- a/xen/include/asm-x86/p2m.h
>>> +++ b/xen/include/asm-x86/p2m.h
>>> @@ -237,6 +237,19 @@ struct p2m_domain {
>>>                                         p2m_access_t *p2ma,
>>>                                         p2m_query_t q,
>>>                                         unsigned int *page_order);
>>> +    int                (*set_entry_full)(struct p2m_domain *p2m,
>>> +                                         unsigned long gfn,
>>> +                                         mfn_t mfn, unsigned int page_order,
>>> +                                         p2m_type_t p2mt,
>>> +                                         p2m_access_t p2ma,
>>> +                                         unsigned int sve);
>>> +    mfn_t              (*get_entry_full)(struct p2m_domain *p2m,
>>> +                                         unsigned long gfn,
>>> +                                         p2m_type_t *p2mt,
>>> +                                         p2m_access_t *p2ma,
>>> +                                         p2m_query_t q,
>>> +                                         unsigned int *page_order,
>>> +                                         unsigned int *sve);
>> 
>> I have to admit that I find the _full suffixes here pretty odd. Based
>> on the functionality, they should be _sve. But then it seems
>> questionable how they could be useful to the generic p2m layer
>> anyway, i.e. why there would need to be such hooks in the first
>> place.
> 
> I did originally use _sve suffixes. I changed them because there
> may be some future case where these routines control some other
> EPTE bit too. I made them hooks because I thought calling ept...
> functions directly would be a layering violation.

Indeed it would. But thinking about it more, I would suggest extending
the existing accessors rather than adding new ones. Just consider what
would result when further such return values are needed in the future:
I don't see us adding _fuller, _fullest, etc. variants. Perhaps just
make the new output an optional generic "flags" one. One might even
consider folding it with order, or even consolidating all the outputs
into a single structure.
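
For instance (purely illustrative; no such structure exists in the tree):

    struct p2m_lookup {
        p2m_type_t   t;
        p2m_access_t a;
        unsigned int page_order;
        unsigned int flags;       /* e.g. bit 0 = suppress_ve */
    };

    mfn_t (*get_entry)(struct p2m_domain *p2m, unsigned long gfn,
                       p2m_query_t q, struct p2m_lookup *out);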

Jan

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 06/12] VMX: add VMFUNC leaf 0 (EPTP switching) to emulator.
  2015-06-24 20:29     ` Ed White
@ 2015-06-25  8:26       ` Jan Beulich
  0 siblings, 0 replies; 116+ messages in thread
From: Jan Beulich @ 2015-06-25  8:26 UTC (permalink / raw)
  To: Ed White
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson,
	xen-devel, tlengyel, Daniel De Graaf

>>> On 24.06.15 at 22:29, <edmund.h.white@intel.com> wrote:
> On 06/24/2015 05:47 AM, Andrew Cooper wrote:
>>> +    case EXIT_REASON_VMFUNC:
>>> +        if ( vmx_vmfunc_intercept(regs) == X86EMUL_OKAY )
>> 
>> This is currently an unconditional failure, and I don't see subsequent
>> patches which alter vmx_vmfunc_intercept().  Shouldn't
>> vmx_vmfunc_intercept() switch on eax and optionally call
>> p2m_switch_vcpu_altp2m_by_id()?
> 
> If the VMFUNC instruction was valid, the hardware would have executed it.
> The only time a VMFUNC exit occurs is if the hardware supports VMFUNC
> and the hypervisor has enabled it, but the VMFUNC instruction is
> invalid in some way and can't be executed (because EAX != 0, for example).
> 
> There are only two choices: crash the domain or inject #UD (which is the
> closest analogue to what happens in the absence of a hypervisor and will
> probably crash the OS in the domain). I chose the latter in the code I
> originally wrote; Ravi chose the former in his patch. I don't have a
> strong opinion either way, but I think these are the only two choices.

Injecting an exception should always be preferred, as that gives the
guest at least a theoretical chance of recovering. The closer to real
hardware, the better. I.e. if hardware without a hypervisor gives
#UD here, so should the emulation do. (Still I admit that the case is
somewhat fuzzy, as the instruction specifically exists to be used
under a hypervisor. But #UD being raised for EAX >= 64 makes it a
good candidate for smaller but invalid EAX values too imo.)
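
To make that concrete, a sketch of the #UD route, using the standard
injection helper; the exact body in the series may well differ:

static int vmx_vmfunc_intercept(struct cpu_user_regs *regs)
{
    /*
     * Hardware only exits here when it could not execute VMFUNC
     * itself (unsupported leaf in EAX, bad EPTP index, ...), so
     * mirror bare metal and raise #UD rather than crashing the
     * domain.
     */
    hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
    return X86EMUL_EXCEPTION;
}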

> I hope this answers Jan's question in another email on the same subject.

It does, thanks.

Jan


* Re: [PATCH v2 11/12] x86/altp2m: Add altp2mhvm HVM domain parameter.
  2015-06-24 17:57     ` Ed White
  2015-06-24 18:08       ` Andrew Cooper
@ 2015-06-25  8:33       ` Jan Beulich
  1 sibling, 0 replies; 116+ messages in thread
From: Jan Beulich @ 2015-06-25  8:33 UTC (permalink / raw)
  To: Ed White
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson,
	xen-devel, tlengyel, Daniel De Graaf

>>> On 24.06.15 at 19:57, <edmund.h.white@intel.com> wrote:
> On 06/24/2015 07:59 AM, Jan Beulich wrote:
>>>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
>>> +    case HVM_PARAM_ALTP2MHVM:
>>> +        if ( a.value > 1 )
>>> +            rc = -EINVAL;
>>> +        if ( a.value &&
>>> +             d->arch.hvm_domain.params[HVM_PARAM_NESTEDHVM] )
>>> +            rc = -EINVAL;
>>> +        break;
>> 
>> As you added the new param to the change-once section of
>> hvm_allow_set_param() - what is this code good for?
> 
> I don't understand. How does change-once invalidate this code?

The first if of course is needed (to exclude larger values). But
afaict the second if() is redundant with the change-once logic,
and prevents success from being returned when the tool stack
requests a 1->1 change (which is a no-op, and hence ought to
be allowed).

Jan


* Re: [PATCH v2 11/12] x86/altp2m: Add altp2mhvm HVM domain parameter.
  2015-06-24 18:08       ` Andrew Cooper
@ 2015-06-25  8:34         ` Jan Beulich
  0 siblings, 0 replies; 116+ messages in thread
From: Jan Beulich @ 2015-06-25  8:34 UTC (permalink / raw)
  To: Andrew Cooper, Ed White
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, xen-devel,
	tlengyel, Daniel De Graaf

>>> On 24.06.15 at 20:08, <andrew.cooper3@citrix.com> wrote:
> On 24/06/15 18:57, Ed White wrote:
>> On 06/24/2015 07:59 AM, Jan Beulich wrote:
>>>>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
>>>> +    case HVM_PARAM_ALTP2MHVM:
>>>> +        if ( a.value > 1 )
>>>> +            rc = -EINVAL;
>>>> +        if ( a.value &&
>>>> +             d->arch.hvm_domain.params[HVM_PARAM_NESTEDHVM] )
>>>> +            rc = -EINVAL;
>>>> +        break;
>>> As you added the new param to the change-once section of
>>> hvm_allow_set_param() - what is this code good for?
>> I don't understand. How does change-once invalidate this code?
> 
> I don't believe it does.  This code (appears to) enforce the param being
> a boolean, and exclusive against HVM_PARAM_NESTEDHVM.

Ah, indeed - I didn't pay attention to it being a different param
in the second if(). I.e. scratch the reply I just sent to Ed's mail.
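
For the record, the effect of the hunk, restated with comments; the
HVM_PARAM_NESTEDHVM side is an assumed mirror check showing how the
exclusion would stay symmetric, not something quoted from the patch:

    case HVM_PARAM_ALTP2MHVM:
        /* Boolean only. */
        if ( a.value > 1 )
            rc = -EINVAL;
        /* altp2m and nested HVM are mutually exclusive. */
        if ( a.value &&
             d->arch.hvm_domain.params[HVM_PARAM_NESTEDHVM] )
            rc = -EINVAL;
        break;

    case HVM_PARAM_NESTEDHVM:
        /* Assumed mirror check, so the exclusion holds whichever
         * feature the tool stack tries to enable first. */
        if ( a.value &&
             d->arch.hvm_domain.params[HVM_PARAM_ALTP2MHVM] )
            rc = -EINVAL;
        break;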

Jan


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-24 18:06     ` Ed White
@ 2015-06-25  8:52       ` Ian Campbell
  2015-06-25 16:27         ` Ed White
  2015-06-25 12:44       ` Lengyel, Tamas
  1 sibling, 1 reply; 116+ messages in thread
From: Ian Campbell @ 2015-06-25  8:52 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Xen-devel,
	Jan Beulich, Andrew Cooper, Lengyel, Tamas, Daniel De Graaf

On Wed, 2015-06-24 at 11:06 -0700, Ed White wrote:
> I think we should vote on this.

In general we vote on things only when there has been a failure to reach
consensus. Unless there has been some prior discussion around this issue
which isn't referenced from the bits of the thread I've looked at, I
don't think we are at that point.

Ian.


* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-24 22:55                 ` Ed White
@ 2015-06-25  9:00                   ` Andrew Cooper
  2015-06-25 16:38                     ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: Andrew Cooper @ 2015-06-25  9:00 UTC (permalink / raw)
  To: Ed White, Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Razvan Cojocaru, Tim Deegan, Ian Jackson,
	Xen-devel, Jan Beulich, Daniel De Graaf

On 24/06/15 23:55, Ed White wrote:
> On 06/24/2015 03:45 PM, Lengyel, Tamas wrote:
>> On Wed, Jun 24, 2015 at 6:02 PM, Ed White <edmund.h.white@intel.com> wrote:
>>
>>> On 06/24/2015 02:34 PM, Lengyel, Tamas wrote:
>>>> Hi Ed,
>>>> I tried the system using memsharing and I collected the following crash
>>>> log. In this test I ran memsharing on all pages of the domain before
>>>> activating altp2m and creating the view. Afterwards I used my updated
>>>> xen-access to create a copy of this p2m with only R/X permissions. The idea
>>>> would be that the altp2m view remains completely shared, while the hostp2m
>>>> would be able to do its CoW propagation as the domain is executing.
>>>>
>>>> (XEN) mm locking order violation: 278 > 239
>>>> (XEN) Xen BUG at mm-locks.h:68
>>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>>>> (XEN) CPU:    2
>>>> (XEN) RIP:    e008:[<ffff82d0801f8768>]
>>>> p2m_altp2m_propagate_change+0x85/0x4a9
>>>> (XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor (d6v0)
>>>> (XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000000
>>>> (XEN) rdx: ffff8302163a8000   rsi: 000000000000000a   rdi: ffff82d0802a069c
>>>> (XEN) rbp: ffff8302163afa68   rsp: ffff8302163af9e8   r8:  ffff83021c000000
>>>> (XEN) r9:  0000000000000003   r10: 00000000000000ef   r11: 0000000000000003
>>>> (XEN) r12: ffff83010cc51820   r13: 0000000000000000   r14: ffff830158d90000
>>>> (XEN) r15: 0000000000025697   cr0: 0000000080050033   cr4: 00000000001526f0
>>>> (XEN) cr3: 00000000dbba3000   cr2: 00000000778c9714
>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>> (XEN) Xen stack trace from rsp=ffff8302163af9e8:
>>>> (XEN)    ffff8302163af9f8 00000000803180f8 000000000000000c ffff82d0801892ee
>>>> (XEN)    ffff82d0801fb4d1 ffff83010cc51de0 000000000008ff49 ffff82d08012f86a
>>>> (XEN)    ffff83010cc51820 ffff83010cc51820 0000000000000000 0000000000000000
>>>> (XEN)    ffff83010cc51820 0000000000000000 ffff8300dbb334b8 ffff8302163afa00
>>>> (XEN)    ffff8302163afb18 ffff82d0801fd549 0000000500000009 ffff830200000001
>>>> (XEN)    0000000000000001 ffff830158d90000 0000000000000002 000000000008ff49
>>>> (XEN)    0000000000025697 000000000000000c ffff8302163afae8 80c000008ff49175
>>>> (XEN)    80c00000d0a97175 01ff83010cc51820 0000000000000097 ffff8300dbb33000
>>>> (XEN)    ffff8302163afb78 000000000008ff49 0000000000000000 0000000000000001
>>>> (XEN)    0000000000025697 ffff83010cc51820 ffff8302163afb38 ffff82d0801fd644
>>>> (XEN)    ffffffffffffffff 00000000000d0a97 ffff8302163afb98 ffff82d0801f23c5
>>>> (XEN)    ffff830158d90000 000000000cc51820 ffff830158d90000 000000000000000c
>>>> (XEN)    000000000008ff49 ffff83010cc51820 0000000000025697 00000000000d0a97
>>>> (XEN)    000000000008ff49 ffff830158d90000 ffff8302163afbd8 ffff82d0801f45c8
>>>> (XEN)    ffff83010cc51820 000000000000000c ffff83008fd41170 000000000008ff49
>>>> (XEN)    0000000000025697 ffff82e001a152e0 ffff8302163afc58 ffff82d080205b51
>>>> (XEN)    0000000000000009 000000000008ff49 ffff8300d0a97000 ffff83008fd41160
>>>> (XEN)    ffff82e001a152f0 ffff82e0011fe920 ffff83010cc51820 0000000c00000000
>>>> (XEN)    0000000000025697 0000000000000003 ffff83010cc51820 ffff8302163afd34
>>>> (XEN)    0000000000025697 0000000000000000 ffff8302163afca8 ffff82d0801f1f7d
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<ffff82d0801f8768>] p2m_altp2m_propagate_change+0x85/0x4a9
>>>> (XEN)    [<ffff82d0801fd549>] ept_set_entry_sve+0x5fa/0x6e6
>>>> (XEN)    [<ffff82d0801fd644>] ept_set_entry+0xf/0x11
>>>> (XEN)    [<ffff82d0801f23c5>] p2m_set_entry+0xd4/0x112
>>>> (XEN)    [<ffff82d0801f45c8>] set_shared_p2m_entry+0x2d0/0x39b
>>>> (XEN)    [<ffff82d080205b51>] __mem_sharing_unshare_page+0x83f/0xbd6
>>>> (XEN)    [<ffff82d0801f1f7d>] __get_gfn_type_access+0x224/0x2b0
>>>> (XEN)    [<ffff82d0801c6df5>] hvm_hap_nested_page_fault+0x21f/0x795
>>>> (XEN)    [<ffff82d0801e86ae>] vmx_vmexit_handler+0x1764/0x1af3
>>>> (XEN)    [<ffff82d0801ee891>] vmx_asm_vmexit_handler+0x41/0xc0
>>> The crash here is because I haven't successfully forced all the shared
>>> pages in the host p2m to become unshared before copying,
>>> which is the intended behaviour.
>>>
>>> I think I know how that has happened and how to fix it, but what you're
>>> trying to do won't work by design. By the time a copy from host p2m to
>>> altp2m occurs, the sharing is supposed to be broken.
>>>
>> Hm. If the sharing gets broken before the hostp2m->altp2m copy, maybe doing
>> sharing after the view has been created is a better route? I guess the
>> sharing code would need to be adapted to check if altp2m is enabled for
>> that to work..
>>
>>
>>> You're coming up with some ways of attempting to use altp2m that we
>>> hadn't thought of. That's a good thing, and just what we want, but
>>> there are limits to what we can support without more far-reaching
>>> changes to existing parts of Xen. This isn't going to be do-able for
>>> 4.6.
>>>
>> My main concern is just getting it to work, hitting 4.6 is not a priority.
>> I understand that my stuff is highly experimental ;) While the gfn
>> remapping feature is intriguing, in my setup I already have a copy of the
>> page I would want to present during a singlestep-altp2mswitch - in the
>> origin domains memory. AFAIU the gfn remapping would work only within the
>> domains existing p2m space.
> Understood, but for us hitting 4.6 with the initial version of altp2m
> is *the* priority. And yes, remapping is restricted to pages from the
> same host p2m.

It is fine for experimental features to have known interaction issues. 
I don't necessarily see this as a blocker to 4.6, although it would
indeed be better if it could be fixed in time.

~Andrew


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-24 18:06     ` Ed White
  2015-06-25  8:52       ` Ian Campbell
@ 2015-06-25 12:44       ` Lengyel, Tamas
  2015-06-25 13:40         ` Razvan Cojocaru
  1 sibling, 1 reply; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-25 12:44 UTC (permalink / raw)
  To: Ed White, Razvan Cojocaru
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf


On Wed, Jun 24, 2015 at 2:06 PM, Ed White <edmund.h.white@intel.com> wrote:

> On 06/24/2015 09:15 AM, Lengyel, Tamas wrote:
> >> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
> >> +                                 unsigned long pfn, xenmem_access_t
> >> access)
> >> +{
> >>
> >
> > This function IMHO should be merged with p2m_set_mem_access and should be
> > triggerable with the same memop (XENMEM_access_op) hypercall instead of
> > introducing a new hvmop one.
>
> I think we should vote on this. My view is that it makes XENMEM_access_op
> too complicated to use.


The two functions are not very long and share enough code that it would
justify merging. The only big change added is the copy from host->alt when
the entry doesn't exist in alt, and that itself is pretty self contained.
Let's see if we can get a third opinion on it..


> It also makes using this one specific altp2m
> capability different to using any of the others
>

That argument goes both ways - a new mem_access function being introduced
that is different from the others.

Tamas


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-25 12:44       ` Lengyel, Tamas
@ 2015-06-25 13:40         ` Razvan Cojocaru
  2015-06-25 16:48           ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: Razvan Cojocaru @ 2015-06-25 13:40 UTC (permalink / raw)
  To: Lengyel, Tamas, Ed White
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/25/2015 03:44 PM, Lengyel, Tamas wrote:
> On Wed, Jun 24, 2015 at 2:06 PM, Ed White <edmund.h.white@intel.com> wrote:
>     On 06/24/2015 09:15 AM, Lengyel, Tamas wrote:
>     >> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
>     >> +                                 unsigned long pfn, xenmem_access_t
>     >> access)
>     >> +{
>     >>
>     >
>     > This function IMHO should be merged with p2m_set_mem_access and should be
>     > triggerable with the same memop (XENMEM_access_op) hypercall instead of
>     > introducing a new hvmop one.
> 
>     I think we should vote on this. My view is that it makes
>     XENMEM_access_op
>     too complicated to use.
> 
> The two functions are not very long and share enough code that it would
> justify merging. The only big change added is the copy from host->alt
> when the entry doesn't exist in alt, and that itself is pretty self
> contained. Let's see if we can get a third opinion on it..

At first sight (I admit I'm rather late in the game and haven't had a
chance to follow the series closely from the beginning), the two
functions do seem to be mergeable (or at least the common code factored
out in static helper functions).
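
A sketch of that factoring, with a hypothetical helper name; the lazy
host->alt copy would stay in the altp2m wrapper, which is exactly the
part under discussion:

static int set_mem_access_one(struct p2m_domain *p2m,
                              unsigned long gfn, p2m_access_t a)
{
    mfn_t mfn;
    p2m_type_t t;
    p2m_access_t old_a;

    /* Re-set the entry with the same mfn/type but the new access. */
    mfn = p2m->get_entry(p2m, gfn, &t, &old_a, 0, NULL);
    return p2m->set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, t, a);
}

Both p2m_set_mem_access() and the altp2m variant could then call this
core, the latter after its host->alt copy.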

Also, if Ed's concern is that the libxc API would look unnatural if
xc_set_mem_access() is used for both purposes, as far as I can tell the
only difference could be a non-zero last altp2m parameter, so I agree
with you that the fewer functions doing almost the same thing the better
(I have been guilty of this in the past too, for example with my
xc_enable_introspection() function ;) ).

So I'd say, yes, if possible merge them.


Regards,
Razvan


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-25  8:52       ` Ian Campbell
@ 2015-06-25 16:27         ` Ed White
  0 siblings, 0 replies; 116+ messages in thread
From: Ed White @ 2015-06-25 16:27 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Xen-devel,
	Jan Beulich, Andrew Cooper, Lengyel, Tamas, Daniel De Graaf

On 06/25/2015 01:52 AM, Ian Campbell wrote:
> On Wed, 2015-06-24 at 11:06 -0700, Ed White wrote:
>> I think we should vote on this.
> 
> In general we vote on things only when there has been a failure to reach
> consensus. Unless there has been some prior discussion around this issue
> which isn't referenced from the bits of the thread I've looked at, I
> don't think we are at that point.

I didn't mean vote quite as literally as you seem to have read it. I
just meant that IMHO it needs more discussion. I didn't even realize
there was a formal procedure for voting -- if I had I would have been
more careful with my choice of words.

Ed


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-25  2:44   ` Lengyel, Tamas
@ 2015-06-25 16:31     ` Ed White
  2015-06-25 17:42       ` Lengyel, Tamas
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-25 16:31 UTC (permalink / raw)
  To: Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/24/2015 07:44 PM, Lengyel, Tamas wrote:
>> +    if ( altp2m_active )
>> +    {
>> +        if ( altp2mhvm_hap_nested_page_fault(v, gpa, gla, npfec, &p2m) ==
>> 1 )
>> +        {
>> +            /* entry was lazily copied from host -- retry */
>>
> 
> So I'm not fully following this logic here. I can see that the altp2m entry
> got copied from the host. Why is there a need for the retry, why not just
> continue?

At this point the EPT's that the hardware is using have been made valid
by software, but the hardware has already failed the access so you have
to restart the operation. This isn't in any way specific to altp2m,
it's how page fault logic works generally.
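
In sketch form, following the quoted fragment (the signature matches
hvm_hap_nested_page_fault(); everything else here is elided, and
altp2m_active stands in for the real per-domain state):

int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
                              struct npfec npfec)
{
    struct vcpu *v = current;
    struct p2m_domain *p2m = NULL;

    if ( altp2m_active )
    {
        if ( altp2mhvm_hap_nested_page_fault(v, gpa, gla, npfec, &p2m) == 1 )
        {
            /*
             * The EPT entry just became valid (lazily copied from
             * the host p2m), but the hardware access had already
             * failed.  Returning here resumes the guest, which
             * re-executes the faulting instruction against the now
             * valid tables.
             */
            return 1;
        }
    }

    /* ... the rest of the nested page fault handling ... */
    return 0;
}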

Ed


* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-06-25  8:12       ` Jan Beulich
@ 2015-06-25 16:36         ` Ed White
  2015-06-26  6:04           ` Jan Beulich
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-25 16:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson,
	xen-devel, tlengyel, Daniel De Graaf

On 06/25/2015 01:12 AM, Jan Beulich wrote:
>>>> On 24.06.15 at 19:53, <edmund.h.white@intel.com> wrote:
>> On 06/24/2015 07:38 AM, Jan Beulich wrote:
>>>>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
>>>> --- a/xen/include/asm-x86/p2m.h
>>>> +++ b/xen/include/asm-x86/p2m.h
>>>> @@ -237,6 +237,19 @@ struct p2m_domain {
>>>>                                         p2m_access_t *p2ma,
>>>>                                         p2m_query_t q,
>>>>                                         unsigned int *page_order);
>>>> +    int                (*set_entry_full)(struct p2m_domain *p2m,
>>>> +                                         unsigned long gfn,
>>>> +                                         mfn_t mfn, unsigned int page_order,
>>>> +                                         p2m_type_t p2mt,
>>>> +                                         p2m_access_t p2ma,
>>>> +                                         unsigned int sve);
>>>> +    mfn_t              (*get_entry_full)(struct p2m_domain *p2m,
>>>> +                                         unsigned long gfn,
>>>> +                                         p2m_type_t *p2mt,
>>>> +                                         p2m_access_t *p2ma,
>>>> +                                         p2m_query_t q,
>>>> +                                         unsigned int *page_order,
>>>> +                                         unsigned int *sve);
>>>
>>> I have to admit that I find the _full suffixes here pretty odd. Based
>>> on the functionality, they should be _sve. But then it seems
>>> questionable how they could be useful to the generic p2m layer
>>> anyway, i.e. why there would need to be such hooks in the first
>>> place.
>>
>> I did originally use _sve suffixes. I changed them because there
>> may be some future case where these routines control some other
>> EPTE bit too. I made them hooks because I thought calling ept...
>> functions directly would be a layering violation.
> 
> Indeed it would. But thinking about it more, I would suggest
> extending the existing accessors rather than adding new ones.
> Just consider what would result when further such return values
> are going to be needed in the future: I don't see us adding
> _fuller, _fullest, etc. variants. Perhaps just make the new output
> an optional generic "flags" one. One might even consider folding
> it with order, or even consolidate all the outputs into a single
> structure.

The new functions are called in 3 places only, so changing them
later would have minimal impact. The existing functions are called
in many, many places. I *really* don't want to change the amount
of existing code that doing what you suggest would entail at this
late stage.

Ed


* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-25  9:00                   ` Andrew Cooper
@ 2015-06-25 16:38                     ` Ed White
  2015-06-25 17:29                       ` Lengyel, Tamas
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-25 16:38 UTC (permalink / raw)
  To: Andrew Cooper, Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Razvan Cojocaru, Tim Deegan, Ian Jackson,
	Xen-devel, Jan Beulich, Daniel De Graaf

On 06/25/2015 02:00 AM, Andrew Cooper wrote:
> On 24/06/15 23:55, Ed White wrote:
>> On 06/24/2015 03:45 PM, Lengyel, Tamas wrote:
>>> On Wed, Jun 24, 2015 at 6:02 PM, Ed White <edmund.h.white@intel.com> wrote:
>>>
>>>> On 06/24/2015 02:34 PM, Lengyel, Tamas wrote:
>>>>> Hi Ed,
>>>>> I tried the system using memsharing and I collected the following crash
>>>>> log. In this test I ran memsharing on all pages of the domain before
>>>>> activating altp2m and creating the view. Afterwards I used my updated
>>>>> xen-access to create a copy of this p2m with only R/X permissions. The idea
>>>>> would be that the altp2m view remains completely shared, while the hostp2m
>>>>> would be able to do its CoW propagation as the domain is executing.
>>>>>
>>>>> [crash log snipped; quoted in full earlier in the thread]
>>>> The crash here is because I haven't successfully forced all the shared
>>>> pages in the host p2m to become unshared before copying,
>>>> which is the intended behaviour.
>>>>
>>>> I think I know how that has happened and how to fix it, but what you're
>>>> trying to do won't work by design. By the time a copy from host p2m to
>>>> altp2m occurs, the sharing is supposed to be broken.
>>>>
>>> Hm. If the sharing gets broken before the hostp2m->altp2m copy, maybe doing
>>> sharing after the view has been created is a better route? I guess the
>>> sharing code would need to be adapted to check if altp2m is enabled for
>>> that to work..
>>>
>>>
>>>> You're coming up with some ways of attempting to use altp2m that we
>>>> hadn't thought of. That's a good thing, and just what we want, but
>>>> there are limits to what we can support without more far-reaching
>>>> changes to existing parts of Xen. This isn't going to be do-able for
>>>> 4.6.
>>>>
>>> My main concern is just getting it to work, hitting 4.6 is not a priority.
>>> I understand that my stuff is highly experimental ;) While the gfn
>>> remapping feature is intriguing, in my setup I already have a copy of the
>>> page I would want to present during a singlestep-altp2mswitch - in the
>>> origin domains memory. AFAIU the gfn remapping would work only within the
>>> domains existing p2m space.
>> Understood, but for us hitting 4.6 with the initial version of altp2m
>> is *the* priority. And yes, remapping is restricted to pages from the
>> same host p2m.
> 
> It is fine for experimental features to have known interaction issues. 
> I don't necessarily see this as a blocker to 4.6, although it would
> indeed be better if it could be fixed in time.

I plan to fix the bug, such that unshare will always occur before a copy.
I don't plan to make the altp2m's able to have shared pages.

Ed


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-25 13:40         ` Razvan Cojocaru
@ 2015-06-25 16:48           ` Ed White
  2015-06-25 17:39             ` Sahita, Ravi
                               ` (2 more replies)
  0 siblings, 3 replies; 116+ messages in thread
From: Ed White @ 2015-06-25 16:48 UTC (permalink / raw)
  To: Razvan Cojocaru, Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/25/2015 06:40 AM, Razvan Cojocaru wrote:
> On 06/25/2015 03:44 PM, Lengyel, Tamas wrote:
>> On Wed, Jun 24, 2015 at 2:06 PM, Ed White <edmund.h.white@intel.com> wrote:
>>     On 06/24/2015 09:15 AM, Lengyel, Tamas wrote:
>>     >> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
>>     >> +                                 unsigned long pfn, xenmem_access_t
>>     >> access)
>>     >> +{
>>     >>
>>     >
>>     > This function IMHO should be merged with p2m_set_mem_access and should be
>>     > triggerable with the same memop (XENMEM_access_op) hypercall instead of
>>     > introducing a new hvmop one.
>>
>>     I think we should vote on this. My view is that it makes
>>     XENMEM_access_op
>>     too complicated to use.
>>
>> The two functions are not very long and share enough code that it would
>> justify merging. The only big change added is the copy from host->alt
>> when the entry doesn't exist in alt, and that itself is pretty self
>> contained. Let's see if we can get a third opinion on it..
> 
> At first sight (I admit I'm rather late in the game and haven't had a
> chance to follow the series closely from the beginning), the two
> functions do seem to be mergeable (or at least the common code factored
> out in static helper functions).
> 
> Also, if Ed's concern is that the libxc API would look unnatural if
> xc_set_mem_access() is used for both purposes, as far as I can tell the
> only difference could be a non-zero last altp2m parameter, so I agree
>> with you that the fewer functions doing almost the same thing the better
> (I have been guilty of this in the past too, for example with my
> xc_enable_introspection() function ;) ).
> 
> So I'd say, yes, if possible merge them.

So here are my reasons why I don't think we should merge the hypercalls,
in more detail:

Although the two hypercalls are similar, they are not identical. For one
thing, the existing hypercall can only be used cross-domain whereas the
altp2m one can be used cross-domain or intra-domain. Also, the existing
hypercall can be used to modify a range of pages and the new one can only
modify a single page, and that is intentional.
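
At the libxc level the contrast might look like this; xc_set_mem_access()
is the wrapper that exists today, while the altp2m wrapper name and shape
are assumptions, since the toolstack side of the series hasn't been posted:

/* Existing hypercall: a range of pages, cross-domain only. */
rc = xc_set_mem_access(xch, domid, XENMEM_access_rx,
                       first_gfn, nr_pages);

/* Hypothetical altp2m wrapper: one page, in one named view,
 * usable intra-domain as well as cross-domain. */
rc = xc_altp2m_set_mem_access(xch, domid, view_id, gfn,
                              XENMEM_access_rx);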

As I see it, the implementation in hvm.c would become a lot less clean,
and every direct user of the existing hypercall would have to change for
no good reason.

Razvan's suggestion to merge the functions that implement the p2m changes
I'm more ambivalent about. Personally, I prefer not to have code that
contains lots of conditional logic, which would be the result, but I
don't feel that strongly about it.

Ed


* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-25 16:38                     ` Ed White
@ 2015-06-25 17:29                       ` Lengyel, Tamas
  2015-06-25 20:34                         ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-25 17:29 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Razvan Cojocaru, Andrew Cooper,
	Ian Jackson, Tim Deegan, Jan Beulich, Daniel De Graaf, Xen-devel


On Thu, Jun 25, 2015 at 12:38 PM, Ed White <edmund.h.white@intel.com> wrote:

> On 06/25/2015 02:00 AM, Andrew Cooper wrote:
> > On 24/06/15 23:55, Ed White wrote:
> >> On 06/24/2015 03:45 PM, Lengyel, Tamas wrote:
> >>> On Wed, Jun 24, 2015 at 6:02 PM, Ed White <edmund.h.white@intel.com> wrote:
> >>>
> >>>> On 06/24/2015 02:34 PM, Lengyel, Tamas wrote:
> >>>>> Hi Ed,
> >>>>> I tried the system using memsharing and I collected the following crash
> >>>>> log. In this test I ran memsharing on all pages of the domain before
> >>>>> activating altp2m and creating the view. Afterwards I used my updated
> >>>>> xen-access to create a copy of this p2m with only R/X permissions. The idea
> >>>>> would be that the altp2m view remains completely shared, while the hostp2m
> >>>>> would be able to do its CoW propagation as the domain is executing.
> >>>>>
> >>>>> [crash log snipped; quoted in full earlier in the thread]
> >>>> The crash here is because I haven't successfully forced all the shared
> >>>> pages in the host p2m to become unshared before copying,
> >>>> which is the intended behaviour.
> >>>>
> >>>> I think I know how that has happened and how to fix it, but what you're
> >>>> trying to do won't work by design. By the time a copy from host p2m to
> >>>> altp2m occurs, the sharing is supposed to be broken.
> >>>>
> >>> Hm. If the sharing gets broken before the hostp2m->altp2m copy, maybe doing
> >>> sharing after the view has been created is a better route? I guess the
> >>> sharing code would need to be adapted to check if altp2m is enabled for
> >>> that to work..
> >>>
> >>>
> >>>> You're coming up with some ways of attempting to use altp2m that we
> >>>> hadn't thought of. That's a good thing, and just what we want, but
> >>>> there are limits to what we can support without more far-reaching
> >>>> changes to existing parts of Xen. This isn't going to be do-able for
> >>>> 4.6.
> >>>>
> >>> My main concern is just getting it to work, hitting 4.6 is not a priority.
> >>> I understand that my stuff is highly experimental ;) While the gfn
> >>> remapping feature is intriguing, in my setup I already have a copy of the
> >>> page I would want to present during a singlestep-altp2mswitch - in the
> >>> origin domains memory. AFAIU the gfn remapping would work only within the
> >>> domains existing p2m space.
> >> Understood, but for us hitting 4.6 with the initial version of altp2m
> >> is *the* priority. And yes, remapping is restricted to pages from the
> >> same host p2m.
> >
> > It is fine for experimental features to have known interaction issues.
> > I don't necessarily see this as a blocker to 4.6, although it would
> > indeed be better if it could be fixed in time.
>
> I plan to fix the bug, such that unshare will always occur before a copy.
> I don't plan to make the altp2m's able to have shared pages.
>
> Ed
>

For now that is of course fine; memsharing is experimental and that's what
I meant above. I would however like to see that option eventually. I'll be
digging into it a bit more once altp2m is merged. Do you see any reason why
it wouldn't/couldn't work?

Thanks,
Tamas


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-25 16:48           ` Ed White
@ 2015-06-25 17:39             ` Sahita, Ravi
  2015-06-25 18:22             ` Razvan Cojocaru
  2015-06-25 18:23             ` Lengyel, Tamas
  2 siblings, 0 replies; 116+ messages in thread
From: Sahita, Ravi @ 2015-06-25 17:39 UTC (permalink / raw)
  To: White, Edmund H, Razvan Cojocaru, Lengyel, Tamas
  Cc: Tim Deegan, Wei Liu, Andrew Cooper, Ian Jackson, Xen-devel,
	Jan Beulich, Daniel De Graaf

On 06/25/2015 06:40 AM, Razvan Cojocaru wrote:
> On 06/25/2015 03:44 PM, Lengyel, Tamas wrote:
>> On Wed, Jun 24, 2015 at 2:06 PM, Ed White <edmund.h.white@intel.com> wrote:
>>     On 06/24/2015 09:15 AM, Lengyel, Tamas wrote:
>>     >> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
>>     >> +                                 unsigned long pfn, xenmem_access_t
>>     >> access)
>>     >> +{
>>     >>
>>     >
>>     > This function IMHO should be merged with p2m_set_mem_access and should be
>>     > triggerable with the same memop (XENMEM_access_op) hypercall instead of
>>     > introducing a new hvmop one.
>>
>>     I think we should vote on this. My view is that it makes
>>     XENMEM_access_op
>>     too complicated to use.
>>
>> The two functions are not very long and share enough code that it 
>> would justify merging. The only big change added is the copy from 
>> host->alt when the entry doesn't exist in alt, and that itself is
>> pretty self contained. Let's see if we can get a third opinion on it..
> 
> At first sight (I admit I'm rather late in the game and haven't had a 
> chance to follow the series closely from the beginning), the two 
> functions do seem to be mergeable (or at least the common code 
> factored out in static helper functions).
> 
> Also, if Ed's concern is that the libxc API would look unnatural if
> xc_set_mem_access() is used for both purposes, as far as I can tell 
> the only difference could be a non-zero last altp2m parameter, so I 
>> agree with you that the fewer functions doing almost the same thing the
> better (I have been guilty of this in the past too, for example with 
> my
> xc_enable_introspection() function ;) ).
> 
> So I'd say, yes, if possible merge them.

So here are my reasons why I don't think we should merge the hypercalls, in more detail:

Although the two hypercalls are similar, they are not identical. For one thing, the existing hypercall can only be used cross-domain whereas the altp2m one can be used cross-domain or intra-domain. Also, the existing hypercall can be used to modify a range of pages and the new one can only modify a single page, and that is intentional.

As I see it, the implementation in hvm.c would become a lot less clean, and every direct user of the existing hypercall would have to change for no good reason.

Razvan's suggestion to merge the functions that implement the p2m changes I'm more ambivalent about. Personally, I prefer not to have code that contains lots of conditional logic, which would be the result, but I don't feel that strongly about it.

Ed

Ravi> This also has implications for the XSM hooks used for these hypercalls -
the altp2m default policy is to allow intra-domain use, which is not the case
for XENMEM_access_op. Any thoughts on how to manage this difference if we
merge them?
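
For illustration, the usual xsm/dummy.h shape for a target-style hook;
the hook name follows the series' description, but the exact body shown
here is an assumption:

static XSM_INLINE int xsm_hvm_altp2mhvm_op(XSM_DEFAULT_ARG struct domain *d)
{
    /* XSM_TARGET lets a domain invoke the op on itself, which is
     * what permits intra-domain use by default. */
    XSM_ASSERT_ACTION(XSM_TARGET);
    return xsm_default_action(action, current->domain, d);
}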

Ravi


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-25 16:31     ` Ed White
@ 2015-06-25 17:42       ` Lengyel, Tamas
  2015-06-25 20:27         ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-25 17:42 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf


On Thu, Jun 25, 2015 at 12:31 PM, Ed White <edmund.h.white@intel.com> wrote:

> On 06/24/2015 07:44 PM, Lengyel, Tamas wrote:
> >> +    if ( altp2m_active )
> >> +    {
> >> +        if ( altp2mhvm_hap_nested_page_fault(v, gpa, gla, npfec, &p2m) ==
> >> 1 )
> >> +        {
> >> +            /* entry was lazily copied from host -- retry */
> >>
> >
> > So I'm not fully following this logic here. I can see that the altp2m entry
> > got copied from the host. Why is there a need for the retry, why not just
> > continue?
>
> At this point the EPT's that the hardware is using have been made valid
> by software, but the hardware has already failed the access so you have
> to restart the operation. This isn't in any way specific to altp2m,
> it's how page fault logic works generally.
>
> Ed
>

Oh I see, you are working with the assumption that the fault was triggered
by the entry not being present in the altp2m EPT, thus it's enough to copy
it to resolve the fault. However, if the hostp2m permissions are
restricted, there will be a follow-up fault again. Would it maybe make
sense to check for that condition and save having to hit two faults?

Tamas


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-25 16:48           ` Ed White
  2015-06-25 17:39             ` Sahita, Ravi
@ 2015-06-25 18:22             ` Razvan Cojocaru
  2015-06-25 18:23             ` Lengyel, Tamas
  2 siblings, 0 replies; 116+ messages in thread
From: Razvan Cojocaru @ 2015-06-25 18:22 UTC (permalink / raw)
  To: Ed White, Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/25/2015 07:48 PM, Ed White wrote:
> On 06/25/2015 06:40 AM, Razvan Cojocaru wrote:
>> On 06/25/2015 03:44 PM, Lengyel, Tamas wrote:
>>> On Wed, Jun 24, 2015 at 2:06 PM, Ed White <edmund.h.white@intel.com> wrote:
>>>     On 06/24/2015 09:15 AM, Lengyel, Tamas wrote:
>>>     >> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
>>>     >> +                                 unsigned long pfn, xenmem_access_t
>>>     >> access)
>>>     >> +{
>>>     >>
>>>     >
>>>     > This function IMHO should be merged with p2m_set_mem_access and should be
>>>     > triggerable with the same memop (XENMEM_access_op) hypercall instead of
>>>     > introducing a new hvmop one.
>>>
>>>     I think we should vote on this. My view is that it makes
>>>     XENMEM_access_op
>>>     too complicated to use.
>>>
>>> The two functions are not very long and share enough code that it would
>>> justify merging. The only big change added is the copy from host->alt
>>> when the entry doesn't exist in alt, and that itself is pretty self
>>> contained. Let's see if we can get a third opinion on it..
>>
>> At first sight (I admit I'm rather late in the game and haven't had a
>> chance to follow the series closely from the beginning), the two
>> functions do seem to be mergeable (or at least the common code factored
>> out in static helper functions).
>>
>> Also, if Ed's concern is that the libxc API would look unnatural if
>> xc_set_mem_access() is used for both purposes, as far as I can tell the
>> only difference could be a non-zero last altp2m parameter, so I agree
>> with you that the fewer functions doing almost the same thing the better
>> (I have been guilty of this in the past too, for example with my
>> xc_enable_introspection() function ;) ).
>>
>> So I'd say, yes, if possible merge them.
> 
> So here are my reasons why I don't think we should merge the hypercalls,
> in more detail:
> 
> Although the two hypercalls are similar, they are not identical. For one
> thing, the existing hypercall can only be used cross-domain whereas the
> altp2m one can be used cross-domain or intra-domain. Also, the existing
> hypercall can be used to modify a range of pages and the new one can only
> modify a single page, and that is intentional.
> 
> As I see it, the implementation in hvm.c would become a lot less clean,
> and every direct user of the existing hypercall would have to change for
> no good reason.

Thank you for the explanation. While it could be argued that a non-zero
altp2m parameter passed to a merged xc_set_mem_access() could be the
xc_set_altp2m_mem_access() selector, and that the function can then
return EINVAL for parameters that don't fit the semantics of the
selected behaviour, I also don't have a strong aversion to those
functions not being merged. So I'll defer this to Tamas.

> Razvan's suggestion to merge the functions that implement the p2m changes
> I'm more ambivalent about. Personally, I prefer not to have code that
> contains lots of conditional logic, which would be the result, but I
> don't feel that strongly about it.

Well, not necessarily merge the functions, but at least have as much
common code as possible factored out in helper static functions that
both of them call.


Thanks,
Razvan


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-25 16:48           ` Ed White
  2015-06-25 17:39             ` Sahita, Ravi
  2015-06-25 18:22             ` Razvan Cojocaru
@ 2015-06-25 18:23             ` Lengyel, Tamas
  2015-06-25 20:46               ` Ed White
  2 siblings, 1 reply; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-25 18:23 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Razvan Cojocaru, Tim Deegan, Ian Jackson,
	Xen-devel, Jan Beulich, Andrew Cooper, Daniel De Graaf


On Thu, Jun 25, 2015 at 12:48 PM, Ed White <edmund.h.white@intel.com> wrote:

> On 06/25/2015 06:40 AM, Razvan Cojocaru wrote:
> > On 06/25/2015 03:44 PM, Lengyel, Tamas wrote:
> >> On Wed, Jun 24, 2015 at 2:06 PM, Ed White <edmund.h.white@intel.com> wrote:
> >>     On 06/24/2015 09:15 AM, Lengyel, Tamas wrote:
> >>     >> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
> >>     >> +                                 unsigned long pfn, xenmem_access_t
> >>     >> access)
> >>     >> +{
> >>     >>
> >>     >
> >>     > This function IMHO should be merged with p2m_set_mem_access and should be
> >>     > triggerable with the same memop (XENMEM_access_op) hypercall instead of
> >>     > introducing a new hvmop one.
> >>
> >>     I think we should vote on this. My view is that it makes
> >>     XENMEM_access_op
> >>     too complicated to use.
> >>
> >> The two functions are not very long and share enough code that it would
> >> justify merging. The only big change added is the copy from host->alt
> >> when the entry doesn't exist in alt, and that itself is pretty self
> >> contained. Let's see if we can get a third opinion on it..
> >
> > At first sight (I admit I'm rather late in the game and haven't had a
> > chance to follow the series closely from the beginning), the two
> > functions do seem to be mergeable (or at least the common code factored
> > out in static helper functions).
> >
> > Also, if Ed's concern is that the libxc API would look unnatural if
> > xc_set_mem_access() is used for both purposes, as far as I can tell the
> > only difference could be a non-zero last altp2m parameter, so I agree
> > with you that the fewer functions doing almost the same thing the better
> > (I have been guilty of this in the past too, for example with my
> > xc_enable_introspection() function ;) ).
> >
> > So I'd say, yes, if possible merge them.
>
> So here are my reasons why I don't think we should merge the hypercalls,
> in more detail:
>
> Although the two hypercalls are similar, they are not identical. For one
> thing, the existing hypercall can only be used cross-domain whereas the
> altp2m one can be used cross-domain or intra-domain.


Fair point, the use of rcu_lock_live_remote_domain_by_id in the memaccess
memop handler precludes it working for the intra-domain case. However, now
that we have a valid use-case for it working when a domain applies
restrictions on itself, it would be fine to change that to
rcu_lock_domain_by_any_id. It has just been used as a sanity check. The
code you are using in hvm.c could be abstracted as p2m_altp2m_sanity_check:
"!is_hvm_domain(d) || !hvm_altp2m_supported() || !d->arch.altp2m_active"
and run when the altp2m field is non-zero to catch buggy tools.
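
In sketch form; both locking helpers exist in Xen today, while the
altp2m field on the memop and the placement of the check are the
suggested (hypothetical) changes:

/* Before: the caller cannot target itself. */
rc = rcu_lock_live_remote_domain_by_id(mao.domid, &d);

/* After: DOMID_SELF and the caller's own domid become valid,
 * enabling the intra-domain case. */
d = rcu_lock_domain_by_any_id(mao.domid);
if ( d == NULL )
    return -ESRCH;

/* Hypothetical sanity check, run only when an altp2m view is
 * selected, to catch buggy tools. */
if ( mao.altp2m_idx &&
     (!is_hvm_domain(d) || !hvm_altp2m_supported() ||
      !d->arch.altp2m_active) )
    rc = -EINVAL;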


> Also, the existing
> hypercall can be used to modify a range of pages and the new one can only
> modify a single page, and that is intentional.
>

Please elaborate on this.


>
> As I see it, the implementation in hvm.c would become a lot less clean,
> and every direct user of the existing hypercall would have to change for
> no good reason.
>

For 4.6 I reworked the entire vm_event/mem_access system, so that is
already happening irrespective of altp2m. It's fine to add support for the
altp2m field before 4.6 freezes.


> Razvan's suggestion to merge the functions that implement the p2m changes
> I'm more ambivalent about. Personally, I prefer not to have code that
> contains lots of conditional logic, which would be the result, but I
> don't feel that strongly about it.
>
> Ed
>

Thanks,
Tamas


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-25 17:42       ` Lengyel, Tamas
@ 2015-06-25 20:27         ` Ed White
  2015-06-25 21:33           ` Lengyel, Tamas
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-25 20:27 UTC (permalink / raw)
  To: Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/25/2015 10:42 AM, Lengyel, Tamas wrote:
> On Thu, Jun 25, 2015 at 12:31 PM, Ed White <edmund.h.white@intel.com> wrote:
> 
>> On 06/24/2015 07:44 PM, Lengyel, Tamas wrote:
>>>> +    if ( altp2m_active )
>>>> +    {
>>>> +        if ( altp2mhvm_hap_nested_page_fault(v, gpa, gla, npfec, &p2m) ==
>>>> 1 )
>>>> +        {
>>>> +            /* entry was lazily copied from host -- retry */
>>>>
>>>
>>> So I'm not fully following this logic here. I can see that the altp2m entry
>>> got copied from the host. Why is there a need for the retry, why not just
>>> continue?
>>
>> At this point the EPT's that the hardware is using have been made valid
>> by software, but the hardware has already failed the access so you have
>> to restart the operation. This isn't in any way specific to altp2m,
>> it's how page fault logic works generally.
>>
>> Ed
>>
> 
> Oh I see, you are working with the assumption that the fault was triggered
> by the entry not being present in the altp2m EPT, thus it's enough to copy
> it to resolve the fault. However, if the hostp2m permissions are
> restricted, there will be a follow-up fault again. Would it maybe make
> sense to check for that condition and save having to hit two faults?

It's not an assumption, it's a fact because the altp2m nested page fault
handler returns 1 IFF it has copied from the host p2m.

Once again this is standard page fault handling. Preemptively checking
for a condition that would cause another fault shortens the path for
cases that would re-fault, but lengthens it for all the cases that would
not. In a typical scenario (which your current experiments are not) you
expect most cases not to re-fault. The cases that do re-fault are much
more expensive anyway.

There are other reasons not to preemptively check, but that's the most
straightforward one.

Ed


* Re: [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m
  2015-06-25 17:29                       ` Lengyel, Tamas
@ 2015-06-25 20:34                         ` Ed White
  0 siblings, 0 replies; 116+ messages in thread
From: Ed White @ 2015-06-25 20:34 UTC (permalink / raw)
  To: Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Razvan Cojocaru, Andrew Cooper,
	Ian Jackson, Tim Deegan, Jan Beulich, Daniel De Graaf, Xen-devel

On 06/25/2015 10:29 AM, Lengyel, Tamas wrote:
>>
>> I plan to fix the bug, such that unshare will always occur before a copy.
>> I don't plan to make the altp2m's able to have shared pages.
>>
>> Ed
>>
> 
> For now that is of course fine; memsharing is experimental and that's what
> I meant above. I would however like to see that option eventually. I'll be
> digging into it a bit more once altp2m is merged. Do you see any reason why
> it wouldn't/couldn't work?

The general philosophy is that an altp2m is a copy of the host p2m with
the same or more restrictive access permissions. Remapping tweaks that
by allowing different permutations of the gfn->mfn mappings, but only
from the same gfn and mfn sets.
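
Stated as a hypothetical predicate (treating the basic p2m_access_t
values as r/w/x bits; the special rx2rw/n2rwx accesses would need
separate handling):

static bool_t altp2m_access_ok(p2m_access_t host_a, p2m_access_t alt_a)
{
    /* A view entry must be the same or more restrictive. */
    return (alt_a & ~host_a) == 0;
}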

There are a number of issues (locking order, coherency between
multiple p2m's, and code that implicitly operates on the host p2m)
that would need to be addressed to handle all page types in
altp2m's, but I think what you are asking for violates the
overarching philosophy stated above.

Ed


* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-25 18:23             ` Lengyel, Tamas
@ 2015-06-25 20:46               ` Ed White
  2015-06-25 22:45                 ` Lengyel, Tamas
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-25 20:46 UTC (permalink / raw)
  To: Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Razvan Cojocaru, Tim Deegan, Ian Jackson,
	Xen-devel, Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/25/2015 11:23 AM, Lengyel, Tamas wrote:
> On Thu, Jun 25, 2015 at 12:48 PM, Ed White <edmund.h.white@intel.com> wrote:
> 
>> On 06/25/2015 06:40 AM, Razvan Cojocaru wrote:
>>> On 06/25/2015 03:44 PM, Lengyel, Tamas wrote:
>>>> On Wed, Jun 24, 2015 at 2:06 PM, Ed White <edmund.h.white@intel.com
>>>> <mailto:edmund.h.white@intel.com>> wrote:
>>>>     On 06/24/2015 09:15 AM, Lengyel, Tamas wrote:
>>>>     >> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
>>>>     >> +                                 unsigned long pfn, xenmem_access_t access)
>>>>     >> +{
>>>>     >>
>>>>     >
>>>>     > This function IMHO should be merged with p2m_set_mem_access and
>> should be
>>>>     > triggerable with the same memop (XENMEM_access_op) hypercall
>> instead of
>>>>     > introducing a new hvmop one.
>>>>
>>>>     I think we should vote on this. My view is that it makes
>>>>     XENMEM_access_op
>>>>     too complicated to use.
>>>>
>>>> The two functions are not very long and share enough code that it would
>>>> justify merging. The only big change added is the copy from host->alt
>>>> when the entry doesn't exist in alt, and that itself is pretty
>>>> self-contained. Let's see if we can get a third opinion on it...
>>>
>>> At first sight (I admit I'm rather late in the game and haven't had a
>>> chance to follow the series closely from the beginning), the two
>>> functions do seem to be mergeable (or at least the common code factored
>>> out in static helper functions).
>>>
>>> Also, if Ed's concern is that the libxc API would look unnatural if
>>> xc_set_mem_access() is used for both purposes, as far as I can tell the
>>> only difference could be a non-zero last altp2m parameter, so I agree
>>> with you that the less functions doing almost the same thing the better
>>> (I have been guilty of this in the past too, for example with my
>>> xc_enable_introspection() function ;) ).
>>>
>>> So I'd say, yes, if possible merge them.
>>
>> So here are my reasons why I don't think we should merge the hypercalls,
>> in more detail:
>>
>> Although the two hypercalls are similar, they are not identical. For one
>> thing, the existing hypercall can only be used cross-domain whereas the
>> altp2m one can be used cross-domain or intra-domain.
> 
> 
> Fair point, the use of rcu_lock_live_remote_domain_by_id in the memaccess
> memop handler precludes it working for the intra-domain case. However, now
> that we have a valid use-case for it working when a domain applies
> restrictions on itself, it would be fine to change that to
> rcu_lock_domain_by_any_id. It has just been used as a sanity check. The
> code you are using in hvm.c could be abstracted as p2m_altp2m_sanity_check:
> "!is_hvm_domain(d) || !hvm_altp2m_supported() || !d->arch.altp2m_active"
> and run when the altp2m field is non-zero to catch buggy tools.

Whether or not it's possible to merge the two isn't in dispute. The
question is which path results in the easiest to understand and
maintain outcome for users of the hypercalls and maintainers of
the implementation. Having said that, I don't think your check
catches an attempt to place an intra-domain restriction on the
host p2m with altp2m active.

> 
>> Also, the existing
>> hypercall can be used to modify a range of pages and the new one can only
>> modify a single page, and that is intentional.
>>
> 
> Please elaborate on this.

In order to keep the p2m's coherent and respect the primacy of the host
p2m, changes that occur in the host p2m can cause changes in altp2m's to
be lost. At the moment there is not even any notification that this has
occurred, although that's something I'm working on. The minimum
*guaranteed* granularity of that type of altp2m invalidation is
an entire altp2m. The more pages you change in an altp2m, the more
chance there is of a collision causing an invalidation, so for this
version of altp2m we encourage as few changes as possible by requiring
a separate hypercall for each page modification.
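
So a caller that wants to restrict a range is expected to issue one
call per page, along these lines (a sketch; the tools-side wrapper
name is hypothetical, since the toolstack isn't part of this series):

    /* One hypercall per gfn, by design; stop on the first failure. */
    for ( gfn = start; gfn < start + nr_pages; gfn++ )
        if ( xc_altp2m_set_mem_access(xch, domid, view, gfn, access) < 0 )
            break;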

Ed

>> As I see it, the implementation in hvm.c would become a lot less clean,
>> and every direct user of the existing hypercall would have to change for
>> no good reason.
>>
> 
> For 4.6 I reworked the entire vm_event/mem_access system, so that is
> already happening irrespective of altp2m. It's fine to add support for the
> altp2m field before 4.6 freezes.
> 
> 
>> Razvan's suggestion to merge the functions that implement the p2m changes
>> I'm more ambivalent about. Personally, I prefer not to have code that
>> contains lots of conditional logic, which would be the result, but I
>> don't feel that strongly about it.
>>
>> Ed
>>
> 
> Thanks,
> Tamas
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-25 20:27         ` Ed White
@ 2015-06-25 21:33           ` Lengyel, Tamas
  0 siblings, 0 replies; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-25 21:33 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Xen-devel,
	Jan Beulich, Andrew Cooper, Daniel De Graaf


On Thu, Jun 25, 2015 at 4:27 PM, Ed White <edmund.h.white@intel.com> wrote:

> On 06/25/2015 10:42 AM, Lengyel, Tamas wrote:
> > On Thu, Jun 25, 2015 at 12:31 PM, Ed White <edmund.h.white@intel.com>
> wrote:
> >
> >> On 06/24/2015 07:44 PM, Lengyel, Tamas wrote:
> >>>> +    if ( altp2m_active )
> >>>> +    {
> >>>> +        if ( altp2mhvm_hap_nested_page_fault(v, gpa, gla, npfec, &p2m) == 1 )
> >>>> +        {
> >>>> +            /* entry was lazily copied from host -- retry */
> >>>>
> >>>
> >>> So I'm not fully following this logic here. I can see that the altp2m
> >>> entry got copied from the host. Why is there a need for the retry, why
> >>> not just continue?
> >>
> >> At this point the EPT's that the hardware is using have been made valid
> >> by software, but the hardware has already failed the access so you have
> >> to restart the operation. This isn't in any way specific to altp2m,
> >> it's how page fault logic works generally.
> >>
> >> Ed
> >>
> >
> > Oh I see, you are working with the assumption that the fault was
> > triggered by the entry not being present in the altp2m EPT, thus it's
> > enough to copy it to resolve the fault. However, if the hostp2m
> > permissions are restricted, there will be a follow-up fault. Would it
> > maybe make sense to check for that condition and avoid taking two faults?
>
> It's not an assumption, it's a fact because the altp2m nested page fault
> handler returns 1 IFF it has copied from the host p2m.
>
> Once again this is standard page fault handling. Preemptively checking
> for a condition that would cause another fault shortens the path for
> cases that would re-fault, but lengthens it for all the cases that would
> not. In a typical scenario (which your current experiments are not) you
> expect most cases not to re-fault. The cases that do re-fault are much
> more expensive anyway.
>
> There are other reasons not to preemptively check, but that's the most
> straightforward one.
>
> Ed
>

OK, thanks, makes sense.

Tamas


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-25 20:46               ` Ed White
@ 2015-06-25 22:45                 ` Lengyel, Tamas
  2015-06-25 23:10                   ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: Lengyel, Tamas @ 2015-06-25 22:45 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Razvan Cojocaru, Tim Deegan, Ian Jackson,
	Xen-devel, Jan Beulich, Andrew Cooper, Daniel De Graaf



On Thu, Jun 25, 2015 at 4:46 PM, Ed White <edmund.h.white@intel.com> wrote:

> On 06/25/2015 11:23 AM, Lengyel, Tamas wrote:
> > On Thu, Jun 25, 2015 at 12:48 PM, Ed White <edmund.h.white@intel.com>
> wrote:
> >
> >> On 06/25/2015 06:40 AM, Razvan Cojocaru wrote:
> >>> On 06/25/2015 03:44 PM, Lengyel, Tamas wrote:
> >>>> On Wed, Jun 24, 2015 at 2:06 PM, Ed White <edmund.h.white@intel.com
> >>>> <mailto:edmund.h.white@intel.com>> wrote:
> >>>>     On 06/24/2015 09:15 AM, Lengyel, Tamas wrote:
> >>>>     >> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
> >>>>     >> +                                 unsigned long pfn, xenmem_access_t access)
> >>>>     >> +{
> >>>>     >>
> >>>>     >
> >>>>     > This function IMHO should be merged with p2m_set_mem_access and
> >> should be
> >>>>     > triggerable with the same memop (XENMEM_access_op) hypercall
> >> instead of
> >>>>     > introducing a new hvmop one.
> >>>>
> >>>>     I think we should vote on this. My view is that it makes
> >>>>     XENMEM_access_op
> >>>>     too complicated to use.
> >>>>
> >>>> The two functions are not very long and share enough code that it
> >>>> would justify merging. The only big change added is the copy from
> >>>> host->alt when the entry doesn't exist in alt, and that itself is
> >>>> pretty self-contained. Let's see if we can get a third opinion on it...
> >>>
> >>> At first sight (I admit I'm rather late in the game and haven't had a
> >>> chance to follow the series closely from the beginning), the two
> >>> functions do seem to be mergeable (or at least the common code factored
> >>> out in static helper functions).
> >>>
> >>> Also, if Ed's concern is that the libxc API would look unnatural if
> >>> xc_set_mem_access() is used for both purposes, as far as I can tell the
> >>> only difference could be a non-zero last altp2m parameter, so I agree
> >>> with you that the less functions doing almost the same thing the better
> >>> (I have been guilty of this in the past too, for example with my
> >>> xc_enable_introspection() function ;) ).
> >>>
> >>> So I'd say, yes, if possible merge them.
> >>
> >> So here are my reasons why I don't think we should merge the hypercalls,
> >> in more detail:
> >>
> >> Although the two hypercalls are similar, they are not identical. For one
> >> thing, the existing hypercall can only be used cross-domain whereas the
> >> altp2m one can be used cross-domain or intra-domain.
> >
> >
> > Fair point, the use of rcu_lock_live_remote_domain_by_id in the memaccess
> > memop handler precludes it working for the intra-domain case. However, now
> > that we have a valid use-case for it working when a domain applies
> > restrictions on itself, it would be fine to change that to
> > rcu_lock_domain_by_any_id. It has just been used as a sanity check. The
> > code you are using in hvm.c could be abstracted as p2m_altp2m_sanity_check:
> > "!is_hvm_domain(d) || !hvm_altp2m_supported() || !d->arch.altp2m_active"
> > and run when the altp2m field is non-zero to catch buggy tools.
>
> Whether or not it's possible to merge the two isn't in dispute. The
> question is which path results in the easiest to understand and
> maintain outcome for users of the hypercalls and maintainers of
> the implementation.


If it turns out that merging the two is too big of a hassle, I would
agree with Razvan: some code-deduplication would be fine instead of a
complete merger. I still think it would be cleaner.
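
E.g. both entry points could funnel into one static helper along these
lines (a sketch; the helper name is made up):

    /* Shared body: look up the entry (copying it from the host p2m
     * first when targeting an altp2m), then set the access on it. */
    static int set_mem_access_one(struct p2m_domain *p2m,
                                  unsigned long gfn, p2m_access_t a);

with p2m_set_mem_access() looping it over its range and
p2m_set_altp2m_mem_access() calling it for its single page.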


> Having said that, I don't think your check
> catches an attempt to place an intra-domain restriction on the
> host p2m with altp2m active.
>

The check above I copied from the existing code where you do your hvm op. Do
you explicitly check for that condition somewhere else? Why not append it if
you need to restrict that condition?


>
> >
> >> Also, the existing
> >> hypercall can be used to modify a range of pages and the new one can
> only
> >> modify a single page, and that is intentional.
> >>
> >
> > Please elaborate on this.
>
> In order to keep the p2m's coherent and respect the primacy of the host
> p2m, changes that occur in the host p2m can cause changes in altp2m's to
> be lost. At the moment there is not even any notification that this has
> occurred, although that's something I'm working on. The minimum
> *guaranteed* granularity of that type of altp2m invalidation is
> an entire altp2m. The more pages you change in an altp2m, the more
> chance there is of a collision causing an invalidation, so for this
> version of altp2m we encourage as few changes as possible by requiring
> a separate hypercall for each page modification.
>

> Ed
>

OK, but we could check for the condition where npages>1 and an altp2m is
specified, and return -EOPNOTSUPP. It could be documented in the exposed part
of the API that this is a restriction with altp2m.

Tamas


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-25 22:45                 ` Lengyel, Tamas
@ 2015-06-25 23:10                   ` Ed White
  0 siblings, 0 replies; 116+ messages in thread
From: Ed White @ 2015-06-25 23:10 UTC (permalink / raw)
  To: Lengyel, Tamas
  Cc: Ravi Sahita, Wei Liu, Razvan Cojocaru, Tim Deegan, Ian Jackson,
	Xen-devel, Jan Beulich, Andrew Cooper, Daniel De Graaf

On 06/25/2015 03:45 PM, Lengyel, Tamas wrote:
> On Thu, Jun 25, 2015 at 4:46 PM, Ed White <edmund.h.white@intel.com> wrote:
> 
>> On 06/25/2015 11:23 AM, Lengyel, Tamas wrote:
>>> On Thu, Jun 25, 2015 at 12:48 PM, Ed White <edmund.h.white@intel.com>
>> wrote:
>>>
>>>> On 06/25/2015 06:40 AM, Razvan Cojocaru wrote:
>>>>> On 06/25/2015 03:44 PM, Lengyel, Tamas wrote:
>>>>>> On Wed, Jun 24, 2015 at 2:06 PM, Ed White <edmund.h.white@intel.com
>>>>>> <mailto:edmund.h.white@intel.com>> wrote:
>>>>>>     On 06/24/2015 09:15 AM, Lengyel, Tamas wrote:
>>>>>>     >> +bool_t p2m_set_altp2m_mem_access(struct domain *d, uint16_t idx,
>>>>>>     >> +                                 unsigned long pfn, xenmem_access_t access)
>>>>>>     >> +{
>>>>>>     >>
>>>>>>     >
>>>>>>     > This function IMHO should be merged with p2m_set_mem_access and
>>>> should be
>>>>>>     > triggerable with the same memop (XENMEM_access_op) hypercall
>>>> instead of
>>>>>>     > introducing a new hvmop one.
>>>>>>
>>>>>>     I think we should vote on this. My view is that it makes
>>>>>>     XENMEM_access_op
>>>>>>     too complicated to use.
>>>>>>
>>>>>> The two functions are not very long and share enough code that it
>>>>>> would justify merging. The only big change added is the copy from
>>>>>> host->alt when the entry doesn't exist in alt, and that itself is
>>>>>> pretty self-contained. Let's see if we can get a third opinion on it...
>>>>>
>>>>> At first sight (I admit I'm rather late in the game and haven't had a
>>>>> chance to follow the series closely from the beginning), the two
>>>>> functions do seem to be mergeable (or at least the common code factored
>>>>> out in static helper functions).
>>>>>
>>>>> Also, if Ed's concern is that the libxc API would look unnatural if
>>>>> xc_set_mem_access() is used for both purposes, as far as I can tell the
>>>>> only difference could be a non-zero last altp2m parameter, so I agree
>>>>> with you that the less functions doing almost the same thing the better
>>>>> (I have been guilty of this in the past too, for example with my
>>>>> xc_enable_introspection() function ;) ).
>>>>>
>>>>> So I'd say, yes, if possible merge them.
>>>>
>>>> So here are my reasons why I don't think we should merge the hypercalls,
>>>> in more detail:
>>>>
>>>> Although the two hypercalls are similar, they are not identical. For one
>>>> thing, the existing hypercall can only be used cross-domain whereas the
>>>> altp2m one can be used cross-domain or intra-domain.
>>>
>>>
>>> Fair point, the use of rcu_lock_live_remote_domain_by_id in the memaccess
>>> memop handler precludes it working for the intra-domain case. However, now
>>> that we have a valid use-case for it working when a domain applies
>>> restrictions on itself, it would be fine to change that to
>>> rcu_lock_domain_by_any_id. It has just been used as a sanity check. The
>>> code you are using in hvm.c could be abstracted as p2m_altp2m_sanity_check:
>>> "!is_hvm_domain(d) || !hvm_altp2m_supported() || !d->arch.altp2m_active"
>>> and run when the altp2m field is non-zero to catch buggy tools.
>>
>> Whether or not it's possible to merge the two isn't in dispute. The
>> question is which path results in the easiest to understand and
>> maintain outcome for users of the hypercalls and maintainers of
>> the implementation.
> 
> 
> If it turns out that merging the two is too big of a hassle, I would
> agree with Razvan: some code-deduplication would be fine instead of a
> complete merger. I still think it would be cleaner.
> 
> 
>> Having said that, I don't think your check
>> catches an attempt to place an intra-domain restriction on the
>> host p2m with altp2m active.
>>
> 
> The check above I copied from the existing code where you do your hvm op. Do
> you explicitly check for that condition somewhere else? Why not append it if
> you need to restrict that condition?
> 

The existing altp2m HVM op can't operate on the host p2m, so I don't
need a check, which I think reinforces the point I'm trying to make:
the code in hvm.c will get spaghetti-like if we go down this route.

>>
>>>
>>>> Also, the existing
>>>> hypercall can be used to modify a range of pages and the new one can
>> only
>>>> modify a single page, and that is intentional.
>>>>
>>>
>>> Please elaborate on this.
>>
>> In order to keep the p2m's coherent and respect the primacy of the host
>> p2m, changes that occur in the host p2m can cause changes in altp2m's to
>> be lost. At the moment there is not even any notification that this has
>> occurred, although that's something I'm working on. The minimum
>> *guaranteed* granularity of that type of altp2m invalidation is
>> an entire altp2m. The more pages you change in an altp2m, the more
>> chance there is of a collision causing an invalidation, so for this
>> version of altp2m we encourage as few changes as possible by requiring
>> a separate hypercall for each page modification.
>>
> 
>> Ed
>>
> 
> OK, but we could check for the condition where npages>1 and an altp2m is
> specified, and return -EOPNOTSUPP. It could be documented in the exposed part
> of the API that this is a restriction with altp2m.
> 

See above.

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-06-25 16:36         ` Ed White
@ 2015-06-26  6:04           ` Jan Beulich
  2015-06-26 16:27             ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: Jan Beulich @ 2015-06-26  6:04 UTC (permalink / raw)
  To: George Dunlap, Ed White
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson,
	xen-devel, tlengyel, Daniel De Graaf

>>> On 25.06.15 at 18:36, <edmund.h.white@intel.com> wrote:
> On 06/25/2015 01:12 AM, Jan Beulich wrote:
>>>>> On 24.06.15 at 19:53, <edmund.h.white@intel.com> wrote:
>>> On 06/24/2015 07:38 AM, Jan Beulich wrote:
>>>>>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
>>>>> --- a/xen/include/asm-x86/p2m.h
>>>>> +++ b/xen/include/asm-x86/p2m.h
>>>>> @@ -237,6 +237,19 @@ struct p2m_domain {
>>>>>                                         p2m_access_t *p2ma,
>>>>>                                         p2m_query_t q,
>>>>>                                         unsigned int *page_order);
>>>>> +    int                (*set_entry_full)(struct p2m_domain *p2m,
>>>>> +                                         unsigned long gfn,
>>>>> +                                         mfn_t mfn, unsigned int page_order,
>>>>> +                                         p2m_type_t p2mt,
>>>>> +                                         p2m_access_t p2ma,
>>>>> +                                         unsigned int sve);
>>>>> +    mfn_t              (*get_entry_full)(struct p2m_domain *p2m,
>>>>> +                                         unsigned long gfn,
>>>>> +                                         p2m_type_t *p2mt,
>>>>> +                                         p2m_access_t *p2ma,
>>>>> +                                         p2m_query_t q,
>>>>> +                                         unsigned int *page_order,
>>>>> +                                         unsigned int *sve);
>>>>
>>>> I have to admit that I find the _full suffixes here pretty odd. Based
>>>> on the functionality, they should be _sve. But then it seems
>>>> questionable how they could be useful to the generic p2m layer
>>>> anyway, i.e. why there would need to be such hooks in the first
>>>> place.
>>>
>>> I did originally use _sve suffixes. I changed them because there
>>> may be some future case where these routines control some other
>>> EPTE bit too. I made them hooks because I thought calling ept...
>>> functions directly would be a layering violation.
>> 
>> Indeed it would. But thinking about it more, I would suggest to
>> extend the existing accessors rather than adding new ones.
>> Just consider what would result when further such return values
>> are going to be needed in the future: I don't see us adding
>> _fuller, _fullest, etc variants. Perhaps just make the new output
>> an optional generic "flags" one. One might even consider folding
>> it with order, or even consolidate all the outputs into a single
>> structure.
> 
> The new functions are called in 3 places only, so changing them
> later would have minimal impact. The existing functions are called
> in many, many places. I *really* don't want to go changing the
> amount of existing code that doing what you suggest would entail
> at this late stage.

I continue to think differently (and I don't consider "at this late
stage" a particularly relevant argument), but the maintainer will
have the final say anyway - George?

Jan

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-06-26  6:04           ` Jan Beulich
@ 2015-06-26 16:27             ` Ed White
  2015-07-06 17:12               ` George Dunlap
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-26 16:27 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson,
	xen-devel, tlengyel, Daniel De Graaf

On 06/25/2015 11:04 PM, Jan Beulich wrote:
>>>> On 25.06.15 at 18:36, <edmund.h.white@intel.com> wrote:
>> On 06/25/2015 01:12 AM, Jan Beulich wrote:
>>>>>> On 24.06.15 at 19:53, <edmund.h.white@intel.com> wrote:
>>>> On 06/24/2015 07:38 AM, Jan Beulich wrote:
>>>>>>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
>>>>>> --- a/xen/include/asm-x86/p2m.h
>>>>>> +++ b/xen/include/asm-x86/p2m.h
>>>>>> @@ -237,6 +237,19 @@ struct p2m_domain {
>>>>>>                                         p2m_access_t *p2ma,
>>>>>>                                         p2m_query_t q,
>>>>>>                                         unsigned int *page_order);
>>>>>> +    int                (*set_entry_full)(struct p2m_domain *p2m,
>>>>>> +                                         unsigned long gfn,
>>>>>> +                                         mfn_t mfn, unsigned int page_order,
>>>>>> +                                         p2m_type_t p2mt,
>>>>>> +                                         p2m_access_t p2ma,
>>>>>> +                                         unsigned int sve);
>>>>>> +    mfn_t              (*get_entry_full)(struct p2m_domain *p2m,
>>>>>> +                                         unsigned long gfn,
>>>>>> +                                         p2m_type_t *p2mt,
>>>>>> +                                         p2m_access_t *p2ma,
>>>>>> +                                         p2m_query_t q,
>>>>>> +                                         unsigned int *page_order,
>>>>>> +                                         unsigned int *sve);
>>>>>
>>>>> I have to admit that I find the _full suffixes here pretty odd. Based
>>>>> on the functionality, they should be _sve. But then it seems
>>>>> questionable how they could be useful to the generic p2m layer
>>>>> anyway, i.e. why there would need to be such hooks in the first
>>>>> place.
>>>>
>>>> I did originally use _sve suffixes. I changed them because there
>>>> may be some future case where these routines control some other
>>>> EPTE bit too. I made them hooks because I thought calling ept...
>>>> functions directly would be a layering violation.
>>>
>>> Indeed it would. But thinking about it more, I would suggest to
>>> extend the existing accessors rather than adding new ones.
>>> Just consider what would result when further such return values
>>> are going to be needed in the future: I don't see us adding
>>> _fuller, _fullest, etc variants. Perhaps just make the new output
>>> an optional generic "flags" one. One might even consider folding
>>> it with order, or even consolidate all the outputs into a single
>>> structure.
>>
>> The new functions are called in 3 places only, so changing them
>> later would have minimal impact. The existing functions are called
>> in many, many places. I *really* don't want to go changing the
>> amount of existing code that doing what you suggest would entail
>> at this late stage.
> 
> I continue to think differently (and I don't consider "at this late
> stage" a particularly relevant argument), but the maintainer will
> have the final say anyway - George?
> 

The patch as it is now doesn't disturb (and risk breaking) any
existing code. I'd much rather stick with that for 4.6, even if
only on the condition that I have to change it later. If I do
what you suggest, that sets me up to fail to get anything in
4.6. That may not matter to you, but it matters to me.
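
(For reference, what I understand you to be suggesting is folding the
new output into the existing accessor rather than adding a second
hook -- a sketch, not code from any patch:

    mfn_t (*get_entry)(struct p2m_domain *p2m,
                       unsigned long gfn,
                       p2m_type_t *p2mt,
                       p2m_access_t *p2ma,
                       p2m_query_t q,
                       unsigned int *page_order,
                       unsigned int *flags);  /* optional out, e.g. sve */

and every one of the existing call sites would then grow an extra NULL
argument, which is exactly the churn I'm trying to avoid for 4.6.)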

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-24 18:19       ` Andrew Cooper
@ 2015-06-26 16:30         ` Ed White
  2015-06-29 13:03           ` Andrew Cooper
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-26 16:30 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Jan Beulich,
	tlengyel, Daniel De Graaf

On 06/24/2015 11:19 AM, Andrew Cooper wrote:
> On 24/06/15 18:47, Ed White wrote:
>>>> This looks like some hoop jumping around the assertions in
>>>> domain_pause() and vcpu_pause().
>>>>
>>>> We should probably have some new helpers where the domain needs to be
>>>> paused, possibly while in context.  The current domain/vcpu_pause() are
>>>> almost always used where it is definitely not safe to pause in context,
>>>> hence the assertions.
>>>>
>> It is. I'd be happy to use new helpers, I don't feel qualified to
>> write them.
>>
>> Ed
> 
> Something like this?  Only compile tested.  In the meantime, I have an
> optimisation in mind for domain_pause() on domains with large numbers of
> vcpus, but that will have to wait a while.
> 
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Date: Wed, 24 Jun 2015 19:06:14 +0100
> Subject: [PATCH] common/domain: Helpers to pause a domain while in context
> 
> For use on codepaths which would need to use domain_pause() but might be in
> the target domain's context.  In the case that the target domain is in
> context, all other vcpus are paused.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/common/domain.c     |   28 ++++++++++++++++++++++++++++
>  xen/include/xen/sched.h |    5 +++++
>  2 files changed, 33 insertions(+)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 3bc52e6..a1d27e3 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1010,6 +1010,34 @@ int domain_unpause_by_systemcontroller(struct domain *d)
>      return 0;
>  }
>  
> +void domain_pause_except_self(struct domain *d)
> +{
> +    struct vcpu *v, *curr = current;
> +
> +    if ( curr->domain == d )
> +    {
> +        for_each_vcpu( d, v )
> +            if ( likely(v != curr) )
> +                vcpu_pause(v);
> +    }
> +    else
> +        domain_pause(d);
> +}
> +
> +void domain_unpause_except_self(struct domain *d)
> +{
> +    struct vcpu *v, *curr = current;
> +
> +    if ( curr->domain == d )
> +    {
> +        for_each_vcpu( d, v )
> +            if ( likely(v != curr) )
> +                vcpu_unpause(v);
> +    }
> +    else
> +        domain_unpause(d);
> +}
> +
>  int vcpu_reset(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index b29d9e7..8e1345a 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -804,6 +804,11 @@ static inline int domain_pause_by_systemcontroller_nosync(struct domain *d)
>  {
>      return __domain_pause_by_systemcontroller(d, domain_pause_nosync);
>  }
> +
> +/* domain_pause() but safe against trying to pause current. */
> +void domain_pause_except_self(struct domain *d);
> +void domain_unpause_except_self(struct domain *d);
> +
>  void cpu_init(void);
>  
>  struct scheduler;
> 
> 
Did you commit this to staging? IOW, can I apply it to my branch
and assume it will already be in-tree when our patches are applied?

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 12/12] x86/altp2m: XSM hooks for altp2m HVM ops
  2015-06-22 18:56 ` [PATCH v2 12/12] x86/altp2m: XSM hooks for altp2m HVM ops Ed White
@ 2015-06-26 19:24   ` Daniel De Graaf
  2015-06-26 19:35     ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: Daniel De Graaf @ 2015-06-26 19:24 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	Andrew Cooper, tlengyel

On 06/22/2015 02:56 PM, Ed White wrote:
> From: Ravi Sahita <ravi.sahita@intel.com>
>
> Signed-off-by: Ravi Sahita <ravi.sahita@intel.com>

One comment, below.

[...]
> diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
> index f4cde11..c95109f 100644
> --- a/tools/flask/policy/policy/modules/xen/xen.if
> +++ b/tools/flask/policy/policy/modules/xen/xen.if
> @@ -8,7 +8,7 @@
>   define(`declare_domain_common', `
>   	allow $1 $2:grant { query setup };
>   	allow $1 $2:mmu { adjust physmap map_read map_write stat pinpage updatemp mmuext_op };
> -	allow $1 $2:hvm { getparam setparam };
> +	allow $1 $2:hvm { getparam setparam altp2mhvm altp2mhvm_op };
>   	allow $1 $2:domain2 get_vnumainfo;
>   ')

This allows any domain to enable altp2m on itself; I think you meant to
only allow altp2mhvm_op here, requiring a privileged domain to first
enable the feature on a domain before anyone can use it.
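
I.e. something like this in declare_domain_common (a sketch):

	allow $1 $2:hvm { getparam setparam altp2mhvm_op };

with altp2mhvm itself granted only in the privileged-domain interfaces,
so the toolstack has to turn the feature on first.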

Otherwise, this looks good, although if patch #10 is changed to expose
a single subop, the altp2mhvm_op XSM checks will need to be relocated.

-- 
Daniel De Graaf
National Security Agency

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 12/12] x86/altp2m: XSM hooks for altp2m HVM ops
  2015-06-26 19:24   ` Daniel De Graaf
@ 2015-06-26 19:35     ` Ed White
  2015-06-29 17:52       ` Daniel De Graaf
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-06-26 19:35 UTC (permalink / raw)
  To: Daniel De Graaf, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	Andrew Cooper, tlengyel

On 06/26/2015 12:24 PM, Daniel De Graaf wrote:
> On 06/22/2015 02:56 PM, Ed White wrote:
>> From: Ravi Sahita <ravi.sahita@intel.com>
>>
>> Signed-off-by: Ravi Sahita <ravi.sahita@intel.com>
> 
> One comment, below.
> 
> [...]
>> diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
>> index f4cde11..c95109f 100644
>> --- a/tools/flask/policy/policy/modules/xen/xen.if
>> +++ b/tools/flask/policy/policy/modules/xen/xen.if
>> @@ -8,7 +8,7 @@
>>   define(`declare_domain_common', `
>>       allow $1 $2:grant { query setup };
>>       allow $1 $2:mmu { adjust physmap map_read map_write stat pinpage updatemp mmuext_op };
>> -    allow $1 $2:hvm { getparam setparam };
>> +    allow $1 $2:hvm { getparam setparam altp2mhvm altp2mhvm_op };
>>       allow $1 $2:domain2 get_vnumainfo;
>>   ')
> 
> This allows any domain to enable altp2m on itself; I think you meant to
> only allow altp2mhvm_op here, requiring a privileged domain to first
> enable the feature on a domain before anyone can use it.
> 

We certainly don't want to unconditionally disallow that. We want the
policy to offer the ability to choose whether it's allowed or not.
Does the patch do that?

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 04/12] x86/altp2m: basic data structures and support routines.
  2015-06-24 10:29   ` Andrew Cooper
  2015-06-24 11:14     ` Andrew Cooper
@ 2015-06-26 21:17     ` Ed White
  2015-06-27 19:25       ` Ed White
  2015-06-29 13:00       ` Andrew Cooper
  1 sibling, 2 replies; 116+ messages in thread
From: Ed White @ 2015-06-26 21:17 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 06/24/2015 03:29 AM, Andrew Cooper wrote:
> On 22/06/15 19:56, Ed White wrote:
>> diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
>> index 3d8f4dc..a1529c0 100644
>> --- a/xen/include/asm-x86/hvm/vcpu.h
>> +++ b/xen/include/asm-x86/hvm/vcpu.h
>> @@ -118,6 +118,13 @@ struct nestedvcpu {
>>  
>>  #define vcpu_nestedhvm(v) ((v)->arch.hvm_vcpu.nvcpu)
>>  
>> +struct altp2mvcpu {
>> +    uint16_t    p2midx;         /* alternate p2m index */
>> +    uint64_t    veinfo_gfn;     /* #VE information page guest pfn */
> 
> Please use the recently-introduced pfn_t here.  pfn is a more
> appropriate term than gfn in this case.

Did you mean pfn_t, or xen_pfn_t? I'm having a hard time
figuring out how to use a pfn_t, I can't even assign
INVALID_PFN to one.

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 04/12] x86/altp2m: basic data structures and support routines.
  2015-06-26 21:17     ` Ed White
@ 2015-06-27 19:25       ` Ed White
  2015-06-29 13:00       ` Andrew Cooper
  1 sibling, 0 replies; 116+ messages in thread
From: Ed White @ 2015-06-27 19:25 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 06/26/2015 02:17 PM, Ed White wrote:
> On 06/24/2015 03:29 AM, Andrew Cooper wrote:
>> On 22/06/15 19:56, Ed White wrote:
>>> diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
>>> index 3d8f4dc..a1529c0 100644
>>> --- a/xen/include/asm-x86/hvm/vcpu.h
>>> +++ b/xen/include/asm-x86/hvm/vcpu.h
>>> @@ -118,6 +118,13 @@ struct nestedvcpu {
>>>  
>>>  #define vcpu_nestedhvm(v) ((v)->arch.hvm_vcpu.nvcpu)
>>>  
>>> +struct altp2mvcpu {
>>> +    uint16_t    p2midx;         /* alternate p2m index */
>>> +    uint64_t    veinfo_gfn;     /* #VE information page guest pfn */
>>
>> Please use the recently-introduced pfn_t here.  pfn is a more
>> appropriate term than gfn in this case.
> 
> Did you mean pfn_t, or xen_pfn_t? I'm having a hard time
> figuring out how to use a pfn_t, I can't even assign
> INVALID_PFN to one.
> 

Scratch that. After a while fumbling around in the dark,
I worked out how to use pfn_t.

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 04/12] x86/altp2m: basic data structures and support routines.
  2015-06-26 21:17     ` Ed White
  2015-06-27 19:25       ` Ed White
@ 2015-06-29 13:00       ` Andrew Cooper
  2015-06-29 16:23         ` Ed White
  1 sibling, 1 reply; 116+ messages in thread
From: Andrew Cooper @ 2015-06-29 13:00 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 26/06/15 22:17, Ed White wrote:
> On 06/24/2015 03:29 AM, Andrew Cooper wrote:
>> On 22/06/15 19:56, Ed White wrote:
>>> diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
>>> index 3d8f4dc..a1529c0 100644
>>> --- a/xen/include/asm-x86/hvm/vcpu.h
>>> +++ b/xen/include/asm-x86/hvm/vcpu.h
>>> @@ -118,6 +118,13 @@ struct nestedvcpu {
>>>  
>>>  #define vcpu_nestedhvm(v) ((v)->arch.hvm_vcpu.nvcpu)
>>>  
>>> +struct altp2mvcpu {
>>> +    uint16_t    p2midx;         /* alternate p2m index */
>>> +    uint64_t    veinfo_gfn;     /* #VE information page guest pfn */
>> Please use the recently-introduced pfn_t here.  pfn is a more
>> appropriate term than gfn in this case.
> Did you mean pfn_t, or xen_pfn_t?

Actually I meant gfn_t, per the followup I sent shortly afterwards.

> I'm having a hard time
> figuring out how to use a pfn_t, I can't even assign
> INVALID_PFN to one.

Documentation in c/s 177bd5f, example in c/s 24036a5.  The point of this
is to catch gfn/mfn/pfn confusion at compile time.

For now, it will require copious use of _gfn() and gfn_x() until the
rest of the mm subsystem has been updated to use the new typesafe types.
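
E.g. (a sketch):

    gfn_t gfn = _gfn(raw_gfn);       /* wrap an untyped value */
    unsigned long raw = gfn_x(gfn);  /* unwrap at an untyped boundary */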

~Andrew

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-26 16:30         ` Ed White
@ 2015-06-29 13:03           ` Andrew Cooper
  2015-06-29 16:24             ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: Andrew Cooper @ 2015-06-29 13:03 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Jan Beulich,
	tlengyel, Daniel De Graaf

On 26/06/15 17:30, Ed White wrote:
> On 06/24/2015 11:19 AM, Andrew Cooper wrote:
>> On 24/06/15 18:47, Ed White wrote:
>>>>> This looks like some hoop jumping around the assertions in
>>>>> domain_pause() and vcpu_pause().
>>>>>
>>>>> We should probably have some new helpers where the domain needs to be
>>>>> paused, possibly while in context.  The current domain/vcpu_pause() are
>>>>> almost always used where it is definitely not safe to pause in context,
>>>>> hence the assertions.
>>>>>
>>> It is. I'd be happy to use new helpers, I don't feel qualified to
>>> write them.
>>>
>>> Ed
>> Something like this?  Only compile tested.  In the meantime, I have an
>> optimisation in mind for domain_pause() on domains with large numbers of
>> vcpus, but that will have to wait a while.
>>
>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>> Date: Wed, 24 Jun 2015 19:06:14 +0100
>> Subject: [PATCH] common/domain: Helpers to pause a domain while in context
>>
>> For use on codepaths which would need to use domain_pause() but might be in
>> the target domain's context.  In the case that the target domain is in
>> context, all other vcpus are paused.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>>  xen/common/domain.c     |   28 ++++++++++++++++++++++++++++
>>  xen/include/xen/sched.h |    5 +++++
>>  2 files changed, 33 insertions(+)
>>
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index 3bc52e6..a1d27e3 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -1010,6 +1010,34 @@ int domain_unpause_by_systemcontroller(struct domain *d)
>>      return 0;
>>  }
>>  
>> +void domain_pause_except_self(struct domain *d)
>> +{
>> +    struct vcpu *v, *curr = current;
>> +
>> +    if ( curr->domain == d )
>> +    {
>> +        for_each_vcpu( d, v )
>> +            if ( likely(v != curr) )
>> +                vcpu_pause(v);
>> +    }
>> +    else
>> +        domain_pause(d);
>> +}
>> +
>> +void domain_unpause_except_self(struct domain *d)
>> +{
>> +    struct vcpu *v, *curr = current;
>> +
>> +    if ( curr->domain == d )
>> +    {
>> +        for_each_vcpu( d, v )
>> +            if ( likely(v != curr) )
>> +                vcpu_unpause(v);
>> +    }
>> +    else
>> +        domain_unpause(d);
>> +}
>> +
>>  int vcpu_reset(struct vcpu *v)
>>  {
>>      struct domain *d = v->domain;
>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>> index b29d9e7..8e1345a 100644
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -804,6 +804,11 @@ static inline int domain_pause_by_systemcontroller_nosync(struct domain *d)
>>  {
>>      return __domain_pause_by_systemcontroller(d, domain_pause_nosync);
>>  }
>> +
>> +/* domain_pause() but safe against trying to pause current. */
>> +void domain_pause_except_self(struct domain *d);
>> +void domain_unpause_except_self(struct domain *d);
>> +
>>  void cpu_init(void);
>>  
>>  struct scheduler;
>>
>>
> Did you commit this to staging?

I am not a committer, so couldn't even if I wished to.

> IOW, can I apply it to my branch
> and assume it will already be in-tree when our patches are applied?

You will be the first user of the patch, and as noted, I have only
compile tested.  Please take it and put it at the start of your series.
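
A call site then looks roughly like this (a sketch):

    /* Safe even when current is a vcpu of d: every *other* vcpu of d
     * is paused, and the caller keeps running. */
    domain_pause_except_self(d);

    /* ... update altp2m state for the whole domain ... */

    domain_unpause_except_self(d);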

~Andrew

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 02/12] VMX: implement suppress #VE.
  2015-06-22 18:56 ` [PATCH v2 02/12] VMX: implement suppress #VE Ed White
  2015-06-24  9:35   ` Andrew Cooper
@ 2015-06-29 14:20   ` George Dunlap
  2015-06-29 14:31     ` Andrew Cooper
  1 sibling, 1 reply; 116+ messages in thread
From: George Dunlap @ 2015-06-29 14:20 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, xen-devel,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

On Mon, Jun 22, 2015 at 7:56 PM, Ed White <edmund.h.white@intel.com> wrote:
> In preparation for selectively enabling #VE in a later patch, set
> suppress #VE on all EPTE's.
>
> Suppress #VE should always be the default condition for two reasons:
> it is generally not safe to deliver #VE into a guest unless that guest
> has been modified to receive it; and even then for most EPT violations only
> the hypervisor is able to handle the violation.
>
> Signed-off-by: Ed White <edmund.h.white@intel.com>
> ---
>  xen/arch/x86/mm/p2m-ept.c | 25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index a6c9adf..5de3387 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -41,7 +41,7 @@
>  #define is_epte_superpage(ept_entry)    ((ept_entry)->sp)
>  static inline bool_t is_epte_valid(ept_entry_t *e)
>  {
> -    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
> +    return ((e->epte & ~(1ul << 63)) != 0 && e->sa_p2mt != p2m_invalid);

So just getting up to speed here: Is it the case that if #VE is
enabled in vmcs that a #VE will be delivered to the guest on any
invalid epte entry that doesn't contain this flag?  So we now need to
actively choose a "default" which is different than the hardware?

 -George

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 02/12] VMX: implement suppress #VE.
  2015-06-29 14:20   ` George Dunlap
@ 2015-06-29 14:31     ` Andrew Cooper
  2015-06-29 15:03       ` George Dunlap
  0 siblings, 1 reply; 116+ messages in thread
From: Andrew Cooper @ 2015-06-29 14:31 UTC (permalink / raw)
  To: George Dunlap, Ed White
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, xen-devel,
	Jan Beulich, tlengyel, Daniel De Graaf

On 29/06/15 15:20, George Dunlap wrote:
> On Mon, Jun 22, 2015 at 7:56 PM, Ed White <edmund.h.white@intel.com> wrote:
>> In preparation for selectively enabling #VE in a later patch, set
>> suppress #VE on all EPTE's.
>>
>> Suppress #VE should always be the default condition for two reasons:
>> it is generally not safe to deliver #VE into a guest unless that guest
>> has been modified to receive it; and even then for most EPT violations only
>> the hypervisor is able to handle the violation.
>>
>> Signed-off-by: Ed White <edmund.h.white@intel.com>
>> ---
>>  xen/arch/x86/mm/p2m-ept.c | 25 ++++++++++++++++++++++++-
>>  1 file changed, 24 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>> index a6c9adf..5de3387 100644
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -41,7 +41,7 @@
>>  #define is_epte_superpage(ept_entry)    ((ept_entry)->sp)
>>  static inline bool_t is_epte_valid(ept_entry_t *e)
>>  {
>> -    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
>> +    return ((e->epte & ~(1ul << 63)) != 0 && e->sa_p2mt != p2m_invalid);
> So just getting up to speed here: Is it the case that if #VE is
> enabled in vmcs that a #VE will be delivered to the guest on any
> invalid epte entry that doesn't contain this flag?

There is a list of conditions which must be satisfied for a #VE to be
injected instead of an EPT related VMexit.  All EPT misconfiguration
still exit to the hypervisor, but this suppress_ve bit allows the
hypervisor to choose whether a plain EPT permission violation exits
to Xen, or injects a #VE.

> So we now need to
> actively choose a "default" which is different than the hardware?

By default, setting suppress_ve on everything will cause everything to
behave as before.  Clearing suppress_ve is an optimisation to avoid a
vmexit/vmentry for faults needing bouncing to an in-guest agent.

~Andrew

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 02/12] VMX: implement suppress #VE.
  2015-06-29 14:31     ` Andrew Cooper
@ 2015-06-29 15:03       ` George Dunlap
  2015-06-29 16:21         ` Sahita, Ravi
  2015-06-29 16:21         ` Ed White
  0 siblings, 2 replies; 116+ messages in thread
From: George Dunlap @ 2015-06-29 15:03 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Ed White,
	xen-devel, Jan Beulich, tlengyel, Daniel De Graaf

On Mon, Jun 29, 2015 at 3:31 PM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 29/06/15 15:20, George Dunlap wrote:
>> On Mon, Jun 22, 2015 at 7:56 PM, Ed White <edmund.h.white@intel.com> wrote:
>>> In preparation for selectively enabling #VE in a later patch, set
>>> suppress #VE on all EPTE's.
>>>
>>> Suppress #VE should always be the default condition for two reasons:
>>> it is generally not safe to deliver #VE into a guest unless that guest
>>> has been modified to receive it; and even then for most EPT violations only
>>> the hypervisor is able to handle the violation.
>>>
>>> Signed-off-by: Ed White <edmund.h.white@intel.com>
>>> ---
>>>  xen/arch/x86/mm/p2m-ept.c | 25 ++++++++++++++++++++++++-
>>>  1 file changed, 24 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>>> index a6c9adf..5de3387 100644
>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -41,7 +41,7 @@
>>>  #define is_epte_superpage(ept_entry)    ((ept_entry)->sp)
>>>  static inline bool_t is_epte_valid(ept_entry_t *e)
>>>  {
>>> -    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
>>> +    return ((e->epte & ~(1ul << 63)) != 0 && e->sa_p2mt != p2m_invalid);
>> So just getting up to speed here: Is it the case that if #VE is
>> enabled in vmcs that a #VE will be delivered to the guest on any
>> invalid epte entry that doesn't contain this flag?
>
> There is a list of conditions which must be satisfied for a #VE to be
> injected instead of an EPT related VMexit.  All EPT misconfiguration
> still exit to the hypervisor, but this suppress_ve bit allows the
> hypervisor to choose whether a plain EPT permission violation exits
> to Xen, or injects a #VE.
>
>> So we now need to
>> actively choose a "default" which is different than the hardware?
>
> By default, setting suppress_ve on everything will cause everything to
> behave as before.  Clearing suppress_ve is an optimisation to avoid a
> vmexit/vmentry for faults needing bouncing to an in-guest agent.

So the short answer is, 'yes':  The hardware will deliver #VEs for all
non-misconfigured ept entries (which includes entries which are simply
not present) unless you actively do something to suppress them; what
we want is *not* to deliver #VEs unless the guest actively does
something to cause them to be delivered for particular GPAs.

Thanks,
 -George

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 02/12] VMX: implement suppress #VE.
  2015-06-29 15:03       ` George Dunlap
@ 2015-06-29 16:21         ` Sahita, Ravi
  2015-06-29 16:21         ` Ed White
  1 sibling, 0 replies; 116+ messages in thread
From: Sahita, Ravi @ 2015-06-29 16:21 UTC (permalink / raw)
  To: George Dunlap, Andrew Cooper
  Cc: Wei Liu, Tim Deegan, Ian Jackson, White, Edmund H, xen-devel,
	Jan Beulich, tlengyel, Daniel De Graaf


On Mon, Jun 29, 2015 at 3:31 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 29/06/15 15:20, George Dunlap wrote:
>> On Mon, Jun 22, 2015 at 7:56 PM, Ed White <edmund.h.white@intel.com> wrote:
>>> In preparation for selectively enabling #VE in a later patch, set 
>>> suppress #VE on all EPTE's.
>>>
>>> Suppress #VE should always be the default condition for two reasons:
>>> it is generally not safe to deliver #VE into a guest unless that 
>>> guest has been modified to receive it; and even then for most EPT 
>>> violations only the hypervisor is able to handle the violation.
>>>
>>> Signed-off-by: Ed White <edmund.h.white@intel.com>
>>> ---
>>>  xen/arch/x86/mm/p2m-ept.c | 25 ++++++++++++++++++++++++-
>>>  1 file changed, 24 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c 
>>> index a6c9adf..5de3387 100644
>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -41,7 +41,7 @@
>>>  #define is_epte_superpage(ept_entry)    ((ept_entry)->sp)
>>>  static inline bool_t is_epte_valid(ept_entry_t *e)
>>>  {
>>> -    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
>>> +    return ((e->epte & ~(1ul << 63)) != 0 && e->sa_p2mt != p2m_invalid);
>> So just getting up to speed here: Is it the case that if #VE is 
>> enabled in vmcs that a #VE will be delivered to the guest on any 
>> invalid epte entry that doesn't contain this flag?
>
> There is a list of conditions which must be satisfied for a #VE to be 
> injected instead of an EPT related VMexit.  All EPT misconfiguration 
> still exit to the hypervisor, but this suppress_ve bit allows the 
> hypervisor to choose whether a plain EPT permission violation exits
> to Xen, or injects a #VE.
>
>> So we now need to
>> actively choose a "default" which is different than the hardware?
>
> By default, setting suppress_ve on everything will cause everything to 
> behave as before.  Clearing suppress_ve is an optimisation to avoid a 
> vmexit/vmentry for faults needing bouncing to an in-guest agent.

So the short answer is, 'yes':  The hardware will deliver #VEs for all non-misconfigured ept entries (which includes entries which are simply not present) unless you actively do something to suppress them; what we want is *not* to deliver #VEs unless the guest actively does something to cause them to be delivered for particular GPAs.

Ravi> correct, by setting suppress-ve in the default EPTE we achieve that behavior of not delivering #VE (i.e. legacy behavior) unless the guest actively sets an altp2m policy for specific GPAs.

Ravi

Thanks,
 -George

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 02/12] VMX: implement suppress #VE.
  2015-06-29 15:03       ` George Dunlap
  2015-06-29 16:21         ` Sahita, Ravi
@ 2015-06-29 16:21         ` Ed White
  1 sibling, 0 replies; 116+ messages in thread
From: Ed White @ 2015-06-29 16:21 UTC (permalink / raw)
  To: George Dunlap, Andrew Cooper
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, xen-devel,
	Jan Beulich, tlengyel, Daniel De Graaf

On 06/29/2015 08:03 AM, George Dunlap wrote:
> On Mon, Jun 29, 2015 at 3:31 PM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 29/06/15 15:20, George Dunlap wrote:
>>> On Mon, Jun 22, 2015 at 7:56 PM, Ed White <edmund.h.white@intel.com> wrote:
>>>> In preparation for selectively enabling #VE in a later patch, set
>>>> suppress #VE on all EPTE's.
>>>>
>>>> Suppress #VE should always be the default condition for two reasons:
>>>> it is generally not safe to deliver #VE into a guest unless that guest
>>>> has been modified to receive it; and even then for most EPT violations only
>>>> the hypervisor is able to handle the violation.
>>>>
>>>> Signed-off-by: Ed White <edmund.h.white@intel.com>
>>>> ---
>>>>  xen/arch/x86/mm/p2m-ept.c | 25 ++++++++++++++++++++++++-
>>>>  1 file changed, 24 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>>>> index a6c9adf..5de3387 100644
>>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>>> @@ -41,7 +41,7 @@
>>>>  #define is_epte_superpage(ept_entry)    ((ept_entry)->sp)
>>>>  static inline bool_t is_epte_valid(ept_entry_t *e)
>>>>  {
>>>> -    return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
>>>> +    return ((e->epte & ~(1ul << 63)) != 0 && e->sa_p2mt != p2m_invalid);
>>> So just getting up to speed here: Is it the case that if #VE is
>>> enabled in vmcs that a #VE will be delivered to the guest on any
>>> invalid epte entry that doesn't contain this flag?
>>
>> There is a list of conditions which must be satisfied for a #VE to be
>> injected instead of an EPT related VMexit.  All EPT misconfiguration
>> still exit to the hypervisor, but this suppress_ve bit allows the
>> hypervisor to choose whether a plain EPT permission violation exits
>> to Xen, or injects a #VE.
>>
>>> So we now need to
>>> actively choose a "default" which is different than the hardware?
>>
>> By default, setting suppress_ve on everything will cause everything to
>> behave as before.  Clearing suppress_ve is an optimisation to avoid a
>> vmexit/vmentry for faults needing bouncing to an in-guest agent.
> 
> So the short answer is, 'yes':  The hardware will deliver #VEs for all
> non-misconfigured ept entries (which includes entries which are simply
> not present) unless you actively do something to suppress them; what
> we want is *not* to deliver #VEs unless the guest actively does
> something to cause them to be delivered for particular GPAs.
> 

Exactly. After this patch, the hypervisor can enable #VE in the VMCS
but no #VE's will actually be delivered. A later patch selectively
enables them on certain EPTE's.
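
That is also why is_epte_valid() now masks bit 63: with suppress #VE
set by default, a never-populated entry is no longer all-zero, so the
validity test has to ignore that bit (a sketch of the same check):

    #define EPTE_SUPPRESS_VE (1ul << 63)

    /* Valid iff some bit other than suppress_ve is set and the saved
     * p2m type is not p2m_invalid. */
    return ((e->epte & ~EPTE_SUPPRESS_VE) != 0 && e->sa_p2mt != p2m_invalid);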

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 04/12] x86/altp2m: basic data structures and support routines.
  2015-06-29 13:00       ` Andrew Cooper
@ 2015-06-29 16:23         ` Ed White
  0 siblings, 0 replies; 116+ messages in thread
From: Ed White @ 2015-06-29 16:23 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	tlengyel, Daniel De Graaf

On 06/29/2015 06:00 AM, Andrew Cooper wrote:
> On 26/06/15 22:17, Ed White wrote:
>> On 06/24/2015 03:29 AM, Andrew Cooper wrote:
>>> On 22/06/15 19:56, Ed White wrote:
>>>> diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
>>>> index 3d8f4dc..a1529c0 100644
>>>> --- a/xen/include/asm-x86/hvm/vcpu.h
>>>> +++ b/xen/include/asm-x86/hvm/vcpu.h
>>>> @@ -118,6 +118,13 @@ struct nestedvcpu {
>>>>  
>>>>  #define vcpu_nestedhvm(v) ((v)->arch.hvm_vcpu.nvcpu)
>>>>  
>>>> +struct altp2mvcpu {
>>>> +    uint16_t    p2midx;         /* alternate p2m index */
>>>> +    uint64_t    veinfo_gfn;     /* #VE information page guest pfn */
>>> Please use the recently-introduced pfn_t here.  pfn is a more
>>> appropriate term than gfn in this case.
>> Did you mean pfn_t, or xen_pfn_t?
> 
> Actually I meant gfn_t, per the followup I sent shortly afterwards.
> 
>> I'm having a hard time
>> figuring out how to use a pfn_t, I can't even assign
>> INVALID_PFN to one.
> 
> Documentation in c/s 177bd5f, example in c/s 24036a5.  The point of this
> is to catch gfn/mfn/pfn confusion at compile time.
> 
> For now, it will require copious use of _gfn() and gfn_x() until the
> rest of the mm subsystem has been updated to use the new typesafe types.
> 

Understood. I am finding that the re-work to use gfn_t in this and patch 9
is making for very messy code, but I am doing it.

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 09/12] x86/altp2m: add remaining support routines.
  2015-06-29 13:03           ` Andrew Cooper
@ 2015-06-29 16:24             ` Ed White
  0 siblings, 0 replies; 116+ messages in thread
From: Ed White @ 2015-06-29 16:24 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Ravi Sahita, Wei Liu, Tim Deegan, Ian Jackson, Jan Beulich,
	tlengyel, Daniel De Graaf

On 06/29/2015 06:03 AM, Andrew Cooper wrote:
> On 26/06/15 17:30, Ed White wrote:
>> On 06/24/2015 11:19 AM, Andrew Cooper wrote:
>>> On 24/06/15 18:47, Ed White wrote:
>>>>>> This looks like some hoop jumping around the assertions in
>>>>>> domain_pause() and vcpu_pause().
>>>>>>
>>>>>> We should probably have some new helpers where the domain needs to be
>>>>>> paused, possibly while in context.  The current domain/vcpu_pause() are
>>>>>> almost always used where it is definitely not safe to pause in context,
>>>>>> hence the assertions.
>>>>>>
>>>> It is. I'd be happy to use new helpers; I don't feel qualified to
>>>> write them.
>>>>
>>>> Ed
>>> Something like this?  Only compile tested.  In the meantime, I have an
>>> optimisation in mind for domain_pause() on domains with large numbers of
>>> vcpus, but that will have to wait a while.
>>>
>>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Date: Wed, 24 Jun 2015 19:06:14 +0100
>>> Subject: [PATCH] common/domain: Helpers to pause a domain while in context
>>>
>>> For use on codepaths which would need to use domain_pause() but might be in
>>> the target domain's context.  In the case that the target domain is in
>>> context, all other vcpus are paused.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>>  xen/common/domain.c     |   28 ++++++++++++++++++++++++++++
>>>  xen/include/xen/sched.h |    5 +++++
>>>  2 files changed, 33 insertions(+)
>>>
>>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>>> index 3bc52e6..a1d27e3 100644
>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -1010,6 +1010,34 @@ int domain_unpause_by_systemcontroller(struct domain *d)
>>>      return 0;
>>>  }
>>>  
>>> +void domain_pause_except_self(struct domain *d)
>>> +{
>>> +    struct vcpu *v, *curr = current;
>>> +
>>> +    if ( curr->domain == d )
>>> +    {
>>> +        for_each_vcpu( d, v )
>>> +            if ( likely(v != curr) )
>>> +                vcpu_pause(v);
>>> +    }
>>> +    else
>>> +        domain_pause(d);
>>> +}
>>> +
>>> +void domain_unpause_except_self(struct domain *d)
>>> +{
>>> +    struct vcpu *v, *curr = current;
>>> +
>>> +    if ( curr->domain == d )
>>> +    {
>>> +        for_each_vcpu( d, v )
>>> +            if ( likely(v != curr) )
>>> +                vcpu_unpause(v);
>>> +    }
>>> +    else
>>> +        domain_unpause(d);
>>> +}
>>> +
>>>  int vcpu_reset(struct vcpu *v)
>>>  {
>>>      struct domain *d = v->domain;
>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>> index b29d9e7..8e1345a 100644
>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -804,6 +804,11 @@ static inline int domain_pause_by_systemcontroller_nosync(struct domain *d)
>>>  {
>>>      return __domain_pause_by_systemcontroller(d, domain_pause_nosync);
>>>  }
>>> +
>>> +/* domain_pause() but safe against trying to pause current. */
>>> +void domain_pause_except_self(struct domain *d);
>>> +void domain_unpause_except_self(struct domain *d);
>>> +
>>>  void cpu_init(void);
>>>  
>>>  struct scheduler;
>>>
>>>
>> Did you commit this to staging?
> 
> I am not a committer, so couldn't even if I wished to.
> 
>> IOW, can I apply it to my branch
>> and assume it will already be in-tree when our patches are applied?
> 
> You will be the first user of the patch, and as noted, I have only
> compile tested.  Please take it and put it at the start of your series.
> 

Will do. I thought you were all-powerful.
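
For illustration, a sketch of the intended usage -- a hypothetical
altp2m-style caller, not code from the series:

    static void sketch_altp2m_operation(struct domain *d)
    {
        /* Safe even if current is one of d's vcpus: in that case every
         * vcpu except current is paused; otherwise domain_pause() runs. */
        domain_pause_except_self(d);

        /* ... manipulate d's p2m state without racing its vcpus ... */

        domain_unpause_except_self(d);
    }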

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 12/12] x86/altp2m: XSM hooks for altp2m HVM ops
  2015-06-26 19:35     ` Ed White
@ 2015-06-29 17:52       ` Daniel De Graaf
  2015-06-29 17:55         ` Sahita, Ravi
  0 siblings, 1 reply; 116+ messages in thread
From: Daniel De Graaf @ 2015-06-29 17:52 UTC (permalink / raw)
  To: Ed White, xen-devel
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich,
	Andrew Cooper, tlengyel

On 06/26/2015 03:35 PM, Ed White wrote:
> On 06/26/2015 12:24 PM, Daniel De Graaf wrote:
>> On 06/22/2015 02:56 PM, Ed White wrote:
>>> From: Ravi Sahita <ravi.sahita@intel.com>
>>>
>>> Signed-off-by: Ravi Sahita <ravi.sahita@intel.com>
>>
>> One comment, below.
>>
>> [...]
>>> diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
>>> index f4cde11..c95109f 100644
>>> --- a/tools/flask/policy/policy/modules/xen/xen.if
>>> +++ b/tools/flask/policy/policy/modules/xen/xen.if
>>> @@ -8,7 +8,7 @@
>>>    define(`declare_domain_common', `
>>>        allow $1 $2:grant { query setup };
>>>        allow $1 $2:mmu { adjust physmap map_read map_write stat pinpage updatemp mmuext_op };
>>> -    allow $1 $2:hvm { getparam setparam };
>>> +    allow $1 $2:hvm { getparam setparam altp2mhvm altp2mhvm_op };
>>>        allow $1 $2:domain2 get_vnumainfo;
>>>    ')
>>
>> This allows any domain to enable altp2m on itself; I think you meant to
>> only allow altp2mhvm_op here, requiring a privileged domain to first
>> enable the feature on a domain before anyone can use it.
>>
>
> We certainly don't want to unconditionally disallow that. We want the
> policy to offer the ability to choose whether it's allowed or not.
> Does the patch do that?

Remove altp2mhvm from the above line, leaving only altp2mhvm_op here.  The
other line added to xen.if should still contain both. This makes the FLASK
policy match the no-XSM case, which I assume is what you've tested.
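
So the declare_domain_common line above would presumably become:

    allow $1 $2:hvm { getparam setparam altp2mhvm_op };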

-- 
Daniel De Graaf
National Security Agency

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 12/12] x86/altp2m: XSM hooks for altp2m HVM ops
  2015-06-29 17:52       ` Daniel De Graaf
@ 2015-06-29 17:55         ` Sahita, Ravi
  0 siblings, 0 replies; 116+ messages in thread
From: Sahita, Ravi @ 2015-06-29 17:55 UTC (permalink / raw)
  To: Daniel De Graaf, White, Edmund H, xen-devel
  Cc: Wei Liu, Ian Jackson, Tim Deegan, Jan Beulich, Andrew Cooper, tlengyel

[...]

Remove altp2mhvm from the above line, leaving only altp2mhvm_op here.  The
other line added to xen.if should still contain both. This makes the FLASK
policy match the no-XSM case, which I assume is what you've tested.
--
Daniel De Graaf
National Security Agency

Ravi> Thanks Daniel - we will make that change. We have tested both the no-XSM and XSM configurations in our Windows HVM domain tests.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-06-26 16:27             ` Ed White
@ 2015-07-06 17:12               ` George Dunlap
  2015-07-06 17:35                 ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: George Dunlap @ 2015-07-06 17:12 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, xen-devel,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

On Fri, Jun 26, 2015 at 5:27 PM, Ed White <edmund.h.white@intel.com> wrote:
> On 06/25/2015 11:04 PM, Jan Beulich wrote:
>>>>> On 25.06.15 at 18:36, <edmund.h.white@intel.com> wrote:
>>> On 06/25/2015 01:12 AM, Jan Beulich wrote:
>>>>>>> On 24.06.15 at 19:53, <edmund.h.white@intel.com> wrote:
>>>>> On 06/24/2015 07:38 AM, Jan Beulich wrote:
>>>>>>>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
>>>>>>> --- a/xen/include/asm-x86/p2m.h
>>>>>>> +++ b/xen/include/asm-x86/p2m.h
>>>>>>> @@ -237,6 +237,19 @@ struct p2m_domain {
>>>>>>>                                         p2m_access_t *p2ma,
>>>>>>>                                         p2m_query_t q,
>>>>>>>                                         unsigned int *page_order);
>>>>>>> +    int                (*set_entry_full)(struct p2m_domain *p2m,
>>>>>>> +                                         unsigned long gfn,
>>>>>>> +                                         mfn_t mfn, unsigned int
>>>>> page_order,
>>>>>>> +                                         p2m_type_t p2mt,
>>>>>>> +                                         p2m_access_t p2ma,
>>>>>>> +                                         unsigned int sve);
>>>>>>> +    mfn_t              (*get_entry_full)(struct p2m_domain *p2m,
>>>>>>> +                                         unsigned long gfn,
>>>>>>> +                                         p2m_type_t *p2mt,
>>>>>>> +                                         p2m_access_t *p2ma,
>>>>>>> +                                         p2m_query_t q,
>>>>>>> +                                         unsigned int *page_order,
>>>>>>> +                                         unsigned int *sve);
>>>>>>
>>>>>> I have to admit that I find the _full suffixes here pretty odd. Based
>>>>>> on the functionality, they should be _sve. But then it seems
>>>>>> questionable how they could be useful to the generic p2m layer
>>>>>> anyway, i.e. why there would need to be such hooks in the first
>>>>>> place.
>>>>>
>>>>> I did originally use _sve suffixes. I changed them because there
>>>>> may be some future case where these routines control some other
>>>>> EPTE bit too. I made them hooks because I thought calling ept...
>>>>> functions directly would be a layering violation.
>>>>
>>>> Indeed it would. But thinking about it more, I would suggest to
>>>> extend the existing accessors rather than adding new ones.
>>>> Just consider what would result when further such return values
>>>> are going to be needed in the future: I don't see us adding
>>>> _fuller, _fullest, etc variants. Perhaps just make the new output
>>>> an optional generic "flags" one. One might even consider folding
>>>> it with order, or even consolidate all the outputs into a single
>>>> structure.
>>>
>>> The new functions are called in 3 places only, so changing them
>>> later would have minimal impact. The existing functions are called
>>> in many, many places. I *really* don't want to go changing the
>>> amount of existing code that doing what you suggest would entail
>>> at this late stage.
>>
>> I continue to think differently (and I don't consider "at this late
>> stage" a particularly relevant argument), but the maintainer will
>> have the final say anyway - George?
>>
>
> The patch as it is now doesn't disturb (and risk breaking) any
> existing code. I'd much rather stick with that for 4.6, even if
> only on the condition that I have to change it later. If I do
> what you suggest, that sets me up to fail to get anything in
> 4.6. That may not matter to you, but it matters to me.

Sorry, I've just gotten up to speed enough to figure out what the
question is about.

For future reference: what has the highest risk of breaking existing
code is touching the codepath, not doing an almost entirely mechanical
change.  From that perspective, you have changed all paths through
[gs]et_entry() already (on Intel boxes at least).  I wouldn't have
considered a global search-and-replace where the defaults are always
the same (and propagation of the interface through the generic and AMD
function signatures) as a particularly invasive change -- at least,
not any more than the code you have here.

It looks like the existing p2m->set_entry() function is only called in
6 places -- 5 times in p2m.c and once in mem_sharing.c; and
p2m->get_entry() is called in about two dozen places, all in p2m.c
(and again one in mem_sharing.c).  If you change the [gs]et_entry()
hooks, but have p2m_set_entry() pass in the default, it shouldn't be
that big of an impact (particularly as the get_entry() will just be
passing NULL).

I do think that avoiding magic numbers is important, at least for the
default; for example:

#define P2M_SUPPRESS_VE_DEFAULT (-1)

Another option would be to make an enum with {default, clear, set},
but that's probably overkill.
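
(For concreteness, such an enum might look like this -- naming invented:)

    enum p2m_sve {
        P2M_SVE_DEFAULT = -1, /* preserve the entry's existing suppress_ve */
        P2M_SVE_CLEAR   = 0,  /* clear suppress_ve: violations inject #VE */
        P2M_SVE_SET     = 1,  /* set suppress_ve: violations exit to Xen */
    };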

 -George

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-07-06 17:12               ` George Dunlap
@ 2015-07-06 17:35                 ` Ed White
  2015-07-06 18:29                   ` George Dunlap
  0 siblings, 1 reply; 116+ messages in thread
From: Ed White @ 2015-07-06 17:35 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, xen-devel,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

On 07/06/2015 10:12 AM, George Dunlap wrote:
> On Fri, Jun 26, 2015 at 5:27 PM, Ed White <edmund.h.white@intel.com> wrote:
>> On 06/25/2015 11:04 PM, Jan Beulich wrote:
>>>>>> On 25.06.15 at 18:36, <edmund.h.white@intel.com> wrote:
>>>> On 06/25/2015 01:12 AM, Jan Beulich wrote:
>>>>>>>> On 24.06.15 at 19:53, <edmund.h.white@intel.com> wrote:
>>>>>> On 06/24/2015 07:38 AM, Jan Beulich wrote:
>>>>>>>>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
>>>>>>>> --- a/xen/include/asm-x86/p2m.h
>>>>>>>> +++ b/xen/include/asm-x86/p2m.h
>>>>>>>> @@ -237,6 +237,19 @@ struct p2m_domain {
>>>>>>>>                                         p2m_access_t *p2ma,
>>>>>>>>                                         p2m_query_t q,
>>>>>>>>                                         unsigned int *page_order);
>>>>>>>> +    int                (*set_entry_full)(struct p2m_domain *p2m,
>>>>>>>> +                                         unsigned long gfn,
>>>>>>>> +                                         mfn_t mfn, unsigned int
>>>>>> page_order,
>>>>>>>> +                                         p2m_type_t p2mt,
>>>>>>>> +                                         p2m_access_t p2ma,
>>>>>>>> +                                         unsigned int sve);
>>>>>>>> +    mfn_t              (*get_entry_full)(struct p2m_domain *p2m,
>>>>>>>> +                                         unsigned long gfn,
>>>>>>>> +                                         p2m_type_t *p2mt,
>>>>>>>> +                                         p2m_access_t *p2ma,
>>>>>>>> +                                         p2m_query_t q,
>>>>>>>> +                                         unsigned int *page_order,
>>>>>>>> +                                         unsigned int *sve);
>>>>>>>
>>>>>>> I have to admit that I find the _full suffixes here pretty odd. Based
>>>>>>> on the functionality, they should be _sve. But then it seems
>>>>>>> questionable how they could be useful to the generic p2m layer
>>>>>>> anyway, i.e. why there would need to be such hooks in the first
>>>>>>> place.
>>>>>>
>>>>>> I did originally use _sve suffixes. I changed them because there
>>>>>> may be some future case where these routines control some other
>>>>>> EPTE bit too. I made them hooks because I thought calling ept...
>>>>>> functions directly would be a layering violation.
>>>>>
>>>>> Indeed it would. But thinking about it more, I would suggest to
>>>>> extend the existing accessors rather than adding new ones.
>>>>> Just consider what would result when further such return values
>>>>> are going to be needed in the future: I don't see us adding
>>>>> _fuller, _fullest, etc variants. Perhaps just make the new output
>>>>> an optional generic "flags" one. One might even consider folding
>>>>> it with order, or even consolidate all the outputs into a single
>>>>> structure.
>>>>
>>>> The new functions are called in 3 places only, so changing them
>>>> later would have minimal impact. The existing functions are called
>>>> in many, many places. I *really* don't want to go changing the
>>>> amount of existing code that doing what you suggest would entail
>>>> at this late stage.
>>>
>>> I continue to think differently (and I don't consider "at this late
>>> stage" a particularly relevant argument), but the maintainer will
>>> have the final say anyway - George?
>>>
>>
>> The patch as it is now doesn't disturb (and risk breaking) any
>> existing code. I'd much rather stick with that for 4.6, even if
>> only on the condition that I have to change it later. If I do
>> what you suggest, that sets me up to fail to get anything in
>> 4.6. That may not matter to you, but it matters to me.
> 
> Sorry, I've just gotten up to speed enough to figure out what the
> question is about.
> 
> For future reference: what has the highest risk of breaking existing
> code is touching the codepath, not doing an almost entirely mechanical
> change.  From that perspective, you have changed all paths through
> [gs]et_entry() already (on Intel boxes at least).  I wouldn't have
> considered a global search-and-replace where the defaults are always
> the same (and propagation of the interface through the generic and AMD
> function signatures) as a particularly invasive change -- at least,
> not any more than the code you have here.
> 
> It looks like the existing p2m->set_entry() function is only called in
> 6 places -- 5 times in p2m.c and once in mem_sharing.c; and
> p2m->get_entry() is called in about two dozen places, all in p2m.c
> (and again one in mem_sharing.c).  If you change the [gs]et_entry()
> hooks, but have p2m_set_entry() pass in the default, it shouldn't be
> that big of an impact (particularly as the get_entry() will just be
> passing NULL).
> 
> I do think that avoiding magic numbers is important, at least for the
> default; for example:
> 
> #define P2M_SUPPRESS_VE_DEFAULT (-1)
> 
> Another option would be to make an enum with {default, clear, set},
> but that's probably overkill.
> 

I certainly don't want to speak for Jan, but my reading of his
comments suggests that wouldn't be enough to satisfy him. He
seemed to me to object to the whole idea of adding something
specifically to handle suppress_ve, and thought any change should
offer a more general 'control extra (E)PTE bits' interface.

If the requirement is only to add control of suppress_ve, I honestly
don't understand what is wrong with the way I have already done it.
There is certainly precedent for adding extra p2m hook functions that
are VMX-specific (look at the PML patch series), and I haven't
changed lots of code that I have no way to test, which is one of
the concerns I have about changing set/get everywhere.

If the objection is to me wrapping the existing EPT set/get functions,
I could add entirely separate functions that only manipulate
suppress_ve. The reason I didn't is that I would need to duplicate
a lot of the code in the existing functions.

I want to be clear: you are the maintainers, and in the end you have
final say; however, I've been developing system software for a long
time and I really don't understand why you think requiring a design
that changes more source code for no functional effect is a good
idea.

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-07-06 17:35                 ` Ed White
@ 2015-07-06 18:29                   ` George Dunlap
  2015-07-06 18:43                     ` Ed White
  2015-07-07  8:04                     ` Jan Beulich
  0 siblings, 2 replies; 116+ messages in thread
From: George Dunlap @ 2015-07-06 18:29 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, xen-devel,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

On 07/06/2015 06:35 PM, Ed White wrote:
> On 07/06/2015 10:12 AM, George Dunlap wrote:
>> On Fri, Jun 26, 2015 at 5:27 PM, Ed White <edmund.h.white@intel.com> wrote:
>>> On 06/25/2015 11:04 PM, Jan Beulich wrote:
>>>>>>> On 25.06.15 at 18:36, <edmund.h.white@intel.com> wrote:
>>>>> On 06/25/2015 01:12 AM, Jan Beulich wrote:
>>>>>>>>> On 24.06.15 at 19:53, <edmund.h.white@intel.com> wrote:
>>>>>>> On 06/24/2015 07:38 AM, Jan Beulich wrote:
>>>>>>>>>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
>>>>>>>>> --- a/xen/include/asm-x86/p2m.h
>>>>>>>>> +++ b/xen/include/asm-x86/p2m.h
>>>>>>>>> @@ -237,6 +237,19 @@ struct p2m_domain {
>>>>>>>>>                                         p2m_access_t *p2ma,
>>>>>>>>>                                         p2m_query_t q,
>>>>>>>>>                                         unsigned int *page_order);
>>>>>>>>> +    int                (*set_entry_full)(struct p2m_domain *p2m,
>>>>>>>>> +                                         unsigned long gfn,
>>>>>>>>> +                                         mfn_t mfn, unsigned int
>>>>>>> page_order,
>>>>>>>>> +                                         p2m_type_t p2mt,
>>>>>>>>> +                                         p2m_access_t p2ma,
>>>>>>>>> +                                         unsigned int sve);
>>>>>>>>> +    mfn_t              (*get_entry_full)(struct p2m_domain *p2m,
>>>>>>>>> +                                         unsigned long gfn,
>>>>>>>>> +                                         p2m_type_t *p2mt,
>>>>>>>>> +                                         p2m_access_t *p2ma,
>>>>>>>>> +                                         p2m_query_t q,
>>>>>>>>> +                                         unsigned int *page_order,
>>>>>>>>> +                                         unsigned int *sve);
>>>>>>>>
>>>>>>>> I have to admit that I find the _full suffixes here pretty odd. Based
>>>>>>>> on the functionality, they should be _sve. But then it seems
>>>>>>>> questionable how they could be useful to the generic p2m layer
>>>>>>>> anyway, i.e. why there would need to be such hooks in the first
>>>>>>>> place.
>>>>>>>
>>>>>>> I did originally use _sve suffixes. I changed them because there
>>>>>>> may be some future case where these routines control some other
>>>>>>> EPTE bit too. I made them hooks because I thought calling ept...
>>>>>>> functions directly would be a layering violation.
>>>>>>
>>>>>> Indeed it would. But thinking about it more, I would suggest to
>>>>>> extend the existing accessors rather than adding new ones.
>>>>>> Just consider what would result when further such return values
>>>>>> are going to be needed in the future: I don't see us adding
>>>>>> _fuller, _fullest, etc variants. Perhaps just make the new output
>>>>>> an optional generic "flags" one. One might even consider folding
>>>>>> it with order, or even consolidate all the outputs into a single
>>>>>> structure.
>>>>>
>>>>> The new functions are called in 3 places only, so changing them
>>>>> later would have minimal impact. The existing functions are called
>>>>> in many, many places. I *really* don't want to go changing the
>>>>> amount of existing code that doing what you suggest would entail
>>>>> at this late stage.
>>>>
>>>> I continue to think differently (and I don't consider "at this late
>>>> stage" a particularly relevant argument), but the maintainer will
>>>> have the final say anyway - George?
>>>>
>>>
>>> The patch as it is now doesn't disturb (and risk breaking) any
>>> existing code. I'd much rather stick with that for 4.6, even if
>>> only on the condition that I have to change it later. If I do
>>> what you suggest, that sets me up to fail to get anything in
>>> 4.6. That may not matter to you, but it matters to me.
>>
>> Sorry, I've just gotten up to speed enough to figure out what the
>> question is about.
>>
>> For future reference: what has the highest risk of breaking existing
>> code is touching the codepath, not doing an almost entirely mechanical
>> change.  From that perspective, you have changed all paths through
>> [gs]et_entry() already (on Intel boxes at least).  I wouldn't have
>> considered a global search-and-replace where the defaults are always
>> the same (and propagation of the interface through the generic and AMD
>> function signatures) as a particularly invasive change -- at least,
>> not any more than the code you have here.
>>
>> It looks like the existing p2m->set_entry() function is only called in
>> 6 places -- 5 times in p2m.c and once in mem_sharing.c; and
>> p2m->get_entry() is called in about two dozen places, all in p2m.c
>> (and again one in mem_sharing.c).  If you change the [gs]et_entry()
>> hooks, but have p2m_set_entry() pass in the default, it shouldn't be
>> that big of an impact (particularly as the get_entry() will just be
>> passing NULL).
>>
>> I do think that avoiding magic numbers is important, at least for the
>> default; for example:
>>
>> #define P2M_SUPPRESS_VE_DEFAULT (-1)
>>
>> Another option would be to make an enum with {default, clear, set},
>> but that's probably overkill.
>>
> 
> I certainly don't want to speak for Jan, but my reading of his
> comments suggests that wouldn't be enough to satisfy him. He
> seemed to me to object to the whole idea of adding something
> specifically to handle suppress_ve, and thought any change should
> offer a more general 'control extra (E)PTE bits' interface.

I understood Jan's objection to be to adding two extra hooks ("But then
it seems questionable how they could be useful to the generic p2m layer
anyway, i.e. why there would need to be such hooks in the first
place."), instead of just adding an extra field to the existing
[gs]et_p2m_entry() ("But thinking about it more, I would suggest to
extend the existing accessors rather than adding new ones.")

He does suggest the idea of making the interface generic, for example by
making it an extensible "flags" argument, or by changing the whole thing
to accept a pointer to a struct, rather than adding more and more
arguments that need to be set (and which, like p2m_access_t, almost
everybody just either uses the default or passes on what was passed to
them) ("One might even consider folding it with order, or even
consolidate all the outputs into a single structure.")  But I think that
may be a clean-up thing we do next round.

The first quote above isn't 100% clear, so I can see why you might think
he meant not to expose SVE directly.
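
(A sketch of the struct-based consolidation alluded to there -- all names
invented, shown only to make the idea concrete:)

    /* One output struct instead of an ever-growing list of out-pointers;
     * adding a field later would not touch existing call sites. */
    struct p2m_lookup {
        p2m_type_t   t;
        p2m_access_t a;
        unsigned int page_order;
        bool_t       sve;
    };

    /* mfn_t (*get_entry)(struct p2m_domain *p2m, unsigned long gfn,
     *                    p2m_query_t q, struct p2m_lookup *out); */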

> If the requirement is only to add control of suppress_ve, I honestly
> don't understand what is wrong with the way I have already done it.
> There is certainly precedent for adding extra p2m hook functions that
> are VMX-specific (look at the PML patch series), and I haven't
> changed lots of code that I have no way to test, which is one of
> the concerns I have about changing set/get everywhere.
> 
> If the objection is to me wrapping the existing EPT set/get functions,
> I could add entirely separate functions that only manipulate
> suppress_ve. The reason I didn't is that I would need to duplicate
> a lot of the code in the existing functions.

The objection isn't to the wrapping; the objection is to adding new
hooks that are *almost entirely identical* to the old hooks, but have
one extra parameter.

The PML series added new hooks that were *completely new* in functionality.

> I want to be clear: you are the maintainers, and in the end you have
> final say; however, I've been developing system software for a long
> time and I really don't understand why you think requiring a design
> that changes more source code for no functional effect is a good
> idea.

If it were simply a matter of making a new function call (by adding _ to
the front or _internal to the end), then yeah, this wrapper scheme would
probably be better than going around changing all the entries that don't
use the extra value.  But p2m->set_entry() is *already* the internal
function which is wrapped by p2m_set_entry().
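
(To spell that layering out -- simplified from xen/arch/x86/mm/p2m.c; the
real wrapper also splits large orders and handles errors:)

    /* p2m_set_entry() is the public wrapper; p2m->set_entry is the
     * per-implementation hook (EPT or NPT/shadow) underneath it.  The
     * wrapper can pass the "preserve existing" value for sve, so none
     * of its callers need to know about the new argument. */
    int p2m_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
                      unsigned int page_order, p2m_type_t p2mt,
                      p2m_access_t p2ma)
    {
        return p2m->set_entry(p2m, gfn, mfn, page_order, p2mt, p2ma, -1);
    }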

Introducing yet another layer -- particularly in a hooked interface like
this -- just seems clunky.  It's not the worst thing in the world; if I
thought this would be the difference between making it or not, I might
just say fix it later.  But I don't think it will; and these little
things add up.

 -George

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-07-06 18:29                   ` George Dunlap
@ 2015-07-06 18:43                     ` Ed White
  2015-07-07 10:10                       ` George Dunlap
  2015-07-07  8:04                     ` Jan Beulich
  1 sibling, 1 reply; 116+ messages in thread
From: Ed White @ 2015-07-06 18:43 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, xen-devel,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

On 07/06/2015 11:29 AM, George Dunlap wrote:
> On 07/06/2015 06:35 PM, Ed White wrote:
>> On 07/06/2015 10:12 AM, George Dunlap wrote:
>>> On Fri, Jun 26, 2015 at 5:27 PM, Ed White <edmund.h.white@intel.com> wrote:
>>>> On 06/25/2015 11:04 PM, Jan Beulich wrote:
>>>>>>>> On 25.06.15 at 18:36, <edmund.h.white@intel.com> wrote:
>>>>>> On 06/25/2015 01:12 AM, Jan Beulich wrote:
>>>>>>>>>> On 24.06.15 at 19:53, <edmund.h.white@intel.com> wrote:
>>>>>>>> On 06/24/2015 07:38 AM, Jan Beulich wrote:
>>>>>>>>>>>> On 22.06.15 at 20:56, <edmund.h.white@intel.com> wrote:
>>>>>>>>>> --- a/xen/include/asm-x86/p2m.h
>>>>>>>>>> +++ b/xen/include/asm-x86/p2m.h
>>>>>>>>>> @@ -237,6 +237,19 @@ struct p2m_domain {
>>>>>>>>>>                                         p2m_access_t *p2ma,
>>>>>>>>>>                                         p2m_query_t q,
>>>>>>>>>>                                         unsigned int *page_order);
>>>>>>>>>> +    int                (*set_entry_full)(struct p2m_domain *p2m,
>>>>>>>>>> +                                         unsigned long gfn,
>>>>>>>>>> +                                         mfn_t mfn, unsigned int
>>>>>>>> page_order,
>>>>>>>>>> +                                         p2m_type_t p2mt,
>>>>>>>>>> +                                         p2m_access_t p2ma,
>>>>>>>>>> +                                         unsigned int sve);
>>>>>>>>>> +    mfn_t              (*get_entry_full)(struct p2m_domain *p2m,
>>>>>>>>>> +                                         unsigned long gfn,
>>>>>>>>>> +                                         p2m_type_t *p2mt,
>>>>>>>>>> +                                         p2m_access_t *p2ma,
>>>>>>>>>> +                                         p2m_query_t q,
>>>>>>>>>> +                                         unsigned int *page_order,
>>>>>>>>>> +                                         unsigned int *sve);
>>>>>>>>>
>>>>>>>>> I have to admit that I find the _full suffixes here pretty odd. Based
>>>>>>>>> on the functionality, they should be _sve. But then it seems
>>>>>>>>> questionable how they could be useful to the generic p2m layer
>>>>>>>>> anyway, i.e. why there would need to be such hooks in the first
>>>>>>>>> place.
>>>>>>>>
>>>>>>>> I did originally use _sve suffixes. I changed them because there
>>>>>>>> may be some future case where these routines control some other
>>>>>>>> EPTE bit too. I made them hooks because I thought calling ept...
>>>>>>>> functions directly would be a layering violation.
>>>>>>>
>>>>>>> Indeed it would. But thinking about it more, I would suggest to
>>>>>>> extend the existing accessors rather than adding new ones.
>>>>>>> Just consider what would result when further such return values
>>>>>>> are going to be needed in the future: I don't see us adding
>>>>>>> _fuller, _fullest, etc variants. Perhaps just make the new output
>>>>>>> an optional generic "flags" one. One might even consider folding
>>>>>>> it with order, or even consolidate all the outputs into a single
>>>>>>> structure.
>>>>>>
>>>>>> The new functions are called in 3 places only, so changing them
>>>>>> later would have minimal impact. The existing functions are called
>>>>>> in many, many places. I *really* don't want to go changing the
>>>>>> amount of existing code that doing what you suggest would entail
>>>>>> at this late stage.
>>>>>
>>>>> I continue to think differently (and I don't consider "at this late
>>>>> stage" a particularly relevant argument), but the maintainer will
>>>>> have the final say anyway - George?
>>>>>
>>>>
>>>> The patch as it is now doesn't disturb (and risk breaking) any
>>>> existing code. I'd much rather stick with that for 4.6, even if
>>>> only on the condition that I have to change it later. If I do
>>>> what you suggest, that sets me up to fail to get anything in
>>>> 4.6. That may not matter to you, but it matters to me.
>>>
>>> Sorry, I've just gotten up to speed enough to figure out what the
>>> question is about.
>>>
>>> For future reference: what has the highest risk of breaking existing
>>> code is touching the codepath, not doing an almost entirely mechanical
>>> change.  From that perspective, you have changed all paths through
>>> [gs]et_entry() already (on Intel boxes at least).  I wouldn't have
>>> considered a global search-and-replace where the defaults are always
>>> the same (and propagation of the interface through the generic and AMD
>>> function signatures) as a particularly invasive change -- at least,
>>> not any more than the code you have here.
>>>
>>> It looks like the existing p2m->set_entry() function is only called in
>>> 6 places -- 5 times in p2m.c and once in mem_sharing.c; and
>>> p2m->get_entry() is called in about two dozen places, all in p2m.c
>>> (and again one in mem_sharing.c).  If you change the [gs]et_entry()
>>> hooks, but have p2m_set_entry() pass in the default, it shouldn't be
>>> that big of an impact (particularly as the get_entry() will just be
>>> passing NULL).
>>>
>>> I do think that avoiding magic numbers is important, at least for the
>>> default; for example:
>>>
>>> #define P2M_SUPPRESS_VE_DEFAULT (-1)
>>>
>>> Another option would be to make an enum with {default, clear, set},
>>> but that's probably overkill.
>>>
>>
>> I certainly don't want to speak for Jan, but my reading of his
>> comments suggests that wouldn't be enough to satisfy him. He
>> seemed to me to object to the whole idea of adding something
>> specifically to handle suppress_ve, and thought any change should
>> offer a more general 'control extra (E)PTE bits' interface.
> 
> I understood Jan's objection to be to adding two extra hooks ("But then
> it seems questionable how they could be useful to the generic p2m layer
> anyway, i.e. why there would need to be such hooks in the first
> place."), instead of just adding an extra field to the existing
> [gs]et_p2m_entry() ("But thinking about it more, I would suggest to
> extend the existing accessors rather than adding new ones.")
> 
> He does suggest the idea of making the interface generic, for example by
> making it an extensible "flags" argument, or by changing the whole thing
> to accept a pointer to a struct, rather than adding more and more
> arguments that need to be set (and which, like p2m_access_t, almost
> everybody just either uses the default or passes on what was passed to
> them) ("One might even consider folding it with order, or even
> consolidate all the outputs into a single structure.")  But I think that
> may be a clean-up thing we do next round.
> 
> The first quote above isn't 100% clear, so I can see why you might think
> he meant not to expose SVE directly.
> 
>> If the requirement is only to add control of suppress_ve, I honestly
>> don't understand what is wrong with the way I have already done it.
>> There is certainly precedent for adding extra p2m hook functions that
>> are VMX-specific (look at the PML patch series), and I haven't
>> changed lots of code that I have no way to test, which is one of
>> the concerns I have about changing set/get everywhere.
>>
>> If the objection is to me wrapping the existing EPT set/get functions,
>> I could add entirely separate functions that only manipulate
>> suppress_ve. The reason I didn't is that I would need to duplicate
>> a lot of the code in the existing functions.
> 
> The objection isn't to the wrapping; the objection is to adding new
> hooks that are *almost entirely identical* to the old hooks, but have
> one extra parameter.
> 
> The PML series added new hooks that were *completely new* in functionality.
> 
>> I want to be clear: you are the maintainers, and in the end you have
>> final say; however, I've been developing system software for a long
>> time and I really don't understand why you think requiring a design
>> that changes more source code for no functional effect is a good
>> idea.
> 
> If it were simply a matter of making a new function call (by adding _ to
> the front or _internal to the end), then yeah, this wrapper scheme would
> probably be better than going around changing all the entries that don't
> use the extra value.  But p2m->set_entry() is *already* the internal
> function which is wrapped by p2m_set_entry().
> 
> Introducing yet another layer -- particularly in a hooked interface like
> this -- just seems clunky.  It's not the worst thing in the world; if I
> thought this would be the difference between making it or not, I might
> just say fix it later.  But I don't think it will; and these little
> things add up.
> 

I don't want to change set/get everywhere, and Tim already made it clear
that coupling suppress_ve with p2m_type_t is not acceptable.

How can I provide an implementation that does not do either of the above
but does allow access to suppress_ve in a way that is acceptable?

Tell me and I will do it.

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-07-06 18:29                   ` George Dunlap
  2015-07-06 18:43                     ` Ed White
@ 2015-07-07  8:04                     ` Jan Beulich
  1 sibling, 0 replies; 116+ messages in thread
From: Jan Beulich @ 2015-07-07  8:04 UTC (permalink / raw)
  To: George Dunlap, Ed White
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson,
	xen-devel, tlengyel, Daniel De Graaf

>>> On 06.07.15 at 20:29, <george.dunlap@eu.citrix.com> wrote:
> On 07/06/2015 06:35 PM, Ed White wrote:
>> I certainly don't want to speak for Jan, but my reading of his
>> comments suggests that wouldn't be enough to satisfy him. He
>> seemed to me to object to the whole idea of adding something
>> specifically to handle suppress_ve, and thought any change should
>> offer a more general 'control extra (E)PTE bits' interface.
> 
> I understood Jan's objection to be to adding two extra hooks ("But then
> it seems questionable how they could be useful to the generic p2m layer
> anyway, i.e. why there would need to be such hooks in the first
> place."), instead of just adding an extra field to the existing
> [gs]et_p2m_entry() ("But thinking about it more, I would suggest to
> extend the existing accessors rather than adding new ones.")
> 
> He does suggest the idea of making the interface generic, for example by
> making it an extensible "flags" argument, or by changing the whole thing
> to accept a pointer to a struct, rather than adding more and more
> arguments that need to be set (and which, like p2m_access_t, almost
> everybody just either uses the default or passes on what was passed to
> them) ("One might even consider folding it with order, or even
> consolidate all the outputs into a single structure.")  But I think that
> may be a clean-up thing we do next round.
> 
> The first quote above isn't 100% clear, so I can see why you might think
> he meant not to expose SVE directly.

Indeed you split it up exactly as it was meant: avoiding a new,
redundant hook is a requirement in my view, and consolidating the
multitude of parameters would be a (rather desirable) cleanup.

Jan

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-07-06 18:43                     ` Ed White
@ 2015-07-07 10:10                       ` George Dunlap
  2015-07-07 16:24                         ` Ed White
  0 siblings, 1 reply; 116+ messages in thread
From: George Dunlap @ 2015-07-07 10:10 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, xen-devel,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

[-- Attachment #1: Type: text/plain, Size: 1029 bytes --]

On 07/06/2015 07:43 PM, Ed White wrote:
>> Introducing yet another layer -- particularly in a hooked interface like
>> this -- just seems clunky.  It's not the worst thing in the world; if I
>> thought this would be the difference between making it or not, I might
>> just say fix it later.  But I don't think it will; and these little
>> things add up.
>>
> 
> I don't want to change set/get everywhere, and Tim already made it clear
> that coupling suppress_ve with p2m_type_t is not acceptable.
> 
> How can I provide an implementation that does not do either of the above
> but does allow access to suppress_ve in a way that is acceptable?
> 
> Tell me and I will do it.

The only reason I can think of that you don't want to change get/set is
that you think it's too much work.

So here you go, I modified your patch; it took me 10 minutes, which is
less than what it would have taken me to continue arguing with you.
I've compile-tested it, but not done anything else (including porting
subsequent patches onto it).

 -George

[-- Attachment #2: 0001-x86-altp2m-add-control-of-suppress_ve.patch --]
[-- Type: text/x-patch, Size: 19174 bytes --]

From 25668e883fee5098785b1492455468bfdbad58f7 Mon Sep 17 00:00:00 2001
From: Ed White <edmund.h.white@intel.com>
Date: Wed, 1 Jul 2015 11:09:32 -0700
Subject: [PATCH] x86/altp2m: add control of suppress_ve.

The existing ept_set_entry() and ept_get_entry() routines are extended
to optionally set/get suppress_ve.  Passing -1 will set suppress_ve on
new p2m entries, or retain the suppress_ve flag on existing entries.

Signed-off-by: Ed White <edmund.h.white@intel.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
---
 xen/arch/x86/mm/mem_sharing.c |  5 +++--
 xen/arch/x86/mm/p2m-ept.c     | 18 ++++++++++++----
 xen/arch/x86/mm/p2m-pod.c     | 12 +++++------
 xen/arch/x86/mm/p2m-pt.c      |  5 +++--
 xen/arch/x86/mm/p2m.c         | 50 ++++++++++++++++++++++---------------------
 xen/include/asm-x86/p2m.h     | 24 +++++++++++----------
 6 files changed, 65 insertions(+), 49 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 16e329e..5780a26 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1257,10 +1257,11 @@ int relinquish_shared_pages(struct domain *d)
         p2m_type_t t;
         mfn_t mfn;
         int set_rc;
+        bool_t sve;
 
         if ( atomic_read(&d->shr_pages) == 0 )
             break;
-        mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);
+        mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL, &sve);
         if ( mfn_valid(mfn) && (t == p2m_ram_shared) )
         {
             /* Does not fail with ENOMEM given the DESTROY flag */
@@ -1270,7 +1271,7 @@ int relinquish_shared_pages(struct domain *d)
              * unshare.  Must succeed: we just read the old entry and
              * we hold the p2m lock. */
             set_rc = p2m->set_entry(p2m, gfn, _mfn(0), PAGE_ORDER_4K,
-                                    p2m_invalid, p2m_access_rwx);
+                                    p2m_invalid, p2m_access_rwx, sve);
             ASSERT(set_rc == 0);
             count += 0x10;
         }
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 15c010b..595bbe5 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -658,7 +658,8 @@ bool_t ept_handle_misconfig(uint64_t gpa)
  */
 static int
 ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, 
-              unsigned int order, p2m_type_t p2mt, p2m_access_t p2ma)
+              unsigned int order, p2m_type_t p2mt, p2m_access_t p2ma,
+              int sve)
 {
     ept_entry_t *table, *ept_entry = NULL;
     unsigned long gfn_remainder = gfn;
@@ -804,7 +805,11 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         ept_p2m_type_to_flags(p2m, &new_entry, p2mt, p2ma);
     }
 
-    new_entry.suppress_ve = 1;
+    if ( sve != -1 )
+        new_entry.suppress_ve = !!sve;
+    else
+        new_entry.suppress_ve = is_epte_valid(&old_entry) ?
+                                    old_entry.suppress_ve : 1;
 
     rc = atomic_write_ept_entry(ept_entry, new_entry, target);
     if ( unlikely(rc) )
@@ -851,8 +856,9 @@ out:
 
 /* Read ept p2m entries */
 static mfn_t ept_get_entry(struct p2m_domain *p2m,
-                           unsigned long gfn, p2m_type_t *t, p2m_access_t* a,
-                           p2m_query_t q, unsigned int *page_order)
+                            unsigned long gfn, p2m_type_t *t, p2m_access_t* a,
+                            p2m_query_t q, unsigned int *page_order,
+                            bool_t *sve)
 {
     ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
     unsigned long gfn_remainder = gfn;
@@ -866,6 +872,8 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
 
     *t = p2m_mmio_dm;
     *a = p2m_access_n;
+    if ( sve )
+        *sve = 1;
 
     /* This pfn is higher than the highest the p2m map currently holds */
     if ( gfn > p2m->max_mapped_pfn )
@@ -931,6 +939,8 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
         else
             *t = ept_entry->sa_p2mt;
         *a = ept_entry->access;
+        if ( sve )
+            *sve = ept_entry->suppress_ve;
 
         mfn = _mfn(ept_entry->mfn);
         if ( i )
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index 0679f00..a2f6d02 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -536,7 +536,7 @@ recount:
         p2m_access_t a;
         p2m_type_t t;
 
-        (void)p2m->get_entry(p2m, gpfn + i, &t, &a, 0, NULL);
+        (void)p2m->get_entry(p2m, gpfn + i, &t, &a, 0, NULL, NULL);
 
         if ( t == p2m_populate_on_demand )
             pod++;
@@ -587,7 +587,7 @@ recount:
         p2m_type_t t;
         p2m_access_t a;
 
-        mfn = p2m->get_entry(p2m, gpfn + i, &t, &a, 0, NULL);
+        mfn = p2m->get_entry(p2m, gpfn + i, &t, &a, 0, NULL, NULL);
         if ( t == p2m_populate_on_demand )
         {
             p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), 0, p2m_invalid,
@@ -676,7 +676,7 @@ p2m_pod_zero_check_superpage(struct p2m_domain *p2m, unsigned long gfn)
     for ( i=0; i<SUPERPAGE_PAGES; i++ )
     {
         p2m_access_t a; 
-        mfn = p2m->get_entry(p2m, gfn + i, &type, &a, 0, NULL);
+        mfn = p2m->get_entry(p2m, gfn + i, &type, &a, 0, NULL, NULL);
 
         if ( i == 0 )
         {
@@ -808,7 +808,7 @@ p2m_pod_zero_check(struct p2m_domain *p2m, unsigned long *gfns, int count)
     for ( i=0; i<count; i++ )
     {
         p2m_access_t a;
-        mfns[i] = p2m->get_entry(p2m, gfns[i], types + i, &a, 0, NULL);
+        mfns[i] = p2m->get_entry(p2m, gfns[i], types + i, &a, 0, NULL, NULL);
         /* If this is ram, and not a pagetable or from the xen heap, and probably not mapped
            elsewhere, map it; otherwise, skip. */
         if ( p2m_is_ram(types[i])
@@ -947,7 +947,7 @@ p2m_pod_emergency_sweep(struct p2m_domain *p2m)
     for ( i=p2m->pod.reclaim_single; i > 0 ; i-- )
     {
         p2m_access_t a;
-        (void)p2m->get_entry(p2m, i, &t, &a, 0, NULL);
+        (void)p2m->get_entry(p2m, i, &t, &a, 0, NULL, NULL);
         if ( p2m_is_ram(t) )
         {
             gfns[j] = i;
@@ -1135,7 +1135,7 @@ guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
     for ( i = 0; i < (1UL << order); i++ )
     {
         p2m_access_t a;
-        omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, 0, NULL);
+        omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, 0, NULL, NULL);
         if ( p2m_is_ram(ot) )
         {
             P2M_DEBUG("gfn_to_mfn returned type %d!\n", ot);
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index e50b6fa..37eef38 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -482,7 +482,8 @@ int p2m_pt_handle_deferred_changes(uint64_t gpa)
 /* Returns: 0 for success, -errno for failure */
 static int
 p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
-                 unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma)
+                 unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma,
+                 int sve)
 {
     /* XXX -- this might be able to be faster iff current->domain == d */
     void *table;
@@ -689,7 +690,7 @@ static inline p2m_type_t recalc_type(bool_t recalc, p2m_type_t t,
 static mfn_t
 p2m_pt_get_entry(struct p2m_domain *p2m, unsigned long gfn,
                  p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
-                 unsigned int *page_order)
+                 unsigned int *page_order, bool_t *sve)
 {
     mfn_t mfn;
     paddr_t addr = ((paddr_t)gfn) << PAGE_SHIFT;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 4360689..6e1a50c 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -342,7 +342,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn,
         /* Grab the lock here, don't release until put_gfn */
         gfn_lock(p2m, gfn, 0);
 
-    mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
+    mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
 
     if ( (q & P2M_UNSHARE) && p2m_is_shared(*t) )
     {
@@ -351,7 +351,7 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn,
          * sleeping. */
         if ( mem_sharing_unshare_page(p2m->domain, gfn, 0) < 0 )
             (void)mem_sharing_notify_enomem(p2m->domain, gfn, 0);
-        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
+        mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
     }
 
     if (unlikely((p2m_is_broken(*t))))
@@ -455,7 +455,7 @@ int p2m_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         else
             order = 0;
 
-        set_rc = p2m->set_entry(p2m, gfn, mfn, order, p2mt, p2ma);
+        set_rc = p2m->set_entry(p2m, gfn, mfn, order, p2mt, p2ma, -1);
         if ( set_rc )
             rc = set_rc;
 
@@ -619,7 +619,7 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
     {
         for ( i = 0; i < (1UL << page_order); i++ )
         {
-            mfn_return = p2m->get_entry(p2m, gfn + i, &t, &a, 0, NULL);
+            mfn_return = p2m->get_entry(p2m, gfn + i, &t, &a, 0, NULL, NULL);
             if ( !p2m_is_grant(t) && !p2m_is_shared(t) && !p2m_is_foreign(t) )
                 set_gpfn_from_mfn(mfn+i, INVALID_M2P_ENTRY);
             ASSERT( !p2m_is_valid(t) || mfn + i == mfn_x(mfn_return) );
@@ -682,7 +682,7 @@ guest_physmap_add_entry(struct domain *d, unsigned long gfn,
     /* First, remove m->p mappings for existing p->m mappings */
     for ( i = 0; i < (1UL << page_order); i++ )
     {
-        omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, 0, NULL);
+        omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, 0, NULL, NULL);
         if ( p2m_is_shared(ot) )
         {
             /* Do an unshare to cleanly take care of all corner 
@@ -706,7 +706,7 @@ guest_physmap_add_entry(struct domain *d, unsigned long gfn,
                 (void)mem_sharing_notify_enomem(p2m->domain, gfn + i, 0);
                 return rc;
             }
-            omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, 0, NULL);
+            omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, 0, NULL, NULL);
             ASSERT(!p2m_is_shared(ot));
         }
         if ( p2m_is_grant(ot) || p2m_is_foreign(ot) )
@@ -754,7 +754,7 @@ guest_physmap_add_entry(struct domain *d, unsigned long gfn,
              * address */
             P2M_DEBUG("aliased! mfn=%#lx, old gfn=%#lx, new gfn=%#lx\n",
                       mfn + i, ogfn, gfn + i);
-            omfn = p2m->get_entry(p2m, ogfn, &ot, &a, 0, NULL);
+            omfn = p2m->get_entry(p2m, ogfn, &ot, &a, 0, NULL, NULL);
             if ( p2m_is_ram(ot) && !p2m_is_paged(ot) )
             {
                 ASSERT(mfn_valid(omfn));
@@ -821,7 +821,7 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn,
 
     gfn_lock(p2m, gfn, 0);
 
-    mfn = p2m->get_entry(p2m, gfn, &pt, &a, 0, NULL);
+    mfn = p2m->get_entry(p2m, gfn, &pt, &a, 0, NULL, NULL);
     rc = likely(pt == ot)
          ? p2m_set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, nt,
                          p2m->default_access)
@@ -905,7 +905,7 @@ static int set_typed_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
         return -EIO;
 
     gfn_lock(p2m, gfn, 0);
-    omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, NULL);
+    omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, NULL, NULL);
     if ( p2m_is_grant(ot) || p2m_is_foreign(ot) )
     {
         p2m_unlock(p2m);
@@ -956,7 +956,7 @@ int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
         return -EIO;
 
     gfn_lock(p2m, gfn, 0);
-    actual_mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);
+    actual_mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL, NULL);
 
     /* Do not use mfn_valid() here as it will usually fail for MMIO pages. */
     if ( (INVALID_MFN == mfn_x(actual_mfn)) || (t != p2m_mmio_direct) )
@@ -992,7 +992,7 @@ int set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
         return -EIO;
 
     gfn_lock(p2m, gfn, 0);
-    omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, NULL);
+    omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, NULL, NULL);
     /* At the moment we only allow p2m change if gfn has already been made
      * sharable first */
     ASSERT(p2m_is_shared(ot));
@@ -1044,7 +1044,7 @@ int p2m_mem_paging_nominate(struct domain *d, unsigned long gfn)
 
     gfn_lock(p2m, gfn, 0);
 
-    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL, NULL);
 
     /* Check if mfn is valid */
     if ( !mfn_valid(mfn) )
@@ -1106,7 +1106,7 @@ int p2m_mem_paging_evict(struct domain *d, unsigned long gfn)
     gfn_lock(p2m, gfn, 0);
 
     /* Get mfn */
-    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL, NULL);
     if ( unlikely(!mfn_valid(mfn)) )
         goto out;
 
@@ -1238,7 +1238,7 @@ void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
 
     /* Fix p2m mapping */
     gfn_lock(p2m, gfn, 0);
-    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL, NULL);
     /* Allow only nominated or evicted pages to enter page-in path */
     if ( p2mt == p2m_ram_paging_out || p2mt == p2m_ram_paged )
     {
@@ -1300,7 +1300,7 @@ int p2m_mem_paging_prep(struct domain *d, unsigned long gfn, uint64_t buffer)
 
     gfn_lock(p2m, gfn, 0);
 
-    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+    mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL, NULL);
 
     ret = -ENOENT;
     /* Allow missing pages */
@@ -1388,7 +1388,7 @@ void p2m_mem_paging_resume(struct domain *d, vm_event_response_t *rsp)
         unsigned long gfn = rsp->u.mem_access.gfn;
 
         gfn_lock(p2m, gfn, 0);
-        mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+        mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL, NULL);
         /*
          * Allow only pages which were prepared properly, or pages which
          * were nominated but not evicted.
@@ -1528,16 +1528,17 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
     vm_event_request_t *req;
     int rc;
     unsigned long eip = guest_cpu_user_regs()->eip;
+    bool_t sve;
 
     /* First, handle rx2rw conversion automatically.
      * These calls to p2m->set_entry() must succeed: we have the gfn
      * locked and just did a successful get_entry(). */
     gfn_lock(p2m, gfn, 0);
-    mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, 0, NULL);
+    mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, 0, NULL, &sve);
 
     if ( npfec.write_access && p2ma == p2m_access_rx2rw ) 
     {
-        rc = p2m->set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2mt, p2m_access_rw);
+        rc = p2m->set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2mt, p2m_access_rw, sve);
         ASSERT(rc == 0);
         gfn_unlock(p2m, gfn, 0);
         return 1;
@@ -1546,7 +1547,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
     {
         ASSERT(npfec.write_access || npfec.read_access || npfec.insn_fetch);
         rc = p2m->set_entry(p2m, gfn, mfn, PAGE_ORDER_4K,
-                            p2mt, p2m_access_rwx);
+                            p2mt, p2m_access_rwx, -1);
         ASSERT(rc == 0);
     }
     gfn_unlock(p2m, gfn, 0);
@@ -1566,14 +1567,14 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
         else
         {
             gfn_lock(p2m, gfn, 0);
-            mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, 0, NULL);
+            mfn = p2m->get_entry(p2m, gfn, &p2mt, &p2ma, 0, NULL, &sve);
             if ( p2ma != p2m_access_n2rwx )
             {
                 /* A listener is not required, so clear the access
                  * restrictions.  This set must succeed: we have the
                  * gfn locked and just did a successful get_entry(). */
                 rc = p2m->set_entry(p2m, gfn, mfn, PAGE_ORDER_4K,
-                                    p2mt, p2m_access_rwx);
+                                    p2mt, p2m_access_rwx, sve);
                 ASSERT(rc == 0);
             }
             gfn_unlock(p2m, gfn, 0);
@@ -1652,6 +1653,7 @@ long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr,
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
     p2m_access_t a, _a;
+    bool_t sve;
     p2m_type_t t;
     mfn_t mfn;
     long rc = 0;
@@ -1693,8 +1695,8 @@ long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr,
     p2m_lock(p2m);
     for ( pfn += start; nr > start; ++pfn )
     {
-        mfn = p2m->get_entry(p2m, pfn, &t, &_a, 0, NULL);
-        rc = p2m->set_entry(p2m, pfn, mfn, PAGE_ORDER_4K, t, a);
+        mfn = p2m->get_entry(p2m, pfn, &t, &_a, 0, NULL, &sve);
+        rc = p2m->set_entry(p2m, pfn, mfn, PAGE_ORDER_4K, t, a, sve);
         if ( rc )
             break;
 
@@ -1742,7 +1744,7 @@ int p2m_get_mem_access(struct domain *d, unsigned long pfn,
     }
 
     gfn_lock(p2m, gfn, 0);
-    mfn = p2m->get_entry(p2m, pfn, &t, &a, 0, NULL);
+    mfn = p2m->get_entry(p2m, pfn, &t, &a, 0, NULL, NULL);
     gfn_unlock(p2m, gfn, 0);
 
     if ( mfn_x(mfn) == INVALID_MFN )
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 079a298..0a172e0 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -226,17 +226,19 @@ struct p2m_domain {
     /* Pages used to construct the p2m */
     struct page_list_head pages;
 
-    int                (*set_entry   )(struct p2m_domain *p2m,
-                                       unsigned long gfn,
-                                       mfn_t mfn, unsigned int page_order,
-                                       p2m_type_t p2mt,
-                                       p2m_access_t p2ma);
-    mfn_t              (*get_entry   )(struct p2m_domain *p2m,
-                                       unsigned long gfn,
-                                       p2m_type_t *p2mt,
-                                       p2m_access_t *p2ma,
-                                       p2m_query_t q,
-                                       unsigned int *page_order);
+    int                (*set_entry)(struct p2m_domain *p2m,
+                                    unsigned long gfn,
+                                    mfn_t mfn, unsigned int page_order,
+                                    p2m_type_t p2mt,
+                                    p2m_access_t p2ma,
+                                    int sve);
+    mfn_t              (*get_entry)(struct p2m_domain *p2m,
+                                    unsigned long gfn,
+                                    p2m_type_t *p2mt,
+                                    p2m_access_t *p2ma,
+                                    p2m_query_t q,
+                                    unsigned int *page_order,
+                                    bool_t *sve);
     void               (*enable_hardware_log_dirty)(struct p2m_domain *p2m);
     void               (*disable_hardware_log_dirty)(struct p2m_domain *p2m);
     void               (*flush_hardware_cached_dirty)(struct p2m_domain *p2m);
-- 
1.9.1
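
(Not part of the patch - a minimal sketch of the caller-side pattern the
new arguments produce, using the names from the hunks above. Passing
sve == -1 on set, as in the p2m_access_rwx case above, is assumed to
leave the suppress_ve bit at the implementation's default:)

    /* Read-modify-write of a p2m entry, preserving suppress_ve. */
    bool_t sve;
    p2m_type_t t;
    p2m_access_t a;
    mfn_t mfn;
    int rc;

    gfn_lock(p2m, gfn, 0);
    mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL, &sve);
    rc = p2m->set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, t, p2m_access_rw, sve);
    ASSERT(rc == 0);   /* safe: gfn is locked and get_entry just succeeded */
    gfn_unlock(p2m, gfn, 0);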


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-07-07 10:10                       ` George Dunlap
@ 2015-07-07 16:24                         ` Ed White
  2015-07-07 17:33                           ` George Dunlap
  2015-07-08  7:23                           ` Jan Beulich
  0 siblings, 2 replies; 116+ messages in thread
From: Ed White @ 2015-07-07 16:24 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, xen-devel,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

On 07/07/2015 03:10 AM, George Dunlap wrote:
> On 07/06/2015 07:43 PM, Ed White wrote:
>>> Introducing yet another layer -- particularly in a hooked interface like
>>> this -- just seems clunky.  It's not the worst thing in the world; if I
>>> thought this would be the difference between making it or not, I might
>>> just say fix it later.  But I don't think it will; and these little
>>> things add up.
>>>
>>
>> I don't want to change set/get everywhere, and Tim already made it clear
>> that coupling suppress_ve with p2m_type_t is not acceptable.
>>
>> How can I provide an implementation that does not do either of the above
>> but does allow access to suppress_ve in a way that is acceptable?
>>
>> Tell me and I will do it.
> 
> The only reason I can think of that you don't want to change get/set is
> that you think it's too much work.
> 
> So here you go, I modified your patch; it took me 10 minutes, which is
> less than what it would have taken me to continue arguing with you.
> I've compile-tested it, but not done anything else (including porting
> subsequent patches onto it).
> 

I'm disappointed that you think that. I respect yours, Jan's, etc. role
as maintainers, and your absolute right to reject anything you think
is inappropriate. It's clear that Jan, and now apparently you, don't
respect my abilities or desire to do good work.

I won't make the changes you suggest because I don't think they
represent good design, and I don't think they are the right way to
solve the issue at hand.

I'm going to hand the Intel end of further discussions off to Ravi
Sahita.

Ed

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-07-07 16:24                         ` Ed White
@ 2015-07-07 17:33                           ` George Dunlap
  2015-07-07 17:38                             ` Sahita, Ravi
  2015-07-08  7:23                           ` Jan Beulich
  1 sibling, 1 reply; 116+ messages in thread
From: George Dunlap @ 2015-07-07 17:33 UTC (permalink / raw)
  To: Ed White
  Cc: Ravi Sahita, Wei Liu, Ian Jackson, Tim Deegan, xen-devel,
	Jan Beulich, Andrew Cooper, tlengyel, Daniel De Graaf

On 07/07/2015 05:24 PM, Ed White wrote:
> On 07/07/2015 03:10 AM, George Dunlap wrote:
>> On 07/06/2015 07:43 PM, Ed White wrote:
>>>> Introducing yet another layer -- particularly in a hooked interface like
>>>> this -- just seems clunky.  It's not the worst thing in the world; if I
>>>> thought this would be the difference between making it or not, I might
>>>> just say fix it later.  But I don't think it will; and these little
>>>> things add up.
>>>>
>>>
>>> I don't want to change set/get everywhere, and Tim already made it clear
>>> that coupling suppress_ve with p2m_type_t is not acceptable.
>>>
>>> How can I provide an implementation that does not do either of the above
>>> but does allow access to suppress_ve in a way that is acceptable?
>>>
>>> Tell me and I will do it.
>>
>> The only reason I can think of that you don't want to change get/set is
>> that you think it's too much work.
>>
>> So here you go, I modified your patch; it took me 10 minutes, which is
>> less than what it would have taken me to continue arguing with you.
>> I've compile-tested it, but not done anything else (including porting
>> subsequent patches onto it).
>>
> 
> I'm disappointed that you think that. I respect yours, Jan's, etc. role
> as maintainers, and your absolute right to reject anything you think
> is inappropriate. It's clear that Jan, and now apparently you, don't
> respect my abilities or desire to do good work.
> 
> I won't make the changes you suggest because I don't think they
> represent good design, and I don't think they are the right way to
> solve the issue at hand.
> 
> I'm going to hand the Intel end of further discussions off to Ravi
> Sahita.

So let's clarify a few things.

First, I do respect your work; on the whole I've found this patch series
sensibly designed and quite readable, given how complex the feature is
that you're trying to implement.

Secondly, I understand that you think that changing all the callers to
set_entry and get_entry is ugly, and that if you were the maintainer you
would do things differently.

However, the world is a big place and lots of people have different
ideas of what's ugly and what's not.  The way they do things in qemu is
different than the way they do things in libvirt, which is different
than the way they do things in the Linux kernel.  If the maintainer asks
for things to be done a certain way, you just do it that way, even if
you think it's ugly.  That's just how things are sometimes.

So when you said you didn't want to change get/set, after being told by
both myself and Jan that we thought that was the right approach, I
assumed that the reason was that you felt it unfair to be forced to do
the tedious task of tracking down all the changes and doing it yourself.

Well, it is a bit tedious to do that, and I can see how you might be
upset about doing it.  On the other hand, I think that's the right thing
to do.

So rather than try to force you to do it, or compromise on what I
think is the right thing to do, I decided, as a show of good faith, to
do the tedious task myself, so that you wouldn't have to.

It sounds like you took this as an insult; hopefully now you understand
that it wasn't.

It also sounds like you are refusing to make this change because you're
standing on your principles.  I don't really understand that attitude,
particularly in this case.  That's your choice, and I'm not going to
argue with it, though I do hope you'll reconsider.

I will say, however, that if you are unwilling to compromise on this
sort of thing, then contributing to open-source projects is unlikely,
in general, to be a very fruitful activity for you.

 -George

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-07-07 17:33                           ` George Dunlap
@ 2015-07-07 17:38                             ` Sahita, Ravi
  2015-07-08  7:24                               ` Jan Beulich
  2015-07-08 10:12                               ` Tim Deegan
  0 siblings, 2 replies; 116+ messages in thread
From: Sahita, Ravi @ 2015-07-07 17:38 UTC (permalink / raw)
  To: George Dunlap, White, Edmund H
  Cc: Wei Liu, Ian Jackson, Tim Deegan, xen-devel, Jan Beulich,
	Andrew Cooper, tlengyel, Daniel De Graaf



>From: George Dunlap [mailto:george.dunlap@eu.citrix.com]
>Sent: Tuesday, July 07, 2015 10:34 AM
>
>On 07/07/2015 05:24 PM, Ed White wrote:
>> On 07/07/2015 03:10 AM, George Dunlap wrote:
>>> On 07/06/2015 07:43 PM, Ed White wrote:
>>>>> Introducing yet another layer -- particularly in a hooked interface
>>>>> like this -- just seems clunky.  It's not the worst thing in the
>>>>> world; if I thought this would be the difference between making it
>>>>> or not, I might just say fix it later.  But I don't think it will;
>>>>> and these little things add up.
>>>>>
>>>>
>>>> I don't want to change set/get everywhere, and Tim already made it
>>>> clear that coupling suppress_ve with p2m_type_t is not acceptable.
>>>>
>>>> How can I provide an implementation that does not do either of the
>>>> above but does allow access to suppress_ve in a way that is acceptable?
>>>>
>>>> Tell me and I will do it.
>>>
>>> The only reason I can think of that you don't want to change get/set is
>>> that you think it's too much work.
>>>
>>> So here you go, I modified your patch; it took me 10 minutes, which
>>> is less than what it would have taken me to continue arguing with you.
>>> I've compile-tested it, but not done anything else (including porting
>>> subsequent patches onto it).
>>>
>>
>> I'm disappointed that you think that. I respect yours, Jan's, etc.
>> role as maintainers, and your absolute right to reject anything you
>> think is inappropriate. It's clear that Jan, and now apparently you,
>> don't respect my abilities or desire to do good work.
>>
>> I won't make the changes you suggest because I don't think they
>> represent good design, and I don't think they are the right way to
>> solve the issue at hand.
>>
>> I'm going to hand the Intel end of further discussions off to Ravi
>> Sahita.
>
>So let's clarify a few things.
>
>First, I do respect your work; on the whole I've found this patch series sensibly
>designed and quite readable, given how complex the feature is that you're trying
>to implement.
>
>Secondly, I understand that you think that changing all the callers to set_entry
>and get_entry is ugly, and that if you were the maintainer you would do things
>differently.
>
>However, the world is a big place and lots of people have different ideas of
>what's ugly and what's not.  The way they do things in qemu is different than
>the way they do things in libvirt, which is different than the way they do things
>in the Linux kernel.  If the maintainer asks for things to be done a certain way,
>you just do it that way, even if you think it's ugly.  That's just how things are
>sometimes.
>
>So when you said you didn't want to change get/set, after being told by both
>myself and Jan that we thought that was the right approach, I assumed that
>the reason was that you felt it unfair to be forced to do the tedious task of
>tracking down all the changes and doing it yourself.
>
>Well, it is a bit tedious to do that, and I can see how you might be upset about
>doing it.  On the other hand, I think that's the right thing to do.
>
>So rather than try to force you to do it, or compromise on what I think is the
>right thing to do, I decided, as a show of good faith, to do the tedious task
>myself, so that you wouldn't have to.
>
>It sounds like you took this as an insult; hopefully now you understand that it
>wasn't.
>
>It also sounds like you are refusing to make this change because you're
>standing on your principles.  I don't really understand that attitude, particularly
>in this case.  That's your choice, and I'm not going to argue with it, though I
>do hope you'll reconsider.
>
>I will say, however, that if you are unwilling to compromise on this sort of
>thing, then contributing to open-source projects is unlikely, in general, to be
>a very fruitful activity for you.
>
> -George

Thanks for taking the time to do this, George, and for sending the follow-up email - we do appreciate your show of good faith, and putting it in code :-).
In order to make forward progress, do the other maintainers (Jan, Andrew, Tim) agree with the patch direction that George has suggested for this particular patch? 

Ravi

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-07-07 16:24                         ` Ed White
  2015-07-07 17:33                           ` George Dunlap
@ 2015-07-08  7:23                           ` Jan Beulich
  1 sibling, 0 replies; 116+ messages in thread
From: Jan Beulich @ 2015-07-08  7:23 UTC (permalink / raw)
  To: Ed White
  Cc: Tim Deegan, Ravi Sahita, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, xen-devel, tlengyel, Daniel De Graaf

>>> On 07.07.15 at 18:24, <edmund.h.white@intel.com> wrote:
> I'm disappointed that you think that. I respect yours, Jan's, etc. role
> as maintainers, and your absolute right to reject anything you think
> is inappropriate. It's clear that Jan, and now apparently you, don't
> respect my abilities or desire to do good work.

I'm not sure what you deduced this from: not agreeing with a
particular implementation decision you took doesn't mean a lack
of respect for your abilities or your desire to do good work, at
least not to me. I regret it if you felt offended in any way.

Jan

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-07-07 17:38                             ` Sahita, Ravi
@ 2015-07-08  7:24                               ` Jan Beulich
  2015-07-08 10:12                               ` Tim Deegan
  1 sibling, 0 replies; 116+ messages in thread
From: Jan Beulich @ 2015-07-08  7:24 UTC (permalink / raw)
  To: Ravi Sahita
  Cc: Tim Deegan, Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson,
	Edmund H White, xen-devel, tlengyel, Daniel De Graaf

>>> On 07.07.15 at 19:38, <ravi.sahita@intel.com> wrote:
> In order to make forward progress, do the other maintainers (Jan, Andrew, 
> Tim) agree with the patch direction that George has suggested for this 
> particular patch? 

I for my part do, with the assumption that post-4.6 consolidation of
the increasingly ugly interface is going to be done.

Jan

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-07-07 17:38                             ` Sahita, Ravi
  2015-07-08  7:24                               ` Jan Beulich
@ 2015-07-08 10:12                               ` Tim Deegan
  2015-07-08 12:51                                 ` George Dunlap
  1 sibling, 1 reply; 116+ messages in thread
From: Tim Deegan @ 2015-07-08 10:12 UTC (permalink / raw)
  To: Ravi Sahita
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Ian Jackson,
	Edmund H White, xen-devel, Jan Beulich, tlengyel,
	Daniel De Graaf

Hi,

At 17:38 +0000 on 07 Jul (1436290689), Sahita, Ravi wrote:
> In order to make forward progress, do the other maintainers (Jan,
> Andrew, Tim) agree with the patch direction that George has
> suggested for this particular patch?

I'm no longer a maintainer for this code, but FWIW I think that this
direction (adding a new argument to the internal APIs rather than
adding new internal APIs) is correct.

Because the sve bit must be _set_ to get the old/default behaviour, I
think the p2m_pt implementation should always return sve = 1 on _get
and possibly also assert sve != 0 on _set.
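
Something like the following, say (an untested sketch against the hook
names in p2m-pt.c; since the pagetable-based p2m has no #VE support, it
can only ever report the default):

    /* Sketch: p2m-pt entries have no suppress_ve bit, so _get always
     * reports the default (1) and _set rejects requests to clear it. */
    static mfn_t p2m_pt_get_entry(struct p2m_domain *p2m, unsigned long gfn,
                                  p2m_type_t *t, p2m_access_t *a,
                                  p2m_query_t q, unsigned int *page_order,
                                  bool_t *sve)
    {
        if ( sve )
            *sve = 1;
        /* ... existing lookup logic unchanged ... */
    }

    static int p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn,
                                mfn_t mfn, unsigned int page_order,
                                p2m_type_t p2mt, p2m_access_t p2ma, int sve)
    {
        ASSERT(sve != 0);
        /* ... existing update logic unchanged ... */
    }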

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH v2 07/12] x86/altp2m: add control of suppress_ve.
  2015-07-08 10:12                               ` Tim Deegan
@ 2015-07-08 12:51                                 ` George Dunlap
  0 siblings, 0 replies; 116+ messages in thread
From: George Dunlap @ 2015-07-08 12:51 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Ravi Sahita, Wei Liu, Andrew Cooper, Ian Jackson, Edmund H White,
	xen-devel, Jan Beulich, tlengyel, Daniel De Graaf

On Wed, Jul 8, 2015 at 11:12 AM, Tim Deegan <tim@xen.org> wrote:
> Hi,
>
> At 17:38 +0000 on 07 Jul (1436290689), Sahita, Ravi wrote:
>> In order to make forward progress, do the other maintainers (Jan,
>> Andrew, Tim) agree with the patch direction that George has
>> suggested for this particular patch?
>
> I'm no longer a maintainer for this code, but FWIW I think that this
> direction (adding a new argument to the internal APIs rather than
> adding new internal APIs) is correct.
>
> Because the sve bit must be _set_ to get the old/default behaviour, I
> think the p2m_pt implementation should always return sve = 1 on _get
> and possibly also assert sve != 0 on _set.

Yes, I was thinking about this after I sent the patch.  If you
re-send, Ravi, please modify the patch as Tim suggests.  (Use your
best judgement about asserting sve != 0.)

 -George

^ permalink raw reply	[flat|nested] 116+ messages in thread

end of thread, other threads:[~2015-07-08 12:51 UTC | newest]

Thread overview: 116+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-22 18:56 [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m Ed White
2015-06-22 18:56 ` [PATCH v2 01/12] VMX: VMFUNC and #VE definitions and detection Ed White
2015-06-24  8:45   ` Andrew Cooper
2015-06-22 18:56 ` [PATCH v2 02/12] VMX: implement suppress #VE Ed White
2015-06-24  9:35   ` Andrew Cooper
2015-06-29 14:20   ` George Dunlap
2015-06-29 14:31     ` Andrew Cooper
2015-06-29 15:03       ` George Dunlap
2015-06-29 16:21         ` Sahita, Ravi
2015-06-29 16:21         ` Ed White
2015-06-22 18:56 ` [PATCH v2 03/12] x86/HVM: Hardware alternate p2m support detection Ed White
2015-06-24  9:44   ` Andrew Cooper
2015-06-24 10:07     ` Jan Beulich
2015-06-22 18:56 ` [PATCH v2 04/12] x86/altp2m: basic data structures and support routines Ed White
2015-06-24 10:06   ` Andrew Cooper
2015-06-24 10:23     ` Jan Beulich
2015-06-24 17:20     ` Ed White
2015-06-24 10:29   ` Andrew Cooper
2015-06-24 11:14     ` Andrew Cooper
2015-06-26 21:17     ` Ed White
2015-06-27 19:25       ` Ed White
2015-06-29 13:00       ` Andrew Cooper
2015-06-29 16:23         ` Ed White
2015-06-24 14:44   ` Jan Beulich
2015-06-22 18:56 ` [PATCH v2 05/12] VMX/altp2m: add code to support EPTP switching and #VE Ed White
2015-06-24 11:59   ` Andrew Cooper
2015-06-24 17:31     ` Ed White
2015-06-24 17:40       ` Andrew Cooper
2015-06-22 18:56 ` [PATCH v2 06/12] VMX: add VMFUNC leaf 0 (EPTP switching) to emulator Ed White
2015-06-24 12:47   ` Andrew Cooper
2015-06-24 20:29     ` Ed White
2015-06-25  8:26       ` Jan Beulich
2015-06-24 14:26   ` Jan Beulich
2015-06-22 18:56 ` [PATCH v2 07/12] x86/altp2m: add control of suppress_ve Ed White
2015-06-24 13:05   ` Andrew Cooper
2015-06-24 14:38   ` Jan Beulich
2015-06-24 17:53     ` Ed White
2015-06-25  8:12       ` Jan Beulich
2015-06-25 16:36         ` Ed White
2015-06-26  6:04           ` Jan Beulich
2015-06-26 16:27             ` Ed White
2015-07-06 17:12               ` George Dunlap
2015-07-06 17:35                 ` Ed White
2015-07-06 18:29                   ` George Dunlap
2015-07-06 18:43                     ` Ed White
2015-07-07 10:10                       ` George Dunlap
2015-07-07 16:24                         ` Ed White
2015-07-07 17:33                           ` George Dunlap
2015-07-07 17:38                             ` Sahita, Ravi
2015-07-08  7:24                               ` Jan Beulich
2015-07-08 10:12                               ` Tim Deegan
2015-07-08 12:51                                 ` George Dunlap
2015-07-08  7:23                           ` Jan Beulich
2015-07-07  8:04                     ` Jan Beulich
2015-06-22 18:56 ` [PATCH v2 08/12] x86/altp2m: alternate p2m memory events Ed White
2015-06-24 13:09   ` Andrew Cooper
2015-06-24 16:01   ` Lengyel, Tamas
2015-06-24 18:02     ` Ed White
2015-06-22 18:56 ` [PATCH v2 09/12] x86/altp2m: add remaining support routines Ed White
2015-06-23 18:15   ` Lengyel, Tamas
2015-06-23 18:52     ` Ed White
2015-06-23 19:35       ` Lengyel, Tamas
2015-06-24 13:46   ` Andrew Cooper
2015-06-24 17:47     ` Ed White
2015-06-24 18:19       ` Andrew Cooper
2015-06-26 16:30         ` Ed White
2015-06-29 13:03           ` Andrew Cooper
2015-06-29 16:24             ` Ed White
2015-06-24 16:15   ` Lengyel, Tamas
2015-06-24 18:06     ` Ed White
2015-06-25  8:52       ` Ian Campbell
2015-06-25 16:27         ` Ed White
2015-06-25 12:44       ` Lengyel, Tamas
2015-06-25 13:40         ` Razvan Cojocaru
2015-06-25 16:48           ` Ed White
2015-06-25 17:39             ` Sahita, Ravi
2015-06-25 18:22             ` Razvan Cojocaru
2015-06-25 18:23             ` Lengyel, Tamas
2015-06-25 20:46               ` Ed White
2015-06-25 22:45                 ` Lengyel, Tamas
2015-06-25 23:10                   ` Ed White
2015-06-25  2:44   ` Lengyel, Tamas
2015-06-25 16:31     ` Ed White
2015-06-25 17:42       ` Lengyel, Tamas
2015-06-25 20:27         ` Ed White
2015-06-25 21:33           ` Lengyel, Tamas
2015-06-22 18:56 ` [PATCH v2 10/12] x86/altp2m: define and implement alternate p2m HVMOP types Ed White
2015-06-24 13:58   ` Andrew Cooper
2015-06-24 14:53   ` Jan Beulich
2015-06-22 18:56 ` [PATCH v2 11/12] x86/altp2m: Add altp2mhvm HVM domain parameter Ed White
2015-06-24 14:06   ` Andrew Cooper
2015-06-24 14:59   ` Jan Beulich
2015-06-24 17:57     ` Ed White
2015-06-24 18:08       ` Andrew Cooper
2015-06-25  8:34         ` Jan Beulich
2015-06-25  8:33       ` Jan Beulich
2015-06-22 18:56 ` [PATCH v2 12/12] x86/altp2m: XSM hooks for altp2m HVM ops Ed White
2015-06-26 19:24   ` Daniel De Graaf
2015-06-26 19:35     ` Ed White
2015-06-29 17:52       ` Daniel De Graaf
2015-06-29 17:55         ` Sahita, Ravi
2015-06-23 21:27 ` [PATCH v2 00/12] Alternate p2m: support multiple copies of host p2m Lengyel, Tamas
2015-06-23 22:25   ` Ed White
2015-06-24  5:39   ` Razvan Cojocaru
2015-06-24 13:32     ` Lengyel, Tamas
2015-06-24 13:37       ` Razvan Cojocaru
2015-06-24 16:43         ` Ed White
2015-06-24 21:34           ` Lengyel, Tamas
2015-06-24 22:02             ` Ed White
2015-06-24 22:45               ` Lengyel, Tamas
2015-06-24 22:55                 ` Ed White
2015-06-25  9:00                   ` Andrew Cooper
2015-06-25 16:38                     ` Ed White
2015-06-25 17:29                       ` Lengyel, Tamas
2015-06-25 20:34                         ` Ed White
2015-06-24 14:10 ` Andrew Cooper
