* [PATCH 0/3] x86: misc changes
@ 2015-03-17 15:48 Jan Beulich
  2015-03-17 15:54 ` [PATCH 1/3] x86: allow 64‑bit PV guest kernels to suppress user mode exposure of M2P Jan Beulich
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jan Beulich @ 2015-03-17 15:48 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, Andrew Cooper, Keir Fraser, Ian Jackson, Tim Deegan

The main point of the series is really patch 1 (which, after
consultation among the security team, doesn't appear to represent a
security fix); the other two are just cleanup that I found
possible/desirable while putting together the first one.

1: x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
2: slightly reduce vm_assist code
3: x86/shadow: pass domain to sh_install_xen_entries_in_lN()

Signed-off-by: Jan Beulich <jbeulich@suse.com>


* [PATCH 1/3] x86: allow 64‑bit PV guest kernels to suppress user mode exposure of M2P
  2015-03-17 15:48 [PATCH 0/3] x86: misc changes Jan Beulich
@ 2015-03-17 15:54 ` Jan Beulich
  2015-03-19 12:35   ` [PATCH 1/3] x86: allow 64-bit " Tim Deegan
  2015-03-17 15:55 ` [PATCH 2/3] slightly reduce vm_assist code Jan Beulich
  2015-03-17 15:56 ` [PATCH 3/3] x86/shadow: pass domain to sh_install_xen_entries_in_lN() Jan Beulich
  2 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2015-03-17 15:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, Andrew Cooper, Keir Fraser, Ian Jackson, Tim Deegan

Xen L4 entries being uniformly installed into any L4 table, together
with 64-bit PV kernels running in ring 3, meant that user mode was able
to see the read-only M2P presented by Xen to the guests. While
apparently not representing an exploitable information leak, this
certainly was never meant to be that way.

Building on the fact that these guests already have separate kernel and
user mode page tables we can allow guest kernels to tell Xen that they
don't want user mode to see this table. We can't, however, do this by
default: There is no ABI requirement that kernel and user mode page
tables be separate. Therefore introduce a new VM-assist flag allowing
the guest to control respective hypervisor behavior:
- when not set, L4 tables get created with the respective slot blank,
  and whenever the L4 table gets used as a kernel one the missing
  mapping gets inserted,
- when set, L4 tables get created with the respective slot initialized
  as before, and whenever the L4 table gets used as a user one the
  mapping gets zapped.
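
As an aside (not part of this patch), a guest kernel wanting the strict
behavior would enable the assist once at boot. A minimal sketch, assuming
the Linux-style HYPERVISOR_vm_assist() wrapper as the guest-side entry
point and an illustrative helper name:

    /* Illustrative guest-side sketch only. */
    static void enable_m2p_strict(void)
    {
        /* Non-zero return: the hypervisor doesn't know this assist type. */
        if ( HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_m2p_strict) )
            printk("m2p_strict vm-assist not available\n");
    }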

Since the new flag gets assigned a value discontiguous to the existing
ones (in order to preserve the low bits, as only those are currently
accessible to 32-bit guests), this requires a little bit of rework of
the VM assist code in general: An architecture specific
VM_ASSIST_VALID definition gets introduced (with an optional compat
mode counterpart), and compilation of the respective code becomes
conditional upon this being defined (ARM doesn't wire these up and
hence doesn't need that code).
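
To make the 32-bit restriction concrete, a small standalone illustration
(not part of the patch; it merely restates the x86 COMPAT_VM_ASSIST_VALID
definition added below, with bit numbers taken from the VMASST_TYPE_*
values):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        const uint64_t native = (1ULL << 0) | (1ULL << 1) | (1ULL << 2) |
                                (1ULL << 3) |  /* existing assist types  */
                                (1ULL << 32);  /* VMASST_TYPE_m2p_strict */
        /* Compat guests only get the low COMPAT_BITS_PER_LONG (32) bits. */
        const uint64_t compat = native & ((1ULL << 32) - 1);

        assert(!(compat & (1ULL << 32))); /* m2p_strict not reachable    */
        assert(compat == 0xf);            /* existing types still valid  */
        return 0;
    }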

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -339,7 +339,7 @@ static int setup_compat_l4(struct vcpu *
 
     l4tab = __map_domain_page(pg);
     clear_page(l4tab);
-    init_guest_l4_table(l4tab, v->domain);
+    init_guest_l4_table(l4tab, v->domain, 1);
     unmap_domain_page(l4tab);
 
     v->arch.guest_table = pagetable_from_page(pg);
@@ -971,7 +971,17 @@ int arch_set_info_guest(
         case -EINTR:
             rc = -ERESTART;
         case -ERESTART:
+            break;
         case 0:
+            if ( !compat && !VM_ASSIST(d, VMASST_TYPE_m2p_strict) &&
+                 !paging_mode_refcounts(d) )
+            {
+                l4_pgentry_t *l4tab = __map_domain_page(cr3_page);
+
+                l4tab[l4_table_offset(RO_MPT_VIRT_START)] =
+                    idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];
+                unmap_domain_page(l4tab);
+            }
             break;
         default:
             if ( cr3_page == current->arch.old_guest_table )
@@ -1006,7 +1016,16 @@ int arch_set_info_guest(
                 default:
                     if ( cr3_page == current->arch.old_guest_table )
                         cr3_page = NULL;
+                    break;
                 case 0:
+                    if ( VM_ASSIST(d, VMASST_TYPE_m2p_strict) )
+                    {
+                        l4_pgentry_t *l4tab = __map_domain_page(cr3_page);
+
+                        l4tab[l4_table_offset(RO_MPT_VIRT_START)] =
+                            l4e_empty();
+                        unmap_domain_page(l4tab);
+                    }
                     break;
                 }
             }
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -1203,7 +1203,7 @@ int __init construct_dom0(
         l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
     }
     clear_page(l4tab);
-    init_guest_l4_table(l4tab, d);
+    init_guest_l4_table(l4tab, d, 0);
     v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
     if ( is_pv_32on64_domain(d) )
         v->arch.guest_table_user = v->arch.guest_table;
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1380,7 +1380,8 @@ static int alloc_l3_table(struct page_in
     return rc > 0 ? 0 : rc;
 }
 
-void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d)
+void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
+                         bool_t zap_ro_mpt)
 {
     /* Xen private mappings. */
     memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT],
@@ -1395,6 +1396,8 @@ void init_guest_l4_table(l4_pgentry_t l4
         l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR);
     l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] =
         l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR);
+    if ( zap_ro_mpt || is_pv_32on64_domain(d) || paging_mode_refcounts(d) )
+        l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
 }
 
 static int alloc_l4_table(struct page_info *page)
@@ -1444,7 +1447,7 @@ static int alloc_l4_table(struct page_in
         adjust_guest_l4e(pl4e[i], d);
     }
 
-    init_guest_l4_table(pl4e, d);
+    init_guest_l4_table(pl4e, d, !VM_ASSIST(d, VMASST_TYPE_m2p_strict));
     unmap_domain_page(pl4e);
 
     return rc > 0 ? 0 : rc;
@@ -2755,6 +2758,14 @@ int new_guest_cr3(unsigned long mfn)
 
     invalidate_shadow_ldt(curr, 0);
 
+    if ( !VM_ASSIST(d, VMASST_TYPE_m2p_strict) && !paging_mode_refcounts(d) )
+    {
+        l4_pgentry_t *l4tab = map_domain_page(mfn);
+
+        l4tab[l4_table_offset(RO_MPT_VIRT_START)] =
+            idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];
+        unmap_domain_page(l4tab);
+    }
     curr->arch.guest_table = pagetable_from_pfn(mfn);
     update_cr3(curr);
 
@@ -3112,6 +3123,14 @@ long do_mmuext_op(
                                 op.arg1.mfn);
                     break;
                 }
+                if ( VM_ASSIST(d, VMASST_TYPE_m2p_strict) &&
+                     !paging_mode_refcounts(d) )
+                {
+                    l4_pgentry_t *l4tab = map_domain_page(op.arg1.mfn);
+
+                    l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
+                    unmap_domain_page(l4tab);
+                }
             }
 
             curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn);
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -1436,6 +1436,9 @@ void sh_install_xen_entries_in_l4(struct
         shadow_l4e_from_mfn(page_to_mfn(d->arch.perdomain_l3_pg),
                             __PAGE_HYPERVISOR);
 
+    if ( !VM_ASSIST(d, VMASST_TYPE_m2p_strict) )
+        sl4e[shadow_l4_table_offset(RO_MPT_VIRT_START)] = shadow_l4e_empty();
+
     /* Shadow linear mapping for 4-level shadows.  N.B. for 3-level
      * shadows on 64-bit xen, this linear mapping is later replaced by the
      * monitor pagetable structure, which is built in make_monitor_table
@@ -3975,6 +3978,19 @@ sh_update_cr3(struct vcpu *v, int do_loc
         /* PAGING_LEVELS==4 implies 64-bit, which means that
          * map_domain_page_global can't fail */
         BUG_ON(v->arch.paging.shadow.guest_vtable == NULL);
+        if ( !shadow_mode_external(d) && !is_pv_32on64_domain(d) )
+        {
+            shadow_l4e_t *sl4e = v->arch.paging.shadow.guest_vtable;
+
+            if ( (v->arch.flags & TF_kernel_mode) &&
+                 !VM_ASSIST(d, VMASST_TYPE_m2p_strict) )
+                sl4e[shadow_l4_table_offset(RO_MPT_VIRT_START)] =
+                    idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];
+            else if ( !(v->arch.flags & TF_kernel_mode) &&
+                      VM_ASSIST(d, VMASST_TYPE_m2p_strict) )
+                sl4e[shadow_l4_table_offset(RO_MPT_VIRT_START)] =
+                    shadow_l4e_empty();
+        }
     }
     else
         v->arch.paging.shadow.guest_vtable = __linear_l4_table;
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -480,7 +480,7 @@ static int setup_m2p_table(struct mem_ho
                 l2_ro_mpt += l2_table_offset(va);
             }
 
-            /* NB. Cannot be GLOBAL as shadow_mode_translate reuses this area. */
+            /* NB. Cannot be GLOBAL: guest user mode should not see it. */
             l2e_write(l2_ro_mpt, l2e_from_pfn(mfn,
                    /*_PAGE_GLOBAL|*/_PAGE_PSE|_PAGE_USER|_PAGE_PRESENT));
         }
@@ -583,7 +583,7 @@ void __init paging_init(void)
                        0x77, 1UL << L3_PAGETABLE_SHIFT);
 
                 ASSERT(!l2_table_offset(va));
-                /* NB. Cannot be GLOBAL as shadow_mode_translate reuses this area. */
+                /* NB. Cannot be GLOBAL: guest user mode should not see it. */
                 l3e_write(&l3_ro_mpt[l3_table_offset(va)],
                     l3e_from_page(l1_pg,
                         /*_PAGE_GLOBAL|*/_PAGE_PSE|_PAGE_USER|_PAGE_PRESENT));
@@ -621,7 +621,7 @@ void __init paging_init(void)
                       l3e_from_page(l2_pg, __PAGE_HYPERVISOR | _PAGE_USER));
             ASSERT(!l2_table_offset(va));
         }
-        /* NB. Cannot be GLOBAL as shadow_mode_translate reuses this area. */
+        /* NB. Cannot be GLOBAL: guest user mode should not see it. */
         if ( l1_pg )
             l2e_write(l2_ro_mpt, l2e_from_page(
                 l1_pg, /*_PAGE_GLOBAL|*/_PAGE_PSE|_PAGE_USER|_PAGE_PRESENT));
--- a/xen/common/compat/kernel.c
+++ b/xen/common/compat/kernel.c
@@ -41,6 +41,11 @@ CHECK_TYPE(domain_handle);
 #define xennmi_callback compat_nmi_callback
 #define xennmi_callback_t compat_nmi_callback_t
 
+#ifdef COMPAT_VM_ASSIST_VALID
+#undef VM_ASSIST_VALID
+#define VM_ASSIST_VALID COMPAT_VM_ASSIST_VALID
+#endif
+
 #define DO(fn) int compat_##fn
 #define COMPAT
 
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1318,9 +1318,11 @@ long do_vcpu_op(int cmd, unsigned int vc
     return rc;
 }
 
-long vm_assist(struct domain *p, unsigned int cmd, unsigned int type)
+#ifdef VM_ASSIST_VALID
+long vm_assist(struct domain *p, unsigned int cmd, unsigned int type,
+               unsigned long valid)
 {
-    if ( type > MAX_VMASST_TYPE )
+    if ( type >= BITS_PER_LONG || !test_bit(type, &valid) )
         return -EINVAL;
 
     switch ( cmd )
@@ -1335,6 +1337,7 @@ long vm_assist(struct domain *p, unsigne
 
     return -ENOSYS;
 }
+#endif
 
 struct pirq *pirq_get_info(struct domain *d, int pirq)
 {
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -386,10 +386,12 @@ DO(nmi_op)(unsigned int cmd, XEN_GUEST_H
     return rc;
 }
 
+#ifdef VM_ASSIST_VALID
 DO(vm_assist)(unsigned int cmd, unsigned int type)
 {
-    return vm_assist(current->domain, cmd, type);
+    return vm_assist(current->domain, cmd, type, VM_ASSIST_VALID);
 }
+#endif
 
 DO(ni_hypercall)(void)
 {
--- a/xen/include/asm-x86/config.h
+++ b/xen/include/asm-x86/config.h
@@ -343,6 +343,15 @@ extern unsigned long xen_phys_start;
 #define ARG_XLAT_START(v)        \
     (ARG_XLAT_VIRT_START + ((v)->vcpu_id << ARG_XLAT_VA_SHIFT))
 
+#define NATIVE_VM_ASSIST_VALID   ((1UL << VMASST_TYPE_4gb_segments)        | \
+                                  (1UL << VMASST_TYPE_4gb_segments_notify) | \
+                                  (1UL << VMASST_TYPE_writable_pagetables) | \
+                                  (1UL << VMASST_TYPE_pae_extended_cr3)    | \
+                                  (1UL << VMASST_TYPE_m2p_strict))
+#define VM_ASSIST_VALID          NATIVE_VM_ASSIST_VALID
+#define COMPAT_VM_ASSIST_VALID   (NATIVE_VM_ASSIST_VALID & \
+                                  ((1UL << COMPAT_BITS_PER_LONG) - 1))
+
 #define ELFSIZE 64
 
 #define ARCH_CRASH_SAVE_VMCOREINFO
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -318,7 +318,8 @@ static inline void *__page_to_virt(const
 int free_page_type(struct page_info *page, unsigned long type,
                    int preemptible);
 
-void init_guest_l4_table(l4_pgentry_t[], const struct domain *);
+void init_guest_l4_table(l4_pgentry_t[], const struct domain *,
+                         bool_t zap_ro_mpt);
 
 int is_iomem_page(unsigned long mfn);
 
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -486,7 +486,12 @@ DEFINE_XEN_GUEST_HANDLE(mmuext_op_t);
 /* x86/PAE guests: support PDPTs above 4GB. */
 #define VMASST_TYPE_pae_extended_cr3     3
 
+/* x86/64 guests: strictly hide M2P from user mode. */
+#define VMASST_TYPE_m2p_strict           32
+
+#if __XEN_INTERFACE_VERSION__ < 0x00040600
 #define MAX_VMASST_TYPE                  3
+#endif
 
 #ifndef __ASSEMBLY__
 
--- a/xen/include/xen/lib.h
+++ b/xen/include/xen/lib.h
@@ -80,7 +80,8 @@ extern void guest_printk(const struct do
     __attribute__ ((format (printf, 2, 3)));
 extern void noreturn panic(const char *format, ...)
     __attribute__ ((format (printf, 1, 2)));
-extern long vm_assist(struct domain *, unsigned int, unsigned int);
+extern long vm_assist(struct domain *, unsigned int cmd, unsigned int type,
+                      unsigned long valid);
 extern int __printk_ratelimit(int ratelimit_ms, int ratelimit_burst);
 extern int printk_ratelimit(void);
 




* [PATCH 2/3] slightly reduce vm_assist code
  2015-03-17 15:48 [PATCH 0/3] x86: misc changes Jan Beulich
  2015-03-17 15:54 ` [PATCH 1/3] x86: allow 64‑bit PV guest kernels to suppress user mode exposure of M2P Jan Beulich
@ 2015-03-17 15:55 ` Jan Beulich
  2015-03-17 17:19   ` Tim Deegan
  2015-03-17 15:56 ` [PATCH 3/3] x86/shadow: pass domain to sh_install_xen_entries_in_lN() Jan Beulich
  2 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2015-03-17 15:55 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, Andrew Cooper, Keir Fraser, Ian Jackson, Tim Deegan

- drop an effectively unused struct pv_vcpu field (x86)
- adjust VM_ASSIST() to prepend VMASST_TYPE_

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -901,7 +901,6 @@ int arch_set_info_guest(
         v->arch.pv_vcpu.event_callback_cs = c(event_callback_cs);
         v->arch.pv_vcpu.failsafe_callback_cs = c(failsafe_callback_cs);
     }
-    v->arch.pv_vcpu.vm_assist = c(vm_assist);
 
     /* Only CR0.TS is modifiable by guest or admin. */
     v->arch.pv_vcpu.ctrlreg[0] &= X86_CR0_TS;
@@ -973,7 +972,7 @@ int arch_set_info_guest(
         case -ERESTART:
             break;
         case 0:
-            if ( !compat && !VM_ASSIST(d, VMASST_TYPE_m2p_strict) &&
+            if ( !compat && !VM_ASSIST(d, m2p_strict) &&
                  !paging_mode_refcounts(d) )
             {
                 l4_pgentry_t *l4tab = __map_domain_page(cr3_page);
@@ -1023,7 +1022,7 @@ int arch_set_info_guest(
                         cr3_page = NULL;
                     break;
                 case 0:
-                    if ( VM_ASSIST(d, VMASST_TYPE_m2p_strict) )
+                    if ( VM_ASSIST(d, m2p_strict) )
                     {
                         l4_pgentry_t *l4tab = __map_domain_page(cr3_page);
 
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1436,7 +1436,6 @@ void arch_get_info_guest(struct vcpu *v,
             c(event_callback_cs = v->arch.pv_vcpu.event_callback_cs);
             c(failsafe_callback_cs = v->arch.pv_vcpu.failsafe_callback_cs);
         }
-        c(vm_assist = v->arch.pv_vcpu.vm_assist);
 
         /* IOPL privileges are virtualised: merge back into returned eflags. */
         BUG_ON((c(user_regs.eflags) & X86_EFLAGS_IOPL) != 0);
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1454,7 +1454,7 @@ static int alloc_l4_table(struct page_in
         adjust_guest_l4e(pl4e[i], d);
     }
 
-    init_guest_l4_table(pl4e, d, !VM_ASSIST(d, VMASST_TYPE_m2p_strict));
+    init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict));
     unmap_domain_page(pl4e);
 
     return rc > 0 ? 0 : rc;
@@ -2765,7 +2765,7 @@ int new_guest_cr3(unsigned long mfn)
 
     invalidate_shadow_ldt(curr, 0);
 
-    if ( !VM_ASSIST(d, VMASST_TYPE_m2p_strict) && !paging_mode_refcounts(d) )
+    if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
     {
         l4_pgentry_t *l4tab = map_domain_page(mfn);
 
@@ -3135,8 +3135,7 @@ long do_mmuext_op(
                                 op.arg1.mfn);
                     break;
                 }
-                if ( VM_ASSIST(d, VMASST_TYPE_m2p_strict) &&
-                     !paging_mode_refcounts(d) )
+                if ( VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
                 {
                     l4_pgentry_t *l4tab = map_domain_page(op.arg1.mfn);
 
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -1436,7 +1436,7 @@ void sh_install_xen_entries_in_l4(struct
         shadow_l4e_from_mfn(page_to_mfn(d->arch.perdomain_l3_pg),
                             __PAGE_HYPERVISOR);
 
-    if ( !VM_ASSIST(d, VMASST_TYPE_m2p_strict) )
+    if ( !VM_ASSIST(d, m2p_strict) )
         sl4e[shadow_l4_table_offset(RO_MPT_VIRT_START)] = shadow_l4e_empty();
 
     /* Shadow linear mapping for 4-level shadows.  N.B. for 3-level
@@ -3983,11 +3983,11 @@ sh_update_cr3(struct vcpu *v, int do_loc
             shadow_l4e_t *sl4e = v->arch.paging.shadow.guest_vtable;
 
             if ( (v->arch.flags & TF_kernel_mode) &&
-                 !VM_ASSIST(d, VMASST_TYPE_m2p_strict) )
+                 !VM_ASSIST(d, m2p_strict) )
                 sl4e[shadow_l4_table_offset(RO_MPT_VIRT_START)] =
                     idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];
             else if ( !(v->arch.flags & TF_kernel_mode) &&
-                      VM_ASSIST(d, VMASST_TYPE_m2p_strict) )
+                      VM_ASSIST(d, m2p_strict) )
                 sl4e[shadow_l4_table_offset(RO_MPT_VIRT_START)] =
                     shadow_l4e_empty();
         }
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1441,7 +1441,7 @@ static int fixup_page_fault(unsigned lon
          !(regs->error_code & (PFEC_reserved_bit | PFEC_insn_fetch)) &&
          (regs->error_code & PFEC_write_access) )
     {
-        if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables) &&
+        if ( VM_ASSIST(d, writable_pagetables) &&
              /* Do not check if access-protection fault since the page may
                 legitimately be not present in shadow page tables */
              (paging_mode_enabled(d) ||
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -306,7 +306,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDL
         {
         case 0:
             fi.submap = (1U << XENFEAT_memory_op_vnode_supported);
-            if ( VM_ASSIST(d, VMASST_TYPE_pae_extended_cr3) )
+            if ( VM_ASSIST(d, pae_extended_cr3) )
                 fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
             if ( paging_mode_translate(d) )
                 fi.submap |= 
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -372,8 +372,6 @@ struct pv_vcpu
         };
     };
 
-    unsigned long vm_assist;
-
     unsigned long syscall32_callback_eip;
     unsigned long sysenter_callback_eip;
     unsigned short syscall32_callback_cs;
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -833,7 +833,7 @@ void watchdog_domain_destroy(struct doma
 /* This check is for functionality specific to a control domain */
 #define is_control_domain(_d) ((_d)->is_privileged)
 
-#define VM_ASSIST(_d,_t) (test_bit((_t), &(_d)->vm_assist))
+#define VM_ASSIST(d, t) (test_bit(VMASST_TYPE_ ## t, &(d)->vm_assist))
 
 #define is_pv_domain(d) ((d)->guest_type == guest_type_pv)
 #define is_pv_vcpu(v)   (is_pv_domain((v)->domain))




* [PATCH 3/3] x86/shadow: pass domain to sh_install_xen_entries_in_lN()
  2015-03-17 15:48 [PATCH 0/3] x86: misc changes Jan Beulich
  2015-03-17 15:54 ` [PATCH 1/3] x86: allow 64‑bit PV guest kernels to suppress user mode exposure of M2P Jan Beulich
  2015-03-17 15:55 ` [PATCH 2/3] slightly reduce vm_assist code Jan Beulich
@ 2015-03-17 15:56 ` Jan Beulich
  2015-03-17 17:07   ` Tim Deegan
  2 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2015-03-17 15:56 UTC (permalink / raw)
  To: xen-devel; +Cc: Tim Deegan

Most callers have this available already, and the functions don't need
any vcpu specifics.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -1416,9 +1416,8 @@ do {                                    
 //        shadow-types.h to shadow-private.h
 //
 #if GUEST_PAGING_LEVELS == 4
-void sh_install_xen_entries_in_l4(struct vcpu *v, mfn_t gl4mfn, mfn_t sl4mfn)
+void sh_install_xen_entries_in_l4(struct domain *d, mfn_t gl4mfn, mfn_t sl4mfn)
 {
-    struct domain *d = v->domain;
     shadow_l4e_t *sl4e;
     unsigned int slots;
 
@@ -1449,7 +1448,7 @@ void sh_install_xen_entries_in_l4(struct
         shadow_l4e_from_mfn(sl4mfn, __PAGE_HYPERVISOR);
 
     /* Self linear mapping.  */
-    if ( shadow_mode_translate(v->domain) && !shadow_mode_external(v->domain) )
+    if ( shadow_mode_translate(d) && !shadow_mode_external(d) )
     {
         // linear tables may not be used with translated PV guests
         sl4e[shadow_l4_table_offset(LINEAR_PT_VIRT_START)] =
@@ -1470,12 +1469,11 @@ void sh_install_xen_entries_in_l4(struct
 // place, which means that we need to populate the l2h entry in the l3
 // table.
 
-static void sh_install_xen_entries_in_l2h(struct vcpu *v, mfn_t sl2hmfn)
+static void sh_install_xen_entries_in_l2h(struct domain *d, mfn_t sl2hmfn)
 {
-    struct domain *d = v->domain;
     shadow_l2e_t *sl2e;
 
-    if ( !is_pv_32on64_vcpu(v) )
+    if ( !is_pv_32on64_domain(d) )
         return;
 
     sl2e = sh_map_domain_page(sl2hmfn);
@@ -1549,11 +1547,13 @@ sh_make_shadow(struct vcpu *v, mfn_t gmf
         {
 #if GUEST_PAGING_LEVELS == 4
         case SH_type_l4_shadow:
-            sh_install_xen_entries_in_l4(v, gmfn, smfn); break;
+            sh_install_xen_entries_in_l4(v->domain, gmfn, smfn);
+            break;
 #endif
 #if GUEST_PAGING_LEVELS >= 3
         case SH_type_l2h_shadow:
-            sh_install_xen_entries_in_l2h(v, smfn); break;
+            sh_install_xen_entries_in_l2h(v->domain, smfn);
+            break;
 #endif
         default: /* Do nothing */ break;
         }
@@ -1594,7 +1594,7 @@ sh_make_monitor_table(struct vcpu *v)
     {
         mfn_t m4mfn;
         m4mfn = shadow_alloc(d, SH_type_monitor_table, 0);
-        sh_install_xen_entries_in_l4(v, m4mfn, m4mfn);
+        sh_install_xen_entries_in_l4(d, m4mfn, m4mfn);
         /* Remember the level of this table */
         mfn_to_page(m4mfn)->shadow_flags = 4;
 #if SHADOW_PAGING_LEVELS < 4
@@ -1618,7 +1618,7 @@ sh_make_monitor_table(struct vcpu *v)
             l3e[0] = l3e_from_pfn(mfn_x(m2mfn), __PAGE_HYPERVISOR);
             sh_unmap_domain_page(l3e);
 
-            if ( is_pv_32on64_vcpu(v) )
+            if ( is_pv_32on64_domain(d) )
             {
                 /* For 32-on-64 PV guests, we need to map the 32-bit Xen
                  * area into its usual VAs in the monitor tables */
@@ -1630,7 +1630,7 @@ sh_make_monitor_table(struct vcpu *v)
                 mfn_to_page(m2mfn)->shadow_flags = 2;
                 l3e = sh_map_domain_page(m3mfn);
                 l3e[3] = l3e_from_pfn(mfn_x(m2mfn), _PAGE_PRESENT);
-                sh_install_xen_entries_in_l2h(v, m2mfn);
+                sh_install_xen_entries_in_l2h(d, m2mfn);
                 sh_unmap_domain_page(l3e);
             }
 
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -361,7 +361,7 @@ mfn_t shadow_alloc(struct domain *d, 
 void  shadow_free(struct domain *d, mfn_t smfn);
 
 /* Install the xen mappings in various flavours of shadow */
-void sh_install_xen_entries_in_l4(struct vcpu *v, mfn_t gl4mfn, mfn_t sl4mfn);
+void sh_install_xen_entries_in_l4(struct domain *, mfn_t gl4mfn, mfn_t sl4mfn);
 
 /* Update the shadows in response to a pagetable write from Xen */
 int sh_validate_guest_entry(struct vcpu *v, mfn_t gmfn, void *entry, u32 size);





* Re: [PATCH 3/3] x86/shadow: pass domain to sh_install_xen_entries_in_lN()
  2015-03-17 15:56 ` [PATCH 3/3] x86/shadow: pass domain to sh_install_xen_entries_in_lN() Jan Beulich
@ 2015-03-17 17:07   ` Tim Deegan
  0 siblings, 0 replies; 8+ messages in thread
From: Tim Deegan @ 2015-03-17 17:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

At 15:56 +0000 on 17 Mar (1426607770), Jan Beulich wrote:
> Most callers have this available already, and the functions don't need
> any vcpu specifics.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Tim Deegan <tim@xen.org>


* Re: [PATCH 2/3] slightly reduce vm_assist code
  2015-03-17 15:55 ` [PATCH 2/3] slightly reduce vm_assist code Jan Beulich
@ 2015-03-17 17:19   ` Tim Deegan
  0 siblings, 0 replies; 8+ messages in thread
From: Tim Deegan @ 2015-03-17 17:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, xen-devel, Keir Fraser, Ian Jackson, Andrew Cooper

At 15:55 +0000 on 17 Mar (1426607705), Jan Beulich wrote:
> - drop an effectively unused struct pv_vcpu field (x86)
> - adjust VM_ASSIST() to prepend VMASST_TYPE_
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Tim Deegan <tim@xen.org>, though I think these would
have been better as two separate patches.

Cheers,

Tim.


* Re: [PATCH 1/3] x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
  2015-03-17 15:54 ` [PATCH 1/3] x86: allow 64‑bit PV guest kernels to suppress user mode exposure of M2P Jan Beulich
@ 2015-03-19 12:35   ` Tim Deegan
  2015-03-19 13:08     ` Jan Beulich
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Deegan @ 2015-03-19 12:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Campbell, xen-devel, Keir Fraser, Ian Jackson, Andrew Cooper

Hi,

At 15:54 +0000 on 17 Mar (1426607644), Jan Beulich wrote:
> Xen L4 entries being uniformly installed into any L4 table, together
> with 64-bit PV kernels running in ring 3, meant that user mode was able
> to see the read-only M2P presented by Xen to the guests. While
> apparently not representing an exploitable information leak, this
> certainly was never meant to be that way.
> 
> Building on the fact that these guests already have separate kernel and
> user mode page tables we can allow guest kernels to tell Xen that they
> don't want user mode to see this table. We can't, however, do this by
> default: There is no ABI requirement that kernel and user mode page
> tables be separate. Therefore introduce a new VM-assist flag allowing
> the guest to control respective hypervisor behavior:
> - when not set, L4 tables get created with the respective slot blank,
>   and whenever the L4 table gets used as a kernel one the missing
>   mapping gets inserted,
> - when set, L4 tables get created with the respective slot initialized
>   as before, and whenever the L4 table gets used as a user one the
>   mapping gets zapped.

Right, I think I understand your reasoning here.  Can you put all this
into a header or API doc somewhere?  The comment you add to xen.h is
not nearly enough for an OS dev to figure this out.

Two questions (to which I suspect your answer is yes, but just for clarity):

- Is the ABI change that an always-user-mode context never gets access
  to the RO M2P acceptable, then?  I can't think of a reason why a
  guest would have been relying on the old behaviour, but it is still
  a change.

- Are you happy that this is important enough to add extra code to the
  context switch paths?

> Since the new flag gets assigned a value discontiguous to the existing
> ones (in order to preserve the low bits, as only those are currently
> accessible to 32-bit guests), this requires a little bit of rework of
> the VM assist code in general: An architecture specific
> VM_ASSIST_VALID definition gets introduced (with an optional compat
> mode counterpart), and compilation of the respective code becomes
> conditional upon this being defined (ARM doesn't wire these up and
> hence doesn't need that code).

That all seems fine, but please do it in a separate patch.

I've only one comment on the actual code:

> +            if ( !compat && !VM_ASSIST(d, VMASST_TYPE_m2p_strict) &&
> +                 !paging_mode_refcounts(d) )
> +            {
> +                l4_pgentry_t *l4tab = __map_domain_page(cr3_page);
> +
> +                l4tab[l4_table_offset(RO_MPT_VIRT_START)] =
> +                    idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];
> +                unmap_domain_page(l4tab);

This map/copy/unmap pattern repeats a lot in this patch -- I think a
helper function would be nice.  Potentially that helper could also
include some of the VMASST/paging_mode_refcounts tests (e.g.
update_l4_ro_mpt_mapping(d, l4mfn, is_compat, is_user_mode)).
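
Something along these lines, perhaps (untested sketch only, reusing the
name from above; the exact parameters and where the refcounts check
lives are of course up to you):

    static void update_l4_ro_mpt_mapping(struct domain *d, unsigned long l4mfn,
                                         bool_t user_mode)
    {
        l4_pgentry_t *l4tab = map_domain_page(l4mfn);

        if ( user_mode && VM_ASSIST(d, VMASST_TYPE_m2p_strict) )
            /* Strict mode: user L4s must not expose the RO M2P. */
            l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
        else if ( !user_mode && !VM_ASSIST(d, VMASST_TYPE_m2p_strict) &&
                  !paging_mode_refcounts(d) )
            /* Default mode: (re)insert the mapping for kernel L4s. */
            l4tab[l4_table_offset(RO_MPT_VIRT_START)] =
                idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];

        unmap_domain_page(l4tab);
    }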

Cheers,

Tim.


* Re: [PATCH 1/3] x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
  2015-03-19 12:35   ` [PATCH 1/3] x86: allow 64-bit " Tim Deegan
@ 2015-03-19 13:08     ` Jan Beulich
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Beulich @ 2015-03-19 13:08 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Ian Campbell, Andrew Cooper, Keir Fraser, Ian Jackson, xen-devel

>>> On 19.03.15 at 13:35, <tim@xen.org> wrote:
> At 15:54 +0000 on 17 Mar (1426607644), Jan Beulich wrote:
>> Xen L4 entries being uniformly installed into any L4 table, together
>> with 64-bit PV kernels running in ring 3, meant that user mode was able
>> to see the read-only M2P presented by Xen to the guests. While
>> apparently not representing an exploitable information leak, this
>> certainly was never meant to be that way.
>> 
>> Building on the fact that these guests already have separate kernel and
>> user mode page tables we can allow guest kernels to tell Xen that they
>> don't want user mode to see this table. We can't, however, do this by
>> default: There is no ABI requirement that kernel and user mode page
>> tables be separate. Therefore introduce a new VM-assist flag allowing
>> the guest to control respective hypervisor behavior:
>> - when not set, L4 tables get created with the respective slot blank,
>>   and whenever the L4 table gets used as a kernel one the missing
>>   mapping gets inserted,
>> - when set, L4 tables get created with the respective slot initialized
>>   as before, and whenever the L4 table gets used as a user one the
>>   mapping gets zapped.
> 
> Right, I think I understand your reasoning here.  Can you put all this
> into a header or API doc somewhere?  The comment you add to xen.h is
> not nearly enough for an OS dev to figure this out.

Sure.

> Two questions (to which I suspect your answer is yes, but just for clarity):
> 
> - Is the ABI change that an always-user-mode context never gets access
>   to the RO M2P acceptable, then?  I can't think of a reason why a
>   guest would have been relying on the old behaviour, but it is still
>   a change.

No, I wouldn't view that as an ABI change, because the ABI never
specified that this range would be exposed. In fact how I coded
this was to make sure there is no ABI change at all, i.e. the
reasoning above is about why a simpler approach (assuming fully
separate kernel and user L4 tables) can't be used, as that would
result in an ABI change.

> - Are you happy that this is important enough to add extra code to the
>   context switch paths?

I'm not really happy to add stuff to the context switch path, but I
also don't think such an accidental exposure of data should be left
in place. (If we thought that there was absolutely no way an
attacker could make use of this, including making the exploit of
some other bug more harmful, then we could as well specify that
this data is being exposed to user mode, and have e.g. libxc use
it instead of the mmap() it does currently.)

Jan

