xen-devel.lists.xenproject.org archive mirror
* [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up
@ 2020-07-15  9:56 Jan Beulich
  2020-07-15  9:58 ` [PATCH 1/5] x86/shadow: dirty VRAM tracking is needed for HVM only Jan Beulich
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Jan Beulich @ 2020-07-15  9:56 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Roger Pau Monné

This series in particular takes a few small steps further towards
proper !HVM and !PV config handling (i.e. not carrying unnecessary
baggage).

1: x86/shadow: dirty VRAM tracking is needed for HVM only
2: x86/shadow: shadow_table[] needs only one entry for PV-only configs
3: x86/PV: drop a few misleading paging_mode_refcounts() checks
4: x86/shadow: have just a single instance of sh_set_toplevel_shadow()
5: x86/shadow: l3table[] and gl3e[] are HVM only

Jan



* [PATCH 1/5] x86/shadow: dirty VRAM tracking is needed for HVM only
  2020-07-15  9:56 [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up Jan Beulich
@ 2020-07-15  9:58 ` Jan Beulich
  2020-07-15  9:58 ` [PATCH 2/5] x86/shadow: shadow_table[] needs only one entry for PV-only configs Jan Beulich
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2020-07-15  9:58 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Roger Pau Monné

Move shadow_track_dirty_vram() into hvm.c (requiring two static
functions to become non-static). More importantly, make sure we don't
de-reference d->arch.hvm.dirty_vram for a non-HVM guest. So far this
was only a latent issue, because the field lives far enough into struct
hvm_domain to fall outside the part overlapping with struct pv_domain.
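
As a purely illustrative aside, here is a minimal sketch of the
aliasing hazard being guarded against (hypothetical structures, not the
actual Xen layout): when PV and HVM state share a union, an HVM-only
field may overlap PV data, so it must only be dereferenced for HVM
guests.

/* Minimal sketch of the aliasing hazard; all names are illustrative,
 * not the real Xen layout. */
#include <stdbool.h>
#include <stddef.h>

struct pv_state  { unsigned long pv_only_a, pv_only_b; };
struct hvm_state { unsigned long hvm_only_a; void *dirty_vram; };

struct arch_state {
    bool is_hvm;
    union {                     /* PV and HVM state share storage */
        struct pv_state  pv;
        struct hvm_state hvm;
    };
};

static void *get_dirty_vram(const struct arch_state *a)
{
    /* Guarding on the guest type is what the patch makes explicit;
     * without it a PV guest's pv_only_b could be misread as dirty_vram. */
    return a->is_hvm ? a->hvm.dirty_vram : NULL;
}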

While moving shadow_track_dirty_vram(), some purely typographic
adjustments are being made, like inserting missing blanks or putting
braces on their own lines.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -999,7 +999,7 @@ void shadow_prealloc(struct domain *d, u
 
 /* Deliberately free all the memory we can: this will tear down all of
  * this domain's shadows */
-static void shadow_blow_tables(struct domain *d)
+void shadow_blow_tables(struct domain *d)
 {
     struct page_info *sp, *t;
     struct vcpu *v;
@@ -2029,7 +2029,7 @@ int sh_remove_write_access(struct domain
 /* Remove all mappings of a guest frame from the shadow tables.
  * Returns non-zero if we need to flush TLBs. */
 
-static int sh_remove_all_mappings(struct domain *d, mfn_t gmfn, gfn_t gfn)
+int sh_remove_all_mappings(struct domain *d, mfn_t gmfn, gfn_t gfn)
 {
     struct page_info *page = mfn_to_page(gmfn);
 
@@ -3162,203 +3162,6 @@ static void sh_clean_dirty_bitmap(struct
 }
 
 
-/**************************************************************************/
-/* VRAM dirty tracking support */
-int shadow_track_dirty_vram(struct domain *d,
-                            unsigned long begin_pfn,
-                            unsigned long nr,
-                            XEN_GUEST_HANDLE(void) guest_dirty_bitmap)
-{
-    int rc = 0;
-    unsigned long end_pfn = begin_pfn + nr;
-    unsigned long dirty_size = (nr + 7) / 8;
-    int flush_tlb = 0;
-    unsigned long i;
-    p2m_type_t t;
-    struct sh_dirty_vram *dirty_vram;
-    struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    uint8_t *dirty_bitmap = NULL;
-
-    if ( end_pfn < begin_pfn || end_pfn > p2m->max_mapped_pfn + 1 )
-        return -EINVAL;
-
-    /* We perform p2m lookups, so lock the p2m upfront to avoid deadlock */
-    p2m_lock(p2m_get_hostp2m(d));
-    paging_lock(d);
-
-    dirty_vram = d->arch.hvm.dirty_vram;
-
-    if ( dirty_vram && (!nr ||
-             ( begin_pfn != dirty_vram->begin_pfn
-            || end_pfn   != dirty_vram->end_pfn )) )
-    {
-        /* Different tracking, tear the previous down. */
-        gdprintk(XENLOG_INFO, "stopping tracking VRAM %lx - %lx\n", dirty_vram->begin_pfn, dirty_vram->end_pfn);
-        xfree(dirty_vram->sl1ma);
-        xfree(dirty_vram->dirty_bitmap);
-        xfree(dirty_vram);
-        dirty_vram = d->arch.hvm.dirty_vram = NULL;
-    }
-
-    if ( !nr )
-        goto out;
-
-    dirty_bitmap = vzalloc(dirty_size);
-    if ( dirty_bitmap == NULL )
-    {
-        rc = -ENOMEM;
-        goto out;
-    }
-    /* This should happen seldomly (Video mode change),
-     * no need to be careful. */
-    if ( !dirty_vram )
-    {
-        /* Throw away all the shadows rather than walking through them
-         * up to nr times getting rid of mappings of each pfn */
-        shadow_blow_tables(d);
-
-        gdprintk(XENLOG_INFO, "tracking VRAM %lx - %lx\n", begin_pfn, end_pfn);
-
-        rc = -ENOMEM;
-        if ( (dirty_vram = xmalloc(struct sh_dirty_vram)) == NULL )
-            goto out;
-        dirty_vram->begin_pfn = begin_pfn;
-        dirty_vram->end_pfn = end_pfn;
-        d->arch.hvm.dirty_vram = dirty_vram;
-
-        if ( (dirty_vram->sl1ma = xmalloc_array(paddr_t, nr)) == NULL )
-            goto out_dirty_vram;
-        memset(dirty_vram->sl1ma, ~0, sizeof(paddr_t) * nr);
-
-        if ( (dirty_vram->dirty_bitmap = xzalloc_array(uint8_t, dirty_size)) == NULL )
-            goto out_sl1ma;
-
-        dirty_vram->last_dirty = NOW();
-
-        /* Tell the caller that this time we could not track dirty bits. */
-        rc = -ENODATA;
-    }
-    else if (dirty_vram->last_dirty == -1)
-        /* still completely clean, just copy our empty bitmap */
-        memcpy(dirty_bitmap, dirty_vram->dirty_bitmap, dirty_size);
-    else
-    {
-        mfn_t map_mfn = INVALID_MFN;
-        void *map_sl1p = NULL;
-
-        /* Iterate over VRAM to track dirty bits. */
-        for ( i = 0; i < nr; i++ ) {
-            mfn_t mfn = get_gfn_query_unlocked(d, begin_pfn + i, &t);
-            struct page_info *page;
-            int dirty = 0;
-            paddr_t sl1ma = dirty_vram->sl1ma[i];
-
-            if ( mfn_eq(mfn, INVALID_MFN) )
-                dirty = 1;
-            else
-            {
-                page = mfn_to_page(mfn);
-                switch (page->u.inuse.type_info & PGT_count_mask)
-                {
-                case 0:
-                    /* No guest reference, nothing to track. */
-                    break;
-                case 1:
-                    /* One guest reference. */
-                    if ( sl1ma == INVALID_PADDR )
-                    {
-                        /* We don't know which sl1e points to this, too bad. */
-                        dirty = 1;
-                        /* TODO: Heuristics for finding the single mapping of
-                         * this gmfn */
-                        flush_tlb |= sh_remove_all_mappings(d, mfn,
-                                                            _gfn(begin_pfn + i));
-                    }
-                    else
-                    {
-                        /* Hopefully the most common case: only one mapping,
-                         * whose dirty bit we can use. */
-                        l1_pgentry_t *sl1e;
-                        mfn_t sl1mfn = maddr_to_mfn(sl1ma);
-
-                        if ( !mfn_eq(sl1mfn, map_mfn) )
-                        {
-                            if ( map_sl1p )
-                                unmap_domain_page(map_sl1p);
-                            map_sl1p = map_domain_page(sl1mfn);
-                            map_mfn = sl1mfn;
-                        }
-                        sl1e = map_sl1p + (sl1ma & ~PAGE_MASK);
-
-                        if ( l1e_get_flags(*sl1e) & _PAGE_DIRTY )
-                        {
-                            dirty = 1;
-                            /* Note: this is atomic, so we may clear a
-                             * _PAGE_ACCESSED set by another processor. */
-                            l1e_remove_flags(*sl1e, _PAGE_DIRTY);
-                            flush_tlb = 1;
-                        }
-                    }
-                    break;
-                default:
-                    /* More than one guest reference,
-                     * we don't afford tracking that. */
-                    dirty = 1;
-                    break;
-                }
-            }
-
-            if ( dirty )
-            {
-                dirty_vram->dirty_bitmap[i / 8] |= 1 << (i % 8);
-                dirty_vram->last_dirty = NOW();
-            }
-        }
-
-        if ( map_sl1p )
-            unmap_domain_page(map_sl1p);
-
-        memcpy(dirty_bitmap, dirty_vram->dirty_bitmap, dirty_size);
-        memset(dirty_vram->dirty_bitmap, 0, dirty_size);
-        if ( dirty_vram->last_dirty + SECONDS(2) < NOW() )
-        {
-            /* was clean for more than two seconds, try to disable guest
-             * write access */
-            for ( i = begin_pfn; i < end_pfn; i++ )
-            {
-                mfn_t mfn = get_gfn_query_unlocked(d, i, &t);
-                if ( !mfn_eq(mfn, INVALID_MFN) )
-                    flush_tlb |= sh_remove_write_access(d, mfn, 1, 0);
-            }
-            dirty_vram->last_dirty = -1;
-        }
-    }
-    if ( flush_tlb )
-        guest_flush_tlb_mask(d, d->dirty_cpumask);
-    goto out;
-
-out_sl1ma:
-    xfree(dirty_vram->sl1ma);
-out_dirty_vram:
-    xfree(dirty_vram);
-    dirty_vram = d->arch.hvm.dirty_vram = NULL;
-
-out:
-    paging_unlock(d);
-    if ( rc == 0 && dirty_bitmap != NULL &&
-         copy_to_guest(guest_dirty_bitmap, dirty_bitmap, dirty_size) )
-    {
-        paging_lock(d);
-        for ( i = 0; i < dirty_size; i++ )
-            dirty_vram->dirty_bitmap[i] |= dirty_bitmap[i];
-        paging_unlock(d);
-        rc = -EFAULT;
-    }
-    vfree(dirty_bitmap);
-    p2m_unlock(p2m_get_hostp2m(d));
-    return rc;
-}
-
 /* Flush TLB of selected vCPUs. */
 bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
                       void *ctxt)
--- a/xen/arch/x86/mm/shadow/hvm.c
+++ b/xen/arch/x86/mm/shadow/hvm.c
@@ -691,6 +691,218 @@ static void sh_emulate_unmap_dest(struct
     atomic_inc(&v->domain->arch.paging.shadow.gtable_dirty_version);
 }
 
+/**************************************************************************/
+/* VRAM dirty tracking support */
+int shadow_track_dirty_vram(struct domain *d,
+                            unsigned long begin_pfn,
+                            unsigned long nr,
+                            XEN_GUEST_HANDLE(void) guest_dirty_bitmap)
+{
+    int rc = 0;
+    unsigned long end_pfn = begin_pfn + nr;
+    unsigned long dirty_size = (nr + 7) / 8;
+    int flush_tlb = 0;
+    unsigned long i;
+    p2m_type_t t;
+    struct sh_dirty_vram *dirty_vram;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    uint8_t *dirty_bitmap = NULL;
+
+    if ( end_pfn < begin_pfn || end_pfn > p2m->max_mapped_pfn + 1 )
+        return -EINVAL;
+
+    /* We perform p2m lookups, so lock the p2m upfront to avoid deadlock */
+    p2m_lock(p2m_get_hostp2m(d));
+    paging_lock(d);
+
+    dirty_vram = d->arch.hvm.dirty_vram;
+
+    if ( dirty_vram && (!nr ||
+             ( begin_pfn != dirty_vram->begin_pfn
+            || end_pfn   != dirty_vram->end_pfn )) )
+    {
+        /* Different tracking, tear the previous down. */
+        gdprintk(XENLOG_INFO, "stopping tracking VRAM %lx - %lx\n", dirty_vram->begin_pfn, dirty_vram->end_pfn);
+        xfree(dirty_vram->sl1ma);
+        xfree(dirty_vram->dirty_bitmap);
+        xfree(dirty_vram);
+        dirty_vram = d->arch.hvm.dirty_vram = NULL;
+    }
+
+    if ( !nr )
+        goto out;
+
+    dirty_bitmap = vzalloc(dirty_size);
+    if ( dirty_bitmap == NULL )
+    {
+        rc = -ENOMEM;
+        goto out;
+    }
+    /*
+     * This should happen seldomly (Video mode change),
+     * no need to be careful.
+     */
+    if ( !dirty_vram )
+    {
+        /*
+         * Throw away all the shadows rather than walking through them
+         * up to nr times getting rid of mappings of each pfn.
+         */
+        shadow_blow_tables(d);
+
+        gdprintk(XENLOG_INFO, "tracking VRAM %lx - %lx\n", begin_pfn, end_pfn);
+
+        rc = -ENOMEM;
+        if ( (dirty_vram = xmalloc(struct sh_dirty_vram)) == NULL )
+            goto out;
+        dirty_vram->begin_pfn = begin_pfn;
+        dirty_vram->end_pfn = end_pfn;
+        d->arch.hvm.dirty_vram = dirty_vram;
+
+        if ( (dirty_vram->sl1ma = xmalloc_array(paddr_t, nr)) == NULL )
+            goto out_dirty_vram;
+        memset(dirty_vram->sl1ma, ~0, sizeof(paddr_t) * nr);
+
+        if ( (dirty_vram->dirty_bitmap = xzalloc_array(uint8_t, dirty_size)) == NULL )
+            goto out_sl1ma;
+
+        dirty_vram->last_dirty = NOW();
+
+        /* Tell the caller that this time we could not track dirty bits. */
+        rc = -ENODATA;
+    }
+    else if ( dirty_vram->last_dirty == -1 )
+        /* still completely clean, just copy our empty bitmap */
+        memcpy(dirty_bitmap, dirty_vram->dirty_bitmap, dirty_size);
+    else
+    {
+        mfn_t map_mfn = INVALID_MFN;
+        void *map_sl1p = NULL;
+
+        /* Iterate over VRAM to track dirty bits. */
+        for ( i = 0; i < nr; i++ )
+        {
+            mfn_t mfn = get_gfn_query_unlocked(d, begin_pfn + i, &t);
+            struct page_info *page;
+            int dirty = 0;
+            paddr_t sl1ma = dirty_vram->sl1ma[i];
+
+            if ( mfn_eq(mfn, INVALID_MFN) )
+                dirty = 1;
+            else
+            {
+                page = mfn_to_page(mfn);
+                switch ( page->u.inuse.type_info & PGT_count_mask )
+                {
+                case 0:
+                    /* No guest reference, nothing to track. */
+                    break;
+
+                case 1:
+                    /* One guest reference. */
+                    if ( sl1ma == INVALID_PADDR )
+                    {
+                        /* We don't know which sl1e points to this, too bad. */
+                        dirty = 1;
+                        /*
+                         * TODO: Heuristics for finding the single mapping of
+                         * this gmfn
+                         */
+                        flush_tlb |= sh_remove_all_mappings(d, mfn,
+                                                            _gfn(begin_pfn + i));
+                    }
+                    else
+                    {
+                        /*
+                         * Hopefully the most common case: only one mapping,
+                         * whose dirty bit we can use.
+                         */
+                        l1_pgentry_t *sl1e;
+                        mfn_t sl1mfn = maddr_to_mfn(sl1ma);
+
+                        if ( !mfn_eq(sl1mfn, map_mfn) )
+                        {
+                            if ( map_sl1p )
+                                unmap_domain_page(map_sl1p);
+                            map_sl1p = map_domain_page(sl1mfn);
+                            map_mfn = sl1mfn;
+                        }
+                        sl1e = map_sl1p + (sl1ma & ~PAGE_MASK);
+
+                        if ( l1e_get_flags(*sl1e) & _PAGE_DIRTY )
+                        {
+                            dirty = 1;
+                            /*
+                             * Note: this is atomic, so we may clear a
+                             * _PAGE_ACCESSED set by another processor.
+                             */
+                            l1e_remove_flags(*sl1e, _PAGE_DIRTY);
+                            flush_tlb = 1;
+                        }
+                    }
+                    break;
+
+                default:
+                    /* More than one guest reference,
+                     * we don't afford tracking that. */
+                    dirty = 1;
+                    break;
+                }
+            }
+
+            if ( dirty )
+            {
+                dirty_vram->dirty_bitmap[i / 8] |= 1 << (i % 8);
+                dirty_vram->last_dirty = NOW();
+            }
+        }
+
+        if ( map_sl1p )
+            unmap_domain_page(map_sl1p);
+
+        memcpy(dirty_bitmap, dirty_vram->dirty_bitmap, dirty_size);
+        memset(dirty_vram->dirty_bitmap, 0, dirty_size);
+        if ( dirty_vram->last_dirty + SECONDS(2) < NOW() )
+        {
+            /*
+             * Was clean for more than two seconds, try to disable guest
+             * write access.
+             */
+            for ( i = begin_pfn; i < end_pfn; i++ )
+            {
+                mfn_t mfn = get_gfn_query_unlocked(d, i, &t);
+                if ( !mfn_eq(mfn, INVALID_MFN) )
+                    flush_tlb |= sh_remove_write_access(d, mfn, 1, 0);
+            }
+            dirty_vram->last_dirty = -1;
+        }
+    }
+    if ( flush_tlb )
+        guest_flush_tlb_mask(d, d->dirty_cpumask);
+    goto out;
+
+ out_sl1ma:
+    xfree(dirty_vram->sl1ma);
+ out_dirty_vram:
+    xfree(dirty_vram);
+    dirty_vram = d->arch.hvm.dirty_vram = NULL;
+
+ out:
+    paging_unlock(d);
+    if ( rc == 0 && dirty_bitmap != NULL &&
+         copy_to_guest(guest_dirty_bitmap, dirty_bitmap, dirty_size) )
+    {
+        paging_lock(d);
+        for ( i = 0; i < dirty_size; i++ )
+            dirty_vram->dirty_bitmap[i] |= dirty_bitmap[i];
+        paging_unlock(d);
+        rc = -EFAULT;
+    }
+    vfree(dirty_bitmap);
+    p2m_unlock(p2m_get_hostp2m(d));
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -494,7 +494,6 @@ _sh_propagate(struct vcpu *v,
     guest_l1e_t guest_entry = { guest_intpte };
     shadow_l1e_t *sp = shadow_entry_ptr;
     struct domain *d = v->domain;
-    struct sh_dirty_vram *dirty_vram = d->arch.hvm.dirty_vram;
     gfn_t target_gfn = guest_l1e_get_gfn(guest_entry);
     u32 pass_thru_flags;
     u32 gflags, sflags;
@@ -649,15 +648,19 @@ _sh_propagate(struct vcpu *v,
         }
     }
 
-    if ( unlikely((level == 1) && dirty_vram
-            && dirty_vram->last_dirty == -1
-            && gfn_x(target_gfn) >= dirty_vram->begin_pfn
-            && gfn_x(target_gfn) < dirty_vram->end_pfn) )
+    if ( unlikely(level == 1) && is_hvm_domain(d) )
     {
-        if ( ft & FETCH_TYPE_WRITE )
-            dirty_vram->last_dirty = NOW();
-        else
-            sflags &= ~_PAGE_RW;
+        struct sh_dirty_vram *dirty_vram = d->arch.hvm.dirty_vram;
+
+        if ( dirty_vram && dirty_vram->last_dirty == -1 &&
+             gfn_x(target_gfn) >= dirty_vram->begin_pfn &&
+             gfn_x(target_gfn) < dirty_vram->end_pfn )
+        {
+            if ( ft & FETCH_TYPE_WRITE )
+                dirty_vram->last_dirty = NOW();
+            else
+                sflags &= ~_PAGE_RW;
+        }
     }
 
     /* Read-only memory */
@@ -1082,7 +1085,7 @@ static inline void shadow_vram_get_l1e(s
     unsigned long gfn;
     struct sh_dirty_vram *dirty_vram = d->arch.hvm.dirty_vram;
 
-    if ( !dirty_vram         /* tracking disabled? */
+    if ( !is_hvm_domain(d) || !dirty_vram /* tracking disabled? */
          || !(flags & _PAGE_RW) /* read-only mapping? */
          || !mfn_valid(mfn) )   /* mfn can be invalid in mmio_direct */
         return;
@@ -1113,7 +1116,7 @@ static inline void shadow_vram_put_l1e(s
     unsigned long gfn;
     struct sh_dirty_vram *dirty_vram = d->arch.hvm.dirty_vram;
 
-    if ( !dirty_vram         /* tracking disabled? */
+    if ( !is_hvm_domain(d) || !dirty_vram /* tracking disabled? */
          || !(flags & _PAGE_RW) /* read-only mapping? */
          || !mfn_valid(mfn) )   /* mfn can be invalid in mmio_direct */
         return;
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -438,6 +438,14 @@ mfn_t oos_snapshot_lookup(struct domain
 
 #endif /* (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC) */
 
+/* Deliberately free all the memory we can: tear down all of d's shadows. */
+void shadow_blow_tables(struct domain *d);
+
+/*
+ * Remove all mappings of a guest frame from the shadow tables.
+ * Returns non-zero if we need to flush TLBs.
+ */
+int sh_remove_all_mappings(struct domain *d, mfn_t gmfn, gfn_t gfn);
 
 /* Reset the up-pointers of every L3 shadow to 0.
  * This is called when l3 shadows stop being pinnable, to clear out all




* [PATCH 2/5] x86/shadow: shadow_table[] needs only one entry for PV-only configs
  2020-07-15  9:56 [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up Jan Beulich
  2020-07-15  9:58 ` [PATCH 1/5] x86/shadow: dirty VRAM tracking is needed for HVM only Jan Beulich
@ 2020-07-15  9:58 ` Jan Beulich
  2020-07-15  9:59 ` [PATCH 3/5] x86/PV: drop a few misleading paging_mode_refcounts() checks Jan Beulich
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2020-07-15  9:58 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Roger Pau Monné

Furthermore, the field isn't needed at all with shadow support disabled -
move it into struct shadow_vcpu.

Introduce for_each_shadow_table(), shortening loops for the 4-level case
at the same time.

Adjust loop variables and a function parameter to be "unsigned int"
where applicable at the same time. Also move a comment that ended up
misplaced due to incremental additions.
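
As a rough, self-contained sketch of the pattern (hypothetical struct
and macro merely mirroring the ones in this patch, not the real
definitions): sizing shadow_table[] by configuration and iterating via
ARRAY_SIZE() lets the same loop body serve both the single-entry
PV-only build and the four-entry HVM build.

#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct shadow_vcpu_sketch {
#ifdef CONFIG_HVM
    void *shadow_table[4];      /* PAE guests need four top-level shadows */
#else
    void *shadow_table[1];      /* PV guests always have exactly one */
#endif
};

#define for_each_shadow_table(s, i) \
    for ( (i) = 0; (i) < ARRAY_SIZE((s)->shadow_table); ++(i) )

static void clear_all(struct shadow_vcpu_sketch *s)
{
    unsigned int i;

    for_each_shadow_table(s, i)
        s->shadow_table[i] = NULL;  /* stand-in for the real teardown */
}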

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -959,13 +959,15 @@ static void _shadow_prealloc(struct doma
     perfc_incr(shadow_prealloc_2);
 
     for_each_vcpu(d, v)
-        for ( i = 0 ; i < 4 ; i++ )
+        for ( i = 0; i < ARRAY_SIZE(v->arch.paging.shadow.shadow_table); i++ )
         {
-            if ( !pagetable_is_null(v->arch.shadow_table[i]) )
+            if ( !pagetable_is_null(v->arch.paging.shadow.shadow_table[i]) )
             {
                 TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_PREALLOC_UNHOOK);
-                shadow_unhook_mappings(d,
-                               pagetable_get_mfn(v->arch.shadow_table[i]), 0);
+                shadow_unhook_mappings(
+                    d,
+                    pagetable_get_mfn(v->arch.paging.shadow.shadow_table[i]),
+                    0);
 
                 /* See if that freed up enough space */
                 if ( d->arch.paging.shadow.free_pages >= pages )
@@ -1018,10 +1020,12 @@ void shadow_blow_tables(struct domain *d
 
     /* Second pass: unhook entries of in-use shadows */
     for_each_vcpu(d, v)
-        for ( i = 0 ; i < 4 ; i++ )
-            if ( !pagetable_is_null(v->arch.shadow_table[i]) )
-                shadow_unhook_mappings(d,
-                               pagetable_get_mfn(v->arch.shadow_table[i]), 0);
+        for ( i = 0; i < ARRAY_SIZE(v->arch.paging.shadow.shadow_table); i++ )
+            if ( !pagetable_is_null(v->arch.paging.shadow.shadow_table[i]) )
+                shadow_unhook_mappings(
+                    d,
+                    pagetable_get_mfn(v->arch.paging.shadow.shadow_table[i]),
+                    0);
 
     /* Make sure everyone sees the unshadowings */
     guest_flush_tlb_mask(d, d->dirty_cpumask);
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -85,6 +85,15 @@ const char *const fetch_type_names[] = {
 };
 #endif
 
+#if SHADOW_PAGING_LEVELS == 3
+# define for_each_shadow_table(v, i) \
+    for ( (i) = 0; \
+          (i) < ARRAY_SIZE((v)->arch.paging.shadow.shadow_table); \
+          ++(i) )
+#else
+# define for_each_shadow_table(v, i) for ( (i) = 0; (i) < 1; ++(i) )
+#endif
+
 /* Helper to perform a local TLB flush. */
 static void sh_flush_local(const struct domain *d)
 {
@@ -1624,7 +1633,7 @@ static shadow_l4e_t * shadow_get_and_cre
                                                 mfn_t *sl4mfn)
 {
     /* There is always a shadow of the top level table.  Get it. */
-    *sl4mfn = pagetable_get_mfn(v->arch.shadow_table[0]);
+    *sl4mfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
     /* Reading the top level table is always valid. */
     return sh_linear_l4_table(v) + shadow_l4_linear_offset(gw->va);
 }
@@ -1740,7 +1749,7 @@ static shadow_l2e_t * shadow_get_and_cre
     return sh_linear_l2_table(v) + shadow_l2_linear_offset(gw->va);
 #else /* 32bit... */
     /* There is always a shadow of the top level table.  Get it. */
-    *sl2mfn = pagetable_get_mfn(v->arch.shadow_table[0]);
+    *sl2mfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
     /* This next line is important: the guest l2 has a 16k
      * shadow, we need to return the right mfn of the four. This
      * call will set it for us as a side-effect. */
@@ -2333,6 +2342,7 @@ int sh_safe_not_to_sync(struct vcpu *v,
     struct domain *d = v->domain;
     struct page_info *sp;
     mfn_t smfn;
+    unsigned int i;
 
     if ( !sh_type_has_up_pointer(d, SH_type_l1_shadow) )
         return 0;
@@ -2365,14 +2375,10 @@ int sh_safe_not_to_sync(struct vcpu *v,
     ASSERT(mfn_valid(smfn));
 #endif
 
-    if ( pagetable_get_pfn(v->arch.shadow_table[0]) == mfn_x(smfn)
-#if (SHADOW_PAGING_LEVELS == 3)
-         || pagetable_get_pfn(v->arch.shadow_table[1]) == mfn_x(smfn)
-         || pagetable_get_pfn(v->arch.shadow_table[2]) == mfn_x(smfn)
-         || pagetable_get_pfn(v->arch.shadow_table[3]) == mfn_x(smfn)
-#endif
-        )
-        return 0;
+    for_each_shadow_table(v, i)
+        if ( pagetable_get_pfn(v->arch.paging.shadow.shadow_table[i]) ==
+             mfn_x(smfn) )
+            return 0;
 
     /* Only in use in one toplevel shadow, and it's not the one we're
      * running on */
@@ -3287,10 +3293,12 @@ static int sh_page_fault(struct vcpu *v,
         for_each_vcpu(d, tmp)
         {
 #if GUEST_PAGING_LEVELS == 3
-            int i;
-            for ( i = 0; i < 4; i++ )
+            unsigned int i;
+
+            for_each_shadow_table(v, i)
             {
-                mfn_t smfn = pagetable_get_mfn(v->arch.shadow_table[i]);
+                mfn_t smfn = pagetable_get_mfn(
+                                 v->arch.paging.shadow.shadow_table[i]);
 
                 if ( mfn_valid(smfn) && (mfn_x(smfn) != 0) )
                 {
@@ -3707,7 +3715,7 @@ sh_update_linear_entries(struct vcpu *v)
      *
      * Because HVM guests run on the same monitor tables regardless of the
      * shadow tables in use, the linear mapping of the shadow tables has to
-     * be updated every time v->arch.shadow_table changes.
+     * be updated every time v->arch.paging.shadow.shadow_table changes.
      */
 
     /* Don't try to update the monitor table if it doesn't exist */
@@ -3723,8 +3731,9 @@ sh_update_linear_entries(struct vcpu *v)
     if ( v == current )
     {
         __linear_l4_table[l4_linear_offset(SH_LINEAR_PT_VIRT_START)] =
-            l4e_from_pfn(pagetable_get_pfn(v->arch.shadow_table[0]),
-                         __PAGE_HYPERVISOR_RW);
+            l4e_from_pfn(
+                pagetable_get_pfn(v->arch.paging.shadow.shadow_table[0]),
+                __PAGE_HYPERVISOR_RW);
     }
     else
     {
@@ -3732,8 +3741,9 @@ sh_update_linear_entries(struct vcpu *v)
 
         ml4e = map_domain_page(pagetable_get_mfn(v->arch.hvm.monitor_table));
         ml4e[l4_table_offset(SH_LINEAR_PT_VIRT_START)] =
-            l4e_from_pfn(pagetable_get_pfn(v->arch.shadow_table[0]),
-                         __PAGE_HYPERVISOR_RW);
+            l4e_from_pfn(
+                pagetable_get_pfn(v->arch.paging.shadow.shadow_table[0]),
+                __PAGE_HYPERVISOR_RW);
         unmap_domain_page(ml4e);
     }
 
@@ -3812,7 +3822,7 @@ sh_update_linear_entries(struct vcpu *v)
 
 
 /*
- * Removes vcpu->arch.shadow_table[].
+ * Removes v->arch.paging.shadow.shadow_table[].
  * Does all appropriate management/bookkeeping/refcounting/etc...
  */
 static void
@@ -3820,38 +3830,34 @@ sh_detach_old_tables(struct vcpu *v)
 {
     struct domain *d = v->domain;
     mfn_t smfn;
-    int i = 0;
+    unsigned int i;
 
     ////
-    //// vcpu->arch.shadow_table[]
+    //// vcpu->arch.paging.shadow.shadow_table[]
     ////
 
-#if GUEST_PAGING_LEVELS == 3
-    /* PAE guests have four shadow_table entries */
-    for ( i = 0 ; i < 4 ; i++ )
-#endif
+    for_each_shadow_table(v, i)
     {
-        smfn = pagetable_get_mfn(v->arch.shadow_table[i]);
+        smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[i]);
         if ( mfn_x(smfn) )
             sh_put_ref(d, smfn, 0);
-        v->arch.shadow_table[i] = pagetable_null();
+        v->arch.paging.shadow.shadow_table[i] = pagetable_null();
     }
 }
 
 /* Set up the top-level shadow and install it in slot 'slot' of shadow_table */
 static void
 sh_set_toplevel_shadow(struct vcpu *v,
-                       int slot,
+                       unsigned int slot,
                        mfn_t gmfn,
                        unsigned int root_type)
 {
     mfn_t smfn;
     pagetable_t old_entry, new_entry;
-
     struct domain *d = v->domain;
 
     /* Remember the old contents of this slot */
-    old_entry = v->arch.shadow_table[slot];
+    old_entry = v->arch.paging.shadow.shadow_table[slot];
 
     /* Now figure out the new contents: is this a valid guest MFN? */
     if ( !mfn_valid(gmfn) )
@@ -3893,7 +3899,7 @@ sh_set_toplevel_shadow(struct vcpu *v,
     SHADOW_PRINTK("%u/%u [%u] gmfn %#"PRI_mfn" smfn %#"PRI_mfn"\n",
                   GUEST_PAGING_LEVELS, SHADOW_PAGING_LEVELS, slot,
                   mfn_x(gmfn), mfn_x(pagetable_get_mfn(new_entry)));
-    v->arch.shadow_table[slot] = new_entry;
+    v->arch.paging.shadow.shadow_table[slot] = new_entry;
 
     /* Decrement the refcount of the old contents of this slot */
     if ( !pagetable_is_null(old_entry) ) {
@@ -3999,7 +4005,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
 
 
     ////
-    //// vcpu->arch.shadow_table[]
+    //// vcpu->arch.paging.shadow.shadow_table[]
     ////
 
     /* We revoke write access to the new guest toplevel page(s) before we
@@ -4056,7 +4062,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
     sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow);
     if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
     {
-        mfn_t smfn = pagetable_get_mfn(v->arch.shadow_table[0]);
+        mfn_t smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
 
         if ( !(v->arch.flags & TF_kernel_mode) && VM_ASSIST(d, m2p_strict) )
             zap_ro_mpt(smfn);
@@ -4074,9 +4080,10 @@ sh_update_cr3(struct vcpu *v, int do_loc
     ///
 #if SHADOW_PAGING_LEVELS == 3
         {
-            mfn_t smfn = pagetable_get_mfn(v->arch.shadow_table[0]);
-            int i;
-            for ( i = 0; i < 4; i++ )
+            mfn_t smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
+            unsigned int i;
+
+            for_each_shadow_table(v, i)
             {
 #if GUEST_PAGING_LEVELS == 2
                 /* 2-on-3: make a PAE l3 that points at the four-page l2 */
@@ -4084,7 +4091,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
                     smfn = sh_next_page(smfn);
 #else
                 /* 3-on-3: make a PAE l3 that points at the four l2 pages */
-                smfn = pagetable_get_mfn(v->arch.shadow_table[i]);
+                smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[i]);
 #endif
                 v->arch.paging.shadow.l3table[i] =
                     (mfn_x(smfn) == 0)
@@ -4108,7 +4115,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
         /* We don't support PV except guest == shadow == config levels */
         BUILD_BUG_ON(GUEST_PAGING_LEVELS != SHADOW_PAGING_LEVELS);
         /* Just use the shadow top-level directly */
-        make_cr3(v, pagetable_get_mfn(v->arch.shadow_table[0]));
+        make_cr3(v, pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]));
     }
 #endif
 
@@ -4124,7 +4131,8 @@ sh_update_cr3(struct vcpu *v, int do_loc
         v->arch.hvm.hw_cr[3] = virt_to_maddr(&v->arch.paging.shadow.l3table);
 #else
         /* 4-on-4: Just use the shadow top-level directly */
-        v->arch.hvm.hw_cr[3] = pagetable_get_paddr(v->arch.shadow_table[0]);
+        v->arch.hvm.hw_cr[3] =
+            pagetable_get_paddr(v->arch.paging.shadow.shadow_table[0]);
 #endif
         hvm_update_guest_cr3(v, noflush);
     }
@@ -4443,7 +4451,7 @@ static void sh_pagetable_dying(paddr_t g
 {
     struct vcpu *v = current;
     struct domain *d = v->domain;
-    int i = 0;
+    unsigned int i;
     int flush = 0;
     int fast_path = 0;
     paddr_t gcr3 = 0;
@@ -4474,15 +4482,16 @@ static void sh_pagetable_dying(paddr_t g
         gl3pa = map_domain_page(l3mfn);
         gl3e = (guest_l3e_t *)(gl3pa + ((unsigned long)gpa & ~PAGE_MASK));
     }
-    for ( i = 0; i < 4; i++ )
+    for_each_shadow_table(v, i)
     {
         mfn_t smfn, gmfn;
 
-        if ( fast_path ) {
-            if ( pagetable_is_null(v->arch.shadow_table[i]) )
+        if ( fast_path )
+        {
+            if ( pagetable_is_null(v->arch.paging.shadow.shadow_table[i]) )
                 smfn = INVALID_MFN;
             else
-                smfn = pagetable_get_mfn(v->arch.shadow_table[i]);
+                smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[i]);
         }
         else
         {
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -135,6 +135,14 @@ struct shadow_vcpu {
     l3_pgentry_t l3table[4] __attribute__((__aligned__(32)));
     /* PAE guests: per-vcpu cache of the top-level *guest* entries */
     l3_pgentry_t gl3e[4] __attribute__((__aligned__(32)));
+
+    /* shadow(s) of guest (MFN) */
+#ifdef CONFIG_HVM
+    pagetable_t shadow_table[4];
+#else
+    pagetable_t shadow_table[1];
+#endif
+
     /* Last MFN that we emulated a write to as unshadow heuristics. */
     unsigned long last_emulated_mfn_for_unshadow;
     /* MFN of the last shadow that we shot a writeable mapping in */
@@ -576,6 +584,10 @@ struct arch_vcpu
         struct hvm_vcpu hvm;
     };
 
+    /*
+     * guest_table{,_user} hold a ref to the page, and also a type-count
+     * unless shadow refcounts are in use
+     */
     pagetable_t guest_table_user;       /* (MFN) x86/64 user-space pagetable */
     pagetable_t guest_table;            /* (MFN) guest notion of cr3 */
     struct page_info *old_guest_table;  /* partially destructed pagetable */
@@ -583,9 +595,7 @@ struct arch_vcpu
                                         /* former, if any */
     bool old_guest_table_partial;       /* Are we dropping a type ref, or just
                                          * finishing up a partial de-validation? */
-    /* guest_table holds a ref to the page, and also a type-count unless
-     * shadow refcounts are in use */
-    pagetable_t shadow_table[4];        /* (MFN) shadow(s) of guest */
+
     unsigned long cr3;                  /* (MA) value to install in HW CR3 */
 
     /*




* [PATCH 3/5] x86/PV: drop a few misleading paging_mode_refcounts() checks
  2020-07-15  9:56 [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up Jan Beulich
  2020-07-15  9:58 ` [PATCH 1/5] x86/shadow: dirty VRAM tracking is needed for HVM only Jan Beulich
  2020-07-15  9:58 ` [PATCH 2/5] x86/shadow: shadow_table[] needs only one entry for PV-only configs Jan Beulich
@ 2020-07-15  9:59 ` Jan Beulich
  2020-07-31 14:58   ` Ping: " Jan Beulich
  2020-07-15  9:59 ` [PATCH 4/5] x86/shadow: have just a single instance of sh_set_toplevel_shadow() Jan Beulich
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2020-07-15  9:59 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Roger Pau Monné

The filling and cleanup of v->arch.guest_table in new_guest_cr3() was
apparently inconsistent so far: a type ref was acquired unconditionally
for the new top-level page table, but the old type ref was dropped only
when !paging_mode_refcounts(). Mirror this adjustment in
arch_set_info_guest() as well.

Also move new_guest_cr3()'s #ifdef to around the function - both callers
now get built only when CONFIG_PV, i.e. no need to retain a stub.
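
To make the intended symmetry concrete, here is a toy model
(take_type_ref()/drop_type_ref() are hypothetical stand-ins, not Xen
functions): the reference taken for the new top-level table is acquired
unconditionally, so the matching release has to be unconditional too,
rather than hidden behind a paging_mode_refcounts() check.

#include <stdbool.h>

struct toy_page { int type_count; };

static bool take_type_ref(struct toy_page *pg) { pg->type_count++; return true; }
static void drop_type_ref(struct toy_page *pg) { pg->type_count--; }

static bool switch_top_level(struct toy_page *new_pg, struct toy_page *old_pg)
{
    if ( !take_type_ref(new_pg) )   /* acquire is unconditional ... */
        return false;

    if ( old_pg )
        drop_type_ref(old_pg);      /* ... so the release must be too */

    return true;
}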

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1122,8 +1122,6 @@ int arch_set_info_guest(
 
     if ( !cr3_page )
         rc = -EINVAL;
-    else if ( paging_mode_refcounts(d) )
-        /* nothing */;
     else if ( cr3_page == v->arch.old_guest_table )
     {
         v->arch.old_guest_table = NULL;
@@ -1144,8 +1142,7 @@ int arch_set_info_guest(
         case -ERESTART:
             break;
         case 0:
-            if ( !compat && !VM_ASSIST(d, m2p_strict) &&
-                 !paging_mode_refcounts(d) )
+            if ( !compat && !VM_ASSIST(d, m2p_strict) )
                 fill_ro_mpt(cr3_mfn);
             break;
         default:
@@ -1166,7 +1163,7 @@ int arch_set_info_guest(
 
             if ( !cr3_page )
                 rc = -EINVAL;
-            else if ( !paging_mode_refcounts(d) )
+            else
             {
                 rc = get_page_type_preemptible(cr3_page, PGT_root_page_table);
                 switch ( rc )
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3149,9 +3149,9 @@ int vcpu_destroy_pagetables(struct vcpu
     return rc;
 }
 
+#ifdef CONFIG_PV
 int new_guest_cr3(mfn_t mfn)
 {
-#ifdef CONFIG_PV
     struct vcpu *curr = current;
     struct domain *d = curr->domain;
     int rc;
@@ -3220,7 +3220,7 @@ int new_guest_cr3(mfn_t mfn)
 
     pv_destroy_ldt(curr); /* Unconditional TLB flush later. */
 
-    if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
+    if ( !VM_ASSIST(d, m2p_strict) )
         fill_ro_mpt(mfn);
     curr->arch.guest_table = pagetable_from_mfn(mfn);
     update_cr3(curr);
@@ -3231,30 +3231,24 @@ int new_guest_cr3(mfn_t mfn)
     {
         struct page_info *page = mfn_to_page(old_base_mfn);
 
-        if ( paging_mode_refcounts(d) )
-            put_page(page);
-        else
-            switch ( rc = put_page_and_type_preemptible(page) )
-            {
-            case -EINTR:
-            case -ERESTART:
-                curr->arch.old_guest_ptpg = NULL;
-                curr->arch.old_guest_table = page;
-                curr->arch.old_guest_table_partial = (rc == -ERESTART);
-                rc = -ERESTART;
-                break;
-            default:
-                BUG_ON(rc);
-                break;
-            }
+        switch ( rc = put_page_and_type_preemptible(page) )
+        {
+        case -EINTR:
+        case -ERESTART:
+            curr->arch.old_guest_ptpg = NULL;
+            curr->arch.old_guest_table = page;
+            curr->arch.old_guest_table_partial = (rc == -ERESTART);
+            rc = -ERESTART;
+            break;
+        default:
+            BUG_ON(rc);
+            break;
+        }
     }
 
     return rc;
-#else
-    ASSERT_UNREACHABLE();
-    return -EINVAL;
-#endif
 }
+#endif
 
 #ifdef CONFIG_PV
 static int vcpumask_to_pcpumask(




* [PATCH 4/5] x86/shadow: have just a single instance of sh_set_toplevel_shadow()
  2020-07-15  9:56 [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up Jan Beulich
                   ` (2 preceding siblings ...)
  2020-07-15  9:59 ` [PATCH 3/5] x86/PV: drop a few misleading paging_mode_refcounts() checks Jan Beulich
@ 2020-07-15  9:59 ` Jan Beulich
  2020-07-15 10:00 ` [PATCH 5/5] x86/shadow: l3table[] and gl3e[] are HVM only Jan Beulich
  2020-07-18 18:21 ` [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up Tim Deegan
  5 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2020-07-15  9:59 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Roger Pau Monné

The only guest/shadow level dependent piece here is the call to
sh_make_shadow(). Make a pointer to the respective function an
argument of sh_set_toplevel_shadow(), allowing sh_set_toplevel_shadow()
to be moved to common.c.

This implies making get_shadow_status() available to common.c; its set
and delete counterparts are moved along with it.
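
A condensed sketch of the parameterization (hypothetical names and
types, only loosely modelled on the real code): the level-specific
constructor is supplied by each caller, so the remaining slot handling
can be compiled just once in common.c.

typedef unsigned long mfn_sketch_t;

struct vcpu_sketch { mfn_sketch_t shadow_table[4]; };

typedef mfn_sketch_t (*make_shadow_fn)(struct vcpu_sketch *v,
                                       mfn_sketch_t gmfn,
                                       unsigned int shadow_type);

static void set_toplevel_sketch(struct vcpu_sketch *v, unsigned int slot,
                                mfn_sketch_t gmfn, unsigned int root_type,
                                make_shadow_fn make_shadow)
{
    /* The only level-dependent step is delegated to the callback... */
    mfn_sketch_t smfn = make_shadow(v, gmfn, root_type);

    /* ...while the bookkeeping below stays generic. */
    v->shadow_table[slot] = smfn;
}

/* Each multi.c instantiation would pass its own constructor here. */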

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Besides reducing compiled code size, this also avoids the function
becoming unused in !HVM builds (in two out of the three objects built)
in a subsequent change.

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -2560,6 +2560,80 @@ void shadow_update_paging_modes(struct v
     paging_unlock(v->domain);
 }
 
+/* Set up the top-level shadow and install it in slot 'slot' of shadow_table */
+void sh_set_toplevel_shadow(struct vcpu *v,
+                            unsigned int slot,
+                            mfn_t gmfn,
+                            unsigned int root_type,
+                            mfn_t (*make_shadow)(struct vcpu *v,
+                                                 mfn_t gmfn,
+                                                 uint32_t shadow_type))
+{
+    mfn_t smfn;
+    pagetable_t old_entry, new_entry;
+    struct domain *d = v->domain;
+
+    /* Remember the old contents of this slot */
+    old_entry = v->arch.paging.shadow.shadow_table[slot];
+
+    /* Now figure out the new contents: is this a valid guest MFN? */
+    if ( !mfn_valid(gmfn) )
+    {
+        new_entry = pagetable_null();
+        goto install_new_entry;
+    }
+
+    /* Guest mfn is valid: shadow it and install the shadow */
+    smfn = get_shadow_status(d, gmfn, root_type);
+    if ( !mfn_valid(smfn) )
+    {
+        /* Make sure there's enough free shadow memory. */
+        shadow_prealloc(d, root_type, 1);
+        /* Shadow the page. */
+        smfn = make_shadow(v, gmfn, root_type);
+    }
+    ASSERT(mfn_valid(smfn));
+
+    /* Take a ref to this page: it will be released in sh_detach_old_tables()
+     * or the next call to set_toplevel_shadow() */
+    if ( sh_get_ref(d, smfn, 0) )
+    {
+        /* Pin the shadow and put it (back) on the list of pinned shadows */
+        sh_pin(d, smfn);
+
+        new_entry = pagetable_from_mfn(smfn);
+    }
+    else
+    {
+        printk(XENLOG_G_ERR "can't install %"PRI_mfn" as toplevel shadow\n",
+               mfn_x(smfn));
+        domain_crash(d);
+        new_entry = pagetable_null();
+    }
+
+ install_new_entry:
+    /* Done.  Install it */
+    SHADOW_PRINTK("%u [%u] gmfn %#"PRI_mfn" smfn %#"PRI_mfn"\n",
+                  v->arch.paging.mode->shadow.shadow_levels, slot,
+                  mfn_x(gmfn), mfn_x(pagetable_get_mfn(new_entry)));
+    v->arch.paging.shadow.shadow_table[slot] = new_entry;
+
+    /* Decrement the refcount of the old contents of this slot */
+    if ( !pagetable_is_null(old_entry) )
+    {
+        mfn_t old_smfn = pagetable_get_mfn(old_entry);
+        /* Need to repin the old toplevel shadow if it's been unpinned
+         * by shadow_prealloc(): in PV mode we're still running on this
+         * shadow and it's not safe to free it yet. */
+        if ( !mfn_to_page(old_smfn)->u.sh.pinned && !sh_pin(d, old_smfn) )
+        {
+            printk(XENLOG_G_ERR "can't re-pin %"PRI_mfn"\n", mfn_x(old_smfn));
+            domain_crash(d);
+        }
+        sh_put_ref(d, old_smfn, 0);
+    }
+}
+
 /**************************************************************************/
 /* Turning on and off shadow features */
 
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -103,7 +103,7 @@ static void sh_flush_local(const struct
 /**************************************************************************/
 /* Hash table mapping from guest pagetables to shadows
  *
- * Normal case: maps the mfn of a guest page to the mfn of its shadow page.
+ * normal case: see private.h.
  * FL1's:       maps the *gfn* of the start of a superpage to the mfn of a
  *              shadow L1 which maps its "splinters".
  */
@@ -117,16 +117,6 @@ get_fl1_shadow_status(struct domain *d,
     return smfn;
 }
 
-static inline mfn_t
-get_shadow_status(struct domain *d, mfn_t gmfn, u32 shadow_type)
-/* Look for shadows in the hash table */
-{
-    mfn_t smfn = shadow_hash_lookup(d, mfn_x(gmfn), shadow_type);
-    ASSERT(!mfn_valid(smfn) || mfn_to_page(smfn)->u.sh.head);
-    perfc_incr(shadow_get_shadow_status);
-    return smfn;
-}
-
 static inline void
 set_fl1_shadow_status(struct domain *d, gfn_t gfn, mfn_t smfn)
 /* Put an FL1 shadow into the hash table */
@@ -139,27 +129,6 @@ set_fl1_shadow_status(struct domain *d,
 }
 
 static inline void
-set_shadow_status(struct domain *d, mfn_t gmfn, u32 shadow_type, mfn_t smfn)
-/* Put a shadow into the hash table */
-{
-    int res;
-
-    SHADOW_PRINTK("d%d gmfn=%lx, type=%08x, smfn=%lx\n",
-                  d->domain_id, mfn_x(gmfn), shadow_type, mfn_x(smfn));
-
-    ASSERT(mfn_to_page(smfn)->u.sh.head);
-
-    /* 32-bit PV guests don't own their l4 pages so can't get_page them */
-    if ( !is_pv_32bit_domain(d) || shadow_type != SH_type_l4_64_shadow )
-    {
-        res = get_page(mfn_to_page(gmfn), d);
-        ASSERT(res == 1);
-    }
-
-    shadow_hash_insert(d, mfn_x(gmfn), shadow_type, smfn);
-}
-
-static inline void
 delete_fl1_shadow_status(struct domain *d, gfn_t gfn, mfn_t smfn)
 /* Remove a shadow from the hash table */
 {
@@ -169,19 +138,6 @@ delete_fl1_shadow_status(struct domain *
     shadow_hash_delete(d, gfn_x(gfn), SH_type_fl1_shadow, smfn);
 }
 
-static inline void
-delete_shadow_status(struct domain *d, mfn_t gmfn, u32 shadow_type, mfn_t smfn)
-/* Remove a shadow from the hash table */
-{
-    SHADOW_PRINTK("d%d gmfn=%"PRI_mfn", type=%08x, smfn=%"PRI_mfn"\n",
-                  d->domain_id, mfn_x(gmfn), shadow_type, mfn_x(smfn));
-    ASSERT(mfn_to_page(smfn)->u.sh.head);
-    shadow_hash_delete(d, mfn_x(gmfn), shadow_type, smfn);
-    /* 32-bit PV guests don't own their l4 pages; see set_shadow_status */
-    if ( !is_pv_32bit_domain(d) || shadow_type != SH_type_l4_64_shadow )
-        put_page(mfn_to_page(gmfn));
-}
-
 
 /**************************************************************************/
 /* Functions for walking the guest page tables */
@@ -3845,78 +3801,6 @@ sh_detach_old_tables(struct vcpu *v)
     }
 }
 
-/* Set up the top-level shadow and install it in slot 'slot' of shadow_table */
-static void
-sh_set_toplevel_shadow(struct vcpu *v,
-                       unsigned int slot,
-                       mfn_t gmfn,
-                       unsigned int root_type)
-{
-    mfn_t smfn;
-    pagetable_t old_entry, new_entry;
-    struct domain *d = v->domain;
-
-    /* Remember the old contents of this slot */
-    old_entry = v->arch.paging.shadow.shadow_table[slot];
-
-    /* Now figure out the new contents: is this a valid guest MFN? */
-    if ( !mfn_valid(gmfn) )
-    {
-        new_entry = pagetable_null();
-        goto install_new_entry;
-    }
-
-    /* Guest mfn is valid: shadow it and install the shadow */
-    smfn = get_shadow_status(d, gmfn, root_type);
-    if ( !mfn_valid(smfn) )
-    {
-        /* Make sure there's enough free shadow memory. */
-        shadow_prealloc(d, root_type, 1);
-        /* Shadow the page. */
-        smfn = sh_make_shadow(v, gmfn, root_type);
-    }
-    ASSERT(mfn_valid(smfn));
-
-    /* Take a ref to this page: it will be released in sh_detach_old_tables()
-     * or the next call to set_toplevel_shadow() */
-    if ( sh_get_ref(d, smfn, 0) )
-    {
-        /* Pin the shadow and put it (back) on the list of pinned shadows */
-        sh_pin(d, smfn);
-
-        new_entry = pagetable_from_mfn(smfn);
-    }
-    else
-    {
-        printk(XENLOG_G_ERR "can't install %"PRI_mfn" as toplevel shadow\n",
-               mfn_x(smfn));
-        domain_crash(d);
-        new_entry = pagetable_null();
-    }
-
- install_new_entry:
-    /* Done.  Install it */
-    SHADOW_PRINTK("%u/%u [%u] gmfn %#"PRI_mfn" smfn %#"PRI_mfn"\n",
-                  GUEST_PAGING_LEVELS, SHADOW_PAGING_LEVELS, slot,
-                  mfn_x(gmfn), mfn_x(pagetable_get_mfn(new_entry)));
-    v->arch.paging.shadow.shadow_table[slot] = new_entry;
-
-    /* Decrement the refcount of the old contents of this slot */
-    if ( !pagetable_is_null(old_entry) ) {
-        mfn_t old_smfn = pagetable_get_mfn(old_entry);
-        /* Need to repin the old toplevel shadow if it's been unpinned
-         * by shadow_prealloc(): in PV mode we're still running on this
-         * shadow and it's not safe to free it yet. */
-        if ( !mfn_to_page(old_smfn)->u.sh.pinned && !sh_pin(d, old_smfn) )
-        {
-            printk(XENLOG_G_ERR "can't re-pin %"PRI_mfn"\n", mfn_x(old_smfn));
-            domain_crash(d);
-        }
-        sh_put_ref(d, old_smfn, 0);
-    }
-}
-
-
 static void
 sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
 /* Updates vcpu->arch.cr3 after the guest has changed CR3.
@@ -4014,7 +3898,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
 #if GUEST_PAGING_LEVELS == 2
     if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
         guest_flush_tlb_mask(d, d->dirty_cpumask);
-    sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow);
+    sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow, sh_make_shadow);
 #elif GUEST_PAGING_LEVELS == 3
     /* PAE guests have four shadow_table entries, based on the
      * current values of the guest's four l3es. */
@@ -4048,18 +3932,20 @@ sh_update_cr3(struct vcpu *v, int do_loc
                 if ( p2m_is_ram(p2mt) )
                     sh_set_toplevel_shadow(v, i, gl2mfn, (i == 3)
                                            ? SH_type_l2h_shadow
-                                           : SH_type_l2_shadow);
+                                           : SH_type_l2_shadow,
+                                           sh_make_shadow);
                 else
-                    sh_set_toplevel_shadow(v, i, INVALID_MFN, 0);
+                    sh_set_toplevel_shadow(v, i, INVALID_MFN, 0,
+                                           sh_make_shadow);
             }
             else
-                sh_set_toplevel_shadow(v, i, INVALID_MFN, 0);
+                sh_set_toplevel_shadow(v, i, INVALID_MFN, 0, sh_make_shadow);
         }
     }
 #elif GUEST_PAGING_LEVELS == 4
     if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
         guest_flush_tlb_mask(d, d->dirty_cpumask);
-    sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow);
+    sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow, sh_make_shadow);
     if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
     {
         mfn_t smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -357,6 +357,15 @@ mfn_t shadow_alloc(struct domain *d,
                     unsigned long backpointer);
 void  shadow_free(struct domain *d, mfn_t smfn);
 
+/* Set up the top-level shadow and install it in slot 'slot' of shadow_table */
+void sh_set_toplevel_shadow(struct vcpu *v,
+                            unsigned int slot,
+                            mfn_t gmfn,
+                            unsigned int root_type,
+                            mfn_t (*make_shadow)(struct vcpu *v,
+                                                 mfn_t gmfn,
+                                                 uint32_t shadow_type));
+
 /* Install the xen mappings in various flavours of shadow */
 void sh_install_xen_entries_in_l4(struct domain *, mfn_t gl4mfn, mfn_t sl4mfn);
 
@@ -701,6 +710,58 @@ static inline void sh_unpin(struct domai
 }
 
 
+/**************************************************************************/
+/* Hash table mapping from guest pagetables to shadows
+ *
+ * Normal case: maps the mfn of a guest page to the mfn of its shadow page.
+ * FL1's:       see multi.c.
+ */
+
+static inline mfn_t
+get_shadow_status(struct domain *d, mfn_t gmfn, u32 shadow_type)
+/* Look for shadows in the hash table */
+{
+    mfn_t smfn = shadow_hash_lookup(d, mfn_x(gmfn), shadow_type);
+    ASSERT(!mfn_valid(smfn) || mfn_to_page(smfn)->u.sh.head);
+    perfc_incr(shadow_get_shadow_status);
+    return smfn;
+}
+
+static inline void
+set_shadow_status(struct domain *d, mfn_t gmfn, u32 shadow_type, mfn_t smfn)
+/* Put a shadow into the hash table */
+{
+    int res;
+
+    SHADOW_PRINTK("d%d gmfn=%lx, type=%08x, smfn=%lx\n",
+                  d->domain_id, mfn_x(gmfn), shadow_type, mfn_x(smfn));
+
+    ASSERT(mfn_to_page(smfn)->u.sh.head);
+
+    /* 32-bit PV guests don't own their l4 pages so can't get_page them */
+    if ( !is_pv_32bit_domain(d) || shadow_type != SH_type_l4_64_shadow )
+    {
+        res = get_page(mfn_to_page(gmfn), d);
+        ASSERT(res == 1);
+    }
+
+    shadow_hash_insert(d, mfn_x(gmfn), shadow_type, smfn);
+}
+
+static inline void
+delete_shadow_status(struct domain *d, mfn_t gmfn, u32 shadow_type, mfn_t smfn)
+/* Remove a shadow from the hash table */
+{
+    SHADOW_PRINTK("d%d gmfn=%"PRI_mfn", type=%08x, smfn=%"PRI_mfn"\n",
+                  d->domain_id, mfn_x(gmfn), shadow_type, mfn_x(smfn));
+    ASSERT(mfn_to_page(smfn)->u.sh.head);
+    shadow_hash_delete(d, mfn_x(gmfn), shadow_type, smfn);
+    /* 32-bit PV guests don't own their l4 pages; see set_shadow_status */
+    if ( !is_pv_32bit_domain(d) || shadow_type != SH_type_l4_64_shadow )
+        put_page(mfn_to_page(gmfn));
+}
+
+
 /**************************************************************************/
 /* PTE-write emulation. */
 




* [PATCH 5/5] x86/shadow: l3table[] and gl3e[] are HVM only
  2020-07-15  9:56 [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up Jan Beulich
                   ` (3 preceding siblings ...)
  2020-07-15  9:59 ` [PATCH 4/5] x86/shadow: have just a single instance of sh_set_toplevel_shadow() Jan Beulich
@ 2020-07-15 10:00 ` Jan Beulich
  2020-07-18 18:20   ` Tim Deegan
  2020-07-18 18:21 ` [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up Tim Deegan
  5 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2020-07-15 10:00 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Roger Pau Monné

... by the very fact that they're 3-level specific, while PV always runs
in 4-level mode. This requires adding some seemingly redundant
#ifdef-s - it will be possible to drop some of them again once 2- and
3-level guest code is no longer built in !HVM configs, but I'm afraid
there's still quite a bit of disentangling work to be done to make this
possible.
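
For illustration, a minimal sketch (hypothetical struct, not the real
shadow_vcpu layout) of the kind of guard these #ifdef-s provide: the
PAE-only caches simply don't exist in !HVM builds, so any access in
code still compiled there has to be fenced off.

struct pae_cache_sketch {
#ifdef CONFIG_HVM
    unsigned long l3table[4];   /* PAE: top-level shadow entries */
    unsigned long gl3e[4];      /* PAE: cached top-level guest entries */
#endif
    unsigned long other_state;
};

static unsigned long read_gl3e(const struct pae_cache_sketch *s,
                               unsigned int i)
{
#ifdef CONFIG_HVM
    return s->gl3e[i];
#else
    return 0;                   /* PV always runs 4-level; no PAE cache */
#endif
}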

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -150,10 +150,7 @@ sh_walk_guest_tables(struct vcpu *v, uns
                           ? cr3_pa(v->arch.hvm.guest_cr[3]) >> PAGE_SHIFT
                           : pagetable_get_pfn(v->arch.guest_table));
 
-#if GUEST_PAGING_LEVELS == 3 /* PAE */
-    return guest_walk_tables(v, p2m_get_hostp2m(v->domain), va, gw, pfec,
-                             root_gfn, INVALID_MFN, v->arch.paging.shadow.gl3e);
-#else /* 32 or 64 */
+#if GUEST_PAGING_LEVELS != 3 /* 32 or 64 */
     const struct domain *d = v->domain;
     mfn_t root_mfn = (v->arch.flags & TF_kernel_mode
                       ? pagetable_get_mfn(v->arch.guest_table)
@@ -165,6 +162,13 @@ sh_walk_guest_tables(struct vcpu *v, uns
     unmap_domain_page(root_map);
 
     return ok;
+#elif !defined(CONFIG_HVM)
+    (void)root_gfn;
+    memset(gw, 0, sizeof(*gw));
+    return false;
+#else /* PAE */
+    return guest_walk_tables(v, p2m_get_hostp2m(v->domain), va, gw, pfec,
+                             root_gfn, INVALID_MFN, v->arch.paging.shadow.gl3e);
 #endif
 }
 
@@ -211,7 +215,7 @@ shadow_check_gwalk(struct vcpu *v, unsig
     l3p = map_domain_page(gw->l3mfn);
     mismatch |= (gw->l3e.l3 != l3p[guest_l3_table_offset(va)].l3);
     unmap_domain_page(l3p);
-#else
+#elif defined(CONFIG_HVM)
     mismatch |= (gw->l3e.l3 !=
                  v->arch.paging.shadow.gl3e[guest_l3_table_offset(va)].l3);
 #endif
@@ -1693,6 +1697,8 @@ static shadow_l2e_t * shadow_get_and_cre
     }
     /* Now follow it down a level.  Guaranteed to succeed. */
     return sh_linear_l2_table(v) + shadow_l2_linear_offset(gw->va);
+#elif !defined(CONFIG_HVM)
+    return NULL;
 #elif GUEST_PAGING_LEVELS == 3 /* PAE... */
     /* We never demand-shadow PAE l3es: they are only created in
      * sh_update_cr3().  Check if the relevant sl3e is present. */
@@ -3526,6 +3532,8 @@ static bool sh_invlpg(struct vcpu *v, un
         if ( !(shadow_l3e_get_flags(sl3e) & _PAGE_PRESENT) )
             return false;
     }
+#elif !defined(CONFIG_HVM)
+    return false;
 #else /* SHADOW_PAGING_LEVELS == 3 */
     if ( !(l3e_get_flags(v->arch.paging.shadow.l3table[shadow_l3_linear_offset(linear)])
            & _PAGE_PRESENT) )
@@ -3679,7 +3687,9 @@ sh_update_linear_entries(struct vcpu *v)
          pagetable_get_pfn(v->arch.hvm.monitor_table) == 0 )
         return;
 
-#if SHADOW_PAGING_LEVELS == 4
+#if !defined(CONFIG_HVM)
+    return;
+#elif SHADOW_PAGING_LEVELS == 4
 
     /* For HVM, just need to update the l4e that points to the shadow l4. */
 
@@ -3816,7 +3826,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
 {
     struct domain *d = v->domain;
     mfn_t gmfn;
-#if GUEST_PAGING_LEVELS == 3
+#if GUEST_PAGING_LEVELS == 3 && defined(CONFIG_HVM)
     const guest_l3e_t *gl3e;
     unsigned int i, guest_idx;
 #endif
@@ -3867,7 +3877,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
 #endif
         gmfn = pagetable_get_mfn(v->arch.guest_table);
 
-#if GUEST_PAGING_LEVELS == 3
+#if GUEST_PAGING_LEVELS == 3 && defined(CONFIG_HVM)
     /*
      * On PAE guests we don't use a mapping of the guest's own top-level
      * table.  We cache the current state of that table and shadow that,
@@ -3895,10 +3905,22 @@ sh_update_cr3(struct vcpu *v, int do_loc
     /* We revoke write access to the new guest toplevel page(s) before we
      * replace the old shadow pagetable(s), so that we can safely use the
      * (old) shadow linear maps in the writeable mapping heuristics. */
-#if GUEST_PAGING_LEVELS == 2
-    if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
+#if GUEST_PAGING_LEVELS == 4
+    if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
         guest_flush_tlb_mask(d, d->dirty_cpumask);
-    sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow, sh_make_shadow);
+    sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow, sh_make_shadow);
+    if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
+    {
+        mfn_t smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
+
+        if ( !(v->arch.flags & TF_kernel_mode) && VM_ASSIST(d, m2p_strict) )
+            zap_ro_mpt(smfn);
+        else if ( (v->arch.flags & TF_kernel_mode) &&
+                  !VM_ASSIST(d, m2p_strict) )
+            fill_ro_mpt(smfn);
+    }
+#elif !defined(CONFIG_HVM)
+    ASSERT_UNREACHABLE();
 #elif GUEST_PAGING_LEVELS == 3
     /* PAE guests have four shadow_table entries, based on the
      * current values of the guest's four l3es. */
@@ -3907,7 +3929,8 @@ sh_update_cr3(struct vcpu *v, int do_loc
         gfn_t gl2gfn;
         mfn_t gl2mfn;
         p2m_type_t p2mt;
-        const guest_l3e_t *gl3e = v->arch.paging.shadow.gl3e;
+
+        gl3e = v->arch.paging.shadow.gl3e;
 
         /* First, make all four entries read-only. */
         for ( i = 0; i < 4; i++ )
@@ -3942,25 +3965,16 @@ sh_update_cr3(struct vcpu *v, int do_loc
                 sh_set_toplevel_shadow(v, i, INVALID_MFN, 0, sh_make_shadow);
         }
     }
-#elif GUEST_PAGING_LEVELS == 4
-    if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
+#elif GUEST_PAGING_LEVELS == 2
+    if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
         guest_flush_tlb_mask(d, d->dirty_cpumask);
-    sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow, sh_make_shadow);
-    if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
-    {
-        mfn_t smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
-
-        if ( !(v->arch.flags & TF_kernel_mode) && VM_ASSIST(d, m2p_strict) )
-            zap_ro_mpt(smfn);
-        else if ( (v->arch.flags & TF_kernel_mode) &&
-                  !VM_ASSIST(d, m2p_strict) )
-            fill_ro_mpt(smfn);
-    }
+    sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow, sh_make_shadow);
 #else
 #error This should never happen
 #endif
 
 
+#ifdef CONFIG_HVM
     ///
     /// v->arch.paging.shadow.l3table
     ///
@@ -3986,7 +4000,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
             }
         }
 #endif /* SHADOW_PAGING_LEVELS == 3 */
-
+#endif /* CONFIG_HVM */
 
     ///
     /// v->arch.cr3
@@ -4006,6 +4020,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
 #endif
 
 
+#ifdef CONFIG_HVM
     ///
     /// v->arch.hvm.hw_cr[3]
     ///
@@ -4022,6 +4037,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
 #endif
         hvm_update_guest_cr3(v, noflush);
     }
+#endif /* CONFIG_HVM */
 
     /* Fix up the linear pagetable mappings */
     sh_update_linear_entries(v);
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -131,15 +131,16 @@ struct shadow_domain {
 
 struct shadow_vcpu {
 #ifdef CONFIG_SHADOW_PAGING
+#ifdef CONFIG_HVM
     /* PAE guests: per-vcpu shadow top-level table */
     l3_pgentry_t l3table[4] __attribute__((__aligned__(32)));
     /* PAE guests: per-vcpu cache of the top-level *guest* entries */
     l3_pgentry_t gl3e[4] __attribute__((__aligned__(32)));
 
     /* shadow(s) of guest (MFN) */
-#ifdef CONFIG_HVM
     pagetable_t shadow_table[4];
 #else
+    /* shadow of guest (MFN) */
     pagetable_t shadow_table[1];
 #endif
 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/5] x86/shadow: l3table[] and gl3e[] are HVM only
  2020-07-15 10:00 ` [PATCH 5/5] x86/shadow: l3table[] and gl3e[] are HVM only Jan Beulich
@ 2020-07-18 18:20   ` Tim Deegan
  2020-07-20  8:55     ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Tim Deegan @ 2020-07-18 18:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper

At 12:00 +0200 on 15 Jul (1594814409), Jan Beulich wrote:
> ... by the very fact that they're 3-level specific, while PV always gets
> run in 4-level mode. This requires adding some seemingly redundant
> #ifdef-s - some of them will be possible to drop again once 2- and
> 3-level guest code doesn't get built anymore in !HVM configs, but I'm
> afraid there's still quite a bit of disentangling work to be done to
> make this possible.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Looks good.  It seems like the new code for '3-level non-HVM' in
guest-walks ought to have some sort of assert-unreachable in them too
- or is there a reason not to?

Cheers,

Tim.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up
  2020-07-15  9:56 [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up Jan Beulich
                   ` (4 preceding siblings ...)
  2020-07-15 10:00 ` [PATCH 5/5] x86/shadow: l3table[] and gl3e[] are HVM only Jan Beulich
@ 2020-07-18 18:21 ` Tim Deegan
  5 siblings, 0 replies; 12+ messages in thread
From: Tim Deegan @ 2020-07-18 18:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper

At 11:56 +0200 on 15 Jul (1594814214), Jan Beulich wrote:
> This in particular goes a few small steps further towards proper
> !HVM and !PV config handling (i.e. no carrying of unnecessary
> baggage).
> 
> 1: x86/shadow: dirty VRAM tracking is needed for HVM only
> 2: x86/shadow: shadow_table[] needs only one entry for PV-only configs
> 3: x86/PV: drop a few misleading paging_mode_refcounts() checks
> 4: x86/shadow: have just a single instance of sh_set_toplevel_shadow()
> 5: x86/shadow: l3table[] and gl3e[] are HVM only

I sent a question on #5 separately; otherwise these all seem good to
me, thank you!

Acked-by: Tim Deegan <tim@xen.org>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/5] x86/shadow: l3table[] and gl3e[] are HVM only
  2020-07-18 18:20   ` Tim Deegan
@ 2020-07-20  8:55     ` Jan Beulich
  2020-07-20 17:37       ` Tim Deegan
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2020-07-20  8:55 UTC (permalink / raw)
  To: Tim Deegan
  Cc: George Dunlap, xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper

On 18.07.2020 20:20, Tim Deegan wrote:
> At 12:00 +0200 on 15 Jul (1594814409), Jan Beulich wrote:
>> ... by the very fact that they're 3-level specific, while PV always gets
>> run in 4-level mode. This requires adding some seemingly redundant
>> #ifdef-s - some of them will be possible to drop again once 2- and
>> 3-level guest code doesn't get built anymore in !HVM configs, but I'm
>> afraid there's still quite a bit of disentangling work to be done to
>> make this possible.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Looks good.  It seems like the new code for '3-level non-HVM' in
> guest-walks ought to have some sort of assert-unreachable in them too
> - or is there a reason not to?

You mean this piece of code

+#elif !defined(CONFIG_HVM)
+    (void)root_gfn;
+    memset(gw, 0, sizeof(*gw));
+    return false;
+#else /* PAE */

If so - sure, ASSERT_UNREACHABLE() could be added there. It simply
didn't occur to me. I take it your ack for the entire series holds
here with this addition.
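
I.e. that arm would then look along the lines of (sketch of the
incremental change only, not necessarily the exact committed form):

+#elif !defined(CONFIG_HVM)
+    /* PAE guests require HVM; this path cannot be reached. */
+    ASSERT_UNREACHABLE();
+    (void)root_gfn;
+    memset(gw, 0, sizeof(*gw));
+    return false;
+#else /* PAE */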

Jan


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/5] x86/shadow: l3table[] and gl3e[] are HVM only
  2020-07-20  8:55     ` Jan Beulich
@ 2020-07-20 17:37       ` Tim Deegan
  0 siblings, 0 replies; 12+ messages in thread
From: Tim Deegan @ 2020-07-20 17:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper

At 10:55 +0200 on 20 Jul (1595242521), Jan Beulich wrote:
> On 18.07.2020 20:20, Tim Deegan wrote:
> > At 12:00 +0200 on 15 Jul (1594814409), Jan Beulich wrote:
> >> ... by the very fact that they're 3-level specific, while PV always gets
> >> run in 4-level mode. This requires adding some seemingly redundant
> >> #ifdef-s - some of them will be possible to drop again once 2- and
> >> 3-level guest code doesn't get built anymore in !HVM configs, but I'm
> >> afraid there's still quite a bit of disentangling work to be done to
> >> make this possible.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > 
> > Looks good.  It seems like the new code for '3-level non-HVM' in
> > guest-walks ought to have some sort of assert-unreachable in them too
> > - or is there a reason not to?
> 
> You mean this piece of code
> 
> +#elif !defined(CONFIG_HVM)
> +    (void)root_gfn;
> +    memset(gw, 0, sizeof(*gw));
> +    return false;
> +#else /* PAE */
> 
> If so - sure, ASSERT_UNREACHABLE() could be added there. It simply
> didn't occur to me. I take it your ack for the entire series holds
> here with this addition.

Yes, it does.  Thanks!

Tim.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Ping: [PATCH 3/5] x86/PV: drop a few misleading paging_mode_refcounts() checks
  2020-07-15  9:59 ` [PATCH 3/5] x86/PV: drop a few misleading paging_mode_refcounts() checks Jan Beulich
@ 2020-07-31 14:58   ` Jan Beulich
  2020-07-31 15:17     ` Andrew Cooper
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2020-07-31 14:58 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu, Roger Pau Monné
  Cc: George Dunlap, xen-devel, Tim Deegan

On 15.07.2020 11:59, Jan Beulich wrote:
> The filling and cleaning up of v->arch.guest_table in new_guest_cr3()
> was apparently inconsistent so far: There was a type ref acquired
> unconditionally for the new top level page table, but the dropping of
> the old type ref was conditional upon !paging_mode_refcounts(). Mirror
> this also to arch_set_info_guest().
> 
> Also move new_guest_cr3()'s #ifdef to around the function - both callers
> now get built only when CONFIG_PV, i.e. no need to retain a stub.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

While I've got an ack from Tim, I think I need either an ack from
Andrew or someone's R-b in order to commit this.

Thanks, Jan

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1122,8 +1122,6 @@ int arch_set_info_guest(
>  
>      if ( !cr3_page )
>          rc = -EINVAL;
> -    else if ( paging_mode_refcounts(d) )
> -        /* nothing */;
>      else if ( cr3_page == v->arch.old_guest_table )
>      {
>          v->arch.old_guest_table = NULL;
> @@ -1144,8 +1142,7 @@ int arch_set_info_guest(
>          case -ERESTART:
>              break;
>          case 0:
> -            if ( !compat && !VM_ASSIST(d, m2p_strict) &&
> -                 !paging_mode_refcounts(d) )
> +            if ( !compat && !VM_ASSIST(d, m2p_strict) )
>                  fill_ro_mpt(cr3_mfn);
>              break;
>          default:
> @@ -1166,7 +1163,7 @@ int arch_set_info_guest(
>  
>              if ( !cr3_page )
>                  rc = -EINVAL;
> -            else if ( !paging_mode_refcounts(d) )
> +            else
>              {
>                  rc = get_page_type_preemptible(cr3_page, PGT_root_page_table);
>                  switch ( rc )
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -3149,9 +3149,9 @@ int vcpu_destroy_pagetables(struct vcpu
>      return rc;
>  }
>  
> +#ifdef CONFIG_PV
>  int new_guest_cr3(mfn_t mfn)
>  {
> -#ifdef CONFIG_PV
>      struct vcpu *curr = current;
>      struct domain *d = curr->domain;
>      int rc;
> @@ -3220,7 +3220,7 @@ int new_guest_cr3(mfn_t mfn)
>  
>      pv_destroy_ldt(curr); /* Unconditional TLB flush later. */
>  
> -    if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
> +    if ( !VM_ASSIST(d, m2p_strict) )
>          fill_ro_mpt(mfn);
>      curr->arch.guest_table = pagetable_from_mfn(mfn);
>      update_cr3(curr);
> @@ -3231,30 +3231,24 @@ int new_guest_cr3(mfn_t mfn)
>      {
>          struct page_info *page = mfn_to_page(old_base_mfn);
>  
> -        if ( paging_mode_refcounts(d) )
> -            put_page(page);
> -        else
> -            switch ( rc = put_page_and_type_preemptible(page) )
> -            {
> -            case -EINTR:
> -            case -ERESTART:
> -                curr->arch.old_guest_ptpg = NULL;
> -                curr->arch.old_guest_table = page;
> -                curr->arch.old_guest_table_partial = (rc == -ERESTART);
> -                rc = -ERESTART;
> -                break;
> -            default:
> -                BUG_ON(rc);
> -                break;
> -            }
> +        switch ( rc = put_page_and_type_preemptible(page) )
> +        {
> +        case -EINTR:
> +        case -ERESTART:
> +            curr->arch.old_guest_ptpg = NULL;
> +            curr->arch.old_guest_table = page;
> +            curr->arch.old_guest_table_partial = (rc == -ERESTART);
> +            rc = -ERESTART;
> +            break;
> +        default:
> +            BUG_ON(rc);
> +            break;
> +        }
>      }
>  
>      return rc;
> -#else
> -    ASSERT_UNREACHABLE();
> -    return -EINVAL;
> -#endif
>  }
> +#endif
>  
>  #ifdef CONFIG_PV
>  static int vcpumask_to_pcpumask(
> 
> 
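
To restate the consistency argument from the description in isolation
(sketch only; take_type_ref()/drop_type_ref() are hypothetical stand-ins
for get_page_type_preemptible() and put_page_and_type_preemptible(), and
the preemption / -ERESTART handling is omitted):

static int switch_root_sketch(struct vcpu *curr,
                              struct page_info *new_root,
                              struct page_info *old_root)
{
    /* A type ref is always acquired for the new top-level table ... */
    int rc = take_type_ref(new_root);

    if ( rc )
        return rc;

    curr->arch.guest_table = pagetable_from_page(new_root);

    /* ... so the old table's type ref must always be dropped, too. */
    return old_root ? drop_type_ref(old_root) : 0;
}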



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Ping: [PATCH 3/5] x86/PV: drop a few misleading paging_mode_refcounts() checks
  2020-07-31 14:58   ` Ping: " Jan Beulich
@ 2020-07-31 15:17     ` Andrew Cooper
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2020-07-31 15:17 UTC (permalink / raw)
  To: Jan Beulich, Wei Liu, Roger Pau Monné
  Cc: George Dunlap, xen-devel, Tim Deegan

On 31/07/2020 15:58, Jan Beulich wrote:
> On 15.07.2020 11:59, Jan Beulich wrote:
>> The filling and cleaning up of v->arch.guest_table in new_guest_cr3()
>> was apparently inconsistent so far: There was a type ref acquired
>> unconditionally for the new top level page table, but the dropping of
>> the old type ref was conditional upon !paging_mode_refcounts(). Mirror
>> this also to arch_set_info_guest().
>>
>> Also move new_guest_cr3()'s #ifdef to around the function - both callers
>> now get built only when CONFIG_PV, i.e. no need to retain a stub.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> While I've got an ack from Tim, I think I need either an ack from
> Andrew or someone's R-b in order to commit this.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-07-31 15:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-15  9:56 [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up Jan Beulich
2020-07-15  9:58 ` [PATCH 1/5] x86/shadow: dirty VRAM tracking is needed for HVM only Jan Beulich
2020-07-15  9:58 ` [PATCH 2/5] x86/shadow: shadow_table[] needs only one entry for PV-only configs Jan Beulich
2020-07-15  9:59 ` [PATCH 3/5] x86/PV: drop a few misleading paging_mode_refcounts() checks Jan Beulich
2020-07-31 14:58   ` Ping: " Jan Beulich
2020-07-31 15:17     ` Andrew Cooper
2020-07-15  9:59 ` [PATCH 4/5] x86/shadow: have just a single instance of sh_set_toplevel_shadow() Jan Beulich
2020-07-15 10:00 ` [PATCH 5/5] x86/shadow: l3table[] and gl3e[] are HVM only Jan Beulich
2020-07-18 18:20   ` Tim Deegan
2020-07-20  8:55     ` Jan Beulich
2020-07-20 17:37       ` Tim Deegan
2020-07-18 18:21 ` [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up Tim Deegan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).