* [PATCH 00/11] x86/shadow: misc tidying
@ 2023-01-05 15:57 Jan Beulich
  2023-01-05 15:59 ` [PATCH 01/11] x86/shadow: replace sh_reset_l3_up_pointers() Jan Beulich
                   ` (11 more replies)
  0 siblings, 12 replies; 23+ messages in thread
From: Jan Beulich @ 2023-01-05 15:57 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Tim Deegan, George Dunlap

... or so I hope. The main observation was that we still have both
hash_vcpu_foreach() and hash_domain_foreach(), where the latter was
introduced in 2014/15 to replace the former. Only now, some eight years
later, can we complete that conversion. Everything else addresses
other things noticed along the way.

01: replace sh_reset_l3_up_pointers()
02: convert sh_audit_flags()'s 1st parameter to domain
03: drop hash_vcpu_foreach()
04: rename hash_domain_foreach()
05: move bogus HVM checks in sh_pagetable_dying()
06: drop a few uses of mfn_valid()
07: L2H shadow type is PV32-only
08: reduce effort of hash calculation
09: simplify conditionals in sh_{get,put}_ref()
10: correct shadow type bounds checks
11: sh_remove_all_mappings() is HVM-only

Jan



* [PATCH 01/11] x86/shadow: replace sh_reset_l3_up_pointers()
  2023-01-05 15:57 [PATCH 00/11] x86/shadow: misc tidying Jan Beulich
@ 2023-01-05 15:59 ` Jan Beulich
  2023-01-06  0:55   ` Andrew Cooper
  2023-01-05 15:59 ` [PATCH 02/11] x86/shadow: convert sh_audit_flags()'s 1st parameter to domain Jan Beulich
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2023-01-05 15:59 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Tim Deegan, George Dunlap

Rather than doing a separate hash walk (and then even using the vCPU
variant, which is to go away), do the up-pointer-clearing right in
sh_unpin(), as an alternative to the (now further limited) enlisting on
a "free floating" list fragment. This utilizes the fact that such list
fragments are traversed only for multi-page shadows (in shadow_free()).
Furthermore sh_terminate_list() is only a safeguard anyway, and isn't
used in the common case (it actually does anything only in BIGMEM
configurations).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
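---
In essence the tail of sh_unpin() becomes the following (a condensed
sketch of the private.h hunk below, with setup and ASSERT()s omitted;
not a drop-in replacement):

    sz = shadow_size(head_type);
    for ( i = 0; i < sz; i++ )
    {
        next = page_list_next(sp, pin_list);
        page_list_del(sp, pin_list);
        if ( sz > 1 )
            /* Multi-page shadow: stitch into a free-floating fragment. */
            page_list_add_tail(sp, &tmp_list);
        else if ( head_type == SH_type_l3_64_shadow &&
                  d->arch.paging.shadow.unpinning_l3 )
            /* Single-page L3 shadow: clear the up-pointer right away. */
            sp->up = 0;
        sp = next;
    }
    if ( sz > 1 )
        sh_terminate_list(&tmp_list);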

--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -116,6 +116,9 @@ struct shadow_domain {
     /* OOS */
     bool_t oos_active;
 
+    /* Domain is in the process of leaving SHOPT_LINUX_L3_TOPLEVEL mode. */
+    bool unpinning_l3;
+
 #ifdef CONFIG_HVM
     /* Has this domain ever used HVMOP_pagetable_dying? */
     bool_t pagetable_dying_op;
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -2306,29 +2306,6 @@ void shadow_prepare_page_type_change(str
 
 /**************************************************************************/
 
-/* Reset the up-pointers of every L3 shadow to 0.
- * This is called when l3 shadows stop being pinnable, to clear out all
- * the list-head bits so the up-pointer field is properly inititalised. */
-static int cf_check sh_clear_up_pointer(
-    struct vcpu *v, mfn_t smfn, mfn_t unused)
-{
-    mfn_to_page(smfn)->up = 0;
-    return 0;
-}
-
-void sh_reset_l3_up_pointers(struct vcpu *v)
-{
-    static const hash_vcpu_callback_t callbacks[SH_type_unused] = {
-        [SH_type_l3_64_shadow] = sh_clear_up_pointer,
-    };
-
-    HASH_CALLBACKS_CHECK(SHF_L3_64);
-    hash_vcpu_foreach(v, SHF_L3_64, callbacks, INVALID_MFN);
-}
-
-
-/**************************************************************************/
-
 static void sh_update_paging_modes(struct vcpu *v)
 {
     struct domain *d = v->domain;
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -960,6 +960,8 @@ sh_make_shadow(struct vcpu *v, mfn_t gmf
         }
         if ( l4count > 2 * d->max_vcpus )
         {
+            d->arch.paging.shadow.unpinning_l3 = true;
+
             /* Unpin all the pinned l3 tables, and don't pin any more. */
             page_list_for_each_safe(sp, t, &d->arch.paging.shadow.pinned_shadows)
             {
@@ -967,7 +969,8 @@ sh_make_shadow(struct vcpu *v, mfn_t gmf
                     sh_unpin(d, page_to_mfn(sp));
             }
             d->arch.paging.shadow.opt_flags &= ~SHOPT_LINUX_L3_TOPLEVEL;
-            sh_reset_l3_up_pointers(v);
+
+            d->arch.paging.shadow.unpinning_l3 = false;
         }
     }
 #endif
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -497,11 +497,6 @@ void shadow_blow_tables(struct domain *d
  */
 int sh_remove_all_mappings(struct domain *d, mfn_t gmfn, gfn_t gfn);
 
-/* Reset the up-pointers of every L3 shadow to 0.
- * This is called when l3 shadows stop being pinnable, to clear out all
- * the list-head bits so the up-pointer field is properly inititalised. */
-void sh_reset_l3_up_pointers(struct vcpu *v);
-
 /******************************************************************************
  * Flags used in the return value of the shadow_set_lXe() functions...
  */
@@ -722,7 +717,7 @@ static inline void sh_unpin(struct domai
 {
     struct page_list_head tmp_list, *pin_list;
     struct page_info *sp, *next;
-    unsigned int i, head_type;
+    unsigned int i, head_type, sz;
 
     ASSERT(mfn_valid(smfn));
     sp = mfn_to_page(smfn);
@@ -734,20 +729,30 @@ static inline void sh_unpin(struct domai
         return;
     sp->u.sh.pinned = 0;
 
-    /* Cut the sub-list out of the list of pinned shadows,
-     * stitching it back into a list fragment of its own. */
+    sz = shadow_size(head_type);
+
+    /*
+     * Cut the sub-list out of the list of pinned shadows, stitching
+     * multi-page shadows back into a list fragment of their own.
+     */
     pin_list = &d->arch.paging.shadow.pinned_shadows;
     INIT_PAGE_LIST_HEAD(&tmp_list);
-    for ( i = 0; i < shadow_size(head_type); i++ )
+    for ( i = 0; i < sz; i++ )
     {
         ASSERT(sp->u.sh.type == head_type);
         ASSERT(!i || !sp->u.sh.head);
         next = page_list_next(sp, pin_list);
         page_list_del(sp, pin_list);
-        page_list_add_tail(sp, &tmp_list);
+        if ( sz > 1 )
+            page_list_add_tail(sp, &tmp_list);
+        else if ( head_type == SH_type_l3_64_shadow &&
+                  d->arch.paging.shadow.unpinning_l3 )
+            sp->up = 0;
         sp = next;
     }
-    sh_terminate_list(&tmp_list);
+
+    if ( sz > 1 )
+        sh_terminate_list(&tmp_list);
 
     sh_put_ref(d, smfn, 0);
 }




* [PATCH 02/11] x86/shadow: convert sh_audit_flags()'s 1st parameter to domain
  2023-01-05 15:57 [PATCH 00/11] x86/shadow: misc tidying Jan Beulich
  2023-01-05 15:59 ` [PATCH 01/11] x86/shadow: replace sh_reset_l3_up_pointers() Jan Beulich
@ 2023-01-05 15:59 ` Jan Beulich
  2023-01-05 15:59 ` [PATCH 03/11] x86/shadow: drop hash_vcpu_foreach() Jan Beulich
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2023-01-05 15:59 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Tim Deegan, George Dunlap

Nothing in there is vCPU-specific.

With the introduction of the local variable in sh_audit_l1_table(),
convert other uses of v->domain as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3919,13 +3919,13 @@ static void cf_check sh_pagetable_dying(
     done = 1;                                                           \
 } while (0)
 
-static const char *sh_audit_flags(struct vcpu *v, int level,
+static const char *sh_audit_flags(const struct domain *d, int level,
                                   int gflags, int sflags)
 /* Common code for auditing flag bits */
 {
     if ( (sflags & _PAGE_PRESENT) && !(gflags & _PAGE_PRESENT) )
         return "shadow is present but guest is not present";
-    if ( (sflags & _PAGE_GLOBAL) && !is_hvm_vcpu(v) )
+    if ( (sflags & _PAGE_GLOBAL) && !is_hvm_domain(d) )
         return "global bit set in PV shadow";
     if ( level == 2 && (sflags & _PAGE_PSE) )
         return "PS bit set in shadow";
@@ -3948,6 +3948,7 @@ static const char *sh_audit_flags(struct
 
 int cf_check sh_audit_l1_table(struct vcpu *v, mfn_t sl1mfn, mfn_t x)
 {
+    struct domain *d = v->domain;
     guest_l1e_t *gl1e, *gp;
     shadow_l1e_t *sl1e;
     mfn_t mfn, gmfn, gl1mfn;
@@ -3964,7 +3965,7 @@ int cf_check sh_audit_l1_table(struct vc
     /* Out-of-sync l1 shadows can contain anything: just check the OOS hash */
     if ( page_is_out_of_sync(mfn_to_page(gl1mfn)) )
     {
-        oos_audit_hash_is_present(v->domain, gl1mfn);
+        oos_audit_hash_is_present(d, gl1mfn);
         return 0;
     }
 #endif
@@ -3994,7 +3995,7 @@ int cf_check sh_audit_l1_table(struct vc
         }
         else
         {
-            s = sh_audit_flags(v, 1, guest_l1e_get_flags(*gl1e),
+            s = sh_audit_flags(d, 1, guest_l1e_get_flags(*gl1e),
                                shadow_l1e_get_flags(*sl1e));
             if ( s ) AUDIT_FAIL(1, "%s", s);
 
@@ -4002,7 +4003,7 @@ int cf_check sh_audit_l1_table(struct vc
             {
                 gfn = guest_l1e_get_gfn(*gl1e);
                 mfn = shadow_l1e_get_mfn(*sl1e);
-                gmfn = get_gfn_query_unlocked(v->domain, gfn_x(gfn), &p2mt);
+                gmfn = get_gfn_query_unlocked(d, gfn_x(gfn), &p2mt);
                 if ( !p2m_is_grant(p2mt) && !mfn_eq(gmfn, mfn) )
                     AUDIT_FAIL(1, "bad translation: gfn %" SH_PRI_gfn
                                " --> %" PRI_mfn " != mfn %" PRI_mfn,
@@ -4064,8 +4065,8 @@ int cf_check sh_audit_l2_table(struct vc
     gl2e = gp = map_domain_page(gl2mfn);
     SHADOW_FOREACH_L2E(sl2mfn, sl2e, &gl2e, done, d, {
 
-        s = sh_audit_flags(v, 2, guest_l2e_get_flags(*gl2e),
-                            shadow_l2e_get_flags(*sl2e));
+        s = sh_audit_flags(d, 2, guest_l2e_get_flags(*gl2e),
+                           shadow_l2e_get_flags(*sl2e));
         if ( s ) AUDIT_FAIL(2, "%s", s);
 
         if ( SHADOW_AUDIT & SHADOW_AUDIT_ENTRIES_MFNS )
@@ -4116,8 +4117,8 @@ int cf_check sh_audit_l3_table(struct vc
     gl3e = gp = map_domain_page(gl3mfn);
     SHADOW_FOREACH_L3E(sl3mfn, sl3e, &gl3e, done, {
 
-        s = sh_audit_flags(v, 3, guest_l3e_get_flags(*gl3e),
-                            shadow_l3e_get_flags(*sl3e));
+        s = sh_audit_flags(d, 3, guest_l3e_get_flags(*gl3e),
+                           shadow_l3e_get_flags(*sl3e));
         if ( s ) AUDIT_FAIL(3, "%s", s);
 
         if ( SHADOW_AUDIT & SHADOW_AUDIT_ENTRIES_MFNS )
@@ -4164,8 +4165,8 @@ int cf_check sh_audit_l4_table(struct vc
     gl4e = gp = map_domain_page(gl4mfn);
     SHADOW_FOREACH_L4E(sl4mfn, sl4e, &gl4e, done, d,
     {
-        s = sh_audit_flags(v, 4, guest_l4e_get_flags(*gl4e),
-                            shadow_l4e_get_flags(*sl4e));
+        s = sh_audit_flags(d, 4, guest_l4e_get_flags(*gl4e),
+                           shadow_l4e_get_flags(*sl4e));
         if ( s ) AUDIT_FAIL(4, "%s", s);
 
         if ( SHADOW_AUDIT & SHADOW_AUDIT_ENTRIES_MFNS )




* [PATCH 03/11] x86/shadow: drop hash_vcpu_foreach()
  2023-01-05 15:57 [PATCH 00/11] x86/shadow: misc tidying Jan Beulich
  2023-01-05 15:59 ` [PATCH 01/11] x86/shadow: replace sh_reset_l3_up_pointers() Jan Beulich
  2023-01-05 15:59 ` [PATCH 02/11] x86/shadow: convert sh_audit_flags()'s 1st parameter to domain Jan Beulich
@ 2023-01-05 15:59 ` Jan Beulich
  2023-01-05 16:00 ` [PATCH 04/11] x86/shadow: rename hash_domain_foreach() Jan Beulich
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2023-01-05 15:59 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Tim Deegan, George Dunlap

The domain-based variant is easily usable by shadow_audit_tables(); all
that's needed is conversion of the callback functions.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1640,59 +1640,11 @@ bool shadow_hash_delete(struct domain *d
     return true;
 }
 
-typedef int (*hash_vcpu_callback_t)(struct vcpu *v, mfn_t smfn, mfn_t other_mfn);
 typedef int (*hash_domain_callback_t)(struct domain *d, mfn_t smfn, mfn_t other_mfn);
 
 #define HASH_CALLBACKS_CHECK(mask) \
     BUILD_BUG_ON((mask) > (1U << ARRAY_SIZE(callbacks)) - 1)
 
-static void hash_vcpu_foreach(struct vcpu *v, unsigned int callback_mask,
-                              const hash_vcpu_callback_t callbacks[],
-                              mfn_t callback_mfn)
-/* Walk the hash table looking at the types of the entries and
- * calling the appropriate callback function for each entry.
- * The mask determines which shadow types we call back for, and the array
- * of callbacks tells us which function to call.
- * Any callback may return non-zero to let us skip the rest of the scan.
- *
- * WARNING: Callbacks MUST NOT add or remove hash entries unless they
- * then return non-zero to terminate the scan. */
-{
-    int i, done = 0;
-    struct domain *d = v->domain;
-    struct page_info *x;
-
-    ASSERT(paging_locked_by_me(d));
-
-    /* Can be called via p2m code &c after shadow teardown. */
-    if ( unlikely(!d->arch.paging.shadow.hash_table) )
-        return;
-
-    /* Say we're here, to stop hash-lookups reordering the chains */
-    ASSERT(d->arch.paging.shadow.hash_walking == 0);
-    d->arch.paging.shadow.hash_walking = 1;
-
-    for ( i = 0; i < SHADOW_HASH_BUCKETS; i++ )
-    {
-        /* WARNING: This is not safe against changes to the hash table.
-         * The callback *must* return non-zero if it has inserted or
-         * deleted anything from the hash (lookups are OK, though). */
-        for ( x = d->arch.paging.shadow.hash_table[i]; x; x = next_shadow(x) )
-        {
-            if ( callback_mask & (1 << x->u.sh.type) )
-            {
-                ASSERT(x->u.sh.type <= SH_type_max_shadow);
-                ASSERT(callbacks[x->u.sh.type] != NULL);
-                done = callbacks[x->u.sh.type](v, page_to_mfn(x),
-                                               callback_mfn);
-                if ( done ) break;
-            }
-        }
-        if ( done ) break;
-    }
-    d->arch.paging.shadow.hash_walking = 0;
-}
-
 static void hash_domain_foreach(struct domain *d,
                                 unsigned int callback_mask,
                                 const hash_domain_callback_t callbacks[],
@@ -3215,7 +3167,7 @@ int shadow_domctl(struct domain *d,
 void shadow_audit_tables(struct vcpu *v)
 {
     /* Dispatch table for getting per-type functions */
-    static const hash_vcpu_callback_t callbacks[SH_type_unused] = {
+    static const hash_domain_callback_t callbacks[SH_type_unused] = {
 #if SHADOW_AUDIT & (SHADOW_AUDIT_ENTRIES | SHADOW_AUDIT_ENTRIES_FULL)
 # ifdef CONFIG_HVM
         [SH_type_l1_32_shadow] = SHADOW_INTERNAL_NAME(sh_audit_l1_table, 2),
@@ -3262,7 +3214,7 @@ void shadow_audit_tables(struct vcpu *v)
     HASH_CALLBACKS_CHECK(SHADOW_AUDIT & (SHADOW_AUDIT_ENTRIES |
                                          SHADOW_AUDIT_ENTRIES_FULL)
                          ? SHF_page_type_mask : 0);
-    hash_vcpu_foreach(v, mask, callbacks, INVALID_MFN);
+    hash_domain_foreach(v->domain, mask, callbacks, INVALID_MFN);
 }
 
 #ifdef CONFIG_PV
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -326,32 +326,32 @@ static void sh_audit_gw(struct vcpu *v,
     if ( mfn_valid(gw->l4mfn)
          && mfn_valid((smfn = get_shadow_status(d, gw->l4mfn,
                                                 SH_type_l4_shadow))) )
-        (void) sh_audit_l4_table(v, smfn, INVALID_MFN);
+        sh_audit_l4_table(d, smfn, INVALID_MFN);
     if ( mfn_valid(gw->l3mfn)
          && mfn_valid((smfn = get_shadow_status(d, gw->l3mfn,
                                                 SH_type_l3_shadow))) )
-        (void) sh_audit_l3_table(v, smfn, INVALID_MFN);
+        sh_audit_l3_table(d, smfn, INVALID_MFN);
 #endif /* PAE or 64... */
     if ( mfn_valid(gw->l2mfn) )
     {
         if ( mfn_valid((smfn = get_shadow_status(d, gw->l2mfn,
                                                  SH_type_l2_shadow))) )
-            (void) sh_audit_l2_table(v, smfn, INVALID_MFN);
+            sh_audit_l2_table(d, smfn, INVALID_MFN);
 #if GUEST_PAGING_LEVELS >= 4 /* 32-bit PV only */
         if ( mfn_valid((smfn = get_shadow_status(d, gw->l2mfn,
                                                  SH_type_l2h_shadow))) )
-            (void) sh_audit_l2_table(v, smfn, INVALID_MFN);
+            sh_audit_l2_table(d, smfn, INVALID_MFN);
 #endif
     }
     if ( mfn_valid(gw->l1mfn)
          && mfn_valid((smfn = get_shadow_status(d, gw->l1mfn,
                                                 SH_type_l1_shadow))) )
-        (void) sh_audit_l1_table(v, smfn, INVALID_MFN);
+        sh_audit_l1_table(d, smfn, INVALID_MFN);
     else if ( (guest_l2e_get_flags(gw->l2e) & _PAGE_PRESENT)
               && (guest_l2e_get_flags(gw->l2e) & _PAGE_PSE)
               && mfn_valid(
               (smfn = get_fl1_shadow_status(d, guest_l2e_get_gfn(gw->l2e)))) )
-        (void) sh_audit_fl1_table(v, smfn, INVALID_MFN);
+        sh_audit_fl1_table(d, smfn, INVALID_MFN);
 #endif /* SHADOW_AUDIT & SHADOW_AUDIT_ENTRIES */
 }
 
@@ -3946,9 +3946,8 @@ static const char *sh_audit_flags(const
     return NULL;
 }
 
-int cf_check sh_audit_l1_table(struct vcpu *v, mfn_t sl1mfn, mfn_t x)
+int cf_check sh_audit_l1_table(struct domain *d, mfn_t sl1mfn, mfn_t x)
 {
-    struct domain *d = v->domain;
     guest_l1e_t *gl1e, *gp;
     shadow_l1e_t *sl1e;
     mfn_t mfn, gmfn, gl1mfn;
@@ -4015,7 +4014,7 @@ int cf_check sh_audit_l1_table(struct vc
     return done;
 }
 
-int cf_check sh_audit_fl1_table(struct vcpu *v, mfn_t sl1mfn, mfn_t x)
+int cf_check sh_audit_fl1_table(struct domain *d, mfn_t sl1mfn, mfn_t x)
 {
     guest_l1e_t *gl1e, e;
     shadow_l1e_t *sl1e;
@@ -4041,9 +4040,8 @@ int cf_check sh_audit_fl1_table(struct v
     return 0;
 }
 
-int cf_check sh_audit_l2_table(struct vcpu *v, mfn_t sl2mfn, mfn_t x)
+int cf_check sh_audit_l2_table(struct domain *d, mfn_t sl2mfn, mfn_t x)
 {
-    struct domain *d = v->domain;
     guest_l2e_t *gl2e, *gp;
     shadow_l2e_t *sl2e;
     mfn_t mfn, gmfn, gl2mfn;
@@ -4093,9 +4091,8 @@ int cf_check sh_audit_l2_table(struct vc
 }
 
 #if GUEST_PAGING_LEVELS >= 4
-int cf_check sh_audit_l3_table(struct vcpu *v, mfn_t sl3mfn, mfn_t x)
+int cf_check sh_audit_l3_table(struct domain *d, mfn_t sl3mfn, mfn_t x)
 {
-    struct domain *d = v->domain;
     guest_l3e_t *gl3e, *gp;
     shadow_l3e_t *sl3e;
     mfn_t mfn, gmfn, gl3mfn;
@@ -4141,9 +4138,8 @@ int cf_check sh_audit_l3_table(struct vc
     return 0;
 }
 
-int cf_check sh_audit_l4_table(struct vcpu *v, mfn_t sl4mfn, mfn_t x)
+int cf_check sh_audit_l4_table(struct domain *d, mfn_t sl4mfn, mfn_t x)
 {
-    struct domain *d = v->domain;
     guest_l4e_t *gl4e, *gp;
     shadow_l4e_t *sl4e;
     mfn_t mfn, gmfn, gl4mfn;
--- a/xen/arch/x86/mm/shadow/multi.h
+++ b/xen/arch/x86/mm/shadow/multi.h
@@ -83,19 +83,19 @@ SHADOW_INTERNAL_NAME(sh_remove_l3_shadow
 #if SHADOW_AUDIT & SHADOW_AUDIT_ENTRIES
 int cf_check
 SHADOW_INTERNAL_NAME(sh_audit_l1_table, GUEST_LEVELS)
-    (struct vcpu *v, mfn_t sl1mfn, mfn_t x);
+    (struct domain *d, mfn_t sl1mfn, mfn_t x);
 int cf_check
 SHADOW_INTERNAL_NAME(sh_audit_fl1_table, GUEST_LEVELS)
-    (struct vcpu *v, mfn_t sl1mfn, mfn_t x);
+    (struct domain *d, mfn_t sl1mfn, mfn_t x);
 int cf_check
 SHADOW_INTERNAL_NAME(sh_audit_l2_table, GUEST_LEVELS)
-    (struct vcpu *v, mfn_t sl2mfn, mfn_t x);
+    (struct domain *d, mfn_t sl2mfn, mfn_t x);
 int cf_check
 SHADOW_INTERNAL_NAME(sh_audit_l3_table, GUEST_LEVELS)
-    (struct vcpu *v, mfn_t sl3mfn, mfn_t x);
+    (struct domain *d, mfn_t sl3mfn, mfn_t x);
 int cf_check
 SHADOW_INTERNAL_NAME(sh_audit_l4_table, GUEST_LEVELS)
-    (struct vcpu *v, mfn_t sl4mfn, mfn_t x);
+    (struct domain *d, mfn_t sl4mfn, mfn_t x);
 #endif
 
 extern const struct paging_mode




* [PATCH 04/11] x86/shadow: rename hash_domain_foreach()
  2023-01-05 15:57 [PATCH 00/11] x86/shadow: misc tidying Jan Beulich
                   ` (2 preceding siblings ...)
  2023-01-05 15:59 ` [PATCH 03/11] x86/shadow: drop hash_vcpu_foreach() Jan Beulich
@ 2023-01-05 16:00 ` Jan Beulich
  2023-01-05 16:04 ` [PATCH 05/11] x86/shadow: move bogus HVM checks in sh_pagetable_dying() Jan Beulich
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2023-01-05 16:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Tim Deegan, George Dunlap

The "domain" in there has become meaningless; drop it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1640,15 +1640,15 @@ bool shadow_hash_delete(struct domain *d
     return true;
 }
 
-typedef int (*hash_domain_callback_t)(struct domain *d, mfn_t smfn, mfn_t other_mfn);
+typedef int (*hash_callback_t)(struct domain *d, mfn_t smfn, mfn_t other_mfn);
 
 #define HASH_CALLBACKS_CHECK(mask) \
     BUILD_BUG_ON((mask) > (1U << ARRAY_SIZE(callbacks)) - 1)
 
-static void hash_domain_foreach(struct domain *d,
-                                unsigned int callback_mask,
-                                const hash_domain_callback_t callbacks[],
-                                mfn_t callback_mfn)
+static void hash_foreach(struct domain *d,
+                         unsigned int callback_mask,
+                         const hash_callback_t callbacks[],
+                         mfn_t callback_mfn)
 /* Walk the hash table looking at the types of the entries and
  * calling the appropriate callback function for each entry.
  * The mask determines which shadow types we call back for, and the array
@@ -1784,7 +1784,7 @@ int sh_remove_write_access(struct domain
                            unsigned long fault_addr)
 {
     /* Dispatch table for getting per-type functions */
-    static const hash_domain_callback_t callbacks[SH_type_unused] = {
+    static const hash_callback_t callbacks[SH_type_unused] = {
 #ifdef CONFIG_HVM
         [SH_type_l1_32_shadow] = SHADOW_INTERNAL_NAME(sh_rm_write_access_from_l1, 2),
         [SH_type_fl1_32_shadow] = SHADOW_INTERNAL_NAME(sh_rm_write_access_from_l1, 2),
@@ -1969,7 +1969,7 @@ int sh_remove_write_access(struct domain
     else
         perfc_incr(shadow_writeable_bf);
     HASH_CALLBACKS_CHECK(SHF_L1_ANY | SHF_FL1_ANY);
-    hash_domain_foreach(d, SHF_L1_ANY | SHF_FL1_ANY, callbacks, gmfn);
+    hash_foreach(d, SHF_L1_ANY | SHF_FL1_ANY, callbacks, gmfn);
 
     /* If that didn't catch the mapping, then there's some non-pagetable
      * mapping -- ioreq page, grant mapping, &c. */
@@ -1998,7 +1998,7 @@ int sh_remove_all_mappings(struct domain
     struct page_info *page = mfn_to_page(gmfn);
 
     /* Dispatch table for getting per-type functions */
-    static const hash_domain_callback_t callbacks[SH_type_unused] = {
+    static const hash_callback_t callbacks[SH_type_unused] = {
 #ifdef CONFIG_HVM
         [SH_type_l1_32_shadow] = SHADOW_INTERNAL_NAME(sh_rm_mappings_from_l1, 2),
         [SH_type_fl1_32_shadow] = SHADOW_INTERNAL_NAME(sh_rm_mappings_from_l1, 2),
@@ -2024,7 +2024,7 @@ int sh_remove_all_mappings(struct domain
     /* Brute-force search of all the shadows, by walking the hash */
     perfc_incr(shadow_mappings_bf);
     HASH_CALLBACKS_CHECK(SHF_L1_ANY | SHF_FL1_ANY);
-    hash_domain_foreach(d, SHF_L1_ANY | SHF_FL1_ANY, callbacks, gmfn);
+    hash_foreach(d, SHF_L1_ANY | SHF_FL1_ANY, callbacks, gmfn);
 
     /* If that didn't catch the mapping, something is very wrong */
     if ( !sh_check_page_has_no_refs(page) )
@@ -2132,7 +2132,7 @@ void sh_remove_shadows(struct domain *d,
 
     /* Dispatch table for getting per-type functions: each level must
      * be called with the function to remove a lower-level shadow. */
-    static const hash_domain_callback_t callbacks[SH_type_unused] = {
+    static const hash_callback_t callbacks[SH_type_unused] = {
 #ifdef CONFIG_HVM
         [SH_type_l2_32_shadow] = SHADOW_INTERNAL_NAME(sh_remove_l1_shadow, 2),
         [SH_type_l2_pae_shadow] = SHADOW_INTERNAL_NAME(sh_remove_l1_shadow, 3),
@@ -2177,9 +2177,9 @@ void sh_remove_shadows(struct domain *d,
 
     /*
      * Lower-level shadows need to be excised from upper-level shadows. This
-     * call to hash_domain_foreach() looks dangerous but is in fact OK: each
-     * call will remove at most one shadow, and terminate immediately when
-     * it does remove it, so we never walk the hash after doing a deletion.
+     * call to hash_foreach() looks dangerous but is in fact OK: each call
+     * will remove at most one shadow, and terminate immediately when it does
+     * remove it, so we never walk the hash after doing a deletion.
      */
 #define DO_UNSHADOW(_type) do {                                         \
     t = (_type);                                                        \
@@ -2203,7 +2203,7 @@ void sh_remove_shadows(struct domain *d,
          (pg->shadow_flags & (1 << t)) )                                \
     {                                                                   \
         HASH_CALLBACKS_CHECK(SHF_page_type_mask);                       \
-        hash_domain_foreach(d, masks[t], callbacks, smfn);              \
+        hash_foreach(d, masks[t], callbacks, smfn);                     \
     }                                                                   \
 } while (0)
 
@@ -3167,7 +3167,7 @@ int shadow_domctl(struct domain *d,
 void shadow_audit_tables(struct vcpu *v)
 {
     /* Dispatch table for getting per-type functions */
-    static const hash_domain_callback_t callbacks[SH_type_unused] = {
+    static const hash_callback_t callbacks[SH_type_unused] = {
 #if SHADOW_AUDIT & (SHADOW_AUDIT_ENTRIES | SHADOW_AUDIT_ENTRIES_FULL)
 # ifdef CONFIG_HVM
         [SH_type_l1_32_shadow] = SHADOW_INTERNAL_NAME(sh_audit_l1_table, 2),
@@ -3214,7 +3214,7 @@ void shadow_audit_tables(struct vcpu *v)
     HASH_CALLBACKS_CHECK(SHADOW_AUDIT & (SHADOW_AUDIT_ENTRIES |
                                          SHADOW_AUDIT_ENTRIES_FULL)
                          ? SHF_page_type_mask : 0);
-    hash_domain_foreach(v->domain, mask, callbacks, INVALID_MFN);
+    hash_foreach(v->domain, mask, callbacks, INVALID_MFN);
 }
 
 #ifdef CONFIG_PV




* [PATCH 05/11] x86/shadow: move bogus HVM checks in sh_pagetable_dying()
  2023-01-05 15:57 [PATCH 00/11] x86/shadow: misc tidying Jan Beulich
                   ` (3 preceding siblings ...)
  2023-01-05 16:00 ` [PATCH 04/11] x86/shadow: rename hash_domain_foreach() Jan Beulich
@ 2023-01-05 16:04 ` Jan Beulich
  2023-01-06  1:00   ` Andrew Cooper
  2023-01-05 16:04 ` [PATCH 06/11] x86/shadow: drop a few uses of mfn_valid() Jan Beulich
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2023-01-05 16:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Tim Deegan, George Dunlap

Perhaps these should have been dropped right in 2fb2dee1ac62 ("x86/mm:
pagetable_dying() is HVM-only"). Convert both to assertions, noting that
in particular the one in the 3-level variant of the function comes too
late anyway - first thing there we access the HVM part of a union.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
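---
Concretely, the 3-level variant of the function starts out with

    gcr3 = v->arch.hvm.guest_cr[3];

i.e. the HVM member of the union in struct arch_vcpu is dereferenced
well before the point where the old is_hvm_domain() check sat.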

--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3780,6 +3780,8 @@ static void cf_check sh_pagetable_dying(
     unsigned long l3gfn;
     mfn_t l3mfn;
 
+    ASSERT(is_hvm_domain(d));
+
     gcr3 = v->arch.hvm.guest_cr[3];
     /* fast path: the pagetable belongs to the current context */
     if ( gcr3 == gpa )
@@ -3822,7 +3824,7 @@ static void cf_check sh_pagetable_dying(
                    : shadow_hash_lookup(d, mfn_x(gmfn), SH_type_l2_pae_shadow);
         }
 
-        if ( mfn_valid(smfn) && is_hvm_domain(d) )
+        if ( mfn_valid(smfn) )
         {
             gmfn = _mfn(mfn_to_page(smfn)->v.sh.back);
             mfn_to_page(gmfn)->pagetable_dying = true;
@@ -3854,6 +3856,8 @@ static void cf_check sh_pagetable_dying(
     mfn_t smfn, gmfn;
     p2m_type_t p2mt;
 
+    ASSERT(is_hvm_domain(d));
+
     gmfn = get_gfn_query(d, _gfn(gpa >> PAGE_SHIFT), &p2mt);
     paging_lock(d);
 
@@ -3863,7 +3867,7 @@ static void cf_check sh_pagetable_dying(
     smfn = shadow_hash_lookup(d, mfn_x(gmfn), SH_type_l4_64_shadow);
 #endif
 
-    if ( mfn_valid(smfn) && is_hvm_domain(d) )
+    if ( mfn_valid(smfn) )
     {
         mfn_to_page(gmfn)->pagetable_dying = true;
         shadow_unhook_mappings(d, smfn, 1/* user pages only */);




* [PATCH 06/11] x86/shadow: drop a few uses of mfn_valid()
  2023-01-05 15:57 [PATCH 00/11] x86/shadow: misc tidying Jan Beulich
                   ` (4 preceding siblings ...)
  2023-01-05 16:04 ` [PATCH 05/11] x86/shadow: move bogus HVM checks in sh_pagetable_dying() Jan Beulich
@ 2023-01-05 16:04 ` Jan Beulich
  2023-01-06  1:02   ` Andrew Cooper
  2023-01-05 16:05 ` [PATCH 07/11] x86/shadow: L2H shadow type is PV32-only Jan Beulich
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2023-01-05 16:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Tim Deegan, George Dunlap

v->arch.paging.shadow.shadow_table[], v->arch.paging.shadow.oos[],
v->arch.paging.shadow.oos_{snapshot[],fixup[].smfn[]} as well as the
hash table are all only ever written with valid MFNs or INVALID_MFN.
Avoid the somewhat expensive mfn_valid() when checking MFNs coming from
these arrays.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
There are many more uses which can likely be replaced, but I think we're
better off doing this in piecemeal fashion.
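
For a rough idea of the cost difference: mfn_eq(..., INVALID_MFN) is a
single integer comparison, whereas x86's mfn_valid() boils down to
something like the following (a from-memory sketch, not the verbatim
definition):

    static inline int __mfn_valid(unsigned long mfn)
    {
        return likely(mfn < max_page) &&
               likely(!(mfn & pfn_hole_mask)) &&
               likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT,
                               pdx_group_valid));
    }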

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -171,7 +171,7 @@ static void sh_oos_audit(struct domain *
         for ( idx = 0; idx < SHADOW_OOS_PAGES; idx++ )
         {
             mfn_t *oos = v->arch.paging.shadow.oos;
-            if ( !mfn_valid(oos[idx]) )
+            if ( mfn_eq(oos[idx], INVALID_MFN) )
                 continue;
 
             expected_idx = mfn_x(oos[idx]) % SHADOW_OOS_PAGES;
@@ -327,8 +327,7 @@ void oos_fixup_add(struct domain *d, mfn
             int i;
             for ( i = 0; i < SHADOW_OOS_FIXUPS; i++ )
             {
-                if ( mfn_valid(oos_fixup[idx].smfn[i])
-                     && mfn_eq(oos_fixup[idx].smfn[i], smfn)
+                if ( mfn_eq(oos_fixup[idx].smfn[i], smfn)
                      && (oos_fixup[idx].off[i] == off) )
                     return;
             }
@@ -461,7 +460,7 @@ static void oos_hash_add(struct vcpu *v,
     idx = mfn_x(gmfn) % SHADOW_OOS_PAGES;
     oidx = idx;
 
-    if ( mfn_valid(oos[idx])
+    if ( !mfn_eq(oos[idx], INVALID_MFN)
          && (mfn_x(oos[idx]) % SHADOW_OOS_PAGES) == idx )
     {
         /* Punt the current occupant into the next slot */
@@ -470,8 +469,8 @@ static void oos_hash_add(struct vcpu *v,
         swap = 1;
         idx = (idx + 1) % SHADOW_OOS_PAGES;
     }
-    if ( mfn_valid(oos[idx]) )
-   {
+    if ( !mfn_eq(oos[idx], INVALID_MFN) )
+    {
         /* Crush the current occupant. */
         _sh_resync(v, oos[idx], &oos_fixup[idx], oos_snapshot[idx]);
         perfc_incr(shadow_unsync_evict);
@@ -607,7 +606,7 @@ void sh_resync_all(struct vcpu *v, int s
 
     /* First: resync all of this vcpu's oos pages */
     for ( idx = 0; idx < SHADOW_OOS_PAGES; idx++ )
-        if ( mfn_valid(oos[idx]) )
+        if ( !mfn_eq(oos[idx], INVALID_MFN) )
         {
             /* Write-protect and sync contents */
             _sh_resync(v, oos[idx], &oos_fixup[idx], oos_snapshot[idx]);
@@ -630,7 +629,7 @@ void sh_resync_all(struct vcpu *v, int s
 
         for ( idx = 0; idx < SHADOW_OOS_PAGES; idx++ )
         {
-            if ( !mfn_valid(oos[idx]) )
+            if ( mfn_eq(oos[idx], INVALID_MFN) )
                 continue;
 
             if ( skip )
@@ -2187,7 +2186,7 @@ void sh_remove_shadows(struct domain *d,
          !(pg->shadow_flags & (1 << t)) )                               \
         break;                                                          \
     smfn = shadow_hash_lookup(d, mfn_x(gmfn), t);                       \
-    if ( unlikely(!mfn_valid(smfn)) )                                   \
+    if ( unlikely(mfn_eq(smfn, INVALID_MFN)) )                          \
     {                                                                   \
         printk(XENLOG_G_ERR "gmfn %"PRI_mfn" has flags %#x"             \
                " but no type-%#x shadow\n",                             \
@@ -2755,7 +2754,7 @@ void shadow_teardown(struct domain *d, b
             int i;
             mfn_t *oos_snapshot = v->arch.paging.shadow.oos_snapshot;
             for ( i = 0; i < SHADOW_OOS_PAGES; i++ )
-                if ( mfn_valid(oos_snapshot[i]) )
+                if ( !mfn_eq(oos_snapshot[i], INVALID_MFN) )
                 {
                     shadow_free(d, oos_snapshot[i]);
                     oos_snapshot[i] = INVALID_MFN;
@@ -2938,7 +2937,7 @@ static int shadow_one_bit_disable(struct
                 int i;
                 mfn_t *oos_snapshot = v->arch.paging.shadow.oos_snapshot;
                 for ( i = 0; i < SHADOW_OOS_PAGES; i++ )
-                    if ( mfn_valid(oos_snapshot[i]) )
+                    if ( !mfn_eq(oos_snapshot[i], INVALID_MFN) )
                     {
                         shadow_free(d, oos_snapshot[i]);
                         oos_snapshot[i] = INVALID_MFN;
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -110,7 +110,7 @@ get_fl1_shadow_status(struct domain *d,
 /* Look for FL1 shadows in the hash table */
 {
     mfn_t smfn = shadow_hash_lookup(d, gfn_x(gfn), SH_type_fl1_shadow);
-    ASSERT(!mfn_valid(smfn) || mfn_to_page(smfn)->u.sh.head);
+    ASSERT(mfn_eq(smfn, INVALID_MFN) || mfn_to_page(smfn)->u.sh.head);
     return smfn;
 }
 
@@ -2680,7 +2680,7 @@ static int cf_check sh_page_fault(
                 mfn_t smfn = pagetable_get_mfn(
                                  v->arch.paging.shadow.shadow_table[i]);
 
-                if ( mfn_valid(smfn) && (mfn_x(smfn) != 0) )
+                if ( mfn_x(smfn) )
                 {
                     used |= (mfn_to_page(smfn)->v.sh.back == mfn_x(gmfn));
 
@@ -3824,7 +3824,7 @@ static void cf_check sh_pagetable_dying(
                    : shadow_hash_lookup(d, mfn_x(gmfn), SH_type_l2_pae_shadow);
         }
 
-        if ( mfn_valid(smfn) )
+        if ( !mfn_eq(smfn, INVALID_MFN) )
         {
             gmfn = _mfn(mfn_to_page(smfn)->v.sh.back);
             mfn_to_page(gmfn)->pagetable_dying = true;
@@ -3867,7 +3867,7 @@ static void cf_check sh_pagetable_dying(
     smfn = shadow_hash_lookup(d, mfn_x(gmfn), SH_type_l4_64_shadow);
 #endif
 
-    if ( mfn_valid(smfn) )
+    if ( !mfn_eq(smfn, INVALID_MFN) )
     {
         mfn_to_page(gmfn)->pagetable_dying = true;
         shadow_unhook_mappings(d, smfn, 1/* user pages only */);
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -770,8 +770,10 @@ get_shadow_status(struct domain *d, mfn_
 /* Look for shadows in the hash table */
 {
     mfn_t smfn = shadow_hash_lookup(d, mfn_x(gmfn), shadow_type);
-    ASSERT(!mfn_valid(smfn) || mfn_to_page(smfn)->u.sh.head);
+
+    ASSERT(mfn_eq(smfn, INVALID_MFN) || mfn_to_page(smfn)->u.sh.head);
     perfc_incr(shadow_get_shadow_status);
+
     return smfn;
 }
 




* [PATCH 07/11] x86/shadow: L2H shadow type is PV32-only
  2023-01-05 15:57 [PATCH 00/11] x86/shadow: misc tidying Jan Beulich
                   ` (5 preceding siblings ...)
  2023-01-05 16:04 ` [PATCH 06/11] x86/shadow: drop a few uses of mfn_valid() Jan Beulich
@ 2023-01-05 16:05 ` Jan Beulich
  2023-01-06  1:31   ` Andrew Cooper
  2023-01-05 16:05 ` [PATCH 08/11] x86/shadow: reduce effort of hash calculation Jan Beulich
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2023-01-05 16:05 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Tim Deegan, George Dunlap

Like for the various HVM-only types, save a little bit of code by suitably
"masking" this type out when !PV32.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I wasn't really sure whether it would be worthwhile to also update the
"#else" part of shadow_size(). Doing so would be a little tricky, as the
type to return 0 for has no name right now; I'd need to move down the
#undef to allow for that. Thoughts?

In the 4-level variant of SHADOW_FOREACH_L2E() I was heavily inclined to
also pull the entire loop-invariant part of the condition out of the
loop (part of which needs touching here anyway). But in the end I guess
this is better done as a separate change.
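
As to the "masking" itself: with SHF_L2H_64 defined to 0 when !PV32
(see the private.h hunk), mask expressions naming it simply collapse
at compile time; e.g. the masks[] entry

    [SH_type_l1_64_shadow] = SHF_L2H_64 | SHF_L2_64,

degenerates to plain SHF_L2_64, which is why such mask uses need no
#ifdef of their own.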

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1740,9 +1740,11 @@ void sh_destroy_shadow(struct domain *d,
     case SH_type_fl1_64_shadow:
         SHADOW_INTERNAL_NAME(sh_destroy_l1_shadow, 4)(d, smfn);
         break;
+#ifdef CONFIG_PV32
     case SH_type_l2h_64_shadow:
         ASSERT(is_pv_32bit_domain(d));
         /* Fall through... */
+#endif
     case SH_type_l2_64_shadow:
         SHADOW_INTERNAL_NAME(sh_destroy_l2_shadow, 4)(d, smfn);
         break;
@@ -2099,7 +2101,9 @@ static int sh_remove_shadow_via_pointer(
 #endif
     case SH_type_l1_64_shadow:
     case SH_type_l2_64_shadow:
+#ifdef CONFIG_PV32
     case SH_type_l2h_64_shadow:
+#endif
     case SH_type_l3_64_shadow:
     case SH_type_l4_64_shadow:
         SHADOW_INTERNAL_NAME(sh_clear_shadow_entry, 4)(d, vaddr, pmfn);
@@ -2137,7 +2141,9 @@ void sh_remove_shadows(struct domain *d,
         [SH_type_l2_pae_shadow] = SHADOW_INTERNAL_NAME(sh_remove_l1_shadow, 3),
 #endif
         [SH_type_l2_64_shadow] = SHADOW_INTERNAL_NAME(sh_remove_l1_shadow, 4),
+#ifdef CONFIG_PV32
         [SH_type_l2h_64_shadow] = SHADOW_INTERNAL_NAME(sh_remove_l1_shadow, 4),
+#endif
         [SH_type_l3_64_shadow] = SHADOW_INTERNAL_NAME(sh_remove_l2_shadow, 4),
         [SH_type_l4_64_shadow] = SHADOW_INTERNAL_NAME(sh_remove_l3_shadow, 4),
     };
@@ -2150,7 +2156,9 @@ void sh_remove_shadows(struct domain *d,
 #endif
         [SH_type_l1_64_shadow] = SHF_L2H_64 | SHF_L2_64,
         [SH_type_l2_64_shadow] = SHF_L3_64,
+#ifdef CONFIG_PV32
         [SH_type_l2h_64_shadow] = SHF_L3_64,
+#endif
         [SH_type_l3_64_shadow] = SHF_L4_64,
     };
 
@@ -2214,7 +2222,9 @@ void sh_remove_shadows(struct domain *d,
 #endif
     DO_UNSHADOW(SH_type_l4_64_shadow);
     DO_UNSHADOW(SH_type_l3_64_shadow);
+#ifdef CONFIG_PV32
     DO_UNSHADOW(SH_type_l2h_64_shadow);
+#endif
     DO_UNSHADOW(SH_type_l2_64_shadow);
     DO_UNSHADOW(SH_type_l1_64_shadow);
 
@@ -3179,7 +3189,9 @@ void shadow_audit_tables(struct vcpu *v)
         [SH_type_l1_64_shadow] = SHADOW_INTERNAL_NAME(sh_audit_l1_table, 4),
         [SH_type_fl1_64_shadow] = SHADOW_INTERNAL_NAME(sh_audit_fl1_table, 4),
         [SH_type_l2_64_shadow] = SHADOW_INTERNAL_NAME(sh_audit_l2_table, 4),
+# ifdef CONFIG_PV32
         [SH_type_l2h_64_shadow] = SHADOW_INTERNAL_NAME(sh_audit_l2_table, 4),
+# endif
         [SH_type_l3_64_shadow] = SHADOW_INTERNAL_NAME(sh_audit_l3_table, 4),
         [SH_type_l4_64_shadow] = SHADOW_INTERNAL_NAME(sh_audit_l4_table, 4),
 #endif
--- a/xen/arch/x86/mm/shadow/hvm.c
+++ b/xen/arch/x86/mm/shadow/hvm.c
@@ -56,7 +56,6 @@ const uint8_t sh_type_to_size[] = {
     [SH_type_l1_64_shadow]   = 1,
     [SH_type_fl1_64_shadow]  = 1,
     [SH_type_l2_64_shadow]   = 1,
-    [SH_type_l2h_64_shadow]  = 1,
     [SH_type_l3_64_shadow]   = 1,
     [SH_type_l4_64_shadow]   = 1,
     [SH_type_p2m_table]      = 1,
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -97,6 +97,13 @@ static void sh_flush_local(const struct
     flush_local(guest_flush_tlb_flags(d));
 }
 
+#if GUEST_PAGING_LEVELS >= 4 && defined(CONFIG_PV32)
+#define ASSERT_VALID_L2(t) \
+    ASSERT((t) == SH_type_l2_shadow || (t) == SH_type_l2h_shadow)
+#else
+#define ASSERT_VALID_L2(t) ASSERT((t) == SH_type_l2_shadow)
+#endif
+
 /**************************************************************************/
 /* Hash table mapping from guest pagetables to shadows
  *
@@ -337,7 +344,7 @@ static void sh_audit_gw(struct vcpu *v,
         if ( mfn_valid((smfn = get_shadow_status(d, gw->l2mfn,
                                                  SH_type_l2_shadow))) )
             sh_audit_l2_table(d, smfn, INVALID_MFN);
-#if GUEST_PAGING_LEVELS >= 4 /* 32-bit PV only */
+#if GUEST_PAGING_LEVELS >= 4 && defined(CONFIG_PV32)
         if ( mfn_valid((smfn = get_shadow_status(d, gw->l2mfn,
                                                  SH_type_l2h_shadow))) )
             sh_audit_l2_table(d, smfn, INVALID_MFN);
@@ -859,13 +866,12 @@ do {
     int _i;                                                                 \
     int _xen = !shadow_mode_external(_dom);                                 \
     shadow_l2e_t *_sp = map_domain_page((_sl2mfn));                         \
-    ASSERT(mfn_to_page(_sl2mfn)->u.sh.type == SH_type_l2_64_shadow ||\
-           mfn_to_page(_sl2mfn)->u.sh.type == SH_type_l2h_64_shadow);\
+    ASSERT_VALID_L2(mfn_to_page(_sl2mfn)->u.sh.type);                       \
     for ( _i = 0; _i < SHADOW_L2_PAGETABLE_ENTRIES; _i++ )                  \
     {                                                                       \
         if ( (!(_xen))                                                      \
              || !is_pv_32bit_domain(_dom)                                   \
-             || mfn_to_page(_sl2mfn)->u.sh.type != SH_type_l2h_64_shadow    \
+             || mfn_to_page(_sl2mfn)->u.sh.type == SH_type_l2_64_shadow     \
              || (_i < COMPAT_L2_PAGETABLE_FIRST_XEN_SLOT(_dom)) )           \
         {                                                                   \
             (_sl2e) = _sp + _i;                                             \
@@ -992,6 +998,7 @@ sh_make_shadow(struct vcpu *v, mfn_t gmf
         }
         break;
 
+#ifdef CONFIG_PV32
         case SH_type_l2h_shadow:
             BUILD_BUG_ON(sizeof(l2_pgentry_t) != sizeof(shadow_l2e_t));
             if ( is_pv_32bit_domain(d) )
@@ -1002,6 +1009,8 @@ sh_make_shadow(struct vcpu *v, mfn_t gmf
                 unmap_domain_page(l2t);
             }
             break;
+#endif
+
         default: /* Do nothing */ break;
         }
     }
@@ -1123,11 +1132,13 @@ static shadow_l2e_t * shadow_get_and_cre
         shadow_l3e_t new_sl3e;
         unsigned int t = SH_type_l2_shadow;
 
+#ifdef CONFIG_PV32
         /* Tag compat L2 containing hypervisor (m2p) mappings */
         if ( is_pv_32bit_domain(d) &&
              guest_l4_table_offset(gw->va) == 0 &&
              guest_l3_table_offset(gw->va) == 3 )
             t = SH_type_l2h_shadow;
+#endif
 
         /* No l2 shadow installed: find and install it. */
         *sl2mfn = get_shadow_status(d, gw->l2mfn, t);
@@ -1337,11 +1348,7 @@ void sh_destroy_l2_shadow(struct domain
 
     SHADOW_DEBUG(DESTROY_SHADOW, "%"PRI_mfn"\n", mfn_x(smfn));
 
-#if GUEST_PAGING_LEVELS >= 4
-    ASSERT(t == SH_type_l2_shadow || t == SH_type_l2h_shadow);
-#else
-    ASSERT(t == SH_type_l2_shadow);
-#endif
+    ASSERT_VALID_L2(t);
     ASSERT(sp->u.sh.head);
 
     /* Record that the guest page isn't shadowed any more (in this type) */
@@ -1865,7 +1872,7 @@ int
 sh_map_and_validate_gl2he(struct vcpu *v, mfn_t gl2mfn,
                            void *new_gl2p, u32 size)
 {
-#if GUEST_PAGING_LEVELS >= 4
+#if GUEST_PAGING_LEVELS >= 4 && defined(CONFIG_PV32)
     return sh_map_and_validate(v, gl2mfn, new_gl2p, size,
                                 SH_type_l2h_shadow,
                                 shadow_l2_index,
@@ -3674,7 +3681,7 @@ void sh_clear_shadow_entry(struct domain
         shadow_set_l1e(d, ep, shadow_l1e_empty(), p2m_invalid, smfn);
         break;
     case SH_type_l2_shadow:
-#if GUEST_PAGING_LEVELS >= 4
+#if GUEST_PAGING_LEVELS >= 4 && defined(CONFIG_PV32)
     case SH_type_l2h_shadow:
 #endif
         shadow_set_l2e(d, ep, shadow_l2e_empty(), smfn);
@@ -4124,14 +4131,16 @@ int cf_check sh_audit_l3_table(struct do
 
         if ( SHADOW_AUDIT & SHADOW_AUDIT_ENTRIES_MFNS )
         {
+            unsigned int t = SH_type_l2_shadow;
+
             gfn = guest_l3e_get_gfn(*gl3e);
             mfn = shadow_l3e_get_mfn(*sl3e);
-            gmfn = get_shadow_status(d, get_gfn_query_unlocked(
-                                        d, gfn_x(gfn), &p2mt),
-                                     (is_pv_32bit_domain(d) &&
-                                      guest_index(gl3e) == 3)
-                                     ? SH_type_l2h_shadow
-                                     : SH_type_l2_shadow);
+#ifdef CONFIG_PV32
+            if ( guest_index(gl3e) == 3 && is_pv_32bit_domain(d) )
+                t = SH_type_l2h_shadow;
+#endif
+            gmfn = get_shadow_status(
+                       d, get_gfn_query_unlocked(d, gfn_x(gfn), &p2mt), t);
             if ( !mfn_eq(gmfn, mfn) )
                 AUDIT_FAIL(3, "bad translation: gfn %" SH_PRI_gfn
                            " --> %" PRI_mfn " != mfn %" PRI_mfn,
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -209,6 +209,10 @@ extern void shadow_audit_tables(struct v
 #define SH_type_unused        10U
 #endif
 
+#ifndef CONFIG_PV32 /* Unused (but uglier to #ifdef above): */
+#undef SH_type_l2h_64_shadow
+#endif
+
 /*
  * What counts as a pinnable shadow?
  */
@@ -286,7 +290,11 @@ static inline void sh_terminate_list(str
 #define SHF_L1_64   (1u << SH_type_l1_64_shadow)
 #define SHF_FL1_64  (1u << SH_type_fl1_64_shadow)
 #define SHF_L2_64   (1u << SH_type_l2_64_shadow)
+#ifdef CONFIG_PV32
 #define SHF_L2H_64  (1u << SH_type_l2h_64_shadow)
+#else
+#define SHF_L2H_64  0
+#endif
 #define SHF_L3_64   (1u << SH_type_l3_64_shadow)
 #define SHF_L4_64   (1u << SH_type_l4_64_shadow)
 




* [PATCH 08/11] x86/shadow: reduce effort of hash calculation
  2023-01-05 15:57 [PATCH 00/11] x86/shadow: misc tidying Jan Beulich
                   ` (6 preceding siblings ...)
  2023-01-05 16:05 ` [PATCH 07/11] x86/shadow: L2H shadow type is PV32-only Jan Beulich
@ 2023-01-05 16:05 ` Jan Beulich
  2023-01-06  2:03   ` Andrew Cooper
  2023-01-05 16:06 ` [PATCH 09/11] x86/shadow: simplify conditionals in sh_{get,put}_ref() Jan Beulich
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2023-01-05 16:05 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Tim Deegan, George Dunlap

The "n" input is a GFN value and hence bounded by the physical address
bits in use on a system. The hash quality won't improve by also
including the upper always-zero bits in the calculation. To keep things
as compile-time-constant as they were before, use PADDR_BITS (not
paddr_bits) for loop bounding. This reduces loop iterations from 8 to 5.
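
(For example: with x86's PADDR_BITS of 52 and PAGE_SHIFT of 12 this is
(52 - 12 + 7) / 8 = 5 bytes of the GFN fed into the hash, down from
sizeof(n) = 8. As an aside, (k << 6) + (k << 16) - k == k * 65599,
i.e. this is the multiplier of the well-known sdbm string hash.)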

While there, also drop the unnecessary cast to u32.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I was tempted to also change the types of "p" (pointer to const) and "i"
(unsigned) right here (and perhaps even the "byte" in the comment ahead
of the function), but then thought this might be going too far ...

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1400,7 +1400,11 @@ static inline key_t sh_hash(unsigned lon
     unsigned char *p = (unsigned char *)&n;
     key_t k = t;
     int i;
-    for ( i = 0; i < sizeof(n) ; i++ ) k = (u32)p[i] + (k<<6) + (k<<16) - k;
+
+    BUILD_BUG_ON(PADDR_BITS > BITS_PER_LONG + PAGE_SHIFT);
+    for ( i = 0; i < (PADDR_BITS - PAGE_SHIFT + 7) / 8; i++ )
+        k = p[i] + (k << 6) + (k << 16) - k;
+
     return k % SHADOW_HASH_BUCKETS;
 }
 




* [PATCH 09/11] x86/shadow: simplify conditionals in sh_{get,put}_ref()
  2023-01-05 15:57 [PATCH 00/11] x86/shadow: misc tidying Jan Beulich
                   ` (7 preceding siblings ...)
  2023-01-05 16:05 ` [PATCH 08/11] x86/shadow: reduce effort of hash calculation Jan Beulich
@ 2023-01-05 16:06 ` Jan Beulich
  2023-01-05 16:07 ` [PATCH 10/11] x86/shadow: correct shadow type bounds checks Jan Beulich
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2023-01-05 16:06 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Tim Deegan, George Dunlap

In both cases the "entry_pa != 0" check is redundant; storing 0 when the
field already is 0 is quite fine. Move the cheaper remaining part first
in sh_get_ref(). In sh_put_ref() convert the has-up-pointer check into
an assertion (requiring the zero check to be retained there).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: Strictly speaking, accessing ->up ahead of checking that the type
     actually has an "up" pointer is UB, as only the most recently
     written field of a union may be read. But we have violations of
     this rule in many other places, so I guess we can assume to be
     okay-ish here as well.

--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -586,9 +586,7 @@ static inline int sh_get_ref(struct doma
     sp->u.sh.count = nx;
 
     /* We remember the first shadow entry that points to each shadow. */
-    if ( entry_pa != 0
-         && sh_type_has_up_pointer(d, sp->u.sh.type)
-         && sp->up == 0 )
+    if ( !sp->up && sh_type_has_up_pointer(d, sp->u.sh.type) )
         sp->up = entry_pa;
 
     return 1;
@@ -607,10 +605,11 @@ static inline void sh_put_ref(struct dom
     ASSERT(!(sp->count_info & PGC_count_mask));
 
     /* If this is the entry in the up-pointer, remove it */
-    if ( entry_pa != 0
-         && sh_type_has_up_pointer(d, sp->u.sh.type)
-         && sp->up == entry_pa )
+    if ( sp->up == entry_pa )
+    {
+        ASSERT(!entry_pa || sh_type_has_up_pointer(d, sp->u.sh.type));
         sp->up = 0;
+    }
 
     x = sp->u.sh.count;
     nx = x - 1;




* [PATCH 10/11] x86/shadow: correct shadow type bounds checks
  2023-01-05 15:57 [PATCH 00/11] x86/shadow: misc tidying Jan Beulich
                   ` (8 preceding siblings ...)
  2023-01-05 16:06 ` [PATCH 09/11] x86/shadow: simplify conditionals in sh_{get,put}_ref() Jan Beulich
@ 2023-01-05 16:07 ` Jan Beulich
  2023-01-05 16:07 ` [PATCH 11/11] x86/shadow: sh_remove_all_mappings() is HVM-only Jan Beulich
  2023-01-06  2:03 ` [PATCH 00/11] x86/shadow: misc tidying Andrew Cooper
  11 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2023-01-05 16:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Tim Deegan, George Dunlap

In sh_remove_shadow_via_pointer() the type range checks, besides being
bogus (should be ">= min && <= max"), are fully redundant with the has-
up-pointer assertion. In sh_hash_audit_bucket() properly use "min"
instead of assuming a certain order of type numbers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
While style is wrong for the BUG_ON(), keep that aspect as is because of
all the neighboring ones.
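
For reference, had the range checks been meant as proper bounds checks,
they would have needed to be (hypothetical, as they're simply dropped
instead):

    ASSERT(sp->u.sh.type >= SH_type_min_shadow);
    ASSERT(sp->u.sh.type <= SH_type_max_shadow);

which is already implied by the retained sh_type_has_up_pointer()
assertion.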

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1425,7 +1425,7 @@ static void sh_hash_audit_bucket(struct
         /* Not a shadow? */
         BUG_ON( (sp->count_info & PGC_count_mask )!= 0 ) ;
         /* Bogus type? */
-        BUG_ON( sp->u.sh.type == 0 );
+        BUG_ON( sp->u.sh.type < SH_type_min_shadow );
         BUG_ON( sp->u.sh.type > SH_type_max_shadow );
         /* Wrong page of a multi-page shadow? */
         BUG_ON( !sp->u.sh.head );
@@ -2077,8 +2077,6 @@ static int sh_remove_shadow_via_pointer(
     l1_pgentry_t *vaddr;
     int rc;
 
-    ASSERT(sp->u.sh.type > 0);
-    ASSERT(sp->u.sh.type < SH_type_max_shadow);
     ASSERT(sh_type_has_up_pointer(d, sp->u.sh.type));
 
     if (sp->up == 0) return 0;




* [PATCH 11/11] x86/shadow: sh_remove_all_mappings() is HVM-only
  2023-01-05 15:57 [PATCH 00/11] x86/shadow: misc tidying Jan Beulich
                   ` (9 preceding siblings ...)
  2023-01-05 16:07 ` [PATCH 10/11] x86/shadow: correct shadow type bounds checks Jan Beulich
@ 2023-01-05 16:07 ` Jan Beulich
  2023-01-06  2:03 ` [PATCH 00/11] x86/shadow: misc tidying Andrew Cooper
  11 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2023-01-05 16:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Tim Deegan, George Dunlap

All callers live in hvm.c. Moving the function there is undesirable, as
hash walking is local to common.c and is probably best kept that way.
Hence move an #endif, which allows dropping an #ifdef.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
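---
In outline, the resulting layout of common.c is (a sketch, bodies
elided):

    #ifdef CONFIG_HVM
    ...
    int sh_remove_write_access(...) { ... }

    int sh_remove_all_mappings(...) { ... }  /* now inside the region */
    #endif /* CONFIG_HVM */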

--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1992,7 +1992,6 @@ int sh_remove_write_access(struct domain
     /* We killed at least one writeable mapping, so must flush TLBs. */
     return 1;
 }
-#endif /* CONFIG_HVM */
 
 /**************************************************************************/
 /* Remove all mappings of a guest frame from the shadow tables.
@@ -2004,12 +2003,10 @@ int sh_remove_all_mappings(struct domain
 
     /* Dispatch table for getting per-type functions */
     static const hash_callback_t callbacks[SH_type_unused] = {
-#ifdef CONFIG_HVM
         [SH_type_l1_32_shadow] = SHADOW_INTERNAL_NAME(sh_rm_mappings_from_l1, 2),
         [SH_type_fl1_32_shadow] = SHADOW_INTERNAL_NAME(sh_rm_mappings_from_l1, 2),
         [SH_type_l1_pae_shadow] = SHADOW_INTERNAL_NAME(sh_rm_mappings_from_l1, 3),
         [SH_type_fl1_pae_shadow] = SHADOW_INTERNAL_NAME(sh_rm_mappings_from_l1, 3),
-#endif
         [SH_type_l1_64_shadow] = SHADOW_INTERNAL_NAME(sh_rm_mappings_from_l1, 4),
         [SH_type_fl1_64_shadow] = SHADOW_INTERNAL_NAME(sh_rm_mappings_from_l1, 4),
     };
@@ -2064,6 +2061,7 @@ int sh_remove_all_mappings(struct domain
     return 1;
 }
 
+#endif /* CONFIG_HVM */
 
 /**************************************************************************/
 /* Remove all shadows of a guest frame from the shadow tables */




* Re: [PATCH 01/11] x86/shadow: replace sh_reset_l3_up_pointers()
  2023-01-05 15:59 ` [PATCH 01/11] x86/shadow: replace sh_reset_l3_up_pointers() Jan Beulich
@ 2023-01-06  0:55   ` Andrew Cooper
  2023-01-09  8:35     ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2023-01-06  0:55 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Wei Liu, Roger Pau Monne, Tim (Xen.org), George Dunlap

On 05/01/2023 3:59 pm, Jan Beulich wrote:
> Rather than doing a separate hash walk (and then even using the vCPU
> variant, which is to go away), do the up-pointer-clearing right in
> sh_unpin(), as an alternative to the (now further limited) enlisting on
> a "free floating" list fragment. This utilizes the fact that such list
> fragments are traversed only for multi-page shadows (in shadow_free()).
> Furthermore sh_terminate_list() is a safe guard only anyway, which isn't
> in use in the common case (it actually does anything only for BIGMEM
> configurations).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

I think.  The reasoning seems plausible, but it would probably benefit
from someone else double-checking.


* Re: [PATCH 05/11] x86/shadow: move bogus HVM checks in sh_pagetable_dying()
  2023-01-05 16:04 ` [PATCH 05/11] x86/shadow: move bogus HVM checks in sh_pagetable_dying() Jan Beulich
@ 2023-01-06  1:00   ` Andrew Cooper
  2023-01-09  8:39     ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2023-01-06  1:00 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Wei Liu, Roger Pau Monne, Tim (Xen.org), George Dunlap

On 05/01/2023 4:04 pm, Jan Beulich wrote:
> Perhaps these should have been dropped right in 2fb2dee1ac62 ("x86/mm:
> pagetable_dying() is HVM-only"). Convert both to assertions, noting that
> in particular the one in the 3-level variant of the function comes too

"came too late"?

It doesn't any more with this change in place.

> late anyway - first thing there we access the HVM part of a union.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 06/11] x86/shadow: drop a few uses of mfn_valid()
  2023-01-05 16:04 ` [PATCH 06/11] x86/shadow: drop a few uses of mfn_valid() Jan Beulich
@ 2023-01-06  1:02   ` Andrew Cooper
  2023-01-09  8:42     ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2023-01-06  1:02 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Wei Liu, Roger Pau Monne, Tim (Xen.org), George Dunlap

On 05/01/2023 4:04 pm, Jan Beulich wrote:
> v->arch.paging.shadow.shadow_table[], v->arch.paging.shadow.oos[],
> v->arch.paging.shadow.oos_{snapshot[],fixup[].smfn[]} as well as the

fixup[],smfn[] ?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 07/11] x86/shadow: L2H shadow type is PV32-only
  2023-01-05 16:05 ` [PATCH 07/11] x86/shadow: L2H shadow type is PV32-only Jan Beulich
@ 2023-01-06  1:31   ` Andrew Cooper
  2023-01-09  9:12     ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2023-01-06  1:31 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Wei Liu, Roger Pau Monne, Tim (Xen.org), George Dunlap

On 05/01/2023 4:05 pm, Jan Beulich wrote:
> Like for the various HVM-only types, save a little bit of code by suitably
> "masking" this type out when !PV32.

add/remove: 0/1 grow/shrink: 2/4 up/down: 544/-922 (-378)
Function                                     old     new   delta
sh_map_and_validate_gl2e__guest_4            136     666    +530
sh_destroy_shadow                            289     303     +14
sh_clear_shadow_entry__guest_4               178     176      -2
sh_validate_guest_entry                      521     472     -49
sh_map_and_validate_gl2he__guest_4           136       2    -134
sh_remove_shadows                           4757    4545    -212
validate_gl2e                                525       -    -525
Total: Before=3914702, After=3914324, chg -0.01%

Marginal...

>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> I wasn't really sure whether it would be worthwhile to also update the
> "#else" part of shadow_size(). Doing so would be a little tricky, as the
> type to return 0 for has no name right now; I'd need to move down the
> #undef to allow for that. Thoughts?

This refers to the straight deletion from sh_type_to_size[] ?

I was confused by that at first.  The shadow does have a size of 1.  Might

/*   [SH_type_l2h_64_shadow]  = 1,  PV32 only */

work better?  That leaves it clearly in there as a 1, but not needing
any further ifdefary.

> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -859,13 +866,12 @@ do {
>      int _i;                                                                 \
>      int _xen = !shadow_mode_external(_dom);                                 \
>      shadow_l2e_t *_sp = map_domain_page((_sl2mfn));                         \
> -    ASSERT(mfn_to_page(_sl2mfn)->u.sh.type == SH_type_l2_64_shadow ||\
> -           mfn_to_page(_sl2mfn)->u.sh.type == SH_type_l2h_64_shadow);\
> +    ASSERT_VALID_L2(mfn_to_page(_sl2mfn)->u.sh.type);                       \
>      for ( _i = 0; _i < SHADOW_L2_PAGETABLE_ENTRIES; _i++ )                  \
>      {                                                                       \
>          if ( (!(_xen))                                                      \
>               || !is_pv_32bit_domain(_dom)                                   \
> -             || mfn_to_page(_sl2mfn)->u.sh.type != SH_type_l2h_64_shadow    \
> +             || mfn_to_page(_sl2mfn)->u.sh.type == SH_type_l2_64_shadow     \

Isn't this redundant with the ASSERT_VALID_L2() now?

Per your other question, yes this desperately wants rearranging, but I
would agree with it being another patch.

I did previously play at trying to simplify the PV pagetable loops in a
similar way.  Code-gen wise, I think the L2 loops want to calculate an
upper bound which is either 512 or compat_first_slot, while the L4
loops want an "if(i == 256) i += 7; continue;" rather than having
LFENCE-ing predicates on each iteration.
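
Roughly this shape for the L2 case (illustrative only - the predicate
details and the compat_first_slot name are assumptions on my part, not
the real macro):

    /* Compute the loop bound once, instead of evaluating a (potentially
     * LFENCE-guarded) predicate on every iteration. */
    unsigned int bound = (_xen && is_pv_32bit_domain(_dom) &&
                          mfn_to_page(_sl2mfn)->u.sh.type !=
                          SH_type_l2_64_shadow)
                         ? compat_first_slot /* stop short of Xen's slots */
                         : SHADOW_L2_PAGETABLE_ENTRIES;

    for ( _i = 0; _i < bound; _i++ )
        /* ... process _sp[_i] without per-iteration type checks ... */;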

~Andrew

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 08/11] x86/shadow: reduce effort of hash calculation
  2023-01-05 16:05 ` [PATCH 08/11] x86/shadow: reduce effort of hash calculation Jan Beulich
@ 2023-01-06  2:03   ` Andrew Cooper
  2023-01-09  9:48     ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2023-01-06  2:03 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Wei Liu, Roger Pau Monne, Tim (Xen.org), George Dunlap

On 05/01/2023 4:05 pm, Jan Beulich wrote:
> The "n" input is a GFN value and hence bounded by the physical address
> bits in use on a system.

The one case where this isn't obviously true is in sh_audit().  It comes
from a real MFN in the system, not a GFN, which will have the same
property WRT PADDR_BITS.

>  The hash quality won't improve by also
> including the upper always-zero bits in the calculation. To keep things
> as compile-time-constant as they were before, use PADDR_BITS (not
> paddr_bits) for loop bounding. This reduces loop iterations from 8 to 5.

While this is all true, you'll get a much better improvement by not
forcing 'n' onto the stack just to access it bytewise.  Right now, the
loop looks like:

<shadow_hash_insert>:
    48 83 ec 10                 sub    $0x10,%rsp
    49 89 c9                    mov    %rcx,%r9
    41 89 d0                    mov    %edx,%r8d
    48 8d 44 24 08              lea    0x8(%rsp),%rax
    48 8d 4c 24 10              lea    0x10(%rsp),%rcx
    48 89 74 24 08              mov    %rsi,0x8(%rsp)
    0f 1f 80 00 00 00 00        nopl   0x0(%rax)
/-> 0f b6 10                    movzbl (%rax),%edx
|   48 83 c0 01                 add    $0x1,%rax
|   45 69 c0 3f 00 01 00        imul   $0x1003f,%r8d,%r8d
|   41 01 d0                    add    %edx,%r8d
|   48 39 c1                    cmp    %rax,%rcx
\-- 75 ea                       jne    ffff82d0402efda0
<shadow_hash_insert+0x20>


which doesn't even have a compile-time constant loop bound.  It's
runtime calculated by the second lea constructing the upper pointer bound.

Given this further delta:

diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 4a8bcec10fe8..902c749f2724 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1397,13 +1397,12 @@ static unsigned int shadow_get_allocation(struct domain *d)
 typedef u32 key_t;
 static inline key_t sh_hash(unsigned long n, unsigned int t)
 {
-    unsigned char *p = (unsigned char *)&n;
     key_t k = t;
     int i;
 
     BUILD_BUG_ON(PADDR_BITS > BITS_PER_LONG + PAGE_SHIFT);
-    for ( i = 0; i < (PADDR_BITS - PAGE_SHIFT + 7) / 8; i++ )
-        k = p[i] + (k << 6) + (k << 16) - k;
+    for ( i = 0; i < (PADDR_BITS - PAGE_SHIFT + 7) / 8; i++, n >>= 8 )
+        k = (uint8_t)n + (k << 6) + (k << 16) - k;
 
     return k % SHADOW_HASH_BUCKETS;
 }

the code gen becomes:

<shadow_hash_insert>:
    41 89 d0                    mov    %edx,%r8d
    49 89 c9                    mov    %rcx,%r9
    b8 05 00 00 00              mov    $0x5,%eax
/-> 45 69 c0 3f 00 01 00        imul   $0x1003f,%r8d,%r8d
|   40 0f b6 d6                 movzbl %sil,%edx
|   48 c1 ee 08                 shr    $0x8,%rsi
|   41 01 d0                    add    %edx,%r8d
|   83 e8 01                    sub    $0x1,%eax
\-- 75 e9                       jne    ffff82d0402efd8b
<shadow_hash_insert+0xb>

with an actual constant loop bound, and not a memory operand in sight. 
This form (even at 8 iterations) will easily execute faster than the
stack-spilled form.
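
As a quick sanity check that the two formulations hash identically
(standalone little-endian demo, not Xen code - BYTES matches the 5
iterations above, while the bucket count is a stand-in):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define BYTES   5    /* (PADDR_BITS - PAGE_SHIFT + 7) / 8 */
    #define BUCKETS 251  /* stand-in for SHADOW_HASH_BUCKETS */

    /* Original form: spill n, then read it back bytewise.  On x86
     * (little-endian) p[i] == (n >> (8 * i)) & 0xff. */
    static uint32_t hash_bytewise(unsigned long n, unsigned int t)
    {
        unsigned char p[sizeof(n)];
        uint32_t k = t;

        memcpy(p, &n, sizeof(n));
        for ( int i = 0; i < BYTES; i++ )
            k = p[i] + (k << 6) + (k << 16) - k;

        return k % BUCKETS;
    }

    /* Proposed form: shift n along instead of spilling it. */
    static uint32_t hash_shifted(unsigned long n, unsigned int t)
    {
        uint32_t k = t;

        for ( int i = 0; i < BYTES; i++, n >>= 8 )
            k = (uint8_t)n + (k << 6) + (k << 16) - k;

        return k % BUCKETS;
    }

    int main(void)
    {
        for ( unsigned long n = 0; n < (1UL << 24); n += 4099 )
            for ( unsigned int t = 1; t < 16; t++ )
                if ( hash_bytewise(n, t) != hash_shifted(n, t) )
                    return printf("mismatch: n=%#lx t=%u\n", n, t), 1;

        puts("both forms agree");
        return 0;
    }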

~Andrew

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 00/11] x86/shadow: misc tidying
  2023-01-05 15:57 [PATCH 00/11] x86/shadow: misc tidying Jan Beulich
                   ` (10 preceding siblings ...)
  2023-01-05 16:07 ` [PATCH 11/11] x86/shadow: sh_remove_all_mappings() is HVM-only Jan Beulich
@ 2023-01-06  2:03 ` Andrew Cooper
  11 siblings, 0 replies; 23+ messages in thread
From: Andrew Cooper @ 2023-01-06  2:03 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Wei Liu, Roger Pau Monne, Tim (Xen.org), George Dunlap

On 05/01/2023 3:57 pm, Jan Beulich wrote:
> ... or so I hope. The main observation was that we still have both
> hash_vcpu_for_each() and hash_domain_for_each(), where the latter was

foreach

> introduced in 2014/15 to replace the former. Only some eight years
> later we can now complete this conversion. Everything else addresses
> other things noticed along the road.

Wow, it has been a long time.  That was the start of the "make Xen not
fall over NULL pointers if the toolstack issues some hypercalls out of
order", a task that is still ongoing...

>
> 01: replace sh_reset_l3_up_pointers()
> 02: convert sh_audit_flags()'es 1st parameter to domain
> 03: drop hash_vcpu_foreach()
> 04: rename hash_domain_foreach()
> 05: move bogus HVM checks in sh_pagetable_dying()
> 06: drop a few uses of mfn_valid()
> 07: L2H shadow type is PV32-only
> 08: reduce effort of hash calculation
> 09: simplify conditionals in sh_{get,put}_ref()
> 10: correct shadow type bounds checks
> 11: sh_remove_all_mappings() is HVM-only

Everything without code queries, Acked-by: Andrew Cooper
<andrew.cooper3@citrix.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 01/11] x86/shadow: replace sh_reset_l3_up_pointers()
  2023-01-06  0:55   ` Andrew Cooper
@ 2023-01-09  8:35     ` Jan Beulich
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2023-01-09  8:35 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Roger Pau Monne, Tim (Xen.org), George Dunlap, xen-devel

On 06.01.2023 01:55, Andrew Cooper wrote:
> On 05/01/2023 3:59 pm, Jan Beulich wrote:
>> Rather than doing a separate hash walk (and then even using the vCPU
>> variant, which is to go away), do the up-pointer-clearing right in
>> sh_unpin(), as an alternative to the (now further limited) enlisting on
>> a "free floating" list fragment. This utilizes the fact that such list
>> fragments are traversed only for multi-page shadows (in shadow_free()).
>> Furthermore sh_terminate_list() is a safe guard only anyway, which isn't
>> in use in the common case (it actually does anything only for BIGMEM
>> configurations).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Thanks.

> I think.  The reasoning seems plausible, but it would probably benefit
> from someone else double checking.

Okay, I'll wait a bit to see whether Tim or George voice a view - perhaps
until the end of this week, committing early next week if no contrary
indications appear. The "good" thing here is that, aiui, all modern 64-bit
guests undergo this transition, so the code is / will be properly
exercised.

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 05/11] x86/shadow: move bogus HVM checks in sh_pagetable_dying()
  2023-01-06  1:00   ` Andrew Cooper
@ 2023-01-09  8:39     ` Jan Beulich
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2023-01-09  8:39 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Roger Pau Monne, Tim (Xen.org), George Dunlap, xen-devel

On 06.01.2023 02:00, Andrew Cooper wrote:
> On 05/01/2023 4:04 pm, Jan Beulich wrote:
>> Perhaps these should have been dropped right in 2fb2dee1ac62 ("x86/mm:
>> pagetable_dying() is HVM-only"). Convert both to assertions, noting that
>> in particular the one in the 3-level variant of the function comes too
> 
> "came too late"?
> 
> It doesn't any more with this change in place.

Fine with me either way, so I've changed it. Iirc in particular George has
been advocating for writing descriptions in present tense, even if in the
course of a change the stated fact changes as well.

>> late anyway - first thing there we access the HVM part of a union.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Thanks.

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 06/11] x86/shadow: drop a few uses of mfn_valid()
  2023-01-06  1:02   ` Andrew Cooper
@ 2023-01-09  8:42     ` Jan Beulich
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2023-01-09  8:42 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Roger Pau Monne, Tim (Xen.org), George Dunlap, xen-devel

On 06.01.2023 02:02, Andrew Cooper wrote:
> On 05/01/2023 4:04 pm, Jan Beulich wrote:
>> v->arch.paging.shadow.shadow_table[], v->arch.paging.shadow.oos[],
>> v->arch.paging.shadow.oos_{snapshot[],fixup[].smfn[]} as well as the
> 
> fixup[],smfn[] ?

No. See e.g. shadow_vcpu_init():

    for ( i = 0; i < SHADOW_OOS_PAGES; i++ )
    {
        v->arch.paging.shadow.oos[i] = INVALID_MFN;
        v->arch.paging.shadow.oos_snapshot[i] = INVALID_MFN;
        for ( j = 0; j < SHADOW_OOS_FIXUPS; j++ )
            v->arch.paging.shadow.oos_fixup[i].smfn[j] = INVALID_MFN;
    }

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 07/11] x86/shadow: L2H shadow type is PV32-only
  2023-01-06  1:31   ` Andrew Cooper
@ 2023-01-09  9:12     ` Jan Beulich
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2023-01-09  9:12 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Roger Pau Monne, Tim (Xen.org), George Dunlap, xen-devel

On 06.01.2023 02:31, Andrew Cooper wrote:
> On 05/01/2023 4:05 pm, Jan Beulich wrote:
>> Like for the various HVM-only types, save a little bit of code by suitably
>> "masking" this type out when !PV32.
> 
> add/remove: 0/1 grow/shrink: 2/4 up/down: 544/-922 (-378)
> Function                                     old     new   delta
> sh_map_and_validate_gl2e__guest_4            136     666    +530
> sh_destroy_shadow                            289     303     +14
> sh_clear_shadow_entry__guest_4               178     176      -2
> sh_validate_guest_entry                      521     472     -49
> sh_map_and_validate_gl2he__guest_4           136       2    -134
> sh_remove_shadows                           4757    4545    -212
> validate_gl2e                                525       -    -525
> Total: Before=3914702, After=3914324, chg -0.01%
> 
> Marginal...

The point wasn't only size, but also what actually gets executed for no
gain, e.g. in sh_remove_shadows(). I think "a little bit" is a fair
statement.

>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> I wasn't really sure whether it would be worthwhile to also update the
>> "#else" part of shadow_size(). Doing so would be a little tricky, as the
>> type to return 0 for has no name right now; I'd need to move down the
>> #undef to allow for that. Thoughts?
> 
> This refers to the straight deletion from sh_type_to_size[] ?

Not really, no, but ...

> I was confused by that at first.  The shadow does have a size of 1.  Might
> 
> /*   [SH_type_l2h_64_shadow]  = 1,  PV32 only */
> 
> work better?  That leaves it clearly in there as a 1, but not needing
> any further ifdefary.

... I've made this adjustment anyway. I'd like to note though that Misra
dislikes such commenting.

The remark is rather about shadow_size() itself, which already doesn't
return the "correct" size for all of the types when !HVM, utilizing the
fact that the returned size is of no real interest for types which aren't
used in that case (and which then also no longer really exist as
"distinguishable" types). Plus there's the connection to assertions like
this

    ASSERT(pages);

in shadow_alloc(): "pages" is the return value of shadow_size(), and I
think it wouldn't be bad for that assertion to trigger for "dead" types,
which requires the function to return zero for this type despite it not
becoming aliased to SH_type_unused (i.e. unlike the types unavailable when
!HVM).

Which makes me notice that with HVM=y shadow_size() would probably be
better off using array_access_nospec(). I'll add another patch for this.
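
I.e. something along these lines (just a sketch of the idea, not the
actual patch):

    static inline unsigned int shadow_size(unsigned int shadow_type)
    {
        ASSERT(shadow_type < ARRAY_SIZE(sh_type_to_size));
        return array_access_nospec(sh_type_to_size, shadow_type);
    }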

>> --- a/xen/arch/x86/mm/shadow/multi.c
>> +++ b/xen/arch/x86/mm/shadow/multi.c
>> @@ -859,13 +866,12 @@ do {
>>      int _i;                                                                 \
>>      int _xen = !shadow_mode_external(_dom);                                 \
>>      shadow_l2e_t *_sp = map_domain_page((_sl2mfn));                         \
>> -    ASSERT(mfn_to_page(_sl2mfn)->u.sh.type == SH_type_l2_64_shadow ||\
>> -           mfn_to_page(_sl2mfn)->u.sh.type == SH_type_l2h_64_shadow);\
>> +    ASSERT_VALID_L2(mfn_to_page(_sl2mfn)->u.sh.type);                       \
>>      for ( _i = 0; _i < SHADOW_L2_PAGETABLE_ENTRIES; _i++ )                  \
>>      {                                                                       \
>>          if ( (!(_xen))                                                      \
>>               || !is_pv_32bit_domain(_dom)                                   \
>> -             || mfn_to_page(_sl2mfn)->u.sh.type != SH_type_l2h_64_shadow    \
>> +             || mfn_to_page(_sl2mfn)->u.sh.type == SH_type_l2_64_shadow     \
> 
> Isn't this redundant with the ASSERT_VALID_L2() now?

How would that be? ASSERT_VALID_L2(), when PV32=y, allows for both l2 and
l2h, whereas here we strictly mean only l2. The sole reason for the change
is that the l2h constant simply doesn't exist anymore when !PV32.
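
For reference, the shape of the new macro (modulo the exact spelling in
the patch):

    #ifdef CONFIG_PV32
    # define ASSERT_VALID_L2(t) \
        ASSERT((t) == SH_type_l2_64_shadow || (t) == SH_type_l2h_64_shadow)
    #else
    # define ASSERT_VALID_L2(t) ASSERT((t) == SH_type_l2_64_shadow)
    #endif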

> Per your other question, yes this desperately wants rearranging, but I
> would agree with it being another patch.
> 
> I did previously play at trying to simplify the PV pagetable loops in a
> similar way.  Code-gen wise, I think the L2 loops want to calculate an
> upper bound which is either 512 or compat_first_slot, while the L4
> loops want an "if(i == 256) i += 7; continue;" rather than having
> LFENCE-ing predicates on each iteration.

I'll see what I can do without getting distracted too much. (I also guess
you mean "i += 15".)

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 08/11] x86/shadow: reduce effort of hash calculation
  2023-01-06  2:03   ` Andrew Cooper
@ 2023-01-09  9:48     ` Jan Beulich
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2023-01-09  9:48 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Roger Pau Monne, Tim (Xen.org), George Dunlap, xen-devel

On 06.01.2023 03:03, Andrew Cooper wrote:
> On 05/01/2023 4:05 pm, Jan Beulich wrote:
>> The "n" input is a GFN value and hence bounded by the physical address
>> bits in use on a system.
> 
> The one case where this isn't obviously true is in sh_audit().  It comes
> from a real MFN in the system, not a GFN, which will have the same
> property WRT PADDR_BITS.

I'm afraid I was more wrong with that than just for the audit case. Only
FL1 shadows use GFNs. All other shadows use MFNs. I'll update the sentence.

>>  The hash quality won't improve by also
>> including the upper always-zero bits in the calculation. To keep things
>> as compile-time-constant as they were before, use PADDR_BITS (not
>> paddr_bits) for loop bounding. This reduces loop iterations from 8 to 5.
> 
> While this is all true, you'll get a much better improvement by not
> forcing 'n' onto the stack just to access it bytewise.  Right now, the
> loop looks like:
> 
> <shadow_hash_insert>:
>     48 83 ec 10                 sub    $0x10,%rsp
>     49 89 c9                    mov    %rcx,%r9
>     41 89 d0                    mov    %edx,%r8d
>     48 8d 44 24 08              lea    0x8(%rsp),%rax
>     48 8d 4c 24 10              lea    0x10(%rsp),%rcx
>     48 89 74 24 08              mov    %rsi,0x8(%rsp)
>     0f 1f 80 00 00 00 00        nopl   0x0(%rax)
> /-> 0f b6 10                    movzbl (%rax),%edx
> |   48 83 c0 01                 add    $0x1,%rax
> |   45 69 c0 3f 00 01 00        imul   $0x1003f,%r8d,%r8d
> |   41 01 d0                    add    %edx,%r8d
> |   48 39 c1                    cmp    %rax,%rcx
> \-- 75 ea                       jne    ffff82d0402efda0
> <shadow_hash_insert+0x20>
> 
> 
> which doesn't even have a compile-time constant loop bound.  It's
> runtime calculated by the second lea constructing the upper pointer bound.
> 
> Given this further delta:
> 
> diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
> index 4a8bcec10fe8..902c749f2724 100644
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -1397,13 +1397,12 @@ static unsigned int shadow_get_allocation(struct domain *d)
>  typedef u32 key_t;
>  static inline key_t sh_hash(unsigned long n, unsigned int t)
>  {
> -    unsigned char *p = (unsigned char *)&n;
>      key_t k = t;
>      int i;
>  
>      BUILD_BUG_ON(PADDR_BITS > BITS_PER_LONG + PAGE_SHIFT);
> -    for ( i = 0; i < (PADDR_BITS - PAGE_SHIFT + 7) / 8; i++ )
> -        k = p[i] + (k << 6) + (k << 16) - k;
> +    for ( i = 0; i < (PADDR_BITS - PAGE_SHIFT + 7) / 8; i++, n >>= 8 )
> +        k = (uint8_t)n + (k << 6) + (k << 16) - k;
>  
>      return k % SHADOW_HASH_BUCKETS;
>  }
> 
> the code gen becomes:
> 
> <shadow_hash_insert>:
>     41 89 d0                    mov    %edx,%r8d
>     49 89 c9                    mov    %rcx,%r9
>     b8 05 00 00 00              mov    $0x5,%eax
> /-> 45 69 c0 3f 00 01 00        imul   $0x1003f,%r8d,%r8d
> |   40 0f b6 d6                 movzbl %sil,%edx
> |   48 c1 ee 08                 shr    $0x8,%rsi
> |   41 01 d0                    add    %edx,%r8d
> |   83 e8 01                    sub    $0x1,%eax
> \-- 75 e9                       jne    ffff82d0402efd8b
> <shadow_hash_insert+0xb>
> 
> with an actual constant loop bound, and not a memory operand in sight. 
> This form (even at 8 iterations) will easily execute faster than the
> stack-spilled form.

Oh, yes, good idea. Will adjust.

Jan


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread (newest message: 2023-01-09  9:48 UTC)

Thread overview: 23+ messages
2023-01-05 15:57 [PATCH 00/11] x86/shadow: misc tidying Jan Beulich
2023-01-05 15:59 ` [PATCH 01/11] x86/shadow: replace sh_reset_l3_up_pointers() Jan Beulich
2023-01-06  0:55   ` Andrew Cooper
2023-01-09  8:35     ` Jan Beulich
2023-01-05 15:59 ` [PATCH 02/11] x86/shadow: convert sh_audit_flags()'es 1st parameter to domain Jan Beulich
2023-01-05 15:59 ` [PATCH 03/11] x86/shadow: drop hash_vcpu_foreach() Jan Beulich
2023-01-05 16:00 ` [PATCH 04/11] x86/shadow: rename hash_domain_foreach() Jan Beulich
2023-01-05 16:04 ` [PATCH 05/11] x86/shadow: move bogus HVM checks in sh_pagetable_dying() Jan Beulich
2023-01-06  1:00   ` Andrew Cooper
2023-01-09  8:39     ` Jan Beulich
2023-01-05 16:04 ` [PATCH 06/11] x86/shadow: drop a few uses of mfn_valid() Jan Beulich
2023-01-06  1:02   ` Andrew Cooper
2023-01-09  8:42     ` Jan Beulich
2023-01-05 16:05 ` [PATCH 07/11] x86/shadow: L2H shadow type is PV32-only Jan Beulich
2023-01-06  1:31   ` Andrew Cooper
2023-01-09  9:12     ` Jan Beulich
2023-01-05 16:05 ` [PATCH 08/11] x86/shadow: reduce effort of hash calculation Jan Beulich
2023-01-06  2:03   ` Andrew Cooper
2023-01-09  9:48     ` Jan Beulich
2023-01-05 16:06 ` [PATCH 09/11] x86/shadow: simplify conditionals in sh_{get,put}_ref() Jan Beulich
2023-01-05 16:07 ` [PATCH 10/11] x86/shadow: correct shadow type bounds checks Jan Beulich
2023-01-05 16:07 ` [PATCH 11/11] x86/shadow: sh_remove_all_mappings() is HVM-only Jan Beulich
2023-01-06  2:03 ` [PATCH 00/11] x86/shadow: misc tidying Andrew Cooper
