* [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up
From: Jan Beulich @ 2020-07-15 9:56 UTC
To: xen-devel
Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Roger Pau Monné
In particular, this series goes a few small steps further towards proper
!HVM and !PV config handling (i.e. no carrying of unnecessary baggage).
1: x86/shadow: dirty VRAM tracking is needed for HVM only
2: x86/shadow: shadow_table[] needs only one entry for PV-only configs
3: x86/PV: drop a few misleading paging_mode_refcounts() checks
4: x86/shadow: have just a single instance of sh_set_toplevel_shadow()
5: x86/shadow: l3table[] and gl3e[] are HVM only
Jan
* [PATCH 1/5] x86/shadow: dirty VRAM tracking is needed for HVM only
From: Jan Beulich @ 2020-07-15 9:58 UTC
To: xen-devel
Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Roger Pau Monné
Move shadow_track_dirty_vram() into hvm.c (requiring two static
functions to become non-static). More importantly, though, make sure we
don't dereference d->arch.hvm.dirty_vram for a non-HVM guest. This was
only a latent issue because the field happens to live far enough into
struct hvm_domain to fall outside the part overlapping with struct
pv_domain.
While moving shadow_track_dirty_vram(), some purely typographic
adjustments are made, like inserting missing blanks or putting braces
on their own lines.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
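To illustrate the hazard being closed (condensed sketch, not part of the
patch; see xen/include/asm-x86/domain.h for the real layout): the
per-guest-type state lives in a union, so for a PV guest a read of
d->arch.hvm.dirty_vram accesses the union's storage, which is harmless
only for as long as the field's offset happens to fall beyond
sizeof(struct pv_domain):

    struct arch_domain {
        /* ... */
        union {
            struct pv_domain pv;
            struct hvm_domain hvm;   /* dirty_vram lives in here */
        };
        /* ... */
    };

    /* Hence the guard pattern now used on all paths touching the field: */
    if ( !is_hvm_domain(d) || !d->arch.hvm.dirty_vram )
        return;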
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -999,7 +999,7 @@ void shadow_prealloc(struct domain *d, u
/* Deliberately free all the memory we can: this will tear down all of
* this domain's shadows */
-static void shadow_blow_tables(struct domain *d)
+void shadow_blow_tables(struct domain *d)
{
struct page_info *sp, *t;
struct vcpu *v;
@@ -2029,7 +2029,7 @@ int sh_remove_write_access(struct domain
/* Remove all mappings of a guest frame from the shadow tables.
* Returns non-zero if we need to flush TLBs. */
-static int sh_remove_all_mappings(struct domain *d, mfn_t gmfn, gfn_t gfn)
+int sh_remove_all_mappings(struct domain *d, mfn_t gmfn, gfn_t gfn)
{
struct page_info *page = mfn_to_page(gmfn);
@@ -3162,203 +3162,6 @@ static void sh_clean_dirty_bitmap(struct
}
-/**************************************************************************/
-/* VRAM dirty tracking support */
-int shadow_track_dirty_vram(struct domain *d,
- unsigned long begin_pfn,
- unsigned long nr,
- XEN_GUEST_HANDLE(void) guest_dirty_bitmap)
-{
- int rc = 0;
- unsigned long end_pfn = begin_pfn + nr;
- unsigned long dirty_size = (nr + 7) / 8;
- int flush_tlb = 0;
- unsigned long i;
- p2m_type_t t;
- struct sh_dirty_vram *dirty_vram;
- struct p2m_domain *p2m = p2m_get_hostp2m(d);
- uint8_t *dirty_bitmap = NULL;
-
- if ( end_pfn < begin_pfn || end_pfn > p2m->max_mapped_pfn + 1 )
- return -EINVAL;
-
- /* We perform p2m lookups, so lock the p2m upfront to avoid deadlock */
- p2m_lock(p2m_get_hostp2m(d));
- paging_lock(d);
-
- dirty_vram = d->arch.hvm.dirty_vram;
-
- if ( dirty_vram && (!nr ||
- ( begin_pfn != dirty_vram->begin_pfn
- || end_pfn != dirty_vram->end_pfn )) )
- {
- /* Different tracking, tear the previous down. */
- gdprintk(XENLOG_INFO, "stopping tracking VRAM %lx - %lx\n", dirty_vram->begin_pfn, dirty_vram->end_pfn);
- xfree(dirty_vram->sl1ma);
- xfree(dirty_vram->dirty_bitmap);
- xfree(dirty_vram);
- dirty_vram = d->arch.hvm.dirty_vram = NULL;
- }
-
- if ( !nr )
- goto out;
-
- dirty_bitmap = vzalloc(dirty_size);
- if ( dirty_bitmap == NULL )
- {
- rc = -ENOMEM;
- goto out;
- }
- /* This should happen seldomly (Video mode change),
- * no need to be careful. */
- if ( !dirty_vram )
- {
- /* Throw away all the shadows rather than walking through them
- * up to nr times getting rid of mappings of each pfn */
- shadow_blow_tables(d);
-
- gdprintk(XENLOG_INFO, "tracking VRAM %lx - %lx\n", begin_pfn, end_pfn);
-
- rc = -ENOMEM;
- if ( (dirty_vram = xmalloc(struct sh_dirty_vram)) == NULL )
- goto out;
- dirty_vram->begin_pfn = begin_pfn;
- dirty_vram->end_pfn = end_pfn;
- d->arch.hvm.dirty_vram = dirty_vram;
-
- if ( (dirty_vram->sl1ma = xmalloc_array(paddr_t, nr)) == NULL )
- goto out_dirty_vram;
- memset(dirty_vram->sl1ma, ~0, sizeof(paddr_t) * nr);
-
- if ( (dirty_vram->dirty_bitmap = xzalloc_array(uint8_t, dirty_size)) == NULL )
- goto out_sl1ma;
-
- dirty_vram->last_dirty = NOW();
-
- /* Tell the caller that this time we could not track dirty bits. */
- rc = -ENODATA;
- }
- else if (dirty_vram->last_dirty == -1)
- /* still completely clean, just copy our empty bitmap */
- memcpy(dirty_bitmap, dirty_vram->dirty_bitmap, dirty_size);
- else
- {
- mfn_t map_mfn = INVALID_MFN;
- void *map_sl1p = NULL;
-
- /* Iterate over VRAM to track dirty bits. */
- for ( i = 0; i < nr; i++ ) {
- mfn_t mfn = get_gfn_query_unlocked(d, begin_pfn + i, &t);
- struct page_info *page;
- int dirty = 0;
- paddr_t sl1ma = dirty_vram->sl1ma[i];
-
- if ( mfn_eq(mfn, INVALID_MFN) )
- dirty = 1;
- else
- {
- page = mfn_to_page(mfn);
- switch (page->u.inuse.type_info & PGT_count_mask)
- {
- case 0:
- /* No guest reference, nothing to track. */
- break;
- case 1:
- /* One guest reference. */
- if ( sl1ma == INVALID_PADDR )
- {
- /* We don't know which sl1e points to this, too bad. */
- dirty = 1;
- /* TODO: Heuristics for finding the single mapping of
- * this gmfn */
- flush_tlb |= sh_remove_all_mappings(d, mfn,
- _gfn(begin_pfn + i));
- }
- else
- {
- /* Hopefully the most common case: only one mapping,
- * whose dirty bit we can use. */
- l1_pgentry_t *sl1e;
- mfn_t sl1mfn = maddr_to_mfn(sl1ma);
-
- if ( !mfn_eq(sl1mfn, map_mfn) )
- {
- if ( map_sl1p )
- unmap_domain_page(map_sl1p);
- map_sl1p = map_domain_page(sl1mfn);
- map_mfn = sl1mfn;
- }
- sl1e = map_sl1p + (sl1ma & ~PAGE_MASK);
-
- if ( l1e_get_flags(*sl1e) & _PAGE_DIRTY )
- {
- dirty = 1;
- /* Note: this is atomic, so we may clear a
- * _PAGE_ACCESSED set by another processor. */
- l1e_remove_flags(*sl1e, _PAGE_DIRTY);
- flush_tlb = 1;
- }
- }
- break;
- default:
- /* More than one guest reference,
- * we don't afford tracking that. */
- dirty = 1;
- break;
- }
- }
-
- if ( dirty )
- {
- dirty_vram->dirty_bitmap[i / 8] |= 1 << (i % 8);
- dirty_vram->last_dirty = NOW();
- }
- }
-
- if ( map_sl1p )
- unmap_domain_page(map_sl1p);
-
- memcpy(dirty_bitmap, dirty_vram->dirty_bitmap, dirty_size);
- memset(dirty_vram->dirty_bitmap, 0, dirty_size);
- if ( dirty_vram->last_dirty + SECONDS(2) < NOW() )
- {
- /* was clean for more than two seconds, try to disable guest
- * write access */
- for ( i = begin_pfn; i < end_pfn; i++ )
- {
- mfn_t mfn = get_gfn_query_unlocked(d, i, &t);
- if ( !mfn_eq(mfn, INVALID_MFN) )
- flush_tlb |= sh_remove_write_access(d, mfn, 1, 0);
- }
- dirty_vram->last_dirty = -1;
- }
- }
- if ( flush_tlb )
- guest_flush_tlb_mask(d, d->dirty_cpumask);
- goto out;
-
-out_sl1ma:
- xfree(dirty_vram->sl1ma);
-out_dirty_vram:
- xfree(dirty_vram);
- dirty_vram = d->arch.hvm.dirty_vram = NULL;
-
-out:
- paging_unlock(d);
- if ( rc == 0 && dirty_bitmap != NULL &&
- copy_to_guest(guest_dirty_bitmap, dirty_bitmap, dirty_size) )
- {
- paging_lock(d);
- for ( i = 0; i < dirty_size; i++ )
- dirty_vram->dirty_bitmap[i] |= dirty_bitmap[i];
- paging_unlock(d);
- rc = -EFAULT;
- }
- vfree(dirty_bitmap);
- p2m_unlock(p2m_get_hostp2m(d));
- return rc;
-}
-
/* Flush TLB of selected vCPUs. */
bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
void *ctxt)
--- a/xen/arch/x86/mm/shadow/hvm.c
+++ b/xen/arch/x86/mm/shadow/hvm.c
@@ -691,6 +691,218 @@ static void sh_emulate_unmap_dest(struct
atomic_inc(&v->domain->arch.paging.shadow.gtable_dirty_version);
}
+/**************************************************************************/
+/* VRAM dirty tracking support */
+int shadow_track_dirty_vram(struct domain *d,
+ unsigned long begin_pfn,
+ unsigned long nr,
+ XEN_GUEST_HANDLE(void) guest_dirty_bitmap)
+{
+ int rc = 0;
+ unsigned long end_pfn = begin_pfn + nr;
+ unsigned long dirty_size = (nr + 7) / 8;
+ int flush_tlb = 0;
+ unsigned long i;
+ p2m_type_t t;
+ struct sh_dirty_vram *dirty_vram;
+ struct p2m_domain *p2m = p2m_get_hostp2m(d);
+ uint8_t *dirty_bitmap = NULL;
+
+ if ( end_pfn < begin_pfn || end_pfn > p2m->max_mapped_pfn + 1 )
+ return -EINVAL;
+
+ /* We perform p2m lookups, so lock the p2m upfront to avoid deadlock */
+ p2m_lock(p2m_get_hostp2m(d));
+ paging_lock(d);
+
+ dirty_vram = d->arch.hvm.dirty_vram;
+
+ if ( dirty_vram && (!nr ||
+ ( begin_pfn != dirty_vram->begin_pfn
+ || end_pfn != dirty_vram->end_pfn )) )
+ {
+ /* Different tracking, tear the previous down. */
+ gdprintk(XENLOG_INFO, "stopping tracking VRAM %lx - %lx\n", dirty_vram->begin_pfn, dirty_vram->end_pfn);
+ xfree(dirty_vram->sl1ma);
+ xfree(dirty_vram->dirty_bitmap);
+ xfree(dirty_vram);
+ dirty_vram = d->arch.hvm.dirty_vram = NULL;
+ }
+
+ if ( !nr )
+ goto out;
+
+ dirty_bitmap = vzalloc(dirty_size);
+ if ( dirty_bitmap == NULL )
+ {
+ rc = -ENOMEM;
+ goto out;
+ }
+ /*
+ * This should happen seldomly (Video mode change),
+ * no need to be careful.
+ */
+ if ( !dirty_vram )
+ {
+ /*
+ * Throw away all the shadows rather than walking through them
+ * up to nr times getting rid of mappings of each pfn.
+ */
+ shadow_blow_tables(d);
+
+ gdprintk(XENLOG_INFO, "tracking VRAM %lx - %lx\n", begin_pfn, end_pfn);
+
+ rc = -ENOMEM;
+ if ( (dirty_vram = xmalloc(struct sh_dirty_vram)) == NULL )
+ goto out;
+ dirty_vram->begin_pfn = begin_pfn;
+ dirty_vram->end_pfn = end_pfn;
+ d->arch.hvm.dirty_vram = dirty_vram;
+
+ if ( (dirty_vram->sl1ma = xmalloc_array(paddr_t, nr)) == NULL )
+ goto out_dirty_vram;
+ memset(dirty_vram->sl1ma, ~0, sizeof(paddr_t) * nr);
+
+ if ( (dirty_vram->dirty_bitmap = xzalloc_array(uint8_t, dirty_size)) == NULL )
+ goto out_sl1ma;
+
+ dirty_vram->last_dirty = NOW();
+
+ /* Tell the caller that this time we could not track dirty bits. */
+ rc = -ENODATA;
+ }
+ else if ( dirty_vram->last_dirty == -1 )
+ /* still completely clean, just copy our empty bitmap */
+ memcpy(dirty_bitmap, dirty_vram->dirty_bitmap, dirty_size);
+ else
+ {
+ mfn_t map_mfn = INVALID_MFN;
+ void *map_sl1p = NULL;
+
+ /* Iterate over VRAM to track dirty bits. */
+ for ( i = 0; i < nr; i++ )
+ {
+ mfn_t mfn = get_gfn_query_unlocked(d, begin_pfn + i, &t);
+ struct page_info *page;
+ int dirty = 0;
+ paddr_t sl1ma = dirty_vram->sl1ma[i];
+
+ if ( mfn_eq(mfn, INVALID_MFN) )
+ dirty = 1;
+ else
+ {
+ page = mfn_to_page(mfn);
+ switch ( page->u.inuse.type_info & PGT_count_mask )
+ {
+ case 0:
+ /* No guest reference, nothing to track. */
+ break;
+
+ case 1:
+ /* One guest reference. */
+ if ( sl1ma == INVALID_PADDR )
+ {
+ /* We don't know which sl1e points to this, too bad. */
+ dirty = 1;
+ /*
+ * TODO: Heuristics for finding the single mapping of
+ * this gmfn
+ */
+ flush_tlb |= sh_remove_all_mappings(d, mfn,
+ _gfn(begin_pfn + i));
+ }
+ else
+ {
+ /*
+ * Hopefully the most common case: only one mapping,
+ * whose dirty bit we can use.
+ */
+ l1_pgentry_t *sl1e;
+ mfn_t sl1mfn = maddr_to_mfn(sl1ma);
+
+ if ( !mfn_eq(sl1mfn, map_mfn) )
+ {
+ if ( map_sl1p )
+ unmap_domain_page(map_sl1p);
+ map_sl1p = map_domain_page(sl1mfn);
+ map_mfn = sl1mfn;
+ }
+ sl1e = map_sl1p + (sl1ma & ~PAGE_MASK);
+
+ if ( l1e_get_flags(*sl1e) & _PAGE_DIRTY )
+ {
+ dirty = 1;
+ /*
+ * Note: this is atomic, so we may clear a
+ * _PAGE_ACCESSED set by another processor.
+ */
+ l1e_remove_flags(*sl1e, _PAGE_DIRTY);
+ flush_tlb = 1;
+ }
+ }
+ break;
+
+ default:
+ /* More than one guest reference,
+ * we don't afford tracking that. */
+ dirty = 1;
+ break;
+ }
+ }
+
+ if ( dirty )
+ {
+ dirty_vram->dirty_bitmap[i / 8] |= 1 << (i % 8);
+ dirty_vram->last_dirty = NOW();
+ }
+ }
+
+ if ( map_sl1p )
+ unmap_domain_page(map_sl1p);
+
+ memcpy(dirty_bitmap, dirty_vram->dirty_bitmap, dirty_size);
+ memset(dirty_vram->dirty_bitmap, 0, dirty_size);
+ if ( dirty_vram->last_dirty + SECONDS(2) < NOW() )
+ {
+ /*
+ * Was clean for more than two seconds, try to disable guest
+ * write access.
+ */
+ for ( i = begin_pfn; i < end_pfn; i++ )
+ {
+ mfn_t mfn = get_gfn_query_unlocked(d, i, &t);
+ if ( !mfn_eq(mfn, INVALID_MFN) )
+ flush_tlb |= sh_remove_write_access(d, mfn, 1, 0);
+ }
+ dirty_vram->last_dirty = -1;
+ }
+ }
+ if ( flush_tlb )
+ guest_flush_tlb_mask(d, d->dirty_cpumask);
+ goto out;
+
+ out_sl1ma:
+ xfree(dirty_vram->sl1ma);
+ out_dirty_vram:
+ xfree(dirty_vram);
+ dirty_vram = d->arch.hvm.dirty_vram = NULL;
+
+ out:
+ paging_unlock(d);
+ if ( rc == 0 && dirty_bitmap != NULL &&
+ copy_to_guest(guest_dirty_bitmap, dirty_bitmap, dirty_size) )
+ {
+ paging_lock(d);
+ for ( i = 0; i < dirty_size; i++ )
+ dirty_vram->dirty_bitmap[i] |= dirty_bitmap[i];
+ paging_unlock(d);
+ rc = -EFAULT;
+ }
+ vfree(dirty_bitmap);
+ p2m_unlock(p2m_get_hostp2m(d));
+ return rc;
+}
+
/*
* Local variables:
* mode: C
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -494,7 +494,6 @@ _sh_propagate(struct vcpu *v,
guest_l1e_t guest_entry = { guest_intpte };
shadow_l1e_t *sp = shadow_entry_ptr;
struct domain *d = v->domain;
- struct sh_dirty_vram *dirty_vram = d->arch.hvm.dirty_vram;
gfn_t target_gfn = guest_l1e_get_gfn(guest_entry);
u32 pass_thru_flags;
u32 gflags, sflags;
@@ -649,15 +648,19 @@ _sh_propagate(struct vcpu *v,
}
}
- if ( unlikely((level == 1) && dirty_vram
- && dirty_vram->last_dirty == -1
- && gfn_x(target_gfn) >= dirty_vram->begin_pfn
- && gfn_x(target_gfn) < dirty_vram->end_pfn) )
+ if ( unlikely(level == 1) && is_hvm_domain(d) )
{
- if ( ft & FETCH_TYPE_WRITE )
- dirty_vram->last_dirty = NOW();
- else
- sflags &= ~_PAGE_RW;
+ struct sh_dirty_vram *dirty_vram = d->arch.hvm.dirty_vram;
+
+ if ( dirty_vram && dirty_vram->last_dirty == -1 &&
+ gfn_x(target_gfn) >= dirty_vram->begin_pfn &&
+ gfn_x(target_gfn) < dirty_vram->end_pfn )
+ {
+ if ( ft & FETCH_TYPE_WRITE )
+ dirty_vram->last_dirty = NOW();
+ else
+ sflags &= ~_PAGE_RW;
+ }
}
/* Read-only memory */
@@ -1082,7 +1085,7 @@ static inline void shadow_vram_get_l1e(s
unsigned long gfn;
struct sh_dirty_vram *dirty_vram = d->arch.hvm.dirty_vram;
- if ( !dirty_vram /* tracking disabled? */
+ if ( !is_hvm_domain(d) || !dirty_vram /* tracking disabled? */
|| !(flags & _PAGE_RW) /* read-only mapping? */
|| !mfn_valid(mfn) ) /* mfn can be invalid in mmio_direct */
return;
@@ -1113,7 +1116,7 @@ static inline void shadow_vram_put_l1e(s
unsigned long gfn;
struct sh_dirty_vram *dirty_vram = d->arch.hvm.dirty_vram;
- if ( !dirty_vram /* tracking disabled? */
+ if ( !is_hvm_domain(d) || !dirty_vram /* tracking disabled? */
|| !(flags & _PAGE_RW) /* read-only mapping? */
|| !mfn_valid(mfn) ) /* mfn can be invalid in mmio_direct */
return;
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -438,6 +438,14 @@ mfn_t oos_snapshot_lookup(struct domain
#endif /* (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC) */
+/* Deliberately free all the memory we can: tear down all of d's shadows. */
+void shadow_blow_tables(struct domain *d);
+
+/*
+ * Remove all mappings of a guest frame from the shadow tables.
+ * Returns non-zero if we need to flush TLBs.
+ */
+int sh_remove_all_mappings(struct domain *d, mfn_t gmfn, gfn_t gfn);
/* Reset the up-pointers of every L3 shadow to 0.
* This is called when l3 shadows stop being pinnable, to clear out all
* [PATCH 2/5] x86/shadow: shadow_table[] needs only one entry for PV-only configs
From: Jan Beulich @ 2020-07-15 9:58 UTC
To: xen-devel
Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Roger Pau Monné
Furthermore the field isn't needed at all with shadow support disabled,
so move it into struct shadow_vcpu.
Introduce for_each_shadow_table(), shortening loops for the 4-level
case at the same time.
Adjust loop variables and a function parameter to be "unsigned int"
where applicable. Also move a comment that ended up misplaced due to
incremental additions.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
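For clarity, the usage pattern the new macro enables (sketch distilled
from the hunks below; the loop body is a placeholder):

    unsigned int i;

    for_each_shadow_table(v, i)
        if ( !pagetable_is_null(v->arch.paging.shadow.shadow_table[i]) )
            /* operate on slot i */;

With SHADOW_PAGING_LEVELS == 3 this walks all four slots; in every other
configuration it degenerates to a single iteration over slot 0, matching
the one-entry array now used in PV-only builds.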
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -959,13 +959,15 @@ static void _shadow_prealloc(struct doma
perfc_incr(shadow_prealloc_2);
for_each_vcpu(d, v)
- for ( i = 0 ; i < 4 ; i++ )
+ for ( i = 0; i < ARRAY_SIZE(v->arch.paging.shadow.shadow_table); i++ )
{
- if ( !pagetable_is_null(v->arch.shadow_table[i]) )
+ if ( !pagetable_is_null(v->arch.paging.shadow.shadow_table[i]) )
{
TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_PREALLOC_UNHOOK);
- shadow_unhook_mappings(d,
- pagetable_get_mfn(v->arch.shadow_table[i]), 0);
+ shadow_unhook_mappings(
+ d,
+ pagetable_get_mfn(v->arch.paging.shadow.shadow_table[i]),
+ 0);
/* See if that freed up enough space */
if ( d->arch.paging.shadow.free_pages >= pages )
@@ -1018,10 +1020,12 @@ void shadow_blow_tables(struct domain *d
/* Second pass: unhook entries of in-use shadows */
for_each_vcpu(d, v)
- for ( i = 0 ; i < 4 ; i++ )
- if ( !pagetable_is_null(v->arch.shadow_table[i]) )
- shadow_unhook_mappings(d,
- pagetable_get_mfn(v->arch.shadow_table[i]), 0);
+ for ( i = 0; i < ARRAY_SIZE(v->arch.paging.shadow.shadow_table); i++ )
+ if ( !pagetable_is_null(v->arch.paging.shadow.shadow_table[i]) )
+ shadow_unhook_mappings(
+ d,
+ pagetable_get_mfn(v->arch.paging.shadow.shadow_table[i]),
+ 0);
/* Make sure everyone sees the unshadowings */
guest_flush_tlb_mask(d, d->dirty_cpumask);
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -85,6 +85,15 @@ const char *const fetch_type_names[] = {
};
#endif
+#if SHADOW_PAGING_LEVELS == 3
+# define for_each_shadow_table(v, i) \
+ for ( (i) = 0; \
+ (i) < ARRAY_SIZE((v)->arch.paging.shadow.shadow_table); \
+ ++(i) )
+#else
+# define for_each_shadow_table(v, i) for ( (i) = 0; (i) < 1; ++(i) )
+#endif
+
/* Helper to perform a local TLB flush. */
static void sh_flush_local(const struct domain *d)
{
@@ -1624,7 +1633,7 @@ static shadow_l4e_t * shadow_get_and_cre
mfn_t *sl4mfn)
{
/* There is always a shadow of the top level table. Get it. */
- *sl4mfn = pagetable_get_mfn(v->arch.shadow_table[0]);
+ *sl4mfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
/* Reading the top level table is always valid. */
return sh_linear_l4_table(v) + shadow_l4_linear_offset(gw->va);
}
@@ -1740,7 +1749,7 @@ static shadow_l2e_t * shadow_get_and_cre
return sh_linear_l2_table(v) + shadow_l2_linear_offset(gw->va);
#else /* 32bit... */
/* There is always a shadow of the top level table. Get it. */
- *sl2mfn = pagetable_get_mfn(v->arch.shadow_table[0]);
+ *sl2mfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
/* This next line is important: the guest l2 has a 16k
* shadow, we need to return the right mfn of the four. This
* call will set it for us as a side-effect. */
@@ -2333,6 +2342,7 @@ int sh_safe_not_to_sync(struct vcpu *v,
struct domain *d = v->domain;
struct page_info *sp;
mfn_t smfn;
+ unsigned int i;
if ( !sh_type_has_up_pointer(d, SH_type_l1_shadow) )
return 0;
@@ -2365,14 +2375,10 @@ int sh_safe_not_to_sync(struct vcpu *v,
ASSERT(mfn_valid(smfn));
#endif
- if ( pagetable_get_pfn(v->arch.shadow_table[0]) == mfn_x(smfn)
-#if (SHADOW_PAGING_LEVELS == 3)
- || pagetable_get_pfn(v->arch.shadow_table[1]) == mfn_x(smfn)
- || pagetable_get_pfn(v->arch.shadow_table[2]) == mfn_x(smfn)
- || pagetable_get_pfn(v->arch.shadow_table[3]) == mfn_x(smfn)
-#endif
- )
- return 0;
+ for_each_shadow_table(v, i)
+ if ( pagetable_get_pfn(v->arch.paging.shadow.shadow_table[i]) ==
+ mfn_x(smfn) )
+ return 0;
/* Only in use in one toplevel shadow, and it's not the one we're
* running on */
@@ -3287,10 +3293,12 @@ static int sh_page_fault(struct vcpu *v,
for_each_vcpu(d, tmp)
{
#if GUEST_PAGING_LEVELS == 3
- int i;
- for ( i = 0; i < 4; i++ )
+ unsigned int i;
+
+ for_each_shadow_table(v, i)
{
- mfn_t smfn = pagetable_get_mfn(v->arch.shadow_table[i]);
+ mfn_t smfn = pagetable_get_mfn(
+ v->arch.paging.shadow.shadow_table[i]);
if ( mfn_valid(smfn) && (mfn_x(smfn) != 0) )
{
@@ -3707,7 +3715,7 @@ sh_update_linear_entries(struct vcpu *v)
*
* Because HVM guests run on the same monitor tables regardless of the
* shadow tables in use, the linear mapping of the shadow tables has to
- * be updated every time v->arch.shadow_table changes.
+ * be updated every time v->arch.paging.shadow.shadow_table changes.
*/
/* Don't try to update the monitor table if it doesn't exist */
@@ -3723,8 +3731,9 @@ sh_update_linear_entries(struct vcpu *v)
if ( v == current )
{
__linear_l4_table[l4_linear_offset(SH_LINEAR_PT_VIRT_START)] =
- l4e_from_pfn(pagetable_get_pfn(v->arch.shadow_table[0]),
- __PAGE_HYPERVISOR_RW);
+ l4e_from_pfn(
+ pagetable_get_pfn(v->arch.paging.shadow.shadow_table[0]),
+ __PAGE_HYPERVISOR_RW);
}
else
{
@@ -3732,8 +3741,9 @@ sh_update_linear_entries(struct vcpu *v)
ml4e = map_domain_page(pagetable_get_mfn(v->arch.hvm.monitor_table));
ml4e[l4_table_offset(SH_LINEAR_PT_VIRT_START)] =
- l4e_from_pfn(pagetable_get_pfn(v->arch.shadow_table[0]),
- __PAGE_HYPERVISOR_RW);
+ l4e_from_pfn(
+ pagetable_get_pfn(v->arch.paging.shadow.shadow_table[0]),
+ __PAGE_HYPERVISOR_RW);
unmap_domain_page(ml4e);
}
@@ -3812,7 +3822,7 @@ sh_update_linear_entries(struct vcpu *v)
/*
- * Removes vcpu->arch.shadow_table[].
+ * Removes v->arch.paging.shadow.shadow_table[].
* Does all appropriate management/bookkeeping/refcounting/etc...
*/
static void
@@ -3820,38 +3830,34 @@ sh_detach_old_tables(struct vcpu *v)
{
struct domain *d = v->domain;
mfn_t smfn;
- int i = 0;
+ unsigned int i;
////
- //// vcpu->arch.shadow_table[]
+ //// vcpu->arch.paging.shadow.shadow_table[]
////
-#if GUEST_PAGING_LEVELS == 3
- /* PAE guests have four shadow_table entries */
- for ( i = 0 ; i < 4 ; i++ )
-#endif
+ for_each_shadow_table(v, i)
{
- smfn = pagetable_get_mfn(v->arch.shadow_table[i]);
+ smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[i]);
if ( mfn_x(smfn) )
sh_put_ref(d, smfn, 0);
- v->arch.shadow_table[i] = pagetable_null();
+ v->arch.paging.shadow.shadow_table[i] = pagetable_null();
}
}
/* Set up the top-level shadow and install it in slot 'slot' of shadow_table */
static void
sh_set_toplevel_shadow(struct vcpu *v,
- int slot,
+ unsigned int slot,
mfn_t gmfn,
unsigned int root_type)
{
mfn_t smfn;
pagetable_t old_entry, new_entry;
-
struct domain *d = v->domain;
/* Remember the old contents of this slot */
- old_entry = v->arch.shadow_table[slot];
+ old_entry = v->arch.paging.shadow.shadow_table[slot];
/* Now figure out the new contents: is this a valid guest MFN? */
if ( !mfn_valid(gmfn) )
@@ -3893,7 +3899,7 @@ sh_set_toplevel_shadow(struct vcpu *v,
SHADOW_PRINTK("%u/%u [%u] gmfn %#"PRI_mfn" smfn %#"PRI_mfn"\n",
GUEST_PAGING_LEVELS, SHADOW_PAGING_LEVELS, slot,
mfn_x(gmfn), mfn_x(pagetable_get_mfn(new_entry)));
- v->arch.shadow_table[slot] = new_entry;
+ v->arch.paging.shadow.shadow_table[slot] = new_entry;
/* Decrement the refcount of the old contents of this slot */
if ( !pagetable_is_null(old_entry) ) {
@@ -3999,7 +4005,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
////
- //// vcpu->arch.shadow_table[]
+ //// vcpu->arch.paging.shadow.shadow_table[]
////
/* We revoke write access to the new guest toplevel page(s) before we
@@ -4056,7 +4062,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow);
if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
{
- mfn_t smfn = pagetable_get_mfn(v->arch.shadow_table[0]);
+ mfn_t smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
if ( !(v->arch.flags & TF_kernel_mode) && VM_ASSIST(d, m2p_strict) )
zap_ro_mpt(smfn);
@@ -4074,9 +4080,10 @@ sh_update_cr3(struct vcpu *v, int do_loc
///
#if SHADOW_PAGING_LEVELS == 3
{
- mfn_t smfn = pagetable_get_mfn(v->arch.shadow_table[0]);
- int i;
- for ( i = 0; i < 4; i++ )
+ mfn_t smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
+ unsigned int i;
+
+ for_each_shadow_table(v, i)
{
#if GUEST_PAGING_LEVELS == 2
/* 2-on-3: make a PAE l3 that points at the four-page l2 */
@@ -4084,7 +4091,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
smfn = sh_next_page(smfn);
#else
/* 3-on-3: make a PAE l3 that points at the four l2 pages */
- smfn = pagetable_get_mfn(v->arch.shadow_table[i]);
+ smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[i]);
#endif
v->arch.paging.shadow.l3table[i] =
(mfn_x(smfn) == 0)
@@ -4108,7 +4115,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
/* We don't support PV except guest == shadow == config levels */
BUILD_BUG_ON(GUEST_PAGING_LEVELS != SHADOW_PAGING_LEVELS);
/* Just use the shadow top-level directly */
- make_cr3(v, pagetable_get_mfn(v->arch.shadow_table[0]));
+ make_cr3(v, pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]));
}
#endif
@@ -4124,7 +4131,8 @@ sh_update_cr3(struct vcpu *v, int do_loc
v->arch.hvm.hw_cr[3] = virt_to_maddr(&v->arch.paging.shadow.l3table);
#else
/* 4-on-4: Just use the shadow top-level directly */
- v->arch.hvm.hw_cr[3] = pagetable_get_paddr(v->arch.shadow_table[0]);
+ v->arch.hvm.hw_cr[3] =
+ pagetable_get_paddr(v->arch.paging.shadow.shadow_table[0]);
#endif
hvm_update_guest_cr3(v, noflush);
}
@@ -4443,7 +4451,7 @@ static void sh_pagetable_dying(paddr_t g
{
struct vcpu *v = current;
struct domain *d = v->domain;
- int i = 0;
+ unsigned int i;
int flush = 0;
int fast_path = 0;
paddr_t gcr3 = 0;
@@ -4474,15 +4482,16 @@ static void sh_pagetable_dying(paddr_t g
gl3pa = map_domain_page(l3mfn);
gl3e = (guest_l3e_t *)(gl3pa + ((unsigned long)gpa & ~PAGE_MASK));
}
- for ( i = 0; i < 4; i++ )
+ for_each_shadow_table(v, i)
{
mfn_t smfn, gmfn;
- if ( fast_path ) {
- if ( pagetable_is_null(v->arch.shadow_table[i]) )
+ if ( fast_path )
+ {
+ if ( pagetable_is_null(v->arch.paging.shadow.shadow_table[i]) )
smfn = INVALID_MFN;
else
- smfn = pagetable_get_mfn(v->arch.shadow_table[i]);
+ smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[i]);
}
else
{
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -135,6 +135,14 @@ struct shadow_vcpu {
l3_pgentry_t l3table[4] __attribute__((__aligned__(32)));
/* PAE guests: per-vcpu cache of the top-level *guest* entries */
l3_pgentry_t gl3e[4] __attribute__((__aligned__(32)));
+
+ /* shadow(s) of guest (MFN) */
+#ifdef CONFIG_HVM
+ pagetable_t shadow_table[4];
+#else
+ pagetable_t shadow_table[1];
+#endif
+
/* Last MFN that we emulated a write to as unshadow heuristics. */
unsigned long last_emulated_mfn_for_unshadow;
/* MFN of the last shadow that we shot a writeable mapping in */
@@ -576,6 +584,10 @@ struct arch_vcpu
struct hvm_vcpu hvm;
};
+ /*
+ * guest_table{,_user} hold a ref to the page, and also a type-count
+ * unless shadow refcounts are in use
+ */
pagetable_t guest_table_user; /* (MFN) x86/64 user-space pagetable */
pagetable_t guest_table; /* (MFN) guest notion of cr3 */
struct page_info *old_guest_table; /* partially destructed pagetable */
@@ -583,9 +595,7 @@ struct arch_vcpu
/* former, if any */
bool old_guest_table_partial; /* Are we dropping a type ref, or just
* finishing up a partial de-validation? */
- /* guest_table holds a ref to the page, and also a type-count unless
- * shadow refcounts are in use */
- pagetable_t shadow_table[4]; /* (MFN) shadow(s) of guest */
+
unsigned long cr3; /* (MA) value to install in HW CR3 */
/*
* [PATCH 3/5] x86/PV: drop a few misleading paging_mode_refcounts() checks
From: Jan Beulich @ 2020-07-15 9:59 UTC
To: xen-devel
Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Roger Pau Monné
The filling and cleaning up of v->arch.guest_table in new_guest_cr3()
has apparently been inconsistent so far: there was a type ref acquired
unconditionally for the new top level page table, but the dropping of
the old type ref was conditional upon !paging_mode_refcounts(). Make
the two consistent, and mirror this to arch_set_info_guest() as well.
Also move new_guest_cr3()'s #ifdef to around the whole function - both
callers now get built only when CONFIG_PV is set, i.e. there's no need
to retain a stub.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
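The pairing now enforced on both paths, in heavily simplified sketch
form (the real code is preemptible and hence more involved):

    /* Acquire: unconditional type ref on the new top level page table. */
    rc = get_page_type_preemptible(cr3_page, PGT_root_page_table);
    /* ... */
    /* Release: now equally unconditional when replacing the old one. */
    rc = put_page_and_type_preemptible(mfn_to_page(old_base_mfn));

Previously only the release side was conditional upon
!paging_mode_refcounts(), despite the acquire side not being so.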
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1122,8 +1122,6 @@ int arch_set_info_guest(
if ( !cr3_page )
rc = -EINVAL;
- else if ( paging_mode_refcounts(d) )
- /* nothing */;
else if ( cr3_page == v->arch.old_guest_table )
{
v->arch.old_guest_table = NULL;
@@ -1144,8 +1142,7 @@ int arch_set_info_guest(
case -ERESTART:
break;
case 0:
- if ( !compat && !VM_ASSIST(d, m2p_strict) &&
- !paging_mode_refcounts(d) )
+ if ( !compat && !VM_ASSIST(d, m2p_strict) )
fill_ro_mpt(cr3_mfn);
break;
default:
@@ -1166,7 +1163,7 @@ int arch_set_info_guest(
if ( !cr3_page )
rc = -EINVAL;
- else if ( !paging_mode_refcounts(d) )
+ else
{
rc = get_page_type_preemptible(cr3_page, PGT_root_page_table);
switch ( rc )
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3149,9 +3149,9 @@ int vcpu_destroy_pagetables(struct vcpu
return rc;
}
+#ifdef CONFIG_PV
int new_guest_cr3(mfn_t mfn)
{
-#ifdef CONFIG_PV
struct vcpu *curr = current;
struct domain *d = curr->domain;
int rc;
@@ -3220,7 +3220,7 @@ int new_guest_cr3(mfn_t mfn)
pv_destroy_ldt(curr); /* Unconditional TLB flush later. */
- if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
+ if ( !VM_ASSIST(d, m2p_strict) )
fill_ro_mpt(mfn);
curr->arch.guest_table = pagetable_from_mfn(mfn);
update_cr3(curr);
@@ -3231,30 +3231,24 @@ int new_guest_cr3(mfn_t mfn)
{
struct page_info *page = mfn_to_page(old_base_mfn);
- if ( paging_mode_refcounts(d) )
- put_page(page);
- else
- switch ( rc = put_page_and_type_preemptible(page) )
- {
- case -EINTR:
- case -ERESTART:
- curr->arch.old_guest_ptpg = NULL;
- curr->arch.old_guest_table = page;
- curr->arch.old_guest_table_partial = (rc == -ERESTART);
- rc = -ERESTART;
- break;
- default:
- BUG_ON(rc);
- break;
- }
+ switch ( rc = put_page_and_type_preemptible(page) )
+ {
+ case -EINTR:
+ case -ERESTART:
+ curr->arch.old_guest_ptpg = NULL;
+ curr->arch.old_guest_table = page;
+ curr->arch.old_guest_table_partial = (rc == -ERESTART);
+ rc = -ERESTART;
+ break;
+ default:
+ BUG_ON(rc);
+ break;
+ }
}
return rc;
-#else
- ASSERT_UNREACHABLE();
- return -EINVAL;
-#endif
}
+#endif
#ifdef CONFIG_PV
static int vcpumask_to_pcpumask(
* [PATCH 4/5] x86/shadow: have just a single instance of sh_set_toplevel_shadow()
From: Jan Beulich @ 2020-07-15 9:59 UTC
To: xen-devel
Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Roger Pau Monné
The only guest/shadow level dependent piece here is the call to
sh_make_shadow(). Make a pointer to the respective function an
argument of sh_set_toplevel_shadow(), allowing it to be moved to
common.c.
This implies making get_shadow_status() available to common.c; its set
and delete counterparts are moved along with it.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Besides reducing compiled code size, this also avoids the function
becoming unused in !HVM builds (in two out of the three objects built)
in a subsequent change.
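For readers unfamiliar with the build setup, the underlying constraint:
multi.c gets compiled once per guest paging level (i.e. into three
objects), so each instance has its own sh_make_shadow(). Passing that
function via a pointer is what allows the single common.c instance of
sh_set_toplevel_shadow() to invoke the right one. A call site then
looks like this (as in the multi.c hunks below):

    sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow, sh_make_shadow);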
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -2560,6 +2560,80 @@ void shadow_update_paging_modes(struct v
paging_unlock(v->domain);
}
+/* Set up the top-level shadow and install it in slot 'slot' of shadow_table */
+void sh_set_toplevel_shadow(struct vcpu *v,
+ unsigned int slot,
+ mfn_t gmfn,
+ unsigned int root_type,
+ mfn_t (*make_shadow)(struct vcpu *v,
+ mfn_t gmfn,
+ uint32_t shadow_type))
+{
+ mfn_t smfn;
+ pagetable_t old_entry, new_entry;
+ struct domain *d = v->domain;
+
+ /* Remember the old contents of this slot */
+ old_entry = v->arch.paging.shadow.shadow_table[slot];
+
+ /* Now figure out the new contents: is this a valid guest MFN? */
+ if ( !mfn_valid(gmfn) )
+ {
+ new_entry = pagetable_null();
+ goto install_new_entry;
+ }
+
+ /* Guest mfn is valid: shadow it and install the shadow */
+ smfn = get_shadow_status(d, gmfn, root_type);
+ if ( !mfn_valid(smfn) )
+ {
+ /* Make sure there's enough free shadow memory. */
+ shadow_prealloc(d, root_type, 1);
+ /* Shadow the page. */
+ smfn = make_shadow(v, gmfn, root_type);
+ }
+ ASSERT(mfn_valid(smfn));
+
+ /* Take a ref to this page: it will be released in sh_detach_old_tables()
+ * or the next call to set_toplevel_shadow() */
+ if ( sh_get_ref(d, smfn, 0) )
+ {
+ /* Pin the shadow and put it (back) on the list of pinned shadows */
+ sh_pin(d, smfn);
+
+ new_entry = pagetable_from_mfn(smfn);
+ }
+ else
+ {
+ printk(XENLOG_G_ERR "can't install %"PRI_mfn" as toplevel shadow\n",
+ mfn_x(smfn));
+ domain_crash(d);
+ new_entry = pagetable_null();
+ }
+
+ install_new_entry:
+ /* Done. Install it */
+ SHADOW_PRINTK("%u [%u] gmfn %#"PRI_mfn" smfn %#"PRI_mfn"\n",
+ v->arch.paging.mode->shadow.shadow_levels, slot,
+ mfn_x(gmfn), mfn_x(pagetable_get_mfn(new_entry)));
+ v->arch.paging.shadow.shadow_table[slot] = new_entry;
+
+ /* Decrement the refcount of the old contents of this slot */
+ if ( !pagetable_is_null(old_entry) )
+ {
+ mfn_t old_smfn = pagetable_get_mfn(old_entry);
+ /* Need to repin the old toplevel shadow if it's been unpinned
+ * by shadow_prealloc(): in PV mode we're still running on this
+ * shadow and it's not safe to free it yet. */
+ if ( !mfn_to_page(old_smfn)->u.sh.pinned && !sh_pin(d, old_smfn) )
+ {
+ printk(XENLOG_G_ERR "can't re-pin %"PRI_mfn"\n", mfn_x(old_smfn));
+ domain_crash(d);
+ }
+ sh_put_ref(d, old_smfn, 0);
+ }
+}
+
/**************************************************************************/
/* Turning on and off shadow features */
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -103,7 +103,7 @@ static void sh_flush_local(const struct
/**************************************************************************/
/* Hash table mapping from guest pagetables to shadows
*
- * Normal case: maps the mfn of a guest page to the mfn of its shadow page.
+ * normal case: see private.h.
* FL1's: maps the *gfn* of the start of a superpage to the mfn of a
* shadow L1 which maps its "splinters".
*/
@@ -117,16 +117,6 @@ get_fl1_shadow_status(struct domain *d,
return smfn;
}
-static inline mfn_t
-get_shadow_status(struct domain *d, mfn_t gmfn, u32 shadow_type)
-/* Look for shadows in the hash table */
-{
- mfn_t smfn = shadow_hash_lookup(d, mfn_x(gmfn), shadow_type);
- ASSERT(!mfn_valid(smfn) || mfn_to_page(smfn)->u.sh.head);
- perfc_incr(shadow_get_shadow_status);
- return smfn;
-}
-
static inline void
set_fl1_shadow_status(struct domain *d, gfn_t gfn, mfn_t smfn)
/* Put an FL1 shadow into the hash table */
@@ -139,27 +129,6 @@ set_fl1_shadow_status(struct domain *d,
}
static inline void
-set_shadow_status(struct domain *d, mfn_t gmfn, u32 shadow_type, mfn_t smfn)
-/* Put a shadow into the hash table */
-{
- int res;
-
- SHADOW_PRINTK("d%d gmfn=%lx, type=%08x, smfn=%lx\n",
- d->domain_id, mfn_x(gmfn), shadow_type, mfn_x(smfn));
-
- ASSERT(mfn_to_page(smfn)->u.sh.head);
-
- /* 32-bit PV guests don't own their l4 pages so can't get_page them */
- if ( !is_pv_32bit_domain(d) || shadow_type != SH_type_l4_64_shadow )
- {
- res = get_page(mfn_to_page(gmfn), d);
- ASSERT(res == 1);
- }
-
- shadow_hash_insert(d, mfn_x(gmfn), shadow_type, smfn);
-}
-
-static inline void
delete_fl1_shadow_status(struct domain *d, gfn_t gfn, mfn_t smfn)
/* Remove a shadow from the hash table */
{
@@ -169,19 +138,6 @@ delete_fl1_shadow_status(struct domain *
shadow_hash_delete(d, gfn_x(gfn), SH_type_fl1_shadow, smfn);
}
-static inline void
-delete_shadow_status(struct domain *d, mfn_t gmfn, u32 shadow_type, mfn_t smfn)
-/* Remove a shadow from the hash table */
-{
- SHADOW_PRINTK("d%d gmfn=%"PRI_mfn", type=%08x, smfn=%"PRI_mfn"\n",
- d->domain_id, mfn_x(gmfn), shadow_type, mfn_x(smfn));
- ASSERT(mfn_to_page(smfn)->u.sh.head);
- shadow_hash_delete(d, mfn_x(gmfn), shadow_type, smfn);
- /* 32-bit PV guests don't own their l4 pages; see set_shadow_status */
- if ( !is_pv_32bit_domain(d) || shadow_type != SH_type_l4_64_shadow )
- put_page(mfn_to_page(gmfn));
-}
-
/**************************************************************************/
/* Functions for walking the guest page tables */
@@ -3845,78 +3801,6 @@ sh_detach_old_tables(struct vcpu *v)
}
}
-/* Set up the top-level shadow and install it in slot 'slot' of shadow_table */
-static void
-sh_set_toplevel_shadow(struct vcpu *v,
- unsigned int slot,
- mfn_t gmfn,
- unsigned int root_type)
-{
- mfn_t smfn;
- pagetable_t old_entry, new_entry;
- struct domain *d = v->domain;
-
- /* Remember the old contents of this slot */
- old_entry = v->arch.paging.shadow.shadow_table[slot];
-
- /* Now figure out the new contents: is this a valid guest MFN? */
- if ( !mfn_valid(gmfn) )
- {
- new_entry = pagetable_null();
- goto install_new_entry;
- }
-
- /* Guest mfn is valid: shadow it and install the shadow */
- smfn = get_shadow_status(d, gmfn, root_type);
- if ( !mfn_valid(smfn) )
- {
- /* Make sure there's enough free shadow memory. */
- shadow_prealloc(d, root_type, 1);
- /* Shadow the page. */
- smfn = sh_make_shadow(v, gmfn, root_type);
- }
- ASSERT(mfn_valid(smfn));
-
- /* Take a ref to this page: it will be released in sh_detach_old_tables()
- * or the next call to set_toplevel_shadow() */
- if ( sh_get_ref(d, smfn, 0) )
- {
- /* Pin the shadow and put it (back) on the list of pinned shadows */
- sh_pin(d, smfn);
-
- new_entry = pagetable_from_mfn(smfn);
- }
- else
- {
- printk(XENLOG_G_ERR "can't install %"PRI_mfn" as toplevel shadow\n",
- mfn_x(smfn));
- domain_crash(d);
- new_entry = pagetable_null();
- }
-
- install_new_entry:
- /* Done. Install it */
- SHADOW_PRINTK("%u/%u [%u] gmfn %#"PRI_mfn" smfn %#"PRI_mfn"\n",
- GUEST_PAGING_LEVELS, SHADOW_PAGING_LEVELS, slot,
- mfn_x(gmfn), mfn_x(pagetable_get_mfn(new_entry)));
- v->arch.paging.shadow.shadow_table[slot] = new_entry;
-
- /* Decrement the refcount of the old contents of this slot */
- if ( !pagetable_is_null(old_entry) ) {
- mfn_t old_smfn = pagetable_get_mfn(old_entry);
- /* Need to repin the old toplevel shadow if it's been unpinned
- * by shadow_prealloc(): in PV mode we're still running on this
- * shadow and it's not safe to free it yet. */
- if ( !mfn_to_page(old_smfn)->u.sh.pinned && !sh_pin(d, old_smfn) )
- {
- printk(XENLOG_G_ERR "can't re-pin %"PRI_mfn"\n", mfn_x(old_smfn));
- domain_crash(d);
- }
- sh_put_ref(d, old_smfn, 0);
- }
-}
-
-
static void
sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
/* Updates vcpu->arch.cr3 after the guest has changed CR3.
@@ -4014,7 +3898,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
#if GUEST_PAGING_LEVELS == 2
if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
guest_flush_tlb_mask(d, d->dirty_cpumask);
- sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow);
+ sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow, sh_make_shadow);
#elif GUEST_PAGING_LEVELS == 3
/* PAE guests have four shadow_table entries, based on the
* current values of the guest's four l3es. */
@@ -4048,18 +3932,20 @@ sh_update_cr3(struct vcpu *v, int do_loc
if ( p2m_is_ram(p2mt) )
sh_set_toplevel_shadow(v, i, gl2mfn, (i == 3)
? SH_type_l2h_shadow
- : SH_type_l2_shadow);
+ : SH_type_l2_shadow,
+ sh_make_shadow);
else
- sh_set_toplevel_shadow(v, i, INVALID_MFN, 0);
+ sh_set_toplevel_shadow(v, i, INVALID_MFN, 0,
+ sh_make_shadow);
}
else
- sh_set_toplevel_shadow(v, i, INVALID_MFN, 0);
+ sh_set_toplevel_shadow(v, i, INVALID_MFN, 0, sh_make_shadow);
}
}
#elif GUEST_PAGING_LEVELS == 4
if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
guest_flush_tlb_mask(d, d->dirty_cpumask);
- sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow);
+ sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow, sh_make_shadow);
if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
{
mfn_t smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -357,6 +357,15 @@ mfn_t shadow_alloc(struct domain *d,
unsigned long backpointer);
void shadow_free(struct domain *d, mfn_t smfn);
+/* Set up the top-level shadow and install it in slot 'slot' of shadow_table */
+void sh_set_toplevel_shadow(struct vcpu *v,
+ unsigned int slot,
+ mfn_t gmfn,
+ unsigned int root_type,
+ mfn_t (*make_shadow)(struct vcpu *v,
+ mfn_t gmfn,
+ uint32_t shadow_type));
+
/* Install the xen mappings in various flavours of shadow */
void sh_install_xen_entries_in_l4(struct domain *, mfn_t gl4mfn, mfn_t sl4mfn);
@@ -701,6 +710,58 @@ static inline void sh_unpin(struct domai
}
+/**************************************************************************/
+/* Hash table mapping from guest pagetables to shadows
+ *
+ * Normal case: maps the mfn of a guest page to the mfn of its shadow page.
+ * FL1's: see multi.c.
+ */
+
+static inline mfn_t
+get_shadow_status(struct domain *d, mfn_t gmfn, u32 shadow_type)
+/* Look for shadows in the hash table */
+{
+ mfn_t smfn = shadow_hash_lookup(d, mfn_x(gmfn), shadow_type);
+ ASSERT(!mfn_valid(smfn) || mfn_to_page(smfn)->u.sh.head);
+ perfc_incr(shadow_get_shadow_status);
+ return smfn;
+}
+
+static inline void
+set_shadow_status(struct domain *d, mfn_t gmfn, u32 shadow_type, mfn_t smfn)
+/* Put a shadow into the hash table */
+{
+ int res;
+
+ SHADOW_PRINTK("d%d gmfn=%lx, type=%08x, smfn=%lx\n",
+ d->domain_id, mfn_x(gmfn), shadow_type, mfn_x(smfn));
+
+ ASSERT(mfn_to_page(smfn)->u.sh.head);
+
+ /* 32-bit PV guests don't own their l4 pages so can't get_page them */
+ if ( !is_pv_32bit_domain(d) || shadow_type != SH_type_l4_64_shadow )
+ {
+ res = get_page(mfn_to_page(gmfn), d);
+ ASSERT(res == 1);
+ }
+
+ shadow_hash_insert(d, mfn_x(gmfn), shadow_type, smfn);
+}
+
+static inline void
+delete_shadow_status(struct domain *d, mfn_t gmfn, u32 shadow_type, mfn_t smfn)
+/* Remove a shadow from the hash table */
+{
+ SHADOW_PRINTK("d%d gmfn=%"PRI_mfn", type=%08x, smfn=%"PRI_mfn"\n",
+ d->domain_id, mfn_x(gmfn), shadow_type, mfn_x(smfn));
+ ASSERT(mfn_to_page(smfn)->u.sh.head);
+ shadow_hash_delete(d, mfn_x(gmfn), shadow_type, smfn);
+ /* 32-bit PV guests don't own their l4 pages; see set_shadow_status */
+ if ( !is_pv_32bit_domain(d) || shadow_type != SH_type_l4_64_shadow )
+ put_page(mfn_to_page(gmfn));
+}
+
+
/**************************************************************************/
/* PTE-write emulation. */
* [PATCH 5/5] x86/shadow: l3table[] and gl3e[] are HVM only
From: Jan Beulich @ 2020-07-15 10:00 UTC
To: xen-devel
Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Roger Pau Monné
... by the very fact that they're 3-level specific, while PV always
gets run in 4-level mode. This requires adding some seemingly redundant
#ifdef-s - it will be possible to drop some of them again once 2- and
3-level guest code no longer gets built in !HVM configs, but I'm afraid
there's still quite a bit of disentangling work to be done before that
becomes possible.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
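For orientation, struct shadow_vcpu ends up shaped roughly like this
(condensed from the domain.h hunk at the end of this patch; alignment
attributes and unrelated fields omitted):

    struct shadow_vcpu {
    #ifdef CONFIG_SHADOW_PAGING
    #ifdef CONFIG_HVM
        /* PAE guests: per-vcpu shadow top-level table */
        l3_pgentry_t l3table[4];
        /* PAE guests: per-vcpu cache of the top-level *guest* entries */
        l3_pgentry_t gl3e[4];
        /* shadow(s) of guest (MFN) */
        pagetable_t shadow_table[4];
    #else
        /* shadow of guest (MFN) */
        pagetable_t shadow_table[1];
    #endif
        /* ... */
    #endif
    };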
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -150,10 +150,7 @@ sh_walk_guest_tables(struct vcpu *v, uns
? cr3_pa(v->arch.hvm.guest_cr[3]) >> PAGE_SHIFT
: pagetable_get_pfn(v->arch.guest_table));
-#if GUEST_PAGING_LEVELS == 3 /* PAE */
- return guest_walk_tables(v, p2m_get_hostp2m(v->domain), va, gw, pfec,
- root_gfn, INVALID_MFN, v->arch.paging.shadow.gl3e);
-#else /* 32 or 64 */
+#if GUEST_PAGING_LEVELS != 3 /* 32 or 64 */
const struct domain *d = v->domain;
mfn_t root_mfn = (v->arch.flags & TF_kernel_mode
? pagetable_get_mfn(v->arch.guest_table)
@@ -165,6 +162,13 @@ sh_walk_guest_tables(struct vcpu *v, uns
unmap_domain_page(root_map);
return ok;
+#elif !defined(CONFIG_HVM)
+ (void)root_gfn;
+ memset(gw, 0, sizeof(*gw));
+ return false;
+#else /* PAE */
+ return guest_walk_tables(v, p2m_get_hostp2m(v->domain), va, gw, pfec,
+ root_gfn, INVALID_MFN, v->arch.paging.shadow.gl3e);
#endif
}
@@ -211,7 +215,7 @@ shadow_check_gwalk(struct vcpu *v, unsig
l3p = map_domain_page(gw->l3mfn);
mismatch |= (gw->l3e.l3 != l3p[guest_l3_table_offset(va)].l3);
unmap_domain_page(l3p);
-#else
+#elif defined(CONFIG_HVM)
mismatch |= (gw->l3e.l3 !=
v->arch.paging.shadow.gl3e[guest_l3_table_offset(va)].l3);
#endif
@@ -1693,6 +1697,8 @@ static shadow_l2e_t * shadow_get_and_cre
}
/* Now follow it down a level. Guaranteed to succeed. */
return sh_linear_l2_table(v) + shadow_l2_linear_offset(gw->va);
+#elif !defined(CONFIG_HVM)
+ return NULL;
#elif GUEST_PAGING_LEVELS == 3 /* PAE... */
/* We never demand-shadow PAE l3es: they are only created in
* sh_update_cr3(). Check if the relevant sl3e is present. */
@@ -3526,6 +3532,8 @@ static bool sh_invlpg(struct vcpu *v, un
if ( !(shadow_l3e_get_flags(sl3e) & _PAGE_PRESENT) )
return false;
}
+#elif !defined(CONFIG_HVM)
+ return false;
#else /* SHADOW_PAGING_LEVELS == 3 */
if ( !(l3e_get_flags(v->arch.paging.shadow.l3table[shadow_l3_linear_offset(linear)])
& _PAGE_PRESENT) )
@@ -3679,7 +3687,9 @@ sh_update_linear_entries(struct vcpu *v)
pagetable_get_pfn(v->arch.hvm.monitor_table) == 0 )
return;
-#if SHADOW_PAGING_LEVELS == 4
+#if !defined(CONFIG_HVM)
+ return;
+#elif SHADOW_PAGING_LEVELS == 4
/* For HVM, just need to update the l4e that points to the shadow l4. */
@@ -3816,7 +3826,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
{
struct domain *d = v->domain;
mfn_t gmfn;
-#if GUEST_PAGING_LEVELS == 3
+#if GUEST_PAGING_LEVELS == 3 && defined(CONFIG_HVM)
const guest_l3e_t *gl3e;
unsigned int i, guest_idx;
#endif
@@ -3867,7 +3877,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
#endif
gmfn = pagetable_get_mfn(v->arch.guest_table);
-#if GUEST_PAGING_LEVELS == 3
+#if GUEST_PAGING_LEVELS == 3 && defined(CONFIG_HVM)
/*
* On PAE guests we don't use a mapping of the guest's own top-level
* table. We cache the current state of that table and shadow that,
@@ -3895,10 +3905,22 @@ sh_update_cr3(struct vcpu *v, int do_loc
/* We revoke write access to the new guest toplevel page(s) before we
* replace the old shadow pagetable(s), so that we can safely use the
* (old) shadow linear maps in the writeable mapping heuristics. */
-#if GUEST_PAGING_LEVELS == 2
- if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
+#if GUEST_PAGING_LEVELS == 4
+ if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
guest_flush_tlb_mask(d, d->dirty_cpumask);
- sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow, sh_make_shadow);
+ sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow, sh_make_shadow);
+ if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
+ {
+ mfn_t smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
+
+ if ( !(v->arch.flags & TF_kernel_mode) && VM_ASSIST(d, m2p_strict) )
+ zap_ro_mpt(smfn);
+ else if ( (v->arch.flags & TF_kernel_mode) &&
+ !VM_ASSIST(d, m2p_strict) )
+ fill_ro_mpt(smfn);
+ }
+#elif !defined(CONFIG_HVM)
+ ASSERT_UNREACHABLE();
#elif GUEST_PAGING_LEVELS == 3
/* PAE guests have four shadow_table entries, based on the
* current values of the guest's four l3es. */
@@ -3907,7 +3929,8 @@ sh_update_cr3(struct vcpu *v, int do_loc
gfn_t gl2gfn;
mfn_t gl2mfn;
p2m_type_t p2mt;
- const guest_l3e_t *gl3e = v->arch.paging.shadow.gl3e;
+
+ gl3e = v->arch.paging.shadow.gl3e;
/* First, make all four entries read-only. */
for ( i = 0; i < 4; i++ )
@@ -3942,25 +3965,16 @@ sh_update_cr3(struct vcpu *v, int do_loc
sh_set_toplevel_shadow(v, i, INVALID_MFN, 0, sh_make_shadow);
}
}
-#elif GUEST_PAGING_LEVELS == 4
- if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
+#elif GUEST_PAGING_LEVELS == 2
+ if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
guest_flush_tlb_mask(d, d->dirty_cpumask);
- sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow, sh_make_shadow);
- if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
- {
- mfn_t smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
-
- if ( !(v->arch.flags & TF_kernel_mode) && VM_ASSIST(d, m2p_strict) )
- zap_ro_mpt(smfn);
- else if ( (v->arch.flags & TF_kernel_mode) &&
- !VM_ASSIST(d, m2p_strict) )
- fill_ro_mpt(smfn);
- }
+ sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow, sh_make_shadow);
#else
#error This should never happen
#endif
+#ifdef CONFIG_HVM
///
/// v->arch.paging.shadow.l3table
///
@@ -3986,7 +4000,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
}
}
#endif /* SHADOW_PAGING_LEVELS == 3 */
-
+#endif /* CONFIG_HVM */
///
/// v->arch.cr3
@@ -4006,6 +4020,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
#endif
+#ifdef CONFIG_HVM
///
/// v->arch.hvm.hw_cr[3]
///
@@ -4022,6 +4037,7 @@ sh_update_cr3(struct vcpu *v, int do_loc
#endif
hvm_update_guest_cr3(v, noflush);
}
+#endif /* CONFIG_HVM */
/* Fix up the linear pagetable mappings */
sh_update_linear_entries(v);
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -131,15 +131,16 @@ struct shadow_domain {
struct shadow_vcpu {
#ifdef CONFIG_SHADOW_PAGING
+#ifdef CONFIG_HVM
/* PAE guests: per-vcpu shadow top-level table */
l3_pgentry_t l3table[4] __attribute__((__aligned__(32)));
/* PAE guests: per-vcpu cache of the top-level *guest* entries */
l3_pgentry_t gl3e[4] __attribute__((__aligned__(32)));
/* shadow(s) of guest (MFN) */
-#ifdef CONFIG_HVM
pagetable_t shadow_table[4];
#else
+ /* shadow of guest (MFN) */
pagetable_t shadow_table[1];
#endif
* Re: [PATCH 5/5] x86/shadow: l3table[] and gl3e[] are HVM only
From: Tim Deegan @ 2020-07-18 18:20 UTC
To: Jan Beulich
Cc: George Dunlap, xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper
At 12:00 +0200 on 15 Jul (1594814409), Jan Beulich wrote:
> ... by the very fact that they're 3-level specific, while PV always gets
> run in 4-level mode. This requires adding some seemingly redundant
> #ifdef-s - some of them will be possible to drop again once 2- and
> 3-level guest code doesn't get built anymore in !HVM configs, but I'm
> afraid there's still quite a bit of disentangling work to be done to
> make this possible.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Looks good. It seems like the new code for '3-level non-HVM' in
guest-walks ought to have some sort of assert-unreachable in it too
- or is there a reason not to?
Cheers,
Tim.
* Re: [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up
From: Tim Deegan @ 2020-07-18 18:21 UTC
To: Jan Beulich
Cc: George Dunlap, xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper
At 11:56 +0200 on 15 Jul (1594814214), Jan Beulich wrote:
> This in particular goes a few small steps further towards proper
> !HVM and !PV config handling (i.e. no carrying of unnecessary
> baggage).
>
> 1: x86/shadow: dirty VRAM tracking is needed for HVM only
> 2: x86/shadow: shadow_table[] needs only one entry for PV-only configs
> 3: x86/PV: drop a few misleading paging_mode_refcounts() checks
> 4: x86/shadow: have just a single instance of sh_set_toplevel_shadow()
> 5: x86/shadow: l3table[] and gl3e[] are HVM only
I sent a question on #5 separatly; otherwise these all seem good to
me, thank you!
Acked-by: Tim Deegan <tim@xen.org>
* Re: [PATCH 5/5] x86/shadow: l3table[] and gl3e[] are HVM only
From: Jan Beulich @ 2020-07-20 8:55 UTC
To: Tim Deegan
Cc: George Dunlap, xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper
On 18.07.2020 20:20, Tim Deegan wrote:
> At 12:00 +0200 on 15 Jul (1594814409), Jan Beulich wrote:
>> ... by the very fact that they're 3-level specific, while PV always gets
>> run in 4-level mode. This requires adding some seemingly redundant
>> #ifdef-s - some of them will be possible to drop again once 2- and
>> 3-level guest code doesn't get built anymore in !HVM configs, but I'm
>> afraid there's still quite a bit of disentangling work to be done to
>> make this possible.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Looks good. It seems like the new code for '3-level non-HVM' in
> guest-walks ought to have some sort of assert-unreachable in them too
> - or is there a reason to to?
You mean this piece of code
+#elif !defined(CONFIG_HVM)
+ (void)root_gfn;
+ memset(gw, 0, sizeof(*gw));
+ return false;
+#else /* PAE */
If so - sure, ASSERT_UNREACHABLE() could be added there. It simply
didn't occur to me. I take it your ack for the entire series holds
here with this addition.
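I.e. the hunk would then become (sketch only; exact placement of the
assertion to be settled when committing):

    #elif !defined(CONFIG_HVM)
        ASSERT_UNREACHABLE();
        (void)root_gfn;
        memset(gw, 0, sizeof(*gw));
        return false;
    #else /* PAE */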
Jan
* Re: [PATCH 5/5] x86/shadow: l3table[] and gl3e[] are HVM only
From: Tim Deegan @ 2020-07-20 17:37 UTC
To: Jan Beulich
Cc: George Dunlap, xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper
At 10:55 +0200 on 20 Jul (1595242521), Jan Beulich wrote:
> On 18.07.2020 20:20, Tim Deegan wrote:
> > At 12:00 +0200 on 15 Jul (1594814409), Jan Beulich wrote:
> >> ... by the very fact that they're 3-level specific, while PV always gets
> >> run in 4-level mode. This requires adding some seemingly redundant
> >> #ifdef-s - some of them will be possible to drop again once 2- and
> >> 3-level guest code doesn't get built anymore in !HVM configs, but I'm
> >> afraid there's still quite a bit of disentangling work to be done to
> >> make this possible.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >
> > Looks good. It seems like the new code for '3-level non-HVM' in
> > guest-walks ought to have some sort of assert-unreachable in them too
> > - or is there a reason to to?
>
> You mean this piece of code
>
> +#elif !defined(CONFIG_HVM)
> + (void)root_gfn;
> + memset(gw, 0, sizeof(*gw));
> + return false;
> +#else /* PAE */
>
> If so - sure, ASSERT_UNREACHABLE() could be added there. It simply
> didn't occur to me. I take it your ack for the entire series holds
> here with this addition.
Yes, it does. Thanks!
Tim.
* Ping: [PATCH 3/5] x86/PV: drop a few misleading paging_mode_refcounts() checks
From: Jan Beulich @ 2020-07-31 14:58 UTC
To: Andrew Cooper, Wei Liu, Roger Pau Monné
Cc: George Dunlap, xen-devel, Tim Deegan
On 15.07.2020 11:59, Jan Beulich wrote:
> The filling and cleaning up of v->arch.guest_table in new_guest_cr3()
> was apparently inconsistent so far: There was a type ref acquired
> unconditionally for the new top level page table, but the dropping of
> the old type ref was conditional upon !paging_mode_refcounts(). Mirror
> this also to arch_set_info_guest().
>
> Also move new_guest_cr3()'s #ifdef to around the function - both callers
> now get built only when CONFIG_PV, i.e. no need to retain a stub.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
While I've got an ack from Tim, I think I need either an ack from
Andrew or someone's R-b in order to commit this.
Thanks, Jan
> [remainder of the patch quoted in full, unchanged from the original posting above]
* Re: Ping: [PATCH 3/5] x86/PV: drop a few misleading paging_mode_refcounts() checks
From: Andrew Cooper @ 2020-07-31 15:17 UTC
To: Jan Beulich, Wei Liu, Roger Pau Monné
Cc: George Dunlap, xen-devel, Tim Deegan
On 31/07/2020 15:58, Jan Beulich wrote:
> On 15.07.2020 11:59, Jan Beulich wrote:
>> The filling and cleaning up of v->arch.guest_table in new_guest_cr3()
>> was apparently inconsistent so far: There was a type ref acquired
>> unconditionally for the new top level page table, but the dropping of
>> the old type ref was conditional upon !paging_mode_refcounts(). Mirror
>> this also to arch_set_info_guest().
>>
>> Also move new_guest_cr3()'s #ifdef to around the function - both callers
>> now get built only when CONFIG_PV, i.e. no need to retain a stub.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> While I've got an ack from Tim, I think I need either an ack from
> Andrew or someone's R-b in order to commit this.
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>