* [PATCH v3 0/8] x86: more or less log-dirty related improvements
@ 2022-04-26 10:20 Jan Beulich
  2022-04-26 10:22 ` [PATCH v3 1/8] libxenguest: short-circuit "all-dirty" handling Jan Beulich
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Jan Beulich @ 2022-04-26 10:20 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Anthony Perard

... or so I hope. This series continues the attempt to deal with
the OVMF change putting the shared info page at a very high address
(which is now planned to get reverted there, but the general
problem doesn't go away by them doing so). There are further issues
with truncated values, which are being dealt with here. But there
are also some not directly related changes, where I simply spotted
things that aren't very likely to be right the way they are. And
then there are also adjustments to the underlying hypervisor
implementation, with the goal of making the returned data more
useful to consumers.

With these changes in place, a 1Gb guest which has "inflated"
itself by putting a page right below the 16Tb boundary migrates
successfully, albeit the process takes from some 20 minutes to over
half an hour on my test system.

No significant changes in v3 (individual patches have details); a
controversial patch was moved later in the series.

1: libxenguest: short-circuit "all-dirty" handling
2: libxenguest: avoid allocating unused deferred-pages bitmap
3: libxenguest: guard against overflow from too large p2m when checkpointing
4: libxenguest: restrict PV guest size
5: libxenguest: deal with log-dirty op stats overflow
6: x86/paging: supply more useful log-dirty page count
7: x86/mm: update log-dirty bitmap when manipulating P2M
8: SUPPORT.md: write down restriction of 32-bit tool stacks

Jan




* [PATCH v3 1/8] libxenguest: short-circuit "all-dirty" handling
  2022-04-26 10:20 [PATCH v3 0/8] x86: more or less log-dirty related improvements Jan Beulich
@ 2022-04-26 10:22 ` Jan Beulich
  2022-04-26 10:22 ` [PATCH v3 2/8] libxenguest: avoid allocating unused deferred-pages bitmap Jan Beulich
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2022-04-26 10:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Anthony Perard

For one, it is unnecessary to fill a perhaps large chunk of memory with
all ones. Add a new parameter to send_dirty_pages() for callers to
indicate as much.

Then it is further unnecessary to allocate the dirty bitmap altogether
when all that's ever going to happen is a single all-dirty run.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
---
NOTE: Ian demands that this not be committed without also committing
"libxenguest: avoid allocating unused deferred-pages bitmap" (because
of the ugly -1L).

--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -364,7 +364,7 @@ static int suspend_domain(struct xc_sr_c
  * Bitmap is bounded by p2m_size.
  */
 static int send_dirty_pages(struct xc_sr_context *ctx,
-                            unsigned long entries)
+                            unsigned long entries, bool all_dirty)
 {
     xc_interface *xch = ctx->xch;
     xen_pfn_t p;
@@ -375,7 +375,7 @@ static int send_dirty_pages(struct xc_sr
 
     for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
     {
-        if ( !test_bit(p, dirty_bitmap) )
+        if ( !all_dirty && !test_bit(p, dirty_bitmap) )
             continue;
 
         rc = add_to_batch(ctx, p);
@@ -407,12 +407,7 @@ static int send_dirty_pages(struct xc_sr
  */
 static int send_all_pages(struct xc_sr_context *ctx)
 {
-    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
-                                    &ctx->save.dirty_bitmap_hbuf);
-
-    bitmap_set(dirty_bitmap, ctx->save.p2m_size);
-
-    return send_dirty_pages(ctx, ctx->save.p2m_size);
+    return send_dirty_pages(ctx, ctx->save.p2m_size, true /* all_dirty */);
 }
 
 static int enable_logdirty(struct xc_sr_context *ctx)
@@ -502,9 +497,6 @@ static int send_memory_live(struct xc_sr
     int rc;
     int policy_decision;
 
-    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
-                                    &ctx->save.dirty_bitmap_hbuf);
-
     precopy_policy_t precopy_policy = ctx->save.callbacks->precopy_policy;
     void *data = ctx->save.callbacks->data;
 
@@ -522,8 +514,6 @@ static int send_memory_live(struct xc_sr
     if ( precopy_policy == NULL )
         precopy_policy = simple_precopy_policy;
 
-    bitmap_set(dirty_bitmap, ctx->save.p2m_size);
-
     for ( ; ; )
     {
         policy_decision = precopy_policy(*policy_stats, data);
@@ -535,7 +525,7 @@ static int send_memory_live(struct xc_sr
             if ( rc )
                 goto out;
 
-            rc = send_dirty_pages(ctx, stats.dirty_count);
+            rc = send_dirty_pages(ctx, stats.dirty_count, x == 1);
             if ( rc )
                 goto out;
         }
@@ -681,7 +671,8 @@ static int suspend_and_send_dirty(struct
         }
     }
 
-    rc = send_dirty_pages(ctx, stats.dirty_count + ctx->save.nr_deferred_pages);
+    rc = send_dirty_pages(ctx, stats.dirty_count + ctx->save.nr_deferred_pages,
+                          false /* all_dirty */);
     if ( rc )
         goto out;
 
@@ -801,8 +792,11 @@ static int setup(struct xc_sr_context *c
     if ( rc )
         goto err;
 
-    dirty_bitmap = xc_hypercall_buffer_alloc_pages(
-        xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
+    dirty_bitmap = ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN
+        ? xc_hypercall_buffer_alloc_pages(
+              xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)))
+        : (void *)-1L;
+
     ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
                                   sizeof(*ctx->save.batch_pfns));
     ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
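
As an aside, a minimal standalone sketch of the resulting control flow
(toy types and sizes rather than the libxenguest ones; send_page() is a
made-up stand-in for the real add_to_batch()/flush_batch() machinery):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define P2M_SIZE 16                        /* toy guest with 16 pfns */

static unsigned char dirty_bitmap[(P2M_SIZE + 7) / 8];

static bool test_bit(unsigned long bit, const unsigned char *map)
{
    return map[bit / 8] & (1u << (bit % 8));
}

static void send_page(unsigned long pfn)   /* stand-in for add_to_batch() */
{
    printf("sending pfn %lu\n", pfn);
}

/*
 * With all_dirty set, every pfn is sent and the bitmap is never read,
 * so the caller needs neither to allocate it nor to fill it with ones.
 */
static void send_dirty_pages(bool all_dirty)
{
    unsigned long p;

    for ( p = 0; p < P2M_SIZE; ++p )
    {
        if ( !all_dirty && !test_bit(p, dirty_bitmap) )
            continue;
        send_page(p);
    }
}

int main(void)
{
    send_dirty_pages(true);                /* first pass: send everything */

    memset(dirty_bitmap, 0, sizeof(dirty_bitmap));
    dirty_bitmap[0] = 0x05;                /* pfns 0 and 2 dirtied meanwhile */
    send_dirty_pages(false);               /* later passes: bitmap-driven */
    return 0;
}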




* [PATCH v3 2/8] libxenguest: avoid allocating unused deferred-pages bitmap
  2022-04-26 10:20 [PATCH v3 0/8] x86: more or less log-dirty related improvements Jan Beulich
  2022-04-26 10:22 ` [PATCH v3 1/8] libxenguest: short-circuit "all-dirty" handling Jan Beulich
@ 2022-04-26 10:22 ` Jan Beulich
  2022-04-26 10:23 ` [PATCH v3 3/8] libxenguest: guard against overflow from too large p2m when checkpointing Jan Beulich
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2022-04-26 10:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Juergen Gross, George Dunlap, Anthony Perard

Like for the dirty bitmap, it is unnecessary to allocate the deferred-
pages bitmap when all that's ever going to happen is a single all-dirty
run.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -130,7 +130,7 @@ static int write_batch(struct xc_sr_cont
                                                       ctx->save.batch_pfns[i]);
 
         /* Likely a ballooned page. */
-        if ( mfns[i] == INVALID_MFN )
+        if ( mfns[i] == INVALID_MFN && ctx->save.deferred_pages )
         {
             set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
             ++ctx->save.nr_deferred_pages;
@@ -192,8 +192,12 @@ static int write_batch(struct xc_sr_cont
             {
                 if ( rc == -1 && errno == EAGAIN )
                 {
-                    set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
-                    ++ctx->save.nr_deferred_pages;
+                    if ( ctx->save.deferred_pages )
+                    {
+                        set_bit(ctx->save.batch_pfns[i],
+                                ctx->save.deferred_pages);
+                        ++ctx->save.nr_deferred_pages;
+                    }
                     types[i] = XEN_DOMCTL_PFINFO_XTAB;
                     --nr_pages;
                 }
@@ -659,7 +663,8 @@ static int suspend_and_send_dirty(struct
     else
         xc_set_progress_prefix(xch, "Checkpointed save");
 
-    bitmap_or(dirty_bitmap, ctx->save.deferred_pages, ctx->save.p2m_size);
+    if ( ctx->save.deferred_pages )
+        bitmap_or(dirty_bitmap, ctx->save.deferred_pages, ctx->save.p2m_size);
 
     if ( !ctx->save.live && ctx->stream_type == XC_STREAM_COLO )
     {
@@ -676,7 +681,8 @@ static int suspend_and_send_dirty(struct
     if ( rc )
         goto out;
 
-    bitmap_clear(ctx->save.deferred_pages, ctx->save.p2m_size);
+    if ( ctx->save.deferred_pages )
+        bitmap_clear(ctx->save.deferred_pages, ctx->save.p2m_size);
     ctx->save.nr_deferred_pages = 0;
 
  out:
@@ -785,24 +791,31 @@ static int setup(struct xc_sr_context *c
 {
     xc_interface *xch = ctx->xch;
     int rc;
-    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
-                                    &ctx->save.dirty_bitmap_hbuf);
 
     rc = ctx->save.ops.setup(ctx);
     if ( rc )
         goto err;
 
-    dirty_bitmap = ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN
-        ? xc_hypercall_buffer_alloc_pages(
-              xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)))
-        : (void *)-1L;
+    if ( ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN )
+    {
+        DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                        &ctx->save.dirty_bitmap_hbuf);
+
+        dirty_bitmap =
+            xc_hypercall_buffer_alloc_pages(
+                xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
+        ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
+
+        if ( !dirty_bitmap || !ctx->save.deferred_pages )
+            goto enomem;
+    }
 
     ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
                                   sizeof(*ctx->save.batch_pfns));
-    ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
 
-    if ( !ctx->save.batch_pfns || !dirty_bitmap || !ctx->save.deferred_pages )
+    if ( !ctx->save.batch_pfns )
     {
+    enomem:
         ERROR("Unable to allocate memory for dirty bitmaps, batch pfns and"
               " deferred pages");
         rc = -1;




* [PATCH v3 3/8] libxenguest: guard against overflow from too large p2m when checkpointing
  2022-04-26 10:20 [PATCH v3 0/8] x86: more or less log-dirty related improvements Jan Beulich
  2022-04-26 10:22 ` [PATCH v3 1/8] libxenguest: short-circuit "all-dirty" handling Jan Beulich
  2022-04-26 10:22 ` [PATCH v3 2/8] libxenguest: avoid allocating unused deferred-pages bitmap Jan Beulich
@ 2022-04-26 10:23 ` Jan Beulich
  2022-04-26 10:23 ` [PATCH v3 4/8] libxenguest: restrict PV guest size Jan Beulich
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2022-04-26 10:23 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Juergen Gross, George Dunlap, Anthony Perard

struct xc_sr_record's length field has just 32 bits. Fill it early and
check that the calculated value hasn't overflowed. Additionally check
for counter overflow early - there's no point even trying to allocate
any memory in such an event.

While there also limit an induction variable's type to unsigned long:
There's no gain from it being uint64_t.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Of course looping over test_bit() is pretty inefficient, but given that
I have no idea how to test this code I wanted to restrict changes to
what can sensibly be seen as no worse than before from just looking at
the changes.

--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -424,7 +424,8 @@ static int send_checkpoint_dirty_pfn_lis
     xc_interface *xch = ctx->xch;
     int rc = -1;
     unsigned int count, written;
-    uint64_t i, *pfns = NULL;
+    unsigned long i;
+    uint64_t *pfns = NULL;
     struct iovec *iov = NULL;
     xc_shadow_op_stats_t stats = { 0, ctx->restore.p2m_size };
     struct xc_sr_record rec = {
@@ -444,16 +445,28 @@ static int send_checkpoint_dirty_pfn_lis
 
     for ( i = 0, count = 0; i < ctx->restore.p2m_size; i++ )
     {
-        if ( test_bit(i, dirty_bitmap) )
-            count++;
+        if ( test_bit(i, dirty_bitmap) && !++count )
+            break;
     }
 
+    if ( i < ctx->restore.p2m_size )
+    {
+        ERROR("Too many dirty pfns");
+        goto err;
+    }
+
+    rec.length = count * sizeof(*pfns);
+    if ( rec.length / sizeof(*pfns) != count )
+    {
+        ERROR("Too many (%u) dirty pfns", count);
+        goto err;
+    }
 
-    pfns = malloc(count * sizeof(*pfns));
+    pfns = malloc(rec.length);
     if ( !pfns )
     {
-        ERROR("Unable to allocate %zu bytes of memory for dirty pfn list",
-              count * sizeof(*pfns));
+        ERROR("Unable to allocate %u bytes of memory for dirty pfn list",
+              rec.length);
         goto err;
     }
 
@@ -479,8 +492,6 @@ static int send_checkpoint_dirty_pfn_lis
         goto err;
     }
 
-    rec.length = count * sizeof(*pfns);
-
     iov[0].iov_base = &rec.type;
     iov[0].iov_len = sizeof(rec.type);
 
@@ -488,7 +499,7 @@ static int send_checkpoint_dirty_pfn_lis
     iov[1].iov_len = sizeof(rec.length);
 
     iov[2].iov_base = pfns;
-    iov[2].iov_len = count * sizeof(*pfns);
+    iov[2].iov_len = rec.length;
 
     if ( writev_exact(ctx->restore.send_back_fd, iov, 3) )
     {
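
To make the truncation hazard concrete, here is a small standalone sketch
of the multiply-then-divide-back check added above (struct record and the
numbers are made up; only the idiom matches the patch):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for struct xc_sr_record, whose length field has just 32 bits. */
struct record {
    uint32_t length;
};

/*
 * Returns 0 if "count" uint64_t entries fit into the 32-bit length,
 * -1 otherwise.  Dividing the (possibly truncated) product back detects
 * overflow without needing a wider intermediate variable.
 */
static int set_record_length(struct record *rec, unsigned int count)
{
    rec->length = count * sizeof(uint64_t);
    if ( rec->length / sizeof(uint64_t) != count )
    {
        fprintf(stderr, "Too many (%u) dirty pfns\n", count);
        return -1;
    }
    return 0;
}

int main(void)
{
    struct record rec;

    if ( set_record_length(&rec, 100) )          /* fine: 800 bytes */
        return EXIT_FAILURE;
    printf("length = %u\n", rec.length);

    if ( !set_record_length(&rec, 0x20000001u) ) /* 2^29 + 1 entries: overflows */
        return EXIT_FAILURE;                     /* accepting this would be a bug */

    return EXIT_SUCCESS;
}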




* [PATCH v3 4/8] libxenguest: restrict PV guest size
  2022-04-26 10:20 [PATCH v3 0/8] x86: more or less log-dirty related improvements Jan Beulich
                   ` (2 preceding siblings ...)
  2022-04-26 10:23 ` [PATCH v3 3/8] libxenguest: guard against overflow from too large p2m when checkpointing Jan Beulich
@ 2022-04-26 10:23 ` Jan Beulich
  2022-04-26 10:24 ` [PATCH v3 5/8] libxenguest: deal with log-dirty op stats overflow Jan Beulich
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2022-04-26 10:23 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Juergen Gross, George Dunlap, Anthony Perard

The P2M, the use of PFNs, and hence the maximum valid PFN are purely
software constructs in PV. In principle a guest is free to use arbitrary
PFNs. However, at least page table normalization requires that PFN space
be, like MFN space, limited to the architectural 40 bits (52 address
bits). And of course a 32-bit tool stack places further constraints.

Bounding the values also makes sure that various subsequent calculations
won't truncate values and then continue with inconsistencies (see e.g.
fl_entries vs ctx->x86.pv.p2m_frames in map_p2m_tree()).

While there, correct an adjacent error message with wrong-way-round
wording in the restore code, and another slightly malformed and
misleading (off-by-one) one in the core dumping code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Integrate into series.
---
In case the save/restore changes don't make it obvious enough: It
escapes me why struct xc_sr_rec_x86_pv_p2m_frames has p2m_pfns[] with
uint64_t element type but {start,end}_pfn both as uint32_t. Imo all
three can sensibly only ever be of the same type.

--- a/tools/include/xen-tools/libs.h
+++ b/tools/include/xen-tools/libs.h
@@ -13,6 +13,10 @@
 #define ARRAY_SIZE(a) (sizeof(a) / sizeof(*a))
 #endif
 
+#ifndef sizeof_field
+#define sizeof_field(type, field) sizeof(((type *)0)->field)
+#endif
+
 #ifndef MAX
 #define MAX(x, y) ((x) > (y) ? (x) : (y))
 #endif
--- a/tools/libs/guest/xg_core_x86.c
+++ b/tools/libs/guest/xg_core_x86.c
@@ -59,6 +59,43 @@ xc_core_arch_memory_map_get(xc_interface
     if ( xc_domain_nr_gpfns(xch, info->domid, &p2m_size) < 0 )
         return -1;
 
+    if ( !p2m_size )
+    {
+        ERROR("Cannot map a guest without P2M");
+        errno = ENODATA;
+        return -1;
+    }
+
+    if ( !info->hvm )
+    {
+        unsigned int guest_width;
+
+        if ( xc_domain_get_guest_width(xch, info->domid, &guest_width) != 0 )
+        {
+            PERROR("Cannot get address size for PV guest");
+            return -1;
+        }
+
+        if ( p2m_size == (guest_width > 4 ? ~0UL : ~0U) )
+        {
+            ERROR("Cannot map a PV guest with invalid P2M");
+            errno = ENODATA;
+            return -1;
+        }
+    }
+
+#ifndef __i386__
+    if ( (p2m_size - 1) >> 40 )
+#else
+    /* Very large domains (> 1TB) will exhaust virtual address space. */
+    if ( (p2m_size - 1) >> 28 )
+#endif
+    {
+        ERROR("Cannot map a guest with P2M size %#lx", p2m_size);
+        errno = EOPNOTSUPP;
+        return -1;
+    }
+
     map = malloc(sizeof(*map));
     if ( map == NULL )
     {
@@ -333,10 +370,30 @@ xc_core_arch_map_p2m_rw(xc_interface *xc
 
     if ( dinfo->p2m_size < info->nr_pages  )
     {
-        ERROR("p2m_size < nr_pages -1 (%lx < %lx", dinfo->p2m_size, info->nr_pages - 1);
+        ERROR("p2m_size < nr_pages (%lx < %lx)", dinfo->p2m_size, info->nr_pages);
         goto out;
     }
 
+    if ( !info->hvm && dinfo->p2m_size == (dinfo->guest_width > 4 ? ~0UL : ~0U) )
+    {
+        ERROR("Cannot r/%c-map a PV guest with invalid P2M", rw ? 'w' : 'o');
+        errno = ENODATA;
+        return -1;
+    }
+
+#ifndef __i386__
+    if ( (dinfo->p2m_size - 1) >> 40 )
+#else
+    /* Very large domains (> 1TB) will exhaust virtual address space. */
+    if ( (dinfo->p2m_size - 1) >> 28 )
+#endif
+    {
+        ERROR("Cannot r/%c-map a guest with P2M size %#lx",
+              rw ? 'w' : 'o', dinfo->p2m_size);
+        errno = EOPNOTSUPP;
+        return -1;
+    }
+
     p2m_cr3 = GET_FIELD(live_shinfo, arch.p2m_cr3, dinfo->guest_width);
 
     p2m_frame_list = p2m_cr3 ? xc_core_arch_map_p2m_list_rw(xch, dinfo, dom, live_shinfo, p2m_cr3)
--- a/tools/libs/guest/xg_sr_restore_x86_pv.c
+++ b/tools/libs/guest/xg_sr_restore_x86_pv.c
@@ -709,10 +709,23 @@ static int handle_x86_pv_p2m_frames(stru
         return -1;
     }
 
+#ifdef __i386__
+    /* Very large domains (> 1TB) will exhaust virtual address space. */
+    if ( data->end_pfn >> 28 )
+#elif 0 /* sizeof(data->end_pfn) > 4 */
+    if ( data->end_pfn >> (ctx->x86.pv.width > 4 ? 40 : 32) )
+#else
+    if ( 0 )
+#endif
+    {
+        ERROR("End pfn in stream (%#x) too large", data->end_pfn);
+        return -1;
+    }
+
     if ( data->start_pfn > data->end_pfn )
     {
-        ERROR("End pfn in stream (%#x) exceeds Start (%#x)",
-              data->end_pfn, data->start_pfn);
+        ERROR("Start pfn in stream (%#x) exceeds End (%#x)",
+              data->start_pfn, data->end_pfn);
         return -1;
     }
 
--- a/tools/libs/guest/xg_sr_save_x86_pv.c
+++ b/tools/libs/guest/xg_sr_save_x86_pv.c
@@ -464,11 +464,40 @@ static int map_p2m_list(struct xc_sr_con
  */
 static int map_p2m(struct xc_sr_context *ctx)
 {
+    xc_interface *xch = ctx->xch;
     uint64_t p2m_cr3;
+    uint64_t max_pfn = GET_FIELD(ctx->x86.pv.shinfo, arch.max_pfn,
+                                 ctx->x86.pv.width);
+
+    if ( !max_pfn )
+    {
+        ERROR("Cannot save a guest without P2M");
+        errno = ENODATA;
+        return -1;
+    }
+
+    if ( max_pfn-- == (ctx->x86.pv.width > 4 ? ~0UL : ~0U) )
+    {
+        ERROR("Cannot save a guest with invalid P2M");
+        errno = ENODATA;
+        return -1;
+    }
+
+#ifndef __i386__
+    if ( max_pfn >> (sizeof_field(struct xc_sr_rec_x86_pv_p2m_frames,
+                                  end_pfn) > 4 ? 40 : 32) )
+#else
+    /* Very large domains (> 1TB) will exhaust virtual address space. */
+    if ( max_pfn >> 28 )
+#endif
+    {
+        ERROR("Cannot save a guest with maximum PFN %#"PRIx64, max_pfn);
+        errno = EOPNOTSUPP;
+        return -1;
+    }
 
     ctx->x86.pv.p2m_generation = ~0ULL;
-    ctx->x86.pv.max_pfn = GET_FIELD(ctx->x86.pv.shinfo, arch.max_pfn,
-                                    ctx->x86.pv.width) - 1;
+    ctx->x86.pv.max_pfn = max_pfn;
     p2m_cr3 = GET_FIELD(ctx->x86.pv.shinfo, arch.p2m_cr3, ctx->x86.pv.width);
 
     return p2m_cr3 ? map_p2m_list(ctx, p2m_cr3) : map_p2m_tree(ctx);
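
For the two magic shift counts used in the hunks above, a tiny standalone
illustration (names made up; 40 is the architectural PFN width on 64-bit
builds, i.e. 52 address bits minus 12 page-offset bits, while 28 keeps a
32-bit tool stack's mapped P2M within reach, per the "> 1TB" comments):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* True if the highest PFN fits into "bits" bits of PFN space. */
static bool pfn_space_ok(uint64_t max_pfn, unsigned int bits)
{
    return !(max_pfn >> bits);
}

int main(void)
{
    /* The page right below the 16TiB boundary from the cover letter. */
    uint64_t max_pfn = ((UINT64_C(1) << 44) >> 12) - 1;   /* 0xffffffff */

    printf("fits 64-bit limit (40 bits): %d\n", pfn_space_ok(max_pfn, 40));
    printf("fits 32-bit limit (28 bits): %d\n", pfn_space_ok(max_pfn, 28));
    return 0;
}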




* [PATCH v3 5/8] libxenguest: deal with log-dirty op stats overflow
  2022-04-26 10:20 [PATCH v3 0/8] x86: more or less log-dirty related improvements Jan Beulich
                   ` (3 preceding siblings ...)
  2022-04-26 10:23 ` [PATCH v3 4/8] libxenguest: restrict PV guest size Jan Beulich
@ 2022-04-26 10:24 ` Jan Beulich
  2022-04-26 10:54   ` Andrew Cooper
  2022-04-26 10:25 ` [PATCH v3 6/8] x86/paging: supply more useful log-dirty page count Jan Beulich
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2022-04-26 10:24 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Juergen Gross, George Dunlap, Anthony Perard

In send_memory_live() the precise value the dirty_count struct field
gets initialized to doesn't matter much (apart from the triggering of
the log message in send_dirty_pages(), see below), but it is important
that it not be zero on the first iteration (or else send_dirty_pages()
won't get called at all). Saturate the initializer value at the maximum
value the field can hold.

While there also initialize struct precopy_stats' respective field to a
more sane value: We don't really know how many dirty pages there are at
that point.

In suspend_and_send_dirty() and verify_frames() the local variables
don't need initializing at all, as they're only an output from the
hypercall which gets invoked first thing.

In send_checkpoint_dirty_pfn_list() the local variable can be dropped
altogether: It's optional to xc_logdirty_control() and not used anywhere
else.

Note that in case the clipping actually takes effect, the "Bitmap
contained more entries than expected..." log message will trigger. This
being just an informational message, I don't think this is overly
concerning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: To address Andrew's objection, Jürgen suggests widening struct
     xen_domctl_shadow_op_stats's fields; yet within this same series
     we had settled on there not being a need (17e91570c5a4).

--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -427,7 +427,6 @@ static int send_checkpoint_dirty_pfn_lis
     unsigned long i;
     uint64_t *pfns = NULL;
     struct iovec *iov = NULL;
-    xc_shadow_op_stats_t stats = { 0, ctx->restore.p2m_size };
     struct xc_sr_record rec = {
         .type = REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST,
     };
@@ -437,7 +436,7 @@ static int send_checkpoint_dirty_pfn_lis
     if ( xc_logdirty_control(
              xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
              HYPERCALL_BUFFER(dirty_bitmap), ctx->restore.p2m_size,
-             0, &stats) != ctx->restore.p2m_size )
+             0, NULL) != ctx->restore.p2m_size )
     {
         PERROR("Failed to retrieve logdirty bitmap");
         goto err;
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -495,7 +495,9 @@ static int simple_precopy_policy(struct
 static int send_memory_live(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
-    xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
+    xc_shadow_op_stats_t stats = {
+        .dirty_count = MIN(ctx->save.p2m_size, (typeof(stats.dirty_count))~0)
+    };
     char *progress_str = NULL;
     unsigned int x = 0;
     int rc;
@@ -511,7 +513,7 @@ static int send_memory_live(struct xc_sr
         goto out;
 
     ctx->save.stats = (struct precopy_stats){
-        .dirty_count = ctx->save.p2m_size,
+        .dirty_count = -1,
     };
     policy_stats = &ctx->save.stats;
 
@@ -633,7 +635,7 @@ static int colo_merge_secondary_dirty_bi
 static int suspend_and_send_dirty(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
-    xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
+    xc_shadow_op_stats_t stats;
     char *progress_str = NULL;
     int rc;
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
@@ -694,7 +696,7 @@ static int suspend_and_send_dirty(struct
 static int verify_frames(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
-    xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
+    xc_shadow_op_stats_t stats;
     int rc;
     struct xc_sr_record rec = { .type = REC_TYPE_VERIFY };
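
A standalone sketch of the saturating initializer (stats_t stands in for
xc_shadow_op_stats_t, whose fields are only 32 bits wide; typeof is the
GNU extension the patch itself relies on):

#include <stdint.h>
#include <stdio.h>

#define MIN(x, y) ((x) < (y) ? (x) : (y))

typedef struct {
    uint32_t fault_count;
    uint32_t dirty_count;
} stats_t;

int main(void)
{
    /* A p2m covering more pages than a 32-bit field can express. */
    unsigned long long p2m_size = 1ULL << 33;

    /*
     * Saturate instead of truncating: (typeof(field))~0 is the maximum
     * value the field can hold, so the "not zero on the first iteration"
     * requirement from the description is still met.
     */
    stats_t stats = {
        .dirty_count = MIN(p2m_size, (typeof(stats.dirty_count))~0)
    };

    printf("dirty_count = %u\n", stats.dirty_count);  /* 4294967295, not 0 */
    return 0;
}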
 




* [PATCH v3 6/8] x86/paging: supply more useful log-dirty page count
  2022-04-26 10:20 [PATCH v3 0/8] x86: more or less log-dirty related improvements Jan Beulich
                   ` (4 preceding siblings ...)
  2022-04-26 10:24 ` [PATCH v3 5/8] libxenguest: deal with log-dirty op stats overflow Jan Beulich
@ 2022-04-26 10:25 ` Jan Beulich
  2022-04-26 10:26 ` [PATCH v3 7/8] x86/mm: update log-dirty bitmap when manipulating P2M Jan Beulich
  2022-04-26 10:27 ` [PATCH v3 8/8] SUPPORT.md: write down restriction of 32-bit tool stacks Jan Beulich
  7 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2022-04-26 10:25 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Anthony Perard

In paging_log_dirty_op(), always update the count of pages field:
- if more pages were specified than the guest has ever accessed (HVM) or
  marked part of the p2m (PV), there's no point for the caller to
  inspect bits beyond the one for that last page,
- if the guest's p2m size has grown in the meantime, the caller would
  have no indication that it may not have caught all dirty bits.

Also exit the loop once having passed the last valid GFN. To balance
overhead and savings, do this before inspecting a new L2 table.

Adjust libxenguest accordingly, albeit these changes are necessary only
for guests which actually alter their P2M size while under migration.
They do, however, additionally open up the option of the hypervisor
eventually zapping large ranges of trailing zeros from the bitmap when
providing it back to the tools.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Of course this still is far from ideal: At the very least a perhaps
large tail of zeros could very well also result in a reduced page
count.
---
v3: Replace mfn_valid() by !mfn_eq(, INVALID_MFN) (and rebase
    accordingly).

--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -237,7 +237,16 @@ struct xc_sr_context
             /* Further debugging information in the stream. */
             bool debug;
 
+            /*
+             * Counts of bits (each representing a guest page), expressing
+             * respectively
+             * - obtained P2M size,
+             * - allocated bitmap size,
+             * - range actually filled with valid data.
+             */
             unsigned long p2m_size;
+            unsigned long p2m_alloc_size;
+            unsigned long p2m_used_size;
 
             struct precopy_stats stats;
 
@@ -245,6 +254,7 @@ struct xc_sr_context
             unsigned int nr_batch_pfns;
             unsigned long *deferred_pages;
             unsigned long nr_deferred_pages;
+            unsigned long used_deferred_pages;
             xc_hypercall_buffer_t dirty_bitmap_hbuf;
         } save;
 
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -424,7 +424,8 @@ static int send_checkpoint_dirty_pfn_lis
     xc_interface *xch = ctx->xch;
     int rc = -1;
     unsigned int count, written;
-    unsigned long i;
+    unsigned long i, p2m_size;
+    long long ret;
     uint64_t *pfns = NULL;
     struct iovec *iov = NULL;
     struct xc_sr_record rec = {
@@ -433,22 +434,29 @@ static int send_checkpoint_dirty_pfn_lis
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                     &ctx->restore.dirty_bitmap_hbuf);
 
-    if ( xc_logdirty_control(
-             xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
-             HYPERCALL_BUFFER(dirty_bitmap), ctx->restore.p2m_size,
-             0, NULL) != ctx->restore.p2m_size )
+    ret = xc_logdirty_control(
+              xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
+              HYPERCALL_BUFFER(dirty_bitmap), ctx->restore.p2m_size,
+              0, NULL);
+    if ( ret < 0 )
     {
         PERROR("Failed to retrieve logdirty bitmap");
         goto err;
     }
+    if ( ret > ctx->restore.p2m_size )
+    {
+        ERROR("Guest has grown its p2m too much");
+        goto err;
+    }
+    p2m_size = ret;
 
-    for ( i = 0, count = 0; i < ctx->restore.p2m_size; i++ )
+    for ( i = 0, count = 0; i < p2m_size; i++ )
     {
         if ( test_bit(i, dirty_bitmap) && !++count )
             break;
     }
 
-    if ( i < ctx->restore.p2m_size )
+    if ( i < p2m_size )
     {
         ERROR("Too many dirty pfns");
         goto err;
@@ -469,7 +477,7 @@ static int send_checkpoint_dirty_pfn_lis
         goto err;
     }
 
-    for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i )
+    for ( i = 0, written = 0; i < p2m_size; ++i )
     {
         if ( !test_bit(i, dirty_bitmap) )
             continue;
@@ -713,8 +721,10 @@ static int setup(struct xc_sr_context *c
 
     if ( ctx->stream_type == XC_STREAM_COLO )
     {
+        unsigned long pages = NRPAGES(bitmap_size(ctx->restore.p2m_size));
+
         dirty_bitmap = xc_hypercall_buffer_alloc_pages(
-            xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->restore.p2m_size)));
+            xch, dirty_bitmap, pages);
 
         if ( !dirty_bitmap )
         {
@@ -722,6 +732,8 @@ static int setup(struct xc_sr_context *c
             rc = -1;
             goto err;
         }
+
+        ctx->restore.p2m_size = pages << (PAGE_SHIFT + 3);
     }
 
     rc = ctx->restore.ops.setup(ctx);
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -74,6 +74,16 @@ static int write_checkpoint_record(struc
     return write_record(ctx, &checkpoint);
 }
 
+static void update_deferred_pages(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    if ( !ctx->save.deferred_pages )
+        return;
+    set_bit(pfn, ctx->save.deferred_pages);
+    ++ctx->save.nr_deferred_pages;
+    if ( pfn >= ctx->save.used_deferred_pages )
+        ctx->save.used_deferred_pages = pfn + 1;
+}
+
 /*
  * Writes a batch of memory as a PAGE_DATA record into the stream.  The batch
  * is constructed in ctx->save.batch_pfns.
@@ -130,11 +140,8 @@ static int write_batch(struct xc_sr_cont
                                                       ctx->save.batch_pfns[i]);
 
         /* Likely a ballooned page. */
-        if ( mfns[i] == INVALID_MFN && ctx->save.deferred_pages )
-        {
-            set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
-            ++ctx->save.nr_deferred_pages;
-        }
+        if ( mfns[i] == INVALID_MFN )
+            update_deferred_pages(ctx, ctx->save.batch_pfns[i]);
     }
 
     rc = xc_get_pfn_type_batch(xch, ctx->domid, nr_pfns, types);
@@ -192,12 +199,7 @@ static int write_batch(struct xc_sr_cont
             {
                 if ( rc == -1 && errno == EAGAIN )
                 {
-                    if ( ctx->save.deferred_pages )
-                    {
-                        set_bit(ctx->save.batch_pfns[i],
-                                ctx->save.deferred_pages);
-                        ++ctx->save.nr_deferred_pages;
-                    }
+                    update_deferred_pages(ctx, ctx->save.batch_pfns[i]);
                     types[i] = XEN_DOMCTL_PFINFO_XTAB;
                     --nr_pages;
                 }
@@ -365,7 +367,7 @@ static int suspend_domain(struct xc_sr_c
  * Send a subset of pages in the guests p2m, according to the dirty bitmap.
  * Used for each subsequent iteration of the live migration loop.
  *
- * Bitmap is bounded by p2m_size.
+ * Bitmap is bounded by p2m_alloc_size, but populated only up to p2m_used_size.
  */
 static int send_dirty_pages(struct xc_sr_context *ctx,
                             unsigned long entries, bool all_dirty)
@@ -377,7 +379,10 @@ static int send_dirty_pages(struct xc_sr
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                     &ctx->save.dirty_bitmap_hbuf);
 
-    for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
+    if ( all_dirty )
+        ctx->save.p2m_used_size = ctx->save.p2m_size;
+
+    for ( p = 0, written = 0; p < ctx->save.p2m_used_size; ++p )
     {
         if ( !all_dirty && !test_bit(p, dirty_bitmap) )
             continue;
@@ -522,6 +527,8 @@ static int send_memory_live(struct xc_sr
 
     for ( ; ; )
     {
+        long long ret;
+
         policy_decision = precopy_policy(*policy_stats, data);
         x++;
 
@@ -548,15 +555,23 @@ static int send_memory_live(struct xc_sr
         if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
             break;
 
-        if ( xc_logdirty_control(
-                 xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
-                 &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_size,
-                 0, &stats) != ctx->save.p2m_size )
+        ret = xc_logdirty_control(
+                  xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
+                  &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_alloc_size,
+                  0, &stats);
+        if ( ret < 0 )
         {
             PERROR("Failed to retrieve logdirty bitmap");
             rc = -1;
             goto out;
         }
+        if ( ret > ctx->save.p2m_alloc_size )
+        {
+            ERROR("Guest has grown its p2m too much");
+            rc = -1;
+            goto out;
+        }
+        ctx->save.p2m_used_size = ret;
 
         policy_stats->dirty_count = stats.dirty_count;
 
@@ -610,7 +625,7 @@ static int colo_merge_secondary_dirty_bi
     for ( i = 0; i < count; i++ )
     {
         pfn = pfns[i];
-        if ( pfn >= ctx->save.p2m_size )
+        if ( pfn >= ctx->save.p2m_alloc_size )
         {
             PERROR("Invalid pfn 0x%" PRIx64, pfn);
             rc = -1;
@@ -638,6 +653,7 @@ static int suspend_and_send_dirty(struct
     xc_shadow_op_stats_t stats;
     char *progress_str = NULL;
     int rc;
+    long long ret;
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                     &ctx->save.dirty_bitmap_hbuf);
 
@@ -645,16 +661,22 @@ static int suspend_and_send_dirty(struct
     if ( rc )
         goto out;
 
-    if ( xc_logdirty_control(
-             xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
-             HYPERCALL_BUFFER(dirty_bitmap), ctx->save.p2m_size,
-             XEN_DOMCTL_SHADOW_LOGDIRTY_FINAL, &stats) !=
-         ctx->save.p2m_size )
+    ret = xc_logdirty_control(
+              xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
+              HYPERCALL_BUFFER(dirty_bitmap), ctx->save.p2m_alloc_size,
+              XEN_DOMCTL_SHADOW_LOGDIRTY_FINAL, &stats);
+    if ( ret < 0 )
     {
         PERROR("Failed to retrieve logdirty bitmap");
         rc = -1;
         goto out;
     }
+    if ( ret > ctx->save.p2m_alloc_size )
+    {
+        ERROR("Guest has grown its p2m too much");
+        rc = -1;
+        goto out;
+    }
 
     if ( ctx->save.live )
     {
@@ -666,7 +688,8 @@ static int suspend_and_send_dirty(struct
         xc_set_progress_prefix(xch, "Checkpointed save");
 
     if ( ctx->save.deferred_pages )
-        bitmap_or(dirty_bitmap, ctx->save.deferred_pages, ctx->save.p2m_size);
+        bitmap_or(dirty_bitmap, ctx->save.deferred_pages, ctx->save.p2m_alloc_size);
+    ctx->save.p2m_used_size = MAX(ret, ctx->save.used_deferred_pages);
 
     if ( !ctx->save.live && ctx->stream_type == XC_STREAM_COLO )
     {
@@ -684,8 +707,9 @@ static int suspend_and_send_dirty(struct
         goto out;
 
     if ( ctx->save.deferred_pages )
-        bitmap_clear(ctx->save.deferred_pages, ctx->save.p2m_size);
+        bitmap_clear(ctx->save.deferred_pages, ctx->save.p2m_alloc_size);
     ctx->save.nr_deferred_pages = 0;
+    ctx->save.used_deferred_pages = 0;
 
  out:
     xc_set_progress_prefix(xch, NULL);
@@ -698,6 +722,7 @@ static int verify_frames(struct xc_sr_co
     xc_interface *xch = ctx->xch;
     xc_shadow_op_stats_t stats;
     int rc;
+    long long ret;
     struct xc_sr_record rec = { .type = REC_TYPE_VERIFY };
 
     DPRINTF("Enabling verify mode");
@@ -711,15 +736,18 @@ static int verify_frames(struct xc_sr_co
     if ( rc )
         goto out;
 
-    if ( xc_logdirty_control(
-             xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_PEEK,
-             &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_size,
-             0, &stats) != ctx->save.p2m_size )
+    ret = xc_logdirty_control(
+              xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_PEEK,
+              &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_alloc_size,
+              0, &stats);
+    if ( ret < 0 )
     {
         PERROR("Failed to retrieve logdirty bitmap");
         rc = -1;
         goto out;
     }
+    if ( ret > ctx->save.p2m_alloc_size )
+        IPRINTF("Guest has grown its p2m too much");
 
     DPRINTF("  Further stats: faults %u, dirty %u",
             stats.fault_count, stats.dirty_count);
@@ -800,13 +828,14 @@ static int setup(struct xc_sr_context *c
 
     if ( ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN )
     {
+        unsigned long pages = NRPAGES(bitmap_size(ctx->save.p2m_size));
         DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                         &ctx->save.dirty_bitmap_hbuf);
 
         dirty_bitmap =
-            xc_hypercall_buffer_alloc_pages(
-                xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
-        ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
+            xc_hypercall_buffer_alloc_pages(xch, dirty_bitmap, pages);
+        ctx->save.p2m_alloc_size = pages << (PAGE_SHIFT + 3);
+        ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_alloc_size);
 
         if ( !dirty_bitmap || !ctx->save.deferred_pages )
             goto enomem;
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -400,6 +400,20 @@ bool paging_mfn_is_dirty(const struct do
 }
 #endif
 
+/*
+ * This is used to provide a rough (upper) estimate to the caller of how many
+ * more pages we might have data for.
+ */
+static unsigned int last_valid_entry(const mfn_t *tbl, unsigned int idx) {
+    unsigned int last = LOGDIRTY_NODE_ENTRIES;
+
+    for ( ; idx < LOGDIRTY_NODE_ENTRIES; ++idx )
+        if ( !mfn_eq(tbl[idx], INVALID_MFN) )
+            last = idx;
+
+    return last;
+}
+
 /* Read a domain's log-dirty bitmap and stats.  If the operation is a CLEAN,
  * clear the bitmap and stats as well. */
 static int paging_log_dirty_op(struct domain *d,
@@ -407,10 +421,10 @@ static int paging_log_dirty_op(struct do
                                bool_t resuming)
 {
     int rv = 0, clean = 0, peek = 1;
-    unsigned long pages = 0;
+    unsigned long pages = 0, extra = 0;
     mfn_t *l4 = NULL, *l3 = NULL, *l2 = NULL;
     unsigned long *l1 = NULL;
-    int i4, i3, i2;
+    unsigned int i4, i3, i2;
 
     if ( !resuming )
     {
@@ -482,6 +496,15 @@ static int paging_log_dirty_op(struct do
               map_domain_page(l4[i4]) : NULL);
         for ( ; (pages < sc->pages) && (i3 < LOGDIRTY_NODE_ENTRIES); i3++ )
         {
+            unsigned long max_gfn = domain_get_maximum_gpfn(d);
+
+            if ( (i4 * LOGDIRTY_NODE_ENTRIES + i3) *
+                 LOGDIRTY_NODE_ENTRIES * PAGE_SIZE * 8 > max_gfn )
+            {
+                i4 = LOGDIRTY_NODE_ENTRIES;
+                break;
+            }
+
             l2 = ((l3 && !mfn_eq(l3[i3], INVALID_MFN)) ?
                   map_domain_page(l3[i3]) : NULL);
             for ( i2 = 0;
@@ -505,18 +528,36 @@ static int paging_log_dirty_op(struct do
                         goto out;
                     }
                 }
+
                 pages += bytes << 3;
+
                 if ( l1 )
                 {
+                    if ( unlikely(pages >= sc->pages) )
+                        extra = (PAGE_SIZE - bytes) << 3;
+
                     if ( clean )
                         clear_page(l1);
                     unmap_domain_page(l1);
                 }
             }
+
             if ( l2 )
+            {
+                if ( unlikely(pages >= sc->pages) )
+                {
+                    i2 = last_valid_entry(l2, i2);
+                    if ( i2 < LOGDIRTY_NODE_ENTRIES )
+                        extra = ((i4 * LOGDIRTY_NODE_ENTRIES + i3) *
+                                 LOGDIRTY_NODE_ENTRIES + i2 + 1) *
+                                PAGE_SIZE * 8;
+                }
+
                 unmap_domain_page(l2);
+            }
 
-            if ( i3 < LOGDIRTY_NODE_ENTRIES - 1 && hypercall_preempt_check() )
+            if ( pages < sc->pages && i3 < LOGDIRTY_NODE_ENTRIES - 1 &&
+                 hypercall_preempt_check() )
             {
                 d->arch.paging.preempt.log_dirty.i4 = i4;
                 d->arch.paging.preempt.log_dirty.i3 = i3 + 1;
@@ -524,10 +565,21 @@ static int paging_log_dirty_op(struct do
                 break;
             }
         }
+
         if ( l3 )
+        {
+            if ( !rv && unlikely(pages >= sc->pages) )
+            {
+                i3 = last_valid_entry(l3, i3);
+                if ( i3 < LOGDIRTY_NODE_ENTRIES )
+                    extra = (i4 * LOGDIRTY_NODE_ENTRIES + i3 + 1) *
+                            LOGDIRTY_NODE_ENTRIES * PAGE_SIZE * 8;
+            }
+
             unmap_domain_page(l3);
+        }
 
-        if ( !rv && i4 < LOGDIRTY_NODE_ENTRIES - 1 &&
+        if ( !rv && pages < sc->pages && i4 < LOGDIRTY_NODE_ENTRIES - 1 &&
              hypercall_preempt_check() )
         {
             d->arch.paging.preempt.log_dirty.i4 = i4 + 1;
@@ -537,8 +589,19 @@ static int paging_log_dirty_op(struct do
         if ( rv )
             break;
     }
+
     if ( l4 )
+    {
+        if ( !rv && unlikely(pages >= sc->pages) )
+        {
+            i4 = last_valid_entry(l4, i4);
+            if ( i4 < LOGDIRTY_NODE_ENTRIES )
+                extra = (i4 + 1) * LOGDIRTY_NODE_ENTRIES *
+                        LOGDIRTY_NODE_ENTRIES * PAGE_SIZE * 8;
+        }
+
         unmap_domain_page(l4);
+    }
 
     if ( !rv )
     {
@@ -565,8 +628,8 @@ static int paging_log_dirty_op(struct do
         return rv;
     }
 
-    if ( pages < sc->pages )
-        sc->pages = pages;
+    sc->pages = min(pages + extra, domain_get_maximum_gpfn(d) + 1);
+
     if ( clean )
     {
         /* We need to further call clean_dirty_bitmap() functions of specific
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -272,7 +272,8 @@ struct xen_domctl_shadow_op {
 
     /* OP_PEEK / OP_CLEAN */
     XEN_GUEST_HANDLE_64(uint8) dirty_bitmap;
-    uint64_aligned_t pages; /* Size of buffer. Updated with actual size. */
+    uint64_aligned_t pages; /* Size of buffer. Updated with actual (or
+                               potentially needed) size. */
     struct xen_domctl_shadow_op_stats stats;
 };
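
For reference, the geometry behind the "extra" computation in the
paging.c hunks above (constants are the x86 ones; the helper name and
the standalone framing are made up):

#include <stdio.h>

#define PAGE_SIZE              4096UL
#define LOGDIRTY_NODE_ENTRIES  512UL   /* 1 << PAGETABLE_ORDER on x86 */

/*
 * Guest pages covered by everything up to and including L2 entry
 * (i4, i3, i2) - the upper estimate reported back when the caller's
 * buffer ends before the last populated part of the log-dirty trie.
 */
static unsigned long pages_up_to_l2(unsigned long i4, unsigned long i3,
                                    unsigned long i2)
{
    return ((i4 * LOGDIRTY_NODE_ENTRIES + i3) * LOGDIRTY_NODE_ENTRIES
            + i2 + 1) * PAGE_SIZE * 8;
}

int main(void)
{
    printf("one leaf (L1) page covers %lu guest pages\n", PAGE_SIZE * 8);
    printf("one L2 page covers        %lu guest pages\n",
           LOGDIRTY_NODE_ENTRIES * PAGE_SIZE * 8);
    printf("up to entry (0, 1, 2):    %lu guest pages\n",
           pages_up_to_l2(0, 1, 2));
    return 0;
}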
 




* [PATCH v3 7/8] x86/mm: update log-dirty bitmap when manipulating P2M
  2022-04-26 10:20 [PATCH v3 0/8] x86: more or less log-dirty related improvements Jan Beulich
                   ` (5 preceding siblings ...)
  2022-04-26 10:25 ` [PATCH v3 6/8] x86/paging: supply more useful log-dirty page count Jan Beulich
@ 2022-04-26 10:26 ` Jan Beulich
  2022-04-26 10:27 ` [PATCH v3 8/8] SUPPORT.md: write down restriction of 32-bit tool stacks Jan Beulich
  7 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2022-04-26 10:26 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Anthony Perard

Just like for PV guests MMU_MACHPHYS_UPDATE implies marking of the
respective page as dirty, additions to an HVM guest's P2M should do so.

For HVM the opposite is also true: Pages being removed from the P2M are
no longer dirty at their prior GFN; there's no point in telling the tool
stack to try and copy that page, when this will fail anyway (until
perhaps a new page gets placed there). Introduce paging_mark_pfn_clean()
(intentionally without a paging_mark_clean() counterpart) to handle
this. Note that while there is an earlier call to set_gpfn_from_mfn() in
guest_physmap_add_entry(), there's little reason to mark the page
clean there when later in the function it'll be marked dirty. This is
even more so given that at this point it's only the M2P that gets
updated, with the P2M still left unchanged.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
p2m_add_page()'s error handling looks bogus in this regard anyway: If an
error occurs before an MFN actually is associated with the new GFN, the
M2P entry ought to be restored imo. But of course a guest is still hosed
if the operation succeeds partially.

Note that I've not even checked mem-paging and mem-sharing code for
whether they may need similar adjustment. At least the latter is, aiui,
incompatible with log-dirty mode anyway.
---
v3: Re-base.

--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -549,7 +549,10 @@ p2m_remove_entry(struct p2m_domain *p2m,
         {
             p2m->get_entry(p2m, gfn_add(gfn, i), &t, &a, 0, NULL, NULL);
             if ( !p2m_is_special(t) && !p2m_is_shared(t) )
+            {
                 set_gpfn_from_mfn(mfn_x(mfn) + i, INVALID_M2P_ENTRY);
+                paging_mark_pfn_clean(p2m->domain, _pfn(gfn_x(gfn) + i));
+            }
         }
     }
 
@@ -737,8 +740,11 @@ p2m_add_page(struct domain *d, gfn_t gfn
         if ( !p2m_is_grant(t) )
         {
             for ( i = 0; i < (1UL << page_order); i++ )
+            {
                 set_gpfn_from_mfn(mfn_x(mfn_add(mfn, i)),
                                   gfn_x(gfn_add(gfn, i)));
+                paging_mark_pfn_dirty(d, _pfn(gfn_x(gfn) + i));
+            }
         }
     }
     else
@@ -1096,6 +1102,7 @@ static int set_typed_p2m_entry(struct do
                 {
                     ASSERT(mfn_valid(mfn_add(omfn, i)));
                     set_gpfn_from_mfn(mfn_x(omfn) + i, INVALID_M2P_ENTRY);
+                    paging_mark_pfn_clean(d, _pfn(gfn_x(gfn) + i));
 
                     ioreq_request_mapcache_invalidate(d);
                 }
@@ -1117,6 +1124,7 @@ static int set_typed_p2m_entry(struct do
         {
             ASSERT(mfn_valid(mfn_add(omfn, i)));
             set_gpfn_from_mfn(mfn_x(omfn) + i, INVALID_M2P_ENTRY);
+            paging_mark_pfn_clean(d, _pfn(gfn_x(gfn) + i));
         }
 
         ioreq_request_mapcache_invalidate(d);
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -657,7 +657,10 @@ decrease_reservation(struct domain *d, g
             }
             p2m_tlb_flush_sync(p2m);
             for ( j = 0; j < n; ++j )
+            {
                 set_gpfn_from_mfn(mfn_x(mfn), INVALID_M2P_ENTRY);
+                paging_mark_pfn_clean(d, _pfn(gfn_x(gfn) + i + j));
+            }
             p2m_pod_cache_add(p2m, page, cur_order);
 
             ioreq_request_mapcache_invalidate(d);
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -260,7 +260,7 @@ static int paging_log_dirty_disable(stru
 }
 
 /* Mark a page as dirty, with taking guest pfn as parameter */
-void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
+static void mark_pfn_dirty(struct domain *d, pfn_t pfn, bool dirty)
 {
     bool changed;
     mfn_t mfn, *l4, *l3, *l2;
@@ -291,14 +291,15 @@ void paging_mark_pfn_dirty(struct domain
 
     if ( unlikely(mfn_eq(d->arch.paging.log_dirty.top, INVALID_MFN)) )
     {
-         d->arch.paging.log_dirty.top = paging_new_log_dirty_node(d);
+         if ( dirty )
+             d->arch.paging.log_dirty.top = paging_new_log_dirty_node(d);
          if ( unlikely(mfn_eq(d->arch.paging.log_dirty.top, INVALID_MFN)) )
              goto out;
     }
 
     l4 = paging_map_log_dirty_bitmap(d);
     mfn = l4[i4];
-    if ( mfn_eq(mfn, INVALID_MFN) )
+    if ( mfn_eq(mfn, INVALID_MFN) && dirty )
         l4[i4] = mfn = paging_new_log_dirty_node(d);
     unmap_domain_page(l4);
     if ( mfn_eq(mfn, INVALID_MFN) )
@@ -306,7 +307,7 @@ void paging_mark_pfn_dirty(struct domain
 
     l3 = map_domain_page(mfn);
     mfn = l3[i3];
-    if ( mfn_eq(mfn, INVALID_MFN) )
+    if ( mfn_eq(mfn, INVALID_MFN) && dirty )
         l3[i3] = mfn = paging_new_log_dirty_node(d);
     unmap_domain_page(l3);
     if ( mfn_eq(mfn, INVALID_MFN) )
@@ -314,21 +315,22 @@ void paging_mark_pfn_dirty(struct domain
 
     l2 = map_domain_page(mfn);
     mfn = l2[i2];
-    if ( mfn_eq(mfn, INVALID_MFN) )
+    if ( mfn_eq(mfn, INVALID_MFN) && dirty )
         l2[i2] = mfn = paging_new_log_dirty_leaf(d);
     unmap_domain_page(l2);
     if ( mfn_eq(mfn, INVALID_MFN) )
         goto out;
 
     l1 = map_domain_page(mfn);
-    changed = !__test_and_set_bit(i1, l1);
+    changed = dirty ? !__test_and_set_bit(i1, l1)
+                    : __test_and_clear_bit(i1, l1);
     unmap_domain_page(l1);
     if ( changed )
     {
         PAGING_DEBUG(LOGDIRTY,
-                     "d%d: marked mfn %" PRI_mfn " (pfn %" PRI_pfn ")\n",
-                     d->domain_id, mfn_x(mfn), pfn_x(pfn));
-        d->arch.paging.log_dirty.dirty_count++;
+                     "%pd: marked mfn %" PRI_mfn " (pfn %" PRI_pfn ") %s\n",
+                     d, mfn_x(mfn), pfn_x(pfn), dirty ? "dirty" : "clean");
+        d->arch.paging.log_dirty.dirty_count += dirty ? 1 : -1;
     }
 
 out:
@@ -337,6 +339,16 @@ out:
     return;
 }
 
+void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
+{
+    mark_pfn_dirty(d, pfn, true);
+}
+
+void paging_mark_pfn_clean(struct domain *d, pfn_t pfn)
+{
+    mark_pfn_dirty(d, pfn, false);
+}
+
 /* Mark a page as dirty */
 void paging_mark_dirty(struct domain *d, mfn_t gmfn)
 {
@@ -349,7 +361,7 @@ void paging_mark_dirty(struct domain *d,
     /* We /really/ mean PFN here, even for non-translated guests. */
     pfn = _pfn(get_gpfn_from_mfn(mfn_x(gmfn)));
 
-    paging_mark_pfn_dirty(d, pfn);
+    mark_pfn_dirty(d, pfn, true);
 }
 
 #ifdef CONFIG_SHADOW_PAGING
--- a/xen/arch/x86/include/asm/paging.h
+++ b/xen/arch/x86/include/asm/paging.h
@@ -165,8 +165,9 @@ void paging_log_dirty_init(struct domain
 
 /* mark a page as dirty */
 void paging_mark_dirty(struct domain *d, mfn_t gmfn);
-/* mark a page as dirty with taking guest pfn as parameter */
+/* mark a page as dirty/clean with taking guest pfn as parameter */
 void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn);
+void paging_mark_pfn_clean(struct domain *d, pfn_t pfn);
 
 /* is this guest page dirty? 
  * This is called from inside paging code, with the paging lock held. */
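
A toy model of just the leaf-level behaviour the reworked helper gains
(the real code walks the four-level trie, allocates nodes only on the
dirty path, and holds the log-dirty lock; all names here are made up):

#include <stdbool.h>
#include <stdio.h>

static unsigned long leaf[64];   /* stand-in for one log-dirty L1 page */
static long dirty_count;

/* Set or clear a bit, returning whether its value actually changed. */
static bool change_bit(unsigned int bit, bool dirty)
{
    const unsigned int bits_per_word = 8 * sizeof(unsigned long);
    const unsigned long mask = 1UL << (bit % bits_per_word);
    unsigned long *word = &leaf[bit / bits_per_word];
    bool was_set = *word & mask;

    if ( dirty )
        *word |= mask;
    else
        *word &= ~mask;

    return dirty ? !was_set : was_set;
}

/* Mirrors mark_pfn_dirty(..., bool dirty): only count real transitions. */
static void mark_pfn(unsigned int pfn, bool dirty)
{
    if ( change_bit(pfn, dirty) )
        dirty_count += dirty ? 1 : -1;
}

int main(void)
{
    mark_pfn(10, true);    /* page dirtied: count 0 -> 1 */
    mark_pfn(10, true);    /* already dirty: count unchanged */
    mark_pfn(10, false);   /* removed from the P2M: count 1 -> 0 */
    printf("dirty_count = %ld\n", dirty_count);
    return 0;
}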




* [PATCH v3 8/8] SUPPORT.md: write down restriction of 32-bit tool stacks
  2022-04-26 10:20 [PATCH v3 0/8] x86: more or less log-dirty related improvements Jan Beulich
                   ` (6 preceding siblings ...)
  2022-04-26 10:26 ` [PATCH v3 7/8] x86/mm: update log-dirty bitmap when manipulating P2M Jan Beulich
@ 2022-04-26 10:27 ` Jan Beulich
  2023-06-16 20:50   ` Julien Grall
  7 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2022-04-26 10:27 UTC (permalink / raw)
  To: xen-devel
  Cc: Anthony Perard, Andrew Cooper, George Dunlap, Julien Grall,
	Stefano Stabellini, Wei Liu

Let's try to avoid giving the impression that 32-bit tool stacks are as
capable as 64-bit ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Wording adjustments as per review discussion.

--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -137,6 +137,12 @@ ARM only has one guest type at the momen
 
 ## Toolstack
 
+While 32-bit builds of the tool stack are generally supported, restrictions
+apply in particular when running on top of a 64-bit hypervisor.  For example,
+very large guests aren't expected to be manageable in this case.  This includes
+guests giving the appearance of being large, by altering their own memory
+layouts.
+
 ### xl
 
     Status: Supported




* Re: [PATCH v3 5/8] libxenguest: deal with log-dirty op stats overflow
  2022-04-26 10:24 ` [PATCH v3 5/8] libxenguest: deal with log-dirty op stats overflow Jan Beulich
@ 2022-04-26 10:54   ` Andrew Cooper
  2022-04-26 14:24     ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2022-04-26 10:54 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Wei Liu, Juergen Gross, George Dunlap, Anthony Perard

On 26/04/2022 11:24, Jan Beulich wrote:
> In send_memory_live() the precise value the dirty_count struct field
> gets initialized to doesn't matter much

Yes it does.

And as you keep on refusing to actually fix the bugs pointed out during
review, this entire series is NACKED, seeing as you've also repeatedly
failed to address feedback from previous revisions on other patches.

~Andrew


* Re: [PATCH v3 5/8] libxenguest: deal with log-dirty op stats overflow
  2022-04-26 10:54   ` Andrew Cooper
@ 2022-04-26 14:24     ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2022-04-26 14:24 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Juergen Gross, George Dunlap, Anthony Perard, xen-devel

On 26.04.2022 12:54, Andrew Cooper wrote:
> On 26/04/2022 11:24, Jan Beulich wrote:
>> In send_memory_live() the precise value the dirty_count struct field
>> gets initialized to doesn't matter much
> 
> Yes it does.
> 
> And as you keep on refusing to actually fix the bugs pointed out during
> review, this entire series is NACKED, seeing as you've also repeatedly
> failed to address feedback from previous revisions on other patches.

Excuse me? Afair I did respond back, requesting further input from you.
It is this input which hasn't come forward; I don't see how I could
have made changes when I don't know what exactly I need to do. This is
emphasized by Ian giving a provisional ack on 'libxenguest: short-
circuit "all-dirty" handling', depending on you not proving him,
Jürgen (who had reviewed what is now the first two patches), and me
wrong. There again was no response from you. In such a case no
response can only mean "what I've heard back was satisfactory".

Also may I remind you that you're not in the position to NACK this
entire series. You can latch '"open" objections', yes, but it is at
best unfair to record objections without then working towards
resolving them, by simply remaining silent towards the request for
further input. This looks to be a repeating pattern, which is rather
frustrating. I fully recognize that in many areas you know things
better than I do, but this doesn't mean you're always right and I'm
always wrong. Therefore, like e.g. for patch 1 in this series, you
will want to actively convince the three people who think the change
is okay, rather than simply saying "no" with apparently none of the
three understanding why you say so.

I also view it as at least questionable behavior to NACK an entire
series when there are clearly issues which want taking care of. If
not by the patches in their present shape, then by whatever is
suitable (working out of which is why discussion needs to continue
until clarity is actually reached).

Jan




* Re: [PATCH v3 8/8] SUPPORT.md: write down restriction of 32-bit tool stacks
  2022-04-26 10:27 ` [PATCH v3 8/8] SUPPORT.md: write down restriction of 32-bit tool stacks Jan Beulich
@ 2023-06-16 20:50   ` Julien Grall
  0 siblings, 0 replies; 12+ messages in thread
From: Julien Grall @ 2023-06-16 20:50 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Anthony Perard, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu

Hi Jan,

Sorry for the late reply.

On 26/04/2022 11:27, Jan Beulich wrote:
> Let's try to avoid giving the impression that 32-bit tool stacks are as
> capable as 64-bit ones.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall


