All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/13] x86: more or less log-dirty related improvements
@ 2021-07-05 15:09 Jan Beulich
  2021-07-05 15:11 ` [PATCH v2 01/13] libxl/x86: check return value of SHADOW_OP_SET_ALLOCATION domctl Jan Beulich
                   ` (13 more replies)
  0 siblings, 14 replies; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

... or so I hope. This series continues the attempt to deal with
the ovmf change putting the shared info page at a very high address
(which is now planned to get reverted there, but the general
problem doesn't go away by them doing so). There are further issues
with truncated value, which are being dealt with here. But there
are also not directly related changes, when I simply spotted things
that aren't very likely to be right the way they are. And then
there are also adjustments to the underlying hypervisor
implementation, with the goal of making the returned data more
useful to the consumers.

With these changes in place, a 1Gb guest which has "inflated"
itself by putting a page right below the 16Tb boundary migrates
successfully, albeit the process takes from some 20 minutes to over
half an hour on my test system.

In v2, besides integrating 2 patches that were previously sent,
there's one new patch and otherwise review feedback addressed
(albeit there wasn't any for a number of patches).

01: libxl/x86: check return value of SHADOW_OP_SET_ALLOCATION domctl
02: libxc: split xc_logdirty_control() from xc_shadow_control()
03: libxenguest: deal with log-dirty op stats overflow
04: libxenguest: short-circuit "all-dirty" handling
05: libxenguest: avoid allocating unused deferred-pages bitmap
06: libxenguest: complete loops in xc_map_domain_meminfo()
07: libxenguest: guard against overflow from too large p2m when checkpointing
08: libxenguest: fix off-by-1 in colo-secondary-bitmap merging
09: libxenguest: restrict PV guest size
10: libxc: simplify HYPERCALL_BUFFER()
11: x86/paging: supply more useful log-dirty page count
12: x86/mm: update log-dirty bitmap when manipulating P2M
13: SUPPORT.md: write down restriction of 32-bit tool stacks

Jan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 01/13] libxl/x86: check return value of SHADOW_OP_SET_ALLOCATION domctl
  2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
@ 2021-07-05 15:11 ` Jan Beulich
  2021-07-16 13:03   ` Anthony PERARD
  2021-07-05 15:12 ` [PATCH v2 02/13] libxc: split xc_logdirty_control() from xc_shadow_control() Jan Beulich
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson, Anthony Perard

The hypervisor may not have enough memory to satisfy the request. While
there, make the unit of the value clear by renaming the local variable.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Rename local variables. Also log requested value. Don't log errno
    explicitly in LOGED() (nor function return value). Integrate into
    series.
---
Especially if the request was mostly fulfilled, guests may have done
fine despite the failure, so there is a risk of perceived regressions
here. But not checking the error at all was certainly wrong.

--- a/tools/libs/light/libxl_x86.c
+++ b/tools/libs/light/libxl_x86.c
@@ -529,10 +529,20 @@ int libxl__arch_domain_create(libxl__gc
         xc_domain_set_time_offset(ctx->xch, domid, rtc_timeoffset);
 
     if (d_config->b_info.type != LIBXL_DOMAIN_TYPE_PV) {
-        unsigned long shadow = DIV_ROUNDUP(d_config->b_info.shadow_memkb,
-                                           1024);
-        xc_shadow_control(ctx->xch, domid, XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION,
-                          NULL, 0, &shadow, 0, NULL);
+        unsigned long shadow_mb = DIV_ROUNDUP(d_config->b_info.shadow_memkb,
+                                              1024);
+        int r = xc_shadow_control(ctx->xch, domid,
+                                  XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION,
+                                  NULL, 0, &shadow_mb, 0, NULL);
+
+        if (r) {
+            LOGED(ERROR, domid,
+                  "Failed to set %lu MiB %s allocation",
+                  shadow_mb,
+                  libxl_defbool_val(d_config->c_info.hap) ? "HAP" : "shadow");
+            ret = ERROR_FAIL;
+            goto out;
+        }
     }
 
     if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_PV &&



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 02/13] libxc: split xc_logdirty_control() from xc_shadow_control()
  2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
  2021-07-05 15:11 ` [PATCH v2 01/13] libxl/x86: check return value of SHADOW_OP_SET_ALLOCATION domctl Jan Beulich
@ 2021-07-05 15:12 ` Jan Beulich
  2021-08-19  9:11   ` Juergen Gross
  2021-07-05 15:13 ` [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow Jan Beulich
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:12 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson, Anthony Perard,
	Marek Marczykowski

For log-dirty operations a 64-bit field is being truncated to become an
"int" return value. Seeing the large number of arguments the present
function takes, reduce its set of parameters to that needed for all
operations not involving the log-dirty bitmap, while introducing a new
wrapper for the log-dirty bitmap operations. This new function in turn
doesn't need an "mb" parameter, but has a 64-bit return type. (Using the
return value in favor of a pointer-type parameter is left as is, to
disturb callers as little as possible.)

While altering xc_shadow_control() anyway, also adjust the types of the
last two of the remaining parameters.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
---
v2: Avoid new use of DECLARE_DOMCTL. Re-base over error handling fix to
    libxl__arch_domain_create().
---
I wonder whether we shouldn't take the opportunity and also rename
xc_shadow_control() to, say, xc_paging_control(), matching the layer
above the HAP/shadow distinction in the hypervisor.

--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -885,11 +885,15 @@ typedef struct xen_domctl_shadow_op_stat
 int xc_shadow_control(xc_interface *xch,
                       uint32_t domid,
                       unsigned int sop,
-                      xc_hypercall_buffer_t *dirty_bitmap,
-                      unsigned long pages,
-                      unsigned long *mb,
-                      uint32_t mode,
-                      xc_shadow_op_stats_t *stats);
+                      unsigned int *mb,
+                      unsigned int mode);
+long long xc_logdirty_control(xc_interface *xch,
+                              uint32_t domid,
+                              unsigned int sop,
+                              xc_hypercall_buffer_t *dirty_bitmap,
+                              unsigned long pages,
+                              unsigned int mode,
+                              xc_shadow_op_stats_t *stats);
 
 int xc_sched_credit_domain_set(xc_interface *xch,
                                uint32_t domid,
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -650,25 +650,49 @@ int xc_watchdog(xc_interface *xch,
 int xc_shadow_control(xc_interface *xch,
                       uint32_t domid,
                       unsigned int sop,
-                      xc_hypercall_buffer_t *dirty_bitmap,
-                      unsigned long pages,
-                      unsigned long *mb,
-                      uint32_t mode,
-                      xc_shadow_op_stats_t *stats)
+                      unsigned int *mb,
+                      unsigned int mode)
 {
     int rc;
     DECLARE_DOMCTL;
-    DECLARE_HYPERCALL_BUFFER_ARGUMENT(dirty_bitmap);
 
     memset(&domctl, 0, sizeof(domctl));
 
     domctl.cmd = XEN_DOMCTL_shadow_op;
     domctl.domain = domid;
     domctl.u.shadow_op.op     = sop;
-    domctl.u.shadow_op.pages  = pages;
     domctl.u.shadow_op.mb     = mb ? *mb : 0;
     domctl.u.shadow_op.mode   = mode;
-    if (dirty_bitmap != NULL)
+
+    rc = do_domctl(xch, &domctl);
+
+    if ( mb )
+        *mb = domctl.u.shadow_op.mb;
+
+    return rc;
+}
+
+long long xc_logdirty_control(xc_interface *xch,
+                              uint32_t domid,
+                              unsigned int sop,
+                              xc_hypercall_buffer_t *dirty_bitmap,
+                              unsigned long pages,
+                              unsigned int mode,
+                              xc_shadow_op_stats_t *stats)
+{
+    int rc;
+    struct xen_domctl domctl = {
+        .cmd         = XEN_DOMCTL_shadow_op,
+        .domain      = domid,
+        .u.shadow_op = {
+            .op    = sop,
+            .pages = pages,
+            .mode  = mode,
+        }
+    };
+    DECLARE_HYPERCALL_BUFFER_ARGUMENT(dirty_bitmap);
+
+    if ( dirty_bitmap )
         set_xen_guest_handle(domctl.u.shadow_op.dirty_bitmap,
                                 dirty_bitmap);
 
@@ -678,9 +702,6 @@ int xc_shadow_control(xc_interface *xch,
         memcpy(stats, &domctl.u.shadow_op.stats,
                sizeof(xc_shadow_op_stats_t));
     
-    if ( mb ) 
-        *mb = domctl.u.shadow_op.mb;
-
     return (rc == 0) ? domctl.u.shadow_op.pages : rc;
 }
 
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -459,10 +459,10 @@ static int send_checkpoint_dirty_pfn_lis
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                     &ctx->restore.dirty_bitmap_hbuf);
 
-    if ( xc_shadow_control(
+    if ( xc_logdirty_control(
              xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
              HYPERCALL_BUFFER(dirty_bitmap), ctx->restore.p2m_size,
-             NULL, 0, &stats) != ctx->restore.p2m_size )
+             0, &stats) != ctx->restore.p2m_size )
     {
         PERROR("Failed to retrieve logdirty bitmap");
         goto err;
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -428,18 +428,18 @@ static int enable_logdirty(struct xc_sr_
     /* This juggling is required if logdirty is enabled for VRAM tracking. */
     rc = xc_shadow_control(xch, ctx->domid,
                            XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
-                           NULL, 0, NULL, 0, NULL);
+                           NULL, 0);
     if ( rc < 0 )
     {
         on1 = errno;
         rc = xc_shadow_control(xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_OFF,
-                               NULL, 0, NULL, 0, NULL);
+                               NULL, 0);
         if ( rc < 0 )
             off = errno;
         else {
             rc = xc_shadow_control(xch, ctx->domid,
                                    XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
-                                   NULL, 0, NULL, 0, NULL);
+                                   NULL, 0);
             if ( rc < 0 )
                 on2 = errno;
         }
@@ -556,10 +556,10 @@ static int send_memory_live(struct xc_sr
         if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
             break;
 
-        if ( xc_shadow_control(
+        if ( xc_logdirty_control(
                  xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
                  &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_size,
-                 NULL, 0, &stats) != ctx->save.p2m_size )
+                 0, &stats) != ctx->save.p2m_size )
         {
             PERROR("Failed to retrieve logdirty bitmap");
             rc = -1;
@@ -653,10 +653,10 @@ static int suspend_and_send_dirty(struct
     if ( rc )
         goto out;
 
-    if ( xc_shadow_control(
+    if ( xc_logdirty_control(
              xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
              HYPERCALL_BUFFER(dirty_bitmap), ctx->save.p2m_size,
-             NULL, XEN_DOMCTL_SHADOW_LOGDIRTY_FINAL, &stats) !=
+             XEN_DOMCTL_SHADOW_LOGDIRTY_FINAL, &stats) !=
          ctx->save.p2m_size )
     {
         PERROR("Failed to retrieve logdirty bitmap");
@@ -716,10 +716,10 @@ static int verify_frames(struct xc_sr_co
     if ( rc )
         goto out;
 
-    if ( xc_shadow_control(
+    if ( xc_logdirty_control(
              xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_PEEK,
              &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_size,
-             NULL, 0, &stats) != ctx->save.p2m_size )
+             0, &stats) != ctx->save.p2m_size )
     {
         PERROR("Failed to retrieve logdirty bitmap");
         rc = -1;
@@ -834,7 +834,7 @@ static void cleanup(struct xc_sr_context
 
 
     xc_shadow_control(xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_OFF,
-                      NULL, 0, NULL, 0, NULL);
+                      NULL, 0);
 
     if ( ctx->save.ops.cleanup(ctx) )
         PERROR("Failed to clean up");
--- a/tools/libs/light/libxl_colo_restore.c
+++ b/tools/libs/light/libxl_colo_restore.c
@@ -62,7 +62,7 @@ static void colo_enable_logdirty(libxl__
     /* we need to know which pages are dirty to restore the guest */
     if (xc_shadow_control(CTX->xch, domid,
                           XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
-                          NULL, 0, NULL, 0, NULL) < 0) {
+                          NULL, 0) < 0) {
         LOGD(ERROR, domid, "cannot enable secondary vm's logdirty");
         lds->callback(egc, lds, ERROR_FAIL);
         return;
@@ -90,7 +90,7 @@ static void colo_disable_logdirty(libxl_
 
     /* we need to know which pages are dirty to restore the guest */
     if (xc_shadow_control(CTX->xch, domid, XEN_DOMCTL_SHADOW_OP_OFF,
-                          NULL, 0, NULL, 0, NULL) < 0)
+                          NULL, 0) < 0)
         LOGD(WARN, domid, "cannot disable secondary vm's logdirty");
 
     if (crs->hvm) {
--- a/tools/libs/light/libxl_x86.c
+++ b/tools/libs/light/libxl_x86.c
@@ -529,15 +529,15 @@ int libxl__arch_domain_create(libxl__gc
         xc_domain_set_time_offset(ctx->xch, domid, rtc_timeoffset);
 
     if (d_config->b_info.type != LIBXL_DOMAIN_TYPE_PV) {
-        unsigned long shadow_mb = DIV_ROUNDUP(d_config->b_info.shadow_memkb,
-                                              1024);
+        unsigned int shadow_mb = DIV_ROUNDUP(d_config->b_info.shadow_memkb,
+                                             1024);
         int r = xc_shadow_control(ctx->xch, domid,
                                   XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION,
-                                  NULL, 0, &shadow_mb, 0, NULL);
+                                  &shadow_mb, 0);
 
         if (r) {
             LOGED(ERROR, domid,
-                  "Failed to set %lu MiB %s allocation",
+                  "Failed to set %u MiB %s allocation",
                   shadow_mb,
                   libxl_defbool_val(d_config->c_info.hap) ? "HAP" : "shadow");
             ret = ERROR_FAIL;
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -997,13 +997,13 @@ CAMLprim value stub_shadow_allocation_ge
 {
 	CAMLparam2(xch, domid);
 	CAMLlocal1(mb);
-	unsigned long c_mb;
+	unsigned int c_mb;
 	int ret;
 
 	caml_enter_blocking_section();
 	ret = xc_shadow_control(_H(xch), _D(domid),
 				XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION,
-				NULL, 0, &c_mb, 0, NULL);
+				&c_mb, 0);
 	caml_leave_blocking_section();
 	if (ret != 0)
 		failwith_xc(_H(xch));
@@ -1016,14 +1016,14 @@ CAMLprim value stub_shadow_allocation_se
 					  value mb)
 {
 	CAMLparam3(xch, domid, mb);
-	unsigned long c_mb;
+	unsigned int c_mb;
 	int ret;
 
 	c_mb = Int_val(mb);
 	caml_enter_blocking_section();
 	ret = xc_shadow_control(_H(xch), _D(domid),
 				XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION,
-				NULL, 0, &c_mb, 0, NULL);
+				&c_mb, 0);
 	caml_leave_blocking_section();
 	if (ret != 0)
 		failwith_xc(_H(xch));
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1192,8 +1192,7 @@ static PyObject *pyxc_shadow_control(PyO
                                       &dom, &op) )
         return NULL;
     
-    if ( xc_shadow_control(xc->xc_handle, dom, op, NULL, 0, NULL, 0, NULL) 
-         < 0 )
+    if ( xc_shadow_control(xc->xc_handle, dom, op, NULL, 0) < 0 )
         return pyxc_error_to_exception(xc->xc_handle);
     
     Py_INCREF(zero);
@@ -1208,7 +1207,7 @@ static PyObject *pyxc_shadow_mem_control
     int op;
     uint32_t dom;
     int mbarg = -1;
-    unsigned long mb;
+    unsigned int mb;
 
     static char *kwd_list[] = { "dom", "mb", NULL };
 
@@ -1223,7 +1222,7 @@ static PyObject *pyxc_shadow_mem_control
         mb = mbarg;
         op = XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION;
     }
-    if ( xc_shadow_control(xc->xc_handle, dom, op, NULL, 0, &mb, 0, NULL) < 0 )
+    if ( xc_shadow_control(xc->xc_handle, dom, op, &mb, 0) < 0 )
         return pyxc_error_to_exception(xc->xc_handle);
     
     mbarg = mb;



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
  2021-07-05 15:11 ` [PATCH v2 01/13] libxl/x86: check return value of SHADOW_OP_SET_ALLOCATION domctl Jan Beulich
  2021-07-05 15:12 ` [PATCH v2 02/13] libxc: split xc_logdirty_control() from xc_shadow_control() Jan Beulich
@ 2021-07-05 15:13 ` Jan Beulich
  2021-07-05 15:41   ` Andrew Cooper
                     ` (2 more replies)
  2021-07-05 15:13 ` [PATCH v2 04/13] libxenguest: short-circuit "all-dirty" handling Jan Beulich
                   ` (10 subsequent siblings)
  13 siblings, 3 replies; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

In send_memory_live() the precise value the dirty_count struct field
gets initialized to doesn't matter much (apart from the triggering of
the log message in send_dirty_pages(), see below), but it is important
that it not be zero on the first iteration (or else send_dirty_pages()
won't get called at all). Saturate the initializer value at the maximum
value the field can hold.

While there also initialize struct precopy_stats' respective field to a
more sane value: We don't really know how many dirty pages there are at
that point.

In suspend_and_send_dirty() and verify_frames() the local variables
don't need initializing at all, as they're only an output from the
hypercall which gets invoked first thing.

In send_checkpoint_dirty_pfn_list() the local variable can be dropped
altogether: It's optional to xc_logdirty_control() and not used anywhere
else.

Note that in case the clipping actually takes effect, the "Bitmap
contained more entries than expected..." log message will trigger. This
being just an informational message, I don't think this is overly
concerning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -452,7 +452,6 @@ static int send_checkpoint_dirty_pfn_lis
     unsigned int count, written;
     uint64_t i, *pfns = NULL;
     struct iovec *iov = NULL;
-    xc_shadow_op_stats_t stats = { 0, ctx->restore.p2m_size };
     struct xc_sr_record rec = {
         .type = REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST,
     };
@@ -462,7 +461,7 @@ static int send_checkpoint_dirty_pfn_lis
     if ( xc_logdirty_control(
              xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
              HYPERCALL_BUFFER(dirty_bitmap), ctx->restore.p2m_size,
-             0, &stats) != ctx->restore.p2m_size )
+             0, NULL) != ctx->restore.p2m_size )
     {
         PERROR("Failed to retrieve logdirty bitmap");
         goto err;
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -500,7 +500,9 @@ static int simple_precopy_policy(struct
 static int send_memory_live(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
-    xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
+    xc_shadow_op_stats_t stats = {
+        .dirty_count = MIN(ctx->save.p2m_size, (typeof(stats.dirty_count))~0)
+    };
     char *progress_str = NULL;
     unsigned int x = 0;
     int rc;
@@ -519,7 +521,7 @@ static int send_memory_live(struct xc_sr
         goto out;
 
     ctx->save.stats = (struct precopy_stats){
-        .dirty_count = ctx->save.p2m_size,
+        .dirty_count = -1,
     };
     policy_stats = &ctx->save.stats;
 
@@ -643,7 +645,7 @@ static int colo_merge_secondary_dirty_bi
 static int suspend_and_send_dirty(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
-    xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
+    xc_shadow_op_stats_t stats;
     char *progress_str = NULL;
     int rc;
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
@@ -701,7 +703,7 @@ static int suspend_and_send_dirty(struct
 static int verify_frames(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
-    xc_shadow_op_stats_t stats = { 0, ctx->save.p2m_size };
+    xc_shadow_op_stats_t stats;
     int rc;
     struct xc_sr_record rec = { .type = REC_TYPE_VERIFY };
 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 04/13] libxenguest: short-circuit "all-dirty" handling
  2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
                   ` (2 preceding siblings ...)
  2021-07-05 15:13 ` [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow Jan Beulich
@ 2021-07-05 15:13 ` Jan Beulich
  2021-08-19 14:21   ` Juergen Gross
  2021-07-05 15:14 ` [PATCH v2 05/13] libxenguest: avoid allocating unused deferred-pages bitmap Jan Beulich
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

For one it is unnecessary to fill a perhaps large chunk of memory with
all ones. Add a new parameter to send_dirty_pages() for callers to
indicate so.

Then it is further unnecessary to allocate the dirty bitmap altogether
when all that's ever going to happen is a single all-dirty run.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -368,7 +368,7 @@ static int suspend_domain(struct xc_sr_c
  * Bitmap is bounded by p2m_size.
  */
 static int send_dirty_pages(struct xc_sr_context *ctx,
-                            unsigned long entries)
+                            unsigned long entries, bool all_dirty)
 {
     xc_interface *xch = ctx->xch;
     xen_pfn_t p;
@@ -379,7 +379,7 @@ static int send_dirty_pages(struct xc_sr
 
     for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
     {
-        if ( !test_bit(p, dirty_bitmap) )
+        if ( !all_dirty && !test_bit(p, dirty_bitmap) )
             continue;
 
         rc = add_to_batch(ctx, p);
@@ -411,12 +411,7 @@ static int send_dirty_pages(struct xc_sr
  */
 static int send_all_pages(struct xc_sr_context *ctx)
 {
-    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
-                                    &ctx->save.dirty_bitmap_hbuf);
-
-    bitmap_set(dirty_bitmap, ctx->save.p2m_size);
-
-    return send_dirty_pages(ctx, ctx->save.p2m_size);
+    return send_dirty_pages(ctx, ctx->save.p2m_size, true /* all_dirty */);
 }
 
 static int enable_logdirty(struct xc_sr_context *ctx)
@@ -508,9 +503,6 @@ static int send_memory_live(struct xc_sr
     int rc;
     int policy_decision;
 
-    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
-                                    &ctx->save.dirty_bitmap_hbuf);
-
     precopy_policy_t precopy_policy = ctx->save.callbacks->precopy_policy;
     void *data = ctx->save.callbacks->data;
 
@@ -528,8 +520,6 @@ static int send_memory_live(struct xc_sr
     if ( precopy_policy == NULL )
         precopy_policy = simple_precopy_policy;
 
-    bitmap_set(dirty_bitmap, ctx->save.p2m_size);
-
     for ( ; ; )
     {
         policy_decision = precopy_policy(*policy_stats, data);
@@ -541,7 +531,7 @@ static int send_memory_live(struct xc_sr
             if ( rc )
                 goto out;
 
-            rc = send_dirty_pages(ctx, stats.dirty_count);
+            rc = send_dirty_pages(ctx, stats.dirty_count, x == 1);
             if ( rc )
                 goto out;
         }
@@ -687,7 +677,8 @@ static int suspend_and_send_dirty(struct
         }
     }
 
-    rc = send_dirty_pages(ctx, stats.dirty_count + ctx->save.nr_deferred_pages);
+    rc = send_dirty_pages(ctx, stats.dirty_count + ctx->save.nr_deferred_pages,
+                          false /* all_dirty */);
     if ( rc )
         goto out;
 
@@ -807,8 +798,11 @@ static int setup(struct xc_sr_context *c
     if ( rc )
         goto err;
 
-    dirty_bitmap = xc_hypercall_buffer_alloc_pages(
-        xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
+    dirty_bitmap = ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN
+        ? xc_hypercall_buffer_alloc_pages(
+              xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)))
+        : (void *)-1L;
+
     ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
                                   sizeof(*ctx->save.batch_pfns));
     ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 05/13] libxenguest: avoid allocating unused deferred-pages bitmap
  2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
                   ` (3 preceding siblings ...)
  2021-07-05 15:13 ` [PATCH v2 04/13] libxenguest: short-circuit "all-dirty" handling Jan Beulich
@ 2021-07-05 15:14 ` Jan Beulich
  2021-08-19 14:22   ` Juergen Gross
  2021-07-05 15:14 ` [PATCH v2 06/13] libxenguest: complete loops in xc_map_domain_meminfo() Jan Beulich
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

Like for the dirty bitmap, it is unnecessary to allocate the deferred-
pages bitmap when all that's ever going to happen is a single all-dirty
run.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -130,7 +130,7 @@ static int write_batch(struct xc_sr_cont
                                                       ctx->save.batch_pfns[i]);
 
         /* Likely a ballooned page. */
-        if ( mfns[i] == INVALID_MFN )
+        if ( mfns[i] == INVALID_MFN && ctx->save.deferred_pages )
         {
             set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
             ++ctx->save.nr_deferred_pages;
@@ -196,8 +196,12 @@ static int write_batch(struct xc_sr_cont
             {
                 if ( rc == -1 && errno == EAGAIN )
                 {
-                    set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
-                    ++ctx->save.nr_deferred_pages;
+                    if ( ctx->save.deferred_pages )
+                    {
+                        set_bit(ctx->save.batch_pfns[i],
+                                ctx->save.deferred_pages);
+                        ++ctx->save.nr_deferred_pages;
+                    }
                     types[i] = XEN_DOMCTL_PFINFO_XTAB;
                     --nr_pages;
                 }
@@ -665,7 +669,8 @@ static int suspend_and_send_dirty(struct
     else
         xc_set_progress_prefix(xch, "Checkpointed save");
 
-    bitmap_or(dirty_bitmap, ctx->save.deferred_pages, ctx->save.p2m_size);
+    if ( ctx->save.deferred_pages )
+        bitmap_or(dirty_bitmap, ctx->save.deferred_pages, ctx->save.p2m_size);
 
     if ( !ctx->save.live && ctx->stream_type == XC_STREAM_COLO )
     {
@@ -682,7 +687,8 @@ static int suspend_and_send_dirty(struct
     if ( rc )
         goto out;
 
-    bitmap_clear(ctx->save.deferred_pages, ctx->save.p2m_size);
+    if ( ctx->save.deferred_pages )
+        bitmap_clear(ctx->save.deferred_pages, ctx->save.p2m_size);
     ctx->save.nr_deferred_pages = 0;
 
  out:
@@ -791,24 +797,31 @@ static int setup(struct xc_sr_context *c
 {
     xc_interface *xch = ctx->xch;
     int rc;
-    DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
-                                    &ctx->save.dirty_bitmap_hbuf);
 
     rc = ctx->save.ops.setup(ctx);
     if ( rc )
         goto err;
 
-    dirty_bitmap = ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN
-        ? xc_hypercall_buffer_alloc_pages(
-              xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)))
-        : (void *)-1L;
+    if ( ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN )
+    {
+        DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
+                                        &ctx->save.dirty_bitmap_hbuf);
+
+        dirty_bitmap =
+            xc_hypercall_buffer_alloc_pages(
+                xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
+        ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
+
+        if ( !dirty_bitmap || !ctx->save.deferred_pages )
+            goto enomem;
+    }
 
     ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
                                   sizeof(*ctx->save.batch_pfns));
-    ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
 
-    if ( !ctx->save.batch_pfns || !dirty_bitmap || !ctx->save.deferred_pages )
+    if ( !ctx->save.batch_pfns )
     {
+    enomem:
         ERROR("Unable to allocate memory for dirty bitmaps, batch pfns and"
               " deferred pages");
         rc = -1;



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 06/13] libxenguest: complete loops in xc_map_domain_meminfo()
  2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
                   ` (4 preceding siblings ...)
  2021-07-05 15:14 ` [PATCH v2 05/13] libxenguest: avoid allocating unused deferred-pages bitmap Jan Beulich
@ 2021-07-05 15:14 ` Jan Beulich
  2021-07-05 15:15 ` [PATCH v2 07/13] libxenguest: guard against overflow from too large p2m when checkpointing Jan Beulich
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

minfo->p2m_size may have more than 31 significant bits. Change the
induction variable to unsigned long, and (largely for signed-ness
consistency) a helper variable to unsigned int. And while there also
avoid open-coding min().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Use min().

--- a/tools/libs/guest/xg_domain.c
+++ b/tools/libs/guest/xg_domain.c
@@ -40,7 +40,7 @@ int xc_map_domain_meminfo(xc_interface *
     xc_dominfo_t info;
     shared_info_any_t *live_shinfo;
     xen_capabilities_info_t xen_caps = "";
-    int i;
+    unsigned long i;
 
     /* Only be initialized once */
     if ( minfo->pfn_type || minfo->p2m_table )
@@ -116,12 +116,11 @@ int xc_map_domain_meminfo(xc_interface *
     /* Retrieve PFN types in batches */
     for ( i = 0; i < minfo->p2m_size ; i+=1024 )
     {
-        int count = ((minfo->p2m_size - i ) > 1024 ) ?
-                        1024: (minfo->p2m_size - i);
+        unsigned int count = min(minfo->p2m_size - i, 1024UL);
 
         if ( xc_get_pfn_type_batch(xch, domid, count, minfo->pfn_type + i) )
         {
-            PERROR("Could not get %d-eth batch of PFN types", (i+1)/1024);
+            PERROR("Could not get batch %lu of PFN types", (i + 1) / 1024);
             goto failed;
         }
     }



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 07/13] libxenguest: guard against overflow from too large p2m when checkpointing
  2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
                   ` (5 preceding siblings ...)
  2021-07-05 15:14 ` [PATCH v2 06/13] libxenguest: complete loops in xc_map_domain_meminfo() Jan Beulich
@ 2021-07-05 15:15 ` Jan Beulich
  2021-07-05 15:15 ` [PATCH v2 08/13] libxenguest: fix off-by-1 in colo-secondary-bitmap merging Jan Beulich
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

struct xc_sr_record's length field has just 32 bits. Fill it early and
check that the calculated value hasn't overflowed. Additionally check
for counter overflow early - there's no point even trying to allocate
any memory in such an event.

While there also limit an induction variable's type to unsigned long:
There's no gain from it being uint64_t.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Of course looping over test_bit() is pretty inefficient, but given that
I have no idea how to test this code I wanted to restrict changes to
what can sensibly be seen as no worse than before from just looking at
the changes.

--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -450,7 +450,8 @@ static int send_checkpoint_dirty_pfn_lis
     xc_interface *xch = ctx->xch;
     int rc = -1;
     unsigned int count, written;
-    uint64_t i, *pfns = NULL;
+    unsigned long i;
+    uint64_t *pfns = NULL;
     struct iovec *iov = NULL;
     struct xc_sr_record rec = {
         .type = REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST,
@@ -469,16 +470,28 @@ static int send_checkpoint_dirty_pfn_lis
 
     for ( i = 0, count = 0; i < ctx->restore.p2m_size; i++ )
     {
-        if ( test_bit(i, dirty_bitmap) )
-            count++;
+        if ( test_bit(i, dirty_bitmap) && !++count )
+            break;
     }
 
+    if ( i < ctx->restore.p2m_size )
+    {
+        ERROR("Too many dirty pfns");
+        goto err;
+    }
+
+    rec.length = count * sizeof(*pfns);
+    if ( rec.length / sizeof(*pfns) != count )
+    {
+        ERROR("Too many (%u) dirty pfns", count);
+        goto err;
+    }
 
-    pfns = malloc(count * sizeof(*pfns));
+    pfns = malloc(rec.length);
     if ( !pfns )
     {
-        ERROR("Unable to allocate %zu bytes of memory for dirty pfn list",
-              count * sizeof(*pfns));
+        ERROR("Unable to allocate %u bytes of memory for dirty pfn list",
+              rec.length);
         goto err;
     }
 
@@ -504,8 +517,6 @@ static int send_checkpoint_dirty_pfn_lis
         goto err;
     }
 
-    rec.length = count * sizeof(*pfns);
-
     iov[0].iov_base = &rec.type;
     iov[0].iov_len = sizeof(rec.type);
 
@@ -513,7 +524,7 @@ static int send_checkpoint_dirty_pfn_lis
     iov[1].iov_len = sizeof(rec.length);
 
     iov[2].iov_base = pfns;
-    iov[2].iov_len = count * sizeof(*pfns);
+    iov[2].iov_len = rec.length;
 
     if ( writev_exact(ctx->restore.send_back_fd, iov, 3) )
     {



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 08/13] libxenguest: fix off-by-1 in colo-secondary-bitmap merging
  2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
                   ` (6 preceding siblings ...)
  2021-07-05 15:15 ` [PATCH v2 07/13] libxenguest: guard against overflow from too large p2m when checkpointing Jan Beulich
@ 2021-07-05 15:15 ` Jan Beulich
  2021-07-05 15:15 ` [PATCH v2 09/13] libxenguest: restrict PV guest size Jan Beulich
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

Valid GFNs (having a representation in the dirty bitmap) need to be
strictly below p2m_size.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -614,7 +614,7 @@ static int colo_merge_secondary_dirty_bi
     for ( i = 0; i < count; i++ )
     {
         pfn = pfns[i];
-        if ( pfn > ctx->save.p2m_size )
+        if ( pfn >= ctx->save.p2m_size )
         {
             PERROR("Invalid pfn 0x%" PRIx64, pfn);
             rc = -1;



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 09/13] libxenguest: restrict PV guest size
  2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
                   ` (7 preceding siblings ...)
  2021-07-05 15:15 ` [PATCH v2 08/13] libxenguest: fix off-by-1 in colo-secondary-bitmap merging Jan Beulich
@ 2021-07-05 15:15 ` Jan Beulich
  2021-07-05 15:16 ` [PATCH v2 10/13] libxc: simplify HYPERCALL_BUFFER() Jan Beulich
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

The P2M, the use of PFNs, and hence the maximum valid PFN are purely
software constructs in PV. In principle a guest is free to use arbitrary
PFNs. However, at least page table normalization requires that PFN space
be, like MFN space, limited to the architectural 40 bits (52 address
bits). And of course a 32-bit tool stack places further constraints.

Bounding the values also makes sure that various subsequent calculations
won't truncate values and then continue with inconsistencies (see e.g.
fl_entries vs ctx->x86.pv.p2m_frames in map_p2m_tree()).

While there correct an adjacent error message with wrong way round
wording in restore code and another slightly malformed and misleading
(off by one) one in core dumping code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Integrate into series.
---
In case the save/restore changes don't make it obvious enough: It
escapes me why struct xc_sr_rec_x86_pv_p2m_frames has p2m_pfns[] with
uint64_t element type but {start,end}_pfn both as uint32_t. Imo all
three can sensibly only ever be of the same type.

--- a/tools/include/xen-tools/libs.h
+++ b/tools/include/xen-tools/libs.h
@@ -13,6 +13,10 @@
 #define ARRAY_SIZE(a) (sizeof(a) / sizeof(*a))
 #endif
 
+#ifndef sizeof_field
+#define sizeof_field(type, field) sizeof(((type *)0)->field)
+#endif
+
 #ifndef MAX
 #define MAX(x, y) ((x) > (y) ? (x) : (y))
 #endif
--- a/tools/libs/guest/xg_core_x86.c
+++ b/tools/libs/guest/xg_core_x86.c
@@ -59,6 +59,43 @@ xc_core_arch_memory_map_get(xc_interface
     if ( xc_domain_nr_gpfns(xch, info->domid, &p2m_size) < 0 )
         return -1;
 
+    if ( !p2m_size )
+    {
+        ERROR("Cannot map a guest without P2M");
+        errno = ENODATA;
+        return -1;
+    }
+
+    if ( !info->hvm )
+    {
+        unsigned int guest_width;
+
+        if ( xc_domain_get_guest_width(xch, info->domid, &guest_width) != 0 )
+        {
+            PERROR("Cannot get address size for PV guest");
+            return -1;
+        }
+
+        if ( p2m_size == (guest_width > 4 ? ~0UL : ~0U) )
+        {
+            ERROR("Cannot map a PV guest with invalid P2M");
+            errno = ENODATA;
+            return -1;
+        }
+    }
+
+#ifndef __i386__
+    if ( (p2m_size - 1) >> 40 )
+#else
+    /* Very large domains (> 1TB) will exhaust virtual address space. */
+    if ( (p2m_size - 1) >> 28 )
+#endif
+    {
+        ERROR("Cannot map a guest with P2M size %#lx", p2m_size);
+        errno = EOPNOTSUPP;
+        return -1;
+    }
+
     map = malloc(sizeof(*map));
     if ( map == NULL )
     {
@@ -333,10 +370,30 @@ xc_core_arch_map_p2m_rw(xc_interface *xc
 
     if ( dinfo->p2m_size < info->nr_pages  )
     {
-        ERROR("p2m_size < nr_pages -1 (%lx < %lx", dinfo->p2m_size, info->nr_pages - 1);
+        ERROR("p2m_size < nr_pages (%lx < %lx)", dinfo->p2m_size, info->nr_pages);
         goto out;
     }
 
+    if ( !info->hvm && dinfo->p2m_size == (dinfo->guest_width > 4 ? ~0UL : ~0U) )
+    {
+        ERROR("Cannot r/%c-map a PV guest with invalid P2M", rw ? 'w' : 'o');
+        errno = ENODATA;
+        return -1;
+    }
+
+#ifndef __i386__
+    if ( (dinfo->p2m_size - 1) >> 40 )
+#else
+    /* Very large domains (> 1TB) will exhaust virtual address space. */
+    if ( (dinfo->p2m_size - 1) >> 28 )
+#endif
+    {
+        ERROR("Cannot r/%c-map a guest with P2M size %#lx",
+              rw ? 'w' : 'o', dinfo->p2m_size);
+        errno = EOPNOTSUPP;
+        return -1;
+    }
+
     p2m_cr3 = GET_FIELD(live_shinfo, arch.p2m_cr3, dinfo->guest_width);
 
     p2m_frame_list = p2m_cr3 ? xc_core_arch_map_p2m_list_rw(xch, dinfo, dom, live_shinfo, p2m_cr3)
--- a/tools/libs/guest/xg_sr_restore_x86_pv.c
+++ b/tools/libs/guest/xg_sr_restore_x86_pv.c
@@ -709,10 +709,23 @@ static int handle_x86_pv_p2m_frames(stru
         return -1;
     }
 
+#ifdef __i386__
+    /* Very large domains (> 1TB) will exhaust virtual address space. */
+    if ( data->end_pfn >> 28 )
+#elif 0 /* sizeof(data->end_pfn) > 4 */
+    if ( data->end_pfn >> (ctx->x86.pv.width > 4 ? 40 : 32) )
+#else
+    if ( 0 )
+#endif
+    {
+        ERROR("End pfn in stream (%#x) too large", data->end_pfn);
+        return -1;
+    }
+
     if ( data->start_pfn > data->end_pfn )
     {
-        ERROR("End pfn in stream (%#x) exceeds Start (%#x)",
-              data->end_pfn, data->start_pfn);
+        ERROR("Start pfn in stream (%#x) exceeds End (%#x)",
+              data->start_pfn, data->end_pfn);
         return -1;
     }
 
--- a/tools/libs/guest/xg_sr_save_x86_pv.c
+++ b/tools/libs/guest/xg_sr_save_x86_pv.c
@@ -464,11 +464,40 @@ static int map_p2m_list(struct xc_sr_con
  */
 static int map_p2m(struct xc_sr_context *ctx)
 {
+    xc_interface *xch = ctx->xch;
     uint64_t p2m_cr3;
+    uint64_t max_pfn = GET_FIELD(ctx->x86.pv.shinfo, arch.max_pfn,
+                                 ctx->x86.pv.width);
+
+    if ( !max_pfn )
+    {
+        ERROR("Cannot save a guest without P2M");
+        errno = ENODATA;
+        return -1;
+    }
+
+    if ( max_pfn-- == (ctx->x86.pv.width > 4 ? ~0UL : ~0U) )
+    {
+        ERROR("Cannot save a guest with invalid P2M");
+        errno = ENODATA;
+        return -1;
+    }
+
+#ifndef __i386__
+    if ( max_pfn >> (sizeof_field(struct xc_sr_rec_x86_pv_p2m_frames,
+                                  end_pfn) > 4 ? 40 : 32) )
+#else
+    /* Very large domains (> 1TB) will exhaust virtual address space. */
+    if ( max_pfn >> 28 )
+#endif
+    {
+        ERROR("Cannot save a guest with maximum PFN %#"PRIx64, max_pfn);
+        errno = EOPNOTSUPP;
+        return -1;
+    }
 
     ctx->x86.pv.p2m_generation = ~0ULL;
-    ctx->x86.pv.max_pfn = GET_FIELD(ctx->x86.pv.shinfo, arch.max_pfn,
-                                    ctx->x86.pv.width) - 1;
+    ctx->x86.pv.max_pfn = max_pfn;
     p2m_cr3 = GET_FIELD(ctx->x86.pv.shinfo, arch.p2m_cr3, ctx->x86.pv.width);
 
     return p2m_cr3 ? map_p2m_list(ctx, p2m_cr3) : map_p2m_tree(ctx);



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 10/13] libxc: simplify HYPERCALL_BUFFER()
  2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
                   ` (8 preceding siblings ...)
  2021-07-05 15:15 ` [PATCH v2 09/13] libxenguest: restrict PV guest size Jan Beulich
@ 2021-07-05 15:16 ` Jan Beulich
  2021-07-05 15:47   ` Andrew Cooper
  2021-07-05 15:17 ` [PATCH v2 11/13] x86/paging: supply more useful log-dirty page count Jan Beulich
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Juergen Gross, George Dunlap, Ian Jackson

_hcbuf_buf1 has been there only for a pointer comparison to validate
type compatibility. The same can be achieved by not using typeof() on
the definition of what so far was _hcbuf_buf2, as the initializer has
to also be type-compatible. Drop _hcbuf_buf1 and the comaprison;
rename _hcbuf_buf2.

Since we're already using compiler extensions here, don't be shy and
also omit the middle operand of the involved ?: operator.

Bring line continuation character placement in line with that of
related macros.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.

--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -247,13 +247,10 @@ typedef struct xc_hypercall_buffer xc_hy
 /*
  * Returns the hypercall_buffer associated with a variable.
  */
-#define HYPERCALL_BUFFER(_name)                                 \
-    ({  xc_hypercall_buffer_t _hcbuf_buf1;                      \
-        typeof(XC__HYPERCALL_BUFFER_NAME(_name)) *_hcbuf_buf2 = \
-                &XC__HYPERCALL_BUFFER_NAME(_name);              \
-        (void)(&_hcbuf_buf1 == _hcbuf_buf2);                    \
-        (_hcbuf_buf2)->param_shadow ?                           \
-                (_hcbuf_buf2)->param_shadow : (_hcbuf_buf2);    \
+#define HYPERCALL_BUFFER(_name)                                \
+    ({  xc_hypercall_buffer_t *_hcbuf_buf =                    \
+                &XC__HYPERCALL_BUFFER_NAME(_name);             \
+        _hcbuf_buf->param_shadow ?: _hcbuf_buf;                \
      })
 
 #define HYPERCALL_BUFFER_INIT_NO_BOUNCE .dir = 0, .sz = 0, .ubuf = (void *)-1



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 11/13] x86/paging: supply more useful log-dirty page count
  2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
                   ` (9 preceding siblings ...)
  2021-07-05 15:16 ` [PATCH v2 10/13] libxc: simplify HYPERCALL_BUFFER() Jan Beulich
@ 2021-07-05 15:17 ` Jan Beulich
  2021-07-05 15:18 ` [PATCH v2 12/13] x86/mm: update log-dirty bitmap when manipulating P2M Jan Beulich
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

In paging_log_dirty_op(), always update the count of pages field:
- if more pages were specified than the guest has ever accessed (HVM) or
  marked part of the p2m (PV), there's no point for the caller to
  inspect bits beyond the one for that last page,
- if the guest's p2m size has grown in the meantime, the caller would
  have no indication that it may not have caught all dirty bits.

Also exit the loop once having passed the last valid GFN. To balance
overhead and savings, do this before inspecting a new L2 table.

Adjust libxenguest accordingly, albeit these changes are necessary only
for guests which actually alter their P2M size while under migration.
They do, however, additionally open up the option of the hypervisor
eventually zapping large ranges of trailing zeros from the bitmap when
providing it back to the tools.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Of course this still is far from ideal: At the very least a perhaps
large tail of zeros could very well also result in a reduced page
count.

--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -237,7 +237,16 @@ struct xc_sr_context
             /* Further debugging information in the stream. */
             bool debug;
 
+            /*
+             * Counts of bits (each representing a guest page), expressing
+             * respectively
+             * - obtained P2M size,
+             * - allocated bitmap size,
+             * - range actually filled with valid data.
+             */
             unsigned long p2m_size;
+            unsigned long p2m_alloc_size;
+            unsigned long p2m_used_size;
 
             struct precopy_stats stats;
 
@@ -245,6 +254,7 @@ struct xc_sr_context
             unsigned int nr_batch_pfns;
             unsigned long *deferred_pages;
             unsigned long nr_deferred_pages;
+            unsigned long used_deferred_pages;
             xc_hypercall_buffer_t dirty_bitmap_hbuf;
         } save;
 
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -450,7 +450,8 @@ static int send_checkpoint_dirty_pfn_lis
     xc_interface *xch = ctx->xch;
     int rc = -1;
     unsigned int count, written;
-    unsigned long i;
+    unsigned long i, p2m_size;
+    long long ret;
     uint64_t *pfns = NULL;
     struct iovec *iov = NULL;
     struct xc_sr_record rec = {
@@ -459,22 +460,29 @@ static int send_checkpoint_dirty_pfn_lis
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                     &ctx->restore.dirty_bitmap_hbuf);
 
-    if ( xc_logdirty_control(
-             xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
-             HYPERCALL_BUFFER(dirty_bitmap), ctx->restore.p2m_size,
-             0, NULL) != ctx->restore.p2m_size )
+    ret = xc_logdirty_control(
+              xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
+              HYPERCALL_BUFFER(dirty_bitmap), ctx->restore.p2m_size,
+              0, NULL);
+    if ( ret < 0 )
     {
         PERROR("Failed to retrieve logdirty bitmap");
         goto err;
     }
+    if ( ret > ctx->restore.p2m_size )
+    {
+        ERROR("Guest has grown its p2m too much");
+        goto err;
+    }
+    p2m_size = ret;
 
-    for ( i = 0, count = 0; i < ctx->restore.p2m_size; i++ )
+    for ( i = 0, count = 0; i < p2m_size; i++ )
     {
         if ( test_bit(i, dirty_bitmap) && !++count )
             break;
     }
 
-    if ( i < ctx->restore.p2m_size )
+    if ( i < p2m_size )
     {
         ERROR("Too many dirty pfns");
         goto err;
@@ -495,7 +503,7 @@ static int send_checkpoint_dirty_pfn_lis
         goto err;
     }
 
-    for ( i = 0, written = 0; i < ctx->restore.p2m_size; ++i )
+    for ( i = 0, written = 0; i < p2m_size; ++i )
     {
         if ( !test_bit(i, dirty_bitmap) )
             continue;
@@ -739,8 +747,10 @@ static int setup(struct xc_sr_context *c
 
     if ( ctx->stream_type == XC_STREAM_COLO )
     {
+        unsigned long pages = NRPAGES(bitmap_size(ctx->restore.p2m_size));
+
         dirty_bitmap = xc_hypercall_buffer_alloc_pages(
-            xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->restore.p2m_size)));
+            xch, dirty_bitmap, pages);
 
         if ( !dirty_bitmap )
         {
@@ -748,6 +758,8 @@ static int setup(struct xc_sr_context *c
             rc = -1;
             goto err;
         }
+
+        ctx->restore.p2m_size = pages << (PAGE_SHIFT + 3);
     }
 
     rc = ctx->restore.ops.setup(ctx);
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -74,6 +74,16 @@ static int write_checkpoint_record(struc
     return write_record(ctx, &checkpoint);
 }
 
+static void update_deferred_pages(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    if ( !ctx->save.deferred_pages )
+        return;
+    set_bit(pfn, ctx->save.deferred_pages);
+    ++ctx->save.nr_deferred_pages;
+    if ( pfn >= ctx->save.used_deferred_pages )
+        ctx->save.used_deferred_pages = pfn + 1;
+}
+
 /*
  * Writes a batch of memory as a PAGE_DATA record into the stream.  The batch
  * is constructed in ctx->save.batch_pfns.
@@ -130,11 +140,8 @@ static int write_batch(struct xc_sr_cont
                                                       ctx->save.batch_pfns[i]);
 
         /* Likely a ballooned page. */
-        if ( mfns[i] == INVALID_MFN && ctx->save.deferred_pages )
-        {
-            set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
-            ++ctx->save.nr_deferred_pages;
-        }
+        if ( mfns[i] == INVALID_MFN )
+            update_deferred_pages(ctx, ctx->save.batch_pfns[i]);
     }
 
     rc = xc_get_pfn_type_batch(xch, ctx->domid, nr_pfns, types);
@@ -196,12 +203,7 @@ static int write_batch(struct xc_sr_cont
             {
                 if ( rc == -1 && errno == EAGAIN )
                 {
-                    if ( ctx->save.deferred_pages )
-                    {
-                        set_bit(ctx->save.batch_pfns[i],
-                                ctx->save.deferred_pages);
-                        ++ctx->save.nr_deferred_pages;
-                    }
+                    update_deferred_pages(ctx, ctx->save.batch_pfns[i]);
                     types[i] = XEN_DOMCTL_PFINFO_XTAB;
                     --nr_pages;
                 }
@@ -369,7 +371,7 @@ static int suspend_domain(struct xc_sr_c
  * Send a subset of pages in the guests p2m, according to the dirty bitmap.
  * Used for each subsequent iteration of the live migration loop.
  *
- * Bitmap is bounded by p2m_size.
+ * Bitmap is bounded by p2m_alloc_size, but populated only up to p2m_used_size.
  */
 static int send_dirty_pages(struct xc_sr_context *ctx,
                             unsigned long entries, bool all_dirty)
@@ -381,7 +383,10 @@ static int send_dirty_pages(struct xc_sr
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                     &ctx->save.dirty_bitmap_hbuf);
 
-    for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
+    if ( all_dirty )
+        ctx->save.p2m_used_size = ctx->save.p2m_size;
+
+    for ( p = 0, written = 0; p < ctx->save.p2m_used_size; ++p )
     {
         if ( !all_dirty && !test_bit(p, dirty_bitmap) )
             continue;
@@ -526,6 +531,8 @@ static int send_memory_live(struct xc_sr
 
     for ( ; ; )
     {
+        long long ret;
+
         policy_decision = precopy_policy(*policy_stats, data);
         x++;
 
@@ -552,15 +559,23 @@ static int send_memory_live(struct xc_sr
         if ( policy_decision != XGS_POLICY_CONTINUE_PRECOPY )
             break;
 
-        if ( xc_logdirty_control(
-                 xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
-                 &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_size,
-                 0, &stats) != ctx->save.p2m_size )
+        ret = xc_logdirty_control(
+                  xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
+                  &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_alloc_size,
+                  0, &stats);
+        if ( ret < 0 )
         {
             PERROR("Failed to retrieve logdirty bitmap");
             rc = -1;
             goto out;
         }
+        if ( ret > ctx->save.p2m_alloc_size )
+        {
+            ERROR("Guest has grown its p2m too much");
+            rc = -1;
+            goto out;
+        }
+        ctx->save.p2m_used_size = ret;
 
         policy_stats->dirty_count = stats.dirty_count;
 
@@ -614,7 +629,7 @@ static int colo_merge_secondary_dirty_bi
     for ( i = 0; i < count; i++ )
     {
         pfn = pfns[i];
-        if ( pfn >= ctx->save.p2m_size )
+        if ( pfn >= ctx->save.p2m_alloc_size )
         {
             PERROR("Invalid pfn 0x%" PRIx64, pfn);
             rc = -1;
@@ -642,6 +657,7 @@ static int suspend_and_send_dirty(struct
     xc_shadow_op_stats_t stats;
     char *progress_str = NULL;
     int rc;
+    long long ret;
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                     &ctx->save.dirty_bitmap_hbuf);
 
@@ -649,16 +665,22 @@ static int suspend_and_send_dirty(struct
     if ( rc )
         goto out;
 
-    if ( xc_logdirty_control(
-             xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
-             HYPERCALL_BUFFER(dirty_bitmap), ctx->save.p2m_size,
-             XEN_DOMCTL_SHADOW_LOGDIRTY_FINAL, &stats) !=
-         ctx->save.p2m_size )
+    ret = xc_logdirty_control(
+              xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
+              HYPERCALL_BUFFER(dirty_bitmap), ctx->save.p2m_alloc_size,
+              XEN_DOMCTL_SHADOW_LOGDIRTY_FINAL, &stats);
+    if ( ret < 0 )
     {
         PERROR("Failed to retrieve logdirty bitmap");
         rc = -1;
         goto out;
     }
+    if ( ret > ctx->save.p2m_alloc_size )
+    {
+        ERROR("Guest has grown its p2m too much");
+        rc = -1;
+        goto out;
+    }
 
     if ( ctx->save.live )
     {
@@ -670,7 +692,8 @@ static int suspend_and_send_dirty(struct
         xc_set_progress_prefix(xch, "Checkpointed save");
 
     if ( ctx->save.deferred_pages )
-        bitmap_or(dirty_bitmap, ctx->save.deferred_pages, ctx->save.p2m_size);
+        bitmap_or(dirty_bitmap, ctx->save.deferred_pages, ctx->save.p2m_alloc_size);
+    ctx->save.p2m_used_size = MAX(ret, ctx->save.used_deferred_pages);
 
     if ( !ctx->save.live && ctx->stream_type == XC_STREAM_COLO )
     {
@@ -688,8 +711,9 @@ static int suspend_and_send_dirty(struct
         goto out;
 
     if ( ctx->save.deferred_pages )
-        bitmap_clear(ctx->save.deferred_pages, ctx->save.p2m_size);
+        bitmap_clear(ctx->save.deferred_pages, ctx->save.p2m_alloc_size);
     ctx->save.nr_deferred_pages = 0;
+    ctx->save.used_deferred_pages = 0;
 
  out:
     xc_set_progress_prefix(xch, NULL);
@@ -702,6 +726,7 @@ static int verify_frames(struct xc_sr_co
     xc_interface *xch = ctx->xch;
     xc_shadow_op_stats_t stats;
     int rc;
+    long long ret;
     struct xc_sr_record rec = { .type = REC_TYPE_VERIFY };
 
     DPRINTF("Enabling verify mode");
@@ -715,15 +740,18 @@ static int verify_frames(struct xc_sr_co
     if ( rc )
         goto out;
 
-    if ( xc_logdirty_control(
-             xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_PEEK,
-             &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_size,
-             0, &stats) != ctx->save.p2m_size )
+    ret = xc_logdirty_control(
+              xch, ctx->domid, XEN_DOMCTL_SHADOW_OP_PEEK,
+              &ctx->save.dirty_bitmap_hbuf, ctx->save.p2m_alloc_size,
+              0, &stats);
+    if ( ret < 0 )
     {
         PERROR("Failed to retrieve logdirty bitmap");
         rc = -1;
         goto out;
     }
+    if ( ret > ctx->save.p2m_alloc_size )
+        IPRINTF("Guest has grown its p2m too much");
 
     DPRINTF("  Further stats: faults %u, dirty %u",
             stats.fault_count, stats.dirty_count);
@@ -804,13 +832,14 @@ static int setup(struct xc_sr_context *c
 
     if ( ctx->save.live || ctx->stream_type != XC_STREAM_PLAIN )
     {
+        unsigned long pages = NRPAGES(bitmap_size(ctx->save.p2m_size));
         DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                         &ctx->save.dirty_bitmap_hbuf);
 
         dirty_bitmap =
-            xc_hypercall_buffer_alloc_pages(
-                xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
-        ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
+            xc_hypercall_buffer_alloc_pages(xch, dirty_bitmap, pages);
+        ctx->save.p2m_alloc_size = pages << (PAGE_SHIFT + 3);
+        ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_alloc_size);
 
         if ( !dirty_bitmap || !ctx->save.deferred_pages )
             goto enomem;
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -397,6 +397,19 @@ int paging_mfn_is_dirty(struct domain *d
     return rv;
 }
 
+/*
+ * This is used to provide a rough (upper) estimate to the caller of how many
+ * more pages we might have data for.
+ */
+static unsigned int last_valid_entry(const mfn_t *tbl, unsigned int idx) {
+    unsigned int last = LOGDIRTY_NODE_ENTRIES;
+
+    for ( ; idx < LOGDIRTY_NODE_ENTRIES; ++idx )
+        if ( mfn_valid(tbl[idx]) )
+            last = idx;
+
+    return last;
+}
 
 /* Read a domain's log-dirty bitmap and stats.  If the operation is a CLEAN,
  * clear the bitmap and stats as well. */
@@ -405,10 +418,10 @@ static int paging_log_dirty_op(struct do
                                bool_t resuming)
 {
     int rv = 0, clean = 0, peek = 1;
-    unsigned long pages = 0;
+    unsigned long pages = 0, extra = 0;
     mfn_t *l4 = NULL, *l3 = NULL, *l2 = NULL;
     unsigned long *l1 = NULL;
-    int i4, i3, i2;
+    unsigned int i4, i3, i2;
 
     if ( !resuming )
     {
@@ -479,6 +492,15 @@ static int paging_log_dirty_op(struct do
         l3 = (l4 && mfn_valid(l4[i4])) ? map_domain_page(l4[i4]) : NULL;
         for ( ; (pages < sc->pages) && (i3 < LOGDIRTY_NODE_ENTRIES); i3++ )
         {
+            unsigned long max_gfn = domain_get_maximum_gpfn(d);
+
+            if ( (i4 * LOGDIRTY_NODE_ENTRIES + i3) *
+                 LOGDIRTY_NODE_ENTRIES * PAGE_SIZE * 8 > max_gfn )
+            {
+                i4 = LOGDIRTY_NODE_ENTRIES;
+                break;
+            }
+
             l2 = ((l3 && mfn_valid(l3[i3])) ?
                   map_domain_page(l3[i3]) : NULL);
             for ( i2 = 0;
@@ -502,18 +524,36 @@ static int paging_log_dirty_op(struct do
                         goto out;
                     }
                 }
+
                 pages += bytes << 3;
+
                 if ( l1 )
                 {
+                    if ( unlikely(pages >= sc->pages) )
+                        extra = (PAGE_SIZE - bytes) << 3;
+
                     if ( clean )
                         clear_page(l1);
                     unmap_domain_page(l1);
                 }
             }
+
             if ( l2 )
+            {
+                if ( unlikely(pages >= sc->pages) )
+                {
+                    i2 = last_valid_entry(l2, i2);
+                    if ( i2 < LOGDIRTY_NODE_ENTRIES )
+                        extra = ((i4 * LOGDIRTY_NODE_ENTRIES + i3) *
+                                 LOGDIRTY_NODE_ENTRIES + i2 + 1) *
+                                PAGE_SIZE * 8;
+                }
+
                 unmap_domain_page(l2);
+            }
 
-            if ( i3 < LOGDIRTY_NODE_ENTRIES - 1 && hypercall_preempt_check() )
+            if ( pages < sc->pages && i3 < LOGDIRTY_NODE_ENTRIES - 1 &&
+                 hypercall_preempt_check() )
             {
                 d->arch.paging.preempt.log_dirty.i4 = i4;
                 d->arch.paging.preempt.log_dirty.i3 = i3 + 1;
@@ -521,10 +561,21 @@ static int paging_log_dirty_op(struct do
                 break;
             }
         }
+
         if ( l3 )
+        {
+            if ( !rv && unlikely(pages >= sc->pages) )
+            {
+                i3 = last_valid_entry(l3, i3);
+                if ( i3 < LOGDIRTY_NODE_ENTRIES )
+                    extra = (i4 * LOGDIRTY_NODE_ENTRIES + i3 + 1) *
+                            LOGDIRTY_NODE_ENTRIES * PAGE_SIZE * 8;
+            }
+
             unmap_domain_page(l3);
+        }
 
-        if ( !rv && i4 < LOGDIRTY_NODE_ENTRIES - 1 &&
+        if ( !rv && pages < sc->pages && i4 < LOGDIRTY_NODE_ENTRIES - 1 &&
              hypercall_preempt_check() )
         {
             d->arch.paging.preempt.log_dirty.i4 = i4 + 1;
@@ -534,8 +585,19 @@ static int paging_log_dirty_op(struct do
         if ( rv )
             break;
     }
+
     if ( l4 )
+    {
+        if ( !rv && unlikely(pages >= sc->pages) )
+        {
+            i4 = last_valid_entry(l4, i4);
+            if ( i4 < LOGDIRTY_NODE_ENTRIES )
+                extra = (i4 + 1) * LOGDIRTY_NODE_ENTRIES *
+                        LOGDIRTY_NODE_ENTRIES * PAGE_SIZE * 8;
+        }
+
         unmap_domain_page(l4);
+    }
 
     if ( !rv )
     {
@@ -562,8 +624,8 @@ static int paging_log_dirty_op(struct do
         return rv;
     }
 
-    if ( pages < sc->pages )
-        sc->pages = pages;
+    sc->pages = min(pages + extra, domain_get_maximum_gpfn(d) + 1);
+
     if ( clean )
     {
         /* We need to further call clean_dirty_bitmap() functions of specific
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -261,7 +261,8 @@ struct xen_domctl_shadow_op {
 
     /* OP_PEEK / OP_CLEAN */
     XEN_GUEST_HANDLE_64(uint8) dirty_bitmap;
-    uint64_aligned_t pages; /* Size of buffer. Updated with actual size. */
+    uint64_aligned_t pages; /* Size of buffer. Updated with actual (or
+                               potentially needed) size. */
     struct xen_domctl_shadow_op_stats stats;
 };
 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 12/13] x86/mm: update log-dirty bitmap when manipulating P2M
  2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
                   ` (10 preceding siblings ...)
  2021-07-05 15:17 ` [PATCH v2 11/13] x86/paging: supply more useful log-dirty page count Jan Beulich
@ 2021-07-05 15:18 ` Jan Beulich
  2021-07-05 15:18 ` [PATCH v2 13/13] SUPPORT.md: write down restriction of 32-bit tool stacks Jan Beulich
  2021-07-19  7:46 ` Ping: [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
  13 siblings, 0 replies; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:18 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, George Dunlap

Just like for PV guests MMU_MACHPHYS_UPDATE implies marking of the
respective page as dirty, additions to a HVM guest's P2M should do so.

For HVM the opposite is also true: Pages being removed from the P2M are
no longer dirty at their prior GFN; there's no point in telling the tool
stack to try and copy that page, when this will fail anyway (until
perhaps a new page gets placed there). Introduce paging_mark_pfn_clean()
(intentionally without a paging_mark_clean() counterpart) to handle
this. Note that while there is an earlier call to set_gpfn_from_mfn() in
guest_physmap_add_entry(), but there's little reason to mark the page
clean there when later in the function it'll be marked dirty. This is
even more so given that at this point it's only the M2P that gets
updated, with the P2M still left unchanged.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
guest_physmap_add_entry()'s error handling looks bogus in this regard
anyway: If an error occurs before an MFN actually gets assciated with
the new GFN, the M2P entry ought to be restored imo. But of course a
guest is still hosed if the operation succeeds partially.

Note that I've not even checked mem-paging and mem-sharing code for
whether they may need similar adjustment. At least the latters is, aiui,
incompatible with log-dirty mode anyway.

--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -818,7 +818,10 @@ p2m_remove_page(struct p2m_domain *p2m,
         {
             p2m->get_entry(p2m, gfn_add(gfn, i), &t, &a, 0, NULL, NULL);
             if ( !p2m_is_grant(t) && !p2m_is_shared(t) && !p2m_is_foreign(t) )
+            {
                 set_gpfn_from_mfn(mfn_x(mfn) + i, INVALID_M2P_ENTRY);
+                paging_mark_pfn_clean(p2m->domain, _pfn(gfn_x(gfn) + i));
+            }
         }
     }
 
@@ -1027,8 +1030,11 @@ guest_physmap_add_entry(struct domain *d
         if ( !p2m_is_grant(t) )
         {
             for ( i = 0; i < (1UL << page_order); i++ )
+            {
                 set_gpfn_from_mfn(mfn_x(mfn_add(mfn, i)),
                                   gfn_x(gfn_add(gfn, i)));
+                paging_mark_pfn_dirty(d, _pfn(gfn_x(gfn) + i));
+            }
         }
     }
 
@@ -1314,6 +1320,7 @@ static int set_typed_p2m_entry(struct do
         {
             ASSERT(mfn_valid(mfn_add(omfn, i)));
             set_gpfn_from_mfn(mfn_x(omfn) + i, INVALID_M2P_ENTRY);
+            paging_mark_pfn_clean(d, _pfn(gfn_x(gfn) + i));
         }
 
         ioreq_request_mapcache_invalidate(d);
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -645,7 +645,10 @@ p2m_pod_decrease_reservation(struct doma
             }
             p2m_tlb_flush_sync(p2m);
             for ( j = 0; j < n; ++j )
+            {
                 set_gpfn_from_mfn(mfn_x(mfn), INVALID_M2P_ENTRY);
+                paging_mark_pfn_clean(d, _pfn(gfn_x(gfn) + i + j));
+            }
             p2m_pod_cache_add(p2m, page, cur_order);
 
             ioreq_request_mapcache_invalidate(d);
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -259,7 +259,7 @@ static int paging_log_dirty_disable(stru
 }
 
 /* Mark a page as dirty, with taking guest pfn as parameter */
-void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
+static void mark_pfn_dirty(struct domain *d, pfn_t pfn, bool dirty)
 {
     bool changed;
     mfn_t mfn, *l4, *l3, *l2;
@@ -290,14 +290,15 @@ void paging_mark_pfn_dirty(struct domain
 
     if ( unlikely(!mfn_valid(d->arch.paging.log_dirty.top)) ) 
     {
-         d->arch.paging.log_dirty.top = paging_new_log_dirty_node(d);
+         if ( dirty )
+             d->arch.paging.log_dirty.top = paging_new_log_dirty_node(d);
          if ( unlikely(!mfn_valid(d->arch.paging.log_dirty.top)) )
              goto out;
     }
 
     l4 = paging_map_log_dirty_bitmap(d);
     mfn = l4[i4];
-    if ( !mfn_valid(mfn) )
+    if ( !mfn_valid(mfn) && dirty )
         l4[i4] = mfn = paging_new_log_dirty_node(d);
     unmap_domain_page(l4);
     if ( !mfn_valid(mfn) )
@@ -305,7 +306,7 @@ void paging_mark_pfn_dirty(struct domain
 
     l3 = map_domain_page(mfn);
     mfn = l3[i3];
-    if ( !mfn_valid(mfn) )
+    if ( !mfn_valid(mfn) && dirty )
         l3[i3] = mfn = paging_new_log_dirty_node(d);
     unmap_domain_page(l3);
     if ( !mfn_valid(mfn) )
@@ -313,21 +314,22 @@ void paging_mark_pfn_dirty(struct domain
 
     l2 = map_domain_page(mfn);
     mfn = l2[i2];
-    if ( !mfn_valid(mfn) )
+    if ( !mfn_valid(mfn) && dirty )
         l2[i2] = mfn = paging_new_log_dirty_leaf(d);
     unmap_domain_page(l2);
     if ( !mfn_valid(mfn) )
         goto out;
 
     l1 = map_domain_page(mfn);
-    changed = !__test_and_set_bit(i1, l1);
+    changed = dirty ? !__test_and_set_bit(i1, l1)
+                    : __test_and_clear_bit(i1, l1);
     unmap_domain_page(l1);
     if ( changed )
     {
         PAGING_DEBUG(LOGDIRTY,
-                     "d%d: marked mfn %" PRI_mfn " (pfn %" PRI_pfn ")\n",
-                     d->domain_id, mfn_x(mfn), pfn_x(pfn));
-        d->arch.paging.log_dirty.dirty_count++;
+                     "%pd: marked mfn %" PRI_mfn " (pfn %" PRI_pfn ") %s\n",
+                     d, mfn_x(mfn), pfn_x(pfn), dirty ? "dirty" : "clean");
+        d->arch.paging.log_dirty.dirty_count += dirty ? 1 : -1;
     }
 
 out:
@@ -336,6 +338,16 @@ out:
     return;
 }
 
+void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
+{
+    mark_pfn_dirty(d, pfn, true);
+}
+
+void paging_mark_pfn_clean(struct domain *d, pfn_t pfn)
+{
+    mark_pfn_dirty(d, pfn, false);
+}
+
 /* Mark a page as dirty */
 void paging_mark_dirty(struct domain *d, mfn_t gmfn)
 {
@@ -348,7 +360,7 @@ void paging_mark_dirty(struct domain *d,
     /* We /really/ mean PFN here, even for non-translated guests. */
     pfn = _pfn(get_gpfn_from_mfn(mfn_x(gmfn)));
 
-    paging_mark_pfn_dirty(d, pfn);
+    mark_pfn_dirty(d, pfn, true);
 }
 
 
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -170,8 +170,9 @@ void paging_log_dirty_init(struct domain
 
 /* mark a page as dirty */
 void paging_mark_dirty(struct domain *d, mfn_t gmfn);
-/* mark a page as dirty with taking guest pfn as parameter */
+/* mark a page as dirty/clean with taking guest pfn as parameter */
 void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn);
+void paging_mark_pfn_clean(struct domain *d, pfn_t pfn);
 
 /* is this guest page dirty? 
  * This is called from inside paging code, with the paging lock held. */



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 13/13] SUPPORT.md: write down restriction of 32-bit tool stacks
  2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
                   ` (11 preceding siblings ...)
  2021-07-05 15:18 ` [PATCH v2 12/13] x86/mm: update log-dirty bitmap when manipulating P2M Jan Beulich
@ 2021-07-05 15:18 ` Jan Beulich
  2021-07-14 18:16   ` Julien Grall
  2021-07-19  7:46 ` Ping: [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
  13 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:18 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monné,
	Andrew Cooper, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Wei Liu

Let's try to avoid giving the impression that 32-bit tool stacks are as
capable as 64-bit ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Wording adjustments as per review discussion.

--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -131,6 +131,12 @@ ARM only has one guest type at the momen
 
 ## Toolstack
 
+While 32-bit builds of the tool stack are generally supported, restrictions
+apply in particular when running on top of a 64-bit hypervisor.  For example,
+very large guests aren't expected to be manageable in this case.  This includes
+guests giving the appearance of being large, by altering their own memory
+layouts.
+
 ### xl
 
     Status: Supported



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-07-05 15:13 ` [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow Jan Beulich
@ 2021-07-05 15:41   ` Andrew Cooper
  2021-07-05 15:53     ` Jan Beulich
  2021-07-05 17:26   ` Olaf Hering
  2021-08-19 10:20   ` Juergen Gross
  2 siblings, 1 reply; 44+ messages in thread
From: Andrew Cooper @ 2021-07-05 15:41 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Wei Liu, Roger Pau Monné, Juergen Gross, George Dunlap, Ian Jackson

On 05/07/2021 16:13, Jan Beulich wrote:
> In send_memory_live() the precise value the dirty_count struct field
> gets initialized to doesn't matter much (apart from the triggering of
> the log message in send_dirty_pages(), see below), but it is important
> that it not be zero on the first iteration (or else send_dirty_pages()
> won't get called at all). Saturate the initializer value at the maximum
> value the field can hold.

I've already explained why this is broken...

> While there also initialize struct precopy_stats' respective field to a
> more sane value: We don't really know how many dirty pages there are at
> that point.
>
> In suspend_and_send_dirty() and verify_frames() the local variables
> don't need initializing at all, as they're only an output from the
> hypercall which gets invoked first thing.
>
> In send_checkpoint_dirty_pfn_list() the local variable can be dropped
> altogether: It's optional to xc_logdirty_control() and not used anywhere
> else.

... and why this is broken particularly in the context of a later
change, and ...

>
> Note that in case the clipping actually takes effect, the "Bitmap
> contained more entries than expected..." log message will trigger. This
> being just an informational message, I don't think this is overly
> concerning.

... why this isn't ok.

Why do I bother wasting my time reviewing patches in the first place.

~Andrew



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 10/13] libxc: simplify HYPERCALL_BUFFER()
  2021-07-05 15:16 ` [PATCH v2 10/13] libxc: simplify HYPERCALL_BUFFER() Jan Beulich
@ 2021-07-05 15:47   ` Andrew Cooper
  0 siblings, 0 replies; 44+ messages in thread
From: Andrew Cooper @ 2021-07-05 15:47 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Juergen Gross, George Dunlap, Ian Jackson

On 05/07/2021 16:16, Jan Beulich wrote:
> _hcbuf_buf1 has been there only for a pointer comparison to validate
> type compatibility. The same can be achieved by not using typeof() on
> the definition of what so far was _hcbuf_buf2, as the initializer has
> to also be type-compatible. Drop _hcbuf_buf1 and the comaprison;
> rename _hcbuf_buf2.
>
> Since we're already using compiler extensions here, don't be shy and
> also omit the middle operand of the involved ?: operator.
>
> Bring line continuation character placement in line with that of
> related macros.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-07-05 15:41   ` Andrew Cooper
@ 2021-07-05 15:53     ` Jan Beulich
  0 siblings, 0 replies; 44+ messages in thread
From: Jan Beulich @ 2021-07-05 15:53 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson, xen-devel

On 05.07.2021 17:41, Andrew Cooper wrote:
> On 05/07/2021 16:13, Jan Beulich wrote:
>> In send_memory_live() the precise value the dirty_count struct field
>> gets initialized to doesn't matter much (apart from the triggering of
>> the log message in send_dirty_pages(), see below), but it is important
>> that it not be zero on the first iteration (or else send_dirty_pages()
>> won't get called at all). Saturate the initializer value at the maximum
>> value the field can hold.
> 
> I've already explained why this is broken...
> 
>> While there also initialize struct precopy_stats' respective field to a
>> more sane value: We don't really know how many dirty pages there are at
>> that point.
>>
>> In suspend_and_send_dirty() and verify_frames() the local variables
>> don't need initializing at all, as they're only an output from the
>> hypercall which gets invoked first thing.
>>
>> In send_checkpoint_dirty_pfn_list() the local variable can be dropped
>> altogether: It's optional to xc_logdirty_control() and not used anywhere
>> else.
> 
> ... and why this is broken particularly in the context of a later
> change, and ...
> 
>>
>> Note that in case the clipping actually takes effect, the "Bitmap
>> contained more entries than expected..." log message will trigger. This
>> being just an informational message, I don't think this is overly
>> concerning.
> 
> ... why this isn't ok.
> 
> Why do I bother wasting my time reviewing patches in the first place.

I'm sorry Andrew, but no. I could as well reply "Why do I bother
replying to your review comments?" You did say

"I don't follow.  Migration would be extremely broken if the first
 iteration didn't work correctly, so something else is going on here."

which I replied to:

"As per the title we're talking about overflow situation here. In particular
 the field did end up zero when ctx->save.p2m_size was 0x100000000. I'm not
 claiming in any way that the first iteration would generally not work."

and then nothing else came back from you. I gave it a couple of
days, taking silence as indication that my reply was satisfactory.

Similarly e.g. for you saying

"This is an external interface, and I'm not sure it will tolerate finding
 more than p2m_size allegedly dirty."

with my reply

"But you do realize that a few lines down from here there already was

        policy_stats->dirty_count   = -1;

 ? Or are you trying to tell me that -1 (documented as indicating
 "unknown") is okay on subsequent iterations, but not on the first one?
 That's not being said anywhere ..."

If you expect me to adjust patches rather then just replying verbally,
you won't get around replying back when you get verbal feedback.

Jan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-07-05 15:13 ` [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow Jan Beulich
  2021-07-05 15:41   ` Andrew Cooper
@ 2021-07-05 17:26   ` Olaf Hering
  2021-07-06  6:39     ` Jan Beulich
  2021-08-19 10:20   ` Juergen Gross
  2 siblings, 1 reply; 44+ messages in thread
From: Olaf Hering @ 2021-07-05 17:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

[-- Attachment #1: Type: text/plain, Size: 632 bytes --]

Am Mon, 5 Jul 2021 17:13:28 +0200
schrieb Jan Beulich <jbeulich@suse.com>:

>      ctx->save.stats = (struct precopy_stats){
> -        .dirty_count = ctx->save.p2m_size,
> +        .dirty_count = -1,
>      };

As said earlier, a consumer of these data may now be unable to initialize itself properly. Without the patch it would be able to size its private data structures properly to p2m_size. With the patch it can not know in advance what the upper limit might be.

There is no in-tree consumer that is affected, and I do not have an out-of-tree consumer that might be broken by this change.

Just saying..

Olaf

[-- Attachment #2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-07-05 17:26   ` Olaf Hering
@ 2021-07-06  6:39     ` Jan Beulich
  2021-07-06  6:46       ` Olaf Hering
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2021-07-06  6:39 UTC (permalink / raw)
  To: Olaf Hering
  Cc: xen-devel, Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

On 05.07.2021 19:26, Olaf Hering wrote:
> Am Mon, 5 Jul 2021 17:13:28 +0200
> schrieb Jan Beulich <jbeulich@suse.com>:
> 
>>      ctx->save.stats = (struct precopy_stats){
>> -        .dirty_count = ctx->save.p2m_size,
>> +        .dirty_count = -1,
>>      };
> 
> As said earlier, a consumer of these data may now be unable to initialize itself properly. Without the patch it would be able to size its private data structures properly to p2m_size. With the patch it can not know in advance what the upper limit might be.

And as said before, besides the intention of this being usable for
sizing purposes not being spelled out anywhere (instead it is
explicitly documented that -1 means "unknown", and the value really
_is_ unknown at this point; p2m_size is only a wrongly assumed
upper bound), this is useless information as the size may change in
the course of migration. It is a present limitation that this isn't
handled properly, which I think would better not be baked into
further places.

> There is no in-tree consumer that is affected, and I do not have an out-of-tree consumer that might be broken by this change.
> 
> Just saying..

Sure, but (as is the case for Andrew's replies) if you want me to
change the patch, you need to not just "say" something, but provide
convincing arguments. So far I've provided counter arguments which
weren't proven wrong or at least inapplicable.

Jan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-07-06  6:39     ` Jan Beulich
@ 2021-07-06  6:46       ` Olaf Hering
  2021-07-06  6:58         ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Olaf Hering @ 2021-07-06  6:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

Am Tue, 6 Jul 2021 08:39:21 +0200
schrieb Jan Beulich <jbeulich@suse.com>:

> the size may change in the course of migration

How can the p2m_size change? This upper limit is queried once, then all loops take this as loop counter.

Olaf


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-07-06  6:46       ` Olaf Hering
@ 2021-07-06  6:58         ` Jan Beulich
  2021-07-06  7:15           ` Olaf Hering
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2021-07-06  6:58 UTC (permalink / raw)
  To: Olaf Hering
  Cc: xen-devel, Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

On 06.07.2021 08:46, Olaf Hering wrote:
> Am Tue, 6 Jul 2021 08:39:21 +0200
> schrieb Jan Beulich <jbeulich@suse.com>:
> 
>> the size may change in the course of migration
> 
> How can the p2m_size change? This upper limit is queried once, then
> all loops take this as loop counter.

Well, that's part of the current limitation. The size would need re-querying
before every iteration (or at the very least before the last one, after the
domain has been paused).

Jan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-07-06  6:58         ` Jan Beulich
@ 2021-07-06  7:15           ` Olaf Hering
  0 siblings, 0 replies; 44+ messages in thread
From: Olaf Hering @ 2021-07-06  7:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

Am Tue, 6 Jul 2021 08:58:59 +0200
schrieb Jan Beulich <jbeulich@suse.com>:

> Well, that's part of the current limitation. The size would need re-querying
> before every iteration (or at the very least before the last one, after the
> domain has been paused).

Ah, you even wrote that.

In addition each iteration should also query the state of the domU.
Right now a shutdown is not handled and the dead domU is detected only at the very end.


Olaf


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] SUPPORT.md: write down restriction of 32-bit tool stacks
  2021-07-05 15:18 ` [PATCH v2 13/13] SUPPORT.md: write down restriction of 32-bit tool stacks Jan Beulich
@ 2021-07-14 18:16   ` Julien Grall
  2021-07-15  6:38     ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Julien Grall @ 2021-07-14 18:16 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Roger Pau Monné,
	Andrew Cooper, George Dunlap, Ian Jackson, Stefano Stabellini,
	Wei Liu

Hi Jan,

On 05/07/2021 16:18, Jan Beulich wrote:
> Let's try to avoid giving the impression that 32-bit tool stacks are as
> capable as 64-bit ones.

Would you be able to provide a few examples of the known issues in the 
commit message? This would be helpful for anyone to understand why we 
decided to drop the support.

At least on Arm, we tried to design the hypercall ABI in such a way that 
it should be possible to use a 32-bit toolstack.

That said, I am not aware of anyone using the 32-bit ABI on 64-bit Arm 
hypervisor. So dropping the support should be fine.

Cheers,

> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Wording adjustments as per review discussion.
> 
> --- a/SUPPORT.md
> +++ b/SUPPORT.md
> @@ -131,6 +131,12 @@ ARM only has one guest type at the momen
>   
>   ## Toolstack
>   
> +While 32-bit builds of the tool stack are generally supported, restrictions
> +apply in particular when running on top of a 64-bit hypervisor.  For example,
> +very large guests aren't expected to be manageable in this case.  This includes
> +guests giving the appearance of being large, by altering their own memory
> +layouts.
> +
>   ### xl
>   
>       Status: Supported
> 

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] SUPPORT.md: write down restriction of 32-bit tool stacks
  2021-07-14 18:16   ` Julien Grall
@ 2021-07-15  6:38     ` Jan Beulich
  2021-07-15  9:05       ` Julien Grall
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2021-07-15  6:38 UTC (permalink / raw)
  To: Julien Grall
  Cc: Roger Pau Monné,
	Andrew Cooper, George Dunlap, Ian Jackson, Stefano Stabellini,
	Wei Liu, xen-devel

On 14.07.2021 20:16, Julien Grall wrote:
> On 05/07/2021 16:18, Jan Beulich wrote:
>> Let's try to avoid giving the impression that 32-bit tool stacks are as
>> capable as 64-bit ones.
> 
> Would you be able to provide a few examples of the known issues in the 
> commit message? This would be helpful for anyone to understand why we 
> decided to drop the support.

Not sure how useful this is going to be. This would be pointing at the
declarations / definitions of various tool stack internal variables or
structure fields. Which also is why ...

> At least on Arm, we tried to design the hypercall ABI in such a way that 
> it should be possible to use a 32-bit toolstack.

... keeping the ABI tidy in this regard didn't help at all (albeit it
of course was a prereq to writing a tool stack that would be capable).

Jan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] SUPPORT.md: write down restriction of 32-bit tool stacks
  2021-07-15  6:38     ` Jan Beulich
@ 2021-07-15  9:05       ` Julien Grall
  2021-07-15 11:36         ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Julien Grall @ 2021-07-15  9:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Andrew Cooper, George Dunlap, Ian Jackson, Stefano Stabellini,
	Wei Liu, xen-devel

Hi Jan,

On 15/07/2021 07:38, Jan Beulich wrote:
> On 14.07.2021 20:16, Julien Grall wrote:
>> On 05/07/2021 16:18, Jan Beulich wrote:
>>> Let's try to avoid giving the impression that 32-bit tool stacks are as
>>> capable as 64-bit ones.
>>
>> Would you be able to provide a few examples of the known issues in the
>> commit message? This would be helpful for anyone to understand why we
>> decided to drop the support.
> 
> Not sure how useful this is going to be.

It would at least be useful to me, so I can make an informed decision. I 
suspect it would also be for anyone reading it in the future. This is 
rather frustrating to find commit message with barely any rationale and 
no-one remembering why this was done...

I vaguely recall a discussion about 64-bit hypercall ([1]). I assume the 
decision to drop support is related to it, but I have no way to prove it 
from the commit message.

It is also not clear why adding the restriction is the way to go...

> This would be pointing at the
> declarations / definitions of various tool stack internal variables or
> structure fields. Which also is why ...

... is this because such issues are too widespread in libxc/libxl to fix 
it in long term?

> 
>> At least on Arm, we tried to design the hypercall ABI in such a way that
>> it should be possible to use a 32-bit toolstack.
> 
> ... keeping the ABI tidy in this regard didn't help at all (albeit it
> of course was a prereq to writing a tool stack that would be capable).
> 
> Jan
> 

[1] 
https://lore.kernel.org/xen-devel/71b8a4f1-9c18-36e7-56b1-3f1b1dabddd6@suse.com/

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] SUPPORT.md: write down restriction of 32-bit tool stacks
  2021-07-15  9:05       ` Julien Grall
@ 2021-07-15 11:36         ` Jan Beulich
  2021-07-16  7:50           ` Julien Grall
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2021-07-15 11:36 UTC (permalink / raw)
  To: Julien Grall
  Cc: Roger Pau Monné,
	Andrew Cooper, George Dunlap, Ian Jackson, Stefano Stabellini,
	Wei Liu, xen-devel

On 15.07.2021 11:05, Julien Grall wrote:
> On 15/07/2021 07:38, Jan Beulich wrote:
>> On 14.07.2021 20:16, Julien Grall wrote:
>>> On 05/07/2021 16:18, Jan Beulich wrote:
>>>> Let's try to avoid giving the impression that 32-bit tool stacks are as
>>>> capable as 64-bit ones.
>>>
>>> Would you be able to provide a few examples of the known issues in the
>>> commit message? This would be helpful for anyone to understand why we
>>> decided to drop the support.
>>
>> Not sure how useful this is going to be.
> 
> It would at least be useful to me, so I can make an informed decision. I 
> suspect it would also be for anyone reading it in the future. This is 
> rather frustrating to find commit message with barely any rationale and 
> no-one remembering why this was done...

Well, I've added "There are a number of cases there where 32-bit
types are used to hold e.g. frame numbers." Not sure whether you
consider this sufficient.

Problematic code may be primarily in areas Arm doesn't
care about (yet), like PCI pass-through or migration. But see e.g.
- xc_map_foreign_range()'s "mfn" and "size" parameters,
- xc_maximum_ram_page()'s "max_mfn" parameter,
- libxl_dom.c:hvm_build_set_params()'s "store_mfn" and "console_mfn"
  parameters,
- xs_introduce_domain()'s "mfn" parameter,
and quite a few more in particular in libxenguest.

And then there are also subtle oddities like xc_set_mem_access_multi()
having

    xen_mem_access_op_t mao =
    {
        .op       = XENMEM_access_op_set_access_multi,
        .domid    = domain_id,
        .access   = XENMEM_access_default + 1, /* Invalid value */
        .pfn      = ~0UL, /* Invalid GFN */
        .nr       = nr,
    };

Clearly ~0UL won't have the intended effect even for 32-bit guests,
when the field is uint64_t (we get away here because for whatever
reason the hypervisor doesn't check that the field indeed is ~0UL).
But I wouldn't be surprised to find uses where there would be a
difference. One of the main aspects certainly is ...

> I vaguely recall a discussion about 64-bit hypercall ([1]). I assume the 
> decision to drop support is related to it, but I have no way to prove it 
> from the commit message.

... this. Some XENMEM_* may return 64-bit values, yet the hypercall
interface is limited to "long" return types. Not even the multicall
approach taken to work around the restriction to "int" would help
here for x86-32, as struct multicall_entry also uses xen_ulong_t
for its "result" field.

> It is also not clear why adding the restriction is the way to go...
> 
>> This would be pointing at the
>> declarations / definitions of various tool stack internal variables or
>> structure fields. Which also is why ...
> 
> ... is this because such issues are too widespread in libxc/libxl to fix 
> it in long term?

Fixing is an option, but until it gets fixed (if anyone really cared
to do so), spelling out the restriction looks to be an appropriate
step to me (or else I wouldn't have followed the request and created
this patch). Once suitably audited, fixed, and tested, I wouldn't see
a reason not to remove the restriction again.

Jan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 13/13] SUPPORT.md: write down restriction of 32-bit tool stacks
  2021-07-15 11:36         ` Jan Beulich
@ 2021-07-16  7:50           ` Julien Grall
  0 siblings, 0 replies; 44+ messages in thread
From: Julien Grall @ 2021-07-16  7:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Andrew Cooper, George Dunlap, Ian Jackson, Stefano Stabellini,
	Wei Liu, xen-devel

Hi Jan,

On 15/07/2021 12:36, Jan Beulich wrote:
> On 15.07.2021 11:05, Julien Grall wrote:
>> On 15/07/2021 07:38, Jan Beulich wrote:
>>> On 14.07.2021 20:16, Julien Grall wrote:
>>>> On 05/07/2021 16:18, Jan Beulich wrote:
>>>>> Let's try to avoid giving the impression that 32-bit tool stacks are as
>>>>> capable as 64-bit ones.
>>>>
>>>> Would you be able to provide a few examples of the known issues in the
>>>> commit message? This would be helpful for anyone to understand why we
>>>> decided to drop the support.
>>>
>>> Not sure how useful this is going to be.
>>
>> It would at least be useful to me, so I can make an informed decision. I
>> suspect it would also be for anyone reading it in the future. This is
>> rather frustrating to find commit message with barely any rationale and
>> no-one remembering why this was done...
> 
> Well, I've added "There are a number of cases there where 32-bit
> types are used to hold e.g. frame numbers." Not sure whether you
> consider this sufficient.

That's good enough for me in the commit message.

> 
> Problematic code may be primarily in areas Arm doesn't
> care about (yet), like PCI pass-through or migration. But see e.g.
> - xc_map_foreign_range()'s "mfn" and "size" parameters,
> - xc_maximum_ram_page()'s "max_mfn" parameter,
> - libxl_dom.c:hvm_build_set_params()'s "store_mfn" and "console_mfn"
>    parameters,
> - xs_introduce_domain()'s "mfn" parameter,
> and quite a few more in particular in libxenguest.

That's quite a few :/. Thanks for listing them on the ML, they are 
useful to have them log on the ML.

> 
> And then there are also subtle oddities like xc_set_mem_access_multi()
> having
> 
>      xen_mem_access_op_t mao =
>      {
>          .op       = XENMEM_access_op_set_access_multi,
>          .domid    = domain_id,
>          .access   = XENMEM_access_default + 1, /* Invalid value */
>          .pfn      = ~0UL, /* Invalid GFN */
>          .nr       = nr,
>      };
 >
> Clearly ~0UL won't have the intended effect even for 32-bit guests,
> when the field is uint64_t (we get away here because for whatever
> reason the hypervisor doesn't check that the field indeed is ~0UL).
> But I wouldn't be surprised to find uses where there would be a
> difference. One of the main aspects certainly is ...

Whoops :). One topic that came up on my series to drop the M2P helpers 
on Arm is the lack of a suitable define for invalid GFN.

However, at the first look, it didn't seem to be easy to introduce 
because the GFN is sometime stored in a 64-bit and other time in 
xen_ulong_t.

> 
>> I vaguely recall a discussion about 64-bit hypercall ([1]). I assume the
>> decision to drop support is related to it, but I have no way to prove it
>> from the commit message.
> 
> ... this. Some XENMEM_* may return 64-bit values, yet the hypercall
> interface is limited to "long" return types. Not even the multicall
> approach taken to work around the restriction to "int" would help
> here for x86-32, as struct multicall_entry also uses xen_ulong_t
> for its "result" field.
> 
>> It is also not clear why adding the restriction is the way to go...
>>
>>> This would be pointing at the
>>> declarations / definitions of various tool stack internal variables or
>>> structure fields. Which also is why ...
>>
>> ... is this because such issues are too widespread in libxc/libxl to fix
>> it in long term?
> 
> Fixing is an option, but until it gets fixed (if anyone really cared
> to do so), spelling out the restriction looks to be an appropriate
> step to me (or else I wouldn't have followed the request and created
> this patch). Once suitably audited, fixed, and tested, I wouldn't see
> a reason not to remove the restriction again.

Agreed. I was mostly wondering whether this was a matter of couple 
patches and could be restricted to maybe libxl (IOW toolstack based on 
libxc may not be affected). But from what you wrote, the issue is quite 
widespread.

Anyway, this is enough to convince me that dropping support (until it is 
fixed) for 32-bit toolstack on 64-bit hypervisor.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 01/13] libxl/x86: check return value of SHADOW_OP_SET_ALLOCATION domctl
  2021-07-05 15:11 ` [PATCH v2 01/13] libxl/x86: check return value of SHADOW_OP_SET_ALLOCATION domctl Jan Beulich
@ 2021-07-16 13:03   ` Anthony PERARD
  0 siblings, 0 replies; 44+ messages in thread
From: Anthony PERARD @ 2021-07-16 13:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Wei Liu, Roger Pau Monné,
	Juergen Gross, George Dunlap, Ian Jackson

On Mon, Jul 05, 2021 at 05:11:31PM +0200, Jan Beulich wrote:
> The hypervisor may not have enough memory to satisfy the request. While
> there, make the unit of the value clear by renaming the local variable.
> 
> Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> v2: Rename local variables. Also log requested value. Don't log errno
>     explicitly in LOGED() (nor function return value). Integrate into
>     series.

Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>

Thanks,

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Ping: [PATCH v2 00/13] x86: more or less log-dirty related improvements
  2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
                   ` (12 preceding siblings ...)
  2021-07-05 15:18 ` [PATCH v2 13/13] SUPPORT.md: write down restriction of 32-bit tool stacks Jan Beulich
@ 2021-07-19  7:46 ` Jan Beulich
  2021-08-13  9:24   ` Jan Beulich
  2021-08-20  7:20   ` Jan Beulich
  13 siblings, 2 replies; 44+ messages in thread
From: Jan Beulich @ 2021-07-19  7:46 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu, Roger Pau Monné, Ian Jackson
  Cc: Juergen Gross, George Dunlap, xen-devel

On 05.07.2021 17:09, Jan Beulich wrote:
> ... or so I hope. This series continues the attempt to deal with
> the ovmf change putting the shared info page at a very high address
> (which is now planned to get reverted there, but the general
> problem doesn't go away by them doing so). There are further issues
> with truncated value, which are being dealt with here. But there
> are also not directly related changes, when I simply spotted things
> that aren't very likely to be right the way they are. And then
> there are also adjustments to the underlying hypervisor
> implementation, with the goal of making the returned data more
> useful to the consumers.
> 
> With these changes in place, a 1Gb guest which has "inflated"
> itself by putting a page right below the 16Tb boundary migrates
> successfully, albeit the process takes from some 20 minutes to over
> half an hour on my test system.
> 
> In v2, besides integrating 2 patches that were previously sent,
> there's one new patch and otherwise review feedback addressed
> (albeit there wasn't any for a number of patches).
> 
> 01: libxl/x86: check return value of SHADOW_OP_SET_ALLOCATION domctl

while I did get an R-b from Anthony on this one, but ...

> 02: libxc: split xc_logdirty_control() from xc_shadow_control()
> 03: libxenguest: deal with log-dirty op stats overflow
> 04: libxenguest: short-circuit "all-dirty" handling
> 05: libxenguest: avoid allocating unused deferred-pages bitmap
> 06: libxenguest: complete loops in xc_map_domain_meminfo()
> 07: libxenguest: guard against overflow from too large p2m when checkpointing
> 08: libxenguest: fix off-by-1 in colo-secondary-bitmap merging
> 09: libxenguest: restrict PV guest size
> 10: libxc: simplify HYPERCALL_BUFFER()
> 11: x86/paging: supply more useful log-dirty page count
> 12: x86/mm: update log-dirty bitmap when manipulating P2M

... all of these are still in needed of suitable acks (patches 8
and 10 have an R-b though, and are independent of earlier parts of
this series). Patches 3 and 5 have objections pending by Andrew,
which I did reply to verbally without it having become clear
whether these replies were addressing the concerns, or what exactly
the misunderstanding on either side is (and hence which, if any,
changes I should make).

Thanks, Jan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Ping: [PATCH v2 00/13] x86: more or less log-dirty related improvements
  2021-07-19  7:46 ` Ping: [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
@ 2021-08-13  9:24   ` Jan Beulich
  2021-08-20  7:20   ` Jan Beulich
  1 sibling, 0 replies; 44+ messages in thread
From: Jan Beulich @ 2021-08-13  9:24 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu, Roger Pau Monné, Ian Jackson
  Cc: Juergen Gross, George Dunlap, xen-devel

On 19.07.2021 09:46, Jan Beulich wrote:
> On 05.07.2021 17:09, Jan Beulich wrote:
>> ... or so I hope. This series continues the attempt to deal with
>> the ovmf change putting the shared info page at a very high address
>> (which is now planned to get reverted there, but the general
>> problem doesn't go away by them doing so). There are further issues
>> with truncated value, which are being dealt with here. But there
>> are also not directly related changes, when I simply spotted things
>> that aren't very likely to be right the way they are. And then
>> there are also adjustments to the underlying hypervisor
>> implementation, with the goal of making the returned data more
>> useful to the consumers.
>>
>> With these changes in place, a 1Gb guest which has "inflated"
>> itself by putting a page right below the 16Tb boundary migrates
>> successfully, albeit the process takes from some 20 minutes to over
>> half an hour on my test system.
>>
>> In v2, besides integrating 2 patches that were previously sent,
>> there's one new patch and otherwise review feedback addressed
>> (albeit there wasn't any for a number of patches).
>>
>> 01: libxl/x86: check return value of SHADOW_OP_SET_ALLOCATION domctl
> 
> while I did get an R-b from Anthony on this one, but ...
> 
>> 02: libxc: split xc_logdirty_control() from xc_shadow_control()
>> 03: libxenguest: deal with log-dirty op stats overflow
>> 04: libxenguest: short-circuit "all-dirty" handling
>> 05: libxenguest: avoid allocating unused deferred-pages bitmap
>> 06: libxenguest: complete loops in xc_map_domain_meminfo()
>> 07: libxenguest: guard against overflow from too large p2m when checkpointing
>> 08: libxenguest: fix off-by-1 in colo-secondary-bitmap merging
>> 09: libxenguest: restrict PV guest size
>> 10: libxc: simplify HYPERCALL_BUFFER()
>> 11: x86/paging: supply more useful log-dirty page count
>> 12: x86/mm: update log-dirty bitmap when manipulating P2M
> 
> ... all of these are still in needed of suitable acks (patches 8
> and 10 have an R-b though, and are independent of earlier parts of
> this series).

Since I've not had any feedback since the ping, I'm intending to
commit the patches that have acks / R-b by Andrew (which actually
also includes patch 6), on the basis that informally (iirc
mentioned on irc more than once, in other but similar contexts)
this was indicated to be okay by the maintainers.

For all other patches here I'd like to renew the ping. I have to
admit that I'm unclear about what else I can do to progress this
series.

Jan

> Patches 3 and 5 have objections pending by Andrew,
> which I did reply to verbally without it having become clear
> whether these replies were addressing the concerns, or what exactly
> the misunderstanding on either side is (and hence which, if any,
> changes I should make).
> 
> Thanks, Jan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 02/13] libxc: split xc_logdirty_control() from xc_shadow_control()
  2021-07-05 15:12 ` [PATCH v2 02/13] libxc: split xc_logdirty_control() from xc_shadow_control() Jan Beulich
@ 2021-08-19  9:11   ` Juergen Gross
  2021-08-19  9:24     ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Juergen Gross @ 2021-08-19  9:11 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, Anthony Perard, Marek Marczykowski


[-- Attachment #1.1.1: Type: text/plain, Size: 4403 bytes --]

On 05.07.21 17:12, Jan Beulich wrote:
> For log-dirty operations a 64-bit field is being truncated to become an
> "int" return value. Seeing the large number of arguments the present
> function takes, reduce its set of parameters to that needed for all
> operations not involving the log-dirty bitmap, while introducing a new
> wrapper for the log-dirty bitmap operations. This new function in turn
> doesn't need an "mb" parameter, but has a 64-bit return type. (Using the
> return value in favor of a pointer-type parameter is left as is, to
> disturb callers as little as possible.)
> 
> While altering xc_shadow_control() anyway, also adjust the types of the
> last two of the remaining parameters.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Acked-by: Christian Lindig <christian.lindig@citrix.com>
> ---
> v2: Avoid new use of DECLARE_DOMCTL. Re-base over error handling fix to
>      libxl__arch_domain_create().
> ---
> I wonder whether we shouldn't take the opportunity and also rename
> xc_shadow_control() to, say, xc_paging_control(), matching the layer
> above the HAP/shadow distinction in the hypervisor.
> 
> --- a/tools/include/xenctrl.h
> +++ b/tools/include/xenctrl.h
> @@ -885,11 +885,15 @@ typedef struct xen_domctl_shadow_op_stat
>   int xc_shadow_control(xc_interface *xch,
>                         uint32_t domid,
>                         unsigned int sop,
> -                      xc_hypercall_buffer_t *dirty_bitmap,
> -                      unsigned long pages,
> -                      unsigned long *mb,
> -                      uint32_t mode,
> -                      xc_shadow_op_stats_t *stats);
> +                      unsigned int *mb,
> +                      unsigned int mode);
> +long long xc_logdirty_control(xc_interface *xch,
> +                              uint32_t domid,
> +                              unsigned int sop,
> +                              xc_hypercall_buffer_t *dirty_bitmap,
> +                              unsigned long pages,
> +                              unsigned int mode,
> +                              xc_shadow_op_stats_t *stats);
>   
>   int xc_sched_credit_domain_set(xc_interface *xch,
>                                  uint32_t domid,
> --- a/tools/libs/ctrl/xc_domain.c
> +++ b/tools/libs/ctrl/xc_domain.c
> @@ -650,25 +650,49 @@ int xc_watchdog(xc_interface *xch,
>   int xc_shadow_control(xc_interface *xch,
>                         uint32_t domid,
>                         unsigned int sop,
> -                      xc_hypercall_buffer_t *dirty_bitmap,
> -                      unsigned long pages,
> -                      unsigned long *mb,
> -                      uint32_t mode,
> -                      xc_shadow_op_stats_t *stats)
> +                      unsigned int *mb,
> +                      unsigned int mode)
>   {
>       int rc;
>       DECLARE_DOMCTL;
> -    DECLARE_HYPERCALL_BUFFER_ARGUMENT(dirty_bitmap);
>   
>       memset(&domctl, 0, sizeof(domctl));
>   
>       domctl.cmd = XEN_DOMCTL_shadow_op;
>       domctl.domain = domid;
>       domctl.u.shadow_op.op     = sop;

Shouldn't you verify that sop is not one of the operations now
handled by xc_logdirty_control()?

> -    domctl.u.shadow_op.pages  = pages;
>       domctl.u.shadow_op.mb     = mb ? *mb : 0;
>       domctl.u.shadow_op.mode   = mode;
> -    if (dirty_bitmap != NULL)
> +
> +    rc = do_domctl(xch, &domctl);
> +
> +    if ( mb )
> +        *mb = domctl.u.shadow_op.mb;
> +
> +    return rc;
> +}
> +
> +long long xc_logdirty_control(xc_interface *xch,
> +                              uint32_t domid,
> +                              unsigned int sop,
> +                              xc_hypercall_buffer_t *dirty_bitmap,
> +                              unsigned long pages,
> +                              unsigned int mode,
> +                              xc_shadow_op_stats_t *stats)
> +{
> +    int rc;
> +    struct xen_domctl domctl = {
> +        .cmd         = XEN_DOMCTL_shadow_op,
> +        .domain      = domid,
> +        .u.shadow_op = {
> +            .op    = sop,

And same here the other way round: sop should really only be one of
XEN_DOMCTL_SHADOW_OP_CLEAN or XEN_DOMCTL_SHADOW_OP_PEEK.

With that fixed you can add my:

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 02/13] libxc: split xc_logdirty_control() from xc_shadow_control()
  2021-08-19  9:11   ` Juergen Gross
@ 2021-08-19  9:24     ` Jan Beulich
  2021-08-19  9:50       ` Juergen Gross
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2021-08-19  9:24 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, Anthony Perard, Marek Marczykowski,
	xen-devel

On 19.08.2021 11:11, Juergen Gross wrote:
> On 05.07.21 17:12, Jan Beulich wrote:
>> For log-dirty operations a 64-bit field is being truncated to become an
>> "int" return value. Seeing the large number of arguments the present
>> function takes, reduce its set of parameters to that needed for all
>> operations not involving the log-dirty bitmap, while introducing a new
>> wrapper for the log-dirty bitmap operations. This new function in turn
>> doesn't need an "mb" parameter, but has a 64-bit return type. (Using the
>> return value in favor of a pointer-type parameter is left as is, to
>> disturb callers as little as possible.)
>>
>> While altering xc_shadow_control() anyway, also adjust the types of the
>> last two of the remaining parameters.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Acked-by: Christian Lindig <christian.lindig@citrix.com>
>> ---
>> v2: Avoid new use of DECLARE_DOMCTL. Re-base over error handling fix to
>>      libxl__arch_domain_create().
>> ---
>> I wonder whether we shouldn't take the opportunity and also rename
>> xc_shadow_control() to, say, xc_paging_control(), matching the layer
>> above the HAP/shadow distinction in the hypervisor.
>>
>> --- a/tools/include/xenctrl.h
>> +++ b/tools/include/xenctrl.h
>> @@ -885,11 +885,15 @@ typedef struct xen_domctl_shadow_op_stat
>>   int xc_shadow_control(xc_interface *xch,
>>                         uint32_t domid,
>>                         unsigned int sop,
>> -                      xc_hypercall_buffer_t *dirty_bitmap,
>> -                      unsigned long pages,
>> -                      unsigned long *mb,
>> -                      uint32_t mode,
>> -                      xc_shadow_op_stats_t *stats);
>> +                      unsigned int *mb,
>> +                      unsigned int mode);
>> +long long xc_logdirty_control(xc_interface *xch,
>> +                              uint32_t domid,
>> +                              unsigned int sop,
>> +                              xc_hypercall_buffer_t *dirty_bitmap,
>> +                              unsigned long pages,
>> +                              unsigned int mode,
>> +                              xc_shadow_op_stats_t *stats);
>>   
>>   int xc_sched_credit_domain_set(xc_interface *xch,
>>                                  uint32_t domid,
>> --- a/tools/libs/ctrl/xc_domain.c
>> +++ b/tools/libs/ctrl/xc_domain.c
>> @@ -650,25 +650,49 @@ int xc_watchdog(xc_interface *xch,
>>   int xc_shadow_control(xc_interface *xch,
>>                         uint32_t domid,
>>                         unsigned int sop,
>> -                      xc_hypercall_buffer_t *dirty_bitmap,
>> -                      unsigned long pages,
>> -                      unsigned long *mb,
>> -                      uint32_t mode,
>> -                      xc_shadow_op_stats_t *stats)
>> +                      unsigned int *mb,
>> +                      unsigned int mode)
>>   {
>>       int rc;
>>       DECLARE_DOMCTL;
>> -    DECLARE_HYPERCALL_BUFFER_ARGUMENT(dirty_bitmap);
>>   
>>       memset(&domctl, 0, sizeof(domctl));
>>   
>>       domctl.cmd = XEN_DOMCTL_shadow_op;
>>       domctl.domain = domid;
>>       domctl.u.shadow_op.op     = sop;
> 
> Shouldn't you verify that sop is not one of the operations now
> handled by xc_logdirty_control()?

While I was considering to do this, I couldn't think of a forward
compatible way, and what I'd like to avoid is having the need to
update these functions when new ops get added, just to suitably
also exclude them then. Plus I thought that if someone elected
the (apparently) wrong function as suiting their need in a
particular case, why would we put policy in place making this
impossible?

>> -    domctl.u.shadow_op.pages  = pages;
>>       domctl.u.shadow_op.mb     = mb ? *mb : 0;
>>       domctl.u.shadow_op.mode   = mode;
>> -    if (dirty_bitmap != NULL)
>> +
>> +    rc = do_domctl(xch, &domctl);
>> +
>> +    if ( mb )
>> +        *mb = domctl.u.shadow_op.mb;
>> +
>> +    return rc;
>> +}
>> +
>> +long long xc_logdirty_control(xc_interface *xch,
>> +                              uint32_t domid,
>> +                              unsigned int sop,
>> +                              xc_hypercall_buffer_t *dirty_bitmap,
>> +                              unsigned long pages,
>> +                              unsigned int mode,
>> +                              xc_shadow_op_stats_t *stats)
>> +{
>> +    int rc;
>> +    struct xen_domctl domctl = {
>> +        .cmd         = XEN_DOMCTL_shadow_op,
>> +        .domain      = domid,
>> +        .u.shadow_op = {
>> +            .op    = sop,
> 
> And same here the other way round: sop should really only be one of
> XEN_DOMCTL_SHADOW_OP_CLEAN or XEN_DOMCTL_SHADOW_OP_PEEK.
> 
> With that fixed you can add my:
> 
> Reviewed-by: Juergen Gross <jgross@suse.com>

Thanks, but I won't take this just yet, awaiting your (and maybe
others') view(s) on my reply above.

Jan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 02/13] libxc: split xc_logdirty_control() from xc_shadow_control()
  2021-08-19  9:24     ` Jan Beulich
@ 2021-08-19  9:50       ` Juergen Gross
  2021-09-02 17:04         ` [PATCH v2 02/13] libxc: split xc_logdirty_control() from xc_shadow_control() [and 1 more messages] Ian Jackson
  0 siblings, 1 reply; 44+ messages in thread
From: Juergen Gross @ 2021-08-19  9:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, Anthony Perard, Marek Marczykowski,
	xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 5330 bytes --]

On 19.08.21 11:24, Jan Beulich wrote:
> On 19.08.2021 11:11, Juergen Gross wrote:
>> On 05.07.21 17:12, Jan Beulich wrote:
>>> For log-dirty operations a 64-bit field is being truncated to become an
>>> "int" return value. Seeing the large number of arguments the present
>>> function takes, reduce its set of parameters to that needed for all
>>> operations not involving the log-dirty bitmap, while introducing a new
>>> wrapper for the log-dirty bitmap operations. This new function in turn
>>> doesn't need an "mb" parameter, but has a 64-bit return type. (Using the
>>> return value in favor of a pointer-type parameter is left as is, to
>>> disturb callers as little as possible.)
>>>
>>> While altering xc_shadow_control() anyway, also adjust the types of the
>>> last two of the remaining parameters.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> Acked-by: Christian Lindig <christian.lindig@citrix.com>
>>> ---
>>> v2: Avoid new use of DECLARE_DOMCTL. Re-base over error handling fix to
>>>       libxl__arch_domain_create().
>>> ---
>>> I wonder whether we shouldn't take the opportunity and also rename
>>> xc_shadow_control() to, say, xc_paging_control(), matching the layer
>>> above the HAP/shadow distinction in the hypervisor.
>>>
>>> --- a/tools/include/xenctrl.h
>>> +++ b/tools/include/xenctrl.h
>>> @@ -885,11 +885,15 @@ typedef struct xen_domctl_shadow_op_stat
>>>    int xc_shadow_control(xc_interface *xch,
>>>                          uint32_t domid,
>>>                          unsigned int sop,
>>> -                      xc_hypercall_buffer_t *dirty_bitmap,
>>> -                      unsigned long pages,
>>> -                      unsigned long *mb,
>>> -                      uint32_t mode,
>>> -                      xc_shadow_op_stats_t *stats);
>>> +                      unsigned int *mb,
>>> +                      unsigned int mode);
>>> +long long xc_logdirty_control(xc_interface *xch,
>>> +                              uint32_t domid,
>>> +                              unsigned int sop,
>>> +                              xc_hypercall_buffer_t *dirty_bitmap,
>>> +                              unsigned long pages,
>>> +                              unsigned int mode,
>>> +                              xc_shadow_op_stats_t *stats);
>>>    
>>>    int xc_sched_credit_domain_set(xc_interface *xch,
>>>                                   uint32_t domid,
>>> --- a/tools/libs/ctrl/xc_domain.c
>>> +++ b/tools/libs/ctrl/xc_domain.c
>>> @@ -650,25 +650,49 @@ int xc_watchdog(xc_interface *xch,
>>>    int xc_shadow_control(xc_interface *xch,
>>>                          uint32_t domid,
>>>                          unsigned int sop,
>>> -                      xc_hypercall_buffer_t *dirty_bitmap,
>>> -                      unsigned long pages,
>>> -                      unsigned long *mb,
>>> -                      uint32_t mode,
>>> -                      xc_shadow_op_stats_t *stats)
>>> +                      unsigned int *mb,
>>> +                      unsigned int mode)
>>>    {
>>>        int rc;
>>>        DECLARE_DOMCTL;
>>> -    DECLARE_HYPERCALL_BUFFER_ARGUMENT(dirty_bitmap);
>>>    
>>>        memset(&domctl, 0, sizeof(domctl));
>>>    
>>>        domctl.cmd = XEN_DOMCTL_shadow_op;
>>>        domctl.domain = domid;
>>>        domctl.u.shadow_op.op     = sop;
>>
>> Shouldn't you verify that sop is not one of the operations now
>> handled by xc_logdirty_control()?
> 
> While I was considering to do this, I couldn't think of a forward
> compatible way, and what I'd like to avoid is having the need to
> update these functions when new ops get added, just to suitably
> also exclude them then. Plus I thought that if someone elected
> the (apparently) wrong function as suiting their need in a
> particular case, why would we put policy in place making this
> impossible?
> 
>>> -    domctl.u.shadow_op.pages  = pages;
>>>        domctl.u.shadow_op.mb     = mb ? *mb : 0;
>>>        domctl.u.shadow_op.mode   = mode;
>>> -    if (dirty_bitmap != NULL)
>>> +
>>> +    rc = do_domctl(xch, &domctl);
>>> +
>>> +    if ( mb )
>>> +        *mb = domctl.u.shadow_op.mb;
>>> +
>>> +    return rc;
>>> +}
>>> +
>>> +long long xc_logdirty_control(xc_interface *xch,
>>> +                              uint32_t domid,
>>> +                              unsigned int sop,
>>> +                              xc_hypercall_buffer_t *dirty_bitmap,
>>> +                              unsigned long pages,
>>> +                              unsigned int mode,
>>> +                              xc_shadow_op_stats_t *stats)
>>> +{
>>> +    int rc;
>>> +    struct xen_domctl domctl = {
>>> +        .cmd         = XEN_DOMCTL_shadow_op,
>>> +        .domain      = domid,
>>> +        .u.shadow_op = {
>>> +            .op    = sop,
>>
>> And same here the other way round: sop should really only be one of
>> XEN_DOMCTL_SHADOW_OP_CLEAN or XEN_DOMCTL_SHADOW_OP_PEEK.
>>
>> With that fixed you can add my:
>>
>> Reviewed-by: Juergen Gross <jgross@suse.com>
> 
> Thanks, but I won't take this just yet, awaiting your (and maybe
> others') view(s) on my reply above.

I'm not feeling really strong in this regard. Either way is fine for
me.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-07-05 15:13 ` [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow Jan Beulich
  2021-07-05 15:41   ` Andrew Cooper
  2021-07-05 17:26   ` Olaf Hering
@ 2021-08-19 10:20   ` Juergen Gross
  2021-08-19 11:06     ` Jan Beulich
  2 siblings, 1 reply; 44+ messages in thread
From: Juergen Gross @ 2021-08-19 10:20 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, George Dunlap, Ian Jackson


[-- Attachment #1.1.1: Type: text/plain, Size: 1412 bytes --]

On 05.07.21 17:13, Jan Beulich wrote:
> In send_memory_live() the precise value the dirty_count struct field
> gets initialized to doesn't matter much (apart from the triggering of
> the log message in send_dirty_pages(), see below), but it is important
> that it not be zero on the first iteration (or else send_dirty_pages()
> won't get called at all). Saturate the initializer value at the maximum
> value the field can hold.
> 
> While there also initialize struct precopy_stats' respective field to a
> more sane value: We don't really know how many dirty pages there are at
> that point.
> 
> In suspend_and_send_dirty() and verify_frames() the local variables
> don't need initializing at all, as they're only an output from the
> hypercall which gets invoked first thing.
> 
> In send_checkpoint_dirty_pfn_list() the local variable can be dropped
> altogether: It's optional to xc_logdirty_control() and not used anywhere
> else.
> 
> Note that in case the clipping actually takes effect, the "Bitmap
> contained more entries than expected..." log message will trigger. This
> being just an informational message, I don't think this is overly
> concerning.

Is there any real reason why the width of the stats fields can't be
expanded to avoid clipping? This could avoid the need to set the
initial value to -1, which seems one of the more controversial changes.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-08-19 10:20   ` Juergen Gross
@ 2021-08-19 11:06     ` Jan Beulich
  2021-08-19 11:25       ` Juergen Gross
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2021-08-19 11:06 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, xen-devel

On 19.08.2021 12:20, Juergen Gross wrote:
> On 05.07.21 17:13, Jan Beulich wrote:
>> In send_memory_live() the precise value the dirty_count struct field
>> gets initialized to doesn't matter much (apart from the triggering of
>> the log message in send_dirty_pages(), see below), but it is important
>> that it not be zero on the first iteration (or else send_dirty_pages()
>> won't get called at all). Saturate the initializer value at the maximum
>> value the field can hold.
>>
>> While there also initialize struct precopy_stats' respective field to a
>> more sane value: We don't really know how many dirty pages there are at
>> that point.
>>
>> In suspend_and_send_dirty() and verify_frames() the local variables
>> don't need initializing at all, as they're only an output from the
>> hypercall which gets invoked first thing.
>>
>> In send_checkpoint_dirty_pfn_list() the local variable can be dropped
>> altogether: It's optional to xc_logdirty_control() and not used anywhere
>> else.
>>
>> Note that in case the clipping actually takes effect, the "Bitmap
>> contained more entries than expected..." log message will trigger. This
>> being just an informational message, I don't think this is overly
>> concerning.
> 
> Is there any real reason why the width of the stats fields can't be
> expanded to avoid clipping? This could avoid the need to set the
> initial value to -1, which seems one of the more controversial changes.

While not impossible, it comes with a price tag, as we'd either need
to decouple xc_shadow_op_stats_t from struct xen_domctl_shadow_op_stats
or alter the underlying domctl. Neither of which looked either
appealing or necessary to me; instead I'm still struggling with
Andrew's comments, yet I didn't receive any clarification of further
explanation. Plus I continue to think that statistics output like this
shouldn't be assumed to be precise anyway, and for practical purposes
I don't think it really matters how large the counts actually are once
they've moved into the billions.

Jan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-08-19 11:06     ` Jan Beulich
@ 2021-08-19 11:25       ` Juergen Gross
  2021-08-19 11:51         ` Jan Beulich
  0 siblings, 1 reply; 44+ messages in thread
From: Juergen Gross @ 2021-08-19 11:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 2614 bytes --]

On 19.08.21 13:06, Jan Beulich wrote:
> On 19.08.2021 12:20, Juergen Gross wrote:
>> On 05.07.21 17:13, Jan Beulich wrote:
>>> In send_memory_live() the precise value the dirty_count struct field
>>> gets initialized to doesn't matter much (apart from the triggering of
>>> the log message in send_dirty_pages(), see below), but it is important
>>> that it not be zero on the first iteration (or else send_dirty_pages()
>>> won't get called at all). Saturate the initializer value at the maximum
>>> value the field can hold.
>>>
>>> While there also initialize struct precopy_stats' respective field to a
>>> more sane value: We don't really know how many dirty pages there are at
>>> that point.
>>>
>>> In suspend_and_send_dirty() and verify_frames() the local variables
>>> don't need initializing at all, as they're only an output from the
>>> hypercall which gets invoked first thing.
>>>
>>> In send_checkpoint_dirty_pfn_list() the local variable can be dropped
>>> altogether: It's optional to xc_logdirty_control() and not used anywhere
>>> else.
>>>
>>> Note that in case the clipping actually takes effect, the "Bitmap
>>> contained more entries than expected..." log message will trigger. This
>>> being just an informational message, I don't think this is overly
>>> concerning.
>>
>> Is there any real reason why the width of the stats fields can't be
>> expanded to avoid clipping? This could avoid the need to set the
>> initial value to -1, which seems one of the more controversial changes.
> 
> While not impossible, it comes with a price tag, as we'd either need
> to decouple xc_shadow_op_stats_t from struct xen_domctl_shadow_op_stats
> or alter the underlying domctl. Neither of which looked either

I was thinking about the domctl.

Apart of struct xen_sysctl_page_offline_op this seems to be the only
left domctl/sysctl structure limiting guest or host size to values
being relevant. Changing those would be a sensible thing to do IMO.

> appealing or necessary to me; instead I'm still struggling with
> Andrew's comments, yet I didn't receive any clarification of further
> explanation. Plus I continue to think that statistics output like this
> shouldn't be assumed to be precise anyway, and for practical purposes
> I don't think it really matters how large the counts actually are once
> they've moved into the billions.

In case the underlying problem can easily be avoided be changing the
type of the field I don't see why this shouldn't be done. Especially as
in this case a misleading message will just be avoided.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-08-19 11:25       ` Juergen Gross
@ 2021-08-19 11:51         ` Jan Beulich
  2021-08-19 11:53           ` Jan Beulich
  2021-08-19 14:29           ` Juergen Gross
  0 siblings, 2 replies; 44+ messages in thread
From: Jan Beulich @ 2021-08-19 11:51 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, xen-devel

On 19.08.2021 13:25, Juergen Gross wrote:
> On 19.08.21 13:06, Jan Beulich wrote:
>> On 19.08.2021 12:20, Juergen Gross wrote:
>>> On 05.07.21 17:13, Jan Beulich wrote:
>>>> In send_memory_live() the precise value the dirty_count struct field
>>>> gets initialized to doesn't matter much (apart from the triggering of
>>>> the log message in send_dirty_pages(), see below), but it is important
>>>> that it not be zero on the first iteration (or else send_dirty_pages()
>>>> won't get called at all). Saturate the initializer value at the maximum
>>>> value the field can hold.
>>>>
>>>> While there also initialize struct precopy_stats' respective field to a
>>>> more sane value: We don't really know how many dirty pages there are at
>>>> that point.
>>>>
>>>> In suspend_and_send_dirty() and verify_frames() the local variables
>>>> don't need initializing at all, as they're only an output from the
>>>> hypercall which gets invoked first thing.
>>>>
>>>> In send_checkpoint_dirty_pfn_list() the local variable can be dropped
>>>> altogether: It's optional to xc_logdirty_control() and not used anywhere
>>>> else.
>>>>
>>>> Note that in case the clipping actually takes effect, the "Bitmap
>>>> contained more entries than expected..." log message will trigger. This
>>>> being just an informational message, I don't think this is overly
>>>> concerning.
>>>
>>> Is there any real reason why the width of the stats fields can't be
>>> expanded to avoid clipping? This could avoid the need to set the
>>> initial value to -1, which seems one of the more controversial changes.
>>
>> While not impossible, it comes with a price tag, as we'd either need
>> to decouple xc_shadow_op_stats_t from struct xen_domctl_shadow_op_stats
>> or alter the underlying domctl. Neither of which looked either
> 
> I was thinking about the domctl.
> 
> Apart of struct xen_sysctl_page_offline_op this seems to be the only
> left domctl/sysctl structure limiting guest or host size to values
> being relevant. Changing those would be a sensible thing to do IMO.

Yet in the context of v1 of this series, which included "x86/paging:
deal with log-dirty stats overflow" (now commit 17e91570c5a4) we
settled on these fields not needing widening. This doesn't prevent
us doing what you suggest, but it would look pretty odd to me at
least.

Jan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-08-19 11:51         ` Jan Beulich
@ 2021-08-19 11:53           ` Jan Beulich
  2021-09-02 16:57             ` Ian Jackson
  2021-08-19 14:29           ` Juergen Gross
  1 sibling, 1 reply; 44+ messages in thread
From: Jan Beulich @ 2021-08-19 11:53 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, xen-devel

On 19.08.2021 13:51, Jan Beulich wrote:
> On 19.08.2021 13:25, Juergen Gross wrote:
>> On 19.08.21 13:06, Jan Beulich wrote:
>>> On 19.08.2021 12:20, Juergen Gross wrote:
>>>> On 05.07.21 17:13, Jan Beulich wrote:
>>>>> In send_memory_live() the precise value the dirty_count struct field
>>>>> gets initialized to doesn't matter much (apart from the triggering of
>>>>> the log message in send_dirty_pages(), see below), but it is important
>>>>> that it not be zero on the first iteration (or else send_dirty_pages()
>>>>> won't get called at all). Saturate the initializer value at the maximum
>>>>> value the field can hold.
>>>>>
>>>>> While there also initialize struct precopy_stats' respective field to a
>>>>> more sane value: We don't really know how many dirty pages there are at
>>>>> that point.
>>>>>
>>>>> In suspend_and_send_dirty() and verify_frames() the local variables
>>>>> don't need initializing at all, as they're only an output from the
>>>>> hypercall which gets invoked first thing.
>>>>>
>>>>> In send_checkpoint_dirty_pfn_list() the local variable can be dropped
>>>>> altogether: It's optional to xc_logdirty_control() and not used anywhere
>>>>> else.
>>>>>
>>>>> Note that in case the clipping actually takes effect, the "Bitmap
>>>>> contained more entries than expected..." log message will trigger. This
>>>>> being just an informational message, I don't think this is overly
>>>>> concerning.
>>>>
>>>> Is there any real reason why the width of the stats fields can't be
>>>> expanded to avoid clipping? This could avoid the need to set the
>>>> initial value to -1, which seems one of the more controversial changes.
>>>
>>> While not impossible, it comes with a price tag, as we'd either need
>>> to decouple xc_shadow_op_stats_t from struct xen_domctl_shadow_op_stats
>>> or alter the underlying domctl. Neither of which looked either
>>
>> I was thinking about the domctl.
>>
>> Apart of struct xen_sysctl_page_offline_op this seems to be the only
>> left domctl/sysctl structure limiting guest or host size to values
>> being relevant. Changing those would be a sensible thing to do IMO.
> 
> Yet in the context of v1 of this series, which included "x86/paging:
> deal with log-dirty stats overflow" (now commit 17e91570c5a4) we
> settled on these fields not needing widening. This doesn't prevent
> us doing what you suggest, but it would look pretty odd to me at
> least.

And - just to make it very explicit - even if I went this route to
address a controversial point, I'd still like to understand the
reason for the original objection - if only for my own education.

Jan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 04/13] libxenguest: short-circuit "all-dirty" handling
  2021-07-05 15:13 ` [PATCH v2 04/13] libxenguest: short-circuit "all-dirty" handling Jan Beulich
@ 2021-08-19 14:21   ` Juergen Gross
  0 siblings, 0 replies; 44+ messages in thread
From: Juergen Gross @ 2021-08-19 14:21 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, George Dunlap, Ian Jackson


[-- Attachment #1.1.1: Type: text/plain, Size: 459 bytes --]

On 05.07.21 17:13, Jan Beulich wrote:
> For one it is unnecessary to fill a perhaps large chunk of memory with
> all ones. Add a new parameter to send_dirty_pages() for callers to
> indicate so.
> 
> Then it is further unnecessary to allocate the dirty bitmap altogether
> when all that's ever going to happen is a single all-dirty run.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 05/13] libxenguest: avoid allocating unused deferred-pages bitmap
  2021-07-05 15:14 ` [PATCH v2 05/13] libxenguest: avoid allocating unused deferred-pages bitmap Jan Beulich
@ 2021-08-19 14:22   ` Juergen Gross
  0 siblings, 0 replies; 44+ messages in thread
From: Juergen Gross @ 2021-08-19 14:22 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, George Dunlap, Ian Jackson


[-- Attachment #1.1.1: Type: text/plain, Size: 311 bytes --]

On 05.07.21 17:14, Jan Beulich wrote:
> Like for the dirty bitmap, it is unnecessary to allocate the deferred-
> pages bitmap when all that's ever going to happen is a single all-dirty
> run.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-08-19 11:51         ` Jan Beulich
  2021-08-19 11:53           ` Jan Beulich
@ 2021-08-19 14:29           ` Juergen Gross
  1 sibling, 0 replies; 44+ messages in thread
From: Juergen Gross @ 2021-08-19 14:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 2800 bytes --]

On 19.08.21 13:51, Jan Beulich wrote:
> On 19.08.2021 13:25, Juergen Gross wrote:
>> On 19.08.21 13:06, Jan Beulich wrote:
>>> On 19.08.2021 12:20, Juergen Gross wrote:
>>>> On 05.07.21 17:13, Jan Beulich wrote:
>>>>> In send_memory_live() the precise value the dirty_count struct field
>>>>> gets initialized to doesn't matter much (apart from the triggering of
>>>>> the log message in send_dirty_pages(), see below), but it is important
>>>>> that it not be zero on the first iteration (or else send_dirty_pages()
>>>>> won't get called at all). Saturate the initializer value at the maximum
>>>>> value the field can hold.
>>>>>
>>>>> While there also initialize struct precopy_stats' respective field to a
>>>>> more sane value: We don't really know how many dirty pages there are at
>>>>> that point.
>>>>>
>>>>> In suspend_and_send_dirty() and verify_frames() the local variables
>>>>> don't need initializing at all, as they're only an output from the
>>>>> hypercall which gets invoked first thing.
>>>>>
>>>>> In send_checkpoint_dirty_pfn_list() the local variable can be dropped
>>>>> altogether: It's optional to xc_logdirty_control() and not used anywhere
>>>>> else.
>>>>>
>>>>> Note that in case the clipping actually takes effect, the "Bitmap
>>>>> contained more entries than expected..." log message will trigger. This
>>>>> being just an informational message, I don't think this is overly
>>>>> concerning.
>>>>
>>>> Is there any real reason why the width of the stats fields can't be
>>>> expanded to avoid clipping? This could avoid the need to set the
>>>> initial value to -1, which seems one of the more controversial changes.
>>>
>>> While not impossible, it comes with a price tag, as we'd either need
>>> to decouple xc_shadow_op_stats_t from struct xen_domctl_shadow_op_stats
>>> or alter the underlying domctl. Neither of which looked either
>>
>> I was thinking about the domctl.
>>
>> Apart of struct xen_sysctl_page_offline_op this seems to be the only
>> left domctl/sysctl structure limiting guest or host size to values
>> being relevant. Changing those would be a sensible thing to do IMO.
> 
> Yet in the context of v1 of this series, which included "x86/paging:
> deal with log-dirty stats overflow" (now commit 17e91570c5a4) we
> settled on these fields not needing widening. This doesn't prevent
> us doing what you suggest, but it would look pretty odd to me at
> least.

Sorry I was too busy at that time to have a detailed look at the
patches.

TBH I'd rather undo the stats overflow handling and widen the fields.
This is a rather simple patch and a much cleaner solution in the end.

I'm not insisting on that, but in case I had to decide this would be my
preferred way to handle it.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Ping: [PATCH v2 00/13] x86: more or less log-dirty related improvements
  2021-07-19  7:46 ` Ping: [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
  2021-08-13  9:24   ` Jan Beulich
@ 2021-08-20  7:20   ` Jan Beulich
  1 sibling, 0 replies; 44+ messages in thread
From: Jan Beulich @ 2021-08-20  7:20 UTC (permalink / raw)
  To: Wei Liu, Ian Jackson
  Cc: Juergen Gross, George Dunlap, xen-devel, Andrew Cooper,
	Roger Pau Monné

On 19.07.2021 09:46, Jan Beulich wrote:
> On 05.07.2021 17:09, Jan Beulich wrote:
>> ... or so I hope. This series continues the attempt to deal with
>> the ovmf change putting the shared info page at a very high address
>> (which is now planned to get reverted there, but the general
>> problem doesn't go away by them doing so). There are further issues
>> with truncated value, which are being dealt with here. But there
>> are also not directly related changes, when I simply spotted things
>> that aren't very likely to be right the way they are. And then
>> there are also adjustments to the underlying hypervisor
>> implementation, with the goal of making the returned data more
>> useful to the consumers.
>>
>> With these changes in place, a 1Gb guest which has "inflated"
>> itself by putting a page right below the 16Tb boundary migrates
>> successfully, albeit the process takes from some 20 minutes to over
>> half an hour on my test system.
>>
>> In v2, besides integrating 2 patches that were previously sent,
>> there's one new patch and otherwise review feedback addressed
>> (albeit there wasn't any for a number of patches).
>>
>> 01: libxl/x86: check return value of SHADOW_OP_SET_ALLOCATION domctl
> 
> while I did get an R-b from Anthony on this one, but ...
> 
>> 02: libxc: split xc_logdirty_control() from xc_shadow_control()
>> 03: libxenguest: deal with log-dirty op stats overflow
>> 04: libxenguest: short-circuit "all-dirty" handling
>> 05: libxenguest: avoid allocating unused deferred-pages bitmap

with Jürgen's R-b for 2, 4, and 5, may I ask for a maintainer ack on
at least patch 2? While I did address Andrew's comments on v1 of 4
and 5 (verbally), he didn't indicate back whether he was okay with
my replies, so some judgement may need applying there when deciding
whether to also give an ack there. Thanks.

Patch 3 remains controversial as it seems, unfortunately.

Jan



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow
  2021-08-19 11:53           ` Jan Beulich
@ 2021-09-02 16:57             ` Ian Jackson
  0 siblings, 0 replies; 44+ messages in thread
From: Ian Jackson @ 2021-09-02 16:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, xen-devel

Jan Beulich writes ("Re: [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow"):
> And - just to make it very explicit - even if I went this route to
> address a controversial point, I'd still like to understand the
> reason for the original objection - if only for my own education.

I agree with this position.

I have reread the discussions with this patch title in the Subject
line.  I am still unclear on the precise nature of Andrew's objection.

All I could find was this from Andrew:

  >      ctx->save.stats = (struct precopy_stats){
  > -        .dirty_count = ctx->save.p2m_size,
  > +        .dirty_count = -1,

  This is an external interface, and I'm not sure it will tolerate finding
  more than p2m_size allegedly dirty.

and then the later comment by Andrew:

  I've already explained why [various things]

Unfortunately that message did not contain any links to these previous
explanations.  Andrew, where were they ?  If we are not to go round
in circles, we need to write these things down and then refer to them
when relevant.

We had a conversation on irc today.  I found today's IRC conversation
rather unsatisfactory.  I gleaned from Andrew's comments that this he
was saying this callback is used by something in Citrix's product to
do with nVidia migration.  Obviously we do not wish to break a
downstream.  But this interface is badly documented and clear
explanations are lacking.

While I recognise Andrew's expertise in this area, I think that as
tools maintainer I need to go with the information I have.
The overflow issue is real and Jan has proposed a fix.

Andrew: do you have time to engage with Jan here on these API
questions ?  If not, does anyone else have enough familiarity with
Citrix's use of this API to engage in a proper discussion about how
the overflow problem can be addressed ?

If effort is not available for this, that is of course fine.  But as
an Open Source project we need to be able to move forwsrd with the
available input and contributions.  So if no-one is able to explain
how this rather poor API is to be used, and help develop a solution to
the overflow problem, I should probably ack Jan's patch here.

Ian.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 02/13] libxc: split xc_logdirty_control() from xc_shadow_control() [and 1 more messages]
  2021-08-19  9:50       ` Juergen Gross
@ 2021-09-02 17:04         ` Ian Jackson
  0 siblings, 0 replies; 44+ messages in thread
From: Ian Jackson @ 2021-09-02 17:04 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Jan Beulich, xen-devel, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	George Dunlap, Anthony Perard, Marek Marczykowski

Juergen Gross writes ("Re: [PATCH v2 02/13] libxc: split xc_logdirty_control() from xc_shadow_control()"):
> On 05.07.21 17:12, Jan Beulich wrote:
> >>> +long long xc_logdirty_control(xc_interface *xch,
> >>> +                              uint32_t domid,
> >>> +                              unsigned int sop,
> >>> +                              xc_hypercall_buffer_t *dirty_bitmap,
> >>> +                              unsigned long pages,
> >>> +                              unsigned int mode,
> >>> +                              xc_shadow_op_stats_t *stats)
> >>> +{
> >>> +    int rc;
> >>> +    struct xen_domctl domctl = {
> >>> +        .cmd         = XEN_DOMCTL_shadow_op,
> >>> +        .domain      = domid,
> >>> +        .u.shadow_op = {
> >>> +            .op    = sop,
> >>
> >> And same here the other way round: sop should really only be one of
> >> XEN_DOMCTL_SHADOW_OP_CLEAN or XEN_DOMCTL_SHADOW_OP_PEEK.
> >>
> >> With that fixed you can add my:
> >>
> >> Reviewed-by: Juergen Gross <jgross@suse.com>
> > 
> > Thanks, but I won't take this just yet, awaiting your (and maybe
> > others') view(s) on my reply above.
> 
> I'm not feeling really strong in this regard. Either way is fine for
> me.

OK.  My understanding then is that you are happy with Jan's patch and
he can add the Reviewed-by even though he's not checking the ops.
Is that right ?

If so, then Jan, please add my
Acked-by: Ian Jackson <iwj@xenproject.org>
along with Juergen's review.

If I have misunderstood please explain :-).

Thanks,
Ian.


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2021-09-02 17:04 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-05 15:09 [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
2021-07-05 15:11 ` [PATCH v2 01/13] libxl/x86: check return value of SHADOW_OP_SET_ALLOCATION domctl Jan Beulich
2021-07-16 13:03   ` Anthony PERARD
2021-07-05 15:12 ` [PATCH v2 02/13] libxc: split xc_logdirty_control() from xc_shadow_control() Jan Beulich
2021-08-19  9:11   ` Juergen Gross
2021-08-19  9:24     ` Jan Beulich
2021-08-19  9:50       ` Juergen Gross
2021-09-02 17:04         ` [PATCH v2 02/13] libxc: split xc_logdirty_control() from xc_shadow_control() [and 1 more messages] Ian Jackson
2021-07-05 15:13 ` [PATCH v2 03/13] libxenguest: deal with log-dirty op stats overflow Jan Beulich
2021-07-05 15:41   ` Andrew Cooper
2021-07-05 15:53     ` Jan Beulich
2021-07-05 17:26   ` Olaf Hering
2021-07-06  6:39     ` Jan Beulich
2021-07-06  6:46       ` Olaf Hering
2021-07-06  6:58         ` Jan Beulich
2021-07-06  7:15           ` Olaf Hering
2021-08-19 10:20   ` Juergen Gross
2021-08-19 11:06     ` Jan Beulich
2021-08-19 11:25       ` Juergen Gross
2021-08-19 11:51         ` Jan Beulich
2021-08-19 11:53           ` Jan Beulich
2021-09-02 16:57             ` Ian Jackson
2021-08-19 14:29           ` Juergen Gross
2021-07-05 15:13 ` [PATCH v2 04/13] libxenguest: short-circuit "all-dirty" handling Jan Beulich
2021-08-19 14:21   ` Juergen Gross
2021-07-05 15:14 ` [PATCH v2 05/13] libxenguest: avoid allocating unused deferred-pages bitmap Jan Beulich
2021-08-19 14:22   ` Juergen Gross
2021-07-05 15:14 ` [PATCH v2 06/13] libxenguest: complete loops in xc_map_domain_meminfo() Jan Beulich
2021-07-05 15:15 ` [PATCH v2 07/13] libxenguest: guard against overflow from too large p2m when checkpointing Jan Beulich
2021-07-05 15:15 ` [PATCH v2 08/13] libxenguest: fix off-by-1 in colo-secondary-bitmap merging Jan Beulich
2021-07-05 15:15 ` [PATCH v2 09/13] libxenguest: restrict PV guest size Jan Beulich
2021-07-05 15:16 ` [PATCH v2 10/13] libxc: simplify HYPERCALL_BUFFER() Jan Beulich
2021-07-05 15:47   ` Andrew Cooper
2021-07-05 15:17 ` [PATCH v2 11/13] x86/paging: supply more useful log-dirty page count Jan Beulich
2021-07-05 15:18 ` [PATCH v2 12/13] x86/mm: update log-dirty bitmap when manipulating P2M Jan Beulich
2021-07-05 15:18 ` [PATCH v2 13/13] SUPPORT.md: write down restriction of 32-bit tool stacks Jan Beulich
2021-07-14 18:16   ` Julien Grall
2021-07-15  6:38     ` Jan Beulich
2021-07-15  9:05       ` Julien Grall
2021-07-15 11:36         ` Jan Beulich
2021-07-16  7:50           ` Julien Grall
2021-07-19  7:46 ` Ping: [PATCH v2 00/13] x86: more or less log-dirty related improvements Jan Beulich
2021-08-13  9:24   ` Jan Beulich
2021-08-20  7:20   ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.