* [PATCH v20210616 00/36] leftover from 2020
@ 2021-06-16 12:50 Olaf Hering
  2021-06-16 12:50 ` [PATCH v20210616 01/36] hotplug/Linux: fix starting of xenstored with restarting systemd Olaf Hering
                   ` (36 more replies)
  0 siblings, 37 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:50 UTC
  To: xen-devel; +Cc: Olaf Hering

Various unreviewed changes, rebased onto 4bcf6433ee.

Olaf Hering (36):
  hotplug/Linux: fix starting of xenstored with restarting systemd
  tools: add API to work with sevaral bits at once
  xl: fix description of migrate --debug
  tools: create libxensaverestore
  MAINTAINERS: add myself as saverestore maintainer
  tools: add readv_exact to libxenctrl
  tools: add sr_is_known_page_type to libsaverestore
  tools: use sr_is_known_page_type
  tools: unify type checking for data pfns in migration stream
  tools: show migration transfer rate in send_dirty_pages
  tools: prepare to allocate saverestore arrays once
  tools: save: move mfns array
  tools: save: move types array
  tools: save: move errors array
  tools: save: move iov array
  tools: save: move rec_pfns array
  tools: save: move guest_data array
  tools: save: move local_pages array
  tools: restore: move types array
  tools: restore: move mfns array
  tools: restore: move map_errs array
  tools: restore: move mfns array in populate_pfns
  tools: restore: move pfns array in populate_pfns
  tools: restore: split record processing
  tools: restore: split handle_page_data
  tools: restore: write data directly into guest
  tools: recognize LIBXL_API_VERSION for 4.16
  tools: adjust libxl_domain_suspend to receive a struct props
  tools: change struct precopy_stats to precopy_stats_t
  tools: add callback to libxl for precopy_policy and precopy_stats_t
  tools: add --max_iters to libxl_domain_suspend
  tools: add --min_remaining to libxl_domain_suspend
  tools: add --abort_if_busy to libxl_domain_suspend
  tools: add API for expandable bitmaps
  tools: use sr_bitmap for populated_pfns
  tools: use superpages during restore of HVM guest

 .gitignore                                    |   2 +
 MAINTAINERS                                   |   6 +
 docs/man/xl.1.pod.in                          |  22 +-
 tools/hotplug/Linux/init.d/xencommons.in      |   2 +-
 tools/hotplug/Linux/launch-xenstore.in        |  40 +-
 .../Linux/systemd/xenstored.service.in        |   2 +-
 tools/include/libxl.h                         |  32 +-
 tools/include/xenguest.h                      | 186 -----
 tools/include/xensaverestore.h                | 207 ++++++
 tools/libs/Makefile                           |   1 +
 tools/libs/ctrl/xc_bitops.h                   |  28 +
 tools/libs/ctrl/xc_private.c                  |  57 +-
 tools/libs/ctrl/xc_private.h                  |   1 +
 tools/libs/guest/Makefile                     |  11 -
 tools/libs/guest/xg_dom_x86.c                 |   5 -
 tools/libs/guest/xg_offline_page.c            |   1 -
 tools/libs/guest/xg_private.h                 |   5 +
 tools/libs/guest/xg_sr_restore_x86_hvm.c      | 274 --------
 tools/libs/light/Makefile                     |   4 +-
 tools/libs/light/libxl_dom_save.c             |  24 +
 tools/libs/light/libxl_domain.c               |  10 +-
 tools/libs/light/libxl_internal.h             |   7 +
 tools/libs/light/libxl_save_helper.c          |   1 +
 tools/libs/light/libxl_save_msgs_gen.pl       |   5 +-
 tools/libs/light/libxl_stream_write.c         |   9 +-
 tools/libs/light/libxl_types.idl              |   1 +
 tools/libs/saverestore/Makefile               |  38 ++
 .../xg_sr_common.c => saverestore/common.c}   |  76 ++-
 .../xg_sr_common.h => saverestore/common.h}   | 253 ++++++-
 .../common_x86.c}                             |   2 +-
 .../common_x86.h}                             |   2 +-
 .../common_x86_pv.c}                          |   2 +-
 .../common_x86_pv.h}                          |   2 +-
 .../nomigrate.c}                              |   0
 .../xg_sr_restore.c => saverestore/restore.c} | 617 +++++++++--------
 tools/libs/saverestore/restore_x86_hvm.c      | 645 ++++++++++++++++++
 .../restore_x86_pv.c}                         |  70 +-
 .../xg_sr_save.c => saverestore/save.c}       | 165 ++---
 .../save_restore.h}                           |   2 -
 .../save_x86_hvm.c}                           |   7 +-
 .../save_x86_pv.c}                            |  33 +-
 .../stream_format.h}                          |   0
 tools/libs/uselibs.mk                         |   4 +-
 tools/ocaml/libs/xl/xenlight_stubs.c          |   3 +-
 tools/xl/xl_cmdtable.c                        |  26 +-
 tools/xl/xl_migrate.c                         |  54 +-
 tools/xl/xl_saverestore.c                     |   3 +-
 47 files changed, 2006 insertions(+), 941 deletions(-)
 create mode 100644 tools/include/xensaverestore.h
 delete mode 100644 tools/libs/guest/xg_sr_restore_x86_hvm.c
 create mode 100644 tools/libs/saverestore/Makefile
 rename tools/libs/{guest/xg_sr_common.c => saverestore/common.c} (72%)
 rename tools/libs/{guest/xg_sr_common.h => saverestore/common.h} (68%)
 rename tools/libs/{guest/xg_sr_common_x86.c => saverestore/common_x86.c} (99%)
 rename tools/libs/{guest/xg_sr_common_x86.h => saverestore/common_x86.h} (98%)
 rename tools/libs/{guest/xg_sr_common_x86_pv.c => saverestore/common_x86_pv.c} (99%)
 rename tools/libs/{guest/xg_sr_common_x86_pv.h => saverestore/common_x86_pv.h} (98%)
 rename tools/libs/{guest/xg_nomigrate.c => saverestore/nomigrate.c} (100%)
 rename tools/libs/{guest/xg_sr_restore.c => saverestore/restore.c} (66%)
 create mode 100644 tools/libs/saverestore/restore_x86_hvm.c
 rename tools/libs/{guest/xg_sr_restore_x86_pv.c => saverestore/restore_x86_pv.c} (94%)
 rename tools/libs/{guest/xg_sr_save.c => saverestore/save.c} (88%)
 rename tools/libs/{guest/xg_save_restore.h => saverestore/save_restore.h} (98%)
 rename tools/libs/{guest/xg_sr_save_x86_hvm.c => saverestore/save_x86_hvm.c} (96%)
 rename tools/libs/{guest/xg_sr_save_x86_pv.c => saverestore/save_x86_pv.c} (97%)
 rename tools/libs/{guest/xg_sr_stream_format.h => saverestore/stream_format.h} (100%)




* [PATCH v20210616 01/36] hotplug/Linux: fix starting of xenstored with restarting systemd
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
@ 2021-06-16 12:50 ` Olaf Hering
  2021-06-16 12:50 ` [PATCH v20210616 02/36] tools: add API to work with sevaral bits at once Olaf Hering
                   ` (35 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:50 UTC
  To: xen-devel; +Cc: Olaf Hering, Ian Jackson, Wei Liu

A hard-to-trigger race between an unrelated systemd service and
xenstored.service unveiled a bug in the way xenstored is launched
with systemd.

launch-xenstore may start either a daemon or a domain. In case a domain
is used, systemd-notify was called. If another service triggered a
restart of systemd while xenstored.service was executed, systemd may
temporarily lose track of services with Type=notify. As a result,
xenstored.service would be marked as failed and units that depend on it
would not be started. This breaks the entire Xen toolstack.

The chain of events is basically: xenstored.service sends the
notification to systemd, which is a one-way event. Then systemd may be
restarted by the other unit. During this time, xenstored.service is done
and exits. Once systemd is done with its restart, it collects the pending
notifications and children. If it does not find the unit which sent the
notification, it will declare it as failed.

A workaround for this scenario is to leave the child processes running
for a short time after sending the "READY=1" notification. If systemd
happens to restart it will still find the unit it launched.
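
As a rough illustration of the one-way notification and the workaround
described above, here is a minimal C sketch. It is not part of this
patch: notify_ready() and notify_ready_and_linger() are hypothetical
names, and the code assumes the documented sd_notify(3) datagram
protocol, where the service manager passes the socket address in
$NOTIFY_SOCKET. The sender writes "READY=1" and never gets a reply, so
the only thing it can do to stay findable is to remain alive for a
while.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: send "READY=1" as one datagram to the AF_UNIX
 * socket named by $NOTIFY_SOCKET. Returns 0 on success, -1 on error.
 * Nothing is read back, the notification is strictly one-way.
 */
static int notify_ready(void)
{
    static const char msg[] = "READY=1";
    const char *path = getenv("NOTIFY_SOCKET");
    struct sockaddr_un sa;
    socklen_t salen;
    size_t plen;
    ssize_t sent;
    int fd;

    /* Only absolute paths and '@' (abstract) addresses are handled. */
    if (!path || (path[0] != '/' && path[0] != '@'))
        return -1;
    plen = strlen(path);
    if (plen >= sizeof(sa.sun_path))
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    memcpy(sa.sun_path, path, plen);
    salen = offsetof(struct sockaddr_un, sun_path) + plen;
    if (path[0] == '@')
        sa.sun_path[0] = '\0';  /* Linux abstract socket namespace */
    else
        salen++;                /* include the terminating NUL */

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    sent = sendto(fd, msg, sizeof(msg) - 1, 0,
                  (struct sockaddr *)&sa, salen);
    close(fd);

    return sent == (ssize_t)(sizeof(msg) - 1) ? 0 : -1;
}

/*
 * Hypothetical helper: notify, then keep the sending process alive for
 * a grace period, so that a restarting service manager can still match
 * the pending notification to a live process.
 */
static int notify_ready_and_linger(unsigned int grace_seconds)
{
    int rc = notify_ready();

    if (rc == 0)
        sleep(grace_seconds);
    return rc;
}
```

The grace period is a tunable in this sketch; how long the sender
really has to linger depends on how long a restart of systemd takes.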

Adjust the callers of launch-xenstore to specify the init system:
With systemd, do not fork xenstored, so that the pid is preserved. This
will also avoid the need for a sleep because the process which sent the
"READY=1" (the previously forked child) is still alive.

Remove the --pid-file in the systemd case because the pid of the child
is known, and the file had probably little effect anyway due to lack of
PidFile= and Type=forking in the unit file.

Be verbose about xenstored startup only with sysv to avoid interleaved
output in the systemd journal. Do the same for the domain case, even if
it is not strictly needed because init-xenstore-domain has no output.

The upstream systemd commit which is supposed to fix this:
575b300b795b6 ("pid1: rework how we dispatch SIGCHLD and other signals")

Signed-off-by: Olaf Hering <olaf@aepfle.de>

--
v04:
- do mkdir unconditionally because init-xenstore-domain writes the domid to
  xenstored.pid
v03:
- remove run_xenstored function, follow style of shell built-in test function
v02:
- preserve Type=notify
---
 tools/hotplug/Linux/init.d/xencommons.in      |  2 +-
 tools/hotplug/Linux/launch-xenstore.in        | 40 ++++++++++++++-----
 .../Linux/systemd/xenstored.service.in        |  2 +-
 3 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/tools/hotplug/Linux/init.d/xencommons.in b/tools/hotplug/Linux/init.d/xencommons.in
index 7fd6903b98..dcb0ce4b73 100644
--- a/tools/hotplug/Linux/init.d/xencommons.in
+++ b/tools/hotplug/Linux/init.d/xencommons.in
@@ -60,7 +60,7 @@ do_start () {
 	mkdir -m700 -p ${XEN_LOCK_DIR}
 	mkdir -p ${XEN_LOG_DIR}
 
-	@XEN_SCRIPT_DIR@/launch-xenstore || exit 1
+	@XEN_SCRIPT_DIR@/launch-xenstore 'sysv' || exit 1
 
 	echo Setting domain 0 name, domid and JSON config...
 	${LIBEXEC_BIN}/xen-init-dom0 ${XEN_DOM0_UUID}
diff --git a/tools/hotplug/Linux/launch-xenstore.in b/tools/hotplug/Linux/launch-xenstore.in
index 019f9d6f4d..d40c66482a 100644
--- a/tools/hotplug/Linux/launch-xenstore.in
+++ b/tools/hotplug/Linux/launch-xenstore.in
@@ -15,6 +15,17 @@
 # License along with this library; If not, see <http://www.gnu.org/licenses/>.
 #
 
+initd=$1
+
+case "$initd" in
+	sysv) nonl='-n' ;;
+	systemd) nonl= ;;
+	*)
+	echo "first argument must be 'sysv' or 'systemd'"
+	exit 1
+	;;
+esac
+
 XENSTORED=@XENSTORED@
 
 . @XEN_SCRIPT_DIR@/hotplugpath.sh
@@ -44,14 +55,16 @@ timeout_xenstore () {
 	return 0
 }
 
-test_xenstore && exit 0
+mkdir -p @XEN_RUN_DIR@
+
+if test "$initd" = 'sysv' ; then
+	test_xenstore && exit 0
+fi
 
 test -f @CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons && . @CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons
 
 [ "$XENSTORETYPE" = "" ] && XENSTORETYPE=daemon
 
-/bin/mkdir -p @XEN_RUN_DIR@
-
 [ "$XENSTORETYPE" = "daemon" ] && {
 	[ -z "$XENSTORED_TRACE" ] || XENSTORED_ARGS="$XENSTORED_ARGS -T @XEN_LOG_DIR@/xenstored-trace.log"
 	[ -z "$XENSTORED" ] && XENSTORED=@XENSTORED@
@@ -59,13 +72,15 @@ test -f @CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons && . @CONFIG_DIR@/@CONFIG_LEAF
 		echo "No xenstored found"
 		exit 1
 	}
+	[ "$initd" = 'sysv' ] && {
+		echo $nonl Starting $XENSTORED...
+		$XENSTORED --pid-file @XEN_RUN_DIR@/xenstored.pid $XENSTORED_ARGS
+		timeout_xenstore $XENSTORED || exit 1
+		exit 0
+	}
 
-	echo -n Starting $XENSTORED...
-	$XENSTORED --pid-file @XEN_RUN_DIR@/xenstored.pid $XENSTORED_ARGS
-
-	systemd-notify --booted 2>/dev/null || timeout_xenstore $XENSTORED || exit 1
-
-	exit 0
+	exec $XENSTORED -N $XENSTORED_ARGS
+	exit 1
 }
 
 [ "$XENSTORETYPE" = "domain" ] && {
@@ -75,9 +90,12 @@ test -f @CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons && . @CONFIG_DIR@/@CONFIG_LEAF
 	XENSTORE_DOMAIN_ARGS="$XENSTORE_DOMAIN_ARGS --memory $XENSTORE_DOMAIN_SIZE"
 	[ -z "$XENSTORE_MAX_DOMAIN_SIZE" ] || XENSTORE_DOMAIN_ARGS="$XENSTORE_DOMAIN_ARGS --maxmem $XENSTORE_MAX_DOMAIN_SIZE"
 
-	echo -n Starting $XENSTORE_DOMAIN_KERNEL...
+	echo $nonl Starting $XENSTORE_DOMAIN_KERNEL...
 	${LIBEXEC_BIN}/init-xenstore-domain $XENSTORE_DOMAIN_ARGS || exit 1
-	systemd-notify --ready 2>/dev/null
+	[ "$initd" = 'systemd' ] && {
+		systemd-notify --ready
+		sleep 9
+	}
 
 	exit 0
 }
diff --git a/tools/hotplug/Linux/systemd/xenstored.service.in b/tools/hotplug/Linux/systemd/xenstored.service.in
index 80c1d408a5..c226eb3635 100644
--- a/tools/hotplug/Linux/systemd/xenstored.service.in
+++ b/tools/hotplug/Linux/systemd/xenstored.service.in
@@ -11,7 +11,7 @@ Type=notify
 NotifyAccess=all
 RemainAfterExit=true
 ExecStartPre=/bin/grep -q control_d /proc/xen/capabilities
-ExecStart=@XEN_SCRIPT_DIR@/launch-xenstore
+ExecStart=@XEN_SCRIPT_DIR@/launch-xenstore 'systemd'
 
 [Install]
 WantedBy=multi-user.target



* [PATCH v20210616 02/36] tools: add API to work with sevaral bits at once
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
  2021-06-16 12:50 ` [PATCH v20210616 01/36] hotplug/Linux: fix starting of xenstored with restarting systemd Olaf Hering
@ 2021-06-16 12:50 ` Olaf Hering
  2021-06-16 12:50 ` [PATCH v20210616 03/36] xl: fix description of migrate --debug Olaf Hering
                   ` (34 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:50 UTC
  To: xen-devel; +Cc: Olaf Hering, Ian Jackson, Wei Liu, Juergen Gross

Introduce new API to test if a fixed number of bits is clear or set,
and clear or set them all at once.

The caller has to make sure the input bit number is a multiple of BITS_PER_LONG.

This API avoids the loop over each bit in a known range just to see
if all of them are either clear or set.
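
To illustrate the difference, the following self-contained sketch
repeats two of the helpers from this patch next to the per-bit loop they
are meant to replace. BITS_PER_LONG is a local stand-in for the macro
from xc_bitops.h, the constant is spelled ~0UL here, and the assert(),
all_bits_set_slow() and demo_bit_long() are hypothetical additions for
this example only.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Local stand-in for the macro provided by xc_bitops.h. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* True if the whole word containing bit nr_base is set. */
static inline bool test_bit_long_set(unsigned long nr_base, const void *_addr)
{
    const unsigned long *addr = _addr;

    /* Caller contract from the commit message, checked only in this sketch. */
    assert(nr_base % BITS_PER_LONG == 0);
    return addr[nr_base / BITS_PER_LONG] == ~0UL;
}

/* True if the whole word containing bit nr_base is clear. */
static inline bool test_bit_long_clear(unsigned long nr_base, const void *_addr)
{
    const unsigned long *addr = _addr;

    assert(nr_base % BITS_PER_LONG == 0);
    return addr[nr_base / BITS_PER_LONG] == 0;
}

/* Set all bits of the word containing bit nr_base. */
static inline void set_bit_long(unsigned long nr_base, void *_addr)
{
    unsigned long *addr = _addr;

    assert(nr_base % BITS_PER_LONG == 0);
    addr[nr_base / BITS_PER_LONG] = ~0UL;
}

/* Clear all bits of the word containing bit nr_base. */
static inline void clear_bit_long(unsigned long nr_base, void *_addr)
{
    unsigned long *addr = _addr;

    assert(nr_base % BITS_PER_LONG == 0);
    addr[nr_base / BITS_PER_LONG] = 0;
}

/* Hypothetical per-bit reference: the loop the new API avoids. */
static bool all_bits_set_slow(unsigned long nr_base, const void *_addr)
{
    const unsigned long *addr = _addr;
    unsigned long i;

    for (i = 0; i < BITS_PER_LONG; i++) {
        unsigned long nr = nr_base + i;

        if (!((addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL))
            return false;
    }
    return true;
}

/* Hypothetical demo: word-wise and per-bit results have to agree. */
static bool demo_bit_long(void)
{
    unsigned long map[2] = { 0, 0 };
    bool ok;

    set_bit_long(BITS_PER_LONG, map);   /* fill the second word only */
    ok = test_bit_long_clear(0, map) &&
         test_bit_long_set(BITS_PER_LONG, map) &&
         all_bits_set_slow(BITS_PER_LONG, map) &&
         !all_bits_set_slow(0, map);

    clear_bit_long(BITS_PER_LONG, map);
    return ok && test_bit_long_clear(BITS_PER_LONG, map);
}
```

The word-wise variants need a single comparison or store per
BITS_PER_LONG bits, while the loop touches every bit individually.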

Signed-off-by: Olaf Hering <olaf@aepfle.de>

v02:
- change return type from int to bool (jgross)
---
 tools/libs/ctrl/xc_bitops.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/tools/libs/ctrl/xc_bitops.h b/tools/libs/ctrl/xc_bitops.h
index f0bac4a071..8e8c6efb45 100644
--- a/tools/libs/ctrl/xc_bitops.h
+++ b/tools/libs/ctrl/xc_bitops.h
@@ -3,6 +3,7 @@
 
 /* bitmap operations for single threaded access */
 
+#include <stdbool.h>
 #include <stdlib.h>
 #include <string.h>
 
@@ -77,4 +78,31 @@ static inline void bitmap_or(void *_dst, const void *_other,
         dst[i] |= other[i];
 }
 
+static inline bool test_bit_long_set(unsigned long nr_base, const void *_addr)
+{
+    const unsigned long *addr = _addr;
+    unsigned long val = addr[nr_base / BITS_PER_LONG];
+
+    return val == ~0;
+}
+
+static inline bool test_bit_long_clear(unsigned long nr_base, const void *_addr)
+{
+    const unsigned long *addr = _addr;
+    unsigned long val = addr[nr_base / BITS_PER_LONG];
+
+    return val == 0;
+}
+
+static inline void clear_bit_long(unsigned long nr_base, void *_addr)
+{
+    unsigned long *addr = _addr;
+    addr[nr_base / BITS_PER_LONG] = 0;
+}
+
+static inline void set_bit_long(unsigned long nr_base, void *_addr)
+{
+    unsigned long *addr = _addr;
+    addr[nr_base / BITS_PER_LONG] = ~0;
+}
 #endif  /* XC_BITOPS_H */



* [PATCH v20210616 03/36] xl: fix description of migrate --debug
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
  2021-06-16 12:50 ` [PATCH v20210616 01/36] hotplug/Linux: fix starting of xenstored with restarting systemd Olaf Hering
  2021-06-16 12:50 ` [PATCH v20210616 02/36] tools: add API to work with sevaral bits at once Olaf Hering
@ 2021-06-16 12:50 ` Olaf Hering
  2021-06-16 12:50 ` [PATCH v20210616 04/36] tools: create libxensaverestore Olaf Hering
                   ` (33 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:50 UTC
  To: xen-devel
  Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu, Anthony PERARD

xl migrate --debug used to track every pfn in every batch of pages.
But these times are gone. The code in xc_domain_save is the consumer
of this knob, but it considers it only for the Remus and COLO cases.

Adjust the help text to tell what --debug does today: Nothing.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>

v02:
- the option has no effect anymore
---
 docs/man/xl.1.pod.in   | 2 +-
 tools/xl/xl_cmdtable.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index e2176bd696..70a6ebf438 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -481,7 +481,7 @@ domain.
 
 =item B<--debug>
 
-Display huge (!) amount of debug information during the migration process.
+This option has no effect. It is preserved for compatibility reasons.
 
 =item B<-p>
 
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 661323d488..ca1dfa3525 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -172,7 +172,7 @@ const struct cmd_spec cmd_table[] = {
       "                migrate-receive [-d -e]\n"
       "-e              Do not wait in the background (on <host>) for the death\n"
       "                of the domain.\n"
-      "--debug         Print huge (!) amount of debug during the migration process.\n"
+      "--debug         Ignored.\n"
       "-p              Do not unpause domain after migrating it.\n"
       "-D              Preserve the domain id"
     },



* [PATCH v20210616 04/36] tools: create libxensaverestore
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (2 preceding siblings ...)
  2021-06-16 12:50 ` [PATCH v20210616 03/36] xl: fix description of migrate --debug Olaf Hering
@ 2021-06-16 12:50 ` Olaf Hering
  2021-06-16 15:17   ` Olaf Hering
  2021-06-16 12:50 ` [PATCH v20210616 05/36] MAINTAINERS: add myself as saverestore maintainer Olaf Hering
                   ` (32 subsequent siblings)
  36 siblings, 1 reply; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:50 UTC
  To: xen-devel
  Cc: Olaf Hering, Wei Liu, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Julien Grall, Stefano Stabellini, Juergen Gross,
	Anthony PERARD

Move all save/restore related code from libxenguest.so into a separate
library libxensaverestore.so. The only consumer is libxl-save-helper.
There is no need to have the moved code mapped all the time in binaries
where libxenguest.so is used.

According to size(1) the change is:
   text	   data	    bss	    dec	    hex	filename
 187183	   4304	     48	 191535	  2ec2f	guest/libxenguest.so.4.15.0

 124106	   3376	     48	 127530	  1f22a	guest/libxenguest.so.4.15.0
  67841	   1872	      8	  69721	  11059	saverestore/libxensaverestore.so.4.15.0

While touching the files anyway, take the opportunity to drop the
redundant xg_sr_ filename prefix.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wl@xen.org>

v5:
- fix spelling in description
v4:
- drop xg_ prefix from filenames (jgross)
- drop sr_ prefix from filenames (jbeulich)
v3:
- repost in time for 4.16
v2:
- copy also license header
- move xg_nomigrate.c
- add size(1) output to commit msg
- remove change from libxl_create.c
---
 .gitignore                                    |   2 +
 tools/include/xenguest.h                      | 186 ----------------
 tools/include/xensaverestore.h                | 208 ++++++++++++++++++
 tools/libs/Makefile                           |   1 +
 tools/libs/guest/Makefile                     |  11 -
 tools/libs/guest/xg_offline_page.c            |   1 -
 tools/libs/light/Makefile                     |   4 +-
 tools/libs/light/libxl_internal.h             |   1 +
 tools/libs/light/libxl_save_helper.c          |   1 +
 tools/libs/light/libxl_save_msgs_gen.pl       |   2 +-
 tools/libs/saverestore/Makefile               |  38 ++++
 .../xg_sr_common.c => saverestore/common.c}   |   2 +-
 .../xg_sr_common.h => saverestore/common.h}   |  16 +-
 .../common_x86.c}                             |   2 +-
 .../common_x86.h}                             |   2 +-
 .../common_x86_pv.c}                          |   2 +-
 .../common_x86_pv.h}                          |   2 +-
 .../nomigrate.c}                              |   0
 .../xg_sr_restore.c => saverestore/restore.c} |   2 +-
 .../restore_x86_hvm.c}                        |   2 +-
 .../restore_x86_pv.c}                         |   2 +-
 .../xg_sr_save.c => saverestore/save.c}       |   2 +-
 .../save_restore.h}                           |   2 -
 .../save_x86_hvm.c}                           |   2 +-
 .../save_x86_pv.c}                            |   2 +-
 .../stream_format.h}                          |   0
 tools/libs/uselibs.mk                         |   4 +-
 27 files changed, 282 insertions(+), 217 deletions(-)
 create mode 100644 tools/include/xensaverestore.h
 create mode 100644 tools/libs/saverestore/Makefile
 rename tools/libs/{guest/xg_sr_common.c => saverestore/common.c} (99%)
 rename tools/libs/{guest/xg_sr_common.h => saverestore/common.h} (98%)
 rename tools/libs/{guest/xg_sr_common_x86.c => saverestore/common_x86.c} (99%)
 rename tools/libs/{guest/xg_sr_common_x86.h => saverestore/common_x86.h} (98%)
 rename tools/libs/{guest/xg_sr_common_x86_pv.c => saverestore/common_x86_pv.c} (99%)
 rename tools/libs/{guest/xg_sr_common_x86_pv.h => saverestore/common_x86_pv.h} (98%)
 rename tools/libs/{guest/xg_nomigrate.c => saverestore/nomigrate.c} (100%)
 rename tools/libs/{guest/xg_sr_restore.c => saverestore/restore.c} (99%)
 rename tools/libs/{guest/xg_sr_restore_x86_hvm.c => saverestore/restore_x86_hvm.c} (99%)
 rename tools/libs/{guest/xg_sr_restore_x86_pv.c => saverestore/restore_x86_pv.c} (99%)
 rename tools/libs/{guest/xg_sr_save.c => saverestore/save.c} (99%)
 rename tools/libs/{guest/xg_save_restore.h => saverestore/save_restore.h} (98%)
 rename tools/libs/{guest/xg_sr_save_x86_hvm.c => saverestore/save_x86_hvm.c} (99%)
 rename tools/libs/{guest/xg_sr_save_x86_pv.c => saverestore/save_x86_pv.c} (99%)
 rename tools/libs/{guest/xg_sr_stream_format.h => saverestore/stream_format.h} (100%)

diff --git a/.gitignore b/.gitignore
index 38a085e398..08a321e995 100644
--- a/.gitignore
+++ b/.gitignore
@@ -147,6 +147,8 @@ tools/libs/light/test_timedereg
 tools/libs/light/test_fdderegrace
 tools/libs/light/tmp.*
 tools/libs/light/xenlight.pc
+tools/libs/saverestore/libxensaverestore.map
+tools/libs/saverestore/xensaverestore.pc
 tools/libs/stat/_paths.h
 tools/libs/stat/headers.chk
 tools/libs/stat/libxenstat.map
diff --git a/tools/include/xenguest.h b/tools/include/xenguest.h
index 61d0a82f48..7417675b3b 100644
--- a/tools/include/xenguest.h
+++ b/tools/include/xenguest.h
@@ -24,9 +24,6 @@
 
 #define XC_NUMA_NO_NODE   (~0U)
 
-#define XCFLAGS_LIVE      (1 << 0)
-#define XCFLAGS_DEBUG     (1 << 1)
-
 #define X86_64_B_SIZE   64 
 #define X86_32_B_SIZE   32
 
@@ -433,189 +430,6 @@ static inline xen_pfn_t xc_dom_p2m(struct xc_dom_image *dom, xen_pfn_t pfn)
  */
 struct xenevtchn_handle;
 
-/* For save's precopy_policy(). */
-struct precopy_stats
-{
-    unsigned int iteration;
-    unsigned long total_written;
-    long dirty_count; /* -1 if unknown */
-};
-
-/*
- * A precopy_policy callback may not be running in the same address
- * space as libxc an so precopy_stats is passed by value.
- */
-typedef int (*precopy_policy_t)(struct precopy_stats, void *);
-
-/* callbacks provided by xc_domain_save */
-struct save_callbacks {
-    /*
-     * Called after expiration of checkpoint interval,
-     * to suspend the guest.
-     */
-    int (*suspend)(void *data);
-
-    /*
-     * Called before and after every batch of page data sent during
-     * the precopy phase of a live migration to ask the caller what
-     * to do next based on the current state of the precopy migration.
-     *
-     * Should return one of the values listed below:
-     */
-#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely
-                                        * and tidy up. */
-#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
-#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
-                                        * remaining dirty pages. */
-    precopy_policy_t precopy_policy;
-
-    /*
-     * Called after the guest's dirty pages have been
-     *  copied into an output buffer.
-     * Callback function resumes the guest & the device model,
-     *  returns to xc_domain_save.
-     * xc_domain_save then flushes the output buffer, while the
-     *  guest continues to run.
-     */
-    int (*postcopy)(void *data);
-
-    /*
-     * Called after the memory checkpoint has been flushed
-     * out into the network. Typical actions performed in this
-     * callback include:
-     *   (a) send the saved device model state (for HVM guests),
-     *   (b) wait for checkpoint ack
-     *   (c) release the network output buffer pertaining to the acked checkpoint.
-     *   (c) sleep for the checkpoint interval.
-     *
-     * returns:
-     * 0: terminate checkpointing gracefully
-     * 1: take another checkpoint
-     */
-    int (*checkpoint)(void *data);
-
-    /*
-     * Called after the checkpoint callback.
-     *
-     * returns:
-     * 0: terminate checkpointing gracefully
-     * 1: take another checkpoint
-     */
-    int (*wait_checkpoint)(void *data);
-
-    /* Enable qemu-dm logging dirty pages to xen */
-    int (*switch_qemu_logdirty)(uint32_t domid, unsigned enable, void *data); /* HVM only */
-
-    /* to be provided as the last argument to each callback function */
-    void *data;
-};
-
-/* Type of stream.  Plain, or using a continuous replication protocol? */
-typedef enum {
-    XC_STREAM_PLAIN,
-    XC_STREAM_REMUS,
-    XC_STREAM_COLO,
-} xc_stream_type_t;
-
-/**
- * This function will save a running domain.
- *
- * @param xch a handle to an open hypervisor interface
- * @param io_fd the file descriptor to save a domain to
- * @param dom the id of the domain
- * @param flags XCFLAGS_xxx
- * @param stream_type XC_STREAM_PLAIN if the far end of the stream
- *        doesn't use checkpointing
- * @param recv_fd Only used for XC_STREAM_COLO.  Contains backchannel from
- *        the destination side.
- * @return 0 on success, -1 on failure
- */
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
-                   uint32_t flags, struct save_callbacks *callbacks,
-                   xc_stream_type_t stream_type, int recv_fd);
-
-/* callbacks provided by xc_domain_restore */
-struct restore_callbacks {
-    /*
-     * Called once the STATIC_DATA_END record has been received/inferred.
-     *
-     * For compatibility with older streams, provides a list of static data
-     * expected to be found in the stream, which was missing.  A higher level
-     * toolstack is responsible for providing any necessary compatibiltiy.
-     */
-#define XGR_SDD_MISSING_CPUID (1 << 0)
-#define XGR_SDD_MISSING_MSR   (1 << 1)
-    int (*static_data_done)(unsigned int missing, void *data);
-
-    /* Called after a new checkpoint to suspend the guest. */
-    int (*suspend)(void *data);
-
-    /*
-     * Called after the secondary vm is ready to resume.
-     * Callback function resumes the guest & the device model,
-     * returns to xc_domain_restore.
-     */
-    int (*postcopy)(void *data);
-
-    /*
-     * A checkpoint record has been found in the stream.
-     * returns:
-     */
-#define XGR_CHECKPOINT_ERROR    0 /* Terminate processing */
-#define XGR_CHECKPOINT_SUCCESS  1 /* Continue reading more data from the stream */
-#define XGR_CHECKPOINT_FAILOVER 2 /* Failover and resume VM */
-    int (*checkpoint)(void *data);
-
-    /*
-     * Called after the checkpoint callback.
-     *
-     * returns:
-     * 0: terminate checkpointing gracefully
-     * 1: take another checkpoint
-     */
-    int (*wait_checkpoint)(void *data);
-
-    /*
-     * callback to send store gfn and console gfn to xl
-     * if we want to resume vm before xc_domain_save()
-     * exits.
-     */
-    void (*restore_results)(xen_pfn_t store_gfn, xen_pfn_t console_gfn,
-                            void *data);
-
-    /* to be provided as the last argument to each callback function */
-    void *data;
-};
-
-/**
- * This function will restore a saved domain.
- *
- * Domain is restored in a suspended state ready to be unpaused.
- *
- * @param xch a handle to an open hypervisor interface
- * @param io_fd the file descriptor to restore a domain from
- * @param dom the id of the domain
- * @param store_evtchn the xenstore event channel for this domain to use
- * @param store_mfn filled with the gfn of the store page
- * @param store_domid the backend domain for xenstore
- * @param console_evtchn the console event channel for this domain to use
- * @param console_mfn filled with the gfn of the console page
- * @param console_domid the backend domain for xenconsole
- * @param stream_type XC_STREAM_PLAIN if the far end of the stream is using
- *        checkpointing
- * @param callbacks non-NULL to receive a callback to restore toolstack
- *        specific data
- * @param send_back_fd Only used for XC_STREAM_COLO.  Contains backchannel to
- *        the source side.
- * @return 0 on success, -1 on failure
- */
-int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
-                      unsigned int store_evtchn, unsigned long *store_mfn,
-                      uint32_t store_domid, unsigned int console_evtchn,
-                      unsigned long *console_mfn, uint32_t console_domid,
-                      xc_stream_type_t stream_type,
-                      struct restore_callbacks *callbacks, int send_back_fd);
-
 /**
  * This function will create a domain for a paravirtualized Linux
  * using file names pointing to kernel and ramdisk
diff --git a/tools/include/xensaverestore.h b/tools/include/xensaverestore.h
new file mode 100644
index 0000000000..0410f0469e
--- /dev/null
+++ b/tools/include/xensaverestore.h
@@ -0,0 +1,208 @@
+/******************************************************************************
+ * A library for guest domain save/restore/migration in Xen.
+ *
+ * Copyright (c) 2003-2004, K A Fraser.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef XENSAVERESTORE_H
+#define XENSAVERESTORE_H
+
+#define XCFLAGS_LIVE      (1 << 0)
+#define XCFLAGS_DEBUG     (1 << 1)
+
+/* For save's precopy_policy(). */
+struct precopy_stats
+{
+    unsigned int iteration;
+    unsigned long total_written;
+    long dirty_count; /* -1 if unknown */
+};
+
+/*
+ * A precopy_policy callback may not be running in the same address
+ * space as libxc an so precopy_stats is passed by value.
+ */
+typedef int (*precopy_policy_t)(struct precopy_stats, void *);
+
+/* callbacks provided by xc_domain_save */
+struct save_callbacks {
+    /*
+     * Called after expiration of checkpoint interval,
+     * to suspend the guest.
+     */
+    int (*suspend)(void *data);
+
+    /*
+     * Called before and after every batch of page data sent during
+     * the precopy phase of a live migration to ask the caller what
+     * to do next based on the current state of the precopy migration.
+     *
+     * Should return one of the values listed below:
+     */
+#define XGS_POLICY_ABORT          (-1) /* Abandon the migration entirely
+                                        * and tidy up. */
+#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
+#define XGS_POLICY_STOP_AND_COPY    1  /* Immediately suspend and transmit the
+                                        * remaining dirty pages. */
+    precopy_policy_t precopy_policy;
+
+    /*
+     * Called after the guest's dirty pages have been
+     *  copied into an output buffer.
+     * Callback function resumes the guest & the device model,
+     *  returns to xc_domain_save.
+     * xc_domain_save then flushes the output buffer, while the
+     *  guest continues to run.
+     */
+    int (*postcopy)(void *data);
+
+    /*
+     * Called after the memory checkpoint has been flushed
+     * out into the network. Typical actions performed in this
+     * callback include:
+     *   (a) send the saved device model state (for HVM guests),
+     *   (b) wait for checkpoint ack
+     *   (c) release the network output buffer pertaining to the acked checkpoint.
+     *   (d) sleep for the checkpoint interval.
+     *
+     * returns:
+     * 0: terminate checkpointing gracefully
+     * 1: take another checkpoint
+     */
+    int (*checkpoint)(void *data);
+
+    /*
+     * Called after the checkpoint callback.
+     *
+     * returns:
+     * 0: terminate checkpointing gracefully
+     * 1: take another checkpoint
+     */
+    int (*wait_checkpoint)(void *data);
+
+    /* Enable qemu-dm logging dirty pages to xen */
+    int (*switch_qemu_logdirty)(uint32_t domid, unsigned enable, void *data); /* HVM only */
+
+    /* to be provided as the last argument to each callback function */
+    void *data;
+};
+
+/* Type of stream.  Plain, or using a continuous replication protocol? */
+typedef enum {
+    XC_STREAM_PLAIN,
+    XC_STREAM_REMUS,
+    XC_STREAM_COLO,
+} xc_stream_type_t;
+
+/**
+ * This function will save a running domain.
+ *
+ * @param xch a handle to an open hypervisor interface
+ * @param io_fd the file descriptor to save a domain to
+ * @param dom the id of the domain
+ * @param flags XCFLAGS_xxx
+ * @param stream_type XC_STREAM_PLAIN if the far end of the stream
+ *        doesn't use checkpointing
+ * @param recv_fd Only used for XC_STREAM_COLO.  Contains backchannel from
+ *        the destination side.
+ * @return 0 on success, -1 on failure
+ */
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
+                   uint32_t flags, struct save_callbacks *callbacks,
+                   xc_stream_type_t stream_type, int recv_fd);
+
+/* callbacks provided by xc_domain_restore */
+struct restore_callbacks {
+    /*
+     * Called once the STATIC_DATA_END record has been received/inferred.
+     *
+     * For compatibility with older streams, provides a list of static data
+     * expected to be found in the stream, which was missing.  A higher level
+     * toolstack is responsible for providing any necessary compatibility.
+     */
+#define XGR_SDD_MISSING_CPUID (1 << 0)
+#define XGR_SDD_MISSING_MSR   (1 << 1)
+    int (*static_data_done)(unsigned int missing, void *data);
+
+    /* Called after a new checkpoint to suspend the guest. */
+    int (*suspend)(void *data);
+
+    /*
+     * Called after the secondary vm is ready to resume.
+     * Callback function resumes the guest & the device model,
+     * returns to xc_domain_restore.
+     */
+    int (*postcopy)(void *data);
+
+    /*
+     * A checkpoint record has been found in the stream.
+     * returns:
+     */
+#define XGR_CHECKPOINT_ERROR    0 /* Terminate processing */
+#define XGR_CHECKPOINT_SUCCESS  1 /* Continue reading more data from the stream */
+#define XGR_CHECKPOINT_FAILOVER 2 /* Failover and resume VM */
+    int (*checkpoint)(void *data);
+
+    /*
+     * Called after the checkpoint callback.
+     *
+     * returns:
+     * 0: terminate checkpointing gracefully
+     * 1: take another checkpoint
+     */
+    int (*wait_checkpoint)(void *data);
+
+    /*
+     * callback to send store gfn and console gfn to xl
+     * if we want to resume vm before xc_domain_restore()
+     * exits.
+     */
+    void (*restore_results)(xen_pfn_t store_gfn, xen_pfn_t console_gfn,
+                            void *data);
+
+    /* to be provided as the last argument to each callback function */
+    void *data;
+};
+
+/**
+ * This function will restore a saved domain.
+ *
+ * Domain is restored in a suspended state ready to be unpaused.
+ *
+ * @param xch a handle to an open hypervisor interface
+ * @param io_fd the file descriptor to restore a domain from
+ * @param dom the id of the domain
+ * @param store_evtchn the xenstore event channel for this domain to use
+ * @param store_mfn filled with the gfn of the store page
+ * @param store_domid the backend domain for xenstore
+ * @param console_evtchn the console event channel for this domain to use
+ * @param console_mfn filled with the gfn of the console page
+ * @param console_domid the backend domain for xenconsole
+ * @param stream_type XC_STREAM_PLAIN if the far end of the stream is not
+ *        using checkpointing
+ * @param callbacks non-NULL to receive a callback to restore toolstack
+ *        specific data
+ * @param send_back_fd Only used for XC_STREAM_COLO.  Contains backchannel to
+ *        the source side.
+ * @return 0 on success, -1 on failure
+ */
+int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
+                      unsigned int store_evtchn, unsigned long *store_mfn,
+                      uint32_t store_domid, unsigned int console_evtchn,
+                      unsigned long *console_mfn, uint32_t console_domid,
+                      xc_stream_type_t stream_type,
+                      struct restore_callbacks *callbacks, int send_back_fd);
+
+#endif /* XENSAVERESTORE_H */
diff --git a/tools/libs/Makefile b/tools/libs/Makefile
index 1afcd12e2b..ca43c66777 100644
--- a/tools/libs/Makefile
+++ b/tools/libs/Makefile
@@ -12,6 +12,7 @@ SUBDIRS-y += devicemodel
 SUBDIRS-y += ctrl
 SUBDIRS-y += guest
 SUBDIRS-y += hypfs
+SUBDIRS-y += saverestore
 SUBDIRS-y += store
 SUBDIRS-y += stat
 SUBDIRS-$(CONFIG_Linux) += vchan
diff --git a/tools/libs/guest/Makefile b/tools/libs/guest/Makefile
index 2ce92d247e..4cf5459bb1 100644
--- a/tools/libs/guest/Makefile
+++ b/tools/libs/guest/Makefile
@@ -11,18 +11,7 @@ SRCS-y += xg_domain.c
 SRCS-y += xg_suspend.c
 SRCS-y += xg_resume.c
 ifeq ($(CONFIG_MIGRATE),y)
-SRCS-y += xg_sr_common.c
-SRCS-$(CONFIG_X86) += xg_sr_common_x86.c
-SRCS-$(CONFIG_X86) += xg_sr_common_x86_pv.c
-SRCS-$(CONFIG_X86) += xg_sr_restore_x86_pv.c
-SRCS-$(CONFIG_X86) += xg_sr_restore_x86_hvm.c
-SRCS-$(CONFIG_X86) += xg_sr_save_x86_pv.c
-SRCS-$(CONFIG_X86) += xg_sr_save_x86_hvm.c
-SRCS-y += xg_sr_restore.c
-SRCS-y += xg_sr_save.c
 SRCS-y += xg_offline_page.c
-else
-SRCS-y += xg_nomigrate.c
 endif
 SRCS-y       += xg_core.c
 SRCS-$(CONFIG_X86) += xg_core_x86.c
diff --git a/tools/libs/guest/xg_offline_page.c b/tools/libs/guest/xg_offline_page.c
index cfe0e2d537..92b65243b1 100644
--- a/tools/libs/guest/xg_offline_page.c
+++ b/tools/libs/guest/xg_offline_page.c
@@ -29,7 +29,6 @@
 
 #include "xc_private.h"
 #include "xg_private.h"
-#include "xg_save_restore.h"
 
 struct pte_backup_entry
 {
diff --git a/tools/libs/light/Makefile b/tools/libs/light/Makefile
index 7d8c51d492..68e51dd13c 100644
--- a/tools/libs/light/Makefile
+++ b/tools/libs/light/Makefile
@@ -179,7 +179,7 @@ $(ACPI_OBJS) $(ACPI_PIC_OBJS): CFLAGS += -I. -DLIBACPI_STDUTILS=\"$(CURDIR)/libx
 $(TEST_PROG_OBJS) _libxl.api-for-check: CFLAGS += $(CFLAGS_libxentoollog) $(CFLAGS_libxentoolcore)
 libxl_dom.o libxl_dom.opic: CFLAGS += -I$(XEN_ROOT)/tools  # include libacpi/x86.h
 libxl_x86_acpi.o libxl_x86_acpi.opic: CFLAGS += -I$(XEN_ROOT)/tools
-$(SAVE_HELPER_OBJS): CFLAGS += $(CFLAGS_libxenctrl) $(CFLAGS_libxenevtchn) $(CFLAGS_libxenguest)
+$(SAVE_HELPER_OBJS): CFLAGS += $(CFLAGS_libxenctrl) $(CFLAGS_libxenevtchn) $(CFLAGS_libxensaverestore)
 
 testidl.o: CFLAGS += $(CFLAGS_libxenctrl) $(CFLAGS_libxenlight)
 testidl.c: libxl_types.idl gentest.py $(XEN_INCLUDE)/libxl.h $(AUTOINCS)
@@ -241,7 +241,7 @@ test_%: test_%.o test_common.o libxenlight_test.so
 	$(CC) $(LDFLAGS) -o $@ $^ $(filter-out %libxenlight.so, $(LDLIBS_libxenlight)) $(LDLIBS_libxentoollog) $(LDLIBS_libxentoolcore) -lyajl $(APPEND_LDFLAGS)
 
 libxl-save-helper: $(SAVE_HELPER_OBJS) libxenlight.so
-	$(CC) $(LDFLAGS) -o $@ $(SAVE_HELPER_OBJS) $(LDLIBS_libxentoollog) $(LDLIBS_libxenctrl) $(LDLIBS_libxenguest) $(LDLIBS_libxentoolcore) $(APPEND_LDFLAGS)
+	$(CC) $(LDFLAGS) -o $@ $(SAVE_HELPER_OBJS) $(LDLIBS_libxentoollog) $(LDLIBS_libxenctrl) $(LDLIBS_libxensaverestore) $(LDLIBS_libxentoolcore) $(APPEND_LDFLAGS)
 
 testidl: testidl.o libxenlight.so
 	$(CC) $(LDFLAGS) -o $@ testidl.o $(LDLIBS_libxenlight) $(LDLIBS_libxentoollog) $(LDLIBS_libxentoolcore) $(APPEND_LDFLAGS)
diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
index 0b4671318c..439c654733 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -56,6 +56,7 @@
 #define XC_WANT_COMPAT_MAP_FOREIGN_API
 #include <xenctrl.h>
 #include <xenguest.h>
+#include <xensaverestore.h>
 #include <xenhypfs.h>
 
 #include <xen-tools/libs.h>
diff --git a/tools/libs/light/libxl_save_helper.c b/tools/libs/light/libxl_save_helper.c
index 65dff389bf..896e845a2f 100644
--- a/tools/libs/light/libxl_save_helper.c
+++ b/tools/libs/light/libxl_save_helper.c
@@ -48,6 +48,7 @@
 
 #include "xenctrl.h"
 #include "xenguest.h"
+#include "xensaverestore.h"
 #include "_libxl_save_msgs_helper.h"
 
 /*----- logger -----*/
diff --git a/tools/libs/light/libxl_save_msgs_gen.pl b/tools/libs/light/libxl_save_msgs_gen.pl
index 9d425b1dee..f263ee01bb 100755
--- a/tools/libs/light/libxl_save_msgs_gen.pl
+++ b/tools/libs/light/libxl_save_msgs_gen.pl
@@ -72,7 +72,7 @@ END_BOTH
 END_CALLOUT
 
 #include <xenctrl.h>
-#include <xenguest.h>
+#include <xensaverestore.h>
 #include "_libxl_save_msgs_${ah}.h"
 
 END_HELPER
diff --git a/tools/libs/saverestore/Makefile b/tools/libs/saverestore/Makefile
new file mode 100644
index 0000000000..48728b3be2
--- /dev/null
+++ b/tools/libs/saverestore/Makefile
@@ -0,0 +1,38 @@
+XEN_ROOT = $(CURDIR)/../../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+ifeq ($(CONFIG_MIGRATE),y)
+SRCS-y += common.c
+SRCS-$(CONFIG_X86) += common_x86.c
+SRCS-$(CONFIG_X86) += common_x86_pv.c
+SRCS-$(CONFIG_X86) += restore_x86_pv.c
+SRCS-$(CONFIG_X86) += restore_x86_hvm.c
+SRCS-$(CONFIG_X86) += save_x86_pv.c
+SRCS-$(CONFIG_X86) += save_x86_hvm.c
+SRCS-y += restore.c
+SRCS-y += save.c
+else
+SRCS-y += nomigrate.c
+endif
+
+CFLAGS += -I$(XEN_libxenctrl)
+CFLAGS += -I$(XEN_libxenguest)
+
+-include $(XEN_TARGET_ARCH)/Makefile
+
+CFLAGS   += -Werror -Wmissing-prototypes
+CFLAGS   += -I. -I./include $(CFLAGS_xeninclude)
+CFLAGS   += -D__XEN_TOOLS__
+CFLAGS   += -include $(XEN_ROOT)/tools/config.h
+# Needed for asprintf()
+CFLAGS-$(CONFIG_Linux) += -D_GNU_SOURCE
+
+LIBHEADER := xensaverestore.h
+
+NO_HEADERS_CHK := y
+
+include $(XEN_ROOT)/tools/libs/libs.mk
+
+.PHONY: cleanlocal
+cleanlocal:
+	rm -f libxensaverestore.map
diff --git a/tools/libs/guest/xg_sr_common.c b/tools/libs/saverestore/common.c
similarity index 99%
rename from tools/libs/guest/xg_sr_common.c
rename to tools/libs/saverestore/common.c
index 17567ab133..77128bc747 100644
--- a/tools/libs/guest/xg_sr_common.c
+++ b/tools/libs/saverestore/common.c
@@ -1,6 +1,6 @@
 #include <assert.h>
 
-#include "xg_sr_common.h"
+#include "common.h"
 
 #include <xen-tools/libs.h>
 
diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/saverestore/common.h
similarity index 98%
rename from tools/libs/guest/xg_sr_common.h
rename to tools/libs/saverestore/common.h
index e2994e18ac..ca2eb47a4f 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/saverestore/common.h
@@ -1,13 +1,25 @@
 #ifndef __COMMON__H
 #define __COMMON__H
 
+#include <unistd.h>
+#include <errno.h>
 #include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "xc_private.h"
+#include "xenguest.h"
+#include "xensaverestore.h"
 
 #include "xg_private.h"
-#include "xg_save_restore.h"
+#include "save_restore.h"
 #include "xc_bitops.h"
 
-#include "xg_sr_stream_format.h"
+#include "stream_format.h"
 
 /* String representation of Domain Header types. */
 const char *dhdr_type_to_str(uint32_t type);
diff --git a/tools/libs/guest/xg_sr_common_x86.c b/tools/libs/saverestore/common_x86.c
similarity index 99%
rename from tools/libs/guest/xg_sr_common_x86.c
rename to tools/libs/saverestore/common_x86.c
index 563b4f0168..f1beb234ae 100644
--- a/tools/libs/guest/xg_sr_common_x86.c
+++ b/tools/libs/saverestore/common_x86.c
@@ -1,4 +1,4 @@
-#include "xg_sr_common_x86.h"
+#include "common_x86.h"
 
 int write_x86_tsc_info(struct xc_sr_context *ctx)
 {
diff --git a/tools/libs/guest/xg_sr_common_x86.h b/tools/libs/saverestore/common_x86.h
similarity index 98%
rename from tools/libs/guest/xg_sr_common_x86.h
rename to tools/libs/saverestore/common_x86.h
index b55758c96d..3a2d91dcb8 100644
--- a/tools/libs/guest/xg_sr_common_x86.h
+++ b/tools/libs/saverestore/common_x86.h
@@ -1,7 +1,7 @@
 #ifndef __COMMON_X86__H
 #define __COMMON_X86__H
 
-#include "xg_sr_common.h"
+#include "common.h"
 
 /*
  * Obtains a domains TSC information from Xen and writes a X86_TSC_INFO record
diff --git a/tools/libs/guest/xg_sr_common_x86_pv.c b/tools/libs/saverestore/common_x86_pv.c
similarity index 99%
rename from tools/libs/guest/xg_sr_common_x86_pv.c
rename to tools/libs/saverestore/common_x86_pv.c
index c0acf00f90..cfe1b24bed 100644
--- a/tools/libs/guest/xg_sr_common_x86_pv.c
+++ b/tools/libs/saverestore/common_x86_pv.c
@@ -1,6 +1,6 @@
 #include <assert.h>
 
-#include "xg_sr_common_x86_pv.h"
+#include "common_x86_pv.h"
 
 xen_pfn_t mfn_to_pfn(struct xc_sr_context *ctx, xen_pfn_t mfn)
 {
diff --git a/tools/libs/guest/xg_sr_common_x86_pv.h b/tools/libs/saverestore/common_x86_pv.h
similarity index 98%
rename from tools/libs/guest/xg_sr_common_x86_pv.h
rename to tools/libs/saverestore/common_x86_pv.h
index 953b5bfb8d..a9f8c970e3 100644
--- a/tools/libs/guest/xg_sr_common_x86_pv.h
+++ b/tools/libs/saverestore/common_x86_pv.h
@@ -1,7 +1,7 @@
 #ifndef __COMMON_X86_PV_H
 #define __COMMON_X86_PV_H
 
-#include "xg_sr_common_x86.h"
+#include "common_x86.h"
 
 /* Virtual address ranges reserved for hypervisor. */
 #define HYPERVISOR_VIRT_START_X86_64 0xFFFF800000000000ULL
diff --git a/tools/libs/guest/xg_nomigrate.c b/tools/libs/saverestore/nomigrate.c
similarity index 100%
rename from tools/libs/guest/xg_nomigrate.c
rename to tools/libs/saverestore/nomigrate.c
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/saverestore/restore.c
similarity index 99%
rename from tools/libs/guest/xg_sr_restore.c
rename to tools/libs/saverestore/restore.c
index b57a787519..be259a1c6b 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/saverestore/restore.c
@@ -2,7 +2,7 @@
 
 #include <assert.h>
 
-#include "xg_sr_common.h"
+#include "common.h"
 
 /*
  * Read and validate the Image and Domain headers.
diff --git a/tools/libs/guest/xg_sr_restore_x86_hvm.c b/tools/libs/saverestore/restore_x86_hvm.c
similarity index 99%
rename from tools/libs/guest/xg_sr_restore_x86_hvm.c
rename to tools/libs/saverestore/restore_x86_hvm.c
index d6ea6f3012..bd63bd2818 100644
--- a/tools/libs/guest/xg_sr_restore_x86_hvm.c
+++ b/tools/libs/saverestore/restore_x86_hvm.c
@@ -1,7 +1,7 @@
 #include <assert.h>
 #include <arpa/inet.h>
 
-#include "xg_sr_common_x86.h"
+#include "common_x86.h"
 
 /*
  * Process an HVM_CONTEXT record from the stream.
diff --git a/tools/libs/guest/xg_sr_restore_x86_pv.c b/tools/libs/saverestore/restore_x86_pv.c
similarity index 99%
rename from tools/libs/guest/xg_sr_restore_x86_pv.c
rename to tools/libs/saverestore/restore_x86_pv.c
index dc50b0f5a8..96608e5231 100644
--- a/tools/libs/guest/xg_sr_restore_x86_pv.c
+++ b/tools/libs/saverestore/restore_x86_pv.c
@@ -1,6 +1,6 @@
 #include <assert.h>
 
-#include "xg_sr_common_x86_pv.h"
+#include "common_x86_pv.h"
 
 static xen_pfn_t pfn_to_mfn(const struct xc_sr_context *ctx, xen_pfn_t pfn)
 {
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/saverestore/save.c
similarity index 99%
rename from tools/libs/guest/xg_sr_save.c
rename to tools/libs/saverestore/save.c
index 2ba7c3200c..ae3e8797d0 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/saverestore/save.c
@@ -1,7 +1,7 @@
 #include <assert.h>
 #include <arpa/inet.h>
 
-#include "xg_sr_common.h"
+#include "common.h"
 
 /*
  * Writes an Image header and Domain header into the stream.
diff --git a/tools/libs/guest/xg_save_restore.h b/tools/libs/saverestore/save_restore.h
similarity index 98%
rename from tools/libs/guest/xg_save_restore.h
rename to tools/libs/saverestore/save_restore.h
index 3dbbc8dcd2..20bd3d30a5 100644
--- a/tools/libs/guest/xg_save_restore.h
+++ b/tools/libs/saverestore/save_restore.h
@@ -15,8 +15,6 @@
  * License along with this library; If not, see <http://www.gnu.org/licenses/>.
  */
 
-#include "xc_private.h"
-
 #include <xen/foreign/x86_32.h>
 #include <xen/foreign/x86_64.h>
 
diff --git a/tools/libs/guest/xg_sr_save_x86_hvm.c b/tools/libs/saverestore/save_x86_hvm.c
similarity index 99%
rename from tools/libs/guest/xg_sr_save_x86_hvm.c
rename to tools/libs/saverestore/save_x86_hvm.c
index 1634a7bc43..91c2cb99ab 100644
--- a/tools/libs/guest/xg_sr_save_x86_hvm.c
+++ b/tools/libs/saverestore/save_x86_hvm.c
@@ -1,6 +1,6 @@
 #include <assert.h>
 
-#include "xg_sr_common_x86.h"
+#include "common_x86.h"
 
 #include <xen/hvm/params.h>
 
diff --git a/tools/libs/guest/xg_sr_save_x86_pv.c b/tools/libs/saverestore/save_x86_pv.c
similarity index 99%
rename from tools/libs/guest/xg_sr_save_x86_pv.c
rename to tools/libs/saverestore/save_x86_pv.c
index 4964f1f7b8..92f77fad0f 100644
--- a/tools/libs/guest/xg_sr_save_x86_pv.c
+++ b/tools/libs/saverestore/save_x86_pv.c
@@ -1,7 +1,7 @@
 #include <assert.h>
 #include <limits.h>
 
-#include "xg_sr_common_x86_pv.h"
+#include "common_x86_pv.h"
 
 /* Check a 64 bit virtual address for being canonical. */
 static inline bool is_canonical_address(xen_vaddr_t vaddr)
diff --git a/tools/libs/guest/xg_sr_stream_format.h b/tools/libs/saverestore/stream_format.h
similarity index 100%
rename from tools/libs/guest/xg_sr_stream_format.h
rename to tools/libs/saverestore/stream_format.h
diff --git a/tools/libs/uselibs.mk b/tools/libs/uselibs.mk
index efd7a475ba..62a2990b95 100644
--- a/tools/libs/uselibs.mk
+++ b/tools/libs/uselibs.mk
@@ -20,6 +20,8 @@ LIBS_LIBS += ctrl
 USELIBS_ctrl := toollog call evtchn gnttab foreignmemory devicemodel
 LIBS_LIBS += guest
 USELIBS_guest := evtchn ctrl
+LIBS_LIBS += saverestore
+USELIBS_saverestore := guest ctrl
 LIBS_LIBS += store
 USELIBS_store := toolcore
 LIBS_LIBS += vchan
@@ -27,7 +29,7 @@ USELIBS_vchan := toollog store gnttab evtchn
 LIBS_LIBS += stat
 USELIBS_stat := ctrl store
 LIBS_LIBS += light
-USELIBS_light := toollog evtchn toolcore ctrl store hypfs guest
+USELIBS_light := toollog evtchn toolcore ctrl store hypfs guest saverestore
 LIBS_LIBS += util
 USELIBS_util := light
 FILENAME_util := xlutil



* [PATCH v20210616 05/36] MAINTAINERS: add myself as saverestore maintainer
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (3 preceding siblings ...)
  2021-06-16 12:50 ` [PATCH v20210616 04/36] tools: create libxensaverestore Olaf Hering
@ 2021-06-16 12:50 ` Olaf Hering
  2021-06-16 12:50 ` [PATCH v20210616 06/36] tools: add readv_exact to libxenctrl Olaf Hering
                   ` (31 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:50 UTC (permalink / raw)
  To: xen-devel
  Cc: Olaf Hering, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu

I touched it last.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 39750bb75d..dbb8f56ab3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -381,6 +381,12 @@ R:	Juergen Gross <jgross@suse.com>
 S:	Supported
 F:	tools/libs/
 
+LIBSAVERESTORE
+M:	Olaf Hering <olaf@aepfle.de>
+S:	Supported
+F:	tools/include/xensaverestore.h
+F:	tools/libs/saverestore/
+
 LIBXENLIGHT
 M:	Ian Jackson <iwj@xenproject.org>
 M:	Wei Liu <wl@xen.org>



* [PATCH v20210616 06/36] tools: add readv_exact to libxenctrl
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (4 preceding siblings ...)
  2021-06-16 12:50 ` [PATCH v20210616 05/36] MAINTAINERS: add myself as saverestore maintainer Olaf Hering
@ 2021-06-16 12:50 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 07/36] tools: add sr_is_known_page_type to libsaverestore Olaf Hering
                   ` (30 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Ian Jackson, Wei Liu, Juergen Gross

Read a batch of iovecs.

Short reads are the common case; finish the trailing iov with read_exact.

Signed-off-by: Olaf Hering <olaf@aepfle.de>

v2:
- add comment to short-read handling
---
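The short-read handling described above can be sketched outside of libxenctrl
as follows. This is only an illustration, not the code from this patch:
demo_read_exact(), demo_readv_exact() and DEMO_IOV_BATCH are hypothetical
names, and the batch size of 16 is an assumption chosen because POSIX
guarantees at least that many iovecs per readv() call.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical cap per readv() call; POSIX guarantees at least 16. */
#define DEMO_IOV_BATCH 16

/* Hypothetical stand-in for read_exact(): loop until size bytes arrived. */
static int demo_read_exact(int fd, void *data, size_t size)
{
    size_t offset = 0;

    while ( offset < size )
    {
        ssize_t len = read(fd, (char *)data + offset, size - offset);

        if ( len == -1 && errno == EINTR )
            continue;
        if ( len == 0 )
            errno = 0;              /* EOF => -1, errno = 0 */
        if ( len <= 0 )
            return -1;
        offset += len;
    }

    return 0;
}

/* Hypothetical stand-in for readv_exact(): fill every iov completely. */
static int demo_readv_exact(int fd, const struct iovec *iov, int iovcnt)
{
    int idx = 0;

    while ( idx < iovcnt )
    {
        int cnt = iovcnt - idx;
        ssize_t len;

        if ( cnt > DEMO_IOV_BATCH )
            cnt = DEMO_IOV_BATCH;

        len = readv(fd, &iov[idx], cnt);
        if ( len == -1 && errno == EINTR )
            continue;
        if ( len <= 0 )
            return -1;

        /* Skip iovs that were filled, finish the one that was cut short. */
        while ( len > 0 && idx < iovcnt )
        {
            if ( (size_t)len >= iov[idx].iov_len )
                len -= iov[idx].iov_len;
            else
            {
                char *p = (char *)iov[idx].iov_base + len;
                size_t l = iov[idx].iov_len - len;

                if ( demo_read_exact(fd, p, l) )
                    return -1;
                len = 0;
            }
            idx++;
        }
    }

    return 0;
}
```

A caller would pass an array of struct iovec describing the buffers of one
batch and treat any non-zero return value as a failed read.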
 tools/libs/ctrl/xc_private.c | 57 +++++++++++++++++++++++++++++++++++-
 tools/libs/ctrl/xc_private.h |  1 +
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/tools/libs/ctrl/xc_private.c b/tools/libs/ctrl/xc_private.c
index d94f846686..da58c3d9ba 100644
--- a/tools/libs/ctrl/xc_private.c
+++ b/tools/libs/ctrl/xc_private.c
@@ -659,8 +659,23 @@ int write_exact(int fd, const void *data, size_t size)
 
 #if defined(__MINIOS__)
 /*
- * MiniOS's libc doesn't know about writev(). Implement it as multiple write()s.
+ * MiniOS's libc doesn't know about readv/writev().
+ * Implement it as multiple read/write()s.
  */
+int readv_exact(int fd, const struct iovec *iov, int iovcnt)
+{
+    int rc, i;
+
+    for ( i = 0; i < iovcnt; ++i )
+    {
+        rc = read_exact(fd, iov[i].iov_base, iov[i].iov_len);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
 int writev_exact(int fd, const struct iovec *iov, int iovcnt)
 {
     int rc, i;
@@ -675,6 +690,46 @@ int writev_exact(int fd, const struct iovec *iov, int iovcnt)
     return 0;
 }
 #else
+int readv_exact(int fd, const struct iovec *iov, int iovcnt)
+{
+    int rc = 0, idx = 0;
+    ssize_t len;
+
+    while ( idx < iovcnt )
+    {
+        len = readv(fd, &iov[idx], min(iovcnt - idx, IOV_MAX));
+        if ( len == -1 && errno == EINTR )
+            continue;
+        if ( len <= 0 )
+        {
+            rc = -1;
+            goto out;
+        }
+
+        /* Finish a potential short read in the last iov */
+        while ( len > 0 && idx < iovcnt )
+        {
+            if ( len >= iov[idx].iov_len )
+            {
+                len -= iov[idx].iov_len;
+            }
+            else
+            {
+                void *p = iov[idx].iov_base + len;
+                size_t l = iov[idx].iov_len - len;
+
+                rc = read_exact(fd, p, l);
+                if ( rc )
+                    goto out;
+                len = 0;
+            }
+            idx++;
+        }
+    }
+out:
+    return rc;
+}
+
 int writev_exact(int fd, const struct iovec *iov, int iovcnt)
 {
     struct iovec *local_iov = NULL;
diff --git a/tools/libs/ctrl/xc_private.h b/tools/libs/ctrl/xc_private.h
index 3e299b943f..66086ef19f 100644
--- a/tools/libs/ctrl/xc_private.h
+++ b/tools/libs/ctrl/xc_private.h
@@ -410,6 +410,7 @@ int xc_flush_mmu_updates(xc_interface *xch, struct xc_mmu *mmu);
 
 /* Return 0 on success; -1 on error setting errno. */
 int read_exact(int fd, void *data, size_t size); /* EOF => -1, errno=0 */
+int readv_exact(int fd, const struct iovec *iov, int iovcnt);
 int write_exact(int fd, const void *data, size_t size);
 int writev_exact(int fd, const struct iovec *iov, int iovcnt);
 



* [PATCH v20210616 07/36] tools: add sr_is_known_page_type to libsaverestore
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (5 preceding siblings ...)
  2021-06-16 12:50 ` [PATCH v20210616 06/36] tools: add readv_exact to libxenctrl Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 08/36] tools: use sr_is_known_page_type Olaf Hering
                   ` (29 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Ian Jackson, Wei Liu, Juergen Gross

Users of xc_get_pfn_type_batch may want to sanity check the data
returned by Xen. Add a simple helper for this purpose.

Signed-off-by: Olaf Hering <olaf@aepfle.de>

v02:
- rename xc_is_known_page_type to sr_is_known_page_type
- move from ctrl/xc_private.h to saverestore/common.h (jgross)
---
 tools/libs/saverestore/common.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index ca2eb47a4f..c9cc4206e5 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -467,6 +467,39 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
 /* Handle a STATIC_DATA_END record. */
 int handle_static_data_end(struct xc_sr_context *ctx);
 
+/* Sanity check for types returned by Xen */
+static inline bool sr_is_known_page_type(xen_pfn_t type)
+{
+    bool ret;
+
+    switch (type)
+    {
+    case XEN_DOMCTL_PFINFO_NOTAB:
+
+    case XEN_DOMCTL_PFINFO_L1TAB:
+    case XEN_DOMCTL_PFINFO_L1TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+    case XEN_DOMCTL_PFINFO_L2TAB:
+    case XEN_DOMCTL_PFINFO_L2TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+    case XEN_DOMCTL_PFINFO_L3TAB:
+    case XEN_DOMCTL_PFINFO_L3TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+    case XEN_DOMCTL_PFINFO_L4TAB:
+    case XEN_DOMCTL_PFINFO_L4TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+    case XEN_DOMCTL_PFINFO_XTAB:
+    case XEN_DOMCTL_PFINFO_XALLOC:
+    case XEN_DOMCTL_PFINFO_BROKEN:
+        ret = true;
+        break;
+    default:
+        ret = false;
+        break;
+    }
+    return ret;
+}
+
 #endif
 /*
  * Local variables:



* [PATCH v20210616 08/36] tools: use sr_is_known_page_type
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (6 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 07/36] tools: add sr_is_known_page_type to libsaverestore Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 09/36] tools: unify type checking for data pfns in migration stream Olaf Hering
                   ` (28 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu

Verify pfn type on sending side, also verify incoming batch of pfns.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>

v02:
- use sr_is_known_page_type instead of xc_is_known_page_type
---
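A rough way to compare the range check removed from handle_page_data with
the helper is sketched below. It is self-contained for illustration only:
the DEMO_PFINFO_* values are assumed to mirror XEN_DOMCTL_PFINFO_* from
Xen's public domctl.h and should be checked against the real header, and
demo_old_check_rejects() and demo_is_known_page_type() are hypothetical
names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror XEN_DOMCTL_PFINFO_* from Xen's public domctl.h. */
#define DEMO_PFINFO_LTAB_SHIFT 28
#define DEMO_PFINFO_NOTAB   (0x0U << 28)
#define DEMO_PFINFO_L1TAB   (0x1U << 28)
#define DEMO_PFINFO_L2TAB   (0x2U << 28)
#define DEMO_PFINFO_L3TAB   (0x3U << 28)
#define DEMO_PFINFO_L4TAB   (0x4U << 28)
#define DEMO_PFINFO_LPINTAB (0x1U << 31)
#define DEMO_PFINFO_BROKEN  (0xdU << 28)
#define DEMO_PFINFO_XALLOC  (0xeU << 28)
#define DEMO_PFINFO_XTAB    (0xfU << 28)

/* The open-coded range check: true if the type is treated as invalid. */
static bool demo_old_check_rejects(uint32_t type)
{
    return ((type >> DEMO_PFINFO_LTAB_SHIFT) >= 5) &&
           ((type >> DEMO_PFINFO_LTAB_SHIFT) <= 8);
}

/* Same case list as sr_is_known_page_type(), on the demo constants. */
static bool demo_is_known_page_type(uint32_t type)
{
    switch ( type )
    {
    case DEMO_PFINFO_NOTAB:
    case DEMO_PFINFO_L1TAB:
    case DEMO_PFINFO_L1TAB | DEMO_PFINFO_LPINTAB:
    case DEMO_PFINFO_L2TAB:
    case DEMO_PFINFO_L2TAB | DEMO_PFINFO_LPINTAB:
    case DEMO_PFINFO_L3TAB:
    case DEMO_PFINFO_L3TAB | DEMO_PFINFO_LPINTAB:
    case DEMO_PFINFO_L4TAB:
    case DEMO_PFINFO_L4TAB | DEMO_PFINFO_LPINTAB:
    case DEMO_PFINFO_XTAB:
    case DEMO_PFINFO_XALLOC:
    case DEMO_PFINFO_BROKEN:
        return true;
    default:
        return false;
    }
}
```

Under these assumed values both checks agree for every type whose low 28
bits are zero. The switch additionally rejects values with stray low bits,
which the range check would let through.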
 tools/libs/saverestore/restore.c | 3 +--
 tools/libs/saverestore/save.c    | 6 ++++++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/libs/saverestore/restore.c b/tools/libs/saverestore/restore.c
index be259a1c6b..324b9050e2 100644
--- a/tools/libs/saverestore/restore.c
+++ b/tools/libs/saverestore/restore.c
@@ -406,8 +406,7 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
         }
 
         type = (pages->pfn[i] & PAGE_DATA_TYPE_MASK) >> 32;
-        if ( ((type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT) >= 5) &&
-             ((type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT) <= 8) )
+        if ( sr_is_known_page_type(type) == false )
         {
             ERROR("Invalid type %#"PRIx32" for pfn %#"PRIpfn" (index %u)",
                   type, pfn, i);
diff --git a/tools/libs/saverestore/save.c b/tools/libs/saverestore/save.c
index ae3e8797d0..6f820ea432 100644
--- a/tools/libs/saverestore/save.c
+++ b/tools/libs/saverestore/save.c
@@ -147,6 +147,12 @@ static int write_batch(struct xc_sr_context *ctx)
 
     for ( i = 0; i < nr_pfns; ++i )
     {
+        if ( sr_is_known_page_type(types[i]) == false )
+        {
+            ERROR("Wrong type %#"PRIpfn" for pfn %#"PRIpfn, types[i], mfns[i]);
+            goto err;
+        }
+
         switch ( types[i] )
         {
         case XEN_DOMCTL_PFINFO_BROKEN:



* [PATCH v20210616 09/36] tools: unify type checking for data pfns in migration stream
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (7 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 08/36] tools: use sr_is_known_page_type Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 10/36] tools: show migration transfer rate in send_dirty_pages Olaf Hering
                   ` (27 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Ian Jackson, Wei Liu, Juergen Gross

Introduce a helper which decides if a given pfn type has data
for the migration stream.

No change in behavior intended.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
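How such a helper might be used to count the pages of data expected in one
batch is sketched below. This is an illustration and not part of the patch:
the DEMO_PFINFO_* values are assumed to mirror XEN_DOMCTL_PFINFO_* from
Xen's public domctl.h, and demo_page_type_has_stream_data() and
demo_count_pages_of_data() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror XEN_DOMCTL_PFINFO_* from Xen's public domctl.h. */
#define DEMO_PFINFO_NOTAB   (0x0U << 28)
#define DEMO_PFINFO_L1TAB   (0x1U << 28)
#define DEMO_PFINFO_L4TAB   (0x4U << 28)
#define DEMO_PFINFO_LPINTAB (0x1U << 31)
#define DEMO_PFINFO_BROKEN  (0xdU << 28)
#define DEMO_PFINFO_XALLOC  (0xeU << 28)
#define DEMO_PFINFO_XTAB    (0xfU << 28)

/* Same decision as page_type_has_stream_data(), on the demo constants. */
static bool demo_page_type_has_stream_data(uint32_t type)
{
    switch ( type )
    {
    case DEMO_PFINFO_XTAB:
    case DEMO_PFINFO_XALLOC:
    case DEMO_PFINFO_BROKEN:
        return false;       /* No page contents follow in the stream. */
    default:
        return true;        /* NOTAB and page tables carry a page each. */
    }
}

/* Count how many entries of a batch come with a page worth of data. */
static size_t demo_count_pages_of_data(const uint32_t *types, size_t count)
{
    size_t i, pages_of_data = 0;

    for ( i = 0; i < count; ++i )
        if ( demo_page_type_has_stream_data(types[i]) )
            pages_of_data++;

    return pages_of_data;
}
```

With one helper deciding this, the sending and the receiving side use the
same rule for which pfns are accompanied by page contents.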
 tools/libs/saverestore/common.h  | 17 ++++++++++++++++
 tools/libs/saverestore/restore.c | 34 +++++---------------------------
 tools/libs/saverestore/save.c    | 14 ++-----------
 3 files changed, 24 insertions(+), 41 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index c9cc4206e5..08bbe902b9 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -500,6 +500,23 @@ static inline bool sr_is_known_page_type(xen_pfn_t type)
     return ret;
 }
 
+static inline bool page_type_has_stream_data(uint32_t type)
+{
+    bool ret;
+
+    switch (type)
+    {
+    case XEN_DOMCTL_PFINFO_XTAB:
+    case XEN_DOMCTL_PFINFO_XALLOC:
+    case XEN_DOMCTL_PFINFO_BROKEN:
+        ret = false;
+        break;
+    default:
+        ret = true;
+        break;
+    }
+    return ret;
+}
 #endif
 /*
  * Local variables:
diff --git a/tools/libs/saverestore/restore.c b/tools/libs/saverestore/restore.c
index 324b9050e2..70c92eaadc 100644
--- a/tools/libs/saverestore/restore.c
+++ b/tools/libs/saverestore/restore.c
@@ -152,9 +152,8 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
 
     for ( i = 0; i < count; ++i )
     {
-        if ( (!types || (types &&
-                         (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-                          types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
+        if ( (!types ||
+              (types && page_type_has_stream_data(types[i]) == true)) &&
              !pfn_is_populated(ctx, original_pfns[i]) )
         {
             rc = pfn_set_populated(ctx, original_pfns[i]);
@@ -233,25 +232,8 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
     {
         ctx->restore.ops.set_page_type(ctx, pfns[i], types[i]);
 
-        switch ( types[i] )
-        {
-        case XEN_DOMCTL_PFINFO_NOTAB:
-
-        case XEN_DOMCTL_PFINFO_L1TAB:
-        case XEN_DOMCTL_PFINFO_L1TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
-        case XEN_DOMCTL_PFINFO_L2TAB:
-        case XEN_DOMCTL_PFINFO_L2TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
-        case XEN_DOMCTL_PFINFO_L3TAB:
-        case XEN_DOMCTL_PFINFO_L3TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
-        case XEN_DOMCTL_PFINFO_L4TAB:
-        case XEN_DOMCTL_PFINFO_L4TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
+        if ( page_type_has_stream_data(types[i]) == true )
             mfns[nr_pages++] = ctx->restore.ops.pfn_to_gfn(ctx, pfns[i]);
-            break;
-        }
     }
 
     /* Nothing to do? */
@@ -271,14 +253,8 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
 
     for ( i = 0, j = 0; i < count; ++i )
     {
-        switch ( types[i] )
-        {
-        case XEN_DOMCTL_PFINFO_XTAB:
-        case XEN_DOMCTL_PFINFO_BROKEN:
-        case XEN_DOMCTL_PFINFO_XALLOC:
-            /* No page data to deal with. */
+        if ( page_type_has_stream_data(types[i]) == false )
             continue;
-        }
 
         if ( map_errs[j] )
         {
@@ -413,7 +389,7 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
             goto err;
         }
 
-        if ( type < XEN_DOMCTL_PFINFO_BROKEN )
+        if ( page_type_has_stream_data(type) == true )
             /* NOTAB and all L1 through L4 tables (including pinned) should
              * have a page worth of data in the record. */
             pages_of_data++;
diff --git a/tools/libs/saverestore/save.c b/tools/libs/saverestore/save.c
index 6f820ea432..12598bd4e2 100644
--- a/tools/libs/saverestore/save.c
+++ b/tools/libs/saverestore/save.c
@@ -153,13 +153,8 @@ static int write_batch(struct xc_sr_context *ctx)
             goto err;
         }
 
-        switch ( types[i] )
-        {
-        case XEN_DOMCTL_PFINFO_BROKEN:
-        case XEN_DOMCTL_PFINFO_XALLOC:
-        case XEN_DOMCTL_PFINFO_XTAB:
+        if ( page_type_has_stream_data(types[i]) == false )
             continue;
-        }
 
         mfns[nr_pages++] = mfns[i];
     }
@@ -177,13 +172,8 @@ static int write_batch(struct xc_sr_context *ctx)
 
         for ( i = 0, p = 0; i < nr_pfns; ++i )
         {
-            switch ( types[i] )
-            {
-            case XEN_DOMCTL_PFINFO_BROKEN:
-            case XEN_DOMCTL_PFINFO_XALLOC:
-            case XEN_DOMCTL_PFINFO_XTAB:
+            if ( page_type_has_stream_data(types[i]) == false )
                 continue;
-            }
 
             if ( errors[p] )
             {



* [PATCH v20210616 10/36] tools: show migration transfer rate in send_dirty_pages
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (8 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 09/36] tools: unify type checking for data pfns in migration stream Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 11/36] tools: prepare to allocate saverestore arrays once Olaf Hering
                   ` (26 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Ian Jackson, Wei Liu, Juergen Gross

Show how fast domU pages are transferred in each iteration.

The relevant figure is how fast the pfns travel, not how much
protocol overhead exists. So the reported MiB/sec covers page data only.
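
The calculation described above can be sketched in isolation. This is only
an illustration, not the code from the patch: the helper names elapsed_ms
and pages_to_mib_per_sec and the constant EXAMPLE_PAGE_SIZE (assumed to be
4 KiB) are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define EXAMPLE_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/*
 * Elapsed milliseconds between two CLOCK_MONOTONIC samples.
 * Returns at least 1 so the caller can safely divide by the result.
 */
static uint64_t elapsed_ms(const struct timespec *start,
                           const struct timespec *end)
{
    int64_t sec, nsec;
    uint64_t ms;

    assert(start && end);

    sec = (int64_t)end->tv_sec - (int64_t)start->tv_sec;
    nsec = (int64_t)end->tv_nsec - (int64_t)start->tv_nsec;
    if ( nsec < 0 )
    {
        /* Borrow one second when the nanosecond part went negative. */
        sec -= 1;
        nsec += 1000000000;
    }

    ms = (uint64_t)sec * 1000u + (uint64_t)nsec / 1000000u;

    return ms ? ms : 1;
}

/*
 * Page payload rate in MiB/sec. Protocol overhead is not counted.
 * Multiply by 1000 before dividing by ms to keep integer precision.
 */
static uint64_t pages_to_mib_per_sec(uint64_t pages, uint64_t ms)
{
    return pages * EXAMPLE_PAGE_SIZE * 1000u / ms / (1024u * 1024u);
}
```

With these helpers, a made-up input of 262144 pages (1 GiB of page data)
sent in 2000 ms yields 512 MiB/sec.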

Signed-off-by: Olaf Hering <olaf@aepfle.de>

v02:
- rearrange MiB_sec calculation (jgross)
---
 tools/libs/saverestore/common.h |  2 ++
 tools/libs/saverestore/save.c   | 46 +++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 08bbe902b9..d61569e1a6 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -250,6 +250,8 @@ struct xc_sr_context
             bool debug;
 
             unsigned long p2m_size;
+            size_t pages_sent;
+            size_t overhead_sent;
 
             struct precopy_stats stats;
 
diff --git a/tools/libs/saverestore/save.c b/tools/libs/saverestore/save.c
index 12598bd4e2..f8fbe7a742 100644
--- a/tools/libs/saverestore/save.c
+++ b/tools/libs/saverestore/save.c
@@ -1,5 +1,6 @@
 #include <assert.h>
 #include <arpa/inet.h>
+#include <time.h>
 
 #include "common.h"
 
@@ -238,6 +239,8 @@ static int write_batch(struct xc_sr_context *ctx)
     iov[3].iov_len = nr_pfns * sizeof(*rec_pfns);
 
     iovcnt = 4;
+    ctx->save.pages_sent += nr_pages;
+    ctx->save.overhead_sent += sizeof(rec) + sizeof(hdr) + nr_pfns * sizeof(*rec_pfns);
 
     if ( nr_pages )
     {
@@ -357,6 +360,42 @@ static int suspend_domain(struct xc_sr_context *ctx)
     return 0;
 }
 
+static void show_transfer_rate(struct xc_sr_context *ctx, struct timespec *start)
+{
+    xc_interface *xch = ctx->xch;
+    struct timespec end = {}, diff = {};
+    size_t ms, MiB_sec;
+
+    if (!ctx->save.pages_sent)
+        return;
+
+    if ( clock_gettime(CLOCK_MONOTONIC, &end) )
+        PERROR("clock_gettime");
+
+    if ( (end.tv_nsec - start->tv_nsec) < 0 )
+    {
+        diff.tv_sec = end.tv_sec - start->tv_sec - 1;
+        diff.tv_nsec = end.tv_nsec - start->tv_nsec + (1000U*1000U*1000U);
+    }
+    else
+    {
+        diff.tv_sec = end.tv_sec - start->tv_sec;
+        diff.tv_nsec = end.tv_nsec - start->tv_nsec;
+    }
+
+    ms = (diff.tv_nsec / (1000U*1000U));
+    ms += (diff.tv_sec * 1000U);
+    if (!ms)
+        ms = 1;
+
+    MiB_sec = (ctx->save.pages_sent * PAGE_SIZE * 1000U) / ms / (1024U*1024U);
+
+    errno = 0;
+    IPRINTF("%s: %zu bytes + %zu pages in %ld.%09ld sec, %zu MiB/sec", __func__,
+            ctx->save.overhead_sent, ctx->save.pages_sent,
+            diff.tv_sec, diff.tv_nsec, MiB_sec);
+}
+
 /*
  * Send a subset of pages in the guests p2m, according to the dirty bitmap.
  * Used for each subsequent iteration of the live migration loop.
@@ -370,9 +409,15 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
     xen_pfn_t p;
     unsigned long written;
     int rc;
+    struct timespec start = {};
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                     &ctx->save.dirty_bitmap_hbuf);
 
+    ctx->save.pages_sent = 0;
+    ctx->save.overhead_sent = 0;
+    if ( clock_gettime(CLOCK_MONOTONIC, &start) )
+        PERROR("clock_gettime");
+
     for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
     {
         if ( !test_bit(p, dirty_bitmap) )
@@ -396,6 +441,7 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
     if ( written > entries )
         DPRINTF("Bitmap contained more entries than expected...");
 
+    show_transfer_rate(ctx, &start);
     xc_report_progress_step(xch, entries, entries);
 
     return ctx->save.ops.check_vm_state(ctx);



* [PATCH v20210616 11/36] tools: prepare to allocate saverestore arrays once
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (9 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 10/36] tools: show migration transfer rate in send_dirty_pages Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 12/36] tools: save: move mfns array Olaf Hering
                   ` (25 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Ian Jackson, Wei Liu, Juergen Gross

The hotpath 'send_dirty_pages' is supposed to do just one thing: sending.
The other end 'handle_page_data' is supposed to do just receiving.

But instead both do other costly work like memory allocations and data moving.
Do the allocations once; the array sizes are a compile-time constant.
Avoid unneeded copying of data by receiving data directly into mapped guest memory.

This patch is just preparation; subsequent changes will populate the arrays.
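
The allocate-once pattern described above can be sketched as follows. This
is a simplified illustration only: struct example_arrays, struct
example_ctx, EXAMPLE_BATCH_SIZE and the example_* functions are hypothetical
stand-ins, not the identifiers used in libxensaverestore.

```c
#include <stdint.h>
#include <stdlib.h>

#define EXAMPLE_BATCH_SIZE 1024 /* hypothetical compile-time batch limit */

/* All per-batch scratch arrays live in one fixed-size struct. */
struct example_arrays {
    uint64_t pfns[EXAMPLE_BATCH_SIZE];
    int errors[EXAMPLE_BATCH_SIZE];
};

struct example_ctx {
    struct example_arrays *m;
    unsigned int nr_batches;
};

/* Called once before the transfer starts: the only allocation. */
static int example_setup(struct example_ctx *ctx)
{
    ctx->nr_batches = 0;
    ctx->m = malloc(sizeof(*ctx->m));

    return ctx->m ? 0 : -1;
}

/* Hot path: reuses the preallocated arrays, no malloc/free per batch. */
static int example_process_batch(struct example_ctx *ctx, unsigned int count)
{
    uint64_t *pfns = ctx->m->pfns;
    int *errors = ctx->m->errors;
    unsigned int i;

    if ( count > EXAMPLE_BATCH_SIZE )
        return -1;

    for ( i = 0; i < count; ++i )
    {
        pfns[i] = i;    /* placeholder for the real per-page work */
        errors[i] = 0;
    }
    ctx->nr_batches++;

    return 0;
}

/* Called once after the transfer ends. */
static void example_cleanup(struct example_ctx *ctx)
{
    free(ctx->m);
    ctx->m = NULL;
}
```

Because the struct is never empty, the single malloc in the setup step is
always called with a non-zero size, and the hot path only has to check the
batch count against the fixed limit.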

Once all changes are applied, migration of a busy HVM domU changes as follows:

Without this series, from sr650 to sr950 (xen-4.15.20201027T173911.16a20963b3 xen_testing):
2020-10-29 10:23:10.711+0000: xc: show_transfer_rate: 23663128 bytes + 2879563 pages in 55.324905335 sec, 203 MiB/sec: Internal error
2020-10-29 10:23:35.115+0000: xc: show_transfer_rate: 16829632 bytes + 2097552 pages in 24.401179720 sec, 335 MiB/sec: Internal error
2020-10-29 10:23:59.436+0000: xc: show_transfer_rate: 16829032 bytes + 2097478 pages in 24.319025928 sec, 336 MiB/sec: Internal error
2020-10-29 10:24:23.844+0000: xc: show_transfer_rate: 16829024 bytes + 2097477 pages in 24.406992500 sec, 335 MiB/sec: Internal error
2020-10-29 10:24:48.292+0000: xc: show_transfer_rate: 16828912 bytes + 2097463 pages in 24.446489027 sec, 335 MiB/sec: Internal error
2020-10-29 10:25:01.816+0000: xc: show_transfer_rate: 16836080 bytes + 2098356 pages in 13.447091818 sec, 609 MiB/sec: Internal error

With this series, from sr650 to sr950 (xen-4.15.20201027T173911.16a20963b3 xen_unstable):
2020-10-28 21:26:05.074+0000: xc: show_transfer_rate: 23663128 bytes + 2879563 pages in 52.564054368 sec, 213 MiB/sec: Internal error
2020-10-28 21:26:23.527+0000: xc: show_transfer_rate: 16830040 bytes + 2097603 pages in 18.450592015 sec, 444 MiB/sec: Internal error
2020-10-28 21:26:41.926+0000: xc: show_transfer_rate: 16830944 bytes + 2097717 pages in 18.397862306 sec, 445 MiB/sec: Internal error
2020-10-28 21:27:00.339+0000: xc: show_transfer_rate: 16829176 bytes + 2097498 pages in 18.411973339 sec, 445 MiB/sec: Internal error
2020-10-28 21:27:18.643+0000: xc: show_transfer_rate: 16828592 bytes + 2097425 pages in 18.303326695 sec, 447 MiB/sec: Internal error
2020-10-28 21:27:26.289+0000: xc: show_transfer_rate: 16835952 bytes + 2098342 pages in 7.579846749 sec, 1081 MiB/sec: Internal error

Note: the performance improvement depends on the network cards used,
the wire speed and the host:
- No improvement is expected with a 1G link.
- Improvement can be seen as shown above on a 10G link.
- Just a slight improvement can be seen on a 100G link.

This change also populates sr_save_arrays with "batch_pfns", and
sr_restore_arrays with "pfns" to make sure malloc is always called
with a non-zero size.

Signed-off-by: Olaf Hering <olaf@aepfle.de>

v02:
- rename xc_sr_save_arrays to sr_save_arrays
- rename xc_sr_restore_arrays to sr_restore_arrays
- merge handling of "batch_pfns" and "pfns" to make sure malloc is
  called with a non-zero size value (jgross)
---
 tools/libs/saverestore/common.h  | 12 +++++++++++-
 tools/libs/saverestore/restore.c | 14 ++++++++++----
 tools/libs/saverestore/save.c    | 27 +++++++++++++--------------
 3 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index d61569e1a6..b3941af537 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -223,6 +223,15 @@ static inline int update_blob(struct xc_sr_blob *blob,
     return 0;
 }
 
+struct sr_save_arrays {
+    xen_pfn_t batch_pfns[MAX_BATCH_SIZE];
+};
+
+struct sr_restore_arrays {
+    /* handle_page_data */
+    xen_pfn_t pfns[MAX_BATCH_SIZE];
+};
+
 struct xc_sr_context
 {
     xc_interface *xch;
@@ -255,11 +264,11 @@ struct xc_sr_context
 
             struct precopy_stats stats;
 
-            xen_pfn_t *batch_pfns;
             unsigned int nr_batch_pfns;
             unsigned long *deferred_pages;
             unsigned long nr_deferred_pages;
             xc_hypercall_buffer_t dirty_bitmap_hbuf;
+            struct sr_save_arrays *m;
         } save;
 
         struct /* Restore data. */
@@ -311,6 +320,7 @@ struct xc_sr_context
 
             /* Sender has invoked verify mode on the stream. */
             bool verify;
+            struct sr_restore_arrays *m;
         } restore;
     };
 
diff --git a/tools/libs/saverestore/restore.c b/tools/libs/saverestore/restore.c
index 70c92eaadc..e18a03b381 100644
--- a/tools/libs/saverestore/restore.c
+++ b/tools/libs/saverestore/restore.c
@@ -315,7 +315,7 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     unsigned int i, pages_of_data = 0;
     int rc = -1;
 
-    xen_pfn_t *pfns = NULL, pfn;
+    xen_pfn_t *pfns = ctx->restore.m->pfns, pfn;
     uint32_t *types = NULL, type;
 
     /*
@@ -363,9 +363,8 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
         goto err;
     }
 
-    pfns = malloc(pages->count * sizeof(*pfns));
     types = malloc(pages->count * sizeof(*types));
-    if ( !pfns || !types )
+    if ( !types )
     {
         ERROR("Unable to allocate enough memory for %u pfns",
               pages->count);
@@ -412,7 +411,6 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
                            &pages->pfn[pages->count]);
  err:
     free(types);
-    free(pfns);
 
     return rc;
 }
@@ -739,6 +737,13 @@ static int setup(struct xc_sr_context *ctx)
     }
     ctx->restore.allocated_rec_num = DEFAULT_BUF_RECORDS;
 
+    ctx->restore.m = malloc(sizeof(*ctx->restore.m));
+    if ( !ctx->restore.m ) {
+        ERROR("Unable to allocate memory for arrays");
+        rc = -1;
+        goto err;
+    }
+
  err:
     return rc;
 }
@@ -757,6 +762,7 @@ static void cleanup(struct xc_sr_context *ctx)
         xc_hypercall_buffer_free_pages(
             xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->restore.p2m_size)));
 
+    free(ctx->restore.m);
     free(ctx->restore.buffered_records);
     free(ctx->restore.populated_pfns);
 
diff --git a/tools/libs/saverestore/save.c b/tools/libs/saverestore/save.c
index f8fbe7a742..e29b6e1d66 100644
--- a/tools/libs/saverestore/save.c
+++ b/tools/libs/saverestore/save.c
@@ -77,7 +77,7 @@ static int write_checkpoint_record(struct xc_sr_context *ctx)
 
 /*
  * Writes a batch of memory as a PAGE_DATA record into the stream.  The batch
- * is constructed in ctx->save.batch_pfns.
+ * is constructed in ctx->save.m->batch_pfns.
  *
  * This function:
  * - gets the types for each pfn in the batch.
@@ -128,12 +128,12 @@ static int write_batch(struct xc_sr_context *ctx)
     for ( i = 0; i < nr_pfns; ++i )
     {
         types[i] = mfns[i] = ctx->save.ops.pfn_to_gfn(ctx,
-                                                      ctx->save.batch_pfns[i]);
+                                                      ctx->save.m->batch_pfns[i]);
 
         /* Likely a ballooned page. */
         if ( mfns[i] == INVALID_MFN )
         {
-            set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
+            set_bit(ctx->save.m->batch_pfns[i], ctx->save.deferred_pages);
             ++ctx->save.nr_deferred_pages;
         }
     }
@@ -179,7 +179,7 @@ static int write_batch(struct xc_sr_context *ctx)
             if ( errors[p] )
             {
                 ERROR("Mapping of pfn %#"PRIpfn" (mfn %#"PRIpfn") failed %d",
-                      ctx->save.batch_pfns[i], mfns[p], errors[p]);
+                      ctx->save.m->batch_pfns[i], mfns[p], errors[p]);
                 goto err;
             }
 
@@ -193,7 +193,7 @@ static int write_batch(struct xc_sr_context *ctx)
             {
                 if ( rc == -1 && errno == EAGAIN )
                 {
-                    set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
+                    set_bit(ctx->save.m->batch_pfns[i], ctx->save.deferred_pages);
                     ++ctx->save.nr_deferred_pages;
                     types[i] = XEN_DOMCTL_PFINFO_XTAB;
                     --nr_pages;
@@ -224,7 +224,7 @@ static int write_batch(struct xc_sr_context *ctx)
     rec.length += nr_pages * PAGE_SIZE;
 
     for ( i = 0; i < nr_pfns; ++i )
-        rec_pfns[i] = ((uint64_t)(types[i]) << 32) | ctx->save.batch_pfns[i];
+        rec_pfns[i] = ((uint64_t)(types[i]) << 32) | ctx->save.m->batch_pfns[i];
 
     iov[0].iov_base = &rec.type;
     iov[0].iov_len = sizeof(rec.type);
@@ -296,9 +296,9 @@ static int flush_batch(struct xc_sr_context *ctx)
 
     if ( !rc )
     {
-        VALGRIND_MAKE_MEM_UNDEFINED(ctx->save.batch_pfns,
+        VALGRIND_MAKE_MEM_UNDEFINED(ctx->save.m->batch_pfns,
                                     MAX_BATCH_SIZE *
-                                    sizeof(*ctx->save.batch_pfns));
+                                    sizeof(*ctx->save.m->batch_pfns));
     }
 
     return rc;
@@ -315,7 +315,7 @@ static int add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn)
         rc = flush_batch(ctx);
 
     if ( rc == 0 )
-        ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
+        ctx->save.m->batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
 
     return rc;
 }
@@ -849,13 +849,12 @@ static int setup(struct xc_sr_context *ctx)
 
     dirty_bitmap = xc_hypercall_buffer_alloc_pages(
         xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
-    ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
-                                  sizeof(*ctx->save.batch_pfns));
     ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
+    ctx->save.m = malloc(sizeof(*ctx->save.m));
 
-    if ( !ctx->save.batch_pfns || !dirty_bitmap || !ctx->save.deferred_pages )
+    if ( !ctx->save.m || !dirty_bitmap || !ctx->save.deferred_pages )
     {
-        ERROR("Unable to allocate memory for dirty bitmaps, batch pfns and"
+        ERROR("Unable to allocate memory for dirty bitmaps and"
               " deferred pages");
         rc = -1;
         errno = ENOMEM;
@@ -884,7 +883,7 @@ static void cleanup(struct xc_sr_context *ctx)
     xc_hypercall_buffer_free_pages(xch, dirty_bitmap,
                                    NRPAGES(bitmap_size(ctx->save.p2m_size)));
     free(ctx->save.deferred_pages);
-    free(ctx->save.batch_pfns);
+    free(ctx->save.m);
 }
 
 /*



* [PATCH v20210616 12/36] tools: save: move mfns array
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (10 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 11/36] tools: prepare to allocate saverestore arrays once Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 13/36] tools: save: move types array Olaf Hering
                   ` (24 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu

Remove allocation from hotpath, move mfns array into preallocated space.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 tools/libs/saverestore/common.h | 2 ++
 tools/libs/saverestore/save.c   | 7 ++-----
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index b3941af537..6129710a3f 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -225,6 +225,8 @@ static inline int update_blob(struct xc_sr_blob *blob,
 
 struct sr_save_arrays {
     xen_pfn_t batch_pfns[MAX_BATCH_SIZE];
+    /* write_batch: Mfns of the batch pfns. */
+    xen_pfn_t mfns[MAX_BATCH_SIZE];
 };
 
 struct sr_restore_arrays {
diff --git a/tools/libs/saverestore/save.c b/tools/libs/saverestore/save.c
index e29b6e1d66..6b09784be8 100644
--- a/tools/libs/saverestore/save.c
+++ b/tools/libs/saverestore/save.c
@@ -88,7 +88,7 @@ static int write_checkpoint_record(struct xc_sr_context *ctx)
 static int write_batch(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
-    xen_pfn_t *mfns = NULL, *types = NULL;
+    xen_pfn_t *mfns = ctx->save.m->mfns, *types = NULL;
     void *guest_mapping = NULL;
     void **guest_data = NULL;
     void **local_pages = NULL;
@@ -105,8 +105,6 @@ static int write_batch(struct xc_sr_context *ctx)
 
     assert(nr_pfns != 0);
 
-    /* Mfns of the batch pfns. */
-    mfns = malloc(nr_pfns * sizeof(*mfns));
     /* Types of the batch pfns. */
     types = malloc(nr_pfns * sizeof(*types));
     /* Errors from attempting to map the gfns. */
@@ -118,7 +116,7 @@ static int write_batch(struct xc_sr_context *ctx)
     /* iovec[] for writev(). */
     iov = malloc((nr_pfns + 4) * sizeof(*iov));
 
-    if ( !mfns || !types || !errors || !guest_data || !local_pages || !iov )
+    if ( !types || !errors || !guest_data || !local_pages || !iov )
     {
         ERROR("Unable to allocate arrays for a batch of %u pages",
               nr_pfns);
@@ -277,7 +275,6 @@ static int write_batch(struct xc_sr_context *ctx)
     free(guest_data);
     free(errors);
     free(types);
-    free(mfns);
 
     return rc;
 }



* [PATCH v20210616 13/36] tools: save: move types array
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (11 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 12/36] tools: save: move mfns array Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 14/36] tools: save: move errors array Olaf Hering
                   ` (23 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu

Remove allocation from hotpath, move types array into preallocated space.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 tools/libs/saverestore/common.h | 2 ++
 tools/libs/saverestore/save.c   | 7 ++-----
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 6129710a3f..1df684acb9 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -227,6 +227,8 @@ struct sr_save_arrays {
     xen_pfn_t batch_pfns[MAX_BATCH_SIZE];
     /* write_batch: Mfns of the batch pfns. */
     xen_pfn_t mfns[MAX_BATCH_SIZE];
+    /* write_batch: Types of the batch pfns. */
+    xen_pfn_t types[MAX_BATCH_SIZE];
 };
 
 struct sr_restore_arrays {
diff --git a/tools/libs/saverestore/save.c b/tools/libs/saverestore/save.c
index 6b09784be8..0883c1fac0 100644
--- a/tools/libs/saverestore/save.c
+++ b/tools/libs/saverestore/save.c
@@ -88,7 +88,7 @@ static int write_checkpoint_record(struct xc_sr_context *ctx)
 static int write_batch(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
-    xen_pfn_t *mfns = ctx->save.m->mfns, *types = NULL;
+    xen_pfn_t *mfns = ctx->save.m->mfns, *types = ctx->save.m->types;
     void *guest_mapping = NULL;
     void **guest_data = NULL;
     void **local_pages = NULL;
@@ -105,8 +105,6 @@ static int write_batch(struct xc_sr_context *ctx)
 
     assert(nr_pfns != 0);
 
-    /* Types of the batch pfns. */
-    types = malloc(nr_pfns * sizeof(*types));
     /* Errors from attempting to map the gfns. */
     errors = malloc(nr_pfns * sizeof(*errors));
     /* Pointers to page data to send.  Mapped gfns or local allocations. */
@@ -116,7 +114,7 @@ static int write_batch(struct xc_sr_context *ctx)
     /* iovec[] for writev(). */
     iov = malloc((nr_pfns + 4) * sizeof(*iov));
 
-    if ( !types || !errors || !guest_data || !local_pages || !iov )
+    if ( !errors || !guest_data || !local_pages || !iov )
     {
         ERROR("Unable to allocate arrays for a batch of %u pages",
               nr_pfns);
@@ -274,7 +272,6 @@ static int write_batch(struct xc_sr_context *ctx)
     free(local_pages);
     free(guest_data);
     free(errors);
-    free(types);
 
     return rc;
 }



* [PATCH v20210616 14/36] tools: save: move errors array
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (12 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 13/36] tools: save: move types array Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 15/36] tools: save: move iov array Olaf Hering
                   ` (22 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu

Remove allocation from hotpath, move errors array into preallocated space.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 tools/libs/saverestore/common.h | 2 ++
 tools/libs/saverestore/save.c   | 7 ++-----
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 1df684acb9..558b5fbf06 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -229,6 +229,8 @@ struct sr_save_arrays {
     xen_pfn_t mfns[MAX_BATCH_SIZE];
     /* write_batch: Types of the batch pfns. */
     xen_pfn_t types[MAX_BATCH_SIZE];
+    /* write_batch: Errors from attempting to map the gfns. */
+    int errors[MAX_BATCH_SIZE];
 };
 
 struct sr_restore_arrays {
diff --git a/tools/libs/saverestore/save.c b/tools/libs/saverestore/save.c
index 0883c1fac0..9ebbf00ce7 100644
--- a/tools/libs/saverestore/save.c
+++ b/tools/libs/saverestore/save.c
@@ -92,7 +92,7 @@ static int write_batch(struct xc_sr_context *ctx)
     void *guest_mapping = NULL;
     void **guest_data = NULL;
     void **local_pages = NULL;
-    int *errors = NULL, rc = -1;
+    int *errors = ctx->save.m->errors, rc = -1;
     unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
     unsigned int nr_pfns = ctx->save.nr_batch_pfns;
     void *page, *orig_page;
@@ -105,8 +105,6 @@ static int write_batch(struct xc_sr_context *ctx)
 
     assert(nr_pfns != 0);
 
-    /* Errors from attempting to map the gfns. */
-    errors = malloc(nr_pfns * sizeof(*errors));
     /* Pointers to page data to send.  Mapped gfns or local allocations. */
     guest_data = calloc(nr_pfns, sizeof(*guest_data));
     /* Pointers to locally allocated pages.  Need freeing. */
@@ -114,7 +112,7 @@ static int write_batch(struct xc_sr_context *ctx)
     /* iovec[] for writev(). */
     iov = malloc((nr_pfns + 4) * sizeof(*iov));
 
-    if ( !errors || !guest_data || !local_pages || !iov )
+    if ( !guest_data || !local_pages || !iov )
     {
         ERROR("Unable to allocate arrays for a batch of %u pages",
               nr_pfns);
@@ -271,7 +269,6 @@ static int write_batch(struct xc_sr_context *ctx)
     free(iov);
     free(local_pages);
     free(guest_data);
-    free(errors);
 
     return rc;
 }



* [PATCH v20210616 15/36] tools: save: move iov array
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (13 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 14/36] tools: save: move errors array Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 16/36] tools: save: move rec_pfns array Olaf Hering
                   ` (21 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu

Remove allocation from hotpath, move iov array into preallocated space.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 tools/libs/saverestore/common.h | 2 ++
 tools/libs/saverestore/save.c   | 7 ++-----
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 558b5fbf06..ae87954364 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -231,6 +231,8 @@ struct sr_save_arrays {
     xen_pfn_t types[MAX_BATCH_SIZE];
     /* write_batch: Errors from attempting to map the gfns. */
     int errors[MAX_BATCH_SIZE];
+    /* write_batch: iovec[] for writev(). */
+    struct iovec iov[MAX_BATCH_SIZE + 4];
 };
 
 struct sr_restore_arrays {
diff --git a/tools/libs/saverestore/save.c b/tools/libs/saverestore/save.c
index 9ebbf00ce7..1a5f3d29ea 100644
--- a/tools/libs/saverestore/save.c
+++ b/tools/libs/saverestore/save.c
@@ -97,7 +97,7 @@ static int write_batch(struct xc_sr_context *ctx)
     unsigned int nr_pfns = ctx->save.nr_batch_pfns;
     void *page, *orig_page;
     uint64_t *rec_pfns = NULL;
-    struct iovec *iov = NULL; int iovcnt = 0;
+    struct iovec *iov = ctx->save.m->iov; int iovcnt = 0;
     struct xc_sr_rec_page_data_header hdr = { 0 };
     struct xc_sr_record rec = {
         .type = REC_TYPE_PAGE_DATA,
@@ -109,10 +109,8 @@ static int write_batch(struct xc_sr_context *ctx)
     guest_data = calloc(nr_pfns, sizeof(*guest_data));
     /* Pointers to locally allocated pages.  Need freeing. */
     local_pages = calloc(nr_pfns, sizeof(*local_pages));
-    /* iovec[] for writev(). */
-    iov = malloc((nr_pfns + 4) * sizeof(*iov));
 
-    if ( !guest_data || !local_pages || !iov )
+    if ( !guest_data || !local_pages )
     {
         ERROR("Unable to allocate arrays for a batch of %u pages",
               nr_pfns);
@@ -266,7 +264,6 @@ static int write_batch(struct xc_sr_context *ctx)
         xenforeignmemory_unmap(xch->fmem, guest_mapping, nr_pages_mapped);
     for ( i = 0; local_pages && i < nr_pfns; ++i )
         free(local_pages[i]);
-    free(iov);
     free(local_pages);
     free(guest_data);
 



* [PATCH v20210616 16/36] tools: save: move rec_pfns array
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (14 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 15/36] tools: save: move iov array Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 17/36] tools: save: move guest_data array Olaf Hering
                   ` (20 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu

Remove allocation from hotpath, move rec_pfns array into preallocated space.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 tools/libs/saverestore/common.h |  2 ++
 tools/libs/saverestore/save.c   | 11 +----------
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index ae87954364..2950947f1d 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -233,6 +233,8 @@ struct sr_save_arrays {
     int errors[MAX_BATCH_SIZE];
     /* write_batch: iovec[] for writev(). */
     struct iovec iov[MAX_BATCH_SIZE + 4];
+    /* write_batch */
+    uint64_t rec_pfns[MAX_BATCH_SIZE];
 };
 
 struct sr_restore_arrays {
diff --git a/tools/libs/saverestore/save.c b/tools/libs/saverestore/save.c
index 1a5f3d29ea..0f02988ff9 100644
--- a/tools/libs/saverestore/save.c
+++ b/tools/libs/saverestore/save.c
@@ -96,7 +96,7 @@ static int write_batch(struct xc_sr_context *ctx)
     unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
     unsigned int nr_pfns = ctx->save.nr_batch_pfns;
     void *page, *orig_page;
-    uint64_t *rec_pfns = NULL;
+    uint64_t *rec_pfns = ctx->save.m->rec_pfns;
     struct iovec *iov = ctx->save.m->iov; int iovcnt = 0;
     struct xc_sr_rec_page_data_header hdr = { 0 };
     struct xc_sr_record rec = {
@@ -201,14 +201,6 @@ static int write_batch(struct xc_sr_context *ctx)
         }
     }
 
-    rec_pfns = malloc(nr_pfns * sizeof(*rec_pfns));
-    if ( !rec_pfns )
-    {
-        ERROR("Unable to allocate %zu bytes of memory for page data pfn list",
-              nr_pfns * sizeof(*rec_pfns));
-        goto err;
-    }
-
     hdr.count = nr_pfns;
 
     rec.length = sizeof(hdr);
@@ -259,7 +251,6 @@ static int write_batch(struct xc_sr_context *ctx)
     rc = ctx->save.nr_batch_pfns = 0;
 
  err:
-    free(rec_pfns);
     if ( guest_mapping )
         xenforeignmemory_unmap(xch->fmem, guest_mapping, nr_pages_mapped);
     for ( i = 0; local_pages && i < nr_pfns; ++i )



* [PATCH v20210616 17/36] tools: save: move guest_data array
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (15 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 16/36] tools: save: move rec_pfns array Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 18/36] tools: save: move local_pages array Olaf Hering
                   ` (19 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu

Remove allocation from hotpath, move guest_data array into preallocated space.

The array was previously allocated with calloc and therefore zeroed for
each batch. Adjust the loop to clear unused entries explicitly.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 tools/libs/saverestore/common.h |  2 ++
 tools/libs/saverestore/save.c   | 11 ++++++-----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 2950947f1d..c4ab843c77 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -235,6 +235,8 @@ struct sr_save_arrays {
     struct iovec iov[MAX_BATCH_SIZE + 4];
     /* write_batch */
     uint64_t rec_pfns[MAX_BATCH_SIZE];
+    /* write_batch: Pointers to page data to send. Mapped gfns or local allocations. */
+    void *guest_data[MAX_BATCH_SIZE];
 };
 
 struct sr_restore_arrays {
diff --git a/tools/libs/saverestore/save.c b/tools/libs/saverestore/save.c
index 0f02988ff9..ea04cb1a74 100644
--- a/tools/libs/saverestore/save.c
+++ b/tools/libs/saverestore/save.c
@@ -90,7 +90,7 @@ static int write_batch(struct xc_sr_context *ctx)
     xc_interface *xch = ctx->xch;
     xen_pfn_t *mfns = ctx->save.m->mfns, *types = ctx->save.m->types;
     void *guest_mapping = NULL;
-    void **guest_data = NULL;
+    void **guest_data = ctx->save.m->guest_data;
     void **local_pages = NULL;
     int *errors = ctx->save.m->errors, rc = -1;
     unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
@@ -105,12 +105,10 @@ static int write_batch(struct xc_sr_context *ctx)
 
     assert(nr_pfns != 0);
 
-    /* Pointers to page data to send.  Mapped gfns or local allocations. */
-    guest_data = calloc(nr_pfns, sizeof(*guest_data));
     /* Pointers to locally allocated pages.  Need freeing. */
     local_pages = calloc(nr_pfns, sizeof(*local_pages));
 
-    if ( !guest_data || !local_pages )
+    if ( !local_pages )
     {
         ERROR("Unable to allocate arrays for a batch of %u pages",
               nr_pfns);
@@ -166,7 +164,10 @@ static int write_batch(struct xc_sr_context *ctx)
         for ( i = 0, p = 0; i < nr_pfns; ++i )
         {
             if ( page_type_has_stream_data(types[i]) == false )
+            {
+                guest_data[i] = NULL;
                 continue;
+            }
 
             if ( errors[p] )
             {
@@ -183,6 +184,7 @@ static int write_batch(struct xc_sr_context *ctx)
 
             if ( rc )
             {
+                guest_data[i] = NULL;
                 if ( rc == -1 && errno == EAGAIN )
                 {
                     set_bit(ctx->save.m->batch_pfns[i], ctx->save.deferred_pages);
@@ -256,7 +258,6 @@ static int write_batch(struct xc_sr_context *ctx)
     for ( i = 0; local_pages && i < nr_pfns; ++i )
         free(local_pages[i]);
     free(local_pages);
-    free(guest_data);
 
     return rc;
 }


^ permalink raw reply	[flat|nested] 46+ messages in thread
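
A minimal sketch of why the explicit clearing in the patch above is
needed once an array is reused instead of coming fresh from calloc. The
names slots, fill_slots and BATCH_MAX are hypothetical and only serve
the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_MAX 8 /* hypothetical stand-in for MAX_BATCH_SIZE */

/* Reused across calls; NOT zeroed between batches, unlike a fresh calloc. */
static void *slots[BATCH_MAX];

/*
 * Fill slots[] for one batch. has_data[i] tells whether entry i carries
 * page data, pages[i] is its payload. Entries without data are cleared
 * explicitly so a pointer left over from an earlier batch cannot be
 * picked up by mistake. Returns the number of entries with data.
 */
static unsigned int fill_slots(const bool *has_data, void *const *pages,
                               unsigned int nr)
{
    unsigned int i, used = 0;

    assert(nr <= BATCH_MAX);

    for ( i = 0; i < nr; ++i )
    {
        if ( !has_data[i] )
        {
            slots[i] = NULL;
            continue;
        }

        slots[i] = pages[i];
        ++used;
    }

    return used;
}
```

Without the assignment of NULL in the skip branch, a second batch would
still see the pointer stored by the first one.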

* [PATCH v20210616 18/36] tools: save: move local_pages array
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (16 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 17/36] tools: save: move guest_data array Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 19/36] tools: restore: move types array Olaf Hering
                   ` (18 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu

Remove allocation from hotpath, move local_pages array into preallocated space.

For HVM, use the src page as is.
For PV, the page may need to be normalised; use a private memory area for
this purpose.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 tools/libs/saverestore/common.h       | 22 ++++++++++---------
 tools/libs/saverestore/save.c         | 25 +++------------------
 tools/libs/saverestore/save_x86_hvm.c |  5 +++--
 tools/libs/saverestore/save_x86_pv.c  | 31 ++++++++++++++++++---------
 4 files changed, 39 insertions(+), 44 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index c4ab843c77..96ae0904fc 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -45,16 +45,12 @@ struct xc_sr_save_ops
      * Optionally transform the contents of a page from being specific to the
      * sending environment, to being generic for the stream.
      *
-     * The page of data at the end of 'page' may be a read-only mapping of a
-     * running guest; it must not be modified.  If no transformation is
-     * required, the callee should leave '*pages' untouched.
+     * The page of data '*src' may be a read-only mapping of a running guest;
+     * it must not be modified. If no transformation is required, the callee
+     * should leave '*src' untouched, and return it via '**ptr'.
      *
-     * If a transformation is required, the callee should allocate themselves
-     * a local page using malloc() and return it via '*page'.
-     *
-     * The caller shall free() '*page' in all cases.  In the case that the
-     * callee encounters an error, it should *NOT* free() the memory it
-     * allocated for '*page'.
+     * If a transformation is required, the callee should provide the
+     * transformed page in a private buffer and return it via '**ptr'.
      *
      * It is valid to fail with EAGAIN if the transformation is not able to be
      * completed at this point.  The page shall be retried later.
@@ -62,7 +58,7 @@ struct xc_sr_save_ops
      * @returns 0 for success, -1 for failure, with errno appropriately set.
      */
     int (*normalise_page)(struct xc_sr_context *ctx, xen_pfn_t type,
-                          void **page);
+                          void *src, unsigned int idx, void **ptr);
 
     /**
      * Set up local environment to save a domain. (Typically querying
@@ -385,6 +381,12 @@ struct xc_sr_context
 
                 union
                 {
+                    struct
+                    {
+                        /* Used by write_batch for modified pages. */
+                        void *normalised_pages;
+                    } save;
+
                     struct
                     {
                         /* State machine for the order of received records. */
diff --git a/tools/libs/saverestore/save.c b/tools/libs/saverestore/save.c
index ea04cb1a74..fa83648f9a 100644
--- a/tools/libs/saverestore/save.c
+++ b/tools/libs/saverestore/save.c
@@ -91,11 +91,10 @@ static int write_batch(struct xc_sr_context *ctx)
     xen_pfn_t *mfns = ctx->save.m->mfns, *types = ctx->save.m->types;
     void *guest_mapping = NULL;
     void **guest_data = ctx->save.m->guest_data;
-    void **local_pages = NULL;
     int *errors = ctx->save.m->errors, rc = -1;
     unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
     unsigned int nr_pfns = ctx->save.nr_batch_pfns;
-    void *page, *orig_page;
+    void *src;
     uint64_t *rec_pfns = ctx->save.m->rec_pfns;
     struct iovec *iov = ctx->save.m->iov; int iovcnt = 0;
     struct xc_sr_rec_page_data_header hdr = { 0 };
@@ -105,16 +104,6 @@ static int write_batch(struct xc_sr_context *ctx)
 
     assert(nr_pfns != 0);
 
-    /* Pointers to locally allocated pages.  Need freeing. */
-    local_pages = calloc(nr_pfns, sizeof(*local_pages));
-
-    if ( !local_pages )
-    {
-        ERROR("Unable to allocate arrays for a batch of %u pages",
-              nr_pfns);
-        goto err;
-    }
-
     for ( i = 0; i < nr_pfns; ++i )
     {
         types[i] = mfns[i] = ctx->save.ops.pfn_to_gfn(ctx,
@@ -176,11 +165,8 @@ static int write_batch(struct xc_sr_context *ctx)
                 goto err;
             }
 
-            orig_page = page = guest_mapping + (p * PAGE_SIZE);
-            rc = ctx->save.ops.normalise_page(ctx, types[i], &page);
-
-            if ( orig_page != page )
-                local_pages[i] = page;
+            src = guest_mapping + (p * PAGE_SIZE);
+            rc = ctx->save.ops.normalise_page(ctx, types[i], src, i, &guest_data[i]);
 
             if ( rc )
             {
@@ -195,8 +181,6 @@ static int write_batch(struct xc_sr_context *ctx)
                 else
                     goto err;
             }
-            else
-                guest_data[i] = page;
 
             rc = -1;
             ++p;
@@ -255,9 +239,6 @@ static int write_batch(struct xc_sr_context *ctx)
  err:
     if ( guest_mapping )
         xenforeignmemory_unmap(xch->fmem, guest_mapping, nr_pages_mapped);
-    for ( i = 0; local_pages && i < nr_pfns; ++i )
-        free(local_pages[i]);
-    free(local_pages);
 
     return rc;
 }
diff --git a/tools/libs/saverestore/save_x86_hvm.c b/tools/libs/saverestore/save_x86_hvm.c
index 91c2cb99ab..26f49ee267 100644
--- a/tools/libs/saverestore/save_x86_hvm.c
+++ b/tools/libs/saverestore/save_x86_hvm.c
@@ -129,9 +129,10 @@ static xen_pfn_t x86_hvm_pfn_to_gfn(const struct xc_sr_context *ctx,
     return pfn;
 }
 
-static int x86_hvm_normalise_page(struct xc_sr_context *ctx,
-                                  xen_pfn_t type, void **page)
+static int x86_hvm_normalise_page(struct xc_sr_context *ctx, xen_pfn_t type,
+                                  void *src, unsigned int idx, void **ptr)
 {
+    *ptr = src;
     return 0;
 }
 
diff --git a/tools/libs/saverestore/save_x86_pv.c b/tools/libs/saverestore/save_x86_pv.c
index 92f77fad0f..159ff59480 100644
--- a/tools/libs/saverestore/save_x86_pv.c
+++ b/tools/libs/saverestore/save_x86_pv.c
@@ -999,29 +999,31 @@ static xen_pfn_t x86_pv_pfn_to_gfn(const struct xc_sr_context *ctx,
  * save_ops function.  Performs pagetable normalisation on appropriate pages.
  */
 static int x86_pv_normalise_page(struct xc_sr_context *ctx, xen_pfn_t type,
-                                 void **page)
+                                  void *src, unsigned int idx, void **ptr)
 {
     xc_interface *xch = ctx->xch;
-    void *local_page;
     int rc;
+    void *dst;
 
     type &= XEN_DOMCTL_PFINFO_LTABTYPE_MASK;
 
     if ( type < XEN_DOMCTL_PFINFO_L1TAB || type > XEN_DOMCTL_PFINFO_L4TAB )
+    {
+        *ptr = src;
         return 0;
+    }
 
-    local_page = malloc(PAGE_SIZE);
-    if ( !local_page )
+    if ( idx >= MAX_BATCH_SIZE )
     {
-        ERROR("Unable to allocate scratch page");
-        rc = -1;
-        goto out;
+        ERROR("idx %u out of range", idx);
+        errno = ERANGE;
+        return -1;
     }
 
-    rc = normalise_pagetable(ctx, *page, local_page, type);
-    *page = local_page;
+    dst = ctx->x86.pv.save.normalised_pages + idx * PAGE_SIZE;
+    rc = normalise_pagetable(ctx, src, dst, type);
+    *ptr = dst;
 
- out:
     return rc;
 }
 
@@ -1031,8 +1033,16 @@ static int x86_pv_normalise_page(struct xc_sr_context *ctx, xen_pfn_t type,
  */
 static int x86_pv_setup(struct xc_sr_context *ctx)
 {
+    xc_interface *xch = ctx->xch;
     int rc;
 
+    ctx->x86.pv.save.normalised_pages = malloc(MAX_BATCH_SIZE * PAGE_SIZE);
+    if ( !ctx->x86.pv.save.normalised_pages )
+    {
+        PERROR("Failed to allocate normalised_pages");
+        return -1;
+    }
+
     rc = x86_pv_domain_info(ctx);
     if ( rc )
         return rc;
@@ -1118,6 +1128,7 @@ static int x86_pv_check_vm_state(struct xc_sr_context *ctx)
 
 static int x86_pv_cleanup(struct xc_sr_context *ctx)
 {
+    free(ctx->x86.pv.save.normalised_pages);
     free(ctx->x86.pv.p2m_pfns);
 
     if ( ctx->x86.pv.p2m )


^ permalink raw reply	[flat|nested] 46+ messages in thread
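
The calling convention introduced by the patch above can be sketched as
follows: the callee either hands back the source page unchanged or
writes a transformed copy into a private buffer slot selected by idx.
This is an illustration under assumptions, not the library code:
PG_SIZE, BATCH_MAX, scratch, transform and normalise are hypothetical,
and the byte inversion merely stands in for normalise_pagetable().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define PG_SIZE   16u /* tiny hypothetical page size for the sketch */
#define BATCH_MAX 4u  /* hypothetical stand-in for MAX_BATCH_SIZE */

/* Private buffer, one slot per batch index; allocated once. */
static unsigned char scratch[BATCH_MAX * PG_SIZE];

/* Hypothetical transformation: invert every byte of the page. */
static void transform(const unsigned char *src, unsigned char *dst)
{
    size_t i;

    for ( i = 0; i < PG_SIZE; ++i )
        dst[i] = (unsigned char)~src[i];
}

/*
 * Return the page to send via *ptr. The source page is never modified.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int normalise(int needs_transform, void *src, unsigned int idx,
                     void **ptr)
{
    unsigned char *dst;

    if ( !needs_transform )
    {
        *ptr = src; /* pass-through, no copy */
        return 0;
    }

    if ( idx >= BATCH_MAX )
    {
        errno = ERANGE;
        return -1;
    }

    dst = scratch + (size_t)idx * PG_SIZE;
    transform(src, dst);
    *ptr = dst;

    return 0;
}
```

Because the caller never owns the returned pointer, there is nothing to
free per page, which is what allows the local_pages bookkeeping to go
away.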

* [PATCH v20210616 19/36] tools: restore: move types array
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (17 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 18/36] tools: save: move local_pages array Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 20/36] tools: restore: move mfns array Olaf Hering
                   ` (17 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu

Remove allocation from hotpath, move types array into preallocated space.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 tools/libs/saverestore/common.h  |  1 +
 tools/libs/saverestore/restore.c | 12 +-----------
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 96ae0904fc..fe44302eac 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -238,6 +238,7 @@ struct sr_save_arrays {
 struct sr_restore_arrays {
     /* handle_page_data */
     xen_pfn_t pfns[MAX_BATCH_SIZE];
+    uint32_t types[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/saverestore/restore.c b/tools/libs/saverestore/restore.c
index e18a03b381..d460a2b2b5 100644
--- a/tools/libs/saverestore/restore.c
+++ b/tools/libs/saverestore/restore.c
@@ -316,7 +316,7 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     int rc = -1;
 
     xen_pfn_t *pfns = ctx->restore.m->pfns, pfn;
-    uint32_t *types = NULL, type;
+    uint32_t *types = ctx->restore.m->types, type;
 
     /*
      * v2 compatibility only exists for x86 streams.  This is a bit of a
@@ -363,14 +363,6 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
         goto err;
     }
 
-    types = malloc(pages->count * sizeof(*types));
-    if ( !types )
-    {
-        ERROR("Unable to allocate enough memory for %u pfns",
-              pages->count);
-        goto err;
-    }
-
     for ( i = 0; i < pages->count; ++i )
     {
         pfn = pages->pfn[i] & PAGE_DATA_PFN_MASK;
@@ -410,8 +402,6 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     rc = process_page_data(ctx, pages->count, pfns, types,
                            &pages->pfn[pages->count]);
  err:
-    free(types);
-
     return rc;
 }
 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v20210616 20/36] tools: restore: move mfns array
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (18 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 19/36] tools: restore: move types array Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 21/36] tools: restore: move map_errs array Olaf Hering
                   ` (16 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu

Remove allocation from hotpath, move mfns array into preallocated space.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 tools/libs/saverestore/common.h  | 2 ++
 tools/libs/saverestore/restore.c | 5 ++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index fe44302eac..54352f5427 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -239,6 +239,8 @@ struct sr_restore_arrays {
     /* handle_page_data */
     xen_pfn_t pfns[MAX_BATCH_SIZE];
     uint32_t types[MAX_BATCH_SIZE];
+    /* process_page_data */
+    xen_pfn_t mfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/saverestore/restore.c b/tools/libs/saverestore/restore.c
index d460a2b2b5..1a7cfbcd47 100644
--- a/tools/libs/saverestore/restore.c
+++ b/tools/libs/saverestore/restore.c
@@ -205,7 +205,7 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
                              xen_pfn_t *pfns, uint32_t *types, void *page_data)
 {
     xc_interface *xch = ctx->xch;
-    xen_pfn_t *mfns = malloc(count * sizeof(*mfns));
+    xen_pfn_t *mfns = ctx->restore.m->mfns;
     int *map_errs = malloc(count * sizeof(*map_errs));
     int rc;
     void *mapping = NULL, *guest_page = NULL;
@@ -213,7 +213,7 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
         j,          /* j indexes the subset of pfns we decide to map. */
         nr_pages = 0;
 
-    if ( !mfns || !map_errs )
+    if ( !map_errs )
     {
         rc = -1;
         ERROR("Failed to allocate %zu bytes to process page data",
@@ -299,7 +299,6 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
         xenforeignmemory_unmap(xch->fmem, mapping, nr_pages);
 
     free(map_errs);
-    free(mfns);
 
     return rc;
 }


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v20210616 21/36] tools: restore: move map_errs array
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (19 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 20/36] tools: restore: move mfns array Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 22/36] tools: restore: move mfns array in populate_pfns Olaf Hering
                   ` (15 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu

Remove allocation from hotpath, move map_errs array into preallocated space.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 tools/libs/saverestore/common.h  |  1 +
 tools/libs/saverestore/restore.c | 12 +-----------
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 54352f5427..34042c2b90 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -241,6 +241,7 @@ struct sr_restore_arrays {
     uint32_t types[MAX_BATCH_SIZE];
     /* process_page_data */
     xen_pfn_t mfns[MAX_BATCH_SIZE];
+    int map_errs[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/saverestore/restore.c b/tools/libs/saverestore/restore.c
index 1a7cfbcd47..6eb955423c 100644
--- a/tools/libs/saverestore/restore.c
+++ b/tools/libs/saverestore/restore.c
@@ -206,21 +206,13 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
 {
     xc_interface *xch = ctx->xch;
     xen_pfn_t *mfns = ctx->restore.m->mfns;
-    int *map_errs = malloc(count * sizeof(*map_errs));
+    int *map_errs = ctx->restore.m->map_errs;
     int rc;
     void *mapping = NULL, *guest_page = NULL;
     unsigned int i, /* i indexes the pfns from the record. */
         j,          /* j indexes the subset of pfns we decide to map. */
         nr_pages = 0;
 
-    if ( !map_errs )
-    {
-        rc = -1;
-        ERROR("Failed to allocate %zu bytes to process page data",
-              count * (sizeof(*mfns) + sizeof(*map_errs)));
-        goto err;
-    }
-
     rc = populate_pfns(ctx, count, pfns, types);
     if ( rc )
     {
@@ -298,8 +290,6 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
     if ( mapping )
         xenforeignmemory_unmap(xch->fmem, mapping, nr_pages);
 
-    free(map_errs);
-
     return rc;
 }
 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v20210616 22/36] tools: restore: move mfns array in populate_pfns
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (20 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 21/36] tools: restore: move map_errs array Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 23/36] tools: restore: move pfns " Olaf Hering
                   ` (14 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu

Remove allocation from hotpath, move the mfns array of populate_pfns into
preallocated space. Use the pp_ prefix to avoid a conflict with the mfns
array used in process_page_data.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 tools/libs/saverestore/common.h  | 2 ++
 tools/libs/saverestore/restore.c | 5 ++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 34042c2b90..3cfb23861f 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -242,6 +242,8 @@ struct sr_restore_arrays {
     /* process_page_data */
     xen_pfn_t mfns[MAX_BATCH_SIZE];
     int map_errs[MAX_BATCH_SIZE];
+    /* populate_pfns */
+    xen_pfn_t pp_mfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/saverestore/restore.c b/tools/libs/saverestore/restore.c
index 6eb955423c..0c29478ccb 100644
--- a/tools/libs/saverestore/restore.c
+++ b/tools/libs/saverestore/restore.c
@@ -138,12 +138,12 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
                   const xen_pfn_t *original_pfns, const uint32_t *types)
 {
     xc_interface *xch = ctx->xch;
-    xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
+    xen_pfn_t *mfns = ctx->restore.m->pp_mfns,
         *pfns = malloc(count * sizeof(*pfns));
     unsigned int i, nr_pfns = 0;
     int rc = -1;
 
-    if ( !mfns || !pfns )
+    if ( !pfns )
     {
         ERROR("Failed to allocate %zu bytes for populating the physmap",
               2 * count * sizeof(*mfns));
@@ -191,7 +191,6 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
 
  err:
     free(pfns);
-    free(mfns);
 
     return rc;
 }


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v20210616 23/36] tools: restore: move pfns array in populate_pfns
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (21 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 22/36] tools: restore: move mfns array in populate_pfns Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 24/36] tools: restore: split record processing Olaf Hering
                   ` (13 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu

Remove allocation from hotpath, move the pfns array of populate_pfns into
preallocated space. Use the pp_ prefix to avoid a conflict with the pfns
array used in handle_page_data.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 tools/libs/saverestore/common.h  |  1 +
 tools/libs/saverestore/restore.c | 11 +----------
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 3cfb23861f..379887e149 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -244,6 +244,7 @@ struct sr_restore_arrays {
     int map_errs[MAX_BATCH_SIZE];
     /* populate_pfns */
     xen_pfn_t pp_mfns[MAX_BATCH_SIZE];
+    xen_pfn_t pp_pfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/saverestore/restore.c b/tools/libs/saverestore/restore.c
index 0c29478ccb..f2234eac55 100644
--- a/tools/libs/saverestore/restore.c
+++ b/tools/libs/saverestore/restore.c
@@ -139,17 +139,10 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
 {
     xc_interface *xch = ctx->xch;
     xen_pfn_t *mfns = ctx->restore.m->pp_mfns,
-        *pfns = malloc(count * sizeof(*pfns));
+        *pfns = ctx->restore.m->pp_pfns;
     unsigned int i, nr_pfns = 0;
     int rc = -1;
 
-    if ( !pfns )
-    {
-        ERROR("Failed to allocate %zu bytes for populating the physmap",
-              2 * count * sizeof(*mfns));
-        goto err;
-    }
-
     for ( i = 0; i < count; ++i )
     {
         if ( (!types ||
@@ -190,8 +183,6 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
     rc = 0;
 
  err:
-    free(pfns);
-
     return rc;
 }
 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v20210616 24/36] tools: restore: split record processing
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (22 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 23/36] tools: restore: move pfns " Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 25/36] tools: restore: split handle_page_data Olaf Hering
                   ` (12 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Juergen Gross, Ian Jackson, Wei Liu

handle_page_data must be able to read directly into mapped guest memory.
This will avoid unnecessary memcpy calls for data which can be consumed verbatim.

Rearrange the code to allow decisions based on the incoming record.

This change is preparation for future changes in handle_page_data;
no change in behavior is intended.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
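
A minimal sketch of the two-step read described above: the fixed-size
header is fetched and validated first, so the caller can decide where
the payload should go before any payload byte is read. The names
rec_hdr, record, read_all, read_hdr, read_data and LEN_MAX are
hypothetical; the 8-byte padding is an assumption standing in for
ROUNDUP(length, REC_ALIGN_ORDER), and EINTR handling is left out for
brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define LEN_MAX 4096u /* hypothetical cap, stand-in for REC_LENGTH_MAX */

struct rec_hdr { uint32_t type, length; };
struct record  { uint32_t type, length; void *data; };

/* Read exactly len bytes or fail. */
static int read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    while ( len )
    {
        ssize_t n = read(fd, p, len);

        if ( n <= 0 )
            return -1;

        p += n;
        len -= (size_t)n;
    }

    return 0;
}

/* Step 1: fetch and validate only the header. */
static int read_hdr(int fd, struct rec_hdr *h)
{
    if ( read_all(fd, h, sizeof(*h)) )
        return -1;

    return h->length > LEN_MAX ? -1 : 0;
}

/* Step 2: fetch the padded payload into a freshly allocated buffer. */
static int read_data(int fd, const struct rec_hdr *h, struct record *r)
{
    size_t sz = ((size_t)h->length + 7) & ~(size_t)7;

    r->data = NULL;

    if ( sz )
    {
        r->data = malloc(sz);
        if ( !r->data )
            return -1;

        if ( read_all(fd, r->data, sz) )
        {
            free(r->data);
            r->data = NULL;
            return -1;
        }
    }

    r->type = h->type;
    r->length = h->length;

    return 0;
}
```

A caller that wants to place page data directly into a mapped buffer
could branch on the header type after step 1 and skip the generic step 2
for that record type.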
 tools/libs/saverestore/common.c  | 33 ++++++++++++---------
 tools/libs/saverestore/common.h  |  4 ++-
 tools/libs/saverestore/restore.c | 49 ++++++++++++++++++++++----------
 tools/libs/saverestore/save.c    |  7 ++++-
 4 files changed, 63 insertions(+), 30 deletions(-)

diff --git a/tools/libs/saverestore/common.c b/tools/libs/saverestore/common.c
index 77128bc747..7da7fa4e2c 100644
--- a/tools/libs/saverestore/common.c
+++ b/tools/libs/saverestore/common.c
@@ -91,26 +91,33 @@ int write_split_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
     return -1;
 }
 
-int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec)
+int read_record_header(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr *rhdr)
 {
     xc_interface *xch = ctx->xch;
-    struct xc_sr_rhdr rhdr;
-    size_t datasz;
 
-    if ( read_exact(fd, &rhdr, sizeof(rhdr)) )
+    if ( read_exact(fd, rhdr, sizeof(*rhdr)) )
     {
         PERROR("Failed to read Record Header from stream");
         return -1;
     }
 
-    if ( rhdr.length > REC_LENGTH_MAX )
+    if ( rhdr->length > REC_LENGTH_MAX )
     {
-        ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr.type,
-              rec_type_to_str(rhdr.type), rhdr.length, REC_LENGTH_MAX);
+        ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr->type,
+              rec_type_to_str(rhdr->type), rhdr->length, REC_LENGTH_MAX);
         return -1;
     }
 
-    datasz = ROUNDUP(rhdr.length, REC_ALIGN_ORDER);
+    return 0;
+}
+
+int read_record_data(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr *rhdr,
+                     struct xc_sr_record *rec)
+{
+    xc_interface *xch = ctx->xch;
+    size_t datasz;
+
+    datasz = ROUNDUP(rhdr->length, REC_ALIGN_ORDER);
 
     if ( datasz )
     {
@@ -119,7 +126,7 @@ int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec)
         if ( !rec->data )
         {
             ERROR("Unable to allocate %zu bytes for record data (0x%08x, %s)",
-                  datasz, rhdr.type, rec_type_to_str(rhdr.type));
+                  datasz, rhdr->type, rec_type_to_str(rhdr->type));
             return -1;
         }
 
@@ -128,18 +135,18 @@ int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec)
             free(rec->data);
             rec->data = NULL;
             PERROR("Failed to read %zu bytes of data for record (0x%08x, %s)",
-                   datasz, rhdr.type, rec_type_to_str(rhdr.type));
+                   datasz, rhdr->type, rec_type_to_str(rhdr->type));
             return -1;
         }
     }
     else
         rec->data = NULL;
 
-    rec->type   = rhdr.type;
-    rec->length = rhdr.length;
+    rec->type   = rhdr->type;
+    rec->length = rhdr->length;
 
     return 0;
-};
+}
 
 static void __attribute__((unused)) build_assertions(void)
 {
diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 379887e149..2ced6f100d 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -487,7 +487,9 @@ static inline int write_record(struct xc_sr_context *ctx,
  *
  * On failure, the contents of the record structure are undefined.
  */
-int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
+int read_record_header(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr *rhdr);
+int read_record_data(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr *rhdr,
+                     struct xc_sr_record *rec);
 
 /*
  * This would ideally be private in restore.c, but is needed by
diff --git a/tools/libs/saverestore/restore.c b/tools/libs/saverestore/restore.c
index f2234eac55..2409c8d603 100644
--- a/tools/libs/saverestore/restore.c
+++ b/tools/libs/saverestore/restore.c
@@ -471,7 +471,7 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx)
     return rc;
 }
 
-static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
+static int process_buffered_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
 static int handle_checkpoint(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
@@ -510,7 +510,7 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
 
         for ( i = 0; i < ctx->restore.buffered_rec_num; i++ )
         {
-            rc = process_record(ctx, &ctx->restore.buffered_records[i]);
+            rc = process_buffered_record(ctx, &ctx->restore.buffered_records[i]);
             if ( rc )
                 goto err;
         }
@@ -571,10 +571,11 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
     return rc;
 }
 
-static int buffer_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
+static int buffer_record(struct xc_sr_context *ctx, struct xc_sr_rhdr *rhdr)
 {
     xc_interface *xch = ctx->xch;
     unsigned int new_alloc_num;
+    struct xc_sr_record rec;
     struct xc_sr_record *p;
 
     if ( ctx->restore.buffered_rec_num >= ctx->restore.allocated_rec_num )
@@ -592,8 +593,13 @@ static int buffer_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
         ctx->restore.allocated_rec_num = new_alloc_num;
     }
 
+    if ( read_record_data(ctx, ctx->fd, rhdr, &rec) )
+    {
+        return -1;
+    }
+
     memcpy(&ctx->restore.buffered_records[ctx->restore.buffered_rec_num++],
-           rec, sizeof(*rec));
+           &rec, sizeof(rec));
 
     return 0;
 }
@@ -624,7 +630,7 @@ int handle_static_data_end(struct xc_sr_context *ctx)
     return rc;
 }
 
-static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
+static int process_buffered_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
 {
     xc_interface *xch = ctx->xch;
     int rc = 0;
@@ -662,6 +668,19 @@ static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     return rc;
 }
 
+static int process_incoming_record_header(struct xc_sr_context *ctx, struct xc_sr_rhdr *rhdr)
+{
+    struct xc_sr_record rec;
+    int rc;
+
+    rc = read_record_data(ctx, ctx->fd, rhdr, &rec);
+    if ( rc )
+        return rc;
+
+    return process_buffered_record(ctx, &rec);
+}
+
+
 static int setup(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
@@ -745,7 +764,7 @@ static void cleanup(struct xc_sr_context *ctx)
 static int restore(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
-    struct xc_sr_record rec;
+    struct xc_sr_rhdr rhdr;
     int rc, saved_rc = 0, saved_errno = 0;
 
     IPRINTF("Restoring domain");
@@ -756,7 +775,7 @@ static int restore(struct xc_sr_context *ctx)
 
     do
     {
-        rc = read_record(ctx, ctx->fd, &rec);
+        rc = read_record_header(ctx, ctx->fd, &rhdr);
         if ( rc )
         {
             if ( ctx->restore.buffer_all_records )
@@ -766,25 +785,25 @@ static int restore(struct xc_sr_context *ctx)
         }
 
         if ( ctx->restore.buffer_all_records &&
-             rec.type != REC_TYPE_END &&
-             rec.type != REC_TYPE_CHECKPOINT )
+             rhdr.type != REC_TYPE_END &&
+             rhdr.type != REC_TYPE_CHECKPOINT )
         {
-            rc = buffer_record(ctx, &rec);
+            rc = buffer_record(ctx, &rhdr);
             if ( rc )
                 goto err;
         }
         else
         {
-            rc = process_record(ctx, &rec);
+            rc = process_incoming_record_header(ctx, &rhdr);
             if ( rc == RECORD_NOT_PROCESSED )
             {
-                if ( rec.type & REC_TYPE_OPTIONAL )
+                if ( rhdr.type & REC_TYPE_OPTIONAL )
                     DPRINTF("Ignoring optional record %#x (%s)",
-                            rec.type, rec_type_to_str(rec.type));
+                            rhdr.type, rec_type_to_str(rhdr.type));
                 else
                 {
                     ERROR("Mandatory record %#x (%s) not handled",
-                          rec.type, rec_type_to_str(rec.type));
+                          rhdr.type, rec_type_to_str(rhdr.type));
                     rc = -1;
                     goto err;
                 }
@@ -795,7 +814,7 @@ static int restore(struct xc_sr_context *ctx)
                 goto err;
         }
 
-    } while ( rec.type != REC_TYPE_END );
+    } while ( rhdr.type != REC_TYPE_END );
 
  remus_failover:
     if ( ctx->stream_type == XC_STREAM_COLO )
diff --git a/tools/libs/saverestore/save.c b/tools/libs/saverestore/save.c
index fa83648f9a..e486bce96f 100644
--- a/tools/libs/saverestore/save.c
+++ b/tools/libs/saverestore/save.c
@@ -589,6 +589,7 @@ static int send_memory_live(struct xc_sr_context *ctx)
 static int colo_merge_secondary_dirty_bitmap(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
+    struct xc_sr_rhdr rhdr;
     struct xc_sr_record rec;
     uint64_t *pfns = NULL;
     uint64_t pfn;
@@ -597,7 +598,11 @@ static int colo_merge_secondary_dirty_bitmap(struct xc_sr_context *ctx)
     DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
                                     &ctx->save.dirty_bitmap_hbuf);
 
-    rc = read_record(ctx, ctx->save.recv_fd, &rec);
+    rc = read_record_header(ctx, ctx->save.recv_fd, &rhdr);
+    if ( rc )
+        goto err;
+
+    rc = read_record_data(ctx, ctx->save.recv_fd, &rhdr, &rec);
     if ( rc )
         goto err;
 



* [PATCH v20210616 25/36] tools: restore: split handle_page_data
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (23 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 24/36] tools: restore: split record processing Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 26/36] tools: restore: write data directly into guest Olaf Hering
                   ` (11 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Ian Jackson, Wei Liu, Juergen Gross

handle_page_data must be able to read directly into mapped guest memory.
This will avoid unnecessary memcpy calls for data that can be consumed verbatim.

Split the various steps of record processing:
- move processing to handle_buffered_page_data
- adjust xenforeignmemory_map to set errno in case of failure
- adjust verify mode to set errno in case of failure

This change is preparation for future changes in handle_page_data,
no change in behavior is intended.
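
As an illustration of the bookkeeping that map_guest_pages performs, the
following is a minimal standalone sketch, not the code from this patch. It
assumes a contiguous mapping that holds only the pages which carry stream
data, and it uses hypothetical names (sketch_has_stream_data,
sketch_assign_guest_data, SKETCH_PAGE_SIZE) in place of the real helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical stand-in for page_type_has_stream_data(): in this sketch
 * only type 0 carries page data in the stream.
 */
static bool sketch_has_stream_data(uint32_t type)
{
    return type == 0;
}

/*
 * For each of the 'count' pfns, store either the address of its page
 * inside 'mapping' or NULL if the pfn has no data in the stream.
 * Mapped pages are packed, so the mapping index 'p' only advances for
 * pfns that carry data. Returns the number of pages used in 'mapping'.
 */
static size_t sketch_assign_guest_data(unsigned char *mapping,
                                       const uint32_t *types, size_t count,
                                       void **guest_data)
{
    size_t i, p = 0;

    for ( i = 0; i < count; i++ )
    {
        if ( !sketch_has_stream_data(types[i]) )
        {
            guest_data[i] = NULL;
            continue;
        }

        guest_data[i] = mapping + p * SKETCH_PAGE_SIZE;
        p++;
    }

    return p;
}
```

Keeping one slot per pfn, rather than one per mapped page, lets later loops
test guest_data[i] for NULL instead of re-evaluating the page type.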

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
 tools/libs/saverestore/common.h  |   9 +
 tools/libs/saverestore/restore.c | 343 ++++++++++++++++++++-----------
 2 files changed, 231 insertions(+), 121 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 2ced6f100d..d479f1a918 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -242,9 +242,14 @@ struct sr_restore_arrays {
     /* process_page_data */
     xen_pfn_t mfns[MAX_BATCH_SIZE];
     int map_errs[MAX_BATCH_SIZE];
+    void *guest_data[MAX_BATCH_SIZE];
+
     /* populate_pfns */
     xen_pfn_t pp_mfns[MAX_BATCH_SIZE];
     xen_pfn_t pp_pfns[MAX_BATCH_SIZE];
+
+    /* Must be the last member */
+    struct xc_sr_rec_page_data_header pages;
 };
 
 struct xc_sr_context
@@ -335,7 +340,11 @@ struct xc_sr_context
 
             /* Sender has invoked verify mode on the stream. */
             bool verify;
+            void *verify_buf;
+
             struct sr_restore_arrays *m;
+            void *guest_mapping;
+            uint32_t nr_mapped_pages;
         } restore;
     };
 
diff --git a/tools/libs/saverestore/restore.c b/tools/libs/saverestore/restore.c
index 2409c8d603..877fd19a9b 100644
--- a/tools/libs/saverestore/restore.c
+++ b/tools/libs/saverestore/restore.c
@@ -186,123 +186,18 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
     return rc;
 }
 
-/*
- * Given a list of pfns, their types, and a block of page data from the
- * stream, populate and record their types, map the relevant subset and copy
- * the data into the guest.
- */
-static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
-                             xen_pfn_t *pfns, uint32_t *types, void *page_data)
+static int handle_static_data_end_v2(struct xc_sr_context *ctx)
 {
-    xc_interface *xch = ctx->xch;
-    xen_pfn_t *mfns = ctx->restore.m->mfns;
-    int *map_errs = ctx->restore.m->map_errs;
-    int rc;
-    void *mapping = NULL, *guest_page = NULL;
-    unsigned int i, /* i indexes the pfns from the record. */
-        j,          /* j indexes the subset of pfns we decide to map. */
-        nr_pages = 0;
-
-    rc = populate_pfns(ctx, count, pfns, types);
-    if ( rc )
-    {
-        ERROR("Failed to populate pfns for batch of %u pages", count);
-        goto err;
-    }
-
-    for ( i = 0; i < count; ++i )
-    {
-        ctx->restore.ops.set_page_type(ctx, pfns[i], types[i]);
-
-        if ( page_type_has_stream_data(types[i]) == true )
-            mfns[nr_pages++] = ctx->restore.ops.pfn_to_gfn(ctx, pfns[i]);
-    }
-
-    /* Nothing to do? */
-    if ( nr_pages == 0 )
-        goto done;
-
-    mapping = guest_page = xenforeignmemory_map(
-        xch->fmem, ctx->domid, PROT_READ | PROT_WRITE,
-        nr_pages, mfns, map_errs);
-    if ( !mapping )
-    {
-        rc = -1;
-        PERROR("Unable to map %u mfns for %u pages of data",
-               nr_pages, count);
-        goto err;
-    }
-
-    for ( i = 0, j = 0; i < count; ++i )
-    {
-        if ( page_type_has_stream_data(types[i]) == false )
-            continue;
-
-        if ( map_errs[j] )
-        {
-            rc = -1;
-            ERROR("Mapping pfn %#"PRIpfn" (mfn %#"PRIpfn", type %#"PRIx32") failed with %d",
-                  pfns[i], mfns[j], types[i], map_errs[j]);
-            goto err;
-        }
-
-        /* Undo page normalisation done by the saver. */
-        rc = ctx->restore.ops.localise_page(ctx, types[i], page_data);
-        if ( rc )
-        {
-            ERROR("Failed to localise pfn %#"PRIpfn" (type %#"PRIx32")",
-                  pfns[i], types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
-            goto err;
-        }
-
-        if ( ctx->restore.verify )
-        {
-            /* Verify mode - compare incoming data to what we already have. */
-            if ( memcmp(guest_page, page_data, PAGE_SIZE) )
-                ERROR("verify pfn %#"PRIpfn" failed (type %#"PRIx32")",
-                      pfns[i], types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
-        }
-        else
-        {
-            /* Regular mode - copy incoming data into place. */
-            memcpy(guest_page, page_data, PAGE_SIZE);
-        }
-
-        ++j;
-        guest_page += PAGE_SIZE;
-        page_data += PAGE_SIZE;
-    }
-
- done:
-    rc = 0;
-
- err:
-    if ( mapping )
-        xenforeignmemory_unmap(xch->fmem, mapping, nr_pages);
-
-    return rc;
-}
+    int rc = 0;
 
-/*
- * Validate a PAGE_DATA record from the stream, and pass the results to
- * process_page_data() to actually perform the legwork.
- */
-static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
-{
+#if defined(__i386__) || defined(__x86_64__)
     xc_interface *xch = ctx->xch;
-    struct xc_sr_rec_page_data_header *pages = rec->data;
-    unsigned int i, pages_of_data = 0;
-    int rc = -1;
-
-    xen_pfn_t *pfns = ctx->restore.m->pfns, pfn;
-    uint32_t *types = ctx->restore.m->types, type;
-
     /*
      * v2 compatibility only exists for x86 streams.  This is a bit of a
      * bodge, but it is less bad than duplicating handle_page_data() between
      * different architectures.
      */
-#if defined(__i386__) || defined(__x86_64__)
+
     /* v2 compat.  Infer the position of STATIC_DATA_END. */
     if ( ctx->restore.format_version < 3 && !ctx->restore.seen_static_data_end )
     {
@@ -320,12 +215,26 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
         ERROR("No STATIC_DATA_END seen");
         goto err;
     }
+
+    rc = 0;
+err:
 #endif
 
-    if ( rec->length < sizeof(*pages) )
+    return rc;
+}
+
+static bool verify_rec_page_hdr(struct xc_sr_context *ctx, uint32_t rec_length,
+                                 struct xc_sr_rec_page_data_header *pages)
+{
+    xc_interface *xch = ctx->xch;
+    bool ret = false;
+
+    errno = EINVAL;
+
+    if ( rec_length < sizeof(*pages) )
     {
         ERROR("PAGE_DATA record truncated: length %u, min %zu",
-              rec->length, sizeof(*pages));
+              rec_length, sizeof(*pages));
         goto err;
     }
 
@@ -335,13 +244,35 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
         goto err;
     }
 
-    if ( rec->length < sizeof(*pages) + (pages->count * sizeof(uint64_t)) )
+    if ( pages->count > MAX_BATCH_SIZE )
+    {
+        ERROR("pfn count %u in PAGE_DATA record too large", pages->count);
+        errno = E2BIG;
+        goto err;
+    }
+
+    if ( rec_length < sizeof(*pages) + (pages->count * sizeof(uint64_t)) )
     {
         ERROR("PAGE_DATA record (length %u) too short to contain %u"
-              " pfns worth of information", rec->length, pages->count);
+              " pfns worth of information", rec_length, pages->count);
         goto err;
     }
 
+    ret = true;
+
+err:
+    return ret;
+}
+
+static bool verify_rec_page_pfns(struct xc_sr_context *ctx, uint32_t rec_length,
+                                 struct xc_sr_rec_page_data_header *pages)
+{
+    xc_interface *xch = ctx->xch;
+    uint32_t i, pages_of_data = 0;
+    xen_pfn_t pfn;
+    uint32_t type;
+    bool ret = false;
+
     for ( i = 0; i < pages->count; ++i )
     {
         pfn = pages->pfn[i] & PAGE_DATA_PFN_MASK;
@@ -364,23 +295,183 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
              * have a page worth of data in the record. */
             pages_of_data++;
 
-        pfns[i] = pfn;
-        types[i] = type;
+        ctx->restore.m->pfns[i] = pfn;
+        ctx->restore.m->types[i] = type;
     }
 
-    if ( rec->length != (sizeof(*pages) +
+    if ( rec_length != (sizeof(*pages) +
                          (sizeof(uint64_t) * pages->count) +
                          (PAGE_SIZE * pages_of_data)) )
     {
         ERROR("PAGE_DATA record wrong size: length %u, expected "
-              "%zu + %zu + %lu", rec->length, sizeof(*pages),
+              "%zu + %zu + %lu", rec_length, sizeof(*pages),
               (sizeof(uint64_t) * pages->count), (PAGE_SIZE * pages_of_data));
         goto err;
     }
 
-    rc = process_page_data(ctx, pages->count, pfns, types,
-                           &pages->pfn[pages->count]);
+    ret = true;
+
+err:
+    return ret;
+}
+
+/*
+ * Populate pfns, if required
+ * Fill m->guest_data with either mapped address or NULL
+ * The caller must unmap guest_mapping
+ */
+static int map_guest_pages(struct xc_sr_context *ctx,
+                           struct xc_sr_rec_page_data_header *pages)
+{
+    xc_interface *xch = ctx->xch;
+    struct sr_restore_arrays *m = ctx->restore.m;
+    uint32_t i, p;
+    int rc;
+
+    rc = populate_pfns(ctx, pages->count, m->pfns, m->types);
+    if ( rc )
+    {
+        ERROR("Failed to populate pfns for batch of %u pages", pages->count);
+        goto err;
+    }
+
+    ctx->restore.nr_mapped_pages = 0;
+
+    for ( i = 0; i < pages->count; i++ )
+    {
+        ctx->restore.ops.set_page_type(ctx, m->pfns[i], m->types[i]);
+
+        if ( page_type_has_stream_data(m->types[i]) == false )
+        {
+            m->guest_data[i] = NULL;
+            continue;
+        }
+
+        m->mfns[ctx->restore.nr_mapped_pages++] = ctx->restore.ops.pfn_to_gfn(ctx, m->pfns[i]);
+    }
+
+    /* Nothing to do? */
+    if ( ctx->restore.nr_mapped_pages == 0 )
+        goto done;
+
+    ctx->restore.guest_mapping = xenforeignmemory_map(xch->fmem, ctx->domid,
+            PROT_READ | PROT_WRITE, ctx->restore.nr_mapped_pages,
+            m->mfns, m->map_errs);
+    if ( !ctx->restore.guest_mapping )
+    {
+        rc = -1;
+        PERROR("Unable to map %u mfns for %u pages of data",
+               ctx->restore.nr_mapped_pages, pages->count);
+        goto err;
+    }
+
+    /* Verify mapping, and assign address to pfn data */
+    for ( i = 0, p = 0; i < pages->count; i++ )
+    {
+        if ( page_type_has_stream_data(m->types[i]) == false )
+            continue;
+
+        if ( m->map_errs[p] == 0 )
+        {
+            m->guest_data[i] = ctx->restore.guest_mapping + (p * PAGE_SIZE);
+            p++;
+            continue;
+        }
+
+        errno = m->map_errs[p];
+        rc = -1;
+        PERROR("Mapping pfn %#"PRIpfn" (mfn %#"PRIpfn", type %#"PRIx32") failed",
+              m->pfns[i], m->mfns[p], m->types[i]);
+        goto err;
+    }
+
+done:
+    rc = 0;
+
+err:
+    return rc;
+}
+
+/*
+ * Handle PAGE_DATA record from an existing buffer
+ * Given a list of pfns, their types, and a block of page data from the
+ * stream, populate and record their types, map the relevant subset and copy
+ * the data into the guest.
+ */
+static int handle_buffered_page_data(struct xc_sr_context *ctx,
+                                     struct xc_sr_record *rec)
+{
+    xc_interface *xch = ctx->xch;
+    struct xc_sr_rec_page_data_header *pages = rec->data;
+    struct sr_restore_arrays *m = ctx->restore.m;
+    void *p;
+    uint32_t i;
+    int rc = -1, idx;
+
+    rc = handle_static_data_end_v2(ctx);
+    if ( rc )
+        goto err;
+
+    /* First read and verify the header */
+    if ( verify_rec_page_hdr(ctx, rec->length, pages) == false )
+    {
+        rc = -1;
+        goto err;
+    }
+
+    /* Then read and verify the pfn numbers */
+    if ( verify_rec_page_pfns(ctx, rec->length, pages) == false )
+    {
+        rc = -1;
+        goto err;
+    }
+
+    /* Map the target pfn */
+    rc = map_guest_pages(ctx, pages);
+    if ( rc )
+        goto err;
+
+    for ( i = 0, idx = 0; i < pages->count; i++ )
+    {
+        if ( !m->guest_data[i] )
+            continue;
+
+        p = &pages->pfn[pages->count] + (idx * PAGE_SIZE);
+        rc = ctx->restore.ops.localise_page(ctx, m->types[i], p);
+        if ( rc )
+        {
+            ERROR("Failed to localise pfn %#"PRIpfn" (type %#"PRIx32")",
+                  m->pfns[i], m->types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
+            goto err;
+
+        }
+
+        if ( ctx->restore.verify )
+        {
+            if ( memcmp(m->guest_data[i], p, PAGE_SIZE) )
+            {
+                errno = EIO;
+                ERROR("verify pfn %#"PRIpfn" failed (type %#"PRIx32")",
+                      m->pfns[i], m->types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
+                goto err;
+            }
+        }
+        else
+        {
+            memcpy(m->guest_data[i], p, PAGE_SIZE);
+        }
+
+        idx++;
+    }
+
+    rc = 0;
+
  err:
+    if ( ctx->restore.guest_mapping )
+    {
+        xenforeignmemory_unmap(xch->fmem, ctx->restore.guest_mapping, ctx->restore.nr_mapped_pages);
+        ctx->restore.guest_mapping = NULL;
+    }
     return rc;
 }
 
@@ -641,12 +732,21 @@ static int process_buffered_record(struct xc_sr_context *ctx, struct xc_sr_recor
         break;
 
     case REC_TYPE_PAGE_DATA:
-        rc = handle_page_data(ctx, rec);
+        rc = handle_buffered_page_data(ctx, rec);
         break;
 
     case REC_TYPE_VERIFY:
         DPRINTF("Verify mode enabled");
         ctx->restore.verify = true;
+        if ( !ctx->restore.verify_buf )
+        {
+            ctx->restore.verify_buf = malloc(MAX_BATCH_SIZE * PAGE_SIZE);
+            if ( !ctx->restore.verify_buf )
+            {
+                rc = -1;
+                PERROR("Unable to allocate verify_buf");
+            }
+        }
         break;
 
     case REC_TYPE_CHECKPOINT:
@@ -725,7 +825,8 @@ static int setup(struct xc_sr_context *ctx)
     }
     ctx->restore.allocated_rec_num = DEFAULT_BUF_RECORDS;
 
-    ctx->restore.m = malloc(sizeof(*ctx->restore.m));
+    ctx->restore.m = malloc(sizeof(*ctx->restore.m) +
+            (sizeof(*ctx->restore.m->pages.pfn) * MAX_BATCH_SIZE));
     if ( !ctx->restore.m ) {
         ERROR("Unable to allocate memory for arrays");
         rc = -1;



* [PATCH v20210616 26/36] tools: restore: write data directly into guest
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (24 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 25/36] tools: restore: split handle_page_data Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 27/36] tools: recognize LIBXL_API_VERSION for 4.16 Olaf Hering
                   ` (10 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Ian Jackson, Wei Liu, Juergen Gross

Read incoming migration stream directly into the guest memory.
This avoids the memory allocation and copying, and the resulting
performance penalty.
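
The scatter read this relies on can be sketched roughly as below. This is
an assumption-laden illustration built on POSIX readv() and read() only;
sketch_readv_exact is a hypothetical name and is not the readv_exact helper
added to libxenctrl earlier in this series, whose implementation is not
shown here. The sketch leaves the iovec array unmodified so the caller can
still use iov_base afterwards, and it reports a premature end of stream as
EIO, which is a choice made for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Fill every buffer in iov[0..iovcnt-1] completely from fd.
 * Returns 0 on success, -1 with errno set on error or early EOF.
 * Assumes non-zero iov_len values and iovcnt within the system's
 * limit for a single readv() call.
 */
static int sketch_readv_exact(int fd, const struct iovec *iov, int iovcnt)
{
    int idx = 0;

    while ( idx < iovcnt )
    {
        ssize_t len = readv(fd, &iov[idx], iovcnt - idx);
        size_t got;

        if ( len < 0 )
        {
            if ( errno == EINTR )
                continue;
            return -1;
        }
        if ( len == 0 )
        {
            errno = EIO;
            return -1;
        }

        /* Skip over all buffers that were filled completely. */
        got = (size_t)len;
        while ( idx < iovcnt && got >= iov[idx].iov_len )
        {
            got -= iov[idx].iov_len;
            idx++;
        }

        /* Finish a partially filled buffer with plain read() calls. */
        while ( idx < iovcnt && got > 0 && got < iov[idx].iov_len )
        {
            ssize_t n = read(fd, (char *)iov[idx].iov_base + got,
                             iov[idx].iov_len - got);

            if ( n < 0 )
            {
                if ( errno == EINTR )
                    continue;
                return -1;
            }
            if ( n == 0 )
            {
                errno = EIO;
                return -1;
            }
            got += (size_t)n;
        }
        if ( idx < iovcnt && got == iov[idx].iov_len )
            idx++;
    }

    return 0;
}
```

With one iovec entry per page, each entry can point either at mapped guest
memory or at a scratch buffer, which is how verify mode can share the same
read path.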

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
 tools/libs/saverestore/common.h  |   1 +
 tools/libs/saverestore/restore.c | 132 ++++++++++++++++++++++++++++++-
 2 files changed, 129 insertions(+), 4 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index d479f1a918..5c440f28ec 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -243,6 +243,7 @@ struct sr_restore_arrays {
     xen_pfn_t mfns[MAX_BATCH_SIZE];
     int map_errs[MAX_BATCH_SIZE];
     void *guest_data[MAX_BATCH_SIZE];
+    struct iovec iov[MAX_BATCH_SIZE];
 
     /* populate_pfns */
     xen_pfn_t pp_mfns[MAX_BATCH_SIZE];
diff --git a/tools/libs/saverestore/restore.c b/tools/libs/saverestore/restore.c
index 877fd19a9b..d0148606bf 100644
--- a/tools/libs/saverestore/restore.c
+++ b/tools/libs/saverestore/restore.c
@@ -392,6 +392,122 @@ err:
     return rc;
 }
 
+/*
+ * Handle PAGE_DATA record from the stream.
+ * Given a list of pfns, their types, and a block of page data from the
+ * stream, populate and record their types, map the relevant subset and copy
+ * the data into the guest.
+ */
+static int handle_incoming_page_data(struct xc_sr_context *ctx,
+                                     struct xc_sr_rhdr *rhdr)
+{
+    xc_interface *xch = ctx->xch;
+    struct sr_restore_arrays *m = ctx->restore.m;
+    struct xc_sr_rec_page_data_header *pages = &m->pages;
+    uint64_t *pfn_nums = m->pages.pfn;
+    uint32_t i;
+    int rc, iov_idx;
+
+    rc = handle_static_data_end_v2(ctx);
+    if ( rc )
+        goto err;
+
+    /* First read and verify the header */
+    rc = read_exact(ctx->fd, pages, sizeof(*pages));
+    if ( rc )
+    {
+        PERROR("Could not read rec_pfn header");
+        goto err;
+    }
+
+    if ( verify_rec_page_hdr(ctx, rhdr->length, pages) == false )
+    {
+        rc = -1;
+        goto err;
+    }
+
+    /* Then read and verify the incoming pfn numbers */
+    rc = read_exact(ctx->fd, pfn_nums, sizeof(*pfn_nums) * pages->count);
+    if ( rc )
+    {
+        PERROR("Could not read rec_pfn data");
+        goto err;
+    }
+
+    if ( verify_rec_page_pfns(ctx, rhdr->length, pages) == false )
+    {
+        rc = -1;
+        goto err;
+    }
+
+    /* Finally read and verify the incoming pfn data */
+    rc = map_guest_pages(ctx, pages);
+    if ( rc )
+        goto err;
+
+    /* Prepare read buffers, either guest or throw away memory */
+    for ( i = 0, iov_idx = 0; i < pages->count; i++ )
+    {
+        if ( !m->guest_data[i] )
+            continue;
+
+        m->iov[iov_idx].iov_len = PAGE_SIZE;
+        if ( ctx->restore.verify )
+            m->iov[iov_idx].iov_base = ctx->restore.verify_buf + i * PAGE_SIZE;
+        else
+            m->iov[iov_idx].iov_base = m->guest_data[i];
+        iov_idx++;
+    }
+
+    if ( !iov_idx )
+        goto done;
+
+    rc = readv_exact(ctx->fd, m->iov, iov_idx);
+    if ( rc )
+    {
+        PERROR("read of %d pages failed", iov_idx);
+        goto err;
+    }
+
+    /* Post-processing of pfn data */
+    for ( i = 0, iov_idx = 0; i < pages->count; i++ )
+    {
+        if ( !m->guest_data[i] )
+            continue;
+
+        rc = ctx->restore.ops.localise_page(ctx, m->types[i], m->iov[iov_idx].iov_base);
+        if ( rc )
+        {
+            ERROR("Failed to localise pfn %#"PRIpfn" (type %#"PRIx32")",
+                  m->pfns[i], m->types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
+            goto err;
+
+        }
+
+        if ( ctx->restore.verify )
+        {
+            if ( memcmp(m->guest_data[i], m->iov[iov_idx].iov_base, PAGE_SIZE) )
+            {
+                ERROR("verify pfn %#"PRIpfn" failed (type %#"PRIx32")",
+                      m->pfns[i], m->types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
+            }
+        }
+
+        iov_idx++;
+    }
+
+done:
+    rc = 0;
+
+err:
+    if ( ctx->restore.guest_mapping )
+    {
+        xenforeignmemory_unmap(xch->fmem, ctx->restore.guest_mapping, ctx->restore.nr_mapped_pages);
+        ctx->restore.guest_mapping = NULL;
+    }
+    return rc;
+}
+
 /*
  * Handle PAGE_DATA record from an existing buffer
  * Given a list of pfns, their types, and a block of page data from the
@@ -773,11 +889,19 @@ static int process_incoming_record_header(struct xc_sr_context *ctx, struct xc_s
     struct xc_sr_record rec;
     int rc;
 
-    rc = read_record_data(ctx, ctx->fd, rhdr, &rec);
-    if ( rc )
-        return rc;
+    switch ( rhdr->type )
+    {
+    case REC_TYPE_PAGE_DATA:
+        rc = handle_incoming_page_data(ctx, rhdr);
+        break;
+    default:
+        rc = read_record_data(ctx, ctx->fd, rhdr, &rec);
+        if ( rc == 0 )
+            rc = process_buffered_record(ctx, &rec);
+        break;
+    }
 
-    return process_buffered_record(ctx, &rec);
+    return rc;
 }
 
 



* [PATCH v20210616 27/36] tools: recognize LIBXL_API_VERSION for 4.16
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (25 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 26/36] tools: restore: write data directly into guest Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 28/36] tools: adjust libxl_domain_suspend to receive a struct props Olaf Hering
                   ` (9 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Ian Jackson, Wei Liu

This is required by upcoming API changes.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
 tools/include/libxl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index ae7fe27c1f..29931626a2 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -729,7 +729,8 @@ typedef struct libxl__ctx libxl_ctx;
 #if LIBXL_API_VERSION != 0x040200 && LIBXL_API_VERSION != 0x040300 && \
     LIBXL_API_VERSION != 0x040400 && LIBXL_API_VERSION != 0x040500 && \
     LIBXL_API_VERSION != 0x040700 && LIBXL_API_VERSION != 0x040800 && \
-    LIBXL_API_VERSION != 0x041300 && LIBXL_API_VERSION != 0x041400
+    LIBXL_API_VERSION != 0x041300 && LIBXL_API_VERSION != 0x041400 && \
+    LIBXL_API_VERSION != 0x041600
 #error Unknown LIBXL_API_VERSION
 #endif
 #endif



* [PATCH v20210616 28/36] tools: adjust libxl_domain_suspend to receive a struct props
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (26 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 27/36] tools: recognize LIBXL_API_VERSION for 4.16 Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 29/36] tools: change struct precopy_stats to precopy_stats_t Olaf Hering
                   ` (8 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel
  Cc: Olaf Hering, Christian Lindig, Ian Jackson, Wei Liu,
	Anthony PERARD, Juergen Gross, David Scott

Upcoming changes will pass more knobs down to xc_domain_save.
Adjust the libxl_domain_suspend API so that further knobs can be added easily.
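
The general pattern, a props struct plus an inline wrapper for callers of
the old signature, can be sketched in isolation as follows. All names here
(demo_suspend, demo_suspend_props, demo_suspend_compat, DEMO_SUSPEND_*) are
hypothetical and exist only for this illustration; they are not part of
libxl.

```c
#include <stdint.h>

#define DEMO_SUSPEND_DEBUG 1
#define DEMO_SUSPEND_LIVE  2

/* New-style argument: further members can be appended later. */
typedef struct {
    uint32_t flags; /* DEMO_SUSPEND_* */
} demo_suspend_props;

/* Hypothetical internal state that the flags are decoded into. */
struct demo_state {
    int live;
    int debug;
};

/* New-style entry point, takes the props struct. */
static int demo_suspend(struct demo_state *st,
                        const demo_suspend_props *props)
{
    st->live = !!(props->flags & DEMO_SUSPEND_LIVE);
    st->debug = !!(props->flags & DEMO_SUSPEND_DEBUG);
    return 0;
}

/* Old-style entry point, kept as a thin wrapper around the new one. */
static inline int demo_suspend_compat(struct demo_state *st, int flags)
{
    demo_suspend_props props = { .flags = (uint32_t)flags };

    return demo_suspend(st, &props);
}
```

A caller built against the old interface keeps passing an int, while new
callers fill in the struct and are unaffected when members are added.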

No change in behavior intended.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
---
 tools/include/libxl.h                | 26 +++++++++++++++++++++++---
 tools/libs/light/libxl_domain.c      |  7 ++++---
 tools/ocaml/libs/xl/xenlight_stubs.c |  3 ++-
 tools/xl/xl_migrate.c                |  9 ++++++---
 tools/xl/xl_saverestore.c            |  3 ++-
 5 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 29931626a2..9a4d7514ed 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -1706,12 +1706,32 @@ static inline int libxl_retrieve_domain_configuration_0x041200(
     libxl_retrieve_domain_configuration_0x041200
 #endif
 
+/*
+ * LIBXL_HAVE_DOMAIN_SUSPEND_PROPS indicates that the
+ * libxl_domain_suspend() function takes a props struct.
+ */
+#define LIBXL_HAVE_DOMAIN_SUSPEND_PROPS 1
+
+typedef struct {
+    uint32_t flags; /* LIBXL_SUSPEND_* */
+} libxl_domain_suspend_props;
+#define LIBXL_SUSPEND_DEBUG 1
+#define LIBXL_SUSPEND_LIVE 2
+
 int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
-                         int flags, /* LIBXL_SUSPEND_* */
+                         libxl_domain_suspend_props *props,
                          const libxl_asyncop_how *ao_how)
                          LIBXL_EXTERNAL_CALLERS_ONLY;
-#define LIBXL_SUSPEND_DEBUG 1
-#define LIBXL_SUSPEND_LIVE 2
+#if defined(LIBXL_API_VERSION) && LIBXL_API_VERSION < 0x041600
+static inline int libxl_domain_suspend_0x041500(libxl_ctx *ctx, uint32_t domid,
+                         int fd, int flags, /* LIBXL_SUSPEND_* */
+                         const libxl_asyncop_how *ao_how)
+{
+    libxl_domain_suspend_props props = { .flags = flags, };
+    return libxl_domain_suspend(ctx, domid, fd, &props, ao_how);
+}
+#define libxl_domain_suspend libxl_domain_suspend_0x041500
+#endif
 
 /*
  * Only suspend domain, do not save its state to file, do not destroy it.
diff --git a/tools/libs/light/libxl_domain.c b/tools/libs/light/libxl_domain.c
index c00c36c928..5dbd27900f 100644
--- a/tools/libs/light/libxl_domain.c
+++ b/tools/libs/light/libxl_domain.c
@@ -505,7 +505,8 @@ static void domain_suspend_cb(libxl__egc *egc,
 
 }
 
-int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
+int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
+                         libxl_domain_suspend_props *props,
                          const libxl_asyncop_how *ao_how)
 {
     AO_CREATE(ctx, domid, ao_how);
@@ -526,8 +527,8 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd, int flags,
     dss->domid = domid;
     dss->fd = fd;
     dss->type = type;
-    dss->live = flags & LIBXL_SUSPEND_LIVE;
-    dss->debug = flags & LIBXL_SUSPEND_DEBUG;
+    dss->live = props->flags & LIBXL_SUSPEND_LIVE;
+    dss->debug = props->flags & LIBXL_SUSPEND_DEBUG;
     dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_NONE;
 
     rc = libxl__fd_flags_modify_save(gc, dss->fd,
diff --git a/tools/ocaml/libs/xl/xenlight_stubs.c b/tools/ocaml/libs/xl/xenlight_stubs.c
index 352a00134d..eaf7bce35a 100644
--- a/tools/ocaml/libs/xl/xenlight_stubs.c
+++ b/tools/ocaml/libs/xl/xenlight_stubs.c
@@ -614,10 +614,11 @@ value stub_libxl_domain_suspend(value ctx, value domid, value fd, value async, v
 	int ret;
 	uint32_t c_domid = Int_val(domid);
 	int c_fd = Int_val(fd);
+	libxl_domain_suspend_props props = {};
 	libxl_asyncop_how *ao_how = aohow_val(async);
 
 	caml_enter_blocking_section();
-	ret = libxl_domain_suspend(CTX, c_domid, c_fd, 0, ao_how);
+	ret = libxl_domain_suspend(CTX, c_domid, c_fd, &props, ao_how);
 	caml_leave_blocking_section();
 
 	free(ao_how);
diff --git a/tools/xl/xl_migrate.c b/tools/xl/xl_migrate.c
index b8594f44a5..144890924f 100644
--- a/tools/xl/xl_migrate.c
+++ b/tools/xl/xl_migrate.c
@@ -186,7 +186,10 @@ static void migrate_domain(uint32_t domid, int preserve_domid,
     char *away_domname;
     char rc_buf;
     uint8_t *config_data;
-    int config_len, flags = LIBXL_SUSPEND_LIVE;
+    int config_len;
+    libxl_domain_suspend_props props = {
+        .flags = LIBXL_SUSPEND_LIVE,
+        };
 
     save_domain_core_begin(domid, preserve_domid, override_config_file,
                            &config_data, &config_len);
@@ -205,8 +208,8 @@ static void migrate_domain(uint32_t domid, int preserve_domid,
     xtl_stdiostream_adjust_flags(logger, XTL_STDIOSTREAM_HIDE_PROGRESS, 0);
 
     if (debug)
-        flags |= LIBXL_SUSPEND_DEBUG;
-    rc = libxl_domain_suspend(ctx, domid, send_fd, flags, NULL);
+        props.flags |= LIBXL_SUSPEND_DEBUG;
+    rc = libxl_domain_suspend(ctx, domid, send_fd, &props, NULL);
     if (rc) {
         fprintf(stderr, "migration sender: libxl_domain_suspend failed"
                 " (rc=%d)\n", rc);
diff --git a/tools/xl/xl_saverestore.c b/tools/xl/xl_saverestore.c
index 953d791d1a..476d4d9a6a 100644
--- a/tools/xl/xl_saverestore.c
+++ b/tools/xl/xl_saverestore.c
@@ -130,6 +130,7 @@ static int save_domain(uint32_t domid, int preserve_domid,
     int fd;
     uint8_t *config_data;
     int config_len;
+    libxl_domain_suspend_props props = {};
 
     save_domain_core_begin(domid, preserve_domid, override_config_file,
                            &config_data, &config_len);
@@ -146,7 +147,7 @@ static int save_domain(uint32_t domid, int preserve_domid,
 
     save_domain_core_writeconfig(fd, filename, config_data, config_len);
 
-    int rc = libxl_domain_suspend(ctx, domid, fd, 0, NULL);
+    int rc = libxl_domain_suspend(ctx, domid, fd, &props, NULL);
     close(fd);
 
     if (rc < 0) {



* [PATCH v20210616 29/36] tools: change struct precopy_stats to precopy_stats_t
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (27 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 28/36] tools: adjust libxl_domain_suspend to receive a struct props Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 30/36] tools: add callback to libxl for precopy_policy and precopy_stats_t Olaf Hering
                   ` (7 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Ian Jackson, Wei Liu, Juergen Gross

This will help libxl_save_msgs_gen.pl to copy the struct as a region of memory.

No change in behavior intended.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
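A minimal sketch of what "copy the struct as a region of memory" could look
like, assuming the copy is a plain memcpy of sizeof(type) bytes between two
processes with the same ABI. The names demo_precopy_stats_t, demo_bytes_put
and demo_bytes_get are hypothetical stand-ins for illustration; they are not
the code emitted by libxl_save_msgs_gen.pl.

```c
#include <stddef.h>
#include <string.h>

/* Same fields as precopy_stats_t in this patch; renamed to mark it as a stand-in. */
typedef struct {
    unsigned int iteration;
    unsigned long total_written;
    long dirty_count; /* -1 if unknown */
} demo_precopy_stats_t;

/* Hypothetical helper: append n raw bytes of *value at buf[off], return the new offset. */
static size_t demo_bytes_put(unsigned char *buf, size_t off,
                             const void *value, size_t n)
{
    memcpy(buf + off, value, n);
    return off + n;
}

/* Hypothetical helper: read n raw bytes from buf[off] into *value, return the new offset. */
static size_t demo_bytes_get(const unsigned char *buf, size_t off,
                             void *value, size_t n)
{
    memcpy(value, buf + off, n);
    return off + n;
}
```

With a single-word type name, a generator can treat the struct like any
other fixed-size value: put sizeof(demo_precopy_stats_t) bytes on one side,
get the same number of bytes on the other.
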
 tools/include/xensaverestore.h  | 7 +++----
 tools/libs/saverestore/common.h | 2 +-
 tools/libs/saverestore/save.c   | 6 +++---
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/include/xensaverestore.h b/tools/include/xensaverestore.h
index 0410f0469e..dca0134605 100644
--- a/tools/include/xensaverestore.h
+++ b/tools/include/xensaverestore.h
@@ -23,18 +23,17 @@
 #define XCFLAGS_DEBUG     (1 << 1)
 
 /* For save's precopy_policy(). */
-struct precopy_stats
-{
+typedef struct {
     unsigned int iteration;
     unsigned long total_written;
     long dirty_count; /* -1 if unknown */
-};
+} precopy_stats_t;
 
 /*
  * A precopy_policy callback may not be running in the same address
  * space as libxc an so precopy_stats is passed by value.
  */
-typedef int (*precopy_policy_t)(struct precopy_stats, void *);
+typedef int (*precopy_policy_t)(precopy_stats_t, void *);
 
 /* callbacks provided by xc_domain_save */
 struct save_callbacks {
diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 5c440f28ec..60bbba6aa9 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -283,7 +283,7 @@ struct xc_sr_context
             size_t pages_sent;
             size_t overhead_sent;
 
-            struct precopy_stats stats;
+            precopy_stats_t stats;
 
             unsigned int nr_batch_pfns;
             unsigned long *deferred_pages;
diff --git a/tools/libs/saverestore/save.c b/tools/libs/saverestore/save.c
index e486bce96f..537b977ba8 100644
--- a/tools/libs/saverestore/save.c
+++ b/tools/libs/saverestore/save.c
@@ -488,7 +488,7 @@ static int update_progress_string(struct xc_sr_context *ctx, char **str)
 #define SPP_MAX_ITERATIONS      5
 #define SPP_TARGET_DIRTY_COUNT 50
 
-static int simple_precopy_policy(struct precopy_stats stats, void *user)
+static int simple_precopy_policy(precopy_stats_t stats, void *user)
 {
     return ((stats.dirty_count >= 0 &&
              stats.dirty_count < SPP_TARGET_DIRTY_COUNT) ||
@@ -515,13 +515,13 @@ static int send_memory_live(struct xc_sr_context *ctx)
     precopy_policy_t precopy_policy = ctx->save.callbacks->precopy_policy;
     void *data = ctx->save.callbacks->data;
 
-    struct precopy_stats *policy_stats;
+    precopy_stats_t *policy_stats;
 
     rc = update_progress_string(ctx, &progress_str);
     if ( rc )
         goto out;
 
-    ctx->save.stats = (struct precopy_stats){
+    ctx->save.stats = (precopy_stats_t){
         .dirty_count = ctx->save.p2m_size,
     };
     policy_stats = &ctx->save.stats;



* [PATCH v20210616 30/36] tools: add callback to libxl for precopy_policy and precopy_stats_t
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (28 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 29/36] tools: change struct precopy_stats to precopy_stats_t Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 31/36] tools: add --max_iters to libxl_domain_suspend Olaf Hering
                   ` (6 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel
  Cc: Olaf Hering, Ian Jackson, Wei Liu, Anthony PERARD, Juergen Gross

This duplicates simple_precopy_policy. To recap its purpose:
- do up to 5 iterations of copying dirty domU memory to target,
  including the initial copying of all domU memory, excluding
  the final copying while the domU is suspended
- do fewer iterations in case the domU dirtied less than 50 pages

Take the opportunity to also move xen_pfn_t into qw().

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
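The policy recapped in the commit message can be sketched in isolation as
follows. This is a standalone illustration, not the libxl code: the recap_*
names and the enum are local stand-ins so the snippet compiles without Xen
headers, and the limits 5 and 50 are the defaults quoted in this patch.

```c
/* Local stand-ins for the XGS_POLICY_* verdicts (hypothetical names). */
enum recap_verdict { RECAP_CONTINUE_PRECOPY, RECAP_STOP_AND_COPY };

/* Same fields as precopy_stats_t; renamed to mark it as a stand-in. */
typedef struct {
    unsigned int iteration;
    unsigned long total_written;
    long dirty_count; /* -1 if unknown */
} recap_stats_t;

#define RECAP_MAX_ITERATIONS      5
#define RECAP_TARGET_DIRTY_COUNT 50

static enum recap_verdict recap_policy(recap_stats_t stats)
{
    /* Few enough dirty pages left: suspend and send the remainder. */
    if (stats.dirty_count >= 0 &&
        stats.dirty_count < RECAP_TARGET_DIRTY_COUNT)
        return RECAP_STOP_AND_COPY;
    /* Iteration budget used up: stop precopy regardless of the dirty count. */
    if (stats.iteration >= RECAP_MAX_ITERATIONS)
        return RECAP_STOP_AND_COPY;
    return RECAP_CONTINUE_PRECOPY;
}
```

A domU that dirtied 10 pages stops after the current iteration, while a domU
that keeps dirtying 1000 pages per iteration runs until the iteration count
reaches 5.
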
 tools/libs/light/libxl_dom_save.c       | 19 +++++++++++++++++++
 tools/libs/light/libxl_internal.h       |  2 ++
 tools/libs/light/libxl_save_msgs_gen.pl |  3 ++-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/libs/light/libxl_dom_save.c b/tools/libs/light/libxl_dom_save.c
index 32e3cb5a13..3f3cff0342 100644
--- a/tools/libs/light/libxl_dom_save.c
+++ b/tools/libs/light/libxl_dom_save.c
@@ -373,6 +373,24 @@ int libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss,
     return rc;
 }
 
+static int libxl__domain_save_precopy_policy(precopy_stats_t stats, void *user)
+{
+    libxl__save_helper_state *shs = user;
+    libxl__domain_save_state *dss = shs->caller_state;
+    STATE_AO_GC(dss->ao);
+
+    LOGD(DEBUG, shs->domid, "iteration %u dirty_count %ld total_written %lu",
+         stats.iteration, stats.dirty_count, stats.total_written);
+    if (stats.dirty_count >= 0 && stats.dirty_count < LIBXL_XGS_POLICY_TARGET_DIRTY_COUNT)
+        goto stop_copy;
+    if (stats.iteration >= LIBXL_XGS_POLICY_MAX_ITERATIONS)
+        goto stop_copy;
+    return XGS_POLICY_CONTINUE_PRECOPY;
+
+stop_copy:
+    return XGS_POLICY_STOP_AND_COPY;
+}
+
 /*----- main code for saving, in order of execution -----*/
 
 void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
@@ -430,6 +448,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
         callbacks->suspend = libxl__domain_suspend_callback;
 
     callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty;
+    callbacks->precopy_policy = libxl__domain_save_precopy_policy;
 
     dss->sws.ao  = dss->ao;
     dss->sws.dss = dss;
diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
index 439c654733..57d7e4b4b8 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -125,6 +125,8 @@
 #define DOMID_XS_PATH "domid"
 #define PVSHIM_BASENAME "xen-shim"
 #define PVSHIM_CMDLINE "pv-shim console=xen,pv"
+#define LIBXL_XGS_POLICY_MAX_ITERATIONS 5
+#define LIBXL_XGS_POLICY_TARGET_DIRTY_COUNT 50
 
 /* Size macros. */
 #define __AC(X,Y)   (X##Y)
diff --git a/tools/libs/light/libxl_save_msgs_gen.pl b/tools/libs/light/libxl_save_msgs_gen.pl
index f263ee01bb..ab55c81644 100755
--- a/tools/libs/light/libxl_save_msgs_gen.pl
+++ b/tools/libs/light/libxl_save_msgs_gen.pl
@@ -23,6 +23,7 @@ our @msgs = (
                                              STRING doing_what),
                                             'unsigned long', 'done',
                                             'unsigned long', 'total'] ],
+    [ 'scxW',   "precopy_policy", ['precopy_stats_t', 'stats'] ],
     [ 'srcxA',  "suspend", [] ],
     [ 'srcxA',  "postcopy", [] ],
     [ 'srcxA',  "checkpoint", [] ],
@@ -142,7 +143,7 @@ static void bytes_put(unsigned char *const buf, int *len,
 
 END
 
-foreach my $simpletype (qw(int uint16_t uint32_t unsigned), 'unsigned long', 'xen_pfn_t') {
+foreach my $simpletype (qw(int uint16_t uint32_t unsigned precopy_stats_t xen_pfn_t), 'unsigned long') {
     my $typeid = typeid($simpletype);
     $out_body{'callout'} .= <<END;
 static int ${typeid}_get(const unsigned char **msg,



* [PATCH v20210616 31/36] tools: add --max_iters to libxl_domain_suspend
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (29 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 30/36] tools: add callback to libxl for precopy_policy and precopy_stats_t Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 32/36] tools: add --min_remaining " Olaf Hering
                   ` (5 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel
  Cc: Olaf Hering, Ian Jackson, Wei Liu, Anthony PERARD, Juergen Gross

Migrating a large, and potentially busy, domU will take more
time than necessary due to an excessive number of copying iterations.

Allow the host admin to control the number of iterations which
copy accumulated domU dirty pages to the target host.

The default remains 5, which means one initial iteration to copy the
entire domU memory, and up to 4 additional iterations to copy dirty
memory from the still running domU. After the given number of iterations
the domU is suspended, remaining dirty memory is copied and the domU is
finally moved to the target host.

This patch adjusts xl(1) and the libxl API.
External users check LIBXL_HAVE_DOMAIN_SUSPEND_PROPS for the availability
of the new .max_iters property.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
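A possible stricter way to handle the option value, shown as a sketch only.
The patch itself uses atoi() and the GNU "?:" shorthand; the helpers
opt_parse_u32 and opt_effective_max_iters below are hypothetical, and
OPT_DEFAULT_MAX_ITERS merely repeats the default of 5 stated above.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define OPT_DEFAULT_MAX_ITERS 5u

/*
 * Hypothetical helper: parse a decimal option argument into *out.
 * Returns 0 on success, -1 on empty input, sign characters, trailing
 * garbage or values that do not fit into uint32_t.
 */
static int opt_parse_u32(const char *arg, uint32_t *out)
{
    char *end;
    unsigned long val;

    /* strtoul() accepts "-1" and wraps it, so insist on a leading digit. */
    if (arg == NULL || !isdigit((unsigned char)*arg))
        return -1;
    errno = 0;
    val = strtoul(arg, &end, 10);
    if (errno != 0 || *end != '\0' || val > UINT32_MAX)
        return -1;
    *out = (uint32_t)val;
    return 0;
}

/* Portable spelling of "requested ?: default": 0 selects the default. */
static uint32_t opt_effective_max_iters(uint32_t requested)
{
    return requested ? requested : OPT_DEFAULT_MAX_ITERS;
}
```

With this shape, "--max_iters 0" and an omitted option both end up at the
default, while input such as "-1" or "5x" is rejected instead of being
silently converted.
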
 docs/man/xl.1.pod.in              |  4 ++++
 tools/include/libxl.h             |  1 +
 tools/libs/light/libxl_dom_save.c |  2 +-
 tools/libs/light/libxl_domain.c   |  1 +
 tools/libs/light/libxl_internal.h |  1 +
 tools/xl/xl_cmdtable.c            |  3 ++-
 tools/xl/xl_migrate.c             | 10 +++++++++-
 7 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 70a6ebf438..594387bcf4 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -494,6 +494,10 @@ such that it will be identical on the destination host, unless that
 configuration is overridden using the B<-C> option. Note that it is not
 possible to use this option for a 'localhost' migration.
 
+=item B<--max_iters> I<iterations>
+
+Number of copy iterations before final suspend+move (default: 5)
+
 =back
 
 =item B<remus> [I<OPTIONS>] I<domain-id> I<host>
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 9a4d7514ed..bf77da0524 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -1714,6 +1714,7 @@ static inline int libxl_retrieve_domain_configuration_0x041200(
 
 typedef struct {
     uint32_t flags; /* LIBXL_SUSPEND_* */
+    uint32_t max_iters;
 } libxl_domain_suspend_props;
 #define LIBXL_SUSPEND_DEBUG 1
 #define LIBXL_SUSPEND_LIVE 2
diff --git a/tools/libs/light/libxl_dom_save.c b/tools/libs/light/libxl_dom_save.c
index 3f3cff0342..938c0127f3 100644
--- a/tools/libs/light/libxl_dom_save.c
+++ b/tools/libs/light/libxl_dom_save.c
@@ -383,7 +383,7 @@ static int libxl__domain_save_precopy_policy(precopy_stats_t stats, void *user)
          stats.iteration, stats.dirty_count, stats.total_written);
     if (stats.dirty_count >= 0 && stats.dirty_count < LIBXL_XGS_POLICY_TARGET_DIRTY_COUNT)
         goto stop_copy;
-    if (stats.iteration >= LIBXL_XGS_POLICY_MAX_ITERATIONS)
+    if (stats.iteration >= dss->max_iters)
         goto stop_copy;
     return XGS_POLICY_CONTINUE_PRECOPY;
 
diff --git a/tools/libs/light/libxl_domain.c b/tools/libs/light/libxl_domain.c
index 5dbd27900f..9f98cd7f2b 100644
--- a/tools/libs/light/libxl_domain.c
+++ b/tools/libs/light/libxl_domain.c
@@ -527,6 +527,7 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
     dss->domid = domid;
     dss->fd = fd;
     dss->type = type;
+    dss->max_iters = props->max_iters ?: LIBXL_XGS_POLICY_MAX_ITERATIONS;
     dss->live = props->flags & LIBXL_SUSPEND_LIVE;
     dss->debug = props->flags & LIBXL_SUSPEND_DEBUG;
     dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_NONE;
diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
index 57d7e4b4b8..8cbcc5282c 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -3649,6 +3649,7 @@ struct libxl__domain_save_state {
     int live;
     int debug;
     int checkpointed_stream;
+    uint32_t max_iters;
     const libxl_domain_remus_info *remus;
     /* private */
     int rc;
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index ca1dfa3525..9b6b3c99aa 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -174,7 +174,8 @@ const struct cmd_spec cmd_table[] = {
       "                of the domain.\n"
       "--debug         Ignored.\n"
       "-p              Do not unpause domain after migrating it.\n"
-      "-D              Preserve the domain id"
+      "-D              Preserve the domain id\n"
+      "--max_iters N   Number of copy iterations before final stop+move"
     },
     { "restore",
       &main_restore, 0, 1,
diff --git a/tools/xl/xl_migrate.c b/tools/xl/xl_migrate.c
index 144890924f..af117d4d56 100644
--- a/tools/xl/xl_migrate.c
+++ b/tools/xl/xl_migrate.c
@@ -178,6 +178,7 @@ static void migrate_do_preamble(int send_fd, int recv_fd, pid_t child,
 
 static void migrate_domain(uint32_t domid, int preserve_domid,
                            const char *rune, int debug,
+                           uint32_t max_iters,
                            const char *override_config_file)
 {
     pid_t child = -1;
@@ -189,6 +190,7 @@ static void migrate_domain(uint32_t domid, int preserve_domid,
     int config_len;
     libxl_domain_suspend_props props = {
         .flags = LIBXL_SUSPEND_LIVE,
+        .max_iters = max_iters,
         };
 
     save_domain_core_begin(domid, preserve_domid, override_config_file,
@@ -542,8 +544,10 @@ int main_migrate(int argc, char **argv)
     char *host;
     int opt, daemonize = 1, monitor = 1, debug = 0, pause_after_migration = 0;
     int preserve_domid = 0;
+    uint32_t max_iters = 0;
     static struct option opts[] = {
         {"debug", 0, 0, 0x100},
+        {"max_iters", 1, 0, 0x101},
         {"live", 0, 0, 0x200},
         COMMON_LONG_OPTS
     };
@@ -571,6 +575,9 @@ int main_migrate(int argc, char **argv)
     case 0x100: /* --debug */
         debug = 1;
         break;
+    case 0x101: /* --max_iters */
+        max_iters = atoi(optarg);
+        break;
     case 0x200: /* --live */
         /* ignored for compatibility with xm */
         break;
@@ -605,7 +612,8 @@ int main_migrate(int argc, char **argv)
                   pause_after_migration ? " -p" : "");
     }
 
-    migrate_domain(domid, preserve_domid, rune, debug, config_filename);
+    migrate_domain(domid, preserve_domid, rune, debug,
+                   max_iters, config_filename);
     return EXIT_SUCCESS;
 }
 



* [PATCH v20210616 32/36] tools: add --min_remaining to libxl_domain_suspend
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (30 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 31/36] tools: add --max_iters to libxl_domain_suspend Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 33/36] tools: add --abort_if_busy " Olaf Hering
                   ` (4 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel
  Cc: Olaf Hering, Ian Jackson, Wei Liu, Anthony PERARD, Juergen Gross

The decision to stop+move a domU to the new host must be based on two factors:
- the available network bandwidth for the migration stream
- the maximum time a workload within a domU can be safely suspended

Both values define how many dirty pages a workload may produce prior to
the final stop+move.

The default value of 50 pages is much too low with today's network bandwidths.
On an idle 1Gbit/s link these 200K will be transferred within ~2ms.

Give the admin a knob to adjust the point when the final stop+move will
be done, so he can base this decision on his own needs.

This patch adjusts xl(1) and the libxl API.
External users check LIBXL_HAVE_DOMAIN_SUSPEND_PROPS for the availability
of the new .min_remaining property.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
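The arithmetic behind the two factors named above can be sketched as
follows. The assumptions are mine and not part of the patch: 4 KiB pages,
a link that delivers its nominal rate without protocol overhead, and
1 Gbit/s taken as 125000000 bytes per second. The est_* names are
hypothetical.

```c
#include <stdint.h>

#define EST_PAGE_SIZE 4096u  /* assumption: 4 KiB guest pages */

/*
 * Microseconds needed to send `pages` dirty pages at `bytes_per_sec`.
 * bytes_per_sec must be non-zero.
 */
static uint64_t est_transfer_usec(uint64_t pages, uint64_t bytes_per_sec)
{
    return pages * EST_PAGE_SIZE * 1000000u / bytes_per_sec;
}

/*
 * Largest number of dirty pages that can still be sent within
 * `downtime_msec` at `bytes_per_sec`, rounded down.
 */
static uint64_t est_min_remaining(uint64_t bytes_per_sec, uint64_t downtime_msec)
{
    return bytes_per_sec * downtime_msec / 1000u / EST_PAGE_SIZE;
}
```

Under these assumptions the default of 50 pages takes about 1.6ms to send,
and a workload that tolerates 100ms of downtime would allow a value of
roughly 3000 for --min_remaining.
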
 docs/man/xl.1.pod.in              |  8 ++++++++
 tools/include/libxl.h             |  1 +
 tools/libs/light/libxl_dom_save.c |  2 +-
 tools/libs/light/libxl_domain.c   |  1 +
 tools/libs/light/libxl_internal.h |  1 +
 tools/xl/xl_cmdtable.c            | 23 ++++++++++++-----------
 tools/xl/xl_migrate.c             |  9 ++++++++-
 7 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 594387bcf4..09e866ad87 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -498,6 +498,14 @@ possible to use this option for a 'localhost' migration.
 
 Number of copy iterations before final suspend+move (default: 5)
 
+=item B<--min_remaining> I<pages>
+
+Number of remaining dirty pages. If the number of dirty pages drops that
+low, the guest is suspended and the domU will finally be moved to I<host>.
+
+This allows the host admin to control for how long the domU will likely
+be suspended during transit.
+
 =back
 
 =item B<remus> [I<OPTIONS>] I<domain-id> I<host>
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index bf77da0524..28d70b1078 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -1715,6 +1715,7 @@ static inline int libxl_retrieve_domain_configuration_0x041200(
 typedef struct {
     uint32_t flags; /* LIBXL_SUSPEND_* */
     uint32_t max_iters;
+    uint32_t min_remaining;
 } libxl_domain_suspend_props;
 #define LIBXL_SUSPEND_DEBUG 1
 #define LIBXL_SUSPEND_LIVE 2
diff --git a/tools/libs/light/libxl_dom_save.c b/tools/libs/light/libxl_dom_save.c
index 938c0127f3..ad5df89b2c 100644
--- a/tools/libs/light/libxl_dom_save.c
+++ b/tools/libs/light/libxl_dom_save.c
@@ -381,7 +381,7 @@ static int libxl__domain_save_precopy_policy(precopy_stats_t stats, void *user)
 
     LOGD(DEBUG, shs->domid, "iteration %u dirty_count %ld total_written %lu",
          stats.iteration, stats.dirty_count, stats.total_written);
-    if (stats.dirty_count >= 0 && stats.dirty_count < LIBXL_XGS_POLICY_TARGET_DIRTY_COUNT)
+    if (stats.dirty_count >= 0 && stats.dirty_count < dss->min_remaining)
         goto stop_copy;
     if (stats.iteration >= dss->max_iters)
         goto stop_copy;
diff --git a/tools/libs/light/libxl_domain.c b/tools/libs/light/libxl_domain.c
index 9f98cd7f2b..06ca7a7df6 100644
--- a/tools/libs/light/libxl_domain.c
+++ b/tools/libs/light/libxl_domain.c
@@ -528,6 +528,7 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
     dss->fd = fd;
     dss->type = type;
     dss->max_iters = props->max_iters ?: LIBXL_XGS_POLICY_MAX_ITERATIONS;
+    dss->min_remaining = props->min_remaining ?: LIBXL_XGS_POLICY_TARGET_DIRTY_COUNT;
     dss->live = props->flags & LIBXL_SUSPEND_LIVE;
     dss->debug = props->flags & LIBXL_SUSPEND_DEBUG;
     dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_NONE;
diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
index 8cbcc5282c..e4bfb34085 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -3650,6 +3650,7 @@ struct libxl__domain_save_state {
     int debug;
     int checkpointed_stream;
     uint32_t max_iters;
+    uint32_t min_remaining;
     const libxl_domain_remus_info *remus;
     /* private */
     int rc;
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 9b6b3c99aa..2cb4980c80 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -165,17 +165,18 @@ const struct cmd_spec cmd_table[] = {
       &main_migrate, 0, 1,
       "Migrate a domain to another host",
       "[options] <Domain> <host>",
-      "-h              Print this help.\n"
-      "-C <config>     Send <config> instead of config file from creation.\n"
-      "-s <sshcommand> Use <sshcommand> instead of ssh.  String will be passed\n"
-      "                to sh. If empty, run <host> instead of ssh <host> xl\n"
-      "                migrate-receive [-d -e]\n"
-      "-e              Do not wait in the background (on <host>) for the death\n"
-      "                of the domain.\n"
-      "--debug         Ignored.\n"
-      "-p              Do not unpause domain after migrating it.\n"
-      "-D              Preserve the domain id\n"
-      "--max_iters N   Number of copy iterations before final stop+move"
+      "-h                Print this help.\n"
+      "-C <config>       Send <config> instead of config file from creation.\n"
+      "-s <sshcommand>   Use <sshcommand> instead of ssh.  String will be passed\n"
+      "                  to sh. If empty, run <host> instead of ssh <host> xl\n"
+      "                  migrate-receive [-d -e]\n"
+      "-e                Do not wait in the background (on <host>) for the death\n"
+      "                  of the domain.\n"
+      "--debug           Ignored.\n"
+      "-p                Do not unpause domain after migrating it.\n"
+      "-D                Preserve the domain id\n"
+      "--max_iters N     Number of copy iterations before final stop+move\n"
+      "--min_remaining N Number of remaining dirty pages before final stop+move"
     },
     { "restore",
       &main_restore, 0, 1,
diff --git a/tools/xl/xl_migrate.c b/tools/xl/xl_migrate.c
index af117d4d56..14feb2b7ec 100644
--- a/tools/xl/xl_migrate.c
+++ b/tools/xl/xl_migrate.c
@@ -179,6 +179,7 @@ static void migrate_do_preamble(int send_fd, int recv_fd, pid_t child,
 static void migrate_domain(uint32_t domid, int preserve_domid,
                            const char *rune, int debug,
                            uint32_t max_iters,
+                           uint32_t min_remaining,
                            const char *override_config_file)
 {
     pid_t child = -1;
@@ -191,6 +192,7 @@ static void migrate_domain(uint32_t domid, int preserve_domid,
     libxl_domain_suspend_props props = {
         .flags = LIBXL_SUSPEND_LIVE,
         .max_iters = max_iters,
+        .min_remaining = min_remaining,
         };
 
     save_domain_core_begin(domid, preserve_domid, override_config_file,
@@ -545,9 +547,11 @@ int main_migrate(int argc, char **argv)
     int opt, daemonize = 1, monitor = 1, debug = 0, pause_after_migration = 0;
     int preserve_domid = 0;
     uint32_t max_iters = 0;
+    uint32_t min_remaining = 0;
     static struct option opts[] = {
         {"debug", 0, 0, 0x100},
         {"max_iters", 1, 0, 0x101},
+        {"min_remaining", 1, 0, 0x102},
         {"live", 0, 0, 0x200},
         COMMON_LONG_OPTS
     };
@@ -578,6 +582,9 @@ int main_migrate(int argc, char **argv)
     case 0x101: /* --max_iters */
         max_iters = atoi(optarg);
         break;
+    case 0x102: /* --min_remaining */
+        min_remaining = atoi(optarg);
+        break;
     case 0x200: /* --live */
         /* ignored for compatibility with xm */
         break;
@@ -613,7 +620,7 @@ int main_migrate(int argc, char **argv)
     }
 
     migrate_domain(domid, preserve_domid, rune, debug,
-                   max_iters, config_filename);
+                   max_iters, min_remaining, config_filename);
     return EXIT_SUCCESS;
 }
 



* [PATCH v20210616 33/36] tools: add --abort_if_busy to libxl_domain_suspend
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (31 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 32/36] tools: add --min_remaining " Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 34/36] tools: add API for expandable bitmaps Olaf Hering
                   ` (3 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel
  Cc: Olaf Hering, Ian Jackson, Wei Liu, Anthony PERARD, Juergen Gross

Provide a knob to the host admin to abort the live migration of a
running domU if the downtime during final transit will be too long
for the workload within domU.

Adjust error reporting. Add ERROR_MIGRATION_ABORTED to allow callers of
libxl_domain_suspend to distinguish between errors and the requested
constraint.

Adjust precopy_policy to simplify reporting of remaining dirty pages.
The loop in send_memory_live populates ->dirty_count in a different
place than ->iteration. Let it proceed one more time to provide the
desired information before leaving the loop.

This patch adjusts xl(1) and the libxl API.
External users check LIBXL_HAVE_DOMAIN_SUSPEND_PROPS for the availability
of the new .abort_if_busy property.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
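A standalone sketch of the decision described in the xl(1) text of this
patch: abort only if the iteration limit is reached while at least
min_remaining pages are still dirty. The busy_* names and the enum are
hypothetical stand-ins so the snippet compiles without Xen headers. Note
that the hunk in this patch sends both stop conditions through the same
stop_copy label, so the sketch is an interpretation of the documented
behaviour rather than a copy of the posted code.

```c
#include <stdint.h>

/* Local stand-ins for the XGS_POLICY_* verdicts (hypothetical names). */
enum busy_verdict { BUSY_CONTINUE_PRECOPY, BUSY_STOP_AND_COPY, BUSY_ABORT };

struct busy_limits {
    uint32_t max_iters;      /* iteration budget */
    uint32_t min_remaining;  /* dirty page threshold for the final stop+move */
    int abort_if_busy;       /* non-zero: give up instead of a long downtime */
};

static enum busy_verdict busy_policy(const struct busy_limits *lim,
                                     unsigned int iteration, long dirty_count)
{
    /* Dirty count not populated yet: run once more to obtain it. */
    if (dirty_count < 0)
        return BUSY_CONTINUE_PRECOPY;
    /* The domU calmed down: the final transfer will be short. */
    if ((int64_t)dirty_count < (int64_t)lim->min_remaining)
        return BUSY_STOP_AND_COPY;
    /* Budget used up while the domU is still busy. */
    if (iteration >= lim->max_iters)
        return lim->abort_if_busy ? BUSY_ABORT : BUSY_STOP_AND_COPY;
    return BUSY_CONTINUE_PRECOPY;
}
```

A caller would map BUSY_ABORT to a distinct error code, in the same way the
patch maps the abort to ERROR_MIGRATION_ABORTED, so that a requested abort
is not reported as a failure to suspend or resume.
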
 docs/man/xl.1.pod.in                  |  8 +++++++
 tools/include/libxl.h                 |  1 +
 tools/libs/light/libxl_dom_save.c     |  7 ++++++-
 tools/libs/light/libxl_domain.c       |  1 +
 tools/libs/light/libxl_internal.h     |  2 ++
 tools/libs/light/libxl_stream_write.c |  9 +++++++-
 tools/libs/light/libxl_types.idl      |  1 +
 tools/xl/xl_cmdtable.c                |  6 +++++-
 tools/xl/xl_migrate.c                 | 30 ++++++++++++++++++++-------
 9 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 09e866ad87..37267c9171 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -506,6 +506,14 @@ low, the guest is suspended and the domU will finally be moved to I<host>.
 This allows the host admin to control for how long the domU will likely
 be suspended during transit.
 
+=item B<--abort_if_busy>
+
+Abort migration instead of doing final suspend/move/resume if the
+guest produced more than I<min_remaining> dirty pages during the number
+of I<max_iters> iterations.
+This avoids long periods of time where the guest is suspended, which
+may confuse the workload within domU.
+
 =back
 
 =item B<remus> [I<OPTIONS>] I<domain-id> I<host>
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 28d70b1078..cc056ed627 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -1719,6 +1719,7 @@ typedef struct {
 } libxl_domain_suspend_props;
 #define LIBXL_SUSPEND_DEBUG 1
 #define LIBXL_SUSPEND_LIVE 2
+#define LIBXL_SUSPEND_ABORT_IF_BUSY 4
 
 int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
                          libxl_domain_suspend_props *props,
diff --git a/tools/libs/light/libxl_dom_save.c b/tools/libs/light/libxl_dom_save.c
index ad5df89b2c..1999a8997f 100644
--- a/tools/libs/light/libxl_dom_save.c
+++ b/tools/libs/light/libxl_dom_save.c
@@ -383,11 +383,16 @@ static int libxl__domain_save_precopy_policy(precopy_stats_t stats, void *user)
          stats.iteration, stats.dirty_count, stats.total_written);
     if (stats.dirty_count >= 0 && stats.dirty_count < dss->min_remaining)
         goto stop_copy;
-    if (stats.iteration >= dss->max_iters)
+    if (stats.dirty_count >= 0 && stats.iteration >= dss->max_iters)
         goto stop_copy;
     return XGS_POLICY_CONTINUE_PRECOPY;
 
 stop_copy:
+    if (dss->abort_if_busy)
+    {
+        dss->remaining_dirty_pages = stats.dirty_count;
+        return XGS_POLICY_ABORT;
+    }
     return XGS_POLICY_STOP_AND_COPY;
 }
 
diff --git a/tools/libs/light/libxl_domain.c b/tools/libs/light/libxl_domain.c
index 06ca7a7df6..e4740b063e 100644
--- a/tools/libs/light/libxl_domain.c
+++ b/tools/libs/light/libxl_domain.c
@@ -529,6 +529,7 @@ int libxl_domain_suspend(libxl_ctx *ctx, uint32_t domid, int fd,
     dss->type = type;
     dss->max_iters = props->max_iters ?: LIBXL_XGS_POLICY_MAX_ITERATIONS;
     dss->min_remaining = props->min_remaining ?: LIBXL_XGS_POLICY_TARGET_DIRTY_COUNT;
+    dss->abort_if_busy = props->flags & LIBXL_SUSPEND_ABORT_IF_BUSY;
     dss->live = props->flags & LIBXL_SUSPEND_LIVE;
     dss->debug = props->flags & LIBXL_SUSPEND_DEBUG;
     dss->checkpointed_stream = LIBXL_CHECKPOINTED_STREAM_NONE;
diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
index e4bfb34085..905d5179ba 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -3648,9 +3648,11 @@ struct libxl__domain_save_state {
     libxl_domain_type type;
     int live;
     int debug;
+    int abort_if_busy;
     int checkpointed_stream;
     uint32_t max_iters;
     uint32_t min_remaining;
+    long remaining_dirty_pages;
     const libxl_domain_remus_info *remus;
     /* private */
     int rc;
diff --git a/tools/libs/light/libxl_stream_write.c b/tools/libs/light/libxl_stream_write.c
index 634f3240d1..1ab3943f3e 100644
--- a/tools/libs/light/libxl_stream_write.c
+++ b/tools/libs/light/libxl_stream_write.c
@@ -344,11 +344,18 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void,
         goto err;
 
     if (retval) {
+        if (dss->remaining_dirty_pages) {
+            LOGD(NOTICE, dss->domid, "saving domain: aborted,"
+                 " %ld remaining dirty pages.", dss->remaining_dirty_pages);
+        } else {
         LOGEVD(ERROR, errnoval, dss->domid, "saving domain: %s",
               dss->dsps.guest_responded ?
               "domain responded to suspend request" :
               "domain did not respond to suspend request");
-        if (!dss->dsps.guest_responded)
+        }
+        if (dss->remaining_dirty_pages)
+           rc = ERROR_MIGRATION_ABORTED;
+        else if(!dss->dsps.guest_responded)
             rc = ERROR_GUEST_TIMEDOUT;
         else if (dss->rc)
             rc = dss->rc;
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index f45adddab0..b91769ee10 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -76,6 +76,7 @@ libxl_error = Enumeration("error", [
     (-30, "QMP_DEVICE_NOT_ACTIVE"), # a device has failed to be become active
     (-31, "QMP_DEVICE_NOT_FOUND"), # the requested device has not been found
     (-32, "QEMU_API"), # QEMU's replies don't contains expected members
+    (-33, "MIGRATION_ABORTED"),
     ], value_namespace = "")
 
 libxl_domain_type = Enumeration("domain_type", [
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 2cb4980c80..322a47c2bc 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -176,7 +176,11 @@ const struct cmd_spec cmd_table[] = {
       "-p                Do not unpause domain after migrating it.\n"
       "-D                Preserve the domain id\n"
       "--max_iters N     Number of copy iterations before final stop+move\n"
-      "--min_remaining N Number of remaining dirty pages before final stop+move"
+      "--min_remaining N Number of remaining dirty pages before final stop+move\n"
+      "--abort_if_busy   Abort migration instead of doing final stop+move,\n"
+      "                  if the number of dirty pages is higher than <min_remaining>\n"
+      "                  after <max_iters> iterations. Otherwise the amount of memory\n"
+      "                  to be transferred would exceed maximum allowed domU downtime."
     },
     { "restore",
       &main_restore, 0, 1,
diff --git a/tools/xl/xl_migrate.c b/tools/xl/xl_migrate.c
index 14feb2b7ec..f523746e5b 100644
--- a/tools/xl/xl_migrate.c
+++ b/tools/xl/xl_migrate.c
@@ -177,7 +177,7 @@ static void migrate_do_preamble(int send_fd, int recv_fd, pid_t child,
 }
 
 static void migrate_domain(uint32_t domid, int preserve_domid,
-                           const char *rune, int debug,
+                           const char *rune, int debug, int abort_if_busy,
                            uint32_t max_iters,
                            uint32_t min_remaining,
                            const char *override_config_file)
@@ -213,14 +213,20 @@ static void migrate_domain(uint32_t domid, int preserve_domid,
 
     if (debug)
         props.flags |= LIBXL_SUSPEND_DEBUG;
+    if (abort_if_busy)
+        props.flags |= LIBXL_SUSPEND_ABORT_IF_BUSY;
     rc = libxl_domain_suspend(ctx, domid, send_fd, &props, NULL);
     if (rc) {
         fprintf(stderr, "migration sender: libxl_domain_suspend failed"
                 " (rc=%d)\n", rc);
-        if (rc == ERROR_GUEST_TIMEDOUT)
-            goto failed_suspend;
-        else
-            goto failed_resume;
+        switch (rc) {
+            case ERROR_GUEST_TIMEDOUT:
+                goto failed_suspend;
+            case ERROR_MIGRATION_ABORTED:
+                goto failed_busy;
+            default:
+                goto failed_resume;
+        }
     }
 
     //fprintf(stderr, "migration sender: Transfer complete.\n");
@@ -302,6 +308,12 @@ static void migrate_domain(uint32_t domid, int preserve_domid,
     fprintf(stderr, "Migration failed, failed to suspend at sender.\n");
     exit(EXIT_FAILURE);
 
+ failed_busy:
+    close(send_fd);
+    migration_child_report(recv_fd);
+    fprintf(stderr, "Migration aborted as requested, domain is too busy.\n");
+    exit(EXIT_FAILURE);
+
  failed_resume:
     close(send_fd);
     migration_child_report(recv_fd);
@@ -545,13 +557,14 @@ int main_migrate(int argc, char **argv)
     char *rune = NULL;
     char *host;
     int opt, daemonize = 1, monitor = 1, debug = 0, pause_after_migration = 0;
-    int preserve_domid = 0;
+    int preserve_domid = 0, abort_if_busy = 0;
     uint32_t max_iters = 0;
     uint32_t min_remaining = 0;
     static struct option opts[] = {
         {"debug", 0, 0, 0x100},
         {"max_iters", 1, 0, 0x101},
         {"min_remaining", 1, 0, 0x102},
+        {"abort_if_busy", 0, 0, 0x103},
         {"live", 0, 0, 0x200},
         COMMON_LONG_OPTS
     };
@@ -585,6 +598,9 @@ int main_migrate(int argc, char **argv)
     case 0x102: /* --min_remaining */
         min_remaining = atoi(optarg);
         break;
+    case 0x103: /* --abort_if_busy */
+        abort_if_busy = 1;
+        break;
     case 0x200: /* --live */
         /* ignored for compatibility with xm */
         break;
@@ -619,7 +635,7 @@ int main_migrate(int argc, char **argv)
                   pause_after_migration ? " -p" : "");
     }
 
-    migrate_domain(domid, preserve_domid, rune, debug,
+    migrate_domain(domid, preserve_domid, rune, debug, abort_if_busy,
                    max_iters, min_remaining, config_filename);
     return EXIT_SUCCESS;
 }



* [PATCH v20210616 34/36] tools: add API for expandable bitmaps
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (32 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 33/36] tools: add --abort_if_busy " Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 35/36] tools: use sr_bitmap for populated_pfns Olaf Hering
                   ` (2 subsequent siblings)
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Ian Jackson, Wei Liu, Juergen Gross

Since the incoming migration stream lacks info about what the highest pfn
will be, data structures cannot be allocated upfront.

Add an API for expandable bitmaps, loosely based on pfn_set_populated.
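
The growth policy can be sketched in isolation as below. This is only an
illustration of the rounding described in this patch, not code from the
series: sketch_round_capacity and SKETCH_BITS_PER_LONG are hypothetical
names, and the assert guards a wrap-around case the sketch does not handle.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: stand-in for the BITS_PER_LONG macro used by the patch. */
#define SKETCH_BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Hypothetical helper, not part of the patch: return the capacity in bits
 * that a request for `bits` would be rounded up to.
 */
static unsigned long sketch_round_capacity(unsigned long bits)
{
    unsigned long n = bits;

    /* Smear the highest set bit into all lower positions: n = 2^k - 1. */
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    if (sizeof(unsigned long) > 4)
        n |= (n >> 16) >> 16;

    /* The round-up below would wrap for requests in the top half of the range. */
    assert(n != ULONG_MAX);

    /* Round up to a whole number of unsigned longs. */
    return (n + SKETCH_BITS_PER_LONG - 1) & ~(SKETCH_BITS_PER_LONG - 1);
}
```

Because the capacity roughly doubles each time, a stream with steadily
increasing pfns triggers only a logarithmic number of realloc() calls.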

Signed-off-by: Olaf Hering <olaf@aepfle.de>

v02:
- remove xg_ prefix from functions
---
 tools/libs/saverestore/common.c | 40 ++++++++++++++++++++
 tools/libs/saverestore/common.h | 67 +++++++++++++++++++++++++++++++++
 2 files changed, 107 insertions(+)

diff --git a/tools/libs/saverestore/common.c b/tools/libs/saverestore/common.c
index 7da7fa4e2c..8b4e402df5 100644
--- a/tools/libs/saverestore/common.c
+++ b/tools/libs/saverestore/common.c
@@ -163,6 +163,46 @@ static void __attribute__((unused)) build_assertions(void)
     BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)        != 8);
 }
 
+/*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too often, the size is increased to the nearest
+ * power of two large enough to contain the required number of bits.
+ */
+bool _sr_bitmap_expand(struct sr_bitmap *bm, unsigned long bits)
+{
+    size_t new_max;
+    size_t old_sz, new_sz;
+    void *p;
+
+    if (bits <= bm->bits)
+        return true;
+
+    /* Round up to the nearest power of two larger than bits, less 1. */
+    new_max = bits;
+    new_max |= new_max >> 1;
+    new_max |= new_max >> 2;
+    new_max |= new_max >> 4;
+    new_max |= new_max >> 8;
+    new_max |= new_max >> 16;
+    if ( sizeof(unsigned long) > 4 )
+        new_max |= new_max >> 32;
+
+    /* Allocate units of unsigned long */
+    new_max = (new_max + BITS_PER_LONG - 1) & ~(BITS_PER_LONG - 1);
+
+    old_sz = bitmap_size(bm->bits);
+    new_sz = bitmap_size(new_max);
+    p = realloc(bm->p, new_sz);
+    if (!p)
+        return false;
+
+    memset(p + old_sz, 0, new_sz - old_sz);
+    bm->p = p;
+    bm->bits = new_max;
+
+    return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 60bbba6aa9..43aa1a7b86 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -30,6 +30,73 @@ const char *rec_type_to_str(uint32_t type);
 struct xc_sr_context;
 struct xc_sr_record;
 
+struct sr_bitmap
+{
+    void *p;
+    unsigned long bits;
+};
+
+extern bool _sr_bitmap_expand(struct sr_bitmap *bm, unsigned long bits);
+
+static inline bool sr_bitmap_expand(struct sr_bitmap *bm, unsigned long bits)
+{
+    if (bits > bm->bits)
+        return _sr_bitmap_expand(bm, bits);
+    return true;
+}
+
+static inline void sr_bitmap_free(struct sr_bitmap *bm)
+{
+    free(bm->p);
+    bm->p = NULL;
+}
+
+static inline bool sr_set_bit(unsigned long bit, struct sr_bitmap *bm)
+{
+    if (sr_bitmap_expand(bm, bit + 1) == false)
+        return false;
+
+    set_bit(bit, bm->p);
+    return true;
+}
+
+static inline bool sr_test_bit(unsigned long bit, struct sr_bitmap *bm)
+{
+    if (bit >= bm->bits)
+        return false;
+    return !!test_bit(bit, bm->p);
+}
+
+static inline void sr_clear_bit(unsigned long bit, struct sr_bitmap *bm)
+{
+    if (bit < bm->bits)
+        clear_bit(bit, bm->p);
+}
+
+static inline bool sr_test_and_clear_bit(unsigned long bit, struct sr_bitmap *bm)
+{
+    if (bit >= bm->bits)
+        return false;
+    return !!test_and_clear_bit(bit, bm->p);
+}
+
+/* No way to report potential allocation error, bitmap must be expanded prior to use */
+static inline bool sr_test_and_set_bit(unsigned long bit, struct sr_bitmap *bm)
+{
+    if (bit >= bm->bits)
+        return false;
+    return !!test_and_set_bit(bit, bm->p);
+}
+
+static inline bool sr_set_long_bit(unsigned long base_bit, struct sr_bitmap *bm)
+{
+    if (sr_bitmap_expand(bm, base_bit + BITS_PER_LONG) == false)
+        return false;
+
+    set_bit_long(base_bit, bm->p);
+    return true;
+}
+
 /**
  * Save operations.  To be implemented for each type of guest, for use by the
  * common save algorithm.



* [PATCH v20210616 35/36] tools: use sr_bitmap for populated_pfns
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (33 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 34/36] tools: add API for expandable bitmaps Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 12:51 ` [PATCH v20210616 36/36] tools: use superpages during restore of HVM guest Olaf Hering
  2021-06-16 14:50 ` [PATCH v20210616 00/36] leftover from 2020 Andrew Cooper
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Ian Jackson, Wei Liu, Juergen Gross

Signed-off-by: Olaf Hering <olaf@aepfle.de>

v02:
- remove xg_ prefix from called functions
---
 tools/libs/saverestore/common.h          | 21 +++++++-
 tools/libs/saverestore/restore.c         | 69 ------------------------
 tools/libs/saverestore/restore_x86_hvm.c |  9 ++++
 tools/libs/saverestore/restore_x86_pv.c  |  7 +++
 4 files changed, 35 insertions(+), 71 deletions(-)

diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 43aa1a7b86..43a31f9aa5 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -403,8 +403,7 @@ struct xc_sr_context
             uint32_t     xenstore_domid,  console_domid;
 
             /* Bitmap of currently populated PFNs during restore. */
-            unsigned long *populated_pfns;
-            xen_pfn_t max_populated_pfn;
+            struct sr_bitmap populated_pfns;
 
             /* Sender has invoked verify mode on the stream. */
             bool verify;
@@ -629,6 +628,24 @@ static inline bool page_type_has_stream_data(uint32_t type)
     }
     return ret;
 }
+
+static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    return sr_test_bit(pfn, &ctx->restore.populated_pfns);
+}
+
+static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    xc_interface *xch = ctx->xch;
+
+    if ( sr_set_bit(pfn, &ctx->restore.populated_pfns) == false )
+    {
+        PERROR("Failed to realloc populated_pfns bitmap");
+        errno = ENOMEM;
+        return -1;
+    }
+    return 0;
+}
 #endif
 /*
  * Local variables:
diff --git a/tools/libs/saverestore/restore.c b/tools/libs/saverestore/restore.c
index d0148606bf..8f7bce2585 100644
--- a/tools/libs/saverestore/restore.c
+++ b/tools/libs/saverestore/restore.c
@@ -71,64 +71,6 @@ static int read_headers(struct xc_sr_context *ctx)
     return 0;
 }
 
-/*
- * Is a pfn populated?
- */
-static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
-{
-    if ( pfn > ctx->restore.max_populated_pfn )
-        return false;
-    return test_bit(pfn, ctx->restore.populated_pfns);
-}
-
-/*
- * Set a pfn as populated, expanding the tracking structures if needed. To
- * avoid realloc()ing too excessively, the size increased to the nearest power
- * of two large enough to contain the required pfn.
- */
-static int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
-{
-    xc_interface *xch = ctx->xch;
-
-    if ( pfn > ctx->restore.max_populated_pfn )
-    {
-        xen_pfn_t new_max;
-        size_t old_sz, new_sz;
-        unsigned long *p;
-
-        /* Round up to the nearest power of two larger than pfn, less 1. */
-        new_max = pfn;
-        new_max |= new_max >> 1;
-        new_max |= new_max >> 2;
-        new_max |= new_max >> 4;
-        new_max |= new_max >> 8;
-        new_max |= new_max >> 16;
-#ifdef __x86_64__
-        new_max |= new_max >> 32;
-#endif
-
-        old_sz = bitmap_size(ctx->restore.max_populated_pfn + 1);
-        new_sz = bitmap_size(new_max + 1);
-        p = realloc(ctx->restore.populated_pfns, new_sz);
-        if ( !p )
-        {
-            ERROR("Failed to realloc populated bitmap");
-            errno = ENOMEM;
-            return -1;
-        }
-
-        memset((uint8_t *)p + old_sz, 0x00, new_sz - old_sz);
-
-        ctx->restore.populated_pfns    = p;
-        ctx->restore.max_populated_pfn = new_max;
-    }
-
-    assert(!test_bit(pfn, ctx->restore.populated_pfns));
-    set_bit(pfn, ctx->restore.populated_pfns);
-
-    return 0;
-}
-
 /*
  * Given a set of pfns, obtain memory from Xen to fill the physmap for the
  * unpopulated subset.  If types is NULL, no page type checking is performed
@@ -929,16 +871,6 @@ static int setup(struct xc_sr_context *ctx)
     if ( rc )
         goto err;
 
-    ctx->restore.max_populated_pfn = (32 * 1024 / 4) - 1;
-    ctx->restore.populated_pfns = bitmap_alloc(
-        ctx->restore.max_populated_pfn + 1);
-    if ( !ctx->restore.populated_pfns )
-    {
-        ERROR("Unable to allocate memory for populated_pfns bitmap");
-        rc = -1;
-        goto err;
-    }
-
     ctx->restore.buffered_records = malloc(
         DEFAULT_BUF_RECORDS * sizeof(struct xc_sr_record));
     if ( !ctx->restore.buffered_records )
@@ -977,7 +909,6 @@ static void cleanup(struct xc_sr_context *ctx)
 
     free(ctx->restore.m);
     free(ctx->restore.buffered_records);
-    free(ctx->restore.populated_pfns);
 
     if ( ctx->restore.ops.cleanup(ctx) )
         PERROR("Failed to clean up");
diff --git a/tools/libs/saverestore/restore_x86_hvm.c b/tools/libs/saverestore/restore_x86_hvm.c
index bd63bd2818..97e7e0f48c 100644
--- a/tools/libs/saverestore/restore_x86_hvm.c
+++ b/tools/libs/saverestore/restore_x86_hvm.c
@@ -136,6 +136,7 @@ static int x86_hvm_localise_page(struct xc_sr_context *ctx,
 static int x86_hvm_setup(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
+    unsigned long max_pfn;
 
     if ( ctx->restore.guest_type != DHDR_TYPE_X86_HVM )
     {
@@ -161,6 +162,13 @@ static int x86_hvm_setup(struct xc_sr_context *ctx)
     }
 #endif
 
+    max_pfn = max(ctx->restore.p2m_size, ctx->dominfo.max_memkb >> (PAGE_SHIFT-10));
+    if ( !sr_bitmap_expand(&ctx->restore.populated_pfns, max_pfn) )
+    {
+        PERROR("Unable to allocate memory for populated_pfns bitmap");
+        return -1;
+    }
+
     return 0;
 }
 
@@ -241,6 +249,7 @@ static int x86_hvm_stream_complete(struct xc_sr_context *ctx)
 
 static int x86_hvm_cleanup(struct xc_sr_context *ctx)
 {
+    sr_bitmap_free(&ctx->restore.populated_pfns);
     free(ctx->x86.hvm.restore.context.ptr);
 
     free(ctx->x86.restore.cpuid.ptr);
diff --git a/tools/libs/saverestore/restore_x86_pv.c b/tools/libs/saverestore/restore_x86_pv.c
index 96608e5231..c73a3cd99f 100644
--- a/tools/libs/saverestore/restore_x86_pv.c
+++ b/tools/libs/saverestore/restore_x86_pv.c
@@ -1060,6 +1060,12 @@ static int x86_pv_setup(struct xc_sr_context *ctx)
     if ( rc )
         return rc;
 
+    if ( !sr_bitmap_expand(&ctx->restore.populated_pfns, 32 * 1024 / 4) )
+    {
+        PERROR("Unable to allocate memory for populated_pfns bitmap");
+        return -1;
+    }
+
     ctx->x86.pv.restore.nr_vcpus = ctx->dominfo.max_vcpu_id + 1;
     ctx->x86.pv.restore.vcpus = calloc(sizeof(struct xc_sr_x86_pv_restore_vcpu),
                                        ctx->x86.pv.restore.nr_vcpus);
@@ -1153,6 +1159,7 @@ static int x86_pv_stream_complete(struct xc_sr_context *ctx)
  */
 static int x86_pv_cleanup(struct xc_sr_context *ctx)
 {
+    sr_bitmap_free(&ctx->restore.populated_pfns);
     free(ctx->x86.pv.p2m);
     free(ctx->x86.pv.p2m_pfns);
 



* [PATCH v20210616 36/36] tools: use superpages during restore of HVM guest
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (34 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 35/36] tools: use sr_bitmap for populated_pfns Olaf Hering
@ 2021-06-16 12:51 ` Olaf Hering
  2021-06-16 14:50 ` [PATCH v20210616 00/36] leftover from 2020 Andrew Cooper
  36 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 12:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Olaf Hering, Ian Jackson, Wei Liu, Juergen Gross

During creation of an HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost: everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for an incoming chunk of
pfns. In case a pfn was not populated on the sending side, it must be
freed on the receiving side to avoid over-allocation.
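
The order selection can be sketched roughly as follows. This is a simplified
model and not the code added by this patch: sketch_budget, sketch_base_pfn
and sketch_pick_order are hypothetical names, the "attempted" state is passed
in as plain flags instead of bitmaps, and the allocation itself is assumed
never to fail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same values as SUPERPAGE_2MB_SHIFT / SUPERPAGE_1GB_SHIFT in the patch. */
#define SKETCH_2MB_SHIFT 9
#define SKETCH_1GB_SHIFT 18

/* Hypothetical stand-in for the tot_pages/max_pages accounting. */
struct sketch_budget {
    unsigned long tot_pages;
    unsigned long max_pages;
};

/* Align a pfn down to the start of the extent of the given order. */
static unsigned long sketch_base_pfn(unsigned long pfn, unsigned int order)
{
    return (pfn >> order) << order;
}

/*
 * Pick the largest order that has not been attempted yet and still fits
 * into the budget, and charge it. Returns the order, or -1 if not even a
 * single page fits. Assumption: the allocation itself never fails.
 */
static int sketch_pick_order(struct sketch_budget *b,
                             bool attempted_1g, bool attempted_2m)
{
    const unsigned int orders[] = { SKETCH_1GB_SHIFT, SKETCH_2MB_SHIFT, 0 };
    const bool skip[] = { attempted_1g, attempted_2m, false };
    size_t i;

    for (i = 0; i < sizeof(orders) / sizeof(orders[0]); i++) {
        unsigned long count = 1UL << orders[i];

        if (skip[i])
            continue;
        if (b->tot_pages + count > b->max_pages)
            continue;

        b->tot_pages += count;
        return (int)orders[i];
    }
    return -1;
}
```

Checking the budget before every attempt is what keeps the optimistic
superpage allocation from exceeding the memory limit of the domU.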

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering <olaf@aepfle.de>

v02:
- remove xg_ prefix from called functions
---
 tools/libs/guest/xg_dom_x86.c            |   5 -
 tools/libs/guest/xg_private.h            |   5 +
 tools/libs/saverestore/common.c          |   1 -
 tools/libs/saverestore/common.h          |  28 +-
 tools/libs/saverestore/restore.c         |  62 +---
 tools/libs/saverestore/restore_x86_hvm.c | 370 ++++++++++++++++++++++-
 tools/libs/saverestore/restore_x86_pv.c  |  61 +++-
 7 files changed, 455 insertions(+), 77 deletions(-)

diff --git a/tools/libs/guest/xg_dom_x86.c b/tools/libs/guest/xg_dom_x86.c
index d2eb89ce01..ec0d18fd60 100644
--- a/tools/libs/guest/xg_dom_x86.c
+++ b/tools/libs/guest/xg_dom_x86.c
@@ -44,11 +44,6 @@
 
 #define SUPERPAGE_BATCH_SIZE 512
 
-#define SUPERPAGE_2MB_SHIFT   9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT   18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
diff --git a/tools/libs/guest/xg_private.h b/tools/libs/guest/xg_private.h
index 28441ee13f..b7372e6bd5 100644
--- a/tools/libs/guest/xg_private.h
+++ b/tools/libs/guest/xg_private.h
@@ -179,4 +179,9 @@ struct xc_cpu_policy {
 };
 #endif /* x86 */
 
+#define SUPERPAGE_2MB_SHIFT   9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT   18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 #endif /* XG_PRIVATE_H */
diff --git a/tools/libs/saverestore/common.c b/tools/libs/saverestore/common.c
index 8b4e402df5..5c659aa55b 100644
--- a/tools/libs/saverestore/common.c
+++ b/tools/libs/saverestore/common.c
@@ -1,5 +1,4 @@
 #include <assert.h>
-
 #include "common.h"
 
 #include <xen-tools/libs.h>
diff --git a/tools/libs/saverestore/common.h b/tools/libs/saverestore/common.h
index 43a31f9aa5..8e67989bbf 100644
--- a/tools/libs/saverestore/common.h
+++ b/tools/libs/saverestore/common.h
@@ -219,6 +219,16 @@ struct xc_sr_restore_ops
      */
     int (*setup)(struct xc_sr_context *ctx);
 
+    /**
+     * Populate PFNs
+     *
+     * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+     * unpopulated subset.
+     */
+    int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+                         const xen_pfn_t *original_pfns, const uint32_t *types);
+
+
     /**
      * Process an individual record from the stream.  The caller shall take
      * care of processing common records (e.g. END, PAGE_DATA).
@@ -366,6 +376,8 @@ struct xc_sr_context
 
             int send_back_fd;
             unsigned long p2m_size;
+            unsigned long max_pages;
+            unsigned long tot_pages;
             xc_hypercall_buffer_t dirty_bitmap_hbuf;
 
             /* From Image Header. */
@@ -503,6 +515,14 @@ struct xc_sr_context
                     {
                         /* HVM context blob. */
                         struct xc_sr_blob context;
+
+                        /* Bitmap of currently allocated PFNs during restore. */
+                        struct sr_bitmap attempted_1g;
+                        struct sr_bitmap attempted_2m;
+                        struct sr_bitmap allocated_pfns;
+                        xen_pfn_t prev_populated_pfn;
+                        xen_pfn_t iteration_tracker_pfn;
+                        unsigned long iteration;
                     } restore;
                 };
             } hvm;
@@ -567,14 +587,6 @@ int read_record_header(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr *rhd
 int read_record_data(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr *rhdr,
                      struct xc_sr_record *rec);
 
-/*
- * This would ideally be private in restore.c, but is needed by
- * x86_pv_localise_page() if we receive pagetables frames ahead of the
- * contents of the frames they point at.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
-                  const xen_pfn_t *original_pfns, const uint32_t *types);
-
 /* Handle a STATIC_DATA_END record. */
 int handle_static_data_end(struct xc_sr_context *ctx);
 
diff --git a/tools/libs/saverestore/restore.c b/tools/libs/saverestore/restore.c
index 8f7bce2585..5ad3df49ba 100644
--- a/tools/libs/saverestore/restore.c
+++ b/tools/libs/saverestore/restore.c
@@ -71,63 +71,6 @@ static int read_headers(struct xc_sr_context *ctx)
     return 0;
 }
 
-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset.  If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned int count,
-                  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-    xc_interface *xch = ctx->xch;
-    xen_pfn_t *mfns = ctx->restore.m->pp_mfns,
-        *pfns = ctx->restore.m->pp_pfns;
-    unsigned int i, nr_pfns = 0;
-    int rc = -1;
-
-    for ( i = 0; i < count; ++i )
-    {
-        if ( (!types ||
-              (types && page_type_has_stream_data(types[i]) == true)) &&
-             !pfn_is_populated(ctx, original_pfns[i]) )
-        {
-            rc = pfn_set_populated(ctx, original_pfns[i]);
-            if ( rc )
-                goto err;
-            pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-            ++nr_pfns;
-        }
-    }
-
-    if ( nr_pfns )
-    {
-        rc = xc_domain_populate_physmap_exact(
-            xch, ctx->domid, nr_pfns, 0, 0, mfns);
-        if ( rc )
-        {
-            PERROR("Failed to populate physmap");
-            goto err;
-        }
-
-        for ( i = 0; i < nr_pfns; ++i )
-        {
-            if ( mfns[i] == INVALID_MFN )
-            {
-                ERROR("Populate physmap failed for pfn %u", i);
-                rc = -1;
-                goto err;
-            }
-
-            ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
-        }
-    }
-
-    rc = 0;
-
- err:
-    return rc;
-}
-
 static int handle_static_data_end_v2(struct xc_sr_context *ctx)
 {
     int rc = 0;
@@ -270,7 +213,7 @@ static int map_guest_pages(struct xc_sr_context *ctx,
     uint32_t i, p;
     int rc;
 
-    rc = populate_pfns(ctx, pages->count, m->pfns, m->types);
+    rc = ctx->restore.ops.populate_pfns(ctx, pages->count, m->pfns, m->types);
     if ( rc )
     {
         ERROR("Failed to populate pfns for batch of %u pages", pages->count);
@@ -1077,6 +1020,9 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
         return -1;
     }
 
+    /* See xc_domain_getinfo */
+    ctx.restore.max_pages = ctx.dominfo.max_memkb >> (PAGE_SHIFT-10);
+    ctx.restore.tot_pages = ctx.dominfo.nr_pages;
     ctx.restore.p2m_size = nr_pfns;
     ctx.restore.ops = ctx.dominfo.hvm
         ? restore_ops_x86_hvm : restore_ops_x86_pv;
diff --git a/tools/libs/saverestore/restore_x86_hvm.c b/tools/libs/saverestore/restore_x86_hvm.c
index 97e7e0f48c..7ed438e1be 100644
--- a/tools/libs/saverestore/restore_x86_hvm.c
+++ b/tools/libs/saverestore/restore_x86_hvm.c
@@ -130,6 +130,25 @@ static int x86_hvm_localise_page(struct xc_sr_context *ctx,
     return 0;
 }
 
+static bool x86_hvm_expand_sp_bitmaps(struct xc_sr_context *ctx, unsigned long max_pfn)
+{
+    struct sr_bitmap *bm;
+
+    bm = &ctx->x86.hvm.restore.attempted_1g;
+    if ( !sr_bitmap_expand(bm, max_pfn >> SUPERPAGE_1GB_SHIFT) )
+        return false;
+
+    bm = &ctx->x86.hvm.restore.attempted_2m;
+    if ( !sr_bitmap_expand(bm, max_pfn >> SUPERPAGE_2MB_SHIFT) )
+        return false;
+
+    bm = &ctx->x86.hvm.restore.allocated_pfns;
+    if ( !sr_bitmap_expand(bm, max_pfn) )
+        return false;
+
+    return true;
+}
+
 /*
  * restore_ops function. Confirms the stream matches the domain.
  */
@@ -164,12 +183,21 @@ static int x86_hvm_setup(struct xc_sr_context *ctx)
 
     max_pfn = max(ctx->restore.p2m_size, ctx->dominfo.max_memkb >> (PAGE_SHIFT-10));
     if ( !sr_bitmap_expand(&ctx->restore.populated_pfns, max_pfn) )
-    {
-        PERROR("Unable to allocate memory for populated_pfns bitmap");
-        return -1;
-    }
+        goto out;
+
+    if ( !x86_hvm_expand_sp_bitmaps(ctx, max_pfn) )
+        goto out;
+
+    /* FIXME: distinguish between PVH and HVM */
+    /* No superpage in 1st 2MB due to VGA hole */
+    sr_set_bit(0, &ctx->x86.hvm.restore.attempted_1g);
+    sr_set_bit(0, &ctx->x86.hvm.restore.attempted_2m);
 
     return 0;
+
+out:
+    PERROR("Unable to allocate memory for pfn bitmaps");
+    return -1;
 }
 
 /*
@@ -250,6 +278,9 @@ static int x86_hvm_stream_complete(struct xc_sr_context *ctx)
 static int x86_hvm_cleanup(struct xc_sr_context *ctx)
 {
     sr_bitmap_free(&ctx->restore.populated_pfns);
+    sr_bitmap_free(&ctx->x86.hvm.restore.attempted_1g);
+    sr_bitmap_free(&ctx->x86.hvm.restore.attempted_2m);
+    sr_bitmap_free(&ctx->x86.hvm.restore.allocated_pfns);
     free(ctx->x86.hvm.restore.context.ptr);
 
     free(ctx->x86.restore.cpuid.ptr);
@@ -258,6 +289,336 @@ static int x86_hvm_cleanup(struct xc_sr_context *ctx)
     return 0;
 }
 
+/*
+ * Set a range of pfns as allocated
+ */
+static void pfn_set_long_allocated(struct xc_sr_context *ctx, xen_pfn_t base_pfn)
+{
+    sr_set_long_bit(base_pfn, &ctx->x86.hvm.restore.allocated_pfns);
+}
+
+static void pfn_set_allocated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    sr_set_bit(pfn, &ctx->x86.hvm.restore.allocated_pfns);
+}
+
+struct x86_hvm_sp {
+    xen_pfn_t pfn;
+    xen_pfn_t base_pfn;
+    unsigned long index;
+    unsigned long count;
+};
+
+/*
+ * Try to allocate a 1GB page for this pfn, but avoid over-allocation.
+ * If this succeeds, mark the range of 2MB pages as busy.
+ */
+static bool x86_hvm_alloc_1g(struct xc_sr_context *ctx, struct x86_hvm_sp *sp)
+{
+    xc_interface *xch = ctx->xch;
+    unsigned int order;
+    int i, done;
+    xen_pfn_t extent;
+
+    /* Only one attempt to avoid overlapping allocation */
+    if ( sr_test_and_set_bit(sp->index, &ctx->x86.hvm.restore.attempted_1g) )
+        return false;
+
+    order = SUPERPAGE_1GB_SHIFT;
+    sp->count = SUPERPAGE_1GB_NR_PFNS;
+
+    /* Allocate only if there is room for another superpage */
+    if ( ctx->restore.tot_pages + sp->count > ctx->restore.max_pages )
+        return false;
+
+    extent = sp->base_pfn = (sp->pfn >> order) << order;
+    done = xc_domain_populate_physmap(xch, ctx->domid, 1, order, 0, &extent);
+    if ( done < 0 ) {
+        PERROR("populate_physmap failed.");
+        return false;
+    }
+    if ( done == 0 )
+        return false;
+
+    DPRINTF("1G %" PRI_xen_pfn "\n", sp->base_pfn);
+
+    /* Mark all 2MB pages as done to avoid overlapping allocation */
+    for ( i = 0; i < (SUPERPAGE_1GB_NR_PFNS/SUPERPAGE_2MB_NR_PFNS); i++ )
+        sr_set_bit((sp->base_pfn >> SUPERPAGE_2MB_SHIFT) + i, &ctx->x86.hvm.restore.attempted_2m);
+
+    return true;
+}
+
+/* Allocate a 2MB page if x86_hvm_alloc_1g failed, avoid over-allocation. */
+static bool x86_hvm_alloc_2m(struct xc_sr_context *ctx, struct x86_hvm_sp *sp)
+{
+    xc_interface *xch = ctx->xch;
+    unsigned int order;
+    int done;
+    xen_pfn_t extent;
+
+    /* Only one attempt to avoid overlapping allocation */
+    if ( sr_test_and_set_bit(sp->index, &ctx->x86.hvm.restore.attempted_2m) )
+        return false;
+
+    order = SUPERPAGE_2MB_SHIFT;
+    sp->count = SUPERPAGE_2MB_NR_PFNS;
+
+    /* Allocate only if there is room for another superpage */
+    if ( ctx->restore.tot_pages + sp->count > ctx->restore.max_pages )
+        return false;
+
+    extent = sp->base_pfn = (sp->pfn >> order) << order;
+    done = xc_domain_populate_physmap(xch, ctx->domid, 1, order, 0, &extent);
+    if ( done < 0 ) {
+        PERROR("populate_physmap failed.");
+        return false;
+    }
+    if ( done == 0 )
+        return false;
+
+    DPRINTF("2M %" PRI_xen_pfn "\n", sp->base_pfn);
+    return true;
+}
+
+/* Allocate a single page if x86_hvm_alloc_2m failed. */
+static bool x86_hvm_alloc_4k(struct xc_sr_context *ctx, struct x86_hvm_sp *sp)
+{
+    xc_interface *xch = ctx->xch;
+    unsigned int order;
+    int done;
+    xen_pfn_t extent;
+
+    order = 0;
+    sp->count = 1UL;
+
+    /* Allocate only if there is room for another page */
+    if ( ctx->restore.tot_pages + sp->count > ctx->restore.max_pages ) {
+        errno = E2BIG;
+        return false;
+    }
+
+    extent = sp->base_pfn = (sp->pfn >> order) << order;
+    done = xc_domain_populate_physmap(xch, ctx->domid, 1, order, 0, &extent);
+    if ( done < 0 ) {
+        PERROR("populate_physmap failed.");
+        return false;
+    }
+    if ( done == 0 ) {
+        errno = ENOMEM;
+        return false;
+    }
+
+    DPRINTF("4K %" PRI_xen_pfn "\n", sp->base_pfn);
+    return true;
+}
+/*
+ * Attempt to allocate a superpage where the pfn resides.
+ */
+static int x86_hvm_allocate_pfn(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    bool success;
+    unsigned long idx_1g, idx_2m;
+    struct x86_hvm_sp sp = {
+        .pfn = pfn
+    };
+
+    if ( sr_test_bit(pfn, &ctx->x86.hvm.restore.allocated_pfns) )
+        return 0;
+
+    idx_1g = pfn >> SUPERPAGE_1GB_SHIFT;
+    idx_2m = pfn >> SUPERPAGE_2MB_SHIFT;
+
+    sp.index = idx_1g;
+    success = x86_hvm_alloc_1g(ctx, &sp);
+
+    if ( success == false ) {
+        sp.index = idx_2m;
+        success = x86_hvm_alloc_2m(ctx, &sp);
+    }
+
+    if ( success == false ) {
+        sp.index = 0;
+        success = x86_hvm_alloc_4k(ctx, &sp);
+    }
+
+    if ( success == false )
+        return -1;
+
+    do {
+        if ( sp.count >= BITS_PER_LONG ) {
+            sp.count -= BITS_PER_LONG;
+            ctx->restore.tot_pages += BITS_PER_LONG;
+            pfn_set_long_allocated(ctx, sp.base_pfn + sp.count);
+        } else {
+            sp.count--;
+            ctx->restore.tot_pages++;
+            pfn_set_allocated(ctx, sp.base_pfn + sp.count);
+        }
+    } while ( sp.count );
+
+    return 0;
+}
+
+/*
+ * Deallocate memory.
+ * There was likely an optimistic superpage allocation.
+ * This means more pages may have been allocated past gap_end.
+ * This range is not freed now. Incoming higher pfns will release it.
+ */
+static int x86_hvm_punch_hole(struct xc_sr_context *ctx,
+                               xen_pfn_t gap_start, xen_pfn_t gap_end)
+{
+    xc_interface *xch = ctx->xch;
+    xen_pfn_t _pfn, pfn;
+    uint32_t domid, freed = 0;
+    int rc;
+
+    pfn = gap_start >> SUPERPAGE_1GB_SHIFT;
+    do
+    {
+        sr_set_bit(pfn, &ctx->x86.hvm.restore.attempted_1g);
+    } while (++pfn <= gap_end >> SUPERPAGE_1GB_SHIFT);
+
+    pfn = gap_start >> SUPERPAGE_2MB_SHIFT;
+    do
+    {
+        sr_set_bit(pfn, &ctx->x86.hvm.restore.attempted_2m);
+    } while (++pfn <= gap_end >> SUPERPAGE_2MB_SHIFT);
+
+    pfn = gap_start;
+
+    while ( pfn <= gap_end )
+    {
+        if ( sr_test_and_clear_bit(pfn, &ctx->x86.hvm.restore.allocated_pfns) )
+        {
+            domid = ctx->domid;
+            _pfn = pfn;
+            rc = xc_domain_decrease_reservation_exact(xch, domid, 1, 0, &_pfn);
+            if ( rc )
+            {
+                PERROR("Failed to release pfn %" PRI_xen_pfn, pfn);
+                return -1;
+            }
+            ctx->restore.tot_pages--;
+            freed++;
+        }
+        pfn++;
+    }
+    if ( freed )
+        DPRINTF("freed %u between %" PRI_xen_pfn " %" PRI_xen_pfn "\n",
+                freed, gap_start, gap_end);
+    return 0;
+}
+
+static int x86_hvm_unpopulate_page(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    sr_clear_bit(pfn, &ctx->restore.populated_pfns);
+    return x86_hvm_punch_hole(ctx, pfn, pfn);
+}
+
+static int x86_hvm_populate_page(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    xen_pfn_t gap_start, gap_end;
+    bool has_gap, first_iteration;
+    int rc;
+
+    /*
+     * Check for a gap between the previous populated pfn and this pfn.
+     * In case a gap exists, it is required to punch a hole to release memory,
+     * starting after the previous pfn and before this pfn.
+     *
+     * But: this can be done only during the first iteration, which is the
+     * only place where superpage allocations are attempted. All following
+     * iterations lack the info to properly maintain prev_populated_pfn.
+     */
+    has_gap = ctx->x86.hvm.restore.prev_populated_pfn + 1 < pfn;
+    first_iteration = ctx->x86.hvm.restore.iteration == 0;
+    if ( has_gap && first_iteration )
+    {
+        gap_start = ctx->x86.hvm.restore.prev_populated_pfn + 1;
+        gap_end = pfn - 1;
+
+        rc = x86_hvm_punch_hole(ctx, gap_start, gap_end);
+        if ( rc )
+            goto err;
+    }
+
+    rc = x86_hvm_allocate_pfn(ctx, pfn);
+    if ( rc )
+        goto err;
+    pfn_set_populated(ctx, pfn);
+    ctx->x86.hvm.restore.prev_populated_pfn = pfn;
+
+    rc = 0;
+err:
+    return rc;
+}
+
+/*
+ * Try to allocate superpages.
+ * This works without memory map because the pfns arrive in incremental order.
+ * All pfn numbers and their types are submitted.
+ * Only pfns with data also have their page content transmitted.
+ */
+static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
+                                 const xen_pfn_t *original_pfns,
+                                 const uint32_t *types)
+{
+    xc_interface *xch = ctx->xch;
+    xen_pfn_t pfn, min_pfn, max_pfn;
+    bool has_data, populated;
+    unsigned i = count;
+    int rc = 0;
+
+    min_pfn = count ? original_pfns[0] : 0;
+    max_pfn = count ? original_pfns[count - 1] : 0;
+    DPRINTF("batch of %u pfns between %" PRI_xen_pfn " %" PRI_xen_pfn "\n",
+            count, min_pfn, max_pfn);
+
+    if ( !x86_hvm_expand_sp_bitmaps(ctx, max_pfn) )
+    {
+        ERROR("Unable to allocate memory for pfn bitmaps");
+        return -1;
+    }
+
+    /*
+     * There is no indicator for a new iteration.
+     * Simulate it by checking if a lower pfn is coming in.
+     * In the end it matters only to know if this iteration is the first one.
+     */
+    if ( min_pfn < ctx->x86.hvm.restore.iteration_tracker_pfn )
+        ctx->x86.hvm.restore.iteration++;
+    ctx->x86.hvm.restore.iteration_tracker_pfn = min_pfn;
+
+    for ( i = 0; i < count; ++i )
+    {
+        pfn = original_pfns[i];
+
+        has_data = page_type_has_stream_data(types[i]);
+        populated = pfn_is_populated(ctx, pfn);
+
+        /*
+         * page has data, pfn populated: nothing to do
+         * page has data, pfn not populated: likely never seen before
+         * page has no data, pfn populated: likely ballooned out during migration
+         * page has no data, pfn not populated: nothing to do
+         */
+        if ( has_data && !populated )
+        {
+            rc = x86_hvm_populate_page(ctx, pfn);
+        } else if ( !has_data && populated )
+        {
+            rc = x86_hvm_unpopulate_page(ctx, pfn);
+        }
+        if ( rc )
+            break;
+    }
+
+    return rc;
+}
+
+
 struct xc_sr_restore_ops restore_ops_x86_hvm =
 {
     .pfn_is_valid    = x86_hvm_pfn_is_valid,
@@ -266,6 +627,7 @@ struct xc_sr_restore_ops restore_ops_x86_hvm =
     .set_page_type   = x86_hvm_set_page_type,
     .localise_page   = x86_hvm_localise_page,
     .setup           = x86_hvm_setup,
+    .populate_pfns   = x86_hvm_populate_pfns,
     .process_record  = x86_hvm_process_record,
     .static_data_complete = x86_static_data_complete,
     .stream_complete = x86_hvm_stream_complete,
diff --git a/tools/libs/saverestore/restore_x86_pv.c b/tools/libs/saverestore/restore_x86_pv.c
index c73a3cd99f..244f1da218 100644
--- a/tools/libs/saverestore/restore_x86_pv.c
+++ b/tools/libs/saverestore/restore_x86_pv.c
@@ -959,6 +959,64 @@ static void x86_pv_set_gfn(struct xc_sr_context *ctx, xen_pfn_t pfn,
         ((uint32_t *)ctx->x86.pv.p2m)[pfn] = mfn;
 }
 
+/*
+ * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+ * unpopulated subset.  If types is NULL, no page type checking is performed
+ * and all unpopulated pfns are populated.
+ */
+static int x86_pv_populate_pfns(struct xc_sr_context *ctx, unsigned count,
+                                const xen_pfn_t *original_pfns,
+                                const uint32_t *types)
+{
+    xc_interface *xch = ctx->xch;
+    xen_pfn_t *mfns = ctx->restore.m->pp_mfns,
+        *pfns = ctx->restore.m->pp_pfns;
+    unsigned int i, nr_pfns = 0;
+    int rc = -1;
+
+    for ( i = 0; i < count; ++i )
+    {
+        if ( (!types ||
+              (types && page_type_has_stream_data(types[i]) == true)) &&
+             !pfn_is_populated(ctx, original_pfns[i]) )
+        {
+            rc = pfn_set_populated(ctx, original_pfns[i]);
+            if ( rc )
+                goto err;
+            pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
+            ++nr_pfns;
+        }
+    }
+
+    if ( nr_pfns )
+    {
+        rc = xc_domain_populate_physmap_exact(
+            xch, ctx->domid, nr_pfns, 0, 0, mfns);
+        if ( rc )
+        {
+            PERROR("Failed to populate physmap");
+            goto err;
+        }
+
+        for ( i = 0; i < nr_pfns; ++i )
+        {
+            if ( mfns[i] == INVALID_MFN )
+            {
+                ERROR("Populate physmap failed for pfn %" PRI_xen_pfn, pfns[i]);
+                rc = -1;
+                goto err;
+            }
+
+            ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
+        }
+    }
+
+    rc = 0;
+
+ err:
+    return rc;
+}
+
 /*
  * restore_ops function.  Convert pfns back to mfns in pagetables.  Possibly
  * needs to populate new frames if a PTE is found referring to a frame which
@@ -1003,7 +1061,7 @@ static int x86_pv_localise_page(struct xc_sr_context *ctx,
         }
     }
 
-    if ( to_populate && populate_pfns(ctx, to_populate, pfns, NULL) )
+    if ( to_populate && x86_pv_populate_pfns(ctx, to_populate, pfns, NULL) )
         return -1;
 
     for ( i = 0; i < (PAGE_SIZE / sizeof(uint64_t)); ++i )
@@ -1200,6 +1258,7 @@ struct xc_sr_restore_ops restore_ops_x86_pv =
     .set_gfn         = x86_pv_set_gfn,
     .localise_page   = x86_pv_localise_page,
     .setup           = x86_pv_setup,
+    .populate_pfns   = x86_pv_populate_pfns,
     .process_record  = x86_pv_process_record,
     .static_data_complete = x86_static_data_complete,
     .stream_complete = x86_pv_stream_complete,
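
The per-pfn decisions made by the HVM restore path in the patch above can be sketched in isolation. This is a minimal model, not the library code: the names decide_action, track_iteration, must_punch_hole, enum pfn_action and struct iteration_tracker are hypothetical and exist only for this illustration. It mirrors the four-case comment in x86_hvm_populate_pfns, the "lower pfn means new iteration" heuristic, and the gap check in x86_hvm_populate_page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in libxensaverestore. */
enum pfn_action { PFN_NOTHING, PFN_POPULATE, PFN_UNPOPULATE };

/* Four-way decision taken for every pfn of an incoming batch. */
static enum pfn_action decide_action(bool has_data, bool populated)
{
    if ( has_data && !populated )
        return PFN_POPULATE;      /* likely never seen before */
    if ( !has_data && populated )
        return PFN_UNPOPULATE;    /* likely ballooned out during migration */
    return PFN_NOTHING;           /* both remaining cases need no work */
}

struct iteration_tracker {
    uint64_t last_min_pfn;        /* first pfn of the previous batch */
    unsigned int iteration;       /* 0 while the first pass is running */
};

/*
 * The stream carries no iteration marker. A batch that starts below the
 * previous batch is taken as the start of a new iteration.
 */
static unsigned int track_iteration(struct iteration_tracker *t,
                                    uint64_t min_pfn)
{
    if ( min_pfn < t->last_min_pfn )
        t->iteration++;
    t->last_min_pfn = min_pfn;
    return t->iteration;
}

/* A hole is punched only in the first iteration, and only if pfns were skipped. */
static bool must_punch_hole(const struct iteration_tracker *t,
                            uint64_t prev_populated_pfn, uint64_t pfn)
{
    return t->iteration == 0 && prev_populated_pfn + 1 < pfn;
}
```

With a tracker initialised to zero, batches starting at pfn 0 and then 1024 stay in iteration 0, so a jump from pfn 10 to pfn 12 would request a hole for pfn 11. A later batch starting at pfn 16 moves the tracker to iteration 1, after which no holes are requested.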



* Re: [PATCH v20210616 00/36] leftover from 2020
  2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
                   ` (35 preceding siblings ...)
  2021-06-16 12:51 ` [PATCH v20210616 36/36] tools: use superpages during restore of HVM guest Olaf Hering
@ 2021-06-16 14:50 ` Andrew Cooper
  2021-06-16 15:02   ` Olaf Hering
  2021-06-16 15:38   ` Olaf Hering
  36 siblings, 2 replies; 46+ messages in thread
From: Andrew Cooper @ 2021-06-16 14:50 UTC (permalink / raw)
  To: Olaf Hering, xen-devel

On 16/06/2021 13:50, Olaf Hering wrote:
> Various unreviewed changes, rebase to 4bcf6433ee.

General CI run:
https://gitlab.com/xen-project/patchew/xen/-/pipelines/322032419

Some specific failures.
https://gitlab.com/xen-project/patchew/xen/-/jobs/1351977567 (32bit
toolstack build):

common.c: In function '_sr_bitmap_expand':
common.c:187:9: error: right shift count >= width of type [-Werror]
         new_max |= new_max >> 32;
         ^


https://gitlab.com/xen-project/patchew/xen/-/jobs/1351977708 (arm64)

nomigrate.c:25:20: error: unknown type name 'xc_stream_type_t'
   25 |                    xc_stream_type_t stream_type, int recv_fd)
      |                    ^~~~~~~~~~~~~~~~


I haven't looked through all the failures in the general run, but be
aware that there might still be some clang fallout in dom0_build.c in
Xen, and PV32 fallout for the smoke tests, which won't be from your series.

~Andrew


* Re: [PATCH v20210616 00/36] leftover from 2020
  2021-06-16 14:50 ` [PATCH v20210616 00/36] leftover from 2020 Andrew Cooper
@ 2021-06-16 15:02   ` Olaf Hering
  2021-06-17 11:02     ` Julien Grall
  2021-06-16 15:38   ` Olaf Hering
  1 sibling, 1 reply; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 15:02 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Wed, 16 Jun 2021 15:50:24 +0100
Andrew Cooper <andrew.cooper3@citrix.com> wrote:

>          new_max |= new_max >> 32;

Lazy compiler? I had hoped the guard around it is a compile-time constant that evaluates to zero in 32-bit builds:

    if ( sizeof(unsigned long) > 4 )

I guess an #ifdef is needed, as is done in the old code.

Olaf


* Re: [PATCH v20210616 04/36] tools: create libxensaverestore
  2021-06-16 12:50 ` [PATCH v20210616 04/36] tools: create libxensaverestore Olaf Hering
@ 2021-06-16 15:17   ` Olaf Hering
  0 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 15:17 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Julien Grall, Stefano Stabellini, Juergen Gross, Anthony PERARD

On Wed, 16 Jun 2021 14:50:57 +0200
Olaf Hering <olaf@aepfle.de> wrote:

> Move all save/restore related code from libxenguest.so into a separate
> library libxensaverestore.so.

This additional change is required to cover non-x86.

--- a/tools/libs/saverestore/nomigrate.c
+++ b/tools/libs/saverestore/nomigrate.c
@@ -17,8 +17,7 @@
 
 #include <inttypes.h>
 #include <errno.h>
-#include <xenctrl.h>
-#include <xenguest.h>
+#include <xensaverestore.h>
 
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags,
                    struct save_callbacks *callbacks,

Olaf


* Re: [PATCH v20210616 00/36] leftover from 2020
  2021-06-16 14:50 ` [PATCH v20210616 00/36] leftover from 2020 Andrew Cooper
  2021-06-16 15:02   ` Olaf Hering
@ 2021-06-16 15:38   ` Olaf Hering
  2021-06-17 11:24     ` Andrew Cooper
  1 sibling, 1 reply; 46+ messages in thread
From: Olaf Hering @ 2021-06-16 15:38 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Wed, 16 Jun 2021 15:50:24 +0100
Andrew Cooper <andrew.cooper3@citrix.com> wrote:

> 32bit toolstack build

as in i386?
How is this used in practice?
I guess such a build should be marked as CONFIG_MIGRATE=n in config/x86_32.mk?

Olaf


* Re: [PATCH v20210616 00/36] leftover from 2020
  2021-06-16 15:02   ` Olaf Hering
@ 2021-06-17 11:02     ` Julien Grall
  2021-06-17 17:40       ` Olaf Hering
  0 siblings, 1 reply; 46+ messages in thread
From: Julien Grall @ 2021-06-17 11:02 UTC (permalink / raw)
  To: Olaf Hering, Andrew Cooper; +Cc: xen-devel

Hi Olaf,

On 16/06/2021 17:02, Olaf Hering wrote:
> On Wed, 16 Jun 2021 15:50:24 +0100
> Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> 
>>           new_max |= new_max >> 32;
> 
> Lazy compiler? I hoped this is a compile-time constant, which evaluates to zero in 32bit builds.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210, this seems to be 
a known issue on GCC.

> 
>      if ( sizeof(unsigned long) > 4 )
> 
> I guess a #ifdef as it is done in old code must be done.
> 
> Olaf
> 

Cheers,

-- 
Julien Grall



* Re: [PATCH v20210616 00/36] leftover from 2020
  2021-06-16 15:38   ` Olaf Hering
@ 2021-06-17 11:24     ` Andrew Cooper
  2021-06-17 14:55       ` Olaf Hering
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Cooper @ 2021-06-17 11:24 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 16/06/2021 16:38, Olaf Hering wrote:
> On Wed, 16 Jun 2021 15:50:24 +0100
> Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
>> 32bit toolstack build
> as in i386?
> How is this used in practice?

Every OSSTest run.  Also, arm32 is absolutely a thing (the only reason
ARM can't migrate right now is because there is no logdirty support in
Xen yet).

> I guess such build should be marked as CONFIG_MIGRATE=n in config/x86_32.mk?

Migration (v2) very definitely works for i386 toolstacks. Part of the
testing process during development was migrating a VM between 32bit
and 64bit dom0s, specifically to check that we'd got rid of all of the
bitness problems in the stream format.

~Andrew




* Re: [PATCH v20210616 00/36] leftover from 2020
  2021-06-17 11:24     ` Andrew Cooper
@ 2021-06-17 14:55       ` Olaf Hering
  2021-06-17 15:05         ` Andrew Cooper
  0 siblings, 1 reply; 46+ messages in thread
From: Olaf Hering @ 2021-06-17 14:55 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Thu, 17 Jun 2021 12:24:22 +0100
Andrew Cooper <andrew.cooper3@citrix.com> wrote:

> On 16/06/2021 16:38, Olaf Hering wrote:
> > On Wed, 16 Jun 2021 15:50:24 +0100
> > Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >  
> >> 32bit toolstack build  
> > as in i386?
> > How is this used in practice?  
> Every OSSTest run.

This is not what I mean.
I think there is a 32bit xen-tools, a 32bit dom0 kernel and a 64bit Xen?
Is 32bit xen-tools, 64bit dom0 kernel and 64bit Xen expected to work?

Olaf


* Re: [PATCH v20210616 00/36] leftover from 2020
  2021-06-17 14:55       ` Olaf Hering
@ 2021-06-17 15:05         ` Andrew Cooper
  0 siblings, 0 replies; 46+ messages in thread
From: Andrew Cooper @ 2021-06-17 15:05 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 17/06/2021 15:55, Olaf Hering wrote:
> On Thu, 17 Jun 2021 12:24:22 +0100
> Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
>> On 16/06/2021 16:38, Olaf Hering wrote:
>>> On Wed, 16 Jun 2021 15:50:24 +0100
>>> Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>  
>>>> 32bit toolstack build  
>>> as in i386?
>>> How is this used in practice?  
>> Every OSSTest run.
> This is not what I mean.
> I think there is a 32bit xen-tools, a 32bit dom0 kernel and a 64bit Xen?

Yes - this exists.

> Is 32bit xen-tools, 64bit dom0 kernel and 64bit Xen expected to work?

In an ideal world, yes.  In reality, no.

Lots of hypercalls have embedded pointers (every GUEST_HANDLE(), to a
first approximation), and dom0's ABI with Xen is 64bit, which is not the
ABI that 32bit userspace speaks.

This is one of many errors in the hypercall design that the ABIv2 plans
intend to address.

~Andrew
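
The layout problem described above can be illustrated with a small sketch. It does not reproduce any real Xen structure or the GUEST_HANDLE() definition: struct op_with_pointer, struct op_fixed_layout and pack_address are hypothetical names chosen for this example. The first structure changes size with the pointer width of whoever compiles it, while the second carries the address as a fixed 64-bit field and so has the same layout for 32-bit and 64-bit callers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical argument block with an embedded raw pointer.
 * Its size and field offsets follow the pointer width of the caller.
 */
struct op_with_pointer {
    uint32_t nr;
    void *buf;        /* 4 bytes in 32-bit userspace, 8 bytes in a 64-bit ABI */
};

/*
 * Hypothetical fixed-layout alternative: explicit padding and a 64-bit
 * address field give the same 16-byte layout for every caller.
 */
struct op_fixed_layout {
    uint32_t nr;
    uint32_t pad;
    uint64_t buf;
};

/* Widen a userspace pointer into the fixed 64-bit representation. */
static uint64_t pack_address(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}
```

On a typical 64-bit build both structures occupy 16 bytes, whereas a typical 32-bit build shrinks only the pointer-based one to 8 bytes. That is the kind of mismatch a 32-bit toolstack would run into when the receiving side expects the 64-bit layout.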




* Re: [PATCH v20210616 00/36] leftover from 2020
  2021-06-17 11:02     ` Julien Grall
@ 2021-06-17 17:40       ` Olaf Hering
  0 siblings, 0 replies; 46+ messages in thread
From: Olaf Hering @ 2021-06-17 17:40 UTC (permalink / raw)
  To: Julien Grall; +Cc: Andrew Cooper, xen-devel

On Thu, 17 Jun 2021 13:02:39 +0200
Julien Grall <julien@xen.org> wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210

Thanks, comments in this bug suggest a workaround like this:

    new_max |= sizeof(unsigned long) > 4 ? new_max >> 32 : 0;

This triggers no warning.

Olaf
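
For comparison, the two workarounds mentioned in this thread can be put side by side. This is only a sketch: the helper names fill_low_bits_ifdef and fill_low_bits_ternary are hypothetical, the surrounding shift sequence is an assumption about what such a helper might look like rather than a copy of common.c, and the ULONG_MAX test is one possible spelling of the preprocessor guard.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper, variant 1: set every bit below the highest set bit.
 * The 32-bit-wide shift is removed by the preprocessor on 32-bit builds.
 */
static unsigned long fill_low_bits_ifdef(unsigned long v)
{
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
#if ULONG_MAX > 0xffffffffUL
    v |= v >> 32;
#endif
    return v;
}

/*
 * Variant 2: the same computation with the conditional expression quoted
 * above. The condition is a compile-time constant, so on a 32-bit build
 * the shift sits in the operand that is never evaluated.
 */
static unsigned long fill_low_bits_ternary(unsigned long v)
{
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v |= sizeof(unsigned long) > 4 ? v >> 32 : 0;
    return v;
}
```

Both variants return the same value for every input. The conditional form keeps the code free of preprocessor blocks, while the #if form does not rely on how a particular compiler treats warnings in unevaluated operands.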


Thread overview: 46+ messages
2021-06-16 12:50 [PATCH v20210616 00/36] leftover from 2020 Olaf Hering
2021-06-16 12:50 ` [PATCH v20210616 01/36] hotplug/Linux: fix starting of xenstored with restarting systemd Olaf Hering
2021-06-16 12:50 ` [PATCH v20210616 02/36] tools: add API to work with sevaral bits at once Olaf Hering
2021-06-16 12:50 ` [PATCH v20210616 03/36] xl: fix description of migrate --debug Olaf Hering
2021-06-16 12:50 ` [PATCH v20210616 04/36] tools: create libxensaverestore Olaf Hering
2021-06-16 15:17   ` Olaf Hering
2021-06-16 12:50 ` [PATCH v20210616 05/36] MAINTAINERS: add myself as saverestore maintainer Olaf Hering
2021-06-16 12:50 ` [PATCH v20210616 06/36] tools: add readv_exact to libxenctrl Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 07/36] tools: add sr_is_known_page_type to libsaverestore Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 08/36] tools: use sr_is_known_page_type Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 09/36] tools: unify type checking for data pfns in migration stream Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 10/36] tools: show migration transfer rate in send_dirty_pages Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 11/36] tools: prepare to allocate saverestore arrays once Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 12/36] tools: save: move mfns array Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 13/36] tools: save: move types array Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 14/36] tools: save: move errors array Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 15/36] tools: save: move iov array Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 16/36] tools: save: move rec_pfns array Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 17/36] tools: save: move guest_data array Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 18/36] tools: save: move local_pages array Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 19/36] tools: restore: move types array Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 20/36] tools: restore: move mfns array Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 21/36] tools: restore: move map_errs array Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 22/36] tools: restore: move mfns array in populate_pfns Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 23/36] tools: restore: move pfns " Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 24/36] tools: restore: split record processing Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 25/36] tools: restore: split handle_page_data Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 26/36] tools: restore: write data directly into guest Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 27/36] tools: recognize LIBXL_API_VERSION for 4.16 Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 28/36] tools: adjust libxl_domain_suspend to receive a struct props Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 29/36] tools: change struct precopy_stats to precopy_stats_t Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 30/36] tools: add callback to libxl for precopy_policy and precopy_stats_t Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 31/36] tools: add --max_iters to libxl_domain_suspend Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 32/36] tools: add --min_remaining " Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 33/36] tools: add --abort_if_busy " Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 34/36] tools: add API for expandable bitmaps Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 35/36] tools: use sr_bitmap for populated_pfns Olaf Hering
2021-06-16 12:51 ` [PATCH v20210616 36/36] tools: use superpages during restore of HVM guest Olaf Hering
2021-06-16 14:50 ` [PATCH v20210616 00/36] leftover from 2020 Andrew Cooper
2021-06-16 15:02   ` Olaf Hering
2021-06-17 11:02     ` Julien Grall
2021-06-17 17:40       ` Olaf Hering
2021-06-16 15:38   ` Olaf Hering
2021-06-17 11:24     ` Andrew Cooper
2021-06-17 14:55       ` Olaf Hering
2021-06-17 15:05         ` Andrew Cooper
