All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] domain context infrastructure
@ 2020-04-07 17:38 Paul Durrant
  2020-04-07 17:38 ` [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context Paul Durrant
                   ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Paul Durrant @ 2020-04-07 17:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Paul Durrant, Ian Jackson, George Dunlap, Jan Beulich,
	Daniel De Graaf, Volodymyr Babchuk, Roger Pau Monné

From: Paul Durrant <pdurrant@amazon.com>

Paul Durrant (5):
  xen/common: introduce a new framework for save/restore of 'domain'
    context
  xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext
  tools/misc: add xen-domctx to present domain context
  common/domain: add a domain context record for shared_info...
  tools/libxc: make use of domain context SHARED_INFO record...

 .gitignore                             |   1 +
 tools/flask/policy/modules/xen.if      |   4 +-
 tools/libxc/include/xenctrl.h          |   5 +
 tools/libxc/xc_domain.c                |  54 ++++
 tools/libxc/xc_sr_common.h             |   7 +-
 tools/libxc/xc_sr_common_x86.c         |  59 +++++
 tools/libxc/xc_sr_common_x86.h         |   4 +
 tools/libxc/xc_sr_common_x86_pv.c      |  53 ++++
 tools/libxc/xc_sr_common_x86_pv.h      |   3 +
 tools/libxc/xc_sr_restore_x86_pv.c     |  40 ++-
 tools/libxc/xc_sr_save_x86_pv.c        |  26 +-
 tools/libxc/xg_save_restore.h          |   1 +
 tools/misc/Makefile                    |   4 +
 tools/misc/xen-domctx.c                | 156 ++++++++++++
 xen/common/Makefile                    |   1 +
 xen/common/domain.c                    |  81 ++++++
 xen/common/domctl.c                    | 117 +++++++++
 xen/common/save.c                      | 329 +++++++++++++++++++++++++
 xen/include/public/arch-arm/hvm/save.h |   5 +
 xen/include/public/arch-x86/hvm/save.h |   5 +
 xen/include/public/domctl.h            |  44 +++-
 xen/include/public/save.h              |  92 +++++++
 xen/include/xen/save.h                 | 152 ++++++++++++
 xen/xsm/flask/hooks.c                  |   6 +
 xen/xsm/flask/policy/access_vectors    |   4 +
 25 files changed, 1201 insertions(+), 52 deletions(-)
 create mode 100644 tools/misc/xen-domctx.c
 create mode 100644 xen/common/save.c
 create mode 100644 xen/include/public/save.h
 create mode 100644 xen/include/xen/save.h
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Cc: Wei Liu <wl@xen.org>
-- 
2.20.1



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
  2020-04-07 17:38 [PATCH v2 0/5] domain context infrastructure Paul Durrant
@ 2020-04-07 17:38 ` Paul Durrant
  2020-04-20 17:20   ` Julien Grall
  2020-04-29 11:02   ` Jan Beulich
  2020-04-07 17:38 ` [PATCH v2 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext Paul Durrant
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 30+ messages in thread
From: Paul Durrant @ 2020-04-07 17:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Paul Durrant,
	Andrew Cooper, Paul Durrant, Ian Jackson, George Dunlap,
	Jan Beulich, Volodymyr Babchuk, Roger Pau Monné

To allow enlightened HVM guests (i.e. those that have PV drivers) to be
migrated without their co-operation it will be necessary to transfer 'PV'
state such as event channel state, grant entry state, etc.

Currently there is a framework (entered via the hvm_save/load() functions)
that allows a domain's 'HVM' (architectural) state to be transferred but
'PV' state is also common with pure PV guests and so this framework is not
really suitable.

This patch adds the new public header and low level implementation of a new
common framework, entered via the domain_save/load() functions. Subsequent
patches will introduce other parts of the framework, and code that will
make use of it within the current version of the libxc migration stream.

This patch also marks the HVM-only framework as deprecated in favour of the
new framework.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wl@xen.org>
Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>

v2:
 - Allow multi-stage save/load to avoid the need to double-buffer
 - Get rid of the masks and add an 'ignore' flag instead
 - Create copy function union to preserve const save buffer
 - Deprecate HVM-only framework
---
 xen/common/Makefile                    |   1 +
 xen/common/save.c                      | 329 +++++++++++++++++++++++++
 xen/include/public/arch-arm/hvm/save.h |   5 +
 xen/include/public/arch-x86/hvm/save.h |   5 +
 xen/include/public/save.h              |  84 +++++++
 xen/include/xen/save.h                 | 152 ++++++++++++
 6 files changed, 576 insertions(+)
 create mode 100644 xen/common/save.c
 create mode 100644 xen/include/public/save.h
 create mode 100644 xen/include/xen/save.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index e8cde65370..90553ba5d7 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -37,6 +37,7 @@ obj-y += radix-tree.o
 obj-y += rbtree.o
 obj-y += rcupdate.o
 obj-y += rwlock.o
+obj-y += save.o
 obj-y += shutdown.o
 obj-y += softirq.o
 obj-y += sort.o
diff --git a/xen/common/save.c b/xen/common/save.c
new file mode 100644
index 0000000000..6cdac3785b
--- /dev/null
+++ b/xen/common/save.c
@@ -0,0 +1,329 @@
+/*
+ * save.c: Save and restore PV guest state common to all domain types.
+ *
+ * Copyright Amazon.com Inc. or its affiliates.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/save.h>
+
+union domain_copy_entry {
+    domain_write_entry write;
+    domain_read_entry read;
+};
+
+struct domain_context {
+    bool log;
+    struct domain_save_descriptor desc;
+    size_t data_len;
+    union domain_copy_entry copy;
+    void *priv;
+};
+
+static struct {
+    const char *name;
+    bool per_vcpu;
+    domain_save_handler save;
+    domain_load_handler load;
+} handlers[DOMAIN_SAVE_CODE_MAX + 1];
+
+void __init domain_register_save_type(unsigned int tc, const char *name,
+                                      bool per_vcpu,
+                                      domain_save_handler save,
+                                      domain_load_handler load)
+{
+    BUG_ON(tc > ARRAY_SIZE(handlers));
+
+    ASSERT(!handlers[tc].save);
+    ASSERT(!handlers[tc].load);
+
+    handlers[tc].name = name;
+    handlers[tc].per_vcpu = per_vcpu;
+    handlers[tc].save = save;
+    handlers[tc].load = load;
+}
+
+int domain_save_begin(struct domain_context *c, unsigned int tc,
+                      const char *name, const struct vcpu *v, size_t len)
+{
+    int rc;
+
+    if ( c->log )
+        gdprintk(XENLOG_INFO, "%pv save: %s (%lu)\n", v, name,
+                 (unsigned long)len);
+
+    BUG_ON(tc != c->desc.typecode);
+    BUG_ON(v->vcpu_id != c->desc.vcpu_id);
+
+    ASSERT(!c->data_len);
+    c->data_len = c->desc.length = len;
+
+    rc = c->copy.write(c->priv, &c->desc, sizeof(c->desc));
+    if ( rc )
+        return rc;
+
+    c->desc.length = 0;
+
+    return 0;
+}
+
+int domain_save_data(struct domain_context *c, const void *src, size_t len)
+{
+    if ( c->desc.length + len > c->data_len )
+        return -ENOSPC;
+
+    c->desc.length += len;
+
+    return c->copy.write(c->priv, src, len);
+}
+
+int domain_save_end(struct domain_context *c)
+{
+    /*
+     * If desc.length does not match the length specified in
+     * domain_save_begin(), there should have been more data.
+     */
+    if ( c->desc.length != c->data_len )
+        return -EIO;
+
+    c->data_len = 0;
+
+    return 0;
+}
+
+int domain_save(struct domain *d, domain_write_entry write, void *priv,
+                bool dry_run)
+{
+    struct domain_context c = {
+        .copy.write = write,
+        .priv = priv,
+        .log = !dry_run,
+    };
+    struct domain_save_header h = {
+        .magic = DOMAIN_SAVE_MAGIC,
+        .version = DOMAIN_SAVE_VERSION,
+    };
+    struct domain_save_header e;
+    unsigned int i;
+    int rc;
+
+    ASSERT(d != current->domain);
+
+    if ( d->is_dying )
+        return -EINVAL;
+
+    domain_pause(d);
+
+    c.desc.typecode = DOMAIN_SAVE_CODE(HEADER);
+
+    rc = DOMAIN_SAVE_ENTRY(HEADER, &c, d->vcpu[0], &h, sizeof(h));
+    if ( rc )
+        goto out;
+
+    for ( i = 0; i < ARRAY_SIZE(handlers); i++ )
+    {
+        domain_save_handler save = handlers[i].save;
+
+        if ( !save )
+            continue;
+
+        memset(&c.desc, 0, sizeof(c.desc));
+        c.desc.typecode = i;
+
+        if ( handlers[i].per_vcpu )
+        {
+            struct vcpu *v;
+
+            for_each_vcpu ( d, v )
+            {
+                c.desc.vcpu_id = v->vcpu_id;
+
+                rc = save(v, &c, dry_run);
+                if ( rc )
+                    goto out;
+            }
+        }
+        else
+        {
+            rc = save(d->vcpu[0], &c, dry_run);
+            if ( rc )
+                goto out;
+        }
+    }
+
+    memset(&c.desc, 0, sizeof(c.desc));
+    c.desc.typecode = DOMAIN_SAVE_CODE(END);
+
+    rc = DOMAIN_SAVE_ENTRY(END, &c, d->vcpu[0], &e, 0);
+
+ out:
+    domain_unpause(d);
+
+    return rc;
+}
+
+int domain_load_begin(struct domain_context *c, unsigned int tc,
+                      const char *name, const struct vcpu *v, size_t len,
+                      bool exact)
+{
+    if ( c->log )
+        gdprintk(XENLOG_INFO, "%pv load: %s (%lu)\n", v, name,
+                 (unsigned long)len);
+
+    BUG_ON(tc != c->desc.typecode);
+    BUG_ON(v->vcpu_id != c->desc.vcpu_id);
+
+    if ( (exact && (len != c->desc.length)) ||
+         (len < c->desc.length) )
+        return -EINVAL;
+
+    ASSERT(!c->data_len);
+    c->data_len = len;
+
+    return 0;
+}
+
+int domain_load_data(struct domain_context *c, void *dst, size_t len)
+{
+    size_t copy_len = min_t(size_t, len, c->desc.length);
+    int rc;
+
+    if ( c->data_len < len )
+        return -ENODATA;
+
+    c->data_len -= len;
+    c->desc.length -= copy_len;
+
+    rc = c->copy.read(c->priv, dst, copy_len);
+    if ( rc )
+        return rc;
+
+    /* Zero extend if the descriptor is exhausted */
+    len -= copy_len;
+    if ( len )
+    {
+        dst += copy_len;
+        memset(dst, 0, len);
+    }
+
+    return 0;
+}
+
+int domain_load_end(struct domain_context *c)
+{
+    /* If data_len is non-zero there is unread data */
+    if ( c->data_len )
+        return -EIO;
+
+    return 0;
+}
+
+int domain_load(struct domain *d, domain_read_entry read, void *priv)
+{
+    struct domain_context c = {
+        .copy.read = read,
+        .priv = priv,
+        .log = true,
+    };
+    struct domain_save_header h;
+    int rc;
+
+    ASSERT(d != current->domain);
+
+    if ( d->is_dying )
+        return -EINVAL;
+
+    rc = c.copy.read(c.priv, &c.desc, sizeof(c.desc));
+    if ( rc )
+        return rc;
+
+    if ( c.desc.typecode != DOMAIN_SAVE_CODE(HEADER) || c.desc.vcpu_id ||
+         c.desc.flags )
+        return -EINVAL;
+
+    rc = DOMAIN_LOAD_ENTRY(HEADER, &c, d->vcpu[0], &h, sizeof(h), true);
+    if ( rc )
+        return rc;
+
+    if ( h.magic != DOMAIN_SAVE_MAGIC || h.version != DOMAIN_SAVE_VERSION )
+        return -EINVAL;
+
+    domain_pause(d);
+
+    for (;;)
+    {
+        unsigned int i;
+        unsigned int flags;
+        domain_load_handler load;
+        struct vcpu *v;
+
+        rc = c.copy.read(c.priv, &c.desc, sizeof(c.desc));
+        if ( rc )
+            break;
+
+        rc = -EINVAL;
+
+        flags = c.desc.flags;
+        if ( flags & ~DOMAIN_SAVE_FLAG_IGNORE )
+            break;
+
+        if ( c.desc.typecode == DOMAIN_SAVE_CODE(END) ) {
+            if ( !(flags & DOMAIN_SAVE_FLAG_IGNORE) )
+                rc = DOMAIN_LOAD_ENTRY(END, &c, d->vcpu[0], NULL, 0, true);
+
+            break;
+        }
+
+        i = c.desc.typecode;
+        if ( i >= ARRAY_SIZE(handlers) )
+            break;
+
+        if ( (!handlers[i].per_vcpu && c.desc.vcpu_id) ||
+             (c.desc.vcpu_id >= d->max_vcpus) )
+            break;
+
+        v = d->vcpu[c.desc.vcpu_id];
+
+        if ( flags & DOMAIN_SAVE_FLAG_IGNORE )
+        {
+            /* Sink the data */
+            rc = domain_load_entry(&c, c.desc.typecode, "IGNORED",
+                                   v, NULL, c.desc.length, true);
+            if ( rc )
+                break;
+
+            continue;
+        }
+
+        load = handlers[i].load;
+
+        rc = load ? load(v, &c) : -EOPNOTSUPP;
+        if ( rc )
+            break;
+    }
+
+    domain_unpause(d);
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h
index 75b8e65bcb..d5b0c15203 100644
--- a/xen/include/public/arch-arm/hvm/save.h
+++ b/xen/include/public/arch-arm/hvm/save.h
@@ -26,6 +26,11 @@
 #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__
 #define __XEN_PUBLIC_HVM_SAVE_ARM_H__
 
+/*
+ * Further use of HVM state is deprecated. New state records should only
+ * be added to the domain state header: public/save.h
+ */
+
 #endif
 
 /*
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 773a380bc2..e61e2dbcd7 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -648,6 +648,11 @@ struct hvm_msr {
  */
 #define HVM_SAVE_CODE_MAX 20
 
+/*
+ * Further use of HVM state is deprecated. New state records should only
+ * be added to the domain state header: public/save.h
+ */
+
 #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
 
 /*
diff --git a/xen/include/public/save.h b/xen/include/public/save.h
new file mode 100644
index 0000000000..7e5f8752bd
--- /dev/null
+++ b/xen/include/public/save.h
@@ -0,0 +1,84 @@
+/*
+ * save.h
+ *
+ * Structure definitions for common PV/HVM domain state that is held by
+ * Xen and must be saved along with the domain's memory.
+ *
+ * Copyright Amazon.com Inc. or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __XEN_PUBLIC_SAVE_H__
+#define __XEN_PUBLIC_SAVE_H__
+
+#include "xen.h"
+
+/* Each entry is preceded by a descriptor */
+struct domain_save_descriptor {
+    uint16_t typecode;
+    /*
+     * Each entry will contain either to global or per-vcpu domain state.
+     * Entries relating to global state should have zero in this field.
+     */
+    uint16_t vcpu_id;
+    uint32_t flags;
+    /*
+     * When restoring state this flag can be set in a descriptor to cause
+     * its content to be ignored.
+     *
+     * NOTE: It is invalid to set this flag for HEADER or END records (see
+     *       below).
+     */
+#define _DOMAIN_SAVE_FLAG_IGNORE 0
+#define DOMAIN_SAVE_FLAG_IGNORE (1u << _DOMAIN_SAVE_FLAG_IGNORE)
+
+    /* Entry length not including this descriptor */
+    uint64_t length;
+};
+
+/*
+ * Each entry has a type associated with it. DECLARE_DOMAIN_SAVE_TYPE
+ * binds these things together.
+ */
+#define DECLARE_DOMAIN_SAVE_TYPE(_x, _code, _type) \
+    struct __DOMAIN_SAVE_TYPE_##_x { char c[_code]; _type t; };
+
+#define DOMAIN_SAVE_CODE(_x) \
+    (sizeof(((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->c))
+#define DOMAIN_SAVE_TYPE(_x) \
+    typeof(((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->t)
+
+/* Terminating entry */
+struct domain_save_end {};
+DECLARE_DOMAIN_SAVE_TYPE(END, 0, struct domain_save_end);
+
+#define DOMAIN_SAVE_MAGIC   0x53415645
+#define DOMAIN_SAVE_VERSION 0x00000001
+
+/* Initial entry */
+struct domain_save_header {
+    uint32_t magic;             /* Must be DOMAIN_SAVE_MAGIC */
+    uint32_t version;           /* Save format version */
+};
+DECLARE_DOMAIN_SAVE_TYPE(HEADER, 1, struct domain_save_header);
+
+#define DOMAIN_SAVE_CODE_MAX 1
+
+#endif /* __XEN_PUBLIC_SAVE_H__ */
diff --git a/xen/include/xen/save.h b/xen/include/xen/save.h
new file mode 100644
index 0000000000..879bbb4390
--- /dev/null
+++ b/xen/include/xen/save.h
@@ -0,0 +1,152 @@
+/*
+ * save.h: support routines for save/restore
+ *
+ * Copyright Amazon.com Inc. or its affiliates.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __XEN_SAVE_H__
+#define __XEN_SAVE_H__
+
+#include <xen/sched.h>
+#include <xen/types.h>
+#include <xen/init.h>
+
+#include <public/xen.h>
+#include <public/save.h>
+
+struct domain_context;
+
+int domain_save_begin(struct domain_context *c, unsigned int tc,
+                      const char *name, const struct vcpu *v, size_t len);
+
+#define DOMAIN_SAVE_BEGIN(_x, _c, _v, _len) \
+        domain_save_begin((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_len))
+
+int domain_save_data(struct domain_context *c, const void *data, size_t len);
+int domain_save_end(struct domain_context *c);
+
+static inline int domain_save_entry(struct domain_context *c,
+                                    unsigned int tc, const char *name,
+                                    const struct vcpu *v, const void *src,
+                                    size_t len)
+{
+    int rc;
+
+    rc = domain_save_begin(c, tc, name, v, len);
+    if ( rc )
+        return rc;
+
+    rc = domain_save_data(c, src, len);
+    if ( rc )
+        return rc;
+
+    return domain_save_end(c);
+}
+
+#define DOMAIN_SAVE_ENTRY(_x, _c, _v, _src, _len) \
+    domain_save_entry((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_src), (_len))
+
+int domain_load_begin(struct domain_context *c, unsigned int tc,
+                      const char *name, const struct vcpu *v, size_t len,
+                      bool exact);
+
+#define DOMAIN_LOAD_BEGIN(_x, _c, _v, _len, _exact) \
+        domain_load_begin((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_len), \
+                          (_exact));
+
+int domain_load_data(struct domain_context *c, void *data, size_t len);
+int domain_load_end(struct domain_context *c);
+
+static inline int domain_load_entry(struct domain_context *c,
+                                    unsigned int tc, const char *name,
+                                    const struct vcpu *v, void *dst,
+                                    size_t len, bool exact)
+{
+    int rc;
+
+    rc = domain_load_begin(c, tc, name, v, len, exact);
+    if ( rc )
+        return rc;
+
+    rc = domain_load_data(c, dst, len);
+    if ( rc )
+        return rc;
+
+    return domain_load_end(c);
+}
+
+#define DOMAIN_LOAD_ENTRY(_x, _c, _v, _dst, _len, _exact) \
+    domain_load_entry((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_dst), (_len), \
+                          (_exact))
+
+/*
+ * The 'dry_run' flag indicates that the caller of domain_save() (see
+ * below) is not trying to actually acquire the data, only the size
+ * of the data. The save handler can therefore limit work to only that
+ * which is necessary to call DOMAIN_SAVE_BEGIN/ENTRY() with an accurate
+ * value for '_len'.
+ */
+typedef int (*domain_save_handler)(const struct vcpu *v,
+                                   struct domain_context *h,
+                                   bool dry_run);
+typedef int (*domain_load_handler)(struct vcpu *v,
+                                   struct domain_context *h);
+
+void domain_register_save_type(unsigned int tc, const char *name,
+                               bool per_vcpu,
+                               domain_save_handler save,
+                               domain_load_handler load);
+
+/*
+ * Register save and restore handlers. Save handlers will be invoked
+ * in order of DOMAIN_SAVE_CODE().
+ */
+#define DOMAIN_REGISTER_SAVE_RESTORE(_x, _per_vcpu, _save, _load) \
+static int __init __domain_register_##_x##_save_restore(void)     \
+{                                                                 \
+    domain_register_save_type(                                    \
+        DOMAIN_SAVE_CODE(_x),                                     \
+        #_x,                                                      \
+        (_per_vcpu),                                              \
+        &(_save),                                                 \
+        &(_load));                                                \
+                                                                  \
+    return 0;                                                     \
+}                                                                 \
+__initcall(__domain_register_##_x##_save_restore);
+
+/* Copy callback functions */
+typedef int (*domain_write_entry)(void *priv, const void *data, size_t len);
+typedef int (*domain_read_entry)(void *priv, void *data, size_t len);
+
+/*
+ * Entry points:
+ *
+ * int domain_save(struct domain *d, domain_copy_entry copy, void *priv,
+ *                 bool dry_run);
+ * int domain_load(struct domain *d, domain_copy_entry copy, void *priv);
+ *
+ * write/read: This is a callback function provided by the caller that will
+ *             be used to write to (in the save case) or read from (in the
+ *             load case) the context buffer.
+ * priv:       This is a pointer that will be passed to the copy function to
+ *             allow it to identify the context buffer and the current state
+ *             of the save or load operation.
+ */
+int domain_save(struct domain *d, domain_write_entry write, void *priv,
+                bool dry_run);
+int domain_load(struct domain *d, domain_read_entry read, void *priv);
+
+#endif /* __XEN_SAVE_H__ */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext
  2020-04-07 17:38 [PATCH v2 0/5] domain context infrastructure Paul Durrant
  2020-04-07 17:38 ` [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context Paul Durrant
@ 2020-04-07 17:38 ` Paul Durrant
  2020-04-20 17:26   ` Julien Grall
  2020-04-29 14:50   ` Jan Beulich
  2020-04-07 17:38 ` [PATCH v2 3/5] tools/misc: add xen-domctx to present domain context Paul Durrant
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 30+ messages in thread
From: Paul Durrant @ 2020-04-07 17:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Paul Durrant,
	Andrew Cooper, Paul Durrant, Ian Jackson, George Dunlap,
	Jan Beulich, Daniel De Graaf

These domctls provide a mechanism to get and set domain context from
the toolstack.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>

v2:
 - drop mask parameter
 - const-ify some more buffers
---
 tools/flask/policy/modules/xen.if   |   4 +-
 tools/libxc/include/xenctrl.h       |   5 ++
 tools/libxc/xc_domain.c             |  54 +++++++++++++
 xen/common/domctl.c                 | 117 ++++++++++++++++++++++++++++
 xen/include/public/domctl.h         |  44 ++++++++++-
 xen/xsm/flask/hooks.c               |   6 ++
 xen/xsm/flask/policy/access_vectors |   4 +
 7 files changed, 231 insertions(+), 3 deletions(-)

diff --git a/tools/flask/policy/modules/xen.if b/tools/flask/policy/modules/xen.if
index 8eb2293a52..2bc9db4f64 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -53,7 +53,7 @@ define(`create_domain_common', `
 	allow $1 $2:domain2 { set_cpu_policy settsc setscheduler setclaim
 			set_vnumainfo get_vnumainfo cacheflush
 			psr_cmt_op psr_alloc soft_reset
-			resource_map get_cpu_policy };
+			resource_map get_cpu_policy setcontext };
 	allow $1 $2:security check_context;
 	allow $1 $2:shadow enable;
 	allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp };
@@ -97,7 +97,7 @@ define(`migrate_domain_out', `
 	allow $1 $2:hvm { gethvmc getparam };
 	allow $1 $2:mmu { stat pageinfo map_read };
 	allow $1 $2:domain { getaddrsize getvcpucontext pause destroy };
-	allow $1 $2:domain2 gettsc;
+	allow $1 $2:domain2 { gettsc getcontext };
 	allow $1 $2:shadow { enable disable logdirty };
 ')
 
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 58fa931de1..06ca8e9a74 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -867,6 +867,11 @@ int xc_domain_hvm_setcontext(xc_interface *xch,
                              uint8_t *hvm_ctxt,
                              uint32_t size);
 
+int xc_domain_getcontext(xc_interface *xch, uint32_t domid,
+                         void *ctxt_buf, size_t *size);
+int xc_domain_setcontext(xc_interface *xch, uint32_t domid,
+                         const void *ctxt_buf, size_t size);
+
 /**
  * This function will return guest IO ABI protocol
  *
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 71829c2bce..212d1489dd 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -537,6 +537,60 @@ int xc_domain_hvm_setcontext(xc_interface *xch,
     return ret;
 }
 
+int xc_domain_getcontext(xc_interface *xch, uint32_t domid,
+                         void *ctxt_buf, size_t *size)
+{
+    int ret;
+    DECLARE_DOMCTL;
+    DECLARE_HYPERCALL_BOUNCE(ctxt_buf, *size, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( xc_hypercall_bounce_pre(xch, ctxt_buf) )
+        return -1;
+
+    domctl.cmd = XEN_DOMCTL_getdomaincontext;
+    domctl.domain = domid;
+    domctl.u.getdomaincontext.size = *size;
+    set_xen_guest_handle(domctl.u.setdomaincontext.buffer, ctxt_buf);
+
+    ret = do_domctl(xch, &domctl);
+
+    xc_hypercall_bounce_post(xch, ctxt_buf);
+
+    if ( ret )
+        return ret;
+
+    *size = domctl.u.getdomaincontext.size;
+    if ( *size != domctl.u.getdomaincontext.size )
+    {
+        errno = EOVERFLOW;
+        return -1;
+    }
+
+    return 0;
+}
+
+int xc_domain_setcontext(xc_interface *xch, uint32_t domid,
+                         const void *ctxt_buf, size_t size)
+{
+    int ret;
+    DECLARE_DOMCTL;
+    DECLARE_HYPERCALL_BOUNCE_IN(ctxt_buf, size);
+
+    if ( xc_hypercall_bounce_pre(xch, ctxt_buf) )
+        return -1;
+
+    domctl.cmd = XEN_DOMCTL_setdomaincontext;
+    domctl.domain = domid;
+    domctl.u.setdomaincontext.size = size;
+    set_xen_guest_handle(domctl.u.setdomaincontext.buffer, ctxt_buf);
+
+    ret = do_domctl(xch, &domctl);
+
+    xc_hypercall_bounce_post(xch, ctxt_buf);
+
+    return ret;
+}
+
 int xc_vcpu_getcontext(xc_interface *xch,
                        uint32_t domid,
                        uint32_t vcpu,
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index a69b3b59a8..2e5c6a46d9 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -25,6 +25,7 @@
 #include <xen/hypercall.h>
 #include <xen/vm_event.h>
 #include <xen/monitor.h>
+#include <xen/save.h>
 #include <asm/current.h>
 #include <asm/irq.h>
 #include <asm/page.h>
@@ -358,6 +359,113 @@ static struct vnuma_info *vnuma_init(const struct xen_domctl_vnuma *uinfo,
     return ERR_PTR(ret);
 }
 
+struct domctl_context
+{
+    void *buffer;
+    size_t len;
+    size_t cur;
+};
+
+static int accumulate_size(void *priv, const void *data, size_t len)
+{
+    struct domctl_context *c = priv;
+
+    if ( c->len + len < c->len )
+        return -EOVERFLOW;
+
+    c->len += len;
+
+    return 0;
+}
+
+static int save_data(void *priv, const void *data, size_t len)
+{
+    struct domctl_context *c = priv;
+
+    if ( c->len - c->cur < len )
+        return -ENOSPC;
+
+    memcpy(c->buffer + c->cur, data, len);
+    c->cur += len;
+
+    return 0;
+}
+
+static int getdomaincontext(struct domain *d,
+                            struct xen_domctl_getdomaincontext *gdc)
+{
+    struct domctl_context c = { };
+    int rc;
+
+    if ( d == current->domain )
+        return -EPERM;
+
+    if ( guest_handle_is_null(gdc->buffer) ) /* query for buffer size */
+    {
+        if ( gdc->size )
+            return -EINVAL;
+
+        /* dry run to acquire buffer size */
+        rc = domain_save(d, accumulate_size, &c, true);
+        if ( rc )
+            return rc;
+
+        gdc->size = c.len;
+        return 0;
+    }
+
+    c.len = gdc->size;
+    c.buffer = xmalloc_bytes(c.len);
+    if ( !c.buffer )
+        return -ENOMEM;
+
+    rc = domain_save(d, save_data, &c, false);
+
+    gdc->size = c.cur;
+    if ( !rc && copy_to_guest(gdc->buffer, c.buffer, gdc->size) )
+        rc = -EFAULT;
+
+    xfree(c.buffer);
+
+    return rc;
+}
+
+static int load_data(void *priv, void *data, size_t len)
+{
+    struct domctl_context *c = priv;
+
+    if ( c->len - c->cur < len )
+        return -ENODATA;
+
+    if ( data )
+        memcpy(data, c->buffer + c->cur, len);
+
+    c->cur += len;
+
+    return 0;
+}
+
+static int setdomaincontext(struct domain *d,
+                            const struct xen_domctl_setdomaincontext *sdc)
+{
+    struct domctl_context c = { .len = sdc->size };
+    int rc;
+
+    if ( d == current->domain )
+        return -EPERM;
+
+    c.buffer = xmalloc_bytes(c.len);
+    if ( !c.buffer )
+        return -ENOMEM;
+
+    rc = !copy_from_guest(c.buffer, sdc->buffer, c.len) ?
+        domain_load(d, load_data, &c) : -EFAULT;
+
+    xfree(c.buffer);
+
+    return rc;
+}
+
 long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 {
     long ret = 0;
@@ -942,6 +1050,15 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             copyback = 1;
         break;
 
+    case XEN_DOMCTL_getdomaincontext:
+        ret = getdomaincontext(d, &op->u.getdomaincontext);
+        copyback = !ret;
+        break;
+
+    case XEN_DOMCTL_setdomaincontext:
+        ret = setdomaincontext(d, &op->u.setdomaincontext);
+        break;
+
     default:
         ret = arch_do_domctl(op, d, u_domctl);
         break;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 1ad34c35eb..8ab39acf0c 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -38,7 +38,7 @@
 #include "hvm/save.h"
 #include "memory.h"
 
-#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
+#define XEN_DOMCTL_INTERFACE_VERSION 0x00000013
 
 /*
  * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
@@ -1129,6 +1129,44 @@ struct xen_domctl_vuart_op {
                                  */
 };
 
+/*
+ * Get/Set domain PV context. The same struct xen_domctl_domaincontext
+ * is used for both commands but with slightly different field semantics
+ * as follows:
+ *
+ * XEN_DOMCTL_getdomaincontext
+ * ---------------------------
+ *
+ * buffer (IN):   The buffer into which the context data should be
+ *                copied, or NULL to query the buffer size that should
+ *                be allocated.
+ * size (IN/OUT): If 'buffer' is NULL then the value passed in must be
+ *                zero, and the value passed out will be the size of the
+ *                buffer to allocate.
+ *                If 'buffer' is non-NULL then the value passed in must
+ *                be the size of the buffer into which data may be copied.
+ */
+struct xen_domctl_getdomaincontext {
+    uint64_t size;
+    XEN_GUEST_HANDLE_64(void) buffer;
+};
+
+/* XEN_DOMCTL_setdomaincontext
+ * ---------------------------
+ *
+ * buffer (IN):   The buffer from which the context data should be
+ *                copied.
+ * size (IN):     The size of the buffer from which data may be copied.
+ *                This data must include DOMAIN_SAVE_CODE_HEADER at the
+ *                start and terminate with a DOMAIN_SAVE_CODE_END record.
+ *                Any data beyond the DOMAIN_SAVE_CODE_END record will be
+ *                ignored.
+ */
+struct xen_domctl_setdomaincontext {
+    uint64_t size;
+    XEN_GUEST_HANDLE_64(const_void) buffer;
+};
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -1210,6 +1248,8 @@ struct xen_domctl {
 #define XEN_DOMCTL_vuart_op                      81
 #define XEN_DOMCTL_get_cpu_policy                82
 #define XEN_DOMCTL_set_cpu_policy                83
+#define XEN_DOMCTL_getdomaincontext              84
+#define XEN_DOMCTL_setdomaincontext              85
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1270,6 +1310,8 @@ struct xen_domctl {
         struct xen_domctl_monitor_op        monitor_op;
         struct xen_domctl_psr_alloc         psr_alloc;
         struct xen_domctl_vuart_op          vuart_op;
+        struct xen_domctl_getdomaincontext  getdomaincontext;
+        struct xen_domctl_setdomaincontext  setdomaincontext;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 8af8602b46..d94d0fc125 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -744,6 +744,12 @@ static int flask_domctl(struct domain *d, int cmd)
     case XEN_DOMCTL_get_cpu_policy:
         return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__GET_CPU_POLICY);
 
+    case XEN_DOMCTL_setdomaincontext:
+        return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__SETCONTEXT);
+
+    case XEN_DOMCTL_getdomaincontext:
+        return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__GETCONTEXT);
+
     default:
         return avc_unknown_permission("domctl", cmd);
     }
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index c055c14c26..fccfb9de82 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -245,6 +245,10 @@ class domain2
     resource_map
 # XEN_DOMCTL_get_cpu_policy
     get_cpu_policy
+# XEN_DOMCTL_setdomaincontext
+    setcontext
+# XEN_DOMCTL_getdomaincontext
+    getcontext
 }
 
 # Similar to class domain, but primarily contains domctls related to HVM domains
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 3/5] tools/misc: add xen-domctx to present domain context
  2020-04-07 17:38 [PATCH v2 0/5] domain context infrastructure Paul Durrant
  2020-04-07 17:38 ` [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context Paul Durrant
  2020-04-07 17:38 ` [PATCH v2 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext Paul Durrant
@ 2020-04-07 17:38 ` Paul Durrant
  2020-04-29 15:04   ` Jan Beulich
  2020-04-07 17:38 ` [PATCH v2 4/5] common/domain: add a domain context record for shared_info Paul Durrant
  2020-04-07 17:38 ` [PATCH v2 5/5] tools/libxc: make use of domain context SHARED_INFO record Paul Durrant
  4 siblings, 1 reply; 30+ messages in thread
From: Paul Durrant @ 2020-04-07 17:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Ian Jackson, Wei Liu, Paul Durrant

This tool is analogous to 'xen-hvmctx' which presents HVM context.
Subsequent patches will add 'dump' functions when new records are
introduced.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wl@xen.org>

v2:
 - Change name from 'xen-ctx' to 'xen-domctx'
---
 .gitignore              |   1 +
 tools/misc/Makefile     |   4 ++
 tools/misc/xen-domctx.c | 145 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 150 insertions(+)
 create mode 100644 tools/misc/xen-domctx.c

diff --git a/.gitignore b/.gitignore
index 4ca679ddbc..744c0529bb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -208,6 +208,7 @@ tools/misc/xen_cpuperf
 tools/misc/xen-cpuid
 tools/misc/xen-detect
 tools/misc/xen-diag
+tools/misc/xen-domctx
 tools/misc/xen-tmem-list-parse
 tools/misc/xen-livepatch
 tools/misc/xenperf
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index 63947bfadc..ef25524354 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -30,6 +30,7 @@ INSTALL_SBIN                   += xenpm
 INSTALL_SBIN                   += xenwatchdogd
 INSTALL_SBIN                   += xen-livepatch
 INSTALL_SBIN                   += xen-diag
+INSTALL_SBIN                   += xen-domctx
 INSTALL_SBIN += $(INSTALL_SBIN-y)
 
 # Everything to be installed in a private bin/
@@ -108,6 +109,9 @@ xen-livepatch: xen-livepatch.o
 xen-diag: xen-diag.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-domctx: xen-domctx.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
+
 xen-lowmemd: xen-lowmemd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenevtchn) $(LDLIBS_libxenctrl) $(LDLIBS_libxenstore) $(APPEND_LDFLAGS)
 
diff --git a/tools/misc/xen-domctx.c b/tools/misc/xen-domctx.c
new file mode 100644
index 0000000000..d663522a8b
--- /dev/null
+++ b/tools/misc/xen-domctx.c
@@ -0,0 +1,145 @@
+/*
+ * xen-domctx.c
+ *
+ * Print out domain save records in a human-readable way.
+ *
+ * Copyright Amazon.com Inc. or its affiliates.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <inttypes.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include <xenctrl.h>
+#include <xen/xen.h>
+#include <xen/domctl.h>
+#include <xen/save.h>
+
+static void *buf = NULL;
+static size_t len, off;
+
+#define READ(_x) do {                                                   \
+    if ( len - off < sizeof (_x) )                                      \
+    {                                                                   \
+        fprintf(stderr,                                                 \
+                "Error: need another %lu bytes, only %lu available\n",  \
+                sizeof(_x), len - off);                                 \
+        exit(1);                                                        \
+    }                                                                   \
+    memcpy(&(_x), buf + off, sizeof (_x));                              \
+} while (0)
+
+static void dump_header(struct domain_save_descriptor *desc)
+{
+    DOMAIN_SAVE_TYPE(HEADER) h;
+    READ(h);
+    printf("    HEADER: magic %#x, version %u\n",
+           h.magic, h.version);
+
+    off += desc->length;
+}
+
+static void dump_end(struct domain_save_descriptor *desc)
+{
+    DOMAIN_SAVE_TYPE(END) e;
+    READ(e);
+    printf("    END\n");
+}
+
+int main(int argc, char **argv)
+{
+    uint32_t domid;
+    unsigned int entry;
+    xc_interface *xch;
+    int rc;
+
+    if ( argc != 2 || !argv[1] || (rc = atoi(argv[1])) < 0 )
+    {
+        fprintf(stderr, "usage: %s <domid>\n", argv[0]);
+        exit(1);
+    }
+    domid = rc;
+
+    xch = xc_interface_open(0,0,0);
+    if ( !xch )
+    {
+        fprintf(stderr, "Error: can't open libxc handle\n");
+        exit(1);
+    }
+
+    rc = xc_domain_getcontext(xch, domid, NULL, &len);
+    if ( rc < 0 )
+    {
+        fprintf(stderr, "Error: can't get record length for dom %u: %s\n",
+                domid, strerror(errno));
+        exit(1);
+    }
+
+    buf = malloc(len);
+    if ( !buf )
+    {
+        fprintf(stderr, "Error: can't allocate %lu bytes\n", len);
+        exit(1);
+    }
+
+    rc = xc_domain_getcontext(xch, domid, buf, &len);
+    if ( rc < 0 )
+    {
+        fprintf(stderr, "Error: can't get domain record for dom %u: %s\n",
+                domid, strerror(errno));
+        exit(1);
+    }
+    off = 0;
+
+    printf("Domain save records for d%u\n", domid);
+
+    entry = 0;
+    for (;;) {
+        struct domain_save_descriptor desc;
+
+        READ(desc);
+        printf("[%u] type %u v%u, length %lu\n", entry++,
+               desc.typecode, desc.vcpu_id, (unsigned long)desc.length);
+        off += sizeof(desc);
+
+        switch (desc.typecode)
+        {
+        case DOMAIN_SAVE_CODE(HEADER): dump_header(&desc); break;
+        case DOMAIN_SAVE_CODE(END): dump_end(&desc); return 0;
+        default:
+            printf("Unknown type %u: skipping\n", desc.typecode);
+            off += desc.length;
+            break;
+        }
+    }
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 4/5] common/domain: add a domain context record for shared_info...
  2020-04-07 17:38 [PATCH v2 0/5] domain context infrastructure Paul Durrant
                   ` (2 preceding siblings ...)
  2020-04-07 17:38 ` [PATCH v2 3/5] tools/misc: add xen-domctx to present domain context Paul Durrant
@ 2020-04-07 17:38 ` Paul Durrant
  2020-04-20 17:34   ` Julien Grall
  2020-04-30 11:56   ` Jan Beulich
  2020-04-07 17:38 ` [PATCH v2 5/5] tools/libxc: make use of domain context SHARED_INFO record Paul Durrant
  4 siblings, 2 replies; 30+ messages in thread
From: Paul Durrant @ 2020-04-07 17:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Paul Durrant,
	Andrew Cooper, Paul Durrant, Ian Jackson, George Dunlap,
	Jan Beulich

... and update xen-domctx to dump some information describing the record.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>

v2:
 - Drop the header change to define a 'Xen' page size and instead use a
   variable length struct now that the framework makes this is feasible
 - Guard use of 'has_32bit_shinfo' in common code with CONFIG_COMPAT
---
 tools/misc/xen-domctx.c   | 11 ++++++
 xen/common/domain.c       | 81 +++++++++++++++++++++++++++++++++++++++
 xen/include/public/save.h | 10 ++++-
 3 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/tools/misc/xen-domctx.c b/tools/misc/xen-domctx.c
index d663522a8b..a8d3922321 100644
--- a/tools/misc/xen-domctx.c
+++ b/tools/misc/xen-domctx.c
@@ -59,6 +59,16 @@ static void dump_header(struct domain_save_descriptor *desc)
     off += desc->length;
 }
 
+static void dump_shared_info(struct domain_save_descriptor *desc)
+{
+    DOMAIN_SAVE_TYPE(SHARED_INFO) s;
+    READ(s);
+    printf("    SHARED_INFO: field_width %u buffer size: %lu\n",
+           s.field_width, desc->length - sizeof(s));
+
+    off += desc->length;
+}
+
 static void dump_end(struct domain_save_descriptor *desc)
 {
     DOMAIN_SAVE_TYPE(END) e;
@@ -125,6 +135,7 @@ int main(int argc, char **argv)
         switch (desc.typecode)
         {
         case DOMAIN_SAVE_CODE(HEADER): dump_header(&desc); break;
+        case DOMAIN_SAVE_CODE(SHARED_INFO): dump_shared_info(&desc); break;
         case DOMAIN_SAVE_CODE(END): dump_end(&desc); return 0;
         default:
             printf("Unknown type %u: skipping\n", desc.typecode);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 3dcd73f67c..8b72462e07 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -33,6 +33,7 @@
 #include <xen/xenoprof.h>
 #include <xen/irq.h>
 #include <xen/argo.h>
+#include <xen/save.h>
 #include <asm/debugger.h>
 #include <asm/p2m.h>
 #include <asm/processor.h>
@@ -1646,6 +1647,86 @@ int continue_hypercall_on_cpu(
     return 0;
 }
 
+static int save_shared_info(const struct vcpu *v, struct domain_context *c,
+                            bool dry_run)
+{
+    struct domain *d = v->domain;
+    struct domain_shared_info_context ctxt = {};
+    size_t hdr_size = offsetof(typeof(ctxt), buffer);
+    size_t size = hdr_size + PAGE_SIZE;
+    int rc;
+
+    rc = DOMAIN_SAVE_BEGIN(SHARED_INFO, c, v, size);
+    if ( rc )
+        return rc;
+
+    if ( !dry_run )
+        ctxt.field_width =
+#ifdef CONFIG_COMPAT
+            has_32bit_shinfo(d) ? 4 :
+#endif
+            8;
+
+    rc = domain_save_data(c, &ctxt, hdr_size);
+    if ( rc )
+        return rc;
+
+    rc = domain_save_data(c, d->shared_info, PAGE_SIZE);
+    if ( rc )
+        return rc;
+
+    return domain_save_end(c);
+}
+
+static int load_shared_info(struct vcpu *v, struct domain_context *c)
+{
+    struct domain *d = v->domain;
+    struct domain_shared_info_context ctxt = {};
+    size_t hdr_size = offsetof(typeof(ctxt), buffer);
+    size_t size = hdr_size + PAGE_SIZE;
+    unsigned int i;
+    int rc;
+
+    rc = DOMAIN_LOAD_BEGIN(SHARED_INFO, c, v, size, true);
+    if ( rc )
+        return rc;
+
+    rc = domain_load_data(c, &ctxt, hdr_size);
+    if ( rc )
+        return rc;
+
+    for ( i = 0; i < ARRAY_SIZE(ctxt.pad); i++ )
+        if ( ctxt.pad[i] )
+            return -EINVAL;
+
+    switch ( ctxt.field_width )
+    {
+#ifdef CONFIG_COMPAT
+    case 4:
+        d->arch.has_32bit_shinfo = 1;
+        break;
+#endif
+    case 8:
+#ifdef CONFIG_COMPAT
+        d->arch.has_32bit_shinfo = 0;
+#endif
+        break;
+
+    default:
+        rc = -EINVAL;
+        break;
+    }
+
+    rc = domain_load_data(c, d->shared_info, PAGE_SIZE);
+    if ( rc )
+        return rc;
+
+    return domain_load_end(c);
+}
+
+DOMAIN_REGISTER_SAVE_RESTORE(SHARED_INFO, false, save_shared_info,
+                             load_shared_info);
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/public/save.h b/xen/include/public/save.h
index 7e5f8752bd..ed994a8765 100644
--- a/xen/include/public/save.h
+++ b/xen/include/public/save.h
@@ -79,6 +79,14 @@ struct domain_save_header {
 };
 DECLARE_DOMAIN_SAVE_TYPE(HEADER, 1, struct domain_save_header);
 
-#define DOMAIN_SAVE_CODE_MAX 1
+struct domain_shared_info_context {
+    uint8_t field_width;
+    uint8_t pad[7];
+    uint8_t buffer[]; /* Implementation specific size */
+};
+
+DECLARE_DOMAIN_SAVE_TYPE(SHARED_INFO, 2, struct domain_shared_info_context);
+
+#define DOMAIN_SAVE_CODE_MAX 2
 
 #endif /* __XEN_PUBLIC_SAVE_H__ */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 5/5] tools/libxc: make use of domain context SHARED_INFO record...
  2020-04-07 17:38 [PATCH v2 0/5] domain context infrastructure Paul Durrant
                   ` (3 preceding siblings ...)
  2020-04-07 17:38 ` [PATCH v2 4/5] common/domain: add a domain context record for shared_info Paul Durrant
@ 2020-04-07 17:38 ` Paul Durrant
  2020-04-30 11:57   ` Jan Beulich
  4 siblings, 1 reply; 30+ messages in thread
From: Paul Durrant @ 2020-04-07 17:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Ian Jackson, Wei Liu, Paul Durrant

... in the save/restore code.

This patch replaces direct mapping of the shared_info_frame (retrieved
using XEN_DOMCTL_getdomaininfo) with save/load of the domain context
SHARED_INFO record.

No modifications are made to the definition of the migration stream at
this point. Subsequent patches will define a record in the libxc domain
image format for passing domain context and convert the save/restore code
to use that.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wl@xen.org>

v2:
 - Re-based (now making use of DOMAIN_SAVE_FLAG_IGNORE)
---
 tools/libxc/xc_sr_common.h         |  7 +++-
 tools/libxc/xc_sr_common_x86.c     | 59 ++++++++++++++++++++++++++++++
 tools/libxc/xc_sr_common_x86.h     |  4 ++
 tools/libxc/xc_sr_common_x86_pv.c  | 53 +++++++++++++++++++++++++++
 tools/libxc/xc_sr_common_x86_pv.h  |  3 ++
 tools/libxc/xc_sr_restore_x86_pv.c | 40 ++++++++------------
 tools/libxc/xc_sr_save_x86_pv.c    | 26 ++-----------
 tools/libxc/xg_save_restore.h      |  1 +
 8 files changed, 144 insertions(+), 49 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 5dd51ccb15..db6519cdcc 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -287,6 +287,11 @@ struct xc_sr_context
     {
         struct /* x86 */
         {
+            struct {
+                void *buffer;
+                unsigned int len;
+            } domain_context;
+
             struct /* x86 PV guest. */
             {
                 /* 4 or 8; 32 or 64 bit domain */
@@ -314,7 +319,7 @@ struct xc_sr_context
                 /* The guest pfns containing the p2m leaves */
                 xen_pfn_t *p2m_pfns;
 
-                /* Read-only mapping of guests shared info page */
+                /* Pointer to shared_info (located in context buffer) */
                 shared_info_any_t *shinfo;
 
                 /* p2m generation count for verifying validity of local p2m. */
diff --git a/tools/libxc/xc_sr_common_x86.c b/tools/libxc/xc_sr_common_x86.c
index 011684df97..e87dc0f0f3 100644
--- a/tools/libxc/xc_sr_common_x86.c
+++ b/tools/libxc/xc_sr_common_x86.c
@@ -42,6 +42,65 @@ int handle_x86_tsc_info(struct xc_sr_context *ctx, struct xc_sr_record *rec)
     return 0;
 }
 
+int x86_get_context(struct xc_sr_context *ctx)
+{
+    xc_interface *xch = ctx->xch;
+    size_t len = 0;
+    int rc;
+
+    if ( ctx->x86.domain_context.buffer )
+    {
+        ERROR("Domain context already present");
+        return -1;
+    }
+
+    rc = xc_domain_getcontext(xch, ctx->domid, NULL, &len);
+    if ( rc < 0 )
+    {
+        PERROR("Unable to get size of domain context");
+        return -1;
+    }
+
+    ctx->x86.domain_context.buffer = malloc(len);
+    if ( ctx->x86.domain_context.buffer == NULL )
+    {
+        PERROR("Unable to allocate memory for domain context");
+        return -1;
+    }
+
+    rc = xc_domain_getcontext(xch, ctx->domid,
+                              ctx->x86.domain_context.buffer, &len);
+    if ( rc < 0 )
+    {
+        PERROR("Unable to get domain context");
+        return -1;
+    }
+
+    ctx->x86.domain_context.len = len;
+
+    return 0;
+}
+
+int x86_set_context(struct xc_sr_context *ctx)
+{
+    xc_interface *xch = ctx->xch;
+
+    if ( !ctx->x86.domain_context.buffer )
+    {
+        ERROR("Domain context not present");
+        return -1;
+    }
+
+    return xc_domain_setcontext(xch, ctx->domid,
+                                ctx->x86.domain_context.buffer,
+                                ctx->x86.domain_context.len);
+}
+
+void x86_cleanup(struct xc_sr_context *ctx)
+{
+    free(ctx->x86.domain_context.buffer);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common_x86.h b/tools/libxc/xc_sr_common_x86.h
index ebc4355bd1..501c9e52ba 100644
--- a/tools/libxc/xc_sr_common_x86.h
+++ b/tools/libxc/xc_sr_common_x86.h
@@ -14,6 +14,10 @@ int write_x86_tsc_info(struct xc_sr_context *ctx);
  */
 int handle_x86_tsc_info(struct xc_sr_context *ctx, struct xc_sr_record *rec);
 
+int x86_get_context(struct xc_sr_context *ctx);
+int x86_set_context(struct xc_sr_context *ctx);
+void x86_cleanup(struct xc_sr_context *ctx);
+
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_common_x86_pv.c b/tools/libxc/xc_sr_common_x86_pv.c
index d3d425cb82..7354fd6052 100644
--- a/tools/libxc/xc_sr_common_x86_pv.c
+++ b/tools/libxc/xc_sr_common_x86_pv.c
@@ -182,6 +182,59 @@ int x86_pv_map_m2p(struct xc_sr_context *ctx)
     return rc;
 }
 
+int x86_pv_get_shinfo(struct xc_sr_context *ctx)
+{
+    unsigned int off = 0;
+    struct domain_save_descriptor *desc;
+    int rc;
+
+    rc = x86_get_context(ctx);
+    if ( rc )
+        return rc;
+
+    do {
+        if ( ctx->x86.domain_context.len - off < sizeof(*desc) )
+            return -1;
+
+        desc = ctx->x86.domain_context.buffer + off;
+        off += sizeof(*desc);
+
+        switch (desc->typecode)
+        {
+        case DOMAIN_SAVE_CODE(SHARED_INFO):
+        {
+            DOMAIN_SAVE_TYPE(SHARED_INFO) *s;
+
+            if ( ctx->x86.domain_context.len - off < sizeof(*s) )
+                return -1;
+
+            s = ctx->x86.domain_context.buffer + off;
+            ctx->x86.pv.shinfo = (shared_info_any_t *)s->buffer;
+            /* fall through */
+        }
+        case DOMAIN_SAVE_CODE(HEADER):
+            off += desc->length;
+            /* fall through */
+        case DOMAIN_SAVE_CODE(END):
+            break;
+        default:
+            desc->flags |= DOMAIN_SAVE_FLAG_IGNORE;
+            off += desc->length;
+            break;
+        }
+    } while ( desc->typecode != DOMAIN_SAVE_CODE(END) );
+
+    if ( !ctx->x86.pv.shinfo )
+        return -1;
+
+    return 0;
+}
+
+int x86_pv_set_shinfo(struct xc_sr_context *ctx)
+{
+    return ctx->x86.pv.shinfo ? x86_set_context(ctx) : -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common_x86_pv.h b/tools/libxc/xc_sr_common_x86_pv.h
index 2ed03309af..01442f48fb 100644
--- a/tools/libxc/xc_sr_common_x86_pv.h
+++ b/tools/libxc/xc_sr_common_x86_pv.h
@@ -97,6 +97,9 @@ int x86_pv_domain_info(struct xc_sr_context *ctx);
  */
 int x86_pv_map_m2p(struct xc_sr_context *ctx);
 
+int x86_pv_get_shinfo(struct xc_sr_context *ctx);
+int x86_pv_set_shinfo(struct xc_sr_context *ctx);
+
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_restore_x86_pv.c b/tools/libxc/xc_sr_restore_x86_pv.c
index 904ccc462a..4dbc7f0da5 100644
--- a/tools/libxc/xc_sr_restore_x86_pv.c
+++ b/tools/libxc/xc_sr_restore_x86_pv.c
@@ -864,8 +864,7 @@ static int handle_shared_info(struct xc_sr_context *ctx,
 {
     xc_interface *xch = ctx->xch;
     unsigned int i;
-    int rc = -1;
-    shared_info_any_t *guest_shinfo = NULL;
+    int rc;
     const shared_info_any_t *old_shinfo = rec->data;
 
     if ( !ctx->x86.pv.restore.seen_pv_info )
@@ -878,39 +877,30 @@ static int handle_shared_info(struct xc_sr_context *ctx,
     {
         ERROR("X86_PV_SHARED_INFO record wrong size: length %u"
               ", expected 4096", rec->length);
-        goto err;
+        return -1;
     }
 
-    guest_shinfo = xc_map_foreign_range(
-        xch, ctx->domid, PAGE_SIZE, PROT_READ | PROT_WRITE,
-        ctx->dominfo.shared_info_frame);
-    if ( !guest_shinfo )
-    {
-        PERROR("Failed to map Shared Info at mfn %#lx",
-               ctx->dominfo.shared_info_frame);
-        goto err;
-    }
+    rc = x86_pv_get_shinfo(ctx);
+    if ( rc )
+        return rc;
 
-    MEMCPY_FIELD(guest_shinfo, old_shinfo, vcpu_info, ctx->x86.pv.width);
-    MEMCPY_FIELD(guest_shinfo, old_shinfo, arch, ctx->x86.pv.width);
+    MEMCPY_FIELD(ctx->x86.pv.shinfo, old_shinfo, vcpu_info,
+                 ctx->x86.pv.width);
+    MEMCPY_FIELD(ctx->x86.pv.shinfo, old_shinfo, arch, ctx->x86.pv.width);
 
-    SET_FIELD(guest_shinfo, arch.pfn_to_mfn_frame_list_list,
+    SET_FIELD(ctx->x86.pv.shinfo, arch.pfn_to_mfn_frame_list_list,
               0, ctx->x86.pv.width);
 
-    MEMSET_ARRAY_FIELD(guest_shinfo, evtchn_pending, 0, ctx->x86.pv.width);
+    MEMSET_ARRAY_FIELD(ctx->x86.pv.shinfo, evtchn_pending, 0,
+                       ctx->x86.pv.width);
     for ( i = 0; i < XEN_LEGACY_MAX_VCPUS; i++ )
-        SET_FIELD(guest_shinfo, vcpu_info[i].evtchn_pending_sel,
+        SET_FIELD(ctx->x86.pv.shinfo, vcpu_info[i].evtchn_pending_sel,
                   0, ctx->x86.pv.width);
 
-    MEMSET_ARRAY_FIELD(guest_shinfo, evtchn_mask, 0xff, ctx->x86.pv.width);
-
-    rc = 0;
+    MEMSET_ARRAY_FIELD(ctx->x86.pv.shinfo, evtchn_mask, 0xff,
+                       ctx->x86.pv.width);
 
- err:
-    if ( guest_shinfo )
-        munmap(guest_shinfo, PAGE_SIZE);
-
-    return rc;
+    return x86_pv_set_shinfo(ctx);
 }
 
 /* restore_ops function. */
diff --git a/tools/libxc/xc_sr_save_x86_pv.c b/tools/libxc/xc_sr_save_x86_pv.c
index f3ccf5bb4b..7c4fcffa92 100644
--- a/tools/libxc/xc_sr_save_x86_pv.c
+++ b/tools/libxc/xc_sr_save_x86_pv.c
@@ -9,25 +9,6 @@ static inline bool is_canonical_address(xen_vaddr_t vaddr)
     return ((int64_t)vaddr >> 47) == ((int64_t)vaddr >> 63);
 }
 
-/*
- * Maps the guests shared info page.
- */
-static int map_shinfo(struct xc_sr_context *ctx)
-{
-    xc_interface *xch = ctx->xch;
-
-    ctx->x86.pv.shinfo = xc_map_foreign_range(
-        xch, ctx->domid, PAGE_SIZE, PROT_READ, ctx->dominfo.shared_info_frame);
-    if ( !ctx->x86.pv.shinfo )
-    {
-        PERROR("Failed to map shared info frame at mfn %#lx",
-               ctx->dominfo.shared_info_frame);
-        return -1;
-    }
-
-    return 0;
-}
-
 /*
  * Copy a list of mfns from a guest, accounting for differences between guest
  * and toolstack width.  Can fail if truncation would occur.
@@ -1041,7 +1022,7 @@ static int x86_pv_setup(struct xc_sr_context *ctx)
     if ( rc )
         return rc;
 
-    rc = map_shinfo(ctx);
+    rc = x86_pv_get_shinfo(ctx);
     if ( rc )
         return rc;
 
@@ -1112,12 +1093,11 @@ static int x86_pv_cleanup(struct xc_sr_context *ctx)
     if ( ctx->x86.pv.p2m )
         munmap(ctx->x86.pv.p2m, ctx->x86.pv.p2m_frames * PAGE_SIZE);
 
-    if ( ctx->x86.pv.shinfo )
-        munmap(ctx->x86.pv.shinfo, PAGE_SIZE);
-
     if ( ctx->x86.pv.m2p )
         munmap(ctx->x86.pv.m2p, ctx->x86.pv.nr_m2p_frames * PAGE_SIZE);
 
+    x86_cleanup(ctx);
+
     return 0;
 }
 
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index 303081df0d..296b523963 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -19,6 +19,7 @@
 
 #include <xen/foreign/x86_32.h>
 #include <xen/foreign/x86_64.h>
+#include <xen/save.h>
 
 /*
 ** We process save/restore/migrate in batches of pages; the below
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
  2020-04-07 17:38 ` [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context Paul Durrant
@ 2020-04-20 17:20   ` Julien Grall
  2020-04-28 15:35     ` Paul Durrant
  2020-04-29 11:02   ` Jan Beulich
  1 sibling, 1 reply; 30+ messages in thread
From: Julien Grall @ 2020-04-20 17:20 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Paul Durrant,
	Ian Jackson, George Dunlap, Jan Beulich, Volodymyr Babchuk,
	Roger Pau Monné

Hi Paul,

On 07/04/2020 18:38, Paul Durrant wrote:
> To allow enlightened HVM guests (i.e. those that have PV drivers) to be
> migrated without their co-operation it will be necessary to transfer 'PV'
> state such as event channel state, grant entry state, etc.
> 
> Currently there is a framework (entered via the hvm_save/load() functions)
> that allows a domain's 'HVM' (architectural) state to be transferred but
> 'PV' state is also common with pure PV guests and so this framework is not
> really suitable.
> 
> This patch adds the new public header and low level implementation of a new
> common framework, entered via the domain_save/load() functions. Subsequent
> patches will introduce other parts of the framework, and code that will
> make use of it within the current version of the libxc migration stream.
> 
> This patch also marks the HVM-only framework as deprecated in favour of the
> new framework.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Julien Grall <julien@xen.org>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Wei Liu <wl@xen.org>
> Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> 
> v2:
>   - Allow multi-stage save/load to avoid the need to double-buffer
>   - Get rid of the masks and add an 'ignore' flag instead
>   - Create copy function union to preserve const save buffer
>   - Deprecate HVM-only framework
> ---
>   xen/common/Makefile                    |   1 +
>   xen/common/save.c                      | 329 +++++++++++++++++++++++++
>   xen/include/public/arch-arm/hvm/save.h |   5 +
>   xen/include/public/arch-x86/hvm/save.h |   5 +
>   xen/include/public/save.h              |  84 +++++++
>   xen/include/xen/save.h                 | 152 ++++++++++++
>   6 files changed, 576 insertions(+)
>   create mode 100644 xen/common/save.c
>   create mode 100644 xen/include/public/save.h
>   create mode 100644 xen/include/xen/save.h
> 
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index e8cde65370..90553ba5d7 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -37,6 +37,7 @@ obj-y += radix-tree.o
>   obj-y += rbtree.o
>   obj-y += rcupdate.o
>   obj-y += rwlock.o
> +obj-y += save.o
>   obj-y += shutdown.o
>   obj-y += softirq.o
>   obj-y += sort.o
> diff --git a/xen/common/save.c b/xen/common/save.c
> new file mode 100644
> index 0000000000..6cdac3785b
> --- /dev/null
> +++ b/xen/common/save.c
> @@ -0,0 +1,329 @@
> +/*
> + * save.c: Save and restore PV guest state common to all domain types.
> + *
> + * Copyright Amazon.com Inc. or its affiliates.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/save.h>
> +
> +union domain_copy_entry {
> +    domain_write_entry write;
> +    domain_read_entry read;
> +};
> +
> +struct domain_context {
> +    bool log;
> +    struct domain_save_descriptor desc;
> +    size_t data_len;

What is data_len?

> +    union domain_copy_entry copy;
> +    void *priv;
> +};
> +
> +static struct {
> +    const char *name;
> +    bool per_vcpu;
> +    domain_save_handler save;
> +    domain_load_handler load;
> +} handlers[DOMAIN_SAVE_CODE_MAX + 1];
> +
> +void __init domain_register_save_type(unsigned int tc, const char *name,
> +                                      bool per_vcpu,
> +                                      domain_save_handler save,
> +                                      domain_load_handler load)
> +{
> +    BUG_ON(tc > ARRAY_SIZE(handlers));
> +
> +    ASSERT(!handlers[tc].save);
> +    ASSERT(!handlers[tc].load);
> +
> +    handlers[tc].name = name;
> +    handlers[tc].per_vcpu = per_vcpu;
> +    handlers[tc].save = save;
> +    handlers[tc].load = load;
> +}
> +
> +int domain_save_begin(struct domain_context *c, unsigned int tc,
> +                      const char *name, const struct vcpu *v, size_t len)
> +{
> +    int rc;
> +
> +    if ( c->log )
> +        gdprintk(XENLOG_INFO, "%pv save: %s (%lu)\n", v, name,
> +                 (unsigned long)len);

How about using %zu rather than a cast here?

> +
> +    BUG_ON(tc != c->desc.typecode);
> +    BUG_ON(v->vcpu_id != c->desc.vcpu_id);

I can't find any answer on my question about this part. I understand the 
code would be buggy if this happen, but is it warrant to crash the host? 
Couldn't you just return an error and continue to run?

> +
> +    ASSERT(!c->data_len);
> +    c->data_len = c->desc.length = len;
> +
> +    rc = c->copy.write(c->priv, &c->desc, sizeof(c->desc));
> +    if ( rc )
> +        return rc;
> +
> +    c->desc.length = 0;

This is confusing, why would you need to reset c->desc.length but not 
the rest of the fields?

> +
> +    return 0;
> +}
> +
> +int domain_save_data(struct domain_context *c, const void *src, size_t len)
> +{
> +    if ( c->desc.length + len > c->data_len )
> +        return -ENOSPC;
> +
> +    c->desc.length += len;
> +
> +    return c->copy.write(c->priv, src, len);
> +}
> +
> +int domain_save_end(struct domain_context *c)
> +{
> +    /*
> +     * If desc.length does not match the length specified in
> +     * domain_save_begin(), there should have been more data.
> +     */
> +    if ( c->desc.length != c->data_len )

This suggests you know in advance the size of the record. I have seen 
some cases where we don't know the size in advance. A good example if 
when saving grants. You know the maximum of grant used by the guest but 
you don't know yet the number of grants used. You might need to walk the 
"list" twice or allocate a temporary buffer. None of them are ideal.

Another example is when saving memory, we may want to compact page 
informations to save space.

This problem is going to be more relevant for LiveUpdate where we need 
to be able to create the stream in a very short amount of time. 
Allocating any temporary buffer and/or walking the list twice is going 
to kill the performance.

I would suggest to consider a different approach where you update the 
record length at the end.

FWIW, this below the current approach for the LU stream (IIRC David sent 
a prototype recently). A record is opened using lu_stream_open_record(), 
you then have two way to add data:
     - lu_stream_append() -> This takes a buffer and write to the stream.
     - lu_stream_reserve() -> Pre-allocate space in the stream and 
return a pointer to the beginning of the reserved region.
     - lu_stream_end_reservation() -> Takes the actual size in parameter 
and update the stream size.
     - lu_stream_close_record() -> Update the header with the total 
length and update the position in the stream.

> +        return -EIO;

I noticed that all the pasding check have been dropped. I still think we 
need implicit padding to harden the code.

> +
> +    c->data_len = 0;
> +
> +    return 0;
> +}
> +
> +int domain_save(struct domain *d, domain_write_entry write, void *priv,
> +                bool dry_run)
> +{
> +    struct domain_context c = {
> +        .copy.write = write,
> +        .priv = priv,
> +        .log = !dry_run,
> +    };
> +    struct domain_save_header h = {
> +        .magic = DOMAIN_SAVE_MAGIC,
> +        .version = DOMAIN_SAVE_VERSION,
> +    };
> +    struct domain_save_header e;
> +    unsigned int i;
> +    int rc;
> +
> +    ASSERT(d != current->domain);
> +
> +    if ( d->is_dying )

I can't find an answer to my question about d->is_dying. What if the 
guest die afterwards? What does protect us against domain_kill()?

[...]

> +int domain_load(struct domain *d, domain_read_entry read, void *priv)
> +{
> +    struct domain_context c = {
> +        .copy.read = read,
> +        .priv = priv,
> +        .log = true,
> +    };
> +    struct domain_save_header h;
> +    int rc;
> +
> +    ASSERT(d != current->domain);
> +
> +    if ( d->is_dying )
> +        return -EINVAL;

Same here.


> diff --git a/xen/include/public/save.h b/xen/include/public/save.h
> new file mode 100644
> index 0000000000..7e5f8752bd
> --- /dev/null
> +++ b/xen/include/public/save.h
> @@ -0,0 +1,84 @@
> +/*
> + * save.h
> + *
> + * Structure definitions for common PV/HVM domain state that is held by
> + * Xen and must be saved along with the domain's memory.
> + *
> + * Copyright Amazon.com Inc. or its affiliates.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to
> + * deal in the Software without restriction, including without limitation the
> + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
> + * sell copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef __XEN_PUBLIC_SAVE_H__
> +#define __XEN_PUBLIC_SAVE_H__

Does this header need to be exposed outside of Xen and the tools?

> +
> +#include "xen.h"
> +
> +/* Each entry is preceded by a descriptor */
> +struct domain_save_descriptor {
> +    uint16_t typecode;
> +    /*
> +     * Each entry will contain either to global or per-vcpu domain state.
> +     * Entries relating to global state should have zero in this field.
> +     */
> +    uint16_t vcpu_id;
> +    uint32_t flags;
> +    /*
> +     * When restoring state this flag can be set in a descriptor to cause
> +     * its content to be ignored.

Could you give examples where you would want to ignore a descriptor?

> +     *
> +     * NOTE: It is invalid to set this flag for HEADER or END records (see
> +     *       below).
> +     */
> +#define _DOMAIN_SAVE_FLAG_IGNORE 0
> +#define DOMAIN_SAVE_FLAG_IGNORE (1u << _DOMAIN_SAVE_FLAG_IGNORE)
> +
> +    /* Entry length not including this descriptor */
> +    uint64_t length;
> +};
> +
> +/*
> + * Each entry has a type associated with it. DECLARE_DOMAIN_SAVE_TYPE
> + * binds these things together.
> + */
> +#define DECLARE_DOMAIN_SAVE_TYPE(_x, _code, _type) \
> +    struct __DOMAIN_SAVE_TYPE_##_x { char c[_code]; _type t; };
> +
> +#define DOMAIN_SAVE_CODE(_x) \
> +    (sizeof(((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->c))
> +#define DOMAIN_SAVE_TYPE(_x) \
> +    typeof(((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->t)
> +
> +/* Terminating entry */
> +struct domain_save_end {};
> +DECLARE_DOMAIN_SAVE_TYPE(END, 0, struct domain_save_end);
> +
> +#define DOMAIN_SAVE_MAGIC   0x53415645
> +#define DOMAIN_SAVE_VERSION 0x00000001
> +
> +/* Initial entry */
> +struct domain_save_header {
> +    uint32_t magic;             /* Must be DOMAIN_SAVE_MAGIC */
> +    uint32_t version;           /* Save format version */


I haven't seen any answer about xen version here. For the record:

"Let's imagine in 4.14 we introduced a bug in the saving part. This is
solved in 4.15 but somehow the version is not bumped. How would you
differentiate the streams saved by Xen 4.14 so you can still migrate?

If you record the version of Xen in the record automatically, then you
at least have a way to differentiate the two versions."

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext
  2020-04-07 17:38 ` [PATCH v2 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext Paul Durrant
@ 2020-04-20 17:26   ` Julien Grall
  2020-04-28 15:36     ` Paul Durrant
  2020-04-29 14:50   ` Jan Beulich
  1 sibling, 1 reply; 30+ messages in thread
From: Julien Grall @ 2020-04-20 17:26 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Paul Durrant,
	Ian Jackson, George Dunlap, Jan Beulich, Daniel De Graaf

Hi Paul,

On 07/04/2020 18:38, Paul Durrant wrote:
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 1ad34c35eb..8ab39acf0c 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -38,7 +38,7 @@
>   #include "hvm/save.h"
>   #include "memory.h"
>   
> -#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
> +#define XEN_DOMCTL_INTERFACE_VERSION 0x00000013
>   
>   /*
>    * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
> @@ -1129,6 +1129,44 @@ struct xen_domctl_vuart_op {
>                                    */
>   };
>   
> +/*
> + * Get/Set domain PV context. The same struct xen_domctl_domaincontext

I think you want to update the comments to match the split.

> + * is used for both commands but with slightly different field semantics
> + * as follows:

Reviewed-by: Julien Grall <jgrall@amazon.com>


Cheers,


-- 
Julien Grall


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 4/5] common/domain: add a domain context record for shared_info...
  2020-04-07 17:38 ` [PATCH v2 4/5] common/domain: add a domain context record for shared_info Paul Durrant
@ 2020-04-20 17:34   ` Julien Grall
  2020-04-28 15:37     ` Paul Durrant
  2020-04-30 11:56   ` Jan Beulich
  1 sibling, 1 reply; 30+ messages in thread
From: Julien Grall @ 2020-04-20 17:34 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Paul Durrant,
	Ian Jackson, George Dunlap, Jan Beulich

Hi Paul,

On 07/04/2020 18:38, Paul Durrant wrote:
> ... and update xen-domctx to dump some information describing the record.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wl@xen.org>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Julien Grall <julien@xen.org>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> 
> v2:
>   - Drop the header change to define a 'Xen' page size and instead use a
>     variable length struct now that the framework makes this is feasible
>   - Guard use of 'has_32bit_shinfo' in common code with CONFIG_COMPAT
> ---
>   tools/misc/xen-domctx.c   | 11 ++++++
>   xen/common/domain.c       | 81 +++++++++++++++++++++++++++++++++++++++
>   xen/include/public/save.h | 10 ++++-
>   3 files changed, 101 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/misc/xen-domctx.c b/tools/misc/xen-domctx.c
> index d663522a8b..a8d3922321 100644
> --- a/tools/misc/xen-domctx.c
> +++ b/tools/misc/xen-domctx.c
> @@ -59,6 +59,16 @@ static void dump_header(struct domain_save_descriptor *desc)
>       off += desc->length;
>   }
>   
> +static void dump_shared_info(struct domain_save_descriptor *desc)
> +{
> +    DOMAIN_SAVE_TYPE(SHARED_INFO) s;
> +    READ(s);
> +    printf("    SHARED_INFO: field_width %u buffer size: %lu\n",
> +           s.field_width, desc->length - sizeof(s));
> +
> +    off += desc->length;
> +}
> +
>   static void dump_end(struct domain_save_descriptor *desc)
>   {
>       DOMAIN_SAVE_TYPE(END) e;
> @@ -125,6 +135,7 @@ int main(int argc, char **argv)
>           switch (desc.typecode)
>           {
>           case DOMAIN_SAVE_CODE(HEADER): dump_header(&desc); break;
> +        case DOMAIN_SAVE_CODE(SHARED_INFO): dump_shared_info(&desc); break;
>           case DOMAIN_SAVE_CODE(END): dump_end(&desc); return 0;
>           default:
>               printf("Unknown type %u: skipping\n", desc.typecode);
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 3dcd73f67c..8b72462e07 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -33,6 +33,7 @@
>   #include <xen/xenoprof.h>
>   #include <xen/irq.h>
>   #include <xen/argo.h>
> +#include <xen/save.h>
>   #include <asm/debugger.h>
>   #include <asm/p2m.h>
>   #include <asm/processor.h>
> @@ -1646,6 +1647,86 @@ int continue_hypercall_on_cpu(
>       return 0;
>   }
>   
> +static int save_shared_info(const struct vcpu *v, struct domain_context *c,
> +                            bool dry_run)
> +{
> +    struct domain *d = v->domain;
> +    struct domain_shared_info_context ctxt = {};
> +    size_t hdr_size = offsetof(typeof(ctxt), buffer);
> +    size_t size = hdr_size + PAGE_SIZE;
> +    int rc;
> +
> +    rc = DOMAIN_SAVE_BEGIN(SHARED_INFO, c, v, size);
> +    if ( rc )
> +        return rc;
> +
> +    if ( !dry_run )

NIT: I think the if is not necessary here as you don't skip that much code.

> +        ctxt.field_width =
> +#ifdef CONFIG_COMPAT
> +            has_32bit_shinfo(d) ? 4 :
> +#endif
> +            8;
> +
> +    rc = domain_save_data(c, &ctxt, hdr_size);
> +    if ( rc )
> +        return rc;
> +
> +    rc = domain_save_data(c, d->shared_info, PAGE_SIZE);
> +    if ( rc )
> +        return rc;
> +
> +    return domain_save_end(c);
> +}
> +
> +static int load_shared_info(struct vcpu *v, struct domain_context *c)
> +{
> +    struct domain *d = v->domain;
> +    struct domain_shared_info_context ctxt = {};
> +    size_t hdr_size = offsetof(typeof(ctxt), buffer);
> +    size_t size = hdr_size + PAGE_SIZE;
> +    unsigned int i;
> +    int rc;
> +
> +    rc = DOMAIN_LOAD_BEGIN(SHARED_INFO, c, v, size, true);
> +    if ( rc )
> +        return rc;
> +
> +    rc = domain_load_data(c, &ctxt, hdr_size);
> +    if ( rc )
> +        return rc;
> +
> +    for ( i = 0; i < ARRAY_SIZE(ctxt.pad); i++ )
> +        if ( ctxt.pad[i] )
> +            return -EINVAL;
> +
> +    switch ( ctxt.field_width )
> +    {
> +#ifdef CONFIG_COMPAT
> +    case 4:
> +        d->arch.has_32bit_shinfo = 1;
> +        break;
> +#endif
> +    case 8:
> +#ifdef CONFIG_COMPAT
> +        d->arch.has_32bit_shinfo = 0;
> +#endif
> +        break;
> +
> +    default:
> +        rc = -EINVAL;
> +        break;
> +    }
> +
> +    rc = domain_load_data(c, d->shared_info, PAGE_SIZE);
> +    if ( rc )
> +        return rc;
> +
> +    return domain_load_end(c);
> +}
> +
> +DOMAIN_REGISTER_SAVE_RESTORE(SHARED_INFO, false, save_shared_info,
> +                             load_shared_info);
> +
>   /*
>    * Local variables:
>    * mode: C
> diff --git a/xen/include/public/save.h b/xen/include/public/save.h
> index 7e5f8752bd..ed994a8765 100644
> --- a/xen/include/public/save.h
> +++ b/xen/include/public/save.h
> @@ -79,6 +79,14 @@ struct domain_save_header {
>   };
>   DECLARE_DOMAIN_SAVE_TYPE(HEADER, 1, struct domain_save_header);
>   
> -#define DOMAIN_SAVE_CODE_MAX 1
> +struct domain_shared_info_context {
> +    uint8_t field_width;
> +    uint8_t pad[7];
> +    uint8_t buffer[]; /* Implementation specific size */

I would recommend to use buffer[XEN_FLEX_ARRAY_DIM].

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
  2020-04-20 17:20   ` Julien Grall
@ 2020-04-28 15:35     ` Paul Durrant
  2020-04-29 11:05       ` Julien Grall
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Durrant @ 2020-04-28 15:35 UTC (permalink / raw)
  To: 'Julien Grall', xen-devel
  Cc: 'Stefano Stabellini', 'Wei Liu',
	'Andrew Cooper', 'Paul Durrant',
	'Ian Jackson', 'George Dunlap',
	'Jan Beulich', 'Volodymyr Babchuk',
	'Roger Pau Monné'

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 April 2020 18:21
> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> Cc: Paul Durrant <pdurrant@amazon.com>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
> <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Jan Beulich <jbeulich@suse.com>;
> Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Volodymyr Babchuk
> <Volodymyr_Babchuk@epam.com>; Roger Pau Monné <roger.pau@citrix.com>
> Subject: Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
> 
> Hi Paul,
> 
> On 07/04/2020 18:38, Paul Durrant wrote:
> > To allow enlightened HVM guests (i.e. those that have PV drivers) to be
> > migrated without their co-operation it will be necessary to transfer 'PV'
> > state such as event channel state, grant entry state, etc.
> >
> > Currently there is a framework (entered via the hvm_save/load() functions)
> > that allows a domain's 'HVM' (architectural) state to be transferred but
> > 'PV' state is also common with pure PV guests and so this framework is not
> > really suitable.
> >
> > This patch adds the new public header and low level implementation of a new
> > common framework, entered via the domain_save/load() functions. Subsequent
> > patches will introduce other parts of the framework, and code that will
> > make use of it within the current version of the libxc migration stream.
> >
> > This patch also marks the HVM-only framework as deprecated in favour of the
> > new framework.
> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> > ---
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: George Dunlap <george.dunlap@citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Julien Grall <julien@xen.org>
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Wei Liu <wl@xen.org>
> > Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> > Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> >
> > v2:
> >   - Allow multi-stage save/load to avoid the need to double-buffer
> >   - Get rid of the masks and add an 'ignore' flag instead
> >   - Create copy function union to preserve const save buffer
> >   - Deprecate HVM-only framework
> > ---
> >   xen/common/Makefile                    |   1 +
> >   xen/common/save.c                      | 329 +++++++++++++++++++++++++
> >   xen/include/public/arch-arm/hvm/save.h |   5 +
> >   xen/include/public/arch-x86/hvm/save.h |   5 +
> >   xen/include/public/save.h              |  84 +++++++
> >   xen/include/xen/save.h                 | 152 ++++++++++++
> >   6 files changed, 576 insertions(+)
> >   create mode 100644 xen/common/save.c
> >   create mode 100644 xen/include/public/save.h
> >   create mode 100644 xen/include/xen/save.h
> >
> > diff --git a/xen/common/Makefile b/xen/common/Makefile
> > index e8cde65370..90553ba5d7 100644
> > --- a/xen/common/Makefile
> > +++ b/xen/common/Makefile
> > @@ -37,6 +37,7 @@ obj-y += radix-tree.o
> >   obj-y += rbtree.o
> >   obj-y += rcupdate.o
> >   obj-y += rwlock.o
> > +obj-y += save.o
> >   obj-y += shutdown.o
> >   obj-y += softirq.o
> >   obj-y += sort.o
> > diff --git a/xen/common/save.c b/xen/common/save.c
> > new file mode 100644
> > index 0000000000..6cdac3785b
> > --- /dev/null
> > +++ b/xen/common/save.c
> > @@ -0,0 +1,329 @@
> > +/*
> > + * save.c: Save and restore PV guest state common to all domain types.
> > + *
> > + * Copyright Amazon.com Inc. or its affiliates.
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms and conditions of the GNU General Public License,
> > + * version 2, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope it will be useful, but WITHOUT
> > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > + * more details.
> > + *
> > + * You should have received a copy of the GNU General Public License along with
> > + * this program; If not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#include <xen/save.h>
> > +
> > +union domain_copy_entry {
> > +    domain_write_entry write;
> > +    domain_read_entry read;
> > +};
> > +
> > +struct domain_context {
> > +    bool log;
> > +    struct domain_save_descriptor desc;
> > +    size_t data_len;
> 
> What is data_len?
> 

It’s used for internal accounting.

> > +    union domain_copy_entry copy;
> > +    void *priv;
> > +};
> > +
> > +static struct {
> > +    const char *name;
> > +    bool per_vcpu;
> > +    domain_save_handler save;
> > +    domain_load_handler load;
> > +} handlers[DOMAIN_SAVE_CODE_MAX + 1];
> > +
> > +void __init domain_register_save_type(unsigned int tc, const char *name,
> > +                                      bool per_vcpu,
> > +                                      domain_save_handler save,
> > +                                      domain_load_handler load)
> > +{
> > +    BUG_ON(tc > ARRAY_SIZE(handlers));
> > +
> > +    ASSERT(!handlers[tc].save);
> > +    ASSERT(!handlers[tc].load);
> > +
> > +    handlers[tc].name = name;
> > +    handlers[tc].per_vcpu = per_vcpu;
> > +    handlers[tc].save = save;
> > +    handlers[tc].load = load;
> > +}
> > +
> > +int domain_save_begin(struct domain_context *c, unsigned int tc,
> > +                      const char *name, const struct vcpu *v, size_t len)
> > +{
> > +    int rc;
> > +
> > +    if ( c->log )
> > +        gdprintk(XENLOG_INFO, "%pv save: %s (%lu)\n", v, name,
> > +                 (unsigned long)len);
> 
> How about using %zu rather than a cast here?
> 

Yes, that would be better.

> > +
> > +    BUG_ON(tc != c->desc.typecode);
> > +    BUG_ON(v->vcpu_id != c->desc.vcpu_id);
> 
> I can't find any answer on my question about this part. I understand the
> code would be buggy if this happen, but is it warrant to crash the host?
> Couldn't you just return an error and continue to run?
> 

Since it means buggy code I could ASSERT and error out, yes.

> > +
> > +    ASSERT(!c->data_len);
> > +    c->data_len = c->desc.length = len;
> > +
> > +    rc = c->copy.write(c->priv, &c->desc, sizeof(c->desc));
> > +    if ( rc )
> > +        return rc;
> > +
> > +    c->desc.length = 0;
> 
> This is confusing, why would you need to reset c->desc.length but not
> the rest of the fields?
> 

Because I use it to accumulate the length of the saved data and then cross check it against data_len in domain_save_end() below.

> > +
> > +    return 0;
> > +}
> > +
> > +int domain_save_data(struct domain_context *c, const void *src, size_t len)
> > +{
> > +    if ( c->desc.length + len > c->data_len )
> > +        return -ENOSPC;
> > +
> > +    c->desc.length += len;
> > +
> > +    return c->copy.write(c->priv, src, len);
> > +}
> > +
> > +int domain_save_end(struct domain_context *c)
> > +{
> > +    /*
> > +     * If desc.length does not match the length specified in
> > +     * domain_save_begin(), there should have been more data.
> > +     */
> > +    if ( c->desc.length != c->data_len )
> 
> This suggests you know in advance the size of the record.

That depends on what you mean by 'in advance'. I'd expect the caller of domain_save_begin() to know exactly.

> I have seen
> some cases where we don't know the size in advance. A good example if
> when saving grants. You know the maximum of grant used by the guest but
> you don't know yet the number of grants used. You might need to walk the
> "list" twice or allocate a temporary buffer. None of them are ideal.
> 
> Another example is when saving memory, we may want to compact page
> informations to save space.
> 
> This problem is going to be more relevant for LiveUpdate where we need
> to be able to create the stream in a very short amount of time.
> Allocating any temporary buffer and/or walking the list twice is going
> to kill the performance.
> 
> I would suggest to consider a different approach where you update the
> record length at the end.
> 
> FWIW, this below the current approach for the LU stream (IIRC David sent
> a prototype recently). A record is opened using lu_stream_open_record(),
> you then have two way to add data:
>      - lu_stream_append() -> This takes a buffer and write to the stream.
>      - lu_stream_reserve() -> Pre-allocate space in the stream and
> return a pointer to the beginning of the reserved region.
>      - lu_stream_end_reservation() -> Takes the actual size in parameter
> and update the stream size.
>      - lu_stream_close_record() -> Update the header with the total
> length and update the position in the stream.
> 

That sounds quite LU specific. For LM we still need to know up-front the maximal size of the buffer, and I was trying to work on the basis of never having to update previously saved data but I guess there's no actual harm in doing so, so we could avoid domain_save_begin() specifying the length.

> > +        return -EIO;
> 
> I noticed that all the pasding check have been dropped. I still think we
> need implicit padding to harden the code.
> 

I'd still view that as buggy code.

> > +
> > +    c->data_len = 0;
> > +
> > +    return 0;
> > +}
> > +
> > +int domain_save(struct domain *d, domain_write_entry write, void *priv,
> > +                bool dry_run)
> > +{
> > +    struct domain_context c = {
> > +        .copy.write = write,
> > +        .priv = priv,
> > +        .log = !dry_run,
> > +    };
> > +    struct domain_save_header h = {
> > +        .magic = DOMAIN_SAVE_MAGIC,
> > +        .version = DOMAIN_SAVE_VERSION,
> > +    };
> > +    struct domain_save_header e;
> > +    unsigned int i;
> > +    int rc;
> > +
> > +    ASSERT(d != current->domain);
> > +
> > +    if ( d->is_dying )
> 
> I can't find an answer to my question about d->is_dying. What if the
> guest die afterwards? What does protect us against domain_kill()?
> 
> [...]

As I said in a previous response, nothing protects against domain_kill(), this check is just supposed to avoid doing 'unnecessary' work for a domain we know is already dying. For LU though I guess we do need to save (some) state for even a dying domain, so the individual save handlers actually need to make the decision on whether they are going to do anything.

> 
> > +int domain_load(struct domain *d, domain_read_entry read, void *priv)
> > +{
> > +    struct domain_context c = {
> > +        .copy.read = read,
> > +        .priv = priv,
> > +        .log = true,
> > +    };
> > +    struct domain_save_header h;
> > +    int rc;
> > +
> > +    ASSERT(d != current->domain);
> > +
> > +    if ( d->is_dying )
> > +        return -EINVAL;
> 
> Same here.
> 
> 
> > diff --git a/xen/include/public/save.h b/xen/include/public/save.h
> > new file mode 100644
> > index 0000000000..7e5f8752bd
> > --- /dev/null
> > +++ b/xen/include/public/save.h
> > @@ -0,0 +1,84 @@
> > +/*
> > + * save.h
> > + *
> > + * Structure definitions for common PV/HVM domain state that is held by
> > + * Xen and must be saved along with the domain's memory.
> > + *
> > + * Copyright Amazon.com Inc. or its affiliates.
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a copy
> > + * of this software and associated documentation files (the "Software"), to
> > + * deal in the Software without restriction, including without limitation the
> > + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
> > + * sell copies of the Software, and to permit persons to whom the Software is
> > + * furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> > + * DEALINGS IN THE SOFTWARE.
> > + */
> > +
> > +#ifndef __XEN_PUBLIC_SAVE_H__
> > +#define __XEN_PUBLIC_SAVE_H__
> 
> Does this header need to be exposed outside of Xen and the tools?
> 

Probably not.

> > +
> > +#include "xen.h"
> > +
> > +/* Each entry is preceded by a descriptor */
> > +struct domain_save_descriptor {
> > +    uint16_t typecode;
> > +    /*
> > +     * Each entry will contain either to global or per-vcpu domain state.
> > +     * Entries relating to global state should have zero in this field.
> > +     */
> > +    uint16_t vcpu_id;
> > +    uint32_t flags;
> > +    /*
> > +     * When restoring state this flag can be set in a descriptor to cause
> > +     * its content to be ignored.
> 
> Could you give examples where you would want to ignore a descriptor?
> 

I was thinking of the case when, e.g. you want to update something in the shared_info... You save a context blob, modify the shared_info record, and then restore the context blob with all other records marked as 'ignore' since they have not been modified.

> > +     *
> > +     * NOTE: It is invalid to set this flag for HEADER or END records (see
> > +     *       below).
> > +     */
> > +#define _DOMAIN_SAVE_FLAG_IGNORE 0
> > +#define DOMAIN_SAVE_FLAG_IGNORE (1u << _DOMAIN_SAVE_FLAG_IGNORE)
> > +
> > +    /* Entry length not including this descriptor */
> > +    uint64_t length;
> > +};
> > +
> > +/*
> > + * Each entry has a type associated with it. DECLARE_DOMAIN_SAVE_TYPE
> > + * binds these things together.
> > + */
> > +#define DECLARE_DOMAIN_SAVE_TYPE(_x, _code, _type) \
> > +    struct __DOMAIN_SAVE_TYPE_##_x { char c[_code]; _type t; };
> > +
> > +#define DOMAIN_SAVE_CODE(_x) \
> > +    (sizeof(((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->c))
> > +#define DOMAIN_SAVE_TYPE(_x) \
> > +    typeof(((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->t)
> > +
> > +/* Terminating entry */
> > +struct domain_save_end {};
> > +DECLARE_DOMAIN_SAVE_TYPE(END, 0, struct domain_save_end);
> > +
> > +#define DOMAIN_SAVE_MAGIC   0x53415645
> > +#define DOMAIN_SAVE_VERSION 0x00000001
> > +
> > +/* Initial entry */
> > +struct domain_save_header {
> > +    uint32_t magic;             /* Must be DOMAIN_SAVE_MAGIC */
> > +    uint32_t version;           /* Save format version */
> 
> 
> I haven't seen any answer about xen version here. For the record:
> 
> "Let's imagine in 4.14 we introduced a bug in the saving part. This is
> solved in 4.15 but somehow the version is not bumped. How would you
> differentiate the streams saved by Xen 4.14 so you can still migrate?

I'd assume testing would at least save and restore on 4.14. If we then fixed a bug then why would we not bump the version?

> 
> If you record the version of Xen in the record automatically, then you
> at least have a way to differentiate the two versions."
> 

OK, I guess redundant version information is not going to be harmful.

  Paul



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext
  2020-04-20 17:26   ` Julien Grall
@ 2020-04-28 15:36     ` Paul Durrant
  0 siblings, 0 replies; 30+ messages in thread
From: Paul Durrant @ 2020-04-28 15:36 UTC (permalink / raw)
  To: 'Julien Grall', xen-devel
  Cc: 'Stefano Stabellini', 'Wei Liu',
	'Andrew Cooper', 'Paul Durrant',
	'Ian Jackson', 'George Dunlap',
	'Jan Beulich', 'Daniel De Graaf'

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 April 2020 18:26
> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> Cc: Paul Durrant <pdurrant@amazon.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>; Ian Jackson
> <ian.jackson@eu.citrix.com>; Wei Liu <wl@xen.org>; Andrew Cooper <andrew.cooper3@citrix.com>; George
> Dunlap <george.dunlap@citrix.com>; Jan Beulich <jbeulich@suse.com>; Stefano Stabellini
> <sstabellini@kernel.org>
> Subject: Re: [PATCH v2 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext
> 
> Hi Paul,
> 
> On 07/04/2020 18:38, Paul Durrant wrote:
> > diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> > index 1ad34c35eb..8ab39acf0c 100644
> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -38,7 +38,7 @@
> >   #include "hvm/save.h"
> >   #include "memory.h"
> >
> > -#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
> > +#define XEN_DOMCTL_INTERFACE_VERSION 0x00000013
> >
> >   /*
> >    * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
> > @@ -1129,6 +1129,44 @@ struct xen_domctl_vuart_op {
> >                                    */
> >   };
> >
> > +/*
> > + * Get/Set domain PV context. The same struct xen_domctl_domaincontext
> 
> I think you want to update the comments to match the split.

Oh yes.

> 
> > + * is used for both commands but with slightly different field semantics
> > + * as follows:
> 
> Reviewed-by: Julien Grall <jgrall@amazon.com>
> 

Thanks,

  Paul

> 
> Cheers,
> 
> 
> --
> Julien Grall



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2 4/5] common/domain: add a domain context record for shared_info...
  2020-04-20 17:34   ` Julien Grall
@ 2020-04-28 15:37     ` Paul Durrant
  2020-04-30 11:29       ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Durrant @ 2020-04-28 15:37 UTC (permalink / raw)
  To: 'Julien Grall', xen-devel
  Cc: 'Stefano Stabellini', 'Wei Liu',
	'Andrew Cooper', 'Paul Durrant',
	'Ian Jackson', 'George Dunlap',
	'Jan Beulich'

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 April 2020 18:35
> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> Cc: Paul Durrant <pdurrant@amazon.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Wei Liu <wl@xen.org>;
> Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap <george.dunlap@citrix.com>; Jan Beulich
> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>
> Subject: Re: [PATCH v2 4/5] common/domain: add a domain context record for shared_info...
> 
> Hi Paul,
> 
> On 07/04/2020 18:38, Paul Durrant wrote:
> > ... and update xen-domctx to dump some information describing the record.
> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> > ---
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Wei Liu <wl@xen.org>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: George Dunlap <george.dunlap@citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Julien Grall <julien@xen.org>
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> >
> > v2:
> >   - Drop the header change to define a 'Xen' page size and instead use a
> >     variable length struct now that the framework makes this is feasible
> >   - Guard use of 'has_32bit_shinfo' in common code with CONFIG_COMPAT
> > ---
> >   tools/misc/xen-domctx.c   | 11 ++++++
> >   xen/common/domain.c       | 81 +++++++++++++++++++++++++++++++++++++++
> >   xen/include/public/save.h | 10 ++++-
> >   3 files changed, 101 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/misc/xen-domctx.c b/tools/misc/xen-domctx.c
> > index d663522a8b..a8d3922321 100644
> > --- a/tools/misc/xen-domctx.c
> > +++ b/tools/misc/xen-domctx.c
> > @@ -59,6 +59,16 @@ static void dump_header(struct domain_save_descriptor *desc)
> >       off += desc->length;
> >   }
> >
> > +static void dump_shared_info(struct domain_save_descriptor *desc)
> > +{
> > +    DOMAIN_SAVE_TYPE(SHARED_INFO) s;
> > +    READ(s);
> > +    printf("    SHARED_INFO: field_width %u buffer size: %lu\n",
> > +           s.field_width, desc->length - sizeof(s));
> > +
> > +    off += desc->length;
> > +}
> > +
> >   static void dump_end(struct domain_save_descriptor *desc)
> >   {
> >       DOMAIN_SAVE_TYPE(END) e;
> > @@ -125,6 +135,7 @@ int main(int argc, char **argv)
> >           switch (desc.typecode)
> >           {
> >           case DOMAIN_SAVE_CODE(HEADER): dump_header(&desc); break;
> > +        case DOMAIN_SAVE_CODE(SHARED_INFO): dump_shared_info(&desc); break;
> >           case DOMAIN_SAVE_CODE(END): dump_end(&desc); return 0;
> >           default:
> >               printf("Unknown type %u: skipping\n", desc.typecode);
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index 3dcd73f67c..8b72462e07 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -33,6 +33,7 @@
> >   #include <xen/xenoprof.h>
> >   #include <xen/irq.h>
> >   #include <xen/argo.h>
> > +#include <xen/save.h>
> >   #include <asm/debugger.h>
> >   #include <asm/p2m.h>
> >   #include <asm/processor.h>
> > @@ -1646,6 +1647,86 @@ int continue_hypercall_on_cpu(
> >       return 0;
> >   }
> >
> > +static int save_shared_info(const struct vcpu *v, struct domain_context *c,
> > +                            bool dry_run)
> > +{
> > +    struct domain *d = v->domain;
> > +    struct domain_shared_info_context ctxt = {};
> > +    size_t hdr_size = offsetof(typeof(ctxt), buffer);
> > +    size_t size = hdr_size + PAGE_SIZE;
> > +    int rc;
> > +
> > +    rc = DOMAIN_SAVE_BEGIN(SHARED_INFO, c, v, size);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    if ( !dry_run )
> 
> NIT: I think the if is not necessary here as you don't skip that much code.
> 

I know, but it is illustrative so I'd rather keep it.

> > +        ctxt.field_width =
> > +#ifdef CONFIG_COMPAT
> > +            has_32bit_shinfo(d) ? 4 :
> > +#endif
> > +            8;
> > +
> > +    rc = domain_save_data(c, &ctxt, hdr_size);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    rc = domain_save_data(c, d->shared_info, PAGE_SIZE);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    return domain_save_end(c);
> > +}
> > +
> > +static int load_shared_info(struct vcpu *v, struct domain_context *c)
> > +{
> > +    struct domain *d = v->domain;
> > +    struct domain_shared_info_context ctxt = {};
> > +    size_t hdr_size = offsetof(typeof(ctxt), buffer);
> > +    size_t size = hdr_size + PAGE_SIZE;
> > +    unsigned int i;
> > +    int rc;
> > +
> > +    rc = DOMAIN_LOAD_BEGIN(SHARED_INFO, c, v, size, true);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    rc = domain_load_data(c, &ctxt, hdr_size);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    for ( i = 0; i < ARRAY_SIZE(ctxt.pad); i++ )
> > +        if ( ctxt.pad[i] )
> > +            return -EINVAL;
> > +
> > +    switch ( ctxt.field_width )
> > +    {
> > +#ifdef CONFIG_COMPAT
> > +    case 4:
> > +        d->arch.has_32bit_shinfo = 1;
> > +        break;
> > +#endif
> > +    case 8:
> > +#ifdef CONFIG_COMPAT
> > +        d->arch.has_32bit_shinfo = 0;
> > +#endif
> > +        break;
> > +
> > +    default:
> > +        rc = -EINVAL;
> > +        break;
> > +    }
> > +
> > +    rc = domain_load_data(c, d->shared_info, PAGE_SIZE);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    return domain_load_end(c);
> > +}
> > +
> > +DOMAIN_REGISTER_SAVE_RESTORE(SHARED_INFO, false, save_shared_info,
> > +                             load_shared_info);
> > +
> >   /*
> >    * Local variables:
> >    * mode: C
> > diff --git a/xen/include/public/save.h b/xen/include/public/save.h
> > index 7e5f8752bd..ed994a8765 100644
> > --- a/xen/include/public/save.h
> > +++ b/xen/include/public/save.h
> > @@ -79,6 +79,14 @@ struct domain_save_header {
> >   };
> >   DECLARE_DOMAIN_SAVE_TYPE(HEADER, 1, struct domain_save_header);
> >
> > -#define DOMAIN_SAVE_CODE_MAX 1
> > +struct domain_shared_info_context {
> > +    uint8_t field_width;
> > +    uint8_t pad[7];
> > +    uint8_t buffer[]; /* Implementation specific size */
> 
> I would recommend to use buffer[XEN_FLEX_ARRAY_DIM].
> 

Ok.

  Paul

> Cheers,
> 
> --
> Julien Grall



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
  2020-04-07 17:38 ` [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context Paul Durrant
  2020-04-20 17:20   ` Julien Grall
@ 2020-04-29 11:02   ` Jan Beulich
  2020-05-06 16:44     ` Paul Durrant
  1 sibling, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2020-04-29 11:02 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Paul Durrant, Ian Jackson, George Dunlap, xen-devel,
	Volodymyr Babchuk, Roger Pau Monné

On 07.04.2020 19:38, Paul Durrant wrote:
> +void __init domain_register_save_type(unsigned int tc, const char *name,
> +                                      bool per_vcpu,
> +                                      domain_save_handler save,
> +                                      domain_load_handler load)
> +{
> +    BUG_ON(tc > ARRAY_SIZE(handlers));

>=

> +    ASSERT(!handlers[tc].save);
> +    ASSERT(!handlers[tc].load);
> +
> +    handlers[tc].name = name;
> +    handlers[tc].per_vcpu = per_vcpu;
> +    handlers[tc].save = save;
> +    handlers[tc].load = load;
> +}
> +
> +int domain_save_begin(struct domain_context *c, unsigned int tc,
> +                      const char *name, const struct vcpu *v, size_t len)

I find it quite odd for a function like this one to take a
struct vcpu * rather than a struct domain *. See also below
comment on the vcpu_id field in the public header.

> +{
> +    int rc;
> +
> +    if ( c->log )
> +        gdprintk(XENLOG_INFO, "%pv save: %s (%lu)\n", v, name,
> +                 (unsigned long)len);
> +
> +    BUG_ON(tc != c->desc.typecode);
> +    BUG_ON(v->vcpu_id != c->desc.vcpu_id);
> +
> +    ASSERT(!c->data_len);
> +    c->data_len = c->desc.length = len;
> +
> +    rc = c->copy.write(c->priv, &c->desc, sizeof(c->desc));
> +    if ( rc )
> +        return rc;
> +
> +    c->desc.length = 0;
> +
> +    return 0;
> +}
> +
> +int domain_save_data(struct domain_context *c, const void *src, size_t len)
> +{
> +    if ( c->desc.length + len > c->data_len )
> +        return -ENOSPC;
> +
> +    c->desc.length += len;
> +
> +    return c->copy.write(c->priv, src, len);
> +}
> +
> +int domain_save_end(struct domain_context *c)

I'm sure there is a reason for the split into three load/save
functions (begin/data/end), but I can't figure it and the
description also doesn't explain it. They're all used together
only afaics, in domain_{save,load}_entry(). Or wait, there are
DOMAIN_{SAVE,LOAD}_BEGIN() macros apparently allowing separate
use of ..._begin(), but then it's still not clear why ..._end()
need to be separate from ..._data().

> +int domain_save(struct domain *d, domain_write_entry write, void *priv,
> +                bool dry_run)
> +{
> +    struct domain_context c = {
> +        .copy.write = write,
> +        .priv = priv,
> +        .log = !dry_run,
> +    };
> +    struct domain_save_header h = {

const? Perhaps even static?

> +        .magic = DOMAIN_SAVE_MAGIC,
> +        .version = DOMAIN_SAVE_VERSION,
> +    };
> +    struct domain_save_header e;
> +    unsigned int i;
> +    int rc;
> +
> +    ASSERT(d != current->domain);
> +
> +    if ( d->is_dying )
> +        return -EINVAL;

Could I talk you into using less generic an error code here, e.g.
-ESRCH or -ENXIO? There look to be further uses of EINVAL that
may want replacing.

> +    domain_pause(d);
> +
> +    c.desc.typecode = DOMAIN_SAVE_CODE(HEADER);
> +
> +    rc = DOMAIN_SAVE_ENTRY(HEADER, &c, d->vcpu[0], &h, sizeof(h));
> +    if ( rc )
> +        goto out;
> +
> +    for ( i = 0; i < ARRAY_SIZE(handlers); i++ )
> +    {
> +        domain_save_handler save = handlers[i].save;
> +
> +        if ( !save )
> +            continue;
> +
> +        memset(&c.desc, 0, sizeof(c.desc));
> +        c.desc.typecode = i;
> +
> +        if ( handlers[i].per_vcpu )
> +        {
> +            struct vcpu *v;
> +
> +            for_each_vcpu ( d, v )
> +            {
> +                c.desc.vcpu_id = v->vcpu_id;
> +
> +                rc = save(v, &c, dry_run);
> +                if ( rc )
> +                    goto out;
> +            }
> +        }
> +        else
> +        {
> +            rc = save(d->vcpu[0], &c, dry_run);
> +            if ( rc )
> +                goto out;
> +        }
> +    }
> +
> +    memset(&c.desc, 0, sizeof(c.desc));
> +    c.desc.typecode = DOMAIN_SAVE_CODE(END);
> +
> +    rc = DOMAIN_SAVE_ENTRY(END, &c, d->vcpu[0], &e, 0);

By the looks of it you're passing uninitialized e here; it's just
that the struct has no members. It would look less odd if you used
NULL here. Otherwise please don't use literal 0, but sizeof() for
the last parameter.

> +int domain_load_begin(struct domain_context *c, unsigned int tc,
> +                      const char *name, const struct vcpu *v, size_t len,
> +                      bool exact)
> +{
> +    if ( c->log )
> +        gdprintk(XENLOG_INFO, "%pv load: %s (%lu)\n", v, name,
> +                 (unsigned long)len);
> +
> +    BUG_ON(tc != c->desc.typecode);
> +    BUG_ON(v->vcpu_id != c->desc.vcpu_id);
> +
> +    if ( (exact && (len != c->desc.length)) ||
> +         (len < c->desc.length) )
> +        return -EINVAL;

How about

    if ( exact ? len != c->desc.length
               : len < c->desc.length )

? I'm also unsure about the < - don't you mean > instead? Too
little data would be compensated by zero padding, but too
much data can't be dealt with. But maybe I'm getting the sense
of len wrong ...

> +int domain_load(struct domain *d, domain_read_entry read, void *priv)
> +{
> +    struct domain_context c = {
> +        .copy.read = read,
> +        .priv = priv,
> +        .log = true,
> +    };
> +    struct domain_save_header h;
> +    int rc;
> +
> +    ASSERT(d != current->domain);
> +
> +    if ( d->is_dying )
> +        return -EINVAL;
> +
> +    rc = c.copy.read(c.priv, &c.desc, sizeof(c.desc));
> +    if ( rc )
> +        return rc;
> +
> +    if ( c.desc.typecode != DOMAIN_SAVE_CODE(HEADER) || c.desc.vcpu_id ||
> +         c.desc.flags )
> +        return -EINVAL;
> +
> +    rc = DOMAIN_LOAD_ENTRY(HEADER, &c, d->vcpu[0], &h, sizeof(h), true);
> +    if ( rc )
> +        return rc;
> +
> +    if ( h.magic != DOMAIN_SAVE_MAGIC || h.version != DOMAIN_SAVE_VERSION )
> +        return -EINVAL;
> +
> +    domain_pause(d);
> +
> +    for (;;)
> +    {
> +        unsigned int i;
> +        unsigned int flags;
> +        domain_load_handler load;
> +        struct vcpu *v;
> +
> +        rc = c.copy.read(c.priv, &c.desc, sizeof(c.desc));
> +        if ( rc )
> +            break;
> +
> +        rc = -EINVAL;
> +
> +        flags = c.desc.flags;
> +        if ( flags & ~DOMAIN_SAVE_FLAG_IGNORE )
> +            break;
> +
> +        if ( c.desc.typecode == DOMAIN_SAVE_CODE(END) ) {

Brace placement.

> +            if ( !(flags & DOMAIN_SAVE_FLAG_IGNORE) )
> +                rc = DOMAIN_LOAD_ENTRY(END, &c, d->vcpu[0], NULL, 0, true);

The public header says DOMAIN_SAVE_FLAG_IGNORE is invalid for
END.

> +            break;
> +        }
> +
> +        i = c.desc.typecode;
> +        if ( i >= ARRAY_SIZE(handlers) )
> +            break;
> +
> +        if ( (!handlers[i].per_vcpu && c.desc.vcpu_id) ||
> +             (c.desc.vcpu_id >= d->max_vcpus) )
> +            break;
> +
> +        v = d->vcpu[c.desc.vcpu_id];
> +
> +        if ( flags & DOMAIN_SAVE_FLAG_IGNORE )
> +        {
> +            /* Sink the data */
> +            rc = domain_load_entry(&c, c.desc.typecode, "IGNORED",
> +                                   v, NULL, c.desc.length, true);

IOW the read handlers need to be able to deal with a NULL dst?
Sounds a little fragile to me. Is there a reason
domain_load_data() can't skip the call to read()?

> --- /dev/null
> +++ b/xen/include/public/save.h
> @@ -0,0 +1,84 @@
> +/*
> + * save.h
> + *
> + * Structure definitions for common PV/HVM domain state that is held by
> + * Xen and must be saved along with the domain's memory.
> + *
> + * Copyright Amazon.com Inc. or its affiliates.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to
> + * deal in the Software without restriction, including without limitation the
> + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
> + * sell copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef __XEN_PUBLIC_SAVE_H__
> +#define __XEN_PUBLIC_SAVE_H__
> +
> +#include "xen.h"
> +
> +/* Each entry is preceded by a descriptor */
> +struct domain_save_descriptor {

Throughout this new public header, let's please play nice in name
space terms: Prefix global scope items with xen_ / XEN_
respectively, and don't introduce violations of the standard's
rules (e.g. _DOMAIN_SAVE_FLAG_IGNORE below). The latter point also
goes for naming outside of the public header, of course.

> +    uint16_t typecode;
> +    /*
> +     * Each entry will contain either to global or per-vcpu domain state.

s/contain/apply/ or drop "to"?

> +     * Entries relating to global state should have zero in this field.

Is there a reason to specify zero?

> +     */
> +    uint16_t vcpu_id;

Seeing (possibly) multi-instance records in HVM state (PIC, IO-APIC, HPET),
wouldn't it be better to generalize this to ID, meaning "vCPU ID" just for
per-vCPU state?

> +    uint32_t flags;
> +    /*
> +     * When restoring state this flag can be set in a descriptor to cause
> +     * its content to be ignored.
> +     *
> +     * NOTE: It is invalid to set this flag for HEADER or END records (see
> +     *       below).
> +     */
> +#define _DOMAIN_SAVE_FLAG_IGNORE 0
> +#define DOMAIN_SAVE_FLAG_IGNORE (1u << _DOMAIN_SAVE_FLAG_IGNORE)

As per your reply to Julien I think this wants further clarification on
the intentions with this flag. I'm also uncertain it is a good idea to
allow such partial restores - there may be dependencies between records,
and hence things can easily go wrong in perhaps non-obvious ways.

> +    /* Entry length not including this descriptor */
> +    uint64_t length;

Do you really envision descriptors with more than 4Gb of data? If so,
for 32-bit purposes wouldn't this want to be uint64_aligned_t?

> +};
> +
> +/*
> + * Each entry has a type associated with it. DECLARE_DOMAIN_SAVE_TYPE
> + * binds these things together.
> + */
> +#define DECLARE_DOMAIN_SAVE_TYPE(_x, _code, _type) \
> +    struct __DOMAIN_SAVE_TYPE_##_x { char c[_code]; _type t; };
> +
> +#define DOMAIN_SAVE_CODE(_x) \
> +    (sizeof(((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->c))
> +#define DOMAIN_SAVE_TYPE(_x) \
> +    typeof(((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->t)

Just like the typeof() I dont think the sizeof() needs an outer
pair of parentheses. I also don't see why 0 gets parenthesized.

> +/* Terminating entry */
> +struct domain_save_end {};
> +DECLARE_DOMAIN_SAVE_TYPE(END, 0, struct domain_save_end);

If the header gets a __XEN__ || __XEN_TOOLS__ restriction, as
suggested by Julien, using 0 here may be fine. If not, char[0]
and typeof() aren't legal plain C and hence would need to be
avoided.

> --- /dev/null
> +++ b/xen/include/xen/save.h
> @@ -0,0 +1,152 @@
> +/*
> + * save.h: support routines for save/restore
> + *
> + * Copyright Amazon.com Inc. or its affiliates.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef __XEN_SAVE_H__
> +#define __XEN_SAVE_H__
> +
> +#include <xen/sched.h>
> +#include <xen/types.h>
> +#include <xen/init.h>

Please sort these.

> +#include <public/xen.h>
> +#include <public/save.h>

The latter includes the former anyway - is the former necessary
for some reason nevertheless?

> +struct domain_context;
> +
> +int domain_save_begin(struct domain_context *c, unsigned int tc,
> +                      const char *name, const struct vcpu *v, size_t len);
> +
> +#define DOMAIN_SAVE_BEGIN(_x, _c, _v, _len) \
> +        domain_save_begin((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_len))

In new code I'd like to ask for no leading underscores on macro
parameters as well as no unnecessary parenthesization of macro
parameters (e.g. when they simply get passed on to another macro
or function without being part of a larger expression).

> +int domain_save_data(struct domain_context *c, const void *data, size_t len);
> +int domain_save_end(struct domain_context *c);
> +
> +static inline int domain_save_entry(struct domain_context *c,
> +                                    unsigned int tc, const char *name,
> +                                    const struct vcpu *v, const void *src,
> +                                    size_t len)
> +{
> +    int rc;
> +
> +    rc = domain_save_begin(c, tc, name, v, len);
> +    if ( rc )
> +        return rc;
> +
> +    rc = domain_save_data(c, src, len);
> +    if ( rc )
> +        return rc;
> +
> +    return domain_save_end(c);
> +}
> +
> +#define DOMAIN_SAVE_ENTRY(_x, _c, _v, _src, _len) \
> +    domain_save_entry((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_src), (_len))
> +
> +int domain_load_begin(struct domain_context *c, unsigned int tc,
> +                      const char *name, const struct vcpu *v, size_t len,
> +                      bool exact);
> +
> +#define DOMAIN_LOAD_BEGIN(_x, _c, _v, _len, _exact) \
> +        domain_load_begin((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_len), \
> +                          (_exact));
> +
> +int domain_load_data(struct domain_context *c, void *data, size_t len);
> +int domain_load_end(struct domain_context *c);
> +
> +static inline int domain_load_entry(struct domain_context *c,
> +                                    unsigned int tc, const char *name,
> +                                    const struct vcpu *v, void *dst,
> +                                    size_t len, bool exact)
> +{
> +    int rc;
> +
> +    rc = domain_load_begin(c, tc, name, v, len, exact);
> +    if ( rc )
> +        return rc;
> +
> +    rc = domain_load_data(c, dst, len);
> +    if ( rc )
> +        return rc;
> +
> +    return domain_load_end(c);
> +}
> +
> +#define DOMAIN_LOAD_ENTRY(_x, _c, _v, _dst, _len, _exact) \
> +    domain_load_entry((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_dst), (_len), \
> +                          (_exact))
> +
> +/*
> + * The 'dry_run' flag indicates that the caller of domain_save() (see
> + * below) is not trying to actually acquire the data, only the size
> + * of the data. The save handler can therefore limit work to only that
> + * which is necessary to call DOMAIN_SAVE_BEGIN/ENTRY() with an accurate
> + * value for '_len'.
> + */
> +typedef int (*domain_save_handler)(const struct vcpu *v,
> +                                   struct domain_context *h,
> +                                   bool dry_run);
> +typedef int (*domain_load_handler)(struct vcpu *v,
> +                                   struct domain_context *h);

Does a load handler have a need to modify struct domain_context?

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
  2020-04-28 15:35     ` Paul Durrant
@ 2020-04-29 11:05       ` Julien Grall
  0 siblings, 0 replies; 30+ messages in thread
From: Julien Grall @ 2020-04-29 11:05 UTC (permalink / raw)
  To: xen-devel
  Cc: 'Stefano Stabellini', 'Wei Liu',
	'Andrew Cooper', 'Paul Durrant',
	'Ian Jackson', 'George Dunlap',
	'Jan Beulich', 'Volodymyr Babchuk',
	'Roger Pau Monné'

Hi Paul,

On 28/04/2020 16:35, Paul Durrant wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 20 April 2020 18:21
>> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
>> Cc: Paul Durrant <pdurrant@amazon.com>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
>> <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Jan Beulich <jbeulich@suse.com>;
>> Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Volodymyr Babchuk
>> <Volodymyr_Babchuk@epam.com>; Roger Pau Monné <roger.pau@citrix.com>
>> Subject: Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
>>
>> Hi Paul,
>>
>> On 07/04/2020 18:38, Paul Durrant wrote:
>>> To allow enlightened HVM guests (i.e. those that have PV drivers) to be
>>> migrated without their co-operation it will be necessary to transfer 'PV'
>>> state such as event channel state, grant entry state, etc.
>>>
>>> Currently there is a framework (entered via the hvm_save/load() functions)
>>> that allows a domain's 'HVM' (architectural) state to be transferred but
>>> 'PV' state is also common with pure PV guests and so this framework is not
>>> really suitable.
>>>
>>> This patch adds the new public header and low level implementation of a new
>>> common framework, entered via the domain_save/load() functions. Subsequent
>>> patches will introduce other parts of the framework, and code that will
>>> make use of it within the current version of the libxc migration stream.
>>>
>>> This patch also marks the HVM-only framework as deprecated in favour of the
>>> new framework.
>>>
>>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
>>> ---
>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Cc: George Dunlap <george.dunlap@citrix.com>
>>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> Cc: Julien Grall <julien@xen.org>
>>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>>> Cc: Wei Liu <wl@xen.org>
>>> Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
>>> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
>>>
>>> v2:
>>>    - Allow multi-stage save/load to avoid the need to double-buffer
>>>    - Get rid of the masks and add an 'ignore' flag instead
>>>    - Create copy function union to preserve const save buffer
>>>    - Deprecate HVM-only framework
>>> ---
>>>    xen/common/Makefile                    |   1 +
>>>    xen/common/save.c                      | 329 +++++++++++++++++++++++++
>>>    xen/include/public/arch-arm/hvm/save.h |   5 +
>>>    xen/include/public/arch-x86/hvm/save.h |   5 +
>>>    xen/include/public/save.h              |  84 +++++++
>>>    xen/include/xen/save.h                 | 152 ++++++++++++
>>>    6 files changed, 576 insertions(+)
>>>    create mode 100644 xen/common/save.c
>>>    create mode 100644 xen/include/public/save.h
>>>    create mode 100644 xen/include/xen/save.h
>>>
>>> diff --git a/xen/common/Makefile b/xen/common/Makefile
>>> index e8cde65370..90553ba5d7 100644
>>> --- a/xen/common/Makefile
>>> +++ b/xen/common/Makefile
>>> @@ -37,6 +37,7 @@ obj-y += radix-tree.o
>>>    obj-y += rbtree.o
>>>    obj-y += rcupdate.o
>>>    obj-y += rwlock.o
>>> +obj-y += save.o
>>>    obj-y += shutdown.o
>>>    obj-y += softirq.o
>>>    obj-y += sort.o
>>> diff --git a/xen/common/save.c b/xen/common/save.c
>>> new file mode 100644
>>> index 0000000000..6cdac3785b
>>> --- /dev/null
>>> +++ b/xen/common/save.c
>>> @@ -0,0 +1,329 @@
>>> +/*
>>> + * save.c: Save and restore PV guest state common to all domain types.
>>> + *
>>> + * Copyright Amazon.com Inc. or its affiliates.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify it
>>> + * under the terms and conditions of the GNU General Public License,
>>> + * version 2, as published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>> + * more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License along with
>>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#include <xen/save.h>
>>> +
>>> +union domain_copy_entry {
>>> +    domain_write_entry write;
>>> +    domain_read_entry read;
>>> +};
>>> +
>>> +struct domain_context {
>>> +    bool log;
>>> +    struct domain_save_descriptor desc;
>>> +    size_t data_len;
>>
>> What is data_len?
>>
> 
> It’s used for internal accounting.

Can you add a comment explaining it?

> 
>>> +    union domain_copy_entry copy;
>>> +    void *priv;
>>> +};
>>> +
>>> +static struct {
>>> +    const char *name;
>>> +    bool per_vcpu;
>>> +    domain_save_handler save;
>>> +    domain_load_handler load;
>>> +} handlers[DOMAIN_SAVE_CODE_MAX + 1];
>>> +
>>> +void __init domain_register_save_type(unsigned int tc, const char *name,
>>> +                                      bool per_vcpu,
>>> +                                      domain_save_handler save,
>>> +                                      domain_load_handler load)
>>> +{
>>> +    BUG_ON(tc > ARRAY_SIZE(handlers));
>>> +
>>> +    ASSERT(!handlers[tc].save);
>>> +    ASSERT(!handlers[tc].load);
>>> +
>>> +    handlers[tc].name = name;
>>> +    handlers[tc].per_vcpu = per_vcpu;
>>> +    handlers[tc].save = save;
>>> +    handlers[tc].load = load;
>>> +}
>>> +
>>> +int domain_save_begin(struct domain_context *c, unsigned int tc,
>>> +                      const char *name, const struct vcpu *v, size_t len)
>>> +{
>>> +    int rc;
>>> +
>>> +    if ( c->log )
>>> +        gdprintk(XENLOG_INFO, "%pv save: %s (%lu)\n", v, name,
>>> +                 (unsigned long)len);
>>
>> How about using %zu rather than a cast here?
>>
> 
> Yes, that would be better.
> 
>>> +
>>> +    BUG_ON(tc != c->desc.typecode);
>>> +    BUG_ON(v->vcpu_id != c->desc.vcpu_id);
>>
>> I can't find any answer on my question about this part. I understand the
>> code would be buggy if this happen, but is it warrant to crash the host?
>> Couldn't you just return an error and continue to run?
>>
> 
> Since it means buggy code I could ASSERT and error out, yes.

That would be better, thanks!

> 
>>> +
>>> +    ASSERT(!c->data_len);
>>> +    c->data_len = c->desc.length = len;
>>> +
>>> +    rc = c->copy.write(c->priv, &c->desc, sizeof(c->desc));
>>> +    if ( rc )
>>> +        return rc;
>>> +
>>> +    c->desc.length = 0;
>>
>> This is confusing, why would you need to reset c->desc.length but not
>> the rest of the fields?
>>
> 
> Because I use it to accumulate the length of the saved data and then cross check it against data_len in domain_save_end() below.
> 
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +int domain_save_data(struct domain_context *c, const void *src, size_t len)
>>> +{
>>> +    if ( c->desc.length + len > c->data_len )
>>> +        return -ENOSPC;
>>> +
>>> +    c->desc.length += len;
>>> +
>>> +    return c->copy.write(c->priv, src, len);
>>> +}
>>> +
>>> +int domain_save_end(struct domain_context *c)
>>> +{
>>> +    /*
>>> +     * If desc.length does not match the length specified in
>>> +     * domain_save_begin(), there should have been more data.
>>> +     */
>>> +    if ( c->desc.length != c->data_len )
>>
>> This suggests you know in advance the size of the record.
> 
> That depends on what you mean by 'in advance'. I'd expect the caller of domain_save_begin() to know exactly.
> 
>> I have seen
>> some cases where we don't know the size in advance. A good example if
>> when saving grants. You know the maximum of grant used by the guest but
>> you don't know yet the number of grants used. You might need to walk the
>> "list" twice or allocate a temporary buffer. None of them are ideal.
>>
>> Another example is when saving memory, we may want to compact page
>> informations to save space.
>>
>> This problem is going to be more relevant for LiveUpdate where we need
>> to be able to create the stream in a very short amount of time.
>> Allocating any temporary buffer and/or walking the list twice is going
>> to kill the performance.
>>
>> I would suggest to consider a different approach where you update the
>> record length at the end.
>>
>> FWIW, this below the current approach for the LU stream (IIRC David sent
>> a prototype recently). A record is opened using lu_stream_open_record(),
>> you then have two way to add data:
>>       - lu_stream_append() -> This takes a buffer and write to the stream.
>>       - lu_stream_reserve() -> Pre-allocate space in the stream and
>> return a pointer to the beginning of the reserved region.
>>       - lu_stream_end_reservation() -> Takes the actual size in parameter
>> and update the stream size.
>>       - lu_stream_close_record() -> Update the header with the total
>> length and update the position in the stream.
>>
> 
> That sounds quite LU specific. For LM we still need to know up-front the maximal size of the buffer, 

I described the function from an LU PoV, but thsi is mostly replace 
lu_stream_reserve() by a function that check you have at least N bytes 
free in your stream.

> and I was trying to work on the basis of never having to update previously saved data but I guess there's no actual harm in doing so, so we could avoid domain_save_begin() specifying the length.

It depends what we want to achieve in term of memory space and time. If 
we want to avoid modifying previously saved data, then we would need to 
walk the elements twice or allocate a temporary buffer. They will both 
consume space and time.

A good example for LM is when we need to save the event channels. The 
maximum number of event channels is known in advance but you don't know 
how many of them are used. We could save all of them (even the free 
ones) but this is a waste of space.

To avoid the temporary buffer or walking the elements twice, our 
internal tree contain open-code code to rewrite the record in the event 
channels save function.

> 
>>> +        return -EIO;
>>
>> I noticed that all the pasding check have been dropped. I still think we
>> need implicit padding to harden the code.
>>
> 
> I'd still view that as buggy code.
> 
>>> +
>>> +    c->data_len = 0;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +int domain_save(struct domain *d, domain_write_entry write, void *priv,
>>> +                bool dry_run)
>>> +{
>>> +    struct domain_context c = {
>>> +        .copy.write = write,
>>> +        .priv = priv,
>>> +        .log = !dry_run,
>>> +    };
>>> +    struct domain_save_header h = {
>>> +        .magic = DOMAIN_SAVE_MAGIC,
>>> +        .version = DOMAIN_SAVE_VERSION,
>>> +    };
>>> +    struct domain_save_header e;
>>> +    unsigned int i;
>>> +    int rc;
>>> +
>>> +    ASSERT(d != current->domain);
>>> +
>>> +    if ( d->is_dying )
>>
>> I can't find an answer to my question about d->is_dying. What if the
>> guest die afterwards? What does protect us against domain_kill()?
>>
>> [...]
> 
> As I said in a previous response, nothing protects against domain_kill(), this check is just supposed to avoid doing 'unnecessary' work for a domain we know is already dying. 

Regardless LU (see below), I would argue that if a user purposefully 
call domain_save() on a dying domain, then he/she already accepted the cost.

Beside, this give a false idea that domain_kill() cannot be called in 
parallel. So I would rather drop this check unless we can make it safe.

> For LU though I guess we do need to save (some) state for even a dying domain, so the individual save handlers actually need to make the decision on whether they are going to do anything.

That's correct we would want to preserve states for dying domain.

>>
>>> +int domain_load(struct domain *d, domain_read_entry read, void *priv)
>>> +{
>>> +    struct domain_context c = {
>>> +        .copy.read = read,
>>> +        .priv = priv,
>>> +        .log = true,
>>> +    };
>>> +    struct domain_save_header h;
>>> +    int rc;
>>> +
>>> +    ASSERT(d != current->domain);
>>> +
>>> +    if ( d->is_dying )
>>> +        return -EINVAL;
>>
>> Same here.
>>
>>
>>> diff --git a/xen/include/public/save.h b/xen/include/public/save.h
>>> new file mode 100644
>>> index 0000000000..7e5f8752bd
>>> --- /dev/null
>>> +++ b/xen/include/public/save.h
>>> @@ -0,0 +1,84 @@
>>> +/*
>>> + * save.h
>>> + *
>>> + * Structure definitions for common PV/HVM domain state that is held by
>>> + * Xen and must be saved along with the domain's memory.
>>> + *
>>> + * Copyright Amazon.com Inc. or its affiliates.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>>> + * of this software and associated documentation files (the "Software"), to
>>> + * deal in the Software without restriction, including without limitation the
>>> + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
>>> + * sell copies of the Software, and to permit persons to whom the Software is
>>> + * furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
>>> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>>> + * DEALINGS IN THE SOFTWARE.
>>> + */
>>> +
>>> +#ifndef __XEN_PUBLIC_SAVE_H__
>>> +#define __XEN_PUBLIC_SAVE_H__
>>
>> Does this header need to be exposed outside of Xen and the tools?
>>
> 
> Probably not.
> 
>>> +
>>> +#include "xen.h"
>>> +
>>> +/* Each entry is preceded by a descriptor */
>>> +struct domain_save_descriptor {
>>> +    uint16_t typecode;
>>> +    /*
>>> +     * Each entry will contain either to global or per-vcpu domain state.
>>> +     * Entries relating to global state should have zero in this field.
>>> +     */
>>> +    uint16_t vcpu_id;
>>> +    uint32_t flags;
>>> +    /*
>>> +     * When restoring state this flag can be set in a descriptor to cause
>>> +     * its content to be ignored.
>>
>> Could you give examples where you would want to ignore a descriptor?
>>
> 
> I was thinking of the case when, e.g. you want to update something in the shared_info... You save a context blob, modify the shared_info record, and then restore the context blob with all other records marked as 'ignore' since they have not been modified.

How about introducing a domctl similar to 
XEN_DOMCTL_gethvmcontext_partial? This would only give you the 
shared_info record and make easier for the user to use.

> 
>>> +     *
>>> +     * NOTE: It is invalid to set this flag for HEADER or END records (see
>>> +     *       below).
>>> +     */
>>> +#define _DOMAIN_SAVE_FLAG_IGNORE 0
>>> +#define DOMAIN_SAVE_FLAG_IGNORE (1u << _DOMAIN_SAVE_FLAG_IGNORE)
>>> +
>>> +    /* Entry length not including this descriptor */
>>> +    uint64_t length;
>>> +};
>>> +
>>> +/*
>>> + * Each entry has a type associated with it. DECLARE_DOMAIN_SAVE_TYPE
>>> + * binds these things together.
>>> + */
>>> +#define DECLARE_DOMAIN_SAVE_TYPE(_x, _code, _type) \
>>> +    struct __DOMAIN_SAVE_TYPE_##_x { char c[_code]; _type t; };
>>> +
>>> +#define DOMAIN_SAVE_CODE(_x) \
>>> +    (sizeof(((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->c))
>>> +#define DOMAIN_SAVE_TYPE(_x) \
>>> +    typeof(((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->t)
>>> +
>>> +/* Terminating entry */
>>> +struct domain_save_end {};
>>> +DECLARE_DOMAIN_SAVE_TYPE(END, 0, struct domain_save_end);
>>> +
>>> +#define DOMAIN_SAVE_MAGIC   0x53415645
>>> +#define DOMAIN_SAVE_VERSION 0x00000001
>>> +
>>> +/* Initial entry */
>>> +struct domain_save_header {
>>> +    uint32_t magic;             /* Must be DOMAIN_SAVE_MAGIC */
>>> +    uint32_t version;           /* Save format version */
>>
>>
>> I haven't seen any answer about xen version here. For the record:
>>
>> "Let's imagine in 4.14 we introduced a bug in the saving part. This is
>> solved in 4.15 but somehow the version is not bumped. How would you
>> differentiate the streams saved by Xen 4.14 so you can still migrate?
> 
> I'd assume testing would at least save and restore on 4.14. If we then fixed a bug then why would we not bump the version?

Do you mean we would test save/restore between 4.14 and 4.15 (and 
vice-versa)? If so, this may work but we need this to be part of OSSTest 
so we can catch this. Otherwise there are chance we don't catch 
regression in the stream.

> 
>>
>> If you record the version of Xen in the record automatically, then you
>> at least have a way to differentiate the two versions."
>>
> 
> OK, I guess redundant version information is not going to be harmful.
> 
>    Paul
> 

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext
  2020-04-07 17:38 ` [PATCH v2 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext Paul Durrant
  2020-04-20 17:26   ` Julien Grall
@ 2020-04-29 14:50   ` Jan Beulich
  2020-05-13 15:06     ` Paul Durrant
  1 sibling, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2020-04-29 14:50 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Paul Durrant, Ian Jackson, George Dunlap, xen-devel,
	Daniel De Graaf

On 07.04.2020 19:38, Paul Durrant wrote:
> @@ -358,6 +359,113 @@ static struct vnuma_info *vnuma_init(const struct xen_domctl_vnuma *uinfo,
>      return ERR_PTR(ret);
>  }
>  
> +struct domctl_context
> +{
> +    void *buffer;
> +    size_t len;
> +    size_t cur;
> +};
> +
> +static int accumulate_size(void *priv, const void *data, size_t len)
> +{
> +    struct domctl_context *c = priv;
> +
> +    if ( c->len + len < c->len )
> +        return -EOVERFLOW;
> +
> +    c->len += len;
> +
> +    return 0;
> +}
> +
> +static int save_data(void *priv, const void *data, size_t len)
> +{
> +    struct domctl_context *c = priv;
> +
> +    if ( c->len - c->cur < len )
> +        return -ENOSPC;
> +
> +    memcpy(c->buffer + c->cur, data, len);
> +    c->cur += len;
> +
> +    return 0;
> +}
> +
> +static int getdomaincontext(struct domain *d,
> +                            struct xen_domctl_getdomaincontext *gdc)
> +{
> +    struct domctl_context c = { };

Please can you use ZERO_BLOCK_PTR or some such for the buffer
field, such that errnoeous use of the field would not end up
as a (PV-controllable) NULL deref. (Yes, it's a domctl, but
still.) This being common code you also want to get things
right for Arm, irrespective of whether the code will be dead
there for now.

> +    int rc;
> +
> +    if ( d == current->domain )
> +        return -EPERM;
> +
> +    if ( guest_handle_is_null(gdc->buffer) ) /* query for buffer size */
> +    {
> +        if ( gdc->size )
> +            return -EINVAL;
> +
> +        /* dry run to acquire buffer size */
> +        rc = domain_save(d, accumulate_size, &c, true);
> +        if ( rc )
> +            return rc;
> +
> +        gdc->size = c.len;
> +        return 0;
> +    }
> +
> +    c.len = gdc->size;
> +    c.buffer = xmalloc_bytes(c.len);

What sizes are we looking at here? It may be better to use vmalloc()
right from the start. If not, I'd like to advocate for using
xmalloc_array() instead of xmalloc_bytes() - see the almost-XSA
commit cf38b4926e2b.

> +    if ( !c.buffer )
> +        return -ENOMEM;
> +
> +    rc = domain_save(d, save_data, &c, false);
> +
> +    gdc->size = c.cur;
> +    if ( !rc && copy_to_guest(gdc->buffer, c.buffer, gdc->size) )

As to my remark in patch 1 on the size field, applying to this size
field too - copy_to_user{,_hvm}() don't support a 64-bit value (on
y86 at least).

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -38,7 +38,7 @@
>  #include "hvm/save.h"
>  #include "memory.h"
>  
> -#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
> +#define XEN_DOMCTL_INTERFACE_VERSION 0x00000013

I don't see you making any change making the interface backwards
incompatible, hence no need for the bump.

> @@ -1129,6 +1129,44 @@ struct xen_domctl_vuart_op {
>                                   */
>  };
>  
> +/*
> + * Get/Set domain PV context. The same struct xen_domctl_domaincontext
> + * is used for both commands but with slightly different field semantics
> + * as follows:
> + *
> + * XEN_DOMCTL_getdomaincontext
> + * ---------------------------
> + *
> + * buffer (IN):   The buffer into which the context data should be
> + *                copied, or NULL to query the buffer size that should
> + *                be allocated.
> + * size (IN/OUT): If 'buffer' is NULL then the value passed in must be
> + *                zero, and the value passed out will be the size of the
> + *                buffer to allocate.
> + *                If 'buffer' is non-NULL then the value passed in must
> + *                be the size of the buffer into which data may be copied.

This leaves open whether the size also gets updated in this latter
case.

> + */
> +struct xen_domctl_getdomaincontext {
> +    uint64_t size;

If this is to remain 64-bits (with too large values suitably taken
care of for all cases - see above), uint64_aligned_t please for
consistency, if nothing else.

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 3/5] tools/misc: add xen-domctx to present domain context
  2020-04-07 17:38 ` [PATCH v2 3/5] tools/misc: add xen-domctx to present domain context Paul Durrant
@ 2020-04-29 15:04   ` Jan Beulich
  2020-05-13 15:27     ` Paul Durrant
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2020-04-29 15:04 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Paul Durrant, Ian Jackson, Wei Liu

On 07.04.2020 19:38, Paul Durrant wrote:
> +int main(int argc, char **argv)
> +{
> +    uint32_t domid;
> +    unsigned int entry;
> +    xc_interface *xch;
> +    int rc;
> +
> +    if ( argc != 2 || !argv[1] || (rc = atoi(argv[1])) < 0 )
> +    {
> +        fprintf(stderr, "usage: %s <domid>\n", argv[0]);
> +        exit(1);
> +    }

Perhaps also allow dumping just a single (vCPU or other) ID?

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 4/5] common/domain: add a domain context record for shared_info...
  2020-04-28 15:37     ` Paul Durrant
@ 2020-04-30 11:29       ` Jan Beulich
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2020-04-30 11:29 UTC (permalink / raw)
  To: paul
  Cc: 'Stefano Stabellini', 'Julien Grall',
	'Wei Liu', 'Andrew Cooper',
	'Paul Durrant', 'Ian Jackson',
	'George Dunlap',
	xen-devel

On 28.04.2020 17:37, Paul Durrant wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 20 April 2020 18:35
>> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
>> Cc: Paul Durrant <pdurrant@amazon.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Wei Liu <wl@xen.org>;
>> Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap <george.dunlap@citrix.com>; Jan Beulich
>> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>
>> Subject: Re: [PATCH v2 4/5] common/domain: add a domain context record for shared_info...
>>
>> Hi Paul,
>>
>> On 07/04/2020 18:38, Paul Durrant wrote:
>>> ... and update xen-domctx to dump some information describing the record.
>>>
>>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
>>> ---
>>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>>> Cc: Wei Liu <wl@xen.org>
>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Cc: George Dunlap <george.dunlap@citrix.com>
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> Cc: Julien Grall <julien@xen.org>
>>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>>>
>>> v2:
>>>   - Drop the header change to define a 'Xen' page size and instead use a
>>>     variable length struct now that the framework makes this is feasible
>>>   - Guard use of 'has_32bit_shinfo' in common code with CONFIG_COMPAT
>>> ---
>>>   tools/misc/xen-domctx.c   | 11 ++++++
>>>   xen/common/domain.c       | 81 +++++++++++++++++++++++++++++++++++++++
>>>   xen/include/public/save.h | 10 ++++-
>>>   3 files changed, 101 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/misc/xen-domctx.c b/tools/misc/xen-domctx.c
>>> index d663522a8b..a8d3922321 100644
>>> --- a/tools/misc/xen-domctx.c
>>> +++ b/tools/misc/xen-domctx.c
>>> @@ -59,6 +59,16 @@ static void dump_header(struct domain_save_descriptor *desc)
>>>       off += desc->length;
>>>   }
>>>
>>> +static void dump_shared_info(struct domain_save_descriptor *desc)
>>> +{
>>> +    DOMAIN_SAVE_TYPE(SHARED_INFO) s;
>>> +    READ(s);
>>> +    printf("    SHARED_INFO: field_width %u buffer size: %lu\n",
>>> +           s.field_width, desc->length - sizeof(s));
>>> +
>>> +    off += desc->length;
>>> +}
>>> +
>>>   static void dump_end(struct domain_save_descriptor *desc)
>>>   {
>>>       DOMAIN_SAVE_TYPE(END) e;
>>> @@ -125,6 +135,7 @@ int main(int argc, char **argv)
>>>           switch (desc.typecode)
>>>           {
>>>           case DOMAIN_SAVE_CODE(HEADER): dump_header(&desc); break;
>>> +        case DOMAIN_SAVE_CODE(SHARED_INFO): dump_shared_info(&desc); break;
>>>           case DOMAIN_SAVE_CODE(END): dump_end(&desc); return 0;
>>>           default:
>>>               printf("Unknown type %u: skipping\n", desc.typecode);
>>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>>> index 3dcd73f67c..8b72462e07 100644
>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -33,6 +33,7 @@
>>>   #include <xen/xenoprof.h>
>>>   #include <xen/irq.h>
>>>   #include <xen/argo.h>
>>> +#include <xen/save.h>
>>>   #include <asm/debugger.h>
>>>   #include <asm/p2m.h>
>>>   #include <asm/processor.h>
>>> @@ -1646,6 +1647,86 @@ int continue_hypercall_on_cpu(
>>>       return 0;
>>>   }
>>>
>>> +static int save_shared_info(const struct vcpu *v, struct domain_context *c,
>>> +                            bool dry_run)
>>> +{
>>> +    struct domain *d = v->domain;
>>> +    struct domain_shared_info_context ctxt = {};
>>> +    size_t hdr_size = offsetof(typeof(ctxt), buffer);
>>> +    size_t size = hdr_size + PAGE_SIZE;
>>> +    int rc;
>>> +
>>> +    rc = DOMAIN_SAVE_BEGIN(SHARED_INFO, c, v, size);
>>> +    if ( rc )
>>> +        return rc;
>>> +
>>> +    if ( !dry_run )
>>
>> NIT: I think the if is not necessary here as you don't skip that much code.
>>
> 
> I know, but it is illustrative so I'd rather keep it.

While I agree with the "illustrative", I'd really see this be part
of the struct initializer. Plus its use here made me wonder ...

>>> +        ctxt.field_width =
>>> +#ifdef CONFIG_COMPAT
>>> +            has_32bit_shinfo(d) ? 4 :
>>> +#endif
>>> +            8;
>>> +
>>> +    rc = domain_save_data(c, &ctxt, hdr_size);
>>> +    if ( rc )
>>> +        return rc;
>>> +
>>> +    rc = domain_save_data(c, d->shared_info, PAGE_SIZE);
>>> +    if ( rc )
>>> +        return rc;

... why these don't get skipped. It took me going back through
earlier patches to find that there's effectively redundancy in
the passed arguments in that the write callback chosen varies with
whether "dry_run" is true or false.

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 4/5] common/domain: add a domain context record for shared_info...
  2020-04-07 17:38 ` [PATCH v2 4/5] common/domain: add a domain context record for shared_info Paul Durrant
  2020-04-20 17:34   ` Julien Grall
@ 2020-04-30 11:56   ` Jan Beulich
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2020-04-30 11:56 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Paul Durrant, Ian Jackson, George Dunlap, xen-devel

On 07.04.2020 19:38, Paul Durrant wrote:
> --- a/tools/misc/xen-domctx.c
> +++ b/tools/misc/xen-domctx.c
> @@ -59,6 +59,16 @@ static void dump_header(struct domain_save_descriptor *desc)
>      off += desc->length;
>  }
>  
> +static void dump_shared_info(struct domain_save_descriptor *desc)
> +{
> +    DOMAIN_SAVE_TYPE(SHARED_INFO) s;
> +    READ(s);
> +    printf("    SHARED_INFO: field_width %u buffer size: %lu\n",
> +           s.field_width, desc->length - sizeof(s));
> +
> +    off += desc->length;
> +}

And nothing about the actual contents of the shared info struct?

> @@ -1646,6 +1647,86 @@ int continue_hypercall_on_cpu(
>      return 0;
>  }
>  
> +static int save_shared_info(const struct vcpu *v, struct domain_context *c,
> +                            bool dry_run)
> +{
> +    struct domain *d = v->domain;

const?

> +    struct domain_shared_info_context ctxt = {};
> +    size_t hdr_size = offsetof(typeof(ctxt), buffer);
> +    size_t size = hdr_size + PAGE_SIZE;
> +    int rc;
> +
> +    rc = DOMAIN_SAVE_BEGIN(SHARED_INFO, c, v, size);
> +    if ( rc )
> +        return rc;
> +
> +    if ( !dry_run )
> +        ctxt.field_width =
> +#ifdef CONFIG_COMPAT
> +            has_32bit_shinfo(d) ? 4 :
> +#endif
> +            8;

What are 4 and 8 to represent here? Taking Arm32 into account, the
only things I could think of are sizeof(xen_ulong_t) or
sizeof(guest_handle). I'd prefer if literal numbers could be avoided
here, such that it would become clear what property is really meant.

> +    rc = domain_save_data(c, &ctxt, hdr_size);
> +    if ( rc )
> +        return rc;
> +
> +    rc = domain_save_data(c, d->shared_info, PAGE_SIZE);
> +    if ( rc )
> +        return rc;
> +
> +    return domain_save_end(c);
> +}
> +
> +static int load_shared_info(struct vcpu *v, struct domain_context *c)
> +{
> +    struct domain *d = v->domain;
> +    struct domain_shared_info_context ctxt = {};
> +    size_t hdr_size = offsetof(typeof(ctxt), buffer);
> +    size_t size = hdr_size + PAGE_SIZE;
> +    unsigned int i;
> +    int rc;
> +
> +    rc = DOMAIN_LOAD_BEGIN(SHARED_INFO, c, v, size, true);
> +    if ( rc )
> +        return rc;
> +
> +    rc = domain_load_data(c, &ctxt, hdr_size);
> +    if ( rc )
> +        return rc;
> +
> +    for ( i = 0; i < ARRAY_SIZE(ctxt.pad); i++ )
> +        if ( ctxt.pad[i] )
> +            return -EINVAL;
> +
> +    switch ( ctxt.field_width )
> +    {
> +#ifdef CONFIG_COMPAT
> +    case 4:
> +        d->arch.has_32bit_shinfo = 1;

true and ...

> +        break;
> +#endif
> +    case 8:
> +#ifdef CONFIG_COMPAT
> +        d->arch.has_32bit_shinfo = 0;

... false respectively, please.

> +#endif
> +        break;
> +
> +    default:
> +        rc = -EINVAL;
> +        break;
> +    }
> +
> +    rc = domain_load_data(c, d->shared_info, PAGE_SIZE);
> +    if ( rc )
> +        return rc;
> +
> +    return domain_load_end(c);
> +}

While you check the padding fields of the header above, what about
currently unused fields of the shared_info struct itself?

There's a connection between shared_info and vcpu_info - this way
you may or may not restore vcpu_info - depending on guest behavior.
There not being a patch in the series to deal with vcpu_info, the
description here should imo at least outline the intended
interaction (including e.g. ordering between the [supposed] records).

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 5/5] tools/libxc: make use of domain context SHARED_INFO record...
  2020-04-07 17:38 ` [PATCH v2 5/5] tools/libxc: make use of domain context SHARED_INFO record Paul Durrant
@ 2020-04-30 11:57   ` Jan Beulich
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2020-04-30 11:57 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Paul Durrant, Ian Jackson, Wei Liu

On 07.04.2020 19:38, Paul Durrant wrote:
> ... in the save/restore code.
> 
> This patch replaces direct mapping of the shared_info_frame (retrieved
> using XEN_DOMCTL_getdomaininfo) with save/load of the domain context
> SHARED_INFO record.
> 
> No modifications are made to the definition of the migration stream at
> this point. Subsequent patches will define a record in the libxc domain
> image format for passing domain context and convert the save/restore code
> to use that.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Wei Liu <wl@xen.org>
> 
> v2:
>  - Re-based (now making use of DOMAIN_SAVE_FLAG_IGNORE)
> ---
>  tools/libxc/xc_sr_common.h         |  7 +++-
>  tools/libxc/xc_sr_common_x86.c     | 59 ++++++++++++++++++++++++++++++
>  tools/libxc/xc_sr_common_x86.h     |  4 ++
>  tools/libxc/xc_sr_common_x86_pv.c  | 53 +++++++++++++++++++++++++++
>  tools/libxc/xc_sr_common_x86_pv.h  |  3 ++
>  tools/libxc/xc_sr_restore_x86_pv.c | 40 ++++++++------------
>  tools/libxc/xc_sr_save_x86_pv.c    | 26 ++-----------
>  tools/libxc/xg_save_restore.h      |  1 +
>  8 files changed, 144 insertions(+), 49 deletions(-)

The underlying interface being arch-independent, shouldn't at least
some of the new code go into other than xc_sr_*x86*?

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
  2020-04-29 11:02   ` Jan Beulich
@ 2020-05-06 16:44     ` Paul Durrant
  2020-05-07  7:21       ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Durrant @ 2020-05-06 16:44 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: 'Stefano Stabellini', 'Julien Grall',
	'Wei Liu', 'Andrew Cooper',
	'Paul Durrant', 'Ian Jackson',
	'George Dunlap', xen-devel, 'Volodymyr Babchuk',
	'Roger Pau Monné'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 29 April 2020 12:02
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; George Dunlap <george.dunlap@citrix.com>; Ian Jackson
> <ian.jackson@eu.citrix.com>; Julien Grall <julien@xen.org>; Stefano Stabellini
> <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>; Roger
> Pau Monné <roger.pau@citrix.com>
> Subject: Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
> 
> On 07.04.2020 19:38, Paul Durrant wrote:
> > +void __init domain_register_save_type(unsigned int tc, const char *name,
> > +                                      bool per_vcpu,
> > +                                      domain_save_handler save,
> > +                                      domain_load_handler load)
> > +{
> > +    BUG_ON(tc > ARRAY_SIZE(handlers));
> 
> >=

Yes.

> 
> > +    ASSERT(!handlers[tc].save);
> > +    ASSERT(!handlers[tc].load);
> > +
> > +    handlers[tc].name = name;
> > +    handlers[tc].per_vcpu = per_vcpu;
> > +    handlers[tc].save = save;
> > +    handlers[tc].load = load;
> > +}
> > +
> > +int domain_save_begin(struct domain_context *c, unsigned int tc,
> > +                      const char *name, const struct vcpu *v, size_t len)
> 
> I find it quite odd for a function like this one to take a
> struct vcpu * rather than a struct domain *. See also below
> comment on the vcpu_id field in the public header.

I guess struct domain + vcpu_id can be used.

> 
> > +{
> > +    int rc;
> > +
> > +    if ( c->log )
> > +        gdprintk(XENLOG_INFO, "%pv save: %s (%lu)\n", v, name,
> > +                 (unsigned long)len);
> > +
> > +    BUG_ON(tc != c->desc.typecode);
> > +    BUG_ON(v->vcpu_id != c->desc.vcpu_id);
> > +
> > +    ASSERT(!c->data_len);
> > +    c->data_len = c->desc.length = len;
> > +
> > +    rc = c->copy.write(c->priv, &c->desc, sizeof(c->desc));
> > +    if ( rc )
> > +        return rc;
> > +
> > +    c->desc.length = 0;
> > +
> > +    return 0;
> > +}
> > +
> > +int domain_save_data(struct domain_context *c, const void *src, size_t len)
> > +{
> > +    if ( c->desc.length + len > c->data_len )
> > +        return -ENOSPC;
> > +
> > +    c->desc.length += len;
> > +
> > +    return c->copy.write(c->priv, src, len);
> > +}
> > +
> > +int domain_save_end(struct domain_context *c)
> 
> I'm sure there is a reason for the split into three load/save
> functions (begin/data/end), but I can't figure it and the
> description also doesn't explain it. They're all used together
> only afaics, in domain_{save,load}_entry(). Or wait, there are
> DOMAIN_{SAVE,LOAD}_BEGIN() macros apparently allowing separate
> use of ..._begin(), but then it's still not clear why ..._end()
> need to be separate from ..._data().

The split is to avoid the need to double-buffer in the save code, shared info being a case in point. If the entire save record needs to be written in one call then the shared info content would need to be copied into a newly allocated save record and then copied again into the aggregate context buffer.

> 
> > +int domain_save(struct domain *d, domain_write_entry write, void *priv,
> > +                bool dry_run)
> > +{
> > +    struct domain_context c = {
> > +        .copy.write = write,
> > +        .priv = priv,
> > +        .log = !dry_run,
> > +    };
> > +    struct domain_save_header h = {
> 
> const? Perhaps even static?
> 

Ok, static is a good idea.

> > +        .magic = DOMAIN_SAVE_MAGIC,
> > +        .version = DOMAIN_SAVE_VERSION,
> > +    };
> > +    struct domain_save_header e;
> > +    unsigned int i;
> > +    int rc;
> > +
> > +    ASSERT(d != current->domain);
> > +
> > +    if ( d->is_dying )
> > +        return -EINVAL;
> 
> Could I talk you into using less generic an error code here, e.g.
> -ESRCH or -ENXIO? There look to be further uses of EINVAL that
> may want replacing.
> 

I'm going to drop this check as it creates a problem for live update, where we actually do need to extract and restore state for dying domains.

> > +    domain_pause(d);
> > +
> > +    c.desc.typecode = DOMAIN_SAVE_CODE(HEADER);
> > +
> > +    rc = DOMAIN_SAVE_ENTRY(HEADER, &c, d->vcpu[0], &h, sizeof(h));
> > +    if ( rc )
> > +        goto out;
> > +
> > +    for ( i = 0; i < ARRAY_SIZE(handlers); i++ )
> > +    {
> > +        domain_save_handler save = handlers[i].save;
> > +
> > +        if ( !save )
> > +            continue;
> > +
> > +        memset(&c.desc, 0, sizeof(c.desc));
> > +        c.desc.typecode = i;
> > +
> > +        if ( handlers[i].per_vcpu )
> > +        {
> > +            struct vcpu *v;
> > +
> > +            for_each_vcpu ( d, v )
> > +            {
> > +                c.desc.vcpu_id = v->vcpu_id;
> > +
> > +                rc = save(v, &c, dry_run);
> > +                if ( rc )
> > +                    goto out;
> > +            }
> > +        }
> > +        else
> > +        {
> > +            rc = save(d->vcpu[0], &c, dry_run);
> > +            if ( rc )
> > +                goto out;
> > +        }
> > +    }
> > +
> > +    memset(&c.desc, 0, sizeof(c.desc));
> > +    c.desc.typecode = DOMAIN_SAVE_CODE(END);
> > +
> > +    rc = DOMAIN_SAVE_ENTRY(END, &c, d->vcpu[0], &e, 0);
> 
> By the looks of it you're passing uninitialized e here; it's just
> that the struct has no members. It would look less odd if you used
> NULL here. Otherwise please don't use literal 0, but sizeof() for
> the last parameter.

I'll init the 'e'.

> 
> > +int domain_load_begin(struct domain_context *c, unsigned int tc,
> > +                      const char *name, const struct vcpu *v, size_t len,
> > +                      bool exact)
> > +{
> > +    if ( c->log )
> > +        gdprintk(XENLOG_INFO, "%pv load: %s (%lu)\n", v, name,
> > +                 (unsigned long)len);
> > +
> > +    BUG_ON(tc != c->desc.typecode);
> > +    BUG_ON(v->vcpu_id != c->desc.vcpu_id);
> > +
> > +    if ( (exact && (len != c->desc.length)) ||
> > +         (len < c->desc.length) )
> > +        return -EINVAL;
> 
> How about
> 
>     if ( exact ? len != c->desc.length
>                : len < c->desc.length )
> 

Yes, that doesn't look too bad.

> ? I'm also unsure about the < - don't you mean > instead? Too
> little data would be compensated by zero padding, but too
> much data can't be dealt with. But maybe I'm getting the sense
> of len wrong ...

I think the < is correct. The caller needs to have at least enough space to accommodate the context record.

> > +int domain_load(struct domain *d, domain_read_entry read, void *priv)
> > +{
> > +    struct domain_context c = {
> > +        .copy.read = read,
> > +        .priv = priv,
> > +        .log = true,
> > +    };
> > +    struct domain_save_header h;
> > +    int rc;
> > +
> > +    ASSERT(d != current->domain);
> > +
> > +    if ( d->is_dying )
> > +        return -EINVAL;
> > +
> > +    rc = c.copy.read(c.priv, &c.desc, sizeof(c.desc));
> > +    if ( rc )
> > +        return rc;
> > +
> > +    if ( c.desc.typecode != DOMAIN_SAVE_CODE(HEADER) || c.desc.vcpu_id ||
> > +         c.desc.flags )
> > +        return -EINVAL;
> > +
> > +    rc = DOMAIN_LOAD_ENTRY(HEADER, &c, d->vcpu[0], &h, sizeof(h), true);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    if ( h.magic != DOMAIN_SAVE_MAGIC || h.version != DOMAIN_SAVE_VERSION )
> > +        return -EINVAL;
> > +
> > +    domain_pause(d);
> > +
> > +    for (;;)
> > +    {
> > +        unsigned int i;
> > +        unsigned int flags;
> > +        domain_load_handler load;
> > +        struct vcpu *v;
> > +
> > +        rc = c.copy.read(c.priv, &c.desc, sizeof(c.desc));
> > +        if ( rc )
> > +            break;
> > +
> > +        rc = -EINVAL;
> > +
> > +        flags = c.desc.flags;
> > +        if ( flags & ~DOMAIN_SAVE_FLAG_IGNORE )
> > +            break;
> > +
> > +        if ( c.desc.typecode == DOMAIN_SAVE_CODE(END) ) {
> 
> Brace placement.
> 

Oops, yes.

> > +            if ( !(flags & DOMAIN_SAVE_FLAG_IGNORE) )
> > +                rc = DOMAIN_LOAD_ENTRY(END, &c, d->vcpu[0], NULL, 0, true);
> 
> The public header says DOMAIN_SAVE_FLAG_IGNORE is invalid for
> END.
> 

Indeed, and it should be... don't know why I put that in there.

> > +            break;
> > +        }
> > +
> > +        i = c.desc.typecode;
> > +        if ( i >= ARRAY_SIZE(handlers) )
> > +            break;
> > +
> > +        if ( (!handlers[i].per_vcpu && c.desc.vcpu_id) ||
> > +             (c.desc.vcpu_id >= d->max_vcpus) )
> > +            break;
> > +
> > +        v = d->vcpu[c.desc.vcpu_id];
> > +
> > +        if ( flags & DOMAIN_SAVE_FLAG_IGNORE )
> > +        {
> > +            /* Sink the data */
> > +            rc = domain_load_entry(&c, c.desc.typecode, "IGNORED",
> > +                                   v, NULL, c.desc.length, true);
> 
> IOW the read handlers need to be able to deal with a NULL dst?
> Sounds a little fragile to me. Is there a reason
> domain_load_data() can't skip the call to read()?

Something has to move the cursor so sinking the data using a NULL dst seemed like the best way to avoid the need for a separate callback function.

> 
> > --- /dev/null
> > +++ b/xen/include/public/save.h
> > @@ -0,0 +1,84 @@
> > +/*
> > + * save.h
> > + *
> > + * Structure definitions for common PV/HVM domain state that is held by
> > + * Xen and must be saved along with the domain's memory.
> > + *
> > + * Copyright Amazon.com Inc. or its affiliates.
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a copy
> > + * of this software and associated documentation files (the "Software"), to
> > + * deal in the Software without restriction, including without limitation the
> > + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
> > + * sell copies of the Software, and to permit persons to whom the Software is
> > + * furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> > + * DEALINGS IN THE SOFTWARE.
> > + */
> > +
> > +#ifndef __XEN_PUBLIC_SAVE_H__
> > +#define __XEN_PUBLIC_SAVE_H__
> > +
> > +#include "xen.h"
> > +
> > +/* Each entry is preceded by a descriptor */
> > +struct domain_save_descriptor {
> 
> Throughout this new public header, let's please play nice in name
> space terms: Prefix global scope items with xen_ / XEN_
> respectively, and don't introduce violations of the standard's
> rules (e.g. _DOMAIN_SAVE_FLAG_IGNORE below). The latter point also
> goes for naming outside of the public header, of course.

Sorry, I just didn't pay enough attention to the cut'n'paste from the hvm save.h; I'll fix these up.

> 
> > +    uint16_t typecode;
> > +    /*
> > +     * Each entry will contain either to global or per-vcpu domain state.
> 
> s/contain/apply/ or drop "to"?

Yes, 'to' should be dropped.

> 
> > +     * Entries relating to global state should have zero in this field.
> 
> Is there a reason to specify zero?
> 

Not particularly but I thought it best to at least specify a value in all cases.

> > +     */
> > +    uint16_t vcpu_id;
> 
> Seeing (possibly) multi-instance records in HVM state (PIC, IO-APIC, HPET),
> wouldn't it be better to generalize this to ID, meaning "vCPU ID" just for
> per-vCPU state?
> 

True, a generic 'instance' would be needed for such records. I'll so as you suggest.

> > +    uint32_t flags;
> > +    /*
> > +     * When restoring state this flag can be set in a descriptor to cause
> > +     * its content to be ignored.
> > +     *
> > +     * NOTE: It is invalid to set this flag for HEADER or END records (see
> > +     *       below).
> > +     */
> > +#define _DOMAIN_SAVE_FLAG_IGNORE 0
> > +#define DOMAIN_SAVE_FLAG_IGNORE (1u << _DOMAIN_SAVE_FLAG_IGNORE)
> 
> As per your reply to Julien I think this wants further clarification on
> the intentions with this flag. I'm also uncertain it is a good idea to
> allow such partial restores - there may be dependencies between records,
> and hence things can easily go wrong in perhaps non-obvious ways.
> 

OK, I'll drop this for now. It could be added later if it is needed.

> > +    /* Entry length not including this descriptor */
> > +    uint64_t length;
> 
> Do you really envision descriptors with more than 4Gb of data? If so,
> for 32-bit purposes wouldn't this want to be uint64_aligned_t?
> 

I don't think we'll ever see a single record that big. I'll drop to 32 bits.

> > +};
> > +
> > +/*
> > + * Each entry has a type associated with it. DECLARE_DOMAIN_SAVE_TYPE
> > + * binds these things together.
> > + */
> > +#define DECLARE_DOMAIN_SAVE_TYPE(_x, _code, _type) \
> > +    struct __DOMAIN_SAVE_TYPE_##_x { char c[_code]; _type t; };
> > +
> > +#define DOMAIN_SAVE_CODE(_x) \
> > +    (sizeof(((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->c))
> > +#define DOMAIN_SAVE_TYPE(_x) \
> > +    typeof(((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->t)
> 
> Just like the typeof() I dont think the sizeof() needs an outer
> pair of parentheses. I also don't see why 0 gets parenthesized.
> 

Cut'n'paste... I'll fix it.

> > +/* Terminating entry */
> > +struct domain_save_end {};
> > +DECLARE_DOMAIN_SAVE_TYPE(END, 0, struct domain_save_end);
> 
> If the header gets a __XEN__ || __XEN_TOOLS__ restriction, as
> suggested by Julien, using 0 here may be fine. If not, char[0]
> and typeof() aren't legal plain C and hence would need to be
> avoided.
> 

I'll restrict to tools.

> > --- /dev/null
> > +++ b/xen/include/xen/save.h
> > @@ -0,0 +1,152 @@
> > +/*
> > + * save.h: support routines for save/restore
> > + *
> > + * Copyright Amazon.com Inc. or its affiliates.
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms and conditions of the GNU General Public License,
> > + * version 2, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope it will be useful, but WITHOUT
> > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > + * more details.
> > + *
> > + * You should have received a copy of the GNU General Public License along with
> > + * this program; If not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#ifndef __XEN_SAVE_H__
> > +#define __XEN_SAVE_H__
> > +
> > +#include <xen/sched.h>
> > +#include <xen/types.h>
> > +#include <xen/init.h>
> 
> Please sort these.
> 

Ok.

> > +#include <public/xen.h>
> > +#include <public/save.h>
> 
> The latter includes the former anyway - is the former necessary
> for some reason nevertheless?
> 

No. I suspect it was at one point, but it can be dropped.

> > +struct domain_context;
> > +
> > +int domain_save_begin(struct domain_context *c, unsigned int tc,
> > +                      const char *name, const struct vcpu *v, size_t len);
> > +
> > +#define DOMAIN_SAVE_BEGIN(_x, _c, _v, _len) \
> > +        domain_save_begin((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_len))
> 
> In new code I'd like to ask for no leading underscores on macro
> parameters as well as no unnecessary parenthesization of macro
> parameters (e.g. when they simply get passed on to another macro
> or function without being part of a larger expression).

Personally I think it is generally good practice to parenthesize but I can drop if you prefer.

> 
> > +int domain_save_data(struct domain_context *c, const void *data, size_t len);
> > +int domain_save_end(struct domain_context *c);
> > +
> > +static inline int domain_save_entry(struct domain_context *c,
> > +                                    unsigned int tc, const char *name,
> > +                                    const struct vcpu *v, const void *src,
> > +                                    size_t len)
> > +{
> > +    int rc;
> > +
> > +    rc = domain_save_begin(c, tc, name, v, len);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    rc = domain_save_data(c, src, len);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    return domain_save_end(c);
> > +}
> > +
> > +#define DOMAIN_SAVE_ENTRY(_x, _c, _v, _src, _len) \
> > +    domain_save_entry((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_src), (_len))
> > +
> > +int domain_load_begin(struct domain_context *c, unsigned int tc,
> > +                      const char *name, const struct vcpu *v, size_t len,
> > +                      bool exact);
> > +
> > +#define DOMAIN_LOAD_BEGIN(_x, _c, _v, _len, _exact) \
> > +        domain_load_begin((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_len), \
> > +                          (_exact));
> > +
> > +int domain_load_data(struct domain_context *c, void *data, size_t len);
> > +int domain_load_end(struct domain_context *c);
> > +
> > +static inline int domain_load_entry(struct domain_context *c,
> > +                                    unsigned int tc, const char *name,
> > +                                    const struct vcpu *v, void *dst,
> > +                                    size_t len, bool exact)
> > +{
> > +    int rc;
> > +
> > +    rc = domain_load_begin(c, tc, name, v, len, exact);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    rc = domain_load_data(c, dst, len);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    return domain_load_end(c);
> > +}
> > +
> > +#define DOMAIN_LOAD_ENTRY(_x, _c, _v, _dst, _len, _exact) \
> > +    domain_load_entry((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_dst), (_len), \
> > +                          (_exact))
> > +
> > +/*
> > + * The 'dry_run' flag indicates that the caller of domain_save() (see
> > + * below) is not trying to actually acquire the data, only the size
> > + * of the data. The save handler can therefore limit work to only that
> > + * which is necessary to call DOMAIN_SAVE_BEGIN/ENTRY() with an accurate
> > + * value for '_len'.
> > + */
> > +typedef int (*domain_save_handler)(const struct vcpu *v,
> > +                                   struct domain_context *h,
> > +                                   bool dry_run);
> > +typedef int (*domain_load_handler)(struct vcpu *v,
> > +                                   struct domain_context *h);
> 
> Does a load handler have a need to modify struct domain_context?
> 

Not sure. I'll try const-ing.

  Paul

> Jan



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
  2020-05-06 16:44     ` Paul Durrant
@ 2020-05-07  7:21       ` Jan Beulich
  2020-05-07  7:34         ` Paul Durrant
  2020-05-07  8:35         ` Julien Grall
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Beulich @ 2020-05-07  7:21 UTC (permalink / raw)
  To: paul
  Cc: 'Stefano Stabellini', 'Julien Grall',
	'Wei Liu', 'Andrew Cooper',
	'Paul Durrant', 'Ian Jackson',
	'George Dunlap', xen-devel, 'Volodymyr Babchuk',
	'Roger Pau Monné'

On 06.05.2020 18:44, Paul Durrant wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 29 April 2020 12:02
>>
>> On 07.04.2020 19:38, Paul Durrant wrote:
>>> +int domain_load_begin(struct domain_context *c, unsigned int tc,
>>> +                      const char *name, const struct vcpu *v, size_t len,
>>> +                      bool exact)
>>> +{
>>> +    if ( c->log )
>>> +        gdprintk(XENLOG_INFO, "%pv load: %s (%lu)\n", v, name,
>>> +                 (unsigned long)len);
>>> +
>>> +    BUG_ON(tc != c->desc.typecode);
>>> +    BUG_ON(v->vcpu_id != c->desc.vcpu_id);
>>> +
>>> +    if ( (exact && (len != c->desc.length)) ||
>>> +         (len < c->desc.length) )
>>> +        return -EINVAL;
>>
>> How about
>>
>>     if ( exact ? len != c->desc.length
>>                : len < c->desc.length )
>>
> 
> Yes, that doesn't look too bad.
> 
>> ? I'm also unsure about the < - don't you mean > instead? Too
>> little data would be compensated by zero padding, but too
>> much data can't be dealt with. But maybe I'm getting the sense
>> of len wrong ...
> 
> I think the < is correct. The caller needs to have at least enough
> space to accommodate the context record.

But this is load, not save - the caller supplies the data. If
there's less data than can be fit, it'll be zero-extended. If
there's too much data, the excess you don't know what to do
with (it might be okay to tolerate it being all zero).

>>> +        if ( (!handlers[i].per_vcpu && c.desc.vcpu_id) ||
>>> +             (c.desc.vcpu_id >= d->max_vcpus) )
>>> +            break;
>>> +
>>> +        v = d->vcpu[c.desc.vcpu_id];
>>> +
>>> +        if ( flags & DOMAIN_SAVE_FLAG_IGNORE )
>>> +        {
>>> +            /* Sink the data */
>>> +            rc = domain_load_entry(&c, c.desc.typecode, "IGNORED",
>>> +                                   v, NULL, c.desc.length, true);
>>
>> IOW the read handlers need to be able to deal with a NULL dst?
>> Sounds a little fragile to me. Is there a reason
>> domain_load_data() can't skip the call to read()?
> 
> Something has to move the cursor so sinking the data using a
> NULL dst seemed like the best way to avoid the need for a
> separate callback function.

And domain_load_data() can't itself advance the cursor in such
a case, with no callback involved at all?

>>> +    uint16_t typecode;
>>> +    /*
>>> +     * Each entry will contain either to global or per-vcpu domain state.
>>> +     * Entries relating to global state should have zero in this field.
>>
>> Is there a reason to specify zero?
>>
> 
> Not particularly but I thought it best to at least specify a value in all cases.

A specific value is certainly a good idea; I wonder though whether
an obviously invalid one (like ~0) wouldn't be better then,
allowing this ID to later be assigned meaning in such cases if
need be.

>>> +     */
>>> +    uint16_t vcpu_id;
>>
>> Seeing (possibly) multi-instance records in HVM state (PIC, IO-APIC, HPET),
>> wouldn't it be better to generalize this to ID, meaning "vCPU ID" just for
>> per-vCPU state?
> 
> True, a generic 'instance' would be needed for such records. I'll so as you suggest.

Which, along with my comment on domain_save_begin() taking a
struct vcpu * right now, will then indeed require changing
to a (struct domain *, unsigned int instance) tuple, I guess.

>>> +#define DOMAIN_SAVE_BEGIN(_x, _c, _v, _len) \
>>> +        domain_save_begin((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_len))
>>
>> In new code I'd like to ask for no leading underscores on macro
>> parameters as well as no unnecessary parenthesization of macro
>> parameters (e.g. when they simply get passed on to another macro
>> or function without being part of a larger expression).
> 
> Personally I think it is generally good practice to parenthesize
> but I can drop if you prefer.

To be honest - it's more than just "prefer": Excess parentheses
hamper readability. There shouldn't be too few, and since macros
already require quite a lot of them, imo we should strive to
have exactly as many as are needed.

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
  2020-05-07  7:21       ` Jan Beulich
@ 2020-05-07  7:34         ` Paul Durrant
  2020-05-07  7:39           ` Jan Beulich
  2020-05-07  8:35         ` Julien Grall
  1 sibling, 1 reply; 30+ messages in thread
From: Paul Durrant @ 2020-05-07  7:34 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: 'Stefano Stabellini', 'Julien Grall',
	'Wei Liu', 'Andrew Cooper',
	'Paul Durrant', 'Ian Jackson',
	'George Dunlap', xen-devel, 'Volodymyr Babchuk',
	'Roger Pau Monné'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 07 May 2020 08:22
> To: paul@xen.org
> Cc: xen-devel@lists.xenproject.org; 'Paul Durrant' <pdurrant@amazon.com>; 'Andrew Cooper'
> <andrew.cooper3@citrix.com>; 'George Dunlap' <george.dunlap@citrix.com>; 'Ian Jackson'
> <ian.jackson@eu.citrix.com>; 'Julien Grall' <julien@xen.org>; 'Stefano Stabellini'
> <sstabellini@kernel.org>; 'Wei Liu' <wl@xen.org>; 'Volodymyr Babchuk' <Volodymyr_Babchuk@epam.com>;
> 'Roger Pau Monné' <roger.pau@citrix.com>
> Subject: Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
> 
> On 06.05.2020 18:44, Paul Durrant wrote:
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 29 April 2020 12:02
> >>
> >> On 07.04.2020 19:38, Paul Durrant wrote:
> >>> +int domain_load_begin(struct domain_context *c, unsigned int tc,
> >>> +                      const char *name, const struct vcpu *v, size_t len,
> >>> +                      bool exact)
> >>> +{
> >>> +    if ( c->log )
> >>> +        gdprintk(XENLOG_INFO, "%pv load: %s (%lu)\n", v, name,
> >>> +                 (unsigned long)len);
> >>> +
> >>> +    BUG_ON(tc != c->desc.typecode);
> >>> +    BUG_ON(v->vcpu_id != c->desc.vcpu_id);
> >>> +
> >>> +    if ( (exact && (len != c->desc.length)) ||
> >>> +         (len < c->desc.length) )
> >>> +        return -EINVAL;
> >>
> >> How about
> >>
> >>     if ( exact ? len != c->desc.length
> >>                : len < c->desc.length )
> >>
> >
> > Yes, that doesn't look too bad.
> >
> >> ? I'm also unsure about the < - don't you mean > instead? Too
> >> little data would be compensated by zero padding, but too
> >> much data can't be dealt with. But maybe I'm getting the sense
> >> of len wrong ...
> >
> > I think the < is correct. The caller needs to have at least enough
> > space to accommodate the context record.
> 
> But this is load, not save - the caller supplies the data. If
> there's less data than can be fit, it'll be zero-extended. If
> there's too much data, the excess you don't know what to do
> with (it might be okay to tolerate it being all zero).
> 

But this is a callback. The outer load function iterates over the records calling the appropriate hander for each one. Those handlers then call this function saying how much data they expect and whether they want exactly that amount, or whether they can tolerate less (i.e. zero-extend). Hence len < c->desc.length is an error, because it means the descriptor contains more data than the hander knows how to handle.

> >>> +        if ( (!handlers[i].per_vcpu && c.desc.vcpu_id) ||
> >>> +             (c.desc.vcpu_id >= d->max_vcpus) )
> >>> +            break;
> >>> +
> >>> +        v = d->vcpu[c.desc.vcpu_id];
> >>> +
> >>> +        if ( flags & DOMAIN_SAVE_FLAG_IGNORE )
> >>> +        {
> >>> +            /* Sink the data */
> >>> +            rc = domain_load_entry(&c, c.desc.typecode, "IGNORED",
> >>> +                                   v, NULL, c.desc.length, true);
> >>
> >> IOW the read handlers need to be able to deal with a NULL dst?
> >> Sounds a little fragile to me. Is there a reason
> >> domain_load_data() can't skip the call to read()?
> >
> > Something has to move the cursor so sinking the data using a
> > NULL dst seemed like the best way to avoid the need for a
> > separate callback function.
> 
> And domain_load_data() can't itself advance the cursor in such
> a case, with no callback involved at all?
> 

How could it do that without a callback? Doing such a thing negates the utility of the ignore flag anyway since it implies the caller is going to edit the load stream in advance. Since I'm going to drop the ignore flag in v3 anyway, this is all a bit academic.

> >>> +    uint16_t typecode;
> >>> +    /*
> >>> +     * Each entry will contain either to global or per-vcpu domain state.
> >>> +     * Entries relating to global state should have zero in this field.
> >>
> >> Is there a reason to specify zero?
> >>
> >
> > Not particularly but I thought it best to at least specify a value in all cases.
> 
> A specific value is certainly a good idea; I wonder though whether
> an obviously invalid one (like ~0) wouldn't be better then,
> allowing this ID to later be assigned meaning in such cases if
> need be.
> 

Ok, that sounds reasonable. I'll #define something for convenience.

> >>> +     */
> >>> +    uint16_t vcpu_id;
> >>
> >> Seeing (possibly) multi-instance records in HVM state (PIC, IO-APIC, HPET),
> >> wouldn't it be better to generalize this to ID, meaning "vCPU ID" just for
> >> per-vCPU state?
> >
> > True, a generic 'instance' would be needed for such records. I'll so as you suggest.
> 
> Which, along with my comment on domain_save_begin() taking a
> struct vcpu * right now, will then indeed require changing
> to a (struct domain *, unsigned int instance) tuple, I guess.
> 

Yes.

  Paul



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
  2020-05-07  7:34         ` Paul Durrant
@ 2020-05-07  7:39           ` Jan Beulich
  2020-05-07  7:45             ` Paul Durrant
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2020-05-07  7:39 UTC (permalink / raw)
  To: paul
  Cc: 'Stefano Stabellini', 'Julien Grall',
	'Wei Liu', 'Andrew Cooper',
	'Paul Durrant', 'Ian Jackson',
	'George Dunlap', xen-devel, 'Volodymyr Babchuk',
	'Roger Pau Monné'

On 07.05.2020 09:34, Paul Durrant wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 07 May 2020 08:22
>>
>> On 06.05.2020 18:44, Paul Durrant wrote:
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 29 April 2020 12:02
>>>>
>>>> On 07.04.2020 19:38, Paul Durrant wrote:
>>>>> +int domain_load_begin(struct domain_context *c, unsigned int tc,
>>>>> +                      const char *name, const struct vcpu *v, size_t len,
>>>>> +                      bool exact)
>>>>> +{
>>>>> +    if ( c->log )
>>>>> +        gdprintk(XENLOG_INFO, "%pv load: %s (%lu)\n", v, name,
>>>>> +                 (unsigned long)len);
>>>>> +
>>>>> +    BUG_ON(tc != c->desc.typecode);
>>>>> +    BUG_ON(v->vcpu_id != c->desc.vcpu_id);
>>>>> +
>>>>> +    if ( (exact && (len != c->desc.length)) ||
>>>>> +         (len < c->desc.length) )
>>>>> +        return -EINVAL;
>>>>
>>>> How about
>>>>
>>>>     if ( exact ? len != c->desc.length
>>>>                : len < c->desc.length )
>>>>
>>>
>>> Yes, that doesn't look too bad.
>>>
>>>> ? I'm also unsure about the < - don't you mean > instead? Too
>>>> little data would be compensated by zero padding, but too
>>>> much data can't be dealt with. But maybe I'm getting the sense
>>>> of len wrong ...
>>>
>>> I think the < is correct. The caller needs to have at least enough
>>> space to accommodate the context record.
>>
>> But this is load, not save - the caller supplies the data. If
>> there's less data than can be fit, it'll be zero-extended. If
>> there's too much data, the excess you don't know what to do
>> with (it might be okay to tolerate it being all zero).
>>
> 
> But this is a callback. The outer load function iterates over
> the records calling the appropriate hander for each one. Those
> handlers then call this function saying how much data they
> expect and whether they want exactly that amount, or whether
> they can tolerate less (i.e. zero-extend). Hence
> len < c->desc.length is an error, because it means the
> descriptor contains more data than the hander knows how to
> handle.

Oh, I see - "But maybe I'm getting the sense of len wrong ..."
then indeed applies.

Any thoughts on tolerating the excess data being zero?

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
  2020-05-07  7:39           ` Jan Beulich
@ 2020-05-07  7:45             ` Paul Durrant
  2020-05-07  8:17               ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Paul Durrant @ 2020-05-07  7:45 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: 'Stefano Stabellini', 'Julien Grall',
	'Wei Liu', 'Andrew Cooper',
	'Paul Durrant', 'Ian Jackson',
	'George Dunlap', xen-devel, 'Volodymyr Babchuk',
	'Roger Pau Monné'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 07 May 2020 08:40
> To: paul@xen.org
> Cc: xen-devel@lists.xenproject.org; 'Paul Durrant' <pdurrant@amazon.com>; 'Andrew Cooper'
> <andrew.cooper3@citrix.com>; 'George Dunlap' <george.dunlap@citrix.com>; 'Ian Jackson'
> <ian.jackson@eu.citrix.com>; 'Julien Grall' <julien@xen.org>; 'Stefano Stabellini'
> <sstabellini@kernel.org>; 'Wei Liu' <wl@xen.org>; 'Volodymyr Babchuk' <Volodymyr_Babchuk@epam.com>;
> 'Roger Pau Monné' <roger.pau@citrix.com>
> Subject: Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
> 
> On 07.05.2020 09:34, Paul Durrant wrote:
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 07 May 2020 08:22
> >>
> >> On 06.05.2020 18:44, Paul Durrant wrote:
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: 29 April 2020 12:02
> >>>>
> >>>> On 07.04.2020 19:38, Paul Durrant wrote:
> >>>>> +int domain_load_begin(struct domain_context *c, unsigned int tc,
> >>>>> +                      const char *name, const struct vcpu *v, size_t len,
> >>>>> +                      bool exact)
> >>>>> +{
> >>>>> +    if ( c->log )
> >>>>> +        gdprintk(XENLOG_INFO, "%pv load: %s (%lu)\n", v, name,
> >>>>> +                 (unsigned long)len);
> >>>>> +
> >>>>> +    BUG_ON(tc != c->desc.typecode);
> >>>>> +    BUG_ON(v->vcpu_id != c->desc.vcpu_id);
> >>>>> +
> >>>>> +    if ( (exact && (len != c->desc.length)) ||
> >>>>> +         (len < c->desc.length) )
> >>>>> +        return -EINVAL;
> >>>>
> >>>> How about
> >>>>
> >>>>     if ( exact ? len != c->desc.length
> >>>>                : len < c->desc.length )
> >>>>
> >>>
> >>> Yes, that doesn't look too bad.
> >>>
> >>>> ? I'm also unsure about the < - don't you mean > instead? Too
> >>>> little data would be compensated by zero padding, but too
> >>>> much data can't be dealt with. But maybe I'm getting the sense
> >>>> of len wrong ...
> >>>
> >>> I think the < is correct. The caller needs to have at least enough
> >>> space to accommodate the context record.
> >>
> >> But this is load, not save - the caller supplies the data. If
> >> there's less data than can be fit, it'll be zero-extended. If
> >> there's too much data, the excess you don't know what to do
> >> with (it might be okay to tolerate it being all zero).
> >>
> >
> > But this is a callback. The outer load function iterates over
> > the records calling the appropriate hander for each one. Those
> > handlers then call this function saying how much data they
> > expect and whether they want exactly that amount, or whether
> > they can tolerate less (i.e. zero-extend). Hence
> > len < c->desc.length is an error, because it means the
> > descriptor contains more data than the hander knows how to
> > handle.
> 
> Oh, I see - "But maybe I'm getting the sense of len wrong ..."
> then indeed applies.
> 
> Any thoughts on tolerating the excess data being zero?
> 

Well the point of the check here is to not tolerate excess data... Are you suggesting that it might be a reasonable idea? If so, then yes, insisting it is all zero would be an alternative.

  Paul




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
  2020-05-07  7:45             ` Paul Durrant
@ 2020-05-07  8:17               ` Jan Beulich
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2020-05-07  8:17 UTC (permalink / raw)
  To: paul
  Cc: 'Stefano Stabellini', 'Julien Grall',
	'Wei Liu', 'Andrew Cooper',
	'Paul Durrant', 'Ian Jackson',
	'George Dunlap', xen-devel, 'Volodymyr Babchuk',
	'Roger Pau Monné'

On 07.05.2020 09:45, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 07 May 2020 08:40
>> To: paul@xen.org
>> Cc: xen-devel@lists.xenproject.org; 'Paul Durrant' <pdurrant@amazon.com>; 'Andrew Cooper'
>> <andrew.cooper3@citrix.com>; 'George Dunlap' <george.dunlap@citrix.com>; 'Ian Jackson'
>> <ian.jackson@eu.citrix.com>; 'Julien Grall' <julien@xen.org>; 'Stefano Stabellini'
>> <sstabellini@kernel.org>; 'Wei Liu' <wl@xen.org>; 'Volodymyr Babchuk' <Volodymyr_Babchuk@epam.com>;
>> 'Roger Pau Monné' <roger.pau@citrix.com>
>> Subject: Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
>>
>> On 07.05.2020 09:34, Paul Durrant wrote:
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 07 May 2020 08:22
>>>>
>>>> On 06.05.2020 18:44, Paul Durrant wrote:
>>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>>> Sent: 29 April 2020 12:02
>>>>>>
>>>>>> On 07.04.2020 19:38, Paul Durrant wrote:
>>>>>>> +int domain_load_begin(struct domain_context *c, unsigned int tc,
>>>>>>> +                      const char *name, const struct vcpu *v, size_t len,
>>>>>>> +                      bool exact)
>>>>>>> +{
>>>>>>> +    if ( c->log )
>>>>>>> +        gdprintk(XENLOG_INFO, "%pv load: %s (%lu)\n", v, name,
>>>>>>> +                 (unsigned long)len);
>>>>>>> +
>>>>>>> +    BUG_ON(tc != c->desc.typecode);
>>>>>>> +    BUG_ON(v->vcpu_id != c->desc.vcpu_id);
>>>>>>> +
>>>>>>> +    if ( (exact && (len != c->desc.length)) ||
>>>>>>> +         (len < c->desc.length) )
>>>>>>> +        return -EINVAL;
>>>>>>
>>>>>> How about
>>>>>>
>>>>>>     if ( exact ? len != c->desc.length
>>>>>>                : len < c->desc.length )
>>>>>>
>>>>>
>>>>> Yes, that doesn't look too bad.
>>>>>
>>>>>> ? I'm also unsure about the < - don't you mean > instead? Too
>>>>>> little data would be compensated by zero padding, but too
>>>>>> much data can't be dealt with. But maybe I'm getting the sense
>>>>>> of len wrong ...
>>>>>
>>>>> I think the < is correct. The caller needs to have at least enough
>>>>> space to accommodate the context record.
>>>>
>>>> But this is load, not save - the caller supplies the data. If
>>>> there's less data than can be fit, it'll be zero-extended. If
>>>> there's too much data, the excess you don't know what to do
>>>> with (it might be okay to tolerate it being all zero).
>>>>
>>>
>>> But this is a callback. The outer load function iterates over
>>> the records calling the appropriate hander for each one. Those
>>> handlers then call this function saying how much data they
>>> expect and whether they want exactly that amount, or whether
>>> they can tolerate less (i.e. zero-extend). Hence
>>> len < c->desc.length is an error, because it means the
>>> descriptor contains more data than the hander knows how to
>>> handle.
>>
>> Oh, I see - "But maybe I'm getting the sense of len wrong ..."
>> then indeed applies.
>>
>> Any thoughts on tolerating the excess data being zero?
>>
> 
> Well the point of the check here is to not tolerate excess data...
> Are you suggesting that it might be a reasonable idea?

Well - it looks to be the obvious counterpart to zero-extending.
I'm not going to assert though that I've thought through all
possible consequences...

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
  2020-05-07  7:21       ` Jan Beulich
  2020-05-07  7:34         ` Paul Durrant
@ 2020-05-07  8:35         ` Julien Grall
  2020-05-07  8:58           ` Jan Beulich
  1 sibling, 1 reply; 30+ messages in thread
From: Julien Grall @ 2020-05-07  8:35 UTC (permalink / raw)
  To: Jan Beulich, paul
  Cc: 'Stefano Stabellini', 'Wei Liu',
	'Andrew Cooper', 'Paul Durrant',
	'Ian Jackson', 'George Dunlap',
	xen-devel, 'Volodymyr Babchuk',
	'Roger Pau Monné'



On 07/05/2020 08:21, Jan Beulich wrote:
> On 06.05.2020 18:44, Paul Durrant wrote:
>>>> +#define DOMAIN_SAVE_BEGIN(_x, _c, _v, _len) \
>>>> +        domain_save_begin((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_len))
>>>
>>> In new code I'd like to ask for no leading underscores on macro
>>> parameters as well as no unnecessary parenthesization of macro
>>> parameters (e.g. when they simply get passed on to another macro
>>> or function without being part of a larger expression).
>>
>> Personally I think it is generally good practice to parenthesize
>> but I can drop if you prefer.
> 
> To be honest - it's more than just "prefer": Excess parentheses
> hamper readability. There shouldn't be too few, and since macros
> already require quite a lot of them, imo we should strive to
> have exactly as many as are needed.

While I understand that too many parentheses may make the code worse, in 
the case of the macros, adding them for each argument is a good 
practice. This pretty simple to follow and avoid the mistake to forget 
to protect an argument correctly.

So I would let the contributor decides whether he wants to protect all 
the macro arguments or just as a need basis.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
  2020-05-07  8:35         ` Julien Grall
@ 2020-05-07  8:58           ` Jan Beulich
  2020-05-07  9:31             ` Julien Grall
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2020-05-07  8:58 UTC (permalink / raw)
  To: Julien Grall
  Cc: 'Stefano Stabellini', 'Wei Liu',
	paul, 'Andrew Cooper', 'Paul Durrant',
	'Ian Jackson', 'George Dunlap',
	xen-devel, 'Volodymyr Babchuk',
	'Roger Pau Monné'

On 07.05.2020 10:35, Julien Grall wrote:
> 
> 
> On 07/05/2020 08:21, Jan Beulich wrote:
>> On 06.05.2020 18:44, Paul Durrant wrote:
>>>>> +#define DOMAIN_SAVE_BEGIN(_x, _c, _v, _len) \
>>>>> +        domain_save_begin((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_len))
>>>>
>>>> In new code I'd like to ask for no leading underscores on macro
>>>> parameters as well as no unnecessary parenthesization of macro
>>>> parameters (e.g. when they simply get passed on to another macro
>>>> or function without being part of a larger expression).
>>>
>>> Personally I think it is generally good practice to parenthesize
>>> but I can drop if you prefer.
>>
>> To be honest - it's more than just "prefer": Excess parentheses
>> hamper readability. There shouldn't be too few, and since macros
>> already require quite a lot of them, imo we should strive to
>> have exactly as many as are needed.
> 
> While I understand that too many parentheses may make the code
> worse, in the case of the macros, adding them for each argument
> is a good practice. This pretty simple to follow and avoid the
> mistake to forget to protect an argument correctly.
> 
> So I would let the contributor decides whether he wants to
> protect all the macro arguments or just as a need basis.

This is a possible alternative position to take, accepting the
worse readability. But then this would please need applying in
full (as far as it's possible - operands to # or ## of course
can't be parenthesized):

#define DOMAIN_SAVE_BEGIN(x, c, v, len) \
        domain_save_begin((c), DOMAIN_SAVE_CODE((x)), #x, (v), (len))

which might look a little odd even to you?

As to readability - I'm sure you realize that from time to time
one needs to look at preprocessor output, where parentheses used
like this plus parentheses then also applied inside the invoked
macro add to one another. All in all I don't really buy the
"avoid the mistake to forget to protect an argument correctly"
argument to outweigh the downsides. We're doing this work not on
an occasional basis, but as a job. There should be a minimum
level of care everyone applies.

Jan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
  2020-05-07  8:58           ` Jan Beulich
@ 2020-05-07  9:31             ` Julien Grall
  0 siblings, 0 replies; 30+ messages in thread
From: Julien Grall @ 2020-05-07  9:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: 'Stefano Stabellini', 'Wei Liu',
	paul, 'Andrew Cooper', 'Paul Durrant',
	'Ian Jackson', 'George Dunlap',
	xen-devel, 'Volodymyr Babchuk',
	'Roger Pau Monné'

Hi,

On 07/05/2020 09:58, Jan Beulich wrote:
> On 07.05.2020 10:35, Julien Grall wrote:
>>
>>
>> On 07/05/2020 08:21, Jan Beulich wrote:
>>> On 06.05.2020 18:44, Paul Durrant wrote:
>>>>>> +#define DOMAIN_SAVE_BEGIN(_x, _c, _v, _len) \
>>>>>> +        domain_save_begin((_c), DOMAIN_SAVE_CODE(_x), #_x, (_v), (_len))
>>>>>
>>>>> In new code I'd like to ask for no leading underscores on macro
>>>>> parameters as well as no unnecessary parenthesization of macro
>>>>> parameters (e.g. when they simply get passed on to another macro
>>>>> or function without being part of a larger expression).
>>>>
>>>> Personally I think it is generally good practice to parenthesize
>>>> but I can drop if you prefer.
>>>
>>> To be honest - it's more than just "prefer": Excess parentheses
>>> hamper readability. There shouldn't be too few, and since macros
>>> already require quite a lot of them, imo we should strive to
>>> have exactly as many as are needed.
>>
>> While I understand that too many parentheses may make the code
>> worse, in the case of the macros, adding them for each argument
>> is a good practice. This pretty simple to follow and avoid the
>> mistake to forget to protect an argument correctly.
>>
>> So I would let the contributor decides whether he wants to
>> protect all the macro arguments or just as a need basis.
> 
> This is a possible alternative position to take, accepting the
> worse readability. But then this would please need applying in
> full (as far as it's possible - operands to # or ## of course
> can't be parenthesized):
> 
> #define DOMAIN_SAVE_BEGIN(x, c, v, len) \
>          domain_save_begin((c), DOMAIN_SAVE_CODE((x)), #x, (v), (len))
> 
> which might look a little odd even to you?

One pair of parenthesis looks wasteful but it doesn't bother that much.

> 
> As to readability - I'm sure you realize that from time to time
> one needs to look at preprocessor output, where parentheses used
> like this plus parentheses then also applied inside the invoked
> macro add to one another. All in all I don't really buy the
> "avoid the mistake to forget to protect an argument correctly"
> argument to outweigh the downsides. We're doing this work not on
> an occasional basis, but as a job. There should be a minimum
> level of care everyone applies.

I am sure you heard about "human error" in the past. Even if you do 
something all day along, you will make a mistake one day. By requesting 
a contributor to limit the number of parenthesis, then you increase the 
chance he/she will screw up one day.

You might think reviewers will catch such error, but XSA-316 proved that 
it is possible to miss it. I was the author of the patch and you were 
one the reviewer. As you can see even an experienced contributor and 
reviewer can make mistake... we are all human after all ;).

I don't ask to be over-proctective, but we should not lay trap to other 
contributors for them to fall into accidentally. In this case, Paul 
thinks the parentheses will help him. So I would not impose him to 
remove them and possibly make a mistake.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext
  2020-04-29 14:50   ` Jan Beulich
@ 2020-05-13 15:06     ` Paul Durrant
  0 siblings, 0 replies; 30+ messages in thread
From: Paul Durrant @ 2020-05-13 15:06 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: 'Stefano Stabellini', 'Julien Grall',
	'Wei Liu', 'Andrew Cooper',
	'Paul Durrant', 'Ian Jackson',
	'George Dunlap', xen-devel, 'Daniel De Graaf'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 29 April 2020 15:51
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Daniel De Graaf
> <dgdegra@tycho.nsa.gov>; Ian Jackson <ian.jackson@eu.citrix.com>; Wei Liu <wl@xen.org>; Andrew Cooper
> <andrew.cooper3@citrix.com>; George Dunlap <george.dunlap@citrix.com>; Julien Grall <julien@xen.org>;
> Stefano Stabellini <sstabellini@kernel.org>
> Subject: Re: [PATCH v2 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext
> 
> On 07.04.2020 19:38, Paul Durrant wrote:
> > @@ -358,6 +359,113 @@ static struct vnuma_info *vnuma_init(const struct xen_domctl_vnuma *uinfo,
> >      return ERR_PTR(ret);
> >  }
> >
> > +struct domctl_context
> > +{
> > +    void *buffer;
> > +    size_t len;
> > +    size_t cur;
> > +};
> > +
> > +static int accumulate_size(void *priv, const void *data, size_t len)
> > +{
> > +    struct domctl_context *c = priv;
> > +
> > +    if ( c->len + len < c->len )
> > +        return -EOVERFLOW;
> > +
> > +    c->len += len;
> > +
> > +    return 0;
> > +}
> > +
> > +static int save_data(void *priv, const void *data, size_t len)
> > +{
> > +    struct domctl_context *c = priv;
> > +
> > +    if ( c->len - c->cur < len )
> > +        return -ENOSPC;
> > +
> > +    memcpy(c->buffer + c->cur, data, len);
> > +    c->cur += len;
> > +
> > +    return 0;
> > +}
> > +
> > +static int getdomaincontext(struct domain *d,
> > +                            struct xen_domctl_getdomaincontext *gdc)
> > +{
> > +    struct domctl_context c = { };
> 
> Please can you use ZERO_BLOCK_PTR or some such for the buffer
> field, such that errnoeous use of the field would not end up
> as a (PV-controllable) NULL deref. (Yes, it's a domctl, but
> still.) This being common code you also want to get things
> right for Arm, irrespective of whether the code will be dead
> there for now.
> 

Ok.

> > +    int rc;
> > +
> > +    if ( d == current->domain )
> > +        return -EPERM;
> > +
> > +    if ( guest_handle_is_null(gdc->buffer) ) /* query for buffer size */
> > +    {
> > +        if ( gdc->size )
> > +            return -EINVAL;
> > +
> > +        /* dry run to acquire buffer size */
> > +        rc = domain_save(d, accumulate_size, &c, true);
> > +        if ( rc )
> > +            return rc;
> > +
> > +        gdc->size = c.len;
> > +        return 0;
> > +    }
> > +
> > +    c.len = gdc->size;
> > +    c.buffer = xmalloc_bytes(c.len);
> 
> What sizes are we looking at here? It may be better to use vmalloc()
> right from the start.

Could be quite big so that seems reasonable.

> If not, I'd like to advocate for using
> xmalloc_array() instead of xmalloc_bytes() - see the almost-XSA
> commit cf38b4926e2b.
> 
> > +    if ( !c.buffer )
> > +        return -ENOMEM;
> > +
> > +    rc = domain_save(d, save_data, &c, false);
> > +
> > +    gdc->size = c.cur;
> > +    if ( !rc && copy_to_guest(gdc->buffer, c.buffer, gdc->size) )
> 
> As to my remark in patch 1 on the size field, applying to this size
> field too - copy_to_user{,_hvm}() don't support a 64-bit value (on
> y86 at least).

Ok.

> 
> > --- a/xen/include/public/domctl.h
> > +++ b/xen/include/public/domctl.h
> > @@ -38,7 +38,7 @@
> >  #include "hvm/save.h"
> >  #include "memory.h"
> >
> > -#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
> > +#define XEN_DOMCTL_INTERFACE_VERSION 0x00000013
> 
> I don't see you making any change making the interface backwards
> incompatible, hence no need for the bump.
> 

Ok.

> > @@ -1129,6 +1129,44 @@ struct xen_domctl_vuart_op {
> >                                   */
> >  };
> >
> > +/*
> > + * Get/Set domain PV context. The same struct xen_domctl_domaincontext
> > + * is used for both commands but with slightly different field semantics
> > + * as follows:
> > + *
> > + * XEN_DOMCTL_getdomaincontext
> > + * ---------------------------
> > + *
> > + * buffer (IN):   The buffer into which the context data should be
> > + *                copied, or NULL to query the buffer size that should
> > + *                be allocated.
> > + * size (IN/OUT): If 'buffer' is NULL then the value passed in must be
> > + *                zero, and the value passed out will be the size of the
> > + *                buffer to allocate.
> > + *                If 'buffer' is non-NULL then the value passed in must
> > + *                be the size of the buffer into which data may be copied.
> 
> This leaves open whether the size also gets updated in this latter
> case.

I'll clarify.

> 
> > + */
> > +struct xen_domctl_getdomaincontext {
> > +    uint64_t size;
> 
> If this is to remain 64-bits (with too large values suitably taken
> care of for all cases - see above), uint64_aligned_t please for
> consistency, if nothing else.

I've changed to uint32_t.

  Paul

> 
> Jan



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2 3/5] tools/misc: add xen-domctx to present domain context
  2020-04-29 15:04   ` Jan Beulich
@ 2020-05-13 15:27     ` Paul Durrant
  0 siblings, 0 replies; 30+ messages in thread
From: Paul Durrant @ 2020-05-13 15:27 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: xen-devel, 'Paul Durrant', 'Ian Jackson',
	'Wei Liu'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 29 April 2020 16:04
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Ian Jackson
> <ian.jackson@eu.citrix.com>; Wei Liu <wl@xen.org>
> Subject: Re: [PATCH v2 3/5] tools/misc: add xen-domctx to present domain context
> 
> On 07.04.2020 19:38, Paul Durrant wrote:
> > +int main(int argc, char **argv)
> > +{
> > +    uint32_t domid;
> > +    unsigned int entry;
> > +    xc_interface *xch;
> > +    int rc;
> > +
> > +    if ( argc != 2 || !argv[1] || (rc = atoi(argv[1])) < 0 )
> > +    {
> > +        fprintf(stderr, "usage: %s <domid>\n", argv[0]);
> > +        exit(1);
> > +    }
> 
> Perhaps also allow dumping just a single (vCPU or other) ID?
> 

Yes, I can add optional type and instance arguments.

  Paul

> Jan



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2020-05-13 15:28 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-07 17:38 [PATCH v2 0/5] domain context infrastructure Paul Durrant
2020-04-07 17:38 ` [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context Paul Durrant
2020-04-20 17:20   ` Julien Grall
2020-04-28 15:35     ` Paul Durrant
2020-04-29 11:05       ` Julien Grall
2020-04-29 11:02   ` Jan Beulich
2020-05-06 16:44     ` Paul Durrant
2020-05-07  7:21       ` Jan Beulich
2020-05-07  7:34         ` Paul Durrant
2020-05-07  7:39           ` Jan Beulich
2020-05-07  7:45             ` Paul Durrant
2020-05-07  8:17               ` Jan Beulich
2020-05-07  8:35         ` Julien Grall
2020-05-07  8:58           ` Jan Beulich
2020-05-07  9:31             ` Julien Grall
2020-04-07 17:38 ` [PATCH v2 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext Paul Durrant
2020-04-20 17:26   ` Julien Grall
2020-04-28 15:36     ` Paul Durrant
2020-04-29 14:50   ` Jan Beulich
2020-05-13 15:06     ` Paul Durrant
2020-04-07 17:38 ` [PATCH v2 3/5] tools/misc: add xen-domctx to present domain context Paul Durrant
2020-04-29 15:04   ` Jan Beulich
2020-05-13 15:27     ` Paul Durrant
2020-04-07 17:38 ` [PATCH v2 4/5] common/domain: add a domain context record for shared_info Paul Durrant
2020-04-20 17:34   ` Julien Grall
2020-04-28 15:37     ` Paul Durrant
2020-04-30 11:29       ` Jan Beulich
2020-04-30 11:56   ` Jan Beulich
2020-04-07 17:38 ` [PATCH v2 5/5] tools/libxc: make use of domain context SHARED_INFO record Paul Durrant
2020-04-30 11:57   ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.