xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Julien Grall <julien@xen.org>
To: Paul Durrant <paul@xen.org>, xen-devel@lists.xenproject.org
Cc: "Stefano Stabellini" <sstabellini@kernel.org>,
	"Wei Liu" <wl@xen.org>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Paul Durrant" <pdurrant@amazon.com>,
	"Ian Jackson" <ian.jackson@eu.citrix.com>,
	"George Dunlap" <george.dunlap@citrix.com>,
	"Jan Beulich" <jbeulich@suse.com>,
	"Volodymyr Babchuk" <Volodymyr_Babchuk@epam.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context
Date: Mon, 20 Apr 2020 18:20:43 +0100	[thread overview]
Message-ID: <2f69484c-d043-bded-0a88-2587241aaa94@xen.org> (raw)
In-Reply-To: <20200407173847.1595-2-paul@xen.org>

Hi Paul,

On 07/04/2020 18:38, Paul Durrant wrote:
> To allow enlightened HVM guests (i.e. those that have PV drivers) to be
> migrated without their co-operation it will be necessary to transfer 'PV'
> state such as event channel state, grant entry state, etc.
> 
> Currently there is a framework (entered via the hvm_save/load() functions)
> that allows a domain's 'HVM' (architectural) state to be transferred but
> 'PV' state is also common with pure PV guests and so this framework is not
> really suitable.
> 
> This patch adds the new public header and low level implementation of a new
> common framework, entered via the domain_save/load() functions. Subsequent
> patches will introduce other parts of the framework, and code that will
> make use of it within the current version of the libxc migration stream.
> 
> This patch also marks the HVM-only framework as deprecated in favour of the
> new framework.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Julien Grall <julien@xen.org>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Wei Liu <wl@xen.org>
> Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> 
> v2:
>   - Allow multi-stage save/load to avoid the need to double-buffer
>   - Get rid of the masks and add an 'ignore' flag instead
>   - Create copy function union to preserve const save buffer
>   - Deprecate HVM-only framework
> ---
>   xen/common/Makefile                    |   1 +
>   xen/common/save.c                      | 329 +++++++++++++++++++++++++
>   xen/include/public/arch-arm/hvm/save.h |   5 +
>   xen/include/public/arch-x86/hvm/save.h |   5 +
>   xen/include/public/save.h              |  84 +++++++
>   xen/include/xen/save.h                 | 152 ++++++++++++
>   6 files changed, 576 insertions(+)
>   create mode 100644 xen/common/save.c
>   create mode 100644 xen/include/public/save.h
>   create mode 100644 xen/include/xen/save.h
> 
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index e8cde65370..90553ba5d7 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -37,6 +37,7 @@ obj-y += radix-tree.o
>   obj-y += rbtree.o
>   obj-y += rcupdate.o
>   obj-y += rwlock.o
> +obj-y += save.o
>   obj-y += shutdown.o
>   obj-y += softirq.o
>   obj-y += sort.o
> diff --git a/xen/common/save.c b/xen/common/save.c
> new file mode 100644
> index 0000000000..6cdac3785b
> --- /dev/null
> +++ b/xen/common/save.c
> @@ -0,0 +1,329 @@
> +/*
> + * save.c: Save and restore PV guest state common to all domain types.
> + *
> + * Copyright Amazon.com Inc. or its affiliates.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/save.h>
> +
> +union domain_copy_entry {
> +    domain_write_entry write;
> +    domain_read_entry read;
> +};
> +
> +struct domain_context {
> +    bool log;
> +    struct domain_save_descriptor desc;
> +    size_t data_len;

What is data_len?

> +    union domain_copy_entry copy;
> +    void *priv;
> +};
> +
> +static struct {
> +    const char *name;
> +    bool per_vcpu;
> +    domain_save_handler save;
> +    domain_load_handler load;
> +} handlers[DOMAIN_SAVE_CODE_MAX + 1];
> +
> +void __init domain_register_save_type(unsigned int tc, const char *name,
> +                                      bool per_vcpu,
> +                                      domain_save_handler save,
> +                                      domain_load_handler load)
> +{
> +    BUG_ON(tc > ARRAY_SIZE(handlers));
> +
> +    ASSERT(!handlers[tc].save);
> +    ASSERT(!handlers[tc].load);
> +
> +    handlers[tc].name = name;
> +    handlers[tc].per_vcpu = per_vcpu;
> +    handlers[tc].save = save;
> +    handlers[tc].load = load;
> +}
> +
> +int domain_save_begin(struct domain_context *c, unsigned int tc,
> +                      const char *name, const struct vcpu *v, size_t len)
> +{
> +    int rc;
> +
> +    if ( c->log )
> +        gdprintk(XENLOG_INFO, "%pv save: %s (%lu)\n", v, name,
> +                 (unsigned long)len);

How about using %zu rather than a cast here?

> +
> +    BUG_ON(tc != c->desc.typecode);
> +    BUG_ON(v->vcpu_id != c->desc.vcpu_id);

I can't find any answer on my question about this part. I understand the 
code would be buggy if this happen, but is it warrant to crash the host? 
Couldn't you just return an error and continue to run?

> +
> +    ASSERT(!c->data_len);
> +    c->data_len = c->desc.length = len;
> +
> +    rc = c->copy.write(c->priv, &c->desc, sizeof(c->desc));
> +    if ( rc )
> +        return rc;
> +
> +    c->desc.length = 0;

This is confusing, why would you need to reset c->desc.length but not 
the rest of the fields?

> +
> +    return 0;
> +}
> +
> +int domain_save_data(struct domain_context *c, const void *src, size_t len)
> +{
> +    if ( c->desc.length + len > c->data_len )
> +        return -ENOSPC;
> +
> +    c->desc.length += len;
> +
> +    return c->copy.write(c->priv, src, len);
> +}
> +
> +int domain_save_end(struct domain_context *c)
> +{
> +    /*
> +     * If desc.length does not match the length specified in
> +     * domain_save_begin(), there should have been more data.
> +     */
> +    if ( c->desc.length != c->data_len )

This suggests you know in advance the size of the record. I have seen 
some cases where we don't know the size in advance. A good example if 
when saving grants. You know the maximum of grant used by the guest but 
you don't know yet the number of grants used. You might need to walk the 
"list" twice or allocate a temporary buffer. None of them are ideal.

Another example is when saving memory, we may want to compact page 
informations to save space.

This problem is going to be more relevant for LiveUpdate where we need 
to be able to create the stream in a very short amount of time. 
Allocating any temporary buffer and/or walking the list twice is going 
to kill the performance.

I would suggest to consider a different approach where you update the 
record length at the end.

FWIW, this below the current approach for the LU stream (IIRC David sent 
a prototype recently). A record is opened using lu_stream_open_record(), 
you then have two way to add data:
     - lu_stream_append() -> This takes a buffer and write to the stream.
     - lu_stream_reserve() -> Pre-allocate space in the stream and 
return a pointer to the beginning of the reserved region.
     - lu_stream_end_reservation() -> Takes the actual size in parameter 
and update the stream size.
     - lu_stream_close_record() -> Update the header with the total 
length and update the position in the stream.

> +        return -EIO;

I noticed that all the pasding check have been dropped. I still think we 
need implicit padding to harden the code.

> +
> +    c->data_len = 0;
> +
> +    return 0;
> +}
> +
> +int domain_save(struct domain *d, domain_write_entry write, void *priv,
> +                bool dry_run)
> +{
> +    struct domain_context c = {
> +        .copy.write = write,
> +        .priv = priv,
> +        .log = !dry_run,
> +    };
> +    struct domain_save_header h = {
> +        .magic = DOMAIN_SAVE_MAGIC,
> +        .version = DOMAIN_SAVE_VERSION,
> +    };
> +    struct domain_save_header e;
> +    unsigned int i;
> +    int rc;
> +
> +    ASSERT(d != current->domain);
> +
> +    if ( d->is_dying )

I can't find an answer to my question about d->is_dying. What if the 
guest die afterwards? What does protect us against domain_kill()?

[...]

> +int domain_load(struct domain *d, domain_read_entry read, void *priv)
> +{
> +    struct domain_context c = {
> +        .copy.read = read,
> +        .priv = priv,
> +        .log = true,
> +    };
> +    struct domain_save_header h;
> +    int rc;
> +
> +    ASSERT(d != current->domain);
> +
> +    if ( d->is_dying )
> +        return -EINVAL;

Same here.


> diff --git a/xen/include/public/save.h b/xen/include/public/save.h
> new file mode 100644
> index 0000000000..7e5f8752bd
> --- /dev/null
> +++ b/xen/include/public/save.h
> @@ -0,0 +1,84 @@
> +/*
> + * save.h
> + *
> + * Structure definitions for common PV/HVM domain state that is held by
> + * Xen and must be saved along with the domain's memory.
> + *
> + * Copyright Amazon.com Inc. or its affiliates.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to
> + * deal in the Software without restriction, including without limitation the
> + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
> + * sell copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef __XEN_PUBLIC_SAVE_H__
> +#define __XEN_PUBLIC_SAVE_H__

Does this header need to be exposed outside of Xen and the tools?

> +
> +#include "xen.h"
> +
> +/* Each entry is preceded by a descriptor */
> +struct domain_save_descriptor {
> +    uint16_t typecode;
> +    /*
> +     * Each entry will contain either to global or per-vcpu domain state.
> +     * Entries relating to global state should have zero in this field.
> +     */
> +    uint16_t vcpu_id;
> +    uint32_t flags;
> +    /*
> +     * When restoring state this flag can be set in a descriptor to cause
> +     * its content to be ignored.

Could you give examples where you would want to ignore a descriptor?

> +     *
> +     * NOTE: It is invalid to set this flag for HEADER or END records (see
> +     *       below).
> +     */
> +#define _DOMAIN_SAVE_FLAG_IGNORE 0
> +#define DOMAIN_SAVE_FLAG_IGNORE (1u << _DOMAIN_SAVE_FLAG_IGNORE)
> +
> +    /* Entry length not including this descriptor */
> +    uint64_t length;
> +};
> +
> +/*
> + * Each entry has a type associated with it. DECLARE_DOMAIN_SAVE_TYPE
> + * binds these things together.
> + */
> +#define DECLARE_DOMAIN_SAVE_TYPE(_x, _code, _type) \
> +    struct __DOMAIN_SAVE_TYPE_##_x { char c[_code]; _type t; };
> +
> +#define DOMAIN_SAVE_CODE(_x) \
> +    (sizeof(((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->c))
> +#define DOMAIN_SAVE_TYPE(_x) \
> +    typeof(((struct __DOMAIN_SAVE_TYPE_##_x *)(0))->t)
> +
> +/* Terminating entry */
> +struct domain_save_end {};
> +DECLARE_DOMAIN_SAVE_TYPE(END, 0, struct domain_save_end);
> +
> +#define DOMAIN_SAVE_MAGIC   0x53415645
> +#define DOMAIN_SAVE_VERSION 0x00000001
> +
> +/* Initial entry */
> +struct domain_save_header {
> +    uint32_t magic;             /* Must be DOMAIN_SAVE_MAGIC */
> +    uint32_t version;           /* Save format version */


I haven't seen any answer about xen version here. For the record:

"Let's imagine in 4.14 we introduced a bug in the saving part. This is
solved in 4.15 but somehow the version is not bumped. How would you
differentiate the streams saved by Xen 4.14 so you can still migrate?

If you record the version of Xen in the record automatically, then you
at least have a way to differentiate the two versions."

Cheers,

-- 
Julien Grall


  reply	other threads:[~2020-04-20 17:21 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-07 17:38 [PATCH v2 0/5] domain context infrastructure Paul Durrant
2020-04-07 17:38 ` [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context Paul Durrant
2020-04-20 17:20   ` Julien Grall [this message]
2020-04-28 15:35     ` Paul Durrant
2020-04-29 11:05       ` Julien Grall
2020-04-29 11:02   ` Jan Beulich
2020-05-06 16:44     ` Paul Durrant
2020-05-07  7:21       ` Jan Beulich
2020-05-07  7:34         ` Paul Durrant
2020-05-07  7:39           ` Jan Beulich
2020-05-07  7:45             ` Paul Durrant
2020-05-07  8:17               ` Jan Beulich
2020-05-07  8:35         ` Julien Grall
2020-05-07  8:58           ` Jan Beulich
2020-05-07  9:31             ` Julien Grall
2020-04-07 17:38 ` [PATCH v2 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext Paul Durrant
2020-04-20 17:26   ` Julien Grall
2020-04-28 15:36     ` Paul Durrant
2020-04-29 14:50   ` Jan Beulich
2020-05-13 15:06     ` Paul Durrant
2020-04-07 17:38 ` [PATCH v2 3/5] tools/misc: add xen-domctx to present domain context Paul Durrant
2020-04-29 15:04   ` Jan Beulich
2020-05-13 15:27     ` Paul Durrant
2020-04-07 17:38 ` [PATCH v2 4/5] common/domain: add a domain context record for shared_info Paul Durrant
2020-04-20 17:34   ` Julien Grall
2020-04-28 15:37     ` Paul Durrant
2020-04-30 11:29       ` Jan Beulich
2020-04-30 11:56   ` Jan Beulich
2020-04-07 17:38 ` [PATCH v2 5/5] tools/libxc: make use of domain context SHARED_INFO record Paul Durrant
2020-04-30 11:57   ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2f69484c-d043-bded-0a88-2587241aaa94@xen.org \
    --to=julien@xen.org \
    --cc=Volodymyr_Babchuk@epam.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=george.dunlap@citrix.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=jbeulich@suse.com \
    --cc=paul@xen.org \
    --cc=pdurrant@amazon.com \
    --cc=roger.pau@citrix.com \
    --cc=sstabellini@kernel.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).