All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paul Durrant <xadimgnik@gmail.com>
To: "'Julien Grall'" <julien@xen.org>, <pdurrant@amzn.com>,
	<xen-devel@lists.xenproject.org>
Cc: 'Stefano Stabellini' <sstabellini@kernel.org>,
	'Wei Liu' <wl@xen.org>,
	'Konrad Rzeszutek Wilk' <konrad.wilk@oracle.com>,
	'George Dunlap' <George.Dunlap@eu.citrix.com>,
	'Andrew Cooper' <andrew.cooper3@citrix.com>,
	'Ian Jackson' <ian.jackson@eu.citrix.com>,
	'Jan Beulich' <jbeulich@suse.com>
Subject: Re: [Xen-devel] [EXTERNAL][PATCH v6 2/2] docs/designs: Add a design document for migration of xenstore data
Date: Fri, 6 Mar 2020 12:19:17 -0000	[thread overview]
Message-ID: <01cd01d5f3b1$78b69410$6a23bc30$@xen.org> (raw)
In-Reply-To: <982fbada-6b29-aafd-4faa-14b60b2cc900@xen.org>

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 06 March 2020 12:03
> To: pdurrant@amzn.com; xen-devel@lists.xenproject.org
> Cc: Durrant, Paul <pdurrant@amazon.co.uk>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
> <George.Dunlap@eu.citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Jan Beulich
> <jbeulich@suse.com>; Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Wei Liu <wl@xen.org>
> Subject: RE: [EXTERNAL][PATCH v6 2/2] docs/designs: Add a design document for migration of xenstore
> data
> 
> CAUTION: This email originated from outside of the organization. Do not click links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> Hi Paul,
> 
> On 05/03/2020 17:30, pdurrant@amzn.com wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > This patch details proposes extra migration data and xenstore protocol
> > extensions to support non-cooperative live migration of guests.
> >
> > NOTE: doc/misc/xenstore.txt is also amened to replace the <mfn> term
> >        for the INTRODUCE operation with the <gfn>, since this is what
> >        it actually is.
> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> > ---
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Julien Grall <julien@xen.org>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Wei Liu <wl@xen.org>
> >
> > v6:
> >   - Addressed comments from Julien
> >
> > v5:
> >   - Add QUIESCE
> >   - Make semantics of <index> in GET_DOMAIN_WATCHES more clear
> >
> > v4:
> >   - Drop the restrictions on special paths
> >
> > v3:
> >   - New in v3
> > ---
> >   docs/designs/xenstore-migration.md | 171 +++++++++++++++++++++++++++++
> >   docs/misc/xenstore.txt             |   6 +-
> >   2 files changed, 174 insertions(+), 3 deletions(-)
> >   create mode 100644 docs/designs/xenstore-migration.md
> >
> > diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
> > new file mode 100644
> > index 0000000000..7e61f462f0
> > --- /dev/null
> > +++ b/docs/designs/xenstore-migration.md
> > @@ -0,0 +1,171 @@
> > +# Xenstore Migration
> > +
> > +## Background
> > +
> > +The design for *Non-Cooperative Migration of Guests*[1] explains that extra
> > +save records are required in the migrations stream to allow a guest running
> > +PV drivers to be migrated without its co-operation. Moreover the save
> > +records must include details of registered xenstore watches as well as
> > +content; information that cannot currently be recovered from `xenstored`,
> > +and hence some extension to the xenstore protocol[2] will also be required.
> > +
> > +The *libxenlight Domain Image Format* specification[3] already defines a
> > +record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
> > +transferring xenstore data pertaining to the domain directly as it is
> > +specified such that keys are relative to the path
> > +`/local/domain/$dm_domid/device-model/$domid`. Thus it is necessary to
> > +define at least one new save record type.
> > +
> > +## Proposal
> > +
> > +### New Save Record
> > +
> > +A new mandatory record type should be defined within the libxenlight Domain
> > +Image Format:
> > +
> > +`0x00000007: DOMAIN_XENSTORE_DATA`
> > +
> > +The format of each of these new records should be as follows:
> > +
> > +
> > +```
> > +0     1     2     3     4     5     6     7 octet
> > ++------------------------+------------------------+
> > +| type                   | record specific data   |
> > ++------------------------+                        |
> > +...
> > ++-------------------------------------------------+
> > +```
> > +
> > +NB: The record data does not contain a length because the libxenlight record
> > +header specifies the length.
> > +
> > +
> > +| Field  | Description                                      |
> > +|--------|--------------------------------------------------|
> > +| `type` | 0x00000000: invalid                              |
> > +|        | 0x00000001: node data                            |
> > +|        | 0x00000002: watch data                           |
> > +|        | 0x00000003: transaction data                     |
> > +|        | 0x00000004 - 0xFFFFFFFF: reserved for future use |
> > +
> > +
> > +where data is always in the form of a tuple as follows
> 
> NIT: missing full stop.
> 

Ok.

> > +
> > +
> > +**node data**
> > +
> > +
> > +`<path>|<perm-count>|<perm-as-string>|+<value|>`
> > +
> In the xenstore spec, | means a 'nul (zero) byte'. Are you using the
> same terminology here? If so, it may be worth mentioning it.
> 

Ok, I'll cut'n'paste the format definitions in for ease of reference I think.

> > +
> > +`<path>` and `<value|>` should be suitable to formulate a `WRITE` operation
> > +to the receiving xenstored and the `<perm-as-string>|+` list should be
> > +similarly suitable to formulate a subsequent `SET_PERMS` operation.
> > +`<perm-count>` specifies the number of entries in the `<perm-as-string>|+`
> > +list and `<value|>` must be placed at the end because it may contain NUL
> > +octets.
> 
> What is the size of <perm-count>? Also, we may want perm-count to be
> aligned to its size so we don't have to worry of unaligned access.
> 
> How about moving <perm-count> at the beginning of the data blob?
>

Not sure we really need to care about alignment... I'll leave as-is for the moment.
 
> > +
> > +
> > +**watch data**
> > +
> > +
> > +`<path>|<token>|`
> > +
> > +`<path>` again is absolute and, together with `<token>`, should
> > +be suitable to formulate an `ADD_DOMAIN_WATCHES` operation (see below).
> > +
> > +
> > +**transaction data**
> > +
> > +
> > +`<transid-count>|<transid>|+`
> > +
> > +Each `<transid>` should be a uint32_t value represented as unsigned decimal
> > +suitable for passing as a *tx_id* to the re-defined `TRANSACTION_START`
> > +operation (see below). `<transid-count>` is the number of entries in the
> > +`<transid>|+` list.
> 
> What is the size of <transid-count>?. Also the | suggests we would have
> a NUL byte between the two numbers. This would mean the <transid> will
> not be aligned to its size.
> 

I was considering transid as text encoded (to match the TRANSACTION_START operation) but I agree the mix is odd. Perhaps I'll just use structures for the migration records instead.

> > +
> > +
> > +### Protocol Extension
> > +
> > +Before xenstore state is migrated it is necessary to wait for any pending
> > +reads, writes, watch registrations etc. to complete, and also to make sure
> > +that xenstored does not start processing any new requests (so that new
> > +requests remain pending on the shared ring for subsequent processing on the
> > +new host). Hence the following operation is needed:
> > +
> > +```
> > +QUIESCE                 <domid>|
> > +
> > +Complete processing of any request issued by the specified domain, and
> > +do not process any further requests from the shared ring.
> > +``` > +
> > +The `WATCH` operation does not allow specification of a `<domid>`; it is
> > +assumed that the watch pertains to the domain that owns the shared ring
> > +over which the operation is passed. Hence, for the tool-stack to be able
> > +to register a watch on behalf of a domain a new operation is needed:
> > +
> > +```
> > +ADD_DOMAIN_WATCHES      <domid>|<watch>|+
> > +
> > +Adds watches on behalf of the specified domain.
> > +
> > +<watch> is a NUL separated tuple of <path>|<token>. The semantics of this
> > +operation are identical to the domain issuing WATCH <path>|<token>| for
> > +each <watch>.
> > +```
> > +
> > +The watch information for a domain also needs to be extracted from the
> > +sending xenstored so the following operation is also needed:
> > +
> > +```
> > +GET_DOMAIN_WATCHES      <domid>|<index>   <gencnt>|<watch>|*
> > +
> > +Gets the list of watches that are currently registered for the domain.
> > +
> > +<watch> is a NUL separated tuple of <path>|<token>. The sub-list returned
> > +will start at <index> items into the the overall list of watches and may
> > +be truncated (at a <watch> boundary) such that the returned data fits
> > +within XENSTORE_PAYLOAD_MAX.
> > +
> > +If <index> is beyond the end of the overall list then the returned sub-
> > +list will be empty. If the value of <gencnt> changes then it indicates
> > +that the overall watch list has changed and thus it may be necessary
> > +to re-issue the operation for previous values of <index>.
> > +```
> > +
> > +To deal with transactions that were pending when the domain is migrated
> > +it is necessary to start transactions with the same `<trans-id>` in the
> > +receiving xenstored but for them to result in an `EAGAIN` when the
> > +`TRANSACTION_END` operation is peformed. Thus the `TRANSACTION_START`
> > +operation needs to be re-defined as follows:
> > +
> > +```
> > +TRANSACTION_START    |                       <transid>|
> > +     <transid> is an opaque uint32_t represented as unsigned decimal.
> > +    If tx_id is 0 for this operation then a new transaction will be started
> > +    with a tx_id allocated by xenstored. If a non-0 tx_id is specified then
> > +    a transaction with that tx_id will be started and automatically marked
> > +    `conflicting'. The tx_id will always be passed back in <transid>.
> > +    After this, the tx_id may be used in the request header field for
> > +    other operations.
> > +    When a transaction is started whole db is copied; reads and writes
> > +    happen on the copy.
> > +```
> 
> The transaction ID are per connection. As the toolstack will issue the
> command on restore, you would reserve the ID for the wrong domain. So I
> think you want to introduce a new command maybe
> RESERVE_DOMAIN_TRANSACTIONS?

Yes, of course you are correct. So much for trying to re-use things :-( I'll spec a new command.

  Paul

> 
> Cheers,
> 
> --
> Julien Grall


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  reply	other threads:[~2020-03-06 13:06 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-05 17:30 [Xen-devel] [PATCH v6 0/2] docs: Migration design documents pdurrant
2020-03-05 17:30 ` [Xen-devel] [PATCH v6 1/2] docs/designs: Add a design document for non-cooperative live migration pdurrant
2020-03-06 11:06   ` Julien Grall
2020-03-05 17:30 ` [Xen-devel] [PATCH v6 2/2] docs/designs: Add a design document for migration of xenstore data pdurrant
2020-03-06 12:02   ` Julien Grall
2020-03-06 12:19     ` Paul Durrant [this message]
2020-03-06 16:18       ` [Xen-devel] [EXTERNAL][PATCH " Julien Grall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='01cd01d5f3b1$78b69410$6a23bc30$@xen.org' \
    --to=xadimgnik@gmail.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=konrad.wilk@oracle.com \
    --cc=pdurrant@amzn.com \
    --cc=sstabellini@kernel.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.