All of lore.kernel.org
 help / color / mirror / Atom feed
From: <pdurrant@amzn.com>
To: <xen-devel@lists.xenproject.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
	Julien Grall <julien@xen.org>, Wei Liu <wl@xen.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Paul Durrant <pdurrant@amazon.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Jan Beulich <jbeulich@suse.com>
Subject: [Xen-devel] [PATCH v6 2/2] docs/designs: Add a design document for migration of xenstore data
Date: Thu, 5 Mar 2020 17:30:41 +0000	[thread overview]
Message-ID: <20200305173041.5141-3-pdurrant@amzn.com> (raw)
In-Reply-To: <20200305173041.5141-1-pdurrant@amzn.com>

From: Paul Durrant <pdurrant@amazon.com>

This patch details proposes extra migration data and xenstore protocol
extensions to support non-cooperative live migration of guests.

NOTE: doc/misc/xenstore.txt is also amened to replace the <mfn> term
      for the INTRODUCE operation with the <gfn>, since this is what
      it actually is.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wl@xen.org>

v6:
 - Addressed comments from Julien

v5:
 - Add QUIESCE
 - Make semantics of <index> in GET_DOMAIN_WATCHES more clear

v4:
 - Drop the restrictions on special paths

v3:
 - New in v3
---
 docs/designs/xenstore-migration.md | 171 +++++++++++++++++++++++++++++
 docs/misc/xenstore.txt             |   6 +-
 2 files changed, 174 insertions(+), 3 deletions(-)
 create mode 100644 docs/designs/xenstore-migration.md

diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
new file mode 100644
index 0000000000..7e61f462f0
--- /dev/null
+++ b/docs/designs/xenstore-migration.md
@@ -0,0 +1,171 @@
+# Xenstore Migration
+
+## Background
+
+The design for *Non-Cooperative Migration of Guests*[1] explains that extra
+save records are required in the migrations stream to allow a guest running
+PV drivers to be migrated without its co-operation. Moreover the save
+records must include details of registered xenstore watches as well as
+content; information that cannot currently be recovered from `xenstored`,
+and hence some extension to the xenstore protocol[2] will also be required.
+
+The *libxenlight Domain Image Format* specification[3] already defines a
+record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
+transferring xenstore data pertaining to the domain directly as it is
+specified such that keys are relative to the path
+`/local/domain/$dm_domid/device-model/$domid`. Thus it is necessary to
+define at least one new save record type.
+
+## Proposal
+
+### New Save Record
+
+A new mandatory record type should be defined within the libxenlight Domain
+Image Format:
+
+`0x00000007: DOMAIN_XENSTORE_DATA`
+
+The format of each of these new records should be as follows:
+
+
+```
+0     1     2     3     4     5     6     7 octet
++------------------------+------------------------+
+| type                   | record specific data   |
++------------------------+                        |
+...
++-------------------------------------------------+
+```
+
+NB: The record data does not contain a length because the libxenlight record
+header specifies the length.
+
+
+| Field  | Description                                      |
+|--------|--------------------------------------------------|
+| `type` | 0x00000000: invalid                              |
+|        | 0x00000001: node data                            |
+|        | 0x00000002: watch data                           |
+|        | 0x00000003: transaction data                     |
+|        | 0x00000004 - 0xFFFFFFFF: reserved for future use |
+
+
+where data is always in the form of a tuple as follows
+
+
+**node data**
+
+
+`<path>|<perm-count>|<perm-as-string>|+<value|>`
+
+
+`<path>` and `<value|>` should be suitable to formulate a `WRITE` operation
+to the receiving xenstored and the `<perm-as-string>|+` list should be
+similarly suitable to formulate a subsequent `SET_PERMS` operation.
+`<perm-count>` specifies the number of entries in the `<perm-as-string>|+`
+list and `<value|>` must be placed at the end because it may contain NUL
+octets.
+
+
+**watch data**
+
+
+`<path>|<token>|`
+
+`<path>` again is absolute and, together with `<token>`, should
+be suitable to formulate an `ADD_DOMAIN_WATCHES` operation (see below).
+
+
+**transaction data**
+
+
+`<transid-count>|<transid>|+`
+
+Each `<transid>` should be a uint32_t value represented as unsigned decimal
+suitable for passing as a *tx_id* to the re-defined `TRANSACTION_START`
+operation (see below). `<transid-count>` is the number of entries in the
+`<transid>|+` list.
+
+
+### Protocol Extension
+
+Before xenstore state is migrated it is necessary to wait for any pending
+reads, writes, watch registrations etc. to complete, and also to make sure
+that xenstored does not start processing any new requests (so that new
+requests remain pending on the shared ring for subsequent processing on the
+new host). Hence the following operation is needed:
+
+```
+QUIESCE                 <domid>|
+
+Complete processing of any request issued by the specified domain, and
+do not process any further requests from the shared ring.
+```
+
+The `WATCH` operation does not allow specification of a `<domid>`; it is
+assumed that the watch pertains to the domain that owns the shared ring
+over which the operation is passed. Hence, for the tool-stack to be able
+to register a watch on behalf of a domain a new operation is needed:
+
+```
+ADD_DOMAIN_WATCHES      <domid>|<watch>|+
+
+Adds watches on behalf of the specified domain.
+
+<watch> is a NUL separated tuple of <path>|<token>. The semantics of this
+operation are identical to the domain issuing WATCH <path>|<token>| for
+each <watch>.
+```
+
+The watch information for a domain also needs to be extracted from the
+sending xenstored so the following operation is also needed:
+
+```
+GET_DOMAIN_WATCHES      <domid>|<index>   <gencnt>|<watch>|*
+
+Gets the list of watches that are currently registered for the domain.
+
+<watch> is a NUL separated tuple of <path>|<token>. The sub-list returned
+will start at <index> items into the the overall list of watches and may
+be truncated (at a <watch> boundary) such that the returned data fits
+within XENSTORE_PAYLOAD_MAX.
+
+If <index> is beyond the end of the overall list then the returned sub-
+list will be empty. If the value of <gencnt> changes then it indicates
+that the overall watch list has changed and thus it may be necessary
+to re-issue the operation for previous values of <index>.
+```
+
+To deal with transactions that were pending when the domain is migrated
+it is necessary to start transactions with the same `<trans-id>` in the
+receiving xenstored but for them to result in an `EAGAIN` when the
+`TRANSACTION_END` operation is peformed. Thus the `TRANSACTION_START`
+operation needs to be re-defined as follows:
+
+```
+TRANSACTION_START	|			<transid>|
+	<transid> is an opaque uint32_t represented as unsigned decimal.
+    If tx_id is 0 for this operation then a new transaction will be started
+    with a tx_id allocated by xenstored. If a non-0 tx_id is specified then
+    a transaction with that tx_id will be started and automatically marked
+    `conflicting'. The tx_id will always be passed back in <transid>.
+    After this, the tx_id may be used in the request header field for
+    other operations.
+    When a transaction is started whole db is copied; reads and writes
+    happen on the copy.
+```
+
+It may also be desirable to state in the protocol specification that
+the `INTRODUCE` operation should not clear the `<gfn>` specified such that
+a `RELEASE` operation followed by an `INTRODUCE` operation form an
+idempotent pair. The current implementation of *C xentored* does this
+(in the `domain_conn_reset()` function) but this could be dropped as this
+behaviour is not currently specified and the page will always be zeroed
+for a newly created domain.
+
+
+* * *
+
+[1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md
+[2] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore.txt
+[3] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/specs/libxl-migration-stream.pandoc
diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
index 6f8569d576..51e6b12931 100644
--- a/docs/misc/xenstore.txt
+++ b/docs/misc/xenstore.txt
@@ -254,7 +254,7 @@ TRANSACTION_END		F|
 
 ---------- Domain management and xenstored communications ----------
 
-INTRODUCE		<domid>|<mfn>|<evtchn>|?
+INTRODUCE		<domid>|<gfn>|<evtchn>|?
 	Notifies xenstored to communicate with this domain.
 
 	INTRODUCE is currently only used by xend (during domain
@@ -262,12 +262,12 @@ INTRODUCE		<domid>|<mfn>|<evtchn>|?
 	xenstored prevents its use other than by dom0.
 
 	<domid> must be a real domain id (not 0 and not a special
-	DOMID_... value).  <mfn> must be a machine page in that domain
+	DOMID_... value).  <gfn> must be a machine page in that domain
 	represented in signed decimal (!).  <evtchn> must be event
 	channel is an unbound event channel in <domid> (likewise in
 	decimal), on which xenstored will call bind_interdomain.
 	Violations of these rules may result in undefined behaviour;
-	for example passing a high-bit-set 32-bit mfn as an unsigned
+	for example passing a high-bit-set 32-bit gfn as an unsigned
 	decimal will attempt to use 0x7fffffff instead (!).
 
 RELEASE			<domid>|
-- 
2.20.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  parent reply	other threads:[~2020-03-05 17:31 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-05 17:30 [Xen-devel] [PATCH v6 0/2] docs: Migration design documents pdurrant
2020-03-05 17:30 ` [Xen-devel] [PATCH v6 1/2] docs/designs: Add a design document for non-cooperative live migration pdurrant
2020-03-06 11:06   ` Julien Grall
2020-03-05 17:30 ` pdurrant [this message]
2020-03-06 12:02   ` [Xen-devel] [PATCH v6 2/2] docs/designs: Add a design document for migration of xenstore data Julien Grall
2020-03-06 12:19     ` [Xen-devel] [EXTERNAL][PATCH " Paul Durrant
2020-03-06 16:18       ` Julien Grall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200305173041.5141-3-pdurrant@amzn.com \
    --to=pdurrant@amzn.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=konrad.wilk@oracle.com \
    --cc=pdurrant@amazon.com \
    --cc=sstabellini@kernel.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.