* [PATCH v2] docs/designs: re-work the xenstore migration document...
@ 2020-04-27 7:53 Paul Durrant
2020-04-27 10:36 ` Jürgen Groß
0 siblings, 1 reply; 6+ messages in thread
From: Paul Durrant @ 2020-04-27 7:53 UTC (permalink / raw)
To: xen-devel; +Cc: Paul Durrant
From: Paul Durrant <pdurrant@amazon.com>
... to specify a separate migration stream that will also be suitable for
live update.
The original scope of the document was to support non-cooperative migration
of guests [1] but, since then, live update of xenstored has been brought into
scope. Thus it makes more sense to define a separate image format for
serializing xenstore state that is suitable for both purposes.
The document has been limited to specifying a new image format. The mechanism
for acquiring the image for live update or migration is not covered as that
is more appropriately dealt with by a patch to docs/misc/xenstore.txt. It is
also expected that, when the first implementation of live update or migration
making use of this specification is committed, that the document is moved from
docs/designs into docs/specs.
[1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Juergen Gross <jgross@suse.com>
Andrew Cooper <andrew.cooper3@citrix.com>
George Dunlap <george.dunlap@citrix.com>
Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich <jbeulich@suse.com>
Julien Grall <julien@xen.org>
Stefano Stabellini <sstabellini@kernel.org>
Wei Liu <wl@xen.org>
v2:
- Address comments from Juergen
---
docs/designs/xenstore-migration.md | 475 +++++++++++++++++++----------
1 file changed, 312 insertions(+), 163 deletions(-)
diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
index 6ab351e8fe..815aaeee59 100644
--- a/docs/designs/xenstore-migration.md
+++ b/docs/designs/xenstore-migration.md
@@ -3,254 +3,403 @@
## Background
The design for *Non-Cooperative Migration of Guests*[1] explains that extra
-save records are required in the migrations stream to allow a guest running
-PV drivers to be migrated without its co-operation. Moreover the save
-records must include details of registered xenstore watches as well as
-content; information that cannot currently be recovered from `xenstored`,
-and hence some extension to the xenstore protocol[2] will also be required.
-
-The *libxenlight Domain Image Format* specification[3] already defines a
-record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
-transferring xenstore data pertaining to the domain directly as it is
-specified such that keys are relative to the path
-`/local/domain/$dm_domid/device-model/$domid`. Thus it is necessary to
-define at least one new save record type.
+save records are required in the migrations stream to allow a guest running PV
+drivers to be migrated without its co-operation. Moreover the save records must
+include details of registered xenstore watches as well ascontent; information
+that cannot currently be recovered from `xenstored`, and hence some extension
+to the xenstored implementations will also be required.
+
+As a similar set of data is needed for transferring xenstore data from one
+instance to another when live updating xenstored this document proposes an
+image format for a 'migration stream' suitable for both purposes.
## Proposal
-### New Save Record
+The image format consists of a _header_ followed by 1 or more _records_. Each
+record consists of a type and length field, followed by any data mandated by
+the record type. At minimum there will be a single record of type `END`
+(defined below).
-A new mandatory record type should be defined within the libxenlight Domain
-Image Format:
+### Header
-`0x00000007: DOMAIN_XENSTORE_DATA`
+The header identifies the stream as a `xenstore` stream, including the version
+of the specification that it complies with.
-An arbitrary number of these records may be present in the migration
-stream and may appear in any order. The format of each record should be as
-follows:
+All fields in this header must be in _big-endian_ byte order, regardless of
+the setting of the endianness bit.
```
0 1 2 3 4 5 6 7 octet
+-------+-------+-------+-------+-------+-------+-------+-------+
-| type | record specific data |
-+-------------------------------+ |
-...
-+---------------------------------------------------------------+
+| ident |
++-------------------------------+-------------------------------|
+| version | flags |
++-------------------------------+-------------------------------+
```
-where type is one of the following values
+| Field | Description |
+|-----------|---------------------------------------------------|
+| `ident` | 0x78656e73746f7265 ('xenstore' in ASCII) |
+| | |
+| `version` | 0x00000001 (the version of the specification) |
+| | |
+| `flags` | 0 (LSB): Endianness: 0 = little, 1 = big |
+| | |
+| | 1-31: Reserved (must be zero) |
-| Field | Description |
-|--------|--------------------------------------------------|
-| `type` | 0x00000000: invalid |
-| | 0x00000001: NODE_DATA |
-| | 0x00000002: WATCH_DATA |
-| | 0x00000003: TRANSACTION_DATA |
-| | 0x00000004 - 0xFFFFFFFF: reserved for future use |
+### Records
+Records immediately follow the header and have the following format:
-and data is one of the record data formats described in the following
-sections.
+```
+ 0 1 2 3 4 5 6 7 octet
++-------+-------+-------+-------+-------+-------+-------+-------+
+| type | len |
++-------------------------------+-------------------------------+
+| body
+...
+| | padding (0 to 7 octets) |
++-------+-------------------------------------------------------+
+```
+
+NOTE: padding octets here and in all subsequent format specifications must be
+ written as zero and should be ignored when the stream is read.
-NOTE: The record data does not contain an overall length because the
-libxenlight record header specifies the length.
+| Field | Description |
+|--------|------------------------------------------------------|
+| `type` | 0x00000000: END |
+| | 0x00000001: GLOBAL_DATA |
+| | 0x00000002: CONNECTION_DATA |
+| | 0x00000003: WATCH_DATA |
+| | 0x00000004: TRANSACTION_DATA |
+| | 0x00000005: NODE_DATA |
+| | 0x00000006 - 0xFFFFFFFF: reserved for future use |
+| | |
+| `len` | The length (in octets) of `body` |
+| | |
+| `body` | The type-specific record data |
-**NODE_DATA**
+Some records will depend on other records in the migration stream. Records
+upon which other records depend must always appear earlier in the stream.
+The various formats of the type-specific data are described in the following
+sections:
-Each NODE_DATA record specifies a single node in xenstore and is formatted
-as follows:
+\pagebreak
+### END
+
+The end record marks the end of the image, and is the final record
+in the stream.
```
- 0 1 2 3 octet
-+-------+-------+-------+-------+
-| NODE_DATA |
-+-------------------------------+
-| path length |
-+-------------------------------+
-| path data |
-...
-| pad (0 to 3 octets) |
-+-------------------------------+
-| perm count (N) |
-+-------------------------------+
-| perm0 |
-+-------------------------------+
-...
-+-------------------------------+
-| permN |
-+-------------------------------+
-| value length |
-+-------------------------------+
-| value data |
-...
-| pad (0 to 3 octets) |
-+-------------------------------+
+ 0 1 2 3 4 5 6 7 octet
++-------+-------+-------+-------+-------+-------+-------+-------+
```
-where perm0..N are formatted as follows:
+The end record contains no fields; its body length is 0.
+
+\pagebreak
+
+### GLOBAL_DATA
+
+This record is only relevant for live update. It contains details of global
+xenstored state that needs to be restored.
```
- 0 1 2 3 octet
+ 0 1 2 3 octet
+-------+-------+-------+-------+
-| perm | pad | domid |
+| rw-socket-fd |
++-------------------------------+
+| ro-socket-fd |
+-------------------------------+
```
-path length and value length are specified in octets (excluding the NUL
-terminator of the path). perm should be one of the ASCII values `w`, `r`,
-`b` or `n` as described in [2]. All pad values should be 0.
-All paths should be absolute (i.e. start with `/`) and as described in
-[2].
+| Field | Description |
+|----------------|----------------------------------------------|
+| `rw-socket-fd` | The file descriptor of the socket accepting |
+| | read-write connections |
+| | |
+| `ro-socket-fd` | The file descriptor of the socket accepting |
+| | read-only connections |
+
+xenstored will resume in the original process context. Hence `rw-socket-fd` and
+`ro-socket-fd` simply specify the file descriptors of the sockets.
-**WATCH_DATA**
+\pagebreak
+### CONNECTION_DATA
-Each WATCH_DATA record specifies a registered watch and is formatted as
-follows:
+For live update the image format will contain a `CONNECTION_DATA` record for
+each connection to xenstore. For migration it will only contain a record for
+the domain being migrated.
```
- 0 1 2 3 octet
-+-------+-------+-------+-------+
-| WATCH_DATA |
-+-------------------------------+
-| wpath length |
-+-------------------------------+
-| wpath data |
-...
-| pad (0 to 3 octets) |
-+-------------------------------+
+ 0 1 2 3 4 5 6 7 octet
++-------+-------+-------+-------+-------+-------+-------+-------+
+| conn-id | pad |
++---------------+-----------------------------------------------+
+| conn-type | conn-spec
...
++-------------------------------+-------------------------------+
+| data-len | data
+-------------------------------+
-| token length |
-+-------------------------------+
-| token data |
...
-| pad (0 to 3 octets) |
-+-------------------------------+
```
-wpath length and token length are specified in octets (excluding the NUL
-terminator). The wpath should be as described for the `WATCH` operation in
-[2]. The token is an arbitrary string of octets not containing any NUL
-values.
+| Field | Description |
+|-------------|-------------------------------------------------|
+| `conn-id` | A non-zero number used to identify this |
+| | connection in subsequent connection-specific |
+| | records |
+| | |
+| `conn-type` | 0x0000: shared ring |
+| | 0x0001: socket |
+| | |
+| `conn-spec` | See below |
+| | |
+| `data-len` | The length (in octets) of any pending data not |
+| | yet written to the connection |
+| | |
+| `data` | Pending data (may be empty) |
-**TRANSACTION_DATA**
+The format of `conn-spec` is dependent upon `conn-type`.
+\pagebreak
-Each TRANSACTION_DATA record specifies an open transaction and is formatted
-as follows:
+For `shared ring` connections it is as follows:
```
- 0 1 2 3 octet
-+-------+-------+-------+-------+
-| TRANSACTION_DATA |
-+-------------------------------+
-| tx_id |
-+-------------------------------+
+ 0 1 2 3 4 5 6 7 octet
+ +-------+-------+-------+-------+-------+-------+
+ | domid | tdomid | flags |
++---------------+---------------+---------------+---------------+
+| revtchn | levtchn |
++-------------------------------+-------------------------------+
+| mfn |
++---------------------------------------------------------------+
```
-where tx_id is the non-zero identifier values of an open transaction.
-
-### Protocol Extension
+| Field | Description |
+|------------|--------------------------------------------------|
+| `domid` | The domain-id that owns the shared page |
+| | |
+| `tdomid` | The domain-id that `domid` acts on behalf of if |
+| | it has been subject to an SET_TARGET |
+| | operation [2] or DOMID_INVALID otherwise |
+| | |
+| `flags` | A bit-wise OR of: |
+| | 0x0001: INTRODUCE has been issued |
+| | 0x0002: RELEASE has been issued |
+| | |
+| `revtchn` | The port number of the interdomain channel used |
+| | by `domid` to communicate with xenstored |
+| | |
+| `levtchn` | For a live update this will be the port number |
+| | of the interdomain channel used by xenstored |
+| | itself otherwise, for migration, it will be -1 |
+| | |
+| `mfn` | The MFN of the shared page for a live update or |
+| | INVALID_MFN otherwise |
+
+Since the ABI guarantees that entry 1 in `domid`'s grant table will always
+contain the GFN of the shared page, so for a live update `mfn` can be used to
+give confidence that `domid` has not been re-cycled during the update.
+
+
+For `socket` connections it is as follows:
-Before xenstore state is migrated it is necessary to wait for any pending
-reads, writes, watch registrations etc. to complete, and also to make sure
-that xenstored does not start processing any new requests (so that new
-requests remain pending on the shared ring for subsequent processing on the
-new host). Hence the following operation is needed:
```
-QUIESCE <domid>|
-
-Complete processing of any request issued by the specified domain, and
-do not process any further requests from the shared ring.
+ 0 1 2 3 4 5 6 7 octet
+ +-------+-------+-------+-------+-------+-------+
+ | flags | socket-fd |
+ +---------------+-------------------------------+
```
-The `WATCH` operation does not allow specification of a `<domid>`; it is
-assumed that the watch pertains to the domain that owns the shared ring
-over which the operation is passed. Hence, for the tool-stack to be able
-to register a watch on behalf of a domain a new operation is needed:
-```
-ADD_DOMAIN_WATCHES <domid>|<watch>|+
+| Field | Description |
+|-------------|-------------------------------------------------|
+| `flags` | A bit-wise OR of: |
+| | 0001: read-only |
+| | |
+| `socket-fd` | The file descriptor of the connected socket |
-Adds watches on behalf of the specified domain.
+This type of connection is only relevant for live update, where the xenstored
+resumes in the original process context. Hence `socket-fd` simply specify
+the file descriptor of the socket connection.
-<watch> is a NUL separated tuple of <path>|<token>. The semantics of this
-operation are identical to the domain issuing WATCH <path>|<token>| for
-each <watch>.
-```
+\pagebreak
+
+### WATCH_DATA
+
+The image format will contain a `WATCH_DATA` record for each watch registered
+by a connection for which there is `CONNECTION_DATA` record previously present.
-The watch information for a domain also needs to be extracted from the
-sending xenstored so the following operation is also needed:
```
-GET_DOMAIN_WATCHES <domid>|<index> <gencnt>|<watch>|*
+ 0 1 2 3 octet
++-------+-------+-------+-------+
+| conn-id |
++---------------+---------------+
+| wpath-len | token-len |
++---------------+---------------+
+| wpath
+...
+| token
+...
+```
+
-Gets the list of watches that are currently registered for the domain.
+| Field | Description |
+|-------------|-------------------------------------------------|
+| `conn-id` | The connection that issued the `WATCH` |
+| | operation [2] |
+| | |
+| `wpath-len` | The length (in octets) of `wpath` including the |
+| | NUL terminator |
+| | |
+| `token-len` | The length (in octets) of `token` including the |
+| | NUL terminator |
+| | |
+| `wpath` | The watch path, as specified in the `WATCH` |
+| | operation |
+| | |
+| `token` | The watch identifier token, as specified in the |
+| | `WATCH` operation |
+
+\pagebreak
+
+### TRANSACTION_DATA
+
+The image format will contain a `TRANSACTION_DATA` record for each transaction
+that is pending on a connection for which there is `CONNECTION_DATA` record
+previously present.
-<watch> is a NUL separated tuple of <path>|<token>. The sub-list returned
-will start at <index> items into the the overall list of watches and may
-be truncated (at a <watch> boundary) such that the returned data fits
-within XENSTORE_PAYLOAD_MAX.
-If <index> is beyond the end of the overall list then the returned sub-
-list will be empty. If the value of <gencnt> changes then it indicates
-that the overall watch list has changed and thus it may be necessary
-to re-issue the operation for previous values of <index>.
```
+ 0 1 2 3 octet
++-------+-------+-------+-------+
+| conn-id |
++-------------------------------+
+| tx-id |
++-------------------------------+
+```
+
+
+| Field | Description |
+|----------------|----------------------------------------------|
+| `conn-id` | The connection that issued the |
+| | `TRANSACTION_START` operation [2] |
+| | |
+| `tx-id` | The transaction id passed back to the domain |
+| | by the `TRANSACTION_START` operation |
+
+\pagebreak
-To deal with transactions that were pending when the domain is migrated
-it is necessary to start transactions with the same tx_id on behalf of the
-domain in the receiving xenstored.
+### NODE_DATA
-NOTE: For safety each such transaction should result in an `EAGAIN` when
-the `TRANSACTION_END` operation is performed, as modifications made under
-the tx_id will not be part of the migration stream.
+For live update the image format will contain a `NODE_DATA` record for each
+node in xenstore. For migration it will only contain a record for the nodes
+relating to the domain being migrated. The `NODE_DATA` may be related to
+a _committed_ node (globally visible in xenstored) or a _pending_ node (created
+or modified by a transaction for which there is also a `TRANSACTION_DATA`
+record previously present).
-The `TRANSACTION_START` operation does not allow specification of a
-`<domid>`; it is assumed that the transaction pertains to the domain that
-owns the shared ring over which the operation is passed. Neither does it
-allow a `<transid>` to be specified; it is always chosen by xenstored.
-Hence, for the tool-stack to be able to open a transaction on behalf of a
-domain a new operation is needed:
```
-START_DOMAIN_TRANSACTION <domid>|<transid>|
+ 0 1 2 3 octet
++-------+-------+-------+-------+
+| conn-id |
++-------------------------------+
+| tx-id |
++---------------+---------------+
+| access | perm-count |
++---------------+---------------+
+| perm1 |
++-------------------------------+
+...
++-------------------------------+
+| permN |
++---------------+---------------+
+| path-len | value-len |
++---------------+---------------+
+| path
+...
+| value
+...
+```
+
+
+| Field | Description |
+|--------------|------------------------------------------------|
+| `conn-id` | If this value is non-zero then this record |
+| | related to a pending transaction |
+| | |
+| `tx-id` | This value should be ignored if `conn-id` is |
+| | zero. Otherwise it specifies the id of the |
+| | pending transaction |
+| | |
+| `access` | This value should be ignored if this record |
+| | does not relate to a pending transaction, |
+| | otherwise it specifies the accesses made to |
+| | the node and hence is a bitwise OR of: |
+| | |
+| | 0x0001: read |
+| | 0x0002: written |
+| | |
+| | The value will be zero for a deleted node |
+| | |
+| `perm-count` | The number (N) of node permission specifiers |
+| | (which will be 0 for a node deleted in a |
+| | pending transaction) |
+| | |
+| `perm1..N` | A list of zero or more node permission |
+| | specifiers (see below) |
+| | |
+| `path-len` | The length (in octets) of `path` including the |
+| | NUL terminator |
+| | |
+| `value-len` | The length (in octets) of `value` (which will |
+| | be zero for a deleted node) |
+| | |
+| `path` | The absolute path of the node |
+| | |
+| `value` | The node value (which may be empty or contain |
+| | NUL octets) |
+
+
+A node permission specifier has the following format:
-Starts a transaction on behalf of a domain.
-The semantics of this are similar to the domain issuing
-TRANSACTION_START and receiving the specified <transid> as the response.
-The main difference is that the transaction will be immediately marked as
-'conflicting' such that when the domain issues TRANSACTION_END T|, it will
-result in EAGAIN.
+```
+ 0 1 2 3 octet
++-------+-------+-------+-------+
+| perm | pad | domid |
++-------+-------+---------------+
```
-It may also be desirable to state in the protocol specification that
-the `INTRODUCE` operation should not clear the `<gfn>` specified such that
-a `RELEASE` operation followed by an `INTRODUCE` operation form an
-idempotent pair. The current implementation of *C xentored* does this
-(in the `domain_conn_reset()` function) but this could be dropped as this
-behaviour is not currently specified and the page will always be zeroed
-for a newly created domain.
+| Field | Description |
+|---------|-----------------------------------------------------|
+| `perm` | One of the ASCII values `w`, `r`, `b` or `n` as |
+| | specified for the `SET_PERMS` operation [2] |
+| | |
+| `domid` | The domain-id to which the permission relates |
* * *
[1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md
+
[2] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore.txt
-[3] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/specs/libxl-migration-stream.pandoc
--
2.17.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2] docs/designs: re-work the xenstore migration document...
2020-04-27 7:53 [PATCH v2] docs/designs: re-work the xenstore migration document Paul Durrant
@ 2020-04-27 10:36 ` Jürgen Groß
2020-04-27 10:45 ` Paul Durrant
0 siblings, 1 reply; 6+ messages in thread
From: Jürgen Groß @ 2020-04-27 10:36 UTC (permalink / raw)
To: Paul Durrant, xen-devel; +Cc: Paul Durrant
On 27.04.20 09:53, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
>
> ... to specify a separate migration stream that will also be suitable for
> live update.
>
> The original scope of the document was to support non-cooperative migration
> of guests [1] but, since then, live update of xenstored has been brought into
> scope. Thus it makes more sense to define a separate image format for
> serializing xenstore state that is suitable for both purposes.
>
> The document has been limited to specifying a new image format. The mechanism
> for acquiring the image for live update or migration is not covered as that
> is more appropriately dealt with by a patch to docs/misc/xenstore.txt. It is
> also expected that, when the first implementation of live update or migration
> making use of this specification is committed, that the document is moved from
> docs/designs into docs/specs.
>
> [1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md
>
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
> Juergen Gross <jgross@suse.com>
> Andrew Cooper <andrew.cooper3@citrix.com>
> George Dunlap <george.dunlap@citrix.com>
> Ian Jackson <ian.jackson@eu.citrix.com>
> Jan Beulich <jbeulich@suse.com>
> Julien Grall <julien@xen.org>
> Stefano Stabellini <sstabellini@kernel.org>
> Wei Liu <wl@xen.org>
Mind adding CC: before those mail addresses in order to let git add
those to the recipients list?
>
> v2:
> - Address comments from Juergen
Not all unfortunately. :-(
> +### CONNECTION_DATA
>
> -Each WATCH_DATA record specifies a registered watch and is formatted as
> -follows:
> +For live update the image format will contain a `CONNECTION_DATA` record for
> +each connection to xenstore. For migration it will only contain a record for
> +the domain being migrated.
>
>
> ```
> - 0 1 2 3 octet
> -+-------+-------+-------+-------+
> -| WATCH_DATA |
> -+-------------------------------+
> -| wpath length |
> -+-------------------------------+
> -| wpath data |
> -...
> -| pad (0 to 3 octets) |
> -+-------------------------------+
> + 0 1 2 3 4 5 6 7 octet
> ++-------+-------+-------+-------+-------+-------+-------+-------+
> +| conn-id | pad |
> ++---------------+-----------------------------------------------+
> +| conn-type | conn-spec
> ...
I asked whether it wouldn't be better to drop the pad and move conn-type
and a 2-byte (unified) flag field at its position. This together ...
> ++-------------------------------+-------------------------------+
> +| data-len | data
> +-------------------------------+
> -| token length |
> -+-------------------------------+
> -| token data |
> ...
> -| pad (0 to 3 octets) |
> -+-------------------------------+
> ```
>
> -wpath length and token length are specified in octets (excluding the NUL
> -terminator). The wpath should be as described for the `WATCH` operation in
> -[2]. The token is an arbitrary string of octets not containing any NUL
> -values.
>
> +| Field | Description |
> +|-------------|-------------------------------------------------|
> +| `conn-id` | A non-zero number used to identify this |
> +| | connection in subsequent connection-specific |
> +| | records |
> +| | |
> +| `conn-type` | 0x0000: shared ring |
> +| | 0x0001: socket |
> +| | |
> +| `conn-spec` | See below |
> +| | |
> +| `data-len` | The length (in octets) of any pending data not |
> +| | yet written to the connection |
> +| | |
> +| `data` | Pending data (may be empty) |
>
> -**TRANSACTION_DATA**
> +The format of `conn-spec` is dependent upon `conn-type`.
>
> +\pagebreak
>
> -Each TRANSACTION_DATA record specifies an open transaction and is formatted
> -as follows:
> +For `shared ring` connections it is as follows:
>
>
> ```
> - 0 1 2 3 octet
> -+-------+-------+-------+-------+
> -| TRANSACTION_DATA |
> -+-------------------------------+
> -| tx_id |
> -+-------------------------------+
> + 0 1 2 3 4 5 6 7 octet
> + +-------+-------+-------+-------+-------+-------+
> + | domid | tdomid | flags |
> ++---------------+---------------+---------------+---------------+
> +| revtchn | levtchn |
> ++-------------------------------+-------------------------------+
> +| mfn |
> ++---------------------------------------------------------------+
... with dropping levtchn (which isn't needed IMO) will make it much
easier to have a union in C (which needs to be aligned to 8 bytes
and have a length of a multiple of 8 bytes due to mfn).
So something like:
struct xs_state_connection {
uint32_t conn_id;
uint16_t conn_type;
#define XS_STATE_CONN_TYPE_RING 0
#define XS_STATE_CONN_TYPE_SOCKET 1
uint16_t flags;
#define XS_STATE_CONN_INTRODUCED 0x0001
#define XS_STATE_CONN_RELEASED 0x0002
#define XS_STATE_CONN_READONLY 0x0004
union {
struct {
uint16_t domid;
uint16_t tdomid;
#define XS_STATE_DOMID_INVALID 0xffffU
uint32_t evtchn;
uint64_t mfn;
#define XS_STATE_MFN_INVALID 0xffffffffffffffffUL
} ring;
int32_t socket_fd;
} spec;
uint32_t data_out_len;
uint8_t data[];
};
> ```
>
> -where tx_id is the non-zero identifier values of an open transaction.
> -
>
> -### Protocol Extension
> +| Field | Description |
> +|------------|--------------------------------------------------|
> +| `domid` | The domain-id that owns the shared page |
> +| | |
> +| `tdomid` | The domain-id that `domid` acts on behalf of if |
> +| | it has been subject to an SET_TARGET |
> +| | operation [2] or DOMID_INVALID otherwise |
DOMID_INVALID needs to be defined (or we need a reference where it is
coming from).
> +| | |
> +| `flags` | A bit-wise OR of: |
> +| | 0x0001: INTRODUCE has been issued |
> +| | 0x0002: RELEASE has been issued |
> +| | |
> +| `revtchn` | The port number of the interdomain channel used |
> +| | by `domid` to communicate with xenstored |
> +| | |
> +| `levtchn` | For a live update this will be the port number |
> +| | of the interdomain channel used by xenstored |
> +| | itself otherwise, for migration, it will be -1 |
> +| | |
> +| `mfn` | The MFN of the shared page for a live update or |
> +| | INVALID_MFN otherwise |
INVALID_MFN is an internal detail of the hypervisor. We should have a
local definition for it here (as in my example above).
> +
> +Since the ABI guarantees that entry 1 in `domid`'s grant table will always
> +contain the GFN of the shared page, so for a live update `mfn` can be used to
> +give confidence that `domid` has not been re-cycled during the update.
> +
> +
> +For `socket` connections it is as follows:
>
> -Before xenstore state is migrated it is necessary to wait for any pending
> -reads, writes, watch registrations etc. to complete, and also to make sure
> -that xenstored does not start processing any new requests (so that new
> -requests remain pending on the shared ring for subsequent processing on the
> -new host). Hence the following operation is needed:
>
> ```
> -QUIESCE <domid>|
> -
> -Complete processing of any request issued by the specified domain, and
> -do not process any further requests from the shared ring.
> + 0 1 2 3 4 5 6 7 octet
> + +-------+-------+-------+-------+-------+-------+
> + | flags | socket-fd |
> + +---------------+-------------------------------+
> -START_DOMAIN_TRANSACTION <domid>|<transid>|
> + 0 1 2 3 octet
> ++-------+-------+-------+-------+
> +| conn-id |
> ++-------------------------------+
> +| tx-id |
> ++---------------+---------------+
> +| access | perm-count |
> ++---------------+---------------+
> +| perm1 |
> ++-------------------------------+
> +...
> ++-------------------------------+
> +| permN |
> ++---------------+---------------+
> +| path-len | value-len |
> ++---------------+---------------+
> +| path
> +...
> +| value
> +...
> +```
> +
> +
> +| Field | Description |
> +|--------------|------------------------------------------------|
> +| `conn-id` | If this value is non-zero then this record |
> +| | related to a pending transaction |
> +| | |
> +| `tx-id` | This value should be ignored if `conn-id` is |
> +| | zero. Otherwise it specifies the id of the |
> +| | pending transaction |
> +| | |
> +| `access` | This value should be ignored if this record |
> +| | does not relate to a pending transaction, |
> +| | otherwise it specifies the accesses made to |
> +| | the node and hence is a bitwise OR of: |
> +| | |
> +| | 0x0001: read |
> +| | 0x0002: written |
> +| | |
> +| | The value will be zero for a deleted node |
> +| | |
> +| `perm-count` | The number (N) of node permission specifiers |
> +| | (which will be 0 for a node deleted in a |
> +| | pending transaction) |
> +| | |
> +| `perm1..N` | A list of zero or more node permission |
> +| | specifiers (see below) |
> +| | |
> +| `path-len` | The length (in octets) of `path` including the |
> +| | NUL terminator |
> +| | |
> +| `value-len` | The length (in octets) of `value` (which will |
> +| | be zero for a deleted node) |
I asked you to put the path-len and value-len fields before the perm
array.
Juergen
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [PATCH v2] docs/designs: re-work the xenstore migration document...
2020-04-27 10:36 ` Jürgen Groß
@ 2020-04-27 10:45 ` Paul Durrant
2020-04-27 11:13 ` Jürgen Groß
0 siblings, 1 reply; 6+ messages in thread
From: Paul Durrant @ 2020-04-27 10:45 UTC (permalink / raw)
To: 'Jürgen Groß', xen-devel; +Cc: 'Paul Durrant'
> -----Original Message-----
> From: Jürgen Groß <jgross@suse.com>
> Sent: 27 April 2020 11:37
> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> Cc: Paul Durrant <pdurrant@amazon.com>
> Subject: Re: [PATCH v2] docs/designs: re-work the xenstore migration document...
>
> On 27.04.20 09:53, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > ... to specify a separate migration stream that will also be suitable for
> > live update.
> >
> > The original scope of the document was to support non-cooperative migration
> > of guests [1] but, since then, live update of xenstored has been brought into
> > scope. Thus it makes more sense to define a separate image format for
> > serializing xenstore state that is suitable for both purposes.
> >
> > The document has been limited to specifying a new image format. The mechanism
> > for acquiring the image for live update or migration is not covered as that
> > is more appropriately dealt with by a patch to docs/misc/xenstore.txt. It is
> > also expected that, when the first implementation of live update or migration
> > making use of this specification is committed, that the document is moved from
> > docs/designs into docs/specs.
> >
> > [1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md
> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> > ---
> > Juergen Gross <jgross@suse.com>
> > Andrew Cooper <andrew.cooper3@citrix.com>
> > George Dunlap <george.dunlap@citrix.com>
> > Ian Jackson <ian.jackson@eu.citrix.com>
> > Jan Beulich <jbeulich@suse.com>
> > Julien Grall <julien@xen.org>
> > Stefano Stabellini <sstabellini@kernel.org>
> > Wei Liu <wl@xen.org>
>
> Mind adding CC: before those mail addresses in order to let git add
> those to the recipients list?
>
D'oh... good spot.
> >
> > v2:
> > - Address comments from Juergen
>
> Not all unfortunately. :-(
>
OK.
> > +### CONNECTION_DATA
> >
> > -Each WATCH_DATA record specifies a registered watch and is formatted as
> > -follows:
> > +For live update the image format will contain a `CONNECTION_DATA` record for
> > +each connection to xenstore. For migration it will only contain a record for
> > +the domain being migrated.
> >
> >
> > ```
> > - 0 1 2 3 octet
> > -+-------+-------+-------+-------+
> > -| WATCH_DATA |
> > -+-------------------------------+
> > -| wpath length |
> > -+-------------------------------+
> > -| wpath data |
> > -...
> > -| pad (0 to 3 octets) |
> > -+-------------------------------+
> > + 0 1 2 3 4 5 6 7 octet
> > ++-------+-------+-------+-------+-------+-------+-------+-------+
> > +| conn-id | pad |
> > ++---------------+-----------------------------------------------+
> > +| conn-type | conn-spec
> > ...
>
> I asked whether it wouldn't be better to drop the pad and move conn-type
> and a 2-byte (unified) flag field at its position. This together ...
>
> > ++-------------------------------+-------------------------------+
> > +| data-len | data
> > +-------------------------------+
> > -| token length |
> > -+-------------------------------+
> > -| token data |
> > ...
> > -| pad (0 to 3 octets) |
> > -+-------------------------------+
> > ```
> >
> > -wpath length and token length are specified in octets (excluding the NUL
> > -terminator). The wpath should be as described for the `WATCH` operation in
> > -[2]. The token is an arbitrary string of octets not containing any NUL
> > -values.
> >
> > +| Field | Description |
> > +|-------------|-------------------------------------------------|
> > +| `conn-id` | A non-zero number used to identify this |
> > +| | connection in subsequent connection-specific |
> > +| | records |
> > +| | |
> > +| `conn-type` | 0x0000: shared ring |
> > +| | 0x0001: socket |
> > +| | |
> > +| `conn-spec` | See below |
> > +| | |
> > +| `data-len` | The length (in octets) of any pending data not |
> > +| | yet written to the connection |
> > +| | |
> > +| `data` | Pending data (may be empty) |
> >
> > -**TRANSACTION_DATA**
> > +The format of `conn-spec` is dependent upon `conn-type`.
> >
> > +\pagebreak
> >
> > -Each TRANSACTION_DATA record specifies an open transaction and is formatted
> > -as follows:
> > +For `shared ring` connections it is as follows:
> >
> >
> > ```
> > - 0 1 2 3 octet
> > -+-------+-------+-------+-------+
> > -| TRANSACTION_DATA |
> > -+-------------------------------+
> > -| tx_id |
> > -+-------------------------------+
> > + 0 1 2 3 4 5 6 7 octet
> > + +-------+-------+-------+-------+-------+-------+
> > + | domid | tdomid | flags |
> > ++---------------+---------------+---------------+---------------+
> > +| revtchn | levtchn |
> > ++-------------------------------+-------------------------------+
> > +| mfn |
> > ++---------------------------------------------------------------+
>
> ... with dropping levtchn (which isn't needed IMO) will make it much
> easier to have a union in C (which needs to be aligned to 8 bytes
> and have a length of a multiple of 8 bytes due to mfn).
>
> So something like:
>
> struct xs_state_connection {
> uint32_t conn_id;
> uint16_t conn_type;
> #define XS_STATE_CONN_TYPE_RING 0
> #define XS_STATE_CONN_TYPE_SOCKET 1
> uint16_t flags;
> #define XS_STATE_CONN_INTRODUCED 0x0001
> #define XS_STATE_CONN_RELEASED 0x0002
> #define XS_STATE_CONN_READONLY 0x0004
> union {
> struct {
> uint16_t domid;
> uint16_t tdomid;
> #define XS_STATE_DOMID_INVALID 0xffffU
> uint32_t evtchn;
> uint64_t mfn;
> #define XS_STATE_MFN_INVALID 0xffffffffffffffffUL
> } ring;
> int32_t socket_fd;
> } spec;
> uint32_t data_out_len;
> uint8_t data[];
> };
The issue is making sure that the mfn is properly aligned. If I can drop the levtchn then this gets easier.
>
> > ```
> >
> > -where tx_id is the non-zero identifier values of an open transaction.
> > -
> >
> > -### Protocol Extension
> > +| Field | Description |
> > +|------------|--------------------------------------------------|
> > +| `domid` | The domain-id that owns the shared page |
> > +| | |
> > +| `tdomid` | The domain-id that `domid` acts on behalf of if |
> > +| | it has been subject to an SET_TARGET |
> > +| | operation [2] or DOMID_INVALID otherwise |
>
> DOMID_INVALID needs to be defined (or we need a reference where it is
> coming from).
OK. It's in a public header... I'll reference it.
>
> > +| | |
> > +| `flags` | A bit-wise OR of: |
> > +| | 0x0001: INTRODUCE has been issued |
> > +| | 0x0002: RELEASE has been issued |
> > +| | |
> > +| `revtchn` | The port number of the interdomain channel used |
> > +| | by `domid` to communicate with xenstored |
> > +| | |
> > +| `levtchn` | For a live update this will be the port number |
> > +| | of the interdomain channel used by xenstored |
> > +| | itself otherwise, for migration, it will be -1 |
> > +| | |
> > +| `mfn` | The MFN of the shared page for a live update or |
> > +| | INVALID_MFN otherwise |
>
> INVALID_MFN is an internal detail of the hypervisor. We should have a
> local definition for it here (as in my example above).
OK. I'll define it as all-1s.
>
> > +
> > +Since the ABI guarantees that entry 1 in `domid`'s grant table will always
> > +contain the GFN of the shared page, so for a live update `mfn` can be used to
> > +give confidence that `domid` has not been re-cycled during the update.
> > +
> > +
> > +For `socket` connections it is as follows:
> >
> > -Before xenstore state is migrated it is necessary to wait for any pending
> > -reads, writes, watch registrations etc. to complete, and also to make sure
> > -that xenstored does not start processing any new requests (so that new
> > -requests remain pending on the shared ring for subsequent processing on the
> > -new host). Hence the following operation is needed:
> >
> > ```
> > -QUIESCE <domid>|
> > -
> > -Complete processing of any request issued by the specified domain, and
> > -do not process any further requests from the shared ring.
> > + 0 1 2 3 4 5 6 7 octet
> > + +-------+-------+-------+-------+-------+-------+
> > + | flags | socket-fd |
> > + +---------------+-------------------------------+
>
> > -START_DOMAIN_TRANSACTION <domid>|<transid>|
> > + 0 1 2 3 octet
> > ++-------+-------+-------+-------+
> > +| conn-id |
> > ++-------------------------------+
> > +| tx-id |
> > ++---------------+---------------+
> > +| access | perm-count |
> > ++---------------+---------------+
> > +| perm1 |
> > ++-------------------------------+
> > +...
> > ++-------------------------------+
> > +| permN |
> > ++---------------+---------------+
> > +| path-len | value-len |
> > ++---------------+---------------+
> > +| path
> > +...
> > +| value
> > +...
> > +```
> > +
> > +
> > +| Field | Description |
> > +|--------------|------------------------------------------------|
> > +| `conn-id` | If this value is non-zero then this record |
> > +| | related to a pending transaction |
> > +| | |
> > +| `tx-id` | This value should be ignored if `conn-id` is |
> > +| | zero. Otherwise it specifies the id of the |
> > +| | pending transaction |
> > +| | |
> > +| `access` | This value should be ignored if this record |
> > +| | does not relate to a pending transaction, |
> > +| | otherwise it specifies the accesses made to |
> > +| | the node and hence is a bitwise OR of: |
> > +| | |
> > +| | 0x0001: read |
> > +| | 0x0002: written |
> > +| | |
> > +| | The value will be zero for a deleted node |
> > +| | |
> > +| `perm-count` | The number (N) of node permission specifiers |
> > +| | (which will be 0 for a node deleted in a |
> > +| | pending transaction) |
> > +| | |
> > +| `perm1..N` | A list of zero or more node permission |
> > +| | specifiers (see below) |
> > +| | |
> > +| `path-len` | The length (in octets) of `path` including the |
> > +| | NUL terminator |
> > +| | |
> > +| `value-len` | The length (in octets) of `value` (which will |
> > +| | be zero for a deleted node) |
>
> I asked you to put the path-len and value-len fields before the perm
> array.
I missed that. I'll move them.
Paul
>
>
> Juergen
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] docs/designs: re-work the xenstore migration document...
2020-04-27 10:45 ` Paul Durrant
@ 2020-04-27 11:13 ` Jürgen Groß
2020-04-27 11:25 ` Paul Durrant
0 siblings, 1 reply; 6+ messages in thread
From: Jürgen Groß @ 2020-04-27 11:13 UTC (permalink / raw)
To: paul, xen-devel; +Cc: 'Paul Durrant'
On 27.04.20 12:45, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jürgen Groß <jgross@suse.com>
>> Sent: 27 April 2020 11:37
>> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
>> Cc: Paul Durrant <pdurrant@amazon.com>
>> Subject: Re: [PATCH v2] docs/designs: re-work the xenstore migration document...
>>
>> On 27.04.20 09:53, Paul Durrant wrote:
>>> From: Paul Durrant <pdurrant@amazon.com>
>>>
>>> ... to specify a separate migration stream that will also be suitable for
>>> live update.
>>>
>>> The original scope of the document was to support non-cooperative migration
>>> of guests [1] but, since then, live update of xenstored has been brought into
>>> scope. Thus it makes more sense to define a separate image format for
>>> serializing xenstore state that is suitable for both purposes.
>>>
>>> The document has been limited to specifying a new image format. The mechanism
>>> for acquiring the image for live update or migration is not covered as that
>>> is more appropriately dealt with by a patch to docs/misc/xenstore.txt. It is
>>> also expected that, when the first implementation of live update or migration
>>> making use of this specification is committed, that the document is moved from
>>> docs/designs into docs/specs.
>>>
>>> [1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md
>>>
>>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
>>> ---
>>> Juergen Gross <jgross@suse.com>
>>> Andrew Cooper <andrew.cooper3@citrix.com>
>>> George Dunlap <george.dunlap@citrix.com>
>>> Ian Jackson <ian.jackson@eu.citrix.com>
>>> Jan Beulich <jbeulich@suse.com>
>>> Julien Grall <julien@xen.org>
>>> Stefano Stabellini <sstabellini@kernel.org>
>>> Wei Liu <wl@xen.org>
>>
>> Mind adding CC: before those mail addresses in order to let git add
>> those to the recipients list?
>>
>
> D'oh... good spot.
>
>>>
>>> v2:
>>> - Address comments from Juergen
>>
>> Not all unfortunately. :-(
>>
>
> OK.
>
>>> +### CONNECTION_DATA
>>>
>>> -Each WATCH_DATA record specifies a registered watch and is formatted as
>>> -follows:
>>> +For live update the image format will contain a `CONNECTION_DATA` record for
>>> +each connection to xenstore. For migration it will only contain a record for
>>> +the domain being migrated.
>>>
>>>
>>> ```
>>> - 0 1 2 3 octet
>>> -+-------+-------+-------+-------+
>>> -| WATCH_DATA |
>>> -+-------------------------------+
>>> -| wpath length |
>>> -+-------------------------------+
>>> -| wpath data |
>>> -...
>>> -| pad (0 to 3 octets) |
>>> -+-------------------------------+
>>> + 0 1 2 3 4 5 6 7 octet
>>> ++-------+-------+-------+-------+-------+-------+-------+-------+
>>> +| conn-id | pad |
>>> ++---------------+-----------------------------------------------+
>>> +| conn-type | conn-spec
>>> ...
>>
>> I asked whether it wouldn't be better to drop the pad and move conn-type
>> and a 2-byte (unified) flag field at its position. This together ...
>>
>>> ++-------------------------------+-------------------------------+
>>> +| data-len | data
>>> +-------------------------------+
>>> -| token length |
>>> -+-------------------------------+
>>> -| token data |
>>> ...
>>> -| pad (0 to 3 octets) |
>>> -+-------------------------------+
>>> ```
>>>
>>> -wpath length and token length are specified in octets (excluding the NUL
>>> -terminator). The wpath should be as described for the `WATCH` operation in
>>> -[2]. The token is an arbitrary string of octets not containing any NUL
>>> -values.
>>>
>>> +| Field | Description |
>>> +|-------------|-------------------------------------------------|
>>> +| `conn-id` | A non-zero number used to identify this |
>>> +| | connection in subsequent connection-specific |
>>> +| | records |
>>> +| | |
>>> +| `conn-type` | 0x0000: shared ring |
>>> +| | 0x0001: socket |
>>> +| | |
>>> +| `conn-spec` | See below |
>>> +| | |
>>> +| `data-len` | The length (in octets) of any pending data not |
>>> +| | yet written to the connection |
>>> +| | |
>>> +| `data` | Pending data (may be empty) |
>>>
>>> -**TRANSACTION_DATA**
>>> +The format of `conn-spec` is dependent upon `conn-type`.
>>>
>>> +\pagebreak
>>>
>>> -Each TRANSACTION_DATA record specifies an open transaction and is formatted
>>> -as follows:
>>> +For `shared ring` connections it is as follows:
>>>
>>>
>>> ```
>>> - 0 1 2 3 octet
>>> -+-------+-------+-------+-------+
>>> -| TRANSACTION_DATA |
>>> -+-------------------------------+
>>> -| tx_id |
>>> -+-------------------------------+
>>> + 0 1 2 3 4 5 6 7 octet
>>> + +-------+-------+-------+-------+-------+-------+
>>> + | domid | tdomid | flags |
>>> ++---------------+---------------+---------------+---------------+
>>> +| revtchn | levtchn |
>>> ++-------------------------------+-------------------------------+
>>> +| mfn |
>>> ++---------------------------------------------------------------+
>>
>> ... with dropping levtchn (which isn't needed IMO) will make it much
>> easier to have a union in C (which needs to be aligned to 8 bytes
>> and have a length of a multiple of 8 bytes due to mfn).
>>
>> So something like:
>>
>> struct xs_state_connection {
>> uint32_t conn_id;
>> uint16_t conn_type;
>> #define XS_STATE_CONN_TYPE_RING 0
>> #define XS_STATE_CONN_TYPE_SOCKET 1
>> uint16_t flags;
>> #define XS_STATE_CONN_INTRODUCED 0x0001
>> #define XS_STATE_CONN_RELEASED 0x0002
>> #define XS_STATE_CONN_READONLY 0x0004
>> union {
>> struct {
>> uint16_t domid;
>> uint16_t tdomid;
>> #define XS_STATE_DOMID_INVALID 0xffffU
>> uint32_t evtchn;
>> uint64_t mfn;
>> #define XS_STATE_MFN_INVALID 0xffffffffffffffffUL
>> } ring;
>> int32_t socket_fd;
>> } spec;
>> uint32_t data_out_len;
>> uint8_t data[];
>> };
>
> The issue is making sure that the mfn is properly aligned. If I can drop the levtchn then this gets easier.
>
>>
>>> ```
>>>
>>> -where tx_id is the non-zero identifier values of an open transaction.
>>> -
>>>
>>> -### Protocol Extension
>>> +| Field | Description |
>>> +|------------|--------------------------------------------------|
>>> +| `domid` | The domain-id that owns the shared page |
>>> +| | |
>>> +| `tdomid` | The domain-id that `domid` acts on behalf of if |
>>> +| | it has been subject to an SET_TARGET |
>>> +| | operation [2] or DOMID_INVALID otherwise |
>>
>> DOMID_INVALID needs to be defined (or we need a reference where it is
>> coming from).
>
> OK. It's in a public header... I'll reference it.
>
>>
>>> +| | |
>>> +| `flags` | A bit-wise OR of: |
>>> +| | 0x0001: INTRODUCE has been issued |
Just realized, I think we can drop those flags.
Reasoning: if INTRODUCE hasn't been issued, there can't be an active
connection to Xenstore for that domain, as Xenstore doesn't know about
the parameters to connect (especially the event channel is missing).
>>> +| | 0x0002: RELEASE has been issued |
And the same applies here: RELEASE will drop the connection to the
domain, so it can't appear in a connection record.
Juergen
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [PATCH v2] docs/designs: re-work the xenstore migration document...
2020-04-27 11:13 ` Jürgen Groß
@ 2020-04-27 11:25 ` Paul Durrant
2020-04-27 13:30 ` Jürgen Groß
0 siblings, 1 reply; 6+ messages in thread
From: Paul Durrant @ 2020-04-27 11:25 UTC (permalink / raw)
To: 'Jürgen Groß', xen-devel; +Cc: 'Paul Durrant'
> -----Original Message-----
> From: Jürgen Groß <jgross@suse.com>
> Sent: 27 April 2020 12:13
> To: paul@xen.org; xen-devel@lists.xenproject.org
> Cc: 'Paul Durrant' <pdurrant@amazon.com>
> Subject: Re: [PATCH v2] docs/designs: re-work the xenstore migration document...
>
> On 27.04.20 12:45, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jürgen Groß <jgross@suse.com>
> >> Sent: 27 April 2020 11:37
> >> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> >> Cc: Paul Durrant <pdurrant@amazon.com>
> >> Subject: Re: [PATCH v2] docs/designs: re-work the xenstore migration document...
> >>
> >> On 27.04.20 09:53, Paul Durrant wrote:
> >>> From: Paul Durrant <pdurrant@amazon.com>
> >>>
> >>> ... to specify a separate migration stream that will also be suitable for
> >>> live update.
> >>>
> >>> The original scope of the document was to support non-cooperative migration
> >>> of guests [1] but, since then, live update of xenstored has been brought into
> >>> scope. Thus it makes more sense to define a separate image format for
> >>> serializing xenstore state that is suitable for both purposes.
> >>>
> >>> The document has been limited to specifying a new image format. The mechanism
> >>> for acquiring the image for live update or migration is not covered as that
> >>> is more appropriately dealt with by a patch to docs/misc/xenstore.txt. It is
> >>> also expected that, when the first implementation of live update or migration
> >>> making use of this specification is committed, that the document is moved from
> >>> docs/designs into docs/specs.
> >>>
> >>> [1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-
> migration.md
> >>>
> >>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> >>> ---
> >>> Juergen Gross <jgross@suse.com>
> >>> Andrew Cooper <andrew.cooper3@citrix.com>
> >>> George Dunlap <george.dunlap@citrix.com>
> >>> Ian Jackson <ian.jackson@eu.citrix.com>
> >>> Jan Beulich <jbeulich@suse.com>
> >>> Julien Grall <julien@xen.org>
> >>> Stefano Stabellini <sstabellini@kernel.org>
> >>> Wei Liu <wl@xen.org>
> >>
> >> Mind adding CC: before those mail addresses in order to let git add
> >> those to the recipients list?
> >>
> >
> > D'oh... good spot.
> >
> >>>
> >>> v2:
> >>> - Address comments from Juergen
> >>
> >> Not all unfortunately. :-(
> >>
> >
> > OK.
> >
> >>> +### CONNECTION_DATA
> >>>
> >>> -Each WATCH_DATA record specifies a registered watch and is formatted as
> >>> -follows:
> >>> +For live update the image format will contain a `CONNECTION_DATA` record for
> >>> +each connection to xenstore. For migration it will only contain a record for
> >>> +the domain being migrated.
> >>>
> >>>
> >>> ```
> >>> - 0 1 2 3 octet
> >>> -+-------+-------+-------+-------+
> >>> -| WATCH_DATA |
> >>> -+-------------------------------+
> >>> -| wpath length |
> >>> -+-------------------------------+
> >>> -| wpath data |
> >>> -...
> >>> -| pad (0 to 3 octets) |
> >>> -+-------------------------------+
> >>> + 0 1 2 3 4 5 6 7 octet
> >>> ++-------+-------+-------+-------+-------+-------+-------+-------+
> >>> +| conn-id | pad |
> >>> ++---------------+-----------------------------------------------+
> >>> +| conn-type | conn-spec
> >>> ...
> >>
> >> I asked whether it wouldn't be better to drop the pad and move conn-type
> >> and a 2-byte (unified) flag field at its position. This together ...
> >>
> >>> ++-------------------------------+-------------------------------+
> >>> +| data-len | data
> >>> +-------------------------------+
> >>> -| token length |
> >>> -+-------------------------------+
> >>> -| token data |
> >>> ...
> >>> -| pad (0 to 3 octets) |
> >>> -+-------------------------------+
> >>> ```
> >>>
> >>> -wpath length and token length are specified in octets (excluding the NUL
> >>> -terminator). The wpath should be as described for the `WATCH` operation in
> >>> -[2]. The token is an arbitrary string of octets not containing any NUL
> >>> -values.
> >>>
> >>> +| Field | Description |
> >>> +|-------------|-------------------------------------------------|
> >>> +| `conn-id` | A non-zero number used to identify this |
> >>> +| | connection in subsequent connection-specific |
> >>> +| | records |
> >>> +| | |
> >>> +| `conn-type` | 0x0000: shared ring |
> >>> +| | 0x0001: socket |
> >>> +| | |
> >>> +| `conn-spec` | See below |
> >>> +| | |
> >>> +| `data-len` | The length (in octets) of any pending data not |
> >>> +| | yet written to the connection |
> >>> +| | |
> >>> +| `data` | Pending data (may be empty) |
> >>>
> >>> -**TRANSACTION_DATA**
> >>> +The format of `conn-spec` is dependent upon `conn-type`.
> >>>
> >>> +\pagebreak
> >>>
> >>> -Each TRANSACTION_DATA record specifies an open transaction and is formatted
> >>> -as follows:
> >>> +For `shared ring` connections it is as follows:
> >>>
> >>>
> >>> ```
> >>> - 0 1 2 3 octet
> >>> -+-------+-------+-------+-------+
> >>> -| TRANSACTION_DATA |
> >>> -+-------------------------------+
> >>> -| tx_id |
> >>> -+-------------------------------+
> >>> + 0 1 2 3 4 5 6 7 octet
> >>> + +-------+-------+-------+-------+-------+-------+
> >>> + | domid | tdomid | flags |
> >>> ++---------------+---------------+---------------+---------------+
> >>> +| revtchn | levtchn |
> >>> ++-------------------------------+-------------------------------+
> >>> +| mfn |
> >>> ++---------------------------------------------------------------+
> >>
> >> ... with dropping levtchn (which isn't needed IMO) will make it much
> >> easier to have a union in C (which needs to be aligned to 8 bytes
> >> and have a length of a multiple of 8 bytes due to mfn).
> >>
> >> So something like:
> >>
> >> struct xs_state_connection {
> >> uint32_t conn_id;
> >> uint16_t conn_type;
> >> #define XS_STATE_CONN_TYPE_RING 0
> >> #define XS_STATE_CONN_TYPE_SOCKET 1
> >> uint16_t flags;
> >> #define XS_STATE_CONN_INTRODUCED 0x0001
> >> #define XS_STATE_CONN_RELEASED 0x0002
> >> #define XS_STATE_CONN_READONLY 0x0004
> >> union {
> >> struct {
> >> uint16_t domid;
> >> uint16_t tdomid;
> >> #define XS_STATE_DOMID_INVALID 0xffffU
> >> uint32_t evtchn;
> >> uint64_t mfn;
> >> #define XS_STATE_MFN_INVALID 0xffffffffffffffffUL
> >> } ring;
> >> int32_t socket_fd;
> >> } spec;
> >> uint32_t data_out_len;
> >> uint8_t data[];
> >> };
> >
> > The issue is making sure that the mfn is properly aligned. If I can drop the levtchn then this gets
> easier.
> >
> >>
> >>> ```
> >>>
> >>> -where tx_id is the non-zero identifier values of an open transaction.
> >>> -
> >>>
> >>> -### Protocol Extension
> >>> +| Field | Description |
> >>> +|------------|--------------------------------------------------|
> >>> +| `domid` | The domain-id that owns the shared page |
> >>> +| | |
> >>> +| `tdomid` | The domain-id that `domid` acts on behalf of if |
> >>> +| | it has been subject to an SET_TARGET |
> >>> +| | operation [2] or DOMID_INVALID otherwise |
> >>
> >> DOMID_INVALID needs to be defined (or we need a reference where it is
> >> coming from).
> >
> > OK. It's in a public header... I'll reference it.
> >
> >>
> >>> +| | |
> >>> +| `flags` | A bit-wise OR of: |
> >>> +| | 0x0001: INTRODUCE has been issued |
>
> Just realized, I think we can drop those flags.
>
> Reasoning: if INTRODUCE hasn't been issued, there can't be an active
> connection to Xenstore for that domain, as Xenstore doesn't know about
> the parameters to connect (especially the event channel is missing).
>
> >>> +| | 0x0002: RELEASE has been issued |
>
> And the same applies here: RELEASE will drop the connection to the
> domain, so it can't appear in a connection record.
>
I think the presence of the RESUME command in xenstore.txt makes it non-obvious that we can forget about a domain once RELEASE has been called. The text there does say:
"It is not clear whether this is possible since one would
normally expect a domain not to be restarted after being shut
down without being destroyed in the meantime. There are
currently no users of this request in xen-unstable."
So, perhaps this would be the time to remove RESUME from the spec?
Paul
>
> Juergen
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] docs/designs: re-work the xenstore migration document...
2020-04-27 11:25 ` Paul Durrant
@ 2020-04-27 13:30 ` Jürgen Groß
0 siblings, 0 replies; 6+ messages in thread
From: Jürgen Groß @ 2020-04-27 13:30 UTC (permalink / raw)
To: paul, xen-devel; +Cc: 'Paul Durrant'
On 27.04.20 13:25, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jürgen Groß <jgross@suse.com>
>> Sent: 27 April 2020 12:13
>> To: paul@xen.org; xen-devel@lists.xenproject.org
>> Cc: 'Paul Durrant' <pdurrant@amazon.com>
>> Subject: Re: [PATCH v2] docs/designs: re-work the xenstore migration document...
>>
>> On 27.04.20 12:45, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Jürgen Groß <jgross@suse.com>
>>>> Sent: 27 April 2020 11:37
>>>> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
>>>> Cc: Paul Durrant <pdurrant@amazon.com>
>>>> Subject: Re: [PATCH v2] docs/designs: re-work the xenstore migration document...
>>>>
>>>> On 27.04.20 09:53, Paul Durrant wrote:
>>>>> From: Paul Durrant <pdurrant@amazon.com>
>>>>>
>>>>> ... to specify a separate migration stream that will also be suitable for
>>>>> live update.
>>>>>
>>>>> The original scope of the document was to support non-cooperative migration
>>>>> of guests [1] but, since then, live update of xenstored has been brought into
>>>>> scope. Thus it makes more sense to define a separate image format for
>>>>> serializing xenstore state that is suitable for both purposes.
>>>>>
>>>>> The document has been limited to specifying a new image format. The mechanism
>>>>> for acquiring the image for live update or migration is not covered as that
>>>>> is more appropriately dealt with by a patch to docs/misc/xenstore.txt. It is
>>>>> also expected that, when the first implementation of live update or migration
>>>>> making use of this specification is committed, that the document is moved from
>>>>> docs/designs into docs/specs.
>>>>>
>>>>> [1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-
>> migration.md
>>>>>
>>>>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
>>>>> ---
>>>>> Juergen Gross <jgross@suse.com>
>>>>> Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> George Dunlap <george.dunlap@citrix.com>
>>>>> Ian Jackson <ian.jackson@eu.citrix.com>
>>>>> Jan Beulich <jbeulich@suse.com>
>>>>> Julien Grall <julien@xen.org>
>>>>> Stefano Stabellini <sstabellini@kernel.org>
>>>>> Wei Liu <wl@xen.org>
>>>>
>>>> Mind adding CC: before those mail addresses in order to let git add
>>>> those to the recipients list?
>>>>
>>>
>>> D'oh... good spot.
>>>
>>>>>
>>>>> v2:
>>>>> - Address comments from Juergen
>>>>
>>>> Not all unfortunately. :-(
>>>>
>>>
>>> OK.
>>>
>>>>> +### CONNECTION_DATA
>>>>>
>>>>> -Each WATCH_DATA record specifies a registered watch and is formatted as
>>>>> -follows:
>>>>> +For live update the image format will contain a `CONNECTION_DATA` record for
>>>>> +each connection to xenstore. For migration it will only contain a record for
>>>>> +the domain being migrated.
>>>>>
>>>>>
>>>>> ```
>>>>> - 0 1 2 3 octet
>>>>> -+-------+-------+-------+-------+
>>>>> -| WATCH_DATA |
>>>>> -+-------------------------------+
>>>>> -| wpath length |
>>>>> -+-------------------------------+
>>>>> -| wpath data |
>>>>> -...
>>>>> -| pad (0 to 3 octets) |
>>>>> -+-------------------------------+
>>>>> + 0 1 2 3 4 5 6 7 octet
>>>>> ++-------+-------+-------+-------+-------+-------+-------+-------+
>>>>> +| conn-id | pad |
>>>>> ++---------------+-----------------------------------------------+
>>>>> +| conn-type | conn-spec
>>>>> ...
>>>>
>>>> I asked whether it wouldn't be better to drop the pad and move conn-type
>>>> and a 2-byte (unified) flag field at its position. This together ...
>>>>
>>>>> ++-------------------------------+-------------------------------+
>>>>> +| data-len | data
>>>>> +-------------------------------+
>>>>> -| token length |
>>>>> -+-------------------------------+
>>>>> -| token data |
>>>>> ...
>>>>> -| pad (0 to 3 octets) |
>>>>> -+-------------------------------+
>>>>> ```
>>>>>
>>>>> -wpath length and token length are specified in octets (excluding the NUL
>>>>> -terminator). The wpath should be as described for the `WATCH` operation in
>>>>> -[2]. The token is an arbitrary string of octets not containing any NUL
>>>>> -values.
>>>>>
>>>>> +| Field | Description |
>>>>> +|-------------|-------------------------------------------------|
>>>>> +| `conn-id` | A non-zero number used to identify this |
>>>>> +| | connection in subsequent connection-specific |
>>>>> +| | records |
>>>>> +| | |
>>>>> +| `conn-type` | 0x0000: shared ring |
>>>>> +| | 0x0001: socket |
>>>>> +| | |
>>>>> +| `conn-spec` | See below |
>>>>> +| | |
>>>>> +| `data-len` | The length (in octets) of any pending data not |
>>>>> +| | yet written to the connection |
>>>>> +| | |
>>>>> +| `data` | Pending data (may be empty) |
>>>>>
>>>>> -**TRANSACTION_DATA**
>>>>> +The format of `conn-spec` is dependent upon `conn-type`.
>>>>>
>>>>> +\pagebreak
>>>>>
>>>>> -Each TRANSACTION_DATA record specifies an open transaction and is formatted
>>>>> -as follows:
>>>>> +For `shared ring` connections it is as follows:
>>>>>
>>>>>
>>>>> ```
>>>>> - 0 1 2 3 octet
>>>>> -+-------+-------+-------+-------+
>>>>> -| TRANSACTION_DATA |
>>>>> -+-------------------------------+
>>>>> -| tx_id |
>>>>> -+-------------------------------+
>>>>> + 0 1 2 3 4 5 6 7 octet
>>>>> + +-------+-------+-------+-------+-------+-------+
>>>>> + | domid | tdomid | flags |
>>>>> ++---------------+---------------+---------------+---------------+
>>>>> +| revtchn | levtchn |
>>>>> ++-------------------------------+-------------------------------+
>>>>> +| mfn |
>>>>> ++---------------------------------------------------------------+
>>>>
>>>> ... with dropping levtchn (which isn't needed IMO) will make it much
>>>> easier to have a union in C (which needs to be aligned to 8 bytes
>>>> and have a length of a multiple of 8 bytes due to mfn).
>>>>
>>>> So something like:
>>>>
>>>> struct xs_state_connection {
>>>> uint32_t conn_id;
>>>> uint16_t conn_type;
>>>> #define XS_STATE_CONN_TYPE_RING 0
>>>> #define XS_STATE_CONN_TYPE_SOCKET 1
>>>> uint16_t flags;
>>>> #define XS_STATE_CONN_INTRODUCED 0x0001
>>>> #define XS_STATE_CONN_RELEASED 0x0002
>>>> #define XS_STATE_CONN_READONLY 0x0004
>>>> union {
>>>> struct {
>>>> uint16_t domid;
>>>> uint16_t tdomid;
>>>> #define XS_STATE_DOMID_INVALID 0xffffU
>>>> uint32_t evtchn;
>>>> uint64_t mfn;
>>>> #define XS_STATE_MFN_INVALID 0xffffffffffffffffUL
>>>> } ring;
>>>> int32_t socket_fd;
>>>> } spec;
>>>> uint32_t data_out_len;
>>>> uint8_t data[];
>>>> };
>>>
>>> The issue is making sure that the mfn is properly aligned. If I can drop the levtchn then this gets
>> easier.
>>>
>>>>
>>>>> ```
>>>>>
>>>>> -where tx_id is the non-zero identifier values of an open transaction.
>>>>> -
>>>>>
>>>>> -### Protocol Extension
>>>>> +| Field | Description |
>>>>> +|------------|--------------------------------------------------|
>>>>> +| `domid` | The domain-id that owns the shared page |
>>>>> +| | |
>>>>> +| `tdomid` | The domain-id that `domid` acts on behalf of if |
>>>>> +| | it has been subject to an SET_TARGET |
>>>>> +| | operation [2] or DOMID_INVALID otherwise |
>>>>
>>>> DOMID_INVALID needs to be defined (or we need a reference where it is
>>>> coming from).
>>>
>>> OK. It's in a public header... I'll reference it.
>>>
>>>>
>>>>> +| | |
>>>>> +| `flags` | A bit-wise OR of: |
>>>>> +| | 0x0001: INTRODUCE has been issued |
>>
>> Just realized, I think we can drop those flags.
>>
>> Reasoning: if INTRODUCE hasn't been issued, there can't be an active
>> connection to Xenstore for that domain, as Xenstore doesn't know about
>> the parameters to connect (especially the event channel is missing).
>>
>>>>> +| | 0x0002: RELEASE has been issued |
>>
>> And the same applies here: RELEASE will drop the connection to the
>> domain, so it can't appear in a connection record.
>>
>
> I think the presence of the RESUME command in xenstore.txt makes it non-obvious that we can forget about a domain once RELEASE has been called. The text there does say:
>
> "It is not clear whether this is possible since one would
> normally expect a domain not to be restarted after being shut
> down without being destroyed in the meantime. There are
> currently no users of this request in xen-unstable."
>
> So, perhaps this would be the time to remove RESUME from the spec?
+1
Juergen
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-04-27 13:30 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-27 7:53 [PATCH v2] docs/designs: re-work the xenstore migration document Paul Durrant
2020-04-27 10:36 ` Jürgen Groß
2020-04-27 10:45 ` Paul Durrant
2020-04-27 11:13 ` Jürgen Groß
2020-04-27 11:25 ` Paul Durrant
2020-04-27 13:30 ` Jürgen Groß
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.