* Re: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
2014-07-03 15:15 ` [PATCH v3] " David Scott
@ 2014-07-04 8:21 ` Paul Durrant
2014-07-04 10:00 ` Dave Scott
2014-07-04 9:59 ` Ian Campbell
2014-09-03 16:25 ` [PATCH v4] xenstore: " David Scott
2 siblings, 1 reply; 29+ messages in thread
From: Paul Durrant @ 2014-07-04 8:21 UTC (permalink / raw)
To: xen-devel
Cc: john.liuqiming, Dave Scott, yanqiangjun, Ian Campbell, Andrew Cooper
> -----Original Message-----
> From: David Scott [mailto:dave.scott@citrix.com]
> Sent: 03 July 2014 16:15
> To: xen-devel@lists.xenproject.org
> Cc: Paul Durrant; Ian Campbell; john.liuqiming@huawei.com;
> yanqiangjun@huawei.com; Andrew Cooper; Dave Scott
> Subject: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
>
> Currently hvmloader uses the xenstore ring and then tries to
> reset it back to its initial state. This is not part of the
> ring protocol and, if xenstored reads the ring while it is
> happening, xenstored will conclude it is corrupted. A corrupted
> ring will prevent PV drivers from connecting. This seems to
> be a rare failure.
>
> Furthermore, when a VM crashes it may jump to a 'crash kernel'
> to create a diagnostic dump. Without the ability to safely
> reset the ring the PV drivers won't be able to reliably
> establish connections, to (for example) stream a memory dump to
> disk.
>
> This prototype patch contains a simple extension of the
> xenstore ring structure, enough to contain version numbers and
> a 'closing' flag. This memory is currently unused within the
> 4k page and should be zeroed as part of the domain setup
> process. The oxenstored server advertises version 1, and
> hvmloader detects this and sets the 'closing' flag. The server
> then reinitialises the ring, filling it with obviously invalid
> data to help debugging (unfortunately blocks of zeroes are
> valid xenstore packets) and signals hvmloader by the event
> channel that it's safe to boot the guest OS.
>
> Updates since v2 (thanks to Andy Cooper for the review):
> * hvmloader: use volatile for read of closing flag
> * style improvements
> * remove xenstore version #defines
>
> Updates since v1 (thanks to Paul Durrant for the review):
> * remove unused client version and associated dead code
> * treat 'closing' as a flag by using "!=0" rather than "==1"
> * hvmloader: move function up and remove forward decl
> * document the existing xenstore ring and the extention in misc/
>
> Signed-off-by: David Scott <dave.scott@citrix.com>
> ---
> docs/misc/xenstore-ring.txt | 79
> +++++++++++++++++++++++++++++++++++
> tools/firmware/hvmloader/xenbus.c | 39 +++++++++--------
> tools/ocaml/libs/xb/xb.ml | 26 +++++++++++-
> tools/ocaml/libs/xb/xb.mli | 1 +
> tools/ocaml/libs/xb/xs_ring.ml | 11 +++++
> tools/ocaml/libs/xb/xs_ring_stubs.c | 63
> +++++++++++++++++++++++++++-
> xen/include/public/io/xs_wire.h | 5 +++
> 7 files changed, 203 insertions(+), 21 deletions(-)
> create mode 100644 docs/misc/xenstore-ring.txt
>
> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
> new file mode 100644
> index 0000000..df2a09f
> --- /dev/null
> +++ b/docs/misc/xenstore-ring.txt
> @@ -0,0 +1,79 @@
> +
> +The domain builder allocates a single page and shares it with the xenstore
> +daemon. This document describes the ring protocol used to communicate
> via
> +this page which is used to transmit and receive
> +[xenstore protocol packets](xenstore.txt).
> +
> +In the original version (we call this "version 0"), the shared page has the
> +following contents (all offsets and lengths are in octets):
> +
> +Offset Length Description
> +-----------------------------------------------------------------
> +0 1024 Requests: data sent to the xenstore daemon
> +1024 1024 Replies: data sent to the domain
> +2048 4 Request consumer offset
> +2052 4 Request producer offset
> +2056 4 Response consumer offset
> +2060 4 Response producer offset
> +
> +When the page is allocated it is guaranteed to be full of zeroes.
> +
> +The "Requests" and "Replies" are treated as circular buffers, one for
> +each direction. Each circular buffer is associated wth a producer and
> +consumer offset, which are free-running counters starting from 0. A
> "producer"
> +offset is the offset in the byte stream of the next byte to be written; a
> +"consumer" offset is the offset in the byte stream of the next byte to be
> +read. The byte at offset 'x' in the byte stream will be stored in
> +offset 'x mod 1024' in the circular buffer. "producer - consumer" gives
> +the number of bytes still to be read, and "1024 - (producer - consumer)"
> +therefore gives the amount of space currently available for writing,
> +where we must avoid overwriting unread data.
> +
> +The circular buffers are only used to send sequences of bytes between
> domains.
> +It is the responsibility of the layer above to segment these bytes into
> +packets, as described in [xenstore.txt](xenstore.txt).
> +
> +The client and server domains can run concurrently on different cores and
> +different sockets. We must therefore take care to avoid corruption by:
> +
> + 1. using atomic loads and stores when reading and writing the producer
> + and consumer offsets. If an offset were to be updated by a non-atomic
> + store then the other domain may read an invalid offset value.
> + 2. ensuring request and reply data is fully read or written before
> + updating the offsets by issuing a memory barrier.
> +
> +If we wish to read data but find the circular buffer empty, or wish to write
> +data and find the circular buffer full, we wait on a pre-arranged event
> +channel. When the other party next reads or writes data to the ring, it
> +will notify() this event channel and we will wake.
> +
> +Protocol extension for reconnection
> +-----------------------------------
> +
> +In version 0 of the protocol it is not possible to close and reopen the
> +connection. This means that if the ring is corrupted, it can never be used
> +(safely) again. Extending the protocol to allow reconnection would allow
> +us to:
> +
> + 1. use the ring in the firmware (hvmloader) and safely reset it for use
> + by the guest
> + 2. re-establish a ring in a 'crash kernel' environment, allowing us to
> + write crash dumps to PV disks or network devices.
> +
> +In version 1 of the protocol the ring is extended with the following
> +fields:
> +
> +Offset Length Description
> +-----------------------------------------------------------------
> +2064 4 Xenstore daemon supported protocol version
> +2068 4 Close request flag
> +
> +In a system supporting only version 0, both these fields are guaranteed
> +to contain 0 by the domain builder.
> +
> +In a system supporting version 1, the xenstore daemon will write "1" into
> +the support protocol version field. The guest xenstore client (eg in
s/support/supported
> +hvmloader) can query the version, and if it is set to "1" it can write
> +"1" to the close request flag and call notify(). The supporting xenstore
Perhaps, rather than 'call notify()' this should be 'signal the store event channel'?
> +daemon can reset the ring to an empty state and signal completion by
> +clearing the flag and calling notify() again.
Same here.
> diff --git a/tools/firmware/hvmloader/xenbus.c
> b/tools/firmware/hvmloader/xenbus.c
> index fe72e97..f85832c 100644
> --- a/tools/firmware/hvmloader/xenbus.c
> +++ b/tools/firmware/hvmloader/xenbus.c
> @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /*
> Shared ring with dom0 */
> static evtchn_port_t event; /* Event-channel to dom0 */
> static char payload[XENSTORE_PAYLOAD_MAX + 1]; /* Unmarshalling area
> */
>
> +static void ring_wait(void)
> +{
> + struct shared_info *shinfo = get_shared_info();
> + struct sched_poll poll;
> +
> + memset(&poll, 0, sizeof(poll));
> + set_xen_guest_handle(poll.ports, &event);
> + poll.nr_ports = 1;
> +
> + while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
> + hypercall_sched_op(SCHEDOP_poll, &poll);
> +}
> +
> /* Connect our xenbus client to the backend.
> * Call once, before any other xenbus actions. */
> void xenbus_setup(void)
> @@ -68,10 +81,15 @@ void xenbus_shutdown(void)
>
> ASSERT(rings != NULL);
>
> - /* We zero out the whole ring -- the backend can handle this, and it's
> - * not going to surprise any frontends since it's equivalent to never
> - * having used the rings. */
> - memset(rings, 0, sizeof *rings);
> + if (rings->server_version > 0) {
> + rings->closing = 1;
> + while (*(volatile uint32_t*)&rings->closing == 1)
> + ring_wait ();
> + } else
> + /* If the backend reads the state while we're erasing it then the
> + * ring state will become corrupted, preventing guest frontends from
> + * connecting. This is rare. */
> + memset(rings, 0, sizeof *rings);
>
> /* Clear the event-channel state too. */
> memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
> @@ -81,19 +99,6 @@ void xenbus_shutdown(void)
> rings = NULL;
> }
>
> -static void ring_wait(void)
> -{
> - struct shared_info *shinfo = get_shared_info();
> - struct sched_poll poll;
> -
> - memset(&poll, 0, sizeof(poll));
> - set_xen_guest_handle(poll.ports, &event);
> - poll.nr_ports = 1;
> -
> - while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
> - hypercall_sched_op(SCHEDOP_poll, &poll);
> -}
> -
> /* Helper functions: copy data in and out of the ring */
> static void ring_write(const char *data, uint32_t len)
> {
> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
> index 29d354d..d5cd776 100644
> --- a/tools/ocaml/libs/xb/xb.ml
> +++ b/tools/ocaml/libs/xb/xb.ml
> @@ -84,7 +84,26 @@ let write con s len =
> | Fd backfd -> write_fd backfd con s len
> | Xenmmap backmmap -> write_mmap backmmap con s len
>
> -let output con =
> +(* If a function throws Xs_ring.Closing, then clear the ring state
> + and serve the ring again. *)
> +let rec handle_closing f con =
> + match (try Some (f con) with Xs_ring.Closing -> None) with
> + | Some x -> x
> + | None ->
> + begin match con.backend with
> + | Fd _ -> raise Xs_ring.Closing (* should never happen, but
> just in case *)
> + | Xenmmap backend ->
> + Xs_ring.close backend.mmap;
> + backend.eventchn_notify ();
> + (* Clear our old connection state *)
> + Queue.clear con.pkt_in;
> + Queue.clear con.pkt_out;
> + con.partial_in <- init_partial_in ();
> + con.partial_out <- "";
> + handle_closing f con
> + end
> +
> +let output = handle_closing (fun con ->
> (* get the output string from a string_of(packet) or partial_out *)
> let s = if String.length con.partial_out > 0 then
> con.partial_out
> @@ -101,8 +120,9 @@ let output con =
> );
> (* after sending one packet, partial is empty *)
> con.partial_out = ""
> +)
>
> -let input con =
> +let input = handle_closing (fun con ->
> let newpacket = ref false in
> let to_read =
> match con.partial_in with
> @@ -133,6 +153,7 @@ let input con =
> HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
> );
> !newpacket
> +)
>
> let newcon backend = {
> backend = backend;
> @@ -145,6 +166,7 @@ let newcon backend = {
> let open_fd fd = newcon (Fd { fd = fd; })
>
> let open_mmap mmap notifyfct =
> + Xs_ring.set_server_version mmap 1; (* defined in xs_wire.h *)
> newcon (Xenmmap {
> mmap = mmap;
> eventchn_notify = notifyfct;
> diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
> index 58234ae..7f65fa3 100644
> --- a/tools/ocaml/libs/xb/xb.mli
> +++ b/tools/ocaml/libs/xb/xb.mli
> @@ -80,6 +80,7 @@ val read : t -> string -> int -> int
> val write_fd : backend_fd -> 'a -> string -> int -> int
> val write_mmap : backend_mmap -> 'a -> string -> int -> int
> val write : t -> string -> int -> int
> +val handle_closing : (t -> 'a) -> t -> 'a
> val output : t -> bool
> val input : t -> bool
> val newcon : backend -> t
> diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
> index 9469609..d7f0fd4 100644
> --- a/tools/ocaml/libs/xb/xs_ring.ml
> +++ b/tools/ocaml/libs/xb/xs_ring.ml
> @@ -14,5 +14,16 @@
> * GNU Lesser General Public License for more details.
> *)
>
> +exception Closing
> +
> +let _ =
> + Callback.register_exception "Xs_ring.Closing" Closing
> +
> external read: Xenmmap.mmap_interface -> string -> int -> int =
> "ml_interface_read"
> external write: Xenmmap.mmap_interface -> string -> int -> int =
> "ml_interface_write"
> +
> +
> +external set_server_version: Xenmmap.mmap_interface -> int -> unit =
> "ml_interface_set_server_version" "noalloc"
> +external get_server_version: Xenmmap.mmap_interface -> int =
> "ml_interface_get_server_version" "noalloc"
> +
> +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close"
> "noalloc"
> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
> b/tools/ocaml/libs/xb/xs_ring_stubs.c
> index 8bd1047..27c98cd 100644
> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> @@ -35,19 +35,28 @@
>
> #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
>
> +#define ERROR_UNKNOWN (-1)
> +#define ERROR_CLOSING (-2)
> +
> static int xs_ring_read(struct mmap_interface *interface,
> char *buffer, int len)
> {
> struct xenstore_domain_interface *intf = interface->addr;
> XENSTORE_RING_IDX cons, prod; /* offsets only */
> int to_read;
> + uint32_t closing;
>
> cons = *(volatile uint32*)&intf->req_cons;
> prod = *(volatile uint32*)&intf->req_prod;
> + closing = *(volatile uint32*)&intf->closing;
> +
> + if (closing != 0)
> + return ERROR_CLOSING;
> +
> xen_mb();
>
> if ((prod - cons) > XENSTORE_RING_SIZE)
> - return -1;
> + return ERROR_UNKNOWN;
>
> if (prod == cons)
> return 0;
> @@ -71,9 +80,15 @@ static int xs_ring_write(struct mmap_interface
> *interface,
> struct xenstore_domain_interface *intf = interface->addr;
> XENSTORE_RING_IDX cons, prod;
> int can_write;
> + uint32_t closing;
>
> cons = *(volatile uint32*)&intf->rsp_cons;
> prod = *(volatile uint32*)&intf->rsp_prod;
> + closing = *(volatile uint32*)&intf->closing;
> +
> + if (closing != 0)
> + return ERROR_CLOSING;
> +
> xen_mb();
> if ( (prod - cons) >= XENSTORE_RING_SIZE )
> return 0;
> @@ -97,8 +112,12 @@ CAMLprim value ml_interface_read(value interface,
> value buffer, value len)
>
> res = xs_ring_read(GET_C_STRUCT(interface),
> String_val(buffer), Int_val(len));
> - if (res == -1)
> + if (res == ERROR_UNKNOWN)
> caml_failwith("bad connection");
> +
> + if (res == ERROR_CLOSING)
> +
This looks a bit wrong. The blank line above should be removed and the following line indented appropriately.
> caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
> +
> result = Val_int(res);
> CAMLreturn(result);
> }
> @@ -111,6 +130,46 @@ CAMLprim value ml_interface_write(value interface,
> value buffer, value len)
>
> res = xs_ring_write(GET_C_STRUCT(interface),
> String_val(buffer), Int_val(len));
> +
> + if (res == ERROR_CLOSING)
> +
Same here.
> caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
> +
> result = Val_int(res);
> CAMLreturn(result);
> }
> +
> +CAMLprim value ml_interface_set_server_version(value interface, value v)
> +{
> + CAMLparam2(interface, v);
> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +
> + intf->server_version = Int_val(v);
> +
> + CAMLreturn(Val_unit);
> +}
> +
> +CAMLprim value ml_interface_get_server_version(value interface)
> +{
> + CAMLparam1(interface);
> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +
> + CAMLreturn(Val_int (intf->server_version));
> +}
> +
> +CAMLprim value ml_interface_close(value interface)
> +{
> + CAMLparam1(interface);
> + const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> + int i;
> +
> + intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod =
> 0;
> + /* Ensure the unused space is full of invalid xenstore packets. */
> + for (i = 0; i < XENSTORE_RING_SIZE; i++) {
> + intf->req[i] = invalid_data[i % sizeof(invalid_data)];
> + intf->rsp[i] = invalid_data[i % sizeof(invalid_data)];
> + }
This does not reset the rings to their initial state (i.e. all zeroes). I don't think that's necessarily a problem, but it is inconsistent.
> + xen_mb ();
> + intf->closing = 0;
> + CAMLreturn(Val_unit);
> +}
> diff --git a/xen/include/public/io/xs_wire.h
> b/xen/include/public/io/xs_wire.h
> index 585f0c8..022d614 100644
> --- a/xen/include/public/io/xs_wire.h
> +++ b/xen/include/public/io/xs_wire.h
> @@ -108,14 +108,19 @@ enum xs_watch_type
> * `incontents 150 xenstore_struct XenStore wire protocol.
> *
> * Inter-domain shared memory communications. */
> +
Pure whitespace change. Not necessary.
Paul
> #define XENSTORE_RING_SIZE 1024
> typedef uint32_t XENSTORE_RING_IDX;
> #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
> struct xenstore_domain_interface {
> + /* XENSTORE_VERSION_0 */
> char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
> char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
> XENSTORE_RING_IDX req_cons, req_prod;
> XENSTORE_RING_IDX rsp_cons, rsp_prod;
> + uint32_t server_version;
> + /* server_version 1 and later: */
> + uint32_t closing; /* Non-zero means close in progress */
> };
>
> /* Violating this is very bad. See docs/misc/xenstore.txt. */
> --
> 1.7.10.4
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
2014-07-04 8:21 ` Paul Durrant
@ 2014-07-04 10:00 ` Dave Scott
0 siblings, 0 replies; 29+ messages in thread
From: Dave Scott @ 2014-07-04 10:00 UTC (permalink / raw)
To: Paul Durrant
Cc: xen-devel, yanqiangjun, Andrew Cooper, Ian Campbell, john.liuqiming
Hi Paul,
On 4 Jul 2014, at 09:21, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> -----Original Message-----
>> From: David Scott [mailto:dave.scott@citrix.com]
>> Sent: 03 July 2014 16:15
>> To: xen-devel@lists.xenproject.org
>> Cc: Paul Durrant; Ian Campbell; john.liuqiming@huawei.com;
>> yanqiangjun@huawei.com; Andrew Cooper; Dave Scott
>> Subject: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
>>
>> Currently hvmloader uses the xenstore ring and then tries to
>> reset it back to its initial state. This is not part of the
>> ring protocol and, if xenstored reads the ring while it is
>> happening, xenstored will conclude it is corrupted. A corrupted
>> ring will prevent PV drivers from connecting. This seems to
>> be a rare failure.
>>
>> Furthermore, when a VM crashes it may jump to a 'crash kernel'
>> to create a diagnostic dump. Without the ability to safely
>> reset the ring the PV drivers won't be able to reliably
>> establish connections, to (for example) stream a memory dump to
>> disk.
>>
>> This prototype patch contains a simple extension of the
>> xenstore ring structure, enough to contain version numbers and
>> a 'closing' flag. This memory is currently unused within the
>> 4k page and should be zeroed as part of the domain setup
>> process. The oxenstored server advertises version 1, and
>> hvmloader detects this and sets the 'closing' flag. The server
>> then reinitialises the ring, filling it with obviously invalid
>> data to help debugging (unfortunately blocks of zeroes are
>> valid xenstore packets) and signals hvmloader by the event
>> channel that it's safe to boot the guest OS.
>>
>> Updates since v2 (thanks to Andy Cooper for the review):
>> * hvmloader: use volatile for read of closing flag
>> * style improvements
>> * remove xenstore version #defines
>>
>> Updates since v1 (thanks to Paul Durrant for the review):
>> * remove unused client version and associated dead code
>> * treat 'closing' as a flag by using "!=0" rather than "==1"
>> * hvmloader: move function up and remove forward decl
>> * document the existing xenstore ring and the extention in misc/
>>
>> Signed-off-by: David Scott <dave.scott@citrix.com>
>> ---
>> docs/misc/xenstore-ring.txt | 79
>> +++++++++++++++++++++++++++++++++++
>> tools/firmware/hvmloader/xenbus.c | 39 +++++++++--------
>> tools/ocaml/libs/xb/xb.ml | 26 +++++++++++-
>> tools/ocaml/libs/xb/xb.mli | 1 +
>> tools/ocaml/libs/xb/xs_ring.ml | 11 +++++
>> tools/ocaml/libs/xb/xs_ring_stubs.c | 63
>> +++++++++++++++++++++++++++-
>> xen/include/public/io/xs_wire.h | 5 +++
>> 7 files changed, 203 insertions(+), 21 deletions(-)
>> create mode 100644 docs/misc/xenstore-ring.txt
>>
>> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
>> new file mode 100644
>> index 0000000..df2a09f
>> --- /dev/null
>> +++ b/docs/misc/xenstore-ring.txt
>> @@ -0,0 +1,79 @@
>> +
>> +The domain builder allocates a single page and shares it with the xenstore
>> +daemon. This document describes the ring protocol used to communicate
>> via
>> +this page which is used to transmit and receive
>> +[xenstore protocol packets](xenstore.txt).
>> +
>> +In the original version (we call this "version 0"), the shared page has the
>> +following contents (all offsets and lengths are in octets):
>> +
>> +Offset Length Description
>> +-----------------------------------------------------------------
>> +0 1024 Requests: data sent to the xenstore daemon
>> +1024 1024 Replies: data sent to the domain
>> +2048 4 Request consumer offset
>> +2052 4 Request producer offset
>> +2056 4 Response consumer offset
>> +2060 4 Response producer offset
>> +
>> +When the page is allocated it is guaranteed to be full of zeroes.
>> +
>> +The "Requests" and "Replies" are treated as circular buffers, one for
>> +each direction. Each circular buffer is associated wth a producer and
>> +consumer offset, which are free-running counters starting from 0. A
>> "producer"
>> +offset is the offset in the byte stream of the next byte to be written; a
>> +"consumer" offset is the offset in the byte stream of the next byte to be
>> +read. The byte at offset 'x' in the byte stream will be stored in
>> +offset 'x mod 1024' in the circular buffer. "producer - consumer" gives
>> +the number of bytes still to be read, and "1024 - (producer - consumer)"
>> +therefore gives the amount of space currently available for writing,
>> +where we must avoid overwriting unread data.
>> +
>> +The circular buffers are only used to send sequences of bytes between
>> domains.
>> +It is the responsibility of the layer above to segment these bytes into
>> +packets, as described in [xenstore.txt](xenstore.txt).
>> +
>> +The client and server domains can run concurrently on different cores and
>> +different sockets. We must therefore take care to avoid corruption by:
>> +
>> + 1. using atomic loads and stores when reading and writing the producer
>> + and consumer offsets. If an offset were to be updated by a non-atomic
>> + store then the other domain may read an invalid offset value.
>> + 2. ensuring request and reply data is fully read or written before
>> + updating the offsets by issuing a memory barrier.
>> +
>> +If we wish to read data but find the circular buffer empty, or wish to write
>> +data and find the circular buffer full, we wait on a pre-arranged event
>> +channel. When the other party next reads or writes data to the ring, it
>> +will notify() this event channel and we will wake.
>> +
>> +Protocol extension for reconnection
>> +-----------------------------------
>> +
>> +In version 0 of the protocol it is not possible to close and reopen the
>> +connection. This means that if the ring is corrupted, it can never be used
>> +(safely) again. Extending the protocol to allow reconnection would allow
>> +us to:
>> +
>> + 1. use the ring in the firmware (hvmloader) and safely reset it for use
>> + by the guest
>> + 2. re-establish a ring in a 'crash kernel' environment, allowing us to
>> + write crash dumps to PV disks or network devices.
>> +
>> +In version 1 of the protocol the ring is extended with the following
>> +fields:
>> +
>> +Offset Length Description
>> +-----------------------------------------------------------------
>> +2064 4 Xenstore daemon supported protocol version
>> +2068 4 Close request flag
>> +
>> +In a system supporting only version 0, both these fields are guaranteed
>> +to contain 0 by the domain builder.
>> +
>> +In a system supporting version 1, the xenstore daemon will write "1" into
>> +the support protocol version field. The guest xenstore client (eg in
>
> s/support/supported
>
>> +hvmloader) can query the version, and if it is set to "1" it can write
>> +"1" to the close request flag and call notify(). The supporting xenstore
>
> Perhaps, rather than 'call notify()' this should be 'signal the store event channel'?
>
>> +daemon can reset the ring to an empty state and signal completion by
>> +clearing the flag and calling notify() again.
>
> Same here.
>
>> diff --git a/tools/firmware/hvmloader/xenbus.c
>> b/tools/firmware/hvmloader/xenbus.c
>> index fe72e97..f85832c 100644
>> --- a/tools/firmware/hvmloader/xenbus.c
>> +++ b/tools/firmware/hvmloader/xenbus.c
>> @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /*
>> Shared ring with dom0 */
>> static evtchn_port_t event; /* Event-channel to dom0 */
>> static char payload[XENSTORE_PAYLOAD_MAX + 1]; /* Unmarshalling area
>> */
>>
>> +static void ring_wait(void)
>> +{
>> + struct shared_info *shinfo = get_shared_info();
>> + struct sched_poll poll;
>> +
>> + memset(&poll, 0, sizeof(poll));
>> + set_xen_guest_handle(poll.ports, &event);
>> + poll.nr_ports = 1;
>> +
>> + while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
>> + hypercall_sched_op(SCHEDOP_poll, &poll);
>> +}
>> +
>> /* Connect our xenbus client to the backend.
>> * Call once, before any other xenbus actions. */
>> void xenbus_setup(void)
>> @@ -68,10 +81,15 @@ void xenbus_shutdown(void)
>>
>> ASSERT(rings != NULL);
>>
>> - /* We zero out the whole ring -- the backend can handle this, and it's
>> - * not going to surprise any frontends since it's equivalent to never
>> - * having used the rings. */
>> - memset(rings, 0, sizeof *rings);
>> + if (rings->server_version > 0) {
>> + rings->closing = 1;
>> + while (*(volatile uint32_t*)&rings->closing == 1)
>> + ring_wait ();
>> + } else
>> + /* If the backend reads the state while we're erasing it then the
>> + * ring state will become corrupted, preventing guest frontends from
>> + * connecting. This is rare. */
>> + memset(rings, 0, sizeof *rings);
>>
>> /* Clear the event-channel state too. */
>> memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
>> @@ -81,19 +99,6 @@ void xenbus_shutdown(void)
>> rings = NULL;
>> }
>>
>> -static void ring_wait(void)
>> -{
>> - struct shared_info *shinfo = get_shared_info();
>> - struct sched_poll poll;
>> -
>> - memset(&poll, 0, sizeof(poll));
>> - set_xen_guest_handle(poll.ports, &event);
>> - poll.nr_ports = 1;
>> -
>> - while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
>> - hypercall_sched_op(SCHEDOP_poll, &poll);
>> -}
>> -
>> /* Helper functions: copy data in and out of the ring */
>> static void ring_write(const char *data, uint32_t len)
>> {
>> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
>> index 29d354d..d5cd776 100644
>> --- a/tools/ocaml/libs/xb/xb.ml
>> +++ b/tools/ocaml/libs/xb/xb.ml
>> @@ -84,7 +84,26 @@ let write con s len =
>> | Fd backfd -> write_fd backfd con s len
>> | Xenmmap backmmap -> write_mmap backmmap con s len
>>
>> -let output con =
>> +(* If a function throws Xs_ring.Closing, then clear the ring state
>> + and serve the ring again. *)
>> +let rec handle_closing f con =
>> + match (try Some (f con) with Xs_ring.Closing -> None) with
>> + | Some x -> x
>> + | None ->
>> + begin match con.backend with
>> + | Fd _ -> raise Xs_ring.Closing (* should never happen, but
>> just in case *)
>> + | Xenmmap backend ->
>> + Xs_ring.close backend.mmap;
>> + backend.eventchn_notify ();
>> + (* Clear our old connection state *)
>> + Queue.clear con.pkt_in;
>> + Queue.clear con.pkt_out;
>> + con.partial_in <- init_partial_in ();
>> + con.partial_out <- "";
>> + handle_closing f con
>> + end
>> +
>> +let output = handle_closing (fun con ->
>> (* get the output string from a string_of(packet) or partial_out *)
>> let s = if String.length con.partial_out > 0 then
>> con.partial_out
>> @@ -101,8 +120,9 @@ let output con =
>> );
>> (* after sending one packet, partial is empty *)
>> con.partial_out = ""
>> +)
>>
>> -let input con =
>> +let input = handle_closing (fun con ->
>> let newpacket = ref false in
>> let to_read =
>> match con.partial_in with
>> @@ -133,6 +153,7 @@ let input con =
>> HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
>> );
>> !newpacket
>> +)
>>
>> let newcon backend = {
>> backend = backend;
>> @@ -145,6 +166,7 @@ let newcon backend = {
>> let open_fd fd = newcon (Fd { fd = fd; })
>>
>> let open_mmap mmap notifyfct =
>> + Xs_ring.set_server_version mmap 1; (* defined in xs_wire.h *)
>> newcon (Xenmmap {
>> mmap = mmap;
>> eventchn_notify = notifyfct;
>> diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
>> index 58234ae..7f65fa3 100644
>> --- a/tools/ocaml/libs/xb/xb.mli
>> +++ b/tools/ocaml/libs/xb/xb.mli
>> @@ -80,6 +80,7 @@ val read : t -> string -> int -> int
>> val write_fd : backend_fd -> 'a -> string -> int -> int
>> val write_mmap : backend_mmap -> 'a -> string -> int -> int
>> val write : t -> string -> int -> int
>> +val handle_closing : (t -> 'a) -> t -> 'a
>> val output : t -> bool
>> val input : t -> bool
>> val newcon : backend -> t
>> diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
>> index 9469609..d7f0fd4 100644
>> --- a/tools/ocaml/libs/xb/xs_ring.ml
>> +++ b/tools/ocaml/libs/xb/xs_ring.ml
>> @@ -14,5 +14,16 @@
>> * GNU Lesser General Public License for more details.
>> *)
>>
>> +exception Closing
>> +
>> +let _ =
>> + Callback.register_exception "Xs_ring.Closing" Closing
>> +
>> external read: Xenmmap.mmap_interface -> string -> int -> int =
>> "ml_interface_read"
>> external write: Xenmmap.mmap_interface -> string -> int -> int =
>> "ml_interface_write"
>> +
>> +
>> +external set_server_version: Xenmmap.mmap_interface -> int -> unit =
>> "ml_interface_set_server_version" "noalloc"
>> +external get_server_version: Xenmmap.mmap_interface -> int =
>> "ml_interface_get_server_version" "noalloc"
>> +
>> +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close"
>> "noalloc"
>> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
>> b/tools/ocaml/libs/xb/xs_ring_stubs.c
>> index 8bd1047..27c98cd 100644
>> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
>> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
>> @@ -35,19 +35,28 @@
>>
>> #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
>>
>> +#define ERROR_UNKNOWN (-1)
>> +#define ERROR_CLOSING (-2)
>> +
>> static int xs_ring_read(struct mmap_interface *interface,
>> char *buffer, int len)
>> {
>> struct xenstore_domain_interface *intf = interface->addr;
>> XENSTORE_RING_IDX cons, prod; /* offsets only */
>> int to_read;
>> + uint32_t closing;
>>
>> cons = *(volatile uint32*)&intf->req_cons;
>> prod = *(volatile uint32*)&intf->req_prod;
>> + closing = *(volatile uint32*)&intf->closing;
>> +
>> + if (closing != 0)
>> + return ERROR_CLOSING;
>> +
>> xen_mb();
>>
>> if ((prod - cons) > XENSTORE_RING_SIZE)
>> - return -1;
>> + return ERROR_UNKNOWN;
>>
>> if (prod == cons)
>> return 0;
>> @@ -71,9 +80,15 @@ static int xs_ring_write(struct mmap_interface
>> *interface,
>> struct xenstore_domain_interface *intf = interface->addr;
>> XENSTORE_RING_IDX cons, prod;
>> int can_write;
>> + uint32_t closing;
>>
>> cons = *(volatile uint32*)&intf->rsp_cons;
>> prod = *(volatile uint32*)&intf->rsp_prod;
>> + closing = *(volatile uint32*)&intf->closing;
>> +
>> + if (closing != 0)
>> + return ERROR_CLOSING;
>> +
>> xen_mb();
>> if ( (prod - cons) >= XENSTORE_RING_SIZE )
>> return 0;
>> @@ -97,8 +112,12 @@ CAMLprim value ml_interface_read(value interface,
>> value buffer, value len)
>>
>> res = xs_ring_read(GET_C_STRUCT(interface),
>> String_val(buffer), Int_val(len));
>> - if (res == -1)
>> + if (res == ERROR_UNKNOWN)
>> caml_failwith("bad connection");
>> +
>> + if (res == ERROR_CLOSING)
>> +
>
> This looks a bit wrong. The blank line above should be removed and the following line indented appropriately.
>
>> caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
>> +
>> result = Val_int(res);
>> CAMLreturn(result);
>> }
>> @@ -111,6 +130,46 @@ CAMLprim value ml_interface_write(value interface,
>> value buffer, value len)
>>
>> res = xs_ring_write(GET_C_STRUCT(interface),
>> String_val(buffer), Int_val(len));
>> +
>> + if (res == ERROR_CLOSING)
>> +
>
> Same here.
>
>> caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
>> +
>> result = Val_int(res);
>> CAMLreturn(result);
>> }
>> +
>> +CAMLprim value ml_interface_set_server_version(value interface, value v)
>> +{
>> + CAMLparam2(interface, v);
>> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +
>> + intf->server_version = Int_val(v);
>> +
>> + CAMLreturn(Val_unit);
>> +}
>> +
>> +CAMLprim value ml_interface_get_server_version(value interface)
>> +{
>> + CAMLparam1(interface);
>> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +
>> + CAMLreturn(Val_int (intf->server_version));
>> +}
>> +
>> +CAMLprim value ml_interface_close(value interface)
>> +{
>> + CAMLparam1(interface);
>> + const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
>> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> + int i;
>> +
>> + intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod =
>> 0;
>> + /* Ensure the unused space is full of invalid xenstore packets. */
>> + for (i = 0; i < XENSTORE_RING_SIZE; i++) {
>> + intf->req[i] = invalid_data[i % sizeof(invalid_data)];
>> + intf->rsp[i] = invalid_data[i % sizeof(invalid_data)];
>> + }
>
> This does not reset the rings to their initial state (i.e. all zeroes). I don't think that's necessarily a problem, but it is inconsistent.
Indeed — I should document this difference in the API.
In one case where I saw the original bug the xenstore logs were full of ‘DEBUG’ messages, because message id = 0 = DEBUG, request id = 0, transaction id = 0 and length = 0 is a valid packet. This invalid data should cause the corruption to be noticed more quickly :-)
Thanks,
Dave
>
>> + xen_mb ();
>> + intf->closing = 0;
>> + CAMLreturn(Val_unit);
>> +}
>> diff --git a/xen/include/public/io/xs_wire.h
>> b/xen/include/public/io/xs_wire.h
>> index 585f0c8..022d614 100644
>> --- a/xen/include/public/io/xs_wire.h
>> +++ b/xen/include/public/io/xs_wire.h
>> @@ -108,14 +108,19 @@ enum xs_watch_type
>> * `incontents 150 xenstore_struct XenStore wire protocol.
>> *
>> * Inter-domain shared memory communications. */
>> +
>
> Pure whitespace change. Not necessary.
>
> Paul
>
>> #define XENSTORE_RING_SIZE 1024
>> typedef uint32_t XENSTORE_RING_IDX;
>> #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
>> struct xenstore_domain_interface {
>> + /* XENSTORE_VERSION_0 */
>> char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
>> char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>> XENSTORE_RING_IDX req_cons, req_prod;
>> XENSTORE_RING_IDX rsp_cons, rsp_prod;
>> + uint32_t server_version;
>> + /* server_version 1 and later: */
>> + uint32_t closing; /* Non-zero means close in progress */
>> };
>>
>> /* Violating this is very bad. See docs/misc/xenstore.txt. */
>> --
>> 1.7.10.4
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
2014-07-03 15:15 ` [PATCH v3] " David Scott
2014-07-04 8:21 ` Paul Durrant
@ 2014-07-04 9:59 ` Ian Campbell
2014-07-04 10:19 ` Dave Scott
2014-09-03 16:25 ` [PATCH v4] xenstore: " David Scott
2 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2014-07-04 9:59 UTC (permalink / raw)
To: David Scott
Cc: xen-devel, paul.durrant, andrew.cooper3, yanqiangjun, john.liuqiming
On Thu, 2014-07-03 at 16:15 +0100, David Scott wrote:
This all looks broadly sensible to me.
> Currently hvmloader uses the xenstore ring and then tries to
> reset it back to its initial state. This is not part of the
> ring protocol and, if xenstored reads the ring while it is
> happening, xenstored will conclude it is corrupted. A corrupted
> ring will prevent PV drivers from connecting. This seems to
> be a rare failure.
>
> Furthermore, when a VM crashes it may jump to a 'crash kernel'
> to create a diagnostic dump. Without the ability to safely
> reset the ring the PV drivers won't be able to reliably
> establish connections, to (for example) stream a memory dump to
> disk.
>
> This prototype patch contains a simple extension of the
> xenstore ring structure, enough to contain version numbers and
> a 'closing' flag. This memory is currently unused within the
> 4k page and should be zeroed as part of the domain setup
> process. The oxenstored server advertises version 1, and
> hvmloader detects this and sets the 'closing' flag. The server
> then reinitialises the ring, filling it with obviously invalid
> data to help debugging (unfortunately blocks of zeroes are
> valid xenstore packets) and signals hvmloader by the event
> channel that it's safe to boot the guest OS.
>
> Updates since v2 (thanks to Andy Cooper for the review):
Please can you put these updates after the "---" marker. This will cause
the to be omitted from the final commit by the git am tool.
> * hvmloader: use volatile for read of closing flag
> * style improvements
> * remove xenstore version #defines
>
> Updates since v1 (thanks to Paul Durrant for the review):
> * remove unused client version and associated dead code
> * treat 'closing' as a flag by using "!=0" rather than "==1"
> * hvmloader: move function up and remove forward decl
> * document the existing xenstore ring and the extention in misc/
>
> Signed-off-by: David Scott <dave.scott@citrix.com>
> ---
> docs/misc/xenstore-ring.txt | 79 +++++++++++++++++++++++++++++++++++
> tools/firmware/hvmloader/xenbus.c | 39 +++++++++--------
> tools/ocaml/libs/xb/xb.ml | 26 +++++++++++-
> tools/ocaml/libs/xb/xb.mli | 1 +
> tools/ocaml/libs/xb/xs_ring.ml | 11 +++++
> tools/ocaml/libs/xb/xs_ring_stubs.c | 63 +++++++++++++++++++++++++++-
> xen/include/public/io/xs_wire.h | 5 +++
> 7 files changed, 203 insertions(+), 21 deletions(-)
> create mode 100644 docs/misc/xenstore-ring.txt
>
> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
> new file mode 100644
> index 0000000..df2a09f
> --- /dev/null
> +++ b/docs/misc/xenstore-ring.txt
> @@ -0,0 +1,79 @@
> +
> +The domain builder allocates a single page and shares it with the xenstore
> +daemon. This document describes the ring protocol used to communicate via
> +this page which is used to transmit and receive
> +[xenstore protocol packets](xenstore.txt).
> +
> +In the original version (we call this "version 0"), the shared page has the
> +following contents (all offsets and lengths are in octets):
> +
> +Offset Length Description
> +-----------------------------------------------------------------
> +0 1024 Requests: data sent to the xenstore daemon
> +1024 1024 Replies: data sent to the domain
> +2048 4 Request consumer offset
> +2052 4 Request producer offset
> +2056 4 Response consumer offset
> +2060 4 Response producer offset
> +
> +When the page is allocated it is guaranteed to be full of zeroes.
> +
> +The "Requests" and "Replies" are treated as circular buffers, one for
> +each direction. Each circular buffer is associated wth a producer and
> +consumer offset, which are free-running counters starting from 0. A "producer"
> +offset is the offset in the byte stream of the next byte to be written; a
> +"consumer" offset is the offset in the byte stream of the next byte to be
> +read. The byte at offset 'x' in the byte stream will be stored in
> +offset 'x mod 1024' in the circular buffer. "producer - consumer" gives
> +the number of bytes still to be read, and "1024 - (producer - consumer)"
> +therefore gives the amount of space currently available for writing,
> +where we must avoid overwriting unread data.
> +
> +The circular buffers are only used to send sequences of bytes between domains.
> +It is the responsibility of the layer above to segment these bytes into
> +packets, as described in [xenstore.txt](xenstore.txt).
> +
> +The client and server domains can run concurrently on different cores and
> +different sockets. We must therefore take care to avoid corruption by:
> +
> + 1. using atomic loads and stores when reading and writing the producer
> + and consumer offsets. If an offset were to be updated by a non-atomic
> + store then the other domain may read an invalid offset value.
> + 2. ensuring request and reply data is fully read or written before
> + updating the offsets by issuing a memory barrier.
> +
> +If we wish to read data but find the circular buffer empty, or wish to write
> +data and find the circular buffer full, we wait on a pre-arranged event
> +channel. When the other party next reads or writes data to the ring, it
> +will notify() this event channel and we will wake.
> +
> +Protocol extension for reconnection
> +-----------------------------------
> +
> +In version 0 of the protocol it is not possible to close and reopen the
> +connection. This means that if the ring is corrupted, it can never be used
> +(safely) again. Extending the protocol to allow reconnection would allow
> +us to:
The use of past tense here is a bit odd given that this patch implements
and enables all of this stuff.
> +
> + 1. use the ring in the firmware (hvmloader) and safely reset it for use
> + by the guest
> + 2. re-establish a ring in a 'crash kernel' environment, allowing us to
> + write crash dumps to PV disks or network devices.
> +
> +In version 1 of the protocol the ring is extended with the following
> +fields:
> +
> +Offset Length Description
> +-----------------------------------------------------------------
> +2064 4 Xenstore daemon supported protocol version
> +2068 4 Close request flag
> +
> +In a system supporting only version 0, both these fields are guaranteed
> +to contain 0 by the domain builder.
> +
> +In a system supporting version 1, the xenstore daemon will write "1" into
> +the support protocol version field. The guest xenstore client (eg in
> +hvmloader) can query the version, and if it is set to "1" it can write
> +"1" to the close request flag and call notify(). The supporting xenstore
> +daemon can reset the ring to an empty state and signal completion by
> +clearing the flag and calling notify() again.
I agree with Paul WRT s/calling notify()/kick the evtchn/.
> diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
> index fe72e97..f85832c 100644
> --- a/tools/firmware/hvmloader/xenbus.c
> +++ b/tools/firmware/hvmloader/xenbus.c
> @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* Shared ring with dom0 */
> static evtchn_port_t event; /* Event-channel to dom0 */
> static char payload[XENSTORE_PAYLOAD_MAX + 1]; /* Unmarshalling area */
>
> +static void ring_wait(void)
> +{
> + struct shared_info *shinfo = get_shared_info();
> + struct sched_poll poll;
> +
> + memset(&poll, 0, sizeof(poll));
> + set_xen_guest_handle(poll.ports, &event);
> + poll.nr_ports = 1;
> +
> + while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
> + hypercall_sched_op(SCHEDOP_poll, &poll);
> +}
> +
> /* Connect our xenbus client to the backend.
> * Call once, before any other xenbus actions. */
> void xenbus_setup(void)
> @@ -68,10 +81,15 @@ void xenbus_shutdown(void)
>
> ASSERT(rings != NULL);
>
> - /* We zero out the whole ring -- the backend can handle this, and it's
> - * not going to surprise any frontends since it's equivalent to never
> - * having used the rings. */
> - memset(rings, 0, sizeof *rings);
> + if (rings->server_version > 0) {
hvmloader is part of the Xen release and is therefore entitled to assume
that it is running against the xenstore(s) which shipped with that
release of Xen.
IOW you can omit this check or replace it with an assertion if you
prefer.
(Later: OIC, I thought you'd patched the cxenstored but that was
actually just the ocaml stubs. This is fine then)
> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
> index 29d354d..d5cd776 100644
> --- a/tools/ocaml/libs/xb/xb.ml
> +++ b/tools/ocaml/libs/xb/xb.ml
I'll trust you've got all this ocaml stuff right ;-)
Is there someone else you can CC to get a pair of savvy eyes on it?
> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
> index 8bd1047..27c98cd 100644
> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> @@ -35,19 +35,28 @@
>
> #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
>
> +#define ERROR_UNKNOWN (-1)
> +#define ERROR_CLOSING (-2)
> +
> static int xs_ring_read(struct mmap_interface *interface,
> char *buffer, int len)
> {
> struct xenstore_domain_interface *intf = interface->addr;
> XENSTORE_RING_IDX cons, prod; /* offsets only */
> int to_read;
> + uint32_t closing;
>
> cons = *(volatile uint32*)&intf->req_cons;
> prod = *(volatile uint32*)&intf->req_prod;
> + closing = *(volatile uint32*)&intf->closing;
> +
> + if (closing != 0)
> + return ERROR_CLOSING;
It's not possible to just raise the relevant exception here rather than
propagating this to all the callers?
> +CAMLprim value ml_interface_close(value interface)
> +{
> + CAMLparam1(interface);
> + const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
> + int i;
> +
> + intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
> + /* Ensure the unused space is full of invalid xenstore packets. */
Is filling with ones or zeroes not sufficient?
Also, is this strictly necessary of just for debugging purposes?
Ian.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
2014-07-04 9:59 ` Ian Campbell
@ 2014-07-04 10:19 ` Dave Scott
2014-07-04 11:27 ` Ian Campbell
0 siblings, 1 reply; 29+ messages in thread
From: Dave Scott @ 2014-07-04 10:19 UTC (permalink / raw)
To: Ian Campbell
Cc: Anil Madhavapeddy, Andrew Cooper, Jonathan Ludlam, Yanqiangjun,
Paul Durrant, Liuqiming (John),
xen-devel
On 4 Jul 2014, at 10:59, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2014-07-03 at 16:15 +0100, David Scott wrote:
>
> This all looks broadly sensible to me.
Thanks!
>
>> Currently hvmloader uses the xenstore ring and then tries to
>> reset it back to its initial state. This is not part of the
>> ring protocol and, if xenstored reads the ring while it is
>> happening, xenstored will conclude it is corrupted. A corrupted
>> ring will prevent PV drivers from connecting. This seems to
>> be a rare failure.
>>
>> Furthermore, when a VM crashes it may jump to a 'crash kernel'
>> to create a diagnostic dump. Without the ability to safely
>> reset the ring the PV drivers won't be able to reliably
>> establish connections, to (for example) stream a memory dump to
>> disk.
>>
>> This prototype patch contains a simple extension of the
>> xenstore ring structure, enough to contain version numbers and
>> a 'closing' flag. This memory is currently unused within the
>> 4k page and should be zeroed as part of the domain setup
>> process. The oxenstored server advertises version 1, and
>> hvmloader detects this and sets the 'closing' flag. The server
>> then reinitialises the ring, filling it with obviously invalid
>> data to help debugging (unfortunately blocks of zeroes are
>> valid xenstore packets) and signals hvmloader by the event
>> channel that it's safe to boot the guest OS.
>>
>> Updates since v2 (thanks to Andy Cooper for the review):
>
> Please can you put these updates after the "---" marker. This will cause
> the to be omitted from the final commit by the git am tool.
>
>> * hvmloader: use volatile for read of closing flag
>> * style improvements
>> * remove xenstore version #defines
>>
>> Updates since v1 (thanks to Paul Durrant for the review):
>> * remove unused client version and associated dead code
>> * treat 'closing' as a flag by using "!=0" rather than "==1"
>> * hvmloader: move function up and remove forward decl
>> * document the existing xenstore ring and the extention in misc/
>>
>> Signed-off-by: David Scott <dave.scott@citrix.com>
>> ---
>> docs/misc/xenstore-ring.txt | 79 +++++++++++++++++++++++++++++++++++
>> tools/firmware/hvmloader/xenbus.c | 39 +++++++++--------
>> tools/ocaml/libs/xb/xb.ml | 26 +++++++++++-
>> tools/ocaml/libs/xb/xb.mli | 1 +
>> tools/ocaml/libs/xb/xs_ring.ml | 11 +++++
>> tools/ocaml/libs/xb/xs_ring_stubs.c | 63 +++++++++++++++++++++++++++-
>> xen/include/public/io/xs_wire.h | 5 +++
>> 7 files changed, 203 insertions(+), 21 deletions(-)
>> create mode 100644 docs/misc/xenstore-ring.txt
>>
>> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
>> new file mode 100644
>> index 0000000..df2a09f
>> --- /dev/null
>> +++ b/docs/misc/xenstore-ring.txt
>> @@ -0,0 +1,79 @@
>> +
>> +The domain builder allocates a single page and shares it with the xenstore
>> +daemon. This document describes the ring protocol used to communicate via
>> +this page which is used to transmit and receive
>> +[xenstore protocol packets](xenstore.txt).
>> +
>> +In the original version (we call this "version 0"), the shared page has the
>> +following contents (all offsets and lengths are in octets):
>> +
>> +Offset Length Description
>> +-----------------------------------------------------------------
>> +0 1024 Requests: data sent to the xenstore daemon
>> +1024 1024 Replies: data sent to the domain
>> +2048 4 Request consumer offset
>> +2052 4 Request producer offset
>> +2056 4 Response consumer offset
>> +2060 4 Response producer offset
>> +
>> +When the page is allocated it is guaranteed to be full of zeroes.
>> +
>> +The "Requests" and "Replies" are treated as circular buffers, one for
>> +each direction. Each circular buffer is associated wth a producer and
>> +consumer offset, which are free-running counters starting from 0. A "producer"
>> +offset is the offset in the byte stream of the next byte to be written; a
>> +"consumer" offset is the offset in the byte stream of the next byte to be
>> +read. The byte at offset 'x' in the byte stream will be stored in
>> +offset 'x mod 1024' in the circular buffer. "producer - consumer" gives
>> +the number of bytes still to be read, and "1024 - (producer - consumer)"
>> +therefore gives the amount of space currently available for writing,
>> +where we must avoid overwriting unread data.
>> +
>> +The circular buffers are only used to send sequences of bytes between domains.
>> +It is the responsibility of the layer above to segment these bytes into
>> +packets, as described in [xenstore.txt](xenstore.txt).
>> +
>> +The client and server domains can run concurrently on different cores and
>> +different sockets. We must therefore take care to avoid corruption by:
>> +
>> + 1. using atomic loads and stores when reading and writing the producer
>> + and consumer offsets. If an offset were to be updated by a non-atomic
>> + store then the other domain may read an invalid offset value.
>> + 2. ensuring request and reply data is fully read or written before
>> + updating the offsets by issuing a memory barrier.
>> +
>> +If we wish to read data but find the circular buffer empty, or wish to write
>> +data and find the circular buffer full, we wait on a pre-arranged event
>> +channel. When the other party next reads or writes data to the ring, it
>> +will notify() this event channel and we will wake.
>> +
>> +Protocol extension for reconnection
>> +-----------------------------------
>> +
>> +In version 0 of the protocol it is not possible to close and reopen the
>> +connection. This means that if the ring is corrupted, it can never be used
>> +(safely) again. Extending the protocol to allow reconnection would allow
>> +us to:
>
> The use of past tense here is a bit odd given that this patch implements
> and enables all of this stuff.
>
>> +
>> + 1. use the ring in the firmware (hvmloader) and safely reset it for use
>> + by the guest
>> + 2. re-establish a ring in a 'crash kernel' environment, allowing us to
>> + write crash dumps to PV disks or network devices.
>> +
>> +In version 1 of the protocol the ring is extended with the following
>> +fields:
>> +
>> +Offset Length Description
>> +-----------------------------------------------------------------
>> +2064 4 Xenstore daemon supported protocol version
>> +2068 4 Close request flag
>> +
>> +In a system supporting only version 0, both these fields are guaranteed
>> +to contain 0 by the domain builder.
>> +
>> +In a system supporting version 1, the xenstore daemon will write "1" into
>> +the support protocol version field. The guest xenstore client (eg in
>> +hvmloader) can query the version, and if it is set to "1" it can write
>> +"1" to the close request flag and call notify(). The supporting xenstore
>> +daemon can reset the ring to an empty state and signal completion by
>> +clearing the flag and calling notify() again.
>
> I agree with Paul WRT s/calling notify()/kick the evtchn/.
>
>> diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
>> index fe72e97..f85832c 100644
>> --- a/tools/firmware/hvmloader/xenbus.c
>> +++ b/tools/firmware/hvmloader/xenbus.c
>> @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* Shared ring with dom0 */
>> static evtchn_port_t event; /* Event-channel to dom0 */
>> static char payload[XENSTORE_PAYLOAD_MAX + 1]; /* Unmarshalling area */
>>
>> +static void ring_wait(void)
>> +{
>> + struct shared_info *shinfo = get_shared_info();
>> + struct sched_poll poll;
>> +
>> + memset(&poll, 0, sizeof(poll));
>> + set_xen_guest_handle(poll.ports, &event);
>> + poll.nr_ports = 1;
>> +
>> + while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
>> + hypercall_sched_op(SCHEDOP_poll, &poll);
>> +}
>> +
>> /* Connect our xenbus client to the backend.
>> * Call once, before any other xenbus actions. */
>> void xenbus_setup(void)
>> @@ -68,10 +81,15 @@ void xenbus_shutdown(void)
>>
>> ASSERT(rings != NULL);
>>
>> - /* We zero out the whole ring -- the backend can handle this, and it's
>> - * not going to surprise any frontends since it's equivalent to never
>> - * having used the rings. */
>> - memset(rings, 0, sizeof *rings);
>> + if (rings->server_version > 0) {
>
> hvmloader is part of the Xen release and is therefore entitled to assume
> that it is running against the xenstore(s) which shipped with that
> release of Xen.
>
> IOW you can omit this check or replace it with an assertion if you
> prefer.
>
> (Later: OIC, I thought you'd patched the cxenstored but that was
> actually just the ocaml stubs. This is fine then)
OK
>
>> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
>> index 29d354d..d5cd776 100644
>> --- a/tools/ocaml/libs/xb/xb.ml
>> +++ b/tools/ocaml/libs/xb/xb.ml
>
> I'll trust you've got all this ocaml stuff right ;-)
>
> Is there someone else you can CC to get a pair of savvy eyes on it?
I’ve cc:d Anil and Jon who are always good at this kind of thing.
>
>> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
>> index 8bd1047..27c98cd 100644
>> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
>> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
>> @@ -35,19 +35,28 @@
>>
>> #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
>>
>> +#define ERROR_UNKNOWN (-1)
>> +#define ERROR_CLOSING (-2)
>> +
>> static int xs_ring_read(struct mmap_interface *interface,
>> char *buffer, int len)
>> {
>> struct xenstore_domain_interface *intf = interface->addr;
>> XENSTORE_RING_IDX cons, prod; /* offsets only */
>> int to_read;
>> + uint32_t closing;
>>
>> cons = *(volatile uint32*)&intf->req_cons;
>> prod = *(volatile uint32*)&intf->req_prod;
>> + closing = *(volatile uint32*)&intf->closing;
>> +
>> + if (closing != 0)
>> + return ERROR_CLOSING;
>
> It's not possible to just raise the relevant exception here rather than
> propagating this to all the callers?
Hm, looking at the code in context I don’t know why both read and write have been decomposed into two functions each— I might as well merge them back together and raise the exceptions inline as you suggest.
>
>> +CAMLprim value ml_interface_close(value interface)
>> +{
>> + CAMLparam1(interface);
>> + const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
>> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
>> + int i;
>> +
>> + intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
>> + /* Ensure the unused space is full of invalid xenstore packets. */
>
> Is filling with ones or zeroes not sufficient?
>
> Also, is this strictly necessary of just for debugging purposes?
It’s not strictly necessary so it could be removed.
My preference i to fill the unused space within the ring buffers with obviously invalid packets so that if someone peeks at it at the wrong time, they get obviously bad data and hopefully stop immediately. If both ends are operating correctly then they should never notice. It’s unfortunate that all zeroes looks like a valid sequence of DEBUG packets. Perhaps I could fill it with ‘0xff’ and explicitly declare 0xff as an invalid message type xs_wire.h — that would do it.
What do you think?
Dave
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
2014-07-04 10:19 ` Dave Scott
@ 2014-07-04 11:27 ` Ian Campbell
0 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2014-07-04 11:27 UTC (permalink / raw)
To: Dave Scott
Cc: Anil Madhavapeddy, Andrew Cooper, Jonathan Ludlam, Yanqiangjun,
Paul Durrant, Liuqiming (John),
xen-devel
On Fri, 2014-07-04 at 11:19 +0100, Dave Scott wrote:
> >
> >> +CAMLprim value ml_interface_close(value interface)
> >> +{
> >> + CAMLparam1(interface);
> >> + const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
> >> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
> >> + int i;
> >> +
> >> + intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
> >> + /* Ensure the unused space is full of invalid xenstore packets. */
> >
> > Is filling with ones or zeroes not sufficient?
> >
> > Also, is this strictly necessary of just for debugging purposes?
>
> It’s not strictly necessary so it could be removed.
>
> My preference i to fill the unused space within the ring buffers with
> obviously invalid packets so that if someone peeks at it at the wrong
> time, they get obviously bad data and hopefully stop immediately. If
> both ends are operating correctly then they should never notice. It’s
> unfortunate that all zeroes looks like a valid sequence of DEBUG
> packets. Perhaps I could fill it with ‘0xff’ and explicitly declare
> 0xff as an invalid message type xs_wire.h — that would do it.
>
> What do you think?
One thought I just had was that this doesn't leave the ring in the
stated initial state (which is all zeroes). If something wanted to
resume using the ring it might expect that it was all zeroes?
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal
2014-07-03 15:15 ` [PATCH v3] " David Scott
2014-07-04 8:21 ` Paul Durrant
2014-07-04 9:59 ` Ian Campbell
@ 2014-09-03 16:25 ` David Scott
2014-09-10 13:35 ` Ian Campbell
` (2 more replies)
2 siblings, 3 replies; 29+ messages in thread
From: David Scott @ 2014-09-03 16:25 UTC (permalink / raw)
To: xen-devel
Cc: David Scott, anil, Andrew.Cooper3, Jonathan.Ludlam, yanqiangjun,
Paul.Durrant, john.liuqiming
Hvmloader uses the xenstore ring and then tries to reset it back
to its initial state before booting the guest. Occasionally xenstored
will read the ring while it is being zeroed and conclude it has
been corrupted. This prevents PV drivers from loading in the guest.
This patch updates the xenstore ring protocol definition, enabling
a server to advertise additional features to the guest. One such feature
is defined: the ability to cleanly reset the ring (but not the
higher-level protocol, for which we already have RESET_WATCHES).
This patch implements the ring reconnection features in oxenstored
and hvmloader, fixing the bug.
This patch also defines an 'invalid' xenstore packet type and uses this
to poison the ring over a reconnect. This will make diagnosing this
bug much easier in future.
Signed-off-by: David Scott <dave.scott@citrix.com>
---
Updates since v3
* hvmloader: signal the event channel after requesting a reconnect
(thanks to Zheng Li for diagnosing this)
* switch to using feature flags/bits instead of version numbers
* rename the 'closing' field to 'connection state'
* define an invalid packet type (0xffff, since 0x0 was already taken)
* on ring connection/reset, fill the input and output buffers with
the invalid packet
* ocaml xs_ring_stubs.c: collapse both {read,write} functions into one,
remove #define ERROR_FOO
* doc: write from the guest's point of view, weaken guarantees made by
server (e.g. producer = consumer != 0 is valid after reconnect)
* doc: relate reconnection to the RESET_WATCHES command in the higher
level protocol
* doc: be more specific about who must write what, when
Updates since v2
* hvmloader: use volatile for read of closing flag
* style improvements
* remove xenstore version #defines
Updates since v1
* remove unused client version and associated dead code
* treat 'closing' as a flag by using "!=0" rather than "==1"
* hvmloader: move function up and remove forward decl
* document the existing xenstore ring and the extention in misc/
---
docs/misc/xenstore-ring.txt | 112 +++++++++++++++++++++++++++++++++++
tools/firmware/hvmloader/xenbus.c | 48 +++++++++------
tools/ocaml/libs/xb/xb.ml | 27 ++++++++-
tools/ocaml/libs/xb/xb.mli | 1 +
tools/ocaml/libs/xb/xs_ring.ml | 10 ++++
tools/ocaml/libs/xb/xs_ring_stubs.c | 109 ++++++++++++++++++++++++----------
xen/include/public/io/xs_wire.h | 13 +++-
7 files changed, 269 insertions(+), 51 deletions(-)
create mode 100644 docs/misc/xenstore-ring.txt
diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
new file mode 100644
index 0000000..6d18556
--- /dev/null
+++ b/docs/misc/xenstore-ring.txt
@@ -0,0 +1,112 @@
+The xenstore ring is a datastructure stored within a single 4KiB page
+shared between the xenstore server and the guest. The ring contains
+two queues of bytes -- one in each direction -- and some signalling
+information. The [xenstore protocol](xenstore.txt) is layered on top of
+the byte streams.
+
+The xenstore ring datastructure
+===============================
+
+The following table describes the ring structure where
+ - offsets and lengths are in bytes;
+ - "Input" is used to describe the data sent to the server; and
+ - "Output" is used to describe the data sent to the domain.
+
+Offset Length Description
+-----------------------------------------------------------------
+0 1024 Input data
+1024 1024 Output data
+2048 4 Input consumer offset
+2052 4 Input producer offset
+2056 4 Output consumer offset
+2060 4 Output producer offset
+2064 4 Server feature bitmap
+2068 4 Connection state
+
+The Input data and Output data are circular buffers. Each buffer is
+associated with a pair of free-running offsets labelled "consumer" and
+"producer".
+
+A "producer" offset is the offset in the byte stream of the next byte
+to be written modulo 2^32. A "consumer" offset is the offset in the byte
+stream of the next byte to be read modulo 2^32. Implementations must
+take care to handle wraparound properly when performing arithmetic with
+these values.
+
+The byte at offset 'x' in the byte stream will be stored at offset
+'x modulo 1024' in the circular buffer.
+
+Implementations may only overwrite previously-written data if it has
+been marked as 'consumed' by the relevant consumer pointer.
+
+When the guest domain is created, there is no outstanding Input or Output
+data. However
+
+ - guests must not assume that producer or consumer pointers start
+ at zero; and
+ - guests must not assume that unused bytes in either the Input or
+ Output data buffers has any particular value.
+
+A xenstore ring is always associated with an event channel. Whenever the
+ring structure is updated the event channel must be signalled. The
+guest and server are free to inspect the contents of the ring at any
+time, not only in response to an event channel event. This implies that
+updates must be ordered carefully to ensure consistency.
+
+The xenstore server may decide to advertise some features via the
+"Server feature bitmap". The server can start advertising features
+at any time by setting bits but it will never stop advertising features
+i.e. bits will never be cleared. The guest is not permitted to write to
+the server feature bitmap. The server features are offered to the guest;
+it is up to the guest whether to use them or not. The guest should ignore
+any unknown feature bits.
+
+The following features are defined:
+
+Mask Description
+-----------------------------------------------------------------
+1 Ring reconnection (see the ring reconnection feature below)
+
+The "Connection state" field is used to request a ring close and reconnect.
+The "Connection state" field only contains valid data if the server has
+advertised the ring reconnection feature. If the feature has been advertised
+then the "Connection state" may take the following values:
+
+Value Description
+-----------------------------------------------------------------
+0 Ring is connected
+1 Ring close and reconnect is in progress (see the "ring
+ reconnection feature" described below)
+
+The ring reconnection feature
+=============================
+
+The ring reconnection feature allows the guest to ask the server to
+reset the ring to a valid initial state i.e. where the Input and Output
+queues contain no data. The feature operates at the ring-level only;
+it does not affect the higher-level protocol state machine at all.
+To reset theh higher-level protocol, please read about the 'RESET_WATCHES'
+command in the [xenstore protocol](xenstore.txt) specification.
+
+The ring reconnection feature is only available if the 'Ring reconnection'
+feature bit has been set by the server in the "Server feature bitmap".
+
+Assuming the server has advertised the feature, the guest can initiate
+a reconnection by setting the the Connection state to 1 ("Ring close
+and reconnect is in progress") and signalling the event channel.
+The guest must now ignore all fields except the Connection state and
+wait for it to be set to 0 ("Ring is connected")
+
+The server will guarantee to
+
+ - drop any partially read or written higher-level
+ [xenstore protocol](xenstore.txt) packets it may have;
+ - empty the Input and Output queues in the xenstore ring;
+ - set the Connection state to 0 ("Ring is connected"); and
+ - signal the event channel.
+
+From the point of view of the guest, the connection has been reset on a
+packet boundary.
+
+Note that only the guest may set the Connection state to 1 and only the
+server may set it back to 0.
diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
index 64c2176..f900a1e 100644
--- a/tools/firmware/hvmloader/xenbus.c
+++ b/tools/firmware/hvmloader/xenbus.c
@@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* Shared ring with dom0 */
static evtchn_port_t event; /* Event-channel to dom0 */
static char payload[XENSTORE_PAYLOAD_MAX + 1]; /* Unmarshalling area */
+static void ring_wait(void)
+{
+ struct shared_info *shinfo = get_shared_info();
+ struct sched_poll poll;
+
+ memset(&poll, 0, sizeof(poll));
+ set_xen_guest_handle(poll.ports, &event);
+ poll.nr_ports = 1;
+
+ while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
+ hypercall_sched_op(SCHEDOP_poll, &poll);
+}
+
/* Connect our xenbus client to the backend.
* Call once, before any other xenbus actions. */
void xenbus_setup(void)
@@ -61,14 +74,26 @@ void xenbus_setup(void)
void xenbus_shutdown(void)
{
struct shared_info *shinfo = get_shared_info();
+ evtchn_send_t send;
ASSERT(rings != NULL);
- /* We zero out the whole ring -- the backend can handle this, and it's
- * not going to surprise any frontends since it's equivalent to never
- * having used the rings. */
- memset(rings, 0, sizeof *rings);
-
+ if (rings->server_features & XENSTORE_SERVER_FEATURE_RECONNECTION) {
+ rings->connection = XENSTORE_RECONNECT;
+ send.port = event;
+ hypercall_event_channel_op(EVTCHNOP_send, &send);
+ while (*(volatile uint32_t*)&rings->connection == XENSTORE_RECONNECT)
+ ring_wait ();
+ } else {
+ /* If the backend reads the state while we're erasing it then the
+ * ring state will become corrupted, preventing guest frontends from
+ * connecting. This is rare. To help diagnose the failure, we fill
+ * the ring with XS_INVALID packets. */
+ memset(rings->req, 0xff, XENSTORE_RING_SIZE);
+ memset(rings->rsp, 0xff, XENSTORE_RING_SIZE);
+ rings->req_cons = rings->req_prod = 0;
+ rings->rsp_cons = rings->rsp_prod = 0;
+ }
/* Clear the event-channel state too. */
memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
memset(shinfo->evtchn_pending, 0, sizeof(shinfo->evtchn_pending));
@@ -77,19 +102,6 @@ void xenbus_shutdown(void)
rings = NULL;
}
-static void ring_wait(void)
-{
- struct shared_info *shinfo = get_shared_info();
- struct sched_poll poll;
-
- memset(&poll, 0, sizeof(poll));
- set_xen_guest_handle(poll.ports, &event);
- poll.nr_ports = 1;
-
- while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
- hypercall_sched_op(SCHEDOP_poll, &poll);
-}
-
/* Helper functions: copy data in and out of the ring */
static void ring_write(const char *data, uint32_t len)
{
diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
index 29d354d..6332433 100644
--- a/tools/ocaml/libs/xb/xb.ml
+++ b/tools/ocaml/libs/xb/xb.ml
@@ -84,7 +84,26 @@ let write con s len =
| Fd backfd -> write_fd backfd con s len
| Xenmmap backmmap -> write_mmap backmmap con s len
-let output con =
+(* If a function throws Xs_ring.Reconnect, then clear the ring state
+ and serve the ring again. *)
+let rec handle_reconnect f con =
+ match (try Some (f con) with Xs_ring.Reconnect -> None) with
+ | Some x -> x
+ | None ->
+ begin match con.backend with
+ | Fd _ -> raise Xs_ring.Reconnect (* should never happen, but just in case *)
+ | Xenmmap backend ->
+ Xs_ring.close backend.mmap;
+ backend.eventchn_notify ();
+ (* Clear our old connection state *)
+ Queue.clear con.pkt_in;
+ Queue.clear con.pkt_out;
+ con.partial_in <- init_partial_in ();
+ con.partial_out <- "";
+ handle_reconnect f con
+ end
+
+let output = handle_reconnect (fun con ->
(* get the output string from a string_of(packet) or partial_out *)
let s = if String.length con.partial_out > 0 then
con.partial_out
@@ -101,8 +120,9 @@ let output con =
);
(* after sending one packet, partial is empty *)
con.partial_out = ""
+)
-let input con =
+let input = handle_reconnect (fun con ->
let newpacket = ref false in
let to_read =
match con.partial_in with
@@ -133,6 +153,7 @@ let input con =
HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
);
!newpacket
+)
let newcon backend = {
backend = backend;
@@ -145,6 +166,8 @@ let newcon backend = {
let open_fd fd = newcon (Fd { fd = fd; })
let open_mmap mmap notifyfct =
+ (* Advertise XENSTORE_SERVER_FEATURE_RECONNECTION *)
+ Xs_ring.set_server_features mmap 1;
newcon (Xenmmap {
mmap = mmap;
eventchn_notify = notifyfct;
diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
index 58234ae..d47d869 100644
--- a/tools/ocaml/libs/xb/xb.mli
+++ b/tools/ocaml/libs/xb/xb.mli
@@ -80,6 +80,7 @@ val read : t -> string -> int -> int
val write_fd : backend_fd -> 'a -> string -> int -> int
val write_mmap : backend_mmap -> 'a -> string -> int -> int
val write : t -> string -> int -> int
+val handle_reconnect : (t -> 'a) -> t -> 'a
val output : t -> bool
val input : t -> bool
val newcon : backend -> t
diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
index 9469609..2c4eeab 100644
--- a/tools/ocaml/libs/xb/xs_ring.ml
+++ b/tools/ocaml/libs/xb/xs_ring.ml
@@ -14,5 +14,15 @@
* GNU Lesser General Public License for more details.
*)
+exception Reconnect
+
+let _ =
+ Callback.register_exception "Xs_ring.Reconnect" Reconnect
+
external read: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_read"
external write: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_write"
+
+external set_server_features: Xenmmap.mmap_interface -> int -> unit = "ml_interface_set_server_features" "noalloc"
+external get_server_features: Xenmmap.mmap_interface -> int = "ml_interface_get_server_features" "noalloc"
+
+external close: Xenmmap.mmap_interface -> unit = "ml_interface_close" "noalloc"
diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
index 8bd1047..c728a01 100644
--- a/tools/ocaml/libs/xb/xs_ring_stubs.c
+++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
@@ -35,22 +35,39 @@
#define GET_C_STRUCT(a) ((struct mmap_interface *) a)
-static int xs_ring_read(struct mmap_interface *interface,
- char *buffer, int len)
+CAMLprim value ml_interface_read(value ml_interface,
+ value ml_buffer,
+ value ml_len)
{
+ CAMLparam3(ml_interface, ml_buffer, ml_len);
+ CAMLlocal1(ml_result);
+
+ struct mmap_interface *interface = GET_C_STRUCT(ml_interface);
+ char *buffer = String_val(ml_buffer);
+ int len = Int_val(ml_len);
+ int result;
+
struct xenstore_domain_interface *intf = interface->addr;
XENSTORE_RING_IDX cons, prod; /* offsets only */
int to_read;
+ uint32_t connection;
cons = *(volatile uint32*)&intf->req_cons;
prod = *(volatile uint32*)&intf->req_prod;
+ connection = *(volatile uint32*)&intf->connection;
+
+ if (connection != XENSTORE_CONNECTED)
+ caml_raise_constant(*caml_named_value("Xs_ring.Reconnect"));
+
xen_mb();
if ((prod - cons) > XENSTORE_RING_SIZE)
- return -1;
+ caml_failwith("bad connection");
- if (prod == cons)
- return 0;
+ if (prod == cons) {
+ result = 0;
+ goto exit;
+ }
cons = MASK_XENSTORE_IDX(cons);
prod = MASK_XENSTORE_IDX(prod);
if (prod > cons)
@@ -62,21 +79,41 @@ static int xs_ring_read(struct mmap_interface *interface,
memcpy(buffer, intf->req + cons, len);
xen_mb();
intf->req_cons += len;
- return len;
+ result = len;
+exit:
+ ml_result = Val_int(result);
+ CAMLreturn(ml_result);
}
-static int xs_ring_write(struct mmap_interface *interface,
- char *buffer, int len)
+CAMLprim value ml_interface_write(value ml_interface,
+ value ml_buffer,
+ value ml_len)
{
+ CAMLparam3(ml_interface, ml_buffer, ml_len);
+ CAMLlocal1(ml_result);
+
+ struct mmap_interface *interface = GET_C_STRUCT(ml_interface);
+ char *buffer = String_val(ml_buffer);
+ int len = Int_val(ml_len);
+ int result;
+
struct xenstore_domain_interface *intf = interface->addr;
XENSTORE_RING_IDX cons, prod;
int can_write;
+ uint32_t connection;
cons = *(volatile uint32*)&intf->rsp_cons;
prod = *(volatile uint32*)&intf->rsp_prod;
+ connection = *(volatile uint32*)&intf->connection;
+
+ if (connection != XENSTORE_CONNECTED)
+ caml_raise_constant(*caml_named_value("Xs_ring.Reconnect"));
+
xen_mb();
- if ( (prod - cons) >= XENSTORE_RING_SIZE )
- return 0;
+ if ( (prod - cons) >= XENSTORE_RING_SIZE ) {
+ result = 0;
+ goto exit;
+ }
if (MASK_XENSTORE_IDX(prod) >= MASK_XENSTORE_IDX(cons))
can_write = XENSTORE_RING_SIZE - MASK_XENSTORE_IDX(prod);
else
@@ -86,31 +123,43 @@ static int xs_ring_write(struct mmap_interface *interface,
memcpy(intf->rsp + MASK_XENSTORE_IDX(prod), buffer, len);
xen_mb();
intf->rsp_prod += len;
- return len;
+ result = len;
+exit:
+ ml_result = Val_int(result);
+ CAMLreturn(ml_result);
}
-CAMLprim value ml_interface_read(value interface, value buffer, value len)
+CAMLprim value ml_interface_set_server_features(value interface, value v)
{
- CAMLparam3(interface, buffer, len);
- CAMLlocal1(result);
- int res;
+ CAMLparam2(interface, v);
+ struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
- res = xs_ring_read(GET_C_STRUCT(interface),
- String_val(buffer), Int_val(len));
- if (res == -1)
- caml_failwith("bad connection");
- result = Val_int(res);
- CAMLreturn(result);
+ intf->server_features = Int_val(v);
+
+ CAMLreturn(Val_unit);
+}
+
+CAMLprim value ml_interface_get_server_features(value interface)
+{
+ CAMLparam1(interface);
+ struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+
+ CAMLreturn(Val_int (intf->server_features));
}
-CAMLprim value ml_interface_write(value interface, value buffer, value len)
+CAMLprim value ml_interface_close(value interface)
{
- CAMLparam3(interface, buffer, len);
- CAMLlocal1(result);
- int res;
-
- res = xs_ring_write(GET_C_STRUCT(interface),
- String_val(buffer), Int_val(len));
- result = Val_int(res);
- CAMLreturn(result);
+ CAMLparam1(interface);
+ struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+ int i;
+
+ intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
+ /* Ensure the unused space is full of invalid xenstore packets. */
+ for (i = 0; i < XENSTORE_RING_SIZE; i++) {
+ intf->req[i] = 0xff; /* XS_INVALID = 0xffff */
+ intf->rsp[i] = 0xff;
+ }
+ xen_mb ();
+ intf->connection = XENSTORE_CONNECTED;
+ CAMLreturn(Val_unit);
}
diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
index 585f0c8..0a0cdbc 100644
--- a/xen/include/public/io/xs_wire.h
+++ b/xen/include/public/io/xs_wire.h
@@ -49,7 +49,9 @@ enum xsd_sockmsg_type
XS_RESUME,
XS_SET_TARGET,
XS_RESTRICT,
- XS_RESET_WATCHES
+ XS_RESET_WATCHES,
+
+ XS_INVALID = 0xffff /* Guaranteed to remain an invalid type */
};
#define XS_WRITE_NONE "NONE"
@@ -116,6 +118,8 @@ struct xenstore_domain_interface {
char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
XENSTORE_RING_IDX req_cons, req_prod;
XENSTORE_RING_IDX rsp_cons, rsp_prod;
+ uint32_t server_features; /* Bitmap of features supported by the server */
+ uint32_t connection;
};
/* Violating this is very bad. See docs/misc/xenstore.txt. */
@@ -125,6 +129,13 @@ struct xenstore_domain_interface {
#define XENSTORE_ABS_PATH_MAX 3072
#define XENSTORE_REL_PATH_MAX 2048
+/* The ability to reconnect a ring */
+#define XENSTORE_SERVER_FEATURE_RECONNECTION 1
+
+/* Valid values for the connection field */
+#define XENSTORE_CONNECTED 0 /* the steady-state */
+#define XENSTORE_RECONNECT 1 /* guest has initiated a reconnect */
+
#endif /* _XS_WIRE_H */
/*
--
1.7.10.4
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal
2014-09-03 16:25 ` [PATCH v4] xenstore: " David Scott
@ 2014-09-10 13:35 ` Ian Campbell
2014-09-10 14:33 ` Dave Scott
[not found] ` <DF76B30A-D122-4600-987E-6BBD66CFFF73@citrix.com>
2014-09-24 13:36 ` Jon Ludlam
2 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2014-09-10 13:35 UTC (permalink / raw)
To: David Scott, Ian Jackson
Cc: anil, Andrew.Cooper3, Jonathan.Ludlam, yanqiangjun, Paul.Durrant,
john.liuqiming, xen-devel
On Wed, 2014-09-03 at 17:25 +0100, David Scott wrote:
> Hvmloader uses the xenstore ring and then tries to reset it back
> to its initial state before booting the guest. Occasionally xenstored
> will read the ring while it is being zeroed and conclude it has
> been corrupted. This prevents PV drivers from loading in the guest.
>
> This patch updates the xenstore ring protocol definition, enabling
> a server to advertise additional features to the guest. One such feature
> is defined: the ability to cleanly reset the ring (but not the
> higher-level protocol, for which we already have RESET_WATCHES).
Is RESET_WATCHES complete enough to be considered a higher-level
protocol reset, as opposed to just doing the watch stuff? (maybe there
is no other state to speak of?)
> This patch implements the ring reconnection features in oxenstored
> and hvmloader, fixing the bug.
I suppose it is worth mentioning here that C xenstored is untouched but
that the hvmloader changes have been written to work with an arbitrary
xenstore by using the protocol. (at least I hope it has!)
> This patch also defines an 'invalid' xenstore packet type and uses this
> to poison the ring over a reconnect. This will make diagnosing this
> bug much easier in future.
>
> Signed-off-by: David Scott <dave.scott@citrix.com>
I've made some comments below but I think it might be worth waiting for
Ian Jackson's input, since he had comments last time, before rushing to
make any changes.
>
> ---
>
> Updates since v3
> * hvmloader: signal the event channel after requesting a reconnect
> (thanks to Zheng Li for diagnosing this)
> * switch to using feature flags/bits instead of version numbers
> * rename the 'closing' field to 'connection state'
> * define an invalid packet type (0xffff, since 0x0 was already taken)
> * on ring connection/reset, fill the input and output buffers with
> the invalid packet
> * ocaml xs_ring_stubs.c: collapse both {read,write} functions into one,
> remove #define ERROR_FOO
> * doc: write from the guest's point of view, weaken guarantees made by
> server (e.g. producer = consumer != 0 is valid after reconnect)
> * doc: relate reconnection to the RESET_WATCHES command in the higher
> level protocol
> * doc: be more specific about who must write what, when
>
> Updates since v2
> * hvmloader: use volatile for read of closing flag
> * style improvements
> * remove xenstore version #defines
>
> Updates since v1
> * remove unused client version and associated dead code
> * treat 'closing' as a flag by using "!=0" rather than "==1"
> * hvmloader: move function up and remove forward decl
> * document the existing xenstore ring and the extention in misc/
> ---
> docs/misc/xenstore-ring.txt | 112 +++++++++++++++++++++++++++++++++++
> tools/firmware/hvmloader/xenbus.c | 48 +++++++++------
> tools/ocaml/libs/xb/xb.ml | 27 ++++++++-
> tools/ocaml/libs/xb/xb.mli | 1 +
> tools/ocaml/libs/xb/xs_ring.ml | 10 ++++
> tools/ocaml/libs/xb/xs_ring_stubs.c | 109 ++++++++++++++++++++++++----------
> xen/include/public/io/xs_wire.h | 13 +++-
> 7 files changed, 269 insertions(+), 51 deletions(-)
> create mode 100644 docs/misc/xenstore-ring.txt
>
> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
> new file mode 100644
> index 0000000..6d18556
> --- /dev/null
> +++ b/docs/misc/xenstore-ring.txt
You seem to use markdown like syntax, so please name it .markdown and we
will get some sort of useful (well, ish) HTML out of it at build time.
> @@ -0,0 +1,112 @@
> +The xenstore ring is a datastructure stored within a single 4KiB page
> +shared between the xenstore server and the guest. The ring contains
> +two queues of bytes -- one in each direction -- and some signalling
> +information. The [xenstore protocol](xenstore.txt) is layered on top of
> +the byte streams.
> +
> +The xenstore ring datastructure
> +===============================
> +
> +The following table describes the ring structure where
> + - offsets and lengths are in bytes;
> + - "Input" is used to describe the data sent to the server; and
> + - "Output" is used to describe the data sent to the domain.
> +
> +Offset Length Description
> +-----------------------------------------------------------------
> +0 1024 Input data
> +1024 1024 Output data
> +2048 4 Input consumer offset
> +2052 4 Input producer offset
> +2056 4 Output consumer offset
> +2060 4 Output producer offset
> +2064 4 Server feature bitmap
> +2068 4 Connection state
> +
> +The Input data and Output data are circular buffers. Each buffer is
> +associated with a pair of free-running offsets labelled "consumer" and
> +"producer".
> +
> +A "producer" offset is the offset in the byte stream of the next byte
> +to be written modulo 2^32. A "consumer" offset is the offset in the byte
> +stream of the next byte to be read modulo 2^32. Implementations must
> +take care to handle wraparound properly when performing arithmetic with
> +these values.
> +
> +The byte at offset 'x' in the byte stream will be stored at offset
> +'x modulo 1024' in the circular buffer.
> +
> +Implementations may only overwrite previously-written data if it has
> +been marked as 'consumed' by the relevant consumer pointer.
> +
> +When the guest domain is created, there is no outstanding Input or Output
> +data. However
> +
> + - guests must not assume that producer or consumer pointers start
> + at zero; and
> + - guests must not assume that unused bytes in either the Input or
> + Output data buffers has any particular value.
> +
> +A xenstore ring is always associated with an event channel. Whenever the
> +ring structure is updated the event channel must be signalled. The
> +guest and server are free to inspect the contents of the ring at any
> +time, not only in response to an event channel event. This implies that
> +updates must be ordered carefully to ensure consistency.
> +
> +The xenstore server may decide to advertise some features via the
> +"Server feature bitmap". The server can start advertising features
> +at any time by setting bits but it will never stop advertising features
> +i.e. bits will never be cleared. The guest is not permitted to write to
> +the server feature bitmap. The server features are offered to the guest;
> +it is up to the guest whether to use them or not. The guest should ignore
> +any unknown feature bits.
> +
> +The following features are defined:
> +
> +Mask Description
> +-----------------------------------------------------------------
> +1 Ring reconnection (see the ring reconnection feature below)
> +
> +The "Connection state" field is used to request a ring close and reconnect.
> +The "Connection state" field only contains valid data if the server has
> +advertised the ring reconnection feature. If the feature has been advertised
> +then the "Connection state" may take the following values:
> +
> +Value Description
> +-----------------------------------------------------------------
> +0 Ring is connected
> +1 Ring close and reconnect is in progress (see the "ring
> + reconnection feature" described below)
> +
> +The ring reconnection feature
> +=============================
> +
> +The ring reconnection feature allows the guest to ask the server to
> +reset the ring to a valid initial state i.e. where the Input and Output
> +queues contain no data. The feature operates at the ring-level only;
> +it does not affect the higher-level protocol state machine at all.
> +To reset theh higher-level protocol, please read about the 'RESET_WATCHES'
typo here: theh.
> +command in the [xenstore protocol](xenstore.txt) specification.
> +
> +The ring reconnection feature is only available if the 'Ring reconnection'
> +feature bit has been set by the server in the "Server feature bitmap".
> +
> +Assuming the server has advertised the feature, the guest can initiate
> +a reconnection by setting the the Connection state to 1 ("Ring close
> +and reconnect is in progress") and signalling the event channel.
> +The guest must now ignore all fields except the Connection state and
> +wait for it to be set to 0 ("Ring is connected")
> +
> +The server will guarantee to
> +
> + - drop any partially read or written higher-level
> + [xenstore protocol](xenstore.txt) packets it may have;
> + - empty the Input and Output queues in the xenstore ring;
> + - set the Connection state to 0 ("Ring is connected"); and
> + - signal the event channel.
> +
> +From the point of view of the guest, the connection has been reset on a
> +packet boundary.
> +
> +Note that only the guest may set the Connection state to 1 and only the
> +server may set it back to 0.
> [...]
> - /* We zero out the whole ring -- the backend can handle this, and it's
> - * not going to surprise any frontends since it's equivalent to never
> - * having used the rings. */
> - memset(rings, 0, sizeof *rings);
> -
> + if (rings->server_features & XENSTORE_SERVER_FEATURE_RECONNECTION) {
> + rings->connection = XENSTORE_RECONNECT;
> + send.port = event;
> + hypercall_event_channel_op(EVTCHNOP_send, &send);
> + while (*(volatile uint32_t*)&rings->connection == XENSTORE_RECONNECT)
I don't think you need the volatile here, certainly none of the other
ring accesses seem to use it.
Apart from that the C side of things looks good. I've not looked at the
ocaml side of things, do you have someone you can lean on to do some
review or do I need to blow the cobwebs off that part of my brain?
Ian.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal
2014-09-10 13:35 ` Ian Campbell
@ 2014-09-10 14:33 ` Dave Scott
0 siblings, 0 replies; 29+ messages in thread
From: Dave Scott @ 2014-09-10 14:33 UTC (permalink / raw)
To: Ian Campbell
Cc: Anil Madhavapeddy, Andrew Cooper, Jonathan Ludlam, yanqiangjun,
Paul Durrant, xen-devel, Ian Jackson, john.liuqiming
Hi,
On 10 Sep 2014, at 14:35, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Wed, 2014-09-03 at 17:25 +0100, David Scott wrote:
>> Hvmloader uses the xenstore ring and then tries to reset it back
>> to its initial state before booting the guest. Occasionally xenstored
>> will read the ring while it is being zeroed and conclude it has
>> been corrupted. This prevents PV drivers from loading in the guest.
>>
>> This patch updates the xenstore ring protocol definition, enabling
>> a server to advertise additional features to the guest. One such feature
>> is defined: the ability to cleanly reset the ring (but not the
>> higher-level protocol, for which we already have RESET_WATCHES).
>
> Is RESET_WATCHES complete enough to be considered a higher-level
> protocol reset, as opposed to just doing the watch stuff? (maybe there
> is no other state to speak of?)
I believe the only high-level connection state is watches + outstanding transactions. The xenstore.txt doc says that RESET_WATCHES should reset both:
RESET_WATCHES |
Reset all watches and transactions of the caller.
>
>> This patch implements the ring reconnection features in oxenstored
>> and hvmloader, fixing the bug.
>
> I suppose it is worth mentioning here that C xenstored is untouched but
> that the hvmloader changes have been written to work with an arbitrary
> xenstore by using the protocol. (at least I hope it has!)
That’s right — hvmloader is flexible.
>
>> This patch also defines an 'invalid' xenstore packet type and uses this
>> to poison the ring over a reconnect. This will make diagnosing this
>> bug much easier in future.
>>
>> Signed-off-by: David Scott <dave.scott@citrix.com>
>
> I've made some comments below but I think it might be worth waiting for
> Ian Jackson's input, since he had comments last time, before rushing to
> make any changes.
Sounds sensible to me.
Thanks,
Dave
>
>>
>> ---
>>
>> Updates since v3
>> * hvmloader: signal the event channel after requesting a reconnect
>> (thanks to Zheng Li for diagnosing this)
>> * switch to using feature flags/bits instead of version numbers
>> * rename the 'closing' field to 'connection state'
>> * define an invalid packet type (0xffff, since 0x0 was already taken)
>> * on ring connection/reset, fill the input and output buffers with
>> the invalid packet
>> * ocaml xs_ring_stubs.c: collapse both {read,write} functions into one,
>> remove #define ERROR_FOO
>> * doc: write from the guest's point of view, weaken guarantees made by
>> server (e.g. producer = consumer != 0 is valid after reconnect)
>> * doc: relate reconnection to the RESET_WATCHES command in the higher
>> level protocol
>> * doc: be more specific about who must write what, when
>>
>> Updates since v2
>> * hvmloader: use volatile for read of closing flag
>> * style improvements
>> * remove xenstore version #defines
>>
>> Updates since v1
>> * remove unused client version and associated dead code
>> * treat 'closing' as a flag by using "!=0" rather than "==1"
>> * hvmloader: move function up and remove forward decl
>> * document the existing xenstore ring and the extention in misc/
>> ---
>> docs/misc/xenstore-ring.txt | 112 +++++++++++++++++++++++++++++++++++
>> tools/firmware/hvmloader/xenbus.c | 48 +++++++++------
>> tools/ocaml/libs/xb/xb.ml | 27 ++++++++-
>> tools/ocaml/libs/xb/xb.mli | 1 +
>> tools/ocaml/libs/xb/xs_ring.ml | 10 ++++
>> tools/ocaml/libs/xb/xs_ring_stubs.c | 109 ++++++++++++++++++++++++----------
>> xen/include/public/io/xs_wire.h | 13 +++-
>> 7 files changed, 269 insertions(+), 51 deletions(-)
>> create mode 100644 docs/misc/xenstore-ring.txt
>>
>> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
>> new file mode 100644
>> index 0000000..6d18556
>> --- /dev/null
>> +++ b/docs/misc/xenstore-ring.txt
>
> You seem to use markdown like syntax, so please name it .markdown and we
> will get some sort of useful (well, ish) HTML out of it at build time.
>
>> @@ -0,0 +1,112 @@
>> +The xenstore ring is a datastructure stored within a single 4KiB page
>> +shared between the xenstore server and the guest. The ring contains
>> +two queues of bytes -- one in each direction -- and some signalling
>> +information. The [xenstore protocol](xenstore.txt) is layered on top of
>> +the byte streams.
>> +
>> +The xenstore ring datastructure
>> +===============================
>> +
>> +The following table describes the ring structure where
>> + - offsets and lengths are in bytes;
>> + - "Input" is used to describe the data sent to the server; and
>> + - "Output" is used to describe the data sent to the domain.
>> +
>> +Offset Length Description
>> +-----------------------------------------------------------------
>> +0 1024 Input data
>> +1024 1024 Output data
>> +2048 4 Input consumer offset
>> +2052 4 Input producer offset
>> +2056 4 Output consumer offset
>> +2060 4 Output producer offset
>> +2064 4 Server feature bitmap
>> +2068 4 Connection state
>> +
>> +The Input data and Output data are circular buffers. Each buffer is
>> +associated with a pair of free-running offsets labelled "consumer" and
>> +"producer".
>> +
>> +A "producer" offset is the offset in the byte stream of the next byte
>> +to be written modulo 2^32. A "consumer" offset is the offset in the byte
>> +stream of the next byte to be read modulo 2^32. Implementations must
>> +take care to handle wraparound properly when performing arithmetic with
>> +these values.
>> +
>> +The byte at offset 'x' in the byte stream will be stored at offset
>> +'x modulo 1024' in the circular buffer.
>> +
>> +Implementations may only overwrite previously-written data if it has
>> +been marked as 'consumed' by the relevant consumer pointer.
>> +
>> +When the guest domain is created, there is no outstanding Input or Output
>> +data. However
>> +
>> + - guests must not assume that producer or consumer pointers start
>> + at zero; and
>> + - guests must not assume that unused bytes in either the Input or
>> + Output data buffers has any particular value.
>> +
>> +A xenstore ring is always associated with an event channel. Whenever the
>> +ring structure is updated the event channel must be signalled. The
>> +guest and server are free to inspect the contents of the ring at any
>> +time, not only in response to an event channel event. This implies that
>> +updates must be ordered carefully to ensure consistency.
>> +
>> +The xenstore server may decide to advertise some features via the
>> +"Server feature bitmap". The server can start advertising features
>> +at any time by setting bits but it will never stop advertising features
>> +i.e. bits will never be cleared. The guest is not permitted to write to
>> +the server feature bitmap. The server features are offered to the guest;
>> +it is up to the guest whether to use them or not. The guest should ignore
>> +any unknown feature bits.
>> +
>> +The following features are defined:
>> +
>> +Mask Description
>> +-----------------------------------------------------------------
>> +1 Ring reconnection (see the ring reconnection feature below)
>> +
>> +The "Connection state" field is used to request a ring close and reconnect.
>> +The "Connection state" field only contains valid data if the server has
>> +advertised the ring reconnection feature. If the feature has been advertised
>> +then the "Connection state" may take the following values:
>> +
>> +Value Description
>> +-----------------------------------------------------------------
>> +0 Ring is connected
>> +1 Ring close and reconnect is in progress (see the "ring
>> + reconnection feature" described below)
>> +
>> +The ring reconnection feature
>> +=============================
>> +
>> +The ring reconnection feature allows the guest to ask the server to
>> +reset the ring to a valid initial state i.e. where the Input and Output
>> +queues contain no data. The feature operates at the ring-level only;
>> +it does not affect the higher-level protocol state machine at all.
>> +To reset theh higher-level protocol, please read about the 'RESET_WATCHES'
>
> typo here: theh.
>
>> +command in the [xenstore protocol](xenstore.txt) specification.
>> +
>> +The ring reconnection feature is only available if the 'Ring reconnection'
>> +feature bit has been set by the server in the "Server feature bitmap".
>> +
>> +Assuming the server has advertised the feature, the guest can initiate
>> +a reconnection by setting the the Connection state to 1 ("Ring close
>> +and reconnect is in progress") and signalling the event channel.
>> +The guest must now ignore all fields except the Connection state and
>> +wait for it to be set to 0 ("Ring is connected")
>> +
>> +The server will guarantee to
>> +
>> + - drop any partially read or written higher-level
>> + [xenstore protocol](xenstore.txt) packets it may have;
>> + - empty the Input and Output queues in the xenstore ring;
>> + - set the Connection state to 0 ("Ring is connected"); and
>> + - signal the event channel.
>> +
>> +From the point of view of the guest, the connection has been reset on a
>> +packet boundary.
>> +
>> +Note that only the guest may set the Connection state to 1 and only the
>> +server may set it back to 0.
>
>> [...]
>> - /* We zero out the whole ring -- the backend can handle this, and it's
>> - * not going to surprise any frontends since it's equivalent to never
>> - * having used the rings. */
>> - memset(rings, 0, sizeof *rings);
>> -
>> + if (rings->server_features & XENSTORE_SERVER_FEATURE_RECONNECTION) {
>> + rings->connection = XENSTORE_RECONNECT;
>> + send.port = event;
>> + hypercall_event_channel_op(EVTCHNOP_send, &send);
>> + while (*(volatile uint32_t*)&rings->connection == XENSTORE_RECONNECT)
>
> I don't think you need the volatile here, certainly none of the other
> ring accesses seem to use it.
>
> Apart from that the C side of things looks good. I've not looked at the
> ocaml side of things, do you have someone you can lean on to do some
> review or do I need to blow the cobwebs off that part of my brain?
>
> Ian.
^ permalink raw reply [flat|nested] 29+ messages in thread
[parent not found: <DF76B30A-D122-4600-987E-6BBD66CFFF73@citrix.com>]
* Re: [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal
[not found] ` <DF76B30A-D122-4600-987E-6BBD66CFFF73@citrix.com>
@ 2014-09-22 16:38 ` Ian Jackson
2014-09-24 9:06 ` Dave Scott
0 siblings, 1 reply; 29+ messages in thread
From: Ian Jackson @ 2014-09-22 16:38 UTC (permalink / raw)
To: Dave Scott; +Cc: xen-devel, Ian Campbell
On 3 Sep 2014, at 17:25, David Scott <dave.scott@citrix.com> wrote:
> Hvmloader uses the xenstore ring and then tries to reset it back
> to its initial state before booting the guest. Occasionally xenstored
> will read the ring while it is being zeroed and conclude it has
> been corrupted. This prevents PV drivers from loading in the guest.
This protocol doc is almost perfect - see below. I see that Ian C has
reviewed the C implementation. Have you found anyone to provide a 2nd
opinion on the ocaml ?
> --- /dev/null
> +++ b/docs/misc/xenstore-ring.txt
> +The xenstore server may decide to advertise some features via the
> +"Server feature bitmap". The server can start advertising features
> +at any time by setting bits but it will never stop advertising features
> +i.e. bits will never be cleared. The guest is not permitted to write to
> +the server feature bitmap. The server features are offered to the guest;
> +it is up to the guest whether to use them or not. The guest should ignore
> +any unknown feature bits.
The rule that the server may start advertising features at any time is
reasonable but it needs to be accompanied by a promise, for any
specific feature, of the latest time at which the server will normally
advertise it. Otherwise the client won't know when to check it and
whether it can sensibly cache the result.
> +Mask Description
> +-----------------------------------------------------------------
> +1 Ring reconnection (see the ring reconnection feature below)
> +
> +The "Connection state" field is used to request a ring close and reconnect.
> +The "Connection state" field only contains valid data if the server has
> +advertised the ring reconnection feature. If the feature has been advertised
> +then the "Connection state" may take the following values:
For example, it would be useful to say here that this feature is
normally advertised from the start of the guest's operation.
> +The ring reconnection feature
> +=============================
> +
> +The ring reconnection feature allows the guest to ask the server to
> +reset the ring to a valid initial state i.e. where the Input and Output
> +queues contain no data. The feature operates at the ring-level only;
> +it does not affect the higher-level protocol state machine at all.
> +To reset theh higher-level protocol, please read about the 'RESET_WATCHES'
> +command in the [xenstore protocol](xenstore.txt) specification.
> +
> +The ring reconnection feature is only available if the 'Ring reconnection'
> +feature bit has been set by the server in the "Server feature bitmap".
> +
> +Assuming the server has advertised the feature, the guest can initiate
> +a reconnection by setting the the Connection state to 1 ("Ring close
> +and reconnect is in progress") and signalling the event channel.
> +The guest must now ignore all fields except the Connection state and
> +wait for it to be set to 0 ("Ring is connected")
> +
> +The server will guarantee to
> +
> + - drop any partially read or written higher-level
> + [xenstore protocol](xenstore.txt) packets it may have;
> + - empty the Input and Output queues in the xenstore ring;
> + - set the Connection state to 0 ("Ring is connected"); and
> + - signal the event channel.
I think the server needs to guarantee that there are no outstanding
commands, somehow. As you write this, the server might have a command
being processed which is no longer in the Input queue but whose
response has not yet arrived in the Output queue.
You might say that the client can use the response to RESET_WATCHES to
distinguish replies to old commands (before the RESET_WATCHES reply)
from replies to new ones. But (a) the server does not guarantee to
process messages in order and (b) the client might get unlucky and use
the same request id for its RESET_WATCHES as one of the `old'
commands.
I assume (without checking) that there is a suitable guarantee which
could be written down and which is actually implemented by both
existing xenstored implementations...
Thanks,
Ian.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal
2014-09-22 16:38 ` Ian Jackson
@ 2014-09-24 9:06 ` Dave Scott
0 siblings, 0 replies; 29+ messages in thread
From: Dave Scott @ 2014-09-24 9:06 UTC (permalink / raw)
To: Ian Jackson; +Cc: xen-devel, Jonathan Ludlam, Ian Campbell
On 22 Sep 2014, at 17:38, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> On 3 Sep 2014, at 17:25, David Scott <dave.scott@citrix.com> wrote:
>> Hvmloader uses the xenstore ring and then tries to reset it back
>> to its initial state before booting the guest. Occasionally xenstored
>> will read the ring while it is being zeroed and conclude it has
>> been corrupted. This prevents PV drivers from loading in the guest.
>
> This protocol doc is almost perfect - see below. I see that Ian C has
> reviewed the C implementation. Have you found anyone to provide a 2nd
> opinion on the ocaml ?
I’ve cc:d Jon Ludlam. Jon: do you have time to review OCaml part of the patch?
>
>> --- /dev/null
>> +++ b/docs/misc/xenstore-ring.txt
>> +The xenstore server may decide to advertise some features via the
>> +"Server feature bitmap". The server can start advertising features
>> +at any time by setting bits but it will never stop advertising features
>> +i.e. bits will never be cleared. The guest is not permitted to write to
>> +the server feature bitmap. The server features are offered to the guest;
>> +it is up to the guest whether to use them or not. The guest should ignore
>> +any unknown feature bits.
>
> The rule that the server may start advertising features at any time is
> reasonable but it needs to be accompanied by a promise, for any
> specific feature, of the latest time at which the server will normally
> advertise it. Otherwise the client won't know when to check it and
> whether it can sensibly cache the result.
>
>> +Mask Description
>> +-----------------------------------------------------------------
>> +1 Ring reconnection (see the ring reconnection feature below)
>> +
>> +The "Connection state" field is used to request a ring close and reconnect.
>> +The "Connection state" field only contains valid data if the server has
>> +advertised the ring reconnection feature. If the feature has been advertised
>> +then the "Connection state" may take the following values:
>
> For example, it would be useful to say here that this feature is
> normally advertised from the start of the guest's operation.
OK.
>
>> +The ring reconnection feature
>> +=============================
>> +
>> +The ring reconnection feature allows the guest to ask the server to
>> +reset the ring to a valid initial state i.e. where the Input and Output
>> +queues contain no data. The feature operates at the ring-level only;
>> +it does not affect the higher-level protocol state machine at all.
>> +To reset theh higher-level protocol, please read about the 'RESET_WATCHES'
>> +command in the [xenstore protocol](xenstore.txt) specification.
>> +
>> +The ring reconnection feature is only available if the 'Ring reconnection'
>> +feature bit has been set by the server in the "Server feature bitmap".
>> +
>> +Assuming the server has advertised the feature, the guest can initiate
>> +a reconnection by setting the the Connection state to 1 ("Ring close
>> +and reconnect is in progress") and signalling the event channel.
>> +The guest must now ignore all fields except the Connection state and
>> +wait for it to be set to 0 ("Ring is connected")
>> +
>> +The server will guarantee to
>> +
>> + - drop any partially read or written higher-level
>> + [xenstore protocol](xenstore.txt) packets it may have;
>> + - empty the Input and Output queues in the xenstore ring;
>> + - set the Connection state to 0 ("Ring is connected"); and
>> + - signal the event channel.
>
> I think the server needs to guarantee that there are no outstanding
> commands, somehow. As you write this, the server might have a command
> being processed which is no longer in the Input queue but whose
> response has not yet arrived in the Output queue.
>
> You might say that the client can use the response to RESET_WATCHES to
> distinguish replies to old commands (before the RESET_WATCHES reply)
> from replies to new ones. But (a) the server does not guarantee to
> process messages in order and (b) the client might get unlucky and use
> the same request id for its RESET_WATCHES as one of the `old'
> commands.
I see what you mean. For a ‘clean’ reconnect like hvmloader we can be
sure that there are no outstanding requests because hvmloader has already
received all the replies. For ‘unclean’ reconnects like a crash-kernel
start there is definitely a problem.
Since the request ids are local to the connection, and the connection has
just closed and re-opened, I think it makes sense for the server to drop
all old request ids. The client would then be able to safely use any
request id for the RESET_WATCHES without fearing a old reply coming back.
It would be even nicer from a client perspective (although I’m not sure
how this would affect the diff) if the ring connection re-open performed
a RESET_WATCHES automatically, since then it would act the same as closing
and re-opening your Unix domain socket.
Cheers,
Dave
> I assume (without checking) that there is a suitable guarantee which
> could be written down and which is actually implemented by both
> existing xenstored implementations...
>
> Thanks,
> Ian.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal
2014-09-03 16:25 ` [PATCH v4] xenstore: " David Scott
2014-09-10 13:35 ` Ian Campbell
[not found] ` <DF76B30A-D122-4600-987E-6BBD66CFFF73@citrix.com>
@ 2014-09-24 13:36 ` Jon Ludlam
2 siblings, 0 replies; 29+ messages in thread
From: Jon Ludlam @ 2014-09-24 13:36 UTC (permalink / raw)
To: David Scott, xen-devel
Cc: anil, Andrew.Cooper3, Jonathan.Ludlam, yanqiangjun, Paul.Durrant,
john.liuqiming
On 03/09/14 17:25, David Scott wrote:
> Hvmloader uses the xenstore ring and then tries to reset it back
> to its initial state before booting the guest. Occasionally xenstored
> will read the ring while it is being zeroed and conclude it has
> been corrupted. This prevents PV drivers from loading in the guest.
>
> This patch updates the xenstore ring protocol definition, enabling
> a server to advertise additional features to the guest. One such feature
> is defined: the ability to cleanly reset the ring (but not the
> higher-level protocol, for which we already have RESET_WATCHES).
>
> This patch implements the ring reconnection features in oxenstored
> and hvmloader, fixing the bug.
>
> This patch also defines an 'invalid' xenstore packet type and uses this
> to poison the ring over a reconnect. This will make diagnosing this
> bug much easier in future.
>
> Signed-off-by: David Scott <dave.scott@citrix.com>
>
> ---
>
> Updates since v3
> * hvmloader: signal the event channel after requesting a reconnect
> (thanks to Zheng Li for diagnosing this)
> * switch to using feature flags/bits instead of version numbers
> * rename the 'closing' field to 'connection state'
> * define an invalid packet type (0xffff, since 0x0 was already taken)
> * on ring connection/reset, fill the input and output buffers with
> the invalid packet
> * ocaml xs_ring_stubs.c: collapse both {read,write} functions into one,
> remove #define ERROR_FOO
> * doc: write from the guest's point of view, weaken guarantees made by
> server (e.g. producer = consumer != 0 is valid after reconnect)
> * doc: relate reconnection to the RESET_WATCHES command in the higher
> level protocol
> * doc: be more specific about who must write what, when
>
> Updates since v2
> * hvmloader: use volatile for read of closing flag
> * style improvements
> * remove xenstore version #defines
>
> Updates since v1
> * remove unused client version and associated dead code
> * treat 'closing' as a flag by using "!=0" rather than "==1"
> * hvmloader: move function up and remove forward decl
> * document the existing xenstore ring and the extention in misc/
> ---
> docs/misc/xenstore-ring.txt | 112 +++++++++++++++++++++++++++++++++++
> tools/firmware/hvmloader/xenbus.c | 48 +++++++++------
> tools/ocaml/libs/xb/xb.ml | 27 ++++++++-
> tools/ocaml/libs/xb/xb.mli | 1 +
> tools/ocaml/libs/xb/xs_ring.ml | 10 ++++
> tools/ocaml/libs/xb/xs_ring_stubs.c | 109 ++++++++++++++++++++++++----------
> xen/include/public/io/xs_wire.h | 13 +++-
> 7 files changed, 269 insertions(+), 51 deletions(-)
> create mode 100644 docs/misc/xenstore-ring.txt
>
> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
> new file mode 100644
> index 0000000..6d18556
> --- /dev/null
> +++ b/docs/misc/xenstore-ring.txt
> @@ -0,0 +1,112 @@
> +The xenstore ring is a datastructure stored within a single 4KiB page
> +shared between the xenstore server and the guest. The ring contains
> +two queues of bytes -- one in each direction -- and some signalling
> +information. The [xenstore protocol](xenstore.txt) is layered on top of
> +the byte streams.
> +
> +The xenstore ring datastructure
> +===============================
> +
> +The following table describes the ring structure where
> + - offsets and lengths are in bytes;
> + - "Input" is used to describe the data sent to the server; and
> + - "Output" is used to describe the data sent to the domain.
> +
> +Offset Length Description
> +-----------------------------------------------------------------
> +0 1024 Input data
> +1024 1024 Output data
> +2048 4 Input consumer offset
> +2052 4 Input producer offset
> +2056 4 Output consumer offset
> +2060 4 Output producer offset
> +2064 4 Server feature bitmap
> +2068 4 Connection state
> +
> +The Input data and Output data are circular buffers. Each buffer is
> +associated with a pair of free-running offsets labelled "consumer" and
> +"producer".
> +
> +A "producer" offset is the offset in the byte stream of the next byte
> +to be written modulo 2^32. A "consumer" offset is the offset in the byte
> +stream of the next byte to be read modulo 2^32. Implementations must
> +take care to handle wraparound properly when performing arithmetic with
> +these values.
> +
> +The byte at offset 'x' in the byte stream will be stored at offset
> +'x modulo 1024' in the circular buffer.
> +
> +Implementations may only overwrite previously-written data if it has
> +been marked as 'consumed' by the relevant consumer pointer.
> +
> +When the guest domain is created, there is no outstanding Input or Output
> +data. However
> +
> + - guests must not assume that producer or consumer pointers start
> + at zero; and
> + - guests must not assume that unused bytes in either the Input or
> + Output data buffers has any particular value.
> +
> +A xenstore ring is always associated with an event channel. Whenever the
> +ring structure is updated the event channel must be signalled. The
> +guest and server are free to inspect the contents of the ring at any
> +time, not only in response to an event channel event. This implies that
> +updates must be ordered carefully to ensure consistency.
> +
> +The xenstore server may decide to advertise some features via the
> +"Server feature bitmap". The server can start advertising features
> +at any time by setting bits but it will never stop advertising features
> +i.e. bits will never be cleared. The guest is not permitted to write to
> +the server feature bitmap. The server features are offered to the guest;
> +it is up to the guest whether to use them or not. The guest should ignore
> +any unknown feature bits.
> +
> +The following features are defined:
> +
> +Mask Description
> +-----------------------------------------------------------------
> +1 Ring reconnection (see the ring reconnection feature below)
> +
> +The "Connection state" field is used to request a ring close and reconnect.
> +The "Connection state" field only contains valid data if the server has
> +advertised the ring reconnection feature. If the feature has been advertised
> +then the "Connection state" may take the following values:
> +
> +Value Description
> +-----------------------------------------------------------------
> +0 Ring is connected
> +1 Ring close and reconnect is in progress (see the "ring
> + reconnection feature" described below)
> +
> +The ring reconnection feature
> +=============================
> +
> +The ring reconnection feature allows the guest to ask the server to
> +reset the ring to a valid initial state i.e. where the Input and Output
> +queues contain no data. The feature operates at the ring-level only;
> +it does not affect the higher-level protocol state machine at all.
> +To reset theh higher-level protocol, please read about the 'RESET_WATCHES'
> +command in the [xenstore protocol](xenstore.txt) specification.
> +
> +The ring reconnection feature is only available if the 'Ring reconnection'
> +feature bit has been set by the server in the "Server feature bitmap".
> +
> +Assuming the server has advertised the feature, the guest can initiate
> +a reconnection by setting the the Connection state to 1 ("Ring close
> +and reconnect is in progress") and signalling the event channel.
> +The guest must now ignore all fields except the Connection state and
> +wait for it to be set to 0 ("Ring is connected")
> +
> +The server will guarantee to
> +
> + - drop any partially read or written higher-level
> + [xenstore protocol](xenstore.txt) packets it may have;
> + - empty the Input and Output queues in the xenstore ring;
> + - set the Connection state to 0 ("Ring is connected"); and
> + - signal the event channel.
> +
> +From the point of view of the guest, the connection has been reset on a
> +packet boundary.
> +
> +Note that only the guest may set the Connection state to 1 and only the
> +server may set it back to 0.
> diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
> index 64c2176..f900a1e 100644
> --- a/tools/firmware/hvmloader/xenbus.c
> +++ b/tools/firmware/hvmloader/xenbus.c
> @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* Shared ring with dom0 */
> static evtchn_port_t event; /* Event-channel to dom0 */
> static char payload[XENSTORE_PAYLOAD_MAX + 1]; /* Unmarshalling area */
>
> +static void ring_wait(void)
> +{
> + struct shared_info *shinfo = get_shared_info();
> + struct sched_poll poll;
> +
> + memset(&poll, 0, sizeof(poll));
> + set_xen_guest_handle(poll.ports, &event);
> + poll.nr_ports = 1;
> +
> + while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
> + hypercall_sched_op(SCHEDOP_poll, &poll);
> +}
> +
> /* Connect our xenbus client to the backend.
> * Call once, before any other xenbus actions. */
> void xenbus_setup(void)
> @@ -61,14 +74,26 @@ void xenbus_setup(void)
> void xenbus_shutdown(void)
> {
> struct shared_info *shinfo = get_shared_info();
> + evtchn_send_t send;
>
> ASSERT(rings != NULL);
>
> - /* We zero out the whole ring -- the backend can handle this, and it's
> - * not going to surprise any frontends since it's equivalent to never
> - * having used the rings. */
> - memset(rings, 0, sizeof *rings);
> -
> + if (rings->server_features & XENSTORE_SERVER_FEATURE_RECONNECTION) {
> + rings->connection = XENSTORE_RECONNECT;
> + send.port = event;
> + hypercall_event_channel_op(EVTCHNOP_send, &send);
> + while (*(volatile uint32_t*)&rings->connection == XENSTORE_RECONNECT)
> + ring_wait ();
> + } else {
> + /* If the backend reads the state while we're erasing it then the
> + * ring state will become corrupted, preventing guest frontends from
> + * connecting. This is rare. To help diagnose the failure, we fill
> + * the ring with XS_INVALID packets. */
> + memset(rings->req, 0xff, XENSTORE_RING_SIZE);
> + memset(rings->rsp, 0xff, XENSTORE_RING_SIZE);
> + rings->req_cons = rings->req_prod = 0;
> + rings->rsp_cons = rings->rsp_prod = 0;
> + }
> /* Clear the event-channel state too. */
> memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
> memset(shinfo->evtchn_pending, 0, sizeof(shinfo->evtchn_pending));
> @@ -77,19 +102,6 @@ void xenbus_shutdown(void)
> rings = NULL;
> }
>
> -static void ring_wait(void)
> -{
> - struct shared_info *shinfo = get_shared_info();
> - struct sched_poll poll;
> -
> - memset(&poll, 0, sizeof(poll));
> - set_xen_guest_handle(poll.ports, &event);
> - poll.nr_ports = 1;
> -
> - while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
> - hypercall_sched_op(SCHEDOP_poll, &poll);
> -}
> -
> /* Helper functions: copy data in and out of the ring */
> static void ring_write(const char *data, uint32_t len)
> {
> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
> index 29d354d..6332433 100644
> --- a/tools/ocaml/libs/xb/xb.ml
> +++ b/tools/ocaml/libs/xb/xb.ml
> @@ -84,7 +84,26 @@ let write con s len =
> | Fd backfd -> write_fd backfd con s len
> | Xenmmap backmmap -> write_mmap backmmap con s len
>
> -let output con =
> +(* If a function throws Xs_ring.Reconnect, then clear the ring state
> + and serve the ring again. *)
> +let rec handle_reconnect f con =
> + match (try Some (f con) with Xs_ring.Reconnect -> None) with
> + | Some x -> x
> + | None ->
> + begin match con.backend with
> + | Fd _ -> raise Xs_ring.Reconnect (* should never happen, but just in case *)
> + | Xenmmap backend ->
> + Xs_ring.close backend.mmap;
> + backend.eventchn_notify ();
> + (* Clear our old connection state *)
> + Queue.clear con.pkt_in;
> + Queue.clear con.pkt_out;
> + con.partial_in <- init_partial_in ();
> + con.partial_out <- "";
> + handle_reconnect f con
> + end
> +
> +let output = handle_reconnect (fun con ->
> (* get the output string from a string_of(packet) or partial_out *)
> let s = if String.length con.partial_out > 0 then
> con.partial_out
> @@ -101,8 +120,9 @@ let output con =
> );
> (* after sending one packet, partial is empty *)
> con.partial_out = ""
> +)
>
> -let input con =
> +let input = handle_reconnect (fun con ->
> let newpacket = ref false in
> let to_read =
> match con.partial_in with
> @@ -133,6 +153,7 @@ let input con =
> HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
> );
> !newpacket
> +)
>
> let newcon backend = {
> backend = backend;
> @@ -145,6 +166,8 @@ let newcon backend = {
> let open_fd fd = newcon (Fd { fd = fd; })
>
> let open_mmap mmap notifyfct =
> + (* Advertise XENSTORE_SERVER_FEATURE_RECONNECTION *)
> + Xs_ring.set_server_features mmap 1;
Minor nit here: the hard-coded 1 looks bad.
> newcon (Xenmmap {
> mmap = mmap;
> eventchn_notify = notifyfct;
> diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
> index 58234ae..d47d869 100644
> --- a/tools/ocaml/libs/xb/xb.mli
> +++ b/tools/ocaml/libs/xb/xb.mli
> @@ -80,6 +80,7 @@ val read : t -> string -> int -> int
> val write_fd : backend_fd -> 'a -> string -> int -> int
> val write_mmap : backend_mmap -> 'a -> string -> int -> int
> val write : t -> string -> int -> int
> +val handle_reconnect : (t -> 'a) -> t -> 'a
> val output : t -> bool
> val input : t -> bool
> val newcon : backend -> t
> diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
> index 9469609..2c4eeab 100644
> --- a/tools/ocaml/libs/xb/xs_ring.ml
> +++ b/tools/ocaml/libs/xb/xs_ring.ml
> @@ -14,5 +14,15 @@
> * GNU Lesser General Public License for more details.
> *)
>
> +exception Reconnect
> +
> +let _ =
> + Callback.register_exception "Xs_ring.Reconnect" Reconnect
> +
> external read: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_read"
> external write: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_write"
> +
> +external set_server_features: Xenmmap.mmap_interface -> int -> unit = "ml_interface_set_server_features" "noalloc"
> +external get_server_features: Xenmmap.mmap_interface -> int = "ml_interface_get_server_features" "noalloc"
> +
> +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close" "noalloc"
> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
> index 8bd1047..c728a01 100644
> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> @@ -35,22 +35,39 @@
>
> #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
>
> -static int xs_ring_read(struct mmap_interface *interface,
> - char *buffer, int len)
> +CAMLprim value ml_interface_read(value ml_interface,
> + value ml_buffer,
> + value ml_len)
> {
> + CAMLparam3(ml_interface, ml_buffer, ml_len);
> + CAMLlocal1(ml_result);
> +
> + struct mmap_interface *interface = GET_C_STRUCT(ml_interface);
> + char *buffer = String_val(ml_buffer);
> + int len = Int_val(ml_len);
> + int result;
> +
> struct xenstore_domain_interface *intf = interface->addr;
> XENSTORE_RING_IDX cons, prod; /* offsets only */
> int to_read;
> + uint32_t connection;
>
> cons = *(volatile uint32*)&intf->req_cons;
> prod = *(volatile uint32*)&intf->req_prod;
> + connection = *(volatile uint32*)&intf->connection;
> +
> + if (connection != XENSTORE_CONNECTED)
> + caml_raise_constant(*caml_named_value("Xs_ring.Reconnect"));
> +
> xen_mb();
>
> if ((prod - cons) > XENSTORE_RING_SIZE)
> - return -1;
> + caml_failwith("bad connection");
>
> - if (prod == cons)
> - return 0;
> + if (prod == cons) {
> + result = 0;
> + goto exit;
> + }
> cons = MASK_XENSTORE_IDX(cons);
> prod = MASK_XENSTORE_IDX(prod);
> if (prod > cons)
> @@ -62,21 +79,41 @@ static int xs_ring_read(struct mmap_interface *interface,
> memcpy(buffer, intf->req + cons, len);
> xen_mb();
> intf->req_cons += len;
> - return len;
> + result = len;
> +exit:
> + ml_result = Val_int(result);
> + CAMLreturn(ml_result);
> }
>
> -static int xs_ring_write(struct mmap_interface *interface,
> - char *buffer, int len)
> +CAMLprim value ml_interface_write(value ml_interface,
> + value ml_buffer,
> + value ml_len)
> {
> + CAMLparam3(ml_interface, ml_buffer, ml_len);
> + CAMLlocal1(ml_result);
> +
> + struct mmap_interface *interface = GET_C_STRUCT(ml_interface);
> + char *buffer = String_val(ml_buffer);
> + int len = Int_val(ml_len);
> + int result;
> +
> struct xenstore_domain_interface *intf = interface->addr;
> XENSTORE_RING_IDX cons, prod;
> int can_write;
> + uint32_t connection;
>
> cons = *(volatile uint32*)&intf->rsp_cons;
> prod = *(volatile uint32*)&intf->rsp_prod;
> + connection = *(volatile uint32*)&intf->connection;
> +
> + if (connection != XENSTORE_CONNECTED)
> + caml_raise_constant(*caml_named_value("Xs_ring.Reconnect"));
> +
> xen_mb();
> - if ( (prod - cons) >= XENSTORE_RING_SIZE )
> - return 0;
> + if ( (prod - cons) >= XENSTORE_RING_SIZE ) {
> + result = 0;
> + goto exit;
> + }
> if (MASK_XENSTORE_IDX(prod) >= MASK_XENSTORE_IDX(cons))
> can_write = XENSTORE_RING_SIZE - MASK_XENSTORE_IDX(prod);
> else
> @@ -86,31 +123,43 @@ static int xs_ring_write(struct mmap_interface *interface,
> memcpy(intf->rsp + MASK_XENSTORE_IDX(prod), buffer, len);
> xen_mb();
> intf->rsp_prod += len;
> - return len;
> + result = len;
> +exit:
> + ml_result = Val_int(result);
> + CAMLreturn(ml_result);
> }
>
> -CAMLprim value ml_interface_read(value interface, value buffer, value len)
> +CAMLprim value ml_interface_set_server_features(value interface, value v)
> {
> - CAMLparam3(interface, buffer, len);
> - CAMLlocal1(result);
> - int res;
> + CAMLparam2(interface, v);
> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
>
> - res = xs_ring_read(GET_C_STRUCT(interface),
> - String_val(buffer), Int_val(len));
> - if (res == -1)
> - caml_failwith("bad connection");
> - result = Val_int(res);
> - CAMLreturn(result);
> + intf->server_features = Int_val(v);
> +
> + CAMLreturn(Val_unit);
> +}
> +
> +CAMLprim value ml_interface_get_server_features(value interface)
> +{
> + CAMLparam1(interface);
> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
> +
> + CAMLreturn(Val_int (intf->server_features));
> }
>
> -CAMLprim value ml_interface_write(value interface, value buffer, value len)
> +CAMLprim value ml_interface_close(value interface)
> {
> - CAMLparam3(interface, buffer, len);
> - CAMLlocal1(result);
> - int res;
> -
> - res = xs_ring_write(GET_C_STRUCT(interface),
> - String_val(buffer), Int_val(len));
> - result = Val_int(res);
> - CAMLreturn(result);
> + CAMLparam1(interface);
> + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
> + int i;
> +
> + intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
> + /* Ensure the unused space is full of invalid xenstore packets. */
> + for (i = 0; i < XENSTORE_RING_SIZE; i++) {
> + intf->req[i] = 0xff; /* XS_INVALID = 0xffff */
> + intf->rsp[i] = 0xff;
> + }
> + xen_mb ();
> + intf->connection = XENSTORE_CONNECTED;
> + CAMLreturn(Val_unit);
> }
> diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
> index 585f0c8..0a0cdbc 100644
> --- a/xen/include/public/io/xs_wire.h
> +++ b/xen/include/public/io/xs_wire.h
> @@ -49,7 +49,9 @@ enum xsd_sockmsg_type
> XS_RESUME,
> XS_SET_TARGET,
> XS_RESTRICT,
> - XS_RESET_WATCHES
> + XS_RESET_WATCHES,
> +
> + XS_INVALID = 0xffff /* Guaranteed to remain an invalid type */
> };
>
> #define XS_WRITE_NONE "NONE"
> @@ -116,6 +118,8 @@ struct xenstore_domain_interface {
> char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
> XENSTORE_RING_IDX req_cons, req_prod;
> XENSTORE_RING_IDX rsp_cons, rsp_prod;
> + uint32_t server_features; /* Bitmap of features supported by the server */
> + uint32_t connection;
> };
>
> /* Violating this is very bad. See docs/misc/xenstore.txt. */
> @@ -125,6 +129,13 @@ struct xenstore_domain_interface {
> #define XENSTORE_ABS_PATH_MAX 3072
> #define XENSTORE_REL_PATH_MAX 2048
>
> +/* The ability to reconnect a ring */
> +#define XENSTORE_SERVER_FEATURE_RECONNECTION 1
> +
> +/* Valid values for the connection field */
> +#define XENSTORE_CONNECTED 0 /* the steady-state */
> +#define XENSTORE_RECONNECT 1 /* guest has initiated a reconnect */
> +
> #endif /* _XS_WIRE_H */
>
> /*
Other than that, it looks good to me.
Reviewed-by: Jon Ludlam <jonathan.ludlam@citrix.com>
^ permalink raw reply [flat|nested] 29+ messages in thread