From mboxrd@z Thu Jan 1 00:00:00 1970 From: Paul Durrant Subject: Re: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal Date: Fri, 4 Jul 2014 08:21:49 +0000 Message-ID: <9AAE0902D5BC7E449B7C8E4E778ABCD03CAFD1@AMSPEX01CL01.citrite.net> References: <1404139323-28806-1-git-send-email-dave.scott@citrix.com> <1404400506-3170-1-git-send-email-dave.scott@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta5.messagelabs.com ([195.245.231.135]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1X2yki-00078w-Lt for xen-devel@lists.xenproject.org; Fri, 04 Jul 2014 08:21:52 +0000 In-Reply-To: <1404400506-3170-1-git-send-email-dave.scott@citrix.com> Content-Language: en-US List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: "xen-devel@lists.xenproject.org" Cc: "john.liuqiming@huawei.com" , Dave Scott , "yanqiangjun@huawei.com" , Ian Campbell , Andrew Cooper List-Id: xen-devel@lists.xenproject.org > -----Original Message----- > From: David Scott [mailto:dave.scott@citrix.com] > Sent: 03 July 2014 16:15 > To: xen-devel@lists.xenproject.org > Cc: Paul Durrant; Ian Campbell; john.liuqiming@huawei.com; > yanqiangjun@huawei.com; Andrew Cooper; Dave Scott > Subject: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal > > Currently hvmloader uses the xenstore ring and then tries to > reset it back to its initial state. This is not part of the > ring protocol and, if xenstored reads the ring while it is > happening, xenstored will conclude it is corrupted. A corrupted > ring will prevent PV drivers from connecting. This seems to > be a rare failure. > > Furthermore, when a VM crashes it may jump to a 'crash kernel' > to create a diagnostic dump. Without the ability to safely > reset the ring the PV drivers won't be able to reliably > establish connections, to (for example) stream a memory dump to > disk. > > This prototype patch contains a simple extension of the > xenstore ring structure, enough to contain version numbers and > a 'closing' flag. This memory is currently unused within the > 4k page and should be zeroed as part of the domain setup > process. The oxenstored server advertises version 1, and > hvmloader detects this and sets the 'closing' flag. The server > then reinitialises the ring, filling it with obviously invalid > data to help debugging (unfortunately blocks of zeroes are > valid xenstore packets) and signals hvmloader by the event > channel that it's safe to boot the guest OS. > > Updates since v2 (thanks to Andy Cooper for the review): > * hvmloader: use volatile for read of closing flag > * style improvements > * remove xenstore version #defines > > Updates since v1 (thanks to Paul Durrant for the review): > * remove unused client version and associated dead code > * treat 'closing' as a flag by using "!=0" rather than "==1" > * hvmloader: move function up and remove forward decl > * document the existing xenstore ring and the extention in misc/ > > Signed-off-by: David Scott > --- > docs/misc/xenstore-ring.txt | 79 > +++++++++++++++++++++++++++++++++++ > tools/firmware/hvmloader/xenbus.c | 39 +++++++++-------- > tools/ocaml/libs/xb/xb.ml | 26 +++++++++++- > tools/ocaml/libs/xb/xb.mli | 1 + > tools/ocaml/libs/xb/xs_ring.ml | 11 +++++ > tools/ocaml/libs/xb/xs_ring_stubs.c | 63 > +++++++++++++++++++++++++++- > xen/include/public/io/xs_wire.h | 5 +++ > 7 files changed, 203 insertions(+), 21 deletions(-) > create mode 100644 docs/misc/xenstore-ring.txt > > diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt > new file mode 100644 > index 0000000..df2a09f > --- /dev/null > +++ b/docs/misc/xenstore-ring.txt > @@ -0,0 +1,79 @@ > + > +The domain builder allocates a single page and shares it with the xenstore > +daemon. This document describes the ring protocol used to communicate > via > +this page which is used to transmit and receive > +[xenstore protocol packets](xenstore.txt). > + > +In the original version (we call this "version 0"), the shared page has the > +following contents (all offsets and lengths are in octets): > + > +Offset Length Description > +----------------------------------------------------------------- > +0 1024 Requests: data sent to the xenstore daemon > +1024 1024 Replies: data sent to the domain > +2048 4 Request consumer offset > +2052 4 Request producer offset > +2056 4 Response consumer offset > +2060 4 Response producer offset > + > +When the page is allocated it is guaranteed to be full of zeroes. > + > +The "Requests" and "Replies" are treated as circular buffers, one for > +each direction. Each circular buffer is associated wth a producer and > +consumer offset, which are free-running counters starting from 0. A > "producer" > +offset is the offset in the byte stream of the next byte to be written; a > +"consumer" offset is the offset in the byte stream of the next byte to be > +read. The byte at offset 'x' in the byte stream will be stored in > +offset 'x mod 1024' in the circular buffer. "producer - consumer" gives > +the number of bytes still to be read, and "1024 - (producer - consumer)" > +therefore gives the amount of space currently available for writing, > +where we must avoid overwriting unread data. > + > +The circular buffers are only used to send sequences of bytes between > domains. > +It is the responsibility of the layer above to segment these bytes into > +packets, as described in [xenstore.txt](xenstore.txt). > + > +The client and server domains can run concurrently on different cores and > +different sockets. We must therefore take care to avoid corruption by: > + > + 1. using atomic loads and stores when reading and writing the producer > + and consumer offsets. If an offset were to be updated by a non-atomic > + store then the other domain may read an invalid offset value. > + 2. ensuring request and reply data is fully read or written before > + updating the offsets by issuing a memory barrier. > + > +If we wish to read data but find the circular buffer empty, or wish to write > +data and find the circular buffer full, we wait on a pre-arranged event > +channel. When the other party next reads or writes data to the ring, it > +will notify() this event channel and we will wake. > + > +Protocol extension for reconnection > +----------------------------------- > + > +In version 0 of the protocol it is not possible to close and reopen the > +connection. This means that if the ring is corrupted, it can never be used > +(safely) again. Extending the protocol to allow reconnection would allow > +us to: > + > + 1. use the ring in the firmware (hvmloader) and safely reset it for use > + by the guest > + 2. re-establish a ring in a 'crash kernel' environment, allowing us to > + write crash dumps to PV disks or network devices. > + > +In version 1 of the protocol the ring is extended with the following > +fields: > + > +Offset Length Description > +----------------------------------------------------------------- > +2064 4 Xenstore daemon supported protocol version > +2068 4 Close request flag > + > +In a system supporting only version 0, both these fields are guaranteed > +to contain 0 by the domain builder. > + > +In a system supporting version 1, the xenstore daemon will write "1" into > +the support protocol version field. The guest xenstore client (eg in s/support/supported > +hvmloader) can query the version, and if it is set to "1" it can write > +"1" to the close request flag and call notify(). The supporting xenstore Perhaps, rather than 'call notify()' this should be 'signal the store event channel'? > +daemon can reset the ring to an empty state and signal completion by > +clearing the flag and calling notify() again. Same here. > diff --git a/tools/firmware/hvmloader/xenbus.c > b/tools/firmware/hvmloader/xenbus.c > index fe72e97..f85832c 100644 > --- a/tools/firmware/hvmloader/xenbus.c > +++ b/tools/firmware/hvmloader/xenbus.c > @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* > Shared ring with dom0 */ > static evtchn_port_t event; /* Event-channel to dom0 */ > static char payload[XENSTORE_PAYLOAD_MAX + 1]; /* Unmarshalling area > */ > > +static void ring_wait(void) > +{ > + struct shared_info *shinfo = get_shared_info(); > + struct sched_poll poll; > + > + memset(&poll, 0, sizeof(poll)); > + set_xen_guest_handle(poll.ports, &event); > + poll.nr_ports = 1; > + > + while ( !test_and_clear_bit(event, shinfo->evtchn_pending) ) > + hypercall_sched_op(SCHEDOP_poll, &poll); > +} > + > /* Connect our xenbus client to the backend. > * Call once, before any other xenbus actions. */ > void xenbus_setup(void) > @@ -68,10 +81,15 @@ void xenbus_shutdown(void) > > ASSERT(rings != NULL); > > - /* We zero out the whole ring -- the backend can handle this, and it's > - * not going to surprise any frontends since it's equivalent to never > - * having used the rings. */ > - memset(rings, 0, sizeof *rings); > + if (rings->server_version > 0) { > + rings->closing = 1; > + while (*(volatile uint32_t*)&rings->closing == 1) > + ring_wait (); > + } else > + /* If the backend reads the state while we're erasing it then the > + * ring state will become corrupted, preventing guest frontends from > + * connecting. This is rare. */ > + memset(rings, 0, sizeof *rings); > > /* Clear the event-channel state too. */ > memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info)); > @@ -81,19 +99,6 @@ void xenbus_shutdown(void) > rings = NULL; > } > > -static void ring_wait(void) > -{ > - struct shared_info *shinfo = get_shared_info(); > - struct sched_poll poll; > - > - memset(&poll, 0, sizeof(poll)); > - set_xen_guest_handle(poll.ports, &event); > - poll.nr_ports = 1; > - > - while ( !test_and_clear_bit(event, shinfo->evtchn_pending) ) > - hypercall_sched_op(SCHEDOP_poll, &poll); > -} > - > /* Helper functions: copy data in and out of the ring */ > static void ring_write(const char *data, uint32_t len) > { > diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml > index 29d354d..d5cd776 100644 > --- a/tools/ocaml/libs/xb/xb.ml > +++ b/tools/ocaml/libs/xb/xb.ml > @@ -84,7 +84,26 @@ let write con s len = > | Fd backfd -> write_fd backfd con s len > | Xenmmap backmmap -> write_mmap backmmap con s len > > -let output con = > +(* If a function throws Xs_ring.Closing, then clear the ring state > + and serve the ring again. *) > +let rec handle_closing f con = > + match (try Some (f con) with Xs_ring.Closing -> None) with > + | Some x -> x > + | None -> > + begin match con.backend with > + | Fd _ -> raise Xs_ring.Closing (* should never happen, but > just in case *) > + | Xenmmap backend -> > + Xs_ring.close backend.mmap; > + backend.eventchn_notify (); > + (* Clear our old connection state *) > + Queue.clear con.pkt_in; > + Queue.clear con.pkt_out; > + con.partial_in <- init_partial_in (); > + con.partial_out <- ""; > + handle_closing f con > + end > + > +let output = handle_closing (fun con -> > (* get the output string from a string_of(packet) or partial_out *) > let s = if String.length con.partial_out > 0 then > con.partial_out > @@ -101,8 +120,9 @@ let output con = > ); > (* after sending one packet, partial is empty *) > con.partial_out = "" > +) > > -let input con = > +let input = handle_closing (fun con -> > let newpacket = ref false in > let to_read = > match con.partial_in with > @@ -133,6 +153,7 @@ let input con = > HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf) > ); > !newpacket > +) > > let newcon backend = { > backend = backend; > @@ -145,6 +166,7 @@ let newcon backend = { > let open_fd fd = newcon (Fd { fd = fd; }) > > let open_mmap mmap notifyfct = > + Xs_ring.set_server_version mmap 1; (* defined in xs_wire.h *) > newcon (Xenmmap { > mmap = mmap; > eventchn_notify = notifyfct; > diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli > index 58234ae..7f65fa3 100644 > --- a/tools/ocaml/libs/xb/xb.mli > +++ b/tools/ocaml/libs/xb/xb.mli > @@ -80,6 +80,7 @@ val read : t -> string -> int -> int > val write_fd : backend_fd -> 'a -> string -> int -> int > val write_mmap : backend_mmap -> 'a -> string -> int -> int > val write : t -> string -> int -> int > +val handle_closing : (t -> 'a) -> t -> 'a > val output : t -> bool > val input : t -> bool > val newcon : backend -> t > diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml > index 9469609..d7f0fd4 100644 > --- a/tools/ocaml/libs/xb/xs_ring.ml > +++ b/tools/ocaml/libs/xb/xs_ring.ml > @@ -14,5 +14,16 @@ > * GNU Lesser General Public License for more details. > *) > > +exception Closing > + > +let _ = > + Callback.register_exception "Xs_ring.Closing" Closing > + > external read: Xenmmap.mmap_interface -> string -> int -> int = > "ml_interface_read" > external write: Xenmmap.mmap_interface -> string -> int -> int = > "ml_interface_write" > + > + > +external set_server_version: Xenmmap.mmap_interface -> int -> unit = > "ml_interface_set_server_version" "noalloc" > +external get_server_version: Xenmmap.mmap_interface -> int = > "ml_interface_get_server_version" "noalloc" > + > +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close" > "noalloc" > diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c > b/tools/ocaml/libs/xb/xs_ring_stubs.c > index 8bd1047..27c98cd 100644 > --- a/tools/ocaml/libs/xb/xs_ring_stubs.c > +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c > @@ -35,19 +35,28 @@ > > #define GET_C_STRUCT(a) ((struct mmap_interface *) a) > > +#define ERROR_UNKNOWN (-1) > +#define ERROR_CLOSING (-2) > + > static int xs_ring_read(struct mmap_interface *interface, > char *buffer, int len) > { > struct xenstore_domain_interface *intf = interface->addr; > XENSTORE_RING_IDX cons, prod; /* offsets only */ > int to_read; > + uint32_t closing; > > cons = *(volatile uint32*)&intf->req_cons; > prod = *(volatile uint32*)&intf->req_prod; > + closing = *(volatile uint32*)&intf->closing; > + > + if (closing != 0) > + return ERROR_CLOSING; > + > xen_mb(); > > if ((prod - cons) > XENSTORE_RING_SIZE) > - return -1; > + return ERROR_UNKNOWN; > > if (prod == cons) > return 0; > @@ -71,9 +80,15 @@ static int xs_ring_write(struct mmap_interface > *interface, > struct xenstore_domain_interface *intf = interface->addr; > XENSTORE_RING_IDX cons, prod; > int can_write; > + uint32_t closing; > > cons = *(volatile uint32*)&intf->rsp_cons; > prod = *(volatile uint32*)&intf->rsp_prod; > + closing = *(volatile uint32*)&intf->closing; > + > + if (closing != 0) > + return ERROR_CLOSING; > + > xen_mb(); > if ( (prod - cons) >= XENSTORE_RING_SIZE ) > return 0; > @@ -97,8 +112,12 @@ CAMLprim value ml_interface_read(value interface, > value buffer, value len) > > res = xs_ring_read(GET_C_STRUCT(interface), > String_val(buffer), Int_val(len)); > - if (res == -1) > + if (res == ERROR_UNKNOWN) > caml_failwith("bad connection"); > + > + if (res == ERROR_CLOSING) > + This looks a bit wrong. The blank line above should be removed and the following line indented appropriately. > caml_raise_constant(*caml_named_value("Xs_ring.Closing")); > + > result = Val_int(res); > CAMLreturn(result); > } > @@ -111,6 +130,46 @@ CAMLprim value ml_interface_write(value interface, > value buffer, value len) > > res = xs_ring_write(GET_C_STRUCT(interface), > String_val(buffer), Int_val(len)); > + > + if (res == ERROR_CLOSING) > + Same here. > caml_raise_constant(*caml_named_value("Xs_ring.Closing")); > + > result = Val_int(res); > CAMLreturn(result); > } > + > +CAMLprim value ml_interface_set_server_version(value interface, value v) > +{ > + CAMLparam2(interface, v); > + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)- > >addr; > + > + intf->server_version = Int_val(v); > + > + CAMLreturn(Val_unit); > +} > + > +CAMLprim value ml_interface_get_server_version(value interface) > +{ > + CAMLparam1(interface); > + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)- > >addr; > + > + CAMLreturn(Val_int (intf->server_version)); > +} > + > +CAMLprim value ml_interface_close(value interface) > +{ > + CAMLparam1(interface); > + const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' }; > + struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)- > >addr; > + int i; > + > + intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = > 0; > + /* Ensure the unused space is full of invalid xenstore packets. */ > + for (i = 0; i < XENSTORE_RING_SIZE; i++) { > + intf->req[i] = invalid_data[i % sizeof(invalid_data)]; > + intf->rsp[i] = invalid_data[i % sizeof(invalid_data)]; > + } This does not reset the rings to their initial state (i.e. all zeroes). I don't think that's necessarily a problem, but it is inconsistent. > + xen_mb (); > + intf->closing = 0; > + CAMLreturn(Val_unit); > +} > diff --git a/xen/include/public/io/xs_wire.h > b/xen/include/public/io/xs_wire.h > index 585f0c8..022d614 100644 > --- a/xen/include/public/io/xs_wire.h > +++ b/xen/include/public/io/xs_wire.h > @@ -108,14 +108,19 @@ enum xs_watch_type > * `incontents 150 xenstore_struct XenStore wire protocol. > * > * Inter-domain shared memory communications. */ > + Pure whitespace change. Not necessary. Paul > #define XENSTORE_RING_SIZE 1024 > typedef uint32_t XENSTORE_RING_IDX; > #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1)) > struct xenstore_domain_interface { > + /* XENSTORE_VERSION_0 */ > char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */ > char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */ > XENSTORE_RING_IDX req_cons, req_prod; > XENSTORE_RING_IDX rsp_cons, rsp_prod; > + uint32_t server_version; > + /* server_version 1 and later: */ > + uint32_t closing; /* Non-zero means close in progress */ > }; > > /* Violating this is very bad. See docs/misc/xenstore.txt. */ > -- > 1.7.10.4