All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paul Durrant <Paul.Durrant@citrix.com>
To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: "john.liuqiming@huawei.com" <john.liuqiming@huawei.com>,
	Dave Scott <Dave.Scott@citrix.com>,
	"yanqiangjun@huawei.com" <yanqiangjun@huawei.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Andrew Cooper <Andrew.Cooper3@citrix.com>
Subject: Re: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
Date: Fri, 4 Jul 2014 08:21:49 +0000	[thread overview]
Message-ID: <9AAE0902D5BC7E449B7C8E4E778ABCD03CAFD1@AMSPEX01CL01.citrite.net> (raw)
In-Reply-To: <1404400506-3170-1-git-send-email-dave.scott@citrix.com>

> -----Original Message-----
> From: David Scott [mailto:dave.scott@citrix.com]
> Sent: 03 July 2014 16:15
> To: xen-devel@lists.xenproject.org
> Cc: Paul Durrant; Ian Campbell; john.liuqiming@huawei.com;
> yanqiangjun@huawei.com; Andrew Cooper; Dave Scott
> Subject: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
> 
> Currently hvmloader uses the xenstore ring and then tries to
> reset it back to its initial state. This is not part of the
> ring protocol and, if xenstored reads the ring while it is
> happening, xenstored will conclude it is corrupted. A corrupted
> ring will prevent PV drivers from connecting. This seems to
> be a rare failure.
> 
> Furthermore, when a VM crashes it may jump to a 'crash kernel'
> to create a diagnostic dump. Without the ability to safely
> reset the ring the PV drivers won't be able to reliably
> establish connections, to (for example) stream a memory dump to
> disk.
> 
> This prototype patch contains a simple extension of the
> xenstore ring structure, enough to contain version numbers and
> a 'closing' flag. This memory is currently unused within the
> 4k page and should be zeroed as part of the domain setup
> process. The oxenstored server advertises version 1, and
> hvmloader detects this and sets the 'closing' flag. The server
> then reinitialises the ring, filling it with obviously invalid
> data to help debugging (unfortunately blocks of zeroes are
> valid xenstore packets) and signals hvmloader by the event
> channel that it's safe to boot the guest OS.
> 
> Updates since v2 (thanks to Andy Cooper for the review):
> * hvmloader: use volatile for read of closing flag
> * style improvements
> * remove xenstore version #defines
> 
> Updates since v1 (thanks to Paul Durrant for the review):
> * remove unused client version and associated dead code
> * treat 'closing' as a flag by using "!=0" rather than "==1"
> * hvmloader: move function up and remove forward decl
> * document the existing xenstore ring and the extention in misc/
> 
> Signed-off-by: David Scott <dave.scott@citrix.com>
> ---
>  docs/misc/xenstore-ring.txt         |   79
> +++++++++++++++++++++++++++++++++++
>  tools/firmware/hvmloader/xenbus.c   |   39 +++++++++--------
>  tools/ocaml/libs/xb/xb.ml           |   26 +++++++++++-
>  tools/ocaml/libs/xb/xb.mli          |    1 +
>  tools/ocaml/libs/xb/xs_ring.ml      |   11 +++++
>  tools/ocaml/libs/xb/xs_ring_stubs.c |   63
> +++++++++++++++++++++++++++-
>  xen/include/public/io/xs_wire.h     |    5 +++
>  7 files changed, 203 insertions(+), 21 deletions(-)
>  create mode 100644 docs/misc/xenstore-ring.txt
> 
> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
> new file mode 100644
> index 0000000..df2a09f
> --- /dev/null
> +++ b/docs/misc/xenstore-ring.txt
> @@ -0,0 +1,79 @@
> +
> +The domain builder allocates a single page and shares it with the xenstore
> +daemon. This document describes the ring protocol used to communicate
> via
> +this page which is used to transmit and receive
> +[xenstore protocol packets](xenstore.txt).
> +
> +In the original version (we call this "version 0"), the shared page has the
> +following contents (all offsets and lengths are in octets):
> +
> +Offset  Length  Description
> +-----------------------------------------------------------------
> +0       1024    Requests: data sent to the xenstore daemon
> +1024    1024    Replies: data sent to the domain
> +2048    4       Request consumer offset
> +2052    4       Request producer offset
> +2056    4       Response consumer offset
> +2060    4       Response producer offset
> +
> +When the page is allocated it is guaranteed to be full of zeroes.
> +
> +The "Requests" and "Replies" are treated as circular buffers, one for
> +each direction. Each circular buffer is associated wth a producer and
> +consumer offset, which are free-running counters starting from 0. A
> "producer"
> +offset is the offset in the byte stream of the next byte to be written; a
> +"consumer" offset is the offset in the byte stream of the next byte to be
> +read. The byte at offset 'x' in the byte stream will be stored in
> +offset 'x mod 1024' in the circular buffer. "producer - consumer" gives
> +the number of bytes still to be read, and "1024 - (producer - consumer)"
> +therefore gives the amount of space currently available for writing,
> +where we must avoid overwriting unread data.
> +
> +The circular buffers are only used to send sequences of bytes between
> domains.
> +It is the responsibility of the layer above to segment these bytes into
> +packets, as described in [xenstore.txt](xenstore.txt).
> +
> +The client and server domains can run concurrently on different cores and
> +different sockets. We must therefore take care to avoid corruption by:
> +
> +  1. using atomic loads and stores when reading and writing the producer
> +     and consumer offsets. If an offset were to be updated by a non-atomic
> +     store then the other domain may read an invalid offset value.
> +  2. ensuring request and reply data is fully read or written before
> +     updating the offsets by issuing a memory barrier.
> +
> +If we wish to read data but find the circular buffer empty, or wish to write
> +data and find the circular buffer full, we wait on a pre-arranged event
> +channel. When the other party next reads or writes data to the ring, it
> +will notify() this event channel and we will wake.
> +
> +Protocol extension for reconnection
> +-----------------------------------
> +
> +In version 0 of the protocol it is not possible to close and reopen the
> +connection. This means that if the ring is corrupted, it can never be used
> +(safely) again. Extending the protocol to allow reconnection would allow
> +us to:
> +
> +  1. use the ring in the firmware (hvmloader) and safely reset it for use
> +     by the guest
> +  2. re-establish a ring in a 'crash kernel' environment, allowing us to
> +     write crash dumps to PV disks or network devices.
> +
> +In version 1 of the protocol the ring is extended with the following
> +fields:
> +
> +Offset  Length  Description
> +-----------------------------------------------------------------
> +2064    4       Xenstore daemon supported protocol version
> +2068    4       Close request flag
> +
> +In a system supporting only version 0, both these fields are guaranteed
> +to contain 0 by the domain builder.
> +
> +In a system supporting version 1, the xenstore daemon will write "1" into
> +the support protocol version field. The guest xenstore client (eg in

s/support/supported

> +hvmloader) can query the version, and if it is set to "1" it can write
> +"1" to the close request flag and call notify(). The supporting xenstore

Perhaps, rather than 'call notify()' this should be 'signal the store event channel'?

> +daemon can reset the ring to an empty state and signal completion by
> +clearing the flag and calling notify() again.

Same here.

> diff --git a/tools/firmware/hvmloader/xenbus.c
> b/tools/firmware/hvmloader/xenbus.c
> index fe72e97..f85832c 100644
> --- a/tools/firmware/hvmloader/xenbus.c
> +++ b/tools/firmware/hvmloader/xenbus.c
> @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /*
> Shared ring with dom0 */
>  static evtchn_port_t event;                     /* Event-channel to dom0 */
>  static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area
> */
> 
> +static void ring_wait(void)
> +{
> +    struct shared_info *shinfo = get_shared_info();
> +    struct sched_poll poll;
> +
> +    memset(&poll, 0, sizeof(poll));
> +    set_xen_guest_handle(poll.ports, &event);
> +    poll.nr_ports = 1;
> +
> +    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
> +        hypercall_sched_op(SCHEDOP_poll, &poll);
> +}
> +
>  /* Connect our xenbus client to the backend.
>   * Call once, before any other xenbus actions. */
>  void xenbus_setup(void)
> @@ -68,10 +81,15 @@ void xenbus_shutdown(void)
> 
>      ASSERT(rings != NULL);
> 
> -    /* We zero out the whole ring -- the backend can handle this, and it's
> -     * not going to surprise any frontends since it's equivalent to never
> -     * having used the rings. */
> -    memset(rings, 0, sizeof *rings);
> +    if (rings->server_version > 0) {
> +        rings->closing = 1;
> +        while (*(volatile uint32_t*)&rings->closing == 1)
> +            ring_wait ();
> +    } else
> +        /* If the backend reads the state while we're erasing it then the
> +         * ring state will become corrupted, preventing guest frontends from
> +         * connecting. This is rare. */
> +        memset(rings, 0, sizeof *rings);
> 
>      /* Clear the event-channel state too. */
>      memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
> @@ -81,19 +99,6 @@ void xenbus_shutdown(void)
>      rings = NULL;
>  }
> 
> -static void ring_wait(void)
> -{
> -    struct shared_info *shinfo = get_shared_info();
> -    struct sched_poll poll;
> -
> -    memset(&poll, 0, sizeof(poll));
> -    set_xen_guest_handle(poll.ports, &event);
> -    poll.nr_ports = 1;
> -
> -    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
> -        hypercall_sched_op(SCHEDOP_poll, &poll);
> -}
> -
>  /* Helper functions: copy data in and out of the ring */
>  static void ring_write(const char *data, uint32_t len)
>  {
> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
> index 29d354d..d5cd776 100644
> --- a/tools/ocaml/libs/xb/xb.ml
> +++ b/tools/ocaml/libs/xb/xb.ml
> @@ -84,7 +84,26 @@ let write con s len =
>  	| Fd backfd     -> write_fd backfd con s len
>  	| Xenmmap backmmap -> write_mmap backmmap con s len
> 
> -let output con =
> +(* If a function throws Xs_ring.Closing, then clear the ring state
> +   and serve the ring again. *)
> +let rec handle_closing f con =
> +	match (try Some (f con) with Xs_ring.Closing -> None) with
> +	| Some x -> x
> +	| None ->
> +		begin match con.backend with
> +		| Fd _ -> raise Xs_ring.Closing (* should never happen, but
> just in case *)
> +		| Xenmmap backend ->
> +			Xs_ring.close backend.mmap;
> +			backend.eventchn_notify ();
> +			(* Clear our old connection state *)
> +			Queue.clear con.pkt_in;
> +			Queue.clear con.pkt_out;
> +			con.partial_in <- init_partial_in ();
> +			con.partial_out <- "";
> +			handle_closing f con
> +		end
> +
> +let output = handle_closing (fun con ->
>  	(* get the output string from a string_of(packet) or partial_out *)
>  	let s = if String.length con.partial_out > 0 then
>  			con.partial_out
> @@ -101,8 +120,9 @@ let output con =
>  	);
>  	(* after sending one packet, partial is empty *)
>  	con.partial_out = ""
> +)
> 
> -let input con =
> +let input = handle_closing (fun con ->
>  	let newpacket = ref false in
>  	let to_read =
>  		match con.partial_in with
> @@ -133,6 +153,7 @@ let input con =
>  			HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
>  	);
>  	!newpacket
> +)
> 
>  let newcon backend = {
>  	backend = backend;
> @@ -145,6 +166,7 @@ let newcon backend = {
>  let open_fd fd = newcon (Fd { fd = fd; })
> 
>  let open_mmap mmap notifyfct =
> +	Xs_ring.set_server_version mmap 1; (* defined in xs_wire.h *)
>  	newcon (Xenmmap {
>  		mmap = mmap;
>  		eventchn_notify = notifyfct;
> diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
> index 58234ae..7f65fa3 100644
> --- a/tools/ocaml/libs/xb/xb.mli
> +++ b/tools/ocaml/libs/xb/xb.mli
> @@ -80,6 +80,7 @@ val read : t -> string -> int -> int
>  val write_fd : backend_fd -> 'a -> string -> int -> int
>  val write_mmap : backend_mmap -> 'a -> string -> int -> int
>  val write : t -> string -> int -> int
> +val handle_closing : (t -> 'a) -> t -> 'a
>  val output : t -> bool
>  val input : t -> bool
>  val newcon : backend -> t
> diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
> index 9469609..d7f0fd4 100644
> --- a/tools/ocaml/libs/xb/xs_ring.ml
> +++ b/tools/ocaml/libs/xb/xs_ring.ml
> @@ -14,5 +14,16 @@
>   * GNU Lesser General Public License for more details.
>   *)
> 
> +exception Closing
> +
> +let _ =
> +  Callback.register_exception "Xs_ring.Closing" Closing
> +
>  external read: Xenmmap.mmap_interface -> string -> int -> int =
> "ml_interface_read"
>  external write: Xenmmap.mmap_interface -> string -> int -> int =
> "ml_interface_write"
> +
> +
> +external set_server_version: Xenmmap.mmap_interface -> int -> unit =
> "ml_interface_set_server_version" "noalloc"
> +external get_server_version: Xenmmap.mmap_interface -> int =
> "ml_interface_get_server_version" "noalloc"
> +
> +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close"
> "noalloc"
> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
> b/tools/ocaml/libs/xb/xs_ring_stubs.c
> index 8bd1047..27c98cd 100644
> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> @@ -35,19 +35,28 @@
> 
>  #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
> 
> +#define ERROR_UNKNOWN (-1)
> +#define ERROR_CLOSING (-2)
> +
>  static int xs_ring_read(struct mmap_interface *interface,
>                               char *buffer, int len)
>  {
>  	struct xenstore_domain_interface *intf = interface->addr;
>  	XENSTORE_RING_IDX cons, prod; /* offsets only */
>  	int to_read;
> +	uint32_t closing;
> 
>  	cons = *(volatile uint32*)&intf->req_cons;
>  	prod = *(volatile uint32*)&intf->req_prod;
> +	closing = *(volatile uint32*)&intf->closing;
> +
> +	if (closing != 0)
> +		return ERROR_CLOSING;
> +
>  	xen_mb();
> 
>  	if ((prod - cons) > XENSTORE_RING_SIZE)
> -	    return -1;
> +	    return ERROR_UNKNOWN;
> 
>  	if (prod == cons)
>  		return 0;
> @@ -71,9 +80,15 @@ static int xs_ring_write(struct mmap_interface
> *interface,
>  	struct xenstore_domain_interface *intf = interface->addr;
>  	XENSTORE_RING_IDX cons, prod;
>  	int can_write;
> +	uint32_t closing;
> 
>  	cons = *(volatile uint32*)&intf->rsp_cons;
>  	prod = *(volatile uint32*)&intf->rsp_prod;
> +	closing = *(volatile uint32*)&intf->closing;
> +
> +	if (closing != 0)
> +		return ERROR_CLOSING;
> +
>  	xen_mb();
>  	if ( (prod - cons) >= XENSTORE_RING_SIZE )
>  		return 0;
> @@ -97,8 +112,12 @@ CAMLprim value ml_interface_read(value interface,
> value buffer, value len)
> 
>  	res = xs_ring_read(GET_C_STRUCT(interface),
>  	                   String_val(buffer), Int_val(len));
> -	if (res == -1)
> +	if (res == ERROR_UNKNOWN)
>  		caml_failwith("bad connection");
> +
> +	if (res == ERROR_CLOSING)
> +

This looks a bit wrong. The blank line above should be removed and the following line indented appropriately.

> 	caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
> +
>  	result = Val_int(res);
>  	CAMLreturn(result);
>  }
> @@ -111,6 +130,46 @@ CAMLprim value ml_interface_write(value interface,
> value buffer, value len)
> 
>  	res = xs_ring_write(GET_C_STRUCT(interface),
>  	                    String_val(buffer), Int_val(len));
> +
> +	if (res == ERROR_CLOSING)
> +

Same here.

> 	caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
> +
>  	result = Val_int(res);
>  	CAMLreturn(result);
>  }
> +
> +CAMLprim value ml_interface_set_server_version(value interface, value v)
> +{
> +	CAMLparam2(interface, v);
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +
> +	intf->server_version = Int_val(v);
> +
> +	CAMLreturn(Val_unit);
> +}
> +
> +CAMLprim value ml_interface_get_server_version(value interface)
> +{
> +	CAMLparam1(interface);
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +
> +	CAMLreturn(Val_int (intf->server_version));
> +}
> +
> +CAMLprim value ml_interface_close(value interface)
> +{
> +	CAMLparam1(interface);
> +	const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +	int i;
> +
> +	intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod =
> 0;
> +	/* Ensure the unused space is full of invalid xenstore packets. */
> +	for (i = 0; i < XENSTORE_RING_SIZE; i++) {
> +		intf->req[i] = invalid_data[i % sizeof(invalid_data)];
> +		intf->rsp[i] = invalid_data[i % sizeof(invalid_data)];
> +	}

This does not reset the rings to their initial state (i.e. all zeroes). I don't think that's necessarily a problem, but it is inconsistent.

> +	xen_mb ();
> +	intf->closing = 0;
> +	CAMLreturn(Val_unit);
> +}
> diff --git a/xen/include/public/io/xs_wire.h
> b/xen/include/public/io/xs_wire.h
> index 585f0c8..022d614 100644
> --- a/xen/include/public/io/xs_wire.h
> +++ b/xen/include/public/io/xs_wire.h
> @@ -108,14 +108,19 @@ enum xs_watch_type
>   * `incontents 150 xenstore_struct XenStore wire protocol.
>   *
>   * Inter-domain shared memory communications. */
> +

Pure whitespace change. Not necessary.

  Paul

>  #define XENSTORE_RING_SIZE 1024
>  typedef uint32_t XENSTORE_RING_IDX;
>  #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
>  struct xenstore_domain_interface {
> +    /* XENSTORE_VERSION_0 */
>      char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
>      char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>      XENSTORE_RING_IDX req_cons, req_prod;
>      XENSTORE_RING_IDX rsp_cons, rsp_prod;
> +    uint32_t server_version;
> +    /* server_version 1 and later: */
> +    uint32_t closing;             /* Non-zero means close in progress */
>  };
> 
>  /* Violating this is very bad.  See docs/misc/xenstore.txt. */
> --
> 1.7.10.4

  reply	other threads:[~2014-07-04  8:21 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-04  4:25 bug in hvmloader xenbus_shutdown logic Liuqiming (John)
2013-12-04  9:49 ` Ian Campbell
2013-12-05  2:08   ` Liuqiming (John)
2013-12-05  9:36     ` David Scott
2013-12-05  9:43       ` Paul Durrant
2013-12-05 11:10       ` Ian Campbell
2013-12-06  3:53         ` Liuqiming (John)
2014-06-16 13:43           ` Dave Scott
2014-06-16 13:57             ` Paul Durrant
2014-06-25 21:15               ` [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal David Scott
2014-06-26  9:05                 ` Paul Durrant
2014-06-26  9:45                   ` Dave Scott
2014-06-30 14:42                     ` [PATCH v2] " David Scott
2014-07-03 15:15                       ` [PATCH v3] " David Scott
2014-07-04  8:21                         ` Paul Durrant [this message]
2014-07-04 10:00                           ` Dave Scott
2014-07-04  9:59                         ` Ian Campbell
2014-07-04 10:19                           ` Dave Scott
2014-07-04 11:27                             ` Ian Campbell
2014-09-03 16:25                         ` [PATCH v4] xenstore: " David Scott
2014-09-10 13:35                           ` Ian Campbell
2014-09-10 14:33                             ` Dave Scott
     [not found]                           ` <DF76B30A-D122-4600-987E-6BBD66CFFF73@citrix.com>
2014-09-22 16:38                             ` Ian Jackson
2014-09-24  9:06                               ` Dave Scott
2014-09-24 13:36                           ` Jon Ludlam
2014-07-04 11:02                       ` [PATCH v2] RFC: " Ian Jackson
2014-07-02 12:32                 ` [PATCH RFC] " Andrew Cooper
2014-07-02 16:47                   ` Dave Scott
2014-07-02 16:52                     ` Andrew Cooper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9AAE0902D5BC7E449B7C8E4E778ABCD03CAFD1@AMSPEX01CL01.citrite.net \
    --to=paul.durrant@citrix.com \
    --cc=Andrew.Cooper3@citrix.com \
    --cc=Dave.Scott@citrix.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=john.liuqiming@huawei.com \
    --cc=xen-devel@lists.xenproject.org \
    --cc=yanqiangjun@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.