All of lore.kernel.org
 help / color / mirror / Atom feed
* bug in hvmloader xenbus_shutdown logic
@ 2013-12-04  4:25 Liuqiming (John)
  2013-12-04  9:49 ` Ian Campbell
  0 siblings, 1 reply; 29+ messages in thread
From: Liuqiming (John) @ 2013-12-04  4:25 UTC (permalink / raw)
  To: xen-devel; +Cc: Yanqiangjun

Hi,

A couple of months ago, I run into a xenstored IO ring issue ( see http://lists.xen.org/archives/html/xen-devel/2013-07/msg00134.html) and at that time we add a protection on oxenstored side.

Somehow I meet this problem again. A suse11 hvm domain start and hang with error:

    XENBUS request ring is not quiescent (00000010:00000000)!
    XENBUS response ring is not quiescent (00000000:00000013): fixing up

It happens when pv-driver init xenbus modules

...
int xb_init_comms(void)
{
	struct xenstore_domain_interface *intf = xen_store_interface;
	int err;

	if (intf->req_prod != intf->req_cons)
		pr_err("XENBUS request ring is not quiescent "
		       "(%08x:%08x)!\n", intf->req_cons, intf->req_prod);

	if (intf->rsp_prod != intf->rsp_cons) {
		pr_warning("XENBUS response ring is not quiescent"
			   " (%08x:%08x): fixing up\n",
			   intf->rsp_cons, intf->rsp_prod);
		/* breaks kdump */
		if (!reset_devices)
			intf->rsp_cons = intf->rsp_prod;
	}
...

With the protection logic, oxenstored will treat this domain as " malicious domains", but apparently it's not.

And this time I review all the related source code, and found that it may be caused by a bug in hvmloader.

In hvmloader after all the setup, there will be a xenbus_shutdown call which will clear up the ring,

void xenbus_shutdown(void)
{
...
    /* We zero out the whole ring -- the backend can handle this, and it's 
     * not going to surprise any frontends since it's equivalent to never 
     * having used the rings. */
    memset(rings, 0, sizeof *rings); // <-- clear it here
...
} 

And  memset is implemented as this 
void *
memset(void *s, int c, unsigned n)
{
    uint8_t b = (uint8_t) c;
    uint8_t *p = (uint8_t *)s;
    int i;
    for ( i = 0; i < n; i++ )
        *p++ = b;
    return s;
}

The ring structure is put in a shared page and definition is 
struct xenstore_domain_interface
{
    char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
    char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
    XENSTORE_RING_IDX req_cons, req_prod;
    XENSTORE_RING_IDX rsp_cons, rsp_prod;
};

So here is the problem:
memset can not set all the page to zero in an atomic way, and during the clear up process oxenstored may access this ring.
So memset set every 8bit in this page to zero one by one and when req_cons set to zero oxenstored read req_cons and req_prod to compare, 
After that oxenstored will process as a normal incoming request and memset will zero out the left page. 
Finally oxenstored will increase req_cons but req_prod will remain as 0 and that status will be treated as a "bad connection".

To fix that we need to make memset atomicly (clear up req_cons and req_prod the same time at least) or we can make oxenstored do not to access the ring when hvmloader clearing up.

Any suggestion?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: bug in hvmloader xenbus_shutdown logic
  2013-12-04  4:25 bug in hvmloader xenbus_shutdown logic Liuqiming (John)
@ 2013-12-04  9:49 ` Ian Campbell
  2013-12-05  2:08   ` Liuqiming (John)
  0 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2013-12-04  9:49 UTC (permalink / raw)
  To: Liuqiming (John); +Cc: Yanqiangjun, xen-devel

On Wed, 2013-12-04 at 04:25 +0000, Liuqiming (John) wrote:

> memset can not set all the page to zero in an atomic way, and during
> the clear up process oxenstored may access this ring.

Why is oxenstored poking at the ring? Surely it should only do so when
the guest (hvmloader) sends it a request. If hvmloader is clearing the
page while there is a request/event outstanding then this is an
hvmloader bug.

Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: bug in hvmloader xenbus_shutdown logic
  2013-12-04  9:49 ` Ian Campbell
@ 2013-12-05  2:08   ` Liuqiming (John)
  2013-12-05  9:36     ` David Scott
  0 siblings, 1 reply; 29+ messages in thread
From: Liuqiming (John) @ 2013-12-05  2:08 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Yanqiangjun, David Scott, xen-devel

According to oxenstored source code, oxenstored will read every domain's IO ring no matter what events happened.

Here is the main loop of oxenstored:

let main_loop () =
	incr periodic_ops_counter;
	if !periodic_ops_counter > 20 then (
		periodic_ops_counter := 0;
		periodic_ops ();
	);

	let mw = Connections.has_more_work cons in
	let inset, outset = Connections.select cons in
	let timeout = if List.length mw > 0 then 0. else -1. in
	let rset, wset, _ =
	try
		Unix.select (spec_fds @ inset) outset [] timeout    <--  poll event from fd set including /dev/xen/evtchn
	with Unix.Unix_error(Unix.EINTR, _, _) ->
		[], [], [] in
	let sfds, cfds =
		List.partition (fun fd -> List.mem fd spec_fds) rset in
	if List.length sfds > 0 then
		process_special_fds sfds;
	if List.length cfds > 0 || List.length wset > 0 then
		process_connection_fds store cons domains cfds wset;
	process_domains store cons domains   <- no matter what event income, this will handle IO ring request of all domains
	in

so when one domain's hvmloader is clearing its IO ring, oxenstored may access this IO ring because of another domain's event happened.
  

> -----Original Message-----
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> Sent: Wednesday, December 04, 2013 5:50 PM
> To: Liuqiming (John)
> Cc: xen-devel@lists.xen.org; Yanqiangjun
> Subject: Re: [Xen-devel] bug in hvmloader xenbus_shutdown logic
> 
> On Wed, 2013-12-04 at 04:25 +0000, Liuqiming (John) wrote:
> 
> > memset can not set all the page to zero in an atomic way, and during
> > the clear up process oxenstored may access this ring.
> 
> Why is oxenstored poking at the ring? Surely it should only do so when
> the guest (hvmloader) sends it a request. If hvmloader is clearing the
> page while there is a request/event outstanding then this is an
> hvmloader bug.
> 
> Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: bug in hvmloader xenbus_shutdown logic
  2013-12-05  2:08   ` Liuqiming (John)
@ 2013-12-05  9:36     ` David Scott
  2013-12-05  9:43       ` Paul Durrant
  2013-12-05 11:10       ` Ian Campbell
  0 siblings, 2 replies; 29+ messages in thread
From: David Scott @ 2013-12-05  9:36 UTC (permalink / raw)
  To: Liuqiming (John), Ian Campbell; +Cc: Yanqiangjun, xen-devel

On 05/12/13 02:08, Liuqiming (John) wrote:
> According to oxenstored source code, oxenstored will read every domain's IO ring no matter what events happened.
>
> Here is the main loop of oxenstored:
>
> let main_loop () =
...
> 	process_domains store cons domains   <- no matter what event income, this will handle IO ring request of all domains
> 	in
>
> so when one domain's hvmloader is clearing its IO ring, oxenstored may access this IO ring because of another domain's event happened.

Yes, this version of the code checks all the interdomain rings every 
time it goes around the main loop. FWIW my experimental updated 
oxenstore uses one (user-level) thread per connection. I had thought 
this was just an efficiency issue... I didn't realise it had correctness 
implications.

>> On Wed, 2013-12-04 at 04:25 +0000, Liuqiming (John) wrote:
>>
>>> memset can not set all the page to zero in an atomic way, and during
>>> the clear up process oxenstored may access this ring.
>>
>> Why is oxenstored poking at the ring? Surely it should only do so when
>> the guest (hvmloader) sends it a request. If hvmloader is clearing the
>> page while there is a request/event outstanding then this is an
>> hvmloader bug.

Ok, that makes sense.

My only hesitation is that violations of this rule (like with current 
oxenstored) show up rarely. Presumably the xenstore ring desynchronises 
and the guest will never be able to boot properly with PV drivers. I 
don't think I've ever seen this myself.

Do frontends expect the xenstore ring to be in a zeroed state when they 
startup? Assuming hvmloader has received all the outstanding 
request/events, would it be safe to leave the ring contents alone?

Cheers,
Dave

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: bug in hvmloader xenbus_shutdown logic
  2013-12-05  9:36     ` David Scott
@ 2013-12-05  9:43       ` Paul Durrant
  2013-12-05 11:10       ` Ian Campbell
  1 sibling, 0 replies; 29+ messages in thread
From: Paul Durrant @ 2013-12-05  9:43 UTC (permalink / raw)
  To: Dave Scott, Liuqiming (John), Ian Campbell; +Cc: Yanqiangjun, xen-devel

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of David Scott
> Sent: 05 December 2013 09:36
> To: Liuqiming (John); Ian Campbell
> Cc: Yanqiangjun; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] bug in hvmloader xenbus_shutdown logic
> 
> On 05/12/13 02:08, Liuqiming (John) wrote:
> > According to oxenstored source code, oxenstored will read every domain's
> IO ring no matter what events happened.
> >
> > Here is the main loop of oxenstored:
> >
> > let main_loop () =
> ...
> > 	process_domains store cons domains   <- no matter what event
> income, this will handle IO ring request of all domains
> > 	in
> >
> > so when one domain's hvmloader is clearing its IO ring, oxenstored may
> access this IO ring because of another domain's event happened.
> 
> Yes, this version of the code checks all the interdomain rings every
> time it goes around the main loop. FWIW my experimental updated
> oxenstore uses one (user-level) thread per connection. I had thought
> this was just an efficiency issue... I didn't realise it had correctness
> implications.
> 
> >> On Wed, 2013-12-04 at 04:25 +0000, Liuqiming (John) wrote:
> >>
> >>> memset can not set all the page to zero in an atomic way, and during
> >>> the clear up process oxenstored may access this ring.
> >>
> >> Why is oxenstored poking at the ring? Surely it should only do so when
> >> the guest (hvmloader) sends it a request. If hvmloader is clearing the
> >> page while there is a request/event outstanding then this is an
> >> hvmloader bug.
> 
> Ok, that makes sense.
> 
> My only hesitation is that violations of this rule (like with current
> oxenstored) show up rarely. Presumably the xenstore ring desynchronises
> and the guest will never be able to boot properly with PV drivers. I
> don't think I've ever seen this myself.
> 
> Do frontends expect the xenstore ring to be in a zeroed state when they
> startup? Assuming hvmloader has received all the outstanding
> request/events, would it be safe to leave the ring contents alone?
> 

I think there are probably some PV driver frontends that don't expect the ring to have been messed with. You could probably consider that a frontend bug, but restoring the ring to pristine state seems like it's a good thing to do. However zeroing out a shared ring with no explicit notification to the backend also seems like a very risky thing to do.

  Paul

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: bug in hvmloader xenbus_shutdown logic
  2013-12-05  9:36     ` David Scott
  2013-12-05  9:43       ` Paul Durrant
@ 2013-12-05 11:10       ` Ian Campbell
  2013-12-06  3:53         ` Liuqiming (John)
  1 sibling, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2013-12-05 11:10 UTC (permalink / raw)
  To: David Scott; +Cc: Liuqiming (John), Yanqiangjun, xen-devel

On Thu, 2013-12-05 at 09:36 +0000, David Scott wrote:
> On 05/12/13 02:08, Liuqiming (John) wrote:
> > According to oxenstored source code, oxenstored will read every domain's IO ring no matter what events happened.
> >
> > Here is the main loop of oxenstored:
> >
> > let main_loop () =
> ...
> > 	process_domains store cons domains   <- no matter what event income, this will handle IO ring request of all domains
> > 	in
> >
> > so when one domain's hvmloader is clearing its IO ring, oxenstored may access this IO ring because of another domain's event happened.
> 
> Yes, this version of the code checks all the interdomain rings every 
> time it goes around the main loop. FWIW my experimental updated 
> oxenstore uses one (user-level) thread per connection. I had thought 
> this was just an efficiency issue... I didn't realise it had correctness 
> implications.

TBH the protocol probably doesn't say anything about not looking unless
you've got a event, so I guess it is kind of implicit and I suppose it
could be argued that what you do is fine (except it's broken in
practice ;-))

Is there some well defined order in which we could clear the various
pointer fields safely? I'd need to get my ring macro head on to figure
that out I think.

> >> On Wed, 2013-12-04 at 04:25 +0000, Liuqiming (John) wrote:
> >>
> >>> memset can not set all the page to zero in an atomic way, and during
> >>> the clear up process oxenstored may access this ring.
> >>
> >> Why is oxenstored poking at the ring? Surely it should only do so when
> >> the guest (hvmloader) sends it a request. If hvmloader is clearing the
> >> page while there is a request/event outstanding then this is an
> >> hvmloader bug.
> 
> Ok, that makes sense.
> 
> My only hesitation is that violations of this rule (like with current 
> oxenstored) show up rarely. Presumably the xenstore ring desynchronises 
> and the guest will never be able to boot properly with PV drivers. I 
> don't think I've ever seen this myself.
> 
> Do frontends expect the xenstore ring to be in a zeroed state when they 
> startup? Assuming hvmloader has received all the outstanding 
> request/events, would it be safe to leave the ring contents alone?

Most frontends also have private versions of various pointers which
would need to be synchronised, and I suspect all of the init to zero.

Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: bug in hvmloader xenbus_shutdown logic
  2013-12-05 11:10       ` Ian Campbell
@ 2013-12-06  3:53         ` Liuqiming (John)
  2014-06-16 13:43           ` Dave Scott
  0 siblings, 1 reply; 29+ messages in thread
From: Liuqiming (John) @ 2013-12-06  3:53 UTC (permalink / raw)
  To: Ian Campbell, David Scott; +Cc: Yanqiangjun, xen-devel

How about we add a XS_RESET operation to xenstore and the semantic should be: reset the connection to initial state

The work flow as:
1, Hvmloader send this request to oxenstored and then poll the event
2,Oxenstored clear up IO ring and do some other work(if any) to make sure this connect reset to initial state
3,Oxenstored send a event to notify the reset work has finished

I think oxenstored should be the "owner" of this IO ring, so all the complicated operation should be done by oxenstord and other component should just send the request. 

> -----Original Message-----
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> Sent: Thursday, December 05, 2013 7:11 PM
> To: David Scott
> Cc: Liuqiming (John); Yanqiangjun; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] bug in hvmloader xenbus_shutdown logic
> 
> On Thu, 2013-12-05 at 09:36 +0000, David Scott wrote:
> > On 05/12/13 02:08, Liuqiming (John) wrote:
> > > According to oxenstored source code, oxenstored will read every
> domain's IO ring no matter what events happened.
> > >
> > > Here is the main loop of oxenstored:
> > >
> > > let main_loop () =
> > ...
> > > 	process_domains store cons domains   <- no matter what event
> income, this will handle IO ring request of all domains
> > > 	in
> > >
> > > so when one domain's hvmloader is clearing its IO ring, oxenstored may
> access this IO ring because of another domain's event happened.
> >
> > Yes, this version of the code checks all the interdomain rings every
> > time it goes around the main loop. FWIW my experimental updated
> > oxenstore uses one (user-level) thread per connection. I had thought
> > this was just an efficiency issue... I didn't realise it had correctness
> > implications.
> 
> TBH the protocol probably doesn't say anything about not looking unless
> you've got a event, so I guess it is kind of implicit and I suppose it
> could be argued that what you do is fine (except it's broken in
> practice ;-))
> 
> Is there some well defined order in which we could clear the various
> pointer fields safely? I'd need to get my ring macro head on to figure
> that out I think.
> 
> > >> On Wed, 2013-12-04 at 04:25 +0000, Liuqiming (John) wrote:
> > >>
> > >>> memset can not set all the page to zero in an atomic way, and during
> > >>> the clear up process oxenstored may access this ring.
> > >>
> > >> Why is oxenstored poking at the ring? Surely it should only do so when
> > >> the guest (hvmloader) sends it a request. If hvmloader is clearing the
> > >> page while there is a request/event outstanding then this is an
> > >> hvmloader bug.
> >
> > Ok, that makes sense.
> >
> > My only hesitation is that violations of this rule (like with current
> > oxenstored) show up rarely. Presumably the xenstore ring desynchronises
> > and the guest will never be able to boot properly with PV drivers. I
> > don't think I've ever seen this myself.
> >
> > Do frontends expect the xenstore ring to be in a zeroed state when they
> > startup? Assuming hvmloader has received all the outstanding
> > request/events, would it be safe to leave the ring contents alone?
> 
> Most frontends also have private versions of various pointers which
> would need to be synchronised, and I suspect all of the init to zero.
> 
> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: bug in hvmloader xenbus_shutdown logic
  2013-12-06  3:53         ` Liuqiming (John)
@ 2014-06-16 13:43           ` Dave Scott
  2014-06-16 13:57             ` Paul Durrant
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Scott @ 2014-06-16 13:43 UTC (permalink / raw)
  To: Liuqiming (John)
  Cc: Dave Scott, Yanqiangjun, Ian Campbell, Paul Durrant, xen-devel

Hi,

Sorry for the delay replying!

[added Paul Durrant to cc:]

On 6 Dec 2013, at 03:53, Liuqiming (John) <john.liuqiming@huawei.com> wrote:

> How about we add a XS_RESET operation to xenstore and the semantic should be: reset the connection to initial state
> 
> The work flow as:
> 1, Hvmloader send this request to oxenstored and then poll the event
> 2,Oxenstored clear up IO ring and do some other work(if any) to make sure this connect reset to initial state
> 3,Oxenstored send a event to notify the reset work has finished
> 
> I think oxenstored should be the "owner" of this IO ring, so all the complicated operation should be done by oxenstord and other component should just send the request. 

I agree that (o)xenstored should be the “owner” of this ring and other components should just send requests.

Perhaps instead of sending a new type of request inside the ring, we could use some of the free space in the shared page to signal a reconnect? The current definition is:

/*
 * `incontents 150 xenstore_struct XenStore wire protocol.
 *
 * Inter-domain shared memory communications. */
#define XENSTORE_RING_SIZE 1024
typedef uint32_t XENSTORE_RING_IDX;
#define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
struct xenstore_domain_interface {
    char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
    char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
    XENSTORE_RING_IDX req_cons, req_prod;
    XENSTORE_RING_IDX rsp_cons, rsp_prod;
};

which means that 1024 + 1024 + 4 * 4 = 2064 bytes are used, and 2032 bytes are still free. I believe the 4k page is allocated at domain build time and the unused bytes are guaranteed to be 0 (— is this correct?)

Perhaps we could allocate 12 bytes for a version number and another pair of counters:

uint32_t ring_protocol_version;
uint32_t connection_request;
uint32_t connection_actual;

A new xenstored could set ‘ring_protocol_version’ to 1, old xenstoreds would leave it at 0.

If a new xenstore client saw the ring_protocol_version = 1, it would be able to request a reconnect by incrementing ‘connection_request’ and waiting xenstored to zero the ring and increment ‘connection_actual’.

We could then modify hvmloader to safely reset the ring before the guest boots. If a guest wants to jump into a crash kernel (with undefined xenstore ring state) it would also be able to request a reconnect and load PV drivers.

What do you think?

Cheers,
Dave

> 
>> -----Original Message-----
>> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
>> Sent: Thursday, December 05, 2013 7:11 PM
>> To: David Scott
>> Cc: Liuqiming (John); Yanqiangjun; xen-devel@lists.xen.org
>> Subject: Re: [Xen-devel] bug in hvmloader xenbus_shutdown logic
>> 
>> On Thu, 2013-12-05 at 09:36 +0000, David Scott wrote:
>>> On 05/12/13 02:08, Liuqiming (John) wrote:
>>>> According to oxenstored source code, oxenstored will read every
>> domain's IO ring no matter what events happened.
>>>> 
>>>> Here is the main loop of oxenstored:
>>>> 
>>>> let main_loop () =
>>> ...
>>>> 	process_domains store cons domains   <- no matter what event
>> income, this will handle IO ring request of all domains
>>>> 	in
>>>> 
>>>> so when one domain's hvmloader is clearing its IO ring, oxenstored may
>> access this IO ring because of another domain's event happened.
>>> 
>>> Yes, this version of the code checks all the interdomain rings every
>>> time it goes around the main loop. FWIW my experimental updated
>>> oxenstore uses one (user-level) thread per connection. I had thought
>>> this was just an efficiency issue... I didn't realise it had correctness
>>> implications.
>> 
>> TBH the protocol probably doesn't say anything about not looking unless
>> you've got a event, so I guess it is kind of implicit and I suppose it
>> could be argued that what you do is fine (except it's broken in
>> practice ;-))
>> 
>> Is there some well defined order in which we could clear the various
>> pointer fields safely? I'd need to get my ring macro head on to figure
>> that out I think.
>> 
>>>>> On Wed, 2013-12-04 at 04:25 +0000, Liuqiming (John) wrote:
>>>>> 
>>>>>> memset can not set all the page to zero in an atomic way, and during
>>>>>> the clear up process oxenstored may access this ring.
>>>>> 
>>>>> Why is oxenstored poking at the ring? Surely it should only do so when
>>>>> the guest (hvmloader) sends it a request. If hvmloader is clearing the
>>>>> page while there is a request/event outstanding then this is an
>>>>> hvmloader bug.
>>> 
>>> Ok, that makes sense.
>>> 
>>> My only hesitation is that violations of this rule (like with current
>>> oxenstored) show up rarely. Presumably the xenstore ring desynchronises
>>> and the guest will never be able to boot properly with PV drivers. I
>>> don't think I've ever seen this myself.
>>> 
>>> Do frontends expect the xenstore ring to be in a zeroed state when they
>>> startup? Assuming hvmloader has received all the outstanding
>>> request/events, would it be safe to leave the ring contents alone?
>> 
>> Most frontends also have private versions of various pointers which
>> would need to be synchronised, and I suspect all of the init to zero.
>> 
>> Ian.
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: bug in hvmloader xenbus_shutdown logic
  2014-06-16 13:43           ` Dave Scott
@ 2014-06-16 13:57             ` Paul Durrant
  2014-06-25 21:15               ` [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal David Scott
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Durrant @ 2014-06-16 13:57 UTC (permalink / raw)
  To: Dave Scott, Liuqiming (John); +Cc: Yanqiangjun, Ian Campbell, xen-devel

> -----Original Message-----
> From: Dave Scott
> Sent: 16 June 2014 14:44
> To: Liuqiming (John)
> Cc: Ian Campbell; Dave Scott; Yanqiangjun; xen-devel@lists.xen.org; Paul
> Durrant
> Subject: Re: [Xen-devel] bug in hvmloader xenbus_shutdown logic
> 
> Hi,
> 
> Sorry for the delay replying!
> 
> [added Paul Durrant to cc:]
> 
> On 6 Dec 2013, at 03:53, Liuqiming (John) <john.liuqiming@huawei.com>
> wrote:
> 
> > How about we add a XS_RESET operation to xenstore and the semantic
> should be: reset the connection to initial state
> >
> > The work flow as:
> > 1, Hvmloader send this request to oxenstored and then poll the event
> > 2,Oxenstored clear up IO ring and do some other work(if any) to make sure
> this connect reset to initial state
> > 3,Oxenstored send a event to notify the reset work has finished
> >
> > I think oxenstored should be the "owner" of this IO ring, so all the
> complicated operation should be done by oxenstord and other component
> should just send the request.
> 
> I agree that (o)xenstored should be the “owner” of this ring and other
> components should just send requests.
> 
> Perhaps instead of sending a new type of request inside the ring, we could
> use some of the free space in the shared page to signal a reconnect? The
> current definition is:
> 
> /*
>  * `incontents 150 xenstore_struct XenStore wire protocol.
>  *
>  * Inter-domain shared memory communications. */
> #define XENSTORE_RING_SIZE 1024
> typedef uint32_t XENSTORE_RING_IDX;
> #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
> struct xenstore_domain_interface {
>     char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
>     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>     XENSTORE_RING_IDX req_cons, req_prod;
>     XENSTORE_RING_IDX rsp_cons, rsp_prod;
> };
> 
> which means that 1024 + 1024 + 4 * 4 = 2064 bytes are used, and 2032 bytes
> are still free. I believe the 4k page is allocated at domain build time and the
> unused bytes are guaranteed to be 0 (— is this correct?)
> 
> Perhaps we could allocate 12 bytes for a version number and another pair of
> counters:
> 
> uint32_t ring_protocol_version;
> uint32_t connection_request;
> uint32_t connection_actual;
> 
> A new xenstored could set ‘ring_protocol_version’ to 1, old xenstoreds
> would leave it at 0.
> 
> If a new xenstore client saw the ring_protocol_version = 1, it would be able
> to request a reconnect by incrementing ‘connection_request’ and waiting
> xenstored to zero the ring and increment ‘connection_actual’.
> 
> We could then modify hvmloader to safely reset the ring before the guest
> boots. If a guest wants to jump into a crash kernel (with undefined xenstore
> ring state) it would also be able to request a reconnect and load PV drivers.
> 
> What do you think?
> 

I think it would be good if 'connection_actual' were an incarnation count that is sampled by the frontend to determine whether to trust the counter values e.g. in cases where it needs to wait for a response that may never come because xenstored has had some problem, or determines that a frontend has corrupted the ring. So, xenstored bumps it if it ever resets the ring either by request or unilaterally (and it's read-only to the guest). Then perhaps 'connection_request' should become 'connection_reset' which is written to 1 by a guest requesting reset and cleared by xenstored if and when it satisfies the request.

  Paul
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal
  2014-06-16 13:57             ` Paul Durrant
@ 2014-06-25 21:15               ` David Scott
  2014-06-26  9:05                 ` Paul Durrant
  2014-07-02 12:32                 ` [PATCH RFC] " Andrew Cooper
  0 siblings, 2 replies; 29+ messages in thread
From: David Scott @ 2014-06-25 21:15 UTC (permalink / raw)
  To: xen-devel
  Cc: john.liuqiming, David Scott, paul.durrant, ian.campbell, yanqiangjun

Currently hvmloader uses the xenstore ring and then tries to
reset it back to its initial state. This is not part of the
ring protocol and, if xenstored reads the ring while it is
happening, xenstored will conclude it is corrupted. A corrupted
ring will prevent PV drivers from connecting. This seems to
be a rare failure.

Furthermore, when a VM crashes it may jump to a 'crash kernel'
to create a diagnostic dump. Without the ability to safely
reset the ring the PV drivers won't be able to reliably
establish connections, to (for example) stream a memory dump to
disk.

This prototype patch contains a simple extension of the
xenstore ring structure, enough to contain version numbers and
a 'closing' flag. This memory is currently unused within the
4k page and should be zeroed as part of the domain setup
process. The oxenstored server advertises version 1, and
hvmloader detects this and sets the 'closing' flag. The server
then reinitialises the ring, filling it with obviously invalid
data to help debugging (unfortunately blocks of zeroes are
valid xenstore packets) and signals hvmloader by the event
channel that it's safe to boot the guest OS.

This patch has only been lightly tested. I'd appreciate
feedback on the approach before going further.

Signed-off-by: David Scott <dave.scott@citrix.com>
---
 tools/firmware/hvmloader/xenbus.c   |   16 +++++--
 tools/ocaml/libs/xb/xb.ml           |   26 ++++++++++-
 tools/ocaml/libs/xb/xb.mli          |    3 +-
 tools/ocaml/libs/xb/xs_ring.ml      |   13 ++++++
 tools/ocaml/libs/xb/xs_ring_stubs.c |   81 ++++++++++++++++++++++++++++++++++-
 xen/include/public/io/xs_wire.h     |    8 ++++
 6 files changed, 138 insertions(+), 9 deletions(-)

diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
index fe72e97..15d961b 100644
--- a/tools/firmware/hvmloader/xenbus.c
+++ b/tools/firmware/hvmloader/xenbus.c
@@ -37,6 +37,8 @@ static struct xenstore_domain_interface *rings; /* Shared ring with dom0 */
 static evtchn_port_t event;                     /* Event-channel to dom0 */
 static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area */
 
+static void ring_wait(void);
+
 /* Connect our xenbus client to the backend.
  * Call once, before any other xenbus actions. */
 void xenbus_setup(void)
@@ -68,10 +70,16 @@ void xenbus_shutdown(void)
 
     ASSERT(rings != NULL);
 
-    /* We zero out the whole ring -- the backend can handle this, and it's 
-     * not going to surprise any frontends since it's equivalent to never 
-     * having used the rings. */
-    memset(rings, 0, sizeof *rings);
+    if (rings->server_version > XENSTORE_VERSION_0) {
+        rings->closing = 1;
+        while (rings->closing == 1)
+            ring_wait ();
+    } else {
+        /* If the backend reads the state while we're erasing it then the
+           ring state will become corrupted, preventing guest frontends from
+           connecting. This is rare. */
+        memset(rings, 0, sizeof *rings);
+    }
 
     /* Clear the event-channel state too. */
     memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
index 29d354d..d5cd776 100644
--- a/tools/ocaml/libs/xb/xb.ml
+++ b/tools/ocaml/libs/xb/xb.ml
@@ -84,7 +84,26 @@ let write con s len =
 	| Fd backfd     -> write_fd backfd con s len
 	| Xenmmap backmmap -> write_mmap backmmap con s len
 
-let output con =
+(* If a function throws Xs_ring.Closing, then clear the ring state
+   and serve the ring again. *)
+let rec handle_closing f con =
+	match (try Some (f con) with Xs_ring.Closing -> None) with
+	| Some x -> x
+	| None ->
+		begin match con.backend with
+		| Fd _ -> raise Xs_ring.Closing (* should never happen, but just in case *)
+		| Xenmmap backend ->
+			Xs_ring.close backend.mmap;
+			backend.eventchn_notify ();
+			(* Clear our old connection state *)
+			Queue.clear con.pkt_in;
+			Queue.clear con.pkt_out;
+			con.partial_in <- init_partial_in ();
+			con.partial_out <- "";
+			handle_closing f con
+		end
+
+let output = handle_closing (fun con ->
 	(* get the output string from a string_of(packet) or partial_out *)
 	let s = if String.length con.partial_out > 0 then
 			con.partial_out
@@ -101,8 +120,9 @@ let output con =
 	);
 	(* after sending one packet, partial is empty *)
 	con.partial_out = ""
+)
 
-let input con =
+let input = handle_closing (fun con ->
 	let newpacket = ref false in
 	let to_read =
 		match con.partial_in with
@@ -133,6 +153,7 @@ let input con =
 			HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
 	);
 	!newpacket
+)
 
 let newcon backend = {
 	backend = backend;
@@ -145,6 +166,7 @@ let newcon backend = {
 let open_fd fd = newcon (Fd { fd = fd; })
 
 let open_mmap mmap notifyfct =
+	Xs_ring.set_server_version mmap 1; (* defined in xs_wire.h *)
 	newcon (Xenmmap {
 		mmap = mmap;
 		eventchn_notify = notifyfct;
diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
index 58234ae..af7ea5c 100644
--- a/tools/ocaml/libs/xb/xb.mli
+++ b/tools/ocaml/libs/xb/xb.mli
@@ -23,7 +23,7 @@ module Op :
       | Resume
       | Set_target
       | Restrict
-      | Invalid (* Not a valid wire operation *)
+      | Invalid
     val operation_c_mapping : operation array
     val size : int
     val array_search : 'a -> 'a array -> int
@@ -80,6 +80,7 @@ val read : t -> string -> int -> int
 val write_fd : backend_fd -> 'a -> string -> int -> int
 val write_mmap : backend_mmap -> 'a -> string -> int -> int
 val write : t -> string -> int -> int
+val handle_closing : (t -> 'a) -> t -> 'a
 val output : t -> bool
 val input : t -> bool
 val newcon : backend -> t
diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
index 9469609..dadf9e1 100644
--- a/tools/ocaml/libs/xb/xs_ring.ml
+++ b/tools/ocaml/libs/xb/xs_ring.ml
@@ -14,5 +14,18 @@
  * GNU Lesser General Public License for more details.
  *)
 
+exception Closing
+
+let _ =
+  Callback.register_exception "Xs_ring.Closing" Closing
+
 external read: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_read"
 external write: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_write"
+
+external set_client_version: Xenmmap.mmap_interface -> int -> unit = "ml_interface_set_client_version" "noalloc"
+external get_client_version: Xenmmap.mmap_interface -> int = "ml_interface_get_client_version" "noalloc"
+
+external set_server_version: Xenmmap.mmap_interface -> int -> unit = "ml_interface_set_server_version" "noalloc"
+external get_server_version: Xenmmap.mmap_interface -> int = "ml_interface_get_server_version" "noalloc"
+
+external close: Xenmmap.mmap_interface -> unit = "ml_interface_close" "noalloc"
diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
index 8bd1047..4ddf5a7 100644
--- a/tools/ocaml/libs/xb/xs_ring_stubs.c
+++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
@@ -35,19 +35,28 @@
 
 #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
 
+#define ERROR_UNKNOWN (-1)
+#define ERROR_CLOSING (-2)
+
 static int xs_ring_read(struct mmap_interface *interface,
                              char *buffer, int len)
 {
 	struct xenstore_domain_interface *intf = interface->addr;
 	XENSTORE_RING_IDX cons, prod; /* offsets only */
 	int to_read;
+        uint32_t closing;
 
 	cons = *(volatile uint32*)&intf->req_cons;
 	prod = *(volatile uint32*)&intf->req_prod;
+	closing = *(volatile uint32*)&intf->closing;
+
+	if (closing == 1)
+		return ERROR_CLOSING;
+
 	xen_mb();
 
 	if ((prod - cons) > XENSTORE_RING_SIZE)
-	    return -1;
+	    return ERROR_UNKNOWN;
 
 	if (prod == cons)
 		return 0;
@@ -71,9 +80,15 @@ static int xs_ring_write(struct mmap_interface *interface,
 	struct xenstore_domain_interface *intf = interface->addr;
 	XENSTORE_RING_IDX cons, prod;
 	int can_write;
+	uint32_t closing;
 
 	cons = *(volatile uint32*)&intf->rsp_cons;
 	prod = *(volatile uint32*)&intf->rsp_prod;
+	closing = *(volatile uint32*)&intf->closing;
+
+	if (closing == 1)
+		return ERROR_CLOSING;
+
 	xen_mb();
 	if ( (prod - cons) >= XENSTORE_RING_SIZE )
 		return 0;
@@ -97,8 +112,12 @@ CAMLprim value ml_interface_read(value interface, value buffer, value len)
 
 	res = xs_ring_read(GET_C_STRUCT(interface),
 	                   String_val(buffer), Int_val(len));
-	if (res == -1)
+	if (res == ERROR_UNKNOWN)
 		caml_failwith("bad connection");
+
+	if (res == ERROR_CLOSING)
+		caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
+
 	result = Val_int(res);
 	CAMLreturn(result);
 }
@@ -111,6 +130,64 @@ CAMLprim value ml_interface_write(value interface, value buffer, value len)
 
 	res = xs_ring_write(GET_C_STRUCT(interface),
 	                    String_val(buffer), Int_val(len));
+
+	if (res == ERROR_CLOSING)
+		caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
+
 	result = Val_int(res);
 	CAMLreturn(result);
 }
+
+CAMLprim value ml_interface_set_client_version(value interface, value v)
+{
+	CAMLparam2(interface, v);
+	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+
+	intf->client_version = Int_val(v);
+
+	CAMLreturn(Val_unit);
+}
+
+CAMLprim value ml_interface_get_client_version(value interface)
+{
+	CAMLparam1(interface);
+	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+
+	CAMLreturn(Val_int (intf->client_version));
+}
+
+CAMLprim value ml_interface_set_server_version(value interface, value v)
+{
+	CAMLparam2(interface, v);
+	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+
+	intf->server_version = Int_val(v);
+
+	CAMLreturn(Val_unit);
+}
+
+CAMLprim value ml_interface_get_server_version(value interface)
+{
+	CAMLparam1(interface);
+	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+
+	CAMLreturn(Val_int (intf->server_version));
+}
+
+CAMLprim value ml_interface_close(value interface)
+{
+	CAMLparam1(interface);
+	const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
+	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+	int i;
+
+	intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
+	/* Ensure the unused space is full of invalid xenstore packets. */
+	for (i = 0; i < XENSTORE_RING_SIZE; i++) {
+		intf->req[i] = invalid_data[i % sizeof(invalid_data)];
+		intf->rsp[i] = invalid_data[i % sizeof(invalid_data)];
+	}
+	xen_mb ();
+	intf->closing = 0;
+	CAMLreturn(Val_unit);
+}
diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
index 585f0c8..68460cc 100644
--- a/xen/include/public/io/xs_wire.h
+++ b/xen/include/public/io/xs_wire.h
@@ -104,6 +104,9 @@ enum xs_watch_type
     XS_WATCH_TOKEN
 };
 
+#define XENSTORE_VERSION_0 0
+#define XENSTORE_VERSION_1 1
+
 /*
  * `incontents 150 xenstore_struct XenStore wire protocol.
  *
@@ -112,10 +115,15 @@ enum xs_watch_type
 typedef uint32_t XENSTORE_RING_IDX;
 #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
 struct xenstore_domain_interface {
+    /* XENSTORE_VERSION_0 */
     char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
     XENSTORE_RING_IDX req_cons, req_prod;
     XENSTORE_RING_IDX rsp_cons, rsp_prod;
+    uint32_t client_version;
+    uint32_t server_version;
+    /* XENSTORE_VERSION_1 */
+    uint32_t closing;
 };
 
 /* Violating this is very bad.  See docs/misc/xenstore.txt. */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal
  2014-06-25 21:15               ` [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal David Scott
@ 2014-06-26  9:05                 ` Paul Durrant
  2014-06-26  9:45                   ` Dave Scott
  2014-07-02 12:32                 ` [PATCH RFC] " Andrew Cooper
  1 sibling, 1 reply; 29+ messages in thread
From: Paul Durrant @ 2014-06-26  9:05 UTC (permalink / raw)
  To: xen-devel; +Cc: john.liuqiming, Dave Scott, yanqiangjun, Ian Campbell

> -----Original Message-----
> From: David Scott [mailto:dave.scott@citrix.com]
> Sent: 25 June 2014 22:15
> To: xen-devel@lists.xenproject.org
> Cc: Paul Durrant; Ian Campbell; john.liuqiming@huawei.com;
> yanqiangjun@huawei.com; Dave Scott
> Subject: [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal
> 
> Currently hvmloader uses the xenstore ring and then tries to
> reset it back to its initial state. This is not part of the
> ring protocol and, if xenstored reads the ring while it is
> happening, xenstored will conclude it is corrupted. A corrupted
> ring will prevent PV drivers from connecting. This seems to
> be a rare failure.
> 
> Furthermore, when a VM crashes it may jump to a 'crash kernel'
> to create a diagnostic dump. Without the ability to safely
> reset the ring the PV drivers won't be able to reliably
> establish connections, to (for example) stream a memory dump to
> disk.
> 
> This prototype patch contains a simple extension of the
> xenstore ring structure, enough to contain version numbers and
> a 'closing' flag. This memory is currently unused within the
> 4k page and should be zeroed as part of the domain setup
> process. The oxenstored server advertises version 1, and
> hvmloader detects this and sets the 'closing' flag. The server
> then reinitialises the ring, filling it with obviously invalid
> data to help debugging (unfortunately blocks of zeroes are
> valid xenstore packets) and signals hvmloader by the event
> channel that it's safe to boot the guest OS.
> 
> This patch has only been lightly tested. I'd appreciate
> feedback on the approach before going further.
> 
> Signed-off-by: David Scott <dave.scott@citrix.com>
> ---
>  tools/firmware/hvmloader/xenbus.c   |   16 +++++--
>  tools/ocaml/libs/xb/xb.ml           |   26 ++++++++++-
>  tools/ocaml/libs/xb/xb.mli          |    3 +-
>  tools/ocaml/libs/xb/xs_ring.ml      |   13 ++++++
>  tools/ocaml/libs/xb/xs_ring_stubs.c |   81
> ++++++++++++++++++++++++++++++++++-
>  xen/include/public/io/xs_wire.h     |    8 ++++
>  6 files changed, 138 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/xenbus.c
> b/tools/firmware/hvmloader/xenbus.c
> index fe72e97..15d961b 100644
> --- a/tools/firmware/hvmloader/xenbus.c
> +++ b/tools/firmware/hvmloader/xenbus.c
> @@ -37,6 +37,8 @@ static struct xenstore_domain_interface *rings; /*
> Shared ring with dom0 */
>  static evtchn_port_t event;                     /* Event-channel to dom0 */
>  static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area
> */
> 
> +static void ring_wait(void);
> +

Personally, I'd move the definition of ring_wait() up rather than forward declaring it.

>  /* Connect our xenbus client to the backend.
>   * Call once, before any other xenbus actions. */
>  void xenbus_setup(void)
> @@ -68,10 +70,16 @@ void xenbus_shutdown(void)
> 
>      ASSERT(rings != NULL);
> 
> -    /* We zero out the whole ring -- the backend can handle this, and it's
> -     * not going to surprise any frontends since it's equivalent to never
> -     * having used the rings. */
> -    memset(rings, 0, sizeof *rings);
> +    if (rings->server_version > XENSTORE_VERSION_0) {

I notice in the changes to xs_wire.h below you have a client_version field. Should that not be set here too?

> +        rings->closing = 1;
> +        while (rings->closing == 1)
> +            ring_wait ();
> +    } else {
> +        /* If the backend reads the state while we're erasing it then the
> +           ring state will become corrupted, preventing guest frontends from
> +           connecting. This is rare. */
> +        memset(rings, 0, sizeof *rings);
> +    }
> 
>      /* Clear the event-channel state too. */
>      memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
> index 29d354d..d5cd776 100644
> --- a/tools/ocaml/libs/xb/xb.ml
> +++ b/tools/ocaml/libs/xb/xb.ml
> @@ -84,7 +84,26 @@ let write con s len =
>  	| Fd backfd     -> write_fd backfd con s len
>  	| Xenmmap backmmap -> write_mmap backmmap con s len
> 
> -let output con =
> +(* If a function throws Xs_ring.Closing, then clear the ring state
> +   and serve the ring again. *)
> +let rec handle_closing f con =
> +	match (try Some (f con) with Xs_ring.Closing -> None) with
> +	| Some x -> x
> +	| None ->
> +		begin match con.backend with
> +		| Fd _ -> raise Xs_ring.Closing (* should never happen, but
> just in case *)
> +		| Xenmmap backend ->
> +			Xs_ring.close backend.mmap;
> +			backend.eventchn_notify ();
> +			(* Clear our old connection state *)
> +			Queue.clear con.pkt_in;
> +			Queue.clear con.pkt_out;
> +			con.partial_in <- init_partial_in ();
> +			con.partial_out <- "";
> +			handle_closing f con
> +		end
> +
> +let output = handle_closing (fun con ->
>  	(* get the output string from a string_of(packet) or partial_out *)
>  	let s = if String.length con.partial_out > 0 then
>  			con.partial_out
> @@ -101,8 +120,9 @@ let output con =
>  	);
>  	(* after sending one packet, partial is empty *)
>  	con.partial_out = ""
> +)
> 
> -let input con =
> +let input = handle_closing (fun con ->
>  	let newpacket = ref false in
>  	let to_read =
>  		match con.partial_in with
> @@ -133,6 +153,7 @@ let input con =
>  			HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
>  	);
>  	!newpacket
> +)
> 
>  let newcon backend = {
>  	backend = backend;
> @@ -145,6 +166,7 @@ let newcon backend = {
>  let open_fd fd = newcon (Fd { fd = fd; })
> 
>  let open_mmap mmap notifyfct =
> +	Xs_ring.set_server_version mmap 1; (* defined in xs_wire.h *)
>  	newcon (Xenmmap {
>  		mmap = mmap;
>  		eventchn_notify = notifyfct;
> diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
> index 58234ae..af7ea5c 100644
> --- a/tools/ocaml/libs/xb/xb.mli
> +++ b/tools/ocaml/libs/xb/xb.mli
> @@ -23,7 +23,7 @@ module Op :
>        | Resume
>        | Set_target
>        | Restrict
> -      | Invalid (* Not a valid wire operation *)
> +      | Invalid
>      val operation_c_mapping : operation array
>      val size : int
>      val array_search : 'a -> 'a array -> int
> @@ -80,6 +80,7 @@ val read : t -> string -> int -> int
>  val write_fd : backend_fd -> 'a -> string -> int -> int
>  val write_mmap : backend_mmap -> 'a -> string -> int -> int
>  val write : t -> string -> int -> int
> +val handle_closing : (t -> 'a) -> t -> 'a
>  val output : t -> bool
>  val input : t -> bool
>  val newcon : backend -> t
> diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
> index 9469609..dadf9e1 100644
> --- a/tools/ocaml/libs/xb/xs_ring.ml
> +++ b/tools/ocaml/libs/xb/xs_ring.ml
> @@ -14,5 +14,18 @@
>   * GNU Lesser General Public License for more details.
>   *)
> 
> +exception Closing
> +
> +let _ =
> +  Callback.register_exception "Xs_ring.Closing" Closing
> +
>  external read: Xenmmap.mmap_interface -> string -> int -> int =
> "ml_interface_read"
>  external write: Xenmmap.mmap_interface -> string -> int -> int =
> "ml_interface_write"
> +
> +external set_client_version: Xenmmap.mmap_interface -> int -> unit =
> "ml_interface_set_client_version" "noalloc"
> +external get_client_version: Xenmmap.mmap_interface -> int =
> "ml_interface_get_client_version" "noalloc"
> +
> +external set_server_version: Xenmmap.mmap_interface -> int -> unit =
> "ml_interface_set_server_version" "noalloc"
> +external get_server_version: Xenmmap.mmap_interface -> int =
> "ml_interface_get_server_version" "noalloc"
> +
> +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close"
> "noalloc"
> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
> b/tools/ocaml/libs/xb/xs_ring_stubs.c
> index 8bd1047..4ddf5a7 100644
> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> @@ -35,19 +35,28 @@
> 
>  #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
> 
> +#define ERROR_UNKNOWN (-1)
> +#define ERROR_CLOSING (-2)
> +
>  static int xs_ring_read(struct mmap_interface *interface,
>                               char *buffer, int len)
>  {
>  	struct xenstore_domain_interface *intf = interface->addr;
>  	XENSTORE_RING_IDX cons, prod; /* offsets only */
>  	int to_read;
> +        uint32_t closing;
> 
>  	cons = *(volatile uint32*)&intf->req_cons;
>  	prod = *(volatile uint32*)&intf->req_prod;
> +	closing = *(volatile uint32*)&intf->closing;
> +
> +	if (closing == 1)

Do you really mean if == 1 here? It's a flag, so != 0 would seem better. Also, should you test the client version?

> +		return ERROR_CLOSING;
> +
>  	xen_mb();
> 
>  	if ((prod - cons) > XENSTORE_RING_SIZE)
> -	    return -1;
> +	    return ERROR_UNKNOWN;
> 
>  	if (prod == cons)
>  		return 0;
> @@ -71,9 +80,15 @@ static int xs_ring_write(struct mmap_interface
> *interface,
>  	struct xenstore_domain_interface *intf = interface->addr;
>  	XENSTORE_RING_IDX cons, prod;
>  	int can_write;
> +	uint32_t closing;
> 
>  	cons = *(volatile uint32*)&intf->rsp_cons;
>  	prod = *(volatile uint32*)&intf->rsp_prod;
> +	closing = *(volatile uint32*)&intf->closing;
> +
> +	if (closing == 1)

Same again.

> +		return ERROR_CLOSING;
> +
>  	xen_mb();
>  	if ( (prod - cons) >= XENSTORE_RING_SIZE )
>  		return 0;
> @@ -97,8 +112,12 @@ CAMLprim value ml_interface_read(value interface,
> value buffer, value len)
> 
>  	res = xs_ring_read(GET_C_STRUCT(interface),
>  	                   String_val(buffer), Int_val(len));
> -	if (res == -1)
> +	if (res == ERROR_UNKNOWN)
>  		caml_failwith("bad connection");
> +
> +	if (res == ERROR_CLOSING)
> +
> 	caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
> +
>  	result = Val_int(res);
>  	CAMLreturn(result);
>  }
> @@ -111,6 +130,64 @@ CAMLprim value ml_interface_write(value interface,
> value buffer, value len)
> 
>  	res = xs_ring_write(GET_C_STRUCT(interface),
>  	                    String_val(buffer), Int_val(len));
> +
> +	if (res == ERROR_CLOSING)
> +
> 	caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
> +
>  	result = Val_int(res);
>  	CAMLreturn(result);
>  }
> +
> +CAMLprim value ml_interface_set_client_version(value interface, value v)
> +{
> +	CAMLparam2(interface, v);
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +
> +	intf->client_version = Int_val(v);

Why would the server ever set the client version?

> +
> +	CAMLreturn(Val_unit);
> +}
> +
> +CAMLprim value ml_interface_get_client_version(value interface)
> +{
> +	CAMLparam1(interface);
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +
> +	CAMLreturn(Val_int (intf->client_version));
> +}
> +
> +CAMLprim value ml_interface_set_server_version(value interface, value v)
> +{
> +	CAMLparam2(interface, v);
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +
> +	intf->server_version = Int_val(v);
> +
> +	CAMLreturn(Val_unit);
> +}
> +
> +CAMLprim value ml_interface_get_server_version(value interface)
> +{
> +	CAMLparam1(interface);
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +
> +	CAMLreturn(Val_int (intf->server_version));
> +}
> +
> +CAMLprim value ml_interface_close(value interface)
> +{
> +	CAMLparam1(interface);
> +	const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +	int i;
> +
> +	intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod =
> 0;
> +	/* Ensure the unused space is full of invalid xenstore packets. */
> +	for (i = 0; i < XENSTORE_RING_SIZE; i++) {
> +		intf->req[i] = invalid_data[i % sizeof(invalid_data)];
> +		intf->rsp[i] = invalid_data[i % sizeof(invalid_data)];
> +	}
> +	xen_mb ();
> +	intf->closing = 0;
> +	CAMLreturn(Val_unit);
> +}
> diff --git a/xen/include/public/io/xs_wire.h
> b/xen/include/public/io/xs_wire.h
> index 585f0c8..68460cc 100644
> --- a/xen/include/public/io/xs_wire.h
> +++ b/xen/include/public/io/xs_wire.h
> @@ -104,6 +104,9 @@ enum xs_watch_type
>      XS_WATCH_TOKEN
>  };
> 
> +#define XENSTORE_VERSION_0 0
> +#define XENSTORE_VERSION_1 1
> +
>  /*
>   * `incontents 150 xenstore_struct XenStore wire protocol.
>   *
> @@ -112,10 +115,15 @@ enum xs_watch_type
>  typedef uint32_t XENSTORE_RING_IDX;
>  #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
>  struct xenstore_domain_interface {
> +    /* XENSTORE_VERSION_0 */
>      char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
>      char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>      XENSTORE_RING_IDX req_cons, req_prod;
>      XENSTORE_RING_IDX rsp_cons, rsp_prod;
> +    uint32_t client_version;
> +    uint32_t server_version;

Should any values below perhaps be part of a union so that future versions may replace them, or do you intend that future versions should only extend the set of fields?

  Paul

> +    /* XENSTORE_VERSION_1 */
> +    uint32_t closing;
>  };
> 
>  /* Violating this is very bad.  See docs/misc/xenstore.txt. */
> --
> 1.7.10.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal
  2014-06-26  9:05                 ` Paul Durrant
@ 2014-06-26  9:45                   ` Dave Scott
  2014-06-30 14:42                     ` [PATCH v2] " David Scott
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Scott @ 2014-06-26  9:45 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, yanqiangjun, Ian Campbell, john.liuqiming

Hi Paul,

Thanks for the comments.

On 26 Jun 2014, at 10:05, Paul Durrant <Paul.Durrant@citrix.com> wrote:

>> -----Original Message-----
>> From: David Scott [mailto:dave.scott@citrix.com]
>> Sent: 25 June 2014 22:15
>> To: xen-devel@lists.xenproject.org
>> Cc: Paul Durrant; Ian Campbell; john.liuqiming@huawei.com;
>> yanqiangjun@huawei.com; Dave Scott
>> Subject: [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal
>> 
>> Currently hvmloader uses the xenstore ring and then tries to
>> reset it back to its initial state. This is not part of the
>> ring protocol and, if xenstored reads the ring while it is
>> happening, xenstored will conclude it is corrupted. A corrupted
>> ring will prevent PV drivers from connecting. This seems to
>> be a rare failure.
>> 
>> Furthermore, when a VM crashes it may jump to a 'crash kernel'
>> to create a diagnostic dump. Without the ability to safely
>> reset the ring the PV drivers won't be able to reliably
>> establish connections, to (for example) stream a memory dump to
>> disk.
>> 
>> This prototype patch contains a simple extension of the
>> xenstore ring structure, enough to contain version numbers and
>> a 'closing' flag. This memory is currently unused within the
>> 4k page and should be zeroed as part of the domain setup
>> process. The oxenstored server advertises version 1, and
>> hvmloader detects this and sets the 'closing' flag. The server
>> then reinitialises the ring, filling it with obviously invalid
>> data to help debugging (unfortunately blocks of zeroes are
>> valid xenstore packets) and signals hvmloader by the event
>> channel that it's safe to boot the guest OS.
>> 
>> This patch has only been lightly tested. I'd appreciate
>> feedback on the approach before going further.
>> 
>> Signed-off-by: David Scott <dave.scott@citrix.com>
>> ---
>> tools/firmware/hvmloader/xenbus.c   |   16 +++++--
>> tools/ocaml/libs/xb/xb.ml           |   26 ++++++++++-
>> tools/ocaml/libs/xb/xb.mli          |    3 +-
>> tools/ocaml/libs/xb/xs_ring.ml      |   13 ++++++
>> tools/ocaml/libs/xb/xs_ring_stubs.c |   81
>> ++++++++++++++++++++++++++++++++++-
>> xen/include/public/io/xs_wire.h     |    8 ++++
>> 6 files changed, 138 insertions(+), 9 deletions(-)
>> 
>> diff --git a/tools/firmware/hvmloader/xenbus.c
>> b/tools/firmware/hvmloader/xenbus.c
>> index fe72e97..15d961b 100644
>> --- a/tools/firmware/hvmloader/xenbus.c
>> +++ b/tools/firmware/hvmloader/xenbus.c
>> @@ -37,6 +37,8 @@ static struct xenstore_domain_interface *rings; /*
>> Shared ring with dom0 */
>> static evtchn_port_t event;                     /* Event-channel to dom0 */
>> static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area
>> */
>> 
>> +static void ring_wait(void);
>> +
> 
> Personally, I'd move the definition of ring_wait() up rather than forward declaring it.
> 
>> /* Connect our xenbus client to the backend.
>>  * Call once, before any other xenbus actions. */
>> void xenbus_setup(void)
>> @@ -68,10 +70,16 @@ void xenbus_shutdown(void)
>> 
>>     ASSERT(rings != NULL);
>> 
>> -    /* We zero out the whole ring -- the backend can handle this, and it's
>> -     * not going to surprise any frontends since it's equivalent to never
>> -     * having used the rings. */
>> -    memset(rings, 0, sizeof *rings);
>> +    if (rings->server_version > XENSTORE_VERSION_0) {
> 
> I notice in the changes to xs_wire.h below you have a client_version field. Should that not be set here too?

I added fields for a client version and a server version initially, thinking it sounded like a good idea (symmetry and all that). Looking at it now, it might be a premature generalisation? We only need the server_version in this particular case since only the client needs to request a close/reopen.

I agree with you that, if we have a client_version field we should set it and test it. Or drop it :-)

> 
>> +        rings->closing = 1;
>> +        while (rings->closing == 1)
>> +            ring_wait ();
>> +    } else {
>> +        /* If the backend reads the state while we're erasing it then the
>> +           ring state will become corrupted, preventing guest frontends from
>> +           connecting. This is rare. */
>> +        memset(rings, 0, sizeof *rings);
>> +    }
>> 
>>     /* Clear the event-channel state too. */
>>     memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
>> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
>> index 29d354d..d5cd776 100644
>> --- a/tools/ocaml/libs/xb/xb.ml
>> +++ b/tools/ocaml/libs/xb/xb.ml
>> @@ -84,7 +84,26 @@ let write con s len =
>> 	| Fd backfd     -> write_fd backfd con s len
>> 	| Xenmmap backmmap -> write_mmap backmmap con s len
>> 
>> -let output con =
>> +(* If a function throws Xs_ring.Closing, then clear the ring state
>> +   and serve the ring again. *)
>> +let rec handle_closing f con =
>> +	match (try Some (f con) with Xs_ring.Closing -> None) with
>> +	| Some x -> x
>> +	| None ->
>> +		begin match con.backend with
>> +		| Fd _ -> raise Xs_ring.Closing (* should never happen, but
>> just in case *)
>> +		| Xenmmap backend ->
>> +			Xs_ring.close backend.mmap;
>> +			backend.eventchn_notify ();
>> +			(* Clear our old connection state *)
>> +			Queue.clear con.pkt_in;
>> +			Queue.clear con.pkt_out;
>> +			con.partial_in <- init_partial_in ();
>> +			con.partial_out <- "";
>> +			handle_closing f con
>> +		end
>> +
>> +let output = handle_closing (fun con ->
>> 	(* get the output string from a string_of(packet) or partial_out *)
>> 	let s = if String.length con.partial_out > 0 then
>> 			con.partial_out
>> @@ -101,8 +120,9 @@ let output con =
>> 	);
>> 	(* after sending one packet, partial is empty *)
>> 	con.partial_out = ""
>> +)
>> 
>> -let input con =
>> +let input = handle_closing (fun con ->
>> 	let newpacket = ref false in
>> 	let to_read =
>> 		match con.partial_in with
>> @@ -133,6 +153,7 @@ let input con =
>> 			HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
>> 	);
>> 	!newpacket
>> +)
>> 
>> let newcon backend = {
>> 	backend = backend;
>> @@ -145,6 +166,7 @@ let newcon backend = {
>> let open_fd fd = newcon (Fd { fd = fd; })
>> 
>> let open_mmap mmap notifyfct =
>> +	Xs_ring.set_server_version mmap 1; (* defined in xs_wire.h *)
>> 	newcon (Xenmmap {
>> 		mmap = mmap;
>> 		eventchn_notify = notifyfct;
>> diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
>> index 58234ae..af7ea5c 100644
>> --- a/tools/ocaml/libs/xb/xb.mli
>> +++ b/tools/ocaml/libs/xb/xb.mli
>> @@ -23,7 +23,7 @@ module Op :
>>       | Resume
>>       | Set_target
>>       | Restrict
>> -      | Invalid (* Not a valid wire operation *)
>> +      | Invalid
>>     val operation_c_mapping : operation array
>>     val size : int
>>     val array_search : 'a -> 'a array -> int
>> @@ -80,6 +80,7 @@ val read : t -> string -> int -> int
>> val write_fd : backend_fd -> 'a -> string -> int -> int
>> val write_mmap : backend_mmap -> 'a -> string -> int -> int
>> val write : t -> string -> int -> int
>> +val handle_closing : (t -> 'a) -> t -> 'a
>> val output : t -> bool
>> val input : t -> bool
>> val newcon : backend -> t
>> diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
>> index 9469609..dadf9e1 100644
>> --- a/tools/ocaml/libs/xb/xs_ring.ml
>> +++ b/tools/ocaml/libs/xb/xs_ring.ml
>> @@ -14,5 +14,18 @@
>>  * GNU Lesser General Public License for more details.
>>  *)
>> 
>> +exception Closing
>> +
>> +let _ =
>> +  Callback.register_exception "Xs_ring.Closing" Closing
>> +
>> external read: Xenmmap.mmap_interface -> string -> int -> int =
>> "ml_interface_read"
>> external write: Xenmmap.mmap_interface -> string -> int -> int =
>> "ml_interface_write"
>> +
>> +external set_client_version: Xenmmap.mmap_interface -> int -> unit =
>> "ml_interface_set_client_version" "noalloc"
>> +external get_client_version: Xenmmap.mmap_interface -> int =
>> "ml_interface_get_client_version" "noalloc"
>> +
>> +external set_server_version: Xenmmap.mmap_interface -> int -> unit =
>> "ml_interface_set_server_version" "noalloc"
>> +external get_server_version: Xenmmap.mmap_interface -> int =
>> "ml_interface_get_server_version" "noalloc"
>> +
>> +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close"
>> "noalloc"
>> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
>> b/tools/ocaml/libs/xb/xs_ring_stubs.c
>> index 8bd1047..4ddf5a7 100644
>> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
>> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
>> @@ -35,19 +35,28 @@
>> 
>> #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
>> 
>> +#define ERROR_UNKNOWN (-1)
>> +#define ERROR_CLOSING (-2)
>> +
>> static int xs_ring_read(struct mmap_interface *interface,
>>                              char *buffer, int len)
>> {
>> 	struct xenstore_domain_interface *intf = interface->addr;
>> 	XENSTORE_RING_IDX cons, prod; /* offsets only */
>> 	int to_read;
>> +        uint32_t closing;
>> 
>> 	cons = *(volatile uint32*)&intf->req_cons;
>> 	prod = *(volatile uint32*)&intf->req_prod;
>> +	closing = *(volatile uint32*)&intf->closing;
>> +
>> +	if (closing == 1)
> 
> Do you really mean if == 1 here? It's a flag, so != 0 would seem better. Also, should you test the client version?

!= 0 would be fine by me.

> 
>> +		return ERROR_CLOSING;
>> +
>> 	xen_mb();
>> 
>> 	if ((prod - cons) > XENSTORE_RING_SIZE)
>> -	    return -1;
>> +	    return ERROR_UNKNOWN;
>> 
>> 	if (prod == cons)
>> 		return 0;
>> @@ -71,9 +80,15 @@ static int xs_ring_write(struct mmap_interface
>> *interface,
>> 	struct xenstore_domain_interface *intf = interface->addr;
>> 	XENSTORE_RING_IDX cons, prod;
>> 	int can_write;
>> +	uint32_t closing;
>> 
>> 	cons = *(volatile uint32*)&intf->rsp_cons;
>> 	prod = *(volatile uint32*)&intf->rsp_prod;
>> +	closing = *(volatile uint32*)&intf->closing;
>> +
>> +	if (closing == 1)
> 
> Same again.
> 
>> +		return ERROR_CLOSING;
>> +
>> 	xen_mb();
>> 	if ( (prod - cons) >= XENSTORE_RING_SIZE )
>> 		return 0;
>> @@ -97,8 +112,12 @@ CAMLprim value ml_interface_read(value interface,
>> value buffer, value len)
>> 
>> 	res = xs_ring_read(GET_C_STRUCT(interface),
>> 	                   String_val(buffer), Int_val(len));
>> -	if (res == -1)
>> +	if (res == ERROR_UNKNOWN)
>> 		caml_failwith("bad connection");
>> +
>> +	if (res == ERROR_CLOSING)
>> +
>> 	caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
>> +
>> 	result = Val_int(res);
>> 	CAMLreturn(result);
>> }
>> @@ -111,6 +130,64 @@ CAMLprim value ml_interface_write(value interface,
>> value buffer, value len)
>> 
>> 	res = xs_ring_write(GET_C_STRUCT(interface),
>> 	                    String_val(buffer), Int_val(len));
>> +
>> +	if (res == ERROR_CLOSING)
>> +
>> 	caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
>> +
>> 	result = Val_int(res);
>> 	CAMLreturn(result);
>> }
>> +
>> +CAMLprim value ml_interface_set_client_version(value interface, value v)
>> +{
>> +	CAMLparam2(interface, v);
>> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +
>> +	intf->client_version = Int_val(v);
> 
> Why would the server ever set the client version?

This is dead code which is only present for symmetry. It certainly could be removed :-)

> 
>> +
>> +	CAMLreturn(Val_unit);
>> +}
>> +
>> +CAMLprim value ml_interface_get_client_version(value interface)
>> +{
>> +	CAMLparam1(interface);
>> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +
>> +	CAMLreturn(Val_int (intf->client_version));
>> +}
>> +
>> +CAMLprim value ml_interface_set_server_version(value interface, value v)
>> +{
>> +	CAMLparam2(interface, v);
>> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +
>> +	intf->server_version = Int_val(v);
>> +
>> +	CAMLreturn(Val_unit);
>> +}
>> +
>> +CAMLprim value ml_interface_get_server_version(value interface)
>> +{
>> +	CAMLparam1(interface);
>> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +
>> +	CAMLreturn(Val_int (intf->server_version));
>> +}
>> +
>> +CAMLprim value ml_interface_close(value interface)
>> +{
>> +	CAMLparam1(interface);
>> +	const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
>> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +	int i;
>> +
>> +	intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod =
>> 0;
>> +	/* Ensure the unused space is full of invalid xenstore packets. */
>> +	for (i = 0; i < XENSTORE_RING_SIZE; i++) {
>> +		intf->req[i] = invalid_data[i % sizeof(invalid_data)];
>> +		intf->rsp[i] = invalid_data[i % sizeof(invalid_data)];
>> +	}
>> +	xen_mb ();
>> +	intf->closing = 0;
>> +	CAMLreturn(Val_unit);
>> +}
>> diff --git a/xen/include/public/io/xs_wire.h
>> b/xen/include/public/io/xs_wire.h
>> index 585f0c8..68460cc 100644
>> --- a/xen/include/public/io/xs_wire.h
>> +++ b/xen/include/public/io/xs_wire.h
>> @@ -104,6 +104,9 @@ enum xs_watch_type
>>     XS_WATCH_TOKEN
>> };
>> 
>> +#define XENSTORE_VERSION_0 0
>> +#define XENSTORE_VERSION_1 1
>> +
>> /*
>>  * `incontents 150 xenstore_struct XenStore wire protocol.
>>  *
>> @@ -112,10 +115,15 @@ enum xs_watch_type
>> typedef uint32_t XENSTORE_RING_IDX;
>> #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
>> struct xenstore_domain_interface {
>> +    /* XENSTORE_VERSION_0 */
>>     char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
>>     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>>     XENSTORE_RING_IDX req_cons, req_prod;
>>     XENSTORE_RING_IDX rsp_cons, rsp_prod;
>> +    uint32_t client_version;
>> +    uint32_t server_version;
> 
> Should any values below perhaps be part of a union so that future versions may replace them, or do you intend that future versions should only extend the set of fields?

Good question, I don’t know what’s best here.

I notice that the ring structure has been constant for a long time; maybe it would never change again? Or is that wishful thinking? We’ve also got just under 2k of space left, so we could do quite a lot of extending if necessary.

> 
>  Paul
> 
>> +    /* XENSTORE_VERSION_1 */
>> +    uint32_t closing;
>> };
>> 
>> /* Violating this is very bad.  See docs/misc/xenstore.txt. */

(I thought this was an appropriate last line of the diff :-)

Cheers,
Dave

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2] RFC: extend the xenstore ring with a 'closing' signal
  2014-06-26  9:45                   ` Dave Scott
@ 2014-06-30 14:42                     ` David Scott
  2014-07-03 15:15                       ` [PATCH v3] " David Scott
  2014-07-04 11:02                       ` [PATCH v2] RFC: " Ian Jackson
  0 siblings, 2 replies; 29+ messages in thread
From: David Scott @ 2014-06-30 14:42 UTC (permalink / raw)
  To: xen-devel
  Cc: john.liuqiming, David Scott, paul.durrant, ian.campbell, yanqiangjun

Currently hvmloader uses the xenstore ring and then tries to
reset it back to its initial state. This is not part of the
ring protocol and, if xenstored reads the ring while it is
happening, xenstored will conclude it is corrupted. A corrupted
ring will prevent PV drivers from connecting. This seems to
be a rare failure.

Furthermore, when a VM crashes it may jump to a 'crash kernel'
to create a diagnostic dump. Without the ability to safely
reset the ring the PV drivers won't be able to reliably
establish connections, to (for example) stream a memory dump to
disk.

This prototype patch contains a simple extension of the
xenstore ring structure, enough to contain version numbers and
a 'closing' flag. This memory is currently unused within the
4k page and should be zeroed as part of the domain setup
process. The oxenstored server advertises version 1, and
hvmloader detects this and sets the 'closing' flag. The server
then reinitialises the ring, filling it with obviously invalid
data to help debugging (unfortunately blocks of zeroes are
valid xenstore packets) and signals hvmloader by the event
channel that it's safe to boot the guest OS.

Updates since initial version (thanks to Paul Durrant for the
review):
* remove unused client version and associated dead code
* treat 'closing' as a flag by using "!=0" rather than "==1"
* hvmloader: move function up and remove forward decl
* document the existing xenstore ring and the extention in misc/

Signed-off-by: David Scott <dave.scott@citrix.com>
---
 docs/misc/xenstore-ring.txt         |   79 +++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/xenbus.c   |   40 ++++++++++--------
 tools/ocaml/libs/xb/xb.ml           |   26 +++++++++++-
 tools/ocaml/libs/xb/xb.mli          |    1 +
 tools/ocaml/libs/xb/xs_ring.ml      |   11 +++++
 tools/ocaml/libs/xb/xs_ring_stubs.c |   63 +++++++++++++++++++++++++++-
 xen/include/public/io/xs_wire.h     |    8 ++++
 7 files changed, 207 insertions(+), 21 deletions(-)
 create mode 100644 docs/misc/xenstore-ring.txt

diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
new file mode 100644
index 0000000..df2a09f
--- /dev/null
+++ b/docs/misc/xenstore-ring.txt
@@ -0,0 +1,79 @@
+
+The domain builder allocates a single page and shares it with the xenstore
+daemon. This document describes the ring protocol used to communicate via
+this page which is used to transmit and receive
+[xenstore protocol packets](xenstore.txt).
+
+In the original version (we call this "version 0"), the shared page has the
+following contents (all offsets and lengths are in octets):
+
+Offset  Length  Description
+-----------------------------------------------------------------
+0       1024    Requests: data sent to the xenstore daemon
+1024    1024    Replies: data sent to the domain
+2048    4       Request consumer offset
+2052    4       Request producer offset
+2056    4       Response consumer offset
+2060    4       Response producer offset
+
+When the page is allocated it is guaranteed to be full of zeroes.
+
+The "Requests" and "Replies" are treated as circular buffers, one for
+each direction. Each circular buffer is associated wth a producer and
+consumer offset, which are free-running counters starting from 0. A "producer"
+offset is the offset in the byte stream of the next byte to be written; a
+"consumer" offset is the offset in the byte stream of the next byte to be
+read. The byte at offset 'x' in the byte stream will be stored in
+offset 'x mod 1024' in the circular buffer. "producer - consumer" gives
+the number of bytes still to be read, and "1024 - (producer - consumer)"
+therefore gives the amount of space currently available for writing,
+where we must avoid overwriting unread data.
+
+The circular buffers are only used to send sequences of bytes between domains.
+It is the responsibility of the layer above to segment these bytes into
+packets, as described in [xenstore.txt](xenstore.txt).
+
+The client and server domains can run concurrently on different cores and
+different sockets. We must therefore take care to avoid corruption by:
+
+  1. using atomic loads and stores when reading and writing the producer
+     and consumer offsets. If an offset were to be updated by a non-atomic
+     store then the other domain may read an invalid offset value.
+  2. ensuring request and reply data is fully read or written before
+     updating the offsets by issuing a memory barrier.
+
+If we wish to read data but find the circular buffer empty, or wish to write
+data and find the circular buffer full, we wait on a pre-arranged event
+channel. When the other party next reads or writes data to the ring, it
+will notify() this event channel and we will wake.
+
+Protocol extension for reconnection
+-----------------------------------
+
+In version 0 of the protocol it is not possible to close and reopen the
+connection. This means that if the ring is corrupted, it can never be used
+(safely) again. Extending the protocol to allow reconnection would allow
+us to:
+
+  1. use the ring in the firmware (hvmloader) and safely reset it for use
+     by the guest
+  2. re-establish a ring in a 'crash kernel' environment, allowing us to
+     write crash dumps to PV disks or network devices.
+
+In version 1 of the protocol the ring is extended with the following
+fields:
+
+Offset  Length  Description
+-----------------------------------------------------------------
+2064    4       Xenstore daemon supported protocol version
+2068    4       Close request flag
+
+In a system supporting only version 0, both these fields are guaranteed
+to contain 0 by the domain builder.
+
+In a system supporting version 1, the xenstore daemon will write "1" into
+the support protocol version field. The guest xenstore client (eg in
+hvmloader) can query the version, and if it is set to "1" it can write
+"1" to the close request flag and call notify(). The supporting xenstore
+daemon can reset the ring to an empty state and signal completion by
+clearing the flag and calling notify() again.
diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
index fe72e97..56a4128 100644
--- a/tools/firmware/hvmloader/xenbus.c
+++ b/tools/firmware/hvmloader/xenbus.c
@@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* Shared ring with dom0 */
 static evtchn_port_t event;                     /* Event-channel to dom0 */
 static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area */
 
+static void ring_wait(void)
+{
+    struct shared_info *shinfo = get_shared_info();
+    struct sched_poll poll;
+
+    memset(&poll, 0, sizeof(poll));
+    set_xen_guest_handle(poll.ports, &event);
+    poll.nr_ports = 1;
+
+    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
+        hypercall_sched_op(SCHEDOP_poll, &poll);
+}
+
 /* Connect our xenbus client to the backend.
  * Call once, before any other xenbus actions. */
 void xenbus_setup(void)
@@ -68,10 +81,16 @@ void xenbus_shutdown(void)
 
     ASSERT(rings != NULL);
 
-    /* We zero out the whole ring -- the backend can handle this, and it's 
-     * not going to surprise any frontends since it's equivalent to never 
-     * having used the rings. */
-    memset(rings, 0, sizeof *rings);
+    if (rings->server_version > XENSTORE_VERSION_0) {
+        rings->closing = 1;
+        while (rings->closing == 1)
+            ring_wait ();
+    } else {
+        /* If the backend reads the state while we're erasing it then the
+           ring state will become corrupted, preventing guest frontends from
+           connecting. This is rare. */
+        memset(rings, 0, sizeof *rings);
+    }
 
     /* Clear the event-channel state too. */
     memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
@@ -81,19 +100,6 @@ void xenbus_shutdown(void)
     rings = NULL;
 }
 
-static void ring_wait(void)
-{
-    struct shared_info *shinfo = get_shared_info();
-    struct sched_poll poll;
-
-    memset(&poll, 0, sizeof(poll));
-    set_xen_guest_handle(poll.ports, &event);
-    poll.nr_ports = 1;
-
-    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
-        hypercall_sched_op(SCHEDOP_poll, &poll);
-}
-
 /* Helper functions: copy data in and out of the ring */
 static void ring_write(const char *data, uint32_t len)
 {
diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
index 29d354d..d5cd776 100644
--- a/tools/ocaml/libs/xb/xb.ml
+++ b/tools/ocaml/libs/xb/xb.ml
@@ -84,7 +84,26 @@ let write con s len =
 	| Fd backfd     -> write_fd backfd con s len
 	| Xenmmap backmmap -> write_mmap backmmap con s len
 
-let output con =
+(* If a function throws Xs_ring.Closing, then clear the ring state
+   and serve the ring again. *)
+let rec handle_closing f con =
+	match (try Some (f con) with Xs_ring.Closing -> None) with
+	| Some x -> x
+	| None ->
+		begin match con.backend with
+		| Fd _ -> raise Xs_ring.Closing (* should never happen, but just in case *)
+		| Xenmmap backend ->
+			Xs_ring.close backend.mmap;
+			backend.eventchn_notify ();
+			(* Clear our old connection state *)
+			Queue.clear con.pkt_in;
+			Queue.clear con.pkt_out;
+			con.partial_in <- init_partial_in ();
+			con.partial_out <- "";
+			handle_closing f con
+		end
+
+let output = handle_closing (fun con ->
 	(* get the output string from a string_of(packet) or partial_out *)
 	let s = if String.length con.partial_out > 0 then
 			con.partial_out
@@ -101,8 +120,9 @@ let output con =
 	);
 	(* after sending one packet, partial is empty *)
 	con.partial_out = ""
+)
 
-let input con =
+let input = handle_closing (fun con ->
 	let newpacket = ref false in
 	let to_read =
 		match con.partial_in with
@@ -133,6 +153,7 @@ let input con =
 			HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
 	);
 	!newpacket
+)
 
 let newcon backend = {
 	backend = backend;
@@ -145,6 +166,7 @@ let newcon backend = {
 let open_fd fd = newcon (Fd { fd = fd; })
 
 let open_mmap mmap notifyfct =
+	Xs_ring.set_server_version mmap 1; (* defined in xs_wire.h *)
 	newcon (Xenmmap {
 		mmap = mmap;
 		eventchn_notify = notifyfct;
diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
index 58234ae..7f65fa3 100644
--- a/tools/ocaml/libs/xb/xb.mli
+++ b/tools/ocaml/libs/xb/xb.mli
@@ -80,6 +80,7 @@ val read : t -> string -> int -> int
 val write_fd : backend_fd -> 'a -> string -> int -> int
 val write_mmap : backend_mmap -> 'a -> string -> int -> int
 val write : t -> string -> int -> int
+val handle_closing : (t -> 'a) -> t -> 'a
 val output : t -> bool
 val input : t -> bool
 val newcon : backend -> t
diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
index 9469609..d7f0fd4 100644
--- a/tools/ocaml/libs/xb/xs_ring.ml
+++ b/tools/ocaml/libs/xb/xs_ring.ml
@@ -14,5 +14,16 @@
  * GNU Lesser General Public License for more details.
  *)
 
+exception Closing
+
+let _ =
+  Callback.register_exception "Xs_ring.Closing" Closing
+
 external read: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_read"
 external write: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_write"
+
+
+external set_server_version: Xenmmap.mmap_interface -> int -> unit = "ml_interface_set_server_version" "noalloc"
+external get_server_version: Xenmmap.mmap_interface -> int = "ml_interface_get_server_version" "noalloc"
+
+external close: Xenmmap.mmap_interface -> unit = "ml_interface_close" "noalloc"
diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
index 8bd1047..6ae6bb9 100644
--- a/tools/ocaml/libs/xb/xs_ring_stubs.c
+++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
@@ -35,19 +35,28 @@
 
 #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
 
+#define ERROR_UNKNOWN (-1)
+#define ERROR_CLOSING (-2)
+
 static int xs_ring_read(struct mmap_interface *interface,
                              char *buffer, int len)
 {
 	struct xenstore_domain_interface *intf = interface->addr;
 	XENSTORE_RING_IDX cons, prod; /* offsets only */
 	int to_read;
+        uint32_t closing;
 
 	cons = *(volatile uint32*)&intf->req_cons;
 	prod = *(volatile uint32*)&intf->req_prod;
+	closing = *(volatile uint32*)&intf->closing;
+
+	if (closing != 0)
+		return ERROR_CLOSING;
+
 	xen_mb();
 
 	if ((prod - cons) > XENSTORE_RING_SIZE)
-	    return -1;
+	    return ERROR_UNKNOWN;
 
 	if (prod == cons)
 		return 0;
@@ -71,9 +80,15 @@ static int xs_ring_write(struct mmap_interface *interface,
 	struct xenstore_domain_interface *intf = interface->addr;
 	XENSTORE_RING_IDX cons, prod;
 	int can_write;
+	uint32_t closing;
 
 	cons = *(volatile uint32*)&intf->rsp_cons;
 	prod = *(volatile uint32*)&intf->rsp_prod;
+	closing = *(volatile uint32*)&intf->closing;
+
+	if (closing != 0)
+		return ERROR_CLOSING;
+
 	xen_mb();
 	if ( (prod - cons) >= XENSTORE_RING_SIZE )
 		return 0;
@@ -97,8 +112,12 @@ CAMLprim value ml_interface_read(value interface, value buffer, value len)
 
 	res = xs_ring_read(GET_C_STRUCT(interface),
 	                   String_val(buffer), Int_val(len));
-	if (res == -1)
+	if (res == ERROR_UNKNOWN)
 		caml_failwith("bad connection");
+
+	if (res == ERROR_CLOSING)
+		caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
+
 	result = Val_int(res);
 	CAMLreturn(result);
 }
@@ -111,6 +130,46 @@ CAMLprim value ml_interface_write(value interface, value buffer, value len)
 
 	res = xs_ring_write(GET_C_STRUCT(interface),
 	                    String_val(buffer), Int_val(len));
+
+	if (res == ERROR_CLOSING)
+		caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
+
 	result = Val_int(res);
 	CAMLreturn(result);
 }
+
+CAMLprim value ml_interface_set_server_version(value interface, value v)
+{
+	CAMLparam2(interface, v);
+	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+
+	intf->server_version = Int_val(v);
+
+	CAMLreturn(Val_unit);
+}
+
+CAMLprim value ml_interface_get_server_version(value interface)
+{
+	CAMLparam1(interface);
+	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+
+	CAMLreturn(Val_int (intf->server_version));
+}
+
+CAMLprim value ml_interface_close(value interface)
+{
+	CAMLparam1(interface);
+	const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
+	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+	int i;
+
+	intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
+	/* Ensure the unused space is full of invalid xenstore packets. */
+	for (i = 0; i < XENSTORE_RING_SIZE; i++) {
+		intf->req[i] = invalid_data[i % sizeof(invalid_data)];
+		intf->rsp[i] = invalid_data[i % sizeof(invalid_data)];
+	}
+	xen_mb ();
+	intf->closing = 0;
+	CAMLreturn(Val_unit);
+}
diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
index 585f0c8..2c934ca 100644
--- a/xen/include/public/io/xs_wire.h
+++ b/xen/include/public/io/xs_wire.h
@@ -108,14 +108,22 @@ enum xs_watch_type
  * `incontents 150 xenstore_struct XenStore wire protocol.
  *
  * Inter-domain shared memory communications. */
+
+#define XENSTORE_VERSION_0 0
+#define XENSTORE_VERSION_1 1
+
 #define XENSTORE_RING_SIZE 1024
 typedef uint32_t XENSTORE_RING_IDX;
 #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
 struct xenstore_domain_interface {
+    /* XENSTORE_VERSION_0 */
     char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
     XENSTORE_RING_IDX req_cons, req_prod;
     XENSTORE_RING_IDX rsp_cons, rsp_prod;
+    uint32_t server_version;
+    /* XENSTORE_VERSION_1 */
+    uint32_t closing;
 };
 
 /* Violating this is very bad.  See docs/misc/xenstore.txt. */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal
  2014-06-25 21:15               ` [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal David Scott
  2014-06-26  9:05                 ` Paul Durrant
@ 2014-07-02 12:32                 ` Andrew Cooper
  2014-07-02 16:47                   ` Dave Scott
  1 sibling, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2014-07-02 12:32 UTC (permalink / raw)
  To: David Scott, xen-devel
  Cc: john.liuqiming, paul.durrant, ian.campbell, yanqiangjun

On 25/06/14 22:15, David Scott wrote:
> Currently hvmloader uses the xenstore ring and then tries to
> reset it back to its initial state. This is not part of the
> ring protocol and, if xenstored reads the ring while it is
> happening, xenstored will conclude it is corrupted. A corrupted
> ring will prevent PV drivers from connecting. This seems to
> be a rare failure.
>
> Furthermore, when a VM crashes it may jump to a 'crash kernel'
> to create a diagnostic dump. Without the ability to safely
> reset the ring the PV drivers won't be able to reliably
> establish connections, to (for example) stream a memory dump to
> disk.
>
> This prototype patch contains a simple extension of the
> xenstore ring structure, enough to contain version numbers and
> a 'closing' flag. This memory is currently unused within the
> 4k page and should be zeroed as part of the domain setup
> process. The oxenstored server advertises version 1, and
> hvmloader detects this and sets the 'closing' flag. The server
> then reinitialises the ring, filling it with obviously invalid
> data to help debugging (unfortunately blocks of zeroes are
> valid xenstore packets) and signals hvmloader by the event
> channel that it's safe to boot the guest OS.
>
> This patch has only been lightly tested. I'd appreciate
> feedback on the approach before going further.
>
> Signed-off-by: David Scott <dave.scott@citrix.com>

The plan of some version information looks plausible.  Some comments
below (for the non-ocaml bits).

> ---
>  tools/firmware/hvmloader/xenbus.c   |   16 +++++--
>  tools/ocaml/libs/xb/xb.ml           |   26 ++++++++++-
>  tools/ocaml/libs/xb/xb.mli          |    3 +-
>  tools/ocaml/libs/xb/xs_ring.ml      |   13 ++++++
>  tools/ocaml/libs/xb/xs_ring_stubs.c |   81 ++++++++++++++++++++++++++++++++++-
>  xen/include/public/io/xs_wire.h     |    8 ++++
>  6 files changed, 138 insertions(+), 9 deletions(-)
>
> diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
> index fe72e97..15d961b 100644
> --- a/tools/firmware/hvmloader/xenbus.c
> +++ b/tools/firmware/hvmloader/xenbus.c
> @@ -37,6 +37,8 @@ static struct xenstore_domain_interface *rings; /* Shared ring with dom0 */
>  static evtchn_port_t event;                     /* Event-channel to dom0 */
>  static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area */
>  
> +static void ring_wait(void);
> +

Move ring_wait() up, or xenbus_shutdown() down.

>  /* Connect our xenbus client to the backend.
>   * Call once, before any other xenbus actions. */
>  void xenbus_setup(void)
> @@ -68,10 +70,16 @@ void xenbus_shutdown(void)
>  
>      ASSERT(rings != NULL);
>  
> -    /* We zero out the whole ring -- the backend can handle this, and it's 
> -     * not going to surprise any frontends since it's equivalent to never 
> -     * having used the rings. */
> -    memset(rings, 0, sizeof *rings);
> +    if (rings->server_version > XENSTORE_VERSION_0) {
> +        rings->closing = 1;
> +        while (rings->closing == 1)

This must be a volatile read of rings->closing, or the compiler is free
to optimise this to an infinite loop.

> +            ring_wait ();

Are we guarenteed to receive an event on the xenbus evtchn after the
server has cleared rings->closing?  I can't see anything in the
implementation which would do this.

> +    } else {
> +        /* If the backend reads the state while we're erasing it then the
> +           ring state will become corrupted, preventing guest frontends from
> +           connecting. This is rare. */
> +        memset(rings, 0, sizeof *rings);
> +    }

Brackets optional per Xen style.  Could you keep the left-hand column of
*'s with the comment?

>  
>      /* Clear the event-channel state too. */
>      memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
>
> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
> index 8bd1047..4ddf5a7 100644
> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> @@ -35,19 +35,28 @@
>  
>  #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
>  
> +#define ERROR_UNKNOWN (-1)
> +#define ERROR_CLOSING (-2)
> +
>  static int xs_ring_read(struct mmap_interface *interface,
>                               char *buffer, int len)
>  {
>  	struct xenstore_domain_interface *intf = interface->addr;
>  	XENSTORE_RING_IDX cons, prod; /* offsets only */
>  	int to_read;
> +        uint32_t closing;

Spaces in a tabbed file.

>
> diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
> index 585f0c8..68460cc 100644
> --- a/xen/include/public/io/xs_wire.h
> +++ b/xen/include/public/io/xs_wire.h
> @@ -104,6 +104,9 @@ enum xs_watch_type
>      XS_WATCH_TOKEN
>  };
>  
> +#define XENSTORE_VERSION_0 0
> +#define XENSTORE_VERSION_1 1
> +

Do we really need mnemonics for these?  This looks rather peculiar.

~Andrew

>  /*
>   * `incontents 150 xenstore_struct XenStore wire protocol.
>   *
> @@ -112,10 +115,15 @@ enum xs_watch_type
>  typedef uint32_t XENSTORE_RING_IDX;
>  #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
>  struct xenstore_domain_interface {
> +    /* XENSTORE_VERSION_0 */
>      char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
>      char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>      XENSTORE_RING_IDX req_cons, req_prod;
>      XENSTORE_RING_IDX rsp_cons, rsp_prod;
> +    uint32_t client_version;
> +    uint32_t server_version;
> +    /* XENSTORE_VERSION_1 */
> +    uint32_t closing;
>  };
>  
>  /* Violating this is very bad.  See docs/misc/xenstore.txt. */

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal
  2014-07-02 12:32                 ` [PATCH RFC] " Andrew Cooper
@ 2014-07-02 16:47                   ` Dave Scott
  2014-07-02 16:52                     ` Andrew Cooper
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Scott @ 2014-07-02 16:47 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: xen-devel, Paul Durrant, Ian Campbell, yanqiangjun, john.liuqiming

Hi Andy,

Thanks for the feedback!

On 2 Jul 2014, at 13:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote:

> On 25/06/14 22:15, David Scott wrote:
>> Currently hvmloader uses the xenstore ring and then tries to
>> reset it back to its initial state. This is not part of the
>> ring protocol and, if xenstored reads the ring while it is
>> happening, xenstored will conclude it is corrupted. A corrupted
>> ring will prevent PV drivers from connecting. This seems to
>> be a rare failure.
>> 
>> Furthermore, when a VM crashes it may jump to a 'crash kernel'
>> to create a diagnostic dump. Without the ability to safely
>> reset the ring the PV drivers won't be able to reliably
>> establish connections, to (for example) stream a memory dump to
>> disk.
>> 
>> This prototype patch contains a simple extension of the
>> xenstore ring structure, enough to contain version numbers and
>> a 'closing' flag. This memory is currently unused within the
>> 4k page and should be zeroed as part of the domain setup
>> process. The oxenstored server advertises version 1, and
>> hvmloader detects this and sets the 'closing' flag. The server
>> then reinitialises the ring, filling it with obviously invalid
>> data to help debugging (unfortunately blocks of zeroes are
>> valid xenstore packets) and signals hvmloader by the event
>> channel that it's safe to boot the guest OS.
>> 
>> This patch has only been lightly tested. I'd appreciate
>> feedback on the approach before going further.
>> 
>> Signed-off-by: David Scott <dave.scott@citrix.com>
> 
> The plan of some version information looks plausible.  Some comments
> below (for the non-ocaml bits).
> 
>> ---
>> tools/firmware/hvmloader/xenbus.c   |   16 +++++--
>> tools/ocaml/libs/xb/xb.ml           |   26 ++++++++++-
>> tools/ocaml/libs/xb/xb.mli          |    3 +-
>> tools/ocaml/libs/xb/xs_ring.ml      |   13 ++++++
>> tools/ocaml/libs/xb/xs_ring_stubs.c |   81 ++++++++++++++++++++++++++++++++++-
>> xen/include/public/io/xs_wire.h     |    8 ++++
>> 6 files changed, 138 insertions(+), 9 deletions(-)
>> 
>> diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
>> index fe72e97..15d961b 100644
>> --- a/tools/firmware/hvmloader/xenbus.c
>> +++ b/tools/firmware/hvmloader/xenbus.c
>> @@ -37,6 +37,8 @@ static struct xenstore_domain_interface *rings; /* Shared ring with dom0 */
>> static evtchn_port_t event;                     /* Event-channel to dom0 */
>> static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area */
>> 
>> +static void ring_wait(void);
>> +
> 
> Move ring_wait() up, or xenbus_shutdown() down.

OK

> 
>> /* Connect our xenbus client to the backend.
>>  * Call once, before any other xenbus actions. */
>> void xenbus_setup(void)
>> @@ -68,10 +70,16 @@ void xenbus_shutdown(void)
>> 
>>     ASSERT(rings != NULL);
>> 
>> -    /* We zero out the whole ring -- the backend can handle this, and it's 
>> -     * not going to surprise any frontends since it's equivalent to never 
>> -     * having used the rings. */
>> -    memset(rings, 0, sizeof *rings);
>> +    if (rings->server_version > XENSTORE_VERSION_0) {
>> +        rings->closing = 1;
>> +        while (rings->closing == 1)
> 
> This must be a volatile read of rings->closing, or the compiler is free
> to optimise this to an infinite loop.

Aha, good spot. Is it sufficient to do something like:

-        while (rings->closing == 1)
+        while ( *(volatile uint32_t*)&rings->closing == 1)
             ring_wait ();


> 
>> +            ring_wait ();
> 
> Are we guarenteed to receive an event on the xenbus evtchn after the
> server has cleared rings->closing?  I can't see anything in the
> implementation which would do this.

Unfortunately the code is split between the OCaml and the C functions. The C functions take care of manipulating the flags, pointers and data, while the OCaml code manages the event channel. The OCaml ‘handle_exception’ function calls ‘Xs_ring.close’ (the C stub) and then calls ‘backend.eventchn_notify’. This is the only way ‘Xs_ring.close' is called, so I believe it’s ok.

> 
>> +    } else {
>> +        /* If the backend reads the state while we're erasing it then the
>> +           ring state will become corrupted, preventing guest frontends from
>> +           connecting. This is rare. */
>> +        memset(rings, 0, sizeof *rings);
>> +    }
> 
> Brackets optional per Xen style.  Could you keep the left-hand column of
> *'s with the comment?

Sure

> 
>> 
>>     /* Clear the event-channel state too. */
>>     memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
>> 
>> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
>> index 8bd1047..4ddf5a7 100644
>> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
>> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
>> @@ -35,19 +35,28 @@
>> 
>> #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
>> 
>> +#define ERROR_UNKNOWN (-1)
>> +#define ERROR_CLOSING (-2)
>> +
>> static int xs_ring_read(struct mmap_interface *interface,
>>                              char *buffer, int len)
>> {
>> 	struct xenstore_domain_interface *intf = interface->addr;
>> 	XENSTORE_RING_IDX cons, prod; /* offsets only */
>> 	int to_read;
>> +        uint32_t closing;
> 
> Spaces in a tabbed file.

Oops

> 
>> 
>> diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
>> index 585f0c8..68460cc 100644
>> --- a/xen/include/public/io/xs_wire.h
>> +++ b/xen/include/public/io/xs_wire.h
>> @@ -104,6 +104,9 @@ enum xs_watch_type
>>     XS_WATCH_TOKEN
>> };
>> 
>> +#define XENSTORE_VERSION_0 0
>> +#define XENSTORE_VERSION_1 1
>> +
> 
> Do we really need mnemonics for these?  This looks rather peculiar.

Yeah those are probably a bit OTT. I’ll remove them.

Cheers,
Dave

> 
> ~Andrew
> 
>> /*
>>  * `incontents 150 xenstore_struct XenStore wire protocol.
>>  *
>> @@ -112,10 +115,15 @@ enum xs_watch_type
>> typedef uint32_t XENSTORE_RING_IDX;
>> #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
>> struct xenstore_domain_interface {
>> +    /* XENSTORE_VERSION_0 */
>>     char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
>>     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>>     XENSTORE_RING_IDX req_cons, req_prod;
>>     XENSTORE_RING_IDX rsp_cons, rsp_prod;
>> +    uint32_t client_version;
>> +    uint32_t server_version;
>> +    /* XENSTORE_VERSION_1 */
>> +    uint32_t closing;
>> };
>> 
>> /* Violating this is very bad.  See docs/misc/xenstore.txt. */

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal
  2014-07-02 16:47                   ` Dave Scott
@ 2014-07-02 16:52                     ` Andrew Cooper
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Cooper @ 2014-07-02 16:52 UTC (permalink / raw)
  To: Dave Scott
  Cc: xen-devel, Paul Durrant, Ian Campbell, yanqiangjun, john.liuqiming

On 02/07/14 17:47, Dave Scott wrote:
> Hi Andy,
>
> Thanks for the feedback!
>
> On 2 Jul 2014, at 13:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
>> On 25/06/14 22:15, David Scott wrote:
>>> Currently hvmloader uses the xenstore ring and then tries to
>>> reset it back to its initial state. This is not part of the
>>> ring protocol and, if xenstored reads the ring while it is
>>> happening, xenstored will conclude it is corrupted. A corrupted
>>> ring will prevent PV drivers from connecting. This seems to
>>> be a rare failure.
>>>
>>> Furthermore, when a VM crashes it may jump to a 'crash kernel'
>>> to create a diagnostic dump. Without the ability to safely
>>> reset the ring the PV drivers won't be able to reliably
>>> establish connections, to (for example) stream a memory dump to
>>> disk.
>>>
>>> This prototype patch contains a simple extension of the
>>> xenstore ring structure, enough to contain version numbers and
>>> a 'closing' flag. This memory is currently unused within the
>>> 4k page and should be zeroed as part of the domain setup
>>> process. The oxenstored server advertises version 1, and
>>> hvmloader detects this and sets the 'closing' flag. The server
>>> then reinitialises the ring, filling it with obviously invalid
>>> data to help debugging (unfortunately blocks of zeroes are
>>> valid xenstore packets) and signals hvmloader by the event
>>> channel that it's safe to boot the guest OS.
>>>
>>> This patch has only been lightly tested. I'd appreciate
>>> feedback on the approach before going further.
>>>
>>> Signed-off-by: David Scott <dave.scott@citrix.com>
>> The plan of some version information looks plausible.  Some comments
>> below (for the non-ocaml bits).
>>
>>> ---
>>> tools/firmware/hvmloader/xenbus.c   |   16 +++++--
>>> tools/ocaml/libs/xb/xb.ml           |   26 ++++++++++-
>>> tools/ocaml/libs/xb/xb.mli          |    3 +-
>>> tools/ocaml/libs/xb/xs_ring.ml      |   13 ++++++
>>> tools/ocaml/libs/xb/xs_ring_stubs.c |   81 ++++++++++++++++++++++++++++++++++-
>>> xen/include/public/io/xs_wire.h     |    8 ++++
>>> 6 files changed, 138 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
>>> index fe72e97..15d961b 100644
>>> --- a/tools/firmware/hvmloader/xenbus.c
>>> +++ b/tools/firmware/hvmloader/xenbus.c
>>> @@ -37,6 +37,8 @@ static struct xenstore_domain_interface *rings; /* Shared ring with dom0 */
>>> static evtchn_port_t event;                     /* Event-channel to dom0 */
>>> static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area */
>>>
>>> +static void ring_wait(void);
>>> +
>> Move ring_wait() up, or xenbus_shutdown() down.
> OK
>
>>> /* Connect our xenbus client to the backend.
>>>  * Call once, before any other xenbus actions. */
>>> void xenbus_setup(void)
>>> @@ -68,10 +70,16 @@ void xenbus_shutdown(void)
>>>
>>>     ASSERT(rings != NULL);
>>>
>>> -    /* We zero out the whole ring -- the backend can handle this, and it's 
>>> -     * not going to surprise any frontends since it's equivalent to never 
>>> -     * having used the rings. */
>>> -    memset(rings, 0, sizeof *rings);
>>> +    if (rings->server_version > XENSTORE_VERSION_0) {
>>> +        rings->closing = 1;
>>> +        while (rings->closing == 1)
>> This must be a volatile read of rings->closing, or the compiler is free
>> to optimise this to an infinite loop.
> Aha, good spot. Is it sufficient to do something like:
>
> -        while (rings->closing == 1)
> +        while ( *(volatile uint32_t*)&rings->closing == 1)
>              ring_wait ();

Yeah - that looks better (albeit with the leading space removed).

>
>>> +            ring_wait ();
>> Are we guarenteed to receive an event on the xenbus evtchn after the
>> server has cleared rings->closing?  I can't see anything in the
>> implementation which would do this.
> Unfortunately the code is split between the OCaml and the C functions. The C functions take care of manipulating the flags, pointers and data, while the OCaml code manages the event channel. The OCaml ‘handle_exception’ function calls ‘Xs_ring.close’ (the C stub) and then calls ‘backend.eventchn_notify’. This is the only way ‘Xs_ring.close' is called, so I believe it’s ok.

Ok.

~Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
  2014-06-30 14:42                     ` [PATCH v2] " David Scott
@ 2014-07-03 15:15                       ` David Scott
  2014-07-04  8:21                         ` Paul Durrant
                                           ` (2 more replies)
  2014-07-04 11:02                       ` [PATCH v2] RFC: " Ian Jackson
  1 sibling, 3 replies; 29+ messages in thread
From: David Scott @ 2014-07-03 15:15 UTC (permalink / raw)
  To: xen-devel
  Cc: David Scott, ian.campbell, andrew.cooper3, yanqiangjun,
	paul.durrant, john.liuqiming

Currently hvmloader uses the xenstore ring and then tries to
reset it back to its initial state. This is not part of the
ring protocol and, if xenstored reads the ring while it is
happening, xenstored will conclude it is corrupted. A corrupted
ring will prevent PV drivers from connecting. This seems to
be a rare failure.

Furthermore, when a VM crashes it may jump to a 'crash kernel'
to create a diagnostic dump. Without the ability to safely
reset the ring the PV drivers won't be able to reliably
establish connections, to (for example) stream a memory dump to
disk.

This prototype patch contains a simple extension of the
xenstore ring structure, enough to contain version numbers and
a 'closing' flag. This memory is currently unused within the
4k page and should be zeroed as part of the domain setup
process. The oxenstored server advertises version 1, and
hvmloader detects this and sets the 'closing' flag. The server
then reinitialises the ring, filling it with obviously invalid
data to help debugging (unfortunately blocks of zeroes are
valid xenstore packets) and signals hvmloader by the event
channel that it's safe to boot the guest OS.

Updates since v2 (thanks to Andy Cooper for the review):
* hvmloader: use volatile for read of closing flag
* style improvements
* remove xenstore version #defines

Updates since v1 (thanks to Paul Durrant for the review):
* remove unused client version and associated dead code
* treat 'closing' as a flag by using "!=0" rather than "==1"
* hvmloader: move function up and remove forward decl
* document the existing xenstore ring and the extention in misc/

Signed-off-by: David Scott <dave.scott@citrix.com>
---
 docs/misc/xenstore-ring.txt         |   79 +++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/xenbus.c   |   39 +++++++++--------
 tools/ocaml/libs/xb/xb.ml           |   26 +++++++++++-
 tools/ocaml/libs/xb/xb.mli          |    1 +
 tools/ocaml/libs/xb/xs_ring.ml      |   11 +++++
 tools/ocaml/libs/xb/xs_ring_stubs.c |   63 +++++++++++++++++++++++++++-
 xen/include/public/io/xs_wire.h     |    5 +++
 7 files changed, 203 insertions(+), 21 deletions(-)
 create mode 100644 docs/misc/xenstore-ring.txt

diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
new file mode 100644
index 0000000..df2a09f
--- /dev/null
+++ b/docs/misc/xenstore-ring.txt
@@ -0,0 +1,79 @@
+
+The domain builder allocates a single page and shares it with the xenstore
+daemon. This document describes the ring protocol used to communicate via
+this page which is used to transmit and receive
+[xenstore protocol packets](xenstore.txt).
+
+In the original version (we call this "version 0"), the shared page has the
+following contents (all offsets and lengths are in octets):
+
+Offset  Length  Description
+-----------------------------------------------------------------
+0       1024    Requests: data sent to the xenstore daemon
+1024    1024    Replies: data sent to the domain
+2048    4       Request consumer offset
+2052    4       Request producer offset
+2056    4       Response consumer offset
+2060    4       Response producer offset
+
+When the page is allocated it is guaranteed to be full of zeroes.
+
+The "Requests" and "Replies" are treated as circular buffers, one for
+each direction. Each circular buffer is associated wth a producer and
+consumer offset, which are free-running counters starting from 0. A "producer"
+offset is the offset in the byte stream of the next byte to be written; a
+"consumer" offset is the offset in the byte stream of the next byte to be
+read. The byte at offset 'x' in the byte stream will be stored in
+offset 'x mod 1024' in the circular buffer. "producer - consumer" gives
+the number of bytes still to be read, and "1024 - (producer - consumer)"
+therefore gives the amount of space currently available for writing,
+where we must avoid overwriting unread data.
+
+The circular buffers are only used to send sequences of bytes between domains.
+It is the responsibility of the layer above to segment these bytes into
+packets, as described in [xenstore.txt](xenstore.txt).
+
+The client and server domains can run concurrently on different cores and
+different sockets. We must therefore take care to avoid corruption by:
+
+  1. using atomic loads and stores when reading and writing the producer
+     and consumer offsets. If an offset were to be updated by a non-atomic
+     store then the other domain may read an invalid offset value.
+  2. ensuring request and reply data is fully read or written before
+     updating the offsets by issuing a memory barrier.
+
+If we wish to read data but find the circular buffer empty, or wish to write
+data and find the circular buffer full, we wait on a pre-arranged event
+channel. When the other party next reads or writes data to the ring, it
+will notify() this event channel and we will wake.
+
+Protocol extension for reconnection
+-----------------------------------
+
+In version 0 of the protocol it is not possible to close and reopen the
+connection. This means that if the ring is corrupted, it can never be used
+(safely) again. Extending the protocol to allow reconnection would allow
+us to:
+
+  1. use the ring in the firmware (hvmloader) and safely reset it for use
+     by the guest
+  2. re-establish a ring in a 'crash kernel' environment, allowing us to
+     write crash dumps to PV disks or network devices.
+
+In version 1 of the protocol the ring is extended with the following
+fields:
+
+Offset  Length  Description
+-----------------------------------------------------------------
+2064    4       Xenstore daemon supported protocol version
+2068    4       Close request flag
+
+In a system supporting only version 0, both these fields are guaranteed
+to contain 0 by the domain builder.
+
+In a system supporting version 1, the xenstore daemon will write "1" into
+the support protocol version field. The guest xenstore client (eg in
+hvmloader) can query the version, and if it is set to "1" it can write
+"1" to the close request flag and call notify(). The supporting xenstore
+daemon can reset the ring to an empty state and signal completion by
+clearing the flag and calling notify() again.
diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
index fe72e97..f85832c 100644
--- a/tools/firmware/hvmloader/xenbus.c
+++ b/tools/firmware/hvmloader/xenbus.c
@@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* Shared ring with dom0 */
 static evtchn_port_t event;                     /* Event-channel to dom0 */
 static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area */
 
+static void ring_wait(void)
+{
+    struct shared_info *shinfo = get_shared_info();
+    struct sched_poll poll;
+
+    memset(&poll, 0, sizeof(poll));
+    set_xen_guest_handle(poll.ports, &event);
+    poll.nr_ports = 1;
+
+    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
+        hypercall_sched_op(SCHEDOP_poll, &poll);
+}
+
 /* Connect our xenbus client to the backend.
  * Call once, before any other xenbus actions. */
 void xenbus_setup(void)
@@ -68,10 +81,15 @@ void xenbus_shutdown(void)
 
     ASSERT(rings != NULL);
 
-    /* We zero out the whole ring -- the backend can handle this, and it's 
-     * not going to surprise any frontends since it's equivalent to never 
-     * having used the rings. */
-    memset(rings, 0, sizeof *rings);
+    if (rings->server_version > 0) {
+        rings->closing = 1;
+        while (*(volatile uint32_t*)&rings->closing == 1)
+            ring_wait ();
+    } else
+        /* If the backend reads the state while we're erasing it then the
+         * ring state will become corrupted, preventing guest frontends from
+         * connecting. This is rare. */
+        memset(rings, 0, sizeof *rings);
 
     /* Clear the event-channel state too. */
     memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
@@ -81,19 +99,6 @@ void xenbus_shutdown(void)
     rings = NULL;
 }
 
-static void ring_wait(void)
-{
-    struct shared_info *shinfo = get_shared_info();
-    struct sched_poll poll;
-
-    memset(&poll, 0, sizeof(poll));
-    set_xen_guest_handle(poll.ports, &event);
-    poll.nr_ports = 1;
-
-    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
-        hypercall_sched_op(SCHEDOP_poll, &poll);
-}
-
 /* Helper functions: copy data in and out of the ring */
 static void ring_write(const char *data, uint32_t len)
 {
diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
index 29d354d..d5cd776 100644
--- a/tools/ocaml/libs/xb/xb.ml
+++ b/tools/ocaml/libs/xb/xb.ml
@@ -84,7 +84,26 @@ let write con s len =
 	| Fd backfd     -> write_fd backfd con s len
 	| Xenmmap backmmap -> write_mmap backmmap con s len
 
-let output con =
+(* If a function throws Xs_ring.Closing, then clear the ring state
+   and serve the ring again. *)
+let rec handle_closing f con =
+	match (try Some (f con) with Xs_ring.Closing -> None) with
+	| Some x -> x
+	| None ->
+		begin match con.backend with
+		| Fd _ -> raise Xs_ring.Closing (* should never happen, but just in case *)
+		| Xenmmap backend ->
+			Xs_ring.close backend.mmap;
+			backend.eventchn_notify ();
+			(* Clear our old connection state *)
+			Queue.clear con.pkt_in;
+			Queue.clear con.pkt_out;
+			con.partial_in <- init_partial_in ();
+			con.partial_out <- "";
+			handle_closing f con
+		end
+
+let output = handle_closing (fun con ->
 	(* get the output string from a string_of(packet) or partial_out *)
 	let s = if String.length con.partial_out > 0 then
 			con.partial_out
@@ -101,8 +120,9 @@ let output con =
 	);
 	(* after sending one packet, partial is empty *)
 	con.partial_out = ""
+)
 
-let input con =
+let input = handle_closing (fun con ->
 	let newpacket = ref false in
 	let to_read =
 		match con.partial_in with
@@ -133,6 +153,7 @@ let input con =
 			HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
 	);
 	!newpacket
+)
 
 let newcon backend = {
 	backend = backend;
@@ -145,6 +166,7 @@ let newcon backend = {
 let open_fd fd = newcon (Fd { fd = fd; })
 
 let open_mmap mmap notifyfct =
+	Xs_ring.set_server_version mmap 1; (* defined in xs_wire.h *)
 	newcon (Xenmmap {
 		mmap = mmap;
 		eventchn_notify = notifyfct;
diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
index 58234ae..7f65fa3 100644
--- a/tools/ocaml/libs/xb/xb.mli
+++ b/tools/ocaml/libs/xb/xb.mli
@@ -80,6 +80,7 @@ val read : t -> string -> int -> int
 val write_fd : backend_fd -> 'a -> string -> int -> int
 val write_mmap : backend_mmap -> 'a -> string -> int -> int
 val write : t -> string -> int -> int
+val handle_closing : (t -> 'a) -> t -> 'a
 val output : t -> bool
 val input : t -> bool
 val newcon : backend -> t
diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
index 9469609..d7f0fd4 100644
--- a/tools/ocaml/libs/xb/xs_ring.ml
+++ b/tools/ocaml/libs/xb/xs_ring.ml
@@ -14,5 +14,16 @@
  * GNU Lesser General Public License for more details.
  *)
 
+exception Closing
+
+let _ =
+  Callback.register_exception "Xs_ring.Closing" Closing
+
 external read: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_read"
 external write: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_write"
+
+
+external set_server_version: Xenmmap.mmap_interface -> int -> unit = "ml_interface_set_server_version" "noalloc"
+external get_server_version: Xenmmap.mmap_interface -> int = "ml_interface_get_server_version" "noalloc"
+
+external close: Xenmmap.mmap_interface -> unit = "ml_interface_close" "noalloc"
diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
index 8bd1047..27c98cd 100644
--- a/tools/ocaml/libs/xb/xs_ring_stubs.c
+++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
@@ -35,19 +35,28 @@
 
 #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
 
+#define ERROR_UNKNOWN (-1)
+#define ERROR_CLOSING (-2)
+
 static int xs_ring_read(struct mmap_interface *interface,
                              char *buffer, int len)
 {
 	struct xenstore_domain_interface *intf = interface->addr;
 	XENSTORE_RING_IDX cons, prod; /* offsets only */
 	int to_read;
+	uint32_t closing;
 
 	cons = *(volatile uint32*)&intf->req_cons;
 	prod = *(volatile uint32*)&intf->req_prod;
+	closing = *(volatile uint32*)&intf->closing;
+
+	if (closing != 0)
+		return ERROR_CLOSING;
+
 	xen_mb();
 
 	if ((prod - cons) > XENSTORE_RING_SIZE)
-	    return -1;
+	    return ERROR_UNKNOWN;
 
 	if (prod == cons)
 		return 0;
@@ -71,9 +80,15 @@ static int xs_ring_write(struct mmap_interface *interface,
 	struct xenstore_domain_interface *intf = interface->addr;
 	XENSTORE_RING_IDX cons, prod;
 	int can_write;
+	uint32_t closing;
 
 	cons = *(volatile uint32*)&intf->rsp_cons;
 	prod = *(volatile uint32*)&intf->rsp_prod;
+	closing = *(volatile uint32*)&intf->closing;
+
+	if (closing != 0)
+		return ERROR_CLOSING;
+
 	xen_mb();
 	if ( (prod - cons) >= XENSTORE_RING_SIZE )
 		return 0;
@@ -97,8 +112,12 @@ CAMLprim value ml_interface_read(value interface, value buffer, value len)
 
 	res = xs_ring_read(GET_C_STRUCT(interface),
 	                   String_val(buffer), Int_val(len));
-	if (res == -1)
+	if (res == ERROR_UNKNOWN)
 		caml_failwith("bad connection");
+
+	if (res == ERROR_CLOSING)
+		caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
+
 	result = Val_int(res);
 	CAMLreturn(result);
 }
@@ -111,6 +130,46 @@ CAMLprim value ml_interface_write(value interface, value buffer, value len)
 
 	res = xs_ring_write(GET_C_STRUCT(interface),
 	                    String_val(buffer), Int_val(len));
+
+	if (res == ERROR_CLOSING)
+		caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
+
 	result = Val_int(res);
 	CAMLreturn(result);
 }
+
+CAMLprim value ml_interface_set_server_version(value interface, value v)
+{
+	CAMLparam2(interface, v);
+	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+
+	intf->server_version = Int_val(v);
+
+	CAMLreturn(Val_unit);
+}
+
+CAMLprim value ml_interface_get_server_version(value interface)
+{
+	CAMLparam1(interface);
+	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+
+	CAMLreturn(Val_int (intf->server_version));
+}
+
+CAMLprim value ml_interface_close(value interface)
+{
+	CAMLparam1(interface);
+	const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
+	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+	int i;
+
+	intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
+	/* Ensure the unused space is full of invalid xenstore packets. */
+	for (i = 0; i < XENSTORE_RING_SIZE; i++) {
+		intf->req[i] = invalid_data[i % sizeof(invalid_data)];
+		intf->rsp[i] = invalid_data[i % sizeof(invalid_data)];
+	}
+	xen_mb ();
+	intf->closing = 0;
+	CAMLreturn(Val_unit);
+}
diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
index 585f0c8..022d614 100644
--- a/xen/include/public/io/xs_wire.h
+++ b/xen/include/public/io/xs_wire.h
@@ -108,14 +108,19 @@ enum xs_watch_type
  * `incontents 150 xenstore_struct XenStore wire protocol.
  *
  * Inter-domain shared memory communications. */
+
 #define XENSTORE_RING_SIZE 1024
 typedef uint32_t XENSTORE_RING_IDX;
 #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
 struct xenstore_domain_interface {
+    /* XENSTORE_VERSION_0 */
     char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
     XENSTORE_RING_IDX req_cons, req_prod;
     XENSTORE_RING_IDX rsp_cons, rsp_prod;
+    uint32_t server_version;
+    /* server_version 1 and later: */
+    uint32_t closing;             /* Non-zero means close in progress */
 };
 
 /* Violating this is very bad.  See docs/misc/xenstore.txt. */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
  2014-07-03 15:15                       ` [PATCH v3] " David Scott
@ 2014-07-04  8:21                         ` Paul Durrant
  2014-07-04 10:00                           ` Dave Scott
  2014-07-04  9:59                         ` Ian Campbell
  2014-09-03 16:25                         ` [PATCH v4] xenstore: " David Scott
  2 siblings, 1 reply; 29+ messages in thread
From: Paul Durrant @ 2014-07-04  8:21 UTC (permalink / raw)
  To: xen-devel
  Cc: john.liuqiming, Dave Scott, yanqiangjun, Ian Campbell, Andrew Cooper

> -----Original Message-----
> From: David Scott [mailto:dave.scott@citrix.com]
> Sent: 03 July 2014 16:15
> To: xen-devel@lists.xenproject.org
> Cc: Paul Durrant; Ian Campbell; john.liuqiming@huawei.com;
> yanqiangjun@huawei.com; Andrew Cooper; Dave Scott
> Subject: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
> 
> Currently hvmloader uses the xenstore ring and then tries to
> reset it back to its initial state. This is not part of the
> ring protocol and, if xenstored reads the ring while it is
> happening, xenstored will conclude it is corrupted. A corrupted
> ring will prevent PV drivers from connecting. This seems to
> be a rare failure.
> 
> Furthermore, when a VM crashes it may jump to a 'crash kernel'
> to create a diagnostic dump. Without the ability to safely
> reset the ring the PV drivers won't be able to reliably
> establish connections, to (for example) stream a memory dump to
> disk.
> 
> This prototype patch contains a simple extension of the
> xenstore ring structure, enough to contain version numbers and
> a 'closing' flag. This memory is currently unused within the
> 4k page and should be zeroed as part of the domain setup
> process. The oxenstored server advertises version 1, and
> hvmloader detects this and sets the 'closing' flag. The server
> then reinitialises the ring, filling it with obviously invalid
> data to help debugging (unfortunately blocks of zeroes are
> valid xenstore packets) and signals hvmloader by the event
> channel that it's safe to boot the guest OS.
> 
> Updates since v2 (thanks to Andy Cooper for the review):
> * hvmloader: use volatile for read of closing flag
> * style improvements
> * remove xenstore version #defines
> 
> Updates since v1 (thanks to Paul Durrant for the review):
> * remove unused client version and associated dead code
> * treat 'closing' as a flag by using "!=0" rather than "==1"
> * hvmloader: move function up and remove forward decl
> * document the existing xenstore ring and the extention in misc/
> 
> Signed-off-by: David Scott <dave.scott@citrix.com>
> ---
>  docs/misc/xenstore-ring.txt         |   79
> +++++++++++++++++++++++++++++++++++
>  tools/firmware/hvmloader/xenbus.c   |   39 +++++++++--------
>  tools/ocaml/libs/xb/xb.ml           |   26 +++++++++++-
>  tools/ocaml/libs/xb/xb.mli          |    1 +
>  tools/ocaml/libs/xb/xs_ring.ml      |   11 +++++
>  tools/ocaml/libs/xb/xs_ring_stubs.c |   63
> +++++++++++++++++++++++++++-
>  xen/include/public/io/xs_wire.h     |    5 +++
>  7 files changed, 203 insertions(+), 21 deletions(-)
>  create mode 100644 docs/misc/xenstore-ring.txt
> 
> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
> new file mode 100644
> index 0000000..df2a09f
> --- /dev/null
> +++ b/docs/misc/xenstore-ring.txt
> @@ -0,0 +1,79 @@
> +
> +The domain builder allocates a single page and shares it with the xenstore
> +daemon. This document describes the ring protocol used to communicate
> via
> +this page which is used to transmit and receive
> +[xenstore protocol packets](xenstore.txt).
> +
> +In the original version (we call this "version 0"), the shared page has the
> +following contents (all offsets and lengths are in octets):
> +
> +Offset  Length  Description
> +-----------------------------------------------------------------
> +0       1024    Requests: data sent to the xenstore daemon
> +1024    1024    Replies: data sent to the domain
> +2048    4       Request consumer offset
> +2052    4       Request producer offset
> +2056    4       Response consumer offset
> +2060    4       Response producer offset
> +
> +When the page is allocated it is guaranteed to be full of zeroes.
> +
> +The "Requests" and "Replies" are treated as circular buffers, one for
> +each direction. Each circular buffer is associated wth a producer and
> +consumer offset, which are free-running counters starting from 0. A
> "producer"
> +offset is the offset in the byte stream of the next byte to be written; a
> +"consumer" offset is the offset in the byte stream of the next byte to be
> +read. The byte at offset 'x' in the byte stream will be stored in
> +offset 'x mod 1024' in the circular buffer. "producer - consumer" gives
> +the number of bytes still to be read, and "1024 - (producer - consumer)"
> +therefore gives the amount of space currently available for writing,
> +where we must avoid overwriting unread data.
> +
> +The circular buffers are only used to send sequences of bytes between
> domains.
> +It is the responsibility of the layer above to segment these bytes into
> +packets, as described in [xenstore.txt](xenstore.txt).
> +
> +The client and server domains can run concurrently on different cores and
> +different sockets. We must therefore take care to avoid corruption by:
> +
> +  1. using atomic loads and stores when reading and writing the producer
> +     and consumer offsets. If an offset were to be updated by a non-atomic
> +     store then the other domain may read an invalid offset value.
> +  2. ensuring request and reply data is fully read or written before
> +     updating the offsets by issuing a memory barrier.
> +
> +If we wish to read data but find the circular buffer empty, or wish to write
> +data and find the circular buffer full, we wait on a pre-arranged event
> +channel. When the other party next reads or writes data to the ring, it
> +will notify() this event channel and we will wake.
> +
> +Protocol extension for reconnection
> +-----------------------------------
> +
> +In version 0 of the protocol it is not possible to close and reopen the
> +connection. This means that if the ring is corrupted, it can never be used
> +(safely) again. Extending the protocol to allow reconnection would allow
> +us to:
> +
> +  1. use the ring in the firmware (hvmloader) and safely reset it for use
> +     by the guest
> +  2. re-establish a ring in a 'crash kernel' environment, allowing us to
> +     write crash dumps to PV disks or network devices.
> +
> +In version 1 of the protocol the ring is extended with the following
> +fields:
> +
> +Offset  Length  Description
> +-----------------------------------------------------------------
> +2064    4       Xenstore daemon supported protocol version
> +2068    4       Close request flag
> +
> +In a system supporting only version 0, both these fields are guaranteed
> +to contain 0 by the domain builder.
> +
> +In a system supporting version 1, the xenstore daemon will write "1" into
> +the support protocol version field. The guest xenstore client (eg in

s/support/supported

> +hvmloader) can query the version, and if it is set to "1" it can write
> +"1" to the close request flag and call notify(). The supporting xenstore

Perhaps, rather than 'call notify()' this should be 'signal the store event channel'?

> +daemon can reset the ring to an empty state and signal completion by
> +clearing the flag and calling notify() again.

Same here.

> diff --git a/tools/firmware/hvmloader/xenbus.c
> b/tools/firmware/hvmloader/xenbus.c
> index fe72e97..f85832c 100644
> --- a/tools/firmware/hvmloader/xenbus.c
> +++ b/tools/firmware/hvmloader/xenbus.c
> @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /*
> Shared ring with dom0 */
>  static evtchn_port_t event;                     /* Event-channel to dom0 */
>  static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area
> */
> 
> +static void ring_wait(void)
> +{
> +    struct shared_info *shinfo = get_shared_info();
> +    struct sched_poll poll;
> +
> +    memset(&poll, 0, sizeof(poll));
> +    set_xen_guest_handle(poll.ports, &event);
> +    poll.nr_ports = 1;
> +
> +    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
> +        hypercall_sched_op(SCHEDOP_poll, &poll);
> +}
> +
>  /* Connect our xenbus client to the backend.
>   * Call once, before any other xenbus actions. */
>  void xenbus_setup(void)
> @@ -68,10 +81,15 @@ void xenbus_shutdown(void)
> 
>      ASSERT(rings != NULL);
> 
> -    /* We zero out the whole ring -- the backend can handle this, and it's
> -     * not going to surprise any frontends since it's equivalent to never
> -     * having used the rings. */
> -    memset(rings, 0, sizeof *rings);
> +    if (rings->server_version > 0) {
> +        rings->closing = 1;
> +        while (*(volatile uint32_t*)&rings->closing == 1)
> +            ring_wait ();
> +    } else
> +        /* If the backend reads the state while we're erasing it then the
> +         * ring state will become corrupted, preventing guest frontends from
> +         * connecting. This is rare. */
> +        memset(rings, 0, sizeof *rings);
> 
>      /* Clear the event-channel state too. */
>      memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
> @@ -81,19 +99,6 @@ void xenbus_shutdown(void)
>      rings = NULL;
>  }
> 
> -static void ring_wait(void)
> -{
> -    struct shared_info *shinfo = get_shared_info();
> -    struct sched_poll poll;
> -
> -    memset(&poll, 0, sizeof(poll));
> -    set_xen_guest_handle(poll.ports, &event);
> -    poll.nr_ports = 1;
> -
> -    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
> -        hypercall_sched_op(SCHEDOP_poll, &poll);
> -}
> -
>  /* Helper functions: copy data in and out of the ring */
>  static void ring_write(const char *data, uint32_t len)
>  {
> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
> index 29d354d..d5cd776 100644
> --- a/tools/ocaml/libs/xb/xb.ml
> +++ b/tools/ocaml/libs/xb/xb.ml
> @@ -84,7 +84,26 @@ let write con s len =
>  	| Fd backfd     -> write_fd backfd con s len
>  	| Xenmmap backmmap -> write_mmap backmmap con s len
> 
> -let output con =
> +(* If a function throws Xs_ring.Closing, then clear the ring state
> +   and serve the ring again. *)
> +let rec handle_closing f con =
> +	match (try Some (f con) with Xs_ring.Closing -> None) with
> +	| Some x -> x
> +	| None ->
> +		begin match con.backend with
> +		| Fd _ -> raise Xs_ring.Closing (* should never happen, but
> just in case *)
> +		| Xenmmap backend ->
> +			Xs_ring.close backend.mmap;
> +			backend.eventchn_notify ();
> +			(* Clear our old connection state *)
> +			Queue.clear con.pkt_in;
> +			Queue.clear con.pkt_out;
> +			con.partial_in <- init_partial_in ();
> +			con.partial_out <- "";
> +			handle_closing f con
> +		end
> +
> +let output = handle_closing (fun con ->
>  	(* get the output string from a string_of(packet) or partial_out *)
>  	let s = if String.length con.partial_out > 0 then
>  			con.partial_out
> @@ -101,8 +120,9 @@ let output con =
>  	);
>  	(* after sending one packet, partial is empty *)
>  	con.partial_out = ""
> +)
> 
> -let input con =
> +let input = handle_closing (fun con ->
>  	let newpacket = ref false in
>  	let to_read =
>  		match con.partial_in with
> @@ -133,6 +153,7 @@ let input con =
>  			HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
>  	);
>  	!newpacket
> +)
> 
>  let newcon backend = {
>  	backend = backend;
> @@ -145,6 +166,7 @@ let newcon backend = {
>  let open_fd fd = newcon (Fd { fd = fd; })
> 
>  let open_mmap mmap notifyfct =
> +	Xs_ring.set_server_version mmap 1; (* defined in xs_wire.h *)
>  	newcon (Xenmmap {
>  		mmap = mmap;
>  		eventchn_notify = notifyfct;
> diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
> index 58234ae..7f65fa3 100644
> --- a/tools/ocaml/libs/xb/xb.mli
> +++ b/tools/ocaml/libs/xb/xb.mli
> @@ -80,6 +80,7 @@ val read : t -> string -> int -> int
>  val write_fd : backend_fd -> 'a -> string -> int -> int
>  val write_mmap : backend_mmap -> 'a -> string -> int -> int
>  val write : t -> string -> int -> int
> +val handle_closing : (t -> 'a) -> t -> 'a
>  val output : t -> bool
>  val input : t -> bool
>  val newcon : backend -> t
> diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
> index 9469609..d7f0fd4 100644
> --- a/tools/ocaml/libs/xb/xs_ring.ml
> +++ b/tools/ocaml/libs/xb/xs_ring.ml
> @@ -14,5 +14,16 @@
>   * GNU Lesser General Public License for more details.
>   *)
> 
> +exception Closing
> +
> +let _ =
> +  Callback.register_exception "Xs_ring.Closing" Closing
> +
>  external read: Xenmmap.mmap_interface -> string -> int -> int =
> "ml_interface_read"
>  external write: Xenmmap.mmap_interface -> string -> int -> int =
> "ml_interface_write"
> +
> +
> +external set_server_version: Xenmmap.mmap_interface -> int -> unit =
> "ml_interface_set_server_version" "noalloc"
> +external get_server_version: Xenmmap.mmap_interface -> int =
> "ml_interface_get_server_version" "noalloc"
> +
> +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close"
> "noalloc"
> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
> b/tools/ocaml/libs/xb/xs_ring_stubs.c
> index 8bd1047..27c98cd 100644
> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> @@ -35,19 +35,28 @@
> 
>  #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
> 
> +#define ERROR_UNKNOWN (-1)
> +#define ERROR_CLOSING (-2)
> +
>  static int xs_ring_read(struct mmap_interface *interface,
>                               char *buffer, int len)
>  {
>  	struct xenstore_domain_interface *intf = interface->addr;
>  	XENSTORE_RING_IDX cons, prod; /* offsets only */
>  	int to_read;
> +	uint32_t closing;
> 
>  	cons = *(volatile uint32*)&intf->req_cons;
>  	prod = *(volatile uint32*)&intf->req_prod;
> +	closing = *(volatile uint32*)&intf->closing;
> +
> +	if (closing != 0)
> +		return ERROR_CLOSING;
> +
>  	xen_mb();
> 
>  	if ((prod - cons) > XENSTORE_RING_SIZE)
> -	    return -1;
> +	    return ERROR_UNKNOWN;
> 
>  	if (prod == cons)
>  		return 0;
> @@ -71,9 +80,15 @@ static int xs_ring_write(struct mmap_interface
> *interface,
>  	struct xenstore_domain_interface *intf = interface->addr;
>  	XENSTORE_RING_IDX cons, prod;
>  	int can_write;
> +	uint32_t closing;
> 
>  	cons = *(volatile uint32*)&intf->rsp_cons;
>  	prod = *(volatile uint32*)&intf->rsp_prod;
> +	closing = *(volatile uint32*)&intf->closing;
> +
> +	if (closing != 0)
> +		return ERROR_CLOSING;
> +
>  	xen_mb();
>  	if ( (prod - cons) >= XENSTORE_RING_SIZE )
>  		return 0;
> @@ -97,8 +112,12 @@ CAMLprim value ml_interface_read(value interface,
> value buffer, value len)
> 
>  	res = xs_ring_read(GET_C_STRUCT(interface),
>  	                   String_val(buffer), Int_val(len));
> -	if (res == -1)
> +	if (res == ERROR_UNKNOWN)
>  		caml_failwith("bad connection");
> +
> +	if (res == ERROR_CLOSING)
> +

This looks a bit wrong. The blank line above should be removed and the following line indented appropriately.

> 	caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
> +
>  	result = Val_int(res);
>  	CAMLreturn(result);
>  }
> @@ -111,6 +130,46 @@ CAMLprim value ml_interface_write(value interface,
> value buffer, value len)
> 
>  	res = xs_ring_write(GET_C_STRUCT(interface),
>  	                    String_val(buffer), Int_val(len));
> +
> +	if (res == ERROR_CLOSING)
> +

Same here.

> 	caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
> +
>  	result = Val_int(res);
>  	CAMLreturn(result);
>  }
> +
> +CAMLprim value ml_interface_set_server_version(value interface, value v)
> +{
> +	CAMLparam2(interface, v);
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +
> +	intf->server_version = Int_val(v);
> +
> +	CAMLreturn(Val_unit);
> +}
> +
> +CAMLprim value ml_interface_get_server_version(value interface)
> +{
> +	CAMLparam1(interface);
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +
> +	CAMLreturn(Val_int (intf->server_version));
> +}
> +
> +CAMLprim value ml_interface_close(value interface)
> +{
> +	CAMLparam1(interface);
> +	const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
> >addr;
> +	int i;
> +
> +	intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod =
> 0;
> +	/* Ensure the unused space is full of invalid xenstore packets. */
> +	for (i = 0; i < XENSTORE_RING_SIZE; i++) {
> +		intf->req[i] = invalid_data[i % sizeof(invalid_data)];
> +		intf->rsp[i] = invalid_data[i % sizeof(invalid_data)];
> +	}

This does not reset the rings to their initial state (i.e. all zeroes). I don't think that's necessarily a problem, but it is inconsistent.

> +	xen_mb ();
> +	intf->closing = 0;
> +	CAMLreturn(Val_unit);
> +}
> diff --git a/xen/include/public/io/xs_wire.h
> b/xen/include/public/io/xs_wire.h
> index 585f0c8..022d614 100644
> --- a/xen/include/public/io/xs_wire.h
> +++ b/xen/include/public/io/xs_wire.h
> @@ -108,14 +108,19 @@ enum xs_watch_type
>   * `incontents 150 xenstore_struct XenStore wire protocol.
>   *
>   * Inter-domain shared memory communications. */
> +

Pure whitespace change. Not necessary.

  Paul

>  #define XENSTORE_RING_SIZE 1024
>  typedef uint32_t XENSTORE_RING_IDX;
>  #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
>  struct xenstore_domain_interface {
> +    /* XENSTORE_VERSION_0 */
>      char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
>      char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>      XENSTORE_RING_IDX req_cons, req_prod;
>      XENSTORE_RING_IDX rsp_cons, rsp_prod;
> +    uint32_t server_version;
> +    /* server_version 1 and later: */
> +    uint32_t closing;             /* Non-zero means close in progress */
>  };
> 
>  /* Violating this is very bad.  See docs/misc/xenstore.txt. */
> --
> 1.7.10.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
  2014-07-03 15:15                       ` [PATCH v3] " David Scott
  2014-07-04  8:21                         ` Paul Durrant
@ 2014-07-04  9:59                         ` Ian Campbell
  2014-07-04 10:19                           ` Dave Scott
  2014-09-03 16:25                         ` [PATCH v4] xenstore: " David Scott
  2 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2014-07-04  9:59 UTC (permalink / raw)
  To: David Scott
  Cc: xen-devel, paul.durrant, andrew.cooper3, yanqiangjun, john.liuqiming

On Thu, 2014-07-03 at 16:15 +0100, David Scott wrote:

This all looks broadly sensible to me.

> Currently hvmloader uses the xenstore ring and then tries to
> reset it back to its initial state. This is not part of the
> ring protocol and, if xenstored reads the ring while it is
> happening, xenstored will conclude it is corrupted. A corrupted
> ring will prevent PV drivers from connecting. This seems to
> be a rare failure.
> 
> Furthermore, when a VM crashes it may jump to a 'crash kernel'
> to create a diagnostic dump. Without the ability to safely
> reset the ring the PV drivers won't be able to reliably
> establish connections, to (for example) stream a memory dump to
> disk.
> 
> This prototype patch contains a simple extension of the
> xenstore ring structure, enough to contain version numbers and
> a 'closing' flag. This memory is currently unused within the
> 4k page and should be zeroed as part of the domain setup
> process. The oxenstored server advertises version 1, and
> hvmloader detects this and sets the 'closing' flag. The server
> then reinitialises the ring, filling it with obviously invalid
> data to help debugging (unfortunately blocks of zeroes are
> valid xenstore packets) and signals hvmloader by the event
> channel that it's safe to boot the guest OS.
> 
> Updates since v2 (thanks to Andy Cooper for the review):

Please can you put these updates after the "---" marker. This will cause
the to be omitted from the final commit by the git am tool.

> * hvmloader: use volatile for read of closing flag
> * style improvements
> * remove xenstore version #defines
> 
> Updates since v1 (thanks to Paul Durrant for the review):
> * remove unused client version and associated dead code
> * treat 'closing' as a flag by using "!=0" rather than "==1"
> * hvmloader: move function up and remove forward decl
> * document the existing xenstore ring and the extention in misc/
> 
> Signed-off-by: David Scott <dave.scott@citrix.com>
> ---
>  docs/misc/xenstore-ring.txt         |   79 +++++++++++++++++++++++++++++++++++
>  tools/firmware/hvmloader/xenbus.c   |   39 +++++++++--------
>  tools/ocaml/libs/xb/xb.ml           |   26 +++++++++++-
>  tools/ocaml/libs/xb/xb.mli          |    1 +
>  tools/ocaml/libs/xb/xs_ring.ml      |   11 +++++
>  tools/ocaml/libs/xb/xs_ring_stubs.c |   63 +++++++++++++++++++++++++++-
>  xen/include/public/io/xs_wire.h     |    5 +++
>  7 files changed, 203 insertions(+), 21 deletions(-)
>  create mode 100644 docs/misc/xenstore-ring.txt
> 
> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
> new file mode 100644
> index 0000000..df2a09f
> --- /dev/null
> +++ b/docs/misc/xenstore-ring.txt
> @@ -0,0 +1,79 @@
> +
> +The domain builder allocates a single page and shares it with the xenstore
> +daemon. This document describes the ring protocol used to communicate via
> +this page which is used to transmit and receive
> +[xenstore protocol packets](xenstore.txt).
> +
> +In the original version (we call this "version 0"), the shared page has the
> +following contents (all offsets and lengths are in octets):
> +
> +Offset  Length  Description
> +-----------------------------------------------------------------
> +0       1024    Requests: data sent to the xenstore daemon
> +1024    1024    Replies: data sent to the domain
> +2048    4       Request consumer offset
> +2052    4       Request producer offset
> +2056    4       Response consumer offset
> +2060    4       Response producer offset
> +
> +When the page is allocated it is guaranteed to be full of zeroes.
> +
> +The "Requests" and "Replies" are treated as circular buffers, one for
> +each direction. Each circular buffer is associated wth a producer and
> +consumer offset, which are free-running counters starting from 0. A "producer"
> +offset is the offset in the byte stream of the next byte to be written; a
> +"consumer" offset is the offset in the byte stream of the next byte to be
> +read. The byte at offset 'x' in the byte stream will be stored in
> +offset 'x mod 1024' in the circular buffer. "producer - consumer" gives
> +the number of bytes still to be read, and "1024 - (producer - consumer)"
> +therefore gives the amount of space currently available for writing,
> +where we must avoid overwriting unread data.
> +
> +The circular buffers are only used to send sequences of bytes between domains.
> +It is the responsibility of the layer above to segment these bytes into
> +packets, as described in [xenstore.txt](xenstore.txt).
> +
> +The client and server domains can run concurrently on different cores and
> +different sockets. We must therefore take care to avoid corruption by:
> +
> +  1. using atomic loads and stores when reading and writing the producer
> +     and consumer offsets. If an offset were to be updated by a non-atomic
> +     store then the other domain may read an invalid offset value.
> +  2. ensuring request and reply data is fully read or written before
> +     updating the offsets by issuing a memory barrier.
> +
> +If we wish to read data but find the circular buffer empty, or wish to write
> +data and find the circular buffer full, we wait on a pre-arranged event
> +channel. When the other party next reads or writes data to the ring, it
> +will notify() this event channel and we will wake.
> +
> +Protocol extension for reconnection
> +-----------------------------------
> +
> +In version 0 of the protocol it is not possible to close and reopen the
> +connection. This means that if the ring is corrupted, it can never be used
> +(safely) again. Extending the protocol to allow reconnection would allow
> +us to:

The use of past tense here is a bit odd given that this patch implements
and enables all of this stuff.

> +
> +  1. use the ring in the firmware (hvmloader) and safely reset it for use
> +     by the guest
> +  2. re-establish a ring in a 'crash kernel' environment, allowing us to
> +     write crash dumps to PV disks or network devices.
> +
> +In version 1 of the protocol the ring is extended with the following
> +fields:
> +
> +Offset  Length  Description
> +-----------------------------------------------------------------
> +2064    4       Xenstore daemon supported protocol version
> +2068    4       Close request flag
> +
> +In a system supporting only version 0, both these fields are guaranteed
> +to contain 0 by the domain builder.
> +
> +In a system supporting version 1, the xenstore daemon will write "1" into
> +the support protocol version field. The guest xenstore client (eg in
> +hvmloader) can query the version, and if it is set to "1" it can write
> +"1" to the close request flag and call notify(). The supporting xenstore
> +daemon can reset the ring to an empty state and signal completion by
> +clearing the flag and calling notify() again.

I agree with Paul WRT s/calling notify()/kick the evtchn/.

> diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
> index fe72e97..f85832c 100644
> --- a/tools/firmware/hvmloader/xenbus.c
> +++ b/tools/firmware/hvmloader/xenbus.c
> @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* Shared ring with dom0 */
>  static evtchn_port_t event;                     /* Event-channel to dom0 */
>  static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area */
>  
> +static void ring_wait(void)
> +{
> +    struct shared_info *shinfo = get_shared_info();
> +    struct sched_poll poll;
> +
> +    memset(&poll, 0, sizeof(poll));
> +    set_xen_guest_handle(poll.ports, &event);
> +    poll.nr_ports = 1;
> +
> +    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
> +        hypercall_sched_op(SCHEDOP_poll, &poll);
> +}
> +
>  /* Connect our xenbus client to the backend.
>   * Call once, before any other xenbus actions. */
>  void xenbus_setup(void)
> @@ -68,10 +81,15 @@ void xenbus_shutdown(void)
>  
>      ASSERT(rings != NULL);
>  
> -    /* We zero out the whole ring -- the backend can handle this, and it's 
> -     * not going to surprise any frontends since it's equivalent to never 
> -     * having used the rings. */
> -    memset(rings, 0, sizeof *rings);
> +    if (rings->server_version > 0) {

hvmloader is part of the Xen release and is therefore entitled to assume
that it is running against the xenstore(s) which shipped with that
release of Xen.

IOW you can omit this check or replace it with an assertion if you
prefer.

(Later: OIC, I thought you'd patched the cxenstored but that was
actually just the ocaml stubs. This is fine then)

> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
> index 29d354d..d5cd776 100644
> --- a/tools/ocaml/libs/xb/xb.ml
> +++ b/tools/ocaml/libs/xb/xb.ml

I'll trust you've got all this ocaml stuff right ;-)

Is there someone else you can CC to get a pair of savvy eyes on it?

> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
> index 8bd1047..27c98cd 100644
> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> @@ -35,19 +35,28 @@
>  
>  #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
>  
> +#define ERROR_UNKNOWN (-1)
> +#define ERROR_CLOSING (-2)
> +
>  static int xs_ring_read(struct mmap_interface *interface,
>                               char *buffer, int len)
>  {
>  	struct xenstore_domain_interface *intf = interface->addr;
>  	XENSTORE_RING_IDX cons, prod; /* offsets only */
>  	int to_read;
> +	uint32_t closing;
>  
>  	cons = *(volatile uint32*)&intf->req_cons;
>  	prod = *(volatile uint32*)&intf->req_prod;
> +	closing = *(volatile uint32*)&intf->closing;
> +
> +	if (closing != 0)
> +		return ERROR_CLOSING;

It's not possible to just raise the relevant exception here rather than
propagating this to all the callers?

> +CAMLprim value ml_interface_close(value interface)
> +{
> +	CAMLparam1(interface);
> +	const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
> +	int i;
> +
> +	intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
> +	/* Ensure the unused space is full of invalid xenstore packets. */

Is filling with ones or zeroes not sufficient?

Also, is this strictly necessary of just for debugging purposes?

Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
  2014-07-04  8:21                         ` Paul Durrant
@ 2014-07-04 10:00                           ` Dave Scott
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Scott @ 2014-07-04 10:00 UTC (permalink / raw)
  To: Paul Durrant
  Cc: xen-devel, yanqiangjun, Andrew Cooper, Ian Campbell, john.liuqiming

Hi Paul,

On 4 Jul 2014, at 09:21, Paul Durrant <Paul.Durrant@citrix.com> wrote:

>> -----Original Message-----
>> From: David Scott [mailto:dave.scott@citrix.com]
>> Sent: 03 July 2014 16:15
>> To: xen-devel@lists.xenproject.org
>> Cc: Paul Durrant; Ian Campbell; john.liuqiming@huawei.com;
>> yanqiangjun@huawei.com; Andrew Cooper; Dave Scott
>> Subject: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
>> 
>> Currently hvmloader uses the xenstore ring and then tries to
>> reset it back to its initial state. This is not part of the
>> ring protocol and, if xenstored reads the ring while it is
>> happening, xenstored will conclude it is corrupted. A corrupted
>> ring will prevent PV drivers from connecting. This seems to
>> be a rare failure.
>> 
>> Furthermore, when a VM crashes it may jump to a 'crash kernel'
>> to create a diagnostic dump. Without the ability to safely
>> reset the ring the PV drivers won't be able to reliably
>> establish connections, to (for example) stream a memory dump to
>> disk.
>> 
>> This prototype patch contains a simple extension of the
>> xenstore ring structure, enough to contain version numbers and
>> a 'closing' flag. This memory is currently unused within the
>> 4k page and should be zeroed as part of the domain setup
>> process. The oxenstored server advertises version 1, and
>> hvmloader detects this and sets the 'closing' flag. The server
>> then reinitialises the ring, filling it with obviously invalid
>> data to help debugging (unfortunately blocks of zeroes are
>> valid xenstore packets) and signals hvmloader by the event
>> channel that it's safe to boot the guest OS.
>> 
>> Updates since v2 (thanks to Andy Cooper for the review):
>> * hvmloader: use volatile for read of closing flag
>> * style improvements
>> * remove xenstore version #defines
>> 
>> Updates since v1 (thanks to Paul Durrant for the review):
>> * remove unused client version and associated dead code
>> * treat 'closing' as a flag by using "!=0" rather than "==1"
>> * hvmloader: move function up and remove forward decl
>> * document the existing xenstore ring and the extention in misc/
>> 
>> Signed-off-by: David Scott <dave.scott@citrix.com>
>> ---
>> docs/misc/xenstore-ring.txt         |   79
>> +++++++++++++++++++++++++++++++++++
>> tools/firmware/hvmloader/xenbus.c   |   39 +++++++++--------
>> tools/ocaml/libs/xb/xb.ml           |   26 +++++++++++-
>> tools/ocaml/libs/xb/xb.mli          |    1 +
>> tools/ocaml/libs/xb/xs_ring.ml      |   11 +++++
>> tools/ocaml/libs/xb/xs_ring_stubs.c |   63
>> +++++++++++++++++++++++++++-
>> xen/include/public/io/xs_wire.h     |    5 +++
>> 7 files changed, 203 insertions(+), 21 deletions(-)
>> create mode 100644 docs/misc/xenstore-ring.txt
>> 
>> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
>> new file mode 100644
>> index 0000000..df2a09f
>> --- /dev/null
>> +++ b/docs/misc/xenstore-ring.txt
>> @@ -0,0 +1,79 @@
>> +
>> +The domain builder allocates a single page and shares it with the xenstore
>> +daemon. This document describes the ring protocol used to communicate
>> via
>> +this page which is used to transmit and receive
>> +[xenstore protocol packets](xenstore.txt).
>> +
>> +In the original version (we call this "version 0"), the shared page has the
>> +following contents (all offsets and lengths are in octets):
>> +
>> +Offset  Length  Description
>> +-----------------------------------------------------------------
>> +0       1024    Requests: data sent to the xenstore daemon
>> +1024    1024    Replies: data sent to the domain
>> +2048    4       Request consumer offset
>> +2052    4       Request producer offset
>> +2056    4       Response consumer offset
>> +2060    4       Response producer offset
>> +
>> +When the page is allocated it is guaranteed to be full of zeroes.
>> +
>> +The "Requests" and "Replies" are treated as circular buffers, one for
>> +each direction. Each circular buffer is associated wth a producer and
>> +consumer offset, which are free-running counters starting from 0. A
>> "producer"
>> +offset is the offset in the byte stream of the next byte to be written; a
>> +"consumer" offset is the offset in the byte stream of the next byte to be
>> +read. The byte at offset 'x' in the byte stream will be stored in
>> +offset 'x mod 1024' in the circular buffer. "producer - consumer" gives
>> +the number of bytes still to be read, and "1024 - (producer - consumer)"
>> +therefore gives the amount of space currently available for writing,
>> +where we must avoid overwriting unread data.
>> +
>> +The circular buffers are only used to send sequences of bytes between
>> domains.
>> +It is the responsibility of the layer above to segment these bytes into
>> +packets, as described in [xenstore.txt](xenstore.txt).
>> +
>> +The client and server domains can run concurrently on different cores and
>> +different sockets. We must therefore take care to avoid corruption by:
>> +
>> +  1. using atomic loads and stores when reading and writing the producer
>> +     and consumer offsets. If an offset were to be updated by a non-atomic
>> +     store then the other domain may read an invalid offset value.
>> +  2. ensuring request and reply data is fully read or written before
>> +     updating the offsets by issuing a memory barrier.
>> +
>> +If we wish to read data but find the circular buffer empty, or wish to write
>> +data and find the circular buffer full, we wait on a pre-arranged event
>> +channel. When the other party next reads or writes data to the ring, it
>> +will notify() this event channel and we will wake.
>> +
>> +Protocol extension for reconnection
>> +-----------------------------------
>> +
>> +In version 0 of the protocol it is not possible to close and reopen the
>> +connection. This means that if the ring is corrupted, it can never be used
>> +(safely) again. Extending the protocol to allow reconnection would allow
>> +us to:
>> +
>> +  1. use the ring in the firmware (hvmloader) and safely reset it for use
>> +     by the guest
>> +  2. re-establish a ring in a 'crash kernel' environment, allowing us to
>> +     write crash dumps to PV disks or network devices.
>> +
>> +In version 1 of the protocol the ring is extended with the following
>> +fields:
>> +
>> +Offset  Length  Description
>> +-----------------------------------------------------------------
>> +2064    4       Xenstore daemon supported protocol version
>> +2068    4       Close request flag
>> +
>> +In a system supporting only version 0, both these fields are guaranteed
>> +to contain 0 by the domain builder.
>> +
>> +In a system supporting version 1, the xenstore daemon will write "1" into
>> +the support protocol version field. The guest xenstore client (eg in
> 
> s/support/supported
> 
>> +hvmloader) can query the version, and if it is set to "1" it can write
>> +"1" to the close request flag and call notify(). The supporting xenstore
> 
> Perhaps, rather than 'call notify()' this should be 'signal the store event channel'?
> 
>> +daemon can reset the ring to an empty state and signal completion by
>> +clearing the flag and calling notify() again.
> 
> Same here.
> 
>> diff --git a/tools/firmware/hvmloader/xenbus.c
>> b/tools/firmware/hvmloader/xenbus.c
>> index fe72e97..f85832c 100644
>> --- a/tools/firmware/hvmloader/xenbus.c
>> +++ b/tools/firmware/hvmloader/xenbus.c
>> @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /*
>> Shared ring with dom0 */
>> static evtchn_port_t event;                     /* Event-channel to dom0 */
>> static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area
>> */
>> 
>> +static void ring_wait(void)
>> +{
>> +    struct shared_info *shinfo = get_shared_info();
>> +    struct sched_poll poll;
>> +
>> +    memset(&poll, 0, sizeof(poll));
>> +    set_xen_guest_handle(poll.ports, &event);
>> +    poll.nr_ports = 1;
>> +
>> +    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
>> +        hypercall_sched_op(SCHEDOP_poll, &poll);
>> +}
>> +
>> /* Connect our xenbus client to the backend.
>>  * Call once, before any other xenbus actions. */
>> void xenbus_setup(void)
>> @@ -68,10 +81,15 @@ void xenbus_shutdown(void)
>> 
>>     ASSERT(rings != NULL);
>> 
>> -    /* We zero out the whole ring -- the backend can handle this, and it's
>> -     * not going to surprise any frontends since it's equivalent to never
>> -     * having used the rings. */
>> -    memset(rings, 0, sizeof *rings);
>> +    if (rings->server_version > 0) {
>> +        rings->closing = 1;
>> +        while (*(volatile uint32_t*)&rings->closing == 1)
>> +            ring_wait ();
>> +    } else
>> +        /* If the backend reads the state while we're erasing it then the
>> +         * ring state will become corrupted, preventing guest frontends from
>> +         * connecting. This is rare. */
>> +        memset(rings, 0, sizeof *rings);
>> 
>>     /* Clear the event-channel state too. */
>>     memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
>> @@ -81,19 +99,6 @@ void xenbus_shutdown(void)
>>     rings = NULL;
>> }
>> 
>> -static void ring_wait(void)
>> -{
>> -    struct shared_info *shinfo = get_shared_info();
>> -    struct sched_poll poll;
>> -
>> -    memset(&poll, 0, sizeof(poll));
>> -    set_xen_guest_handle(poll.ports, &event);
>> -    poll.nr_ports = 1;
>> -
>> -    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
>> -        hypercall_sched_op(SCHEDOP_poll, &poll);
>> -}
>> -
>> /* Helper functions: copy data in and out of the ring */
>> static void ring_write(const char *data, uint32_t len)
>> {
>> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
>> index 29d354d..d5cd776 100644
>> --- a/tools/ocaml/libs/xb/xb.ml
>> +++ b/tools/ocaml/libs/xb/xb.ml
>> @@ -84,7 +84,26 @@ let write con s len =
>> 	| Fd backfd     -> write_fd backfd con s len
>> 	| Xenmmap backmmap -> write_mmap backmmap con s len
>> 
>> -let output con =
>> +(* If a function throws Xs_ring.Closing, then clear the ring state
>> +   and serve the ring again. *)
>> +let rec handle_closing f con =
>> +	match (try Some (f con) with Xs_ring.Closing -> None) with
>> +	| Some x -> x
>> +	| None ->
>> +		begin match con.backend with
>> +		| Fd _ -> raise Xs_ring.Closing (* should never happen, but
>> just in case *)
>> +		| Xenmmap backend ->
>> +			Xs_ring.close backend.mmap;
>> +			backend.eventchn_notify ();
>> +			(* Clear our old connection state *)
>> +			Queue.clear con.pkt_in;
>> +			Queue.clear con.pkt_out;
>> +			con.partial_in <- init_partial_in ();
>> +			con.partial_out <- "";
>> +			handle_closing f con
>> +		end
>> +
>> +let output = handle_closing (fun con ->
>> 	(* get the output string from a string_of(packet) or partial_out *)
>> 	let s = if String.length con.partial_out > 0 then
>> 			con.partial_out
>> @@ -101,8 +120,9 @@ let output con =
>> 	);
>> 	(* after sending one packet, partial is empty *)
>> 	con.partial_out = ""
>> +)
>> 
>> -let input con =
>> +let input = handle_closing (fun con ->
>> 	let newpacket = ref false in
>> 	let to_read =
>> 		match con.partial_in with
>> @@ -133,6 +153,7 @@ let input con =
>> 			HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
>> 	);
>> 	!newpacket
>> +)
>> 
>> let newcon backend = {
>> 	backend = backend;
>> @@ -145,6 +166,7 @@ let newcon backend = {
>> let open_fd fd = newcon (Fd { fd = fd; })
>> 
>> let open_mmap mmap notifyfct =
>> +	Xs_ring.set_server_version mmap 1; (* defined in xs_wire.h *)
>> 	newcon (Xenmmap {
>> 		mmap = mmap;
>> 		eventchn_notify = notifyfct;
>> diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
>> index 58234ae..7f65fa3 100644
>> --- a/tools/ocaml/libs/xb/xb.mli
>> +++ b/tools/ocaml/libs/xb/xb.mli
>> @@ -80,6 +80,7 @@ val read : t -> string -> int -> int
>> val write_fd : backend_fd -> 'a -> string -> int -> int
>> val write_mmap : backend_mmap -> 'a -> string -> int -> int
>> val write : t -> string -> int -> int
>> +val handle_closing : (t -> 'a) -> t -> 'a
>> val output : t -> bool
>> val input : t -> bool
>> val newcon : backend -> t
>> diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
>> index 9469609..d7f0fd4 100644
>> --- a/tools/ocaml/libs/xb/xs_ring.ml
>> +++ b/tools/ocaml/libs/xb/xs_ring.ml
>> @@ -14,5 +14,16 @@
>>  * GNU Lesser General Public License for more details.
>>  *)
>> 
>> +exception Closing
>> +
>> +let _ =
>> +  Callback.register_exception "Xs_ring.Closing" Closing
>> +
>> external read: Xenmmap.mmap_interface -> string -> int -> int =
>> "ml_interface_read"
>> external write: Xenmmap.mmap_interface -> string -> int -> int =
>> "ml_interface_write"
>> +
>> +
>> +external set_server_version: Xenmmap.mmap_interface -> int -> unit =
>> "ml_interface_set_server_version" "noalloc"
>> +external get_server_version: Xenmmap.mmap_interface -> int =
>> "ml_interface_get_server_version" "noalloc"
>> +
>> +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close"
>> "noalloc"
>> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c
>> b/tools/ocaml/libs/xb/xs_ring_stubs.c
>> index 8bd1047..27c98cd 100644
>> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
>> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
>> @@ -35,19 +35,28 @@
>> 
>> #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
>> 
>> +#define ERROR_UNKNOWN (-1)
>> +#define ERROR_CLOSING (-2)
>> +
>> static int xs_ring_read(struct mmap_interface *interface,
>>                              char *buffer, int len)
>> {
>> 	struct xenstore_domain_interface *intf = interface->addr;
>> 	XENSTORE_RING_IDX cons, prod; /* offsets only */
>> 	int to_read;
>> +	uint32_t closing;
>> 
>> 	cons = *(volatile uint32*)&intf->req_cons;
>> 	prod = *(volatile uint32*)&intf->req_prod;
>> +	closing = *(volatile uint32*)&intf->closing;
>> +
>> +	if (closing != 0)
>> +		return ERROR_CLOSING;
>> +
>> 	xen_mb();
>> 
>> 	if ((prod - cons) > XENSTORE_RING_SIZE)
>> -	    return -1;
>> +	    return ERROR_UNKNOWN;
>> 
>> 	if (prod == cons)
>> 		return 0;
>> @@ -71,9 +80,15 @@ static int xs_ring_write(struct mmap_interface
>> *interface,
>> 	struct xenstore_domain_interface *intf = interface->addr;
>> 	XENSTORE_RING_IDX cons, prod;
>> 	int can_write;
>> +	uint32_t closing;
>> 
>> 	cons = *(volatile uint32*)&intf->rsp_cons;
>> 	prod = *(volatile uint32*)&intf->rsp_prod;
>> +	closing = *(volatile uint32*)&intf->closing;
>> +
>> +	if (closing != 0)
>> +		return ERROR_CLOSING;
>> +
>> 	xen_mb();
>> 	if ( (prod - cons) >= XENSTORE_RING_SIZE )
>> 		return 0;
>> @@ -97,8 +112,12 @@ CAMLprim value ml_interface_read(value interface,
>> value buffer, value len)
>> 
>> 	res = xs_ring_read(GET_C_STRUCT(interface),
>> 	                   String_val(buffer), Int_val(len));
>> -	if (res == -1)
>> +	if (res == ERROR_UNKNOWN)
>> 		caml_failwith("bad connection");
>> +
>> +	if (res == ERROR_CLOSING)
>> +
> 
> This looks a bit wrong. The blank line above should be removed and the following line indented appropriately.
> 
>> 	caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
>> +
>> 	result = Val_int(res);
>> 	CAMLreturn(result);
>> }
>> @@ -111,6 +130,46 @@ CAMLprim value ml_interface_write(value interface,
>> value buffer, value len)
>> 
>> 	res = xs_ring_write(GET_C_STRUCT(interface),
>> 	                    String_val(buffer), Int_val(len));
>> +
>> +	if (res == ERROR_CLOSING)
>> +
> 
> Same here.
> 
>> 	caml_raise_constant(*caml_named_value("Xs_ring.Closing"));
>> +
>> 	result = Val_int(res);
>> 	CAMLreturn(result);
>> }
>> +
>> +CAMLprim value ml_interface_set_server_version(value interface, value v)
>> +{
>> +	CAMLparam2(interface, v);
>> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +
>> +	intf->server_version = Int_val(v);
>> +
>> +	CAMLreturn(Val_unit);
>> +}
>> +
>> +CAMLprim value ml_interface_get_server_version(value interface)
>> +{
>> +	CAMLparam1(interface);
>> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +
>> +	CAMLreturn(Val_int (intf->server_version));
>> +}
>> +
>> +CAMLprim value ml_interface_close(value interface)
>> +{
>> +	CAMLparam1(interface);
>> +	const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
>> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)-
>>> addr;
>> +	int i;
>> +
>> +	intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod =
>> 0;
>> +	/* Ensure the unused space is full of invalid xenstore packets. */
>> +	for (i = 0; i < XENSTORE_RING_SIZE; i++) {
>> +		intf->req[i] = invalid_data[i % sizeof(invalid_data)];
>> +		intf->rsp[i] = invalid_data[i % sizeof(invalid_data)];
>> +	}
> 
> This does not reset the rings to their initial state (i.e. all zeroes). I don't think that's necessarily a problem, but it is inconsistent.

Indeed — I should document this difference in the API.

In one case where I saw the original bug the xenstore logs were full of ‘DEBUG’ messages, because message id = 0 = DEBUG, request id = 0, transaction id = 0 and length = 0 is a valid packet. This invalid data should cause the corruption to be noticed more quickly :-)

Thanks,
Dave

> 
>> +	xen_mb ();
>> +	intf->closing = 0;
>> +	CAMLreturn(Val_unit);
>> +}
>> diff --git a/xen/include/public/io/xs_wire.h
>> b/xen/include/public/io/xs_wire.h
>> index 585f0c8..022d614 100644
>> --- a/xen/include/public/io/xs_wire.h
>> +++ b/xen/include/public/io/xs_wire.h
>> @@ -108,14 +108,19 @@ enum xs_watch_type
>>  * `incontents 150 xenstore_struct XenStore wire protocol.
>>  *
>>  * Inter-domain shared memory communications. */
>> +
> 
> Pure whitespace change. Not necessary.
> 
>  Paul
> 
>> #define XENSTORE_RING_SIZE 1024
>> typedef uint32_t XENSTORE_RING_IDX;
>> #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1))
>> struct xenstore_domain_interface {
>> +    /* XENSTORE_VERSION_0 */
>>     char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */
>>     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>>     XENSTORE_RING_IDX req_cons, req_prod;
>>     XENSTORE_RING_IDX rsp_cons, rsp_prod;
>> +    uint32_t server_version;
>> +    /* server_version 1 and later: */
>> +    uint32_t closing;             /* Non-zero means close in progress */
>> };
>> 
>> /* Violating this is very bad.  See docs/misc/xenstore.txt. */
>> --
>> 1.7.10.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
  2014-07-04  9:59                         ` Ian Campbell
@ 2014-07-04 10:19                           ` Dave Scott
  2014-07-04 11:27                             ` Ian Campbell
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Scott @ 2014-07-04 10:19 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Anil Madhavapeddy, Andrew Cooper, Jonathan Ludlam, Yanqiangjun,
	Paul Durrant, Liuqiming (John),
	xen-devel


On 4 Jul 2014, at 10:59, Ian Campbell <Ian.Campbell@citrix.com> wrote:

> On Thu, 2014-07-03 at 16:15 +0100, David Scott wrote:
> 
> This all looks broadly sensible to me.

Thanks!

> 
>> Currently hvmloader uses the xenstore ring and then tries to
>> reset it back to its initial state. This is not part of the
>> ring protocol and, if xenstored reads the ring while it is
>> happening, xenstored will conclude it is corrupted. A corrupted
>> ring will prevent PV drivers from connecting. This seems to
>> be a rare failure.
>> 
>> Furthermore, when a VM crashes it may jump to a 'crash kernel'
>> to create a diagnostic dump. Without the ability to safely
>> reset the ring the PV drivers won't be able to reliably
>> establish connections, to (for example) stream a memory dump to
>> disk.
>> 
>> This prototype patch contains a simple extension of the
>> xenstore ring structure, enough to contain version numbers and
>> a 'closing' flag. This memory is currently unused within the
>> 4k page and should be zeroed as part of the domain setup
>> process. The oxenstored server advertises version 1, and
>> hvmloader detects this and sets the 'closing' flag. The server
>> then reinitialises the ring, filling it with obviously invalid
>> data to help debugging (unfortunately blocks of zeroes are
>> valid xenstore packets) and signals hvmloader by the event
>> channel that it's safe to boot the guest OS.
>> 
>> Updates since v2 (thanks to Andy Cooper for the review):
> 
> Please can you put these updates after the "---" marker. This will cause
> the to be omitted from the final commit by the git am tool.
> 
>> * hvmloader: use volatile for read of closing flag
>> * style improvements
>> * remove xenstore version #defines
>> 
>> Updates since v1 (thanks to Paul Durrant for the review):
>> * remove unused client version and associated dead code
>> * treat 'closing' as a flag by using "!=0" rather than "==1"
>> * hvmloader: move function up and remove forward decl
>> * document the existing xenstore ring and the extention in misc/
>> 
>> Signed-off-by: David Scott <dave.scott@citrix.com>
>> ---
>> docs/misc/xenstore-ring.txt         |   79 +++++++++++++++++++++++++++++++++++
>> tools/firmware/hvmloader/xenbus.c   |   39 +++++++++--------
>> tools/ocaml/libs/xb/xb.ml           |   26 +++++++++++-
>> tools/ocaml/libs/xb/xb.mli          |    1 +
>> tools/ocaml/libs/xb/xs_ring.ml      |   11 +++++
>> tools/ocaml/libs/xb/xs_ring_stubs.c |   63 +++++++++++++++++++++++++++-
>> xen/include/public/io/xs_wire.h     |    5 +++
>> 7 files changed, 203 insertions(+), 21 deletions(-)
>> create mode 100644 docs/misc/xenstore-ring.txt
>> 
>> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
>> new file mode 100644
>> index 0000000..df2a09f
>> --- /dev/null
>> +++ b/docs/misc/xenstore-ring.txt
>> @@ -0,0 +1,79 @@
>> +
>> +The domain builder allocates a single page and shares it with the xenstore
>> +daemon. This document describes the ring protocol used to communicate via
>> +this page which is used to transmit and receive
>> +[xenstore protocol packets](xenstore.txt).
>> +
>> +In the original version (we call this "version 0"), the shared page has the
>> +following contents (all offsets and lengths are in octets):
>> +
>> +Offset  Length  Description
>> +-----------------------------------------------------------------
>> +0       1024    Requests: data sent to the xenstore daemon
>> +1024    1024    Replies: data sent to the domain
>> +2048    4       Request consumer offset
>> +2052    4       Request producer offset
>> +2056    4       Response consumer offset
>> +2060    4       Response producer offset
>> +
>> +When the page is allocated it is guaranteed to be full of zeroes.
>> +
>> +The "Requests" and "Replies" are treated as circular buffers, one for
>> +each direction. Each circular buffer is associated wth a producer and
>> +consumer offset, which are free-running counters starting from 0. A "producer"
>> +offset is the offset in the byte stream of the next byte to be written; a
>> +"consumer" offset is the offset in the byte stream of the next byte to be
>> +read. The byte at offset 'x' in the byte stream will be stored in
>> +offset 'x mod 1024' in the circular buffer. "producer - consumer" gives
>> +the number of bytes still to be read, and "1024 - (producer - consumer)"
>> +therefore gives the amount of space currently available for writing,
>> +where we must avoid overwriting unread data.
>> +
>> +The circular buffers are only used to send sequences of bytes between domains.
>> +It is the responsibility of the layer above to segment these bytes into
>> +packets, as described in [xenstore.txt](xenstore.txt).
>> +
>> +The client and server domains can run concurrently on different cores and
>> +different sockets. We must therefore take care to avoid corruption by:
>> +
>> +  1. using atomic loads and stores when reading and writing the producer
>> +     and consumer offsets. If an offset were to be updated by a non-atomic
>> +     store then the other domain may read an invalid offset value.
>> +  2. ensuring request and reply data is fully read or written before
>> +     updating the offsets by issuing a memory barrier.
>> +
>> +If we wish to read data but find the circular buffer empty, or wish to write
>> +data and find the circular buffer full, we wait on a pre-arranged event
>> +channel. When the other party next reads or writes data to the ring, it
>> +will notify() this event channel and we will wake.
>> +
>> +Protocol extension for reconnection
>> +-----------------------------------
>> +
>> +In version 0 of the protocol it is not possible to close and reopen the
>> +connection. This means that if the ring is corrupted, it can never be used
>> +(safely) again. Extending the protocol to allow reconnection would allow
>> +us to:
> 
> The use of past tense here is a bit odd given that this patch implements
> and enables all of this stuff.
> 
>> +
>> +  1. use the ring in the firmware (hvmloader) and safely reset it for use
>> +     by the guest
>> +  2. re-establish a ring in a 'crash kernel' environment, allowing us to
>> +     write crash dumps to PV disks or network devices.
>> +
>> +In version 1 of the protocol the ring is extended with the following
>> +fields:
>> +
>> +Offset  Length  Description
>> +-----------------------------------------------------------------
>> +2064    4       Xenstore daemon supported protocol version
>> +2068    4       Close request flag
>> +
>> +In a system supporting only version 0, both these fields are guaranteed
>> +to contain 0 by the domain builder.
>> +
>> +In a system supporting version 1, the xenstore daemon will write "1" into
>> +the support protocol version field. The guest xenstore client (eg in
>> +hvmloader) can query the version, and if it is set to "1" it can write
>> +"1" to the close request flag and call notify(). The supporting xenstore
>> +daemon can reset the ring to an empty state and signal completion by
>> +clearing the flag and calling notify() again.
> 
> I agree with Paul WRT s/calling notify()/kick the evtchn/.
> 
>> diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
>> index fe72e97..f85832c 100644
>> --- a/tools/firmware/hvmloader/xenbus.c
>> +++ b/tools/firmware/hvmloader/xenbus.c
>> @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* Shared ring with dom0 */
>> static evtchn_port_t event;                     /* Event-channel to dom0 */
>> static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area */
>> 
>> +static void ring_wait(void)
>> +{
>> +    struct shared_info *shinfo = get_shared_info();
>> +    struct sched_poll poll;
>> +
>> +    memset(&poll, 0, sizeof(poll));
>> +    set_xen_guest_handle(poll.ports, &event);
>> +    poll.nr_ports = 1;
>> +
>> +    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
>> +        hypercall_sched_op(SCHEDOP_poll, &poll);
>> +}
>> +
>> /* Connect our xenbus client to the backend.
>>  * Call once, before any other xenbus actions. */
>> void xenbus_setup(void)
>> @@ -68,10 +81,15 @@ void xenbus_shutdown(void)
>> 
>>     ASSERT(rings != NULL);
>> 
>> -    /* We zero out the whole ring -- the backend can handle this, and it's 
>> -     * not going to surprise any frontends since it's equivalent to never 
>> -     * having used the rings. */
>> -    memset(rings, 0, sizeof *rings);
>> +    if (rings->server_version > 0) {
> 
> hvmloader is part of the Xen release and is therefore entitled to assume
> that it is running against the xenstore(s) which shipped with that
> release of Xen.
> 
> IOW you can omit this check or replace it with an assertion if you
> prefer.
> 
> (Later: OIC, I thought you'd patched the cxenstored but that was
> actually just the ocaml stubs. This is fine then)

OK

> 
>> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
>> index 29d354d..d5cd776 100644
>> --- a/tools/ocaml/libs/xb/xb.ml
>> +++ b/tools/ocaml/libs/xb/xb.ml
> 
> I'll trust you've got all this ocaml stuff right ;-)
> 
> Is there someone else you can CC to get a pair of savvy eyes on it?

I’ve cc:d Anil and Jon who are always good at this kind of thing.

> 
>> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
>> index 8bd1047..27c98cd 100644
>> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
>> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
>> @@ -35,19 +35,28 @@
>> 
>> #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
>> 
>> +#define ERROR_UNKNOWN (-1)
>> +#define ERROR_CLOSING (-2)
>> +
>> static int xs_ring_read(struct mmap_interface *interface,
>>                              char *buffer, int len)
>> {
>> 	struct xenstore_domain_interface *intf = interface->addr;
>> 	XENSTORE_RING_IDX cons, prod; /* offsets only */
>> 	int to_read;
>> +	uint32_t closing;
>> 
>> 	cons = *(volatile uint32*)&intf->req_cons;
>> 	prod = *(volatile uint32*)&intf->req_prod;
>> +	closing = *(volatile uint32*)&intf->closing;
>> +
>> +	if (closing != 0)
>> +		return ERROR_CLOSING;
> 
> It's not possible to just raise the relevant exception here rather than
> propagating this to all the callers?

Hm, looking at the code in context I don’t know why both read and write have been decomposed into two functions each— I might as well merge them back together and raise the exceptions inline as you suggest.

> 
>> +CAMLprim value ml_interface_close(value interface)
>> +{
>> +	CAMLparam1(interface);
>> +	const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
>> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
>> +	int i;
>> +
>> +	intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
>> +	/* Ensure the unused space is full of invalid xenstore packets. */
> 
> Is filling with ones or zeroes not sufficient?
> 
> Also, is this strictly necessary of just for debugging purposes?

It’s not strictly necessary so it could be removed.

My preference i to fill the unused space within the ring buffers with obviously invalid packets so that if someone peeks at it at the wrong time, they get obviously bad data and hopefully stop immediately. If both ends are operating correctly then they should never notice. It’s unfortunate that all zeroes looks like a valid sequence of DEBUG packets. Perhaps I could fill it with ‘0xff’ and explicitly declare 0xff as an invalid message type xs_wire.h — that would do it.

What do you think?

Dave

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] RFC: extend the xenstore ring with a 'closing' signal
  2014-06-30 14:42                     ` [PATCH v2] " David Scott
  2014-07-03 15:15                       ` [PATCH v3] " David Scott
@ 2014-07-04 11:02                       ` Ian Jackson
  1 sibling, 0 replies; 29+ messages in thread
From: Ian Jackson @ 2014-07-04 11:02 UTC (permalink / raw)
  To: David Scott
  Cc: xen-devel, paul.durrant, ian.campbell, yanqiangjun, john.liuqiming

David Scott writes ("[Xen-devel] [PATCH v2] RFC: extend the xenstore ring with a 'closing' signal"):
> Currently hvmloader uses the xenstore ring and then tries to
> reset it back to its initial state. This is not part of the
> ring protocol and, if xenstored reads the ring while it is
> happening, xenstored will conclude it is corrupted. A corrupted
> ring will prevent PV drivers from connecting. This seems to
> be a rare failure.

This seems like a sensible extension in principle.  I have some
comments.  I'll concentrate on the protocol specification:

Firstly, thanks a lot for actually writing this down properly.

> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
> new file mode 100644
> index 0000000..df2a09f
> --- /dev/null
> +++ b/docs/misc/xenstore-ring.txt
> @@ -0,0 +1,79 @@
> +
> +The domain builder allocates a single page and shares it with the xenstore
> +daemon. This document describes the ring protocol used to communicate via
> +this page which is used to transmit and receive
> +[xenstore protocol packets](xenstore.txt).

This document should explain things mainly from the point of view of
the guest.  The rest are internal implementation details (which might
be moved into comments in the implementation).

> +In the original version (we call this "version 0"), the shared page has the
> +following contents (all offsets and lengths are in octets):
> +
> +Offset  Length  Description
> +-----------------------------------------------------------------
> +0       1024    Requests: data sent to the xenstore daemon
> +1024    1024    Replies: data sent to the domain
> +2048    4       Request consumer offset
> +2052    4       Request producer offset
> +2056    4       Response consumer offset
> +2060    4       Response producer offset
> +
> +When the page is allocated it is guaranteed to be full of zeroes.

For example, this belongs somewhere else.  Guests should not assume
that the page starts out full of zeroes.

Instead, you should say something like:

  When the guest starts, the ring is in a valid state with an empty
  queue in each direction.  (I.e., each consumer offset is the same as
  the corresponding producer offset.)

> +The "Requests" and "Replies" are treated as circular buffers, one for
> +each direction. Each circular buffer is associated wth a producer and
> +consumer offset, which are free-running counters starting from 0.

It is surely not necessary to specify that these counters start at 0.
Both ends need to cope with wraparound anyway.

On the other hand if old guests have assumed that the counters start
at 0 then that should be documented.  Have they, do we know ?

> +Protocol extension for reconnection
> +-----------------------------------
> +
> +In version 0 of the protocol it is not possible to close and reopen the
> +connection. This means that if the ring is corrupted, it can never be used
> +(safely) again. Extending the protocol to allow reconnection would allow
> +us to:

I generally favour feature bits rather than version numbers.  This
makes handling divergence and feature obsolecence much easier.  There
is plenty of room in this page for one bit per feature.

(Also, using feature bits makes it clearer what the intended semantics
are.  It is implicit in your design that an implementation which seems
a later version number advertised by its peer should not refuse to
operate, but you didn't actually say so.  The corresponding statement
for feature bits is simpler: "unknown feature bits should be
ignored".)

> +In a system supporting version 1, the xenstore daemon will write "1" into
> +the support protocol version field. The guest xenstore client (eg in
> +hvmloader) can query the version, and if it is set to "1" it can write
> +"1" to the close request flag and call notify(). The supporting xenstore
> +daemon can reset the ring to an empty state and signal completion by
> +clearing the flag and calling notify() again.

Implicitly there is a rule here that the daemon may only write to this
field if it is currently nonzero, and that the client may only write
to it if it is currently zero.  This (or something like it) should be
stated explicitly.

You should also say whether this has any effect on the higher
layers of the xenstored protocol.  Does a ring reset have the effect
of a RESET_WATCHES call ?  What happens to oustanding commands and
responses ?

> index 585f0c8..2c934ca 100644
> --- a/xen/include/public/io/xs_wire.h
> +++ b/xen/include/public/io/xs_wire.h
> @@ -108,14 +108,22 @@ enum xs_watch_type
>   * `incontents 150 xenstore_struct XenStore wire protocol.
>   *
>   * Inter-domain shared memory communications. */

This should gain a reference to your new document.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3] RFC: extend the xenstore ring with a 'closing' signal
  2014-07-04 10:19                           ` Dave Scott
@ 2014-07-04 11:27                             ` Ian Campbell
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2014-07-04 11:27 UTC (permalink / raw)
  To: Dave Scott
  Cc: Anil Madhavapeddy, Andrew Cooper, Jonathan Ludlam, Yanqiangjun,
	Paul Durrant, Liuqiming (John),
	xen-devel

On Fri, 2014-07-04 at 11:19 +0100, Dave Scott wrote:

> > 
> >> +CAMLprim value ml_interface_close(value interface)
> >> +{
> >> +	CAMLparam1(interface);
> >> +	const char invalid_data[] = { 'd', 'e', 'a', 'd', 'b', 'e', 'e', 'f' };
> >> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
> >> +	int i;
> >> +
> >> +	intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
> >> +	/* Ensure the unused space is full of invalid xenstore packets. */
> > 
> > Is filling with ones or zeroes not sufficient?
> > 
> > Also, is this strictly necessary of just for debugging purposes?
> 
> It’s not strictly necessary so it could be removed.
> 
> My preference i to fill the unused space within the ring buffers with
> obviously invalid packets so that if someone peeks at it at the wrong
> time, they get obviously bad data and hopefully stop immediately. If
> both ends are operating correctly then they should never notice. It’s
> unfortunate that all zeroes looks like a valid sequence of DEBUG
> packets. Perhaps I could fill it with ‘0xff’ and explicitly declare
> 0xff as an invalid message type xs_wire.h — that would do it.
> 
> What do you think?

One thought I just had was that this doesn't leave the ring in the
stated initial state (which is all zeroes). If something wanted to
resume using the ring it might expect that it was all zeroes?




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal
  2014-07-03 15:15                       ` [PATCH v3] " David Scott
  2014-07-04  8:21                         ` Paul Durrant
  2014-07-04  9:59                         ` Ian Campbell
@ 2014-09-03 16:25                         ` David Scott
  2014-09-10 13:35                           ` Ian Campbell
                                             ` (2 more replies)
  2 siblings, 3 replies; 29+ messages in thread
From: David Scott @ 2014-09-03 16:25 UTC (permalink / raw)
  To: xen-devel
  Cc: David Scott, anil, Andrew.Cooper3, Jonathan.Ludlam, yanqiangjun,
	Paul.Durrant, john.liuqiming

Hvmloader uses the xenstore ring and then tries to reset it back
to its initial state before booting the guest. Occasionally xenstored
will read the ring while it is being zeroed and conclude it has
been corrupted. This prevents PV drivers from loading in the guest.

This patch updates the xenstore ring protocol definition, enabling
a server to advertise additional features to the guest. One such feature
is defined: the ability to cleanly reset the ring (but not the
higher-level protocol, for which we already have RESET_WATCHES).

This patch implements the ring reconnection features in oxenstored
and hvmloader, fixing the bug.

This patch also defines an 'invalid' xenstore packet type and uses this
to poison the ring over a reconnect. This will make diagnosing this
bug much easier in future.

Signed-off-by: David Scott <dave.scott@citrix.com>

---

Updates since v3
* hvmloader: signal the event channel after requesting a reconnect
  (thanks to Zheng Li for diagnosing this)
* switch to using feature flags/bits instead of version numbers
* rename the 'closing' field to 'connection state'
* define an invalid packet type (0xffff, since 0x0 was already taken)
* on ring connection/reset, fill the input and output buffers with
  the invalid packet
* ocaml xs_ring_stubs.c: collapse both {read,write} functions into one,
  remove #define ERROR_FOO
* doc: write from the guest's point of view, weaken guarantees made by
  server (e.g. producer = consumer != 0 is valid after reconnect)
* doc: relate reconnection to the RESET_WATCHES command in the higher
  level protocol
* doc: be more specific about who must write what, when

Updates since v2
* hvmloader: use volatile for read of closing flag
* style improvements
* remove xenstore version #defines

Updates since v1
* remove unused client version and associated dead code
* treat 'closing' as a flag by using "!=0" rather than "==1"
* hvmloader: move function up and remove forward decl
* document the existing xenstore ring and the extention in misc/
---
 docs/misc/xenstore-ring.txt         |  112 +++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/xenbus.c   |   48 +++++++++------
 tools/ocaml/libs/xb/xb.ml           |   27 ++++++++-
 tools/ocaml/libs/xb/xb.mli          |    1 +
 tools/ocaml/libs/xb/xs_ring.ml      |   10 ++++
 tools/ocaml/libs/xb/xs_ring_stubs.c |  109 ++++++++++++++++++++++++----------
 xen/include/public/io/xs_wire.h     |   13 +++-
 7 files changed, 269 insertions(+), 51 deletions(-)
 create mode 100644 docs/misc/xenstore-ring.txt

diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
new file mode 100644
index 0000000..6d18556
--- /dev/null
+++ b/docs/misc/xenstore-ring.txt
@@ -0,0 +1,112 @@
+The xenstore ring is a datastructure stored within a single 4KiB page
+shared between the xenstore server and the guest. The ring contains
+two queues of bytes -- one in each direction -- and some signalling
+information. The [xenstore protocol](xenstore.txt) is layered on top of
+the byte streams.
+
+The xenstore ring datastructure
+===============================
+
+The following table describes the ring structure where
+  - offsets and lengths are in bytes;
+  - "Input" is used to describe the data sent to the server; and
+  - "Output" is used to describe the data sent to the domain.
+
+Offset  Length  Description
+-----------------------------------------------------------------
+0       1024    Input data
+1024    1024    Output data
+2048    4       Input consumer offset
+2052    4       Input producer offset
+2056    4       Output consumer offset
+2060    4       Output producer offset
+2064    4       Server feature bitmap
+2068    4       Connection state
+
+The Input data and Output data are circular buffers. Each buffer is
+associated with a pair of free-running offsets labelled "consumer" and
+"producer".
+
+A "producer" offset is the offset in the byte stream of the next byte
+to be written modulo 2^32. A "consumer" offset is the offset in the byte
+stream of the next byte to be read modulo 2^32. Implementations must
+take care to handle wraparound properly when performing arithmetic with
+these values.
+
+The byte at offset 'x' in the byte stream will be stored at offset
+'x modulo 1024' in the circular buffer.
+
+Implementations may only overwrite previously-written data if it has
+been marked as 'consumed' by the relevant consumer pointer.
+
+When the guest domain is created, there is no outstanding Input or Output
+data. However
+
+  - guests must not assume that producer or consumer pointers start
+    at zero; and
+  - guests must not assume that unused bytes in either the Input or
+    Output data buffers has any particular value.
+
+A xenstore ring is always associated with an event channel. Whenever the
+ring structure is updated the event channel must be signalled. The
+guest and server are free to inspect the contents of the ring at any
+time, not only in response to an event channel event. This implies that
+updates must be ordered carefully to ensure consistency.
+
+The xenstore server may decide to advertise some features via the
+"Server feature bitmap". The server can start advertising features
+at any time by setting bits but it will never stop advertising features
+i.e. bits will never be cleared. The guest is not permitted to write to
+the server feature bitmap. The server features are offered to the guest;
+it is up to the guest whether to use them or not. The guest should ignore
+any unknown feature bits.
+
+The following features are defined:
+
+Mask    Description
+-----------------------------------------------------------------
+1       Ring reconnection (see the ring reconnection feature below)
+
+The "Connection state" field is used to request a ring close and reconnect.
+The "Connection state" field only contains valid data if the server has
+advertised the ring reconnection feature. If the feature has been advertised
+then the "Connection state" may take the following values:
+
+Value   Description
+-----------------------------------------------------------------
+0       Ring is connected
+1       Ring close and reconnect is in progress (see the "ring
+        reconnection feature" described below)
+
+The ring reconnection feature
+=============================
+
+The ring reconnection feature allows the guest to ask the server to
+reset the ring to a valid initial state i.e. where the Input and Output
+queues contain no data. The feature operates at the ring-level only;
+it does not affect the higher-level protocol state machine at all.
+To reset theh higher-level protocol, please read about the 'RESET_WATCHES'
+command in the [xenstore protocol](xenstore.txt) specification.
+
+The ring reconnection feature is only available if the 'Ring reconnection'
+feature bit has been set by the server in the "Server feature bitmap".
+
+Assuming the server has advertised the feature, the guest can initiate
+a reconnection by setting the the Connection state to 1 ("Ring close
+and reconnect is in progress") and signalling the event channel.
+The guest must now ignore all fields except the Connection state and
+wait for it to be set to 0 ("Ring is connected")
+
+The server will guarantee to
+
+  - drop any partially read or written higher-level
+    [xenstore protocol](xenstore.txt) packets it may have;
+  - empty the Input and Output queues in the xenstore ring;
+  - set the Connection state to 0 ("Ring is connected"); and
+  - signal the event channel.
+
+From the point of view of the guest, the connection has been reset on a
+packet boundary.
+
+Note that only the guest may set the Connection state to 1 and only the
+server may set it back to 0.
diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
index 64c2176..f900a1e 100644
--- a/tools/firmware/hvmloader/xenbus.c
+++ b/tools/firmware/hvmloader/xenbus.c
@@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* Shared ring with dom0 */
 static evtchn_port_t event;                     /* Event-channel to dom0 */
 static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area */
 
+static void ring_wait(void)
+{
+    struct shared_info *shinfo = get_shared_info();
+    struct sched_poll poll;
+
+    memset(&poll, 0, sizeof(poll));
+    set_xen_guest_handle(poll.ports, &event);
+    poll.nr_ports = 1;
+
+    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
+        hypercall_sched_op(SCHEDOP_poll, &poll);
+}
+
 /* Connect our xenbus client to the backend.
  * Call once, before any other xenbus actions. */
 void xenbus_setup(void)
@@ -61,14 +74,26 @@ void xenbus_setup(void)
 void xenbus_shutdown(void)
 {
     struct shared_info *shinfo = get_shared_info();
+    evtchn_send_t send;
 
     ASSERT(rings != NULL);
 
-    /* We zero out the whole ring -- the backend can handle this, and it's 
-     * not going to surprise any frontends since it's equivalent to never 
-     * having used the rings. */
-    memset(rings, 0, sizeof *rings);
-
+    if (rings->server_features & XENSTORE_SERVER_FEATURE_RECONNECTION) {
+        rings->connection = XENSTORE_RECONNECT;
+        send.port = event;
+        hypercall_event_channel_op(EVTCHNOP_send, &send);
+        while (*(volatile uint32_t*)&rings->connection == XENSTORE_RECONNECT)
+            ring_wait ();
+    } else {
+        /* If the backend reads the state while we're erasing it then the
+         * ring state will become corrupted, preventing guest frontends from
+         * connecting. This is rare. To help diagnose the failure, we fill
+         * the ring with XS_INVALID packets. */
+        memset(rings->req, 0xff, XENSTORE_RING_SIZE);
+        memset(rings->rsp, 0xff, XENSTORE_RING_SIZE);
+        rings->req_cons = rings->req_prod = 0;
+        rings->rsp_cons = rings->rsp_prod = 0;
+    }
     /* Clear the event-channel state too. */
     memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
     memset(shinfo->evtchn_pending, 0, sizeof(shinfo->evtchn_pending));
@@ -77,19 +102,6 @@ void xenbus_shutdown(void)
     rings = NULL;
 }
 
-static void ring_wait(void)
-{
-    struct shared_info *shinfo = get_shared_info();
-    struct sched_poll poll;
-
-    memset(&poll, 0, sizeof(poll));
-    set_xen_guest_handle(poll.ports, &event);
-    poll.nr_ports = 1;
-
-    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
-        hypercall_sched_op(SCHEDOP_poll, &poll);
-}
-
 /* Helper functions: copy data in and out of the ring */
 static void ring_write(const char *data, uint32_t len)
 {
diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
index 29d354d..6332433 100644
--- a/tools/ocaml/libs/xb/xb.ml
+++ b/tools/ocaml/libs/xb/xb.ml
@@ -84,7 +84,26 @@ let write con s len =
 	| Fd backfd     -> write_fd backfd con s len
 	| Xenmmap backmmap -> write_mmap backmmap con s len
 
-let output con =
+(* If a function throws Xs_ring.Reconnect, then clear the ring state
+   and serve the ring again. *)
+let rec handle_reconnect f con =
+	match (try Some (f con) with Xs_ring.Reconnect -> None) with
+	| Some x -> x
+	| None ->
+		begin match con.backend with
+		| Fd _ -> raise Xs_ring.Reconnect (* should never happen, but just in case *)
+		| Xenmmap backend ->
+			Xs_ring.close backend.mmap;
+			backend.eventchn_notify ();
+			(* Clear our old connection state *)
+			Queue.clear con.pkt_in;
+			Queue.clear con.pkt_out;
+			con.partial_in <- init_partial_in ();
+			con.partial_out <- "";
+			handle_reconnect f con
+		end
+
+let output = handle_reconnect (fun con ->
 	(* get the output string from a string_of(packet) or partial_out *)
 	let s = if String.length con.partial_out > 0 then
 			con.partial_out
@@ -101,8 +120,9 @@ let output con =
 	);
 	(* after sending one packet, partial is empty *)
 	con.partial_out = ""
+)
 
-let input con =
+let input = handle_reconnect (fun con ->
 	let newpacket = ref false in
 	let to_read =
 		match con.partial_in with
@@ -133,6 +153,7 @@ let input con =
 			HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
 	);
 	!newpacket
+)
 
 let newcon backend = {
 	backend = backend;
@@ -145,6 +166,8 @@ let newcon backend = {
 let open_fd fd = newcon (Fd { fd = fd; })
 
 let open_mmap mmap notifyfct =
+	(* Advertise XENSTORE_SERVER_FEATURE_RECONNECTION *)
+	Xs_ring.set_server_features mmap 1;
 	newcon (Xenmmap {
 		mmap = mmap;
 		eventchn_notify = notifyfct;
diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
index 58234ae..d47d869 100644
--- a/tools/ocaml/libs/xb/xb.mli
+++ b/tools/ocaml/libs/xb/xb.mli
@@ -80,6 +80,7 @@ val read : t -> string -> int -> int
 val write_fd : backend_fd -> 'a -> string -> int -> int
 val write_mmap : backend_mmap -> 'a -> string -> int -> int
 val write : t -> string -> int -> int
+val handle_reconnect : (t -> 'a) -> t -> 'a
 val output : t -> bool
 val input : t -> bool
 val newcon : backend -> t
diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
index 9469609..2c4eeab 100644
--- a/tools/ocaml/libs/xb/xs_ring.ml
+++ b/tools/ocaml/libs/xb/xs_ring.ml
@@ -14,5 +14,15 @@
  * GNU Lesser General Public License for more details.
  *)
 
+exception Reconnect
+
+let _ =
+  Callback.register_exception "Xs_ring.Reconnect" Reconnect
+
 external read: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_read"
 external write: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_write"
+
+external set_server_features: Xenmmap.mmap_interface -> int -> unit = "ml_interface_set_server_features" "noalloc"
+external get_server_features: Xenmmap.mmap_interface -> int = "ml_interface_get_server_features" "noalloc"
+
+external close: Xenmmap.mmap_interface -> unit = "ml_interface_close" "noalloc"
diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
index 8bd1047..c728a01 100644
--- a/tools/ocaml/libs/xb/xs_ring_stubs.c
+++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
@@ -35,22 +35,39 @@
 
 #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
 
-static int xs_ring_read(struct mmap_interface *interface,
-                             char *buffer, int len)
+CAMLprim value ml_interface_read(value ml_interface,
+                                 value ml_buffer,
+                                 value ml_len)
 {
+	CAMLparam3(ml_interface, ml_buffer, ml_len);
+	CAMLlocal1(ml_result);
+
+	struct mmap_interface *interface = GET_C_STRUCT(ml_interface);
+	char *buffer = String_val(ml_buffer);
+	int len = Int_val(ml_len);
+	int result;
+
 	struct xenstore_domain_interface *intf = interface->addr;
 	XENSTORE_RING_IDX cons, prod; /* offsets only */
 	int to_read;
+	uint32_t connection;
 
 	cons = *(volatile uint32*)&intf->req_cons;
 	prod = *(volatile uint32*)&intf->req_prod;
+	connection = *(volatile uint32*)&intf->connection;
+
+	if (connection != XENSTORE_CONNECTED)
+		caml_raise_constant(*caml_named_value("Xs_ring.Reconnect"));
+
 	xen_mb();
 
 	if ((prod - cons) > XENSTORE_RING_SIZE)
-	    return -1;
+		caml_failwith("bad connection");
 
-	if (prod == cons)
-		return 0;
+	if (prod == cons) {
+		result = 0;
+		goto exit;
+	}
 	cons = MASK_XENSTORE_IDX(cons);
 	prod = MASK_XENSTORE_IDX(prod);
 	if (prod > cons)
@@ -62,21 +79,41 @@ static int xs_ring_read(struct mmap_interface *interface,
 	memcpy(buffer, intf->req + cons, len);
 	xen_mb();
 	intf->req_cons += len;
-	return len;
+	result = len;
+exit:
+	ml_result = Val_int(result);
+	CAMLreturn(ml_result);
 }
 
-static int xs_ring_write(struct mmap_interface *interface,
-                              char *buffer, int len)
+CAMLprim value ml_interface_write(value ml_interface,
+                                  value ml_buffer,
+                                  value ml_len)
 {
+	CAMLparam3(ml_interface, ml_buffer, ml_len);
+	CAMLlocal1(ml_result);
+
+	struct mmap_interface *interface = GET_C_STRUCT(ml_interface);
+	char *buffer = String_val(ml_buffer);
+	int len = Int_val(ml_len);
+	int result;
+
 	struct xenstore_domain_interface *intf = interface->addr;
 	XENSTORE_RING_IDX cons, prod;
 	int can_write;
+	uint32_t connection;
 
 	cons = *(volatile uint32*)&intf->rsp_cons;
 	prod = *(volatile uint32*)&intf->rsp_prod;
+	connection = *(volatile uint32*)&intf->connection;
+
+	if (connection != XENSTORE_CONNECTED)
+		caml_raise_constant(*caml_named_value("Xs_ring.Reconnect"));
+
 	xen_mb();
-	if ( (prod - cons) >= XENSTORE_RING_SIZE )
-		return 0;
+	if ( (prod - cons) >= XENSTORE_RING_SIZE ) {
+		result = 0;
+		goto exit;
+	}
 	if (MASK_XENSTORE_IDX(prod) >= MASK_XENSTORE_IDX(cons))
 		can_write = XENSTORE_RING_SIZE - MASK_XENSTORE_IDX(prod);
 	else 
@@ -86,31 +123,43 @@ static int xs_ring_write(struct mmap_interface *interface,
 	memcpy(intf->rsp + MASK_XENSTORE_IDX(prod), buffer, len);
 	xen_mb();
 	intf->rsp_prod += len;
-	return len;
+	result = len;
+exit:
+	ml_result = Val_int(result);
+	CAMLreturn(ml_result);
 }
 
-CAMLprim value ml_interface_read(value interface, value buffer, value len)
+CAMLprim value ml_interface_set_server_features(value interface, value v)
 {
-	CAMLparam3(interface, buffer, len);
-	CAMLlocal1(result);
-	int res;
+	CAMLparam2(interface, v);
+	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
 
-	res = xs_ring_read(GET_C_STRUCT(interface),
-	                   String_val(buffer), Int_val(len));
-	if (res == -1)
-		caml_failwith("bad connection");
-	result = Val_int(res);
-	CAMLreturn(result);
+	intf->server_features = Int_val(v);
+
+	CAMLreturn(Val_unit);
+}
+
+CAMLprim value ml_interface_get_server_features(value interface)
+{
+	CAMLparam1(interface);
+	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+
+	CAMLreturn(Val_int (intf->server_features));
 }
 
-CAMLprim value ml_interface_write(value interface, value buffer, value len)
+CAMLprim value ml_interface_close(value interface)
 {
-	CAMLparam3(interface, buffer, len);
-	CAMLlocal1(result);
-	int res;
-
-	res = xs_ring_write(GET_C_STRUCT(interface),
-	                    String_val(buffer), Int_val(len));
-	result = Val_int(res);
-	CAMLreturn(result);
+	CAMLparam1(interface);
+	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
+	int i;
+
+	intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
+	/* Ensure the unused space is full of invalid xenstore packets. */
+	for (i = 0; i < XENSTORE_RING_SIZE; i++) {
+		intf->req[i] = 0xff; /* XS_INVALID = 0xffff */
+		intf->rsp[i] = 0xff;
+	}
+	xen_mb ();
+	intf->connection = XENSTORE_CONNECTED;
+	CAMLreturn(Val_unit);
 }
diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
index 585f0c8..0a0cdbc 100644
--- a/xen/include/public/io/xs_wire.h
+++ b/xen/include/public/io/xs_wire.h
@@ -49,7 +49,9 @@ enum xsd_sockmsg_type
     XS_RESUME,
     XS_SET_TARGET,
     XS_RESTRICT,
-    XS_RESET_WATCHES
+    XS_RESET_WATCHES,
+
+    XS_INVALID = 0xffff /* Guaranteed to remain an invalid type */
 };
 
 #define XS_WRITE_NONE "NONE"
@@ -116,6 +118,8 @@ struct xenstore_domain_interface {
     char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
     XENSTORE_RING_IDX req_cons, req_prod;
     XENSTORE_RING_IDX rsp_cons, rsp_prod;
+    uint32_t server_features; /* Bitmap of features supported by the server */
+    uint32_t connection;
 };
 
 /* Violating this is very bad.  See docs/misc/xenstore.txt. */
@@ -125,6 +129,13 @@ struct xenstore_domain_interface {
 #define XENSTORE_ABS_PATH_MAX 3072
 #define XENSTORE_REL_PATH_MAX 2048
 
+/* The ability to reconnect a ring */
+#define XENSTORE_SERVER_FEATURE_RECONNECTION 1
+
+/* Valid values for the connection field */
+#define XENSTORE_CONNECTED 0 /* the steady-state */
+#define XENSTORE_RECONNECT 1 /* guest has initiated a reconnect */
+
 #endif /* _XS_WIRE_H */
 
 /*
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal
  2014-09-03 16:25                         ` [PATCH v4] xenstore: " David Scott
@ 2014-09-10 13:35                           ` Ian Campbell
  2014-09-10 14:33                             ` Dave Scott
       [not found]                           ` <DF76B30A-D122-4600-987E-6BBD66CFFF73@citrix.com>
  2014-09-24 13:36                           ` Jon Ludlam
  2 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2014-09-10 13:35 UTC (permalink / raw)
  To: David Scott, Ian Jackson
  Cc: anil, Andrew.Cooper3, Jonathan.Ludlam, yanqiangjun, Paul.Durrant,
	john.liuqiming, xen-devel

On Wed, 2014-09-03 at 17:25 +0100, David Scott wrote:
> Hvmloader uses the xenstore ring and then tries to reset it back
> to its initial state before booting the guest. Occasionally xenstored
> will read the ring while it is being zeroed and conclude it has
> been corrupted. This prevents PV drivers from loading in the guest.
> 
> This patch updates the xenstore ring protocol definition, enabling
> a server to advertise additional features to the guest. One such feature
> is defined: the ability to cleanly reset the ring (but not the
> higher-level protocol, for which we already have RESET_WATCHES).

Is RESET_WATCHES complete enough to be considered a higher-level
protocol reset, as opposed to just doing the watch stuff? (maybe there
is no other state to speak of?)

> This patch implements the ring reconnection features in oxenstored
> and hvmloader, fixing the bug.

I suppose it is worth mentioning here that C xenstored is untouched but
that the hvmloader changes have been written to work with an arbitrary
xenstore by using the protocol. (at least I hope it has!)

> This patch also defines an 'invalid' xenstore packet type and uses this
> to poison the ring over a reconnect. This will make diagnosing this
> bug much easier in future.
> 
> Signed-off-by: David Scott <dave.scott@citrix.com>

I've made some comments below but I think it might be worth waiting for
Ian Jackson's input, since he had comments last time, before rushing to
make any changes.

> 
> ---
> 
> Updates since v3
> * hvmloader: signal the event channel after requesting a reconnect
>   (thanks to Zheng Li for diagnosing this)
> * switch to using feature flags/bits instead of version numbers
> * rename the 'closing' field to 'connection state'
> * define an invalid packet type (0xffff, since 0x0 was already taken)
> * on ring connection/reset, fill the input and output buffers with
>   the invalid packet
> * ocaml xs_ring_stubs.c: collapse both {read,write} functions into one,
>   remove #define ERROR_FOO
> * doc: write from the guest's point of view, weaken guarantees made by
>   server (e.g. producer = consumer != 0 is valid after reconnect)
> * doc: relate reconnection to the RESET_WATCHES command in the higher
>   level protocol
> * doc: be more specific about who must write what, when
> 
> Updates since v2
> * hvmloader: use volatile for read of closing flag
> * style improvements
> * remove xenstore version #defines
> 
> Updates since v1
> * remove unused client version and associated dead code
> * treat 'closing' as a flag by using "!=0" rather than "==1"
> * hvmloader: move function up and remove forward decl
> * document the existing xenstore ring and the extention in misc/
> ---
>  docs/misc/xenstore-ring.txt         |  112 +++++++++++++++++++++++++++++++++++
>  tools/firmware/hvmloader/xenbus.c   |   48 +++++++++------
>  tools/ocaml/libs/xb/xb.ml           |   27 ++++++++-
>  tools/ocaml/libs/xb/xb.mli          |    1 +
>  tools/ocaml/libs/xb/xs_ring.ml      |   10 ++++
>  tools/ocaml/libs/xb/xs_ring_stubs.c |  109 ++++++++++++++++++++++++----------
>  xen/include/public/io/xs_wire.h     |   13 +++-
>  7 files changed, 269 insertions(+), 51 deletions(-)
>  create mode 100644 docs/misc/xenstore-ring.txt
> 
> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
> new file mode 100644
> index 0000000..6d18556
> --- /dev/null
> +++ b/docs/misc/xenstore-ring.txt

You seem to use markdown like syntax, so please name it .markdown and we
will get some sort of useful (well, ish) HTML out of it at build time.

> @@ -0,0 +1,112 @@
> +The xenstore ring is a datastructure stored within a single 4KiB page
> +shared between the xenstore server and the guest. The ring contains
> +two queues of bytes -- one in each direction -- and some signalling
> +information. The [xenstore protocol](xenstore.txt) is layered on top of
> +the byte streams.
> +
> +The xenstore ring datastructure
> +===============================
> +
> +The following table describes the ring structure where
> +  - offsets and lengths are in bytes;
> +  - "Input" is used to describe the data sent to the server; and
> +  - "Output" is used to describe the data sent to the domain.
> +
> +Offset  Length  Description
> +-----------------------------------------------------------------
> +0       1024    Input data
> +1024    1024    Output data
> +2048    4       Input consumer offset
> +2052    4       Input producer offset
> +2056    4       Output consumer offset
> +2060    4       Output producer offset
> +2064    4       Server feature bitmap
> +2068    4       Connection state
> +
> +The Input data and Output data are circular buffers. Each buffer is
> +associated with a pair of free-running offsets labelled "consumer" and
> +"producer".
> +
> +A "producer" offset is the offset in the byte stream of the next byte
> +to be written modulo 2^32. A "consumer" offset is the offset in the byte
> +stream of the next byte to be read modulo 2^32. Implementations must
> +take care to handle wraparound properly when performing arithmetic with
> +these values.
> +
> +The byte at offset 'x' in the byte stream will be stored at offset
> +'x modulo 1024' in the circular buffer.
> +
> +Implementations may only overwrite previously-written data if it has
> +been marked as 'consumed' by the relevant consumer pointer.
> +
> +When the guest domain is created, there is no outstanding Input or Output
> +data. However
> +
> +  - guests must not assume that producer or consumer pointers start
> +    at zero; and
> +  - guests must not assume that unused bytes in either the Input or
> +    Output data buffers has any particular value.
> +
> +A xenstore ring is always associated with an event channel. Whenever the
> +ring structure is updated the event channel must be signalled. The
> +guest and server are free to inspect the contents of the ring at any
> +time, not only in response to an event channel event. This implies that
> +updates must be ordered carefully to ensure consistency.
> +
> +The xenstore server may decide to advertise some features via the
> +"Server feature bitmap". The server can start advertising features
> +at any time by setting bits but it will never stop advertising features
> +i.e. bits will never be cleared. The guest is not permitted to write to
> +the server feature bitmap. The server features are offered to the guest;
> +it is up to the guest whether to use them or not. The guest should ignore
> +any unknown feature bits.
> +
> +The following features are defined:
> +
> +Mask    Description
> +-----------------------------------------------------------------
> +1       Ring reconnection (see the ring reconnection feature below)
> +
> +The "Connection state" field is used to request a ring close and reconnect.
> +The "Connection state" field only contains valid data if the server has
> +advertised the ring reconnection feature. If the feature has been advertised
> +then the "Connection state" may take the following values:
> +
> +Value   Description
> +-----------------------------------------------------------------
> +0       Ring is connected
> +1       Ring close and reconnect is in progress (see the "ring
> +        reconnection feature" described below)
> +
> +The ring reconnection feature
> +=============================
> +
> +The ring reconnection feature allows the guest to ask the server to
> +reset the ring to a valid initial state i.e. where the Input and Output
> +queues contain no data. The feature operates at the ring-level only;
> +it does not affect the higher-level protocol state machine at all.
> +To reset theh higher-level protocol, please read about the 'RESET_WATCHES'

typo here: theh.

> +command in the [xenstore protocol](xenstore.txt) specification.
> +
> +The ring reconnection feature is only available if the 'Ring reconnection'
> +feature bit has been set by the server in the "Server feature bitmap".
> +
> +Assuming the server has advertised the feature, the guest can initiate
> +a reconnection by setting the the Connection state to 1 ("Ring close
> +and reconnect is in progress") and signalling the event channel.
> +The guest must now ignore all fields except the Connection state and
> +wait for it to be set to 0 ("Ring is connected")
> +
> +The server will guarantee to
> +
> +  - drop any partially read or written higher-level
> +    [xenstore protocol](xenstore.txt) packets it may have;
> +  - empty the Input and Output queues in the xenstore ring;
> +  - set the Connection state to 0 ("Ring is connected"); and
> +  - signal the event channel.
> +
> +From the point of view of the guest, the connection has been reset on a
> +packet boundary.
> +
> +Note that only the guest may set the Connection state to 1 and only the
> +server may set it back to 0.

> [...]
> -    /* We zero out the whole ring -- the backend can handle this, and it's 
> -     * not going to surprise any frontends since it's equivalent to never 
> -     * having used the rings. */
> -    memset(rings, 0, sizeof *rings);
> -
> +    if (rings->server_features & XENSTORE_SERVER_FEATURE_RECONNECTION) {
> +        rings->connection = XENSTORE_RECONNECT;
> +        send.port = event;
> +        hypercall_event_channel_op(EVTCHNOP_send, &send);
> +        while (*(volatile uint32_t*)&rings->connection == XENSTORE_RECONNECT)

I don't think you need the volatile here, certainly none of the other
ring accesses seem to use it.

Apart from that the C side of things looks good. I've not looked at the
ocaml side of things, do you have someone you can lean on to do some
review or do I need to blow the cobwebs off that part of my brain?

Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal
  2014-09-10 13:35                           ` Ian Campbell
@ 2014-09-10 14:33                             ` Dave Scott
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Scott @ 2014-09-10 14:33 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Anil Madhavapeddy, Andrew Cooper, Jonathan Ludlam, yanqiangjun,
	Paul Durrant, xen-devel, Ian Jackson, john.liuqiming

Hi,

On 10 Sep 2014, at 14:35, Ian Campbell <Ian.Campbell@citrix.com> wrote:

> On Wed, 2014-09-03 at 17:25 +0100, David Scott wrote:
>> Hvmloader uses the xenstore ring and then tries to reset it back
>> to its initial state before booting the guest. Occasionally xenstored
>> will read the ring while it is being zeroed and conclude it has
>> been corrupted. This prevents PV drivers from loading in the guest.
>> 
>> This patch updates the xenstore ring protocol definition, enabling
>> a server to advertise additional features to the guest. One such feature
>> is defined: the ability to cleanly reset the ring (but not the
>> higher-level protocol, for which we already have RESET_WATCHES).
> 
> Is RESET_WATCHES complete enough to be considered a higher-level
> protocol reset, as opposed to just doing the watch stuff? (maybe there
> is no other state to speak of?)

I believe the only high-level connection state is watches + outstanding transactions. The xenstore.txt doc says that RESET_WATCHES should reset both:

RESET_WATCHES           |
        Reset all watches and transactions of the caller.


> 
>> This patch implements the ring reconnection features in oxenstored
>> and hvmloader, fixing the bug.
> 
> I suppose it is worth mentioning here that C xenstored is untouched but
> that the hvmloader changes have been written to work with an arbitrary
> xenstore by using the protocol. (at least I hope it has!)

That’s right — hvmloader is flexible.

> 
>> This patch also defines an 'invalid' xenstore packet type and uses this
>> to poison the ring over a reconnect. This will make diagnosing this
>> bug much easier in future.
>> 
>> Signed-off-by: David Scott <dave.scott@citrix.com>
> 
> I've made some comments below but I think it might be worth waiting for
> Ian Jackson's input, since he had comments last time, before rushing to
> make any changes.

Sounds sensible to me.

Thanks,
Dave

> 
>> 
>> ---
>> 
>> Updates since v3
>> * hvmloader: signal the event channel after requesting a reconnect
>>  (thanks to Zheng Li for diagnosing this)
>> * switch to using feature flags/bits instead of version numbers
>> * rename the 'closing' field to 'connection state'
>> * define an invalid packet type (0xffff, since 0x0 was already taken)
>> * on ring connection/reset, fill the input and output buffers with
>>  the invalid packet
>> * ocaml xs_ring_stubs.c: collapse both {read,write} functions into one,
>>  remove #define ERROR_FOO
>> * doc: write from the guest's point of view, weaken guarantees made by
>>  server (e.g. producer = consumer != 0 is valid after reconnect)
>> * doc: relate reconnection to the RESET_WATCHES command in the higher
>>  level protocol
>> * doc: be more specific about who must write what, when
>> 
>> Updates since v2
>> * hvmloader: use volatile for read of closing flag
>> * style improvements
>> * remove xenstore version #defines
>> 
>> Updates since v1
>> * remove unused client version and associated dead code
>> * treat 'closing' as a flag by using "!=0" rather than "==1"
>> * hvmloader: move function up and remove forward decl
>> * document the existing xenstore ring and the extention in misc/
>> ---
>> docs/misc/xenstore-ring.txt         |  112 +++++++++++++++++++++++++++++++++++
>> tools/firmware/hvmloader/xenbus.c   |   48 +++++++++------
>> tools/ocaml/libs/xb/xb.ml           |   27 ++++++++-
>> tools/ocaml/libs/xb/xb.mli          |    1 +
>> tools/ocaml/libs/xb/xs_ring.ml      |   10 ++++
>> tools/ocaml/libs/xb/xs_ring_stubs.c |  109 ++++++++++++++++++++++++----------
>> xen/include/public/io/xs_wire.h     |   13 +++-
>> 7 files changed, 269 insertions(+), 51 deletions(-)
>> create mode 100644 docs/misc/xenstore-ring.txt
>> 
>> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
>> new file mode 100644
>> index 0000000..6d18556
>> --- /dev/null
>> +++ b/docs/misc/xenstore-ring.txt
> 
> You seem to use markdown like syntax, so please name it .markdown and we
> will get some sort of useful (well, ish) HTML out of it at build time.
> 
>> @@ -0,0 +1,112 @@
>> +The xenstore ring is a datastructure stored within a single 4KiB page
>> +shared between the xenstore server and the guest. The ring contains
>> +two queues of bytes -- one in each direction -- and some signalling
>> +information. The [xenstore protocol](xenstore.txt) is layered on top of
>> +the byte streams.
>> +
>> +The xenstore ring datastructure
>> +===============================
>> +
>> +The following table describes the ring structure where
>> +  - offsets and lengths are in bytes;
>> +  - "Input" is used to describe the data sent to the server; and
>> +  - "Output" is used to describe the data sent to the domain.
>> +
>> +Offset  Length  Description
>> +-----------------------------------------------------------------
>> +0       1024    Input data
>> +1024    1024    Output data
>> +2048    4       Input consumer offset
>> +2052    4       Input producer offset
>> +2056    4       Output consumer offset
>> +2060    4       Output producer offset
>> +2064    4       Server feature bitmap
>> +2068    4       Connection state
>> +
>> +The Input data and Output data are circular buffers. Each buffer is
>> +associated with a pair of free-running offsets labelled "consumer" and
>> +"producer".
>> +
>> +A "producer" offset is the offset in the byte stream of the next byte
>> +to be written modulo 2^32. A "consumer" offset is the offset in the byte
>> +stream of the next byte to be read modulo 2^32. Implementations must
>> +take care to handle wraparound properly when performing arithmetic with
>> +these values.
>> +
>> +The byte at offset 'x' in the byte stream will be stored at offset
>> +'x modulo 1024' in the circular buffer.
>> +
>> +Implementations may only overwrite previously-written data if it has
>> +been marked as 'consumed' by the relevant consumer pointer.
>> +
>> +When the guest domain is created, there is no outstanding Input or Output
>> +data. However
>> +
>> +  - guests must not assume that producer or consumer pointers start
>> +    at zero; and
>> +  - guests must not assume that unused bytes in either the Input or
>> +    Output data buffers has any particular value.
>> +
>> +A xenstore ring is always associated with an event channel. Whenever the
>> +ring structure is updated the event channel must be signalled. The
>> +guest and server are free to inspect the contents of the ring at any
>> +time, not only in response to an event channel event. This implies that
>> +updates must be ordered carefully to ensure consistency.
>> +
>> +The xenstore server may decide to advertise some features via the
>> +"Server feature bitmap". The server can start advertising features
>> +at any time by setting bits but it will never stop advertising features
>> +i.e. bits will never be cleared. The guest is not permitted to write to
>> +the server feature bitmap. The server features are offered to the guest;
>> +it is up to the guest whether to use them or not. The guest should ignore
>> +any unknown feature bits.
>> +
>> +The following features are defined:
>> +
>> +Mask    Description
>> +-----------------------------------------------------------------
>> +1       Ring reconnection (see the ring reconnection feature below)
>> +
>> +The "Connection state" field is used to request a ring close and reconnect.
>> +The "Connection state" field only contains valid data if the server has
>> +advertised the ring reconnection feature. If the feature has been advertised
>> +then the "Connection state" may take the following values:
>> +
>> +Value   Description
>> +-----------------------------------------------------------------
>> +0       Ring is connected
>> +1       Ring close and reconnect is in progress (see the "ring
>> +        reconnection feature" described below)
>> +
>> +The ring reconnection feature
>> +=============================
>> +
>> +The ring reconnection feature allows the guest to ask the server to
>> +reset the ring to a valid initial state i.e. where the Input and Output
>> +queues contain no data. The feature operates at the ring-level only;
>> +it does not affect the higher-level protocol state machine at all.
>> +To reset theh higher-level protocol, please read about the 'RESET_WATCHES'
> 
> typo here: theh.
> 
>> +command in the [xenstore protocol](xenstore.txt) specification.
>> +
>> +The ring reconnection feature is only available if the 'Ring reconnection'
>> +feature bit has been set by the server in the "Server feature bitmap".
>> +
>> +Assuming the server has advertised the feature, the guest can initiate
>> +a reconnection by setting the the Connection state to 1 ("Ring close
>> +and reconnect is in progress") and signalling the event channel.
>> +The guest must now ignore all fields except the Connection state and
>> +wait for it to be set to 0 ("Ring is connected")
>> +
>> +The server will guarantee to
>> +
>> +  - drop any partially read or written higher-level
>> +    [xenstore protocol](xenstore.txt) packets it may have;
>> +  - empty the Input and Output queues in the xenstore ring;
>> +  - set the Connection state to 0 ("Ring is connected"); and
>> +  - signal the event channel.
>> +
>> +From the point of view of the guest, the connection has been reset on a
>> +packet boundary.
>> +
>> +Note that only the guest may set the Connection state to 1 and only the
>> +server may set it back to 0.
> 
>> [...]
>> -    /* We zero out the whole ring -- the backend can handle this, and it's 
>> -     * not going to surprise any frontends since it's equivalent to never 
>> -     * having used the rings. */
>> -    memset(rings, 0, sizeof *rings);
>> -
>> +    if (rings->server_features & XENSTORE_SERVER_FEATURE_RECONNECTION) {
>> +        rings->connection = XENSTORE_RECONNECT;
>> +        send.port = event;
>> +        hypercall_event_channel_op(EVTCHNOP_send, &send);
>> +        while (*(volatile uint32_t*)&rings->connection == XENSTORE_RECONNECT)
> 
> I don't think you need the volatile here, certainly none of the other
> ring accesses seem to use it.
> 
> Apart from that the C side of things looks good. I've not looked at the
> ocaml side of things, do you have someone you can lean on to do some
> review or do I need to blow the cobwebs off that part of my brain?
> 
> Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal
       [not found]                           ` <DF76B30A-D122-4600-987E-6BBD66CFFF73@citrix.com>
@ 2014-09-22 16:38                             ` Ian Jackson
  2014-09-24  9:06                               ` Dave Scott
  0 siblings, 1 reply; 29+ messages in thread
From: Ian Jackson @ 2014-09-22 16:38 UTC (permalink / raw)
  To: Dave Scott; +Cc: xen-devel, Ian Campbell

On 3 Sep 2014, at 17:25, David Scott <dave.scott@citrix.com> wrote:
> Hvmloader uses the xenstore ring and then tries to reset it back
> to its initial state before booting the guest. Occasionally xenstored
> will read the ring while it is being zeroed and conclude it has
> been corrupted. This prevents PV drivers from loading in the guest.

This protocol doc is almost perfect - see below.  I see that Ian C has
reviewed the C implementation.  Have you found anyone to provide a 2nd
opinion on the ocaml ?

> --- /dev/null
> +++ b/docs/misc/xenstore-ring.txt
> +The xenstore server may decide to advertise some features via the
> +"Server feature bitmap". The server can start advertising features
> +at any time by setting bits but it will never stop advertising features
> +i.e. bits will never be cleared. The guest is not permitted to write to
> +the server feature bitmap. The server features are offered to the guest;
> +it is up to the guest whether to use them or not. The guest should ignore
> +any unknown feature bits.

The rule that the server may start advertising features at any time is
reasonable but it needs to be accompanied by a promise, for any
specific feature, of the latest time at which the server will normally
advertise it.  Otherwise the client won't know when to check it and
whether it can sensibly cache the result.

> +Mask    Description
> +-----------------------------------------------------------------
> +1       Ring reconnection (see the ring reconnection feature below)
> +
> +The "Connection state" field is used to request a ring close and reconnect.
> +The "Connection state" field only contains valid data if the server has
> +advertised the ring reconnection feature. If the feature has been advertised
> +then the "Connection state" may take the following values:

For example, it would be useful to say here that this feature is
normally advertised from the start of the guest's operation.

> +The ring reconnection feature
> +=============================
> +
> +The ring reconnection feature allows the guest to ask the server to
> +reset the ring to a valid initial state i.e. where the Input and Output
> +queues contain no data. The feature operates at the ring-level only;
> +it does not affect the higher-level protocol state machine at all.
> +To reset theh higher-level protocol, please read about the 'RESET_WATCHES'
> +command in the [xenstore protocol](xenstore.txt) specification.
> +
> +The ring reconnection feature is only available if the 'Ring reconnection'
> +feature bit has been set by the server in the "Server feature bitmap".
> +
> +Assuming the server has advertised the feature, the guest can initiate
> +a reconnection by setting the the Connection state to 1 ("Ring close
> +and reconnect is in progress") and signalling the event channel.
> +The guest must now ignore all fields except the Connection state and
> +wait for it to be set to 0 ("Ring is connected")
> +
> +The server will guarantee to
> +
> +  - drop any partially read or written higher-level
> +    [xenstore protocol](xenstore.txt) packets it may have;
> +  - empty the Input and Output queues in the xenstore ring;
> +  - set the Connection state to 0 ("Ring is connected"); and
> +  - signal the event channel.

I think the server needs to guarantee that there are no outstanding
commands, somehow.  As you write this, the server might have a command
being processed which is no longer in the Input queue but whose
response has not yet arrived in the Output queue.

You might say that the client can use the response to RESET_WATCHES to
distinguish replies to old commands (before the RESET_WATCHES reply)
from replies to new ones.  But (a) the server does not guarantee to
process messages in order and (b) the client might get unlucky and use
the same request id for its RESET_WATCHES as one of the `old'
commands.

I assume (without checking) that there is a suitable guarantee which
could be written down and which is actually implemented by both
existing xenstored implementations...

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal
  2014-09-22 16:38                             ` Ian Jackson
@ 2014-09-24  9:06                               ` Dave Scott
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Scott @ 2014-09-24  9:06 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Jonathan Ludlam, Ian Campbell


On 22 Sep 2014, at 17:38, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> On 3 Sep 2014, at 17:25, David Scott <dave.scott@citrix.com> wrote:
>> Hvmloader uses the xenstore ring and then tries to reset it back
>> to its initial state before booting the guest. Occasionally xenstored
>> will read the ring while it is being zeroed and conclude it has
>> been corrupted. This prevents PV drivers from loading in the guest.
> 
> This protocol doc is almost perfect - see below.  I see that Ian C has
> reviewed the C implementation.  Have you found anyone to provide a 2nd
> opinion on the ocaml ?

I’ve cc:d Jon Ludlam. Jon: do you have time to review OCaml part of the patch?

> 
>> --- /dev/null
>> +++ b/docs/misc/xenstore-ring.txt
>> +The xenstore server may decide to advertise some features via the
>> +"Server feature bitmap". The server can start advertising features
>> +at any time by setting bits but it will never stop advertising features
>> +i.e. bits will never be cleared. The guest is not permitted to write to
>> +the server feature bitmap. The server features are offered to the guest;
>> +it is up to the guest whether to use them or not. The guest should ignore
>> +any unknown feature bits.
> 
> The rule that the server may start advertising features at any time is
> reasonable but it needs to be accompanied by a promise, for any
> specific feature, of the latest time at which the server will normally
> advertise it.  Otherwise the client won't know when to check it and
> whether it can sensibly cache the result.
> 
>> +Mask    Description
>> +-----------------------------------------------------------------
>> +1       Ring reconnection (see the ring reconnection feature below)
>> +
>> +The "Connection state" field is used to request a ring close and reconnect.
>> +The "Connection state" field only contains valid data if the server has
>> +advertised the ring reconnection feature. If the feature has been advertised
>> +then the "Connection state" may take the following values:
> 
> For example, it would be useful to say here that this feature is
> normally advertised from the start of the guest's operation.

OK.

> 
>> +The ring reconnection feature
>> +=============================
>> +
>> +The ring reconnection feature allows the guest to ask the server to
>> +reset the ring to a valid initial state i.e. where the Input and Output
>> +queues contain no data. The feature operates at the ring-level only;
>> +it does not affect the higher-level protocol state machine at all.
>> +To reset theh higher-level protocol, please read about the 'RESET_WATCHES'
>> +command in the [xenstore protocol](xenstore.txt) specification.
>> +
>> +The ring reconnection feature is only available if the 'Ring reconnection'
>> +feature bit has been set by the server in the "Server feature bitmap".
>> +
>> +Assuming the server has advertised the feature, the guest can initiate
>> +a reconnection by setting the the Connection state to 1 ("Ring close
>> +and reconnect is in progress") and signalling the event channel.
>> +The guest must now ignore all fields except the Connection state and
>> +wait for it to be set to 0 ("Ring is connected")
>> +
>> +The server will guarantee to
>> +
>> +  - drop any partially read or written higher-level
>> +    [xenstore protocol](xenstore.txt) packets it may have;
>> +  - empty the Input and Output queues in the xenstore ring;
>> +  - set the Connection state to 0 ("Ring is connected"); and
>> +  - signal the event channel.
> 
> I think the server needs to guarantee that there are no outstanding
> commands, somehow.  As you write this, the server might have a command
> being processed which is no longer in the Input queue but whose
> response has not yet arrived in the Output queue.
> 
> You might say that the client can use the response to RESET_WATCHES to
> distinguish replies to old commands (before the RESET_WATCHES reply)
> from replies to new ones.  But (a) the server does not guarantee to
> process messages in order and (b) the client might get unlucky and use
> the same request id for its RESET_WATCHES as one of the `old'
> commands.

I see what you mean. For a ‘clean’ reconnect like hvmloader we can be
sure that there are no outstanding requests because hvmloader has already
received all the replies. For ‘unclean’ reconnects like a crash-kernel
start there is definitely a problem.

Since the request ids are local to the connection, and the connection has
just closed and re-opened, I think it makes sense for the server to drop
all old request ids. The client would then be able to safely use any
request id for the RESET_WATCHES without fearing a old reply coming back.

It would be even nicer from a client perspective (although I’m not sure
how this would affect the diff) if the ring connection re-open performed
a RESET_WATCHES automatically, since then it would act the same as closing
and re-opening your Unix domain socket.

Cheers,
Dave

> I assume (without checking) that there is a suitable guarantee which
> could be written down and which is actually implemented by both
> existing xenstored implementations...
> 
> Thanks,
> Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4] xenstore: extend the xenstore ring with a 'closing' signal
  2014-09-03 16:25                         ` [PATCH v4] xenstore: " David Scott
  2014-09-10 13:35                           ` Ian Campbell
       [not found]                           ` <DF76B30A-D122-4600-987E-6BBD66CFFF73@citrix.com>
@ 2014-09-24 13:36                           ` Jon Ludlam
  2 siblings, 0 replies; 29+ messages in thread
From: Jon Ludlam @ 2014-09-24 13:36 UTC (permalink / raw)
  To: David Scott, xen-devel
  Cc: anil, Andrew.Cooper3, Jonathan.Ludlam, yanqiangjun, Paul.Durrant,
	john.liuqiming

On 03/09/14 17:25, David Scott wrote:
> Hvmloader uses the xenstore ring and then tries to reset it back
> to its initial state before booting the guest. Occasionally xenstored
> will read the ring while it is being zeroed and conclude it has
> been corrupted. This prevents PV drivers from loading in the guest.
>
> This patch updates the xenstore ring protocol definition, enabling
> a server to advertise additional features to the guest. One such feature
> is defined: the ability to cleanly reset the ring (but not the
> higher-level protocol, for which we already have RESET_WATCHES).
>
> This patch implements the ring reconnection features in oxenstored
> and hvmloader, fixing the bug.
>
> This patch also defines an 'invalid' xenstore packet type and uses this
> to poison the ring over a reconnect. This will make diagnosing this
> bug much easier in future.
>
> Signed-off-by: David Scott <dave.scott@citrix.com>
>
> ---
>
> Updates since v3
> * hvmloader: signal the event channel after requesting a reconnect
>   (thanks to Zheng Li for diagnosing this)
> * switch to using feature flags/bits instead of version numbers
> * rename the 'closing' field to 'connection state'
> * define an invalid packet type (0xffff, since 0x0 was already taken)
> * on ring connection/reset, fill the input and output buffers with
>   the invalid packet
> * ocaml xs_ring_stubs.c: collapse both {read,write} functions into one,
>   remove #define ERROR_FOO
> * doc: write from the guest's point of view, weaken guarantees made by
>   server (e.g. producer = consumer != 0 is valid after reconnect)
> * doc: relate reconnection to the RESET_WATCHES command in the higher
>   level protocol
> * doc: be more specific about who must write what, when
>
> Updates since v2
> * hvmloader: use volatile for read of closing flag
> * style improvements
> * remove xenstore version #defines
>
> Updates since v1
> * remove unused client version and associated dead code
> * treat 'closing' as a flag by using "!=0" rather than "==1"
> * hvmloader: move function up and remove forward decl
> * document the existing xenstore ring and the extention in misc/
> ---
>  docs/misc/xenstore-ring.txt         |  112 +++++++++++++++++++++++++++++++++++
>  tools/firmware/hvmloader/xenbus.c   |   48 +++++++++------
>  tools/ocaml/libs/xb/xb.ml           |   27 ++++++++-
>  tools/ocaml/libs/xb/xb.mli          |    1 +
>  tools/ocaml/libs/xb/xs_ring.ml      |   10 ++++
>  tools/ocaml/libs/xb/xs_ring_stubs.c |  109 ++++++++++++++++++++++++----------
>  xen/include/public/io/xs_wire.h     |   13 +++-
>  7 files changed, 269 insertions(+), 51 deletions(-)
>  create mode 100644 docs/misc/xenstore-ring.txt
>
> diff --git a/docs/misc/xenstore-ring.txt b/docs/misc/xenstore-ring.txt
> new file mode 100644
> index 0000000..6d18556
> --- /dev/null
> +++ b/docs/misc/xenstore-ring.txt
> @@ -0,0 +1,112 @@
> +The xenstore ring is a datastructure stored within a single 4KiB page
> +shared between the xenstore server and the guest. The ring contains
> +two queues of bytes -- one in each direction -- and some signalling
> +information. The [xenstore protocol](xenstore.txt) is layered on top of
> +the byte streams.
> +
> +The xenstore ring datastructure
> +===============================
> +
> +The following table describes the ring structure where
> +  - offsets and lengths are in bytes;
> +  - "Input" is used to describe the data sent to the server; and
> +  - "Output" is used to describe the data sent to the domain.
> +
> +Offset  Length  Description
> +-----------------------------------------------------------------
> +0       1024    Input data
> +1024    1024    Output data
> +2048    4       Input consumer offset
> +2052    4       Input producer offset
> +2056    4       Output consumer offset
> +2060    4       Output producer offset
> +2064    4       Server feature bitmap
> +2068    4       Connection state
> +
> +The Input data and Output data are circular buffers. Each buffer is
> +associated with a pair of free-running offsets labelled "consumer" and
> +"producer".
> +
> +A "producer" offset is the offset in the byte stream of the next byte
> +to be written modulo 2^32. A "consumer" offset is the offset in the byte
> +stream of the next byte to be read modulo 2^32. Implementations must
> +take care to handle wraparound properly when performing arithmetic with
> +these values.
> +
> +The byte at offset 'x' in the byte stream will be stored at offset
> +'x modulo 1024' in the circular buffer.
> +
> +Implementations may only overwrite previously-written data if it has
> +been marked as 'consumed' by the relevant consumer pointer.
> +
> +When the guest domain is created, there is no outstanding Input or Output
> +data. However
> +
> +  - guests must not assume that producer or consumer pointers start
> +    at zero; and
> +  - guests must not assume that unused bytes in either the Input or
> +    Output data buffers has any particular value.
> +
> +A xenstore ring is always associated with an event channel. Whenever the
> +ring structure is updated the event channel must be signalled. The
> +guest and server are free to inspect the contents of the ring at any
> +time, not only in response to an event channel event. This implies that
> +updates must be ordered carefully to ensure consistency.
> +
> +The xenstore server may decide to advertise some features via the
> +"Server feature bitmap". The server can start advertising features
> +at any time by setting bits but it will never stop advertising features
> +i.e. bits will never be cleared. The guest is not permitted to write to
> +the server feature bitmap. The server features are offered to the guest;
> +it is up to the guest whether to use them or not. The guest should ignore
> +any unknown feature bits.
> +
> +The following features are defined:
> +
> +Mask    Description
> +-----------------------------------------------------------------
> +1       Ring reconnection (see the ring reconnection feature below)
> +
> +The "Connection state" field is used to request a ring close and reconnect.
> +The "Connection state" field only contains valid data if the server has
> +advertised the ring reconnection feature. If the feature has been advertised
> +then the "Connection state" may take the following values:
> +
> +Value   Description
> +-----------------------------------------------------------------
> +0       Ring is connected
> +1       Ring close and reconnect is in progress (see the "ring
> +        reconnection feature" described below)
> +
> +The ring reconnection feature
> +=============================
> +
> +The ring reconnection feature allows the guest to ask the server to
> +reset the ring to a valid initial state i.e. where the Input and Output
> +queues contain no data. The feature operates at the ring-level only;
> +it does not affect the higher-level protocol state machine at all.
> +To reset theh higher-level protocol, please read about the 'RESET_WATCHES'
> +command in the [xenstore protocol](xenstore.txt) specification.
> +
> +The ring reconnection feature is only available if the 'Ring reconnection'
> +feature bit has been set by the server in the "Server feature bitmap".
> +
> +Assuming the server has advertised the feature, the guest can initiate
> +a reconnection by setting the the Connection state to 1 ("Ring close
> +and reconnect is in progress") and signalling the event channel.
> +The guest must now ignore all fields except the Connection state and
> +wait for it to be set to 0 ("Ring is connected")
> +
> +The server will guarantee to
> +
> +  - drop any partially read or written higher-level
> +    [xenstore protocol](xenstore.txt) packets it may have;
> +  - empty the Input and Output queues in the xenstore ring;
> +  - set the Connection state to 0 ("Ring is connected"); and
> +  - signal the event channel.
> +
> +From the point of view of the guest, the connection has been reset on a
> +packet boundary.
> +
> +Note that only the guest may set the Connection state to 1 and only the
> +server may set it back to 0.
> diff --git a/tools/firmware/hvmloader/xenbus.c b/tools/firmware/hvmloader/xenbus.c
> index 64c2176..f900a1e 100644
> --- a/tools/firmware/hvmloader/xenbus.c
> +++ b/tools/firmware/hvmloader/xenbus.c
> @@ -37,6 +37,19 @@ static struct xenstore_domain_interface *rings; /* Shared ring with dom0 */
>  static evtchn_port_t event;                     /* Event-channel to dom0 */
>  static char payload[XENSTORE_PAYLOAD_MAX + 1];  /* Unmarshalling area */
>  
> +static void ring_wait(void)
> +{
> +    struct shared_info *shinfo = get_shared_info();
> +    struct sched_poll poll;
> +
> +    memset(&poll, 0, sizeof(poll));
> +    set_xen_guest_handle(poll.ports, &event);
> +    poll.nr_ports = 1;
> +
> +    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
> +        hypercall_sched_op(SCHEDOP_poll, &poll);
> +}
> +
>  /* Connect our xenbus client to the backend.
>   * Call once, before any other xenbus actions. */
>  void xenbus_setup(void)
> @@ -61,14 +74,26 @@ void xenbus_setup(void)
>  void xenbus_shutdown(void)
>  {
>      struct shared_info *shinfo = get_shared_info();
> +    evtchn_send_t send;
>  
>      ASSERT(rings != NULL);
>  
> -    /* We zero out the whole ring -- the backend can handle this, and it's 
> -     * not going to surprise any frontends since it's equivalent to never 
> -     * having used the rings. */
> -    memset(rings, 0, sizeof *rings);
> -
> +    if (rings->server_features & XENSTORE_SERVER_FEATURE_RECONNECTION) {
> +        rings->connection = XENSTORE_RECONNECT;
> +        send.port = event;
> +        hypercall_event_channel_op(EVTCHNOP_send, &send);
> +        while (*(volatile uint32_t*)&rings->connection == XENSTORE_RECONNECT)
> +            ring_wait ();
> +    } else {
> +        /* If the backend reads the state while we're erasing it then the
> +         * ring state will become corrupted, preventing guest frontends from
> +         * connecting. This is rare. To help diagnose the failure, we fill
> +         * the ring with XS_INVALID packets. */
> +        memset(rings->req, 0xff, XENSTORE_RING_SIZE);
> +        memset(rings->rsp, 0xff, XENSTORE_RING_SIZE);
> +        rings->req_cons = rings->req_prod = 0;
> +        rings->rsp_cons = rings->rsp_prod = 0;
> +    }
>      /* Clear the event-channel state too. */
>      memset(shinfo->vcpu_info, 0, sizeof(shinfo->vcpu_info));
>      memset(shinfo->evtchn_pending, 0, sizeof(shinfo->evtchn_pending));
> @@ -77,19 +102,6 @@ void xenbus_shutdown(void)
>      rings = NULL;
>  }
>  
> -static void ring_wait(void)
> -{
> -    struct shared_info *shinfo = get_shared_info();
> -    struct sched_poll poll;
> -
> -    memset(&poll, 0, sizeof(poll));
> -    set_xen_guest_handle(poll.ports, &event);
> -    poll.nr_ports = 1;
> -
> -    while ( !test_and_clear_bit(event, shinfo->evtchn_pending) )
> -        hypercall_sched_op(SCHEDOP_poll, &poll);
> -}
> -
>  /* Helper functions: copy data in and out of the ring */
>  static void ring_write(const char *data, uint32_t len)
>  {
> diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
> index 29d354d..6332433 100644
> --- a/tools/ocaml/libs/xb/xb.ml
> +++ b/tools/ocaml/libs/xb/xb.ml
> @@ -84,7 +84,26 @@ let write con s len =
>  	| Fd backfd     -> write_fd backfd con s len
>  	| Xenmmap backmmap -> write_mmap backmmap con s len
>  
> -let output con =
> +(* If a function throws Xs_ring.Reconnect, then clear the ring state
> +   and serve the ring again. *)
> +let rec handle_reconnect f con =
> +	match (try Some (f con) with Xs_ring.Reconnect -> None) with
> +	| Some x -> x
> +	| None ->
> +		begin match con.backend with
> +		| Fd _ -> raise Xs_ring.Reconnect (* should never happen, but just in case *)
> +		| Xenmmap backend ->
> +			Xs_ring.close backend.mmap;
> +			backend.eventchn_notify ();
> +			(* Clear our old connection state *)
> +			Queue.clear con.pkt_in;
> +			Queue.clear con.pkt_out;
> +			con.partial_in <- init_partial_in ();
> +			con.partial_out <- "";
> +			handle_reconnect f con
> +		end
> +
> +let output = handle_reconnect (fun con ->
>  	(* get the output string from a string_of(packet) or partial_out *)
>  	let s = if String.length con.partial_out > 0 then
>  			con.partial_out
> @@ -101,8 +120,9 @@ let output con =
>  	);
>  	(* after sending one packet, partial is empty *)
>  	con.partial_out = ""
> +)
>  
> -let input con =
> +let input = handle_reconnect (fun con ->
>  	let newpacket = ref false in
>  	let to_read =
>  		match con.partial_in with
> @@ -133,6 +153,7 @@ let input con =
>  			HaveHdr (Partial.of_string buf) else NoHdr (i - sz, buf)
>  	);
>  	!newpacket
> +)
>  
>  let newcon backend = {
>  	backend = backend;
> @@ -145,6 +166,8 @@ let newcon backend = {
>  let open_fd fd = newcon (Fd { fd = fd; })
>  
>  let open_mmap mmap notifyfct =
> +	(* Advertise XENSTORE_SERVER_FEATURE_RECONNECTION *)
> +	Xs_ring.set_server_features mmap 1;

Minor nit here: the hard-coded 1 looks bad.

>  	newcon (Xenmmap {
>  		mmap = mmap;
>  		eventchn_notify = notifyfct;
> diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
> index 58234ae..d47d869 100644
> --- a/tools/ocaml/libs/xb/xb.mli
> +++ b/tools/ocaml/libs/xb/xb.mli
> @@ -80,6 +80,7 @@ val read : t -> string -> int -> int
>  val write_fd : backend_fd -> 'a -> string -> int -> int
>  val write_mmap : backend_mmap -> 'a -> string -> int -> int
>  val write : t -> string -> int -> int
> +val handle_reconnect : (t -> 'a) -> t -> 'a
>  val output : t -> bool
>  val input : t -> bool
>  val newcon : backend -> t
> diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
> index 9469609..2c4eeab 100644
> --- a/tools/ocaml/libs/xb/xs_ring.ml
> +++ b/tools/ocaml/libs/xb/xs_ring.ml
> @@ -14,5 +14,15 @@
>   * GNU Lesser General Public License for more details.
>   *)
>  
> +exception Reconnect
> +
> +let _ =
> +  Callback.register_exception "Xs_ring.Reconnect" Reconnect
> +
>  external read: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_read"
>  external write: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_write"
> +
> +external set_server_features: Xenmmap.mmap_interface -> int -> unit = "ml_interface_set_server_features" "noalloc"
> +external get_server_features: Xenmmap.mmap_interface -> int = "ml_interface_get_server_features" "noalloc"
> +
> +external close: Xenmmap.mmap_interface -> unit = "ml_interface_close" "noalloc"
> diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
> index 8bd1047..c728a01 100644
> --- a/tools/ocaml/libs/xb/xs_ring_stubs.c
> +++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
> @@ -35,22 +35,39 @@
>  
>  #define GET_C_STRUCT(a) ((struct mmap_interface *) a)
>  
> -static int xs_ring_read(struct mmap_interface *interface,
> -                             char *buffer, int len)
> +CAMLprim value ml_interface_read(value ml_interface,
> +                                 value ml_buffer,
> +                                 value ml_len)
>  {
> +	CAMLparam3(ml_interface, ml_buffer, ml_len);
> +	CAMLlocal1(ml_result);
> +
> +	struct mmap_interface *interface = GET_C_STRUCT(ml_interface);
> +	char *buffer = String_val(ml_buffer);
> +	int len = Int_val(ml_len);
> +	int result;
> +
>  	struct xenstore_domain_interface *intf = interface->addr;
>  	XENSTORE_RING_IDX cons, prod; /* offsets only */
>  	int to_read;
> +	uint32_t connection;
>  
>  	cons = *(volatile uint32*)&intf->req_cons;
>  	prod = *(volatile uint32*)&intf->req_prod;
> +	connection = *(volatile uint32*)&intf->connection;
> +
> +	if (connection != XENSTORE_CONNECTED)
> +		caml_raise_constant(*caml_named_value("Xs_ring.Reconnect"));
> +
>  	xen_mb();
>  
>  	if ((prod - cons) > XENSTORE_RING_SIZE)
> -	    return -1;
> +		caml_failwith("bad connection");
>  
> -	if (prod == cons)
> -		return 0;
> +	if (prod == cons) {
> +		result = 0;
> +		goto exit;
> +	}
>  	cons = MASK_XENSTORE_IDX(cons);
>  	prod = MASK_XENSTORE_IDX(prod);
>  	if (prod > cons)
> @@ -62,21 +79,41 @@ static int xs_ring_read(struct mmap_interface *interface,
>  	memcpy(buffer, intf->req + cons, len);
>  	xen_mb();
>  	intf->req_cons += len;
> -	return len;
> +	result = len;
> +exit:
> +	ml_result = Val_int(result);
> +	CAMLreturn(ml_result);
>  }
>  
> -static int xs_ring_write(struct mmap_interface *interface,
> -                              char *buffer, int len)
> +CAMLprim value ml_interface_write(value ml_interface,
> +                                  value ml_buffer,
> +                                  value ml_len)
>  {
> +	CAMLparam3(ml_interface, ml_buffer, ml_len);
> +	CAMLlocal1(ml_result);
> +
> +	struct mmap_interface *interface = GET_C_STRUCT(ml_interface);
> +	char *buffer = String_val(ml_buffer);
> +	int len = Int_val(ml_len);
> +	int result;
> +
>  	struct xenstore_domain_interface *intf = interface->addr;
>  	XENSTORE_RING_IDX cons, prod;
>  	int can_write;
> +	uint32_t connection;
>  
>  	cons = *(volatile uint32*)&intf->rsp_cons;
>  	prod = *(volatile uint32*)&intf->rsp_prod;
> +	connection = *(volatile uint32*)&intf->connection;
> +
> +	if (connection != XENSTORE_CONNECTED)
> +		caml_raise_constant(*caml_named_value("Xs_ring.Reconnect"));
> +
>  	xen_mb();
> -	if ( (prod - cons) >= XENSTORE_RING_SIZE )
> -		return 0;
> +	if ( (prod - cons) >= XENSTORE_RING_SIZE ) {
> +		result = 0;
> +		goto exit;
> +	}
>  	if (MASK_XENSTORE_IDX(prod) >= MASK_XENSTORE_IDX(cons))
>  		can_write = XENSTORE_RING_SIZE - MASK_XENSTORE_IDX(prod);
>  	else 
> @@ -86,31 +123,43 @@ static int xs_ring_write(struct mmap_interface *interface,
>  	memcpy(intf->rsp + MASK_XENSTORE_IDX(prod), buffer, len);
>  	xen_mb();
>  	intf->rsp_prod += len;
> -	return len;
> +	result = len;
> +exit:
> +	ml_result = Val_int(result);
> +	CAMLreturn(ml_result);
>  }
>  
> -CAMLprim value ml_interface_read(value interface, value buffer, value len)
> +CAMLprim value ml_interface_set_server_features(value interface, value v)
>  {
> -	CAMLparam3(interface, buffer, len);
> -	CAMLlocal1(result);
> -	int res;
> +	CAMLparam2(interface, v);
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
>  
> -	res = xs_ring_read(GET_C_STRUCT(interface),
> -	                   String_val(buffer), Int_val(len));
> -	if (res == -1)
> -		caml_failwith("bad connection");
> -	result = Val_int(res);
> -	CAMLreturn(result);
> +	intf->server_features = Int_val(v);
> +
> +	CAMLreturn(Val_unit);
> +}
> +
> +CAMLprim value ml_interface_get_server_features(value interface)
> +{
> +	CAMLparam1(interface);
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
> +
> +	CAMLreturn(Val_int (intf->server_features));
>  }
>  
> -CAMLprim value ml_interface_write(value interface, value buffer, value len)
> +CAMLprim value ml_interface_close(value interface)
>  {
> -	CAMLparam3(interface, buffer, len);
> -	CAMLlocal1(result);
> -	int res;
> -
> -	res = xs_ring_write(GET_C_STRUCT(interface),
> -	                    String_val(buffer), Int_val(len));
> -	result = Val_int(res);
> -	CAMLreturn(result);
> +	CAMLparam1(interface);
> +	struct xenstore_domain_interface *intf = GET_C_STRUCT(interface)->addr;
> +	int i;
> +
> +	intf->req_cons = intf->req_prod = intf->rsp_cons = intf->rsp_prod = 0;
> +	/* Ensure the unused space is full of invalid xenstore packets. */
> +	for (i = 0; i < XENSTORE_RING_SIZE; i++) {
> +		intf->req[i] = 0xff; /* XS_INVALID = 0xffff */
> +		intf->rsp[i] = 0xff;
> +	}
> +	xen_mb ();
> +	intf->connection = XENSTORE_CONNECTED;
> +	CAMLreturn(Val_unit);
>  }
> diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
> index 585f0c8..0a0cdbc 100644
> --- a/xen/include/public/io/xs_wire.h
> +++ b/xen/include/public/io/xs_wire.h
> @@ -49,7 +49,9 @@ enum xsd_sockmsg_type
>      XS_RESUME,
>      XS_SET_TARGET,
>      XS_RESTRICT,
> -    XS_RESET_WATCHES
> +    XS_RESET_WATCHES,
> +
> +    XS_INVALID = 0xffff /* Guaranteed to remain an invalid type */
>  };
>  
>  #define XS_WRITE_NONE "NONE"
> @@ -116,6 +118,8 @@ struct xenstore_domain_interface {
>      char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */
>      XENSTORE_RING_IDX req_cons, req_prod;
>      XENSTORE_RING_IDX rsp_cons, rsp_prod;
> +    uint32_t server_features; /* Bitmap of features supported by the server */
> +    uint32_t connection;
>  };
>  
>  /* Violating this is very bad.  See docs/misc/xenstore.txt. */
> @@ -125,6 +129,13 @@ struct xenstore_domain_interface {
>  #define XENSTORE_ABS_PATH_MAX 3072
>  #define XENSTORE_REL_PATH_MAX 2048
>  
> +/* The ability to reconnect a ring */
> +#define XENSTORE_SERVER_FEATURE_RECONNECTION 1
> +
> +/* Valid values for the connection field */
> +#define XENSTORE_CONNECTED 0 /* the steady-state */
> +#define XENSTORE_RECONNECT 1 /* guest has initiated a reconnect */
> +
>  #endif /* _XS_WIRE_H */
>  
>  /*

Other than that, it looks good to me.

Reviewed-by: Jon Ludlam <jonathan.ludlam@citrix.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2014-09-24 13:36 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-04  4:25 bug in hvmloader xenbus_shutdown logic Liuqiming (John)
2013-12-04  9:49 ` Ian Campbell
2013-12-05  2:08   ` Liuqiming (John)
2013-12-05  9:36     ` David Scott
2013-12-05  9:43       ` Paul Durrant
2013-12-05 11:10       ` Ian Campbell
2013-12-06  3:53         ` Liuqiming (John)
2014-06-16 13:43           ` Dave Scott
2014-06-16 13:57             ` Paul Durrant
2014-06-25 21:15               ` [PATCH RFC] RFC: extend the xenstore ring with a 'closing' signal David Scott
2014-06-26  9:05                 ` Paul Durrant
2014-06-26  9:45                   ` Dave Scott
2014-06-30 14:42                     ` [PATCH v2] " David Scott
2014-07-03 15:15                       ` [PATCH v3] " David Scott
2014-07-04  8:21                         ` Paul Durrant
2014-07-04 10:00                           ` Dave Scott
2014-07-04  9:59                         ` Ian Campbell
2014-07-04 10:19                           ` Dave Scott
2014-07-04 11:27                             ` Ian Campbell
2014-09-03 16:25                         ` [PATCH v4] xenstore: " David Scott
2014-09-10 13:35                           ` Ian Campbell
2014-09-10 14:33                             ` Dave Scott
     [not found]                           ` <DF76B30A-D122-4600-987E-6BBD66CFFF73@citrix.com>
2014-09-22 16:38                             ` Ian Jackson
2014-09-24  9:06                               ` Dave Scott
2014-09-24 13:36                           ` Jon Ludlam
2014-07-04 11:02                       ` [PATCH v2] RFC: " Ian Jackson
2014-07-02 12:32                 ` [PATCH RFC] " Andrew Cooper
2014-07-02 16:47                   ` Dave Scott
2014-07-02 16:52                     ` Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.