xen-devel.lists.xenproject.org archive mirror
* Issues with the xen-access.c model
@ 2016-04-05 10:55 Razvan Cojocaru
  2016-04-05 12:13 ` Andrew Cooper
  0 siblings, 1 reply; 6+ messages in thread
From: Razvan Cojocaru @ 2016-04-05 10:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Tamas K Lengyel

Hello,

xen-access.c does roughly this:

for (;;) {
	poll_evt_channel();

	if (new_events) {
		while (ring_buffer_has_requests) {
			pull_request();
			process_request();
			put_response();
		}

		signal_evt_channel();
	}
}
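
For reference, here is roughly how that maps onto real code - a sketch
only, not the literal xen-access.c source, assuming the standard ring.h
back-ring macros and the libxc event-channel calls (or their libxenevtchn
equivalents), with error handling and the actual policy logic elided:

/*
 * Assumed to already be set up: xce (the event channel handle), back_ring
 * (the mapped vm_event_back_ring_t) and process_request().
 */
for (;;) {
	struct pollfd fd = { .fd = xc_evtchn_fd(xce), .events = POLLIN | POLLERR };
	vm_event_request_t req;
	vm_event_response_t rsp;
	int port;

	if (poll(&fd, 1, 100) <= 0)
		continue;			/* timeout or error */

	port = xc_evtchn_pending(xce);		/* poll_evt_channel() */
	xc_evtchn_unmask(xce, port);

	while (RING_HAS_UNCONSUMED_REQUESTS(&back_ring)) {
		/* pull_request() */
		memcpy(&req, RING_GET_REQUEST(&back_ring, back_ring.req_cons),
		       sizeof(req));
		back_ring.req_cons++;

		process_request(&req, &rsp);

		/* put_response() */
		memcpy(RING_GET_RESPONSE(&back_ring, back_ring.rsp_prod_pvt),
		       &rsp, sizeof(rsp));
		back_ring.rsp_prod_pvt++;
		RING_PUSH_RESPONSES(&back_ring);
	}

	xc_evtchn_notify(xce, port);		/* signal_evt_channel() */
}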

The problems are these:

1. vm_event_put_request() does notify_via_xen_event_channel(d,
ved->xen_port); at the very end, and we seem to be using a FIFO event
channel (if I'm reading the code right, it's generic event channel code
that pre-dates the vm_event code that sits on top of it).

This means that for a guest that's running two VCPUs, the following
scenario is possible:

* VCPU1 puts an event in the ring buffer
* VCPU1 signals the event channel
* poll_evt_channel() wakes up
* VCPU2 puts an event in the ring buffer
* the loop processes the first event
* VCPU2 signals the event channel
* the loop processes the second event (since there _are_ more requests
in the ring buffer)
* the loop signals the event channel that requests have been processed
* poll_evt_channel() wakes up again, because of the second event channel
signal, but this is pointless since all events have already been processed

This can be avoided by counting the requests processed vs. the number of
successful poll()s - roughly as sketched below - but it still does not
solve the second problem.
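
(Illustrative only, reusing the pseudocode names from above - the counters
just make the redundant wakeup visible, they don't change correctness:)

unsigned long wakeups = 0, requests = 0, spurious = 0;

for (;;) {
	poll_evt_channel();

	if (!new_events)
		continue;

	wakeups++;

	if (!ring_buffer_has_requests) {
		/* The request this notification was sent for has already
		 * been drained by an earlier pass through the inner loop. */
		spurious++;
		continue;
	}

	while (ring_buffer_has_requests) {
		pull_request();
		process_request();
		put_response();
		requests++;
	}

	signal_evt_channel();
}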

2. Race conditions. At least in theory, there's nothing to stop the
second iteration of the request processing loop from reading a partially
written request from the ring buffer, or garbage left over from a
previous request, if it hits it "just right" (i.e. the "VCPU2 puts an
event in the ring buffer" part can run on top of the "the loop processes
the second event").

None of this really shows up unless we're in heavy processing
scenarios, but these problems can occur, especially in synced vm_event
scenarios.

The problems seem to be alleviated by changing the loop to this:

for (;;) {
	poll_evt_channel();

	if (new_events) {
		while (ring_buffer_has_requests) {
			pull_request();
			process_request();
			put_response();

			/* Moved here. */
			signal_evt_channel();
		}
	}
}

but it's obviously not foolproof; the main remaining problem is
synchronizing the ring buffer between the hypervisor side and
userspace.

How should we proceed to fix this?


Thanks,
Razvan


* Re: Issues with the xen-access.c model
  2016-04-05 10:55 Issues with the xen-access.c model Razvan Cojocaru
@ 2016-04-05 12:13 ` Andrew Cooper
  2016-04-05 15:35   ` Tamas K Lengyel
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2016-04-05 12:13 UTC (permalink / raw)
  To: Razvan Cojocaru, xen-devel; +Cc: Tamas K Lengyel

On 05/04/16 11:55, Razvan Cojocaru wrote:
> Hello,
>
> [...]
>
> 2. Race conditions. At least in theory, there's nothing to stop the
> second iteration of the request processing loop from reading a partially
> written request from the ring buffer, or garbage left over from a
> previous request, if it hits it "just right" (i.e. the "VCPU2 puts an
> event in the ring buffer" part can run on top of the "the loop processes
> the second event").
>
> [...]
>
> How should we proceed to fix this?

The vm_event code in Xen must increment the producer index as the final
action of putting a request on the ring.  If it doesn't then it is
simply broken and needs fixing.

This will ensure that the consumer never sees a partial request on the
ring; by the time it observes a newer producer index, all data is
already written.
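
i.e. something along these lines on the producer (Xen) side - a sketch
from memory, not a verbatim copy of vm_event_put_request():

/* Producer side in Xen (sketch): publish the payload, then the index. */
static void put_request(struct domain *d, vm_event_front_ring_t *ring,
                        evtchn_port_t port, const vm_event_request_t *req)
{
	RING_IDX prod = ring->req_prod_pvt;

	/* Fill the slot first; nothing here is visible to the consumer yet. */
	memcpy(RING_GET_REQUEST(ring, prod), req, sizeof(*req));
	ring->req_prod_pvt = prod + 1;

	/*
	 * RING_PUSH_REQUESTS() issues the write barrier and only then
	 * updates the shared req_prod, so a consumer which observes the new
	 * index is guaranteed to observe the complete request too.
	 */
	RING_PUSH_REQUESTS(ring);

	/* Notify strictly after the index update. */
	notify_via_xen_event_channel(d, port);
}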

~Andrew


* Re: Issues with the xen-access.c model
  2016-04-05 12:13 ` Andrew Cooper
@ 2016-04-05 15:35   ` Tamas K Lengyel
  2016-04-05 15:55     ` Razvan Cojocaru
  0 siblings, 1 reply; 6+ messages in thread
From: Tamas K Lengyel @ 2016-04-05 15:35 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Razvan Cojocaru, xen-devel



On Tue, Apr 5, 2016 at 6:13 AM, Andrew Cooper <andrew.cooper3@citrix.com>
wrote:

> On 05/04/16 11:55, Razvan Cojocaru wrote:
> > [...]
> >
> > The problems seem to be alleviated by changing the loop to this:
> >
> > for (;;) {
> >       poll_evt_channel();
> >
> >       if (new_events) {
> >               while (ring_buffer_has_requests) {
> >                       pull_request();
> >                       process_request();
> >                       put_response();
> >
> >                       /* Moved here. */
> >                       signal_evt_channel();
>

That was the original setup; I changed it a while back by moving it
outside the loop. My reasoning was that there is no point in going lockstep
with events and issuing a hypercall after each one is processed. With only
a handful of events on the ring, the context switch is probably more taxing
than having all requests and responses processed in one sweep. Of course,
depending on the application this may not be the optimal case, and the
signal can be moved back into the loop (with lots of vCPUs it may be the
case that the first vCPU remains paused while all events are processed for
the other vCPUs, which of course would be bad).
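
If keeping the earlier vCPUs paused for the whole batch ever becomes the
bottleneck, a middle ground is also conceivable - just a sketch of the
idea, nothing I've implemented or measured:

/* Hypothetical: drain the whole ring, but kick the event channel every
 * BATCH responses so the earliest vCPUs get unpaused sooner.  BATCH is a
 * made-up tunable, not something xen-access currently has. */
#define BATCH 4

unsigned int pending = 0;

while (ring_buffer_has_requests) {
	pull_request();
	process_request();
	put_response();

	if (++pending == BATCH) {
		signal_evt_channel();	/* one hypercall per BATCH responses */
		pending = 0;
	}
}

if (pending)
	signal_evt_channel();		/* flush whatever is left */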


> >               }
> >       }
> > }
> >
> > but it's obviously not foolproof; the main remaining problem is
> > synchronizing the ring buffer between the hypervisor side and
> > userspace.
> >
> > How should we proceed to fix this?
>
> The vm_event code in Xen must increment the producer index as the final
> action of putting a request on the ring.  If it doesn't then it is
> simply broken and needs fixing.
>
> This will ensure that the consumer never sees a partial request on the
> ring; by the time it observes a newer producer index, all data is
> already written.
>

It may be worth double-checking, but I'm pretty sure that's how it works
right now, as that part of the code has pretty much just been recycled
from the other Xen ring implementations.

Tamas


* Re: Issues with the xen-access.c model
  2016-04-05 15:35   ` Tamas K Lengyel
@ 2016-04-05 15:55     ` Razvan Cojocaru
  2016-04-05 16:10       ` Andrew Cooper
  0 siblings, 1 reply; 6+ messages in thread
From: Razvan Cojocaru @ 2016-04-05 15:55 UTC (permalink / raw)
  To: Tamas K Lengyel, Andrew Cooper; +Cc: xen-devel

On 04/05/16 18:35, Tamas K Lengyel wrote:
> 
> 
> On Tue, Apr 5, 2016 at 6:13 AM, Andrew Cooper <andrew.cooper3@citrix.com
> <mailto:andrew.cooper3@citrix.com>> wrote:
> [...]
>
>     The vm_event code in Xen must increment the producer index as the final
>     action of putting a request on the ring.  If it doesn't then it is
>     simply broken and needs fixing.
> 
>     This will ensure that the consumer never sees a partial request on the
>     ring; by the time it observes a newer producer index, all data is
>     already written.
> 
> 
> It may be worth double-checking, but I'm pretty sure that's how it works
> right now, as that part of the code has pretty much just been recycled
> from the other Xen ring implementations.

I understand. We shouldn't forget that the consumer also modifies the
ring in userspace, and the hypervisor then becomes a consumer of
responses, and the same issues apply to it (would compiler optimizations
be a problem here?). But unless that's the case, I'm probably seeing a
different issue and Andrew is right.
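
For the response direction I mean something like this - a sketch using the
standard ring macros; if I'm reading ring.h right, RING_PUSH_RESPONSES()
already issues the write barrier before exposing the new rsp_prod, so my
worry is mostly about the read side and the compiler:

/* Userspace producer of responses (sketch): same pattern, other direction. */
static void put_response(vm_event_back_ring_t *back_ring,
                         const vm_event_response_t *rsp)
{
	RING_IDX rsp_prod = back_ring->rsp_prod_pvt;

	/* Fill the slot before publishing it. */
	memcpy(RING_GET_RESPONSE(back_ring, rsp_prod), rsp, sizeof(*rsp));
	back_ring->rsp_prod_pvt = rsp_prod + 1;

	/*
	 * Barrier, then update the shared rsp_prod, so Xen (the consumer of
	 * responses) can never see a partially written response.
	 */
	RING_PUSH_RESPONSES(back_ring);
}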

Thank you both for the prompt replies!


Thanks,
Razvan


* Re: Issues with the xen-access.c model
  2016-04-05 15:55     ` Razvan Cojocaru
@ 2016-04-05 16:10       ` Andrew Cooper
  2016-04-06 16:08         ` Razvan Cojocaru
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2016-04-05 16:10 UTC (permalink / raw)
  To: Razvan Cojocaru, Tamas K Lengyel; +Cc: xen-devel

On 05/04/16 16:55, Razvan Cojocaru wrote:
> On 04/05/16 18:35, Tamas K Lengyel wrote:
>> [...]
>>
> I understand. We shouldn't forget that the consumer also modifies the
> ring in userspace, and the hypervisor then becomes a consumer of
> responses, and the same issues apply to it (would compiler optimizations
> be a problem here?).

Quite possibly.  See XSA-155

> But unless that's the case, I'm probably seeing a different issue and Andrew is right.
>
> Thank you both for the prompt replies!

You need to make certain that for both producer/consumer pairs, the
following is true:

The producer and consumer indices are read using some form of atomic
operation (ACCESS_ONCE() most likely, or read_atomic()), followed by an
smp_rmb() to guarantee that the read completes and is in a local
variable before any calculation is performed based on the value.  (This
prevents the compiler "optimising" to multiple reads of the shared ring).

Then, memcpy() (or suitable alternative) to get the contents of the
request in/out of the ring.

For the producer side, an smp_wmb() between the memcpy() into the ring,
and the write to the producer index.
For the consumer side, an smp_mb() between the memcpy() out of the ring,
and the write to the consumer index.
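
i.e. something along these lines for the xen-access consumer - a sketch
only, with the barrier/ACCESS_ONCE() names as used inside Xen (the tool
will need whatever userspace equivalents it has available):

/* One drain pass in userspace (sketch).  'ring' is the mapped vm_event
 * back ring; req/rsp are local copies, never used in place on the ring. */
vm_event_request_t req;
vm_event_response_t rsp;

for (;;) {
	/* Read the producer index exactly once, and order everything below
	 * after that read. */
	RING_IDX prod = ACCESS_ONCE(ring->sring->req_prod);
	RING_IDX cons = ring->req_cons;

	smp_rmb();

	if (cons == prod)
		break;			/* nothing (more) to consume */

	/* Copy the request out of the shared page before acting on it. */
	memcpy(&req, RING_GET_REQUEST(ring, cons), sizeof(req));

	process_request(&req, &rsp);

	/* Copy the response in... */
	memcpy(RING_GET_RESPONSE(ring, ring->rsp_prod_pvt), &rsp, sizeof(rsp));
	ring->rsp_prod_pvt++;

	/* ...then barrier before publishing either index. */
	smp_mb();

	ring->req_cons = cons + 1;
	RING_PUSH_RESPONSES(ring);	/* wmb() + shared rsp_prod update */
}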

~Andrew


* Re: Issues with the xen-access.c model
  2016-04-05 16:10       ` Andrew Cooper
@ 2016-04-06 16:08         ` Razvan Cojocaru
  0 siblings, 0 replies; 6+ messages in thread
From: Razvan Cojocaru @ 2016-04-06 16:08 UTC (permalink / raw)
  To: Andrew Cooper, Tamas K Lengyel; +Cc: xen-devel

On 04/05/2016 07:10 PM, Andrew Cooper wrote:
> [...]
>
> You need to make certain that for both producer/consumer pairs, the
> following is true:
> 
> The producer and consumer indices are read using some form of atomic
> operation (ACCESS_ONCE() most likely, or read_atomic()), followed by an
> smp_rmb() to guarantee that the read completes and is in a local
> variable before any calculation is performed based on the value.  (This
> prevents the compiler "optimising" to multiple reads of the shared ring).
> 
> Then, memcpy() (or suitable alternative) to get the contents of the
> request in/out of the ring.
> 
> For the producer side, an smp_wmb() between the memcpy() into the ring,
> and the write to the producer index.
> For the consumer side, an smp_mb() between the memcpy() out of the ring,
> and the write to the consumer index.

Did all of that, to no avail - but the issue may indeed be somewhere else.

What I'm seeing is that on Windows 7 x86 boot, at the point of loading
some drivers (most often around usbport.sys), if I get page faults in
quick succession and try to emulate the offending instruction with code
similar to the xen-access.c "outside-the-loop resume" model, the domU
appears to freeze (although it's still running and nothing crashes). x64
guests seem unaffected by this, and x86 guests almost never hang if I
put the resume code inside the loop.

This behaviour is always accompanied by qemu-dm going to 100% CPU for a
few minutes. It eventually subsides, but the guest remains unusable
although it's not dead.

This goes away if I just run the instruction (with a single-step-like
model) inside the hypervisor instead of trying to emulate it, this being
the only change.

Unfortunately this is not trivial to demonstrate with small
modifications to the Xen in-tree code (see the CLI emulation issue, for
example).

Would trying to emulate an instruction from a driver that works with qemu
cause this? No emulation failures are reported, but the attempt apparently
leads to some kind of (I/O?) loop.


Thanks,
Razvan
