KVM Archive on lore.kernel.org
* [PATCH kvmtool] virtio: Fix ordering of virt_queue__should_signal()
@ 2020-07-31 10:14 Alexandru Elisei
  2020-07-31 13:30 ` Jean-Philippe Brucker
  2020-07-31 22:05 ` Milan Kocian
  0 siblings, 2 replies; 5+ messages in thread
From: Alexandru Elisei @ 2020-07-31 10:14 UTC (permalink / raw)
  To: kvm
  Cc: milon, julien.thierry.kdev, will, jean-philippe, andre.przywara,
	Anvay Virkar

The guest programs used_event in the avail queue to let the host know when
it wants a notification from the device. The host notifies the guest when
the used queue vring index passes used_event.

The virtio-blk guest driver, in the notification callback, sets used_event
to the value of the current used queue vring index, which means it will get
a notification after the next buffer is consumed by the host. It is
possible for the guest to submit a job, and then go into uninterruptible
sleep waiting for the notification (for example, via submit_bio_wait()).

On the host side, the virtio-blk device increments the used queue vring
index, then compares it to used_event to decide if a notification should be
sent.

A memory barrier between writing the new index value and reading used_event
is needed to make sure we read the latest used_event value.  kvmtool uses a
write memory barrier, which on arm64 translates into DMB ISHST. The
instruction orders memory writes that have been executed in program order
before the barrier relative to writes that have been executed in program
order after the barrier. The barrier doesn't apply to reads, which means it
is not enough to prevent reading a stale value for used_event. This can
lead to the host not sending the notification, and the guest thread remains
stuck indefinitely waiting for IO to complete.

Using a memory barrier for reads and writes matches what the guest driver
does when deciding to kick the host: after updating the avail queue vring
index via virtio_queue_rq() -> virtblk_add_req() -> .. ->
virtqueue_add_split(), it uses a read/write memory barrier before reading
avail_event from the used queue in virtio_commit_rqs() -> .. ->
virtqueue_kick_prepare_split().

Also move the barrier in virt_queue__should_signal(), because the barrier
is needed for notifications to work correctly, and it makes more sense to
have it in the function that determines if the host should notify the
guest.

Reported-by: Anvay Virkar <anvay.virkar@arm.com>
Suggested-by: Anvay Virkar <anvay.virkar@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
This was observed by Anvay, where kvmtool reads the previous value of
used_event and the notification is not sent.

I *think* this also fixes the VM hang reported in [1], where several
processes in the guest were stuck in uninterruptible sleep. I am not
familiar with the block layer, but my theory is that the threads were stuck
in wait_for_completion_io(), from blk_execute_rq() executing a flush
request. It would be great if Milan could give this patch a spin and see if
the problem goes away. Don't know how reproducible it is though.

[1] https://www.spinics.net/lists/kvm/msg204543.html

 virtio/core.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/virtio/core.c b/virtio/core.c
index f5b3c07fc100..90a661d12e14 100644
--- a/virtio/core.c
+++ b/virtio/core.c
@@ -33,13 +33,6 @@ void virt_queue__used_idx_advance(struct virt_queue *queue, u16 jump)
 	wmb();
 	idx += jump;
 	queue->vring.used->idx = virtio_host_to_guest_u16(queue, idx);
-
-	/*
-	 * Use wmb to assure used idx has been increased before we signal the guest.
-	 * Without a wmb here the guest may ignore the queue since it won't see
-	 * an updated idx.
-	 */
-	wmb();
 }
 
 struct vring_used_elem *
@@ -194,6 +187,14 @@ bool virtio_queue__should_signal(struct virt_queue *vq)
 {
 	u16 old_idx, new_idx, event_idx;
 
+	/*
+	 * Use mb to assure used idx has been increased before we signal the
+	 * guest, and we don't read a stale value for used_event. Without a mb
+	 * here we might not send a notification that we need to send, or the
+	 * guest may ignore the queue since it won't see an updated idx.
+	 */
+	mb();
+
 	if (!vq->use_event_idx) {
 		/*
 		 * When VIRTIO_RING_F_EVENT_IDX isn't negotiated, interrupt the
-- 
2.28.0



* Re: [PATCH kvmtool] virtio: Fix ordering of virt_queue__should_signal()
  2020-07-31 10:14 [PATCH kvmtool] virtio: Fix ordering of virt_queue__should_signal() Alexandru Elisei
@ 2020-07-31 13:30 ` Jean-Philippe Brucker
  2020-08-04 11:13   ` Alexandru Elisei
  2020-07-31 22:05 ` Milan Kocian
  1 sibling, 1 reply; 5+ messages in thread
From: Jean-Philippe Brucker @ 2020-07-31 13:30 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, milon, julien.thierry.kdev, will, andre.przywara, Anvay Virkar

On Fri, Jul 31, 2020 at 11:14:27AM +0100, Alexandru Elisei wrote:
> The guest programs used_event in the avail queue to let the host know when

                                 nit: "avail ring"...

> it wants a notification from the device. The host notifies the guest when
> the used queue vring index passes used_event.

... and "used ring" are more consistent with the virtio spec.

> The virtio-blk guest driver, in the notification callback, sets used_event
> to the value of the current used queue vring index, which means it will get
> a notification after the next buffer is consumed by the host. It is
> possible for the guest to submit a job, and then go into uninterruptible
> sleep waiting for the notification (for example, via submit_bio_wait()).
> 
> On the host side, the virtio-blk device increments the used queue vring
> index, then compares it to used_event to decide if a notification should be
> sent.
> 
> A memory barrier between writing the new index value and reading used_event
> is needed to make sure we read the latest used_event value.  kvmtool uses a
> write memory barrier, which on arm64 translates into DMB ISHST. The
> instruction orders memory writes that have been executed in program order
> before the barrier relative to writes that have been executed in program
> order after the barrier.

Not sure this sentence is necessary.

> The barrier doesn't apply to reads, which means it
> is not enough to prevent reading a stale value for used_event. This can
> lead to the host not sending the notification, and the guest thread remains
> stuck indefinitely waiting for IO to complete.

It might be helpful to detail what the guest does, to identify which
barrier this pairs with on the guest side. I had a hard time convincing
myself that this is the right fix, but I think I got there in the end.

Kvmtool currently does:

	// virt_queue__used_idx_advance()
		vring.used.idx = idx
		wmb()
	// virtio_queue__should_signal()
	if (vring.used_event is between old and new idx)
		notify()

When simplifying Linux virtblk_done() I get:

	while (last_used != vring.used.idx) {
		...
		vring.used_event = ++last_used
		mb() // virtio_store_mb() in virtqueue_get_buf_ctx_split()
		...
		// unnecessary write+mb, here for completeness
		vring.used_event = last_used
		mb() // virtqueue_poll()
	}

(1) Kvmtool:
    writes vring.used.idx = 2
    wmb()
    reads vring.used_event = 1
    notifies the guest.

(2) Linux:
    (reads vring.used.idx = 2)
    writes vring.used_event = 2
    mb()
    reads vring.used.idx = 2
    returns from virtblk_done()

(3) Kvmtool:
    writes vring.used.idx = 3
    wmb()
    reads vring.used_event = 1
    doesn't notify, stalls the guest.

By replacing the wmb() with a full barrier we get the correct store buffer
pattern (SB+fencembonceonces.litmus).

> Using a memory barrier for reads and writes matches what the guest driver
> does when deciding to kick the host: after updating the avail queue vring
> index via virtio_queue_rq() -> virtblk_add_req() -> .. ->

Probably worth noting you're referring to Linux here.

> virtqueue_add_split(), it uses a read/write memory barrier before reading

                                or "full" memory barrier?

> avail_event from the used queue in virtio_commit_rqs() -> .. ->
> virtqueue_kick_prepare_split().
> 
> Also move the barrier in virt_queue__should_signal(), because the barrier
> is needed for notifications to work correctly, and it makes more sense to
> have it in the function that determines if the host should notify the
> guest.
> 
> Reported-by: Anvay Virkar <anvay.virkar@arm.com>
> Suggested-by: Anvay Virkar <anvay.virkar@arm.com>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Regardless of the nits above, I believe this patch is correct.

Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

> ---
> This was observed by Anvay, where kvmtool reads the previous value of
> used_event and the notification is not sent.
> 
> I *think* this also fixes the VM hang reported in [1], where several
> processes in the guest were stuck in uninterruptible sleep. I am not
> familiar with the block layer, but my theory is that the threads were stuck
> in wait_for_completion_io(), from blk_execute_rq() executing a flush
> request. It would be great if Milan could give this patch a spin and see if
> the problem goes away. Don't know how reproducible it is though.
> 
> [1] https://www.spinics.net/lists/kvm/msg204543.html
> 
>  virtio/core.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/virtio/core.c b/virtio/core.c
> index f5b3c07fc100..90a661d12e14 100644
> --- a/virtio/core.c
> +++ b/virtio/core.c
> @@ -33,13 +33,6 @@ void virt_queue__used_idx_advance(struct virt_queue *queue, u16 jump)
>  	wmb();
>  	idx += jump;
>  	queue->vring.used->idx = virtio_host_to_guest_u16(queue, idx);
> -
> -	/*
> -	 * Use wmb to assure used idx has been increased before we signal the guest.
> -	 * Without a wmb here the guest may ignore the queue since it won't see
> -	 * an updated idx.
> -	 */
> -	wmb();
>  }
>  
>  struct vring_used_elem *
> @@ -194,6 +187,14 @@ bool virtio_queue__should_signal(struct virt_queue *vq)
>  {
>  	u16 old_idx, new_idx, event_idx;
>  
> +	/*
> +	 * Use mb to assure used idx has been increased before we signal the
> +	 * guest, and we don't read a stale value for used_event. Without a mb
> +	 * here we might not send a notification that we need to send, or the
> +	 * guest may ignore the queue since it won't see an updated idx.
> +	 */
> +	mb();
> +
>  	if (!vq->use_event_idx) {
>  		/*
>  		 * When VIRTIO_RING_F_EVENT_IDX isn't negotiated, interrupt the
> -- 
> 2.28.0
> 


* Re: [PATCH kvmtool] virtio: Fix ordering of virt_queue__should_signal()
  2020-07-31 10:14 [PATCH kvmtool] virtio: Fix ordering of virt_queue__should_signal() Alexandru Elisei
  2020-07-31 13:30 ` Jean-Philippe Brucker
@ 2020-07-31 22:05 ` Milan Kocian
  2020-08-04 12:15   ` Alexandru Elisei
  1 sibling, 1 reply; 5+ messages in thread
From: Milan Kocian @ 2020-07-31 22:05 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, julien.thierry.kdev, will, jean-philippe, andre.przywara,
	Anvay Virkar

Hello,
 
> I *think* this also fixes the VM hang reported in [1], where several
> processes in the guest were stuck in uninterruptible sleep. I am not
> familiar with the block layer, but my theory is that the threads were stuck
> in wait_for_completion_io(), from blk_execute_rq() executing a flush
> request. It would be great if Milan could give this patch a spin and see if
> the problem goes away. Don't know how reproducible it is though.
> 
> [1] https://www.spinics.net/lists/kvm/msg204543.html
> 

Okay, I will test it, but it will take some time: I migrated to qemu because
of this problem :-) so I need to prepare a new environment. And the problem
happens randomly; my environment can run for days or a few weeks without
hitting it.

Thanks.

Best regards.

-- 
Milan Kocian


* Re: [PATCH kvmtool] virtio: Fix ordering of virt_queue__should_signal()
  2020-07-31 13:30 ` Jean-Philippe Brucker
@ 2020-08-04 11:13   ` Alexandru Elisei
  0 siblings, 0 replies; 5+ messages in thread
From: Alexandru Elisei @ 2020-08-04 11:13 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm, milon, julien.thierry.kdev, will, andre.przywara, Anvay Virkar

Hello Jean,

Thank you for taking the time to review the patch.

On 7/31/20 2:30 PM, Jean-Philippe Brucker wrote:
> On Fri, Jul 31, 2020 at 11:14:27AM +0100, Alexandru Elisei wrote:
>> The guest programs used_event in the avail queue to let the host know when
>                                  nit: "avail ring"...
>
>> it wants a notification from the device. The host notifies the guest when
>> the used queue vring index passes used_event.
> ... and "used ring" are more consistent with the virtio spec.

Will fix.

>
>> The virtio-blk guest driver, in the notification callback, sets used_event
>> to the value of the current used queue vring index, which means it will get
>> a notification after the next buffer is consumed by the host. It is
>> possible for the guest to submit a job, and then go into uninterruptible
>> sleep waiting for the notification (for example, via submit_bio_wait()).
>>
>> On the host side, the virtio-blk device increments the used queue vring
>> index, then compares it to used_event to decide if a notification should be
>> sent.
>>
>> A memory barrier between writing the new index value and reading used_event
>> is needed to make sure we read the latest used_event value.  kvmtool uses a
>> write memory barrier, which on arm64 translates into DMB ISHST. The
>> instruction orders memory writes that have been executed in program order
>> before the barrier relative to writes that have been executed in program
>> order after the barrier.
> Not sure this sentence is necessary.
>
>> The barrier doesn't apply to reads, which means it
>> is not enough to prevent reading a stale value for used_event. This can
>> lead to the host not sending the notification, and the guest thread remains
>> stuck indefinitely waiting for IO to complete.
> It might be helpful to detail what the guest does, to identify which
> barrier this pairs with on the guest side. I had a hard time convincing
> myself that this is the right fix, but I think I got there in the end.
>
> Kvmtool currently does:
>
> 	// virt_queue__used_idx_advance()
> 		vring.used.idx = idx
> 		wmb()
> 	// virtio_queue__should_signal()
> 	if (vring.used_event is between old and new idx)
> 		notify()
>
> When simplifying Linux virtblk_done() I get:
>
> 	while (last_used != vring.used.idx) {
> 		...
> 		vring.used_event = ++last_used
> 		mb() // virtio_store_mb() in virtqueue_get_buf_ctx_split()
> 		...
> 		// unnecessary write+mb, here for completeness
> 		vring.used_event = last_used
> 		mb() // virtqueue_poll()
> 	}
>
> (1) Kvmtool:
>     writes vring.used.idx = 2
>     wmb()
>     reads vring.used_event = 1
>     notifies the guest.
>
> (2) Linux:
>     (reads vring.used.idx = 2)
>     writes vring.used_event = 2
>     mb()
>     reads vring.used.idx = 2
>     returns from virtblk_done()
>
> (3) Kvmtool:
>     writes vring.used.idx = 3
>     wmb()
>     reads vring.used_event = 1
>     doesn't notify, stalls the guest.
>
> By replacing the wmb() with a full barrier we get the correct store buffer
> pattern (SB+fencembonceonces.litmus).

Yes, the SB+fencembonceonces.litmus pattern is what kvmtool should implement;
the stall is triggered when both threads read the initial values, which the
pattern forbids. I'll update the commit message with a better description of
what is going on.

>
>> Using a memory barrier for reads and writes matches what the guest driver
>> does when deciding to kick the host: after updating the avail queue vring
>> index via virtio_queue_rq() -> virtblk_add_req() -> .. ->
> Probably worth noting you're referring to Linux here.
>
>> virtqueue_add_split(), it uses a read/write memory barrier before reading
>                                 or "full" memory barrier?

I was using the Arm ARM terminology, which doesn't have a "full memory barrier",
but instead uses either a full system memory barrier (SY) wrt the shareability
domain or a read/write memory barrier wrt the access type. I'll use the Linux
terminology; I suppose that makes more sense for everyone.

>
>> avail_event from the used queue in virtio_commit_rqs() -> .. ->
>> virtqueue_kick_prepare_split().
>>
>> Also move the barrier in virt_queue__should_signal(), because the barrier
>> is needed for notifications to work correctly, and it makes more sense to
>> have it in the function that determines if the host should notify the
>> guest.
>>
>> Reported-by: Anvay Virkar <anvay.virkar@arm.com>
>> Suggested-by: Anvay Virkar <anvay.virkar@arm.com>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Regardless of the nits above, I believe this patch is correct
>
> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

I'll try to write a better commit message; thank you for the review and suggestions.

Thanks,
Alex
>
>> ---
>> This was observed by Anvay, where kvmtool reads the previous value of
>> used_event and the notification is not sent.
>>
>> I *think* this also fixes the VM hang reported in [1], where several
>> processes in the guest were stuck in uninterruptible sleep. I am not
>> familiar with the block layer, but my theory is that the threads were stuck
>> in wait_for_completion_io(), from blk_execute_rq() executing a flush
>> request. It would be great if Milan could give this patch a spin and see if
>> the problem goes away. Don't know how reproducible it is though.
>>
>> [1] https://www.spinics.net/lists/kvm/msg204543.html
>>
>>  virtio/core.c | 15 ++++++++-------
>>  1 file changed, 8 insertions(+), 7 deletions(-)
>>
>> diff --git a/virtio/core.c b/virtio/core.c
>> index f5b3c07fc100..90a661d12e14 100644
>> --- a/virtio/core.c
>> +++ b/virtio/core.c
>> @@ -33,13 +33,6 @@ void virt_queue__used_idx_advance(struct virt_queue *queue, u16 jump)
>>  	wmb();
>>  	idx += jump;
>>  	queue->vring.used->idx = virtio_host_to_guest_u16(queue, idx);
>> -
>> -	/*
>> -	 * Use wmb to assure used idx has been increased before we signal the guest.
>> -	 * Without a wmb here the guest may ignore the queue since it won't see
>> -	 * an updated idx.
>> -	 */
>> -	wmb();
>>  }
>>  
>>  struct vring_used_elem *
>> @@ -194,6 +187,14 @@ bool virtio_queue__should_signal(struct virt_queue *vq)
>>  {
>>  	u16 old_idx, new_idx, event_idx;
>>  
>> +	/*
>> +	 * Use mb to assure used idx has been increased before we signal the
>> +	 * guest, and we don't read a stale value for used_event. Without a mb
>> +	 * here we might not send a notification that we need to send, or the
>> +	 * guest may ignore the queue since it won't see an updated idx.
>> +	 */
>> +	mb();
>> +
>>  	if (!vq->use_event_idx) {
>>  		/*
>>  		 * When VIRTIO_RING_F_EVENT_IDX isn't negotiated, interrupt the
>> -- 
>> 2.28.0
>>


* Re: [PATCH kvmtool] virtio: Fix ordering of virt_queue__should_signal()
  2020-07-31 22:05 ` Milan Kocian
@ 2020-08-04 12:15   ` Alexandru Elisei
  0 siblings, 0 replies; 5+ messages in thread
From: Alexandru Elisei @ 2020-08-04 12:15 UTC (permalink / raw)
  To: Milan Kocian
  Cc: kvm, julien.thierry.kdev, will, jean-philippe, andre.przywara,
	Anvay Virkar

Hello Milan,

On 7/31/20 11:05 PM, Milan Kocian wrote:
> Hello,
>  
>> I *think* this also fixes the VM hang reported in [1], where several
>> processes in the guest were stuck in uninterruptible sleep. I am not
>> familiar with the block layer, but my theory is that the threads were stuck
>> in wait_for_completion_io(), from blk_execute_rq() executing a flush
>> request. It would be great if Milan could give this patch a spin and see if
>> the problem goes away. Don't know how reproducible it is though.
>>
>> [1] https://www.spinics.net/lists/kvm/msg204543.html
>>
> Okay, I will test it but it takes some time. Because I migrated to the
> qemu due this problem :-) so I need to prepare new environment. And
> because the problem happens randomly. My environment can run days/few weeks
> without this problem.

Thank you for giving this a go, much appreciated!

It's unfortunate that you had to stop using kvmtool because of the bug;
hopefully this patch fixes it.

Thanks,
Alex

