All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] tap: avoid deadlocking rx
@ 2014-03-08 15:00 Stefan Hajnoczi
  2014-03-10 19:19 ` Neil Skrypuch
  2014-03-11 12:25 ` Stefan Hajnoczi
  0 siblings, 2 replies; 3+ messages in thread
From: Stefan Hajnoczi @ 2014-03-08 15:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert

The net subsystem has a control flow mechanism so peer NetClientStates
can tell each other to stop sending packets.  This is used to stop
monitoring the tap file descriptor for incoming packets if the guest rx
ring has no spare buffers.

There is a corner case when tap_can_send() is true at the beginning of
an event loop iteration but becomes false before the tap_send() fd
handler is invoked.

tap_send() will read the packet from the tap file descriptor and attempt
to send it.  The net queue will hold on to the packet and return 0,
indicating that further I/O is not possible.  tap then stops monitoring
the file descriptor for reads.

This is unlike the normal case where tap_can_send() is the same before
and during the event loop iteration.  The event loop would simply not
monitor the file descriptor if tap_can_send() returns true.  Upon next
iteration it would check tap_can_send() again and begin monitoring if we
can send.

The deadlock happens because tap_send() explicitly disabled read_poll.
This is done with the expectation that the peer will call
qemu_net_queue_flush().  But hw/net/virtio-net.c does not monitor
vm_running transitions and issue the flush.  Hence we're left with a
broken tap device.

Cc: qemu-stable@nongnu.org
Reported-by: Neil Skrypuch <neil@tembosocial.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 net/tap.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 2d5099b..8847ce1 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -190,7 +190,7 @@ static void tap_send(void *opaque)
     TAPState *s = opaque;
     int size;
 
-    do {
+    while (qemu_can_send_packet(&s->nc)) {
         uint8_t *buf = s->buf;
 
         size = tap_read_packet(s->fd, s->buf, sizeof(s->buf));
@@ -206,8 +206,11 @@ static void tap_send(void *opaque)
         size = qemu_send_packet_async(&s->nc, buf, size, tap_send_completed);
         if (size == 0) {
             tap_read_poll(s, false);
+            break;
+        } else if (size < 0) {
+            break;
         }
-    } while (size > 0 && qemu_can_send_packet(&s->nc));
+    }
 }
 
 static bool tap_has_ufo(NetClientState *nc)
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [Qemu-devel] [PATCH] tap: avoid deadlocking rx
  2014-03-08 15:00 [Qemu-devel] [PATCH] tap: avoid deadlocking rx Stefan Hajnoczi
@ 2014-03-10 19:19 ` Neil Skrypuch
  2014-03-11 12:25 ` Stefan Hajnoczi
  1 sibling, 0 replies; 3+ messages in thread
From: Neil Skrypuch @ 2014-03-10 19:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dr. David Alan Gilbert, Stefan Hajnoczi

On Saturday 08 March 2014 16:00:43 Stefan Hajnoczi wrote:
> The net subsystem has a control flow mechanism so peer NetClientStates
> can tell each other to stop sending packets.  This is used to stop
> monitoring the tap file descriptor for incoming packets if the guest rx
> ring has no spare buffers.
> 
> There is a corner case when tap_can_send() is true at the beginning of
> an event loop iteration but becomes false before the tap_send() fd
> handler is invoked.
> 
> tap_send() will read the packet from the tap file descriptor and attempt
> to send it.  The net queue will hold on to the packet and return 0,
> indicating that further I/O is not possible.  tap then stops monitoring
> the file descriptor for reads.
> 
> This is unlike the normal case where tap_can_send() is the same before
> and during the event loop iteration.  The event loop would simply not
> monitor the file descriptor if tap_can_send() returns true.  Upon next
> iteration it would check tap_can_send() again and begin monitoring if we
> can send.
> 
> The deadlock happens because tap_send() explicitly disabled read_poll.
> This is done with the expectation that the peer will call
> qemu_net_queue_flush().  But hw/net/virtio-net.c does not monitor
> vm_running transitions and issue the flush.  Hence we're left with a
> broken tap device.
> 
> Cc: qemu-stable@nongnu.org
> Reported-by: Neil Skrypuch <neil@tembosocial.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  net/tap.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/tap.c b/net/tap.c
> index 2d5099b..8847ce1 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -190,7 +190,7 @@ static void tap_send(void *opaque)
>      TAPState *s = opaque;
>      int size;
> 
> -    do {
> +    while (qemu_can_send_packet(&s->nc)) {
>          uint8_t *buf = s->buf;
> 
>          size = tap_read_packet(s->fd, s->buf, sizeof(s->buf));
> @@ -206,8 +206,11 @@ static void tap_send(void *opaque)
>          size = qemu_send_packet_async(&s->nc, buf, size,
> tap_send_completed); if (size == 0) {
>              tap_read_poll(s, false);
> +            break;
> +        } else if (size < 0) {
> +            break;
>          }
> -    } while (size > 0 && qemu_can_send_packet(&s->nc));
> +    }
>  }
> 
>  static bool tap_has_ufo(NetClientState *nc)

Tested-by: Neil Skrypuch <neil@tembosocial.com>

With this patch applied, I've observed over 1000 migrations with no failures 
through my test harness. Without the above patch, that would be virtually 
impossible, so, looks good to me.

Thanks!

- Neil

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Qemu-devel] [PATCH] tap: avoid deadlocking rx
  2014-03-08 15:00 [Qemu-devel] [PATCH] tap: avoid deadlocking rx Stefan Hajnoczi
  2014-03-10 19:19 ` Neil Skrypuch
@ 2014-03-11 12:25 ` Stefan Hajnoczi
  1 sibling, 0 replies; 3+ messages in thread
From: Stefan Hajnoczi @ 2014-03-11 12:25 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, Dr. David Alan Gilbert

On Sat, Mar 08, 2014 at 04:00:43PM +0100, Stefan Hajnoczi wrote:
> The net subsystem has a control flow mechanism so peer NetClientStates
> can tell each other to stop sending packets.  This is used to stop
> monitoring the tap file descriptor for incoming packets if the guest rx
> ring has no spare buffers.
> 
> There is a corner case when tap_can_send() is true at the beginning of
> an event loop iteration but becomes false before the tap_send() fd
> handler is invoked.
> 
> tap_send() will read the packet from the tap file descriptor and attempt
> to send it.  The net queue will hold on to the packet and return 0,
> indicating that further I/O is not possible.  tap then stops monitoring
> the file descriptor for reads.
> 
> This is unlike the normal case where tap_can_send() is the same before
> and during the event loop iteration.  The event loop would simply not
> monitor the file descriptor if tap_can_send() returns true.  Upon next
> iteration it would check tap_can_send() again and begin monitoring if we
> can send.
> 
> The deadlock happens because tap_send() explicitly disabled read_poll.
> This is done with the expectation that the peer will call
> qemu_net_queue_flush().  But hw/net/virtio-net.c does not monitor
> vm_running transitions and issue the flush.  Hence we're left with a
> broken tap device.
> 
> Cc: qemu-stable@nongnu.org
> Reported-by: Neil Skrypuch <neil@tembosocial.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  net/tap.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)

Applied to my net tree:
https://github.com/stefanha/qemu/commits/net

Stefan

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2014-03-11 12:25 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-08 15:00 [Qemu-devel] [PATCH] tap: avoid deadlocking rx Stefan Hajnoczi
2014-03-10 19:19 ` Neil Skrypuch
2014-03-11 12:25 ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.