All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2
@ 2016-08-23  8:57 Gaohaifeng (A)
  2016-08-25  1:33 ` Daniel P. Berrange
  0 siblings, 1 reply; 4+ messages in thread
From: Gaohaifeng (A) @ 2016-08-23  8:57 UTC (permalink / raw)
  To: berrange, pbonzini, Lilijun (Jerry), zangchuanqiang, qemu devel

Hi Daniel & Paolo,

Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about

the below code segment:

-static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque)
+static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque)
{
     CharDriverState *chr = opaque;
     TCPCharDriver *s = chr->opaque;
@@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque)
     if (len > s->max_size)
         len = s->max_size;
     size = tcp_chr_recv(chr, (void *)buf, len);
-    if (size == 0 ||
-        (size < 0 &&
-         socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) {
+    if (size == 0 || size == -1) {
         /* connection closed */
         tcp_chr_disconnect(chr);
     } else if (size > 0) {

The old version will not call tcp_chr_disconnect when error is not EAGAIN.
The new version will call tcp_chr_disconnect when size==-1. From the tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call tcp_chr_disconnect.

We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit since link status is not up. The link is down because tcp_chr_disconnect will set it.

Can you explain it why EAGIN here changes the behavior ?

Thanks,
Haifeng Gao

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2
  2016-08-23  8:57 [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 Gaohaifeng (A)
@ 2016-08-25  1:33 ` Daniel P. Berrange
  2016-08-25 13:21   ` Gaohaifeng (A)
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel P. Berrange @ 2016-08-25  1:33 UTC (permalink / raw)
  To: Gaohaifeng (A); +Cc: pbonzini, Lilijun (Jerry), zangchuanqiang, qemu devel

On Tue, Aug 23, 2016 at 08:57:44AM +0000, Gaohaifeng (A) wrote:
> Hi Daniel & Paolo,
> 
> Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about
> 
> the below code segment:
> 
> -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque)
> +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque)
> {
>      CharDriverState *chr = opaque;
>      TCPCharDriver *s = chr->opaque;
> @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque)
>      if (len > s->max_size)
>          len = s->max_size;
>      size = tcp_chr_recv(chr, (void *)buf, len);
> -    if (size == 0 ||
> -        (size < 0 &&
> -         socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) {
> +    if (size == 0 || size == -1) {
>          /* connection closed */
>          tcp_chr_disconnect(chr);
>      } else if (size > 0) {
> 
> The old version will not call tcp_chr_disconnect when error is not
> EAGAIN.
> The new version will call tcp_chr_disconnect when size==-1. From the
> tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call
> tcp_chr_disconnect.
> 
> We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit since
> link status is not up. The link is down because tcp_chr_disconnect will
> set it.
> 
> Can you explain it why EAGIN here changes the behavior ?

tcp_chr_read is used in a call to io_add_watch_poll() which sets up an
event loop watch so that tcp_chr_read is called when there is incoming
data to be read.  IOW, when tcp_chr_read is invoked, I would always
expect that at least 1 byte is available, and so EAGAIN should not
occurr.

So I'm puzzled why you would get EAGAIN at all, unless there is some
kind of race with other code also consuming bytes from the same
socket.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2
  2016-08-25  1:33 ` Daniel P. Berrange
@ 2016-08-25 13:21   ` Gaohaifeng (A)
  2016-08-25 15:52     ` Daniel P. Berrange
  0 siblings, 1 reply; 4+ messages in thread
From: Gaohaifeng (A) @ 2016-08-25 13:21 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: pbonzini, Lilijun (Jerry), zangchuanqiang, qemu devel

On Tue, Aug 23, 2016 at 08:57:44AM +0000, Gaohaifeng (A) wrote:
> Hi Daniel & Paolo,
> > 
> > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about
> > 
> > the below code segment:
> > 
> > -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, 
> > void *opaque)
> > +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, 
> > +void *opaque)
> > {
> >      CharDriverState *chr = opaque;
> >      TCPCharDriver *s = chr->opaque;
> > @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque)
> >      if (len > s->max_size)
> >          len = s->max_size;
> >      size = tcp_chr_recv(chr, (void *)buf, len);
> > -    if (size == 0 ||
> > -        (size < 0 &&
> > -         socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) {
> > +    if (size == 0 || size == -1) {
> >          /* connection closed */
> >          tcp_chr_disconnect(chr);
> >      } else if (size > 0) {
> > 
> > The old version will not call tcp_chr_disconnect when error is not 
> > EAGAIN.
> > The new version will call tcp_chr_disconnect when size==-1. From the 
> > tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call 
> > tcp_chr_disconnect.
> > 
> > We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit 
> > since link status is not up. The link is down because 
> > tcp_chr_disconnect will set it.
> > 
> > Can you explain it why EAGIN here changes the behavior ?

> tcp_chr_read is used in a call to io_add_watch_poll() which sets up an event loop watch so that tcp_chr_read is called when there is incoming data to be read.  IOW, when tcp_chr_read is invoked, I would always expect that at least 1 byte is available, and so EAGAIN should not occurr.
> 
> So I'm puzzled why you would get EAGAIN at all, unless there is some kind of race with other code also consuming bytes from the same socket.

It could easily reproduce this when repeating 'rmmod virtio_net;modprobe virtio_net'.
The call stack:
#0  tcp_chr_disconnect (chr=0x7f09d5b2d540) at qemu-char.c:2867
#1  0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, opaque=0x7f09d5b2d540) at qemu-char.c:2917
#2  0x00007f09d46364fa in qio_channel_fd_source_dispatch (source=0x7f093e603b30, callback=0x7f09d43b1fcb <tcp_chr_read>, user_data=0x7f09d5b2d540) at io/channel-watch.c:84
#3  0x00007f09d2e9999a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#4  0x00007f09d45bfc96 in glib_pollfds_poll () at main-loop.c:213
#5  0x00007f09d45bfd73 in os_host_main_loop_wait (timeout=991853) at main-loop.c:258
#6  0x00007f09d45bfe23 in main_loop_wait (nonblocking=0) at main-loop.c:506
#7  0x00007f09d43bbdd2 in main_loop () at vl.c:2119
#8  0x00007f09d43c380e in main (argc=56, argv=0x7ffe7ad669a8, envp=0x7ffe7ad66b70) at vl.c:4896
(gdb) f 1
#1  0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, opaque=0x7f09d5b2d540) at qemu-char.c:2917
2917	        tcp_chr_disconnect(chr);
(gdb) p errno
$1 = 11

EAGAIN = 11

> 
> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2
  2016-08-25 13:21   ` Gaohaifeng (A)
@ 2016-08-25 15:52     ` Daniel P. Berrange
  0 siblings, 0 replies; 4+ messages in thread
From: Daniel P. Berrange @ 2016-08-25 15:52 UTC (permalink / raw)
  To: Gaohaifeng (A); +Cc: pbonzini, Lilijun (Jerry), zangchuanqiang, qemu devel

On Thu, Aug 25, 2016 at 01:21:47PM +0000, Gaohaifeng (A) wrote:
> On Tue, Aug 23, 2016 at 08:57:44AM +0000, Gaohaifeng (A) wrote:
> > Hi Daniel & Paolo,
> > > 
> > > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about
> > > 
> > > the below code segment:
> > > 
> > > -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, 
> > > void *opaque)
> > > +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, 
> > > +void *opaque)
> > > {
> > >      CharDriverState *chr = opaque;
> > >      TCPCharDriver *s = chr->opaque;
> > > @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque)
> > >      if (len > s->max_size)
> > >          len = s->max_size;
> > >      size = tcp_chr_recv(chr, (void *)buf, len);
> > > -    if (size == 0 ||
> > > -        (size < 0 &&
> > > -         socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) {
> > > +    if (size == 0 || size == -1) {
> > >          /* connection closed */
> > >          tcp_chr_disconnect(chr);
> > >      } else if (size > 0) {
> > > 
> > > The old version will not call tcp_chr_disconnect when error is not 
> > > EAGAIN.
> > > The new version will call tcp_chr_disconnect when size==-1. From the 
> > > tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call 
> > > tcp_chr_disconnect.
> > > 
> > > We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit 
> > > since link status is not up. The link is down because 
> > > tcp_chr_disconnect will set it.
> > > 
> > > Can you explain it why EAGIN here changes the behavior ?
> 
> > tcp_chr_read is used in a call to io_add_watch_poll() which sets up an event loop watch so that tcp_chr_read is called when there is incoming data to be read.  IOW, when tcp_chr_read is invoked, I would always expect that at least 1 byte is available, and so EAGAIN should not occurr.
> > 
> > So I'm puzzled why you would get EAGAIN at all, unless there is some kind of race with other code also consuming bytes from the same socket.
> 
> It could easily reproduce this when repeating 'rmmod virtio_net;modprobe virtio_net'.
> The call stack:
> #0  tcp_chr_disconnect (chr=0x7f09d5b2d540) at qemu-char.c:2867
> #1  0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, opaque=0x7f09d5b2d540) at qemu-char.c:2917
> #2  0x00007f09d46364fa in qio_channel_fd_source_dispatch (source=0x7f093e603b30, callback=0x7f09d43b1fcb <tcp_chr_read>, user_data=0x7f09d5b2d540) at io/channel-watch.c:84
> #3  0x00007f09d2e9999a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> #4  0x00007f09d45bfc96 in glib_pollfds_poll () at main-loop.c:213
> #5  0x00007f09d45bfd73 in os_host_main_loop_wait (timeout=991853) at main-loop.c:258
> #6  0x00007f09d45bfe23 in main_loop_wait (nonblocking=0) at main-loop.c:506
> #7  0x00007f09d43bbdd2 in main_loop () at vl.c:2119
> #8  0x00007f09d43c380e in main (argc=56, argv=0x7ffe7ad669a8, envp=0x7ffe7ad66b70) at vl.c:4896
> (gdb) f 1
> #1  0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, opaque=0x7f09d5b2d540) at qemu-char.c:2917
> 2917	        tcp_chr_disconnect(chr);
> (gdb) p errno
> $1 = 11
> 
> EAGAIN = 11

I think what is happening is a race condition - the main event loop thread
sees I/O incoming triggering tcp_chr_read. Meanwhile the vhost-user.c file is
triggering vhost_user_read() in a different thread which consumes the incoming
I/O before tcp_chr_read can handle it.

IOW, you are right, we should have kept the EAGAIN handling code in tcp_chr_read
to avoid the disconnects.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-08-25 15:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-23  8:57 [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 Gaohaifeng (A)
2016-08-25  1:33 ` Daniel P. Berrange
2016-08-25 13:21   ` Gaohaifeng (A)
2016-08-25 15:52     ` Daniel P. Berrange

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.