* [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 @ 2016-08-23 8:57 Gaohaifeng (A) 2016-08-25 1:33 ` Daniel P. Berrange 0 siblings, 1 reply; 4+ messages in thread From: Gaohaifeng (A) @ 2016-08-23 8:57 UTC (permalink / raw) To: berrange, pbonzini, Lilijun (Jerry), zangchuanqiang, qemu devel Hi Daniel & Paolo, Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about the below code segment: -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque) +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque) { CharDriverState *chr = opaque; TCPCharDriver *s = chr->opaque; @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque) if (len > s->max_size) len = s->max_size; size = tcp_chr_recv(chr, (void *)buf, len); - if (size == 0 || - (size < 0 && - socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) { + if (size == 0 || size == -1) { /* connection closed */ tcp_chr_disconnect(chr); } else if (size > 0) { The old version will not call tcp_chr_disconnect when error is not EAGAIN. The new version will call tcp_chr_disconnect when size==-1. From the tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call tcp_chr_disconnect. We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit since link status is not up. The link is down because tcp_chr_disconnect will set it. Can you explain it why EAGIN here changes the behavior ? Thanks, Haifeng Gao ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 2016-08-23 8:57 [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 Gaohaifeng (A) @ 2016-08-25 1:33 ` Daniel P. Berrange 2016-08-25 13:21 ` Gaohaifeng (A) 0 siblings, 1 reply; 4+ messages in thread From: Daniel P. Berrange @ 2016-08-25 1:33 UTC (permalink / raw) To: Gaohaifeng (A); +Cc: pbonzini, Lilijun (Jerry), zangchuanqiang, qemu devel On Tue, Aug 23, 2016 at 08:57:44AM +0000, Gaohaifeng (A) wrote: > Hi Daniel & Paolo, > > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about > > the below code segment: > > -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque) > +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque) > { > CharDriverState *chr = opaque; > TCPCharDriver *s = chr->opaque; > @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque) > if (len > s->max_size) > len = s->max_size; > size = tcp_chr_recv(chr, (void *)buf, len); > - if (size == 0 || > - (size < 0 && > - socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) { > + if (size == 0 || size == -1) { > /* connection closed */ > tcp_chr_disconnect(chr); > } else if (size > 0) { > > The old version will not call tcp_chr_disconnect when error is not > EAGAIN. > The new version will call tcp_chr_disconnect when size==-1. From the > tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call > tcp_chr_disconnect. > > We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit since > link status is not up. The link is down because tcp_chr_disconnect will > set it. > > Can you explain it why EAGIN here changes the behavior ? tcp_chr_read is used in a call to io_add_watch_poll() which sets up an event loop watch so that tcp_chr_read is called when there is incoming data to be read. IOW, when tcp_chr_read is invoked, I would always expect that at least 1 byte is available, and so EAGAIN should not occurr. So I'm puzzled why you would get EAGAIN at all, unless there is some kind of race with other code also consuming bytes from the same socket. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 2016-08-25 1:33 ` Daniel P. Berrange @ 2016-08-25 13:21 ` Gaohaifeng (A) 2016-08-25 15:52 ` Daniel P. Berrange 0 siblings, 1 reply; 4+ messages in thread From: Gaohaifeng (A) @ 2016-08-25 13:21 UTC (permalink / raw) To: Daniel P. Berrange; +Cc: pbonzini, Lilijun (Jerry), zangchuanqiang, qemu devel On Tue, Aug 23, 2016 at 08:57:44AM +0000, Gaohaifeng (A) wrote: > Hi Daniel & Paolo, > > > > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about > > > > the below code segment: > > > > -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, > > void *opaque) > > +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, > > +void *opaque) > > { > > CharDriverState *chr = opaque; > > TCPCharDriver *s = chr->opaque; > > @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque) > > if (len > s->max_size) > > len = s->max_size; > > size = tcp_chr_recv(chr, (void *)buf, len); > > - if (size == 0 || > > - (size < 0 && > > - socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) { > > + if (size == 0 || size == -1) { > > /* connection closed */ > > tcp_chr_disconnect(chr); > > } else if (size > 0) { > > > > The old version will not call tcp_chr_disconnect when error is not > > EAGAIN. > > The new version will call tcp_chr_disconnect when size==-1. From the > > tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call > > tcp_chr_disconnect. > > > > We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit > > since link status is not up. The link is down because > > tcp_chr_disconnect will set it. > > > > Can you explain it why EAGIN here changes the behavior ? > tcp_chr_read is used in a call to io_add_watch_poll() which sets up an event loop watch so that tcp_chr_read is called when there is incoming data to be read. IOW, when tcp_chr_read is invoked, I would always expect that at least 1 byte is available, and so EAGAIN should not occurr. > > So I'm puzzled why you would get EAGAIN at all, unless there is some kind of race with other code also consuming bytes from the same socket. It could easily reproduce this when repeating 'rmmod virtio_net;modprobe virtio_net'. The call stack: #0 tcp_chr_disconnect (chr=0x7f09d5b2d540) at qemu-char.c:2867 #1 0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, opaque=0x7f09d5b2d540) at qemu-char.c:2917 #2 0x00007f09d46364fa in qio_channel_fd_source_dispatch (source=0x7f093e603b30, callback=0x7f09d43b1fcb <tcp_chr_read>, user_data=0x7f09d5b2d540) at io/channel-watch.c:84 #3 0x00007f09d2e9999a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 #4 0x00007f09d45bfc96 in glib_pollfds_poll () at main-loop.c:213 #5 0x00007f09d45bfd73 in os_host_main_loop_wait (timeout=991853) at main-loop.c:258 #6 0x00007f09d45bfe23 in main_loop_wait (nonblocking=0) at main-loop.c:506 #7 0x00007f09d43bbdd2 in main_loop () at vl.c:2119 #8 0x00007f09d43c380e in main (argc=56, argv=0x7ffe7ad669a8, envp=0x7ffe7ad66b70) at vl.c:4896 (gdb) f 1 #1 0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, opaque=0x7f09d5b2d540) at qemu-char.c:2917 2917 tcp_chr_disconnect(chr); (gdb) p errno $1 = 11 EAGAIN = 11 > > Regards, > Daniel ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 2016-08-25 13:21 ` Gaohaifeng (A) @ 2016-08-25 15:52 ` Daniel P. Berrange 0 siblings, 0 replies; 4+ messages in thread From: Daniel P. Berrange @ 2016-08-25 15:52 UTC (permalink / raw) To: Gaohaifeng (A); +Cc: pbonzini, Lilijun (Jerry), zangchuanqiang, qemu devel On Thu, Aug 25, 2016 at 01:21:47PM +0000, Gaohaifeng (A) wrote: > On Tue, Aug 23, 2016 at 08:57:44AM +0000, Gaohaifeng (A) wrote: > > Hi Daniel & Paolo, > > > > > > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about > > > > > > the below code segment: > > > > > > -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, > > > void *opaque) > > > +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, > > > +void *opaque) > > > { > > > CharDriverState *chr = opaque; > > > TCPCharDriver *s = chr->opaque; > > > @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque) > > > if (len > s->max_size) > > > len = s->max_size; > > > size = tcp_chr_recv(chr, (void *)buf, len); > > > - if (size == 0 || > > > - (size < 0 && > > > - socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) { > > > + if (size == 0 || size == -1) { > > > /* connection closed */ > > > tcp_chr_disconnect(chr); > > > } else if (size > 0) { > > > > > > The old version will not call tcp_chr_disconnect when error is not > > > EAGAIN. > > > The new version will call tcp_chr_disconnect when size==-1. From the > > > tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call > > > tcp_chr_disconnect. > > > > > > We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit > > > since link status is not up. The link is down because > > > tcp_chr_disconnect will set it. > > > > > > Can you explain it why EAGIN here changes the behavior ? > > > tcp_chr_read is used in a call to io_add_watch_poll() which sets up an event loop watch so that tcp_chr_read is called when there is incoming data to be read. IOW, when tcp_chr_read is invoked, I would always expect that at least 1 byte is available, and so EAGAIN should not occurr. > > > > So I'm puzzled why you would get EAGAIN at all, unless there is some kind of race with other code also consuming bytes from the same socket. > > It could easily reproduce this when repeating 'rmmod virtio_net;modprobe virtio_net'. > The call stack: > #0 tcp_chr_disconnect (chr=0x7f09d5b2d540) at qemu-char.c:2867 > #1 0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, opaque=0x7f09d5b2d540) at qemu-char.c:2917 > #2 0x00007f09d46364fa in qio_channel_fd_source_dispatch (source=0x7f093e603b30, callback=0x7f09d43b1fcb <tcp_chr_read>, user_data=0x7f09d5b2d540) at io/channel-watch.c:84 > #3 0x00007f09d2e9999a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 > #4 0x00007f09d45bfc96 in glib_pollfds_poll () at main-loop.c:213 > #5 0x00007f09d45bfd73 in os_host_main_loop_wait (timeout=991853) at main-loop.c:258 > #6 0x00007f09d45bfe23 in main_loop_wait (nonblocking=0) at main-loop.c:506 > #7 0x00007f09d43bbdd2 in main_loop () at vl.c:2119 > #8 0x00007f09d43c380e in main (argc=56, argv=0x7ffe7ad669a8, envp=0x7ffe7ad66b70) at vl.c:4896 > (gdb) f 1 > #1 0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, opaque=0x7f09d5b2d540) at qemu-char.c:2917 > 2917 tcp_chr_disconnect(chr); > (gdb) p errno > $1 = 11 > > EAGAIN = 11 I think what is happening is a race condition - the main event loop thread sees I/O incoming triggering tcp_chr_read. Meanwhile the vhost-user.c file is triggering vhost_user_read() in a different thread which consumes the incoming I/O before tcp_chr_read can handle it. IOW, you are right, we should have kept the EAGAIN handling code in tcp_chr_read to avoid the disconnects. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2016-08-25 15:52 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-08-23 8:57 [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 Gaohaifeng (A) 2016-08-25 1:33 ` Daniel P. Berrange 2016-08-25 13:21 ` Gaohaifeng (A) 2016-08-25 15:52 ` Daniel P. Berrange
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.