From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51682) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bcwxH-00032t-VN for qemu-devel@nongnu.org; Thu, 25 Aug 2016 11:52:37 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bcwxD-0007qN-KR for qemu-devel@nongnu.org; Thu, 25 Aug 2016 11:52:34 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42468) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bcwxD-0007qD-CA for qemu-devel@nongnu.org; Thu, 25 Aug 2016 11:52:31 -0400 Date: Thu, 25 Aug 2016 11:52:27 -0400 From: "Daniel P. Berrange" Message-ID: <20160825155227.GB23814@redhat.com> Reply-To: "Daniel P. Berrange" References: <47CEA9C0E570484FBF22EF0D7EBCE5B5357D7C69@szxema505-mbx.china.huawei.com> <20160825013334.GH22605@redhat.com> <47CEA9C0E570484FBF22EF0D7EBCE5B5357D9022@szxema505-mbx.china.huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <47CEA9C0E570484FBF22EF0D7EBCE5B5357D9022@szxema505-mbx.china.huawei.com> Subject: Re: [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Gaohaifeng (A)" Cc: "pbonzini@redhat.com" , "Lilijun (Jerry)" , zangchuanqiang , qemu devel On Thu, Aug 25, 2016 at 01:21:47PM +0000, Gaohaifeng (A) wrote: > On Tue, Aug 23, 2016 at 08:57:44AM +0000, Gaohaifeng (A) wrote: > > Hi Daniel & Paolo, > > > > > > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about > > > > > > the below code segment: > > > > > > -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, > > > void *opaque) > > > +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, > > > +void *opaque) > > > { > > > CharDriverState *chr = opaque; > > > TCPCharDriver *s = chr->opaque; > > > @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque) > > > if (len > s->max_size) > > > len = s->max_size; > > > size = tcp_chr_recv(chr, (void *)buf, len); > > > - if (size == 0 || > > > - (size < 0 && > > > - socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) { > > > + if (size == 0 || size == -1) { > > > /* connection closed */ > > > tcp_chr_disconnect(chr); > > > } else if (size > 0) { > > > > > > The old version will not call tcp_chr_disconnect when error is not > > > EAGAIN. > > > The new version will call tcp_chr_disconnect when size==-1. From the > > > tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call > > > tcp_chr_disconnect. > > > > > > We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit > > > since link status is not up. The link is down because > > > tcp_chr_disconnect will set it. > > > > > > Can you explain it why EAGIN here changes the behavior ? > > > tcp_chr_read is used in a call to io_add_watch_poll() which sets up an event loop watch so that tcp_chr_read is called when there is incoming data to be read. IOW, when tcp_chr_read is invoked, I would always expect that at least 1 byte is available, and so EAGAIN should not occurr. > > > > So I'm puzzled why you would get EAGAIN at all, unless there is some kind of race with other code also consuming bytes from the same socket. > > It could easily reproduce this when repeating 'rmmod virtio_net;modprobe virtio_net'. > The call stack: > #0 tcp_chr_disconnect (chr=0x7f09d5b2d540) at qemu-char.c:2867 > #1 0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, opaque=0x7f09d5b2d540) at qemu-char.c:2917 > #2 0x00007f09d46364fa in qio_channel_fd_source_dispatch (source=0x7f093e603b30, callback=0x7f09d43b1fcb , user_data=0x7f09d5b2d540) at io/channel-watch.c:84 > #3 0x00007f09d2e9999a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 > #4 0x00007f09d45bfc96 in glib_pollfds_poll () at main-loop.c:213 > #5 0x00007f09d45bfd73 in os_host_main_loop_wait (timeout=991853) at main-loop.c:258 > #6 0x00007f09d45bfe23 in main_loop_wait (nonblocking=0) at main-loop.c:506 > #7 0x00007f09d43bbdd2 in main_loop () at vl.c:2119 > #8 0x00007f09d43c380e in main (argc=56, argv=0x7ffe7ad669a8, envp=0x7ffe7ad66b70) at vl.c:4896 > (gdb) f 1 > #1 0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, opaque=0x7f09d5b2d540) at qemu-char.c:2917 > 2917 tcp_chr_disconnect(chr); > (gdb) p errno > $1 = 11 > > EAGAIN = 11 I think what is happening is a race condition - the main event loop thread sees I/O incoming triggering tcp_chr_read. Meanwhile the vhost-user.c file is triggering vhost_user_read() in a different thread which consumes the incoming I/O before tcp_chr_read can handle it. IOW, you are right, we should have kept the EAGAIN handling code in tcp_chr_read to avoid the disconnects. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|