All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Daniel P. Berrange" <berrange@redhat.com>
To: "Gaohaifeng (A)" <gaohaifeng.gao@huawei.com>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
	"Lilijun (Jerry)" <jerry.lilijun@huawei.com>,
	zangchuanqiang <zangchuanqiang@huawei.com>,
	qemu devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2
Date: Thu, 25 Aug 2016 11:52:27 -0400	[thread overview]
Message-ID: <20160825155227.GB23814@redhat.com> (raw)
In-Reply-To: <47CEA9C0E570484FBF22EF0D7EBCE5B5357D9022@szxema505-mbx.china.huawei.com>

On Thu, Aug 25, 2016 at 01:21:47PM +0000, Gaohaifeng (A) wrote:
> On Tue, Aug 23, 2016 at 08:57:44AM +0000, Gaohaifeng (A) wrote:
> > Hi Daniel & Paolo,
> > > 
> > > Commit 9894dc0c "char: convert from GIOChannel to QIOChannel", about
> > > 
> > > the below code segment:
> > > 
> > > -static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, 
> > > void *opaque)
> > > +static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, 
> > > +void *opaque)
> > > {
> > >      CharDriverState *chr = opaque;
> > >      TCPCharDriver *s = chr->opaque;
> > > @@ -2938,9 +2801,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque)
> > >      if (len > s->max_size)
> > >          len = s->max_size;
> > >      size = tcp_chr_recv(chr, (void *)buf, len);
> > > -    if (size == 0 ||
> > > -        (size < 0 &&
> > > -         socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) {
> > > +    if (size == 0 || size == -1) {
> > >          /* connection closed */
> > >          tcp_chr_disconnect(chr);
> > >      } else if (size > 0) {
> > > 
> > > The old version will not call tcp_chr_disconnect when error is not 
> > > EAGAIN.
> > > The new version will call tcp_chr_disconnect when size==-1. From the 
> > > tcp_chr_recv function we see EAGIN will return -1, so EAGIN will call 
> > > tcp_chr_disconnect.
> > > 
> > > We meet an issue when Guest VM use DPDK(1.6.0) l2fwd, it may exit 
> > > since link status is not up. The link is down because 
> > > tcp_chr_disconnect will set it.
> > > 
> > > Can you explain it why EAGIN here changes the behavior ?
> 
> > tcp_chr_read is used in a call to io_add_watch_poll() which sets up an event loop watch so that tcp_chr_read is called when there is incoming data to be read.  IOW, when tcp_chr_read is invoked, I would always expect that at least 1 byte is available, and so EAGAIN should not occurr.
> > 
> > So I'm puzzled why you would get EAGAIN at all, unless there is some kind of race with other code also consuming bytes from the same socket.
> 
> It could easily reproduce this when repeating 'rmmod virtio_net;modprobe virtio_net'.
> The call stack:
> #0  tcp_chr_disconnect (chr=0x7f09d5b2d540) at qemu-char.c:2867
> #1  0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, opaque=0x7f09d5b2d540) at qemu-char.c:2917
> #2  0x00007f09d46364fa in qio_channel_fd_source_dispatch (source=0x7f093e603b30, callback=0x7f09d43b1fcb <tcp_chr_read>, user_data=0x7f09d5b2d540) at io/channel-watch.c:84
> #3  0x00007f09d2e9999a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> #4  0x00007f09d45bfc96 in glib_pollfds_poll () at main-loop.c:213
> #5  0x00007f09d45bfd73 in os_host_main_loop_wait (timeout=991853) at main-loop.c:258
> #6  0x00007f09d45bfe23 in main_loop_wait (nonblocking=0) at main-loop.c:506
> #7  0x00007f09d43bbdd2 in main_loop () at vl.c:2119
> #8  0x00007f09d43c380e in main (argc=56, argv=0x7ffe7ad669a8, envp=0x7ffe7ad66b70) at vl.c:4896
> (gdb) f 1
> #1  0x00007f09d43b2106 in tcp_chr_read (chan=0x7f09d5b2df30, cond=G_IO_IN, opaque=0x7f09d5b2d540) at qemu-char.c:2917
> 2917	        tcp_chr_disconnect(chr);
> (gdb) p errno
> $1 = 11
> 
> EAGAIN = 11

I think what is happening is a race condition - the main event loop thread
sees I/O incoming triggering tcp_chr_read. Meanwhile the vhost-user.c file is
triggering vhost_user_read() in a different thread which consumes the incoming
I/O before tcp_chr_read can handle it.

IOW, you are right, we should have kept the EAGAIN handling code in tcp_chr_read
to avoid the disconnects.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

      reply	other threads:[~2016-08-25 15:52 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-23  8:57 [Qemu-devel] A question about this commit 9894dc0cdcc397ee5b26370bc53da6d360a363c2 Gaohaifeng (A)
2016-08-25  1:33 ` Daniel P. Berrange
2016-08-25 13:21   ` Gaohaifeng (A)
2016-08-25 15:52     ` Daniel P. Berrange [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160825155227.GB23814@redhat.com \
    --to=berrange@redhat.com \
    --cc=gaohaifeng.gao@huawei.com \
    --cc=jerry.lilijun@huawei.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=zangchuanqiang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.