From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:52958)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1ZMbM3-0006Gh-Qi for qemu-devel@nongnu.org; Tue, 04 Aug 2015 08:30:04 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1ZMbLy-0003cf-6E
 for qemu-devel@nongnu.org; Tue, 04 Aug 2015 08:30:03 -0400
Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:53952 helo=mx01.kamp.de)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1ZMbLx-0003c8-SQ for qemu-devel@nongnu.org; Tue, 04 Aug 2015 08:29:58 -0400
Message-ID: <55C0B03D.8000109@kamp.de>
Date: Tue, 04 Aug 2015 14:29:49 +0200
From: Peter Lieven
MIME-Version: 1.0
References: <55BB2DF7.8010808@kamp.de> <55BB302D.50108@redhat.com>
 <55BB335A.1010009@kamp.de> <55BB3FE7.3000106@redhat.com>
 <55C08461.1040308@kamp.de> <55C0A7AA.70609@redhat.com>
 <55C0A88D.1010800@kamp.de> <55C0AB81.8020404@redhat.com>
In-Reply-To: <55C0AB81.8020404@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [Qemu-stable] Recent patches for 2.4
To: Paolo Bonzini, Stefan Hajnoczi
Cc: "qemu-devel@nongnu.org", ronnie sahlberg, qemu-stable@nongnu.org

On 04.08.2015 at 14:09, Paolo Bonzini wrote:
>
> On 04/08/2015 13:57, Peter Lieven wrote:
>> Okay, what I found out is that in aio_poll I get revents = POLLIN for
>> the nfs file descriptor. But there is no data available on the socket.
> Does read return 0 or EAGAIN?
>
> If it returns EAGAIN, the bug is in the QEMU main loop or the kernel.
> It should never happen that poll returns POLLIN and read returns EAGAIN.
>
> If it returns 0, it means the other side called shutdown(fd, SHUT_WR).
> Then I think the bug is in the libnfs driver or more likely libnfs. You
> should stop polling the POLLIN event after read has returned 0 once.

You might be right. Ronnie originally used the FIONREAD ioctl before every
read and considered the socket disconnected if the number of available
bytes returned was 0. I found that I get available bytes == 0 from that
ioctl even if the socket was not closed. This seems to be some kind of bug
in Linux, or at least that is what I thought.

See BUGS in the select(2) manpage:

    Under Linux, select() may report a socket file descriptor as "ready
    for reading", while nevertheless a subsequent read blocks. This could
    for example happen when data has arrived but upon examination has wrong
    checksum and is discarded. There may be other circumstances in which a
    file descriptor is spuriously reported as ready. Thus it may be safer
    to use O_NONBLOCK on sockets that should not block.

I will debug further, but it seems that I receive a POLLIN even if there
is no data available. I see 0 bytes from the recv call inside libnfs and
continue without a deadlock, at least so far.

Would it be a good idea to count the number of 0-byte returns from recv
and react after I have received 0 bytes a number of consecutive times?
And then: stop polling POLLIN or reconnect?

Thanks,
Peter
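
P.S. To make the question concrete, here is a minimal sketch of the two
things discussed above: telling a 0-byte recv apart from EAGAIN, and
counting consecutive 0-byte reads before reacting. It is not libnfs or QEMU
code. It is Python only because that makes the socket semantics quick to
try out on a POSIX system, and every name in it (classify_recv,
ZeroReadCounter, poll_once, demo) as well as the threshold of 3 is a
hypothetical choice for illustration.

```python
import select
import socket

# Hypothetical threshold; no concrete number has been proposed in this thread.
MAX_CONSECUTIVE_ZERO_READS = 3


def classify_recv(sock, bufsize=4096):
    """Do one non-blocking recv() and classify the outcome.

    Returns ("data", payload) if bytes arrived, ("eof", b"") if recv()
    returned 0 bytes, or ("again", b"") if it failed with EAGAIN.
    """
    try:
        data = sock.recv(bufsize)
    except BlockingIOError:  # EAGAIN / EWOULDBLOCK
        return "again", b""
    if data == b"":
        return "eof", b""
    return "data", data


class ZeroReadCounter:
    """Track consecutive 0-byte reads and report when the limit is hit."""

    def __init__(self, limit=MAX_CONSECUTIVE_ZERO_READS):
        self.limit = limit
        self.count = 0

    def update(self, kind):
        # Real data resets the streak, a 0-byte read extends it,
        # and EAGAIN leaves it unchanged.
        if kind == "eof":
            self.count += 1
        elif kind == "data":
            self.count = 0
        return self.count >= self.limit


def poll_once(sock, timeout_ms=100):
    """Return the revents mask poll() reports for sock (0 on timeout)."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    events = poller.poll(timeout_ms)
    return events[0][1] if events else 0


def demo():
    """Walk a local socket pair through EAGAIN, data, and half-close."""
    a, b = socket.socketpair()
    a.setblocking(False)
    results = []
    pollin_flags = []

    results.append(classify_recv(a)[0])   # nothing has been sent yet
    b.sendall(b"ping")
    results.append(classify_recv(a)[0])   # payload is waiting

    b.shutdown(socket.SHUT_WR)            # peer half-closes its side
    counter = ZeroReadCounter()
    gave_up = False
    while not gave_up:
        pollin_flags.append(bool(poll_once(a) & select.POLLIN))
        kind, _ = classify_recv(a)
        results.append(kind)
        gave_up = counter.update(kind)

    a.close()
    b.close()
    return results, pollin_flags


if __name__ == "__main__":
    print(demo())
```

Once the counter reports that the limit is reached, the caller would
either drop POLLIN from the event mask or tear down and reconnect. Which
of the two is right is exactly the open question. The sketch only covers
the half-close case Paolo describes. It does not reproduce the spurious
readiness described in the select(2) BUGS section.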