From: Eric Blake
Date: Tue, 7 Aug 2018 14:57:13 -0500
Subject: Re: [Qemu-devel] [PULL 21/35] block: fix QEMU crash with scsi-hd and drive_del
To: Kevin Wolf, qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, Max Reitz, Vladimir Sementsov-Ogievskiy
Message-ID: <28f44beb-e19b-3e26-254a-4678eacf30c2@redhat.com>
References: <20180618164504.24488-1-kwolf@redhat.com>
 <20180618164504.24488-22-kwolf@redhat.com>

On 08/06/2018 05:04 PM, Eric Blake wrote:
> On 06/18/2018 11:44 AM, Kevin Wolf wrote:
>> From: Greg Kurz
>>
>> Removing a drive with drive_del while it is being used to run an I/O
>> intensive workload can cause QEMU to crash.
>>
> ...
>
> Test 83 sets up a client that intentionally disconnects at critical
> points in the NBD protocol exchange, to ensure that the server reacts
> sanely.

Rather, nbd-fault-injector.py is a server that disconnects at critical
points, and the test exercises the client's reaction.

> I suspect that somewhere in the NBD code, the server detects the
> disconnect and was somehow calling into blk_remove_bs() (although I
> could not quickly find the backtrace); and that prior to this patch,
> the 'Connection closed' message resulted from other NBD coroutines
> getting a shot at the (now-closed) connection, while after this patch,
> the additional blk_drain() somehow tweaks things in a way that
> prevents the other NBD coroutines from printing a message.  If so,
> then the change in 83 reference output is probably intentional, and we
> should update it.

It seems like this condition is racy, and the race is more likely to be
lost prior to this patch than after.  It's a question of whether the
client has time to start a request to the server before the server
hangs up, as the message is generated during
nbd_co_do_receive_one_chunk.  Here's a demonstration that things are
racy:

$ git revert f45280cbf
$ make
$ cd tests/qemu-iotests
$ cat fault.txt
[inject-error "a"]
event=neg2
when=after
$ python nbd-fault-injector.py localhost:10809 ./fault.txt &
Listening on 127.0.0.1:10809
$ ../../qemu-io -f raw nbd://localhost:10809 -c 'r 0 512'
Closing connection on rule match inject-error "a"
Connection closed
read failed: Input/output error
$ python nbd-fault-injector.py localhost:10809 ./fault.txt &
Listening on 127.0.0.1:10809
$ ../../qemu-io -f raw nbd://localhost:10809
Closing connection on rule match inject-error "a"
qemu-io> r 0 512
read failed: Input/output error
qemu-io> q

So, whether the read command is kicked off quickly (via -c) or slowly
(via typing into qemu-io) determines whether the message appears.

What's more, in commit f140e300 we specifically called out in the
commit message that it might be better to trace when we detect a closed
connection rather than log it to stdout, and in all cases in that
commit, the additional 'Connection closed' messages add no information
to the error message already displayed by the rest of the code.

I don't know how much the proposed NBD reconnect code will change
things in 3.1.  Meanwhile, we've missed any chance for 3.0 to fix test
83.

>
> But I'm having a hard time convincing myself that this is the case,
> particularly since I'm not even sure how to easily debug the
> assumptions I made above.
>
> Since I'm very weak on the whole notion of what blk_drain() vs.
> blk_remove_bs() is really supposed to be doing, and could easily be
> persuaded that the change in output is a regression instead of a fix.

At this point, I don't think we have a regression, just a bad iotests
reference output.  The extra blk_drain() merely adds more time before
the NBD code can send out its first request, and thus makes it more
likely that the fault injector has closed the connection before the
read request is issued rather than after (the message only appears when
the read beats the race).  But the NBD code shouldn't be printing the
error message in the first place, and 083 needs to be tweaked to remove
the noisy lines added in f140e300 (not just the three lines that
reliably differ due to this patch, but all other such lines caused by
strategic server drops at other points in the NBD protocol).
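If it helps, the tweak could be as simple as filtering the racy message
out of the test's output instead of encoding it in 083.out.  A rough
sketch, reusing the qemu-io invocation from above (the helper name is
made up, and 083's real filter plumbing may well look different):

# hypothetical helper for 083; drop the racy message so the reference
# output no longer depends on who wins the disconnect race
filter_connection_closed()
{
    sed -e '/^Connection closed/d'
}

../../qemu-io -f raw nbd://localhost:10809 -c 'r 0 512' 2>&1 \
    | filter_connection_closed

In the test itself this would presumably sit alongside whatever
filtering 083 already applies to the qemu-io output.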
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org