From: Max Reitz
Date: Wed, 20 Apr 2016 20:09:55 +0200
Subject: Re: [Qemu-devel] [Bug 1570134] Re: While committing snapshot qemu crashes with SIGABRT
To: qemu-devel@nongnu.org, Qemu-block, Bug 1570134 <1570134@bugs.launchpad.net>
Cc: Fam Zheng, Paolo Bonzini, Stefan Hajnoczi
Message-ID: <5717C5F3.90603@redhat.com>
In-Reply-To: <20160420000318.17358.96092.malone@soybean.canonical.com>
References: <20160413231801.31850.67186.malonedeb@chaenomeles.canonical.com> <20160420000318.17358.96092.malone@soybean.canonical.com>

On 20.04.2016 02:03, Matthew Schumacher wrote:
> Max,
>
> Qemu still crashes for me, but the debug is again very different. When
> I attach to the qemu process from gdb, it is unable to provide a
> backtrace when it crashes. The log file is different too. Any ideas?
>
> qemu-system-x86_64: block.c:2307: bdrv_replace_in_backing_chain:
> Assertion `!bdrv_requests_pending(old)' failed.

This message is exactly the same as the one you saw in 2.5.1, so I guess
we've at least averted a regression in 2.6.0.

I'm CC-ing some people who are more involved with this (although Paolo
is on PTO right now, but well...).

(The following is more of a note to those people than to you, Matthew.)

Summary: I think bdrv_drained_begin() does not behave as advertised.

So the assertion that is failing here asserts that no requests are
pending on the mirror block job's source BDS. However, we do invoke
bdrv_drained_begin() on exactly that BDS at the end of mirror_run().
When that function returns, there are indeed no more requests pending
for that BDS. But once mirror_exit() is invoked, there may be new
requests pending.

I reproduced that by running bonnie++ in a guest, then committing a
snapshot and invoking block-job-complete right after the
BLOCK_JOB_READY event; sometimes bdrv_requests_pending(s->common.bs) is
true in mirror_exit() (which is bad), sometimes it's false. I just used
a plain virtio-blk drive without dataplane.

I'm not sure exactly how bdrv_drained_begin() and in turn
aio_disable_external() are supposed to work, but as a matter of fact a
BDS may receive requests even after those functions are called. Just
putting an assert(!bs->quiesce_counter) in tracked_request_begin() will
make it fail even before I start the mirror block job (due to some
flush).
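To make the suspected behavior concrete, here is a toy model in plain C. It is not QEMU code: ToyBDS, toy_drained_begin() and the honor_quiesce flag are made-up names for illustration only, and the idea that draining merely waits for pending requests without gating every submitter is the hypothesis from above, not a confirmed description of bdrv_drained_begin().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a block device state; NOT QEMU's BlockDriverState. */
typedef struct ToyBDS {
    int quiesce_counter;  /* > 0 while inside a drained section */
    int in_flight;        /* requests currently pending */
    int late_requests;    /* requests that began while quiesced */
} ToyBDS;

/* Enter a drained section: mark the node quiesced and (in this model)
 * let every pending request complete before returning. */
static void toy_drained_begin(ToyBDS *bs)
{
    bs->quiesce_counter++;
    bs->in_flight = 0;
}

/* Leave a drained section; must pair with toy_drained_begin(). */
static void toy_drained_end(ToyBDS *bs)
{
    assert(bs->quiesce_counter > 0);
    bs->quiesce_counter--;
}

/* Try to start a request. A submitter that honors the quiesce counter
 * is turned away while the node is drained; one that does not honor it
 * slips through and is counted as a late request. */
static bool toy_request_begin(ToyBDS *bs, bool honor_quiesce)
{
    if (bs->quiesce_counter > 0) {
        if (honor_quiesce) {
            return false;
        }
        bs->late_requests++;
    }
    bs->in_flight++;
    return true;
}

/* Model of the condition the failing assertion checks. */
static bool toy_requests_pending(const ToyBDS *bs)
{
    return bs->in_flight > 0;
}
```

In this sketch, a request started with honor_quiesce set to false after toy_drained_begin() leaves toy_requests_pending() true inside the drained section, which is the state the failing assertion complains about.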
So in my case the problematic request regarding the mirroring comes
from blk_aio_ready_entry(); putting an
assert(!blk_bs(blk)->quiesce_counter) into blk_aio_readv() yields the
following backtrace:

#0  0x00007f3e750bd2a8 in raise () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007f3e750be72a in abort () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00007f3e750b61b7 in __assert_fail_base () from /usr/lib/libc.so.6
No symbol table info available.
#3  0x00007f3e750b6262 in __assert_fail () from /usr/lib/libc.so.6
No symbol table info available.
#4  0x0000564cf7d4e25e in blk_aio_readv (blk=, sector_num=, iov=, nb_sectors=, cb=, opaque=) at qemu/block/block-backend.c:1002
        __PRETTY_FUNCTION__ = "blk_aio_readv"
#5  0x0000564cf7ab2cf3 in submit_requests (niov=, num_reqs=, start=, mrb=, blk=) at qemu/hw/block/virtio-blk.c:361
        nb_sectors =
        is_write =
        qiov =
        sector_num =
#6  virtio_blk_submit_multireq (blk=0x564cf9f80250, mrb=mrb@entry=0x7ffeffbfce40) at qemu/hw/block/virtio-blk.c:391
        i =
        start =
        num_reqs =
        niov =
        nb_sectors =
        max_xfer_len =
        sector_num =
#7  0x0000564cf7ab38c2 in virtio_blk_handle_vq (s=0x564cf9e51268, vq=) at qemu/hw/block/virtio-blk.c:593
        req = 0x0
        mrb = {reqs = {0x564cfb8e8c30, 0x564cfb7bc290, 0x0 }, num_reqs = 2, is_write = false}
#8  0x0000564cf7addcf5 in virtio_queue_notify_vq (vq=0x564cfa000be0) at qemu/hw/virtio/virtio.c:1108
        vdev = 0x564cf9e51268
#9  0x0000564cf7d19980 in aio_dispatch (ctx=0x564cf9e42f40) at qemu/aio-posix.c:327
        tmp =
        revents =
        node = 0x7f3e54015030
        progress = false
#10 0x0000564cf7d0eecd in aio_ctx_dispatch (source=, callback=, user_data=) at qemu/async.c:233
        ctx =
#11 0x00007f3e781d7f07 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
No symbol table info available.
#12 0x0000564cf7d1803b in glib_pollfds_poll () at qemu/main-loop.c:213
        context = 0x564cf9e44800
        pfds =
#13 os_host_main_loop_wait (timeout=) at qemu/main-loop.c:258
        ret = 2
        spin_counter = 2
#14 main_loop_wait (nonblocking=) at qemu/main-loop.c:506
        ret = 2
        timeout = 1000
        timeout_ns =
#15 0x0000564cf7a4c91c in main_loop () at qemu/vl.c:1934
        nonblocking =
        last_io = 0
#16 main (argc=, argv=, envp=) at qemu/vl.c:4658

Maybe bdrv_drained_begin() is supposed to work like this and to let
this request through but that would be pretty counter-intuitive.

Max