From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:56582) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cq8vZ-0005KA-3G for qemu-devel@nongnu.org; Mon, 20 Mar 2017 21:49:38 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cq8vY-0001vB-8D for qemu-devel@nongnu.org; Mon, 20 Mar 2017 21:49:37 -0400 Received: from mail-vk0-x234.google.com ([2607:f8b0:400c:c05::234]:36863) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1cq8vY-0001uf-31 for qemu-devel@nongnu.org; Mon, 20 Mar 2017 21:49:36 -0400 Received: by mail-vk0-x234.google.com with SMTP id j64so65852218vkg.3 for ; Mon, 20 Mar 2017 18:49:35 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: From: Ed Swierk Date: Mon, 20 Mar 2017 18:48:54 -0700 Message-ID: Content-Type: text/plain; charset=UTF-8 Subject: Re: [Qemu-devel] Assertion failure taking external snapshot with virtio drive + iothread List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini Cc: Fam Zheng , Kevin Wolf , qemu-devel@nongnu.org On Fri, Mar 17, 2017 at 12:27 PM, Paolo Bonzini wrote: > And this is a fix, but I have no idea why/how it works and what else it > may break. > > Patches 1 and 2 are pretty obvious and would be the first step towards > eliminating aio_disable/enable_external altogether. > > However I got patch 3 more or less by trial and error, and when I > thought I had the reasoning right I noticed this: > > bdrv_drained_end(state->old_bs); > > in external_snapshot_clean which makes no sense given the > > bdrv_drained_begin(bs_new); > > that I added to bdrv_append. So take this with a ton of salt. > > The basic idea is that calling child->role->drained_begin and > child->role->drained_end is not necessary and in fact actively wrong > when both the old and the new child should be in a drained section. > But maybe instead it should be asserted that they are, except for the > special case of adding or removing a child. i.e. after > > int drain = !!(old_bs && old_bs->quiesce_counter) - !!(new_bs && new_bs->quiesce_counter); > > add > > assert(!(drain && old_bs && new_bs)); > > Throwing this out because it's Friday evening... Maybe Fam can pick > it up on Monday. I just tested this patch on top of today's master. It does make the ctx->external_disable_cnt > 0 assertion failure on snapshot_blkdev go away. But it seems to cause a different assertion failure when running without an iothread, e.g. qemu-system-x86_64 -nographic -enable-kvm -monitor telnet:0.0.0.0:1234,server,nowait -m 1024 -object iothread,id=iothread1 -drive file=/x/drive.qcow2,if=none,id=drive0 -device virtio-blk-pci,drive=drive0 and with the guest constantly writing to the disk with something like while true; do echo 12345 >blah; done Running snapshot_blkdev in the monitor repeatedly (with a new backing file each time) triggers the following after a few tries: qemu-system-x86_64: /x/qemu/block.c:2965: bdrv_replace_node: Assertion `!({ typedef struct { int:(sizeof(*&from->in_flight) > sizeof(void *)) ? -1 : 1; } qemu_build_bug_on__4 __attribute__((unused)); __atomic_load_n(&from->in_flight, 0); })' failed. This does not occur on today's master without this patch. --Ed