From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:52366)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Zujk4-0003BE-Fb
	for qemu-devel@nongnu.org; Fri, 06 Nov 2015 11:19:57 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from )
	id 1Zujk1-0007MY-7q
	for qemu-devel@nongnu.org; Fri, 06 Nov 2015 11:19:56 -0500
Received: from relay.parallels.com ([195.214.232.42]:59731)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Zujk1-0007Hf-0B
	for qemu-devel@nongnu.org; Fri, 06 Nov 2015 11:19:53 -0500
References: <1446657582-21619-1-git-send-email-den@openvz.org>
	<20151106155455.GS12285@stefanha-x1.localdomain>
	<563CCFC8.2000206@redhat.com>
From: "Denis V. Lunev"
Message-ID: <563CD315.20104@openvz.org>
Date: Fri, 6 Nov 2015 19:19:33 +0300
MIME-Version: 1.0
In-Reply-To: <563CCFC8.2000206@redhat.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 2.5 v5 0/11] dataplane snapshot fixes
To: Eric Blake, Stefan Hajnoczi
Cc: Kevin Wolf, qemu-devel@nongnu.org, Stefan Hajnoczi, Juan Quintela

On 11/06/2015 07:05 PM, Eric Blake wrote:
> On 11/06/2015 08:54 AM, Stefan Hajnoczi wrote:
>> On Wed, Nov 04, 2015 at 08:19:31PM +0300, Denis V. Lunev wrote:
>>> with test
>>>      while /bin/true ; do
>>>          virsh snapshot-create rhel7
>>>          sleep 10
>>>          virsh snapshot-delete rhel7 --current
>>>      done
>>> with enabled iothreads on a running VM leads to a lot of troubles: hangs,
>>> asserts, errors.
> That is a case of using libvirt to trigger internal snapshots...
>
>> The HMP monitor is legacy and also not used by modern libvirt.
> ...and libvirt is forced to use HMP for internal snapshots, since we
> _still_ haven't exposed internal snapshots as a QMP command.
>
>> I think the affected use cases are restricted to savevm+dataplane and
>> HMP+dataplane.
> The fact that the commit message calls out a libvirt method of
> triggering the bug does mean that it is user-visible, and so it would
> qualify as a bug fix even during hard freeze.  But I also understand
> that taking a large complex series late in the game is not without risk;
> and it is not like this is a regression (rather, something that has
> never worked bulletproof), right?
>
Yes, this was not working in the past, so it is not a regression.

The problem is that apparently NOBODY uses iothreads in production,
or even in complex real-life tests. There is another recently merged
example of this (100% reproducible, happens on both migration and
snapshot). We hit it on a suspend operation:

commit 10a06fd65f667a972848ebbbcac11bdba931b544
Author: Pavel Butsykin
Date:   Mon Oct 26 14:42:57 2015 +0300

    virtio: sync the dataplane vring state to the virtqueue before virtio_save

I initially started this as a set of small changes in the savevm code
and was asked to move the code from savevm.c to the block layer. That
has been done, and yes, the series became complex after that. It was
obvious that it would become complex once the task was to move a bunch
of code from one place to another.

Anyway, from my point of view the series is not that complex. It is
just large, it does simple things that are close to copy/paste, and
there is a month to catch bugs here.

Can we still consider this for merge?

Den