From: John Snow
Date: Tue, 22 Nov 2016 12:26:34 -0500
Subject: Re: [Qemu-devel] [RFC] dirty bitmap state uncertainty under certain conditions
To: Eric Blake, Nikolay Shirokovskiy, qemu-devel@nongnu.org
Cc: Denis Lunev, Vladimir Sementsov-Ogievskiy, Maxim Nestratov, Jeff Cody
Message-ID: <4a811aba-6ede-ee4e-346a-52700d045206@redhat.com>
In-Reply-To: <40ca4b89-f7bb-6b03-3bd2-0b177d2359ab@redhat.com>
References: <2fb12281-1023-71c0-7fd9-39e27787c1e9@virtuozzo.com> <6602e519-1d25-86be-855e-d29155ec267c@redhat.com> <40ca4b89-f7bb-6b03-3bd2-0b177d2359ab@redhat.com>

On 11/22/2016 11:16 AM, Eric Blake wrote:
> On 11/22/2016 10:07 AM, John Snow wrote:
>>
>>
>> On 11/22/2016 07:01 AM, Nikolay Shirokovskiy wrote:
>>> Hi, everyone.
>>>
>>> There is a problem with current incremental backups. Imagine I ask
>>> qemu to make an incremental backup then go away and return back when
>>> backup job is finished. Qemu process dismisses the job completely and
>>> I missed all the events so I don't know the result of the operation
>>> and what is most important I don't know the base for dirty bitmap now.
>>> In case of failure it is previous backup and in case of success it is
>>> the last backup. Qemu does not track dirty bitmap base for me so I
>>> have no choice other than clear dirty bitmap and make full backup
>>> which would be rather unexpected from user POV (The situation of
>>> going away/coming back is libvirt crash/restart of course.)
>>>
>>
>> Why was the completion/failure event missed? Is there some reason why
>> you cannot guarantee that you will observe the completion?
>
> I think the intent of some of the on-error parameters is to make it so
> that the job can't go away on error, only on success. Admittedly,
> libvirt isn't using those policies as well as it could.
>
>>
>>> I guess problem has wider scope. In case I miss successful completion
>>> of full backup my only option is to drop backup file and redo the
>>> backup completely which is rather wasteful. AFAIU I can not query
>>> backup completion result from backup file itself. I guess there can
>>> be similar issues for other qemu jobs.
>>>
>>> Nikolay
>>>
>>
>> I would personally advocate for a job-neutral solution where jobs can be
>> given a parameter such that the job persists in memory in a new
>> "completed" state until such time that it is queried explicitly, then it
>> can be dropped.
>>
>> I am not sure if we can make this the default behavior, as it might
>> confuse libvirt to occasionally see jobs that have already completed.
>>
>> Talking to Kevin off-list, he suggested that we might be able to make
>> this the default behavior if we pivot to the new jobs API that I have
>> been proposing, accompanied by a new explicit command to put a command
>> to rest.
>
> Yeah, revisiting the overall job API will require some overhaul in
> libvirt as well, but it is probably worth it.
>

I wonder if I should try to rectify this temporarily for 2.9, or just
jump straight into a new interface.
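
To make the "persists in a completed state until queried" idea quoted
above a bit more concrete, here is a rough sketch. It is Python purely
for brevity, and every name in it (JobRegistry, persist, query, and so
on) is made up for illustration; none of this is an existing QEMU or
QMP interface, just one way the semantics could look.

```python
from enum import Enum


class JobState(Enum):
    RUNNING = "running"
    COMPLETED = "completed"


class JobRegistry:
    """Toy job table. All names are hypothetical, not QEMU's API."""

    def __init__(self):
        self._jobs = {}

    def start(self, job_id, persist=False):
        # 'persist' stands in for the proposed per-job parameter.
        self._jobs[job_id] = {
            "state": JobState.RUNNING,
            "persist": persist,
            "error": None,
        }

    def finish(self, job_id, error=None):
        job = self._jobs[job_id]
        if job["persist"]:
            # Keep the outcome in memory until a client asks for it.
            job["state"] = JobState.COMPLETED
            job["error"] = error
        else:
            # Non-persistent job: it disappears on completion, so a
            # client that missed the event cannot learn the outcome.
            del self._jobs[job_id]

    def query(self, job_id):
        """Return (state, error), or None if the job is unknown.

        Querying a completed job also drops it from the table.
        """
        job = self._jobs.get(job_id)
        if job is None:
            return None
        result = (job["state"], job["error"])
        if job["state"] is JobState.COMPLETED:
            del self._jobs[job_id]
        return result
```

With something like this, a management layer that restarted while the
backup was running could call query("backup0") after reconnecting, learn
whether the job succeeded, and pick the right bitmap base; a second
query would then find nothing, because the first one dropped the job.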
>>
>> I can work on this for 2.9; though we may still need a "temporary"
>> solution for the old jobs API until we're ready to officially deprecate
>> the older interface.
>>
>>
>

-- 
—js