From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity
Subject: Re: [RFC] KVM Fault Tolerance: Kemari for KVM
Date: Tue, 17 Nov 2009 14:15:21 +0200
Message-ID: <4B0293D9.7000302@redhat.com>
References: <4AF79242.20406@oss.ntt.co.jp> <4AFFD96D.5090100@redhat.com>
 <4B015F42.7070609@oss.ntt.co.jp> <4B01667F.3000600@redhat.com>
 <4B028334.1070004@lab.ntt.co.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Fernando Luis Vázquez Cao, kvm@vger.kernel.org, qemu-devel@nongnu.org,
 "大村圭(oomura kei)", Takuya Yoshikawa, anthony@codemonkey.ws,
 Andrea Arcangeli, Chris Wright
To: Yoshiaki Tamura
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:39704 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754165AbZKQMPk
 (ORCPT ); Tue, 17 Nov 2009 07:15:40 -0500
In-Reply-To: <4B028334.1070004@lab.ntt.co.jp>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 11/17/2009 01:04 PM, Yoshiaki Tamura wrote:
>> What I mean is:
>>
>> - choose synchronization point A
>> - start copying memory for synchronization point A
>> - output is delayed
>> - choose synchronization point B
>> - copy memory for A and B
>>   if guest touches memory not yet copied for A, COW it
>> - once A copying is complete, release A output
>> - continue copying memory for B
>> - choose synchronization point B
>>
>> by keeping two synchronization points active, you don't have any
>> pauses.  The cost is maintaining copy-on-write so we can copy dirty
>> pages for A while keeping execution.
>
> The overall idea seems good, but if I'm understanding correctly, we
> need a buffer for copying memory locally, and when it gets full, or
> when we COW the memory for B, we still have to pause the guest to
> prevent from overwriting.  Correct?

Yes.  During COW the guest would not be able to access the page, but if
other vcpus access other pages, they can still continue.  So generally
synchronization would be pauseless.

> To make things simple, we would like to start with the synchronous
> transmission first, and tackle asynchronous transmission later.

Of course.  I'm just worried that realistic workloads will drive the
latency beyond acceptable limits.

>>>> How many pages do you copy per synchronization point for reasonably
>>>> difficult workloads?
>>>
>>> That is very workload-dependent, but if you take a look at the examples
>>> below you can get a feeling of how Kemari behaves.
>>>
>>> IOzone              Kemari sync interval[ms]   dirtied pages
>>> ---------------------------------------------------------
>>> buffered + fsync              400                  3000
>>> O_SYNC                         10                    80
>>>
>>> In summary, if the guest executes few I/O operations, the interval
>>> between Kemari synchronization points will increase and the number of
>>> dirtied pages will grow accordingly.
>>
>> In the example above, the externally observed latency grows to 400
>> ms, yes?
>
> Not exactly.  The sync interval refers to the interval of
> synchronization points captured when the workload is running.  In the
> example above, when the observed sync interval is 400ms, it takes
> about 150ms to sync VMs with 3000 dirtied pages.  Kemari resumes I/O
> operations immediately once the synchronization is finished, and thus,
> the externally observed latency is 150ms in this case.

Not sure I understand.  If a packet is output from a guest immediately
after a synchronization point, doesn't it need to be delayed until the
next synchronization point?  So it's not just the guest pause time that
matters, but also the interval between sync points?
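[Editorial note: the two-active-sync-point scheme discussed above can be sketched as a toy model. This is not Kemari or QEMU code; all names below are hypothetical. The idea: while checkpoint A's dirty pages are still being copied in the background, the guest keeps running toward point B, and a guest write to a page not yet copied for A triggers a copy-on-write so A's snapshot stays consistent.]

```python
class PipelinedCheckpoint:
    """Toy model of pipelined checkpointing with copy-on-write."""

    def __init__(self, memory):
        self.memory = dict(memory)  # page_id -> contents, the live guest view
        self.pending = set()        # pages not yet copied for checkpoint A
        self.snapshot = {}          # consistent copy being built for A

    def begin_sync_point(self, dirty_pages):
        """Choose a sync point: mark dirty pages for copying (output delayed)."""
        self.pending = set(dirty_pages)
        self.snapshot = {}

    def guest_write(self, page, value):
        """Guest touches a page; COW it first if A has not copied it yet."""
        if page in self.pending:
            self.snapshot[page] = self.memory[page]  # copy-on-write for A
            self.pending.discard(page)
        self.memory[page] = value  # guest keeps running, no global pause

    def copy_step(self):
        """Background copy of one remaining page for checkpoint A.

        Returns True once A's copy is complete (A's output can be released).
        """
        if self.pending:
            page = self.pending.pop()
            self.snapshot[page] = self.memory[page]
        return not self.pending
```

A guest write during the copy phase pauses only the writing vcpu for the duration of one page copy; vcpus touching other pages continue, which is the "generally pauseless" property described above.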
--
error compiling committee.c: too many arguments to function