From: Yoshiaki Tamura
Subject: Re: [RFC] KVM Fault Tolerance: Kemari for KVM
Date: Wed, 18 Nov 2009 22:28:46 +0900
Message-ID: <87e9effc0911180528s5546c8bt383a6674b382890d@mail.gmail.com>
In-Reply-To: <4B028334.1070004@lab.ntt.co.jp>
References: <4AF79242.20406@oss.ntt.co.jp> <4AFFD96D.5090100@redhat.com> <4B015F42.7070609@oss.ntt.co.jp> <4B01667F.3000600@redhat.com> <4B028334.1070004@lab.ntt.co.jp>
To: Avi Kivity
Cc: Fernando Luis Vázquez Cao, kvm@vger.kernel.org, qemu-devel@nongnu.org, 大村圭 (oomura kei), Takuya Yoshikawa, anthony@codemonkey.ws, Andrea Arcangeli, Chris Wright

2009/11/17 Yoshiaki Tamura:
> Avi Kivity wrote:
>>
>> On 11/16/2009 04:18 PM, Fernando Luis Vázquez Cao wrote:
>>>
>>> Avi Kivity wrote:
>>>>
>>>> On 11/09/2009 05:53 AM, Fernando Luis Vázquez Cao wrote:
>>>>>
>>>>> Kemari runs paired virtual machines in an active-passive configuration
>>>>> and achieves whole-system replication by continuously copying the
>>>>> state of the system (dirty pages and the state of the virtual devices)
>>>>> from the active node to the passive node. An interesting implication
>>>>> of this is that during normal operation only the active node is
>>>>> actually executing code.
>>>>>
>>>>
>>>> Can you characterize the performance impact for various workloads? I
>>>> assume you are running continuously in log-dirty mode. Doesn't this
>>>> make memory-intensive workloads suffer?
>>>
>>> Yes, we're running continuously in log-dirty mode.
>>>
>>> We still do not have numbers to show for KVM, but
>>> the snippets below from several runs of lmbench
>>> using Xen+Kemari will give you an idea of what you
>>> can expect in terms of overhead. All the tests were
>>> run using a fully virtualized Debian guest with
>>> hardware nested paging enabled.
>>>
>>>                        fork  exec    sh     P/F   C/S   [us]
>>> ------------------------------------------------------------
>>> Base                    114   349  1197  1.2845   8.2
>>> Kemari(10GbE) + FC      141   403  1280  1.2835  11.6
>>> Kemari(10GbE) + DRBD    161   415  1388  1.3145  11.6
>>> Kemari(1GbE) + FC       151   410  1335  1.3370  11.5
>>> Kemari(1GbE) + DRBD     162   413  1318  1.3239  11.6
>>> * P/F = page fault, C/S = context switch
>>>
>>> The benchmarks above are memory intensive and, as you
>>> can see, the overhead varies widely from 7% to 40%.
>>> We also measured CPU-bound operations, but, as expected,
>>> Kemari incurred almost no overhead.
>>
>> Is lmbench fork that memory intensive?
>>
>> Do you have numbers for benchmarks that use significant anonymous RSS?
>> Say, a parallel kernel build.
>>
>> Note that scaling vcpus will increase a guest's memory-dirtying power,
>> but the snapshot rate will not scale in the same way.
>
> I don't think lmbench is memory intensive, but it is sensitive to
> memory latency. We'll measure kernel build time with a minimal config
> and post it later.
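For anyone following along, the log-dirty mode discussed above is the
standard KVM dirty-log interface. Below is a minimal illustrative
sketch (not Kemari's actual code) of how userspace harvests the dirty
bitmap each synchronization epoch; the 4KB page size, the single RAM
slot 0 registered with KVM_MEM_LOG_DIRTY_PAGES, and the absence of
error handling are all simplifying assumptions:

  #include <stdlib.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  static void harvest_dirty_pages(int vm_fd, unsigned long mem_size)
  {
      unsigned long npages = mem_size / 4096;
      unsigned long nwords = (npages + 63) / 64;
      unsigned long *bitmap = calloc(nwords, sizeof(*bitmap));
      struct kvm_dirty_log log = { 0 };

      log.slot = 0;              /* assumed: guest RAM lives in slot 0 */
      log.dirty_bitmap = bitmap;

      /* Fetch and reset the bitmap of pages dirtied since last call. */
      if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) == 0) {
          for (unsigned long i = 0; i < npages; i++) {
              if (bitmap[i / 64] & (1UL << (i % 64))) {
                  /* Page i changed this epoch; a replication scheme
                   * like Kemari would copy it to the passive node. */
              }
          }
      }
      free(bitmap);
  }

Every page the guest touches between epochs has to be copied out,
which is why the memory-dirtying rate matters so much here.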
Here are some quick numbers for parallel kernel compile times.
The number of vcpus is 1, just for convenience.

time make -j 2 all
------------------------------------------------------------------------------
Base:    real 1m13.950s (user 1m2.742s, sys 0m10.446s)
Kemari:  real 1m22.720s (user 1m5.882s, sys 0m10.882s)

time make -j 4 all
------------------------------------------------------------------------------
Base:    real 1m11.234s (user 1m2.582s, sys 0m8.643s)
Kemari:  real 1m26.964s (user 1m6.530s, sys 0m12.194s)

The Kemari results include everything, i.e. both dirty page tracking
and synchronization upon I/O operations to the disk. The compile time
under Kemari was worse with -j 4 than with -j 2, but I'm not sure
whether that is due to dirty page tracking or the sync interval.
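For reference, the relative slowdown works out to:

  -j 2: 82.72s / 73.95s ≈ 1.12  (about 12% overhead)
  -j 4: 86.96s / 71.23s ≈ 1.22  (about 22% overhead)

so the overhead roughly doubles at the higher parallelism, which would
be consistent with the build dirtying memory at a higher rate, though
as I said above the exact cause is not yet clear.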
Thanks,

Yoshi