From: Avi Kivity <avi@redhat.com>
To: "Fernando Luis Vázquez Cao" <fernando@oss.ntt.co.jp>
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org,
	"\"大村圭(oomura kei)\"" <ohmura.kei@lab.ntt.co.jp>,
	"Yoshiaki Tamura" <tamura.yoshiaki@lab.ntt.co.jp>,
	"Takuya Yoshikawa" <yoshikawa.takuya@oss.ntt.co.jp>,
	anthony@codemonkey.ws, "Andrea Arcangeli" <aarcange@redhat.com>,
	"Chris Wright" <chrisw@redhat.com>
Subject: Re: [RFC] KVM Fault Tolerance: Kemari for KVM
Date: Sun, 15 Nov 2009 12:35:25 +0200	[thread overview]
Message-ID: <4AFFD96D.5090100@redhat.com> (raw)
In-Reply-To: <4AF79242.20406@oss.ntt.co.jp>

On 11/09/2009 05:53 AM, Fernando Luis Vázquez Cao wrote:
>
> Kemari runs paired virtual machines in an active-passive configuration
> and achieves whole-system replication by continuously copying the
> state of the system (dirty pages and the state of the virtual devices)
> from the active node to the passive node. An interesting implication
> of this is that during normal operation only the active node is
> actually executing code.
>

Can you characterize the performance impact for various workloads?  I 
assume you are running continuously in log-dirty mode.  Doesn't this 
make memory-intensive workloads suffer?
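
For reference, this is the interface I mean; a minimal sketch of
pulling the dirty bitmap from KVM (slot assumed registered with
KVM_MEM_LOG_DIRTY_PAGES, error handling omitted):

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Fetch-and-clear the dirty bitmap for one memory slot.  The
     caller supplies one bit of bitmap space per page in the slot. */
  static int fetch_dirty_bitmap(int vm_fd, __u32 slot, void *bitmap)
  {
          struct kvm_dirty_log log;

          memset(&log, 0, sizeof(log));
          log.slot = slot;
          log.dirty_bitmap = bitmap;

          /* Each call returns the pages dirtied since the previous
             call, so running "continuously in log-dirty mode" means
             issuing this at every synchronization point. */
          return ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
  }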

>
> The synchronization process can be broken down as follows:
>
>   - Event tapping: On KVM, all I/O generates a VMEXIT that is
>     synchronously handled by the Linux kernel monitor, i.e. KVM (it is
>     worth noting that this applies to virtio devices too, because they
>     use MMIO and PIO just like a regular PCI device).

Some I/O (virtio-based) is asynchronous, but you still have well-known 
tap points within qemu.
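
Conceptually, the hook is just a prefix on the path where an effect
becomes visible outside the VM.  A sketch, with made-up names (this
is not actual qemu code):

  struct packet;
  extern void kemari_synchronize(void);        /* hypothetical */
  extern void net_transmit(struct packet *p);  /* hypothetical */

  /* Externally visible effects funnel through a handful of
     functions; tap each one so state is replicated before the
     effect escapes the host. */
  static void tapped_net_transmit(struct packet *p)
  {
          kemari_synchronize();   /* replicate state to fallback */
          net_transmit(p);        /* then release the packet */
  }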

>
>   - Notification to qemu: Taking a page from live migration's
>     playbook, the synchronization process is user-space driven, which
>     means that qemu needs to be woken up at each synchronization
>     point. That is already the case for qemu-emulated devices, but we
>     also have in-kernel emulators. To compound the problem, even for
>     user-space emulated devices, accesses to coalesced MMIO areas
>     cannot be detected. As a consequence, we need a mechanism to
>     communicate KVM-handled events to qemu.

Do you mean the ioapic, pic, and lapic?  Perhaps it's best to start with 
those in userspace (-no-kvm-irqchip).

Why is access to those chips considered a synchronization point?
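
(If the in-kernel models do stay, their state is at least already
reachable from userspace; a sketch, error handling omitted:)

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Read in-kernel PIC/IOAPIC state into userspace so it can be
     saved alongside the qemu-emulated devices. */
  static int read_irqchip(int vm_fd, __u32 chip_id,
                          struct kvm_irqchip *chip)
  {
          memset(chip, 0, sizeof(*chip));
          chip->chip_id = chip_id;   /* e.g. KVM_IRQCHIP_IOAPIC */
          return ioctl(vm_fd, KVM_GET_IRQCHIP, chip);
  }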

>   - Virtual machine synchronization: All the dirty pages since the
>     last synchronization point and the state of the virtual devices are
>     sent to the fallback node from the user-space qemu process. For
>     this, the existing savevm infrastructure and KVM's dirty page
>     tracking capabilities can be reused. Regarding in-kernel devices,
>     with the likely advent of in-kernel virtio backends we need a
>     generic way to access their state from user-space, for which,
>     again, the kvm_run shared memory area could be used.

I wonder if you can pipeline dirty memory synchronization.  That is, 
write-protect the dirty pages, start copying them to the other side, 
and continue execution, copying a page immediately if the guest faults 
on it again.
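
Roughly, in pseudocode (every name below is made up):

  /* Sketch of pipelined synchronization.  A real implementation
     would clear a page's bit when it is sent, to avoid copying
     the same page twice. */
  extern void get_and_clear_dirty_log(unsigned char *bitmap);
  extern void write_protect_page(unsigned long pfn);
  extern void unprotect_page(unsigned long pfn);
  extern void send_page_to_fallback(unsigned long pfn);
  extern void resume_guest(void);

  #define NPAGES (1UL << 18)   /* example: 1GB of 4K pages */
  #define TEST(bm, i) ((bm)[(i) / 8] & (1 << ((i) % 8)))

  void sync_point(void)
  {
          static unsigned char dirty[NPAGES / 8];
          unsigned long pfn;

          get_and_clear_dirty_log(dirty);

          /* Freeze the contents of the dirty pages, then let the
             guest continue immediately. */
          for (pfn = 0; pfn < NPAGES; pfn++)
                  if (TEST(dirty, pfn))
                          write_protect_page(pfn);
          resume_guest();

          /* Stream the frozen pages while the guest runs. */
          for (pfn = 0; pfn < NPAGES; pfn++)
                  if (TEST(dirty, pfn)) {
                          send_page_to_fallback(pfn);
                          unprotect_page(pfn);
                  }
  }

  /* Write-protection fault: the guest touched a page not yet
     copied, so copy it now and let the write proceed. */
  void wp_fault(unsigned long pfn)
  {
          send_page_to_fallback(pfn);
          unprotect_page(pfn);
  }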

How many pages do you copy per synchronization point for reasonably 
difficult workloads?

-- 
error compiling committee.c: too many arguments to function

