From: "Cao, Lei" <Lei.Cao@stratus.com>
To: "Paolo Bonzini" <pbonzini@redhat.com>,
"Radim Krčmář" <rkrcmar@redhat.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: [PATCH v3 0/5] KVM: Ring-based dirty memory tracking for performant checkpointing solutions
Date: Fri, 3 Feb 2017 19:58:47 +0000
Message-ID: <CY1PR08MB1992C95AA09F4C4FE14687AEF04F0@CY1PR08MB1992.namprd08.prod.outlook.com>
In-Reply-To: 201702031949.v13Jn8eJ032004@dev1.sn.stratus.com
This patch series adds ring-based dirty memory tracking support for
performant checkpointing solutions. It can also be used by live migration
to improve predictability.
Introduction
Brendan Cully's Remus project white paper is one of the best written on
the subject of fault tolerance using checkpoint/rollback techniques and
is the best place to start for a general background.
(http://www.cs.ubc.ca/~andy/papers/remus-nsdi-final.pdf)
It gives a great outline of the basic requirements and characteristics
of a checkpointed system, including a few of the performance issues.
But Remus did not go far enough in the area of system performance for
commercial production.
This patch series addresses a known bottleneck and limitation of a
checkpointed system: the use of large bitmaps to track dirty memory.
These bitmaps are copied to userspace when userspace queries KVM for
its dirty page information. The use of bitmaps makes sense in the
live-migration method, as it is possible for all of memory to be dirtied
from one log-dirty pass to another. But in a checkpointed system, the
number of dirty pages is bounded such that the VM is paused when it has
dirtied a pre-defined number of pages. Traversing a large, sparsely
populated bitmap to find set bits is time-consuming, as is copying the
bitmap to userspace.
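To illustrate the traversal cost described above, here is a minimal
userspace sketch in C. It is not code from this series: the function name
collect_dirty_from_bitmap() and the 64-bit word size are assumptions made
for the example. The point is only that the loop visits every word of the
bitmap, so its cost scales with the size of guest memory rather than with
the number of pages actually dirtied.

```c
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 64 /* assumed word size for this sketch */

/*
 * Hypothetical helper: walk a dirty bitmap of nwords 64-bit words and
 * store the page index of every set bit in out (capacity max_out).
 * Returns the number of dirty pages found. Every word is visited, even
 * when nearly all of them are zero.
 */
size_t collect_dirty_from_bitmap(const uint64_t *bitmap, size_t nwords,
                                 uint64_t *out, size_t max_out)
{
    size_t n = 0;

    for (size_t w = 0; w < nwords; w++) {
        uint64_t word = bitmap[w];

        while (word != 0 && n < max_out) {
            unsigned int bit = 0;

            /* Find the lowest set bit of this word. */
            while (((word >> bit) & 1) == 0)
                bit++;
            out[n++] = (uint64_t)w * BITS_PER_WORD + bit;
            word &= word - 1; /* clear the lowest set bit */
        }
    }
    return n;
}
```

With a dense list, by contrast, the consumer reads exactly as many entries
as there are dirty pages, independent of how much memory the guest has.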
The preferred data structure for performant checkpointing solutions is
a dense list of guest frame numbers (GFNs). This patch series stores
the dirty list in kernel memory that can be memory mapped into
userspace to allow speedy harvesting.
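As a rough illustration of the dense-list idea, the following C sketch
models a fixed-size ring of GFNs with a producer that reports when the
ring is full and a consumer that harvests entries in order. It is a
single-threaded sketch under stated assumptions, not the series' code:
struct gfn_ring_sketch, ring_push(), ring_harvest() and RING_SLOTS are
hypothetical names, the layout is not that of struct kvm_gfn_ring, and the
memory mapping and the barriers a real producer/consumer pair would need
are omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8 /* hypothetical size; must be a power of two */

/* Hypothetical layout, not the series' struct kvm_gfn_ring. */
struct gfn_ring_sketch {
    uint32_t head;             /* next slot the producer writes */
    uint32_t tail;             /* next slot the consumer reads */
    uint64_t gfn[RING_SLOTS];  /* dense list of dirty GFNs */
};

/*
 * Producer side: record one dirty GFN.
 * Returns 0 on success, or -1 when the ring is full and the caller
 * must stop dirtying pages until entries have been harvested.
 */
int ring_push(struct gfn_ring_sketch *r, uint64_t gfn)
{
    if (r->head - r->tail == RING_SLOTS)
        return -1;
    r->gfn[r->head & (RING_SLOTS - 1)] = gfn;
    r->head++;
    return 0;
}

/*
 * Consumer side: copy up to max pending GFNs into out, oldest first.
 * Returns the number of entries harvested; the cost depends only on
 * the number of dirty pages, not on the size of guest memory.
 */
size_t ring_harvest(struct gfn_ring_sketch *r, uint64_t *out, size_t max)
{
    size_t n = 0;

    while (r->tail != r->head && n < max) {
        out[n++] = r->gfn[r->tail & (RING_SLOTS - 1)];
        r->tail++;
    }
    return n;
}
```

In this sketch a full ring is the natural point at which to pause the
producer, which mirrors the bounded number of dirty pages per checkpoint
cycle described earlier.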
These modifications, together with further modifications to QEMU, have
allowed us to run checkpoint cycles at rates of up to 2500 per second,
while still allowing the VM to get useful work done.
Design Goals
The patch series does not change or remove any existing KVM functionality.
It only adds new functions (ioctls) that user space can call into KVM,
and these changes coexist with the current dirty memory logging facilities.
It is possible to run multiple guests such that some of the guests
perform live migration using the existing memory logging mechanism and
others migrate or run in fault tolerant mode using the new memory tracking
functions.
Modifications
All modifications affect only the KVM instance on which the primary (active)
VM is running. They are not in play on the standby (passive) host, where a VM
matching the primary's configuration is created but does not execute until a
migration/failover event occurs.
Patch 1: Add support for capabilities that can be enabled in a generic way.
Introduce a new capability: ring-based dirty memory logging.
Patch 2: Add new data type, struct kvm_gfn_ring, and support functions for
ring-based dirty memory logging. Add new ioctl,
KVM_RESET_DIRTY_PAGES, for dirty trap reset.
Patch 3: Modify kvm_write_guest_cached() and kvm_write_guest_offset_cached() to
take vcpu as a parameter instead of kvm.
Patch 4: Add new exit reason KVM_EXIT_DIRTY_LOG_FULL for dirty ring full
conditions.
Patch 5: Implement ring-based dirty memory tracking.
 Documentation/virtual/kvm/api.txt |  94 +++++++++-
 arch/powerpc/kvm/powerpc.c        |  14 +-
 arch/s390/kvm/kvm-s390.c          |  11 +-
 arch/x86/include/asm/kvm_host.h   |   5 +
 arch/x86/kvm/Makefile             |   3 +-
 arch/x86/kvm/lapic.c              |   4 +-
 arch/x86/kvm/mmu.c                |   7 +
 arch/x86/kvm/vmx.c                |   7 +
 arch/x86/kvm/x86.c                |  36 ++--
 include/linux/kvm_gfn_ring.h      |  37 ++++
 include/linux/kvm_host.h          |  20 ++-
 include/uapi/linux/kvm.h          |  33 ++++
 virt/kvm/gfn_ring.c               | 100 +++++++++++
 virt/kvm/kvm_main.c               | 267 ++++++++++++++++++++++++++--
 14 files changed, 569 insertions(+), 69 deletions(-)