From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Cao, Lei"
Subject: [PATCH v3 0/5] KVM: Ring-based dirty memory tracking for performant checkpointing solutions
Date: Fri, 3 Feb 2017 19:58:47 +0000
References: <201702031949.v13Jn8eJ032004@dev1.sn.stratus.com>
To: Paolo Bonzini, Radim Krčmář, "kvm@vger.kernel.org"

This patch series adds ring-based dirty memory tracking support for
performant checkpointing solutions. It can also be used by live migration
to improve predictability.

Introduction

Brendan Cully's Remus project white paper is one of the best written on
the subject of fault tolerance using checkpoint/rollback techniques and
is the best place to start for a general background.
(http://www.cs.ubc.ca/~andy/papers/remus-nsdi-final.pdf)
It gives a great outline of the basic requirements and characteristics
of a checkpointed system, including a few of the performance issues.
But Remus did not go far enough in the area of system performance for
commercial production.

This patch series addresses a known bottleneck and limitation in a
checkpointed system: the use of large bitmaps to track dirty memory.
These bitmaps are copied to userspace when userspace queries KVM for
its dirty page information. The use of bitmaps makes sense in the
live-migration method, as it is possible for all of memory to be dirtied
from one log-dirty pass to another. But in a checkpointed system, the
number of dirty pages is bounded, because the VM is paused once it has
dirtied a pre-defined number of pages. Traversing a large, sparsely
populated bitmap to find set bits is time-consuming, as is copying the
bitmap to userspace.

The preferred data structure for performant checkpointing solutions is
a dense list of guest frame numbers (GFNs). This patch series stores
the dirty list in kernel memory that can be memory mapped into
userspace to allow speedy harvesting.

These modifications, together with further modifications to QEMU, have
allowed us to run checkpoint cycles at rates up to 2500 per second, while
still allowing the VM to get useful work done.

Design Goals

The patch series does not change or remove any existing KVM functionality.
It only adds new functions (ioctls) that user space can call into KVM,
and these changes coexist with the current dirty memory logging facilities.
It is possible to run multiple guests such that some of the guests
perform live migration using the existing memory logging mechanism and
others migrate or run in fault tolerant mode using the new memory tracking
functions.

Modifications

All modifications affect only the KVM instance where the primary (active) VM
is running. They are not in play on the standby (passive) host, where a VM
is created that matches the primary in its configuration but does not
execute until a migration/failover event occurs.

Patch 1: Add support for capabilities that can be enabled in a generic way.
         Introduce new capability: ring-based dirty memory logging.
Patch 2: Add new data type, struct kvm_gfn_ring, and support functions for
         ring-based dirty memory logging. Add new ioctl,
         KVM_RESET_DIRTY_PAGES, for dirty trap reset.
Patch 3: Modify kvm_write_guest_cached() and kvm_write_guest_offset_cached() to
         take vcpu as a parameter instead of kvm.
Patch 4: Add new exit reason KVM_EXIT_DIRTY_LOG_FULL for dirty ring full
         conditions.
Patch 5: Implement ring-based dirty memory tracking.

 Documentation/virtual/kvm/api.txt |  94 +++++++++-
 arch/powerpc/kvm/powerpc.c        |  14 +-
 arch/s390/kvm/kvm-s390.c          |  11 +-
 arch/x86/include/asm/kvm_host.h   |   5 +
 arch/x86/kvm/Makefile             |   3 +-
 arch/x86/kvm/lapic.c              |   4 +-
 arch/x86/kvm/mmu.c                |   7 +
 arch/x86/kvm/vmx.c                |   7 +
 arch/x86/kvm/x86.c                |  36 ++--
 include/linux/kvm_gfn_ring.h      |  37 ++++
 include/linux/kvm_host.h          |  20 ++-
 include/uapi/linux/kvm.h          |  33 ++++
 virt/kvm/gfn_ring.c               | 100 +++++++++++
 virt/kvm/kvm_main.c               | 267 ++++++++++++++++++++++++++--
 14 files changed, 569 insertions(+), 69 deletions(-)
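
For readers who want a concrete picture of the bitmap-versus-list argument
made in the introduction, here is a toy userspace model in Python. It is
only a sketch and is not code from this series: harvest_bitmap(), GfnRing,
the ring capacity, and the guest size are hypothetical names and values
chosen for illustration, and the real struct kvm_gfn_ring layout is not
reproduced here. The bitmap harvest has to walk every byte covering guest
memory, while the ring harvest touches only the entries that were recorded,
and a full ring is the point at which the guest would be paused.

```python
def harvest_bitmap(bitmap):
    """Return dirty GFNs by scanning every byte of a dirty bitmap.

    The cost is proportional to guest memory size, not to the dirty count.
    """
    gfns = []
    for byte_index, byte in enumerate(bitmap):
        if byte == 0:
            continue  # a sparse bitmap is mostly zero bytes, but each is still visited
        for bit in range(8):
            if byte & (1 << bit):
                gfns.append(byte_index * 8 + bit)
    return gfns


class GfnRing:
    """Hypothetical bounded, dense list of dirty GFNs (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []

    def push(self, gfn):
        """Record a dirty GFN; return True once the ring has become full.

        In this toy model a True result stands in for the "dirty ring full"
        condition, after which the caller must harvest before pushing again.
        """
        if len(self.entries) >= self.capacity:
            raise OverflowError("ring full: harvest before logging more pages")
        self.entries.append(gfn)
        return len(self.entries) == self.capacity

    def harvest(self):
        """Return the recorded GFNs and reset the ring; cost tracks the dirty count."""
        harvested = self.entries
        self.entries = []
        return harvested


# Assumed guest: 2**20 pages (4 GiB of 4 KiB pages), only four of them dirty.
TOTAL_PAGES = 1 << 20
dirty = [3, 4096, 70000, TOTAL_PAGES - 1]

bitmap = bytearray(TOTAL_PAGES // 8)
ring = GfnRing(capacity=1024)
for gfn in dirty:
    bitmap[gfn // 8] |= 1 << (gfn % 8)
    ring.push(gfn)

from_bitmap = harvest_bitmap(bitmap)  # walks 131072 bytes to find 4 pages
from_ring = ring.harvest()            # reads exactly 4 entries
print(len(bitmap), from_bitmap == from_ring)
```

In this model the bitmap scan visits 131072 bytes to report four pages,
whereas the ring returns its four entries directly. That is the sparse-case
cost difference the cover letter describes.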