linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: gleb@redhat.com
Cc: avi.kivity@gmail.com, mtosatti@redhat.com, pbonzini@redhat.com,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Subject: [PATCH v3 05/15] KVM: MMU: update spte and add it into rmap before dirty log
Date: Wed, 23 Oct 2013 21:29:23 +0800	[thread overview]
Message-ID: <1382534973-13197-6-git-send-email-xiaoguangrong@linux.vnet.ibm.com> (raw)
In-Reply-To: <1382534973-13197-1-git-send-email-xiaoguangrong@linux.vnet.ibm.com>

kvm_vm_ioctl_get_dirty_log() write-protects the spte based on the its dirty
bitmap, so we should ensure the writable spte can be found in rmap before the
dirty bitmap is visible. Otherwise, we clear the dirty bitmap but fail to
write-protect the page which is detailed in the comments in this patch

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c | 84 ++++++++++++++++++++++++++++++++++++++++++------------
 arch/x86/kvm/x86.c | 10 +++++++
 2 files changed, 76 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 337d173..e85eed6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2427,6 +2427,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 {
 	u64 spte;
 	int ret = 0;
+	bool remap = is_rmap_spte(*sptep);
 
 	if (set_mmio_spte(vcpu->kvm, sptep, gfn, pfn, pte_access))
 		return 0;
@@ -2488,12 +2489,73 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		}
 	}
 
-	if (pte_access & ACC_WRITE_MASK)
-		mark_page_dirty(vcpu->kvm, gfn);
-
 set_pte:
 	if (mmu_spte_update(sptep, spte))
 		kvm_flush_remote_tlbs(vcpu->kvm);
+
+	if (!remap) {
+		if (rmap_add(vcpu, sptep, gfn) > RMAP_RECYCLE_THRESHOLD)
+			rmap_recycle(vcpu, sptep, gfn);
+
+		if (level > PT_PAGE_TABLE_LEVEL)
+			++vcpu->kvm->stat.lpages;
+	}
+
+	/*
+	 * The orders we require are:
+	 * 1) set spte to writable __before__ set the dirty bitmap.
+	 *    It makes sure that dirty-logging is not missed when do
+	 *    live migration at the final step where kvm should stop
+	 *    the guest and push the remaining dirty pages got from
+	 *    dirty-bitmap to the destination. The similar cases are
+	 *    in fast_pf_fix_direct_spte() and kvm_write_guest_page().
+	 *
+	 * 2) add the spte into rmap __before__ set the dirty bitmap.
+	 *
+	 * They can ensure we can find the writable spte on the rmap
+	 * when we do lockless write-protection since
+	 * kvm_vm_ioctl_get_dirty_log() write-protects the pages based
+	 * on its dirty-bitmap, otherwise these cases will happen:
+	 *
+	 *      CPU 0                         CPU 1
+	 *                              kvm ioctl doing get-dirty-pages
+	 * mark_page_dirty(gfn) which
+	 * set the gfn on the dirty maps
+	 *                              mask = xchg(dirty_bitmap, 0)
+	 *
+	 *                              try to write-protect gfns which
+	 *                              are set on "mask" then walk then
+	 *                              rmap, see no spte on that rmap
+	 * add the spte into rmap
+	 *
+	 * !!!!!! Then the page can be freely wrote but not recorded in
+	 * the dirty bitmap.
+	 *
+	 * And:
+	 *
+	 *      VCPU 0                        CPU 1
+	 *                                kvm ioctl doing get-dirty-pages
+	 * mark_page_dirty(gfn) which
+	 * set the gfn on the dirty maps
+	 *
+	 * add spte into rmap
+	 *                                mask = xchg(dirty_bitmap, 0)
+	 *
+	 *                                try to write-protect gfns which
+	 *                                are set on "mask" then walk then
+	 *                                rmap, see spte is on the ramp
+	 *                                but it is readonly or nonpresent
+	 * Mark spte writable
+	 *
+	 * !!!!!! Then the page can be freely wrote but not recorded in the
+	 * dirty bitmap.
+	 *
+	 * See the comments in kvm_vm_ioctl_get_dirty_log().
+	 */
+	smp_wmb();
+
+	if (pte_access & ACC_WRITE_MASK)
+		mark_page_dirty(vcpu->kvm, gfn);
 done:
 	return ret;
 }
@@ -2503,9 +2565,6 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			 int level, gfn_t gfn, pfn_t pfn, bool speculative,
 			 bool host_writable)
 {
-	int was_rmapped = 0;
-	int rmap_count;
-
 	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 		 *sptep, write_fault, gfn);
 
@@ -2527,8 +2586,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 				 spte_to_pfn(*sptep), pfn);
 			drop_spte(vcpu->kvm, sptep);
 			kvm_flush_remote_tlbs(vcpu->kvm);
-		} else
-			was_rmapped = 1;
+		}
 	}
 
 	if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
@@ -2546,16 +2604,6 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		 is_large_pte(*sptep)? "2MB" : "4kB",
 		 *sptep & PT_PRESENT_MASK ?"RW":"R", gfn,
 		 *sptep, sptep);
-	if (!was_rmapped && is_large_pte(*sptep))
-		++vcpu->kvm->stat.lpages;
-
-	if (is_shadow_present_pte(*sptep)) {
-		if (!was_rmapped) {
-			rmap_count = rmap_add(vcpu, sptep, gfn);
-			if (rmap_count > RMAP_RECYCLE_THRESHOLD)
-				rmap_recycle(vcpu, sptep, gfn);
-		}
-	}
 
 	kvm_release_pfn_clean(pfn);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 573c6b3..4ac3a27 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3566,6 +3566,16 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 		is_dirty = true;
 
 		mask = xchg(&dirty_bitmap[i], 0);
+		/*
+		 * smp_rmb();
+		 *
+		 * The read-barrier is implied in xchg() acting as a
+		 * full barrier and it ensures getting dirty bitmap
+		 * is before walking the rmap and spte write-protection.
+		 *
+		 * See the comments in set_spte().
+		 */
+
 		dirty_bitmap_buffer[i] = mask;
 
 		offset = i * BITS_PER_LONG;
-- 
1.8.1.4


  parent reply	other threads:[~2013-10-23 13:34 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-23 13:29 [PATCH v3 00/15] KVM: MMU: locklessly write-protect Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 01/15] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
2013-11-12  0:25   ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 02/15] KVM: MMU: lazily drop large spte Xiao Guangrong
2013-11-12 22:44   ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 03/15] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
2013-11-13  0:10   ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 04/15] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes Xiao Guangrong
2013-11-14  0:36   ` Marcelo Tosatti
2013-11-14  5:15     ` Xiao Guangrong
2013-11-14 18:39       ` Marcelo Tosatti
2013-11-15  7:09         ` Xiao Guangrong
2013-11-19  0:19           ` Marcelo Tosatti
2013-10-23 13:29 ` Xiao Guangrong [this message]
2013-11-15  0:08   ` [PATCH v3 05/15] KVM: MMU: update spte and add it into rmap before dirty log Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 06/15] KVM: MMU: redesign the algorithm of pte_list Xiao Guangrong
2013-11-19  0:48   ` Marcelo Tosatti
2013-10-23 13:29 ` [PATCH v3 07/15] KVM: MMU: introduce nulls desc Xiao Guangrong
2013-11-22 19:14   ` Marcelo Tosatti
2013-11-25  6:11     ` Xiao Guangrong
2013-11-25  6:29       ` Xiao Guangrong
2013-11-25 18:12         ` Marcelo Tosatti
2013-11-26  3:21           ` Xiao Guangrong
2013-11-26 10:12             ` Gleb Natapov
2013-11-26 19:31             ` Marcelo Tosatti
2013-11-28  8:53               ` Xiao Guangrong
2013-12-03  7:10                 ` Xiao Guangrong
2013-12-05 13:50                   ` Marcelo Tosatti
2013-12-05 15:30                     ` Xiao Guangrong
2013-12-06  0:15                       ` Marcelo Tosatti
2013-12-06  0:22                       ` Marcelo Tosatti
2013-12-10  6:58                         ` Xiao Guangrong
2013-11-25 10:19       ` Gleb Natapov
2013-11-25 10:25         ` Xiao Guangrong
2013-11-25 12:48       ` Avi Kivity
2013-11-25 14:23         ` Marcelo Tosatti
2013-11-25 14:29           ` Gleb Natapov
2013-11-25 18:06             ` Marcelo Tosatti
2013-11-26  3:10           ` Xiao Guangrong
2013-11-26 10:15             ` Gleb Natapov
2013-11-26 19:58             ` Marcelo Tosatti
2013-11-28  8:32               ` Xiao Guangrong
2013-11-25 14:08       ` Marcelo Tosatti
2013-11-26  3:02         ` Xiao Guangrong
2013-11-25  9:31     ` Peter Zijlstra
2013-11-25 10:59       ` Xiao Guangrong
2013-11-25 11:05         ` Peter Zijlstra
2013-11-25 11:29           ` Peter Zijlstra
2013-10-23 13:29 ` [PATCH v3 08/15] KVM: MMU: introduce pte-list lockless walker Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 09/15] KVM: MMU: initialize the pointers in pte_list_desc properly Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 10/15] KVM: MMU: allocate shadow pages from slab Xiao Guangrong
2013-10-24  9:19   ` Gleb Natapov
2013-10-24  9:29     ` Xiao Guangrong
2013-10-24  9:52       ` Gleb Natapov
2013-10-24 10:10         ` Xiao Guangrong
2013-10-24 10:39           ` Gleb Natapov
2013-10-24 11:01             ` Xiao Guangrong
2013-10-24 12:32               ` Gleb Natapov
2013-10-28  3:16                 ` Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 11/15] KVM: MMU: locklessly access shadow page under rcu protection Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 12/15] KVM: MMU: check last spte with unawareness of mapping level Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 13/15] KVM: MMU: locklessly write-protect the page Xiao Guangrong
2013-10-24  9:17   ` Gleb Natapov
2013-10-24  9:24     ` Xiao Guangrong
2013-10-24  9:32       ` Gleb Natapov
2013-10-23 13:29 ` [PATCH v3 14/15] KVM: MMU: clean up spte_write_protect Xiao Guangrong
2013-10-23 13:29 ` [PATCH v3 15/15] KVM: MMU: use rcu functions to access the pointer Xiao Guangrong
2013-11-03 12:29 ` [PATCH v3 00/15] KVM: MMU: locklessly write-protect Gleb Natapov
2013-11-11  5:33   ` Xiao Guangrong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1382534973-13197-6-git-send-email-xiaoguangrong@linux.vnet.ibm.com \
    --to=xiaoguangrong@linux.vnet.ibm.com \
    --cc=avi.kivity@gmail.com \
    --cc=gleb@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).