* [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection
@ 2014-03-10 14:01 Xiao Guangrong
  2014-03-10 14:01 ` [PATCH 1/5] Revert "KVM: Simplify kvm->tlbs_dirty handling" Xiao Guangrong
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Xiao Guangrong @ 2014-03-10 14:01 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

This patchset is split from my previous patchset:
[PATCH v3 00/15] KVM: MMU: locklessly write-protect
that can be found at:
https://lkml.org/lkml/2013/10/23/265

Changelog v4:
- add more comments to patch 5; thanks to Marcelo for the review

Xiao Guangrong (5):
  Revert "KVM: Simplify kvm->tlbs_dirty handling"
  KVM: MMU: properly check last spte in fast_page_fault()
  KVM: MMU: lazily drop large spte
  KVM: MMU: flush tlb if the spte can be locklessly modified
  KVM: MMU: flush tlb out of mmu lock when write-protect the sptes

 arch/x86/kvm/mmu.c         | 72 ++++++++++++++++++++++++++++++----------------
 arch/x86/kvm/mmu.h         | 14 +++++++++
 arch/x86/kvm/paging_tmpl.h |  7 ++---
 arch/x86/kvm/x86.c         | 20 ++++++++++---
 include/linux/kvm_host.h   |  4 +--
 virt/kvm/kvm_main.c        |  5 +++-
 6 files changed, 85 insertions(+), 37 deletions(-)

-- 
1.8.1.4



* [PATCH 1/5] Revert "KVM: Simplify kvm->tlbs_dirty handling"
  2014-03-10 14:01 [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection Xiao Guangrong
@ 2014-03-10 14:01 ` Xiao Guangrong
  2014-03-10 14:01 ` [PATCH 2/5] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Xiao Guangrong @ 2014-03-10 14:01 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

This reverts commit 5befdc385ddb2d5ae8995ad89004529a3acf58fc.

A later patch will allow the TLB to be flushed outside of mmu-lock, so the
reverted simplification (a bool tlbs_dirty protected by mmu-lock) is no
longer sufficient; restore the counter-based tlbs_dirty handling first.
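
For context, a condensed sketch of the path that consumes kvm->tlbs_dirty,
kvm_mmu_notifier_invalidate_range_start() in virt/kvm/kvm_main.c, is shown
below; the body is abbreviated and may differ in detail from the tree at
this point:

static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
						    struct mm_struct *mm,
						    unsigned long start,
						    unsigned long end)
{
	struct kvm *kvm = mmu_notifier_to_kvm(mn);
	int need_tlb_flush = 0, idx;

	idx = srcu_read_lock(&kvm->srcu);
	spin_lock(&kvm->mmu_lock);
	kvm->mmu_notifier_count++;
	need_tlb_flush = kvm_unmap_hva_range(kvm, start, end);
	/* sync_page() may have skipped flushes and recorded that here */
	need_tlb_flush |= kvm->tlbs_dirty;
	/* the tlb has to be flushed before the pages can be freed */
	if (need_tlb_flush)
		kvm_flush_remote_tlbs(kvm);
	spin_unlock(&kvm->mmu_lock);
	srcu_read_unlock(&kvm->srcu, idx);
}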

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/paging_tmpl.h | 7 +++----
 include/linux/kvm_host.h   | 4 +---
 virt/kvm/kvm_main.c        | 5 ++++-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index b1e6c1b..cba218a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -913,8 +913,7 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr,
  *   and kvm_mmu_notifier_invalidate_range_start detect the mapping page isn't
  *   used by guest then tlbs are not flushed, so guest is allowed to access the
  *   freed pages.
- *   We set tlbs_dirty to let the notifier know this change and delay the flush
- *   until such a case actually happens.
+ *   And we increase kvm->tlbs_dirty to delay tlbs flush in this case.
  */
 static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
@@ -943,7 +942,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 			return -EINVAL;
 
 		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
-			vcpu->kvm->tlbs_dirty = true;
+			vcpu->kvm->tlbs_dirty++;
 			continue;
 		}
 
@@ -958,7 +957,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 
 		if (gfn != sp->gfns[i]) {
 			drop_spte(vcpu->kvm, &sp->spt[i]);
-			vcpu->kvm->tlbs_dirty = true;
+			vcpu->kvm->tlbs_dirty++;
 			continue;
 		}
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9816b68..f5937b8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -401,9 +401,7 @@ struct kvm {
 	unsigned long mmu_notifier_seq;
 	long mmu_notifier_count;
 #endif
-	/* Protected by mmu_lock */
-	bool tlbs_dirty;
-
+	long tlbs_dirty;
 	struct list_head devices;
 };
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5fd4cf8..f6a1e74 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -186,9 +186,12 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
 
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
+	long dirty_count = kvm->tlbs_dirty;
+
+	smp_mb();
 	if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
 		++kvm->stat.remote_tlb_flush;
-	kvm->tlbs_dirty = false;
+	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
 }
 EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
 
-- 
1.8.1.4



* [PATCH 2/5] KVM: MMU: properly check last spte in fast_page_fault()
  2014-03-10 14:01 [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection Xiao Guangrong
  2014-03-10 14:01 ` [PATCH 1/5] Revert "KVM: Simplify kvm->tlbs_dirty handling" Xiao Guangrong
@ 2014-03-10 14:01 ` Xiao Guangrong
  2014-03-10 14:01 ` [PATCH 3/5] KVM: MMU: lazily drop large spte Xiao Guangrong
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Xiao Guangrong @ 2014-03-10 14:01 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

Use sp->role.level instead of @level, since @level is not obtained from the
page table hierarchy.

There is no issue in the current code, because fast page fault currently only
fixes faults caused by dirty logging, which are always on the last level
(level = 1).

This patch makes the code more readable and avoids a potential issue in
future development.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f5704d9..046ef10 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2802,9 +2802,9 @@ static bool page_fault_can_be_fast(u32 error_code)
 }
 
 static bool
-fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 spte)
+fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+			u64 *sptep, u64 spte)
 {
-	struct kvm_mmu_page *sp = page_header(__pa(sptep));
 	gfn_t gfn;
 
 	WARN_ON(!sp->role.direct);
@@ -2830,6 +2830,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 			    u32 error_code)
 {
 	struct kvm_shadow_walk_iterator iterator;
+	struct kvm_mmu_page *sp;
 	bool ret = false;
 	u64 spte = 0ull;
 
@@ -2853,7 +2854,8 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 		goto exit;
 	}
 
-	if (!is_last_spte(spte, level))
+	sp = page_header(__pa(iterator.sptep));
+	if (!is_last_spte(spte, sp->role.level))
 		goto exit;
 
 	/*
@@ -2879,7 +2881,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 	 * the gfn is not stable for indirect shadow page.
 	 * See Documentation/virtual/kvm/locking.txt to get more detail.
 	 */
-	ret = fast_pf_fix_direct_spte(vcpu, iterator.sptep, spte);
+	ret = fast_pf_fix_direct_spte(vcpu, sp, iterator.sptep, spte);
 exit:
 	trace_fast_page_fault(vcpu, gva, error_code, iterator.sptep,
 			      spte, ret);
-- 
1.8.1.4



* [PATCH 3/5] KVM: MMU: lazily drop large spte
  2014-03-10 14:01 [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection Xiao Guangrong
  2014-03-10 14:01 ` [PATCH 1/5] Revert "KVM: Simplify kvm->tlbs_dirty handling" Xiao Guangrong
  2014-03-10 14:01 ` [PATCH 2/5] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
@ 2014-03-10 14:01 ` Xiao Guangrong
  2014-03-10 14:01 ` [PATCH 4/5] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Xiao Guangrong @ 2014-03-10 14:01 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

Currently, kvm zaps a large spte when write protection is needed, so a later
read can fault on that spte. Instead, we can make the large spte readonly
rather than non-present, which avoids the page fault caused by the read
access.

The idea is from Avi:
| As I mentioned before, write-protecting a large spte is a good idea,
| since it moves some work from protect-time to fault-time, so it reduces
| jitter.  This removes the need for the return value.

This version fixes the issue reported in 6b73a9606: fast_page_fault()
directly set a readonly large spte to writable but only dirtied the first
page in the dirty bitmap, which means the other pages covered by the large
spte were missed. Fix it by allowing only normal sptes (on the
PT_PAGE_TABLE_LEVEL level) to be fast-fixed.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c | 34 ++++++++++++++++++----------------
 arch/x86/kvm/x86.c |  8 ++++++--
 2 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 046ef10..b8468c8 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1176,8 +1176,7 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
 
 /*
  * Write-protect on the specified @sptep, @pt_protect indicates whether
- * spte writ-protection is caused by protecting shadow page table.
- * @flush indicates whether tlb need be flushed.
+ * spte write-protection is caused by protecting shadow page table.
  *
  * Note: write protection is difference between drity logging and spte
  * protection:
@@ -1186,10 +1185,9 @@ static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
  * - for spte protection, the spte can be writable only after unsync-ing
  *   shadow page.
  *
- * Return true if the spte is dropped.
+ * Return true if tlb need be flushed.
  */
-static bool
-spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
+static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect)
 {
 	u64 spte = *sptep;
 
@@ -1199,17 +1197,11 @@ spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
 
 	rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);
 
-	if (__drop_large_spte(kvm, sptep)) {
-		*flush |= true;
-		return true;
-	}
-
 	if (pt_protect)
 		spte &= ~SPTE_MMU_WRITEABLE;
 	spte = spte & ~PT_WRITABLE_MASK;
 
-	*flush |= mmu_spte_update(sptep, spte);
-	return false;
+	return mmu_spte_update(sptep, spte);
 }
 
 static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
@@ -1221,11 +1213,8 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
 
 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
 		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-		if (spte_write_protect(kvm, sptep, &flush, pt_protect)) {
-			sptep = rmap_get_first(*rmapp, &iter);
-			continue;
-		}
 
+		flush |= spte_write_protect(kvm, sptep, pt_protect);
 		sptep = rmap_get_next(&iter);
 	}
 
@@ -2877,6 +2866,19 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 		goto exit;
 
 	/*
+	 * Do not fix write-permission on the large spte since we only dirty
+	 * the first page into the dirty-bitmap in fast_pf_fix_direct_spte()
+	 * that means other pages are missed if its slot is dirty-logged.
+	 *
+	 * Instead, we let the slow page fault path create a normal spte to
+	 * fix the access.
+	 *
+	 * See the comments in kvm_arch_commit_memory_region().
+	 */
+	if (sp->role.level > PT_PAGE_TABLE_LEVEL)
+		goto exit;
+
+	/*
 	 * Currently, fast page fault only works for direct mapping since
 	 * the gfn is not stable for indirect shadow page.
 	 * See Documentation/virtual/kvm/locking.txt to get more detail.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d906391..6bcc343 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7301,8 +7301,12 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 		kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
 	/*
 	 * Write protect all pages for dirty logging.
-	 * Existing largepage mappings are destroyed here and new ones will
-	 * not be created until the end of the logging.
+	 *
+	 * All the sptes including the large sptes which point to this
+	 * slot are set to readonly. We can not create any new large
+	 * spte on this slot until the end of the logging.
+	 *
+	 * See the comments in fast_page_fault().
 	 */
 	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
 		kvm_mmu_slot_remove_write_access(kvm, mem->slot);
-- 
1.8.1.4



* [PATCH 4/5] KVM: MMU: flush tlb if the spte can be locklessly modified
  2014-03-10 14:01 [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection Xiao Guangrong
                   ` (2 preceding siblings ...)
  2014-03-10 14:01 ` [PATCH 3/5] KVM: MMU: lazily drop large spte Xiao Guangrong
@ 2014-03-10 14:01 ` Xiao Guangrong
  2014-03-10 14:01 ` [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes Xiao Guangrong
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Xiao Guangrong @ 2014-03-10 14:01 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

Relax the tlb flush condition since we will write-protect the spte outside of
mmu lock. Note that lockless write protection only marks a writable spte as
readonly, and a spte can be writable only if both SPTE_HOST_WRITEABLE and
SPTE_MMU_WRITEABLE are set (which is what spte_is_locklessly_modifiable tests).

This patch avoids this kind of race:

      VCPU 0                         VCPU 1
lockless write protection:
      set spte.w = 0
                                 lock mmu-lock

                                 write-protect the spte to sync the shadow page,
                                 see spte.w = 0, so no tlb flush is done

                                 unlock mmu-lock

                                 !!! At this point, the shadow page can still be
                                     writable due to the corrupt tlb entry
     Flush all TLB
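
For reference, spte_is_locklessly_modifiable() is roughly the following (as
in arch/x86/kvm/mmu.c around this time; the exact code in the tree may
differ):

static bool spte_is_locklessly_modifiable(u64 spte)
{
	/* both software-writable bits must be set for a lockless fix-up */
	return (spte & (SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE)) ==
		(SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE);
}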

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b8468c8..17bb136 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -595,7 +595,8 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
 	 * we always atomicly update it, see the comments in
 	 * spte_has_volatile_bits().
 	 */
-	if (is_writable_pte(old_spte) && !is_writable_pte(new_spte))
+	if (spte_is_locklessly_modifiable(old_spte) &&
+	      !is_writable_pte(new_spte))
 		ret = true;
 
 	if (!shadow_accessed_mask)
-- 
1.8.1.4



* [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes
  2014-03-10 14:01 [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection Xiao Guangrong
                   ` (3 preceding siblings ...)
  2014-03-10 14:01 ` [PATCH 4/5] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
@ 2014-03-10 14:01 ` Xiao Guangrong
  2014-04-09 14:51   ` Marcelo Tosatti
  2014-03-25  3:29 ` [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection Xiao Guangrong
  2014-03-25 16:25 ` Hu Yaohui
  6 siblings, 1 reply; 15+ messages in thread
From: Xiao Guangrong @ 2014-03-10 14:01 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm, Xiao Guangrong

Now we can flush all the TLBs outside of the mmu lock without TLB corruption
when write-protecting the sptes, because:
- we have marked large sptes readonly instead of dropping them, which means we
  only change the spte from writable to readonly, so we only need to handle
  changing a spte from present to present (changing a spte from present to
  nonpresent flushes all the TLBs immediately); in other words, the only case
  we need to care about is mmu_spte_update()

- in mmu_spte_update(), we now check
  SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE instead of PT_WRITABLE_MASK, which
  means it no longer depends on PT_WRITABLE_MASK

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/mmu.c | 25 +++++++++++++++++++++----
 arch/x86/kvm/mmu.h | 14 ++++++++++++++
 arch/x86/kvm/x86.c | 12 ++++++++++--
 3 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 17bb136..01a8c35 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4281,15 +4281,32 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 			if (*rmapp)
 				__rmap_write_protect(kvm, rmapp, false);
 
-			if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
-				kvm_flush_remote_tlbs(kvm);
+			if (need_resched() || spin_needbreak(&kvm->mmu_lock))
 				cond_resched_lock(&kvm->mmu_lock);
-			}
 		}
 	}
 
-	kvm_flush_remote_tlbs(kvm);
 	spin_unlock(&kvm->mmu_lock);
+
+	/*
+	 * kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log()
+	 * which do tlb flush out of mmu-lock should be serialized by
+	 * kvm->slots_lock otherwise tlb flush would be missed.
+	 */
+	lockdep_assert_held(&kvm->slots_lock);
+
+	/*
+	 * We can flush all the TLBs out of the mmu lock without TLB
+	 * corruption since we just change the spte from writable to
+	 * readonly so that we only need to care the case of changing
+	 * spte from present to present (changing the spte from present
+	 * to nonpresent will flush all the TLBs immediately), in other
+	 * words, the only case we care is mmu_spte_update() where we
+	 * haved checked SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE
+	 * instead of PT_WRITABLE_MASK, that means it does not depend
+	 * on PT_WRITABLE_MASK anymore.
+	 */
+	kvm_flush_remote_tlbs(kvm);
 }
 
 #define BATCH_ZAP_PAGES	10
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2926152..585d6b1 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -96,6 +96,20 @@ static inline int is_present_gpte(unsigned long pte)
 	return pte & PT_PRESENT_MASK;
 }
 
+/*
+ * Please note PT_WRITABLE_MASK is not stable since
+ * 1) fast_page_fault() sets spte from readonly to writable out of mmu-lock or
+ * 2) kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log()
+ *    can write protect sptes but flush tlb out mmu-lock that means we may use
+ *    the corrupt tlb entries which depend on this bit.
+ *
+ * Both cases do not modify the status of  spte_is_locklessly_modifiable() so
+ * if you want to check whether the spte is writable on MMU you can check
+ * SPTE_MMU_WRITEABLE instead. If you want to update spte without losing
+ * A/D bits and tlb flush, you can check spte_is_locklessly_modifiable()
+ * instead. See the comments in spte_has_volatile_bits() and
+ * mmu_spte_update().
+ */
 static inline int is_writable_pte(unsigned long pte)
 {
 	return pte & PT_WRITABLE_MASK;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6bcc343..ba1b1db 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3642,11 +3642,19 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 		offset = i * BITS_PER_LONG;
 		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
 	}
-	if (is_dirty)
-		kvm_flush_remote_tlbs(kvm);
 
 	spin_unlock(&kvm->mmu_lock);
 
+	/* See the comments in kvm_mmu_slot_remove_write_access(). */
+	lockdep_assert_held(&kvm->slots_lock);
+
+	/*
+	 * All the TLBs can be flushed out of mmu lock, see the comments in
+	 * kvm_mmu_slot_remove_write_access().
+	 */
+	if (is_dirty)
+		kvm_flush_remote_tlbs(kvm);
+
 	r = -EFAULT;
 	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
 		goto out;
-- 
1.8.1.4



* Re: [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection
  2014-03-10 14:01 [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection Xiao Guangrong
                   ` (4 preceding siblings ...)
  2014-03-10 14:01 ` [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes Xiao Guangrong
@ 2014-03-25  3:29 ` Xiao Guangrong
  2014-03-25 10:22   ` Paolo Bonzini
  2014-03-25 16:25 ` Hu Yaohui
  6 siblings, 1 reply; 15+ messages in thread
From: Xiao Guangrong @ 2014-03-25  3:29 UTC (permalink / raw)
  To: gleb; +Cc: avi.kivity, mtosatti, pbonzini, linux-kernel, kvm


Ping......

On 03/10/2014 10:01 PM, Xiao Guangrong wrote:
> This patchset is splited from my previous patchset:
> [PATCH v3 00/15] KVM: MMU: locklessly write-protect
> that can be found at:
> https://lkml.org/lkml/2013/10/23/265
> 
> Changelog v4 :
> - add more comments for patch 5 and thank for Marcelo's review
> 
> Xiao Guangrong (5):
>   Revert "KVM: Simplify kvm->tlbs_dirty handling"
>   KVM: MMU: properly check last spte in fast_page_fault()
>   KVM: MMU: lazily drop large spte
>   KVM: MMU: flush tlb if the spte can be locklessly modified
>   KVM: MMU: flush tlb out of mmu lock when write-protect the sptes
> 
>  arch/x86/kvm/mmu.c         | 72 ++++++++++++++++++++++++++++++----------------
>  arch/x86/kvm/mmu.h         | 14 +++++++++
>  arch/x86/kvm/paging_tmpl.h |  7 ++---
>  arch/x86/kvm/x86.c         | 20 ++++++++++---
>  include/linux/kvm_host.h   |  4 +--
>  virt/kvm/kvm_main.c        |  5 +++-
>  6 files changed, 85 insertions(+), 37 deletions(-)
> 



* Re: [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection
  2014-03-25  3:29 ` [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection Xiao Guangrong
@ 2014-03-25 10:22   ` Paolo Bonzini
  0 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2014-03-25 10:22 UTC (permalink / raw)
  To: Xiao Guangrong, gleb; +Cc: avi.kivity, mtosatti, linux-kernel, kvm

On 25/03/2014 04:29, Xiao Guangrong wrote:
> Ping......

Hi Xiao, I'll look at these patches soon.  I plan to merge them for 3.16.

Paolo

> On 03/10/2014 10:01 PM, Xiao Guangrong wrote:
>> This patchset is splited from my previous patchset:
>> [PATCH v3 00/15] KVM: MMU: locklessly write-protect
>> that can be found at:
>> https://lkml.org/lkml/2013/10/23/265
>>
>> Changelog v4 :
>> - add more comments for patch 5 and thank for Marcelo's review
>>
>> Xiao Guangrong (5):
>>   Revert "KVM: Simplify kvm->tlbs_dirty handling"
>>   KVM: MMU: properly check last spte in fast_page_fault()
>>   KVM: MMU: lazily drop large spte
>>   KVM: MMU: flush tlb if the spte can be locklessly modified
>>   KVM: MMU: flush tlb out of mmu lock when write-protect the sptes
>>
>>  arch/x86/kvm/mmu.c         | 72 ++++++++++++++++++++++++++++++----------------
>>  arch/x86/kvm/mmu.h         | 14 +++++++++
>>  arch/x86/kvm/paging_tmpl.h |  7 ++---
>>  arch/x86/kvm/x86.c         | 20 ++++++++++---
>>  include/linux/kvm_host.h   |  4 +--
>>  virt/kvm/kvm_main.c        |  5 +++-
>>  6 files changed, 85 insertions(+), 37 deletions(-)
>>
>



* Re: [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection
  2014-03-10 14:01 [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection Xiao Guangrong
                   ` (5 preceding siblings ...)
  2014-03-25  3:29 ` [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection Xiao Guangrong
@ 2014-03-25 16:25 ` Hu Yaohui
  2014-03-26  4:40   ` Hu Yaohui
  2014-03-26  4:54   ` Xiao Guangrong
  6 siblings, 2 replies; 15+ messages in thread
From: Hu Yaohui @ 2014-03-25 16:25 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Gleb Natapov, avi.kivity, Marcelo Tosatti, Paolo Bonzini,
	linux-kernel, kvm

Hi Guangrong,
Since you wrote the following in kvm/mmu.txt:
<quote>

  unsync:
    If true, then the translations in this page may not match the guest's
    translation.  This is equivalent to the state of the tlb when a pte is
    changed but before the tlb entry is flushed.  Accordingly, unsync ptes
    are synchronized when the guest executes invlpg or flushes its tlb by
    other means.  Valid for leaf pages.

</quote>

This makes sense to me; my question is when those unsync bits are set.
When the guest writes to a level 1 guest page table, it does not cause a
page fault, so the unsync bit seems unlikely to be set when the entry is
modified (correct me if I am wrong).

<code>
static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
{
...
    for_each_shadow_entry(vcpu, gva, iterator) {
        level = iterator.level;
        sptep = iterator.sptep;

        sp = page_header(__pa(sptep));
        if (is_last_spte(*sptep, level)) {
            int offset, shift;

            if (!sp->unsync)
                break;
...
</code>
When the guest calls invlpg, kvm's invlpg handler walks down to the last
level; if sp->unsync is not set to 1 (the initial value is zero), it just
breaks out. It is not obvious to me how the specified sp gets synced with
the guest page table.

I think I have missed something or misunderstood the whole mechanism;
I would really appreciate it if you could shed some light on that.

Best Wishes,
Yaohui Hu

On Mon, Mar 10, 2014 at 10:01 AM, Xiao Guangrong
<xiaoguangrong@linux.vnet.ibm.com> wrote:
> This patchset is splited from my previous patchset:
> [PATCH v3 00/15] KVM: MMU: locklessly write-protect
> that can be found at:
> https://lkml.org/lkml/2013/10/23/265
>
> Changelog v4 :
> - add more comments for patch 5 and thank for Marcelo's review
>
> Xiao Guangrong (5):
>   Revert "KVM: Simplify kvm->tlbs_dirty handling"
>   KVM: MMU: properly check last spte in fast_page_fault()
>   KVM: MMU: lazily drop large spte
>   KVM: MMU: flush tlb if the spte can be locklessly modified
>   KVM: MMU: flush tlb out of mmu lock when write-protect the sptes
>
>  arch/x86/kvm/mmu.c         | 72 ++++++++++++++++++++++++++++++----------------
>  arch/x86/kvm/mmu.h         | 14 +++++++++
>  arch/x86/kvm/paging_tmpl.h |  7 ++---
>  arch/x86/kvm/x86.c         | 20 ++++++++++---
>  include/linux/kvm_host.h   |  4 +--
>  virt/kvm/kvm_main.c        |  5 +++-
>  6 files changed, 85 insertions(+), 37 deletions(-)
>
> --
> 1.8.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection
  2014-03-25 16:25 ` Hu Yaohui
@ 2014-03-26  4:40   ` Hu Yaohui
  2014-03-26  5:07     ` Xiao Guangrong
  2014-03-26  4:54   ` Xiao Guangrong
  1 sibling, 1 reply; 15+ messages in thread
From: Hu Yaohui @ 2014-03-26  4:40 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Gleb Natapov, Avi Kivity, Marcelo Tosatti, Paolo Bonzini,
	linux-kernel, kvm

Hi all,
I hope you have a good day!
I have debugged the code myself, calling dump_stack() in
"__kvm_unsync_page" and in "invlpg". Every time before invlpg is called,
the page fault handler calls "__kvm_unsync_page" to mark the specified sp
as unsynced (correct me if I am wrong). I am wondering why there is a page
fault at all: AFAIK, when flush_tlb_page is called in the guest OS it
issues the invlpg instruction directly, and I did not see any operation
that would always cause a page fault. I would really appreciate it if
someone could shed some light on it.
Thanks for your time!

Best Wishes,
Yaohui

On Tue, Mar 25, 2014 at 12:25 PM, Hu Yaohui <loki2441@gmail.com> wrote:
> Hi Guangrong,
> Since you have written in the kvm/mmu.txt.
> <quote>
>
>   unsync:
>     If true, then the translations in this page may not match the guest's
>     translation.  This is equivalent to the state of the tlb when a pte is
>     changed but before the tlb entry is flushed.  Accordingly, unsync ptes
>     are synchronized when the guest executes invlpg or flushes its tlb by
>     other means.  Valid for leaf pages.
>
> </quote>
>
> This make sense to me, my question is when those unsync bits will be
> set? When the guest writes to the level 1 guest page tables, it will
> not cause a page fault. Those unsync bit is unlikely to be set when
> the entry is modified. (correct me if I am wrong).
>
> <code>
> static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
> {
> ...
>     for_each_shadow_entry(vcpu, gva, iterator) {
>         level = iterator.level;
>         sptep = iterator.sptep;
>
>         sp = page_header(__pa(sptep));
>         if (is_last_spte(*sptep, level)) {
>             int offset, shift;
>
>             if (!sp->unsync)
>                 break;
> ...
> </code>
> When guest called invlpg, kvm invlpg will navigate to to the last
> level,  if the sp->unsync is not set to 1, since the initial value is
> zero. it will just break. It's not straight forward to me that the
> specified sp will be synced with the guest page table.
>
> I think I have missed something or misunderstood the whole mechanism,
> I would really appreciate it if you could shed some lights on that.
>
> Best Wishes,
> Yaohui Hu
>
> On Mon, Mar 10, 2014 at 10:01 AM, Xiao Guangrong
> <xiaoguangrong@linux.vnet.ibm.com> wrote:
>> This patchset is splited from my previous patchset:
>> [PATCH v3 00/15] KVM: MMU: locklessly write-protect
>> that can be found at:
>> https://lkml.org/lkml/2013/10/23/265
>>
>> Changelog v4 :
>> - add more comments for patch 5 and thank for Marcelo's review
>>
>> Xiao Guangrong (5):
>>   Revert "KVM: Simplify kvm->tlbs_dirty handling"
>>   KVM: MMU: properly check last spte in fast_page_fault()
>>   KVM: MMU: lazily drop large spte
>>   KVM: MMU: flush tlb if the spte can be locklessly modified
>>   KVM: MMU: flush tlb out of mmu lock when write-protect the sptes
>>
>>  arch/x86/kvm/mmu.c         | 72 ++++++++++++++++++++++++++++++----------------
>>  arch/x86/kvm/mmu.h         | 14 +++++++++
>>  arch/x86/kvm/paging_tmpl.h |  7 ++---
>>  arch/x86/kvm/x86.c         | 20 ++++++++++---
>>  include/linux/kvm_host.h   |  4 +--
>>  virt/kvm/kvm_main.c        |  5 +++-
>>  6 files changed, 85 insertions(+), 37 deletions(-)
>>
>> --
>> 1.8.1.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection
  2014-03-25 16:25 ` Hu Yaohui
  2014-03-26  4:40   ` Hu Yaohui
@ 2014-03-26  4:54   ` Xiao Guangrong
  1 sibling, 0 replies; 15+ messages in thread
From: Xiao Guangrong @ 2014-03-26  4:54 UTC (permalink / raw)
  To: Hu Yaohui
  Cc: Gleb Natapov, avi.kivity, Marcelo Tosatti, Paolo Bonzini,
	linux-kernel, kvm


A suggestion: please start a new mail thread when you have a question,
especially when the question is not related to the patches, so that others
can more easily discover the topic and join the discussion.

On 03/26/2014 12:25 AM, Hu Yaohui wrote:
> Hi Guangrong,
> Since you have written in the kvm/mmu.txt.
> <quote>
> 
>   unsync:
>     If true, then the translations in this page may not match the guest's
>     translation.  This is equivalent to the state of the tlb when a pte is
>     changed but before the tlb entry is flushed.  Accordingly, unsync ptes
>     are synchronized when the guest executes invlpg or flushes its tlb by
>     other means.  Valid for leaf pages.
> 
> </quote>
> 
> This make sense to me, my question is when those unsync bits will be
> set? When the guest writes to the level 1 guest page tables, it will
> not cause a page fault. Those unsync bit is unlikely to be set when
> the entry is modified. (correct me if I am wrong).

The bit is set in mmu_need_write_protect(), where the host decides whether
the page needs to be write-protected (!unsync) or can be left unsynced.
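
A condensed sketch of that decision, which is made when set_spte() is about
to map a gfn writable, is shown below (paraphrased from arch/x86/kvm/mmu.c
of this era; details may differ slightly):

static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
				  bool can_unsync)
{
	struct kvm_mmu_page *s;
	bool need_unsync = false;

	/* walk every indirect shadow page that shadows this gfn */
	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
		if (!can_unsync)
			return 1;	/* caller must write-protect */

		if (s->role.level != PT_PAGE_TABLE_LEVEL)
			return 1;	/* only leaf pages may be unsync */

		if (!s->unsync)
			need_unsync = true;
	}
	if (need_unsync)
		kvm_unsync_pages(vcpu, gfn);	/* mark them unsync instead */
	return 0;
}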



* Re: [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection
  2014-03-26  4:40   ` Hu Yaohui
@ 2014-03-26  5:07     ` Xiao Guangrong
  2014-03-26  5:30       ` Hu Yaohui
  0 siblings, 1 reply; 15+ messages in thread
From: Xiao Guangrong @ 2014-03-26  5:07 UTC (permalink / raw)
  To: Hu Yaohui
  Cc: Gleb Natapov, Avi Kivity, Marcelo Tosatti, Paolo Bonzini,
	linux-kernel, kvm

On 03/26/2014 12:40 PM, Hu Yaohui wrote:
> Hi all,
> I hope you have a good day!
> I have debugged the code myself. I have called dump_stack() in
> function "__kvm_unsync_page"
> and function "invlpg".  Actually every time before invlpg is called,
> the page fault handled will call "__kvm_unsync_page" before invlpg to
> mark the specified sp as unsynced. (correct me if I am wrong). I am
> wondering why there is a page fault. AFAIK when calling flush_tlb_page
> in the guest os. it will issue invlpg instruction directly, I did not
> see any operation which could always cause the page fault.I would
> really appreciate if if someone could shed me some lights on it.
> Thanks for your time!

The page fault path is what maps a page into the guest and sets the proper
permissions on it, so it is the right place to decide whether the page
needs to be writable.

A tlb flush happens when the guest wants a clean tlb, and that is when the
host syncs the unsync page tables.



* Re: [PATCH v4 0/5] KVM: x86: flush tlb out of mmu-lock after write protection
  2014-03-26  5:07     ` Xiao Guangrong
@ 2014-03-26  5:30       ` Hu Yaohui
  0 siblings, 0 replies; 15+ messages in thread
From: Hu Yaohui @ 2014-03-26  5:30 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Gleb Natapov, Avi Kivity, Marcelo Tosatti, Paolo Bonzini,
	linux-kernel, kvm

Hi Guangrong,
Thanks a lot!
I will open a new discussion next time.
Does that mean that once a level 1 guest page table entry is modified, the
host should decide in the page fault handler whether the page is sync or
unsync?
My question is: since all level 1 guest page tables are writable, no page
fault happens when an entry in such a page table is modified, so how does
mmu_need_write_protect() get called to make the sync-or-unsync decision?
Thanks for your time!

Best Wishes,
Yaohui

On Wed, Mar 26, 2014 at 1:07 AM, Xiao Guangrong
<xiaoguangrong@linux.vnet.ibm.com> wrote:
> On 03/26/2014 12:40 PM, Hu Yaohui wrote:
>> Hi all,
>> I hope you have a good day!
>> I have debugged the code myself. I have called dump_stack() in
>> function "__kvm_unsync_page"
>> and function "invlpg".  Actually every time before invlpg is called,
>> the page fault handled will call "__kvm_unsync_page" before invlpg to
>> mark the specified sp as unsynced. (correct me if I am wrong). I am
>> wondering why there is a page fault. AFAIK when calling flush_tlb_page
>> in the guest os. it will issue invlpg instruction directly, I did not
>> see any operation which could always cause the page fault.I would
>> really appreciate if if someone could shed me some lights on it.
>> Thanks for your time!
>
> Page fault is used to map a page into guest and set the proper permission
> on for it so this is the right place to make decision if the page need be
> writable.
>
> Tlb flush is used when guest wants to have a clean tlb so that host syncs the
> unsync page tables when it is happing.
>


* Re: [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes
  2014-03-10 14:01 ` [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes Xiao Guangrong
@ 2014-04-09 14:51   ` Marcelo Tosatti
  2014-04-15 11:11     ` Xiao Guangrong
  0 siblings, 1 reply; 15+ messages in thread
From: Marcelo Tosatti @ 2014-04-09 14:51 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, avi.kivity, pbonzini, linux-kernel, kvm

On Mon, Mar 10, 2014 at 10:01:49PM +0800, Xiao Guangrong wrote:
> Now we can flush all the TLBs out of the mmu lock without TLB corruption when
> write-proect the sptes, it is because:
> - we have marked large sptes readonly instead of dropping them that means we
>   just change the spte from writable to readonly so that we only need to care
>   the case of changing spte from present to present (changing the spte from
>   present to nonpresent will flush all the TLBs immediately), in other words,
>   the only case we need to care is mmu_spte_update()
> 
> - in mmu_spte_update(), we haved checked
>   SPTE_HOST_WRITEABLE | PTE_MMU_WRITEABLE instead of PT_WRITABLE_MASK, that
>   means it does not depend on PT_WRITABLE_MASK anymore
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/mmu.c | 25 +++++++++++++++++++++----
>  arch/x86/kvm/mmu.h | 14 ++++++++++++++
>  arch/x86/kvm/x86.c | 12 ++++++++++--
>  3 files changed, 45 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 17bb136..01a8c35 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -4281,15 +4281,32 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
>  			if (*rmapp)
>  				__rmap_write_protect(kvm, rmapp, false);
>  
> -			if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
> -				kvm_flush_remote_tlbs(kvm);
> +			if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>  				cond_resched_lock(&kvm->mmu_lock);
> -			}
>  		}
>  	}
>  
> -	kvm_flush_remote_tlbs(kvm);
>  	spin_unlock(&kvm->mmu_lock);
> +
> +	/*
> +	 * kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log()
> +	 * which do tlb flush out of mmu-lock should be serialized by
> +	 * kvm->slots_lock otherwise tlb flush would be missed.
> +	 */
> +	lockdep_assert_held(&kvm->slots_lock);
> +
> +	/*
> +	 * We can flush all the TLBs out of the mmu lock without TLB
> +	 * corruption since we just change the spte from writable to
> +	 * readonly so that we only need to care the case of changing
> +	 * spte from present to present (changing the spte from present
> +	 * to nonpresent will flush all the TLBs immediately), in other
> +	 * words, the only case we care is mmu_spte_update() where we
> +	 * haved checked SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE
> +	 * instead of PT_WRITABLE_MASK, that means it does not depend
> +	 * on PT_WRITABLE_MASK anymore.
> +	 */
> +	kvm_flush_remote_tlbs(kvm);
>  }
>  
>  #define BATCH_ZAP_PAGES	10
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 2926152..585d6b1 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -96,6 +96,20 @@ static inline int is_present_gpte(unsigned long pte)
>  	return pte & PT_PRESENT_MASK;
>  }
>  
> +/*
> + * Please note PT_WRITABLE_MASK is not stable since
> + * 1) fast_page_fault() sets spte from readonly to writable out of mmu-lock or
> + * 2) kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log()
> + *    can write protect sptes but flush tlb out mmu-lock that means we may use
> + *    the corrupt tlb entries which depend on this bit.
> + *
> + * Both cases do not modify the status of  spte_is_locklessly_modifiable() so
> + * if you want to check whether the spte is writable on MMU you can check
> + * SPTE_MMU_WRITEABLE instead. If you want to update spte without losing
> + * A/D bits and tlb flush, you can check spte_is_locklessly_modifiable()
> + * instead. See the comments in spte_has_volatile_bits() and
> + * mmu_spte_update().
> + */
>  static inline int is_writable_pte(unsigned long pte)
>  {

Xiao,

Can't get the SPTE_MMU_WRITEABLE part.

So assume you are writing code to perform some action after guest memory
has been write protected. You would

spin_lock(mmu_lock);

if (writeable spte bit is set)
    remove writeable spte bit from spte
remote TLB flush            (*)
action

spin_unlock(mmu_lock);

(*) is necessary because reading the writeable spte bit as zero
does not guarantee remote TLBs have been flushed.

Now, what does SPTE_MMU_WRITEABLE have to do with it?

Perhaps a recipe like that (or just the rules) would be useful.

The remaining patches look good.



* Re: [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes
  2014-04-09 14:51   ` Marcelo Tosatti
@ 2014-04-15 11:11     ` Xiao Guangrong
  0 siblings, 0 replies; 15+ messages in thread
From: Xiao Guangrong @ 2014-04-15 11:11 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: gleb, avi.kivity, pbonzini, linux-kernel, kvm


Hi Marcelo,

Thanks for taking the time to review it.

On 04/09/2014 10:51 PM, Marcelo Tosatti wrote:

>> +/*
>> + * Please note PT_WRITABLE_MASK is not stable since
>> + * 1) fast_page_fault() sets spte from readonly to writable out of mmu-lock or
>> + * 2) kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log()
>> + *    can write protect sptes but flush tlb out mmu-lock that means we may use
>> + *    the corrupt tlb entries which depend on this bit.
>> + *
>> + * Both cases do not modify the status of  spte_is_locklessly_modifiable() so
>> + * if you want to check whether the spte is writable on MMU you can check
>> + * SPTE_MMU_WRITEABLE instead. If you want to update spte without losing
>> + * A/D bits and tlb flush, you can check spte_is_locklessly_modifiable()
>> + * instead. See the comments in spte_has_volatile_bits() and
>> + * mmu_spte_update().
>> + */
>>  static inline int is_writable_pte(unsigned long pte)
>>  {
> 
> Xiao,
> 
> Can't get the SPTE_MMU_WRITEABLE part.
> 
> So assume you are writing code to perform some action after guest memory
> has been write protected. You would
> 
> spin_lock(mmu_lock);
> 
> if (writeable spte bit is set)
>     remove writeable spte bit from spte
> remote TLB flush            (*)
> action
> 
> spin_unlock(mmu_lock);
> 
> (*) is necessary because reading the writeable spte bit as zero
> does not guarantee remote TLBs have been flushed.
> 
> Now what SPTE_MMU_WRITEABLE has to do with it ?

It is contained in the "remove writeable spte bit from spte" step, which is
done by mmu_spte_update():

	if (spte_is_locklessly_modifiable(old_spte) &&
	      !is_writable_pte(new_spte))
		ret = true;
> 
> Perhaps a recipe like that (or just the rules) would be useful.

Okay, I am considering improving this comment. How about something like
this:

Currently, we have two sorts of write protection: a) the first
write-protects a guest page in order to sync the guest's modifications,
b) the other is used to sync the dirty bitmap when we do
KVM_GET_DIRTY_LOG. The differences between the two are:
1) the first case clears the SPTE_MMU_WRITEABLE bit.
2) the first case requires flushing the tlb immediately to avoid
   corrupting the shadow page table across all vcpus, so it must be
   done under the protection of mmu-lock. The other case does not
   need to flush the tlb until the dirty bitmap is returned to
   userspace, since it only write-protects the pages logged in the
   bitmap, which means no page in the dirty bitmap is missed; so it
   can flush the tlb outside of mmu-lock.

So, here is the problem: the first case can run into stale tlb entries
left by the second case, which write-protects pages without flushing
the tlb immediately. To make the first case aware of this problem, we
make it flush the tlb whenever it write-protects a spte whose
SPTE_MMU_WRITEABLE bit is set; this works because the second case
never touches the SPTE_MMU_WRITEABLE bit.

In general, whenever a spte is updated (only permission and status
bits are changed), we need to check whether a spte with
SPTE_MMU_WRITEABLE set becomes readonly; if that happens, we need to
flush the tlb. Fortunately, mmu_spte_update() already handles this.

The rules for using SPTE_MMU_WRITEABLE and PT_WRITABLE_MASK:
- if we want to know whether a writable tlb entry may exist or whether
  the spte can be made writable on the mmu mapping, check
  SPTE_MMU_WRITEABLE; this is the common case. Otherwise,
- if we fix a page fault on the spte or do write protection for dirty
  logging, check PT_WRITABLE_MASK.

TODO: introduce APIs to split these two cases.
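
Purely as an illustration of that TODO, something like the following pair of
helpers (hypothetical names, not functions that exist in the tree) would
capture the two rules:

static bool spte_can_be_writable_on_mmu(u64 spte)
{
	/* rule 1: a writable tlb entry may exist iff this bit is set */
	return spte & SPTE_MMU_WRITEABLE;
}

static bool spte_is_writable_for_dirty_log(u64 spte)
{
	/* rule 2: fast page fault / dirty logging check the hardware W bit */
	return spte & PT_WRITABLE_MASK;
}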

> 
> The remaining patches look good.

Thank you.



