* [PATCH 0/4] Unlocked TLB flush
@ 2012-05-03 11:22 Avi Kivity
  2012-05-03 11:22 ` [PATCH 1/4] KVM: Add APIs for unlocked " Avi Kivity
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Avi Kivity @ 2012-05-03 11:22 UTC (permalink / raw)
  To: kvm, Marcelo Tosatti, Xiao Guangrong, takuya.yoshikawa

This patchset implements unlocked TLB flushing for KVM.  An operation that
generates stale TLB entries can mark the TLB as dirty instead of flushing
immediately, and then flush after releasing mmu_lock but before returning
to the guest or the caller.  A few call sites are converted too.

Note not all call sites are easily convertible; as an example, sync_page()
must flush before reading the guest page table.
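
As an illustration, a converted call site ends up with this shape (a
sketch only; the real conversions are in the patches below):

	spin_lock(&kvm->mmu_lock);
	/* ... zap sptes ... */
	kvm_mark_tlb_dirty(kvm);          /* cheap: a barrier and a counter bump */
	spin_unlock(&kvm->mmu_lock);
	kvm_cond_flush_remote_tlbs(kvm);  /* IPIs only if the TLB is still dirty */

instead of calling kvm_flush_remote_tlbs() (an IPI round-trip) with
mmu_lock still held.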

Avi Kivity (4):
  KVM: Add APIs for unlocked TLB flush
  KVM: Flush TLB in mmu notifier without holding mmu_lock
  KVM: Flush TLB in FNAME(invlpg) without holding mmu_lock
  KVM: Flush TLB in change_pte mmu notifier without holding mmu_lock

 Documentation/virtual/kvm/locking.txt |   14 ++++++++++++
 arch/x86/kvm/mmu.c                    |   15 +++++--------
 arch/x86/kvm/paging_tmpl.h            |    9 ++++----
 include/linux/kvm_host.h              |   22 ++++++++++++++++++-
 virt/kvm/kvm_main.c                   |   39 +++++++++++++++++++++++----------
 5 files changed, 72 insertions(+), 27 deletions(-)

-- 
1.7.10



* [PATCH 1/4] KVM: Add APIs for unlocked TLB flush
  2012-05-03 11:22 [PATCH 0/4] Unlocked TLB flush Avi Kivity
@ 2012-05-03 11:22 ` Avi Kivity
  2012-05-03 13:23   ` Xiao Guangrong
  2012-05-03 11:23 ` [PATCH 2/4] KVM: Flush TLB in mmu notifier without holding mmu_lock Avi Kivity
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2012-05-03 11:22 UTC (permalink / raw)
  To: kvm, Marcelo Tosatti, Xiao Guangrong, takuya.yoshikawa

Currently we flush the TLB while holding mmu_lock.  This
increases the lock hold time by the IPI round-trip time, increasing
contention, and makes dropping the lock (for latency reasons) harder.

This patch changes TLB management to be usable locklessly, introducing
the following APIs:

  kvm_mark_tlb_dirty() - mark the TLB as containing stale entries
  kvm_cond_flush_remote_tlbs() - flush the TLB if it was marked as
                                  dirty

These APIs can be used without holding mmu_lock (though if the TLB
became stale due to shadow page table modifications, typically it
will need to be called with the lock held to prevent other threads
from seeing the modified page tables with the TLB unmarked and unflushed).
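
In outline (this just restates the implementation in the diff below),
the two calls pair a counter each:

	kvm_mark_tlb_dirty():          dirtied_count++
	kvm_cond_flush_remote_tlbs():  if (dirtied_count != flushed_count)
	                                       kvm_flush_remote_tlbs();
	kvm_flush_remote_tlbs():       snapshot dirtied_count, send the flush
	                               IPIs, then advance flushed_count to the
	                               snapshot with a cmpxchg

A mark that races with a flush leaves the counters unequal, so the
next cond_flush simply flushes again.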

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 Documentation/virtual/kvm/locking.txt |   14 ++++++++++++++
 arch/x86/kvm/paging_tmpl.h            |    4 ++--
 include/linux/kvm_host.h              |   22 +++++++++++++++++++++-
 virt/kvm/kvm_main.c                   |   29 ++++++++++++++++++++---------
 4 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
index 3b4cd3b..f6c90479 100644
--- a/Documentation/virtual/kvm/locking.txt
+++ b/Documentation/virtual/kvm/locking.txt
@@ -23,3 +23,17 @@ Arch:		x86
 Protects:	- kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
 		- tsc offset in vmcb
 Comment:	'raw' because updating the tsc offsets must not be preempted.
+
+3. TLB control
+--------------
+
+The following APIs should be used for TLB control:
+
+ - kvm_mark_tlb_dirty() - indicates that the TLB is out of sync wrt
+      either guest or host page tables.
+ - kvm_flush_remote_tlbs() - unconditionally flush the tlbs
+ - kvm_cond_flush_remote_tlbs() - flush the TLBs if previously marked
+
+These may be used without mmu_lock, though kvm_mark_tlb_dirty() needs to be
+used while holding mmu_lock if it is called due to host page table changes
+(in contrast to guest page table changes).
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 34f9709..97e2a81 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -793,7 +793,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 			return -EINVAL;
 
 		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
-			vcpu->kvm->tlbs_dirty++;
+			kvm_mark_tlb_dirty(vcpu->kvm);
 			continue;
 		}
 
@@ -806,7 +806,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 
 		if (gfn != sp->gfns[i]) {
 			drop_spte(vcpu->kvm, &sp->spt[i]);
-			vcpu->kvm->tlbs_dirty++;
+			kvm_mark_tlb_dirty(vcpu->kvm);
 			continue;
 		}
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6f34330..4bff05d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -310,7 +310,14 @@ struct kvm {
 	unsigned long mmu_notifier_seq;
 	long mmu_notifier_count;
 #endif
-	long tlbs_dirty;
+	struct {
+		/*
+		 * When these two are different, a TLB somewhere holds a
+		 * stale TLB entry.  Clean with kvm_[cond_]flush_remote_tlbs().
+		 */
+		atomic_long_t dirtied_count;
+		atomic_long_t flushed_count;
+	} tlb_state;
 };
 
 /* The guest did something we don't support. */
@@ -467,6 +474,7 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
 
 void kvm_flush_remote_tlbs(struct kvm *kvm);
+void kvm_cond_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
 
 long kvm_arch_dev_ioctl(struct file *filp,
@@ -888,5 +896,17 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
 	}
 }
 
+/*
+ * Mark the TLB as dirty, for kvm_cond_flush_remote_tlbs().
+ */
+static inline void kvm_mark_tlb_dirty(struct kvm *kvm)
+{
+	/*
+	 * Make any changes to the page tables visible to remote flushers.
+	 */
+	smp_mb__before_atomic_inc();
+	atomic_long_inc(&kvm->tlb_state.dirtied_count);
+}
+
 #endif
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1847c76..643ce01 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -203,12 +203,21 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
 
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
-	long dirty_count = kvm->tlbs_dirty;
+	long dirty_count = atomic_long_read(&kvm->tlb_state.dirtied_count);
+	long flushed_count = atomic_long_read(&kvm->tlb_state.flushed_count);
 
 	smp_mb();
 	if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
 		++kvm->stat.remote_tlb_flush;
-	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
+	atomic_long_cmpxchg(&kvm->tlb_state.flushed_count,
+			    flushed_count, dirty_count);
+}
+
+void kvm_cond_flush_remote_tlbs(struct kvm *kvm)
+{
+	if (atomic_long_read(&kvm->tlb_state.dirtied_count)
+	    != atomic_long_read(&kvm->tlb_state.flushed_count))
+		kvm_flush_remote_tlbs(kvm);
 }
 
 void kvm_reload_remote_mmus(struct kvm *kvm)
@@ -267,7 +276,7 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
 					     unsigned long address)
 {
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
-	int need_tlb_flush, idx;
+	int idx;
 
 	/*
 	 * When ->invalidate_page runs, the linux pte has been zapped
@@ -291,10 +300,10 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
 	spin_lock(&kvm->mmu_lock);
 
 	kvm->mmu_notifier_seq++;
-	need_tlb_flush = kvm_unmap_hva(kvm, address) | kvm->tlbs_dirty;
+	if (kvm_unmap_hva(kvm, address))
+		kvm_mark_tlb_dirty(kvm);
 	/* we've to flush the tlb before the pages can be freed */
-	if (need_tlb_flush)
-		kvm_flush_remote_tlbs(kvm);
+	kvm_cond_flush_remote_tlbs(kvm);
 
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -334,10 +343,12 @@ static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	kvm->mmu_notifier_count++;
 	for (; start < end; start += PAGE_SIZE)
 		need_tlb_flush |= kvm_unmap_hva(kvm, start);
-	need_tlb_flush |= kvm->tlbs_dirty;
-	/* we've to flush the tlb before the pages can be freed */
+
 	if (need_tlb_flush)
-		kvm_flush_remote_tlbs(kvm);
+		kvm_mark_tlb_dirty(kvm);
+
+	/* we've to flush the tlb before the pages can be freed */
+	kvm_cond_flush_remote_tlbs(kvm);
 
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
-- 
1.7.10



* [PATCH 2/4] KVM: Flush TLB in mmu notifier without holding mmu_lock
  2012-05-03 11:22 [PATCH 0/4] Unlocked TLB flush Avi Kivity
  2012-05-03 11:22 ` [PATCH 1/4] KVM: Add APIs for unlocked " Avi Kivity
@ 2012-05-03 11:23 ` Avi Kivity
  2012-05-03 11:23 ` [PATCH 3/4] KVM: Flush TLB in FNAME(invlpg) " Avi Kivity
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Avi Kivity @ 2012-05-03 11:23 UTC (permalink / raw)
  To: kvm, Marcelo Tosatti, Xiao Guangrong, takuya.yoshikawa

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 virt/kvm/kvm_main.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 643ce01..9e2db44 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -302,11 +302,11 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
 	kvm->mmu_notifier_seq++;
 	if (kvm_unmap_hva(kvm, address))
 		kvm_mark_tlb_dirty(kvm);
-	/* we've to flush the tlb before the pages can be freed */
-	kvm_cond_flush_remote_tlbs(kvm);
-
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
+
+	/* we've to flush the tlb before the pages can be freed */
+	kvm_cond_flush_remote_tlbs(kvm);
 }
 
 static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
@@ -347,11 +347,11 @@ static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	if (need_tlb_flush)
 		kvm_mark_tlb_dirty(kvm);
 
-	/* we've to flush the tlb before the pages can be freed */
-	kvm_cond_flush_remote_tlbs(kvm);
-
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
+
+	/* we've to flush the tlb before the pages can be freed */
+	kvm_cond_flush_remote_tlbs(kvm);
 }
 
 static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
@@ -392,11 +392,13 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
 
 	young = kvm_age_hva(kvm, address);
 	if (young)
-		kvm_flush_remote_tlbs(kvm);
+		kvm_mark_tlb_dirty(kvm);
 
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
 
+	kvm_cond_flush_remote_tlbs(kvm);
+
 	return young;
 }
 
-- 
1.7.10



* [PATCH 3/4] KVM: Flush TLB in FNAME(invlpg) without holding mmu_lock
  2012-05-03 11:22 [PATCH 0/4] Unlocked TLB flush Avi Kivity
  2012-05-03 11:22 ` [PATCH 1/4] KVM: Add APIs for unlocked " Avi Kivity
  2012-05-03 11:23 ` [PATCH 2/4] KVM: Flush TLB in mmu notifier without holding mmu_lock Avi Kivity
@ 2012-05-03 11:23 ` Avi Kivity
  2012-05-03 11:23 ` [PATCH 4/4] KVM: Flush TLB in change_pte mmu notifier " Avi Kivity
  2012-05-08  1:25 ` [PATCH 0/4] Unlocked TLB flush Marcelo Tosatti
  4 siblings, 0 replies; 20+ messages in thread
From: Avi Kivity @ 2012-05-03 11:23 UTC (permalink / raw)
  To: kvm, Marcelo Tosatti, Xiao Guangrong, takuya.yoshikawa

mmu_page_zap_pte() is modified to mark the TLB as dirty, but currently
only FNAME(invlpg) takes advantage of this.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/mmu.c         |    7 +++----
 arch/x86/kvm/paging_tmpl.h |    5 +++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 07424cf..0e7bcff 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1887,7 +1887,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	}
 }
 
-static bool mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
+static void mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 			     u64 *spte)
 {
 	u64 pte;
@@ -1903,13 +1903,12 @@ static bool mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 			child = page_header(pte & PT64_BASE_ADDR_MASK);
 			drop_parent_pte(child, spte);
 		}
-		return true;
+		kvm_mark_tlb_dirty(kvm);
+		return;
 	}
 
 	if (is_mmio_spte(pte))
 		mmu_spte_clear_no_track(spte);
-
-	return false;
 }
 
 static void kvm_mmu_page_unlink_children(struct kvm *kvm,
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 97e2a81..72c9cf4 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -697,8 +697,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 			pte_gpa = FNAME(get_level1_sp_gpa)(sp);
 			pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
 
-			if (mmu_page_zap_pte(vcpu->kvm, sp, sptep))
-				kvm_flush_remote_tlbs(vcpu->kvm);
+			mmu_page_zap_pte(vcpu->kvm, sp, sptep);
 
 			if (!rmap_can_add(vcpu))
 				break;
@@ -714,6 +713,8 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 			break;
 	}
 	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	kvm_cond_flush_remote_tlbs(vcpu->kvm);
 }
 
 static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access,
-- 
1.7.10



* [PATCH 4/4] KVM: Flush TLB in change_pte mmu notifier without holding mmu_lock
  2012-05-03 11:22 [PATCH 0/4] Unlocked TLB flush Avi Kivity
                   ` (2 preceding siblings ...)
  2012-05-03 11:23 ` [PATCH 3/4] KVM: Flush TLB in FNAME(invlpg) " Avi Kivity
@ 2012-05-03 11:23 ` Avi Kivity
  2012-05-08  1:25 ` [PATCH 0/4] Unlocked TLB flush Marcelo Tosatti
  4 siblings, 0 replies; 20+ messages in thread
From: Avi Kivity @ 2012-05-03 11:23 UTC (permalink / raw)
  To: kvm, Marcelo Tosatti, Xiao Guangrong, takuya.yoshikawa

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/mmu.c  |    8 ++------
 virt/kvm/kvm_main.c |    2 ++
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0e7bcff..bc1d83c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1138,7 +1138,6 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
-	int need_flush = 0;
 	u64 new_spte;
 	pte_t *ptep = (pte_t *)data;
 	pfn_t new_pfn;
@@ -1150,8 +1149,6 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		BUG_ON(!is_shadow_present_pte(*sptep));
 		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", sptep, *sptep);
 
-		need_flush = 1;
-
 		if (pte_write(*ptep)) {
 			drop_spte(kvm, sptep);
 			sptep = rmap_get_first(*rmapp, &iter);
@@ -1167,10 +1164,9 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			mmu_spte_set(sptep, new_spte);
 			sptep = rmap_get_next(&iter);
 		}
-	}
 
-	if (need_flush)
-		kvm_flush_remote_tlbs(kvm);
+		kvm_mark_tlb_dirty(kvm);
+	}
 
 	return 0;
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9e2db44..92d8ddc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -323,6 +323,8 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
 	kvm_set_spte_hva(kvm, address, pte);
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
+
+	kvm_cond_flush_remote_tlbs(kvm);
 }
 
 static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
-- 
1.7.10



* Re: [PATCH 1/4] KVM: Add APIs for unlocked TLB flush
  2012-05-03 11:22 ` [PATCH 1/4] KVM: Add APIs for unlocked " Avi Kivity
@ 2012-05-03 13:23   ` Xiao Guangrong
  2012-05-03 14:11     ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Xiao Guangrong @ 2012-05-03 13:23 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti, takuya.yoshikawa

On 05/03/2012 07:22 PM, Avi Kivity wrote:

> Currently we flush the TLB while holding mmu_lock.  This
> increases the lock hold time by the IPI round-trip time, increasing
> contention, and makes dropping the lock (for latency reasons) harder.
> 
> This patch changes TLB management to be usable locklessly, introducing
> the following APIs:
> 
>   kvm_mark_tlb_dirty() - mark the TLB as containing stale entries
>   kvm_cond_flush_remote_tlbs() - flush the TLB if it was marked as
>                                   dirty
> 
> These APIs can be used without holding mmu_lock (though if the TLB
> became stale due to shadow page table modifications, typically it
> will need to be called with the lock held to prevent other threads
> from seeing the modified page tables with the TLB unmarked and unflushed).
> 
> Signed-off-by: Avi Kivity <avi@redhat.com>
> ---
>  Documentation/virtual/kvm/locking.txt |   14 ++++++++++++++
>  arch/x86/kvm/paging_tmpl.h            |    4 ++--
>  include/linux/kvm_host.h              |   22 +++++++++++++++++++++-
>  virt/kvm/kvm_main.c                   |   29 ++++++++++++++++++++---------
>  4 files changed, 57 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
> index 3b4cd3b..f6c90479 100644
> --- a/Documentation/virtual/kvm/locking.txt
> +++ b/Documentation/virtual/kvm/locking.txt
> @@ -23,3 +23,17 @@ Arch:		x86
>  Protects:	- kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
>  		- tsc offset in vmcb
>  Comment:	'raw' because updating the tsc offsets must not be preempted.
> +
> +3. TLB control
> +--------------
> +
> +The following APIs should be used for TLB control:
> +
> + - kvm_mark_tlb_dirty() - indicates that the TLB is out of sync wrt
> +      either guest or host page tables.
> + - kvm_flush_remote_tlbs() - unconditionally flush the tlbs
> + - kvm_cond_flush_remote_tlbs() - flush the TLBs if previously marked
> +
> +These may be used without mmu_lock, though kvm_mark_tlb_dirty() needs to be
> +used while holding mmu_lock if it is called due to host page table changes
> +(in contrast to guest page table changes).


In these patches, it seems that kvm_mark_tlb_dirty is always used
under the protection of mmu-lock, yes?

If both kvm_mark_tlb_dirty and kvm_cond_flush_remote_tlbs are used
outside mmu-lock, I think we can use kvm_flush_remote_tlbs instead.

If so, dirtied_count/flushed_count need not be atomic.

And, it seems there is a bug:

 VCPU 0                              VCPU 1

 hold mmu-lock
 zap spte which points to 'gfn'
 mark_tlb_dirty
 release mmu-lock
                                    hold mmu-lock
                                    rmap_write_protect:
                                       see no spte pointing to gfn
                                       tlb was not flushed
                                    release mmu-lock

                                    write gfn by guest
                                      OOPS!!!

 kvm_cond_flush_remote_tlbs

We need to call kvm_cond_flush_remote_tlbs in rmap_write_protect
unconditionally?



* Re: [PATCH 1/4] KVM: Add APIs for unlocked TLB flush
  2012-05-03 13:23   ` Xiao Guangrong
@ 2012-05-03 14:11     ` Avi Kivity
  2012-05-07  7:06       ` Xiao Guangrong
  2012-05-08  2:25       ` Marcelo Tosatti
  0 siblings, 2 replies; 20+ messages in thread
From: Avi Kivity @ 2012-05-03 14:11 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: kvm, Marcelo Tosatti, takuya.yoshikawa

On 05/03/2012 04:23 PM, Xiao Guangrong wrote:
> On 05/03/2012 07:22 PM, Avi Kivity wrote:
>
> > Currently we flush the TLB while holding mmu_lock.  This
> > increases the lock hold time by the IPI round-trip time, increasing
> > contention, and makes dropping the lock (for latency reasons) harder.
> > 
> > This patch changes TLB management to be usable locklessly, introducing
> > the following APIs:
> > 
> >   kvm_mark_tlb_dirty() - mark the TLB as containing stale entries
> >   kvm_cond_flush_remote_tlbs() - flush the TLB if it was marked as
> >                                   dirty
> > 
> > These APIs can be used without holding mmu_lock (though if the TLB
> > became stale due to shadow page table modifications, typically it
> > will need to be called with the lock held to prevent other threads
> > from seeing the modified page tables with the TLB unmarked and unflushed).
> > 
> > Signed-off-by: Avi Kivity <avi@redhat.com>
> > ---
> >  Documentation/virtual/kvm/locking.txt |   14 ++++++++++++++
> >  arch/x86/kvm/paging_tmpl.h            |    4 ++--
> >  include/linux/kvm_host.h              |   22 +++++++++++++++++++++-
> >  virt/kvm/kvm_main.c                   |   29 ++++++++++++++++++++---------
> >  4 files changed, 57 insertions(+), 12 deletions(-)
> > 
> > diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
> > index 3b4cd3b..f6c90479 100644
> > --- a/Documentation/virtual/kvm/locking.txt
> > +++ b/Documentation/virtual/kvm/locking.txt
> > @@ -23,3 +23,17 @@ Arch:		x86
> >  Protects:	- kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
> >  		- tsc offset in vmcb
> >  Comment:	'raw' because updating the tsc offsets must not be preempted.
> > +
> > +3. TLB control
> > +--------------
> > +
> > +The following APIs should be used for TLB control:
> > +
> > + - kvm_mark_tlb_dirty() - indicates that the TLB is out of sync wrt
> > +      either guest or host page tables.
> > + - kvm_flush_remote_tlbs() - unconditionally flush the tlbs
> > + - kvm_cond_flush_remote_tlbs() - flush the TLBs if previously marked
> > +
> > +These may be used without mmu_lock, though kvm_mark_tlb_dirty() needs to be
> > +used while holding mmu_lock if it is called due to host page table changes
> > +(in contrast to guest page table changes).
>
>
> In these patches, it seems that kvm_mark_tlb_dirty is always used
> under the protection of mmu-lock, yes?

Correct.  It's possible we'll find a use outside mmu_lock, but this
isn't likely.

> If both kvm_mark_tlb_dirty and kvm_cond_flush_remote_tlbs are used
> outside mmu-lock, I think we can use kvm_flush_remote_tlbs instead.
>
> If so, dirtied_count/flushed_count need not be atomic.

But we always mark with mmu_lock held.

>
> And, it seems there is a bug:
>
>  VCPU 0                              VCPU 1
>
>  hold mmu-lock
>  zap spte which points to 'gfn'
>  mark_tlb_dirty
>  release mmu-lock
>                                     hold mmu-lock
>                                     rmap_write_protect:
>                                        see no spte pointing to gfn
>                                        tlb was not flushed
>                                     release mmu-lock
>
>                                     write gfn by guest
>                                       OOPS!!!
>
>  kvm_cond_flush_remote_tlbs
>
> We need to call kvm_cond_flush_remote_tlbs in rmap_write_protect
> unconditionally?

Yes, that's the point.  We change

   spin_lock(mmu_lock)
   conditionally flush the tlb
   spin_unlock(mmu_lock)

to

  spin_lock(mmu_lock)
  conditionally mark the tlb as dirty
  spin_unlock(mmu_lock)
  kvm_cond_flush_remote_tlbs()

but that means the entire codebase has to be converted.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 1/4] KVM: Add APIs for unlocked TLB flush
  2012-05-03 14:11     ` Avi Kivity
@ 2012-05-07  7:06       ` Xiao Guangrong
  2012-05-07  7:59         ` Avi Kivity
  2012-05-08  2:25       ` Marcelo Tosatti
  1 sibling, 1 reply; 20+ messages in thread
From: Xiao Guangrong @ 2012-05-07  7:06 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Marcelo Tosatti, takuya.yoshikawa

On 05/03/2012 10:11 PM, Avi Kivity wrote:

> On 05/03/2012 04:23 PM, Xiao Guangrong wrote:
>> On 05/03/2012 07:22 PM, Avi Kivity wrote:
>>
>>> Currently we flush the TLB while holding mmu_lock.  This
>>> increases the lock hold time by the IPI round-trip time, increasing
>>> contention, and makes dropping the lock (for latency reasons) harder.
>>>
>>> This patch changes TLB management to be usable locklessly, introducing
>>> the following APIs:
>>>
>>>   kvm_mark_tlb_dirty() - mark the TLB as containing stale entries
>>>   kvm_cond_flush_remote_tlbs() - flush the TLB if it was marked as
>>>                                   dirty
>>>
>>> These APIs can be used without holding mmu_lock (though if the TLB
>>> became stale due to shadow page table modifications, typically it
>>> will need to be called with the lock held to prevent other threads
>>> from seeing the modified page tables with the TLB unmarked and unflushed).
>>>
>>> Signed-off-by: Avi Kivity <avi@redhat.com>
>>> ---
>>>  Documentation/virtual/kvm/locking.txt |   14 ++++++++++++++
>>>  arch/x86/kvm/paging_tmpl.h            |    4 ++--
>>>  include/linux/kvm_host.h              |   22 +++++++++++++++++++++-
>>>  virt/kvm/kvm_main.c                   |   29 ++++++++++++++++++++---------
>>>  4 files changed, 57 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
>>> index 3b4cd3b..f6c90479 100644
>>> --- a/Documentation/virtual/kvm/locking.txt
>>> +++ b/Documentation/virtual/kvm/locking.txt
>>> @@ -23,3 +23,17 @@ Arch:		x86
>>>  Protects:	- kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
>>>  		- tsc offset in vmcb
>>>  Comment:	'raw' because updating the tsc offsets must not be preempted.
>>> +
>>> +3. TLB control
>>> +--------------
>>> +
>>> +The following APIs should be used for TLB control:
>>> +
>>> + - kvm_mark_tlb_dirty() - indicates that the TLB is out of sync wrt
>>> +      either guest or host page tables.
>>> + - kvm_flush_remote_tlbs() - unconditionally flush the tlbs
>>> + - kvm_cond_flush_remote_tlbs() - flush the TLBs if previously marked
>>> +
>>> +These may be used without mmu_lock, though kvm_mark_tlb_dirty() needs to be
>>> +used while holding mmu_lock if it is called due to host page table changes
>>> +(in contrast to guest page table changes).
>>
>>
>> In these patches, it seems that kvm_mark_tlb_dirty is always used
>> under the protection of mmu-lock, yes?
> 
> Correct.  It's possible we'll find a use outside mmu_lock, but this
> isn't likely.


If we need to call kvm_mark_tlb_dirty outside mmu-lock, just use
kvm_flush_remote_tlbs instead:

if (need-flush-tlb)
	flush = true;

do something...

if (flush)
	kvm_flush_remote_tlbs

> 
>> If both kvm_mark_tlb_dirty and kvm_cond_flush_remote_tlbs are used
>> outside mmu-lock, I think we can use kvm_flush_remote_tlbs instead.
>>
>> If so, dirtied_count/flushed_count need not be atomic.
> 
> But we always mark with mmu_lock held.
> 


Yes, so, we can change kvm_mark_tlb_dirty to:

+static inline void kvm_mark_tlb_dirty(struct kvm *kvm)
+{
+	/*
+	 * Make any changes to the page tables visible to remote flushers.
+	 */
+	smp_mb();
+	kvm->tlb_state.dirtied_count++;
+}



* Re: [PATCH 1/4] KVM: Add APIs for unlocked TLB flush
  2012-05-07  7:06       ` Xiao Guangrong
@ 2012-05-07  7:59         ` Avi Kivity
  2012-05-08  1:55           ` Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2012-05-07  7:59 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: kvm, Marcelo Tosatti, takuya.yoshikawa

On 05/07/2012 10:06 AM, Xiao Guangrong wrote:
> On 05/03/2012 10:11 PM, Avi Kivity wrote:
>
> > On 05/03/2012 04:23 PM, Xiao Guangrong wrote:
> >> On 05/03/2012 07:22 PM, Avi Kivity wrote:
> >>
> >>> Currently we flush the TLB while holding mmu_lock.  This
> >>> increases the lock hold time by the IPI round-trip time, increasing
> >>> contention, and makes dropping the lock (for latency reasons) harder.
> >>>
> >>> This patch changes TLB management to be usable locklessly, introducing
> >>> the following APIs:
> >>>
> >>>   kvm_mark_tlb_dirty() - mark the TLB as containing stale entries
> >>>   kvm_cond_flush_remote_tlbs() - flush the TLB if it was marked as
> >>>                                   dirty
> >>>
> >>> These APIs can be used without holding mmu_lock (though if the TLB
> >>> became stale due to shadow page table modifications, typically it
> >>> will need to be called with the lock held to prevent other threads
> >>> from seeing the modified page tables with the TLB unmarked and unflushed).
> >>>
> >>> Signed-off-by: Avi Kivity <avi@redhat.com>
> >>> ---
> >>>  Documentation/virtual/kvm/locking.txt |   14 ++++++++++++++
> >>>  arch/x86/kvm/paging_tmpl.h            |    4 ++--
> >>>  include/linux/kvm_host.h              |   22 +++++++++++++++++++++-
> >>>  virt/kvm/kvm_main.c                   |   29 ++++++++++++++++++++---------
> >>>  4 files changed, 57 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
> >>> index 3b4cd3b..f6c90479 100644
> >>> --- a/Documentation/virtual/kvm/locking.txt
> >>> +++ b/Documentation/virtual/kvm/locking.txt
> >>> @@ -23,3 +23,17 @@ Arch:		x86
> >>>  Protects:	- kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
> >>>  		- tsc offset in vmcb
> >>>  Comment:	'raw' because updating the tsc offsets must not be preempted.
> >>> +
> >>> +3. TLB control
> >>> +--------------
> >>> +
> >>> +The following APIs should be used for TLB control:
> >>> +
> >>> + - kvm_mark_tlb_dirty() - indicates that the TLB is out of sync wrt
> >>> +      either guest or host page tables.
> >>> + - kvm_flush_remote_tlbs() - unconditionally flush the tlbs
> >>> + - kvm_cond_flush_remote_tlbs() - flush the TLBs if previously marked
> >>> +
> >>> +These may be used without mmu_lock, though kvm_mark_tlb_dirty() needs to be
> >>> +used while holding mmu_lock if it is called due to host page table changes
> >>> +(in contrast to guest page table changes).
> >>
> >>
> >> In these patches, it seems that kvm_mark_tlb_dirty is always used
> >> under the protection of mmu-lock, yes?
> > 
> > Correct.  It's possible we'll find a use outside mmu_lock, but this
> > isn't likely.
>
>
> If we need to call kvm_mark_tlb_dirty outside mmu-lock, just use
> kvm_flush_remote_tlbs instead:
>
> if (need-flush-tlb)
> 	flush = true;
>
> do something...
>
> if (flush)
> 	kvm_flush_remote_tlbs

It depends on how need-flush-tlb is computed.  If it depends on sptes,
then we must use kvm_cond_flush_remote_tlbs().

> > 
> >> If both kvm_mark_tlb_dirty and kvm_cond_flush_remote_tlbs are used
> >> outside mmu-lock, I think we can use kvm_flush_remote_tlbs instead.
> >>
> >> If so, dirtied_count/flushed_count need not be atomic.
> > 
> > But we always mark with mmu_lock held.
> > 
>
>
> Yes, so, we can change kvm_mark_tlb_dirty to:
>
> +static inline void kvm_mark_tlb_dirty(struct kvm *kvm)
> +{
> +	/*
> +	 * Make any changes to the page tables visible to remote flushers.
> +	 */
> +	smp_mb();
> +	kvm->tlb_state.dirtied_count++;
> +}
>

Yes.  We'll have to change it again if we ever dirty sptes outside the
lock, but that's okay.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 0/4] Unlocked TLB flush
  2012-05-03 11:22 [PATCH 0/4] Unlocked TLB flush Avi Kivity
                   ` (3 preceding siblings ...)
  2012-05-03 11:23 ` [PATCH 4/4] KVM: Flush TLB in change_pte mmu notifier " Avi Kivity
@ 2012-05-08  1:25 ` Marcelo Tosatti
  2012-05-08  1:27   ` Marcelo Tosatti
  4 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2012-05-08  1:25 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Xiao Guangrong, takuya.yoshikawa

On Thu, May 03, 2012 at 02:22:58PM +0300, Avi Kivity wrote:
> This patchset implements unlocked TLB flushing for KVM.  An operation that
> generates stale TLB entries can mark the TLB as dirty instead of flushing
> immediately, and then flush after releasing mmu_lock but before returning
> to the guest or the caller.  A few call sites are converted too.
> 
> Note not all call sites are easily convertible; as an example, sync_page()
> must flush before reading the guest page table.

Huh? Are you referring to:

 * Note:
 *   We should flush all tlbs if spte is dropped even though guest is
 *   responsible for it. Since if we don't, kvm_mmu_notifier_invalidate_page
 *   and kvm_mmu_notifier_invalidate_range_start detect the mapping page isn't
 *   used by guest then tlbs are not flushed, so guest is allowed to access the
 *   freed pages.
 *   And we increase kvm->tlbs_dirty to delay tlbs flush in this case.

With an increased dirtied_count the flush can be performed
by kvm_mmu_notifier_invalidate_page.



* Re: [PATCH 0/4] Unlocked TLB flush
  2012-05-08  1:25 ` [PATCH 0/4] Unlocked TLB flush Marcelo Tosatti
@ 2012-05-08  1:27   ` Marcelo Tosatti
  2012-05-08 10:51     ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2012-05-08  1:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Xiao Guangrong, takuya.yoshikawa

On Mon, May 07, 2012 at 10:25:34PM -0300, Marcelo Tosatti wrote:
> On Thu, May 03, 2012 at 02:22:58PM +0300, Avi Kivity wrote:
> > This patchset implements unlocked TLB flushing for KVM.  An operation that
> > generates stale TLB entries can mark the TLB as dirty instead of flushing
> > immediately, and then flush after releasing mmu_lock but before returning
> > to the guest or the caller.  A few call sites are converted too.
> > 
> > Note not all call sites are easily convertible; as an example, sync_page()
> > must flush before reading the guest page table.
> 
> Huh? Are you referring to:
> 
>  * Note:
>  *   We should flush all tlbs if spte is dropped even though guest is
>  *   responsible for it. Since if we don't, kvm_mmu_notifier_invalidate_page
>  *   and kvm_mmu_notifier_invalidate_range_start detect the mapping page isn't
>  *   used by guest then tlbs are not flushed, so guest is allowed to access the
>  *   freed pages.
>  *   And we increase kvm->tlbs_dirty to delay tlbs flush in this case.
> 
> With an increased dirtied_count the flush can be performed
> by kvm_mmu_notifier_invalidate_page.

Which is what patch 1 does. Your comment regarding sync_page()
above is outdated, unless I am missing something.



* Re: [PATCH 1/4] KVM: Add APIs for unlocked TLB flush
  2012-05-07  7:59         ` Avi Kivity
@ 2012-05-08  1:55           ` Marcelo Tosatti
  2012-05-08  9:09             ` Avi Kivity
  2012-05-08  9:09             ` Avi Kivity
  0 siblings, 2 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2012-05-08  1:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Xiao Guangrong, kvm, takuya.yoshikawa

On Mon, May 07, 2012 at 10:59:04AM +0300, Avi Kivity wrote:
> On 05/07/2012 10:06 AM, Xiao Guangrong wrote:
> > On 05/03/2012 10:11 PM, Avi Kivity wrote:
> >
> > > On 05/03/2012 04:23 PM, Xiao Guangrong wrote:
> > >> On 05/03/2012 07:22 PM, Avi Kivity wrote:
> > >>
> > >>> Currently we flush the TLB while holding mmu_lock.  This
> > >>> increases the lock hold time by the IPI round-trip time, increasing
> > >>> contention, and makes dropping the lock (for latency reasons) harder.
> > >>>
> > >>> This patch changes TLB management to be usable locklessly, introducing
> > >>> the following APIs:
> > >>>
> > >>>   kvm_mark_tlb_dirty() - mark the TLB as containing stale entries
> > >>>   kvm_cond_flush_remote_tlbs() - flush the TLB if it was marked as
> > >>>                                   dirty
> > >>>
> > >>> These APIs can be used without holding mmu_lock (though if the TLB
> > >>> became stale due to shadow page table modifications, typically it
> > >>> will need to be called with the lock held to prevent other threads
> > >>> from seeing the modified page tables with the TLB unmarked and unflushed).
> > >>>
> > >>> Signed-off-by: Avi Kivity <avi@redhat.com>
> > >>> ---
> > >>>  Documentation/virtual/kvm/locking.txt |   14 ++++++++++++++
> > >>>  arch/x86/kvm/paging_tmpl.h            |    4 ++--
> > >>>  include/linux/kvm_host.h              |   22 +++++++++++++++++++++-
> > >>>  virt/kvm/kvm_main.c                   |   29 ++++++++++++++++++++---------
> > >>>  4 files changed, 57 insertions(+), 12 deletions(-)
> > >>>
> > >>> diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
> > >>> index 3b4cd3b..f6c90479 100644
> > >>> --- a/Documentation/virtual/kvm/locking.txt
> > >>> +++ b/Documentation/virtual/kvm/locking.txt
> > >>> @@ -23,3 +23,17 @@ Arch:		x86
> > >>>  Protects:	- kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
> > >>>  		- tsc offset in vmcb
> > >>>  Comment:	'raw' because updating the tsc offsets must not be preempted.
> > >>> +
> > >>> +3. TLB control
> > >>> +--------------
> > >>> +
> > >>> +The following APIs should be used for TLB control:
> > >>> +
> > >>> + - kvm_mark_tlb_dirty() - indicates that the TLB is out of sync wrt
> > >>> +      either guest or host page tables.
> > >>> + - kvm_flush_remote_tlbs() - unconditionally flush the tlbs
> > >>> + - kvm_cond_flush_remote_tlbs() - flush the TLBs if previously marked
> > >>> +
> > >>> +These may be used without mmu_lock, though kvm_mark_tlb_dirty() needs to be
> > >>> +used while holding mmu_lock if it is called due to host page table changes
> > >>> +(in contrast to guest page table changes).
> > >>
> > >>
> > >> In these patches, it seems that kvm_mark_tlb_dirty is always used
> > >> under the protection of mmu-lock, yes?
> > > 
> > > Correct.  It's possible we'll find a use outside mmu_lock, but this
> > > isn't likely.
> >
> >
> > If we need to call kvm_mark_tlb_dirty outside mmu-lock, just use
> > kvm_flush_remote_tlbs instead:
> >
> > if (need-flush-tlb)
> > 	flush = true;
> >
> > do something...
> >
> > if (flush)
> > 	kvm_flush_remote_tlbs
> 
> It depends on how need-flush-tlb is computed.  If it depends on sptes,
> then we must use kvm_cond_flush_remote_tlbs().
> 
> > > 
> > >> If both kvm_mark_tlb_dirty and kvm_cond_flush_remote_tlbs are used
> > >> outside mmu-lock, I think we can use kvm_flush_remote_tlbs instead.
> > >>
> > >> If so, dirtied_count/flushed_count need not be atomic.
> > > 
> > > But we always mark with mmu_lock held.
> > > 
> >
> >
> > Yes, so, we can change kvm_mark_tlb_dirty to:
> >
> > +static inline void kvm_mark_tlb_dirty(struct kvm *kvm)
> > +{
> > +	/*
> > +	 * Make any changes to the page tables visible to remote flushers.
> > +	 */
> > +	smp_mb();
> > +	kvm->tlb_state.dirtied_count++;
> > +}
> >
> 
> Yes.  We'll have to change it again if we ever dirty sptes outside the
> lock, but that's okay.

Please don't. There are readers outside mmu_lock, so it should be
atomic.



* Re: [PATCH 1/4] KVM: Add APIs for unlocked TLB flush
  2012-05-03 14:11     ` Avi Kivity
  2012-05-07  7:06       ` Xiao Guangrong
@ 2012-05-08  2:25       ` Marcelo Tosatti
  2012-05-08 12:39         ` Avi Kivity
  1 sibling, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2012-05-08  2:25 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Xiao Guangrong, kvm, takuya.yoshikawa

On Thu, May 03, 2012 at 05:11:01PM +0300, Avi Kivity wrote:
> On 05/03/2012 04:23 PM, Xiao Guangrong wrote:
> > On 05/03/2012 07:22 PM, Avi Kivity wrote:
> >
> > > Currently we flush the TLB while holding mmu_lock.  This
> > > increases the lock hold time by the IPI round-trip time, increasing
> > > contention, and makes dropping the lock (for latency reasons) harder.
> > > 
> > > This patch changes TLB management to be usable locklessly, introducing
> > > the following APIs:
> > > 
> > >   kvm_mark_tlb_dirty() - mark the TLB as containing stale entries
> > >   kvm_cond_flush_remote_tlbs() - flush the TLB if it was marked as
> > >                                   dirty
> > > 
> > > These APIs can be used without holding mmu_lock (though if the TLB
> > > became stale due to shadow page table modifications, typically it
> > > will need to be called with the lock held to prevent other threads
> > > from seeing the modified page tables with the TLB unmarked and unflushed).
> > > 
> > > Signed-off-by: Avi Kivity <avi@redhat.com>
> > > ---
> > >  Documentation/virtual/kvm/locking.txt |   14 ++++++++++++++
> > >  arch/x86/kvm/paging_tmpl.h            |    4 ++--
> > >  include/linux/kvm_host.h              |   22 +++++++++++++++++++++-
> > >  virt/kvm/kvm_main.c                   |   29 ++++++++++++++++++++---------
> > >  4 files changed, 57 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/Documentation/virtual/kvm/locking.txt b/Documentation/virtual/kvm/locking.txt
> > > index 3b4cd3b..f6c90479 100644
> > > --- a/Documentation/virtual/kvm/locking.txt
> > > +++ b/Documentation/virtual/kvm/locking.txt
> > > @@ -23,3 +23,17 @@ Arch:		x86
> > >  Protects:	- kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
> > >  		- tsc offset in vmcb
> > >  Comment:	'raw' because updating the tsc offsets must not be preempted.
> > > +
> > > +3. TLB control
> > > +--------------
> > > +
> > > +The following APIs should be used for TLB control:
> > > +
> > > + - kvm_mark_tlb_dirty() - indicates that the TLB is out of sync wrt
> > > +      either guest or host page tables.
> > > + - kvm_flush_remote_tlbs() - unconditionally flush the tlbs
> > > + - kvm_cond_flush_remote_tlbs() - flush the TLBs if previously marked
> > > +
> > > +These may be used without mmu_lock, though kvm_mark_tlb_dirty() needs to be
> > > +used while holding mmu_lock if it is called due to host page table changes
> > > +(in contrast to guest page table changes).
> >
> >
> > In these patches, it seems that kvm_mark_tlb_dirty is always used
> > under the protection of mmu-lock, yes?
> 
> Correct.  It's possible we'll find a use outside mmu_lock, but this
> isn't likely.
> 
> > If both kvm_mark_tlb_dirty and kvm_cond_flush_remote_tlbs are used
> > outside mmu-lock, I think we can use kvm_flush_remote_tlbs instead.
> >
> > If so, dirtied_count/flushed_count need not be atomic.
> 
> But we always mark with mmu_lock held.
> 
> >
> > And, it seems there is a bug:
> >
> >  VCPU 0                              VCPU 1
> >
> >  hold mmu-lock
> >  zap spte which points to 'gfn'
> >  mark_tlb_dirty
> >  release mmu-lock
> >                                     hold mmu-lock
> >                                     rmap_write_protect:
> >                                        see no spte pointing to gfn
> >                                        tlb was not flushed
> >                                     release mmu-lock
> >
> >                                     write gfn by guest
> >                                       OOPS!!!
> >
> >  kvm_cond_flush_remote_tlbs
> >
> > We need to call kvm_cond_flush_remote_tlbs in rmap_write_protect
> > unconditionally?
> 
> Yes, that's the point.  We change
> 
>    spin_lock(mmu_lock)
>    conditionally flush the tlb
>    spin_unlock(mmu_lock)
> 
> to
> 
>   spin_lock(mmu_lock)
>   conditionally mark the tlb as dirty
>   spin_unlock(mmu_lock)
>   kvm_cond_flush_remote_tlbs()
> 
> but that means the entire codebase has to be converted.

Is there any other site which expects sptes and TLBs to be in sync,     
other than rmap_write_protect?                                          

Please convert the flush_remote_tlbs at the end of set_spte (RW->RO) to
mark_dirty.

Looks good in general (patchset is incomplete though). One thing that
is annoying is that there is no guarantee of progress for flushed_count
increment (it can, in theory, always race with a mark_dirty). But since
that would be only a performance, and not correctness, aspect, it is
fine.

It has the advantage that it requires code to explicitly document where
the TLB must be flushed and the sites which expect sptes to be in sync
with TLBs.
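
For example, a hypothetical interleaving, following the patch's
kvm_flush_remote_tlbs():

	flusher                              other thread
	dirty = dirtied_count (= N)
	                                     mark_tlb_dirty: dirtied_count = N + 1
	send the flush IPIs
	cmpxchg flushed_count to N

dirtied_count (N + 1) still differs from flushed_count (N), so the next
cond_flush flushes again even though the TLBs were just flushed.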




* Re: [PATCH 1/4] KVM: Add APIs for unlocked TLB flush
  2012-05-08  1:55           ` Marcelo Tosatti
@ 2012-05-08  9:09             ` Avi Kivity
  2012-05-08 13:50               ` Marcelo Tosatti
  2012-05-08  9:09             ` Avi Kivity
  1 sibling, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2012-05-08  9:09 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Xiao Guangrong, kvm, takuya.yoshikawa

On 05/08/2012 04:55 AM, Marcelo Tosatti wrote:
> > > Yes, so, we can change kvm_mark_tlb_dirty to:
> > >
> > > +static inline void kvm_mark_tlb_dirty(struct kvm *kvm)
> > > +{
> > > +	/*
> > > +	 * Make any changes to the page tables visible to remote flushers.
> > > +	 */
> > > +	smp_mb();
> > > +	kvm->tlb_state.dirtied_count++;
> > > +}
> > >
> > 
> > Yes.  We'll have to change it again if we ever dirty sptes outside the
> > lock, but that's okay.
>
> Please don't. There are readers outside mmu_lock, so it should be
> atomic.

Why does it need to be atomic?  All it needs is to be properly barriered
(provided by spin_unlock(mmu_lock)).

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 1/4] KVM: Add APIs for unlocked TLB flush
  2012-05-08  1:55           ` Marcelo Tosatti
  2012-05-08  9:09             ` Avi Kivity
@ 2012-05-08  9:09             ` Avi Kivity
  1 sibling, 0 replies; 20+ messages in thread
From: Avi Kivity @ 2012-05-08  9:09 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Xiao Guangrong, kvm, takuya.yoshikawa

On 05/08/2012 04:55 AM, Marcelo Tosatti wrote:
> > > Yes, so, we can change kvm_mark_tlb_dirty to:
> > >
> > > +static inline void kvm_mark_tlb_dirty(struct kvm *kvm)
> > > +{
> > > +	/*
> > > +	 * Make any changes to the page tables visible to remote flushers.
> > > +	 */
> > > +	smp_mb();
> > > +	kvm->tlb_state.dirtied_count++;
> > > +}
> > >
> > 
> > Yes.  We'll have to change it again if we ever dirty sptes outside the
> > lock, but that's okay.
>
> Please don't. There are readers outside mmu_lock, so it should be
> atomic.

Why does it need to be atomic?  All it needs is to be properly barriered
(provided by spin_unlock(mmu_lock)).

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 0/4] Unlocked TLB flush
  2012-05-08  1:27   ` Marcelo Tosatti
@ 2012-05-08 10:51     ` Avi Kivity
  0 siblings, 0 replies; 20+ messages in thread
From: Avi Kivity @ 2012-05-08 10:51 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, Xiao Guangrong, takuya.yoshikawa

On 05/08/2012 04:27 AM, Marcelo Tosatti wrote:
> On Mon, May 07, 2012 at 10:25:34PM -0300, Marcelo Tosatti wrote:
> > On Thu, May 03, 2012 at 02:22:58PM +0300, Avi Kivity wrote:
> > > This patchset implements unlocked TLB flushing for KVM.  An operation that
> > > generates stale TLB entries can mark the TLB as dirty instead of flushing
> > > immediately, and then flush after releasing mmu_lock but before returning
> > > to the guest or the caller.  A few call sites are converted too.
> > > 
> > > Note not all call sites are easily convertible; as an example, sync_page()
> > > must flush before reading the guest page table.
> > 
> > Huh? Are you referring to:
> > 
> >  * Note:
> >  *   We should flush all tlbs if spte is dropped even though guest is
> >  *   responsible for it. Since if we don't, kvm_mmu_notifier_invalidate_page
> >  *   and kvm_mmu_notifier_invalidate_range_start detect the mapping page isn't
> >  *   used by guest then tlbs are not flushed, so guest is allowed to access the
> >  *   freed pages.
> >  *   And we increase kvm->tlbs_dirty to delay tlbs flush in this case.
> > 
> > With an increased dirtied_count the flush can be performed
> > by kvm_mmu_notifier_invalidate_page.
>
> Which is what patch 1 does. Your comment regarding sync_page()
> above is outdated, unless I am missing something.

I wasn't referring to that.  sync_page() (and page_fault()) must be sure
the guest page table is write-protected before reading gptes from it. 
(page_fault() reads it before write protection, but verifies it afterwards:

        /*
         * Verify that the gpte in the page we've just write
         * protected is still there.
         */
        if (FNAME(gpte_changed)(vcpu, gw, it.level - 1))
            goto out_gpte_changed;

so we must call kvm_cond_flush_remote_tlbs() before calling gpte_changed().

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 1/4] KVM: Add APIs for unlocked TLB flush
  2012-05-08  2:25       ` Marcelo Tosatti
@ 2012-05-08 12:39         ` Avi Kivity
  2012-05-09 21:03           ` Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2012-05-08 12:39 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Xiao Guangrong, kvm, takuya.yoshikawa

On 05/08/2012 05:25 AM, Marcelo Tosatti wrote:
> > >
> > > We need to call kvm_cond_flush_remote_tlbs in rmap_write_protect
> > > unconditionally?
> > 
> > Yes, that's the point.  We change
> > 
> >    spin_lock(mmu_lock)
> >    conditionally flush the tlb
> >    spin_unlock(mmu_lock)
> > 
> > to
> > 
> >   spin_lock(mmu_lock)
> >   conditionally mark the tlb as dirty
> >   spin_unlock(mmu_lock)
> >   kvm_cond_flush_remote_tlbs()
> > 
> > but that means the entire codebase has to be converted.
>
> Is there any other site which expects sptes and TLBs to be in sync,     
> other than rmap_write_protect?                                          

I wouldn't say rmap_write_protect() expects sptes and TLBs to be in
sync.  Rather its callers.

> Please convert the flush_remote_tlbs at the end of set_spte (RW->RO) to
> mark_dirty.

I'd like to take an incremental approach, since there are many paths.  I
don't have a concrete plan though.

> Looks good in general (patchset is incomplete though). One thing that
> is annoying is that there is no guarantee of progress for flushed_count
> increment (it can, in theory, always race with a mark_dirty). But since
> that would be only a performance, and not correctness, aspect, it is
> fine.

We don't have a while () loop in cond_flush(), so it won't be slowed
down by the race; whoever caused the race will have to flush on their
own, but they do that anyway now.

We can regress if vcpu 0 marks the tlb as dirty, and then vcpu 0 and 1
simultaneously flush, even though vcpu 1 did nothing to deserve it.  I
don't see a way around it except to hope it's a rare event.
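
For example, a hypothetical interleaving:

	vcpu 0                               vcpu 1
	mark_tlb_dirty
	cond_flush: dirtied != flushed       cond_flush: dirtied != flushed
	flush (IPIs)                         flush (IPIs)

where previously mmu_lock serialized the two and only one flush happened.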

> It has the advantage that it requires code to explicitly document where
> the TLB must be flushed and the sites which expect sptes to be in sync
> with TLBs.

I'm looking for an idea of how to make the flush in those cases not hold
mmu_lock.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 1/4] KVM: Add APIs for unlocked TLB flush
  2012-05-08  9:09             ` Avi Kivity
@ 2012-05-08 13:50               ` Marcelo Tosatti
  0 siblings, 0 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2012-05-08 13:50 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Xiao Guangrong, kvm, takuya.yoshikawa

On Tue, May 08, 2012 at 12:09:38PM +0300, Avi Kivity wrote:
> On 05/08/2012 04:55 AM, Marcelo Tosatti wrote:
> > > > Yes, so, we can change kvm_mark_tlb_dirty to:
> > > >
> > > > +static inline void kvm_mark_tlb_dirty(struct kvm *kvm)
> > > > +{
> > > > +	/*
> > > > +	 * Make any changes to the page tables visible to remote flushers.
> > > > +	 */
> > > > +	smp_mb();
> > > > +	kvm->tlb_state.dirtied_count++;
> > > > +}
> > > >
> > > 
> > > Yes.  We'll have to change it again if we ever dirty sptes outside the
> > > lock, but that's okay.
> >
> > Please don't. There are readers outside mmu_lock, so it should be
> > atomic.
> 
> Why does it need to be atomic?  All it needs is to be properly barriered
> (provided by spin_unlock(mmu_lock)).

It does not _need_ to be atomic. It is easier to verify and
self-contained.



* Re: [PATCH 1/4] KVM: Add APIs for unlocked TLB flush
  2012-05-08 12:39         ` Avi Kivity
@ 2012-05-09 21:03           ` Marcelo Tosatti
  2012-05-21 14:45             ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2012-05-09 21:03 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Xiao Guangrong, kvm, takuya.yoshikawa

On Tue, May 08, 2012 at 03:39:43PM +0300, Avi Kivity wrote:
> On 05/08/2012 05:25 AM, Marcelo Tosatti wrote:
> > > >
> > > > We need to call kvm_cond_flush_remote_tlbs in rmap_write_protect
> > > > unconditionally?
> > > 
> > > Yes, that's the point.  We change
> > > 
> > >    spin_lock(mmu_lock)
> > >    conditionally flush the tlb
> > >    spin_unlock(mmu_lock)
> > > 
> > > to
> > > 
> > >   spin_lock(mmu_lock)
> > >   conditionally mark the tlb as dirty
> > >   spin_unlock(mmu_lock)
> > >   kvm_cond_flush_remote_tlbs()
> > > 
> > > but that means the entire codebase has to be converted.
> >
> > Is there any other site which expects sptes and TLBs to be in sync,     
> > other than rmap_write_protect?                                          
> 
> I wouldn't say rmap_write_protect() expects sptes and TLBs to be in
> sync.  Rather its callers.
> 
> > Please convert the flush_remote_tlbs at the end of set_spte (RW->RO) to
> > mark_dirty.
> 
> I'd like to take an incremental approach, since there are many paths.  I
> don't have a concrete plan though.
> 
> > Looks good in general (patchset is incomplete though). One thing that
> > is annoying is that there is no guarantee of progress for flushed_count
> > increment (it can, in theory, always race with a mark_dirty). But since
> > that would be only a performance, and not correctness, aspect, it is
> > fine.
> 
> We don't have a while () loop in cond_flush(), so it won't be slowed
> down by the race; whoever caused the race will have to flush on their
> own, but they do that anyway now.
> 
> We can regress if vcpu 0 marks the tlb as dirty, and then vcpu 0 and 1
> simultaneously flush, even though vcpu 1 did nothing to deserve it.  I
> don't see a way around it except to hope it's a rare event.
> 
> > It has the advantage that it requires code to explicitly document where
> > the TLB must be flushed and the sites which expect sptes to be in sync
> > with TLBs.
> 
> I'm looking for an idea of how to make the flush in those cases not hold
> mmu_lock.

You can't easily do that for rmap_write_protect; you must check that the
state is unchanged (that the write-protect operation is still relevant).

Current patchset (with corrections to comments by Xiao) is good enough
already IMO.


* Re: [PATCH 1/4] KVM: Add APIs for unlocked TLB flush
  2012-05-09 21:03           ` Marcelo Tosatti
@ 2012-05-21 14:45             ` Avi Kivity
  0 siblings, 0 replies; 20+ messages in thread
From: Avi Kivity @ 2012-05-21 14:45 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Xiao Guangrong, kvm, takuya.yoshikawa

On 05/10/2012 12:03 AM, Marcelo Tosatti wrote:
> > 
> > > It has the advantage that it requires code to explicitly document where
> > > the TLB must be flushed and the sites which expect sptes to be in sync
> > > with TLBs.
> > 
> > I'm looking for an idea of how to make the flush in those cases not hold
> > mmu_lock.
>
> You can't easily do that for rmap_write_protect; you must check that the
> state is unchanged (that the write-protect operation is still relevant).

We could simply call it in a loop:

   while rmap_write_protect() == SOME_SPTES_CHANGED:
          mark tlbs dirty
          drop lock
          flush remote tlbs
          take lock

with some backoff to prevent livelock.  It looks scary, but usually the
pte list is short, and the second pass is fast since everything has been
brought into the cache.
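
In C the sketch might look like this (hypothetical: SOME_SPTES_CHANGED,
the gfn argument and the backoff policy are placeholders, not existing
symbols):

	spin_lock(&kvm->mmu_lock);
	while (rmap_write_protect(kvm, gfn) == SOME_SPTES_CHANGED) {
		kvm_mark_tlb_dirty(kvm);
		spin_unlock(&kvm->mmu_lock);
		kvm_cond_flush_remote_tlbs(kvm);
		/* back off a little here to prevent livelock */
		spin_lock(&kvm->mmu_lock);
	}
	spin_unlock(&kvm->mmu_lock);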

> Current patchset (with corrections to comments by Xiao) is good enough
> already IMO.

Please review the new version then.

-- 
error compiling committee.c: too many arguments to function


