* [RFC PATCH 0/4] KVM: Optimize MMU notifier's THP page invalidation
@ 2012-06-15 11:30 Takuya Yoshikawa
  2012-06-15 11:30 ` [PATCH 1/4] KVM: MMU: Use __gfn_to_rmap() to clean up kvm_handle_hva() Takuya Yoshikawa
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Takuya Yoshikawa @ 2012-06-15 11:30 UTC (permalink / raw)
  To: avi, mtosatti
  Cc: agraf, paulus, aarcange, kvm, kvm-ppc, linux-kernel, takuya.yoshikawa

Takuya Yoshikawa (4):
  KVM: MMU: Use __gfn_to_rmap() to clean up kvm_handle_hva()
  KVM: Introduce hva_to_gfn() for kvm_handle_hva()
  KVM: MMU: Make kvm_handle_hva() handle range of addresses
  KVM: Introduce kvm_unmap_hva_range() for kvm_mmu_notifier_invalidate_range_start()

 arch/powerpc/include/asm/kvm_host.h |    2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c |   38 +++++++++++++++++++-------
 arch/x86/include/asm/kvm_host.h     |    1 +
 arch/x86/kvm/mmu.c                  |   51 +++++++++++++++++++++++-----------
 include/linux/kvm_host.h            |    7 +++++
 virt/kvm/kvm_main.c                 |    3 +-
 6 files changed, 73 insertions(+), 29 deletions(-)

-- 
1.7.5.4



* [PATCH 1/4] KVM: MMU: Use __gfn_to_rmap() to clean up kvm_handle_hva()
  2012-06-15 11:30 [RFC PATCH 0/4] KVM: Optimize MMU notifier's THP page invalidation Takuya Yoshikawa
@ 2012-06-15 11:30 ` Takuya Yoshikawa
  2012-06-15 11:31 ` [PATCH 2/4] KVM: Introduce hva_to_gfn() for kvm_handle_hva() Takuya Yoshikawa
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Takuya Yoshikawa @ 2012-06-15 11:30 UTC (permalink / raw)
  To: avi, mtosatti
  Cc: agraf, paulus, aarcange, kvm, kvm-ppc, linux-kernel, takuya.yoshikawa

By using __gfn_to_rmap(), we can treat every level uniformly, instead
of handling the lowest level's rmap array and the higher levels'
lpage_info entries separately.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 24dd43d..a2f3969 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1207,14 +1207,14 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
 			gfn_t gfn = memslot->base_gfn + gfn_offset;
 
-			ret = handler(kvm, &memslot->rmap[gfn_offset], data);
+			ret = 0;
 
-			for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
-				struct kvm_lpage_info *linfo;
+			for (j = PT_PAGE_TABLE_LEVEL;
+			     j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
+				unsigned long *rmapp;
 
-				linfo = lpage_info_slot(gfn, memslot,
-							PT_DIRECTORY_LEVEL + j);
-				ret |= handler(kvm, &linfo->rmap_pde, data);
+				rmapp = __gfn_to_rmap(gfn, j, memslot);
+				ret |= handler(kvm, rmapp, data);
 			}
 			trace_kvm_age_page(hva, memslot, ret);
 			retval |= ret;
-- 
1.7.5.4


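For context, the two lookup paths that this patch unifies suggest what
__gfn_to_rmap() looks like at this point; the sketch below is
reconstructed from the lines the patch removes, not copied from the
tree:

  static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
                                      struct kvm_memory_slot *slot)
  {
      struct kvm_lpage_info *linfo;

      /* 4KB pages: rmap heads live in the slot's plain rmap array */
      if (likely(level == PT_PAGE_TABLE_LEVEL))
          return &slot->rmap[gfn - slot->base_gfn];

      /* large pages: each lpage_info entry carries its own rmap head */
      linfo = lpage_info_slot(gfn, slot, level);
      return &linfo->rmap_pde;
  }

With a helper of this shape, the rmap head for any (gfn, level) pair is
obtained the same way, which is what lets the handler loop above run
over all KVM_NR_PAGE_SIZES levels uniformly.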

* [PATCH 2/4] KVM: Introduce hva_to_gfn() for kvm_handle_hva()
  2012-06-15 11:30 [RFC PATCH 0/4] KVM: Optimize MMU notifier's THP page invalidation Takuya Yoshikawa
  2012-06-15 11:30 ` [PATCH 1/4] KVM: MMU: Use __gfn_to_rmap() to clean up kvm_handle_hva() Takuya Yoshikawa
@ 2012-06-15 11:31 ` Takuya Yoshikawa
  2012-06-15 21:49   ` Takuya Yoshikawa
  2012-06-18 11:59   ` Avi Kivity
  2012-06-15 11:32 ` [PATCH 3/4] KVM: MMU: Make kvm_handle_hva() handle range of addresses Takuya Yoshikawa
  2012-06-15 11:33 ` [PATCH 4/4] KVM: Introduce kvm_unmap_hva_range() for kvm_mmu_notifier_invalidate_range_start() Takuya Yoshikawa
  3 siblings, 2 replies; 12+ messages in thread
From: Takuya Yoshikawa @ 2012-06-15 11:31 UTC (permalink / raw)
  To: avi, mtosatti
  Cc: agraf, paulus, aarcange, kvm, kvm-ppc, linux-kernel, takuya.yoshikawa

This confines hva handling to one place in the mmu code and makes it
easier to extend kvm_handle_hva() so that it can handle a range of
addresses later in this patch series.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Alexander Graf <agraf@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c |   12 +++++-------
 arch/x86/kvm/mmu.c                  |   10 +++-------
 include/linux/kvm_host.h            |    7 +++++++
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index d03eb6f..53716dd 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -767,15 +767,13 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 
 	slots = kvm_memslots(kvm);
 	kvm_for_each_memslot(memslot, slots) {
-		unsigned long start = memslot->userspace_addr;
-		unsigned long end;
+		gfn_t gfn = hva_to_gfn(hva, memslot);
 
-		end = start + (memslot->npages << PAGE_SHIFT);
-		if (hva >= start && hva < end) {
-			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
+		if (gfn >= memslot->base_gfn &&
+		    gfn < memslot->base_gfn + memslot->npages) {
+			gfn_t gfn_offset = gfn - memslot->base_gfn;
 
-			ret = handler(kvm, &memslot->rmap[gfn_offset],
-				      memslot->base_gfn + gfn_offset);
+			ret = handler(kvm, &memslot->rmap[gfn_offset], gfn);
 			retval |= ret;
 		}
 	}
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a2f3969..ba57b3b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1199,14 +1199,10 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 	slots = kvm_memslots(kvm);
 
 	kvm_for_each_memslot(memslot, slots) {
-		unsigned long start = memslot->userspace_addr;
-		unsigned long end;
-
-		end = start + (memslot->npages << PAGE_SHIFT);
-		if (hva >= start && hva < end) {
-			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
-			gfn_t gfn = memslot->base_gfn + gfn_offset;
+		gfn_t gfn = hva_to_gfn(hva, memslot);
 
+		if (gfn >= memslot->base_gfn &&
+		    gfn < memslot->base_gfn + memslot->npages) {
 			ret = 0;
 
 			for (j = PT_PAGE_TABLE_LEVEL;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 27ac8a4..92b2029 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -740,6 +740,13 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
 		(base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
 }
 
+static inline gfn_t hva_to_gfn(unsigned long hva, struct kvm_memory_slot *slot)
+{
+	gfn_t gfn_offset = (hva - slot->userspace_addr) >> PAGE_SHIFT;
+
+	return slot->base_gfn + gfn_offset;
+}
+
 static inline unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot,
 					       gfn_t gfn)
 {
-- 
1.7.5.4


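To make the conversion concrete, a worked example with made-up values
(the slot below is hypothetical, with only the fields hva_to_gfn()
reads filled in, and PAGE_SHIFT is 12 here):

  /* hypothetical slot: 16 pages mapped at userspace address
   * 0x7f0000000000, guest-physical start at gfn 0x100000 */
  struct kvm_memory_slot slot = {
      .base_gfn       = 0x100000,
      .npages         = 16,
      .userspace_addr = 0x7f0000000000UL,
  };

  /* hva of the slot's 4th page:
   * gfn_offset = (0x7f0000003000 - 0x7f0000000000) >> 12 = 3
   * so hva_to_gfn() returns 0x100000 + 3 = 0x100003 */
  gfn_t gfn = hva_to_gfn(0x7f0000003000UL, &slot);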

* [PATCH 3/4] KVM: MMU: Make kvm_handle_hva() handle range of addresses
  2012-06-15 11:30 [RFC PATCH 0/4] KVM: Optimize MMU notifier's THP page invalidation Takuya Yoshikawa
  2012-06-15 11:30 ` [PATCH 1/4] KVM: MMU: Use __gfn_to_rmap() to clean up kvm_handle_hva() Takuya Yoshikawa
  2012-06-15 11:31 ` [PATCH 2/4] KVM: Introduce hva_to_gfn() for kvm_handle_hva() Takuya Yoshikawa
@ 2012-06-15 11:32 ` Takuya Yoshikawa
  2012-06-18 12:11   ` Avi Kivity
  2012-06-15 11:33 ` [PATCH 4/4] KVM: Introduce kvm_unmap_hva_range() for kvm_mmu_notifier_invalidate_range_start() Takuya Yoshikawa
  3 siblings, 1 reply; 12+ messages in thread
From: Takuya Yoshikawa @ 2012-06-15 11:32 UTC (permalink / raw)
  To: avi, mtosatti
  Cc: agraf, paulus, aarcange, kvm, kvm-ppc, linux-kernel, takuya.yoshikawa

When the guest's memory is backed by THP pages, the MMU notifier needs
to call kvm_unmap_hva(), which in turn leads to kvm_handle_hva(), in a
loop to invalidate the range of pages that constitute one huge page:

  for each guest page
    for each memslot
      if page is in memslot
        unmap using rmap

This means that, although every page in that range is expected to be
found in the same memslot, we are forced to check unrelated memslots
many times.  The more memslots the guest has, the worse the situation
becomes.

This patch, together with the following patch, solves this problem by
introducing kvm_handle_hva_range(), which makes the loop look like this:

  for each memslot
    for each guest page in memslot
      unmap using rmap

With this new processing, the actual work becomes a loop over the rmap
array, which is much more cache friendly than before.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Alexander Graf <agraf@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c |   25 +++++++++++++++++++------
 arch/x86/kvm/mmu.c                  |   32 ++++++++++++++++++++++++--------
 2 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 53716dd..97465ba 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -756,9 +756,12 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	goto out_put;
 }
 
-static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
-			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
-					 unsigned long gfn))
+static int kvm_handle_hva_range(struct kvm *kvm,
+				unsigned long start_hva,
+				unsigned long end_hva,
+				int (*handler)(struct kvm *kvm,
+					       unsigned long *rmapp,
+					       unsigned long gfn))
 {
 	int ret;
 	int retval = 0;
@@ -767,10 +770,13 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 
 	slots = kvm_memslots(kvm);
 	kvm_for_each_memslot(memslot, slots) {
-		gfn_t gfn = hva_to_gfn(hva, memslot);
+		gfn_t gfn = hva_to_gfn(start_hva, memslot);
+		gfn_t end_gfn = hva_to_gfn(end_hva, memslot);
 
-		if (gfn >= memslot->base_gfn &&
-		    gfn < memslot->base_gfn + memslot->npages) {
+		gfn = max(gfn, memslot->base_gfn);
+		end_gfn = min(end_gfn, memslot->base_gfn + memslot->npages);
+
+		for (; gfn < end_gfn; gfn++) {
 			gfn_t gfn_offset = gfn - memslot->base_gfn;
 
 			ret = handler(kvm, &memslot->rmap[gfn_offset], gfn);
@@ -781,6 +787,13 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 	return retval;
 }
 
+static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
+			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
+					 unsigned long gfn))
+{
+	return kvm_handle_hva_range(kvm, hva, hva + PAGE_SIZE, handler);
+}
+
 static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			   unsigned long gfn)
 {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ba57b3b..3629f9b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1185,10 +1185,13 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	return 0;
 }
 
-static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
-			  unsigned long data,
-			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
-					 unsigned long data))
+static int kvm_handle_hva_range(struct kvm *kvm,
+				unsigned long start_hva,
+				unsigned long end_hva,
+				unsigned long data,
+				int (*handler)(struct kvm *kvm,
+					       unsigned long *rmapp,
+					       unsigned long data))
 {
 	int j;
 	int ret;
@@ -1199,10 +1202,13 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 	slots = kvm_memslots(kvm);
 
 	kvm_for_each_memslot(memslot, slots) {
-		gfn_t gfn = hva_to_gfn(hva, memslot);
+		gfn_t gfn = hva_to_gfn(start_hva, memslot);
+		gfn_t end_gfn = hva_to_gfn(end_hva, memslot);
+
+		gfn = max(gfn, memslot->base_gfn);
+		end_gfn = min(end_gfn, memslot->base_gfn + memslot->npages);
 
-		if (gfn >= memslot->base_gfn &&
-		    gfn < memslot->base_gfn + memslot->npages) {
+		for (; gfn < end_gfn; gfn++) {
 			ret = 0;
 
 			for (j = PT_PAGE_TABLE_LEVEL;
@@ -1212,7 +1218,9 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 				rmapp = __gfn_to_rmap(gfn, j, memslot);
 				ret |= handler(kvm, rmapp, data);
 			}
-			trace_kvm_age_page(hva, memslot, ret);
+			trace_kvm_age_page(memslot->userspace_addr +
+					(gfn - memslot->base_gfn) * PAGE_SIZE,
+					memslot, ret);
 			retval |= ret;
 		}
 	}
@@ -1220,6 +1228,14 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 	return retval;
 }
 
+static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
+			  unsigned long data,
+			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
+					 unsigned long data))
+{
+	return kvm_handle_hva_range(kvm, hva, hva + PAGE_SIZE, data, handler);
+}
+
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
 {
 	return kvm_handle_hva(kvm, hva, 0, kvm_unmap_rmapp);
-- 
1.7.5.4



* [PATCH 4/4] KVM: Introduce kvm_unmap_hva_range() for kvm_mmu_notifier_invalidate_range_start()
  2012-06-15 11:30 [RFC PATCH 0/4] KVM: Optimize MMU notifier's THP page invalidation Takuya Yoshikawa
                   ` (2 preceding siblings ...)
  2012-06-15 11:32 ` [PATCH 3/4] KVM: MMU: Make kvm_handle_hva() handle range of addresses Takuya Yoshikawa
@ 2012-06-15 11:33 ` Takuya Yoshikawa
  3 siblings, 0 replies; 12+ messages in thread
From: Takuya Yoshikawa @ 2012-06-15 11:33 UTC (permalink / raw)
  To: avi, mtosatti
  Cc: agraf, paulus, aarcange, kvm, kvm-ppc, linux-kernel, takuya.yoshikawa

When we tested KVM under memory pressure, with THP enabled on the host,
we noticed that the MMU notifier took a long time to invalidate huge
pages.

Since the invalidation was done with mmu_lock held, it not only wasted
CPU time but also made the host less responsive.

This patch mitigates the problem by using kvm_handle_hva_range().

On our x86 host, with a minimal guest configuration, the invalidation
became 40% faster on average, and the worst case improved by about the
same degree.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Alexander Graf <agraf@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |    2 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c |    7 +++++++
 arch/x86/include/asm/kvm_host.h     |    1 +
 arch/x86/kvm/mmu.c                  |    5 +++++
 virt/kvm/kvm_main.c                 |    3 +--
 5 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 50ea12f..572ad01 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -52,6 +52,8 @@
 
 struct kvm;
 extern int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+extern int kvm_unmap_hva_range(struct kvm *kvm,
+			       unsigned long start, unsigned long end);
 extern int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 97465ba..9de45b1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -861,6 +861,13 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
 	return 0;
 }
 
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end)
+{
+	if (kvm->arch.using_mmu_notifiers)
+		kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp);
+	return 0;
+}
+
 static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			 unsigned long gfn)
 {
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index db7c1f2..d6ad50d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -930,6 +930,7 @@ extern bool kvm_rebooting;
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3629f9b..6601d1c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1241,6 +1241,11 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
 	return kvm_handle_hva(kvm, hva, 0, kvm_unmap_rmapp);
 }
 
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end)
+{
+	return kvm_handle_hva_range(kvm, start, end, 0, kvm_unmap_rmapp);
+}
+
 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 {
 	kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 02cb440..44e44c0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -332,8 +332,7 @@ static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	 * count is also read inside the mmu_lock critical section.
 	 */
 	kvm->mmu_notifier_count++;
-	for (; start < end; start += PAGE_SIZE)
-		need_tlb_flush |= kvm_unmap_hva(kvm, start);
+	need_tlb_flush = kvm_unmap_hva_range(kvm, start, end);
 	need_tlb_flush |= kvm->tlbs_dirty;
 	/* we've to flush the tlb before the pages can be freed */
 	if (need_tlb_flush)
-- 
1.7.5.4



* Re: [PATCH 2/4] KVM: Introduce hva_to_gfn() for kvm_handle_hva()
  2012-06-15 11:31 ` [PATCH 2/4] KVM: Introduce hva_to_gfn() for kvm_handle_hva() Takuya Yoshikawa
@ 2012-06-15 21:49   ` Takuya Yoshikawa
  2012-06-18 11:59   ` Avi Kivity
  1 sibling, 0 replies; 12+ messages in thread
From: Takuya Yoshikawa @ 2012-06-15 21:49 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: avi, mtosatti, agraf, paulus, aarcange, kvm, kvm-ppc, linux-kernel

On Fri, 15 Jun 2012 20:31:44 +0900
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> wrote:

...

> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index d03eb6f..53716dd 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -767,15 +767,13 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>  
>  	slots = kvm_memslots(kvm);
>  	kvm_for_each_memslot(memslot, slots) {
> -		unsigned long start = memslot->userspace_addr;
> -		unsigned long end;
> +		gfn_t gfn = hva_to_gfn(hva, memslot);
>  
> -		end = start + (memslot->npages << PAGE_SHIFT);
> -		if (hva >= start && hva < end) {
> -			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
> +		if (gfn >= memslot->base_gfn &&
> +		    gfn < memslot->base_gfn + memslot->npages) {

Here
...

> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index a2f3969..ba57b3b 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1199,14 +1199,10 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>  	slots = kvm_memslots(kvm);
>  
>  	kvm_for_each_memslot(memslot, slots) {
> -		unsigned long start = memslot->userspace_addr;
> -		unsigned long end;
> -
> -		end = start + (memslot->npages << PAGE_SHIFT);
> -		if (hva >= start && hva < end) {
> -			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
> -			gfn_t gfn = memslot->base_gfn + gfn_offset;
> +		gfn_t gfn = hva_to_gfn(hva, memslot);
>  
> +		if (gfn >= memslot->base_gfn &&
> +		    gfn < memslot->base_gfn + memslot->npages) {

and here
...

> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 27ac8a4..92b2029 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -740,6 +740,13 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
>  		(base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
>  }
>  
> +static inline gfn_t hva_to_gfn(unsigned long hva, struct kvm_memory_slot *slot)
> +{
> +	gfn_t gfn_offset = (hva - slot->userspace_addr) >> PAGE_SHIFT;
> +
> +	return slot->base_gfn + gfn_offset;
> +}

Something wrong may happen when hva < slot->userspace_addr: the
unsigned subtraction wraps around and yields a bogus gfn.

I will fix this after I get some feedback on the other parts.

	Takuya


* Re: [PATCH 2/4] KVM: Introduce hva_to_gfn() for kvm_handle_hva()
  2012-06-15 11:31 ` [PATCH 2/4] KVM: Introduce hva_to_gfn() for kvm_handle_hva() Takuya Yoshikawa
  2012-06-15 21:49   ` Takuya Yoshikawa
@ 2012-06-18 11:59   ` Avi Kivity
  1 sibling, 0 replies; 12+ messages in thread
From: Avi Kivity @ 2012-06-18 11:59 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: mtosatti, agraf, paulus, aarcange, kvm, kvm-ppc, linux-kernel,
	takuya.yoshikawa

On 06/15/2012 02:31 PM, Takuya Yoshikawa wrote:
> This confines hva handling to one place in the mmu code and makes it
> easier to extend kvm_handle_hva() so that it can handle a range of
> addresses later in this patch series.
> 
> 
>  
>  	kvm_for_each_memslot(memslot, slots) {
> -		unsigned long start = memslot->userspace_addr;
> -		unsigned long end;
> -
> -		end = start + (memslot->npages << PAGE_SHIFT);
> -		if (hva >= start && hva < end) {
> -			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
> -			gfn_t gfn = memslot->base_gfn + gfn_offset;
> +		gfn_t gfn = hva_to_gfn(hva, memslot);
>  
> +		if (gfn >= memslot->base_gfn &&
> +		    gfn < memslot->base_gfn + memslot->npages) {

First you convert it, then you check if the conversion worked?  Let's
make it a straightforward check-then-convert (or check-and-convert).

>  			ret = 0;
>  
>  			for (j = PT_PAGE_TABLE_LEVEL;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 27ac8a4..92b2029 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -740,6 +740,13 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
>  		(base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
>  }
>  
> +static inline gfn_t hva_to_gfn(unsigned long hva, struct kvm_memory_slot *slot)
> +{
> +	gfn_t gfn_offset = (hva - slot->userspace_addr) >> PAGE_SHIFT;
> +
> +	return slot->base_gfn + gfn_offset;
> +}

Should be named hva_to_gfn_memslot(), like gfn_to_hva_memslot() below,
to emphasise that it isn't generic.

> +
>  static inline unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot,
>  					       gfn_t gfn)
>  {
> 


-- 
error compiling committee.c: too many arguments to function



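A minimal sketch of the check-then-convert shape being asked for here;
hva_in_memslot() is a hypothetical name, not a helper from this series:

  static inline bool hva_in_memslot(unsigned long hva,
                                    struct kvm_memory_slot *slot)
  {
      return hva >= slot->userspace_addr &&
             hva < slot->userspace_addr + (slot->npages << PAGE_SHIFT);
  }

  /* in kvm_handle_hva(): check first, then convert only when valid */
  kvm_for_each_memslot(memslot, slots) {
      gfn_t gfn;

      if (!hva_in_memslot(hva, memslot))
          continue;

      gfn = hva_to_gfn(hva, memslot);
      /* ... process the page at gfn ... */
  }

This also sidesteps the wrap-around that Takuya noticed: the
subtraction inside hva_to_gfn() is only performed for hvas that are
known to be inside the slot.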

* Re: [PATCH 3/4] KVM: MMU: Make kvm_handle_hva() handle range of addresses
  2012-06-15 11:32 ` [PATCH 3/4] KVM: MMU: Make kvm_handle_hva() handle range of addresses Takuya Yoshikawa
@ 2012-06-18 12:11   ` Avi Kivity
  2012-06-18 13:20     ` Takuya Yoshikawa
  2012-06-19 13:46     ` Takuya Yoshikawa
  0 siblings, 2 replies; 12+ messages in thread
From: Avi Kivity @ 2012-06-18 12:11 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: mtosatti, agraf, paulus, aarcange, kvm, kvm-ppc, linux-kernel,
	takuya.yoshikawa

On 06/15/2012 02:32 PM, Takuya Yoshikawa wrote:
> When the guest's memory is backed by THP pages, the MMU notifier needs
> to call kvm_unmap_hva(), which in turn leads to kvm_handle_hva(), in a
> loop to invalidate the range of pages that constitute one huge page:
> 
>   for each guest page
>     for each memslot
>       if page is in memslot
>         unmap using rmap
> 
> This means that, although every page in that range is expected to be
> found in the same memslot, we are forced to check unrelated memslots
> many times.  The more memslots the guest has, the worse the situation
> becomes.
> 
> This patch, together with the following patch, solves this problem by
> introducing kvm_handle_hva_range(), which makes the loop look like this:
> 
>   for each memslot
>     for each guest page in memslot
>       unmap using rmap
> 
> With this new processing, the actual work becomes a loop over the rmap
> array, which is much more cache friendly than before.


Moreover, if the pages are in no slot (munmap of some non-guest memory),
then we're iterating over all those pages for no purpose.

> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index ba57b3b..3629f9b 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1185,10 +1185,13 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  	return 0;
>  }
>  
> -static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
> -			  unsigned long data,
> -			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
> -					 unsigned long data))
> +static int kvm_handle_hva_range(struct kvm *kvm,
> +				unsigned long start_hva,
> +				unsigned long end_hva,
> +				unsigned long data,
> +				int (*handler)(struct kvm *kvm,
> +					       unsigned long *rmapp,
> +					       unsigned long data))
>  {
>  	int j;
>  	int ret;
> @@ -1199,10 +1202,13 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>  	slots = kvm_memslots(kvm);
>  
>  	kvm_for_each_memslot(memslot, slots) {
> -		gfn_t gfn = hva_to_gfn(hva, memslot);
> +		gfn_t gfn = hva_to_gfn(start_hva, memslot);
> +		gfn_t end_gfn = hva_to_gfn(end_hva, memslot);

These will return random results which you then use in min/max later, no?

> +
> +		gfn = max(gfn, memslot->base_gfn);
> +		end_gfn = min(end_gfn, memslot->base_gfn + memslot->npages);
>  
> -		if (gfn >= memslot->base_gfn &&
> -		    gfn < memslot->base_gfn + memslot->npages) {
> +		for (; gfn < end_gfn; gfn++) {
>  			ret = 0;
>  
>  			for (j = PT_PAGE_TABLE_LEVEL;
> @@ -1212,7 +1218,9 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>  				rmapp = __gfn_to_rmap(gfn, j, memslot);
>  				ret |= handler(kvm, rmapp, data);

Potential for improvement: don't do 512 iterations on same large page.

Something like

    if ((gfn ^ prev_gfn) & mask(level))
        ret |= handler(...)

with clever selection of the first prev_gfn so it always matches (~gfn
maybe).


-- 
error compiling committee.c: too many arguments to function



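Spelled out, the suggestion might look like the sketch below, where
mask(level) is taken to be the bits above the per-level page offset and
prev_gfn is seeded with ~gfn as suggested; none of this is code from
the series:

  gfn_t prev_gfn = ~gfn;  /* shares no bits with gfn, so the first
                             pass calls the handler at every level */

  for (; gfn < end_gfn; gfn++) {
      for (j = PT_PAGE_TABLE_LEVEL;
           j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
          /* gfns inside one large page agree on all bits above the
           * per-level offset, so the masked XOR is zero for them */
          if ((gfn ^ prev_gfn) & ~(KVM_PAGES_PER_HPAGE(j) - 1))
              ret |= handler(kvm, __gfn_to_rmap(gfn, j, memslot), data);
      }
      prev_gfn = gfn;
  }

At the 4KB level the mask is all ones, so the handler still runs for
every gfn; at the higher levels it runs only when gfn crosses into a
new large page.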

* Re: [PATCH 3/4] KVM: MMU: Make kvm_handle_hva() handle range of addresses
  2012-06-18 12:11   ` Avi Kivity
@ 2012-06-18 13:20     ` Takuya Yoshikawa
  2012-06-19 13:46     ` Takuya Yoshikawa
  1 sibling, 0 replies; 12+ messages in thread
From: Takuya Yoshikawa @ 2012-06-18 13:20 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Takuya Yoshikawa, mtosatti, agraf, paulus, aarcange, kvm,
	kvm-ppc, linux-kernel

On Mon, 18 Jun 2012 15:11:42 +0300
Avi Kivity <avi@redhat.com> wrote:


> >  	kvm_for_each_memslot(memslot, slots) {
> > -		gfn_t gfn = hva_to_gfn(hva, memslot);
> > +		gfn_t gfn = hva_to_gfn(start_hva, memslot);
> > +		gfn_t end_gfn = hva_to_gfn(end_hva, memslot);
> 
> These will return random results which you then use in min/max later, no?

Yes, I will follow your advice: check-then-convert (or check-and-convert).

> > @@ -1212,7 +1218,9 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
> >  				rmapp = __gfn_to_rmap(gfn, j, memslot);
> >  				ret |= handler(kvm, rmapp, data);
> 
> Potential for improvement: don't do 512 iterations on same large page.
> 
> Something like
> 
>     if ((gfn ^ prev_gfn) & mask(level))
>         ret |= handler(...)
> 
> with clever selection of the first prev_gfn so it always matches (~gfn
> maybe).

Really nice.
I'm sure that will make this much faster!

Thanks,
	Takuya


* Re: [PATCH 3/4] KVM: MMU: Make kvm_handle_hva() handle range of addresses
  2012-06-18 12:11   ` Avi Kivity
  2012-06-18 13:20     ` Takuya Yoshikawa
@ 2012-06-19 13:46     ` Takuya Yoshikawa
  2012-06-21  8:24       ` Avi Kivity
  1 sibling, 1 reply; 12+ messages in thread
From: Takuya Yoshikawa @ 2012-06-19 13:46 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Takuya Yoshikawa, mtosatti, agraf, paulus, aarcange, kvm,
	kvm-ppc, linux-kernel

On Mon, 18 Jun 2012 15:11:42 +0300
Avi Kivity <avi@redhat.com> wrote:

> Potential for improvement: don't do 512 iterations on same large page.
> 
> Something like
> 
>     if ((gfn ^ prev_gfn) & mask(level))
>         ret |= handler(...)
> 
> with clever selection of the first prev_gfn so it always matches (~gfn
> maybe).


I thought up a better solution:

1. Separate rmap_pde from lpage_info->write_count and
   make this a simple array. (I once tried this.)

2. Use gfn_to_index() and loop over rmap array:
 ...
  /* intersection check */
  start = max(start, memslot->userspace_addr);
  end = min(end, memslot->userspace_addr +
                 (memslot->npages << PAGE_SHIFT));
  if (start > end)
      continue;

  /* hva to gfn conversion */
  gfn_start = hva_to_gfn_memslot(start);
  gfn_end = hva_to_gfn_memslot(end);

  /* main part */
  for each level {
      rmapp = __gfn_to_rmap(gfn_start, level, memslot);
      for (idx = gfn_to_index(gfn_start, memslot->base_gfn, level);
           idx < gfn_to_index(gfn_end, memslot->base_gfn, level); idx++) {
              ...
          /* loop over rmap array */
          ret |= handler(kvm, rmapp + idx, data);
      }
  }


Thanks,
	Takuya


* Re: [PATCH 3/4] KVM: MMU: Make kvm_handle_hva() handle range of addresses
  2012-06-19 13:46     ` Takuya Yoshikawa
@ 2012-06-21  8:24       ` Avi Kivity
  2012-06-21 13:41         ` Takuya Yoshikawa
  0 siblings, 1 reply; 12+ messages in thread
From: Avi Kivity @ 2012-06-21  8:24 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: Takuya Yoshikawa, mtosatti, agraf, paulus, aarcange, kvm,
	kvm-ppc, linux-kernel

On 06/19/2012 04:46 PM, Takuya Yoshikawa wrote:
> On Mon, 18 Jun 2012 15:11:42 +0300
> Avi Kivity <avi@redhat.com> wrote:
> 
>> Potential for improvement: don't do 512 iterations on same large page.
>> 
>> Something like
>> 
>>     if ((gfn ^ prev_gfn) & mask(level))
>>         ret |= handler(...)
>> 
>> with clever selection of the first prev_gfn so it always matches (~gfn
>> maybe).
> 
> 
> I thought up a better solution:
> 
> 1. Separate rmap_pde from lpage_info->write_count and
>    make this a simple array. (I once tried this.)
> 

This has the potential to increase cache misses, but I don't think it's
a killer.  The separation can simplify other things as well.


> 2. Use gfn_to_index() and loop over rmap array:
>  ...
>   /* intersection check */
>   start = max(start, memslot->userspace_addr);
>   end = min(end, memslot->userspace_addr +
>                  (memslot->npages << PAGE_SHIFT));
>   if (start > end)
>       continue;
> 
>   /* hva to gfn conversion */
>   gfn_start = hva_to_gfn_memslot(start);
>   gfn_end = hva_to_gfn_memslot(end);
> 
>   /* main part */
>   for each level {
>       rmapp = __gfn_to_rmap(gfn_start, level, memslot);
>       for (idx = gfn_to_index(gfn_start, memslot->base_gfn, level);
>            idx < gfn_to_index(gfn_end, memslot->base_gfn, level); idx++) {
>               ...
>           /* loop over rmap array */
>           ret |= handler(kvm, rmapp + idx, data);
>       }
>   }
> 

Probably want idx <= gfn_to_index(gfn_end-1, ...) otherwise we fail on
small slots.

-- 
error compiling committee.c: too many arguments to function



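Putting the two messages together, the main part of the sketch with the
boundary fixed might read as follows (still illustrative: rmapp starts
at gfn_start's entry and is advanced in step with idx):

  for (level = PT_PAGE_TABLE_LEVEL;
       level < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++level) {
      unsigned long idx, idx_end;
      unsigned long *rmapp;

      idx = gfn_to_index(gfn_start, memslot->base_gfn, level);
      /* gfn_end is exclusive, so index its last covered page;
       * "idx <= idx_end" then works even for a one-page range */
      idx_end = gfn_to_index(gfn_end - 1, memslot->base_gfn, level);

      rmapp = __gfn_to_rmap(gfn_start, level, memslot);

      for (; idx <= idx_end; ++idx)
          ret |= handler(kvm, rmapp++, data);
  }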

* Re: [PATCH 3/4] KVM: MMU: Make kvm_handle_hva() handle range of addresses
  2012-06-21  8:24       ` Avi Kivity
@ 2012-06-21 13:41         ` Takuya Yoshikawa
  0 siblings, 0 replies; 12+ messages in thread
From: Takuya Yoshikawa @ 2012-06-21 13:41 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Takuya Yoshikawa, mtosatti, agraf, paulus, aarcange, kvm,
	kvm-ppc, linux-kernel

I should have read this before sending v2...

On Thu, 21 Jun 2012 11:24:59 +0300
Avi Kivity <avi@redhat.com> wrote:

> > 1. Separate rmap_pde from lpage_info->write_count and
> >    make this a simple array. (I once tried this.)
> > 
> 
> This has the potential to increase cache misses, but I don't think it's
> a killer.  The separation can simplify other things as well.

Yes, I think so too.

IIRC, write_count and rmap_pde are not used together that often.

> > 2. Use gfn_to_index() and loop over rmap array:

...

> >   /* main part */
> >   for each level {
> >       rmapp = __gfn_to_rmap(gfn_start, level, memslot);
> >       for (idx = gfn_to_index(gfn_start, memslot->base_gfn, level);
> >            idx < gfn_to_index(gfn_end, memslot->base_gfn, level); idx++) {
> >               ...
> >           /* loop over rmap array */
> >           ret |= handler(kvm, rmapp + idx, data);
> >       }
> >   }
> > 
> 
> Probably want idx <= gfn_to_index(gfn_end-1, ...) otherwise we fail on
> small slots.

I was thinking the same thing when making v2.
But I will check the boundary condition again.

(mmu_notifier + memslot + lpage + rmap...) * alignment...
Very confusing.

Thanks,
	Takuya

