* [PATCH 0/2] KVM: PPC: Book3S HV: Optimize for MMIO emulation
@ 2016-11-02 13:13 ` Yongji Xie
  0 siblings, 0 replies; 10+ messages in thread
From: Yongji Xie @ 2016-11-02 13:13 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: paulus, zhong

Today, for each emulated MMIO operation, we have to lock the guest
HPTE twice: once in real mode and once in virtual mode.

This series aims to reduce those MMIO HPTE locks, which improves the
I/O scalability of emulated devices, especially virtio devices, because
currently almost all virtio notifications (port I/O accesses) hit the
same HPTE.

In this series, patch 1 fixes a bug in the key field of the HPTE, and
patch 2 reduces MMIO HPTE locking and speeds up MMIO emulation by
keeping a per-vCPU cache of recently page-faulted MMIO entries.
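
As a rough orientation before the diffs: the scheme in patch 2 amounts to
a small per-vCPU array of recent MMIO translations that is invalidated
wholesale by bumping a per-VM generation counter whenever the HPT or the
memslots change. Below is a stand-alone sketch of the idea in plain C
(illustrative names only; the real code is in the patches below and also
matches the SLB entry and the base page size on lookup):

#include <stdint.h>
#include <stddef.h>

#define MMIO_HPTE_CACHE_SIZE    4       /* same limit as the patch */

struct mmio_entry {
        uint64_t eaddr;         /* faulting effective address */
        uint64_t gen;           /* generation when recorded   */
        int      valid;
};

struct vcpu_mmio_cache {
        struct mmio_entry entry[MMIO_HPTE_CACHE_SIZE];
        unsigned int next;
};

/*
 * A hit requires the address to match and the per-VM generation to be
 * unchanged; any HPT or memslot change bumps the generation and thereby
 * invalidates every cached entry at once.
 */
static struct mmio_entry *cache_lookup(struct vcpu_mmio_cache *c,
                                       uint64_t eaddr, uint64_t cur_gen)
{
        for (unsigned int i = 0; i < MMIO_HPTE_CACHE_SIZE; i++) {
                struct mmio_entry *e = &c->entry[i];

                if (e->valid && e->gen == cur_gen && e->eaddr == eaddr)
                        return e;
        }
        return NULL;
}

/* Entries are recycled round-robin, as next_mmio_cache_entry() does. */
static void cache_insert(struct vcpu_mmio_cache *c,
                         uint64_t eaddr, uint64_t cur_gen)
{
        struct mmio_entry *e = &c->entry[c->next];

        c->next = (c->next + 1) % MMIO_HPTE_CACHE_SIZE;
        e->eaddr = eaddr;
        e->gen   = cur_gen;
        e->valid = 1;
}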

Yongji Xie (2):
  KVM: PPC: Book3S HV: Clear the key field of HPTE when the page is paged out
  KVM: PPC: Book3S HV: Add a per vcpu cache for recently page faulted MMIO entries

 arch/powerpc/include/asm/kvm_host.h |   21 ++++++
 arch/powerpc/kvm/book3s_64_mmu_hv.c |   20 ++++++-
 arch/powerpc/kvm/book3s_hv.c        |    9 +++
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |  120 +++++++++++++++++++++++++++++------
 4 files changed, 149 insertions(+), 21 deletions(-)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] KVM: PPC: Book3S HV: Clear the key field of HPTE when the page is paged out
  2016-11-02 13:13 ` Yongji Xie
@ 2016-11-02 13:13   ` Yongji Xie
  -1 siblings, 0 replies; 10+ messages in thread
From: Yongji Xie @ 2016-11-02 13:13 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: paulus, zhong

Currently we mark an HPTE for emulated MMIO with the HPTE_V_ABSENT bit
set as well as storage key 0x1f. However, such an HPTE can be confused
with the HPTE for a real guest RAM page that also uses key 0x1f once
that page gets paged out.

This patch clears the key field of the HPTE when the page is paged out,
and restores it when the HPTE is re-established.
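
To illustrate the conflict, here is a stand-alone user-space sketch (not
part of the patch; the bit values are assumed to match
arch/powerpc/include/asm/book3s/64/mmu-hash.h):

#include <stdio.h>
#include <stdint.h>

#define HPTE_R_KEY_HI   0x3000000000000000ULL  /* top 2 bits of the 5-bit key */
#define HPTE_R_KEY_LO   0x0000000000000e00ULL  /* bottom 3 bits of the key    */

int main(void)
{
        /* Emulated MMIO: the HPTE carries storage key 0x1f (all key bits). */
        uint64_t mmio_r = HPTE_R_KEY_HI | HPTE_R_KEY_LO;

        /*
         * A real RAM page whose guest mapping also uses key 0x1f: once the
         * page is paged out (HPTE_V_ABSENT set), its second doubleword
         * would carry the same key bits and look like the MMIO case above.
         */
        uint64_t ram_r = HPTE_R_KEY_HI | HPTE_R_KEY_LO;

        /* The fix: clear the key field in the shadow HPTE on page-out... */
        ram_r &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO);

        /* ...so the two cases can no longer be confused... */
        printf("MMIO key bits:      0x%llx\n",
               (unsigned long long)(mmio_r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)));
        printf("paged-out key bits: 0x%llx\n",
               (unsigned long long)(ram_r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)));

        /* ...and the key is restored from the guest view on the next fault. */
        return 0;
}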

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c |    4 +++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |    5 ++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 05f09ae..ec9e545 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -575,7 +575,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	 */
 	if (psize < PAGE_SIZE)
 		psize = PAGE_SIZE;
-	r = (r & ~(HPTE_R_PP0 - psize)) | ((pfn << PAGE_SHIFT) & ~(psize - 1));
+	r = (r & HPTE_R_KEY_HI) | (r & ~(HPTE_R_PP0 - psize)) |
+					((pfn << PAGE_SHIFT) & ~(psize - 1));
 	if (hpte_is_writable(r) && !write_ok)
 		r = hpte_make_readonly(r);
 	ret = RESUME_GUEST;
@@ -758,6 +759,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		    hpte_rpn(ptel, psize) == gfn) {
 			hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
 			kvmppc_invalidate_hpte(kvm, hptep, i);
+			hptep[1] &= ~cpu_to_be64(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
 			/* Harvest R and C */
 			rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
 			*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 99b4e9d..61ff8ee 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -264,8 +264,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 
 	if (pa)
 		pteh |= HPTE_V_VALID;
-	else
+	else {
 		pteh |= HPTE_V_ABSENT;
+		ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
+	}
 
 	/*If we had host pte mapping then  Check WIMG */
 	if (ptep && !hpte_cache_flags_ok(ptel, is_ci)) {
@@ -351,6 +353,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 			/* inval in progress, write a non-present HPTE */
 			pteh |= HPTE_V_ABSENT;
 			pteh &= ~HPTE_V_VALID;
+			ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
 			unlock_rmap(rmap);
 		} else {
 			kvmppc_add_revmap_chain(kvm, rev, rmap, pte_index,
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] KVM: PPC: Book3S HV: Add a per vcpu cache for recently page faulted MMIO entries
  2016-11-02 13:13 ` Yongji Xie
@ 2016-11-02 13:13   ` Yongji Xie
  -1 siblings, 0 replies; 10+ messages in thread
From: Yongji Xie @ 2016-11-02 13:13 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: paulus, zhong

This keeps a per-vCPU cache of recently page-faulted MMIO entries.
On a page fault, if the entry exists in the cache, we can avoid locking
the HPTE in real mode and virtual mode and directly call
kvmppc_hv_emulate_mmio().

In the current implementation, we limit the size of the cache to four
entries, which should be enough to cover the high-frequency MMIO HPTEs
in most cases. For example, consider virtio devices: for virtio legacy
devices, one HPTE can handle notifications from up to 1024 devices
(a 64K page divided by the 64-byte port I/O register), so one cache
entry is enough; for virtio modern devices, we always need one HPTE per
device, because a modern device uses an 8M MMIO register to notify the
host instead of a port I/O register. A typical configuration should not
exceed four virtio devices per vCPU, so four cache entries are also
enough in that case. Of course, if needed, we could turn the macro into
a module parameter in the future.
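
Spelled out, the sizing arithmetic for the legacy case above is:

    64 KiB page / 64 bytes per legacy notify register = 65536 / 64 = 1024

so one cache entry covers up to 1024 legacy virtio devices, while modern
devices need one entry each.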

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_host.h |   21 ++++++
 arch/powerpc/kvm/book3s_64_mmu_hv.c |   16 +++++
 arch/powerpc/kvm/book3s_hv.c        |    9 +++
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |  115 +++++++++++++++++++++++++++++------
 4 files changed, 142 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 28350a2..20ef27d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -246,6 +246,7 @@ struct kvm_arch {
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	unsigned long hpt_virt;
 	struct revmap_entry *revmap;
+	atomic64_t mmio_update;
 	unsigned int host_lpid;
 	unsigned long host_lpcr;
 	unsigned long sdr1;
@@ -408,6 +409,24 @@ struct kvmppc_passthru_irqmap {
 #define KVMPPC_IRQ_MPIC		1
 #define KVMPPC_IRQ_XICS		2
 
+#define MMIO_HPTE_CACHE_SIZE	4
+
+struct mmio_hpte_cache_entry {
+	unsigned long hpte_v;
+	unsigned long hpte_r;
+	unsigned long rpte;
+	unsigned long pte_index;
+	unsigned long eaddr;
+	unsigned long slb_v;
+	long mmio_update;
+	unsigned int slb_base_pshift;
+};
+
+struct mmio_hpte_cache {
+	struct mmio_hpte_cache_entry entry[MMIO_HPTE_CACHE_SIZE];
+	unsigned int index;
+};
+
 struct openpic;
 
 struct kvm_vcpu_arch {
@@ -655,9 +674,11 @@ struct kvm_vcpu_arch {
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	struct kvm_vcpu_arch_shared shregs;
 
+	struct mmio_hpte_cache mmio_cache;
 	unsigned long pgfault_addr;
 	long pgfault_index;
 	unsigned long pgfault_hpte[2];
+	struct mmio_hpte_cache_entry *pgfault_cache;
 
 	struct task_struct *run_task;
 	struct kvm_run *kvm_run;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index ec9e545..7755bd0 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -88,6 +88,8 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 	/* 128 (2**7) bytes in each HPTEG */
 	kvm->arch.hpt_mask = (1ul << (order - 7)) - 1;
 
+	atomic64_set(&kvm->arch.mmio_update, 0);
+
 	/* Allocate reverse map array */
 	rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt_npte);
 	if (!rev) {
@@ -451,6 +453,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	unsigned int writing, write_ok;
 	struct vm_area_struct *vma;
 	unsigned long rcbits;
+	long mmio_update;
 
 	/*
 	 * Real-mode code has already searched the HPT and found the
@@ -460,6 +463,19 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	 */
 	if (ea != vcpu->arch.pgfault_addr)
 		return RESUME_GUEST;
+
+	if (vcpu->arch.pgfault_cache) {
+		mmio_update = atomic64_read(&kvm->arch.mmio_update);
+		if (mmio_update == vcpu->arch.pgfault_cache->mmio_update) {
+			r = vcpu->arch.pgfault_cache->rpte;
+			psize = hpte_page_size(vcpu->arch.pgfault_hpte[0], r);
+			gpa_base = r & HPTE_R_RPN & ~(psize - 1);
+			gfn_base = gpa_base >> PAGE_SHIFT;
+			gpa = gpa_base | (ea & (psize - 1));
+			return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea,
+						dsisr & DSISR_ISSTORE);
+		}
+	}
 	index = vcpu->arch.pgfault_index;
 	hptep = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
 	rev = &kvm->arch.revmap[index];
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3686471..40b2b6d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2971,6 +2971,15 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
 
+	/*
+	 * If we are making a new memslot, it might make
+	 * some address that was previously cached as emulated
+	 * MMIO be no longer emulated MMIO, so invalidate
+	 * all the caches of emulated MMIO translations.
+	 */
+	if (npages)
+		atomic64_inc(&kvm->arch.mmio_update);
+
 	if (npages && old->npages) {
 		/*
 		 * If modifying a memslot, reset all the rmap dirty bits.
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 61ff8ee..1f048e7 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -389,6 +389,11 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 #define LOCK_TOKEN	(*(u32 *)(&get_paca()->paca_index))
 #endif
 
+static inline int is_mmio_hpte(unsigned long v, unsigned long r)
+{
+	return ((v & HPTE_V_ABSENT) && (r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)));
+}
+
 static inline int try_lock_tlbie(unsigned int *lock)
 {
 	unsigned int tmp, old;
@@ -455,6 +460,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 
 	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
 	v = pte & ~HPTE_V_HVLOCK;
+	pte = be64_to_cpu(hpte[1]);
 	if (v & HPTE_V_VALID) {
 		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
 		rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
@@ -475,6 +481,9 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	note_hpte_modification(kvm, rev);
 	unlock_hpte(hpte, 0);
 
+	if (is_mmio_hpte(v, pte))
+		atomic64_inc(&kvm->arch.mmio_update);
+
 	if (v & HPTE_V_ABSENT)
 		v = (v & ~HPTE_V_ABSENT) | HPTE_V_VALID;
 	hpret[0] = v;
@@ -501,7 +510,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 	int global;
 	long int ret = H_SUCCESS;
 	struct revmap_entry *rev, *revs[4];
-	u64 hp0;
+	u64 hp0, hp1;
 
 	global = global_invalidates(kvm, 0);
 	for (i = 0; i < 4 && ret == H_SUCCESS; ) {
@@ -534,6 +543,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 			}
 			found = 0;
 			hp0 = be64_to_cpu(hp[0]);
+			hp1 = be64_to_cpu(hp[1]);
 			if (hp0 & (HPTE_V_ABSENT | HPTE_V_VALID)) {
 				switch (flags & 3) {
 				case 0:		/* absolute */
@@ -564,6 +574,8 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 				rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
 				args[j] |= rcbits << (56 - 5);
 				hp[0] = 0;
+				if (is_mmio_hpte(hp0, hp1))
+					atomic64_inc(&kvm->arch.mmio_update);
 				continue;
 			}
 
@@ -624,6 +636,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	}
 
 	v = pte;
+	pte = be64_to_cpu(hpte[1]);
 	bits = (flags << 55) & HPTE_R_PP0;
 	bits |= (flags << 48) & HPTE_R_KEY_HI;
 	bits |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
@@ -645,7 +658,6 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 		 * readonly to writable.  If it should be writable, we'll
 		 * take a trap and let the page fault code sort it out.
 		 */
-		pte = be64_to_cpu(hpte[1]);
 		r = (pte & ~mask) | bits;
 		if (hpte_is_writable(r) && !hpte_is_writable(pte))
 			r = hpte_make_readonly(r);
@@ -661,6 +673,9 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	}
 	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
 	asm volatile("ptesync" : : : "memory");
+	if (is_mmio_hpte(v, pte))
+		atomic64_inc(&kvm->arch.mmio_update);
+
 	return H_SUCCESS;
 }
 
@@ -831,6 +846,37 @@ void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
 	20,	/* 1M, unsupported */
 };
 
+static struct mmio_hpte_cache_entry *mmio_cache_search(struct kvm_vcpu *vcpu,
+		unsigned long eaddr, unsigned long slb_v, long mmio_update)
+{
+	struct mmio_hpte_cache_entry *entry = NULL;
+	unsigned int pshift;
+	unsigned int i;
+
+	for (i = 0; i < MMIO_HPTE_CACHE_SIZE; i++) {
+		entry = &vcpu->arch.mmio_cache.entry[i];
+		if (entry->mmio_update == mmio_update) {
+			pshift = entry->slb_base_pshift;
+			if ((entry->eaddr >> pshift) == (eaddr >> pshift) &&
+			    entry->slb_v == slb_v)
+				return entry;
+		}
+	}
+	return NULL;
+}
+
+static struct mmio_hpte_cache_entry *
+			next_mmio_cache_entry(struct kvm_vcpu *vcpu)
+{
+	unsigned int index = vcpu->arch.mmio_cache.index;
+
+	vcpu->arch.mmio_cache.index++;
+	if (vcpu->arch.mmio_cache.index == MMIO_HPTE_CACHE_SIZE)
+		vcpu->arch.mmio_cache.index = 0;
+
+	return &vcpu->arch.mmio_cache.entry[index];
+}
+
 /* When called from virtmode, this func should be protected by
  * preempt_disable(), otherwise, the holding of HPTE_V_HVLOCK
  * can trigger deadlock issue.
@@ -932,25 +978,36 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 	unsigned long valid;
 	struct revmap_entry *rev;
 	unsigned long pp, key;
+	struct mmio_hpte_cache_entry *cache_entry = NULL;
+	long mmio_update = 0;
 
 	/* For protection fault, expect to find a valid HPTE */
 	valid = HPTE_V_VALID;
-	if (status & DSISR_NOHPTE)
+	if (status & DSISR_NOHPTE) {
 		valid |= HPTE_V_ABSENT;
-
-	index = kvmppc_hv_find_lock_hpte(kvm, addr, slb_v, valid);
-	if (index < 0) {
-		if (status & DSISR_NOHPTE)
-			return status;	/* there really was no HPTE */
-		return 0;		/* for prot fault, HPTE disappeared */
+		mmio_update = atomic64_read(&kvm->arch.mmio_update);
+		cache_entry = mmio_cache_search(vcpu, addr, slb_v, mmio_update);
 	}
-	hpte = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
-	v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
-	r = be64_to_cpu(hpte[1]);
-	rev = real_vmalloc_addr(&kvm->arch.revmap[index]);
-	gr = rev->guest_rpte;
+	if (cache_entry) {
+		index = cache_entry->pte_index;
+		v = cache_entry->hpte_v;
+		r = cache_entry->hpte_r;
+		gr = cache_entry->rpte;
+	} else {
+		index = kvmppc_hv_find_lock_hpte(kvm, addr, slb_v, valid);
+		if (index < 0) {
+			if (status & DSISR_NOHPTE)
+				return status;	/* there really was no HPTE */
+			return 0;	/* for prot fault, HPTE disappeared */
+		}
+		hpte = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
+		v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
+		r = be64_to_cpu(hpte[1]);
+		rev = real_vmalloc_addr(&kvm->arch.revmap[index]);
+		gr = rev->guest_rpte;
 
-	unlock_hpte(hpte, v);
+		unlock_hpte(hpte, v);
+	}
 
 	/* For not found, if the HPTE is valid by now, retry the instruction */
 	if ((status & DSISR_NOHPTE) && (v & HPTE_V_VALID))
@@ -988,12 +1045,32 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 	vcpu->arch.pgfault_index = index;
 	vcpu->arch.pgfault_hpte[0] = v;
 	vcpu->arch.pgfault_hpte[1] = r;
+	vcpu->arch.pgfault_cache = cache_entry;
 
 	/* Check the storage key to see if it is possibly emulated MMIO */
-	if (data && (vcpu->arch.shregs.msr & MSR_IR) &&
-	    (r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) ==
-	    (HPTE_R_KEY_HI | HPTE_R_KEY_LO))
-		return -2;	/* MMIO emulation - load instr word */
+	if ((r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) ==
+	    (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) {
+		if (!cache_entry) {
+			unsigned int pshift = 12;
+			unsigned int pshift_index;
+
+			if (slb_v & SLB_VSID_L) {
+				pshift_index = ((slb_v & SLB_VSID_LP) >> 4);
+				pshift = slb_base_page_shift[pshift_index];
+			}
+			cache_entry = next_mmio_cache_entry(vcpu);
+			cache_entry->eaddr = addr;
+			cache_entry->slb_base_pshift = pshift;
+			cache_entry->pte_index = index;
+			cache_entry->hpte_v = v;
+			cache_entry->hpte_r = r;
+			cache_entry->rpte = gr;
+			cache_entry->slb_v = slb_v;
+			cache_entry->mmio_update = mmio_update;
+		}
+		if (data && (vcpu->arch.shregs.msr & MSR_IR))
+			return -2;	/* MMIO emulation - load instr word */
+	}
 
 	return -1;		/* send fault up to host kernel mode */
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] KVM: PPC: Book3S HV: Add a per vcpu cache for recently page faulted MMIO entries
  2016-11-02 13:13   ` Yongji Xie
@ 2016-11-03  0:38     ` Paul Mackerras
  -1 siblings, 0 replies; 10+ messages in thread
From: Paul Mackerras @ 2016-11-03  0:38 UTC (permalink / raw)
  To: Yongji Xie; +Cc: kvm-ppc, kvm, zhong

On Wed, Nov 02, 2016 at 09:13:39PM +0800, Yongji Xie wrote:
> This keeps a per vcpu cache for recently page faulted MMIO entries.
> On a page fault, if the entry exists in the cache, we can avoid
> locking the HPTE in real mode and virtual mode, then directly
> call kvmppc_hv_emulate_mmio().
> 
> In current implenment, we limit the size of cache to four. We think
> it's enough to cover the high-frequency MMIO HPTEs in most case.
> For example, considering the case of using virtio device, for virtio
> legacy devices, one HPTE could handle notifications from up to
> 1024 (64K page / 64 byte Port IO register) devices, so one cache entry
> is enough; for virtio modern devices, we always need one HPTE to handle
> notification for each device because modern device would use a 8M MMIO
> register to notify host instead of Port IO register, typically the
> system's configuration should not exceed four virtio devices per
> vcpu, four cache entry is also enough in this case. Of course, if needed,
> we could also modify the macro to a module parameter in the future.

Thanks, the patch looks nice.  I did notice one thing though, see below.

Do you have any performance results with and without the patch, to
show the improvement it gives?  Something like the throughput for a
1-byte ping-pong message test over virtio-net with vhost, perhaps?

> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> index 61ff8ee..1f048e7 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> @@ -389,6 +389,11 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
>  #define LOCK_TOKEN	(*(u32 *)(&get_paca()->paca_index))
>  #endif
>  
> +static inline int is_mmio_hpte(unsigned long v, unsigned long r)
> +{
> +	return ((v & HPTE_V_ABSENT) && (r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)));

Shouldn't this be testing (r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) ==
(HPTE_R_KEY_HI | HPTE_R_KEY_LO), rather than just a non-zero key field?
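
That is, an untested sketch of the check being suggested:

static inline int is_mmio_hpte(unsigned long v, unsigned long r)
{
        return (v & HPTE_V_ABSENT) &&
               (r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) ==
                        (HPTE_R_KEY_HI | HPTE_R_KEY_LO);
}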

Regards,
Paul.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] KVM: PPC: Book3S HV: Add a per vcpu cache for recently page faulted MMIO entries
  2016-11-03  0:38     ` Paul Mackerras
@ 2016-11-03  7:41       ` Yongji Xie
  -1 siblings, 0 replies; 10+ messages in thread
From: Yongji Xie @ 2016-11-03  7:41 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm-ppc, kvm, zhong

On 2016/11/3 8:38, Paul Mackerras wrote:

> On Wed, Nov 02, 2016 at 09:13:39PM +0800, Yongji Xie wrote:
>> This keeps a per vcpu cache for recently page faulted MMIO entries.
>> On a page fault, if the entry exists in the cache, we can avoid
>> locking the HPTE in real mode and virtual mode, then directly
>> call kvmppc_hv_emulate_mmio().
>>
>> In current implenment, we limit the size of cache to four. We think
>> it's enough to cover the high-frequency MMIO HPTEs in most case.
>> For example, considering the case of using virtio device, for virtio
>> legacy devices, one HPTE could handle notifications from up to
>> 1024 (64K page / 64 byte Port IO register) devices, so one cache entry
>> is enough; for virtio modern devices, we always need one HPTE to handle
>> notification for each device because modern device would use a 8M MMIO
>> register to notify host instead of Port IO register, typically the
>> system's configuration should not exceed four virtio devices per
>> vcpu, four cache entry is also enough in this case. Of course, if needed,
>> we could also modify the macro to a module parameter in the future.
> Thanks, the patch looks nice.  I did notice one thing though, see below.
>
> Do you have any performance results with and without the patch, to
> show the improvement it gives?  Something like the throughput for a
> 1-byte ping-pong message test over virtio-net with vhost, perhaps?
Hi Paul,

Here are the performance results (about 1% ~ 2% improvement):

fio (iops)
=====================================
        Before      After
1st      32095      32616
2nd      32032      32635
3rd      32057      32638
4th      32005      32598
5th      32069      32628

netperf (TCP_RR, Trans. Rate per sec)
=====================================
        Before       After
1st   30734.66    31328.63
2nd   30749.26    31337.77
3rd   30825.99    31278.19
4th   30708.94    31350.81
5th   30832.35    31484.80
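
Averaged over the five runs, that is roughly 32052 -> 32623 iops for fio
(about +1.8%) and 30770 -> 31356 transactions/sec for netperf TCP_RR
(about +1.9%).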

I will add this to the cover letter in the next version.

>> diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
>> index 61ff8ee..1f048e7 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
>> +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
>> @@ -389,6 +389,11 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
>>   #define LOCK_TOKEN	(*(u32 *)(&get_paca()->paca_index))
>>   #endif
>>   
>> +static inline int is_mmio_hpte(unsigned long v, unsigned long r)
>> +{
>> +	return ((v & HPTE_V_ABSENT) && (r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)));
> Shouldn't this be testing (r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) ==
> (HPTE_R_KEY_HI | HPTE_R_KEY_LO), rather than just a non-zero key field?
>
Oh, yes. I will fix it.

Thanks,
Yongji


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-11-03  7:41 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-02 13:13 [PATCH 0/2] KVM: PPC: Book3S HV: Optimize for MMIO emulation Yongji Xie
2016-11-02 13:13 ` Yongji Xie
2016-11-02 13:13 ` [PATCH 1/2] KVM: PPC: Book3S HV: Clear the key field of HPTE when the page is paged out Yongji Xie
2016-11-02 13:13   ` Yongji Xie
2016-11-02 13:13 ` [PATCH 2/2] KVM: PPC: Book3S HV: Add a per vcpu cache for recently page faulted MMIO entries Yongji Xie
2016-11-02 13:13   ` Yongji Xie
2016-11-03  0:38   ` Paul Mackerras
2016-11-03  0:38     ` Paul Mackerras
2016-11-03  7:41     ` Yongji Xie
2016-11-03  7:41       ` Yongji Xie
